
Quantum Speedups for Zero-Sum Games
via Improved Dynamic Gibbs Sampling

Adam Bouland     Yosheb Getachew     Yujia Jin     Aaron Sidford     Kevin Tian*
{abouland,yoshebg,yujiajin,sidford}@stanford.edu, tiankevin@microsoft.com
*Work partly completed while at Stanford.
Abstract

We give a quantum algorithm for computing an $\epsilon$-approximate Nash equilibrium of a zero-sum game given by an $m\times n$ payoff matrix with bounded entries. Given a standard quantum oracle for accessing the payoff matrix, our algorithm runs in time $\widetilde{O}(\sqrt{m+n}\cdot\epsilon^{-2.5}+\epsilon^{-3})$ and outputs a classical representation of the $\epsilon$-approximate Nash equilibrium. This improves upon the best prior quantum runtime of $\widetilde{O}(\sqrt{m+n}\cdot\epsilon^{-3})$ obtained by [vAG19], and upon the classical $\widetilde{O}((m+n)\cdot\epsilon^{-2})$ runtime due to [GK95] whenever $\epsilon=\Omega((m+n)^{-1})$. We obtain this result by designing new quantum data structures for efficiently sampling from a slowly-changing Gibbs distribution.

1 Introduction

There is now a broad family of quantum algorithms for machine learning and fast numerical linear algebra [BWP+17], built on many quantum algorithmic primitives, e.g. [BHMT02, HHL09, GSLW19]. More specifically, for a wide range of problems it has been shown how quantum algorithms can (in certain parameter regimes) yield faster runtimes. (Note that quantifying the end-to-end speedups obtained by these methods can be subtle due to I/O overheads, different access models [Aar15], and classical de-quantization algorithms [Tan19, CGL+20, GLG22].) These quantum algorithms obtain runtimes which improve upon the dimension dependence of classical algorithms, but often at the cost of a worse dependence on the error tolerance and/or implicit access to the solution (e.g. query or sampling access to solution entries). Consequently, this paper is motivated by the following question.

To what degree is there an inherent accuracy versus dimension-dependence tradeoff for quantum optimization algorithms? What algorithmic techniques improve this tradeoff?

In this paper we consider this question for the fundamental optimization problem of computing an $\epsilon$-approximate Nash equilibrium in zero-sum games. Our main result is an improved dependence on $\epsilon$ for quantum algorithms solving zero-sum games, which is very close to that of the classical counterpart. Further, we show that for our algorithms, a classical representation of the solution can be obtained at no additional asymptotic cost. Our work builds upon [vAG19, LCW19], which already took a large and important step towards answering the above question by designing quantum data structures for efficiently implementing algorithms for solving zero-sum games.

Interestingly, to obtain our result we provide improved quantum algorithms for solving a dynamic data structure problem of sampling from a slowly-changing Gibbs distribution. Such dynamic sampling problems arise as a natural component of stochastic gradient methods for solving zero-sum games. We obtain our speedups by improving a Gibbs sampling subroutine developed in [vAG19]. We design a new dynamic quantum data structure which performs the necessary Gibbs sampling in time $\widetilde{O}(\epsilon^{-\frac{1}{2}})$, which is faster than the corresponding $\widetilde{O}(\epsilon^{-1})$ runtime achieved by [vAG19]. Beyond the intrinsic utility of solving this problem, we hope our improved Gibbs sampler showcases potential algorithmic insights that can be gleaned by seeking improved error dependencies for quantum optimization algorithms. Moreover, we hope this work encourages the study and design of quantum data structures for efficient optimization.

1.1 Zero-sum games

For a matrix $\mathbf{A}\in\mathbb{R}^{m\times n}$, its associated zero-sum game is the pair of equivalent optimization problems

\min_{u\in\Delta^{m}}\max_{v\in\Delta^{n}}u^{\top}\mathbf{A}v=\max_{v\in\Delta^{n}}\min_{u\in\Delta^{m}}u^{\top}\mathbf{A}v,\text{ where }\Delta^{k}:=\{x\in\mathbb{R}^{k}_{\geq 0}:\sum_{i\in[k]}x_{i}=1\}.

In such a game, we refer to $\mathbf{A}$ as the payoff matrix and view the $m$- and $n$-dimensional simplices, i.e. $\Delta^{m}$ and $\Delta^{n}$, as the spaces of distributions over $[m]$ and $[n]$ respectively. From this perspective, $u^{\top}\mathbf{A}v$, known as the payoff (or utility) of $(u,v)$, is the expected value of $\mathbf{A}_{ij}$ when sampling $i\in[m]$ and $j\in[n]$ independently from the distributions corresponding to $u$ and $v$. Thus, a zero-sum game models a two-player game where a minimization player seeks to minimize the payoff while, simultaneously, a maximization player seeks to maximize it.

In this paper, we consider the canonical problem of computing an approximate Nash equilibrium of a zero-sum game. Given the payoff matrix $\mathbf{A}\in\mathbb{R}^{m\times n}$, we call a pair $(u,v)\in\Delta^{m}\times\Delta^{n}$ an $\epsilon$-approximate Nash equilibrium (NE) for $\epsilon\in\mathbb{R}_{>0}$ if

\left(\max_{v^{\prime}\in\Delta^{n}}u^{\top}\mathbf{A}v^{\prime}\right)-\left(\min_{u^{\prime}\in\Delta^{m}}(u^{\prime})^{\top}\mathbf{A}v\right)\leq\epsilon.

We assume that the payoff matrix $\mathbf{A}$ and the error tolerance $\epsilon$ are given as input to an algorithm, and that, for simplicity, $\lVert\mathbf{A}\rVert_{\max}\leq 1$, i.e. the largest entry of $\mathbf{A}$ has magnitude at most $1$ (this is without loss of generality by rescaling $\mathbf{A}\leftarrow\lVert\mathbf{A}\rVert_{\max}^{-1}\mathbf{A}$ and $\epsilon\leftarrow\lVert\mathbf{A}\rVert_{\max}^{-1}\epsilon$). The main goal of this paper is to design improved zero-sum game solvers, i.e. algorithms that compute $\epsilon$-approximate NEs.
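To make the approximation criterion concrete, the following minimal numpy sketch (our illustration, not from the paper) checks the duality gap of a candidate pair, using the fact that a linear function over the simplex is extremized at a vertex:

```python
import numpy as np

def duality_gap(A, u, v):
    """Left-hand side of the eps-approximate NE condition.

    max over v' in the simplex of u^T A v' equals max_j (A^T u)_j, and
    min over u' of u'^T A v equals min_i (A v)_i, since linear
    functions on the simplex are extremized at vertices."""
    return np.max(A.T @ u) - np.min(A @ v)

# Toy example (matching pennies): uniform strategies form an exact NE.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
u = v = np.array([0.5, 0.5])
assert duality_gap(A, u, v) <= 1e-12
```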

Zero-sum games are foundational to theoretical computer science, optimization, and economics. The problem of approximately solving zero-sum games is a natural formulation of approximate linear programming (LP), and correspondingly, this problem is a prominent testbed for new optimization techniques. Over the past decades there have been numerous advances in the computational complexity of solving zero-sum games under various assumptions on problem parameters (see Section 1.3 for a survey). Recent advancements in interior point methods (IPMs) for linear programming, e.g. [vdBLL+21] and references therein (discussed in more detail in Section 1.3), solve zero-sum games in time $\widetilde{O}(mn+\min(m,n)^{2.5})$. (We use the $\widetilde{O}$ notation to hide polylogarithmic dependences on problem parameters when convenient for exposition; see Section 2 for a more detailed statement of hidden parameters. In informal theorem statements, we use "with high probability" to indicate a polylogarithmic dependence on the failure probability.) Further, the linear programming algorithm of [vdB20] shows that zero-sum games can be solved deterministically in $\widetilde{O}((m+n)^{\omega})$ time, where $\omega<2.373$ is the current matrix multiplication constant [AW21], or $\widetilde{O}((m+n)^{3})$ without fast matrix multiplication. In this paper, we primarily focus on sublinear-time algorithms for approximating NEs.

A well-known algorithm by [GK95] achieves a runtime of $\widetilde{O}((m+n)\cdot\epsilon^{-2})$, which is the state-of-the-art sublinear runtime amongst classical algorithms without further problem assumptions. Recently it has been shown that quantum algorithms can yield striking runtime improvements for solving zero-sum games and their generalizations [LCW19, vAG19, LWCW21]. In particular, in 2019 Li, Chakrabarti and Wu [LCW19] gave a quantum algorithm for zero-sum games running in time $\widetilde{O}(\sqrt{m+n}\cdot\epsilon^{-4})$, and simultaneously van Apeldoorn and Gilyén [vAG19] gave an algorithm running in time $\widetilde{O}(\sqrt{m+n}\cdot\epsilon^{-3})$. These algorithms yield a quadratic improvement in the dimension dependence of the best classical algorithm, at the cost of a higher error dependence.

The algorithms of [LCW19, vAG19, LWCW21] operate using a standard quantum oracle for $\mathbf{A}$ (formally stated in Section 2), in which one can query the entries of $\mathbf{A}$ in superposition. We build upon the algorithm of [vAG19] in the rest of this paper, as our improvement concerns the error dependence. The [vAG19] algorithm generalizes the classical algorithm of Grigoriadis and Khachiyan [GK95], and obtains a runtime improvement by speeding up a key dynamic Gibbs sampling subroutine required by the [GK95] method. As we discuss in greater detail in Section 3, van Apeldoorn and Gilyén give a quantum data structure to efficiently perform this sampling in time quadratically faster in the dimension, which lies at the core of their algorithmic speedup.

Our result.

We give a new quantum algorithm for solving zero-sum games which improves upon the runtime of the prior state-of-the-art quantum algorithm, due to [vAG19].

Theorem 1 (informal, see Theorem 4).

Let $\mathbf{A}\in\mathbb{R}^{m\times n}$ with $\lVert\mathbf{A}\rVert_{\max}\leq 1$, and let $\epsilon\in(0,1)$. Given a quantum oracle for $\mathbf{A}$ (defined in Section 2), there is an $\widetilde{O}(\sqrt{m+n}\cdot\epsilon^{-2.5}+\epsilon^{-3})$ time algorithm which yields a classical output $(u,v)\in\Delta^{m}\times\Delta^{n}$ that is an $\epsilon$-approximate NE with high probability.

Our new algorithm simultaneously improves the best known quantum [vAG19] and classical [GK95] algorithms in the parameter regime where IPMs do not dominate sublinear algorithms. In particular, it is faster than the classical $\widetilde{O}((m+n)\cdot\epsilon^{-2})$ runtime of [GK95] whenever $\epsilon^{-1}=\widetilde{O}(m+n)$, which includes the regime where [GK95] offers advantages over the $\widetilde{O}((m+n)^{\omega})$ runtime of the [vdB20] IPM, as $\omega<3$. This is in contrast to the prior quantum rate of [vAG19], which does not improve upon [GK95] in the full parameter range where sublinear algorithms are currently preferable to IPMs. For example, when $m\approx n$ and (up to logarithmic factors) $\epsilon\in[n^{-c},n^{-\frac{1}{2}}]$ where $c=\frac{1}{2}(\omega-1)$, the rate of [GK95] is favorable to that of [vAG19] and state-of-the-art IPMs [vdB20, CLS21]. (There is evidence that $\omega=2$ cannot be achieved with current techniques, e.g. [Alm21].)

1.2 Dynamic Gibbs sampling

We obtain the improved error dependence in our zero-sum game solver by producing a new, faster quantum data structure to perform the Gibbs sampling used in the algorithm of [vAG19], which may be of independent interest. Gibbs sampling is a fundamental algorithmic primitive: given a vector $v\in\mathbb{R}^{n}$, sample from the probability distribution proportional to $\exp(v)$. Gibbs sampling is used as a subroutine in many quantum and classical optimization algorithms, e.g. [BS17] and follow-up works. In general, quantum algorithms can perform this task more efficiently using amplitude estimation, which can boost the acceptance probability of rejection sampling schemes. This strategy was implemented in [vAG19], which approximates the maximum entry $v_{\max}$ of $v$ using quantum maximum finding [DH96], uniformly samples $i\in[n]$, and accepts the sample with probability $\exp(v_{i}-v_{\max})\leq 1$ using quantum rejection sampling. We give a more detailed overview of the [vAG19] Gibbs sampler and its complexity analysis in Section 3.2.

We give a data structure which quadratically improves the error dependence of the [vAG19] Gibbs sampling subroutine, from $\widetilde{O}(\sqrt{m+n}\cdot\epsilon^{-1})$ per sample to an amortized $\widetilde{O}(\sqrt{m+n}\cdot\epsilon^{-\frac{1}{2}})$ per sample. A key fact which enables this improvement is that the Gibbs distributions one samples from in the zero-sum game solver of [GK95] change slowly over time: the base vector $v$ receives bounded sparse updates in each iteration. By storing partial information about the Gibbs distribution, namely an efficiently-computable overestimate of its entries which remains valid across many consecutive iterations, we obtain an improved dynamic Gibbs sampler, of which we also provide a detailed overview in Section 3.2.

We now define our notion of an approximate Gibbs sampler, and then state the dynamic sampling problem we consider, which arises naturally in zero-sum game algorithms with sublinear runtimes.

Definition 1 (Approximate Gibbs oracle).

For $v\in\mathbb{R}^{n}$, its associated Gibbs distribution is $p_{v}\in\Delta^{n}$ such that for all $i\in[n]$, $[p_{v}]_{i}\propto\exp(v_{i})$. We say $\mathcal{O}^{\textup{gibbs}}_{v}$ is a $\delta$-approximate Gibbs oracle if it samples from $\tilde{p}\in\Delta^{n}$ with $\lVert\tilde{p}-p_{v}\rVert_{1}\leq\delta$.
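For intuition, an exact ($\delta=0$) Gibbs oracle is straightforward classically when $v$ is explicit; a minimal sketch (ours, for illustration):

```python
import numpy as np

def gibbs_oracle(v, rng):
    """Exact (delta = 0) Gibbs oracle: sample i with probability
    proportional to exp(v_i). Subtracting max(v) prevents overflow
    and leaves the distribution unchanged."""
    p = np.exp(v - np.max(v))
    p /= p.sum()
    return rng.choice(len(v), p=p)

rng = np.random.default_rng(0)
i = gibbs_oracle(np.array([0.0, 1.0, 2.0]), rng)  # returns 2 w.p. ~0.665
```

The difficulty in the problem below is that the vector defining the distribution is never explicitly available within the $\sqrt{n}$-type budgets we target.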

Problem 1 (Sampling maintenance).

Let $\eta>0$, $\delta\in(0,1)$, and suppose we have a quantum oracle for $\mathbf{A}\in\mathbb{R}^{m\times n}$. Consider a sequence of $T$ $\mathsf{Update}$ operations to a dynamic vector $x\in\mathbb{R}^{m}_{\geq 0}$, each of the form $x_{i}\leftarrow x_{i}+\eta$ for some $i\in[m]$. In the sampling maintenance problem, in amortized $\mathcal{T}_{\textup{update}}$ time per $\mathsf{Update}$ we must maintain a $\delta$-approximate Gibbs oracle, $\mathcal{O}_{\textup{samp}}$, for $\mathbf{A}^{\top}x$ which is queryable in worst-case time $\mathcal{T}_{\textup{samp}}$.

Our result.

We provide a quantum algorithm for solving Problem 1, which improves upon the runtime implied by the corresponding component in the algorithm of [vAG19].

Theorem 2 (informal, see Theorem 3).

There is a quantum algorithm which solves Problem 1 with high probability with $\max(\mathcal{T}_{\textup{samp}},\mathcal{T}_{\textup{update}})=\widetilde{O}(\sqrt{n}\cdot T\eta^{1.5})$ and an initialization cost of $\widetilde{O}(\eta^{3}T^{3})$.

Theorem 2 improves upon the solution to the sampling maintenance problem (Problem 1) implied by [vAG19] by an $\eta^{-\frac{1}{2}}$ factor; in the setting of the [GK95] solver, where $T=\widetilde{O}(\epsilon^{-2})$ and $\eta=\Theta(\epsilon)$, this is an $\epsilon^{-\frac{1}{2}}$-factor improvement. At a high level, our improvement is obtained by storing a hint consisting of a vector which overestimates the true Gibbs distribution, together with an approximate normalization factor, both of which are infrequently updated. Our maintained hint satisfies two desirable properties: $(i)$ it remains valid for a batch of consecutive iterations, and $(ii)$ the degree of overestimation is bounded. The former property ensures a fast amortized update time, and the latter ensures a fast sample time by lower bounding the acceptance probability of our quantum rejection sampler. Our high-level strategy for maintaining improved hints is to repeatedly call our sampling access to accurately estimate large entries of the Gibbs distribution, and to exploit the stability of the distribution in the setting of Problem 1. We discuss our dynamic Gibbs sampler in more detail and compare it with previous methods for solving Problem 1 in Section 3.2.

The initialization cost of Theorem 2 is due to the current state-of-the-art in numerically stable implementations of the quantum singular value transformation (SVT) framework of [GSLW19]. This cost is also the cause of the additive $\widetilde{O}(\epsilon^{-3})$ term in Theorem 1. We discuss this cost in Appendix D; improvements to numerically stable implementations of [GSLW19] would be reflected in the runtimes of Theorems 1 and 2.

1.3 Related work

Table 1: Algorithms for computing $\epsilon$-approximate Nash equilibria of zero-sum games. Runtimes hide polylogarithmic factors and assume $\mathbf{A}\in\mathbb{R}^{m\times n}$ with $\lVert\mathbf{A}\rVert_{\max}\leq 1$.

Method | Query model | Total runtime
interior point method [CLS21] | classical | $\max(m,n)^{\omega}$
interior point method [vdBLL+21] | classical | $mn+\min(m,n)^{2.5}$
extragradient [Nem04, Nes07] | classical | $mn\cdot\epsilon^{-1}$
stochastic mirror descent (SMD) [GK95] | classical | $(m+n)\cdot\epsilon^{-2}$
variance-reduced SMD [CJST19] | classical | $mn+\sqrt{mn(m+n)}\cdot\epsilon^{-1}$
[vAG19] | quantum | $\sqrt{m+n}\cdot\epsilon^{-3}$
Theorem 1 (our work) | quantum | $\sqrt{m+n}\cdot\epsilon^{-2.5}+\epsilon^{-3}$
Table 2: Solutions to Problem 1 with $T=\epsilon^{-2}$ and $\eta=\epsilon$. Hides polylogarithmic factors.

Method | Query model | $\mathcal{T}_{\textup{samp}}$ | $\mathcal{T}_{\textup{update}}$
explicit updates [GK95] | classical | $1$ | $m+n$
max-based rejection sampling [vAG19] | quantum | $\sqrt{m+n}\cdot\epsilon^{-1}$ | $\sqrt{m+n}\cdot\epsilon^{-1}$
Theorem 2 (our work) | quantum | $\sqrt{m+n}\cdot\epsilon^{-\frac{1}{2}}$ | $\sqrt{m+n}\cdot\epsilon^{-\frac{1}{2}}$

Quantum optimization and machine learning.

There are a wide array of quantum algorithms for optimization and machine learning which make use of fundamental algorithmic primitives such as amplitude amplification [BHMT02], the HHL algorithm [HHL09], and the quantum singular value transformation [GSLW19]. For example, a number of works gave HHL-based algorithms for a variety of machine learning tasks such as PCA [LMR14], SVMs [RML14], and recommendation systems [KP16]. For more details see the survey article of [BWP+17].

Most relevant to our current work are quantum algorithms for optimization problems. For example, Brandao and Svore [BS17] gave a quantum algorithm for SDP solving based on the Arora-Kale algorithm [AK07], which was later improved by [VAGGdW20b]. There have also been quantum IPM-based methods for LPs and SDPs [KP20]. Additionally a series of works have considered quantum algorithms for general convex optimization [CCLW20, vAGGdW20a], which make use of Jordan’s algorithm for fast gradient estimation [Jor05, GAW19].

In the area of zero-sum games, in addition to the previously mentioned works [vAG19, LCW19] on $\ell_{1}$-$\ell_{1}$ games (where both players are $\ell_{1}$-constrained), several works have considered different variants of zero-sum games. For example, Li, Chakrabarti and Wu [LCW19] gave quantum algorithms for $\ell_{2}$-$\ell_{1}$ games with a quadratic improvement in the dimension dependence. Later, Li, Wang, Chakrabarti and Wu [LWCW21] extended this algorithm to more general $\ell_{q}$-$\ell_{1}$ games with $q\in(1,2]$.

Zero-sum games.

Zero-sum games are a canonical modeling tool in optimization, economics and machine learning [Neu28]. The classic extragradient (mirror prox) method [Nem04, Nes07] computes an $\epsilon$-approximate NE in $\widetilde{O}(mn\cdot\epsilon^{-1})$ time; as discussed previously, the stochastic mirror descent method of [GK95] obtains the same accuracy in time $\widetilde{O}((m+n)\cdot\epsilon^{-2})$. An intermediate runtime was recently obtained by [CJST19] using variance reduction, described in Table 1.

Improved runtimes are available under more fine-grained characterizations of the matrix $\mathbf{A}$, such as sparsity (e.g. number of nonzero entries per row or column) or numerical sparsity (e.g. rows and columns with bounded $\ell_{1}$-to-$\ell_{2}$ norm ratios) [CJST20]. Notably, the [GK95] algorithm also offers runtime improvements under a sparsity assumption, as does the algorithm of [vAG19] in certain sparsity-to-accuracy ratio regimes. In this paper, we focus on NE algorithms in the general setting (without further sparsity or numerical sparsity assumptions).

In parallel, a long line of research improving IPMs for solving linear programming [Kar84, Ren88, LS14, LS19, vdBLSS20, JSWZ21] has led to a number of different zero-sum game solvers with polylogarithmic runtime dependence on the problem accuracy $\epsilon$. The current state-of-the-art IPMs are [CLS21] and [vdBLL+21], which achieve runtimes of $\widetilde{O}(\max(m,n)^{\omega})$ and $\widetilde{O}(mn+\min(m,n)^{2.5})$ respectively. We refer readers to Table 1 for detailed comparisons. Finally, for strongly polynomial runtimes (i.e. with no dependence on $\epsilon$), which are outside the scope of this paper, we refer readers to [DNV20] and references therein.

1.4 Future work

Theorem 1's $\epsilon$ dependence is within an $\epsilon^{-\frac{1}{2}}$ factor of matching its classical counterparts. To the best of our knowledge, removing this $\epsilon^{-\frac{1}{2}}$ overhead would represent the first quantum algorithm for a natural optimization problem which improves upon classical counterparts across all parameters.

Both our work and [vAG19] solve Problem 1 by leveraging a powerful polynomial approximation-based technique developed in [GSLW19], known as the quantum singular value transform (QSVT). In both cases, QSVT is used with a polynomial of degree $\widetilde{O}(\epsilon^{-1})$. We note that in closely-related classical settings (discussed in [SV14]), Chebyshev polynomial-based approximations yield a quadratically smaller degree. However, a boundedness requirement (due to the spectra of quantum gates) prevents straightforwardly applying these constructions within QSVT. Sidestepping this barrier is a natural avenue towards improving our work, which we leave as an open problem.

More generally, establishing optimal oracle query complexities for dynamic Gibbs sampling (e.g. Problem 1) and for solving zero-sum games are key problems left open by our work. These questions are potentially more approachable than establishing tight time complexity characterizations. For example, could $\max(\mathcal{T}_{\textup{samp}},\mathcal{T}_{\textup{update}})$ be improved to $\widetilde{O}(\sqrt{n})$ in the context of Theorem 1, or can we rule out such an improvement in the query model?

1.5 Organization

In Section 2 we state the notation used throughout the paper, as well as the (classical and quantum) computational models we assume. In Section 3, we give a brief technical overview of the core components of our algorithm used to prove Theorem 1: the stochastic gradient method our method is built on, and an efficient quantum implementation of a key subroutine using a new dynamic Gibbs sampler. Finally in Section 4 we give our new quantum sampler, and prove Theorem 2.

We aim to give a self-contained, but simplified, description of our algorithm in Section 3 to improve the readability of the paper for readers with an optimization background unfamiliar with quantum computing, and vice versa. In particular, we abstract away the core optimization machinery (stochastic mirror descent) and quantum machinery (quantum SVT) developed in prior work into the statements of Propositions 1 and 2, and focus on how we use these statements black-box to build a faster algorithm. The proofs of these statements can be found in Appendices A and B.

2 Preliminaries

General notation.

$\widetilde{O}$ hides logarithmic factors in problem dimensions (denoted $m$ and $n$), target accuracies (denoted $\epsilon$), and failure probabilities (denoted $\alpha$). When discussing runtimes for Problem 1, we additionally use $\widetilde{O}$ to hide logarithmic factors in the parameters $\eta,T$. For $i\in[n]$ we let $e_{i}\in\mathbb{R}^{n}$ denote the $i^{\text{th}}$ standard basis vector when $n$ is clear from context. $\lVert\cdot\rVert_{p}$ denotes the $\ell_{p}$ norm of a vector. For $\mathbf{A}\in\mathbb{R}^{m\times n}$, its $i^{\text{th}}$ row and $j^{\text{th}}$ column are denoted $\mathbf{A}_{i:}$ and $\mathbf{A}_{:j}$ respectively. For $v\in\mathbb{R}^{n}$, $\textbf{diag}(v)$ is the diagonal $n\times n$ matrix with $v$ as its diagonal. The conjugate transpose of $\mathbf{A}$ is denoted $\mathbf{A}^{*}$; when the matrix is real we use $\mathbf{A}^{\top}$. The all-ones and all-zeros vectors of dimension $n$ are $\mathbf{1}_{n}$ and $\mathbf{0}_{n}$. Finally, throughout $a:=\lceil\log_{2}m\rceil$ and $b:=\lceil\log_{2}n\rceil$, so $[m]\subseteq[2^{a}]$ and $[n]\subseteq[2^{b}]$.

Computation models.

We assume entries of $\mathbf{A}$ are $w$-bit reals for $w=O(\log(mn))$, and work in the word RAM model where $w$-bit arithmetic operations take $O(1)$ time; for simplicity, we assume mathematical operations such as trigonometric functions and radicals can also be implemented exactly for $w$-bit words in $O(1)$ time. Throughout, "quantum states" mean unit vectors, and "quantum gates" or "oracles" $\mathcal{O}$ mean unitary matrices. We follow standard notation and identify a standard basis vector $e_{i}$ for $i\in[n]$ with $|i\rangle$, a $\lceil\log_{2}n\rceil$-qubit state in which $i$ is represented in binary (more formally, $|i\rangle=|\textup{bin}(i)\rangle$, and bin is omitted for brevity). We consider the standard model of quantum access to oracles, in which the oracle $\mathcal{O}$, defined by its operation on $|s\rangle$ for all $\{0,1\}^{*}$-valued $s$ (where the length is clear from context), can be queried in superposition. If $\mathcal{O}$ is queried on $|v\rangle:=\sum_{s}\alpha_{s}|s\rangle$, the result is $\mathcal{O}|v\rangle=\sum_{s}\alpha_{s}(\mathcal{O}|s\rangle)$. We use $|g\rangle$, $|g^{\prime}\rangle$, etc. (when clear from context) to denote arbitrary sub-unit vectors, which represent garbage states (unused in computations). The tensor product of states $|u\rangle$ and $|v\rangle$ on $a$ and $b$ qubits is denoted $|u\rangle|v\rangle$, an $(a+b)$-qubit state. The runtime of a quantum circuit is its maximum depth (in arithmetic gates on $w$-bit words).

Access model.

Throughout the paper, we assume a standard quantum oracle for accessing $\mathbf{A}$ (recall $\lVert\mathbf{A}\rVert_{\max}\leq 1$). In particular, by a quantum oracle for $\mathbf{A}$ we mean an oracle $\mathcal{O}_{\mathbf{A}}$ which, when queried with $|i\rangle|j\rangle|s\rangle$ for $i\in[m]$, $j\in[n]$, $s\in\{0,1\}^{w}$, reversibly writes $\mathbf{A}_{ij}$ (in binary) to the third register in $O(1)$ time, i.e. $\mathcal{O}_{\mathbf{A}}|i\rangle|j\rangle|s\rangle=|i\rangle|j\rangle|s\oplus\mathbf{A}_{ij}\rangle$, where $\oplus$ is bitwise mod-$2$ addition.

Given a quantum oracle for $\mathbf{A}$, with two queries one can construct, by standard techniques, an oracle which places the value in the amplitude of the state rather than in a register. (This follows e.g. by calling the oracle to obtain the value of $\mathbf{A}_{ij}$ in binary, interpreted as a signed number between $0$ and $1$, adding an ancilla qubit, performing arithmetic to compute the needed rotation angle, applying a tower of controlled rotation gates to the ancilla qubit using that rotation angle expressed in binary, and then calling the standard oracle a second time to uncompute the binary value of $\mathbf{A}_{ij}$; see e.g. [GR02] for details.) More formally, one can construct an $\mathcal{O}_{\mathbf{A}}^{\prime}$ which operates as:

\mathcal{O}_{\mathbf{A}}^{\prime}|0\rangle|i\rangle|j\rangle=\sqrt{\mathbf{A}_{ij}}|0\rangle|i\rangle|j\rangle+\sqrt{1-|\mathbf{A}_{ij}|}|1\rangle|g\rangle,\text{ for }(i,j)\in[m]\times[n].

It is standard in the literature to construct such an $\mathcal{O}_{\mathbf{A}}^{\prime}$ from $\mathcal{O}_{\mathbf{A}}$ (using ancilla qubits to store the output register where $\mathbf{A}_{ij}$ is written) under our classical model of computation; see e.g. [GR02]. For simplicity, we omit discussion of ancilla qubits in the remainder of the paper and assume direct access to $\mathcal{O}_{\mathbf{A}}^{\prime}$. We also note that there is ambiguity in the implementation of $\mathcal{O}_{\mathbf{A}}^{\prime}$ in that the square root is not unique, and we have control over the signing used in this implementation. We use this flexibility crucially later in the paper, specifically in Corollary 6.
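As an illustration of the amplitude-loading step, the following sketch (ours; it assumes for simplicity a nonnegative entry $a=\mathbf{A}_{ij}\in[0,1]$, with signed entries handled via the square-root sign flexibility just mentioned) gives the single-qubit rotation applied to the ancilla:

```python
import numpy as np

def entry_rotation(a):
    """Rotation sending |0> to sqrt(a)|0> + sqrt(1-a)|1>, the per-entry
    action of O'_A on its first qubit (assumes a in [0, 1])."""
    theta = 2.0 * np.arccos(np.sqrt(a))
    return np.array([[np.cos(theta / 2.0), -np.sin(theta / 2.0)],
                     [np.sin(theta / 2.0),  np.cos(theta / 2.0)]])

state = entry_rotation(0.3) @ np.array([1.0, 0.0])  # apply to |0>
assert np.allclose(state, [np.sqrt(0.3), np.sqrt(0.7)])
```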

3 Overview of approach

In this section, we give an overview of the approach we take to prove our main results: an improved quantum runtime for solving zero-sum games (Theorem 4) and an improved quantum data structure for dynamic Gibbs sampling (Theorem 3). We organize this section as follows.

In Section 3.1, we state Algorithm 1, the optimization method framework we use to solve zero-sum games. This framework is a generalization of the classical algorithm of [GK95]. We state its guarantees in Proposition 1 and defer the proof to Appendix A. Algorithm 1 assumes access to an approximate Gibbs oracle (Definition 1) for sampling from dynamic distributions as stated in Problem 1. The bulk of our work is devoted to obtaining an efficient quantum implementation of such an oracle (Theorem 3); using this result, we prove Theorem 4 at the end of Section 3.1.

In Section 3.2, we overview the main technical innovation of this paper, an improved solution to Problem 1. Whereas prior work by [vAG19] solves Problem 1 at an amortized $\approx\sqrt{m+n}\cdot\epsilon^{-1}$ cost per iteration, we show how to solve the problem at an amortized $\approx\sqrt{m+n}\cdot\epsilon^{-\frac{1}{2}}$ cost. We remark that the only quantum components of our algorithm (quantum SVT and amplitude amplification) are abstracted away by Proposition 2, which is proven in Appendix B.

3.1 Solving matrix games with a Gibbs sampling oracle

Our proof of Theorem 4 uses an efficient implementation of the algorithmic framework stated in Algorithm 1, based on stochastic mirror descent. In specifying Algorithm 1, we recall our earlier Definition 1, which captures the approximate sampling access we require for Algorithm 1’s execution.

Algorithm 1: MatrixGameSolver($\delta,\eta,T$)

Input: $\mathbf{A}\in\mathbb{R}^{m\times n}$, desired accuracy $\epsilon\in(0,1)$, $\delta$-approximate Gibbs oracles for the (dynamic) vectors $-\mathbf{A}^{\top}x_{t}$ and $\mathbf{A}y_{t}$
Parameters: Gibbs sampler parameter $\delta\in(0,1)$, step size $\eta>0$, iteration count $T$

1. Initialize $\hat{u}\leftarrow\mathbf{0}_{m}$, $\hat{v}\leftarrow\mathbf{0}_{n}$, $x_{0}\leftarrow\mathbf{0}_{m}$, and $y_{0}\leftarrow\mathbf{0}_{n}$
2. For $t=0$ to $T-1$:
   (a) Independently sample $j_{t},j^{\prime}_{t}\in[n]$ using $\mathcal{O}^{\textup{gibbs}}_{-\mathbf{A}^{\top}x_{t}}$ and $i_{t},i^{\prime}_{t}\in[m]$ using $\mathcal{O}^{\textup{gibbs}}_{\mathbf{A}y_{t}}$
   (b) Update $y_{t+1}\leftarrow y_{t}+\eta e_{j_{t}}$ and $x_{t+1}\leftarrow x_{t}+\eta e_{i_{t}}$   // update iterates
   (c) Update $\hat{u}\leftarrow\hat{u}+\frac{1}{T}e_{i^{\prime}_{t}}$ and $\hat{v}\leftarrow\hat{v}+\frac{1}{T}e_{j^{\prime}_{t}}$   // update output
3. Return $(\hat{u},\hat{v})$
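For readers preferring code, the following classical reference sketch of Algorithm 1 (ours) uses exact Gibbs oracles in place of the $\delta$-approximate quantum ones; recomputing $\mathbf{A}^{\top}x$ and $\mathbf{A}y$ each iteration is exactly the work the sublinear implementations avoid:

```python
import numpy as np

def gibbs(v, rng):
    p = np.exp(v - v.max())
    return rng.choice(len(v), p=p / p.sum())

def matrix_game_solver(A, eps, rng):
    """Classical reference for Algorithm 1 with exact Gibbs oracles.

    Parameters follow Proposition 1: eta = eps/60, T = Theta(eps^-2 log(mn));
    the quantum algorithm keeps this skeleton and swaps in faster samplers."""
    m, n = A.shape
    eta = eps / 60
    T = int(np.ceil(eps ** -2 * np.log(m * n)))
    x, y = np.zeros(m), np.zeros(n)
    u_hat, v_hat = np.zeros(m), np.zeros(n)
    for _ in range(T):
        neg_ATx, Ay = -A.T @ x, A @ y  # O(mn) here; avoided by Theorem 3
        j, jp = gibbs(neg_ATx, rng), gibbs(neg_ATx, rng)
        i, ip = gibbs(Ay, rng), gibbs(Ay, rng)
        y[j] += eta; x[i] += eta                 # update iterates
        u_hat[ip] += 1 / T; v_hat[jp] += 1 / T   # update (subsampled) output
    return u_hat, v_hat
```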

The main skeleton of Algorithm 1 (the sampling and iterate updates in Step 2) using exact oracles is identical to the method of [GK95]. However, our framework builds upon [GK95] in the following three ways.

1. We tolerate total variation error in the sampling procedure via $\delta$-approximate Gibbs oracles.

2. We provide a high-probability guarantee on the duality gap using martingale arguments.

3. We subsample the output to obtain a sparse solution yielding a comparable duality gap.

We remark that several of these improvements have appeared previously, either explicitly or implicitly, in the stochastic gradient method literature. For example, an approximation-tolerant stochastic gradient method was given in [CJST20], and our proofs of the high-probability guarantees are based on arguments in [AL17, CDST19]. For completeness we give a self-contained proof of the following guarantee on Algorithm 1 in Appendix A.

Proposition 1.

Let $\mathbf{A}\in\mathbb{R}^{m\times n}$ satisfy $\lVert\mathbf{A}\rVert_{\max}\leq 1$ and let $\epsilon,\alpha\in(0,1)$. Let $\delta\leq\frac{\epsilon}{20}$, $\eta=\frac{\epsilon}{60}$, and $T=\Theta(\epsilon^{-2}\log\frac{mn}{\alpha})$ for an appropriate constant. With probability $\geq 1-\alpha$, Algorithm 1 outputs an $\epsilon$-approximate NE for $\mathbf{A}$.

Given Proposition 1, to obtain our faster zero-sum game solver we simply need to efficiently implement the Gibbs sampling in Algorithm 1. As introduced in Section 1, Problem 1 describes a dynamic approximate Gibbs sampling problem sufficient for this task. Indeed, solving two appropriate parameterizations of Problem 1 provides the oracles needed by Algorithm 1. By combining Proposition 1 with the following Theorem 3 (our solution to Problem 1, discussed in greater detail in Section 3.2), we prove our main result, Theorem 4.

Theorem 3.

Let $\alpha\in(0,1)$ and $\delta\leq\eta$. Given a quantum oracle for $\mathbf{A}\in\mathbb{R}^{m\times n}$ (defined in Section 2) with $\lVert\mathbf{A}\rVert_{\max}\leq 1$, we can solve Problem 1 with probability $\geq 1-\alpha$ with

\max(\mathcal{T}_{\textup{samp}},\mathcal{T}_{\textup{update}})=O\left(1+\sqrt{n}\cdot T\eta\log^{4}\left(\frac{mn}{\delta}\right)\cdot\left(\sqrt{\eta\log\left(\frac{n\eta T}{\alpha}\right)}+\eta\log\left(\frac{n\eta T}{\alpha}\right)\right)\right),

and an additive initialization cost of

O\left(\eta^{3}T^{3}\log^{4}\left(\frac{n\eta T}{\delta}\right)+\log^{7}\left(\frac{n\eta T}{\delta}\right)\right).
Theorem 4.

Let $\mathbf{A}\in\mathbb{R}^{m\times n}$ satisfy $\lVert\mathbf{A}\rVert_{\max}\leq 1$, and let $\epsilon,\alpha\in(0,1)$. Given a quantum oracle for $\mathbf{A}$ (defined in Section 2), there is a quantum algorithm which yields a classical output $(u,v)\in\Delta^{m}\times\Delta^{n}$ that is an $\epsilon$-approximate NE for $\mathbf{A}$ with probability $\geq 1-\alpha$, in time

O\left(\frac{\sqrt{m+n}}{\epsilon^{2.5}}\log^{4}\left(\frac{mn}{\epsilon}\right)\log^{2.5}\left(\frac{mn}{\alpha\epsilon}\right)+\frac{\sqrt{m+n}}{\epsilon^{2}}\log^{4}\left(\frac{mn}{\epsilon}\right)\log^{3}\left(\frac{mn}{\alpha\epsilon}\right)+\frac{1}{\epsilon^{3}}\log^{7}\left(\frac{mn}{\epsilon}\right)\right).
Proof.

We apply two instances of Theorem 3 to implement the $\delta$-approximate Gibbs oracles for the dynamic vectors $-\mathbf{A}^{\top}x_{t}$ and $\mathbf{A}y_{t}$, implementing each iteration of Algorithm 1 in amortized $O(1+\mathcal{T}_{\textup{samp}}+\mathcal{T}_{\textup{update}})$ time. Using the settings of the parameters $T,\eta$ from Proposition 1 and setting $\delta=\Theta(\epsilon)$, which suffices for Algorithm 1 and Theorem 3, we have

\max(\mathcal{T}_{\textup{samp}},\mathcal{T}_{\textup{update}})=O\left(\frac{\sqrt{m+n}}{\epsilon}\log^{4}\left(\frac{mn}{\epsilon}\right)\log\left(\frac{mn}{\alpha\epsilon}\right)\left(\epsilon\log\left(\frac{mn}{\alpha\epsilon}\right)+\sqrt{\epsilon\log\left(\frac{mn}{\alpha\epsilon}\right)}\right)\right).

The conclusion follows since, by inspection, Algorithm 1 costs $O(T\cdot(1+\mathcal{T}_{\textup{samp}}+\mathcal{T}_{\textup{update}}))$. As remarked in the introduction, the additive term in the runtime comes from the cost of stably implementing a quantum circuit, required in the use of Theorem 3, representing a polynomial transformation in finite precision; we discuss this in greater detail in Appendix D. ∎

3.2 Dynamic sampling maintenance via dynamic hint maintenance

In this section, we overview our proof of Theorem 3, which proceeds in two steps.

1. We reduce sampling maintenance (Problem 1) to a problem which we call hint maintenance. This latter problem is a specialization of the sampling maintenance problem where suitable advice, which we call the hint throughout, is provided.

2. We show how to solve the hint maintenance problem required by Proposition 2 in Theorem 3, by recursively calling Proposition 2 in phases, allowing us to maintain hints of suitable quality.

Reducing sampling maintenance to hint maintenance.

First, we introduce the following data structure for maintaining the $x$ variable in Problem 1, which was used crucially in [vAG19] for dynamic Gibbs sampling. This data structure allows efficient queries to subsets of the coordinates of $x$, and we use it in our Gibbs sampler as well.

Lemma 1 (Sampler tree).

Let $\eta\in\mathbb{R}_{\geq 0}$ and $m\in\mathbb{N}$. There is a classical data structure, $\mathsf{SamplerTree}$, supporting a tree on $O(m)$ nodes whose leaves correspond to $[m]$, with the following operations.

• $\mathsf{Init}(m,\eta_{\textup{fixed}})$: initialize $x\leftarrow\mathbf{0}_{m}$ and $\eta\leftarrow\eta_{\textup{fixed}}$

• $\mathsf{Update}(i)$: $x_{i}\leftarrow x_{i}+\eta$

• $\mathsf{SubtreeSum}(v)$: return the sum of all $x_{i}$ such that $i$ is in the subtree of $v$

The total runtime of $T$ calls to $\mathsf{Update}$ is $O(T\log m)$, and each call to $\mathsf{SubtreeSum}$ costs $O(1)$.
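For concreteness, a minimal sketch of one standard implementation (ours; the array-based heap indexing below is a choice of this sketch, not part of Lemma 1):

```python
class SamplerTree:
    """Complete binary tree over m leaves storing subtree sums of x.

    Node 1 is the root, node v has children 2v and 2v + 1, and leaf
    i in [m] (0-indexed here) lives at node size + i."""

    def __init__(self, m, eta_fixed):  # Init(m, eta_fixed)
        self.eta = eta_fixed
        self.size = 1
        while self.size < m:
            self.size *= 2
        self.sums = [0.0] * (2 * self.size)

    def update(self, i):
        """x_i <- x_i + eta: add eta along the leaf-to-root path, O(log m)."""
        v = self.size + i
        while v >= 1:
            self.sums[v] += self.eta
            v //= 2

    def subtree_sum(self, v):
        """Sum of x_i over the subtree rooted at node v, in O(1)."""
        return self.sums[v]
```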

An implementation of $\mathsf{SamplerTree}$ based on propagating subtree sums upon updates, as sketched above, is a standard classical data structure, and we omit further details for brevity. Next, we state our first building block towards solving Problem 1, a result which can be thought of as quantum sampling with a hint. We defer its proof to Appendix B, as it is primarily based on generalizing dynamic block-encoding strategies with bounded-degree polynomial approximations, as pioneered by [GSLW19, vAG19].

Proposition 2.

Let $x\in\mathbb{R}^{m}_{\geq 0}$ correspond to an instance of $\mathsf{SamplerTree}$, and let $\beta\geq\lVert x\rVert_{1}$. Let $p$ be the Gibbs distribution associated with $\mathbf{A}^{\top}x$, let $Z:=\sum_{j\in[n]}\exp([\mathbf{A}^{\top}x]_{j})$, and let $\widetilde{Z}\in[Z,CZ]$ for some $C\geq 1$. Finally, let $q\in\mathbb{R}^{n}$ have entries classically queriable in $O(1)$ time, satisfying $q\geq p$ entrywise, $q_{j}\in[\frac{\delta}{n},1]$ for all $j\in[n]$, and $\lVert q\rVert_{1}=\rho$. Suppose $\widetilde{Z}$, $C$, $\rho$, and $\beta$ are explicitly known. Given a quantum oracle for $\mathbf{A}\in\mathbb{R}^{m\times n}$ (defined in Section 2) with $\lVert\mathbf{A}\rVert_{\max}\leq 1$, we can implement a $\delta$-approximate Gibbs oracle with query cost $O(\sqrt{\rho C}\cdot\beta\log^{4}(\frac{Cmn}{\delta}))$. The total additional cost incurred if $x$ undergoes $T$ $\mathsf{Update}$ calls which preserve the invariants on $\widetilde{Z},C,\rho,\beta$ is $O(T\log m)$.

Proposition 2 makes use of an overestimating hint vector $q$ and an approximate normalization constant $\widetilde{Z}$, which we collectively call the hint. The acceptance probability of our rejection sampling is governed by two primary parameters: $\rho=\lVert q\rVert_{1}$, which reflects the degree of overestimation (and can be thought of as the hint quality), and $C\geq 1$, which reflects our inability to accept with probability $\frac{p_{j}}{q_{j}}$ when $p$ is implicit (and can be thought of as the normalization quality; exactly computing $Z$ may require time $\Omega(n)$ in standard implementations, an obstacle to runtimes $\propto\sqrt{n}$). In particular, the rejection sampling scheme used in Proposition 2 instead accepts with probability $\frac{p_{j}}{Cq_{j}}$.

Here we elaborate briefly on the implementation of Proposition 2 (for more details, see Section 4). We follow the notation of Proposition 2, and let $w:=\mathbf{A}^{\top}x$, so that the unnormalized Gibbs distribution is $\exp(w)$ and $p=\frac{\exp(w)}{Z}$. Proposition 2 is a rejection sampler which first loads the hint $q$ into superposition, and then applies a filter. Overall, our scheme has the form

\text{sample }j\sim\frac{q}{\rho},\text{ then accept with probability }\frac{\exp(w_{j})}{CZ\cdot q_{j}}=\frac{p_{j}}{Cq_{j}}, \quad (1)

which results in an accepted sample with probability $\approx\frac{1}{\rho C}$, and hence requires $\approx\sqrt{\rho C}$ trials to succeed after applying quantum amplitude amplification, a generalization of Grover search [BHMT02]. (The $\beta$ in Proposition 2 arises from loading $\exp(w_{j})$ into a quantum oracle via polynomials of degree $\approx\beta$.) The latter filtering step is implemented using appropriate block-encoding technology.

The above discussion suggests that the hint and normalization qualities, parameterized by $\rho$ and $C$, are crucial in controlling the acceptance probability of our scheme. More concretely, in our applications of Proposition 2, $\beta=\eta T=\widetilde{O}(\frac{1}{\epsilon})$, which is the bound on the $\ell_{1}$ norm of the iterates $x_{t}$ and $y_{t}$ in Algorithm 1 under the parameter settings of Proposition 1. Overall, the cost of implementing an approximate Gibbs oracle is then (up to logarithmic factors) $\sqrt{\rho C}\cdot\frac{1}{\epsilon}$. Proposition 2 hence reduces Problem 1 to the problem of maintaining the hint, consisting of a vector $q$ and a normalization estimate $\widetilde{Z}$. We mention that Proposition 2 is a strict generalization of a corresponding building block in [vAG19], which only used $q$ set to the all-ones vector.
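A classical analogue of scheme (1) may help clarify the roles of $\rho$ and $C$ (a sketch under the invariants of Proposition 2; the quantum version replaces the retry loop with amplitude amplification, reducing $\approx\rho C$ expected trials to $\approx\sqrt{\rho C}$):

```python
import numpy as np

def hinted_rejection_sampler(w, q, Z_tilde, C, rng, max_trials=100_000):
    """Scheme (1): propose j ~ q/rho, accept w.p. exp(w_j)/(C*Z_tilde*q_j).

    Assuming q >= p entrywise and Z <= Z_tilde <= C*Z, the acceptance
    probability is at most 1, and accepted samples are distributed
    exactly as p, where p_j = exp(w_j)/Z."""
    rho = q.sum()
    proposal = q / rho
    for _ in range(max_trials):
        j = rng.choice(len(q), p=proposal)
        if rng.random() < np.exp(w[j]) / (C * Z_tilde * q[j]):
            return j
    raise RuntimeError("no acceptance; check the invariants on q and Z_tilde")
```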

Approaches for Problem 1.

We now overview our improved solution to Problem 1 via efficient use of Proposition 2. To motivate our solution, we outline three solutions to Problem 1 offering different tradeoffs in the overall quality $\rho C$. The first only uses classical information and does not use Proposition 2 at all, the second uses Proposition 2 but maintains no history across iterates, and the third (building upon the first two) is our approach.

Solution 1: [GK95]. A standard way to solve Problem 1 is to explicitly update $w=\mathbf{A}^{\top}x$ and $\exp(w)$, and exactly maintain the normalizing constant $Z$. This allows us to sample from $p$ in $\widetilde{O}(1)$ time. Since $w$ changes by one row of $\mathbf{A}$ under a $1$-sparse $\mathsf{Update}$ operation to $x$, this is implementable in $O(n)$ time per iteration. We can view this as an instance of the scheme (1) with $q=p$, $C=1$, and $\rho=1$. It yields the (unbalanced) tradeoff for Problem 1 of $\mathcal{T}_{\textup{samp}}=\widetilde{O}(1)$ and $\mathcal{T}_{\textup{update}}=O(n)$.
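A sketch of this classical baseline (ours; for brevity the sampling step below is $O(n)$, whereas maintaining a sampler tree over the weights $\exp(w)$ makes it $\widetilde{O}(1)$ as in [GK95]):

```python
import numpy as np

class ExactGibbsMaintainer:
    """Solution 1: maintain w = A^T x explicitly under sparse updates,
    i.e. the scheme (1) with q = p, C = 1, and rho = 1."""

    def __init__(self, A):
        self.A = A
        self.w = np.zeros(A.shape[1])

    def update(self, i, eta):
        # An Update to x_i shifts w by eta times row i of A: O(n) work.
        self.w += eta * self.A[i]

    def sample(self, rng):
        p = np.exp(self.w - self.w.max())
        return rng.choice(len(self.w), p=p / p.sum())
```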

Solution 2: [vAG19]. A recent work [vAG19] introduced a quantum implementation of the scheme (1) with an improved tradeoff. The [vAG19] scheme first samples uniformly, which in the language of (1) means $q=\mathbf{1}_{n}$ and $\rho=n$. It then applies quantum maximum finding [DH96] to obtain an approximate maximum entry of $w$, which they show takes time $\widetilde{O}(\beta\cdot\sqrt{n})$; for simplicity here, we assume this exactly yields $w_{\max}:=\max_{j\in[n]}w_{j}$. Finally, the acceptance probability $\frac{p_{j}}{Cq_{j}}$ is set to $\exp(w_{j}-w_{\max})$. For $q=\mathbf{1}_{n}$, this translates to

p_{j}\cdot\exp(w_{\max}-w_{j})=\frac{\exp(w_{\max})}{Z}\leq 1,

implying $C=1$ suffices. We note this bound on $C$ can be tight when $w$ is very non-uniform. Overall, the [vAG19] scheme's update time requires maximum finding, and its sampling time (via Proposition 2) requires time $\widetilde{O}(\beta\cdot\sqrt{\rho C})=\widetilde{O}(\beta\cdot\sqrt{n})$. For $\beta=\widetilde{O}(\frac{1}{\epsilon})$ as in Algorithm 1, this yields the balanced tradeoff $\max(\mathcal{T}_{\textup{samp}},\mathcal{T}_{\textup{update}})=\widetilde{O}(\sqrt{n}\cdot\epsilon^{-1})$. As discussed earlier, our key insight is to improve upon this specific choice of hint in [vAG19], for their implicit use of Proposition 2.

Solution 3: this work. We design better hints for Proposition 2 by executing our algorithm in phases corresponding to batches of $\approx\frac{1}{\eta}$ iterations. At the start of each phase, we use the Gibbs access afforded by Proposition 2 to produce a suitable hint for efficiently implementing the next phase. Our execution of this strategy, parameterized by an integer $k\in[n]$, relies on the following observations.

1. During the $\lceil\frac{1}{\eta}\rceil$ iterations $t\in\{\tau+s\}_{s\in[\lceil\frac{1}{\eta}\rceil]}$ (where $\tau$ begins the phase), the dynamic Gibbs distribution $p_{t}$ (where $t$ is the iteration index) changes by $O(1)$ multiplicatively, since $w$ changes entrywise by $O(1)$ additively. Thus, the quality of a hint vector deteriorates by at most a constant factor during the phase, so it suffices to give a good hint $q_{\tau}\geq p_{\tau}$ at the phase start.

2. By using access to Proposition 2 at the end of the previous phase, we can efficiently estimate large entries of $p_{\tau}$. More precisely, we sample $\widetilde{O}(k)$ times from $p_{\tau}$, and let the empirical distribution of these samples be $\tilde{q}$. Chernoff bounds show that any large entry $[p_{\tau}]_{j}=\Omega(\frac{1}{k})$ will be accurately reflected in the empirical sample. Hence, we set the hint to

q_{j}=\begin{cases}\tilde{q}_{j}\cdot O(1)&\tilde{q}_{j}=\Omega(\frac{1}{k})\\ \frac{1}{k}\cdot O(1)&\tilde{q}_{j}=O(\frac{1}{k})\end{cases},

for appropriate constants. This yields an improved hint quality of $\rho\approx\frac{n}{k}$, since large entries of the hint sum to at most $O(1)$ (as $\tilde{q}_{j}\approx p_{j}$), and small entries sum to $O(\frac{n}{k})$.

3. We show that a similar strategy of using empirical concentration, combined with a testing variant of Proposition 2, accurately estimates the normalizing factor $Z$, yielding $C=O(1)$.

This strategy yields $\mathcal{T}_{\textup{samp}}=\widetilde{O}(\beta\cdot\sqrt{n/k})$ and $\mathcal{T}_{\textup{update}}=\widetilde{O}(\mathcal{T}_{\textup{samp}}\cdot k\eta)$ (since we amortize $\mathcal{T}_{\textup{update}}$ over the $\approx\frac{1}{\eta}$ iterations of a phase). For the parameter settings of Algorithm 1, optimizing $k$ yields

\max(\mathcal{T}_{\textup{samp}},\mathcal{T}_{\textup{update}})=\widetilde{O}\left(\sqrt{n}\cdot\epsilon^{-\frac{1}{2}}\right).
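As a sanity check on this balancing step, a short calculation under the stated parameter choices: setting $\mathcal{T}_{\textup{update}}\approx\mathcal{T}_{\textup{samp}}\cdot k\eta$ equal to $\mathcal{T}_{\textup{samp}}$ forces $k=\Theta(\eta^{-1})$, whence

\mathcal{T}_{\textup{samp}}=\widetilde{O}\left(\beta\sqrt{\frac{n}{k}}\right)=\widetilde{O}\left(\beta\sqrt{n\eta}\right)=\widetilde{O}\left(\sqrt{n}\cdot\epsilon^{-\frac{1}{2}}\right),\text{ for }\beta=\widetilde{O}(\epsilon^{-1}),\ \eta=\Theta(\epsilon).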

We prove Theorem 3, our improved solution to Problem 1, in Section 4. Ignoring logarithmic factors and assuming $\eta\ll 1$ (as in our setting), Theorem 3 shows we can maintain $\max(\mathcal{T}_{\textup{samp}},\mathcal{T}_{\textup{update}})=\widetilde{O}(\sqrt{n}\cdot T\eta^{1.5})$. For the parameter settings $T=\widetilde{O}(\epsilon^{-2})$, $\eta=\Theta(\epsilon)$ stated in Proposition 1, this indeed equates to $\max(\mathcal{T}_{\textup{samp}},\mathcal{T}_{\textup{update}})=\widetilde{O}(\sqrt{n}\cdot\epsilon^{-\frac{1}{2}})$.

4 Gibbs sampling oracle implementation

In this section, we prove Theorem 3, which gives our solution to Problem 1. To do so, we follow the outline given in Section 3.2, wherein we solve Problem 1 in batches of $\lceil\frac{1}{\eta}\rceil$ iterations, each of which we call a "phase." In Sections 4.1 and 4.2, we only discuss a single phase of Problem 1, consisting of the iterations $\tau+s$ for $s\in[\lceil\frac{1}{\eta}\rceil]$ and some initial iteration $\tau$, assuming certain invariants (stated below) hold at the start of the phase. We give a complete solution to Problem 1 in Section 4.3.

Invariant 1 (Approximate normalization access).

We explicitly have $\widetilde{Z}_{\textup{prev}}$ with $\widetilde{Z}_{\textup{prev}}\in[Z_{\tau},CZ_{\tau}]$ for some $C=O(1)$.

Invariant 2 (Initial sampling maintenance).

We have $\mathcal{O}_{\tau}$ solving Problem 1 in iteration $\tau$.

The remainder of this section is then organized as follows.

• Section 4.1: We show that, assuming Invariants 1 and 2 hold at the start of a phase, we can perform the preprocessing used to construct our hint, consisting of the estimated normalization $\widetilde{Z}$ and vector $q$, in an application of Proposition 2. This gives the cost of $\mathcal{T}_{\textup{samp}}$ in Problem 1.

• Section 4.2: We show that at the conclusion of each phase we can maintain Invariants 1 and 2 for use in the next phase. This gives the cost of $\mathcal{T}_{\textup{update}}$ in Problem 1.

• Section 4.3: We recursively call the subroutine of Sections 4.1 and 4.2 (which solves Problem 1 for all the iterations $\tau+s$ where $s\in[\lceil\frac{1}{\eta}\rceil]$, for some $\tau$) $\approx\eta T$ times to prove Theorem 3.

4.1 Preprocessing and approximate Gibbs oracle implementation

In this section, we show how to construct the "hint" $q$ which will be used throughout a phase (starting in iteration $\tau$) given access to $\mathcal{O}_{\tau}$, and we bound $\rho=\lVert q\rVert_{1}$, which quantifies the quality of our hint, under the assumption that Invariants 1 and 2 hold in the phase. We first show a multiplicative stability property of the relevant Gibbs distributions within a phase.

Lemma 2.

For all $s\in[\lceil\frac{1}{\eta}\rceil]$, we have

Z_{\tau+s}\in\left[\frac{1}{3}Z_{\tau},3Z_{\tau}\right],\text{ and }p_{\tau+s}\in\left[\frac{1}{9}p_{\tau},9p_{\tau}\right]\text{ entrywise.}
Proof.

Let $\nu_{t}:=\exp(\mathbf{A}^{\top}x_{t})$ for all $t$, so that $p_{t}=\frac{\nu_{t}}{Z_{t}}$. For any $j\in[n]$,

\frac{[\nu_{\tau+s}]_{j}}{[\nu_{\tau}]_{j}}=\exp\left(\left[\mathbf{A}^{\top}\left(x_{\tau+s}-x_{\tau}\right)\right]_{j}\right)\in\left[\exp\left(-\left\lVert\mathbf{A}\right\rVert_{\max}\left\lVert x_{\tau+s}-x_{\tau}\right\rVert_{1}\right),\exp\left(\left\lVert\mathbf{A}\right\rVert_{\max}\left\lVert x_{\tau+s}-x_{\tau}\right\rVert_{1}\right)\right]\subseteq\left[\exp(-\eta s),\exp(\eta s)\right]\subseteq\left[\frac{1}{3},3\right].

Similarly, $Z_{\tau+s}\in[\frac{1}{3}Z_{\tau},3Z_{\tau}]$, and combining yields the conclusion. ∎

Next, our computation of the overestimating vector $q$ is parameterized by an integer $k\in[n]$, which is fixed throughout this section and Section 4.2. We simply set $q$ to be an upscaled variant of an empirical distribution of roughly $k$ draws from $\mathcal{O}_{\tau}$.

Lemma 3.

Let $k\in[n]$, $\alpha\in(0,1)$, and suppose $\delta\leq\frac{1}{16k}$. Draw $N=\Theta(k\log\frac{n\eta T}{\alpha})$ samples from $\mathcal{O}_{\tau}$ (for an appropriately large constant), and let $\tilde{q}\in\Delta^{n}$ be the empirical distribution over these $N$ samples. Define $\mathcal{B}:=\{i\in[n]\mid\tilde{q}_{i}\geq\frac{1}{2k}\}$. Then for

q_{j}=\begin{cases}18\tilde{q}_{j}&j\in\mathcal{B}\\ \frac{18}{k}&j\not\in\mathcal{B}\end{cases},

with probability $\geq 1-\frac{\alpha}{2\lceil\eta T\rceil}$, we have $\lVert q\rVert_{1}=O(\frac{n}{k})$ and $q\geq p_{\tau+s}$ entrywise, for all $s\leq\frac{1}{\eta}$.

Proof.

The first conclusion, $\lVert q\rVert_{1}=O(\frac{n}{k})$, is immediate from the definition of $q$, since $\lVert q\rVert_{1}\leq 18\lVert\tilde{q}\rVert_{1}+\frac{18n}{k}$. In light of Lemma 2 (which holds deterministically), to show the second conclusion it suffices to show that, with the desired success probability, we have both

2\tilde{q}_{j}\geq[p_{\tau}]_{j}\text{ for all }j\in\mathcal{B} \quad (2)

and

\frac{2}{k}\geq[p_{\tau}]_{j}\text{ for all }j\not\in\mathcal{B}. \quad (3)

Denote $\alpha^{\prime}:=\frac{\alpha}{2\lceil\eta T\rceil}$ for notational convenience, let $\tilde{p}$ denote the distribution of samples from $\mathcal{O}_{\tau}$, and recall that $\lVert\tilde{p}-p_{\tau}\rVert_{1}\leq\frac{1}{16k}$. Because we take $\Theta(k\log\frac{n}{\alpha^{\prime}})$ samples from $\tilde{p}$, a standard Chernoff bound (union bounding over all coordinates $j\in[n]$) gives that, with probability at least $1-\alpha^{\prime}$, both of the following hold.

1. For all $j\in[n]$ such that $\tilde{p}_{j}\geq\frac{1}{4k}$, we have $\tilde{q}_{j}\geq\frac{2\tilde{p}_{j}}{3}$.

2. For all $j\in[n]$ such that $\tilde{p}_{j}\leq\frac{1}{4k}$, we have $\tilde{q}_{j}\leq\frac{1}{2k}$.

We condition on these events for the remainder of the proof; we now show (2), (3) in turn.

Proof of (2). To see (2), the second event above implies that if $\tilde{p}_{j}\leq\frac{1}{4k}$, then $j\not\in\mathcal{B}$. Hence, for all $j\in\mathcal{B}$, we have $\tilde{q}_{j}\geq\frac{2\tilde{p}_{j}}{3}\geq\frac{[p_{\tau}]_{j}}{2}$, since $\lVert\tilde{p}-p_{\tau}\rVert_{\infty}\leq\frac{1}{16k}\leq\frac{1}{4}\tilde{p}_{j}$ for all $j\in\mathcal{B}$.

Proof of (3). To see (3), suppose for contradiction that $j\not\in\mathcal{B}$ and $[p_{\tau}]_{j}>\frac{2}{k}$. This implies that $\tilde{p}_{j}>\frac{1}{k}$, and hence by the first event above, $\tilde{q}_{j}\geq\frac{1}{2k}$, contradicting $j\not\in\mathcal{B}$. ∎
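The construction of Lemma 3 is short in code (a sketch; the constant inside $N$ is omitted, and `sampler` stands for the oracle $\mathcal{O}_{\tau}$):

```python
import numpy as np

def build_hint(sampler, n, k, T, eta, alpha, rng):
    """Lemma 3: hint q from the empirical distribution of N samples.

    Empirical entries >= 1/(2k) (the set B) are upscaled by 18; the
    rest are set to 18/k, so ||q||_1 <= 18 + 18n/k = O(n/k)."""
    N = int(np.ceil(k * np.log(n * eta * T / alpha)))  # Theta(.), constant omitted
    counts = np.bincount([sampler(rng) for _ in range(N)], minlength=n)
    q_emp = counts / N
    return np.where(q_emp >= 1 / (2 * k), 18 * q_emp, 18 / k)
```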

Corollary 1.

Assume that Invariants 1 and 2 hold for the phase consisting of iterations $\tau+s$, $s\in[\lceil\frac{1}{\eta}\rceil]$. We can solve Problem 1 for the phase with probability $\geq 1-\frac{\alpha}{2\lceil\eta T\rceil}$, and with

\mathcal{T}_{\textup{samp}}:=O\left(\sqrt{\frac{n}{k}}\cdot T\eta\log^{4}\left(\frac{mn}{\delta}\right)\right).
Proof.

We run the algorithm described in the proof of Lemma 3, and condition on its success, which gives the failure probability. It then suffices to apply Proposition 2 with the $q$ defined in Lemma 3. For this $q$, we parameterize Proposition 2 with $C=O(1)$ (see Invariant 1), $\rho=O(\frac{n}{k})$ (see Lemma 3), and $\beta=T\eta$. It is clear that the lower bound on entries of $q$ in Proposition 2 holds. ∎

4.2 Maintaining invariants

We now show how to maintain Invariant 1 at iteration $\tau^{\prime}:=\tau+\lceil\frac{1}{\eta}\rceil$, for use in the next phase, and we bound the cost of doing so. We note that Invariant 2 follows immediately from our construction in Corollary 1. First, by combining Lemma 2 with Invariant 1,

Z_{\tau^{\prime}}\in\left[\frac{\widetilde{Z}_{\textup{prev}}}{3C},3\widetilde{Z}_{\textup{prev}}\right]. \quad (4)

This suggests that we may use $\widetilde{Z}=3\widetilde{Z}_{\textup{prev}}$ for the next phase; however, doing so repeatedly would lead to an exponential blowup in the multiplicative range $C$. To sidestep this, we develop a tester for a hidden parameter governing a success probability, which will be used to give a refined estimate $\widetilde{Z}$. We require the following corollary of Proposition 2, whose proof we defer to Appendix B.

Corollary 2.

Following the notation of Proposition 2, let $R:=\frac{\widetilde{Z}}{Z}$. There is a quantum oracle $\mathcal{O}_{\textup{test}}$ which can be implemented under $T$ $\mathsf{Update}$ calls to $x$ in $O(T\log m)$ time, and which has query cost

O\left(\sqrt{\rho C}\cdot\beta\log^{4}\left(\frac{Cmn}{\delta}\right)\right).

Furthermore, for explicitly known constants $C_{\ell}$ and $C_{u}$, $\mathcal{O}_{\textup{test}}$ returns "success" with probability $p$ satisfying

\frac{C_{\ell}}{\sqrt{R\rho}}\leq p\leq\frac{C_{u}}{\sqrt{R\rho}}.

Corollary 2 differs from Proposition 2 in that it returns a Boolean-valued answer (as opposed to a sample from an approximate Gibbs distribution), and has a success probability parameterized by explicit constants. We now show how to use Corollary 2 to maintain Invariant 1.

Lemma 4.

Assume Invariants 12 hold for the phase consisting of iterations τ+s\tau+s, s[1η]s\in[\lceil\frac{1}{\eta}\rceil], and suppose C4Cu2C2C\geq\frac{4C_{u}^{2}}{C_{\ell}^{2}} for C=O(1)C=O(1), where CuC_{u} and CC_{\ell} are the constants from Corollary 2. Further, suppose we have obtained qq satisfying the conclusion of Lemma 3 (i.e. that the algorithm in Lemma 3 succeeded). We can determine Z~\widetilde{Z} such that Z~[Zτ,CZτ]\widetilde{Z}\in[Z_{\tau^{\prime}},CZ_{\tau^{\prime}}] with probability 1α2ηT\geq 1-\frac{\alpha}{2\lceil\eta T\rceil}, in time

O(nkTηlog4(mnδ)log(ηTα)).O\left(\sqrt{\frac{n}{k}}\cdot T\eta\log^{4}\left(\frac{mn}{\delta}\right)\log\left(\frac{\eta T}{\alpha}\right)\right).
Proof.

Define Z~0:=3Z~prev\widetilde{Z}_{0}:=3\widetilde{Z}_{\textup{prev}}, R0:=Z~0ZτR_{0}:=\frac{\widetilde{Z}_{0}}{Z_{\tau^{\prime}}}, and note that Z~0[Zτ,9CZτ]\widetilde{Z}_{0}\in[Z_{\tau^{\prime}},9CZ_{\tau^{\prime}}] by Invariant 1 and Lemma 2. Next, assuming the success of Lemma 3, we have that the success probability pp of 𝒪test\mathcal{O}_{\textup{test}} from Corollary 2 using the estimate Z~0\widetilde{Z}_{0} satisfies (for the unknown R0[1,9C]R_{0}\in[1,9C], and known C,Cu,ρC_{\ell},C_{u},\rho)

CR0ρpCuR0ρ.\frac{C_{\ell}}{\sqrt{R_{0}\rho}}\leq p\leq\frac{C_{u}}{\sqrt{R_{0}\rho}}.

For N:=27log4ηTα3CρCN:=27\log\frac{4\lceil\eta T\rceil}{\alpha}\cdot\frac{3\sqrt{C\rho}}{C_{\ell}}, we first run 𝒪test\mathcal{O}_{\textup{test}} NN times and check the number of successes, denoted by SS, which fits within the runtime budget by Corollary 2. By a Chernoff bound, we have that with probability 1α2ηT\geq 1-\frac{\alpha}{2\lceil\eta T\rceil}, we have

54log4ηTαCR023pNS43pN108log4ηTαCuCCR0.54\log\frac{4\lceil\eta T\rceil}{\alpha}\cdot\sqrt{\frac{C}{R_{0}}}\leq\frac{2}{3}pN\leq S\leq\frac{4}{3}pN\leq 108\log\frac{4\lceil\eta T\rceil}{\alpha}\cdot\frac{C_{u}}{C_{\ell}}\cdot\sqrt{\frac{C}{R_{0}}}.

Hence, S determines \sqrt{\frac{C}{R_{0}}} up to a multiplicative factor of \frac{2C_{u}}{C_{\ell}}, and thus determines R_{0} up to a multiplicative factor of \frac{4C_{u}^{2}}{C_{\ell}^{2}}\leq C, which implies the same multiplicative approximation factor for Z_{\tau^{\prime}}, as desired. ∎
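For intuition, the following is a minimal classical sketch of this estimation procedure, where test() is a hypothetical stand-in for \mathcal{O}_{\textup{test}} and the constants mirror the proof; it is illustrative only, not the quantum implementation.

import math

def estimate_R0(test, rho, C_l, C_u, C, num_phases, alpha):
    # `test` is a hypothetical stand-in for O_test: it returns True with
    # unknown probability p in [C_l / sqrt(R0 * rho), C_u / sqrt(R0 * rho)]
    # for some R0 in [1, 9C].
    log_term = math.log(4 * num_phases / alpha)
    N = math.ceil(27 * log_term * 3 * math.sqrt(C * rho) / C_l)
    S = sum(1 for _ in range(N) if test())
    # With probability >= 1 - alpha / (2 * num_phases), the Chernoff bound
    # places S in [2/3, 4/3] * p * N, so S / (54 * log_term) lies in
    # [sqrt(C / R0), (2 * C_u / C_l) * sqrt(C / R0)].
    sqrt_C_over_R0 = S / (54 * log_term)
    # Inverting, the returned estimate is within a multiplicative factor
    # 4 * C_u**2 / C_l**2 <= C of the true R0, as in the proof of Lemma 4.
    return C / sqrt_C_over_R0 ** 2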

4.3 Proof of Theorem 3

See Theorem 3

Proof.

We first claim that for any k[n]k\in[n], we can solve Problem 1 with probability 1α\geq 1-\alpha and

𝒯samp\displaystyle\mathcal{T}_{\textup{samp}} =O(nkTηlog4(mnδ)),\displaystyle=O\left(\sqrt{\frac{n}{k}}\cdot T\eta\log^{4}\left(\frac{mn}{\delta}\right)\right),
𝒯update\displaystyle\mathcal{T}_{\textup{update}} =O((nkTηlog4(mnδ))kηlog(nηTα)).\displaystyle=O\left(\left(\sqrt{\frac{n}{k}}\cdot T\eta\log^{4}\left(\frac{mn}{\delta}\right)\right)\cdot k\eta\log\left(\frac{n\eta T}{\alpha}\right)\right).

This follows from combining Lemma 3 (amortized over \lceil\frac{1}{\eta}\rceil iterations), Corollary 1, and Lemma 4, and taking a union bound over at most \lceil\eta T\rceil phases. Here we note that the O(\log m) cost per iteration of supporting \mathsf{Update} calls to x in Lemma 1, Proposition 2, and Corollary 2 is not dominant. By choosing k=\Theta(\max(1,(\eta\log\frac{mn}{\alpha\epsilon})^{-1})), we balance the costs of \mathcal{T}_{\textup{samp}} and \mathcal{T}_{\textup{update}}, yielding the conclusion. Finally, we note that by picking an appropriate constant in the definition of k, we have \delta\leq\eta\implies\delta\leq\frac{1}{16k}, as required by Lemma 3, the only component specifying a bound on \delta. ∎

Acknowledgments

We thank András Gilyén for communication regarding the prior work [vAG19]. AB was supported in part by the DOE QuantISED grant DE-SC0020360, by the AFOSR under grant FA9550-21-1-0392, and by the U.S. DOE Office of Science under Award Number DE-SC0020266. YG was supported in part by the Stanford MS&E DE&I Research program. YJ was supported in part by a Stanford Graduate Fellowship and a Danzig-Lieberman Graduate Fellowship. AS was supported in part by a Microsoft Research Faculty Fellowship, NSF CAREER Award CCF1844855, NSF Grant CCF-1955039, a PayPal research award, and a Sloan Research Fellowship. KT thanks Ewin Tang for her expertise on quantum linear algebra and for fielding many of our questions.

References

  • [Aar15] Scott Aaronson. Read the fine print. Nature Physics, 11(4):291–293, 2015.
  • [AK07] Sanjeev Arora and Satyen Kale. A combinatorial, primal-dual approach to semidefinite programs. In Proceedings of the thirty-ninth annual ACM symposium on Theory of computing, pages 227–236, 2007.
  • [AL17] Zeyuan Allen-Zhu and Yuanzhi Li. Follow the compressed leader: Faster online learning of eigenvectors and faster MMWU. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, volume 70 of Proceedings of Machine Learning Research, pages 116–125. PMLR, 2017.
  • [Alm21] Josh Alman. Limits on the universal method for matrix multiplication. Theory Comput., 17:1–30, 2021.
  • [AW21] Josh Alman and Virginia Vassilevska Williams. A refined laser method and faster matrix multiplication. In Dániel Marx, editor, Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms, SODA 2021, Virtual Conference, January 10 - 13, 2021, pages 522–539. SIAM, 2021.
  • [BHMT02] Gilles Brassard, Peter Høyer, Michele Mosca, and Alain Tapp. Quantum amplitude amplification and estimation. Quantum Computation and Quantum Information, 305:53–74, 2002.
  • [BS17] Fernando GSL Brandao and Krysta M Svore. Quantum speed-ups for solving semidefinite programs. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 415–426. IEEE, 2017.
  • [Bub15] Sébastien Bubeck. Convex optimization: Algorithms and complexity. Foundations and Trends in Machine Learning, 8(3-4):231–357, 2015.
  • [BWP+17] Jacob Biamonte, Peter Wittek, Nicola Pancotti, Patrick Rebentrost, Nathan Wiebe, and Seth Lloyd. Quantum machine learning. Nature, 549(7671):195–202, 2017.
  • [CCLW20] Shouvanik Chakrabarti, Andrew M Childs, Tongyang Li, and Xiaodi Wu. Quantum algorithms and lower bounds for convex optimization. Quantum, 4:221, 2020.
  • [CDST19] Yair Carmon, John C. Duchi, Aaron Sidford, and Kevin Tian. A rank-1 sketch for matrix multiplicative weights. In Alina Beygelzimer and Daniel Hsu, editors, Conference on Learning Theory, COLT 2019, 25-28 June 2019, Phoenix, AZ, USA, volume 99 of Proceedings of Machine Learning Research, pages 589–623. PMLR, 2019.
  • [CGL+20] Nai-Hui Chia, András Gilyén, Tongyang Li, Han-Hsuan Lin, Ewin Tang, and Chunhao Wang. Sampling-based sublinear low-rank matrix arithmetic framework for dequantizing quantum machine learning. In Proceedings of the 52nd Annual ACM SIGACT symposium on theory of computing, pages 387–400, 2020.
  • [CJST19] Yair Carmon, Yujia Jin, Aaron Sidford, and Kevin Tian. Variance reduction for matrix games. In Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d’Alché-Buc, Emily B. Fox, and Roman Garnett, editors, Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pages 11377–11388, 2019.
  • [CJST20] Yair Carmon, Yujia Jin, Aaron Sidford, and Kevin Tian. Coordinate methods for matrix games. In Sandy Irani, editor, 61st IEEE Annual Symposium on Foundations of Computer Science, FOCS 2020, Durham, NC, USA, November 16-19, 2020, pages 283–293. IEEE, 2020.
  • [CLS21] Michael B Cohen, Yin Tat Lee, and Zhao Song. Solving linear programs in the current matrix multiplication time. Journal of the ACM (JACM), 68(1):1–39, 2021.
  • [DH96] Christoph Dürr and Peter Høyer. A quantum algorithm for finding the minimum. CoRR, quant-ph/9607014, 1996.
  • [DNV20] Daniel Dadush, Bento Natura, and László A. Végh. Revisiting Tardos's framework for linear programming: faster exact solutions using approximate solvers. In Sandy Irani, editor, 61st IEEE Annual Symposium on Foundations of Computer Science, FOCS 2020, Durham, NC, USA, November 16-19, 2020, pages 931–942. IEEE, 2020.
  • [GAW19] András Gilyén, Srinivasan Arunachalam, and Nathan Wiebe. Optimizing quantum optimization algorithms via faster quantum gradient computation. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1425–1444. SIAM, 2019.
  • [GK95] Michael D. Grigoriadis and Leonid G. Khachiyan. A sublinear-time randomized approximation algorithm for matrix games. Operations Research Letters, 18(2):53–58, 1995.
  • [GLG22] Sevag Gharibian and François Le Gall. Dequantizing the quantum singular value transformation: Hardness and applications to quantum chemistry and the quantum PCP conjecture. In Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, pages 19–32, 2022.
  • [GR02] Lov Grover and Terry Rudolph. Creating superpositions that correspond to efficiently integrable probability distributions. CoRR, abs/quant-ph/0208112, 2002.
  • [GSLW19] András Gilyén, Yuan Su, Guang Hao Low, and Nathan Wiebe. Quantum singular value transformation and beyond: exponential improvements for quantum matrix arithmetics. In Moses Charikar and Edith Cohen, editors, Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, Phoenix, AZ, USA, June 23-26, 2019, pages 193–204. ACM, 2019.
  • [Haa19] Jeongwan Haah. Product decomposition of periodic functions in quantum signal processing. Quantum, 3:190, 2019.
  • [HHL09] Aram W Harrow, Avinatan Hassidim, and Seth Lloyd. Quantum algorithm for linear systems of equations. Physical review letters, 103(15):150502, 2009.
  • [Jor05] Stephen P Jordan. Fast quantum algorithm for numerical gradient estimation. Physical review letters, 95(5):050501, 2005.
  • [JSWZ21] Shunhua Jiang, Zhao Song, Omri Weinstein, and Hengjie Zhang. A faster algorithm for solving general LPs. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2021, 2021, pages 823–832, 2021.
  • [Kar84] Narendra Karmarkar. A new polynomial-time algorithm for linear programming. In Proceedings of the sixteenth annual ACM symposium on Theory of computing, pages 302–311, 1984.
  • [KP16] Iordanis Kerenidis and Anupam Prakash. Quantum recommendation systems. arXiv preprint arXiv:1603.08675, 2016.
  • [KP20] Iordanis Kerenidis and Anupam Prakash. A quantum interior point method for LPs and SDPs. ACM Transactions on Quantum Computing, 1(1):1–32, 2020.
  • [LCW19] Tongyang Li, Shouvanik Chakrabarti, and Xiaodi Wu. Sublinear quantum algorithms for training linear and kernel-based classifiers. In International Conference on Machine Learning, pages 3815–3824. PMLR, 2019.
  • [LMR14] Seth Lloyd, Masoud Mohseni, and Patrick Rebentrost. Quantum principal component analysis. Nature Physics, 10(9):631–633, 2014.
  • [LS14] Yin Tat Lee and Aaron Sidford. Path finding methods for linear programming: Solving linear programs in O(\sqrt{\textup{rank}}) iterations and faster algorithms for maximum flow. In 2014 IEEE 55th Annual Symposium on Foundations of Computer Science, pages 424–433. IEEE, 2014.
  • [LS19] Yin Tat Lee and Aaron Sidford. Solving linear programs with \sqrt{\textup{rank}} linear system solves. arXiv preprint arXiv:1910.08033, 2019.
  • [LWCW21] Tongyang Li, Chunhao Wang, Shouvanik Chakrabarti, and Xiaodi Wu. Sublinear classical and quantum algorithms for general matrix games. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 8465–8473, 2021.
  • [Nem04] Arkadi Nemirovski. Prox-method with rate of convergence O(1/t){O}(1/t) for variational inequalities with lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM Journal on Optimization, 15(1):229–251, 2004.
  • [Nes07] Yurii Nesterov. Dual extrapolation and its applications to solving variational inequalities and related problems. Mathematical Programming, 109(2-3):319–344, 2007.
  • [Neu28] John von Neumann. Zur Theorie der Gesellschaftsspiele. Mathematische Annalen, 100:295–320, 1928.
  • [NJLS09] Arkadi Nemirovski, Anatoli B. Juditsky, Guanghui Lan, and Alexander Shapiro. Robust stochastic approximation approach to stochastic programming. SIAM J. Optim., 19(4):1574–1609, 2009.
  • [Ren88] James Renegar. A polynomial-time algorithm, based on newton’s method, for linear programming. Mathematical programming, 40(1):59–93, 1988.
  • [RML14] Patrick Rebentrost, Masoud Mohseni, and Seth Lloyd. Quantum support vector machine for big data classification. Physical review letters, 113(13):130503, 2014.
  • [SV14] Sushant Sachdeva and Nisheeth K. Vishnoi. Faster algorithms via approximation theory. Found. Trends Theor. Comput. Sci., 9(2):125–210, 2014.
  • [Tan19] Ewin Tang. A quantum-inspired classical algorithm for recommendation systems. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 217–228, 2019.
  • [vAG19] Joran van Apeldoorn and András Gilyén. Quantum algorithms for zero-sum games. CoRR, abs/1904.03180, 2019.
  • [vAGGdW20a] Joran van Apeldoorn, András Gilyén, Sander Gribling, and Ronald de Wolf. Convex optimization using quantum oracles. Quantum, 4:220, 2020.
  • [vAGGdW20b] Joran van Apeldoorn, András Gilyén, Sander Gribling, and Ronald de Wolf. Quantum SDP-solvers: Better upper and lower bounds. Quantum, 4:230, 2020.
  • [vdB20] Jan van den Brand. A deterministic linear program solver in current matrix multiplication time. In Proceedings of the Thirty-first Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, 2020, pages 259–278, 2020.
  • [vdBLL+21] Jan van den Brand, Yin Tat Lee, Yang P. Liu, Thatchaphol Saranurak, Aaron Sidford, Zhao Song, and Di Wang. Minimum cost flows, mdps, and 1\ell_{1}-regression in nearly linear time for dense instances. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2021, 2021, pages 859–869, 2021.
  • [vdBLSS20] Jan van den Brand, Yin Tat Lee, Aaron Sidford, and Zhao Song. Solving tall dense linear programs in nearly linear time. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, pages 775–788, 2020.

Appendix A Solving matrix games with a Gibbs sampling oracle

In this section, we prove Proposition 1, which shows how to solve a zero-sum matrix game using an approximate Gibbs sampling oracle (via Algorithm 1). To briefly motivate the algorithm we use and our proof of its guarantees, we recall the problem we consider is of the form

minvΔnmaxuΔmf(u,v):=u𝐀v,where𝐀max1,\begin{gathered}\min_{v\in\Delta^{n}}\max_{u\in\Delta^{m}}f(u,v):=u^{\top}\mathbf{A}v,\leavevmode\nobreak\ \leavevmode\nobreak\ \text{where}\leavevmode\nobreak\ \leavevmode\nobreak\ \left\lVert\mathbf{A}\right\rVert_{\max}\leq 1,\end{gathered} (5)

and we define the associated gradient operator as

g(u,v)=(𝐀v,𝐀u).\displaystyle g(u,v)=(-\mathbf{A}v,\mathbf{A}^{\top}u). (6)

Taking (stochastic) mirror descent steps using the gradient operator (6) is well-known to yield an approximate NE to the matrix game (5) [Bub15]. We show that an approximate implementation of this strategy, combined with appropriate subsampling, efficiently yields an approximate NE; a classical sketch of the strategy follows.
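The sketch below is a dense classical reference implementation consistent with the iterate definitions in Proposition 3, assuming exact Gibbs samples (\delta=0) and materializing u_{t}, v_{t} explicitly at O(mn) cost per iteration; the point of this paper is precisely to avoid that cost.

import numpy as np

def sampled_mirror_descent(A, eps, rng=np.random.default_rng(0)):
    # Maintains x_t, y_t so that u_t ∝ exp(A y_t) and v_t ∝ exp(-A^T x_t),
    # as in Proposition 3, with exact sampling in place of a Gibbs oracle.
    m, n = A.shape
    eta = eps / 15
    T = int(np.ceil(6 * np.log(m * n) / (eta * eps)))
    x, y = np.zeros(m), np.zeros(n)
    u_bar, v_bar = np.zeros(m), np.zeros(n)
    for _ in range(T):
        u = np.exp(A @ y); u /= u.sum()       # u_t, row player's Gibbs iterate
        v = np.exp(-A.T @ x); v /= v.sum()    # v_t, column player's Gibbs iterate
        u_bar += u / T
        v_bar += v / T
        i = rng.choice(m, p=u)                # i_t ~ u_t
        j = rng.choice(n, p=v)                # j_t ~ v_t
        x[i] += eta                           # so -A^T x gains -eta * A_{i_t,:}
        y[j] += eta                           # so A y gains eta * A_{:,j_t}
    return u_bar, v_bar                       # the averages of Proposition 3

We begin by making the following observation.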

Lemma 5.

Let u,u~Δmu,\tilde{u}\in\Delta^{m} have uu~1δ\left\lVert u-\tilde{u}\right\rVert_{1}\leq\delta. Let g~:=𝐀i:\tilde{g}:=\mathbf{A}_{i:} where iu~i\sim\tilde{u}, and g:=𝐀ug:=\mathbf{A}^{\top}u. Then, g𝔼g~δ\left\lVert g-\mathbb{E}\tilde{g}\right\rVert_{\infty}\leq\delta.

Proof.

Note that 𝔼g~=𝐀u~\mathbb{E}\tilde{g}=\mathbf{A}^{\top}\tilde{u}, and 𝐀(uu~)uu~1δ\left\lVert\mathbf{A}^{\top}(u-\tilde{u})\right\rVert_{\infty}\leq\left\lVert u-\tilde{u}\right\rVert_{1}\leq\delta since 𝐀max1\left\lVert\mathbf{A}\right\rVert_{\max}\leq 1. ∎

We next present a variant of the classical mirror descent analysis, which bounds the expected approximation quality of iterates of Algorithm 1 prior to subsampling.

Proposition 3.

Let δϵ20\delta\leq\frac{\epsilon}{20}, η=ϵ15\eta=\frac{\epsilon}{15} and T6log(mn)ηϵT\geq\frac{6\log(mn)}{\eta\epsilon} in Algorithm 1. Let the iterates of Algorithm 1 be {xt,yt}t=0T1\{x_{t},y_{t}\}_{t=0}^{T-1}, and denote ut:=exp(𝐀yt)exp(𝐀yt)1u_{t}:=\frac{\exp(\mathbf{A}y_{t})}{\left\lVert\exp(\mathbf{A}y_{t})\right\rVert_{1}}, vt:=exp(𝐀xt)exp(𝐀xt)1v_{t}:=\frac{\exp(-\mathbf{A}^{\top}x_{t})}{\left\lVert\exp(-\mathbf{A}^{\top}x_{t})\right\rVert_{1}} for all 0t<T0\leq t<T. For (u¯,v¯):=1Tt=0T1(ut,vt)(\bar{u},\bar{v}):=\frac{1}{T}\sum_{t=0}^{T-1}(u_{t},v_{t}), we have

𝔼[maxuΔmu𝐀v¯minvΔnu¯𝐀v]ϵ.\displaystyle\mathbb{E}\left[\max_{u\in\Delta^{m}}u^{\top}\mathbf{A}\bar{v}-\min_{v\in\Delta^{n}}\bar{u}^{\top}\mathbf{A}v\right]\leq\epsilon. (7)
Proof.

By definition of the updates, at every iteration 0tT10\leq t\leq T-1, we have

ut+1\displaystyle u_{t+1} =argminuΔm{η𝐀:jt,u+i[m][u]ilog[u]i[ut]i},\displaystyle=\textup{argmin}_{u\in\Delta^{m}}\left\{\eta\langle-\mathbf{A}_{:j_{t}},u\rangle+\sum_{i\in[m]}[u]_{i}\log\frac{[u]_{i}}{[u_{t}]_{i}}\right\},
vt+1\displaystyle v_{t+1} =argminvΔn{η𝐀it:,v+j[n][v]jlog[v]j[vt]j}.\displaystyle=\textup{argmin}_{v\in\Delta^{n}}\left\{\eta\langle\mathbf{A}_{i_{t}:},v\rangle+\sum_{j\in[n]}[v]_{j}\log\frac{[v]_{j}}{[v_{t}]_{j}}\right\}.

Consequently, by the optimality conditions of ut+1u_{t+1} and vt+1v_{t+1} respectively, we have for any uΔmu\in\Delta^{m}, vΔnv\in\Delta^{n}, and letting Vx(x):=k[x]klog[x]k[x]kV_{x}(x^{\prime}):=\sum_{k}[x^{\prime}]_{k}\log\frac{[x^{\prime}]_{k}}{[x]_{k}} be the KL divergence between simplex variables of appropriate dimension,

𝐀:j,utu+𝐀i:,vtv\displaystyle\left\langle-\mathbf{A}_{:j},u_{t}-u\right\rangle+\left\langle\mathbf{A}_{i:},v_{t}-v\right\rangle 1η(Vut(u)Vut+1(u)+Vvt(v)Vvt+1(v))\displaystyle\leq\frac{1}{\eta}\left(V_{u_{t}}(u)-V_{u_{t+1}}(u)+V_{v_{t}}(v)-V_{v_{t+1}}(v)\right) (8)
+(𝐀:j,utut+11ηVut(ut+1))\displaystyle+\left(\left\langle-\mathbf{A}_{:j},u_{t}-u_{t+1}\right\rangle-\frac{1}{\eta}V_{u_{t}}(u_{t+1})\right)
+(𝐀i:,vtvt+11ηVvt(vt+1))\displaystyle+\left(\left\langle\mathbf{A}_{i:},v_{t}-v_{t+1}\right\rangle-\frac{1}{\eta}V_{v_{t}}(v_{t+1})\right)
1η(Vut(u)Vut+1(u)+Vvt(v)Vvt+1(v))\displaystyle\leq\frac{1}{\eta}\left(V_{u_{t}}(u)-V_{u_{t+1}}(u)+V_{v_{t}}(v)-V_{v_{t+1}}(v)\right)
+η2𝐀:j2+η2𝐀i:2,\displaystyle+\frac{\eta}{2}\left\lVert\mathbf{A}_{:j}\right\rVert_{\infty}^{2}+\frac{\eta}{2}\left\lVert\mathbf{A}_{i:}\right\rVert_{\infty}^{2},

where for the last inequality we use Hölder’s inequality and the fact that VV is 11-strongly convex in the 1\ell_{1} norm (by Pinsker’s inequality). Averaging the above for 0t<T0\leq t<T, and denoting wt:=(ut,vt)w_{t}:=(u_{t},v_{t}) and g~t:=(𝐀:jt,𝐀it:)\tilde{g}_{t}:=(-\mathbf{A}_{:j_{t}},\mathbf{A}_{i_{t}:}), we obtain for any w=(u,v)Δm×Δnw=(u,v)\in\Delta^{m}\times\Delta^{n},

1Tt=0T1g~t,wtw1ηT(Vu0(u)+Vv0(v))+η.\frac{1}{T}\sum_{t=0}^{T-1}\langle\tilde{g}_{t},w_{t}-w\rangle\leq\frac{1}{\eta T}\left(V_{u_{0}}(u)+V_{v_{0}}(v)\right)+\eta. (9)

In the above, we further recalled the bound 𝐀max1\left\lVert\mathbf{A}\right\rVert_{\max}\leq 1 by assumption. In order to bound the deviation of the left-hand side from its expectation, we use a “ghost iterate” argument following [NJLS09, CJST19]. In particular, we define iterates u~t\tilde{u}_{t}, v~t\tilde{v}_{t} as follows: let u~0u0\tilde{u}_{0}\leftarrow u_{0}, v~0v0\tilde{v}_{0}\leftarrow v_{0}, and then for each 0t<T0\leq t<T, define

\tilde{u}_{t+1} :=\textup{argmin}_{u\in\Delta^{m}}\left\{\eta\langle-\mathbf{A}v_{t}+\mathbf{A}_{:j_{t}},u\rangle+\sum_{i\in[m]}[u]_{i}\log\frac{[u]_{i}}{[\tilde{u}_{t}]_{i}}\right\},
\tilde{v}_{t+1} :=\textup{argmin}_{v\in\Delta^{n}}\left\{\eta\langle\mathbf{A}^{\top}u_{t}-\mathbf{A}_{i_{t}:},v\rangle+\sum_{j\in[n]}[v]_{j}\log\frac{[v]_{j}}{[\tilde{v}_{t}]_{j}}\right\},

where i_{t},j_{t} above are the same coordinates as were used in defining the updates to u_{t+1} and v_{t+1}. By an analogous bound to (8), where we note \left\lVert\mathbf{A}_{:j_{t}}-\mathbf{A}v_{t}\right\rVert_{\infty},\left\lVert\mathbf{A}^{\top}u_{t}-\mathbf{A}_{i_{t}:}\right\rVert_{\infty}\leq 2,

\left\langle-\mathbf{A}v_{t}+\mathbf{A}_{:j_{t}},\tilde{u}_{t}-u\right\rangle+\left\langle\mathbf{A}^{\top}u_{t}-\mathbf{A}_{i_{t}:},\tilde{v}_{t}-v\right\rangle \leq\frac{1}{\eta}\left(V_{\tilde{u}_{t}}(u)-V_{\tilde{u}_{t+1}}(u)+V_{\tilde{v}_{t}}(v)-V_{\tilde{v}_{t+1}}(v)\right)+4\eta.

Averaging the above for 0\leq t<T, and denoting \tilde{w}_{t}:=(\tilde{u}_{t},\tilde{v}_{t}) and g_{t}:=g(w_{t}) (see (6)), we obtain for any w=(u,v)\in\Delta^{m}\times\Delta^{n},

\frac{1}{T}\sum_{t=0}^{T-1}\langle g_{t}-\tilde{g}_{t},\tilde{w}_{t}-w\rangle\leq\frac{1}{\eta T}\left(V_{u_{0}}(u)+V_{v_{0}}(v)\right)+4\eta. (10)

Summing inequalities (9) and (10), and maximizing over w=(u,v)Δm×Δnw=(u,v)\in\Delta^{m}\times\Delta^{n}, we have

\max_{w\in\Delta^{m}\times\Delta^{n}}\frac{1}{T}\sum_{t=0}^{T-1}\langle g_{t},w_{t}-w\rangle \leq\max_{u\in\Delta^{m},v\in\Delta^{n}}\frac{2}{\eta T}\left(V_{u_{0}}(u)+V_{v_{0}}(v)\right) (11)
+5\eta+\frac{1}{T}\sum_{t=0}^{T-1}\langle g_{t}-\tilde{g}_{t},w_{t}-\tilde{w}_{t}\rangle.

Taking expectations over the above, we have

\mathbb{E}\left[\max_{w\in\Delta^{m}\times\Delta^{n}}\frac{1}{T}\sum_{t=0}^{T-1}\langle g_{t},w_{t}-w\rangle\right] \leq\max_{u\in\Delta^{m},v\in\Delta^{n}}\frac{2}{\eta T}\left[V_{u_{0}}(u)+V_{v_{0}}(v)\right]
+5\eta+\mathbb{E}\left[\frac{1}{T}\sum_{t=0}^{T-1}\langle g_{t}-\tilde{g}_{t},w_{t}-\tilde{w}_{t}\rangle\right]
\stackrel{(i)}{\leq}\frac{2\log(mn)}{\eta T}+5\eta+\frac{1}{T}\sum_{t=0}^{T-1}\langle g_{t}-\mathbb{E}\tilde{g}_{t},w_{t}-\tilde{w}_{t}\rangle,
\stackrel{(ii)}{\leq}\frac{2\log(mn)}{\eta T}+5\eta+4\delta\stackrel{(iii)}{\leq}\epsilon.

In the above, (i) used the diameter bound of the KL divergence from the uniform distribution, i.e. \max_{u\in\Delta^{m}}V_{u_{0}}(u)=\log m (and a similar bound for V_{v_{0}}(v)). Further, (ii) uses that conditioned on the randomness through iteration t, w_{t} and \tilde{w}_{t} are fixed while \left\lVert g_{t}-\mathbb{E}\tilde{g}_{t}\right\rVert_{\infty}\leq\delta by the approximate Gibbs sampler guarantee (via Lemma 5), together with Hölder's inequality and \left\lVert w_{t}-\tilde{w}_{t}\right\rVert_{1}\leq 4. Finally, (iii) uses our choices of T, \eta, and \delta.

Finally, we note that the desired claim follows by linearity: for any w=(u,v)w=(u,v),

1Tt=0T1gt,wtw\displaystyle\frac{1}{T}\sum_{t=0}^{T-1}\langle g_{t},w_{t}-w\rangle =g(1Tt=0T1wt),1Tt=0T1wtw\displaystyle=\left\langle g\left(\frac{1}{T}\sum_{t=0}^{T-1}w_{t}\right),\frac{1}{T}\sum_{t=0}^{T-1}w_{t}-w\right\rangle
=u𝐀v¯u¯𝐀v.\displaystyle=u^{\top}\mathbf{A}\bar{v}-\bar{u}^{\top}\mathbf{A}v.

By using a simple martingale argument (inspired by those in [AL17, CDST19]) to bound the error term in (11), we show that the guarantee of Proposition 3 holds with high probability.

Corollary 3.

Let α(0,1)\alpha\in(0,1), and let δϵ20\delta\leq\frac{\epsilon}{20}, η=ϵ20\eta=\frac{\epsilon}{20} and T8log(mn)ηϵ+2048log1αϵ2T\geq\frac{8\log(mn)}{\eta\epsilon}+\frac{2048\log\frac{1}{\alpha}}{\epsilon^{2}} in Algorithm 1. Then with probability at least 1α1-\alpha, following notation of Proposition 3, (u¯,v¯)(\bar{u},\bar{v}) are an ϵ\epsilon-approximate NE for 𝐀\mathbf{A}.

Proof.

Consider the filtration given by \mathcal{F}_{t}=\sigma(u_{0},v_{0},\tilde{g}_{0},\cdots,\tilde{g}_{t},u_{t+1},v_{t+1}). We will bound the term \sum_{t=0}^{T-1}\langle g_{t}-\tilde{g}_{t},w_{t}-\tilde{w}_{t}\rangle in (11). To do so, we define a martingale difference sequence of the form D_{t}:=\langle g_{t}-\tilde{g}_{t},w_{t}-\tilde{w}_{t}\rangle-\langle g_{t}-\mathbb{E}\left[\tilde{g}_{t}|\mathcal{F}_{t-1}\right],w_{t}-\tilde{w}_{t}\rangle, which is adapted to the filtration \mathcal{F}_{t}. We first note that |D_{t}|\leq\left\lVert\tilde{g}_{t}-\mathbb{E}\left[\tilde{g}_{t}|\mathcal{F}_{t-1}\right]\right\rVert_{\infty}\left\lVert w_{t}-\tilde{w}_{t}\right\rVert_{1}\leq 8 with probability 1. Consequently, applying the Azuma-Hoeffding inequality yields

t=0T1Dt128Tlog1αwith probability1α.\displaystyle\sum_{t=0}^{T-1}D_{t}\leq\sqrt{128T\log\frac{1}{\alpha}}\leavevmode\nobreak\ \text{with probability}\geq 1-\alpha.

Plugging this back into (11) and using the KL divergence range bound, Lemma 5 with our definition of 𝒪gibbs\mathcal{O}^{\textup{gibbs}}, and choices of parameters, we thus have with probability 1α1-\alpha,

maxwΔm×Δn1Tt=0T1gt,wtw2logmnηT+5η+4δ+128log1αTϵ.\displaystyle\max_{w\in\Delta^{m}\times\Delta^{n}}\frac{1}{T}\sum_{t=0}^{T-1}\langle g_{t},w_{t}-w\rangle\leq\frac{2\log mn}{\eta T}+5\eta+4\delta+\sqrt{\frac{128\log\frac{1}{\alpha}}{T}}\leq\epsilon. (12)

The remainder of the proof follows analogously to Proposition 3. ∎

The Gibbs sampling oracles implicitly maintain access to u_{t}\propto\exp(\mathbf{A}y_{t}) and v_{t}\propto\exp(-\mathbf{A}^{\top}x_{t}), whose averages give (\bar{u},\bar{v})=\frac{1}{T}\sum_{t=0}^{T-1}(u_{t},v_{t}), the approximate equilibrium guaranteed by Corollary 3. To turn these implicitly maintained iterates into an actual classical output, we subsample the iterates. Below we formally show that one can take the empirical average of independent samples from distributions close to \bar{u} and \bar{v} and obtain an approximate equilibrium (with the same approximation factor up to constant factors) with high probability; a classical sketch of this subsampling step follows.
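The sketch below assumes the approximate distributions \tilde{u}_{t},\tilde{v}_{t} are given explicitly as arrays (in the algorithm they are only accessed through sampling oracles); it illustrates why the output is a sparse classical vector.

import numpy as np

def subsample_equilibrium(u_iterates, v_iterates, rng=np.random.default_rng(0)):
    # Draw one coordinate per iterate from the (approximate) distributions
    # and return the empirical averages (u-hat, v-hat) of Lemma 6, which
    # are sparse classical vectors with at most T nonzero entries each.
    T = len(u_iterates)
    m, n = len(u_iterates[0]), len(v_iterates[0])
    u_hat, v_hat = np.zeros(m), np.zeros(n)
    for u_t, v_t in zip(u_iterates, v_iterates):
        u_hat[rng.choice(m, p=u_t)] += 1.0 / T   # e_{i_t} with i_t ~ u_t
        v_hat[rng.choice(n, p=v_t)] += 1.0 / T   # e_{j_t} with j_t ~ v_t
    return u_hat, v_hat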

Lemma 6.

Suppose \bar{u}=\frac{1}{T}\sum_{t=0}^{T-1}u_{t} for \{u_{t}\}_{t=0}^{T-1}\subset\Delta^{m} and \bar{v}=\frac{1}{T}\sum_{t=0}^{T-1}v_{t} for \{v_{t}\}_{t=0}^{T-1}\subset\Delta^{n} are an \epsilon-approximate NE for \mathbf{A}. Further suppose that for some \delta\in(0,1), \{\tilde{u}_{t}\}_{t=0}^{T-1}\subset\Delta^{m} and \{\tilde{v}_{t}\}_{t=0}^{T-1}\subset\Delta^{n} satisfy \left\lVert\tilde{u}_{t}-u_{t}\right\rVert_{1}\leq\delta and \left\lVert\tilde{v}_{t}-v_{t}\right\rVert_{1}\leq\delta for all 0\leq t<T. Let \hat{u}=\frac{1}{T}\sum_{t=0}^{T-1}e_{i_{t}} where each e_{i_{t}}\in\mathbb{R}^{m} is sampled independently according to \tilde{u}_{t}; similarly, let \hat{v}=\frac{1}{T}\sum_{t=0}^{T-1}e_{j_{t}} where each e_{j_{t}}\in\mathbb{R}^{n} is sampled independently according to \tilde{v}_{t}. Suppose T\geq\frac{16\log\frac{mn}{\alpha}}{\epsilon^{2}}. Then with probability at least 1-\alpha, (\hat{u},\hat{v}) are a (2\epsilon+2\delta)-approximate NE for \mathbf{A}.

Proof.

First, let u~avg=1Tt=0T1u~t\tilde{u}_{\textup{avg}}=\frac{1}{T}\sum_{t=0}^{T-1}\tilde{u}_{t} and v~avg=1Tt=0T1v~t\tilde{v}_{\textup{avg}}=\frac{1}{T}\sum_{t=0}^{T-1}\tilde{v}_{t}. By convexity of norms, we have u~avgu¯1δ\left\lVert\tilde{u}_{\textup{avg}}-\bar{u}\right\rVert_{1}\leq\delta and v~avgv¯1δ\left\lVert\tilde{v}_{\textup{avg}}-\bar{v}\right\rVert_{1}\leq\delta, and hence under the NE approximation guarantee of (u¯,v¯)(\bar{u},\bar{v}) and Hölder’s inequality,

\max_{u\in\Delta^{m}}u^{\top}\mathbf{A}\tilde{v}_{\textup{avg}}-\min_{v\in\Delta^{n}}\tilde{u}_{\textup{avg}}^{\top}\mathbf{A}v\leq\epsilon+2\delta.

Let zz be a fixed vector in [1,1]n[-1,1]^{n}. By Hoeffding’s inequality, since each random variable z,ejt\left\langle z,e_{j_{t}}\right\rangle lies in the range [1,1][-1,1] and 𝔼v^=v~avg\mathbb{E}\hat{v}=\tilde{v}_{\textup{avg}}, we have that

Pr[|z,v^v~avg|ϵ2]2exp(Tϵ28)αm+n.\Pr\left[\left|\left\langle z,\hat{v}-\tilde{v}_{\textup{avg}}\right\rangle\right|\geq\frac{\epsilon}{2}\right]\leq 2\exp\left(-\frac{T\epsilon^{2}}{8}\right)\leq\frac{\alpha}{m+n}. (13)

Next, note that maxuΔmu𝐀v~avg\max_{u\in\Delta^{m}}u^{\top}\mathbf{A}\tilde{v}_{\textup{avg}} is achieved by a basis vector u=eiu=e_{i}. Hence, applying a union bound over (13) for all z=𝐀i:z=\mathbf{A}_{i:} shows that with probability at least 1αmm+n1-\frac{\alpha m}{m+n},

maxuΔmu𝐀v^maxuΔmu𝐀v~avg+ϵ2.\max_{u\in\Delta^{m}}u^{\top}\mathbf{A}\hat{v}\leq\max_{u\in\Delta^{m}}u^{\top}\mathbf{A}\tilde{v}_{\textup{avg}}+\frac{\epsilon}{2}.

By symmetry, with probability at least 1αnm+n1-\frac{\alpha n}{m+n},

minvΔnu^𝐀vminvΔnu~avg𝐀vϵ2.\min_{v\in\Delta^{n}}\hat{u}^{\top}\mathbf{A}v\geq\min_{v\in\Delta^{n}}\tilde{u}_{\textup{avg}}^{\top}\mathbf{A}v-\frac{\epsilon}{2}.

The conclusion follows from a union bound, and combining the above three displays. ∎

Finally, we put these pieces together to give a complete guarantee.

See Proposition 1

Proof.

We follow notation of Proposition 3. By applying Corollary 3 (up to constant factors), we have that with probability at least 1α21-\frac{\alpha}{2}, u¯:=1Tt=0T1ut\bar{u}:=\frac{1}{T}\sum_{t=0}^{T-1}u_{t} and v¯:=1Tt=0T1vt\bar{v}:=\frac{1}{T}\sum_{t=0}^{T-1}v_{t} satisfy

maxuΔmu𝐀v¯minvΔnu¯𝐀vϵ3.\max_{u\in\Delta^{m}}u^{\top}\mathbf{A}\bar{v}-\min_{v\in\Delta^{n}}\bar{u}^{\top}\mathbf{A}v\leq\frac{\epsilon}{3}.

Finally, Lemma 6 (with failure probability α2\frac{\alpha}{2}) and a union bound yields the desired conclusion. ∎

Appendix B Quantum rejection sampling with a hint

In this section, we prove Proposition 2, which gives a dynamic quantum rejection sampling subroutine and bounds its implementation cost. Our result is an extension of analogous developments in [vAG19], but is stated more generally to allow for the use of an appropriate “hint” vector in the rejection sampling procedure. We build up to our main result in several pieces.

Amplitude amplification.

First, for a quantum decision algorithm which applies a unitary \mathbf{U} and then measures, yielding an accepting state with probability \alpha, quantum amplitude amplification [BHMT02] shows we can apply \mathbf{U} roughly \alpha^{-\frac{1}{2}} times to obtain an accepting state with high probability.

Proposition 4 (Theorem 3, [BHMT02]).

Let S\subseteq\{0,1\}^{s}, let \mathbf{U} be an s-qubit quantum oracle, and let \alpha be the probability that measuring the result of applying \mathbf{U} yields an accepting state. There is a (quantum) algorithm using O(\alpha^{-\frac{1}{2}}\log\frac{1}{\delta}) queries to \mathbf{U} and O(\log s\log\frac{1}{\delta}) additional time that returns some s\in S with probability \geq 1-\delta.
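As a numerical illustration (standard Grover-type rotation arithmetic, not the [BHMT02] construction itself): after k rounds of amplification the acceptance probability is \sin^{2}((2k+1)\theta) for \theta=\arcsin(\sqrt{\alpha}), so roughly \frac{\pi}{4}\alpha^{-\frac{1}{2}} rounds suffice, versus \Theta(\alpha^{-1}) classical repetitions.

import numpy as np

def amplified_success(alpha, k):
    # Acceptance probability after k amplification rounds, starting from
    # base acceptance probability alpha = sin^2(theta).
    theta = np.arcsin(np.sqrt(alpha))
    return np.sin((2 * k + 1) * theta) ** 2

alpha = 1e-4
k = int(np.pi / (4 * np.arcsin(np.sqrt(alpha))))  # about (pi/4) * alpha^{-1/2}
print(k, amplified_success(alpha, k))             # 78 rounds; probability ~ 1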

Loading from trees.

Given a dynamic vector x0mx\in\mathbb{R}^{m}_{\geq 0} which is supported in an appropriate efficient data structure 𝖲𝖺𝗆𝗉𝗅𝖾𝗋𝖳𝗋𝖾𝖾\mathsf{SamplerTree} (see Lemma 1), and a known bound βx1\beta\geq\left\lVert x\right\rVert_{1}, we recall a result of [GR02] which allows us to form a superposition of the entries in xx (suitably rescaled).

Lemma 7.

Let x0mx\in\mathbb{R}^{m}_{\geq 0} correspond to an instance of 𝖲𝖺𝗆𝗉𝗅𝖾𝗋𝖳𝗋𝖾𝖾\mathsf{SamplerTree}, and βx1\beta\geq\left\lVert x\right\rVert_{1}. We can maintain a quantum oracle 𝒪𝖲𝖺𝗆𝗉𝗅𝖾𝗋𝖳𝗋𝖾𝖾\mathcal{O}_{\mathsf{SamplerTree}} which takes O(logm)O(\log m) time to apply, such that the total cost of building 𝒪𝖲𝖺𝗆𝗉𝗅𝖾𝗋𝖳𝗋𝖾𝖾\mathcal{O}_{\mathsf{SamplerTree}} after TT calls to 𝖴𝗉𝖽𝖺𝗍𝖾\mathsf{Update} is O(Tlogm)O(T\log m), and

𝒪𝖲𝖺𝗆𝗉𝗅𝖾𝗋𝖳𝗋𝖾𝖾|0(a+1)=i[m]xiβ|0|i+1x1β|1|g.\mathcal{O}_{\mathsf{SamplerTree}}|0\rangle^{\otimes(a+1)}=\sum_{i\in[m]}\sqrt{\frac{x_{i}}{\beta}}|0\rangle|i\rangle+\sqrt{1-\frac{\left\lVert x\right\rVert_{1}}{\beta}}|1\rangle|g\rangle.
Proof.

This is implicit in [GR02]. We first apply a 1-qubit gate to condition on selecting from the tree (with probability \frac{\left\lVert x\right\rVert_{1}}{\beta}), and then apply the [GR02] procedure conditioned on the first qubit being |0\rangle, which controls for one qubit at a time while propagating subtree sums (provided by \mathsf{SamplerTree} via \mathsf{SubtreeSum}). The cost to build the circuit follows because on an \mathsf{Update} we need to change the gates corresponding to the relevant leaf-to-root path. ∎
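For reference, the following is a minimal classical sketch of the \mathsf{SamplerTree} interface assumed above; the class is a hypothetical stand-in for the data structure of Lemma 1, and the quantum loader of Lemma 7 consumes exactly these subtree sums.

class SamplerTree:
    # A complete binary tree over m leaves storing nonnegative values,
    # supporting O(log m) Update and O(1) SubtreeSum queries.
    def __init__(self, m):
        self.size = 1
        while self.size < m:
            self.size *= 2
        self.tree = [0.0] * (2 * self.size)  # tree[1] is the root

    def update(self, i, value):
        # Set x_i := value, adjusting sums along the leaf-to-root path.
        node = self.size + i
        delta = value - self.tree[node]
        while node >= 1:
            self.tree[node] += delta
            node //= 2

    def subtree_sum(self, node):
        # Sum of all leaf values in the subtree rooted at `node`.
        return self.tree[node]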

Corollary 4.

Let x0mx\in\mathbb{R}^{m}_{\geq 0} correspond to an instance of 𝖲𝖺𝗆𝗉𝗅𝖾𝗋𝖳𝗋𝖾𝖾\mathsf{SamplerTree}, and let βx1\beta\geq\left\lVert x\right\rVert_{1}, and suppose 𝐀m×n\mathbf{A}\in\mathbb{R}^{m\times n} has 𝐀max1\left\lVert\mathbf{A}\right\rVert_{\max}\leq 1. We can maintain a quantum oracle 𝒪𝐀x\mathcal{O}_{\mathbf{A}^{\top}x} which takes O(logm)O(\log m) time to apply, with total building cost O(Tlogm)O(T\log m) after TT calls to 𝖴𝗉𝖽𝖺𝗍𝖾\mathsf{Update}, such that for any j[n]j\in[n],

𝒪𝐀x|0(a+2)|j=|0(i[m]𝐀ijxiβ|0|i+|1|g)|j.\mathcal{O}_{\mathbf{A}^{\top}x}|0\rangle^{\otimes(a+2)}|j\rangle=|0\rangle\left(\sum_{i\in[m]}\sqrt{\frac{\mathbf{A}_{ij}x_{i}}{\beta}}|0\rangle|i\rangle+|1\rangle|g\rangle\right)|j\rangle.
Proof.

We apply 𝒪𝐀\mathcal{O}_{\mathbf{A}}^{\prime} (see Section 2) to the output of 𝒪𝖲𝖺𝗆𝗉𝗅𝖾𝗋𝖳𝗋𝖾𝖾\mathcal{O}_{\mathsf{SamplerTree}}, ignoring the additional qubit. ∎

We remark here that the additional qubit in Corollary 4 will shortly become useful in constructing an appropriate block-encoding of a scaling of diag(𝐀x)\textbf{{diag}}\left(\mathbf{A}^{\top}x\right).

Polynomial approximation.

In order to give approximate Gibbs samplers for the types of dynamic vectors Algorithm 1 encounters, we further require some tools from polynomial approximation theory. We first state a helper result on boundedly approximating the exponential, a variant of which was also used in [vAG19]. We provide a proof in Appendix C.

Lemma 8 (Lemma 7, [vAG19]).

Let β1\beta\geq 1, ξ110\xi\leq\frac{1}{10}. There is a polynomial 𝒫β,ξ\mathcal{P}_{\beta,\xi} of degree O(βlog1ξ)O(\beta\log\frac{1}{\xi}) such that maxx[1,1]|𝒫β,ξ(x)|3\max_{x\in[-1,1]}|\mathcal{P}_{\beta,\xi}(x)|\leq 3 and maxx[1,0]|𝒫β,ξ(x)exp(βx)|ξ\max_{x\in[-1,0]}|\mathcal{P}_{\beta,\xi}(x)-\exp(\beta x)|\leq\xi.

Next, we state a further corollary of Lemma 8 to be used in our rejection sampler.

Corollary 5.

Let B,δ0B,\delta\geq 0 and suppose vnv\in\mathbb{R}^{n} has vB\left\lVert v\right\rVert_{\infty}\leq B. Further, suppose for some c0c\geq 0, cmaxj[n]vj0-c\leq\max_{j\in[n]}v_{j}\leq 0. Let q0nq\in\mathbb{R}^{n}_{\geq 0} satisfy qj[,1]q_{j}\in[\ell,1] entrywise. Finally, define uj:=vj2Bu_{j}:=\frac{v_{j}}{2B} entrywise. There is a degree-Δ\Delta polynomial 𝒫\mathcal{P}, for Δ=O(B(c+lognδ))\Delta=O(B\cdot(c+\log\frac{n}{\ell\delta})), such that for wj:=𝒫(uj)2qjw_{j}:=\mathcal{P}(u_{j})^{2}q_{j} and zj:=exp(2Buj)qjz_{j}:=\exp(2Bu_{j})q_{j} entrywise,

ww1zz11δ.\left\lVert\frac{w}{\left\lVert w\right\rVert_{1}}-\frac{z}{\left\lVert z\right\rVert_{1}}\right\rVert_{1}\leq\delta. (14)

Moreover, maxx[1,1]|𝒫(x)|12\max_{x\in[-1,1]}|\mathcal{P}(x)|\leq\frac{1}{2}, and w11δ36z1\|w\|_{1}\geq\frac{1-\delta}{36}\|z\|_{1}.

Proof.

Assume δ2\delta\leq 2 else the statement is clearly true. First, uj[12,0]u_{j}\in[-\frac{1}{2},0] entrywise by the stated assumptions (since vj[B,0]v_{j}\in[-B,0] entrywise). Let 𝒫β,ξ()\mathcal{P}_{\beta,\xi}(\cdot) be the polynomial given by Lemma 8 which ξ\xi-approximates exp(β)\exp(\beta\cdot) on [12,0][-\frac{1}{2},0]. We define

𝒫(u):=16𝒫B,ξ(u), for ξ:=δ6nexp(c).\mathcal{P}(u):=\frac{1}{6}\mathcal{P}_{B,\xi}\left(u\right),\text{ for }\xi:=\frac{\delta\ell}{6n\exp(c)}.

The degree bound and absolute value bound of this polynomial follow immediately from Lemma 8, so it remains to show the distance bound. The guarantees of Lemma 8 then imply for all j\in[n],

|6𝒫(uj)exp(Buj)|ξ.\left|6\mathcal{P}(u_{j})-\exp\left(Bu_{j}\right)\right|\leq\xi. (15)

We further have that uj0u_{j}\leq 0, so exp(Buj)1\exp(Bu_{j})\leq 1. Hence, we also have

|6𝒫(uj)+exp(Buj)|2+ξ3.\left|6\mathcal{P}(u_{j})+\exp\left(Bu_{j}\right)\right|\leq 2+\xi\leq 3.

Combining yields for all j[n]j\in[n],

|36𝒫(uj)2exp(2Buj)|3ξ.\left|36\mathcal{P}(u_{j})^{2}-\exp\left(2Bu_{j}\right)\right|\leq 3\xi. (16)

Next, let yj:=36wjy_{j}:=36w_{j} for all j[n]j\in[n], and note that yy1=ww1\frac{y}{\|y\|_{1}}=\frac{w}{\|w\|_{1}}. We bound

ww1zz11\displaystyle\left\lVert\frac{w}{\left\lVert w\right\rVert_{1}}-\frac{z}{\left\lVert z\right\rVert_{1}}\right\rVert_{1} =j[n]|yjy1zjz1|j[n]|yjy1yjz1|+j[n]|yjz1zjz1|\displaystyle=\sum_{j\in[n]}\left|\frac{y_{j}}{\left\lVert y\right\rVert_{1}}-\frac{z_{j}}{\left\lVert z\right\rVert_{1}}\right|\leq\sum_{j\in[n]}\left|\frac{y_{j}}{\left\lVert y\right\rVert_{1}}-\frac{y_{j}}{\left\lVert z\right\rVert_{1}}\right|+\sum_{j\in[n]}\left|\frac{y_{j}}{\left\lVert z\right\rVert_{1}}-\frac{z_{j}}{\left\lVert z\right\rVert_{1}}\right| (17)
|1y1z1|+yz1z12yz1z1.\displaystyle\leq\left|1-\frac{\left\lVert y\right\rVert_{1}}{\left\lVert z\right\rVert_{1}}\right|+\frac{\left\lVert y-z\right\rVert_{1}}{\left\lVert z\right\rVert_{1}}\leq\frac{2\left\lVert y-z\right\rVert_{1}}{\left\lVert z\right\rVert_{1}}.

By using the definitions of y,zy,z and (16), as well as the assumed ranges on qq,

yz13nξ,z1exp(c).\displaystyle\left\lVert y-z\right\rVert_{1}\leq 3n\xi,\;\left\lVert z\right\rVert_{1}\geq\ell\exp(-c).

The second inequality used that some vj=2Bujv_{j}=2Bu_{j} is at least c-c by assumption. Combining the above display with (17) and the definition of ξ\xi concludes the proof of (14). Finally, using the bounds on yz1,z1\left\lVert y-z\right\rVert_{1},\|z\|_{1} above shows that

\|w\|_{1}=\frac{1}{36}\|y\|_{1}\geq\frac{1-\delta}{36}\|z\|_{1}. ∎

Block-encoding.

Our approximate Gibbs oracle follows an implementation strategy pioneered by [GSLW19] termed “block-encoding.” Specifically, we follow [GSLW19] and say that 𝐔\mathbf{U}, an (a+)(a+\ell)-qubit quantum gate, is an \ell-bit block-encoding of 𝐌\mathbf{M} if the top-left 2a×2a2^{a}\times 2^{a} submatrix of 𝐔\mathbf{U} is 𝐌\mathbf{M}. Block-encoded matrices admit efficient composable operations, such as the application of linear combinations and bounded polynomials. We summarize these properties in the following.

Proposition 5 (Lemma 52, [GSLW19]).

Let 𝐔1\mathbf{U}_{1} and 𝐔2\mathbf{U}_{2} be \ell-bit block-encodings of 𝐌1\mathbf{M}_{1}, 𝐌2\mathbf{M}_{2} of the same size. There is an O()O(\ell)-bit block-encoding of 12𝐌1+12𝐌2\frac{1}{2}\mathbf{M}_{1}+\frac{1}{2}\mathbf{M}_{2} which takes the same asymptotic time to apply as applying 𝐔1\mathbf{U}_{1} and 𝐔2\mathbf{U}_{2}.

Proposition 6 (Theorem 56, [GSLW19]).

Let 𝐔\mathbf{U} be an \ell-bit block-encoding of 𝐌\mathbf{M}, and 𝒫:[1,1][12,12]\mathcal{P}:[-1,1]\to[-\frac{1}{2},\frac{1}{2}] be a degree-Δ\Delta polynomial. There is an O()O(\ell)-bit block-encoding of 𝒫(𝐌)\mathcal{P}(\mathbf{M}) which can be applied in O(Δ)O(\Delta) applications of 𝐔\mathbf{U} and 𝐔\mathbf{U}^{\dagger} and O(Δ)O(\ell\Delta) additional time.
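As a concrete classical illustration of the block-encoding definition above, the sketch below completes a diagonal contraction \mathbf{M}=\textbf{diag}(d) (the special case relevant to this appendix) to a unitary whose top-left block is \mathbf{M}; this standard completion is illustrative only and is not the circuit used in our proofs.

import numpy as np

def block_encode_diag(d):
    # Given |d_i| <= 1, return a unitary U whose top-left block is
    # M = diag(d), i.e. a one-(qu)bit block-encoding of M.
    d = np.asarray(d, dtype=float)
    assert np.all(np.abs(d) <= 1.0)
    M = np.diag(d)
    S = np.diag(np.sqrt(1.0 - d ** 2))
    U = np.block([[M, S], [S, -M]])
    assert np.allclose(U @ U.T, np.eye(2 * len(d)))  # U is indeed unitary
    return U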

We also demonstrate that an application of Corollary 4 yields a simple block-encoding of diag(𝐀xβ)\textbf{{diag}}\left(\frac{\mathbf{A}^{\top}x}{\beta}\right). A similar construction previously appeared in [vAG19].

Corollary 6.

Let x0mx\in\mathbb{R}^{m}_{\geq 0} correspond to an instance of 𝖲𝖺𝗆𝗉𝗅𝖾𝗋𝖳𝗋𝖾𝖾\mathsf{SamplerTree}, and βx1\beta\geq\left\lVert x\right\rVert_{1}. Let 𝐌:=diag(𝐀xβ)\mathbf{M}:=\textbf{{diag}}\left(\frac{\mathbf{A}^{\top}x}{\beta}\right) and 𝐔:=𝒪𝐀x(SWAP12𝐈)𝒪𝐀x\mathbf{U}:=\mathcal{O}_{\mathbf{A}^{\top}x}^{*}(\textup{SWAP}_{12}\otimes\mathbf{I})\mathcal{O}_{\mathbf{A}^{\top}x}, where SWAP12\textup{SWAP}_{12} swaps the first two qubits and 𝒪𝐀x\mathcal{O}_{\mathbf{A}^{\top}x} is from Corollary 4. Then 𝐔\mathbf{U} is a block-encoding of 𝐌\mathbf{M}, and can be applied in time O(logm)O(\log m), with total building cost O(Tlogm)O(T\log m) after TT calls to 𝖴𝗉𝖽𝖺𝗍𝖾\mathsf{Update}.

Proof.

Define wij:=𝐀ijxiβw_{ij}:=\frac{\mathbf{A}_{ij}x_{i}}{\beta} for convenience. By the definition of 𝒪𝐀x\mathcal{O}_{\mathbf{A}^{\top}x}, we have that

(SWAP12𝐈)𝒪𝐀x(|0(a+2)|j)=(|00i[m]wij|i+|10|g)|j.\left(\textup{SWAP}_{12}\otimes\mathbf{I}\right)\mathcal{O}_{\mathbf{A}^{\top}x}\left(|0\rangle^{\otimes(a+2)}|j\rangle\right)=\left(|00\rangle\sum_{i\in[m]}\sqrt{w_{ij}}|i\rangle+|10\rangle|g\rangle\right)|j\rangle.

Hence, for j,j[n]j,j^{\prime}\in[n], we compute j|0|(a+2)𝐔|0(a+2)|j\langle j^{\prime}|\langle 0|^{\otimes(a+2)}\mathbf{U}|0\rangle^{\otimes(a+2)}|j\rangle as:

j|(|00i[m]wij|i+|01|g)(|00i[m]wij|i+|10|g)|j\displaystyle\langle j^{\prime}|\left(|00\rangle\sum_{i\in[m]}\sqrt{w_{ij}}|i\rangle+|01\rangle|g\rangle\right)^{*}\left(|00\rangle\sum_{i\in[m]}\sqrt{w_{ij}}|i\rangle+|10\rangle|g\rangle\right)|j\rangle
={i[m]wij=[𝐀x]jβj=j0jj.\displaystyle=\begin{cases}\sum_{i\in[m]}w_{ij}=\frac{[\mathbf{A}^{\top}x]_{j}}{\beta}&j=j^{\prime}\\ 0&j\neq j^{\prime}\end{cases}.

In particular the |01|01\rangle and |10|10\rangle terms disappear, and |j|j\rangle, |j|j^{\prime}\rangle are orthogonal unless j=jj=j^{\prime}. In the above, we required that wijwij=wij\sqrt{w_{ij}}^{*}\sqrt{w_{ij}}=w_{ij}, which is only true if wijw_{ij} is nonnegative. To bypass this issue, we will implement the two copies of 𝒪𝐀x\mathcal{O}_{\mathbf{A}^{\top}x} in slightly different ways, to obtain the correct signing. For notational clarity, we let 𝒪L\mathcal{O}^{L} be the oracle which is conjugated on the left and 𝒪R\mathcal{O}^{R} be the oracle on the right, such that 𝐔=(𝒪L)(SWAP12𝐈)(𝒪R)\mathbf{U}=(\mathcal{O}^{L})^{*}(\textup{SWAP}_{12}\otimes\mathbf{I})(\mathcal{O}^{R}). Note that xx is entrywise nonnegative and β>0\beta>0, and hence the only factor determining the sign of wijw_{ij} is 𝐀ij\mathbf{A}_{ij}. When 𝐀ij0\mathbf{A}_{ij}\geq 0, we will define the oracles 𝒪𝐀\mathcal{O}_{\mathbf{A}}^{\prime} used to load 𝐀ij\sqrt{\mathbf{A}_{ij}} for 𝒪L\mathcal{O}^{L} and 𝒪R\mathcal{O}^{R} in a consistent way (i.e. use the same-signed square root), so that wij2=wij\sqrt{w_{ij}}^{2}=w_{ij}. When 𝐀ij<0\mathbf{A}_{ij}<0 we will define them in an inconsistent way, so that after the conjugation operation, wijwij=wij-\sqrt{w_{ij}}\sqrt{w_{ij}}=w_{ij}. We have thus shown that 0|(a+2)𝐔|0(a+2)=𝐌\langle 0|^{\otimes(a+2)}\mathbf{U}|0\rangle^{\otimes(a+2)}=\mathbf{M} which implies the first conclusion. To see the second, all our gates are reversible (arithmetic circuits are reversible, and 𝒪𝐀\mathcal{O}_{\mathbf{A}} is its own inverse), and hence the complexity of applying 𝒪𝐀x\mathcal{O}_{\mathbf{A}^{\top}x}^{*} is the same as 𝒪𝐀x\mathcal{O}_{\mathbf{A}^{\top}x}. ∎

Finally, we put together the pieces and prove Proposition 2, which we use repeatedly throughout the paper to implement our Gibbs sampling oracles.

See Proposition 2

Proof.

Throughout the proof, let δmin(12,δ)\delta\leftarrow\min(\frac{1}{2},\delta) and B:=4(β+log(Cnδ))B:=4(\beta+\log(\frac{Cn}{\delta})). Also define :=δn\ell:=\frac{\delta}{n} (following notation of Corollary 5). We first observe that since maxj[n][𝐀x]jlogZmaxj[n][𝐀x]j+logn\max_{j\in[n]}[\mathbf{A}^{\top}x]_{j}\leq\log Z\leq\max_{j\in[n]}[\mathbf{A}^{\top}x]_{j}+\log n,

log(Cn)maxj[n][𝐀x]jlog(Z~qj)0.-\log(Cn)\leq\max_{j\in[n]}[\mathbf{A}^{\top}x]_{j}-\log\left(\widetilde{Z}q_{j}\right)\leq 0.

Here, the upper bound used that for all j\in[n], \exp([\mathbf{A}^{\top}x]_{j}-\log(\widetilde{Z}q_{j}))=\frac{p_{j}}{q_{j}}\cdot\frac{Z}{\widetilde{Z}}\leq 1 by assumption. Hence, for v:=\mathbf{A}^{\top}x-\log(\widetilde{Z}q) entrywise,

cmaxj[n]vj0 for c:=log(Cn).-c\leq\max_{j\in[n]}v_{j}\leq 0\text{ for }c:=\log(Cn).

Next, we note log(Z~q)\log(\widetilde{Z}q) is entrywise bounded in magnitude by B2\frac{B}{2}:

log(Z~qj)\displaystyle\log(\widetilde{Z}q_{j}) log(CZ)log(nmaxj[n]exp([𝐀x]j))+logCB2,\displaystyle\leq\log(CZ)\leq\log\left(n\cdot\max_{j\in[n]}\exp([\mathbf{A}^{\top}x]_{j})\right)+\log C\leq\frac{B}{2},
log(Z~qj)\displaystyle\log(\widetilde{Z}q_{j}) logZ+logδnminj[n][𝐀x]jlognδB2.\displaystyle\geq\log Z+\log\frac{\delta}{n}\geq\min_{j\in[n]}[\mathbf{A}^{\top}x]_{j}-\log\frac{n}{\delta}\geq-\frac{B}{2}.

Define \mathbf{M}_{1}:=\textbf{diag}(\frac{\mathbf{A}^{\top}x}{B}) and \mathbf{M}_{2}:=\textbf{diag}(-\frac{1}{B}\log(\widetilde{Z}q)). By the calculations above, we have \left\lVert\mathbf{M}_{2}\right\rVert_{\textup{op}}\leq\frac{1}{2}, and similarly it is clear that \left\lVert\mathbf{M}_{1}\right\rVert_{\textup{op}}\leq\frac{1}{2} because \left\lVert\mathbf{A}^{\top}x\right\rVert_{\infty}\leq\beta\leq\frac{B}{2}. Moreover, by using Corollary 6 with \beta\leftarrow B, we obtain \mathbf{U}_{1}, a block-encoding of \mathbf{M}_{1} applicable in O(\log m) time. Using a similar construction as Corollary 6, since q, B, and \widetilde{Z} are all efficiently classically queriable, we obtain \mathbf{U}_{2}, a block-encoding of \mathbf{M}_{2} applicable in O(1) time. Hence, Proposition 5 yields \mathbf{U}, a block-encoding of

\frac{1}{2}\mathbf{M}_{1}+\frac{1}{2}\mathbf{M}_{2}=\textbf{diag}\left(\frac{v}{2B}\right),

which can be applied in O(logmn)O(\log mn) time. Next, let 𝒫\mathcal{P} be the degree-Δ=O(BlogCnδ)\Delta=O(B\log\frac{Cn}{\delta}) polynomial from Corollary 5, parameterized by B,v,c,q,B,v,c,q,\ell as defined earlier. Corollary 5 shows that 𝒫:[1,1][12,12]\mathcal{P}:[-1,1]\to[-\frac{1}{2},\frac{1}{2}]. Thus, Proposition 6 then yields 𝐔\mathbf{U}^{\prime}, a block-encoding of diag(𝒫(v2B))\textbf{{diag}}\left(\mathcal{P}(\frac{v}{2B})\right) which can be applied in O(Δlogmn)O(\Delta\cdot\log mn) time. Furthermore, since qq and ρ\rho are efficiently classically queriable, we can define a gate 𝒪q\mathcal{O}_{q} which is applicable in O(1)O(1) time and acts as

𝒪q|0(b+1)=|0j[n]qjρ|j+|1|g.\mathcal{O}_{q}|0\rangle^{\otimes(b+1)}=|0\rangle\sum_{j\in[n]}\sqrt{\frac{q_{j}}{\rho}}|j\rangle+|1\rangle|g\rangle.

Applying 𝐔\mathbf{U}^{\prime} to the output of 𝒪q\mathcal{O}_{q} with appropriate ancilla qubits then yields

|0O(1)j[n]qj𝒫(uj)2ρ|j|gj+|g, where uj:=vj2B for all j[n].|0\rangle^{\otimes O(1)}\sum_{j\in[n]}\sqrt{\frac{q_{j}\mathcal{P}(u_{j})^{2}}{\rho}}|j\rangle|g_{j}\rangle+|g^{\prime}\rangle,\text{ where }u_{j}:=\frac{v_{j}}{2B}\text{ for all }j\in[n].

Post-selecting on the first register being the all-zeroes state and measuring on the register corresponding to jj, we see that we obtain a sample j[n]j\in[n] with probability proportional to qj𝒫(uj)2q_{j}\mathcal{P}(u_{j})^{2}. By Corollary 5, conditioned on the sample succeeding, the resulting distribution is δ\delta-close in 1\ell_{1} to the distribution proportional to qexp(v)exp(𝐀x)q\circ\exp(v)\propto\exp(\mathbf{A}^{\top}x), and hence the result is a δ\delta-approximate Gibbs oracle. Finally, we bound the query cost of the oracle. Define wj:=𝒫(uj)2qjw_{j}:=\mathcal{P}(u_{j})^{2}q_{j} and zj:=exp(vj)qjz_{j}:=\exp(v_{j})q_{j} as in Corollary 5. By definition of v,Z~v,\widetilde{Z},

z1=j[n]exp([𝐀x]j)Z~[C1,1].\left\lVert z\right\rVert_{1}=\sum_{j\in[n]}\frac{\exp\left(\left[\mathbf{A}^{\top}x\right]_{j}\right)}{\widetilde{Z}}\in\left[C^{-1},1\right].

Moreover, the last conclusion in Corollary 5 shows w1172z1(72C)1\left\lVert w\right\rVert_{1}\geq\frac{1}{72}\left\lVert z\right\rVert_{1}\geq(72C)^{-1}. Hence,

j[n]qj𝒫(uj)2ρ=w1ρ172Cρ.\sum_{j\in[n]}\frac{q_{j}\mathcal{P}(u_{j})^{2}}{\rho}=\frac{\left\lVert w\right\rVert_{1}}{\rho}\geq\frac{1}{72C\rho}.

In other words, we have an oracle which we can apply in time O(\Delta\cdot\log mn) which correctly returns a sample with probability \alpha\geq\frac{1}{72C\rho}. By applying Proposition 4 to improve the success probability, we obtain the desired conclusion at an O(\sqrt{C\rho}\log\frac{1}{\delta}) overhead. ∎
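For intuition, the classical analogue of this rejection sampler is sketched below, assuming direct access to \mathbf{A}^{\top}x; each proposal accepts with probability \frac{\left\lVert z\right\rVert_{1}}{\left\lVert q\right\rVert_{1}}\geq\frac{1}{C\rho}, whereas the quantum implementation encodes \mathcal{P}(\frac{v_{j}}{2B})\approx\sqrt{\exp(v_{j})} in an amplitude, so amplitude amplification square-roots the expected proposal count.

import numpy as np

def hinted_rejection_sample(Ax, q, Z_tilde, rng=np.random.default_rng(0)):
    # Classical analogue of Proposition 2: propose j ~ q / ||q||_1 and
    # accept with probability exp([A^T x]_j) / (Z_tilde * q_j), which is
    # at most 1 by the assumption exp(v_j) <= 1; accepted indices then
    # follow the Gibbs distribution proportional to exp(A^T x).
    q = np.asarray(q, dtype=float)
    accept = np.exp(np.asarray(Ax, dtype=float)) / (Z_tilde * q)
    while True:
        j = rng.choice(len(q), p=q / q.sum())
        if rng.random() < accept[j]:
            return j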

See Corollary 2

Proof.

Our oracle 𝒪test\mathcal{O}_{\textup{test}} is the oracle from Proposition 2, except we will choose a sufficiently small constant value of δ\delta. It returns “success” when the sample is accepted by the rejection sampler after boosting by amplitude amplification. Before boosting, the success probability from Proposition 2 is Θ(1Rρ)\Theta(\frac{1}{R\rho}) where the constants in the upper and lower bounds are explicit. Further, the constants from Proposition 4 are explicit, and hence boosting by amplitude amplification improves the success probability to Θ(1Rρ)\Theta(\frac{1}{\sqrt{R\rho}}) with known constant bounds as required by the corollary statement. ∎

Appendix C Bounded approximation to exp\exp on [1,1][-1,1]

Here, we give a proof of a lemma (with slightly different constants) used in the prior work [vAG19]. This section builds entirely off prior results on polynomial approximation in [GSLW19]; we include it for completeness because a proof was not given in [vAG19]. As a reminder, we stated and used the following result earlier when constructing our rejection sampler in Appendix B.

See Lemma 8

To obtain the lemma, we will utilize the following result from [GSLW19].

Proposition 7 (Corollary 66, [GSLW19]).

Let x0[1,1]x_{0}\in[-1,1], r(0,2]r\in(0,2], δ(0,r]\delta\in(0,r]. Let f:[x0rδ,x0+r+δ]f:[x_{0}-r-\delta,x_{0}+r+\delta]\rightarrow\mathbb{C} be such that f(x0+x)=0axf(x_{0}+x)=\sum_{\ell\geq 0}a_{\ell}x^{\ell} for all x[rδ,r+δ]x\in[-r-\delta,r+\delta]. Suppose B>0B>0 is such that 0(r+δ)|a|B\sum_{\ell\geq 0}(r+\delta)^{\ell}|a_{\ell}|\leq B and let ϵ(0,12B]\epsilon\in(0,\frac{1}{2B}]. There is a polynomial PP (see Appendix D for its numerically stable implementation) of degree O(1δlogBϵ)O\left(\frac{1}{\delta}\log\frac{B}{\epsilon}\right) such that

maxx[x0r,x0+r]|f(x)P(x)|ϵ and maxx[1,1]|P(x)|ϵ+B.\displaystyle\max_{x\in[x_{0}-r,x_{0}+r]}|f(x)-P(x)|\leq\epsilon\text{ and }\max_{x\in[-1,1]}|P(x)|\leq\epsilon+B.
Proof of Lemma 8.

We apply Proposition 7 with f(x):=exp(βx)f(x):=\exp(\beta x) which has a convergent Taylor series everywhere, and the parameter settings x0=1x_{0}=-1, r=1r=1, δ=1β\delta=\frac{1}{\beta}, B=eB=e. We have that f(x0+x)=0exp(β)βx!=0axf(x_{0}+x)=\sum_{\ell\geq 0}\exp(-\beta)\frac{\beta^{\ell}\cdot x^{\ell}}{\ell!}=\sum_{\ell\geq 0}a_{\ell}x^{\ell} with a=exp(β)β!a_{\ell}=\exp(-\beta)\frac{\beta^{\ell}}{\ell!} for any integer 0\ell\geq 0. We also check that our choice of BB is valid, via

0(r+δ)|a|=exp(β)0(1+1β)β!=exp(β)0(β+1)!=exp(β+1β)=e.\sum_{\ell\geq 0}(r+\delta)^{\ell}|a_{\ell}|=\exp(-\beta)\sum_{\ell\geq 0}\left(1+\frac{1}{\beta}\right)^{\ell}\frac{\beta^{\ell}}{\ell!}=\exp(-\beta)\sum_{\ell\geq 0}\frac{(\beta+1)^{\ell}}{\ell!}=\exp(\beta+1-\beta)=e.

Hence by Proposition 7, for any \xi\leq\frac{1}{2e} there is a polynomial P of degree O(\beta\log\frac{1}{\xi}) such that \max_{x\in[-2,0]}|\exp(\beta x)-P(x)|\leq\xi and \max_{x\in[-1,1]}|P(x)|\leq e+\xi\leq 3; restricting the approximation guarantee to [-1,0]\subset[-2,0] gives the claim. ∎

Appendix D Numerically stable implementation of polynomial approximation

Throughout this section, let \Delta=O(\frac{1}{\epsilon}\log^{2}(\frac{mn}{\epsilon})) be the degree of the polynomial constructed in the proof of Proposition 2 in Appendix B (where we have C=O(1) and \delta=O(\epsilon) in our applications). The polynomial we use is constructed via a decomposition in the Fourier basis (see Lemmas 57 and 65, [GSLW19]). It is not immediate that this polynomial transform can be implemented stably in finite-precision arithmetic, within the quantum singular value transformation framework of [GSLW19], which is used in the proof of Proposition 2. However, [Haa19] shows that given such a decomposition in the Fourier basis, we can obtain a numerically-stable implementation of the required polynomial transformation as a quantum circuit, up to additive error \xi, in time

O(Δ3log(Δξ)).O\left(\Delta^{3}\log\left(\frac{\Delta}{\xi}\right)\right).

In our setting (in the proof of Proposition 2), it is straightforward to check that taking \xi^{-1}=\textup{poly}(m,n,\epsilon^{-1}) suffices. This construction results in the additive term in Theorem 4.