This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Quantum random power method for ground state computation

Taehee Ko [email protected] School of Computational Sciences, Korea Institute for Advanced Study    Hyowon Park [email protected] Department of Physics, University of Illinois at Chicago    SangKook Choi [email protected] School of Computational Sciences, Korea Institute for Advanced Study
Abstract

We present a quantum-classical hybrid random power method that approximates a ground state of a Hamiltonian. The quantum part of our method computes a fixed number of elements of a Hamiltonian-matrix polynomial via a quantum polynomial filtering technique. This technique can be implemented using Hamiltonian simulation or block encoding, suitable for early fault-tolerant and fault-tolerant regimes, respectively. The classical part of our method is a randomized iterative algorithm that takes as input the matrix elements computed from the quantum part and outputs an approximation of ground state of the Hamiltonian. For the per-iteration complexity of our method, the required classical time is independent of system size, and the quantum circuit complexity depends polylogarithmically on the system size. We prove that with probability one, our method converges to an approximation of a ground state of the Hamiltonian. We also show a lower bound of the fidelity of the approximate ground state with the true one. The lower bound depends linearly on the magnitude of noise occurring from quantum computation if it is smaller than a critical value. Several numerical experiments demonstrate that our method provides a good approximation of ground state in the presence of systematic and/or sampling noise.

I Introduction

Quantum computing algorithms are shown as a promising means for various problems in theoretical sciences [34, 41, 27, 33, 21]. Among those, the computation of ground state and the corresponding energy is of significant importance in physics, chemistry, and material science. Despite the importance of the problem, it is not clear whether there exists an efficient algorithm even when using the power of quantum computations (i.e. the QMA-hard result [17, 1, 15, 32]). Nevertheless, many interesting quantum algorithms have been proposed recently, showing different levels of computational power that classical algorithms may not achieve. For instance, quantum algorithms for ground-state energy estimation were proposed based on a Krylov subspace method, given access to Hamiltonian simulation or block encoding [16, 37, 20, 29].

Quantum computing algorithms for ground-state preparation have also been developed and studied, but there are still issues to consider: whether the algorithm demands a reasonable circuit depth, and whether it guarantees the convergence to the ground state of a quantum Hamiltonian. For instance, filtering-based algorithms by Lin and his colleagues [22, 11] employ the idea of quantum signal processing (QSP) [24] to prepare a ground state with rigorous guarantee. However, these algorithms require a circuit depth sufficiently long to implement filtering techniques for the QSP. Another type of algorithm relies on the variational approach with parameterized quantum circuits such as the variational quantum eigensolver (VQE) [35, 13, 28, 19]. Although VQEs are more suitable on near-term quantum devices, to the best of our knowledge, no convergence guarantee of VQAs is found, but only empirical evidence.

The ground-state problem is viewed as a constrained optimization problem. Due to the exponential growth of the dimension of the Hilbert space, the problem of practical interest is considered as a large-scale optimization task. In classical computational algorithms, randomization has been applied as an effective algorithmic technique to large-scale optimizations. Examples are the Karzmacz method for solving a linear system [39], quantum Monte Carlo (QMC) for ground-state problem [2, 6, 26], the stochastic gradient descent for training machine learning models [3] and the principal component analysis (PCA) [38], and the random coordinate descent for convex optimization [30]. The idea of randomization has been also employed for many recent quantum algorithms, such as optimization of parameterized quantum circuits [9, 40], Hamiltonian simulation [4, 8] and extracting the properteis of the ground states and the Gibbs states [42]. The common feature of randomized methods is a trade-off between computational resources and the sampling cost. Interestingly, it is often observed that the randomized approach is more efficient than the exact counterpart theoretically and empirically.

In this paper, we present the first quantum version of randomized power method that yields a good approximation of ground state. The per-iteration complexity of our method includes only a constant classical time and a polylogarithmic quantum circuit complexity with respect to system size. Importantly, we prove that the proposed method converges to an approximation of the ground state with probability one, whose accuracy can be systematically improved. Our method incorporates a quantum polynomial filtering into a randomized classical iterative algorithm. The use of quantum polynomial filtering enables us to achieve much better convergence properties than what we would obtain without it. To implement the polynomial filtering, we propose two approaches based on either Hamiltonian simulation or block encoding. The former approach utilizes a Fourier-series approximation for the target polynomial filtering [23, 42], whereas the latter can operate with the quantum singular value transformation (QSVT) [12]. Given a corresponding quantum circuit to the quantum polynomial filtering in our method, the only task with the circuit is to estimate elements of the matrix polynomial using a Hadamard test. The estimated matrix elements are then fed into a classical randomized iteration, as illustrated in Fig. 1. At a high level, our hybrid approach reduces the task of approximating ground state to a randomized iterative procedure of sampling matrix elements. The sampling is carried out without repetition, which reduces the overall shot cost.

Refer to caption
Figure 1: An illustration of quantum random power method. At every iteration, a set of indices, MM, is sampled randomly, and then the corresponding elements of a matrix polynomial filter, {p(H)~ij}(i,j)M\{\widetilde{p(H)}_{ij}\}_{(i,j)\in M}, are computed from a quantum computer. We call this quantum polynomial filtering. In practice, matrix elements obtained from the quantum polynomial filtering may be biased due to various types of noise (e.g. systematic error from the Trotter formula, sampling noise from a Hadamard test, and so on). This implies that we essentially estimate elements of a perturbation of the true matrix polynomial, p(H)p(H), from a quantum computer. Next, we use the matrix elements as input for a classical randomized algorithm, which we call random power method. After a sufficient number of iterations, the method yields a classical vector, 𝒙\bm{x}, close to a ground state of HH, ψGS\psi_{GS}.

Contribution

This paper focuses on developing a hybrid randomized version of the power method for ground-state computation. On the algorithmic and theoretical sides, the main contributions of this work are summarized as follows,

  • We show that our method converges to a ground state of a given Hamiltonian HN×NH\in\mathbb{C}^{N\times N} to a target precision ϵ\epsilon with probability one (Theorem 4). Specifically, for a desired precision ϵ\epsilon, if the initial guess 𝒙1\bm{x}_{1} has a non-zero overlap γ\gamma with a ground state of HH, there exist a step size a>0a>0 and a constant cϵ,a>1c_{\epsilon,a}>1 dependent on aa and ϵ\epsilon such that an iterate, whose fidelity with the ground state is greater than 1ϵ1-\epsilon, is obtained with 𝒪(log1γ/logcϵ,a)\mathcal{O}(\log\frac{1}{\gamma}/\log c_{\epsilon,a}) iteration complexity. The convergence analysis is performed under only the assumption that HH is a Hermitian matrix (e.g. a linear combination of the Pauli operators). To the best of our knowledge, such probabilistic convergence has not been established for inexact power methods [26, 38]. For our analysis, we employ a submartingale approach to quantify a recursive relation in terms of average fidelity.

  • We propose a simple and efficient gradient estimator for updating the iteration in our method, which requires a 𝒪(1)\mathcal{O}(1) classical time and a polylogarithmic quantum circuit complexity with system size.

  • We present a lower bound of the fidelity of an approximation of ground state with the true one in Lemma 5. The lower bound depends linearly on the magnitude of noise between their corresponding matrices, if it is smaller than some critical value. The value is formulated by the spectral gap of the true matrix.

  • We provide several numerical results demonstrating that our method outputs a fairly good approximation of a ground state and supporting our convergence analysis. A part of the results also shows that the noise condition in Lemma 5 holds in many cases.

  • The proposed method can be straightforwardly generalized to the task of approximating an excited state and the corresponding eigenvalue by changing the quantum polynomial filtering accordingly.

Related work

Classical deterministic ground-state algorithms are often inapplicable for large electronic systems, due to the demand of expensive computational resource in performing the matrix-vector product HxHx (e.g. power method and Krylov subspace method). Alternatively, randomized variants of the classical algorithms dramatically reduce iteration cost by performing inexact matrix-vector calculations (see [26, 2, 6] and [38]). These approaches typically assume classical input models for HH. In our setting, however, we take advantage of quantum input models such as Hamiltonian evolution for the Fourier-series approximation [23, 42] or block encoding for the quantum singular value transformation [12]) for more efficient access to p(H)p(H).

In the framework of the noisy power method [14], the recent work [7] proposes quantum-classical hybrid algorithms that outputs a good approximation of the top eigenvector on a classical computer. Their algorithms achieve a speedup by reducing the computational complexity of the matrix-vector multiplication using an amplitude amplification. In contrast, our approach gains a speed-up by enlarging the spectral gap via a quantum polynomial filtering. Furthermore, the algorithms in [7] assume more intricate techniques that may be available in the fault-tolerant quantum computation regime. However, our approach can be suitable for early fault-tolerant quantum devices, if one uses a Fourier-series approximation with Hamiltonian simulation to sample the elements of p(H)p(H) as discussed in [23, 42].

The choice of the polynomial p(x)p(x) in our method affects convergence properties. In our applications, we use the Chebyshve polynomial as a filtering polynomial, which enables to amplify the contributions associated with the small eigenvalues much larger than those of other eigenvalues after shifting the Hamiltonian (i.e. the Chebyshev filtering method in the density functional theory [45]). Previously, other filtering methods have also found use in quantum ground-state algorithms [22, 11]. One difference is that our method can work with loose filtering bounds that may include some eigenvalues other than the ground state energy, while the quantum filtering algorithms [22, 11] need tight ones to capture only the ground state energy. Thus, our approach may require less overhead in a search of the filtering bounds.

Organization

The rest of the paper is organized as follows. We present our main result and algorithmic components for solving the ground state problem in Section II. Section III shows convergence guarantee of our algorithm, with emphasis on convergence in probability and the implication to the iteration complexity. Additionally, we provide a perturbation theorem that elucidates the effect of biases occuring from the quantum filtering technique on the accuracy of approximate ground state. In Section V, several numerical experiments validate the performance of our algorithm for approximating ground state for different model Hamiltonians. In the appendixes, we supplement analytical details and all proofs thereof.

II Preliminaries

In this paper, HH denotes the Hamiltonian matrix. We use \norm{\cdot} for the 2-norm of a vector of a matrix. The state |𝒙\ket{\bm{x}} in the braket notation is always a unit vector, 𝒙=1\norm{\bm{x}}=1, and a vector without the notation, 𝒙\bm{x}, is not necessarily a unit vector. This notation will be used mainly in analytic results in the appendixes. For the inner product of two vectors, if they are unit vectors, we denote it as 𝒗|𝒘\bra{\bm{v}}\ket{\bm{w}}. Otherwise, we use (𝒗,𝒘)(\bm{v},\bm{w}), the conventional notation in linear algebra.

This paper focuses on the approximation of a ground state of a Hamiltonian to the desired precision ϵ\epsilon using quantum and classical resources. Specifically, the problem is set up as follows

Problem 1 (Finding an approximation of a ground state).

For a given precision ϵ>0\epsilon>0 and a given Hamiltonian HN×NH\in\mathbb{C}^{N\times N}, design a quantum-classical hybrid algorithm that outputs a classical description of state vector |𝐱\ket{\bm{x}} such that

j=1g|ψGS,j|𝒙|21ϵ.\sqrt{\sum_{j=1}^{g}\absolutevalue{\bra{\psi_{GS,j}}\ket{\bm{x}}}^{2}}\geq 1-\epsilon. (1)

Here the set, {|ψGS,j}j=1g\{\ket{\psi_{GS,j}}\}_{j=1}^{g}, is the collection of degenerate ground states of HH, all of which satisfy the following optimization problem,

min𝒙=1𝒙|H|𝒙.\min_{\norm{\bm{x}}=1}\bra{\bm{x}}H\ket{\bm{x}}. (2)

The quantity (1) is the fidelity of the iterate 𝒙\bm{x} with the subspace of ground states of HH, which measures how much close the iterate is to some ground state of HH. In the case of non-degnerate ground state, the quantity (1) becomes the usual fidelity |ψGS|𝒙|\absolutevalue{\bra{\psi_{GS}}\ket{\bm{x}}}. More generally, (1) covers the case of degenerate ground states when the eigenvalues of HH are ordered as follows

λ1==λg<λg+1λN\lambda_{1}=\cdots=\lambda_{g}<\lambda_{g+1}\leq\cdots\lambda_{N} (3)

for some g2g\geq 2. For the analysis, we first work with the case of non-degenerate ground state (g=1g=1), and extend it to the case of degenerate ones (g2g\geq 2).

II.1 Matrix-element estimation from quantum polynomial filtering

A main component in our algorithm is the estimation of the matrix element of the Hamiltonian matrix polynomial p(H)p(H), for example, for any indices i,j[N]i,j\in[N],

i|p(H)|j,\bra{i}p(H)\ket{j}, (4)

where |i\ket{i} denotes the computational basis element corresponding to the integer ii, and the same for |j\ket{j}. To estimate this quantity, one can consider two approaches: one based on the Hamiltonian evolution [23] and the other based on the quantum singular value transformation (QSVT) [12].

Hamiltonian evolution is built on the ability to implement eiHte^{iHt}. We estimate a matrix polynomial of HH using a Fourier-series approximation. By the property of the Fourier expansion, we obtain the following approximation for any smooth function p(x)p(x) on [π,π][-\pi,\pi] and some sufficiently large MM,

p(x)m=MMcmeimxp(x)\approx\sum_{m=-M}^{M}c_{m}e^{imx} (5)

The coefficient cmc_{m} depends on the choice of a kernel [43]. In our applciations, we used the Fej́er kernel as discussed in Section VIII. Using this approximation and Hamiltonian simulation, we obtain access to a matrix polynomial of HH approximately,

p(H)m=MMcmeiHm.p(H)\approx\sum_{m=-M}^{M}c_{m}e^{iHm}. (6)

Especially, estimating matrix elements of p(H)p(H) and the corresponding elements of the exponentials of HH are equivalent as follows,

𝒊|p(H)|𝒋m=MMcm𝒊|eiHm|𝒋.\bra{\bm{i}}p(H)\ket{\bm{j}}\approx\sum_{m=-M}^{M}c_{m}\bra{\bm{i}}e^{iHm}\ket{\bm{j}}. (7)

To compute this matrix element, we use the Hadamard test in Fig. 2.

Alternatively, if we are given a block encoding of HH, we construct a block encoding of p(H)p(H) by combining quantum signal processing techniques [25, 12, 5]. Then, we are able to do the same task by replacing eiHte^{iHt} in Fig. 2 with Up(H)U_{p(H)}, the block encoding of p(H)p(H), and additional ancilla qubits,

\Qcircuit@C=0.9em @R=1.4em & \lstick—0⟩ \gateH \ctrl1 \gateW \gateH \qw \meter \qw
\lstick0\qw/ \gateU_i^†UU_j \qw \qw\qw \qw \qw

Figure 2: Estimating matrix elements using the Hadamard test given an access to the Hamiltonian evolution U=eiHtU=e^{iHt} or the block encoding U=Up(H)U=U_{p(H)}. H is the Hadamard gate and UiU_{i} (UjU_{j}) applies the X gate to the ii-th qubit (the jj-th qubit). WW is either II or SS^{\dagger} where SS is the phase gate.

For our algorithm, we let p(x)p(x) be a Chebyshev polynomial, which was motivated by the Chebyshev filtering method [45].

Refer to caption
Figure 3: An illustration of the Chebyshev filtering applied to a Hamiltonian matrix HH. The x-axis labels the index of the eigenvalue of HH, λ\lambda, in increasing order (e.g. the first index corresponds to the ground-state energy), and the y-axis labels the value of p(λ)p(\lambda) corresponding to the eigenvalue λ\lambda.

This method requires a priori bounds to be properly implemented. For example, we need λlb\lambda_{lb} and λub\lambda_{ub} such that

λ1<λlb<λN<λub,\lambda_{1}<\lambda_{lb}<\lambda_{N}<\lambda_{ub}, (8)

with which the given Hamiltonian is linearly transformed as

2λubλlb(Hλub)+I.\frac{2}{\lambda_{ub}-\lambda_{lb}}(H-\lambda_{ub})+I. (9)

Applying this transformation and a Chebyshev polynomial, p(x)p(x), to HH amplifies the contributions of small eigenvalues but dampens those of large eigenvalues of HH as in Fig. 3, say,

|p(λj)|={>1,jk<ϵ,j>k.|p(\lambda_{j})|=\begin{cases}>1,\quad\forall j\leq k\\ <\epsilon,\quad\forall j>k.\end{cases} (10)

Notice that the high-lying eigenvalues in [λlb,λub][\lambda_{lb},\lambda_{ub}] are mapped into [1,1][-1,1] where the Chebyshev polynomials value in [1,1][-1,1], while the low-lying eigenvalues lying in [λ1,λlb)[\lambda_{1},\lambda_{lb}) are located outside [1,1][-1,1]. Due to the fact that the Chebyshev polynomial is amplified outside [1,1][-1,1], the contributions of low-lying eigenvalues of p(H)p(H) increases rapidly.

A search of a priori bounds is an important step in the context of filtering algorithms [45, 22, 23]. In classical algorithms, the task can be completed with a combination of Rayleigh quotient computations and Lanczos algorithm as demonstrated in [45]. But this may yield an exponential arithmetic operation cost with system size. Alternatively, one may use the binary amplitude estimation with a block encoding of HH provided [22] or with an access to the Hamiltonian evolution U=eiHtU=e^{iHt} [23].

II.2 Random power method

Once we have access to the matrix polynomial p(H)p(H) as discussed above, we build an iterative algorithm in which a gradient estimator is constructed from the elements of the matrix polynomial. To do so, we propose an estimator for the matrix-vector product H𝒙H\bm{x} by borrowing the ideas of randomized coordinate approach [30, 18] and the stochastic gradient descent [3]. A more precise definition is found in (16). For simple discussion, we look at the property that the estimator in (16) is proportional to p(H)𝒙p(H)\bm{x} in the expectation. Essentially, we define an iterative algorithm as follows,

𝒙k+1=𝒙ka𝒈k.\bm{x}_{k+1}=\bm{x}_{k}-a\bm{g}_{k}. (11)

Here 𝒈k\bm{g}_{k} satisfies that

𝔼[𝒈k]p(H)𝒙k.\mathbb{E}[\bm{g}_{k}]\propto p(H)\bm{x}_{k}. (12)

For example, we sample two indices ik,jk[N]i_{k},j_{k}\in[N] first, and then estimate the corresponding matrix element to define the random vector as

𝒈k=((𝒙k)ik𝒊k|p(H)|𝒋k+ϵ)𝒆jk,\bm{g}_{k}=\left((\bm{x}_{k})_{i_{k}}\bra{\bm{i}_{k}}p(H)\ket{\bm{j}_{k}}+\epsilon\right)\bm{e}_{j_{k}}, (13)

where (𝒙k)ik(\bm{x}_{k})_{i_{k}} is the iki_{k}-th component of xkx_{k}, 𝒆jk\bm{e}_{j_{k}} is the classical computational basis vector corresponding to index jkj_{k}. For simplicity, we assume that there is no systematic error and let ϵ\epsilon denote the sampling noise. Noting that the two independent random samplings occur for indices ik,jki_{k},j_{k}, we then observe that averaging 𝒈k\bm{g}_{k} over the indices yields the desired matrix-vector product up to a scaling factor,

𝔼[𝒈k]=1N2p(H)𝒙k.\mathbb{E}[\bm{g}_{k}]=\frac{1}{N^{2}}p(H)\bm{x}_{k}. (14)

We remark that the iterative scheme (11) is typically used for uncontrained optimization problems in machine learning, but still applicable for the problem (2) which is a constrained optimization problem with normalization as demonstrated in [26, 38]. To understand why it converges, we naively consider the dynamics of the averaged iteration. Let us denote 𝒚k:=𝔼[𝒙k]\bm{y}_{k}:=\mathbb{E}[\bm{x}_{k}]. By taking expectation on the equation, we can obtain

𝒚k+1(Iap(H))𝒚k,\bm{y}_{k+1}\propto(I-ap(H))\bm{y}_{k}, (15)

which resembles the power method without normalization. With sufficiently small a>0a>0, the iteration will converge to the eigenvector corresponding to the maximum eigenvalue of the matrix Iap(H)I-ap(H) or the ground state of HH. From this viewpoint, we can expect that the iteration (11) converges to the ground state of HH on average. In fact, a stronger convergence property holds as will be shown in Theorem 4 that contrasts to the previous works [26, 38].

III Algorithm description

Gradient estimator

We extend the estimator in (13) to the one defind with more than one matrix element sampling as follows,

Definition 1.

Given an iterate vector 𝐱\bm{x} and a Hamiltonian HH, we define an estimator 𝐠\bm{g} as

𝒈=ic=1mcir=1mrxirξ(H,ir,ic)𝒆ic.\bm{g}=\sum_{i_{c}=1}^{m_{c}}\sum_{i_{r}=1}^{m_{r}}x_{i_{r}}\xi(H,i_{r},i_{c})\bm{e}_{i_{c}}. (16)

Here, mr[N]m_{r}\in[N] and mc[N]m_{c}\in[N] are distinct indices randomly sampled. Here, iri_{r} and ici_{c} correspond to the numbers of sampled indices iri_{r} and ici_{c}, respectively. The coefficient xirx_{i_{r}} denotes the iri_{r}-th component of 𝐱\bm{x}.

We note that the quantity, ξ(H,ir,ic)\xi(H,i_{r},i_{c}), is an estimate of the element of a desired matrix. For our applications, the estimate ξ\xi will approximate the element of p(H)p(H), such as (7). For the sake of simplicity, we first consider the case where ξ(H,ir,ic)=𝒊r|H|𝒊c\xi(H,i_{r},i_{c})=\bra{\bm{i}_{r}}H\ket{\bm{i}_{c}}. By the definition (16) and the definition of matrix 2-norm, we have

𝒈Hir=1mrxir2H𝒙.\norm{\bm{g}}\leq\norm{H}\sqrt{\sum_{i_{r}=1}^{m_{r}}x_{i_{r}}^{2}}\leq\norm{H}\norm{\bm{x}}. (17)

Note that the example (13) is a special case of this estimator when mr=mc=1m_{r}=m_{c}=1 without any quantum noise. The following proposition shows that the estimator (16) is proportional to H𝒙H\bm{x} in the expectation, and has the variance scaling with 𝒙2\norm{\bm{x}}^{2}.

Proposition 2.

The estimator defined in (16) satisfies the following properties

𝔼[𝒈]=mrmcN2H𝒙\displaystyle\mathbb{E}[\bm{g}]=\frac{m_{r}m_{c}}{N^{2}}H\bm{x} (18)
𝔼[𝒈𝒈]=mc(Nmc)N(N1)diag(H𝒙𝒙H)+mc(mc1)N(N1)H𝒙𝒙H+Σr,c,\displaystyle\mathbb{E}[\bm{g}\bm{g}^{*}]=\frac{m_{c}(N-m_{c})}{N(N-1)}\mathrm{diag}\left(H\bm{x}\bm{x}^{*}H\right)+\frac{m_{c}(m_{c}-1)}{N(N-1)}H\bm{x}\bm{x}^{*}H+\Sigma_{r,c}, (19)

where the matrix Σr,c\Sigma_{r,c} satisfies

tr(Σr,c)NmrNH2𝒙2.\mathrm{tr}(\Sigma_{r,c})\leq\frac{N-m_{r}}{N}\norm{H}^{2}\norm{\bm{x}}^{2}. (20)

The proof of Proposition 2 is shown in Section VIII.1.

We remark that when ξ\xi in (16) estimates the element of p(H)p(H), Proposition 2 can be restated with p(H)p(H) instead of HH.

Algorithm

As discussed in Section II, we incorporate the estimator (16) to the randomized power method (11). This outlines the algorithm as shown in Algorithm 1 that produces an approximation of a ground state and the corresponding energy estimate. As a stopping criterion, the Rayleigh quotient,

(𝒙t,H𝒙t)𝒙t2,\frac{(\bm{x}_{t},H\bm{x}_{t})}{\norm{\bm{x}_{t}}^{2}}, (21)

, is computed every iteration. To do so efficiently, we exploit the sparsity of HH and 𝒈\bm{g} in (16), otherwise the cost of a brute-force calculation of (21) scales with the dimension NN. The sparisty of HH may inherit from that HH is a linear combination of 𝒪(poly(n))\mathcal{O}(poly(n)) many Pauli strings. As a result, the estimation of (21) relies on the following recursive relations

H𝒙t=H𝒙t1aH𝒈t1\displaystyle H\bm{x}_{t}=H\bm{x}_{t-1}-aH\bm{g}_{t-1} (22a)
𝒙t+12=𝒙t2a𝒈t𝒙ta𝒙t𝒈t+a2𝒈t2\displaystyle\norm{\bm{x}_{t+1}}^{2}=\norm{\bm{x}_{t}}^{2}-a\bm{g}_{t}^{*}\bm{x}_{t}-a\bm{x}_{t}^{*}\bm{g}_{t}+a^{2}\norm{\bm{g}_{t}}^{2} (22b)
(xt+1,H𝒙t+1)=(𝒙t,H𝒙t)a(gt,H𝒙t)a(𝒙t,H𝒈t)+a2(gt,H𝒈t).\displaystyle\bm{(}x_{t+1},H\bm{x}_{t+1})=(\bm{x}_{t},H\bm{x}_{t})-a\bm{(}g_{t},H\bm{x}_{t})-a(\bm{x}_{t},H\bm{g}_{t})+a^{2}\bm{(}g_{t},H\bm{g}_{t}). (22c)

Notice that the first relation requires 𝒪(mrmc)\mathcal{O}(m_{r}m_{c}) cost, since 𝒈t1\bm{g}_{t-1} is mcm_{c}-sparse and the mrm_{r}-sparse columns of HH corresponding to 𝒈t1\bm{g}_{t-1} are added to the vector H𝒙t1H\bm{x}_{t-1}. Similarly, the second relation needs 𝒪(mc)\mathcal{O}(m_{c}) cost. The last relation takes 𝒪(mcmin{mr,mc})\mathcal{O}(m_{c}\min\{m_{r},m_{c}\}) cost as the last term on the right side dominates overall cost. Therefore, the per-iteration cost for the estimation of (21) scales only as 𝒪(mrmc)\mathcal{O}(m_{r}m_{c}).

Data: initial guess 𝒙0\bm{x}_{0}, H𝒙0H\bm{x}_{0}, step size a(0,1)a\in(0,1), sparsity parameter mrm_{r}, index parameter mcm_{c}
Result: approximate ground-state and energy
for t=0:Tt=0:T do
       Pick mr,mc[N]m_{r},m_{c}\in[N] different indices randomly;
      
      Construct 𝒈t\bm{g}_{t} in (16) based on the circuit in Fig. 2 without repetition;
      
      H𝒙t=H𝒙t1aH𝒈t1H\bm{x}_{t}=H\bm{x}_{t-1}-aH\bm{g}_{t-1};
      
      Compute the quantities in (22);
      
end for
Algorithm 1 Quantum random power method

Rigorous guarantee

In this section, we provide theoretical results for Algorithm 1. Our main result can be summarized as follows,

Theorem 3 (Implication from Theorem 4 and Lemma 5).

For any precision ϵ>0\epsilon>0, there exists a quantum algorithm that solves (1) by outputing an approximation of a ground state of HN×NH\in\mathbb{C}^{N\times N} to the precision with probability one and the required number of iterations at most 𝒪(mrmclogclog1|ψGS|𝐱1|)\mathcal{O}\left(\frac{m_{r}m_{c}}{\log c}\log\frac{1}{\absolutevalue{\bra{\psi_{GS}}\ket{\bm{x}_{1}}}}\right) where the constant c>1c>1 depends on the hyperparameters mr,mcm_{r},m_{c}, the step size aa, the precision ϵ\epsilon, and the magnitude of noise involved in the quantity ξ\xi in (16). At each iteration, the number of samples required in the quantum part of the algorithm ranges from 0 to 𝒪(1ϵ2)\mathcal{O}(\frac{1}{\epsilon^{2}}).

This theorem combines Theorem 4 and Lemma 5. Theorem 4 tells us that with probability one, Algorithm 1 converges to a ground state of a matrix that is constructed from the matrix-element estimation ξ\xi in (16). Since ξ\xi may involve noise occuring in quantum computation, the ground state, to which the iteration converges, may be a perturbation of the true one. Thus, one needs to quantify the resulting fidelity, probably not one, influenced by the noise. Related to this, Lemma 5 makes a connection between the fidelity and the magnitude of noise under a small-noise assumption.

Stochastic gradient sampling

We note that in Algorithm 1, the gradient estimate (16) may involve bias due to the use of the quantum filtering technique (10) and the sampling without repetition. This means that the matrix-element estimation is performed not for p(H)p(H), but for a perturbed one p(H)~\widetilde{p(H)}. As a consequence, Algorithm 1 converges to an approximation of ground state of HH as in the following theorem. In the next section, we provide a lower bound of the fidelity between perturbed and true ground states to ensure that the more accurate the estimator (16) is computed, the higher the achievable fidelity of Algorithm 1 is, as being arbitrarily close to one.

The following theorem guarantees convergence with probability one to a given precision if a sufficiently small step size is chosen. As usually observed in variants of the power method [26, 38], the iteration complexity has a logarithmic dependence on the overlap of the ground state. By combining Theorem 6, we present the overall query complexity for Algorithm 1.

Theorem 4.

For any precision ϵ>0\epsilon>0, there exists a step size a>0a>0 and T=𝒪(log1|ψGS|𝐱1|logcϵ,a)T=\mathcal{O}\left(\frac{\log\frac{1}{\absolutevalue{\bra{\psi_{GS}}\ket{\bm{x}_{1}}}}}{\log c_{\epsilon,a}}\right) such that Algorithm 1 outputs an iterate whose fidelity is greater than 1ϵ1-\epsilon with probability one, namely,

([rT1ϵ,rt<1ϵ for all t<T] or [rt1ϵ for some t<T])=1,\mathbb{P}\left(\left[r_{T}\geq 1-\epsilon,r_{t}<1-\epsilon\text{ for all }t<T\right]\text{ or }\left[r_{t}\geq 1-\epsilon\text{ for some }t<T\right]\right)=1, (23)

where rtr_{t} denotes the squared fidelity with the subspace of degenerate ground states, namely,

rt:=j=1g|(ψGS,j,𝒙t)|2𝒙t2.r_{t}:=\frac{\sum_{j=1}^{g}\absolutevalue{(\psi_{GS,j},\bm{x}_{t})}^{2}}{\|\bm{x}_{t}\|^{2}}. (24)

Here, the ψGS,j\psi_{GS,j}’s are the ground states of a matrix whose (ir,ic)(i_{r},i_{c})-element is determined by the estimation ξ(H,ir,ic)\xi(H,i_{r},i_{c}) in (16).

The proof of Theorem 4 is shown in Section VIII.3.

A lower bound of the fidelity between a perturbed ground state and the subspace of true ones

In Algorithm 1, the elements of a matrix polynomial are estimated without repetition. The resulting matrix, p(H)~\widetilde{p(H)}, will be one perturbed from the desired matrix p(H)p(H) by at least the sampling noise, denoted as EE, namely,

E=p(H)~p(H)>0.\norm{E}=\norm{\widetilde{p(H)}-p(H)}>0. (25)

Here, we denote an error-free matrix and noisy one by AA and A~\widetilde{A}. Suppose that these matrices are Hermitian as in the situation (25). It may be challenging to derive a perturbative relation between the ground states of AA and A~\widetilde{A} when the error EE is considered to be arbitrary. But when EE is smaller than some value, we can show that a lower bound for the fidelity between a perturbed ground state and the true ones depends only on the magnitude of noise, E\norm{E}.

Let λ1(A)\lambda_{1}(A) and λ2(A)\lambda_{2}(A) be the first and second smallest eigenvalues of AA, and similar for A~\widetilde{A}. Let E=A~AE=\widetilde{A}-A denote the error. The following lemma shows that when E\norm{E} is smaller than a certain value, the overlap of any ground state of A~\widetilde{A} with the subspace of ground states of AA increases to 11 linearly with E\norm{E}.

Lemma 5.

Assume that Ee\norm{E}\leq e for some e(0,12|λ1(A)λ2(A)|)e\in(0,\frac{1}{2}\absolutevalue{\lambda_{1}(A)-\lambda_{2}(A)}). Then, for any precision ϵ>0\epsilon>0, there exists a constant C(A,e,ϵ)>0C(\norm{A},e,\epsilon)>0 such that

ψ~|Pλ1(A)|ψ~1C(A,e,ϵ)Eϵ,\bra{\widetilde{\psi}}P_{\lambda_{1}(A)}\ket{\widetilde{\psi}}\geq 1-C(\norm{A},e,\epsilon)\norm{E}-\epsilon, (26)

where |ψ~\ket{\widetilde{\psi}} is any ground state of A~\widetilde{A}, Pλ1(A)P_{\lambda_{1}(A)} denotes the projection of AA onto its subspace of ground states, and the constant C(A,e,ϵ)C(\norm{A},e,\epsilon) depends on A\norm{A}, ee and ϵ\epsilon.

The proof of this lemma is shown in Section VIII.

IV Per-iteration computational complexity

We discuss about the per-iteration computational complexity of our algorithm. As described in Section II.1, the gradient estimator (16) can be constructed in two manners: either the Hamiltonian simulation or the block encoding.

When using Hamiltonian simulation, the estimation of 𝒊|p(H)|𝒋\bra{\bm{i}}p_{\ell}(H)\ket{\bm{j}} in (7) at each iteration of our algorithm requires the 𝒪(M)\mathcal{O}(M) values for 𝒊|eiHm|𝒋\bra{\bm{i}}e^{iHm}\ket{\bm{j}}, each of which requires Trotter steps as many as 𝒪(poly(logN)2m2)\mathcal{O}(\text{poly}(\log N)^{2}m^{2}) [8]. That is, implementing Algorithm 1 using Hamiltonian simulation requires at most 𝒪(poly(logN)2M2)\mathcal{O}(\text{poly}(\log N)^{2}M^{2}) depth and gate count for a MM-degree Fourier-series approximation. The value of MM is determined by how much accurate the Fourier-series approximation of the degree-\ell polynomial, p(x)p_{\ell}(x), should be. Thus, MM implicitly depends on \ell. Thus, the overall circuit complexity for computing 𝒊|p(H)|𝒋\bra{\bm{i}}p_{\ell}(H)\ket{\bm{j}} based on Hamiltonian simulation is 𝒪(poly(logN)2M3)\mathcal{O}(\text{poly}(\log N)^{2}M^{3}) with a single ancilla qubit for the Hadamard test in Fig. 2.

Secondly, implementing the QSVT for estimating 𝒊|p(H)|𝒋\bra{\bm{i}}p_{\ell}(H)\ket{\bm{j}} requires 𝒪()\mathcal{O}(\ell) applications of one- and two- qubit gates together with \ell applications of the block encoding of HH and its inverse, assuming a block encoding of HH [12, Theorem 31]. In our applications as shown in Section V, we considered typical physical quantum models whose Hamiltonians are written as a linear combination of Paulis. In this case, the required circuit complexity for constructing a block encoding of HH scales polylogarithmically with system size [44, Theorem 7]. Therefore, the overall circuit complexity for computing 𝒊|p(H)|𝒋\bra{\bm{i}}p_{\ell}(H)\ket{\bm{j}} using QSVT is 𝒪(poly(logN))\mathcal{O}(\ell\text{poly}(\log N)), as QSVT requires \ell applications of the block encoding of HH and its inverse.

From either of the two approaches, we estimate a fixed number of matrix elements of the form 𝒊|p(H)|𝒋\bra{\bm{i}}p_{\ell}(H)\ket{\bm{j}}, and construct the mcm_{c}-sparse gradient estimator, 𝒈t\bm{g}_{t}, at each iteration in Algorithm 1. Thus, the per-iteration classical computational cost for updating the iteration is independent of system size as scaling as 𝒪(1)\mathcal{O}(1).

V Numerical result

In this section, we present several numerical experiments to demonstrate the performance of our algorithm. For the Hamiltonian simulation approach, we work with the transverse-field Ising model (TFIM) and the XXZ model, and examine the performance of the algorithm with and without polynomial filtering methods. For the block-encoding approach, we provide numerical tests that include the TFIM and one-dimensional Hubbard model (HM). We compare convergence results that are obtained using Algorithm 1 with and without polynomial filters. Additionally, we discuss how the filtering methods alleviate the effect of noise in the two approaches.

We set up the TFIM as

H=Jj=1n1ZjZj+1+Dj=1nXj.H=J\sum_{j=1}^{n-1}Z_{j}Z_{j+1}+D\sum_{j=1}^{n}X_{j}. (27)

Here, nn denotes the number of qubits, and Xj,ZjX_{j},Z_{j} are Pauli operators acting on the jj-th qubit. The XXZ model is defined as

H=Jj=1n1(XjXj+1+YjYj+1)Dj=1n1ZjZj+1.H=J\sum_{j=1}^{n-1}(X_{j}X_{j+1}+Y_{j}Y_{j+1})-D\sum_{j=1}^{n-1}Z_{j}Z_{j+1}. (28)

The HM, which is defined on a one-dimensional lattice with open boundary condition, is expressed as

H=tσ{,}j=1n1(aj,σaj+1,σ+aj+1,σaj,σ)+Uj=1nn^j,n^j,,H=-t\sum_{\sigma\in\{\uparrow,\downarrow\}}\sum_{j=1}^{n-1}\left(a_{j,\sigma}^{\dagger}a_{j+1,\sigma}+a_{j+1,\sigma}^{\dagger}a_{j,\sigma}\right)+U\sum_{j=1}^{n}\hat{n}_{j,\uparrow}\hat{n}_{j,\downarrow}, (29)

where the parameters tt and UU account for the hopping energy and the Coulomb energy. After applying a Jordan-Wigner transformation [27], we obtain

H=t2σ{,}j=1n1(Xj+1,σXj,σ+Yj+1,σYj,σ)+U4j=1n(Zj,Zj,+Zj,Zj,).H=-\frac{t}{2}\sum_{\sigma\in\{\uparrow,\downarrow\}}\sum_{j=1}^{n-1}\left(X_{j+1,\sigma}X_{j,\sigma}+Y_{j+1,\sigma}Y_{j,\sigma}\right)+\frac{U}{4}\sum_{j=1}^{n}\left(-Z_{j,\uparrow}-Z_{j,\downarrow}+Z_{j,\uparrow}Z_{j,\downarrow}\right). (30)

For convenience, a multiple of the identity matrix in (30) is omitted, since it does not affect the optimization.

V.1 Experiment detail

For the Hamiltonian simulation approach (i.e. (27) and (28)), we simulate Algorithm 1 using Qiskit [36]. To estimate the gradient, we incorporate the Chebyshev filtering method using the first-order Trotter formula, the Fourier-series approximation with the Fej́er kernel, and Hamiltonian simulation. Thus three types of errors are involved: numerical errors due to the Fourier expansion and the Trotter formula as well as the sampling noise from the Hadamard test. For the block-encoding approach (i.e. (27) and (30)), we simulate the quantum singular value transformation (QSVT) on MATLAB by computing the eigen-decomposition of a given Hamiltonian and constructing a corresponding matrix polynomial to mimic a block encoding of a matrix polynomial of HH. In this approach, only sampling noise is considered. In all numerical experiments, we save the estimated matrix elements once and use the estimated matrix to carry out 100 independent simulations for Algorithm 1 with the same initial guess.

V.2 Hamiltonian simulation approach

We consider the models (27) and (28) for different D/JD/J with J=1J=1. For every D/JD/J, we set the same parameter values, n=2n=2, and the shot number to be 4×1054\times 10^{5}, the degree of the Fej́er kernel to be 3030, the degree of the Chebyshev filtering to be 33, and mr=mc=1m_{r}=m_{c}=1 in Algorithm 1. We use the first-order Trotter formula and set max{k2+20,50}\max\{k^{2}+20,50\} to be the number of Trotter steps when estimating matrix elements of eiHke^{iHk} where kk is a positive integer. In Fig. 4 that compares the performance of Algorithm 1 with and without polynomial filters, we measure the growing rate of the average fidelity for each D/JD/J, namely,

AA(1ϵ2)+Bϵ2,\frac{A}{A(1-\frac{\epsilon}{2})+B\frac{\epsilon}{2}}, (31)

where the constants AA and BB are

A=(1aλ1N2)2,B=12aλ2N2+a2Nmaxj1λj2.A=\left(1-\frac{a\lambda_{1}}{N^{2}}\right)^{2},\quad B=1-\frac{2a\lambda_{2}}{N^{2}}+\frac{a^{2}}{N}\max_{j\neq 1}\lambda_{j}^{2}. (32)

Note that this quantity measures how fast the average fidelity increases, according to the technical detail (101). In the following numerical results, mr=mc=1m_{r}=m_{c}=1. Here we set ϵ=0.9\epsilon=0.9. Under this setting, the quantity (31) tells us the worst-case growing rate of the average fidelity unitl it reaches the precision ϵ\epsilon. For fair comparison, we compare the cases without scheduling decreasing step sizes in the bottom panels of Fig. 4, as the step size aa is a constant. For the polynomial filtering, we set the upper and lower bounds in (8) as λub=λmax+0.2(λmaxλmin)\lambda_{ub}=\lambda_{\max}+0.2(\lambda_{\max}-\lambda_{\min}) and λlb=λmin+0.2(λmaxλmin)\lambda_{lb}=\lambda_{\min}+0.2(\lambda_{\max}-\lambda_{\min}) for the TFIM and the XXZ model.

The top panels in Fig. 4 show the convergence of our algorithm with and without polynomial filters, and with and without scheduling decreasing step sizes. The graphs show the average of fidelities obtained from the 100th iterates of 100 independent simulations with a fixed initial guess for each D/JD/J. For the two models, we can see that in all cases, the optimization with the filtering method converges much faster than the ones without it. One reason may be that the convergence rate of the fidelity is related to the growing rate of the average fidelity (31). We observe that the overall growing rate is relatively higher when the filtering is applied as shown in the bottom panels of Fig. 4. In addition, the convergence result in Fig. 4 shows that Algorithm 1 with the filtering method approximates the true ground state well, despite the additional errors involved (e.g. the Trotter error, the Fourier approximation error, and the sampling error). Together with the convergence guarantee in Theorem 4, this can be verified by checking the high fidelities in the last column of Table 1, which are the fidelity of the ground state from the errorneous matrix contructed from Algorithm 1 with the true ground state.

Regarding Lemma 5, we observe that the perturbation error (LHS), E\norm{E}, is smaller than half of the enlarged spectral gap (RHS), 12|λ1(p(H))λ2(p(H))|\frac{1}{2}\absolutevalue{\lambda_{1}(p(H))-\lambda_{2}(p(H))}, in all cases. This means that the condition in Lemma 5 is satisfied in the cases, and so is the lower bound in the lemma. Accordingly, the ordering of the first two smallest eigenvalues is preserved even in the presence of sampling noise, and moreover the interval [λ1e,λ1+e][\lambda_{1}-e,\lambda_{1}+e] filters out the smallest eigenvalue of the estimated matrix from the second smallest one.

Refer to caption
Refer to caption
Refer to caption
Refer to caption
Figure 4: Top: Comparison of the averaged fidelities obtained after 100 iterations of Algorithm 1 with and without the polynomial filter, and with and without scheduling decaying step sizes for the TFIM (left) and the XXZ model (right) for different D/JD/J. For each model, we used the same initial guess. For the optimization with step-size scheduling, we decrease the step size by a rate of 0.5 every 10 iteration steps. Bottom: Comparison of the growing rate of average fidelity for the cases with and without polynomial filtering.
D/JD/J LHS RHS λ1~\widetilde{\lambda_{1}} λ2~\widetilde{\lambda_{2}} λ1LHS\lambda_{1}-\text{LHS} λ1+LHS\lambda_{1}+\text{LHS} |ψ~|ψtrue|\absolutevalue{\bra{\widetilde{\psi}}\ket{\psi_{true}}}
0.5 1.36 2.37 -9.58 -4.45 -10.94 -8.21 0.99
0.6 1.17 2.86 -9.75 -2.84 -10.93 -8.57 0.99
0.7 1.02 3.25 -8.73 -3.02 -9.75 -7.71 0.99
0.8 0.49 3.54 -9.16 -1.96 -9.65 -8.67 0.99
0.9 0.85 3.74 -9.03 -1.09 -9.88 -8.18 0.99
1.0 0.87 3.88 -8.88 -1.30 -9.76 -8.01 0.99
1.1 0.77 3.97 -8.31 -1.02 -9.09 -7.53 0.99
1.2 1.02 4.03 -8.61 -1.05 -9.63 -7.59 0.99
1.3 0.5 4.07 -8.76 -0.83 -9.26 -8.26 0.99
1.4 0.61 4.09 -8.04 -0.42 -8.65 -7.42 0.99
1.5 0.78 4.11 -9.21 -0.59 -10.00 -8.42 0.99
(a) The TFIM
D/JD/J LHS RHS λ1~\widetilde{\lambda_{1}} λ2~\widetilde{\lambda_{2}} λ1LHS\lambda_{1}-\text{LHS} λ1+LHS\lambda_{1}+\text{LHS} |ψ~|ψtrue|\absolutevalue{\bra{\widetilde{\psi}}\ket{\psi_{true}}}
-0.1 0.49 3.28 -6.37 0.09 -6.86 -5.88 0.99
0.0 0.5 4.07 -8.03 0.08 -8.53 -7.53 0.99
0.1 0.54 5.03 -10.16 -0.28 -10.70 -9.61 0.99
0.2 0.29 6.39 -13.18 -0.72 -13.48 -12.88 0.99
0.3 0.61 8.03 -16.45 -0.37 -17.07 -15.83 0.99
0.4 0.76 9.72 -19.84 -0.22 -20.61 -19.08 0.99
0.5 0.51 11.62 -23.57 -0.61 -24.09 -23.05 0.99
0.6 0.75 13.96 -27.77 -0.17 -28.52 -27.01 0.99
0.7 0.77 16.6 -33.46 -0.27 -34.24 -32.69 0.99
0.8 0.7 19.35 -38.72 -0.39 -39.43 -38.01 0.99
0.9 0.74 22.33 -44.27 -0.10 -45.02 -43.52 0.99
(b) The XXZ model
Table 1: For each D/JD/J, the second and third columns show the spectral norm of the noise E\norm{E} and a half of the spectral gap of the noise-free matrix |λ1λ2|2\frac{\absolutevalue{\lambda_{1}-\lambda_{2}}}{2} in Lemma 5. The fourth and fifth columns show the two smallest eigenvalues of the perturbed matrix polynomial p(H)~\widetilde{p(H)}. The values in the sixth and seventh columns form an interval [λ1E,λ1+E][\lambda_{1}-\norm{E},\lambda_{1}+\norm{E}] that may contain λ1~\widetilde{\lambda_{1}} but λ2~.\widetilde{\lambda_{2}}. The last column displays the absolute values of fidelity between the perturbed ground state of p(H)~\widetilde{p(H)} and the true one of HH.

V.3 Block encoding approach

In this section, we consider the TFIM (27) with J=1,D=1.5J=1,D=1.5 and the Hubbard model (30) in weak (t=1,U=1t=1,U=1) and strong coupling regimes (t=1,U=6t=1,U=6). we set n=10n=10, the parameters mr=20,mc=20m_{r}=20,m_{c}=20 in Algorithm 1, and the degree of the Chebyshev polynomial to be 77 for the TFIM and 88 for the Hubbard model. We note that there is a non-degenerate ground state for the cases of (27) and (30) in the weak-coupling regime, while there are two degenerate ground states in the strong-coupling regime for (30). For this case, we measure the fidelity as follows,

|(ψGS,1,𝒙t)|2+|(ψGS,2,𝒙t)|2𝒙t,\frac{\sqrt{\absolutevalue{(\psi_{GS,1},\bm{x}_{t})}^{2}+\absolutevalue{(\psi_{GS,2},\bm{x}_{t})}^{2}}}{\norm{\bm{x}_{t}}}, (33)

where the ψGS,i\psi_{GS,i}’s for i=1,2i=1,2 denote two orthogonal ground states. This is a special case of more general result in A lower bound of the fidelity between a perturbed ground state and the subspace of true ones, which considers the number of ground states to be 22. For the polynomial filtering, we set the upper and lower bounds in (8) as λub=λmax+0.1(λmaxλmin)\lambda_{ub}=\lambda_{\max}+0.1(\lambda_{\max}-\lambda_{\min}) and λlb=λmin+0.03(λmaxλmin)\lambda_{lb}=\lambda_{\min}+0.03(\lambda_{\max}-\lambda_{\min}) for the TFIM and the Hubbard model in the weak-coupling regime, and the same upperbound and λlb=λmin+0.02(λmaxλmin)\lambda_{lb}=\lambda_{\min}+0.02(\lambda_{\max}-\lambda_{\min}) for the HM in the strong-coupling regime. These lower bounds, λlb\lambda_{lb}’s, seperate λ1\lambda_{1} and λ2\lambda_{2} for the TFIM model, λ3\lambda_{3} and λ4\lambda_{4} for the HM in the weak-coupling regime, and λ15\lambda_{15} and λ16\lambda_{16} for the HM in the strong-coupling regime, respectively.

Fig. 5 shows the effect of the polynomial filter and the shot number on the convergence of Algorithm 1. One can observe that the number of shots influences the performance of Algorithm 1. For example, the result with the largest shot number is much better than the others. Additionally, from the bottom panels of Fig. 5, it is evident that the use of the polynomial filter improves overall performance of Algorithm 1, exhibiting better convergence than the result without it. Importantly, the result in Fig. 5 shows high fidelities, which means that Algorithm 1 with the filtering method provides a fairly good approximation even with the sampling noise. By the convergence theorem Theorem 4, this can be understood by looking at the fidelity of approximate ground states with the true ones as in the last columns in Table 2.

As discussed in the previous section, we compare the perturbation error (LHS) to the enlarged spectral gap (RHS). For each of the three models, the condition in Lemma 5 is not satisfied in some cases where the LHS is larger than the RHS, but the interval [λ1e,λ1+e][\lambda_{1}-e,\lambda_{1}+e] filters the smallest perturbed eigenvalue λ1~\widetilde{\lambda_{1}} from the second one, which means that the ordering of the first two smallest eigenvalues is preserved even in the presence of sampling noise. However, this observation implies that the condition in Lemma 5 can be restrictive in practice. It would be interesting to find a more relaxed condition, and we leave this to future work.

For the Hubbard model in a strong coupling regime, among 100 simulation results, Fig. 6 in Section VIII.4 plots 12 that display the respective fidelities with the two degenerate ground states and the mean-square sum with normalization as defined in (1). The graphs mean that our method converges to a ground state as shown in Theorem 4.

Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Figure 5: Top: comparison of the Rayleigh quotients obtained from Algorithm 1 when using the polynomial filter with different shot numbers (nshot=105(orange), 4×105(blue), 4×106(green)n_{shot}=10^{5}(\text{orange}),\;4\times 10^{5}(\text{blue}),\;4\times 10^{6}(\text{green})) and without the filter (yellow) for the TFIM (left), the HM in a weak coupling regime (middle), and the HM in a strong regime (right). In all optimization results, we decrease the step size by a rate of 0.8 every 500 iteration steps. Bottom: comparison of the fidelity of the iteration with the true ground state or the subspace of true ground states. For each of the three models, we use a fixed initial guess.
nshotn_{shot} LHS RHS λ1~\widetilde{\lambda_{1}} λ2~\widetilde{\lambda_{2}} λ1LHS\lambda_{1}-\text{LHS} λ1+LHS\lambda_{1}+\text{LHS} |ψ~|ψtrue|\absolutevalue{\bra{\widetilde{\psi}}\ket{\psi_{true}}}
10510^{5} 2.11 2.10 -5.41 -2.46 -7.31 -3.09 0.95
4×1054\times 10^{5} 1.05 2.10 -5.25 -1.59 -6.25 -4.15 0.98
4×1064\times 10^{6} 0.33 2.10 -5.21 -1.12 -5.53 -4.87 0.99
(a) The TFIM
nshotn_{shot} LHS RHS λ1~\widetilde{\lambda_{1}} λ2~\widetilde{\lambda_{2}} λ1LHS\lambda_{1}-\text{LHS} λ1+LHS\lambda_{1}+\text{LHS} |ψ~|ψtrue|\absolutevalue{\bra{\widetilde{\psi}}\ket{\psi_{true}}}
10510^{5} 3.09 2.91 -7.62 -3.37 -10.32 -4.13 0.95
4×1054\times 10^{5} 1.54 2.91 -7.29 -2.02 -8.77 -5.68 0.98
4×1064\times 10^{6} 0.48 2.91 -7.24 -1.44 -7.72 -6.74 0.99
(b) The HBM weak
nshotn_{shot} LHS RHS λ1~\widetilde{\lambda_{1}} λ2~\widetilde{\lambda_{2}} λ1LHS\lambda_{1}-\text{LHS} λ1+LHS\lambda_{1}+\text{LHS} ψB,1|P1|ψB,1\bra{\psi_{B,1}}P_{1}\ket{\psi_{B,1}} ψB,2|P1|ψB,2\bra{\psi_{B,2}}P_{1}\ket{\psi_{B,2}}
10510^{5} 2.04 0.93 -4.72 -3.01 -6.48 -2.39 0.95 0.94
4×1054\times 10^{5} 1.01 0.93 -4.53 -2.71 -5.45 -3.42 0.98 0.98
4×1064\times 10^{6} 0.32 0.93 -4.45 -2.58 -4.76 -4.11 0.99 0.99
(c) The HBM strong
Table 2: For each nshotn_{shot}, the second and third columns show the spectral norm of the noise E\norm{E} and a half of the spectral gap of the noise-free matrix |λ1λ2|2\frac{\absolutevalue{\lambda_{1}-\lambda_{2}}}{2} in Lemma 5. The fourth and fifth columns show the two smallest eigenvalues of the perturbed matrix polynomial p(H)~\widetilde{p(H)}. The values in the sixth and seventh columns form an interval [λ1E,λ1+E[\lambda_{1}-\norm{E},\lambda_{1}+\norm{E} that may contain λ1~\widetilde{\lambda_{1}} but λ2~.\widetilde{\lambda_{2}}. The last column displays the absolute values of fidelity between the perturbed ground state of p(H)~\widetilde{p(H)} and the true one of HH for the TFIM and the Hubbard model in a weak-coupling regime. For the Hubbard in a strong-coupling regime, we measure the fidelity of the two perturbed ground states of p(H)~\widetilde{p(H)} with the subspace of true ones of HH, on which the orthogonal projection P1P_{1} is defined.

VI Conclusion

In this work, we propose a quantum-classical hybrid random power method for ground-state computation. Unlike the previous inexact power methods [7, 26, 38], our method updates the iteration by estimating p(H)𝒙p(H)\bm{x} in the expectation, where p(x)p(x) is a filtering polynomial. Here we used the Chebyshev polynomial [45] for the filtering polynomial. The estimation of p(H)𝒙p(H)\bm{x} is based on a randomized technique that requires only 𝒪(1)\mathcal{O}(1) classical time and a polylogarithmic dependence of quantum circuit depth on the system size. To perform the estimation, our method implements a Hadamard test, given an access to either Hamiltonian simulation or block encoding, which is equivalent to sampling the elements of the matrix p(H)p(H) without repetition. More importantly, to any desired precision, we show the existence of a fixed quantum circuit for which our algorithm outputs an approximation of ground state to the precision with probability one, as stated in Theorem 3. As a part of the theorem, we prove that a lower bound of the fidelity of the approximate ground state with the subspace of true ones scales linearly with the magnitude of noise, if it is less than half of the enlarged spectral gap of p(H)p(H).

The quantum filtering part of our method provides a computational advantage that may not be obtained classically. For example, estimating the elements of p(H)p(H) with only classical computations requires a cost dependent on the dimension of the Hamiltonian. In contrast, doing so with the quantum filtering technique in our method may result in only a logarithmic dependence on resource. More importantly, the filtering technique with filtering bounds being not necessarily tight provides much better convergence behaviors as demonstrated with the numerical results in Section V.

There are a few questions to be further investigated. For instance, can we prove a tighter lower bound than one in Lemma 5 for the high fidelity against the perturbation error? Could we derive the variance of the fidelity in Theorem 8 to explain much better stability with polynomial filtering in the numerical results in Section V? Another question is whether we can incorporate the phase estimation algorithms [31, 10] into our method to achieve a Heisenberg-scaling limit in samping matrix elements.

VII Acknowledgment

We would like to thank Dr. Mancheon Han for helpful advice on numerical simulations. This research was supported by Quantum Simulator Development Project for Materials Innovation through the National Research Foundation of Korea (NRF) funded by the Korean government (Ministry of Science and ICT(MSIT))(No. NRF- 2023M3K5A1094813). We used resources of the Center for Advanced Computa- tion at Korea Institute for Advanced Study and the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility operated under Contract No. DE-AC02-05CH11231. S.C. was supported by a KIAS Individual Grant (CG090601) at Korea Institute for Advanced Study. T.K. is supported by a KIAS Individual Grant (No. CG096001) at Korea Institute for Advanced Study.

VIII Appendices

The proof of Lemma 5

We use an outcome of the Weyl’s inequality, which states that for Hermitians A,BN×NA,B\in\mathbb{C}^{N\times N} and any k{1,2,..,N}k\in\{1,2,..,N\},

|λk(A+B)λk(B)|B,\absolutevalue{\lambda_{k}(A+B)-\lambda_{k}(B)}\leq\norm{B}, (34)

where the λk(A)\lambda_{k}(A) denotes the kk-th smallest eigenvalue of AA and similarly for A+BA+B and BB.

To prove Lemma 5, we claim that

Pλ1(A~)=𝕀[λ1(A)e,λ1(A)+e](A~),P_{\lambda_{1}(\widetilde{A})}=\mathbb{I}_{[\lambda_{1}(A)-e,\lambda_{1}(A)+e]}\left(\widetilde{A}\right), (35)

where 𝕀\mathbb{I} is the indicator function that values one on the given interval and zero otherwise. To show this, we check that λ1(A~)[λ1(A)e,λ1(A)+e]\lambda_{1}(\widetilde{A})\in[\lambda_{1}(A)-e,\lambda_{1}(A)+e], and λ2(A~)[λ1(A)e,λ1(A)+e]\lambda_{2}(\widetilde{A})\not\in[\lambda_{1}(A)-e,\lambda_{1}(A)+e]. By (34), λ1(A~)[λ1(A)e,λ1(A)+e]\lambda_{1}(\widetilde{A})\in[\lambda_{1}(A)-e,\lambda_{1}(A)+e]. Again, by (34), λ2(A~)λ2(A)e\lambda_{2}(\widetilde{A})\geq\lambda_{2}(A)-e. By the assumption, λ2(A)e>λ1(A)+e\lambda_{2}(A)-e>\lambda_{1}(A)+e. Thus, the claim is proven.

Notice that the interval [λ1(A)e,λN(A)+e][\lambda_{1}(A)-e,\lambda_{N}(A)+e] includes the spectra of AA and A~\widetilde{A} by the Weyl’s inequality, where λN(A)\lambda_{N}(A) is the largest eigenvalue of AA. Restricting to this interval, by the Stone–Weierstrass theorem, we can take a sequence of polynomials {pn(x)}n=1\{p_{n}(x)\}_{n=1}^{\infty} that uniformly converges to the indicator function in (35). Then for any ϵ\epsilon, there exists a n>0n>0 such that pnp_{n} is ϵ2\frac{\epsilon}{2}-close to the indicator function on the interval in the uniform norm. Thus, it immediately follows that

𝕀(A~)pn(A~)ϵ2,𝕀(A)pn(A)ϵ2,\norm{\mathbb{I}(\widetilde{A})-p_{n}(\widetilde{A})}\leq\frac{\epsilon}{2},\quad\norm{\mathbb{I}(A)-p_{n}(A)}\leq\frac{\epsilon}{2}, (36)

where 𝕀:=𝕀[λ1(A)e,λ1(A)+e]\mathbb{I}:=\mathbb{I}_{[\lambda_{1}(A)-e,\lambda_{1}(A)+e]}. Together with the above claim, we observe

Pλ1(A~)=𝕀(A~)=pn(A~)+𝕀(A~)pn(A~)=pn(A)+Rn(A,E)+𝕀(A~)pn(A~)=𝕀(A)+Rn(A,E)+pn(A)𝕀(A)+𝕀(A~)pn(A~)=Pλ1(A)+Rn(A,E)+pn(A)𝕀(A)+𝕀(A~)pn(A~)\begin{split}P_{\lambda_{1}(\widetilde{A})}&=\mathbb{I}(\widetilde{A})\\ &=p_{n}(\widetilde{A})+\mathbb{I}(\widetilde{A})-p_{n}(\widetilde{A})\\ &=p_{n}(A)+R_{n}(A,E)+\mathbb{I}(\widetilde{A})-p_{n}(\widetilde{A})\\ &=\mathbb{I}(A)+R_{n}(A,E)+p_{n}(A)-\mathbb{I}(A)+\mathbb{I}(\widetilde{A})-p_{n}(\widetilde{A})\\ &=P_{\lambda_{1}(A)}+R_{n}(A,E)+p_{n}(A)-\mathbb{I}(A)+\mathbb{I}(\widetilde{A})-p_{n}(\widetilde{A})\end{split} (37)

where Rn(A,E):=pn(A~)pn(A)R_{n}(A,E):=p_{n}(\widetilde{A})-p_{n}(A). We note that 𝕀(A)=Pλ1(A)\mathbb{I}(A)=P_{\lambda_{1}(A)} by the selection of ee. In addition, Rn(A,E)CE\norm{R_{n}(A,E)}\leq C\norm{E}, since all terms in Rn(A,E)R_{n}(A,E) have at least one EE as a factor, and CC is determined by A\norm{A}, ee, and ϵ\epsilon, based on the triangle inequality argument. Let |𝒒~=j=1gβj|𝒒~j\ket{\widetilde{\bm{q}}}=\sum_{j=1}^{g}\beta_{j}\ket{\widetilde{\bm{q}}_{j}} be a ground state of A~\widetilde{A}. By applying the braket of the state |𝒒~\ket{\widetilde{\bm{q}}} to (37), we therefore obtain

1𝒒~|Pλ1(A)|𝒒~+CE+ϵ,1\leq\bra{\widetilde{\bm{q}}}P_{\lambda_{1}(A)}\ket{\widetilde{\bm{q}}}+C\norm{E}+\epsilon, (38)

which completes the proof.

Approximate Chebyshev filtering via the Fej́er kernel

To implement the Chebyshev filtering [45] with the Fourier expansion and Hamiltonian simulation, we first transform the Hamiltonian as (9), and normalize the transformed matrix such that its spectrum belongs to [π,π][-\pi,\pi], which is required to simulate Hamiltonian. Denote the transformed matrix by H¯\overline{H}. Let α\alpha be set as H¯/απ\norm{\overline{H}/\alpha}\leq\pi.

Our goal is to estimate the matrix elements of p(H)p_{\ell}(H) for the Chebyshev polynomial of degree \ell, p(x)p_{\ell}(x), using the Fourier expansion with the Fej́er kernel. The Fourier expansion with the Fej́er kernel of order KK for p(αx)p_{\ell}(\alpha x) is represented as a convolution form

p(αx)k=KKckeikx=(p(α())SK)(x),p_{\ell}(\alpha x)\approx\sum_{k=-K}^{K}c_{k}e^{ikx}=(p_{\ell}(\alpha(\cdot))*S_{K})(x), (39)

where ck=Kk+12πKππp(αx)eikx𝑑xc_{k}=\frac{K-k+1}{2\pi K}\int_{-\pi}^{\pi}p_{\ell}(\alpha x)e^{-ikx}dx, and SK(x)=sin2(Kx/2)Ksin2(x/2)S_{K}(x)=\frac{\sin^{2}\left(Kx/2\right)}{K\sin^{2}(x/2)}. By the Fej́er’s theorem, the RHS converges uniformly to the LHS as KK\rightarrow\infty.

The ckc_{k} can be computed via a recursive relation for the following auxiliary variable,

c(k,n)=12πππxneikx𝑑x.c(k,n)=\frac{1}{2\pi}\int_{-\pi}^{\pi}x^{n}e^{-ikx}dx. (40)

By the integral of parts, we have

c(k,n+1)=i2πk[πn+1eikπ(π)n+1eikπ]+(n+1)ikc(k,n)c(0,n)={0πnn+1.\begin{split}c(k,n+1)&=\frac{i}{2\pi k}\left[\pi^{n+1}e^{-ik\pi}-(-\pi)^{n+1}e^{ik\pi}\right]+\frac{(n+1)}{ik}c(k,n)\\ c(0,n)&=\begin{cases}0\\ \frac{\pi^{n}}{n+1}\end{cases}.\end{split} (41)

Then, we compute ckc_{k} as follows,

ck=Kk+1Kj=0αjdjc(k,j),c_{k}=\frac{K-k+1}{K}\sum_{j=0}^{\ell}\alpha^{j}d_{j}c(k,j), (42)

where djd_{j} is the jj-th coefficient of the Chebyshev polynomial p(x)p_{\ell}(x).

By properties of the Fej́er kernel, we obtain that for any differentiable function ff and every xx\in\mathbb{R},

|fSK(x)f(x)|12πππ|SK(y)||f(xy)f(x)|𝑑y=|y|<δ|SK(y)||f(xy)f(x)|𝑑y+δ|y|π|SK(y)||f(xy)f(x)|𝑑yδsup|xy|<δ|f(x)|+2Csup|xy|π|f(x)|Kδ2,\begin{split}\absolutevalue{f*S_{K}(x)-f(x)}&\leq\frac{1}{2\pi}\int_{-\pi}^{\pi}\absolutevalue{S_{K}(y)}\absolutevalue{f(x-y)-f(x)}dy\\ &=\int_{\absolutevalue{y}<\delta}\absolutevalue{S_{K}(y)}\absolutevalue{f(x-y)-f(x)}dy+\int_{\delta\leq\absolutevalue{y}\leq\pi}\absolutevalue{S_{K}(y)}\absolutevalue{f(x-y)-f(x)}dy\\ &\leq\delta\sup_{\absolutevalue{x-y}<\delta}\absolutevalue{f^{\prime}(x)}+\frac{2C\sup_{\absolutevalue{x-y}\leq\pi}\absolutevalue{f(x)}}{K\delta^{2}},\end{split} (43)

for sufficiently small δ\delta and some C>0C>0 depending on δ\delta. By letting δ=Θ(ϵ)\delta=\Theta(\epsilon) and K=𝒪(ϵ3)K=\mathcal{O}(\epsilon^{3}), it follows that |fSK(x)f(x)|=𝒪(ϵ)\absolutevalue{f*S_{K}(x)-f(x)}=\mathcal{O}(\epsilon).

Once the ckc_{k} is computed, we can estimate the matrix elements of p(H)p_{\ell}(H) by replacing xx with H¯/α\overline{H}/\alpha in (39) and performing the Hadamard test in Fig. 2.

Complexity of evaluating matrix elements using block encoding

The definition of a block-encoding of HH is as follows: for a matrix HH with Hα1\norm{\frac{H}{\alpha}}\leq 1, we say that UHU_{H} is an (α,a,ϵ)(\alpha,a,\epsilon)-block-encoding of HH if UHU_{H} is a (m+a)(m+a)-qubit unitary, and

Hα(0a|I)UH(|0aI)ϵ.\displaystyle\norm{H-\alpha(\bra*{0^{\otimes a}}\otimes I)U_{H}(\ket*{0^{\otimes a}}\otimes I)}\leq\epsilon. (44)

For a given block-encoding of a Hermitian matrix, the quantum singular value transformation (QSVT) [12] prepares a corresponding matrix polynomial to be encoded into the left corner of a unitary, namely,

Up(H)/C=(p(H)/C),U_{p_{\ell}(H)/C}=\begin{pmatrix}p_{\ell}(H)/C&\dotproduct\\ \dotproduct&\dotproduct\\ \end{pmatrix}, (45)

where CC normalizes the matrix polynomial p(H)p_{\ell}(H). Together with [12, Theorem 31], we obtain the following overall complexity for the estimation of matrix element,

Theorem 6 (Overall complexity).

For any tolerance ϵ>0\epsilon>0 and failure probability δ>0\delta>0, if the sample size is S=4C2ϵ2log4δS=\frac{4C^{2}}{\epsilon^{2}}\log\frac{4}{\delta}, then the Hadamard test in Fig. 2 outputs an ϵ\epsilon-close unbiased estimate ξ\xi of the matrix element 𝐢|p(H)|𝐣\bra{\bm{i}}p_{\ell}(H)\ket{\bm{j}} with confidence 1δ1-\delta, namely,

𝔼[ξ]=𝒊|p(H)|𝒋,\displaystyle\mathbb{E}[\xi]=\bra{\bm{i}}p_{\ell}(H)\ket{\bm{j}}, (46)
(|ξ𝒊|p(H)|𝒋|ϵ)δ.\displaystyle\mathbb{P}\left(\absolutevalue{\xi-\bra{\bm{i}}p_{\ell}(H)\ket{\bm{j}}}\geq\epsilon\right)\leq\delta. (47)

Each sample is obtained from \ell applications of UHU_{H} and UHU_{H}^{\dagger}, one application of controlled-UHU_{H} gate, and 𝒪()\mathcal{O}(\ell) applications of other one- and two-qubit gates.

Proof.

Define a random variable

ξ=1S(j=1Szj+izj+S),\xi=\frac{1}{S}\left(\sum_{j=1}^{S}z_{j}+iz_{j+S}\right), (48)

where the zjz_{j}’s and zj+Sz_{j+S} are independent random variables in {1,1}\{-1,1\} obtained from 2S2S applications of the Hadamard test Fig. 2 by letting W=IW=I and SS^{\dagger}, respectively. The values 11 and 1-1 correspond to the cases that the state in the first qubit results in 0 and 11, respectively. Notice that when W=IW=I, the probability of state in the ancilla qubit in Fig. 2 satisfies

12(0)=Re(0n+m+1|(IXi)Up(H)(IXj)|0n+m+1)=Re(0n+m+1|(IXi)|0(|0mp(H)|𝒋+|))=Re(0m|𝒊|(|0mp(H)|𝒋+|))=Re(𝒊|p(H)|𝒋),\begin{split}1-2\mathbb{P}\left(\text{0}\right)&=\mathrm{Re}\left(\bra{0^{n+m+1}}(I\otimes X_{i}^{\dagger})U_{p(H)}(I\otimes X_{j})\ket{0^{n+m+1}}\right)\\ &=\mathrm{Re}\left(\bra{0^{n+m+1}}(I\otimes X_{i}^{\dagger})\ket{0}\left(\ket{0^{m}}p(H)\ket{\bm{j}}+\ket{\perp}\right)\right)\\ &=\mathrm{Re}\left(\bra{0^{m}}\bra{\bm{i}}\left(\ket{0^{m}}p(H)\ket{\bm{j}}+\ket{\perp}\right)\right)\\ &=\mathrm{Re}\left(\bra{\bm{i}}p(H)\ket{\bm{j}}\right),\end{split} (49)

where |\ket{\perp} is perpendicular to any state of the form |0m|𝒙\ket{0^{m}}\ket{\bm{x}}. Similarly, the imaginary part of 𝒊|p(H)|𝒋\bra{\bm{i}}p(H)\ket{\bm{j}} can be estimated by setting W=SW=S^{\dagger} in Fig. 2. We observe that ξ\xi is a sample mean for the matrix element, and therefore we arrive at

𝔼[ξ]=𝒊|p(H)|𝒋.\mathbb{E}[\xi]=\bra{\bm{i}}p_{\ell}(H)\ket{\bm{j}}. (50)

By the Hoeffding’s inequality, we have

(|Re(ξ)Re(𝒊|p(H)|𝒋)|ϵ)2exp(Sϵ22).\mathbb{P}\left(\absolutevalue{\mathrm{Re}(\xi)-\mathrm{Re}\left(\bra{\bm{i}}p_{\ell}(H)\ket{\bm{j}}\right)}\geq\epsilon\right)\leq 2\exp\left(-\frac{S\epsilon^{2}}{2}\right). (51)

The same inequality applies to the case of the imaginary part of 𝒊|p(H)|𝒋\bra{\bm{i}}p_{\ell}(H)\ket{\bm{j}}. Since the inequality that |a+bixyi|ϵ\absolutevalue{a+bi-x-yi}\geq\epsilon for real a,b,x,ya,b,x,y implies that |ax|ϵ2\absolutevalue{a-x}\geq\frac{\epsilon}{\sqrt{2}} or |by|ϵ2\absolutevalue{b-y}\geq\frac{\epsilon}{\sqrt{2}}, we have

(|ξ𝒊|p(H)|𝒋|ϵ)4exp(Sϵ24).\mathbb{P}\left(\absolutevalue{\xi-\bra{\bm{i}}p_{\ell}(H)\ket{\bm{j}}}\geq\epsilon\right)\leq 4\exp\left(-\frac{S\epsilon^{2}}{4}\right). (52)

In the case that p(H)=p(H¯)Cp(H)=\frac{p_{\ell}(\overline{H})}{C}, it immediately follows

(|ξ~𝒊|p(H)|𝒋|ϵ)4exp(Sϵ24C2),\mathbb{P}\left(\absolutevalue{\widetilde{\xi}-\bra{\bm{i}}p_{\ell}(H)\ket{\bm{j}}}\geq\epsilon\right)\leq 4\exp\left(-\frac{S\epsilon^{2}}{4C^{2}}\right), (53)

where ξ~\widetilde{\xi} is defined similar to (48) with random variables chosen from {C,C}\{-C,C\} and CC defined in (45).

An exponential growth of the spectral gap with the degree of Chebyshev polynomial

In this section, we show that the Chebyshev polynomial filtering, p(H)p_{\ell}(H), enlarges the spectral gap of H¯\overline{H} exponentially with respect to degree \ell. Recall that H¯\overline{H} denotes the transformed Hamiltonian in (9). In the following, we denote the eigenvalue of H¯\overline{H} by λ¯\overline{\lambda} according to (9).

Proposition 7.

Assume that λlb(λ1,λ2]\lambda_{lb}\in(\lambda_{1},\lambda_{2}] and λub>λN\lambda_{ub}>\lambda_{N}. Then,

|p(λ1)|=Θ((|λ¯1|+|λ¯1|21)).\absolutevalue{p_{\ell}(\lambda_{1})}=\Theta\left(\left(\absolutevalue{\overline{\lambda}_{1}}+\sqrt{\absolutevalue{\overline{\lambda}_{1}}^{2}-1}\right)^{\ell}\right). (54)

Especially,

|p(λ1)p(λ2)|=Θ((|λ¯1|+|λ¯1|21)),\absolutevalue{p_{\ell}(\lambda_{1})-p_{\ell}(\lambda_{2})}=\Theta\left(\left(\absolutevalue{\overline{\lambda}_{1}}+\sqrt{\absolutevalue{\overline{\lambda}_{1}}^{2}-1}\right)^{\ell}\right), (55)

where λ¯=2λubλlb(λλub)+1\overline{\lambda}=\frac{2}{\lambda_{ub}-\lambda_{lb}}(\lambda-\lambda_{ub})+1 and λ¯1>1\overline{\lambda}_{1}>1. With the relaxed condition that λlb(λ2,λN]\lambda_{lb}\in(\lambda_{2},\lambda_{N}], the following hold

|p(λ1)|=Θ((|λ¯1|+|λ¯1|21))\displaystyle\absolutevalue{p_{\ell}(\lambda_{1})}=\Theta\left(\left(\absolutevalue{\overline{\lambda}_{1}}+\sqrt{\absolutevalue{\overline{\lambda}_{1}}^{2}-1}\right)^{\ell}\right) (56)
|p(λ2)|=Θ((|λ¯2|+|λ¯2|21)),\displaystyle\absolutevalue{p_{\ell}(\lambda_{2})}=\Theta\left(\left(\absolutevalue{\overline{\lambda}_{2}}+\sqrt{\absolutevalue{\overline{\lambda}_{2}}^{2}-1}\right)^{\ell}\right), (57)

and

|p(λ1)p(λ2)|=Θ((|λ¯1|+|λ¯1|21)).\absolutevalue{p_{\ell}(\lambda_{1})-p_{\ell}(\lambda_{2})}=\Theta\left(\left(\absolutevalue{\overline{\lambda}_{1}}+\sqrt{\absolutevalue{\overline{\lambda}_{1}}^{2}-1}\right)^{\ell}\right). (58)
Proof.

We use the identity for the Chebyshev polynomial p(x)p_{\ell}(x),

p(x)=(xx21)+(x+x21)2.p_{\ell}(x)=\frac{(x-\sqrt{x^{2}-1})^{\ell}+(x+\sqrt{x^{2}-1})^{\ell}}{2}. (59)

We consider the first case that λlb\lambda_{lb} is given in (λ1,λ2](\lambda_{1},\lambda_{2}]. Notice that in this case, λ¯2(1,1)\overline{\lambda}_{2}\in(-1,1) and λ¯1<1\overline{\lambda}_{1}<-1 where λ¯:=2λubλlb(λλub)+1\overline{\lambda}:=\frac{2}{\lambda_{ub}-\lambda_{lb}}(\lambda-\lambda_{ub})+1. Then we have

|p(λ2)|1,\absolutevalue{p_{\ell}(\lambda_{2})}\leq 1, (60)

and

|p(λ1)|=|p(|λ1|)|12||λ¯1|+|λ¯1|21|.\absolutevalue{p_{\ell}(\lambda_{1})}=\absolutevalue{p_{\ell}(\absolutevalue{\lambda_{1}})}\geq\frac{1}{2}\absolutevalue{\absolutevalue{\overline{\lambda}_{1}}+\sqrt{\absolutevalue{\overline{\lambda}_{1}}^{2}-1}}^{\ell}. (61)

Thus, the triangle inequality implies

|p(λ1)p(λ2)|12||λ¯1|+|λ¯1|21|1.\begin{split}\absolutevalue{p_{\ell}(\lambda_{1})-p_{\ell}(\lambda_{2})}&\geq\frac{1}{2}\absolutevalue{\absolutevalue{\overline{\lambda}_{1}}+\sqrt{\absolutevalue{\overline{\lambda}_{1}}^{2}-1}}^{\ell}-1.\end{split} (62)

Similarly, one can check the second case that λlb\lambda_{lb} is given in (λ2,λN](\lambda_{2},\lambda_{N}], and thus λ¯1<λ¯2<1\overline{\lambda}_{1}<\overline{\lambda}_{2}<-1, by using the identity (59). ∎

VIII.1 The proof of Proposition 2

We show the first property as

𝔼[𝒈]=𝔼c[𝔼r[𝔼ξ[𝒈]]]=𝔼c[ic=1mcmrN(𝒙,H𝒆ic)𝒆ic]=mrmcN2H𝒙.\mathbb{E}[\bm{g}]=\mathbb{E}_{c}\left[\mathbb{E}_{r}[\mathbb{E}_{\xi}[\bm{g}]]\right]=\mathbb{E}_{c}[\sum_{i_{c}=1}^{m_{c}}\frac{m_{r}}{N}(\bm{x},H\bm{e}_{i_{c}})\bm{e}_{i_{c}}]=\frac{m_{r}m_{c}}{N^{2}}H\bm{x}. (63)

The first equality holds since there are three types of randomness: sampling non-zero components of 𝒙\bm{x}, computational basis element 𝒆ic\bm{e}_{i_{c}}, and estimating the corresponding matrix element. In the second equality, we notice that the noise from quantum computation is averaged out and that

𝔼[ir=1mrxir𝒆ir]=(N1mr1)(Nmr)𝒙=mrN𝒙.\mathbb{E}[\sum_{i_{r}=1}^{m_{r}}x_{i_{r}}\bm{e}_{i_{r}}]=\frac{{N-1\choose m_{r}-1}}{{N\choose m_{r}}}\bm{x}=\frac{m_{r}}{N}\bm{x}. (64)

Similarly, the third equality holds by considering the factor (N1mc1)(Nmc)\frac{{N-1\choose m_{c}-1}}{{N\choose m_{c}}}.

Next we prove the second property. We first consider the case mr=Nm_{r}=N. In this case, we can rewrite the estimator as

𝒈=ic=1mc(𝒙,H𝒆ic)𝒆ic.\bm{g}=\sum_{i_{c}=1}^{m_{c}}(\bm{x},H\bm{e}_{i_{c}})\bm{e}_{i_{c}}. (65)

We observe that

𝔼[𝒈𝒈]=𝔼[(ic(𝒙,H𝒆ic)𝒆ic)(ic(𝒙,H𝒆ic)𝒆ic)]=𝔼[ic,jc(𝒙,H𝒆ic)(H𝒆jc,𝒙)𝒆ic𝒆jc]=𝔼[ic|(𝒙,H𝒆ic)|2𝒆ic𝒆ic+icjc(𝒙,H𝒆ic)(H𝒆jc,𝒙)𝒆ic𝒆jc]=(N1mc1)(Nmc)diag(H𝒙𝒙H)+(N2mc2)(Nmc)(H𝒙𝒙Hdiag(H𝒙𝒙H))=mc(Nmc)N(N1)diag(H𝒙𝒙H)+mc(mc1)N(N1)H𝒙𝒙H.\begin{split}\mathbb{E}[\bm{g}\bm{g}^{*}]&=\mathbb{E}\left[\left(\sum_{i_{c}}(\bm{x},H\bm{e}_{i_{c}})\bm{e}_{i_{c}}\right)\left(\sum_{i_{c}}(\bm{x},H\bm{e}_{i_{c}})\bm{e}_{i_{c}}\right)^{*}\right]\\ &=\mathbb{E}\left[\sum_{i_{c},j_{c}}(\bm{x},H\bm{e}_{i_{c}})(H\bm{e}_{j_{c}},\bm{x})\bm{e}_{i_{c}}\bm{e}_{j_{c}}^{*}\right]\\ &=\mathbb{E}\left[\sum_{i_{c}}\absolutevalue{(\bm{x},H\bm{e}_{i_{c}})}^{2}\bm{e}_{i_{c}}\bm{e}_{i_{c}}^{*}+\sum_{i_{c}\neq j_{c}}(\bm{x},H\bm{e}_{i_{c}})(H\bm{e}_{j_{c}},\bm{x})\bm{e}_{i_{c}}\bm{e}_{j_{c}}^{*}\right]\\ &=\frac{{N-1\choose m_{c}-1}}{{N\choose m_{c}}}\mathrm{diag}(H\bm{x}\bm{x}^{*}H)+\frac{{N-2\choose m_{c}-2}}{{N\choose m_{c}}}(H\bm{x}\bm{x}^{*}H-\mathrm{diag}(H\bm{x}\bm{x}^{*}H))\\ &=\frac{m_{c}(N-m_{c})}{N(N-1)}\mathrm{diag}\left(H\bm{x}\bm{x}^{*}H\right)+\frac{m_{c}(m_{c}-1)}{N(N-1)}H\bm{x}\bm{x}^{*}H.\end{split} (66)

By this result, we show the second property for the case that mrNm_{r}\leq N, obtaining that

𝔼[𝒈𝒈]=𝔼c[𝔼r[𝒈𝒈]]=𝔼c[𝔼r[(𝒈𝔼r[𝒈])(𝒈𝔼r[𝒈])]+𝔼r[𝒈]𝔼r[𝒈]]=𝔼c[𝔼r[(𝒈𝔼r[𝒈])(𝒈𝔼r[𝒈])]]+mc(Nmc)N(N1)diag(H𝒙𝒙H)+mc(mc1)N(N1)H𝒙𝒙H,\begin{split}\mathbb{E}[\bm{g}\bm{g}^{*}]&=\mathbb{E}_{c}\left[\mathbb{E}_{r}[\bm{g}\bm{g}^{*}]\right]\\ &=\mathbb{E}_{c}\left[\mathbb{E}_{r}[(\bm{g}-\mathbb{E}_{r}[\bm{g}])(\bm{g}-\mathbb{E}_{r}[\bm{g}])^{*}]+\mathbb{E}_{r}[\bm{g}]\mathbb{E}_{r}[\bm{g}]^{*}\right]\\ &=\mathbb{E}_{c}\left[\mathbb{E}_{r}[(\bm{g}-\mathbb{E}_{r}[\bm{g}])(\bm{g}-\mathbb{E}_{r}[\bm{g}])^{*}]\right]\\ &+\frac{m_{c}(N-m_{c})}{N(N-1)}\mathrm{diag}\left(H\bm{x}\bm{x}^{*}H\right)+\frac{m_{c}(m_{c}-1)}{N(N-1)}H\bm{x}\bm{x}^{*}H,\end{split} (67)

where the conditional expectation w.r.t rr, 𝔼r[𝒈]\mathbb{E}_{r}[\bm{g}], depends on the sampled column index cc. In the second and third equalities, we used the identities 𝔼[𝒗𝒗]=𝔼[(𝒗𝔼[𝒗])(𝒗𝔼[𝒗])]+𝔼[𝒗]𝔼[𝒗]\mathbb{E}[\bm{v}\bm{v}^{*}]=\mathbb{E}[(\bm{v}-\mathbb{E}[\bm{v}])(\bm{v}-\mathbb{E}[\bm{v}])^{*}]+\mathbb{E}[\bm{v}]\mathbb{E}[\bm{v}]^{*} and 𝔼r[𝒈]=ic=1mc(𝒙,H𝒆ic)𝒆ic\mathbb{E}_{r}[\bm{g}]=\sum_{i_{c}=1}^{m_{c}}(\bm{x},H\bm{e}_{i_{c}})\bm{e}_{i_{c}}. Denote Σr,c=𝔼c[𝔼r[(𝒈𝔼r[𝒈])(𝒈𝔼r[𝒈])]]\Sigma_{r,c}=\mathbb{E}_{c}[\mathbb{E}_{r}\left[(\bm{g}-\mathbb{E}_{r}[\bm{g}])(\bm{g}-\mathbb{E}_{r}[\bm{g}])^{*}\right]]. Using this, we achieve that

tr(Σr,c)=𝔼c[𝔼r[ic|(𝒙r𝒙,H𝒆ic)|2]]H2𝔼r[𝒙r𝒙2]=H2(N1mr)(Nmr)𝒙2=NmrNH2𝒙2.\begin{split}\mathrm{tr}(\Sigma_{r,c})&=\mathbb{E}_{c}\left[\mathbb{E}_{r}\left[\sum_{i_{c}}\absolutevalue{(\bm{x}_{r}-\bm{x},H\bm{e}_{i_{c}})}^{2}\right]\right]\\ &\leq\norm{H}^{2}\mathbb{E}_{r}[\norm{\bm{x}_{r}-\bm{x}}^{2}]\\ &=\norm{H}^{2}\frac{{N-1\choose m_{r}}}{{N\choose m_{r}}}\norm{\bm{x}}^{2}\\ &=\frac{N-m_{r}}{N}\norm{H}^{2}\norm{\bm{x}}^{2}\end{split}. (68)

VIII.2 Convergence for the case of exact computation with mr=N,mc=1m_{r}=N,m_{c}=1 in (16)

We consider the simple case that (16) is estimated exactly with mr=N,mc=1m_{r}=N,m_{c}=1. The analysis of this case will be extended to the case of noisy computation (16) in Section VIII.3. Let us define an estimate as follows,

𝒈=(H𝒙,𝒆is)𝒆is,\bm{g}=(H\bm{x},\bm{e}_{i_{s}})\bm{e}_{i_{s}}, (69)

where isi_{s} denotes the index randomly selected from {1,..,N}\{1,..,N\}. By definition, notice that

𝔼[𝒈]=1NH𝒙.\mathbb{E}[\bm{g}]=\frac{1}{N}H\bm{x}. (70)
Theorem 8.

Assume the estimator (69). For any precision ϵ>0\epsilon>0 and T=𝒪(1logcϵ,alog1|(ψGS,𝐱1)|)T=\mathcal{O}\left(\frac{1}{\log c_{\epsilon,a}}\log\frac{1}{\absolutevalue{(\psi_{GS},\bm{x}_{1})}}\right), the ovelap of the last iterate from Algorithm 1 with the ground state is greater than 1ϵ1-\epsilon with probability one, namely,

(|(ψGS,𝒙T)|2𝒙T2>1ϵ)=1.\mathbb{P}\left(\frac{\absolutevalue{(\psi_{GS},\bm{x}_{T})}^{2}}{\|\bm{x}_{T}\|^{2}}>1-\epsilon\right)=1. (71)
Proof.

We observe that the Cauchy-Schwarz inequality yields

𝔼[|(ψGS,𝒙k+1)|2𝒙k+12]|𝔼[(ψGS,𝒙k+1)]|2𝔼[𝒙k+12]=|𝔼[(ψGS,𝒙k+1)]|2j=1N𝔼[|(ψGS,j,𝒙k+1)|2],\begin{split}\mathbb{E}\left[\frac{\absolutevalue{(\psi_{GS},\bm{x}_{k+1})}^{2}}{\|\bm{x}_{k+1}\|^{2}}\right]&\geq\frac{\absolutevalue{\mathbb{E}[(\psi_{GS},\bm{x}_{k+1})]}^{2}}{\mathbb{E}[\|\bm{x}_{k+1}\|^{2}]}\\ &=\frac{\absolutevalue{\mathbb{E}[(\psi_{GS},\bm{x}_{k+1})]}^{2}}{\sum_{j=1}^{N}\mathbb{E}[\absolutevalue{(\psi_{GS,j},\bm{x}_{k+1})}^{2}]},\end{split} (72)

assuming deterministically non-zero denominators. For sufficiently small step size a>0a>0, the norm of 𝒙k\bm{x}_{k} is bounded below by some positive value deterministically. We can easily check this as

𝒈k=(H𝒙k,𝒆ik)𝒆ik=|(H𝒙k,𝒆ik)|H𝒙kH𝒙k𝒙k(1Ha)k1>0,\begin{split}&\norm{\bm{g}_{k}}=\norm{(H\bm{x}_{k},\bm{e}_{i_{k}})\bm{e}_{i_{k}}}=\absolutevalue{(H\bm{x}_{k},\bm{e}_{i_{k}})}\leq\norm{H\bm{x}_{k}}\leq\norm{H}\norm{\bm{x}_{k}}\\ &\Longrightarrow\norm{\bm{x}_{k}}\geq(1-\norm{H}a)^{k-1}>0,\end{split} (73)

if a<1Ha<\frac{1}{\norm{H}}. In the last line, we used the recursive relation (11), the initial condition 𝒙1=1\norm{\bm{x}_{1}}=1, and the triangle inequality. The following analysis is performed assuming a<1Ha<\frac{1}{\norm{H}}.

First, we estimate the nominator and the denominator of the right side in (72). From (11) and (69), notice that

𝔼[(ψGS,𝒙k+1)]=(ψGS,𝔼[𝒙k+1])=(ψGS,𝒙k)a(ψGS,𝔼[𝒈k])=(ψGS,𝒙k)a(ψGS,1NH𝒙k)=(1λ1Na)(ψGS,𝒙k).\begin{split}&\mathbb{E}[(\psi_{GS},\bm{x}_{k+1})]\\ &=(\psi_{GS},\mathbb{E}[\bm{x}_{k+1}])\\ &=(\psi_{GS},\bm{x}_{k})-a(\psi_{GS},\mathbb{E}[\bm{g}_{k}])\\ &=(\psi_{GS},\bm{x}_{k})-a(\psi_{GS},\frac{1}{N}H\bm{x}_{k})\\ &=\left(1-\frac{\lambda_{1}}{N}a\right)(\psi_{GS},\bm{x}_{k}).\end{split} (74)

In addition, we have

𝔼k[|(ψGS,j,𝒙k+1)|2]=|(ψGS,j,𝒙k)|22a𝐑𝐞[(ψGS,j,𝒙k)𝔼k[(𝒈k,ψGS,j)]]+a2𝔼k[|(ψGS,j,𝒈k)|2].\mathbb{E}_{k}[\absolutevalue{(\psi_{GS,j},\bm{x}_{k+1})}^{2}]=\absolutevalue{(\psi_{GS,j},\bm{x}_{k})}^{2}-2a\mathbf{Re}[(\psi_{GS,j},\bm{x}_{k})\mathbb{E}_{k}[(\bm{g}_{k},\psi_{GS,j})]]+a^{2}\mathbb{E}_{k}[\absolutevalue{(\psi_{GS,j},\bm{g}_{k})}^{2}]. (75)

Let’s estimate the middle and last terms. First, by definition (69), the middle term reduces to

(ψGS,j,𝒙k)𝔼k[(𝒈k,ψGS,j)]=λjN|(ψGS,j,𝒙k)|2.(\psi_{GS,j},\bm{x}_{k})\mathbb{E}_{k}[(\bm{g}_{k},\psi_{GS,j})]=\frac{\lambda_{j}}{N}\absolutevalue{(\psi_{GS,j},\bm{x}_{k})}^{2}. (76)

To estimate the last term, we take two steps. First we simplify the following outer product

𝔼k[𝒈k𝒈k]=𝔼k[(H𝒙k,𝒆ik)(𝒆ik,H𝒙k)𝒆ik𝒆ik]=1Ndiag(H𝒙k𝒙kH)\begin{split}\mathbb{E}_{k}[\bm{g}_{k}\bm{g}_{k}^{*}]&=\mathbb{E}_{k}\left[(H\bm{x}_{k},\bm{e}_{i_{k}})(\bm{e}_{i_{k}},H\bm{x}_{k})\bm{e}_{i_{k}}\bm{e}_{i_{k}}^{*}\right]=\frac{1}{N}\mathrm{diag}(H\bm{x}_{k}\bm{x}_{k}^{*}H)\end{split} (77)

Then, the last term can be simplifed to

𝔼k[|(ψGS,j,𝒈k)|2]=1NψGS,jdiag(H𝒙k𝒙kH)ψGS,j.\mathbb{E}_{k}[\absolutevalue{(\psi_{GS,j},\bm{g}_{k})}^{2}]=\frac{1}{N}\psi_{GS,j}^{*}\mathrm{diag}\left(H\bm{x}_{k}\bm{x}_{k}^{*}H\right)\psi_{GS,j}. (78)

To put these all together, we have

𝔼k[|(ψGS,j,𝒙k+1)|2]=bj|(ψGS,j,𝒙k)|2+a2NψGS,jdiag(H𝒙k𝒙kH)ψGS,j,\begin{split}\mathbb{E}_{k}[\absolutevalue{(\psi_{GS,j},\bm{x}_{k+1})}^{2}]=b_{j}\absolutevalue{(\psi_{GS,j},\bm{x}_{k})}^{2}+\frac{a^{2}}{N}\psi_{GS,j}^{*}\mathrm{diag}\left(H\bm{x}_{k}\bm{x}_{k}^{*}H\right)\psi_{GS,j},\end{split} (79)

where

bj=12Nλja.b_{j}=1-\frac{2}{N}\lambda_{j}a. (80)

Notice that

bjbj+1>0b_{j}\geq b_{j+1}>0 (81)

whenever a>0a>0 with aa being sufficiently small. From (79), we observe that

𝔼k[𝒙k+12]=j=1N𝔼k[|(ψGS,j,𝒙k+1)|2]=j=1Nbj|(ψGS,j,𝒙k)|2+a2Nj=1Ntr(ψGS,jdiag(H𝒙k𝒙kH)ψGS,j)=jbj|(ψGS,j,𝒙k)|2+a2Ntr(diag(H𝒙k𝒙kH)jψGS,jψGS,j)=jbj|(ψGS,j,𝒙k)|2+a2Ntr(H𝒙k𝒙kH)=jbj|(ψGS,j,𝒙k)|2+a2NH𝒙k2.\begin{split}&\mathbb{E}_{k}[\norm{\bm{x}_{k+1}}^{2}]\\ &=\sum_{j=1}^{N}\mathbb{E}_{k}[\absolutevalue{(\psi_{GS,j},\bm{x}_{k+1})}^{2}]\\ &=\sum_{j=1}^{N}b_{j}\absolutevalue{(\psi_{GS,j},\bm{x}_{k})}^{2}+\frac{a^{2}}{N}\sum_{j=1}^{N}\mathrm{tr}\left(\psi_{GS,j}^{*}\mathrm{diag}\left(H\bm{x}_{k}\bm{x}_{k}^{*}H\right)\psi_{GS,j}\right)\\ &=\sum_{j}b_{j}\absolutevalue{(\psi_{GS,j},\bm{x}_{k})}^{2}+\frac{a^{2}}{N}\mathrm{tr}\left(\mathrm{diag}\left(H\bm{x}_{k}\bm{x}_{k}^{*}H\right)\sum_{j}\psi_{GS,j}\psi_{GS,j}^{*}\right)\\ &=\sum_{j}b_{j}\absolutevalue{(\psi_{GS,j},\bm{x}_{k})}^{2}+\frac{a^{2}}{N}\mathrm{tr}\left(H\bm{x}_{k}\bm{x}_{k}^{*}H\right)\\ &=\sum_{j}b_{j}\absolutevalue{(\psi_{GS,j},\bm{x}_{k})}^{2}+\frac{a^{2}}{N}\norm{H\bm{x}_{k}}^{2}.\end{split} (82)

Observe that

H𝒙k2=j=1Nλj2|(ψGS,j,𝒙k)|2λ12|(ψGS,𝒙k)|2+maxj1λj2(𝒙k2|(ψGS,𝒙k)|2).\norm{H\bm{x}_{k}}^{2}=\sum_{j=1}^{N}\lambda_{j}^{2}\absolutevalue{(\psi_{GS,j},\bm{x}_{k})}^{2}\leq\lambda_{1}^{2}\absolutevalue{(\psi_{GS},\bm{x}_{k})}^{2}+\max_{j\neq 1}\lambda_{j}^{2}\left(\norm{\bm{x}_{k}}^{2}-\absolutevalue{(\psi_{GS},\bm{x}_{k})}^{2}\right). (83)

With these observations, we derive a recursive relation from the inequality (72) as,

𝔼k[|(ψGS,𝒙k+1)|2𝒙k+12]|𝔼k[(ψGS,𝒙k+1)]|2𝔼k[𝒙k+12]=(1λ1Na)2|(ψGS,𝒙k)|2jbj|(ψGS,j,𝒙k)|2+a2NH𝒙k2(74),(82)A|(ψGS,𝒙k)|2xk2A|(ψGS,𝒙k)|2xk2+B(1|(ψGS,𝒙k)|2xk2)+a2(N1)N2λ12(81),(83),\begin{split}\mathbb{E}_{k}\left[\frac{\absolutevalue{(\psi_{GS},\bm{x}_{k+1})}^{2}}{\|\bm{x}_{k+1}\|^{2}}\right]&\geq\frac{\absolutevalue{\mathbb{E}_{k}[(\psi_{GS},\bm{x}_{k+1})]}^{2}}{\mathbb{E}_{k}[\norm{\bm{x}_{k+1}}^{2}]}\\ &=\frac{\left(1-\frac{\lambda_{1}}{N}a\right)^{2}\absolutevalue{(\psi_{GS},\bm{x}_{k})}^{2}}{\sum_{j}b_{j}\absolutevalue{(\psi_{GS,j},\bm{x}_{k})}^{2}+\frac{a^{2}}{N}\norm{H\bm{x}_{k}}^{2}}\quad\eqref{eq: nominator},\eqref{eq: ubnorm}\\ &\geq\frac{A\frac{\absolutevalue{(\psi_{GS},\bm{x}_{k})}^{2}}{\norm{x_{k}}^{2}}}{A\frac{\absolutevalue{(\psi_{GS},\bm{x}_{k})}^{2}}{\norm{x_{k}}^{2}}+B\left(1-\frac{\absolutevalue{(\psi_{GS},\bm{x}_{k})}^{2}}{\norm{x_{k}}^{2}}\right)+\frac{a^{2}(N-1)}{N^{2}}\lambda_{1}^{2}}\quad\eqref{eq: monotone b},\eqref{eq: residual},\end{split} (84)

where

A=(1λ1Na)2\displaystyle A=\left(1-\frac{\lambda_{1}}{N}a\right)^{2} (85a)
B=b2+a2Nmaxj1λj2.\displaystyle B=b_{2}+\frac{a^{2}}{N}\max_{j\neq 1}\lambda_{j}^{2}. (85b)

Let us denote rk:=|(ψGS,𝒙k)|2𝒙k2r_{k}:=\frac{\absolutevalue{(\psi_{GS},\bm{x}_{k})}^{2}}{\|\bm{x}_{k}\|^{2}}. Note that rk[0,1]r_{k}\in[0,1] for any kk by definition. With this notation, the above inequality is rewritten as

𝔼k[rk+1]AArk+B(1rk)+a2Nλ12rk.\mathbb{E}_{k}[r_{k+1}]\geq\frac{A}{Ar_{k}+B(1-r_{k})+\frac{a^{2}}{N}\lambda_{1}^{2}}r_{k}. (86)

Under the spectral gap assumption that λ1<λ2\lambda_{1}<\lambda_{2}, we observe that

AB=2aN(λ2λ1)a2N(maxj1λj2λ12N)>aN(λ2λ1)>0\begin{split}A-B&=\frac{2a}{N}(\lambda_{2}-\lambda_{1})-\frac{a^{2}}{N}(\max_{j\neq 1}\lambda_{j}^{2}-\frac{\lambda_{1}^{2}}{N})>\frac{a}{N}(\lambda_{2}-\lambda_{1})>0\end{split} (87)

for any a(0,λ2λ1|maxj1λj2λ12N|)a\in\left(0,\frac{\lambda_{2}-\lambda_{1}}{\absolutevalue{\max_{j\neq 1}\lambda_{j}^{2}-\frac{\lambda_{1}^{2}}{N}}}\right).

For ϵ>0\epsilon>0, we consider the event that rk1ϵr_{k}\leq 1-\epsilon. In this event, if we set a<(λ2λ1)ϵ2λ12a<\sqrt{\frac{(\lambda_{2}-\lambda_{1})\epsilon}{2\lambda_{1}^{2}}}, then it follows from (87),

a2<(λ2λ1)ϵ2λ12<N(AB)ϵ2λ12.a^{2}<\frac{(\lambda_{2}-\lambda_{1})\epsilon}{2\lambda_{1}^{2}}<\frac{N(A-B)\epsilon}{2\lambda_{1}^{2}}. (88)

From this, the ratio in (86) satisfies

AArk+B(1rk)+a2Nλ12AA(1ϵ)+Bϵ+a2Nλ12>AA(1ϵ)+Bϵ+(AB)ϵ2=AA(1ϵ2)+Bϵ2>1,\begin{split}\frac{A}{Ar_{k}+B(1-r_{k})+\frac{a^{2}}{N}\lambda_{1}^{2}}&\geq\frac{A}{A(1-\epsilon)+B\epsilon+\frac{a^{2}}{N}\lambda_{1}^{2}}\\ &>\frac{A}{A(1-\epsilon)+B\epsilon+\frac{(A-B)\epsilon}{2}}\\ &=\frac{A}{A(1-\frac{\epsilon}{2})+B\frac{\epsilon}{2}}>1,\end{split} (89)

since B<AB<A. Denote the ratio as

cϵ:=AA(1ϵ2)+Bϵ2.c_{\epsilon}:=\frac{A}{A(1-\frac{\epsilon}{2})+B\frac{\epsilon}{2}}. (90)

Let Rk:=rk𝕀kR_{k}:=r_{k}\mathbb{I}_{k} where 𝕀k\mathbb{I}_{k} values one if rj1ϵr_{j}\leq 1-\epsilon for all jkj\leq k and zero otherwise. By this definition, it follows that conditioned on k\mathcal{F}_{k}, where the σ\sigma-algebra k\mathcal{F}_{k} is associated with the stochastic process up to kk,

𝔼k[Rk+1]cϵRk.\mathbb{E}_{k}[R_{k+1}]\geq c_{\epsilon}R_{k}. (91)

This implies

𝔼[RT]cϵT1|ψ1|𝒙1|2.\mathbb{E}[R_{T}]\geq c_{\epsilon}^{T-1}\absolutevalue{\bra{\psi_{1}}\ket{\bm{x}_{1}}}^{2}. (92)

For T=𝒪(logcϵ1ϵ|ψ1|𝒙1|2)T=\mathcal{O}\left(\log_{c_{\epsilon}}\frac{1-\epsilon}{\absolutevalue{\bra{\psi_{1}}\ket{\bm{x}_{1}}}^{2}}\right), it should be that RT=1ϵR_{T}=1-\epsilon with probability one, since Rk1ϵR_{k}\leq 1-\epsilon for all kk by definition. Therefore, this implies that rT1ϵr_{T}\geq 1-\epsilon with probability one conditioned on that rk<1ϵr_{k}<1-\epsilon for all k<Tk<T, or it could happen that rt1ϵr_{t}\geq 1-\epsilon for some t<Tt<T.

For the case of degenerate ground states, let {ψGS,j}j=1g\{\psi_{GS,j}\}_{j=1}^{g} be the ground states and λ1==λg<λg+1λN\lambda_{1}=\cdots=\lambda_{g}<\lambda_{g+1}\leq\cdots\lambda_{N}. By summing (84) over the ground states, we achieve that

𝔼k[j=1g|(ψGS,j,𝒙k+1)|2𝒙k+12]j=1g|𝔼k[(ψGS,j,𝒙k+1)]|2𝔼k[𝒙k+12]=(1λ1Na)2j=1g|(ψGS,j,𝒙k)|2j=1Nbj|(ψGS,j,𝒙k)|2+a2NH𝒙k2Aj=1g|(ψGS,j,𝒙k)|2𝒙k2Aj=1g|(ψGS,j,𝒙k)|2𝒙k2+B(1j=1g|(ψGS,j,𝒙k)|2𝒙k2)+a2(N1)N2λ12,\begin{split}\mathbb{E}_{k}\left[\frac{\sum_{j=1}^{g}\absolutevalue{(\psi_{GS,j},\bm{x}_{k+1})}^{2}}{\|\bm{x}_{k+1}\|^{2}}\right]&\geq\frac{\sum_{j=1}^{g}\absolutevalue{\mathbb{E}_{k}[(\psi_{GS,j},\bm{x}_{k+1})]}^{2}}{\mathbb{E}_{k}[\norm{\bm{x}_{k+1}}^{2}]}\\ &=\frac{\left(1-\frac{\lambda_{1}}{N}a\right)^{2}\sum_{j=1}^{g}\absolutevalue{(\psi_{GS,j},\bm{x}_{k})}^{2}}{\sum_{j=1}^{N}b_{j}\absolutevalue{(\psi_{GS,j},\bm{x}_{k})}^{2}+\frac{a^{2}}{N}\norm{H\bm{x}_{k}}^{2}}\\ &\geq\frac{A\frac{\sum_{j=1}^{g}\absolutevalue{(\psi_{GS,j},\bm{x}_{k})}^{2}}{\norm{\bm{x}_{k}}^{2}}}{A\frac{\sum_{j=1}^{g}\absolutevalue{(\psi_{GS,j},\bm{x}_{k})}^{2}}{\norm{\bm{x}_{k}}^{2}}+B\left(1-\frac{\sum_{j=1}^{g}\absolutevalue{(\psi_{GS,j},\bm{x}_{k})}^{2}}{\norm{\bm{x}_{k}}^{2}}\right)+\frac{a^{2}(N-1)}{N^{2}}\lambda_{1}^{2}},\end{split} (93)

where

A=(1λ1Na)2\displaystyle A=\left(1-\frac{\lambda_{1}}{N}a\right)^{2} (94a)
B=bg+1+a2Nmaxjg+1λj2,\displaystyle B=b_{g+1}+\frac{a^{2}}{N}\max_{j\geq g+1}\lambda_{j}^{2}, (94b)

and bjb_{j} is defined in (80). By denoting rk=j=1g|(ψGS,j,𝒙k)|2𝒙k2r_{k}=\frac{\sum_{j=1}^{g}\absolutevalue{(\psi_{GS,j},\bm{x}_{k})}^{2}}{\norm{\bm{x}_{k}}^{2}}, we can obtain a similar recursive inequality as (86). Additionally, from the assumption that λ1==λg<λg+1\lambda_{1}=\cdots=\lambda_{g}<\lambda_{g+1}, we obtain

AB=2aN(λg+1λ1)a2N(maxjg+1λj2λ12N)>aN(λg+1λ1)>0A-B=\frac{2a}{N}(\lambda_{g+1}-\lambda_{1})-\frac{a^{2}}{N}(\max_{j\geq g+1}\lambda_{j}^{2}-\frac{\lambda_{1}^{2}}{N})>\frac{a}{N}(\lambda_{g+1}-\lambda_{1})>0 (95)

for any a(0,λg+1λ1|maxjg+1λj2λ12N|)a\in\left(0,\frac{\lambda_{g+1}-\lambda_{1}}{\absolutevalue{\max_{j\geq g+1}\lambda_{j}^{2}-\frac{\lambda_{1}^{2}}{N}}}\right). Therefore, the proof is complete similar to the above proof.

VIII.3 The proof of Theorem 4

We prove Theorem 4 which generalizes Theorem 8 to the case mr<Nm_{r}<N. We consider the estimator (16) and show convergence of Algorithm 1. By Proposition 2, we notice that

𝔼[(ψGS,𝒙k+1)]=(ψGS,𝔼[𝒙k+1])=(ψGS,𝒙k)a(ψGS,𝔼[𝒈k])=(ψGS,𝒙k)a(ψGS,mrmcN2H𝒙k)=(1mrmcλ1N2a)(ψGS,𝒙k),\begin{split}&\mathbb{E}[(\psi_{GS},\bm{x}_{k+1})]\\ &=(\psi_{GS},\mathbb{E}[\bm{x}_{k+1}])\\ &=(\psi_{GS},\bm{x}_{k})-a(\psi_{GS},\mathbb{E}[\bm{g}_{k}])\\ &=(\psi_{GS},\bm{x}_{k})-a(\psi_{GS},\frac{m_{r}m_{c}}{N^{2}}H\bm{x}_{k})\\ &=\left(1-\frac{m_{r}m_{c}\lambda_{1}}{N^{2}}a\right)(\psi_{GS},\bm{x}_{k}),\end{split} (96)

and

𝔼k[(ψGS,j,𝒙k)(𝒈k,ψGS,j)]=(ψGS,j,𝒙k)𝔼k[(𝒈k,ψGS,j)]=(ψGS,j,𝒙k)(mrmcN2H𝒙k,ψGS,j)=mrmcλjN2|(ψGS,j,𝒙k)|2.\begin{split}\mathbb{E}_{k}[(\psi_{GS,j},\bm{x}_{k})(\bm{g}_{k},\psi_{GS,j})]&=(\psi_{GS,j},\bm{x}_{k})\mathbb{E}_{k}[(\bm{g}_{k},\psi_{GS,j})]\\ &=(\psi_{GS,j},\bm{x}_{k})(\frac{m_{r}m_{c}}{N^{2}}H\bm{x}_{k},\psi_{GS,j})\\ &=\frac{m_{r}m_{c}\lambda_{j}}{N^{2}}\absolutevalue{(\psi_{GS,j},\bm{x}_{k})}^{2}.\end{split} (97)

Additionally, it follows that

𝔼k[|(ψGS,j,𝒈k)|2]=mc(Nmc)N(N1)ψGS,jdiag(H𝒙k𝒙kH)ψGS,j+λj2mc(mc1)N(N1)|(ψGS,j,𝒙k)|2+ψGS,jΣr,cψGS,j,\begin{split}&\mathbb{E}_{k}[\absolutevalue{(\psi_{GS,j},\bm{g}_{k})}^{2}]=\frac{m_{c}(N-m_{c})}{N(N-1)}\psi_{GS,j}^{*}\mathrm{diag}\left(H\bm{x}_{k}\bm{x}_{k}^{*}H\right)\psi_{GS,j}+\frac{\lambda_{j}^{2}m_{c}(m_{c}-1)}{N(N-1)}\absolutevalue{(\psi_{GS,j},\bm{x}_{k})}^{2}+\psi_{GS,j}^{*}\Sigma_{r,c}\psi_{GS,j},\end{split} (98)

and as we did in (82), we achieve that

𝔼k[𝒙k+12]=jbj|(ψGS,j,𝒙k)|2+mc(Nmc)N(N1)a2H𝒙k2+a2tr(Σr,c),\mathbb{E}_{k}[\norm{\bm{x}_{k+1}}^{2}]=\sum_{j}b_{j}\absolutevalue{(\psi_{GS,j},\bm{x}_{k})}^{2}+\frac{m_{c}(N-m_{c})}{N(N-1)}a^{2}\norm{H\bm{x}_{k}}^{2}+a^{2}\mathrm{tr}(\Sigma_{r,c}), (99)

where bj=12aλjmrmcN2+a2λj2mc(mc1)N(N1)b_{j}=1-\frac{2a\lambda_{j}m_{r}m_{c}}{N^{2}}+a^{2}\frac{\lambda_{j}^{2}m_{c}(m_{c}-1)}{N(N-1)} and Σr,c\Sigma_{r,c} is defined in Proposition 2. Then, we have

𝔼k[|(ψGS,𝒙k+1)|2𝒙k+12]|𝔼k[(ψGS,𝒙k+1)]|2𝔼k[𝒙k+12]=(1mrmcλ1N2a)2|(ψGS,𝒙k)|2jbj|(ψGS,j,𝒙k)|2+mc(Nmc)N(N1)a2H𝒙k2+a2tr(Σr,c)A|(ψGS,𝒙k)|2𝒙k2A|(ψGS,𝒙k)|2𝒙k2+B(1|(ψGS,𝒙k)|2𝒙k2)+[(1mr2mcN3)mcλ12N+tr(Σr,c)𝒙k2]a2\begin{split}\mathbb{E}_{k}\left[\frac{\absolutevalue{(\psi_{GS},\bm{x}_{k+1})}^{2}}{\|\bm{x}_{k+1}\|^{2}}\right]&\geq\frac{\absolutevalue{\mathbb{E}_{k}[(\psi_{GS},\bm{x}_{k+1})]}^{2}}{\mathbb{E}_{k}[\norm{\bm{x}_{k+1}}^{2}]}\\ &=\frac{\left(1-\frac{m_{r}m_{c}\lambda_{1}}{N^{2}}a\right)^{2}\absolutevalue{(\psi_{GS},\bm{x}_{k})}^{2}}{\sum_{j}b_{j}\absolutevalue{(\psi_{GS,j},\bm{x}_{k})}^{2}+\frac{m_{c}(N-m_{c})}{N(N-1)}a^{2}\norm{H\bm{x}_{k}}^{2}+a^{2}\mathrm{tr}(\Sigma_{r,c})}\\ &\geq\frac{A\frac{\absolutevalue{(\psi_{GS},\bm{x}_{k})}^{2}}{\norm{\bm{x}_{k}}^{2}}}{A\frac{\absolutevalue{(\psi_{GS},\bm{x}_{k})}^{2}}{\norm{\bm{x}_{k}}^{2}}+B\left(1-\frac{\absolutevalue{(\psi_{GS},\bm{x}_{k})}^{2}}{\norm{\bm{x}_{k}}^{2}}\right)+\left[\left(1-\frac{m_{r}^{2}m_{c}}{N^{3}}\right)\frac{m_{c}\lambda_{1}^{2}}{N}+\frac{\mathrm{tr}(\Sigma_{r,c})}{\norm{\bm{x}_{k}}^{2}}\right]a^{2}}\end{split} (100)

where

A=(1mrmcλ1N2a)2\displaystyle A=\left(1-\frac{m_{r}m_{c}\lambda_{1}}{N^{2}}a\right)^{2} (101a)
B=b2+mc(Nmc)a2N(N1)maxj1λj2.\displaystyle B=b_{2}+\frac{m_{c}(N-m_{c})a^{2}}{N(N-1)}\max_{j\neq 1}\lambda_{j}^{2}. (101b)

In this case, the difference between these two constants is

AB=2mrmcaN2(λ2λ1)+(mr2mc2λ12N4λ22mc(mc1)N(N1)mc(Nmc)N(N1)maxj1λj2)a2mrmcaN2(λ2λ1)+mr2mc2λ12N4a2>0,\begin{split}A-B&=\frac{2m_{r}m_{c}a}{N^{2}}(\lambda_{2}-\lambda_{1})+\left(\frac{m_{r}^{2}m_{c}^{2}\lambda_{1}^{2}}{N^{4}}-\frac{\lambda_{2}^{2}m_{c}(m_{c}-1)}{N(N-1)}-\frac{m_{c}(N-m_{c})}{N(N-1)}\max_{j\neq 1}\lambda_{j}^{2}\right)a^{2}\\ &\geq\frac{m_{r}m_{c}a}{N^{2}}(\lambda_{2}-\lambda_{1})+\frac{m_{r}^{2}m_{c}^{2}\lambda_{1}^{2}}{N^{4}}a^{2}>0,\end{split} (102)

for any a(0,mr(λ2λ1)|Nλ22(mc1)(N1)+N(Nmc)(N1)maxj1λj2|)a\in\left(0,\frac{m_{r}(\lambda_{2}-\lambda_{1})}{\absolutevalue{\frac{N\lambda_{2}^{2}(m_{c}-1)}{(N-1)}+\frac{N(N-m_{c})}{(N-1)}\max_{j\neq 1}\lambda_{j}^{2}}}\right) by the triangle inequality. Following after (86) in Section VIII.2, we complete the proof of Theorem 4 similarly, noticing that tr(Σr,c)𝒙k2=𝒪(1)\frac{\mathrm{tr}(\Sigma_{r,c})}{\norm{\bm{x}_{k}}^{2}}=\mathcal{O}(1) in Proposition 2. Lastly, one can find a sufficiently small a>0a>0 such that 𝒙k>0\norm{\bm{x}_{k}}>0 for all kk by deriving a similar inequality to (73).

VIII.4 Additional numerical result

Fig. 6 plots 9 simulation results among the 100 ones in Fig. 5, where the two fidelities of the iteration with two degenerate ground states are shown.

Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Figure 6: Convergence of fidelities, γ1=|(ψGS,1,𝒙)𝒙|\gamma_{1}=\absolutevalue{\frac{(\psi_{GS,1},\bm{x})}{\norm{\bm{x}}}}, γ2=|(ψGS,2,𝒙)𝒙|\gamma_{2}=\absolutevalue{\frac{(\psi_{GS,2},\bm{x})}{\norm{\bm{x}}}}, and γ12+γ22\sqrt{\gamma_{1}^{2}+\gamma_{2}^{2}}, which is observed from 12 independent simulations among the 100 ones in Fig. 5. Results named ”total” correspond to the fidelity (1) of the iteration with the two ground states.

References

  • [1] Dorit Aharonov, Daniel Gottesman, Sandy Irani, and Julia Kempe. The power of quantum systems on a line. Communications in mathematical physics, 287(1):41–65, 2009.
  • [2] Federico Becca and Sandro Sorella. Quantum Monte Carlo approaches for correlated systems. Cambridge University Press, 2017.
  • [3] Léon Bottou, Frank E Curtis, and Jorge Nocedal. Optimization methods for large-scale machine learning. SIAM review, 60(2):223–311, 2018.
  • [4] Earl Campbell. Random compiler for fast hamiltonian simulation. Physical review letters, 123(7):070503, 2019.
  • [5] Daan Camps, Lin Lin, Roel Van Beeumen, and Chao Yang. Explicit quantum circuits for block encodings of certain sparse matrices. arXiv preprint arXiv:2203.10236, 2022.
  • [6] David Ceperley and Berni Alder. Quantum monte carlo. Science, 231(4738):555–560, 1986.
  • [7] Yanlin Chen, András Gilyén, and Ronald de Wolf. A quantum speed-up for approximating the top eigenvectors of a matrix. arXiv preprint arXiv:2405.14765, 2024.
  • [8] Andrew M Childs, Aaron Ostrander, and Yuan Su. Faster quantum simulation by randomization. Quantum, 3:182, 2019.
  • [9] Zhiyan Ding, Taehee Ko, Jiahao Yao, Lin Lin, and Xiantao Li. Random coordinate descent: a simple alternative for optimizing parameterized quantum circuits. arXiv preprint arXiv:2311.00088, 2023.
  • [10] Zhiyan Ding and Lin Lin. Even shorter quantum circuit for phase estimation on early fault-tolerant quantum computers with applications to ground-state energy estimation. PRX Quantum, 4(2):020331, 2023.
  • [11] Yulong Dong, Lin Lin, and Yu Tong. Ground-state preparation and energy estimation on early fault-tolerant quantum computers via quantum eigenvalue transformation of unitary matrices. PRX Quantum, 3(4):040305, 2022.
  • [12] András Gilyén, Yuan Su, Guang Hao Low, and Nathan Wiebe. Quantum singular value transformation and beyond: exponential improvements for quantum matrix arithmetics. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 193–204. ACM, 2019.
  • [13] Harper R Grimsley, Sophia E Economou, Edwin Barnes, and Nicholas J Mayhall. An adaptive variational algorithm for exact molecular simulations on a quantum computer. Nature communications, 10(1):3007, 2019.
  • [14] Moritz Hardt and Eric Price. The noisy power method: A meta algorithm with applications. Advances in neural information processing systems, 27, 2014.
  • [15] Julia Kempe, Alexei Kitaev, and Oded Regev. The complexity of the local hamiltonian problem. Siam journal on computing, 35(5):1070–1097, 2006.
  • [16] William Kirby, Mario Motta, and Antonio Mezzacapo. Exact and efficient lanczos method on a quantum computer. Quantum, 7:1018, 2023.
  • [17] Alexei Yu Kitaev, Alexander Shen, and Mikhail N Vyalyi. Classical and quantum computation. Number 47. American Mathematical Soc., 2002.
  • [18] Taehee Ko, Xiantao Li, and Chunhao Wang. Implementation of the density-functional theory on quantum computers with linear scaling with respect to the number of atoms. arXiv preprint arXiv:2307.07067, 2023.
  • [19] Martin Larocca, Nathan Ju, Diego García-Martín, Patrick J Coles, and Marco Cerezo. Theory of overparametrization in quantum neural networks. Nature Computational Science, 3(6):542–551, 2023.
  • [20] Gwonhak Lee, Dongkeun Lee, and Joonsuk Huh. Sampling error analysis in quantum krylov subspace diagonalization. arXiv preprint arXiv:2307.16279, 2023.
  • [21] Laura Lewis, Hsin-Yuan Huang, Viet T Tran, Sebastian Lehner, Richard Kueng, and John Preskill. Improved machine learning algorithm for predicting ground state properties. nature communications, 15(1):895, 2024.
  • [22] Lin Lin and Yu Tong. Near-optimal ground state preparation. Quantum, 4:372, 2020.
  • [23] Lin Lin and Yu Tong. Heisenberg-limited ground-state energy estimation for early fault-tolerant quantum computers. PRX Quantum, 3(1):010318, 2022.
  • [24] Guang Hao Low and Isaac L Chuang. Optimal hamiltonian simulation by quantum signal processing. Physical review letters, 118(1):010501, 2017.
  • [25] Guang Hao Low and Isaac L Chuang. Hamiltonian simulation by qubitization. Quantum, 3:163, 2019.
  • [26] Jianfeng Lu and Zhe Wang. The full configuration interaction quantum monte carlo method through the lens of inexact power iteration. SIAM Journal on Scientific Computing, 42(1):B1–B29, 2020.
  • [27] Sam McArdle, Suguru Endo, Alán Aspuru-Guzik, Simon C Benjamin, and Xiao Yuan. Quantum computational chemistry. Reviews of Modern Physics, 92(1):015003, 2020.
  • [28] Jarrod R McClean, Jonathan Romero, Ryan Babbush, and Alán Aspuru-Guzik. The theory of variational hybrid quantum-classical algorithms. New Journal of Physics, 18(2):023023, 2016.
  • [29] Mario Motta, Chong Sun, Adrian TK Tan, Matthew J O’Rourke, Erika Ye, Austin J Minnich, Fernando GSL Brandao, and Garnet Kin-Lic Chan. Determining eigenstates and thermal states on a quantum computer using quantum imaginary time evolution. Nature Physics, 16(2):205–210, 2020.
  • [30] Yu Nesterov. Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM Journal on Optimization, 22(2):341–362, 2012.
  • [31] Hongkang Ni, Haoya Li, and Lexing Ying. On low-depth algorithms for quantum phase estimation. Quantum, 7:1165, 2023.
  • [32] Roberto Oliveira and Barbara M Terhal. The complexity of quantum spin systems on a two-dimensional square lattice. arXiv preprint quant-ph/0504050, 2005.
  • [33] Thomas E O’Brien, Brian Tarasinski, and Barbara M Terhal. Quantum phase estimation of multiple eigenvalues for small-scale (noisy) experiments. New Journal of Physics, 21(2):023022, 2019.
  • [34] Peter JJ O’Malley, Ryan Babbush, Ian D Kivlichan, Jonathan Romero, Jarrod R McClean, Rami Barends, Julian Kelly, Pedram Roushan, Andrew Tranter, Nan Ding, et al. Scalable quantum simulation of molecular energies. Physical Review X, 6(3):031007, 2016.
  • [35] Alberto Peruzzo, Jarrod McClean, Peter Shadbolt, Man-Hong Yung, Xiao-Qi Zhou, Peter J Love, Alán Aspuru-Guzik, and Jeremy L O’brien. A variational eigenvalue solver on a photonic quantum processor. Nature communications, 5(1):4213, 2014.
  • [36] Qiskit contributors. Qiskit: An open-source framework for quantum computing, 2023.
  • [37] Kazuhiro Seki and Seiji Yunoki. Quantum power method by a superposition of time-evolved states. PRX Quantum, 2(1):010333, 2021.
  • [38] Ohad Shamir. Convergence of stochastic gradient descent for pca. In International Conference on Machine Learning, pages 257–265. PMLR, 2016.
  • [39] Thomas Strohmer and Roman Vershynin. A randomized kaczmarz algorithm with exponential convergence. Journal of Fourier Analysis and Applications, 15(2):262–278, 2009.
  • [40] Ryan Sweke, Frederik Wilde, Johannes Meyer, Maria Schuld, Paul K Fährmann, Barthélémy Meynard-Piganeau, and Jens Eisert. Stochastic gradient descent for hybrid quantum-classical optimization. Quantum, 4:314, 2020.
  • [41] Libor Veis and Jiří Pittner. Quantum computing applied to calculations of molecular energies: Ch2 benchmark. The Journal of chemical physics, 133(19), 2010.
  • [42] Samson Wang, Sam McArdle, and Mario Berta. Qubit-efficient randomized quantum algorithms for linear algebra. PRX Quantum, 5(2):020324, 2024.
  • [43] Alexander Weiße, Gerhard Wellein, Andreas Alvermann, and Holger Fehske. The kernel polynomial method. Reviews of modern physics, 78(1):275, 2006.
  • [44] Xiao-Ming Zhang and Xiao Yuan. Circuit complexity of quantum access models for encoding classical data. npj Quantum Information, 10(1):42, 2024.
  • [45] Yunkai Zhou, Yousef Saad, Murilo L Tiago, and James R Chelikowsky. Self-consistent-field calculations using chebyshev-filtered subspace iteration. Journal of Computational Physics, 219(1):172–184, 2006.