This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Improved iterative quantum algorithm for ground-state preparation

Jin-Min Liang [email protected] School of Mathematical Sciences, Capital Normal University, Beijing 100048, China    Qiao-Qiao Lv School of Mathematical Sciences, Capital Normal University, Beijing 100048, China    Shu-Qian Shen College of Science, China University of Petroleum, Qingdao 266580, China    Ming Li College of Science, China University of Petroleum, Qingdao 266580, China    Zhi-Xi Wang [email protected] School of Mathematical Sciences, Capital Normal University, Beijing 100048, China    Shao-Ming Fei [email protected] School of Mathematical Sciences, Capital Normal University, Beijing 100048, China
Abstract

Finding the ground state of a Hamiltonian system is of great significance in many-body quantum physics and quantum chemistry. We propose an improved iterative quantum algorithm to prepare the ground state of a Hamiltonian. The crucial point is to optimize a cost function on the state space via the quantum gradient descent (QGD) implemented on quantum devices. We provide practical guideline on the selection of the learning rate in QGD by finding a fundamental upper bound and establishing a relationship between our algorithm and the first-order approximation of the imaginary time evolution. Furthermore, we adapt a variational quantum state preparation method as a subroutine to generate an ancillary state by utilizing only polylogarithmic quantum resources. The performance of our algorithm is demonstrated by numerical calculations of the deuteron molecule and Heisenberg model without and with noises. Compared with the existing algorithms, our approach has advantages including the higher success probability at each iteration, the measurement precision-independent sampling complexity, the lower gate complexity, and only quantum resources are required when the ancillary state is well prepared.

Keywords: Quantum computation, quantum circuits, ground state preparation

I Introduction

One of the most promising applications of quantum computers is to simulate the dynamics of chemical and physical systems Feynman1982Simulating . Quantum simulation in general requires to prepare an evolved state at any time and make measurement with respect to a physical observable Bolens2021Reinforcement ; Lu2021Preparation ; Lee2022Variational ; LiangWeiFei2022Quantum ; Xie2022Orthogonal . In particular, the ground state preparation for a given Hamiltonian system is of great significance. By using the Jordan-Wigner or Bravyi-Kitaev Bravyi2002 transformations, the molecule Hamiltonian can be transformed into qubit Hamiltonian in many quantum chemistry problems. The property of the ground states allows one to understand the dynamics of molecular and the design of drug Vivo2016Role .

Many works have focused on the ground state preparation of a Hamiltonian Innocenti2020Ultrafast ; Huggins2020A ; Seki2021Quantum ; Yao2021Reinforcement ; Bierman2022Quantum . For instance, the earlier work using quantum phase estimation prepares the ground state by projecting a guess state onto the ground state Abrams1999Quantum . However, the long coherence time and high-fidelity gates render it impractical for noisy intermediate-scale quantum (NISQ) devices. Later, the variational quantum eigensolver (VQE) Peruzzo2014Variational has attracted much attention due to its potential to be performed on NISQ devices. VQE obtains the ground state with a proper choice of ansatz, a suitable cost function and a controllable classical optimization Liang2020Variational ; Cerezo2021Variational ; Bharti2022Noisy ; Callison2022Hybrid . Nevertheless, VQE may face obstacles such as highly depending on the expressibility of the chosen ansatz and the so-called barren plateau which makes the optimization track detract from the global minimum McClean2018Barren . In order to circumvent the barren plateau, imaginary time evolution (ITE) is an alternative which drives an initial state to the ground state after long time evolution Motta2020Determining ; McArdle2019Variational ; Gomes2021Adaptive ; Lin2021Real .

Apart from the above approaches, an iterative quantum technique utilizes the power iteration Kyriienko2020Quantum and the linear combinational of unitary operators (LCU) Long2006General ; Long2008Duality ; Berry2015Simulating . A representative work is the full quantum eigensolver (FQE) combining LCU and quantum gradient descent (QGD), which optimizes a cost function on original state space via iteratively performing gradient operator on an initial state Wei2020A . This valuable technique has been generalized to approximate Hamiltonian operator Bespalova2021Hamiltonian , optimize a polynomial function Li2021Optimizing , determine the excited states of a Hamiltonian Wen2021A and obtaine the generalized eigenstates of a matrix pencil Liang2022Quantum .

Consider the time-independent Schrödinger equation,

^|ui=Ei|ui,E0<E1EN1,N=2n,\displaystyle\hat{\mathcal{H}}|u_{i}\rangle=E_{i}|u_{i}\rangle,~{}~{}E_{0}<E_{1}\leq\cdots\leq E_{N-1},~{}~{}N=2^{n}, (1)

where |ui|u_{i}\rangle denotes the eigenstate of the Hamiltonian ^N×N\hat{\mathcal{H}}\in\mathbb{C}^{N\times N} with eigenvalue EiE_{i}. Assume that the ll-local Hamiltonian encoded in an nn-qubit system can be written as the linear combinations of unitary operators ^k\hat{\mathcal{H}}_{k},

^=k=0K1hk^k,\displaystyle\hat{\mathcal{H}}=\sum_{k=0}^{K-1}h_{k}\hat{\mathcal{H}}_{k}, (2)

where hkh_{k} are real coefficients and ^k\hat{\mathcal{H}}_{k} are tensor products of Pauli matrices {σx,σy,σz}\{\sigma_{x},\sigma_{y},\sigma_{z}\} and the 2×22\times 2 identity matrix I2I_{2}. Assume also that the number of non-trivial nonzero terms being in K𝒪[poly(n)]K\sim\mathcal{O}[\textrm{poly}(n)] which grows only polynomially with the number of qubits nn.

The FQE prepares the ground state by optimizing a cost function C(|ϕ)=ϕ|^|ϕC(|\phi\rangle)=\langle\phi|\hat{\mathcal{H}}|\phi\rangle with the help of QGD and LCU Wei2020A . Basically, the minimal value of C(|ϕ)C(|\phi\rangle) implies the ground energy E0E_{0} and ground state |u0|u_{0}\rangle. The gradient of C(|ϕ)C(|\phi\rangle) with respect to state |ϕ|\phi\rangle is C(|ϕ)=2^|ϕ\nabla C(|\phi\rangle)=2\hat{\mathcal{H}}|\phi\rangle. Applying the gradient descent formula, the update state after one iteration is given by

|ϕ(1)\displaystyle|\phi(1)\rangle =|ϕ(0)μC(|ϕ(0))|ϕ(0)μC(|ϕ(0))2=G|ϕ(0)G|ϕ(0)2,\displaystyle=\frac{|\phi(0)\rangle-\mu\nabla C(|\phi(0)\rangle)}{\||\phi(0)\rangle-\mu\nabla C(|\phi(0)\rangle)\|_{2}}=\frac{G|\phi(0)\rangle}{\|G|\phi(0)\rangle\|_{2}}, (3)

where μ>0\mu>0 denotes the learning rate and the non-unitary operator G=IN2μ^G=I_{N}-2\mu\hat{\mathcal{H}}. The iteration process can be seen as a pure state evolution under the non-unitary operator GG. Although the implementation of operator GG gives rise to a challenge for quantum devices designed only from unitary gates, the dilation method Sweke2015Universal ; Marsden2021Capturing makes it possible by generating a superposition state whose elements are associated with the coefficients hkh_{k}. After carefully selecting the learning rate μ\mu in practical situations, FQE converges to the ground state with polylogarithmic depth circuit. Moreover, it has been demonstrated that FQE can prepare the ground state even at the presence of Gaussian and random noise Wei2020A .

However, there are three caveats which may hinder the efficient implementations of FQE on quantum computers. (i) The selection of learning rate μ\mu. Different learning rate may have different performance on the convergence. Thus, in practice the selection of the learning rate demands a theoretical constraint. (ii) The preparation of the ancillary superposition state. Before applying the gradient operator GG, one needs to prepare an ancillary state whose elements are the coefficients of the Pauli decomposition of the gradient operator. For a general 2m2^{m}-dimensional classical vector, it has been demonstrated that an exact universally algorithm would need at least 𝒪(m)\mathcal{O}(m) qubits and 𝒪(2m)\mathcal{O}(2^{m}) operators to prepare the corresponding amplitude encoding state Plesch2011Quantum . Therefore, new technology is needed to reduce the gate complexity. (iii) The lack of analysis on the sampling complexity. Although quantum amplitude amplification (QAA) induces a quadratic speedup Brassard2002Quantum , performing QAA needs additional computational resources.

In this work, we treat the above problems with several techniques and present an improved iterative quantum algorithm for ground state preparation. Our algorithm has a lower gate complexity attributed to the preparation of the ancillary state. For the problem (i), we theoretically demonstrate that the learning rate has an upper bound determined by the ground state energy and the largest eigenvalue of the Hamiltonian. In particular, the learning rate may be arbitrary positive number for special Hamiltonian. Moreover, we find that the first-order approximation of imaginary time evolution and our algorithm are equivalent. This claim provides a practical guideline about the selection of the learning rate. Specifically, μ=Δt/2\mu=\Delta t/2 for small time step Δt\Delta t. For the problem (ii), we adapt a variational quantum state preparation (VQSP) to efficiently access an ancillary state with only polynomial overhead in terms of the size of the state. For the problem (iii), we demonstrate that the sampling complexity of our algorithm does not depend on the precision of measurement as shown in VQE. Actually, it is a finite value depending on the summation of the coefficients in the decomposition of gradient operator, the overlap between the initial state and the ground state, and the largest (in absolute value) eigenvalue of gradient operator. Meanwhile, the sampling complexity decreases with the increase of the iterations. Finally, we numerically prepare ground states of the deuteron molecule and Heisenberg model on noiseless and noisy situations. Since the behavior of Gaussian and random noises has been investigated in FQE Wei2020A , here the noise behavior is modeled and simulated by accounting to the global depolarizing noise channel in each iteration. The fidelity and the evolved energy are chosen as the quantities to evaluate the performance of the obtained ground state. Comparing with VQE for the deuteron molecule, our algorithm has shorter iteration steps even the consumed resources are different. It does not need to perform classical-quantum optimization loops, thus avoiding the barren plateau as long as VQSP is perfectly achieved in advance. Finally, our algorithm has higher success probability at the end of each iteration compared with FQE scheme. Hence, the sampling complexity is reduced.

II Methods

Our iterative quantum algorithm for ground-state preparation contains two crucial subroutines: variational quantum state preparation (VQSP) presented in Sec. II.3 and the implementation of the non-unitary gradient operator Berry2015Simulating . A schematic diagram for the proposed algorithm is shown in Fig. (1.a).

Refer to caption
Figure 1: (a) A schematic diagram of our iterative quantum algorithm for the ground state preparation. \mathcal{E} denotes the noise channel. (b) A 33-qubit parametrized quantum circuit U(𝜽)U(\boldsymbol{\theta}) in which the unitary RyR_{y} is parameterized. (c) The decomposition of the unitary 𝒰=[U(𝜽opt)IN]Λk~G[U(𝜽opt)IN]\mathcal{U}=[U^{{\dagger}}(\boldsymbol{\theta}_{\textrm{opt}})\otimes I_{N}]\Lambda_{\tilde{k}}G[U(\boldsymbol{\theta}_{\textrm{opt}})\otimes I_{N}].

For a quantum system of Hamiltonian (2), the gradient operator GG can also be expressed as an LCU,

G\displaystyle G =IN2μk=0K1hk^k=k=0KykGk,yk>0,\displaystyle=I_{N}-2\mu\sum_{k=0}^{K-1}h_{k}\hat{\mathcal{H}}_{k}=\sum_{k=0}^{K}y_{k}G_{k},~{}~{}y_{k}>0, (4)

where the coefficients yk=|2μhk|y_{k}=|2\mu h_{k}| and the unitary Gk=sk^kG_{k}=s_{k}\hat{\mathcal{H}}_{k} for k=0,,K1k=0,\cdots,K-1, yK=1y_{K}=1 and GK=ING_{K}=I_{N}. Note that the sign sk=1s_{k}=1 (1)(-1) if hkh_{k} is negative (positive). Here we assume that K+1=2k~K+1=2^{\tilde{k}} for an integer k~\tilde{k}. If K+1K+1 is not a power of 22, we can divide the identity operator INI_{N} into several sub-terms until K+1K+1 can be denoted as a power of 22.

Before executing our algorithm, we require to prepare a superposition state in a k~\tilde{k}-qubit system,

|𝒚=k=0K𝒚k|k,𝒚k=yk𝒩y,\displaystyle|\boldsymbol{y}\rangle=\sum_{k=0}^{K}\boldsymbol{y}_{k}|k\rangle,~{}~{}\boldsymbol{y}_{k}=\sqrt{\frac{y_{k}}{\mathcal{N}_{y}}}, (5)

where the constant 𝒩y=k=0Kyk\mathcal{N}_{y}=\sum_{k=0}^{K}y_{k}. We give a variational quantum algorithm to achieve the state preparation on an NISQ computer in Sec. II.3, where an optimal unitary U(𝜽opt)U(\boldsymbol{\theta}_{\textrm{opt}}) determined by the parameter 𝜽opt\boldsymbol{\theta}_{\textrm{opt}} is utilized to produce the state |𝒚=U(𝜽opt)|0k~|\boldsymbol{y}\rangle=U(\boldsymbol{\theta}_{\textrm{opt}})|0^{\otimes\tilde{k}}\rangle.

II.1 The main procedure

The inputs in our algorithm are the Hamiltonian ^=k=0K1hk^k\hat{\mathcal{H}}=\sum_{k=0}^{K-1}h_{k}\hat{\mathcal{H}}_{k}, a tensor product state |0k~|0^{\otimes\tilde{k}}\rangle and the initial state |ϕ(0)|\phi(0)\rangle. A well-decided initial state |ϕ(0)|\phi(0)\rangle should have a sufficient overlap with the ground state |u0|u_{0}\rangle such that c0(0)=|u0|ϕ(0)|=𝒪[poly(1/n)]c_{0}^{(0)}=|\langle u_{0}|\phi(0)\rangle|=\mathcal{O}[\textrm{poly}(1/n)] He2021Quantum ; Note1 . Denote SS the largest iterative step. Our algorithm iteratively run the following procedures for s=0,1,,S1s=0,1,\cdots,S-1.

(a) Performing the unitary operator U(𝜽opt)INU(\boldsymbol{\theta}_{\textrm{opt}})\otimes I_{N} on the state |Ψ(s)=|0k~|ϕ(s)|\Psi(s)\rangle=|0^{\otimes\tilde{k}}\rangle|\phi(s)\rangle, we obtain the state |Ψa(s)=|𝒚|ϕ(s)|\Psi_{a}(s)\rangle=|\boldsymbol{y}\rangle|\phi(s)\rangle.

(b) Apply the controlled unitary Λk~G=k=0K|kk|Gk\Lambda_{\tilde{k}}G=\sum_{k=0}^{K}|k\rangle\langle k|\otimes G_{k} on the whole system. The state |Ψa(s)|\Psi_{a}(s)\rangle is transformed into the state

|Ψb(s)=k=0K𝒚k|kGk|ϕ(s).\displaystyle|\Psi_{b}(s)\rangle=\sum_{k=0}^{K}\boldsymbol{y}_{k}|k\rangle\otimes G_{k}|\phi(s)\rangle.

(c) Implementing the unitary U(𝜽opt)INU^{{\dagger}}(\boldsymbol{\theta}_{\textrm{opt}})\otimes I_{N} on the state |Ψb(s)|\Psi_{b}(s)\rangle, we obtain

|Ψc(s)=|0k~k=0K𝒚k2Gk|ϕ(s)+|Ψ(s),\displaystyle|\Psi_{c}(s)\rangle=|0^{\otimes\tilde{k}}\rangle\sum_{k=0}^{K}\boldsymbol{y}^{2}_{k}G_{k}|\phi(s)\rangle+|\Psi_{\perp}(s)\rangle,

where the ancillary state of |Ψ(s)|\Psi_{\perp}(s)\rangle is orthogonal to |00|0\cdots 0\rangle.

(d) Measure the ancillary system with the projector M=|0k~0k~|INM=|0^{\otimes\tilde{k}}\rangle\langle 0^{\otimes\tilde{k}}|\otimes I_{N}. The state of the whole system after the measurement is

M|Ψc(s)P(s+1)\displaystyle\frac{M|\Psi_{c}(s)\rangle}{\sqrt{P(s+1)}} =|0k~k=0KykGk|ϕ(s)𝒩yP(s+1)\displaystyle=\frac{|0^{\otimes\tilde{k}}\rangle\sum_{k=0}^{K}y_{k}G_{k}|\phi(s)\rangle}{\mathcal{N}_{y}\sqrt{P(s+1)}}
=|0k~|ϕ(s+1)=|Ψ(s+1).\displaystyle=|0^{\otimes\tilde{k}}\rangle|\phi(s+1)\rangle=|\Psi(s+1)\rangle.

The success probability of preparing the state |ϕ(s+1)|\phi(s+1)\rangle is

P(s+1)\displaystyle P(s+1) =Ψc(s)|MM|Ψc(s)=G|ϕ(s)22𝒩y2.\displaystyle=\langle\Psi_{c}(s)|M^{{\dagger}}M|\Psi_{c}(s)\rangle=\frac{\|G|\phi(s)\rangle\|_{2}^{2}}{\mathcal{N}_{y}^{2}}.

Furthermore, the following Lemma gives a critical property of the success probability that P(s+1)P(s+1) would not decay exponentially to zero with the iteration (see the proof in Appendix A).

Lemma 1.

The success probability of each iteration is an increasing sequence such that

P(1)P(2)P(S).\displaystyle P(1)\leq P(2)\leq\cdots\leq P(S). (6)

Although QAA provides a quadratic speedup on the measurement sampling complexity, Θ[P1/2(s+1)]\Theta[P^{-1/2}(s+1)], this step is expensive since the Hamiltonian simulation has a sophisticated circuit structure and requires many copies of pure states |Ψc(s)Ψc(s)||\Psi_{c}(s)\rangle\langle\Psi_{c}(s)| Llyod2014Quantum . Instead, we perform the classical Monte Carlo sampling Montanaro2015Quantum and as a result it needs roughly Θ[P1(s+1)]\Theta[P^{-1}(s+1)] copies of |Ψc(s)|\Psi_{c}(s)\rangle to generate the state |ϕ(s+1)|\phi(s+1)\rangle.

We consider two common metrics for the solution quality, the energy and fidelity. The energy is given as an expectation value of the Hamiltonian ^\hat{\mathcal{H}} on the evolved state |ϕ(s)|\phi(s)\rangle such that

E(s)\displaystyle E(s) =ϕ(s)|^|ϕ(s)=k=0K1hkϕ(s)|^k|ϕ(s).\displaystyle=\langle\phi(s)|\hat{\mathcal{H}}|\phi(s)\rangle=\sum_{k=0}^{K-1}h_{k}\langle\phi(s)|\hat{\mathcal{H}}_{k}|\phi(s)\rangle. (7)

The largest step SS is determined by the convergence criterion such as the energy difference between two consecutive steps, |E(s)E(s1)|ε|E(s)-E(s-1)|\leq\varepsilon, where ε\varepsilon is an error threshold Gomes2021Adaptive . The fidelity between two the pure states |ϕ(s)|\phi(s)\rangle and |u0|u_{0}\rangle is defined as F(|ϕ(s),|u0)=|ϕ(s)|u0|2F(|\phi(s)\rangle,|u_{0}\rangle)=\sqrt{|\langle\phi(s)|u_{0}\rangle|^{2}} Nielsen2000Quantum .

II.2 The implementation of the block diagonal unitary

In order to enact the non-unitary operator GG (Eq. 4) on an arbitrary state |ϕ(s)|\phi(s)\rangle, we consider the implementation of the unitary operation Λk~G=k=0K|kk|Gk\Lambda_{\tilde{k}}G=\sum_{k=0}^{K}|k\rangle\langle k|\otimes G_{k} in a similar way adopted in Barenco1995Elementary ; Wei2020A ; Liang2022Quantum . Since Λk~G\Lambda_{\tilde{k}}G is a block diagonal unitary, it can be implemented by applying K+1K+1 k~\tilde{k}-fold controlled nn-qubit gates, Λk~G=k=0K𝒞k~,|kGk\Lambda_{\tilde{k}}G=\prod_{k=0}^{K}\mathcal{C}_{\tilde{k},|k\rangle}G_{k}, where the unitary

𝒞k~,|kGk=|kk|Gk+k=0kkK|kk|IN.\displaystyle\mathcal{C}_{\tilde{k},|k\rangle}G_{k}=|k\rangle\langle k|\otimes G_{k}+\sum_{\begin{subarray}{c}k^{\prime}=0\\ k^{\prime}\neq k\end{subarray}}^{K}|k^{\prime}\rangle\langle k^{\prime}|\otimes I_{N}. (8)

The index k~\tilde{k} in 𝒞k~,|kGk\mathcal{C}_{\tilde{k},|k\rangle}G_{k} denotes the number of control qubits, see Fig. (1.c). The index |k|k\rangle in 𝒞k~,|kGk\mathcal{C}_{\tilde{k},|k\rangle}G_{k} denotes the controlled state. Since the Hamiltonian ^\hat{\mathcal{H}} is ll-local, each operator 𝒞k~,|kGk\mathcal{C}_{\tilde{k},|k\rangle}G_{k} can then be decomposed into at most ll k~\tilde{k}-fold controlled single-qubit gates 𝒞k~,|kσa(b)\mathcal{C}_{\tilde{k},|k\rangle}\sigma_{a}^{(b)}, where a{x,y,z}a\in\{x,y,z\}, σa(b)\sigma_{a}^{(b)} denotes the Pauli operator σa\sigma_{a} acting on bbth qubit of the working system, and the factor sks_{k} has been absorbed in one of the operator σa(b)\sigma_{a}^{(b)}. Notice that if sk=1s_{k}=-1, the operator skσa(b)s_{k}\sigma_{a}^{(b)} can be achieved by the unitary Rx(2π)σa(b)R_{x}(2\pi)\sigma_{a}^{(b)} associated with a rotation operator with respect to the xx axe.

For an arbitrary unitary 𝒫U(2)\mathcal{P}\in\textrm{U}(2), the controlled operator

𝒞k~,|K𝒫=|KK|𝒫+k=0K1|kk|I2\displaystyle\mathcal{C}_{\tilde{k},|K\rangle}\mathcal{P}=|K\rangle\langle K|\otimes\mathcal{P}+\sum_{k=0}^{K-1}|k\rangle\langle k|\otimes I_{2} (9)

can be decomposed into three controlled unitary operators 𝒞k~1,|(K1)/2Q\mathcal{C}_{\tilde{k}-1,|(K-1)/2\rangle}Q, 𝒞1,|1Q\mathcal{C}_{1,|1\rangle}Q and 𝒞1,|1Q\mathcal{C}_{1,|1\rangle}Q^{{\dagger}}, as well as two Toffolli gates over k~1\tilde{k}-1 qubits, where Q2=𝒫Q^{2}=\mathcal{P} Bergholm2005Quantum . The cost of simulating the two Toffolli gates is 𝒪(k~)\mathcal{O}(\tilde{k}) which counts the number of single qubit and CNOT gates Xin2017Quantum . For an arbitrary unitary 𝒞k~,|k𝒫\mathcal{C}_{\tilde{k},|k\rangle}\mathcal{P}, k=0,,Kk=0,\cdots,K, we utilize the Pauli operator σx\sigma_{x} to realize the transformation between the operators 𝒞k~,|k𝒫\mathcal{C}_{\tilde{k},|k\rangle}\mathcal{P} and 𝒞k~,|K𝒫\mathcal{C}_{\tilde{k},|K\rangle}\mathcal{P}. Particularly, given the binary representation k=k1k2kk~k=k_{1}k_{2}\cdots k_{\tilde{k}}, we have

𝒞k~,|k𝒫\displaystyle\mathcal{C}_{\tilde{k},|k\rangle}\mathcal{P} =|kk|𝒫+k=0kkK|kk|I2\displaystyle=|k\rangle\langle k|\otimes\mathcal{P}+\sum_{\begin{subarray}{c}k^{\prime}=0\\ k^{\prime}\neq k\end{subarray}}^{K}|k^{\prime}\rangle\langle k^{\prime}|\otimes I_{2}
=[σx]𝒞k~,|K𝒫[σx],\displaystyle=[\sigma_{x}]\mathcal{C}_{\tilde{k},|K\rangle}\mathcal{P}[\sigma_{x}]^{{\dagger}}, (10)

where [σx]=σx!k1σx!k2σx!kk~I2[\sigma_{x}]=\sigma_{x}^{!k_{1}}\otimes\sigma_{x}^{!k_{2}}\otimes\cdots\sigma_{x}^{!k_{\tilde{k}}}\otimes I_{2} with !ki!k_{i} representing the NOT operator that returns 11 when ki=0k_{i}=0 and returns 0 when ki=1k_{i}=1. Denoting 𝒯\mathcal{T} the cost of simulating the unitary 𝒞k~,|k𝒫\mathcal{C}_{\tilde{k},|k\rangle}\mathcal{P}, we have 𝒯=𝒪(k~2)\mathcal{T}=\mathcal{O}(\tilde{k}^{2}). Thus, the total gate complexity of simulating Λk~G\Lambda_{\tilde{k}}G is roughly 𝒯total=l(K+1)𝒯=𝒪[l(K+1)k~2]\mathcal{T}_{\textrm{total}}=l(K+1)\mathcal{T}=\mathcal{O}[l(K+1)\widetilde{k}^{2}].

II.3 Variational quantum state preparation

Algorithm 1 presents the detailed process of variational quantum algorithm to prepare the state |𝒚|\boldsymbol{y}\rangle.

Input: a reference quantum state |0k~|0^{\otimes\tilde{k}}\rangle, a parameterized quantum circuit (PQC) U(𝜽)U(\boldsymbol{\theta}), and a desired precision ε\varepsilon^{{}^{\prime}}.
(1) Randomly choose an initial parameters 𝜽\boldsymbol{\theta} and measure the parameterized quantum state |ψ(𝜽)=U(𝜽)|0k~|\psi(\boldsymbol{\theta})\rangle=U(\boldsymbol{\theta})|0^{\otimes\tilde{k}}\rangle in the standard basis {|z}\{|z\rangle\}, where z{0,1,,K}z\in\{0,1,\cdots,K\}. The probability of measurement outcome zz is denoted as 𝒑z\boldsymbol{p}_{z}.
(2) Estimate the cost function F(𝜽)F(\boldsymbol{\theta}) on an NISQ device.
(3) Train the PQC U(𝜽)U(\boldsymbol{\theta}) and find the optimal parameter 𝜽opt\boldsymbol{\theta}_{\textrm{opt}} under the termination condition Cond:F(𝜽)ε\emph{Cond}:F(\boldsymbol{\theta})\leq\varepsilon^{{}^{\prime}}.
Output: the optimal parameter 𝜽opt\boldsymbol{\theta}_{\textrm{opt}} and the encoded state |𝒚|ψ(𝜽opt)=U(𝜽opt)|0k~|\boldsymbol{y}\rangle\approx|\psi(\boldsymbol{\theta}_{\textrm{opt}})\rangle=U(\boldsymbol{\theta}_{\textrm{opt}})|0^{\otimes\tilde{k}}\rangle.
Algorithm 1 Variational quantum state preparation

In step (1), the parameterized quantum circuit U(𝜽)U(\boldsymbol{\theta}) is a hardware-efficient ansatz Kandala2017Hardware ; Havlivcek2019Supervised ; Commeau2020Variational . For instance, As shown in Fig. (1b), the PQC consists of single qubit gates Ry(θ)=eιθ2σyR_{y}(\theta)=e^{-\iota\frac{\theta}{2}\sigma_{y}} depending on the tunable parameter θ\theta, the Pauli operator σy\sigma_{y} and the controlled Pauli gates σz\sigma_{z} applying to the neighbor qubits. The probability 𝒑z\boldsymbol{p}_{z} is generated by measuring the given trial state multiple times. Let the total number of samples be NsampleN_{\textrm{sample}} and the number of samples of state |z|z\rangle be NzN_{z}. Thus the probability is 𝒑z=Nz/Nsample\boldsymbol{p}_{z}=N_{z}/N_{\textrm{sample}}.

In step (2), we define a cost function

F(𝜽)=FKL(𝜽)+|k=0K𝒚kK+1+k~|ψ(𝜽)|,\displaystyle F(\boldsymbol{\theta})=F_{\textrm{KL}}(\boldsymbol{\theta})+\Bigg{|}\sum_{k=0}^{K}\boldsymbol{y}_{k}-\sqrt{K+1}\langle+^{\otimes\tilde{k}}|\psi(\boldsymbol{\theta})\rangle\Bigg{|}, (11)

where |+=(|0+|1)/2|+\rangle=(|0\rangle+|1\rangle)/\sqrt{2} and |||\cdot| denotes absolute value. The first term

FKL(𝜽)=k=0K𝒚k2log𝒑k𝒚k2\displaystyle F_{\textrm{KL}}(\boldsymbol{\theta})=-\sum_{k=0}^{K}\boldsymbol{y}_{k}^{2}\log\frac{\boldsymbol{p}_{k}}{\boldsymbol{y}_{k}^{2}} (12)

is dubbed as the Kullback-Leibler divergence (KL) which quantifies the amount of information lost when changing from the vector distribution 𝒚[2]=(𝒚02,,𝒚K2)\boldsymbol{y}^{[2]}=(\boldsymbol{y}_{0}^{2},\cdots,\boldsymbol{y}_{K}^{2}) to another distribution 𝒑=(𝒑0,,𝒑K)\boldsymbol{p}=(\boldsymbol{p}_{0},\cdots,\boldsymbol{p}_{K}). The second term ensures the obtained states learned by the quantity FKL(𝜽)F_{\textrm{KL}}(\boldsymbol{\theta}) have positive local phases. For example, we aim to prepare a single qubit state |𝒚=15|0+25|1|\boldsymbol{y}\rangle=\frac{1}{\sqrt{5}}|0\rangle+\frac{2}{\sqrt{5}}|1\rangle corresponding to a classical vector 𝒚=(15,25)\boldsymbol{y}=(\frac{1}{\sqrt{5}},\frac{2}{\sqrt{5}}). Variational optimizing the cost function FKLF_{\textrm{KL}} generates four states,

|𝒚0=15|0+25|1,|𝒚1=15|0+25|1,\displaystyle|\boldsymbol{y}_{0}\rangle=\frac{1}{\sqrt{5}}|0\rangle+\frac{2}{\sqrt{5}}|1\rangle,~{}|\boldsymbol{y}_{1}\rangle=-\frac{1}{\sqrt{5}}|0\rangle+\frac{2}{\sqrt{5}}|1\rangle,
|𝒚2=15|025|1,|𝒚3=15|025|1.\displaystyle|\boldsymbol{y}_{2}\rangle=\frac{1}{\sqrt{5}}|0\rangle-\frac{2}{\sqrt{5}}|1\rangle,~{}|\boldsymbol{y}_{3}\rangle=-\frac{1}{\sqrt{5}}|0\rangle-\frac{2}{\sqrt{5}}|1\rangle.

The desired state |𝒚0|\boldsymbol{y}_{0}\rangle can be obtained by optimizing the second term. Note that the second term can also be

(k=0K𝒚kK+1+k~|ψ(𝜽))2.\displaystyle\left(\sum_{k=0}^{K}\boldsymbol{y}_{k}-\sqrt{K+1}\langle+^{\otimes\tilde{k}}|\psi(\boldsymbol{\theta})\rangle\right)^{2}. (13)

It is clear that F(𝜽)F(\boldsymbol{\theta}) is positive and is zero if and only if U(𝜽opt)|0k~=|𝒚U(\boldsymbol{\theta}_{\textrm{opt}})|0^{\otimes\tilde{k}}\rangle=|\boldsymbol{y}\rangle. Much small cost values indicate a high fidelity preparation of the state |𝒚|\boldsymbol{y}\rangle. Thus, the cost function is faithful and operationally meaningful Cerezo2021Variational . Here the inner product +k~|ψ(𝜽)\langle+^{\otimes\tilde{k}}|\psi(\boldsymbol{\theta})\rangle is computed via the Hadamard test Aharonov2009A . Updating the circuit parameter by performing a classical optimization procedure such as the Nelder-Mead algorithm Nelder1965A , we produce the optimal parameter 𝜽opt\boldsymbol{\theta}_{\textrm{opt}} until the cost function converges. The learning scheme maps the classical vector into a set of parameters 𝜽opt\boldsymbol{\theta}_{\textrm{opt}} such that 𝒚{𝜽opt}\boldsymbol{y}\mapsto\{\boldsymbol{\theta}_{\textrm{opt}}\}. Loading the parameter 𝜽opt\boldsymbol{\theta}_{\textrm{opt}} into NISQ devices equipped with a parameterized quantum circuit U(𝜽)U(\boldsymbol{\theta}), we then produce the state |𝒚|𝒚~=U(𝜽opt)|0k~|\boldsymbol{y}\rangle\approx|\tilde{\boldsymbol{y}}\rangle=U(\boldsymbol{\theta}_{\textrm{opt}})|0^{\otimes\tilde{k}}\rangle.

II.4 The selection of the learning rate

In this section, we report strategies to determine the learning rate μ\mu. Theorem 1 provides a theoretical benchmark for analyzing the learning rate μ\mu (the proof is provided in Appendix B).

Theorem 1.

Consider a system Hamiltonian ^\hat{\mathcal{H}} (1) with a ground state |u0|u_{0}\rangle and the ground state energy E0E_{0}. The algorithm proposed in Sec. II.1 converges to the ground state |u0|u_{0}\rangle if the learning rate μ\mu in Eq. (4) satisfies

μ{(0,1EN1+E0),EN1+E0>0+,others,\mu\in\left\{\begin{aligned} &\Big{(}0,\frac{1}{E_{N-1}+E_{0}}\Big{)},&E_{N-1}+E_{0}>0\\ &\mathbb{R}^{+},&\textrm{others}\\ \end{aligned},\right. (14)

where +\mathbb{R}^{+} denotes the set of positive real value and EN1E_{N-1} is the largest eigenvalue of ^\hat{\mathcal{H}}.

We here remark that from Theorem 1 the learning rate μ\mu can be an arbitrary positive number when the eigenvalues satisfy EN1+E00E_{N-1}+E_{0}\leq 0. One may wonder whether one can add diagonal constant shift to the Hamiltonian ^\hat{\mathcal{H}} to switch the condition EN1+E00E_{N-1}+E_{0}\leq 0. Let ^=^+τIN\hat{\mathcal{H}}^{{}^{\prime}}=\hat{\mathcal{H}}+\tau I_{N} with eigenvalues Ei=Ei+τE_{i}^{{}^{\prime}}=E_{i}+\tau for some constant τ\tau\in\mathbb{R}, such that EN1+E00E_{N-1}^{{}^{\prime}}+E_{0}^{{}^{\prime}}\leq 0 but EN1+E0>0E_{N-1}+E_{0}>0. Based on Theorem 1, the corresponding gradient operator is given by G=IN2μ^G^{{}^{\prime}}=I_{N}-2\mu^{{}^{\prime}}\hat{\mathcal{H}}^{{}^{\prime}} with an arbitrary positive valued learn rate μ\mu^{{}^{\prime}}. However, EN1+E00E_{N-1}^{{}^{\prime}}+E_{0}^{{}^{\prime}}\leq 0 means that the shift constant τEN1+E02\tau\leq-\frac{E_{N-1}+E_{0}}{2}. Therefore, although appropriate shift constant can avoid the selection of the learning rate, the fundamental convergence behavior does not modify.

We also remark that the condition (14) is impractical since the eigenvalues of ^\hat{\mathcal{H}} are required to be estimated in advance. Thus, we next provide a practical strategy to choose μ\mu by demonstrating the equivalence between the imaginary time evolution (ITE) and the algorithm proposed in Sec. II.1.

ITE is an iterative computational method to solve the ground-state of many-body quantum systems Aulicino2021State . Consider the time-independent Schrödinger equation in imaginary time (tιtt\rightarrow-\iota t),

|ϕ(t)t=^|ϕ(t),\displaystyle\frac{\partial|\phi(t)\rangle}{\partial t}=-\hat{\mathcal{H}}|\phi(t)\rangle, (15)

where |ϕ(t)|\phi(t)\rangle is a quantum state at time tt and ^\hat{\mathcal{H}} is the Hamiltonian. The formal solution of Eq. (15) can be expressed as |ϕ(t)=e^t|ϕ(0)|\phi(t)\rangle=e^{-\hat{\mathcal{H}}t}|\phi(0)\rangle, where |ϕ(0)|\phi(0)\rangle is an initial state at time t=0t=0. Suppose the quantum state |ϕ(0)|\phi(0)\rangle is expanded in the eigenbasis of ^\hat{\mathcal{H}} given in (1),

|ϕ(0)=i=0N1ci(0)|ui,ci(0)=ui|ϕ(0),i=0N1|ci(0)|2=1.\displaystyle|\phi(0)\rangle=\sum_{i=0}^{N-1}c_{i}^{(0)}|u_{i}\rangle,~{}c_{i}^{(0)}=\langle u_{i}|\phi(0)\rangle,~{}\sum_{i=0}^{N-1}|c_{i}^{(0)}|^{2}=1.

The (unnormalized) sequence states are given by

|ϕ(t)=i=0N1ci(0)eEit|uic0(0)eE0t|u0\displaystyle|\phi(t)\rangle=\sum_{i=0}^{N-1}c_{i}^{(0)}e^{-E_{i}t}|u_{i}\rangle\approx c_{0}^{(0)}e^{-E_{0}t}|u_{0}\rangle

in the limitation of large tt with non-zero overlap (c0(0)0)(c_{0}^{(0)}\neq 0) Lehtovaara2006Solution . The trial energy

limtE(t)=limtϕ(t)|^|ϕ(t)ϕ(t)|ϕ(t)=E0.\displaystyle\lim_{t\rightarrow\infty}E(t)=\lim_{t\rightarrow\infty}\frac{\langle\phi(t)|\hat{\mathcal{H}}|\phi(t)\rangle}{\langle\phi(t)|\phi(t)\rangle}=E_{0}. (16)

Namely, the ITE always converges to the ground state after long time iterations.

Consider a small time Δt=t/Nt\Delta t=t/N_{t}. The exponential operator e^t=e^NtΔte^{-\hat{\mathcal{H}}t}=e^{-\hat{\mathcal{H}}N_{t}\Delta t} and each non-unitary operator e^Δte^{-\hat{\mathcal{H}}\Delta t} have good approximations in terms of the following JJth-order product formula Childs2021Theory ,

e^Δt=j=0J(Δt)jj!^j+𝒪(ΔtJ+1),\displaystyle e^{-\hat{\mathcal{H}}\Delta t}=\sum_{j=0}^{J}\frac{(-\Delta t)^{j}}{j!}\hat{\mathcal{H}}^{j}+\mathcal{O}(\Delta t^{J+1}), (17)

where the Taylor series is truncated at order JJ Berry2015Simulating . In particular, with the first-order truncation of the Taylor series, e^Δte^{-\hat{\mathcal{H}}\Delta t} is approximated as a non-unitary operator

G~=IN^Δt,\displaystyle\tilde{G}=I_{N}-\hat{\mathcal{H}}\Delta t, (18)

with error 𝒪(Δt2)\mathcal{O}(\Delta t^{2}). By representing G~\tilde{G} as an LCU, the non-unitary operation G~\tilde{G} can be implemented on a quantum computer Berry2015Simulating ; Long2006General . We repeatedly apply the non-unitary operator G~\tilde{G} on a random initial state |ϕ(0)|\phi(0)\rangle. When the time tt is large enough, the updated state can be seen as an approximate ground state.

We observe that the two operators G~=INΔt^\tilde{G}=I_{N}-\Delta t\hat{\mathcal{H}} and G=IN2μ^G=I_{N}-2\mu\hat{\mathcal{H}} are equivalent when the small time step Δt=2μ\Delta t=2\mu. Thus, in practice we select the learning rate μ\mu in terms of Δt\Delta t such that μ=Δt/2\mu=\Delta t/2. Δt\Delta t should be sufficient small. Fox example, let the error of the first-order approximation of operator e^Δte^{-\hat{\mathcal{H}}\Delta t} be ϵ=Δt2\epsilon=\Delta t^{2}. This means that we should select μ=Δt/2=ϵ/2\mu=\Delta t/2=\sqrt{\epsilon}/2. In the next section, we find that ϵ102\epsilon\leq 10^{-2} is an amenable error.

III Numerical results and discussion

III.1 Numerical Simulation Results

In this subsection, we apply the proposed algorithm to simulate the ground state of two models including the deuteron molecule Dumitrescu2018Cloud ; Shehab2019Toward ; Aydeniz2020Practical and the Heisenberg spin1/2-1/2 model with external magnetic field Bespalova2021Hamiltonian .

III.1.1 The deuteron molecule

The n=2n=2 oscillator-basis deuteron Hamiltonian in the discrete variable representation using the harmonic oscillator basis can be written as a Pauli string form,

^\displaystyle\hat{\mathcal{H}} =5.907I4+0.2183σz(1)6.125σz(2)\displaystyle=5.907I_{4}+0.2183\sigma_{z}^{(1)}-6.125\sigma_{z}^{(2)}
2.143(σx(1)σx(2)+σy(1)σy(2)),\displaystyle-2.143(\sigma_{x}^{(1)}\sigma_{x}^{(2)}+\sigma_{y}^{(1)}\sigma_{y}^{(2)}), (19)

where the Pauli operators σx(k)\sigma_{x}^{(k)}, σy(k)\sigma_{y}^{(k)} and σz(k)\sigma_{z}^{(k)} act on the kkth site. Based on the decomposition of ^\hat{\mathcal{H}} in Eq. (III.1.1), the Pauli decomposition of the gradient operator GG is given by

G\displaystyle G =I42μ^\displaystyle=I_{4}-2\mu\hat{\mathcal{H}}
=0.2I4+0.3I4+0.5I4+11.814μRx(1)(2π)I4\displaystyle=0.2I_{4}+0.3I_{4}+0.5I_{4}+11.814\mu R_{x}^{(1)}(2\pi)I_{4}
+0.4366μRx(1)(2π)σz(1)+12.25μσz(2)+4.286μσx(1)σx(2)\displaystyle+0.4366\mu R_{x}^{(1)}(2\pi)\sigma_{z}^{(1)}+12.25\mu\sigma_{z}^{(2)}+4.286\mu\sigma_{x}^{(1)}\sigma_{x}^{(2)}
+4.286μσy(1)σy(2).\displaystyle+4.286\mu\sigma_{y}^{(1)}\sigma_{y}^{(2)}.

We here divide the identity operator I4I_{4} into three parts 0.2I4,0.3I4,0.5I40.2I_{4},0.3I_{4},0.5I_{4} and construct a classical vector,

𝒚=1𝒩y[\displaystyle\boldsymbol{y}=\frac{1}{\sqrt{\mathcal{N}_{y}}}[ 0.2,0.3,0.5,11.814μ,0.4366μ,\displaystyle\sqrt{0.2},\sqrt{0.3},\sqrt{0.5},\sqrt{11.814\mu},\sqrt{0.4366\mu},
12.25μ,4.286μ,4.286μ],\displaystyle\sqrt{12.25\mu},\sqrt{4.286\mu},\sqrt{4.286\mu}], (20)

where the constant 𝒩y=1+33.0726μ\mathcal{N}_{y}=1+33.0726\mu Note2 .

The first step trains a PQC U(𝜽)U(\boldsymbol{\theta}) to approximately prepare a 33-qubit quantum state |𝒚U(𝜽opt)|000|\boldsymbol{y}\rangle\approx U(\boldsymbol{\theta}_{\textrm{opt}})|000\rangle which is an amplitude encoding state of the vector 𝒚\boldsymbol{y}. In our numerical experiments, as shown in Fig. (1.b), the circuit depth L=3L=3 and 𝜽=(θ1,,θ9)\boldsymbol{\theta}=(\theta_{1},\cdots,\theta_{9}). Due to the relation between the learning rate μ\mu and the precision ϵ\epsilon, μ=ϵ/2\mu=\sqrt{\epsilon}/2, as discussed in Sec. II.4, the learning rate would increase when the precision increases. Fig. 2 illustrates the training processes of four cases with the precisions ϵ=101,102,103,104\epsilon=10^{-1},10^{-2},10^{-3},10^{-4}. The classical optimization procedure utilizes the Nelder-Mead algorithm Nelder1965A . It is straightforward to check that the final value approaches the minimal value zero. Meanwhile, we verify that the fidelity between |𝒚|\boldsymbol{y}\rangle and U(𝜽opt)|000U(\boldsymbol{\theta}_{\textrm{opt}})|000\rangle approaches one in these four cases.

Refer to caption
Figure 2: The training processes of the cost function F(𝜽)F(\boldsymbol{\theta}) Eq. (11) under different learning rates. The final values the cost function are (a) 2.04389×10102.04389\times 10^{-10} with ϵ=101\epsilon=10^{-1}, (b) 4.87097×10114.87097\times 10^{-11} with ϵ=102\epsilon=10^{-2}, (c) 1.7075×10101.7075\times 10^{-10} with ϵ=103\epsilon=10^{-3} and (d) 3.02248×10103.02248\times 10^{-10} with ϵ=104\epsilon=10^{-4}.

Next we iteratively apply the unitary Λk~G\Lambda_{\tilde{k}}G on an initial state |ϕ(0)=V(𝜶)|00|\phi(0)\rangle=V(\boldsymbol{\alpha})|00\rangle, where

V(𝜶)=\displaystyle V(\boldsymbol{\alpha})= [Ry(α1)Ry(α2)Ry(α3)Ry(α4)]CNOT\displaystyle[R_{y}(\alpha_{1})R_{y}(\alpha_{2})\otimes R_{y}(\alpha_{3})R_{y}(\alpha_{4})]\cdot\textrm{CNOT} (21)

and the parameter 𝜶=(0.3692,0.1112,0.7803,0.3897)\boldsymbol{\alpha}=(0.3692,0.1112,0.7803,0.3897). The overlap |c0(0)|2=|u0|ϕ(0)|2=0.3186|c_{0}^{(0)}|^{2}=|\langle u_{0}|\phi(0)\rangle|^{2}=0.3186. Fig. 3 plots the numerical results under different precisions ϵ\epsilon. Notice that the energy does not converge to the ground state energy when ϵ=101\epsilon=10^{-1} as shown in Fig. (3.a). This situation means that the ITE is performed with a long time step Δt=ϵ=101=0.3162\Delta t=\sqrt{\epsilon}=\sqrt{10^{-1}}=0.3162. Theoretically, in this case, the eigenvalues (in absolute value) of GG are

|λ0|=|12μE0|=1.5529,|λ1|=|12μE1|=0.9999,\displaystyle|\lambda_{0}|=|1-2\mu E_{0}|=1.5529,|\lambda_{1}|=|1-2\mu E_{1}|=0.9999,
|λ2|=|12μE2|=2.7358,|λ3|=|12μE3|=3.2889.\displaystyle|\lambda_{2}|=|1-2\mu E_{2}|=2.7358,|\lambda_{3}|=|1-2\mu E_{3}|=3.2889.

Based on the power method Golub2013Matrix , the algorithm converges to E3E_{3} rather than the lowest eigenvalue E0E_{0}. When the precision ϵ102\epsilon\leq 10^{-2}, the energy and the fidelity can successfully converge to the exact value 1.7485-1.7485 and 0.99990.9999 (see Fig. 3b,c,d). Note that, however, the smaller time step has the higher number of iteration step. In particular, about 200200 (4040) steps are required to reach the ground state for the precision ϵ=104\epsilon=10^{-4} (10210^{-2}).

Refer to caption
Figure 3: The convergence curves of energy (left axis) and fidelity (right axis) for the deuteron molecule.

Until now we have focused on the performance of our algorithm under noiseless situation. As quantum devices are usually imperfect and have noise, we here consider the noise implementation by taking into account a global depolarizing noise \mathcal{E} Schumacher1996Sending ; Fontana2021Evaluating . For any nn-qubit state ρ0\rho_{0}, the global symmetric depolarizing noise channel is defined by

(ρ0)=(1β)ρ0+β2nI2n,\displaystyle\mathcal{E}(\rho_{0})=(1-\beta)\rho_{0}+\frac{\beta}{2^{n}}I_{2^{n}}, (22)

where the parameter β\beta denotes the strength of the noise. Table 1 shows the quantity [E(s)E0]/|E0|[E(s)-E_{0}]/|E_{0}| and the evolved fidelity Tr(|u0u0|ϕ(s))\textrm{Tr}(|u_{0}\rangle\langle u_{0}|\phi(s)) between the exact ground state |u0|u_{0}\rangle and the iterated state ϕ(s)\phi(s) under different strength β\beta. The fidelity is evaluated exactly for the state obtained in the noisy simulation. The corresponding calculation is just classical. Table 1 implies that the fidelity increases with the decreasing of β\beta. It suggests that we still can obtain the ground state with amenable state fidelity even though the quantum device has a global noise.

Table 1: The numerical results of noisy simulation of our algorithm when the precision ϵ=102\epsilon=10^{-2}.
β\beta 0.00 0.02 0.04 0.06 0.08 0.1
[E(s)E0]/|E0|[E(s)-E_{0}]/|E_{0}| 0.0778 0.1272 0.1783 0.2313 0.2862 0.3431
Tr(|u0u0|ϕ(s))\textrm{Tr}(|u_{0}\rangle\langle u_{0}|\phi(s)) 0.9911 0.9805 0.9696 0.9583 0.9467 0.9346

Furthermore, we utilize the VQE Peruzzo2014Variational to obtain the ground state by minimizing the cost function C(𝜸)=00|W(𝜸)W(𝜸)|00C(\boldsymbol{\gamma})=\langle 00|W^{{\dagger}}(\boldsymbol{\gamma})\mathcal{H}W(\boldsymbol{\gamma})|00\rangle. The PQC is given by

W(𝜸)=\displaystyle W(\boldsymbol{\gamma})= [Ry(γ3)Ry(γ4)](|00|I2+|11|σz)×\displaystyle[R_{y}(\gamma_{3})\otimes R_{y}(\gamma_{4})](|0\rangle\langle 0|\otimes I_{2}+|1\rangle\langle 1|\otimes\sigma_{z})\times
[Ry(γ1)Ry(γ2)].\displaystyle[R_{y}(\gamma_{1})\otimes R_{y}(\gamma_{2})].

We apply the Nelder-Mead algorithm to optimize the 44 variational parameters and find the optimal parameter 𝜸opt=(0.2556,1.9895,0.8996,3.4769)\boldsymbol{\gamma}_{\textrm{opt}}=(0.2556,1.9895,-0.8996,-3.4769). We also consider a 22 depth circuit with 8 parameters. As shown in Fig. 4, our result converges faster than the VQE whose successful implementation requires to choose suitable PQC, avoid the local minimal value and train the cost function, while our iteration quantum algorithm can always obtain the ground state without the so-called barren plateau.

Refer to caption
Figure 4: The numerical simulation of VQE and our approach under conditions β=0\beta=0 and ϵ=102\epsilon=10^{-2}. VQE1 and VQE2 denote the cases with the depth of the ansatz 11 and 22.

Finally, we investigate the success probability after each iteration. As shown in Fig. 5 (a), our algorithm has higher success probability than FQE. This implies that our method requires much less repetition to prepare the iterated state and thus reduces the computation resources. Here, the success probability P(s+1)P(s+1) corresponds to the measurement result when we have obtained the state |ϕ(s)|\phi(s)\rangle. In this case it can be seen as a local success probability. If we consider the former ss steps as an overall procedure, the global success probability is given by P(s+1)=j=1sP(j)P^{{}^{\prime}}(s+1)=\prod_{j=1}^{s}P(j). Fig. 5 (b) indicates that the global success probability decays exponentially with the number of iterations.

Refer to caption
Figure 5: (a) The local success probability of each iteration. (b) The global success probability of each iteration. In both cases, the learning rate μ=102/2\mu=\sqrt{10^{-2}}/2.

III.1.2 The Heisenberg model

We now consider the following Heisenberg spin-1/2 chain Hamiltonian with open boundaries,

^=\displaystyle\hat{\mathcal{H}}= Jj=1n1(σx(j)σx(j+1)+σy(j)σy(j+1)+σz(j)σz(j+1))\displaystyle-J\sum_{j=1}^{n-1}(\sigma_{x}^{(j)}\sigma_{x}^{(j+1)}+\sigma_{y}^{(j)}\sigma_{y}^{(j+1)}+\sigma_{z}^{(j)}\sigma_{z}^{(j+1)})
hj=1nσz(j).\displaystyle-h\sum_{j=1}^{n}\sigma_{z}^{(j)}. (23)

for n=2n=2 and 44.

For n=2n=2, the gradient operator is of the form,

G\displaystyle G =I42η^\displaystyle=I_{4}-2\eta\hat{\mathcal{H}}
=0.2I4+0.4I4+0.4I4+2μJ[σx(1)σx(2)+σy(1)σy(2)\displaystyle=0.2I_{4}+0.4I_{4}+0.4I_{4}+2\mu J[\sigma_{x}^{(1)}\sigma_{x}^{(2)}+\sigma_{y}^{(1)}\sigma_{y}^{(2)}
+σz(1)σz(2)]+2μh(σz(1)+σz(2)).\displaystyle+\sigma_{z}^{(1)}\sigma_{z}^{(2)}]+2\mu h(\sigma_{z}^{(1)}+\sigma_{z}^{(2)}). (24)

Performing VQSP, we prepare the ancillary state |𝒚|\boldsymbol{y}\rangle associated with the vector

𝒚=1𝒩y[\displaystyle\boldsymbol{y}=\frac{1}{\sqrt{\mathcal{N}_{y}}}[ 0.2,0.4,0.4,2μJ,2μJ,2μJ,\displaystyle\sqrt{0.2},\sqrt{0.4},\sqrt{0.4},\sqrt{2\mu J},\sqrt{2\mu J},\sqrt{2\mu J},
2μh,2μh],\displaystyle\sqrt{2\mu h},\sqrt{2\mu h}], (25)

where the constant 𝒩y=1+2μ(3J+2h)\mathcal{N}_{y}=1+2\mu(3J+2h). The used PQC has 33 layers and 99 parameters. The structure is similar to Fig. 1 (b). The initial state |ϕ(0)|\phi(0)\rangle is produced by unitary (21) with same parameters. From Fig. 6 (c) and (d), we see that the energy and fidelity convergence approaches the ground state and the exact value 11 when the precision ϵ102\epsilon\leq 10^{-2} [μ(0,0.01/2=0.05)][\mu\in(0,\sqrt{0.01}/2=0.05)]. For the Heisenberg model, the largest eigenvalue is 33 and the smallest eigenvalue is 1.2-1.2. Based on Theorem 1, the theoretical interval of the learning rate is μ(0,131.20.5556)\mu\in(0,\frac{1}{3-1.2}\approx 0.5556) which contains the empirical interval (0,0.05)(0,0.05). There exists ϵ\epsilon such that μ(0.05,0.5556)\mu\in(0.05,0.5556) as shown in Fig. 6 (b). When ϵ=1.5\epsilon=1.5, the learning rate μ=1.5/2=0.6124(0,0.5556)\mu=\sqrt{1.5}/2=0.6124\not\in(0,0.5556). Thus, in Fig. 6 (a) the energy does not converge to the one of ground state.

Refer to caption
Figure 6: The convergence curves of energy (left axis) and fidelity (right axis) for the Heisenberg model with n=2n=2, h=0.1h=0.1 and J=1J=1. In VQSP, the trained values of the corresponding cost function F(𝜽opt)F(\boldsymbol{\theta}_{\textrm{opt}}) are (a) 4.2×10134.2\times 10^{-13}, (b) 4.132×10104.132\times 10^{-10}, (c) 5.909×10105.909\times 10^{-10} and (d) 1.51×10121.51\times 10^{-12}.

For the Heisenberg model with n=4n=4 and h=J=1h=J=1, the identity I16I_{16} in the gradient operator is decomposed into three parts 0.2I16+0.4I16+0.4I160.2I_{16}+0.4I_{16}+0.4I_{16}. For preparing the associated ancilla state |𝒚|\boldsymbol{y}\rangle, the PQC has 22 layers and each layer contains single rotations Ry(θi)R_{y}(\theta_{i}) and controlled rotations Ry(θj)R_{y}(\theta_{j}). Thus the number of parameters is 1616. The training of the cost function is shown in Fig. 7 (a) and the final value is 2.3900×1032.3900\times 10^{-3}. The fidelity of preparing |𝒚|\boldsymbol{y}\rangle is 0.99920.9992. The initial state is |ϕ(0)=i=14Ry(αi)|04|\phi(0)\rangle=\bigotimes_{i=1}^{4}R_{y}(\alpha_{i})|0^{\otimes 4}\rangle with

(α1,α2,α3,α4)=(0.5906,0.6604,0.0476,0.3488).\displaystyle(\alpha_{1},\alpha_{2},\alpha_{3},\alpha_{4})=(0.5906,0.6604,0.0476,0.3488). (26)

Fig. 7 (b) shows that our algorithm can successful converge towards the ground energy.

Refer to caption
Figure 7: (a) The convergence curves of energy (left axis) and fidelity (right axis) for the Heisenberg model with n=8n=8 and J=1J=1, h=0.1h=0.1. The learning rate μ=102/2\mu=\sqrt{10^{-2}}/2. (b) The training process of the cost function F(𝜽)F(\boldsymbol{\theta}). The final value is 2.38995×1032.38995\times 10^{-3}.

For the Heisenberg model with n=8n=8, h=0.1h=0.1 and J=1J=1, the identity I256I_{256} in the gradient operator is decomposed into three parts 0.2I256+0.4I256+0.4I2560.2I_{256}+0.4I_{256}+0.4I_{256}. For preparing the associated ancilla state |𝒚|\boldsymbol{y}\rangle, the PQC has 33 layers and each layer contains single rotations Ry(θi)R_{y}(\theta_{i}) and controlled rotations Ry(θj)R_{y}(\theta_{j}). Thus the number of parameters is 3030. The training of the cost function is shown in Fig. 7 (c) and the final value is 6.8840×1046.8840\times 10^{-4}. The fidelity of preparing |𝒚|\boldsymbol{y}\rangle is 0.99980.9998. The initial state is |ϕ(0)=i=18Ry(αi)|08|\phi(0)\rangle=\bigotimes_{i=1}^{8}R_{y}(\alpha_{i})|0^{\otimes 8}\rangle with

(α1,,α8)=(\displaystyle(\alpha_{1},\cdots,\alpha_{8})=( 0.1079,0.1822,0.0991,0.4898,0.1932,\displaystyle 0.1079,0.1822,0.0991,0.4898,0.1932,
0.8959,0.0991,0.0442).\displaystyle 0.8959,0.0991,0.0442). (27)

Fig. 7 (d) shows that our algorithm can successful converge towards the ground energy.

III.2 Performance and scaling of different algorithms

In this subsection, we analyse and compare the computational complexity of our approach with other ground state estimation methods. The computational complexity contains the number of qubits (qubit complexity), the number of quantum gates (gate complexity), and the number of measurements (sampling complexity).

First of all, we evaluate the computational complexity of our iterative algorithm. The number of qubits used to store the state |𝒚|ϕ(s)|\boldsymbol{y}\rangle|\phi(s)\rangle in two registers is n+k~n+\tilde{k}. Our algorithm includes the VQSP part and the block unitary evolution part. The gate complexity of VQSP scales as 𝒪[poly(k~)]\mathcal{O}[\textrm{poly}(\tilde{k})] with k~=log2(K+1)\tilde{k}=\log_{2}(K+1), where the constant KK scales polynomial on the number of qubits as 𝒪[poly(n)]\mathcal{O}[\textrm{poly}(n)]. Thus the overhead of VQSP is 𝒪[poly(logn)]\mathcal{O}[\textrm{poly}(\log n)]. As analysed in II.2, the cost of performing unitary part scales as 𝒯total=𝒪[l(K+1)k~2]=𝒪[lpoly(n)(logn)2]𝒪[lpoly(n)]\mathcal{T}_{\textrm{total}}=\mathcal{O}[l(K+1)\widetilde{k}^{2}]=\mathcal{O}[l\textrm{poly}(n)(\log n)^{2}]\approx\mathcal{O}[l\textrm{poly}(n)]. Here we neglect the item (logn)2(\log n)^{2}. As a result, the gate complexity of our algorithm is

𝒪[lNtpoly(n)+2Ntpoly(logn)],\displaystyle\mathcal{O}[lN_{t}\textrm{poly}(n)+2N_{t}\textrm{poly}(\log n)], (28)

which depends polynomially on the number of Trotter steps NtN_{t}, qubit number nn and locality ll. For an nn-qubit and ll-local Hamiltonian ^\hat{\mathcal{H}}, a classical computer to perform imaginary time evolution in general requires time and space that scale at least as exp[𝒪(2n)]\textrm{exp}[\mathcal{O}(2^{n})]. Our algorithm overcomes these bottlenecks and reduces the runtime to 𝒪[lNtpoly(n)]\mathcal{O}[lN_{t}\textrm{poly}(n)] which exhibits an exponential speedup in the size of the system compared with classical methods. At each iteration, the sample complexity determined by the success probability scales as 𝒪[P(s+1)1]\mathcal{O}[P(s+1)^{-1}]. According to the property of the success probability, the number of measurements is

1P(1)=𝒩y2G|ϕ(0)22=i=0N1yki=0N1|λi|2|ci(0)|2<i=0N1yk|λ0|2|c0(0)|2.\displaystyle\frac{1}{P(1)}=\frac{\mathcal{N}_{y}^{2}}{\|G|\phi(0)\rangle\|_{2}^{2}}=\frac{\sum_{i=0}^{N-1}y_{k}}{\sum_{i=0}^{N-1}|\lambda_{i}|^{2}|c_{i}^{(0)}|^{2}}<\frac{\sum_{i=0}^{N-1}y_{k}}{|\lambda_{0}|^{2}|c_{0}^{(0)}|^{2}}.

Notice that the sampling complexity is related to three quantities including the summation of coefficients i=0Kyk\sum_{i=0}^{K}y_{k}, the overlap c0(0)c_{0}^{(0)} between the initial state |ϕ(0)|\phi(0)\rangle and the ground state |u0|u_{0}\rangle, and the largest (in absolute value) finite eigenvalue |λ0||\lambda_{0}| of the gradient operator GG. Let the summation of coefficients i=0Kyk𝒪[poly(n)]\sum_{i=0}^{K}y_{k}\sim\mathcal{O}[\textrm{poly}(n)] scales polynomially in the number of qubits and the overlap |c0(0)|2𝒪[1/poly(n)]|c_{0}^{(0)}|^{2}\sim\mathcal{O}[1/\textrm{poly}(n)]. In this case, the sampling complexity scales 𝒪[poly(n)]\mathcal{O}[\textrm{poly}(n)] which increases polynomially with the number of qubits, implying that our approach is practical for estimating the ground states. To decrease the sample complexity of an approach is clearly to increase |c0(0)|2|c_{0}^{(0)}|^{2} by using a variational method Note1 .

A promising approach for preparing a ground state of a Hamiltonian on near-term devices is the variational quantum eigensolver (VQE) with hybrid quantum-classical loops Peruzzo2014Variational . The classical computer trains a PQC with shallow depth and learns an optimal parameter which is fed into an NISQ device to produce the ground state McClean2016The . However, the efficiency of VQE highly depends on the ansatz choice Patti2021Entanglement and the optimization of the non-convex cost function Wang2021Noise . The classical optimization problems are generally NP-hard and the gradient descent algorithms may not converge to the optimal solution Bittel2021Training . Due to the fact that our iterative algorithm is based on the ITE, it circumvents the aforementioned potential problems if the subroutine VQSP is successful. It is worth to remark that our algorithm prepares the ground state on the original state space rather than parameter space, namely, it can always reach the global minimal. Moreover, the VQE and our algorithm consume different quantum resources. For a single step, the VQE only involves a trial state preparation which has an overhead 𝒪(poly(n))\mathcal{O}(\textrm{poly}(n)) Peruzzo2014Variational . Based on Eq. (28), the overhead of our algorithm is 𝒪[lpoly(n)+2poly(logn)]\mathcal{O}[l\textrm{poly}(n)+2\textrm{poly}(\log n)]. Thus, the VQE has small gate cost compared with our algorithm in this case.

Another iterative quantum algorithm based on LCU is the FQE Wei2020A . FQE starts with a different ancillary state,

|𝒚~=1𝒩~yk=0Kyk|k,𝒩~y=k=0Kyk2.\displaystyle|\tilde{\boldsymbol{y}}\rangle=\frac{1}{\sqrt{\widetilde{\mathcal{N}}_{y}}}\sum_{k=0}^{K}y_{k}|k\rangle,~{}~{}\widetilde{\mathcal{N}}_{y}=\sum_{k=0}^{K}y_{k}^{2}. (29)

After performing the controlled unitary Λk~G\Lambda_{\tilde{k}}G on the composed state |𝒚~|ϕ(s)|\tilde{\boldsymbol{y}}\rangle\otimes|\phi(s)\rangle, one obtains the state

1𝒩~yk=0Kyk|kGk|ϕ(s).\displaystyle\frac{1}{\sqrt{\widetilde{\mathcal{N}}_{y}}}\sum_{k=0}^{K}y_{k}|k\rangle\otimes G_{k}|\phi(s)\rangle. (30)

Next one applies k~\tilde{k} Hadamard gate HH on the ancillary system and gets

1(K+1)𝒩~yk,j=0Kyk(1)kj|jGk|ϕ(s),\displaystyle\frac{1}{\sqrt{(K+1)\widetilde{\mathcal{N}}_{y}}}\sum_{k,j=0}^{K}y_{k}(-1)^{k\cdot j}|j\rangle\otimes G_{k}|\phi(s)\rangle, (31)

where kjk\cdot j denotes the bitwise inner product of kk and jj modulo 2. Notice that this step is different from our approach. In our algorithm, we apply the unitary U(𝜽opt)U^{{\dagger}}(\boldsymbol{\theta}_{\textrm{opt}}) rather than Hadamard gates. Using the projector operator MM to measure the above state, one gets the collapsed state,

1P~(s)(K+1)𝒩~y|0k~k=0KykGk|ϕ(s),\displaystyle\frac{1}{\sqrt{\widetilde{P}(s)(K+1)\widetilde{\mathcal{N}}_{y}}}|0^{\otimes\tilde{k}}\rangle\sum_{k=0}^{K}y_{k}G_{k}|\phi(s)\rangle, (32)

which is proportional to state |ϕ(s+1)=G|ϕ(s)G|ϕ(s)2|\phi(s+1)\rangle=\frac{G|\phi(s)\rangle}{\|G|\phi(s)\rangle\|_{2}}. The success probability of obtaining the result |0k~|0^{\otimes\tilde{k}}\rangle is

P~(s)=G|ϕ(s)22(K+1)𝒩~y=G|ϕ(s)22(K+1)k=0Kyk2.\displaystyle\widetilde{P}(s)=\frac{\|G|\phi(s)\rangle\|_{2}^{2}}{(K+1)\widetilde{\mathcal{N}}_{y}}=\frac{\|G|\phi(s)\rangle\|_{2}^{2}}{(K+1)\sum_{k=0}^{K}y_{k}^{2}}. (33)
Lemma 2.

The success probability P~(s)\widetilde{P}(s) is an increasing sequence such that

P~(1)P~(2)P~(S).\displaystyle\widetilde{P}(1)\leq\widetilde{P}(2)\leq\cdots\leq\widetilde{P}(S). (34)

The proof of Lemma 2 is similar the proof of Lemma 1. Due to the fact that

[k=0Kyk]2(K+1)k=0Kyk2\displaystyle\Bigg{[}\sum_{k=0}^{K}y_{k}\Bigg{]}^{2}-(K+1)\sum_{k=0}^{K}y_{k}^{2} =k=0Kj=k+1K(ykyj)20,\displaystyle=-\sum_{k=0}^{K}\sum_{j=k+1}^{K}(y_{k}-y_{j})^{2}\leq 0,

we obtain 𝒩y2(K+1)𝒩~y\mathcal{N}_{y}^{2}\leq(K+1)\widetilde{\mathcal{N}}_{y}. Thus, the success probability of our approach is higher than the counterpart of the FQE, P(s)P~(s)P(s)\geq\widetilde{P}(s). This property indicates that the sampling complexity of our algorithm is smaller than that of FQE.

As the gate complexity of preparing ancillary state is 𝒪(2k~)=𝒪[poly(n)]\mathcal{O}(2^{\tilde{k}})=\mathcal{O}[\textrm{poly}(n)], the total gate complexity of FQE is

𝒪[lNtpoly(n)+Ntpoly(n)].\displaystyle\mathcal{O}[lN_{t}\textrm{poly}(n)+N_{t}\textrm{poly}(n)]. (35)

Compared with Eq. (28), it is clear that our algorithm reduces the gate complexity. However, the circuit depth of FQE and our algorithm are D1+D2D_{1}+D_{2} and 2D1+D22D_{1}+D_{2}, respectively, where D1D_{1} and D2D_{2} denote the depth of VQSP and the block diagonal unitary. Thus FQE has more shallower circuit depth than our algorithm in this case.

Concerning the training of the cost function Eq. (11), there could be multiple local minima due to the selected ansatz. It is a key problem to find the global minimal value. Similar to general VQAs, our algorithm also encounters the so-called barren plateau (BP) phenomenon Cerezo2021Variational . In our numerical simulations, we have employed different initial points until the global minimal is reached. However, several other approaches may also be used in our algorithm such as designing local cost function Cerezo2021Cost and constructing adaptive circuit structure Pesah2021Absence .

IV Comparison with quantum imaginary time evolution

We remark that our algorithm framework can be treated as a quantum version of ITE. In this section we give a detailed comparison with the quantum imaginary time evolution (QITE) Motta2020Determining .

Given a Hamiltonian ^=k=0K1hk^k\hat{\mathcal{H}}=\sum_{k=0}^{K-1}h_{k}\hat{\mathcal{H}}_{k} and an initial state |ϕ|\phi\rangle. For a small step Δt\Delta t, the Trotter-Suzuki decomposition can simulate the evolution,

eΔt^eΔt^1eΔt^2eΔt^K,\displaystyle e^{-\Delta t\hat{\mathcal{H}}}\approx e^{-\Delta t\hat{\mathcal{H}}_{1}}e^{-\Delta t\hat{\mathcal{H}}_{2}}\cdots e^{-\Delta t\hat{\mathcal{H}}_{K}}, (36)

where the Trotter error subsumes terms of order Δt2\Delta t^{2} and higher. The QITE replaces each non-unitary step eΔt^ke^{-\Delta t\hat{\mathcal{H}}_{k}} with a unitary evolution eιΔtA^ke^{-\iota\Delta t\hat{A}_{k}} Motta2020Determining . More specially, the goal of QITE is to minimize the difference between states

|ϕ\displaystyle|\phi^{{}^{\prime}}\rangle =eΔt^k|ϕϕ|e2Δt^k|ϕ(INΔt^k)|ϕϕ|(INΔt^k)|ϕ,\displaystyle=\frac{e^{-\Delta t\hat{\mathcal{H}}_{k}}|\phi\rangle}{\sqrt{\langle\phi|e^{-2\Delta t\hat{\mathcal{H}}_{k}}|\phi\rangle}}\approx\frac{(I_{N}-\Delta t\hat{\mathcal{H}}_{k})|\phi\rangle}{\sqrt{\langle\phi|(I_{N}-\Delta t\hat{\mathcal{H}}_{k})|\phi\rangle}}, (37)

and

eιΔtA^k|ϕ(INιΔtA^k)|ϕ.\displaystyle e^{-\iota\Delta t\hat{A}_{k}}|\phi\rangle\approx(I_{N}-\iota\Delta t\hat{A}_{k})|\phi\rangle. (38)

Decomposing the DD-qubit operator A^k\hat{A}_{k} as a sum of Pauli strings, we have

A^k=i1i2iDa[k]i1i2iDσi1σi2σiD=a[k]σ,\displaystyle\hat{A}_{k}=\sum_{i_{1}i_{2}\cdots i_{D}}a[k]_{i_{1}i_{2}\cdots i_{D}}\sigma_{i_{1}}\sigma_{i_{2}}\cdots\sigma_{i_{D}}=\sum_{\mathcal{I}}a[k]_{\mathcal{I}}\sigma_{\mathcal{I}},

where \mathcal{I} denotes the index i1i2iDi_{1}i_{2}\cdots i_{D}. The minimum value of the function

|ϕ|ϕΔt+ιA^k|ϕ2\displaystyle\left\|\frac{|\phi^{{}^{\prime}}\rangle-|\phi\rangle}{\Delta t}+\iota\hat{A}_{k}|\phi\rangle\right\|_{2} (39)

can be approximately obtained by solving the linear equation (𝑺+𝑺T)𝒂=𝒃(\boldsymbol{S}+\boldsymbol{S}^{T})\boldsymbol{a}=-\boldsymbol{b}, where 𝑺\boldsymbol{S} and 𝒃\boldsymbol{b} are the expectation values obtained by measuring |ϕ|\phi\rangle.

Different from QITE, our algorithm achieves the non-unitary INΔt^kI_{N}-\Delta t\hat{\mathcal{H}}_{k} by adding a single ancilla qubit. By applying a unitary

U=(11+ΔtΔt1+ΔtΔt1+Δt11+Δt)=Ry(𝜽opt)\displaystyle U=\begin{pmatrix}\frac{1}{\sqrt{1+\Delta t}}&\frac{\sqrt{\Delta t}}{\sqrt{1+\Delta t}}\\ \frac{-\sqrt{\Delta t}}{\sqrt{1+\Delta t}}&\frac{1}{\sqrt{1+\Delta t}}\end{pmatrix}=R_{y}(\boldsymbol{\theta}_{\textrm{opt}}) (40)

on state |0|0\rangle, we prepare a superposition state |𝒚=11+Δt|0Δt1+Δt|1|\boldsymbol{y}\rangle=\frac{1}{\sqrt{1+\Delta t}}|0\rangle-\frac{\sqrt{\Delta t}}{\sqrt{1+\Delta t}}|1\rangle, where the parameter 𝜽opt=2arccos11+Δt\boldsymbol{\theta}_{\textrm{opt}}=2\arccos\frac{1}{\sqrt{1+\Delta t}}. Next, we apply the unitary U(|00|IN+|11|^k)U^{{\dagger}}(|0\rangle\langle 0|\otimes I_{N}+|1\rangle\langle 1|\otimes\hat{\mathcal{H}}_{k}) to the state |𝒚|ϕ|\boldsymbol{y}\rangle\otimes|\phi\rangle, which yields the following state,

|0(11+Δt|ϕΔt1+Δt^k|ϕ)\displaystyle|0\rangle\left(\frac{1}{1+\Delta t}|\phi\rangle-\frac{\Delta t}{1+\Delta t}\hat{\mathcal{H}}_{k}|\phi\rangle\right)-
|1(Δt1+Δt|ϕΔt1+Δt^k|ϕ).\displaystyle|1\rangle\left(\frac{\sqrt{\Delta t}}{1+\Delta t}|\phi\rangle-\frac{\sqrt{\Delta t}}{1+\Delta t}\hat{\mathcal{H}}_{k}|\phi\rangle\right). (41)

Finally, we measure the ancilla qubit to get the resulting state |0|0\rangle. The success probability is

P\displaystyle P =1(1+Δt)2|ϕΔt^k|ϕ22,\displaystyle=\frac{1}{\left(1+\Delta t\right)^{2}}\left\||\phi\rangle-\Delta t\hat{\mathcal{H}}_{k}|\phi\rangle\right\|_{2}^{2}, (42)

and the collapsed state is

|ϕ=|ϕΔt^k|ϕ(1+Δt)P=(INΔt^k)|ϕ(INΔt^k)|ϕ2.\displaystyle|\phi^{{}^{\prime}}\rangle=\frac{|\phi\rangle-\Delta t\hat{\mathcal{H}}_{k}|\phi\rangle}{\left(1+\Delta t\right)\sqrt{P}}=\frac{(I_{N}-\Delta t\hat{\mathcal{H}}_{k})|\phi\rangle}{\left\|(I_{N}-\Delta t\hat{\mathcal{H}}_{k})|\phi\rangle\right\|_{2}}. (43)

In summary, our algorithm requires one single ancilla qubit to achieve the imaginary time evolution while the QITE needs no any other ancilla qubit. Moreover, our approach is a totally quantum procedure. It does not use any classical part compared with QITE. Based on the locality of the Hamiltonian, the controlled unitary can be decomposed into ll single controlled Pauli operators. More importantly, all the required quantum gates are two qubit gates. Therefore, our method is suitable for NISQ devices.

V Conclusion and discussion

The selection of learning rate and the preparation of ancillary state are two crucial problems for an iterative quantum algorithm to estimate the ground state of quantum systems. We have established an equivalence between imaginary time evolution (ITE) and our improved iterative quantum algorithm. This equivalence can be thought of to be a benchmark to determine the learning rate in terms of the time step of the ITE. Furthermore, we have utilized a variational quantum algorithm to produce an amplitude encoding quantum state whose elements are associated with the coefficients of the gradient operator. This preparation strategy requires polynomial resources in contrast with other preparation approaches, for instance, the Grover-oracle-based method Soklakov2006Efficient , the decomposition of universal quantum gate Plesch2011Quantum and ancillary-assisted methods Zhang2021Low ; Zhang2022Quantum . This crucial subroutine enables our algorithm to have lower gate complexity. Notice that Nakaji et.al. proposed an efficient method for approximate amplitude encoding by using a shallow PQC Nakaji2022Approximate . Their method is more general and tackles the real-valued vector rather than positive vector. However, the defined cost function is different. The numerical results demonstrate that our proposal successfully finds the ground state of the deuteron molecule with high state fidelity. Our approach not only estimates the ground state energy but also prepares the ground state.

Our approach has general applicability to problems in many-body physics, since it is potential to implement a non-unitary operator on an arbitrary state. For example, given the Lindblad operator of open quantum systems, the entire non-unitary evolution can be simulated by combining the vectorizing density matrix with our approach. The iteration quantum framework can naturally be used to prepare non-equilibrium steady states Liang2022Quantum , excited states Wen2021A and to compute the time domain Green’s function Keen2021Quantum .


Acknowledgements: This work is supported by the National Natural Science Foundation of China (NSFC) under Grants 12075159 and 12171044; Beijing Natural Science Foundation (Grant No. Z190005); Academy for Multidisciplinary Studies, Capital Normal University; the Academician Innovation Platform of Hainan Province; Shenzhen Institute for Quantum Science and Engineering, Southern University of Science and Technology (No. SIQSE202001); Shandong Provincial Natural Science Foundation for Quantum Science No.ZR2020LLZ003, ZR2021LLZ002, and the Fundamental Research Funds for the Central Universities No.22CX03005A.

Conflict of Interest: The authors declare no conflict of interest.

Data Availability Statement: The data that support the findings of this study are available from the corresponding author upon reasonable request

Appendix A Proof of the Lemma 1

As shown in the main text, the success probability of each iteration is

P(s+1)=G|ϕ(s)22𝒩y2.\displaystyle P(s+1)=\frac{\|G|\phi(s)\rangle\|_{2}^{2}}{\mathcal{N}_{y}^{2}}. (44)

After making a measurement, we get the state |ϕ(s+1)|\phi(s+1)\rangle with probability P(s+1)P(s+1). Expanding |ϕ(s)|\phi(s)\rangle in the eigenbasis {|ui}i=0N1\{|u_{i}\rangle\}_{i=0}^{N-1}, we obtain

|ϕ(s)=i=0N1ci(s)|ui,i=0N1|ci(s)|2=1.\displaystyle|\phi(s)\rangle=\sum_{i=0}^{N-1}c_{i}^{(s)}|u_{i}\rangle,~{}~{}\sum_{i=0}^{N-1}|c_{i}^{(s)}|^{2}=1. (45)

Plugging Eq. (45) in Eq. (44), we have

P(s+1)=i=0N1|λi|2|ci(s)|2𝒩y2.\displaystyle P(s+1)=\frac{\sum_{i=0}^{N-1}|\lambda_{i}|^{2}|c_{i}^{(s)}|^{2}}{\mathcal{N}_{y}^{2}}. (46)

By the same trick, the success probability of obtaining state |ϕ(s+2)|\phi(s+2)\rangle is given by

P(s+2)=i=0N1|λi|2|ci(s+1)|2𝒩y2.\displaystyle P(s+2)=\frac{\sum_{i=0}^{N-1}|\lambda_{i}|^{2}|c_{i}^{(s+1)}|^{2}}{\mathcal{N}_{y}^{2}}. (47)

Here, the state |ϕ(s+1)|\phi(s+1)\rangle is given by

|ϕ(s+1)\displaystyle|\phi(s+1)\rangle =G|ϕ(s)G|ϕ(s)2=i=0N1λici(s)|uii=0N1|λi|2|ci(s)|2\displaystyle=\frac{G|\phi(s)\rangle}{\|G|\phi(s)\rangle\|_{2}}=\frac{\sum_{i=0}^{N-1}\lambda_{i}c_{i}^{(s)}|u_{i}\rangle}{\sqrt{\sum_{i=0}^{N-1}|\lambda_{i}|^{2}|c_{i}^{(s)}|^{2}}}
=i=0N1ci(s+1)|ui,\displaystyle=\sum_{i=0}^{N-1}c_{i}^{(s+1)}|u_{i}\rangle, (48)

where the coefficients

ci(s+1)=λici(s)i=0N1|λi|2|ci(s)|2.\displaystyle c_{i}^{(s+1)}=\frac{\lambda_{i}c_{i}^{(s)}}{\sqrt{\sum_{i=0}^{N-1}|\lambda_{i}|^{2}|c_{i}^{(s)}|^{2}}}. (49)

We now show P(s+2)P(s+1)P(s+2)\geq P(s+1) by showing P(s+2)/P(s+1)1P(s+2)/P(s+1)\geq 1. It is clear to see that

P(s+2)P(s+1)\displaystyle\frac{P(s+2)}{P(s+1)} =i=0N1|λi|2|ci(s+1)|2i=0N1|λi|2|ci(s)|2=i=0N1|λi|4|ci(s)|2[i=0N1|λi|2|ci(s)|2]2.\displaystyle=\frac{\sum_{i=0}^{N-1}|\lambda_{i}|^{2}|c_{i}^{(s+1)}|^{2}}{\sum_{i=0}^{N-1}|\lambda_{i}|^{2}|c_{i}^{(s)}|^{2}}=\frac{\sum_{i=0}^{N-1}|\lambda_{i}|^{4}|c_{i}^{(s)}|^{2}}{\Big{[}\sum_{i=0}^{N-1}|\lambda_{i}|^{2}|c_{i}^{(s)}|^{2}\Big{]}^{2}}.

By using the inequality |λi||λj||\lambda_{i}|\geq|\lambda_{j}| for iji\leq j and the condition i=0N1|ci(s)|2=1\sum_{i=0}^{N-1}|c_{i}^{(s)}|^{2}=1, we have

i=0N1|λi|4|ci(s)|2[i=0N1|λi|2|ci(s)|2]2\displaystyle\sum_{i=0}^{N-1}|\lambda_{i}|^{4}|c_{i}^{(s)}|^{2}-\Bigg{[}\sum_{i=0}^{N-1}|\lambda_{i}|^{2}|c_{i}^{(s)}|^{2}\Bigg{]}^{2}
=i,j=0N1|λi|4|ci(s)|2|cj(s)|2i,j=0N1|λi|2|ci(s)|2|λj|2|cj(s)|2\displaystyle=\sum_{i,j=0}^{N-1}|\lambda_{i}|^{4}|c_{i}^{(s)}|^{2}|c_{j}^{(s)}|^{2}-\sum_{i,j=0}^{N-1}|\lambda_{i}|^{2}|c_{i}^{(s)}|^{2}|\lambda_{j}|^{2}|c_{j}^{(s)}|^{2}
=i,j=0N1|λi|2(|λi|2|λj|2)|ci(s)|2|cj(s)|2\displaystyle=\sum_{i,j=0}^{N-1}|\lambda_{i}|^{2}(|\lambda_{i}|^{2}-|\lambda_{j}|^{2})|c_{i}^{(s)}|^{2}|c_{j}^{(s)}|^{2}
=i=0N1j=i+1N1(|λi2||λj2|)2|ci(s)|2|cj(s)|2\displaystyle=\sum_{i=0}^{N-1}\sum_{j=i+1}^{N-1}\left(|\lambda_{i}^{2}|-|\lambda_{j}^{2}|\right)^{2}|c_{i}^{(s)}|^{2}|c_{j}^{(s)}|^{2}
0.\displaystyle\geq 0. (50)

Therefore, we obtain P(s+2)/P(s+1)1P(s+2)/P(s+1)\geq 1.

Appendix B Proof of Theorem 1

We here prove the Theorem 1. Consider the eigenvalue decomposition of the non-unitary operator G=i=0N1λi|uiui|G=\sum_{i=0}^{N-1}\lambda_{i}|u_{i}\rangle\langle u_{i}|, λi=12Eiμ\lambda_{i}=1-2E_{i}\mu. We define functions Wi(μ)=|λi|=|12Eiμ|W_{i}(\mu)=|\lambda_{i}|=|1-2E_{i}\mu| with variable μ\mu for i=0,,N1i=0,\cdots,N-1. Based on the power method Golub2013Matrix , the evolved state |ϕ(t)|\phi(t)\rangle converges to the ground state of ^\hat{\mathcal{H}} if

|W0(μ)|>|W1(μ)||WN1(μ)|.\displaystyle|W_{0}(\mu)|>|W_{1}(\mu)|\geq\cdots\geq|W_{N-1}(\mu)|. (51)

Let us discuss the upper bound of the learning rate μ\mu according to following three cases.

Case 1: 0E0<E1EN10\leq E_{0}<E_{1}\leq\cdots\leq E_{N-1}.

Case 2: E0<E1EN10E_{0}<E_{1}\leq\cdots\leq E_{N-1}\leq 0.

Case 3: E0<E1Ei10Ei+1EN1E_{0}<E_{1}\leq\cdots\leq E_{i-1}\leq 0\leq E_{i+1}\leq\cdots\leq E_{N-1}.

In Case 1, we can verify that Wi(μ)[0,1]W_{i}(\mu)\in[0,1] in the interval μ[0,1Ei]\mu\in[0,\frac{1}{E_{i}}] and is a symmetric function about the axis μ=12Ei\mu=\frac{1}{2E_{i}} for all ii. For μ(12E0,+)\mu\in(\frac{1}{2E_{0}},+\infty), the following equality always hold,

WN1(μ)W1(μ)>W0(μ),\displaystyle W_{N-1}(\mu)\geq\cdots\geq W_{1}(\mu)>W_{0}(\mu), (52)

which does not satisfy the condition (51). In order to ensure the condition (51) for μ(0,12E0)\mu\in(0,\frac{1}{2E_{0}}), we set Wi(μ)=W0(μ)W_{i}(\mu)=W_{0}(\mu) and obtain μ=1Ei+E0\mu=\frac{1}{E_{i}+E_{0}}. Accounting to the inequality 1EN1+E01E1+E0\frac{1}{E_{N-1}+E_{0}}\leq\cdots\leq\frac{1}{E_{1}+E_{0}}, one sees that the algorithm converges to the ground state if μ(0,1EN1+E0)\mu\in(0,\frac{1}{E_{N-1}+E_{0}}). In Case 2, we have an increasing sequence |E0|>|E1||EN1|0|E_{0}|>|E_{1}|\geq\cdots\geq|E_{N-1}|\geq 0. Hence, the condition (51) is always true for μ(0,+)\mu\in(0,+\infty). In Case 3, we have an inequality |E0|>|E1||Ei1|0|E_{0}|>|E_{1}|\geq\cdots\geq|E_{i-1}|\geq 0 and further

1+2|E0|μ\displaystyle 1+2|E_{0}|\mu >1+2|E1|μ1+2|Ei1|μ1\displaystyle>1+2|E_{1}|\mu\geq\cdots\geq 1+2|E_{i-1}|\mu\geq 1

for μ>0\mu>0. In order to maintain the sequence (51), we let WN1(μ)<1+2|E0|μW_{N-1}(\mu)<1+2|E_{0}|\mu and find 2(EN1+E0)μ<12(E_{N-1}+E_{0})\mu<1. Notice that if 0EN1|E0|0\leq E_{N-1}\leq|E_{0}| (E0+EN10)(E_{0}+E_{N-1}\leq 0), the inequality (51) always holds for μ>0\mu>0. For EN1>|E0|E_{N-1}>|E_{0}| (E0+EN1>0)(E_{0}+E_{N-1}>0), we obtain an upper bound of μ\mu, μ(0,1EN1+E0)\mu\in(0,\frac{1}{E_{N-1}+E_{0}}). From the above analysis, we complete the proof.

References