
A simple quantum simulation algorithm with near-optimal precision scaling

Amir Kalev Information Sciences Institute, University of Southern California, Arlington, VA 22203, USA    Itay Hen [email protected] Information Sciences Institute, University of Southern California, Marina del Rey, California 90292, USA Department of Physics and Astronomy and Center for Quantum Information Science & Technology, University of Southern California, Los Angeles, California 90089, USA
Abstract

Quantum simulation is a foundational application for quantum computers, projected to offer insights into complex quantum systems that are beyond the reach of classical computation. However, with the exception of Trotter-based methods, which suffer from suboptimal scaling with respect to simulation precision, existing simulation techniques are for the most part too intricate to implement on early fault-tolerant quantum hardware. We propose a quantum Hamiltonian dynamics simulation algorithm that is both straightforward to implement and has near-optimal scaling in simulation precision.

I Introduction

Quantum computers are widely believed to have unique advantages over classical computers when it comes to simulating Hamiltonian dynamics, due to their inherent quantum nature, which allows them to efficiently represent complex quantum states and the evolution thereof. Such simulations hold the potential to solve intricate problems in quantum chemistry, materials science and physics, making quantum computers potentially very powerful future tools for advancing scientific knowledge and tackling challenges beyond the capabilities of classical computing.

The precise manner in which Hamiltonian dynamics can be approximated on quantum devices varies considerably between approaches, as each is tailored to different resource constraints, runtime requirements and precision needs. Perhaps the most straightforward and resource-efficient technique is Trotterization, in which the time-evolution operator is approximated using Trotter-Suzuki product formulas [1, 2]. This approach is simple to implement; however, the resources required for its execution scale suboptimally with the error tolerance. A more advanced and widely popular technique is quantum signal processing (QSP) [3], which leverages a sequence of unitary operations combined with classical preprocessing to directly approximate the exponential of a Hamiltonian. Complementing QSP, the linear combination of unitaries (LCU) framework [4] encompasses some of the most powerful quantum simulation methods [4, 5, 6, 7, 8]. In LCU-based approaches, the time evolution operator is expressed as a linear combination of efficiently implementable unitary operators, with ancillary qubits facilitating the simulation through oblivious amplitude amplification [9]. This method is particularly advantageous for simulating Hamiltonians in quantum chemistry and materials science, where decomposition of the Hamiltonian into simpler components is feasible.

Despite enjoying near-optimal scaling of resources with respect to target precision — typically requiring only a logarithmic number of ancilla qubits — these more advanced approaches present significant challenges for implementation on near-term quantum computing devices. For example, in the typical LCU-based approach, the select unitary transformations are (multi-qubit) controlled Pauli strings. These controlled operations, which involve different Pauli operations acting on different qubits, translate to a non-trivial resource overhead, which hinders the technique's otherwise straightforward applicability on near-term quantum computing devices.

In this work, we present a novel LCU-based quantum simulation algorithm designed to be straightforward to implement, generally requiring only controlled-NOT (CNOT) operations as the select unitaries, while at the same time offering the benefits of near-optimal resource scaling, making dynamical simulations potentially more accessible for near-term devices.

The simulation algorithm we present builds on two recent studies [5, 6] which showed how the expansion of the unitary time-evolution operator in powers of its off-diagonal strength [10, 11, 12], combined with the use of divided differences [13, 14], leads to resource-efficient quantum simulation algorithms whose complexity is on par with, and in some cases superior to, other state-of-the-art optimal-precision simulation protocols [4, 15]. The approach we present here, which we refer to as the permutation matrix representation (PMR) simulation method, similarly utilizes a series expansion of the quantum time-evolution operator in its off-diagonal elements. However, in contrast to the work presented in Refs. [5, 6], here we utilize a powerful approximation of divided differences which allows us to render these as sums of phases, which in turn leads to simplifications in the select unitary operations.

As we demonstrate, when the above approximation is combined with the LCU technique, the resulting simulation algorithm enjoys both near-optimal scaling in terms of the overall simulation error and simple-to-implement CNOTs interleaved with controlled phases as the select unitary operations.

The paper is organized as follows. In Sec. II, we provide an overview of the permutation matrix representation technique which serves as a foundation for our algorithm. In Sec. III, we present the Hamiltonian dynamics algorithm, which we construct using PMR, and discuss the divided difference calculation including cost analysis and various extensions. A discussion and some conclusions are given in Sec. IV.

II Permutation matrix representation of the time evolution operator

Before delving into the technical specifics of our algorithm, we first briefly present the key components of the PMR approach to expanding matrix exponentials in a series [12, 5, 6]. For concreteness we restrict our attention here to time-independent Hamiltonians and defer the discussion of how the method extends to the time-dependent case to the concluding section. The PMR expansion consists of several steps, the first of which is choosing a preferred basis, the 'computational basis', in which the Hamiltonian matrix is represented, and casting the Hamiltonian $H$ as a sum of a diagonal operator and an off-diagonal operator in that basis. Hereafter we denote the set of computational basis states as $\{|z\rangle\}$. Intuitively, the diagonal part of the Hamiltonian corresponds to a 'classical' Hamiltonian whereas its off-diagonal part governs the non-trivial 'quantum' dynamics of the system in the chosen basis. The off-diagonal component is further decomposed into a sum of products of diagonal and permutation operators [12]. This defines the PMR form of the Hamiltonian:

H = D_{0}+\sum_{i=1}^{M}D_{i}P_{i}\,,   (1)

where the $D_{i}$'s are diagonal operators and the $P_{i}$'s are permutation operators, i.e., operators that permute the basis states. For example, for qubit systems, if one chooses the computational basis to be the eigenbasis of the Pauli-$Z$ operators, then the $P_{i}$ operators are strings of Pauli-$X$ operators. The decomposition Eq. (1) can be carried out efficiently for any physical Hamiltonian [12]. Next consider the evolution of a quantum state under $H$ for duration $t$, which is then split into a sequence of $r$ repeated short-time evolution circuits, each evolving the state by a short time period $\Delta t=t/r$, the value of which we determine later on in a manner designed to optimize resources (we discuss this in detail in Sec. III.3).
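As a concrete illustration of Eq. (1), the short Python sketch below casts a small transverse-field Ising chain into PMR form; the model, the couplings and all variable names are illustrative choices and not part of the algorithm specification. Here $D_{0}$ collects the (diagonal) Ising couplings, each $P_{i}$ is a single Pauli-$X$ flip, and each $D_{i}$ is a constant diagonal.

```python
# A minimal sketch of the PMR form of Eq. (1) for a 3-qubit transverse-field Ising
# chain, H = J * sum_j Z_j Z_{j+1} + h * sum_j X_j.  The values of n, J, h are
# illustrative and not prescribed by the text.
import numpy as np
from functools import reduce

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def on_site(op, site, n):
    """Embed a single-qubit operator on `site` of an n-qubit register."""
    return reduce(np.kron, [op if j == site else I2 for j in range(n)])

n, J, h = 3, 1.0, 0.7
H = sum(J * on_site(Z, j, n) @ on_site(Z, j + 1, n) for j in range(n - 1)) \
    + sum(h * on_site(X, j, n) for j in range(n))

# PMR form: D0 is the diagonal (Ising) part; each P_i is a Pauli-X string (here a
# single bit flip) and each D_i = h * identity is a constant diagonal operator.
D0 = sum(J * on_site(Z, j, n) @ on_site(Z, j + 1, n) for j in range(n - 1))
Ps = [on_site(X, j, n) for j in range(n)]
Ds = [h * np.eye(2 ** n, dtype=complex) for _ in range(n)]

H_pmr = D0 + sum(D @ P for D, P in zip(Ds, Ps))
print(np.allclose(H, H_pmr))          # Eq. (1) reproduces H exactly
```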

The PMR decomposition of the Hamiltonian allows us to write the (short) time evolution operator $U(\Delta t)=e^{-iH\Delta t}$ in an off-diagonal series expansion:

e^{-iH\Delta t} = \sum_{n=0}^{\infty}\frac{(-i\Delta t)^{n}}{n!}H^{n} = \sum_{n=0}^{\infty}\frac{(-i\Delta t)^{n}}{n!}\Big(\sum_{i=0}^{M}D_{i}P_{i}\Big)^{n},   (2)

where in the last step we identify $P_{0}$ with the identity operator. After some algebra, this expansion (see Refs. [10, 12] for a complete and detailed derivation) may be expressed as [5]

U(\Delta t) = \sum_{q=0}^{\infty}\frac{\Delta t^{q}}{q!}\sum_{{\bf i}_{q}}\Gamma_{{\bf i}_{q}}P_{{\bf i}_{q}}A_{{\bf i}_{q}}.   (3)

Here, the boldfaced index ${\bf i}_{q}=(i_{1},\ldots,i_{q})$ is a tuple of indices $i_{j}$, with $j=1,\ldots,q$, each ranging from $1$ to $M$, and $P_{{\bf i}_{q}}:=P_{i_{q}}\cdots P_{i_{2}}P_{i_{1}}$ is an ordered product of off-diagonal permutation operators. In addition, the operators $A_{{\bf i}_{q}}$ are the diagonal operators

A_{{\bf i}_{q}} = \sum_{z}\frac{d_{{\bf i}_{q}}}{\Gamma_{{\bf i}_{q}}}\frac{q!}{\Delta t^{q}}\,\text{e}^{-i\Delta t[E_{z_{0}},E_{z_{1}},\ldots,E_{z_{q}}]}\,|z\rangle\langle z|\,,   (4)

where $\text{e}^{-i\Delta t[E_{z_{0}},\ldots,E_{z_{q}}]}$ is the divided difference of the exponential function [13, 14] over the multi-set $\{E_{z_{0}},\ldots,E_{z_{q}}\}$, where $E_{z_{j}}=\langle z_{j}|D_{0}|z_{j}\rangle$ with $|z_{j}\rangle=P_{i_{j}}\cdots P_{i_{1}}|z\rangle$, and we use the convention that $z_{0}=z$. We provide additional details pertaining to divided differences in Appendix A. Note that the $z_{j}$'s in $|z_{j}\rangle$ and in $E_{z_{j}}$ should properly be denoted $z_{{\bf i}_{j}}(z)$; for conciseness, however, we use the abbreviation $z_{j}$. Lastly, we have denoted the product of off-diagonal matrix elements by $d_{{\bf i}_{q}}=\prod_{j=1}^{q}d_{i_{j}}(z_{j})$, where

d_{i_{j}}(z_{j}) = \langle z_{j}|D_{i_{j}}|z_{j}\rangle   (5)

can be considered the 'hopping strength' of $P_{i_{j}}$ with respect to $|z_{j}\rangle$, and we have defined the real-valued coefficients $\Gamma_{{\bf i}_{q}}=\prod_{j}\Gamma_{i_{j}}$, where $\Gamma_{i}=\max_{z}|d_{i}(z)|$.
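As a sanity check of the expansion in Eqs. (3)-(4), the sketch below sums the series for the simplest nontrivial case of a single qubit, $H=D_{0}+\Gamma X$ (so $M=1$, $P_{1}=X$ and $D_{1}=\Gamma\mathds{1}$), and compares the result to the exact matrix exponential. The divided differences of the exponential are evaluated with the standard bidiagonal-matrix construction; all parameter values and the truncation order are illustrative.

```python
# Numerical check of Eqs. (3)-(4) for a single qubit: H = diag(E0, E1) + Gamma * X.
# For M = 1 the only multi-index at order q is i_q = (1,...,1), so P_{i_q} = X^q,
# Gamma_{i_q} = Gamma^q and d_{i_q}/Gamma_{i_q} = 1.
import numpy as np
from scipy.linalg import expm

def dd_exp(inputs, dt):
    """Divided difference e^{-i*dt*[x0,...,xq]} via the bidiagonal (Opitz) matrix."""
    Z = np.diag(np.asarray(inputs, dtype=complex)) + np.diag(np.ones(len(inputs) - 1), -1)
    return expm(-1j * dt * Z)[-1, 0]

E0, E1, Gamma, dt, Q = 0.3, -1.1, 0.8, 0.5, 30     # illustrative parameters
X = np.array([[0, 1], [1, 0]], dtype=complex)
H = np.diag([E0, E1]) + Gamma * X

U_series = np.zeros((2, 2), dtype=complex)
for q in range(Q + 1):
    # A_{i_q} of Eq. (4) is diagonal; its entry for basis state z involves the
    # alternating energy sequence E_{z_0},...,E_{z_q} generated by repeated X flips.
    # The prefactors dt^q/q! and q!/dt^q cancel, leaving Gamma^q * X^q * diag(dd).
    dd = [dd_exp([(E0, E1)[(z + j) % 2] for j in range(q + 1)], dt) for z in (0, 1)]
    U_series += Gamma ** q * np.linalg.matrix_power(X, q) @ np.diag(dd)

print(np.max(np.abs(U_series - expm(-1j * dt * H))))   # ~1e-15 for these parameters
```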

In what follows, we use the expanded form of $U(\Delta t)$ given in Eq. (3) to show that the time evolution operator can be formulated as a linear combination of simple-to-execute unitary operators, which in turn allows us to utilize the LCU technique to approximate the time-evolution operator.

The main technical contribution of this paper is devising an algorithm that implements the LCU with near-optimal scaling while utilizing only simple-to-implement unitary operators, avoiding complicated classical computations on quantum registers as well as involved select unitaries. We present our algorithm next.

III The simulation algorithm

III.1 Divided difference exponentials as linear combinations of phases

To begin with, we present an efficient method for calculating in superposition the divided differences appearing in the $A_{{\bf i}_{q}}$ operators, in a manner that does not require any classical (i.e., quantum reversible) arithmetic calculations on additional ancillary registers.

First, let us focus on the divided differences of the exponential function $\text{e}^{-i\tau[x_{0},\ldots,x_{q}]}$ for a multi-set of inputs $\{x_{0},\ldots,x_{q}\}\in\mathbb{R}^{q+1}$ and a real-valued parameter $\tau$. By exploiting the Leibniz rule for divided differences [13, 14], which states that

(f\cdot g)[x_{0},\ldots,x_{q}] = \sum_{j=0}^{q}f[x_{0},\ldots,x_{j}]\,g[x_{j},\ldots,x_{q}]\,,   (6)

for any two functions $f$ and $g$, we can write

\text{e}^{-i\tau[x_{0},\ldots,x_{q}]} = \sum_{j=0}^{q}\text{e}^{-i\xi\tau[x_{0},\ldots,x_{j}]}\,\text{e}^{-i(1-\xi)\tau[x_{j},\ldots,x_{q}]}\,,   (7)

for any $\xi\in\mathbb{C}$. Successive application of the rule yields

\text{e}^{-i\tau[x_{0},\ldots,x_{q}]} = \sum_{0\leq j_{1}\leq\ldots\leq j_{K-1}\leq q}\text{e}^{-i\frac{\tau}{K}[x_{0},\ldots,x_{j_{1}}]}\,\text{e}^{-i\frac{\tau}{K}[x_{j_{1}},\ldots,x_{j_{2}}]}\cdots\text{e}^{-i\frac{\tau}{K}[x_{j_{K-1}},\ldots,x_{q}]}\,,   (8)

for any integer $K\geq 1$. We shall refer to $K$ as the divided differences approximation subdivision constant in what follows. For convenience, we shall choose it to be a power of two, namely, $K=2^{\kappa}$, for a nonnegative integer $\kappa$ to which we will refer as the divided differences approximation depth.

As a next step, we utilize the observation that for every finite $q$ there is a small enough $\delta\coloneqq\tau/K$ such that the divided difference $\text{e}^{-i\delta[x_{0},\ldots,x_{q}]}$ is well approximated by [16]:

\text{e}^{-i\delta[x_{0},\ldots,x_{q}]} \approx \frac{(-i\delta)^{q}}{q!}\,\text{e}^{-i\delta\frac{1}{q+1}\sum_{j=0}^{q}x_{j}}   (9)

(we discuss the quality of this approximation and its effect on the algorithm precision later on). We can thus use Eq. (9) to approximate the divided differences in Eq. (8) to obtain

\text{e}^{-i\tau[x_{0},\ldots,x_{q}]} \approx (-i\delta)^{q}\sum_{0\leq j_{1}\leq\ldots\leq j_{K-1}\leq q}\frac{\text{e}^{-i\delta(\bar{x}_{(0,j_{1})}+\bar{x}_{(j_{1},j_{2})}+\ldots+\bar{x}_{(j_{K-1},q)})}}{j_{1}!(j_{2}-j_{1})!\cdots(q-j_{K-1})!} \coloneqq \hat{\text{e}}_{K}^{-i\tau[x_{0},\ldots,x_{q}]},   (10)

where we used the notation

\bar{x}_{(j_{\ell-1},j_{\ell})} = \frac{1}{j_{\ell}-j_{\ell-1}+1}\sum_{m=j_{\ell-1}}^{j_{\ell}}x_{m}\,,   (11)

with $\ell=1,\ldots,K$ (and defining $j_{0}=0$). The choice for $K$ which ensures a near-optimal scaling of resources in terms of simulation error will be discussed in Sec. III.3.
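The quality of the single-block approximation of Eq. (9) can be probed numerically, as in the hedged sketch below: for shrinking $\delta$ the exact divided-difference exponential approaches a single phase set by the mean of the inputs, with a relative error that decreases roughly as $\delta^{2}$. The inputs and the values of $\delta$ are arbitrary illustrative choices.

```python
# A small numerical check of Eq. (9): e^{-i*delta*[x0,...,xq]} ~ (-i*delta)^q/q! *
# exp(-i*delta*mean(x)).  The inputs and delta values below are illustrative.
import numpy as np
from scipy.linalg import expm
from math import factorial

def dd_exp(x, delta):
    """Exact e^{-i*delta*[x0,...,xq]} via the bidiagonal (Opitz) matrix construction."""
    Z = np.diag(np.asarray(x, dtype=complex)) + np.diag(np.ones(len(x) - 1), -1)
    return expm(-1j * delta * Z)[-1, 0]

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=6)                       # q = 5
q = len(x) - 1

for delta in (0.5, 0.1, 0.02):
    approx = (-1j * delta) ** q / factorial(q) * np.exp(-1j * delta * x.mean())
    exact = dd_exp(x, delta)
    print(delta, abs(exact - approx) / abs(exact))   # relative error shrinks roughly as delta**2
```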

Next, we write the approximation $\hat{\text{e}}_{K}^{-i\tau[\cdots]}$ as a sum of simple-to-calculate phases. By a change of variables, Eq. (10) can be rewritten as:

\hat{\text{e}}_{K}^{-i\tau[x_{0},\ldots,x_{q}]} = (-i\delta)^{q}\sum_{\substack{j_{1},j_{2},\ldots,j_{K}\geq 0,\\ \sum_{\ell}j_{\ell}=q}}\frac{\text{e}^{-i\delta(\bar{x}_{(0,j_{1})}+\bar{x}_{(j_{1},j_{1}+j_{2})}+\ldots+\bar{x}_{(j_{1}+\ldots+j_{K-1},\,j_{1}+\ldots+j_{K})})}}{j_{1}!j_{2}!\cdots j_{K}!} = \frac{(-i\delta)^{q}}{q!}\sum_{{\bf j}\in\Omega_{(K,q)}}\binom{q}{j_{1},\ldots,j_{K}}\prod_{\ell=1}^{K}\text{e}^{-i\delta\bar{x}_{\ell}}\,,   (12)

where in the second equality $\binom{q}{j_{1},\ldots,j_{K}}=\frac{q!}{j_{1}!\cdots j_{K}!}$ is the multinomial coefficient, and we have defined

\bar{x}_{\ell} \coloneqq \bar{x}_{(j_{1}+\ldots+j_{\ell-1},\,j_{1}+\ldots+j_{\ell})} = \frac{1}{j_{\ell}+1}\sum_{m=j_{1}+\ldots+j_{\ell-1}}^{j_{1}+\ldots+j_{\ell}}x_{m}\,,   (13)

with the convention that for $\ell=1$, the sum $j_{1}+\ldots+j_{\ell-1}\equiv 0$. For readability, we have abbreviated $\sum_{j_{1},\ldots,j_{K}\geq 0,\,\sum_{\ell}j_{\ell}=q}$ by $\sum_{{\bf j}\in\Omega_{(K,q)}}$, with $\Omega_{(K,q)}$ being the set of all integer tuples ${\bf j}=(j_{1},\ldots,j_{K})$ that obey $0\leq j_{\ell}\leq q$ for all $\ell$ and $\sum_{\ell=1}^{K}j_{\ell}=q$.

Moreover, the product $\prod_{\ell=1}^{K}\text{e}^{-i\delta\bar{x}_{\ell}}$ can be recast as a product of phases, each of which is proportional to one of the original $x_{j}$ inputs, namely

\prod_{\ell=1}^{K}\text{e}^{-i\delta\bar{x}_{\ell}} = \prod_{s=0}^{q}\text{e}^{-i\delta\alpha_{s}x_{s}}\,,   (14)

where the coefficients $\alpha_{s}$ are given by

\alpha_{s} = \sum_{\ell:\,s\in[\Sigma_{\ell-1},\Sigma_{\ell}]}\frac{1}{\Sigma_{\ell}-\Sigma_{\ell-1}+1}\,,   (15)

with $\Sigma_{\ell}=j_{1}+\ldots+j_{\ell}$. This simplification is achieved by noticing that $\bar{x}_{\ell}$ contributes an additive factor to $\alpha_{s}$ if and only if it contains $x_{s}$ in its average, in which case the factor is $\frac{1}{j_{\ell}+1}=\frac{1}{\Sigma_{\ell}-\Sigma_{\ell-1}+1}$. In addition, $\bar{x}_{\ell}$ will contain $x_{s}$ if and only if $s$ is in the range $[\Sigma_{\ell-1},\Sigma_{\ell}]$.

Furthermore, one can show that the $\alpha_{s}$ may be expressed in terms of the minimum $\ell$ index, which we denote $\ell_{\min}$, obeying $s\in[\Sigma_{\ell},\Sigma_{\ell+1}]$, and the maximum $\ell$ index, denoted $\ell_{\max}$, for which $s\in[\Sigma_{\ell-1},\Sigma_{\ell}]$, namely,

\alpha_{s} = \frac{1}{\Sigma_{\ell_{\min}+1}-\Sigma_{\ell_{\min}}+1}+\frac{1}{\Sigma_{\ell_{\max}}-\Sigma_{\ell_{\max}-1}+1}+(\ell_{\max}-\ell_{\min}-2)\,.   (16)

Using $\alpha_{s}$ and Eq. (14) we can now write the approximation Eq. (12) as a sum of phases

\hat{\text{e}}_{K}^{-i\tau[x_{0},\ldots,x_{q}]} = \frac{(-i\delta)^{q}}{q!}\sum_{{\bf k}_{q}}\text{e}^{-i\delta\sum_{s}\alpha_{s}x_{s}}\,,   (17)

where we replaced the sum over ${\bf j}\in\Omega_{(K,q)}$ with a ${\bf k}_{q}$ multi-index. The multi-index ${\bf k}_{q}=(k_{1},\ldots,k_{q})$ is a tuple of indices $k_{m}$, with $m=1,\ldots,q$, each ranging from $1$ to $K$. The relation between the former indices $j_{1},\ldots,j_{K}$ and the current indices $k_{1},\ldots,k_{q}$ is such that $j_{\ell}$ is the number of indices in ${\bf k}_{q}$ whose value is $\ell$ (since there are $q$ indices, $j_{\ell}$ can indeed take on values in the range $[0,q]$). Since the ${\bf k}_{q}$ indices represent ordered occurrences, we remove the multinomial coefficient from the sum. We also note that $\alpha_{s}$ can be calculated directly from the multi-index ${\bf k}_{q}$, since $\Sigma_{\ell}$ is simply the number of indices among $k_{1},\ldots,k_{q}$ whose value is less than or equal to $\ell$.
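A classical sketch of the $\alpha_{s}$ coefficients is given below: it computes them directly from a multi-index ${\bf k}_{q}$ via the defining sum of Eq. (15) and verifies the identity underlying Eq. (14), namely $\sum_{\ell}\bar{x}_{\ell}=\sum_{s}\alpha_{s}x_{s}$. The chosen $q$, $K$ and inputs are illustrative.

```python
# alpha_s from a multi-index k_q via Eq. (15), together with a check of Eq. (14):
# the sum of block means equals sum_s alpha_s * x_s.  q, K and x are illustrative.
import numpy as np

def alphas(kq, K, q):
    kq = np.asarray(kq)
    Sigma = [int(np.sum(kq <= l)) for l in range(K + 1)]   # Sigma_0 = 0, ..., Sigma_K = q
    a = np.zeros(q + 1)
    for s in range(q + 1):
        for l in range(1, K + 1):                          # Eq. (15)
            if Sigma[l - 1] <= s <= Sigma[l]:
                a[s] += 1.0 / (Sigma[l] - Sigma[l - 1] + 1)
    return a, Sigma

rng = np.random.default_rng(1)
q, K = 5, 4
x = rng.uniform(-1, 1, size=q + 1)
kq = rng.integers(1, K + 1, size=q)                        # each k_m in {1,...,K}

a, Sigma = alphas(kq, K, q)
block_means = sum(np.mean(x[Sigma[l - 1]: Sigma[l] + 1]) for l in range(1, K + 1))
print(np.isclose(block_means, np.dot(a, x)))               # phases of Eq. (14) agree
```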

Next, we describe a quantum circuit designed to approximate $U(\Delta t)$ as given in Eq. (3), based on the approximation derived above.

III.2 The LCU implementation

Having obtained the approximation Eq. (17), we now insert it into Eq. (4), identifying $\delta=\Delta t/K$ and the inputs $x_{j}$ with the diagonal energies $E_{z_{j}}$. Equation (3) becomes

U(\Delta t) \approx \hat{U} = \sum_{q=0}^{\infty}\frac{\Delta t^{q}}{q!}\sum_{{\bf i}_{q}}\Gamma_{{\bf i}_{q}}\frac{1}{K^{q}}\sum_{{\bf k}_{q}}(-i)^{q}P_{{\bf i}_{q}}\left[\sum_{z}\frac{d_{{\bf i}_{q}}}{\Gamma_{{\bf i}_{q}}}\,\text{e}^{-i\delta\sum_{s=0}^{q}\alpha_{s}E_{z_{s}}}|z\rangle\langle z|\right]\,.   (18)

At this point we assume, for ease of presentation, that the matrix elements $d_{{\bf i}_{q}}=\prod_{j=1}^{q}\langle z_{j}|D_{i_{j}}|z_{j}\rangle$ are real-valued and independent of $z$ (i.e., that the $D_{i}$ matrices are proportional to the identity matrix), as this is the case for many physical systems. We will discuss the necessary adjustments to the algorithm (which do not lead to added complexity) in the case where these coefficients are $z$-dependent later on. In this simpler case we have $D_{i}=\Gamma_{i}\mathds{1}$ for some $\Gamma_{i}\in\mathbb{R}$ and $d_{{\bf i}_{q}}=\prod_{j}\Gamma_{i_{j}}$, so $d_{{\bf i}_{q}}/\Gamma_{{\bf i}_{q}}=1$, and the evolution operator $\hat{U}$ becomes

\hat{U} = \sum_{q=0}^{\infty}\sum_{{\bf i}_{q}}\sum_{{\bf k}_{q}}\frac{\Gamma_{{\bf i}_{q}}\Delta t^{q}}{K^{q}q!}\,V_{({\bf i}_{q},{\bf k}_{q})}\,,   (19)

where

V_{({\bf i}_{q},{\bf k}_{q})} = (-i)^{q}P_{{\bf i}_{q}}\sum_{z}\text{e}^{-i\delta\sum_{s=0}^{q}\alpha_{s}E_{z_{s}}}|z\rangle\langle z|   (20)

are unitary operators. While the dependence of $V_{({\bf i}_{q},{\bf k}_{q})}$ on the ${\bf i}_{q}$ multi-index is evident, we note that the ${\bf k}_{q}$ dependence is (only) via the $\alpha_{s}$ coefficients. Truncating the infinite series in Eq. (19) at some maximal order $Q$ and rearranging terms, we find

\hat{U} \approx \widetilde{U} := \sum_{q=0}^{Q}\frac{(\Gamma\Delta t)^{q}}{q!}\sum_{{\bf i}_{q}}\frac{\Gamma_{{\bf i}_{q}}}{\Gamma^{q}}\sum_{{\bf k}_{q}}\frac{1}{K^{q}}V_{({\bf i}_{q},{\bf k}_{q})}\,,   (21)

where we have denoted the 'off-diagonal norm' of the Hamiltonian by $\Gamma=\sum_{i=1}^{M}\Gamma_{i}$. The choice for $Q$ which ensures a near-optimal scaling of resources in terms of simulation error will be discussed in Sec. III.3.

Since Eq. (21) has the form of a linear combination of unitary operators, we can invoke the LCU technique to execute it on a quantum computer. The technique consists of two main components: (i) state preparation on ancilla qubits and (ii) unitary circuits controlled by the ancilla qubits. We next discuss a simple-to-implement protocol for each of the two in turn.

III.2.1 The state preparation subroutine

The ancilla state we prepare consists of three quantum (multi-)registers: (i) a $Q$-qubit register, (ii) $Q$ registers with $M$ qubits each, and (iii) a third set of $Q$ registers consisting of $\kappa=\log_{2}K$ qubits each. We denote by $|{\bf 1}_{q}\rangle=|1\rangle^{\otimes q}|0\rangle^{\otimes Q-q}$ the unary encoding of the order $q$; by $|{\bf i}_{q}\rangle=|i_{1}\rangle\cdots|i_{q}\rangle|0\rangle^{\otimes Q-q}$, where $|i\rangle$ is a shorthand for the $M$-qubit state $|1\rangle_{i}|0\rangle^{\otimes M-1}$, the unary encoding of $i\in[1,M]$ in which the $i$-th qubit is set to $1$ while the remaining $M-1$ qubits are set to $0$; and by $|{\bf k}_{q}\rangle=|k_{1}\rangle\cdots|k_{q}\rangle|0\rangle^{\otimes Q-q}$ a shorthand for $Q$ quantum registers, each of dimension $K$. With this notation, the ancilla state for the LCU reads:

|\psi_{0}\rangle = \frac{1}{\sqrt{s}}\sum_{q=0}^{Q}\sqrt{\frac{(\Gamma\Delta t)^{q}}{q!}}\sum_{{\bf i}_{q}}\sqrt{\frac{\Gamma_{{\bf i}_{q}}}{\Gamma^{q}}}\sum_{{\bf k}_{q}}\frac{1}{\sqrt{K^{q}}}\,|{\bf 1}_{q}\rangle|{\bf i}_{q}\rangle|{\bf k}_{q}\rangle\,,   (22)

with the normalization constant

s = \sum_{q=0}^{Q}\frac{(\Gamma\Delta t)^{q}}{q!}\sum_{{\bf i}_{q}}\frac{\Gamma_{{\bf i}_{q}}}{\Gamma^{q}}\sum_{{\bf k}_{q}}\frac{1}{K^{q}} = \sum_{q=0}^{Q}\frac{(\Gamma\Delta t)^{q}}{q!}\,.   (23)

We prepare the state in three stages. In the first, we use the first $Q$-qubit register to prepare

|0\rangle^{\otimes Q} \to \frac{1}{\sqrt{s}}\sum_{q=0}^{Q}\sqrt{\frac{(\Gamma\Delta t)^{q}}{q!}}\,|{\bf 1}_{q}\rangle\,,   (24)

at the cost of ${\cal O}(Q)$ controlled $R_{y}$ rotations [5]. As a next step, we use the second set of $Q$ registers to prepare

|{\bf 1}_{q}\rangle|0\rangle^{\otimes Q} \to |{\bf 1}_{q}\rangle\sum_{{\bf i}_{q}}\sqrt{\frac{\Gamma_{{\bf i}_{q}}}{\Gamma^{q}}}\,|{\bf i}_{q}\rangle\,,   (25)

which can also be done at the cost of ${\cal O}(QM)$ controlled rotations [5]. Note that the right-hand side of the last equation has a tensor product structure over $q$ registers: $\big(|1\rangle\sum_{i=1}^{M}\sqrt{\Gamma_{i}/\Gamma}\,|i\rangle\big)^{\otimes q}$. Similarly, the third and final stage consists of using the last set of $Q$ $\kappa$-qubit registers to prepare

|{\bf 1}_{q}\rangle|0\rangle^{\otimes Q} \to |{\bf 1}_{q}\rangle\frac{1}{\sqrt{K^{q}}}\sum_{{\bf k}_{q}}|{\bf k}_{q}\rangle\,,   (26)

which can be carried out using $Q$ controlled Hadamard gates on $\kappa$ qubits. Therefore, overall the state preparation subroutine can be completed using ${\cal O}(Q(M+\kappa))$ controlled rotations.
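To make the structure of the ancilla state concrete, the sketch below enumerates all $(q,{\bf i}_{q},{\bf k}_{q})$ branches for small, illustrative values of $Q$, $M$ and $K$, and checks that the squared amplitudes of Eq. (22) sum to the normalization constant $s$ of Eq. (23) (so that the prefactor $1/\sqrt{s}$ indeed normalizes the state).

```python
# Classical enumeration of the LCU ancilla amplitudes of Eq. (22) for small Q, M, K,
# checking the normalization s of Eq. (23).  Gamma_i, dt, Q and K are illustrative.
import itertools, math
import numpy as np

Gammas = [0.4, 0.25, 0.1]                      # Gamma_i for M = 3 permutation operators
Gamma = sum(Gammas)
dt, Q, K = math.log(2) / Gamma, 4, 2

amps = []
for q in range(Q + 1):
    for iq in itertools.product(range(len(Gammas)), repeat=q):
        Gamma_iq = math.prod(Gammas[i] for i in iq)        # = 1 for q = 0
        for _ in range(K ** q):                            # each k_q branch carries 1/sqrt(K^q)
            amps.append(math.sqrt((Gamma * dt) ** q / math.factorial(q))
                        * math.sqrt(Gamma_iq / Gamma ** q) / math.sqrt(K ** q))

s = sum((Gamma * dt) ** q / math.factorial(q) for q in range(Q + 1))    # Eq. (23)
print(np.isclose(sum(a ** 2 for a in amps), s))
```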

III.2.2 The controlled unitaries


Figure 1: A circuit description of $U_{(i_{q},{\bf k}_{q})}$. $Q$ successive applications of this circuit, preceded by a phase circuit $\text{e}^{-i\delta\alpha_{0}D_{0}}$, make up the controlled unitary $V_{({\bf i}_{q},{\bf k}_{q})}$ of Eq. (27). The $|s\rangle$ register consists of $Q$ qubits and is initialized with the rightmost bit set to one and the others set to zero. The $\ll$ gate shifts the set bit one place to the left (if the leftmost bit is set, the gate sets the rightmost bit). The $\alpha_{s}$ coefficients in the phase gate $\text{e}^{-i\delta\alpha_{s}D_{0}}$ are calculated using a circuit we denote in this figure by $\alpha_{s}({\bf k}_{q})$; the circuit for $\alpha_{s}$ is described in the main text. The controlled $P_{i_{s}}$ circuit executes $P_{i_{1}}$, then $P_{i_{2}}$, and so on, as per the value stored in the $|s\rangle$ register. Since the $P_{i}$'s are Pauli-$X$ strings, the controlled-$P_{i}$ gates consist of applications of simple-to-implement CNOTs.

The second ingredient of the LCU protocol consists of designing the select-unitary operation:

|{\bf i}_{q}\rangle|{\bf k}_{q}\rangle|z\rangle \to |{\bf i}_{q}\rangle|{\bf k}_{q}\rangle V_{({\bf i}_{q},{\bf k}_{q})}|z\rangle\,,   (27)

where the $V_{({\bf i}_{q},{\bf k}_{q})}$ are given in Eq. (20). We now show that the select-unitary operation which executes Eq. (27) can be implemented using interleaved applications of CNOTs and controlled-phase operations. To that end, we define the unitary operator:

U_{(i_{s},{\bf k}_{q})} \coloneqq (-i)\,\text{e}^{-i\delta\alpha_{s}({\bf k}_{q})D_{0}}P_{i_{s}}\,,   (28)

where we have now spelled out the dependence of $\alpha_{s}$ on ${\bf k}_{q}$. Let us next consider the action of the ordered product

U_{(i_{q},{\bf k}_{q})}\cdots U_{(i_{1},{\bf k}_{q})}\,\text{e}^{-i\delta\alpha_{0}({\bf k}_{q})D_{0}} \coloneqq U_{({\bf i}_{q},{\bf k}_{q})}\,\text{e}^{-i\delta\alpha_{0}({\bf k}_{q})D_{0}}   (29)

on a state $|z\rangle$. We find that

U_{({\bf i}_{q},{\bf k}_{q})}\,\text{e}^{-i\delta\alpha_{0}({\bf k}_{q})D_{0}}|z\rangle = (-i)^{q}\,\text{e}^{-i\delta\sum_{s=0}^{q}\alpha_{s}({\bf k}_{q})E_{z_{s}}}\,P_{{\bf i}_{q}}|z\rangle = V_{({\bf i}_{q},{\bf k}_{q})}|z\rangle\,.   (30)

Hence, $U_{({\bf i}_{q},{\bf k}_{q})}$ consists of an interleaved application of CNOT gates (since the $P_{i}$ operators are Pauli-$X$ strings) and a phase-kickback circuit that implements $\text{e}^{-i\delta\alpha_{s}({\bf k}_{q})D_{0}}$ [up to a simple factor of $(-i)$]. A circuit for $U_{({\bf i}_{q},{\bf k}_{q})}$ is sketched out in Fig. 1. For simplicity, the circuit uses a $Q$-qubit ancilla that encodes $s=0,\ldots,Q$ in unary form, and a shift gate that, starting with $|0\ldots 01\rangle$, shifts the $1$ bit to the left with every application. By including this counter register we are able to express $V_{({\bf i}_{q},{\bf k}_{q})}$ as $Q$ repeated applications of the circuit shown in Fig. 1.
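The operator identity of Eqs. (28)-(30) can be verified directly with dense matrices, as in the sketch below: the ordered product of the simple unitaries $U_{(i_{s},{\bf k}_{q})}$, preceded by the phase $\text{e}^{-i\delta\alpha_{0}D_{0}}$, reproduces $V_{({\bf i}_{q},{\bf k}_{q})}$ of Eq. (20) on every basis state. The system size, the diagonal $D_{0}$, the chosen Pauli-$X$ strings and the $\alpha_{s}$ values are all illustrative stand-ins.

```python
# Dense-matrix check of Eq. (30): U_{(i_q)}...U_{(i_1)} e^{-i*delta*alpha_0*D0} = V_{(i_q,k_q)}.
import numpy as np
from functools import reduce

n, q, delta = 3, 4, 0.1
dim = 2 ** n
rng = np.random.default_rng(2)

E = rng.uniform(-1, 1, size=dim)                      # diagonal energies <z|D0|z>
X = np.array([[0, 1], [1, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def x_string(bits):
    """A permutation operator P_i: Pauli-X on the qubits where bits == 1."""
    return reduce(np.kron, [X if b else I2 for b in bits])

Ps = [x_string(rng.integers(0, 2, size=n)) for _ in range(q)]     # P_{i_1},...,P_{i_q}
alpha = rng.uniform(0, 2, size=q + 1)                 # stand-ins for alpha_0,...,alpha_q
phase = lambda a: np.diag(np.exp(-1j * delta * a * E))            # e^{-i*delta*a*D0}

# Left-hand side of Eq. (30), built as the ordered product of Eqs. (28)-(29).
lhs = phase(alpha[0])
for s in range(1, q + 1):
    lhs = (-1j) * phase(alpha[s]) @ Ps[s - 1] @ lhs

# Right-hand side: V_{(i_q,k_q)} assembled column by column from Eq. (20).
P_iq = reduce(lambda A, B: B @ A, Ps)                 # ordered product P_{i_q}...P_{i_1}
V = np.zeros((dim, dim), dtype=complex)
for z in range(dim):
    state = np.zeros(dim, dtype=complex); state[z] = 1.0
    Es = [E[z]]
    for s in range(1, q + 1):                         # walk |z> -> |z_s> = P_{i_s}...P_{i_1}|z>
        state = Ps[s - 1] @ state
        Es.append(E[int(np.argmax(np.abs(state)))])
    V[:, z] = (-1j) ** q * np.exp(-1j * delta * np.dot(alpha, Es)) * P_iq[:, z]

print(np.allclose(lhs, V))
```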

In addition, since the diagonal $D_{0}$ is written as a linear combination of Pauli-$Z$ strings, i.e., as $D_{0}=\sum_{k}J_{k}Z_{k}$ for some $J_{k}\in\mathbb{R}$, where the $Z_{k}$ operators stand for Pauli-$Z$ strings, the diagonal unitary $\text{e}^{-i\delta\alpha_{s}({\bf k}_{q})D_{0}}$ is essentially a product of $\text{e}^{-i\delta\alpha_{s}J_{k}Z_{k}}$ factors, each of which is trivial to implement once the coefficient $\alpha_{s}$ is provided. Next, we discuss the implementation of the $\alpha_{s}$ calculation.

III.2.3 Calculating the $\alpha_{s}$ coefficients

First, we note that given a quantum register that stores an integer $b$, one can implement $R_{z}(b\theta)$ as well as $R_{z}(\theta/b)$ using $\log_{2}b$ calls to a single-qubit $R_{z}$ rotation gate [17]. Next, we observe that $\alpha_{s}$ consists of three terms, each of which is either an integer or the reciprocal thereof, cf. Eq. (16). Therefore, by calculating and storing those integers in a quantum register, one may implement $R_{z}(\alpha_{s}\theta)$ efficiently.

Lastly, finding $\ell_{\min}$ and $\ell_{\max}$ as per Eq. (16) (each of which requires a $\kappa$-qubit register) may be achieved with ${\cal O}(\kappa)$ operations using reversible binary search (a blueprint for such a circuit is given in Appendix C), provided one has access to a cost-function circuit that calculates $\Sigma_{\ell}$ for a given $\ell$ value.

A circuit for calculating $\Sigma_{\ell}$ from $\ell$ can be constructed by implementing an integer comparison circuit [18]

|k\rangle|\ell\rangle|\Sigma\rangle \to |k\rangle|\ell\rangle|\Sigma\oplus\delta_{k\leq\ell}\rangle\,,   (31)

i.e., a circuit that increments the third ($\Sigma$) register if and only if $k\leq\ell$. Here the $|k\rangle$ and $|\ell\rangle$ registers have $\kappa$ qubits and $|\Sigma\rangle$ has $\log_{2}Q$ qubits. Executing the above subroutine $q$ times sequentially, with $|k_{1}\rangle\ldots|k_{q}\rangle$ functioning as the first register in each iteration, we obtain a circuit that implements

\left(|k_{1}\rangle|k_{2}\rangle\cdots|k_{q}\rangle\right)|\ell\rangle|0\rangle \to \left(|k_{1}\rangle|k_{2}\rangle\cdots|k_{q}\rangle\right)|\ell\rangle|\Sigma_{\ell}\rangle\,.   (32)

Implementing a binary search circuit with $\Sigma_{\ell}$ as the cost function allows us to construct a circuit that produces

|{\bf k}_{q}\rangle|0\rangle^{\otimes 4} \to |{\bf k}_{q}\rangle|\ell_{\min}\rangle|\ell_{\max}\rangle|\Sigma_{\ell_{\min}}\rangle|\Sigma_{\ell_{\max}}\rangle\,.   (33)

The last two registers can in turn be used to calculate the relevant integers that appear in Eq. (16). The above can be done in ${\cal O}(\kappa)$ operations [18].

We mention in passing that another, ${\cal O}(K)$ [rather than ${\cal O}(\kappa)$], method for calculating $\alpha_{s}$ is available, which is nonetheless simpler to implement as it does not involve the calculation of $\ell_{\min}$ and $\ell_{\max}$. For this method, we write

\text{e}^{-i\delta\alpha_{s}E_{z_{s}}} = \prod_{\ell=1}^{K}\text{e}^{-i\delta\frac{b_{s\ell}E_{z_{s}}}{\Sigma_{\ell}-\Sigma_{\ell-1}+1}}\,,   (34)

where $b_{s\ell}$ denotes a bit that is set to one if $s\in[\Sigma_{\ell-1},\Sigma_{\ell}]$ and is zero otherwise. This requires a circuit $|s\rangle|\ell\rangle|0\rangle\to|s\rangle|\ell\rangle|b_{s\ell}\rangle$ implementing an integer comparison circuit which checks for the conditions $s\geq\Sigma_{\ell-1}$ and $s\leq\Sigma_{\ell}$.

III.3 Algorithm cost

In the previous section we worked out the circuit $\widetilde{U}$ which approximates the short-time evolution $U(\Delta t)$. In this section we provide the resource scaling analysis that ensures that $\widetilde{U}$ is $\epsilon/r$-close to $U(\Delta t)$ in spectral distance, $\|\widetilde{U}-U(\Delta t)\|\leq\epsilon/r$, where $\epsilon$ is the overall error over the simulation time $t$ and $r=t/\Delta t$. The subadditivity property of the spectral norm then ensures that with $r$ repetitions of $\widetilde{U}$, the overall simulation is $\epsilon$-close to the exact dynamics induced by $\text{e}^{-iHt}$.

Our algorithm involves two basic approximations to the exact dynamics, $U(\Delta t)\to\hat{U}\to\widetilde{U}$, determined by the divided differences approximation constant $K$ and the truncation order $Q$, respectively. Therefore, to ensure that $\|\widetilde{U}-U(\Delta t)\|\leq\epsilon/r$ we require that both $\|\hat{U}-U(\Delta t)\|$ and $\|\widetilde{U}-\hat{U}\|$ are at most $\epsilon/2r$.

First we note that the LCU formalism [4] dictates that the ancilla state preparation normalization factor $s$ should be such that $|s-2|\leq\epsilon/2r$ to ensure that $\|\widetilde{U}-\hat{U}\|\leq\epsilon/2r$. According to Eq. (23), fixing $\Delta t=\ln 2/\Gamma$ (equivalently, $r=t\Gamma/\ln 2$) and $Q={\cal O}(\log(\Gamma t/2\epsilon)/\log\log(\Gamma t/2\epsilon))$ in our algorithm ensures $|s-2|\leq\epsilon/2r$, as required.
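For concreteness, the short helper below finds the smallest truncation order $Q$ satisfying the condition $|s-2|\leq\epsilon/2r$ under the choices $\Delta t=\ln 2/\Gamma$ and $r=t\Gamma/\ln 2$; the example values of $t$, $\Gamma$ and $\epsilon$ are illustrative.

```python
# Smallest Q for which the truncated normalization s of Eq. (23) obeys |s - 2| <= eps/(2r),
# with dt = ln2/Gamma and r = t*Gamma/ln2.  Example arguments are illustrative.
import math

def truncation_order(t, Gamma, eps):
    r = t * Gamma / math.log(2)                   # number of short-time segments
    target = eps / (2 * r)
    s, term, Q = 0.0, 1.0, 0                      # term = (ln2)^Q / Q!
    while 2.0 - (s + term) > target:
        s += term
        Q += 1
        term = math.log(2) ** Q / math.factorial(Q)
    return Q

print(truncation_order(t=100.0, Gamma=1.0, eps=1e-3))   # grows as ~ log(r/eps)/loglog(r/eps)
```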

Second, based on Eq. (9), we show in Appendix B that

\Big|\text{e}^{-i\Delta t[E_{z_{0}},\ldots,E_{z_{q}}]}-\hat{\text{e}}_{K}^{-i\Delta t[E_{z_{0}},\ldots,E_{z_{q}}]}\Big| \leq \frac{\Delta t^{q}}{q!}\Big(\frac{\Delta t\,\Delta E}{2K}\Big)^{2}\,,   (35)

where $\Delta E\sim{\cal O}(1)$ is a bound on $|E_{z_{j+1}}-E_{z_{j}}|$ for all $j$, i.e., on the difference of two consecutive diagonal energies, from which it follows that

\|\hat{U}-U(\Delta t)\| \leq \sum_{q,{\bf i}_{q}}\Gamma_{{\bf i}_{q}}\Big|A_{{\bf i}_{q}}-\hat{A}_{{\bf i}_{q}}\Big| = \sum_{q,{\bf i}_{q}}\Gamma_{{\bf i}_{q}}\frac{\Delta t^{q}}{q!}\left(\frac{\Delta t\,\Delta E}{2K}\right)^{2} = \frac{1}{2}\left(\frac{\Delta t\,\Delta E}{K}\right)^{2}.   (36)

Hence, choosing $K={\cal O}(\frac{\sqrt{r}}{\Gamma\sqrt{\epsilon}})={\cal O}(\sqrt{t}/\sqrt{\Gamma\epsilon})$ ensures that $\|\hat{U}-U(\Delta t)\|\leq\epsilon/(2r)$. Note that the gate and qubit resources of our algorithm scale with $\kappa=\log_{2}K$, that is, logarithmically in $\sqrt{t/(\Gamma\epsilon)}$.
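A companion helper, under the same assumptions, picks the subdivision constant: Eq. (36) requires $\frac{1}{2}(\Delta t\Delta E/K)^{2}\leq\epsilon/(2r)$, i.e. $K\geq\Delta t\Delta E\sqrt{r/\epsilon}$, which is then rounded up to a power of two, $K=2^{\kappa}$. The value of $\Delta E$ and the example arguments are illustrative.

```python
# Subdivision constant K (and depth kappa) from the requirement of Eq. (36):
# (1/2)*(dt*dE/K)^2 <= eps/(2r), i.e. K >= dt*dE*sqrt(r/eps), rounded up to 2**kappa.
import math

def subdivision_constant(t, Gamma, eps, dE=1.0):
    dt = math.log(2) / Gamma
    r = t * Gamma / math.log(2)
    K_min = dt * dE * math.sqrt(r / eps)
    kappa = max(0, math.ceil(math.log2(K_min)))
    return 2 ** kappa, kappa

print(subdivision_constant(t=100.0, Gamma=1.0, eps=1e-3))   # kappa grows only logarithmically
```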

Finally, with the above choices for $Q$ and $\kappa$, we recall that the number of ancilla qubits needed and the number of gates of a single LCU execution are ${\cal O}(Q(M+\kappa))$, as discussed above. Since $Q$ and $\kappa$ scale as $\log(\Gamma t/\epsilon)$ and $\log\sqrt{t/(\Gamma\epsilon)}$, respectively, we find that the algorithm does indeed have a near-optimal dependence on precision.

III.4 The case of $z$-dependent $d_{{\bf i}_{q}}$

In the previous section, we assumed for simplicity that $d_{{\bf i}_{q}}/\Gamma_{{\bf i}_{q}}=1$, whereas in the general case all that is guaranteed is that $|d_{{\bf i}_{q}}/\Gamma_{{\bf i}_{q}}|\leq 1$, which follows from $|d_{i_{j}}/\Gamma_{i_{j}}|\leq 1\,\forall j$. The ratio $d_{i_{j}}/\Gamma_{i_{j}}$ may always be expressed as the average of two phases:

\frac{d_{i_{j}}}{\Gamma_{i_{j}}} = \frac{1}{2}\Big(\text{e}^{i(\theta_{i_{j}}+\phi_{i_{j}})}+\text{e}^{i(\theta_{i_{j}}-\phi_{i_{j}})}\Big)\,.   (37)

Therefore, in this case we can simply replace $U_{(i_{s},{\bf k}_{q})}$ of Eq. (28) with

U_{(i_{s},{\bf k}_{q})} \coloneqq -\frac{i}{2}\sum_{k=0,1}\text{e}^{i(\theta_{i_{s}}+(-1)^{k}\phi_{i_{s}})}\,\text{e}^{-i\delta\alpha_{s}({\bf k}_{q})D_{0}}P_{i_{s}}\,,   (38)

i.e., we can express it as an average of two unitaries. The simulation circuit in this case can be implemented in much the same manner as before, where now the factor $\text{e}^{i(\theta_{i_{s}}+(-1)^{k}\phi_{i_{s}})}$ can be encoded directly as a phase if the off-diagonal matrix elements are given in polar coordinates (and otherwise using a phase-kickback circuit with quantum registers that encode $|i_{s}\rangle|\theta_{i_{s}}\rangle|\phi_{i_{s}}\rangle|k\rangle$ [5]). Additionally, the LCU state preparation should be modified so as to include $Q$ additional qubits, each prepared in the $|+\rangle$ state, alongside the change $\Gamma\Delta t\to\Gamma\Delta t/2$ in the first stage of the state preparation routine, to account for the $q$ extra factors of $1/2$. These modifications do not alter the overall complexity of the algorithm.
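For completeness, the decomposition of Eq. (37) amounts to setting $\theta_{i_{j}}=\arg(d_{i_{j}}/\Gamma_{i_{j}})$ and $\phi_{i_{j}}=\arccos|d_{i_{j}}/\Gamma_{i_{j}}|$, as the short check below illustrates for an arbitrary sample ratio.

```python
# Eq. (37): any w = d/Gamma with |w| <= 1 is the average of two phases, since
# (1/2)*(e^{i(theta+phi)} + e^{i(theta-phi)}) = e^{i*theta} * cos(phi).
import numpy as np

w = 0.3 - 0.55j                                  # an illustrative ratio with |w| <= 1
theta, phi = np.angle(w), np.arccos(np.abs(w))
print(np.isclose(0.5 * (np.exp(1j * (theta + phi)) + np.exp(1j * (theta - phi))), w))
```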

IV Summary and conclusions

We devised a simple-to-implement quantum algorithm designed to simulate the dynamics of general Hamiltonians on a quantum computer. We have shown that, while straightforward to execute, the proposed algorithm retains a near-optimal dependence on the target precision.

Our algorithm has numerous properties that, we argue, make it attractive for implementation on resource-limited near-future quantum computing devices, which are small and noisy and on which one is not afforded the luxury of fault tolerance and error correction schemes. These are: (i) Neither the gate cost nor the qubit cost of the algorithm depends on the norm of the diagonal component of the Hamiltonian. While for most simulation algorithms the number of repetitions of the short-time evolution unitary grows linearly with the diagonal norm, in the present algorithm the dependence on $D_{0}$ enters only via trivial phases. (ii) In addition, the present algorithm offers a compact LCU decomposition of the Hamiltonian, where the terms in the Hamiltonian are grouped according to which bits they flip. This is the PMR decomposition, which is in general considerably more compact than the customary breakup of $H$ into a linear combination of Pauli strings [5]. As such, the number of ancilla qubits required for the LCU state preparation subroutine is expected in general to be smaller, by orders of magnitude, than for standard decompositions. (iii) Last, we note again the simplicity of our proposed algorithm, which prescribes only simple-to-implement, and for the most part straightforward, subroutines. This property is especially important in the NISQ era, where quantum computing platforms are small and noisy and any unnecessary qubit or gate added to the circuit may be decisive in terms of performance.

Furthermore, the PMR method presented above also extends naturally to the case of time-dependent Hamiltonians, with the main modification being that in the time-dependent case the divided-difference inputs no longer consist only of diagonal energies but must rather be augmented with the frequencies of the time dependence [6], namely,

E_{z_{j}} \to E_{z_{j}}+\sum_{\ell=j+1}^{q}\lambda_{(\ell,z_{j})}\,,   (39)

where the $\lambda_{(\ell,z_{j})}$ are (in general complex-valued) frequencies of the now time-dependent diagonal operators $D_{i}$ in the PMR decomposition of the Hamiltonian. As this is the only difference between the time-independent and time-dependent cases, the divided-difference approximation introduced here applies to time-dependent simulations as well. A major advantage of the PMR formulation in the time-dependent case is that the cost of the present algorithm is linear in the evolution time and does not depend on the frequencies (see Ref. [6]). This property too translates to potentially critical savings in gate and qubit costs.

In light of the above, we hope that the algorithm we proposed here will prove to be a useful tool in the coming years, as near-term quantum computing devices become more widely available and allow their users to generate credible simulation results for scientifically relevant problems. In a future study we hope to report on a performance comparison of this algorithm against existing schemes when tasked with simulating a scientifically meaningful model and executed on a NISQ device. An apples-to-apples comparison against existing algorithms on specific applications will highlight the advantages of our approach.

Acknowledgements.
We thank Arman Babakhani and Lev Barash for useful suggestions. IH acknowledges support by the Office of Advanced Scientific Computing Research of the U.S. Department of Energy under Contract No. DE-SC0024389. In addition, this research was developed with funding from the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR00112330014. The views, opinions, and/or findings expressed are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.

References

Appendix A Notes on divided differences

We provide below a brief summary of the concept of divided differences, which are defined through a recursive division process. This concept is typically encountered when calculating the coefficients of the interpolation polynomial in Newton form.

The divided differences [13, 14] of a function $f(\cdot)$ are defined as

f[x_{0},\ldots,x_{q}] \equiv \sum_{j=0}^{q}\frac{f(x_{j})}{\prod_{k\neq j}(x_{j}-x_{k})}   (40)

with respect to the list of real-valued input variables $[x_{0},\ldots,x_{q}]$. The above expression is ill-defined if some of the inputs have repeated values, in which case one must resort to the use of limits. For instance, in the case where $x_{0}=x_{1}=\ldots=x_{q}=x$, the definition of divided differences reduces to:

f[x_{0},\ldots,x_{q}] = \frac{f^{(q)}(x)}{q!}\,,   (41)

where $f^{(n)}(\cdot)$ stands for the $n$-th derivative of $f(\cdot)$. Divided differences can alternatively be defined via the recursion relations

f[x_{i},\ldots,x_{i+j}] = \frac{f[x_{i+1},\ldots,x_{i+j}]-f[x_{i},\ldots,x_{i+j-1}]}{x_{i+j}-x_{i}}\,,   (42)

with $i\in\{0,\ldots,q-j\}$, $j\in\{1,\ldots,q\}$, and the initial conditions

f[x_{i}] = f(x_{i})\,, \qquad i\in\{0,\ldots,q\}\,.   (43)
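A direct implementation of the recursion of Eq. (42), checked against the explicit formula of Eq. (40) for distinct inputs and against the confluent limit of Eq. (41), is sketched below; the test function and inputs are illustrative.

```python
# Divided differences via the recursion of Eq. (42) with the initial conditions of Eq. (43).
import math
import numpy as np

def divided_difference(f, xs):
    """f[x0,...,xq] for distinct inputs, computed column by column in the Newton table."""
    table = [f(x) for x in xs]                    # initial conditions, Eq. (43)
    q = len(xs) - 1
    for j in range(1, q + 1):                     # Eq. (42)
        table = [(table[i + 1] - table[i]) / (xs[i + j] - xs[i]) for i in range(q - j + 1)]
    return table[0]

f = np.exp
xs = [0.1, 0.7, -0.4, 1.3]
explicit = sum(f(xj) / np.prod([xj - xk for xk in xs if xk != xj]) for xj in xs)   # Eq. (40)
print(np.isclose(divided_difference(f, xs), explicit))

# Confluent limit, Eq. (41): as the nodes merge, f[x,...,x] -> f^{(q)}(x)/q! (= exp(x)/q! here).
x, q, h = 0.5, 3, 1e-3
print(divided_difference(f, [x + j * h for j in range(q + 1)]), np.exp(x) / math.factorial(q))
```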

The divided differences of a function can also be expressed in terms of its Taylor expansion,

f[x_{0},\ldots,x_{q}] = \sum_{n=0}^{\infty}\frac{f^{(n)}(0)}{n!}[x_{0},\ldots,x_{q}]^{n}\,,   (44)

where $[x_{0},\ldots,x_{q}]^{n}$ denotes the divided difference of the monomial $x^{n}$. Moreover, it is easy to verify that

[x_{0},\ldots,x_{q}]^{q+n} = \begin{cases} 0 & n<0\\ 1 & n=0\\ \sum_{\sum_{j}k_{j}=n}\prod_{j=0}^{q}x_{j}^{k_{j}} & n>0 \end{cases}\,.   (45)

One may therefore write:

f[x_{0},\ldots,x_{q}] = \sum_{n=0}^{\infty}\frac{f^{(n)}(0)}{n!}[x_{0},\ldots,x_{q}]^{n} = \sum_{n=q}^{\infty}\frac{f^{(n)}(0)}{n!}[x_{0},\ldots,x_{q}]^{n} = \sum_{m=0}^{\infty}\frac{f^{(q+m)}(0)}{(q+m)!}[x_{0},\ldots,x_{q}]^{q+m}\,.   (46)

The above expression can be further simplified to

f[x_{0},\ldots,x_{q}] = \sum_{\{k_{i}\}=(0,\ldots,0)}^{(\infty,\ldots,\infty)}\frac{f^{(q+\sum_{i}k_{i})}(0)}{(q+\sum_{i}k_{i})!}\prod_{j=0}^{q}x_{j}^{k_{j}}\,,   (47)

as was asserted in the main text.

Appendix B Derivation of the divided-difference approximation bound

Here, we bound the absolute value of the difference between the divided-difference exponential $\text{e}^{-i\Delta t[E_{z_{0}},\ldots,E_{z_{q}}]}$ and its approximation

\hat{\text{e}}_{K}^{-i\Delta t[E_{z_{0}},\ldots,E_{z_{q}}]} \coloneqq \left(\frac{-i\Delta t}{K}\right)^{q}\sum_{0\leq j_{1}\leq\ldots\leq j_{K-1}\leq q}\frac{\text{e}^{-i\frac{\Delta t}{K}(\bar{E}_{(0,j_{1})}+\bar{E}_{(j_{1},j_{2})}+\ldots+\bar{E}_{(j_{K-1},q)})}}{j_{1}!(j_{2}-j_{1})!\cdots(q-j_{K-1})!}\,,   (48)

where $\bar{E}_{(j_{\ell},j_{\ell+1})}=\frac{E_{z_{j_{\ell}}}+\cdots+E_{z_{j_{\ell+1}}}}{j_{\ell+1}-j_{\ell}+1}$. For what follows, we shall assume that $|E_{z_{j+1}}-E_{z_{j}}|\leq\Delta E$ for some $\Delta E$ which is of order unity, i.e., we shall assume it is an ${\cal O}(1)$ quantity that does not scale with system size, evolution time or Hamiltonian norm. This will be the case for all physical Hamiltonians, as the basis states $|z_{j+1}\rangle$ and $|z_{j}\rangle$ differ by a single $k$-local permutation operation. Thus, for any physical Hamiltonian the change $\Delta E$ corresponds to a $k$-local change in the basis-state energy.

Next, let us use the fact [16] that the divided difference approximation is worst when the standard deviation of its inputs is maximal, namely

\left|\frac{q!\,\text{e}^{-i\Delta t[E_{z_{0}},\ldots,E_{z_{q}}]}}{(-i\Delta t)^{q}\,\text{e}^{-i\Delta t\bar{E}_{(0,q)}}}-1\right| \leq \frac{\sigma^{2}\Delta t^{2}}{2(q+2)}\,,   (49)

where $\sigma^{2}$ is the variance of the inputs $\{E_{z_{0}},\ldots,E_{z_{q}}\}$. Owing to this observation, we find that the difference

\Big|\text{e}^{-i\Delta t[E_{z_{0}},E_{z_{1}},\ldots,E_{z_{q}}]}-\hat{\text{e}}_{K}^{-i\Delta t[E_{z_{0}},\ldots,E_{z_{q}}]}\Big| = \Big|\text{e}^{-i\Delta t[0,E_{z_{1}}-E_{z_{0}},\ldots,E_{z_{q}}-E_{z_{0}}]}-\hat{\text{e}}_{K}^{-i\Delta t[0,E_{z_{1}}-E_{z_{0}},\ldots,E_{z_{q}}-E_{z_{0}}]}\Big|   (50)

is maximized for the inputs $E_{z_{j}}=j\Delta E$ (i.e., this choice maximizes the variance of the inputs). Equation (50) can thus be bounded by the quantity

\Big|\text{e}^{-i\Delta t[0,\Delta E,\ldots,q\Delta E]}-\hat{\text{e}}_{K}^{-i\Delta t[0,\Delta E,\ldots,q\Delta E]}\Big|\,,   (51)

a quantity that we can calculate exactly, as both the divided difference and its approximation can be calculated directly. A calculation of $\text{e}^{-i\Delta t[0,\Delta E,\ldots,q\Delta E]}$ reveals:

\text{e}^{-i\Delta t[0,\Delta E,\ldots,q\Delta E]} = \frac{1}{q!}\left(\frac{-2i\,\text{e}^{-i\Delta t\Delta E/2}}{\Delta E}\sin\frac{\Delta t\Delta E}{2}\right)^{q}\,,   (52)

whereas for $\hat{\text{e}}_{K}^{-i\Delta t[E_{z_{0}},\ldots,E_{z_{q}}]}$, after plugging in the inputs and simplifying, we obtain

\hat{\text{e}}_{K}^{-i\Delta t[0,\Delta E,\ldots,q\Delta E]} = \frac{1}{q!}\left(\frac{-2i\Delta t\,\text{e}^{-i\Delta t\Delta E/2}}{2K\sin\frac{\Delta t\Delta E}{2K}}\sin\frac{\Delta t\Delta E}{2}\right)^{q}\,,   (53)

which gives for their ratio

\frac{\text{e}^{-i\Delta t[0,\Delta E,\ldots,q\Delta E]}}{\hat{\text{e}}_{K}^{-i\Delta t[0,\Delta E,\ldots,q\Delta E]}} = \left(\frac{\sin\frac{\Delta t\Delta E}{2K}}{\frac{\Delta t\Delta E}{2K}}\right)^{q}\,.   (54)

Putting it all together, we find that

\Big|\text{e}^{-i\Delta t[E_{z_{0}},E_{z_{1}},\ldots,E_{z_{q}}]}-\hat{\text{e}}_{K}^{-i\Delta t[E_{z_{0}},\ldots,E_{z_{q}}]}\Big| \leq \frac{\Delta t^{q}}{q!}\left|\frac{\text{e}^{-i\Delta t[0,\Delta E,\ldots,q\Delta E]}}{\hat{\text{e}}_{K}^{-i\Delta t[0,\Delta E,\ldots,q\Delta E]}}-1\right| = \frac{\Delta t^{q}}{q!}\left(1-\left(\frac{\sin\frac{\Delta t\Delta E}{2K}}{\frac{\Delta t\Delta E}{2K}}\right)^{q}\right) \leq \frac{\Delta t^{q}}{q!}\left(\frac{\Delta t\Delta E}{2K}\right)^{2}\,.   (55)
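The bound of Eq. (55) [and hence Eq. (35)] can be checked numerically, as in the sketch below, by evaluating both the exact divided-difference exponential and the approximation of Eq. (48) for the equally spaced worst-case inputs $E_{z_{j}}=j\Delta E$; the closed form of Eq. (52) is verified along the way. The values of $q$, $K$, $\Delta t$ and $\Delta E$ are illustrative.

```python
# Numerical check of Eqs. (48), (52) and the bound of Eq. (55) for the equally spaced
# worst-case inputs E_j = j*dE.  The values of q, K, dt, dE are illustrative.
import numpy as np
from scipy.linalg import expm
from math import factorial
from itertools import combinations_with_replacement

def dd_exp(x, dt):
    Z = np.diag(np.asarray(x, dtype=complex)) + np.diag(np.ones(len(x) - 1), -1)
    return expm(-1j * dt * Z)[-1, 0]

def e_hat_K(x, dt, K):
    """The approximation of Eq. (48): sum over 0 <= j_1 <= ... <= j_{K-1} <= q."""
    q, delta = len(x) - 1, dt / K
    total = 0.0 + 0.0j
    for js in combinations_with_replacement(range(q + 1), K - 1):
        bounds = (0,) + js + (q,)
        means = sum(np.mean(x[bounds[l]: bounds[l + 1] + 1]) for l in range(K))
        weight = np.prod([factorial(bounds[l + 1] - bounds[l]) for l in range(K)])
        total += np.exp(-1j * delta * means) / weight
    return (-1j * delta) ** q * total

q, K, dt, dE = 5, 4, 0.7, 1.0
E = dE * np.arange(q + 1)

closed_form = ((-2j * np.exp(-1j * dt * dE / 2) / dE) * np.sin(dt * dE / 2)) ** q / factorial(q)
print(np.isclose(dd_exp(E, dt), closed_form))                         # Eq. (52)

gap = abs(dd_exp(E, dt) - e_hat_K(E, dt, K))
bound = dt ** q / factorial(q) * (dt * dE / (2 * K)) ** 2             # Eq. (55)
print(gap <= bound, gap, bound)
```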

Appendix C A quantum circuit for reversible binary search

Given an oracle $|x\rangle|0\rangle\to|x\rangle|f(x)\rangle$, where $f(x)$ is monotonically non-decreasing in $x$ and $x=0,\ldots,K-1$ (here $K=2^{\kappa}$), the task of a reversible binary search circuit is to find the largest index $x_{*}\in[0,K-1]$ such that $s\geq f(x_{*})$ for a given input $s$, namely, a circuit that yields $|s\rangle|0\rangle\to|s\rangle|x_{*}\rangle$. Here, the second register has $\kappa$ qubits.

To construct the desired circuit, we utilize an integer comparison oracle $C|s\rangle|f\rangle|0\rangle=|s\rangle|f\rangle|b_{sf}\rangle$, where $b_{sf}=1$ iff $s\geq f$ (see Ref. [18]). We implement $|s\rangle|0\rangle\to|s\rangle|x_{*}\rangle$ by $\kappa$ calls to $C$, such that each call fixes a single bit of $x_{*}$, going from left (most significant) to right (least significant).

Starting with the leftmost bit of $x_{*}$, which we call $x_{\kappa}$, and moving to the right, we have $C|s\rangle|f(K/2)\rangle|0\rangle|0\rangle^{\otimes\kappa-1}=|s\rangle|f(K/2)\rangle|x_{\kappa}\rangle|0\rangle^{\otimes\kappa-1}$. We note that the input to $f$, namely $K/2$, is the value of the output register with the leftmost bit set to $1$. The next bit is similarly set by $C|s\rangle|f(x_{\kappa}K/2+K/4)\rangle|x_{\kappa}\rangle|0\rangle|0\rangle^{\otimes\kappa-2}=|s\rangle|f(x_{\kappa}K/2+K/4)\rangle|x_{\kappa}\rangle|x_{\kappa-1}\rangle|0\rangle^{\otimes\kappa-2}$, noting that now the input to $f$ is the value of the output register with the second leftmost bit set to $1$. This sets $x_{\kappa-1}$. We continue similarly to set the rest of the $x_{*}$ bits. After $\kappa$ iterations we find that the output register holds the desired index $x_{*}$.
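A classical reference for the bit-by-bit search described above is sketched below: it fixes the bits of $x_{*}$ from the most significant downward, exactly as the reversible circuit does with its comparison oracle $C$. The cost function and the multi-index used for illustration are arbitrary stand-ins for $\Sigma_{\ell}$.

```python
# Classical mirror of the reversible binary search: find the largest x in [0, K-1]
# with f(x) <= s for a monotonically non-decreasing f, setting one bit per oracle call.
def binary_search_bits(f, s, kappa):
    x_star = 0
    for b in range(kappa - 1, -1, -1):            # leftmost (most significant) bit first
        candidate = x_star | (1 << b)             # tentatively set bit b of the output register
        if s >= f(candidate):                     # the comparison oracle's output bit b_{sf}
            x_star = candidate
    return x_star

kq = [2, 1, 4, 2, 3]                              # an illustrative multi-index (kappa = 3, K = 8)
Sigma = lambda l: sum(1 for k in kq if k <= l)    # a monotone cost function, cf. Eq. (32)
print([binary_search_bits(Sigma, s, 3) for s in range(len(kq) + 1)])
```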