A simple quantum simulation algorithm with near-optimal precision scaling
Abstract
Quantum simulation is a foundational application for quantum computers, projected to offer insights into complex quantum systems that are beyond the reach of classical computation. However, with the exception of Trotter-based methods, which suffer from suboptimal scaling with respect to simulation precision, existing simulation techniques are for the most part too intricate to implement on early fault-tolerant quantum hardware. We propose a quantum Hamiltonian dynamics simulation algorithm that is both straightforward to implement and achieves near-optimal scaling in simulation precision.
I Introduction
Quantum computers are widely believed to have unique advantages over classical computers when it comes to simulating Hamiltonian dynamics, due to their inherent quantum nature, which allows them to efficiently represent complex quantum states and the evolution thereof. Such simulations hold the potential to solve intricate problems in quantum chemistry, materials science and physics, making quantum computers potentially very powerful future tools for advancing scientific knowledge and tackling challenges beyond the capabilities of classical computing.
The precise manner in which Hamiltonian dynamics can be approximated on quantum devices varies considerably between approaches, as each is tailored to different resource constraints, runtime requirements and precision needs. Perhaps the most straightforward and resource-efficient technique is Trotterization, in which the time-evolution operator is approximated using Trotter-Suzuki product formulas [1, 2]. This approach is simple to implement; however, the resources required for its execution scale suboptimally with the error tolerance. A more advanced and widely used technique is quantum signal processing (QSP) [3], which leverages a sequence of unitary operations combined with classical preprocessing to directly approximate the exponential of a Hamiltonian. Complementing QSP, the linear combination of unitaries (LCU) framework [4] encompasses some of the most powerful quantum simulation methods [4, 5, 6, 7, 8]. In LCU-based approaches, the time-evolution operator is expressed as a linear combination of efficiently implementable unitary operators, with ancillary qubits facilitating the simulation through oblivious amplitude amplification [9]. This method is particularly advantageous for simulating Hamiltonians in quantum chemistry and materials science, where a decomposition of the Hamiltonian into simpler components is feasible.
Despite enjoying near-optimal scaling of resources with respect to target precision, typically requiring only a logarithmic number of ancilla qubits, these more advanced approaches present significant challenges for implementation on near-term quantum computing devices. For example, in the typical LCU-based approach, the select unitary transformations are (multi-qubit) controlled Pauli strings. These controlled operations, which involve different Pauli operations acting on different qubits, translate to a non-trivial resource overhead that hinders the technique's otherwise straightforward applicability on near-term quantum computing devices.
In this work, we present a novel LCU-based quantum simulation algorithm designed to be both straightforward to implement, generally requiring only controlled-NOT (CNOT) operations as the select unitaries, and to offer the benefits of near-optimal resource scaling, making dynamical simulations potentially more accessible for near-term devices.
The simulation algorithm we present builds on two recent studies [5, 6] that have shown how the expansion of the unitary time-evolution operator in powers of its off-diagonal strength [10, 11, 12], combined with the use of the concept of divided differences [13, 14], leads to resource-efficient quantum simulation algorithms whose complexity is on par with, and in some cases superior to, other state-of-the-art optimal-precision simulation protocols [4, 15]. The approach we present here, which we refer to as the permutation matrix representation (PMR) simulation method, similarly utilizes a series expansion of the quantum time-evolution operator in its off-diagonal elements. However, in contrast to the work presented in Refs. [5, 6], here we utilize a powerful approximation of divided differences that allows us to express them as sums of phases, which in turn leads to simplifications in the select unitary operations.
As we demonstrate, when the above approximation is combined with the LCU technique, the resulting simulation algorithm enjoys both near-optimal scaling in terms of the overall simulation error and simple-to-implement CNOTs interleaved with controlled phases as the select unitary operations.
The paper is organized as follows. In Sec. II, we provide an overview of the permutation matrix representation technique which serves as a foundation for our algorithm. In Sec. III, we present the Hamiltonian dynamics algorithm, which we construct using PMR, and discuss the divided difference calculation including cost analysis and various extensions. A discussion and some conclusions are given in Sec. IV.
II Permutation matrix representation of the time evolution operator
Before delving into the technical specifics of our algorithm, we first briefly present the key components of the PMR approach to expanding matrix exponentials in a series [12, 5, 6]. For concreteness we restrict our attention here to time-independent Hamiltonians and defer the discussion of how the method extends to the time-dependent case to the concluding section. The PMR expansion consists of several steps, the first of which is choosing a preferred basis, the ‘computational basis’, in which the Hamiltonian matrix is represented, and casting the Hamiltonian as a sum of a diagonal operator and an off-diagonal operator in that basis. Hereafter we denote the set of computational basis states as $\{|z\rangle\}$. Intuitively, the diagonal part of the Hamiltonian corresponds to a ‘classical’ Hamiltonian whereas its off-diagonal part governs the non-trivial ‘quantum’ dynamics of the system in the chosen basis. The off-diagonal component is further decomposed into a sum of products of diagonal and permutation operators [12]. This defines the PMR form of the Hamiltonian:
$H = D_0 + \sum_{i=1}^{M} D_i P_i \,,$   (1)
where the $D_i$'s are diagonal operators and the $P_i$'s are permutation operators, i.e., operators that permute the basis states. For example, for qubit systems, if one chooses the computational basis to be the eigenbasis of the Pauli-$Z$ operators, then the $P_i$ operators are strings of Pauli-$X$ operators. The decomposition Eq. (1) can be carried out efficiently for any physical Hamiltonian [12]. Next, consider the evolution of a quantum state under $H$ for duration $t$, which is then split into a sequence of repeated short-time evolution circuits, each evolving the state by a short time period whose value we determine later on in a manner designed to optimize resources (we discuss this in detail in Sec. III.3).
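To make the decomposition concrete, the following minimal sketch (in Python, using a toy three-qubit transverse-field Ising model with couplings and variable names of our own choosing, not taken from the paper) builds a Hamiltonian directly in the form of Eq. (1): a diagonal $D_0$ containing the $ZZ$ couplings plus off-diagonal terms $D_j P_j$ in which each permutation $P_j$ is a single bit flip.

```python
# A minimal numerical sketch of the PMR form of Eq. (1); the model, couplings
# and variable names are our own placeholders, not taken from the paper.
import numpy as np
from functools import reduce

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])
kron_all = lambda ops: reduce(np.kron, ops)

n, J, h = 3, 1.0, 0.7
# Diagonal part D_0: nearest-neighbour ZZ couplings (diagonal in the Z basis).
D0 = -J * sum(kron_all([Z if k in (j, j + 1) else I2 for k in range(n)])
              for j in range(n - 1))
# Off-diagonal part: each transverse-field term -h X_j is D_j P_j, with the
# permutation P_j a single bit flip (an X string) and D_j = -h * identity.
perms = [kron_all([X if k == j else I2 for k in range(n)]) for j in range(n)]
H = D0 + sum(-h * P for P in perms)

assert np.allclose(D0, np.diag(np.diag(D0)))                    # D_0 is diagonal
assert all(np.allclose(P @ P, np.eye(2 ** n)) for P in perms)   # each P_j permutes basis states
print(np.allclose(H, H.T.conj()))                               # the PMR form reproduces a Hermitian H
```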
The PMR decomposition of the Hamiltonian allows us to write the (short) time evolution operator in an off-diagonal series expansion:
(2) |
where in the last step we identify $P_0$ with the identity operator. After some algebra, this expansion (see Refs. [10, 12] for a complete and detailed derivation) may be expressed as [5]
(3) |
Here, the boldfaced index $\mathbf{i}_q$ is a tuple of indices $(i_1,\ldots,i_q)$, with each index ranging from $1$ to $M$, and $P_{\mathbf{i}_q}=P_{i_q}\cdots P_{i_1}$ is an ordered product of off-diagonal permutation operators. In addition, the operators accompanying the $P_{\mathbf{i}_q}$ are the diagonal operators
(4) |
where is the divided difference of the exponential function [13, 14] over the multi-set , where with , and we used the convention that . We provide additional details pertaining to divided differences in Appendix A. Note that the ’s in and in should actually be denoted by ; however, for conciseness we are using the abbreviations . Lastly, we have denoted the product of off-diagonal matrix elements, where
(5) |
can be considered the ‘hopping strength’ of with respect to , and we have defined the real-valued coefficients where .
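As a sanity check of this type of off-diagonal series expansion, the snippet below (our own notation and toy parameters) reconstructs the evolved state of a single-qubit Hamiltonian $H = D_0 + \Gamma X$ by summing, order by order, hopping strengths multiplied by divided-difference exponentials of the visited diagonal energies, and compares the result with exact matrix exponentiation. The divided differences are evaluated via Opitz's formula, $f[x_0,\ldots,x_q] = [f(J)]_{0,q}$ with $J$ the bidiagonal matrix of the inputs, which handles repeated inputs gracefully.

```python
# Numerical sanity check of an off-diagonal (PMR-type) series expansion for a
# toy single-qubit Hamiltonian; all symbols below are our own placeholders.
import numpy as np
from scipy.linalg import expm

def dd_exp(t, xs):
    """Divided difference of f(E) = exp(-i t E) over the (possibly repeated)
    inputs xs, via Opitz's formula f[x_0..x_q] = [f(J)]_{0,q}."""
    q = len(xs) - 1
    J = np.diag(np.asarray(xs, dtype=complex)) + np.diag(np.ones(q), 1)
    return expm(-1j * t * J)[0, q]

E = np.array([0.3, -1.1])            # diagonal energies E_z = <z|D0|z>
Gamma, t = 0.8, 0.45                 # hopping strength and evolution time
H = np.diag(E) + Gamma * np.array([[0, 1], [1, 0]])

# Sum over path lengths q: hopping strengths times divided-difference
# exponentials of the diagonal energies visited along the path.
psi = np.zeros(2, dtype=complex)
z0 = 0                               # start in basis state |0>
for q in range(0, 30):               # truncate the series at a large order
    path = [(z0 + j) % 2 for j in range(q + 1)]      # single permutation P_1 = X
    psi[path[-1]] += Gamma ** q * dd_exp(t, [E[z] for z in path])

exact = expm(-1j * t * H) @ np.eye(2)[z0]
print(np.max(np.abs(psi - exact)))   # essentially machine precision
```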
In what follows, we use the expanded form of given in Eq. (3) to show that the time evolution operator can be formulated as a linear combination of simple-to-execute unitary operators which in turn allows us to utilize the LCU technique to approximate the time-evolution operator.
The main technical contribution of this paper is devising an algorithm that implements the LCU with near-optimal scaling and at the same time utilizes only simple-to-implement unitary operators, avoiding complicated classical computations on quantum registers or involved select unitaries. We present our algorithm next.
III The simulation algorithm
III.1 Divided difference exponentials as linear combinations of phases
To begin with, we present an efficient method for calculating in superposition the divided differences appearing in the operators in a manner that does not require any classical (i.e., quantum reversible) arithmetic calculations on additional ancillary registers.
First, let us focus on the divided differences of the exponential function for a multi-set of inputs and a real-valued parameter . By exploiting the Leibniz rule for divided differences [13, 14], which states that
$(f \cdot g)[x_0,\ldots,x_q] = \sum_{j=0}^{q} f[x_0,\ldots,x_j]\, g[x_j,\ldots,x_q]$   (6)
for any two functions and , we can write
(7) |
for any . A successive application of the rule yields
(8)
for any integer . We shall refer to as the divided differences approximation subdivision constant in what follows. For convenience, we shall choose it to be a power of two, namely, , for a nonnegative integer to which we will refer as the divided differences approximation depth.
As a next step, we utilize the observation that for every finite $q$ there is a small enough $t$ such that the divided difference is well approximated by [16]:
$(e^{-itx})[x_0,\ldots,x_q] \approx \frac{(-it)^q}{q!}\, e^{-it\bar{x}}\,, \qquad \bar{x} = \frac{1}{q+1}\sum_{j=0}^{q} x_j\,,$   (9)
(we discuss the quality of this approximation and its effect on the algorithm precision later on). We can thus use Eq. (9) to approximate the divided differences in Eq. (8) to obtain
(10) |
where we used the notation
(11) |
with (and defining ). The choice for which ensures a near-optimal scaling of resources in terms of simulation error will be discussed in Sec. III.3.
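The snippet below (with placeholder inputs and our own notation, in particular the symbol $R$ for the number of subdivisions) illustrates the two ingredients just introduced: the raw approximation of Eq. (9) and the improvement obtained, in the spirit of Eq. (10), by first applying the Leibniz-rule subdivision into factors of duration $t/R$ and only then approximating each factor. The error with respect to the exact divided difference shrinks as the subdivision is refined.

```python
# Numerical illustration (our own notation and toy inputs) of the divided-
# difference approximation and of the Leibniz-rule subdivision that improves it.
import numpy as np
from math import factorial
from itertools import combinations_with_replacement
from scipy.linalg import expm

def dd_exp(t, xs):
    """Exact divided difference of f(E) = exp(-i t E) over inputs xs (Opitz formula)."""
    q = len(xs) - 1
    J = np.diag(np.asarray(xs, dtype=complex)) + np.diag(np.ones(q), 1)
    return expm(-1j * t * J)[0, q]

def dd_exp_approx(t, xs):
    """Small-t approximation of Eq. (9): (-i t)^q / q! times a mean-field phase."""
    q = len(xs) - 1
    return (-1j * t) ** q / factorial(q) * np.exp(-1j * t * np.mean(xs))

def dd_exp_subdivided(t, xs, R):
    """Leibniz-rule subdivision into R factors of duration t/R, each approximated."""
    q = len(xs) - 1
    total = 0.0 + 0.0j
    for cut in combinations_with_replacement(range(q + 1), R - 1):
        js = (0,) + cut + (q,)                       # 0 <= j_1 <= ... <= j_{R-1} <= q
        total += np.prod([dd_exp_approx(t / R, xs[js[r]:js[r + 1] + 1])
                          for r in range(R)])
    return total

xs, t = [0.3, -1.2, 0.7, 0.1], 0.8                   # placeholder inputs and time
exact = dd_exp(t, xs)
for R in (1, 2, 4, 8):
    print(R, abs(dd_exp_subdivided(t, xs, R) - exact))   # error shrinks as R grows
```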
Next, we write the approximation as a sum of simple-to-calculate phases. By a change of variables, Eq. (III.1) can be rewritten as:
(12)
where in the second equation is the multinomial coefficient, and we defined
(13)
with the convention that for , the sum . For readability, we have abbreviated by with being the set of all integer tuples that obey , and .
Moreover, the product can be recast as a product of phases each of which proportional to one of the original inputs, namely
(14) |
where the coefficients are given by
(15) |
with . This simplification is achieved by noticing that contributes an additive factor to if and only if it contains in its average, in which case the factor is . In addition, will contain if and only if is in the range .
Furthermore, one can show that the may be expressed in terms of the minimum index, which we denote , obeying and the maximum index, denoted for which , namely,
(16)
Using and Eq. (14) we can now write the approximation Eq. (12) as a sum of phases
(17) |
where we replaced the sum over with a multi-index. The multi-index is a tuple of indices , with , each ranging from to . The relation between the former indices and the current indices is such that is the number of indices in whose value is (since there are indices, indeed can take on values in the range ). Since the indices represent ordered occurrences, we remove the multinomial coefficient from the sum. We also note that can be directly calculated from the multi-index indices, since it is the sum total of all the index values whose value is less than or equal to .
Next we describe a quantum circuit designed to approximate as given in Eq. (3) based on the approximation derived above.
III.2 The LCU implementation
Having obtained the approximation Eq. (17), we insert it now into Eq. (4), identifying and the inputs with diagonal energies . Equation (3) becomes
(18)
At this point we assume for ease of presentation that the matrix elements are real-valued and independent of (i.e., that the matrices are proportional to the identity matrix), as this is the case for many physical systems. We will discuss the necessary adjustments to the algorithm (which do not lead to added complexity) in the case where these coefficients are -dependent later on. In this simpler case we have , for some and , so , and the evolution operator becomes
(19) |
where
(20) |
are unitary operators. While the dependence of on the multi-index is evident, we note that the dependence is (only) via the coefficients. Truncating the infinite series in Eq. (19) at some maximal order and rearranging terms, we find
(21) |
where we have denoted the ‘off-diagonal norm’ of the Hamiltonian by . The choice for which ensures a near-optimal scaling of resources in terms of simulation error will be discussed in Sec. III.3.
Since Eq. (21) has the form of a linear combination of unitary operators, we can invoke the LCU technique to execute it on a quantum computer. The technique consists of two main components: (i) state preparation on ancilla qubits and (ii) unitary circuits controlled by the ancilla qubits. We next discuss a simple-to-implement protocol for each of the two in turn.
III.2.1 The state preparation subroutine
The ancilla state we prepare consists of three quantum (multi-)registers: (i) A -qubit register, (ii) registers with qubits each, and (iii) a third set of registers consisting of qubits each. Denoting which is the unary encoding of the order , where is a shorthand for the qubit state , that is a unary encoding of in which the -th qubit is set to while the rest of the qubits are set to , and – a shorthand for quantum registers, each of dimension , the ancilla state for the LCU reads:
(22) |
with the normalization constant
(23) |
We prepare the state in three stages. In the first, we use the first -qubit register to prepare
(24) |
at the cost of controlled rotations [5]. As a next step, we use the second set of registers to prepare
(25) |
which can also be done at the cost of controlled rotations [5]. Note that the right-hand side of the last equation has a tensor product structure over registers: . Similarly, the third and final stage consists of using the last set of -qubit registers to prepare
(26) |
which can be carried out using controlled Hadamard gates on qubits. Therefore, overall the state preparation subroutine can be completed using controlled rotations.
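As an illustration of the kind of controlled-rotation ladder used in the first stage, the classical sketch below computes rotation angles that load a set of placeholder weights onto unary-encoded states $|1\cdots 1\,0\cdots 0\rangle$ and verifies the resulting amplitudes; the actual weights of Eq. (22) would replace the placeholder array, and the function names are ours.

```python
# Classical sketch of a unary-ladder state preparation: a chain of (controlled)
# R_y rotations places amplitude sqrt(w_q) on the unary state with q leading ones.
# The target weights w below are placeholders, not the coefficients of Eq. (22).
import numpy as np

def unary_ladder_angles(weights):
    """Rotation angles theta_1..theta_Q such that the amplitude of the unary
    state with q leading ones equals sqrt(weights[q])."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    angles, remaining = [], 1.0
    for q in range(len(w) - 1):
        # sin(theta/2)^2 = probability of continuing past level q
        p_continue = 1.0 - w[q] / remaining
        angles.append(2.0 * np.arcsin(np.sqrt(max(p_continue, 0.0))))
        remaining *= p_continue
    return angles

def unary_amplitudes(angles):
    """Amplitudes (a_0, ..., a_Q) produced by the ladder."""
    amps, prod = [], 1.0
    for theta in angles:
        amps.append(prod * np.cos(theta / 2))
        prod *= np.sin(theta / 2)
    amps.append(prod)
    return np.array(amps)

w = np.array([0.5, 0.25, 0.15, 0.1])                  # placeholder weights for orders q = 0..3
print(np.allclose(unary_amplitudes(unary_ladder_angles(w)) ** 2, w))   # True
```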
III.2.2 The controlled unitaries
Figure 1: Circuit for a single step $s$ of the select unitary (the raw qcircuit source is not reproduced here). The unary counter register $|s\rangle$ together with the $|i_q\rangle$ register controls the permutation $P_{i_s}$ applied to the system register $|z\rangle$; the $|k_q\rangle$ register controls $\pm\alpha_s$ rotations on an ancilla initialized in $|0\rangle$, which in turn controls the diagonal phase $i e^{-i\delta\alpha_s D_0}$ on $|z\rangle$; finally, a shift gate ($\ll$) advances the counter. Repeated application of this block yields $U_{(\mathbf{i}_q,\mathbf{k}_q)}|z\rangle$.
The second ingredient of the LCU protocol consists of designing the select-unitary operation:
$U_{\mathrm{select}} = \sum_{\mathbf{i}_q,\mathbf{k}} |\mathbf{i}_q,\mathbf{k}\rangle\langle\mathbf{i}_q,\mathbf{k}| \otimes U_{(\mathbf{i}_q,\mathbf{k})}\,,$   (27)
where the $U_{(\mathbf{i}_q,\mathbf{k})}$ are given in Eq. (20). We now show that the select-unitary operation which executes Eq. (27) can be implemented using interleaved applications of CNOTs and controlled-phase operations. To that end, we define the unitary operator:
(28) |
where we have now spelled out the dependence of on . Let us next consider the action of the ordered product
(29) |
on a state . We find that
(30) |
Hence, the ordered product of Eq. (29) consists of an interleaved application of CNOT gates (since the $P_i$ operators are Pauli-$X$ strings) and a phase-kickback circuit that implements [up to a simple factor of ]. A circuit for it is sketched out in Fig. 1. For simplicity, the circuit uses a -qubit ancilla that encodes in unary form, and a shift gate that, starting with , shifts the bit to the left with every application. By including this counter register we are able to express the select unitary as repeated applications of the circuit shown in Fig. 1.
In addition, since the diagonal $D_0$ is written as a linear combination of Pauli-$Z$ strings, i.e., as a weighted sum of commuting $Z$-string operators, the diagonal unitary is essentially a product of individual $Z$-string phase rotations, each of which is trivial to implement once the corresponding coefficient is provided. Next, we discuss the implementation of the calculation.
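The factorization of the diagonal phase is easy to verify numerically: since all Pauli-$Z$ strings commute, the exponential of their weighted sum equals the product of the individual exponentials. The toy $D_0$, coefficients, and rotation angle below are our own placeholders.

```python
# Quick check that e^{-i phi D_0} factorizes into a product of independent
# Pauli-Z-string rotations (all Z-strings commute); the toy D_0 is our own.
import numpy as np
from scipy.linalg import expm
from functools import reduce

I2, Z = np.eye(2), np.diag([1.0, -1.0])
def zstring(bits):                         # e.g. (1, 0, 1) -> Z (x) I (x) Z
    return reduce(np.kron, [Z if b else I2 for b in bits])

terms = {(1, 1, 0): 0.9, (0, 1, 1): -0.4, (1, 0, 1): 0.25}   # placeholder coefficients
D0 = sum(c * zstring(b) for b, c in terms.items())
phi = 0.37

product = reduce(np.matmul, [expm(-1j * phi * c * zstring(b)) for b, c in terms.items()])
print(np.allclose(product, expm(-1j * phi * D0)))            # True
```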
III.2.3 Calculating the coefficients
First, we note that given a quantum register that stores an integer , one can implement as well as using calls to a single-qubit rotation gate [17]. Next, we observe that consists of three terms, each of which is either an integer or the reciprocal thereof, cf. Eq. (16). Therefore, by calculating and storing those integers in a quantum register, one may implement efficiently.
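A classical sketch of the first remark, namely that a phase proportional to the integer $m$ stored in a register is a product of single-qubit phase gates with binary-weighted angles, is given below (the register size, the constant $c$, and the big-endian bit ordering are our own choices).

```python
# Sketch of the binary-weighted-phase trick: a phase e^{i c m} on a register
# holding the integer m is a product of one single-qubit phase gate per bit.
import numpy as np
from functools import reduce

def phase_on_bit(angle, bit, n):
    """Diagonal gate diag(1, e^{i angle}) acting on qubit `bit` of n qubits."""
    g = np.diag([1.0, np.exp(1j * angle)])
    I2 = np.eye(2)
    return reduce(np.kron, [g if k == bit else I2 for k in range(n)])

n, c = 3, 0.21
# qubit 0 is taken to be the most significant bit (a convention choice)
U = reduce(np.matmul, [phase_on_bit(c * 2 ** (n - 1 - b), b, n) for b in range(n)])
target = np.diag([np.exp(1j * c * m) for m in range(2 ** n)])
print(np.allclose(U, target))    # True
```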
Lastly, finding and as per Eq. (16) (each of which requires a -qubit register) may be achieved with operations using a reversible binary search (a blueprint for such a circuit is given in Appendix C), provided one has access to a cost-function circuit that calculates for a given value.
A circuit for calculating from can be constructed by implementing an integer comparison circuit [18]
(31) |
i.e., a circuit that increments the third register if and only if . Here the and registers have qubits and has qubits. Executing the above sub-routine times sequentially with the functioning as the first register in each iteration, we obtain a circuit that implements
(32) |
Implementation of a binary search circuit using as the cost function, allows us to construct a circuit that produces
(33) |
The last two registers can in turn be used to calculate the relevant integers that appear in Eq. (16). The above can be done in operations [18].
We mention in passing that another, less favorably scaling, method for calculating the coefficients is available which is nonetheless simpler to implement, as it does not involve the calculation of the extremal indices of Eq. (16). For this alternative, we write
(34) |
where denotes a bit that is set to one if and is zero otherwise. This requires an integer comparison circuit that checks for the conditions and .
III.3 Algorithm cost
In the previous section we worked out the circuit which approximates the short-time evolution . In this section we provide the resource scaling analysis that ensures that is -close to in spectral distance, , where is the overall error over the simulation time , and . From the subadditivity property of the spectral norm it is ensured that with repetition of , the overall simulation is -close to the exact dynamics induced by the exact .
Our algorithm involves two basic approximations to the exact dynamics, , determined by the divided differences approximation constant and the truncation order , respectively. Therefore, to ensure that we require that both and are at most .
First we note that the LCU formalism [4] dictates that the ancilla state preparation normalization factor should be such that to ensure that . According to Eq. (23), fixing (equivalently, ) and , in our algorithm ensures , as required.
Second, based on Eq. (9), we show in Appendix B that
(35) |
where is a bound on for all , i.e., a bound on the difference between any two consecutive diagonal energies, from which it follows that
(36)
Hence, choosing ensures that . Note that the gate and qubit resources of our algorithm scale with , that is, logarithmically in .
Finally, with the above choices for and , we recall that the number of ancilla qubits needed and the number of gates of a single LCU execution are , as discussed above. Since both and scale as and , respectively, we find that the algorithm does indeed have near-optimal dependence on precision.
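For a rough feel of the scaling just quoted, the toy calculation below finds the smallest truncation order at which the tail of the exponential series $\sum_q x^q/q!$ drops below a target $\epsilon$; this is only a generic illustration of the $O(\log(1/\epsilon)/\log\log(1/\epsilon))$ behavior, with our own placeholder value of $x$, and not the paper's precise truncation criterion.

```python
# Generic illustration of how a series-truncation order grows with precision.
import math

def truncation_order(x, eps):
    """Smallest Q such that the tail sum_{q>Q} x^q/q! is at most eps."""
    partial, term, q = 0.0, 1.0, 0
    while math.exp(x) - (partial + term) > eps:
        partial += term
        term *= x / (q + 1)
        q += 1
    return q

x = 0.5                                   # stand-in for off-diagonal norm times step size
for eps in (1e-3, 1e-6, 1e-9, 1e-12):
    print(eps, truncation_order(x, eps))  # grows only slowly as eps shrinks
```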
III.4 The case of -dependent
In the previous section, we assumed for simplicity that whereas in the general case all that is guaranteed is that which follows from . The ratio may always be expressed as the average of two phases:
(37) |
Therefore in this case, we can simply replace of Eq. (28) with
(38) |
i.e., we can express it as an average of two unitaries. The simulation circuit in this case can be implemented in much the same manner as before, where now the factor can be encoded directly as a phase if the off-diagonal matrix elements are given in polar coordinates (and otherwise via a phase-kickback circuit acting on quantum registers that encode [5]). Additionally, the LCU state preparation should be modified to include additional qubits, each prepared in the state , alongside a change in the first stage of the state preparation routine to account for the extra factors of . These modifications do not alter the overall complexity of the algorithm.
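The two-phase rewriting used above follows from the elementary identity that any complex number $w$ with $|w|\le 1$ is the average of two pure phases, $w = \tfrac{1}{2}\big(e^{i(\phi+\theta)} + e^{i(\phi-\theta)}\big)$ with $\phi=\arg w$ and $\theta=\arccos|w|$; the quick numerical check below uses our own variable names.

```python
# Check of the two-phase identity: any w with |w| <= 1 is an average of two phases.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=5) + 1j * rng.normal(size=5)
w /= np.maximum(np.abs(w), 1.0)                      # force |w| <= 1
phi, theta = np.angle(w), np.arccos(np.abs(w))
print(np.allclose(0.5 * (np.exp(1j * (phi + theta)) + np.exp(1j * (phi - theta))), w))
```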
IV Summary and conclusions
We devised a simple-to-implement quantum algorithm designed to simulate the dynamics of general Hamiltonians on a quantum computer. While straightforward to execute, we have shown that the proposed algorithm retains near-optimal dependence on the target precision.
Our algorithm has numerous properties that we argue make it attractive for implementation on resource-limited near-future quantum computing devices, which are characterized by being small and noisy and on which one is not afforded the luxury of fault tolerance and error correction schemes. These are: (i) Neither the gate cost nor the qubit cost of the algorithm depends on the norm of the diagonal component of the Hamiltonian. While for most simulation algorithms the number of repetitions of the short-time evolution unitary grows linearly with the diagonal norm, in the present algorithm the dependence on it enters only via trivial phases. (ii) In addition, the present algorithm offers a compact LCU decomposition of the Hamiltonian in which the terms are grouped according to which bits they flip. This is the PMR decomposition, which is in general considerably more compact than the customary breakup of the Hamiltonian into a linear combination of Pauli strings [5]. As such, the number of ancilla qubits required for the LCU state preparation subroutine is expected to be, in general, smaller by orders of magnitude compared to standard decompositions. (iii) Last, we note again the simplicity of our proposed algorithm, which prescribes only simple-to-implement and for the most part straightforward subroutines. This property is especially important in the NISQ era, where quantum computing platforms are small and noisy and where any unnecessary qubit or gate added to the circuit may be decisive in terms of performance.
Furthermore, the PMR method presented above also extends naturally to the case of time-dependent Hamiltonians, with the main modification being that in the time-dependent case the divided-difference inputs no longer consist only of diagonal energies but must rather be augmented with the frequencies of the time dependence [6], namely,
(39) |
where the are (in general complex-valued) frequencies of the now time-dependent diagonal operators in the PMR decomposition of the Hamiltonian. As this is the only difference between the time-independent and time-dependent case, the divided-difference approximation introduced here similarly applies to time-dependent simulations as well. A major advantage of the PMR formulation in the time-dependent case is that the cost of the present algorithm is linear in the evolution time and does not depend on frequencies (see Ref. [6]). This property too translates to potentially critical savings in gate and qubit costs.
In light of the above, we hope that the algorithm we proposed here will prove to be a useful tool in the coming years, as near-term quantum computing devices become more widely available and allow their users to generate credible simulation results for scientifically relevant problems. In a future study we hope to report on a performance comparison of this algorithm against existing schemes when tasked with simulating a scientifically meaningful model and executed on a NISQ device. An apples-to-apples comparison against existing algorithms on specific applications will highlight the advantages of our approach.
Acknowledgements.
We thank Arman Babakhani and Lev Barash for useful suggestions. IH acknowledges support by the Office of Advanced Scientific Computing Research of the U.S. Department of Energy under Contract No. DE-SC0024389. In addition, this research was developed with funding from the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR00112330014. The views, opinions, and/or findings expressed are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.
References
- Lloyd [1996] S. Lloyd, Science 273, 1073 (1996).
- Childs et al. [2021] A. M. Childs, Y. Su, M. C. Tran, N. Wiebe, and S. Zhu, Phys. Rev. X 11, 011020 (2021).
- Low et al. [2016] G. H. Low, T. J. Yoder, and I. L. Chuang, Phys. Rev. X 6, 041067 (2016).
- Berry et al. [2015] D. W. Berry, A. M. Childs, R. Cleve, R. Kothari, and R. D. Somma, Phys. Rev. Lett. 114, 090502 (2015).
- Kalev and Hen [2021] A. Kalev and I. Hen, Quantum 5, 426 (2021).
- Chen et al. [2021] Y.-H. Chen, A. Kalev, and I. Hen, PRX Quantum 2, 030342 (2021).
- An et al. [2023] D. An, J.-P. Liu, and L. Lin, Phys. Rev. Lett. 131, 150603 (2023).
- Chakraborty [2024] S. Chakraborty, Quantum 8, 1496 (2024).
- Berry et al. [2014] D. W. Berry, A. M. Childs, R. Cleve, R. Kothari, and R. D. Somma, in Proceedings of the Forty-Sixth Annual ACM Symposium on Theory of Computing, STOC ’14 (Association for Computing Machinery, New York, NY, USA, 2014) p. 283–292.
- Albash et al. [2017] T. Albash, G. Wagenbreth, and I. Hen, Phys. Rev. E 96, 063309 (2017).
- Hen [2018] I. Hen, Journal of Statistical Mechanics: Theory and Experiment 2018, 053102 (2018).
- Gupta et al. [2020] L. Gupta, T. Albash, and I. Hen, Journal of Statistical Mechanics: Theory and Experiment 2020, 073105 (2020).
- Whittaker and Robinson [1940] E. T. Whittaker and G. Robinson, The Calculus of Observations: A Treatise on Numerical Mathematics, 3rd Edition. (Blackie and Sons Limited, London, 1940).
- de Boor [2005] C. de Boor, Surveys in Approximation Theory 1, 46 (2005).
- Hao Low and Wiebe [2018] G. Hao Low and N. Wiebe, ArXiv e-prints , arXiv:1805.00675 (2018), arXiv:1805.00675 [quant-ph] .
- Kalev and Hen [2024] A. Kalev and I. Hen, Feynman path integrals for discrete-variable systems: Walks on hamiltonian graphs (2024), arXiv:2407.11231 [quant-ph] .
- Nielsen and Chuang [2011] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information: 10th Anniversary Edition, 10th ed. (Cambridge University Press, USA, 2011).
- Sanders et al. [2019] Y. R. Sanders, G. H. Low, A. Scherer, and D. W. Berry, Phys. Rev. Lett. 122, 020502 (2019).
Appendix A Notes on divided differences
We provide below a brief summary of the concept of divided differences, which is a recursive division process. This method is typically encountered when calculating the coefficients in the interpolation polynomial in the Newton form.
The divided differences [13, 14] of a function $f$ are defined as
$f[x_0,\ldots,x_q] \equiv \sum_{j=0}^{q} \frac{f(x_j)}{\prod_{k\neq j}(x_j - x_k)}$   (40)
with respect to the list of real-valued input variables $[x_0,\ldots,x_q]$. The above expression is ill-defined if some of the inputs have repeated values, in which case one must resort to the use of limits. For instance, in the case where $x_0 = x_1 = \cdots = x_q \equiv x$, the definition of divided differences reduces to:
$f[x_0,\ldots,x_q] = \frac{f^{(q)}(x)}{q!}\,,$   (41)
where $f^{(q)}$ stands for the $q$-th derivative of $f$. Divided differences can alternatively be defined via the recursion relations
$f[x_0,\ldots,x_q] = \frac{f[x_1,\ldots,x_q] - f[x_0,\ldots,x_{q-1}]}{x_q - x_0}\,,$   (42)
with and the initial conditions
$f[x_j] = f(x_j)\,, \qquad j = 0,\ldots,q\,.$   (43)
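For reference, a direct (numerically naive) implementation of the recursion of Eqs. (42)-(43) is sketched below; it is valid for distinct inputs, while repeated inputs would require the limiting form of Eq. (41). The function names are ours.

```python
# Divided differences via the standard recursive (Newton) table, Eqs. (42)-(43).
def divided_difference(f, xs):
    """f[x_0, ..., x_q] for distinct inputs xs."""
    table = [f(x) for x in xs]                        # initial conditions f[x_j] = f(x_j)
    for level in range(1, len(xs)):
        table = [(table[j + 1] - table[j]) / (xs[j + level] - xs[j])
                 for j in range(len(xs) - level)]
    return table[0]

import math
print(divided_difference(math.exp, [0.0, 0.1, 0.25]))   # exp[0, 0.1, 0.25]
```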
A function of divided differences can be defined in terms of its Taylor expansion
(44) |
Moreover, it is easy to verify that
(45) |
One may therefore write:
(46) |
The above expression can be further simplified to
(47) |
as was asserted in the main text.
Appendix B Derivation of the divided-difference approximation bound
Here, we bound the absolute value of the difference between the divided difference exponential and its approximation
(48) |
where . For what follows, we shall assume that for some which is of order unity, i.e., an $O(1)$ quantity that does not scale with system size, evolution time or Hamiltonian norm. This will be the case for all physical Hamiltonians, as the corresponding basis states differ by a single local permutation operation. Thus, for any physical Hamiltonian the change corresponds to a local change in the basis-state energy.
Next, let us use the fact [16] that the divided difference approximation is worst when the standard deviation of its inputs is maximal, namely
(49) |
where is the variance of the inputs . Owing to this observation, we find that the bound
(50) |
is maximized for the inputs (i.e., this choice maximizes the variance of the inputs). Equation (50) can thus be bounded by the quantity
(51) |
a quantity that we can calculate exactly, as both the divided difference and its approximation can be calculated directly. A calculation of reveals:
(52) |
whereas for , after plugging in the inputs and simplifying, we obtain
(53) |
which gives for their ratio
(54) |
Putting it all together, we find that
(55) |
Appendix C A quantum circuit for reversible binary search
Given an oracle where is monotonically non-decreasing in and (here K), the task of a reversible binary search circuit is to find the smallest index such that for a given input , namely, a circuit that yields . Here, the second register has qubits.
To construct the desired circuit, we utilize an integer comparison oracle where iff (see Ref. [18]). We implement by calls to , such that each call fixes a single bit of , going from left (most significant) to right (least significant).
Starting with the leftmost bit of , which we call , and moving to the right, we have . We note that the input to , namely , is the value of the output register with the leftmost bit set to . The next bit is similarly set by , noting that now the input to is the value of the output register with the second leftmost bit set to 1. This sets . We continue similarly to set the rest of the bits. After iterations of we find the output register is the desired index .
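A classical sketch of the bit-fixing search just described is given below (the function and variable names are ours, and we take the search condition to be "smallest $j$ with $f(j)\ge y$" for a monotonically non-decreasing $f$); a reversible/quantum version replaces each comparison with the integer-comparison oracle of Ref. [18] acting on the output register.

```python
# Classical sketch of a bit-by-bit binary search over a monotone non-decreasing f:
# fix the bits of the answer from most to least significant, one oracle call each.
def binary_search_smallest(f, y, n_bits):
    """Smallest j in [0, 2**n_bits) with f(j) >= y, assuming such a j exists."""
    j = 0
    for b in reversed(range(n_bits)):        # most significant bit first
        trial = j | (1 << b)                 # tentatively set this bit
        if f(trial - 1) < y:                 # everything below 'trial' is still < y
            j = trial                        # keep the bit set
    return j

vals = [0, 0, 1, 3, 3, 7, 9, 12]             # monotone example data
print(binary_search_smallest(lambda j: vals[j], 3, 3))   # -> 3, first index with f(j) >= 3
```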