A simple quantum simulation algorithm with near-optimal precision scaling
Abstract
Quantum simulation is a foundational application for quantum computers, projected to offer insights into complex quantum systems that are beyond the reach of classical computation. However, with the exception of Trotter-based methods, which suffer from suboptimal scaling with respect to simulation precision, existing simulation techniques are for the most part too intricate to implement on early fault-tolerant quantum hardware. We propose a quantum Hamiltonian dynamics simulation algorithm that is both straightforward to implement and achieves near-optimal scaling in simulation precision.
I Introduction
Quantum computers are widely believed to have unique advantages over classical computers when it comes to simulating Hamiltonian dynamics, due to their inherent quantum nature, which allows them to efficiently represent complex quantum states and the evolution thereof. Such simulations hold the potential to solve intricate problems in quantum chemistry, materials science and physics, making quantum computers potentially very powerful future tools for advancing scientific knowledge and tackling challenges beyond the capabilities of classical computing.
The precise manner in which Hamiltonian dynamics can be approximated on quantum devices varies considerably between approaches, as each is tailored to different resource constraints, runtime requirements and precision needs. Perhaps the most straightforward and resource-efficient technique is Trotterization, in which the time-evolution operator is approximated using Trotter-Suzuki product formulas [1, 2]. This approach is simple to implement; however, the resources required for its execution scale suboptimally with the error tolerance. A more advanced and widely used technique is quantum signal processing (QSP) [3], which leverages a sequence of unitary operations combined with classical preprocessing to directly approximate the exponential of a Hamiltonian. Complementing QSP, the linear combination of unitaries (LCU) framework [4] encompasses some of the most powerful quantum simulation methods [4, 5, 6, 7, 8]. In LCU-based approaches, the time-evolution operator is expressed as a linear combination of efficiently implementable unitary operators, with ancillary qubits facilitating the simulation through oblivious amplitude amplification [9]. This method is particularly advantageous for simulating Hamiltonians in quantum chemistry and materials science, where a decomposition of the Hamiltonian into simpler components is feasible.
Despite enjoying near-optimal scaling of resources with respect to target precision, typically requiring only a logarithmic number of ancilla qubits, these more advanced approaches present significant challenges for implementation on near-term quantum computing devices. For example, in the typical LCU-based approach, the select unitary transformations are (multi-qubit) controlled Pauli strings. These controlled operations, which involve different Pauli operations acting on different qubits, translate to a non-trivial resource overhead that hinders the technique's otherwise straightforward applicability on near-term quantum computing devices.
In this work, we present a novel LCU-based quantum simulation algorithm designed to be both straightforward to implement, generally requiring only controlled-NOT (CNOT) operations as the select unitaries, and to offer the benefits of near-optimal resource scaling, making dynamical simulations potentially more accessible for near-term devices.
The simulation algorithm we present builds on two recent studies [5, 6] that have shown how the expansion of the unitary time-evolution operator in powers of its off-diagonal strength [10, 11, 12], combined with the use of the concept of divided differences [13, 14], leads to resource-efficient quantum simulation algorithms whose complexity is on par with, and in some cases superior to, other state-of-the-art optimal-precision simulation protocols [4, 15]. The approach we present here, which we refer to as the permutation matrix representation (PMR) simulation method, similarly utilizes a series expansion of the quantum time-evolution operator in its off-diagonal elements. However, in contrast to the work presented in Refs. [5, 6], here we utilize a powerful approximation of divided differences that allows us to express them as sums of phases, which in turn leads to simplifications in the select unitary operations.
As we demonstrate, when the above approximation is combined with the LCU technique, the resulting simulation algorithm enjoys both near-optimal scaling in terms of the overall simulation error and simple-to-implement CNOTs interleaved with controlled phases as the select unitary operations.
The paper is organized as follows. In Sec. II, we provide an overview of the permutation matrix representation technique which serves as a foundation for our algorithm. In Sec. III, we present the Hamiltonian dynamics algorithm, which we construct using PMR, and discuss the divided difference calculation including cost analysis and various extensions. A discussion and some conclusions are given in Sec. IV.
II Permutation matrix representation of the time evolution operator
Before delving into the technical specifics of our algorithm, we first briefly present the key components of the PMR approach to expanding matrix exponentials in a series [12, 5, 6]. For concreteness we restrict our attention here to time-independent Hamiltonians and defer the discussion of how the method extends to the time-dependent case to the concluding section. The PMR expansion consists of several steps, the first of which is choosing a preferred basis, the ‘computational basis’, in which the Hamiltonian matrix is represented, and casting the Hamiltonian as a sum of a diagonal operator and an off-diagonal operator in that basis. Hereafter we denote the set of computational basis states as $\{|z\rangle\}$. Intuitively, the diagonal part of the Hamiltonian corresponds to a ‘classical’ Hamiltonian whereas its off-diagonal part governs the non-trivial ‘quantum’ dynamics of the system in the chosen basis. The off-diagonal component is further decomposed into a sum of products of diagonal and permutation operators [12]. This defines the PMR form of the Hamiltonian:
$H = D_0 + \sum_{i=1}^{M} D_i P_i \,,$   (1)
where the $D_i$'s are diagonal operators and the $P_i$'s are permutation operators, i.e., operators that permute the basis states. For example, for qubit systems, if one chooses the computational basis to be the eigenbasis of the Pauli-$Z$ operators, then the $P_i$ operators are strings of Pauli-$X$ operators. The decomposition Eq. (1) can be carried out efficiently for any physical Hamiltonian [12]. Next, consider the evolution of a quantum state under $H$ for duration $t$, which is then split into a sequence of repeated short-time evolution circuits, each evolving the state by a short time period whose value we determine later on in a manner designed to optimize resources (we discuss this in detail in Sec. III.3).
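To make the decomposition concrete, the following minimal sketch (in Python, using a toy three-qubit transverse-field Ising model with couplings and variable names of our own choosing, not taken from the paper) builds a Hamiltonian directly in the form of Eq. (1): a diagonal $D_0$ containing the $ZZ$ couplings plus off-diagonal terms $D_j P_j$ in which each permutation $P_j$ is a single bit flip.

```python
# A minimal numerical sketch of the PMR form of Eq. (1); the model, couplings
# and variable names are our own placeholders, not taken from the paper.
import numpy as np
from functools import reduce

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])
kron_all = lambda ops: reduce(np.kron, ops)

n, J, h = 3, 1.0, 0.7
# Diagonal part D_0: nearest-neighbour ZZ couplings (diagonal in the Z basis).
D0 = -J * sum(kron_all([Z if k in (j, j + 1) else I2 for k in range(n)])
              for j in range(n - 1))
# Off-diagonal part: each transverse-field term -h X_j is D_j P_j, with the
# permutation P_j a single bit flip (an X string) and D_j = -h * identity.
perms = [kron_all([X if k == j else I2 for k in range(n)]) for j in range(n)]
H = D0 + sum(-h * P for P in perms)

assert np.allclose(D0, np.diag(np.diag(D0)))                    # D_0 is diagonal
assert all(np.allclose(P @ P, np.eye(2 ** n)) for P in perms)   # each P_j permutes basis states
print(np.allclose(H, H.T.conj()))                               # the PMR form reproduces a Hermitian H
```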
The PMR decomposition of the Hamiltonian allows us to write the (short) time evolution operator in an off-diagonal series expansion:
(2) |
where in the last step we identify $P_0$ with the identity operator. After some algebra, this expansion (see Refs. [10, 12] for a complete and detailed derivation) may be expressed as [5]
(3) |
Here, the boldfaced index $\mathbf{i}_q$ is a tuple of indices $(i_1,\ldots,i_q)$, with each index ranging from $1$ to $M$, and $P_{\mathbf{i}_q}=P_{i_q}\cdots P_{i_1}$ is an ordered product of off-diagonal permutation operators. In addition, the operators accompanying the $P_{\mathbf{i}_q}$ are the diagonal operators
(4) |
where is the divided difference of the exponential function [13, 14] over the multi-set , where with , and we used the convention that . We provide additional details pertaining to divided differences in Appendix A. Note that the ’s in and in should actually be denoted by ; however, for conciseness we are using the abbreviations . Lastly, we have denoted the product of off-diagonal matrix elements, where
(5) |
can be considered the ‘hopping strength’ of with respect to , and we have defined the real-valued coefficients where .
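As a sanity check of this type of off-diagonal series expansion, the snippet below (our own notation and toy parameters) reconstructs the evolved state of a single-qubit Hamiltonian $H = D_0 + \Gamma X$ by summing, order by order, hopping strengths multiplied by divided-difference exponentials of the visited diagonal energies, and compares the result with exact matrix exponentiation. The divided differences are evaluated via Opitz's formula, $f[x_0,\ldots,x_q] = [f(J)]_{0,q}$ with $J$ the bidiagonal matrix of the inputs, which handles repeated inputs gracefully.

```python
# Numerical sanity check of an off-diagonal (PMR-type) series expansion for a
# toy single-qubit Hamiltonian; all symbols below are our own placeholders.
import numpy as np
from scipy.linalg import expm

def dd_exp(t, xs):
    """Divided difference of f(E) = exp(-i t E) over the (possibly repeated)
    inputs xs, via Opitz's formula f[x_0..x_q] = [f(J)]_{0,q}."""
    q = len(xs) - 1
    J = np.diag(np.asarray(xs, dtype=complex)) + np.diag(np.ones(q), 1)
    return expm(-1j * t * J)[0, q]

E = np.array([0.3, -1.1])            # diagonal energies E_z = <z|D0|z>
Gamma, t = 0.8, 0.45                 # hopping strength and evolution time
H = np.diag(E) + Gamma * np.array([[0, 1], [1, 0]])

# Sum over path lengths q: hopping strengths times divided-difference
# exponentials of the diagonal energies visited along the path.
psi = np.zeros(2, dtype=complex)
z0 = 0                               # start in basis state |0>
for q in range(0, 30):               # truncate the series at a large order
    path = [(z0 + j) % 2 for j in range(q + 1)]      # single permutation P_1 = X
    psi[path[-1]] += Gamma ** q * dd_exp(t, [E[z] for z in path])

exact = expm(-1j * t * H) @ np.eye(2)[z0]
print(np.max(np.abs(psi - exact)))   # essentially machine precision
```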
In what follows, we use the expanded form of given in Eq. (3) to show that the time evolution operator can be formulated as a linear combination of simple-to-execute unitary operators which in turn allows us to utilize the LCU technique to approximate the time-evolution operator.
The main technical contribution of this paper is devising an algorithm that implements the LCU with near-optimal scaling and at the same time utilizes only simple-to-implement unitary operators, avoiding complicated classical computations on quantum registers or involved select unitaries. We present our algorithm next.
III The simulation algorithm
III.1 Divided difference exponentials as linear combinations of phases
To begin with, we present an efficient method for calculating in superposition the divided differences appearing in the operators in a manner that does not require any classical (i.e., quantum reversible) arithmetic calculations on additional ancillary registers.
First, let us focus on the divided differences of the exponential function for a multi-set of inputs and a real-valued parameter . By exploiting the Leibniz rule for divided differences [13, 14], which states that
$(f \cdot g)[x_0,\ldots,x_q] = \sum_{j=0}^{q} f[x_0,\ldots,x_j]\, g[x_j,\ldots,x_q]$   (6)
for any two functions and , we can write
(7) |
for any . A successive application of the rule yields
(8)
for any integer . We shall refer to as the divided differences approximation subdivision constant in what follows. For convenience, we shall choose it to be a power of two, namely, , for a nonnegative integer to which we will refer as the divided differences approximation depth.
As a next step, we utilize the observation that for every finite $q$ there is a small enough $t$ such that the divided difference is well approximated by [16]:
$(e^{-itx})[x_0,\ldots,x_q] \approx \frac{(-it)^q}{q!}\, e^{-it\bar{x}}\,, \qquad \bar{x} = \frac{1}{q+1}\sum_{j=0}^{q} x_j\,,$   (9)
(we discuss the quality of this approximation and its effect on the algorithm precision later on). We can thus use Eq. (9) to approximate the divided differences in Eq. (8) to obtain
(10) |
where we used the notation
(11) |
with (and defining ). The choice for which ensures a near-optimal scaling of resources in terms of simulation error will be discussed in Sec. III.3.
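The snippet below (with placeholder inputs and our own notation, in particular the symbol $R$ for the number of subdivisions) illustrates the two ingredients just introduced: the raw approximation of Eq. (9) and the improvement obtained, in the spirit of Eq. (10), by first applying the Leibniz-rule subdivision into factors of duration $t/R$ and only then approximating each factor. The error with respect to the exact divided difference shrinks as the subdivision is refined.

```python
# Numerical illustration (our own notation and toy inputs) of the divided-
# difference approximation and of the Leibniz-rule subdivision that improves it.
import numpy as np
from math import factorial
from itertools import combinations_with_replacement
from scipy.linalg import expm

def dd_exp(t, xs):
    """Exact divided difference of f(E) = exp(-i t E) over inputs xs (Opitz formula)."""
    q = len(xs) - 1
    J = np.diag(np.asarray(xs, dtype=complex)) + np.diag(np.ones(q), 1)
    return expm(-1j * t * J)[0, q]

def dd_exp_approx(t, xs):
    """Small-t approximation of Eq. (9): (-i t)^q / q! times a mean-field phase."""
    q = len(xs) - 1
    return (-1j * t) ** q / factorial(q) * np.exp(-1j * t * np.mean(xs))

def dd_exp_subdivided(t, xs, R):
    """Leibniz-rule subdivision into R factors of duration t/R, each approximated."""
    q = len(xs) - 1
    total = 0.0 + 0.0j
    for cut in combinations_with_replacement(range(q + 1), R - 1):
        js = (0,) + cut + (q,)                       # 0 <= j_1 <= ... <= j_{R-1} <= q
        total += np.prod([dd_exp_approx(t / R, xs[js[r]:js[r + 1] + 1])
                          for r in range(R)])
    return total

xs, t = [0.3, -1.2, 0.7, 0.1], 0.8                   # placeholder inputs and time
exact = dd_exp(t, xs)
for R in (1, 2, 4, 8):
    print(R, abs(dd_exp_subdivided(t, xs, R) - exact))   # error shrinks as R grows
```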
Next, we write the approximation as a sum of simple-to-calculate phases. By a change of variables, Eq. (III.1) can be rewritten as:
(12)
where in the second equation is the multinomial coefficient, and we defined
(13)
with the convention that for , the sum . For readability, we have abbreviated by with being the set of all integer tuples that obey , and .
Moreover, the product can be recast as a product of phases each of which proportional to one of the original inputs, namely
(14) |
where the coefficients are given by
(15) |
with . This simplification is achieved by noticing that contributes an additive factor to if and only if it contains in its average, in which case the factor is . In addition, will contain if and only if is in the range .
Furthermore, one can show that the may be expressed in terms of the minimum index, which we denote , obeying and the maximum index, denoted for which , namely,
(16)
Using and Eq. (14) we can now write the approximation Eq. (12) as a sum of phases
(17) |
where we replaced the sum over with a multi-index. The multi-index is a tuple of indices , with , each ranging from to . The relation between the former indices and the current indices is such that is the number of indices in whose value is (since there are indices, indeed can take on values in the range ). Since the indices represent ordered occurrences, we remove the multinomial coefficient from the sum. We also note that can be directly calculated from the multi-index indices, since it is the sum total of all the index values whose value is less than or equal to .
Next we describe a quantum circuit designed to approximate as given in Eq. (3) based on the approximation derived above.
III.2 The LCU implementation
Having obtained the approximation Eq. (17), we insert it now into Eq. (4), identifying and the inputs with diagonal energies . Equation (3) becomes
(18)
At this point we assume for ease of presentation that the matrix elements are real-valued and independent of (i.e., that the matrices are proportional to the identity matrix), as this is the case for many physical systems. We will discuss the necessary adjustments to the algorithm (which do not lead to added complexity) in the case where these coefficients are -dependent later on. In this simpler case we have , for some and , so , and the evolution operator becomes
(19) |
where
(20) |
are unitary operators. While the dependence of on the multi-index is evident, we note that the dependence is (only) via the coefficients. Truncating the infinite series in Eq. (19) at some maximal order and rearranging terms, we find
(21) |
where we have denoted the ‘off-diagonal norm’ of the Hamiltonian by . The choice for which ensures a near-optimal scaling of resources in terms of simulation error will be discussed in Sec. III.3.
Since Eq. (21) has the form of a linear combination of unitary operators, we can invoke the LCU technique to execute it on a quantum computer. The technique consists of two main components: (i) state preparation on ancilla qubits and (ii) unitary circuits controlled by the ancilla qubits. We next discuss a simple-to-implement protocol for each of the two in turn.
III.2.1 The state preparation subroutine
The ancilla state we prepare consists of three quantum (multi-)registers: (i) A -qubit register, (ii) registers with qubits each, and (iii) a third set of registers consisting of qubits each. Denoting which is the unary encoding of the order , where is a shorthand for the qubit state , that is a unary encoding of in which the -th qubit is set to while the rest of the qubits are set to , and – a shorthand for quantum registers, each of dimension , the ancilla state for the LCU reads:
(22) |
with the normalization constant
(23) |
We prepare the state in three stages. In the first, we use the first -qubit register to prepare
(24) |
at the cost of controlled rotations [5]. As a next step, we use the second set of registers to prepare
(25) |
which can also be done at the cost of controlled rotations [5]. Note that the right-hand side of the last equation has a tensor product structure over registers: . Similarly, the third and final stage consists of using the last set of -qubit registers to prepare
(26) |
which can be carried out using controlled Hadamard gates on qubits. Therefore, overall the state preparation subroutine can be completed using controlled rotations.
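As an illustration of the kind of controlled-rotation ladder used in the first stage, the classical sketch below computes rotation angles that load a set of placeholder weights onto unary-encoded states $|1\cdots 1\,0\cdots 0\rangle$ and verifies the resulting amplitudes; the actual weights of Eq. (22) would replace the placeholder array, and the function names are ours.

```python
# Classical sketch of a unary-ladder state preparation: a chain of (controlled)
# R_y rotations places amplitude sqrt(w_q) on the unary state with q leading ones.
# The target weights w below are placeholders, not the coefficients of Eq. (22).
import numpy as np

def unary_ladder_angles(weights):
    """Rotation angles theta_1..theta_Q such that the amplitude of the unary
    state with q leading ones equals sqrt(weights[q])."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    angles, remaining = [], 1.0
    for q in range(len(w) - 1):
        # sin(theta/2)^2 = probability of continuing past level q
        p_continue = 1.0 - w[q] / remaining
        angles.append(2.0 * np.arcsin(np.sqrt(max(p_continue, 0.0))))
        remaining *= p_continue
    return angles

def unary_amplitudes(angles):
    """Amplitudes (a_0, ..., a_Q) produced by the ladder."""
    amps, prod = [], 1.0
    for theta in angles:
        amps.append(prod * np.cos(theta / 2))
        prod *= np.sin(theta / 2)
    amps.append(prod)
    return np.array(amps)

w = np.array([0.5, 0.25, 0.15, 0.1])                  # placeholder weights for orders q = 0..3
print(np.allclose(unary_amplitudes(unary_ladder_angles(w)) ** 2, w))   # True
```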
III.2.2 The controlled unitaries
Figure 1: Circuit for a single step $s$ of the select unitary (the raw qcircuit source is not reproduced here). The unary counter register $|s\rangle$ together with the $|i_q\rangle$ register controls the permutation $P_{i_s}$ applied to the system register $|z\rangle$; the $|k_q\rangle$ register controls $\pm\alpha_s$ rotations on an ancilla initialized in $|0\rangle$, which in turn controls the diagonal phase $i e^{-i\delta\alpha_s D_0}$ on $|z\rangle$; finally, a shift gate ($\ll$) advances the counter. Repeated application of this block yields $U_{(\mathbf{i}_q,\mathbf{k}_q)}|z\rangle$.
The second ingredient of the LCU protocol consists of designing the select-unitary operation:
$U_{\mathrm{select}} = \sum_{\mathbf{i}_q,\mathbf{k}} |\mathbf{i}_q,\mathbf{k}\rangle\langle\mathbf{i}_q,\mathbf{k}| \otimes U_{(\mathbf{i}_q,\mathbf{k})}\,,$   (27)
where the $U_{(\mathbf{i}_q,\mathbf{k})}$ are given in Eq. (20). We now show that the select-unitary operation which executes Eq. (27) can be implemented using interleaved applications of CNOTs and controlled-phase operations. To that end, we define the unitary operator:
(28) |
where we have now spelled out the dependence of on . Let us next consider the action of the ordered product
(29) |
on a state . We find that
(30) |
Hence, the ordered product of Eq. (29) consists of an interleaved application of CNOT gates (since the $P_i$ operators are Pauli-$X$ strings) and a phase-kickback circuit that implements [up to a simple factor of ]. A circuit for it is sketched out in Fig. 1. For simplicity, the circuit uses a -qubit ancilla that encodes in unary form, and a shift gate that, starting with , shifts the bit to the left with every application. By including this counter register we are able to express the select unitary as repeated applications of the circuit shown in Fig. 1.
In addition, since the diagonal $D_0$ is written as a linear combination of Pauli-$Z$ strings, i.e., as a weighted sum of commuting $Z$-string operators, the diagonal unitary is essentially a product of individual $Z$-string phase rotations, each of which is trivial to implement once the corresponding coefficient is provided. Next, we discuss the implementation of the calculation.
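The factorization of the diagonal phase is easy to verify numerically: since all Pauli-$Z$ strings commute, the exponential of their weighted sum equals the product of the individual exponentials. The toy $D_0$, coefficients, and rotation angle below are our own placeholders.

```python
# Quick check that e^{-i phi D_0} factorizes into a product of independent
# Pauli-Z-string rotations (all Z-strings commute); the toy D_0 is our own.
import numpy as np
from scipy.linalg import expm
from functools import reduce

I2, Z = np.eye(2), np.diag([1.0, -1.0])
def zstring(bits):                         # e.g. (1, 0, 1) -> Z (x) I (x) Z
    return reduce(np.kron, [Z if b else I2 for b in bits])

terms = {(1, 1, 0): 0.9, (0, 1, 1): -0.4, (1, 0, 1): 0.25}   # placeholder coefficients
D0 = sum(c * zstring(b) for b, c in terms.items())
phi = 0.37

product = reduce(np.matmul, [expm(-1j * phi * c * zstring(b)) for b, c in terms.items()])
print(np.allclose(product, expm(-1j * phi * D0)))            # True
```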
III.2.3 Calculating the coefficients
First, we note that given a quantum register that stores an integer , one can implement as well as using calls to a single-qubit rotation gate [17]. Next, we observe that consists of three terms, each of which is either an integer or the reciprocal thereof, cf. Eq. (16). Therefore, by calculating and storing those integers in a quantum register, one may implement efficiently.
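A classical sketch of the first remark, namely that a phase proportional to the integer $m$ stored in a register is a product of single-qubit phase gates with binary-weighted angles, is given below (the register size, the constant $c$, and the big-endian bit ordering are our own choices).

```python
# Sketch of the binary-weighted-phase trick: a phase e^{i c m} on a register
# holding the integer m is a product of one single-qubit phase gate per bit.
import numpy as np
from functools import reduce

def phase_on_bit(angle, bit, n):
    """Diagonal gate diag(1, e^{i angle}) acting on qubit `bit` of n qubits."""
    g = np.diag([1.0, np.exp(1j * angle)])
    I2 = np.eye(2)
    return reduce(np.kron, [g if k == bit else I2 for k in range(n)])

n, c = 3, 0.21
# qubit 0 is taken to be the most significant bit (a convention choice)
U = reduce(np.matmul, [phase_on_bit(c * 2 ** (n - 1 - b), b, n) for b in range(n)])
target = np.diag([np.exp(1j * c * m) for m in range(2 ** n)])
print(np.allclose(U, target))    # True
```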
Lastly, finding and as per Eq. (16) (each of which requires a -qubit register) may be achieved with operations using a reversible binary search (a blueprint for such a circuit is given in Appendix C), provided one has access to a cost-function circuit that calculates for a given value.
A circuit for calculating from can be constructed by implementing an integer comparison circuit [18]
(31) |
i.e., a circuit that increments the third register if and only if . Here the and registers have qubits and has qubits. Executing the above sub-routine times sequentially with the functioning as the first register in each iteration, we obtain a circuit that implements
(32) |
Implementation of a binary search circuit using as the cost function, allows us to construct a circuit that produces
(33) |
The last two registers can in turn be used to calculate the relevant integers that appear in Eq. (16). The above can be done in operations [18].
We mention in passing that another, less favorably scaling, method for calculating the coefficients is available which is nonetheless simpler to implement, as it does not involve the calculation of the extremal indices of Eq. (16). For this alternative, we write
(34) |
where denotes a bit that is set to one if and is zero otherwise. This requires an integer comparison circuit that checks for the conditions and .
III.3 Algorithm cost
In the previous section we worked out the circuit which approximates the short-time evolution . In this section we provide the resource scaling analysis that ensures that is -close to in spectral distance, , where is the overall error over the simulation time , and . From the subadditivity property of the spectral norm it is ensured that with repetition of , the overall simulation is -close to the exact dynamics induced by the exact .
Our algorithm involves two basic approximations to the exact dynamics, , determined by the divided differences approximation constant and the truncation order , respectively. Therefore, to ensure that we require that both and are at most .
First we note that the LCU formalism [4] dictates that the ancilla state preparation normalization factor should be such that to ensure that . According to Eq. (23), fixing (equivalently, ) and , in our algorithm ensures , as required.
Second, based on Eq. (9), we show in Appendix B that
(35) |
where is a bound on for all , i.e., a bound on the difference between any two consecutive diagonal energies, from which it follows that
(36)
Hence, choosing ensures that . Note that the gate and qubit resources of our algorithm scale with , that is, logarithmically in .
Finally, with the above choices for and , we recall that the number of ancilla qubits needed and the number of gates of a single LCU execution are , as discussed above. Since both and scale as and , respectively, we find that the algorithm does indeed have near-optimal dependence on precision.
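For a rough feel of the scaling just quoted, the toy calculation below finds the smallest truncation order at which the tail of the exponential series $\sum_q x^q/q!$ drops below a target $\epsilon$; this is only a generic illustration of the $O(\log(1/\epsilon)/\log\log(1/\epsilon))$ behavior, with our own placeholder value of $x$, and not the paper's precise truncation criterion.

```python
# Generic illustration of how a series-truncation order grows with precision.
import math

def truncation_order(x, eps):
    """Smallest Q such that the tail sum_{q>Q} x^q/q! is at most eps."""
    partial, term, q = 0.0, 1.0, 0
    while math.exp(x) - (partial + term) > eps:
        partial += term
        term *= x / (q + 1)
        q += 1
    return q

x = 0.5                                   # stand-in for off-diagonal norm times step size
for eps in (1e-3, 1e-6, 1e-9, 1e-12):
    print(eps, truncation_order(x, eps))  # grows only slowly as eps shrinks
```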
III.4 The case of -dependent
In the previous section, we assumed for simplicity that whereas in the general case all that is guaranteed is that which follows from . The ratio may always be expressed as the average of two phases:
(37) |
Therefore in this case, we can simply replace of Eq. (28) with
(38) |
i.e., we can express it as an average of two unitaries. The simulation circuit in this case can be implemented in much the same manner as before, where now the factor can be encoded directly as a phase if the off-diagonal matrix elements are given in polar coordinates (and otherwise via a phase-kickback circuit acting on quantum registers that encode [5]). Additionally, the LCU state preparation should be modified to include additional qubits, each prepared in the state , alongside a change in the first stage of the state preparation routine to account for the extra factors of . These modifications do not alter the overall complexity of the algorithm.
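The two-phase rewriting used above follows from the elementary identity that any complex number $w$ with $|w|\le 1$ is the average of two pure phases, $w = \tfrac{1}{2}\big(e^{i(\phi+\theta)} + e^{i(\phi-\theta)}\big)$ with $\phi=\arg w$ and $\theta=\arccos|w|$; the quick numerical check below uses our own variable names.

```python
# Check of the two-phase identity: any w with |w| <= 1 is an average of two phases.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=5) + 1j * rng.normal(size=5)
w /= np.maximum(np.abs(w), 1.0)                      # force |w| <= 1
phi, theta = np.angle(w), np.arccos(np.abs(w))
print(np.allclose(0.5 * (np.exp(1j * (phi + theta)) + np.exp(1j * (phi - theta))), w))
```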
IV Summary and conclusions
We devised a simple-to-implement quantum algorithm designed to simulate the dynamics of general Hamiltonians on a quantum computer. While straightforward to execute, we have shown that the proposed algorithm retains near-optimal dependence on the target precision.
Our algorithm has numerous properties that we argue make it attractive for implementation on resource-limited near-future quantum computing devices, which are characterized by being small and noisy and on which one is not afforded the luxury of fault tolerance and error correction schemes. These are: (i) Neither the gate cost nor the qubit cost of the algorithm depends on the norm of the diagonal component of the Hamiltonian. While for most simulation algorithms the number of repetitions of the short-time evolution unitary grows linearly with the diagonal norm, in the present algorithm the dependence on it enters only via trivial phases. (ii) In addition, the present algorithm offers a compact LCU decomposition of the Hamiltonian in which the terms are grouped according to which bits they flip. This is the PMR decomposition, which is in general considerably more compact than the customary breakup of the Hamiltonian into a linear combination of Pauli strings [5]. As such, the number of ancilla qubits required for the LCU state preparation subroutine is expected to be, in general, smaller by orders of magnitude compared to standard decompositions. (iii) Last, we note again the simplicity of our proposed algorithm, which prescribes only simple-to-implement and for the most part straightforward subroutines. This property is especially important in the NISQ era, where quantum computing platforms are small and noisy and where any unnecessary qubit or gate added to the circuit may be decisive in terms of performance.
Furthermore, the PMR method presented above also extends naturally to the case of time-dependent Hamiltonians, with the main modification being that in the time-dependent case the divided-difference inputs no longer consist only of diagonal energies but must rather be augmented with the frequencies of the time dependence [6], namely,
(39) |
where the are (in general complex-valued) frequencies of the now time-dependent diagonal operators in the PMR decomposition of the Hamiltonian. As this is the only difference between the time-independent and time-dependent case, the divided-difference approximation introduced here similarly applies to time-dependent simulations as well. A major advantage of the PMR formulation in the time-dependent case is that the cost of the present algorithm is linear in the evolution time and does not depend on frequencies (see Ref. [6]). This property too translates to potentially critical savings in gate and qubit costs.
In light of the above, we hope that the algorithm we proposed here will prove to be a useful tool in the coming years, as near-term quantum computing devices become more widely available and allow their users to generate credible simulation results for scientifically relevant problems. In a future study we hope to report on a performance comparison of this algorithm against existing schemes when tasked with simulating a scientifically meaningful model and executed on a NISQ device. An apples-to-apples comparison against existing algorithms on specific applications will highlight the advantages of our approach.
Acknowledgements.
We thank Arman Babakhani and Lev Barash for useful suggestions. IH acknowledges support by the Office of Advanced Scientific Computing Research of the U.S. Department of Energy under Contract No. DE-SC0024389. In addition, this research was developed with funding from the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR00112330014. The views, opinions, and/or findings expressed are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.
References
- Lloyd [1996] S. Lloyd, Science 273, 1073 (1996).
- Childs et al. [2021] A. M. Childs, Y. Su, M. C. Tran, N. Wiebe, and S. Zhu, Phys. Rev. X 11, 011020 (2021).
- Low et al. [2016] G. H. Low, T. J. Yoder, and I. L. Chuang, Phys. Rev. X 6, 041067 (2016).
- Berry et al. [2015] D. W. Berry, A. M. Childs, R. Cleve, R. Kothari, and R. D. Somma, Phys. Rev. Lett. 114, 090502 (2015).
- Kalev and Hen [2021] A. Kalev and I. Hen, Quantum 5, 426 (2021).
- Chen et al. [2021] Y.-H. Chen, A. Kalev, and I. Hen, PRX Quantum 2, 030342 (2021).
- An et al. [2023] D. An, J.-P. Liu, and L. Lin, Phys. Rev. Lett. 131, 150603 (2023).
- Chakraborty [2024] S. Chakraborty, Quantum 8, 1496 (2024).
- Berry et al. [2014] D. W. Berry, A. M. Childs, R. Cleve, R. Kothari, and R. D. Somma, in Proceedings of the Forty-Sixth Annual ACM Symposium on Theory of Computing, STOC ’14 (Association for Computing Machinery, New York, NY, USA, 2014) p. 283–292.
- Albash et al. [2017] T. Albash, G. Wagenbreth, and I. Hen, Phys. Rev. E 96, 063309 (2017).
- Hen [2018] I. Hen, Journal of Statistical Mechanics: Theory and Experiment 2018, 053102 (2018).
- Gupta et al. [2020] L. Gupta, T. Albash, and I. Hen, Journal of Statistical Mechanics: Theory and Experiment 2020, 073105 (2020).
- Whittaker and Robinson [1940] E. T. Whittaker and G. Robinson, The Calculus of Observations: A Treatise on Numerical Mathematics, 3rd Edition. (Blackie and Sons Limited, London, 1940).
- de Boor [2005] C. de Boor, Surveys in Approximation Theory 1, 46 (2005).
- Hao Low and Wiebe [2018] G. Hao Low and N. Wiebe, ArXiv e-prints , arXiv:1805.00675 (2018), arXiv:1805.00675 [quant-ph] .
- Kalev and Hen [2024] A. Kalev and I. Hen, Feynman path integrals for discrete-variable systems: Walks on hamiltonian graphs (2024), arXiv:2407.11231 [quant-ph] .
- Nielsen and Chuang [2011] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information: 10th Anniversary Edition, 10th ed. (Cambridge University Press, USA, 2011).
- Sanders et al. [2019] Y. R. Sanders, G. H. Low, A. Scherer, and D. W. Berry, Phys. Rev. Lett. 122, 020502 (2019).
Appendix A Notes on divided differences
We provide below a brief summary of the concept of divided differences, which is a recursive division process. This method is typically encountered when calculating the coefficients in the interpolation polynomial in the Newton form.
The divided differences [13, 14] of a function $f$ are defined as
$f[x_0,\ldots,x_q] \equiv \sum_{j=0}^{q} \frac{f(x_j)}{\prod_{k\neq j}(x_j - x_k)}$   (40)
with respect to the list of real-valued input variables $[x_0,\ldots,x_q]$. The above expression is ill-defined if some of the inputs have repeated values, in which case one must resort to the use of limits. For instance, in the case where $x_0 = x_1 = \cdots = x_q \equiv x$, the definition of divided differences reduces to:
$f[x_0,\ldots,x_q] = \frac{f^{(q)}(x)}{q!}\,,$   (41)
where $f^{(q)}$ stands for the $q$-th derivative of $f$. Divided differences can alternatively be defined via the recursion relations
$f[x_0,\ldots,x_q] = \frac{f[x_1,\ldots,x_q] - f[x_0,\ldots,x_{q-1}]}{x_q - x_0}\,,$   (42)
with and the initial conditions
$f[x_j] = f(x_j)\,, \qquad j = 0,\ldots,q\,.$   (43)
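For reference, a direct (numerically naive) implementation of the recursion of Eqs. (42)-(43) is sketched below; it is valid for distinct inputs, while repeated inputs would require the limiting form of Eq. (41). The function names are ours.

```python
# Divided differences via the standard recursive (Newton) table, Eqs. (42)-(43).
def divided_difference(f, xs):
    """f[x_0, ..., x_q] for distinct inputs xs."""
    table = [f(x) for x in xs]                        # initial conditions f[x_j] = f(x_j)
    for level in range(1, len(xs)):
        table = [(table[j + 1] - table[j]) / (xs[j + level] - xs[j])
                 for j in range(len(xs) - level)]
    return table[0]

import math
print(divided_difference(math.exp, [0.0, 0.1, 0.25]))   # exp[0, 0.1, 0.25]
```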
A function of divided differences can be defined in terms of its Taylor expansion
(44) |
Moreover, it is easy to verify that
(45) |
One may therefore write:
(46) |
The above expression can be further simplified to
(47) |
as was asserted in the main text.
Appendix B Derivation of the divided-difference approximation bound
Here, we bound the absolute value of the difference between the divided difference exponential and its approximation
(48) |
where . For what follows, we shall assume that for some which is of order unity, i.e., an $O(1)$ quantity that does not scale with system size, evolution time or Hamiltonian norm. This will be the case for all physical Hamiltonians, as the corresponding basis states differ by a single local permutation operation. Thus, for any physical Hamiltonian the change corresponds to a local change in the basis-state energy.
Next, let us use the fact [16] that the divided difference approximation is worst when the standard deviation of its inputs is maximal, namely
(49) |
where is the variance of the inputs . Owing to this observation, we find that the bound
(50) |
is maximized for the inputs (i.e., this choice maximizes the variance of the inputs). Equation (50) can thus be bounded by the quantity
(51) |
a quantity that we can calculate exactly, as both the divided difference and its approximation can be calculated directly. A calculation of reveals:
(52) |
whereas for , after plugging in the inputs and simplifying, we obtain
(53) |
which gives for their ratio
(54) |
Putting it all together, we find that
(55) |
Appendix C A quantum circuit for reversible binary search
Given an oracle where is monotonically non-decreasing in and (here K), the task of a reversible binary search circuit is to find the smallest index such that for a given input , namely, a circuit that yields . Here, the second register has qubits.
To construct the desired circuit, we utilize an integer comparison oracle where iff (see Ref. [18]). We implement by calls to , such that each call fixes a single bit of , going from left (most significant) to right (least significant).
Starting with the leftmost bit of , which we call , and moving to the right, we have . We note that the input to , namely , is the value of the output register with the leftmost bit set to . The next bit is similarly set by , noting that now the input to is the value of the output register with the second leftmost bit set to 1. This sets . We continue similarly to set the rest of the bits. After iterations of we find the output register is the desired index .
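A classical sketch of the bit-fixing search just described is given below (the function and variable names are ours, and we take the search condition to be "smallest $j$ with $f(j)\ge y$" for a monotonically non-decreasing $f$); a reversible/quantum version replaces each comparison with the integer-comparison oracle of Ref. [18] acting on the output register.

```python
# Classical sketch of a bit-by-bit binary search over a monotone non-decreasing f:
# fix the bits of the answer from most to least significant, one oracle call each.
def binary_search_smallest(f, y, n_bits):
    """Smallest j in [0, 2**n_bits) with f(j) >= y, assuming such a j exists."""
    j = 0
    for b in reversed(range(n_bits)):        # most significant bit first
        trial = j | (1 << b)                 # tentatively set this bit
        if f(trial - 1) < y:                 # everything below 'trial' is still < y
            j = trial                        # keep the bit set
    return j

vals = [0, 0, 1, 3, 3, 7, 9, 12]             # monotone example data
print(binary_search_smallest(lambda j: vals[j], 3, 3))   # -> 3, first index with f(j) >= 3
```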