This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Learning the structure of any Hamiltonian
from minimal assumptions

Andrew Zhao
[email protected]
Sandia National Laboratories
(October 28, 2024)
Abstract

We study the problem of learning an unknown quantum many-body Hamiltonian HH from black-box queries to its time evolution eiHte^{-\mathrm{i}Ht}. Prior proposals for solving this task either impose some assumptions on HH, such as its interaction structure or locality, or otherwise use an exponential amount of computational postprocessing. In this paper, we present efficient algorithms to learn any nn-qubit Hamiltonian, assuming only a bound on the number of Hamiltonian terms, mpoly(n)m\leq\mathrm{poly}(n). Our algorithms do not need to know the terms in advance, nor are they restricted to local interactions. We consider two models of control over the time evolution: the first has access to time reversal (t<0t<0), enabling an algorithm that outputs an ϵ\epsilon-accurate classical description of HH after querying its dynamics for a total of O~(m/ϵ)\widetilde{O}(m/\epsilon) evolution time. The second access model is more conventional, allowing only forward-time evolutions; our algorithm requires O~(H3/ϵ4)\widetilde{O}(\|H\|^{3}/\epsilon^{4}) evolution time in this setting. Central to our results is the recently introduced concept of a pseudo-Choi state of HH. We extend the utility of this learning resource by showing how to use it to learn the Fourier spectrum of HH, how to achieve nearly Heisenberg-limited scaling with it, and how to prepare it even under our more restricted access models.

1 Introduction

Identifying an unknown Hamiltonian is a central task in quantum physics. As the Hamiltonian fully characterizes the interactions within any closed system, learning it provides a tractable description of a potentially highly complex quantum system. This is in contrast to the more general task of quantum process tomography, which learns the dynamical map itself but is exponentially expensive in the size of the system, even when the system is closed [BKD14, BMQ22, HKOT23]. Applications of Hamiltonian learning include quantum metrology and sensing [GLM11, DRC17], quantum device engineering [Shu+14, IBFBP20], and certifying quantum computations [WGFC14]. By developing more accurate and efficient methods for understanding quantum systems, we can enable the design and control of such systems with finer precision.

Historically, Hamiltonian learning draws from the subject of quantum metrology [Ram50, Cav81, BIWH96], extending to heuristic protocols for learning many-body models with familiar types of interactions [GFWC12, WGFC14a, HBCP15, Wan+17, BAL19]. Recently, there has been a surge of interest in learning Hamiltonians of large many-body systems with more complicated interactions and rigorous theoretical guarantees [AAKS21, HKT24, GCC24, HTFS23, Car24, DOS23, BLMT24, BLMT24a]. This body of work substantially broadens the applicability of this task, especially as experimental capabilities in engineering large-scale quantum devices continues to improve. From a theoretical perspective, such works explore fundamental questions in the learnability of quantum systems, such as: what classes of Hamiltonians can be learned? How efficiently and accurately can this task be carried out? And what level of control or access to the system suffices? This paper aims to shed some light on these foundational questions.

Broadly speaking, an unknown Hamiltonian is accessed either through its static or dynamical behavior. In the former, we are tasked with deducing the Hamiltonian when the system is at equilibrium [Alh23]. This is typically given through its Gibbs state at some temperature [AAKS21, HKT24, GCC24, BLMT24], although learning from eigenstates has also been considered [QR19]. In the latter model, we are given black-box access to the time evolution operator eiHte^{-\mathrm{i}Ht} of the unknown Hamiltonian HH, for some controllable amount of time tt. Again, we are tasked to output a classical description of HH, this time given the ability to prepare some initial probe states, evolving them under the dynamics of HH, and measuring in some bases. Both versions of Hamiltonian learning have different merits and challenges, and are appropriate depending on the experimental scenario at hand. For example, the static model is more experimentally friendly, arising in many natural settings where one has limited control of the system. On the other hand, dynamical access requires finer experimental control but can serve as a more powerful learning resource [HKT24, HTFS23]. Learning from the time evolution operator also has close connections to applications in quantum simulation and algorithms. For the remainder of this paper, we will focus exclusively on the dynamical paradigm.

In this context, the primary cost of a Hamiltonian learning algorithm is the total evolution time, ttotalt_{\mathrm{total}}. This is the total amount of time under which we evolve by eiHte^{-\mathrm{i}Ht}. We can understand this cost in relation to the fractional- or continuous-query model [BCCKS14], wherein queries to the time evolution operator cost less for smaller tt. Note that this is part of, but not the same as, the total time complexity of the algorithm, which includes the cost of additional quantum and classical computation (before, during, and after querying the system). In this work, we will not closely inspect the details of the computational complexity; we will only be concerned whether an algorithm is computationally efficient (polynomially scaling in all parameters) or not. Other figures of merit include the time resolution tmint_{\mathrm{min}}, which dictates how finely an experimenter must be able to control the time evolution; NexpN_{\mathrm{exp}}, the total number of prepare–evolve–measure experiments to run; and nancn_{\mathrm{anc}}, the number of ancilla qubits used either to perform supporting quantum computation/control or to coherently store queries in quantum memory.

Most Hamiltonian learning results operate under the assumption that the structure of the unknown Hamiltonian is fully or at least partially known. The structure of a Hamiltonian H=a=1mλaEaH=\sum_{a=1}^{m}\lambda_{a}E_{a} is the set of terms EaE_{a} for which λa0\lambda_{a}\neq 0. For some algorithms, the learning guarantees can be lost in the worst case when HH has even a single unidentified term [HKT24, HTFS23]. For others, the algorithm may succeed at learning the parameters of a partially known structure, but there is no guarantee for efficiently learning the unidentified terms [SLP11, BAL19, ZYLB21]. To address this deficiency, Bakshi, Liu, Moitra, and Tang [BLMT24a] formalized the problem of Hamiltonian structure learning, and they provided an efficient algorithm to solve it. Specifically, when HH is a constant-local nn-qubit Hamiltonian (with no restriction on the range of its local interactions) they showed that it is possible to identify all local terms EaE_{a} with |λa|>ϵ|\lambda_{a}|>\epsilon, and furthermore estimate the values of all such λa\lambda_{a} to additive error ϵ\epsilon. In terms of evolution time, their algorithm costs ttotal=𝒪(rlog(n)/ϵ)t_{\mathrm{total}}=\mathcal{O}(r\log(n)/\epsilon), where rr is an effective sparsity parameter of the qubits’ interaction graph.

With nearly optimal scaling, this breakthrough result essentially completely solves the problem for local spin Hamiltonians. However, it remained an open question as to how one performs structure learning for even broader classes of Hamiltonians. For example, suppose that we assume HH is kk-local, but there are in fact a few kk^{\prime}-local terms, where k>kk^{\prime}>k. This can occur, for instance, by (undesired) spin-squeezing effects [KCM22]. In this setting, the algorithm of [BLMT24a] can learn all the kk-local terms, but not the kk^{\prime}-local terms. Furthermore, even if one were given the value of kk^{\prime}, if say k=ω(1)k^{\prime}=\omega(1) then the complexity of their algorithm would be superpolynomial.

More generally, we pose the scenario where HH is completely arbitrary, with no locality restriction on its terms. Even if mpoly(n)m\leq\operatorname{poly}(n), it is unclear how to identify these terms from a pool of 2𝒪(n)2^{\mathcal{O}(n)} candidate terms within polynomial time. [BLMT24a] refers to this problem as the “sparse, nonlocal” setting and suggests that their algorithm could be applied here as well; however it is unclear how to avoid an exponential amount of computation using either their Goldreich–Levin-like algorithm [GL89] or general shadow tomography ideas [Aar20, BO21]. To our knowledge, there are only a handful of proposals in the literature which can operate in this setting that at least avoid an exponential number of experiments [YSHY23, Car24, CW23]. However, the algorithms of Caro [Car24] and Castaneda and Wiebe [CW23] would use an exponential amount of classical computation to solve this task, essentially brute-force checking over all possible terms. Meanwhile, the analysis by Yu, Sun, Han, and Yuan [YSHY23] assumes that the mm terms are distributed uniformly among the set of Pauli operators, and therefore only guarantees that their algorithm succeeds with high probability (when 1/m1/m is sufficiently small) over the draw of HH from this ensemble of Hamiltonians. Their numerics suggest good heuristic performance but does not guarantee the ability to structure-learn any given sparse, nonlocal Hamiltonian in the worst case.

We therefore ask: can efficient structure learning be accomplished in this setting, and if so, what is the minimal amount of prior information required? Because we require an efficient algorithm, we only consider Hamiltonians for which mpoly(n)m\leq\operatorname{poly}(n), as otherwise merely writing down the description of HH is intractable. As an example of prior information, consider again the algorithm of [BLMT24]. Implicit in their problem input is a bound on the effective sparsity and a “local one-norm” of HH (i.e., the sum of strengths of all terms involving any given qubit). Bounds on these quantities are necessary in order to choose appropriate values for tmint_{\mathrm{min}} and NexpN_{\mathrm{exp}}. On the other hand, this aspect is trivial in non-structure-learning settings because knowing the structure is already more than sufficient to establish such bounds. We answer both questions by describing structure learning algorithms which essentially only require a bound on mm to run correctly. As long as mpoly(n)m\leq\operatorname{poly}(n), our algorithms are efficient in all metrics.

1.1 Main results

In this paper, we propose two algorithms for learning the structure of any unknown nn-qubit Hamiltonian. Let us begin by providing a technical definition for what we mean by the structure of a Hamiltonian.

Definition 1.1.

An nn-qubit Hamiltonian is a Hermitian 2n×2n2^{n}\times 2^{n} matrix

H=a=14nλaEa,H=\sum_{a=1}^{4^{n}}\lambda_{a}E_{a}, (1)

where Ea𝒫n={𝕀,X,Y,Z}nE_{a}\in\mathcal{P}_{n}=\{\mathbb{I},X,Y,Z\}^{\otimes n} are the unique Pauli terms of HH, and λa\lambda_{a}\in\mathbb{R} are the parameters of HH. The labeling a[4n]a\in[4^{n}] is arbitrary; defining the Pauli structure of HH as the set S={(Ea,λa):Ea𝕀 and λa0}S=\{(E_{a},\lambda_{a}):E_{a}\neq\mathbb{I}\text{ and }\lambda_{a}\neq 0\}, we take convention that the labels a=1,,|S|a=1,\ldots,|S| correspond to the elements of SS. As a slight abuse of notation, we refer to λ|S|\lambda\in\mathbb{R}^{|S|} as the vector of nonzero parameters, which we can pad with zeros as necessary.

For normalization, we adopt units of energy such that λ1\|\lambda\|_{\infty}\leq 1.

Our definition of structure is understood with respect to an operator basis. In this paper we consider the Pauli basis 𝒫n\mathcal{P}_{n}, as it is arguably the most natural choice for studying many-qubit systems. Other choices of basis are also possible in principle, as long as they admit an efficient classical description; such a choice should be thought of as part of the definition to the learning problem. In this paper, we will exclusively consider the Pauli structure, which is a quantum analogue to the Fourier spectrum of a classical Boolean function. We will be primarily concerned with sparse (|S|poly(n)|S|\leq\operatorname{poly}(n)), nonlocal (supp(Ea)n\operatorname{supp}(E_{a})\leq n) Hamiltonians in this basis.

Note that our definition of SS explicitly excludes the identity component, as it merely constitutes an energy shift. This shift contributes to the time evolution eiHte^{-\mathrm{i}Ht} as a global phase, and is therefore fundamentally unobservable within our black-box access models. However, let us point out that we will not be restricted to traceless Hamiltonians as inputs; our techniques will essentially extract the traceless part of any given Hamiltonian “for free.”

With these considerations in mind, let us define the learning problem as follows.

Problem 1.2.

Let HH be an arbitrary, unknown nn-qubit Hamiltonian, SS its Pauli structure, and ϵ(0,1)\epsilon\in(0,1) a desired accuracy parameter. In the Hamiltonian structure learning problem, we are given as input: query access to applications of eiHte^{-\mathrm{i}Ht} for tunable tt and an integer m|S|m\geq|S|. We are tasked with outputting a description {(Ea,λ^a):|λa|>ϵ}\{(E_{a},\widehat{\lambda}_{a}):|\lambda_{a}|>\epsilon\} such that each |λ^aλa|ϵ|\widehat{\lambda}_{a}-\lambda_{a}|\leq\epsilon, with probability at least 2/32/3.

The 2/32/3 success probability is conventional; by a standard median-boosting argument, any algorithm that solves this problem can be repeated 𝒪(log(1/δ))\mathcal{O}(\log(1/\delta)) times to amplify the probability above 1δ1-\delta.

The particular type of access to eiHte^{-\mathrm{i}Ht} that we are given can significantly affect the learning guarantees for solving 1.2. At a minimum, we will suppose discrete control over the Hamiltonian in the sense of [DOS23]. This is the standard type of quantum control assumed in most settings.

Definition 1.3.

Discrete quantum control is the ability to run the circuit

VL(eiHtL𝕀)VL1V1(eiHt1𝕀)V0,V_{L}(e^{-\mathrm{i}Ht_{L}}\otimes\mathbb{I})V_{L-1}\cdots V_{1}(e^{-\mathrm{i}Ht_{1}}\otimes\mathbb{I})V_{0}, (2)

where V0,V1,,VLV_{0},V_{1},\ldots,V_{L} are discrete control operations (quantum gates) acting jointly on the nn-qubit system and some nancn_{\mathrm{anc}} ancilla qubits, and the tj0t_{j}\geq 0 are some sequence of tunable evolution times.

Without some form of quantum control, [DOS23] showed that any Hamiltonian learning algorithm cannot learn with precision beyond the standard quantum limit Ω(1/ϵ2)\Omega(1/\epsilon^{2}).111Up to some technical consideration about HH. Along with discrete control, one of our algorithms uses an additional form of control to beat the standard limit: the ability to implement time reversal.

Definition 1.4.

Access to time-reversal dynamics is the ability to apply eiHte^{-\mathrm{i}Ht} for both t0t\geq 0 and t<0t<0.

For example, systems possessing a chiral symmetry Γ\Gamma obey ΓHΓ=H\Gamma H\Gamma^{\dagger}=-H, which immediately gives rise to time-reversal dynamics. Alternatively, in the context of algorithmic certification, one typically has direct control over tt as a tunable, real parameter. The power of time reversal in learning was previously considered in [Sch+23, CSM23, SHH24], where exponential separations for certain unitary learning tasks were established between protocols with access to time reversal and those without. However, these results do not necessarily imply such a separation in our setting (i.e., learning Hamiltonians with sparse structures).

With access to discrete quantum control and time-reversal dynamics, we prove the following theorem.

Theorem 1.5 (Structure learning with time reversal).

There exists a quantum algorithm that solves 1.2 under the discrete control and time-reversal access model. The algorithm queries eiHte^{-\mathrm{i}Ht} for a total evolution time of ttotal=𝒪~(m/ϵ)t_{\mathrm{total}}=\widetilde{\mathcal{O}}(m/\epsilon), requires a time resolution of |t|tmin=Ω~(ϵ/m3)|t|\geq t_{\mathrm{min}}=\widetilde{\Omega}(\epsilon/m^{3}), runs Nexp=𝒪~(m2log(1/ϵ))N_{\mathrm{exp}}=\widetilde{\mathcal{O}}(m^{2}\log(1/\epsilon)) experiments, and uses poly(n,m,1/ϵ)\operatorname{poly}(n,m,1/\epsilon) quantum and classical computation. The discrete control operations act on an ancillary computational register of nanc=n+𝒪(logm)n_{\mathrm{anc}}=n+\mathcal{O}(\log m) qubits.

This algorithm nearly achieves the Heisenberg limit of quantum metrology, up to a factor of log(1/ϵ)\log(1/\epsilon) hidden in the 𝒪~()\widetilde{\mathcal{O}}(\cdot) notation.

In more restricted experimental settings, it is unrealistic to assume the ability to perform time reversal. Therefore, we consider a second access model that is more conventional: only having discrete quantum control and forward-time evolutions.

Theorem 1.6 (Structure learning without time reversal).

There exists a quantum algorithm that solves 1.2 under the discrete control access model alone. The algorithm queries eiHte^{-\mathrm{i}Ht} for a total evolution time of ttotal=𝒪~(H3/ϵ4)t_{\mathrm{total}}=\widetilde{\mathcal{O}}(\|H\|^{3}/\epsilon^{4}), requires a time resolution of ttmin=Ω~(ϵ4/H5)t\geq t_{\mathrm{min}}=\widetilde{\Omega}(\epsilon^{4}/\|H\|^{5}), runs Nexp=𝒪~(H4/ϵ4)N_{\mathrm{exp}}=\widetilde{\mathcal{O}}(\|H\|^{4}/\epsilon^{4}) experiments, and uses poly(n,m,1/ϵ)\operatorname{poly}(n,m,1/\epsilon) quantum and classical computation. The discrete control operations act on an ancillary computational register of nanc=n+𝒪(loglog(H/ϵ))n_{\mathrm{anc}}=n+\mathcal{O}(\log\log(\|H\|/\epsilon)) qubits.

Above, we have phrased the complexity in terms of H\|H\|, mirroring the statements of related results in the literature [Car24, CW23]. Without knowledge of H\|H\|, we simply apply the bound Hm\|H\|\leq m (which holds due to our normalization convention λ1\|\lambda\|_{\infty}\leq 1). We remark that it is not possible to make the reverse replacement in Theorem 1.5 with our current analysis.

The proof of Theorem 1.6 can in fact be strengthened to almost recover the conventional 1/ϵ21/\epsilon^{2} scaling in the precision. Although a strict asymptotic improvement in every metric, the resulting algorithm becomes “galactic”: for example, to get a scaling of 1/ϵ2.011/\epsilon^{2.01}, constants on the order of 106010^{60} appear.222On the other hand, smaller speedups feature more reasonable constants. For instance, achieving ttotal=𝒪~(H2/ϵ3)t_{\mathrm{total}}=\widetilde{\mathcal{O}}(\|H\|^{2}/\epsilon^{3}) only incurs a factor of 44 relative to the hidden constants of Theorem 1.6. We therefore treat this result as a curious observation, and we will defer any further discussion and analysis to Appendix B.

Corollary 1.7 (Galactic structure learning without time reversal).

There exists a quantum algorithm that solves 1.2 under the discrete control access model alone. For any real constant p1p\geq 1, the algorithm queries eiHte^{-\mathrm{i}Ht} for a total evolution time of ttotal=𝒪~((2pH)1+2/p/ϵ2+2/p)t_{\mathrm{total}}=\widetilde{\mathcal{O}}((2^{p}\|H\|)^{1+2/p}/\epsilon^{2+2/p}), requires a time resolution of ttmin=Ω~(ϵ4/(2pH)5)t\geq t_{\mathrm{min}}=\widetilde{\Omega}(\epsilon^{4}/(2^{p}\|H\|)^{5}), runs Nexp=𝒪~((2pH)2+2/p/ϵ2+2/p)N_{\mathrm{exp}}=\widetilde{\mathcal{O}}((2^{p}\|H\|)^{2+2/p}/\epsilon^{2+2/p}) experiments, and uses poly(n,m,1/ϵ)\operatorname{poly}(n,m,1/\epsilon) quantum and classical computation. The discrete control operations act on an ancillary computational register of nanc=n+𝒪(loglog(H/ϵ))n_{\mathrm{anc}}=n+\mathcal{O}(\log\log(\|H\|/\epsilon)) qubits.

1.2 Related work

In this section we will give an overview of prior works on Hamiltonian learning from time evolution, as well as some related topics. The literature on Hamiltonian learning is vast and so this discussion will not be comprehensive. Table 1 catalogs various results in Hamiltonian learning and shows how they compare to our algorithms. In the table, we say an algorithm can achieve structure learning (SL) depending on the assumed class of Hamiltonians. This is a subtle but important point: for example, applying any algorithm under the “sparse, nonlocal” category to constant-local Hamiltonians can achieve SL within that smaller class (by simply searching over the entire n𝒪(1)n^{\mathcal{O}(1)}-sized pool of candidate terms).

Reference Hamiltonian class SL? Control? nancn_{\mathrm{anc}} ttotalt_{\mathrm{total}} tmint_{\mathrm{min}} NexpN_{\mathrm{exp}} [ZYLB21] Geometrically local Yes None 0 1/ϵ31/\epsilon^{3} ϵ\epsilon 1/ϵ41/\epsilon^{4} [SMDWB24] —— No None 0 1/ϵ21/\epsilon^{2} 1polylog(1/ϵ)\frac{1}{\operatorname{polylog}(1/\epsilon)} 1/ϵ21/\epsilon^{2} [HKT24] Low intersection No None 0 1/ϵ21/\epsilon^{2} 11 1/ϵ21/\epsilon^{2} [HTFS23] —— No Discrete 0 1/ϵ1/\epsilon ϵ\sqrt{\epsilon} polylog(1/ϵ)\operatorname{polylog}(1/\epsilon) [DOS23] —— No Continuous 0 1/ϵ1/\epsilon 11 log(1/ϵ)\log(1/\epsilon) [BLMT24a] Effectively sparse, bounded strength Yes Discrete 0 1/ϵ1/\epsilon 11 log(1/ϵ)\log(1/\epsilon) [YSHY23] Sparse, nonlocal Yes Discrete 0 nH3/ϵ4n\|H\|^{3}/\epsilon^{4} 1/H1/\|H\| nH4/ϵ4n\|H\|^{4}/\epsilon^{4} [Car24] —— No Discrete n+H2/ϵ2n+\|H\|^{2}/\epsilon^{2} H3/ϵ4\|H\|^{3}/\epsilon^{4} 1/H1/\|H\| H4/ϵ4\|H\|^{4}/\epsilon^{4} [OKSM24] —— No Discrete 11 m/ϵm/\epsilon ϵ2/H2\epsilon^{2}/\|H\|^{2} mlog(1/ϵ)m\log(1/\epsilon) [CW23]§ —— No ctrl(e±iHt)\texttt{ctrl}(e^{\pm\mathrm{i}Ht}) nn H/ϵ2\|H\|/\epsilon^{2} 1/H1/\|H\| H2/ϵ2\|H\|^{2}/\epsilon^{2} [CW23]§ —— No ctrl(e±iHt)\texttt{ctrl}(e^{\pm\mathrm{i}Ht}) n+mn+m m/ϵ\sqrt{m}/\epsilon 1/H1/\|H\| Hm/ϵ\|H\|\sqrt{m}/\epsilon Here Sparse, nonlocal Yes ctrl(e±iHt)\texttt{ctrl}(e^{\pm\mathrm{i}Ht}) n+logmn+\log m m/ϵm/\epsilon 1/m1/m m2log(1/ϵ)m^{2}\log(1/\epsilon) Here —— Yes Discrete, ±t\pm t n+logmn+\log m m/ϵm/\epsilon ϵ/m3\epsilon/m^{3} m2log(1/ϵ)m^{2}\log(1/\epsilon) Here —— Yes Discrete n+loglog(H/ϵ)n+\log\log(\|H\|/\epsilon) H3/ϵ4\|H\|^{3}/\epsilon^{4} ϵ4/H5\epsilon^{4}/\|H\|^{5} H4/ϵ4\|H\|^{4}/\epsilon^{4}

Table 1: Nonexhaustive list of algorithms for learning an nn-qubit Hamiltonian from time evolution. If the operator norm is not known, it can be replaced with mHm\geq\|H\|. Each Hamiltonian class in the table is subsumed by the one below it. Subdominant constants and polylogarithmic factors are suppressed.
The structure-learning analysis was demonstrated posthoc in [BLMT24a].
Their analysis assumes that the mm unknown terms are distributed uniformly among all Pauli operators.
SL can be achieved if given exponential classical resources (also incurs a factor of nn in ttotalt_{\mathrm{total}} and NexpN_{\mathrm{exp}}).
§Their results were originally stated with 2\ell_{2}-error, which we have converted to \ell_{\infty}-error here.
Can be improved to ttotal=H1+o(1)/ϵ2+o(1)t_{\mathrm{total}}=\|H\|^{1+o(1)}/\epsilon^{2+o(1)} and Nexp=H2+o(1)/ϵ2+o(1)N_{\mathrm{exp}}=\|H\|^{2+o(1)}/\epsilon^{2+o(1)}, with potentially enormous constants.

Algorithms capable of (local) structure learning. While [BLMT24a] first put forth the explicit problem of Hamiltonian structure learning, it was recognized that prior results could be reanalyzed in this context as well. At the core of many of these results, including [BLMT24a] itself, is the so-called time-derivative estimator [SMLKR11, SLP11, ZYLB21, SMDWB24]. This technique is essentially based on the relation tr(PaeiHtρaeiHt)2λat\operatorname{tr}(P_{a}e^{-\mathrm{i}Ht}\rho_{a}e^{\mathrm{i}Ht})\approx 2\lambda_{a}t for small tt, where ρa=12n(𝕀+iEaPa)\rho_{a}=\frac{1}{2^{n}}(\mathbb{I}+\mathrm{i}E_{a}P_{a}) is a mixed state defined by any Pauli operator PaP_{a} that anticommutes with the term EaE_{a} to be learned. This relation does not depend on the locality of the terms, and therefore even a naive term-by-term approach can estimate any sparse, nonlocal Hamiltonian in polynomial time. However, achieving structure learning by this approach requires that the size of the pool of candidate terms be polynomially bounded, which we do not assume in our results. Furthermore, parallelizing the estimation of many EaE_{a}’s per experiment relies on a classical shadows-like argument that ony applies to local terms [HKP20].

The algorithms of Caro [Car24] and Castaneda and Wiebe [CW23] also experience similar advantages and limitations. However, one particular advantage they have over the time-derivative approaches is their use of shadow tomography variants (those of [CCHL22] and [HKP20], respectively), which improves the parallel estimation of many potentially nonlocal observables. These algorithms require more sophisticated quantum resources however, needing at least nn additional qubits to prepare the (pseudo-)Choi states central to their strategies. Notably, [Car24] requires a quantum memory of size 𝒪~(H2/ϵ2)\widetilde{\mathcal{O}}(\|H\|^{2}/\epsilon^{2}) on top of the Choi qubits, which is inherited from the shadow tomography protocol of [CCHL22]. [CW23] proposes two approaches, the second of which is based on the gradient-estimation-based algorithm of [Hug+22], achieving ttotal=𝒪~(m/ϵ)t_{\mathrm{total}}=\widetilde{\mathcal{O}}(\sqrt{m}/\epsilon) but using a large ancilla register of size 𝒪~(m)\widetilde{\mathcal{O}}(m). Also, both algorithms of [CW23] require the ability to apply ctrl(e±iHt)\texttt{ctrl}(e^{\pm\mathrm{i}Ht}).

Other algorithms of note include that of Yu, Sun, Han, and Yuan [YSHY23], which is based on sparse Pauli channel learning [FW20]. To our knowledge, this is the only prior algorithm which can specifically target sparse, nonlocal Hamiltonians while maintaining efficient scaling. However, their rigorous analysis requires them to assume that the mm Pauli terms are uniformly distributed over all possible (4nm)\binom{4^{n}}{m} combinations. Their results therefore only hold with high probability over the draw of HH. The algorithm of Odake, Kristjánsson, Soeda, and Murao [OKSM24] applies a linear transformation to the Hamiltonian to isolate each term EaE_{a}, thereby allowing them to learn the parameter λa\lambda_{a} via robust phase estimation [KLY15]. Iterating this procedure over all known terms furnishes the Heisenberg-limited estimation of HH. One particular downside to their approach is that their tmin=Ω(ϵ2/H2)t_{\mathrm{min}}=\Omega(\epsilon^{2}/\|H\|^{2}), where one factor is incurred to approximate the linear transformation, and then another factor is taken to get appropriate guarantees on the robust phase estimation.

Finally, Gutiérrez [Gut24] describes an algorithm to learn a kk-local Hamiltonian with some unique properties. First, it is strongly limited to k=𝒪(1)k=\mathcal{O}(1) because of the use of a noncommutative Bohnenblust–Hille inequality [HCP23, VZ24]. However, this inequality is also what allows it to achieve 2\ell_{2}-error without explicit dependence on mm. In contrast, all other attempts to get 2\ell_{2}-error pay the standard factor of m\sqrt{m} for converting from the \ell_{\infty}-norm. Assuming H1\|H\|\leq 1, his algorithm has query complexity exp(𝒪(k2+klog(1/ϵ)))\exp(\mathcal{O}(k^{2}+k\log(1/\epsilon))). Unfortunately this features rather large constants in the exponents: unraveling the analysis for unnormalized H\|H\| shows that ttotal=𝒪((H/ϵ)15k+17)t_{\mathrm{total}}=\mathcal{O}((\|H\|/\epsilon)^{15k+17}) and tmin=Ω((ϵ/H)k+1)t_{\mathrm{min}}=\Omega((\epsilon/\|H\|)^{k+1}). The learning algorithm also requires applications of ctrl(e±iHt)\texttt{ctrl}(e^{\pm\mathrm{i}Ht}), inherited from the use of the quantum Goldreich–Levin algorithm of [MO10] as a subroutine.

Learning Pauli spectra. Prior works have also studied the task of learning an operator’s Pauli structure, sometimes referred to as its Pauli or Fourier spectrum, given query access to that operator [AS07, MO10, CNY23, Ang23, Gut24, BLMT24a]. These can be understood as quantum analogues to the Goldreich–Levin algorithm, which allows one to efficiently estimate any Fourier coefficient of a Boolean function from queries to that function [GL89]. Most works study the Pauli spectrum of a unitary operator, developing algorithms to test and learn unitaries acting on only a constant subsystem of qubits (or more generally, possessing a constant amount of influence over the system) [AS07, MO10, CNY23, Ang23].

Most relevant to the problem of Hamiltonian learning are the algorithms of Bakshi, Liu, Moitra, and Tang [BLMT24a] and Guitérrez [Gut24], which approximate the spectrum of HH from that of eiHte^{-\mathrm{i}Ht} using series approximations at small tt. Both results are limited to local Hamiltonians, featuring explicit exponential scalings with respect to the locality of the Pauli terms being learned. In contrast, our algorithm works with an object called the pseudo-Choi state of HH [CW23], which gives direct access to the Pauli spectrum of HH rather than deducing it from its time evolution unitary. As such, our approach is not hindered by any locality considerations. At its core, our algorithm is efficient as long as Hpoly(n)\|H\|\leq\operatorname{poly}(n), and in fact can be applied to learn the Pauli spectrum of any operator, not necessarily Hermitian or unitary, as long as one can efficiently block encode it.

Time reversal in learning problems. Time reversal is a tool that has been utilized with success in various modern quantum experiments [Gär+17, Mi+21, Bra+22, Col+22, Li+23]. In learning tasks, the intuitive power of time reversal is its ability to “refocus” the evolution of a physical system, enhancing the signal of important many-body correlations that might only manifest at late time scales [Sch+23]. Theoretically, superpolynomial separations have been established between learning protocols with and without access to time-reversal dynamics. Cotler, Schuster, and Mohseni [CSM23] considered the task of distinguishing whether a Haar-random unitary was sampled from U(2n)\mathrm{U}(2^{n}) or U(2n/2)U(2n/2)\mathrm{U}(2^{n/2})\otimes\mathrm{U}(2^{n/2}). Meanwhile, Schuster, Haferkamp, and Huang [SHH24] gave the task of distinguishing between two different classes of pseudorandom quantum circuits. In both cases, it was proven that no time-forward experiment can solve these tasks in polynomial time, but protocols using time-reversed dynamics to probe out-of-time-ordered correlators (OTOCs) can solve them very efficiently.

In our work, we do not use time reversal to probe OTOCs, but rather to implement quantum simulation techniques like the quantum singular value transformation (QSVT) [GSLW19]. In a similar spirit to the works discussed above, we observe that our time-reversed protocol can nearly reach the Heisenberg limit, while our time-forward algorithm does not surpass the standard quantum limit. However, as we do not consider lower bounds in this paper, we do not establish analogous separations for the sparse Hamiltonian learning problem.

Indistiguishable particles. We have restricted attention to many-qubit systems, although we note that there has been some progress on learning bosonic [LTGNY24] and fermionic [NLY24, MH24] Hamiltonians. However, these algorithms only consider Hubbard models, and they cannot perform structure learning due to their use of the Hamiltonian reshaping strategy of [HTFS23]. Conversely, it is interesting to point out that our results can be applied to these systems under an appropriate qubit mapping. For instance, we immediately get an efficient learning algorithm for any fermionic Hamiltonian with at most polynomially many interaction terms. This is true even if one uses the nonlocal Jordan–Wigner transformation [JW28] to represent the fermions as qubits.

1.3 Technical overview

In this section, we will describe the high-level techniques used to build our algorithms and prove our results. As per 1.2, we are given an unknown nn-qubit Hamiltonian HH and an integer mm with the promise that HH is supported on at most mm Pauli operators. Therefore, we write H=a=1mλaEaH=\sum_{a=1}^{m}\lambda_{a}E_{a}, with the understanding that λ[1,1]m\lambda\in[-1,1]^{m} may potentially be padded with zeros. At times, it is natural to consider the size of the Hamiltonian in terms of its operator norm H\|H\|. Whenever the learner does not have access to this quantity, we keep in mind that the bound Hm\|H\|\leq m can always be used in its stead.

Learning with pseudo-Choi states

The key technique enabling our learning algorithms is the pseudo-Choi state, a concept introduced by Castaneda and Wiebe [CW23]. They define a version with a reference control qubit, but it is straightforward to consider a referenceless version as well. We will use both variants in our algorithms.

Definition 1.8.

Let |Ω=12nb{0,1}n|b|b|\Omega\rangle=\frac{1}{\sqrt{2^{n}}}\sum_{b\in\{0,1\}^{n}}|b\rangle|b\rangle. The pseudo-Choi states of an nn-qubit operator HH, with and without reference, are

|Ψref(H)\displaystyle|\Psi_{\mathrm{ref}}(H)\rangle =(H𝕀n)|Ω|0+|Ω|1𝒩ref,\displaystyle=\frac{(H\otimes\mathbb{I}_{n})|\Omega\rangle|0\rangle+|\Omega\rangle|1\rangle}{\mathcal{N}_{\mathrm{ref}}}, (3)
|Ψ(H)\displaystyle|\Psi(H)\rangle =(H𝕀n)|Ω𝒩,\displaystyle=\frac{(H\otimes\mathbb{I}_{n})|\Omega\rangle}{\mathcal{N}}, (4)

respectively. 𝒩ref\mathcal{N}_{\mathrm{ref}} and 𝒩\mathcal{N} are normalization constants.

This state draws analogies to the usual Choi state (𝕀)(|ΩΩ|)(\mathcal{E}\otimes\mathbb{I})(|\Omega\rangle\!\langle\Omega|) of a quantum channel \mathcal{E}. However, since HH does not necessarily induce a quantum channel, pseudo-Choi states do not exhibit channel–state duality in the sense of the Choi–Jamiołkowski isomorphism. Nonetheless, it serves as a powerful learning resource: [CW23] shows that Clifford classical shadows [HKP20] can be used to estimate all mm parameters of HH with 𝒪(Δ2log(m)/ϵ2)\mathcal{O}(\Delta^{2}\log(m)/\epsilon^{2}) copies of |Ψref(H/Δ)|\Psi_{\mathrm{ref}}(H/\Delta)\rangle. Here, Δ2H\Delta\geq 2\|H\| is a normalization factor necessary for the preparation of this state. This measurement scheme is therefore able to simultaneously learn a large number of observables with no restriction on their locality. We describe this protocol in more detail in Section 3.

To prepare such states, [CW23] calls upon a subroutine to block encode the Hamiltonian from its time evolution [LC17, GSLW19]. This involves block encoding the operator sin(H/Δ)=12(eiH/Δ+eiH/Δ)\sin(H/\Delta)=\frac{1}{2}(e^{\mathrm{i}H/\Delta}+e^{-\mathrm{i}H/\Delta}) as a linear combination of unitaries (LCU) [CW12], followed by invoking the QSVT [GSLW19] to apply a polynomial approximation of the arcsin\arcsin function onto that block. The normalization H/Δ1/2\|H/\Delta\|\leq 1/2 ensures that the spectrum being transformed lies in the appropriate domain of the arcsin\arcsin approximation. This technique is very accurate: the polynomial only needs degree d=𝒪(log(1/εarcsin))d=\mathcal{O}(\log(1/\varepsilon_{\mathrm{arcsin}})) to get spectral error εarcsin\varepsilon_{\mathrm{arcsin}}, and choosing εarcsin=Θ(ϵ/Δ)\varepsilon_{\mathrm{arcsin}}=\Theta(\epsilon/\Delta) suffices for the learning problem. Then from 𝒪(d)\mathcal{O}(d) queries to the controlled time evolution with resolution |t|=Θ(1/Δ)|t|=\Theta(1/\Delta), one can construct the block encoding of H/ΔH/\Delta. Finally, the pseudo-Choi state |Ψref(H/Δ)|\Psi_{\mathrm{ref}}(H/\Delta)\rangle is prepared probabilistically from the block encoding by measuring some ancilla flag qubits; this succeeds with probability at least 1/21/2, so a constant number of repetitions per copy suffices.

The limitations of this approach are two-fold. First, it does not supply an efficient way to learn the terms if they are a priori unknown. Second, the state-preparation procedure requires controlled forms of the time evolution, ctrl(eiHt)\texttt{ctrl}(e^{-\mathrm{i}Ht}) and ctrl(e+iHt)\texttt{ctrl}(e^{+\mathrm{i}Ht}). Access to these resources is unnatural in most learning frameworks, and there are no-go conditions for controlling black-box unitaries [AFCB14, GST24]. Furthermore, in the time-forward access model one cannot directly query the (controlled) inverses either. In order to make the pseudo-Choi state method applicable to more conventional learning scenarios, we seek a way to lift these limitations.

Preparing pseudo-Choi states without controlled evolutions

To get our main results Theorems 1.5 and 1.6, we show how to prepare pseudo-Choi states without either controlled time evolutions or (in the second access model) time reversal. The guarantees on our state-preparation algorithms are as follows.

Theorem 1.9 (Pseudo-Choi state preparation with uncontrolled time-reversal queries).

Let Δ2H\Delta\geq 2\|H\|, ϵ>0\epsilon>0. From access to time-reversal dynamics, we can prepare 𝒪(ϵ/Δ)\mathcal{O}(\epsilon/\Delta)-close states to |Ψref(H/Δ)|\Psi_{\mathrm{ref}}(H/\Delta)\rangle and |Ψ(H)|\Psi(H)\rangle with success probabilities at least 1/21/2 and Ω(λ/Δ2)\Omega(\|\lambda/\Delta\|^{2}), respectively. For either state, each preparation attempt uses 𝒪(1Δlog(Δ/ϵ))\mathcal{O}(\frac{1}{\Delta}\log(\Delta/\epsilon)) evolution time to eiHte^{-\mathrm{i}Ht} with time resolution |t|tmin=Ω~(ϵ/Δ2)|t|\geq t_{\mathrm{min}}=\widetilde{\Omega}(\epsilon/\Delta^{2}). The number of ancilla qubits required is at most nanc=n+3n_{\mathrm{anc}}=n+3.

Theorem 1.10 (Pseudo-Choi state preparation with uncontrolled time-forward queries).

Let Δ2H\Delta\geq 2\|H\|, ϵ>0\epsilon>0. From access to only time-forward dynamics, we can prepare 𝒪(ϵ/Δ)\mathcal{O}(\epsilon/\Delta)-close states to |Ψref(H/(ΛΔ))|\Psi_{\mathrm{ref}}(H/(\Lambda\Delta))\rangle and |Ψ(H)|\Psi(H)\rangle with success probabilities at least 1/21/2 and Ω(ϵ2λ/Δ22)\Omega(\epsilon^{2}\|\lambda/\Delta^{2}\|^{2}), respectively. The parameter Λ\Lambda is known analytically and obeys Λ=𝒪(Δ/ϵ)\Lambda=\mathcal{O}(\Delta/\epsilon). For either state, each preparation attempt uses 𝒪(1Δlog2(Δ/ϵ))\mathcal{O}(\frac{1}{\Delta}\log^{2}(\Delta/\epsilon)) evolution time to eiHte^{-\mathrm{i}Ht} with time resolution ttmin=Ω~(ϵ/Δ2)t\geq t_{\mathrm{min}}=\widetilde{\Omega}(\epsilon/\Delta^{2}). The number of ancilla qubits required is nanc=n+𝒪(loglog(Δ/ϵ))n_{\mathrm{anc}}=n+\mathcal{O}(\log\log(\Delta/\epsilon)).

Observe that the success probability for |Ψref(H/Δ)|\Psi_{\mathrm{ref}}(H/\Delta)\rangle is always at least 1/21/2. Meanwhile for |Ψ(H)|\Psi(H)\rangle, the dependence of λ2\|\lambda\|^{2} will cancel nicely with its use case (which we describe later). The demand for 𝒪(ϵ/Δ)\mathcal{O}(\epsilon/\Delta) accuracy is a consequence of the fact that |Ψref(H/Δ)|\Psi_{\mathrm{ref}}(H/\Delta)\rangle encodes the normalized parameters λa/Δ\lambda_{a}/\Delta. In order to get the desired in accuracy in learning λa\lambda_{a}, we need the state-preparation error to be cϵ/Δ\leq c\epsilon/\Delta for a small constant c1c\ll 1. Below, we describe the techniques that make these constructions possible.

Controlization to construct block encodings

We remove the need for controlled time evolutions through a technique called controlization. While various approaches for controlization have been proposed [Jan02, Zho+11, FDDB14, NSM15, DNSM19, CBW24], we employ the one described by Odake, Kristjánsson, Taranto, and Murao [OKTM23] due to its simplicity and universal applicability. Let H0=Htr(H)𝕀n/2nH_{0}=H-\operatorname{tr}(H)\mathbb{I}_{n}/2^{n} be the traceless part of HH. The subroutine generates a mixed-unitary channel (t)\mathcal{E}(t) which approximates ctrl(eiH0t)\texttt{ctrl}(e^{-\mathrm{i}H_{0}t}) to any target diamond-norm distance γ\gamma. In terms of evolution time, controlization has the nice feature that it exhibits zero overhead: to approximate ctrl(eiH0t)\texttt{ctrl}(e^{-\mathrm{i}H_{0}t}), we query eiHδte^{-\mathrm{i}H\delta t} for a total of N=t/δtN=t/\delta t times. The downside is that each query requires a small time resolution of tmin=t/Nt_{\mathrm{min}}=t/N, so our main task is bounding what NN suffices to achieve the total target error.

The controlization technique of [OKTM23] is based on Pauli twirling and randomized product formulas. It is well know that averaging over the Pauli group gives the identity

14nP𝒫nPHP=trH2n𝕀n.\frac{1}{4^{n}}\sum_{P\in\mathcal{P}_{n}}P^{\dagger}HP=\frac{\operatorname{tr}H}{2^{n}}\mathbb{I}_{n}. (5)

This can be extended by appending a control qubit and replacing each Pauli gate by ctrl0(P)\texttt{ctrl}_{0}(P). In this case the identity becomes

14nP𝒫nctrl0(P)(𝕀1H)ctrl0(P)=|11|H0+trH2n𝕀n+1.\frac{1}{4^{n}}\sum_{P\in\mathcal{P}_{n}}\texttt{ctrl}_{0}(P^{\dagger})(\mathbb{I}_{1}\otimes H)\texttt{ctrl}_{0}(P)=|1\rangle\!\langle 1|\otimes H_{0}+\frac{\operatorname{tr}H}{2^{n}}\mathbb{I}_{n+1}. (6)

Exponentiating this twirled Hamiltonian gives, up to some global phase, the unitary ctrl1(eiH0t)\texttt{ctrl}_{1}(e^{-\mathrm{i}H_{0}t}). We then show that this idea can be easily extended to synthesize multi-controlled unitaries ctrlb(eiH0t)\texttt{ctrl}_{b}(e^{-\mathrm{i}H_{0}t}), where b{0,1}kb\in\{0,1\}^{k} is any kk-qubit control string.

In order to evolve by the twirled Hamiltonian, Trotter’s product formula tells us that eijhjjeihje^{-\mathrm{i}\sum_{j}h_{j}}\approx\prod_{j}e^{-\mathrm{i}h_{j}}. It is enticing to simply concatenate the unitaries eihj=ctrl0(P)eiHt/4nctrl0(P)e^{-\mathrm{i}h_{j}}=\texttt{ctrl}_{0}(P^{\dagger})e^{-\mathrm{i}Ht/4^{n}}\texttt{ctrl}_{0}(P), but clearly this is inefficient; even ignoring the Trotter error, there are exponentially many terms. Fortunately, Campbell’s qDRIFT protocol [Cam19] was designed with precisely this concern in mind. Rather than include each term in the simulation, qDRIFT constructs random product formulas wherein terms are selected with probability proportional to their norm. In our context, all terms have the same weight, so we just sample from 𝒫n\mathcal{P}_{n} uniformly. By treating this ensemble of random product formulas as a mixed-unitary channel (wherein we draw a new formula for each instance), the analysis of [Cam19] (and tightened in [CHKT21]) shows that qDRIFT sequences of length N=𝒪(t2H2/γ)N=\mathcal{O}(t^{2}\|H\|^{2}/\gamma) suffice to get diamond-norm distance at most γ\gamma. Since we only need applications running for time t=𝒪(1/Δ)=𝒪(1/H)t=\mathcal{O}(1/\Delta)=\mathcal{O}(1/\|H\|), this simplifies to N=𝒪(1/γ)N=\mathcal{O}(1/\gamma).

To determine what γ\gamma we need to maintain our learning guarantees, note that our approximation prepares a mixed state from which we probabilistically prepare the pseudo-Choi states. A careful error analysis shows that 𝒪(ϵ/Δ)\mathcal{O}(\epsilon/\Delta) error in trace distance of the mixed states (hence in diamond-norm distance of the channels) suffices. Each block-encoding circuit makes Q1=𝒪(log(Δ/ϵ))Q_{1}=\mathcal{O}(\log(\Delta/\epsilon)) queries to ctrl(e±iH/Δ)\texttt{ctrl}(e^{\pm\mathrm{i}H/\Delta}), so choosing γ=𝒪(ϵ/(Q1Δ))=𝒪~(ϵ/Δ)\gamma=\mathcal{O}(\epsilon/(Q_{1}\Delta))=\widetilde{\mathcal{O}}(\epsilon/\Delta) nets the claim of Theorem 1.9. (For the success probabilities, observe that Pr(Preparing |Ψref(H/Δ))1/2\Pr(\text{Preparing }|\Psi_{\mathrm{ref}}(H/\Delta)\rangle)\geq 1/2 always holds because of the reference term, while Pr(Preparing |Ψ(H))\Pr(\text{Preparing }|\Psi(H)\rangle) is proportional to the norm of the block-encoded matrix.)

Block encoding the matrix logarithm without time reversal

The construction above removes the need for ctrl(e±iHt)\texttt{ctrl}(e^{\pm\mathrm{i}Ht}), but still requires negative-time queries in order to implement the QSVT (and even the LCU for sin(H/Δ)\sin(H/\Delta)). To develop a completely time-reversal-free approach, we construct an entirely different quantum circuit for block encoding H/ΔH/\Delta. The basic idea is simple: we appeal to the well-known power series for the (matrix) logarithm [Hal15].

For any square complex matrix XX such that X<log2\|X\|<\log 2, the following identity holds:

X=k=1(1)k+1(eX𝕀)kk.X=\sum_{k=1}^{\infty}(-1)^{k+1}\frac{(e^{X}-\mathbb{I})^{k}}{k}. (7)

Letting X=iH/ΔX=-\mathrm{i}H/\Delta, this formula gives us a relationship between the Hamiltonian and a linear combination of its time evolution operators which only run forward in time. Block encoding the LCU still requires controlled unitaries, but by our prior discussion we now have a technique to approximate them. Note that our normalization requirement is slightly relaxed compared to the QSVT approach, but for sake of clarity let us impose Δ2H\Delta\geq 2\|H\| as before.333We can actually relax the QSVT normalization further: one can allow H/Δ1δ\|H/\Delta\|\leq 1-\delta for any δ(0,1)\delta\in(0,1) by increasing the polynomial degree to d=𝒪(log(Δ/ϵ)/δ)d=\mathcal{O}(\log(\Delta/\epsilon)/\delta) [GSLW19, Lemma 70].

Technically, Eq. 7 is not written in the form of an LCU. It is also a series with an infinite number of terms. Our analysis is then to determine where we should truncate the series to maintain sufficient accuracy, and how to express the truncated sum as a proper LCU. The former is straightforward: since X12\|X\|\leq\frac{1}{2}, we have eX𝕀12\|e^{X}-\mathbb{I}\|\leq\frac{1}{2}. Let LKL_{K} be the partial sum of Eq. 7 up to kKk\leq K terms. An application of the triangle inequality gets XLK2(K+1)\|X-L_{K}\|\leq 2^{-(K+1)}, which means that K=log2(12γ)K=\lceil\log_{2}(\frac{1}{2\gamma})\rceil ensures a truncation error of at most γ\gamma.

To express LKL_{K} as an LCU, we expand the terms (eiH/Δ𝕀)k(e^{-\mathrm{i}H/\Delta}-\mathbb{I})^{k} using the binomial theorem. With the help of combinatorial identities, we can derive analytical expressions for coefficients cjc_{j} such that LK=j=0Kcjei(H/Δ)jL_{K}=\sum_{j=0}^{K}c_{j}e^{-\mathrm{i}(H/\Delta)j}. Techniques for implementing LCUs are well established [CW12, GSLW19], requiring controlled instances of the unitaries on only log2(K+1)=𝒪(loglog(1/γ))\lceil\log_{2}(K+1)\rceil=\mathcal{O}(\log\log(1/\gamma)) control qubits. The bottleneck of any LCU implementation is the size of its coefficient 1\ell_{1}-norm, Λ=j|cj|\Lambda=\sum_{j}|c_{j}|. Using the analytical form of the cjc_{j}’s, we can compute an essentially tight bound of Λ=𝒪(2K)\Lambda=\mathcal{O}(2^{K}). Recalling that we need to take error below γ=𝒪(ϵ/Δ)\gamma=\mathcal{O}(\epsilon/\Delta), we immediately get Λ=𝒪(Δ2/ϵ2)\Lambda=\mathcal{O}(\Delta^{2}/\epsilon^{2}) and Theorem 1.10 follows. For learning the Hamiltonian parameters, our LCU now prepares the state |Ψref(H/(ΛΔ))|\Psi_{\mathrm{ref}}(H/(\Lambda\Delta))\rangle (still with at least 1/21/2 probability), which requires us to refine the learning precision by 1/Λ1/\Lambda. In turn, this raises the copy complexity to Λ2𝒪(Δ2log(m)/ϵ2)=𝒪~(Δ4/ϵ4)\Lambda^{2}\cdot\mathcal{O}(\Delta^{2}\log(m)/\epsilon^{2})=\widetilde{\mathcal{O}}(\Delta^{4}/\epsilon^{4}). Each copy costs 𝒪~(1/Δ)\widetilde{\mathcal{O}}(1/\Delta) evolution time to produce, leading to the claimed result in Theorem 1.6.

Identifying Pauli spectra using pseudo-Choi states

So far we have discussed the utility of the pseudo-Choi state with reference, as introduced by [CW23]. Let us now turn to the referenceless version as in Eq. 4. We invoke it in order to accomplish the structure identification task. The reference was essential to learn values of the Hamiltonian coefficients λa\lambda_{a}, but it serves no purpose when we only care about detecting which coefficients are large relative to the others. By performing Bell measurements [Mon17] on copies of |Ψ(H)|\Psi(H)\rangle, we can learn which terms in HH have weight greater than ϵ\epsilon.

Theorem 1.11 (Bell sampling for structure learning).

With high probability, we can generate a set of Pauli labels containing Sϵ(H)={Ea:|λa|>ϵ}S_{\epsilon}(H)=\{E_{a}:|\lambda_{a}|>\epsilon\}. This process consumes 𝒪(λ2log(m)/ϵ2)\mathcal{O}(\|\lambda\|^{2}\log(m)/\epsilon^{2}) copies of |Ψ(H)|\Psi(H)\rangle, measuring each in the Bell basis.

Observe that while we need N=𝒪~(λ2/ϵ2)N=\widetilde{\mathcal{O}}(\|\lambda\|^{2}/\epsilon^{2}) copies, the probability of producing a single copy is p=Ω(λ2/Δ2)p=\Omega(\|\lambda\|^{2}/\Delta^{2}) (time-reversal) or p=Ω(ϵ2λ2/Δ4)p=\Omega(\epsilon^{2}\|\lambda\|^{2}/\Delta^{4}) (time-forward) from 𝒪~(1/Δ)\widetilde{\mathcal{O}}(1/\Delta) evolution time. It is a standard fact that 𝒪(N/p)\mathcal{O}(N/p) Bernoulli trials are sufficient to get NN successes. Hence, 𝒪~(Δ/ϵ2)\widetilde{\mathcal{O}}(\Delta/\epsilon^{2}) (time-reversal) or 𝒪~(Δ3/ϵ4)\widetilde{\mathcal{O}}(\Delta^{3}/\epsilon^{4}) (time-forward) total time evolution is used to learn the structure of HH. Unlike prior approaches to estimating Pauli spectra of unitary operators [MO10, Ang23, Gut24, BLMT24a], our algorithm can in principle be applied to any nn-qubit operator AA, as long as one can consistently generate copies of its pseudo-Choi state |Ψ(A)|\Psi(A)\rangle. We are also not restricted by any locality considerations, and the total number of queries to eiHte^{-\mathrm{i}Ht} essentially only depends on ϵ/H\epsilon/\|H\| (i.e., the target error in “natural” units of the Hamiltonian’s norm).

Algorithm 1 describes the core idea of our structure-identifying algorithm. For each copy of |Ψ(H)|\Psi(H)\rangle, we measure in the 2n2n-qubit Bell basis, which can be described as {(P𝕀n)|Ω:P𝒫n}\{(P\otimes\mathbb{I}_{n})|\Omega\rangle:P\in\mathcal{P}_{n}\}. In turn, this induces a distribution governed by Pr(Ea)=|λa|2/λ2\Pr(E_{a})=|\lambda_{a}|^{2}/\|\lambda\|^{2}. Sampling from this distribution allows us to learn the Pauli terms in HH that have relatively high weight.

Input: NN copies of the referenceless pseudo-Choi state |Ψ(H)|\Psi(H)\rangle.
1
2Let S^\widehat{S}\leftarrow\varnothing;
3
4for j=1,,Nj=1,\ldots,N do
5       Measure |Ψ(H)|\Psi(H)\rangle in the Bell basis {|P=(P𝕀n)|Ω:P𝒫n}\{|P\rangle=(P\otimes\mathbb{I}_{n})|\Omega\rangle:P\in\mathcal{P}_{n}\};
6       Let |P(j)|P^{(j)}\rangle be the observed outcome;
7       S^S^{P(j)}\widehat{S}\leftarrow\widehat{S}\cup\{P^{(j)}\};
8      
9 end for
return S^\widehat{S}
Algorithm 1 Identifying the support of the Pauli spectrum of a Hamiltonian.

To analyze the cost of this algorithm, we need to bound the number of measurements such that we observe at least all the terms with |λa|>ϵ|\lambda_{a}|>\epsilon. This is a variant of the coupon collector problem, which is well studied [Shi07]. Our setup has two modifications from the classic version of the problem. First, we have a nonuniform distribution, which bottlenecks the sample complexity based on the least likely coupon. Second, we only care to collect low-rarity coupons (i.e., those such that |λa|>ϵ|\lambda_{a}|>\epsilon). Hence the rarest coupon we need to collect has at least Pr(Ea)>ϵ2/λ2\Pr(E_{a})>\epsilon^{2}/\|\lambda\|^{2}. This condition allows us to bound the tail of the cumulative distribution where, after NN i.i.d. draws of the coupons (Pauli terms), we still have not observed at least one of each type. We show that this probability is at most δ\delta whenever N=𝒪(λ2log(m/δ)/ϵ2)N=\mathcal{O}(\|\lambda\|^{2}\log(m/\delta)/\epsilon^{2}).

Besides this main argument, there are some smaller technical details to take care of. First is the fact that we do not prepare |Ψ(H)|\Psi(H)\rangle exactly, but rather an approximation |Ψ(H~)|\Psi(\widetilde{H})\rangle. Suppose that HH~ε\|H-\widetilde{H}\|\leq\varepsilon for some small ε\varepsilon, which we can tune by constructing a more precise block encoding. Then, we show that Sϵ(H)Sϵ+ε(H~)S_{\epsilon}(H)\subseteq S_{\epsilon+\varepsilon}(\widetilde{H}). While we may pick up terms with |λa|ϵ|\lambda_{a}|\leq\epsilon (this is possible even if sampling from the exact pseudo-Choi state), it is fine to learn a superset. This is because the next stage predicts all λa\lambda_{a} using classical shadows, which only has an explicit logarithmic cost in the number of parameters. Any parameters deemed too small can simply be discarded. Clearly since we only take a polynomial number of samples, this superset has bounded size. But we can prove something even stronger: Sϵ+ε(H~){E1,,Em}S_{\epsilon+\varepsilon}(\widetilde{H})\subseteq\{E_{1},\ldots,E_{m}\}. That is, even if the approximation H~\widetilde{H} has support that leaks onto other Pauli terms, its (ϵ+ε)(\epsilon+\varepsilon)-support never contains those erroneous terms.

Finally, we remark that the controlization approximation affects this algorithm more severely than the parameter estimation stage. We find that the Hamiltonian coefficients become perturbed on the order of ΔQ1γ\Delta\sqrt{Q_{1}\gamma}, where Q1Q_{1} is the number of queries to the time evolution for one block encoding and γ\gamma is the controlization error per query. In order to ensure the correctness of the Bell sampling procedure, we need to adjust the ϵ\epsilon threshold of Theorem 1.11 and choose a sufficiently small γ\gamma. The upshot of this analysis is a time resolution of tmin=Ω~(ϵ/Δ3)t_{\mathrm{min}}=\widetilde{\Omega}(\epsilon/\Delta^{3}) under the time-reversal access model, and tmin=Ω~(ϵ4/Δ5)t_{\mathrm{min}}=\widetilde{\Omega}(\epsilon^{4}/\Delta^{5}) under the conventional access model, for this stage of the algorithm. As is inherent to controlization, ttotalt_{\mathrm{total}} and NexpN_{\mathrm{exp}} are unaffected.

Bootstrapping to (nearly) Heisenberg-limited scaling

Having described an algorithm that learns at the standard quantum limit, we consider how to the enhance the procedure to reach the Heisenberg limit. Using the pseudo-Choi state approach, this is ultimately achieved by a technique called uniform spectral amplification [LC17], a block-encoding generalization of amplitude amplification. As such, we will need access to time evolution inverses, so the following discussion is only viable under the time-reversal model.

To achieve Heisenberg-limited scaling, we use the precision bootstrapping framework, an idea that has enjoyed many fruitful successes in the dynamical learning paradigm [KLY15, HKOT23, DOS23, Zha+23, BLMT24a]. The first use of the strategy for many-body Hamiltonian learning is attributed to Dutkiewicz, O’Brien, and Schuster [DOS23], inspired by the feedback control scheme of Kura and Ueda [KU18] for qudit learning. Its broader applicability was subsequently appreciated by [BLMT24a]. We briefly review the idea here.

Let 𝒜:(H,ϵ,δ)λ^m\operatorname{\mathcal{A}}:(H,\epsilon,\delta)\mapsto\widehat{\lambda}\in\mathbb{R}^{m} be some base learning algorithm with the guarantee that Pr(λλ^ϵ)1δ\Pr(\|\lambda-\widehat{\lambda}\|_{\infty}\leq\epsilon)\geq 1-\delta. The idea is to bootstrap applications of this base algorithm, with constant precision ϵ0=1/2\epsilon_{0}=1/2, to the Heisenberg limit ttotal1/ϵt_{\mathrm{total}}\sim 1/\epsilon by adaptively adjusting the Hamiltonian being learned via quantum control. Consider a previously learned estimate H^\widehat{H} and form a “residual Hamiltonian” R=HH^R=H-\widehat{H}. Assuming that the current estimate is only η\eta-good, we can amplify RR by a factor of 1/η1/\eta to boost the residual errors up to a constant. Then if we learn R/ηR/\eta using 𝒜\operatorname{\mathcal{A}} with merely constant precision ϵ0=1/2\epsilon_{0}=1/2, we can improve our estimate to η/2\eta/2 error. Recursing on this process for multiple rounds j=0,1,,Tj=0,1,\ldots,T for some TT, it is clear that each round produces an estimate with error ηj=2j\eta_{j}=2^{-j}. Hence T=log2(1/ϵ)T=\lfloor\log_{2}(1/\epsilon)\rfloor suffices to get the target accuracy ϵ\epsilon. Taking some care with the acceptable amount of failure probability establishes the correctness of the algorithm. This overall strategy is outlined in Algorithm 2.

Input: Error parameters ϵ,δ(0,1)\epsilon,\delta\in(0,1), base learning algorithm 𝒜\operatorname{\mathcal{A}}, and black-box access to time evolution under the unknown Hamiltonian H=a=1mλaEaH=\sum_{a=1}^{m}\lambda_{a}E_{a}.
Output: An estimate λ^\widehat{\lambda} such that Pr(λλ^ϵ)1δ\Pr(\|\lambda-\widehat{\lambda}\|_{\infty}\leq\epsilon)\geq 1-\delta.
1
2Let λ^(0)(0,,0)m\widehat{\lambda}^{(0)}\leftarrow(0,\ldots,0)\in\mathbb{R}^{m};
3 Let Tlog2(1/ϵ)T\leftarrow\lfloor\log_{2}(1/\epsilon)\rfloor;
4
5for j=0,1,,Tj=0,1,\ldots,T do
6       Let ζjδ/2T+1j\zeta_{j}\leftarrow\delta/2^{T+1-j};
7       Let ηj1/2j\eta_{j}\leftarrow 1/2^{j};
8       Let H^ja=1mλ^a(j)Ea\widehat{H}_{j}\leftarrow\sum_{a=1}^{m}\widehat{\lambda}^{(j)}_{a}E_{a};
9       Let RjHH^jR_{j}\leftarrow H-\widehat{H}_{j};
10       r^(j)𝒜(Rj/ηj,1/2,ζj)\widehat{r}^{(j)}\leftarrow\operatorname{\mathcal{A}}(R_{j}/\eta_{j},1/2,\zeta_{j});
11       Let λ^(j+1)λ^(j)+ηjr^(j)\widehat{\lambda}^{(j+1)}\leftarrow\widehat{\lambda}^{(j)}+\eta_{j}\widehat{r}^{(j)};
12      
13 end for
return λ^λ^(T+1)\widehat{\lambda}\leftarrow\widehat{\lambda}^{(T+1)}
Algorithm 2 Bootstrapped learning algorithm with Heisenberg-limited scaling.

To achieve Heisenberg-limited scaling, the process of boosting the residual by 1/η1/\eta needs to take 𝒪(η)\mathcal{O}(\eta) time, otherwise the advantage is lost. In the conventional time-evolution setting, this holds because applying eiR/ηe^{-\mathrm{i}R/\eta} takes time 1/η1/\eta by definition. In [DOS23], they consider continuous quantum control, wherein the experimenter can implement ei(HH^)te^{-\mathrm{i}(H-\widehat{H})t} directly, while [BLMT24a] assumes only discrete control, requiring them to approximate the residual evolution via Trotterization. In our algorithm, we prepare pseudo-Choi states from a block encoding of the Hamiltonian. Thus the bootstrapping strategy requires us to block encode the residual Hamiltonian, which comes with unique advantages and challenges. The main advantage is that we can block encode H^-\widehat{H} exactly as an LCU [CW12], thereby avoiding the complications of Trotter error in the discrete control model.

On the other hand, boosting the block-encoded residual by 1/η1/\eta is more challenging. To do this, we invoke Low and Chuang’s uniform spectral amplification technique [LC17],444A generalization for amplifying the singular values of a not-necessarily-Hermitian matrix [GSLW19, Theorem 30] also accomplishes this task. which essentially allows us to map RR/ηR\mapsto R/\eta, up to some controllable error εamp\varepsilon_{\mathrm{amp}}, as long as R/η1\|R/\eta\|\leq 1. This costs 𝒪(log(1/εamp)/η)\mathcal{O}(\log(1/\varepsilon_{\mathrm{amp}})/\eta) queries to the block encoding of R=HH^R=H-\widehat{H}. Meanwhile, recall that block encoding HH itself costs 𝒪(log(1/εarcsin))\mathcal{O}(\log(1/\varepsilon_{\mathrm{arcsin}})) queries to ctrl(e±iH/Δ)\texttt{ctrl}(e^{\pm\mathrm{i}H/\Delta}). A careful error analysis shows that we need to take εarcsin=Θ(η/Δ)\varepsilon_{\mathrm{arcsin}}=\Theta(\eta/\Delta) (on the other hand, εamp=Θ(ϵ0/Δ)\varepsilon_{\mathrm{amp}}=\Theta(\epsilon_{0}/\Delta) suffices). Thus in terms of η\eta, the cost of each round of the bootstrap scales as log(1/η)/η\log(1/\eta)/\eta, giving us nearly Heisenberg-limited scaling up to the factor of log(1/η)log(1/ϵ)\log(1/\eta)\leq\log(1/\epsilon).

Note that this argument applies to the structure identification stage of our algorithm as well. For intuition here, consider the first round j=0j=0, wherein we identify all the terms which have |λa|>1/2|\lambda_{a}|>1/2 and estimate them, giving us H^1\widehat{H}_{1}. In the next round j=1j=1, we synthesize HH^H-\widehat{H} and amplify by 22. This splits the Hamiltonian terms into two sets: those in S1/2(H)S_{1/2}(H), and those in the complement S1/2(H)𝖢S_{1/2}(H)^{\mathsf{C}}. We only care to identify the elements of S1/2(H)𝖢S_{1/2}(H)^{\mathsf{C}}; after amplification, all the such terms larger than 1/41/4 become larger than 1/21/2, which again can be identified with just a constant 1/21/2 precision. Iterating in this way, we can deduce the identity of all terms larger than ϵ\epsilon with only log2(1/ϵ)\lfloor\log_{2}(1/\epsilon)\rfloor rounds. Combining this with the parameter estimation procedure gives us the claim in Theorem 1.5.

One final detail to point out is the normalization of the residual Hamiltonian. To guarantee that R/η1\|R/\eta\|\leq 1, we make the particular choice Δ=2m\Delta=2m. This is because our error metric is with respect to the vector-norm of the coefficients, which typically has a loose conversion from the operator norm:

HH^λλ^1mλλ^.\|H-\widehat{H}\|\leq\|\lambda-\widehat{\lambda}\|_{1}\leq m\|\lambda-\widehat{\lambda}\|_{\infty}. (8)

That said, there exist Hamiltonians which saturate this inequality, so in the worst case we cannot expect to tighten it. While we use the \ell_{\infty}-norm as our metric, converting to different norms simply incurs factors of mm in the sample complexity.

1.4 Discussion

In this paper, we have described algorithms to efficiently learn sparse, nonlocal Hamiltonians from minimal prior information. This addresses an open direction posed in [BLMT24a]. At the center of our results is the pseudo-Choi state introduced by [CW23], which enables accurate and parallelized learning of many Hamiltonians terms without locality restrictions or a large quantum memory. The original proposal required controlled time evolutions and time reversal to prepare these states; we have shown how to lift these resource requirements, thereby bringing the scope of the problem down to conventional black-box query scenarios.

For the time-reversal access model, our algorithm can achieve nearly Heisenberg-limited scaling with the pseudo-Choi method. For the time-forward access model, we prepare pseudo-Choi states using a strategy for block encoding the logarithm of a unitary operator without the QSVT. These constructions may be of independent interest more broadly.

To learn the structure of a sparse, nonlocal Hamiltonian, we established a protocol to estimate the Pauli spectrum of any Hamiltonian by sampling its (referenceless) pseudo-Choi state. Parallel to other works which establish quantum analogues of the Goldreich–Levin algorithm for unitaries [MO10, Ang23] and local Hamiltonians [Gut24, BLMT24a], our result establishes a version for nonlocal Hamiltonians. In fact, this algorithm is applicable not only to Hamiltonians but to any matrix that we can efficiently block encode. It would be interesting to find applications of this result in other contexts.

While our work affirms that it is possible to efficiently learn any Pauli-sparse Hamiltonian from queries to its time evolution, many important questions remain open. We list a few below.

  1. 1.

    Is there an information-theoretic separation between Hamiltonian learning protocols with and without time reversal? While exponential separations have been established in unitary learning problems [CSM23, SHH24], these results do not immediately translate to our sparse Hamiltonian learning problem. Since we have established that this problem can be solved in polynomial time, any separation can only be polynomially large. Investigating the (im)possibility of nearly Heisenberg-limited scaling without time reversal would be an important question to address in this context.

  2. 2.

    Can the time resolution of our algorithms be improved? Because the small resolution arises from controlization, an access model given ctrl(eiHt)\texttt{ctrl}(e^{-\mathrm{i}Ht}) immediately bypasses this issue. Within the controlization perspective, an analysis of randomized block encodings (e.g., as in [MR24]) might be useful.

  3. 3.

    Do nontrivial mm-dependent lower bounds exist for 1.2? To our knowledge, the current best lower bound is Ω(1/ϵ)\Omega(1/\epsilon) [HTFS23]. This bound is demonstrated by reducing the mm-term Hamiltonian problem to distinguishing between two single-term Hamiltonians. Meanwhile, exponential lower bounds have been shown for protocols with (Ω(4n/n)\Omega(4^{n}/n)) and without (Ω(4n/ϵ)\Omega(4^{n}/\epsilon)) ancillas and adaptivity, where now the distinguishing task corresponds to a net of Hamiltonians generated by Haar-random unitary transformations [BCO24]. Related is the problem of learning unitary channels, which has the tight lower bound of Ω(4n/ϵ)\Omega(4^{n}/\epsilon) queries [HKOT23]. These results can be understood as either the case of constant mm or exponentially large mm; a lower bound for the intermediate regime m=Θ(poly(n))m=\Theta(\operatorname{poly}(n)) is of clear interest. Note added: During the final preparation stage of this manuscript, a lower bound of Ω(m/ϵ1)\Omega(m/\epsilon_{1}) with respect to 1\ell_{1}-error ϵ1\epsilon_{1} was claimed by Ma, Flammia, Preskill, and Tong [MFPT24].

  4. 4.

    To what extent is our algorithm robust to state-preparation and measurement errors?

2 Background

2.1 Notation

We denote the imaginary unit by i1\mathrm{i}\equiv\sqrt{-1}. For any integer k1k\geq 1, define the set [k]{1,,k}[k]\coloneqq\{1,\ldots,k\}. The natural logarithm is denoted by log\log, and other bases will be specified when necessary. Asymptotic upper and lower bounds are denoted by 𝒪()\mathcal{O}(\cdot) and Ω()\Omega(\cdot), respectively. We write f=Θ(g)f=\Theta(g) if f=𝒪(g)f=\mathcal{O}(g) and f=Ω(g)f=\Omega(g). Tildes on big-O notation suppress polylogarithmic factors, e.g., 𝒪~(f)=𝒪(fpolylog(f))\widetilde{\mathcal{O}}(f)=\mathcal{O}(f\operatorname{polylog}(f)). When we only care that a quantity is polynomially bounded, we write poly(f,g,)=(fg)𝒪(1)\operatorname{poly}(f,g,\ldots)=(fg\cdots)^{\mathcal{O}(1)}. Generally speaking, for an object zz, we will use z^\widehat{z} to denote an estimate of it and z~\widetilde{z} to denote an approximation to it.

2.2 Linear algebra

We collect some standard facts that we will make regular use of.

Proposition 2.1 (Vector norm conversion).

Let xp(j=1d|xj|p)1/p\|x\|_{p}\coloneqq\mathopen{}\left(\sum_{j=1}^{d}|x_{j}|^{p}\right)^{1/p}\mathclose{} be the p\ell_{p}-norm of any xdx\in\mathbb{C}^{d}. For all 0<r<p0<r<p\leq\infty, these vector norms obey

xpxrd1r1pxp,\|x\|_{p}\leq\|x\|_{r}\leq d^{\frac{1}{r}-\frac{1}{p}}\|x\|_{p}, (9)

with the understanding that 1=0\frac{1}{\infty}=0.

The 2\ell_{2}-norm is particularly important, and when the context is clear, we may simply write it as x\|x\|.

Definition 2.2.

For any matrix Ad×dA\in\mathbb{C}^{d\times d}, the Schatten pp-norm Ap\|A\|_{p} is the p\ell_{p}-norm of its singular values.

As such, Schatten norms also obey the bounds of Proposition 2.1. Important matrix norms are the operator norm AA\|A\|\equiv\|A\|_{\infty} and the Frobenius norm AFA2\|A\|_{F}\equiv\|A\|_{2}, which are related by AAFdA\|A\|\leq\|A\|_{F}\leq\sqrt{d}\|A\|. Schatten norms also appear in a matrix version of Hölder’s inequality.

Proposition 2.3 (Hölder’s inequality).

Let A,Bd×dA,B\in\mathbb{C}^{d\times d}. For any 1p,q1\leq p,q\leq\infty such that 1p+1q=1\frac{1}{p}+\frac{1}{q}=1,

|tr(AB)|ApBq.|{\operatorname{tr}(A^{\dagger}B)}|\leq\|A\|_{p}\|B\|_{q}. (10)

Next we consider a Hilbert space (2)n2n(\mathbb{C}^{2})^{\otimes n}\cong\mathbb{C}^{2^{n}} of nn qubits. Let 𝕀n2n×2n\mathbb{I}_{n}\in\mathbb{C}^{2^{n}\times 2^{n}} be the nn-qubit identity operator. We will write 𝕀\mathbb{I} when the dimension is clear from context.

Definition 2.4.

The Pauli matrices are defined as

𝕀=(1001),X=(0110),Y=(0ii0),Z=(1001).\mathbb{I}=\begin{pmatrix}1&0\\ 0&1\end{pmatrix},\quad X=\begin{pmatrix}0&1\\ 1&0\end{pmatrix},\quad Y=\begin{pmatrix}0&-\mathrm{i}\\ \mathrm{i}&0\end{pmatrix},\quad Z=\begin{pmatrix}1&0\\ 0&-1\end{pmatrix}. (11)

The set of nn-qubit Pauli operators 𝒫n{𝕀,X,Y,Z}n\mathcal{P}_{n}\coloneqq\{\mathbb{I},X,Y,Z\}^{\otimes n} is a complete orthogonal operator basis for 2n\mathbb{C}^{2^{n}}, obeying the trace-orthogonality condition tr(PQ)=2nδPQ\operatorname{tr}(P^{\dagger}Q)=2^{n}\delta_{PQ} for all P,Q𝒫nP,Q\in\mathcal{P}_{n}.

As such, any nn-qubit operator AA can be written as

A=P𝒫nxPPA=\sum_{P\in\mathcal{P}_{n}}x_{P}P (12)

where xP=12ntr(PA)x_{P}=\frac{1}{2^{n}}\operatorname{tr}(P^{\dagger}A)\in\mathbb{C} are sometimes referred to as the Fourier coefficients or Pauli spectrum of AA. There is a close relation between the 2\ell_{2}-norm of these coefficients, the Frobenius norm, and the norm of Choi-like vectors of AA.

Proposition 2.5 (Fourier coefficient identities).

Let A=P𝒫nxPPA=\sum_{P\in\mathcal{P}_{n}}x_{P}P be any 2n×2n2^{n}\times 2^{n} complex matrix. Parseval’s identity states that

x2=12nAF.\|x\|_{2}=\frac{1}{\sqrt{2^{n}}}\|A\|_{F}. (13)

An important related identity is

12nAF=(A𝕀n)|Ω,\frac{1}{\sqrt{2^{n}}}\|A\|_{F}=\|(A\otimes\mathbb{I}_{n})|\Omega\rangle\|, (14)

where |Ω=12nb{0,1}n|b|b|\Omega\rangle=\frac{1}{\sqrt{2^{n}}}\sum_{b\in\{0,1\}^{n}}|b\rangle|b\rangle. This is a consequence of a more general result,

Ω|(A𝕀n)|Ω=12ntrA.\langle\Omega|(A\otimes\mathbb{I}_{n})|\Omega\rangle=\frac{1}{2^{n}}\operatorname{tr}A. (15)

We also standardize our notation for controlled unitaries.

Definition 2.6.

For any bit string b{0,1}kb\in\{0,1\}^{k}, the kk-qubit controlled variant of an nn-qubit unitary operator UU is

ctrlb(U)|bb|U+(𝕀k|bb|)𝕀n.\texttt{ctrl}_{b}(U)\coloneqq|b\rangle\!\langle b|\otimes U+(\mathbb{I}_{k}-|b\rangle\!\langle b|)\otimes\mathbb{I}_{n}. (16)

When it is unimportant to specify the precise control bits bb, or the context is otherwise clear, we will simply write ctrl(U)\texttt{ctrl}(U).

Finally, we will also work with quantum channels: completely positive, trace-preserving maps on the space of operators. The natural norm to equip quantum channels with is the diamond norm, also known as the completely bounded trace norm.

Definition 2.7.

Let :d×dd×d\mathcal{E}:\mathbb{C}^{d\times d}\to\mathbb{C}^{d\times d} be a quantum channel. The diamond norm of \mathcal{E} is

sup{(𝕀)(ρ)1:ρd2×d2,ρ11}.\|\mathcal{E}\|_{\diamond}\coloneqq\sup\{\|(\mathcal{E}\otimes\mathbb{I})(\rho)\|_{1}:\rho\in\mathbb{C}^{d^{2}\times d^{2}},\|\rho\|_{1}\leq 1\}. (17)

2.3 Block encodings

Here we collect some standard results about block encodings, namely the ability to take linear combinations of unitaries. Our presentation follows that of [GSLW19] for convenience.

Definition 2.8 ([GSLW19, Definition 43]).

Let AA be an nn-qubit operator and α,ε0\alpha,\varepsilon\geq 0. An (n+a)(n+a)-qubit unitary operator BB is called an (α,a,ε)(\alpha,a,\varepsilon)-block encoding of AA if

Aα(0a|𝕀n)B(|0a𝕀n)ε.\|A-\alpha(\langle 0^{a}|\otimes\mathbb{I}_{n})B(|0^{a}\rangle\otimes\mathbb{I}_{n})\|\leq\varepsilon. (18)

For brevity, we will say that BB is an (α,a,ε)(\alpha,a,\varepsilon)-BE. One may write an (α,a,0)(\alpha,a,0)-BE as

B=(A/α)B=\begin{pmatrix}A/\alpha&\cdot\ \\ \cdot&\cdot\ \end{pmatrix} (19)

when the unspecified blocks are unimportant.

Definition 2.9 ([GSLW19, Definition 51]).

Let yKy\in\mathbb{C}^{K} with y1β\|y\|_{1}\leq\beta, and ε0\varepsilon\geq 0. The pair of bb-qubit unitaries (𝖯𝖱𝖤𝖯L,𝖯𝖱𝖤𝖯R)(\mathsf{PREP}_{L},\mathsf{PREP}_{R}) is called a (β,b,ε)(\beta,b,\varepsilon)-state-preparation pair if

j=0K1|β(cjdj)yj|ε,\sum_{j=0}^{K-1}|\beta(c_{j}^{*}d_{j})-y_{j}|\leq\varepsilon, (20)

where cj=j|𝖯𝖱𝖤𝖯L|0bc_{j}=\langle j|\mathsf{PREP}_{L}|0^{b}\rangle and dj=j|𝖯𝖱𝖤𝖯R|0bd_{j}=\langle j|\mathsf{PREP}_{R}|0^{b}\rangle, and for all j=K,,2b1j=K,\ldots,2^{b}-1, cjdj=0c_{j}^{*}d_{j}=0.

Proposition 2.10 (Linear combination of block encodings [GSLW19, Lemma 52]).

Let A=j=0K1yjAjA=\sum_{j=0}^{K-1}y_{j}A_{j} be an nn-qubit operator, expressed as a linear combination of KK operators AjA_{j} with yKy\in\mathbb{C}^{K}. Let ε1,ε20\varepsilon_{1},\varepsilon_{2}\geq 0. For each AjA_{j}, suppose that there is an (α,a,ε2)(\alpha,a,\varepsilon_{2})-block encoding BjB_{j}. Also, let (𝖯𝖱𝖤𝖯L,𝖯𝖱𝖤𝖯R)(\mathsf{PREP}_{L},\mathsf{PREP}_{R}) be a (β,b,ε1)(\beta,b,\varepsilon_{1})-state-preparation pair for yy. Define the (n+a+b)(n+a+b)-qubit unitary

𝖲𝖤𝖫=j=0K1|jj|Bj+j=K2b1|jj|𝕀a+n.\mathsf{SEL}=\sum_{j=0}^{K-1}|j\rangle\!\langle j|\otimes B_{j}+\sum_{j=K}^{2^{b}-1}|j\rangle\!\langle j|\otimes\mathbb{I}_{a+n}. (21)

Then (𝖯𝖱𝖤𝖯L𝕀a+n)𝖲𝖤𝖫(𝖯𝖱𝖤𝖯R𝕀a+n)(\mathsf{PREP}_{L}^{\dagger}\otimes\mathbb{I}_{a+n})\mathsf{SEL}(\mathsf{PREP}_{R}\otimes\mathbb{I}_{a+n}) is an (αβ,a+b,αε1+βε2)(\alpha\beta,a+b,\alpha\varepsilon_{1}+\beta\varepsilon_{2})-block encoding of AA.555Note that a typo in [GSLW19] mistakenly carried an factor of α\alpha in the βε2\beta\varepsilon_{2} error term.

In this paper, we assume that we can produce state-preparation pairs with infinite precision. While some finite-precision machine error will be incurred in practice, this is vastly negligible to all other errors in our algorithm. Furthermore by the Solovay–Kitaev theorem, we only pay a polylogarithmic overhead for synthesizing the state-preparation pair (as well as any other quantum gates that we may need) from a finite gate set. If necessary, we trust the diligent reader to carry forth the machine precision throughout our exposition.

3 Parameter estimation from pseudo-Choi states

As the base learning algorithm, we will adapt the algorithm of [CW23], which learns the Hamiltonian from a unique resource called its pseudo-Choi state. Given access to such a resource state, their algorithm can efficiently learn any Hamiltonian supported on a polynomial number of terms, regardless of any constraints such as locality that typically hinder other approaches. In this section, we review and streamline the key components of this algorithm. Readers familiar with these results may skip ahead to Section 4.

In order to prepare the pseudo-Choi state from queries to the time evolution operator, [CW23] assumes the ability to perform controlled evolutions ctrl(e±iHt)\texttt{ctrl}(e^{\pm\mathrm{i}Ht}). Controlizing black-box unitaries is in general an impossible task [AFCB14, GST24]. However, because we have fractional-query access to U(t)U(t) through the real parameter tt, we can approximate ctrl(eiHt)\texttt{ctrl}(e^{-\mathrm{i}Ht}) to arbitrary accuracy with no overhead in the evolution time [OKSM24]. Instead, controlization only affects the time resolution and number of additional quantum gates; we carry out this analysis in Section 7. For simplicity of the present exposition, we will work with the exact controlled unitary.

3.1 State preparation from time-evolution queries

[CW23] defines the pseudo-Choi state with a reference control qubit as follows.

Definition 3.1.

Let HH be an nn-qubit Hamiltonian. The pseudo-Choi state of HH with reference666Later we will introduce a referenceless version, which is a 2n2n-qubit state. is the (2n+1)(2n+1)-qubit state

|Ψref(H)(H𝕀n)|Ω|0+|Ω|1𝒩(H),|\Psi_{\mathrm{ref}}(H)\rangle\coloneqq\frac{(H\otimes\mathbb{I}_{n})|\Omega\rangle|0\rangle+|\Omega\rangle|1\rangle}{\mathcal{N}(H)}, (22)

where |Ω=12nb{0,1}n|b|b|\Omega\rangle=\frac{1}{\sqrt{2^{n}}}\sum_{b\in\{0,1\}^{n}}|b\rangle|b\rangle and 𝒩(H)=12nHF2+1\mathcal{N}(H)=\sqrt{\frac{1}{2^{n}}\|H\|_{F}^{2}+1} is a normalization factor.

The reference here will be vital for deducing the Hamiltonian parameters from measurements on the state. Observe that since HF2nH\|H\|_{F}\leq\sqrt{2^{n}}\|H\|, it follows that 𝒩(H)H+1\mathcal{N}(H)\leq\|H\|+1.

The actual pseudo-Choi state that we consider will be

|Ψref(H/Δ)=(H/Δ𝕀n)|Ω|0+|Ω|1𝒩(H/Δ),|\Psi_{\mathrm{ref}}(H/\Delta)\rangle=\frac{(H/\Delta\otimes\mathbb{I}_{n})|\Omega\rangle|0\rangle+|\Omega\rangle|1\rangle}{\mathcal{N}(H/\Delta)}, (23)

where Δ2H\Delta\geq 2\|H\| is a normalization on the Hamiltonian. This normalization is invoked because the pseudo-Choi states will be prepared from a block encoding of the Hamiltonian. Block encoded matrices must have norm at most 11 to ensure unitarity of the encoding operator; the further factor of 22 ensures that the QSVT used to construct this block encoding is also properly normalized. [LC17, GSLW19] show how to block encode HH from forward- and reverse-time queries to its time evolution operator. We opt to restate the formulation from [GSLW19] here.

Proposition 3.2 ([GSLW19, Corollary 71]).

Let ε(0,π4]\varepsilon\in(0,\frac{\pi}{4}]. Suppose that U=eiH/ΔU=e^{-\mathrm{i}H/\Delta} for some Δ2H\Delta\geq 2\|H\|. There exists a (π2,2,ε)(\frac{\pi}{2},2,\varepsilon)-BE of H/ΔH/\Delta, constructed from 𝒪(log(1/ε))\mathcal{O}(\log(1/\varepsilon)) queries to ctrl(U)\texttt{ctrl}(U) and ctrl(U)\texttt{ctrl}(U^{\dagger}), and 𝒪(log(1/ε))\mathcal{O}(\log(1/\varepsilon)) additional quantum gates.

Proof sketch..

We outline the proof idea for completeness and to clarify some details. We begin with the observation that

ctrl(U)(Y𝕀n)ctrl(U)=(sin(H/Δ)),\texttt{ctrl}(U)(Y\otimes\mathbb{I}_{n})\texttt{ctrl}(U^{\dagger})=\begin{pmatrix}\sin(H/\Delta)&\cdot\ \\ \cdot&\cdot\ \end{pmatrix}, (24)

where the matrix representation on the rhs is in the |±|\pm\rangle basis. This is a (1,1,0)(1,1,0)-BE of sin(H/Δ)\sin(H/\Delta). Then, [GSLW19, Lemma 70] shows that there exists a polynomial P(x)P(x) which approximates the arcsin(x)\arcsin(x) function on the domain x[12,12]x\in[-\frac{1}{2},\frac{1}{2}]. Specifically, for any ε(0,12]\varepsilon^{\prime}\in(0,\frac{1}{2}], P(x)P(x) is a polynomial of odd degree d=𝒪(log(1/ε))d=\mathcal{O}(\log(1/\varepsilon^{\prime})) obeying

max12x12|P(x)2πarcsin(x)|ε\max_{-\frac{1}{2}\leq x\leq\frac{1}{2}}\mathopen{}\left|P(x)-\frac{2}{\pi}\arcsin(x)\right|\mathclose{}\leq\varepsilon^{\prime} (25)

and max12x12|P(x)|1\max_{-\frac{1}{2}\leq x\leq\frac{1}{2}}|P(x)|\leq 1. Thus, quantum signal processing or the quantum singular value transformation (on one ancilla qubit) can be applied to encode P(sin(H/Δ))P(\sin(H/\Delta)). A polynomial transformation of degree dd requires dd queries to UU and UU^{\dagger} and 𝒪(d)\mathcal{O}(d) additional quantum gates. This produces a (π2,2,π2ε)(\frac{\pi}{2},2,\frac{\pi}{2}\varepsilon^{\prime})-BE of H/ΔH/\Delta; letting ε=π2ε\varepsilon=\frac{\pi}{2}\varepsilon^{\prime} furnishes the claim. ∎

From this block encoding, [CW23] shows that a pseudo-Choi state of an approximation to H/ΔH/\Delta can be prepared with probability greater than 12\frac{1}{2}.

Proposition 3.3 ([CW23, Lemma 19]).

Let HH be an nn-qubit Hamiltonian and Δ2H\Delta\geq 2\|H\|. Let ε(0,π4]\varepsilon\in(0,\frac{\pi}{4}]. With success probability at least 12\frac{1}{2}, we can prepare a copy of |Ψref(H~Δπ/2)|\Psi_{\mathrm{ref}}(\frac{\widetilde{H}}{\Delta\pi/2})\rangle, where H~\widetilde{H} is a Hermitian matrix obeying

HH~εΔ.\|H-\widetilde{H}\|\leq\varepsilon\Delta. (26)

This procedure uses 𝒪(log(1/ε))\mathcal{O}(\log(1/\varepsilon)) queries to U=eiH/ΔU=e^{-\mathrm{i}H/\Delta}, its inverse, and their multi-controlled forms.

To produce NN copies of the state, 𝒪(N)\mathcal{O}(N) queries to the block encoding suffice. This is a standard result derived from bounding the tail of a sequence of i.i.d. Bernoulli trials. Following the argument given in [CW23, Appendix B.6], we prove the statement below in Appendix A for completeness.

Proposition 3.4.

Let QN1Q\geq N\geq 1 be integers. Let |ψ|\psi\rangle be a state which is probabilistically prepared from a block encoding BB, with success probability p(0,1]p\in(0,1]. With high probability, we can prepare NN copies of |ψ|\psi\rangle from

Q=Θ(N/p)Q=\Theta(N/p) (27)

queries to BB.

3.2 Estimating with decoding operators

Recall the Pauli decomposition of H=a=1mλaEaH=\sum_{a=1}^{m}\lambda_{a}E_{a}. In parallel, we define

λ~a=12ntr(EaH~),\widetilde{\lambda}_{a}=\frac{1}{2^{n}}\operatorname{tr}(E_{a}\widetilde{H}), (28)

which are the parameters in the block-encoded approximation. For each a[m]a\in[m], [CW23] defines a set of so-called “decoding operators” on the 2n+12n+1 qubits:

Oa=(Ea𝕀n)|ΩΩ||01|,a=1,,m,O𝒩=|ΩΩ||11|.\begin{split}O_{a}&=(E_{a}\otimes\mathbb{I}_{n})|\Omega\rangle\!\langle\Omega|\otimes|0\rangle\!\langle 1|,\quad a=1,\ldots,m,\\ O_{\mathcal{N}}&=|\Omega\rangle\!\langle\Omega|\otimes|1\rangle\!\langle 1|.\end{split} (29)

These operators are defined such that their mean values with respect to |Ψref(H~Δπ/2)|\Psi_{\mathrm{ref}}(\frac{\widetilde{H}}{\Delta\pi/2})\rangle recover λ~a\widetilde{\lambda}_{a}, up to normalizations:

Oa=λ~a𝒩(H~Δπ/2)2Δπ/2.\langle O_{a}\rangle=\frac{\widetilde{\lambda}_{a}}{\mathcal{N}\mathopen{}\left(\frac{\widetilde{H}}{\Delta\pi/2}\right)^{2}\mathclose{}\Delta\pi/2}. (30)

While Δ\Delta is chosen by the learner, 𝒩(H~Δπ/2)\mathcal{N}(\frac{\widetilde{H}}{\Delta\pi/2}) is unknown. To access this quantity, O𝒩O_{\mathcal{N}} is used:

O𝒩=𝒩(H~Δπ/2)2.\langle O_{\mathcal{N}}\rangle=\mathcal{N}\mathopen{}\left(\frac{\widetilde{H}}{\Delta\pi/2}\right)^{-2}\mathclose{}. (31)

Note that if we had not prepared the pseudo-Choi state with the reference control qubit, this normalization would have been more difficult to infer.

The classical shadows protocol [HKP20] over the Clifford group Cl(22n+1)\mathrm{Cl}(2^{2n+1}) can be used to estimate all of these mean values with very efficient copy complexity. In brief, the shadow tomography protocol involves applying a random Clifford circuit to each copy of the state and then measuring in the standard basis. After collecting a sufficient number of samples, the measurement data is then classically postprocessed by a linear-inversion estimator which converges to the mean with strong theoretical guarantees. A major selling point of the technique is that, in terms of the explicit dependence on the number of observables, the copy complexity only scales logarithmically.

Let o^j\widehat{o}_{j} be the classical-shadows estimator for Oj\langle O_{j}\rangle. [CW23] showed that their variances under Clifford shadow tomography are bounded by a constant:

maxj[m]{𝒩}Var[o^j]6.\max_{j\in[m]\cup\{\mathcal{N}\}}\operatorname{Var}[\widehat{o}_{j}]\leq 6. (32)

However, the estimate λ^a\widehat{\lambda}_{a} of λ~a\widetilde{\lambda}_{a} is a ratio of two point estimators,

λ^aπΔ2Re[o^a]Re[o^𝒩],\widehat{\lambda}_{a}\coloneqq\frac{\pi\Delta}{2}\frac{\operatorname{Re}[\widehat{o}_{a}]}{\operatorname{Re}[\widehat{o}_{\mathcal{N}}]}, (33)

which requires more than just bounding the variance to control its error. The analysis of [CW23] shows that learning each Oj\langle O_{j}\rangle up to precision εshadow=Ω(γ/Δ)\varepsilon_{\mathrm{shadow}}=\Omega(\gamma/\Delta) suffices to achieve precision γ\gamma in λ^a\widehat{\lambda}_{a}. Plugging this into the sample complexity for classical shadows [HKP20], they find that

𝒪(maxjVar[o^j]log(m/δ)εshadow2)=𝒪(Δ2log(m/δ)γ2)\mathcal{O}\mathopen{}\left(\frac{\max_{j}\operatorname{Var}[\widehat{o}_{j}]\log(m/\delta)}{\varepsilon_{\mathrm{shadow}}^{2}}\right)\mathclose{}=\mathcal{O}\mathopen{}\left(\frac{\Delta^{2}\log(m/\delta)}{\gamma^{2}}\right)\mathclose{} (34)

copies of the pseudo-Choi state suffice to produce an estimate λ^m\widehat{\lambda}\in\mathbb{R}^{m} such that

Pr(λ~λ^γ)1δ.\Pr(\|\widetilde{\lambda}-\widehat{\lambda}\|_{\infty}\leq\gamma)\geq 1-\delta. (35)

Note that the original analysis [CW23, Proposition 15] uses a propagation of uncertainty argument under a linear approximation; a more rigorous analysis by the Taylor expansion of the random variable o^a/o^𝒩\widehat{o}_{a}/\widehat{o}_{\mathcal{N}} leads to the same conclusion (up to constant factors).

Finally, one must translate this estimate of H~\widetilde{H} to an estimate of HH. By Hölder’s inequality, each parameter has systematic error at most

|λaλ~a|=12n|tr(Ea(HH~))|12nEa1HH~=HH~.\begin{split}|\lambda_{a}-\widetilde{\lambda}_{a}|&=\frac{1}{2^{n}}|{\operatorname{tr}(E_{a}(H-\widetilde{H}))}|\\ &\leq\frac{1}{2^{n}}\|E_{a}\|_{1}\|H-\widetilde{H}\|\\ &=\|H-\widetilde{H}\|.\end{split} (36)

Recall from Proposition 3.3 that HH~εΔ\|H-\widetilde{H}\|\leq\varepsilon\Delta. Meanwhile, the statistical error is controlled by the shadow tomography analysis, which guarantees λ~λ^γ\|\widetilde{\lambda}-\widehat{\lambda}\|_{\infty}\leq\gamma with a sufficient number of measurements. Altogether, by a triangle inequality the total error is

λλ^λλ~+λ~λ^𝒪(εΔ)+γ.\begin{split}\|\lambda-\widehat{\lambda}\|_{\infty}&\leq\|\lambda-\widetilde{\lambda}\|_{\infty}+\|\widetilde{\lambda}-\widehat{\lambda}\|_{\infty}\\ &\leq\mathcal{O}(\varepsilon\Delta)+\gamma.\end{split} (37)

To bound this by ϵ\epsilon, we take 𝒪(log(Δ/ϵ))\mathcal{O}(\log(\Delta/\epsilon)) queries to the controlled time evolution to generate a sufficiently accurate copy of the pseudo-Choi state. Because the probability of successful preparation is 12\geq\frac{1}{2}, only a constant number of repetitions are required to generate a copy from the block encoding (Proposition 3.4). This bounds the first term of Eq. 37 by, say, ϵ/2\epsilon/2. Consuming 𝒪~(Δ2/ϵ2)\widetilde{\mathcal{O}}(\Delta^{2}/\epsilon^{2}) such copies via Clifford shadows bounds the second term, also by ϵ/2\epsilon/2. The total number of queries is therefore 𝒪~(Δ2/ϵ2)\widetilde{\mathcal{O}}(\Delta^{2}/\epsilon^{2}), and since each query runs for time 1/Δ1/\Delta, the total evolution time is 𝒪~(Δ/ϵ2)\widetilde{\mathcal{O}}(\Delta/\epsilon^{2}). This is one of the main results of [CW23].

Proposition 3.5 ([CW23, Theorem 26, modified for \ell_{\infty}-error]).

Let H=a=1mλaEaH=\sum_{a=1}^{m}\lambda_{a}E_{a} be an unknown Hamiltonian and let Δ2H\Delta\geq 2\|H\|. Assuming access to U(t)=eiHtU(t)=e^{-\mathrm{i}Ht}, U(t)=U(t)U(t)^{\dagger}=U(-t), and their multi-controlled forms, there exists an algorithm which can learn each λa\lambda_{a} to additive error ϵ\epsilon. The total number of queries to the time evolution operator is 𝒪~(Δ2/ϵ2)\widetilde{\mathcal{O}}(\Delta^{2}/\epsilon^{2}), the total evolution time is 𝒪~(Δ/ϵ2)\widetilde{\mathcal{O}}(\Delta/\epsilon^{2}), and the minimum time resolution is Ω(1/Δ)\Omega(1/\Delta). The amount of classical and quantum computation scales at most polynomially in all parameters.

Remark 3.6.

This algorithm can perform structure learning of arbitrary Hamiltonians, with some important caveats. First, suppose the pool of possible Hamiltonian terms is restricted to some polynomially sized set. In that case, shadow tomography gives sufficient samples to estimate all the possible terms, and the structure can be learned by selecting only those which have large magnitude. For example, the set of kk-local terms has size 𝒪(nk)\mathcal{O}(n^{k}), and this is the setting originally considered in [CW23]. Second, if one allows an exponential amount of classical computation, then the structure can be learned by brute-force processing the shadows over all 4n14^{n}-1 Pauli operators. Although computationally inefficient, the query complexity is only increased by a factor of 𝒪(logm)=𝒪(log4n)=𝒪(n)\mathcal{O}(\log m)=\mathcal{O}(\log 4^{n})=\mathcal{O}(n) for the shadow tomography step. [Car24] uses a similar idea that is also computationally inefficient in this setting for the same reasons.

4 Identifying Pauli spectra with pseudo-Choi states

In this section, we show how to identify the structure {Ea:|λa|ϵ}\{E_{a}:|\lambda_{a}|\geq\epsilon\} of H=a=1mλaEaH=\sum_{a=1}^{m}\lambda_{a}E_{a}, using a total evolution time of 𝒪~(Δ/ϵ2)\widetilde{\mathcal{O}}(\Delta/\epsilon^{2}). In the literature, this problem is typically encountered as learning where the Pauli spectrum of a unitary operator has large mass [MO10, Ang23]. Here, we instead want to identify the Pauli spectrum of a non-unitary operator, HH. The pseudo-Choi state provides us a convenient resource to accomplish this task.

To deduce the identity of the unknown terms, the key idea is to sample from the referenceless pseudo-Choi state of HH. This state can be prepared by essentially the same techniques as for the pseudo-Choi state with reference, described in Section 3. The only changes are that we do not need to control the block encoding, and that we have different guarantees on the success probability.

First, let us define these states.

Definition 4.1.

Let HH be an nn-qubit Hamiltonian. The referenceless pseudo-Choi state of HH is the 2n2n-qubit state

|Ψ(H)(H𝕀n)|Ω(H𝕀n)|Ω,|\Psi(H)\rangle\coloneqq\frac{(H\otimes\mathbb{I}_{n})|\Omega\rangle}{\|(H\otimes\mathbb{I}_{n})|\Omega\rangle\|}, (38)

where |Ω=12nb{0,1}n|b|b|\Omega\rangle=\frac{1}{\sqrt{2^{n}}}\sum_{b\in\{0,1\}^{n}}|b\rangle|b\rangle and (H𝕀n)|Ω=12nHF\|(H\otimes\mathbb{I}_{n})|\Omega\rangle\|=\frac{1}{\sqrt{2^{n}}}\|H\|_{F}.

Unlike the pseudo-Choi state with reference, this referenceless variant is invariant under rescaling HsHH\mapsto sH for any s{0}s\in\mathbb{C}\setminus\{0\}. Let us briefly outline a method for preparing this state if one has access to both positive- and negative-time evolutions. Without negative-time dynamics, a different approach must be taken to block encode the Hamiltonian. This results in an entirely different query complexity; see Section 6 for that setting.

Lemma 4.2.

Let H=a=1mλaEaH=\sum_{a=1}^{m}\lambda_{a}E_{a} be an nn-qubit Hamiltonian, Δ2H\Delta\geq 2\|H\|, and ε>0\varepsilon>0. With success probability Θ(λ~/Δ2)\Theta(\|\widetilde{\lambda}/\Delta\|^{2}), we can prepare a copy of |Ψ(H~)|\Psi(\widetilde{H})\rangle, where H~\widetilde{H} is a Hermitian matrix obeying

HH~ε\|H-\widetilde{H}\|\leq\varepsilon (39)

and λ~a=12ntr(EaH~)\widetilde{\lambda}_{a}=\frac{1}{2^{n}}\operatorname{tr}(E_{a}\widetilde{H}). This procedure uses 𝒪(log(Δ/ε))\mathcal{O}(\log(\Delta/\varepsilon)) queries to U=eiH/ΔU=e^{-\mathrm{i}H/\Delta}, its inverse, and their controlled forms.

Proof.

The preparation of this state is similar to that of the variant with reference, see Proposition 3.3. The main difference here is that we do not need a controlled form of the block encoding. Let BB be the (π2,2,0)(\frac{\pi}{2},2,0)-BE of H~/Δ\widetilde{H}/\Delta produced from 𝒪(log(Δ/ε))\mathcal{O}(\log(\Delta/\varepsilon)) queries, where HH~ε\|H-\widetilde{H}\|\leq\varepsilon. Applying B𝕀nB\otimes\mathbb{I}_{n} to |00|Ω|00\rangle|\Omega\rangle and measuring the first two qubits, if we obtain |00|00\rangle in that register then we have successfully prepared |Ψ(H~)|\Psi(\widetilde{H})\rangle in the remaining 2n2n qubits. This occurs with probability

Pr(00)=(H~Δπ/2𝕀n)|Ω2=(2π)2H~/ΔF22n=(2π)2λ~/Δ2.\begin{split}\Pr(00)&=\mathopen{}\left\|\mathopen{}\left(\frac{\widetilde{H}}{\Delta\pi/2}\otimes\mathbb{I}_{n}\right)\mathclose{}|\Omega\rangle\right\|^{2}\mathclose{}\\ &=\mathopen{}\left(\frac{2}{\pi}\right)^{2}\mathclose{}\frac{\|\widetilde{H}/\Delta\|^{2}_{F}}{2^{n}}\\ &=\mathopen{}\left(\frac{2}{\pi}\right)^{2}\mathclose{}\|\widetilde{\lambda}/\Delta\|^{2}.\end{split} (40)

Let us remark briefly about the applicability of this result. In the context of the bootstrap framework, we will apply Lemma 4.2 in the first step where H^0=0\widehat{H}_{0}=0. Once we have a nontrivial estimate H^j0\widehat{H}_{j}\neq 0, we synthesize the residual Hamiltonian HH^jH-\widehat{H}_{j} and learn from that instead. In that case, the proof argument is the same but the constant (2/π)2(2/\pi)^{2} becomes 1/41/4.

Returning to the present setting, note that the block-encoded approximation H~=a=1mλ~aEa\widetilde{H}=\sum_{a=1}^{m^{\prime}}\widetilde{\lambda}_{a}E_{a} may have (mm)(m^{\prime}-m) potentially superfluous error terms. For some γ(0,1)\gamma\in(0,1), define the set

Sγ(H~){Ea:|λ~a|>γ}.S_{\gamma}(\widetilde{H})\coloneqq\{E_{a}:|\widetilde{\lambda}_{a}|>\gamma\}. (41)

This labels the terms in H~\widetilde{H} that have weight more than γ\gamma. It suffices only to learn the terms in Sγ(H~)S_{\gamma}(\widetilde{H}) because, if EaSγ(H~)E_{a}\notin S_{\gamma}(\widetilde{H}), then λ^a=0\widehat{\lambda}_{a}=0 is a γ\gamma-accurate estimate of λ~a\widetilde{\lambda}_{a}. For small enough γ\gamma, such terms can safely be ignored.

In order to learn this set, we will appeal to the technique of Bell sampling. In our case, we sample from the pseudo-Choi state (instead of two copies of an nn-qubit state, as was introduced in [Mon17]).

Lemma 4.3.

Bell measurements across the bipartition of the 2n2n-qubit state |Ψ(H~)|\Psi(\widetilde{H})\rangle sample from the distribution

𝒟={Pr(Ea)=|λ~a|22:a[m]},\mathcal{D}=\mathopen{}\left\{\Pr(E_{a})=\frac{|\widetilde{\lambda}_{a}|^{2}}{\ell^{2}}:a\in[m^{\prime}]\right\}\mathclose{}, (42)

where =λ~\ell=\|\widetilde{\lambda}\|.

Proof.

Bell measurements, implemented by a depth-22 Clifford circuit, sample from the basis {|P=(P𝕀n)|Ω:P𝒫n}\{|P\rangle=(P\otimes\mathbb{I}_{n})|\Omega\rangle:P\in\mathcal{P}_{n}\}. On the state |Ψ(H~)|\Psi(\widetilde{H})\rangle, this induces the probability distribution

|P|Ψ(H~)|2=|Ω|(PH~)|Ω|2(H~𝕀n)|Ω2=|12ntr(PH~)|22.\begin{split}|\langle P|\Psi(\widetilde{H})\rangle|^{2}&=\frac{|\langle\Omega|(P^{\dagger}\otimes\widetilde{H})|\Omega\rangle|^{2}}{\|(\widetilde{H}\otimes\mathbb{I}_{n})|\Omega\rangle\|^{2}}\\ &=\frac{|\frac{1}{2^{n}}\operatorname{tr}(P^{\dagger}\widetilde{H})|^{2}}{\ell^{2}}.\end{split} (43)

We have |12ntr(PH~)|2=|λ~a|2|\frac{1}{2^{n}}\operatorname{tr}(P^{\dagger}\widetilde{H})|^{2}=|\widetilde{\lambda}_{a}|^{2} if P=EaP=E_{a}, and 0 otherwise. ∎

Our goal then is to draw samples from 𝒟\mathcal{D} until we see all elements of Sγ(H~)S_{\gamma}(\widetilde{H}). The following lemma bounds the number of Bell measurements sufficient to accomplish this task with high probability.

Lemma 4.4.

Let γ,δ(0,1)\gamma,\delta\in(0,1). Let H~\widetilde{H} be a Hamiltonian and suppose we have access to copies of its referenceless pseudo-Choi state. With probability at least 1δ1-\delta, taking 𝒪(λ~2log(|Sγ(H~)|/δ)/γ2)\mathcal{O}(\|\widetilde{\lambda}\|^{2}\log(|S_{\gamma}(\widetilde{H})|/\delta)/\gamma^{2}) Bell measurements of |Ψ(H~)|\Psi(\widetilde{H})\rangle suffices to observe all elements of Sγ(H~)S_{\gamma}(\widetilde{H}).

Proof.

This problem is a variant of the classic coupon collector problem: given a distribution 𝒟\mathcal{D} over mm^{\prime} “coupons,” how many draws does it take until we have collected at least one of each coupon? Our bound follows from a modification of [Shi07, Lemma 12]. In particular, we relax the task by not aiming to collect the rare coupons (i.e., those outside of Sγ(H~)S_{\gamma}(\widetilde{H})).

Let XγX_{\gamma} be the random variable defined as the number of draws until all coupons from Sγ(H~)S_{\gamma}(\widetilde{H}) are obtained. For each integer N>0N>0, define Aa(N)A_{a}(N) as the event that, after drawing NN coupons, none of them are EaE_{a}. The total probability that after NN draws, we have yet to obtain all coupons (from Sγ(H~)S_{\gamma}(\widetilde{H})), can be estimated by a union bound:

Pr(Xγ>N)=Pr(aSγ(H~)Aa(N))aSγ(H~)Pr(Aa(N))=aSγ(H~)[1Pr(Ea)]N.\Pr(X_{\gamma}>N)=\Pr\mathopen{}\left(\bigcup_{a\in S_{\gamma}(\widetilde{H})}A_{a}(N)\right)\mathclose{}\leq\sum_{a\in S_{\gamma}(\widetilde{H})}\Pr(A_{a}(N))=\sum_{a\in S_{\gamma}(\widetilde{H})}[1-\Pr(E_{a})]^{N}. (44)

Let =λ~2\ell=\|\widetilde{\lambda}\|^{2}. For each EaSγ(H~)E_{a}\in S_{\gamma}(\widetilde{H}), we have that Pr(Ea)=|λ~a|2/2>γ2/2\Pr(E_{a})=|\widetilde{\lambda}_{a}|^{2}/\ell^{2}>\gamma^{2}/\ell^{2}. The inequality 1xex1-x\leq e^{-x} then gets

Pr(Xγ>N)<|Sγ(H~)|exp(Nγ22).\Pr(X_{\gamma}>N)<|S_{\gamma}(\widetilde{H})|\exp\mathopen{}\left(-\frac{N\gamma^{2}}{\ell^{2}}\right)\mathclose{}. (45)

Setting this bound as the maximum failure probability δ\delta and solving for NN proves the claim. ∎

Remark 4.5.

The Bell sampling algorithm can also discover elements EaSγ(H~)E_{a^{\prime}}\notin S_{\gamma}(\widetilde{H}). At this current stage in the algorithm, we do not have enough information to determine inclusion or exclusion, so we store all elements observed. As long as we have at least a superset of Sγ(H~)S_{\gamma}(\widetilde{H}), we say that the structure learning algorithm has succeeded. This is fine because in the next stage, shadow tomography can easily estimate an overcomplete set of terms with only logarithmic cost to the sample complexity, per Eq. 34. Any terms that are too small can then be discarded. And because we only ever make a polynomial number of queries, we can never pick up more than a polynomial number of superfluous terms.

The last piece we need is a conversion from learning Sγ(H~)S_{\gamma}(\widetilde{H}) to learning Sϵ(H)S_{\epsilon}(H), given that HH~ε\|H-\widetilde{H}\|\leq\varepsilon.

Lemma 4.6.

Let ε,ϵ(0,1)\varepsilon,\epsilon\in(0,1). Let H,H~H,\widetilde{H} be two Hamiltonians such that

H\displaystyle H =a=1mλaEa,\displaystyle=\sum_{a=1}^{m}\lambda_{a}E_{a}, (46)
H~\displaystyle\widetilde{H} =a=1mλ~aEa+a=m+1mλ~aEa,\displaystyle=\sum_{a=1}^{m}\widetilde{\lambda}_{a}E_{a}+\sum_{a^{\prime}=m+1}^{m^{\prime}}\widetilde{\lambda}_{a^{\prime}}E_{a^{\prime}}, (47)

where mmm^{\prime}\geq m. Suppose HH~ε\|H-\widetilde{H}\|\leq\varepsilon. Then Sϵ(H)Sϵ+ε(H~)S_{\epsilon}(H)\subseteq S_{\epsilon+\varepsilon}(\widetilde{H}), and furthermore, EaSϵ+ε(H~)E_{a^{\prime}}\notin S_{\epsilon+\varepsilon}(\widetilde{H}) for all a=m+1,m+2,,ma^{\prime}=m+1,m+2,\ldots,m^{\prime}.

Proof.

For all EaSϵ+ε(H~)E_{a}\in S_{\epsilon+\varepsilon}(\widetilde{H}) we have |λ~a|>ϵ+ε|\widetilde{\lambda}_{a}|>\epsilon+\varepsilon. By a triangle inequality and Hölder’s inequality (Eq. 36), it holds that |λ~a||λa|+ε|\widetilde{\lambda}_{a}|\leq|\lambda_{a}|+\varepsilon. Therefore all such aa also correspond to |λa|>ϵ|\lambda_{a}|>\epsilon, which characterizes the set Sϵ(H)S_{\epsilon}(H). This demonstrates the first claim.

For the second claim, consider any a{m+1,m+2,,m}a^{\prime}\in\{m+1,m+2,\ldots,m^{\prime}\}. Because HH has no support on such terms, λa=0\lambda_{a^{\prime}}=0 and so |λ~a|ε|\widetilde{\lambda}_{a^{\prime}}|\leq\varepsilon. But since ϵ>0\epsilon>0, this magnitude is strictly less than ε+ϵ\varepsilon+\epsilon. Hence all such λ~a\widetilde{\lambda}_{a^{\prime}} are too small to lie in Sϵ+ε(H~)S_{\epsilon+\varepsilon}(\widetilde{H}). ∎

Combining Lemmas 4.2, 4.4 and 4.6, we arrive at the main result of this section.

Theorem 4.7.

Let H=a=1mλaEaH=\sum_{a=1}^{m}\lambda_{a}E_{a} be an unknown nn-qubit Hamiltonian, Δ2H\Delta\geq 2\|H\|, and ϵ>0\epsilon>0. Define the set Sϵ(H)={Ea:|λa|>ϵ}S_{\epsilon}(H)=\{E_{a}:|\lambda_{a}|>\epsilon\}. There exists an algorithm which learns, with high probability, at least all the elements of Sϵ(H)S_{\epsilon}(H). The algorithm makes

Q=𝒪(Δ2log(m)log(Δ/ϵ)ϵ2).Q=\mathcal{O}\mathopen{}\left(\frac{\Delta^{2}\log(m)\log(\Delta/\epsilon)}{\epsilon^{2}}\right)\mathclose{}. (48)

queries to ctrl(e±iH/Δ)\texttt{ctrl}(e^{\pm\mathrm{i}H/\Delta}) and uses poly(n,m,1/ϵ)\operatorname{poly}(n,m,1/\epsilon) quantum and classical computation.

Proof.

Lemma 4.4 tell us that, with high probability,

N=𝒪(λ~2log|Sγ(H~)|γ2)N=\mathcal{O}\mathopen{}\left(\frac{\|\widetilde{\lambda}\|^{2}\log|S_{\gamma}(\widetilde{H})|}{\gamma^{2}}\right)\mathclose{} (49)

copies of |Ψ(H~)|\Psi(\widetilde{H})\rangle suffice to learn Sγ(H~)S_{\gamma}(\widetilde{H}) via Bell measurements. Meanwhile, Lemma 4.2 tells us that the probability of successfully generating a single copy is

p=Θ(λ~2Δ2).p=\Theta\mathopen{}\left(\frac{\|\widetilde{\lambda}\|^{2}}{\Delta^{2}}\right)\mathclose{}. (50)

By Proposition 3.4, we need to take Q=𝒪(N/p)Q^{\prime}=\mathcal{O}(N/p) trials to produce the NN copies. Therefore from

Q=𝒪(Δ2log|Sγ(H~)|γ2)Q^{\prime}=\mathcal{O}\mathopen{}\left(\frac{\Delta^{2}\log|S_{\gamma}(\widetilde{H})|}{\gamma^{2}}\right)\mathclose{} (51)

queries to the block encoding of H~/Δ\widetilde{H}/\Delta, or equivalently Q=𝒪(Qlog(Δ/ε))Q=\mathcal{O}(Q^{\prime}\log(\Delta/\varepsilon)) queries to the time evolution operator, we learn the set Sγ(H~)S_{\gamma}(\widetilde{H}) with high probability. Finally by Lemma 4.6, setting γ=ϵ+ε\gamma=\epsilon+\varepsilon ensures that the set Sϵ+ε(H~)S_{\epsilon+\varepsilon}(\widetilde{H}) that we have learned contains Sϵ(H)S_{\epsilon}(H) as a subset. Lemma 4.6 also assures us that |Sϵ+ε(H~)|m|S_{\epsilon+\varepsilon}(\widetilde{H})|\leq m since it cannot contain any superfluous terms. Taking ε=Θ(ϵ)\varepsilon=\Theta(\epsilon) sufficiently small concludes the proof. ∎

Comparing this result to Proposition 3.5, we find remarkably that the query and evolution-time complexity of both parameter learning and structure learning from pseudo-Choi states are identical. Therefore even without invoking the precision bootstrap, we have demonstrated an algorithm that can perform structure learning on a completely arbitrary nn-qubit Hamiltonian, using 𝒪~(H/ϵ2)\widetilde{\mathcal{O}}(\|H\|/\epsilon^{2}) total evolution time.

To conclude this section, we make a remark regarding a potential constant shift in HH.

Remark 4.8 (On traceful Hamiltonians).

If HH is not traceless, then the identity coefficient (which we are typically not interested in learning) may wash out the distribution. Asymptotically, this only slows down the learning procedure by a constant factor. However, we have a nicer guarantee using the approximate controlization described in Remark 7.2. Those techniques, which we use to prepare the psuedo-Choi states, will automatically extract the traceless part of HH. Hence moving forward we may suppose trH=0\operatorname{tr}H=0 without loss of generality.

5 Learning with the residual Hamiltonian

Let H^=aλ^aEa\widehat{H}=\sum_{a}\widehat{\lambda}_{a}E_{a} be the current best estimate of HH, satisfying λλ^η\|\lambda-\widehat{\lambda}\|_{\infty}\leq\eta. The key to the precision bootstrap is the ability to learn from R/ηR/\eta, where

RHH^R\coloneqq H-\widehat{H} (52)

is the “residual” Hamiltonian. By scaling up the residual by a factor of 1/η1/\eta, it suffices to learn R/ηR/\eta with merely constant precision ϵ012\epsilon_{0}\leq\frac{1}{2}. In this section, we will show how to block encode R/ηR/\eta and prepare its pseudo-Choi states. This will give us the majority of the proof of Theorem 1.5.

5.1 Block encoding by spectral amplification

Recall that Proposition 3.2 gave us a (π2,2,ε)(\frac{\pi}{2},2,\varepsilon)-BE of H/ΔH/\Delta from queries to its time evolution, provided that Δ2H\Delta\geq 2\|H\|. To make a concrete choice, since Hm\|H\|\leq m we shall fix

Δ=2m\Delta=2m (53)

here. This will guarantee that all the operators we aim to block encode have are sufficiently normalized.

Next, since our description of H^/Δ\widehat{H}/\Delta is already an LCU, we can synthesize its block encoding with standard techniques (Proposition 2.10). However, a technicality we need to address is to match normalizations between the two block encodings before we take their difference.777Alternatively we could make the appropriate choice of nonuniform coefficients in the linear combination; we opt for the approach as presented to simplify the exposition.

Lemma 5.1.

Let H^=a=1mλ^aEa\widehat{H}=\sum_{a=1}^{m}\widehat{\lambda}_{a}E_{a} and Δ=2m\Delta=2m. There exists a (π2,log2m+1,0)(\frac{\pi}{2},\lceil\log_{2}m\rceil+1,0)-BE of H^/Δ\widehat{H}/\Delta.

Proof.

Let B^\widehat{B}^{\prime} be the (λ^1,log2m,0)(\|\widehat{\lambda}\|_{1},\lceil\log_{2}m\rceil,0)-BE of H^\widehat{H} as given by Proposition 2.10. We aim to convert the subnormalization on the block-encoded matrix according to

B^=(H^λ^1)(H^Δπ/2).\widehat{B}^{\prime}=\begin{pmatrix}\frac{\widehat{H}}{\|\widehat{\lambda}\|_{1}}&\cdot\ \\ \cdot&\cdot\ \end{pmatrix}\mapsto\begin{pmatrix}\frac{\widehat{H}}{\Delta\pi/2}&\cdot\ \\ \cdot&\cdot\ \end{pmatrix}. (54)

Define single-qubit the rotation

Ry(θ)=(cos(θ2)sin(θ2)sin(θ2)cos(θ2)).R_{y}(\theta)=\begin{pmatrix}\cos\mathopen{}\left(\frac{\theta}{2}\right)\mathclose{}&-\sin\mathopen{}\left(\frac{\theta}{2}\right)\mathclose{}\\ \sin\mathopen{}\left(\frac{\theta}{2}\right)\mathclose{}&\cos\mathopen{}\left(\frac{\theta}{2}\right)\mathclose{}\end{pmatrix}. (55)

Set θ=2arccos(2λ^1πΔ)\theta=2\arccos\mathopen{}\left(\frac{2\|\widehat{\lambda}\|_{1}}{\pi\Delta}\right)\mathclose{}, which is a real-valued angle since 02λ^1πΔ1π0\leq\frac{2\|\widehat{\lambda}\|_{1}}{\pi\Delta}\leq\frac{1}{\pi}. Then

B^(Ry(θ)𝕀log2m)ctrl0(B^)=(2λ^1πΔB^)\widehat{B}\coloneqq(R_{y}(\theta)\otimes\mathbb{I}_{\lceil\log_{2}m\rceil})\cdot\texttt{ctrl}_{0}(\widehat{B}^{\prime})=\begin{pmatrix}\frac{2\|\widehat{\lambda}\|_{1}}{\pi\Delta}\widehat{B}^{\prime}&\cdot\ \\ \cdot&\cdot\ \end{pmatrix} (56)

has the desired subnormalization in the top-left block, using one more ancilla qubit. ∎

The block encoding for the residual Hamiltonian immediately follows.

Lemma 5.2.

Define R~H~H^\widetilde{R}\coloneqq\widetilde{H}-\widehat{H} where 1ΔHH~ε\frac{1}{\Delta}\|H-\widetilde{H}\|\leq\varepsilon. There exists a (π,log2m+2,0)(\pi,\lceil\log_{2}m\rceil+2,0)-BE of R~/Δ\widetilde{R}/\Delta, using 𝒪(log(1/ε))\mathcal{O}(\log(1/\varepsilon)) queries to ctrl(e±iH/Δ)\texttt{ctrl}(e^{\pm\mathrm{i}H/\Delta}).

Proof.

Here, let us interpret B~\widetilde{B} as a (π2,2,0)(\frac{\pi}{2},2,0)-BE of H~/Δ\widetilde{H}/\Delta given by Proposition 3.2. Let B^\widehat{B} be the (π2,log2m+1,0)(\frac{\pi}{2},\lceil\log_{2}m\rceil+1,0)-BE of H^/Δ\widehat{H}/\Delta given by Lemma 5.1, and let (𝖯𝖱𝖤𝖯L,𝖯𝖱𝖤𝖯R)(\mathsf{PREP}_{L},\mathsf{PREP}_{R}) be a (2,1,0)(2,1,0)-state-preparation pair for y=(1,1)y=(1,-1). Then by Proposition 2.10, we can produce a (π,log2m+2,0)(\pi,\lceil\log_{2}m\rceil+2,0)-BE for R~/Δ\widetilde{R}/\Delta using a single query to ctrl(B~)\texttt{ctrl}(\widetilde{B}) and ctrl(B^)\texttt{ctrl}(\widehat{B}) each. In terms of the query complexity, a single call to ctrl(B~)\texttt{ctrl}(\widetilde{B}) costs 𝒪(log(1/ε))\mathcal{O}(\log(1/\varepsilon)) queries to eiH/Δe^{-\mathrm{i}H/\Delta}, as stated in Proposition 3.2. ∎

Observe that this synthesized residual matrix has small operator norm, because

R~/Δ=1ΔH~H^1Δ(H~H+HH^)ε+a=1m|λaλ^a|2mε+η2.\begin{split}\|\widetilde{R}/\Delta\|&=\frac{1}{\Delta}\|\widetilde{H}-\widehat{H}\|\\ &\leq\frac{1}{\Delta}\mathopen{}\left(\|\widetilde{H}-H\|+\|H-\widehat{H}\|\right)\mathclose{}\\ &\leq\varepsilon+\sum_{a=1}^{m}\frac{|\lambda_{a}-\widehat{\lambda}_{a}|}{2m}\\ &\leq\varepsilon+\frac{\eta}{2}.\end{split} (57)

We need to boost this norm to 𝒪(1)\mathcal{O}(1) in order to learn with sufficient precision. This can be accomplished by uniform spectral amplification [LC17], a matrix-generalization of amplitude amplification. Note that a further generalization called uniform singular value amplification exists for applications beyond Hermitian matrices [GSLW19], however we will not need the strength of that extension here.

Proposition 5.3 ([LC17, Theorem 2]).

Let BB be an (α,q,0)(\alpha,q,0)-BE for a Hermitian matrix HH. Choose some β[H,α]\beta\in[\|H\|,\alpha]. For any 0<ε𝒪(β/α)0<\varepsilon\leq\mathcal{O}(\beta/\alpha), we can construct a (2β,q+2,2βε)(2\beta,q+2,2\beta\varepsilon)-BE of HH, using N=𝒪(αβlog(1/ε))N=\mathcal{O}(\frac{\alpha}{\beta}\log(1/\varepsilon)) queries to ctrl(B)\texttt{ctrl}(B) and 𝒪(Nq)\mathcal{O}(Nq) additional quantum gates.

With this technique, we can block encode the amplified residual Hamiltonian.

Theorem 5.4.

Let H=a=1mλaEaH=\sum_{a=1}^{m}\lambda_{a}E_{a} be an unknown nn-qubit Hamiltonian and H^=a=1mλ^aEa\widehat{H}=\sum_{a=1}^{m}\widehat{\lambda}_{a}E_{a} an estimate such that λλ^η\|\lambda-\widehat{\lambda}\|_{\infty}\leq\eta for some η>0\eta>0. Let Δ=2m\Delta=2m, q=log2m+2q=\lceil\log_{2}m\rceil+2, and fix some error parameters 0<ε,εamp𝒪(η)0<\varepsilon,\varepsilon_{\mathrm{amp}}\leq\mathcal{O}(\eta). There exists a (2,q+2,ε+εamp)(2,q+2,\varepsilon+\varepsilon_{\mathrm{amp}})-BE of

RηΔHH^ηΔ\frac{R}{\eta\Delta}\equiv\frac{H-\widehat{H}}{\eta\Delta} (58)

which costs 𝒪(log(1/(εη))log(1/εamp)/η)\mathcal{O}(\log(1/(\varepsilon\eta))\log(1/\varepsilon_{\mathrm{amp}})/\eta) queries to multi-controlled variants of e±iH/Δe^{\pm\mathrm{i}H/\Delta}.

Proof.

Let

TR~2ηΔ,T\coloneqq\frac{\widetilde{R}}{2\eta\Delta}, (59)

be the “target” operator, which is what we want to amplify up to. Note that by Eq. 57 we have

Tε+η/22η,\|T\|\leq\frac{\varepsilon+\eta/2}{2\eta}, (60)

which is at most 12\frac{1}{2} as long as εη/2\varepsilon\leq\eta/2. Let BB be the (π,q,0)(\pi,q,0)-BE of R~/Δ\widetilde{R}/\Delta, given by Lemma 5.2. Equivalently, we interpret this as a (π2η,q,0)(\frac{\pi}{2\eta},q,0)-BE for TT. Choose β=12\beta=\frac{1}{2}, which is guaranteed to lie within the interval [T,π2η][\|T\|,\frac{\pi}{2\eta}]. Also note that εamp𝒪(η)\varepsilon_{\mathrm{amp}}\leq\mathcal{O}(\eta). Then, we apply uniform spectral amplification to produce BampB_{\mathrm{amp}}, which by Proposition 5.3 is a (1,q+2,εamp)(1,q+2,\varepsilon_{\mathrm{amp}})-BE of TT. The circuit for this transformation costs 𝒪(πηlog(1/εamp))\mathcal{O}(\frac{\pi}{\eta}\log(1/\varepsilon_{\mathrm{amp}})) queries to ctrl(B)\texttt{ctrl}(B), or equivalently

Q=𝒪(log(1/ε)log(1/εamp)η)Q=\mathcal{O}\mathopen{}\left(\frac{\log(1/\varepsilon)\log(1/\varepsilon_{\mathrm{amp}})}{\eta}\right)\mathclose{} (61)

queries to ctrl(e±iH/Δ)\texttt{ctrl}(e^{\pm\mathrm{i}H/\Delta}). To get the block-encoding parameters stated in the theorem, observe that 2T=R~ηΔ2T=\frac{\widetilde{R}}{\eta\Delta} is 2εamp2\varepsilon_{\mathrm{amp}}-close to the amplified matrix. Meanwhile, 1ηΔRR~ε/η\frac{1}{\eta\Delta}\|R-\widetilde{R}\|\leq\varepsilon/\eta. Rescaling error parameters εampεamp/2\varepsilon_{\mathrm{amp}}\leftarrow\varepsilon_{\mathrm{amp}}/2 and εεη\varepsilon\leftarrow\varepsilon\eta concludes the proof. ∎

5.2 Parameter estimation

Now we consider the error parameters sufficient for learning the residual Hamiltonian, within the context of the bootstrap algorithm. Recall that we only need to learn R/ηR/\eta up to constant error of ϵ0=12\epsilon_{0}=\frac{1}{2}.

Let TT^{\prime} be the matrix encoded by BampB_{\mathrm{amp}}. By Theorem 5.4, it has error

RηΔ2Tε+εamp.\mathopen{}\left\|\frac{R}{\eta\Delta}-2T^{\prime}\right\|\mathclose{}\leq\varepsilon+\varepsilon_{\mathrm{amp}}. (62)

Define

ra\displaystyle r_{a} 12ntr[Ea(R/η)],\displaystyle\coloneqq\frac{1}{2^{n}}\operatorname{tr}[E_{a}(R/\eta)], (63)
ra\displaystyle r_{a}^{\prime} 12ntr[Ea(2ΔT)].\displaystyle\coloneqq\frac{1}{2^{n}}\operatorname{tr}[E_{a}(2\Delta T^{\prime})]. (64)

By Hölder’s inequality, we get

|rara|Rη2ΔTΔ(ε+εamp).\begin{split}|r_{a}-r_{a}^{\prime}|&\leq\mathopen{}\left\|\frac{R}{\eta}-2\Delta T^{\prime}\right\|\mathclose{}\\ &\leq\Delta(\varepsilon+\varepsilon_{\mathrm{amp}}).\end{split} (65)

We want to learn rar_{a} to within constant additive error ϵ0=12\epsilon_{0}=\frac{1}{2}, so it suffices to bound Eq. 65 by a small constant as well. Choosing ε=εamp=cϵ0/Δ\varepsilon=\varepsilon_{\mathrm{amp}}=c\epsilon_{0}/\Delta for some sufficiently small constant c12c\ll\frac{1}{2}, we get |rara|2cϵ0|r_{a}-r_{a}^{\prime}|\leq 2c\epsilon_{0}. By Theorem 5.4, the query complexity for producing the block encoding of such a TT^{\prime} is

Q=𝒪(log(Δ/η)log(Δ)η),Q=\mathcal{O}\mathopen{}\left(\frac{\log(\Delta/\eta)\log(\Delta)}{\eta}\right)\mathclose{}, (66)

where we have suppressed the constants c,ϵ0c,\epsilon_{0}. Then given copies of |Ψref(T)|\Psi_{\mathrm{ref}}(T^{\prime})\rangle, we can run the shadow tomography protocol of [CW23] to learn an estimate r^a\widehat{r}^{\prime}_{a} of rar^{\prime}_{a}. By Eq. 34, with high probability we get an ϵ0\epsilon_{0}-accurate estimate by measuring

N=𝒪(Δ2logmϵ02).N=\mathcal{O}\mathopen{}\left(\frac{\Delta^{2}\log m}{\epsilon_{0}^{2}}\right)\mathclose{}. (67)

copies of |Ψref(T)|\Psi_{\mathrm{ref}}(T^{\prime})\rangle. Note that, using the decoding operators OaO_{a} and O𝒩O_{\mathcal{N}} as defined in Eq. 29, our estimate using |Ψref(T)|\Psi_{\mathrm{ref}}(T^{\prime})\rangle has slightly modified constants in its definition:

r^a2ΔRe[o^a]Re[o^𝒩].\widehat{r}_{a}^{\prime}\coloneqq 2\Delta\frac{\operatorname{Re}[\widehat{o}_{a}]}{\operatorname{Re}[\widehat{o}_{\mathcal{N}}]}. (68)

Now we consider the preparation of |Ψref(T)|\Psi_{\mathrm{ref}}(T^{\prime})\rangle. Conditioned on its successful preparation from a query to ctrl0(Bamp)\texttt{ctrl}_{0}(B_{\mathrm{amp}}), the total number of queries required is NQ=𝒪~(Δ2/η)NQ=\widetilde{\mathcal{O}}(\Delta^{2}/\eta). Meanwhile, parallel to the claim in Proposition 3.3, the success probability is above 12\frac{1}{2}, since

Pr(Preparing |Ψref(T))=12(T𝕀n)|Ω|0+|Ω|12=12(12nTF2+1)12.\begin{split}\Pr(\text{Preparing }|\Psi_{\mathrm{ref}}(T^{\prime})\rangle)&=\frac{1}{2}\mathopen{}\left\|(T^{\prime}\otimes\mathbb{I}_{n})|\Omega\rangle|0\rangle+|\Omega\rangle|1\rangle\right\|^{2}\mathclose{}\\ &=\frac{1}{2}\mathopen{}\left(\frac{1}{2^{n}}\|T^{\prime}\|_{F}^{2}+1\right)\mathclose{}\\ &\geq\frac{1}{2}.\end{split} (69)

Therefore by Proposition 3.4, only a constant number of queries suffices to prepare a copy of the pseudo-Choi state with reference, for a total of 𝒪(NQ)\mathcal{O}(NQ) queries and 𝒪(NQ/Δ)\mathcal{O}(NQ/\Delta) evolution time. The theorem below follows.

Theorem 5.5.

Let H=a=1mλaEaH=\sum_{a=1}^{m}\lambda_{a}E_{a} be an unknown nn-qubit Hamiltonian and H^=a=1mλ^aEa\widehat{H}=\sum_{a=1}^{m}\widehat{\lambda}_{a}E_{a} an estimate such that λλ^η\|\lambda-\widehat{\lambda}\|_{\infty}\leq\eta for some η(0,1]\eta\in(0,1]. There exists a learning algorithm which, with high probability, produces an estimate r^a\widehat{r}_{a}^{\prime} of (λaλ^a)/η(\lambda_{a}-\widehat{\lambda}_{a})/\eta up to constant additive error (say, ϵ0=12\epsilon_{0}=\frac{1}{2}) for each a[m]a\in[m]. The algorithm makes 𝒪~(m2/η)\widetilde{\mathcal{O}}(m^{2}/\eta) queries to ctrl(e±iH/(2m))\texttt{ctrl}(e^{\pm\mathrm{i}H/(2m)}) and uses poly(n,m,1/η)\operatorname{poly}(n,m,1/\eta) quantum and classical computation.

5.3 Structure learning

To perform structure learning with (nearly) Heisenberg-limited scaling, the algorithm of Section 4 must also be placed into the bootstrap framework. We describe how that procedure works in this subsection.

Suppose we have copies of the referenceless pseudo-Choi state |Ψ(T)|\Psi(T^{\prime})\rangle, prepared from queries to BampB_{\mathrm{amp}}. (We analyze the complexity of their preparation later.) We run the algorithm of Lemma 4.4 to acquire a set containing Sγ(2ΔT)={Ea:|ra|>γ}S_{\gamma}(2\Delta T^{\prime})=\{E_{a}:|r_{a}^{\prime}|>\gamma\}; this consumes

N=𝒪~(r2γ2)N=\widetilde{\mathcal{O}}\mathopen{}\left(\frac{\|r^{\prime}\|^{2}}{\gamma^{2}}\right)\mathclose{} (70)

copies of |Ψ(T)|\Psi(T^{\prime})\rangle. To relate this set to Sϵ0(R/η)S_{\epsilon_{0}}(R/\eta), we have the bound |ra||ra|+Δ(ε+εamp)|r_{a}^{\prime}|\leq|r_{a}|+\Delta(\varepsilon+\varepsilon_{\mathrm{amp}}), which implies

|ra||ra|Δ(ε+εamp)>γΔ(ε+εamp)\begin{split}|r_{a}|&\geq|r_{a}^{\prime}|-\Delta(\varepsilon+\varepsilon_{\mathrm{amp}})\\ &>\gamma-\Delta(\varepsilon+\varepsilon_{\mathrm{amp}})\end{split} (71)

for all aSγ(2ΔT)a\in S_{\gamma}(2\Delta T^{\prime}). Thus if we choose γ=ϵ0+Δ(ε+εamp)\gamma=\epsilon_{0}+\Delta(\varepsilon+\varepsilon_{\mathrm{amp}}), we get Sϵ0(R/η)Sγ(2ΔT)S_{\epsilon_{0}}(R/\eta)\subseteq S_{\gamma}(2\Delta T^{\prime}).

At each step in the bootstrapping loop, we only need to consider terms in HH that we had not observed yet. Suppose the worst case, wherein at the current step we only have the minimally guaranteed set of terms Sη(H)S_{\eta}(H). In other words, the estimate H^\widehat{H} is supported only on Sη(H)S_{\eta}(H), which means that

R/η=EaSη(H)(λaλ^aη)Ea+EaSη(H)λaηEa.R/\eta=\sum_{E_{a}\in S_{\eta}(H)}\mathopen{}\left(\frac{\lambda_{a}-\widehat{\lambda}_{a}}{\eta}\right)\mathclose{}E_{a}+\sum_{E_{a}\notin S_{\eta}(H)}\frac{\lambda_{a}}{\eta}E_{a}. (72)

We focus on the terms in the second sum, which we have yet to identify. Upon learning Sϵ0(R/η)S_{\epsilon_{0}}(R/\eta), the new terms we have acquired are at least those which obey |λa|>ϵ0η=η/2|\lambda_{a}|>\epsilon_{0}\eta=\eta/2. Appending these newly identified terms with Sη(H)S_{\eta}(H) gives us Sη/2(H)S_{\eta/2}(H), as desired. Setting η=2j\eta=2^{-j} and recursing this procedure over j=0,1,,log2(1/ϵ)j=0,1,\ldots,\lfloor\log_{2}(1/\epsilon)\rfloor allows us to identify Sϵ(H)S_{\epsilon}(H) with nearly Heisenberg-limited precision because the query complexity scales nearly linearly with 1/η1/\eta; we show this in the sequel.

The argument is parallel to the analysis given in Theorem 4.7. Take ε=εamp=cϵ0/Δ\varepsilon=\varepsilon_{\mathrm{amp}}=c\epsilon_{0}/\Delta for a small constant cc, so that γ=12+2c\gamma=\frac{1}{2}+2c is also constant. The number of copies required is then N=𝒪~(r2)N=\widetilde{\mathcal{O}}(\|r^{\prime}\|^{2}). On the other hand, the success probability of preparing a copy from BampB_{\mathrm{amp}} is p=12nTF2=14r/Δ2p=\frac{1}{2^{n}}\|T^{\prime}\|_{F}^{2}=\frac{1}{4}\|r^{\prime}/\Delta\|^{2}. Thus by Proposition 3.4, 𝒪(N/p)=𝒪~(Δ2)=𝒪~(m2)\mathcal{O}(N/p)=\widetilde{\mathcal{O}}(\Delta^{2})=\widetilde{\mathcal{O}}(m^{2}) queries to BampB_{\mathrm{amp}} suffice. Finally, recall from Theorem 5.4 that we make 𝒪~(1/η)\widetilde{\mathcal{O}}(1/\eta) queries to ctrl(e±iH/Δ)\texttt{ctrl}(e^{\pm\mathrm{i}H/\Delta}) in order to produce the amplified block encoding. In sum, we get the following result about structure learning from residuals.

Theorem 5.6.

Let H=a=1mλaEaH=\sum_{a=1}^{m}\lambda_{a}E_{a} be an unknown nn-qubit Hamiltonian and H^=a=1mλ^aEa\widehat{H}=\sum_{a=1}^{m}\widehat{\lambda}_{a}E_{a} an estimate such that λλ^η\|\lambda-\widehat{\lambda}\|_{\infty}\leq\eta for some η(0,1]\eta\in(0,1]. There exists a quantum algorithm which, with high probability, identifies all EaE_{a} such that |λa|η/2|\lambda_{a}|\geq\eta/2. The algorithm makes 𝒪~(m2/η)\widetilde{\mathcal{O}}(m^{2}/\eta) queries to ctrl(e±iH/(2m))\texttt{ctrl}(e^{\pm\mathrm{i}H/(2m)}) and uses poly(n,m,1/η)\operatorname{poly}(n,m,1/\eta) quantum and classical computation.

5.4 Heisenberg-limited bootstrap analysis

Here, we finish the complexity analysis by placing these results into the outer loop described in Algorithm 2. For completeness, let us state and quickly prove why the bootstrap framework achieves Heisenberg-limited scaling.

Proposition 5.7.

Suppose that the base algorithm 𝒜(H,η,ζ)\operatorname{\mathcal{A}}(H,\eta,\zeta) uses 𝒪(τlog(1/ζ)/ηα)\mathcal{O}(\tau\log(1/\zeta)/\eta^{\alpha}) evolution time under HH, for some τ>0\tau>0 and α>1\alpha>1. Furthermore, suppose that 𝒜\operatorname{\mathcal{A}} has the property that 𝒜(sH+H,η,ζ)\operatorname{\mathcal{A}}(sH+H^{\prime},\eta,\zeta) uses evolution time 𝒪(sτlog(1/ζ)/ηα)\mathcal{O}(s\tau\log(1/\zeta)/\eta^{\alpha}), for any s>0s>0 and any (known) Hermitian operator HH^{\prime}. Then, Algorithm 2 returns an ϵ\epsilon-accurate estimate of the Hamiltonian coefficients with probability at least 1δ1-\delta, using total evolution time 𝒪(τlog(1/δ)/ϵ)\mathcal{O}(\tau\log(1/\delta)/\epsilon).

Proof.

First we analyze the query complexity of the algorithm. At each step jj in the loop, the base algorithm only requires a fixed error of ϵ0=1/2\epsilon_{0}=1/2, so the dependence on precision 𝒪(1/ϵ0α)\mathcal{O}(1/\epsilon_{0}^{\alpha}) is reduced to a constant. Meanwhile, we scale HH by s=1/ηjs=1/\eta_{j}. Thus we query the black box for time |tj|=𝒪((τ/ηj)log(1/ζj))|t_{j}|=\mathcal{O}((\tau/\eta_{j})\log(1/\zeta_{j})). With the choices ηj=2j\eta_{j}=2^{-j}, ζj=δ/2T+1j\zeta_{j}=\delta/2^{T+1-j}, and T=log2(1/ϵ)T=\lfloor\log_{2}(1/\epsilon)\rfloor, the total evolution time is

ttotal=j=0T|tj|=𝒪(τj=0T2jlog(2T+1jδ))=𝒪(τj=0T2j(T+1j+log(1/δ)))=𝒪(τ2Tlog(1/δ))=𝒪((τ/ϵ)log(1/δ)).\begin{split}t_{\mathrm{total}}&=\sum_{j=0}^{T}|t_{j}|=\mathcal{O}\mathopen{}\left(\tau\sum_{j=0}^{T}2^{j}\log\mathopen{}\left(\frac{2^{T+1-j}}{\delta}\right)\mathclose{}\right)\mathclose{}\\ &=\mathcal{O}\mathopen{}\left(\tau\sum_{j=0}^{T}2^{j}(T+1-j+\log(1/\delta))\right)\mathclose{}\\ &=\mathcal{O}(\tau 2^{T}\log(1/\delta))=\mathcal{O}((\tau/\epsilon)\log(1/\delta)).\end{split} (73)

Next we check the correctness of the algorithm. By a union bound, the overall failure probability is at most

j=0Tζj=δj=0T2(T+1)+jδ.\sum_{j=0}^{T}\zeta_{j}=\delta\sum_{j=0}^{T}2^{-(T+1)+j}\leq\delta. (74)

Conditioned on the success at each step, each estimate of the residual λ^(j)\widehat{\lambda}^{(j)} is within 1/21/2 of (λλ(j))/ηj(\lambda-\lambda^{(j)})/\eta_{j}. Equivalently, this means that

λ(λ^(j)+ηjr^(j))ηj2=12j+1.\|\lambda-(\widehat{\lambda}^{(j)}+\eta_{j}\widehat{r}^{(j)})\|_{\infty}\leq\frac{\eta_{j}}{2}=\frac{1}{2^{j+1}}. (75)

Thus at the final step, we are left with an estimate with error at most 2(T+1)ϵ2^{-(T+1)}\leq\epsilon. ∎

Now we piece together the claim of Theorem 1.5, up to the time resolution tmint_{\mathrm{min}} which we address in Section 7.

Theorem 5.8.

Let H=a=1mλaEaH=\sum_{a=1}^{m}\lambda_{a}E_{a} be an unknown nn-qubit Hamiltonian but mm given to us. Let ϵ,δ(0,1)\epsilon,\delta\in(0,1). There exists a structure learning algorithm that outputs λ^m\widehat{\lambda}\in\mathbb{R}^{m}, along with classical representations of corresponding E1,,EmE_{1},\ldots,E_{m}, such that Pr(λλ^ϵ)1δ\Pr(\|\lambda-\widehat{\lambda}\|_{\infty}\leq\epsilon)\geq 1-\delta. The algorithm uses a total evolution time of ttotal=𝒪~(mlog(1/ϵ)/ϵ)t_{\mathrm{total}}=\widetilde{\mathcal{O}}(m\log(1/\epsilon)/\epsilon) to time-forward and time-reverse queries, and uses poly(n,m,1/ϵ,log(1/δ))\operatorname{poly}(n,m,1/\epsilon,\log(1/\delta)) quantum and classical computation.

Proof.

We combine Theorems 5.5 and 5.6 to construct the base algorithm 𝒜\operatorname{\mathcal{A}}. (For the j=0j=0 step, we can use Propositions 3.5 and 4.7 without using uniform spectral amplification, at the cost of slightly worse constant factors.) Let Rj=HH^jR_{j}=H-\widehat{H}_{j}, ηj=2j\eta_{j}=2^{-j}, and ζj=δ/2T+1j\zeta_{j}=\delta/2^{T+1-j}. We first identify the terms in Rj/ηjR_{j}/\eta_{j} which are at least 1/21/2 in magnitude, S^jS1/2(Rj/ηj)\widehat{S}_{j}\supseteq S_{1/2}(R_{j}/\eta_{j}). By Theorem 5.6, this costs 𝒪~(m2log(1/ηj)/ηj)\widetilde{\mathcal{O}}(m^{2}\log(1/\eta_{j})/\eta_{j}) queries evolving for time 12m\frac{1}{2m}. Then we estimate all r^j\widehat{r}_{j} corresponding to the Pauli terms in S^j\widehat{S}_{j}, which by Theorem 5.5 has the same query complexity. Repeating these procedures 𝒪(log(1/ζj))\mathcal{O}(\log(1/\zeta_{j})) times and taking the median for each estimate suppresses the failure probability to at most ζj\zeta_{j}.

Thus, 𝒜(Rj/ηj,1/2,ζj)\operatorname{\mathcal{A}}(R_{j}/\eta_{j},1/2,\zeta_{j}) uses evolution time

|tj|=𝒪~((mlog(1/ηj)/ηj)log(1/ζj))𝒪~((mlog(1/ϵ)/ηj)log(1/ζj)),|t_{j}|=\widetilde{\mathcal{O}}((m\log(1/\eta_{j})/\eta_{j})\log(1/\zeta_{j}))\leq\widetilde{\mathcal{O}}((m\log(1/\epsilon)/\eta_{j})\log(1/\zeta_{j})), (76)

where the 𝒪~()\widetilde{\mathcal{O}}(\cdot) notation here suppresses logarithmic factors not involving ηj\eta_{j}, ζj\zeta_{j}, ϵ\epsilon, or δ\delta. Inserting this into the result of Proposition 5.7 with τ=mlog(1/ϵ)\tau=m\log(1/\epsilon) yields the claim. ∎

6 Structure learning without time reversal

To prepare Choi-like states of HH, we required a block encoding of H/ΔH/\Delta for some appropriate normalization Δ\Delta. The approach described up until now relied heavily on the ability to perform both the time evolution operator and its inverse, in order to make full use of the QSVT toolbox. (We also relied on multi-controlled time evolutions, however the universal controlization scheme described in Section 7 allows us to circumvent that concern.) In the access model for this section, we no longer assume access to time reversal. While some time-reversal techniques exist, they either require knowledge of the Hamiltonian structure [OKSM24] or scale exponentially in nn [QDSSM19]. Absent that information at the start of the learning algorithm, it is unclear how to utilize those techniques in our setting. Even more, it has been shown that there exists unitary learning tasks for which time reversal enables exponential speedups in query complexity [Sch+23, CSM23, SHH24]. Such results establish that, in general, time reversal is a powerful resource in learning.

The goal of this section is to prove Theorem 1.6, up to the time resolution tmint_{\mathrm{min}} which will be addressed in Section 7.

6.1 Block encoding the matrix logarithm

In this section we develop an alternative avenue for block encoding the logarithm of eiH/Δe^{-\mathrm{i}H/\Delta}, using only forward-time queries. For exposition, we assume throughout that we know the value of H\|H\| and so can choose Δ=2H\Delta=2\|H\|; without that knowledge, all appearances of the norm can be replaced by mHm\geq\|H\|. Our construction is based on the well-known power series for the matrix logarithm.

Proposition 6.1 ([Hal15]).

For any d×dd\times d complex matrix AA, define

logAk=1(1)k+1(A𝕀)kk.\log A\coloneqq\sum_{k=1}^{\infty}(-1)^{k+1}\frac{(A-\mathbb{I})^{k}}{k}. (77)

This series converges whenever A𝕀<1\|A-\mathbb{I}\|<1 and satisfies

elogA=A.e^{\log A}=A. (78)

Additionally, let Xd×dX\in\mathbb{C}^{d\times d} obeying X<log2\|X\|<\log 2. Then eX𝕀<1\|e^{X}-\mathbb{I}\|<1 and

logeX=X.\log e^{X}=X. (79)

For an integer K1K\geq 1, we define the KKth partial sum of the series,

LK(A)k=1K(1)k+1(A𝕀)kk.L_{K}(A)\coloneqq\sum_{k=1}^{K}(-1)^{k+1}\frac{(A-\mathbb{I})^{k}}{k}. (80)

The error of this truncation is easily bounded as follows.

Lemma 6.2.

Let Ad×dA\in\mathbb{C}^{d\times d} such that A𝕀r<1\|A-\mathbb{I}\|\leq r<1. Then

logALK(A)rK+1(K+1)(1r).\|{\log A}-L_{K}(A)\|\leq\frac{r^{K+1}}{(K+1)(1-r)}. (81)
Proof.

Compute:

logALK(A)=k=K+1(A𝕀)kkk=K+1rkkrK+1K+1k=0rk=rK+1(K+1)(1r).\begin{split}\|{\log A}-L_{K}(A)\|&=\mathopen{}\left\|\sum_{k=K+1}^{\infty}\frac{(A-\mathbb{I})^{k}}{k}\right\|\mathclose{}\\ &\leq\sum_{k=K+1}^{\infty}\frac{r^{k}}{k}\\ &\leq\frac{r^{K+1}}{K+1}\sum_{k=0}^{\infty}r^{k}\\ &=\frac{r^{K+1}}{(K+1)(1-r)}.\end{split} (82)

It remains to block encode iLK(eiH/Δ)\mathrm{i}L_{K}(e^{-\mathrm{i}H/\Delta}), where Δ\Delta is such that H/Δ<log2\|H/\Delta\|<\log 2. It turns out we will need a slightly stronger subnormalization of H/Δ12\|H/\Delta\|\leq\frac{1}{2}, as was the case with the arcsin\arcsin polynomial approximation. Let us fix a Δ2H\Delta\geq 2\|H\| as usual. To simplify notation, we shall denote UU(1/Δ)U\equiv U(1/\Delta) throughout this section.

The follow lemma expresses the truncated power series for logU\log U as a linear combination of powers of UU.

Lemma 6.3.

The operator LK(U)L_{K}(U) can be represented as

LK(U)=j=0KcjUj,L_{K}(U)=\sum_{j=0}^{K}c_{j}U^{j}, (83)

where the coefficients cjc_{j}\in\mathbb{R} are given by

c0=k=1K1k,cj=(1)j+1j(Kj),j=1,,K.\begin{split}c_{0}&=-\sum_{k=1}^{K}\frac{1}{k},\\ c_{j}&=\frac{(-1)^{j+1}}{j}\binom{K}{j},\quad j=1,\ldots,K.\end{split} (84)

The 1\ell_{1}-norm of the coefficients, Λj=0K|cj|\Lambda\coloneqq\sum_{j=0}^{K}|c_{j}|, obeys the following bounds:

Ω(2K/K)Λ𝒪(2K).\Omega(2^{K}/K)\leq\Lambda\leq\mathcal{O}(2^{K}). (85)
Proof.

Since all terms in Eq. 80 are just powers of UU, we can treat LK(U)L_{K}(U) as a commutative polynomial of degree KK. By binomial expansion of (U𝕀n)k(U-\mathbb{I}_{n})^{k}, we get

LK(U)=k=1K(1)k+1k(U𝕀n)k=k=1K(1)k+1kj=0k(kj)(1)kjUj=j=0Kk=1K(1)j+1k(kj)Uj,\begin{split}L_{K}(U)&=\sum_{k=1}^{K}\frac{(-1)^{k+1}}{k}(U-\mathbb{I}_{n})^{k}\\ &=\sum_{k=1}^{K}\frac{(-1)^{k+1}}{k}\sum_{j=0}^{k}\binom{k}{j}(-1)^{k-j}U^{j}\\ &=\sum_{j=0}^{K}\sum_{k=1}^{K}\frac{(-1)^{j+1}}{k}\binom{k}{j}U^{j},\end{split} (86)

where the rearrangement in the final line is valid because (kj)=0\binom{k}{j}=0 whenever k<jk<j. Thus the coefficients of the polynomial are

cj=(1)j+1k=1K1k(kj).c_{j}=(-1)^{j+1}\sum_{k=1}^{K}\frac{1}{k}\binom{k}{j}. (87)

The value at j=0j=0 is clear. For j1j\geq 1, first we rewrite each summand as:

1k(kj)=(k1)!j!(kj)!=1j(k1)!(j1)!(kj)!=1j(k1j1).\begin{split}\frac{1}{k}\binom{k}{j}&=\frac{(k-1)!}{j!(k-j)!}\\ &=\frac{1}{j}\frac{(k-1)!}{(j-1)!(k-j)!}\\ &=\frac{1}{j}\binom{k-1}{j-1}.\end{split} (88)

To evaluate the sum over kk, we use the hockey-stick identity, which states that

k=jK(kj)=(K+1j+1).\sum_{k^{\prime}=j^{\prime}}^{K^{\prime}}\binom{k^{\prime}}{j^{\prime}}=\binom{K^{\prime}+1}{j^{\prime}+1}. (89)

Inserting the appropriate values of j,k,Kj^{\prime},k^{\prime},K^{\prime} for our context, we get

k=1K(k1j1)=k=0K1(kj1)=k=j1K1(kj1)=(Kj),\sum_{k=1}^{K}\binom{k-1}{j-1}=\sum_{k=0}^{K-1}\binom{k}{j-1}=\sum_{k=j-1}^{K-1}\binom{k}{j-1}=\binom{K}{j}, (90)

from which Eq. 84 follows. To get bounds on the scaling of Λ\Lambda, the upper bound can be seen by

Λ=j=0K|cj|=|c0|+j=1K1j(Kj)logK+1+j=1K(Kj)=logK+2K.\begin{split}\Lambda&=\sum_{j=0}^{K}|c_{j}|=|c_{0}|+\sum_{j=1}^{K}\frac{1}{j}\binom{K}{j}\\ &\leq\log K+1+\sum_{j=1}^{K}\binom{K}{j}\\ &=\log K+2^{K}.\end{split} (91)

Meanwhile for the lower bound, we have

Λ|c0|+j=1K1K(Kj)log(K+1)+1K(2K1),\begin{split}\Lambda&\geq|c_{0}|+\sum_{j=1}^{K}\frac{1}{K}\binom{K}{j}\\ &\geq\log(K+1)+\frac{1}{K}(2^{K}-1),\end{split} (92)

which finishes the claim. ∎

Remark 6.4.

Ideally, we would prefer to have constructed the LCU for logU\log U using block encodings of (U𝕀n)k(U-\mathbb{I}_{n})^{k}, wherein the coefficients satisfy k=1K|(1)k+1/k|=𝒪(logK)\sum_{k=1}^{K}|(-1)^{k+1}/k|=\mathcal{O}(\log K). However, each such block encoding is subnormalized by a factor of 2k2^{k}, which again would require inverses to be boosted via singular value amplification [GSLW19].

From this result, we immediately get an (𝒪(1/γ),𝒪(loglog(1/γ),𝒪(γ))(\mathcal{O}(1/\gamma),\mathcal{O}(\log\log(1/\gamma),\mathcal{O}(\gamma))-BE of H/ΔH/\Delta for some γ>0\gamma>0 of our choosing, provided that H/Δ12\|H/\Delta\|\leq\frac{1}{2}.

Theorem 6.5.

Let HH be an nn-qubit Hamiltonian and Δ2H\Delta\geq 2\|H\|. Fix some integer K1K\geq 1 and denote U=eiH/ΔU=e^{-\mathrm{i}H/\Delta}. From access to ctrl(Uj)\texttt{ctrl}(U^{j}) for j=0,1,,Kj=0,1,\ldots,K, we can construct a (Λ,log2(K+1),2(K+1))(\Lambda,\lceil\log_{2}(K+1)\rceil,2^{-(K+1)})-BE of H/ΔH/\Delta, where Λ=𝒪(2K)\Lambda=\mathcal{O}(2^{K}). This construction costs 𝒪(K2/Δ)\mathcal{O}(K^{2}/\Delta) evolution time to the Hamiltonian.

Proof.

Lemma 6.3 describes the LCU representation for iLK(U)\mathrm{i}L_{K}(U) involving only UjU^{j} for j=0,1,,Kj=0,1,\ldots,K, with 1\ell_{1}-norm ΛlogK+2K\Lambda\leq\log K+2^{K}. Proposition 2.10 gives the prescription for realizing the LCU circuit, using log2(K+1)\lceil\log_{2}(K+1)\rceil control qubits. Finally, observe that if H/Δ12\|H/\Delta\|\leq\frac{1}{2}, then

eiH/Δ𝕀n|ei/21|<12.\|e^{-\mathrm{i}H/\Delta}-\mathbb{I}_{n}\|\leq|e^{-\mathrm{i}/2}-1|<\frac{1}{2}. (93)

Therefore if we take r=1/2r=1/2 in Lemma 6.2, we get the error bound

H/ΔiLK(U)2KK+12(K+1).\|H/\Delta-\mathrm{i}L_{K}(U)\|\leq\frac{2^{-K}}{K+1}\leq 2^{-(K+1)}. (94)

Note that the j=0j=0 term is merely the identity; the evolution time from the remaining KK queries is j=1Kj/Δ=𝒪(K2/Δ)\sum_{j=1}^{K}j/\Delta=\mathcal{O}(K^{2}/\Delta). ∎

This block encoding of H/ΔH/\Delta comes with a small subnormalization of 𝒪(γlog(1/γ))\mathcal{O}(\gamma\log(1/\gamma)). Unfortunately, without access to UU^{\dagger} it is difficult to perform spectral amplification to boost this up to Θ(1)\Theta(1). Instead, we will resort to a more rudimentary approach, which is simply to prepare the (referenceless) pseudo-Choi states with low probability.

6.2 Structure learning

The following lemma bounds how much evolution time suffices to prepare NN copies of the referenceless pseudo-Choi state, |Ψ(iLK(U))|\Psi(\mathrm{i}L_{K}(U))\rangle.

Lemma 6.6.

Let H=a=1mλaEaH=\sum_{a=1}^{m}\lambda_{a}E_{a} be an nn-qubit Hamiltonian, Δ2H\Delta\geq 2\|H\|, and U=eiH/ΔU=e^{-\mathrm{i}H/\Delta}. For any integers N,K1N,K\geq 1, with high probability we can prepare NN copies of |Ψ(iLK(U))|\Psi(\mathrm{i}L_{K}(U))\rangle from

Q=𝒪(4KKN12niLK(U)F2)Q=\mathcal{O}\mathopen{}\left(\frac{4^{K}KN}{\frac{1}{2^{n}}\|\mathrm{i}L_{K}(U)\|_{F}^{2}}\right)\mathclose{} (95)

queries and 𝒪(KQ/Δ)\mathcal{O}(KQ/\Delta) evolution time to ctrl(U)\texttt{ctrl}(U) (and no queries to its inverse).

Proof.

Let

B=(iLK(U)/Λ)B=\begin{pmatrix}\mathrm{i}L_{K}(U)/\Lambda&\cdot\ \\ \cdot&\cdot\ \end{pmatrix} (96)

be the block encoding described in Theorem 6.5. Recall that each instance of this unitary requires 𝒪(K2/Δ)\mathcal{O}(K^{2}/\Delta) evolution time to construct. As is standard, we apply B𝕀nB\otimes\mathbb{I}_{n} to the state |0q|Ω|0^{q}\rangle|\Omega\rangle with q=log2(K+1)q=\lceil\log_{2}(K+1)\rceil and measure the qq-qubit register. If we observe |0q|0^{q}\rangle, then the remaining 2n2n-qubit register contains |Ψ(iLK(U))|\Psi(\mathrm{i}L_{K}(U))\rangle as desired. This occurs with probability

Pr(0q)=(iLK(U)Λ𝕀n)|Ω2=12nΛ2iLK(U)F2.\begin{split}\Pr(0^{q})&=\mathopen{}\left\|\mathopen{}\left(\frac{\mathrm{i}L_{K}(U)}{\Lambda}\otimes\mathbb{I}_{n}\right)\mathclose{}|\Omega\rangle\right\|^{2}\mathclose{}\\ &=\frac{1}{2^{n}\Lambda^{2}}\|\mathrm{i}L_{K}(U)\|^{2}_{F}.\end{split} (97)

By Proposition 3.4, we need 𝒪(N/Pr(0q))\mathcal{O}(N/\Pr(0^{q})) queries to BB to prepare NN copies with high probability. Each query to BB costs K+1K+1 queries and 𝒪(K2/Δ)\mathcal{O}(K^{2}/\Delta) evolution time to UU. Use the bound Λ2𝒪(4K)\Lambda^{2}\leq\mathcal{O}(4^{K}) from Eq. 91 to conclude the statement. ∎

With this result, we are equipped to determine the complexity of structure learning arbitrary Hamiltonians, using our algorithm with only forward-time queries.

Theorem 6.7.

Fix ϵ>0\epsilon>0. Let Sϵ(H)={Ea:|λa|>ϵ}S_{\epsilon}(H)=\{E_{a}:|\lambda_{a}|>\epsilon\} be the set of terms in H=a=1mλaEaH=\sum_{a=1}^{m}\lambda_{a}E_{a} which have weight greater than ϵ\epsilon. Let Δ2H\Delta\geq 2\|H\|. From only positive-time queries to eiHte^{-\mathrm{i}Ht}, t0t\geq 0, we can learn all the elements of SϵS_{\epsilon} using

𝒪~(Δ4ϵ4) queries and 𝒪~(Δ3ϵ4) evolution time,\widetilde{\mathcal{O}}\mathopen{}\left(\frac{\Delta^{4}}{\epsilon^{4}}\right)\mathclose{}\text{ queries and }\widetilde{\mathcal{O}}\mathopen{}\left(\frac{\Delta^{3}}{\epsilon^{4}}\right)\mathclose{}\text{ evolution time,} (98)

with high probability.

Proof.

For notation, write H=iΔLK(U)H^{\prime}=\mathrm{i}\Delta L_{K}(U) which has the Pauli decomposition H=a=1mλaEaH^{\prime}=\sum_{a=1}^{m^{\prime}}\lambda^{\prime}_{a}E_{a} for some mmm^{\prime}\geq m. We remark that λa\lambda^{\prime}_{a}\in\mathbb{C} because iLK(U)\mathrm{i}L_{K}(U) is not guaranteed to be Hermitian, although this will not affect our analysis. Let ϵ>0\epsilon^{\prime}>0 to be determined later and define the set

Sϵ(H)={Ea:|λa|>ϵ}.S_{\epsilon^{\prime}}(H^{\prime})=\{E_{a}:|\lambda^{\prime}_{a}|>\epsilon^{\prime}\}. (99)

We apply Lemma 4.4 to the state |Ψ(H)=|Ψ(iLK(U))|\Psi(H^{\prime})\rangle=|\Psi(\mathrm{i}L_{K}(U))\rangle, allowing us to learn all the elements of Sϵ(H)S_{\epsilon^{\prime}}(H^{\prime}) with high probability. This requires

N=𝒪(12niΔLK(U)F2logmϵ2)N=\mathcal{O}\mathopen{}\left(\frac{\frac{1}{2^{n}}\|\mathrm{i}\Delta L_{K}(U)\|_{F}^{2}\log m}{\epsilon^{\prime 2}}\right)\mathclose{} (100)

copies of the referenceless pseudo-Choi state. Combined with Lemma 6.6, this costs Q=𝒪~(4KKΔ2/ϵ2)Q=\widetilde{\mathcal{O}}(4^{K}K\Delta^{2}/\epsilon^{\prime 2}) queries and 𝒪~(KQ/Δ)\widetilde{\mathcal{O}}(KQ/\Delta) evolution time.

Next, we need to determine how close HH^{\prime} should be to HH to guarantee Sϵ(H)Sϵ(H)S_{\epsilon}(H)\subseteq S_{\epsilon^{\prime}}(H^{\prime}). Recall that if HHγΔ\|H-H^{\prime}\|\leq\gamma\Delta, then the error in each coefficient is also bounded by |λaλa|γΔ|\lambda_{a}-\lambda_{a}^{\prime}|\leq\gamma\Delta. Then as long as |λa|>ϵγΔ|\lambda_{a}^{\prime}|>\epsilon-\gamma\Delta, we are ensured that |λa|>ϵ|\lambda_{a}|>\epsilon. Set γΔ=ϵ/2\gamma\Delta=\epsilon/2, which implies the choices ϵ=ϵ/2\epsilon^{\prime}=\epsilon/2 and

K=log2(12γ)=log2(Δ/ϵ).\begin{split}K&=\mathopen{}\left\lceil\log_{2}\mathopen{}\left(\frac{1}{2\gamma}\right)\mathclose{}\right\rceil\mathclose{}\\ &=\lceil\log_{2}(\Delta/\epsilon)\rceil.\end{split} (101)

Thus 4KK=𝒪~(Δ2/ϵ2)4^{K}K=\widetilde{\mathcal{O}}(\Delta^{2}/\epsilon^{2}) and so the claim follows. ∎

6.3 Parameter estimation

Once we have identified the set Sϵ(H)S_{\epsilon}(H), any learning algorithm appropriate for arbitrary Pauli terms can be deployed to learn the parameters to precision ϵ\epsilon. The most straightforward algorithm to use is that of [Car24], which uses shadow tomography on copies of the proper Choi state (U(t)𝕀n)|Ω(U(t)\otimes\mathbb{I}_{n})|\Omega\rangle. After a total evolution time of 𝒪~(H3/ϵ4)\widetilde{\mathcal{O}}(\|H\|^{3}/\epsilon^{4}), the algorithm uses a polynomial interpolation postprocessing step to output ϵ\epsilon-accurate estimates to λa\lambda_{a} for each EaSϵ(H)E_{a}\in S_{\epsilon}(H).

However, the shadow tomography step in [Car24] requires a large quantum memory of 𝒪~(H2/ϵ2)\widetilde{\mathcal{O}}(\|H\|^{2}/\epsilon^{2}) qubits. This quantum memory is used to perform a gentle measurement on multiple coherently stored copies of the Choi states. The size of this register can be reduced to 2n2n by recent advances in shadow tomography [KGKB24], namely the concept of a so-called mimicking state. This mimicking state serves the same effective purpose as the many coherent copies, but would only be the size of the system it is mimicking. However, constructing such a mimicking state takes 2𝒪(n)2^{\mathcal{O}(n)} quantum (and classical) gates, so this approach would not be computationally efficient.

Instead, let us apply the pseudo-Choi learning algorithm of [CW23] directly to our construction. The preparation of the pseudo-Choi state with reference, from a query to ctrl0(B)\texttt{ctrl}_{0}(B), still succeeds with probability at least 12\frac{1}{2}. However, the state we produce has small norm in the encoded Hamiltonian:

|Ψref(iLK(U)Λ)=(iLK(U)Λ𝕀n)|Ω|0+|Ω|1𝒩(iLK(U)Λ).\mathopen{}\left|\Psi_{\mathrm{ref}}\mathopen{}\left(\frac{\mathrm{i}L_{K}(U)}{\Lambda}\right)\mathclose{}\right\rangle\mathclose{}=\frac{(\frac{\mathrm{i}L_{K}(U)}{\Lambda}\otimes\mathbb{I}_{n})|\Omega\rangle|0\rangle+|\Omega\rangle|1\rangle}{\mathcal{N}(\frac{\mathrm{i}L_{K}(U)}{\Lambda})}. (102)

In this case, analogous to Eq. 33 we define the estimator from classical shadows as

λ^aΛΔRe[o^a]Re[o^𝒩].\widehat{\lambda}_{a}\coloneqq\Lambda\Delta\frac{\operatorname{Re}[\widehat{o}_{a}]}{\operatorname{Re}[\widehat{o}_{\mathcal{N}}]}. (103)

Then Eq. 34 gets the modification ΔΛΔ\Delta\leftarrow\Lambda\Delta, leaving us with a new shadow tomography copy complexity of

𝒪~(Λ2Δ2ϵ2)=𝒪~(Δ2γ2ϵ2).\widetilde{\mathcal{O}}\mathopen{}\left(\frac{\Lambda^{2}\Delta^{2}}{\epsilon^{2}}\right)\mathclose{}=\widetilde{\mathcal{O}}\mathopen{}\left(\frac{\Delta^{2}}{\gamma^{2}\epsilon^{2}}\right)\mathclose{}. (104)

This learns λa\lambda_{a}^{\prime} to error ϵ\epsilon. Since λa\lambda_{a}^{\prime} is γΔ\gamma\Delta-close to λa\lambda_{a}, as usual we take γ=cϵ/Δ\gamma=c\epsilon/\Delta for sufficiently small c>0c>0 and improve our sampling error to (1c)ϵ(1-c)\epsilon as well. This gets us a query complexity

Q=𝒪~(Δ4ϵ4)Q=\widetilde{\mathcal{O}}\mathopen{}\left(\frac{\Delta^{4}}{\epsilon^{4}}\right)\mathclose{} (105)

queries to ctrl(ei(H/Δ)j)\texttt{ctrl}(e^{-\mathrm{i}(H/\Delta)j}), for 0j𝒪(log(Δ/ϵ))0\leq j\leq\mathcal{O}(\log(\Delta/\epsilon)). Combined with structure identification which has the same costs (Theorem 6.7), we get Theorem 1.6, up the time resolution claim which we establish in Section 7.

Theorem 6.8.

Let H=a=1mλaEaH=\sum_{a=1}^{m}\lambda_{a}E_{a} be an unknown nn-qubit Hamiltonian. For any Δ2H\Delta\geq 2\|H\| and ϵ(0,1)\epsilon\in(0,1), there exists a structure learning algorithm which, with high probability, learns all λa\lambda_{a} to additive error ϵ\epsilon. The algorithm requires no prior knowledge about the identity of the terms EaE_{a}, makes 𝒪~(Δ4/ϵ4)\widetilde{\mathcal{O}}(\Delta^{4}/\epsilon^{4}) queries to eiH/Δe^{-\mathrm{i}H/\Delta} (and no queries to any form of e+iH/Δe^{+\mathrm{i}H/\Delta}), and uses poly(n,m,1/ϵ)\operatorname{poly}(n,m,1/\epsilon) quantum and classical computation.

7 Approximate controlization of time evolutions

In this section we describe how to approximately implement multi-controlled versions of U(t)=eiHtU(t)=e^{-\mathrm{i}Ht}, given only the ability to apply it for arbitrary tt. While there are no-go results for this task in general [AFCB14, GST24], our fractional-query access model (since we can tune tt) makes this task possible. A number of approaches have been developed, suitable for different scenarios [Jan02, Zho+11, FDDB14, NSM15, DNSM19, CBW24]. We take the one described in [OKTM23], which is universally applicable to any time evolution operator. We also describe how to generalize their technique to multi-controlled operations, at no additional query cost.

For any kk-bit string bb and nn-qubit operator OO, define

ctrlb(O)|bb|O+(𝕀k|bb|)𝕀n.\texttt{ctrl}_{b}(O)\coloneqq|b\rangle\!\langle b|\otimes O+(\mathbb{I}_{k}-|b\rangle\!\langle b|)\otimes\mathbb{I}_{n}. (106)

Our goal is to approximately implement ctrlb(eiH0t)\texttt{ctrl}_{b}(e^{-\mathrm{i}H_{0}t}) for any b{0,1}kb\in\{0,1\}^{k}, where H0H_{0} is the traceless part of HH. This is sufficient for our purposes, since we do not care about learning the identity component of the Hamiltonian. First, we will review the k=1k=1 case. Afterwards, we build up the generalization to k1k\geq 1 control qubits. Finally, we show how our learning algorithms should be modified to account for controlization.

7.1 A single control qubit

The main idea of controlization from [OKTM23] relies on the fact that twirling by the Pauli operators 𝒫n={𝕀,X,Y,Z}n\mathcal{P}_{n}=\{\mathbb{I},X,Y,Z\}^{\otimes n} yields

14nP𝒫nPHP=trH2n𝕀n.\frac{1}{4^{n}}\sum_{P\in\mathcal{P}_{n}}PHP=\frac{\operatorname{tr}H}{2^{n}}\mathbb{I}_{n}. (107)

For b{0,1}b\in\{0,1\}, let b¯=b1\overline{b}=b\oplus 1. Appending a control qubit and replacing each Pauli gate with its controlled form (conditioned on b¯\overline{b}), we get

14nP𝒫nctrlb¯(P)(𝕀1H)ctrlb¯(P)=14nP𝒫n(P00𝕀n)(H00H)(P00𝕀n)=(tr(H)𝕀n/2n00H)=(000H0)+trH2n𝕀n+1,\begin{split}\frac{1}{4^{n}}\sum_{P\in\mathcal{P}_{n}}\texttt{ctrl}_{\overline{b}}(P)(\mathbb{I}_{1}\otimes H)\texttt{ctrl}_{\overline{b}}(P)&=\frac{1}{4^{n}}\sum_{P\in\mathcal{P}_{n}}\begin{pmatrix}P&0\\ 0&\mathbb{I}_{n}\end{pmatrix}\begin{pmatrix}H&0\\ 0&H\end{pmatrix}\begin{pmatrix}P&0\\ 0&\mathbb{I}_{n}\end{pmatrix}\\ &=\begin{pmatrix}\operatorname{tr}(H)\mathbb{I}_{n}/2^{n}&0\\ 0&H\end{pmatrix}\\ &=\begin{pmatrix}0&0\\ 0&H_{0}\end{pmatrix}+\frac{\operatorname{tr}H}{2^{n}}\mathbb{I}_{n+1},\end{split} (108)

where H0=Htr(H)𝕀n/2nH_{0}=H-\operatorname{tr}(H)\mathbb{I}_{n}/2^{n} is the traceless part of HH. Denote this (n+1)(n+1)-qubit operator by H𝒫n,1H^{\mathcal{P}_{n},1}. It generates the unitary

exp(iH𝒫n,1t)=eiα(𝕀n00eiH0t)=eiαctrlb(eiH0t),\exp(-\mathrm{i}H^{\mathcal{P}_{n},1}t)=e^{\mathrm{i}\alpha}\begin{pmatrix}\mathbb{I}_{n}&0\\ 0&e^{-\mathrm{i}H_{0}t}\end{pmatrix}=e^{\mathrm{i}\alpha}\texttt{ctrl}_{b}(e^{-\mathrm{i}H_{0}t}), (109)

where α=ttr(H)/2n\alpha=-t\operatorname{tr}(H)/2^{n}. Thus up to this global phase, we have a formalism for the desired controlled unitary. This can realized by a product formula over the 4n4^{n} terms of H𝒫n,1H^{\mathcal{P}_{n},1}, each of which only requires the uncontrolled time evolution:

exp(it4nctrlb¯(P)(𝕀1H)ctrlb¯(P))=ctrlb¯(P)(𝕀1eiHt/4n)ctrlb¯(P).\exp\mathopen{}\left(-\frac{\mathrm{i}t}{4^{n}}\,\texttt{ctrl}_{\overline{b}}(P)(\mathbb{I}_{1}\otimes H)\texttt{ctrl}_{\overline{b}}(P)\right)\mathclose{}=\texttt{ctrl}_{\overline{b}}(P)(\mathbb{I}_{1}\otimes e^{-\mathrm{i}Ht/4^{n}})\texttt{ctrl}_{\overline{b}}(P). (110)

Taking the product over all P𝒫nP\in\mathcal{P}_{n} yields a product-formula approximation to ctrlb(eiH0t)\texttt{ctrl}_{b}(e^{-\mathrm{i}H_{0}t}). However, this naive implementation clearly requires an exponential amount of resources, as well as an exponentially small time resolution.

To circumvent this issue, one can instead appeal to the randomized product formula qDRIFT [Cam19]. In this approach the gate sequence is constructed by randomly sampling terms, whereby the distribution of terms is proportional to their operator norms. In this case, the distribution reduces to uniformly drawing elements from 𝒫n\mathcal{P}_{n}. Consider a qDRIFT sequence constructed from NN such random draws. That is, we have the circuit V1VNV_{1}\cdots V_{N} where each ViV_{i} is of the form

V=ctrlb¯(P)(𝕀1U(t/N))ctrlb¯(P).V=\texttt{ctrl}_{\overline{b}}(P)(\mathbb{I}_{1}\otimes U(t/N))\texttt{ctrl}_{\overline{b}}(P). (111)

This random construction can be viewed as the mixed-unitary channel 𝔼[𝒱1𝒱N]\operatorname*{\mathbb{E}}[\mathcal{V}_{1}\circ\cdots\circ\mathcal{V}_{N}], where 𝒱i()=Vi()Vi\mathcal{V}_{i}(\cdot)=V_{i}(\cdot)V_{i}^{\dagger}. Also, let 𝒞b(t)\mathcal{C}_{b}(t) be the quantum channel corresponding to the target unitary ctrlb(eiH0t)\texttt{ctrl}_{b}(e^{-\mathrm{i}H_{0}t}). The improved analysis of [CHKT21] guarantees that the qDRIFT channel converges to the desired controlled unitary as

12𝒞b(t)𝔼[𝒱1𝒱N]t2H2N,\frac{1}{2}\|\mathcal{C}_{b}(t)-\operatorname*{\mathbb{E}}[\mathcal{V}_{1}\circ\cdots\circ\mathcal{V}_{N}]\|_{\diamond}\leq\frac{t^{2}\|H\|^{2}}{N}, (112)

where \|\cdot\|_{\diamond} is the diamond norm. Equivalently, building qDRIFT sequences of length N=𝒪(t2H2/γ)N=\mathcal{O}(t^{2}\|H\|^{2}/\gamma) suffices to achieve error γ\gamma.

7.2 Multiple control qubits

To generalize to multiple controls, first note that in the k=1k=1 setting, we have

ctrlb¯(P)=ctrlb(P)(𝕀1P).\texttt{ctrl}_{\overline{b}}(P)=\texttt{ctrl}_{b}(P)(\mathbb{I}_{1}\otimes P). (113)

This follows because P2=𝕀nP^{2}=\mathbb{I}_{n}. Now if we let k1k\geq 1, then for any b{0,1}kb\in\{0,1\}^{k}, the multi-controlled gates that we need to implement are

ctrlb(P)(𝕀kP)=(𝕀k|bb|)P+|bb|𝕀n=(PP𝕀n),\begin{split}\texttt{ctrl}_{b}(P)(\mathbb{I}_{k}\otimes P)&=(\mathbb{I}_{k}-|b\rangle\!\langle b|)\otimes P+|b\rangle\!\langle b|\otimes\mathbb{I}_{n}\\ &=\begin{pmatrix}P&\\ &\ddots\\ &&P\\ &&&\mathbb{I}_{n}\end{pmatrix},\end{split} (114)

where there are 2k12^{k}-1 blocks of PP above. Then, twirling over these gates gives us

H𝒫n,k=14nP𝒫nctrlb(P)(𝕀kP)(𝕀kH)(𝕀kP)ctrlb(P)=(00H0)+trH2n𝕀n+k.\begin{split}H^{\mathcal{P}_{n},k}&=\frac{1}{4^{n}}\sum_{P\in\mathcal{P}_{n}}\texttt{ctrl}_{b}(P)(\mathbb{I}_{k}\otimes P)(\mathbb{I}_{k}\otimes H)(\mathbb{I}_{k}\otimes P)\texttt{ctrl}_{b}(P)\\ &=\begin{pmatrix}0&\\ &\ddots\\ &&0\\ &&&H_{0}\end{pmatrix}+\frac{\operatorname{tr}H}{2^{n}}\mathbb{I}_{n+k}.\end{split} (115)

Simulating this twirled Hamiltonian with the qDRIFT protocol then yields an approximation to ctrlb(eiH0t)\texttt{ctrl}_{b}(e^{-\mathrm{i}H_{0}t}) with the same performance guarantees as in the single-control setting.

Theorem 7.1.

Let HH be a Hamiltonian and suppose we have access to U(t)=eiHtU(t)=e^{-\mathrm{i}Ht}. Let 𝒞b(t)()=ctrlb(U(t))()ctrlb(U(t))\mathcal{C}_{b}(t)(\cdot)=\texttt{ctrl}_{b}(U(t))(\cdot)\texttt{ctrl}_{b}(U(t))^{\dagger} for any b{0,1}kb\in\{0,1\}^{k}. For a γ>0\gamma>0, there exists a mixed-unitary channel (t)\mathcal{E}(t) such that

12𝒞b(t)(t)γ,\frac{1}{2}\|\mathcal{C}_{b}(t)-\mathcal{E}(t)\|_{\diamond}\leq\gamma, (116)

implemented from NN queries to U(t/N)U(t/N) and 𝒪(N)\mathcal{O}(N) Pauli and controlled-Pauli gates, where

N=𝒪(t2H2γ).N=\mathcal{O}\mathopen{}\left(\frac{t^{2}\|H\|^{2}}{\gamma}\right)\mathclose{}. (117)
Remark 7.2 (On the trace of the Hamiltonian).

This method of controlization isolates the traceless part of the Hamiltonian, so without further modifications it can only approximate ctrlb(eiH0t)\texttt{ctrl}_{b}(e^{-\mathrm{i}H_{0}t}). For our purposes this is actually favorable, because we can now always safely assume that the Hamiltonian to be learned is traceless, removing any potential complications with a nonzero identity component in HH.

7.3 Error in pseudo-Choi state preparation

All the algorithms presented in this work (even the time-reversal-free approach) require controlled variants of the time evolution operator. Using the controlization scheme of this section, all instances of ctrl(eiHt)\texttt{ctrl}(e^{-\mathrm{i}Ht}) appearing in this paper can be approximated by the qDRIFT circuit involving NN applications of eiHt/Ne^{-\mathrm{i}Ht/N}. Let us briefly discuss how this approximation affects the learning guarantees.

To begin, observe that each query to ctrl(U(t))\texttt{ctrl}(U(t)) runs for time |t|=1/Δ|t|=1/\Delta when given access to time reversal, and t=j/Δt=j/\Delta for j𝒪(log(Δ/ϵ))j\leq\mathcal{O}(\log(\Delta/\epsilon)) without time reversal. Thus to streamline exposition, suppose |t|=𝒪~(1/Δ)|t|=\widetilde{\mathcal{O}}(1/\Delta). According to Theorem 7.1, each query costs

N=𝒪~(H2Δ2γ)=𝒪~(1/γ)N=\widetilde{\mathcal{O}}\mathopen{}\left(\frac{\|H\|^{2}}{\Delta^{2}\gamma}\right)\mathclose{}=\widetilde{\mathcal{O}}(1/\gamma) (118)

queries to the uncontrolled evolution U(t/N)U(t/N), where the second bound follows from H2Δ\|H\|\leq 2\Delta. Clearly, the total evolution time is the same. However, the time resolution is shortened to tmin=t/N=Ω(γ/Δ)t_{\mathrm{min}}=t/N=\Omega(\gamma/\Delta), so it behooves us to analyze what γ>0\gamma>0 suffices. The controlization also introduces additional quantum gates, but this overhead is at most poly(n,Δ,1/γ)\operatorname{poly}(n,\Delta,1/\gamma) so we do not closely analyze it here.

Theorem 7.3.

In order to replace controlled time evolutions with approximate controlizations in any pseudo-Choi state learning algorithm, we can make NN queries to U(t/N)U(t/N) per instance of ctrl(U(t))\texttt{ctrl}(U(t)). Letting ϵ>0\epsilon>0 be the target \ell_{\infty} error and Δ\Delta the choice of Hamiltonian normalization, it suffices to take for:

  • Parameter estimation with time reversal: N=𝒪~(Δ)N=\widetilde{\mathcal{O}}(\Delta);

  • Parameter estimation without time reversal: N=𝒪~(Δ/ϵ)N=\widetilde{\mathcal{O}}(\Delta/\epsilon);

  • Structure learning with time reversal: N=𝒪~(Δ2/ϵ)N=\widetilde{\mathcal{O}}(\Delta^{2}/\epsilon);

  • Structure learning without time reversal: N=𝒪~(Δ4/ϵ4)N=\widetilde{\mathcal{O}}(\Delta^{4}/\epsilon^{4}).

The time resolution becomes tmin=Ω(1/(NΔ))t_{\mathrm{min}}=\Omega(1/(N\Delta)), and the total evolution time and number of experiments are unaffected.

We will prove this statement in two parts: first the effect of controlization on the parameter estimation stage, and then the effect on the term identification stage.

Lemma 7.4.

Let BB be an (α,q,0)(\alpha,q,0)-BE that prepares any pseudo-Choi state with reference, and \mathcal{B} its corresponding channel. Suppose BB makes Q1Q_{1} queries to controlled time evolutions. Let ~\widetilde{\mathcal{B}} be the channel obtained by replacing all such queries with controlized approximations, each with diamond error γ\gamma. Let ρ0\rho_{0} be any initial (2n+1+q)(2n+1+q)-qubit state, and define

σ(0q|𝕀2n+1)(ρ0)(|0q𝕀2n+1)tr[(|0q0q|𝕀2n+1)(ρ0)],\sigma\coloneqq\frac{(\langle 0^{q}|\otimes\mathbb{I}_{2n+1})\mathcal{B}(\rho_{0})(|0^{q}\rangle\otimes\mathbb{I}_{2n+1})}{\operatorname{tr}[(|0^{q}\rangle\!\langle 0^{q}|\otimes\mathbb{I}_{2n+1})\mathcal{B}(\rho_{0})]}, (119)

Analogously, define σ~\widetilde{\sigma} with respect to ~\widetilde{\mathcal{B}}. Then for any decoding operator Oj{O1,,Om,O𝒩}O_{j}\in\{O_{1},\ldots,O_{m},O_{\mathcal{N}}\}, we have

|tr(Ojσ)tr(Ojσ~)|6Q1γ.|{\operatorname{tr}(O_{j}\sigma)}-\operatorname{tr}(O_{j}\widetilde{\sigma})|\leq 6Q_{1}\gamma. (120)
Proof.

Let 𝒞j\mathcal{C}_{j} denote a (multi-)controlled instance of 𝒰\mathcal{U} appearing in \mathcal{B}. Let j\mathcal{E}_{j} be the mixed-unitary channel guaranteed by Theorem 7.1 to approximate 𝒞j\mathcal{C}_{j} to within diamond distance γ\gamma. By a telescoping inequality, we have

~j=1Q1𝒞jjQ1γ.\|\mathcal{B}-\widetilde{\mathcal{B}}\|_{\diamond}\leq\sum_{j=1}^{Q_{1}}\|\mathcal{C}_{j}-\mathcal{E}_{j}\|_{\diamond}\leq Q_{1}\gamma. (121)

For convenience, we will define ρ(ρ0)\rho\coloneqq\mathcal{B}(\rho_{0}) and ρ~~(ρ0)\widetilde{\rho}\coloneqq\widetilde{\mathcal{B}}(\rho_{0}) To incorporate the conditional state preparation into the observable estimation analysis, we use the fact that in expectation,

tr(Ojσ)=tr[(|0q0q|Pr(0q;ρ)Oj)ρ],\operatorname{tr}(O_{j}\sigma)=\operatorname{tr}\mathopen{}\left[\mathopen{}\left(\frac{|0^{q}\rangle\!\langle 0^{q}|}{\Pr(0^{q};\rho)}\otimes O_{j}\right)\mathclose{}\rho\right]\mathclose{}, (122)

where Pr(0q;ρ)=tr[(|0q0q|𝕀2n+1)ρ]\Pr(0^{q};\rho)=\operatorname{tr}[(|0^{q}\rangle\!\langle 0^{q}|\otimes\mathbb{I}_{2n+1})\rho]. Analogously, we also have

tr(Ojσ~)=tr[(|0q0q|Pr(0q;ρ~)Oj)ρ~].\operatorname{tr}(O_{j}\widetilde{\sigma})=\operatorname{tr}\mathopen{}\left[\mathopen{}\left(\frac{|0^{q}\rangle\!\langle 0^{q}|}{\Pr(0^{q};\widetilde{\rho})}\otimes O_{j}\right)\mathclose{}\widetilde{\rho}\right]\mathclose{}. (123)

To simplify notation, let a=Pr(0q;ρ)a=\Pr(0^{q};\rho) and b=Pr(0q;ρ~)b=\Pr(0^{q};\widetilde{\rho}). By Hölder’s inequality with |0q0q|Oj1\||0^{q}\rangle\!\langle 0^{q}|\otimes O_{j}\|\leq 1, we get

|tr(Ojσ)tr(Ojσ~)|=tr[(|0q0q|Oj)(ρaρ~b)]ρaρ~b1=1aρabρ~11a(ρρ~1+ρ~abρ~1)=1aρρ~1+|1a1b|.\begin{split}|{\operatorname{tr}(O_{j}\sigma)}-\operatorname{tr}(O_{j}\widetilde{\sigma})|&=\operatorname{tr}\mathopen{}\left[\mathopen{}\left(|0^{q}\rangle\!\langle 0^{q}|\otimes O_{j}\right)\mathclose{}\mathopen{}\left(\frac{\rho}{a}-\frac{\widetilde{\rho}}{b}\right)\mathclose{}\right]\mathclose{}\\ &\leq\mathopen{}\left\|\frac{\rho}{a}-\frac{\widetilde{\rho}}{b}\right\|_{1}\mathclose{}=\frac{1}{a}\mathopen{}\left\|\rho-\frac{a}{b}\widetilde{\rho}\right\|_{1}\mathclose{}\\ &\leq\frac{1}{a}\mathopen{}\left(\|\rho-\widetilde{\rho}\|_{1}+\mathopen{}\left\|\widetilde{\rho}-\frac{a}{b}\widetilde{\rho}\right\|_{1}\mathclose{}\right)\mathclose{}\\ &=\frac{1}{a}\|\rho-\widetilde{\rho}\|_{1}+\mathopen{}\left|\frac{1}{a}-\frac{1}{b}\right|\mathclose{}.\end{split} (124)

Recall that preparing pseudo-Choi states with reference always succeeds with probability a,b12a,b\geq\frac{1}{2}. By definition of the diamond norm, the first term is bounded by

1aρρ~12~2Q1γ.\begin{split}\frac{1}{a}\|\rho-\widetilde{\rho}\|_{1}&\leq 2\|\mathcal{B}-\widetilde{\mathcal{B}}\|_{\diamond}\\ &\leq 2Q_{1}\gamma.\end{split} (125)

For the second term, we apply another round of Hölder’s inequality:

|1a1b|=|ab|ab4|tr[(|0q0q|𝕀2n+1)(ρρ~)]|4ρρ~14Q1γ.\begin{split}\mathopen{}\left|\frac{1}{a}-\frac{1}{b}\right|\mathclose{}&=\frac{|a-b|}{ab}\\ &\leq 4\mathopen{}\left|{\operatorname{tr}[(|0^{q}\rangle\!\langle 0^{q}|\otimes\mathbb{I}_{2n+1})(\rho-\widetilde{\rho})]}\right|\mathclose{}\\ &\leq 4\|\rho-\widetilde{\rho}\|_{1}\\ &\leq 4Q_{1}\gamma.\end{split} (126)

Combining Eqs. 125 and 126 yields the desired bound. ∎

Next we show how controlization affects the magnitude of the encoded Pauli coefficients of HH.

Lemma 7.5.

Let BB be an (α,q,0)(\alpha,q,0)-BE that prepares the referenceless pseudo-Choi state of any HH. Fix ρ0=|0q|ΩΩ|0q|\rho_{0}=|0^{q}\rangle|\Omega\rangle\!\langle\Omega|\langle 0^{q}|, and let Q1,,~,γQ_{1},\mathcal{B},\widetilde{\mathcal{B}},\gamma be analogously as in Lemma 7.4. For any P𝒫nP\in\mathcal{P}_{n}, define λP12ntr(PH)\lambda_{P}\coloneqq\frac{1}{2^{n}}\operatorname{tr}(PH) and wPαtr[(|0q0q||PP|)~(ρ0)]w_{P}\coloneqq\alpha\sqrt{\operatorname{tr}[(|0^{q}\rangle\!\langle 0^{q}|\otimes|P\rangle\!\langle P|)\widetilde{\mathcal{B}}(\rho_{0})]}. These quantities obey

|λP|wPαQ1γ.|\lambda_{P}|\geq w_{P}-\alpha\sqrt{Q_{1}\gamma}. (127)
Proof.

We start by reviewing the properties of the target state σ=|Ψ(H)Ψ(H)|\sigma=|\Psi(H)\rangle\!\langle\Psi(H)| from our analysis in Section 4. Recall the basis states |P(P𝕀n)|Ω|P\rangle\equiv(P\otimes\mathbb{I}_{n})|\Omega\rangle for P𝒫nP\in\mathcal{P}_{n}, which define the distribution of Bell measurements:

Pr(P;σ)=P|σ|P=|λP|22,\Pr(P;\sigma)=\langle P|\sigma|P\rangle=\frac{|\lambda_{P}|^{2}}{\ell^{2}}, (128)

where λP=12ntr(PH)\lambda_{P}=\frac{1}{2^{n}}\operatorname{tr}(PH) and =λ\ell=\|\lambda\|. We also have the probability of preparing σ\sigma from measuring the ancilla register of ρ=(ρ0)\rho=\mathcal{B}(\rho_{0}),

Pr(0q;ρ)=(H𝕀n)|Ω2α2=2α2.\Pr(0^{q};\rho)=\frac{\|(H\otimes\mathbb{I}_{n})|\Omega\rangle\|^{2}}{\alpha^{2}}=\frac{\ell^{2}}{\alpha^{2}}. (129)

Multiplying these two quantities allows us to express |λP|2|\lambda_{P}|^{2} as

|λP|2=α2Pr(P;σ)Pr(0q;ρ)=α2tr[(|0q0q||PP|)ρ].\begin{split}|\lambda_{P}|^{2}&=\alpha^{2}\Pr(P;\sigma)\Pr(0^{q};\rho)\\ &=\alpha^{2}\operatorname{tr}\mathopen{}\left[(|0^{q}\rangle\!\langle 0^{q}|\otimes|P\rangle\!\langle P|)\rho\right]\mathclose{}.\end{split} (130)

This observation motivates our definition for wP=αtr[(|0q0q||PP|)~(ρ0)]w_{P}=\alpha\sqrt{\operatorname{tr}[(|0^{q}\rangle\!\langle 0^{q}|\otimes|P\rangle\!\langle P|)\widetilde{\mathcal{B}}(\rho_{0})]} as the magnitude of the Pauli coefficients encoded in the controlized form of the block encoding. The error of this approximation is bounded by

|wP2|λP|2|α2(ρ0)~(ρ0)1α2~α2Q1γ,|w_{P}^{2}-|\lambda_{P}|^{2}|\leq\alpha^{2}\|\mathcal{B}(\rho_{0})-\widetilde{\mathcal{B}}(\rho_{0})\|_{1}\leq\alpha^{2}\|\mathcal{B}-\widetilde{\mathcal{B}}\|_{\diamond}\leq\alpha^{2}Q_{1}\gamma, (131)

where we applied the same telescoping argument from Eq. 121 in the final inequality. Then we use x+yx+y\sqrt{x+y}\leq\sqrt{x}+\sqrt{y} for x,y0x,y\geq 0 to get

wP|λP|2+α2Q1γ|λP|+αQ1γ,\begin{split}w_{P}&\leq\sqrt{|\lambda_{P}|^{2}+\alpha^{2}Q_{1}\gamma}\\ &\leq|\lambda_{P}|+\alpha\sqrt{Q_{1}\gamma},\end{split} (132)

which implies the claim. ∎

With these two lemmas in hand, we are ready to prove Theorem 7.3.

Proof (of Theorem 7.3)..

Parameter estimation. Recall from Eq. 34 and the surrounding discussion that, in order to get ϵ\epsilon precision in λ^a\widehat{\lambda}_{a}, we need Ω(ϵ/Δ)\Omega(\epsilon/\Delta) precision in the decoding operators Oj\langle O_{j}\rangle. In the bootstrap framework (time-reversal), we only need a constant precision ϵ0=12\epsilon_{0}=\frac{1}{2}, so we can tolerate a systematic error of |tr(Ojσ)tr(Ojσ~)|𝒪(1/Δ)|{\operatorname{tr}(O_{j}\sigma)}-\operatorname{tr}(O_{j}\widetilde{\sigma})|\leq\mathcal{O}(1/\Delta). Therefore by Lemma 7.4, we should choose γ=Θ(1/(Q1Δ))\gamma=\Theta(1/(Q_{1}\Delta)) with a sufficiently small constant factor. Recall that Q1=𝒪~(1/η)Q_{1}=\widetilde{\mathcal{O}}(1/\eta), dominated by the cost of spectral amplification on the residual Hamiltonian (Theorem 5.4). Since ηϵ\eta\geq\epsilon, we get tmin=Ω(γ/Δ)=Ω~(ϵ/Δ2)t_{\mathrm{min}}=\Omega(\gamma/\Delta)=\widetilde{\Omega}(\epsilon/\Delta^{2}).

Without the bootstrap framework (time-forward), our error tolerance is ϵ\epsilon-dependent, |tr(Ojσ)tr(Ojσ~)|𝒪(ϵ/Δ)|{\operatorname{tr}(O_{j}\sigma)}-\operatorname{tr}(O_{j}\widetilde{\sigma})|\leq\mathcal{O}(\epsilon/\Delta). However, since the number of queries in this setting is only Q1=𝒪(log(Δ/ϵ))=𝒪~(1)Q_{1}=\mathcal{O}(\log(\Delta/\epsilon))=\widetilde{\mathcal{O}}(1), we end up with the same asymptotic time resolution, tmin=Ω~(ϵ/Δ2)t_{\mathrm{min}}=\widetilde{\Omega}(\epsilon/\Delta^{2}). Note that this applies to the originally proposed algorithm of [CW23], since it also has a query complexity of Q1=𝒪(log(Δ/ϵ))Q_{1}=\mathcal{O}(\log(\Delta/\epsilon)).

Structure learning. The Bell sampling protocol collects Pauli terms which have large coefficients, and the result of Lemma 7.5 tells us how to appropriately adjust the threshold for what we mean by “large” under controlization. In the bootstrap framework, recall that we learn the residual Hamiltonian, so make the notation change λPrP\lambda_{P}\to r_{P} from Lemma 7.5. Also, from Theorem 5.4 we have that α=2Δ\alpha=2\Delta and Q1=𝒪~(1/η)Q_{1}=\widetilde{\mathcal{O}}(1/\eta). The algorithm only needs to find PP such that |rP|>12|r_{P}|>\frac{1}{2}, so we run the protocol with the threshold wP>12+αQ1γw_{P}>\frac{1}{2}+\alpha\sqrt{Q_{1}\gamma}. This holds with γ=Θ(1/(α2Q1))\gamma=\Theta(1/(\alpha^{2}Q_{1})) with a sufficiently small constant, which implies tmin=Ω~(η/Δ3)=Ω~(ϵ/m3)t_{\mathrm{min}}=\widetilde{\Omega}(\eta/\Delta^{3})=\widetilde{\Omega}(\epsilon/m^{3}).

Without the bootstrap framework, we have instead α=ΛΔ=𝒪(Δ2/ϵ)\alpha=\Lambda\Delta=\mathcal{O}(\Delta^{2}/\epsilon) and a desired threshold |λP|>ϵ|\lambda_{P}|>\epsilon. We can achieve this by taking wP>ϵ+αQ1γw_{P}>\epsilon+\alpha\sqrt{Q_{1}\gamma}, where Q1=𝒪(log(ϵ/Δ))Q_{1}=\mathcal{O}(\log(\epsilon/\Delta)) (Theorem 6.5). Hence it suffices to take γ=Θ(ϵ2/(α2Q1))=Θ~(ϵ4/Δ4)\gamma=\Theta(\epsilon^{2}/(\alpha^{2}Q_{1}))=\widetilde{\Theta}(\epsilon^{4}/\Delta^{4}), which results in a time resolution of tmin=Ω~(ϵ4/Δ5)t_{\mathrm{min}}=\widetilde{\Omega}(\epsilon^{4}/\Delta^{5}). ∎

Acknowledgments

AZ thanks Andrew Baczewski, Matthias Caro, John Kallaugher, Danial Motlagh, and Ojas Parekh for helpful discussions. This work was supported by the Laboratory Directed Research and Development program at Sandia National Laboratories, under the Gil Herrera Fellowship in Quantum Information Science. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA-0003525. AZ also acknowledges support from the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Accelerated Research in Quantum Computing.

References

  • [AAKS21] Anurag Anshu, Srinivasan Arunachalam, Tomotaka Kuwahara and Mehdi Soleimanifar “Sample-efficient learning of interacting quantum systems” In Nature Physics 17.8 Nature Publishing Group UK London, 2021, pp. 931–935 DOI: 10.1038/s41567-021-01232-0
  • [Aar20] Scott Aaronson “Shadow tomography of quantum states” In SIAM Journal on Computing 49.5 SIAM, 2020, pp. 368–394 DOI: 10.1137/18M120275X
  • [AFCB14] Mateus Araújo, Adrien Feix, Fabio Costa and Časlav Brukner “Quantum circuits cannot control unknown operations” In New Journal of Physics 16.9 IOP Publishing, 2014, pp. 093026 DOI: 10.1088/1367-2630/16/9/093026
  • [Alh23] Álvaro M. Alhambra “Quantum Many-Body Systems in Thermal Equilibrium” In PRX Quantum 4 American Physical Society, 2023, pp. 040201 DOI: 10.1103/PRXQuantum.4.040201
  • [Ang23] Armando Angrisani “Learning unitaries with quantum statistical queries” In arXiv:2310.02254, 2023 URL: https://arxiv.org/abs/2310.02254
  • [AS07] Alp Atıcı and Rocco A Servedio “Quantum algorithms for learning and testing juntas” In Quantum Information Processing 6.5 Springer, 2007, pp. 323–348 DOI: 10.1007/s11128-007-0061-6
  • [BAL19] Eyal Bairey, Itai Arad and Netanel H. Lindner “Learning a Local Hamiltonian from Local Measurements” In Physical Review Letters 122 American Physical Society, 2019, pp. 020504 DOI: 10.1103/PhysRevLett.122.020504
  • [BCCKS14] Dominic W Berry et al. “Exponential improvement in precision for simulating sparse Hamiltonians” In Proceedings of the Forty-Sixth Annual ACM Symposium on Theory of Computing, 2014, pp. 283–292 DOI: 10.1145/2591796.2591854
  • [BCO24] Andreas Bluhm, Matthias C Caro and Aadil Oufkir “Hamiltonian Property Testing” In arXiv:2403.02968, 2024 URL: https://arxiv.org/abs/2403.02968
  • [BIWH96] J.. Bollinger, Wayne M. Itano, D.. Wineland and D.. Heinzen “Optimal frequency measurements with maximally correlated states” In Physical Review A 54 American Physical Society, 1996, pp. R4649–R4652 DOI: 10.1103/PhysRevA.54.R4649
  • [BKD14] Charles H. Baldwin, Amir Kalev and Ivan H. Deutsch “Quantum process tomography of unitary and near-unitary maps” In Physical Review A 90 American Physical Society, 2014, pp. 012110 DOI: 10.1103/PhysRevA.90.012110
  • [BLMT24] Ainesh Bakshi, Allen Liu, Ankur Moitra and Ewin Tang “Learning quantum Hamiltonians at any temperature in polynomial time” In Proceedings of the 56th Annual ACM Symposium on Theory of Computing, 2024, pp. 1470–1477 DOI: 10.1145/3618260.3649619
  • [BLMT24a] Ainesh Bakshi, Allen Liu, Ankur Moitra and Ewin Tang “Structure learning of Hamiltonians from real-time evolution” In arXiv:2405.00082, 2024 URL: https://arxiv.org/abs/2405.00082
  • [BMQ22] Jessica Bavaresco, Mio Murao and Marco Túlio Quintino “Unitary channel discrimination beyond group structures: Advantages of sequential and indefinite-causal-order strategies” In Journal of Mathematical Physics 63.4 AIP Publishing, 2022 DOI: 10.1063/5.0075919
  • [BO21] Costin Bădescu and Ryan O’Donnell “Improved quantum data analysis” In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, 2021, pp. 1398–1411 DOI: 10.1145/3406325.3451109
  • [Bra+22] Jochen Braumüller et al. “Probing quantum information propagation with out-of-time-ordered correlators” In Nature Physics 18.2 Nature Publishing Group UK London, 2022, pp. 172–178 DOI: 10.1038/s41567-021-01430-w
  • [Cam19] Earl Campbell “Random Compiler for Fast Hamiltonian Simulation” In Physical Review Letters 123 American Physical Society, 2019, pp. 070503 DOI: 10.1103/PhysRevLett.123.070503
  • [Car24] Matthias C Caro “Learning quantum processes and Hamiltonians via the Pauli transfer matrix” In ACM Transactions on Quantum Computing 5.2 ACM New York, NY, 2024, pp. 1–53 DOI: 10.1145/3670418
  • [Cav81] Carlton M. Caves “Quantum-mechanical noise in an interferometer” In Physical Review D 23 American Physical Society, 1981, pp. 1693–1708 DOI: 10.1103/PhysRevD.23.1693
  • [CBW24] Anirban Chowdhury, Ewout van den Berg and Pawel Wocjan “Controlization Schemes Based on Orthogonal Arrays” In arXiv:2407.09382, 2024 URL: https://arxiv.org/abs/2407.09382
  • [CCHL22] Sitan Chen, Jordan Cotler, Hsin-Yuan Huang and Jerry Li “Exponential separations between learning with and without quantum memory” In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), 2022, pp. 574–585 IEEE DOI: 10.1109/FOCS52979.2021.00063
  • [CHKT21] Chi-Fang Chen, Hsin-Yuan Huang, Richard Kueng and Joel A. Tropp “Concentration for Random Product Formulas” In PRX Quantum 2 American Physical Society, 2021, pp. 040305 DOI: 10.1103/PRXQuantum.2.040305
  • [CNY23] Thomas Chen, Shivam Nadimpalli and Henry Yuen “Testing and learning quantum juntas nearly optimally” In Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2023, pp. 1163–1185 SIAM DOI: 10.1137/1.9781611977554.ch43
  • [Col+22] Simone Colombo et al. “Time-reversal-based quantum metrology with many-body entangled states” In Nature Physics 18.8 Nature Publishing Group UK London, 2022, pp. 925–930 DOI: 10.1038/s41567-022-01653-5
  • [CSM23] Jordan Cotler, Thomas Schuster and Masoud Mohseni “Information-theoretic hardness of out-of-time-order correlators” In Physical Review A 108 American Physical Society, 2023, pp. 062608 DOI: 10.1103/PhysRevA.108.062608
  • [CW12] Andrew M Childs and Nathan Wiebe “Hamiltonian simulation using linear combinations of unitary operations” In Quantum Information & Computation 12.11-12 Rinton Press, Incorporated Paramus, NJ, 2012, pp. 901–924 DOI: 10.26421/QIC12.11-12
  • [CW23] Juan Castaneda and Nathan Wiebe “Hamiltonian Learning via Shadow Tomography of Pseudo-Choi States” In arXiv:2308.13020, 2023 URL: https://arxiv.org/abs/2308.13020
  • [DNSM19] Qingxiuxiong Dong, Shojun Nakayama, Akihito Soeda and Mio Murao “Controlled quantum operations and combs, and their applications to universal controllization of divisible unitary operations” In arXiv:1911.01645, 2019 URL: https://arxiv.org/abs/1911.01645
  • [DOS23] Alicja Dutkiewicz, Thomas E O’Brien and Thomas Schuster “The advantage of quantum control in many-body Hamiltonian learning” In arXiv:2304.07172, 2023 URL: https://arxiv.org/abs/2304.07172
  • [DRC17] C.. Degen, F. Reinhard and P. Cappellaro “Quantum sensing” In Reviews of Modern Physics 89 American Physical Society, 2017, pp. 035002 DOI: 10.1103/RevModPhys.89.035002
  • [FDDB14] Nicolai Friis, Vedran Dunjko, Wolfgang Dür and Hans J. Briegel “Implementing quantum control for unknown subroutines” In Physical Review A 89 American Physical Society, 2014, pp. 030303 DOI: 10.1103/PhysRevA.89.030303
  • [FW20] Steven T Flammia and Joel J Wallman “Efficient estimation of Pauli channels” In ACM Transactions on Quantum Computing 1.1 ACM New York, NY, USA, 2020, pp. 1–32 DOI: 10.1145/3408039
  • [Gär+17] Martin Gärttner et al. “Measuring out-of-time-order correlations and multiple quantum spectra in a trapped-ion quantum magnet” In Nature Physics 13.8 Nature Publishing Group UK London, 2017, pp. 781–786 DOI: 10.1038/nphys4119
  • [GCC24] Andi Gu, Lukasz Cincio and Patrick J Coles “Practical Hamiltonian learning with unitary dynamics and Gibbs states” In Nature Communications 15.1 Nature Publishing Group UK London, 2024, pp. 312 DOI: 10.1038/s41467-023-44008-1
  • [GFWC12] Christopher E Granade, Christopher Ferrie, Nathan Wiebe and David G Cory “Robust online Hamiltonian learning” In New Journal of Physics 14.10 IOP Publishing, 2012, pp. 103013 DOI: 10.1088/1367-2630/14/10/103013
  • [GL89] O. Goldreich and L.. Levin “A hard-core predicate for all one-way functions” In Proceedings of the Twenty-First Annual ACM Symposium on Theory of Computing Association for Computing Machinery, 1989, pp. 25–32 DOI: 10.1145/73007.73010
  • [GLM11] Vittorio Giovannetti, Seth Lloyd and Lorenzo Maccone “Advances in quantum metrology” In Nature Photonics 5.4 Nature Publishing Group UK London, 2011, pp. 222–229 DOI: 10.1038/nphoton.2011.35
  • [GSLW19] András Gilyén, Yuan Su, Guang Hao Low and Nathan Wiebe “Quantum singular value transformation and beyond: exponential improvements for quantum matrix arithmetics” In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, 2019, pp. 193–204 DOI: 10.1145/3313276.3316366
  • [GST24] Zuzana Gavorová, Matan Seidel and Yonathan Touati “Topological obstructions to quantum computation with unitary oracles” In Physical Review A 109 American Physical Society, 2024, pp. 032625 DOI: 10.1103/PhysRevA.109.032625
  • [Gut24] Francisco Escudero Gutiérrez “Simple algorithms to test and learn local Hamiltonians” In arXiv:2404.06282, 2024 URL: https://arxiv.org/abs/2404.06282
  • [Hal15] Brian C. Hall “Lie Groups, Lie Algebras, and Representations”, Graduate Texts in Mathematics Springer, 2015
  • [HBCP15] M. Holzäpfel, T. Baumgratz, M. Cramer and M.. Plenio “Scalable reconstruction of unitary processes and Hamiltonians” In Physical Review A 91 American Physical Society, 2015, pp. 042129 DOI: 10.1103/PhysRevA.91.042129
  • [HCP23] Hsin-Yuan Huang, Sitan Chen and John Preskill “Learning to Predict Arbitrary Quantum Processes” In PRX Quantum 4 American Physical Society, 2023, pp. 040337 DOI: 10.1103/PRXQuantum.4.040337
  • [HKOT23] Jeongwan Haah, Robin Kothari, Ryan O’Donnell and Ewin Tang “Query-optimal estimation of unitary channels in diamond distance” In 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS), 2023, pp. 363–390 IEEE DOI: 10.1109/FOCS57990.2023.00028
  • [HKP20] Hsin-Yuan Huang, Richard Kueng and John Preskill “Predicting many properties of a quantum system from very few measurements” In Nature Physics 16, 2020, pp. 1050–1057 DOI: 10.1038/s41567-020-0932-7
  • [HKT24] Jeongwan Haah, Robin Kothari and Ewin Tang “Learning quantum Hamiltonians from high-temperature Gibbs states and real-time evolutions” In Nature Physics 20 Nature Publishing Group UK London, 2024, pp. 1027–1031 DOI: 10.1038/s41567-023-02376-x
  • [HTFS23] Hsin-Yuan Huang, Yu Tong, Di Fang and Yuan Su “Learning Many-Body Hamiltonians with Heisenberg-Limited Scaling” In Physical Review Letters 130 American Physical Society, 2023, pp. 200403 DOI: 10.1103/PhysRevLett.130.200403
  • [Hug+22] William J. Huggins et al. “Nearly Optimal Quantum Algorithm for Estimating Multiple Expectation Values” In Physical Review Letters 129 American Physical Society, 2022, pp. 240501 DOI: 10.1103/PhysRevLett.129.240501
  • [IBFBP20] Luca Innocenti et al. “Supervised learning of time-independent Hamiltonians for gate design” In New Journal of Physics 22.6 IOP Publishing, 2020, pp. 065001 DOI: 10.1088/1367-2630/ab8aaf
  • [Jan02] Dominik Janzing “Quantum algorithm for measuring the energy of nn qubits with unknown pair-interactions” In Quantum Information & Computation 2.3 Rinton Press, Incorporated Paramus, NJ, 2002, pp. 198–207 DOI: 10.26421/QIC2.3-3
  • [JW28] P. Jordan and E. Wigner “Über das Paulische Äquivalenzverbot” In Z. Phys. 47, 1928, pp. 631–651 DOI: 10.1007/BF01331938
  • [KCM22] Or Katz, Marko Cetina and Christopher Monroe NN-Body Interactions between Trapped Ion Qubits via Spin-Dependent Squeezing” In Physical Review Letters 129 American Physical Society, 2022, pp. 063603 DOI: 10.1103/PhysRevLett.129.063603
  • [KGKB24] Robbie King, David Gosset, Robin Kothari and Ryan Babbush “Triply efficient shadow tomography” In arXiv:2404.19211, 2024 URL: https://arxiv.org/abs/2404.19211
  • [KLY15] Shelby Kimmel, Guang Hao Low and Theodore J. Yoder “Robust calibration of a universal single-qubit gate set via robust phase estimation” In Physical Review A 92 American Physical Society, 2015, pp. 062315 DOI: 10.1103/PhysRevA.92.062315
  • [KU18] Naoto Kura and Masahito Ueda “Finite-error metrological bounds on multiparameter Hamiltonian estimation” In Physical Review A 97 American Physical Society, 2018, pp. 012101 DOI: 10.1103/PhysRevA.97.012101
  • [LC17] Guang Hao Low and Isaac L Chuang “Hamiltonian simulation by uniform spectral amplification” In arXiv:1707.05391, 2017 URL: https://arxiv.org/abs/1707.05391
  • [Li+23] Zeyang Li et al. “Improving metrology with quantum scrambling” In Science 380.6652 American Association for the Advancement of Science, 2023, pp. 1381–1384 DOI: 10.1126/science.adg9500
  • [LTGNY24] Haoya Li et al. “Heisenberg-limited Hamiltonian learning for interacting bosons” In npj Quantum Information 10.1 Nature Publishing Group UK London, 2024, pp. 83 DOI: 10.1038/s41534-024-00881-2
  • [MFPT24] Muzhou Ma, Steven T Flammia, John Preskill and Yu Tong “Learning kk-body Hamiltonians via compressed sensing” In arXiv:2410.18928, 2024 URL: https://arxiv.org/abs/2410.18928
  • [MH24] Arjun Mirani and Patrick Hayden “Learning interacting fermionic Hamiltonians at the Heisenberg limit” In arXiv:2403.00069, 2024 URL: https://arxiv.org/abs/2403.00069
  • [Mi+21] Xiao Mi et al. “Information scrambling in quantum circuits” In Science 374.6574 American Association for the Advancement of Science, 2021, pp. 1479–1483 DOI: 10.1126/science.abg5029
  • [MO10] Ashley Montanaro and Tobias J Osborne “Quantum boolean functions” In Chicago Journal of Theoretical Computer Science 1, 2010, pp. 1–45 DOI: 10.4086/cjtcs.2010.001
  • [Mon17] Ashley Montanaro “Learning stabilizer states by Bell sampling” In arXiv:1707.04012, 2017 DOI: https://arxiv.org/abs/1707.04012
  • [MR24] John M Martyn and Patrick Rall “Halving the Cost of Quantum Algorithms with Randomization” In arXiv:2409.03744, 2024 URL: https://arxiv.org/abs/2409.03744
  • [NLY24] Hongkang Ni, Haoya Li and Lexing Ying “Quantum Hamiltonian Learning for the Fermi-Hubbard Model” In Acta Applicandae Mathematicae 191.1 Springer, 2024, pp. 1–16 DOI: 10.1007/s10440-024-00651-4
  • [NSM15] Shojun Nakayama, Akihito Soeda and Mio Murao “Quantum Algorithm for Universal Implementation of the Projective Measurement of Energy” In Physical Review Letters 114 American Physical Society, 2015, pp. 190501 DOI: 10.1103/PhysRevLett.114.190501
  • [OKSM24] Tatsuki Odake, Hlér Kristjánsson, Akihito Soeda and Mio Murao “Higher-order quantum transformations of Hamiltonian dynamics” In Physical Review Research 6 American Physical Society, 2024, pp. L012063 DOI: 10.1103/PhysRevResearch.6.L012063
  • [OKTM23] Tatsuki Odake, Hlér Kristjánsson, Philip Taranto and Mio Murao “Universal algorithm for transforming Hamiltonian eigenvalues” In arXiv:2312.08848, 2023 URL: https://arxiv.org/abs/2312.08848
  • [QDSSM19] Marco Túlio Quintino et al. “Reversing Unknown Quantum Transformations: Universal Quantum Circuit for Inverting General Unitary Operations” In Physical Review Letters 123 American Physical Society, 2019, pp. 210502 DOI: 10.1103/PhysRevLett.123.210502
  • [QR19] Xiao-Liang Qi and Daniel Ranard “Determining a local Hamiltonian from a single eigenstate” In Quantum 3 Verein zur Förderung des Open Access Publizierens in den Quantenwissenschaften, 2019, pp. 159 DOI: 10.22331/q-2019-07-08-159
  • [Ram50] Norman F. Ramsey “A Molecular Beam Resonance Method with Separated Oscillating Fields” In Physical Review 78 American Physical Society, 1950, pp. 695–699 DOI: 10.1103/PhysRev.78.695
  • [Sch+23] Thomas Schuster et al. “Learning quantum systems via out-of-time-order correlators” In Physical Review Research 5 American Physical Society, 2023, pp. 043284 DOI: 10.1103/PhysRevResearch.5.043284
  • [SHH24] Thomas Schuster, Jonas Haferkamp and Hsin-Yuan Huang “Random unitaries in extremely low depth” In arXiv:2407.07754, 2024 URL: https://arxiv.org/abs/2407.07754
  • [Shi07] Shigeo Shioda “Some upper and lower bounds on the coupon collector problem” In Journal of Computational and Applied Mathematics 200.1 Elsevier, 2007, pp. 154–167 DOI: 10.1016/j.cam.2005.12.011
  • [Shu+14] Michael D Shulman et al. “Suppressing qubit dephasing using real-time Hamiltonian estimation” In Nature Communications 5.1 Nature Publishing Group UK London, 2014, pp. 5156 DOI: 10.1038/ncomms6156
  • [SLP11] Marcus P. Silva, Olivier Landon-Cardinal and David Poulin “Practical Characterization of Quantum Devices without Tomography” In Physical Review Letters 107 American Physical Society, 2011, pp. 210404 DOI: 10.1103/PhysRevLett.107.210404
  • [SMDWB24] Daniel Stilck França et al. “Efficient and robust estimation of many-qubit Hamiltonians” In Nature Communications 15.1 Nature Publishing Group UK London, 2024, pp. 311 DOI: 10.1038/s41467-023-44012-5
  • [SMLKR11] A. Shabani et al. “Estimation of many-body quantum Hamiltonians via compressive sensing” In Physical Review A 84 American Physical Society, 2011, pp. 012107 DOI: 10.1103/PhysRevA.84.012107
  • [VZ24] Alexander Volberg and Haonan Zhang “Noncommutative Bohnenblust–Hille inequalities” In Mathematische Annalen 389.2 Springer, 2024, pp. 1657–1676 DOI: 10.1007/s00208-023-02680-0
  • [Wan+17] Jianwei Wang et al. “Experimental quantum Hamiltonian learning” In Nature Physics 13.6 Nature Publishing Group UK London, 2017, pp. 551–555 DOI: 10.1038/nphys4074
  • [WGFC14] Nathan Wiebe, Christopher Granade, Christopher Ferrie and D.. Cory “Hamiltonian Learning and Certification Using Quantum Resources” In Physical Review Letters 112 American Physical Society, 2014, pp. 190501 DOI: 10.1103/PhysRevLett.112.190501
  • [WGFC14a] Nathan Wiebe, Christopher Granade, Christopher Ferrie and David Cory “Quantum Hamiltonian learning using imperfect quantum resources” In Phys. Rev. A 89 American Physical Society, 2014, pp. 042314 DOI: 10.1103/PhysRevA.89.042314
  • [YSHY23] Wenjun Yu, Jinzhao Sun, Zeyao Han and Xiao Yuan “Robust and efficient Hamiltonian learning” In Quantum 7 Verein zur Förderung des Open Access Publizierens in den Quantenwissenschaften, 2023, pp. 1045 DOI: 10.22331/q-2023-06-29-1045
  • [Zha+23] Haimeng Zhao et al. “Learning quantum states and unitaries of bounded gate complexity” In arXiv:2310.19882, 2023 DOI: https://arxiv.org/abs/2310.19882
  • [Zho+11] Xiao-Qi Zhou et al. “Adding control to arbitrary unknown quantum operations” In Nature Communications 2.1 Nature Publishing Group UK London, 2011, pp. 413 DOI: 10.1038/ncomms1392
  • [ZYLB21] Assaf Zubida, Elad Yitzhaki, Netanel H Lindner and Eyal Bairey “Optimal short-time measurements for Hamiltonian learning” In arXiv:2108.08824, 2021 URL: https://arxiv.org/abs/2108.08824

Appendix A Copies to queries reduction

Let p(0,1]p\in(0,1] be the probability of preparing a state |ψ|\psi\rangle, say from querying a block encoding BB and measuring some of its ancilla qubits. As a series of Bernoulli trials, it is a standard result that Q=Θ(N/p)Q=\Theta(N/p) queries is necessary and sufficient to successfully prepare NN copies of the state. In this appendix we prove the statement for completeness, adapting the argument of [CW23, Appendix B.6] to the appropriate level of generality for our purposes.

Lemma A.1.

Fix integers QN1Q\geq N\geq 1. Let X1,,XQX_{1},\ldots,X_{Q} be i.i.d. Bernoulli random variables with Pr(Xi)=p(0,1]\Pr(X_{i})=p\in(0,1]. Define X=X1++XQX=X_{1}+\cdots+X_{Q}. In order to observe the event XNX\geq N with probability at least 1eΩ(N)1-e^{-\Omega(N)},

Q=Θ(N/p)Q=\Theta(N/p) (133)

trials are necessary and sufficient.

Proof.

We invoke Chernoff’s inequality to bound the tail probability for the event X<NX<N. For any t(0,1)t\in(0,1), the multiplicative Chernoff bound states that

Pr(X(1t)𝔼[X])exp(t2𝔼[X]2).\Pr(X\leq(1-t)\operatorname*{\mathbb{E}}[X])\leq\exp\mathopen{}\left(-\frac{t^{2}\operatorname*{\mathbb{E}}[X]}{2}\right)\mathclose{}. (134)

Since 𝔼[X]=Qp\operatorname*{\mathbb{E}}[X]=Qp, we can set t=1N1Qpt=1-\frac{N-1}{Qp} to get

Pr(XN1)exp(Qp2(1NQp)2).\Pr(X\leq N-1)\leq\exp\mathopen{}\left(-\frac{Qp}{2}\mathopen{}\left(1-\frac{N}{Qp}\right)^{2}\mathclose{}\right)\mathclose{}. (135)

This bounds the failure probability of the entire process, so set the rhs to be equal to ecNe^{-cN} for some constant cc. Taking logs and rearranging yields

log(ecN)=Qp2N+N22QpQp2N.\begin{split}\log(e^{cN})&=\frac{Qp}{2}-N+\frac{N^{2}}{2Qp}\\ &\geq\frac{Qp}{2}-N.\end{split} (136)

Solving for QQ, we find

Q2(c+1)Np=𝒪(N/p)Q\leq\frac{2(c+1)N}{p}=\mathcal{O}(N/p) (137)

is sufficient. Conversely, observe that for t>0t>0 to hold, we must have Qp>N1Qp>N-1, so Q=Ω(N/p)Q=\Omega(N/p) is also necessary.

Finally, we remark that t<1t<1 only when N2N\geq 2, so let us consider the N=1N=1 case separately. It is clear that Pr(X0)=Pr(X=0)=(1p)Q\Pr(X\leq 0)=\Pr(X=0)=(1-p)^{Q}. If p=1p=1 then the result is trivial; meanwhile for each p(0,1)p\in(0,1), there exists some c>1c^{\prime}>1 such that 1pecp1-p\geq e^{-c^{\prime}p}. Combined with 1pep1-p\leq e^{-p}, we get

Pr(X=0)=eΘ(p)Q=ec,\Pr(X=0)=e^{-\Theta(p)Q}=e^{-c}, (138)

which implies that Q=Θ(1/p)Q=\Theta(1/p) when N=1N=1 as well. Note that even when N=1N=1 we can choose c=log(1/δ)c=\log(1/\delta) to get some small failure probability δ\delta. ∎

Appendix B The galactic algorithm

Here we prove Corollary 1.7. Let us restate it below.

Corollary B.1 (Galactic structure learning without time reversal).

There exists a quantum algorithm that solves 1.2 under the discrete control access model alone. For any real constant p1p\geq 1, the algorithm queries eiHte^{-\mathrm{i}Ht} for a total evolution time of ttotal=𝒪~((2pH)1+2/p/ϵ2+2/p)t_{\mathrm{total}}=\widetilde{\mathcal{O}}((2^{p}\|H\|)^{1+2/p}/\epsilon^{2+2/p}), requires a time resolution of ttmin=Ω~(ϵ4/(2pH)5)t\geq t_{\mathrm{min}}=\widetilde{\Omega}(\epsilon^{4}/(2^{p}\|H\|)^{5}), runs Nexp=𝒪~((2pH)2+2/p/ϵ2+2/p)N_{\mathrm{exp}}=\widetilde{\mathcal{O}}((2^{p}\|H\|)^{2+2/p}/\epsilon^{2+2/p}) experiments, and uses poly(n,m,1/ϵ)\operatorname{poly}(n,m,1/\epsilon) quantum and classical computation. The discrete control operations act on an ancillary computational register of nanc=n+𝒪(loglog(H/ϵ))n_{\mathrm{anc}}=n+\mathcal{O}(\log\log(\|H\|/\epsilon)) qubits.

Proof.

The key modification to make is shrinking the bound rr from Lemma 6.2. Let p1p\geq 1 be some real constant and for simplicity suppose Δ=2pH\Delta=2^{p}\|H\|. Then eiH/Δ𝕀n2pr\|e^{-\mathrm{i}H/\Delta}-\mathbb{I}_{n}\|\leq 2^{-p}\equiv r, which by Lemma 6.2 means that

H/ΔLK(U)2p(K+1)(K+1)(12p)C2p(K+1)),\begin{split}\|H/\Delta-L_{K}(U)\|&\leq\frac{2^{-p(K+1)}}{(K+1)(1-2^{-p})}\\ &\leq C\cdot 2^{-p(K+1)}),\end{split} (139)

where C1=2(12p)C^{-1}=2(1-2^{-p}) is a constant. To make this error at most γ\gamma, we can take K=log2(C/γ)/p1K=\lceil\log_{2}(C/\gamma)/p\rceil-1. Taking γ=Θ(ϵ/Δ)\gamma=\Theta(\epsilon/\Delta), we get

Λ2=𝒪(4K)=𝒪((C/γ)2/p)=𝒪((Δ/ϵ)2/p)=𝒪((2pH/ϵ)2/p).\begin{split}\Lambda^{2}=\mathcal{O}(4^{K})&=\mathcal{O}((C/\gamma)^{2/p})\\ &=\mathcal{O}((\Delta/\epsilon)^{2/p})\\ &=\mathcal{O}((2^{p}\|H\|/\epsilon)^{2/p}).\end{split} (140)

Then recall from Theorems 6.7 and 104 that the query complexity for either parameter or structure learning is 𝒪(Λ2Δ2/ϵ2)\mathcal{O}(\Lambda^{2}\Delta^{2}/\epsilon^{2}), and the time resolution from Theorem 7.3 is Ω~(ϵ4/Δ5)\widetilde{\Omega}(\epsilon^{4}/\Delta^{5}). ∎

Remark B.2.

One may be concerned that K=0K=0 for large pp, but this is not an issue: K1K\geq 1 always holds because limplog(2p)/p=2\lim_{p\to\infty}\lceil\log(2^{p})/p\rceil=2. On the other hand, this means that for very large pp, we are essentially just constructing a block encoding of i(U𝕀n)=H/Δ+𝒪(4p)\mathrm{i}(U-\mathbb{I}_{n})=H/\Delta+\mathcal{O}(4^{-p}), which is why this approximation is so accurate for just two terms of the matrix logarithm. Of course, the trade-off is that 1/Δ1/\Delta suppresses the estimation of the parameters of HH by a factor of 2p2^{-p} as well, hence the blow-up in this constant.