
Using Random Walks for Iterative Phase Estimation

Cassandra Granade Microsoft Quantum, Microsoft, Redmond, WA, United States    Nathan Wiebe Department of Computer Science, University of Toronto, Toronto, Canada Pacific Northwest National Laboratory, Richland, USA University of Washington, Department of Physics, Seattle, USA
(authors in alphabetical order)
Abstract

In recent years there has been substantial development in algorithms for quantum phase estimation. In this work we provide a new approach to online Bayesian phase estimation that achieves Heisenberg-limited scaling and requires exponentially less classical processing time, as a function of the desired error tolerance, than existing Bayesian methods. In practice, this means that we can perform an update in microseconds on a CPU, as opposed to milliseconds for existing particle filter methods. Our approach assumes that the prior distribution is Gaussian and exploits the fact that, when optimal experiments are chosen, the mean of the prior distribution is given by the position of a random walker whose moves are dictated by the measurement outcomes. We then argue, using the Fisher information, that our algorithm provides a near-optimal analysis of the data. This work shows that online Bayesian inference is practical, efficient, and ready for deployment in modern FPGA-driven adaptive experiments.

1 Introduction

Phase estimation is widely used throughout quantum information to learn the eigenvalues of unitary operators. This is a critical step in realizing computational speedups from algorithms such as Shor’s algorithm [1], versions of the linear systems algorithm [2] and quantum simulation [3, 4, 5, 6]. Moreover, the eigenvalues of unitary operators can carry information about physical parameters, such that phase estimation is also critical in quantum metrology applications [7, 8]; similar ideas can be used in Hamiltonian learning protocols as well [9, 10, 11].

One very popular approach to minimizing the quantum resources required for phase estimation is iterative phase estimation [12, 13, 14], in which one repeatedly uses the unitary of interest to write a phase onto a single-qubit register, and then processes the data collected from these measurements using classical resources. The data obtained from each iteration can then be used to determine the next measurement to perform, such that classically adaptive protocols are naturally expressed in the iterative phase estimation framework. Across applications for phase estimation, there is a significant benefit to formulating approaches to iterative phase estimation that allow for adaptivity and data processing with very modest classical resources. For instance, Wiebe and Granade [15] have introduced rejection filtering, an iterative phase estimation algorithm that is modest enough to be compatible with modern experimental control hardware such as field-programmable gate arrays (FPGAs). Recent experimental demonstrations have shown the effectiveness of rejection filtering in modern applications [16].

In this work, we simplify the classical processing requirements still further, providing a new algorithm that uses approximately 250 bits of classical memory between iterations and that is entirely deterministic conditioned on the experimental data record. Our new algorithm is thus uniquely well suited not only for implementation on FPGAs, but also as an application-specific integrated circuit (ASIC) or on tightly memory-bound microcontroller platforms. Such platforms are common in certain cryogenic applications, where for latency reasons it is desirable to have a small low-power controller located as close as possible to the base plate of the dilution refrigerator, despite the limited cooling power available at such stages. The ability to use such simple control hardware in turn significantly reduces the cost of implementing quantum algorithms and new metrology protocols, such as distributed sensors [17], in near-term quantum hardware.

2 Review of Bayesian Phase Estimation

Concretely, consider a family of unitary operators U(t) for a real parameter t and a quantum state |ω⟩ such that for all t, U(t)|ω⟩ = e^{iωt}|ω⟩ for an unknown real number ω. We are then interested in performing controlled applications of U(t) on a copy of the state |ω⟩ in order to learn ω.

There are many ways that the learning problem for phase estimation can be formalized. While the most commonly used methods have historically estimated the phase in a bit-by-bit fashion [18], Bayesian approaches to learning ω have recently gained in popularity because of their robustness and their statistical efficiency [15]. The idea behind such methods is to quantify the uncertainty in ω via a prior probability distribution, Pr(ω). Then, conditioned on measuring an outcome d, the probability distribution describing the uncertainty in ω is

Pr(ω|d) = Pr(d|ω) Pr(ω) / Pr(d), (1)

where Pr(d) is a normalization factor and Pr(d|ω) is the likelihood of the experiment reproducing the observation d given that ωt was the true eigenphase of U(t).

The art of Bayesian phase estimation is then to choose experiments in such a way as to minimize the resources needed to reduce the uncertainty in an estimate of ω, given by the estimator ω̂ [13, 15]. Adaptive versions of Bayesian phase estimation are known to achieve Heisenberg-limited scaling and come close to saturating lower bounds on the uncertainty in ω as a function of experimental time or number of measurements [7, 15]. A complication that arises in choosing a Bayesian method is that each such method requires different amounts of experimental time, total number of measurements, classical memory and processing time, and robustness to experimental imperfections. The latter two are especially important for present-day experiments, where any form of Bayesian inference that requires classical processing time comparable to the coherence time (which is often on the order of microseconds) is impractical. To this end, finding simple phase estimation methods that are robust, efficient, and can be executed within a timescale of hundreds of nanoseconds remains an important problem.

The likelihood function Pr(d|ω) used in iterative phase estimation is given by the quantum circuit shown in Figure 1. Computing the probability of observing d ∈ {0, 1} as an outcome of this circuit gives us the likelihood function for phase estimation,

Pr(d|ω; t, ω_inv) = cos²(t[ω − ω_inv]/2 + dπ/2). (2)

Equipped with a likelihood function, we can thus reason about the posterior probability Pr(ω|d) for a datum d by using Bayesian inference.
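As a concrete illustration (our own sketch, not part of the original presentation), the Bayesian update (1) with this likelihood can be carried out numerically on a discretized prior; the grid size and experiment settings below are arbitrary choices:

```python
import numpy as np

def likelihood(d, omega, t, omega_inv):
    """Pr(d | omega; t, omega_inv) for the iterative phase estimation circuit."""
    return np.cos(t * (omega - omega_inv) / 2 + d * np.pi / 2) ** 2

def bayes_update(prior, omegas, d, t, omega_inv):
    """One Bayesian update of a prior discretized on a grid of omega values."""
    posterior = prior * likelihood(d, omegas, t, omega_inv)
    return posterior / posterior.sum()    # normalize, absorbing Pr(d)

# Uniform prior over [-pi, pi], updated with one hypothetical datum d = 0.
omegas = np.linspace(-np.pi, np.pi, 2001)
prior = np.full_like(omegas, 1.0 / len(omegas))
posterior = bayes_update(prior, omegas, d=0, t=1.0, omega_inv=0.0)
```

For d = 0 and ω_inv = 0 the posterior is peaked at ω = 0, as expected from the likelihood (2).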

In this setting we assume that we are able to apply the unitary U(t) for any real-valued t. This assumption is eminently reasonable in quantum simulation, where such unitaries can be constructed within error comparable to the simulation error already incurred in implementing U(t). In settings where U(t) can only be implemented directly for integer-valued t, the fractional evolution can be closely approximated using a quantum singular value transformation that gives an ϵ-approximation to U^{t−⌊t⌋} using O(log(1/ϵ)) applications of U(⌊t⌋) [19], provided that ‖log(U(t))‖ ≤ π/4, which is typical for applications in quantum simulation. These observations motivate our assumption that U(t) is a continuous function of t.

A major challenge that we face when trying to numerically implement Bayesian phase estimation arises from the fact that we need to discretize the posterior distribution to ensure that an update to the distribution can be made in finite time. Fortunately, unlike many other problems in Bayesian inference, the curse of dimensionality does not usually appear because the prior distribution Pr(ω) is a function of a single real parameter. Three natural approaches to modeling the posterior distribution are sequential Monte Carlo, grids, and Gaussian processes.

Sequential Monte Carlo (SMC) is a general approach to Bayesian inference that approximates the posterior distribution as a sum of Dirac delta functions: Pr(ω) ≈ Σ_j w_j δ(ω − ω_j) for a set of phases {ω_j : j = 1, …, N_part}. As the posterior distribution is updated, the ω_j are moved through a resampler to allow the distribution to continue to capture the low-order moments of the distribution, even as the learning process excludes an exponentially growing fraction of the original prior distribution. These sequential Monte Carlo approaches have been a workhorse for both phase estimation and Hamiltonian learning [16, 20]. However, for multimodal distributions these methods can fail, and since N_part ∈ O(1/ϵ²) for most applications, the classical memory requirements of storing the posterior distribution can be exponentially worse than one may expect from its contemporaries. This makes such approaches less well suited for memory-limited environments, and motivated the development of rejection filtering phase estimation [15].
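To make the SMC picture concrete, a minimal reweighting step for the phase estimation likelihood might look as follows. This is our sketch of the generic technique, not the specific implementations of [16, 20]; resampling is reduced here to an effective-sample-size diagnostic:

```python
import numpy as np

rng = np.random.default_rng(2024)

def smc_update(locs, weights, d, t, omega_inv):
    """Reweight delta-function particles by the likelihood of datum d."""
    weights = weights * np.cos(t * (locs - omega_inv) / 2 + d * np.pi / 2) ** 2
    return weights / weights.sum()

n_part = 1000
locs = rng.uniform(-np.pi, np.pi, n_part)    # particle locations omega_j
weights = np.full(n_part, 1.0 / n_part)      # initial weights w_j
weights = smc_update(locs, weights, d=0, t=1.0, omega_inv=0.0)

mean_est = np.sum(weights * locs)            # posterior-mean estimate
n_eff = 1.0 / np.sum(weights ** 2)           # resample when n_eff drops too low
```

In a full implementation, a resampler (such as Liu–West) would be invoked whenever n_eff falls below a threshold, moving the particle locations to restore numerical diversity.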

The posterior distribution has also been discretized over grids in other works [13, 21]. These techniques can be fast and accurate when used with adaptive grid refinement. They fail, however, when the posterior distribution is not sufficiently smooth, and they suffer from the curse of dimensionality as all similar methods do.

The final approach that is broadly used, and mirrors the strategy taken here, exploits Gaussian processes to make the inference fast and highly memory efficient. The central idea behind such methods is to assume that the underlying prior distribution is Gaussian; under the condition that the likelihood function is sharply peaked, the resulting posterior distribution will also be approximately Gaussian. Since these updates map Gaussians to Gaussians using closed-form expressions, they are highly efficient in both space and time, and such methods are among the most popular approaches to Bayesian inference in general. However, in Heisenberg-limited phase estimation the underlying distribution is seldom Gaussian, and so these methods have not seen as much use for generic phase estimation experiments (although similar ideas have been considered for non-Heisenberg-limited phase estimation [22]).
Figure 1: Iterative phase estimation circuit used to measure a single datum in the experimental data set used to estimate ω.

3 Random Walk Phase Estimation

Algorithm 1 Basic random walk phase estimation algorithm.
function RandomWalkPhaseEst(μ₀, σ₀)
      ■ Arguments
        μ₀: initial mean
        σ₀: initial standard deviation

      ■ Initialization
     μ ← μ₀
     σ ← σ₀
      ■ Main body
     for i_exp ∈ {0, 1, …, n_exp − 1} do
         ω_inv ← μ + πσ/2
         t ← 1/σ
         Sample a datum d from Pr(d = 0 | ω; ω_inv, t) = cos²(t(ω − ω_inv)/2).
         if d = 0 then
              μ ← μ + σ/√e
         else
              μ ← μ − σ/√e
         σ ← σ√((e − 1)/e)
      ■ Final estimate
     return ω̂ ← μ

The central difference between our approach and most other Bayesian methods that have been proposed is that our method is entirely deterministic conditioned on the data. Specifically, the posterior mean (which we use as our estimate of the true eigenphase) shifts left or right upon collection of each datum by a fixed amount that depends only on the measurement outcome. Thus the trajectory that our estimate of the eigenphase takes as the experiment proceeds follows a random walk with exponentially shrinking step size. This simplicity allows us not only to store an approximation to the posterior distribution using shockingly little memory; it also requires only basic arithmetic that can be performed within the coherence times of modern quantum devices.

Rather than performing exact Bayesian inference, which is not efficient because the memory required grows exponentially with the number of bits of precision needed, we use a form of approximate Bayesian inference. In particular, we assume that the prior distribution is Gaussian at each step:

Pr(ω) = exp(−(ω − μ)²/2σ²) / (√(2π) σ). (3)

Such priors can be efficiently represented because they are a function only of μ and σ. Unfortunately, the conjugate priors for the likelihood (2) are not Gaussian, which means that the posterior distribution is not Gaussian:

Pr(ω|d; t, ω_inv) = cos²(t[ω − ω_inv]/2 + dπ/2) exp(−(ω − μ)²/2σ²) / ∫_{−∞}^{∞} cos²(t[ω − ω_inv]/2 + dπ/2) exp(−(ω − μ)²/2σ²) dω. (4)

However, for most experiments that we consider, the posterior distribution will be unimodal, and thus the majority of the probability mass is captured by the first two moments of the distribution. This justifies the following approximation,

Pr(ω|d; t, ω_inv) ≈ exp(−(ω − μ′)²/2σ′²) / (√(2π) σ′), (5)

where μ′ and σ′ are chosen to match the posterior mean and posterior standard deviation. Specifically,

μ′ := 𝔼[ω|d] = ∫_{−∞}^{∞} ω Pr(ω|d; t, ω_inv) dω

and similarly

σ′² := 𝕍[ω|d] = ∫_{−∞}^{∞} ω² Pr(ω|d; t, ω_inv) dω − μ′².

As stated above, we take the prior Pr(ω) to be Gaussian at each step, characterized by its mean μ₀ := 𝔼[ω] and variance σ₀² := 𝕍[ω]. The Bayes update then consists of finding the updated mean μ = 𝔼[ω|d] = ∫ ω Pr(ω|d) dω and updated variance σ² = 𝕍[ω|d] = ∫ ω² Pr(ω|d) dω − μ², which we can then use to define the prior distribution at the next step. For the phase estimation likelihood and under the assumption of prior Gaussianity, we can do this explicitly. In calculating the update, we assume without loss of generality that under the prior μ₀ = 0 and σ₀² = 1; the general case is obtained from this one by the standard location–scale transformation, and we express the update rule without rescaling at the end of our derivation. The rescaled update rule is then given by

μ′ ← t sin(tω_inv) / (s e^{t²/2} + cos(tω_inv))  and (6a)
σ′² ← 1 − s t² [(e^{t²/2} cos(tω_inv) + s) / (e^{t²/2} + s cos(tω_inv))²], (6b)

where s = (−1)^d.

Since the posterior variance describes the error that we should expect to incur in estimating ω from the currently available observations, we can then choose t and ω_inv to minimize the posterior variance and hence the error. The expected posterior variance is minimized at t = σ^{−1} and ω_inv = μ + πσ/2, at which point both outcomes d = 0, 1 yield the same variance, such that we can specialize (6) at these values to obtain a simpler update rule,

μ′ ← μ + (−1)^d σ/√e, (7a)
σ′ ← σ √((e − 1)/e). (7b)
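As a sanity check (ours, with arbitrarily chosen parameters), the closed-form update (6) can be compared against brute-force numerical moment matching, and evaluated at t·ω_inv = π/2 with t = 1 to recover the outcome-independent variance in (7):

```python
import numpy as np

def update_closed_form(t, omega_inv, d):
    """Rescaled update (6) for a N(0, 1) prior, with s = (-1)^d."""
    s = (-1) ** d
    mu = t * np.sin(t * omega_inv) / (s * np.exp(t**2 / 2) + np.cos(t * omega_inv))
    var = 1 - s * t**2 * (np.exp(t**2 / 2) * np.cos(t * omega_inv) + s) / \
        (np.exp(t**2 / 2) + s * np.cos(t * omega_inv)) ** 2
    return mu, var

def update_numeric(t, omega_inv, d):
    """Posterior moments under a N(0, 1) prior, by direct quadrature."""
    w = np.linspace(-12, 12, 240001)
    post = np.exp(-w**2 / 2) * np.cos(t * (w - omega_inv) / 2 + d * np.pi / 2) ** 2
    post /= post.sum()                      # discrete normalization
    mu = (w * post).sum()
    return mu, (w**2 * post).sum() - mu**2

# At t = 1 and t * omega_inv = pi/2, the variance shrinks to (e - 1)/e
# regardless of the outcome d, matching (7b).
_, var_opt = update_closed_form(1.0, np.pi / 2, 0)
```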

In this way, we can derive the particle guess heuristic (PGH) of Wiebe et al. [9, 20] for the special case of phase estimation with a Gaussian conjugate prior. In particular, the PGH selects each new experiment by drawing two samples ω and ω′ from the current posterior, then choosing ω_inv = ω and t = 1/|ω − ω′|, such that minimizing (6) agrees with the PGH in expectation over ω and ω′.

Critically, the variance update in (7) does not depend on which datum we observe; the reduction in variance at each step is wholly deterministic. Similarly, the mean shifts by the same amount at each step, such that the only dependence of the Gaussian approximation to the posterior distribution on the data is through which direction the mean shifts at each observation. We can thus think of the Gaussian approximation as yielding a random walk that is damped towards the true value of ω. This leads to two observations:


  1. The update rule for the position of the posterior Gaussian distribution is translationally invariant and scale invariant.

  2. The update rule for the posterior variance is scale invariant.

Following this identification, we obtain Algorithm 1, which exploits these invariances in the approximate posterior density to render the inference step computationally trivial and therefore suitable for rapid execution in both ASICs and FPGAs.
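For concreteness, a direct Python transcription of Algorithm 1 might look as follows. This is our sketch: the measurement is simulated from a supplied true eigenphase, and we take ω_inv = μ + πσ/2, the sign convention under which the d = 0 branch of the update μ ← μ + σ/√e moves the mean toward the phases favored by the likelihood (2):

```python
import math
import random

def random_walk_phase_est(mu0, sigma0, omega_true, n_exp=100, seed=42):
    """Random walk phase estimation (Algorithm 1) with simulated measurements."""
    rng = random.Random(seed)
    mu, sigma = mu0, sigma0
    for _ in range(n_exp):
        omega_inv = mu + math.pi * sigma / 2
        t = 1.0 / sigma
        # Simulate one shot of the iterative phase estimation circuit.
        pr0 = math.cos(t * (omega_true - omega_inv) / 2) ** 2
        d = 0 if rng.random() < pr0 else 1
        # Deterministic conditional update of the Gaussian approximation.
        mu += sigma / math.sqrt(math.e) if d == 0 else -sigma / math.sqrt(math.e)
        sigma *= math.sqrt((math.e - 1) / math.e)
    return mu

estimate = random_walk_phase_est(mu0=0.0, sigma0=1.0, omega_true=0.5)
```

Note that given the data record, the trajectory is fully deterministic: only the simulated measurement uses randomness, so a fixed seed reproduces the walk exactly.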

Figure 2: (Left column) Histogram over log-losses for many different trials, compared to the mean loss (Bayes risk) and the median loss. (Right column) Log-loss versus the true value of ω, compared with the finite range (thick lines) in which each walker can explore. (Top row) Trials are run using the basic approach of Algorithm 1. (Middle row) Trials are run using one unwinding step, as in Algorithm 2. (Bottom row) Trials are run using two unwinding steps, as in Algorithm 2.
Figure 3: (Left column) τ_check = 10^{−2}. (Right column) τ_check = 10^{0}. (Top row) Histogram over log-losses for many different trials, using Algorithm 2 with n_unwind = 1. (Bottom row) Histogram over log-losses for the same trials as in the top row, but using particle filtering with Liu–West resampling to incorporate consistency check and unwound data.

Algorithm 1 can fail to provide an accurate estimate ω̂ in two different ways. First, because the mean can only shift by a fixed amount at each step, the walk on ω cannot go further than

(1/√e) Σ_{k=0}^{∞} [(e − 1)/e]^{k/2} = 1/(√e − √(e − 1)) ≈ 2.95 (8)

away from its initial position. Assuming μ₀ = 0 and σ₀ = 1 correctly describe the true probability distribution from which ω is drawn, this corresponds to a failure probability of 0.3%. If we take a prior that is too “tight” by a factor of about 20% (corresponding to a single step of the algorithm), then the failure probability due to the finite support of the random walk can be as large as 1.8%. Thankfully, this failure modality is easy to address, in that we can use an initial prior distribution with an artificially wider variance than our actual prior. We confirm this intuition in Figure 2, where we see that these failures manifest as rare events in which improbable sequences of measurements cause the mean error to be much larger than the median error.
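These failure rates can be reproduced from Gaussian tail probabilities using the excursion bound (8); a quick check of ours:

```python
import math

# Maximum excursion of the walk from its starting point, per (8).
r = math.sqrt((math.e - 1) / math.e)          # per-step shrinkage of sigma
max_excursion = (1 / math.sqrt(math.e)) / (1 - r)

def tail_prob(radius):
    """Pr(|omega| > radius) under a standard normal prior."""
    return math.erfc(radius / math.sqrt(2))

print(max_excursion)                  # ≈ 2.96
print(tail_prob(max_excursion))       # ≈ 0.003, the quoted 0.3% failure rate
# If the prior is one step too tight, the reachable range shrinks to
# r * max_excursion in units of the true standard deviation:
print(tail_prob(r * max_excursion))   # ≈ 0.018, about 1.8%
```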

The other failure modality is more severe, in that the Gaussian approximation itself can fail. By contrast to rejection filtering and other such approaches, in random walk phase estimation, we do not choose the variance based on the data itself, but based on an offline analytic computation. Thus, we must have a way to correct for approximation failures. We propose to do this by unwinding data records, returning to an earlier state of the random walk algorithm. In particular, since the data record uniquely describes the entire path of a random walker, we can step backwards along the history of a random walk.

To determine when we should unwind in this fashion, we will perform additional measurements whose outcomes we can reliably predict if the approximation is still accurate. Choosing ω_inv = μ and t = τ_check/σ for a scale parameter τ_check > 0, we obtain that the probability of a 0 outcome is

Pr(0; μ, τ_check/σ) = ∫ Pr(0|ω; μ, τ_check/σ) Pr(ω) dω (9a)
= (1/2)(1 + e^{−τ_check²/2}), (9b)

where Pr(ω) is assumed to be the normal distribution with mean μ and variance σ², as per the Gaussianity assumption. For τ_check = 1, this corresponds to a false negative rate (that is, inferring that the approximation has failed when it has not) of approximately 19.6%, while τ_check = 10^{−2} gives a false negative rate of 0.002%. The key idea is to use a low-cost consistency check to determine whether the Gaussian assumptions for the posterior distribution still hold; if they fail, we revert the random walk until the consistency check passes. We list the unwinding together with the original random walk in Algorithm 2.
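The quoted false negative rates follow directly from (9); a quick check (ours):

```python
import math

def pr_zero(tau):
    """Pr(d = 0) for a consistency check at omega_inv = mu, t = tau/sigma; eq. (9)."""
    return (1 + math.exp(-tau**2 / 2)) / 2

def false_negative_rate(tau):
    """Probability of a d = 1 outcome even though the Gaussian model still holds."""
    return 1 - pr_zero(tau)

print(false_negative_rate(1.0))    # ≈ 0.197
print(false_negative_rate(1e-2))   # ≈ 2.5e-5
```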

Notably, both the consistency check data and the unwound data are still useful data. In situations where the data can be efficiently recorded for later use, the entire data record including all unwound data and consistency checks may be used in postprocessing to verify that the estimate derived from the random walk is feasible. In the numerical results presented in Section 4, we use the example of particle filtering to perform postprocessing [23], using the QInfer software library [24].

4 Numerical Results

The data given in Figure 2 provide an indication of the performance of random walk phase estimation (RWPE) for typical phase estimation experiments. We find from the data provided that RWPE works extremely well over the trials considered. Specifically, we take the initial value of ω to be Gaussian distributed according to the initial prior. Note that the true values of ω are therefore drawn not from a circular distribution but from a linear Gaussian distribution. While this creates some bias in the output angles in principle, the distribution is broad enough that it approximates a uniform distribution here.

More concretely, we see that for experiments consisting of at most 100 accepted steps for the random walk (and a maximum of 100 000 total steps), the median loss for RWPE is low. Specifically, it is on the order of 10^{−20}, which corresponds to 10 digits of accuracy. As a point of comparison, the Heisenberg limit for the phase observed is 3.5 × 10^{−11} [7]. We find that the median performance of RWPE is close to the Heisenberg limit, specifically also on the order of 3 × 10^{−11}; however, a direct comparison to the Heisenberg limit may not be fully appropriate, as that limit is usually formally stated in terms of the Holevo variance rather than the median deviation. These differences are apparent in the mean, where we find that the distribution of errors in Figure 2 has losses on the order of 1 with non-negligible frequency in the case where no unwinding steps are taken. The impact of this is severely limited by taking one unwinding step; however, we see that for the examples considered, a single failure (which does not appear in the histogram at the scale presented) causes the mean error to be substantial. Despite the median being small, the large mean errors in both cases suggest that we are formally not Heisenberg limited. In contrast, two unwinding steps were found to completely eliminate these issues, and the mean loss was observed to shrink to 10^{−20}, which yields a variance within less than an order of magnitude of the Heisenberg limit for these experiments.

The cause of many of these failures can be seen in the support of the errors over the true value of ω. If no unwinding is performed, then the mean-square error is on the order of unity for true frequencies that are close to π or −π. This is because the update rules for the random walk can only explore [−2.95, 2.95], according to (8). The ability to unwind the variance past the initial prior allows the unwinding algorithm to deal with this.

We further see from the histogram of the quadratic loss for the reported eigenphase versus the actual eigenphase in Figure 3 that the median performance of RWPE outperforms Liu–West resampling applied to the same dataset. We provide even the data used in the consistency checks to both methods in order to ensure a fair comparison. For the case where the check timescale is small, τ_check = 0.01, we see that the median quadratic loss is nearly two orders of magnitude smaller than Liu–West with 8000 particles. For the larger consistency check time, τ_check = 1, we find a separation between the two on the order of nearly 5 orders of magnitude. The increased separation in performance is likely because the larger timesteps used in the consistency check lead to multimodal distributions in the particle filter, which cannot be well approximated using the Liu–West resampler.

In both cases the mean errors are observed to be large. This is expected: when a failure occurs, the error tends to be on the order of unity, which means that when the median is on the order of 10^{−10}, rare errors can dominate the mean even though they do not dominate the median. These problems can be mitigated by increasing the number of unwinding steps or increasing the number of particles used in the particle filter approximations.

Here 8000 particles are used in the particle filters because previous work [15] found that this tended to yield stable results while also allowing updates to be performed on a scale of tens of milliseconds on a single-core CPU. In contrast, RWPE requires on the order of a microsecond to perform an update on a single-core CPU. The difference between the two processing times means that RWPE is vastly better suited for online settings in superconducting hardware, where coherence times are often on the order of 100 μs.

Figure 4: The data presented here provide a demonstration of Heisenberg-limited scaling, which corresponds to a quadratic loss that is inverse to the square of the time, for up to 100 accepted steps of RWPE. Here the Heisenberg limit corresponds to a line of zero slope for the mean square error, and is observed for the data with 2 and 3 unwinding steps.

The final question that we wish to probe numerically is whether the algorithm is capable of achieving Heisenberg-limited scaling in practice. We defer discussion of the optimal constants for this scaling to the following section. Specifically, for Heisenberg-limited scaling we need the Holevo variance to scale like the inverse of the square of the total simulation time. The Holevo variance corresponds to the ordinary variance for narrow distributions that have negligible overlap with the branch cut chosen for the eigenphases returned by phase estimation.

We examine this scaling in Figure 4. We find for zero and one unwinding steps that, while the median error is small, the mean error does not shrink as the number of iterations grows. In contrast, we see clear evidence (a line of zero slope) for Heisenberg-limited scaling if we use two or three unwinding steps. This suggests that RWPE is capable of achieving Heisenberg-limited scaling even in an online setting.

5 Decision Theoretic Bounds

In this work, we have expressed our algorithm as approximating the Bayesian mean estimator (BME) ω̂_BME(D) := 𝔼[ω|D]. This choice is motivated by a decision-theoretic argument, in which we penalize an estimator by a particular amount L(ω̂(D), ω) for returning a particular estimate ω̂(D) when the true value of the parameter of interest is ω. A function L used to assign such penalties is called a loss function; as described in the main body, the quadratic loss L_Q(ω̂(D), ω) := (ω̂(D) − ω)² is a particularly useful and broadly applicable loss function. We will drop the subscript Q when it is clear from context that we are referring to this choice.

In any case, once we have adopted such a loss function, we can then reason about the average loss incurred by a particular estimator. In particular, the Bayes risk r is defined as the expected loss over all hypotheses ω ∼ π drawn from a prior π and over all data sets D. That is,

\displaystyle r(\hat{\omega})\mathrel{:=}\mathbb{E}_{\omega\sim\pi,D}[L(\hat{\omega}(D),\omega)]. (10)

The celebrated result of Banerjee et al. [25] then provides that the BME minimizes the Bayes risk for any loss function of the form L(ω^(D),ω)=F(ω^(D))F(ω)(ddωF(ω))(ω^ω)L(\hat{\omega}(D),\omega)=F(\hat{\omega}(D))-F(\omega)-(\frac{\mathrm{d}}{\mathrm{d}\omega}F(\omega))(\hat{\omega}-\omega) for a convex function FF. This holds for the quadratic loss, as well as a number of other important examples such as the Kullback–Leibler divergence.
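
A quick Monte Carlo check (illustrative only; the posterior here is an arbitrary Gaussian, not one produced by our algorithm) confirms the defining property: shifting an estimate away from the posterior mean strictly increases the expected quadratic loss.

```python
import numpy as np

# Empirically, E[(a - X)^2] = Var(X) + (a - mean(X))^2, so the posterior
# mean minimizes the expected quadratic loss over the posterior samples.
rng = np.random.default_rng(1)
posterior_samples = rng.normal(0.3, 0.5, size=50_000)

def expected_quadratic_loss(estimate):
    return np.mean((estimate - posterior_samples) ** 2)

bme = np.mean(posterior_samples)
for shift in [-0.2, -0.05, 0.05, 0.2]:
    assert expected_quadratic_loss(bme + shift) > expected_quadratic_loss(bme)
```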

Having thus argued that the BME is optimal with respect to the Bayes risk, we are then interested in how closely our algorithm, which approximates the BME, achieves that risk. To address this, we note that the van Trees inequality [26] lower bounds the Bayes risk achievable by any estimator \hat{\omega} as

r(ω^)1𝔼ωπ[I(ω)]+I0,\displaystyle r(\hat{\omega})\geq\frac{1}{\mathbb{E}_{\omega\sim\pi}[I(\omega)]+I_{0}}, (11)

where I(ω):=𝔼D[(ddωlogPr(D|ω))2]I(\omega)\mathrel{:=}\mathbb{E}_{D}[\left(\frac{\mathrm{d}}{\mathrm{d}\omega}\log\Pr(D|\omega)\right)^{2}] is the usual Fisher information, and where I0:=𝔼ωπ[(ddωlogπ(ω))2]I_{0}\mathrel{:=}\mathbb{E}_{\omega\sim\pi}[\left(\frac{\mathrm{d}}{\mathrm{d}\omega}\log\pi(\omega)\right)^{2}] is a correction for the information provided by the prior distribution π\pi. Thus, the van Trees inequality can be thought of as a correction to the traditional Cramér–Rao bound for the effects of prior information [27]. Whereas the Cramér–Rao bound only holds for unbiased estimators, the BME is asymptotically unbiased but may in general be biased for a finite-sized data set.
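
For completeness, the Fisher information entering this bound can be computed in closed form for the likelihood used by our experiments (a standard calculation; we write \theta := t(\omega - \omega_{\mathrm{inv}})/2):

```latex
I(\omega)
  = \sum_{d \in \{0,1\}} \frac{\left(\partial_\omega \Pr(d|\omega)\right)^2}{\Pr(d|\omega)}
  = \frac{t^2}{4} \sin^2(2\theta)
    \left[ \frac{1}{\cos^2\theta} + \frac{1}{\sin^2\theta} \right]
  = t^2,
```

so a sequence of measurements with evolution times t_i contributes I(\omega) = \sum_i t_i^2, independent of \omega.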

In the case that π\pi is a normal distribution 𝒩(0,1)\mathcal{N}(0,1), however, the correction I0I_{0} for the bias introduced by the prior distribution is identically zero, such that the van Trees and Cramér–Rao bounds coincide. More generally, the set of true values ω\omega for which the Cramér–Rao bound can be violated by a biased estimator approaches Lebesgue measure zero asymptotically with the number of observations nexpn_{\textrm{exp}} [28], such that the Cramér–Rao bound is useful for characterizing the performance of biased estimators such as the BME.

Applying this bound to Algorithm 2, we note that the experiments chosen by the algorithm are deterministic unless the unwinding step proceeds backwards from the initial prior. Thus, the Cramér–Rao and hence the van Trees inequalities can be calculated explicitly for the particular experimental choices made by Algorithm 2, yielding that

\displaystyle r(\hat{\omega})\geq\frac{1}{\sum_{i_{\textrm{exp}}=0}^{n_{\textrm{exp}}-1}\left(\frac{\mathrm{e}}{\mathrm{e}-1}\right)^{i_{\textrm{exp}}}}\geq\left(\left[\mathrm{e}-1\right]\left[\frac{\mathrm{e}}{\mathrm{e}-1}\right]^{n_{\textrm{exp}}}\right)^{-1}, (12)

where we have used that I(\omega)=\sum_{i}t_{i}^{2} [29]. For n_{\textrm{exp}}=100, r(\hat{\omega})\gtrapprox 7\times 10^{-21}. This corresponds to the mean-square error observed in Figure 2 for the case where 2 unwinding steps are used; thus RWPE performs a nearly optimal analysis of the data using on the order of a millisecond of total processing time.

Critically, the van Trees inequality holds in expectation, such that the loss in any particular experiment can be significantly better than (12). Moreover, the consistency checks utilized by Algorithm 2 provide additional data not included in the bound given by (12), such that our algorithm can violate the van Trees inequality evaluated only at the accepted measurement results. In that sense, our algorithm can be thought of as a heuristic for importance sampling from amongst the data, such that the usual discussions of optimality and postselection apply [29, 30].

6 Conclusion

To conclude, our work has shown a new approach to adaptive Bayesian phase estimation that uses a Gaussian prior coupled with an unwinding rule to improve the stability of the learning process. Our method requires a constant amount of memory to perform the analysis and is further Heisenberg limited. Further, the fact that a constant amount of computation needs to be performed at every step in the process means that it is trivial to implement the updates to the prior distribution using an FPGA without resorting to a lookup table. This makes our approach arguably the best known Heisenberg-limited approach for quantum phase estimation on near-term experiments wherein latency may be a concern and limitations due to heating prove a bottleneck. Specifically, we find that our approach requires O(\log(1/\epsilon)) bits of memory to perform phase estimation within error \epsilon, and requires O(\log(1/\epsilon)) gate operations, assuming that a constant number of unwinding steps are needed and that single-precision data types are used. In contrast, Bayesian phase estimation using SMC requires O(\log(1/\epsilon)/\epsilon^{2}) arithmetic operations and cosine evaluations to perform phase estimation [31]. This gives our approach a substantial advantage in classical time and space complexity relative to its classical brethren.

Important applications of these ideas include not only quantum phase estimation for algorithmic applications but also metrology; further, these ideas are sufficiently cost effective that in some platforms with slow quantum operations, they could be used to assist in adaptive measurement of qubits or other quantum systems. While this work focuses on applications within the quantum domain, similar considerations also apply outside of this setting, and the ability to do adaptive inference in memory-limited environments may have further applications in broader settings such as autonomous drones or control engineering. Regardless, the ideas behind this work show that adaptive Bayesian inference need not be expensive and, through the use of clever heuristics, can be brought within the reach of even the most modest control hardware.

References

  • Shor [1994] P. W. Shor, “Algorithms for quantum computation: discrete logarithms and factoring,” in Proceedings 35th Annual Symposium on Foundations of Computer Science (IEEE, 1994) pp. 124–134.
  • Harrow et al. [2009] A. W. Harrow, A. Hassidim,  and S. Lloyd, “Quantum algorithm for linear systems of equations,” Physical review letters 103, 150502 (2009).
  • Reiher et al. [2017] M. Reiher, N. Wiebe, K. M. Svore, D. Wecker,  and M. Troyer, “Elucidating reaction mechanisms on quantum computers,” Proceedings of the national academy of sciences 114, 7555 (2017).
  • Abrams and Lloyd [1997] D. S. Abrams and S. Lloyd, “Simulation of many-body fermi systems on a universal quantum computer,” Physical Review Letters 79, 2586 (1997).
  • von Burg et al. [2021] V. von Burg, G. H. Low, T. Häner, D. S. Steiger, M. Reiher, M. Roetteler,  and M. Troyer, “Quantum computing enhanced computational catalysis,” Physical Review Research 3, 033055 (2021).
  • Su et al. [2021] Y. Su, D. W. Berry, N. Wiebe, N. Rubin,  and R. Babbush, “Fault-tolerant quantum simulations of chemistry in first quantization,” PRX Quantum 2, 040332 (2021).
  • Berry et al. [2001] D. W. Berry, H. M. Wiseman,  and J. K. Breslin, “Optimal input states and feedback for interferometric phase estimation,” Physical Review A 63, 053804 (2001).
  • Hentschel and Sanders [2010] A. Hentschel and B. C. Sanders, “Machine Learning for Precise Quantum Measurement,” Physical Review Letters 104, 063603 (2010).
  • Wiebe et al. [2014] N. Wiebe, C. Granade, C. Ferrie,  and D. G. Cory, “Hamiltonian learning and certification using quantum resources,” Physical Review Letters 112, 190501 (2014).
  • Zintchenko and Wiebe [2016] I. Zintchenko and N. Wiebe, “Randomized gap and amplitude estimation,” Physical Review A 93, 062306 (2016).
  • Wang et al. [2017] J. Wang, S. Paesani, R. Santagati, S. Knauer, A. A. Gentile, N. Wiebe, M. Petruzzella, J. L. O’Brien, J. G. Rarity, A. Laing, et al., “Experimental quantum hamiltonian learning,” Nature Physics 13, 551 (2017).
  • Kitaev et al. [2002] A. Y. Kitaev, A. Shen,  and M. N. Vyalyi, Classical and quantum computation, 47 (American Mathematical Soc., 2002).
  • Svore et al. [2013] K. M. Svore, M. B. Hastings,  and M. Freedman, “Faster phase estimation,” arXiv:1304.0741  (2013).
  • Kimmel et al. [2015] S. Kimmel, G. H. Low,  and T. J. Yoder, “Robust calibration of a universal single-qubit gate set via robust phase estimation,” Physical Review A 92, 062315 (2015).
  • Wiebe and Granade [2016] N. Wiebe and C. Granade, “Efficient Bayesian phase estimation,” Phys. Rev. Lett. 117, 010503 (2016).
  • Paesani et al. [2017] S. Paesani, A. A. Gentile, R. Santagati, J. Wang, N. Wiebe, D. P. Tew, J. L. O’Brien,  and M. G. Thompson, “Experimental Bayesian Quantum Phase Estimation on a Silicon Photonic Chip,” Physical Review Letters 118, 100503 (2017).
  • Eldredge et al. [2016] Z. Eldredge, M. Foss-Feig, S. L. Rolston,  and A. V. Gorshkov, “Optimal and secure measurement protocols for quantum sensor networks,” arXiv:1607.04646 [quant-ph]  (2016), arXiv: 1607.04646.
  • Kitaev [1995] A. Y. Kitaev, “Quantum measurements and the Abelian stabilizer problem,” arXiv:quant-ph/9511026  (1995), quant-ph/9511026 .
  • Gilyén et al. [2019] A. Gilyén, Y. Su, G. H. Low,  and N. Wiebe, “Quantum singular value transformation and beyond: exponential improvements for quantum matrix arithmetics,” in Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing (2019) pp. 193–204.
  • Wiebe et al. [2015] N. Wiebe, C. Granade,  and D. G. Cory, “Quantum bootstrapping via compressed quantum Hamiltonian learning,” New Journal of Physics 17, 022005 (2015).
  • Tipireddy and Wiebe [2020] R. Tipireddy and N. Wiebe, “Bayesian phase estimation with adaptive grid refinement,” arXiv preprint arXiv:2009.07898  (2020).
  • Lumino et al. [2018] A. Lumino, E. Polino, A. S. Rab, G. Milani, N. Spagnolo, N. Wiebe,  and F. Sciarrino, “Experimental phase estimation enhanced by machine learning,” Physical Review Applied 10, 044033 (2018).
  • Doucet and Johansen [2011] A. Doucet and A. M. Johansen, A Tutorial on Particle Filtering and Smoothing: Fifteen Years Later (2011).
  • Granade et al. [2017] C. Granade, C. Ferrie, I. Hincks, S. Casagrande, T. Alexander, J. Gross, M. Kononenko,  and Y. Sanders, “QInfer: Statistical inference software for quantum applications,” Quantum 1, 5 (2017).
  • Banerjee et al. [2005] A. Banerjee, X. Guo,  and H. Wang, “On the optimality of conditional expectation as a Bregman predictor,” IEEE Transactions on Information Theory 51, 2664 (2005).
  • Gill and Levit [1995] R. D. Gill and B. Y. Levit, “Applications of the van Trees inequality: A Bayesian Cramér-Rao bound,” Bernoulli 1, 59 (1995).
  • Cover and Thomas [1991] T. M. Cover and J. A. Thomas, “Information theory and statistics,” Elements of information theory 1, 279 (1991).
  • Opper [1998] M. Opper, “On-line learning in neural networks,”  (Cambridge University Press, New York, NY, USA, 1998) pp. 363–378.
  • Ferrie et al. [2013] C. Ferrie, C. E. Granade,  and D. G. Cory, “How to best sample a periodic probability distribution, or on the accuracy of Hamiltonian finding strategies,” Quantum Information Processing 12, 611 (2013).
  • Combes et al. [2014] J. Combes, C. Ferrie, Z. Jiang,  and C. M. Caves, “Quantum limits on postselected, probabilistic quantum metrology,” Physical Review A 89, 052117 (2014).
  • Granade et al. [2012] C. E. Granade, C. Ferrie, N. Wiebe,  and D. G. Cory, “Robust online hamiltonian learning,” New Journal of Physics 14, 103013 (2012).
Algorithm 2 Random walk phase estimation algorithm with consistency checks and data unwinding.
function RandomWalkPhaseEst( μ0\mu_{0}, σ0\sigma_{0}, τcheck\tau_{\textrm{check}}, nunwindn_{\mathrm{unwind}}, nexpn_{\textrm{exp}} )
      \blacksquare Arguments
        μ0\mu_{0}: initial mean
        σ0\sigma_{0}: initial standard deviation
        τcheck\tau_{\textrm{check}}: consistency check scale
        nunwindn_{\mathrm{unwind}}: number of unwinding steps
        nexpn_{\textrm{exp}}: number of experiments
     
      \blacksquare Initialization
     μμ0\mu\leftarrow\mu_{0}
     σσ0\sigma\leftarrow\sigma_{0}
     DD\leftarrow a stack of Boolean values.
      \blacksquare Main body
     for iexp{0,1,nexp1}i_{\textrm{exp}}\in\{0,1,\dots n_{\textrm{exp}}-1\} do
         ωinvμπσ/2\omega_{\mathrm{inv}}\leftarrow\mu-\pi\sigma/2
         t1/σt\leftarrow 1/\sigma
         Sample a datum dd from Pr(d=0|ω;ωinv,t)=cos2(t(ωωinv)/2)\Pr(d=0|\omega;\omega_{\mathrm{inv}},t)=\cos^{2}(t(\omega-\omega_{\mathrm{inv}})/2).
         Push dd onto DD.
         if d=0d=0 then
              μμ+σ/e\mu\leftarrow\mu+\sigma/\sqrt{e}
         else
              μμσ/e\mu\leftarrow\mu-\sigma/\sqrt{e}          
         σσ(e1)/e\sigma\leftarrow\sigma\sqrt{(e-1)/e}
          \blacksquare Data unwinding
         if nunwind>0n_{\mathrm{unwind}}>0 then
               /​​/ Perform a consistency check; d=0d^{\prime}=0 is most probable if our approximation is correct.
              Sample a datum dd^{\prime} from Pr(d=0|ω;μ,τcheck/σ)=cos2(τcheck/σ(ωμ)/2)\Pr(d^{\prime}=0|\omega;\mu,\tau_{\textrm{check}}/\sigma)=\cos^{2}(\tau_{\textrm{check}}/\sigma(\omega-\mu)/2).
               /​​/ Keep unwinding until the consistency check passes.
              while d=1d^{\prime}=1 do
                  for iunwind{0,,nunwind1}i_{\mathrm{unwind}}\in\{0,\dots,n_{\mathrm{unwind}}-1\} do
                       σσe/(e1)\sigma\leftarrow\sigma\sqrt{e/(e-1)}
                       if DD is not empty then /​​/ Only shift μ\mu backwards when we unwound actual data.
                           dpopDd\leftarrow\operatorname{pop}D
                           if d=0d=0 then
                                μμσ/e\mu\leftarrow\mu-\sigma/\sqrt{e}
                           else
                                μμ+σ/e\mu\leftarrow\mu+\sigma/\sqrt{e}                                                                     
                   /​​/ Perform a new consistency check.
                  Sample a datum dd^{\prime} from Pr(d=0|ω;μ,τcheck/σ)=cos2(τcheck/σ(ωμ)/2)\Pr(d^{\prime}=0|\omega;\mu,\tau_{\textrm{check}}/\sigma)=\cos^{2}(\tau_{\textrm{check}}/\sigma(\omega-\mu)/2).                             
      \blacksquare Final estimate
     return ω^μ\hat{\omega}\leftarrow\mu
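
The listing above can be transcribed into ordinary Python for simulation purposes. The sketch below (not the authors' implementation) draws measurement outcomes from the stated likelihood with a known true \omega; note that we take \omega_{\mathrm{inv}} = \mu + \pi\sigma/2 so that the d = 0 branch shifts \mu upward consistently with the Bayesian update under \Pr(d=0) = \cos^2(t(\omega - \omega_{\mathrm{inv}})/2), which differs from the listing only by a relabeling of the outcomes. On hardware, the two sampling calls would query the device instead.

```python
import numpy as np

def random_walk_phase_est(omega_true, mu0, sigma0, tau_check, n_unwind,
                          n_exp=100, rng=None):
    """Sketch of Algorithm 2, with measurements simulated from omega_true."""
    rng = np.random.default_rng() if rng is None else rng

    def sample(omega_inv, t):
        # Simulate one measurement from Pr(d=0) = cos^2(t (w - w_inv) / 2).
        p0 = np.cos(t * (omega_true - omega_inv) / 2) ** 2
        return 0 if rng.random() < p0 else 1

    mu, sigma = mu0, sigma0
    data = []  # stack of accepted outcomes
    for _ in range(n_exp):
        d = sample(mu + np.pi * sigma / 2, 1 / sigma)
        data.append(d)
        mu += sigma / np.sqrt(np.e) if d == 0 else -sigma / np.sqrt(np.e)
        sigma *= np.sqrt((np.e - 1) / np.e)
        if n_unwind > 0:
            # Consistency check: d' = 0 is most probable when mu is accurate.
            while sample(mu, tau_check / sigma) == 1:
                for _ in range(n_unwind):
                    sigma *= np.sqrt(np.e / (np.e - 1))
                    if data:  # only shift mu when unwinding actual data
                        d = data.pop()
                        mu += (-sigma / np.sqrt(np.e) if d == 0
                               else sigma / np.sqrt(np.e))
    return mu
```

Because the state consists only of \mu, \sigma and a stack of bits, the per-update cost is constant, which is the property that makes an FPGA implementation attractive.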

Appendix A Unwinding Past the Initial Prior

In Figure 2, we presented the results of using Algorithm 2 with unwinding steps that can continue backwards past the initial prior. Effectively, unwinding past the initial prior automates interventions such as those used by rejection filtering PE [15] by allowing RWPE to explore hypotheses that are considered improbable under the initial prior distribution.

In this Appendix, we underscore that this rule is critical to allow the random walk to exceed the finite constraints of the basic walk presented in Algorithm 1 by repeating the previous analysis with unwinding on actual data only. That is, in the results presented within Figure 5, if an unwinding step would pop an empty stack D, then the unwinding step is aborted instead.

Figure 5: (Left column) Histogram over log-losses for many different trials, compared to the mean loss (Bayes risk) and the median loss. (Right column) Log-loss versus the true value of ω\omega, compared with the finite range (thick lines) in which each walker can explore. (Top row) Trials are run using the basic approach of Algorithm 1. (Middle row) Trials are run using one constrained unwinding step, as in Algorithm 2. (Bottom row) Trials are run using two constrained unwinding steps, as in Algorithm 2.

We note that the constrained unwinding, while still an improvement over the initial version of Algorithm 1, does not adequately deal with the effects of the finite support of the basic walk. This observation is underscored by the results presented in Figure 6, where we consider the frequentist risk rather than the traditional Bayes risk. Concretely, the frequentist risk is defined as the average loss incurred by an estimator when the true value of \omega is fixed, R(\hat{\omega},\omega)\mathrel{:=}\mathbb{E}_{D}[(\hat{\omega}-\omega)^{2}]. We then observe that without allowing the unwinding step to proceed backwards from the initial prior, the frequentist risk suddenly worsens by a factor of approximately 10^{20} when the finite extent of the random walk begins to dominate.
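
To make the distinction concrete, the following toy sketch (illustrative only; a simple shrinkage estimator on Gaussian data, not RWPE) estimates the frequentist risk by Monte Carlo and shows how it varies with the true parameter, whereas the Bayes risk would average this curve over the prior:

```python
import numpy as np

# Toy example: the shrinkage estimator 0.8 * mean(data) pulls estimates
# toward the prior mean (0), so its frequentist risk depends on omega.
rng = np.random.default_rng(2)

def frequentist_risk(omega, n_data=10, n_trials=20_000):
    # Average quadratic loss over data sets at a *fixed* true omega.
    data = rng.normal(omega, 1.0, size=(n_trials, n_data))
    estimates = 0.8 * data.mean(axis=1)
    return np.mean((estimates - omega) ** 2)

# Shrinking toward the prior mean helps near 0 and hurts far away:
assert frequentist_risk(0.0) < frequentist_risk(3.0)
```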

Constrained Unwinding


Unwinding Past Initial Prior

Figure 6: (Top section) Unwinding constrained by initial prior. (Bottom section) Unwinding as described in Algorithm 2. (Top row of each section) Frequentist risk and median loss as a function of |ω||\omega|, the absolute value of the true ω\omega used to generate data in each trial. (Bottom row of each section) Median number of unwinding steps taken to accept 100 experiments. (Columns) Different numbers of unwinding steps allowed.

By contrast, the frequentist risk for RWPE with unwinding of at least two steps past the initial prior is much flatter, indicating that the risk remains acceptable across the full range of valid hypotheses.