Quantum enhanced Markov chains require fine-tuned quenches

Alev Orfi Center for Computational Quantum Physics, Flatiron Institute, 162 5th Avenue, New York, NY 10010, USA Center for Quantum Phenomena, Department of Physics, New York University, 726 Broadway, New York, New York 10003, USA Dries Sels Center for Computational Quantum Physics, Flatiron Institute, 162 5th Avenue, New York, NY 10010, USA Center for Quantum Phenomena, Department of Physics, New York University, 726 Broadway, New York, New York 10003, USA

Abstract

Quantum-enhanced Markov chain Monte Carlo, an algorithm in which configurations are proposed through a measured quantum quench and accepted or rejected by a classical algorithm, has been proposed as a possible method for robust quantum speedup on imperfect quantum devices. While this procedure is resilient to noise and control imperfections, the potential for quantum advantage is unclear. By upper-bounding the gap of the Markov chain, we identify competing factors that limit the algorithm’s performance. One needs the quantum dynamics to efficiently delocalize the system over a range of classical states, however, it is also detrimental to introduce too much entropy through the quench. Specifically, we show that in the long-time limit, the gap of the Markov chain is bounded by the inverse participation ratio of the classical states in the eigenstate basis, showing there is no advantage when quenching to an ergodic system. For the paradigmatic Sherrington-Kirkpatrick and 3-spin model, we identify the regime of optimal spectral gap scaling and link it to the system’s eigenstate properties.

I Introduction

Sampling from a Boltzmann distribution of a classical system is a critical task across multiple fields. Consider a classical Hamiltonian, $H_{c}$ , which characterizes a system of $N$ binary random variables $x_{i}=\pm 1$ . The Boltzmann distribution,

\pi(x)=\frac{1}{\mathcal{Z}}e^{-\beta H_{c}(x)},

(1)

describes the probability of the system occupying the configuration $x=\{x_{i}\}$ at a specified temperature $T=1/\beta$ . As the partition function, $\mathcal{Z}$ , is typically intractable, sampling from the Boltzmann distribution is often the only approach for estimating the system’s thermodynamic properties. This class of sampling problem is additionally a common subroutine in machine learning and combinatorial optimization, often emerging as a computational bottleneck in these methods.

Several quantum algorithms for this task, such as quantum walks and quantum simulated annealing, are known to give algorithmic speedups over classical methods for certain problems [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]. However, these algorithms rely on fault-tolerant quantum computers with many qubits, making them infeasible on today’s devices. Current quantum devices are limited in size and performance, and few algorithms for sampling from a Boltzmann distribution are available within the constraints of current hardware [15, 16, 17, 18, 19].

One such algorithm, introduced in Ref. [17], the quantum-enhanced Markov chain Monte Carlo, demonstrates promising error resilience. This hybrid approach constructs a Markov chain over classical configurations, proposing new configurations based on measured quantum evolution. Next, a classical step decides whether to accept or reject the proposed configuration as the new state of the chain. Empirical tests reported in Ref. [17] indicated a polynomial speedup in sampling low-temperature spin glass problems compared to uniform and local classical MCMC methods. Additionally, this algorithm was as a subroutine in an adversarial quantum auto-encoder, showing a large temperature region with a larger spectral gap compared to its classical counterparts [20]. In Ref. [21], it was shown that the speedup is not generic across models as it is provably absent for some worst-case problems. In this work, we extend the analysis to a more practically relevant class of Hamiltonians and show generic performance behaviour related to the ergodicity of the quantum quench, allowing for the identification of favourable performance regimes.

Figure 1: Illustrated in (a) is the conventional two-step Markov chain update strategy. In contrast, the quantum-enhanced Markov chain Monte Carlo algorithm alters only the initial step of this process, as depicted in (b).

II Algorithm Overview

Markov chain Monte Carlo (MCMC) is an extremely successful and widely used class of sampling algorithms. These methods rely on Markov chains designed to equilibrate to the desired probability distribution. Often, these chains are constructed with a proposal step followed by an acceptance step, facilitating easier engineering to ensure the conditions for convergence are met. Specifically, if the chain is in the state $y$ , a new state $x$ is proposed with probability $Q(x|y)$ . In the acceptance step, this new state is adopted as the new state of the chain, with probability $A(x|y)$ , known as the acceptance probability. This process is outlined in Fig. 1(a).

The quantum-enhanced Markov chain Monte Carlo algorithm follows this two-step procedure, where only the proposal step requires quantum computation, as seen in Fig. 1(b). The current state of the Markov chain, a classical configuration, is encoded as a computational basis state $\ket{x}$ . A unitary $U$ is applied to the state and then the system is measured in the computational basis, resulting in a new classical configuration. This method results in the following proposal probability,

Q(x|y)=|\langle x|U|y\rangle|^{2}.

(2)

A common acceptance rule, the Metropolis-Hasting probability,

A(x|y)=\min\left(1,e^{-\beta(H_{c}(x)-H_{c}(y))}\frac{Q(y|x)}{Q(x|y)}\right),

(3)

depends only on the ratio of proposal probabilities. If $Q(y|x)=Q(x|y)$ , the acceptance probability can be computed efficiently even if the proposal probability is intractable. With this construction, the chain satisfies the detailed balance condition. If $Q(x|y)>0$ $\forall x,y$ , the chain is aperiodic and irreducible; ensuring convergence to the desired distribution [17].

Due to the classical acceptance step, this algorithm maintains convergence even with errors in the proposal step, making it well-suited for current noisy quantum devices. The only problematic errors are those that remove the needed proposal probability symmetry. However, mitigation methods such as state preparation and measurement twirling can be employed [17]. Although many choices of $U$ meet the symmetry requirement, they do not all lead to an algorithmic improvement over classical methods. In this work the focus is on evolution of the form $U=e^{-iHt}$ with the following time-independent Hamiltonian,

H=H_{c}+h\sum_{i=1}^{N}\sigma^{x}_{i}.

(4)

After initialization, Markov chains require a warm-in period, known as the mixing time, to equilibrate to its stationary distribution. Specifically, the mixing time is the number of steps required for the total variation distance to $\pi$ to be $\epsilon$ -small. On difficult problems, chains suffer from slow convergence, making the mixing time the important metric in MCMC performance.

A Markov chain can be specified with a stochastic transition matrix $P$ whose elements $P(y,x)$ describe the probability of moving from the state $y$ to $x$ . For the two-step construction,

P(y,x)=\begin{cases}Q(x|y)A(x|y),&\text{if}\ x\neq y\\ 1-\sum_{z\neq x}Q(x|z)A(x|z),&\text{if}\ x=y\end{cases}.

(5)

As $P$ is stochastic, its eigenvalues have a magnitude at most 1, and the unique stationary distribution is associated with the eigenvalue 1. The repeated application of $P$ determines the distribution of the chain, therefore, the mixing time can be connected to the spectral gap $\delta=1-|\lambda_{2}|$ of $P$ , formally:

(\delta^{-1}-1)\ln\left(\frac{1}{2\epsilon}\right)\leq t_{\text{mix}}\leq\delta^{-1}\ln\left(\frac{1}{\epsilon\pi_{\min}}\right)

(6)

where $\pi_{\min}=\min_{x\in S}\pi(x)$ [22]. The spectral gap is commonly used in performance analyses of MCMC algorithms. Exact spectral gap calculation is, however, limited to small systems as the transition matrix grows with the dimension of the state space and quickly becomes intractable to diagonalize.

The performance of the quantum-enhanced Markov chain Monte Carlo algorithm was probed through spectral gap calculations on the Sherrington-Kirkpatrick (SK) model in Ref. [17]. The two free parameters of the proposal strategy, $t$ and $h$ , were randomly drawn at each MCMC step. This strategy was compared to local and uniform classical proposal strategies. Even without optimizing the free parameters, the algorithm exhibited favourable mixing times in the low-temperature regime over the classical counterparts. Although these numerical studies indicated a polynomial speedup, they are limited to small systems. It is unclear whether this advantage persists for larger systems, given the significant finite-sized effects known in the SK model.

The improvement in the mixing time is suggested to be due to the proposal strategy favouring states close in energy, as the quantum evolution depends on the classical Hamiltonian. These states can be difficult to propose classically as they can be decorrelated from the initial state due to the thermalization of the system after the quantum quench. Here, we focus on how the mixing time depends on the free parameters $t$ and $h$ as a strategy to probe the underlying mechanism responsible for the found advantage. Understanding this dependence is essential in knowing if a similar improvement can be obtained where this quantum evolution is classically simulatable. Through analysis of the bottlenecks of the quantum-enhanced method, we show that the speedup observed numerically can only be achieved when quenching to a non-ergodic quantum system, highlighting the difficulty in achieving a generic speedup.

III Bottleneck Analysis

The algorithmic performance of MCMC methods is often not formally derived but determined empirically. This poses a difficulty for the quantum-enhanced method as very limited system sizes are accessible. However, rigorous upper bounds for the mixing time of classical MCMC methods have been found in certain cases using analytical techniques [22]. One such technique is to analyze the bottlenecks of the chain, which restrict the chain’s movement from one section of states to another.

The equilibrium flow between two configurations $x$ and $y$ is defined as $E(x,y)=\pi(x)P(x,y)$ . There is a natural graph corresponding to any MCMC method, depicted in Fig. 2, where the nodes represent the possible configurations and the edge weights indicate the equilibrium flow between these configurations. The mixing time of the chain is related to the equilibrium flow between a subset of configurations $S$ , to its complement,

E(S,S^{c})=\sum_{x\in S,y\in S^{c}}\pi(x)P(x,y).

(7)

An upper bound can be placed on the spectral gap through minimization over possible subsets [22],

\delta\leq\min_{S:\pi(S)\leq 1/2}\frac{E(S,S^{c})}{\pi(S)\pi(S^{c})}:=\Lambda_{\text{min}},

(8)

where $\pi(S)=\sum_{x\in S}\pi(x)$ is the Boltzmann weight of the subset $S$ . Here, the minimization can be interpreted as finding the cut with the most restricted equilibrium flow, thus determining the chain’s mixing time.

At high temperatures, the Boltzmann distribution is close to uniform and, therefore, easy to sample. The difficulties arise at lower temperatures, where the distribution favours low-energy configurations. We will focus on this low-temperature regime, where the quantum-enhanced method numerically showed favourable mixing time. The equilibrium flow of any set $B$ to its complement provides a further upper bound on the spectral gap of this chain,

\delta\leq\Lambda_{\text{min}}\leq\Lambda(B)=\frac{E(B,B^{c})}{\pi(B)\pi(B^{c})}.

(9)

In the low-temperature regime, for some models, one can devise a reasonable choice of $B$ resulting in a tight bound on the spectral gap. This method allows for the behaviour of the MCMC to be probed for system sizes where exact spectral gap calculation is not accessible numerically [21].

Figure 2: Equilibrium flow graph of an MCMC method. The nodes represent different spin configurations, and the edge weights denote the equilibrium flow between these configurations. The normalized flow through the most restricted graph cut, shown with the dashed line, provides an upper bound for the spectral gap, as expressed in Eq. (8).

The quantum quench in the quantum-enhanced method can propose configurations far from the previous state in Hamming distance. Ideally, this evolution causes the system to dephase within an energy window around the initial state’s energy, as then the proposed configurations are likely to be accepted. Difficult sampling problems, such as the low-temperature SK model, often have many nearly degenerate states. The energy windows can be exponentially small, suggesting the need for long time evolution. The proposal probability can be expanded in terms of the eigenstates of $H$ ,

Q(x|y)=\sum_{n,m}e^{-i(E_{n}-E_{m})t}\innerproduct{x}{n}\innerproduct{n}{y}\innerproduct{y}{m}\innerproduct{m}{x}.

(10)

In the long time limit, and in the absence of many degenerate gaps, the proposal probability relaxes to its equilibrium value, equal to the time-averaged probability [23]. Therefore, in the above form, only the terms where $n=m$ contribute,

Q(x|y)=\sum_{n}|\innerproduct{x}{n}|^{2}|\innerproduct{n}{y}|^{2}.

(11)

It can be shown, as presented in Appendix A, that the spectral gap for this time-averaged proposal upper bounds the time-average of the gap. As a result, this limit has shorter mixing times than the average performance of the strategy of randomly selecting $t$ for each MCMC step. Numerical studies of the spectral gap dependence on time, presented in Appendix A, show the best mixing occurs once the system has reached equilibrium. The long-time limit, therefore will be the focus of the analysis. In addition, this regime alleviates the need for fine-tuning the evolution time.

Refer to caption — Figure 3: The two right and leftmost plots illustrate the partitioning of the Boltzmann distribution into the set $B$ and its complement. The bottleneck bound of Eq. (12) involves the product of two distributions $f(n)$ and $g(n)$ , which are shown in light and dark blue, respectively, in the center plots corresponding to two different quantum proposals. In panel a) there is little overlap between the two distributions as the quantum states closely track the classical states, not ideal for fast mixing. In panel b), the overlap is enhanced significantly, but so is the entropy of each distribution, highlighting the balance at play.

In this limit, the bottleneck upper bound provides valuable insight into the properties of the quantum evolution that lead to favourable mixing. Consider a set $B$ where all configurations in $B$ have higher energy than those in $B^{c}$ . An example of such a set partition within the Boltzmann distribution is illustrated in the left and rightmost panels of Fig. 3. As the Boltzmann measure weighs the equilibrium flow, the bound for this type of set $B$ is typically tight in the low-temperature limit. The bottleneck bound takes the following form,

\begin{split}\Lambda(B)&=\frac{1}{\pi(B^{c})}\sum_{n}\sum_{x\in B,y\in B_{c}}\pi_{B}(x)|\innerproduct{x}{n}|^{2}|\innerproduct{n}{y}|^{2},\\ &=\frac{1}{\overline{\pi}(B^{c})}\sum_{n}f(n)g(n),\end{split}

(12)

with $\pi_{B}(x)$ the Boltzmann measure normalized over $B$ . Two distributions are defined,

f(n)=\sum_{x\in B}\pi_{B}(x)|\innerproduct{x}{n}|^{2},

(13)

the probability of transitioning from the set $B$ to the $n$ th quantum eigenstate, weighted by $\pi_{B}$ and

g(n)=\frac{1}{|B^{c}|}\sum_{y\in B^{c}}|\innerproduct{n}{y}|^{2},

(14)

the probability that elements of $B^{c}$ move to the eigenstate $n$ . Finally, $\overline{\pi}(B^{c})=|B^{c}|^{-1}\sum_{x\in B^{c}}\pi(x)$ denotes the average Boltzmann probability to be in set $B^{c}$ . As Eq. (12) involves the product of these distributions, one achieves fast mixing if there is significant overlap between them. For example, Fig. 3 illustrates these distributions for two quantum proposal strategies that differ by field strength $h$ . In the first strategy, shown in Fig. 3(a), the distributions $f(n)$ and $g(n)$ have minimal overlap, resulting in unfavourable mixing compared to in Fig. 3(b), where the overlap is significantly greater.

The purity of the two eigenstate distributions $f$ and $g$ limits the extent of their overlap and consequently, the spectral gap. This result follows from the Cauchy-Schwarz inequality applied to Eq. (12),

\delta\leq\frac{1}{\overline{\pi}(B^{c})}\left(\sum_{n}f(n)^{2}\sum_{n}g(n)^{2}\right)^{1/2}.

(15)

These distribution purities can be related to a physical quantity of the quantum evolution, namely the inverse participation ratio (IPR) of classical configurations in the eigenstate basis, defined as,

\mathrm{IPR}(x)=\sum_{n}|\innerproduct{n}{x}|^{4}.

(16)

Specifically, applying Cauchy-Schwarz inequality again, the following bound on the gap is obtained,

\delta\leq\frac{1}{\overline{\pi}(B^{c})}\sqrt{\left(\sum_{x\in B}\pi_{B}(x)\mathrm{IPR}(x)\right)\left(\sum_{y\in B^{c}}|B^{c}|^{-1}\mathrm{IPR}(y)\right)}.

(17)

The IPR is a measure of the localization of the classical configuration $x$ on the eigenstates of $H$ . The minimum IPR corresponds to a completely delocalized state, with $\mathrm{IPR}(x)=1/2^{N}$ , whereas a perfectly localized state gives the maximum IPR value of $1$ . If the Hamiltonian under which the system is quenched is fully ergodic, the eigenstates are delocalized, and the spectral gap is upper bounded by $1/2^{N}$ . Specifically, the proposal probability of Eq. (11) is $Q(x|y)=1/2^{N}$ , identical to the strategy of picking configurations uniformly at random. This shows that the quantum quench must be non-ergodic to obtain a scaling advantage over the uniform classical strategy.

The IPR of many-body systems generally scales exponentially with the system size, a consequence of the tensor product structure. For instance, if the states $\ket{n}$ in Eq. (16) are product states where each qubit is rotated away from the classical z-axis by a small angle $\theta$ , the IPR would still scale exponentially as ${\rm IPR}\sim\exp(-2\theta^{2}N)$ . This immediately highlights the central difficulty in obtaining fast mixing with these quantum-enhanced Markov chains as they suffer from an orthogonality catastrophe, causing all terms within the square root in Eq. (17) to be exponentially small. A large IPR can be attained if the eigenstates of the quantum evolution are well localized on the classical configurations, specifically when the transverse field is perturbatively small. In this perturbative limit, there is no quantum advantage, as the quantum evolution can be replicated classically. Moreover, the likelihood of shifting spectral weight from $B$ to $B^{c}$ decreases with a smaller transverse field. This effect is not reflected in the bound of Eq. (17) but with Eq. (15).

Additionally, consider the following application of Jensen’s inequality,

\overline{\pi}(B^{c})=\frac{1}{|B^{c}|}\sum_{x\in B^{c}}\pi(x)\geq e^{-\beta E_{c}+\beta F_{\beta}},

(18)

where $E_{c}=|B_{c}|^{-1}\sum_{x\in B^{c}}H_{c}(x)$ and $F_{\beta}$ is the free energy of the system at inverse temperature $\beta$ . Expression (15) can thus be further bound by,

\delta\leq e^{-\beta(F_{\beta}-F_{c})},

(19)

where $F_{c}=E_{c}-T(S_{f}+S_{g})/2$ and $S_{f,g}$ is the Rényi 2-entropy of $f$ and $g$ , respectively. This form is reminiscent of classical rate equations, where $F_{c}$ corresponds to the free energy of the transition state. At low temperatures, the Gibbs free energy $F_{\beta}$ approaches $E_{c}$ , but the entropy of $f$ and $g$ remains finite, provided the transverse field is finite.

The upper bounds derived here link the mixing time of the quantum-enhanced method to the physical properties of the quantum evolution. Achieving fast mixing requires a balance of competing properties: the distributions $f$ and $g$ must have significant overlap but not too much entropy, as that results in an unfavourably small IPR. Specifically, the low-temperature gap is exponentially small and constrained by the IPR of the low-energy states, allowing regimes of slow mixing to be identified.

IV Performance

As with any MCMC method, the performance of the quantum-enhanced method is problem-dependent. For example, on an unstructured marked item problem, a quantum proposal method has no advantage over a uniform classical proposal [21]. Before investigating more complex spin glass systems, it is insightful to consider a simpler analytically tractable example to elucidate some points made in the previous section.

IV.1 Ising Chain

Consider the Ising chain with periodic boundary conditions,

H_{c}=-\sum_{i=1}^{N}\sigma_{i}^{z}\sigma_{i+1}^{z}.

(20)

The quantum evolution associated with this system can be solved analytically, allowing the proposal probability to be found for any system size. Unfortunately, this does not mean that the spectral gap can be easily determined, as diagonalizing the transition matrix is feasible only for small systems. As illustrated in Fig. 4, investigations of these small systems show non-trivial dependence on the Hamiltonian parameters $h$ and $t$ but indicate substantial parameter regimes with large spectral gaps. However, it is unclear if this behaviour continues for larger system sizes. Using the bottleneck upper bound, the performance of the quantum-enhanced method on this model can be probed in the thermodynamic limit.

The possible configuration energies of the classical system are,

E(k)=-N+4k,

(21)

where $k=\{0,1,..,N/2\}$ . Let $S_{k}$ be the subset such that $\forall x\in S_{k}$ , $H_{c}(x)=E(k)$ . Consider the choice $B=S_{1}$ for the upper bound of Eq. (9). This gives the following bound on the spectral gap,

\delta\leq\frac{1}{N(N-1)}\sum_{x\in S_{1},y\in S_{0}}Q(y|x)+\frac{e^{-4\beta}}{2-N(N-1)e^{-4\beta}},

(22)

where the second term is small in the low-temperature limit. As derived in Appendix B, this bound can be expressed in terms of the Hamiltonian parameters $h$ and $t$ ,

\delta\leq\frac{2\gamma(h,t)}{N-1}e^{-N\lambda(h,t)}+\frac{e^{-4\beta}}{2-N(N-1)e^{-4\beta}}.

(23)

Where the polynomial prefactor is,

\gamma(h,t)=\int_{0}^{\pi}\frac{dk}{2\pi}\frac{h^{2}\sin^{2}(k)\sin^{2}(2t\varepsilon_{k})}{\varepsilon^{2}_{k}-h^{2}\sin^{2}(2t\varepsilon_{k})\sin^{2}(k)},

(24)

and exponential term,

\lambda(h,t)=-\int_{0}^{\pi}\frac{dk}{2\pi}\ln\left[1-\frac{h^{2}\sin^{2}(k)}{\varepsilon_{k}^{2}}\sin^{2}(2t\varepsilon_{k})\right],

(25)

with $\varepsilon_{k}=\sqrt{(h-\cos(k))^{2}+\sin^{2}(k)}$ . This bound is tight compared to the spectral gap calculated through exact diagonalization on small systems sizes, as seen in Fig. 5, and reproduces the intricate structure seen in Fig. 4. This result confirms our two main points: (i) the gap remains exponentially small unless $\lambda$ vanishes, which only happens in the $h\rightarrow 0$ limit, and (ii) the numerics for small system sizes, which indicate a broad parameter range for speedup, are misleading. In the thermodynamic limit, these features will disappear, and the gap will scale exponentially.

It is a general feature of the quantum-enhanced Markov chain Monte Carlo method that a local classical strategy is reproduced in the small $h$ limit. The quantum proposal can be expanded perturbatively for small $h$ , where the proposal probability is only non-negligible for configurations separated by single spin flips. For the Ising chain the local spin-flip strategy is efficient; the spectral gap scales polynomially with the system’s size [16]. This polynomial scaling is reproduced in the quantum-enhanced MCMC method at small $h$ where the polynomial decay dominates compared as compared to the exponential decay. To leading order $\lambda(h)\sim O(h^{2})$ therefore $h\leq O(1/\sqrt{N})$ is needed to avoid exponential scaling. Similarly, $\gamma(h)\sim O(h^{2})$ such that the gap increases with $h$ simply because the matrix elements increase. Consequently, the optimal point is around $h_{\rm opt}\sim 1/\sqrt{N}$ and the gap scales like $1/N^{2}$ , in correspondence with the classical result. Hence, similar to the worst-case problem discussed in Ref. [21], there is no speedup over classical MCMC sampling for the 1D Ising chain.

IV.2 Spin Glasses

As the transverse field Ising model is solvable, it was possible to study large system sizes through a bottleneck analysis. However, the Ising model has unique mixing properties that are not relevant to most other systems as it is as easy to sample classically. The main focus of this work is the application of the quantum-enhanced method to spin glass systems, disordered models whose energy landscapes are characterized by many metastable states. Classical chains sampling these spin glass problems at low temperatures are expected to have exponentially slow mixing. Therefore, the optimal transverse field of the quantum-enhanced algorithm does not need to vanish.

The Sherrington-Kirkpatrick (SK) model, a fully connected spin glass model, is described by the following Hamiltonian,

H_{c}=-\sum_{1\leq i<j}J_{ij}\sigma_{i}^{z}\sigma_{j}^{z}+\sum_{i}h_{i}\sigma_{i}^{z},

(26)

with $J_{ij}$ drawn from a Gaussian distribution with a mean of zero and a standard deviation of $1/\sqrt{N}$ [24]. In the following numerical study, $h_{i}$ is drawn uniformly between $-0.25$ and $0.25$ to break the $\mathbb{Z}_{2}$ symmetry.

Due to Parisi’s replica method, the system is known to exhibit a full replica-symmetry breaking (RSB) low-temperature phase, whose energy minima follow a hierarchical clustering structure [25]. The number of these minima is sub-exponential in the system size and separated by large energy barriers, making this phase difficult to sample with a local MCMC method. Finding the ground state of the system, an easier task than sampling, is known to be NP-hard [26]. The system has a second-order transition from the replica symmetric paramagnetic phase into this low-temperature spin glass phase.

A generalization of the SK model, known as the p-spin model, has p-wise interactions instead of two-spin interactions [27],

H_{c}=-\sum_{1\leq i_{1}<i_{2}..<i_{p}\leq N}J_{i_{1}i_{2}...i_{p}}\sigma_{i_{1}}^{z}\sigma_{i_{2}}^{z}...\sigma_{i_{p}}^{z}.

(27)

The parameters are drawn from a Gaussian distribution with a mean of zero and variance of $p!/(2N^{p-1})$ to ensure the Hamiltonian is extensive. Systems with $p\geq 3$ do not have only the full-RSB phase of the SK model. Instead, these models exhibit a first-order transition from the paramagnetic phases into a 1-RSB phase. At lower temperatures, there is an additional phase transition into the full-RSB phase [28, 27]. Due to the known differences in these systems, the 3-spin model is also studied.

The spectral gap averaged over 1000 Hamiltonian instances is calculated exactly for the SK and 3-spin system in the long time limit, as seen in Fig. 6 plot a) and b), respectively. The spectral gap scaling with system size determines the algorithm’s performance and is the point of comparison to other MCMC methods. In Fig. 6 plot c) and d), the scaling exponent $k$ is plotted from a least squares exponential fit of the form $2^{-kN}$ , on the average gap for each value of $h$ . This work focuses on determining the optimal performance regimes of quantum-enhanced MCMC, not an extensive comparison to classical methods. Therefore, the scaling of the spectral gap is exclusively compared to two fixed-temperature classical proposal strategies.

In the first approach, a new configuration is randomly chosen uniformly, resulting in $Q(x|y)=1/2^{N}$ for all $x$ and $y$ . The spectral gap of this method scales inversely with the size of the state space, and the scaling exponent is shown in red in Fig. 6. The second method is a local approach where a new configuration is proposed by randomly flipping one spin. In this case, $Q(x|y)=1/N$ , for $x$ and $y$ separated by a single spin flip, and zero otherwise. The quantum-enhanced MCMC reproduces a local strategy in the small transverse field limit, with proposal probability found from a perturbative expansion to the lowest order in $h$ ,

Q(x_{i}|x_{j})=\frac{2h^{2}}{(E_{i}-E_{j})^{2}},

(28)

where $x_{i}$ and $x_{j}$ differ by a single spin flip, say at spin $k$ . This probability is independent of $N$ , unlike the classical local strategy. For example for the SK model,

E_{i}-E_{j}=\sum_{i=1}^{N}2J_{ik}+2h_{k}

(29)

is independent of $N$ due to the central limit theorem. As a result, plotted as the dashed line in Fig. 6 is the scaling exponent of the spectral gap of the local classical strategy multiplied by $N$ . This modified classical scaling agrees with the scaling of the quantum-enhanced MCMC in the small $h$ limit. Unlike the Ising chain, these spin glass problems are known to have long mixing times for local strategies so the spectral gap scales poorly in this limit. The local strategy, gives much better scaling for the SK model than the 3-spin model, indicating the differences in difficulty between these two sampling tasks. For these disordered models, isoenergetic cluster updates are often more successful than local strategies [29, 30]. Additionally, methods such as simulated annealing and parallel tempering can enhance the efficiency of MCMC methods [31]. Therefore, these classical scaling exponents are used as points of comparison, not as a representation of leading classical approaches.

Increasing $h$ leads to the proposal of higher-order spin flip configurations and an increasing spectral gap. This improvement continues until the effect of the IPR bound becomes relevant. Thus, there is an intermediate optimal regime for both the SK and 3-spin model where the spectral gap is maximized and the system proposes configurations far in Hamming distance but has yet to become fully ergodic. It is at this point that the quantum-enhanced MCMC has the best scaling, outperforming the uniform and local classical strategies. Past this region, further increasing $h$ causes the system to have a larger fractal dimension and, therefore, worse scaling until the large $h$ limit is reached. It is difficult to know the position of this optimal regime a priori as it depends on the quantum evolution’s specific eigenstate properties.

The variations in scaling with $h$ are due to the different localization properties of the quantum eigenstates on classical configurations. The IPR of these configurations, sorted by their energy for different $h$ values, is depicted in Fig. 7. Panel a) shows the IPR for the SK model at the point of optimal scaling ( $h=0.4$ ), panel b) at the maximally ergodic point ( $h=1.6$ ), and panel c) represents the large $h$ limit ( $h=100$ ). Similarly, panels e), f), and g) respectively exhibit the IPR of the 3-spin model at $h=0.7$ , $h=1$ , and $h=100$ . Individual IPR values are displayed, and their mean over an energy window is plotted in a darker colour for $N=6-14$ . In the bottleneck bound of Eq. (8), the proposal probability is weighted according to the Boltzmann measure. Consequently, in the low-temperature limit, the IPR scaling of low-energy configurations gives physical insight into the scaling of the MCMC method. Panels d) and h) display the average IPR within an energy window, indicated in grey in the other panels, as a function of $N$ . Fitting exponentially as $2^{-kN}$ , the IPR scaling exponent matches the behaviour observed in the spectral gap scaling of Fig. 6, verifying the relation of the MCMC scaling to the localization properties.

In both models, the scaling exponent plateaus in the large $h$ limit, never reaching the fully ergodic scaling. In the limit of large $h$ , the proposal probability can once again be expanded perturbatively. It is tempting to think that the proposal probability is uniform, since the large $h$ eigenstates are those of $X=\sum_{i}\sigma^{x}_{i}$ . However, the latter is highly degenerate, requiring some care in selecting the proper eigenstates. Specifically for the SK model, using the degenerate perturbation theory we find the eigenstates are those of:

H=\sum_{i<j}J_{ij}\left(\sigma_{i}^{z}\sigma_{j}^{z}+\sigma_{i}^{y}\sigma_{j}^{y}\right)+h\sum_{i}\sigma^{x}_{i}.

(30)

Note that the two terms in the Hamiltonian commute, the eigenstates are independent of the value of $h$ but some small $h$ is required to avoid the emergency of a $\mathbb{Z}_{2}$ symmetry. This Hamiltonian, at small $h$ , is potentially easier to implement as to maintain the needed degeneracy in the original construction $h$ must have no variations across spins, a requirement that is extremely difficult experimentally. Although not achieving the optimal performance of the quantum proposal technique, this large $h$ regime still exhibits a scaling enhancement over the uniform classical method with a simplified experimental procedure.

V Conclusion

The performance of the quantum-enhanced MCMC method varies significantly across different systems and parameter regimes. In this work, this behaviour is explained through a bottleneck analysis of the chain, which links MCMC performance to the ergodicity of the quantum Hamiltonian. An upper bound on the spectral gap is established, dependent on the IPR of the classical configurations on the quantum eigenstates. An immediate result of this bound is that the quantum Hamiltonian under which the system is quenched needs to be non-ergodic to have improved scaling over the uniform classical method.

Since this analysis provides an upper bound on the spectral gap, it does not guarantee an improvement in scaling over classical strategies in the non-ergodic regime. Establishing lower bounds through bottleneck analysis is possible but more challenging for this class of problems. However, the bottleneck upper bound offers valuable physical insights into the properties that lead to favourable mixing allowing the identification of parameter regimes where a speedup is expected. We show that there is a generic behaviour across various models, with optimal performance occurring in the long time limit at an intermediate value of $h$ . The exact value of this optimal transverse field depends on the specific localization properties of the model and is typically hard to determine. Due to the degeneracy of the quantum Hamiltonian’s transverse field, scaling improvement over the uniform strategy persists into the large $h$ limit. Effective Hamiltonians in this regime can be identified as having similar behaviour but with simpler experimental realization, potentially enabling large system-size demonstrations of this algorithm on today’s quantum devices.

VI Acknowledgments

The authors are grateful for ongoing support through the Flatiron Institute, a division of the Simons Foundation. D.S. is partially supported by AFOSR (Award no. FA9550-21-1-0236) and NSF (Award no. OAC-2118310). We acknowledge support from DARPA-STTR award (Award No. 140D0422C0035). The authors acknowledge useful discussions with Pooya Ronagh and Raymond Laflamme.

References

[1] Szegedy, M. Quantum speed-up of Markov chain based algorithms. In 45th Annual IEEE Symposium on Foundations of Computer Science, 32–41 (2004).
[2] Somma, R. D., Boixo, S., Barnum, H. & Knill, E. Quantum simulations of classical annealing processes. Physical review letters 101, 130504 (2008).
[3] Wocjan, P. & Abeyesinghe, A. Speedup via quantum sampling. Physical Review A 78, 042336 (2008).
[4] Poulin, D. & Wocjan, P. Sampling from the thermal quantum gibbs state and evaluating partition functions with a quantum computer. Physical review letters 103, 220502 (2009).
[5] Bilgin, E. & Boixo, S. Preparing thermal states of quantum systems by dimension reduction. Physical review letters 105, 170405 (2010).
[6] Temme, K., Osborne, T. J., Vollbrecht, K. G., Poulin, D. & Verstraete, F. Quantum metropolis sampling. Nature 471, 87–90 (2011).
[7] Yung, M.-H. & Aspuru-Guzik, A. A quantum–quantum metropolis algorithm. Proceedings of the National Academy of Sciences 109, 754–759 (2012).
[8] Montanaro, A. Quantum speedup of Monte Carlo methods. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 471, 20150301 (2015).
[9] Chowdhury, A. N. & Somma, R. D. Quantum algorithms for gibbs sampling and hitting-time estimation. arXiv preprint arXiv:1603.02940 (2016).
[10] Harrow, A. W. & Wei, A. Y. Adaptive quantum simulated annealing for Bayesian inference and estimating partition functions. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, 193–212 (SIAM, 2020).
[11] Lemieux, J., Heim, B., Poulin, D., Svore, K. & Troyer, M. Efficient quantum walk circuits for Metropolis-Hastings algorithm. Quantum 4, 287 (2020).
[12] Arunachalam, S., Havlicek, V., Nannicini, G., Temme, K. & Wocjan, P. Simpler (classical) and faster (quantum) algorithms for Gibbs partition functions. Quantum 6, 789 (2022).
[13] Rall, P., Wang, C. & Wocjan, P. Thermal state preparation via rounding promises. Quantum 7, 1132 (2023).
[14] Chen, C.-F., Kastoryano, M. J. & Gilyén, A. An efficient and exact noncommutative quantum gibbs sampler. arXiv preprint arXiv:2311.09207 (2023).
[15] Wild, D. S., Sels, D., Pichler, H., Zanoci, C. & Lukin, M. D. Quantum sampling algorithms for near-term devices. Physical Review Letters 127, 100504 (2021).
[16] Wild, D. S., Sels, D., Pichler, H., Zanoci, C. & Lukin, M. D. Quantum sampling algorithms, phase transitions, and computational complexity. Physical Review A 104, 032602 (2021).
[17] Layden, D. et al. Quantum-enhanced markov chain monte carlo. Nature 619, 282–287 (2023).
[18] Zhang, D., Bosse, J. L. & Cubitt, T. Dissipative quantum gibbs sampling. arXiv preprint arXiv:2304.04526 (2023).
[19] Ding, Z., Chen, C. & Lin, L. Single-ancilla ground state preparation via lindbladians (2023). arXiv preprint arXiv:2308.15676 .
[20] Wilson, B. A. et al. Non-native quantum generative optimization with adversarial autoencoders. arXiv preprint arXiv:2407.13830 (2024).
[21] Orfi, A. & Sels, D. Bounding speedup of quantum-enhanced markov chain monte carlo. arXiv preprint arXiv:2403.03087 (2024).
[22] Levin, D. A. & Peres, Y. Markov chains and mixing times, vol. 107 (American Mathematical Soc., 2017).
[23] D’Alessio, L., Kafri, Y., Polkovnikov, A. & Rigol, M. From quantum chaos and eigenstate thermalization to statistical mechanics and thermodynamics. Advances in Physics 65, 239–362 (2016).
[24] Sherrington, D. & Kirkpatrick, S. Solvable model of a spin-glass. Physical review letters 35, 1792 (1975).
[25] Mézard, M., Parisi, G., Sourlas, N., Toulouse, G. & Virasoro, M. Nature of the spin-glass phase. Physical review letters 52, 1156 (1984).
[26] Barahona, F. On the computational complexity of Ising spin glass models. Journal of Physics A: Mathematical and General 15, 3241 (1982).
[27] Gardner, E. Spin glasses with p-spin interactions. Nuclear Physics B 257, 747–765 (1985).
[28] de Oliveira, V. M. & Fontanari, J. F. Landscape statistics of the p-spin ising model. Journal of Physics A: Mathematical and General 30, 8445 (1997).
[29] Zhu, Z., Ochoa, A. J. & Katzgraber, H. G. Efficient cluster algorithm for spin glasses in any space dimension. Physical review letters 115, 077201 (2015).
[30] Houdayer, J. A cluster monte carlo algorithm for 2-dimensional spin glasses. The European Physical Journal B-Condensed Matter and Complex Systems 22, 479–484 (2001).
[31] Earl, D. J. & Deem, M. W. Parallel tempering: Theory, applications, and new perspectives. Physical Chemistry Chemical Physics 7, 3910–3916 (2005).
[32] Horn, R. A. & Johnson, C. R. Matrix analysis (Cambridge university press, 2012).
[33] Sachdev, S. Quantum phase transitions. Physics world 12, 33 (1999).

Appendix A Time Averaged Upper Bound

Consider the matrix $Q$ whose elements describe the proposal probability between configurations. In the long time limit, each proposal probability equilibrates to its time-averaged value. Specifically, let $Q^{t_{i}}$ be the proposal probability matrix at some time $t_{i}$ . The time-averaged proposal probability matrix is then,

Q=\frac{1}{n}\sum_{i=1}^{n}Q^{t_{i}}.

(31)

The time-averaged $P$ is the average of the transition matrices, $P^{t_{i}}$ , associated with proposal probabilities $Q^{t_{i}}$ ,

P=\frac{1}{n}\sum_{i=1}^{n}P^{t_{i}}.

(32)

This can be shown by relating elements of $P$ to elements of $P^{t_{i}}$ . By construction, $P^{t_{i}}(y,x)=Q^{t_{i}}(x|y)M(x|y)$ and $P^{t_{i}}(x,x)=1-\sum_{y\neq x}Q^{t_{i}}(x|y)M(x|y)$ . The off-diagonal elements of $P$ are therefore,

P(y,x)=\frac{1}{n}\sum_{i=1}^{n}Q^{t_{i}}(x|y)M(x|y)=\frac{1}{n}\sum_{i=1}^{n}P^{t_{i}}(y,x).

(33)

The diagonal elements are,

P(x,x)=1-\sum_{y\neq x}\frac{1}{n}\sum_{i=1}^{n}Q^{t_{i}}(x|y)M(x|y)=\frac{1}{n}\sum_{i=1}^{n}(1-\sum_{y\neq x}Q^{t_{i}}(x|y)M(x|y))=\frac{1}{n}\sum_{i=1}^{n}P^{t_{i}}(x,x).

(34)

The gap of this time-averaged transition matrix $P$ is an upper bound on the averaged gap,

\delta[P]\geq\frac{1}{n}\sum_{i=1}^{n}\delta[P^{t_{i}}].

(35)

This claim is equivalent to,

\lambda_{2}[P]\leq\frac{1}{n}\sum_{i=1}^{n}\lambda_{2}[P^{t_{i}}],

(36)

where $\lambda_{2}$ denotes the second largest eigenvalue. In order to prove this claim, we can make use of the following decomposition of $P$ [22],

P=D_{\pi}^{-\frac{1}{2}}ED_{\pi}^{\frac{1}{2}}

(37)

where $D_{\pi}^{\frac{1}{2}}$ is a diagonal matrix with elements $D_{\pi}^{\frac{1}{2}}(i,i)=\sqrt{\pi(i)}$ . An element of the matrix $E$ is then,

E(y,x)=\sqrt{\frac{\pi(y)}{\pi(x)}}P(y,x).

(38)

A $P$ satisfies the detailed balance condition $E$ is symmetric. Let $\{\varphi_{j}\}$ be the set of eigenvectors of $E$ , then $\varphi_{1}=\sqrt{\pi}$ with corresponding eigenvalue $\lambda_{1}=1$ . Also let $f_{j}=D_{\pi}^{-\frac{1}{2}}\varphi_{j}$ , then $f_{j}$ is an eigenvector of $P$ ,

Pf_{j}=PD_{\pi}^{-\frac{1}{2}}\varphi_{j}=D_{\pi}^{-\frac{1}{2}}D_{\pi}^{\frac{1}{2}}PD_{\pi}^{-\frac{1}{2}}\varphi_{j}=D_{\pi}^{-\frac{1}{2}}E\varphi_{j}=\lambda_{j}D_{\pi}^{-\frac{1}{2}}\varphi_{j}=\lambda_{j}f_{j}.

(39)

This decomposition allows the eigenvalues of the symmetric matrices $E$ and $E^{t_{i}}$ to instead be compared. The claim of Eq. 36 is equivalent to,

\lambda_{2}[E]\leq\frac{1}{n}\sum_{i=1}^{n}\lambda_{2}[E^{t_{i}}].

(40)

The first eigenvector is the same for all $E^{t_{i}}$ and $E$ , $\varphi_{1}=\sqrt{\pi}$ . Consider then the matrix,

B=E-\varphi_{1}^{T}\varphi_{1}

(41)

then $\lambda_{1}[B]=\lambda_{2}[E]$ . As each $E^{t_{i}}$ has the same first eigenvector, we have,

B=\frac{1}{n}\sum_{i=1}^{N}B^{t_{i}}.

(42)

The claim can now be written in terms of the largest eigenvalues of the symmetric matrices $B$ and $B^{t_{i}}$ ,

\lambda_{1}[B]\leq\frac{1}{n}\sum_{i=1}^{n}\lambda_{1}[B^{t_{i}}].

(43)

The claim now follows from the convexity of the largest eigenvalue of symmetric metrics, a direct application of the Courant-Fischer Theorem [32]. Specifically, given two symmetric matrices $A$ and $B$ ,

\lambda_{1}[A+B]\leq\lambda_{1}[A]+\lambda_{1}[B].

(44)

This result shows that the average gap when choosing $t$ randomly at each step is smaller than the time-averaged gap.

Numerical studies of the exact spectral gap of the SK and 3-spin model are shown in a) and b) of Fig. 8, respectively. The time dependence is shown for two different values of the transverse field for each model, $h_{\text{max}}$ and $h_{\text{min}}$ , corresponding to the maximal and minimal gap values found in Fig. 6. Before the system is thermalized, the spectral gap at $h_{\text{max}}$ is below its equilibrium value. For $h_{\text{min}}$ , the gap is highly oscillatory, requiring fine-tuning and an understanding of the thermalization properties to take advantage of one of these peak values. These effects further indicate that one must consider the long-time limit to have the optimal behaviour of this algorithm.

Appendix B Ising Model Upper Bound

For simplicity, we will assume that the number of spins considered $N$ , is even. The upper bound of Eq. (9), where the set $B$ is chosen to be $S_{1}$ is,

\begin{split}\Lambda(B)&=\frac{1}{\pi(S_{1})(1-\pi(S_{1}))}\sum_{x\in S_{1},y\in S_{1}^{c}}\pi(x)P(x,y),\\ &=\frac{1}{|S_{1}|(1-\pi(S_{1}))}\sum_{x\in S_{1}}\left[\sum_{y\in S_{0}}P(x,y)+\sum_{i=2}^{N/2}\sum_{z\in S_{i}}P(x,z)\right],\\ &=\frac{1}{|S_{1}|(1-\pi(S_{1}))}\sum_{x\in S_{1}}\left[\sum_{y\in S_{0}}Q(y|x)+\sum_{j=2}^{N/2}\sum_{z\in S_{j}}e^{-4j\beta}Q(z|x)\right],\\ &=\frac{1}{|S_{1}|}\sum_{x\in S_{1},y\in S_{0}}Q(y|x)+\frac{1}{|S_{1}|(1-\pi(S_{1}))}\sum_{x\in S_{1}}\left[\pi(S_{1})\sum_{y\in S_{0}}Q(y|x)+\sum_{j=2}^{N/2}\sum_{z\in S_{j}}e^{-4j\beta}Q(z|x)\right].\\ \end{split}

(45)

The last term can be simplified using the following upper bound, which is tight in the low-temperature regime,

\pi(B)=\frac{|S_{1}|e^{-\beta(-N+4)}}{\sum_{k=0}^{N/2}|S_{k}|e^{-\beta(-N+4k)}}=\frac{|S_{1}|e^{-4\beta}}{|S_{0}|+\sum_{k=1}^{N/2}|S_{k}|e^{-4k\beta}}\leq\frac{|S_{1}|e^{-4\beta}}{2}.

(46)

Specifically, we then have,

\begin{split}&\frac{1}{|S_{1}|(1-\pi(S_{1}))}\sum_{x\in S_{1}}\left[\pi(S_{1})\sum_{y\in S_{0}}Q(y|x)+\sum_{j=2}^{N/2}\sum_{z\in S_{j}}e^{-4j\beta}Q(z|x)\right]\\ &\leq\frac{e^{-4\beta}}{|S_{1}|(1-\frac{1}{2}|S_{1}|e^{-4\beta})}\sum_{x\in S_{1}}\left[\frac{|S_{1}|}{2}\sum_{y\in S_{0}}Q(y|x)+\sum_{j=2}^{N/2}\sum_{z\in S_{j}}e^{-4j\beta}Q(z|x)\right],\\ &\leq\frac{e^{-4\beta}}{|S_{1}|(1-\frac{1}{2}|S_{1}|e^{-4\beta})}\left[1+\left(\frac{|S_{1}}{2}-1\right)\sum_{x\in S_{1},y\in S_{0}}Q(y|x)\right],\\ &\leq\frac{e^{-4\beta}}{2-|S_{1}|e^{-4\beta}}.\end{split}

(47)

In all, this gives the following upper bound on the spectral gap,

\delta\leq\frac{1}{N(N-1)}\sum_{x\in S_{1},y\in S_{0}}Q(y|x)+\frac{e^{-4\beta}}{2-N(N-1)e^{-4\beta}}.

(48)

The transverse field Ising model in one dimension is solvable, allowing this bound to be calculated exactly for any system size. Specifically, we require the time evolution of the following Hamiltonian with periodic boundary conditions,

H=-\sum_{i=1}^{N}\sigma_{i}^{z}\sigma_{i+1}^{z}+h\sum_{i=1}^{N}\sigma_{i}^{x}.

(49)

This Hamiltonian can be written in terms of fermionic operators through a Jordan-Wigner transformation [33],

H=-\sum_{i}^{N-1}(a_{i}^{\dagger}-a_{i})(a_{i+1}+a_{i+1}^{\dagger})-e^{i\pi N}(a_{N}^{\dagger}-a_{N})(a_{1}+a_{1}^{\dagger})+h\sum_{i}^{N-1}(2a_{i}^{\dagger}a_{i}-1).

(50)

The $\mathbb{Z}_{2}$ symmetry of the spin problem is now reflected in the fermionic parity $p=\frac{1}{2}(1-e^{i\pi\hat{N}})$ . This allows the two parity sectors to be considered independently. The Hamiltonian in these sectors is therefore,

H_{p}=-\sum_{i}^{N}(a_{i}^{\dagger}-a_{i})(a_{i+1}+a_{i+1}^{\dagger})+h\sum_{i}^{N}(2a_{i}^{\dagger}a_{i}-1)

(51)

with the following boundary conditions,

a_{N+1}=(-1)^{p+1}a_{1}.

(52)

This Hamiltonian can be further simplified by transforming into momentum space with the following operators,

a_{k}=\frac{1}{\sqrt{N}}\sum_{j=1}^{N}e^{-ikj}a_{j},

(53)

where the different boundary conditions in each parity sector are accounted for by using the following set of k values,

K_{p}=\bigg{\{}k=\pm\frac{2\pi}{N}\times\begin{cases}(l-1/2)\quad\text{with }l=1,2,...,N/2&p=0\\ l\quad\text{with }l=1,2,...,N/2-1&p=1\end{cases}

(54)

This gives a simplified form of the Hamiltonian,

H_{p}=\sum_{K_{p}}\left[\left(h-\cos(k)\right)\left(a_{k}^{\dagger}a_{k}-a_{-k}a_{-k}^{\dagger}\right)+i\sin(k)\left(a_{-k}a_{k}-a_{k}^{\dagger}a_{-k}^{\dagger}\right)\right]\\

(55)

The values of $k$ can be organized into pairs $(k,-k)$ allowing the Hamiltonian to be written as the sum over positive $k$ values, $K_{p}^{+}$ ,

H_{0}=\sum_{K^{+}_{0}}H_{k},\quad\quad\quad H_{1}=\sum_{K^{+}_{1}}H_{k}+H_{k=0,\pi}.

(56)

Here the $k=0$ and $k=\pi$ terms of the $p=1$ sector have been separated,

H_{k=0,\pi}=2(\hat{n}_{\pi}-\hat{n}_{0})+2h(\hat{n}_{0}+\hat{n}_{\pi}-1).

(57)

Now each $H_{k}$ can be in terms of $2\times 2$ matrix $\mathbf{H}_{k}$ ,

\begin{split}H_{k}&=2\left(h-\cos(k)\right)\left(a_{k}^{\dagger}a_{k}-a_{-k}a_{-k}^{\dagger}\right)+2i\sin(k)\left(a_{-k}a_{k}-a_{k}^{\dagger}a_{-k}^{\dagger}\right),\\ &=2\begin{pmatrix}a_{k}^{\dagger}&a_{-k}\end{pmatrix}\begin{pmatrix}h-\cos(k)&-i\sin(k)\\ i\sin(k)&-h+\cos(k)\end{pmatrix}\begin{pmatrix}a_{k}\\ a_{-k}^{\dagger}\end{pmatrix},\\ &=2\begin{pmatrix}a_{k}^{\dagger}&a_{-k}\end{pmatrix}\mathbf{H}_{k}\begin{pmatrix}a_{k}\\ a_{-k}^{\dagger}\end{pmatrix}.\end{split}

(58)

This matrix can be diagonalized through a Bogoliubov transformation, a mapping to a new set of fermionic operators through a unitary transform,

\gamma_{k}=u_{k}a_{k}-iv_{k}a_{-k}^{\dagger}

(59)

with $u_{k}=\cos(\theta_{k}/2)$ and $v_{k}=\sin(\theta_{k}/2)$ . This defines the transformation matrix,

A=\begin{pmatrix}\cos(\theta_{k}/2)&-i\sin(\theta_{k}/2)\\ -i\sin(\theta_{k}/2)&\cos(\theta_{k}/2)\end{pmatrix},

(60)

where $\theta_{k}$ is chosen to diagonalize $H_{k}$ . Specifically, take

\begin{split}\cos(\theta_{k})&=\frac{h-\cos(k)}{\sqrt{(h-\cos(k))^{2}+\sin^{2}(k)}},\\ \sin(\theta_{k})&=\frac{\sin(k)}{\sqrt{(h-\cos(k))^{2}+\sin^{2}(k)}}.\end{split}

(61)

After this transformation, $H_{k}$ takes on the simplified form,

\begin{split}H_{k}&=2\begin{pmatrix}a_{k}^{\dagger}&a_{-k}\end{pmatrix}AA^{\dagger}\mathbf{H}_{k}AA^{\dagger}\begin{pmatrix}a_{k}\\ a_{-k}^{\dagger}\end{pmatrix},\\ &=2\varepsilon_{k}(\gamma_{k}^{\dagger}\gamma_{k}-1/2),\end{split}

(62)

where the dispersion relation $\varepsilon_{k}$ is,

\varepsilon_{k}=\sqrt{(h-\cos(k))^{2}+\sin^{2}(k)}.

(63)

The ground state of the system, known as the Bogoliubov vacuum, is the state which annihilates $\gamma_{k}$ for all $k$ . The problem has two bands $\pm\varepsilon_{k}$ , where higher energy states are created by moving fermions from the lower to the upper band. At the phase transition, $h=1$ , the two bands collide, allowing zero energy excitations.

To calculate the bound of Eq. 48, it is easiest to put the classical ground and excited states in terms of these Bogoliubov fermions. Denote the classical ground states as $\ket{y^{1}},\ket{y^{2}}\in S_{0}$ , which are the two ferromagnetic states. These states can be decomposed into states with $p=0$ and $p=1$ , labelled $\ket{gs_{0}}$ and $\ket{gs_{1}}$ , respectively,

\begin{split}\ket{y^{1}}&=\frac{1}{\sqrt{2}}\left(\ket{gs_{0}}+\ket{gs_{1}}\right),\\ \ket{y^{2}}&=\frac{1}{\sqrt{2}}\left(\ket{gs_{0}}-\ket{gs_{1}}\right).\\ \end{split}

(64)

The classical excited states can be decomposed similarly, with each parity sector containing $N(N-1)/2$ states. As $U$ conserves parity, there is no method to transition between parity sectors during this evolution. This conservation permits these two sectors to be considered independently,

\sum_{x\in S_{1},y\in S_{0}}|\bra{y}U\ket{x}|^{2}=\sum_{x\in S_{1}^{p=0}}|\bra{gs_{0}}U\ket{x}|^{2}+\sum_{x\in S_{1}^{p=1}}|\bra{gs_{1}}U\ket{x}|^{2}.

(65)

Consider the $p=0$ sector. However, the same procedure can be followed for both sectors. The ground state $\ket{gs_{0}}$ is the Bogoliubov vacuum state, $\ket{gs_{0}}=\prod_{K_{0}^{+}}\ket{0}^{c}_{k}$ , where the transformation $A(\theta_{k}^{c})$ is defined with $h=0$ . To calculate the effect of $U$ on this state, $\ket{\emptyset}^{c}_{k}$ should be written in terms of the Bogoliubov fermions specified by $h\neq 0$ . This is done by first undoing the transformation $A(\theta_{k}^{c})$ , then changing into the new Bogoliubov fermions, $\gamma_{k}$ , with $A(\theta_{k})$ ,

A(\theta_{k})^{\dagger}A(\theta_{k}^{c})A(\theta_{k}^{c})^{\dagger}\ket{0}_{k}^{c}=A(\theta_{k})^{\dagger}A(\theta_{k}^{c})\begin{pmatrix}0\\ 1\end{pmatrix}=A(\theta_{k})^{\dagger}\begin{pmatrix}-i\sin(\theta_{k}^{c}/2)\\ \cos(\theta_{k}^{c}/2)\end{pmatrix}=\begin{pmatrix}-i\sin((\theta_{k}^{c}-\theta_{k})/2)\\ \cos((\theta_{k}^{c}-\theta_{k})/2)\end{pmatrix}.

(66)

This state is now in the diagonal basis of our desired evolution. The same method can be used for the classical excited states. Consider the excited state where the fermion at site $k_{1}$ is moved from the bottom to top band,

\ket{x}_{k_{1}}=\ket{1}^{c}_{k_{1}}\prod_{k\neq k_{1}}\ket{0}^{c}_{k}.

(67)

The non-vacuum portion of this state $\ket{1}^{c}_{k_{1}}$ , is transformed as,

A(\theta_{k})^{\dagger}A(\theta_{k}^{c})A(\theta_{k}^{c})^{\dagger}\ket{1}^{c}_{k}=A(\theta_{k})^{\dagger}A(\theta_{k}^{c})\begin{pmatrix}1\\ 0\end{pmatrix}=A(\theta_{k})^{\dagger}\begin{pmatrix}\cos(\theta_{k}^{c}/2)\\ -i\sin(\theta_{k}^{c}/2)\end{pmatrix}=\begin{pmatrix}\cos((\theta_{k}^{c}-\theta_{k})/2)\\ -i\sin((\theta_{k}^{c}-\theta_{k})/2)\end{pmatrix}.

(68)

First, consider the needed overlap in the $k_{1}$ block,

\bra{1}^{c}_{k_{1}}U_{k_{1}}\ket{0}^{c}_{k_{1}}=i\left(e^{2it\varepsilon_{k_{1}}}-e^{-2it\varepsilon_{k_{1}}}\right)\sin((\theta_{k_{1}}^{c}-\theta_{k_{1}})/2)\cos((\theta_{k_{1}}^{c}-\theta_{k_{1}})/2)=-\frac{h\sin(2t\varepsilon_{k_{1}})\sin(k_{1})}{\varepsilon_{k_{1}}}.

(69)

Therefore the overlap of the entire state is,

\begin{split}|\bra{gs_{0}}U\ket{x}_{k_{1}}|^{2}&=\prod_{k\neq k_{1}}\left[1-\frac{h^{2}\sin^{2}(k)}{\varepsilon_{k}^{2}}\sin^{2}(2t\varepsilon_{k})\right]\left(\frac{h^{2}\sin^{2}(2t\varepsilon_{k_{1}})\sin^{2}(k_{1})}{\varepsilon^{2}_{k_{1}}}\right),\\ &=\prod_{k\in K_{0}}\left[1-\frac{h^{2}\sin^{2}(k)}{\varepsilon_{k}^{2}}\sin^{2}(2t\varepsilon_{k})\right]\left(\frac{h^{2}\sin^{2}(2t\varepsilon_{k_{1}})\sin^{2}(k_{1})}{\varepsilon^{2}_{k_{1}}-h^{2}\sin^{2}(2t\varepsilon_{k_{1}})\sin^{2}(k_{1})}\right).\end{split}

(70)

This method can be used for all first excited states from both sectors, leading to the following,

\begin{split}\sum_{x\in S_{1},y\in S_{0}}|\bra{y}U\ket{x}|^{2}&=\prod_{k\in K_{0}}\left[1-\frac{h^{2}\sin^{2}(k)}{\varepsilon_{k}^{2}}\sin^{2}(2t\varepsilon_{k})\right]\sum_{k\in K_{0}}\left(\frac{h^{2}\sin^{2}(2t\varepsilon_{k})\sin^{2}(k)}{\varepsilon^{2}_{k}-h^{2}\sin^{2}(2t\varepsilon_{k})\sin^{2}(k)}\right)\\ &\quad+\prod_{k\in K_{1}}\left[1-\frac{h^{2}\sin^{2}(k)}{\varepsilon_{k}^{2}}\sin^{2}(2t\varepsilon_{k})\right]\sum_{k\in K_{1}}\left(\frac{h^{2}\sin^{2}(2t\varepsilon_{k})\sin^{2}(k)}{\varepsilon^{2}_{k}-h^{2}\sin^{2}(2t\varepsilon_{k})\sin^{2}(k)}\right).\end{split}

(71)

In the long time limit considered in the main text, this further simplifies,

\sum_{x\in S_{1},y\in S_{0}}|\bra{y}U\ket{x}|^{2}=\prod_{k\in K_{0}}\left[1-\frac{h^{2}\sin^{2}(k)}{2\varepsilon_{k}^{2}}\right]\sum_{k\in K_{0}}\left(\frac{h^{2}\sin^{2}(k)}{2\varepsilon^{2}_{k}-h^{2}\sin^{2}(k)}\right)+\prod_{k\in K_{1}}\left[1-\frac{h^{2}\sin^{2}(k)}{2\varepsilon_{k}^{2}}\right]\sum_{k\in K_{1}}\left(\frac{h^{2}\sin^{2}(k)}{2\varepsilon^{2}_{k}-h^{2}\sin^{2}(k)}\right),

(72)

which is the form used in the upper bound seen in Fig. 5. In the continuum limit, $\{K_{0}\}=\{K_{1}\}$ and we have for both sectors,

\lambda(h)=-\lim_{N\rightarrow\infty}\frac{1}{N}\sum_{k}\ln(1-\frac{h^{2}\sin^{2}(k)}{2\varepsilon_{k}^{2}})=-\int_{0}^{\pi}\frac{dk}{2\pi}\ln\left[1-\frac{h^{2}\sin^{2}(k)}{2\left(1+h^{2}-2h\cos(k)\right)}\right],

(73)

and

\gamma(h)=\lim_{N\rightarrow\infty}\frac{1}{N}\sum_{k}\frac{h^{2}\sin^{2}(k)}{2\varepsilon^{2}_{k}-h^{2}\sin^{2}(k)}=\int_{0}^{\pi}\frac{dk}{2\pi}\frac{h^{2}\sin^{2}(k)}{2(1+h^{2}-2h\cos(k))-h^{2}\sin^{2}(k)}.

(74)

This gives the final form of the bottleneck upper bound,

\delta\leq\frac{2\gamma(h)}{N-1}e^{-N\lambda(h)}+\frac{e^{-4\beta}}{2-N(N-1)e^{-4\beta}}.

(75)