Quantum and classical annealing in a continuous space with multiple local minima

Yang Wei Koh Institute of Innovative Research, Tokyo Institute of Technology, Nagatsuta-cho, Midori-ku, Yokohama 226-8503, Japan Hidetoshi Nishimori Institute of Innovative Research, Tokyo Institute of Technology, Nagatsuta-cho, Midori-ku, Yokohama 226-8503, Japan Graduate School of Information Sciences, Tohoku University, Sendai 980-8579, Japan RIKEN, Interdisciplinary Theoretical and Mathematical Sciences (iTHEMS), Wako, Saitama 351-0198, Japan

Abstract

The protocol of quantum annealing is applied to an optimization problem with a one-dimensional continuous degree of freedom, a variant of the problem proposed by Shinomoto and Kabashima. The energy landscape has a number of local minima, and the classical approach of simulated annealing is predicted to have a logarithmically slow convergence to the global minimum. We show by extensive numerical analyses that quantum annealing yields a power law convergence, thus an exponential improvement over simulated annealing. The power is larger, and thus the convergence is faster, than a prediction by an existing phenomenological theory for this problem. Performance of simulated annealing is shown to be enhanced by introducing quasi-global searches across energy barriers, leading to a power-law convergence but with a smaller power than in the quantum case and thus a slower convergence classically even with quasi-global search processes. We also reveal how diabatic quantum dynamics, quantum tunneling in particular, steers the systems toward the global minimum by a meticulous choice of annealing schedule. This latter result explicitly contrasts the role of tunneling in quantum annealing against the classical counterpart of stochastic optimization by simulated annealing.

I Introduction

Quantum annealing is a quantum-mechanical metaheuristic for optimization problems and has been applied predominantly to problems with discrete degrees of freedom, typically in terms of the transverse-field Ising model Kadowaki and Nishimori (1998); Santoro et al. (2002); Das and Chakrabarti (2008); Santoro and Tosatti (2006); Morita and Nishimori (2008); Albash and Lidar (2018); Hauke et al. (2020); Farhi et al. (2001). An early paper by Finnila et al. Finnila et al. (1994) discussed a system with continuous degrees of freedom but with an imaginary-time Schrödinger dynamics, thus short of intrinsic quantum effects. Recent work by Abel and Spannowsky Abel and Spannowsky (2021) applied the idea of domain-wall encoding Abel et al. (2021a) to approximately but accurately represent continuous degrees of freedom by Ising spins and illustrated quantum tunneling for a double-well potential by experiments on the D-Wave quantum annealing device Johnson et al. (2011). Abel et al. Abel et al. (2021b) extended their idea to more complex energy landscapes with several sharp local minima or a single deep global minimum and showed that quantum annealing as implemented on the D-Wave device achieves better performance than classical algorithms including simulated annealing. Most pertinent to the present paper is the contribution by Stella et al. Stella et al. (2005), in which they studied quantum annealing and classical simulated annealing for simple as well as non-trivial optimization problems in a continuous space. Their overall conclusion is that the quantum approach leads to a faster convergence to the global minimum than the classical one does. This result has been derived convincingly by numerical and analytical methods for the trivial case of harmonic oscillator as well as for the less trivial double-well potential. However, for the potential with very many local minima proposed and analyzed classically by Shinomoto and Kabashima Shinomoto and Kabashima (1991), they adopted a phenomenological coarse-grained model, in which only discrete positions of local minima were considered and then a continuum limit to ignore the distance between local minima was taken to facilitate the analysis. This procedure necessarily involves approximations, whose legitimacy is not necessarily evident in the long-time low-temperature limit in the classical case. This is true in the quantum case as well, and applicability of the approximation needs careful scrutiny. Another important aspect is that the approximation makes it difficult to intuitively grasp the idea of how quantum dynamics promotes the flow of quantum-mechanical probability between adjacent local minima toward the goal of reaching the global minimum.

We are in this way motivated to study the problem of a Shinomoto-Kabashima-like potential with many local minima by direct numerical computations in the continuous space under classical as well as quantum annealing protocols. Our results indicate that a flow of quantum-mechanical probability toward the global minimum enhances the performance of quantum annealing, in particular under meticulously designed non-linear annealing schedules. Also observed is an accelerated convergence of simulated annealing under quasi-global search processes across energy barriers in comparison with a conventional local search but still at a slower rate than in the quantum case.

The rest of the paper is organized as follows. In Sec. II, we define the problem and give a discussion on the energy landscape of our potential. Section III examines the time-independent aspects of the system, particularly the quantum ground state and the classical equilibrium state. Sections IV and V are devoted to quantum annealing, with the former focusing on the linear annealing schedule and the latter on nonlinear ones for accelerated convergence. Sections VII and VI are on simulated annealing. The former section focuses on the linear schedule, while the latter section deals with the logarithmic schedule considered by Shinomoto and Kabashima. Lastly in Sec. VIII, we summarize the paper and then conclude with some discussions. Additional information is provided in Appendixes.

II The problem

In this section, we first define the problem and show its salient features.

II.1 Definition of the problem

Let us consider the following potential inspired by Shinomoto and Kabashima Shinomoto and Kabashima (1991),

V_{\rm SK}(x)=\frac{1}{2}kx^{2}+\frac{h_{0}}{2}\left[1-\cos\left(\frac{2\pi x}{w_{0}}\right)\right].

(1)

The first is a harmonic term forming the envelope of the overall potential, and it sets a unique global minimum at $x=0$ . Without loss of generality, we shall let the spring constant $k$ be unity from now on. The cosine term introduces local minima onto the energy landscape, and the structure of these minima can be tuned using two parameters $h_{0}$ and $w_{0}$ . As an example, Fig. 1 shows the graph of $V_{\rm SK}(x)$ when $h_{0}=w_{0}=0.2$ . The parameter $h_{0}$ corresponds approximately to the height of the energy barrier between two adjacent minima, and $w_{0}$ to the horizontal distance between them.

Refer to caption — Figure 1: Graph of the potential $V_{\rm SK}(x)$ in the vicinity of the global minimum at $x=0$ , with parameters $h_{0}=w_{0}=0.2$ . These parameters correspond, around the global minimum, approximately to the height of the energy barrier and the distance between neighboring minima. Inset: Zoomed-out view of the potential with the same parameter values, showing the 15 local minima (denoted $N_{\rm min}$ ) on each side of the origin. Due to the harmonic term in Eq. (1), the local minima are smoothed out far away from the origin.

Our proposed potential $V_{\rm SK}(x)$ tries to capture the essential physics underlying Shinomoto and Kabashima’s idea. Nevertheless, there are some differences because in their original formulation, it is assumed that the local minima are all uniform such that the barrier height and distance between neighboring minima remain constant ad infinitum along the $x$ -axis. On the other hand, in Eq. (1), due to the presence of the quadratic term, the local minima of $V_{\rm SK}(x)$ are smoothed out far away from the origin and so the total number of local minima is in fact finite. This is illustrated by Fig. 1, where the inset shows the global view of the same graph. Notwithstanding this difference, Eq. (1) is expected to lead to essentially the same asymptotic behavior of the system because the probability of the system being far from the origin is very small in the long-time limit. We adopt Eq. (1) because of its simpler numerical implementation and yet with essentially the same asymptotic properties to be expected.

II.2 Number of local minima

Let us define $N_{\rm min}$ as the number of local minima in the positive $x$ -direction. It is seen that $N_{\rm min}=15$ when the parameters are $h_{0}=w_{0}=0.2$ . Notice that although the local minima are relatively uniform in the vicinity of the origin, they become shallower further away, so our studies should be focused on the former region in order to stay faithful to Shinomoto and Kabashima’s proposal.

We calculated $N_{\rm min}$ numerically as a function of the parameters $h_{0}$ and $w_{0}$ , and the results are shown in Fig. 2 in the form of a phase diagram. Individual regions of $N_{\rm min}$ from 0 up to 7 are shown. In the region $N_{\rm min}\geq 8$ , the boundaries between successive regions become increasingly closely spaced, with $N_{\rm min}$ approaching infinity as one approaches the vertical axis. The graph of Fig. 1 is indicated by the circle, located inside the region $N_{\rm min}\geq 8$ . Roughly speaking, one can interpret the phase diagram of $N_{\rm min}$ as a map of the difficulty of optimizing the potential $V_{\rm SK}(x)$ . Regions where $N_{\rm min}$ is large are, intuitively, more difficult to optimize than regions where it is smaller. Within a region of constant $N_{\rm min}$ , it is generally more difficult to optimize if either (or both) of the parameters $h_{0}$ and $w_{0}$ increases.

It is not feasible to perform a thorough computational study of the entire phase space. Instead, we shall focus on just one point, $(w_{0},h_{0})=(0.2,0.2)$ , which has been introduced in Figs. 1 and 2. Our choice is due to the need to balance two competing factors. On the one hand, the energy landscape of the potential should be difficult enough to be non-trivial for our annealing algorithms to optimize. On the other hand, we are limited by the computational resources needed to solve the time-dependent Schrödinger equation. For instance, the top left corner of the phase space where $N_{\rm min}$ and $h_{0}$ are large would pose a good challenge to both quantum and simulated annealing; however, a very large number of grid points is required to accurately represent the wavefunction in this region of phase space, making it prohibitively expensive to compute energy eigenvalues and perform long annealing simulations. The energy landscape at the phase point $(0.2,0.2)$ is complex and non-trivial, but at the same time also computationally feasible to simulate, allowing us to focus on the physics of annealing rather than on the optimization of a potential with true technical difficulties. Hence, in the rest of the paper, unless otherwise stated, it is implicit that all numerical calculations are performed with the parameters $h_{0}=w_{0}=0.2$ .

III Comparison of quantum ground state and classical equilibrium state

Before discussing annealing, which involves dynamics, let us first examine the time-independent aspects of the model. The Hamiltonian, applicable to both the quantum and classical descriptions of the system, is given by

H=\frac{p^{2}}{2m}+V_{\rm SK}(x),

(2)

where $p$ and $m$ are the momentum and mass of the particle, respectively. In the quantum description, $p$ is replaced by the momentum operator $\frac{\hbar}{i}\frac{\partial}{\partial x}$ ( $\hbar=1$ ) and $x$ by the position operator. The Hamiltonian operator is then parameterized by the mass $m$ which also serves as the annealing parameter during optimization.

In the classical case, the system is treated using statistical mechanics where it is described by the partition function. The kinetic part of Eq. (2) can be integrated out analytically, and we only need to work with the position space Boltzmann distribution given by

\varrho(x)=\frac{e^{-\beta V_{\rm SK}(x)}}{Z(\beta)},

(3)

where $\beta$ is the inverse temperature ( $k_{\rm B}=1$ ), and $Z(\beta)$ is the partition function. We shall mostly be concerned with the internal energy (i.e. the average energy) of the system

U(\beta)=\frac{1}{2\beta}+\langle V_{\rm SK}(x)\rangle_{\beta},

(4)

where the first term is the contribution from the kinetic energy ( $m=1$ ) and the second one is the average potential energy

\langle V_{\rm SK}(x)\rangle_{\beta}=\int_{-\infty}^{\infty}V_{\rm SK}(x)\,\varrho(x)\,dx.

(5)

The subscript $\beta$ in $\langle\cdot\rangle_{\beta}$ is to indicate the dependence on the inverse temperature via the Boltzmann distribution $\varrho(x)$ . Thus, in contrast to the quantum scenario, we see that the classical energy is parameterized by $\beta$ during annealing.

As energy is commonly used as a gauge to monitor the progress of an annealing process, let us compare the quantum ground state and classical equilibrium state at the same energies. Figure 3 shows the situation when the energies are approximately 0.6. Panel (a) shows the quantum case where the ground state energy eigenvalue $E_{0}$ is plotted in blue and its corresponding probability density $|\langle x|E_{0}\rangle|^{2}$ (i.e. the ground state wavefunction squared) is plotted in red. All quantities are plotted to scale and the vertical axis reflects their actual numerical values. Hence, by comparing the blue line to the potential $V_{\rm SK}(x)$ (black curve), one can say that the energy is high because many local minima are ‘submerged’ below the blue line. The quantum ground state is delocalized in the sense that it spans across several local minima. Panel (b) depicts the classical situation, where the internal energy $U$ is plotted in blue and the Boltzmann distribution $\varrho(x)$ [Eq. (3)] in green. Noteworthy is the fact that, apart from some wave-like oscillations, the classical equilibrium state is also delocalized, spanning local minima, in a way similar to its quantum counterpart. This means that when the energy is high, there is not much quantitative difference between the quantum ground state and classical equilibrium state for exploring the configuration space of the potential $V_{\rm SK}(x)$ .

This analogous behavior of the two states, however, undergoes a dramatic change when the energy is lowered. Figure 4 compares them again at energies around 0.1 [panels (a) and (b)] and at around 0.01 [panels (c) and (d)]. At 0.1, one sees that the quantum ground state has acquired a compact form concentrated around the global minimum. On the other hand, the classical equilibrium state fragments into a multi-modal distribution, with significant probability of finding the particle even at local minima which are relatively far away from the origin. At the low energy of 0.01, the difference between quantum and classical is even more pronounced, with the Boltzmann distribution manifesting two ‘spurious’ modes. These spurious modes are metastable (i.e. long-lived) as the probability channel connecting them to the main mode at the origin is almost completely suppressed.

Based on these comparisons, we see that from an energy viewpoint the quantum ground state appears to be more suitable for the task of optimization than the classical one. At low energies, the former is more focused around the global minimum, whereas the latter exhibits signs of trappings by local minima. As a heuristic explanation, one might say that classical mechanics penalizes the particle in regions where its total energy is smaller than the potential energy, and so the equilibrium state tends to distribute itself around local minima to avoid incurring this penalty. On the other hand, quantum mechanics allows the particle to access classically forbidden regions via tunneling, and this freedom enables its ground state to achieve a more compact form concentrated around the global minimum.

Let us now look at how the quantum and classical energies vary with mass and temperature. Figure 5(a) shows the quantum ground state energy $E_{0}$ as a function of mass $m$ (red). To discern the essential features of $V_{\rm SK}(x)$ , the zero-point energy, $\frac{1}{2}\sqrt{k/m}$ , of two harmonic oscillators labelled ‘outer’ ( $k=1$ ) and ‘inner’ ( $k=99.696$ ) are also shown (blue lines). The outer oscillator describes the envelope of $V_{\rm SK}(x)$ , while the inner oscillator is a quadratic approximation of the potential around the global minimum. In the limits of small and large masses, the ground state energy is well-described by the outer and inner oscillators. The range of $m$ where $E_{0}$ makes a transition between the two oscillators marks the important region of the quantum system.

The classical case is shown in panel (b) where the internal energy $U$ is plotted as a function of the inverse temperature $\beta$ (green). The internal energy of the harmonic oscillator, independent of spring constant as embodied by the equipartition theorem, is also shown (blue). Analogous to the quantum case, in the limit of small and large $\beta$ , the classical ground state energy again approaches that of the oscillator. On the other hand, the distinctive feature here is that the non-trivial aspect of $V_{\rm SK}(x)$ expresses itself in the form of the shaded region (yellow), where the system seems to have ‘excess energy’ relative to the oscillator. The range of $\beta$ where $U$ contains significant excess energy defines the crucial region for the classical system.

Another important aspect of the system is the quantum energy gap $\Delta$ , defined as the energy difference between the ground and first excited states $E_{1}-E_{0}$ . In quantum annealing, fast closure of the energy gap is a sign of first-order phase transition and forebodes long annealing times Albash and Lidar (2018). Figure 6 shows the gap $\Delta$ of the potential $V_{\rm SK}(x)$ as a function of $m$ (red). Once again, we show the gaps of the outer and inner oscillators (given by $\sqrt{k/m}$ and plotted in blue) to highlight those features which are particular to $V_{\rm SK}(x)$ . We see that the gap of $V_{\rm SK}(x)$ does not exhibit any anomalously small behavior since its magnitude lies mostly (although not strictly) between those of the two oscillators. This suggests a polynomial computational complexity for quantum annealing, which will be confirmed numerically. Therefore this is not a hard problem to solve from the viewpoint of computational complexity, but our point is to compare the performance of quantum and classical protocols, not to discuss difficulties within the framework of quantum annealing per se.

Apart from the absence of closure, let us also point out the segment of the gap curve indicated as ‘flat gap’. The gradient $\frac{\partial\Delta}{\partial m}$ stays zero throughout the segment, with the first excited state being multiply degenerate. This feature is not peculiar to the phase point $h_{0}=w_{0}=0.2$ , and is in fact a persistent motif of the gap curve throughout the $h_{0}$ - $w_{0}$ phase space. We shall elaborate more on this in Appendix A. The implications of this ‘flat gap’ for quantum annealing is not immediately clear, but we shall see later that the performance seems to be slightly lower here compared to other regions where it is absent.

Lastly, let us remark that the ground state properties at the phase point $(w_{0},h_{0})=(0.2,0.2)$ presented in this section can be considered a good representative of the general behavior in phase space. Nevertheless, we also found certain aspects which are not captured by $(0.2,0.2)$ . To supplement the discussion given here, we briefly touch upon the quantum energy $E_{0}$ and the gap $\Delta$ of other phase points in Appendix A.

IV Quantum annealing using a linear schedule

We now analyze the dynamics, first for the quantum case under a simple linear annealing schedule.

IV.1 Linear schedule

Quantum annealing of the potential $V_{\rm SK}(x)$ is performed by solving the time-dependent Schrödinger equation

\left[i\hbar\frac{\partial}{\partial t}+\frac{\hbar^{2}}{2m(t)}\frac{\partial^{2}}{\partial x^{2}}-V_{\rm SK}(x)\right]\psi(x,t)=0,

(6)

where $\psi(x,t)$ is the wavefunction, and the time dependence of mass $m(t)$ serves as the annealing schedule. Conceptually speaking, one starts annealing from zero mass where the kinetic energy is dominating, and slowly increase it to infinity such that the potential $V_{\rm SK}(x)$ becomes the dominating term. If the change of $m(t)$ with time is infinitesimally slow, then according to the adiabatic theorem Albash and Lidar (2018), a wavefunction initially at the ground state will evolve into the global optimized solution of $V_{\rm SK}(x)$ upon the completion of annealing.

Let us restrict the time evolution under Eq. (6) to within the time interval $0\leq t\leq T$ where $T$ is the total annealing time. Consider the following linear form for $m(t)$

m^{(1)}(t)=\left(m_{f}-m_{i}\right)\left(\frac{t}{T}\right)+m_{i}~{},

(7)

where $m_{i}$ and $m_{f}$ are the masses at the initial $(t=0)$ and final $(t=T)$ times, respectively. When $m_{i}$ and $m_{f}$ are fixed, $T$ determines the annealing speed, with a large $T$ giving slow annealing. The superscript (1) denotes ‘first-order’, and is to differentiate Eq. (7) from higher-order, nonlinear schedules which we shall introduced later. As suggested by Morita Morita (2007), only the initial and final time derivatives of annealing schedule determine the final excitation probability for sufficiently large annealing time. It is thus expected that an explicit form of the annealing function is unimportant, i.e. it is irrelevant whether we change the mass as in Eq. (7) or change the coefficient $\Gamma(t)\equiv\hbar/2m$ of the kinetic energy term as a whole as was done in Ref. Stella et al. (2005), as long as the initial and final time derivatives are shared among different functions. The linear schedule of Eq. (7) represents the case with finite derivatives at both ends of annealing.

IV.2 Stages of annealing

One operational constraint faced when quantum annealing is implemented numerically via the solution of Eq. (6) concerns the choice of the mass parameters $m_{i}$ and $m_{f}$ . In principle, one should be able to use arbitrary small $m_{i}$ and arbitrary large $m_{f}$ in the schedule Eq. (7). For instance, one would like to start from the high energy state shown in Fig. 3(a) and anneal towards the low energy state shown in Fig. 4(c). In practice, however, it is not feasible to solve Eq. (6) numerically with such a large energy difference because the number of grid points needed to represent the wavefunction is very large. In particular, one sees from Figs. 3(a) and 4(c) that the wavefunction evolves from a delocalized to a localized form, and so the grid must be wide enough to represent the initial wavefunction, while at the same time its points must also be dense enough to resolve the final wavefunction. Hence, in practice, the choice of $m_{i}$ and $m_{f}$ is partly constrained by the computational resources, both in terms of memory and processing speed, one has at one’s disposal. In all our quantum annealing simulations, we restricted ourselves to a plane wave grid with 2048 grid points, and then choose $m_{i}$ and $m_{f}$ that are computationally feasible with this grid size.

Despite the above issue, our ultimate goal is still to attain some understanding of annealing when the initial mass is small and the final mass is large. Consider the masses $m_{a}$ and $m_{d}$ listed in Table 1, where the difference $m_{d}-m_{a}$ spans six orders of magnitude. The ground state at $m_{a}$ was discussed in Fig. 3(a). To get an intuitive sense of the ground state at $m_{d}$ , note that $m_{d}$ is larger than $m_{c}$ , whose ground state was discussed in Fig. 4(c). Our objective is to study quantum annealing from the high energy, delocalized state at $m_{a}$ to the low energy, localized state at $m_{d}$ .

To traverse this large mass difference, we perform annealing in stages. Referring to Table 1, let us define the annealing from $m_{a}$ to $m_{b}$ as Stage 1, that from $m_{b}$ to $m_{c}$ as Stage 2, and that from $m_{c}$ to $m_{d}$ as Stage 3. The properties of the ground states at $m_{a}$ , $m_{b}$ , and $m_{c}$ have been discussed in Sec. III. Apart from keeping the computations manageable, the stages are also set up based on energy considerations. In Stage 1, the energy decreases from a high level to half the barrier height of $V_{\rm SK}(x)$ , as indicated by the blue lines in Figs. 3(a) and 4(a). In Stage 2, the energy continues its descent to the potential level of the two lowest local minima, as seen in Fig. 4(c). Finally, in Stage 3 the energy descends to close to the potential of the global minimum.

A graphical perspective of the annealing stages is also given in Figs. 5(a) and 6 where they are superposed upon the $E_{0}$ and $\Delta$ curves. In these figures, the vertical lines indicate the initial and final masses of each of the stages.

$l$ (label)	$\log_{10}m_{l}$	$E_{0}(m_{l})$	$\log_{10}\beta_{l}$	$U(\beta_{l})$
	Quantum Annealing		Simulated Annealing
$a$	0	0.5999898	0.29	0.6031582
$b$	3	0.1050870	1.18	0.1061226
$c$	5	0.0154758	1.97	0.0156931
$d$	6	0.0049617	2.35	0.0049526

Table 1: The boundary parameters of the various annealing stages, as well as their corresponding energies. Each

m

and

\beta

is labelled, and their roles as the initial and/or final parameter of the stages are shown in Fig. 5.

In numerical calculations, the grid in each stage is determined by making sure that it is wide enough to accommodate the ground state wavefunction at $m_{i}$ and its grid points dense enough to resolve the ground state wavefunction at $m_{f}$ . This is done quantitatively by making sure that the eigenenergy $E_{0}$ is converged with respect to the number of grid points at both $m_{i}$ and $m_{f}$ ¹¹1More concretely, we first determine the limit $x_{\rm max}$ such that the ground state wavefunction at $m_{i}$ lies comfortably within the interval $[-x_{\rm max},x_{\rm max}]$ . We then compute the energy eigenvalue $E_{0}$ using a grid spanning the interval with 2048 grid points. The computed $E_{0}$ must remain unchanged when the number of grid points is decreased to 1024 and increased to 4096. With the converged grid at $m_{i}$ , we repeat the procedure at $m_{f}$ to ensure that the same grid must guarantee the convergence of $E_{0}$ at the final mass as well. . A grid that passed the convergence test at both ends of a stage is used for quantum annealing of that stage.

IV.3 Residual energy

Quantum annealing of a stage is performed by inserting the particular $m_{i}$ and $m_{f}$ into the schedule Eq. (7), which is then substituted into Eq. (6). The resulting Schrödinger equation is solved using the WavePacket program package Schmidt and Lorenz (2017); Schmidt and Hartmann (2018); Lorenz and Schmidt (2020). Atomic units are used, and $m(t)$ is expressed in units of the electron rest mass. Time propagation is performed using the Runge-Kutta algorithm with a time step of $dt=0.1$ ²²2Convergence of trajectories with respect to $dt$ is checked by halving it (to 0.05) and then seeing that recalculated results remain invariant.. The ground state wavefunction at $m_{i}$ is used as the initial wavefunction at the start of annealing.

To assess the performance of annealing, we consider the residual energy, defined as

R_{\rm q.a.}(T)=\langle\psi(T)|H(T)|\psi(T)\rangle-E_{0}(m_{f}).

(8)

The first term is the expectation value of the Hamiltonian [given in Eq. (6)] taken with respect to the wavefunction obtained when annealing finishes at $t=T$ . The second term is the ground state energy at the final mass, and is the target energy which annealing seeks to attain. In the limit $T\rightarrow\infty$ , the time evolution becomes adiabatic and the residual energy approaches zero. The rate at which $R_{\rm q.a.}(T)$ decays with the total annealing time $T$ is a measure of how quickly the target energy is being attained by the annealing process.

Figure 7 shows $R_{\rm q.a.}(T)$ as a function of $T$ obtained under the linear schedule. The three curves show the results for the three stages discussed earlier. The straight lines (black) are proportional to $T^{-2}$ and indicate the asymptotic behavior of the curves when $T$ is large ³³3Due to the periodic oscillatory behavior of the data points, instead of peforming a least square fit, we have chosen instead to manually adjust the vertical displacement of the line $T^{-2}$ to fit those data points at the crests of the oscillatory curve. We find that this reflects more accurately the observed gradients of the curves.. We also performed some additional simulations at other $h_{0}$ - $w_{0}$ phase points and found that the decay rate $T^{-2}$ holds quite generally in phase space. This decay rate for the linear schedule is in agreement with previous studies on spin systems Suzuki and Okada (2005); Morita (2007).

The curve of Stage 2 (blue squares) is slightly different from the other two stages, first undergoing a transient phase before crossing over into the asymptotic $T^{-2}$ regime. Similar two-phased behavior in the residual energy has also been reported in a two-level system, with the earlier phase being attributed to diabatic transitions and explained by the Landau-Zener theorem Morita (2007). The transient phase here is probably also due to transitions into excited states. Figure 8 shows an example of the final wavefunction that is obtained in Stage 2, with the corresponding residual energy indicated in Fig. 7 (arrowed $T=3250$ ). For comparison, the target ground state is shown in Fig. 4(c). One sees that the annealed wavefunction exhibits two spurious peaks at the neighboring local minima, which are absent in the target wavefunction. Furthermore, we found that the first excited level in the ‘flat gap’ region (c.f. Fig. 6) is doubly-degenerate, spanned by two compact gaussians each centered at one of the two local minima. Hence, the spurious peaks in the annealed wavefunction in Fig. 8 are most likely caused by diabatic transition into the first excited level when the annealing traverses the ‘flat gap’ region.

It is interesting to see that in quantum annealing, parts of the wavefunction can become trapped near local minima if the annealing time is not long enough. Normally, this may be what one would expect more in simulated annealing. This similarity between quantum and simulated annealing is inspiring from a physics point of view, especially since two annealing protocols obey very different rules. The mechansim underlying the trapping in simulated annealing is fairly well-understood. On the other hand, we might try to explain the trapping observed for quantum annealing using Landau-Zener transition. In the case of $V_{\rm SK}(x)$ , we have seen that the system encounters a region of ‘flat gap’ during the course of annealing. This may pose a problem for the Landau-Zener theory as there is no well-defined avoided crossing, which is one of the basic tenets of the Landau-Zener theory. Hence, although the physical process underlying the trapping observed here might be a diabatic transition, the detailed mechanism could differ from the traditional Landau-Zener one.

IV.4 Time evolution of the wavefunction and the approach to adiabaticity

We now discuss the time evolution of the wavefunction during the course of annealing. Figure 9(a) shows a snapshot of the probability density at time $t=44$ in Stage 1, with $T=560$ . The single-modal form is quite representative of the shape of the wavefunction in Stage 1, so we can express its temporal behavior more succinctly using the width $\langle\psi(t)|x^{2}|\psi(t)\rangle$ , which is illustrated in Fig. 9(a). Figure 10(a) shows the time evolution of the width in Stage 1 for several values of $T$ . Overall, one sees that all three graphs oscillate, with the amplitude of oscillation decaying with the passage of time. Among the graphs themselves, as $T$ increases, the oscillation becomes suppressed to yield a smoother decay.

Although informative, looking at the width alone gives us rather limited understanding of the behavior of the wavefunction. More insight can be gained by looking at the so-called probability current, defined as

j(x,t)=\frac{\hbar}{2m(t)i}\left(\psi^{*}\frac{\partial\psi}{\partial x}-\psi\frac{\partial\psi^{*}}{\partial x}\right),

(9)

where $\psi^{*}(x,t)$ is the complex conjugate of $\psi(x,t)$ , $i=\sqrt{-1}$ , and $m(t)$ is again the time-dependent mass. The current $j(x,t)$ is a real-valued quantity, and indicates the direction in which the probability is flowing at a point in space ⁴⁴4For the plane wave $e^{ikx}$ , one has $j(x,t)=\frac{\hbar k}{m}=\frac{p}{m}=v$ , which is simply the velocity of the particle. Hence, the probability current can be interpreted as a generalization of the velocity field, indicating both the speed and direction of motion of the particle. . Figure 9(b) shows the probability current corresponding to the probability density shown in panel (a). It is seen that the current is positive (negative) when $x>0$ ( $x<0$ ), meaning that probability is flowing outwards away from the global minimum at the origin. With respect to the purpose of optimization, such an instantaneous behavior of the wavefunction should be considered unfavorable because the particle is diffusing away from the optimal solution at $x=0$ .

As with the width of the probability density, let us also introduce a parameter associated with $j(x,t)$ to monitor the time evolution of the current. Define the amplitude as the largest displacement of $j(x,t)$ from zero in the region $x<0$ . This is illustrated in Fig. 9(b), where the amplitude is $j(x_{0},t)$ at the point $x_{0}$ . The amplitude can take on negative values, with the sign indicating the direction of probability flow.

Figure 10(b) shows the time evolutions of the current amplitude $j(x_{0},t)$ associated with the time evolutions of the width shown in panel (a). We see that the amplitude oscillates in time between positive and negative values, meaning that the probability current flow towards and away from the global minimum in a periodic manner. This is consistent with the pulsating behavior exhibited by the width. Another feature is that as $T$ increases, the amplitude tends to zero, meaning that the current itself approaches zero both spatially, as well as temporally throughout the entire duration of annealing. As $j(x,t)$ is identically zero for stationary states, this is a sign of the annealing process approaching adiabaticity. Viewed differently, even for the moderate value of $T=560$ , where the residual energy approaches the asymptotic $T^{-2}$ law for Stage 1 as seen in Fig. 7 (at $\log_{10}T=2.748$ ), the current keeps oscillating, see Fig. 10(b), implying a diabatic evolution.

So far, we have focused our discussion on Stage 1. The probability current in Stage 2 is slightly more complicated. An example is shown in Fig. 21, where a snapshot of $j(x,t)$ at $t=10$ with $T=320$ is shown. One sees that the current direction varies spatially, unlike the spatially-coherent flow we have seen in Fig. 9(b). Despite such differences, the overall behaviors in Stages 2 and 3 are generally quite similar to Stage 1. For completeness, these results are discussed in Appendix B.

V Quantum annealing with nonlinear schedules

V.1 Nonlinear annealing schedules

In the previous section, we have seen that asymptotically the residual energy decays with the total annealing time as $T^{-2}$ . Analysis related to the adiabatic theorem shows that this scaling is largely a result of annealing with the linear schedule Eq. (7) Suzuki and Okada (2005); Morita (2007). Nonlinear schedules have been proposed in which the residual energy decays much faster with the annealing time, leading to better efficiency Morita (2007). In this section, we apply them to the potential $V_{\rm SK}(x)$ .

Nonlinear schedules can be obtained by generalizing Eq. (7) as

m^{(n)}(t)=\left(m_{f}-m_{i}\right)f_{n}(t/T)+m_{i},

(10)

where $n$ denotes the order of the schedule, and the linear function $t/T$ is replaced by higher-order functions $f_{n}(t/T)$ . Details concerning the derivation of $f_{n}$ can be found in Ref. Morita (2007). Physically speaking, these functions are designed such that annealing is constrained to proceed slowly both at the beginning as well as at the end of the process. Earlier studies on spin systems revealed that the residual energy decays with annealing time as $T^{-2n}$ , so higher-order schedules generally lead to better annealing performances. In the following, we shall investigate the performances of the functions $f_{1}(s)$ to $f_{4}(s)$ given in Ref. Morita (2007),

	$\displaystyle f_{1}(s)$	$\displaystyle=s,~{}f_{2}(s)=s^{2}(3-2s),$		(11)
	$\displaystyle f_{3}(s)$	$\displaystyle=s^{3}(10-15s+6s^{2}),~{}f_{4}(s)=s^{4}(35-84s+70s^{2}-20s^{3}).$		(12)

V.2 Residual energy: faster decay under nonlinear schedules

Figure 11 compares the residual energies obtained under the schedules given by Eq. (10), with $n=1$ to 3. The results from Stages 1 to 3 are shown in panels (a) to (c) of Fig. 11, respectively. The general behaviors in Stages 1 and 3 are qualitatively quite similar. It is seen that for an $n$ th-order schedule the residual energy decays as $T^{-2n}$ , verifying the theoretical prediction of Ref. Morita (2007). To emphasize the importance of varying the schedule slowly both at the start as well as towards the end of annealing, we also considered a straightforward quadratic schedule $m^{(q)}(t)$ , partly motivated by Ref. Stella et al. (2005), which is obtained by simply substituting $f_{n}$ with $(t/T)^{2}$ in Eq. (10). This schedule proceeds slowly only at the start but not at the end of annealing. In panels (a) and (c), we see that the residual energy of $m^{(q)}(t)$ (solid blue squares) decays only as $T^{-2}$ . That both linear and quadratic schedules should show the same asymptotic decay rate can also be explained based on the theoretical analysis of Ref. Morita (2007).

The situation in Stage 2 is different from the other two in that the residual energy decays in the manner of the transient phase discussed in Sec. IV.3. Note that this is in the domain of strongly diabatic transitions and so is not the regime in which the function $f_{n}$ were originally designed for. Nevertheless, it is seen from panel (b) of Fig. 11 that the residual energies of $m^{(2)}(t)$ and $m^{(3)}(t)$ decrease faster than that of $m^{(1)}(t)$ . However, we also found that the curve for $m^{(4)}(t)$ (not shown) is almost exactly the same as that of $m^{(3)}(t)$ , so we should be cautious about generalizing the improvement in performance seen for $n=2$ and 3 to higher orders. We were unable to pursue the curves of $m^{(2)}(t)$ and $m^{(3)}(t)$ to larger $T$ to examine the asymptotic crossover because we have exhausted the numerical precision offered by the simulation package. Within the time span we could study as shown in Fig. 11(b), diabatic processes lead to significantly faster convergence than $T^{-2}$ . Interestingly, the quadratic schedule gives the best performance here, unlike in the other two stages. Overall, nonlinear schedules generally show improvement over the linear one in this stage. Interpretation of the results, however, is not as clear-cut as in Stages 1 and 3.

V.3 Persistent tunneling current during quantum annealing

Earlier in Sec. IV.4, we have seen that under the linear schedule, increasing the annealing time suppresses of the probability current and leads to a more adiabatic process. Conventionally, adiabaticity is considered to be closely related with how the global minimum is ultimately to be attained in quantum annealing. With nonlinear schedules, however, we are now allowed to explore the schedule’s order $n$ as a new parameter dimension while keeping the annealing time constant. We shall see that this leads to tunneling, as opposed to adiabaticity, as a dominant mechanism for quantum annealing to arrive at the optimal solution.

Let us focus our discussion on Stage 1. The probability currents under nonlinear schedules also have the spatially-coherent form of Fig. 9(b) (no node in each of the region $x>0$ and $x<0$ ), so once again we can use the amplitude $j(x_{0},t)$ to monitor their temporal behaviors. Figure 12 shows the time evolutions of $j(x_{0},t)$ obtained under the various schedules, with annealing time $T=560$ . The salient feature that is common to all the nonlinear schedules and which distinguishes them from the linear one is that the amplitude is consistently positive throughout the entire duration of annealing. The probability current is therefore always flowing towards the origin, and no current ever flows away. This is a highly diabatic process in which the optimization is targeted at the global minimum. In stark contrast, the current of the linear schedule oscillates in an unguided manner, resulting in inefficient annealing.

Apart from being persistent, the current of nonlinear schedules also has the additional feature of exhibiting tunneling. To explain, let us first note from Fig. 9 that the spatial extent of both the probability density and the current in the domain $x>0$ can be approximated by the width $\sqrt{\langle\psi(t)|x^{2}|\psi(t)\rangle}$ . The red curve in Fig. 13 shows the time evolution of $\sqrt{\langle\psi(t)|x^{2}|\psi(t)\rangle}$ for the schedule $m^{(3)}(t)$ in Stage 1 with $T=560$ . The (green) shaded areas show the classically forbidden regions where the total energy $\langle\psi(t)|H(t)|\psi(t)\rangle$ is less than the potential energy $V_{\rm SK}(x)$ . Shaded areas lying below the red curve are regions where the current resides in classically forbidden regimes, and these currents are tunneling in nature. Hence, nonlinear schedules induce persistent currents that tunnel across potential barriers to greatly improve the efficiency of annealing. This is in contrast to the linear schedule which, on the contrary, seeks to suppress the flow of current in order to bring about a more adiabatic annealing process.

So far, our discussion has focused on Stage 1. For Stage 2, we mentioned earlier that under the linear schedule the current is not spatially-coherent (c.f. Fig. 21). Under nonlinear schedules, however, the current resumes a spatially-coherent form and so we can again use the amplitude $j(x_{0},t)$ as a monitoring parameter to study its temporal evolution. Despite the different behavior of their residual energies, it was found that the observations reported for Stage 1 are valid for Stage 2 as well. For completeness, these results are summarized in Appendix B.

The situation in Stage 3 is quite similar to that of Stage 1, with the current showing persistence under the nonlinear schedules. Tunneling, however, is not a prominent feature in Stage 3 as the wavefunction is already well-localized within the global minimum [c.f. Fig. 4(c)]. The optimization is therefore taking place within an effective quadratic potential, which is a relatively simple problem for quantum annealing.

VI Simulated annealing using a linear schedule

We next study simulated annealing, first under a linear annealing schedule for comparison with the quantum counterpart.

VI.1 Annealing protocol

We perform simulated annealing numerically using the Metropolis algorithm. Let the position of the particle be $x_{t}$ at time $t$ . A new position $x^{\prime}$ is proposed as

x^{\prime}=x_{t}+\Delta x

(13)

where $\Delta x$ is a random number drawn from the interval $[-\frac{s}{2},\frac{s}{2}]$ of width $s$ . The quantity $s$ is called the Monte Carlo step size and it determines how far the particle is allowed to jump with each move. The proposed position $x^{\prime}$ is then accepted with probability

P_{\rm accept}=\min\left\{1,e^{-\beta(t)\left[V_{\rm SK}(x^{\prime})-V_{\rm SK}(x_{t})\right]}\right\},

(14)

where the temperature is provided by the annealing schedule $\beta(t)$ , which will be specified later. After $x_{t}$ is updated, the time $t$ advances by a time step $dt$ and the Metropolis algorithm is repeated ⁵⁵5The simulated annealing calculations in Secs. VII and VI are performed with $dt=0.1$ , to be consistent with the time step used in quantum annealing. .

To compute averages, we work with an ensemble of $N$ particles where each of them evolves independently in time according to the Metropolis algorithm. The average $\langle V_{\rm SK}(x)\rangle_{t}$ at time $t$ is calculated via the ensemble average as

\langle V_{\rm SK}(x)\rangle_{t}=\lim_{N\rightarrow\infty}\frac{1}{N}\sum_{\nu=1}^{N}V_{\rm SK}(x_{t}^{\nu}),

(15)

where $x^{\nu}_{t}$ is the position of the $\nu$ th particle at time $t$ . In practice, one works with a large but finite $N$ , typically $N=10^{7}$ and $10^{9}$ . The ensemble average then becomes an approximation, subjected to errors due to finite-size fluctuations.

VI.2 Linear schedule and stages of simulated annealing

Let us consider the following linear form for $\beta(t)$

\beta^{(1)}(t)=\left(\beta_{f}-\beta_{i}\right)\left(\frac{t}{T}\right)+\beta_{i}.

(16)

As in quantum annealing, we restrict ourselves to the time interval $0\leq t\leq T$ , where $T$ again denotes the total annealing time. The parameters $\beta_{i}$ and $\beta_{f}$ are the inverse temperatures at the initial ( $t=0$ ) and final ( $t=T$ ) times, respectively. The schedule $\beta^{(1)}(t)$ is the simulated annealing counterpart to the mass schedule $m^{(1)}(t)$ that we have used in quantum annealing.

We would like to perform the simulated annealing calculations in such a way that they can be compared to the results of quantum annealing obtained in Sec. IV. To enable a fair comparison, we use energy as the criterion. The boundary parameters $\beta_{i}$ and $\beta_{f}$ in Eq. (16) are chosen in such a way that the initial and final energies of a stage in simulated annealing are approximately the same as the initial and final energies of the corresponding stage in quantum annealing. These boundary parameters $\beta_{l}$ and their energies $U(\beta_{l})$ are summarized in Table 1, alongside their corresponding quantum counterparts $m_{l}$ and $E_{0}(m_{l})$ . To facilitate our subsequent discussions, let us call annealing from $\beta_{a}$ to $\beta_{b}$ Stage A, that from $\beta_{b}$ to $\beta_{c}$ Stage B, and that from $\beta_{c}$ to $\beta_{d}$ Stage C. The equilibrium state properties at the stage boundaries have been discussed in Sec. III [c.f. Figs. 3(b), 4(b), and 4(d)]. A graphical perspective of the three stages is also given in Fig. 5(b), where they are superposed upon the curve of the internal energy $U(\beta)$ . It is seen that the stages are positioned in the non-trivial regime of the system exhibiting ‘excess energy’.

VI.3 Residual energy

The annealing performance is assessed by considering the residual energy

R_{\rm s.a.}(T)=\langle V_{\rm SK}(x)\rangle_{t=T}-\langle V_{\rm SK}(x)\rangle_{\beta_{f}}~{}.

(17)

The first term is the average potential energy obtained when annealing ends at $t=T$ . The second term is the average potential energy at equilibrium at the final temperature [c.f. Eq. (5)], and is the target energy that annealing seeks to attain. The quantity $R_{\rm s.a.}(T)$ is the simulated annealing counterpart to the residual energy $R_{\rm q.a.}(T)$ considered in quantum annealing.

Figure 7 shows the behavior of $R_{\rm s.a.}(T)$ as a function of $T$ in Stage A under the linear schedule. We experimented with a wide range of step sizes and some examples of the residual energy curves we obtained are shown in the figure. There exists an optimum step size that gives the lowest-lying $R_{\rm s.a.}(T)$ curve, and in Stage A this step size was found to be $s=1$ (red circles). The solid line (black) is obtained by fitting a straight line to the data points of $s=1$ . We see that the residual energy decreases as $T^{-1}$ . This scaling of $R_{\rm s.a.}(T)$ with $T$ is quite general. We identified the optimum step sizes in Stages B and C as well, and the lowest-lying $R_{\rm s.a.}(T)$ curve in all three stages are summarized in Fig. 15, shown by the three slanting curves. The two solid lines (black) are obtained by fitting a straight line to the data points of Stages B and C. It is seen that the residual energy also decreases as $T^{-1}$ in these two stages.

The scaling of $T^{-1}$ is fairly robust in the sense that it is attainable over quite a wide range of step sizes. However, in Stages B and C, we also found that if the step size falls below the inter-minima distance of $V_{\rm SK}(x)$ (i.e. $\frac{s}{2}<w_{0}$ ), then the $T^{-1}$ scaling will no longer be achieved. In Fig. 15, the two uppermost curves (magenta squares) show the residual energies obtained in these two stages with the annealing performed using $s=0.1$ . It is seen that they decrease much slower than the previous curves. The reason for the slow decay is trapping by local minima, and can be understood by looking at the initial Boltzmann distributions of these two stages, shown in Figs. 4(b) and 4(d). The probability channels connecting the global minimum to the two adjacent local ones are suppressed, so if the step size is smaller than the inter-minima distance $w_{0}$ , particles at the local minima will need a long time to hill-climb the potential barrier in order to reach the global minimum, possibly leading to a logarithmic decrease of the residual energy as suggested by Shinomoto and Kabashima Shinomoto and Kabashima (1991). See also discussions in the next section. Viewed differently, larger step sizes satisfying $\frac{s}{2}>w_{0}$ may be regarded to realize quasi-global searches across energy barriers without hill-climbing processes, leading to a faster polynomial decrease of the residual energy. It is noteworthy that, even with such a quasi-global search, the decrease rate $T^{-1}$ is slower than the quantum counterpart $T^{-2}$ .

One might inquire if it is possible to obtain a better scaling than $T^{-1}$ by making some adjustments to the annealing algorithm. We tried a simple quadratic schedule where the linear term $t/T$ in Eq. (16) is squared. We also experimented with a variant algorithm in which the step size decreases adaptively with temperature during the course of annealing. Overall, we did not manage to achieve a faster decrease than $T^{-1}$ with these attempts.

VII Simulated annealing using a logarithmic schedule

VII.1 Logarithmic schedule

In Ref. Shinomoto and Kabashima (1991), Shinomoto and Kabashima studied simulated annealing of the original form of $V_{\rm SK}(x)$ , where the barrier height and distance between neighboring minima are kept constant for all $x$ , under a logarithmic annealing schedule with time running from 0 to $\infty$ in contrast to the finite-time protocols in the previous sections. To test their prediction, let us consider the following schedule for the inverse temperature

\beta^{(l)}(t)=\beta_{i}\log_{10}\left(t+10\right),

(18)

where $\beta_{i}$ is the initial inverse temperature at time $t=0$ . Time $t$ is supposed to run from 0 to $\infty$ . The superscript $(l)$ denotes ‘logarithmic’ and is to differentiate Eq. (18) from the other schedules. Shinomoto and Kabashima showed, by assuming that particle are located only at local minima and taking the continuum limit ignoring discreteness of positions, that the average of the potential energy during simulated annealing under Eq. (18) should decrease with time asymptotically as

\langle V_{\rm SK}(x)\rangle_{t}\sim(\log_{10}t)^{-1},

(19)

where $\langle V_{\rm SK}(x)\rangle_{t}$ is given by the right side of Eq. (5), with a small difference that $\beta$ is now further dependent on time via the annealing schedule. We indicate this time dependence using the subscript $t$ in $\langle\cdot\rangle_{t}$ . In the following, we shall study the dependence of $\langle V_{\rm SK}(x)\rangle_{t}$ on time from the perspective of Monte Carlo simulations to see if their prediction is observed numerically.

VII.2 Time scaling of the potential energy from Monte Carlo simulations

Figure 16 shows the decay of $\langle V_{\rm SK}(x)\rangle_{t}$ with time under Eq. (18), with $\beta_{i}=10^{0.29}$ . The initial ensemble at the start of annealing is the one drawn from the Boltzmann distribution of $V_{\rm SK}(x)$ with temperature $\beta_{i}$ . Each curve in the figure represents the result obtained using a particular step size $s$ . It is seen that when $t$ is large, the curves collapse onto a single curve. We fitted a straight line to the curve of $s=4$ using the data points from $\log_{10}\log_{10}t>0.5$ , and the result is shown by the solid line (black). We see that $\langle V_{\rm SK}(x)\rangle_{t}$ decays asymptotically as $(\log_{10}t)^{-0.74}$ . The decay exponent $-0.74$ is different from the $-1$ proposed by Shinomoto and Kabashima.

To account for our results, we propose generalizing the scaling Eq. (19) slightly as

\langle V_{\rm SK}(x)\rangle_{t}\sim(\log_{10}t)^{\alpha_{\rm s.a.}}

(20)

where the decay exponent $\alpha_{\rm s.a.}$ is obtained from numerical simulations. We repeated the simulated annealing calculations with different values of $\beta_{i}$ and the results are summarized in Table 2. These results capture the non-trivial aspects of the potential $V_{\rm SK}(x)$ . The column $\beta_{\rm final}$ indicates the final inverse temperature attained by the schedule for that annealing simulation. By referring $\beta_{\rm final}$ to Fig. 5(b), we see that our annealing was performed in the regime where the system exhibits ‘excess energy’.

$\beta_{i}$	$\beta_{\rm final}$	$\alpha_{\rm s.a.}$	$\alpha_{\rm eq.}(\beta_{\rm final})$
$10^{0.29}$	$10^{0.94}$	$-0.74$	$-0.74$
$10^{1.00}$	$10^{1.66}$	$-1.07$	$-1.07$
$10^{1.75}$	$10^{2.40}$	$-1.56$	$-1.59$

Table 2: Decay exponent

\alpha_{\rm s.a.}

obtained from Monte Carlo simulations. Annealing was performed under the schedule Eq. (18), starting from

\beta_{i}

and ending at

\beta_{\rm final}

. The gradient

\alpha_{\rm eq.}(\beta_{\rm final})

[c.f. Eq. (21)] approximates

\alpha_{\rm s.a.}

very well.

The exponents $\alpha_{\rm s.a.}$ may be understood on the basis of the equilibrium properties of the system. Let us define

\alpha_{\rm eq.}(\beta^{\prime})=\left.\frac{d[\log_{10}\langle V_{\rm SK}(x)\rangle_{\beta}]}{d[\log_{10}\beta]}\right|_{\beta^{\prime}}

(21)

where $\langle V_{\rm SK}(x)\rangle_{\beta}$ is given by Eq. (5) and is the average potential energy at equilibrium. The quantity $\alpha_{\rm eq.}(\beta^{\prime})$ is the gradient of the curve $\log_{10}\langle V_{\rm SK}(x)\rangle_{\beta}$ at $\log_{10}\beta^{\prime}$ . In Table 2, the column $\alpha_{\rm eq.}(\beta_{\rm final})$ shows the gradient evaluated at the final attained temperature $\beta_{\rm final}$ , and we see that the values agree quite well with the decay exponent $\alpha_{\rm s.a.}$ . This agreement between the two quantities can be explained by noting that when $t$ is large the schedule Eq. (18) varies very slowly. The system is therefore very close to equilibrium during the late stages of annealing, and it traces out a narrow segment of the $\langle V_{\rm SK}(x)\rangle_{\beta}$ curve around $\beta_{\rm final}$ . The relation Eq. (20) can therefore be approximated by $\langle V_{\rm SK}(x)\rangle_{\beta_{\rm final}}\sim(\beta_{\rm final})^{\alpha_{\rm s.a.}}$ , from which the interpretation of $\alpha_{\rm s.a.}$ as the gradient $\alpha_{\rm eq.}(\beta_{\rm final})$ follows immediately via Eq. (21).

We close this section with some comments on the decay exponent $-1$ obtained by Shinomoto and Kabashima and the $\alpha_{\rm s.a.}$ we obtained. One possible reason for the discrepancy is that the potential $V_{\rm SK}(x)$ we used, although inspired by Shinomoto and Kabashima’s work, is not exactly the same as the one they treated. Another cause may be the coarse-graining assumption $(w_{0})^{2}\beta_{\rm final}\ll 1$ which the authors made during their derivation. Substituting $w_{0}=0.2$ and the values of $\beta_{\rm final}$ in Table 2, we see that the inequality is not very well-satisfied for the numerical simulations which we performed here. A third, and possibly more important, reason may be as follows. If the system is close to equilibrium and the particle stays almost exclusively in the valley around the global minimum at $x=0$ for $t$ large and thus $\beta$ large, then the equipartition law suggests that the average of the potential energy is proportional to $\beta^{-1}$ , which would imply $\langle V_{\rm SK}(x)\rangle_{\beta}\sim(\log_{10}t)^{-1}$ under the annealing schedule of Eq. (18). Also the kinetic energy term leads to the same behavior due to equipartition. Deviations from this inverse-log law in numerical simulations would imply a few possibilities including a spreading of probability distribution away from the central valley to nearby local minima as well as deviations from equilibrium. It is anyway the case that the contribution to the equilibrium internal energy from kinetic energy would yield the inverse-log law under the quasi-equilibrium context, which is more dominant than the potential term with larger powers as listed in Table 2.

VIII Summary and discussions

We have studied quantum and simulated annealing of a one-dimensional particle along the $x$ -axis. The subject of optimization is played by the potential energy $V_{\rm SK}(x)$ , which is modelled closely after a similar potential originally proposed by Shinomoto and Kabashima Shinomoto and Kabashima (1991). On the one hand, the rugged energy landscape of $V_{\rm SK}(x)$ offers a non-trivial challenge for the annealing algorithms to solve; on the other, the potential is designed such that it possesses a unique global minimum at the origin. This combination of both difficulty and simplicity makes $V_{\rm SK}(x)$ a suitable candidate for understanding the fundamental aspects of quantum annealing for continuous variables, as well as for identifying the underlying differences between it and simulated annealing.

We first briefly reviewed the general features of $V_{\rm SK}(x)$ and its phase diagram. Focusing on a particular phase point, we then examined the properties of both the quantum ground state and the classical equilibrium state in detail. It was pointed out that the quantum ground state wavefunction appears more suitable for optimization than the classical Boltzmann distribution because it exhibits less signs of trapping by local minima at the same value of energy. We moved on to look at the dynamical aspects involving annealing. Quantum annealing was performed under linear and nonlinear schedules. The performance of annealing was assessed using the residual energy, which was found to decrease with total annealing time $T$ as $T^{-2n}$ where $n$ is the order of the schedule, which is in contrast to the result $T^{-1}$ by Stella et al. Stella et al. (2005) derived by a phenomenological approach to the quantum process involving several steps of approximations. Also of particular interest is the use of probability current to monitor the time evolution of the wavefunction during the annealing process. The current revealed the different mechanisms at work behind linear and nonlinear schedules (adiabaticity versus persistent tunneling). We then went on to consider simulated annealing. Here, our main finding is that under the linear annealing schedule the residual energy decreases as $T^{-1}$ if we choose the step size $s$ appropriately. An important lesson we learnt is that the Monte Carlo step size utilized for annealing must be large enough to traverse the energy barrier between local minima to achieve the best performance; otherwise, the particle will be trapped and the residual energy will decrease much slower than $T^{-1}$ , possibly logarithmically as predicted by Shinomoto and Kabashima.

The central result of this work is the comparison between the residual energies of quantum and simulated annealing. We have seen that under the linear schedule, the residual energy of quantum annealing decreases as $T^{-2}$ whereas that of simulated annealing decreases as $T^{-1}$ at best, i.e. even with quasi-global search processes across energy barriers, and probably as $(\log_{10}T)^{-1}$ under the usual local search. This provides another example in which quantum mechanics helps the system reach the global minimum more efficiently than classical stochastic process does. Furthermore, the theoretical framework of quantum annealing allows room for additional improvement, as we have seen from the results of nonlinear schedules. On the other hand, in the case of simulated annealing it is not immediately apparent how one can improve upon the scaling of $T^{-1}$ for quasi-global search and $(\log_{10}T)^{-1}$ for local search. Overall, our results seem to favor quantum over simulated annealing when it comes to the optimization of continuous variables, at least for potentials like $V_{\rm SK}(x)$ .

A second point concerns the mechanisms underlying quantum annealing. We have seen that there are two different ways in which quantum annealing can arrive at the global minimum. Under the linear schedule, this goal is reached by suppressing the probability current so that the annealing mimics an adiabatic process. Under nonlinear schedules, on the other hand, the same objective is achieved by inducing a persistent tunneling current. These are two incompatible mechanisms, but interestingly both can be utilized by quantum annealing, depending on the way it is performed. In the literature, quantum adiabatic computation and quantum annealing are sometimes loosely used as synonyms. Our work shows that it is better not to conflate these two terms as the underlying mechanism can be very different.

Lastly, let us comment on possible ways to extend this work. One interesting feature of quantum annealing is the flexibility to choose (or even design) the annealing driver so as to reap the best performance. In this work, the role of the driver is played by the kinetic energy operator. One might consider introducing a secondary driver in the form of a vector potential operator $A(x)$ , generalizing the momentum operator $p$ to $p-q(t)A(x)$ . The time-dependent charge $q(t)$ plays the role of a second schedule, together with the mass $m(t)$ . One recovers the original model when $q(t)=0$ . It would be interesting to see if one could improve the annealing performance with some appropriate choice of $A(x)$ and $q(t)$ . This could be useful, for instance, in Stage 2 where we have seen that resorting to nonlinear schedules did not deliver the expected improvement in the scaling of the residual energy.

Acknowledgements.

This work is based on a project commissioned by the New Energy and Industrial Technology Development Organization.

Appendix A Overview of $E_{0}(m)$ and $\Delta(m)$ in the $h_{0}$ - $w_{0}$ phase space

In Sec. III, we presented a detailed discussion of the quantum ground state energy $E_{0}$ and the energy gap $\Delta$ of the system as a function of mass $m$ . The discussion there focused on one point of $h_{0}$ - $w_{0}$ phase space, $(w_{0},h_{0})=(0.2,0.2)$ . In this appendix, we broaden the discussion to the curves of $E_{0}(m)$ and $\Delta(m)$ at other phase points, with the aim of providing a more comprehensive picture of the time-independent aspects of the system.

We first discuss the energy $E_{0}$ . As seen in Fig. 5(a), the curve of $E_{0}(m)$ has an ‘S’ shape, resembling the function $\tanh x$ . This feature is generally true at other phase points as well. Panel (a) of Fig. 17 shows several $E_{0}(m)$ curves where $h_{0}$ is fixed and $w_{0}$ varies. [For $w_{0}=5$ (magenta), the ‘S’ portion of the curve lies in the range $m<10^{-2}$ .] Panel (b) shows the complementary situation where $w_{0}$ is fixed and $h_{0}$ varies. Qualitatively, we see that the energy curve of Fig. 5(a) at the phase point $(0.2,0.2)$ can be considered a reasonable representative of the generic behavior in phase space. Quantitatively, however, one also sees that the ‘S’ transition at different phase points varies over a wide range of masses. Hence, one should be slightly cautious when inferring the behavior at other phase points based on the results obtained at $(0.2,0.2)$ .

The energy gap $\Delta$ , on the other hand, exhibits richer behavior compared to $E_{0}$ . In addition to the ‘flat gap’ described in Fig. 6, we also observed two other notable features on the $\Delta(m)$ curves of other phase points.

The first concerns a minimum turning point on the gap curve. In studies on quantum annealing, much attention is being paid to the minimum energy gap as it constitutes the bottleneck of the entire annealing process. For the potential $V_{\rm SK}(x)$ , however, a minimum gap in the conventional sense does not exist. This is most easily seen by considering the gap of the harmonic oscillator, $\sqrt{k/m}$ , which approaches zero as the annealing approaches completion at infinite mass. However, there are situations where the gap goes through a minimum turning point. Panel (a) of Fig. 18 shows an example of such a gap curve obtained at the phase point $(w_{0},h_{0})=(3.7,11.1)$ . Based on our observations of extensive numerical data, we found that (i) the turning point generally occurs at small $m$ , and (ii) just the northeastern region of phase space (c.f. Fig. 2) exhibits this feature on the gap curve. [The phase point $(0.2,0.2)$ lies in the southwestern region and does not exhibit a turning point.] In this paper, we did not investigate whether this minimum turning point has any influence on quantum annealing.

The second feature concerns the occurrence of small gaps. Once again, this is conventionally characterized by the minimum gap. In the case of $V_{\rm SK}(x)$ , however, we found that the gap can show a significant decrease in magnitude but does not go through a turning point. Panel (b) of Fig. 18 shows an example of such a gap curve obtained at the phase point $(w_{0},h_{0})=(0.5,11.1)$ . The sudden and anomalous decrease in the gap size of $V_{\rm SK}(x)$ can be evinced when compared to the ‘benign’ gap behavior of the oscillator. Note that the scale along the vertical axis is logarithmic, which means that the reduction in magnitude is quite significant. The effects of this peculiar gap feature on annealing is also not covered by our numerical studies.

The behavior of the $\Delta(m)$ curve at different points in phase space can be quite complex, exhibiting interplay among the various features mentioned above. Figure 19 shows some examples of how the curve evolves when $h_{0}$ is kept constant and $w_{0}$ varies. Panel (a) shows the case of $h_{0}=0.5$ , corresponding to a small barrier in the potential $V_{\rm SK}(x)$ . The inter-minima distance $w_{0}$ is decreased from 5 to 0.5, which in phase space translates to an increasing $N_{\rm min}$ . To aid visualization, only the curves showing notable changes are plotted with color. We see that the energy gap evolves from a simple, monotonically decreasing curve (gray) to one exhibiting a distinctive ‘flat gap’ (blue). Panel (b) shows the case of large barrier $h_{0}=15$ . The evolution of the gap curve here is more complicated, first showing a minimum turning point (gray), then a ‘flat gap’ (red), and finally a ‘steep decrease’ (blue). Based on extensive numerical observations, we found that in general when $N_{\rm min}$ increases beyond a certain threshold, the two features, ‘flat gap’ followed by ‘steep decrease’, will emerge on the $\Delta(m)$ curve.

Figure 20 shows the complementary situation, with $w_{0}$ held constant and $h_{0}$ varying. It is seen that as $h_{0}$ (and $N_{\rm min}$ ) increases, the gap curve evolves from being monotonically decreasing (red) to having a minimum turning point (green), and then finally attaining the features ‘flat gap’ and ‘steep decrease’ (blue). An additional point to note here is that for the gray curves, their ‘flat gap’ segment seems to have collapsed onto a single horizontal line. This phenomenon is also a motif that recurs quite frequently throughout phase space.

To summarize, in this appendix we presented an overview of the behaviors of $E_{0}(m)$ and $\Delta(m)$ at different points in phase space. In general, the $E_{0}(m)$ curve evolves quite predictably with respect to changes in the parameters $h_{0}$ and $w_{0}$ . By contrast, the shape of the $\Delta(m)$ curve is more diversed, exhibiting different features at different points in phase space. The discussion here aims to give the reader a sense of the underlying complexity of $V_{\rm SK}(x)$ , while also serving as a caveat that our annealing studies performed at just one phase point might not be exhaustive in terms of capturing the dynamical aspects of the system.

Appendix B Time evolution of probability current in Stages 2 and 3

In Secs. IV.4 and V.3, we discussed the temporal behavior of the probability current $j(x,t)$ in Stage 1, and it was mentioned that the results there are also valid in Stages 2 and 3. In this Appendix, we supplement those discussions by presenting some of the key results in these two stages.

B.1 Stage 2 under the linear schedule

The probability current in Stage 2 is sometimes not spatially-coherent, meaning that the current flows in different directions at different points in space. Figure 21 shows a snapshot of $j(x,t)$ in this stage under the linear schedule at time $t=10$ , with annealing time $T=320$ . It is seen that $j(x,t)$ undulates between positive and negative values spatially, so we cannot represent its overall spatial behavior with just the amplitude $j(x_{0},t)$ as in Fig. 9(b). To keep things simple, let us just consider $j(x,t)$ at three locations, namely $x=-0.025,$ $-0.1$ , and $-0.175$ , indicated by the vertical lines in Fig. 21. Physically speaking, $x=-0.1$ is located at the first local maximum of $V_{\rm SK}(x)$ and monitors the current in the classically forbidden region; the other two points monitor the currents in the vicinity of the global minimum and its adjacent local minimum.

The time evolutions are presented in Fig. 22, where panels (a) to (c) show the results for $T=320$ , 3250, and 10000, respectively. In each panel, the current $j(x,t)$ at the three $x$ positions are plotted as a function of time. Generally speaking, we see that as $T$ increases, the oscillations of $j(x,t)$ at all three $x$ positions become suppressed and the annealing approaches adiabaticity. This is similar to what we have seen in Stage 1 for the linear schedule [c.f. Sec. IV.4 and Fig. 10(b)].

Let us make a minor point concerning the current near the local minimum ( $x=-0.175$ , i.e. the red curves). When the total time is short ( $T=320$ ), during the early stages the current is negative, so it is directed towards the local minimum. This explains the spurious peaks observed in Fig. 8. As $T$ increases, this early current becomes predominantly positive, meaning that it now flows away from the local minimum and towards the global one by tunneling, thereby aiding annealing in attaining the correct optimized solution. Nevertheless, we also see that the magnitude and duration of this reversed current decrease with longer $T$ , so adiabaticity is ultimately still the main mechanism contributing to a successful optimization here.

B.2 Stage 2 under nonlinear schedules

In Sec. V.3, we have seen that under nonlinear schedules the probability current in Stage 1 exhibits persistent tunneling, which is in contrast to the oscillatory behavior seen under the linear schedule. The corresponding situation in Stage 2 is quite similar. First, let us note that under nonlinear schedules, the current $j(x,t)$ in Stage 2 has the spatially-coherent form shown in Fig. 9(b), so the amplitude $j(x_{0},t)$ can again be used as a parameter for monitoring the temporal behavior. Figure 23 shows the time evolutions of the amplitude in Stage 2 under the various nonlinear schedules, with $T=3250$ . Overall, the behaviors are similar to what we have seen in Stage 1 (c.f. Fig. 12).

B.3 Stage 3 under the linear schedule

In Stage 3, the wavefunction is well-localized within the central global minimum [c.f. Fig. 4(c)], and so annealing can be thought of as taking place within an effective quadratic potential. Figure 24 shows the time evolution of the amplitude $j(x_{0},t)$ in Stage 3 under the linear schedule, with $T=1000$ . We see that the current exhibits the oscillatory behavior seen in Stages 1 and 2. A minor observation would be that the crests (positive current) are larger in magnitude than the troughs (negative current), so the wavefunction becomes narrower after each oscillatory cycle.

For nonlinear schedules, the overall behavior is similar to Stages 1 and 2, so we shall not discuss them further.

References

Kadowaki and Nishimori (1998) T. Kadowaki and H. Nishimori, Phys. Rev. E 58, 5355 (1998).
Santoro et al. (2002) G. E. Santoro, R. Martoňák, E. Tosatti, and R. Car, Science 295, 2427 (2002).
Das and Chakrabarti (2008) A. Das and B. K. Chakrabarti, Rev. Mod. Phys. 80, 1061 (2008).
Santoro and Tosatti (2006) G. E. Santoro and E. Tosatti, J. Phys. A 39, R393 (2006).
Morita and Nishimori (2008) S. Morita and H. Nishimori, J. Math. Phys. 49, 125210 (2008).
Albash and Lidar (2018) T. Albash and D. A. Lidar, Rev. Mod. Phys. 90, 015002 (2018).
Hauke et al. (2020) P. Hauke, H. G. Katzgraber, W. Lechner, H. Nishimori, and W. D. Oliver, Rep. Prog. Phys. 83, 054401 (2020).
Farhi et al. (2001) E. Farhi, J. Goldstone, S. Gutmann, J. Lapan, A. Lundgren, and D. Preda, Science 292, 472 (2001).
Finnila et al. (1994) A. B. Finnila, M. A. Gomez, C. Sebenik, C. Stenson, and J. D. Doll, Chemical Physics Letters 219, 343 (1994).
Abel and Spannowsky (2021) S. Abel and M. Spannowsky, PRX Quantum 2, 010349 (2021).
Abel et al. (2021a) S. Abel, N. Chancellor, and M. Spannowsky, Phys. Rev. D 103, 16008 (2021a).
Johnson et al. (2011) M. W. Johnson, M. H. S. Amin, S. Gildert, T. Lanting, F. Hamze, N. Dickson, R. Harris, A. J. Berkley, J. Johansson, P. Bunyk, E. M. Chapple, C. Enderud, J. P. Hilton, K. Karimi, E. Ladizinsky, N. Ladizinsky, T. Oh, I. Perminov, C. Rich, M. C. Thom, E. Tolkacheva, C. J. S. Truncik, S. Uchaikin, J. Wang, B. Wilson, and G. Rose, Nature 473, 194 (2011).
Abel et al. (2021b) S. Abel, A. Blance, and M. Spannowsky, arXiv:2105.13945 (2021b).
Stella et al. (2005) L. Stella, G. E. Santoro, and E. Tosatti, Phys. Rev. B 72, 014303 (2005).
Shinomoto and Kabashima (1991) S. Shinomoto and Y. Kabashima, J. Phys. A 24, L141 (1991).
Morita (2007) S. Morita, J. Phys. Soc. Jpn. 76, 104001 (2007).
Note (1) More concretely, we first determine the limit $x_{\rm max}$ such that the ground state wavefunction at $m_{i}$ lies comfortably within the interval $[-x_{\rm max},x_{\rm max}]$ . We then compute the energy eigenvalue $E_{0}$ using a grid spanning the interval with 2048 grid points. The computed $E_{0}$ must remain unchanged when the number of grid points is decreased to 1024 and increased to 4096. With the converged grid at $m_{i}$ , we repeat the procedure at $m_{f}$ to ensure that the same grid must guarantee the convergence of $E_{0}$ at the final mass as well.
Schmidt and Lorenz (2017) B. Schmidt and U. Lorenz, Comp. Phys. Commun. 213, 223 (2017).
Schmidt and Hartmann (2018) B. Schmidt and C. Hartmann, Comp. Phys. Commun. 228, 229 (2018).
Lorenz and Schmidt (2020) U. Lorenz and B. Schmidt, “WavePacket (C++/Python): Time-dependent simulation of open and closed quantum systems. Version 0.3.1.” (2020), https://sourceforge.net/projects/wavepacket/.
Note (2) Convergence of trajectories with respect to $dt$ is checked by halving it (to 0.05) and then seeing that recalculated results remain invariant.
Note (3) Due to the periodic oscillatory behavior of the data points, instead of peforming a least square fit, we have chosen instead to manually adjust the vertical displacement of the line $T^{-2}$ to fit those data points at the crests of the oscillatory curve. We find that this reflects more accurately the observed gradients of the curves.
Suzuki and Okada (2005) S. Suzuki and M. Okada, J. Phys. Soc. Jpn. 74, 1649 (2005).
Note (4) For the plane wave $e^{ikx}$ , one has $j(x,t)=\frac{\hbar k}{m}=\frac{p}{m}=v$ , which is simply the velocity of the particle. Hence, the probability current can be interpreted as a generalization of the velocity field, indicating both the speed and direction of motion of the particle.
Note (5) The simulated annealing calculations in Secs. VII and VI are performed with $dt=0.1$ , to be consistent with the time step used in quantum annealing.