This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Quantum and classical annealing in a continuous space with multiple local minima

Yang Wei Koh Institute of Innovative Research, Tokyo Institute of Technology, Nagatsuta-cho, Midori-ku, Yokohama 226-8503, Japan    Hidetoshi Nishimori Institute of Innovative Research, Tokyo Institute of Technology, Nagatsuta-cho, Midori-ku, Yokohama 226-8503, Japan Graduate School of Information Sciences, Tohoku University, Sendai 980-8579, Japan RIKEN, Interdisciplinary Theoretical and Mathematical Sciences (iTHEMS), Wako, Saitama 351-0198, Japan
Abstract

The protocol of quantum annealing is applied to an optimization problem with a one-dimensional continuous degree of freedom, a variant of the problem proposed by Shinomoto and Kabashima. The energy landscape has a number of local minima, and the classical approach of simulated annealing is predicted to have a logarithmically slow convergence to the global minimum. We show by extensive numerical analyses that quantum annealing yields a power law convergence, thus an exponential improvement over simulated annealing. The power is larger, and thus the convergence is faster, than a prediction by an existing phenomenological theory for this problem. Performance of simulated annealing is shown to be enhanced by introducing quasi-global searches across energy barriers, leading to a power-law convergence but with a smaller power than in the quantum case and thus a slower convergence classically even with quasi-global search processes. We also reveal how diabatic quantum dynamics, quantum tunneling in particular, steers the systems toward the global minimum by a meticulous choice of annealing schedule. This latter result explicitly contrasts the role of tunneling in quantum annealing against the classical counterpart of stochastic optimization by simulated annealing.

I Introduction

Quantum annealing is a quantum-mechanical metaheuristic for optimization problems and has been applied predominantly to problems with discrete degrees of freedom, typically in terms of the transverse-field Ising model Kadowaki and Nishimori (1998); Santoro et al. (2002); Das and Chakrabarti (2008); Santoro and Tosatti (2006); Morita and Nishimori (2008); Albash and Lidar (2018); Hauke et al. (2020); Farhi et al. (2001). An early paper by Finnila et al. Finnila et al. (1994) discussed a system with continuous degrees of freedom but with an imaginary-time Schrödinger dynamics, thus short of intrinsic quantum effects. Recent work by Abel and Spannowsky Abel and Spannowsky (2021) applied the idea of domain-wall encoding Abel et al. (2021a) to approximately but accurately represent continuous degrees of freedom by Ising spins and illustrated quantum tunneling for a double-well potential by experiments on the D-Wave quantum annealing device Johnson et al. (2011). Abel et al. Abel et al. (2021b) extended their idea to more complex energy landscapes with several sharp local minima or a single deep global minimum and showed that quantum annealing as implemented on the D-Wave device achieves better performance than classical algorithms including simulated annealing. Most pertinent to the present paper is the contribution by Stella et al. Stella et al. (2005), in which they studied quantum annealing and classical simulated annealing for simple as well as non-trivial optimization problems in a continuous space. Their overall conclusion is that the quantum approach leads to a faster convergence to the global minimum than the classical one does. This result has been derived convincingly by numerical and analytical methods for the trivial case of harmonic oscillator as well as for the less trivial double-well potential. However, for the potential with very many local minima proposed and analyzed classically by Shinomoto and Kabashima Shinomoto and Kabashima (1991), they adopted a phenomenological coarse-grained model, in which only discrete positions of local minima were considered and then a continuum limit to ignore the distance between local minima was taken to facilitate the analysis. This procedure necessarily involves approximations, whose legitimacy is not necessarily evident in the long-time low-temperature limit in the classical case. This is true in the quantum case as well, and applicability of the approximation needs careful scrutiny. Another important aspect is that the approximation makes it difficult to intuitively grasp the idea of how quantum dynamics promotes the flow of quantum-mechanical probability between adjacent local minima toward the goal of reaching the global minimum.

We are in this way motivated to study the problem of a Shinomoto-Kabashima-like potential with many local minima by direct numerical computations in the continuous space under classical as well as quantum annealing protocols. Our results indicate that a flow of quantum-mechanical probability toward the global minimum enhances the performance of quantum annealing, in particular under meticulously designed non-linear annealing schedules. Also observed is an accelerated convergence of simulated annealing under quasi-global search processes across energy barriers in comparison with a conventional local search but still at a slower rate than in the quantum case.

The rest of the paper is organized as follows. In Sec. II, we define the problem and give a discussion on the energy landscape of our potential. Section III examines the time-independent aspects of the system, particularly the quantum ground state and the classical equilibrium state. Sections IV and V are devoted to quantum annealing, with the former focusing on the linear annealing schedule and the latter on nonlinear ones for accelerated convergence. Sections VII and VI are on simulated annealing. The former section focuses on the linear schedule, while the latter section deals with the logarithmic schedule considered by Shinomoto and Kabashima. Lastly in Sec. VIII, we summarize the paper and then conclude with some discussions. Additional information is provided in Appendixes.

II The problem

In this section, we first define the problem and show its salient features.

II.1 Definition of the problem

Let us consider the following potential inspired by Shinomoto and Kabashima Shinomoto and Kabashima (1991),

VSK(x)=12kx2+h02[1cos(2πxw0)].V_{\rm SK}(x)=\frac{1}{2}kx^{2}+\frac{h_{0}}{2}\left[1-\cos\left(\frac{2\pi x}{w_{0}}\right)\right]. (1)

The first is a harmonic term forming the envelope of the overall potential, and it sets a unique global minimum at x=0x=0. Without loss of generality, we shall let the spring constant kk be unity from now on. The cosine term introduces local minima onto the energy landscape, and the structure of these minima can be tuned using two parameters h0h_{0} and w0w_{0}. As an example, Fig. 1 shows the graph of VSK(x)V_{\rm SK}(x) when h0=w0=0.2h_{0}=w_{0}=0.2. The parameter h0h_{0} corresponds approximately to the height of the energy barrier between two adjacent minima, and w0w_{0} to the horizontal distance between them.

Refer to caption
Figure 1: Graph of the potential VSK(x)V_{\rm SK}(x) in the vicinity of the global minimum at x=0x=0, with parameters h0=w0=0.2h_{0}=w_{0}=0.2. These parameters correspond, around the global minimum, approximately to the height of the energy barrier and the distance between neighboring minima. Inset: Zoomed-out view of the potential with the same parameter values, showing the 15 local minima (denoted NminN_{\rm min}) on each side of the origin. Due to the harmonic term in Eq. (1), the local minima are smoothed out far away from the origin.

Our proposed potential VSK(x)V_{\rm SK}(x) tries to capture the essential physics underlying Shinomoto and Kabashima’s idea. Nevertheless, there are some differences because in their original formulation, it is assumed that the local minima are all uniform such that the barrier height and distance between neighboring minima remain constant ad infinitum along the xx-axis. On the other hand, in Eq. (1), due to the presence of the quadratic term, the local minima of VSK(x)V_{\rm SK}(x) are smoothed out far away from the origin and so the total number of local minima is in fact finite. This is illustrated by Fig. 1, where the inset shows the global view of the same graph. Notwithstanding this difference, Eq. (1) is expected to lead to essentially the same asymptotic behavior of the system because the probability of the system being far from the origin is very small in the long-time limit. We adopt Eq. (1) because of its simpler numerical implementation and yet with essentially the same asymptotic properties to be expected.

II.2 Number of local minima

Let us define NminN_{\rm min} as the number of local minima in the positive xx-direction. It is seen that Nmin=15N_{\rm min}=15 when the parameters are h0=w0=0.2h_{0}=w_{0}=0.2. Notice that although the local minima are relatively uniform in the vicinity of the origin, they become shallower further away, so our studies should be focused on the former region in order to stay faithful to Shinomoto and Kabashima’s proposal.

We calculated NminN_{\rm min} numerically as a function of the parameters h0h_{0} and w0w_{0}, and the results are shown in Fig. 2 in the form of a phase diagram. Individual regions of NminN_{\rm min} from 0 up to 7 are shown. In the region Nmin8N_{\rm min}\geq 8, the boundaries between successive regions become increasingly closely spaced, with NminN_{\rm min} approaching infinity as one approaches the vertical axis. The graph of Fig. 1 is indicated by the circle, located inside the region Nmin8N_{\rm min}\geq 8. Roughly speaking, one can interpret the phase diagram of NminN_{\rm min} as a map of the difficulty of optimizing the potential VSK(x)V_{\rm SK}(x). Regions where NminN_{\rm min} is large are, intuitively, more difficult to optimize than regions where it is smaller. Within a region of constant NminN_{\rm min}, it is generally more difficult to optimize if either (or both) of the parameters h0h_{0} and w0w_{0} increases.

Refer to caption
Figure 2: Phase diagram showing NminN_{\rm min}, the number of local minima of VSK(x)V_{\rm SK}(x), as a function of the parameters h0h_{0} and w0w_{0}. Individual phases from Nmin=0N_{\rm min}=0 up to 7 are shown. In the region Nmin8N_{\rm min}\geq 8, the phase boundaries become increasingly closely spaced, with NminN_{\rm min} approaching infinity as one approaches the vertical axis w0=0w_{0}=0. This paper focuses on the point h0=w0=0.2h_{0}=w_{0}=0.2 indicated by the circle, which lies within the region Nmin8N_{\rm min}\geq 8.

It is not feasible to perform a thorough computational study of the entire phase space. Instead, we shall focus on just one point, (w0,h0)=(0.2,0.2)(w_{0},h_{0})=(0.2,0.2), which has been introduced in Figs. 1 and 2. Our choice is due to the need to balance two competing factors. On the one hand, the energy landscape of the potential should be difficult enough to be non-trivial for our annealing algorithms to optimize. On the other hand, we are limited by the computational resources needed to solve the time-dependent Schrödinger equation. For instance, the top left corner of the phase space where NminN_{\rm min} and h0h_{0} are large would pose a good challenge to both quantum and simulated annealing; however, a very large number of grid points is required to accurately represent the wavefunction in this region of phase space, making it prohibitively expensive to compute energy eigenvalues and perform long annealing simulations. The energy landscape at the phase point (0.2,0.2)(0.2,0.2) is complex and non-trivial, but at the same time also computationally feasible to simulate, allowing us to focus on the physics of annealing rather than on the optimization of a potential with true technical difficulties. Hence, in the rest of the paper, unless otherwise stated, it is implicit that all numerical calculations are performed with the parameters h0=w0=0.2h_{0}=w_{0}=0.2.

III Comparison of quantum ground state and classical equilibrium state

Before discussing annealing, which involves dynamics, let us first examine the time-independent aspects of the model. The Hamiltonian, applicable to both the quantum and classical descriptions of the system, is given by

H=p22m+VSK(x),H=\frac{p^{2}}{2m}+V_{\rm SK}(x), (2)

where pp and mm are the momentum and mass of the particle, respectively. In the quantum description, pp is replaced by the momentum operator ix\frac{\hbar}{i}\frac{\partial}{\partial x} (=1\hbar=1) and xx by the position operator. The Hamiltonian operator is then parameterized by the mass mm which also serves as the annealing parameter during optimization.

In the classical case, the system is treated using statistical mechanics where it is described by the partition function. The kinetic part of Eq. (2) can be integrated out analytically, and we only need to work with the position space Boltzmann distribution given by

ϱ(x)=eβVSK(x)Z(β),\varrho(x)=\frac{e^{-\beta V_{\rm SK}(x)}}{Z(\beta)}, (3)

where β\beta is the inverse temperature (kB=1k_{\rm B}=1), and Z(β)Z(\beta) is the partition function. We shall mostly be concerned with the internal energy (i.e. the average energy) of the system

U(β)=12β+VSK(x)β,U(\beta)=\frac{1}{2\beta}+\langle V_{\rm SK}(x)\rangle_{\beta}, (4)

where the first term is the contribution from the kinetic energy (m=1m=1) and the second one is the average potential energy

VSK(x)β=VSK(x)ϱ(x)𝑑x.\langle V_{\rm SK}(x)\rangle_{\beta}=\int_{-\infty}^{\infty}V_{\rm SK}(x)\,\varrho(x)\,dx. (5)

The subscript β\beta in β\langle\cdot\rangle_{\beta} is to indicate the dependence on the inverse temperature via the Boltzmann distribution ϱ(x)\varrho(x). Thus, in contrast to the quantum scenario, we see that the classical energy is parameterized by β\beta during annealing.

As energy is commonly used as a gauge to monitor the progress of an annealing process, let us compare the quantum ground state and classical equilibrium state at the same energies. Figure 3 shows the situation when the energies are approximately 0.6. Panel (a) shows the quantum case where the ground state energy eigenvalue E0E_{0} is plotted in blue and its corresponding probability density |x|E0|2|\langle x|E_{0}\rangle|^{2} (i.e. the ground state wavefunction squared) is plotted in red. All quantities are plotted to scale and the vertical axis reflects their actual numerical values. Hence, by comparing the blue line to the potential VSK(x)V_{\rm SK}(x) (black curve), one can say that the energy is high because many local minima are ‘submerged’ below the blue line. The quantum ground state is delocalized in the sense that it spans across several local minima. Panel (b) depicts the classical situation, where the internal energy UU is plotted in blue and the Boltzmann distribution ϱ(x)\varrho(x) [Eq. (3)] in green. Noteworthy is the fact that, apart from some wave-like oscillations, the classical equilibrium state is also delocalized, spanning local minima, in a way similar to its quantum counterpart. This means that when the energy is high, there is not much quantitative difference between the quantum ground state and classical equilibrium state for exploring the configuration space of the potential VSK(x)V_{\rm SK}(x).

Refer to caption
Figure 3: Comparing the quantum ground state and classical equilibrium state of VSK(x)V_{\rm SK}(x) with the same energies. The vertical axes reflect the actual values of all quantities plotted. (a) Quantum. The ground state probability density (red) with energy eigenvalue E00.6E_{0}\approx 0.6 (blue). (b) Classical. The Boltzmann distribution (green) with internal energy U0.6U\approx 0.6 (blue). The precise energies and their corresponding parameters (mm and β\beta) are summarized in Table 1 (labelled aa).

This analogous behavior of the two states, however, undergoes a dramatic change when the energy is lowered. Figure 4 compares them again at energies around 0.1 [panels (a) and (b)] and at around 0.01 [panels (c) and (d)]. At 0.1, one sees that the quantum ground state has acquired a compact form concentrated around the global minimum. On the other hand, the classical equilibrium state fragments into a multi-modal distribution, with significant probability of finding the particle even at local minima which are relatively far away from the origin. At the low energy of 0.01, the difference between quantum and classical is even more pronounced, with the Boltzmann distribution manifesting two ‘spurious’ modes. These spurious modes are metastable (i.e. long-lived) as the probability channel connecting them to the main mode at the origin is almost completely suppressed.

Refer to caption
Figure 4: Comparing the quantum ground state and classical equilibrium state at lower energies. The panels are organized similar to Fig. 3. Panels (a) and (b): Comparison at energy 0.1\approx 0.1. The precise energies and corresponding mm and β\beta are listed in Table 1, labelled bb. Panels (c) and (d): At energy 0.01\approx 0.01, and labelled cc in Table 1. Note that the states have been rescaled for presentation purposes.

Based on these comparisons, we see that from an energy viewpoint the quantum ground state appears to be more suitable for the task of optimization than the classical one. At low energies, the former is more focused around the global minimum, whereas the latter exhibits signs of trappings by local minima. As a heuristic explanation, one might say that classical mechanics penalizes the particle in regions where its total energy is smaller than the potential energy, and so the equilibrium state tends to distribute itself around local minima to avoid incurring this penalty. On the other hand, quantum mechanics allows the particle to access classically forbidden regions via tunneling, and this freedom enables its ground state to achieve a more compact form concentrated around the global minimum.

Refer to caption
Figure 5: The quantum ground state and classical equilibrium state energies of VSK(x)V_{\rm SK}(x) as a function of their respective parameters. (a) The red curve shows the quantum energy eigenvalue E0E_{0} as a function of mass mm. The lower blue line shows the zero-point energy of a harmonic oscillator with k=1k=1, and the upper line shows one with k=99.696k=99.696. (b) The classical internal energy UU as a function of inverse temperature β\beta (green), and the equipartition theorem for the harmonic oscillator (blue). The region between the two curves (shaded yellow) shows the ‘excess energy’ of the system. In both panels, the vertical lines indicate the boundary (i.e. initial and final) parameter values of the various annealing stages. These parameters are summarized in Table 1. Stages are defined in Secs. IV and VI.

Let us now look at how the quantum and classical energies vary with mass and temperature. Figure 5(a) shows the quantum ground state energy E0E_{0} as a function of mass mm (red). To discern the essential features of VSK(x)V_{\rm SK}(x), the zero-point energy, 12k/m\frac{1}{2}\sqrt{k/m}, of two harmonic oscillators labelled ‘outer’ (k=1k=1) and ‘inner’ (k=99.696k=99.696) are also shown (blue lines). The outer oscillator describes the envelope of VSK(x)V_{\rm SK}(x), while the inner oscillator is a quadratic approximation of the potential around the global minimum. In the limits of small and large masses, the ground state energy is well-described by the outer and inner oscillators. The range of mm where E0E_{0} makes a transition between the two oscillators marks the important region of the quantum system.

The classical case is shown in panel (b) where the internal energy UU is plotted as a function of the inverse temperature β\beta (green). The internal energy of the harmonic oscillator, independent of spring constant as embodied by the equipartition theorem, is also shown (blue). Analogous to the quantum case, in the limit of small and large β\beta, the classical ground state energy again approaches that of the oscillator. On the other hand, the distinctive feature here is that the non-trivial aspect of VSK(x)V_{\rm SK}(x) expresses itself in the form of the shaded region (yellow), where the system seems to have ‘excess energy’ relative to the oscillator. The range of β\beta where UU contains significant excess energy defines the crucial region for the classical system.

Another important aspect of the system is the quantum energy gap Δ\Delta, defined as the energy difference between the ground and first excited states E1E0E_{1}-E_{0}. In quantum annealing, fast closure of the energy gap is a sign of first-order phase transition and forebodes long annealing times Albash and Lidar (2018). Figure 6 shows the gap Δ\Delta of the potential VSK(x)V_{\rm SK}(x) as a function of mm (red). Once again, we show the gaps of the outer and inner oscillators (given by k/m\sqrt{k/m} and plotted in blue) to highlight those features which are particular to VSK(x)V_{\rm SK}(x). We see that the gap of VSK(x)V_{\rm SK}(x) does not exhibit any anomalously small behavior since its magnitude lies mostly (although not strictly) between those of the two oscillators. This suggests a polynomial computational complexity for quantum annealing, which will be confirmed numerically. Therefore this is not a hard problem to solve from the viewpoint of computational complexity, but our point is to compare the performance of quantum and classical protocols, not to discuss difficulties within the framework of quantum annealing per se.

Apart from the absence of closure, let us also point out the segment of the gap curve indicated as ‘flat gap’. The gradient Δm\frac{\partial\Delta}{\partial m} stays zero throughout the segment, with the first excited state being multiply degenerate. This feature is not peculiar to the phase point h0=w0=0.2h_{0}=w_{0}=0.2, and is in fact a persistent motif of the gap curve throughout the h0h_{0}-w0w_{0} phase space. We shall elaborate more on this in Appendix A. The implications of this ‘flat gap’ for quantum annealing is not immediately clear, but we shall see later that the performance seems to be slightly lower here compared to other regions where it is absent.

Refer to caption
Figure 6: The quantum energy gap Δ\Delta as a function of mass mm. The red curve shows that of VSK(x)V_{\rm SK}(x), and the blue lines show those of the oscillators previously discussed in Fig. 5(a). The three quantum annealing stages are also shown. Stages 1, 2, and 3 are defined in Sec. IV.

Lastly, let us remark that the ground state properties at the phase point (w0,h0)=(0.2,0.2)(w_{0},h_{0})=(0.2,0.2) presented in this section can be considered a good representative of the general behavior in phase space. Nevertheless, we also found certain aspects which are not captured by (0.2,0.2)(0.2,0.2). To supplement the discussion given here, we briefly touch upon the quantum energy E0E_{0} and the gap Δ\Delta of other phase points in Appendix A.

IV Quantum annealing using a linear schedule

We now analyze the dynamics, first for the quantum case under a simple linear annealing schedule.

IV.1 Linear schedule

Quantum annealing of the potential VSK(x)V_{\rm SK}(x) is performed by solving the time-dependent Schrödinger equation

[it+22m(t)2x2VSK(x)]ψ(x,t)=0,\left[i\hbar\frac{\partial}{\partial t}+\frac{\hbar^{2}}{2m(t)}\frac{\partial^{2}}{\partial x^{2}}-V_{\rm SK}(x)\right]\psi(x,t)=0, (6)

where ψ(x,t)\psi(x,t) is the wavefunction, and the time dependence of mass m(t)m(t) serves as the annealing schedule. Conceptually speaking, one starts annealing from zero mass where the kinetic energy is dominating, and slowly increase it to infinity such that the potential VSK(x)V_{\rm SK}(x) becomes the dominating term. If the change of m(t)m(t) with time is infinitesimally slow, then according to the adiabatic theorem Albash and Lidar (2018), a wavefunction initially at the ground state will evolve into the global optimized solution of VSK(x)V_{\rm SK}(x) upon the completion of annealing.

Let us restrict the time evolution under Eq. (6) to within the time interval 0tT0\leq t\leq T where TT is the total annealing time. Consider the following linear form for m(t)m(t)

m(1)(t)=(mfmi)(tT)+mi,m^{(1)}(t)=\left(m_{f}-m_{i}\right)\left(\frac{t}{T}\right)+m_{i}~{}, (7)

where mim_{i} and mfm_{f} are the masses at the initial (t=0)(t=0) and final (t=T)(t=T) times, respectively. When mim_{i} and mfm_{f} are fixed, TT determines the annealing speed, with a large TT giving slow annealing. The superscript (1) denotes ‘first-order’, and is to differentiate Eq. (7) from higher-order, nonlinear schedules which we shall introduced later. As suggested by Morita Morita (2007), only the initial and final time derivatives of annealing schedule determine the final excitation probability for sufficiently large annealing time. It is thus expected that an explicit form of the annealing function is unimportant, i.e. it is irrelevant whether we change the mass as in Eq. (7) or change the coefficient Γ(t)/2m\Gamma(t)\equiv\hbar/2m of the kinetic energy term as a whole as was done in Ref. Stella et al. (2005), as long as the initial and final time derivatives are shared among different functions. The linear schedule of Eq. (7) represents the case with finite derivatives at both ends of annealing.

IV.2 Stages of annealing

One operational constraint faced when quantum annealing is implemented numerically via the solution of Eq. (6) concerns the choice of the mass parameters mim_{i} and mfm_{f}. In principle, one should be able to use arbitrary small mim_{i} and arbitrary large mfm_{f} in the schedule Eq. (7). For instance, one would like to start from the high energy state shown in Fig. 3(a) and anneal towards the low energy state shown in Fig. 4(c). In practice, however, it is not feasible to solve Eq. (6) numerically with such a large energy difference because the number of grid points needed to represent the wavefunction is very large. In particular, one sees from Figs. 3(a) and 4(c) that the wavefunction evolves from a delocalized to a localized form, and so the grid must be wide enough to represent the initial wavefunction, while at the same time its points must also be dense enough to resolve the final wavefunction. Hence, in practice, the choice of mim_{i} and mfm_{f} is partly constrained by the computational resources, both in terms of memory and processing speed, one has at one’s disposal. In all our quantum annealing simulations, we restricted ourselves to a plane wave grid with 2048 grid points, and then choose mim_{i} and mfm_{f} that are computationally feasible with this grid size.

Despite the above issue, our ultimate goal is still to attain some understanding of annealing when the initial mass is small and the final mass is large. Consider the masses mam_{a} and mdm_{d} listed in Table 1, where the difference mdmam_{d}-m_{a} spans six orders of magnitude. The ground state at mam_{a} was discussed in Fig. 3(a). To get an intuitive sense of the ground state at mdm_{d}, note that mdm_{d} is larger than mcm_{c}, whose ground state was discussed in Fig. 4(c). Our objective is to study quantum annealing from the high energy, delocalized state at mam_{a} to the low energy, localized state at mdm_{d}.

To traverse this large mass difference, we perform annealing in stages. Referring to Table 1, let us define the annealing from mam_{a} to mbm_{b} as Stage 1, that from mbm_{b} to mcm_{c} as Stage 2, and that from mcm_{c} to mdm_{d} as Stage 3. The properties of the ground states at mam_{a}, mbm_{b}, and mcm_{c} have been discussed in Sec. III. Apart from keeping the computations manageable, the stages are also set up based on energy considerations. In Stage 1, the energy decreases from a high level to half the barrier height of VSK(x)V_{\rm SK}(x), as indicated by the blue lines in Figs. 3(a) and 4(a). In Stage 2, the energy continues its descent to the potential level of the two lowest local minima, as seen in Fig. 4(c). Finally, in Stage 3 the energy descends to close to the potential of the global minimum.

A graphical perspective of the annealing stages is also given in Figs. 5(a) and 6 where they are superposed upon the E0E_{0} and Δ\Delta curves. In these figures, the vertical lines indicate the initial and final masses of each of the stages.

Quantum Annealing Simulated Annealing
ll (label) log10ml\log_{10}m_{l} E0(ml)E_{0}(m_{l}) log10βl\log_{10}\beta_{l} U(βl)U(\beta_{l})
aa 0 0.5999898 0.29 0.6031582
bb 3 0.1050870 1.18 0.1061226
cc 5 0.0154758 1.97 0.0156931
dd 6 0.0049617 2.35 0.0049526
Table 1: The boundary parameters of the various annealing stages, as well as their corresponding energies. Each mm and β\beta is labelled, and their roles as the initial and/or final parameter of the stages are shown in Fig. 5.

In numerical calculations, the grid in each stage is determined by making sure that it is wide enough to accommodate the ground state wavefunction at mim_{i} and its grid points dense enough to resolve the ground state wavefunction at mfm_{f}. This is done quantitatively by making sure that the eigenenergy E0E_{0} is converged with respect to the number of grid points at both mim_{i} and mfm_{f} 111More concretely, we first determine the limit xmaxx_{\rm max} such that the ground state wavefunction at mim_{i} lies comfortably within the interval [xmax,xmax][-x_{\rm max},x_{\rm max}]. We then compute the energy eigenvalue E0E_{0} using a grid spanning the interval with 2048 grid points. The computed E0E_{0} must remain unchanged when the number of grid points is decreased to 1024 and increased to 4096. With the converged grid at mim_{i}, we repeat the procedure at mfm_{f} to ensure that the same grid must guarantee the convergence of E0E_{0} at the final mass as well. . A grid that passed the convergence test at both ends of a stage is used for quantum annealing of that stage.

IV.3 Residual energy

Quantum annealing of a stage is performed by inserting the particular mim_{i} and mfm_{f} into the schedule Eq. (7), which is then substituted into Eq. (6). The resulting Schrödinger equation is solved using the WavePacket program package Schmidt and Lorenz (2017); Schmidt and Hartmann (2018); Lorenz and Schmidt (2020). Atomic units are used, and m(t)m(t) is expressed in units of the electron rest mass. Time propagation is performed using the Runge-Kutta algorithm with a time step of dt=0.1dt=0.1 222Convergence of trajectories with respect to dtdt is checked by halving it (to 0.05) and then seeing that recalculated results remain invariant.. The ground state wavefunction at mim_{i} is used as the initial wavefunction at the start of annealing.

To assess the performance of annealing, we consider the residual energy, defined as

Rq.a.(T)=ψ(T)|H(T)|ψ(T)E0(mf).R_{\rm q.a.}(T)=\langle\psi(T)|H(T)|\psi(T)\rangle-E_{0}(m_{f}). (8)

The first term is the expectation value of the Hamiltonian [given in Eq. (6)] taken with respect to the wavefunction obtained when annealing finishes at t=Tt=T. The second term is the ground state energy at the final mass, and is the target energy which annealing seeks to attain. In the limit TT\rightarrow\infty, the time evolution becomes adiabatic and the residual energy approaches zero. The rate at which Rq.a.(T)R_{\rm q.a.}(T) decays with the total annealing time TT is a measure of how quickly the target energy is being attained by the annealing process.

Figure 7 shows Rq.a.(T)R_{\rm q.a.}(T) as a function of TT obtained under the linear schedule. The three curves show the results for the three stages discussed earlier. The straight lines (black) are proportional to T2T^{-2} and indicate the asymptotic behavior of the curves when TT is large 333Due to the periodic oscillatory behavior of the data points, instead of peforming a least square fit, we have chosen instead to manually adjust the vertical displacement of the line T2T^{-2} to fit those data points at the crests of the oscillatory curve. We find that this reflects more accurately the observed gradients of the curves.. We also performed some additional simulations at other h0h_{0}-w0w_{0} phase points and found that the decay rate T2T^{-2} holds quite generally in phase space. This decay rate for the linear schedule is in agreement with previous studies on spin systems Suzuki and Okada (2005); Morita (2007).

Refer to caption
Figure 7: Residual energy Rq.a.(T)R_{\rm q.a.}(T) as a function of total annealing time TT, for the three stages of quantum annealing performed using the linear schedule Eq. (7). The three straight lines (black) are proportional to T2T^{-2} and show the asymptotic behaviors of the curves. Lines connecting the data points are to guide the eye only.

The curve of Stage 2 (blue squares) is slightly different from the other two stages, first undergoing a transient phase before crossing over into the asymptotic T2T^{-2} regime. Similar two-phased behavior in the residual energy has also been reported in a two-level system, with the earlier phase being attributed to diabatic transitions and explained by the Landau-Zener theorem Morita (2007). The transient phase here is probably also due to transitions into excited states. Figure 8 shows an example of the final wavefunction that is obtained in Stage 2, with the corresponding residual energy indicated in Fig. 7 (arrowed T=3250T=3250). For comparison, the target ground state is shown in Fig. 4(c). One sees that the annealed wavefunction exhibits two spurious peaks at the neighboring local minima, which are absent in the target wavefunction. Furthermore, we found that the first excited level in the ‘flat gap’ region (c.f. Fig. 6) is doubly-degenerate, spanned by two compact gaussians each centered at one of the two local minima. Hence, the spurious peaks in the annealed wavefunction in Fig. 8 are most likely caused by diabatic transition into the first excited level when the annealing traverses the ‘flat gap’ region.

It is interesting to see that in quantum annealing, parts of the wavefunction can become trapped near local minima if the annealing time is not long enough. Normally, this may be what one would expect more in simulated annealing. This similarity between quantum and simulated annealing is inspiring from a physics point of view, especially since two annealing protocols obey very different rules. The mechansim underlying the trapping in simulated annealing is fairly well-understood. On the other hand, we might try to explain the trapping observed for quantum annealing using Landau-Zener transition. In the case of VSK(x)V_{\rm SK}(x), we have seen that the system encounters a region of ‘flat gap’ during the course of annealing. This may pose a problem for the Landau-Zener theory as there is no well-defined avoided crossing, which is one of the basic tenets of the Landau-Zener theory. Hence, although the physical process underlying the trapping observed here might be a diabatic transition, the detailed mechanism could differ from the traditional Landau-Zener one.

Refer to caption
Figure 8: An example of a wavefunction that is obtained if annealing is performed too quickly. The red curve shows the final probability density |ψ(x,T)|2|\psi(x,T)|^{2} obtained in Stage 2 with T=3250T=3250, whose residual energy is also indicated in Fig. 7. The graph of VSK(x)V_{\rm SK}(x) (black) is also shown for comparison. Note the spurious peaks located at the local minima, which are absent in the target ground state [c.f. Fig. 4(c)].

IV.4 Time evolution of the wavefunction and the approach to adiabaticity

We now discuss the time evolution of the wavefunction during the course of annealing. Figure 9(a) shows a snapshot of the probability density at time t=44t=44 in Stage 1, with T=560T=560. The single-modal form is quite representative of the shape of the wavefunction in Stage 1, so we can express its temporal behavior more succinctly using the width ψ(t)|x2|ψ(t)\langle\psi(t)|x^{2}|\psi(t)\rangle, which is illustrated in Fig. 9(a). Figure 10(a) shows the time evolution of the width in Stage 1 for several values of TT. Overall, one sees that all three graphs oscillate, with the amplitude of oscillation decaying with the passage of time. Among the graphs themselves, as TT increases, the oscillation becomes suppressed to yield a smoother decay.

Refer to caption
Figure 9: Snapshot of the wavefunction at time t=44t=44 during annealing in Stage 1 with T=560T=560. (a) The probability density |ψ(x,t)|2|\psi(x,t)|^{2}, with width given by the expectation value of x2x^{2}. (b) The probability current j(x,t)j(x,t), with amplitude j(x0,t)j(x_{0},t) at x0x_{0}. In both panels, the graph of VSK(x)V_{\rm SK}(x) has been rescaled vertically for presentation purposes.

Although informative, looking at the width alone gives us rather limited understanding of the behavior of the wavefunction. More insight can be gained by looking at the so-called probability current, defined as

j(x,t)=2m(t)i(ψψxψψx),j(x,t)=\frac{\hbar}{2m(t)i}\left(\psi^{*}\frac{\partial\psi}{\partial x}-\psi\frac{\partial\psi^{*}}{\partial x}\right), (9)

where ψ(x,t)\psi^{*}(x,t) is the complex conjugate of ψ(x,t)\psi(x,t), i=1i=\sqrt{-1}, and m(t)m(t) is again the time-dependent mass. The current j(x,t)j(x,t) is a real-valued quantity, and indicates the direction in which the probability is flowing at a point in space 444For the plane wave eikxe^{ikx}, one has j(x,t)=km=pm=vj(x,t)=\frac{\hbar k}{m}=\frac{p}{m}=v, which is simply the velocity of the particle. Hence, the probability current can be interpreted as a generalization of the velocity field, indicating both the speed and direction of motion of the particle. . Figure 9(b) shows the probability current corresponding to the probability density shown in panel (a). It is seen that the current is positive (negative) when x>0x>0 (x<0x<0), meaning that probability is flowing outwards away from the global minimum at the origin. With respect to the purpose of optimization, such an instantaneous behavior of the wavefunction should be considered unfavorable because the particle is diffusing away from the optimal solution at x=0x=0.

As with the width of the probability density, let us also introduce a parameter associated with j(x,t)j(x,t) to monitor the time evolution of the current. Define the amplitude as the largest displacement of j(x,t)j(x,t) from zero in the region x<0x<0. This is illustrated in Fig. 9(b), where the amplitude is j(x0,t)j(x_{0},t) at the point x0x_{0}. The amplitude can take on negative values, with the sign indicating the direction of probability flow.

Figure 10(b) shows the time evolutions of the current amplitude j(x0,t)j(x_{0},t) associated with the time evolutions of the width shown in panel (a). We see that the amplitude oscillates in time between positive and negative values, meaning that the probability current flow towards and away from the global minimum in a periodic manner. This is consistent with the pulsating behavior exhibited by the width. Another feature is that as TT increases, the amplitude tends to zero, meaning that the current itself approaches zero both spatially, as well as temporally throughout the entire duration of annealing. As j(x,t)j(x,t) is identically zero for stationary states, this is a sign of the annealing process approaching adiabaticity. Viewed differently, even for the moderate value of T=560T=560, where the residual energy approaches the asymptotic T2T^{-2} law for Stage 1 as seen in Fig. 7 (at log10T=2.748\log_{10}T=2.748), the current keeps oscillating, see Fig. 10(b), implying a diabatic evolution.

Refer to caption
Figure 10: Time evolutions of (a) the width ψ(t)|x2|ψ(t)\langle\psi(t)|x^{2}|\psi(t)\rangle and (b) the current amplitude j(x0,t)j(x_{0},t) of the wavefunction in Stage 1 under the linear schedule. The graphs of T=320,560T=320,560, and 3250 are compared to show the effects of increasing annealing time. In panel (b), the diminishing of j(x0,t)j(x_{0},t) towards zero with increasing TT indicates the approach to adiabaticity.

So far, we have focused our discussion on Stage 1. The probability current in Stage 2 is slightly more complicated. An example is shown in Fig. 21, where a snapshot of j(x,t)j(x,t) at t=10t=10 with T=320T=320 is shown. One sees that the current direction varies spatially, unlike the spatially-coherent flow we have seen in Fig. 9(b). Despite such differences, the overall behaviors in Stages 2 and 3 are generally quite similar to Stage 1. For completeness, these results are discussed in Appendix B.

V Quantum annealing with nonlinear schedules

V.1 Nonlinear annealing schedules

In the previous section, we have seen that asymptotically the residual energy decays with the total annealing time as T2T^{-2}. Analysis related to the adiabatic theorem shows that this scaling is largely a result of annealing with the linear schedule Eq. (7) Suzuki and Okada (2005); Morita (2007). Nonlinear schedules have been proposed in which the residual energy decays much faster with the annealing time, leading to better efficiency Morita (2007). In this section, we apply them to the potential VSK(x)V_{\rm SK}(x).

Nonlinear schedules can be obtained by generalizing Eq. (7) as

m(n)(t)=(mfmi)fn(t/T)+mi,m^{(n)}(t)=\left(m_{f}-m_{i}\right)f_{n}(t/T)+m_{i}, (10)

where nn denotes the order of the schedule, and the linear function t/Tt/T is replaced by higher-order functions fn(t/T)f_{n}(t/T). Details concerning the derivation of fnf_{n} can be found in Ref. Morita (2007). Physically speaking, these functions are designed such that annealing is constrained to proceed slowly both at the beginning as well as at the end of the process. Earlier studies on spin systems revealed that the residual energy decays with annealing time as T2nT^{-2n}, so higher-order schedules generally lead to better annealing performances. In the following, we shall investigate the performances of the functions f1(s)f_{1}(s) to f4(s)f_{4}(s) given in Ref. Morita (2007),

f1(s)\displaystyle f_{1}(s) =s,f2(s)=s2(32s),\displaystyle=s,~{}f_{2}(s)=s^{2}(3-2s), (11)
f3(s)\displaystyle f_{3}(s) =s3(1015s+6s2),f4(s)=s4(3584s+70s220s3).\displaystyle=s^{3}(10-15s+6s^{2}),~{}f_{4}(s)=s^{4}(35-84s+70s^{2}-20s^{3}). (12)

V.2 Residual energy: faster decay under nonlinear schedules

Figure 11 compares the residual energies obtained under the schedules given by Eq. (10), with n=1n=1 to 3. The results from Stages 1 to 3 are shown in panels (a) to (c) of Fig. 11, respectively. The general behaviors in Stages 1 and 3 are qualitatively quite similar. It is seen that for an nnth-order schedule the residual energy decays as T2nT^{-2n}, verifying the theoretical prediction of Ref. Morita (2007). To emphasize the importance of varying the schedule slowly both at the start as well as towards the end of annealing, we also considered a straightforward quadratic schedule m(q)(t)m^{(q)}(t), partly motivated by Ref. Stella et al. (2005), which is obtained by simply substituting fnf_{n} with (t/T)2(t/T)^{2} in Eq. (10). This schedule proceeds slowly only at the start but not at the end of annealing. In panels (a) and (c), we see that the residual energy of m(q)(t)m^{(q)}(t) (solid blue squares) decays only as T2T^{-2}. That both linear and quadratic schedules should show the same asymptotic decay rate can also be explained based on the theoretical analysis of Ref. Morita (2007).

Refer to caption
Figure 11: Comparing the performances of linear and nonlinear schedules in quantum annealing. Panels (a), (b), and (c) show the Rq.a.(T)R_{\rm q.a.}(T) versus TT curves in Stage 1, 2, and 3, respectively. Legend for the schedules shown in panel (a) also applies to (b) and (c). The straight lines (black) are proportional to T2nT^{-2n} (n=1n=1 to 4) and indicate the asymptotic behaviors of the curves. Lines connecting the data points are to guide the eye only.

The situation in Stage 2 is different from the other two in that the residual energy decays in the manner of the transient phase discussed in Sec. IV.3. Note that this is in the domain of strongly diabatic transitions and so is not the regime in which the function fnf_{n} were originally designed for. Nevertheless, it is seen from panel (b) of Fig. 11 that the residual energies of m(2)(t)m^{(2)}(t) and m(3)(t)m^{(3)}(t) decrease faster than that of m(1)(t)m^{(1)}(t). However, we also found that the curve for m(4)(t)m^{(4)}(t) (not shown) is almost exactly the same as that of m(3)(t)m^{(3)}(t), so we should be cautious about generalizing the improvement in performance seen for n=2n=2 and 3 to higher orders. We were unable to pursue the curves of m(2)(t)m^{(2)}(t) and m(3)(t)m^{(3)}(t) to larger TT to examine the asymptotic crossover because we have exhausted the numerical precision offered by the simulation package. Within the time span we could study as shown in Fig. 11(b), diabatic processes lead to significantly faster convergence than T2T^{-2}. Interestingly, the quadratic schedule gives the best performance here, unlike in the other two stages. Overall, nonlinear schedules generally show improvement over the linear one in this stage. Interpretation of the results, however, is not as clear-cut as in Stages 1 and 3.

V.3 Persistent tunneling current during quantum annealing

Earlier in Sec. IV.4, we have seen that under the linear schedule, increasing the annealing time suppresses of the probability current and leads to a more adiabatic process. Conventionally, adiabaticity is considered to be closely related with how the global minimum is ultimately to be attained in quantum annealing. With nonlinear schedules, however, we are now allowed to explore the schedule’s order nn as a new parameter dimension while keeping the annealing time constant. We shall see that this leads to tunneling, as opposed to adiabaticity, as a dominant mechanism for quantum annealing to arrive at the optimal solution.

Let us focus our discussion on Stage 1. The probability currents under nonlinear schedules also have the spatially-coherent form of Fig. 9(b) (no node in each of the region x>0x>0 and x<0x<0), so once again we can use the amplitude j(x0,t)j(x_{0},t) to monitor their temporal behaviors. Figure 12 shows the time evolutions of j(x0,t)j(x_{0},t) obtained under the various schedules, with annealing time T=560T=560. The salient feature that is common to all the nonlinear schedules and which distinguishes them from the linear one is that the amplitude is consistently positive throughout the entire duration of annealing. The probability current is therefore always flowing towards the origin, and no current ever flows away. This is a highly diabatic process in which the optimization is targeted at the global minimum. In stark contrast, the current of the linear schedule oscillates in an unguided manner, resulting in inefficient annealing.

Refer to caption
Figure 12: Comparing the time evolutions of j(x0,t)j(x_{0},t) under different annealing schedules m(n)(t)m^{(n)}(t) in Stage 1. The oscillatory behavior under the linear schedule m(1)(t)m^{(1)}(t) (gray) has also been shown in Fig. 10(b) (green). For nonlinear schedules, the j(x0,t)j(x_{0},t) are strictly positive, indicating a persistent current directed towards the global minimum throughout the entire annealing.

Apart from being persistent, the current of nonlinear schedules also has the additional feature of exhibiting tunneling. To explain, let us first note from Fig. 9 that the spatial extent of both the probability density and the current in the domain x>0x>0 can be approximated by the width ψ(t)|x2|ψ(t)\sqrt{\langle\psi(t)|x^{2}|\psi(t)\rangle}. The red curve in Fig. 13 shows the time evolution of ψ(t)|x2|ψ(t)\sqrt{\langle\psi(t)|x^{2}|\psi(t)\rangle} for the schedule m(3)(t)m^{(3)}(t) in Stage 1 with T=560T=560. The (green) shaded areas show the classically forbidden regions where the total energy ψ(t)|H(t)|ψ(t)\langle\psi(t)|H(t)|\psi(t)\rangle is less than the potential energy VSK(x)V_{\rm SK}(x). Shaded areas lying below the red curve are regions where the current resides in classically forbidden regimes, and these currents are tunneling in nature. Hence, nonlinear schedules induce persistent currents that tunnel across potential barriers to greatly improve the efficiency of annealing. This is in contrast to the linear schedule which, on the contrary, seeks to suppress the flow of current in order to bring about a more adiabatic annealing process.

Refer to caption
Figure 13: Tunneling exhibited by the wavefunction when annealing under a nonlinear schedule. Tics along the vertical axis represent the domain x>0x>0 and also indicate the time evolution of ψ(t)|x2|ψ(t)\sqrt{\langle\psi(t)|x^{2}|\psi(t)\rangle} in Stage 1 under m(3)(t)m^{(3)}(t) with T=560T=560 (red curve). The (green) shaded areas indicate the classically forbidden regions where ψ(t)|H(t)|ψ(t)<VSK(x)\langle\psi(t)|H(t)|\psi(t)\rangle<V_{\rm SK}(x). The shaded areas lying below the red curve are where the wavefunction exhibits tunneling.

So far, our discussion has focused on Stage 1. For Stage 2, we mentioned earlier that under the linear schedule the current is not spatially-coherent (c.f. Fig. 21). Under nonlinear schedules, however, the current resumes a spatially-coherent form and so we can again use the amplitude j(x0,t)j(x_{0},t) as a monitoring parameter to study its temporal evolution. Despite the different behavior of their residual energies, it was found that the observations reported for Stage 1 are valid for Stage 2 as well. For completeness, these results are summarized in Appendix B.

The situation in Stage 3 is quite similar to that of Stage 1, with the current showing persistence under the nonlinear schedules. Tunneling, however, is not a prominent feature in Stage 3 as the wavefunction is already well-localized within the global minimum [c.f. Fig. 4(c)]. The optimization is therefore taking place within an effective quadratic potential, which is a relatively simple problem for quantum annealing.

VI Simulated annealing using a linear schedule

We next study simulated annealing, first under a linear annealing schedule for comparison with the quantum counterpart.

VI.1 Annealing protocol

We perform simulated annealing numerically using the Metropolis algorithm. Let the position of the particle be xtx_{t} at time tt. A new position xx^{\prime} is proposed as

x=xt+Δxx^{\prime}=x_{t}+\Delta x (13)

where Δx\Delta x is a random number drawn from the interval [s2,s2][-\frac{s}{2},\frac{s}{2}] of width ss. The quantity ss is called the Monte Carlo step size and it determines how far the particle is allowed to jump with each move. The proposed position xx^{\prime} is then accepted with probability

Paccept=min{1,eβ(t)[VSK(x)VSK(xt)]},P_{\rm accept}=\min\left\{1,e^{-\beta(t)\left[V_{\rm SK}(x^{\prime})-V_{\rm SK}(x_{t})\right]}\right\}, (14)

where the temperature is provided by the annealing schedule β(t)\beta(t), which will be specified later. After xtx_{t} is updated, the time tt advances by a time step dtdt and the Metropolis algorithm is repeated 555The simulated annealing calculations in Secs. VII and VI are performed with dt=0.1dt=0.1, to be consistent with the time step used in quantum annealing. .

To compute averages, we work with an ensemble of NN particles where each of them evolves independently in time according to the Metropolis algorithm. The average VSK(x)t\langle V_{\rm SK}(x)\rangle_{t} at time tt is calculated via the ensemble average as

VSK(x)t=limN1Nν=1NVSK(xtν),\langle V_{\rm SK}(x)\rangle_{t}=\lim_{N\rightarrow\infty}\frac{1}{N}\sum_{\nu=1}^{N}V_{\rm SK}(x_{t}^{\nu}), (15)

where xtνx^{\nu}_{t} is the position of the ν\nuth particle at time tt. In practice, one works with a large but finite NN, typically N=107N=10^{7} and 10910^{9}. The ensemble average then becomes an approximation, subjected to errors due to finite-size fluctuations.

VI.2 Linear schedule and stages of simulated annealing

Let us consider the following linear form for β(t)\beta(t)

β(1)(t)=(βfβi)(tT)+βi.\beta^{(1)}(t)=\left(\beta_{f}-\beta_{i}\right)\left(\frac{t}{T}\right)+\beta_{i}. (16)

As in quantum annealing, we restrict ourselves to the time interval 0tT0\leq t\leq T, where TT again denotes the total annealing time. The parameters βi\beta_{i} and βf\beta_{f} are the inverse temperatures at the initial (t=0t=0) and final (t=Tt=T) times, respectively. The schedule β(1)(t)\beta^{(1)}(t) is the simulated annealing counterpart to the mass schedule m(1)(t)m^{(1)}(t) that we have used in quantum annealing.

We would like to perform the simulated annealing calculations in such a way that they can be compared to the results of quantum annealing obtained in Sec. IV. To enable a fair comparison, we use energy as the criterion. The boundary parameters βi\beta_{i} and βf\beta_{f} in Eq. (16) are chosen in such a way that the initial and final energies of a stage in simulated annealing are approximately the same as the initial and final energies of the corresponding stage in quantum annealing. These boundary parameters βl\beta_{l} and their energies U(βl)U(\beta_{l}) are summarized in Table 1, alongside their corresponding quantum counterparts mlm_{l} and E0(ml)E_{0}(m_{l}). To facilitate our subsequent discussions, let us call annealing from βa\beta_{a} to βb\beta_{b} Stage A, that from βb\beta_{b} to βc\beta_{c} Stage B, and that from βc\beta_{c} to βd\beta_{d} Stage C. The equilibrium state properties at the stage boundaries have been discussed in Sec. III [c.f. Figs. 3(b), 4(b), and 4(d)]. A graphical perspective of the three stages is also given in Fig. 5(b), where they are superposed upon the curve of the internal energy U(β)U(\beta). It is seen that the stages are positioned in the non-trivial regime of the system exhibiting ‘excess energy’.

VI.3 Residual energy

The annealing performance is assessed by considering the residual energy

Rs.a.(T)=VSK(x)t=TVSK(x)βf.R_{\rm s.a.}(T)=\langle V_{\rm SK}(x)\rangle_{t=T}-\langle V_{\rm SK}(x)\rangle_{\beta_{f}}~{}. (17)

The first term is the average potential energy obtained when annealing ends at t=Tt=T. The second term is the average potential energy at equilibrium at the final temperature [c.f. Eq. (5)], and is the target energy that annealing seeks to attain. The quantity Rs.a.(T)R_{\rm s.a.}(T) is the simulated annealing counterpart to the residual energy Rq.a.(T)R_{\rm q.a.}(T) considered in quantum annealing.

Refer to caption
Figure 14: Decrease of residual energy Rs.a.(T)R_{\rm s.a.}(T) with total annealing time TT for simulated annealing, performed under the linear schedule Eq. (16) in Stage A. Results for three step sizes ss are shown. All data points are obtained by averaging over an ensemble of N=109N=10^{9}, and the error bars indicate finite-NN fluctuations 777The error bars are generated as follows. The entire ensemble of size 10910^{9} is divided, arbitrarily, into ten sub-ensembles each of size 10810^{8}. The residual energy of each sub-ensemble is calculated, and the largest and smallest ones obtained are indicated by the upper and lower bounds of the error bar. . The solid line (black) is obtained by fitting a straight line to the data points of s=1s=1. Thin lines connecting data points are to guide the eye only.

Figure 7 shows the behavior of Rs.a.(T)R_{\rm s.a.}(T) as a function of TT in Stage A under the linear schedule. We experimented with a wide range of step sizes and some examples of the residual energy curves we obtained are shown in the figure. There exists an optimum step size that gives the lowest-lying Rs.a.(T)R_{\rm s.a.}(T) curve, and in Stage A this step size was found to be s=1s=1 (red circles). The solid line (black) is obtained by fitting a straight line to the data points of s=1s=1. We see that the residual energy decreases as T1T^{-1}. This scaling of Rs.a.(T)R_{\rm s.a.}(T) with TT is quite general. We identified the optimum step sizes in Stages B and C as well, and the lowest-lying Rs.a.(T)R_{\rm s.a.}(T) curve in all three stages are summarized in Fig. 15, shown by the three slanting curves. The two solid lines (black) are obtained by fitting a straight line to the data points of Stages B and C. It is seen that the residual energy also decreases as T1T^{-1} in these two stages.

Refer to caption
Figure 15: Residual energy Rs.a.(T)R_{\rm s.a.}(T) versus annealing time TT curves for simulated annealing under the linear schedule. The three slanting curves (circle, triangle, diamond) show the results for Stages A to C, with each curve obtained using the optimum step size of that stage. The solid straight lines (black) are obtained by fitting to the data points of Stages B and C. The two curves at the top (magenta squares) exhibit slow decay due to trappings by local minima, which happens when the step size is small (s2<w0\frac{s}{2}<w_{0}). All data points are obtained by averaging over an ensemble of N=109N=10^{9}, and the error bars are generated in the same way as in Fig. 7. Thin lines connecting data points are to guide the eye only.

The scaling of T1T^{-1} is fairly robust in the sense that it is attainable over quite a wide range of step sizes. However, in Stages B and C, we also found that if the step size falls below the inter-minima distance of VSK(x)V_{\rm SK}(x) (i.e. s2<w0\frac{s}{2}<w_{0}), then the T1T^{-1} scaling will no longer be achieved. In Fig. 15, the two uppermost curves (magenta squares) show the residual energies obtained in these two stages with the annealing performed using s=0.1s=0.1. It is seen that they decrease much slower than the previous curves. The reason for the slow decay is trapping by local minima, and can be understood by looking at the initial Boltzmann distributions of these two stages, shown in Figs. 4(b) and 4(d). The probability channels connecting the global minimum to the two adjacent local ones are suppressed, so if the step size is smaller than the inter-minima distance w0w_{0}, particles at the local minima will need a long time to hill-climb the potential barrier in order to reach the global minimum, possibly leading to a logarithmic decrease of the residual energy as suggested by Shinomoto and Kabashima Shinomoto and Kabashima (1991). See also discussions in the next section. Viewed differently, larger step sizes satisfying s2>w0\frac{s}{2}>w_{0} may be regarded to realize quasi-global searches across energy barriers without hill-climbing processes, leading to a faster polynomial decrease of the residual energy. It is noteworthy that, even with such a quasi-global search, the decrease rate T1T^{-1} is slower than the quantum counterpart T2T^{-2}.

One might inquire if it is possible to obtain a better scaling than T1T^{-1} by making some adjustments to the annealing algorithm. We tried a simple quadratic schedule where the linear term t/Tt/T in Eq. (16) is squared. We also experimented with a variant algorithm in which the step size decreases adaptively with temperature during the course of annealing. Overall, we did not manage to achieve a faster decrease than T1T^{-1} with these attempts.

VII Simulated annealing using a logarithmic schedule

VII.1 Logarithmic schedule

In Ref. Shinomoto and Kabashima (1991), Shinomoto and Kabashima studied simulated annealing of the original form of VSK(x)V_{\rm SK}(x), where the barrier height and distance between neighboring minima are kept constant for all xx, under a logarithmic annealing schedule with time running from 0 to \infty in contrast to the finite-time protocols in the previous sections. To test their prediction, let us consider the following schedule for the inverse temperature

β(l)(t)=βilog10(t+10),\beta^{(l)}(t)=\beta_{i}\log_{10}\left(t+10\right), (18)

where βi\beta_{i} is the initial inverse temperature at time t=0t=0. Time tt is supposed to run from 0 to \infty. The superscript (l)(l) denotes ‘logarithmic’ and is to differentiate Eq. (18) from the other schedules. Shinomoto and Kabashima showed, by assuming that particle are located only at local minima and taking the continuum limit ignoring discreteness of positions, that the average of the potential energy during simulated annealing under Eq. (18) should decrease with time asymptotically as

VSK(x)t(log10t)1,\langle V_{\rm SK}(x)\rangle_{t}\sim(\log_{10}t)^{-1}, (19)

where VSK(x)t\langle V_{\rm SK}(x)\rangle_{t} is given by the right side of Eq. (5), with a small difference that β\beta is now further dependent on time via the annealing schedule. We indicate this time dependence using the subscript tt in t\langle\cdot\rangle_{t}. In the following, we shall study the dependence of VSK(x)t\langle V_{\rm SK}(x)\rangle_{t} on time from the perspective of Monte Carlo simulations to see if their prediction is observed numerically.

VII.2 Time scaling of the potential energy from Monte Carlo simulations

Figure 16 shows the decay of VSK(x)t\langle V_{\rm SK}(x)\rangle_{t} with time under Eq. (18), with βi=100.29\beta_{i}=10^{0.29}. The initial ensemble at the start of annealing is the one drawn from the Boltzmann distribution of VSK(x)V_{\rm SK}(x) with temperature βi\beta_{i}. Each curve in the figure represents the result obtained using a particular step size ss. It is seen that when tt is large, the curves collapse onto a single curve. We fitted a straight line to the curve of s=4s=4 using the data points from log10log10t>0.5\log_{10}\log_{10}t>0.5, and the result is shown by the solid line (black). We see that VSK(x)t\langle V_{\rm SK}(x)\rangle_{t} decays asymptotically as (log10t)0.74(\log_{10}t)^{-0.74}. The decay exponent 0.74-0.74 is different from the 1-1 proposed by Shinomoto and Kabashima.

Refer to caption
Figure 16: Decay of VSK(x)t\langle V_{\rm SK}(x)\rangle_{t} with time under the schedule Eq. (18) (βi=100.29\beta_{i}=10^{0.29}). Results for several different ss are shown. Ensemble size is N=107N=10^{7} for all the curves. The errors due to finite-NN fluctuations are small and hence not shown. The curves collapse together asymptotically, decaying as (log10t)0.74(\log_{10}t)^{-0.74} (solid line). Lines connecting the data points are to guide the eye only.

To account for our results, we propose generalizing the scaling Eq. (19) slightly as

VSK(x)t(log10t)αs.a.\langle V_{\rm SK}(x)\rangle_{t}\sim(\log_{10}t)^{\alpha_{\rm s.a.}} (20)

where the decay exponent αs.a.\alpha_{\rm s.a.} is obtained from numerical simulations. We repeated the simulated annealing calculations with different values of βi\beta_{i} and the results are summarized in Table 2. These results capture the non-trivial aspects of the potential VSK(x)V_{\rm SK}(x). The column βfinal\beta_{\rm final} indicates the final inverse temperature attained by the schedule for that annealing simulation. By referring βfinal\beta_{\rm final} to Fig. 5(b), we see that our annealing was performed in the regime where the system exhibits ‘excess energy’.

βi\beta_{i} βfinal\beta_{\rm final} αs.a.\alpha_{\rm s.a.} αeq.(βfinal)\alpha_{\rm eq.}(\beta_{\rm final})
100.2910^{0.29} 100.9410^{0.94} 0.74-0.74 0.74-0.74
101.0010^{1.00} 101.6610^{1.66} 1.07-1.07 1.07-1.07
101.7510^{1.75} 102.4010^{2.40} 1.56-1.56 1.59-1.59
Table 2: Decay exponent αs.a.\alpha_{\rm s.a.} obtained from Monte Carlo simulations. Annealing was performed under the schedule Eq. (18), starting from βi\beta_{i} and ending at βfinal\beta_{\rm final}. The gradient αeq.(βfinal)\alpha_{\rm eq.}(\beta_{\rm final}) [c.f. Eq. (21)] approximates αs.a.\alpha_{\rm s.a.} very well.

The exponents αs.a.\alpha_{\rm s.a.} may be understood on the basis of the equilibrium properties of the system. Let us define

αeq.(β)=d[log10VSK(x)β]d[log10β]|β\alpha_{\rm eq.}(\beta^{\prime})=\left.\frac{d[\log_{10}\langle V_{\rm SK}(x)\rangle_{\beta}]}{d[\log_{10}\beta]}\right|_{\beta^{\prime}} (21)

where VSK(x)β\langle V_{\rm SK}(x)\rangle_{\beta} is given by Eq. (5) and is the average potential energy at equilibrium. The quantity αeq.(β)\alpha_{\rm eq.}(\beta^{\prime}) is the gradient of the curve log10VSK(x)β\log_{10}\langle V_{\rm SK}(x)\rangle_{\beta} at log10β\log_{10}\beta^{\prime}. In Table 2, the column αeq.(βfinal)\alpha_{\rm eq.}(\beta_{\rm final}) shows the gradient evaluated at the final attained temperature βfinal\beta_{\rm final}, and we see that the values agree quite well with the decay exponent αs.a.\alpha_{\rm s.a.}. This agreement between the two quantities can be explained by noting that when tt is large the schedule Eq. (18) varies very slowly. The system is therefore very close to equilibrium during the late stages of annealing, and it traces out a narrow segment of the VSK(x)β\langle V_{\rm SK}(x)\rangle_{\beta} curve around βfinal\beta_{\rm final}. The relation Eq. (20) can therefore be approximated by VSK(x)βfinal(βfinal)αs.a.\langle V_{\rm SK}(x)\rangle_{\beta_{\rm final}}\sim(\beta_{\rm final})^{\alpha_{\rm s.a.}}, from which the interpretation of αs.a.\alpha_{\rm s.a.} as the gradient αeq.(βfinal)\alpha_{\rm eq.}(\beta_{\rm final}) follows immediately via Eq. (21).

We close this section with some comments on the decay exponent 1-1 obtained by Shinomoto and Kabashima and the αs.a.\alpha_{\rm s.a.} we obtained. One possible reason for the discrepancy is that the potential VSK(x)V_{\rm SK}(x) we used, although inspired by Shinomoto and Kabashima’s work, is not exactly the same as the one they treated. Another cause may be the coarse-graining assumption (w0)2βfinal1(w_{0})^{2}\beta_{\rm final}\ll 1 which the authors made during their derivation. Substituting w0=0.2w_{0}=0.2 and the values of βfinal\beta_{\rm final} in Table 2, we see that the inequality is not very well-satisfied for the numerical simulations which we performed here. A third, and possibly more important, reason may be as follows. If the system is close to equilibrium and the particle stays almost exclusively in the valley around the global minimum at x=0x=0 for tt large and thus β\beta large, then the equipartition law suggests that the average of the potential energy is proportional to β1\beta^{-1}, which would imply VSK(x)β(log10t)1\langle V_{\rm SK}(x)\rangle_{\beta}\sim(\log_{10}t)^{-1} under the annealing schedule of Eq. (18). Also the kinetic energy term leads to the same behavior due to equipartition. Deviations from this inverse-log law in numerical simulations would imply a few possibilities including a spreading of probability distribution away from the central valley to nearby local minima as well as deviations from equilibrium. It is anyway the case that the contribution to the equilibrium internal energy from kinetic energy would yield the inverse-log law under the quasi-equilibrium context, which is more dominant than the potential term with larger powers as listed in Table 2.

VIII Summary and discussions

We have studied quantum and simulated annealing of a one-dimensional particle along the xx-axis. The subject of optimization is played by the potential energy VSK(x)V_{\rm SK}(x), which is modelled closely after a similar potential originally proposed by Shinomoto and Kabashima Shinomoto and Kabashima (1991). On the one hand, the rugged energy landscape of VSK(x)V_{\rm SK}(x) offers a non-trivial challenge for the annealing algorithms to solve; on the other, the potential is designed such that it possesses a unique global minimum at the origin. This combination of both difficulty and simplicity makes VSK(x)V_{\rm SK}(x) a suitable candidate for understanding the fundamental aspects of quantum annealing for continuous variables, as well as for identifying the underlying differences between it and simulated annealing.

We first briefly reviewed the general features of VSK(x)V_{\rm SK}(x) and its phase diagram. Focusing on a particular phase point, we then examined the properties of both the quantum ground state and the classical equilibrium state in detail. It was pointed out that the quantum ground state wavefunction appears more suitable for optimization than the classical Boltzmann distribution because it exhibits less signs of trapping by local minima at the same value of energy. We moved on to look at the dynamical aspects involving annealing. Quantum annealing was performed under linear and nonlinear schedules. The performance of annealing was assessed using the residual energy, which was found to decrease with total annealing time TT as T2nT^{-2n} where nn is the order of the schedule, which is in contrast to the result T1T^{-1} by Stella et al. Stella et al. (2005) derived by a phenomenological approach to the quantum process involving several steps of approximations. Also of particular interest is the use of probability current to monitor the time evolution of the wavefunction during the annealing process. The current revealed the different mechanisms at work behind linear and nonlinear schedules (adiabaticity versus persistent tunneling). We then went on to consider simulated annealing. Here, our main finding is that under the linear annealing schedule the residual energy decreases as T1T^{-1} if we choose the step size ss appropriately. An important lesson we learnt is that the Monte Carlo step size utilized for annealing must be large enough to traverse the energy barrier between local minima to achieve the best performance; otherwise, the particle will be trapped and the residual energy will decrease much slower than T1T^{-1}, possibly logarithmically as predicted by Shinomoto and Kabashima.

The central result of this work is the comparison between the residual energies of quantum and simulated annealing. We have seen that under the linear schedule, the residual energy of quantum annealing decreases as T2T^{-2} whereas that of simulated annealing decreases as T1T^{-1} at best, i.e. even with quasi-global search processes across energy barriers, and probably as (log10T)1(\log_{10}T)^{-1} under the usual local search. This provides another example in which quantum mechanics helps the system reach the global minimum more efficiently than classical stochastic process does. Furthermore, the theoretical framework of quantum annealing allows room for additional improvement, as we have seen from the results of nonlinear schedules. On the other hand, in the case of simulated annealing it is not immediately apparent how one can improve upon the scaling of T1T^{-1} for quasi-global search and (log10T)1(\log_{10}T)^{-1} for local search. Overall, our results seem to favor quantum over simulated annealing when it comes to the optimization of continuous variables, at least for potentials like VSK(x)V_{\rm SK}(x).

A second point concerns the mechanisms underlying quantum annealing. We have seen that there are two different ways in which quantum annealing can arrive at the global minimum. Under the linear schedule, this goal is reached by suppressing the probability current so that the annealing mimics an adiabatic process. Under nonlinear schedules, on the other hand, the same objective is achieved by inducing a persistent tunneling current. These are two incompatible mechanisms, but interestingly both can be utilized by quantum annealing, depending on the way it is performed. In the literature, quantum adiabatic computation and quantum annealing are sometimes loosely used as synonyms. Our work shows that it is better not to conflate these two terms as the underlying mechanism can be very different.

Lastly, let us comment on possible ways to extend this work. One interesting feature of quantum annealing is the flexibility to choose (or even design) the annealing driver so as to reap the best performance. In this work, the role of the driver is played by the kinetic energy operator. One might consider introducing a secondary driver in the form of a vector potential operator A(x)A(x), generalizing the momentum operator pp to pq(t)A(x)p-q(t)A(x). The time-dependent charge q(t)q(t) plays the role of a second schedule, together with the mass m(t)m(t). One recovers the original model when q(t)=0q(t)=0. It would be interesting to see if one could improve the annealing performance with some appropriate choice of A(x)A(x) and q(t)q(t). This could be useful, for instance, in Stage 2 where we have seen that resorting to nonlinear schedules did not deliver the expected improvement in the scaling of the residual energy.

Acknowledgements.
This work is based on a project commissioned by the New Energy and Industrial Technology Development Organization.

Appendix A Overview of E0(m)E_{0}(m) and Δ(m)\Delta(m) in the h0h_{0}-w0w_{0} phase space

In Sec. III, we presented a detailed discussion of the quantum ground state energy E0E_{0} and the energy gap Δ\Delta of the system as a function of mass mm. The discussion there focused on one point of h0h_{0}-w0w_{0} phase space, (w0,h0)=(0.2,0.2)(w_{0},h_{0})=(0.2,0.2). In this appendix, we broaden the discussion to the curves of E0(m)E_{0}(m) and Δ(m)\Delta(m) at other phase points, with the aim of providing a more comprehensive picture of the time-independent aspects of the system.

We first discuss the energy E0E_{0}. As seen in Fig. 5(a), the curve of E0(m)E_{0}(m) has an ‘S’ shape, resembling the function tanhx\tanh x. This feature is generally true at other phase points as well. Panel (a) of Fig. 17 shows several E0(m)E_{0}(m) curves where h0h_{0} is fixed and w0w_{0} varies. [For w0=5w_{0}=5 (magenta), the ‘S’ portion of the curve lies in the range m<102m<10^{-2}.] Panel (b) shows the complementary situation where w0w_{0} is fixed and h0h_{0} varies. Qualitatively, we see that the energy curve of Fig. 5(a) at the phase point (0.2,0.2)(0.2,0.2) can be considered a reasonable representative of the generic behavior in phase space. Quantitatively, however, one also sees that the ‘S’ transition at different phase points varies over a wide range of masses. Hence, one should be slightly cautious when inferring the behavior at other phase points based on the results obtained at (0.2,0.2)(0.2,0.2).

Refer to caption
Figure 17: Ground state energy E0E_{0} as a function of mass mm, evaluated at points in h0h_{0}-w0w_{0} phase space other than (0.2,0.2)(0.2,0.2). The lines labelled ‘outer oscillator’ have the same meaning as in Fig. 5. (a) The E0(m)E_{0}(m) curves where h0=9.1h_{0}=9.1 and w0w_{0} varies. (b) The curves where w0=0.6w_{0}=0.6 and h0h_{0} varies.

The energy gap Δ\Delta, on the other hand, exhibits richer behavior compared to E0E_{0}. In addition to the ‘flat gap’ described in Fig. 6, we also observed two other notable features on the Δ(m)\Delta(m) curves of other phase points.

The first concerns a minimum turning point on the gap curve. In studies on quantum annealing, much attention is being paid to the minimum energy gap as it constitutes the bottleneck of the entire annealing process. For the potential VSK(x)V_{\rm SK}(x), however, a minimum gap in the conventional sense does not exist. This is most easily seen by considering the gap of the harmonic oscillator, k/m\sqrt{k/m}, which approaches zero as the annealing approaches completion at infinite mass. However, there are situations where the gap goes through a minimum turning point. Panel (a) of Fig. 18 shows an example of such a gap curve obtained at the phase point (w0,h0)=(3.7,11.1)(w_{0},h_{0})=(3.7,11.1). Based on our observations of extensive numerical data, we found that (i) the turning point generally occurs at small mm, and (ii) just the northeastern region of phase space (c.f. Fig. 2) exhibits this feature on the gap curve. [The phase point (0.2,0.2)(0.2,0.2) lies in the southwestern region and does not exhibit a turning point.] In this paper, we did not investigate whether this minimum turning point has any influence on quantum annealing.

Refer to caption
Figure 18: Two notable features on the energy gap curves. Solid curves (red) show the energy gap Δ\Delta as a function of mass mm. The lines labelled ‘outer oscillator’ have the same meaning as in Fig. 6. (a) The Δ(m)\Delta(m) curve at the phase point (w0,h0)=(3.7,11.1)(w_{0},h_{0})=(3.7,11.1) has a minimum turning point when the mass is small. (b) At the phase point (0.5,11.1)(0.5,11.1), the gap of VSK(x)V_{\rm SK}(x) shows a steep and significant decrease relative to the gap of the oscillator.

The second feature concerns the occurrence of small gaps. Once again, this is conventionally characterized by the minimum gap. In the case of VSK(x)V_{\rm SK}(x), however, we found that the gap can show a significant decrease in magnitude but does not go through a turning point. Panel (b) of Fig. 18 shows an example of such a gap curve obtained at the phase point (w0,h0)=(0.5,11.1)(w_{0},h_{0})=(0.5,11.1). The sudden and anomalous decrease in the gap size of VSK(x)V_{\rm SK}(x) can be evinced when compared to the ‘benign’ gap behavior of the oscillator. Note that the scale along the vertical axis is logarithmic, which means that the reduction in magnitude is quite significant. The effects of this peculiar gap feature on annealing is also not covered by our numerical studies.

The behavior of the Δ(m)\Delta(m) curve at different points in phase space can be quite complex, exhibiting interplay among the various features mentioned above. Figure 19 shows some examples of how the curve evolves when h0h_{0} is kept constant and w0w_{0} varies. Panel (a) shows the case of h0=0.5h_{0}=0.5, corresponding to a small barrier in the potential VSK(x)V_{\rm SK}(x). The inter-minima distance w0w_{0} is decreased from 5 to 0.5, which in phase space translates to an increasing NminN_{\rm min}. To aid visualization, only the curves showing notable changes are plotted with color. We see that the energy gap evolves from a simple, monotonically decreasing curve (gray) to one exhibiting a distinctive ‘flat gap’ (blue). Panel (b) shows the case of large barrier h0=15h_{0}=15. The evolution of the gap curve here is more complicated, first showing a minimum turning point (gray), then a ‘flat gap’ (red), and finally a ‘steep decrease’ (blue). Based on extensive numerical observations, we found that in general when NminN_{\rm min} increases beyond a certain threshold, the two features, ‘flat gap’ followed by ‘steep decrease’, will emerge on the Δ(m)\Delta(m) curve.

Refer to caption
Figure 19: Evolution of the gap curve as w0w_{0} changes with h0h_{0} held fixed. The curves exhibit salient changes when w0w_{0} decreases below 2.5 and these are highlighted in color. To aid visualization, the other curves (w03w_{0}\geq 3) are plotted in gray. Note that decrease in w0w_{0} corresponds to increase in NminN_{\rm min} on the phase diagram, and hence increased complexity of VSK(x)V_{\rm SK}(x)’s energy landscape. (a) For h0=0.5h_{0}=0.5 (small barrier). (b) For h0=15h_{0}=15 (large barrier).

Figure 20 shows the complementary situation, with w0w_{0} held constant and h0h_{0} varying. It is seen that as h0h_{0} (and NminN_{\rm min}) increases, the gap curve evolves from being monotonically decreasing (red) to having a minimum turning point (green), and then finally attaining the features ‘flat gap’ and ‘steep decrease’ (blue). An additional point to note here is that for the gray curves, their ‘flat gap’ segment seems to have collapsed onto a single horizontal line. This phenomenon is also a motif that recurs quite frequently throughout phase space.

To summarize, in this appendix we presented an overview of the behaviors of E0(m)E_{0}(m) and Δ(m)\Delta(m) at different points in phase space. In general, the E0(m)E_{0}(m) curve evolves quite predictably with respect to changes in the parameters h0h_{0} and w0w_{0}. By contrast, the shape of the Δ(m)\Delta(m) curve is more diversed, exhibiting different features at different points in phase space. The discussion here aims to give the reader a sense of the underlying complexity of VSK(x)V_{\rm SK}(x), while also serving as a caveat that our annealing studies performed at just one phase point might not be exhaustive in terms of capturing the dynamical aspects of the system.

Refer to caption
Figure 20: Evolution of the gap curve when w0=1w_{0}=1 and h0h_{0} increases from 0.5 to 15. To aid visualization, only the curves exhibiting prominent changes are plotted in color. Notice that for the gray curves (8w0148\leq w_{0}\leq 14), their ‘flat gap’ region (c.f. Fig. 6) are collapsed onto a single horizontal line.

Appendix B Time evolution of probability current in Stages 2 and 3

In Secs. IV.4 and V.3, we discussed the temporal behavior of the probability current j(x,t)j(x,t) in Stage 1, and it was mentioned that the results there are also valid in Stages 2 and 3. In this Appendix, we supplement those discussions by presenting some of the key results in these two stages.

B.1 Stage 2 under the linear schedule

The probability current in Stage 2 is sometimes not spatially-coherent, meaning that the current flows in different directions at different points in space. Figure 21 shows a snapshot of j(x,t)j(x,t) in this stage under the linear schedule at time t=10t=10, with annealing time T=320T=320. It is seen that j(x,t)j(x,t) undulates between positive and negative values spatially, so we cannot represent its overall spatial behavior with just the amplitude j(x0,t)j(x_{0},t) as in Fig. 9(b). To keep things simple, let us just consider j(x,t)j(x,t) at three locations, namely x=0.025,x=-0.025, 0.1-0.1, and 0.175-0.175, indicated by the vertical lines in Fig. 21. Physically speaking, x=0.1x=-0.1 is located at the first local maximum of VSK(x)V_{\rm SK}(x) and monitors the current in the classically forbidden region; the other two points monitor the currents in the vicinity of the global minimum and its adjacent local minimum.

Refer to caption
Figure 21: Snapshot of the probability current j(x,t)j(x,t) at time t=10t=10 in Stage 2 under the linear schedule with T=320T=320. The undulation of j(x,t)j(x,t) between positive and negative values spatially indicates that the current flows in different directions at different points in space. We monitor the time evolution of j(x,t)j(x,t) at x=0.025,0.1x=-0.025,-0.1 and 0.175-0.175 (vertical lines), three xx positions located between the global minimum and the first local minimum of VSK(x)V_{\rm SK}(x).

The time evolutions are presented in Fig. 22, where panels (a) to (c) show the results for T=320T=320, 3250, and 10000, respectively. In each panel, the current j(x,t)j(x,t) at the three xx positions are plotted as a function of time. Generally speaking, we see that as TT increases, the oscillations of j(x,t)j(x,t) at all three xx positions become suppressed and the annealing approaches adiabaticity. This is similar to what we have seen in Stage 1 for the linear schedule [c.f. Sec. IV.4 and Fig. 10(b)].

Refer to caption
Figure 22: Time evolutions of j(x,t)j(x,t) at x=0.025,0.1,x=-0.025,-0.1, and 0.175-0.175 (c.f. Fig. 21) in Stage 2 under the linear schedule. For total annealing times (a) T=320T=320, (b) T=3250T=3250, and (c) T=10000T=10000. Generally, the behaviors at all three spatial points exhibit the same trend towards adiabaticity as in Stage 1 [c.f. Fig. 10(b)].

Let us make a minor point concerning the current near the local minimum (x=0.175x=-0.175, i.e. the red curves). When the total time is short (T=320T=320), during the early stages the current is negative, so it is directed towards the local minimum. This explains the spurious peaks observed in Fig. 8. As TT increases, this early current becomes predominantly positive, meaning that it now flows away from the local minimum and towards the global one by tunneling, thereby aiding annealing in attaining the correct optimized solution. Nevertheless, we also see that the magnitude and duration of this reversed current decrease with longer TT, so adiabaticity is ultimately still the main mechanism contributing to a successful optimization here.

B.2 Stage 2 under nonlinear schedules

In Sec. V.3, we have seen that under nonlinear schedules the probability current in Stage 1 exhibits persistent tunneling, which is in contrast to the oscillatory behavior seen under the linear schedule. The corresponding situation in Stage 2 is quite similar. First, let us note that under nonlinear schedules, the current j(x,t)j(x,t) in Stage 2 has the spatially-coherent form shown in Fig. 9(b), so the amplitude j(x0,t)j(x_{0},t) can again be used as a parameter for monitoring the temporal behavior. Figure 23 shows the time evolutions of the amplitude in Stage 2 under the various nonlinear schedules, with T=3250T=3250. Overall, the behaviors are similar to what we have seen in Stage 1 (c.f. Fig. 12).

Refer to caption
Figure 23: Comparing the time evolutions of the current amplitude j(x0,t)j(x_{0},t) in Stage 2 under nonlinear schedules. In general, the amplitude exhibits similar behavior as in Stage 1 (c.f. Fig. 12), showing no oscillations and staying strictly positive throughout the entire annealing.

B.3 Stage 3 under the linear schedule

In Stage 3, the wavefunction is well-localized within the central global minimum [c.f. Fig. 4(c)], and so annealing can be thought of as taking place within an effective quadratic potential. Figure 24 shows the time evolution of the amplitude j(x0,t)j(x_{0},t) in Stage 3 under the linear schedule, with T=1000T=1000. We see that the current exhibits the oscillatory behavior seen in Stages 1 and 2. A minor observation would be that the crests (positive current) are larger in magnitude than the troughs (negative current), so the wavefunction becomes narrower after each oscillatory cycle.

For nonlinear schedules, the overall behavior is similar to Stages 1 and 2, so we shall not discuss them further.

Refer to caption
Figure 24: Time evolution of the current amplitude j(x0,t)j(x_{0},t) in Stage 3 under the linear schedule.

References

  • Kadowaki and Nishimori (1998) T. Kadowaki and H. Nishimori, Phys. Rev. E 58, 5355 (1998).
  • Santoro et al. (2002) G. E. Santoro, R. Martoňák, E. Tosatti,  and R. Car, Science 295, 2427 (2002).
  • Das and Chakrabarti (2008) A. Das and B. K. Chakrabarti, Rev. Mod. Phys. 80, 1061 (2008).
  • Santoro and Tosatti (2006) G. E. Santoro and E. Tosatti, J. Phys. A 39, R393 (2006).
  • Morita and Nishimori (2008) S. Morita and H. Nishimori, J. Math. Phys. 49, 125210 (2008).
  • Albash and Lidar (2018) T. Albash and D. A. Lidar, Rev. Mod. Phys. 90, 015002 (2018).
  • Hauke et al. (2020) P. Hauke, H. G. Katzgraber, W. Lechner, H. Nishimori,  and W. D. Oliver, Rep. Prog. Phys. 83, 054401 (2020).
  • Farhi et al. (2001) E. Farhi, J. Goldstone, S. Gutmann, J. Lapan, A. Lundgren,  and D. Preda, Science 292, 472 (2001).
  • Finnila et al. (1994) A. B. Finnila, M. A. Gomez, C. Sebenik, C. Stenson,  and J. D. Doll, Chemical Physics Letters 219, 343 (1994).
  • Abel and Spannowsky (2021) S. Abel and M. Spannowsky, PRX Quantum 2, 010349 (2021).
  • Abel et al. (2021a) S. Abel, N. Chancellor,  and M. Spannowsky, Phys. Rev. D 103, 16008 (2021a).
  • Johnson et al. (2011) M. W. Johnson, M. H. S. Amin, S. Gildert, T. Lanting, F. Hamze, N. Dickson, R. Harris, A. J. Berkley, J. Johansson, P. Bunyk, E. M. Chapple, C. Enderud, J. P. Hilton, K. Karimi, E. Ladizinsky, N. Ladizinsky, T. Oh, I. Perminov, C. Rich, M. C. Thom, E. Tolkacheva, C. J. S. Truncik, S. Uchaikin, J. Wang, B. Wilson,  and G. Rose, Nature 473, 194 (2011).
  • Abel et al. (2021b) S. Abel, A. Blance,  and M. Spannowsky, arXiv:2105.13945  (2021b).
  • Stella et al. (2005) L. Stella, G. E. Santoro,  and E. Tosatti, Phys. Rev. B 72, 014303 (2005).
  • Shinomoto and Kabashima (1991) S. Shinomoto and Y. Kabashima, J. Phys. A 24, L141 (1991).
  • Morita (2007) S. Morita, J. Phys. Soc. Jpn. 76, 104001 (2007).
  • Note (1) More concretely, we first determine the limit xmaxx_{\rm max} such that the ground state wavefunction at mim_{i} lies comfortably within the interval [xmax,xmax][-x_{\rm max},x_{\rm max}]. We then compute the energy eigenvalue E0E_{0} using a grid spanning the interval with 2048 grid points. The computed E0E_{0} must remain unchanged when the number of grid points is decreased to 1024 and increased to 4096. With the converged grid at mim_{i}, we repeat the procedure at mfm_{f} to ensure that the same grid must guarantee the convergence of E0E_{0} at the final mass as well.
  • Schmidt and Lorenz (2017) B. Schmidt and U. Lorenz, Comp. Phys. Commun. 213, 223 (2017).
  • Schmidt and Hartmann (2018) B. Schmidt and C. Hartmann, Comp. Phys. Commun. 228, 229 (2018).
  • Lorenz and Schmidt (2020) U. Lorenz and B. Schmidt, “WavePacket (C++/Python): Time-dependent simulation of open and closed quantum systems. Version 0.3.1.”  (2020), https://sourceforge.net/projects/wavepacket/.
  • Note (2) Convergence of trajectories with respect to dtdt is checked by halving it (to 0.05) and then seeing that recalculated results remain invariant.
  • Note (3) Due to the periodic oscillatory behavior of the data points, instead of peforming a least square fit, we have chosen instead to manually adjust the vertical displacement of the line T2T^{-2} to fit those data points at the crests of the oscillatory curve. We find that this reflects more accurately the observed gradients of the curves.
  • Suzuki and Okada (2005) S. Suzuki and M. Okada, J. Phys. Soc. Jpn. 74, 1649 (2005).
  • Note (4) For the plane wave eikxe^{ikx}, one has j(x,t)=km=pm=vj(x,t)=\frac{\hbar k}{m}=\frac{p}{m}=v, which is simply the velocity of the particle. Hence, the probability current can be interpreted as a generalization of the velocity field, indicating both the speed and direction of motion of the particle.
  • Note (5) The simulated annealing calculations in Secs. VII and VI are performed with dt=0.1dt=0.1, to be consistent with the time step used in quantum annealing.