
Physics Successfully Implements Lagrange Multiplier Optimization

Sri Krishna Vadlamani1, Tianyao Patrick Xiao1 (now at Sandia National Laboratories, Albuquerque, NM, USA), Eli Yablonovitch1
(1Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA)
Abstract

Optimization is a major part of human effort. While being mathematical, optimization is also built into physics. For example, physics has the principle of Least Action, the principle of Minimum Entropy Generation, and the Variational Principle. Physics also has physical annealing which, of course, preceded computational Simulated Annealing. Physics has the Adiabatic Principle which in its quantum form is called Quantum Annealing. Thus, physical machines can solve the mathematical problem of optimization, including constraints.

Binary constraints can be built into the physical optimization. In that case, the machines are digital in the same sense that a flip-flop is digital. A wide variety of machines have had recent success at optimizing the Ising magnetic energy. We demonstrate in this paper that almost all those machines perform optimization according to the Principle of Minimum Entropy Generation as put forth by Onsager. Further, we show that this optimization is in fact equivalent to Lagrange multiplier optimization for constrained problems. We find that the physical gain coefficients which drive those systems actually play the role of the corresponding Lagrange Multipliers.

1 Introduction

Optimization is ubiquitous in today’s world. Everyday applications of optimization range from aerodynamic design of vehicles by fluid mechanics and physical stress optimization of bridges in civil engineering to scheduling of airline crews and routing of delivery trucks in operations research. Furthermore, optimization is also indispensable in machine learning, reinforcement learning, computer vision, and speech processing. Given the preponderance of massive datasets and computations today, there has been a surge of activity in the design of hardware accelerators for neural network training and inference [1].

We ask whether Physics can address optimization. There are a number of promising physical principles that drive dynamical systems toward an extremum. These are: the principle of Least Action, the principle of Minimum Entropy Generation, and the Variational Principle. Physics also has actual physical annealing which preceded computational Simulated Annealing. Further, Physics has the Adiabatic Principle which in its quantum form is called Quantum Annealing.

In due course, we may learn how to use each of these principles to perform optimization. Let us consider the principle of minimum entropy generation in dissipative physical systems, such as resistive electrical circuits. It was shown by Onsager [2] that the equations of linear systems, like resistor networks, can be re-expressed as the minimization of a function $f(i_1, i_2, \dots, i_n)$ of the currents $i_n$ in the various branches of the resistor network. For a resistor network, the function $f$ contains the power dissipation, or entropy generation. By re-expressing a merit function in terms of power dissipation, the circuit itself will find the minimum of the merit function, or minimum power dissipation. Optimization is generally accompanied by constraints. For example, perhaps the constraint is that the final answers must be restricted to be $\pm 1$. Such a digitally constrained optimization produces answers compatible with any digital computer.

There has been a series of machines created in the physics and engineering community to devise physics-based engines for the Ising problem. The Ising challenge is to find the minimum energy configuration of a large set of magnets. It’s a very hard problem even when the magnets are restricted to only two orientations, North pole up or down [3]. Our main insight in this paper is that most of these Ising solvers use hardware based on the Principle of Minimum Entropy generation. The natural evolution of these machines is toward a good, low-power-dissipation final state. Further, almost all of them implement the well-known Lagrange Multipliers method for constrained optimization.

An early work was by Yamamoto et al. in [4] and this was followed by further work from their group [5], [6], [7], [8], [9], [10], [11] and other groups [12], [13], [14], [15], [16], [17], [18], [19], [20]. These entropy generating machines range from coupled optical parametric oscillators, to RLC electrical circuits, to coupled exciton-polaritons, and silicon photonic coupled arrays. These types of machines have the advantage that they solve digital problems in the analog domain, which can be orders-of-magnitude faster and more energy-efficient than conventional digital chips that are limited by latency and the energy cost [11].

Within the framework of these dissipative machines, constraints can be readily included. In effect, these machines perform constrained optimization equivalent to the technique of Lagrange multipliers. We illustrate this connection by surveying 7 published physically distinct machines and showing that each minimizes entropy generation in its own way, subject to constraints; in fact, they perform Lagrange multiplier optimization. We note here that the systems in [11], [17], and [18] follow their own dynamics and are not related to the method of Lagrange multipliers. The system in [11] will be discussed in Section 5 in the main text while the work in [17] will be discussed in Appendix D.

In effect, physical machines perform local steepest descent in the entropy generation rate. They can become stuck in local optima. At the very least, they perform a rapid search for local optima, thus reducing the search space for the global optimum. These machines are also adaptable toward searching in an expanded phase space, and other techniques for approaching a global optimum.

The paper is organized as follows. In Section 2, we recognize that physics performs optimization through its variational and optimization principles. Then, we concentrate on the principle of minimum entropy generation or minimum power dissipation. In Section 3, we give an overview of the minimum entropy generation optimization solvers in the literature and show how they incorporate constraints. Section 4 has a quick tutorial on the method of Lagrange Multipliers. Section 5 studies five published solvers in detail and shows that they all follow some form of Lagrange multiplier dynamics. In Section 6, we look at those published physics-based solvers which are less obviously connected to Lagrange multipliers. Section 7 presents the application of physics-based solvers to linear regression in statistics. Finally, in Section 8, we conclude and discuss the consequences of this ability to implement physics-based Lagrange multiplier optimization for areas such as machine learning.

2 Optimization in Physics

We survey the minimization principles of physics and the important optimization algorithms derived from them. These physical optimization machines are intended to converge to optima that are ‘agnostic to initial conditions’. By ‘agnostic to initial conditions’, we mean systems that converge to the global optimum, or a good local optimum, irrespective of the initial point for the search.

2.1 Physics Principles and Algorithms

2.1.1 The principle of Least Action

The principle of Least Action is the most fundamental principle in physics. Newton’s Laws of Mechanics, Maxwell’s Equations of Electromagnetism, Schrödinger’s equation in Quantum Mechanics, and Quantum Field Theory can all be interpreted as minimizing a quantity called action. For the special case of light propagation, this reduces to the principle of Least Time, as shown in Fig. 1.

A conservative system without friction or losses evolves according to the principle of Least Action. The fundamental equations of physics are reversible. A consequence of this reversibility is the Liouville Theorem which states that volumes in phase space are left unchanged as the system evolves.

Contrariwise, in both a computer and an optimization solver, the goal is to have a specific solution with less uncertainty or a smaller zone in phase space than the initial state, an entropy cost first specified by Landauer and Bennett. Thus, some degree of irreversibility, or energy cost, is needed, specified by the number of digits in the answer in the Landauer/Bennett analysis. An algorithm has to be designed and programmed into the reversible system (either via software or via hardcoding of hardware) to effect the reduction in entropy needed to solve the optimization problem. Coming up with a fast algorithm for NP-hard problems is still an open problem in the field of reversible computing, which includes quantum computing.

This would require an energy cost, but not necessarily continuous power dissipation. We look forward to computer science breakthroughs that would allow the principle of Least Action to address unsolved problems. An alternative approach to computing would involve physical systems that continuously dissipate power, aiding in the contraction of phase space toward a final solution. While exceeding the Landauer limit, such systems might have the advantage of speed and simplicity. This brings us to the principle of Least Power dissipation.

Figure 1: The principle of Least Time, a subset of the principle of Least Action: The actual path that light takes to travel from point A to point B is the one that takes the least time to traverse. Recording the correct path entails a small energy cost consistent with the Landauer Limit.

2.1.2 The principle of Least Power Dissipation

If we consider systems that continuously dissipate power, we are led to a second optimization principle in physics, the principle of Least Entropy Production or Least Power Dissipation. This principle states that any physical system will evolve into a steady-state configuration that minimizes the rate of entropy production given the constraints (such as fixed thermodynamic forces, voltage sources, or input power) that are imposed on the system. An early version of this statement is provided by Onsager in his celebrated papers on the reciprocal relations [2]. This was followed by further foundational work on this principle by Prigogine [21] and de Groot [22]. This principle is readily seen in action in electrical circuits and is illustrated in Fig. 2. We shall frequently use this principle, as formulated by Onsager, in the rest of the paper.

Figure 2: The principle of Least Power Dissipation: In a parallel connection, the current distributes itself in a manner that minimizes the power dissipation subject to the constraint of fixed input current $I$.

2.1.3 Physical Annealing; Energy Minimization

This technique is widely used in materials science and metallurgy and involves the slow cooling of a system starting from a high temperature. As the cooling proceeds, the system tries to maintain thermodynamic equilibrium by reorganizing itself into the lowest energy minimum in its phase space. Energy fluctuations due to finite temperatures help the system escape from local optima. In theory, this procedure reaches the global optimum as the temperature approaches zero, but the temperature must be lowered prohibitively slowly for this to happen.
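
To make the computational analogue concrete, here is a minimal simulated-annealing sketch in Python for a small Ising energy. The spin count, random couplings, and geometric cooling schedule are illustrative assumptions, not values taken from any of the machines discussed in this paper.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 16                                  # illustrative number of spins
    J = rng.standard_normal((n, n))
    J = (J + J.T) / 2                       # symmetric random couplings
    np.fill_diagonal(J, 0.0)

    def ising_energy(s):
        # E = sum_{i,j} J_ij s_i s_j
        return s @ J @ s

    s = rng.choice([-1.0, 1.0], size=n)     # random initial spin configuration
    T = 2.0                                 # initial temperature
    for step in range(20000):
        i = rng.integers(n)
        dE = -4.0 * s[i] * (J[i] @ s)       # energy change if spin i is flipped
        # Metropolis rule: accept downhill moves always, uphill moves with Boltzmann probability
        if dE < 0 or rng.random() < np.exp(-dE / T):
            s[i] = -s[i]
        T = max(1e-3, T * 0.9995)           # slow geometric cooling

    print("final energy:", ising_energy(s))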

Figure 3: Physical annealing involves the slow cooling down of a system. The system performs gradient descent in configuration space with occasional jumps activated by finite temperature. If the cooling is done slowly enough, the system ends up in the ground state of configuration space.

2.1.4 Adiabatic Method

The Adiabatic Method involves the slow transformation of a system from initial conditions that are easily constructed to final conditions that capture the difficult problem at hand.

More specifically, to solve the Ising problem using this algorithm, one initializes the system of spins in the ground state of a simple Hamiltonian and then slowly varies the parameters of the Hamiltonian to end up with the final Ising Hamiltonian of interest. If the parameters are varied slowly enough, the system ends up in the ground state of the final Hamiltonian and the problem gets solved. In a quantum mechanical system, this is sometimes called ‘quantum annealing’. Several proposals and demonstrations, including the well-known D-Wave machine [23], utilize this algorithm.

The slow rate of variation of the Hamiltonian parameters that is required for this method to work is determined by the minimum energy spacing between the instantaneous ground state and the instantaneous first excited state that occurs as we move from the initial Hamiltonian to the final Hamiltonian. The smaller the gap is for a particular variation schedule, the slower the rate at which we need to perform the variation to successfully solve the problem. It has been shown that the gap can get exponentially small in the worst case, implying that this algorithm can take exponential time in the worst case for NP-hard problems.

Figure 4: A system initialized in the ground state of a simple Hamiltonian continues to stay in the ground state as long as the Hamiltonian is changed slowly enough.

2.1.5 Minimum Entropy Generation in Multi-Oscillator Arrays

Multi-Oscillator Arrays subject to Parametric Gain were introduced in [4] and [7] for solving Ising problems. This can be regarded as a subset of the Principle of Minimum Entropy Generation, which is always subject to a non-zero input power constraint. In this case, gain acts as a boundary condition or constraint for the principle of minimum entropy generation, and the oscillator array must arrange itself to dissipate the least power subject to that constraint. In the context of a multi-coupled-oscillator array with gain, a certain oscillator mode will have the least loss. That mode will grow in amplitude most rapidly. This least-loss mode can be individually selected by increasing the gain until it matches the intrinsic loss of that mode. Then further nonlinear evolution amongst all the modes will occur. If the oscillator array is bistable, as is the case for parametric gain which drives oscillation along the $\pm$ real axis, this becomes the analog of magnetic bistability in an Ising problem. Then, we seek a solution for the lowest magnetic energy state with some oscillators locked at zero phase shift and others locked at $\pi$ phase shift. This mechanism will be the main point of Section 3.

3 Coupled Multi-Oscillator Array Ising Solvers

The motivation for ‘Coupled Multi-Oscillator Array Ising Solvers’ is best explained using concepts from laser physics. As a laser is slowly being turned on, spontaneous emission from the laser gain medium couples into the various cavity modes and begins to become amplified. The different optical modes in the cavity have different loss coefficients due to their differing spatial profiles. As the laser pump/gain increases, the cavity mode with the least loss grows faster than the other modes. Once the gain reaches the threshold gain, further nonlinear evolution amongst all the modes will occur.

The design of the Coupled Multi-Oscillator Array Ising machines tries to map the power losses of the optimization machine to the magnetic energies of various states of the Ising problem. If the mapping is correct, the lowest power configuration will match the energetic ground state of the Ising problem. This is illustrated in Fig. 5. In effect, this is an example of a system evolving toward a state of minimum entropy generation, or minimum power dissipation, subject to the constraint of gain being present.

Figure 5: A lossy multi-oscillator system is provided with gain: The x-axis is a list of all the available modes in the system whereas the y-axis plots the loss coefficient of each mode. Gain is provided to the system and is gradually increased. As in single-mode lasers, the lowest loss mode, illustrated by the blue dot, grows exponentially, saturating the gain. Above threshold we can expect further nonlinear evolution among the modes so as to minimize power dissipation.

The archetypal solver in this class consists of a network of interconnected oscillators, driven by phase-dependent parametric gain. Parametric gain amplifies only the cosine quadrature and causes the electric field to lie along the $\pm$Real axis in the complex plane. The phase of the electric field (0 or $\pi$) can be used to represent $\pm$ spin in the Ising problem. The resistive interconnections between the oscillators are designed to favor ferromagnetic or anti-ferromagnetic ‘spin-spin’ interactions by the Principle of Minimum Entropy Generation, subject to the constraint of parametric (phase-dependent) gain as the power input. The parametric gain favors oscillation along the real axis of the complex plane, where the positive real axis would correspond to spin up, and the negative real axis would correspond to spin down.

The Voltage (or Current) input constraint is very important to the Principle of Minimum Entropy Generation. If there were no power input constraint, all the currents and voltages would be zero, and the minimum power dissipated would be zero. In the case of the Coupled Multi-Oscillator circuit, the power input is produced through a gain mechanism, or a gain module. The constraint could be the voltage input to the gain module. But if the gain were to be too small, it might not exceed the corresponding circuit losses, and the current and voltage would remain near zero. Thus, there is usually a threshold gain requirement, when applying the Principle of Minimum Entropy Generation to the Coupled Multi-Oscillator circuit.

If the gain is insufficient, the circuit will achieve minimum entropy generation at negligible currents and voltages. The intrinsic losses in the network would dominate and the circuit currents and voltages would be near zero. If the pump gain is then gradually ramped up, the oscillatory mode requiring the least threshold gain begins oscillating. Upon reaching the threshold gain, a non-trivial current distribution built on the bistability of the Coupled Multi-Oscillator circuit will emerge. As the gain exceeds the required threshold, there will be further nonlinear evolution among the modes so as to minimize power dissipation. The final state “spin” configuration, dissipating the lowest power (or entropy generation), emerges as the sought-for optimum.

Ideally, the gain evolution will be controlled by the Lagrange function to find the local minimum power dissipation configuration as will be discussed in Section 5.5. With Minimum Entropy Generation, as with most optimization schemes, it is difficult to guarantee a global optimum.

In optimization, each constraint contributes a Lagrange multiplier. We will show that the gains of the oscillators are the Lagrange multipliers of the constrained system. In the next section, we provide a brief tutorial on Lagrange Multiplier optimization.

4 Lagrange Multiplier Optimization Tutorial

The method of Lagrange multipliers is a very well-known procedure for solving constrained optimization problems. In constrained optimization, the optimal point $\bm{x}^* \equiv (x, y)$ in multi-dimensional solution space locally optimizes the merit function $f(\bm{x})$ subject to the constraint $g(\bm{x}) = 0$. The optimal point has the property that the slope of the merit function is zero as infinitesimal steps are taken away from $\bm{x}^*$, as taught in calculus. But these deviations are restricted to the constraint curve, as shown in Fig. 6. The iso-contours of the function $f(\bm{x})$ increase until they are limited by, and just touch, the constraint curve $g(\bm{x}) = 0$ at the point $\bm{x}^*$.

Figure 6: Maximization of the function $f(x, y)$ subject to the constraint $g(x, y) = 0$. At the constrained local optimum, the gradients of $f$ and $g$, namely $\nabla f(x, y)$ and $\nabla g(x, y)$, are parallel.

At the point of touching, $\bm{x}^*$, the gradient of $f$ and the gradient of $g$ are parallel to each other. This can be stated formally as:

\bm{\nabla}f(\bm{x}^{*})=\lambda^{*}\bm{\nabla}g(\bm{x}^{*}).

The proportionality constant $\lambda^*$ is called the Lagrange multiplier corresponding to the constraint $g(\bm{x}) = 0$.

Although Fig. 6 shows only a 2-dimensional optimization space, in general let us take the optimization space $\bm{x}$ to be $n$-dimensional. When we have multiple constraints $g_1, \dots, g_p$, we correspondingly expand the $n$-dimensional Lagrange Multiplier requirement:

\bm{\nabla}f(\bm{x}^{*})=\sum_{i=1}^{p}\lambda_{i}^{*}\bm{\nabla}g_{i}(\bm{x}^{*}),

where the gradient vector $\bm{\nabla}$ represents $n$ equations, accompanied by the $p$ constraint equations $g_i(\bm{x}) = 0$, resulting in $n + p$ equations. These equations solve for the $n$ components of the vector $\bm{x}^*$ and the $p$ unknown Lagrange Multipliers $\lambda_i^*$. That is $n + p$ equations for $n + p$ unknowns.

Motivated by the above condition, let us introduce a Lagrange function $L(\bm{x}, \bm{\lambda})$ defined as follows:

L(\bm{x},\bm{\lambda})=f(\bm{x})+\sum_{i=1}^{p}\lambda_{i}g_{i}(\bm{x}),

which can be optimized by gradient descent or other methods to solve for $\bm{x}^*$ and $\bm{\lambda}^*$. The full theory of Lagrange multipliers, and the popular ‘Augmented Lagrangian Method of Multipliers’ algorithm used to solve for locally optimal $(\bm{x}^*, \bm{\lambda}^*)$, are discussed in great detail in [24] and [25]. A gist of the main points is presented in Appendices A, B, C.
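
As a minimal numerical sketch of this procedure (gradient descent in the variables, gradient ascent in the multiplier, as elaborated in the appendices), consider the hypothetical toy problem of minimizing $f(x, y) = x^2 + y^2$ subject to $g(x, y) = 1 - x - y = 0$; the problem, step sizes, and iteration count are illustrative assumptions, not taken from the paper.

    import numpy as np

    # Hypothetical toy problem (not from the paper): minimize f(x, y) = x^2 + y^2
    # subject to g(x, y) = 1 - x - y = 0.  The constrained optimum is x = y = 1/2, lambda = 1.
    def grad_f(v):
        return 2.0 * v

    def g(v):
        return 1.0 - v[0] - v[1]

    def grad_g(v):
        return np.array([-1.0, -1.0])

    v = np.array([0.0, 0.0])     # decision variables (x, y)
    lam = 0.0                    # Lagrange multiplier
    step_v, step_lam = 0.05, 0.05

    for _ in range(2000):
        # L(v, lambda) = f(v) + lambda * g(v)
        v = v - step_v * (grad_f(v) + lam * grad_g(v))    # gradient descent in the variables
        lam = lam + step_lam * g(v)                       # gradient ascent in the multiplier

    print(v, lam)                # approaches [0.5, 0.5] and 1.0

The iteration settles at the constrained optimum $x = y = 1/2$ with $\lambda = 1$, where the constraint is satisfied and the gradients of $f$ and $g$ are parallel.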

For the specific case of the Ising problem, the objective function is given by $f(\bm{\mu}) = \sum_{i,j} J_{ij}\, \bm{\mu}_i \cdot \bm{\mu}_j$, where $f(\bm{\mu})$ is the magnetic Ising Energy and $\bm{\mu}_i$ is the $i$-th magnetic moment vector. For the optimization method represented in this paper, we need a circuit or other physical system whose power dissipation is also $f(\bm{x}) = \sum_{i,j} J_{ij} x_i x_j$, but now $f(\bm{x})$ is power dissipation, not energy, $x_i$ is a variable that represents voltage, current, or electric field, and the $J_{ij}$ are not magnetic energies but rather resistive coupling elements. The correspondence is between magnetic spins quantized along the z-axis, $\mu_{zi} = \pm 1$, and the circuit variable $x_i = \pm 1$.

While ‘energy’ and ‘power dissipation’ are represented by different units, we nonetheless need to establish a correspondence between them. For every optimization problem, there is the challenge of finding a physical system whose power dissipation function represents the desired equivalent optimization function.

If the Ising problem has $n$ spins, there are also $p = n$ constraints, one for each of the spins. A sufficient constraint is: $g_i(\bm{x}) = 1 - x_i^2$. More complicated nonlinear constraints can be envisioned, but $(1 - x_i^2)$ could represent the first two terms in a more complicated constraint Taylor expansion.

Therefore, a sufficient Lagrange function for the Ising problem, with digital constraints, is given by:

L(\bm{x},\bm{\lambda})=\sum_{i=1}^{n}\sum_{j=1}^{n}J_{ij}x_{i}x_{j}+\sum_{i=1}^{n}\lambda_{i}\left(1-x_{i}^{2}\right),

where $\lambda_i$ is the Lagrange Multiplier associated with the corresponding constraint. We shall see in the next section that most analog algorithms that have been proposed for the Ising problem in the literature actually tend to optimize some version of the above Lagrange function.
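
A small numerical sketch of this Lagrange function (with hypothetical random $\pm 1$ couplings, not a benchmark instance) shows that on the constraint surface $x_i = \pm 1$ the penalty term vanishes, so $L$ reduces to the Ising energy regardless of the multipliers:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 8                                     # illustrative number of spins
    J = rng.choice([-1.0, 1.0], size=(n, n))
    J = np.triu(J, 1); J = J + J.T            # symmetric couplings, zero diagonal

    def ising_energy(x):
        return x @ J @ x

    def lagrange_function(x, lam):
        # L(x, lambda) = sum_ij J_ij x_i x_j + sum_i lambda_i (1 - x_i^2)
        return ising_energy(x) + lam @ (1.0 - x**2)

    x_digital = rng.choice([-1.0, 1.0], size=n)   # any point on the constraint surface
    lam = rng.random(n)                           # arbitrary multipliers
    print(lagrange_function(x_digital, lam), ising_energy(x_digital))   # identical values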

5 The Physical Ising Solvers

In this section, we discuss each physical procedure proposed in the literature and show how each physical scheme implements the method of Lagrange multipliers. They all obtain good performance on the Gset benchmark problem set [26], and many of them demonstrate that they perform better than the heuristic algorithm, Breakout Local Search [27]. The main result of our work is the realization that the pump gains used in all the physical methods are in fact Lagrange multipliers.

The available physical solvers in the literature are as follows:

  1. Optical Parametric Oscillators
  2. Coupled Radio Oscillators on the Real Axis
  3. Coupled laser cavities using multicore fibers
  4. Coupled Radio Oscillators on the Unit Circle
  5. Coupled polariton condensates

Then there are a number of schemes that also rely upon a variant of minimum entropy production or power dissipation:

  6. Iterative Analog Matrix Multipliers
  7. Leleu Mathematical Ising Solver

In Appendix D, there is a scheme that appears unconnected with the minimum entropy production rate:

  8. Adiabatic coupled radio oscillators (Toshiba)

We shall see that methods 1, 2, 4 in the literature use only one gain for all the oscillators, which is equivalent to imposing only one constraint. The other methods, 3, 5, 6, use different gains for each spin and correctly capture the full $n$ constraints of the Ising problem.

5.1 Optical Parametric Oscillators

5.1.1 Overview

An early optical machine for solving the Ising problem was presented by Yamamoto et al. [4] and [8]. Their system consists of several pulses of light circulating in an optical fiber loop, with the phase of each light pulse representing an Ising spin. In parametric oscillators, gain occurs at half the pump frequency. If the gain overcomes the intrinsic losses of the fiber, the optical pulse builds up. Parametric amplification provides phase dependent gain. It restricts the oscillatory phase to the Real Axis of the complex plane. This leads to bistability along the positive or negative real axis, allowing the optical pulses to mimic the bistability of magnets.

In the Ising problem, there is magnetic coupling between spins. The corresponding coupling between optical pulses is achieved by controlled optical interactions between the pulses. In Yamamoto’s approach, one pulse $i$ is first plucked out by an optical gate, amplitude modulated by the proper connection weight specified in the $J_{ij}$ Ising Hamiltonian, and then reinjected and superposed onto the other optical pulse $j$, producing constructive or destructive interference, representing ferromagnetic or anti-ferromagnetic coupling.

By providing saturation to the pulse amplitudes, the optical pulses will finally settle down, each to one of the two bistable states. We will find that the pulse amplitude configuration evolves exactly according to the Principle of Minimum Entropy Generation. If the magnetic dipole solutions in the Ising problem are constrained to $\pm 1$, then each constraint is associated with a Lagrange Multiplier. We find that each Lagrange Multiplier turns out to be equal to the gain or loss associated with the corresponding oscillator.

5.1.2 Lagrange multipliers as gain coefficients

As an example, Yamamoto et al. [7] analyze their parametric oscillator system using coupled wave equations for the slowly varying amplitudes of the circulating optical modes. We now show that the coupled-wave-equation approach reduces to finding an extremum of the system’s ‘Entropy Generation’ or ‘power dissipation’. The coupled-wave equation for parametric gain of the slowly varying amplitude $c_i$ of the in-phase cosine component of the $i$-th optical pulse (representing a magnetic spin in an Ising system) is as follows:

\frac{dc_{i}}{dt}=(-\alpha_{i}+\gamma_{i})c_{i}-\sum_{j}J_{ij}c_{j}, \qquad (1)

where the weights $J_{ij}$ are the ferromagnetic or anti-ferromagnetic magnetic cross-couplings, and $\gamma_i$ represents the phase-dependent parametric gain given by the pump to the $i$-th circulating pulse, whereas $\alpha_i$ is the corresponding dissipative loss. The amplitudes $c_i$ represent the optical cosine-wave electric field amplitudes. In the case of circuits, the voltage amplitudes can be expressed as $V_i = c_i + \text{j} s_i$, where $c_i$ and $s_i$ represent the cosine and sine quadrature components of the voltage, and j is the unit imaginary. For clarity of discussion, we dropped the cubic terms in (1) that Yamamoto et al. originally had. A discussion of these terms is given in Appendix C.

Owing to the nature of parametric amplification, the quadrature sine-wave components $s_i$ of the electric field amplitude die out rapidly. The rate of entropy generation, or net power dissipation $h$, including the negative dissipation associated with gain, can be written:

h(c_{1},\dots,c_{n})=\sum_{i,j}J_{ij}c_{i}c_{j}+\sum_{i}\alpha_{i}c_{i}^{2}-\sum_{i}\gamma_{i}c_{i}^{2}, \qquad (2)

If we minimize the entropy generation $h(\bm{c})$ without invoking any constraints, that is, with $\gamma_i = 0$, the amplitudes $c_i$ simply go to zero, which generates the minimum entropy.

If the gain $\gamma_i$ is large enough, some of the amplitudes might go to infinity. To avoid this, we may employ the $n$ constraint functions $g_i(c_i) = (1 - c_i^2) = 0$, which enforce a digital $c_i = \pm 1$ outcome. (Actually, a constraint of the form $g_i(c_i) = (1 - c_i^2) = 0$ is quite general in the sense that $(1 - c_i^2)$ can represent the first two terms of the Taylor Series of an arbitrary constraint.) Adding the constraint functions to the entropy generation yields the Lagrange function, including the constraint functions times the respective Lagrange Multipliers:

L(\bm{c},\bm{\gamma})=\sum_{i,j}J_{ij}c_{i}c_{j}+\sum_{i}\alpha_{i}c_{i}^{2}-\sum_{i}\gamma_{i}\left(c_{i}^{2}-1\right), \qquad (3)

Comparing the unconstrained (2) to the constrained (3), they differ only in the final $(-1)$ term, which effectively constrains the amplitudes and prevents them from diverging to $\infty$. Expression (3) is the Lagrange function given at the end of Section 4. Surprisingly, the gains $\gamma_i$ emerge to play the role of Lagrange Multipliers. This means that each mode, represented by the subscript in $c_i$, must adjust to a particular gain $\gamma_i$ which minimizes the overall entropy generation, and the respective gains $\gamma_i$ represent the Lagrange Multipliers. Minimization of the Lagrange function (3) provides the final steady state of the system dynamics.

If the circuit or optical system is designed to dissipate power, or equivalently generate entropy, in a mathematical form that matches the magnetic energy in the Ising problem, then the dissipative system will seek out a corresponding local optimum configuration of the magnetic Ising energy.

Such a physical system, constrained to $c_i = \pm 1$, is digital in the same sense as a flip-flop circuit, but unlike the von Neumann computer, the inputs are resistor weights for power dissipation. Nonetheless, a physical system can evolve in a direct manner, without the need for shuttling information back and forth as in a von Neumann computer, providing faster answers. Without the communications overhead but with the higher operation speed, the energy dissipated to arrive at the final answer will be less, in spite of the circuit being required to generate entropy during its evolution toward the final state.

To achieve minimum entropy production, the amplitudes $c_i$ and the Lagrange Multipliers $\gamma_i$ must all be simultaneously optimized. While a circuit will evolve toward optimal amplitudes $c_i$, the gains $\gamma_i$ must arise from a separate active circuit. Ideally, the active circuit which controls the Lagrange Multiplier gains $\gamma_i$ would have its entropy production included with the main circuit. A more common method is to provide gain that follows a heuristic rule using an external feedback circuit. For example, Yamamoto et al. follow the heuristic rule $\gamma_i = a + bt$. It is not yet clear whether the heuristic-based approach toward gain evolution is as effective as simply lumping together all main-circuit and feedback components and minimizing the total power dissipation.
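
The following sketch integrates Eq. (1) numerically with the heuristic gain ramp $\gamma_i = a + bt$. A cubic saturation term stands in for the nonlinear terms that were dropped from Eq. (1) for clarity, and all parameter values and couplings are illustrative assumptions rather than values from [4] or [7].

    import numpy as np

    rng = np.random.default_rng(2)
    n = 10
    J = rng.choice([-1.0, 1.0], size=(n, n))
    J = np.triu(J, 1); J = J + J.T          # illustrative Ising couplings, zero diagonal

    alpha = 1.0                             # uniform linear loss (illustrative)
    a, b = 0.0, 0.05                        # heuristic gain ramp gamma(t) = a + b*t
    dt, steps = 0.01, 4000

    c = 1e-3 * rng.standard_normal(n)       # small random initial amplitudes ("noise")
    for k in range(steps):
        gamma = a + b * (k * dt)
        # Eq. (1) plus a cubic saturation term standing in for the dropped nonlinearity
        c = c + dt * ((-alpha + gamma) * c - J @ c - c**3)

    spins = np.sign(c)                      # phase 0 or pi read out as spin +/-1
    print(spins, "Ising energy:", spins @ J @ spins)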

We conclude this subsection by noting that the Lagrangian, Eq (3), corresponds to Lagrange multiplier optimization using the following merit function and constraints:

f(\bm{c})=\sum_{i,j}J_{ij}c_{i}c_{j}+\sum_{i}\alpha_{i}c_{i}^{2},
g_{i}(c_{i})=\left(1-c_{i}^{2}\right)=0.

5.1.3 Conclusions from Optical Parametric Oscillators

  1. Physical systems minimize the entropy production rate, or power dissipation, subject to input constraints of voltage, amplitude, gain, etc.
  2. These systems actually perform Lagrange Multiplier optimization.
  3. Indeed, it is the gain $\gamma_i$ in each oscillator $i$ that plays the role of the corresponding Lagrange Multiplier.
  4. If the Lagrange function is split in such a way that only the Ising function, $\sum_{i,j} J_{ij} c_i c_j$, is treated as the merit function $f(\bm{c})$, then the Lagrange multipliers corresponding to the constraints $g_i$ are the net gains, $\gamma_i - \alpha_i$.
  5. Under the digital constraint, amplitudes $c_i = \pm 1$, entropy generation minimization schemes are actually binary, similar to a flip-flop.

5.2 Coupled Radio Oscillators on the real axis

5.2.1 Overview

A coupled LC oscillator system with parametric amplification was analyzed in the circuit simulator SPICE by Xiao et al. [12]. Analogous to the optical Yamamoto system, this system consists of a network of radio-frequency LC oscillators coupled to one another through resistive connections. The LC oscillators contain linear inductors but nonlinear capacitors which provide the parametric gain. The parallel or cross-connect resistive connections between the oscillators are designed to implement the ferromagnetic or anti-ferromagnetic couplings $J_{ij}$ between magnetic dipole moments $\mu_i$, as shown in Fig. 7. The corresponding phase of the voltage amplitude $V_i$, 0 or $\pi$, determines the sign of the magnetic dipole moment $\mu_i$.

Figure 7: Coupled LC oscillator circuit for two coupled magnets. The oscillation of the LC oscillators represents the magnetic moments while the parallel or antiparallel cross-connections represent ferromagnetic or anti-ferromagnetic coupling, respectively. The nonlinear capacitors are pumped by $V(2\omega_0)$ at frequency $2\omega_0$, providing parametric gain at $\omega_0$.

The nonlinear capacitors are pumped by a voltage $V(2\omega_0)$ at frequency $2\omega_0$, where the LC oscillator natural frequency is $\omega_0$. Second-harmonic pumping leads to parametric amplification in the oscillators. As in the optical case, parametric amplification plays the dual role of generating/sustaining voltage oscillations as well as imposing phase bistability on the ac voltages in the oscillators. The bistability refers to gain along the $\pm$Real-Axis defined by time synchronization with the $2\omega_0$ pump.

The $2\omega_0$ pump induces gain $\gamma_i$ in the Real-Axis quadrature. As in the case of optical parametric amplifier machines, ideally an active circuit would control the Lagrange Multiplier gains $\gamma_i$, and the gain control circuit would have its entropy production included with the main circuit. A more common approach is to provide gain that follows a heuristic rule.

As in the optical parametric amplifier case, a mechanism is needed to prevent the parametric gain from producing infinite amplitude signals. Zener diodes can be inserted into the circuit to restrict the amplitudes to finite saturation values or to digitally defined values. With the diodes in place, the circuit settles into a voltage phase configuration, 0 or π\pi, that minimizes net power dissipation for a given pump gain.

5.2.2 Lagrange function and Lagrange multipliers

The amplitudes can be defined by the voltages across the LC oscillator capacitors and derived from Kirchhoff's voltage and current equations as in Xiao [12]. Performing the slowly-varying amplitude approximation on the cosine components of these voltages, $c_i$, Xiao obtains the following equation of motion:

\frac{dc_{i}}{dt}=\frac{1}{4R_{c}C_{0}}\sum_{j}{J_{ij}c_{j}}-\alpha_{i}c_{i}+\gamma_{i}c_{i}, \qquad (4)

where the $c_i$ are the peak voltage amplitudes in units of a reference voltage, $R_c$ is the resistance of the coupling resistors, the cross-couplings $J_{ij}$ are assigned binary values $J_{ij} = \pm 1$, $C_0$ is the linear part of the capacitance in each oscillator, $n$ is the number of oscillators, $\omega_0$ is the natural frequency of all the oscillators, and the parametric gain constant is $\gamma = \omega_0 |\Delta C| / 4C_0$, where $|\Delta C|$ is the capacitance modulation at the second harmonic. In the decay constant $\alpha = (n-1)/(4 R_c C_0)$, there are $(n-1)$ resistors $R_c$ in parallel, since it is assumed that each oscillator can give up energy to all the other $(n-1)$ oscillators, with the coupling resistors acting in parallel. In this simplified model all decay constants $\alpha$ are taken as equal, and moreover each oscillator experiences exactly the same parametric gain $\gamma$; conditions that can be relaxed if needed.

The number 4 is present in the first denominator since, for two coupled LC circuits as shown in Fig. 7, the decay is controlled by the RC time of the capacitors in parallel and the $R_c$ resistors in series. Likewise, the number 4 appears for loss and parametric gain, but for different reasons in each case.

We note that equation (4) above performs gradient descent on the net power dissipation function:

h(\bm{c},\gamma)=-\frac{1}{4R_{c}}\sum_{i,j}J_{ij}c_{i}c_{j}+\sum_{i}\alpha C_{0}c_{i}^{2}-\sum_{i}\gamma C_{0}c_{i}^{2}, \qquad (5)

which is very similar to Section 5.1. On the right-hand side, the reason for the 4 in the first denominator is to compensate for double counting over $i \neq j$. The first two terms on the right-hand side together represent the dissipative losses in the coupling resistors while the third term is the negative of the gain provided to the system of oscillators.

Next, we obtain the following Lagrange function through the same replacement of $(-c_i^2)$ with $(1 - c_i^2)$ that we performed in Section 5.1:

L(\bm{c},\gamma)=-\frac{1}{4R_{c}}\sum_{i,j}J_{ij}c_{i}c_{j}+\sum_{i}\alpha C_{0}c_{i}^{2}-\sum_{i}\gamma C_{0}\left(c_{i}^{2}-1\right), \qquad (6)

The above Lagrangian corresponds to Lagrange multiplier optimization using the following merit function and constraints:

f(\bm{c})=-\frac{1}{4R_{c}}\sum_{i,j}J_{ij}c_{i}c_{j}+\sum_{i}\alpha C_{0}c_{i}^{2},
g(\bm{c})=\sum_{i}\left(1-c_{i}^{2}\right)=0.

Again, we see that the gain coefficient $\gamma$ is the Lagrange multiplier of the constraint $g = 0$.

5.2.3 Iterative Optimization as a Series of Time Steps

Although the extremum of Eq. (6) represents the final evolved state of a physical system representing an optimization outcome, it is interesting to examine the time evolution toward the optimal state. The optimization occurs by iterative steps, where each iteration can be regarded as taking place in a time interval $\Delta t$. At each successive iteration, the voltage amplitude $c_i$ takes a step whose magnitude is proportional to the gradient of the Lagrange function:

c_{i}(t+\Delta t)=c_{i}(t)-\kappa\Delta t\frac{\partial}{\partial c_{i}}L(\bm{c},\gamma), \qquad (7)

where the minus sign on the right-hand side drives the system toward minimum power dissipation. As the Lagrange function comes closer to its minimum, the gradient $\frac{\partial}{\partial c_{i}}L(\bm{c},\gamma)$ diminishes and the amplitude steps become smaller and smaller. The adjustable proportionality constant $\kappa$ controls the size of each iterative step; it also calibrates the dimensional units between power dissipation and voltage amplitude. (Since $c_i$ is a voltage amplitude, $\kappa$ has units of reciprocal capacitance.) Converting Eq. (7) to continuous time,

\frac{dc_{i}}{dt}=-\kappa\frac{\partial}{\partial c_{i}}L(\bm{c},\gamma), \qquad (8)
\frac{dc_{i}}{dt}=-\kappa\left(\frac{\partial}{\partial c_{i}}f(c_{i})+\sum_{j}\gamma_{j}\frac{\partial}{\partial c_{i}}g_{j}(c_{j})\right), \qquad (9)

where the $\gamma_j$ play the role of Lagrange multipliers and the $g_j = 0$ are the constraints. Taking $L(\bm{c},\gamma)$ from Eq. (6), the gradient in the voltage amplitudes becomes

\frac{\partial}{\partial c_{i}}L(\bm{c},\gamma)=-\frac{1}{2R_{c}}\sum_{j}{J_{ij}c_{j}}+2\alpha C_{0}c_{i}-2\gamma C_{0}c_{i}.

Substituting into Eq (8), the time derivative of the voltage amplitudes becomes

\frac{dc_{i}}{dt}=2\kappa\left(\frac{1}{4R_{c}}\sum_{j}{J_{ij}c_{j}}-\alpha C_{0}c_{i}+\gamma C_{0}c_{i}\right). \qquad (10)

The constant $\kappa$ can be absorbed into the units of time to reproduce the dynamical equation (4), the slowly-varying amplitude approximation for the coupled radio oscillators. Thus, it is interesting that the slowly-varying time dynamics can be reproduced from iterative optimization steps on the Lagrange function.
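
The correspondence can be checked numerically: at any amplitude state, the gradient of the Lagrange function of Eq. (6) reproduces, up to the constant $\kappa$ absorbed into the time units, the right-hand side of the slowly-varying-amplitude equation (4). The sketch below performs this check with illustrative, hypothetical circuit values.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 6
    J = rng.choice([-1.0, 1.0], size=(n, n))
    J = np.triu(J, 1); J = J + J.T          # illustrative couplings, zero diagonal

    Rc, C0 = 1.0, 1.0                       # illustrative circuit values
    alpha = (n - 1) / (4 * Rc * C0)         # decay constant as defined below Eq. (4)
    gamma = 1.2 * alpha                     # pump gain slightly above the loss
    c = rng.standard_normal(n)              # an arbitrary amplitude state

    # Gradient of the Lagrange function of Eq. (6)
    dL_dc = -(1 / (2 * Rc)) * (J @ c) + 2 * alpha * C0 * c - 2 * gamma * C0 * c

    # Right-hand side of the slowly-varying-amplitude equation, Eq. (4)
    rhs_eq4 = (1 / (4 * Rc * C0)) * (J @ c) - alpha * c + gamma * c

    # With kappa = 1/(2*C0) absorbed into the time units, -kappa * dL/dc reproduces Eq. (4)
    kappa = 1 / (2 * C0)
    print(np.allclose(-kappa * dL_dc, rhs_eq4))   # True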

5.2.4 Conclusions from Coupled Oscillators on the Real Axis

  1. As in other cases, the coupled LC oscillator system [12] minimizes the entropy production rate, or power dissipation, incorporating the power input from the pump gain.
  2. The coupled LC oscillator system actually performs Lagrange Multiplier optimization whose Merit Function is the Lagrange Function.
  3. The gain $\gamma_i$ in each oscillator $i$ plays the role of the corresponding Lagrange Multiplier.
  4. Under the amplitude constraint on the voltage cosine component, $c_i = \pm 1$, the entropy generation minimization scheme is actually binary, similar to a flip-flop.
  5. Successive iterations toward a power dissipation minimum, employing gradient descent in the amplitudes of the cosine components of the voltages $c_i$, surprisingly yield the same time-dependent slowly-varying amplitude differential equation (4).

5.3 Coupled laser cavities using multicore fibers

5.3.1 Overview

The Ising solver designed by Babaeian et al., [13], makes use of coupled laser modes in a multicore optical fiber. Polarized light in each core of the optical fiber corresponds to each magnetic moment in the Ising problem. This means that the number of cores needs to be equal to the number of magnets in the given Ising instance. The right-hand circular polarization and left-hand circular polarization of the laser light in each core represent the two polarities (up and down) of the corresponding magnet. The mutual coherence of the various cores is maintained by injecting seed light from a master laser.

The coupling between the fiber cores is achieved through amplitude mixing of the laser modes by Spatial Light Modulators at one end of the multicore fiber [13]. These Spatial Light Modulators couple light amplitude from the $i$-th core to the $j$-th core according to the prescribed connection weight $J_{ij}$.

5.3.2 Equations and comparison with Lagrange multipliers

As in prior physical examples, the electric field amplitudes can be expressed in the slowly-varying polarization modes of the $i$-th core, $E_{iL}$ and $E_{iR}$, where the two electric field amplitudes are in-phase temporally, are positive real, but have different polarization. They are,

\frac{d}{dt}E_{iL}=-\alpha_{i}E_{iL}+\gamma_{i}E_{iL}+\frac{1}{2}\sum_{j}{J_{ij}\left(E_{jR}-E_{jL}\right)},
\frac{d}{dt}E_{iR}=-\alpha_{i}E_{iR}+\gamma_{i}E_{iR}-\frac{1}{2}\sum_{j}{J_{ij}\left(E_{jR}-E_{jL}\right)},

where $\alpha_i$ is the decay rate in the optical core in 1/sec units, and $\gamma_i$ is the gain supplied to the $i$-th core. The first term on the right in both equations represents optical fiber losses, while the second term represents the gain provided. The third term represents the coupling between the $j$-th and $i$-th cores that is provided by the Spatial Light Modulators. We next define the degree of polarization as $\mu_i \equiv E_{iL} - E_{iR}$. (This differs from the usual definition of the degree of polarization of an optical beam, which contains intensity rather than electric field amplitude.) Subtracting the two equations above, we obtain the following evolution equation for $\mu_i$:

\frac{d}{dt}\mu_{i}=-\alpha_{i}\mu_{i}+\gamma_{i}\mu_{i}+\sum_{j}{J_{ij}\mu_{j}}.

The power dissipation, or entropy generation, is proportional to the two orthogonal components of the electric field, each squared, $|E_{iL}|^2 + |E_{iR}|^2$. This can also be written $|E_{iL} - E_{iR}|^2 + |E_{iL} + E_{iR}|^2 = |\mu_i|^2 + |E_{iL} + E_{iR}|^2$. The term $|E_{iL} + E_{iR}|^2$ can be regarded as relatively constant as energy switches back and forth between right and left circular polarization. Then the changes in power dissipation, or entropy generation, $h(\bm{\mu})$, are most influenced by the quadratic terms in $\bm{\mu}$:

h(\bm{\mu},\bm{\gamma})=\sum_{i}\alpha_{i}\mu_{i}^{2}+\sum_{i,j}J_{ij}\mu_{i}\mu_{j}-\sum_{i}\gamma_{i}\mu_{i}^{2}.

As before, we add the $n$ digital constraints $g_i(\mu_i) = 1 - \mu_i^2 = 0$, where $\mu_i = \pm 1$ represents fully left or right circular optical polarization. With the constraints, the corresponding Lagrange function is:

L(\bm{\mu},\bm{\gamma})=\sum_{i}\alpha_{i}\mu_{i}^{2}+\sum_{i,j}J_{ij}\mu_{i}\mu_{j}-\sum_{i}\gamma_{i}\left(\mu_{i}^{2}-1\right).

Once again, the gains $\gamma_i$ play the role of Lagrange multipliers. Thus, a minimization of the power dissipation, subject to the optical gain $\gamma_i$, solves the Ising problem defined by the same $J_{ij}$ couplings.

The power dissipation function and constraint function in the Lagrange function above are:

f(\bm{\mu})=\sum_{i}\alpha_{i}\mu_{i}^{2}+\sum_{i,j}J_{ij}\mu_{i}\mu_{j},
g_{i}(\mu_{i})=\left(1-\mu_{i}^{2}\right)=0.

5.3.3 Conclusions for Coupled Multicore Fibers

  1. The coupled multicore fiber system [13] minimizes the entropy production rate, or power dissipation, which includes the power input from the pump gain, as in Sections 5.1 and 5.2.
  2. The coupled multicore fiber system actually performs Lagrange Multiplier optimization.
  3. The gain $\gamma_i$ in each fiber $i$ plays the role of the corresponding Lagrange Multiplier.
  4. Under the constraint of complete light polarization, $\mu_i = \pm 1$, the entropy generation minimization procedure is actually binary, similar to a flip-flop.

5.4 Coupled Electrical Oscillators on the Unit Circle

5.4.1 Overview

In this section, we consider a network of interconnected, nonlinear, but amplitude-stable electrical oscillators designed by Roychowdhury et al. [14] to represent a conventional Ising system for which we seek a digital solution with each magnetic dipole $\mu_{iz} = \pm 1$ along the z-axis in magnetic dipole space. To solve this, Roychowdhury et al. provide a dissipative system of LC oscillators, somewhat similar to the Optical Parametric Oscillators in Section 5.1, but with the oscillation amplitude clamped, and the oscillation phase $\phi_i = 0$ or $\pi$ revealing the preferred magnetic dipole orientation $\mu_{iz} = \pm 1$. It is noteworthy that Roychowdhury goes beyond Ising machines and constructs general digital logic gates using these amplitude-stable oscillators in [28].

In the construction of their oscillators, Roychowdhury et al. [14] use nonlinear elements that behave like negative resistors at low voltage amplitudes but whose resistance saturates at high voltage amplitudes. This produces amplitude-stable oscillators. In addition, Roychowdhury et al. [14] provide a second-harmonic pump and use a form of parametric amplification (referred to as sub-harmonic injection locking in [14]) to obtain bistability with respect to phase.

The dynamics of the amplitude saturation is purposely very fast, and the oscillators are essentially clamped to an amplitude limit dictated by the nonlinear resistor in each oscillator. The phase dynamics induced by the second-harmonic pump are slower. It is the readout of these phase shifts, 0 or $\pi$, that provides the magnetic dipole orientation $\mu_{iz} = \pm 1$. One key difference between this system and Yamamoto’s Optical Parametric Oscillator (OPO) system is that the OPO system had fast phase dynamics and slow amplitude dynamics, while the injection-locking system has the reverse.

5.4.2 Equations and comparison with Lagrange Multipliers

Roychowdhury et al. [14] derived the dynamics of their amplitude stable oscillator network using perturbation concepts developed in [29]. While a circuit diagram is not shown, [14] invokes the following dynamical equation for the phases of their electrical oscillators:

\frac{d\phi_{i}}{dt}=-\frac{1}{R_{c}}\sum_{j}{J_{ij}\sin{\left(\phi_{i}(t)-\phi_{j}(t)\right)}}-\lambda_{i}\sin{\left(2\phi_{i}(t)\right)}, \qquad (11)

where $R_c$ is a coupling resistance in their system, $\phi_i$ is the phase of the $i$-th electrical oscillator, and the $\lambda_i$ are decay parameters that dictate how fast the phase angles settle toward their steady-state values.

We shall now show that Eq. (11) can be reproduced by iteratively minimizing the power dissipation in their system. The power dissipated across a resistor is $(V_1 - V_2)^2 / R_c$, where $(V_1 - V_2)$ is the voltage difference across the resistor $R_c$. Since $V_1$ and $V_2$ are sinusoidal, the power dissipation consists of constant terms and a cross-term of the form:

f(\phi_{1},\phi_{2})=\frac{|V|^{2}\cos{\left(\phi_{1}-\phi_{2}\right)}}{R_{c}},

where $f(\phi_1, \phi_2)$ is the power dissipated in the resistors. Parallel or anti-parallel magnetic dipole orientation is represented by whether the electrical oscillator phase difference is $\phi_1 - \phi_2 = 0$ or $\phi_1 - \phi_2 = \pi$, respectively. We are permitted to choose an origin for angle space at $\phi = 0$, which implies $\phi_i = 0$ or $\phi_i = \pi$. This can be implemented through a constraint of the form:

g_{i}(\phi_{i})=\left(\cos{\left(2\phi_{i}\right)}-1\right)=0.

Combining the power dissipated in the resistors with the constraint functions $g_i(\phi_i) = 0$, we obtain a Lagrange function:

L(\bm{\phi},\bm{\lambda})=\frac{1}{R_{c}}\sum_{i,j}{J_{ij}\cos{\left(\phi_{i}-\phi_{j}\right)}}+\sum_{i}{\lambda_{i}\left(\cos{\left(2\phi_{i}\right)}-1\right)},

where $\lambda_i$ is the Lagrange Multiplier corresponding to the phase angle constraint, and $J_{ij}$ is a digital multiplier $\pm 1$ applied to the conductance $\frac{1}{R_c}$.

The Lagrange function above is isomorphic with the general form in Section 4. The effective merit function $f$ and constraints $g_i$ in this correspondence are:

f(\bm{\phi})=\frac{1}{R_{c}}\sum_{i,j}{J_{ij}\cos{\left(\phi_{i}-\phi_{j}\right)}},
g_{i}(\phi_{i})=\left(\cos{\left(2\phi_{i}\right)}-1\right)=0.
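
A brief numerical sketch of the phase dynamics of Eq. (11), with hypothetical couplings and a binarization strength $\lambda_i$ chosen large enough for the $\sin(2\phi_i)$ term to dominate, shows the phases settling near 0 or $\pi$, from which the digital spins are read out; all values are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(6)
    n = 8
    J = rng.choice([-1.0, 1.0], size=(n, n))
    J = np.triu(J, 1); J = J + J.T          # illustrative couplings, zero diagonal
    Rc = 1.0                                # illustrative coupling resistance
    lam = np.full(n, 10.0)                  # binarization strength, chosen to dominate the coupling

    phi = 2 * np.pi * rng.random(n)         # random initial oscillator phases
    dt = 0.01
    for _ in range(10000):
        # Eq. (11): resistive coupling torque plus the sub-harmonic injection-locking term
        coupling = (J * np.sin(phi[:, None] - phi[None, :])).sum(axis=1) / Rc
        phi = phi + dt * (-coupling - lam * np.sin(2 * phi))

    # With lam large enough, the phases settle near 0 or pi (mod 2*pi);
    # cos(phi) is then read out as the digital spin +/-1.
    print(np.round(np.cos(phi), 2), np.sign(np.cos(phi)))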

5.4.3 Conclusions from Coupled Oscillators on the Unit Circle

  1. As in other cases, the amplitude-stable oscillator system [14] minimizes the entropy production rate, or power dissipation, subject to constraints of amplitude and phase reference.
  2. The amplitude-stable oscillator system actually performs Lagrange Multiplier optimization.
  3. The phase decay time constant $\lambda_i$ in each oscillator $i$ plays the role of the corresponding Lagrange Multiplier.
  4. Under the phase reference constraint, $\phi_i = 0$ or $\pi$, the entropy generation minimization scheme is actually binary, similar to a flip-flop.

5.5 Coupled polariton condensates

5.5.1 Overview

Kalinin and Berloff [15] proposed a system consisting of coupled polariton condensates to minimize the XY Hamiltonian. The XY Hamiltonian is a two-dimensional version of the Ising Hamiltonian and is given by:

H(\bm{\mu})=\sum_{ij}J_{ij}\bm{\mu}_{i}\cdot\bm{\mu}_{j},

where $\bm{\mu}_i$ represents the magnetic moment vector of the $i$-th spin, restricted to the spin-space XY plane.

To represent the spin system, Kalinin and Berloff pump a grid of coupled semiconductor microcavities with laser beams and observe the formation of strongly coupled exciton-photon states called polaritons. For our purposes, the polaritonic nomenclature is irrelevant. For us, these are simply coupled electromagnetic cavities similar to the optical and LC resonators that we have already discussed. The electromagnetic system operates by the principle of minimum entropy generation, similar to the previous cases. The complex electromagnetic amplitude in the $i$-th microcavity can be written $E_i = c_i + \text{j} s_i$, where $c_i$ and $s_i$ represent the cosine and sine quadrature components of $E_i$, and j is the unit imaginary. In this case we identify $\text{Re}(E) = c$ as representing the X-component of the magnetic dipole vector, and $\text{Im}(E) = s$ as representing the Y-component of the magnetic dipole vector. The electromagnetic microcavity system settles into a state of minimum entropy generation as the laser pump and optical gain are ramped up to compensate for the intrinsic cavity losses. The phase angles in the complex plane of the final electromagnetic modes are then reported as the corresponding $\bm{\mu}$ magnetic moment angles in the XY plane.

An important point to note here is that the electromagnetic cavities experience normal phase-independent gain and not parametric gain which happens to be phase-dependent. As a consequence, this system does NOT seek phase bistability as appropriate for binary up/down spin orientations. This is because we are searching for the magnetic dipole vector angles in the XY plane which would minimize the corresponding XY magnetic energy.

5.5.2 Lagrange function and Lagrange multipliers

Ref. [15] uses ‘Ginzburg-Landau’ equations to analyze their system, resulting in equations for the complex amplitudes $\Psi_i$ of the polariton wavefunctions. But the $\Psi_i$ are actually the complex electric field amplitudes $E_i$ of the corresponding $i$-th cavity. The electric field amplitudes satisfy the following slowly-varying-amplitude equation:

\frac{dE_{i}}{dt}=\left(\gamma_{i}-\beta|E_{i}|^{2}\right)E_{i}-iU|E_{i}|^{2}E_{i}-\sum_{j}{J_{ij}E_{j}}, \qquad (12)

where $\gamma_i$ represents the optical gain, $\beta$ represents nonlinear attenuation, $U$ represents a nonlinear phase shift, and $J_{ij}$ is a dissipative cross-coupling term representing linear loss. We note from the above that both the amplitudes and phases of the electromagnetic modes are coupled to each other and evolve on comparable timescales. This is in contrast to ref. [14], where the main dynamics were embedded in phase (amplitude was fast and almost fixed), and conversely to [12], where they were embedded in amplitude (phase was fast and almost fixed).

We shall now show that the method of ref. [15] is essentially the method of Lagrange multipliers with an added ‘rotation’. The entropy generation or power dissipation rate is:

h(\bm{E})=\frac{d}{dt}\sum_{i}{\left(\frac{E_{i}^{*}}{2}+\frac{E_{i}}{2}\right)^{2}}
=\frac{1}{2}\sum_{i,j}{J_{ij}\left(E_{i}^{*}E_{j}+E_{i}E_{j}^{*}\right)}+\sum_{i}{\beta|E_{i}|^{4}}-\sum_{i}{\gamma_{i}|E_{i}|^{2}}.

If we add a saturation constraint, $g_i(E_i) = (1 - |E_i|^2) = 0$, then by analogy to the previous sections, $\gamma_i$ is reinterpreted as a Lagrange Multiplier:

L(𝑬,𝜸)=12i,jJij(EiEj+EiEj)+iβ|Ei|4iγi(|Ei|21),\begin{split}L(\bm{E},\bm{\gamma})=&\frac{1}{2}\sum_{i,j}{J_{ij}\left(E_{i}^{*}E_{j}+E_{i}E_{j}^{*}\right)}+\sum_{i}{\beta|E_{i}|^{4}}\\ &-\sum_{i}{\gamma_{i}\left(|E_{i}|^{2}-1\right)},\end{split} (13)

where LL is the Lagrange function that represents power dissipation combined with the amplitude constraint |Ei|2=1|E_{i}|^{2}=1. Thus, the scheme of coupled polaritonic resonators operates to find the state of minimum entropy generation in steady-state, similar to the parametric oscillator case of Section 5.1. That is, the steady-state solution of dynamics (12) renders (13) stationary with respect to changes in 𝑬\bm{E}. The difference is that the coupled polaritonic system solves the XY Ising problem for a magnetic moment restricted to the magnetic XY plane, while the parametric oscillator system, in Section 5.1, solves the ±\pmZ Ising problem. For any particular mathematical optimization that we would perform, we still retain the burden of specifying that dissipative system whose entropy generation matches the optimization that we are seeking.

The imaginary ‘rotation’ term, iUiU, could potentially be of use in developing more sophisticated algorithms than the method of Lagrange multipliers; we discuss this prospect in Section 6.2, where a system with a similar but more general ‘rotation’ term is analyzed.

5.5.3 Iterative Evolution of Lagrange Multipliers

Next, we discuss how Kalinin and Berloff [15] adjusted their Lagrange multipliers γi\gamma_{i} during the course of their optimization. In the method of Lagrange multipliers, the merit function, eq (13), is used to optimize not only the electric field amplitudes EiE_{i} but also the Lagrange Multipliers γi\gamma_{i}. The authors discussed in Sections 5.1 and 5.2 used simple heuristics to adjust their gains and decay constants, which we have shown to be the Lagrange multipliers. Kalinin and Berloff, by contrast, employ the Lagrange function itself to adjust the gains and losses, as in the complete Lagrange multiplier method discussed next.

To iteratively adjust the Lagrange multipliers, we shall briefly shift back to the tutorial notation of Section 4. Afterward, we shall translate back to the notation of this section and show that Kalinin and Berloff indeed use this same iterative procedure to adjust their Lagrange multipliers.

The Lagrange function is given by L(𝒙,𝝀)=f(𝒙)+i=1pλigi(𝒙)L(\bm{x},\bm{\lambda})=f(\bm{x})+\sum_{i=1}^{p}{\lambda_{i}g_{i}(\bm{x})}, where xix_{i} are the field variables and λi\lambda_{i} are the Lagrange multipliers. Then, the procedure to find the optimal 𝒙\bm{x^{*}} and 𝝀\bm{\lambda^{*}} is to perform gradient descent of LL in 𝒙\bm{x} and gradient ascent of LL in 𝝀\bm{\lambda}. The reason for ascent in 𝝀\bm{\lambda} rather than descent is to more strictly penalize deviations from the constraint. In the language of iterations, this leads to the expressions:

xi(t+Δt)=xi(t)κΔtxiL(𝒙,𝝀),\displaystyle x_{i}(t+\Delta t)=x_{i}(t)-\kappa\Delta t\frac{\partial}{\partial x_{i}}L(\bm{x},\bm{\lambda}),
λi(t+Δt)=λi(t)+κΔtλiL(𝒙,𝝀),\displaystyle\lambda_{i}(t+\Delta t)=\lambda_{i}(t)+\kappa^{\prime}\Delta t\frac{\partial}{\partial\lambda_{i}}L(\bm{x},\bm{\lambda}),

where κ\kappa and κ\kappa^{\prime} are suitably chosen step sizes.
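As a concrete illustration of this update rule, the short Python sketch below (our own construction, not taken from any of the referenced machines) applies the descent/ascent iteration to a hypothetical convex toy problem, minimizing f(x) = (1/2)||x - a||^2 subject to the single constraint g(x) = x_1 + x_2 + x_3 - 1 = 0; all numerical values are illustrative.

```python
import numpy as np

# Minimal sketch: gradient descent in x, gradient ascent in lambda, for the
# Lagrange function L(x, lam) = 0.5*||x - a||^2 + lam*(sum(x) - 1).
a = np.array([2.0, -1.0, 0.5])
x, lam = np.zeros(3), 0.0
kappa, kappa_prime, dt = 1.0, 1.0, 0.05

for _ in range(2000):
    grad_x = (x - a) + lam                     # dL/dx_i
    grad_lam = np.sum(x) - 1.0                 # dL/dlambda
    x = x - kappa * dt * grad_x                # descent step in the field variables
    lam = lam + kappa_prime * dt * grad_lam    # ascent step in the Lagrange multiplier

print(x, lam, np.sum(x))   # constraint satisfied; lam converges to (sum(a) - 1)/3
```

For this toy problem the iteration settles at x* = a - lambda*, with lambda* = (sum(a) - 1)/3, illustrating how the ascent in the multiplier enforces the constraint.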

With our identification that the Lagrange multipliers λ\lambda are actually the same as the gains γ\gamma, we plug the expression for the full Lagrange function (13) into the second iterative equation, projecting out the constraint function gi(Ei)=(1|Ei|2)=0g_{i}(E_{i})=\left(1-|E_{i}|^{2}\right)=0, and take the limit Δt0\Delta t\to 0. We obtain the following dynamical equation for the gains γi\gamma_{i}:

dγidt=κ(1|Ei|2).\frac{d\gamma_{i}}{dt}=\kappa^{\prime}\left(1-|E_{i}|^{2}\right). (14)

The above equation for the iterative evolution of the Lagrange multipliers is the very same gain evolution that Kalinin and Berloff employ in their coupled polariton system.

To Eq (14), we must add the iterative evolution of the field variables xix_{i}:

dxidt=κxiL(𝒙,𝝀).\frac{dx_{i}}{dt}=-\kappa\frac{\partial}{\partial x_{i}}L(\bm{x},\bm{\lambda}). (15)

Equations (14) and (15) represent the full iterative evolution; in some of the earlier sub-sections, by contrast, γi(t)\gamma_{i}(t) was simply assigned a heuristic time dependence.

We conclude this sub-section by splitting the Lagrange function into the effective merit function ff and the constraint function gig_{i}. The extra ‘phase rotation’ term iUiU is not captured by this interpretation.

f(E1,,En)=12i,jJij(EiEj+EiEj)+iβ|Ei|4,\displaystyle f(E_{1},\dots,E_{n})=\frac{1}{2}\sum_{i,j}{J_{ij}\left(E_{i}^{*}E_{j}+E_{i}E_{j}^{*}\right)}+\sum_{i}{\beta|E_{i}|^{4}},
gi(Ei)=(1|Ei|2)=0.\displaystyle g_{i}(E_{i})=\left(1-|E_{i}|^{2}\right)=0.
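A minimal numerical sketch of the complete scheme, Euler-integrating the field equation (12) together with the gain evolution (14), is given below. The coupling matrix, the values of β, U, κ′, and the time step are illustrative assumptions and are not the parameters of the experiment in ref. [15].

```python
import numpy as np

# Euler-integration sketch of Eq. (12) for the complex cavity fields E_i together
# with the Lagrange-multiplier (gain) evolution of Eq. (14). All parameter values
# below are illustrative assumptions.
n = 4
rng = np.random.default_rng(0)
J = rng.normal(size=(n, n))
J = 0.1 * (J + J.T)
np.fill_diagonal(J, 0.0)
beta, U, kappa_prime, dt = 1.0, 0.1, 0.5, 0.01

E = 0.01 * (rng.normal(size=n) + 1j * rng.normal(size=n))   # small random seed fields
gamma = np.zeros(n)

for _ in range(20000):
    dE = (gamma - beta * np.abs(E)**2) * E - 1j * U * np.abs(E)**2 * E - J @ E
    dgamma = kappa_prime * (1.0 - np.abs(E)**2)   # gain rises until |E_i| saturates at 1
    E = E + dt * dE
    gamma = gamma + dt * dgamma

# The phase angles of the steady-state fields are the reported XY spin angles.
print(np.angle(E), np.abs(E))
```

The gains rise until the saturation constraint |E_i|^2 = 1 is met, and the final phase angles are read out as the XY spin directions.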

5.5.4 Conclusions from Coupled Electromagnetic Cavities

1. As in other cases, the coupled-cavity system [15] minimizes the steady state entropy production rate, or power dissipation, which includes the power input from the pump gain.

2. The gain γi\gamma_{i} in each oscillator ii plays the role of the corresponding Lagrange Multiplier.

3. There is a phase rotation term iUiU in the dynamics that is not captured by the Lagrange function framework.

6 Other methods in the literature

In this section, we look at other methods in the literature that do not explicitly implement the method of Lagrange multipliers but nevertheless end up with dynamics that resemble it to varying extents. All these methods also offer operating regimes where the dynamics is not analogous to Lagrange multiplier optimization, and we believe it is an interesting avenue of future work to study the capabilities of those regimes.

6.1 Iterative Analog Matrix Multipliers

Soljacic et al. [16] developed an iterative procedure consisting of repeated matrix multiplication to solve the Ising problem. Their algorithm was implemented on a photonic circuit that utilized on-chip optical matrix multiplication units composed of Mach-Zehnder interferometers, of the kind first introduced for matrix algebra by Zeilinger et al. in [30]. Soljacic et al. showed that their algorithm performs optimization of an effective merit function, such as the total magnetic Ising energy.

Let us use our insights from the previous sections and see how one would implement iterative optimization using an optical matrix multiplier. Let the magnetic moment configuration of the Ising problem be represented as a vector of electric field amplitudes, EiE_{i}, of spatially-separated optical modes; each mode's field amplitude represents the value of one magnetic moment. In each iteration, the optical modes are fed into the optical circuit, which performs the matrix multiplication, and the resulting output optical modes are then fed back to the circuit input for the next iteration. Optical gain, or some other type of gain, sustains the successive iterations.

In this section we design an iterative optimization scheme for the Ising problem that involves only matrix multiplications in each iterative step and whose power dissipation function matches the magnetic Ising energy. A simple block diagram of such a scheme is shown in Fig. 8.

Figure 8: An optical circuit performing iterative multiplications converges on a solution of the Ising problem. Optical pulses are fed as input from the left hand side at the beginning of each iteration, pass through the matrix multiplication unit, and are passed back from the outputs to the inputs for the next iteration. Distributed optical gain sustains the iterations.

We wish to design the matrix multiplication unit such that it has the following power dissipation function:

h(𝑬)=iγi|Ei|2+12i,jJij(EiEj+EiEj).h(\bm{E})=-\sum_{i}{\gamma_{i}|E_{i}|^{2}}+\frac{1}{2}\sum_{i,j}{J_{ij}\left(E_{i}^{*}E_{j}+E_{i}E_{j}^{*}\right)}.

The Lagrange function, including a binary constraint, |Ei|2=1|E_{i}|^{2}=1, is given by:

L(𝑬,𝜸)=iγi(|Ei|21)+12i,jJij(EiEj+EiEj),L(\bm{E},\bm{\gamma})=-\sum_{i}{\gamma_{i}\left(|E_{i}|^{2}-1\right)}+\frac{1}{2}\sum_{i,j}{J_{ij}\left(E_{i}^{*}E_{j}+E_{i}E_{j}^{*}\right)}, (16)

where the JijJ_{ij} represents dissipative loss associated with electric field interference between optical modes in the Mach-Zehnder interferometers. The dissipation is compensated by optical gain γi\gamma_{i} in the system.

The iterative multiplicative procedure that evolves the electric fields toward the minimum of the Lagrange function Eq (16) is given by:

Ei(t+1)Ei(t)=κΔtEi(iγi(1|Ei|2(t))+12i,jJij(Ei(t)Ej(t)+Ei(t)Ej(t))),\begin{split}E_{i}(t+1)-E_{i}(t)&=-\kappa\Delta t\frac{\partial}{\partial E_{i}}\left(\sum_{i}{\gamma_{i}\left(1-|E_{i}|^{2}(t)\right)}\right.\\ &+\left.\frac{1}{2}\sum_{i,j}{J_{ij}\left(E_{i}^{*}(t)E_{j}(t)+E_{i}(t)E_{j}^{*}(t)\right)}\right),\end{split}

where we move from iteration tt to iteration t+1t+1 by taking steps in EiE_{i} proportional to the gradient /Ei\partial/\partial E_{i} of the Lagrange function. (/Ei\partial/\partial E_{i} represents separate differentiation with respect to the two quadrature components.) With each iteration, we feed the output of the Mach-Zehnder interferometer matrix-multiplier array back to the input, compensating for the losses with gain. Calculating the gradient, we obtain:

Ei(t+1)Ei(t)=2κΔt(γiEi(t)jJijEj(t)),E_{i}(t+1)-E_{i}(t)=2\kappa\Delta t\left(\gamma_{i}E_{i}(t)-\sum_{j}{J_{ij}E_{j}(t)}\right),

where κ\kappa is a constant step size, whose dimensionality should ensure consistency of units. Sending all the terms involving time step tt to the right hand side, we finally get:

Ei(t+1)=j((1+2κΔtγi)δij2κΔtJij)Ej(t),E_{i}(t+1)=\sum_{j}{\left(\left(1+2\kappa\Delta t\gamma_{i}\right)\delta_{ij}-2\kappa\Delta tJ_{ij}\right)E_{j}(t)}, (17)

where δij\delta_{ij} is the Kronecker delta that is 1 only if i=ji=j. The Mach-Zehnder interferometers should be tuned to the matrix [(1+2κΔtγi)δij2κΔtJij]\left[\left(1+2\kappa\Delta t\gamma_{i}\right)\delta_{ij}-2\kappa\Delta tJ_{ij}\right]. Thus, we have an iterative matrix multiplier scheme that minimizes the Lagrange function and performs Lagrange multiplier optimization of the Ising problem. In effect, a lump of dissipative optical circuitry, compensated by optical gain, will, in a series of iterations, settle into a solution of the Ising problem.
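A small Python sketch of this loop is given below; the coupling matrix, the step sizes, and the rule used to update the gains γ_i (we simply borrow the Lagrange multiplier evolution of Eq. (14)) are illustrative assumptions rather than a description of the hardware of ref. [16].

```python
import numpy as np

# Sketch of the iterative matrix-multiplier loop of Eq. (17). J, the step sizes,
# and the gain-update rule are illustrative assumptions.
n = 6
rng = np.random.default_rng(1)
J = rng.choice([-1.0, 1.0], size=(n, n))
J = np.triu(J, 1)
J = J + J.T                          # symmetric Ising couplings, zero diagonal

kdt, kpdt = 0.005, 0.01              # kappa*Delta_t and kappa'*Delta_t
E = 0.1 * rng.normal(size=n)         # one quadrature of each optical mode
gamma = np.zeros(n)

for _ in range(20000):
    M = np.diag(1.0 + 2.0 * kdt * gamma) - 2.0 * kdt * J   # transfer matrix of Eq. (17)
    E = M @ E                                              # one pass through the mesh
    gamma = gamma + kpdt * (1.0 - E**2)                    # gains act as Lagrange multipliers

spins = np.sign(E)
print(spins, spins @ J @ spins)      # a candidate (local) optimum of the Ising energy
```

Each pass through the loop corresponds to one traversal of the Mach-Zehnder mesh programmed to the matrix of Eq. (17), with the gains refreshed between passes.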

The simple system above differs from that of Soljacic et al. [16] quite significantly, in that their method adds noise and nonlinear thresholding after each iteration. It is possible that these modifications lead to performance improvements, or the more standard approach above may work equally well. A detailed description of the Soljacic approach is presented in Appendix E.

6.2 Leleu Mathematical Ising Solver

Leleu et al. [11] proposed a modified version of Yamamoto’s Ising machine [7]. Leleu’s method significantly resembles the Lagrange method while incorporating new features that might be fruitful research directions in their own right. To understand the similarities and differences between Leleu’s method and that of Lagrange multipliers, we shall first write down the Lagrange multiplier equations of motion.

We recall the Lagrange function for the Ising problem that we encountered in Section 5:

L(𝒙,𝜸)=i,jJijxixj+iαixi2+iγi(1xi2).L(\bm{x},\bm{\gamma})=\sum_{i,j}{J_{ij}x_{i}x_{j}}+\sum_{i}{\alpha_{i}x_{i}^{2}}+\sum_{i}{\gamma_{i}\left(1-x_{i}^{2}\right)}.

In the above, xix_{i} are the optimization variables, JijJ_{ij} is the interaction matrix, γi\gamma_{i} is the gain provided to the ii-th variable, and αi\alpha_{i} is the loss experienced by the ii-th variable. To find a local optimum (𝒙,𝜸)(\bm{x^{*}},\bm{\gamma^{*}}) that satisfies the constraints, one has to perform gradient descent on the Lagrange function in the 𝒙\bm{x} variables and gradient ascent in the 𝜸\bm{\gamma} variables as discussed in Section 5.5. That is, the iterations one would need to perform are:

xi(t+Δt)=xi(t)κΔtxiL(𝒙,𝜸),\displaystyle x_{i}(t+\Delta t)=x_{i}(t)-\kappa\Delta t\frac{\partial}{\partial x_{i}}L(\bm{x},\bm{\gamma}),
γi(t+Δt)=γi(t)+κΔtγiL(𝒙,𝜸),\displaystyle\gamma_{i}(t+\Delta t)=\gamma_{i}(t)+\kappa^{\prime}\Delta t\frac{\partial}{\partial\gamma_{i}}L(\bm{x},\bm{\gamma}),

where κ\kappa and κ\kappa^{\prime} are suitably chosen step sizes. Substituting the expression for LL into the above and taking the limit of Δt0\Delta t\to 0, we get the following equations of motion:

dxidt=2κ((αi+γi)xijJijxj),\displaystyle\frac{dx_{i}}{dt}=2\kappa\left(\left(-\alpha_{i}+\gamma_{i}\right)x_{i}-\sum_{j}{J_{ij}x_{j}}\right), (18)
dγidt=κ(1xi2).\displaystyle\frac{d\gamma_{i}}{dt}=\kappa^{\prime}\left(1-x_{i}^{2}\right). (19)

We shall compare these equations with those designed by Leleu et al. in [11]. Their system is described by the following equations:

dxidt=(α+γ)xi+eijJijxj,\displaystyle\frac{dx_{i}}{dt}=(-\alpha+\gamma)x_{i}+e_{i}\sum_{j}{J_{ij}x_{j}}, (20)
deidt=β(1xi2)ei,\displaystyle\frac{de_{i}}{dt}=\beta(1-x_{i}^{2})e_{i}, (21)

where the xix_{i} represent the optimization variables as usual, α\alpha represents the loss experienced by each degree of freedom, γ\gamma represents a common gain supplied to each variable, β\beta is some suitably chosen positive parameter, and the eie_{i} represent error coefficients that capture how far away each xix_{i} is from its desired unity saturation amplitude. Leleu had cubic terms in xix_{i} in [11] but we shall ignore them in the present discussion for the sake of simplicity. A discussion of these cubic terms is given in Appendix C.

It is clear at once that there are significant similarities between Leleu’s system and the Lagrange multiplier system. The optimization variables in both systems experience linear losses and gains and have interaction terms that capture the Ising interaction. Both systems have auxiliary variables that are varied according to how far away each degree of freedom is from its preferred saturation amplitude. However, the similarities end here.

A major difference in Leleu’s system is that eie_{i} multiplies the Ising interaction felt by the ii-th variable, resulting in the coefficient eiJije_{i}J_{ij}, whereas the complementary interaction felt by the jj-th variable carries the coefficient ejJije_{j}J_{ij}. The consequence is that Leleu’s equations implement a system with non-symmetric interactions eiJijejJije_{i}J_{ij}\neq e_{j}J_{ij} between vector components xix_{i} and xjx_{j}. The inclusion of non-symmetric matrix terms seems to be important, because Leleu’s system achieves excellent performance on Ising problems in the Gset benchmark set, as demonstrated in [11].

Let us obtain some intuition about this system by splitting the non-symmetric interaction term eiJije_{i}J_{ij} into the sum of a symmetric and an anti-symmetric part. This follows from the fact that any matrix AA can be written as the sum of a symmetric matrix, (A+AT)/2(A+A^{T})/2, and an anti-symmetric matrix, (AAT)/2(A-A^{T})/2. The symmetric part leads to gradient descent dynamics just as in Eq (20), similar to all the systems in Section 5. Conversely, the anti-symmetric part causes an energy-conserving ‘rotary’ motion in the vector space of xix_{i}. Leleu et al.’s non-symmetric approach contains the same number of auxiliary parameters, eie_{i} versus γi\gamma_{i}, so the secret of the improved performance appears to lie in this anti-symmetric part. We believe further analysis of this method might be a fruitful future research direction.
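The decomposition is easy to check numerically. The short sketch below, with illustrative values for J_ij and e_i, splits the effective coupling e_i J_ij into its symmetric and anti-symmetric parts and verifies that the anti-symmetric part does no 'work' along x, since x^T A x = 0 for any anti-symmetric A.

```python
import numpy as np

# Numerical check of the decomposition A_ij = e_i * J_ij into symmetric and
# anti-symmetric parts (J and e below are illustrative values).
rng = np.random.default_rng(2)
n = 5
J = rng.choice([-1.0, 1.0], size=(n, n))
J = np.triu(J, 1)
J = J + J.T                                  # symmetric Ising couplings
e = rng.uniform(0.5, 1.5, size=n)            # error coefficients e_i

A = e[:, None] * J                           # non-symmetric effective coupling e_i * J_ij
A_sym = 0.5 * (A + A.T)                      # gradient-descent (dissipative) part
A_anti = 0.5 * (A - A.T)                     # energy-conserving 'rotation' part

x = rng.normal(size=n)
print(x @ A_anti @ x)                        # ~ 0: the anti-symmetric part does no work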

We recall Onsager’s reciprocity theorem [2] which states that the thermodynamic response coefficient RijR_{ij} must be equal to the response coefficient RjiR_{ji} if time-reversal symmetry holds in the equilibrium state. This would imply that one might have to construct a magnetic system of some sort to break time-reversal symmetry in order to physically implement the asymmetry embedded in the dynamics designed by Leleu et al.

In conclusion, the symmetric part of Eq (20) takes iterative steps along the gradient of a dissipative merit function. The anti-symmetric part produces energy-conserving rotations in the vector space of optimization variables, xix_{i}. The associated dynamical freedom might provide a fruitful future research direction in optimization and deserves further study to ascertain its power.

7 Applications in Linear Algebra and Statistics

We have seen that minimum power dissipation solvers can address the Ising problem and various similar problems such as the traveling salesman problem. In this section, we provide a further application of minimum entropy generation solvers, to an optimization problem that appears frequently in statistics, namely curve fitting. In particular, we make the observation that the problem of linear least squares regression, that is, linear curve fitting with a quadratic merit function, resembles the Ising problem. In fact, the electrical circuit example we presented in Section 5.2 can be applied to linear regression, and we present such a circuit in this section. Our circuit provides a digital answer but requires a series of binary resistance values, that is, …,2R0,R0,0.5R0,…\dots,2R_{0},R_{0},0.5R_{0},\dots, to represent arbitrary binary statistical input observations.

The objective of linear least squares regression is to fit a linear function to a given set of data {(𝒙𝟏,y1),(𝒙𝟐,y2),(𝒙𝟑,y3),…,(𝒙𝒏,yn)}\{(\bm{x_{1}},y_{1}),(\bm{x_{2}},y_{2}),(\bm{x_{3}},y_{3}),\dots,(\bm{x_{n}},y_{n})\}. The 𝒙𝒊\bm{x_{i}} are input vectors of dimension dd, while the yiy_{i} are the observed outputs that we want our regression to capture. The linear function being fit is of the form y(𝒂)=i=1dwiaiy(\bm{a})=\sum_{i=1}^{d}{w_{i}a_{i}}, where 𝒂\bm{a} is a feature vector of length dd and 𝒘\bm{w} is a vector of unknown weights multiplying the feature vector 𝒂\bm{a}. The vector 𝒘\bm{w} is calculated by minimizing the sum of the squared errors it makes on an actual data set. This optimization problem may be represented as:

𝒘=min𝒘i=1n[(j=1dwjxij)yi]2,\bm{w^{*}}=\min_{\bm{w}}\sum_{i=1}^{n}{\left[\left(\sum_{j=1}^{d}{w_{j}x_{ij}}\right)-y_{i}\right]^{2}},

where xijx_{ij} is the jj-th component of the vector 𝒙𝒊\bm{x_{i}}. The merit function, upon expansion, yields i=1nyi22i=1d(j=1nxjiyj)wi+i,j=1d(k=1nxkixkj)wiwj\sum_{i=1}^{n}y_{i}^{2}-2\sum_{i=1}^{d}\left(\sum_{j=1}^{n}x_{ji}y_{j}\right)w_{i}+\sum_{i,j=1}^{d}\left(\sum_{k=1}^{n}x_{ki}x_{kj}\right)w_{i}w_{j}. This functional form is identical to that of the Ising Hamiltonian, and we may construct an Ising circuit with Jij=k=1nxkixkjJ_{ij}=\sum_{k=1}^{n}x_{ki}x_{kj}, with the weights 𝒘\bm{w} acting like the unknown magnetic moments. There is an effective magnetic field in the problem, hi=2j=1nxjiyjh_{i}=-2\sum_{j=1}^{n}x_{ji}y_{j}, and a fixed number i=1nyi2\sum_{i=1}^{n}y_{i}^{2} which does not play a role in the optimization. A simple circuit that solves the linear least squares problem for d=2d=2 (the case where there are two features per instance and, consequently, two weights wdw_{d} to be estimated) is provided in Fig. 9. This circuit provides weights up to 2-bit precision.
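The sketch below carries out this mapping numerically on a small synthetic data set of our own, building J_ij = Σ_k x_ki x_kj and h_i = -2 Σ_j x_ji y_j and confirming that the resulting quadratic form reproduces the least-squares merit function.

```python
import numpy as np

# Mapping of linear least squares onto an Ising-like quadratic form:
# J_ij = sum_k x_ki x_kj and h_i = -2 sum_j x_ji y_j (synthetic data for illustration).
rng = np.random.default_rng(3)
n, d = 20, 2
X = rng.normal(size=(n, d))                  # n observations, d features
w_true = np.array([1.5, -0.75])
y = X @ w_true + 0.01 * rng.normal(size=n)

J = X.T @ X                                  # Ising-like coupling matrix
h = -2.0 * X.T @ y                           # effective magnetic field
const = y @ y                                # constant term, irrelevant to the optimum

def merit(w):
    # Identical to the least-squares objective ||Xw - y||^2
    return w @ J @ w + h @ w + const

w_ls = np.linalg.solve(J, X.T @ y)           # ordinary least-squares weights
print(w_ls, merit(w_ls), np.sum((X @ w_ls - y)**2))   # the two merit values agree
```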

Figure 9: A two-bit linear regression circuit to find the best two curve-fitting weights wdw_{d} using the Principle of Minimum Entropy Generation.

The two oscillators on the left hand side of the figure represent the 202^{0} and 212^{1} bits of the weight corresponding to the first feature while the oscillators on the right hand side are the corresponding bits of the second weight.

The cross-resistance RR that one would need to represent the JijJ_{ij} that connects the ii-th and jj-th oscillators is calculated as:

1R=b1R1+b0R0+b1R1,\frac{1}{R}=\frac{b_{1}}{R_{-1}}+\frac{b_{0}}{R_{0}}+\frac{b_{-1}}{R_{1}},

where Rm=2mR0R_{m}=2^{m}R_{0} is a binary hierarchy of resistances ,2R0,R0,0.5R0,\dots,2R_{0},R_{0},0.5R_{0},\dots based on a reference resistor R0R_{0}, and bmb_{m} are the binary digits of JijJ_{ij} expressed as a binary number: b=b1×21+b0×20+b1×21b=b_{1}\times 2^{1}+b_{0}\times 2^{0}+b_{-1}\times 2^{-1}. This represents JijJ_{ij} to 3-bit precision using resistors that span a dynamic range 22=42^{2}=4. Further, the sign of the coupling is allotted according to whether the resistors RR are parallel-connected or cross-connected. In operation, the resistors RR would be externally programmed to the correct binary values, with many more bits than 3-bit precision, as given by the matrix product Jij=k=1nxkixkjJ_{ij}=\sum_{k=1}^{n}x_{ki}x_{kj}.
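For concreteness, the following fragment evaluates the parallel combination for one illustrative choice of bits (b_1, b_0, b_{-1}) and reference resistor R_0; the numerical values are assumptions chosen only to show the arithmetic.

```python
# Parallel combination of binary-weighted resistors R_m = 2^m * R0, programming a
# 3-bit coupling magnitude; R0 and the bit pattern are assumed, illustrative values.
R0 = 1.0e3                        # reference resistance in ohms (assumed)
bits = (1, 0, 1)                  # (b_1, b_0, b_-1): target magnitude 2 + 0 + 0.5 = 2.5

# b_1 pairs with R_{-1} = R0/2, b_0 with R_0, and b_{-1} with R_1 = 2*R0
conductance = sum(b / ((2.0**m) * R0) for b, m in zip(bits, (-1, 0, 1)))
R_eff = 1.0 / conductance
print(R_eff)                      # -> R0 / 2.5 = 400 ohms
```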

We have just solved the regression problem of the form 𝑿𝒘=𝒚\bm{Xw}=\bm{y}, where the matrix 𝑿\bm{X} and the vector 𝒚\bm{y} were known measurements and the best-fitting weight vector 𝒘\bm{w} was the unknown. We conclude by noting that this same procedure can be adapted to solve general linear systems of equations of the form 𝑿𝒘=𝒚\bm{Xw}=\bm{y}.

8 Discussion and Conclusion

Physics obeys a number of maximization or minimization principles such as the principle of Least Action, the principle of Minimum Entropy Generation, the Variational Principle, Physical Annealing, and the Adiabatic Principle (which in its quantum form is called Quantum Annealing).

Optimization is very significant in diverse fields ranging from scheduling and routing in operations research to protein folding in biology, portfolio optimization in finance, and energy minimization in physics. In this article, we made the observation that physics has optimization principles at its heart, and that they can be exploited to design fast, low-power digital solvers that avoid the limits of standard computational paradigms. Nature thus provides us with a means to solve optimization problems in all these areas, including Engineering, Artificial Intelligence, Machine Learning (backpropagation), Control Theory, and Reinforcement Learning.

We reviewed 7 physical machines, proposed or built, that purported to solve the Ising problem and found that 6 of the 7 were performing Lagrange multiplier optimization; further, they also obey the Principle of Minimum Entropy Generation (always subject to a power input constraint). This means that, by appropriate choice of parameter values, these physical solvers can be used to perform Lagrange multiplier optimization orders-of-magnitude faster and with lower power than conventional digital computers. This performance advantage can be utilized for optimization in machine learning applications where energy and time considerations are critical.

The questions arise: What are the action items? And what is the most promising near-term application? All the hardware approaches seem to work comparably well. The easiest to implement would be the electrical oscillator circuits, though the optical oscillator arrays can be compact and very fast. Electrically, there would be two integrated circuits: the oscillator array, and the connecting resistor array that would need to be reprogrammed for different problems. The action item could be to design a first chip consisting of about 1000 oscillators, and a second chip consisting of the appropriate coupling resistor array, for a specific optimization problem. The resistors should be in an addressable binary hierarchy so that any desired resistance value can be programmed in by switches, within the number of bits of accuracy. It is possible to imagine solving a new Ising problem every millisecond by reprogramming the resistor chip.

On the software side, a compiler would need to be developed to go from an unsolved optimization problem to the resistor array which matches that desired goal. If the merit function were mildly nonlinear, we believe that the Principle of Minimum Entropy generation would still hold, but there has been less background science justifying that claim.

With regard to the most promising near term application, it might be in Control Theory or in Reinforcement Learning in self-driving vehicles, where rapid answers are required, at modest power dissipation.

The act of computation can be regarded as a search among many possible answers, with the circuit eventually converging to a final correct configuration. Thus the initial conditions may include a huge phase-space volume of 2n2^{n} possible solutions, ultimately transitioning into a final configuration representing a small or modest-sized binary number. This type of computing implies a substantial entropy reduction. This led to Landauer’s admonition that computation costs knlog2kn\log{2} of entropy decrease, and kTnlog2kTn\log{2} of energy, for a final answer with nn binary digits.

By the 2nd Law of Thermodynamics, such an entropy reduction must be accompanied by an entropy increase elsewhere. For example, in a dissipative circuit, electricity converts to heat. In Landauer’s viewpoint, the energy and entropy limit of computing was associated with the final act of writing out the answer in nn bits, assuming the rest of the computer was reversible. In practice, technology consumes 104×\sim 10^{4}\times more than the Landauer limit, owing to the insensitivity of transistors operating at \sim1 Volt, when they could be operating at \sim10 mVolts.

In the continuously dissipative circuits we have described here, the energy consumed would be infinite if we waited long enough for the system to reach the final optimal state. If we terminate the powering of our optimizer systems after they reach the desired final answer, the energy consumed becomes finite. By operating at voltages well below 1 Volt and by powering off after the desired answer is achieved, our continuously dissipating Lagrange optimizers could actually be closer to the Landauer limit than a conventional computer.

A controversial point relates to the quality of solutions that are obtained for NP-hard problems. The physical systems we are proposing evolve by steepest descent toward a local optimum, not a global optimum. Nonetheless, many of the authors of the 7 physical systems presented here have claimed to find better local optima than their competitors, due to special adjustments in their methods. Undoubtedly, some improvements are possible, but none of the 7 papers reviewed here claims to always find the one global optimum which would be NP-hard [31].

We have shown that a number of physical systems that perform optimization are acting through the Principle of Minimum Entropy generation, though other physics principles could also fulfill this goal. As the systems evolve toward an extremum, they perform Lagrange function optimization where the Lagrange Multipliers are given by the gain or loss coefficients that keep the machine running. Thus, Nature provides us with a series of physical Optimization Machines that are much faster and possibly more energy efficient than conventional computers.

The authors gratefully acknowledge useful discussions with Dr. Ryan Hamerly, Dr. Tianshi Wang, and Prof. Jaijeet Roychowdhury. The work of S. K. Vadlamani and Prof. E. Yablonovitch was supported by the National Science Foundation through the Center for Energy Efficient Electronics Science (E3S)(E^{3}S) under Award ECCS-0939514 and the Office of Naval Research under grant #N00014-14-1-0505.

Appendix A Lagrange multipliers theory

In this appendix, we shall study in greater detail the method of Lagrange multipliers, one of the most well-known techniques for the solution of constrained optimization problems. At its core, it simply involves the modification of the merit function by adding terms that penalize constraint violations.

In the setting of constrained optimization, we are required to minimize an objective function f(𝒙)f(\bm{x}) with respect to all points that satisfy certain given inequality (hj(𝒙)0h_{j}(\bm{x})\leq 0) and equality (gi(𝒙)=0g_{i}(\bm{x})=0) constraints. Let us assume that the domain set DD, that is, the intersection of the domains of f(𝒙),gi(𝒙),hj(𝒙)f(\bm{x}),g_{i}(\bm{x}),h_{j}(\bm{x}), is a subset of n\mathbb{R}^{n}. Further, let the set of points in DD that satisfy all the constraints gi,hjg_{i},h_{j} be called FF, the feasible set. Then, the optimization problem can be written as:

minimize f(𝒙)\displaystyle f(\bm{x})
subject to hj(𝒙)0,j=1,,m,\displaystyle h_{j}(\bm{x})\leq 0,\ j=1,\dots,m,
gi(𝒙)=0,i=1,,p,\displaystyle g_{i}(\bm{x})=0,\ i=1,\dots,p,
𝒙D.\displaystyle\bm{x}\in D.

A.1 Lagrange function

We now define a new function in n+m+pn+m+p variables, called the Lagrange function, as follows:

L(𝒙,𝝀,𝝁)=f(𝒙)+j=1mμjhj(𝒙)+i=1pλigi(𝒙).L(\bm{x},\bm{\lambda},\bm{\mu})=f(\bm{x})+\sum_{j=1}^{m}\mu_{j}h_{j}(\bm{x})+\sum_{i=1}^{p}\lambda_{i}g_{i}(\bm{x}).

The summations that were added to the plain objective function f(𝒙)f(\bm{x}) serve as constraint violation penalty terms. The coefficients multiplying the penalty terms, λi\lambda_{i} and μj\mu_{j}, are known as Lagrange multipliers. The inequality Lagrange multipliers μj\mu_{j} are constrained to be non-negative in order that the penalty that arises when the inequality constraints are not satisfied (i.e. when hj(𝒙)>0h_{j}(\bm{x})>0) is non-negative. The equality Lagrange multipliers, λi\lambda_{i}, have no such restrictions.

The Lagrange function has the advantage that it helps us express the constrained optimization of ff as an unconstrained optimization of LL. That is, it can be shown that:

min𝒙Ff(𝒙)=min𝒙Dmax(𝝁0),𝝀L(𝒙,𝝀,𝝁).\min_{\bm{x}\in F}f(\bm{x})=\min_{\bm{x}\in D}\max_{(\bm{\mu}\geq 0),\bm{\lambda}}L(\bm{x},\bm{\lambda},\bm{\mu}).

The minimization over 𝒙\bm{x} in the feasible set, FF, on the left-hand side has turned into a minimization over 𝒙\bm{x} in the entire domain, DD, on the right-hand side.
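As a small worked example of our own (not drawn from the references), consider minimizing f(x_1,x_2) = x_1^2 + x_2^2 subject to the single equality constraint g(x) = x_1 + x_2 - 1 = 0. The Lagrange function and its stationarity conditions are:

L(\bm{x},\lambda)=x_{1}^{2}+x_{2}^{2}+\lambda\left(x_{1}+x_{2}-1\right),

\frac{\partial L}{\partial x_{1}}=2x_{1}+\lambda=0,\qquad\frac{\partial L}{\partial x_{2}}=2x_{2}+\lambda=0\;\Rightarrow\;x_{1}=x_{2}=-\lambda/2,

g(\bm{x})=0\;\Rightarrow\;\lambda^{*}=-1,\qquad x_{1}^{*}=x_{2}^{*}=\tfrac{1}{2}.

Stationarity of L in \bm{x} produces the candidate optimum, while enforcing the constraint (the maximization over \lambda) fixes the value of the multiplier.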

A.2 Karush-Kuhn-Tucker (KKT) sufficient conditions

The Lagrange function also appears in an important, related role. While the conditions for a point 𝒙\bm{x^{*}} to be an unconstrained optimum of a differentiable function ff are expressed in terms of the gradient of f(𝒙)f(\bm{x}), the optimality conditions for constrained optimization problems are naturally expressed in terms of the Lagrange function. These conditions are presented next.

A point 𝒙\bm{x^{*}} is a local optimum of the function f(𝒙)f(\bm{x}) subject to the constraints gi,hjg_{i},h_{j} if it satisfies the Karush-Kuhn-Tucker (KKT) sufficient conditions:

1. Primal feasibility: The point 𝒙\bm{x^{*}} is feasible, that is, it satisfies all the constraints.

    hj(𝒙)\displaystyle h_{j}(\bm{x^{*}}) 0,j=1,,m,\displaystyle\leq 0,\ j=1,\dots,m,
    gi(𝒙)\displaystyle g_{i}(\bm{x^{*}}) =0,i=1,,p.\displaystyle=0,\ i=1,\dots,p.
2. First-order condition: There exist Lagrange multipliers μ1,,μm,λ1,,λp\mu_{1}^{*},\dots,\mu_{m}^{*},\lambda^{*}_{1},\dots,\lambda^{*}_{p} such that the following equation is satisfied:

    𝒙L(𝒙,𝝁,𝝀)\displaystyle\ \ \ \ \bm{\nabla_{x}}L(\bm{x^{*}},\bm{\mu^{*}},\bm{\lambda^{*}})
    =𝒙f(𝒙)+j=1mμj𝒙hj(𝒙)+i=1pλi𝒙gi(𝒙)\displaystyle=\bm{\nabla_{x}}f(\bm{x^{*}})+\sum_{j=1}^{m}\mu_{j}^{*}\bm{\nabla_{x}}h_{j}(\bm{x^{*}})+\sum_{i=1}^{p}\lambda_{i}^{*}\bm{\nabla_{x}}g_{i}(\bm{x^{*}})
    =0.\displaystyle=0.
3. Second-order condition: In addition to the first-order condition, if all the concerned functions are twice differentiable, we require that the Hessian of LL with respect to 𝒙\bm{x} be positive definite along all directions that respect the active constraints. That is, the following equation has to be satisfied:

    𝒗T𝒙𝒙2L(𝒙,𝝁,𝝀)𝒗>0,𝒗T,\displaystyle\bm{v}^{T}\bm{\nabla_{xx}}^{2}L(\bm{x^{*}},\bm{\mu^{*}},\bm{\lambda^{*}})\bm{v}>0,\ \forall\bm{v}\in T,

    where TT is defined as T={𝒗:(𝒙hj(𝒙))T𝒗=0,j=T=\{\bm{v}:\ \left(\bm{\nabla_{x}}h_{j}(\bm{x^{*}})\right)^{T}\bm{v}=0,\ j= active constraints at 𝒙,(𝒙gi(𝒙))T𝒗=0,i=1,,p}\bm{x^{*}},\ \left(\bm{\nabla_{x}}g_{i}(\bm{x^{*}})\right)^{T}\bm{v}=0,\ i=1,\dots,p\}.

4. Complementary slackness: μjhj(𝒙)=0\mu_{j}^{*}h_{j}(\bm{x^{*}})=0 is satisfied for all the inequality constraints, j=1,,mj=1,\dots,m.

5. Dual feasibility: The Lagrange multipliers of all the inequality constraints satisfy μj0,j=1,,m\mu_{j}^{*}\geq 0,\ j=1,\dots,m, with the inequality being strict for active constraints.

We recognize condition number 2 as the one we encountered in Section 4 in the main text.

A.3 Saddle point nature of global and local optima

If the functions ff, gig_{i}, hjh_{j}, and the feasible set FF satisfy certain special conditions (such as, for instance, the Slater condition), a phenomenon called strong duality occurs. Under such circumstances, it turns out that the minimization and maximization in the above equation can be switched:

min𝒙Dmax(𝝁0),𝝀L(𝒙,𝝀,𝝁)=max(𝝁0),𝝀min𝒙DL(𝒙,𝝀,𝝁).\min_{\bm{x}\in D}\max_{(\bm{\mu}\geq 0),\bm{\lambda}}L(\bm{x},\bm{\lambda},\bm{\mu})=\max_{(\bm{\mu}\geq 0),\bm{\lambda}}\min_{\bm{x}\in D}L(\bm{x},\bm{\lambda},\bm{\mu}).

When this occurs, it can be shown that the optimal pair (𝒙,𝝀,𝝁)(\bm{x^{*}},\bm{\lambda^{*}},\bm{\mu^{*}}) that satisfies the KKT conditions actually forms a saddle point of the Lagrange function, L(𝒙,𝝀,𝝁)L(\bm{x},\bm{\lambda},\bm{\mu}). That is, one has:

L(𝒙,𝝀,𝝁)\displaystyle L(\bm{x^{*}},\bm{\lambda^{*}},\bm{\mu^{*}}) L(𝒙,𝝀,𝝁),\displaystyle\leq L(\bm{x},\bm{\lambda^{*}},\bm{\mu^{*}}),\ 𝒙D,\displaystyle\forall\ \bm{x}\in D,
L(𝒙,𝝀,𝝁)\displaystyle L(\bm{x^{*}},\bm{\lambda^{*}},\bm{\mu^{*}}) L(𝒙,𝝀,𝝁),\displaystyle\geq L(\bm{x^{*}},\bm{\lambda},\bm{\mu}),\ 𝝁0,𝝀.\displaystyle\forall\ \bm{\mu}\geq 0,\ \bm{\lambda}.

This can be viewed as a simple motivation for the gradient descent of LL in xx and gradient ascent of LL in λ\lambda that we encountered in Section 5.5. In the next section, Appendix B, we shall follow Chapter 4 of the excellent book [25] and see how this descent/ascent procedure can be used to find local optima even if strong duality doesn’t hold.

Appendix B Lagrange multipliers algorithms: Augmented Lagrangian

In this section, we discuss the so-called ‘Augmented Lagrangian method of multipliers’, a popular algorithm used to obtain locally optimal solutions (𝒙,𝝀)(\bm{x^{*}},\bm{\lambda^{*}}) that satisfy the KKT conditions. We shall be only considering the case where there are no inequality constraints hjh_{j} (and, consequently, no 𝝁\bm{\mu} multipliers). This algorithm is discussed in detail in [25].

To motivate the ‘Augmented Lagrangian’ approach, let us observe the KKT conditions a bit closely. We conclude from the first-order condition that a locally optimum 𝒙\bm{x^{*}} renders the function L(𝒙,𝝀)L(\bm{x},\bm{\lambda^{*}}) stationary. However, this doesn’t guarantee that 𝒙\bm{x^{*}} is a minimum of L(𝒙,𝝀)L(\bm{x},\bm{\lambda^{*}}). Indeed, observation of the second-order condition tells us that 𝒙\bm{x^{*}} could be a saddle point of L(𝒙,𝝀)L(\bm{x},\bm{\lambda^{*}}). This means that gradient descent-based algorithms will not converge to 𝒙\bm{x^{*}} if the starting point is not in the correct region. It was to solve this problem that the ‘Augmented Lagrangian’ method was invented.

The Augmented Lagrange function, Lc(𝒙,𝝀)L_{c}(\bm{x},\bm{\lambda}), is given by:

Lc(𝒙,𝝀)=f(𝒙)+i=1pλigi(𝒙)+c2(i=1p(gi(𝒙))2).L_{c}(\bm{x},\bm{\lambda})=f(\bm{x})+\sum_{i=1}^{p}\lambda_{i}g_{i}(\bm{x})+\frac{c}{2}\left(\sum_{i=1}^{p}\left(g_{i}(\bm{x})\right)^{2}\right).

This function Lc(𝒙,𝝀)L_{c}(\bm{x},\bm{\lambda}), for a suitable choice of cc, has the property that 𝒙\bm{x^{*}} forms a minimum of Lc(𝒙,𝝀)L_{c}(\bm{x},\bm{\lambda^{*}}) and not just a saddle point as was the case with L(𝒙,𝝀)L(\bm{x},\bm{\lambda^{*}}).

In other words, local optima of the original constrained optimization problem can be obtained if we perform an unconstrained optimization of Lc(𝒙,𝝀)L_{c}(\bm{x},\bm{\lambda^{*}}). However, for this procedure to be a feasible solution approach, we would have to know the right 𝝀\bm{\lambda^{*}}. It has been shown in [25] that the way to do this is to perform gradient ascent of LcL_{c} in the 𝝀\bm{\lambda} variables.

The ‘Augmented Lagrangian method of multipliers’ involves the repeated minimization of Lc(𝒙,𝝀(k))L_{c}(\bm{x},\bm{\lambda}^{(k)}) using progressively better estimates, 𝝀(k)\bm{\lambda}^{(k)}, of 𝝀\bm{\lambda^{*}}. The algorithm starts off with an arbitrary starting point (x(0),λ(0))(x^{(0)},\lambda^{(0)}). It then performs the following steps repeatedly:

1. Locally minimize Lc(𝒙,𝝀(k))L_{c}(\bm{x},\bm{\lambda}^{(k)}) and call the minimum point 𝒙(k)\bm{x}^{(k)}.

2. λi(k+1)=λi(k)+cgi(𝒙(k))\lambda_{i}^{(k+1)}=\lambda_{i}^{(k)}+cg_{i}(\bm{x}^{(k)}).

The second step above corresponds to gradient ascent of Lc(𝒙,𝝀)L_{c}(\bm{x},\bm{\lambda}) in the 𝝀\bm{\lambda} variables. Basically, this method performs a fast gradient descent of LcL_{c} in the 𝒙\bm{x} directions in conjunction with a slow gradient ascent of LcL_{c} in the 𝝀\bm{\lambda} directions. A dynamical system that performs this process in continuous time is given below:

dxidt\displaystyle\frac{dx_{i}}{dt} =κxiLc(𝒙,𝝀),\displaystyle=-\kappa\frac{\partial}{\partial x_{i}}L_{c}(\bm{x},\bm{\lambda}),
dλidt\displaystyle\frac{d\lambda_{i}}{dt} =κλiLc(𝒙,𝝀).\displaystyle=\kappa^{\prime}\frac{\partial}{\partial\lambda_{i}}L_{c}(\bm{x},\bm{\lambda}).
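A minimal Python sketch of this procedure (inner minimization of L_c in x by gradient descent, followed by an outer ascent step in λ), applied to a simple toy problem, minimize x_1^2 + x_2^2 subject to x_1 + x_2 - 1 = 0, whose solution is x* = (1/2, 1/2) and λ* = -1, is given below; the penalty c, the step size, and the iteration counts are illustrative.

```python
import numpy as np

# Sketch of the Augmented Lagrangian method of multipliers on a toy problem:
# minimize x1^2 + x2^2 subject to g(x) = x1 + x2 - 1 = 0
# (known answer: x* = (1/2, 1/2), lambda* = -1).
c = 10.0
lam = 0.0
x = np.zeros(2)

def grad_Lc(x, lam):
    g = x[0] + x[1] - 1.0
    return 2.0 * x + (lam + c * g) * np.ones(2)    # d/dx of f + lam*g + (c/2)*g^2

for _ in range(50):                                 # outer multiplier updates
    for _ in range(1000):                           # inner minimization of L_c(x, lam)
        x = x - 0.05 * grad_Lc(x, lam)
    lam = lam + c * (x[0] + x[1] - 1.0)             # ascent step: lam <- lam + c*g(x)

print(x, lam)    # approximately [0.5, 0.5] and -1.0
```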

Appendix C Application of Lagrange multipliers to the Ising problem; Cubic terms

In this appendix, we shall explain the role played by the cubic terms (which were neglected in the main text) in the methods of Yamamoto et al. [7], Kalinin et al. [15], and others. It will turn out that the inclusion of cubic terms helps us implement the ‘Augmented Lagrangian method of multipliers’ of Appendix B.

The statement of the Ising problem is as follows:

minimize 𝒙T𝑱𝒙\displaystyle\bm{x}^{T}\bm{J}\bm{x}
subject to xi21=0,i=1,,n.\displaystyle x_{i}^{2}-1=0,\ \ i=1,\dots,n.

The corresponding Lagrange function is given by:

L(𝒙,𝝀)=𝒙T𝑱𝒙i=1nλi(xi21).L(\bm{x},\bm{\lambda})=\bm{x}^{T}\bm{Jx}-\sum_{i=1}^{n}\lambda_{i}(x_{i}^{2}-1).

Next, we write down the Augmented Lagrange function:

Lc(𝒙,𝝀)=𝒙T𝑱𝒙i=1nλi(xi21)+c2(i=1n(xi21)2).L_{c}(\bm{x},\bm{\lambda})=\bm{x}^{T}\bm{Jx}-\sum_{i=1}^{n}\lambda_{i}(x_{i}^{2}-1)+\frac{c}{2}\left(\sum_{i=1}^{n}\left(x_{i}^{2}-1\right)^{2}\right).

Substituting the above Augmented Lagrange function into the dynamical system provided at the end of Appendix B, we get:

dxidt\displaystyle\frac{dx_{i}}{dt} =2κ(λixi+cxicxi3j=1nJijxj),\displaystyle=2\kappa\left(\lambda_{i}x_{i}+cx_{i}-cx_{i}^{3}-\sum_{j=1}^{n}{J_{ij}x_{j}}\right),
dλidt\displaystyle\frac{d\lambda_{i}}{dt} =κ(1xi2),\displaystyle=\kappa^{\prime}(1-x_{i}^{2}),

where κ\kappa and κ\kappa^{\prime} are appropriately chosen step sizes. We notice the equivalence in form between this dynamical system and those in the papers discussed in the main text and conclude that the cubic terms that appear in most of those systems are in fact helping to implement the Augmented Lagrangian method of multipliers.
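The following Python fragment Euler-integrates this descent/ascent flow of the Augmented Lagrange function for a small illustrative instance, a frustrated three-spin triangle; J, c, the step sizes, and the initial state are assumptions made only for the sketch.

```python
import numpy as np

# Descent/ascent flow of L_c(x, lam) = x^T J x - sum_i lam_i (x_i^2 - 1)
#                                      + (c/2) sum_i (x_i^2 - 1)^2
# for an illustrative frustrated triangle of three coupled spins.
J = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])
c, kappa, kappa_prime, dt = 1.0, 1.0, 0.2, 0.005
rng = np.random.default_rng(4)
x = 0.1 * rng.normal(size=3)
lam = np.zeros(3)

for _ in range(40000):
    g = x**2 - 1.0
    dLc_dx = 2.0 * (J @ x) - 2.0 * lam * x + 2.0 * c * g * x
    x = x - kappa * dt * dLc_dx                 # gradient descent in x
    lam = lam + kappa_prime * dt * (-g)         # gradient ascent: dL_c/dlam_i = 1 - x_i^2

spins = np.sign(x)
print(spins, spins @ J @ spins)   # typically a ground state of the triangle (energy -2)
```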

Appendix D Adiabatic method using Kerr (χ(3))\left(\chi^{(3)}\right) nonlinear coupled oscillators

Researchers at Toshiba proposed an adiabatic Ising solver consisting of networks of coupled Kerr nonlinear oscillators [17], [32], [33]. The big difference between the machines we have studied so far and the Toshiba machine is that Toshiba uses dissipation-less optical systems and utilizes the adiabatic method to optimize the Ising Hamiltonian. By dissipation-less, we mean that the coupling between different oscillators is not dissipative but perfectly elastic.

Further, they replace parametric oscillators that use the χ(2)\chi^{(2)} nonlinearity with Kerr-parametric oscillators that possess both the χ(2)\chi^{(2)} and the χ(3)\chi^{(3)} nonlinearities. The χ(2)\chi^{(2)} nonlinearity gives rise to effects such as parametric amplification whereas the χ(3)\chi^{(3)} nonlinearity leads to intensity dependent refractive index change. A second-harmonic pump signal is used to achieve parametric amplification. The refractive index of the modes in the oscillators is modulated by their own intensities due to the third-order nonlinearity.

The slowly varying amplitude equations for the sine (sis_{i}) and cosine (cic_{i}) components of the electric field in their system are given by:

dcidt=K(ci2+si2)si+psi+ξ0jJijsj,\displaystyle\frac{dc_{i}}{dt}=K\left(c_{i}^{2}+s_{i}^{2}\right)s_{i}+ps_{i}+\xi_{0}\sum_{j}{J_{ij}s_{j}},
dsidt=K(ci2+si2)ci+pciξ0jJijcj,\displaystyle\frac{ds_{i}}{dt}=-K\left(c_{i}^{2}+s_{i}^{2}\right)c_{i}+pc_{i}-\xi_{0}\sum_{j}{J_{ij}c_{j}},

where cic_{i} and sis_{i} represent the cosine and sine components of the ii-th oscillator, pp refers to the strength of the parametric pumping, KK is the value of the third-order nonlinearity (Kerr coefficient), and ξ0\xi_{0} is the strength of the coupling interaction between the oscillators.

We rotate the axes and recast the above set of equations in phasor notation, as follows, to elucidate the similarity between this method and that of the Cambridge group of Section 5.5 [15]:

dEidt=κ(pEi+iK|Ei|2Ei+ijJijEj).\frac{dE_{i}}{dt}=\kappa\left(pE_{i}^{*}+iK|E_{i}|^{2}E_{i}+i\sum_{j}{J_{ij}E_{j}}\right).

The first term on the right-hand side is the parametric gain, the second term is the nonlinear rotation caused by the Kerr nonlinearity, and the third term is the dissipation-less coupling between oscillators.

The machine, which follows the adiabatic principle, works by ramping up the value of the pump pp adiabatically. The authors show that their system Hamiltonian mimics the Ising Hamiltonian well when p becomes large enough.

We note the interesting fact that, though the authors have parametric gain, the system does not blow up to infinity due to the presence of the nonlinear rotating term. This term ensures that the quadrature vector keeps traversing gain and loss regions periodically in the phase space, hence keeping the amplitudes in check.
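A direct Euler-integration sketch of the quadrature equations above, with the pump ramped slowly from zero, is given below; the coupling matrix, the values of K and ξ_0, the ramp rate, and the initial noise are illustrative assumptions and not the parameters used in refs. [17], [32], [33].

```python
import numpy as np

# Euler-integration sketch of the Kerr-oscillator quadrature equations above, with
# the pump p ramped slowly from 0. All parameter values are illustrative assumptions.
n = 4
rng = np.random.default_rng(5)
J = rng.choice([-1.0, 1.0], size=(n, n))
J = np.triu(J, 1)
J = J + J.T

K, xi0, dt = 1.0, 0.1, 0.002
c = 0.01 * rng.normal(size=n)          # cosine quadratures c_i
s = 0.01 * rng.normal(size=n)          # sine quadratures s_i

steps = 100000
for k in range(steps):
    p = 2.0 * k / steps                # adiabatic pump ramp from 0 to 2
    r2 = c**2 + s**2
    dc = K * r2 * s + p * s + xi0 * (J @ s)
    ds = -K * r2 * c + p * c - xi0 * (J @ c)
    c, s = c + dt * dc, s + dt * ds

# One plausible readout: the sign of the cosine quadrature as the spin value.
print(np.sign(c), np.sign(c) @ J @ np.sign(c))
```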

Appendix E Iterative Analog Matrix Multipliers

In this appendix, we present the system designed by Soljacic et al. [16]. We shall see that the simplified system we presented in Section 6.1 of the main text differs from that of Soljacic et al. significantly, in that their method adds noise and nonlinear thresholding after each iteration. It is possible that these modifications lead to performance improvements.

Formally, their iteration is given by:

𝑬(t+1)=u(2𝑲𝑬(t)+𝑵(t)),\bm{E}(t+1)=u\left(2\bm{KE}(t)+\bm{N}(t)\right),

where 𝑬(t)\bm{E}(t) is the vector of electric field amplitude values at the beginning of the tt-th iteration, u(x)u(x) is the Heaviside step function that is 1 for positive xx and 0 for negative xx, 𝑵(t)\bm{N}(t) is a zero-mean Gaussian random noise vector, and KK is a matrix given by K=𝑱+α𝑴K=\sqrt{\bm{J}+\alpha\bm{M}}, where 𝑱\bm{J} is the Ising connectivity matrix, α\alpha is a real number, and 𝑴\bm{M} is some suitably chosen matrix. More specifically, 𝑴\bm{M} is chosen to have the same eigenvectors as 𝑱\bm{J}. It will turn out that the eigenvalues of 𝑴\bm{M} play the role of Lagrange multipliers.
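A literal transcription of this iteration into Python is sketched below. For the sketch we take 𝑴 to be the identity matrix, which trivially shares its eigenvectors with 𝑱, and choose α just large enough that 𝑱 + α𝑴 is positive semi-definite so that the matrix square root is real; the problem size, noise level, and iteration count are assumptions.

```python
import numpy as np

# Literal sketch of the iteration E(t+1) = u(2*K*E(t) + N(t)), K = sqrt(J + alpha*M).
# M = I and all numerical parameters below are illustrative assumptions.
n = 8
rng = np.random.default_rng(6)
J = rng.choice([-1.0, 1.0], size=(n, n))
J = np.triu(J, 1)
J = J + J.T

M = np.eye(n)
alpha = 1.1 * abs(np.linalg.eigvalsh(J).min())        # shift the spectrum to be non-negative
w, V = np.linalg.eigh(J + alpha * M)
K = V @ np.diag(np.sqrt(np.maximum(w, 0.0))) @ V.T    # symmetric matrix square root

E = rng.integers(0, 2, size=n).astype(float)          # binary field amplitudes in {0, 1}
for _ in range(200):
    noise = rng.normal(0.0, 1.0, size=n)              # zero-mean Gaussian noise N(t)
    E = (2.0 * K @ E + noise > 0.0).astype(float)     # Heaviside thresholding u(.)

spins = 2.0 * E - 1.0
print(spins, spins @ J @ spins)
```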

The authors showed that under the condition of high noise 𝑵(t)\bm{N}(t), their system performs minimization of the following effective merit function:

H=β2ij(Jij+αMij)EiEj,H=-\frac{\beta}{2}\sum_{ij}{\left(J_{ij}+\alpha M_{ij}\right)E_{i}E_{j}},

where β\beta is some parameter dependent on the noise.

Using the fact that the matrix 𝑴\bm{M} is chosen to have the same eigenvectors as 𝑱\bm{J}, we rewrite the above merit function, modulo additive constants, as the following Lagrange function:

H=L(𝑬,𝜸)=β2(ijJijEiEj+αiγi(zi21)),H=L(\bm{E},\bm{\gamma})=-\frac{\beta}{2}\left(\sum_{ij}{J_{ij}E_{i}E_{j}}+\alpha\sum_{i}{\gamma_{i}\left(z_{i}^{2}-1\right)}\right),

where the γi\gamma_{i} are the eigenvalues of the matrix 𝑴\bm{M}, and the vector 𝒛\bm{z} is the vector of electric field amplitudes 𝑬\bm{E} expressed in the basis of eigenvectors of 𝑴\bm{M} (that is, the eigenvectors of 𝑱\bm{J}). We see that the eigenvalues of 𝑴\bm{M} play the role of Lagrange multipliers, albeit for different constraints than those required by the Ising problem. This difference seems to be caused by 𝑴\bm{M} not being a diagonal matrix.

In conclusion, we interpret their algorithm as optimizing a Lagrange function with the merit function being the Ising Hamiltonian itself, and the constraints being that the components of the spin vector when expressed in the eigenvector basis of 𝑱\bm{J} be restricted to 1 and -1.

References

  • [1] Y. Shen, N. C. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund, and M. Soljačić, “Deep learning with coherent nanophotonic circuits,” Nature Photonics, vol. 11, pp. 441–446, July 2017.
  • [2] L. Onsager, “Reciprocal Relations in Irreversible Processes. II.,” Physical Review, vol. 38, pp. 2265–2279, Dec. 1931.
  • [3] A. Lucas, “Ising formulations of many NP problems,” arXiv preprint arXiv:1302.5843, 2013.
  • [4] S. Utsunomiya, K. Takata, and Y. Yamamoto, “Mapping of Ising models onto injection-locked laser systems,” Optics express, vol. 19, no. 19, pp. 18091–18108, 2011.
  • [5] Z. Wang, A. Marandi, K. Wen, R. L. Byer, and Y. Yamamoto, “Coherent Ising machine based on degenerate optical parametric oscillators,” Physical Review A, vol. 88, no. 6, p. 063853, 2013.
  • [6] A. Marandi, Z. Wang, K. Takata, R. L. Byer, and Y. Yamamoto, “Network of time-multiplexed optical parametric oscillators as a coherent Ising machine,” Nature Photonics, vol. 8, no. 12, p. 937, 2014.
  • [7] Y. Haribara, S. Utsunomiya, and Y. Yamamoto, “Computational Principle and Performance Evaluation of Coherent Ising Machine Based on Degenerate Optical Parametric Oscillator Network,” Entropy, vol. 18, p. 151, Apr. 2016.
  • [8] P. L. McMahon, A. Marandi, Y. Haribara, R. Hamerly, C. Langrock, S. Tamate, T. Inagaki, H. Takesue, S. Utsunomiya, and K. Aihara, “A fully programmable 100-spin coherent Ising machine with all-to-all connections,” Science, vol. 354, no. 6312, pp. 614–617, 2016.
  • [9] T. Inagaki, K. Inaba, R. Hamerly, K. Inoue, Y. Yamamoto, and H. Takesue, “Large-scale Ising spin network based on degenerate optical parametric oscillators,” Nature Photonics, vol. 10, no. 6, p. 415, 2016.
  • [10] T. Inagaki, Y. Haribara, K. Igarashi, T. Sonobe, S. Tamate, T. Honjo, A. Marandi, P. L. McMahon, T. Umeki, K. Enbutsu, O. Tadanaga, H. Takenouchi, K. Aihara, K.-i. Kawarabayashi, K. Inoue, S. Utsunomiya, and H. Takesue, “A coherent Ising machine for 2000-node optimization problems,” Science, vol. 354, pp. 603–606, Nov. 2016.
  • [11] T. Leleu, Y. Yamamoto, P. L. McMahon, and K. Aihara, “Destabilization of Local Minima in Analog Spin Systems by Correction of Amplitude Heterogeneity,” Physical Review Letters, vol. 122, p. 040607, Feb. 2019.
  • [12] T. P. Xiao, Optoelectronics for refrigeration and analog circuits for combinatorial optimization. PhD Thesis, UC Berkeley, 2019.
  • [13] M. Babaeian, D. T. Nguyen, V. Demir, M. Akbulut, P.-A. Blanche, Y. Kaneda, S. Guha, M. A. Neifeld, and N. Peyghambarian, “A single shot coherent Ising machine based on a network of injection-locked multicore fiber lasers,” Nature Communications, vol. 10, p. 3516, Dec. 2019.
  • [14] T. Wang and J. Roychowdhury, “OIM: Oscillator-Based Ising Machines for Solving Combinatorial Optimisation Problems,” in Unconventional Computation and Natural Computation (I. McQuillan and S. Seki, eds.), vol. 11493, pp. 232–256, Cham: Springer International Publishing, 2019.
  • [15] K. P. Kalinin and N. G. Berloff, “Global optimization of spin Hamiltonians with gain-dissipative systems,” Scientific Reports, vol. 8, p. 17791, Dec. 2018.
  • [16] C. Roques-Carmes, Y. Shen, C. Zanoci, M. Prabhu, F. Atieh, L. Jing, T. Dubček, C. Mao, M. R. Johnson, V. Čeperić, J. D. Joannopoulos, D. Englund, and M. Soljačić, “Heuristic recurrent algorithms for photonic Ising machines,” Nature Communications, vol. 11, p. 249, Dec. 2020.
  • [17] H. Goto, K. Tatsumura, and A. R. Dixon, “Combinatorial optimization by simulating adiabatic bifurcations in nonlinear Hamiltonian systems,” Science Advances, vol. 5, p. eaav2372, Apr. 2019.
  • [18] B. Molnár, F. Molnár, M. Varga, Z. Toroczkai, and M. Ercsey-Ravasz, “A continuous-time MaxSAT solver with high analog performance,” Nature Communications, vol. 9, p. 4864, Dec. 2018.
  • [19] D. Pierangeli, G. Marcucci, and C. Conti, “Large-Scale Photonic Ising Machine by Spatial Light Modulation,” Physical Review Letters, vol. 122, p. 213902, May 2019.
  • [20] D. Pierangeli, M. Rafayelyan, C. Conti, and S. Gigan, “Scalable spin-glass optical simulator,” arXiv:2006.00828 [physics], June 2020. arXiv: 2006.00828.
  • [21] I. Prigogine, Etude Thermodynamique des Phénomènes irréversibles. Editions Desoer, Liège, 1947. Chap V.
  • [22] S. de Groot, Thermodynamics of Irreversible Processes. Interscience Publishers, Inc., New York, 1951. Chap X.
  • [23] N. G. Dickson, M. W. Johnson, M. H. Amin, R. Harris, F. Altomare, A. J. Berkley, P. Bunyk, J. Cai, E. M. Chapple, and P. Chavez, “Thermally assisted quantum annealing of a 16-qubit problem,” Nature communications, vol. 4, p. 1903, 2013.
  • [24] S. Boyd and L. Vandenberghe, Convex optimization. Cambridge university press, 2004.
  • [25] D. Bertsekas, Nonlinear programming. Athena Scientific, 1999.
  • [26] https://web.stanford.edu/~yyye/yyye/Gset/.
  • [27] U. Benlic and J.-K. Hao, “Breakout Local Search for the Max-Cutproblem,” Engineering Applications of Artificial Intelligence, vol. 26, pp. 1162–1173, Mar. 2013.
  • [28] J. Roychowdhury, “Boolean Computation Using Self-Sustaining Nonlinear Oscillators,” Proceedings of the IEEE, vol. 103, pp. 1958–1969, Nov. 2015.
  • [29] A. Demir, A. Mehrotra, and J. Roychowdhury, “Phase noise in oscillators: A unifying theory and numerical methods for characterization,” IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, vol. 47, no. 5, pp. 655–674, 2000.
  • [30] M. Reck, A. Zeilinger, H. J. Bernstein, and P. Bertani, “Experimental realization of any discrete unitary operator,” Physical Review Letters, vol. 73, pp. 58–61, July 1994.
  • [31] R. M. Karp, “Reducibility among combinatorial problems,” in Complexity of computer computations, pp. 85–103, Springer, 1972.
  • [32] H. Goto, “Bifurcation-based adiabatic quantum computation with a nonlinear oscillator network,” Scientific reports, vol. 6, p. 21686, 2016.
  • [33] H. Goto, “Quantum computation based on quantum adiabatic bifurcations of Kerr-nonlinear parametric oscillators,” Journal of the Physical Society of Japan, vol. 88, no. 6, p. 061015, 2019.