
Physics Successfully Implements Lagrange Multiplier Optimization

Sri Krishna Vadlamani1, Tianyao Patrick Xiao1 (now at Sandia National Laboratories, Albuquerque, NM, USA), Eli Yablonovitch1
(1Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA)
Abstract

Optimization is a major part of human effort. While being mathematical, optimization is also built into physics. For example, physics has the principle of Least Action, the principle of Minimum Entropy Generation, and the Variational Principle. Physics also has physical annealing which, of course, preceded computational Simulated Annealing. Physics has the Adiabatic Principle which in its quantum form is called Quantum Annealing. Thus, physical machines can solve the mathematical problem of optimization, including constraints.

Binary constraints can be built into the physical optimization. In that case, the machines are digital in the same sense that a flip-flop is digital. A wide variety of machines have had recent success at optimizing the Ising magnetic energy. We demonstrate in this paper that almost all those machines perform optimization according to the Principle of Minimum Entropy Generation as put forth by Onsager. Further, we show that this optimization is in fact equivalent to Lagrange multiplier optimization for constrained problems. We find that the physical gain coefficients which drive those systems actually play the role of the corresponding Lagrange Multipliers.

1 Introduction

Optimization is ubiquitous in today’s world. Everyday applications of optimization range from aerodynamic design of vehicles by fluid mechanics and physical stress optimization of bridges in civil engineering to scheduling of airline crews and routing of delivery trucks in operations research. Furthermore, optimization is also indispensable in machine learning, reinforcement learning, computer vision, and speech processing. Given the preponderance of massive datasets and computations today, there has been a surge of activity in the design of hardware accelerators for neural network training and inference [1].

We ask whether Physics can address optimization. There are a number of promising physical principles that drive dynamical systems toward an extremum. These are: the principle of Least Action, the principle of Minimum Entropy Generation, and the Variational Principle. Physics also has actual physical annealing which preceded computational Simulated Annealing. Further, Physics has the Adiabatic Principle which in its quantum form is called Quantum Annealing.

In due course, we may learn how to use each of these principles to perform optimization. Let us consider the principle of minimum entropy generation in dissipative physical systems, such as resistive electrical circuits. It was shown by Onsager [2] that the equations of linear systems, like resistor networks, can be re-expressed as the minimization of a function $f(i_1, i_2, \dots, i_n)$ of the currents $i_n$ in the various branches of the resistor network. For a resistor network, the function $f$ contains the power dissipation, or entropy generation. By re-expressing a merit function in terms of power dissipation, the circuit itself will find the minimum of the merit function, or minimum power dissipation. Optimization is generally accompanied by constraints. For example, perhaps the constraint is that the final answers must be restricted to be $\pm 1$. Such a digitally constrained optimization produces answers compatible with any digital computer.

There has been a series of machines created in the physics and engineering community to devise physics-based engines for the Ising problem. The Ising challenge is to find the minimum energy configuration of a large set of magnets. It’s a very hard problem even when the magnets are restricted to only two orientations, North pole up or down [3]. Our main insight in this paper is that most of these Ising solvers use hardware based on the Principle of Minimum Entropy generation. The natural evolution of these machines is toward a good, low-power-dissipation final state. Further, almost all of them implement the well-known Lagrange Multipliers method for constrained optimization.

An early work was by Yamamoto et al. in [4] and this was followed by further work from their group [5], [6], [7], [8], [9], [10], [11] and other groups [12], [13], [14], [15], [16], [17], [18], [19], [20]. These entropy generating machines range from coupled optical parametric oscillators, to RLC electrical circuits, to coupled exciton-polaritons, and silicon photonic coupled arrays. These types of machines have the advantage that they solve digital problems in the analog domain, which can be orders-of-magnitude faster and more energy-efficient than conventional digital chips that are limited by latency and the energy cost [11].

Within the framework of these dissipative machines, constraints can be readily included. In effect, these machines perform constrained optimization equivalent to the technique of Lagrange multipliers. We illustrate this connection by surveying 7 published physically distinct machines and showing that each minimizes entropy generation in its own way, subject to constraints; in fact, they perform Lagrange multiplier optimization. We note here that the systems in [11], [17], and [18] follow their own dynamics and are not related to the method of Lagrange multipliers. The system in [11] will be discussed in Section 5 in the main text while the work in [17] will be discussed in Appendix D.

In effect, physical machines perform local steepest descent in the entropy generation rate. They can become stuck in local optima. At the very least, they perform a rapid search for local optima, thus reducing the search space for the global optimum. These machines are also adaptable toward searching in an expanded phase space, and other techniques for approaching a global optimum.

The paper is organized as follows. In Section 2, we recognize that physics performs optimization through its variational and optimization principles. Then, we concentrate on the principle of minimum entropy generation or minimum power dissipation. In Section 3, we give an overview of the minimum entropy generation optimization solvers in the literature and show how they incorporate constraints. Section 4 has a quick tutorial on the method of Lagrange Multipliers. Section 5 studies five published solvers in detail and shows that they all follow some form of Lagrange multiplier dynamics. In Section 6, we look at those published physics-based solvers which are less obviously connected to Lagrange multipliers. Section 7 presents the application of physics-based solvers to linear regression in statistics. Finally, in Section 8, we conclude and discuss the consequences of this ability to implement physics-based Lagrange multiplier optimization for areas such as machine learning.

2 Optimization in Physics

We survey the minimization principles of physics and the important optimization algorithms derived from them. These physical optimization machines are intended to converge to optima that are ‘agnostic to initial conditions’. By ‘agnostic to initial conditions’, we mean systems that converge to the global optimum, or a good local optimum, irrespective of the initial point for the search.

2.1 Physics Principles and Algorithms

2.1.1 The principle of Least Action

The principle of Least Action is the most fundamental principle in physics. Newton’s Laws of Mechanics, Maxwell’s Equations of Electromagnetism, Schrödinger’s equation in Quantum Mechanics, and Quantum Field Theory can all be interpreted as minimizing a quantity called action. For the special case of light propagation, this reduces to the principle of Least Time, as shown in Fig. 1.

A conservative system without friction or losses evolves according to the principle of Least Action. The fundamental equations of physics are reversible. A consequence of this reversibility is the Liouville Theorem which states that volumes in phase space are left unchanged as the system evolves.

Contrariwise, in both a computer and an optimization solver, the goal is to have a specific solution with less uncertainty or a smaller zone in phase space than the initial state, an entropy cost first specified by Landauer and Bennett. Thus, some degree of irreversibility, or energy cost, is needed, specified by the number of digits in the answer in the Landauer/Bennett analysis. An algorithm has to be designed and programmed into the reversible system (either via software or via hardcoding of hardware) to effect the reduction in entropy needed to solve the optimization problem. Coming up with a fast algorithm for NP-hard problems is still an open problem in the field of reversible computing, which includes quantum computing.

This would require an energy cost, but not necessarily continuous power dissipation. We look forward to computer science breakthroughs that would allow the principle of Least Action to address unsolved problems. An alternative approach to computing would involve physical systems that continuously dissipate power, aiding in the contraction of phase space toward a final solution. While exceeding the Landauer limit, such systems might have the advantage of speed and simplicity. This brings us to the principle of Least Power dissipation.

Figure 1: The principle of Least Time, a subset of the principle of Least Action: The actual path that light takes to travel from point A to point B is the one that takes the least time to traverse. Recording the correct path entails a small energy cost consistent with the Landauer Limit.

2.1.2 The principle of Least Power Dissipation

If we consider systems that continuously dissipate power, we are led to a second optimization principle in physics, the principle of Least Entropy Production or Least Power Dissipation. This principle states that any physical system will evolve into a steady-state configuration that minimizes the rate of entropy production given the constraints (such as fixed thermodynamic forces, voltage sources, or input power) that are imposed on the system. An early version of this statement is provided by Onsager in his celebrated papers on the reciprocal relations [2]. This was followed by further foundational work on this principle by Prigogine [21] and de Groot [22]. This principle is readily seen in action in electrical circuits and is illustrated in Fig. 2. We shall frequently use this principle, as formulated by Onsager, in the rest of the paper.

Figure 2: The principle of Least Power Dissipation: In a parallel connection, the current distributes itself in a manner that minimizes the power dissipation subject to the constraint of fixed input current $I$.

2.1.3 Physical Annealing; Energy Minimization

This technique is widely used in materials science and metallurgy and involves the slow cooling of a system starting from a high temperature. As the cooling proceeds, the system tries to maintain thermodynamic equilibrium by reorganizing itself into the lowest energy minimum in its phase space. Energy fluctuations due to finite temperatures help the system escape from local optima. In theory, this procedure reaches the global optimum as the temperature approaches zero, but the temperature must be lowered prohibitively slowly for this to happen.
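
To make the computational analogue concrete, here is a minimal simulated-annealing sketch in Python for a small Ising energy. The spin count, random couplings, and geometric cooling schedule are illustrative assumptions, not values taken from any of the machines discussed in this paper.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 16                                  # illustrative number of spins
    J = rng.standard_normal((n, n))
    J = (J + J.T) / 2                       # symmetric random couplings
    np.fill_diagonal(J, 0.0)

    def ising_energy(s):
        # E = sum_{i,j} J_ij s_i s_j
        return s @ J @ s

    s = rng.choice([-1.0, 1.0], size=n)     # random initial spin configuration
    T = 2.0                                 # initial temperature
    for step in range(20000):
        i = rng.integers(n)
        dE = -4.0 * s[i] * (J[i] @ s)       # energy change if spin i is flipped
        # Metropolis rule: accept downhill moves always, uphill moves with Boltzmann probability
        if dE < 0 or rng.random() < np.exp(-dE / T):
            s[i] = -s[i]
        T = max(1e-3, T * 0.9995)           # slow geometric cooling

    print("final energy:", ising_energy(s))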

Figure 3: Physical annealing involves the slow cooling down of a system. The system performs gradient descent in configuration space with occasional jumps activated by finite temperature. If the cooling is done slowly enough, the system ends up in the ground state of configuration space.

2.1.4 Adiabatic Method

The Adiabatic Method involves the slow transformation of a system from initial conditions that are easily constructed to final conditions that capture the difficult problem at hand.

More specifically, to solve the Ising problem using this algorithm, one initializes the system of spins in the ground state of a simple Hamiltonian and then slowly varies the parameters of the Hamiltonian to end up with the final Ising Hamiltonian of interest. If the parameters are varied slowly enough, the system ends up in the ground state of the final Hamiltonian and the problem gets solved. In a quantum mechanical system, this is sometimes called ‘quantum annealing’. Several proposals and demonstrations, including the well-known D-Wave machine [23], utilize this algorithm.

The slow rate of variation of the Hamiltonian parameters that is required for this method to work is determined by the minimum energy spacing between the instantaneous ground state and the instantaneous first excited state that occurs as we move from the initial Hamiltonian to the final Hamiltonian. The smaller the gap is for a particular variation schedule, the slower the rate at which we need to perform the variation to successfully solve the problem. It has been shown that the gap can get exponentially small in the worst case, implying that this algorithm can take exponential time in the worst case for NP-hard problems.

Figure 4: A system initialized in the ground state of a simple Hamiltonian continues to stay in the ground state as long as the Hamiltonian is changed slowly enough.

2.1.5 Minimum Entropy Generation in Multi-Oscillator Arrays

Multi-Oscillator Arrays subject to Parametric Gain were introduced in [4] and [7] for solving Ising problems. This can be regarded as a subset of the Principle of Minimum Entropy Generation, which is always subject to a non-zero input power constraint. In this case, gain acts as a boundary condition or constraint for the principle of minimum entropy generation, and the oscillator array must arrange itself to dissipate the least power subject to that constraint. In the context of a multi-coupled-oscillator array with gain, a certain oscillator mode will have the least loss. That mode will grow in amplitude most rapidly. This least-loss mode can be individually selected by increasing the gain until it matches the intrinsic loss of that mode. Then further nonlinear evolution amongst all the modes will occur. If the oscillator array is bistable, as is the case for parametric gain which drives oscillation along the $\pm$ real axis, this becomes the analog of magnetic bistability in an Ising problem. Then, we seek a solution for the lowest magnetic energy state with some oscillators locked at zero phase shift and others locked at $\pi$ phase shift. This mechanism will be the main point of Section 3.

3 Coupled Multi-Oscillator Array Ising Solvers

The motivation for ‘Coupled Multi-Oscillator Array Ising Solvers’ is best explained using concepts from laser physics. As a laser is slowly being turned on, spontaneous emission from the laser gain medium couples into the various cavity modes and begins to become amplified. The different optical modes in the cavity have different loss coefficients due to their differing spatial profiles. As the laser pump/gain increases, the cavity mode with the least loss grows faster than the other modes. Once the gain reaches the threshold gain, further nonlinear evolution amongst all the modes will occur.

The design of the Coupled Multi-Oscillator Array Ising machines tries to map the power losses of the optimization machine to the magnetic energies of various states of the Ising problem. If the mapping is correct, the lowest power configuration will match the energetic ground state of the Ising problem. This is illustrated in Fig. 5. In effect, this is an example of a system evolving toward a state of minimum entropy generation, or minimum power dissipation, subject to the constraint of gain being present.

Figure 5: A lossy multi-oscillator system is provided with gain: The x-axis is a list of all the available modes in the system whereas the y-axis plots the loss coefficient of each mode. Gain is provided to the system and is gradually increased. As in single-mode lasers, the lowest loss mode, illustrated by the blue dot, grows exponentially, saturating the gain. Above threshold we can expect further nonlinear evolution among the modes so as to minimize power dissipation.

The archetypal solver in this class consists of a network of interconnected oscillators, driven by phase-dependent parametric gain. Parametric gain amplifies only the cosine quadrature and causes the electric field to lie along the $\pm$Real axis in the complex plane. The phase of the electric field (0 or $\pi$) can be used to represent $\pm$ spin in the Ising problem. The resistive interconnections between the oscillators are designed to favor ferromagnetic or anti-ferromagnetic ‘spin-spin’ interactions by the Principle of Minimum Entropy Generation, subject to the constraint of parametric (phase-dependent) gain as the power input. The parametric gain favors oscillation along the real axis of the complex plane, where the positive real axis would correspond to spin up, and the negative real axis would correspond to spin down.

The Voltage (or Current) input constraint is very important to the Principle of Minimum Entropy Generation. If there were no power input constraint, all the currents and voltages would be zero, and the minimum power dissipated would be zero. In the case of the Coupled Multi-Oscillator circuit, the power input is produced through a gain mechanism, or a gain module. The constraint could be the voltage input to the gain module. But if the gain were to be too small, it might not exceed the corresponding circuit losses, and the current and voltage would remain near zero. Thus, there is usually a threshold gain requirement, when applying the Principle of Minimum Entropy Generation to the Coupled Multi-Oscillator circuit.

If the gain is insufficient, the circuit will achieve minimum entropy generation at negligible currents and voltages. The intrinsic losses in the network would dominate and the circuit currents and voltages would be near zero. If the pump gain is then gradually ramped up, the oscillatory mode requiring the least threshold gain begins oscillating. Upon reaching the threshold gain, a non-trivial current distribution built on the bistability of the Coupled Multi-Oscillator circuit will emerge. As the gain exceeds the required threshold, there will be further nonlinear evolution among the modes so as to minimize power dissipation. The final state “spin” configuration, dissipating the lowest power (or entropy generation), emerges as the sought-for optimum.

Ideally, the gain evolution will be controlled by the Lagrange function to find the local minimum power dissipation configuration as will be discussed in Section 5.5. With Minimum Entropy Generation, as with most optimization schemes, it is difficult to guarantee a global optimum.

In optimization, each constraint contributes a Lagrange multiplier. We will show that the gains of the oscillators are the Lagrange multipliers of the constrained system. In the next section, we provide a brief tutorial on Lagrange Multiplier optimization.

4 Lagrange Multiplier Optimization Tutorial

The method of Lagrange multipliers is a very well-known procedure for solving constrained optimization problems. In constrained optimization, the optimal point $\bm{x}^* \equiv (x, y)$ in multi-dimensional solution space locally optimizes the merit function $f(\bm{x})$ subject to the constraint $g(\bm{x}) = 0$. The optimal point has the property that the slope of the merit function is zero as infinitesimal steps are taken away from $\bm{x}^*$, as taught in calculus. But these deviations are restricted to the constraint curve, as shown in Fig. 6. The iso-contours of the function $f(\bm{x})$ increase until they are limited by, and just touch, the constraint curve $g(\bm{x}) = 0$ at the point $\bm{x}^*$.

Figure 6: Maximization of the function $f(x, y)$ subject to the constraint $g(x, y) = 0$. At the constrained local optimum, the gradients of $f$ and $g$, namely $\nabla f(x, y)$ and $\nabla g(x, y)$, are parallel.

At the point of touching, $\bm{x}^*$, the gradient of $f$ and the gradient of $g$ are parallel to each other. This can be stated formally as:

\bm{\nabla}f(\bm{x}^{*})=\lambda^{*}\bm{\nabla}g(\bm{x}^{*}).

The proportionality constant $\lambda^*$ is called the Lagrange multiplier corresponding to the constraint $g(\bm{x}) = 0$.

Although Fig. 6 shows only a 2-dimensional optimization space, in general let us take the optimization space $\bm{x}$ to be $n$-dimensional. When we have multiple constraints $g_1, \dots, g_p$, we correspondingly expand the $n$-dimensional Lagrange Multiplier requirement:

\bm{\nabla}f(\bm{x}^{*})=\sum_{i=1}^{p}\lambda_{i}^{*}\bm{\nabla}g_{i}(\bm{x}^{*}),

where the gradient vector $\bm{\nabla}$ represents $n$ equations, accompanied by the $p$ constraint equations $g_i(\bm{x}) = 0$, resulting in $n + p$ equations. These equations solve for the $n$ components of the vector $\bm{x}^*$ and the $p$ unknown Lagrange Multipliers $\lambda_i^*$. That is $n + p$ equations for $n + p$ unknowns.

Motivated by the above condition, let us introduce a Lagrange function $L(\bm{x}, \bm{\lambda})$ defined as follows:

L(\bm{x},\bm{\lambda})=f(\bm{x})+\sum_{i=1}^{p}\lambda_{i}g_{i}(\bm{x}),

which can be optimized by gradient descent or other methods to solve for $\bm{x}^*$ and $\bm{\lambda}^*$. The full theory of Lagrange multipliers, and the popular ‘Augmented Lagrangian Method of Multipliers’ algorithm used to solve for locally optimal $(\bm{x}^*, \bm{\lambda}^*)$, are discussed in great detail in [24] and [25]. A gist of the main points is presented in Appendices A, B, C.
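
As a minimal numerical sketch of this procedure (gradient descent in the variables, gradient ascent in the multiplier, as elaborated in the appendices), consider the hypothetical toy problem of minimizing $f(x, y) = x^2 + y^2$ subject to $g(x, y) = 1 - x - y = 0$; the problem, step sizes, and iteration count are illustrative assumptions, not taken from the paper.

    import numpy as np

    # Hypothetical toy problem (not from the paper): minimize f(x, y) = x^2 + y^2
    # subject to g(x, y) = 1 - x - y = 0.  The constrained optimum is x = y = 1/2, lambda = 1.
    def grad_f(v):
        return 2.0 * v

    def g(v):
        return 1.0 - v[0] - v[1]

    def grad_g(v):
        return np.array([-1.0, -1.0])

    v = np.array([0.0, 0.0])     # decision variables (x, y)
    lam = 0.0                    # Lagrange multiplier
    step_v, step_lam = 0.05, 0.05

    for _ in range(2000):
        # L(v, lambda) = f(v) + lambda * g(v)
        v = v - step_v * (grad_f(v) + lam * grad_g(v))    # gradient descent in the variables
        lam = lam + step_lam * g(v)                       # gradient ascent in the multiplier

    print(v, lam)                # approaches [0.5, 0.5] and 1.0

The iteration settles at the constrained optimum $x = y = 1/2$ with $\lambda = 1$, where the constraint is satisfied and the gradients of $f$ and $g$ are parallel.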

For the specific case of the Ising problem, the objective function is given by $f(\bm{\mu}) = \sum_{i,j} J_{ij}\, \bm{\mu}_i \cdot \bm{\mu}_j$, where $f(\bm{\mu})$ is the magnetic Ising Energy and $\bm{\mu}_i$ is the $i$-th magnetic moment vector. For the optimization method represented in this paper, we need a circuit or other physical system whose power dissipation is also $f(\bm{x}) = \sum_{i,j} J_{ij} x_i x_j$, but now $f(\bm{x})$ is power dissipation, not energy, $x_i$ is a variable that represents voltage, current, or electric field, and the $J_{ij}$ are not magnetic energies but rather resistive coupling elements. The correspondence is between magnetic spins quantized along the z-axis, $\mu_{zi} = \pm 1$, and the circuit variable $x_i = \pm 1$.

While ‘energy’ and ‘power dissipation’ are represented by different units, we nonetheless need to establish a correspondence between them. For every optimization problem, there is the challenge of finding a physical system whose power dissipation function represents the desired equivalent optimization function.

If the Ising problem has $n$ spins, there are also $p = n$ constraints, one for each of the spins. A sufficient constraint is: $g_i(\bm{x}) = 1 - x_i^2$. More complicated nonlinear constraints can be envisioned, but $(1 - x_i^2)$ could represent the first two terms in a more complicated constraint Taylor expansion.

Therefore, a sufficient Lagrange function for the Ising problem, with digital constraints, is given by:

L(\bm{x},\bm{\lambda})=\sum_{i=1}^{n}\sum_{j=1}^{n}J_{ij}x_{i}x_{j}+\sum_{i=1}^{n}\lambda_{i}\left(1-x_{i}^{2}\right),

where $\lambda_i$ is the Lagrange Multiplier associated with the corresponding constraint. We shall see in the next section that most analog algorithms that have been proposed for the Ising problem in the literature actually tend to optimize some version of the above Lagrange function.
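
A small numerical sketch of this Lagrange function (with hypothetical random $\pm 1$ couplings, not a benchmark instance) shows that on the constraint surface $x_i = \pm 1$ the penalty term vanishes, so $L$ reduces to the Ising energy regardless of the multipliers:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 8                                     # illustrative number of spins
    J = rng.choice([-1.0, 1.0], size=(n, n))
    J = np.triu(J, 1); J = J + J.T            # symmetric couplings, zero diagonal

    def ising_energy(x):
        return x @ J @ x

    def lagrange_function(x, lam):
        # L(x, lambda) = sum_ij J_ij x_i x_j + sum_i lambda_i (1 - x_i^2)
        return ising_energy(x) + lam @ (1.0 - x**2)

    x_digital = rng.choice([-1.0, 1.0], size=n)   # any point on the constraint surface
    lam = rng.random(n)                           # arbitrary multipliers
    print(lagrange_function(x_digital, lam), ising_energy(x_digital))   # identical values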

5 The Physical Ising Solvers

In this section, we discuss each physical procedure proposed in the literature and show how each physical scheme implements the method of Lagrange multipliers. They all obtain good performance on the Gset benchmark problem set [26], and many of them demonstrate that they perform better than the heuristic algorithm, Breakout Local Search [27]. The main result of our work is the realization that the pump gains used in all the physical methods are in fact Lagrange multipliers.

The available physical solvers in the literature are as follows:

  1. Optical Parametric Oscillators
  2. Coupled Radio Oscillators on the Real Axis
  3. Coupled laser cavities using multicore fibers
  4. Coupled Radio Oscillators on the Unit Circle
  5. Coupled polariton condensates

Then there are a number of schemes that also rely upon a variant of minimum entropy production or power dissipation:

  6. Iterative Analog Matrix Multipliers
  7. Leleu Mathematical Ising Solver

In Appendix D, there is a scheme that appears unconnected with the minimum entropy production rate:

  8. Adiabatic coupled radio oscillators (Toshiba)

We shall see that methods 1, 2, 4 in the literature use only one gain for all the oscillators, which is equivalent to imposing only one constraint. The other methods, 3, 5, 6, use different gains for each spin and correctly capture the full $n$ constraints of the Ising problem.

5.1 Optical Parametric Oscillators

5.1.1 Overview

An early optical machine for solving the Ising problem was presented by Yamamoto et al. [4] and [8]. Their system consists of several pulses of light circulating in an optical fiber loop, with the phase of each light pulse representing an Ising spin. In parametric oscillators, gain occurs at half the pump frequency. If the gain overcomes the intrinsic losses of the fiber, the optical pulse builds up. Parametric amplification provides phase dependent gain. It restricts the oscillatory phase to the Real Axis of the complex plane. This leads to bistability along the positive or negative real axis, allowing the optical pulses to mimic the bistability of magnets.

In the Ising problem, there is magnetic coupling between spins. The corresponding coupling between optical pulses is achieved by controlled optical interactions between the pulses. In Yamamoto’s approach, one pulse $i$ is first plucked out by an optical gate, amplitude modulated by the proper connection weight specified in the $J_{ij}$ Ising Hamiltonian, and then reinjected and superposed onto the other optical pulse $j$, producing constructive or destructive interference, representing ferromagnetic or anti-ferromagnetic coupling.

By providing saturation to the pulse amplitudes, the optical pulses will finally settle down, each to one of the two bistable states. We will find that the pulse amplitude configuration evolves exactly according to the Principle of Minimum Entropy Generation. If the magnetic dipole solutions in the Ising problem are constrained to $\pm 1$, then each constraint is associated with a Lagrange Multiplier. We find that each Lagrange Multiplier turns out to be equal to the gain or loss associated with the corresponding oscillator.

5.1.2 Lagrange multipliers as gain coefficients

As an example, Yamamoto et al. [7] analyze their parametric oscillator system using coupled wave equations for the slowly varying amplitudes of the circulating optical modes. We now show that the coupled-wave-equation approach reduces to finding an extremum of the system’s ‘Entropy Generation’ or ‘power dissipation’. The coupled-wave equation for parametric gain of the slowly varying amplitude $c_i$ of the in-phase cosine component of the $i$-th optical pulse (representing a magnetic spin in an Ising system) is as follows:

\frac{dc_{i}}{dt}=(-\alpha_{i}+\gamma_{i})c_{i}-\sum_{j}J_{ij}c_{j}, \qquad (1)

where the weights $J_{ij}$ are the ferromagnetic or anti-ferromagnetic magnetic cross-couplings, and $\gamma_i$ represents the phase-dependent parametric gain given by the pump to the $i$-th circulating pulse, whereas $\alpha_i$ is the corresponding dissipative loss. The amplitudes $c_i$ represent the optical cosine-wave electric field amplitudes. In the case of circuits, the voltage amplitudes can be expressed as $V_i = c_i + \text{j} s_i$, where $c_i$ and $s_i$ represent the cosine and sine quadrature components of the voltage, and j is the unit imaginary. For clarity of discussion, we dropped the cubic terms in (1) that Yamamoto et al. originally had. A discussion of these terms is given in Appendix C.

Owing to the nature of parametric amplification, the quadrature sine-wave components $s_i$ of the electric field amplitude die out rapidly. The rate of entropy generation, or net power dissipation $h$, including the negative dissipation associated with gain, can be written:

h(c_{1},\dots,c_{n})=\sum_{i,j}J_{ij}c_{i}c_{j}+\sum_{i}\alpha_{i}c_{i}^{2}-\sum_{i}\gamma_{i}c_{i}^{2}, \qquad (2)

If we minimize the entropy generation $h(\bm{c})$ without invoking any constraints, that is, with $\gamma_i = 0$, the amplitudes $c_i$ simply go to zero, which generates the minimum entropy.

If the gain $\gamma_i$ is large enough, some of the amplitudes might go to infinity. To avoid this, we may employ the $n$ constraint functions $g_i(c_i) = (1 - c_i^2) = 0$, which enforce a digital $c_i = \pm 1$ outcome. (Actually, a constraint of the form $g_i(c_i) = (1 - c_i^2) = 0$ is quite general in the sense that $(1 - c_i^2)$ can represent the first two terms of the Taylor Series of an arbitrary constraint.) Adding the constraint functions to the entropy generation yields the Lagrange function, including the constraint functions times the respective Lagrange Multipliers:

L(\bm{c},\bm{\gamma})=\sum_{i,j}J_{ij}c_{i}c_{j}+\sum_{i}\alpha_{i}c_{i}^{2}-\sum_{i}\gamma_{i}\left(c_{i}^{2}-1\right), \qquad (3)

Comparing the unconstrained (2) to the constrained (3), they differ only in the final $(-1)$ term, which effectively constrains the amplitudes and prevents them from diverging to $\infty$. Expression (3) is the Lagrange function given at the end of Section 4. Surprisingly, the gains $\gamma_i$ emerge to play the role of Lagrange Multipliers. This means that each mode, represented by the subscript in $c_i$, must adjust to a particular gain $\gamma_i$ which minimizes the overall entropy generation, and the respective gains $\gamma_i$ represent the Lagrange Multipliers. Minimization of the Lagrange function (3) provides the final steady state of the system dynamics.

If the circuit or optical system is designed to dissipate power, or equivalently generate entropy, in a mathematical form that matches the magnetic energy in the Ising problem, then the dissipative system will seek out a corresponding local optimum configuration of the magnetic Ising energy.

Such a physical system, constrained to $c_i = \pm 1$, is digital in the same sense as a flip-flop circuit, but unlike the von Neumann computer, the inputs are resistor weights for power dissipation. Nonetheless, a physical system can evolve in a direct manner, without the need for shuttling information back and forth as in a von Neumann computer, providing faster answers. Without the communications overhead but with the higher operation speed, the energy dissipated to arrive at the final answer will be less, in spite of the circuit being required to generate entropy during its evolution toward the final state.

To achieve minimum entropy production, the amplitudes $c_i$ and the Lagrange Multipliers $\gamma_i$ must all be simultaneously optimized. While a circuit will evolve toward optimal amplitudes $c_i$, the gains $\gamma_i$ must arise from a separate active circuit. Ideally, the active circuit which controls the Lagrange Multiplier gains $\gamma_i$ would have its entropy production included with the main circuit. A more common method is to provide gain that follows a heuristic rule using an external feedback circuit. For example, Yamamoto et al. follow the heuristic rule $\gamma_i = a + bt$. It is not yet clear whether the heuristic-based approach toward gain evolution is as effective as simply lumping together all main-circuit and feedback components and minimizing the total power dissipation.
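
The following sketch integrates Eq. (1) numerically with the heuristic gain ramp $\gamma_i = a + bt$. A cubic saturation term stands in for the nonlinear terms that were dropped from Eq. (1) for clarity, and all parameter values and couplings are illustrative assumptions rather than values from [4] or [7].

    import numpy as np

    rng = np.random.default_rng(2)
    n = 10
    J = rng.choice([-1.0, 1.0], size=(n, n))
    J = np.triu(J, 1); J = J + J.T          # illustrative Ising couplings, zero diagonal

    alpha = 1.0                             # uniform linear loss (illustrative)
    a, b = 0.0, 0.05                        # heuristic gain ramp gamma(t) = a + b*t
    dt, steps = 0.01, 4000

    c = 1e-3 * rng.standard_normal(n)       # small random initial amplitudes ("noise")
    for k in range(steps):
        gamma = a + b * (k * dt)
        # Eq. (1) plus a cubic saturation term standing in for the dropped nonlinearity
        c = c + dt * ((-alpha + gamma) * c - J @ c - c**3)

    spins = np.sign(c)                      # phase 0 or pi read out as spin +/-1
    print(spins, "Ising energy:", spins @ J @ spins)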

We conclude this subsection by noting that the Lagrangian, Eq (3), corresponds to Lagrange multiplier optimization using the following merit function and constraints:

f(\bm{c})=\sum_{i,j}J_{ij}c_{i}c_{j}+\sum_{i}\alpha_{i}c_{i}^{2},
g_{i}(c_{i})=\left(1-c_{i}^{2}\right)=0.

5.1.3 Conclusions from Optical Parametric Oscillators

  1. Physical systems minimize the entropy production rate, or power dissipation, subject to input constraints of voltage, amplitude, gain, etc.
  2. These systems actually perform Lagrange Multiplier optimization.
  3. Indeed, it is the gain $\gamma_i$ in each oscillator $i$ that plays the role of the corresponding Lagrange Multiplier.
  4. If the Lagrange function is split in such a way that only the Ising function, $\sum_{i,j} J_{ij} c_i c_j$, is treated as the merit function $f(\bm{c})$, then the Lagrange multipliers corresponding to the constraints $g_i$ are the net gains, $\gamma_i - \alpha_i$.
  5. Under the digital constraint, amplitudes $c_i = \pm 1$, entropy generation minimization schemes are actually binary, similar to a flip-flop.

5.2 Coupled Radio Oscillators on the real axis

5.2.1 Overview

A coupled LC oscillator system with parametric amplification was analyzed in the circuit simulator SPICE by Xiao et al. [12]. Analogous to the optical Yamamoto system, this system consists of a network of radio-frequency LC oscillators coupled to one another through resistive connections. The LC oscillators contain linear inductors but nonlinear capacitors which provide the parametric gain. The parallel or cross-connect resistive connections between the oscillators are designed to implement the ferromagnetic or anti-ferromagnetic couplings $J_{ij}$ between magnetic dipole moments $\mu_i$, as shown in Fig. 7. The corresponding phase of the voltage amplitude $V_i$, 0 or $\pi$, determines the sign of the magnetic dipole moment $\mu_i$.

Figure 7: Coupled LC oscillator circuit for two coupled magnets. The oscillation of the LC oscillators represents the magnetic moments while the parallel or antiparallel cross-connections represent ferromagnetic or anti-ferromagnetic coupling, respectively. The nonlinear capacitors are pumped by $V(2\omega_0)$ at frequency $2\omega_0$, providing parametric gain at $\omega_0$.

The nonlinear capacitors are pumped by a voltage $V(2\omega_0)$ at frequency $2\omega_0$, where the LC oscillator natural frequency is $\omega_0$. Second-harmonic pumping leads to parametric amplification in the oscillators. As in the optical case, parametric amplification plays the dual role of generating/sustaining voltage oscillations as well as imposing phase bistability on the ac voltages in the oscillators. The bistability refers to gain along the $\pm$Real-Axis defined by time synchronization with the $2\omega_0$ pump.

The $2\omega_0$ pump induces gain $\gamma_i$ in the Real-Axis quadrature. As in the case of optical parametric amplifier machines, ideally an active circuit would control the Lagrange Multiplier gains $\gamma_i$, and the gain control circuit would have its entropy production included with the main circuit. A more common approach is to provide gain that follows a heuristic rule.

As in the optical parametric amplifier case, a mechanism is needed to prevent the parametric gain from producing infinite amplitude signals. Zener diodes can be inserted into the circuit to restrict the amplitudes to finite saturation values or to digitally defined values. With the diodes in place, the circuit settles into a voltage phase configuration, 0 or π\pi, that minimizes net power dissipation for a given pump gain.

5.2.2 Lagrange function and Lagrange multipliers

The amplitudes can be defined by the voltages across the LC oscillator capacitors and derived from Kirchhoff's voltage and current equations as in Xiao [12]. Performing the slowly-varying amplitude approximation on the cosine components of these voltages, $c_i$, Xiao obtains the following equation of motion:

\frac{dc_{i}}{dt}=\frac{1}{4R_{c}C_{0}}\sum_{j}{J_{ij}c_{j}}-\alpha_{i}c_{i}+\gamma_{i}c_{i}, \qquad (4)

where the $c_i$ are the peak voltage amplitudes in units of a reference voltage, $R_c$ is the resistance of the coupling resistors, the cross-couplings $J_{ij}$ are assigned binary values $J_{ij} = \pm 1$, $C_0$ is the linear part of the capacitance in each oscillator, $n$ is the number of oscillators, $\omega_0$ is the natural frequency of all the oscillators, and the parametric gain constant is $\gamma = \omega_0 |\Delta C| / 4C_0$, where $|\Delta C|$ is the capacitance modulation at the second harmonic. In the decay constant $\alpha = (n-1)/(4 R_c C_0)$, there are $(n-1)$ resistors $R_c$ in parallel, since it is assumed that each oscillator can give up energy to all the other $(n-1)$ oscillators, with the coupling resistors acting in parallel. In this simplified model all decay constants $\alpha$ are taken as equal, and moreover each oscillator experiences exactly the same parametric gain $\gamma$; conditions that can be relaxed if needed.

The number 4 is present in the first denominator since, for two coupled LC circuits as shown in Fig. 7, the decay is controlled by the RC time of the capacitors in parallel and the $R_c$ resistors in series. Likewise, the number 4 appears for loss and parametric gain, but for different reasons in each case.

We note that equation (4) above performs gradient descent on the net power dissipation function:

h(\bm{c},\gamma)=-\frac{1}{4R_{c}}\sum_{i,j}J_{ij}c_{i}c_{j}+\sum_{i}\alpha C_{0}c_{i}^{2}-\sum_{i}\gamma C_{0}c_{i}^{2}, \qquad (5)

which is very similar to Section 5.1. On the right-hand side, the reason for the 4 in the first denominator is to compensate for double counting over $i \neq j$. The first two terms on the right-hand side together represent the dissipative losses in the coupling resistors while the third term is the negative of the gain provided to the system of oscillators.

Next, we obtain the following Lagrange function through the same replacement of $(-c_i^2)$ with $(1 - c_i^2)$ that we performed in Section 5.1:

L(\bm{c},\gamma)=-\frac{1}{4R_{c}}\sum_{i,j}J_{ij}c_{i}c_{j}+\sum_{i}\alpha C_{0}c_{i}^{2}-\sum_{i}\gamma C_{0}\left(c_{i}^{2}-1\right), \qquad (6)

The above Lagrangian corresponds to Lagrange multiplier optimization using the following merit function and constraints:

f(\bm{c})=-\frac{1}{4R_{c}}\sum_{i,j}J_{ij}c_{i}c_{j}+\sum_{i}\alpha C_{0}c_{i}^{2},
g(\bm{c})=\sum_{i}\left(1-c_{i}^{2}\right)=0.

Again, we see that the gain coefficient $\gamma$ is the Lagrange multiplier of the constraint $g = 0$.

5.2.3 Iterative Optimization as a Series of Time Steps

Although the extremum of Eq. (6) represents the final evolved state of a physical system representing an optimization outcome, it is interesting to examine the time evolution toward the optimal state. The optimization occurs by iterative steps, where each iteration can be regarded as taking place in a time interval $\Delta t$. At each successive iteration, the voltage amplitude $c_i$ takes a step whose magnitude is proportional to the gradient of the Lagrange function:

c_{i}(t+\Delta t)=c_{i}(t)-\kappa\Delta t\frac{\partial}{\partial c_{i}}L(\bm{c},\gamma), \qquad (7)

where the minus sign on the right-hand side drives the system toward minimum power dissipation. As the Lagrange function comes closer to its minimum, the gradient $\frac{\partial}{\partial c_{i}}L(\bm{c},\gamma)$ diminishes and the amplitude steps become smaller and smaller. The adjustable proportionality constant $\kappa$ controls the size of each iterative step; it also calibrates the dimensional units between power dissipation and voltage amplitude. (Since $c_i$ is a voltage amplitude, $\kappa$ has units of reciprocal capacitance.) Converting Eq. (7) to continuous time,

\frac{dc_{i}}{dt}=-\kappa\frac{\partial}{\partial c_{i}}L(\bm{c},\gamma), \qquad (8)
\frac{dc_{i}}{dt}=-\kappa\left(\frac{\partial}{\partial c_{i}}f(c_{i})+\sum_{j}\gamma_{j}\frac{\partial}{\partial c_{i}}g_{j}(c_{j})\right), \qquad (9)

where the $\gamma_j$ play the role of Lagrange multipliers and the $g_j = 0$ are the constraints. Taking $L(\bm{c},\gamma)$ from Eq. (6), the gradient in the voltage amplitudes becomes

\frac{\partial}{\partial c_{i}}L(\bm{c},\gamma)=-\frac{1}{2R_{c}}\sum_{j}{J_{ij}c_{j}}+2\alpha C_{0}c_{i}-2\gamma C_{0}c_{i}.

Substituting into Eq (8), the time derivative of the voltage amplitudes becomes

\frac{dc_{i}}{dt}=2\kappa\left(\frac{1}{4R_{c}}\sum_{j}{J_{ij}c_{j}}-\alpha C_{0}c_{i}+\gamma C_{0}c_{i}\right). \qquad (10)

The constant $\kappa$ can be absorbed into the units of time to reproduce the dynamical equation (4), the slowly-varying amplitude approximation for the coupled radio oscillators. Thus, it is interesting that the slowly-varying time dynamics can be reproduced from iterative optimization steps on the Lagrange function.
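
The correspondence can be checked numerically: at any amplitude state, the gradient of the Lagrange function of Eq. (6) reproduces, up to the constant $\kappa$ absorbed into the time units, the right-hand side of the slowly-varying-amplitude equation (4). The sketch below performs this check with illustrative, hypothetical circuit values.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 6
    J = rng.choice([-1.0, 1.0], size=(n, n))
    J = np.triu(J, 1); J = J + J.T          # illustrative couplings, zero diagonal

    Rc, C0 = 1.0, 1.0                       # illustrative circuit values
    alpha = (n - 1) / (4 * Rc * C0)         # decay constant as defined below Eq. (4)
    gamma = 1.2 * alpha                     # pump gain slightly above the loss
    c = rng.standard_normal(n)              # an arbitrary amplitude state

    # Gradient of the Lagrange function of Eq. (6)
    dL_dc = -(1 / (2 * Rc)) * (J @ c) + 2 * alpha * C0 * c - 2 * gamma * C0 * c

    # Right-hand side of the slowly-varying-amplitude equation, Eq. (4)
    rhs_eq4 = (1 / (4 * Rc * C0)) * (J @ c) - alpha * c + gamma * c

    # With kappa = 1/(2*C0) absorbed into the time units, -kappa * dL/dc reproduces Eq. (4)
    kappa = 1 / (2 * C0)
    print(np.allclose(-kappa * dL_dc, rhs_eq4))   # True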

5.2.4 Conclusions from Coupled Oscillators on the Real Axis

  1. As in other cases, the coupled LC oscillator system [12] minimizes the entropy production rate, or power dissipation, incorporating the power input from the pump gain.
  2. The coupled LC oscillator system actually performs Lagrange Multiplier optimization whose Merit Function is the Lagrange Function.
  3. The gain $\gamma_i$ in each oscillator $i$ plays the role of the corresponding Lagrange Multiplier.
  4. Under the amplitude constraint on the voltage cosine component, $c_i = \pm 1$, the entropy generation minimization scheme is actually binary, similar to a flip-flop.
  5. Successive iterations toward a power dissipation minimum, employing gradient descent in the amplitudes of the cosine components of the voltages $c_i$, surprisingly yield the same time-dependent slowly-varying amplitude differential equation (4).

5.3 Coupled laser cavities using multicore fibers

5.3.1 Overview

The Ising solver designed by Babaeian et al., [13], makes use of coupled laser modes in a multicore optical fiber. Polarized light in each core of the optical fiber corresponds to each magnetic moment in the Ising problem. This means that the number of cores needs to be equal to the number of magnets in the given Ising instance. The right-hand circular polarization and left-hand circular polarization of the laser light in each core represent the two polarities (up and down) of the corresponding magnet. The mutual coherence of the various cores is maintained by injecting seed light from a master laser.

The coupling between the fiber cores is achieved through amplitude mixing of the laser modes by Spatial Light Modulators at one end of the multicore fiber [13]. These Spatial Light Modulators couple light amplitude from the $i$-th core to the $j$-th core according to the prescribed connection weight $J_{ij}$.

5.3.2 Equations and comparison with Lagrange multipliers

As in prior physical examples, the electric field amplitudes can be expressed in the slowly-varying polarization modes of the $i$-th core, $E_{iL}$ and $E_{iR}$, where the two electric field amplitudes are in-phase temporally, are positive real, but have different polarization. They are,

\frac{d}{dt}E_{iL}=-\alpha_{i}E_{iL}+\gamma_{i}E_{iL}+\frac{1}{2}\sum_{j}{J_{ij}\left(E_{jR}-E_{jL}\right)},
\frac{d}{dt}E_{iR}=-\alpha_{i}E_{iR}+\gamma_{i}E_{iR}-\frac{1}{2}\sum_{j}{J_{ij}\left(E_{jR}-E_{jL}\right)},

where $\alpha_i$ is the decay rate in the optical core in 1/sec units, and $\gamma_i$ is the gain supplied to the $i$-th core. The first term on the right in both equations represents optical fiber losses, while the second term represents the gain provided. The third term represents the coupling between the $j$-th and $i$-th cores that is provided by the Spatial Light Modulators. We next define the degree of polarization as $\mu_i \equiv E_{iL} - E_{iR}$. (This differs from the usual definition of the degree of polarization of an optical beam, which contains intensity rather than electric field amplitude.) Subtracting the two equations above, we obtain the following evolution equation for $\mu_i$:

\frac{d}{dt}\mu_{i}=-\alpha_{i}\mu_{i}+\gamma_{i}\mu_{i}+\sum_{j}{J_{ij}\mu_{j}}.

The power dissipation, or entropy generation, is proportional to the two orthogonal components of the electric field, each squared, $|E_{iL}|^2 + |E_{iR}|^2$. This can also be written $|E_{iL} - E_{iR}|^2 + |E_{iL} + E_{iR}|^2 = |\mu_i|^2 + |E_{iL} + E_{iR}|^2$. The term $|E_{iL} + E_{iR}|^2$ can be regarded as relatively constant as energy switches back and forth between right and left circular polarization. Then the changes in power dissipation, or entropy generation, $h(\bm{\mu})$, are most influenced by the quadratic terms in $\bm{\mu}$:

h(\bm{\mu},\bm{\gamma})=\sum_{i}\alpha_{i}\mu_{i}^{2}+\sum_{i,j}J_{ij}\mu_{i}\mu_{j}-\sum_{i}\gamma_{i}\mu_{i}^{2}.

As before, we add the $n$ digital constraints $g_i(\mu_i) = 1 - \mu_i^2 = 0$, where $\mu_i = \pm 1$ represents fully left or right circular optical polarization. With the constraints, the corresponding Lagrange function is:

L(\bm{\mu},\bm{\gamma})=\sum_{i}\alpha_{i}\mu_{i}^{2}+\sum_{i,j}J_{ij}\mu_{i}\mu_{j}-\sum_{i}\gamma_{i}\left(\mu_{i}^{2}-1\right).

Once again, the gains $\gamma_i$ play the role of Lagrange multipliers. Thus, a minimization of the power dissipation, subject to the optical gain $\gamma_i$, solves the Ising problem defined by the same $J_{ij}$ couplings.

The power dissipation function and constraint function in the Lagrange function above are:

f(\bm{\mu})=\sum_{i}\alpha_{i}\mu_{i}^{2}+\sum_{i,j}J_{ij}\mu_{i}\mu_{j},
g_{i}(\mu_{i})=\left(1-\mu_{i}^{2}\right)=0.

5.3.3 Conclusions for Coupled Multicore Fibers

  1. The coupled multicore fiber system [13] minimizes the entropy production rate, or power dissipation, which includes the power input from the pump gain, as in Sections 5.1 and 5.2.
  2. The coupled multicore fiber system actually performs Lagrange Multiplier optimization.
  3. The gain $\gamma_i$ in each fiber $i$ plays the role of the corresponding Lagrange Multiplier.
  4. Under the constraint of complete light polarization, $\mu_i = \pm 1$, the entropy generation minimization procedure is actually binary, similar to a flip-flop.

5.4 Coupled Electrical Oscillators on the Unit Circle

5.4.1 Overview

In this section, we consider a network of interconnected, nonlinear, but amplitude-stable electrical oscillators designed by Roychowdhury et al. [14] to represent a conventional Ising system for which we seek a digital solution with each magnetic dipole $\mu_{iz} = \pm 1$ along the z-axis in magnetic dipole space. To solve this, Roychowdhury et al. provide a dissipative system of LC oscillators, somewhat similar to the Optical Parametric Oscillators in Section 5.1, but with the oscillation amplitude clamped, and the oscillation phase $\phi_i = 0$ or $\pi$ revealing the preferred magnetic dipole orientation $\mu_{iz} = \pm 1$. It is noteworthy that Roychowdhury goes beyond Ising machines and constructs general digital logic gates using these amplitude-stable oscillators in [28].

In the construction of their oscillators, Roychowdhury et al. [14] use nonlinear elements that behave like negative resistors at low voltage amplitudes but whose resistance saturates at high voltage amplitudes. This produces amplitude-stable oscillators. In addition, Roychowdhury et al. [14] provide a second-harmonic pump and use a form of parametric amplification (referred to as sub-harmonic injection locking in [14]) to obtain bistability with respect to phase.

The dynamics of the amplitude saturation is purposely very fast, and the oscillators are essentially clamped to an amplitude limit dictated by the nonlinear resistor in each oscillator. The phase dynamics induced by the second-harmonic pump are slower. It is the readout of these phase shifts, 0 or $\pi$, that provides the magnetic dipole orientation $\mu_{iz} = \pm 1$. One key difference between this system and Yamamoto’s Optical Parametric Oscillator (OPO) system is that the OPO system had fast phase dynamics and slow amplitude dynamics, while the injection-locking system has the reverse.

5.4.2 Equations and comparison with Lagrange Multipliers

Roychowdhury et al. [14] derived the dynamics of their amplitude stable oscillator network using perturbation concepts developed in [29]. While a circuit diagram is not shown, [14] invokes the following dynamical equation for the phases of their electrical oscillators:

\frac{d\phi_{i}}{dt}=-\frac{1}{R_{c}}\sum_{j}{J_{ij}\sin{\left(\phi_{i}(t)-\phi_{j}(t)\right)}}-\lambda_{i}\sin{\left(2\phi_{i}(t)\right)}, \qquad (11)

where $R_c$ is a coupling resistance in their system, $\phi_i$ is the phase of the $i$-th electrical oscillator, and the $\lambda_i$ are decay parameters that dictate how fast the phase angles settle toward their steady-state values.

We shall now show that Eq. (11) can be reproduced by iteratively minimizing the power dissipation in their system. The power dissipated across a resistor is $(V_1 - V_2)^2 / R_c$, where $(V_1 - V_2)$ is the voltage difference across the resistor $R_c$. Since $V_1$ and $V_2$ are sinusoidal, the power dissipation consists of constant terms and a cross-term of the form:

f(\phi_{1},\phi_{2})=\frac{|V|^{2}\cos{\left(\phi_{1}-\phi_{2}\right)}}{R_{c}},

where $f(\phi_1, \phi_2)$ is the power dissipated in the resistors. Parallel or anti-parallel magnetic dipole orientation is represented by whether the electrical oscillator phase difference is $\phi_1 - \phi_2 = 0$ or $\phi_1 - \phi_2 = \pi$, respectively. We are permitted to choose an origin for angle space at $\phi = 0$, which implies $\phi_i = 0$ or $\phi_i = \pi$. This can be implemented through a constraint of the form:

g_{i}(\phi_{i})=\left(\cos{\left(2\phi_{i}\right)}-1\right)=0.

Combining the power dissipated in the resistors with the constraint functions $g_i(\phi_i) = 0$, we obtain a Lagrange function:

L(\bm{\phi},\bm{\lambda})=\frac{1}{R_{c}}\sum_{i,j}{J_{ij}\cos{\left(\phi_{i}-\phi_{j}\right)}}+\sum_{i}{\lambda_{i}\left(\cos{\left(2\phi_{i}\right)}-1\right)},

where $\lambda_i$ is the Lagrange Multiplier corresponding to the phase angle constraint, and $J_{ij}$ is a digital multiplier $\pm 1$ applied to the conductance $\frac{1}{R_c}$.

The Lagrange function above is isomorphic with the general form in Section 4. The effective merit function $f$ and constraints $g_i$ in this correspondence are:

f(\bm{\phi})=\frac{1}{R_{c}}\sum_{i,j}{J_{ij}\cos{\left(\phi_{i}-\phi_{j}\right)}},
g_{i}(\phi_{i})=\left(\cos{\left(2\phi_{i}\right)}-1\right)=0.
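
A brief numerical sketch of the phase dynamics of Eq. (11), with hypothetical couplings and a binarization strength $\lambda_i$ chosen large enough for the $\sin(2\phi_i)$ term to dominate, shows the phases settling near 0 or $\pi$, from which the digital spins are read out; all values are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(6)
    n = 8
    J = rng.choice([-1.0, 1.0], size=(n, n))
    J = np.triu(J, 1); J = J + J.T          # illustrative couplings, zero diagonal
    Rc = 1.0                                # illustrative coupling resistance
    lam = np.full(n, 10.0)                  # binarization strength, chosen to dominate the coupling

    phi = 2 * np.pi * rng.random(n)         # random initial oscillator phases
    dt = 0.01
    for _ in range(10000):
        # Eq. (11): resistive coupling torque plus the sub-harmonic injection-locking term
        coupling = (J * np.sin(phi[:, None] - phi[None, :])).sum(axis=1) / Rc
        phi = phi + dt * (-coupling - lam * np.sin(2 * phi))

    # With lam large enough, the phases settle near 0 or pi (mod 2*pi);
    # cos(phi) is then read out as the digital spin +/-1.
    print(np.round(np.cos(phi), 2), np.sign(np.cos(phi)))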

5.4.3 Conclusions from Coupled Oscillators on the Unit Circle

  1. As in other cases, the amplitude-stable oscillator system [14] minimizes the entropy production rate, or power dissipation, subject to constraints of amplitude and phase reference.
  2. The amplitude-stable oscillator system actually performs Lagrange Multiplier optimization.
  3. The phase decay time constant $\lambda_i$ in each oscillator $i$ plays the role of the corresponding Lagrange Multiplier.
  4. Under the phase reference constraint, $\phi_i = 0$ or $\pi$, the entropy generation minimization scheme is actually binary, similar to a flip-flop.

5.5 Coupled polariton condensates

5.5.1 Overview

Kalinin and Berloff [15] proposed a system consisting of coupled polariton condensates to minimize the XY Hamiltonian. The XY Hamiltonian is a two-dimensional version of the Ising Hamiltonian and is given by:

H(\bm{\mu})=\sum_{ij}J_{ij}\bm{\mu}_{i}\cdot\bm{\mu}_{j},

where $\bm{\mu}_i$ represents the magnetic moment vector of the $i$-th spin, restricted to the spin-space XY plane.

To represent the spin system, Kalinin and Berloff pump a grid of coupled semiconductor microcavities with laser beams and observe the formation of strongly coupled exciton-photon states called polaritons. For our purposes, the polaritonic nomenclature is irrelevant. For us, these are simply coupled electromagnetic cavities similar to the optical and LC resonators that we have already discussed. The electromagnetic system operates by the principle of minimum entropy generation, similar to the previous cases. The complex electromagnetic amplitude in the $i$-th microcavity can be written $E_i = c_i + \text{j} s_i$, where $c_i$ and $s_i$ represent the cosine and sine quadrature components of $E_i$, and j is the unit imaginary. In this case we identify $\text{Re}(E) = c$ as representing the X-component of the magnetic dipole vector, and $\text{Im}(E) = s$ as representing the Y-component of the magnetic dipole vector. The electromagnetic microcavity system settles into a state of minimum entropy generation as the laser pump and optical gain are ramped up to compensate for the intrinsic cavity losses. The phase angles in the complex plane of the final electromagnetic modes are then reported as the corresponding $\bm{\mu}$ magnetic moment angles in the XY plane.

An important point to note here is that the electromagnetic cavities experience normal phase-independent gain and not parametric gain which happens to be phase-dependent. As a consequence, this system does NOT seek phase bistability as appropriate for binary up/down spin orientations. This is because we are searching for the magnetic dipole vector angles in the XY plane which would minimize the corresponding XY magnetic energy.

5.5.2 Lagrange function and Lagrange multipliers

Ref. [15] uses ‘Ginzburg-Landau’ equations to analyze their system, resulting in equations for the complex amplitudes $\Psi_i$ of the polariton wavefunctions. But the $\Psi_i$ are actually the complex electric field amplitudes $E_i$ of the corresponding $i$-th cavity. The electric field amplitudes satisfy the following slowly-varying-amplitude equation:

\frac{dE_{i}}{dt}=\left(\gamma_{i}-\beta|E_{i}|^{2}\right)E_{i}-iU|E_{i}|^{2}E_{i}-\sum_{j}{J_{ij}E_{j}}, \qquad (12)

where $\gamma_i$ represents the optical gain, $\beta$ represents nonlinear attenuation, $U$ represents a nonlinear phase shift, and $J_{ij}$ is a dissipative cross-coupling term representing linear loss. We note from the above that both the amplitudes and phases of the electromagnetic modes are coupled to each other and evolve on comparable timescales. This is in contrast to ref. [14], where the main dynamics were embedded in phase (amplitude was fast and almost fixed), and conversely to [12], where they were embedded in amplitude (phase was fast and almost fixed).

We shall now show that the method of ref. [15] is essentially the method of Lagrange multipliers with an added ‘rotation’. The entropy generation or power dissipation rate is:

h(\bm{E})=\frac{d}{dt}\sum_{i}{\left(\frac{E_{i}^{*}}{2}+\frac{E_{i}}{2}\right)^{2}}
=\frac{1}{2}\sum_{i,j}{J_{ij}\left(E_{i}^{*}E_{j}+E_{i}E_{j}^{*}\right)}+\sum_{i}{\beta|E_{i}|^{4}}-\sum_{i}{\gamma_{i}|E_{i}|^{2}}.

If we add a saturation constraint, $g_i(E_i) = (1 - |E_i|^2) = 0$, then by analogy to the previous sections, $\gamma_i$ is reinterpreted as a Lagrange Multiplier:

L(𝑬,𝜸)=12i,jJij(EiEj+EiEj)+iβ|Ei|4iγi(|Ei|21),\begin{split}L(\bm{E},\bm{\gamma})=&\frac{1}{2}\sum_{i,j}{J_{ij}\left(E_{i}^{*}E_{j}+E_{i}E_{j}^{*}\right)}+\sum_{i}{\beta|E_{i}|^{4}}\\ &-\sum_{i}{\gamma_{i}\left(|E_{i}|^{2}-1\right)},\end{split} (13)

where LL is the Lagrange function that represents power dissipation combined with the amplitude constraint |Ei|2=1|E_{i}|^{2}=1. Thus, the scheme of coupled polaritonic resonators operates to find the state of minimum entropy generation in steady-state, similar to the parametric oscillator case of Section 5.1. That is, the steady-state solution of dynamics (12) renders (13) stationary with respect to changes in 𝑬\bm{E}. The difference is that the coupled polaritonic system solves the XY Ising problem for a magnetic moment restricted to the magnetic XY plane, while the parametric oscillator system, in Section 5.1, solves the ±\pmZ Ising problem. For any particular mathematical optimization that we would perform, we still retain the burden of specifying that dissipative system whose entropy generation matches the optimization that we are seeking.

The imaginary ‘rotation’ term, iUiU, could potentially be of use in developing more sophisticated algorithms than the method of Lagrange multipliers; we discuss this prospect in Section 6.2, where a system with a similar but more general ‘rotation’ term is analyzed.

5.5.3 Iterative Evolution of Lagrange Multipliers

Next, we discuss how Kalinin and Berloff [15] adjusted their Lagrange multipliers γi\gamma_{i} during the course of their optimization. In the method of Lagrange multipliers, the merit function, eq (13), is used to optimize not only the electric field amplitudes EiE_{i} but also the Lagrange Multipliers γi\gamma_{i}. The authors discussed in Sections 5.1 and 5.2 used simple heuristics to adjust their gains and decay constants, which we have shown to be the Lagrange multipliers. Kalinin and Berloff, by contrast, employ the Lagrange function itself to adjust the gains and losses, as in the complete Lagrange multiplier method discussed next.

To iteratively adjust the Lagrange multipliers, we shall briefly shift back to the tutorial notation of Section 4. Afterward, we shall translate back to the notation of this section and show that Kalinin and Berloff indeed use this same iterative procedure to adjust their Lagrange multipliers.

The Lagrange function is given by L(𝒙,𝝀)=f(𝒙)+i=1pλigi(𝒙)L(\bm{x},\bm{\lambda})=f(\bm{x})+\sum_{i=1}^{p}{\lambda_{i}g_{i}(\bm{x})}, where xix_{i} are the field variables and λi\lambda_{i} are the Lagrange multipliers. Then, the procedure to find the optimal 𝒙\bm{x^{*}} and 𝝀\bm{\lambda^{*}} is to perform gradient descent of LL in 𝒙\bm{x} and gradient ascent of LL in 𝝀\bm{\lambda}. The reason for ascent in 𝝀\bm{\lambda} rather than descent is to more strictly penalize deviations from the constraint. In the language of iterations, this leads to the expressions:

xi(t+Δt)=xi(t)κΔtxiL(𝒙,𝝀),\displaystyle x_{i}(t+\Delta t)=x_{i}(t)-\kappa\Delta t\frac{\partial}{\partial x_{i}}L(\bm{x},\bm{\lambda}),
λi(t+Δt)=λi(t)+κΔtλiL(𝒙,𝝀),\displaystyle\lambda_{i}(t+\Delta t)=\lambda_{i}(t)+\kappa^{\prime}\Delta t\frac{\partial}{\partial\lambda_{i}}L(\bm{x},\bm{\lambda}),

where κ\kappa and κ\kappa^{\prime} are suitably chosen step sizes.
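As a concrete illustration of this update rule, the short Python sketch below (our own construction, not taken from any of the referenced machines) applies the descent/ascent iteration to a hypothetical convex toy problem, minimizing f(x) = (1/2)||x - a||^2 subject to the single constraint g(x) = x_1 + x_2 + x_3 - 1 = 0; all numerical values are illustrative.

```python
import numpy as np

# Minimal sketch: gradient descent in x, gradient ascent in lambda, for the
# Lagrange function L(x, lam) = 0.5*||x - a||^2 + lam*(sum(x) - 1).
a = np.array([2.0, -1.0, 0.5])
x, lam = np.zeros(3), 0.0
kappa, kappa_prime, dt = 1.0, 1.0, 0.05

for _ in range(2000):
    grad_x = (x - a) + lam                     # dL/dx_i
    grad_lam = np.sum(x) - 1.0                 # dL/dlambda
    x = x - kappa * dt * grad_x                # descent step in the field variables
    lam = lam + kappa_prime * dt * grad_lam    # ascent step in the Lagrange multiplier

print(x, lam, np.sum(x))   # constraint satisfied; lam converges to (sum(a) - 1)/3
```

For this toy problem the iteration settles at x* = a - lambda*, with lambda* = (sum(a) - 1)/3, illustrating how the ascent in the multiplier enforces the constraint.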

With our identification that the Lagrange multipliers λ\lambda are actually the same as the gains γ\gamma, we plug the expression for the full Lagrange function (13) into the second iterative equation, projecting out the constraint function gi(Ei)=(1|Ei|2)=0g_{i}(E_{i})=\left(1-|E_{i}|^{2}\right)=0, and take the limit Δt0\Delta t\to 0. We obtain the following dynamical equation for the gains γi\gamma_{i}:

dγidt=κ(1|Ei|2).\frac{d\gamma_{i}}{dt}=\kappa^{\prime}\left(1-|E_{i}|^{2}\right). (14)

The above equation for the iterative evolution of the Lagrange multipliers is the very same gain evolution that Kalinin and Berloff employ in their coupled polariton system.

To Eq (14), we must add the iterative evolution of the field variables xix_{i}:

dxidt=κxiL(𝒙,𝝀).\frac{dx_{i}}{dt}=-\kappa\frac{\partial}{\partial x_{i}}L(\bm{x},\bm{\lambda}). (15)

Equations (14) and (15) represent the full iterative evolution; in some of the earlier sub-sections, by contrast, γi(t)\gamma_{i}(t) was simply assigned a heuristic time dependence.

We conclude this sub-section by splitting the Lagrange function into the effective merit function ff and the constraint function gig_{i}. The extra ‘phase rotation’ term iUiU is not captured by this interpretation.

f(E1,,En)=12i,jJij(EiEj+EiEj)+iβ|Ei|4,\displaystyle f(E_{1},\dots,E_{n})=\frac{1}{2}\sum_{i,j}{J_{ij}\left(E_{i}^{*}E_{j}+E_{i}E_{j}^{*}\right)}+\sum_{i}{\beta|E_{i}|^{4}},
gi(Ei)=(1|Ei|2)=0.\displaystyle g_{i}(E_{i})=\left(1-|E_{i}|^{2}\right)=0.
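A minimal numerical sketch of the complete scheme, Euler-integrating the field equation (12) together with the gain evolution (14), is given below. The coupling matrix, the values of β, U, κ′, and the time step are illustrative assumptions and are not the parameters of the experiment in ref. [15].

```python
import numpy as np

# Euler-integration sketch of Eq. (12) for the complex cavity fields E_i together
# with the Lagrange-multiplier (gain) evolution of Eq. (14). All parameter values
# below are illustrative assumptions.
n = 4
rng = np.random.default_rng(0)
J = rng.normal(size=(n, n))
J = 0.1 * (J + J.T)
np.fill_diagonal(J, 0.0)
beta, U, kappa_prime, dt = 1.0, 0.1, 0.5, 0.01

E = 0.01 * (rng.normal(size=n) + 1j * rng.normal(size=n))   # small random seed fields
gamma = np.zeros(n)

for _ in range(20000):
    dE = (gamma - beta * np.abs(E)**2) * E - 1j * U * np.abs(E)**2 * E - J @ E
    dgamma = kappa_prime * (1.0 - np.abs(E)**2)   # gain rises until |E_i| saturates at 1
    E = E + dt * dE
    gamma = gamma + dt * dgamma

# The phase angles of the steady-state fields are the reported XY spin angles.
print(np.angle(E), np.abs(E))
```

The gains rise until the saturation constraint |E_i|^2 = 1 is met, and the final phase angles are read out as the XY spin directions.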

5.5.4 Conclusions from Coupled Electromagnetic Cavities

1. As in other cases, the coupled-cavity system [15] minimizes the steady state entropy production rate, or power dissipation, which includes the power input from the pump gain.

2. The gain γi\gamma_{i} in each oscillator ii plays the role of the corresponding Lagrange Multiplier.

3. There is a phase rotation term iUiU in the dynamics that is not captured by the Lagrange function framework.

6 Other methods in the literature

In this section, we look at other methods in the literature that do not explicitly implement the method of Lagrange multipliers but nevertheless end up with dynamics that resemble it to varying extents. All these methods also offer operating regimes where the dynamics is not analogous to Lagrange multiplier optimization, and we believe it is an interesting avenue of future work to study the capabilities of those regimes.

6.1 Iterative Analog Matrix Multipliers

Soljacic et al. [16] developed an iterative procedure consisting of repeated matrix multiplication to solve the Ising problem. Their algorithm was implemented on a photonic circuit that utilized on-chip optical matrix multiplication units composed of Mach-Zehnder interferometers, of the kind first introduced for matrix algebra by Zeilinger et al. in [30]. Soljacic et al. showed that their algorithm performs optimization of an effective merit function, such as the total magnetic Ising energy.

Let us use our insights from the previous sections and see how one would implement iterative optimization using an optical matrix multiplier. Let the magnetic moment configuration of the Ising problem be represented as a vector of electric field amplitudes, EiE_{i}, of spatially-separated optical modes; each mode's field amplitude represents the value of one magnetic moment. In each iteration, the optical modes are fed into the optical circuit, which performs the matrix multiplication, and the resulting output optical modes are then fed back to the circuit input for the next iteration. Optical gain, or some other type of gain, sustains the successive iterations.

In this section we design an iterative optimization scheme for the Ising problem that involves only matrix multiplications in each iterative step and whose power dissipation function matches the magnetic Ising energy. A simple block diagram of such a scheme is shown in Fig. 8.

Figure 8: An optical circuit performing iterative multiplications converges on a solution of the Ising problem. Optical pulses are fed as input from the left hand side at the beginning of each iteration, pass through the matrix multiplication unit, and are passed back from the outputs to the inputs for the next iteration. Distributed optical gain sustains the iterations.

We wish to design the matrix multiplication unit such that it has the following power dissipation function:

h(𝑬)=iγi|Ei|2+12i,jJij(EiEj+EiEj).h(\bm{E})=-\sum_{i}{\gamma_{i}|E_{i}|^{2}}+\frac{1}{2}\sum_{i,j}{J_{ij}\left(E_{i}^{*}E_{j}+E_{i}E_{j}^{*}\right)}.

The Lagrange function, including a binary constraint, |Ei|2=1|E_{i}|^{2}=1, is given by:

L(𝑬,𝜸)=iγi(|Ei|21)+12i,jJij(EiEj+EiEj),L(\bm{E},\bm{\gamma})=-\sum_{i}{\gamma_{i}\left(|E_{i}|^{2}-1\right)}+\frac{1}{2}\sum_{i,j}{J_{ij}\left(E_{i}^{*}E_{j}+E_{i}E_{j}^{*}\right)}, (16)

where the JijJ_{ij} represents dissipative loss associated with electric field interference between optical modes in the Mach-Zehnder interferometers. The dissipation is compensated by optical gain γi\gamma_{i} in the system.

The iterative multiplicative procedure that evolves the electric fields toward the minimum of the Lagrange function Eq (16) is given by:

Ei(t+1)Ei(t)=κΔtEi(iγi(1|Ei|2(t))+12i,jJij(Ei(t)Ej(t)+Ei(t)Ej(t))),\begin{split}E_{i}(t+1)-E_{i}(t)&=-\kappa\Delta t\frac{\partial}{\partial E_{i}}\left(\sum_{i}{\gamma_{i}\left(1-|E_{i}|^{2}(t)\right)}\right.\\ &+\left.\frac{1}{2}\sum_{i,j}{J_{ij}\left(E_{i}^{*}(t)E_{j}(t)+E_{i}(t)E_{j}^{*}(t)\right)}\right),\end{split}

where we move from iteration tt to iteration t+1t+1 by taking steps in EiE_{i} proportional to the gradient /Ei\partial/\partial E_{i} of the Lagrange function. (/Ei\partial/\partial E_{i} represents separate differentiation with respect to the two quadrature components.) With each iteration, we feed the output of the Mach-Zehnder interferometer matrix-multiplier array back to the input, compensating for the losses with gain. Calculating the gradient, we obtain:

Ei(t+1)Ei(t)=2κΔt(γiEi(t)jJijEj(t)),E_{i}(t+1)-E_{i}(t)=2\kappa\Delta t\left(\gamma_{i}E_{i}(t)-\sum_{j}{J_{ij}E_{j}(t)}\right),

where κ\kappa is a constant step size, whose dimensionality should ensure consistency of units. Sending all the terms involving time step tt to the right hand side, we finally get:

Ei(t+1)=j((1+2κΔtγi)δij2κΔtJij)Ej(t),E_{i}(t+1)=\sum_{j}{\left(\left(1+2\kappa\Delta t\gamma_{i}\right)\delta_{ij}-2\kappa\Delta tJ_{ij}\right)E_{j}(t)}, (17)

where δij\delta_{ij} is the Kronecker delta that is 1 only if i=ji=j. The Mach-Zehnder interferometers should be tuned to the matrix [(1+2κΔtγi)δij2κΔtJij]\left[\left(1+2\kappa\Delta t\gamma_{i}\right)\delta_{ij}-2\kappa\Delta tJ_{ij}\right]. Thus, we have an iterative matrix multiplier scheme that minimizes the Lagrange function and performs Lagrange multiplier optimization of the Ising problem. In effect, a lump of dissipative optical circuitry, compensated by optical gain, will, in a series of iterations, settle into a solution of the Ising problem.
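A small Python sketch of this loop is given below; the coupling matrix, the step sizes, and the rule used to update the gains γ_i (we simply borrow the Lagrange multiplier evolution of Eq. (14)) are illustrative assumptions rather than a description of the hardware of ref. [16].

```python
import numpy as np

# Sketch of the iterative matrix-multiplier loop of Eq. (17). J, the step sizes,
# and the gain-update rule are illustrative assumptions.
n = 6
rng = np.random.default_rng(1)
J = rng.choice([-1.0, 1.0], size=(n, n))
J = np.triu(J, 1)
J = J + J.T                          # symmetric Ising couplings, zero diagonal

kdt, kpdt = 0.005, 0.01              # kappa*Delta_t and kappa'*Delta_t
E = 0.1 * rng.normal(size=n)         # one quadrature of each optical mode
gamma = np.zeros(n)

for _ in range(20000):
    M = np.diag(1.0 + 2.0 * kdt * gamma) - 2.0 * kdt * J   # transfer matrix of Eq. (17)
    E = M @ E                                              # one pass through the mesh
    gamma = gamma + kpdt * (1.0 - E**2)                    # gains act as Lagrange multipliers

spins = np.sign(E)
print(spins, spins @ J @ spins)      # a candidate (local) optimum of the Ising energy
```

Each pass through the loop corresponds to one traversal of the Mach-Zehnder mesh programmed to the matrix of Eq. (17), with the gains refreshed between passes.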

The simple system above differs from that of Soljacic et al. [16] quite significantly, in that their method adds noise and nonlinear thresholding after each iteration. It is possible that these modifications lead to performance improvements, or the more standard approach above may work equally well. A detailed description of the Soljacic approach is presented in Appendix E.

6.2 Leleu Mathematical Ising Solver

Leleu et al. [11] proposed a modified version of Yamamoto’s Ising machine [7]. Leleu’s method significantly resembles the Lagrange method while incorporating new features that might be fruitful research directions in their own right. To understand the similarities and differences between Leleu’s method and that of Lagrange multipliers, we shall first write down the Lagrange multiplier equations of motion.

We recall the Lagrange function for the Ising problem that we encountered in Section 5:

L(𝒙,𝜸)=i,jJijxixj+iαixi2+iγi(1xi2).L(\bm{x},\bm{\gamma})=\sum_{i,j}{J_{ij}x_{i}x_{j}}+\sum_{i}{\alpha_{i}x_{i}^{2}}+\sum_{i}{\gamma_{i}\left(1-x_{i}^{2}\right)}.

In the above, xix_{i} are the optimization variables, JijJ_{ij} is the interaction matrix, γi\gamma_{i} is the gain provided to the ii-th variable, and αi\alpha_{i} is the loss experienced by the ii-th variable. To find a local optimum (𝒙,𝜸)(\bm{x^{*}},\bm{\gamma^{*}}) that satisfies the constraints, one has to perform gradient descent on the Lagrange function in the 𝒙\bm{x} variables and gradient ascent in the 𝜸\bm{\gamma} variables as discussed in Section 5.5. That is, the iterations one would need to perform are:

xi(t+Δt)=xi(t)κΔtxiL(𝒙,𝜸),\displaystyle x_{i}(t+\Delta t)=x_{i}(t)-\kappa\Delta t\frac{\partial}{\partial x_{i}}L(\bm{x},\bm{\gamma}),
γi(t+Δt)=γi(t)+κΔtγiL(𝒙,𝜸),\displaystyle\gamma_{i}(t+\Delta t)=\gamma_{i}(t)+\kappa^{\prime}\Delta t\frac{\partial}{\partial\gamma_{i}}L(\bm{x},\bm{\gamma}),

where κ\kappa and κ\kappa^{\prime} are suitably chosen step sizes. Substituting the expression for LL into the above and taking the limit of Δt0\Delta t\to 0, we get the following equations of motion:

dxidt=2κ((αi+γi)xijJijxj),\displaystyle\frac{dx_{i}}{dt}=2\kappa\left(\left(-\alpha_{i}+\gamma_{i}\right)x_{i}-\sum_{j}{J_{ij}x_{j}}\right), (18)
dγidt=κ(1xi2).\displaystyle\frac{d\gamma_{i}}{dt}=\kappa^{\prime}\left(1-x_{i}^{2}\right). (19)

We shall compare these equations with those designed by Leleu et al. in [11]. Their system is described by the following equations:

dxidt=(α+γ)xi+eijJijxj,\displaystyle\frac{dx_{i}}{dt}=(-\alpha+\gamma)x_{i}+e_{i}\sum_{j}{J_{ij}x_{j}}, (20)
deidt=β(1xi2)ei,\displaystyle\frac{de_{i}}{dt}=\beta(1-x_{i}^{2})e_{i}, (21)

where the xix_{i} represent the optimization variables as usual, α\alpha represents the loss experienced by each degree of freedom, γ\gamma represents a common gain supplied to each variable, β\beta is some suitably chosen positive parameter, and the eie_{i} represent error coefficients that capture how far away each xix_{i} is from its desired unity saturation amplitude. Leleu had cubic terms in xix_{i} in [11] but we shall ignore them in the present discussion for the sake of simplicity. A discussion of these cubic terms is given in Appendix C.

It is clear at once that there are significant similarities between Leleu’s system and the Lagrange multiplier system. The optimization variables in both systems experience linear losses and gains and have interaction terms that capture the Ising interaction. Both systems have auxiliary variables that are varied according to how far away each degree of freedom is from its preferred saturation amplitude. However, the similarities end here.

A major difference in Leleu’s system is that eie_{i} multiplies the Ising interaction felt by the ii-th variable, resulting in the coefficient eiJije_{i}J_{ij}, whereas the complementary interaction felt by the jj-th variable carries the coefficient ejJije_{j}J_{ij}. The consequence is that Leleu’s equations implement a system with non-symmetric interactions eiJijejJije_{i}J_{ij}\neq e_{j}J_{ij} between vector components xix_{i} and xjx_{j}. The inclusion of non-symmetric matrix terms seems to be important, because Leleu’s system achieves excellent performance on Ising problems in the Gset benchmark set, as demonstrated in [11].

Let us obtain some intuition about this system by splitting the non-symmetric interaction term eiJije_{i}J_{ij} into the sum of a symmetric and an anti-symmetric part. This follows from the fact that any matrix AA can be written as the sum of a symmetric matrix, (A+AT)/2(A+A^{T})/2, and an anti-symmetric matrix, (AAT)/2(A-A^{T})/2. The symmetric part leads to gradient descent dynamics just as in Eq (20), similar to all the systems in Section 5. Conversely, the anti-symmetric part causes an energy-conserving ‘rotary’ motion in the vector space of xix_{i}. Leleu et al.’s non-symmetric approach contains the same number of auxiliary parameters, eie_{i} versus γi\gamma_{i}, so the secret of the improved performance appears to lie in this anti-symmetric part. We believe further analysis of this method might be a fruitful future research direction.
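The decomposition is easy to check numerically. The short sketch below, with illustrative values for J_ij and e_i, splits the effective coupling e_i J_ij into its symmetric and anti-symmetric parts and verifies that the anti-symmetric part does no 'work' along x, since x^T A x = 0 for any anti-symmetric A.

```python
import numpy as np

# Numerical check of the decomposition A_ij = e_i * J_ij into symmetric and
# anti-symmetric parts (J and e below are illustrative values).
rng = np.random.default_rng(2)
n = 5
J = rng.choice([-1.0, 1.0], size=(n, n))
J = np.triu(J, 1)
J = J + J.T                                  # symmetric Ising couplings
e = rng.uniform(0.5, 1.5, size=n)            # error coefficients e_i

A = e[:, None] * J                           # non-symmetric effective coupling e_i * J_ij
A_sym = 0.5 * (A + A.T)                      # gradient-descent (dissipative) part
A_anti = 0.5 * (A - A.T)                     # energy-conserving 'rotation' part

x = rng.normal(size=n)
print(x @ A_anti @ x)                        # ~ 0: the anti-symmetric part does no work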

We recall Onsager’s reciprocity theorem [2] which states that the thermodynamic response coefficient RijR_{ij} must be equal to the response coefficient RjiR_{ji} if time-reversal symmetry holds in the equilibrium state. This would imply that one might have to construct a magnetic system of some sort to break time-reversal symmetry in order to physically implement the asymmetry embedded in the dynamics designed by Leleu et al.

In conclusion, the symmetric part of Eq (20) takes iterative steps along the gradient of a dissipative merit function. The anti-symmetric part produces energy-conserving rotations in the vector space of optimization variables, xix_{i}. The associated dynamical freedom might provide a fruitful future research direction in optimization and deserves further study to ascertain its power.

7 Applications in Linear Algebra and Statistics

We have seen that minimum power dissipation solvers can address the Ising problem and various similar problems such as the traveling salesman problem. In this section, we provide a further application of minimum entropy generation solvers, to an optimization problem that appears frequently in statistics, namely curve fitting. In particular, we make the observation that the problem of linear least squares regression, that is, linear curve fitting with a quadratic merit function, resembles the Ising problem. In fact, the electrical circuit example we presented in Section 5.2 can be applied to linear regression, and we present such a circuit in this section. Our circuit provides a digital answer but requires a series of binary resistance values, that is, …,2R0,R0,0.5R0,…\dots,2R_{0},R_{0},0.5R_{0},\dots, to represent arbitrary binary statistical input observations.

The objective of linear least squares regression is to fit a linear function to a given set of data {(𝒙𝟏,y1),(𝒙𝟐,y2),(𝒙𝟑,y3),…,(𝒙𝒏,yn)}\{(\bm{x_{1}},y_{1}),(\bm{x_{2}},y_{2}),(\bm{x_{3}},y_{3}),\dots,(\bm{x_{n}},y_{n})\}. The 𝒙𝒊\bm{x_{i}} are input vectors of dimension dd, while the yiy_{i} are the observed outputs that we want our regression to capture. The linear function being fit is of the form y(𝒂)=i=1dwiaiy(\bm{a})=\sum_{i=1}^{d}{w_{i}a_{i}}, where 𝒂\bm{a} is a feature vector of length dd and 𝒘\bm{w} is a vector of unknown weights multiplying the feature vector 𝒂\bm{a}. The vector 𝒘\bm{w} is calculated by minimizing the sum of the squared errors it makes on an actual data set. This optimization problem may be represented as:

𝒘=min𝒘i=1n[(j=1dwjxij)yi]2,\bm{w^{*}}=\min_{\bm{w}}\sum_{i=1}^{n}{\left[\left(\sum_{j=1}^{d}{w_{j}x_{ij}}\right)-y_{i}\right]^{2}},

where xijx_{ij} is the jj-th component of the vector 𝒙𝒊\bm{x_{i}}. The merit function, upon expansion, yields i=1nyi22i=1d(j=1nxjiyj)wi+i,j=1d(k=1nxkixkj)wiwj\sum_{i=1}^{n}y_{i}^{2}-2\sum_{i=1}^{d}\left(\sum_{j=1}^{n}x_{ji}y_{j}\right)w_{i}+\sum_{i,j=1}^{d}\left(\sum_{k=1}^{n}x_{ki}x_{kj}\right)w_{i}w_{j}. This functional form is identical to that of the Ising Hamiltonian, and we may construct an Ising circuit with Jij=k=1nxkixkjJ_{ij}=\sum_{k=1}^{n}x_{ki}x_{kj}, with the weights 𝒘\bm{w} acting like the unknown magnetic moments. There is an effective magnetic field in the problem, hi=2j=1nxjiyjh_{i}=-2\sum_{j=1}^{n}x_{ji}y_{j}, and a fixed number i=1nyi2\sum_{i=1}^{n}y_{i}^{2} which does not play a role in the optimization. A simple circuit that solves the linear least squares problem for d=2d=2 (the case where there are two features per instance and, consequently, two weights wdw_{d} to be estimated) is provided in Fig. 9. This circuit provides weights up to 2-bit precision.
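The sketch below carries out this mapping numerically on a small synthetic data set of our own, building J_ij = Σ_k x_ki x_kj and h_i = -2 Σ_j x_ji y_j and confirming that the resulting quadratic form reproduces the least-squares merit function.

```python
import numpy as np

# Mapping of linear least squares onto an Ising-like quadratic form:
# J_ij = sum_k x_ki x_kj and h_i = -2 sum_j x_ji y_j (synthetic data for illustration).
rng = np.random.default_rng(3)
n, d = 20, 2
X = rng.normal(size=(n, d))                  # n observations, d features
w_true = np.array([1.5, -0.75])
y = X @ w_true + 0.01 * rng.normal(size=n)

J = X.T @ X                                  # Ising-like coupling matrix
h = -2.0 * X.T @ y                           # effective magnetic field
const = y @ y                                # constant term, irrelevant to the optimum

def merit(w):
    # Identical to the least-squares objective ||Xw - y||^2
    return w @ J @ w + h @ w + const

w_ls = np.linalg.solve(J, X.T @ y)           # ordinary least-squares weights
print(w_ls, merit(w_ls), np.sum((X @ w_ls - y)**2))   # the two merit values agree
```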

Figure 9: A two-bit linear regression circuit to find the best two curve-fitting weights wdw_{d} using the Principle of Minimum Entropy Generation.

The two oscillators on the left hand side of the figure represent the 202^{0} and 212^{1} bits of the weight corresponding to the first feature while the oscillators on the right hand side are the corresponding bits of the second weight.

The cross-resistance RR that one would need to represent the JijJ_{ij} that connects the ii-th and jj-th oscillators is calculated as:

1R=b1R1+b0R0+b1R1,\frac{1}{R}=\frac{b_{1}}{R_{-1}}+\frac{b_{0}}{R_{0}}+\frac{b_{-1}}{R_{1}},

where Rm=2mR0R_{m}=2^{m}R_{0} is a binary hierarchy of resistances ,2R0,R0,0.5R0,\dots,2R_{0},R_{0},0.5R_{0},\dots based on a reference resistor R0R_{0}, and bmb_{m} are the binary digits of JijJ_{ij} expressed as a binary number: b=b1×21+b0×20+b1×21b=b_{1}\times 2^{1}+b_{0}\times 2^{0}+b_{-1}\times 2^{-1}. This represents JijJ_{ij} to 3-bit precision using resistors that span a dynamic range 22=42^{2}=4. Further, the sign of the coupling is allotted according to whether the resistors RR are parallel-connected or cross-connected. In operation, the resistors RR would be externally programmed to the correct binary values, with many more bits than 3-bit precision, as given by the matrix product Jij=k=1nxkixkjJ_{ij}=\sum_{k=1}^{n}x_{ki}x_{kj}.
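For concreteness, the following fragment evaluates the parallel combination for one illustrative choice of bits (b_1, b_0, b_{-1}) and reference resistor R_0; the numerical values are assumptions chosen only to show the arithmetic.

```python
# Parallel combination of binary-weighted resistors R_m = 2^m * R0, programming a
# 3-bit coupling magnitude; R0 and the bit pattern are assumed, illustrative values.
R0 = 1.0e3                        # reference resistance in ohms (assumed)
bits = (1, 0, 1)                  # (b_1, b_0, b_-1): target magnitude 2 + 0 + 0.5 = 2.5

# b_1 pairs with R_{-1} = R0/2, b_0 with R_0, and b_{-1} with R_1 = 2*R0
conductance = sum(b / ((2.0**m) * R0) for b, m in zip(bits, (-1, 0, 1)))
R_eff = 1.0 / conductance
print(R_eff)                      # -> R0 / 2.5 = 400 ohms
```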

We have just solved the regression problem of the form 𝑿𝒘=𝒚\bm{Xw}=\bm{y}, where the matrix 𝑿\bm{X} and the vector 𝒚\bm{y} were known measurements and the best-fitting weight vector 𝒘\bm{w} was the unknown. We conclude by noting that this same procedure can be adapted to solve general linear systems of equations of the form 𝑿𝒘=𝒚\bm{Xw}=\bm{y}.

8 Discussion and Conclusion

Physics obeys a number of maximization or minimization principles such as the principle of Least Action, the principle of Minimum Entropy Generation, the Variational Principle, Physical Annealing, and the Adiabatic Principle (which in its quantum form is called Quantum Annealing).

Optimization is very significant in diverse fields ranging from scheduling and routing in operations research to protein folding in biology, portfolio optimization in finance, and energy minimization in physics. In this article, we made the observation that physics has optimization principles at its heart, and that they can be exploited to design fast, low-power digital solvers that avoid the limits of standard computational paradigms. Nature thus provides us with a means to solve optimization problems in all these areas, including Engineering, Artificial Intelligence, Machine Learning (backpropagation), Control Theory, and Reinforcement Learning.

We reviewed 7 physical machines, proposed or built, that purported to solve the Ising problem and found that 6 of the 7 were performing Lagrange multiplier optimization; further, they also obey the Principle of Minimum Entropy Generation (always subject to a power input constraint). This means that, by appropriate choice of parameter values, these physical solvers can be used to perform Lagrange multiplier optimization orders-of-magnitude faster and with lower power than conventional digital computers. This performance advantage can be utilized for optimization in machine learning applications where energy and time considerations are critical.

The questions arise: What are the action items? And what is the most promising near-term application? All the hardware approaches seem to work comparably well. The easiest to implement would be the electrical oscillator circuits, though the optical oscillator arrays can be compact and very fast. Electrically, there would be two integrated circuits: the oscillator array, and the connecting resistor array that would need to be reprogrammed for different problems. The action item could be to design a first chip consisting of about 1000 oscillators, and a second chip consisting of the appropriate coupling resistor array, for a specific optimization problem. The resistors should be in an addressable binary hierarchy so that any desired resistance value can be programmed in by switches, within the number of bits of accuracy. It is possible to imagine solving a new Ising problem every millisecond by reprogramming the resistor chip.

On the software side, a compiler would need to be developed to go from an unsolved optimization problem to the resistor array which matches that desired goal. If the merit function were mildly nonlinear, we believe that the Principle of Minimum Entropy generation would still hold, but there has been less background science justifying that claim.

With regard to the most promising near term application, it might be in Control Theory or in Reinforcement Learning in self-driving vehicles, where rapid answers are required, at modest power dissipation.

The act of computation can be regarded as a search among many possible answers, with the circuit eventually converging to a final correct configuration. Thus the initial conditions may include a huge phase-space volume of 2n2^{n} possible solutions, ultimately transitioning into a final configuration representing a small or modest-sized binary number. This type of computing implies a substantial entropy reduction. This led to Landauer’s admonition that computation costs knlog2kn\log{2} of entropy decrease, and kTnlog2kTn\log{2} of energy, for a final answer with nn binary digits.

By the 2nd Law of Thermodynamics, such an entropy reduction must be accompanied by an entropy increase elsewhere. For example, in a dissipative circuit, electricity converts to heat. In Landauer’s viewpoint, the energy and entropy limit of computing was associated with the final act of writing out the answer in nn bits, assuming the rest of the computer was reversible. In practice, technology consumes 104×\sim 10^{4}\times more than the Landauer limit, owing to the insensitivity of transistors operating at \sim1 Volt, when they could be operating at \sim10 mVolts.

In the continuously dissipative circuits we have described here, the energy consumed would be infinite if we waited long enough for the system to reach the final optimal state. If we terminate the powering of our optimizer systems after they reach the desired final answer, the energy consumed becomes finite. By operating at voltages well below 1 Volt and by powering off after the desired answer is achieved, our continuously dissipating Lagrange optimizers could actually be closer to the Landauer limit than a conventional computer.

A controversial point relates to the quality of solutions that are obtained for NP-hard problems. The physical systems we are proposing evolve by steepest descent toward a local optimum, not a global optimum. Nonetheless, many of the authors of the 7 physical systems presented here have claimed to find better local optima than their competitors, due to special adjustments in their methods. Undoubtedly, some improvements are possible, but none of the 7 papers reviewed here claims to always find the one global optimum which would be NP-hard [31].

We have shown that a number of physical systems that perform optimization are acting through the Principle of Minimum Entropy generation, though other physics principles could also fulfill this goal. As the systems evolve toward an extremum, they perform Lagrange function optimization where the Lagrange Multipliers are given by the gain or loss coefficients that keep the machine running. Thus, Nature provides us with a series of physical Optimization Machines that are much faster and possibly more energy efficient than conventional computers.

The authors gratefully acknowledge useful discussions with Dr. Ryan Hamerly, Dr. Tianshi Wang, and Prof. Jaijeet Roychowdhury. The work of S. K. Vadlamani and Prof. E. Yablonovitch was supported by the National Science Foundation through the Center for Energy Efficient Electronics Science (E3S)(E^{3}S) under Award ECCS-0939514 and the Office of Naval Research under grant #N00014-14-1-0505.

Appendix A Lagrange multipliers theory

In this appendix, we shall study in greater detail the method of Lagrange multipliers, one of the most well-known techniques for the solution of constrained optimization problems. At its core, it simply involves the modification of the merit function by adding terms that penalize constraint violations.

In the setting of constrained optimization, we are required to minimize an objective function f(𝒙)f(\bm{x}) with respect to all points that satisfy certain given inequality (hj(𝒙)0h_{j}(\bm{x})\leq 0) and equality (gi(𝒙)=0g_{i}(\bm{x})=0) constraints. Let us assume that the domain set DD, that is, the intersection of the domains of f(𝒙),gi(𝒙),hj(𝒙)f(\bm{x}),g_{i}(\bm{x}),h_{j}(\bm{x}), is a subset of n\mathbb{R}^{n}. Further, let the set of points in DD that satisfy all the constraints gi,hjg_{i},h_{j} be called FF, the feasible set. Then, the optimization problem can be written as:

minimize f(𝒙)\displaystyle f(\bm{x})
subject to hj(𝒙)0,j=1,,m,\displaystyle h_{j}(\bm{x})\leq 0,\ j=1,\dots,m,
gi(𝒙)=0,i=1,,p,\displaystyle g_{i}(\bm{x})=0,\ i=1,\dots,p,
𝒙D.\displaystyle\bm{x}\in D.

A.1 Lagrange function

We now define a new function in n+m+pn+m+p variables, called the Lagrange function, as follows:

L(𝒙,𝝀,𝝁)=f(𝒙)+j=1mμjhj(𝒙)+i=1pλigi(𝒙).L(\bm{x},\bm{\lambda},\bm{\mu})=f(\bm{x})+\sum_{j=1}^{m}\mu_{j}h_{j}(\bm{x})+\sum_{i=1}^{p}\lambda_{i}g_{i}(\bm{x}).

The summations that were added to the plain objective function f(𝒙)f(\bm{x}) serve as constraint violation penalty terms. The coefficients multiplying the penalty terms, λi\lambda_{i} and μj\mu_{j}, are known as Lagrange multipliers. The inequality Lagrange multipliers μj\mu_{j} are constrained to be non-negative in order that the penalty that arises when the inequality constraints are not satisfied (i.e. when hj(𝒙)>0h_{j}(\bm{x})>0) is non-negative. The equality Lagrange multipliers, λi\lambda_{i}, have no such restrictions.

The Lagrange function has the advantage that it helps us express the constrained optimization of ff as an unconstrained optimization of LL. That is, it can be shown that:

min𝒙Ff(𝒙)=min𝒙Dmax(𝝁0),𝝀L(𝒙,𝝀,𝝁).\min_{\bm{x}\in F}f(\bm{x})=\min_{\bm{x}\in D}\max_{(\bm{\mu}\geq 0),\bm{\lambda}}L(\bm{x},\bm{\lambda},\bm{\mu}).

The minimization over 𝒙\bm{x} in the feasible set, FF, on the left-hand side has turned into a minimization over 𝒙\bm{x} in the entire domain, DD, on the right-hand side.
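As a small worked example of our own (not drawn from the references), consider minimizing f(x_1,x_2) = x_1^2 + x_2^2 subject to the single equality constraint g(x) = x_1 + x_2 - 1 = 0. The Lagrange function and its stationarity conditions are:

L(\bm{x},\lambda)=x_{1}^{2}+x_{2}^{2}+\lambda\left(x_{1}+x_{2}-1\right),

\frac{\partial L}{\partial x_{1}}=2x_{1}+\lambda=0,\qquad\frac{\partial L}{\partial x_{2}}=2x_{2}+\lambda=0\;\Rightarrow\;x_{1}=x_{2}=-\lambda/2,

g(\bm{x})=0\;\Rightarrow\;\lambda^{*}=-1,\qquad x_{1}^{*}=x_{2}^{*}=\tfrac{1}{2}.

Stationarity of L in \bm{x} produces the candidate optimum, while enforcing the constraint (the maximization over \lambda) fixes the value of the multiplier.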

A.2 Karush-Kuhn-Tucker (KKT) sufficient conditions

The Lagrange function also appears in an important, related role. While the conditions for a point 𝒙\bm{x^{*}} to be an unconstrained optimum of a differentiable function ff are expressed in terms of the gradient of f(𝒙)f(\bm{x}), the optimality conditions for constrained optimization problems are naturally expressed in terms of the Lagrange function. These conditions are presented next.

A point 𝒙\bm{x^{*}} is a local optimum of the function f(𝒙)f(\bm{x}) subject to the constraints gi,hjg_{i},h_{j} if it satisfies the Karush-Kuhn-Tucker (KKT) sufficient conditions:

1. Primal feasibility: The point 𝒙\bm{x^{*}} is feasible, that is, it satisfies all the constraints.

    hj(𝒙)\displaystyle h_{j}(\bm{x^{*}}) 0,j=1,,m,\displaystyle\leq 0,\ j=1,\dots,m,
    gi(𝒙)\displaystyle g_{i}(\bm{x^{*}}) =0,i=1,,p.\displaystyle=0,\ i=1,\dots,p.
2. First-order condition: There exist Lagrange multipliers μ1,,μm,λ1,,λp\mu_{1}^{*},\dots,\mu_{m}^{*},\lambda^{*}_{1},\dots,\lambda^{*}_{p} such that the following equation is satisfied:

    𝒙L(𝒙,𝝁,𝝀)\displaystyle\ \ \ \ \bm{\nabla_{x}}L(\bm{x^{*}},\bm{\mu^{*}},\bm{\lambda^{*}})
    =𝒙f(𝒙)+j=1mμj𝒙hj(𝒙)+i=1pλi𝒙gi(𝒙)\displaystyle=\bm{\nabla_{x}}f(\bm{x^{*}})+\sum_{j=1}^{m}\mu_{j}^{*}\bm{\nabla_{x}}h_{j}(\bm{x^{*}})+\sum_{i=1}^{p}\lambda_{i}^{*}\bm{\nabla_{x}}g_{i}(\bm{x^{*}})
    =0.\displaystyle=0.
3. Second-order condition: In addition to the first-order condition, if all the concerned functions are twice differentiable, we require that the Hessian of LL with respect to 𝒙\bm{x} be positive definite along all directions that respect the active constraints. That is, the following equation has to be satisfied:

    𝒗T𝒙𝒙2L(𝒙,𝝁,𝝀)𝒗>0,𝒗T,\displaystyle\bm{v}^{T}\bm{\nabla_{xx}}^{2}L(\bm{x^{*}},\bm{\mu^{*}},\bm{\lambda^{*}})\bm{v}>0,\ \forall\bm{v}\in T,

    where TT is defined as T={𝒗:(𝒙hj(𝒙))T𝒗=0,j=T=\{\bm{v}:\ \left(\bm{\nabla_{x}}h_{j}(\bm{x^{*}})\right)^{T}\bm{v}=0,\ j= active constraints at 𝒙,(𝒙gi(𝒙))T𝒗=0,i=1,,p}\bm{x^{*}},\ \left(\bm{\nabla_{x}}g_{i}(\bm{x^{*}})\right)^{T}\bm{v}=0,\ i=1,\dots,p\}.

4. Complementary slackness: μjhj(𝒙)=0\mu_{j}^{*}h_{j}(\bm{x^{*}})=0 is satisfied for all the inequality constraints, j=1,,mj=1,\dots,m.

5. Dual feasibility: The Lagrange multipliers of all the inequality constraints satisfy μj0,j=1,,m\mu_{j}^{*}\geq 0,\ j=1,\dots,m, with the inequality being strict for active constraints.

We recognize condition number 2 as the one we encountered in Section 4 in the main text.

A.3 Saddle point nature of global and local optima

If the functions ff, gig_{i}, hjh_{j}, and the feasible set FF satisfy certain special conditions (such as, for instance, the Slater condition), a phenomenon called strong duality occurs. Under such circumstances, it turns out that the minimization and maximization in the above equation can be switched:

min𝒙Dmax(𝝁0),𝝀L(𝒙,𝝀,𝝁)=max(𝝁0),𝝀min𝒙DL(𝒙,𝝀,𝝁).\min_{\bm{x}\in D}\max_{(\bm{\mu}\geq 0),\bm{\lambda}}L(\bm{x},\bm{\lambda},\bm{\mu})=\max_{(\bm{\mu}\geq 0),\bm{\lambda}}\min_{\bm{x}\in D}L(\bm{x},\bm{\lambda},\bm{\mu}).

When this occurs, it can be shown that the optimal pair (𝒙,𝝀,𝝁)(\bm{x^{*}},\bm{\lambda^{*}},\bm{\mu^{*}}) that satisfies the KKT conditions actually forms a saddle point of the Lagrange function, L(𝒙,𝝀,𝝁)L(\bm{x},\bm{\lambda},\bm{\mu}). That is, one has:

L(𝒙,𝝀,𝝁)\displaystyle L(\bm{x^{*}},\bm{\lambda^{*}},\bm{\mu^{*}}) L(𝒙,𝝀,𝝁),\displaystyle\leq L(\bm{x},\bm{\lambda^{*}},\bm{\mu^{*}}),\ 𝒙D,\displaystyle\forall\ \bm{x}\in D,
L(𝒙,𝝀,𝝁)\displaystyle L(\bm{x^{*}},\bm{\lambda^{*}},\bm{\mu^{*}}) L(𝒙,𝝀,𝝁),\displaystyle\geq L(\bm{x^{*}},\bm{\lambda},\bm{\mu}),\ 𝝁0,𝝀.\displaystyle\forall\ \bm{\mu}\geq 0,\ \bm{\lambda}.

This can be viewed as a simple motivation for the gradient descent of LL in xx and gradient ascent of LL in λ\lambda that we encountered in Section 5.5. In the next section, Appendix B, we shall follow Chapter 4 of the excellent book [25] and see how this descent/ascent procedure can be used to find local optima even if strong duality doesn’t hold.

Appendix B Lagrange multipliers algorithms: Augmented Lagrangian

In this section, we discuss the so-called ‘Augmented Lagrangian method of multipliers’, a popular algorithm used to obtain locally optimal solutions (𝒙,𝝀)(\bm{x^{*}},\bm{\lambda^{*}}) that satisfy the KKT conditions. We shall be only considering the case where there are no inequality constraints hjh_{j} (and, consequently, no 𝝁\bm{\mu} multipliers). This algorithm is discussed in detail in [25].

To motivate the ‘Augmented Lagrangian’ approach, let us observe the KKT conditions a bit closely. We conclude from the first-order condition that a locally optimum 𝒙\bm{x^{*}} renders the function L(𝒙,𝝀)L(\bm{x},\bm{\lambda^{*}}) stationary. However, this doesn’t guarantee that 𝒙\bm{x^{*}} is a minimum of L(𝒙,𝝀)L(\bm{x},\bm{\lambda^{*}}). Indeed, observation of the second-order condition tells us that 𝒙\bm{x^{*}} could be a saddle point of L(𝒙,𝝀)L(\bm{x},\bm{\lambda^{*}}). This means that gradient descent-based algorithms will not converge to 𝒙\bm{x^{*}} if the starting point is not in the correct region. It was to solve this problem that the ‘Augmented Lagrangian’ method was invented.

The Augmented Lagrange function, Lc(𝒙,𝝀)L_{c}(\bm{x},\bm{\lambda}), is given by:

Lc(𝒙,𝝀)=f(𝒙)+i=1pλigi(𝒙)+c2(i=1p(gi(𝒙))2).L_{c}(\bm{x},\bm{\lambda})=f(\bm{x})+\sum_{i=1}^{p}\lambda_{i}g_{i}(\bm{x})+\frac{c}{2}\left(\sum_{i=1}^{p}\left(g_{i}(\bm{x})\right)^{2}\right).

This function Lc(𝒙,𝝀)L_{c}(\bm{x},\bm{\lambda}), for a suitable choice of cc, has the property that 𝒙\bm{x^{*}} forms a minimum of Lc(𝒙,𝝀)L_{c}(\bm{x},\bm{\lambda^{*}}) and not just a saddle point as was the case with L(𝒙,𝝀)L(\bm{x},\bm{\lambda^{*}}).

In other words, local optima of the original constrained optimization problem can be obtained if we perform an unconstrained optimization of Lc(𝒙,𝝀)L_{c}(\bm{x},\bm{\lambda^{*}}). However, for this procedure to be a feasible solution approach, we would have to know the right 𝝀\bm{\lambda^{*}}. It has been shown in [25] that the way to do this is to perform gradient ascent of LcL_{c} in the 𝝀\bm{\lambda} variables.

The ‘Augmented Lagrangian method of multipliers’ involves the repeated minimization of Lc(𝒙,𝝀(k))L_{c}(\bm{x},\bm{\lambda}^{(k)}) using progressively better estimates, 𝝀(k)\bm{\lambda}^{(k)}, of 𝝀\bm{\lambda^{*}}. The algorithm starts off with an arbitrary starting point (x(0),λ(0))(x^{(0)},\lambda^{(0)}). It then performs the following steps repeatedly:

1. Locally minimize Lc(𝒙,𝝀(k))L_{c}(\bm{x},\bm{\lambda}^{(k)}) and call the minimum point 𝒙(k)\bm{x}^{(k)}.

2. λi(k+1)=λi(k)+cgi(𝒙(k))\lambda_{i}^{(k+1)}=\lambda_{i}^{(k)}+cg_{i}(\bm{x}^{(k)}).

The second step above corresponds to gradient ascent of Lc(𝒙,𝝀)L_{c}(\bm{x},\bm{\lambda}) in the 𝝀\bm{\lambda} variables. Basically, this method performs a fast gradient descent of LcL_{c} in the 𝒙\bm{x} directions in conjunction with a slow gradient ascent of LcL_{c} in the 𝝀\bm{\lambda} directions. A dynamical system that performs this process in continuous time is given below:

dxidt\displaystyle\frac{dx_{i}}{dt} =κxiLc(𝒙,𝝀),\displaystyle=-\kappa\frac{\partial}{\partial x_{i}}L_{c}(\bm{x},\bm{\lambda}),
dλidt\displaystyle\frac{d\lambda_{i}}{dt} =κλiLc(𝒙,𝝀).\displaystyle=\kappa^{\prime}\frac{\partial}{\partial\lambda_{i}}L_{c}(\bm{x},\bm{\lambda}).
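A minimal Python sketch of this procedure (inner minimization of L_c in x by gradient descent, followed by an outer ascent step in λ), applied to a simple toy problem, minimize x_1^2 + x_2^2 subject to x_1 + x_2 - 1 = 0, whose solution is x* = (1/2, 1/2) and λ* = -1, is given below; the penalty c, the step size, and the iteration counts are illustrative.

```python
import numpy as np

# Sketch of the Augmented Lagrangian method of multipliers on a toy problem:
# minimize x1^2 + x2^2 subject to g(x) = x1 + x2 - 1 = 0
# (known answer: x* = (1/2, 1/2), lambda* = -1).
c = 10.0
lam = 0.0
x = np.zeros(2)

def grad_Lc(x, lam):
    g = x[0] + x[1] - 1.0
    return 2.0 * x + (lam + c * g) * np.ones(2)    # d/dx of f + lam*g + (c/2)*g^2

for _ in range(50):                                 # outer multiplier updates
    for _ in range(1000):                           # inner minimization of L_c(x, lam)
        x = x - 0.05 * grad_Lc(x, lam)
    lam = lam + c * (x[0] + x[1] - 1.0)             # ascent step: lam <- lam + c*g(x)

print(x, lam)    # approximately [0.5, 0.5] and -1.0
```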

Appendix C Application of Lagrange multipliers to the Ising problem; Cubic terms

In this appendix, we shall explain the role played by the cubic terms (which were neglected in the main text) in the methods of Yamamoto et al. [7], Kalinin et al. [15], and others. It will turn out that the inclusion of cubic terms helps us implement the ‘Augmented Lagrangian method of multipliers’ of Appendix B.

The statement of the Ising problem is as follows:

minimize 𝒙T𝑱𝒙\displaystyle\bm{x}^{T}\bm{J}\bm{x}
subject to xi21=0,i=1,,n.\displaystyle x_{i}^{2}-1=0,\ \ i=1,\dots,n.

The corresponding Lagrange function is given by:

L(𝒙,𝝀)=𝒙T𝑱𝒙i=1nλi(xi21).L(\bm{x},\bm{\lambda})=\bm{x}^{T}\bm{Jx}-\sum_{i=1}^{n}\lambda_{i}(x_{i}^{2}-1).

Next, we write down the Augmented Lagrange function:

Lc(𝒙,𝝀)=𝒙T𝑱𝒙i=1nλi(xi21)+c2(i=1n(xi21)2).L_{c}(\bm{x},\bm{\lambda})=\bm{x}^{T}\bm{Jx}-\sum_{i=1}^{n}\lambda_{i}(x_{i}^{2}-1)+\frac{c}{2}\left(\sum_{i=1}^{n}\left(x_{i}^{2}-1\right)^{2}\right).

Substituting the above Augmented Lagrange function into the dynamical system provided at the end of Appendix B, we get:

dxidt\displaystyle\frac{dx_{i}}{dt} =2κ(λixi+cxicxi3j=1nJijxj),\displaystyle=2\kappa\left(\lambda_{i}x_{i}+cx_{i}-cx_{i}^{3}-\sum_{j=1}^{n}{J_{ij}x_{j}}\right),
dλidt\displaystyle\frac{d\lambda_{i}}{dt} =κ(1xi2),\displaystyle=\kappa^{\prime}(1-x_{i}^{2}),

where κ\kappa and κ\kappa^{\prime} are appropriately chosen step sizes. We notice the equivalence in form between this dynamical system and those in the papers discussed in the main text and conclude that the cubic terms that appear in most of those systems are in fact helping to implement the Augmented Lagrangian method of multipliers.
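The following Python fragment Euler-integrates this descent/ascent flow of the Augmented Lagrange function for a small illustrative instance, a frustrated three-spin triangle; J, c, the step sizes, and the initial state are assumptions made only for the sketch.

```python
import numpy as np

# Descent/ascent flow of L_c(x, lam) = x^T J x - sum_i lam_i (x_i^2 - 1)
#                                      + (c/2) sum_i (x_i^2 - 1)^2
# for an illustrative frustrated triangle of three coupled spins.
J = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])
c, kappa, kappa_prime, dt = 1.0, 1.0, 0.2, 0.005
rng = np.random.default_rng(4)
x = 0.1 * rng.normal(size=3)
lam = np.zeros(3)

for _ in range(40000):
    g = x**2 - 1.0
    dLc_dx = 2.0 * (J @ x) - 2.0 * lam * x + 2.0 * c * g * x
    x = x - kappa * dt * dLc_dx                 # gradient descent in x
    lam = lam + kappa_prime * dt * (-g)         # gradient ascent: dL_c/dlam_i = 1 - x_i^2

spins = np.sign(x)
print(spins, spins @ J @ spins)   # typically a ground state of the triangle (energy -2)
```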

Appendix D Adiabatic method using Kerr (χ(3))\left(\chi^{(3)}\right) nonlinear coupled oscillators

Researchers at Toshiba proposed an adiabatic Ising solver consisting of networks of coupled Kerr nonlinear oscillators [17], [32], [33]. The big difference between the machines we have studied so far and the Toshiba machine is that Toshiba uses dissipation-less optical systems and utilizes the adiabatic method to optimize the Ising Hamiltonian. By dissipation-less, we mean that the coupling between different oscillators is not dissipative but perfectly elastic.

Further, they replace parametric oscillators that use the χ(2)\chi^{(2)} nonlinearity with Kerr-parametric oscillators that possess both the χ(2)\chi^{(2)} and the χ(3)\chi^{(3)} nonlinearities. The χ(2)\chi^{(2)} nonlinearity gives rise to effects such as parametric amplification whereas the χ(3)\chi^{(3)} nonlinearity leads to intensity dependent refractive index change. A second-harmonic pump signal is used to achieve parametric amplification. The refractive index of the modes in the oscillators is modulated by their own intensities due to the third-order nonlinearity.

The slowly varying amplitude equations for the sine (sis_{i}) and cosine (cic_{i}) components of the electric field in their system are given by:

dcidt=K(ci2+si2)si+psi+ξ0jJijsj,\displaystyle\frac{dc_{i}}{dt}=K\left(c_{i}^{2}+s_{i}^{2}\right)s_{i}+ps_{i}+\xi_{0}\sum_{j}{J_{ij}s_{j}},
dsidt=K(ci2+si2)ci+pciξ0jJijcj,\displaystyle\frac{ds_{i}}{dt}=-K\left(c_{i}^{2}+s_{i}^{2}\right)c_{i}+pc_{i}-\xi_{0}\sum_{j}{J_{ij}c_{j}},

where cic_{i} and sis_{i} represent the cosine and sine components of the ii-th oscillator, pp refers to the strength of the parametric pumping, KK is the value of the third-order nonlinearity (Kerr coefficient), and ξ0\xi_{0} is the strength of the coupling interaction between the oscillators.

We rotate the axes and recast the above set of equations in phasor notation, as follows, to elucidate the similarity between this method and that of the Cambridge group of Section 5.5 [15]:

dEidt=κ(pEi+iK|Ei|2Ei+ijJijEj).\frac{dE_{i}}{dt}=\kappa\left(pE_{i}^{*}+iK|E_{i}|^{2}E_{i}+i\sum_{j}{J_{ij}E_{j}}\right).

The first term on the right-hand side is the parametric gain, the second term is the nonlinear rotation caused by the Kerr nonlinearity, and the third term is the dissipation-less coupling between oscillators.

The machine, which follows the adiabatic principle, works by ramping up the value of the pump pp adiabatically. The authors show that their system Hamiltonian mimics the Ising Hamiltonian well when p becomes large enough.

We note the interesting fact that, though the authors have parametric gain, the system does not blow up to infinity due to the presence of the nonlinear rotating term. This term ensures that the quadrature vector keeps traversing gain and loss regions periodically in the phase space, hence keeping the amplitudes in check.
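A direct Euler-integration sketch of the quadrature equations above, with the pump ramped slowly from zero, is given below; the coupling matrix, the values of K and ξ_0, the ramp rate, and the initial noise are illustrative assumptions and not the parameters used in refs. [17], [32], [33].

```python
import numpy as np

# Euler-integration sketch of the Kerr-oscillator quadrature equations above, with
# the pump p ramped slowly from 0. All parameter values are illustrative assumptions.
n = 4
rng = np.random.default_rng(5)
J = rng.choice([-1.0, 1.0], size=(n, n))
J = np.triu(J, 1)
J = J + J.T

K, xi0, dt = 1.0, 0.1, 0.002
c = 0.01 * rng.normal(size=n)          # cosine quadratures c_i
s = 0.01 * rng.normal(size=n)          # sine quadratures s_i

steps = 100000
for k in range(steps):
    p = 2.0 * k / steps                # adiabatic pump ramp from 0 to 2
    r2 = c**2 + s**2
    dc = K * r2 * s + p * s + xi0 * (J @ s)
    ds = -K * r2 * c + p * c - xi0 * (J @ c)
    c, s = c + dt * dc, s + dt * ds

# One plausible readout: the sign of the cosine quadrature as the spin value.
print(np.sign(c), np.sign(c) @ J @ np.sign(c))
```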

Appendix E Iterative Analog Matrix Multipliers

In this appendix, we present the system designed by Soljacic et al. [16]. We shall see that the simplified system we presented in Section 6.1 of the main text differs from that of Soljacic et al. significantly, in that their method adds noise and nonlinear thresholding after each iteration. It is possible that these modifications lead to performance improvements.

Formally, their iteration is given by:

𝑬(t+1)=u(2𝑲𝑬(t)+𝑵(t)),\bm{E}(t+1)=u\left(2\bm{KE}(t)+\bm{N}(t)\right),

where 𝑬(t)\bm{E}(t) is the vector of electric field amplitude values at the beginning of the tt-th iteration, u(x)u(x) is the Heaviside step function that is 1 for positive xx and 0 for negative xx, 𝑵(t)\bm{N}(t) is a zero-mean Gaussian random noise vector, and KK is a matrix given by K=𝑱+α𝑴K=\sqrt{\bm{J}+\alpha\bm{M}}, where 𝑱\bm{J} is the Ising connectivity matrix, α\alpha is a real number, and 𝑴\bm{M} is some suitably chosen matrix. More specifically, 𝑴\bm{M} is chosen to have the same eigenvectors as 𝑱\bm{J}. It will turn out that the eigenvalues of 𝑴\bm{M} play the role of Lagrange multipliers.
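A literal transcription of this iteration into Python is sketched below. For the sketch we take 𝑴 to be the identity matrix, which trivially shares its eigenvectors with 𝑱, and choose α just large enough that 𝑱 + α𝑴 is positive semi-definite so that the matrix square root is real; the problem size, noise level, and iteration count are assumptions.

```python
import numpy as np

# Literal sketch of the iteration E(t+1) = u(2*K*E(t) + N(t)), K = sqrt(J + alpha*M).
# M = I and all numerical parameters below are illustrative assumptions.
n = 8
rng = np.random.default_rng(6)
J = rng.choice([-1.0, 1.0], size=(n, n))
J = np.triu(J, 1)
J = J + J.T

M = np.eye(n)
alpha = 1.1 * abs(np.linalg.eigvalsh(J).min())        # shift the spectrum to be non-negative
w, V = np.linalg.eigh(J + alpha * M)
K = V @ np.diag(np.sqrt(np.maximum(w, 0.0))) @ V.T    # symmetric matrix square root

E = rng.integers(0, 2, size=n).astype(float)          # binary field amplitudes in {0, 1}
for _ in range(200):
    noise = rng.normal(0.0, 1.0, size=n)              # zero-mean Gaussian noise N(t)
    E = (2.0 * K @ E + noise > 0.0).astype(float)     # Heaviside thresholding u(.)

spins = 2.0 * E - 1.0
print(spins, spins @ J @ spins)
```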

The authors showed that under the condition of high noise 𝑵(t)\bm{N}(t), their system performs minimization of the following effective merit function:

H=β2ij(Jij+αMij)EiEj,H=-\frac{\beta}{2}\sum_{ij}{\left(J_{ij}+\alpha M_{ij}\right)E_{i}E_{j}},

where β\beta is some parameter dependent on the noise.

Using the fact that the matrix 𝑴\bm{M} is chosen to have the same eigenvectors as 𝑱\bm{J}, we rewrite the above merit function, modulo additive constants, as the following Lagrange function:

H=L(𝑬,𝜸)=β2(ijJijEiEj+αiγi(zi21)),H=L(\bm{E},\bm{\gamma})=-\frac{\beta}{2}\left(\sum_{ij}{J_{ij}E_{i}E_{j}}+\alpha\sum_{i}{\gamma_{i}\left(z_{i}^{2}-1\right)}\right),

where the γi\gamma_{i} are the eigenvalues of the matrix 𝑴\bm{M}, and the vector 𝒛\bm{z} is the vector of electric field amplitudes 𝑬\bm{E} expressed in the basis of eigenvectors of 𝑴\bm{M} (that is, the eigenvectors of 𝑱\bm{J}). We see that the eigenvalues of 𝑴\bm{M} play the role of Lagrange multipliers, albeit for different constraints than those required by the Ising problem. This difference seems to be caused by 𝑴\bm{M} not being a diagonal matrix.

In conclusion, we interpret their algorithm as optimizing a Lagrange function with the merit function being the Ising Hamiltonian itself, and the constraints being that the components of the spin vector when expressed in the eigenvector basis of 𝑱\bm{J} be restricted to 1 and -1.

References

  • [1] Y. Shen, N. C. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund, and M. Soljačić, “Deep learning with coherent nanophotonic circuits,” Nature Photonics, vol. 11, pp. 441–446, July 2017.
  • [2] L. Onsager, “Reciprocal Relations in Irreversible Processes. II.,” Physical Review, vol. 38, pp. 2265–2279, Dec. 1931.
  • [3] A. Lucas, “Ising formulations of many NP problems,” arXiv preprint arXiv:1302.5843, 2013.
  • [4] S. Utsunomiya, K. Takata, and Y. Yamamoto, “Mapping of Ising models onto injection-locked laser systems,” Optics express, vol. 19, no. 19, pp. 18091–18108, 2011.
  • [5] Z. Wang, A. Marandi, K. Wen, R. L. Byer, and Y. Yamamoto, “Coherent Ising machine based on degenerate optical parametric oscillators,” Physical Review A, vol. 88, no. 6, p. 063853, 2013.
  • [6] A. Marandi, Z. Wang, K. Takata, R. L. Byer, and Y. Yamamoto, “Network of time-multiplexed optical parametric oscillators as a coherent Ising machine,” Nature Photonics, vol. 8, no. 12, p. 937, 2014.
  • [7] Y. Haribara, S. Utsunomiya, and Y. Yamamoto, “Computational Principle and Performance Evaluation of Coherent Ising Machine Based on Degenerate Optical Parametric Oscillator Network,” Entropy, vol. 18, p. 151, Apr. 2016.
  • [8] P. L. McMahon, A. Marandi, Y. Haribara, R. Hamerly, C. Langrock, S. Tamate, T. Inagaki, H. Takesue, S. Utsunomiya, and K. Aihara, “A fully programmable 100-spin coherent Ising machine with all-to-all connections,” Science, vol. 354, no. 6312, pp. 614–617, 2016.
  • [9] T. Inagaki, K. Inaba, R. Hamerly, K. Inoue, Y. Yamamoto, and H. Takesue, “Large-scale Ising spin network based on degenerate optical parametric oscillators,” Nature Photonics, vol. 10, no. 6, p. 415, 2016.
  • [10] T. Inagaki, Y. Haribara, K. Igarashi, T. Sonobe, S. Tamate, T. Honjo, A. Marandi, P. L. McMahon, T. Umeki, K. Enbutsu, O. Tadanaga, H. Takenouchi, K. Aihara, K.-i. Kawarabayashi, K. Inoue, S. Utsunomiya, and H. Takesue, “A coherent Ising machine for 2000-node optimization problems,” Science, vol. 354, pp. 603–606, Nov. 2016.
  • [11] T. Leleu, Y. Yamamoto, P. L. McMahon, and K. Aihara, “Destabilization of Local Minima in Analog Spin Systems by Correction of Amplitude Heterogeneity,” Physical Review Letters, vol. 122, p. 040607, Feb. 2019.
  • [12] T. P. Xiao, Optoelectronics for refrigeration and analog circuits for combinatorial optimization. PhD Thesis, UC Berkeley, 2019.
  • [13] M. Babaeian, D. T. Nguyen, V. Demir, M. Akbulut, P.-A. Blanche, Y. Kaneda, S. Guha, M. A. Neifeld, and N. Peyghambarian, “A single shot coherent Ising machine based on a network of injection-locked multicore fiber lasers,” Nature Communications, vol. 10, p. 3516, Dec. 2019.
  • [14] T. Wang and J. Roychowdhury, “OIM: Oscillator-Based Ising Machines for Solving Combinatorial Optimisation Problems,” in Unconventional Computation and Natural Computation (I. McQuillan and S. Seki, eds.), vol. 11493, pp. 232–256, Cham: Springer International Publishing, 2019.
  • [15] K. P. Kalinin and N. G. Berloff, “Global optimization of spin Hamiltonians with gain-dissipative systems,” Scientific Reports, vol. 8, p. 17791, Dec. 2018.
  • [16] C. Roques-Carmes, Y. Shen, C. Zanoci, M. Prabhu, F. Atieh, L. Jing, T. Dubček, C. Mao, M. R. Johnson, V. Čeperić, J. D. Joannopoulos, D. Englund, and M. Soljačić, “Heuristic recurrent algorithms for photonic Ising machines,” Nature Communications, vol. 11, p. 249, Dec. 2020.
  • [17] H. Goto, K. Tatsumura, and A. R. Dixon, “Combinatorial optimization by simulating adiabatic bifurcations in nonlinear Hamiltonian systems,” Science Advances, vol. 5, p. eaav2372, Apr. 2019.
  • [18] B. Molnár, F. Molnár, M. Varga, Z. Toroczkai, and M. Ercsey-Ravasz, “A continuous-time MaxSAT solver with high analog performance,” Nature Communications, vol. 9, p. 4864, Dec. 2018.
  • [19] D. Pierangeli, G. Marcucci, and C. Conti, “Large-Scale Photonic Ising Machine by Spatial Light Modulation,” Physical Review Letters, vol. 122, p. 213902, May 2019.
  • [20] D. Pierangeli, M. Rafayelyan, C. Conti, and S. Gigan, “Scalable spin-glass optical simulator,” arXiv:2006.00828 [physics], June 2020. arXiv: 2006.00828.
  • [21] I. Prigogine, Etude Thermodynamique des Phénomènes irréversibles. Editions Desoer, Liège, 1947. Chap V.
  • [22] S. de Groot, Thermodynamics of Irreversible Processes. Interscience Publishers, Inc., New York, 1951. Chap X.
  • [23] N. G. Dickson, M. W. Johnson, M. H. Amin, R. Harris, F. Altomare, A. J. Berkley, P. Bunyk, J. Cai, E. M. Chapple, and P. Chavez, “Thermally assisted quantum annealing of a 16-qubit problem,” Nature communications, vol. 4, p. 1903, 2013.
  • [24] S. Boyd and L. Vandenberghe, Convex optimization. Cambridge university press, 2004.
  • [25] D. Bertsekas, Nonlinear programming. Athena Scientific, 1999.
  • [26] https://web.stanford.edu/~yyye/yyye/Gset/.
  • [27] U. Benlic and J.-K. Hao, “Breakout Local Search for the Max-Cutproblem,” Engineering Applications of Artificial Intelligence, vol. 26, pp. 1162–1173, Mar. 2013.
  • [28] J. Roychowdhury, “Boolean Computation Using Self-Sustaining Nonlinear Oscillators,” Proceedings of the IEEE, vol. 103, pp. 1958–1969, Nov. 2015.
  • [29] A. Demir, A. Mehrotra, and J. Roychowdhury, “Phase noise in oscillators: A unifying theory and numerical methods for characterization,” IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, vol. 47, no. 5, pp. 655–674, 2000.
  • [30] M. Reck, A. Zeilinger, H. J. Bernstein, and P. Bertani, “Experimental realization of any discrete unitary operator,” Physical Review Letters, vol. 73, pp. 58–61, July 1994.
  • [31] R. M. Karp, “Reducibility among combinatorial problems,” in Complexity of computer computations, pp. 85–103, Springer, 1972.
  • [32] H. Goto, “Bifurcation-based adiabatic quantum computation with a nonlinear oscillator network,” Scientific reports, vol. 6, p. 21686, 2016.
  • [33] H. Goto, “Quantum computation based on quantum adiabatic bifurcations of Kerr-nonlinear parametric oscillators,” Journal of the Physical Society of Japan, vol. 88, no. 6, p. 061015, 2019.