
High dimensional optimization under non-convex excluded volume constraints

Antonio Sclocchi Institute of Physics, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland    Pierfrancesco Urbani Institut de physique théorique, Université Paris Saclay, CNRS, CEA, F-91191 Gif-sur-Yvette, France
Abstract

We consider high dimensional random optimization problems where the dynamical variables are subjected to non-convex excluded volume constraints. We focus on the case in which the cost function is a simple quadratic cost and the excluded volume constraints are modeled by a perceptron constraint satisfaction problem. We show that different situations arise depending on the density of constraints. If the number of constraints is small, one typically has a phase where the ground state of the cost function is unique and sits on the boundary of the island of configurations allowed by the constraints. In this case, there is a hypostatic number of marginally satisfied constraints. If the number of constraints is increased, one enters a glassy phase where the cost function has many local minima, again sitting on the boundary of the region of allowed configurations. At the phase transition point, the total number of marginally satisfied constraints becomes equal to the number of degrees of freedom of the problem, and therefore we say that these minima are isostatic. We conjecture that by further increasing the number of constraints the system stays isostatic up to the point where the volume of available phase space shrinks to zero. We derive our results using the replica method; we also analyze a dynamical algorithm based on the Karush-Kuhn-Tucker conditions through dynamical mean-field theory, and we show how to recover the results of the replica approach in the replica symmetric phase.

I Introduction

Several complex systems can be described by a large number of variables whose dynamics tries to optimize some cost function under a set of constraints. In the most general setting, the cost function is arbitrary and the constraints may be either inequalities or equalities. In this work, we consider the generic situation in which there is a set of $N$ real variables $\underline{x}=\{x_{1},\ldots,x_{N}\}$ and a set of $M=\alpha N$ inequality constraints

{\cal C}=\{g^{\mu}(\underline{x})\geq 0,\ \mu=1,\ldots,M=\alpha N\}\>. (1)

each characterized by a function $g^{\mu}(\underline{x})$, together with a cost function $H(\underline{x})$ to be minimized.

An example of this situation is found in classification problems in machine learning. In the simplest setting of binary classification, one has a set of inequalities that enforce the correct separation of the training set into two classes. The parameters of the machine (corresponding to the variables $x_{i}$ here) must be adjusted to satisfy these constraints. Therefore, the supervised learning task is recast into a continuous constraint satisfaction problem (CCSP). In many cases one wants to enforce that, once a solution of the problem is found, it satisfies some additional requirements. For example, in order to prevent numerical instabilities, one would like the parameters of the network to remain of the right order of magnitude and not blow up to infinity. To achieve this, one typically uses regularization schemes such as Ridge regularization. In other cases, one is interested in finding sparse networks in which many of the parameters are zero. This can also be achieved by imposing an appropriate cost function on the parameters Neyshabur (2020).

In all these cases, the machine performing the classification task is defined by the form of the functions $g^{\mu}(\underline{x})$, and the regularizing cost function is given by $H(\underline{x})$, which introduces a penalty to prevent the parameters of the network from behaving in undesired ways. In this work, we will be rather generic and will not focus on any practical application. We will rather focus on the characterization of the landscape of the cost function $H$ subjected to the constraints ${\cal C}$. In particular, we want to find the ground state of the cost function, defined by

\underline{x}^{*}=\underset{\underline{x}:\,g^{\mu}(\underline{x})\geq 0\ \forall\mu}{\mathrm{argmin}}\ H(\underline{x})\>. (2)

In order to have a well-posed problem, we will assume that there exists a region of the phase space of $\underline{x}$ in which all constraints are satisfied. We will call this region the SAT region (the terminology comes from constraint satisfaction problems in computer science Franz et al. (2017)). Examples of optimization problems such as Eq. (2) can be easily constructed and classified according to the properties of the corresponding cost function. We list here just a set of general cases in which the cost function is:

  • separable: this means that each degree of freedom tries to optimize a cost function that is independent of the configuration of the other degrees of freedom. An example in this class is

    H(\underline{x})=\frac{1}{2}\sum_{i=1}^{N}(x_{i}-1)^{2} (3)

    but other non-convex potentials with local minima may be chosen.

  • non-separable and convex: the cost function of each degree of freedom depends on the configuration of part of, or the whole, system, and the global cost is a convex function over the whole phase space (in the absence of the excluded volume constraints). As an example one can consider:

    H(\underline{x})=\frac{1}{P}\sum_{\mu=1}^{P}\sum_{i,j=1}^{N}x_{i}\eta^{\mu}_{i}\eta^{\mu}_{j}x_{j} (4)

    where the $\underline{\eta}^{\mu}$ are i.i.d. random vectors. Another example is the following: consider the vector $\underline{x}$ to be unconstrained. Then one wants to minimize the volume of phase space available to $\underline{x}$, for example by minimizing $|\underline{x}|^{2}$, such that there is a configuration that satisfies all constraints. This problem is related to the packing problem and has been analyzed in Franz and Parisi (2016); Franz et al. (2015, 2017); Yoshino (2018); Brito et al. (2018); Franz et al. (2019a); Ikeda et al. (2019); Ikeda (2020); see also Geiger et al. (2020) for a review with the corresponding connection to deep learning.

  • non-separable and non-convex: we have again a non-separable cost which is also non-convex. As an example, one can think of high dimensional random Gaussian functions Castellani and Cavagna (2005)

    H(\underline{x})=\frac{1}{N}\sum_{i<j<k}J_{ijk}x_{i}x_{j}x_{k} (5)

    and the coupling constants $J_{ijk}$ are Gaussian variables of order $1/N$. In other words, one can superimpose a spin glass landscape on top of a continuous constraint satisfaction problem (CCSP) identified by the set of constraints.
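To make the three classes above concrete, here is a minimal numerical sketch of the cost functions of Eqs. (3)-(5); the system sizes and random seeds are illustrative choices, not part of the model definitions.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
N, P = 20, 10  # small illustrative sizes, so the 3-spin sum can be evaluated directly

def h_separable(x):
    # Eq. (3): each coordinate independently prefers x_i = 1
    return 0.5 * np.sum((x - 1.0) ** 2)

eta = rng.standard_normal((P, N))  # i.i.d. random vectors eta^mu
def h_convex(x):
    # Eq. (4): (1/P) sum_mu (eta^mu . x)^2, a positive semi-definite quadratic form
    return np.sum((eta @ x) ** 2) / P

J = rng.standard_normal((N, N, N)) / N  # couplings of order 1/N, as in Eq. (5)
def h_nonconvex(x):
    # Eq. (5): a 3-spin random Gaussian landscape, summed over i < j < k
    return sum(J[i, j, k] * x[i] * x[j] * x[k]
               for i, j, k in combinations(range(N), 3))

x = rng.standard_normal(N)
print(h_separable(x), h_convex(x), h_nonconvex(x))
```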

This class of optimization problems is quite vast. In the following, we will consider the simplest case: a separable convex cost together with the simplest set of non-convex excluded volume constraints. We will show that the non-convexity of the constraints brings about glassiness and marginal stability even when the cost function is convex by itself.

II A toy model

To be concrete we consider a simple toy model. We assume that each variable $x_{i}$ wants to minimize a simple convex cost function but is constrained to lie in a possibly non-convex region of phase space. We consider the simplest such cost function,

H(\underline{x})=\frac{1}{2}\sum_{i=1}^{N}(x_{i}-1)^{2} (6)

and we want to minimize it under the constraints that

g^{\mu}(\underline{x})\geq 0\,,\ \ \ \ \forall\mu=1,\ldots,M=\alpha N (7)

with

g^{\mu}(\underline{x})=\frac{1}{\sqrt{N}}\underline{\xi}^{\mu}\cdot\underline{x}-\sigma (8)

where $\underline{\xi}^{\mu}=\{\xi^{\mu}_{1},\ldots,\xi^{\mu}_{N}\}$ is a random vector whose components are Gaussian random variables with zero mean and unit variance. Furthermore, we constrain $\underline{x}$ to be on the $N$-dimensional sphere $|\underline{x}|^{2}=N$. The constraints in Eqs. (7) and (8) define the perceptron continuous constraint satisfaction problem (CCSP). Eqs. (6), (7) and (8) define a quadratic programming problem with inequality constraints Boyd and Vandenberghe (2004). However, as we will see, the topology of the phase space and the nature of the constraints are such that the global optimization problem can become non-convex.
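Before analyzing the $N\rightarrow\infty$ limit, it is useful to note that a single instance of this problem can be handed to an off-the-shelf solver. The sketch below uses scipy's SLSQP (an implementation choice of ours, not the algorithm analyzed in this paper), treating the spherical constraint as an equality and the perceptron gaps as inequalities; parameter values are illustrative, and convergence of SLSQP on this non-convex problem is not guaranteed.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
N, alpha, sigma = 100, 5.0, -1.0        # illustrative values (RS phase, see Fig. 1)
M = int(alpha * N)
xi = rng.standard_normal((M, N))         # random patterns xi^mu

cost = lambda x: 0.5 * np.sum((x - 1.0) ** 2)   # Eq. (6)
gaps = lambda x: xi @ x / np.sqrt(N) - sigma    # Eq. (8): g^mu(x) >= 0

x0 = rng.standard_normal(N)
x0 *= np.sqrt(N) / np.linalg.norm(x0)            # start on the sphere |x|^2 = N
res = minimize(cost, x0, method="SLSQP",
               constraints=[{"type": "ineq", "fun": gaps},
                            {"type": "eq", "fun": lambda x: x @ x - N}])
g = gaps(res.x)
print("converged:", res.success, "  cost per dof:", cost(res.x) / N)
# fraction of marginally satisfied constraints per degree of freedom (c of Eq. (25))
print("c ~", np.sum(np.isclose(g, 0.0, atol=1e-6)) / N)
```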

In Gardner (1988); Gardner and Derrida (1988); Franz and Parisi (2016); Franz et al. (2017) the properties of the space of solutions of this CCSP have been extensively studied. As a function of $\sigma$, there is a critical value $\alpha_{J}(\sigma)$ of the density of constraints below which the CCSP is SAT, meaning that with probability one (for $N\rightarrow\infty$) the phase space defined by the constraints has a finite positive volume, and above which the problem is UNSAT, meaning that there is no configuration of $\underline{x}$ that satisfies all constraints.

In Franz et al. (2017) the UNSAT phase of the perceptron has been studied by associating a harmonic energy cost to the unsatisfied constraints. In Franz and Parisi (2016) the SAT phase of the perceptron CCSP has been characterized. The present setting, instead, is quite different: we still consider the SAT phase of the perceptron CCSP, corresponding to $\alpha<\alpha_{J}(\sigma)$, but on top of it we want to optimize the cost function $H(\underline{x})$ of Eq. (6). It is therefore a problem of constrained optimization.

The nature of the constraints changes completely when $\sigma$ passes from positive values, where the CCSP is convex Franz and Parisi (2016), to negative values, where the space of solutions of the CCSP is the intersection of non-convex domains and may itself be non-convex.

In order to study the properties of the landscape of this optimization problem, we consider the partition function defined as

Z=\int\mathrm{d}\underline{x}\,\exp\left[-\beta H(\underline{x})\right]\left[\prod_{\mu=1}^{M}\theta\left(g^{\mu}(\underline{x})\right)\right] (9)

where $\beta=1/T$ is the inverse temperature and $\theta(x)$ is the Heaviside step function. Given the partition function, one constructs the average free energy as

{\rm f}=-\frac{1}{\beta N}\overline{\ln Z} (10)

where the overline denotes the average over the random vectors $\underline{\xi}^{\mu}$. To compute the properties of the landscape of the cost function, we need to study the behavior of the free energy in the zero-temperature limit $\beta\rightarrow\infty$.

The average of the logarithm of the partition function can be computed using the replica method Mézard et al. (1987)

{\rm f}=-\frac{1}{\beta N}\lim_{n\rightarrow 0}\partial_{n}\overline{Z^{n}}\>. (11)

The average of the $n$-th power of the partition function can be obtained by considering $n$ to be an integer and then taking the analytic continuation down to $n\rightarrow 0$. Performing the integral over the random vectors $\underline{\xi}^{\mu}$ (the details closely follow those reported in Franz et al. (2017) and we omit them) we obtain

\overline{Z^{n}}=\int\mathrm{d}\hat{Q}\,\exp\left[NA(\hat{Q})\right] (12)

where $A(\hat{Q})$ is an action and $\hat{Q}$ is an $(n+1)\times(n+1)$ matrix of integration variables. For $N\rightarrow\infty$ we can evaluate the integral by saddle point. The expression of the function $A(\hat{Q})$ is given by

A(\hat{Q})=\frac{1}{2}\ln\det\hat{Q}-\beta(1-m)+\alpha\ln{\cal Z} (13)

where we have assumed that $\hat{Q}_{1a}=\hat{Q}_{a1}=m$ independently of $a$ (this is a property that holds at the saddle point: indeed, one can show that there $\hat{Q}_{1a}=\frac{1}{N}\sum_{i=1}^{N}\overline{\langle x_{i}\rangle}$, which does not depend on $a$). At the saddle point we have the following interpretation for the overlap matrix $\hat{Q}$:

\begin{split}\hat{Q}_{ab}&=\frac{1}{N}\underline{x}_{a}\cdot\underline{x}_{b}\qquad a,b>1\\ \hat{Q}_{1a}&=\hat{Q}_{a1}=\frac{1}{N}\sum_{i=1}^{N}[\underline{x}_{a}]_{i}=m\qquad a>1\end{split} (14)

Therefore $\hat{Q}_{1a}=\hat{Q}_{a1}=m$ is an order parameter that quantifies the overlap between the configuration $\underline{x}_{a}$ and the minimum $[1,1,\ldots,1]$ of $H(\underline{x})$, while $\hat{Q}_{ab}=\frac{1}{N}\underline{x}_{a}\cdot\underline{x}_{b}$, for $a,b>1$, is the overlap between the configurations of different replicas of the system. Note that the spherical constraint on $\underline{x}$ imposes $\hat{Q}_{aa}=1$. The local partition function ${\cal Z}$ is given by

{\cal Z}=\left.\exp\left[\frac{1}{2}\sum_{a,b=1}^{n}Q_{ab}\frac{\partial^{2}}{\partial h_{a}\partial h_{b}}\right]\prod_{c=1}^{n}\theta\left[h_{c}-\sigma\right]\right|_{\underline{h}=0} (15)

where $Q$ is the matrix obtained from $\hat{Q}$ by removing the first row and column. The saddle point equations for the matrix $\hat{Q}$ can be obtained in full generality following the same steps as in Franz et al. (2017). Here we will not repeat that computation; we consider instead the replica symmetric solution and its stability.

II.1 The replica symmetric solution and its stability

The replica symmetric ansatz for the solution of the saddle point equations amounts to setting $Q_{ab}=q$ for all $a\neq b$. The order parameters $m$ and $q$ are then given by the following saddle point equations

\begin{split}\frac{q-m^{2}}{(1-q)^{2}}&=\alpha\int\mathrm{d}h\,P(q,h)\,(f^{\prime}(q,h))^{2}\\ \frac{m}{1-q}&=\beta\end{split} (16)

where

\begin{split}f(q,h)&=\ln\left[\gamma_{1-q}\star\theta\right](h)\\ P(q,h)&=\gamma_{q}(h+\sigma)\end{split} (17)

and we use the notation in which the prime denotes the derivative with respect to $h$, $\star$ denotes a convolution, and

\gamma_{A}(h)=\frac{1}{\sqrt{2\pi A}}e^{-h^{2}/(2A)}\>. (18)
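All the ingredients of Eqs. (16)-(18) are now explicit: since $[\gamma_{1-q}\star\theta](h)=\Phi(h/\sqrt{1-q})$, with $\Phi$ the standard normal cumulative distribution, the saddle point equations close and can be solved numerically. The following is a minimal sketch under our own implementation choices (the root bracketing and quadrature range are heuristics valid in the RS-SAT region):

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq
from scipy.stats import norm

def rs_saddle(alpha, sigma, beta):
    """Solve the finite-T RS equations (16)-(18) for q (and m = beta*(1-q)).
    Uses gamma_{1-q} * theta(h) = Phi(h/sqrt(1-q)), so that
    f'(q,h) = phi(z) / (Phi(z) sqrt(1-q)) with z = h/sqrt(1-q)."""
    def G(q):
        A = 1.0 - q
        m = beta * A                                   # second of Eqs. (16)
        z = lambda h: h / np.sqrt(A)
        # log-space ratio phi/Phi avoids underflow of the CDF at large negative h
        fp = lambda h: np.exp(norm.logpdf(z(h)) - norm.logcdf(z(h))) / np.sqrt(A)
        P = lambda h: norm.pdf(h + sigma, scale=np.sqrt(q))  # P(q,h) = gamma_q(h+sigma)
        I = quad(lambda h: P(h) * fp(h) ** 2, -12, 12)[0]
        return q - m ** 2 - A ** 2 * alpha * I         # first of Eqs. (16) as a root problem
    # bracket: m = 1 at q = 1 - 1/beta gives G < 0; G > 0 as q -> 1 in the RS-SAT region
    q = brentq(G, 1.0 - 1.0 / beta, 1.0 - 1e-6)
    return q, beta * (1.0 - q)

for beta in (5.0, 10.0, 20.0):
    q, m = rs_saddle(alpha=5.0, sigma=-1.0, beta=beta)
    print(f"beta = {beta:5.1f}: q = {q:.5f}, m = {m:.4f}")  # q -> 1 and m -> chi as T -> 0
```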

The stability of the replica symmetric solution is assessed by studying the Hessian $\partial^{2}A(\hat{Q})/\partial\hat{Q}_{ab}\partial\hat{Q}_{cd}$ of the function $A(\hat{Q})$ De Almeida and Thouless (1978); Mézard et al. (1987); Franz et al. (2017). The replica symmetric solution is stable if the saddle point satisfies the inequality

\frac{1}{(1-q)^{2}}\geq\alpha\int\mathrm{d}h\,P(q,h)\,(f^{\prime\prime}(q,h))^{2}\>. (19)

When Eq. (19) holds as an equality, the model enters a glassy regime where multiple minima appear, characterized by replica symmetry breaking and marginal stability Mézard et al. (1987). The equations (16)-(19) have been derived at finite temperature. However, we are interested in the zero-temperature limit, which corresponds to the ground state of the cost function. In order to take the $T\rightarrow 0$ limit, we assume that in this regime

q=1-\chi T\,,\ \ \ \ T\rightarrow 0^{+} (20)

which gives $m=\chi$. Substituting this relation into the first of Eqs. (16), we get that $\chi$ satisfies

\frac{1-\chi^{2}}{\chi^{2}}=T^{2}\alpha\int\mathrm{d}h\,P(1,h)\,(f^{\prime}(1,h))^{2}\>. (21)

Furthermore, in the zero-temperature limit, we have

f(1,h)\simeq-\frac{h^{2}}{2\chi T}\theta(-h) (22)

which gives

1-\chi^{2}=\alpha\int\mathrm{d}h\,P(1,h)\,h^{2}\,\theta(-h)\>. (23)

The marginal stability condition for the RS solution reads

1=\alpha\int\mathrm{d}h\,P(1,h)\,\theta(-h)\>. (24)

This relation defines the boundary of the region in which replica symmetry breaking (RSB) appears. Note that Eq. (20) requires $\chi$ to be positive since $q$ is constrained to satisfy $q\leq 1$. The phase diagram of the model is plotted in Fig. 1, where we consider only the case $\sigma<0$, which gives rise to non-convex excluded volume constraints.

Figure 1: The phase diagram of the toy optimization problem we are considering. The model is defined only below the SAT/UNSAT transition line, for which we plot in red the replica symmetric approximation. The SAT/UNSAT transition line marks where the volume of configurations allowed by the constraints shrinks to zero. Below the green dashed line, the minimum of the cost function is unique and the replica symmetric solution is stable, meaning that the stability condition Eq. (19) is satisfied. The dashed green line corresponds to Eq. (19) holding as an equality. Above this line, instead, the system enters a glassy phase characterized by multiple minima, described by replica symmetry breaking. In the replica symmetric phase, the global minimum of the cost function has a hypostatic number of marginally satisfied constraints. Upon reaching the marginal stability line, the number of such constraints becomes equal to the number of degrees of freedom, so that the minimum becomes isostatic. We also plot in blue the glass transition line of the perceptron CCSP, whose equation is given in Refs. Franz and Parisi (2016); Franz et al. (2017) and is independent of the choice of the optimization function $H(\underline{x})$, since it only depends on the perceptron constraints. Below this line the phase space defined by the constraints is ergodic, while right above it ergodicity is broken and one has replica symmetry breaking.

The red line in the phase diagram represents the replica symmetric approximation to the SAT/UNSAT (jamming) transition line. Above this line, there is no configuration of $\underline{x}$ that satisfies all constraints, and therefore the optimization problem makes sense only in the region below it. This line is simply obtained by taking the $\chi\rightarrow 0$ limit of the saddle point equations: indeed, above it no physically meaningful solution should exist, and the line therefore coincides with the one obtained in Franz and Parisi (2016). Another way to see that $\chi\rightarrow 0$ is the right limit for the SAT/UNSAT line is to note that at the saddle point $\chi=m$. We expect that, on this line, only one solution of the CCSP is left and, by rotational invariance of the disorder, it is orthogonal to the global minimum of the cost function. This implies that $m=0$.

The dashed green line instead separates a strictly stable phase below it, where the ground state of the cost function is unique and not critical, from a marginally stable phase where the landscape of the cost function is glassy and local minima are marginally stable. Note that the CCSP defined by the perceptron constraints also undergoes a replica symmetry breaking transition, where the space of solutions disconnects, before reaching the red line. However this line, also plotted in Fig. 1, is different from the one signaling the onset of glassiness of the cost function and, in particular, it lies closer to the SAT/UNSAT transition line. Therefore, even when the CCSP is replica symmetric, the landscape of the cost function may be glassy.
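At zero temperature, the lines discussed above reduce to one-dimensional Gaussian integrals: Eq. (24) gives the marginal stability line, and the $\chi\rightarrow 0$ limit of Eq. (23) gives the RS SAT/UNSAT line. A minimal numerical sketch (the parameter values are illustrative):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

P1 = lambda h, sigma: norm.pdf(h + sigma)   # P(1,h) = gamma_1(h + sigma)

# Eq. (24): marginal stability (onset of RSB) when alpha * I0 = 1
I0 = lambda sigma: quad(lambda h: P1(h, sigma), -np.inf, 0)[0]
# Eq. (23): 1 - chi^2 = alpha * I2; the RS jamming line is chi -> 0, i.e. alpha * I2 = 1
I2 = lambda sigma: quad(lambda h: h ** 2 * P1(h, sigma), -np.inf, 0)[0]

def chi(alpha, sigma):
    val = 1.0 - alpha * I2(sigma)
    return np.sqrt(val) if val > 0.0 else float("nan")

sigma = -1.0
print("alpha_dAT(sigma) =", 1.0 / I0(sigma))   # ~6.3 at sigma = -1
print("alpha_J^RS(sigma) =", 1.0 / I2(sigma))  # ~13.3 at sigma = -1
print("chi(alpha=5) =", chi(5.0, sigma))       # RS phase, cf. Fig. 2
```

For $\sigma=-1$ this places $\alpha=5$ (Fig. 2) in the RS phase and $\alpha=8$ (Fig. 3) in the RSB phase, below the RS jamming estimate.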

II.2 The distribution of gaps

To characterize the phase diagram, we compute the distribution of the gaps, defined as the values of $g^{\mu}$ evaluated at the configuration that represents the minimum of the cost function. Using the same approach as in Franz et al. (2017), we can show that the distribution of the $g^{\mu}$ is given by

\rho(h)=\frac{1}{N}\sum_{\mu=1}^{M}\langle\delta(h-g^{\mu}(\underline{x}))\rangle=c\,\delta(h)+\rho^{+}(h)\,. (25)

In the replica symmetric region, $\rho^{+}(h)$ is given by

\rho^{+}(h)=\alpha P(1,h)\theta(h) (26)

and

c=\alpha-\alpha\int_{0}^{\infty}\mathrm{d}h\,P(1,h)=\alpha\int_{-\infty}^{0}\mathrm{d}h\,P(1,h)\>. (27)

Therefore, when the replica symmetric solution becomes marginally stable, the system becomes isostatic, corresponding to $c\rightarrow 1$: the number of marginally satisfied constraints equals the number of degrees of freedom of the system. At this point we may ask what happens in the glassy phase, beyond the instability line where the replica symmetric solution breaks down. We do not attempt the RSB solution of the model, but we give a simple argument. At the jamming point we know that the allowed phase space of configurations shrinks to a point, and this point is jamming critical and isostatic Franz et al. (2017). Hence the system is isostatic both at the instability transition and at jamming. We therefore conjecture that the RSB phase in between is again isostatic and jamming critical. This would imply another optimization problem whose landscape is critical everywhere, even far from jamming, as happens in other cases Franz et al. (2019b, 2020); Sclocchi and Urbani (2021).

III Karush-Kuhn-Tucker Dynamics

In order to solve constrained optimization problems, one can rely on the so-called Karush-Kuhn-Tucker (KKT) conditions. We consider a dynamical version of these conditions, given by the following dynamical system

\begin{split}\dot{x}_{i}(t)&=-\zeta(t)x_{i}(t)-(x_{i}(t)-1)+\frac{1}{\sqrt{N}}\sum_{\mu=1}^{M}\xi^{\mu}_{i}f_{\mu}(t)\\ \dot{f}_{\mu}(t)&=-f_{\mu}(t)g_{\mu}(t)\end{split} (28)

where $\zeta(t)$ is a Lagrange multiplier needed to enforce the spherical constraint on the variables $x_{i}$. It is simple to show that a fixed point of these equations is a solution of the optimization problem. Indeed, if at long times $g_{\mu}>0$, we have $f_{\mu}=0$; if instead $g_{\mu}=0$, we have $f_{\mu}>0$. The variables $f_{\mu}$, which are the Lagrange multipliers of the KKT conditions, are nothing but the forces that the constraints exert in order to pin the corresponding gaps $g_{\mu}$ to zero. Therefore this system of equations, when it reaches a fixed point, directly gives the statistics of the forces corresponding to the gaps $g_{\mu}$ that are identically zero.
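The following sketch is one way to reproduce the experiments of Figs. 2 and 3: it integrates Eq. (28) with scipy's RK45 (which implements a Dormand-Prince pair of the kind cited in the figure captions), fixing the Lagrange multiplier through the condition $\frac{\mathrm{d}}{\mathrm{d}t}|\underline{x}|^{2}=0$, which gives $\zeta(t)=\underline{x}\cdot\underline{F}/N$ with $\underline{F}$ collecting the remaining forces; this closed form is our reading of the spherical constraint, and the parameters match the RS example of Fig. 2.

```python
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(2)
N, alpha, sigma = 100, 5.0, -1.0           # RS phase, as in Fig. 2
M = int(alpha * N)
xi = rng.standard_normal((M, N))

def rhs(t, y):
    x, f = y[:N], y[N:]
    g = xi @ x / np.sqrt(N) - sigma         # gaps g_mu(t)
    F = -(x - 1.0) + xi.T @ f / np.sqrt(N)  # forces on x, excluding the spherical term
    zeta = x @ F / N                        # enforces d|x|^2/dt = 0 on |x|^2 = N
    return np.concatenate([F - zeta * x, -f * g])

x0 = rng.standard_normal(N)                 # x_i initialized as standard Gaussians
x0 *= np.sqrt(N) / np.linalg.norm(x0)
y0 = np.concatenate([x0, np.ones(M)])       # forces initialized to f_mu(0) = 1

sol = solve_ivp(rhs, (0.0, 50.0), y0, method="RK45", rtol=1e-8, atol=1e-10)
x, f = sol.y[:N, -1], sol.y[N:, -1]
g = xi @ x / np.sqrt(N) - sigma
print("m(t_final) =", x.mean())                  # approaches chi in the RS phase
print("max |f_mu g_mu| =", np.abs(f * g).max())  # -> 0 at a KKT fixed point
```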

Figure 2: Time evolution of $g_{\mu}$ (panel (a)) and $f_{\mu}$ (panel (b)) in the RS phase ($\alpha=5$, $\sigma=-1$ in this example), each color corresponding to a different $\mu=1,\ldots,10$, obtained by numerically integrating Eq. (28) with an explicit Runge-Kutta method of order 5 Dormand and Prince (1980). In the RS phase, the system quickly converges to the steady state, where, for a given $\mu$, either $g_{\mu}>0$ and $f_{\mu}=0$ or $g_{\mu}=0$ and $f_{\mu}>0$. In this experiment, the $f_{\mu}$ have been initialized to $1$ while the $x_{i}$ have been initialized as standard Gaussian variables. We used $N=100$.

This set of dynamical equations can be analyzed exactly in the $N\rightarrow\infty$ limit through dynamical mean-field theory. We first consider the case in which the $x_{i}$ are initialized at random, while we fix $f_{\mu}(0)=1$, which is a convenient choice. Then, to analyze the dynamics in the $N\rightarrow\infty$ limit, we consider the Martin-Siggia-Rose-Janssen-De Dominicis dynamical path integral Castellani and Cavagna (2005) defined as

Z_{\rm dyn}=\overline{\int{\cal D}(\underline{f},\underline{\hat{f}})\,{\cal D}(\underline{x},\underline{\hat{x}})\,{\cal D}(\underline{r},\underline{\hat{r}})\,\exp[S_{\rm dyn}]} (29)

where the dynamical action is given by

\begin{split}S_{\rm dyn}&=i\sum_{\mu=1}^{M}\int\mathrm{d}t\left\{\hat{f}_{\mu}(t)\left[\dot{f}_{\mu}(t)+f_{\mu}(t)\left(r_{\mu}(t)-\sigma\right)\right]+\hat{r}_{\mu}(t)r_{\mu}(t)\right\}\\ &+i\sum_{i=1}^{N}\int\mathrm{d}t\,\hat{x}_{i}(t)\left[\dot{x}_{i}(t)+\zeta(t)x_{i}(t)+(x_{i}(t)-1)-\frac{1}{\sqrt{N}}\sum_{\mu=1}^{M}\xi^{\mu}_{i}f_{\mu}(t)\right]\\ &-\frac{i}{\sqrt{N}}\sum_{i=1}^{N}\int\mathrm{d}t\sum_{\mu=1}^{M}\hat{r}_{\mu}(t)\xi^{\mu}_{i}x_{i}(t)\end{split} (30)

and the overline stands for the average over the random patterns. Taking the average over these vectors gives

\begin{split}&\overline{\exp\left[-\frac{i}{\sqrt{N}}\sum_{\mu=1}^{M}\sum_{i=1}^{N}\xi^{\mu}_{i}\int\mathrm{d}t\left(\hat{x}_{i}(t)f_{\mu}(t)+\hat{r}_{\mu}(t)x_{i}(t)\right)\right]}\\ &=\exp\left[-\frac{1}{2}\sum_{\mu=1}^{M}\int\mathrm{d}t\,\mathrm{d}t^{\prime}\left[f_{\mu}(t)f_{\mu}(t^{\prime})D(t,t^{\prime})+\hat{r}_{\mu}(t)\hat{r}_{\mu}(t^{\prime})C(t,t^{\prime})+2i\hat{r}_{\mu}(t)f_{\mu}(t^{\prime})R(t,t^{\prime})\right]\right]\end{split} (31)

where we have introduced the dynamical order parameters

\begin{split}C(t,t^{\prime})&=\frac{1}{N}\underline{x}(t)\cdot\underline{x}(t^{\prime})\\ D(t,t^{\prime})&=\frac{1}{N}\hat{\underline{x}}(t)\cdot\hat{\underline{x}}(t^{\prime})\\ R(t,t^{\prime})&=-\frac{1}{N}\underline{x}(t)\cdot i\hat{\underline{x}}(t^{\prime})\end{split} (32)

As usual, at the saddle point level $D(t,t^{\prime})=0$ by causality. The effective process for a single gap then obeys the equation

r(t)=\int_{0}^{t}\mathrm{d}t^{\prime}\,R(t,t^{\prime})f(t^{\prime})+\eta(t) (33)

where the noise $\eta(t)$ is Gaussian with statistics

\overline{\eta(t)}=0\,,\ \ \ \ \overline{\eta(t)\eta(t^{\prime})}=C(t,t^{\prime})\>. (34)

Finally we have

f(t)=\exp\left[-\int_{0}^{t}\mathrm{d}s\,(r(s)-\sigma)\right]\>. (35)

Note that Eq. (33) must be understood as a distributional equation. Furthermore, we have used that at the initial time $t=0$ the forces are initialized to $f_{\mu}(0)=1$. This initialization does not affect the endpoint of the dynamics and provides a way to enforce that all forces are either zero or positive. The single-site effective process is instead

\dot{x}(t)=-\zeta(t)x(t)-(x(t)-1)+\Xi(t)-\int_{0}^{t}\mathrm{d}s\,K(t,s)\,x(s) (36)

where the statistics of the Gaussian noise $\Xi(t)$ is given by

\overline{\Xi(t)}=0\,,\ \ \ \ \overline{\Xi(t)\Xi(t^{\prime})}=\alpha\langle f(t)f(t^{\prime})\rangle=M(t,t^{\prime}) (37)

and the kernel $K(t,s)$ is defined as

K(t,s)=\alpha\left.\frac{\delta\langle f(t)\rangle_{\eta}}{\delta P(s)}\right|_{P=0} (38)

where $P(t)$ is an additive perturbation on the right-hand side of Eq. (33). Therefore we obtain the following equations for the correlation, the response, and the magnetization $m(t)=\sum_{i}x_{i}(t)/N$:

\begin{split}\partial_{t}m(t)&=-\zeta(t)m(t)-(m(t)-1)-\int_{0}^{t}\mathrm{d}s\,K(t,s)m(s)\\ \partial_{t}C(t,t^{\prime})&=-\zeta(t)C(t,t^{\prime})-(C(t,t^{\prime})-m(t^{\prime}))-\int_{0}^{t}\mathrm{d}s\,K(t,s)C(t^{\prime},s)+\int_{0}^{t^{\prime}}\mathrm{d}s\,M(t,s)R(t^{\prime},s)\\ \partial_{t}R(t,t^{\prime})&=-\zeta(t)R(t,t^{\prime})-R(t,t^{\prime})+\delta(t-t^{\prime})-\int_{t^{\prime}}^{t}\mathrm{d}s\,K(t,s)R(s,t^{\prime})\>.\end{split} (39)

Note that the Lagrange multiplier $\zeta(t)$ can be obtained directly from the equation for $C(t,t^{\prime})$.
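Concretely (this is our reading of that remark): since the spherical constraint fixes $C(t,t)=1$, one has $\partial_{t}C(t,t^{\prime})|_{t^{\prime}=t}=0$, and setting $t^{\prime}=t$ in the second of Eqs. (39) gives

\zeta(t)=m(t)-1-\int_{0}^{t}\mathrm{d}s\,K(t,s)C(t,s)+\int_{0}^{t}\mathrm{d}s\,M(t,s)R(t,s)

so that $\zeta(t)$ can be evaluated on the fly while integrating Eqs. (39) forward in time.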

Figure 3: Time evolution of $g_{\mu}$ (panel (a)) and $f_{\mu}$ (panel (b)) in the RSB phase ($\alpha=8$, $\sigma=-1$ in this example), each color corresponding to a different $\mu=1,\ldots,10$, obtained by numerically integrating Eq. (28) with an explicit Runge-Kutta method of order 5 Dormand and Prince (1980). Since the RSB phase is marginally stable, the time evolution is chaotic and does not converge to a stationary state. In this experiment, the $f_{\mu}$ have been initialized to $1$ while the $x_{i}$ have been initialized as standard Gaussian variables. We used $N=100$.

From these dynamical equations it follows that

m(t)=\int_{0}^{t}\mathrm{d}s\,R(t,s)\>. (40)

This relation connects the response function $R(t,s)$ to the magnetization $m(t)$.

III.1 The replica symmetric solution

Let us consider what happens in the region where replica symmetry is unbroken. If we define

\chi=\lim_{t\rightarrow\infty}\int_{0}^{t}\mathrm{d}s\,R(t,s) (41)

we immediately have

m_{\infty}\equiv\lim_{t\rightarrow\infty}m(t)=\chi\>. (42)

Considering Eq. (33) and taking the $t\rightarrow\infty$ limit, we get

r_{\infty}=\chi f_{\infty}+\eta (43)

where $\eta$ is a standard Gaussian random variable. This equation tells us that if $f_{\infty}=0$ then $r_{\infty}$ is a Gaussian random variable constrained so that $r_{\infty}-\sigma$ is positive. Therefore the gap distribution is

P(h)=\gamma_{1}(h+\sigma)\>. (44)

On the other hand, if $r_{\infty}=\sigma$ then $f_{\infty}>0$ and its distribution coincides with that of the random variable $(\sigma-\eta)/\chi$ under the constraint $\sigma-\eta>0$. Finally, we need to establish an equation for $\chi$ and show that it coincides with the one coming from the replica approach. We consider the long time limit of the equation for $m(t)$, which gives

0=-\zeta_{\infty}\chi-(\chi-1)-K_{\infty}\chi (45)

where

K_{\infty}=\lim_{t\rightarrow\infty}\int_{0}^{\infty}\mathrm{d}s\,K(t,s)\>. (46)

On the other hand, a relation involving $M_{\infty}$ is obtained by taking the long time limit of the equation for $C$. This gives

0=-\zeta_{\infty}-(1-\chi)-K_{\infty}+M_{\infty}\chi (47)

which can be combined with Eq. (45) to get

M_{\infty}\chi^{2}=1-\chi^{2}\>. (48)

From the definition of $M_{\infty}$ we finally have

M_{\infty}=\alpha\langle f_{\infty}^{2}\rangle=\alpha\int\frac{\mathrm{d}\eta}{\sqrt{2\pi}}\,e^{-\eta^{2}/2}\,\frac{(\eta+\sigma)^{2}}{\chi^{2}}\,\theta(\eta+\sigma) (49)

and therefore, using Eqs. (48) and (49), we recover Eq. (23), the zero-temperature limit of the replica symmetric equations (16). This concludes the derivation of the replica symmetric equation from the KKT dynamics.
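As a cross-check of this equivalence, the fixed-point statistics can be sampled directly: by Eq. (43) each constraint carries a standard Gaussian $\eta$; if $\eta>\sigma$ the constraint is strictly satisfied, with gap $h=\eta-\sigma$ and zero force, while if $\eta<\sigma$ the gap closes and the force is $f_{\infty}=(\sigma-\eta)/\chi$. A minimal sketch, with $\chi$ obtained from Eqs. (48)-(49) (the parameter values are the RS example of Fig. 2):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

alpha, sigma = 5.0, -1.0
# Eqs. (48)-(49): 1 - chi^2 = alpha * int deta gamma_1(eta) (eta+sigma)^2 theta(eta+sigma)
Jint = quad(lambda e: norm.pdf(e) * (e + sigma) ** 2, -sigma, np.inf)[0]
chi = np.sqrt(1.0 - alpha * Jint)
print("chi =", chi)   # same value as from the replica result, Eq. (23)

rng = np.random.default_rng(3)
eta = rng.standard_normal(1_000_000)
h = np.where(eta > sigma, eta - sigma, 0.0)           # gaps: f = 0, h = eta - sigma > 0
f = np.where(eta <= sigma, (sigma - eta) / chi, 0.0)  # forces on marginal constraints
print("fraction marginal =", np.mean(h == 0.0))       # = Phi(sigma)
print("c =", alpha * np.mean(h == 0.0))               # < 1 in the RS (hypostatic) phase
```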

We implemented numerically the algorithm of Eq. (28) and found that in the RS phase it converges very quickly to the solution of the optimization problem. In the RSB phase, instead, the algorithm does not seem to converge, and we believe that in this region it surfs chaotically on the isostatic landscape.

IV Conclusions

We have considered the generic setting of optimization problems under non-convex excluded volume constraints. We have analyzed a simple example of this kind in which the cost function is separable and convex and in which the non-convex excluded volume constraints are modeled by a negative perceptron.

We have found that, when the number of constraints is low enough, the cost function has a simple ground state which is captured by the replica symmetric solution of the model. At high density of constraints, instead, the optimization landscape undergoes an RSB transition where minima become marginally stable. Remarkably, we find that the RSB transition occurs before the point at which the accessible phase space defined by the constraints splits into glassy regions. We have also shown how to recover these results from the dynamical mean-field theory of the KKT algorithm. We leave the analysis of more complex cost functions for future work.

Acknowledgements.
This work was supported by "Investissements d'Avenir" LabEx PALM (ANR-10-LABX-0039-PALM). A.S. thanks the LPTMS, where part of this work was done, and acknowledges support from the Simons Foundation (Grants No. 454941, Silvio Franz, and No. 454953, Matthieu Wyart).

References

  • Neyshabur (2020) B. Neyshabur, in Advances in Neural Information Processing Systems, Vol. 33, edited by H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan,  and H. Lin (Curran Associates, Inc., 2020) pp. 8078–8088.
  • Franz et al. (2017) S. Franz, G. Parisi, M. Sevelev, P. Urbani,  and F. Zamponi, SciPost Physics 2, 019 (2017).
  • Franz and Parisi (2016) S. Franz and G. Parisi, Journal of Physics A: Mathematical and Theoretical 49, 145001 (2016).
  • Franz et al. (2015) S. Franz, G. Parisi, P. Urbani,  and F. Zamponi, Proceedings of the National Academy of Sciences 112, 14539 (2015).
  • Yoshino (2018) H. Yoshino, SciPost Physics 4, 040 (2018).
  • Brito et al. (2018) C. Brito, H. Ikeda, P. Urbani, M. Wyart,  and F. Zamponi, Proceedings of the National Academy of Sciences 115, 11736 (2018).
  • Franz et al. (2019a) S. Franz, S. Hwang,  and P. Urbani, Physical Review Letters 123, 160602 (2019a).
  • Ikeda et al. (2019) H. Ikeda, P. Urbani,  and F. Zamponi, Journal of Physics A: Mathematical and Theoretical 52, 344001 (2019).
  • Ikeda (2020) H. Ikeda, Physical Review Research 2, 033220 (2020).
  • Geiger et al. (2020) M. Geiger, L. Petrini,  and M. Wyart, arXiv preprint arXiv:2012.15110  (2020).
  • Castellani and Cavagna (2005) T. Castellani and A. Cavagna, Journal of Statistical Mechanics: Theory and Experiment 2005, P05012 (2005).
  • Boyd and Vandenberghe (2004) S. Boyd and L. Vandenberghe, Convex optimization (Cambridge university press, 2004).
  • Gardner (1988) E. Gardner, Journal of Physics A: Mathematical and General 21, 257 (1988).
  • Gardner and Derrida (1988) E. Gardner and B. Derrida, Journal of Physics A: Mathematical and General 21, 271 (1988).
  • Mézard et al. (1987) M. Mézard, G. Parisi,  and M. A. Virasoro, Spin glass theory and beyond (World Scientific, Singapore, 1987).
  • De Almeida and Thouless (1978) J. De Almeida and D. J. Thouless, Journal of Physics A: Mathematical and General 11, 983 (1978).
  • Franz et al. (2019b) S. Franz, A. Sclocchi,  and P. Urbani, Physical Review Letters 123, 115702 (2019b).
  • Franz et al. (2020) S. Franz, A. Sclocchi,  and P. Urbani, SciPost Physics 9, 012 (2020).
  • Sclocchi and Urbani (2021) A. Sclocchi and P. Urbani, SciPost Physics 10, 013 (2021).
  • Dormand and Prince (1980) J. R. Dormand and P. J. Prince, Journal of computational and applied mathematics 6, 19 (1980).