Population Level Activity in Large Random Neural Networks
Abstract
We determine limiting equations for large asymmetric ‘spin glass’ networks. The initial conditions are not assumed to be independent of the disordered connectivity: one of the main motivations for this is that it allows one to understand how the structure of the limiting equations depends on the energy landscape of the random connectivity. The method is to determine the convergence of the double empirical measure (this yields population density equations for the joint distribution of the spins and fields). An additional advantage of utilizing the double empirical measure is that it yields a means of obtaining accurate finite-dimensional approximations to the dynamics.
1 Introduction
This paper concerns the high-dimensional dynamics of asymmetric random neural networks of the form, for $1 \le j \le N$,

\[
dx^j_t = \Big( -\lambda x^j_t + \sum_{k=1}^{N} J^{jk}\,\phi(x^k_t) \Big)\,dt + \sigma_t\, dW^j_t, \tag{1}
\]

where $\phi$ is a Lipschitz function, $\lambda$ is a constant, the $\{J^{jk}\}$ are sampled independently from a centered normal distribution of variance $N^{-1}$, and the $\{W^j\}$ are independent Brownian Motions. We study the convergence of the double empirical measure

\[
\hat{\mu}^N := \frac{1}{N}\sum_{j=1}^{N} \delta_{(x^j, G^j)}, \tag{2}
\]

where $G^j_t = \sum_{k=1}^{N} J^{jk}\,\phi(x^k_t)$ is the field. The dynamics of high-dimensional recurrent neural networks have many applications. They have been heavily applied to neuroscientific problems: many scholars think that they can be used to explain how the brain balances excitation and inhibition [7, 32, 19, 28, 12]. They have been used to study spatially-extended patterns in the brain [33, 34]. Most recently, it has been recognized that they are of fundamental importance to data science [6, 2, 22, 35]. For more applications, see the monograph of Helias and Dahmen [26] and the recent survey in [13].
There exist limiting ‘correlation equations’ [14, 26] that describe the effective dynamics of high-dimensional random neural networks. These constitute delayed integro-differential equations that have proven very difficult to analyze, particularly over short timescales. A related problem is that the correlation equations have only been determined from initial conditions that are independent of the connectivity. This means that they may not be accurate over longer timescales that diverge with $N$. For example, many scholars are interested in understanding the nature of the limiting dynamics after the system attains a particular state (for instance, if it enters an ‘energy well’ of specified characteristics, does it escape?). To address this question, one needs to start the dynamics at a particular point in the energy landscape of the connectivity (and therefore the initial condition is disorder-dependent).
The literature concerning large $N$ limiting equations for random neural networks has a complex history. Sompolinsky, Crisanti and Sommers anticipated that Path Integral methods would yield limiting dynamical equations [36]; the derivation was published in a later work [14]. We refer the reader to the excellent discussion in the monograph of Helias and Dahmen [26].
Path Integral methods (as practiced by physicists) yield population density equations by determining where the probability measure for the $N$-dimensional system concentrates. In the probability literature, one of the most powerful means of addressing this question is the theory of Large Deviations [18]. Large Deviations theory was used to determine spin glass dynamics in the pioneering papers of Ben Arous and Guionnet [3, 5, 24]; they obtained the first rigorous results concerning the large $N$ limit of random neural networks. After this work, Grunwald employed Large Deviations theory to obtain correlation / response equations for random neural networks whose spins flip randomly between discrete states [23]. Moynot and Samuelides studied the non-Gaussian case [31]. Faugeras and MacLaurin extended the work of Ben Arous and Guionnet to include correlations in the connectivity [20]. Touboul and Cabana determined limiting equations for spatially-extended systems [10, 11]. Faugeras, Soret and Tanré [21] determined novel integral equations to describe the state of these systems.
On a related note, correlation-response equations for symmetric random neural networks were first derived by Crisanti, Horner and Sommers [15] and by Cugliandolo and Kurchan [16]. Ben Arous, Dembo and Guionnet [4] proved the accuracy of these correlation / response equations for symmetric random neural networks, employing concentration inequalities.
Broadly speaking, this paper follows the approach of Ben Arous and Guionnet [3]. We employ the theory of Large Deviations to determine the large $N$ limit of the empirical measure. However, the main novelties of our approach are:
• We employ a general class of connectivity-dependent initial conditions. This unsurprisingly yields a different limiting dynamics as $N \to \infty$. Connectivity-dependent initial conditions were employed in the papers of Ben Arous and Guionnet [5] (who studied dynamics started at the equilibrium distribution, in the high temperature regime) and Dembo and Subag [17].
• We study the double empirical measure, which includes information about both the spins and the fields. This has several advantages: it facilitates finite-dimensional approximations to the dynamics that are very accurate, and it accommodates a broader class of disorder-dependent initial conditions. For spin-glass dynamics, the Large Deviations of the double empirical measure were determined by Grunwald [23] for jump-Markov systems.
• We include Replicas (i.e. copies of the system with the same connectivity, but independent Brownian Motions). This broadens the class of admissible disorder-dependent initial conditions.
• The function $\phi$ can be unbounded, and the diffusion coefficient $\sigma_t$ can vary in time. The time-varying nature of $\sigma_t$ is essential for studying how periodic environmental noise in the brain shapes the dynamics of random neural networks.
Notation: Let be the set of neuron indices. For any Polish space , let denote all probability measures over . The space is always endowed with the supremum topology (unless indicated otherwise), i.e.
For , is the Euclidean norm. For any probability measures and over a Polish Space, let denote the relative entropy of measure with respect to . For any two measures on the same metric space with metric , indicates the Wasserstein distance, i.e.
(3) |
where the infimum is taken over all on the product space such that the marginal of the first variable is equal to and the marginal of the second variable is equal to . In the particular case that , the distance is (unless otherwise indicated) .
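For reference, a minimal sketch of the two notions just introduced, with the Wasserstein exponent taken to be one and the metric $d$ as above (the paper's precise choices are fixed in (3)):

\[
\mathcal{R}\left(\mu \,\middle\|\, \nu\right) = \int \log \frac{d\mu}{d\nu}\, d\mu \quad (\mu \ll \nu), \qquad d_W(\mu,\nu) = \inf_{\gamma} \int d(x,y)\, d\gamma(x,y).
\]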
For any , we write to be the marginals over (respectively) the first variables and last variables.
2 Outline of Model and Main Results
We are going to rigorously determine the limiting dynamics of multiple replicas (with identical connections , but with independent initial conditions and independent Brownian Motions). We let the superscript denote replica , and consider the system
(4) | ||||
(5) |
We assume that : this means in particular that there is a constant such that . The noise intensity is taken to be continuous and non-random, and such that for constants and ,
(6) |
Our major motivation for time-varying diffusivity lies in neuroscience: often synaptic noise exhibits particular rhythms. It has been of major interest to understand how these rhythms shape pattern formation [8].
The connectivities are taken to be independent centered Gaussian variables, with variance
Let be their joint probability law. There are two cases for the initial conditions that are considered in this paper.
2.1 Assumptions on the Initial Conditions
2.1.1 Case 1: Connectivity-Dependent Initial Conditions
The probability law of the initial conditions is assumed to be such that for any measurable set ,
(7) |
where the probability density is defined as follows.
Let be the uniform Lebesgue Measure over the set . Conditionally on a realization of the random connections, let the probability density be such that for (with and decreasing to as ), there exists some such that
(8) | ||||
(9) | ||||
(10) |
Roughly speaking, we need to assume that as , the law of behaves like its annealed average. It is assumed that (i) has a finite second moment in each of its variables, and (ii) we have the bound
(11) |
It is also assumed that for any ,
(12) |
Define to be the covariance matrix with entries, ,
(13) | ||||
(14) |
It is also assumed that both and are invertible.
2.1.2 Case 2: Connectivity-Independent Initial Conditions
One can also assume that the initial conditions are (i) independent of the connectivity, and (ii) sampled independently from a -valued probability distribution of bounded variance. This distribution is written as .
2.2 Main Result
Our main result is that the empirical measure converges to a fixed point of a mapping . Here is defined in (54), consisting of (i) a broad class of measures with nice regularity properties, and (ii) such that the empirical measure inhabits with unit probability.
For any , in the case of connectivity-dependent initial conditions is specified as follows. It is defined to be the law of Gaussian random variables such that (i) are distributed according to , and (ii) conditionally on the initial conditions, is a Gaussian system such that and the conditional variance is
(15) |
Here
(16) | ||||
(17) | ||||
(18) |
Letting be Brownian Motions that are independent of , we define to be the strong solution to the stochastic differential equation
(19) |
In the case of connectivity-independent initial conditions, is defined as follows. One first defines to be a centered Gaussian system such that
is independent of and distributed according to . For Brownian Motions , that are independent of , is the strong solution of (19).
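To orient the reader, in the standard dynamical mean-field picture for networks of this type, the fixed point of the mapping is the law of a ‘single-site’ process. The following is a sketch under the assumption (consistent with this literature, though the precise covariance structure is the one specified in (41)) that the field covariance at the fixed point $\mu$ is $K_\mu(s,t) = \mathbb{E}^{\mu}[\phi(x_s)\phi(x_t)]$:

\[
dx_t = \big( -\lambda x_t + G_t \big)\,dt + \sigma_t\,dW_t, \qquad \operatorname{Cov}(G_s, G_t) = \mathbb{E}^{\mu}\big[ \phi(x_s)\,\phi(x_t) \big],
\]

where $G$ is a centered Gaussian process independent of the Brownian Motion $W$.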
Theorem 1.
The mapping is well-defined for all . Furthermore there exists a unique probability measure such that with unit probability,
(20) |
is the unique measure such that . Furthermore,
(21) |
where and is any measure in .
Remark. This theorem is useful because it also yields a means to efficiently determine the large $N$ limiting equations, through repeated application of the mapping . Because the limiting system is Gaussian, one only needs to solve for its covariance matrix. See the discussion in Helias and Dahmen [26] for an alternative formulation of the limiting covariance function in terms of a PDE.
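As an illustration of this remark, below is a minimal numerical sketch of the iteration in the connectivity-independent case (Case 2). It assumes the covariance structure $K_\mu(s,t) = \mathbb{E}^{\mu}[\phi(x_s)\phi(x_t)]$ for the Gaussian field under the mapping, and all concrete choices ($\phi = \tanh$, the constants lam and sigma, the initial law, the time horizon, and the Monte Carlo sizes) are illustrative rather than taken from the theorem.

```python
import numpy as np

# Minimal sketch: iterate the Gaussian mapping of Theorem 1 (Case 2),
# ASSUMING the field covariance under the mapping is
#   K_mu(s, t) = E^mu[ phi(x_s) phi(x_t) ].
# All concrete parameter values below are illustrative, not from the paper.

def iterate_map(phi=np.tanh, lam=1.0, sigma=0.5, T=5.0,
                n_steps=100, n_paths=20000, n_iters=30, seed=0):
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    n = n_steps + 1
    K = np.eye(n)                    # initial guess for Cov(G_s, G_t)
    for _ in range(n_iters):
        # Sample centered Gaussian field paths G with covariance K.
        L = np.linalg.cholesky(K + 1e-8 * np.eye(n))
        G = rng.standard_normal((n_paths, n)) @ L.T
        # Euler-Maruyama for dx = (-lam*x + G) dt + sigma dW, with x_0 ~ N(0,1)
        # (the initial law here is also an illustrative assumption).
        x = np.empty_like(G)
        x[:, 0] = rng.standard_normal(n_paths)
        dW = rng.standard_normal((n_paths, n_steps)) * np.sqrt(dt)
        for k in range(n_steps):
            x[:, k + 1] = x[:, k] + (-lam * x[:, k] + G[:, k]) * dt + sigma * dW[:, k]
        # One application of the mapping: K(s,t) <- E[phi(x_s) phi(x_t)].
        fx = phi(x)
        K = fx.T @ fx / n_paths
    return K

K_limit = iterate_map()   # approximate covariance of the limiting field
```

Each pass of the loop applies the mapping once: sample the Gaussian field from the current covariance, solve the resulting linear SDE by Euler-Maruyama, and re-estimate the field covariance from the simulated paths.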
2.3 An Example System that Satisfies the Conditions of Section 2.1.1
We now outline a general system that satisfies the conditions of Section 2.1.1. Suppose that is an odd function. Define to be
(22) |
Here is centered and Gaussian, the law of random variables that are such that , and . Let be bounded.
Lemma 2.
Suppose that there is a unique such that
(23) |
Suppose also that
(24) |
Then the conditions of Section 2.1.1 are satisfied, substituting , and .
Proof.
We show that
(25) |
It is immediate from (25) that (12) must be satisfied. Let be iid random variables. Form the empirical measure .
Define
We will first demonstrate that
(26) |
It is straightforward to prove that for any there exists a compact subset such that
(27) |
A simple Large Deviations estimate yields that, for any , and ,
(28) |
(In fact this could also be proved using Corollary 11). Our choice that goes to zero sufficiently slowly (i.e. ) ensures that the rate function in (28) dominates the asymptotic estimate of the probability as . Thus discretizing into balls, and then taking , one obtains (26).
3 Proof Outline
The main goal of this paper is to prove Theorem 1 by employing the theory of Large Deviations [18]. The method, similarly to the original work by Ben Arous and Guionnet [3], is to (i) prove a Large Deviations Principle for the uncoupled system, and then (ii) perform an exponential change-of-measure using Girsanov’s Theorem to obtain the Large Deviations Principle for the coupled system, before (iii) proving that the rate function has a unique zero.
The three main differences between this paper and the early papers of Ben Arous and Guionnet are that (i) we study the convergence of the double empirical measure (2), whereas Ben Arous and Guionnet study the convergence of the annealed empirical measure in their earlier papers [3] (quenched asymptotics are determined in the later works [25, 4]); (ii) we employ disorder-dependent initial conditions; and (iii) we employ replicas.
Our main focus is on proving Case 1 (i.e. the connectivity-dependent initial conditions). The proofs are broadly similar, however Case 1 is more difficult because it requires a uniform Large Deviations Principle for the conditioned probability laws.
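Schematically, the change-of-measure step works as follows. If the uncoupled system satisfies a Large Deviations Principle with rate function $I$, and Girsanov’s Theorem expresses the density of the coupled law in the form $dQ^N / dP^N = \exp( N F(\hat\mu^N) )$ for a sufficiently regular functional $F$ of the empirical measure, then the standard tilting argument yields the coupled rate function (a sketch only; the rigorous version requires the approximation arguments of Section 4.3):

\[
J(\mu) = I(\mu) - F(\mu).
\]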
3.1 Large Deviations of the Uncoupled System
We start by noting a Large Deviation Principle for the uncoupled system. Because we are employing general disorder-dependent initial conditions, we must determine a Large Deviations Principle for the conditioned probability law. To this end, we must first define the set of all ‘valid initial conditions’ (roughly speaking, the set of all initial points such that the empirical measure at time is close to its limit). More precisely, we define to be such that
(30) |
where and . Define the uncoupled dynamics,
(31) |
and let be the law of , conditioned on the values at time . Write
(32) |
Define to be the law of the connections , conditioned on the event
(33) |
We note that is Gaussian, but no longer of zero mean (in general). The mean of is a function of the empirical measure and (explicit formulae are outlined further below). Let be the joint law of the uncoupled system, i.e.
(34) |
The first main result is a uniform Large Deviations Principle for the conditioned system.
Theorem 3.
Let , such that is open and closed. Then
(35) | ||||
(36) |
Here the rate function is such that if , and otherwise
(37) |
where, is defined in (117). Furthermore is lower-semi-continuous and has compact level sets.
We will prove Theorem 3 by locally freezing the dependence of the fields on the empirical measure. In order that we may do this, we must first define a regular subset (for a positive integer ) which is such that (i) the empirical measure inhabits with high probability, and (ii) there exist uniform bounds on the fluctuations in time. To this end, writing to be the compact set specified in Lemma 24, define the set
(38) |
where and . Write
(39) |
Lemma 4.
For any , there exists such that
(40) |
The above lemma is proved in the Appendix. Next, for any , we define a centered Gaussian law as follows. We stipulate that is the law of Gaussian random variables with covariance structure
(41) |
This definition will be useful because for any , the law of under is . In the following Lemma we collect some regularity estimates for the Gaussian Law .
Lemma 5.
(i) is a well-defined Gaussian probability law. (ii) Furthermore, the map is ‘uniformly continuous’ for all measures in , in the following sense. For any , and any , there exists such that for all ,
(42) |
In order that we may make sense of the disorder-dependent initial condition, we also require an understanding of the distribution of , conditioned on the value of . To this end, for any and any , let be the probability law conditioned on the event . Standard Gaussian identities [29] imply that is also Gaussian, with the following mean and variance,
(43) | ||||
(44) |
where
(45) | ||||
(46) | ||||
(47) |
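The standard identity behind (43) and (44) is the Gaussian conditioning formula: if $(X, Y)$ is jointly Gaussian with means $(m_X, m_Y)$ and covariance blocks $\Sigma_{XX}, \Sigma_{XY}, \Sigma_{YY}$ (with $\Sigma_{YY}$ invertible), then

\[
X \mid Y = y \;\sim\; \mathcal{N}\Big( m_X + \Sigma_{XY}\Sigma_{YY}^{-1}(y - m_Y),\; \Sigma_{XX} - \Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX} \Big).
\]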
Corollary 6.
(i) For any , and any , there exists such that for all , and all ,
(48) |
(ii) For any , there exists a compact set such that for all ,
(49) |
and for all such that ,
(50) |
(iii) For and any , the law of under is identical to .
3.2 Exponential Tightness
To prove a Large Deviation Principle, one requires that the empirical measure inhabits a compact set with arbitrarily high probability. For any , write to be the law of the random variables , and write to be conditioned on the event in (33).
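Recall the standard formulation of exponential tightness that this section establishes: for every $L > 0$ there exists a compact set $K_L$ of measures such that

\[
\limsup_{N \to \infty} \frac{1}{N} \log \mathbb{P}\big( \hat\mu^N \notin K_L \big) \le -L.
\]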
The following lemmas are needed for this proof.
Lemma 7.
For any , there exists a compact set such that the following holds. For any , and any such that ,
(51) |
Also, as long as ,
(52) |
For , write to be the marginal of over its first variables, and to be the marginal of over its last variables. Next, define the set
(53) |
and let
(54) |
It follows immediately from the above definition that for any . We can now prove an ‘exponential tightness’ result.
Lemma 8.
For any , is compact. For any , there exists such that
(55) |
3.3 The Coupled System (with connectivity-dependent initial conditions)
Define to be such that if , or if the marginal of at time is not . Otherwise, for any and , writing for conditioned on the values of its variables at time , define
(56) |
Here is defined to be the probability law of , where is distributed according to , and for Brownian Motions that are independent of
(57) |
Define to be the following map. For some , write to be the law of the following random variables . First, it is stipulated that have probability law . Second, conditionally on , the distribution of is given by .
Lemma 9.
The probability law is well-defined. Furthermore, there exists a unique zero of the rate function . is the unique measure in such that .
Proof.
We have already proved in Lemma 5 that is well-defined. It is straightforward to check that for any path , there exists a unique strong solution to the stochastic differential equation (57). Thus the probability law is well-defined.
It is well-known that the Relative Entropy is zero if and only if its two arguments are identical [9]. Thus, from the form of in (56), any zero must be a fixed point of . Furthermore, there must exist at least one zero of the rate function (if not, the total probability mass could not be one as $N \to \infty$). The uniqueness of the zero is proved in Lemma 22. ∎
Theorem 10.
For any ,
(58) |
Thus with unit probability,
(59) |
Furthermore,
(60) |
where and is any measure in .
3.4 Connectivity-Independent Initial Conditions
The above reasoning can be adapted to prove a Large Deviation Principle for the unconditioned system. This is needed for proving the main theorem for Case 2 (connectivity-independent initial conditions). Write to be the law of the random variables (with no conditioning), and for any , define to be . In the following corollary to Theorem 3, we prove the Large Deviation Principle for the unconditioned and uncoupled system.
Corollary 11.
Let , such that is open and closed. Then
(61) | ||||
(62) |
Here the rate function is lower semi-continuous and has compact level sets.
We now specify the operator . Fix and define to be the law of processes . One first defines to be a centered Gaussian system such that
is independent of and distributed according to . Letting be Brownian Motions that are independent of , we define to be the strong solution to the stochastic differential equation
(63) |
Theorem 12.
Assume the connectivity-independent initial conditions (Case 2). For any ,
(64) |
Thus with unit probability,
(65) |
Furthermore,
(66) |
where and is any measure in .
4 Proofs
We have divided the proofs into three main sections. In Section 4.1, we prove general regularity properties of the stochastic processes. In Section 4.2, we prove the LDP for the uncoupled system. In Section 4.3, we determine the limiting dynamics of the coupled system.
4.1 Regularity Estimates and Compactness
We first prove Lemma 5.
Proof.
We first check that the covariance function is positive definite (when restricted to a finite set of times). Let be a finite set of times. Then evidently for any constants , it must be that
(67) |
This means that there exists a finite set of centered Gaussian variables such that (41) holds. It then follows from the Kolmogorov Extension Theorem that is well-defined on any countably dense subset of times of . It remains for us to demonstrate continuity, i.e. that there exists a Gaussian probability law such that (41) holds for all time. We do this using standard theory for the continuity of Gaussian Processes (following Chapter 2 of [1]).
First, we notice that
(68) |
Now define the canonical metric,
(69) | ||||
(70) |
thanks to properties of the set , for all such that is smaller than some constant depending on . It follows from Theorem 1.4.1 of [1] that the Gaussian Process is almost-surely continuous.
Write to be the -ball about , and let denote the smallest number of such balls that cover . We see that there exists a constant such that
(71) |
Writing , it follows from Theorem 1.3.5 in [1] that there exist Gaussian Processes such that is almost-surely continuous, and there exists a universal constant and a random such that for all ,
(72) | ||||
(73) |
and we note that the above goes to 0 as . This also implies (42). ∎
We next prove Corollary 6.
Proof.
We can now prove Lemma 7.
Proof.
We prove (52) only. The other proof is very similar.
It follows from Lemma 5 that for any , there exists a compact set such that for any , and all such that ,
(74) |
It has already been noted above that for any , are independent, and the probability law of is . Thus as long as , the estimate in (74) holds for any such that .
Define
(75) |
Our construction of implies that
(76) |
Write . Next, for some , and , by Chernoff’s Inequality, for any and any ,
We thus find (by taking small enough , and ) that for any integer , there must exist a compact set such that for all , all and all such that
(77) |
This motivates us to define the compact set to consist of all measures such that for all ,
(78) |
Thus using a union-of-events bound,
(79) | ||||
(80) | ||||
(81) |
for all , as long as is large enough.
∎
The following bound on the operator norm of the connectivity matrix is well-known (and the proof is omitted).
Lemma 13.
For any , there exists such that
(82) |
where has entry
Lemma 14.
For any , there exists such that for all and all ,
(83) |
where
Proof.
Write
If the event holds, then thanks to Ito’s Lemma it must be that
(84) | ||||
(85) |
since . Write
(86) |
and define the stopping time, for a constant ,
(87) |
Gronwall’s Inequality implies that for all ,
where . The quadratic variation of is
(88) |
For all ,
(89) |
and notice that is independent of the Brownian Motions. Now define the stochastic process to be such that
(90) | ||||
(91) |
Thanks to the time-rescaled representation of a stochastic integral, is a Brownian Motion [27]. Writing , it follows that
and we have written
(92) | ||||
(93) |
and we recall that . Employing a union-of-events bound,
(94) |
Now since is centered and Gaussian, with variance of ,
(95) | ||||
(96) |
We fix , and take to be arbitrarily large. Then
We thus find that, for large enough ,
(97) |
We have already demonstrated in the course of the proof that if the event holds, and , then there exists a constant such that . We have thus established the Lemma. ∎
The following -Wasserstein distance provides a very useful way of controlling the dependence of the fields on the measure . Define to be such that for any ,
(98) |
where the infimum is over all , such that the law of the first processes is given by , and the law of the last processes is given by . Let .
Lemma 15.
For any , metrizes weak convergence in . Furthermore,
(99) |
Proof.
Since is compact, Prokhorov’s Theorem implies that for any , there exists a compact set such that for all ,
(100) |
Since is compact, it follows from the Arzela-Ascoli Theorem that for any , there exists such that for all such that for all ,
(101) |
it necessarily holds that
(102) |
Let be any measure that is within of realizing the infimum in (98). Then, writing
we have the bound
(103) |
Now we take , and too. Since is closed, thanks to the Portmanteau Theorem, we thus find that for any ,
(104) |
which in turn implies that (making use of the uniform convergence over in (102))
(105) |
For the other term on the RHS of (103), for , write
Then,
(106) |
Thanks to the fact that, for all ,
one finds that the second term on the RHS of (106) goes to as , uniformly for all and all . For any fixed , the first term on the RHS of (106) must go to zero as , thanks to (100). We have thus proved the Lemma. ∎
For , we define analogously to (98).
Lemma 16.
There exists a constant such that for all and all ,
(107) |
Also for all such that for some , , there exists a constant such that
(108) |
and is the Euclidean norm on .
Proof.
We start by proving (107). Recalling the definition of the distance in (98), let be such that
(109) |
Furthermore define centered Gaussian processes to be such that for any and any ,
(110) | ||||
(111) | ||||
(112) |
This definition is possible thanks to a trivial modification of Lemma 5 (switching ). We thus find that
(113) | ||||
(114) | ||||
(115) |
Now as , the LHS of (113) must decrease to . (108) follows analogously.
4.2 Large Deviations of the Uncoupled System
Our first aim is to prove a Large Deviation Principle in the case of fields with a frozen interaction structure (in Lemma 17 below). This would ordinarily be a trivial application of Sanov’s Theorem. However the proof is slightly complicated by the need for the LDP to be uniform with respect to the variables that the probability laws are conditioned on.
For any and , define . In other words, is the law of independent -valued Gaussian variables . The mean and variance of these variables are specified in (43) and (44).
Let be the joint law of the uncoupled system, i.e.
(116) |
For , define as follows. We specify that if either the marginal of at time is not equal to , and / or . Otherwise, for any , writing to be the law of , conditioned on the values of its variables at time , define
(117) |
Define the empirical measure to be
(118) |
where we recall that
(119) |
Lemma 17.
Fix some . Let , such that is open and closed. Then
(120) | ||||
(121) |
Furthermore is lower semi-continuous, and has compact level sets.
Proof.
First, fix any sequence , such that . Necessarily, thanks to the definition of , it must be that
(122) |
It follows from (122) that
(123) | ||||
(124) |
See for instance [30] for a proof of this fact. Furthermore is lower-semi-continuous and has compact level sets.
We next have to show that the convergence is uniform over (as in the statement of the Theorem). To do this, we first wish to show that for any measurable set and any , for all large enough ,
(125) |
and is the closed -blowup of with respect to . To do this, we are going to compare the conditioned probability to the conditioned probability induced by any other sequence in . This comparison is facilitated by using the following permutation-averaged probability law.
Define the set
(126) |
and we recall that is a sequence that decreases to , as defined in (30). We endow with the topology that it inherits from . Write to be the set of all permutations on , and define the measure to be the average over all permutations, i.e. for any measurable ,
(127) |
and here we denote to be the permutation,
(128) |
Since the empirical measure is invariant under any permutation of its arguments, for any measurable
(129) |
We can without loss of generality take . Now consider any other sequence . Let be the -Wasserstein Distance on induced by the norm
(130) |
We claim that
(131) | ||||
(132) |
This identity follows from the fact that and are identically distributed, both (i) for all , and (ii) with respect to both probability laws and .
It now remains to prove the Large Deviations bounds in the statement of the theorem. We start with the upper bound (120). It follows from (123) and (125) that for any ,
(134) |
The lower-semi-continuity of dictates that
and we have proved the upper bound. For the lower bound, let be open, and for any , take to be such that . Then
(135) | ||||
(136) | ||||
(137) | ||||
(138) |
using the Large Deviations estimate (124). Taking , it must be that for any ,
(139) |
Since is arbitrary, (121) follows immediately. ∎
We can now give the proof of Theorem 3.
Proof.
We start with the upper bound (35). We write . Using a union-of-events bound, for any ,
(140) |
for any , as long as is sufficiently large, thanks to the exponential tightness proved in Lemma 8. By taking , it thus suffices that we prove that for arbitrary such that ,
(141) |
Since is compact, for any we can always find an open covering of the form, for some positive integer , ,
(142) |
We thus find that
(143) |
Thus, employing Lemma 5 in the third line below,
(144) | ||||
(145) | ||||
(146) | ||||
(147) | ||||
(148) |
thanks to Lemma 17. We thus find that
(149) |
Now, it is proved in Lemma 16 that is continuous. Since the Relative Entropy is lower-semi-continuous in both of its arguments, we thus find that the following map is lower-semi-continuous,
Thus taking , we obtain that
(150) |
and we have proved (141).
Turning to the lower bound (36), consider an arbitrary open set . If , then
since is identically outside of . In this case, it is clear that (121) holds.
We can thus assume that . Let be such that is in the interior of , for some . We can thus find a sequence of neighborhoods of such that . We thus find that for any ,
(151) |
Similarly to the bound for the closed sets, we obtain that
(152) |
Taking , since is lower semicontinuous, it must be that
(153) |
Since is arbitrary, it must be that
(154) |
∎
4.2.1 Uncoupled System (with no conditioning)
In this subsection we prove Corollary 11.
For some , let be the joint law of the uncoupled system (with no conditioning), i.e.
(155) |
We have the following corollary to Lemma 17.
Corollary 18.
Fix some . Let , such that is open and closed. Then
(156) | ||||
(157) |
Furthermore is lower semi-continuous, and has compact level sets.
Proof.
This is a consequence of Sanov’s Theorem. ∎
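Recall the statement of Sanov’s Theorem [18]: if $\xi^1, \ldots, \xi^N$ are sampled independently from a law $\rho$ on a Polish space, then the empirical measures $\hat\mu^N = N^{-1} \sum_{j \le N} \delta_{\xi^j}$ satisfy a Large Deviations Principle with the relative entropy as rate function,

\[
I(\mu) = \mathcal{R}\left( \mu \,\middle\|\, \rho \right).
\]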
4.3 Coupled System
Girsanov’s Theorem implies that
(158) |
where is
(159) |
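For orientation, in the scalar prototype where $Q$ is the law of $dx_t = b_t\,dt + \sigma_t\,dW_t$ and $P$ is that of $dx_t = \sigma_t\,dW_t$, the Girsanov density over $[0, T]$ takes the familiar form (a generic sketch; the paper’s precise expression is (159)):

\[
\frac{dQ}{dP}\bigg|_{\mathcal{F}_T} = \exp\bigg( \int_0^T \frac{b_t}{\sigma_t^2}\, dx_t - \frac{1}{2} \int_0^T \frac{b_t^2}{\sigma_t^2}\, dt \bigg).
\]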
We wish to specify a map with (i) as nice regularity properties as possible, and (ii) such that with unit probability
(160) |
It is well-known that the stochastic integral is not a continuous function of the driving Brownian motion. Thus we define the map to be a limit of time-discretized approximations, and we will show that this limit must always converge for any measure in .
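Schematically, with dyadic partition points $t_k = k T 2^{-m}$, such a time-discretized approximation of a stochastic integral is a Riemann-type sum (a generic sketch; the paper’s precise definition is (161)):

\[
\sum_{k=0}^{2^m - 1} G_{t_k} \big( x_{t_{k+1}} - x_{t_k} \big) \;\longrightarrow\; \int_0^T G_t \, dx_t \qquad \text{as } m \to \infty,
\]

with the convergence holding in the almost-sure sense established in Lemma 19 below.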
Our precise definition of is as follows. We first define a time-discretized approximation to .
(161) |
We now define to be such that (in the case that the following limit exists)
(162) |
where is a positive integer defined further below in Lemma 20. If the above limit does not exist, then we define (in fact we will see that the limit always exists if ). It may be observed that is a well-defined measurable function.
Lemma 19.
For every , every , and for almost every , the following limit exists
(163) |
With unit probability, the Radon-Nikodym Derivative in (158) is such that
(164) |
Also for any , there exists such that for all ,
(165) |
Proof.
Define the set
(166) |
Thanks to a union-of-events bound, for any , and using the bound in Lemma 20,
(167) |
It thus follows from the Borel-Cantelli Lemma that there exists a random such that for all , and so the limit in (163) exists (almost surely). (165) follows analogously.
∎
Lemma 20.
(i) is continuous. (ii) Moreover, for any , there exists such that for all and all ,
(168) |
Proof.
(i) The continuity of is almost immediate from the definition.
(ii) For any , write . Starting with the discrete approximation to the stochastic integral, we can thus write
(169) |
Hence,
(170) |
where . Writing
(171) |
we obtain that
(172) |
The quadratic variation of this stochastic integral is
(173) |
By definition of the set , if , then for any , one can find such that as long as , necessarily
Then writing to be a standard Brownian Motion, using the Dambis-Dubins-Schwarz [27] time-rescaled representation of the stochastic integral, as long as ,
(174) | ||||
(175) |
as long as we choose sufficiently small.
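The time-rescaled representation invoked here is the Dambis-Dubins-Schwarz Theorem [27]: a continuous local martingale $M$ with $M_0 = 0$ can be written as a time-changed Brownian Motion,

\[
M_t = B_{\langle M \rangle_t},
\]

where $B$ is a standard Brownian Motion and $\langle M \rangle$ denotes the quadratic variation.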
The other terms in
are treated similarly (observe that they are just Riemann Sums, so it is straightforward to control their difference from the limiting integral).
∎
Proof.
In the case of connectivity-independent initial conditions (Case 2 of the Assumptions), the theorem follows from Corollary 11. Since the relative entropy is only zero when its two arguments are identical, any zero must be a fixed point of the operator . It is proved in the following Lemma that there is a unique zero.
For the rest of this proof, we prove the theorem in the case of connectivity-dependent initial conditions. We start by proving that for any , there must exist a measure such that
(176) |
Write , where is large enough that
where is the upper bound for in Lemma 21. This is possible thanks to the Exponential Tightness. Thanks to the Radon-Nikodym derivative identity in (160), we thus find that
(177) |
Thus for (176) to hold, it suffices that we prove that there exists such that
(178) |
Since is compact, for any , we can obtain a finite covering of of the form
(179) |
where . By a union of events bound,
(180) | ||||
(181) |
If our proposition in (178) were to be false, then (181) would be strictly negative, which would be a contradiction.
Write to be such that
(182) |
Let be any measure such that for some subsequence , (this must be possible because is compact).
We next claim that
(183) |
Indeed writing ,
(184) | ||||
(185) | ||||
(186) | ||||
(187) |
and in this last step we first perform the conditional expectation, for conditioned on the values of .
Now, recall that
Furthermore, writing
our assumption on the initial condition dictates that for any ,
(188) |
Next, we claim that
(189) |
Indeed (189) is a consequence of Lemma 19: this Lemma implies that can be approximated arbitrarily well by continuous functions over .
We thus obtain that
(190) | ||||
(191) | ||||
(192) |
since (by definition)
and we have employed the uniform lower bound in (36). The lower semi-continuity of implies that
We thus obtain (183), as required.
Lemma 21.
There exists a constant such that
(195) |
Proof.
For any ,
(196) |
Thanks to Lemma 13, converges to as . It thus suffices that we prove that, for abitrary , there exists such that
(197) |
Now, leaving out the negative-semi-definite terms, we find that
(198) |
Furthermore, writing , and assuming that , one finds that
(199) | ||||
(200) | ||||
(201) |
We thus find that, thanks to Lemma , for any there exists a constant such that
(202) |
Write
We thus find that,
(203) |
Furthermore, using the Dambis-Dubins-Schwarz Theorem [27], and writing to be a 1D Brownian Motion,
(204) |
as long as is sufficiently large, using standard properties of Brownian Motion. ∎
We now prove that the rate function has a unique minimizer (i.e. we prove Lemma 9).
Lemma 22.
There exists a unique fixed point of in . Furthermore is such that for any , writing and , it holds that
(205) |
Proof.
We start by considering the following restricted map . This is such that , where for any such that . Define analogously.
We are going to demonstrate that there is a constant such that for all ,
(206) |
For any , we construct a particular that is within of realizing the infimum in the definition of the Wasserstein distance in (98). To do this, we employ the construction of Lemma 16. Let be -valued random variables (in the same probability space), with joint probability law . Then for Brownian motions , define
(207) | ||||
(208) |
The initial conditions are identical: . We immediately see that
(209) |
and hence
(210) | ||||
(211) |
It follows from this that there exists a constant such that for all ,
(212) | ||||
(213) |
using Lemma 16. Thus for small enough , there is a unique fixed point of (the mapping up to time ). Iterating this argument, we find a unique fixed point for . The uniqueness for in turn implies uniqueness for , thanks to the identity in Lemma 16.
To see why (205) holds, first consider arbitrary , and define . The above bound in (213) implies that is necessarily Cauchy. It then immediately follows that for any with first marginal equal to , and writing , it must be that is Cauchy.
Finally we note that metrizes weak convergence, thanks to Lemma 15.
∎
Appendix A Bounding Fluctuations of the Noise
For the processes that are defined in (31), define the empirical measure
(214) |
Next, we bound the probability of the empirical measure being in the set , defined in (38), which we recall:
(215) |
where and . The main result of this section is the following.
Lemma 23.
For any , there exists such that for all ,
(216) |
Proof.
Employing a union-of-events bound, for any ,
(217) |
With a view to bounding the first term on the RHS, since ,
Thus for a positive constant ,
(218) |
Thus, thanks to Chernoff’s Inequality,
(219) | |||
(220) |
The first term on the RHS is bounded for all and all . For the second term on the RHS, standard theory on stochastic processes implies that the exponential moment exists, as long as is small enough. Thus, taking , the RHS can be made arbitrarily small. We thus find that
(221) |
The Lemma now follows from applying (221), Lemma 25 and Lemma 24 to (217). ∎
The following result is well-known. Nevertheless we sketch a quick proof for clarity.
Lemma 24.
For any , there exists a compact set such that for all ,
(222) |
Proof.
The following property follows straightforwardly from properties of the stochastic integral (noting that the diffusion coefficient is uniformly bounded): for any , there exists a compact set such that for all such that ,
(223) |
Write
(224) |
and note that our assumptions on dictate that
(225) |
For any , define the set to be such that
(226) |
We claim that for any , there exists such that
(227) |
To see this, employing a Chernoff Inequality, for a constant , for any ,
(228) | ||||
(229) | ||||
(230) |
Taking to be sufficiently small, and sufficiently large, we obtain (227).
Now, for an integer to be specified further below, define . Prokhorov’s Theorem implies that is compact. Employing a union-of-events bound, we obtain that
(231) | ||||
(232) |
We thus find that, for large enough ,
as required. ∎
Lemma 25.
There exists a constant such that for any positive integer and any , writing and , for any ,
(233) |
Proof.
Define, for ,
Notice that is a submartingale. Thus, writing , is a submartingale. Therefore, thanks to Doob’s Submartingale Inequality,
(234) | ||||
(235) | ||||
(236) |
∎
References
- [1] Robert Adler and Jonathan Taylor. Random Fields and Geometry, volume 53. Springer, 2019.
- [2] Ahmed El Alaoui, Andrea Montanari, and Mark Sellke. Optimization of mean-field spin glasses. arXiv preprint, 2020.
- [3] G. Ben Arous and A. Guionnet. Large deviations for Langevin spin glass dynamics. Probability Theory and Related Fields, 102, 1995.
- [4] Gerard Ben Arous, Amir Dembo, and Alice Guionnet. Cugliandolo-Kurchan equations for dynamics of spin-glasses. Probability Theory and Related Fields, 136:619–660, 2006.
- [5] Gerard Ben Arous and Alice Guionnet. Symmetric Langevin spin glass dynamics. The Annals of Probability, 25:1367–1422, 1997.
- [6] Gerard Ben Arous, Song Mei, Andrea Montanari, and Mihai Nica. The landscape of the spiked tensor model. Communications on Pure and Applied Mathematics, 72:2282–2330, 2019.
- [7] Nicolas Brunel. Dynamics of sparsely connected networks of excitatory and inhibitory spiking neurons. Journal of Computational Neuroscience, 8:183–208, 2000.
- [8] Nicolas Brunel and Xiao-Jing Wang. What determines the frequency of fast network oscillations with irregular neural discharges? I. Synaptic dynamics and excitation-inhibition balance. Journal of Neurophysiology, 90:415–430, 2003.
- [9] Amarjit Budhiraja and Paul Dupuis. Analysis and Approximation of Rare Events, volume 94. Springer, 2019.
- [10] Tanguy Cabana and Jonathan D. Touboul. Large deviations for randomly connected neural networks: I. Spatially extended systems. Advances in Applied Probability, 50:983–1004, 2018.
- [11] Tanguy Cabana and Jonathan D. Touboul. Large deviations for randomly connected neural networks: II. State-dependent interactions. Advances in Applied Probability, 50:983–1004, 2018.
- [12] B. Cessac. Linear response in neuronal networks: from neurons dynamics to collective response. 2019.
- [13] Patrick Charbonneau, Enzo Marinari, Mark Mezard, Giorgio Parisi, Federico Ricci-Tersenghi, Gabriella Sicuro, and Francesco Zamponi, editors. Spin Glass Theory and Far Beyond. World Scientific, 2023.
- [14] A. Crisanti and H. Sompolinsky. Path integral approach to random neural networks. Physical Review E, 98:1–20, 2018.
- [15] Andrea Crisanti, Heinz Horner, and H.J. Sommers. The spherical p-spin interaction spin-glass model. Zeitschrift für Physik B Condensed Matter, 92:257–271, 1993.
- [16] L. F. Cugliandolo and J. Kurchan. Analytical solution of the off-equilibrium dynamics of a long-range spin-glass model. Physical Review Letters, 71:173–176, 1993.
- [17] Amir Dembo and Eliran Subag. Dynamics for spherical spin glasses: Disorder dependent initial conditions. Journal of Statistical Physics, 181:465–514, 2020.
- [18] Amir Dembo and Ofer Zeitouni. Large Deviations Techniques and Applications, 2nd edition. Springer, 1998.
- [19] Diego Fasoli and Stefano Panzeri. Stationary-state statistics of a binary neural network model with quenched disorder. Entropy, pages 1–30, 2019.
- [20] Olivier Faugeras and James MacLaurin. Asymptotic description of neural networks with correlated synaptic weights. Entropy, 17:4701–4743, 2015.
- [21] Olivier Faugeras, Emilie Soret, and Etienne Tanré. Asymptotic behaviour of a network of neurons with random linear interactions. Preprint HAL Id : hal-01986927, 2019.
- [22] David Gamarnik. The overlap gap property: A topological barrier to optimizing over random structures. Proceedings of the National Academy of Sciences, 2021.
- [23] M. Grunwald. Sanov results for Glauber spin-glass dynamics. Probability Theory and Related Fields, 106:187–232, 1996.
- [24] A Guionnet. Averaged and quenched propagation of chaos for spin glass dynamics. Probability Theory and Related Fields, 109:183–215, 1997.
- [25] Alice Guionnet and Boguslaw Zegarlinski. Decay to equilibrium in random spin systems on a lattice. Journal of Statistical Physics, 86:899–904, 1997.
- [26] Moritz Helias and David Dahmen. Statistical Field Theory for Neural Networks. Springer, 2020.
- [27] Ioannis Karatzas and Steven Shreve. Brownian Motion and Stochastic Calculus, 2nd edition. Springer, 1991.
- [28] Itamar Daniel Landau and Haim Sompolinsky. Coherent chaos in a recurrent neural network with structured connectivity. PLoS Computational Biology, 14:1–27, 2018.
- [29] George Lindgren, Holger Rootzen, and Maria Sandsten. Stationary Stochastic Processes for Scientists and Engineers. Chapman Hall, 2013.
- [30] Eric Lucon. Quenched large deviations for interacting diffusions in random media. Journal of Statistical Physics, 166:1405–1440, 2017.
- [31] Olivier Moynot and Manuel Samuelides. Large deviations and mean-field theory for asymmetric random recurrent neural networks. Probability Theory and Related Fields, 123:41–75, 2002.
- [32] Gabriel Koch Ocker, Yu Hu, Michael A. Buice, Brent Doiron, Kresimir Josic, Robert Rosenbaum, and Eric Shea-Brown. From the statistics of connectivity to the statistics of spike times in neuronal networks. Current Opinion in Neurobiology, 46:109–119, 2017.
- [33] Robert Rosenbaum and Brent Doiron. Balanced networks of spiking neurons with spatially dependent recurrent connections. Physical Review X, pages 1–9, 2014.
- [34] Robert Rosenbaum, Matthew A. Smith, Adam Kohn, Jonathan E. Rubin, and Brent Doiron. The spatial structure of correlated neuronal variability. Nature Neuroscience, 20:107–114, 2017.
- [35] Kai Segadlo, Bastian Epping, Alexander Van Meegen, David Dahmen, Michael Krämer, and Moritz Helias. Unified field theoretical approach to deep and recurrent neuronal networks. Journal of Statistical Mechanics: Theory and Experiment, 2022.
- [36] H. Sompolinsky, A. Crisanti, and H. J. Sommers. Chaos in random neural networks. Physical Review Letters, 61:259–262, 1988.