
Analyzing dynamics and average case complexity in the spherical Sherrington-Kirkpatrick model: a focus on extreme eigenvectors

Tingzhou Yu University of Waterloo, University of Alberta, [email protected]
Abstract

We explore Langevin dynamics in the spherical Sherrington-Kirkpatrick model and study the asymptotic limit of the energy. Our approach relies on integro-differential equations, including the Crisanti-Horner-Sommers-Cugliandolo-Kurchan equation from the spin glass literature, to analyze the system in the large-size limit and its temperature-dependent phase transition. Additionally, we conduct an average-case complexity analysis, establishing hitting time bounds for the bottom eigenvector of a Wigner matrix. Our investigation also covers the power iteration algorithm, examining its average-case complexity in identifying the top eigenvector, with comprehensive complexity bounds.

1 Introduction

Spin glasses are disordered magnetic systems known for their complex behavior, making them a topic of interest in physics [16]. These systems have extensive applications across various fields, including statistics, computer science, and beyond (see e.g., [6, 14, 32]). Among the models studying spin glasses, the Sherrington-Kirkpatrick (SK) model [25] is particularly noteworthy for its extensive analysis. This paper primarily focuses on the spherical Sherrington-Kirkpatrick (SSK) model, which serves as the continuous counterpart of the SK model. A central challenge in the study of spin glasses is understanding the equilibrium phase transition (see e.g., [17, 18]), a critical change in the system’s properties occurring as temperature decreases. This transition reveals insights about the nature of the system’s ground state.

Consider 𝕊N1(N)\mathbb{S}^{N-1}(\sqrt{N}), the N1N-1 dimensional sphere of radius N\sqrt{N} in N\mathbb{R}^{N}: 𝕊N1(N):={𝑿N:𝑿2=N}\mathbb{S}^{N-1}(\sqrt{N}):=\{\bm{X}\in\mathbb{R}^{N}:\|\bm{X}\|^{2}=N\}, where 2\|\cdot\|_{2} represents the 2\ell^{2} norm. The 22-spin spherical Sherrington-Kirkpatrick (SSK) model is defined by the Hamiltonian:

H𝐉(X):=1i,jNJijXiXj=𝑿T𝐉𝑿,H_{\mathbf{J}}(X):=\sum_{1\leq i,j\leq N}J_{ij}X_{i}X_{j}=\bm{X}^{T}\mathbf{J}\bm{X}, (1.1)

where 𝑿=(X1,,XN)N\bm{X}=(X_{1},\dots,X_{N})\in\mathbb{R}^{N} are spin variables on the sphere 𝕊N1(N)\mathbb{S}^{N-1}(\sqrt{N}) and 𝐉=Jij1i,jN\mathbf{J}={J_{ij}}_{1\leq i,j\leq N} is the normalized Wigner matrix as defined in Definition 1.1.

However, in some cases, the system may relax to equilibrium so slowly that it never actually reaches it. This phenomenon is known as ‘aging’, and it is integral to the study of spin glass dynamics, both experimentally and theoretically. Aging is a phenomenon that affects the system’s decorrelation properties over time. According to [4], this means that the longer the system exists, the longer it takes to forget its past. This phenomenon has been studied extensively by various authors (see e.g., [9, 10, 31]). The study of aging and its effect on spin glass dynamics is an active area of physics research, with important implications for the understanding of the low-temperature behavior of these systems.

The foundational mathematical literature on aging in spin glasses was first introduced in [2]. In this work, the authors concentrated on such systems, particularly investigating the aging phenomenon in spherical spin glasses. This was aimed at characterizing the low-temperature behaviors of Langevin dynamics in matrix models. A key focus was the study of Langevin dynamics within the Sherrington-Kirkpatrick (SK) model, utilizing correlation functions. These functions adhere to a complex system of integro-differential equations, known as the Crisanti-Horner-Sommers-Cugliandolo-Kurchan (CHSCK) equations [8, 9]. Recent advancements in this field include the work of [13], who derived the CHSCK equations for spherical mixed pp-spin disordered mean-field models. Another significant contribution was made by [12], who employed a general combinatorial method and stochastic Taylor expansion to establish universality in Langevin dynamical systems. Additionally, [23] explored the signal recovery problem in spiked matrix models using Langevin dynamics.

In this paper, we aim to analyze the asymptotic limit of the energy and focus on the long-time behavior of the system's energy in Section 2. The energy, described by the Hamiltonian function, is critical in determining the equilibrium properties and governing the dynamics of spherical spin glass models (see e.g., [27, 3]). By studying the long-time behavior of the energy function, we can gain insight into the system's energy evolution over time and its interaction with the spin variable dynamics. Notably, we derive an integro-differential equation for the energy involving the empirical correlation function in Theorem 2.4. We also establish an explicit formula for the energy's limiting behavior in Theorem 2.5, where a phase transition is observed. Consequently, we obtain the limiting ratio of the energy to the empirical correlation function of the system in Corollary 2.6. The techniques employed in deriving our main results partially rely on the work of [2, 23].

As a potential application, the insights gained from our analysis of the asymptotic limit of the energy can be utilized to develop more efficient algorithms for solving challenging linear algebra problems, such as computing the eigenvectors of Wigner random matrices. However, this hinges on our understanding of the complexity of iterative methods, which have received less attention in the realm of complexity theory [26].

The complexity of algorithms in linear algebra has been a topic of interest for many years [26]. While direct algorithms that solve problems in a finite number of steps have been extensively studied, iterative methods such as those required for the matrix eigenvalue problem have received less attention in complexity theory [26]. The power method is a popular iterative algorithm that approximates the eigenvector corresponding to the dominant eigenvalue. However, for Hermitian random matrices, the average complexity of the power method for obtaining a dominant eigenvector is infinite [19]; the same work shows that the complexity is bounded above by O(N^{2}\log N) when conditioned on all the eigenvalues being positive. In a separate study, [20] explored another algorithm for computing the dominant eigenvector, revealing that under certain conditions the average number of iterations needed is O(\log N+\log|\log\varepsilon|). Furthermore, [11] studied the efficiency of three algorithms for computing the eigenvalues of sample covariance matrices, concluding that the complexity is approximately O\left(\left(\frac{\log\varepsilon^{-2}}{\log N}-\frac{3}{2}\right)N^{2/3}\log N\right), independent of the specific distribution of the matrix entries.
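As a rough illustration of the power method discussed above (and not the precise algorithms analyzed in [19, 20]), the following minimal Python sketch runs power iteration on a shifted Wigner matrix; the Gaussian entries, the shift by 3I (so that the largest eigenvalue is dominant in modulus), the tolerance, and the stopping rule are all illustrative choices.

```python
import numpy as np

def power_iteration(A, tol=1e-6, max_iter=50_000, seed=None):
    """Approximate the eigenvector of the eigenvalue of A that is largest in modulus."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    x /= np.linalg.norm(x)
    for k in range(max_iter):
        y = A @ x
        y /= np.linalg.norm(y)
        if np.linalg.norm(y - np.sign(y @ x) * x) < tol:   # sign-robust convergence check
            return y, k + 1
        x = y
    return x, max_iter

N = 500
rng = np.random.default_rng(0)
Z = rng.standard_normal((N, N))
J = (Z + Z.T) / np.sqrt(2 * N)        # normalized Wigner (GOE-like) matrix
A = J + 3.0 * np.eye(N)               # shift: the spectrum of J lies asymptotically in [-2, 2]
v, iters = power_iteration(A, seed=1)
v_top = np.linalg.eigh(J)[1][:, -1]   # top eigenvector of J, for comparison
print(iters, abs(v @ v_top))          # the overlap should be close to 1
```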

In our work, as presented in Section 3, we delve into the complexity of an algorithm employing spherical gradient descent within the aging framework to analyze the equilibrium of the spherical SK model under zero-temperature dynamics (i.e., setting β=∞ in the Langevin dynamics defined in (2.1)). Spherical gradient descent, functioning as an optimization method, updates the spin variables in the direction of the negative gradient of the energy function on the unit sphere, effectively serving as a continuous analogue of the power method. We focus on the hitting time, the first moment at which there is a positive overlap between the algorithm's output and the bottom eigenvector, with overlap being a measure of similarity between two spin configurations. Our research contributes to the field by providing bounds for the complexity of computing eigenvectors of Wigner random matrices with finite fourth moments; specifically, we establish a lower bound of O(N^{2/3}) and an upper bound of O(N^{2/3}\log N) in Theorem 3.3. Similarly, we analyze the power iteration algorithm, in particular its complexity in reaching the first time at which the algorithm's output has a positive overlap with the top eigenvector. Here too, we find the upper bound of the complexity to be O(N^{2/3}\log N) and the lower bound to be O(N^{2/3}), as detailed in Theorem 4.1.

Notation:

For two real-valued functions f(x),g(x)f(x),g(x), we write f(x)=𝒪(g(x))f(x)=\mathcal{O}(g(x)) as xx\to\infty, which means there are constants C>0C>0 and x0x_{0} such that |f(x)|C|g(x)||f(x)|\leq C|g(x)| for all xx0x\geq x_{0}.

For f(x)=o(g(x))f(x)=o(g(x)) as xx\to\infty, it means for any ε>0\varepsilon>0, there exists an x0x_{0} such that |f(x)|ε|g(x)||f(x)|\leq\varepsilon|g(x)| for all x>x0x>x_{0}.

Asymptotic Equality f(x)\sim_{x\to\infty}g(x): this means that \lim_{x\to\infty}f(x)/g(x)=1, i.e., f(x) and g(x) are asymptotically equal as x gets very large.

The definition of the (normalized) Wigner matrix we consider in this paper is as follows.

Definition 1.1.

Let N1N\geq 1 be an integer. Consider a symmetric (Hermitian) N×NN\times N matrix 𝐉={Jij}1i,jN={1NZij}1i,jN\mathbf{J}=\{J_{ij}\}_{1\leq i,j\leq N}=\{\frac{1}{\sqrt{N}}Z_{ij}\}_{1\leq i,j\leq N}. Assume that the following conditions hold:

  • The upper-triangular entries {Zij}1ijN\{Z_{ij}\}_{1\leq i\leq j\leq N} are independent real (complex) random variables with mean zero;

  • The diagonal entries {Zii}1iN\{Z_{ii}\}_{1\leq i\leq N} are identically distributed with finite variance, and the off-diagonal entries {Zij}1i<jN\{Z_{ij}\}_{1\leq i<j\leq N} are identically distributed with unit variance;

  • Furthermore, in the context of the Hermitian case, assume that 𝔼[(Z12)2]=0\mathbb{E}[(Z_{12})^{2}]=0.

The matrix 𝐉\mathbf{J} is called the normalized symmetric (Hermitian) Wigner matrix.
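As a minimal sketch of Definition 1.1 (using Gaussian entries, i.e. the GOE case of Remark 1.2 below, as one admissible choice), the following Python snippet samples a normalized real Wigner matrix and evaluates the Hamiltonian (1.1) at a uniform point of the sphere of radius √N; the matrix size is arbitrary.

```python
import numpy as np

def sample_wigner(N, rng):
    """Normalized real symmetric Wigner matrix J = Z / sqrt(N) with Gaussian entries:
    off-diagonal variance 1/N, diagonal variance 2/N (the GOE normalization)."""
    Z = rng.standard_normal((N, N))
    return (Z + Z.T) / np.sqrt(2 * N)

def sample_sphere(N, rng):
    """Uniform point on the sphere S^{N-1}(sqrt(N)) of radius sqrt(N)."""
    x = rng.standard_normal(N)
    return np.sqrt(N) * x / np.linalg.norm(x)

rng = np.random.default_rng(1)
N = 1000
J = sample_wigner(N, rng)
X = sample_sphere(N, rng)
H = X @ J @ X                # Hamiltonian H_J(X) from (1.1)
print(H / N)                 # energy per spin: O(N^{-1/2}) for a uniform X,
                             # versus the ground-state value lambda_min, close to -2
```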

Remark 1.2.

For cases where the random variables ZijZ_{ij} and ZiiZ_{ii} are real Gaussian and 𝔼[|Zii|2]=2\mathbb{E}[|Z_{ii}|^{2}]=2 for 1i,jN1\leq i,j\leq N, the Wigner matrix 𝐉\mathbf{J} is called the Gaussian Orthogonal Ensemble (GOE). Likewise, when the entries ZijZ_{ij} are complex Gaussian and ZiiZ_{ii} are real Gaussian with 𝔼[|Zii|2]=1\mathbb{E}[|Z_{ii}|^{2}]=1, the Wigner matrix 𝐉\mathbf{J} is called the Gaussian Unitary Ensemble (GUE).

2 Asymptotic energy dynamics in the spherical SK model

In this section, we explore the asymptotic energy dynamics of the spherical Sherrington-Kirkpatrick model. Our focus is on the Langevin dynamics, through which we analyze the system’s behavior over time. We will present our main results, including the derivation of integro-differential equations and the examination of long-term energy behavior and phase transitions in the model.

2.1 Main results

We consider the Langevin dynamics for the Sherrington-Kirkpatrick (SK) model defined by the following system of stochastic differential equations (SDEs) as in [2]:

dX_{t}^{i}=\sum_{j=1}^{N}J_{ij}X_{t}^{j}dt-f^{\prime}\left(\frac{1}{N}\sum_{j=1}^{N}(X_{t}^{j})^{2}\right)X_{t}^{i}dt+\beta^{-1/2}dW_{t}^{i}, (2.1)

where \mathbf{J}=\{J_{ij}\}_{1\leq i,j\leq N} is a symmetric matrix of centered Gaussian random variables with \mathbb{E}[J_{ij}^{2}]=\frac{1}{N} for 1\leq i<j\leq N and \mathbb{E}[J_{ii}^{2}]=\frac{2}{N}, f:[0,\infty)\to\mathbb{R} is such that f^{\prime} is non-negative and Lipschitz, \beta is a positive constant, and \{W^{i}_{t}\}_{1\leq i\leq N} is an N-dimensional Brownian motion, independent of \{J_{ij}\}_{1\leq i,j\leq N} and of the initial data \{X_{0}^{i}\}_{1\leq i\leq N}.

For any N1N\geq 1 and T0T\geq 0, the SDE (2.1) has a unique strong solution 𝑿t={Xti:1iN,t[0,T]}\bm{X}_{t}=\{X_{t}^{i}:1\leq i\leq N,t\in[0,T]\} on 𝒞([0,T],N)\mathcal{C}([0,T],\mathbb{R}^{N}). See [2, Lemma 6.7] for a proof.

The second term in (2.1) is a Lagrange multiplier that implements a smooth spherical constraint [2]. A key simplification of the SSK model is that the SDE (2.1) is invariant under rotations.
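Before passing to rotated coordinates, it may help to see the dynamics concretely. The following is a minimal Euler-Maruyama sketch of (2.1), assuming the quadratic choice f(x)=cx²/2 adopted later in (2.8) (so f'(x)=cx), Gaussian matrix entries, and an initial condition uniform on the sphere of radius √N (so that the empirical correlation introduced below equals 1 at time zero); the matrix size, step size, and horizon are illustrative.

```python
import numpy as np

def langevin_ssk(J, beta, c=1.0, T=5.0, dt=1e-3, seed=None):
    """Euler-Maruyama discretization of the Langevin SDE (2.1) with f(x) = c x^2 / 2.
    Returns the time grid and |X_t|^2 / N, i.e. the empirical correlation K_N(t, t)."""
    rng = np.random.default_rng(seed)
    N = J.shape[0]
    X = rng.standard_normal(N)
    X *= np.sqrt(N) / np.linalg.norm(X)            # start on the sphere of radius sqrt(N)
    n_steps = int(T / dt)
    KN = np.empty(n_steps + 1)
    KN[0] = X @ X / N
    for k in range(n_steps):
        drift = J @ X - c * (X @ X / N) * X        # f'(|X|^2/N) X_t with f'(x) = c x
        X = X + drift * dt + np.sqrt(dt / beta) * rng.standard_normal(N)
        KN[k + 1] = X @ X / N
    return np.arange(n_steps + 1) * dt, KN

rng = np.random.default_rng(2)
N = 200
Z = rng.standard_normal((N, N))
J = (Z + Z.T) / np.sqrt(2 * N)
t, KN = langevin_ssk(J, beta=1.0, seed=3)
print(KN[0], KN[-1])    # empirical correlation at the initial and final times
```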

We write \mathbf{J}=G^{T}DG, where G is an orthogonal matrix distributed according to the Haar measure on the orthogonal group and D=\text{diag}(\sigma^{1},\dots,\sigma^{N}) is the diagonal matrix of the eigenvalues \{\sigma^{i}\}_{1\leq i\leq N} of \mathbf{J}. As N\to\infty, the empirical spectral measure \frac{1}{N}\sum_{i=1}^{N}\delta_{\sigma^{i}} converges weakly to the semicircle law \mu_{D} with compact support [-2,2] by [28, Theorem 2.4.2]. Recall that the density function of the semicircle law \mu_{D} is given by

dμD=12π4x2𝟏{2x2}dx.d\mu_{D}=\frac{1}{2\pi}\sqrt{4-x^{2}}\mathbf{1}_{\{-2\leq x\leq 2\}}d\,x. (2.2)
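As a quick numerical illustration of this convergence (a sketch with Gaussian entries; the matrix size and the bins are arbitrary), one can compare the empirical spectral distribution of a sampled 𝐉 with the density (2.2):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 2000
Z = rng.standard_normal((N, N))
J = (Z + Z.T) / np.sqrt(2 * N)          # normalized Wigner (GOE-like) matrix
eigs = np.linalg.eigvalsh(J)

bins = np.linspace(-2, 2, 9)
hist, _ = np.histogram(eigs, bins=bins, density=True)   # empirical spectral density
centers = (bins[:-1] + bins[1:]) / 2
semicircle = np.sqrt(4 - centers**2) / (2 * np.pi)      # density (2.2) at the bin centers
print(np.round(hist, 3))
print(np.round(semicircle, 3))
```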

To simplify the SDE (2.1), we multiply both sides of (2.1) by the orthogonal matrix G and use the rotational invariance. We set \bm{Y}_{t}:=G\bm{X}_{t} and B_{t}:=GW_{t}. Then the SDE in the rotated coordinates is given by

dYti=(σif(Yt22/N))Ytidt+β1/2dBti,dY_{t}^{i}=\left({\sigma^{i}-f^{\prime}\left({\|Y_{t}\|_{2}^{2}/N}\right)}\right)Y_{t}^{i}dt+\beta^{-1/2}dB_{t}^{i}, (2.3)

where 2\|\cdot\|_{2} is the 2\ell^{2} norm.

Denote by

KN(t,s):=1Ni=1NXtiXsiK_{N}(t,s):=\frac{1}{N}\sum_{i=1}^{N}X_{t}^{i}X_{s}^{i} (2.4)

the empirical correlation function. We use abbreviated notation KN(t):=KN(t,t)K_{N}(t):=K_{N}(t,t) for convenience.

Ben Arous, Dembo, and Guionnet studied the dynamics of the empirical correlation KNK_{N} and the limiting point as NN\to\infty (NN is the size of the system) in [2], which is the unique solution to a CHSCK equation as follows.

Theorem 2.1.

[2, Theorem 2.3] Assume that the initial data \{X_{0}^{i}\}_{1\leq i\leq N} are i.i.d. with law \mu_{0} such that \mathbb{E}_{X\sim\mu_{0}}[e^{\alpha X}]<\infty for some \alpha>0. Fix T\geq 0. As N\to\infty, K_{N} converges almost surely to a deterministic limit K. Recall that \mu_{D} is the semicircle law. Moreover, the limit K is the unique solution to the following integro-differential equation:

K(t,s)\displaystyle K(t,s) =e0tf(K(w))𝑑w0sf(K(w))𝑑w𝔼(σ,X0)π[eσ(t+s)X02]\displaystyle=e^{-\int_{0}^{t}f^{\prime}(K(w))dw-\int_{0}^{s}f^{\prime}(K(w))dw}\underset{(\sigma,X_{0})\sim\pi^{\infty}}{\mathbb{E}}[e^{\sigma(t+s)}X_{0}^{2}]
+β10tsertf(K(w))𝑑wrsf(K(w))𝑑w𝔼(σ,X0)π[eσ(t+s2r)]𝑑r,\displaystyle+\beta^{-1}\int_{0}^{t\land s}e^{-\int_{r}^{t}f^{\prime}(K(w))dw-\int_{r}^{s}f^{\prime}(K(w))dw}\underset{(\sigma,X_{0})\sim\pi^{\infty}}{\mathbb{E}}[e^{\sigma(t+s-2r)}]dr,

where \pi^{\infty}=\mu_{D}\otimes\mu_{0} and here we write K(s):=K(s,s).

Remark 2.2.

As emphasized in [2], aging depends strongly on the initial conditions. In addition to the i.i.d. initial conditions, the authors also consider three other types of initial conditions: rotated independent initial conditions, top eigenvector initial conditions, and stationary initial conditions.

Remark 2.3.

Based on the thermodynamic limit of K_{N}(t,s) as N\to\infty, the authors study the long-time evolution of K(t,s) and establish a dynamical phase transition in terms of the asymptotics of K(t,s) in [2, Proposition 3.2]. This is the first mathematical proof of the aging phenomenon.

Next, we consider how the energy of the system evolves over time. Recall that the quadratic Hamiltonian of the SSK model is defined by H_{\mathbf{J}}(\bm{X}_{t})=\bm{X}_{t}^{T}\mathbf{J}\bm{X}_{t}. Note that H_{\mathbf{J}}(\bm{X}_{t})=\bm{Y}_{t}^{T}D\bm{Y}_{t}, where \bm{Y}_{t}=G\bm{X}_{t} and \mathbf{J}=G^{T}DG.

Let

H_{N}(t):=\frac{1}{N}H_{\mathbf{J}}(\bm{X}_{t})=\frac{1}{N}\sum_{i=1}^{N}\sigma^{i}(Y_{t}^{i})^{2} (2.5)

be the energy of the system.

Our first result characterizes the limiting behavior of the energy HN(t)H_{N}(t) of the SSK model as NN\to\infty for t[0,T]t\in[0,T] as follows.

Theorem 2.4.

Assume that the initial data \{X_{0}^{i}\}_{1\leq i\leq N} are i.i.d. with law \mu_{0} such that \mathbb{E}_{X\sim\mu_{0}}[e^{\alpha X}]<\infty for some \alpha>0. Fix T\geq 0. Let K be the solution defined as in Theorem 2.1. As N\to\infty, H_{N} converges almost surely to a deterministic limit H. Moreover, the limit H is the unique solution to the following integro-differential equation:

H(t)\displaystyle H(t) =e20tf(K(w))𝑑w𝔼(σ,X0)π[σe2σtX02]\displaystyle=e^{-2\int_{0}^{t}f^{\prime}(K(w))dw}\underset{(\sigma,X_{0})\sim\pi^{\infty}}{\mathbb{E}}[\sigma e^{2\sigma t}X_{0}^{2}] (2.6)
+β10te2stf(K(w))𝑑w𝔼(σ,X0)π[σe2σ(ts)]𝑑s\displaystyle+\beta^{-1}\int_{0}^{t}e^{-2\int_{s}^{t}f^{\prime}(K(w))dw}\underset{(\sigma,X_{0})\sim\pi^{\infty}}{\mathbb{E}}[\sigma e^{2\sigma(t-s)}]ds (2.7)

where \pi^{\infty}=\mu_{D}\otimes\mu_{0} and here we write K(s):=K(s,s).

Theorem 2.4 provides a precise characterization of H(t). However, the expression for H(t) is implicit, since it involves the fixed-point equation for K(t) from Theorem 2.1. A key feature of this model is that we can nevertheless determine the long-time behavior of the energy H(t) exactly as t\to\infty.

In order to precisely determine the limit of the energy, we set K(0,0)=1 and take the function

f(x)=cx22,f(x)=\frac{cx^{2}}{2}, (2.8)

where cc is a positive constant.

Let m:\mathbb{R}\setminus\mbox{supp}(\mu_{D})\to\mathbb{R} be the Stieltjes transform of the probability measure \mu_{D}, given by

m(s)=𝔼σμD[1sσ]=2s+s24.m(s)=\mathbb{E}_{\sigma\sim\mu_{D}}\left[\frac{1}{s-\sigma}\right]=\frac{2}{s+\sqrt{s^{2}-4}}. (2.9)
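The closed form in (2.9) can be checked numerically against the defining integral; the following sketch uses quadrature over the semicircle density (2.2) at a few arbitrary values s > 2.

```python
import numpy as np
from scipy.integrate import quad

def stieltjes_numeric(s):
    """m(s) = E[1/(s - sigma)] for sigma ~ semicircle law (2.2), by quadrature."""
    rho = lambda x: np.sqrt(4 - x**2) / (2 * np.pi)
    val, _ = quad(lambda x: rho(x) / (s - x), -2, 2)
    return val

def stieltjes_closed(s):
    """Closed form m(s) = 2 / (s + sqrt(s^2 - 4)) from (2.9), valid for s > 2."""
    return 2 / (s + np.sqrt(s**2 - 4))

for s in (2.5, 3.0, 5.0):
    print(s, stieltjes_numeric(s), stieltjes_closed(s))   # the two columns should agree
```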

Let βc\beta_{c} be the critical temperature such that

βc=c4m(2)=c4.\beta_{c}=\frac{c}{4}m(2)=\frac{c}{4}. (2.10)

Our second result is as follows.

Theorem 2.5.

Assume that K(0,0)=1 and f(x)=\frac{cx^{2}}{2} for a positive constant c>0. Let H be the unique solution given in Theorem 2.4. Then, for \beta\leq\beta_{c}, we have

limtH(t)=0.\lim_{t\to\infty}H(t)=0. (2.11)

For β>βc\beta>\beta_{c}, we have

limtH(t)=4βc252πcβ+12β1.\lim_{t\to\infty}H(t)=\frac{4\beta-c}{2^{\frac{5}{2}}\sqrt{\pi}c\beta}+\frac{1}{2}\beta^{-1}. (2.12)

Theorem 2.5 describes the limit of the energy function H(t) as t\to\infty and establishes a dynamical phase transition. See Figure 1 for the jump discontinuity in the asymptotic limit of H(t), where we set c=1.

Figure 1: In this figure, we set c=1c=1 and plot the limiting behavior of the function H(t)H(t) as tt approaches infinity. H(t)H(t) exhibits a jump discontinuity in the phase transition at the critical inverse temperature βc=0.25\beta_{c}=0.25.
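The limiting values in (2.11)-(2.12) are explicit and can be tabulated directly; the following sketch evaluates them for c=1 (so \beta_{c}=1/4 by (2.10)) at a few arbitrary values of \beta, reproducing the jump discontinuity shown in Figure 1.

```python
import numpy as np

def limiting_energy(beta, c=1.0):
    """Limit of H(t) as t -> infinity from Theorem 2.5, with beta_c = c / 4."""
    beta_c = c / 4
    if beta <= beta_c:
        return 0.0                                   # (2.11): zero limit for beta <= beta_c
    return (4 * beta - c) / (2**2.5 * np.sqrt(np.pi) * c * beta) + 1 / (2 * beta)   # (2.12)

for beta in (0.1, 0.2, 0.25, 0.26, 0.5, 1.0):
    print(beta, round(limiting_energy(beta), 4))     # jump at beta_c = 0.25, as in Figure 1
```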

The proof of Theorem 2.5 utilizes tools and techniques from the paper [2], with some modifications made to their results. We borrow the notation from [2] and write

R(t):=exp(20tf(K(w))𝑑w)R(t):=\exp\left({2\int_{0}^{t}f^{\prime}\left({K(w)}\right)dw}\right) (2.13)

with K(w)=K(w,w)K(w)=K(w,w).

Then the expression of H(t)H(t) in Theorem 2.4 becomes

H(t)=R(t)1(𝔼[σe2σt]+β10tR(r)𝔼[σe2σ(tr)]𝑑r)H(t)=R(t)^{-1}\left({\mathbb{E}[\sigma e^{2\sigma t}]+\beta^{-1}\int_{0}^{t}R(r)\mathbb{E}[\sigma e^{2\sigma(t-r)}]dr}\right) (2.14)

Note that the limit of H(t) is governed by the asymptotics of the derivative of the moment generating function of \sigma. So it suffices to characterize the limits of R(t) and \mathbb{E}\left[\sigma e^{2\sigma t}\right].

Similarly, we consider the asymptotic limit of H(t)/K(t)H(t)/K(t) as tt\to\infty.

Corollary 2.6.

Assume the same setting as in Theorem 2.5. Then for ββc\beta\leq\beta_{c}, we have

limtH(t)K(t)=0.\lim_{t\to\infty}\frac{H(t)}{K(t)}=0. (2.15)

For β>βc\beta>\beta_{c}, we have

limtH(t)K(t)=23/2(4βc)+πc25/2(4βc)+πc\lim_{t\to\infty}\frac{H(t)}{K(t)}=\frac{2^{-3/2}(4\beta-c)+\sqrt{\pi}c}{2^{-5/2}(4\beta-c)+\sqrt{\pi}c} (2.16)
Figure 2: In this figure, we set c=1c=1 and plot the limiting behavior of the function H(t)/K(t)H(t)/K(t) as tt approaches infinity. There is a jump discontinuity in the phase transition at the critical inverse temperature βc=0.25\beta_{c}=0.25.

2.2 Proof of Theorem 2.5

In the proof of our main results, we will need the Bessel function. Let us first revisit its basic definition and some relevant results that we will use.

For integer values of n, the (modified) Bessel function can be defined via the following integral representation:

Definition 2.7.

[1, Section 9.1] The Bessel function is given by

Bn(x):=inπ0πeixcosθcos(nθ)𝑑θ.B_{n}(x):=\frac{i^{-n}}{\pi}\int_{0}^{\pi}e^{ix\cos\theta}\cos(n\theta)d\theta. (2.17)

for nn\in\mathbb{N} and xx\in\mathbb{R}.

The modified Bessel function is given by

In(x):=1π0πexcosθcos(nθ)𝑑θ.I_{n}(x):=\frac{1}{\pi}\int_{0}^{\pi}e^{x\cos\theta}\cos(n\theta)d\theta. (2.18)

for nn\in\mathbb{N} and xx\in\mathbb{R}.

The relation between the Bessel function and the modified Bessel function is given by

In(x)=einπ/2Bn(xeiπ/2)I_{n}(x)=e^{-in\pi/2}B_{n}(xe^{i\pi/2}) (2.19)

as shown in [1, Section 9.6].

We will use the following lemma about the recurrence relation and derivatives of modified Bessel functions.

Lemma 2.8.

[1, Section 9.6] Let InI_{n} be the modified Bessel function defined as in Definition 2.7. For nn\in\mathbb{N},

In(x)=In+1(x)+nxIn(x),I_{n}^{\prime}(x)=I_{n+1}(x)+\frac{n}{x}I_{n}(x), (2.20)

and I0(x)=I1(x)I^{\prime}_{0}(x)=I_{1}(x).

By [1, Section 9.6], we have the following asymptotic results of modified Bessel functions.

Lemma 2.9.

Let InI_{n} be the modified Bessel function defined as in Definition 2.7. For nn\in\mathbb{N}, as xx\to\infty we have

limxIn(x)x1/2ex=12π.\lim_{x\to\infty}\frac{I_{n}(x)}{x^{-1/2}e^{x}}=\frac{1}{\sqrt{2\pi}}. (2.21)

We now express the characteristic function of the semicircle law in terms of the Bessel function.

Lemma 2.10.

Let B1(t)B_{1}(t) be the Bessel function defined as in Definition 2.7 for n=1n=1. Recall that the eigenvalues σ\sigma of N×NN\times N normalized symmetric Wigner matrix 𝐉\mathbf{J} follow the semicircle law with distribution μD\mu_{D} as in (2.2). Then we have

𝔼[eitσ]=B1(2t)t\mathbb{E}[e^{it\sigma}]=\frac{B_{1}(2t)}{t} (2.22)
Proof.

By [7, Theorem 6.2.3], it is enough to show that the inverse Fourier transform of B_{1}(2t)/t is the density of the semicircle law.

Note that by the inversion formula we have

\dfrac{1}{2\pi}\int_{\mathbb{R}}\dfrac{1}{t}B_{1}\left(2t\right)e^{-itx}dt =\dfrac{1}{2i\pi^{2}}\int_{\mathbb{R}}\int_{0}^{\pi}\dfrac{1}{t}e^{2it\cos\theta}\cos\theta\,e^{-itx}\,d\theta\,dt
=\dfrac{1}{2i\pi^{2}}\int_{0}^{\pi}\cos\theta\underbrace{\int_{\mathbb{R}}\dfrac{1}{t}e^{it(2\cos\theta-x)}dt}_{=i\pi\cdot\text{Sign}(2\cos\theta-x)}d\theta
=\dfrac{1}{2\pi}\int_{0}^{\pi}\cos\theta\cdot\text{Sign}(2\cos\theta-x)\,d\theta,

where \text{Sign}(\cdot) is the sign function and the inner integral is understood as a Cauchy principal value.

Clearly, we have \text{Sign}(2\cos\theta-x)=-1 for all \theta\in[0,\pi] when x>2, and \text{Sign}(2\cos\theta-x)=1 when x<-2. In both cases, the above integral is zero due to \int_{0}^{\pi}\cos\theta\,d\theta=0.

Consider the case that -2\leq x\leq 2. Set u=2\cos\theta-x and y=(u+x)/2=\cos\theta, so that d\theta=-dy/\sqrt{1-y^{2}} and 2\cos\theta-x>0 exactly when y>x/2. The above integral becomes

\dfrac{1}{2\pi}\int_{-1}^{1}\dfrac{y}{\sqrt{1-y^{2}}}\,\text{Sign}(2y-x)\,dy =\dfrac{1}{2\pi}\left(\int_{x/2}^{1}\dfrac{y}{\sqrt{1-y^{2}}}dy-\int_{-1}^{x/2}\dfrac{y}{\sqrt{1-y^{2}}}dy\right)
=\dfrac{1}{2\pi}\left(\sqrt{1-\tfrac{x^{2}}{4}}+\sqrt{1-\tfrac{x^{2}}{4}}\right)
=\dfrac{1}{2\pi}\sqrt{4-x^{2}},

which is the density of the semicircle law. ∎
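Lemma 2.10 can also be verified numerically. In the sketch below, scipy.special.jv is the standard Bessel function of the first kind, which coincides with B_n in Definition 2.7 for integer n; since the semicircle law is symmetric, its characteristic function is real, so it suffices to integrate cos(tx) against the density (2.2). The test values of t are arbitrary.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import jv   # Bessel function of the first kind (B_n in Definition 2.7)

def char_numeric(t):
    """E[exp(i t sigma)] for the semicircle law, computed by quadrature."""
    rho = lambda x: np.sqrt(4 - x**2) / (2 * np.pi)
    val, _ = quad(lambda x: np.cos(t * x) * rho(x), -2, 2)
    return val

for t in (0.5, 1.0, 3.0, 7.0):
    print(t, char_numeric(t), jv(1, 2 * t) / t)   # the two columns should agree, as in (2.22)
```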

The following lemma gives the asymptotics of the derivative of the moment generating function of \sigma.

Lemma 2.11.

Recall that the eigenvalues σ\sigma of N×NN\times N normalized symmetric Wigner matrix 𝐉\mathbf{J} follow the semicircle law with distribution μD\mu_{D} as in (2.2). Then we have:

limt𝔼σμD[σetσ]t3/2e2t=12π.\lim_{t\to\infty}\frac{\mathbb{E}_{\sigma\sim\mu_{D}}[\sigma e^{t\sigma}]}{t^{-3/2}e^{2t}}=\frac{1}{2\sqrt{\pi}}. (2.23)
Proof.

Substitute tt by itit in Lemma 2.10, then we have

𝔼[etσ]=B1(2it)2it\mathbb{E}[e^{t\sigma}]=\frac{B_{1}(-2it)}{-2it} (2.24)

where B1()B_{1}(\cdot) is the Bessel function.

By equation (2.19), we get

𝔼[etσ]=I1(2t)2t,\mathbb{E}[e^{t\sigma}]=\frac{I_{1}(2t)}{2t}, (2.25)

where I1(2t)I_{1}(2t) is the modified Bessel function.

Thus, the derivative of the moment generating function can be expressed as

𝔼[σetσ]=t2I1(2t)+2t1I1(2t)\mathbb{E}[\sigma e^{t\sigma}]=-t^{-2}I_{1}(2t)+2t^{-1}I_{1}^{\prime}(2t) (2.26)

Combining (2.26) and Lemma 2.8, we have

𝔼[σetσ]=I2(2t)t.\mathbb{E}[\sigma e^{t\sigma}]=\frac{I_{2}(2t)}{t}. (2.27)

By Lemma 2.9, this yields the desired result. ∎

Similarly, we have the following result.

Lemma 2.12.

Assume the same setting as in Lemma 2.11. Then we have

limt𝔼σμD[etσ]t3/2e2t=14π.\lim_{t\to\infty}\frac{\mathbb{E}_{\sigma\sim\mu_{D}}[e^{t\sigma}]}{t^{-3/2}e^{2t}}=\frac{1}{4\sqrt{\pi}}. (2.28)

Let m(s)m(s) be defined as in (2.9). We define

p(s,β):=2βscm(s).p(s,\beta):=\frac{2\beta s}{c}-m(s). (2.29)

Recall that \beta_{c} is the critical inverse temperature defined in (2.10). For any \beta\in(0,\beta_{c}), there exists a unique solution of p(s,\beta)=0 on the interval (2,\infty), denoted by s_{\beta}. We can solve for s_{\beta} explicitly: s_{\beta}=2(1-(1-\beta/\beta_{c})^{2})^{-1/2}. For \beta\geq\beta_{c}, we simply define s_{\beta}=2.

We can get the asymptotic limit of R(t)R(t) as follows.

Lemma 2.13.

[2, Lemma 3.3] Recall that sβ=2(1(1β/βc)2)1/2s_{\beta}=2(1-(1-\beta/\beta_{c})^{2})^{-1/2} for any β(0,βc)\beta\in(0,\beta_{c}), and sβ=2s_{\beta}=2 for ββc\beta\geq\beta_{c}. Let Ψ=0\Psi=0 for β<βc\beta<\beta_{c}, Ψ=12\Psi=\frac{1}{2} for β=βc\beta=\beta_{c}, and Ψ=32\Psi=\frac{3}{2} for β>βc\beta>\beta_{c}. Then there exists a constant Cβ>0C_{\beta}>0 such that

limxR(x)xΨe2xsβ=Cβ.\lim_{x\to\infty}\frac{R(x)}{x^{-\Psi}e^{2xs_{\beta}}}=C_{\beta}. (2.30)

Moreover, we have

Cβ={β(cm(sβ)+1)2βcm(sβ),β<βcβ(c+1)c,β=βccβ(4β+1)(4βc)2,β>βcC_{\beta}=\begin{cases}\frac{\beta(cm(s_{\beta})+1)}{2\beta-cm^{\prime}(s_{\beta})},\quad&\beta<\beta_{c}\\ \frac{\beta(c+1)}{c},\quad&\beta=\beta_{c}\\ \frac{c\beta(4\beta+1)}{(4\beta-c)^{2}},\quad&\beta>\beta_{c}\end{cases}

where cc is the coefficient constant defined in (2.8) and m(s)m(s) is defined as in (2.9).

Combining Lemma 2.11 and Lemma 2.13, we can characterize the limit of H(t) as t\to\infty. In order to obtain a precise asymptotic limit, we also take a Laplace transform to derive the following identity.

Define the Laplace transform of the function R(t)R(t) for z>2z>2 by

R(z):=0e2ztR(t)𝑑t.\mathcal{L}_{R}(z):=\int_{0}^{\infty}e^{-2zt}R(t)dt. (2.31)
Lemma 2.14.

The Laplace transform R(z)\mathcal{L}_{R}(z) satisfies the equation

2zR(z)1=cm(z)(1+β1R(z)),2z\mathcal{L}_{R}(z)-1=cm(z)(1+\beta^{-1}\mathcal{L}_{R}(z)), (2.32)

where cc is the coefficient constant defined in (2.8) and m(s)m(s) is defined as in (2.9).

Proof.

Note that

K(t)R(t)=K(t)e2c0tK(w)𝑑w=12ctR(t).K(t)R(t)=K(t)e^{2c\int_{0}^{t}K(w)dw}=\frac{1}{2c}\partial_{t}R(t).

Then we have the linear Volterra integro-differential equation

R(t)=2cK(t)R(t)=2c(𝔼[e2σt]+β10tR(r)𝔼[e2σ(tr)]𝑑r).R^{\prime}(t)=2cK(t)R(t)=2c\left({\mathbb{E}[e^{2\sigma t}]+\beta^{-1}\int_{0}^{t}R(r)\mathbb{E}[e^{2\sigma(t-r)}]dr}\right). (2.33)

The Laplace transform of the LHS in (2.33) is

R(z)=R(0)+2zR(z)=1+2zR(z).\mathcal{L}_{R^{\prime}}(z)=-R(0)+2z\mathcal{L}_{R}(z)=-1+2z\mathcal{L}_{R}(z).

Note that the integral on the RHS of (2.33) is the convolution of R(t) and \mathbb{E}[e^{2\sigma t}]. We write it as (\mathbb{E}[e^{2\sigma\cdot}]*R)(t) and use the fact that its Laplace transform equals the product of the Laplace transforms of the two functions.

Thus, the RHS becomes

\int_{0}^{\infty}e^{-2zt}R^{\prime}(t)dt =\int_{0}^{\infty}e^{-2zt}\left(2c\left(\mathbb{E}[e^{2\sigma t}]+\beta^{-1}\int_{0}^{t}R(r)\mathbb{E}[e^{2\sigma(t-r)}]dr\right)\right)dt
=c\,\mathbb{E}\left[\frac{1}{z-\sigma}\right]+2c\beta^{-1}\mathbb{E}\left[\int_{0}^{\infty}e^{-2t(z-\sigma)}dt\right]\mathcal{L}_{R}(z)
=c\,\mathbb{E}\left[\frac{1}{z-\sigma}\right]+c\beta^{-1}\mathbb{E}\left[\frac{1}{z-\sigma}\right]\mathcal{L}_{R}(z).

Hence, combining the Laplace transforms of the left and right sides, we obtain

2zR(z)=1+cm(z)+cβ1m(z)R(z).\displaystyle 2z\mathcal{L}_{R}(z)=1+cm(z)+c\beta^{-1}m(z)\mathcal{L}_{R}(z).

We now turn to the proof of Theorem 2.5.

Proof of Theorem 2.5.
  1. (i)

    We start by considering the case where β>βc\beta>\beta_{c}.

    Using Lemma 2.13, we obtain an asymptotic limit for R(t)R(t) as:

    R(t)tCβt3/2e4t,R(t)\sim_{t\to\infty}C_{\beta}t^{-3/2}e^{4t},

    where Cβ=cβ(4β+1)(4βc)2C_{\beta}=\frac{c\beta(4\beta+1)}{(4\beta-c)^{2}}.

    Combining the asymptotic limit for R(t)R(t) and Lemma 2.11, we notice that the limit of the first term of H(t)H(t) defined as in (2.14) is:

    limtR(t)1𝔼[σe2σt]=limt𝔼[σe2σt](2t)3/2e4t(2t)3/2e4tR(t)Cβt3/2e4tCβt3/2e4t=25/2πCβ.\lim_{t\to\infty}R(t)^{-1}\mathbb{E}[\sigma e^{2\sigma t}]=\lim_{t\to\infty}\frac{\frac{\mathbb{E}[\sigma e^{2\sigma t}]}{(2t)^{-3/2}e^{4t}}(2t)^{-3/2}e^{4t}}{\frac{R(t)}{C_{\beta}t^{-3/2}e^{4t}}C_{\beta}t^{-3/2}e^{4t}}=\frac{2^{-5/2}}{\sqrt{\pi}C_{\beta}}. (2.34)

    Next, we normalize the integral in equation (2.14) by the asymptotic form of R(t) and split it into three parts, in the regime t\to\infty, x\to\infty with x/t\to 0:

    Cβ1t32e4t0tR(r)𝔼[σe2σ(tr)]𝑑r\displaystyle C_{\beta}^{-1}t^{\frac{3}{2}}e^{-4t}\int_{0}^{t}R(r)\mathbb{E}[\sigma e^{2\sigma(t-r)}]dr =Cβ1t32e4t(0xR(r)𝔼[σe2σ(tr)]dr\displaystyle=C_{\beta}^{-1}t^{\frac{3}{2}}e^{-4t}\Bigg{(}\int_{0}^{x}R(r)\mathbb{E}[\sigma e^{2\sigma(t-r)}]dr
    +xtxR(r)𝔼[σe2σ(tr)]dr+txtR(r)𝔼[σe2σ(tr)]dr)\displaystyle+\int_{x}^{t-x}R(r)\mathbb{E}[\sigma e^{2\sigma(t-r)}]dr+\int_{t-x}^{t}R(r)\mathbb{E}[\sigma e^{2\sigma(t-r)}]dr\Bigg{)}
    =Cβ1t32e4t(0xR(r)𝔼[σe2σ(tr)]dr\displaystyle=C_{\beta}^{-1}t^{\frac{3}{2}}e^{-4t}\Bigg{(}\int_{0}^{x}R(r)\mathbb{E}[\sigma e^{2\sigma(t-r)}]dr
    +xtxR(r)𝔼[σe2σ(tr)]dr+0xR(tr)𝔼[σe2σ(r)]dr)\displaystyle+\int_{x}^{t-x}R(r)\mathbb{E}[\sigma e^{2\sigma(t-r)}]dr+\int_{0}^{x}R(t-r)\mathbb{E}[\sigma e^{2\sigma(r)}]dr\Bigg{)}
    =:I1+I2+I3.\displaystyle=:I_{1}+I_{2}+I_{3}.

    We now estimate each term separately. For I1I_{1}, we have:

    I1\displaystyle I_{1} =Cβ1t32e4t0xR(r)𝔼[σe2σ(tr)]𝑑r\displaystyle=C_{\beta}^{-1}t^{\frac{3}{2}}e^{-4t}\int_{0}^{x}R(r)\mathbb{E}[\sigma e^{2\sigma(t-r)}]dr
    =Cβ1t32e4t0xR(r)𝔼[σe2σ(tr)](2(tr))32e4(tr)(2(tr))32e4(tr)𝑑r\displaystyle=C_{\beta}^{-1}t^{\frac{3}{2}}e^{-4t}\int_{0}^{x}R(r)\frac{\mathbb{E}[\sigma e^{2\sigma(t-r)}]}{(2(t-r))^{-\frac{3}{2}}e^{4(t-r)}}(2(t-r))^{-\frac{3}{2}}e^{4(t-r)}dr
    =232Cβ10xR(r)e4r(12π+o(1))(ttr)32𝑑r\displaystyle=2^{-\frac{3}{2}}C_{\beta}^{-1}\int_{0}^{x}R(r)e^{-4r}\left({\frac{1}{2\sqrt{\pi}}+o(1)}\right)\left({\frac{t}{t-r}}\right)^{\frac{3}{2}}dr
    =232Cβ10xR(r)e4r(12π+o(1))(1+o(1))𝑑r\displaystyle=2^{-\frac{3}{2}}C_{\beta}^{-1}\int_{0}^{x}R(r)e^{-4r}\left({\frac{1}{2\sqrt{\pi}}+o(1)}\right)(1+o(1))dr
    =252Cβ11π0xR(r)e4r𝑑r\displaystyle=2^{-\frac{5}{2}}C_{\beta}^{-1}\frac{1}{\sqrt{\pi}}\int_{0}^{x}R(r)e^{-4r}dr

    For I2I_{2}, we can show that it is of smaller order than I1I_{1} and I3I_{3} and can be neglected. Indeed, we have

    I2\displaystyle I_{2} =Cβ1t32e4txtxR(r)𝔼[σe2σ(tr)]𝑑r\displaystyle=C_{\beta}^{-1}t^{\frac{3}{2}}e^{-4t}\int_{x}^{t-x}R(r)\mathbb{E}[\sigma e^{2\sigma(t-r)}]dr
    =Cβ1t32e4txtxR(r)Cβr32e4rCβr32e4r𝔼[σe2σ(tr)](2(tr))32e4(tr)(2(tr))32e4(tr)𝑑r\displaystyle=C_{\beta}^{-1}t^{\frac{3}{2}}e^{-4t}\int_{x}^{t-x}\frac{R(r)}{C_{\beta}r^{-\frac{3}{2}}e^{4r}}C_{\beta}r^{-\frac{3}{2}}e^{4r}\frac{\mathbb{E}[\sigma e^{2\sigma(t-r)}]}{(2(t-r))^{-\frac{3}{2}}e^{4(t-r)}}(2(t-r))^{-\frac{3}{2}}e^{4(t-r)}dr
    =Cβ1t32e4txtxCβr32e4r(12π+o(1))(2(tr))32e4(tr)𝑑r\displaystyle=C_{\beta}^{-1}t^{\frac{3}{2}}e^{-4t}\int_{x}^{t-x}C_{\beta}r^{-\frac{3}{2}}e^{4r}\left({\frac{1}{2\sqrt{\pi}}+o(1)}\right)(2(t-r))^{-\frac{3}{2}}e^{4(t-r)}dr
    =2521πxtx(tr(tr))32𝑑r\displaystyle=2^{-\frac{5}{2}}\frac{1}{\sqrt{\pi}}\int_{x}^{t-x}\left({\frac{t}{r(t-r)}}\right)^{\frac{3}{2}}dr
    =o(1)\displaystyle=o(1)

    Finally, for I3I_{3}, we have:

    I3=0x𝔼[σe2σr]e4r𝑑r.I_{3}=\int_{0}^{x}\mathbb{E}[\sigma e^{2\sigma r}]e^{-4r}dr.

    Note that we have

    0|R(r)|e4r𝑑r< and 0𝔼[σe2σr]e4r𝑑r<.\int_{0}^{\infty}|R(r)|e^{-4r}dr<\infty\mbox{ and }\,\int_{0}^{\infty}\mathbb{E}[\sigma e^{2\sigma r}]e^{-4r}dr<\infty.

    Then we have

    limtH(t)=252Cβπ+252β1Cβπ0R(r)e4r𝑑r+β10𝔼[σe2σr]e4r𝑑r.\lim_{t\to\infty}H(t)=\frac{2^{-\frac{5}{2}}}{C_{\beta}\sqrt{\pi}}+\frac{2^{-\frac{5}{2}}\beta^{-1}}{C_{\beta}\sqrt{\pi}}\int_{0}^{\infty}R(r)e^{-4r}dr+\beta^{-1}\int_{0}^{\infty}\mathbb{E}[\sigma e^{2\sigma r}]e^{-4r}dr. (2.35)

    By Lemma 2.14, we have

    0e4rR(r)𝑑r=R(2)=β(1+c)4βc\int_{0}^{\infty}e^{-4r}R(r)dr=\mathcal{L}_{R}(2)=\frac{\beta(1+c)}{4\beta-c} (2.36)

    Also, note that

    0𝔼[σe2σr]e4r𝑑r\displaystyle\int_{0}^{\infty}\mathbb{E}[\sigma e^{2\sigma r}]e^{-4r}dr =𝔼[σ0e2(2σ)r𝑑r]\displaystyle=\mathbb{E}\left[\sigma\int_{0}^{\infty}e^{-2(2-\sigma)r}dr\right]
    =𝔼[σ2(2σ)]\displaystyle=\mathbb{E}\left[\frac{\sigma}{2(2-\sigma)}\right]
    =12(1+𝔼[22σ])=12.\displaystyle=\frac{1}{2}\left({-1+\mathbb{E}\left[\frac{2}{2-\sigma}\right]}\right)=\frac{1}{2}. (2.37)

    Hence, plug (2.36) and (2.37) into (2.35) we have

    limtH(t)=252Cβπ+(252Cβπ)(1+c4βc)+12β1=252(4βc)πcβ+12β1.\lim_{t\to\infty}H(t)=\frac{2^{-\frac{5}{2}}}{C_{\beta}\sqrt{\pi}}+\left({\frac{2^{-\frac{5}{2}}}{C_{\beta}\sqrt{\pi}}}\right)\left({\frac{1+c}{4\beta-c}}\right)+\frac{1}{2}\beta^{-1}=\frac{2^{-\frac{5}{2}}(4\beta-c)}{\sqrt{\pi}c\beta}+\frac{1}{2}\beta^{-1}. (2.38)
  2. (ii)

    As β=βc\beta=\beta_{c}, by Lemma 2.13, we have

    R(t)tCβt1/2e4tR(t)\sim_{t\uparrow\infty}C_{\beta}t^{-1/2}e^{4t}

    where Cβ=β(c+1)cC_{\beta}=\frac{\beta(c+1)}{c}.

    By Lemma 2.11 and Lemma 2.13, the first term R1(t)𝔼[σe2σt]R^{-1}(t)\mathbb{E}[\sigma e^{2\sigma t}] in H(t)H(t) converges to 0 as tt\to\infty.

    Note that we have for x,t,x/t0x\to\infty,t\to\infty,x/t\to 0

    Cβ1t12e4t0tR(r)𝔼[σe2σ(tr)]𝑑r\displaystyle C_{\beta}^{-1}t^{\frac{1}{2}}e^{-4t}\int_{0}^{t}R(r)\mathbb{E}[\sigma e^{2\sigma(t-r)}]dr =Cβ1t12e4t(0xR(r)𝔼[σe2σ(tr)]dr\displaystyle=C_{\beta}^{-1}t^{\frac{1}{2}}e^{-4t}\Bigg{(}\int_{0}^{x}R(r)\mathbb{E}[\sigma e^{2\sigma(t-r)}]dr
    +xtxR(r)𝔼[σe2σ(tr)]dr+0xR(tr)𝔼[σe2σr]dr)\displaystyle+\int_{x}^{t-x}R(r)\mathbb{E}[\sigma e^{2\sigma(t-r)}]dr+\int_{0}^{x}R(t-r)\mathbb{E}[\sigma e^{2\sigma r}]dr\Bigg{)}
    =:E1+E2+E3\displaystyle=:E_{1}+E_{2}+E_{3}

    Then we have

    E1\displaystyle E_{1} =232Cβ10xR(r)e4rt12(tr)32(12π+o(1))𝑑r=o(1)\displaystyle=2^{-\frac{3}{2}}C_{\beta}^{-1}\int_{0}^{x}R(r)e^{-4r}\frac{t^{\frac{1}{2}}}{(t-r)^{\frac{3}{2}}}\left({\frac{1}{2\sqrt{\pi}}+o(1)}\right)dr=o(1)

    where it follows from t12(tr)32=1t(1+o(1))=o(1)\frac{t^{\frac{1}{2}}}{(t-r)^{\frac{3}{2}}}=\frac{1}{t}(1+o(1))=o(1).

    Similarly, we have E3=o(1)E_{3}=o(1).

    Also, we have

    E2=232πxtx(tr)12(1tr)32𝑑r=O(x12).\displaystyle E_{2}=\frac{2^{-\frac{3}{2}}}{\sqrt{\pi}}\int_{x}^{t-x}\left({\frac{t}{r}}\right)^{\frac{1}{2}}\left({\frac{1}{t-r}}\right)^{\frac{3}{2}}dr=O(x^{-\frac{1}{2}}).

    Hence, we have

    limtH(t)=0.\lim_{t\to\infty}H(t)=0. (2.39)
  3. (iii)

    In the case of β<βc\beta<\beta_{c}:

    By Lemma 2.13, we have

    R(t)tCβe2sβt,R(t)\sim_{t\uparrow\infty}C_{\beta}e^{2s_{\beta}t},

    where Cβ=β(cm(sβ)+1)2βcm(sβ)C_{\beta}=\frac{\beta(cm(s_{\beta})+1)}{2\beta-cm^{\prime}(s_{\beta})}.

    By Lemma 2.11 and Lemma 2.13, the first term R1(t)𝔼[σe2σt]R^{-1}(t)\mathbb{E}[\sigma e^{2\sigma t}] in H(t)H(t) converges to 0 as tt\to\infty.

    Note that we have for x,t,x/t0x\to\infty,t\to\infty,x/t\to 0

    Cβ1e2sβt0tR(r)𝔼[σe2σ(tr)]𝑑r\displaystyle C_{\beta}^{-1}e^{-2s_{\beta}t}\int_{0}^{t}R(r)\mathbb{E}[\sigma e^{2\sigma(t-r)}]dr =Cβ1e2sβt(0xR(r)𝔼[σe2σ(tr)]dr\displaystyle=C_{\beta}^{-1}e^{-2s_{\beta}t}\Bigg{(}\int_{0}^{x}R(r)\mathbb{E}[\sigma e^{2\sigma(t-r)}]dr
    +xtxR(r)𝔼[σe2σ(tr)]dr+0xR(tr)𝔼[σe2σr]dr)\displaystyle+\int_{x}^{t-x}R(r)\mathbb{E}[\sigma e^{2\sigma(t-r)}]dr+\int_{0}^{x}R(t-r)\mathbb{E}[\sigma e^{2\sigma r}]dr\Bigg{)}
    =:F1+F2+F3\displaystyle=:F_{1}+F_{2}+F_{3}

    Then we have

    F1\displaystyle F_{1} =232Cβ10xR(r)e4r1(tr)32e2(sβ2)t(12π+o(1))𝑑r=o(1)\displaystyle=2^{-\frac{3}{2}}C_{\beta}^{-1}\int_{0}^{x}R(r)e^{-4r}\frac{1}{(t-r)^{\frac{3}{2}}}e^{-2(s_{\beta}-2)t}\left({\frac{1}{2\sqrt{\pi}}+o(1)}\right)dr=o(1)

    where it follows from (t-r)^{-3/2}=o(1) and s_{\beta}>2.

    Similarly, we have F3=o(1)F_{3}=o(1).

    Also, we have

    F2=232πxtxe2(sβ2)(tr)(1tr)32𝑑r=O(x12).\displaystyle F_{2}=\frac{2^{-\frac{3}{2}}}{\sqrt{\pi}}\int_{x}^{t-x}e^{-2(s_{\beta}-2)(t-r)}\left({\frac{1}{t-r}}\right)^{\frac{3}{2}}dr=O(x^{-\frac{1}{2}}).

    Hence, we have

    limtH(t)=0.\lim_{t\to\infty}H(t)=0. (2.40)

2.3 Proof of Theorem 2.4

Recall that ff^{\prime} is non-negative and Lipschitz as defined in (2.1). Recall that K=K(t,t)K=K(t,t) is defined in (2.4). Define

Rτθ(K):=eτθf(K(s))𝑑sR_{\tau}^{\theta}(K):=e^{-\int_{\tau}^{\theta}f^{\prime}(K(s))ds} (2.41)

and

DRτθ(K)=ddτRτθ(K)=f(K(τ,τ))eτθf(K(s))𝑑sDR_{\tau}^{\theta}(K)=\frac{d}{d\tau}R_{\tau}^{\theta}(K)=f^{\prime}(K(\tau,\tau))e^{-\int_{\tau}^{\theta}f^{\prime}(K(s))ds} (2.42)

We have the following bound on Rτθ(K)R_{\tau}^{\theta}(K) and DRτθ(K)DR_{\tau}^{\theta}(K).

Lemma 2.15.

[2, Theorem 5.3] Recall that we define f,Rτθ(K),f^{\prime},R_{\tau}^{\theta}(K), and DRτθ(K)DR_{\tau}^{\theta}(K) as above. Then we have

  1. 1.

    for any 0τθT0\leq\tau\leq\theta\leq T and K𝐂([0,T]2)K\in\mathbf{C}([0,T]^{2}), we have

    0Rτθ(K)1, and 0t|Rτθ(K)|𝑑τ1.0\leq R_{\tau}^{\theta}(K)\leq 1,\,\mbox{ and }\int_{0}^{t}|R_{\tau}^{\theta}(K)|d\tau\leq 1. (2.43)
  2. 2.

    for every θT\theta\leq T, we have

    supτθ|Rτθ(K)Rτθ(K~)|f0θ|K(s,s)K~(s,s)|𝑑s,\sup_{\tau\leq\theta}|R_{\tau}^{\theta}(K)-R_{\tau}^{\theta}(\widetilde{K})|\leq\|f^{\prime}\|_{\mathcal{L}}\int_{0}^{\theta}|K(s,s)-\widetilde{K}(s,s)|ds, (2.44)

    where f\|f^{\prime}\|_{\mathcal{L}} is the Lipschitz norm of ff^{\prime}.

  3. 3.

    for any 0τθT0\leq\tau\leq\theta\leq T,

    |DRτθ(K)DRτθ(K~)|f{|K(τ,τ)K~(τ,τ)|+(DRτθ(K)+DRτθ(K~))0θ|K(s,s)K~(s,s)|𝑑s}.\displaystyle|DR_{\tau}^{\theta}(K)-DR_{\tau}^{\theta}(\widetilde{K})|\leq\|f^{\prime}\|_{\mathcal{L}}\Bigg{\{}|K(\tau,\tau)-\widetilde{K}(\tau,\tau)|+\left({DR_{\tau}^{\theta}(K)+DR_{\tau}^{\theta}(\widetilde{K})}\right)\int_{0}^{\theta}|K(s,s)-\widetilde{K}(s,s)|ds\Bigg{\}}.

Consider the following collections of functions with domain space 2×𝐂([0,T])\mathbb{R}^{2}\times\mathbf{C}([0,T]) for T>0T>0 and range space one of 𝐂([0,T]j)\mathbf{C}([0,T]^{j}) for j=1,2,3j=1,2,3:

𝒢1:={gj,j=1,2,3:g1(Y0,σ,B)(t)=σeσt(Y0)2,g2()(t)=σBt2,g3()(t)=σY0Bt}.\displaystyle\mathcal{G}_{1}:=\{g_{j},j=1,2,3:g_{1}(Y_{0},\sigma,B_{\cdot})(t)=\sigma e^{\sigma t}(Y_{0})^{2},g_{2}(\cdot)(t)=\sigma B_{t}^{2},g_{3}(\cdot)(t)=\sigma Y_{0}B_{t}\}.
𝒢2:={gj,j=4,5:g4(Y0,σ,B)(s,t)=σY0Bseσt,g5()(s,t)=σ2Y0Bseσt}.\displaystyle\mathcal{G}_{2}:=\{g_{j},j=4,5:g_{4}(Y_{0},\sigma,B_{\cdot})(s,t)=\sigma Y_{0}B_{s}e^{\sigma t},g_{5}(\cdot)(s,t)=\sigma^{2}Y_{0}B_{s}e^{\sigma t}\}.
𝒢3:={gj,j=6,7:g6(Y0,σ,B)(u,v,t)=σBuBveσt,g7()(u,v,t)=σ2BuBveσt}.\displaystyle\mathcal{G}_{3}:=\{g_{j},j=6,7:g_{6}(Y_{0},\sigma,B_{\cdot})(u,v,t)=\sigma B_{u}B_{v}e^{\sigma t},g_{7}(\cdot)(u,v,t)=\sigma^{2}B_{u}B_{v}e^{\sigma t}\}.

Then our collection of functions is

𝒢=𝒢1𝒢2𝒢3.\mathcal{G}=\mathcal{G}_{1}\cup\mathcal{G}_{2}\cup\mathcal{G}_{3}. (2.45)

Define the empirical measure

νTN:=1Ni=1NδY0i,σi,B[0,T]i.\nu_{T}^{N}:=\frac{1}{N}\sum_{i=1}^{N}\delta_{Y_{0}^{i},\sigma^{i},B^{i}_{[0,T]}}. (2.46)

Define for g𝒢g\in\mathcal{G}

𝒞N:=g(Y0,σ,B)𝑑νTN(Y0,σ,B)=1Ni=1Ng(Y0i,σi,Bi),\mathcal{C}_{N}:=\int g(Y_{0},\sigma,B_{\cdot})d\nu_{T}^{N}(Y_{0},\sigma,B_{\cdot})=\frac{1}{N}\sum_{i=1}^{N}g(Y_{0}^{i},\sigma^{i},B^{i}_{\cdot}), (2.47)

where note that for g𝒢jg\in\mathcal{G}_{j}, g𝑑νTN𝐂([0,T]j)\int gd\nu_{T}^{N}\in\mathbf{C}([0,T]^{j}) for j=1,2,3j=1,2,3.

Proof of Theorem 2.4.
  1. 1.

    Existence and uniqueness of the limit

    Applying Itô's formula [15, Theorem 7.6.1], we have

    Yti=e0t(σif(KN(r)))𝑑rY0i+β1/20test(σif(KN(r)))𝑑r𝑑Bsi.Y_{t}^{i}=e^{\int_{0}^{t}\left({\sigma^{i}-f^{\prime}(K_{N}(r))}\right)dr}Y_{0}^{i}+\beta^{-1/2}\int_{0}^{t}e^{\int_{s}^{t}\left({\sigma^{i}-f^{\prime}(K_{N}(r))}\right)dr}dB_{s}^{i}. (2.48)

    Define F_{t}(K,\sigma)=f^{\prime}(K(t,t))-\sigma. By integration by parts, we have

    Yti=e0tFr(KN,σi)𝑑rY0i+β1/2Bti+(β1/20tBsiFs(KN,σi)estFr(KN,σi)𝑑r𝑑s).Y_{t}^{i}=e^{-\int_{0}^{t}F_{r}(K_{N},\sigma^{i})dr}Y_{0}^{i}+\beta^{-1/2}B_{t}^{i}+\left({-\beta^{-1/2}\int_{0}^{t}B_{s}^{i}F_{s}(K_{N},\sigma^{i})e^{-\int_{s}^{t}F_{r}(K_{N},\sigma^{i})dr}ds}\right). (2.49)

    Then we have

    Yti=R0t(KN)eσitY0i=:T1i(t)+β1/2Bti=:T2i(t)+(β1/20tBsieσi(ts)(DRst(KN)σiRst(KN))𝑑s)=:T3i(t).Y_{t}^{i}=\underbrace{R_{0}^{t}(K_{N})e^{\sigma^{i}t}Y_{0}^{i}}_{=:T_{1}^{i}(t)}+\underbrace{\beta^{-1/2}B_{t}^{i}}_{=:T_{2}^{i}(t)}+\underbrace{\left({-\beta^{-1/2}\int_{0}^{t}B_{s}^{i}e^{\sigma^{i}(t-s)}(DR_{s}^{t}(K_{N})-\sigma^{i}R_{s}^{t}(K_{N}))ds}\right)}_{=:T_{3}^{i}(t)}. (2.50)

    We denote the three terms by T_{1}^{i}(t), T_{2}^{i}(t), and T_{3}^{i}(t), respectively, so that

    Yti=T1i(t)+T2i(t)+T3i(t).Y_{t}^{i}=T_{1}^{i}(t)+T_{2}^{i}(t)+T_{3}^{i}(t). (2.51)

    Then the energy HN(t)H_{N}(t) becomes

    HN(t)=1Ni=1Nσi(j=13(Tji(t))2+2jkTji(t)Tki(t)),H_{N}(t)=\frac{1}{N}\sum_{i=1}^{N}\sigma^{i}\left({\sum_{j=1}^{3}(T_{j}^{i}(t))^{2}+2\sum_{j\neq k}T_{j}^{i}(t)T_{k}^{i}(t)}\right), (2.52)

    Plugging in the expressions for T_{1}^{i}(t), T_{2}^{i}(t), and T_{3}^{i}(t), we obtain

    H_{N}(t) =\frac{1}{N}\sum_{i=1}^{N}\sigma^{i}(R_{0}^{t}(K_{N}))^{2}e^{2\sigma^{i}t}(Y_{0}^{i})^{2}+\frac{1}{N}\sum_{i=1}^{N}\sigma^{i}\beta^{-1}(B_{t}^{i})^{2}
    +\frac{\beta^{-1}}{N}\sum_{i=1}^{N}\sigma^{i}\int_{0}^{t}\int_{0}^{t}B_{u}^{i}B_{v}^{i}e^{\sigma^{i}(2t-u-v)}\left(DR_{u}^{t}(K_{N})-\sigma^{i}R_{u}^{t}(K_{N})\right)\left(DR_{v}^{t}(K_{N})-\sigma^{i}R_{v}^{t}(K_{N})\right)dudv
    +\frac{2}{N}\sum_{i=1}^{N}\sigma^{i}\Bigg\{\beta^{-1/2}Y_{0}^{i}B_{t}^{i}e^{\sigma^{i}t}R_{0}^{t}(K_{N})-\beta^{-1/2}\int_{0}^{t}Y_{0}^{i}B_{s}^{i}e^{\sigma^{i}(2t-s)}R_{0}^{t}(K_{N})\left(DR_{s}^{t}(K_{N})-\sigma^{i}R_{s}^{t}(K_{N})\right)ds
    -\beta^{-1}\int_{0}^{t}B_{t}^{i}B_{s}^{i}e^{\sigma^{i}(t-s)}\left(DR_{s}^{t}(K_{N})-\sigma^{i}R_{s}^{t}(K_{N})\right)ds\Bigg\}

    Hence, the above expression for H_{N}(t) defines a map

    Φ:𝐂([0,T]2)×𝐂([0,T])|𝒢1|×𝐂([0,T]2)|𝒢2|×𝐂([0,T]3)|𝒢3|𝐂([0,T])\Phi:\mathbf{C}([0,T]^{2})\times\mathbf{C}([0,T])^{|\mathcal{G}_{1}|}\times\mathbf{C}([0,T]^{2})^{|\mathcal{G}_{2}|}\times\mathbf{C}([0,T]^{3})^{|\mathcal{G}_{3}|}\to\mathbf{C}([0,T])

    such that

    HN=Φ(KN,𝒞N).H_{N}=\Phi(K_{N},\mathcal{C}_{N}).

    By Lemma 2.15, for any 𝒞N\mathcal{C}_{N} defined as in (2.47) and KN,K~N𝐂([0,T]2)K_{N},\widetilde{K}_{N}\in\mathbf{C}([0,T]^{2}), there exists a constant C1>0C_{1}>0 so that

    sup0tT|Φ(KN,𝒞N)Φ(K~N,𝒞N)|C10t|KN(s,s)K~N(s,s)|𝑑s\sup_{0\leq t\leq T}|\Phi(K_{N},\mathcal{C}_{N})-\Phi(\widetilde{K}_{N},\mathcal{C}_{N})|\leq C_{1}\int_{0}^{t}|K_{N}(s,s)-\widetilde{K}_{N}(s,s)|ds (2.53)

    Similarly, we apply Lemma 2.15, then for any 𝒞N,𝒞~N\mathcal{C}_{N},\widetilde{\mathcal{C}}_{N} defined as in (2.47) and K~N𝐂([0,T]2)\widetilde{K}_{N}\in\mathbf{C}([0,T]^{2}), there exists a constant C2>0C_{2}>0 so that

    sup0tT|Φ(K~N,𝒞N)Φ(K~N,𝒞~N)|C2𝒞N𝒞~N\sup_{0\leq t\leq T}|\Phi(\widetilde{K}_{N},\mathcal{C}_{N})-\Phi(\widetilde{K}_{N},\widetilde{\mathcal{C}}_{N})|\leq C_{2}\|\mathcal{C}_{N}-\widetilde{\mathcal{C}}_{N}\|_{\infty} (2.54)

    By Theorem 2.1, we know that KNK_{N} converges to the deterministic limit KK almost surely and KK is unique. By [23, Lemma 3.7], each element in 𝒞N\mathcal{C}_{N} converges to deterministic limits 𝒞\mathcal{C} almost surely under the i.i.d. initial conditions.

    Combining (2.53) and (2.54), then we have

    Φ(KN,𝒞N)Φ(K,𝒞)\displaystyle\|\Phi(K_{N},\mathcal{C}_{N})-\Phi(K,\mathcal{C})\|_{\infty} =Φ(KN,𝒞N)Φ(KN,𝒞)+Φ(KN,𝒞)Φ(K,𝒞)\displaystyle=\|\Phi(K_{N},\mathcal{C}_{N})-\Phi(K_{N},\mathcal{C})+\Phi(K_{N},\mathcal{C})-\Phi(K,\mathcal{C})\|_{\infty}
    Φ(KN,𝒞N)Φ(KN,𝒞)+Φ(KN,𝒞)Φ(K,𝒞)\displaystyle\leq\|\Phi(K_{N},\mathcal{C}_{N})-\Phi(K_{N},\mathcal{C})\|_{\infty}+\|\Phi(K_{N},\mathcal{C})-\Phi(K,\mathcal{C})\|_{\infty}
    C1KNK+C2𝒞N𝒞0, as N.\displaystyle\leq C_{1}\|K_{N}-K\|_{\infty}+C_{2}\|\mathcal{C}_{N}-\mathcal{C}\|_{\infty}\to 0,\mbox{ as }N\to\infty.

    Hence, HNH_{N} converges to deterministic function HH.

    Also, we have H=Φ(K,𝒞)H=\Phi(K,\mathcal{C}). Indeed,

    |HΦ(K,𝒞)||HHN|+|HNΦ(K,𝒞)|0, as N.\displaystyle|H-\Phi(K,\mathcal{C})|\leq|H-H_{N}|+|H_{N}-\Phi(K,\mathcal{C})|\to 0,\mbox{ as }N\to\infty.
  2. 2.

    Equations for the limit points

    Next, we characterize the limit HH as follows. Recall that Rτθ(K)=eτθf(K(s))𝑑sR_{\tau}^{\theta}(K)=e^{-\int_{\tau}^{\theta}f^{\prime}(K(s))ds}. Then YtY_{t} can be expressed as

    Yti=R0t(KN)eσitY0i+β1/20tRst(KN)eσi(ts)𝑑Bsi.Y_{t}^{i}=R_{0}^{t}(K_{N})e^{\sigma^{i}t}Y_{0}^{i}+\beta^{-1/2}\int_{0}^{t}R_{s}^{t}(K_{N})e^{\sigma^{i}(t-s)}dB_{s}^{i}.

    Substituting the above expression for Y_{t}^{i} into H_{N}(t) defined in (2.5), we have

    HN(t)\displaystyle H_{N}(t) =1Ni=1Nσie2σit(Y0i)2(R0t(KN))2+β1Ni=1Nσi(0tRst(KN)eσi(ts)𝑑Bsi)2\displaystyle=\frac{1}{N}\sum_{i=1}^{N}\sigma^{i}e^{2\sigma^{i}t}(Y_{0}^{i})^{2}(R_{0}^{t}(K_{N}))^{2}+\frac{\beta^{-1}}{N}\sum_{i=1}^{N}\sigma^{i}\left({\int_{0}^{t}R_{s}^{t}(K_{N})e^{\sigma^{i}(t-s)}dB_{s}^{i}}\right)^{2}
    +2β1/2Ni=1Nσi(R0t(KN)eσitY0i)0tRst(KN)eσi(ts)𝑑Bsi.\displaystyle+\frac{2\beta^{-1/2}}{N}\sum_{i=1}^{N}\sigma^{i}\left({R_{0}^{t}(K_{N})e^{\sigma^{i}t}Y_{0}^{i}}\right)\int_{0}^{t}R_{s}^{t}(K_{N})e^{\sigma^{i}(t-s)}dB_{s}^{i}.

    As N\to\infty, the first term converges to its expectation in (2.6) by the strong law of large numbers (SLLN) [15, Theorem 2.4.1], the limit of the second term, given in (2.7), is obtained by the SLLN and the Itô isometry [24, Lemma 3.1.5], and the last term converges to zero by the SLLN. Note that taking the limit requires the empirical measure \nu_{T}^{N}:=\frac{1}{N}\sum_{i=1}^{N}\delta_{Y_{0}^{i},\sigma^{i},B^{i}_{[0,T]}} to converge to the desired limit under the i.i.d. initial conditions. Hence, we obtain the desired integro-differential equation as in Theorem 2.4. ∎

2.4 Proof of Corollary 2.6

Proof.

Consider first the regime β>βc\beta>\beta_{c}. Recall that R(t)R(t) is defined as in (2.13) and K(t)K(t) is given in Theorem 2.1. We rewrite K(t)K(t) as follows:

K(t)=R1(t)(𝔼[e2tσ]+β10tR(r)𝔼[e2(tr)σ]𝑑r).K(t)=R^{-1}(t)\left({\mathbb{E}[e^{2t\sigma}]+\beta^{-1}\int_{0}^{t}R(r)\mathbb{E}[e^{2(t-r)\sigma}]dr}\right). (2.55)

Applying Lemma 2.13 and Lemma 2.12 and plugging in the asymptotic limits of R(t) and \mathbb{E}[e^{2t\sigma}], we get

limtK(t)=272π12(4β+1)Cβ(4βc)+12β1.\lim_{t\to\infty}K(t)=\frac{2^{-\frac{7}{2}}\pi^{-\frac{1}{2}}(4\beta+1)}{C_{\beta}(4\beta-c)}+\frac{1}{2}\beta^{-1}. (2.56)

Combining this result and Theorem 2.5, we obtained the desired result.

For \beta<\beta_{c}, we will show that \lim_{t\to\infty}K(t)=C for some non-zero constant C. Take h(t)=R(t)K(t). Note that

R(t)=2cK(t)R(t)=2ch(t).R^{\prime}(t)=2cK(t)R(t)=2ch(t). (2.57)

Thus, we have 2c\mathcal{L}_{h}(z)=2z\mathcal{L}_{R}(z)-1.

By Lemma 2.14, we have

R(z)=1+cm(z)2zcβ1m(z)\mathcal{L}_{R}(z)=\frac{1+cm(z)}{2z-c\beta^{-1}m(z)} (2.58)

Hence, we get

\mathcal{L}_{h}(z)=\frac{1}{2c}\left(\frac{cm(z)(2z+\beta^{-1})}{2z-c\beta^{-1}m(z)}\right). (2.59)

Note that \mathcal{L}_{h}(z) has a simple pole at s_{\beta}, the solution of 2z=c\beta^{-1}m(z). Thus there exists a constant C>0 such that

\lim_{z\to 0}z\mathcal{L}_{h}(z+s_{\beta})=C (2.60)

By [4, Lemma 7.2], we have

limte2sβth(t)=C\lim_{t\to\infty}e^{-2s_{\beta}t}h(t)=C (2.61)

Hence, \lim_{t\to\infty}K(t) is a non-zero constant, and the desired result follows from Theorem 2.5.

For \beta=\beta_{c}, we apply the same argument as for \beta<\beta_{c}, and \lim_{t\to\infty}K(t)=C_{1} for some non-zero constant C_{1}. Hence, we again obtain the desired result by Theorem 2.5. ∎

3 Gradient descent in spherical SK model: hitting time analysis

In this section, we study the complexity of using the gradient descent algorithm to find the extreme eigenvectors of the Wigner matrix. This is related to our main results on the asymptotic limit of the energy in Section 2, which describe the time for the SSK model to reach its equilibrium state.

Recall that we previously defined the spherical Sherrington-Kirkpatrick (SSK) model on a sphere of radius \sqrt{N}. For simplicity, we consider the SSK model on the unit sphere in this section, characterized by the Hamiltonian

H𝐉(X)=XT𝐉X,H_{\mathbf{J}}(X)=X^{T}\mathbf{J}X,

where X=(X1,,XN)X=(X_{1},\dots,X_{N}) with X2=1\|X\|_{2}=1 and 𝐉\mathbf{J} is the normalized symmetric Wigner matrix.

We denote the N eigenvalues of \mathbf{J}, in increasing order, by

λ1λ2λN,\lambda_{1}\leq\lambda_{2}\leq\dots\leq\lambda_{N},

and v1,v2,,vNv_{1},v_{2},\dots,v_{N} be an orthonormal basis of eigenvectors of 𝐉\mathbf{J} so that 𝐉vi=λivi\mathbf{J}v_{i}=\lambda_{i}v_{i} for i=1,,Ni=1,\dots,N.

The gradient descent algorithm (a.k.a. zero-temperature dynamics) of the SSK model on the unit sphere is defined as follows:

dXt=𝕊N1H𝐉(Xt)dt,dX_{t}=-\nabla_{\mathbb{S}^{N-1}}H_{\mathbf{J}}(X_{t})dt, (3.1)

where the initial data X0={X0i}1iNX_{0}=\{X_{0}^{i}\}_{1\leq i\leq N} is uniformly distributed on the unit sphere 𝕊N1\mathbb{S}^{N-1}, and 𝕊N1\nabla_{\mathbb{S}^{N-1}} is the gradient on the unit sphere 𝕊N1\mathbb{S}^{N-1} and

𝕊N1f(x):=f(x)(f(x)x)x,xN\nabla_{\mathbb{S}^{N-1}}f(x):=\nabla f(x)-(\nabla f(x)\cdot x)x,\,x\in\mathbb{R}^{N}

for smooth functions ff.

Before presenting the main result of this section, we need one additional assumption on the Wigner matrix \mathbf{J} to ensure that the Tracy-Widom law holds, which describes the fluctuations of the largest eigenvalue around the spectral edge [29, 30]. Moreover, Lee and Yin proved in [22] that this is not only true for Gaussian ensembles but also holds in more general situations; they establish the following simple sufficient criterion for the Tracy-Widom law.

Theorem 3.1.

[22, Theorem 1.2] Let 𝐉\mathbf{J} be the normalized Wigner matrix as in Definition 1.1 and λ1λ2λN\lambda_{1}\leq\lambda_{2}\leq\dots\lambda_{N} the eigenvalues of 𝐉\mathbf{J} as before. If the off-diagonal entry of the Wigner matrix satisfies

limss4(|Z12|s)=0,\lim_{s\to\infty}s^{4}\mathbb{P}(|Z_{12}|\geq s)=0, (3.2)

then the joint distribution function of the k rescaled largest eigenvalues

(N2/3(λN2)s1,N2/3(λN12)s2,,N2/3(λNk+12)sk)\mathbb{P}(N^{2/3}(\lambda_{N}-2)\leq s_{1},N^{2/3}(\lambda_{N-1}-2)\leq s_{2},\dots,N^{2/3}(\lambda_{N-k+1}-2)\leq s_{k}) (3.3)

has a limit as NN\to\infty, which coincides with that in the GUE and GOE cases, i.e., it weakly converges to the Tracy-Widom distribution. This result also holds for the smallest eigenvalues λ1,,λk\lambda_{1},\dots,\lambda_{k}.

Remark 3.2.

Note that any distribution with a finite fourth moment satisfies the criterion (3.2); however, the converse statement does not hold. See [22, Remark 1.3] for a counterexample.

Fix \varepsilon\in(0,1). Denote by T_{\varepsilon} the hitting time, that is, the first time at which the overlap between the output X_{t} of the gradient descent and the eigenvector v_{1} corresponding to the smallest eigenvalue of \mathbf{J} reaches \varepsilon:

Tε:=inft>0{|v1Xt|ε}.T_{\varepsilon}:=\inf_{t>0}\{|v_{1}\cdot X_{t}|\geq\varepsilon\}.
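The following Python sketch discretizes the zero-temperature dynamics (3.1) with explicit Euler steps and renormalization back to the unit sphere, and records the hitting time. The step size, the tolerance ε = 0.5, the time horizon, and the matrix sizes are illustrative choices, and the eigenvector v_1 is obtained by exact diagonalization only in order to measure the overlap.

```python
import numpy as np

def hitting_time(J, eps=0.5, dt=1e-2, t_max=500.0, seed=None):
    """First time the overlap |v_1 . X_t| of the spherical gradient flow (3.1) reaches eps."""
    rng = np.random.default_rng(seed)
    N = J.shape[0]
    v1 = np.linalg.eigh(J)[1][:, 0]           # eigenvector of the smallest eigenvalue
    X = rng.standard_normal(N)
    X /= np.linalg.norm(X)                    # uniform start on the unit sphere
    t = 0.0
    while t < t_max:
        if abs(v1 @ X) >= eps:
            return t
        grad = 2 * J @ X                      # gradient of H_J(X) = X^T J X
        X = X - dt * (grad - (grad @ X) * X)  # spherical gradient step
        X /= np.linalg.norm(X)                # project back onto the unit sphere
        t += dt
    return np.inf

rng = np.random.default_rng(5)
for N in (100, 200, 400):
    Z = rng.standard_normal((N, N))
    J = (Z + Z.T) / np.sqrt(2 * N)
    # Theorem 3.3: T_eps lies between C1*N^(2/3) and C2*N^(2/3)*log(C3*N) with high probability
    print(N, hitting_time(J, seed=6))
```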

Our main result gives lower and upper bounds on the hitting time T_{\varepsilon}, as follows.

Theorem 3.3.

Assume that the normalized N×NN\times N Wigner matrix 𝐉={Jij}1i,jN={ZijN}1i,jN\mathbf{J}=\{J_{ij}\}_{1\leq i,j\leq N}=\{\frac{Z_{ij}}{\sqrt{N}}\}_{1\leq i,j\leq N} obeys the following condition: for 1i,jN1\leq i,j\leq N

𝔼|Zij|4C\mathbb{E}|Z_{ij}|^{4}\leq C (3.4)

for some constant C>0. Consider the gradient descent described in (3.1) with the same setting as before. Fix \varepsilon\in(0,1). Let the hitting time T_{\varepsilon} be defined as before. For every \delta>0, there exist constants C_{1}=C_{1}(\delta)>0, C_{2}=C_{2}(\delta)>0, and C_{3}=C_{3}(\varepsilon,\delta)>0 such that

limN(C1N2/3<Tε<C2N2/3log(C3N))>1δ.\lim_{N\to\infty}\mathbb{P}\left({C_{1}N^{2/3}<T_{\varepsilon}<C_{2}N^{2/3}\log(C_{3}N)}\right)>1-\delta.

Combining Theorem 3.1 and the continuous mapping theorem [15, Theorem 3.2.10], we obtain the following lemma; in particular, \lambda_{2}-\lambda_{1}=O_{p}(N^{-2/3}).

Lemma 3.4.

[21, Lemma 2.3] Let 𝐉\mathbf{J} be the normalized Wigner matrix. Assume that 𝐉\mathbf{J} obeys the same condition as in (3.4). Let λ1<λ2\lambda_{1}<\lambda_{2} be the two smallest eigenvalues of 𝐉\mathbf{J}. Then for every ε>0\varepsilon>0, there exists δ>0\delta>0 so that

(N2/3(λ2λ1)δ)1ε\mathbb{P}\left({N^{2/3}(\lambda_{2}-\lambda_{1})\geq\delta}\right)\geq 1-\varepsilon (3.5)

for all N large enough. This result also holds for the two largest eigenvalues \lambda_{N-1}<\lambda_{N}.

Define the overlap of the output X_{t} and the eigenvectors v_{i} of \mathbf{J} by h_{i}(t):=v_{i}\cdot X_{t} for i=1,\dots,N. We can solve for h_{i}(t), i=1,\dots,N, explicitly, as follows.

Lemma 3.5.

Assume that the same setting holds as in Theorem 3.3. Then we have

|hj(t)|=|hj(0)|e2λjti=1Nhi2(0)e4λit.|h_{j}(t)|=\frac{|h_{j}(0)|e^{-2\lambda_{j}t}}{\sqrt{\sum_{i=1}^{N}h^{2}_{i}(0)e^{-4\lambda_{i}t}}}. (3.6)
Proof.

Note that the gradient descent (3.1) can be simplified as follows

dXt=(H𝐉(Xt)(H𝐉(Xt)Xt)Xt)dt=2𝐉Xtdt+2H𝐉(Xt)Xtdt.dX_{t}=-\left({\nabla H_{\mathbf{J}}(X_{t})-(\nabla H_{\mathbf{J}}(X_{t})\cdot X_{t})X_{t}}\right)dt=-2\mathbf{J}X_{t}dt+2H_{\mathbf{J}}(X_{t})X_{t}dt. (3.7)

By spectral decomposition we have

𝐉=i=1NviviTλi,\mathbf{J}=\sum_{i=1}^{N}v_{i}v_{i}^{T}\lambda_{i},

where v1,,vNv_{1},\dots,v_{N} are eigenvectors corresponding to eigenvalues λ1λ2λN\lambda_{1}\leq\lambda_{2}\leq\dots\leq\lambda_{N} of 𝐉\mathbf{J}.

Note that

H𝐉(Xt)=Xt𝐉Xt=iλi(hi(t))2.H_{\mathbf{J}}(X_{t})=X_{t}\cdot\mathbf{J}X_{t}=\sum_{i}\lambda_{i}(h_{i}(t))^{2}.

Taking the dot product of both sides of (3.7) with v_{1} and substituting the above expression for H_{\mathbf{J}}(X_{t}), we get

\frac{1}{2}h_{1}^{\prime}(t) =h_{1}(t)H_{\mathbf{J}}(X_{t})-v_{1}\cdot\Big(\sum_{i}v_{i}v_{i}^{T}\lambda_{i}\Big)X_{t}
=h_{1}(t)H_{\mathbf{J}}(X_{t})-\lambda_{1}v_{1}\cdot X_{t}
=h_{1}(t)H_{\mathbf{J}}(X_{t})-\lambda_{1}h_{1}(t).

By the fact that ihi(t)2=1\sum_{i}h_{i}(t)^{2}=1 we have

\frac{1}{2}h_{1}^{\prime}(t)=(H_{\mathbf{J}}(X_{t})-\lambda_{1})h_{1}(t)=\sum_{i=1}^{N}[(\lambda_{i}-\lambda_{1})h_{i}^{2}]h_{1}(t). (3.8)

where we write hi(t)=hih_{i}(t)=h_{i} for convenience, i=1,,Ni=1,\dots,N.

Similarly, we have

12hj(t)=i=1N[(λiλj)hi2]hj(t).\frac{1}{2}h_{j}^{\prime}(t)=\sum_{i=1}^{N}[(\lambda_{i}-\lambda_{j})h_{i}^{2}]h_{j}(t). (3.9)

Multiplying both sides of (3.9) by hj(t)h_{j}(t) yields

(hj2(t))=4i=1N((λiλj)hi2)hj2\displaystyle(h^{2}_{j}(t))^{\prime}=4\sum_{i=1}^{N}\left({(\lambda_{i}-\lambda_{j})h_{i}^{2}}\right)h_{j}^{2}

Denote f(t)=iλihi2(t)f(t)=\sum_{i}\lambda_{i}h_{i}^{2}(t) and F(t)=0tf(s)𝑑sF(t)=\int_{0}^{t}f(s)ds.

Dividing both sides by hj2h_{j}^{2} and using ihi(t)2=1\sum_{i}h_{i}(t)^{2}=1 yields, for j=1,,Nj=1,\dots,N,

(loghj2(t))=4f(t)4λj\displaystyle(\log h_{j}^{2}(t))^{\prime}=4f(t)-4\lambda_{j}

Integrating both sides from 0 to tt yields

hj2(t)=hj(0)2exp(4F(t)4λjt)h^{2}_{j}(t)=h_{j}(0)^{2}\exp\left({4F(t)-4\lambda_{j}t}\right) (3.10)

Differentiating F(t)F(t) and substituting (3.10) yields

F(t)=iλihi2(t)=e4F(t)(iλihi2(0)e4λit).\displaystyle F^{\prime}(t)=\sum_{i}\lambda_{i}h_{i}^{2}(t)=e^{4F(t)}\left({\sum_{i}\lambda_{i}h^{2}_{i}(0)e^{-4\lambda_{i}t}}\right).

Dividing both sides by e4F(t)e^{4F(t)} and integrating from 0 to tt (recall F(0)=0F(0)=0), we obtain

e4F(t)1=ihi2(0)(e4λit1).e^{-4F(t)}-1=\sum_{i}h_{i}^{2}(0)(e^{-4\lambda_{i}t}-1). (3.11)

Since ihi2(0)=1\sum_{i}h_{i}^{2}(0)=1, this gives

F(t)=14log(ihi2(0)e4λit).F(t)=-\frac{1}{4}\log\left({\sum_{i}h_{i}^{2}(0)e^{-4\lambda_{i}t}}\right). (3.12)

Substituting (3.12) into (3.10) yields, for j=1,2,,Nj=1,2,\dots,N,

|hj(t)|=|hj(0)|e2λjti=1Nhi2(0)e4λit.|h_{j}(t)|=\frac{|h_{j}(0)|e^{-2\lambda_{j}t}}{\sqrt{\sum_{i=1}^{N}h^{2}_{i}(0)e^{-4\lambda_{i}t}}}. (3.13)
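As a sanity check (not part of the argument), the closed form (3.13) can be compared with a direct numerical integration of the flow (3.7). The following sketch uses Python with NumPy; the Gaussian (GOE-type) Wigner sample, the size N = 200, the horizon T = 5, and the Euler step size are illustrative assumptions rather than choices made in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200

# Gaussian (GOE-type) normalized Wigner sample, used only as an illustration
Z = rng.standard_normal((N, N))
J = (Z + Z.T) / np.sqrt(2 * N)
lam, V = np.linalg.eigh(J)                    # ascending eigenvalues, columns are eigenvectors

# Random unit initial condition and its overlaps h_i(0) = v_i . X_0
x0 = rng.standard_normal(N)
x0 /= np.linalg.norm(x0)
h0 = V.T @ x0

def overlaps_closed_form(t):
    """|h_j(t)| for all j, as in (3.13)."""
    denom = np.sqrt(np.sum(h0 ** 2 * np.exp(-4.0 * lam * t)))
    return np.abs(h0) * np.exp(-2.0 * lam * t) / denom

# Explicit Euler discretization of dX = (-2 J X + 2 (X . J X) X) dt, cf. (3.7)
dt, T = 1e-3, 5.0
y = x0.copy()
for _ in range(int(T / dt)):
    Jy = J @ y
    y = y + dt * (-2.0 * Jy + 2.0 * (y @ Jy) * y)
    y /= np.linalg.norm(y)                    # keep the iterate on the unit sphere

print("max_j |closed form - Euler|:", np.max(np.abs(overlaps_closed_form(T) - np.abs(V.T @ y))))
```

With these parameters the two computations typically agree to a few decimal places; the re-normalization after each Euler step simply keeps the iterate on the unit sphere, which the continuous flow preserves exactly.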

To prove Theorem 3.3, we need the following Lemma.

Lemma 3.6.

Consider a sequence of i.i.d. positive random variables X1,,XkX_{1},\dots,X_{k}. For every constant C>0C>0, we have

(X1+X2++XkXj>C)≥1Ck, for j=1,,k.\mathbb{P}\left(\frac{X_{1}+X_{2}+\dots+X_{k}}{X_{j}}>C\right)\geq 1-\frac{C}{k},\mbox{ for }\,j=1,\dots,k.
Proof.

Note that

0<𝔼[X1i=1kXi]=𝔼[X2i=1kXi]==𝔼[Xki=1kXi]<1.0<\mathbb{E}\left[\frac{X_{1}}{\sum_{i=1}^{k}X_{i}}\right]=\mathbb{E}\left[\frac{X_{2}}{\sum_{i=1}^{k}X_{i}}\right]=\dots=\mathbb{E}\left[\frac{X_{k}}{\sum_{i=1}^{k}X_{i}}\right]<1.

Since these kk expectations are equal and sum to 11 (because jXj/iXi=1\sum_{j}X_{j}/\sum_{i}X_{i}=1), we have

𝔼[X1i=1kXi]=1k.\mathbb{E}\left[\frac{X_{1}}{\sum_{i=1}^{k}X_{i}}\right]=\frac{1}{k}.

By Markov’s inequality, we get for every constant C>0C>0

(X1+X2++XkXjC)=(XjX1+X2++Xk1/C)C𝔼[XjiXi]=Ck.\mathbb{P}\left(\frac{X_{1}+X_{2}+\dots+X_{k}}{X_{j}}\leq C\right)=\mathbb{P}\left(\frac{X_{j}}{X_{1}+X_{2}+\dots+X_{k}}\geq 1/C\right)\leq C\cdot\mathbb{E}\left[\frac{X_{j}}{\sum_{i}X_{i}}\right]=\frac{C}{k}. (3.14)
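For intuition only, Lemma 3.6 is easy to check by simulation. The sketch below (Python with NumPy) uses i.i.d. unit-rate exponential variables as an arbitrary example of positive i.i.d. random variables and compares the empirical probability for j = 1 with the bound 1 − C/k; the values of k, C, and the number of trials are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
k, C, trials = 50, 5.0, 100_000

# i.i.d. positive random variables (unit-rate exponentials, chosen arbitrarily)
X = rng.exponential(scale=1.0, size=(trials, k))
ratio = X.sum(axis=1) / X[:, 0]          # (X_1 + ... + X_k) / X_1, i.e. the case j = 1

print("empirical P(ratio > C):", np.mean(ratio > C))
print("bound 1 - C/k from Lemma 3.6:", 1 - C / k)
```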

We require the following Lemma.

Lemma 3.7.

Let XX be a standard normal random variable. Then for every ϵ>0\epsilon>0, there exists a constant δ>0\delta>0 so that

(|X|>δ)1ϵ.\mathbb{P}(|X|>\delta)\geq 1-\epsilon. (3.15)
Proof.

For every ϵ>0\epsilon>0, any constant δ(0,π2ϵ)\delta\in\left({0,\sqrt{\frac{\pi}{2}}\epsilon}\right) satisfies

(|X|δ)=12πδδex2/2𝑑x22πδ<ϵ,\mathbb{P}(|X|\leq\delta)=\frac{1}{\sqrt{2\pi}}\int_{-\delta}^{\delta}e^{-x^{2}/2}dx\leq\frac{2}{\sqrt{2\pi}}\delta<\epsilon, (3.16)

where the above inequality follows from the fact that ex2/21e^{-x^{2}/2}\leq 1 for xx\in\mathbb{R}. ∎

Armed with the previous results, we can now prove our main result.

Proof of Theorem 3.3.

By Lemma 3.5, we have

|h1(t)|=|h1(0)|e2λ1tihi2(0)e4λit=|h1(0)|h12(0)+i=2Nhi2(0)e4(λiλ1)t|h_{1}(t)|=\frac{|h_{1}(0)|e^{-2\lambda_{1}t}}{\sqrt{\sum_{i}h^{2}_{i}(0)e^{-4\lambda_{i}t}}}=\frac{|h_{1}(0)|}{\sqrt{h_{1}^{2}(0)+\sum_{i=2}^{N}h^{2}_{i}(0)e^{-4(\lambda_{i}-\lambda_{1})t}}} (3.17)

Next, we consider the upper and lower bound of the hitting time TεT_{\varepsilon}, respectively.

  1.

    Lower bound of TεT_{\varepsilon}.

    For any δ>0\delta>0, we keep only the first kk terms of the sum in the denominator of (3.17), which can only increase the right-hand side; the value of kk, depending on ϵ\epsilon and δ\delta but not on NN, will be chosen below:

    |h1(t)||h1(0)|(h12(0)+i=2khi2(0)e4t(λiλ1))1/2.|h_{1}(t)|\leq|h_{1}(0)|\left({h_{1}^{2}(0)+\sum_{i=2}^{k}h_{i}^{2}(0)e^{-4t(\lambda_{i}-\lambda_{1})}}\right)^{-1/2}. (3.18)

    By the fact that (λiλ1)(λkλ1)(\lambda_{i}-\lambda_{1})\leq(\lambda_{k}-\lambda_{1}) for i=2,,ki=2,\dots,k, we can bound the right-hand side of (3.18) further from above:

    |h1(t)||h1(0)|(h12(0)+e4t(λkλ1)i=2khi2(0))1/2|h_{1}(t)|\leq|h_{1}(0)|\left({h_{1}^{2}(0)+e^{-4t(\lambda_{k}-\lambda_{1})}\sum_{i=2}^{k}h_{i}^{2}(0)}\right)^{-1/2} (3.19)

    Since |h1(t)||h_{1}(t)| is nondecreasing in tt (by (3.6), as λ1\lambda_{1} is the smallest eigenvalue), for every tTϵt\geq T_{\epsilon} we have

    ϵ|h1(t)||h1(0)|(h12(0)+e4t(λkλ1)i=2khi2(0))1/2.\epsilon\leq|h_{1}(t)|\leq|h_{1}(0)|\left({h_{1}^{2}(0)+e^{-4t(\lambda_{k}-\lambda_{1})}\sum_{i=2}^{k}h_{i}^{2}(0)}\right)^{-1/2}. (3.20)

    Evaluating at t=Tϵt=T_{\epsilon} and solving for TϵT_{\epsilon} yields

    Tϵ14(λkλ1)log(h22(0)++hk2(0)h12(0)(ϵ21))T_{\epsilon}\geq\frac{1}{4(\lambda_{k}-\lambda_{1})}\log\left(\frac{h_{2}^{2}(0)+\dots+h_{k}^{2}(0)}{h_{1}^{2}(0)(\epsilon^{-2}-1)}\right) (3.21)

    We apply Lemma 3.6 to h12(0),,hk2(0)h_{1}^{2}(0),\dots,h_{k}^{2}(0) with C=2ε21C=2\varepsilon^{-2}-1:

    limN(h22(0)++hk2(0)h12(0)(ϵ21)>2)\displaystyle\lim_{N\to\infty}\mathbb{P}\left(\frac{h_{2}^{2}(0)+\dots+h_{k}^{2}(0)}{h_{1}^{2}(0)(\epsilon^{-2}-1)}>2\right) =limN(h12(0)+h22(0)++hk2(0)h12(0)>2ϵ21)\displaystyle=\lim_{N\to\infty}\mathbb{P}\left(\frac{h_{1}^{2}(0)+h_{2}^{2}(0)+\dots+h_{k}^{2}(0)}{h_{1}^{2}(0)}>\frac{2}{\epsilon^{2}}-1\right)
    1δ/2,\displaystyle\geq 1-\delta/2,

    where we take k=[2(2ϵ21)/δ]+1k=[2(2\epsilon^{-2}-1)/\delta]+1 so that C/k<δ/2C/k<\delta/2.

    By an argument similar to that of Lemma 3.4, for any δ>0\delta>0 there exists a constant c1=c1(δ)>0c_{1}=c_{1}(\delta)>0 so that

    limN(N2/3(λkλ1)<c1)1δ/2\lim_{N\to\infty}\mathbb{P}(N^{2/3}(\lambda_{k}-\lambda_{1})<c_{1})\geq 1-\delta/2 (3.22)

    Hence, for any δ>0\delta>0 there exists c2=log24c1c_{2}=\frac{\log 2}{4c_{1}} so that

    limN(Tε>c2N2/3)1δ.\lim_{N\to\infty}\mathbb{P}(T_{\varepsilon}>c_{2}N^{2/3})\geq 1-\delta. (3.23)
  2.

    Upper bound of TεT_{\varepsilon}.

    Note that (λ2λ1)(λiλ1)(\lambda_{2}-\lambda_{1})\leq(\lambda_{i}-\lambda_{1}) for i=3,,Ni=3,\dots,N. We upper bound each term of the denominator in (3.17) by e4t(λiλ1)e4t(λ2λ1)e^{-4t(\lambda_{i}-\lambda_{1})}\leq e^{-4t(\lambda_{2}-\lambda_{1})} for i=3,,Ni=3,\dots,N. Then we have

    |h1(t)||h1(0)|h12(0)+e4(λ2λ1)ti=2Nhi2(0)=|h1(0)|h12(0)+e4(λ2λ1)t(1h12(0))|h_{1}(t)|\geq\frac{|h_{1}(0)|}{\sqrt{h_{1}^{2}(0)+e^{-4(\lambda_{2}-\lambda_{1})t}\sum_{i=2}^{N}h^{2}_{i}(0)}}=\frac{|h_{1}(0)|}{\sqrt{h_{1}^{2}(0)+e^{-4(\lambda_{2}-\lambda_{1})t}(1-h_{1}^{2}(0))}} (3.24)

    For tTεt\leq T_{\varepsilon}, we get

    ε|h1(t)||h1(0)|h12(0)+e4(λ2λ1)t(1h12(0)).\varepsilon\geq|h_{1}(t)|\geq\frac{|h_{1}(0)|}{\sqrt{h_{1}^{2}(0)+e^{-4(\lambda_{2}-\lambda_{1})t}(1-h_{1}^{2}(0))}}. (3.25)

    and this yields

    Tε14(λ2λ1)log(h12(0)1ε21).T_{\varepsilon}\leq\frac{1}{4(\lambda_{2}-\lambda_{1})}\log\left({\frac{h_{1}^{-2}(0)-1}{\varepsilon^{-2}-1}}\right). (3.26)

    By Lemma 3.4, for any δ>0\delta>0 there exists a constant c3=c3(δ)>0c_{3}=c_{3}(\delta)>0 so that

    limN(N2/3(λ2λ1)c3)1δ/2\lim_{N\to\infty}\mathbb{P}\left({N^{2/3}(\lambda_{2}-\lambda_{1})\geq c_{3}}\right)\geq 1-\delta/2 (3.27)

    Note that Nh1(0)\sqrt{N}h_{1}(0) is asymptotically Gaussian by [28, Theorem 13]. By Lemma 3.7, for every δ>0\delta>0, there exists a constant c4>0c_{4}>0 so that

    limN(N|h1(0)|>c4)1δ/2.\lim_{N\to\infty}\mathbb{P}\left({\sqrt{N}|h_{1}(0)|>c_{4}}\right)\geq 1-\delta/2. (3.28)

    For any δ>0\delta>0, we take c5=1c42(ε21)>0c_{5}=\frac{1}{c_{4}^{2}(\varepsilon^{-2}-1)}>0 and then get

    (h12(0)1ε21<c5N)=(|h1(0)|>c42c42+N)=(1+c42NN|h1(0)|>c4).\displaystyle\mathbb{P}\left({\frac{h_{1}^{-2}(0)-1}{\varepsilon^{-2}-1}<c_{5}N}\right)=\mathbb{P}\left({|h_{1}(0)|>\sqrt{\frac{c_{4}^{2}}{c_{4}^{2}+N}}}\right)=\mathbb{P}\left({\sqrt{1+\frac{c_{4}^{2}}{N}}\sqrt{N}|h_{1}(0)|>c_{4}}\right).

    Since 1+c42/N1\sqrt{1+c_{4}^{2}/N}\geq 1, inequality (3.28) then gives, for every δ>0\delta>0,

    limN(h12(0)1ε21<c5N)1δ/2.\lim_{N\to\infty}\mathbb{P}\left({\frac{h_{1}^{-2}(0)-1}{\varepsilon^{-2}-1}<c_{5}N}\right)\geq 1-\delta/2. (3.29)

    Combining (3.26) with the bounds (3.27) and (3.29), for every δ>0\delta>0, with c5c_{5} defined as above and c6=14c3c_{6}=\frac{1}{4c_{3}}, we obtain

    limN(Tε<c6N2/3log(c5N))1δ.\lim_{N\to\infty}\mathbb{P}\left({T_{\varepsilon}<c_{6}N^{2/3}\log(c_{5}N)}\right)\geq 1-\delta. (3.30)
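The N^{2/3} scaling in Theorem 3.3 can also be probed numerically: since |h_1(t)| is given explicitly by (3.13) and is nondecreasing in t, the hitting time T_ε can be located by bisection. The sketch below (Python with NumPy) does this for a few matrix sizes; the Gaussian (GOE-type) samples, the uniform random initialization, the choice ε = 0.5, and the sizes and trial counts are illustrative assumptions. The reported ratio T_ε / N^{2/3} is expected to grow only slowly with N, consistent with the N^{2/3} lower bound and the N^{2/3} log N upper bound of the theorem.

```python
import numpy as np

rng = np.random.default_rng(3)
eps = 0.5

def hitting_time(N, rng):
    """T_eps for one Gaussian (GOE-type) Wigner sample, located from (3.13) by bisection."""
    Z = rng.standard_normal((N, N))
    J = (Z + Z.T) / np.sqrt(2 * N)
    lam, V = np.linalg.eigh(J)                 # ascending eigenvalues
    x0 = rng.standard_normal(N)
    x0 /= np.linalg.norm(x0)
    h0 = V.T @ x0
    gaps = lam - lam[0]                        # lambda_i - lambda_1 >= 0, avoids overflow below

    def overlap(t):                            # |h_1(t)| from (3.13), rewritten via the gaps
        return abs(h0[0]) / np.sqrt(np.sum(h0 ** 2 * np.exp(-4.0 * gaps * t)))

    hi = 1.0
    while overlap(hi) < eps:                   # bracket T_eps; |h_1(t)| is nondecreasing in t
        hi *= 2.0
    lo = 0.0
    for _ in range(60):                        # bisection on the bracket [lo, hi]
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if overlap(mid) < eps else (lo, mid)
    return hi

for N in (100, 200, 400):
    times = [hitting_time(N, rng) for _ in range(20)]
    print(N, "median T_eps / N^(2/3):", round(float(np.median(times)) / N ** (2.0 / 3.0), 3))
```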

4 Power iteration method: a hitting time perspective

In this section, we conduct an average case analysis of the hitting time for the top eigenvector using the power iteration method [5, Section 9.3], paralleling the approach in Section 3. Our focus will be on a normalized Wigner matrix, for which we arrange the eigenvalues by their absolute values.

Let 𝐉\mathbf{J} be a normalized Wigner matrix as before. We arrange the eigenvalues of the matrix 𝐉\mathbf{J} in order of their absolute values

|σN||σN1||σ2||σ1||\sigma_{N}|\leq|\sigma_{N-1}|\leq\dots\leq|\sigma_{2}|\leq|\sigma_{1}|

with corresponding eigenvectors uN,uN1,,u2,u1u_{N},u_{N-1},\dots,u_{2},u_{1}. Power iteration is then employed to estimate the matrix’s dominant eigenvalue σ1\sigma_{1} and the corresponding eigenvector.

Given an arbitrary initial vector q0Nq_{0}\in\mathbb{R}^{N} with q02=1\|q_{0}\|_{2}=1, we take for k=1,2,3,k=1,2,3,\dots

qk:=𝐉qk1𝐉qk12q_{k}:=\frac{\mathbf{J}q_{k-1}}{\|\mathbf{J}q_{k-1}\|_{2}} (4.1)

Since u1,,uNu_{1},\dots,u_{N} form an orthonormal basis of N\mathbb{R}^{N}, there are constants αi\alpha_{i} for i=1,2,,Ni=1,2,\dots,N so that

q0=i=1Nαiui.q_{0}=\sum_{i=1}^{N}\alpha_{i}u_{i}. (4.2)

Then we have, for every k1k\geq 1,

𝐉kq0=(i=1NuiuiTσik)(i=1Nαiui)=σ1k(α1u1+i=2Nαi(σiσ1)kui).\mathbf{J}^{k}q_{0}=\left({\sum_{i=1}^{N}u_{i}u_{i}^{T}\sigma_{i}^{k}}\right)\left({\sum_{i=1}^{N}\alpha_{i}u_{i}}\right)=\sigma_{1}^{k}\left({\alpha_{1}u_{1}+\sum_{i=2}^{N}\alpha_{i}\left({\frac{\sigma_{i}}{\sigma_{1}}}\right)^{k}u_{i}}\right). (4.3)
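For concreteness, the iteration (4.1) is easy to implement. The sketch below (Python with NumPy) runs it on a Gaussian (GOE-type) Wigner sample and compares the Rayleigh quotient after a fixed number of steps with the eigenvalue of largest absolute value computed by a direct eigensolver; the size N = 300 and the step count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 300

# Gaussian (GOE-type) normalized Wigner sample, used only as an illustration
Z = rng.standard_normal((N, N))
J = (Z + Z.T) / np.sqrt(2 * N)

# Power iteration (4.1) started from a uniform random unit vector q_0
q = rng.standard_normal(N)
q /= np.linalg.norm(q)
for _ in range(2000):                       # a few thousand steps are typically ample here
    q = J @ q
    q /= np.linalg.norm(q)

sigma1_estimate = q @ J @ q                 # Rayleigh quotient of the final iterate
lam = np.linalg.eigvalsh(J)
sigma1_true = lam[np.argmax(np.abs(lam))]   # eigenvalue of largest absolute value

print("power iteration estimate:", sigma1_estimate)
print("direct eigensolver value:", sigma1_true)
```

Note that when σ_1 is negative the iterates q_k flip sign at every step, but the Rayleigh quotient and the overlap |q_k · u_1| are unaffected.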

Our main theorem in this section is as follows.

Theorem 4.1.

Let 𝐉\mathbf{J} be the N×NN\times N normalized Wigner matrix. Assume that 𝐉\mathbf{J} obeys the same condition as in (3.4). Fix 0<ε<10<\varepsilon<1 and an initial value q0Nq_{0}\in\mathbb{R}^{N} with q02=1\|q_{0}\|_{2}=1. Let qkq_{k} be the output of the power iteration at step k1k\geq 1 and Tε=infk1{|qku1|ε}T_{\varepsilon}=\inf_{k\geq 1}\{\left\lvert{q_{k}\cdot u_{1}}\right\rvert\geq\varepsilon\}. For every δ>0\delta>0, there exist constants C1,C2,C3>0C_{1},C_{2},C_{3}>0 so that

limN(C1N2/3<Tε<C2N2/3log(C3N))>1δ.\lim_{N\to\infty}\mathbb{P}\left({C_{1}N^{2/3}<T_{\varepsilon}<C_{2}N^{2/3}\log(C_{3}N)}\right)>1-\delta. (4.4)

To prove our main theorem, we require the following lemma.

Lemma 4.2.

Let 𝐉\mathbf{J} be the N×NN\times N normalized Wigner matrix. Assume that 𝐉\mathbf{J} obeys the same condition as in (3.4). Order its eigenvalues so that |σN||σN1||σ1||\sigma_{N}|\leq|\sigma_{N-1}|\leq\dots\leq|\sigma_{1}|. Then we have |σ1|/|σ2|=1+𝒪(N2/3)|\sigma_{1}|/|\sigma_{2}|=1+\mathcal{O}(N^{-2/3}).

Proof.

By Theorem 3.1, we may write |σ1|=2+γ1|\sigma_{1}|=2+\gamma_{1} and |σ2|=2+γ2|\sigma_{2}|=2+\gamma_{2} with γ1,γ2=𝒪(N2/3)\gamma_{1},\gamma_{2}=\mathcal{O}(N^{-2/3}). Then, by Taylor expansion, we get

|σ1||σ2|\displaystyle\frac{|\sigma_{1}|}{|\sigma_{2}|} =2+γ12(1+γ22)1=2+γ12(1γ22+𝒪(γ22))\displaystyle=\frac{2+\gamma_{1}}{2}(1+\frac{\gamma_{2}}{2})^{-1}=\frac{2+\gamma_{1}}{2}\left({1-\frac{\gamma_{2}}{2}+\mathcal{O}(\gamma_{2}^{2})}\right)
=1+12(γ1γ2)+𝒪(N4/3)\displaystyle=1+\frac{1}{2}(\gamma_{1}-\gamma_{2})+\mathcal{O}(N^{-4/3})
=1+𝒪(N2/3).\displaystyle=1+\mathcal{O}(N^{-2/3}).
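A quick numerical check of Lemma 4.2 (purely illustrative) is to sample Wigner matrices and look at the rescaled ratio N^{2/3}(|σ_1|/|σ_2| − 1). In the sketch below (Python with NumPy), the Gaussian (GOE-type) entries, sizes, and trial counts are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)

def rescaled_ratio(N, rng):
    """Return N^(2/3) * (|sigma_1|/|sigma_2| - 1) for one Gaussian (GOE-type) Wigner sample."""
    Z = rng.standard_normal((N, N))
    J = (Z + Z.T) / np.sqrt(2 * N)
    s = np.sort(np.abs(np.linalg.eigvalsh(J)))[::-1]     # |sigma_1| >= |sigma_2| >= ...
    return N ** (2.0 / 3.0) * (s[0] / s[1] - 1.0)

for N in (100, 200, 400):
    vals = [rescaled_ratio(N, rng) for _ in range(50)]
    print(N, "median rescaled ratio:", round(float(np.median(vals)), 3))
```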

Proof of Theorem 4.1.

We first show the lower bound on TεT_{\varepsilon}. Fix δ>0\delta>0. We will choose an appropriate ll with 1l<N1\leq l<N, depending on ε\varepsilon and δ\delta (but not on NN), and bound the denominator of qku1q_{k}\cdot u_{1} from below by its first ll terms, as follows.

By (4.1) and (4.3), we have

|qku1|=|σ1kα1|i=1Nαi2σi2k|σ1kα1|i=1lαi2σi2k=(1+i=2lαi2α12σi2kσ12k)1/2\left\lvert{q_{k}\cdot u_{1}}\right\rvert=\frac{\left\lvert{\sigma_{1}^{k}\alpha_{1}}\right\rvert}{\sqrt{\sum_{i=1}^{N}\alpha_{i}^{2}\sigma_{i}^{2k}}}\leq\frac{\left\lvert{\sigma_{1}^{k}\alpha_{1}}\right\rvert}{\sqrt{\sum_{i=1}^{l}\alpha_{i}^{2}\sigma_{i}^{2k}}}=\left({1+\sum_{i=2}^{l}\frac{\alpha_{i}^{2}}{\alpha_{1}^{2}}\frac{\sigma_{i}^{2k}}{\sigma_{1}^{2k}}}\right)^{-1/2} (4.5)

By the fact that |σi||σl||\sigma_{i}|\geq|\sigma_{l}| for i=1,,li=1,\dots,l, we obtain

|qku1|(1+σl2kσ12ki=2lαi2α12)1/2\left\lvert{q_{k}\cdot u_{1}}\right\rvert\leq\left({1+\frac{\sigma_{l}^{2k}}{\sigma_{1}^{2k}}\sum_{i=2}^{l}\frac{\alpha_{i}^{2}}{\alpha_{1}^{2}}}\right)^{-1/2} (4.6)

At k=Tεk=T_{\varepsilon}, we have |qku1|ε\left\lvert{q_{k}\cdot u_{1}}\right\rvert\geq\varepsilon by the definition of the hitting time. Thus, solving (4.6) for kk at this step, we get

Tε12log1(|σ1||σl|)log(α22++αl2α12(ε21)).T_{\varepsilon}\geq\frac{1}{2}\log^{-1}\left({\frac{|\sigma_{1}|}{|\sigma_{l}|}}\right)\log\left({\frac{\alpha_{2}^{2}+\dots+\alpha_{l}^{2}}{\alpha_{1}^{2}(\varepsilon^{-2}-1)}}\right). (4.7)

Then we apply Lemma 3.6 to α12,,αl2\alpha_{1}^{2},\dots,\alpha_{l}^{2} with C=2ε21C=2\varepsilon^{-2}-1:

limN(α22++αl2α12(ϵ21)>2)\displaystyle\lim_{N\to\infty}\mathbb{P}\left(\frac{\alpha_{2}^{2}+\dots+\alpha_{l}^{2}}{\alpha_{1}^{2}(\epsilon^{-2}-1)}>2\right) =limN(α12+α22++αl2α12>2ϵ21)\displaystyle=\lim_{N\to\infty}\mathbb{P}\left(\frac{\alpha_{1}^{2}+\alpha_{2}^{2}+\dots+\alpha_{l}^{2}}{\alpha_{1}^{2}}>\frac{2}{\epsilon^{2}}-1\right)
1δ/2,\displaystyle\geq 1-\delta/2,

where we take l=[2(2ϵ21)/δ]+1l=[2(2\epsilon^{-2}-1)/\delta]+1 so that C/l<δ/2C/l<\delta/2.

By Lemma 4.2 and Theorem 3.1, for any δ>0\delta>0 there exists a constant c1=c1(δ)>0c_{1}=c_{1}(\delta)>0 so that

limN(N2/3(|σ1||σl|1)c1)1δ/2.\lim_{N\to\infty}\mathbb{P}\left({N^{2/3}\left({\frac{|\sigma_{1}|}{|\sigma_{l}|}-1}\right)\leq c_{1}}\right)\geq 1-\delta/2. (4.8)

By the fact that log(1+x)<x\log(1+x)<x for x>0x>0, for any δ>0\delta>0 there exists c2=log22c1>0c_{2}=\frac{\log 2}{2c_{1}}>0 so that

limN(Tεc2N2/3)1δ.\lim_{N\to\infty}\mathbb{P}(T_{\varepsilon}\geq c_{2}N^{2/3})\geq 1-\delta. (4.9)

Next, we prove the upper bound of TεT_{\varepsilon}.

By the inequality |σi||σ2||\sigma_{i}|\leq|\sigma_{2}| for i=2,3,,Ni=2,3,\dots,N, we have

|qku1|=(1+i=2Nαi2α12σi2kσ12k)1/2(1+σ22kσ12ki=2Nαi2α12)1/2.\left\lvert{q_{k}\cdot u_{1}}\right\rvert=\left({1+\sum_{i=2}^{N}\frac{\alpha_{i}^{2}}{\alpha_{1}^{2}}\frac{\sigma_{i}^{2k}}{\sigma_{1}^{2k}}}\right)^{-1/2}\geq\left({1+\frac{\sigma_{2}^{2k}}{\sigma_{1}^{2k}}\sum_{i=2}^{N}\frac{\alpha_{i}^{2}}{\alpha_{1}^{2}}}\right)^{-1/2}. (4.10)

Since q02=1\|q_{0}\|_{2}=1 implies i=1Nαi2=1\sum_{i=1}^{N}\alpha_{i}^{2}=1, we have

|qku1|(1+σ22kσ12k1α12α12)1/2.\left\lvert{q_{k}\cdot u_{1}}\right\rvert\geq\left({1+\frac{\sigma_{2}^{2k}}{\sigma_{1}^{2k}}\frac{1-\alpha_{1}^{2}}{\alpha_{1}^{2}}}\right)^{-1/2}. (4.11)

By Bernoulli’s inequality (1+x)r1+rx(1+x)^{r}\geq 1+rx for r0r\leq 0 and x>1x>-1, we get

|qku1|112(σ2σ1)2k(1α12α12)\displaystyle\left\lvert{q_{k}\cdot u_{1}}\right\rvert\geq 1-\frac{1}{2}\left({\frac{\sigma_{2}}{\sigma_{1}}}\right)^{2k}\left({\frac{1-\alpha_{1}^{2}}{\alpha_{1}^{2}}}\right)

Note that for k<Tεk<T_{\varepsilon}, we have |qku1|<ε\left\lvert{q_{k}\cdot u_{1}}\right\rvert<\varepsilon. Applying this at k=Tε1k=T_{\varepsilon}-1 and absorbing the resulting additive constant into the prefactor (for NN large), we get

Tε2log|σ1σ2|log(α1212(1ε)).T_{\varepsilon}\leq\frac{2}{\log\left\lvert{\frac{\sigma_{1}}{\sigma_{2}}}\right\rvert}\log\left({\frac{\alpha_{1}^{-2}-1}{2(1-\varepsilon)}}\right). (4.12)

By Lemma 4.2, for any δ>0\delta>0 there exists a constant c3=c3(δ)>0c_{3}=c_{3}(\delta)>0 so that

limN(N2/3(|σ1σ2|1)c3)1δ/2.\lim_{N\to\infty}\mathbb{P}\left({N^{2/3}\left({\left\lvert{\frac{\sigma_{1}}{\sigma_{2}}}\right\rvert-1}\right)\geq c_{3}}\right)\geq 1-\delta/2. (4.13)

Combining (4.13) with the inequality log(1+x)>x/2\log(1+x)>x/2 for x(0,1)x\in(0,1), we get, for NN large enough,

(log1|σ1σ2|<2c31N2/3)1δ/2.\mathbb{P}\left({\log^{-1}\left\lvert{\frac{\sigma_{1}}{\sigma_{2}}}\right\rvert<2c_{3}^{-1}N^{2/3}}\right)\geq 1-\delta/2. (4.14)

Since Nα1\sqrt{N}\alpha_{1} is asymptotically Gaussian by [28, Theorem 13], the same argument as for inequality (3.29) shows that for any δ>0\delta>0 there exists a constant c4>0c_{4}>0 so that

limN(α1212(1ε)<c4N)1δ/2\lim_{N\to\infty}\mathbb{P}\left({\frac{\alpha_{1}^{-2}-1}{2(1-\varepsilon)}<c_{4}N}\right)\geq 1-\delta/2 (4.15)

Thus, combining inequalities (4.14) and (4.15), we can prove the desired result: for any δ>0\delta>0, there exist constants c4,c5=4c31>0c_{4},c_{5}=4c_{3}^{-1}>0 such that

limN(Tε<c5N2/3log(c4N))1δ.\lim_{N\to\infty}\mathbb{P}\left({T_{\varepsilon}<c_{5}N^{2/3}\log(c_{4}N)}\right)\geq 1-\delta. (4.16)
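As with Theorem 3.3, the predicted growth of T_ε (of order N^{2/3}, up to the logarithmic factor in the upper bound) can be explored empirically by running the iteration (4.1) until the hitting step, for several sizes. The sketch below (Python with NumPy) does this; the Gaussian (GOE-type) samples, the choice ε = 0.5, the random unit initialization, and the sizes and trial counts are illustrative assumptions, and the ratio to N^{2/3} is again only expected to grow slowly with N.

```python
import numpy as np

rng = np.random.default_rng(5)
eps = 0.5

def power_iteration_hitting_step(N, rng, max_steps=100_000):
    """Smallest k with |q_k . u_1| >= eps for one Gaussian (GOE-type) Wigner sample."""
    Z = rng.standard_normal((N, N))
    J = (Z + Z.T) / np.sqrt(2 * N)
    lam, V = np.linalg.eigh(J)
    u1 = V[:, np.argmax(np.abs(lam))]          # eigenvector of the eigenvalue of largest modulus
    q = rng.standard_normal(N)
    q /= np.linalg.norm(q)
    for k in range(1, max_steps + 1):          # the iteration (4.1)
        q = J @ q
        q /= np.linalg.norm(q)
        if abs(q @ u1) >= eps:
            return k
    return max_steps

for N in (100, 200, 400):
    steps = [power_iteration_hitting_step(N, rng) for _ in range(20)]
    print(N, "median T_eps:", np.median(steps),
          " median T_eps / N^(2/3):", round(float(np.median(steps)) / N ** (2.0 / 3.0), 2))
```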

Acknowledgement:

We thank Aukosh Jagannath, Yi Shen, and Dong Yao for their invaluable suggestions and insights.

References

  • [1] M. Abramowitz and I. A. Stegun. Handbook of mathematical functions with formulas, graphs, and mathematical tables, volume 55. US Government printing office, 1948.
  • [2] G. B. Arous, A. Dembo, and A. Guionnet. Aging of spherical spin glasses. Probability theory and related fields, 120(1):1–67, 2001.
  • [3] A. Auffinger and W.-K. Chen. On the energy landscape of spherical spin glasses. Advances in Mathematics, 330:553–588, 2018.
  • [4] G. Ben-Arous. Aging and spin-glass dynamics. arXiv preprint math/0304364, 2003.
  • [5] R. L. Burden and J. D. Faires. Numerical Analysis. Brooks/Cole, 1997.
  • [6] S. Chatterjee. Estimation in spin glasses: A first step. Annals of Statistics, 35(5):1931 – 1946, 2007.
  • [7] K. L. Chung. A course in probability theory. Academic press, 2001.
  • [8] A. Crisanti and H.-J. Sommers. The spherical p-spin interaction spin glass model: the statics. Zeitschrift für Physik B Condensed Matter, 87(3):341–354, 1992.
  • [9] L. F. Cugliandolo and J. Kurchan. Analytical solution of the off-equilibrium dynamics of a long-range spin-glass model. Physical Review Letters, 71(1):173, 1993.
  • [10] L. F. Cugliandolo and J. Kurchan. On the out-of-equilibrium relaxation of the sherrington-kirkpatrick model. Journal of Physics A: Mathematical and General, 27(17):5749, 1994.
  • [11] P. Deift and T. Trogdon. Universality for eigenvalue algorithms on sample covariance matrices. SIAM Journal on Numerical Analysis, 55(6):2835–2862, 2017.
  • [12] A. Dembo and R. Gheissari. Diffusions interacting through a random matrix: universality via stochastic taylor expansion. Probability Theory and Related Fields, 180:1057–1097, 2021.
  • [13] A. Dembo and E. Subag. Dynamics for spherical spin glasses: disorder dependent initial conditions. Journal of Statistical Physics, 181:465–514, 2020.
  • [14] D. L. Donoho, A. Maleki, and A. Montanari. Message-passing algorithms for compressed sensing. Proceedings of the National Academy of Sciences, 106(45):18914–18919, 2009.
  • [15] R. Durrett. Probability: theory and examples, volume 49. Cambridge university press, 2019.
  • [16] S. F. Edwards and P. W. Anderson. Theory of spin glasses. Journal of Physics F: Metal Physics, 5(5):965, 1975.
  • [17] H.-O. Georgii. Gibbs measures and phase transitions, volume 9. Walter de Gruyter, 2011.
  • [18] H.-O. Georgii, O. Häggström, and C. Maes. The random geometry of equilibrium phases. In Phase transitions and critical phenomena, volume 18, pages 1–142. Elsevier, 2001.
  • [19] E. Kostlan. Complexity theory of numerical linear algebra. Journal of Computational and Applied Mathematics, 22(2-3):219–230, 1988.
  • [20] E. Kostlan. Statistical complexity of dominant eigenvector calculation. Journal of Complexity, 7(4):371–379, 1991.
  • [21] B. Landon. Free energy fluctuations of the two-spin spherical sk model at critical temperature. Journal of Mathematical Physics, 63(3):033301, 2022.
  • [22] J. O. Lee and J. Yin. A necessary and sufficient condition for edge universality of wigner matrices. Duke Mathematical Journal, 163(1):117–173, 2014.
  • [23] T. Liang, S. Sen, and P. Sur. High-dimensional asymptotics of Langevin dynamics in spiked matrix models. Information and Inference: A Journal of the IMA, 12(4):2720–2752, 10 2023.
  • [24] B. Øksendal. Stochastic differential equations. In Stochastic differential equations, pages 65–84. Springer, 2003.
  • [25] D. Sherrington and S. Kirkpatrick. Solvable model of a spin-glass. Physical review letters, 35(26):1792, 1975.
  • [26] S. Smale. Complexity theory and numerical analysis. Acta numerica, 6:523–551, 1997.
  • [27] D. L. Stein and C. M. Newman. Spin glasses and complexity, volume 4. Princeton University Press, 2013.
  • [28] T. Tao and V. Vu. Random matrices: universal properties of eigenvectors. Random Matrices: Theory and Applications, 1(01):1150001, 2012.
  • [29] C. A. Tracy and H. Widom. Level-spacing distributions and the airy kernel. Communications in Mathematical Physics, 159:151–174, 1994.
  • [30] C. A. Tracy and H. Widom. On orthogonal and symplectic matrix ensembles. Communications in Mathematical Physics, 177:727–754, 1996.
  • [31] E. Vincent, J. Hammann, M. Ocio, J.-P. Bouchaud, and L. F. Cugliandolo. Slow dynamics and aging in spin glasses. In Complex Behaviour of Glassy Systems: Proceedings of the XIV Sitges Conference Sitges, Barcelona, Spain, 10–14 June 1996, pages 184–219. Springer, 2007.
  • [32] L. Zdeborová and F. Krzakala. Statistical physics of inference: Thresholds and algorithms. Advances in Physics, 65(5):453–552, 2016.