
Analyzing dynamics and average case complexity in the spherical Sherrington-Kirkpatrick model: a focus on extreme eigenvectors

Tingzhou Yu University of Waterloo, University of Alberta, [email protected]
Abstract

We explore Langevin dynamics in the spherical Sherrington-Kirkpatrick model and study the asymptotic limit of the energy. Our approach relies on integro-differential equations, including the Crisanti-Horner-Sommers-Cugliandolo-Kurchan equation from the spin glass literature, to analyze the system in the large-size limit and its temperature-dependent phase transition. Additionally, we conduct an average-case complexity analysis, establishing hitting time bounds for the bottom eigenvector of a Wigner matrix. Our investigation also covers the power iteration algorithm, examining its average-case complexity in identifying the top eigenvector, with comprehensive complexity bounds.

1 Introduction

Spin glasses are disordered magnetic systems known for their complex behavior, making them a topic of interest in physics [16]. These systems have extensive applications across various fields, including statistics, computer science, and beyond (see e.g., [6, 14, 32]). Among the models studying spin glasses, the Sherrington-Kirkpatrick (SK) model [25] is particularly noteworthy for its extensive analysis. This paper primarily focuses on the spherical Sherrington-Kirkpatrick (SSK) model, which serves as the continuous counterpart of the SK model. A central challenge in the study of spin glasses is understanding the equilibrium phase transition (see e.g., [17, 18]), a critical change in the system’s properties occurring as temperature decreases. This transition reveals insights about the nature of the system’s ground state.

Consider 𝕊N1(N)\mathbb{S}^{N-1}(\sqrt{N}), the N1N-1 dimensional sphere of radius N\sqrt{N} in N\mathbb{R}^{N}: 𝕊N1(N):={𝑿N:𝑿2=N}\mathbb{S}^{N-1}(\sqrt{N}):=\{\bm{X}\in\mathbb{R}^{N}:\|\bm{X}\|^{2}=N\}, where 2\|\cdot\|_{2} represents the 2\ell^{2} norm. The 22-spin spherical Sherrington-Kirkpatrick (SSK) model is defined by the Hamiltonian:

H𝐉(X):=1i,jNJijXiXj=𝑿T𝐉𝑿,H_{\mathbf{J}}(X):=\sum_{1\leq i,j\leq N}J_{ij}X_{i}X_{j}=\bm{X}^{T}\mathbf{J}\bm{X}, (1.1)

where 𝑿=(X1,,XN)N\bm{X}=(X_{1},\dots,X_{N})\in\mathbb{R}^{N} are spin variables on the sphere 𝕊N1(N)\mathbb{S}^{N-1}(\sqrt{N}) and 𝐉=Jij1i,jN\mathbf{J}={J_{ij}}_{1\leq i,j\leq N} is the normalized Wigner matrix as defined in Definition 1.1.

However, in some cases, the system may relax to equilibrium so slowly that it never actually reaches it. This phenomenon is known as ‘aging’, and it is integral to the study of spin glass dynamics, both experimentally and theoretically. Aging is a phenomenon that affects the system’s decorrelation properties over time. According to [4], this means that the longer the system exists, the longer it takes to forget its past. This phenomenon has been studied extensively by various authors (see e.g., [9, 10, 31]). The study of aging and its effect on spin glass dynamics is an active area of physics research, with important implications for the understanding of the low-temperature behavior of these systems.

The foundational mathematical literature on aging in spin glasses was first introduced in [2]. In this work, the authors concentrated on such systems, particularly investigating the aging phenomenon in spherical spin glasses. This was aimed at characterizing the low-temperature behaviors of Langevin dynamics in matrix models. A key focus was the study of Langevin dynamics within the Sherrington-Kirkpatrick (SK) model, utilizing correlation functions. These functions adhere to a complex system of integro-differential equations, known as the Crisanti-Horner-Sommers-Cugliandolo-Kurchan (CHSCK) equations [8, 9]. Recent advancements in this field include the work of [13], who derived the CHSCK equations for spherical mixed pp-spin disordered mean-field models. Another significant contribution was made by [12], who employed a general combinatorial method and stochastic Taylor expansion to establish universality in Langevin dynamical systems. Additionally, [23] explored the signal recovery problem in spiked matrix models using Langevin dynamics.

In this paper, we aim to analyze the asymptotic limit of the energy and focus on the long-time behavior of the system's energy in Section 2. The energy, described by the Hamiltonian function, is critical in determining the equilibrium properties and governing the dynamics of spherical spin glass models (see e.g., [27, 3]). By studying the long-time behavior of the energy function, we can gain insight into the system's energy evolution over time and its interaction with the spin variable dynamics. Notably, we derive an integro-differential equation for the energy involving the empirical correlation function in Theorem 2.4. We also establish an explicit formula for the energy's limiting behavior in Theorem 2.5, where a phase transition is observed. Consequently, we obtain the limiting ratio of the energy to the empirical correlation function of the system in Corollary 2.6. The techniques employed in deriving our main results partially rely on the work of [2, 23].

As a potential application, the insights gained from our analysis of the asymptotic limit of the energy can be utilized to develop more efficient algorithms for solving challenging linear algebra problems, such as computing the eigenvectors of Wigner random matrices. However, this hinges on our understanding of the complexity of iterative methods, which have received less attention in the realm of complexity theory [26].

The complexity of algorithms in linear algebra has been a topic of interest for many years [26]. While direct algorithms that solve problems in a finite number of steps have been extensively studied, iterative methods such as those required for the matrix eigenvalue problem have received less attention in complexity theory [26]. The power method is a popular iterative algorithm that approximates the eigenvector corresponding to the dominant eigenvalue. However, for Hermitian random matrices, the average complexity of the power method for obtaining a dominant eigenvector is infinite [19]; the same work shows that the complexity is bounded above by O(N^{2}\log N) when conditioned on all the eigenvalues being positive. In a separate study, [20] explored another algorithm for computing the dominant eigenvector, revealing that under certain conditions the average number of iterations needed is O(\log N+\log|\log\varepsilon|). Furthermore, [11] studied the efficiency of three algorithms for computing the eigenvalues of sample covariance matrices, concluding that the complexity is approximately O\left(\left(\frac{\log\varepsilon^{-2}}{\log N}-\frac{3}{2}\right)N^{2/3}\log N\right), independent of the specific distribution of the matrix entries.
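As a rough illustration of the power method discussed above (and not the precise algorithms analyzed in [19, 20]), the following minimal Python sketch runs power iteration on a shifted Wigner matrix; the Gaussian entries, the shift by 3I (so that the largest eigenvalue is dominant in modulus), the tolerance, and the stopping rule are all illustrative choices.

```python
import numpy as np

def power_iteration(A, tol=1e-6, max_iter=50_000, seed=None):
    """Approximate the eigenvector of the eigenvalue of A that is largest in modulus."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    x /= np.linalg.norm(x)
    for k in range(max_iter):
        y = A @ x
        y /= np.linalg.norm(y)
        if np.linalg.norm(y - np.sign(y @ x) * x) < tol:   # sign-robust convergence check
            return y, k + 1
        x = y
    return x, max_iter

N = 500
rng = np.random.default_rng(0)
Z = rng.standard_normal((N, N))
J = (Z + Z.T) / np.sqrt(2 * N)        # normalized Wigner (GOE-like) matrix
A = J + 3.0 * np.eye(N)               # shift: the spectrum of J lies asymptotically in [-2, 2]
v, iters = power_iteration(A, seed=1)
v_top = np.linalg.eigh(J)[1][:, -1]   # top eigenvector of J, for comparison
print(iters, abs(v @ v_top))          # the overlap should be close to 1
```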

In our work, as presented in Section 3, we delve into the complexity of an algorithm employing spherical gradient descent within the aging framework to analyze the equilibrium of the spherical SK model under zero-temperature dynamics (i.e., setting β=∞ in the Langevin dynamics defined in (2.1)). Spherical gradient descent, functioning as an optimization method, updates the spin variables in the direction of the negative gradient of the energy function on the unit sphere, effectively serving as a continuous analogue of the power method. We focus on the hitting time, the first moment at which there is a positive overlap between the algorithm's output and the bottom eigenvector, with overlap being a measure of similarity between two spin configurations. Our research contributes to the field by providing bounds for the complexity of computing eigenvectors of Wigner random matrices with finite fourth moments; specifically, we establish a lower bound of O(N^{2/3}) and an upper bound of O(N^{2/3}\log N) in Theorem 3.3. Similarly, we analyze the power iteration algorithm, in particular its complexity in reaching the first time at which the algorithm's output has a positive overlap with the top eigenvector. Here too, we find the upper bound of the complexity to be O(N^{2/3}\log N) and the lower bound to be O(N^{2/3}), as detailed in Theorem 4.1.

Notation:

For two real-valued functions f(x),g(x)f(x),g(x), we write f(x)=𝒪(g(x))f(x)=\mathcal{O}(g(x)) as xx\to\infty, which means there are constants C>0C>0 and x0x_{0} such that |f(x)|C|g(x)||f(x)|\leq C|g(x)| for all xx0x\geq x_{0}.

For f(x)=o(g(x))f(x)=o(g(x)) as xx\to\infty, it means for any ε>0\varepsilon>0, there exists an x0x_{0} such that |f(x)|ε|g(x)||f(x)|\leq\varepsilon|g(x)| for all x>x0x>x_{0}.

Asymptotic Equality f(x)\sim_{x\to\infty}g(x): this means that \lim_{x\to\infty}f(x)/g(x)=1, i.e., f(x) and g(x) are asymptotically equal as x gets very large.

The definition of the (normalized) Wigner matrix we consider in this paper is as follows.

Definition 1.1.

Let N1N\geq 1 be an integer. Consider a symmetric (Hermitian) N×NN\times N matrix 𝐉={Jij}1i,jN={1NZij}1i,jN\mathbf{J}=\{J_{ij}\}_{1\leq i,j\leq N}=\{\frac{1}{\sqrt{N}}Z_{ij}\}_{1\leq i,j\leq N}. Assume that the following conditions hold:

  • The upper-triangular entries {Zij}1ijN\{Z_{ij}\}_{1\leq i\leq j\leq N} are independent real (complex) random variables with mean zero;

  • The diagonal entries {Zii}1iN\{Z_{ii}\}_{1\leq i\leq N} are identically distributed with finite variance, and the off-diagonal entries {Zij}1i<jN\{Z_{ij}\}_{1\leq i<j\leq N} are identically distributed with unit variance;

  • Furthermore, in the context of the Hermitian case, assume that 𝔼[(Z12)2]=0\mathbb{E}[(Z_{12})^{2}]=0.

The matrix 𝐉\mathbf{J} is called the normalized symmetric (Hermitian) Wigner matrix.
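As a minimal sketch of Definition 1.1 (using Gaussian entries, i.e. the GOE case of Remark 1.2 below, as one admissible choice), the following Python snippet samples a normalized real Wigner matrix and evaluates the Hamiltonian (1.1) at a uniform point of the sphere of radius √N; the matrix size is arbitrary.

```python
import numpy as np

def sample_wigner(N, rng):
    """Normalized real symmetric Wigner matrix J = Z / sqrt(N) with Gaussian entries:
    off-diagonal variance 1/N, diagonal variance 2/N (the GOE normalization)."""
    Z = rng.standard_normal((N, N))
    return (Z + Z.T) / np.sqrt(2 * N)

def sample_sphere(N, rng):
    """Uniform point on the sphere S^{N-1}(sqrt(N)) of radius sqrt(N)."""
    x = rng.standard_normal(N)
    return np.sqrt(N) * x / np.linalg.norm(x)

rng = np.random.default_rng(1)
N = 1000
J = sample_wigner(N, rng)
X = sample_sphere(N, rng)
H = X @ J @ X                # Hamiltonian H_J(X) from (1.1)
print(H / N)                 # energy per spin: O(N^{-1/2}) for a uniform X,
                             # versus the ground-state value lambda_min, close to -2
```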

Remark 1.2.

For cases where the random variables ZijZ_{ij} and ZiiZ_{ii} are real Gaussian and 𝔼[|Zii|2]=2\mathbb{E}[|Z_{ii}|^{2}]=2 for 1i,jN1\leq i,j\leq N, the Wigner matrix 𝐉\mathbf{J} is called the Gaussian Orthogonal Ensemble (GOE). Likewise, when the entries ZijZ_{ij} are complex Gaussian and ZiiZ_{ii} are real Gaussian with 𝔼[|Zii|2]=1\mathbb{E}[|Z_{ii}|^{2}]=1, the Wigner matrix 𝐉\mathbf{J} is called the Gaussian Unitary Ensemble (GUE).

2 Asymptotic energy dynamics in the spherical SK model

In this section, we explore the asymptotic energy dynamics of the spherical Sherrington-Kirkpatrick model. Our focus is on the Langevin dynamics, through which we analyze the system’s behavior over time. We will present our main results, including the derivation of integro-differential equations and the examination of long-term energy behavior and phase transitions in the model.

2.1 Main results

We consider the Langevin dynamics for the Sherrington-Kirkpatrick (SK) model defined by the following system of stochastic differential equations (SDEs) as in [2]:

dX_{t}^{i}=\sum_{j=1}^{N}J_{ij}X_{t}^{j}dt-f^{\prime}\left(\frac{1}{N}\sum_{j=1}^{N}(X_{t}^{j})^{2}\right)X_{t}^{i}dt+\beta^{-1/2}dW_{t}^{i}, (2.1)

where \mathbf{J}=\{J_{ij}\}_{1\leq i,j\leq N} is a symmetric matrix of centered Gaussian random variables with \mathbb{E}[J_{ij}^{2}]=\frac{1}{N} for 1\leq i<j\leq N and \mathbb{E}[J_{ii}^{2}]=\frac{2}{N}, f:[0,\infty)\to\mathbb{R} is such that f^{\prime} is non-negative and Lipschitz, \beta is a positive constant, and \{W^{i}_{t}\}_{1\leq i\leq N} is an N-dimensional Brownian motion, independent of \{J_{ij}\}_{1\leq i,j\leq N} and of the initial data \{X_{0}^{i}\}_{1\leq i\leq N}.

For any N1N\geq 1 and T0T\geq 0, the SDE (2.1) has a unique strong solution 𝑿t={Xti:1iN,t[0,T]}\bm{X}_{t}=\{X_{t}^{i}:1\leq i\leq N,t\in[0,T]\} on 𝒞([0,T],N)\mathcal{C}([0,T],\mathbb{R}^{N}). See [2, Lemma 6.7] for a proof.

The second term in (2.1) is a Lagrange multiplier that implements a smooth spherical constraint [2]. A key simplification of the SSK model is that the SDE (2.1) is invariant under rotations.
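Before passing to rotated coordinates, it may help to see the dynamics concretely. The following is a minimal Euler-Maruyama sketch of (2.1), assuming the quadratic choice f(x)=cx²/2 adopted later in (2.8) (so f'(x)=cx), Gaussian matrix entries, and an initial condition uniform on the sphere of radius √N (so that the empirical correlation introduced below equals 1 at time zero); the matrix size, step size, and horizon are illustrative.

```python
import numpy as np

def langevin_ssk(J, beta, c=1.0, T=5.0, dt=1e-3, seed=None):
    """Euler-Maruyama discretization of the Langevin SDE (2.1) with f(x) = c x^2 / 2.
    Returns the time grid and |X_t|^2 / N, i.e. the empirical correlation K_N(t, t)."""
    rng = np.random.default_rng(seed)
    N = J.shape[0]
    X = rng.standard_normal(N)
    X *= np.sqrt(N) / np.linalg.norm(X)            # start on the sphere of radius sqrt(N)
    n_steps = int(T / dt)
    KN = np.empty(n_steps + 1)
    KN[0] = X @ X / N
    for k in range(n_steps):
        drift = J @ X - c * (X @ X / N) * X        # f'(|X|^2/N) X_t with f'(x) = c x
        X = X + drift * dt + np.sqrt(dt / beta) * rng.standard_normal(N)
        KN[k + 1] = X @ X / N
    return np.arange(n_steps + 1) * dt, KN

rng = np.random.default_rng(2)
N = 200
Z = rng.standard_normal((N, N))
J = (Z + Z.T) / np.sqrt(2 * N)
t, KN = langevin_ssk(J, beta=1.0, seed=3)
print(KN[0], KN[-1])    # empirical correlation at the initial and final times
```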

We write \mathbf{J}=G^{T}DG, where G is an orthogonal matrix distributed according to the Haar measure on the orthogonal group and D=\text{diag}(\sigma^{1},\dots,\sigma^{N}) is the diagonal matrix of the eigenvalues \{\sigma^{i}\}_{1\leq i\leq N} of \mathbf{J}. As N\to\infty, the empirical spectral measure \frac{1}{N}\sum_{i=1}^{N}\delta_{\sigma^{i}} converges weakly to the semicircle law \mu_{D} with compact support [-2,2] by [28, Theorem 2.4.2]. Recall that the density function of the semicircle law \mu_{D} is given by

dμD=12π4x2𝟏{2x2}dx.d\mu_{D}=\frac{1}{2\pi}\sqrt{4-x^{2}}\mathbf{1}_{\{-2\leq x\leq 2\}}d\,x. (2.2)
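As a quick numerical illustration of this convergence (a sketch with Gaussian entries; the matrix size and the bins are arbitrary), one can compare the empirical spectral distribution of a sampled 𝐉 with the density (2.2):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 2000
Z = rng.standard_normal((N, N))
J = (Z + Z.T) / np.sqrt(2 * N)          # normalized Wigner (GOE-like) matrix
eigs = np.linalg.eigvalsh(J)

bins = np.linspace(-2, 2, 9)
hist, _ = np.histogram(eigs, bins=bins, density=True)   # empirical spectral density
centers = (bins[:-1] + bins[1:]) / 2
semicircle = np.sqrt(4 - centers**2) / (2 * np.pi)      # density (2.2) at the bin centers
print(np.round(hist, 3))
print(np.round(semicircle, 3))
```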

To simplify the SDE (2.1), we multiply both sides of (2.1) by the orthogonal matrix G and use the rotational invariance. We set \bm{Y}_{t}:=G\bm{X}_{t} and B_{t}:=GW_{t}. Then the SDE in the rotated coordinates is given by

dYti=(σif(Yt22/N))Ytidt+β1/2dBti,dY_{t}^{i}=\left({\sigma^{i}-f^{\prime}\left({\|Y_{t}\|_{2}^{2}/N}\right)}\right)Y_{t}^{i}dt+\beta^{-1/2}dB_{t}^{i}, (2.3)

where 2\|\cdot\|_{2} is the 2\ell^{2} norm.

Denote by

KN(t,s):=1Ni=1NXtiXsiK_{N}(t,s):=\frac{1}{N}\sum_{i=1}^{N}X_{t}^{i}X_{s}^{i} (2.4)

the empirical correlation function. We use abbreviated notation KN(t):=KN(t,t)K_{N}(t):=K_{N}(t,t) for convenience.

Ben Arous, Dembo, and Guionnet studied the dynamics of the empirical correlation KNK_{N} and the limiting point as NN\to\infty (NN is the size of the system) in [2], which is the unique solution to a CHSCK equation as follows.

Theorem 2.1.

[2, Theorem 2.3] Assume that the initial data \{X_{0}^{i}\}_{1\leq i\leq N} are i.i.d. with law \mu_{0} such that \mathbb{E}_{X\sim\mu_{0}}[e^{\alpha X}]<\infty for some \alpha>0. Fix T\geq 0. As N\to\infty, K_{N} converges almost surely to a deterministic limit K. Recall that \mu_{D} is the semicircle law. Moreover, the limit K is the unique solution to the following integro-differential equation:

K(t,s)\displaystyle K(t,s) =e0tf(K(w))𝑑w0sf(K(w))𝑑w𝔼(σ,X0)π[eσ(t+s)X02]\displaystyle=e^{-\int_{0}^{t}f^{\prime}(K(w))dw-\int_{0}^{s}f^{\prime}(K(w))dw}\underset{(\sigma,X_{0})\sim\pi^{\infty}}{\mathbb{E}}[e^{\sigma(t+s)}X_{0}^{2}]
+β10tsertf(K(w))𝑑wrsf(K(w))𝑑w𝔼(σ,X0)π[eσ(t+s2r)]𝑑r,\displaystyle+\beta^{-1}\int_{0}^{t\land s}e^{-\int_{r}^{t}f^{\prime}(K(w))dw-\int_{r}^{s}f^{\prime}(K(w))dw}\underset{(\sigma,X_{0})\sim\pi^{\infty}}{\mathbb{E}}[e^{\sigma(t+s-2r)}]dr,

where \pi^{\infty}=\mu_{D}\otimes\mu_{0} and here we write K(s):=K(s,s).

Remark 2.2.

As emphasized in [2], aging depends strongly on the initial conditions. In addition to the i.i.d. initial conditions, the authors also consider three other types of initial conditions: rotated independent initial conditions, top eigenvector initial conditions, and stationary initial conditions.

Remark 2.3.

Based on the thermodynamic limit of K_{N}(t,s) as N\to\infty, the authors study the long-time evolution of K(t,s) and establish a dynamical phase transition in terms of the asymptotics of K(t,s) in [2, Proposition 3.2]. This is the first mathematical proof of the aging phenomenon.

Next, we consider how the energy of the system evolves over time. Recall that the quadratic Hamiltonian of the SSK model is defined by H_{\mathbf{J}}(\bm{X}_{t})=\bm{X}_{t}^{T}\mathbf{J}\bm{X}_{t}. Note that H_{\mathbf{J}}(\bm{X}_{t})=\bm{Y}_{t}^{T}D\bm{Y}_{t}, where \bm{Y}_{t}=G\bm{X}_{t} and \mathbf{J}=G^{T}DG.

Let

H_{N}(t):=\frac{1}{N}H_{\mathbf{J}}(\bm{X}_{t})=\frac{1}{N}\sum_{i=1}^{N}\sigma^{i}(Y_{t}^{i})^{2} (2.5)

be the energy of the system.

Our first result characterizes the limiting behavior of the energy HN(t)H_{N}(t) of the SSK model as NN\to\infty for t[0,T]t\in[0,T] as follows.

Theorem 2.4.

Assume that the initial data \{X_{0}^{i}\}_{1\leq i\leq N} are i.i.d. with law \mu_{0} such that \mathbb{E}_{X\sim\mu_{0}}[e^{\alpha X}]<\infty for some \alpha>0. Fix T\geq 0. Let K be the solution defined as in Theorem 2.1. As N\to\infty, H_{N} converges almost surely to a deterministic limit H. Moreover, the limit H is the unique solution to the following integro-differential equation:

H(t)\displaystyle H(t) =e20tf(K(w))𝑑w𝔼(σ,X0)π[σe2σtX02]\displaystyle=e^{-2\int_{0}^{t}f^{\prime}(K(w))dw}\underset{(\sigma,X_{0})\sim\pi^{\infty}}{\mathbb{E}}[\sigma e^{2\sigma t}X_{0}^{2}] (2.6)
+β10te2stf(K(w))𝑑w𝔼(σ,X0)π[σe2σ(ts)]𝑑s\displaystyle+\beta^{-1}\int_{0}^{t}e^{-2\int_{s}^{t}f^{\prime}(K(w))dw}\underset{(\sigma,X_{0})\sim\pi^{\infty}}{\mathbb{E}}[\sigma e^{2\sigma(t-s)}]ds (2.7)

where \pi^{\infty}=\mu_{D}\otimes\mu_{0} and here we write K(s):=K(s,s).

Theorem 2.4 provides a precise characterization of H(t). However, the expression for H(t) is implicit, since it involves the fixed-point equation for K(t) from Theorem 2.1. A key feature of this model is that we can nevertheless determine the long-time behavior of the energy H(t) exactly as t\to\infty.

In order to precisely determine the limit of the energy, we set K(0,0)=1 and take the function

f(x)=cx22,f(x)=\frac{cx^{2}}{2}, (2.8)

where cc is a positive constant.

Let m:\mathbb{R}\setminus\mbox{supp}(\mu_{D})\to\mathbb{R} be the Stieltjes transform of the probability measure \mu_{D}, given by

m(s)=𝔼σμD[1sσ]=2s+s24.m(s)=\mathbb{E}_{\sigma\sim\mu_{D}}\left[\frac{1}{s-\sigma}\right]=\frac{2}{s+\sqrt{s^{2}-4}}. (2.9)
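The closed form in (2.9) can be checked numerically against the defining integral; the following sketch uses quadrature over the semicircle density (2.2) at a few arbitrary values s > 2.

```python
import numpy as np
from scipy.integrate import quad

def stieltjes_numeric(s):
    """m(s) = E[1/(s - sigma)] for sigma ~ semicircle law (2.2), by quadrature."""
    rho = lambda x: np.sqrt(4 - x**2) / (2 * np.pi)
    val, _ = quad(lambda x: rho(x) / (s - x), -2, 2)
    return val

def stieltjes_closed(s):
    """Closed form m(s) = 2 / (s + sqrt(s^2 - 4)) from (2.9), valid for s > 2."""
    return 2 / (s + np.sqrt(s**2 - 4))

for s in (2.5, 3.0, 5.0):
    print(s, stieltjes_numeric(s), stieltjes_closed(s))   # the two columns should agree
```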

Let βc\beta_{c} be the critical temperature such that

βc=c4m(2)=c4.\beta_{c}=\frac{c}{4}m(2)=\frac{c}{4}. (2.10)

Our second result is as follows.

Theorem 2.5.

Assume that K(0,0)=1 and f(x)=\frac{cx^{2}}{2} for a positive constant c>0. Let H be the unique solution given in Theorem 2.4. Then, for \beta\leq\beta_{c}, we have

limtH(t)=0.\lim_{t\to\infty}H(t)=0. (2.11)

For β>βc\beta>\beta_{c}, we have

limtH(t)=4βc252πcβ+12β1.\lim_{t\to\infty}H(t)=\frac{4\beta-c}{2^{\frac{5}{2}}\sqrt{\pi}c\beta}+\frac{1}{2}\beta^{-1}. (2.12)

Theorem 2.5 describes the limit of the energy function H(t) as t\to\infty and establishes a dynamical phase transition. See Figure 1 for the jump discontinuity in the asymptotic limit of H(t), where we set c=1.

Figure 1: In this figure, we set c=1c=1 and plot the limiting behavior of the function H(t)H(t) as tt approaches infinity. H(t)H(t) exhibits a jump discontinuity in the phase transition at the critical inverse temperature βc=0.25\beta_{c}=0.25.
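The limiting values in (2.11)-(2.12) are explicit and can be tabulated directly; the following sketch evaluates them for c=1 (so \beta_{c}=1/4 by (2.10)) at a few arbitrary values of \beta, reproducing the jump discontinuity shown in Figure 1.

```python
import numpy as np

def limiting_energy(beta, c=1.0):
    """Limit of H(t) as t -> infinity from Theorem 2.5, with beta_c = c / 4."""
    beta_c = c / 4
    if beta <= beta_c:
        return 0.0                                   # (2.11): zero limit for beta <= beta_c
    return (4 * beta - c) / (2**2.5 * np.sqrt(np.pi) * c * beta) + 1 / (2 * beta)   # (2.12)

for beta in (0.1, 0.2, 0.25, 0.26, 0.5, 1.0):
    print(beta, round(limiting_energy(beta), 4))     # jump at beta_c = 0.25, as in Figure 1
```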

The proof of Theorem 2.5 utilizes tools and techniques from the paper [2], with some modifications made to their results. We borrow the notation from [2] and write

R(t):=exp(20tf(K(w))𝑑w)R(t):=\exp\left({2\int_{0}^{t}f^{\prime}\left({K(w)}\right)dw}\right) (2.13)

with K(w)=K(w,w)K(w)=K(w,w).

Then the expression of H(t)H(t) in Theorem 2.4 becomes

H(t)=R(t)1(𝔼[σe2σt]+β10tR(r)𝔼[σe2σ(tr)]𝑑r)H(t)=R(t)^{-1}\left({\mathbb{E}[\sigma e^{2\sigma t}]+\beta^{-1}\int_{0}^{t}R(r)\mathbb{E}[\sigma e^{2\sigma(t-r)}]dr}\right) (2.14)

Note that the limit of H(t) is governed by the asymptotics of the derivative of the moment generating function of \sigma. So it suffices to characterize the limits of R(t) and \mathbb{E}\left[\sigma e^{2\sigma t}\right].

Similarly, we consider the asymptotic limit of H(t)/K(t)H(t)/K(t) as tt\to\infty.

Corollary 2.6.

Assume the same setting as in Theorem 2.5. Then for ββc\beta\leq\beta_{c}, we have

limtH(t)K(t)=0.\lim_{t\to\infty}\frac{H(t)}{K(t)}=0. (2.15)

For β>βc\beta>\beta_{c}, we have

limtH(t)K(t)=23/2(4βc)+πc25/2(4βc)+πc\lim_{t\to\infty}\frac{H(t)}{K(t)}=\frac{2^{-3/2}(4\beta-c)+\sqrt{\pi}c}{2^{-5/2}(4\beta-c)+\sqrt{\pi}c} (2.16)
Figure 2: In this figure, we set c=1c=1 and plot the limiting behavior of the function H(t)/K(t)H(t)/K(t) as tt approaches infinity. There is a jump discontinuity in the phase transition at the critical inverse temperature βc=0.25\beta_{c}=0.25.

2.2 Proof of Theorem 2.5

In the proof of our main results, we will need the Bessel function. Let us first revisit its basic definition and some relevant results that we will use.

For integer values of n, the (modified) Bessel function can be defined via the following integral representation:

Definition 2.7.

[1, Section 9.1] The Bessel function is given by

Bn(x):=inπ0πeixcosθcos(nθ)𝑑θ.B_{n}(x):=\frac{i^{-n}}{\pi}\int_{0}^{\pi}e^{ix\cos\theta}\cos(n\theta)d\theta. (2.17)

for nn\in\mathbb{N} and xx\in\mathbb{R}.

The modified Bessel function is given by

In(x):=1π0πexcosθcos(nθ)𝑑θ.I_{n}(x):=\frac{1}{\pi}\int_{0}^{\pi}e^{x\cos\theta}\cos(n\theta)d\theta. (2.18)

for nn\in\mathbb{N} and xx\in\mathbb{R}.

The relation between the Bessel function and the modified Bessel function is given by

In(x)=einπ/2Bn(xeiπ/2)I_{n}(x)=e^{-in\pi/2}B_{n}(xe^{i\pi/2}) (2.19)

as shown in [1, Section 9.6].

We will use the following lemma about the recurrence relation and derivatives of modified Bessel functions.

Lemma 2.8.

[1, Section 9.6] Let InI_{n} be the modified Bessel function defined as in Definition 2.7. For nn\in\mathbb{N},

In(x)=In+1(x)+nxIn(x),I_{n}^{\prime}(x)=I_{n+1}(x)+\frac{n}{x}I_{n}(x), (2.20)

and I0(x)=I1(x)I^{\prime}_{0}(x)=I_{1}(x).

By [1, Section 9.6], we have the following asymptotic results of modified Bessel functions.

Lemma 2.9.

Let InI_{n} be the modified Bessel function defined as in Definition 2.7. For nn\in\mathbb{N}, as xx\to\infty we have

limxIn(x)x1/2ex=12π.\lim_{x\to\infty}\frac{I_{n}(x)}{x^{-1/2}e^{x}}=\frac{1}{\sqrt{2\pi}}. (2.21)

We now express the characteristic function of the semicircle law in terms of the Bessel function.

Lemma 2.10.

Let B1(t)B_{1}(t) be the Bessel function defined as in Definition 2.7 for n=1n=1. Recall that the eigenvalues σ\sigma of N×NN\times N normalized symmetric Wigner matrix 𝐉\mathbf{J} follow the semicircle law with distribution μD\mu_{D} as in (2.2). Then we have

𝔼[eitσ]=B1(2t)t\mathbb{E}[e^{it\sigma}]=\frac{B_{1}(2t)}{t} (2.22)
Proof.

By [7, Theorem 6.2.3], it is enough to show that the inverse Fourier transform of B_{1}(2t)/t is the density of the semicircle law.

Note that by the inversion formula we have

\dfrac{1}{2\pi}\int_{\mathbb{R}}\dfrac{1}{t}B_{1}\left(2t\right)e^{-itx}dt =\dfrac{1}{2i\pi^{2}}\int_{\mathbb{R}}\int_{0}^{\pi}\dfrac{1}{t}e^{2it\cos\theta}\cos\theta\,e^{-itx}\,d\theta\,dt
=\dfrac{1}{2i\pi^{2}}\int_{0}^{\pi}\cos\theta\underbrace{\int_{\mathbb{R}}\dfrac{1}{t}e^{it(2\cos\theta-x)}dt}_{=i\pi\cdot\text{Sign}(2\cos\theta-x)}d\theta
=\dfrac{1}{2\pi}\int_{0}^{\pi}\cos\theta\cdot\text{Sign}(2\cos\theta-x)\,d\theta,

where \text{Sign}(\cdot) is the sign function and the inner integral is understood as a Cauchy principal value.

Clearly, we have \text{Sign}(2\cos\theta-x)=-1 for all \theta\in[0,\pi] when x>2, and \text{Sign}(2\cos\theta-x)=1 when x<-2. In both cases, the above integral is zero due to \int_{0}^{\pi}\cos\theta\,d\theta=0.

Consider the case that -2\leq x\leq 2. Set u=2\cos\theta-x and y=(u+x)/2=\cos\theta, so that d\theta=-dy/\sqrt{1-y^{2}} and 2\cos\theta-x>0 exactly when y>x/2. The above integral becomes

\dfrac{1}{2\pi}\int_{-1}^{1}\dfrac{y}{\sqrt{1-y^{2}}}\,\text{Sign}(2y-x)\,dy =\dfrac{1}{2\pi}\left(\int_{x/2}^{1}\dfrac{y}{\sqrt{1-y^{2}}}dy-\int_{-1}^{x/2}\dfrac{y}{\sqrt{1-y^{2}}}dy\right)
=\dfrac{1}{2\pi}\left(\sqrt{1-\tfrac{x^{2}}{4}}+\sqrt{1-\tfrac{x^{2}}{4}}\right)
=\dfrac{1}{2\pi}\sqrt{4-x^{2}},

which is the density of the semicircle law. ∎
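Lemma 2.10 can also be verified numerically. In the sketch below, scipy.special.jv is the standard Bessel function of the first kind, which coincides with B_n in Definition 2.7 for integer n; since the semicircle law is symmetric, its characteristic function is real, so it suffices to integrate cos(tx) against the density (2.2). The test values of t are arbitrary.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import jv   # Bessel function of the first kind (B_n in Definition 2.7)

def char_numeric(t):
    """E[exp(i t sigma)] for the semicircle law, computed by quadrature."""
    rho = lambda x: np.sqrt(4 - x**2) / (2 * np.pi)
    val, _ = quad(lambda x: np.cos(t * x) * rho(x), -2, 2)
    return val

for t in (0.5, 1.0, 3.0, 7.0):
    print(t, char_numeric(t), jv(1, 2 * t) / t)   # the two columns should agree, as in (2.22)
```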

The following lemma gives the asymptotics of the derivative of the moment generating function of \sigma.

Lemma 2.11.

Recall that the eigenvalues σ\sigma of N×NN\times N normalized symmetric Wigner matrix 𝐉\mathbf{J} follow the semicircle law with distribution μD\mu_{D} as in (2.2). Then we have:

limt𝔼σμD[σetσ]t3/2e2t=12π.\lim_{t\to\infty}\frac{\mathbb{E}_{\sigma\sim\mu_{D}}[\sigma e^{t\sigma}]}{t^{-3/2}e^{2t}}=\frac{1}{2\sqrt{\pi}}. (2.23)
Proof.

Substitute tt by itit in Lemma 2.10, then we have

𝔼[etσ]=B1(2it)2it\mathbb{E}[e^{t\sigma}]=\frac{B_{1}(-2it)}{-2it} (2.24)

where B1()B_{1}(\cdot) is the Bessel function.

By equation (2.19), we get

𝔼[etσ]=I1(2t)2t,\mathbb{E}[e^{t\sigma}]=\frac{I_{1}(2t)}{2t}, (2.25)

where I1(2t)I_{1}(2t) is the modified Bessel function.

Thus, the derivative of the moment generating function can be expressed as

𝔼[σetσ]=t2I1(2t)+2t1I1(2t)\mathbb{E}[\sigma e^{t\sigma}]=-t^{-2}I_{1}(2t)+2t^{-1}I_{1}^{\prime}(2t) (2.26)

Combining (2.26) and Lemma 2.8, we have

𝔼[σetσ]=I2(2t)t.\mathbb{E}[\sigma e^{t\sigma}]=\frac{I_{2}(2t)}{t}. (2.27)

By Lemma 2.9, this yields the desired result. ∎

Similarly, we have the following result.

Lemma 2.12.

Assume the same setting as in Lemma 2.11. Then we have

limt𝔼σμD[etσ]t3/2e2t=14π.\lim_{t\to\infty}\frac{\mathbb{E}_{\sigma\sim\mu_{D}}[e^{t\sigma}]}{t^{-3/2}e^{2t}}=\frac{1}{4\sqrt{\pi}}. (2.28)

Let m(s)m(s) be defined as in (2.9). We define

p(s,β):=2βscm(s).p(s,\beta):=\frac{2\beta s}{c}-m(s). (2.29)

Recall that \beta_{c} is the critical inverse temperature defined in (2.10). For any \beta\in(0,\beta_{c}), there exists a unique solution of p(s,\beta)=0 on the interval (2,\infty), denoted by s_{\beta}. We can solve for s_{\beta} explicitly: s_{\beta}=2(1-(1-\beta/\beta_{c})^{2})^{-1/2}. For \beta\geq\beta_{c}, we simply define s_{\beta}=2.

We can get the asymptotic limit of R(t)R(t) as follows.

Lemma 2.13.

[2, Lemma 3.3] Recall that sβ=2(1(1β/βc)2)1/2s_{\beta}=2(1-(1-\beta/\beta_{c})^{2})^{-1/2} for any β(0,βc)\beta\in(0,\beta_{c}), and sβ=2s_{\beta}=2 for ββc\beta\geq\beta_{c}. Let Ψ=0\Psi=0 for β<βc\beta<\beta_{c}, Ψ=12\Psi=\frac{1}{2} for β=βc\beta=\beta_{c}, and Ψ=32\Psi=\frac{3}{2} for β>βc\beta>\beta_{c}. Then there exists a constant Cβ>0C_{\beta}>0 such that

limxR(x)xΨe2xsβ=Cβ.\lim_{x\to\infty}\frac{R(x)}{x^{-\Psi}e^{2xs_{\beta}}}=C_{\beta}. (2.30)

Moreover, we have

Cβ={β(cm(sβ)+1)2βcm(sβ),β<βcβ(c+1)c,β=βccβ(4β+1)(4βc)2,β>βcC_{\beta}=\begin{cases}\frac{\beta(cm(s_{\beta})+1)}{2\beta-cm^{\prime}(s_{\beta})},\quad&\beta<\beta_{c}\\ \frac{\beta(c+1)}{c},\quad&\beta=\beta_{c}\\ \frac{c\beta(4\beta+1)}{(4\beta-c)^{2}},\quad&\beta>\beta_{c}\end{cases}

where cc is the coefficient constant defined in (2.8) and m(s)m(s) is defined as in (2.9).

Combining Lemma 2.11 and Lemma 2.13, we can characterize the limit of H(t) as t\to\infty. In order to obtain a precise asymptotic limit, we also take a Laplace transform to derive the following identity.

Define the Laplace transform of the function R(t)R(t) for z>2z>2 by

R(z):=0e2ztR(t)𝑑t.\mathcal{L}_{R}(z):=\int_{0}^{\infty}e^{-2zt}R(t)dt. (2.31)
Lemma 2.14.

The Laplace transform R(z)\mathcal{L}_{R}(z) satisfies the equation

2zR(z)1=cm(z)(1+β1R(z)),2z\mathcal{L}_{R}(z)-1=cm(z)(1+\beta^{-1}\mathcal{L}_{R}(z)), (2.32)

where cc is the coefficient constant defined in (2.8) and m(s)m(s) is defined as in (2.9).

Proof.

Note that

K(t)R(t)=K(t)e2c0tK(w)𝑑w=12ctR(t).K(t)R(t)=K(t)e^{2c\int_{0}^{t}K(w)dw}=\frac{1}{2c}\partial_{t}R(t).

Then we have the linear Volterra integro-differential equation

R(t)=2cK(t)R(t)=2c(𝔼[e2σt]+β10tR(r)𝔼[e2σ(tr)]𝑑r).R^{\prime}(t)=2cK(t)R(t)=2c\left({\mathbb{E}[e^{2\sigma t}]+\beta^{-1}\int_{0}^{t}R(r)\mathbb{E}[e^{2\sigma(t-r)}]dr}\right). (2.33)

The Laplace transform of the LHS in (2.33) is

R(z)=R(0)+2zR(z)=1+2zR(z).\mathcal{L}_{R^{\prime}}(z)=-R(0)+2z\mathcal{L}_{R}(z)=-1+2z\mathcal{L}_{R}(z).

Note that the integral on the RHS of (2.33) is the convolution of R(t) and \mathbb{E}[e^{2\sigma t}]. We write it as (\mathbb{E}[e^{2\sigma\cdot}]*R)(t) and use the fact that its Laplace transform equals the product of the Laplace transforms of the two functions.

Thus, the RHS becomes

\int_{0}^{\infty}e^{-2zt}R^{\prime}(t)dt =\int_{0}^{\infty}e^{-2zt}\left(2c\left(\mathbb{E}[e^{2\sigma t}]+\beta^{-1}\int_{0}^{t}R(r)\mathbb{E}[e^{2\sigma(t-r)}]dr\right)\right)dt
=c\,\mathbb{E}\left[\frac{1}{z-\sigma}\right]+2c\beta^{-1}\mathbb{E}\left[\int_{0}^{\infty}e^{-2t(z-\sigma)}dt\right]\mathcal{L}_{R}(z)
=c\,\mathbb{E}\left[\frac{1}{z-\sigma}\right]+c\beta^{-1}\mathbb{E}\left[\frac{1}{z-\sigma}\right]\mathcal{L}_{R}(z).

Hence, combining the Laplace transforms of the left and right sides, we obtain

2zR(z)=1+cm(z)+cβ1m(z)R(z).\displaystyle 2z\mathcal{L}_{R}(z)=1+cm(z)+c\beta^{-1}m(z)\mathcal{L}_{R}(z).

We now turn to the proof of Theorem 2.5.

Proof of Theorem 2.5.
  1. (i)

    We start by considering the case where β>βc\beta>\beta_{c}.

    Using Lemma 2.13, we obtain an asymptotic limit for R(t)R(t) as:

    R(t)tCβt3/2e4t,R(t)\sim_{t\to\infty}C_{\beta}t^{-3/2}e^{4t},

    where Cβ=cβ(4β+1)(4βc)2C_{\beta}=\frac{c\beta(4\beta+1)}{(4\beta-c)^{2}}.

    Combining the asymptotic limit for R(t)R(t) and Lemma 2.11, we notice that the limit of the first term of H(t)H(t) defined as in (2.14) is:

    limtR(t)1𝔼[σe2σt]=limt𝔼[σe2σt](2t)3/2e4t(2t)3/2e4tR(t)Cβt3/2e4tCβt3/2e4t=25/2πCβ.\lim_{t\to\infty}R(t)^{-1}\mathbb{E}[\sigma e^{2\sigma t}]=\lim_{t\to\infty}\frac{\frac{\mathbb{E}[\sigma e^{2\sigma t}]}{(2t)^{-3/2}e^{4t}}(2t)^{-3/2}e^{4t}}{\frac{R(t)}{C_{\beta}t^{-3/2}e^{4t}}C_{\beta}t^{-3/2}e^{4t}}=\frac{2^{-5/2}}{\sqrt{\pi}C_{\beta}}. (2.34)

    Next, we normalize the integral in equation (2.14) by the asymptotic form of R(t) and split it into three parts, in the regime t\to\infty, x\to\infty with x/t\to 0:

    Cβ1t32e4t0tR(r)𝔼[σe2σ(tr)]𝑑r\displaystyle C_{\beta}^{-1}t^{\frac{3}{2}}e^{-4t}\int_{0}^{t}R(r)\mathbb{E}[\sigma e^{2\sigma(t-r)}]dr =Cβ1t32e4t(0xR(r)𝔼[σe2σ(tr)]dr\displaystyle=C_{\beta}^{-1}t^{\frac{3}{2}}e^{-4t}\Bigg{(}\int_{0}^{x}R(r)\mathbb{E}[\sigma e^{2\sigma(t-r)}]dr
    +xtxR(r)𝔼[σe2σ(tr)]dr+txtR(r)𝔼[σe2σ(tr)]dr)\displaystyle+\int_{x}^{t-x}R(r)\mathbb{E}[\sigma e^{2\sigma(t-r)}]dr+\int_{t-x}^{t}R(r)\mathbb{E}[\sigma e^{2\sigma(t-r)}]dr\Bigg{)}
    =Cβ1t32e4t(0xR(r)𝔼[σe2σ(tr)]dr\displaystyle=C_{\beta}^{-1}t^{\frac{3}{2}}e^{-4t}\Bigg{(}\int_{0}^{x}R(r)\mathbb{E}[\sigma e^{2\sigma(t-r)}]dr
    +xtxR(r)𝔼[σe2σ(tr)]dr+0xR(tr)𝔼[σe2σ(r)]dr)\displaystyle+\int_{x}^{t-x}R(r)\mathbb{E}[\sigma e^{2\sigma(t-r)}]dr+\int_{0}^{x}R(t-r)\mathbb{E}[\sigma e^{2\sigma(r)}]dr\Bigg{)}
    =:I1+I2+I3.\displaystyle=:I_{1}+I_{2}+I_{3}.

    We now estimate each term separately. For I1I_{1}, we have:

    I1\displaystyle I_{1} =Cβ1t32e4t0xR(r)𝔼[σe2σ(tr)]𝑑r\displaystyle=C_{\beta}^{-1}t^{\frac{3}{2}}e^{-4t}\int_{0}^{x}R(r)\mathbb{E}[\sigma e^{2\sigma(t-r)}]dr
    =Cβ1t32e4t0xR(r)𝔼[σe2σ(tr)](2(tr))32e4(tr)(2(tr))32e4(tr)𝑑r\displaystyle=C_{\beta}^{-1}t^{\frac{3}{2}}e^{-4t}\int_{0}^{x}R(r)\frac{\mathbb{E}[\sigma e^{2\sigma(t-r)}]}{(2(t-r))^{-\frac{3}{2}}e^{4(t-r)}}(2(t-r))^{-\frac{3}{2}}e^{4(t-r)}dr
    =232Cβ10xR(r)e4r(12π+o(1))(ttr)32𝑑r\displaystyle=2^{-\frac{3}{2}}C_{\beta}^{-1}\int_{0}^{x}R(r)e^{-4r}\left({\frac{1}{2\sqrt{\pi}}+o(1)}\right)\left({\frac{t}{t-r}}\right)^{\frac{3}{2}}dr
    =232Cβ10xR(r)e4r(12π+o(1))(1+o(1))𝑑r\displaystyle=2^{-\frac{3}{2}}C_{\beta}^{-1}\int_{0}^{x}R(r)e^{-4r}\left({\frac{1}{2\sqrt{\pi}}+o(1)}\right)(1+o(1))dr
    =252Cβ11π0xR(r)e4r𝑑r\displaystyle=2^{-\frac{5}{2}}C_{\beta}^{-1}\frac{1}{\sqrt{\pi}}\int_{0}^{x}R(r)e^{-4r}dr

    For I2I_{2}, we can show that it is of smaller order than I1I_{1} and I3I_{3} and can be neglected. Indeed, we have

    I2\displaystyle I_{2} =Cβ1t32e4txtxR(r)𝔼[σe2σ(tr)]𝑑r\displaystyle=C_{\beta}^{-1}t^{\frac{3}{2}}e^{-4t}\int_{x}^{t-x}R(r)\mathbb{E}[\sigma e^{2\sigma(t-r)}]dr
    =Cβ1t32e4txtxR(r)Cβr32e4rCβr32e4r𝔼[σe2σ(tr)](2(tr))32e4(tr)(2(tr))32e4(tr)𝑑r\displaystyle=C_{\beta}^{-1}t^{\frac{3}{2}}e^{-4t}\int_{x}^{t-x}\frac{R(r)}{C_{\beta}r^{-\frac{3}{2}}e^{4r}}C_{\beta}r^{-\frac{3}{2}}e^{4r}\frac{\mathbb{E}[\sigma e^{2\sigma(t-r)}]}{(2(t-r))^{-\frac{3}{2}}e^{4(t-r)}}(2(t-r))^{-\frac{3}{2}}e^{4(t-r)}dr
    =Cβ1t32e4txtxCβr32e4r(12π+o(1))(2(tr))32e4(tr)𝑑r\displaystyle=C_{\beta}^{-1}t^{\frac{3}{2}}e^{-4t}\int_{x}^{t-x}C_{\beta}r^{-\frac{3}{2}}e^{4r}\left({\frac{1}{2\sqrt{\pi}}+o(1)}\right)(2(t-r))^{-\frac{3}{2}}e^{4(t-r)}dr
    =2521πxtx(tr(tr))32𝑑r\displaystyle=2^{-\frac{5}{2}}\frac{1}{\sqrt{\pi}}\int_{x}^{t-x}\left({\frac{t}{r(t-r)}}\right)^{\frac{3}{2}}dr
    =o(1)\displaystyle=o(1)

    Finally, for I3I_{3}, we have:

    I3=0x𝔼[σe2σr]e4r𝑑r.I_{3}=\int_{0}^{x}\mathbb{E}[\sigma e^{2\sigma r}]e^{-4r}dr.

    Note that we have

    0|R(r)|e4r𝑑r< and 0𝔼[σe2σr]e4r𝑑r<.\int_{0}^{\infty}|R(r)|e^{-4r}dr<\infty\mbox{ and }\,\int_{0}^{\infty}\mathbb{E}[\sigma e^{2\sigma r}]e^{-4r}dr<\infty.

    Then we have

    limtH(t)=252Cβπ+252β1Cβπ0R(r)e4r𝑑r+β10𝔼[σe2σr]e4r𝑑r.\lim_{t\to\infty}H(t)=\frac{2^{-\frac{5}{2}}}{C_{\beta}\sqrt{\pi}}+\frac{2^{-\frac{5}{2}}\beta^{-1}}{C_{\beta}\sqrt{\pi}}\int_{0}^{\infty}R(r)e^{-4r}dr+\beta^{-1}\int_{0}^{\infty}\mathbb{E}[\sigma e^{2\sigma r}]e^{-4r}dr. (2.35)

    By Lemma 2.14, we have

    0e4rR(r)𝑑r=R(2)=β(1+c)4βc\int_{0}^{\infty}e^{-4r}R(r)dr=\mathcal{L}_{R}(2)=\frac{\beta(1+c)}{4\beta-c} (2.36)

    Also, note that

    0𝔼[σe2σr]e4r𝑑r\displaystyle\int_{0}^{\infty}\mathbb{E}[\sigma e^{2\sigma r}]e^{-4r}dr =𝔼[σ0e2(2σ)r𝑑r]\displaystyle=\mathbb{E}\left[\sigma\int_{0}^{\infty}e^{-2(2-\sigma)r}dr\right]
    =𝔼[σ2(2σ)]\displaystyle=\mathbb{E}\left[\frac{\sigma}{2(2-\sigma)}\right]
    =12(1+𝔼[22σ])=12.\displaystyle=\frac{1}{2}\left({-1+\mathbb{E}\left[\frac{2}{2-\sigma}\right]}\right)=\frac{1}{2}. (2.37)

    Hence, plug (2.36) and (2.37) into (2.35) we have

    limtH(t)=252Cβπ+(252Cβπ)(1+c4βc)+12β1=252(4βc)πcβ+12β1.\lim_{t\to\infty}H(t)=\frac{2^{-\frac{5}{2}}}{C_{\beta}\sqrt{\pi}}+\left({\frac{2^{-\frac{5}{2}}}{C_{\beta}\sqrt{\pi}}}\right)\left({\frac{1+c}{4\beta-c}}\right)+\frac{1}{2}\beta^{-1}=\frac{2^{-\frac{5}{2}}(4\beta-c)}{\sqrt{\pi}c\beta}+\frac{1}{2}\beta^{-1}. (2.38)
  2. (ii)

    As β=βc\beta=\beta_{c}, by Lemma 2.13, we have

    R(t)tCβt1/2e4tR(t)\sim_{t\uparrow\infty}C_{\beta}t^{-1/2}e^{4t}

    where Cβ=β(c+1)cC_{\beta}=\frac{\beta(c+1)}{c}.

    By Lemma 2.11 and Lemma 2.13, the first term R1(t)𝔼[σe2σt]R^{-1}(t)\mathbb{E}[\sigma e^{2\sigma t}] in H(t)H(t) converges to 0 as tt\to\infty.

    Note that we have for x,t,x/t0x\to\infty,t\to\infty,x/t\to 0

    Cβ1t12e4t0tR(r)𝔼[σe2σ(tr)]𝑑r\displaystyle C_{\beta}^{-1}t^{\frac{1}{2}}e^{-4t}\int_{0}^{t}R(r)\mathbb{E}[\sigma e^{2\sigma(t-r)}]dr =Cβ1t12e4t(0xR(r)𝔼[σe2σ(tr)]dr\displaystyle=C_{\beta}^{-1}t^{\frac{1}{2}}e^{-4t}\Bigg{(}\int_{0}^{x}R(r)\mathbb{E}[\sigma e^{2\sigma(t-r)}]dr
    +xtxR(r)𝔼[σe2σ(tr)]dr+0xR(tr)𝔼[σe2σr]dr)\displaystyle+\int_{x}^{t-x}R(r)\mathbb{E}[\sigma e^{2\sigma(t-r)}]dr+\int_{0}^{x}R(t-r)\mathbb{E}[\sigma e^{2\sigma r}]dr\Bigg{)}
    =:E1+E2+E3\displaystyle=:E_{1}+E_{2}+E_{3}

    Then we have

    E1\displaystyle E_{1} =232Cβ10xR(r)e4rt12(tr)32(12π+o(1))𝑑r=o(1)\displaystyle=2^{-\frac{3}{2}}C_{\beta}^{-1}\int_{0}^{x}R(r)e^{-4r}\frac{t^{\frac{1}{2}}}{(t-r)^{\frac{3}{2}}}\left({\frac{1}{2\sqrt{\pi}}+o(1)}\right)dr=o(1)

    where it follows from t12(tr)32=1t(1+o(1))=o(1)\frac{t^{\frac{1}{2}}}{(t-r)^{\frac{3}{2}}}=\frac{1}{t}(1+o(1))=o(1).

    Similarly, we have E3=o(1)E_{3}=o(1).

    Also, we have

    E2=232πxtx(tr)12(1tr)32𝑑r=O(x12).\displaystyle E_{2}=\frac{2^{-\frac{3}{2}}}{\sqrt{\pi}}\int_{x}^{t-x}\left({\frac{t}{r}}\right)^{\frac{1}{2}}\left({\frac{1}{t-r}}\right)^{\frac{3}{2}}dr=O(x^{-\frac{1}{2}}).

    Hence, we have

    limtH(t)=0.\lim_{t\to\infty}H(t)=0. (2.39)
  3. (iii)

    In the case of β<βc\beta<\beta_{c}:

    By Lemma 2.13, we have

    R(t)tCβe2sβt,R(t)\sim_{t\uparrow\infty}C_{\beta}e^{2s_{\beta}t},

    where Cβ=β(cm(sβ)+1)2βcm(sβ)C_{\beta}=\frac{\beta(cm(s_{\beta})+1)}{2\beta-cm^{\prime}(s_{\beta})}.

    By Lemma 2.11 and Lemma 2.13, the first term R1(t)𝔼[σe2σt]R^{-1}(t)\mathbb{E}[\sigma e^{2\sigma t}] in H(t)H(t) converges to 0 as tt\to\infty.

    Note that we have for x,t,x/t0x\to\infty,t\to\infty,x/t\to 0

    Cβ1e2sβt0tR(r)𝔼[σe2σ(tr)]𝑑r\displaystyle C_{\beta}^{-1}e^{-2s_{\beta}t}\int_{0}^{t}R(r)\mathbb{E}[\sigma e^{2\sigma(t-r)}]dr =Cβ1e2sβt(0xR(r)𝔼[σe2σ(tr)]dr\displaystyle=C_{\beta}^{-1}e^{-2s_{\beta}t}\Bigg{(}\int_{0}^{x}R(r)\mathbb{E}[\sigma e^{2\sigma(t-r)}]dr
    +xtxR(r)𝔼[σe2σ(tr)]dr+0xR(tr)𝔼[σe2σr]dr)\displaystyle+\int_{x}^{t-x}R(r)\mathbb{E}[\sigma e^{2\sigma(t-r)}]dr+\int_{0}^{x}R(t-r)\mathbb{E}[\sigma e^{2\sigma r}]dr\Bigg{)}
    =:F1+F2+F3\displaystyle=:F_{1}+F_{2}+F_{3}

    Then we have

    F1\displaystyle F_{1} =232Cβ10xR(r)e4r1(tr)32e2(sβ2)t(12π+o(1))𝑑r=o(1)\displaystyle=2^{-\frac{3}{2}}C_{\beta}^{-1}\int_{0}^{x}R(r)e^{-4r}\frac{1}{(t-r)^{\frac{3}{2}}}e^{-2(s_{\beta}-2)t}\left({\frac{1}{2\sqrt{\pi}}+o(1)}\right)dr=o(1)

    where it follows from (t-r)^{-3/2}=o(1) and s_{\beta}>2.

    Similarly, we have F3=o(1)F_{3}=o(1).

    Also, we have

    F2=232πxtxe2(sβ2)(tr)(1tr)32𝑑r=O(x12).\displaystyle F_{2}=\frac{2^{-\frac{3}{2}}}{\sqrt{\pi}}\int_{x}^{t-x}e^{-2(s_{\beta}-2)(t-r)}\left({\frac{1}{t-r}}\right)^{\frac{3}{2}}dr=O(x^{-\frac{1}{2}}).

    Hence, we have

    limtH(t)=0.\lim_{t\to\infty}H(t)=0. (2.40)

2.3 Proof of Theorem 2.4

Recall that ff^{\prime} is non-negative and Lipschitz as defined in (2.1). Recall that K=K(t,t)K=K(t,t) is defined in (2.4). Define

Rτθ(K):=eτθf(K(s))𝑑sR_{\tau}^{\theta}(K):=e^{-\int_{\tau}^{\theta}f^{\prime}(K(s))ds} (2.41)

and

DRτθ(K)=ddτRτθ(K)=f(K(τ,τ))eτθf(K(s))𝑑sDR_{\tau}^{\theta}(K)=\frac{d}{d\tau}R_{\tau}^{\theta}(K)=f^{\prime}(K(\tau,\tau))e^{-\int_{\tau}^{\theta}f^{\prime}(K(s))ds} (2.42)

We have the following bound on Rτθ(K)R_{\tau}^{\theta}(K) and DRτθ(K)DR_{\tau}^{\theta}(K).

Lemma 2.15.

[2, Theorem 5.3] Recall that we define f,Rτθ(K),f^{\prime},R_{\tau}^{\theta}(K), and DRτθ(K)DR_{\tau}^{\theta}(K) as above. Then we have

  1. 1.

    for any 0τθT0\leq\tau\leq\theta\leq T and K𝐂([0,T]2)K\in\mathbf{C}([0,T]^{2}), we have

    0Rτθ(K)1, and 0t|Rτθ(K)|𝑑τ1.0\leq R_{\tau}^{\theta}(K)\leq 1,\,\mbox{ and }\int_{0}^{t}|R_{\tau}^{\theta}(K)|d\tau\leq 1. (2.43)
  2. 2.

    for every θT\theta\leq T, we have

    supτθ|Rτθ(K)Rτθ(K~)|f0θ|K(s,s)K~(s,s)|𝑑s,\sup_{\tau\leq\theta}|R_{\tau}^{\theta}(K)-R_{\tau}^{\theta}(\widetilde{K})|\leq\|f^{\prime}\|_{\mathcal{L}}\int_{0}^{\theta}|K(s,s)-\widetilde{K}(s,s)|ds, (2.44)

    where f\|f^{\prime}\|_{\mathcal{L}} is the Lipschitz norm of ff^{\prime}.

  3. 3.

    for any 0τθT0\leq\tau\leq\theta\leq T,

    |DRτθ(K)DRτθ(K~)|f{|K(τ,τ)K~(τ,τ)|+(DRτθ(K)+DRτθ(K~))0θ|K(s,s)K~(s,s)|𝑑s}.\displaystyle|DR_{\tau}^{\theta}(K)-DR_{\tau}^{\theta}(\widetilde{K})|\leq\|f^{\prime}\|_{\mathcal{L}}\Bigg{\{}|K(\tau,\tau)-\widetilde{K}(\tau,\tau)|+\left({DR_{\tau}^{\theta}(K)+DR_{\tau}^{\theta}(\widetilde{K})}\right)\int_{0}^{\theta}|K(s,s)-\widetilde{K}(s,s)|ds\Bigg{\}}.

Consider the following collections of functions with domain space 2×𝐂([0,T])\mathbb{R}^{2}\times\mathbf{C}([0,T]) for T>0T>0 and range space one of 𝐂([0,T]j)\mathbf{C}([0,T]^{j}) for j=1,2,3j=1,2,3:

𝒢1:={gj,j=1,2,3:g1(Y0,σ,B)(t)=σeσt(Y0)2,g2()(t)=σBt2,g3()(t)=σY0Bt}.\displaystyle\mathcal{G}_{1}:=\{g_{j},j=1,2,3:g_{1}(Y_{0},\sigma,B_{\cdot})(t)=\sigma e^{\sigma t}(Y_{0})^{2},g_{2}(\cdot)(t)=\sigma B_{t}^{2},g_{3}(\cdot)(t)=\sigma Y_{0}B_{t}\}.
𝒢2:={gj,j=4,5:g4(Y0,σ,B)(s,t)=σY0Bseσt,g5()(s,t)=σ2Y0Bseσt}.\displaystyle\mathcal{G}_{2}:=\{g_{j},j=4,5:g_{4}(Y_{0},\sigma,B_{\cdot})(s,t)=\sigma Y_{0}B_{s}e^{\sigma t},g_{5}(\cdot)(s,t)=\sigma^{2}Y_{0}B_{s}e^{\sigma t}\}.
𝒢3:={gj,j=6,7:g6(Y0,σ,B)(u,v,t)=σBuBveσt,g7()(u,v,t)=σ2BuBveσt}.\displaystyle\mathcal{G}_{3}:=\{g_{j},j=6,7:g_{6}(Y_{0},\sigma,B_{\cdot})(u,v,t)=\sigma B_{u}B_{v}e^{\sigma t},g_{7}(\cdot)(u,v,t)=\sigma^{2}B_{u}B_{v}e^{\sigma t}\}.

Then our collection of functions is

𝒢=𝒢1𝒢2𝒢3.\mathcal{G}=\mathcal{G}_{1}\cup\mathcal{G}_{2}\cup\mathcal{G}_{3}. (2.45)

Define the empirical measure

νTN:=1Ni=1NδY0i,σi,B[0,T]i.\nu_{T}^{N}:=\frac{1}{N}\sum_{i=1}^{N}\delta_{Y_{0}^{i},\sigma^{i},B^{i}_{[0,T]}}. (2.46)

Define for g𝒢g\in\mathcal{G}

𝒞N:=g(Y0,σ,B)𝑑νTN(Y0,σ,B)=1Ni=1Ng(Y0i,σi,Bi),\mathcal{C}_{N}:=\int g(Y_{0},\sigma,B_{\cdot})d\nu_{T}^{N}(Y_{0},\sigma,B_{\cdot})=\frac{1}{N}\sum_{i=1}^{N}g(Y_{0}^{i},\sigma^{i},B^{i}_{\cdot}), (2.47)

where note that for g𝒢jg\in\mathcal{G}_{j}, g𝑑νTN𝐂([0,T]j)\int gd\nu_{T}^{N}\in\mathbf{C}([0,T]^{j}) for j=1,2,3j=1,2,3.

Proof of Theorem 2.4.
  1. 1.

    Existence and uniqueness of the limit

    Applying Itô's formula [15, Theorem 7.6.1], we have

    Yti=e0t(σif(KN(r)))𝑑rY0i+β1/20test(σif(KN(r)))𝑑r𝑑Bsi.Y_{t}^{i}=e^{\int_{0}^{t}\left({\sigma^{i}-f^{\prime}(K_{N}(r))}\right)dr}Y_{0}^{i}+\beta^{-1/2}\int_{0}^{t}e^{\int_{s}^{t}\left({\sigma^{i}-f^{\prime}(K_{N}(r))}\right)dr}dB_{s}^{i}. (2.48)

    Define F_{t}(K,\sigma)=f^{\prime}(K(t,t))-\sigma. By integration by parts, we have

    Yti=e0tFr(KN,σi)𝑑rY0i+β1/2Bti+(β1/20tBsiFs(KN,σi)estFr(KN,σi)𝑑r𝑑s).Y_{t}^{i}=e^{-\int_{0}^{t}F_{r}(K_{N},\sigma^{i})dr}Y_{0}^{i}+\beta^{-1/2}B_{t}^{i}+\left({-\beta^{-1/2}\int_{0}^{t}B_{s}^{i}F_{s}(K_{N},\sigma^{i})e^{-\int_{s}^{t}F_{r}(K_{N},\sigma^{i})dr}ds}\right). (2.49)

    Then we have

    Yti=R0t(KN)eσitY0i=:T1i(t)+β1/2Bti=:T2i(t)+(β1/20tBsieσi(ts)(DRst(KN)σiRst(KN))𝑑s)=:T3i(t).Y_{t}^{i}=\underbrace{R_{0}^{t}(K_{N})e^{\sigma^{i}t}Y_{0}^{i}}_{=:T_{1}^{i}(t)}+\underbrace{\beta^{-1/2}B_{t}^{i}}_{=:T_{2}^{i}(t)}+\underbrace{\left({-\beta^{-1/2}\int_{0}^{t}B_{s}^{i}e^{\sigma^{i}(t-s)}(DR_{s}^{t}(K_{N})-\sigma^{i}R_{s}^{t}(K_{N}))ds}\right)}_{=:T_{3}^{i}(t)}. (2.50)

    We denote the three terms by T_{1}^{i}(t), T_{2}^{i}(t), and T_{3}^{i}(t), respectively, so that

    Yti=T1i(t)+T2i(t)+T3i(t).Y_{t}^{i}=T_{1}^{i}(t)+T_{2}^{i}(t)+T_{3}^{i}(t). (2.51)

    Then the energy HN(t)H_{N}(t) becomes

    HN(t)=1Ni=1Nσi(j=13(Tji(t))2+2jkTji(t)Tki(t)),H_{N}(t)=\frac{1}{N}\sum_{i=1}^{N}\sigma^{i}\left({\sum_{j=1}^{3}(T_{j}^{i}(t))^{2}+2\sum_{j\neq k}T_{j}^{i}(t)T_{k}^{i}(t)}\right), (2.52)

    Plugging in the expressions for T_{1}^{i}(t), T_{2}^{i}(t), and T_{3}^{i}(t), we obtain

    H_{N}(t) =\frac{1}{N}\sum_{i=1}^{N}\sigma^{i}(R_{0}^{t}(K_{N}))^{2}e^{2\sigma^{i}t}(Y_{0}^{i})^{2}+\frac{1}{N}\sum_{i=1}^{N}\sigma^{i}\beta^{-1}(B_{t}^{i})^{2}
    +\frac{\beta^{-1}}{N}\sum_{i=1}^{N}\sigma^{i}\int_{0}^{t}\int_{0}^{t}B_{u}^{i}B_{v}^{i}e^{\sigma^{i}(2t-u-v)}\left(DR_{u}^{t}(K_{N})-\sigma^{i}R_{u}^{t}(K_{N})\right)\left(DR_{v}^{t}(K_{N})-\sigma^{i}R_{v}^{t}(K_{N})\right)dudv
    +\frac{2}{N}\sum_{i=1}^{N}\sigma^{i}\Bigg\{\beta^{-1/2}Y_{0}^{i}B_{t}^{i}e^{\sigma^{i}t}R_{0}^{t}(K_{N})-\beta^{-1/2}\int_{0}^{t}Y_{0}^{i}B_{s}^{i}e^{\sigma^{i}(2t-s)}R_{0}^{t}(K_{N})\left(DR_{s}^{t}(K_{N})-\sigma^{i}R_{s}^{t}(K_{N})\right)ds
    -\beta^{-1}\int_{0}^{t}B_{t}^{i}B_{s}^{i}e^{\sigma^{i}(t-s)}\left(DR_{s}^{t}(K_{N})-\sigma^{i}R_{s}^{t}(K_{N})\right)ds\Bigg\}

    Hence, the above expression for H_{N}(t) defines a map

    Φ:𝐂([0,T]2)×𝐂([0,T])|𝒢1|×𝐂([0,T]2)|𝒢2|×𝐂([0,T]3)|𝒢3|𝐂([0,T])\Phi:\mathbf{C}([0,T]^{2})\times\mathbf{C}([0,T])^{|\mathcal{G}_{1}|}\times\mathbf{C}([0,T]^{2})^{|\mathcal{G}_{2}|}\times\mathbf{C}([0,T]^{3})^{|\mathcal{G}_{3}|}\to\mathbf{C}([0,T])

    such that

    HN=Φ(KN,𝒞N).H_{N}=\Phi(K_{N},\mathcal{C}_{N}).

    By Lemma 2.15, for any 𝒞N\mathcal{C}_{N} defined as in (2.47) and KN,K~N𝐂([0,T]2)K_{N},\widetilde{K}_{N}\in\mathbf{C}([0,T]^{2}), there exists a constant C1>0C_{1}>0 so that

    sup0tT|Φ(KN,𝒞N)Φ(K~N,𝒞N)|C10t|KN(s,s)K~N(s,s)|𝑑s\sup_{0\leq t\leq T}|\Phi(K_{N},\mathcal{C}_{N})-\Phi(\widetilde{K}_{N},\mathcal{C}_{N})|\leq C_{1}\int_{0}^{t}|K_{N}(s,s)-\widetilde{K}_{N}(s,s)|ds (2.53)

    Similarly, we apply Lemma 2.15, then for any 𝒞N,𝒞~N\mathcal{C}_{N},\widetilde{\mathcal{C}}_{N} defined as in (2.47) and K~N𝐂([0,T]2)\widetilde{K}_{N}\in\mathbf{C}([0,T]^{2}), there exists a constant C2>0C_{2}>0 so that

    sup0tT|Φ(K~N,𝒞N)Φ(K~N,𝒞~N)|C2𝒞N𝒞~N\sup_{0\leq t\leq T}|\Phi(\widetilde{K}_{N},\mathcal{C}_{N})-\Phi(\widetilde{K}_{N},\widetilde{\mathcal{C}}_{N})|\leq C_{2}\|\mathcal{C}_{N}-\widetilde{\mathcal{C}}_{N}\|_{\infty} (2.54)

    By Theorem 2.1, we know that KNK_{N} converges to the deterministic limit KK almost surely and KK is unique. By [23, Lemma 3.7], each element in 𝒞N\mathcal{C}_{N} converges to deterministic limits 𝒞\mathcal{C} almost surely under the i.i.d. initial conditions.

    Combining (2.53) and (2.54), then we have

    Φ(KN,𝒞N)Φ(K,𝒞)\displaystyle\|\Phi(K_{N},\mathcal{C}_{N})-\Phi(K,\mathcal{C})\|_{\infty} =Φ(KN,𝒞N)Φ(KN,𝒞)+Φ(KN,𝒞)Φ(K,𝒞)\displaystyle=\|\Phi(K_{N},\mathcal{C}_{N})-\Phi(K_{N},\mathcal{C})+\Phi(K_{N},\mathcal{C})-\Phi(K,\mathcal{C})\|_{\infty}
    Φ(KN,𝒞N)Φ(KN,𝒞)+Φ(KN,𝒞)Φ(K,𝒞)\displaystyle\leq\|\Phi(K_{N},\mathcal{C}_{N})-\Phi(K_{N},\mathcal{C})\|_{\infty}+\|\Phi(K_{N},\mathcal{C})-\Phi(K,\mathcal{C})\|_{\infty}
    C1KNK+C2𝒞N𝒞0, as N.\displaystyle\leq C_{1}\|K_{N}-K\|_{\infty}+C_{2}\|\mathcal{C}_{N}-\mathcal{C}\|_{\infty}\to 0,\mbox{ as }N\to\infty.

    Hence, HNH_{N} converges to deterministic function HH.

    Also, we have H=Φ(K,𝒞)H=\Phi(K,\mathcal{C}). Indeed,

    |HΦ(K,𝒞)||HHN|+|HNΦ(K,𝒞)|0, as N.\displaystyle|H-\Phi(K,\mathcal{C})|\leq|H-H_{N}|+|H_{N}-\Phi(K,\mathcal{C})|\to 0,\mbox{ as }N\to\infty.
  2. 2.

    Equations for the limit points

    Next, we characterize the limit HH as follows. Recall that Rτθ(K)=eτθf(K(s))𝑑sR_{\tau}^{\theta}(K)=e^{-\int_{\tau}^{\theta}f^{\prime}(K(s))ds}. Then YtY_{t} can be expressed as

    Yti=R0t(KN)eσitY0i+β1/20tRst(KN)eσi(ts)𝑑Bsi.Y_{t}^{i}=R_{0}^{t}(K_{N})e^{\sigma^{i}t}Y_{0}^{i}+\beta^{-1/2}\int_{0}^{t}R_{s}^{t}(K_{N})e^{\sigma^{i}(t-s)}dB_{s}^{i}.

    Substituting the above expression for Y_{t}^{i} into H_{N}(t) defined in (2.5), we have

    HN(t)\displaystyle H_{N}(t) =1Ni=1Nσie2σit(Y0i)2(R0t(KN))2+β1Ni=1Nσi(0tRst(KN)eσi(ts)𝑑Bsi)2\displaystyle=\frac{1}{N}\sum_{i=1}^{N}\sigma^{i}e^{2\sigma^{i}t}(Y_{0}^{i})^{2}(R_{0}^{t}(K_{N}))^{2}+\frac{\beta^{-1}}{N}\sum_{i=1}^{N}\sigma^{i}\left({\int_{0}^{t}R_{s}^{t}(K_{N})e^{\sigma^{i}(t-s)}dB_{s}^{i}}\right)^{2}
    +2β1/2Ni=1Nσi(R0t(KN)eσitY0i)0tRst(KN)eσi(ts)𝑑Bsi.\displaystyle+\frac{2\beta^{-1/2}}{N}\sum_{i=1}^{N}\sigma^{i}\left({R_{0}^{t}(K_{N})e^{\sigma^{i}t}Y_{0}^{i}}\right)\int_{0}^{t}R_{s}^{t}(K_{N})e^{\sigma^{i}(t-s)}dB_{s}^{i}.

    As N\to\infty, the first term converges to its expectation in (2.6) by the strong law of large numbers (SLLN) [15, Theorem 2.4.1], the limit of the second term, given in (2.7), is obtained by the SLLN and the Itô isometry [24, Lemma 3.1.5], and the last term converges to zero by the SLLN. Note that taking the limit requires the empirical measure \nu_{T}^{N}:=\frac{1}{N}\sum_{i=1}^{N}\delta_{Y_{0}^{i},\sigma^{i},B^{i}_{[0,T]}} to converge to the desired limit under the i.i.d. initial conditions. Hence, we obtain the desired integro-differential equation as in Theorem 2.4. ∎

2.4 Proof of Corollary 2.6

Proof.

Consider first the regime β>βc\beta>\beta_{c}. Recall that R(t)R(t) is defined as in (2.13) and K(t)K(t) is given in Theorem 2.1. We rewrite K(t)K(t) as follows:

K(t)=R1(t)(𝔼[e2tσ]+β10tR(r)𝔼[e2(tr)σ]𝑑r).K(t)=R^{-1}(t)\left({\mathbb{E}[e^{2t\sigma}]+\beta^{-1}\int_{0}^{t}R(r)\mathbb{E}[e^{2(t-r)\sigma}]dr}\right). (2.55)

Applying Lemma 2.13 and Lemma 2.12 and plugging in the asymptotic limits of R(t) and \mathbb{E}[e^{2t\sigma}], we get

limtK(t)=272π12(4β+1)Cβ(4βc)+12β1.\lim_{t\to\infty}K(t)=\frac{2^{-\frac{7}{2}}\pi^{-\frac{1}{2}}(4\beta+1)}{C_{\beta}(4\beta-c)}+\frac{1}{2}\beta^{-1}. (2.56)

Combining this result and Theorem 2.5, we obtained the desired result.

For \beta<\beta_{c}, we will show that \lim_{t\to\infty}K(t)=C for some non-zero constant C. Take h(t)=R(t)K(t). Note that

R(t)=2cK(t)R(t)=2ch(t).R^{\prime}(t)=2cK(t)R(t)=2ch(t). (2.57)

Thus, we have 2c\mathcal{L}_{h}(z)=2z\mathcal{L}_{R}(z)-1.

By Lemma 2.14, we have

R(z)=1+cm(z)2zcβ1m(z)\mathcal{L}_{R}(z)=\frac{1+cm(z)}{2z-c\beta^{-1}m(z)} (2.58)

Hence, we get

\mathcal{L}_{h}(z)=\frac{1}{2c}\left(\frac{cm(z)(2z+\beta^{-1})}{2z-c\beta^{-1}m(z)}\right). (2.59)

Note that \mathcal{L}_{h}(z) has a simple pole at s_{\beta}, the solution of 2z=c\beta^{-1}m(z). Thus there exists a constant C>0 such that

\lim_{z\to 0}z\mathcal{L}_{h}(z+s_{\beta})=C (2.60)

By [4, Lemma 7.2], we have

limte2sβth(t)=C\lim_{t\to\infty}e^{-2s_{\beta}t}h(t)=C (2.61)

Hence, \lim_{t\to\infty}K(t) is a non-zero constant, and the desired result follows from Theorem 2.5.

For \beta=\beta_{c}, we apply the same argument as for \beta<\beta_{c}, and \lim_{t\to\infty}K(t)=C_{1} for some non-zero constant C_{1}. Hence, we again obtain the desired result by Theorem 2.5. ∎

3 Gradient descent in spherical SK model: hitting time analysis

In this section, we study the complexity of using the gradient descent algorithm to find the extreme eigenvectors of the Wigner matrix. This is related to our main results on the asymptotic limit of the energy in Section 2, which describe the time for the SSK model to reach its equilibrium state.

Recall that we previously defined the spherical Sherrington-Kirkpatrick (SSK) model on a sphere of radius \sqrt{N}. For simplicity, we consider the SSK model on the unit sphere in this section, characterized by the Hamiltonian

H𝐉(X)=XT𝐉X,H_{\mathbf{J}}(X)=X^{T}\mathbf{J}X,

where X=(X1,,XN)X=(X_{1},\dots,X_{N}) with X2=1\|X\|_{2}=1 and 𝐉\mathbf{J} is the normalized symmetric Wigner matrix.

We denote the N eigenvalues of \mathbf{J}, in increasing order, by

λ1λ2λN,\lambda_{1}\leq\lambda_{2}\leq\dots\leq\lambda_{N},

and v1,v2,,vNv_{1},v_{2},\dots,v_{N} be an orthonormal basis of eigenvectors of 𝐉\mathbf{J} so that 𝐉vi=λivi\mathbf{J}v_{i}=\lambda_{i}v_{i} for i=1,,Ni=1,\dots,N.

The gradient descent algorithm (a.k.a. zero-temperature dynamics) of the SSK model on the unit sphere is defined as follows:

dXt=𝕊N1H𝐉(Xt)dt,dX_{t}=-\nabla_{\mathbb{S}^{N-1}}H_{\mathbf{J}}(X_{t})dt, (3.1)

where the initial data X0={X0i}1iNX_{0}=\{X_{0}^{i}\}_{1\leq i\leq N} is uniformly distributed on the unit sphere 𝕊N1\mathbb{S}^{N-1}, and 𝕊N1\nabla_{\mathbb{S}^{N-1}} is the gradient on the unit sphere 𝕊N1\mathbb{S}^{N-1} and

𝕊N1f(x):=f(x)(f(x)x)x,xN\nabla_{\mathbb{S}^{N-1}}f(x):=\nabla f(x)-(\nabla f(x)\cdot x)x,\,x\in\mathbb{R}^{N}

for smooth functions ff.

Before presenting the main result of this section, we need one additional assumption on the Wigner matrix \mathbf{J} to ensure that the Tracy-Widom law holds, which describes the fluctuations of the largest eigenvalue around the spectral edge [29, 30]. Moreover, Lee and Yin proved in [22] that this is not only true for Gaussian ensembles but also holds in more general situations; they establish the following simple sufficient criterion for the Tracy-Widom law.

Theorem 3.1.

[22, Theorem 1.2] Let 𝐉\mathbf{J} be the normalized Wigner matrix as in Definition 1.1 and λ1λ2λN\lambda_{1}\leq\lambda_{2}\leq\dots\lambda_{N} the eigenvalues of 𝐉\mathbf{J} as before. If the off-diagonal entry of the Wigner matrix satisfies

limss4(|Z12|s)=0,\lim_{s\to\infty}s^{4}\mathbb{P}(|Z_{12}|\geq s)=0, (3.2)

then the joint distribution function of the k rescaled largest eigenvalues

(N2/3(λN2)s1,N2/3(λN12)s2,,N2/3(λNk+12)sk)\mathbb{P}(N^{2/3}(\lambda_{N}-2)\leq s_{1},N^{2/3}(\lambda_{N-1}-2)\leq s_{2},\dots,N^{2/3}(\lambda_{N-k+1}-2)\leq s_{k}) (3.3)

has a limit as NN\to\infty, which coincides with that in the GUE and GOE cases, i.e., it weakly converges to the Tracy-Widom distribution. This result also holds for the smallest eigenvalues λ1,,λk\lambda_{1},\dots,\lambda_{k}.

Remark 3.2.

Note that any distribution with a finite fourth moment satisfies the criterion (3.2); however, the converse statement does not hold. See [22, Remark 1.3] for a counterexample.

Fix \varepsilon\in(0,1). Denote by T_{\varepsilon} the hitting time, that is, the first time at which the overlap between the output X_{t} of the gradient descent and the eigenvector v_{1} corresponding to the smallest eigenvalue of \mathbf{J} reaches \varepsilon:

Tε:=inft>0{|v1Xt|ε}.T_{\varepsilon}:=\inf_{t>0}\{|v_{1}\cdot X_{t}|\geq\varepsilon\}.
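The following Python sketch discretizes the zero-temperature dynamics (3.1) with explicit Euler steps and renormalization back to the unit sphere, and records the hitting time. The step size, the tolerance ε = 0.5, the time horizon, and the matrix sizes are illustrative choices, and the eigenvector v_1 is obtained by exact diagonalization only in order to measure the overlap.

```python
import numpy as np

def hitting_time(J, eps=0.5, dt=1e-2, t_max=500.0, seed=None):
    """First time the overlap |v_1 . X_t| of the spherical gradient flow (3.1) reaches eps."""
    rng = np.random.default_rng(seed)
    N = J.shape[0]
    v1 = np.linalg.eigh(J)[1][:, 0]           # eigenvector of the smallest eigenvalue
    X = rng.standard_normal(N)
    X /= np.linalg.norm(X)                    # uniform start on the unit sphere
    t = 0.0
    while t < t_max:
        if abs(v1 @ X) >= eps:
            return t
        grad = 2 * J @ X                      # gradient of H_J(X) = X^T J X
        X = X - dt * (grad - (grad @ X) * X)  # spherical gradient step
        X /= np.linalg.norm(X)                # project back onto the unit sphere
        t += dt
    return np.inf

rng = np.random.default_rng(5)
for N in (100, 200, 400):
    Z = rng.standard_normal((N, N))
    J = (Z + Z.T) / np.sqrt(2 * N)
    # Theorem 3.3: T_eps lies between C1*N^(2/3) and C2*N^(2/3)*log(C3*N) with high probability
    print(N, hitting_time(J, seed=6))
```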

Our main result gives lower and upper bounds on the hitting time T_{\varepsilon}, as follows.

Theorem 3.3.

Assume that the normalized N×NN\times N Wigner matrix 𝐉={Jij}1i,jN={ZijN}1i,jN\mathbf{J}=\{J_{ij}\}_{1\leq i,j\leq N}=\{\frac{Z_{ij}}{\sqrt{N}}\}_{1\leq i,j\leq N} obeys the following condition: for 1i,jN1\leq i,j\leq N

𝔼|Zij|4C\mathbb{E}|Z_{ij}|^{4}\leq C (3.4)

for some constant C>0. Consider the gradient descent described in (3.1) with the same setting as before. Fix \varepsilon\in(0,1). Let the hitting time T_{\varepsilon} be defined as before. For every \delta>0, there exist constants C_{1}=C_{1}(\delta)>0, C_{2}=C_{2}(\delta)>0, and C_{3}=C_{3}(\varepsilon,\delta)>0 such that

limN(C1N2/3<Tε<C2N2/3log(C3N))>1δ.\lim_{N\to\infty}\mathbb{P}\left({C_{1}N^{2/3}<T_{\varepsilon}<C_{2}N^{2/3}\log(C_{3}N)}\right)>1-\delta.

Combining Theorem 3.1 and the continuous mapping theorem [15, Theorem 3.2.10], we obtain the following lemma; in particular, \lambda_{2}-\lambda_{1}=O_{p}(N^{-2/3}).

Lemma 3.4.

[21, Lemma 2.3] Let 𝐉\mathbf{J} be the normalized Wigner matrix. Assume that 𝐉\mathbf{J} obeys the same condition as in (3.4). Let λ1<λ2\lambda_{1}<\lambda_{2} be the two smallest eigenvalues of 𝐉\mathbf{J}. Then for every ε>0\varepsilon>0, there exists δ>0\delta>0 so that

(N2/3(λ2λ1)δ)1ε\mathbb{P}\left({N^{2/3}(\lambda_{2}-\lambda_{1})\geq\delta}\right)\geq 1-\varepsilon (3.5)

for all N large enough. This result also holds for the two largest eigenvalues \lambda_{N-1}<\lambda_{N}.

Define the overlap of the output X_{t} and the eigenvectors v_{i} of \mathbf{J} by h_{i}(t):=v_{i}\cdot X_{t} for i=1,\dots,N. We can solve for h_{i}(t), i=1,\dots,N, explicitly, as follows.

Lemma 3.5.

Assume that the same setting holds as in Theorem 3.3. Then we have

|hj(t)|=|hj(0)|e2λjti=1Nhi2(0)e4λit.|h_{j}(t)|=\frac{|h_{j}(0)|e^{-2\lambda_{j}t}}{\sqrt{\sum_{i=1}^{N}h^{2}_{i}(0)e^{-4\lambda_{i}t}}}. (3.6)
Proof.

Note that the gradient descent (3.1) can be simplified as follows

dXt=(H𝐉(Xt)(H𝐉(Xt)Xt)Xt)dt=2𝐉Xtdt+2H𝐉(Xt)Xtdt.dX_{t}=-\left({\nabla H_{\mathbf{J}}(X_{t})-(\nabla H_{\mathbf{J}}(X_{t})\cdot X_{t})X_{t}}\right)dt=-2\mathbf{J}X_{t}dt+2H_{\mathbf{J}}(X_{t})X_{t}dt. (3.7)

By spectral decomposition we have

𝐉=i=1NviviTλi,\mathbf{J}=\sum_{i=1}^{N}v_{i}v_{i}^{T}\lambda_{i},

where v1,,vNv_{1},\dots,v_{N} are eigenvectors corresponding to eigenvalues λ1λ2λN\lambda_{1}\leq\lambda_{2}\leq\dots\leq\lambda_{N} of 𝐉\mathbf{J}.

Note that

H𝐉(Xt)=Xt𝐉Xt=iλi(hi(t))2.H_{\mathbf{J}}(X_{t})=X_{t}\cdot\mathbf{J}X_{t}=\sum_{i}\lambda_{i}(h_{i}(t))^{2}.

Taking the dot product of both sides of (3.7) with v_{1} and substituting the above expression for H_{\mathbf{J}}(X_{t}), we get

\frac{1}{2}h_{1}^{\prime}(t) =h_{1}(t)H_{\mathbf{J}}(X_{t})-v_{1}\cdot\Big(\sum_{i}v_{i}v_{i}^{T}\lambda_{i}\Big)X_{t}
=h_{1}(t)H_{\mathbf{J}}(X_{t})-\lambda_{1}v_{1}\cdot X_{t}
=h_{1}(t)H_{\mathbf{J}}(X_{t})-\lambda_{1}h_{1}(t).

By the fact that ihi(t)2=1\sum_{i}h_{i}(t)^{2}=1 we have

\frac{1}{2}h_{1}^{\prime}(t)=(H_{\mathbf{J}}(X_{t})-\lambda_{1})h_{1}(t)=\sum_{i=1}^{N}[(\lambda_{i}-\lambda_{1})h_{i}^{2}]h_{1}(t). (3.8)

where we write hi(t)=hih_{i}(t)=h_{i} for convenience, i=1,,Ni=1,\dots,N.

Similarly, we have

12hj(t)=i=1N[(λiλj)hi2]hj(t).\frac{1}{2}h_{j}^{\prime}(t)=\sum_{i=1}^{N}[(\lambda_{i}-\lambda_{j})h_{i}^{2}]h_{j}(t). (3.9)

Multiplying both sides of (3.9) by hj(t)h_{j}(t) yields

(hj2(t))=4i=1N((λiλj)hi2)hj2\displaystyle(h^{2}_{j}(t))^{\prime}=4\sum_{i=1}^{N}\left({(\lambda_{i}-\lambda_{j})h_{i}^{2}}\right)h_{j}^{2}

Denote f(t)=iλihi2(t)f(t)=\sum_{i}\lambda_{i}h_{i}^{2}(t) and F(t)=0tf(s)𝑑sF(t)=\int_{0}^{t}f(s)ds.

Dividing both sides by hj2h_{j}^{2} and using ihi(t)2=1\sum_{i}h_{i}(t)^{2}=1 yields, for j=1,,Nj=1,\dots,N,

(loghj2(t))=4f(t)4λj\displaystyle(\log h_{j}^{2}(t))^{\prime}=4f(t)-4\lambda_{j}

Integrating both sides from 0 to tt yields

hj2(t)=hj(0)2exp(4F(t)4λjt)h^{2}_{j}(t)=h_{j}(0)^{2}\exp\left({4F(t)-4\lambda_{j}t}\right) (3.10)

Differentiating F(t)F(t) and substituting (3.10) yields

F(t)=iλihi2(t)=e4F(t)(iλihi2(0)e4λit).\displaystyle F^{\prime}(t)=\sum_{i}\lambda_{i}h_{i}^{2}(t)=e^{4F(t)}\left({\sum_{i}\lambda_{i}h^{2}_{i}(0)e^{-4\lambda_{i}t}}\right).

Dividing both sides by e4F(t)e^{4F(t)} and integrating from 0 to tt (recall F(0)=0F(0)=0), we obtain

e4F(t)1=ihi2(0)(e4λit1).e^{-4F(t)}-1=\sum_{i}h_{i}^{2}(0)(e^{-4\lambda_{i}t}-1). (3.11)

Since ihi2(0)=1\sum_{i}h_{i}^{2}(0)=1, this gives

F(t)=14log(ihi2(0)e4λit).F(t)=-\frac{1}{4}\log\left({\sum_{i}h_{i}^{2}(0)e^{-4\lambda_{i}t}}\right). (3.12)

Substituting (3.12) into (3.10) yields, for j=1,2,,Nj=1,2,\dots,N,

|hj(t)|=|hj(0)|e2λjti=1Nhi2(0)e4λit.|h_{j}(t)|=\frac{|h_{j}(0)|e^{-2\lambda_{j}t}}{\sqrt{\sum_{i=1}^{N}h^{2}_{i}(0)e^{-4\lambda_{i}t}}}. (3.13)
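As a sanity check (not part of the argument), the closed form (3.13) can be compared with a direct numerical integration of the flow (3.7). The following sketch uses Python with NumPy; the Gaussian (GOE-type) Wigner sample, the size N = 200, the horizon T = 5, and the Euler step size are illustrative assumptions rather than choices made in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200

# Gaussian (GOE-type) normalized Wigner sample, used only as an illustration
Z = rng.standard_normal((N, N))
J = (Z + Z.T) / np.sqrt(2 * N)
lam, V = np.linalg.eigh(J)                    # ascending eigenvalues, columns are eigenvectors

# Random unit initial condition and its overlaps h_i(0) = v_i . X_0
x0 = rng.standard_normal(N)
x0 /= np.linalg.norm(x0)
h0 = V.T @ x0

def overlaps_closed_form(t):
    """|h_j(t)| for all j, as in (3.13)."""
    denom = np.sqrt(np.sum(h0 ** 2 * np.exp(-4.0 * lam * t)))
    return np.abs(h0) * np.exp(-2.0 * lam * t) / denom

# Explicit Euler discretization of dX = (-2 J X + 2 (X . J X) X) dt, cf. (3.7)
dt, T = 1e-3, 5.0
y = x0.copy()
for _ in range(int(T / dt)):
    Jy = J @ y
    y = y + dt * (-2.0 * Jy + 2.0 * (y @ Jy) * y)
    y /= np.linalg.norm(y)                    # keep the iterate on the unit sphere

print("max_j |closed form - Euler|:", np.max(np.abs(overlaps_closed_form(T) - np.abs(V.T @ y))))
```

With these parameters the two computations typically agree to a few decimal places; the re-normalization after each Euler step simply keeps the iterate on the unit sphere, which the continuous flow preserves exactly.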

To prove Theorem 3.3, we need the following Lemma.

Lemma 3.6.

Consider a sequence of i.i.d. positive random variables X1,,XkX_{1},\dots,X_{k}. For every constant C>0C>0, we have

(X1+X2++XkXj>C)≥1Ck, for j=1,,k.\mathbb{P}\left(\frac{X_{1}+X_{2}+\dots+X_{k}}{X_{j}}>C\right)\geq 1-\frac{C}{k},\mbox{ for }\,j=1,\dots,k.
Proof.

Note that

0<𝔼[X1i=1kXi]=𝔼[X2i=1kXi]==𝔼[Xki=1kXi]<1.0<\mathbb{E}\left[\frac{X_{1}}{\sum_{i=1}^{k}X_{i}}\right]=\mathbb{E}\left[\frac{X_{2}}{\sum_{i=1}^{k}X_{i}}\right]=\dots=\mathbb{E}\left[\frac{X_{k}}{\sum_{i=1}^{k}X_{i}}\right]<1.

Since these kk expectations are equal and sum to 11 (because jXj/iXi=1\sum_{j}X_{j}/\sum_{i}X_{i}=1), we have

𝔼[X1i=1kXi]=1k.\mathbb{E}\left[\frac{X_{1}}{\sum_{i=1}^{k}X_{i}}\right]=\frac{1}{k}.

By Markov’s inequality, we get for every constant C>0C>0

(X1+X2++XkXjC)=(XjX1+X2++Xk1/C)C𝔼[XjiXi]=Ck.\mathbb{P}\left(\frac{X_{1}+X_{2}+\dots+X_{k}}{X_{j}}\leq C\right)=\mathbb{P}\left(\frac{X_{j}}{X_{1}+X_{2}+\dots+X_{k}}\geq 1/C\right)\leq C\cdot\mathbb{E}\left[\frac{X_{j}}{\sum_{i}X_{i}}\right]=\frac{C}{k}. (3.14)
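For intuition only, Lemma 3.6 is easy to check by simulation. The sketch below (Python with NumPy) uses i.i.d. unit-rate exponential variables as an arbitrary example of positive i.i.d. random variables and compares the empirical probability for j = 1 with the bound 1 − C/k; the values of k, C, and the number of trials are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
k, C, trials = 50, 5.0, 100_000

# i.i.d. positive random variables (unit-rate exponentials, chosen arbitrarily)
X = rng.exponential(scale=1.0, size=(trials, k))
ratio = X.sum(axis=1) / X[:, 0]          # (X_1 + ... + X_k) / X_1, i.e. the case j = 1

print("empirical P(ratio > C):", np.mean(ratio > C))
print("bound 1 - C/k from Lemma 3.6:", 1 - C / k)
```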

We require the following Lemma.

Lemma 3.7.

Let XX be a standard normal random variable. Then for every ϵ>0\epsilon>0, there exists a constant δ>0\delta>0 so that

(|X|>δ)1ϵ.\mathbb{P}(|X|>\delta)\geq 1-\epsilon. (3.15)
Proof.

For every ϵ>0\epsilon>0, any constant δ(0,π2ϵ)\delta\in\left({0,\sqrt{\frac{\pi}{2}}\epsilon}\right) satisfies

(|X|δ)=12πδδex2/2𝑑x22πδ<ϵ,\mathbb{P}(|X|\leq\delta)=\frac{1}{\sqrt{2\pi}}\int_{-\delta}^{\delta}e^{-x^{2}/2}dx\leq\frac{2}{\sqrt{2\pi}}\delta<\epsilon, (3.16)

where the above inequality follows from the fact that ex2/21e^{-x^{2}/2}\leq 1 for xx\in\mathbb{R}. ∎

Armed with the previous results, we can now prove our main result.

Proof of Theorem 3.3.

By Lemma 3.5, we have

|h1(t)|=|h1(0)|e2λ1tihi2(0)e4λit=|h1(0)|h12(0)+i=2Nhi2(0)e4(λiλ1)t|h_{1}(t)|=\frac{|h_{1}(0)|e^{-2\lambda_{1}t}}{\sqrt{\sum_{i}h^{2}_{i}(0)e^{-4\lambda_{i}t}}}=\frac{|h_{1}(0)|}{\sqrt{h_{1}^{2}(0)+\sum_{i=2}^{N}h^{2}_{i}(0)e^{-4(\lambda_{i}-\lambda_{1})t}}} (3.17)

Next, we consider the upper and lower bound of the hitting time TεT_{\varepsilon}, respectively.

  1.

    Lower bound of TεT_{\varepsilon}.

    For any δ>0\delta>0, we keep only the first kk terms of the sum in the denominator of (3.17), which can only increase the right-hand side; the value of kk, depending on ϵ\epsilon and δ\delta but not on NN, will be chosen below:

    |h1(t)||h1(0)|(h12(0)+i=2khi2(0)e4t(λiλ1))1/2.|h_{1}(t)|\leq|h_{1}(0)|\left({h_{1}^{2}(0)+\sum_{i=2}^{k}h_{i}^{2}(0)e^{-4t(\lambda_{i}-\lambda_{1})}}\right)^{-1/2}. (3.18)

    By the fact that (λiλ1)(λkλ1)(\lambda_{i}-\lambda_{1})\leq(\lambda_{k}-\lambda_{1}) for i=2,,ki=2,\dots,k, we can bound the right-hand side of (3.18) further from above:

    |h1(t)||h1(0)|(h12(0)+e4t(λkλ1)i=2khi2(0))1/2|h_{1}(t)|\leq|h_{1}(0)|\left({h_{1}^{2}(0)+e^{-4t(\lambda_{k}-\lambda_{1})}\sum_{i=2}^{k}h_{i}^{2}(0)}\right)^{-1/2} (3.19)

    Since |h1(t)||h_{1}(t)| is nondecreasing in tt (by (3.6), as λ1\lambda_{1} is the smallest eigenvalue), for every tTϵt\geq T_{\epsilon} we have

    ϵ|h1(t)||h1(0)|(h12(0)+e4t(λkλ1)i=2khi2(0))1/2.\epsilon\leq|h_{1}(t)|\leq|h_{1}(0)|\left({h_{1}^{2}(0)+e^{-4t(\lambda_{k}-\lambda_{1})}\sum_{i=2}^{k}h_{i}^{2}(0)}\right)^{-1/2}. (3.20)

    Evaluating at t=Tϵt=T_{\epsilon} and solving for TϵT_{\epsilon} yields

    Tϵ14(λkλ1)log(h22(0)++hk2(0)h12(0)(ϵ21))T_{\epsilon}\geq\frac{1}{4(\lambda_{k}-\lambda_{1})}\log\left(\frac{h_{2}^{2}(0)+\dots+h_{k}^{2}(0)}{h_{1}^{2}(0)(\epsilon^{-2}-1)}\right) (3.21)

    We apply Lemma 3.6 to h12(0),,hk2(0)h_{1}^{2}(0),\dots,h_{k}^{2}(0) with C=2ε21C=2\varepsilon^{-2}-1:

    limN(h22(0)++hk2(0)h12(0)(ϵ21)>2)\displaystyle\lim_{N\to\infty}\mathbb{P}\left(\frac{h_{2}^{2}(0)+\dots+h_{k}^{2}(0)}{h_{1}^{2}(0)(\epsilon^{-2}-1)}>2\right) =limN(h12(0)+h22(0)++hk2(0)h12(0)>2ϵ21)\displaystyle=\lim_{N\to\infty}\mathbb{P}\left(\frac{h_{1}^{2}(0)+h_{2}^{2}(0)+\dots+h_{k}^{2}(0)}{h_{1}^{2}(0)}>\frac{2}{\epsilon^{2}}-1\right)
    1δ/2,\displaystyle\geq 1-\delta/2,

    where we take k=[2(2ϵ21)/δ]+1k=[2(2\epsilon^{-2}-1)/\delta]+1 so that C/k<δ/2C/k<\delta/2.

    By an argument similar to that of Lemma 3.4, for any δ>0\delta>0 there exists a constant c1=c1(δ)>0c_{1}=c_{1}(\delta)>0 so that

    limN(N2/3(λkλ1)<c1)1δ/2\lim_{N\to\infty}\mathbb{P}(N^{2/3}(\lambda_{k}-\lambda_{1})<c_{1})\geq 1-\delta/2 (3.22)

    Hence, for any δ>0\delta>0 there exists c2=log24c1c_{2}=\frac{\log 2}{4c_{1}} so that

    limN(Tε>c2N2/3)1δ.\lim_{N\to\infty}\mathbb{P}(T_{\varepsilon}>c_{2}N^{2/3})\geq 1-\delta. (3.23)
  2.

    Upper bound of TεT_{\varepsilon}.

    Note that (λ2λ1)(λiλ1)(\lambda_{2}-\lambda_{1})\leq(\lambda_{i}-\lambda_{1}) for i=3,,Ni=3,\dots,N. We upper bound each term of the denominator in (3.17) by e4t(λiλ1)e4t(λ2λ1)e^{-4t(\lambda_{i}-\lambda_{1})}\leq e^{-4t(\lambda_{2}-\lambda_{1})} for i=3,,Ni=3,\dots,N. Then we have

    |h1(t)||h1(0)|h12(0)+e4(λ2λ1)ti=2Nhi2(0)=|h1(0)|h12(0)+e4(λ2λ1)t(1h12(0))|h_{1}(t)|\geq\frac{|h_{1}(0)|}{\sqrt{h_{1}^{2}(0)+e^{-4(\lambda_{2}-\lambda_{1})t}\sum_{i=2}^{N}h^{2}_{i}(0)}}=\frac{|h_{1}(0)|}{\sqrt{h_{1}^{2}(0)+e^{-4(\lambda_{2}-\lambda_{1})t}(1-h_{1}^{2}(0))}} (3.24)

    For tTεt\leq T_{\varepsilon}, we get

    ε|h1(t)||h1(0)|h12(0)+e4(λ2λ1)t(1h12(0)).\varepsilon\geq|h_{1}(t)|\geq\frac{|h_{1}(0)|}{\sqrt{h_{1}^{2}(0)+e^{-4(\lambda_{2}-\lambda_{1})t}(1-h_{1}^{2}(0))}}. (3.25)

    and this yields

    Tε14(λ2λ1)log(h12(0)1ε21).T_{\varepsilon}\leq\frac{1}{4(\lambda_{2}-\lambda_{1})}\log\left({\frac{h_{1}^{-2}(0)-1}{\varepsilon^{-2}-1}}\right). (3.26)

    By Lemma 3.4, for any δ>0\delta>0 there exists a constant c3=c3(δ)>0c_{3}=c_{3}(\delta)>0 so that

    limN(N2/3(λ2λ1)c3)1δ/2\lim_{N\to\infty}\mathbb{P}\left({N^{2/3}(\lambda_{2}-\lambda_{1})\geq c_{3}}\right)\geq 1-\delta/2 (3.27)

    Note that Nh1(0)\sqrt{N}h_{1}(0) is asymptotically Gaussian by [28, Theorem 13]. By Lemma 3.7, for every δ>0\delta>0, there exists a constant c4>0c_{4}>0 so that

    limN(N|h1(0)|>c4)1δ/2.\lim_{N\to\infty}\mathbb{P}\left({\sqrt{N}|h_{1}(0)|>c_{4}}\right)\geq 1-\delta/2. (3.28)

    For any δ>0\delta>0, we take c5=1c42(ε21)>0c_{5}=\frac{1}{c_{4}^{2}(\varepsilon^{-2}-1)}>0 and then get

    (h12(0)1ε21<c5N)=(|h1(0)|>c42c42+N)=(1+c42NN|h1(0)|>c4).\displaystyle\mathbb{P}\left({\frac{h_{1}^{-2}(0)-1}{\varepsilon^{-2}-1}<c_{5}N}\right)=\mathbb{P}\left({|h_{1}(0)|>\sqrt{\frac{c_{4}^{2}}{c_{4}^{2}+N}}}\right)=\mathbb{P}\left({\sqrt{1+\frac{c_{4}^{2}}{N}}\sqrt{N}|h_{1}(0)|>c_{4}}\right).

    Since 1+c42/N1\sqrt{1+c_{4}^{2}/N}\geq 1, inequality (3.28) then gives, for every δ>0\delta>0,

    limN(h12(0)1ε21<c5N)1δ/2.\lim_{N\to\infty}\mathbb{P}\left({\frac{h_{1}^{-2}(0)-1}{\varepsilon^{-2}-1}<c_{5}N}\right)\geq 1-\delta/2. (3.29)

    Combining (3.26) with the bounds (3.27) and (3.29), for every δ>0\delta>0, with c5c_{5} defined as above and c6=14c3c_{6}=\frac{1}{4c_{3}}, we obtain

    limN(Tε<c6N2/3log(c5N))1δ.\lim_{N\to\infty}\mathbb{P}\left({T_{\varepsilon}<c_{6}N^{2/3}\log(c_{5}N)}\right)\geq 1-\delta. (3.30)
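The N^{2/3} scaling in Theorem 3.3 can also be probed numerically: since |h_1(t)| is given explicitly by (3.13) and is nondecreasing in t, the hitting time T_ε can be located by bisection. The sketch below (Python with NumPy) does this for a few matrix sizes; the Gaussian (GOE-type) samples, the uniform random initialization, the choice ε = 0.5, and the sizes and trial counts are illustrative assumptions. The reported ratio T_ε / N^{2/3} is expected to grow only slowly with N, consistent with the N^{2/3} lower bound and the N^{2/3} log N upper bound of the theorem.

```python
import numpy as np

rng = np.random.default_rng(3)
eps = 0.5

def hitting_time(N, rng):
    """T_eps for one Gaussian (GOE-type) Wigner sample, located from (3.13) by bisection."""
    Z = rng.standard_normal((N, N))
    J = (Z + Z.T) / np.sqrt(2 * N)
    lam, V = np.linalg.eigh(J)                 # ascending eigenvalues
    x0 = rng.standard_normal(N)
    x0 /= np.linalg.norm(x0)
    h0 = V.T @ x0
    gaps = lam - lam[0]                        # lambda_i - lambda_1 >= 0, avoids overflow below

    def overlap(t):                            # |h_1(t)| from (3.13), rewritten via the gaps
        return abs(h0[0]) / np.sqrt(np.sum(h0 ** 2 * np.exp(-4.0 * gaps * t)))

    hi = 1.0
    while overlap(hi) < eps:                   # bracket T_eps; |h_1(t)| is nondecreasing in t
        hi *= 2.0
    lo = 0.0
    for _ in range(60):                        # bisection on the bracket [lo, hi]
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if overlap(mid) < eps else (lo, mid)
    return hi

for N in (100, 200, 400):
    times = [hitting_time(N, rng) for _ in range(20)]
    print(N, "median T_eps / N^(2/3):", round(float(np.median(times)) / N ** (2.0 / 3.0), 3))
```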

4 Power iteration method: a hitting time perspective

In this section, we conduct an average case analysis of the hitting time for the top eigenvector using the power iteration method [5, Section 9.3], paralleling the approach in Section 3. Our focus will be on a normalized Wigner matrix, for which we arrange the eigenvalues by their absolute values.

Let 𝐉\mathbf{J} be a normalized Wigner matrix as before. We arrange the eigenvalues of the matrix 𝐉\mathbf{J} in order of their absolute values

|σN||σN1||σ2||σ1||\sigma_{N}|\leq|\sigma_{N-1}|\leq\dots\leq|\sigma_{2}|\leq|\sigma_{1}|

with corresponding eigenvectors uN,uN1,,u2,u1u_{N},u_{N-1},\dots,u_{2},u_{1}. Power iteration is then employed to estimate the matrix’s dominant eigenvalue σ1\sigma_{1} and the corresponding eigenvector.

Given an arbitrary initial vector q0Nq_{0}\in\mathbb{R}^{N} with q02=1\|q_{0}\|_{2}=1, we take for k=1,2,3,k=1,2,3,\dots

qk:=𝐉qk1𝐉qk12q_{k}:=\frac{\mathbf{J}q_{k-1}}{\|\mathbf{J}q_{k-1}\|_{2}} (4.1)

Since u1,,uNu_{1},\dots,u_{N} form an orthonormal basis of N\mathbb{R}^{N}, there are constants αi\alpha_{i} for i=1,2,,Ni=1,2,\dots,N so that

q0=i=1Nαiui.q_{0}=\sum_{i=1}^{N}\alpha_{i}u_{i}. (4.2)

Then we have, for every k1k\geq 1,

𝐉kq0=(i=1NuiuiTσik)(i=1Nαiui)=σ1k(α1u1+i=2Nαi(σiσ1)kui).\mathbf{J}^{k}q_{0}=\left({\sum_{i=1}^{N}u_{i}u_{i}^{T}\sigma_{i}^{k}}\right)\left({\sum_{i=1}^{N}\alpha_{i}u_{i}}\right)=\sigma_{1}^{k}\left({\alpha_{1}u_{1}+\sum_{i=2}^{N}\alpha_{i}\left({\frac{\sigma_{i}}{\sigma_{1}}}\right)^{k}u_{i}}\right). (4.3)
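For concreteness, the iteration (4.1) is easy to implement. The sketch below (Python with NumPy) runs it on a Gaussian (GOE-type) Wigner sample and compares the Rayleigh quotient after a fixed number of steps with the eigenvalue of largest absolute value computed by a direct eigensolver; the size N = 300 and the step count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 300

# Gaussian (GOE-type) normalized Wigner sample, used only as an illustration
Z = rng.standard_normal((N, N))
J = (Z + Z.T) / np.sqrt(2 * N)

# Power iteration (4.1) started from a uniform random unit vector q_0
q = rng.standard_normal(N)
q /= np.linalg.norm(q)
for _ in range(2000):                       # a few thousand steps are typically ample here
    q = J @ q
    q /= np.linalg.norm(q)

sigma1_estimate = q @ J @ q                 # Rayleigh quotient of the final iterate
lam = np.linalg.eigvalsh(J)
sigma1_true = lam[np.argmax(np.abs(lam))]   # eigenvalue of largest absolute value

print("power iteration estimate:", sigma1_estimate)
print("direct eigensolver value:", sigma1_true)
```

Note that when σ_1 is negative the iterates q_k flip sign at every step, but the Rayleigh quotient and the overlap |q_k · u_1| are unaffected.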

Our main theorem in this section is as follows.

Theorem 4.1.

Let 𝐉\mathbf{J} be the N×NN\times N normalized Wigner matrix. Assume that 𝐉\mathbf{J} obeys the same condition as in (3.4). Fix 0<ε<10<\varepsilon<1 and an initial value q0Nq_{0}\in\mathbb{R}^{N} with q02=1\|q_{0}\|_{2}=1. Let qkq_{k} be the output of the power iteration at step k1k\geq 1 and Tε=infk1{|qku1|ε}T_{\varepsilon}=\inf_{k\geq 1}\{\left\lvert{q_{k}\cdot u_{1}}\right\rvert\geq\varepsilon\}. For every δ>0\delta>0, there exist constants C1,C2,C3>0C_{1},C_{2},C_{3}>0 so that

limN(C1N2/3<Tε<C2N2/3log(C3N))>1δ.\lim_{N\to\infty}\mathbb{P}\left({C_{1}N^{2/3}<T_{\varepsilon}<C_{2}N^{2/3}\log(C_{3}N)}\right)>1-\delta. (4.4)

To prove our main theorem, we require the following lemma.

Lemma 4.2.

Let 𝐉\mathbf{J} be the N×NN\times N normalized Wigner matrix. Assume that 𝐉\mathbf{J} obeys the same condition as in (3.4). Order its eigenvalues so that |σN||σN1||σ1||\sigma_{N}|\leq|\sigma_{N-1}|\leq\dots\leq|\sigma_{1}|. Then we have |σ1|/|σ2|=1+𝒪(N2/3)|\sigma_{1}|/|\sigma_{2}|=1+\mathcal{O}(N^{-2/3}).

Proof.

By Theorem 3.1, we may write |σ1|=2+γ1|\sigma_{1}|=2+\gamma_{1} and |σ2|=2+γ2|\sigma_{2}|=2+\gamma_{2} with γ1,γ2=𝒪(N2/3)\gamma_{1},\gamma_{2}=\mathcal{O}(N^{-2/3}). Then, by Taylor expansion, we get

|σ1||σ2|\displaystyle\frac{|\sigma_{1}|}{|\sigma_{2}|} =2+γ12(1+γ22)1=2+γ12(1γ22+𝒪(γ22))\displaystyle=\frac{2+\gamma_{1}}{2}(1+\frac{\gamma_{2}}{2})^{-1}=\frac{2+\gamma_{1}}{2}\left({1-\frac{\gamma_{2}}{2}+\mathcal{O}(\gamma_{2}^{2})}\right)
=1+12(γ1γ2)+𝒪(N4/3)\displaystyle=1+\frac{1}{2}(\gamma_{1}-\gamma_{2})+\mathcal{O}(N^{-4/3})
=1+𝒪(N2/3).\displaystyle=1+\mathcal{O}(N^{-2/3}).
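A quick numerical check of Lemma 4.2 (purely illustrative) is to sample Wigner matrices and look at the rescaled ratio N^{2/3}(|σ_1|/|σ_2| − 1). In the sketch below (Python with NumPy), the Gaussian (GOE-type) entries, sizes, and trial counts are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)

def rescaled_ratio(N, rng):
    """Return N^(2/3) * (|sigma_1|/|sigma_2| - 1) for one Gaussian (GOE-type) Wigner sample."""
    Z = rng.standard_normal((N, N))
    J = (Z + Z.T) / np.sqrt(2 * N)
    s = np.sort(np.abs(np.linalg.eigvalsh(J)))[::-1]     # |sigma_1| >= |sigma_2| >= ...
    return N ** (2.0 / 3.0) * (s[0] / s[1] - 1.0)

for N in (100, 200, 400):
    vals = [rescaled_ratio(N, rng) for _ in range(50)]
    print(N, "median rescaled ratio:", round(float(np.median(vals)), 3))
```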

Proof of Theorem 4.1.

We first show the lower bound on TεT_{\varepsilon}. Fix δ>0\delta>0. We will choose an appropriate ll with 1l<N1\leq l<N, depending on ε\varepsilon and δ\delta (but not on NN), and bound the denominator of qku1q_{k}\cdot u_{1} from below by its first ll terms, as follows.

By (4.1) and (4.3), we have

|qku1|=|σ1kα1|i=1Nαi2σi2k|σ1kα1|i=1lαi2σi2k=(1+i=2lαi2α12σi2kσ12k)1/2\left\lvert{q_{k}\cdot u_{1}}\right\rvert=\frac{\left\lvert{\sigma_{1}^{k}\alpha_{1}}\right\rvert}{\sqrt{\sum_{i=1}^{N}\alpha_{i}^{2}\sigma_{i}^{2k}}}\leq\frac{\left\lvert{\sigma_{1}^{k}\alpha_{1}}\right\rvert}{\sqrt{\sum_{i=1}^{l}\alpha_{i}^{2}\sigma_{i}^{2k}}}=\left({1+\sum_{i=2}^{l}\frac{\alpha_{i}^{2}}{\alpha_{1}^{2}}\frac{\sigma_{i}^{2k}}{\sigma_{1}^{2k}}}\right)^{-1/2} (4.5)

By the fact that |σi||σl||\sigma_{i}|\geq|\sigma_{l}| for i=1,,li=1,\dots,l, we obtain

|qku1|(1+σl2kσ12ki=2lαi2α12)1/2\left\lvert{q_{k}\cdot u_{1}}\right\rvert\leq\left({1+\frac{\sigma_{l}^{2k}}{\sigma_{1}^{2k}}\sum_{i=2}^{l}\frac{\alpha_{i}^{2}}{\alpha_{1}^{2}}}\right)^{-1/2} (4.6)

At k=Tεk=T_{\varepsilon}, we have |qku1|ε\left\lvert{q_{k}\cdot u_{1}}\right\rvert\geq\varepsilon by the definition of the hitting time. Thus, solving (4.6) for kk at this step, we get

Tε12log1(|σ1||σl|)log(α22++αl2α12(ε21)).T_{\varepsilon}\geq\frac{1}{2}\log^{-1}\left({\frac{|\sigma_{1}|}{|\sigma_{l}|}}\right)\log\left({\frac{\alpha_{2}^{2}+\dots+\alpha_{l}^{2}}{\alpha_{1}^{2}(\varepsilon^{-2}-1)}}\right). (4.7)

Then we apply Lemma 3.6 to α12,,αl2\alpha_{1}^{2},\dots,\alpha_{l}^{2} with C=2ε21C=2\varepsilon^{-2}-1:

limN(α22++αl2α12(ϵ21)>2)\displaystyle\lim_{N\to\infty}\mathbb{P}\left(\frac{\alpha_{2}^{2}+\dots+\alpha_{l}^{2}}{\alpha_{1}^{2}(\epsilon^{-2}-1)}>2\right) =limN(α12+α22++αl2α12>2ϵ21)\displaystyle=\lim_{N\to\infty}\mathbb{P}\left(\frac{\alpha_{1}^{2}+\alpha_{2}^{2}+\dots+\alpha_{l}^{2}}{\alpha_{1}^{2}}>\frac{2}{\epsilon^{2}}-1\right)
1δ/2,\displaystyle\geq 1-\delta/2,

where we take l=[2(2ϵ21)/δ]+1l=[2(2\epsilon^{-2}-1)/\delta]+1 so that C/l<δ/2C/l<\delta/2.

By Lemma 4.2 and Theorem 3.1, for any δ>0\delta>0 there exists a constant c1=c1(δ)>0c_{1}=c_{1}(\delta)>0 so that

limN(N2/3(|σ1||σl|1)c1)1δ/2.\lim_{N\to\infty}\mathbb{P}\left({N^{2/3}\left({\frac{|\sigma_{1}|}{|\sigma_{l}|}-1}\right)\leq c_{1}}\right)\geq 1-\delta/2. (4.8)

By the fact that log(1+x)<x\log(1+x)<x for x>0x>0, for any δ>0\delta>0 there exists c2=log22c1>0c_{2}=\frac{\log 2}{2c_{1}}>0 so that

limN(Tεc2N2/3)1δ.\lim_{N\to\infty}\mathbb{P}(T_{\varepsilon}\geq c_{2}N^{2/3})\geq 1-\delta. (4.9)

Next, we prove the upper bound of TεT_{\varepsilon}.

By the inequality |σi||σ2||\sigma_{i}|\leq|\sigma_{2}| for i=2,3,,Ni=2,3,\dots,N, we have

|qku1|=(1+i=2Nαi2α12σi2kσ12k)1/2(1+σ22kσ12ki=2Nαi2α12)1/2.\left\lvert{q_{k}\cdot u_{1}}\right\rvert=\left({1+\sum_{i=2}^{N}\frac{\alpha_{i}^{2}}{\alpha_{1}^{2}}\frac{\sigma_{i}^{2k}}{\sigma_{1}^{2k}}}\right)^{-1/2}\geq\left({1+\frac{\sigma_{2}^{2k}}{\sigma_{1}^{2k}}\sum_{i=2}^{N}\frac{\alpha_{i}^{2}}{\alpha_{1}^{2}}}\right)^{-1/2}. (4.10)

Since q02=1\|q_{0}\|_{2}=1 implies i=1Nαi2=1\sum_{i=1}^{N}\alpha_{i}^{2}=1, we have

|qku1|(1+σ22kσ12k1α12α12)1/2.\left\lvert{q_{k}\cdot u_{1}}\right\rvert\geq\left({1+\frac{\sigma_{2}^{2k}}{\sigma_{1}^{2k}}\frac{1-\alpha_{1}^{2}}{\alpha_{1}^{2}}}\right)^{-1/2}. (4.11)

By Bernoulli’s inequality (1+x)r1+rx(1+x)^{r}\geq 1+rx for r0r\leq 0 and x>1x>-1, we get

|qku1|112(σ2σ1)2k(1α12α12)\displaystyle\left\lvert{q_{k}\cdot u_{1}}\right\rvert\geq 1-\frac{1}{2}\left({\frac{\sigma_{2}}{\sigma_{1}}}\right)^{2k}\left({\frac{1-\alpha_{1}^{2}}{\alpha_{1}^{2}}}\right)

Note that for k<Tεk<T_{\varepsilon}, we have |qku1|<ε\left\lvert{q_{k}\cdot u_{1}}\right\rvert<\varepsilon. Applying this at k=Tε1k=T_{\varepsilon}-1 and absorbing the resulting additive constant into the prefactor (for NN large), we get

Tε2log|σ1σ2|log(α1212(1ε)).T_{\varepsilon}\leq\frac{2}{\log\left\lvert{\frac{\sigma_{1}}{\sigma_{2}}}\right\rvert}\log\left({\frac{\alpha_{1}^{-2}-1}{2(1-\varepsilon)}}\right). (4.12)

By Lemma 4.2, for any δ>0\delta>0 there exists a constant c3=c3(δ)>0c_{3}=c_{3}(\delta)>0 so that

limN(N2/3(|σ1σ2|1)c3)1δ/2.\lim_{N\to\infty}\mathbb{P}\left({N^{2/3}\left({\left\lvert{\frac{\sigma_{1}}{\sigma_{2}}}\right\rvert-1}\right)\geq c_{3}}\right)\geq 1-\delta/2. (4.13)

Combining (4.13) with the inequality log(1+x)>x/2\log(1+x)>x/2 for x(0,1)x\in(0,1), we get, for NN large enough,

(log1|σ1σ2|<2c31N2/3)1δ/2.\mathbb{P}\left({\log^{-1}\left\lvert{\frac{\sigma_{1}}{\sigma_{2}}}\right\rvert<2c_{3}^{-1}N^{2/3}}\right)\geq 1-\delta/2. (4.14)

Since Nα1\sqrt{N}\alpha_{1} is asymptotically Gaussian by [28, Theorem 13], the same argument as for inequality (3.29) shows that for any δ>0\delta>0 there exists a constant c4>0c_{4}>0 so that

limN(α1212(1ε)<c4N)1δ/2\lim_{N\to\infty}\mathbb{P}\left({\frac{\alpha_{1}^{-2}-1}{2(1-\varepsilon)}<c_{4}N}\right)\geq 1-\delta/2 (4.15)

Thus, combining inequalities (4.14) and (4.15), we can prove the desired result: for any δ>0\delta>0, there exist constants c4,c5=4c31>0c_{4},c_{5}=4c_{3}^{-1}>0 such that

limN(Tε<c5N2/3log(c4N))1δ.\lim_{N\to\infty}\mathbb{P}\left({T_{\varepsilon}<c_{5}N^{2/3}\log(c_{4}N)}\right)\geq 1-\delta. (4.16)
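As with Theorem 3.3, the predicted growth of T_ε (of order N^{2/3}, up to the logarithmic factor in the upper bound) can be explored empirically by running the iteration (4.1) until the hitting step, for several sizes. The sketch below (Python with NumPy) does this; the Gaussian (GOE-type) samples, the choice ε = 0.5, the random unit initialization, and the sizes and trial counts are illustrative assumptions, and the ratio to N^{2/3} is again only expected to grow slowly with N.

```python
import numpy as np

rng = np.random.default_rng(5)
eps = 0.5

def power_iteration_hitting_step(N, rng, max_steps=100_000):
    """Smallest k with |q_k . u_1| >= eps for one Gaussian (GOE-type) Wigner sample."""
    Z = rng.standard_normal((N, N))
    J = (Z + Z.T) / np.sqrt(2 * N)
    lam, V = np.linalg.eigh(J)
    u1 = V[:, np.argmax(np.abs(lam))]          # eigenvector of the eigenvalue of largest modulus
    q = rng.standard_normal(N)
    q /= np.linalg.norm(q)
    for k in range(1, max_steps + 1):          # the iteration (4.1)
        q = J @ q
        q /= np.linalg.norm(q)
        if abs(q @ u1) >= eps:
            return k
    return max_steps

for N in (100, 200, 400):
    steps = [power_iteration_hitting_step(N, rng) for _ in range(20)]
    print(N, "median T_eps:", np.median(steps),
          " median T_eps / N^(2/3):", round(float(np.median(steps)) / N ** (2.0 / 3.0), 2))
```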

Acknowledgement:

We thank Aukosh Jagannath, Yi Shen, and Dong Yao for their invaluable suggestions and insights.

References

  • [1] M. Abramowitz and I. A. Stegun. Handbook of mathematical functions with formulas, graphs, and mathematical tables, volume 55. US Government printing office, 1948.
  • [2] G. B. Arous, A. Dembo, and A. Guionnet. Aging of spherical spin glasses. Probability theory and related fields, 120(1):1–67, 2001.
  • [3] A. Auffinger and W.-K. Chen. On the energy landscape of spherical spin glasses. Advances in Mathematics, 330:553–588, 2018.
  • [4] G. Ben-Arous. Aging and spin-glass dynamics. arXiv preprint math/0304364, 2003.
  • [5] R. L. Burden and J. D. Faires. Numerical Analysis. Brooks/Cole, 1997.
  • [6] S. Chatterjee. Estimation in spin glasses: A first step. Annals of Statistics, 35(5):1931 – 1946, 2007.
  • [7] K. L. Chung. A course in probability theory. Academic press, 2001.
  • [8] A. Crisanti and H.-J. Sommers. The spherical p-spin interaction spin glass model: the statics. Zeitschrift für Physik B Condensed Matter, 87(3):341–354, 1992.
  • [9] L. F. Cugliandolo and J. Kurchan. Analytical solution of the off-equilibrium dynamics of a long-range spin-glass model. Physical Review Letters, 71(1):173, 1993.
  • [10] L. F. Cugliandolo and J. Kurchan. On the out-of-equilibrium relaxation of the sherrington-kirkpatrick model. Journal of Physics A: Mathematical and General, 27(17):5749, 1994.
  • [11] P. Deift and T. Trogdon. Universality for eigenvalue algorithms on sample covariance matrices. SIAM Journal on Numerical Analysis, 55(6):2835–2862, 2017.
  • [12] A. Dembo and R. Gheissari. Diffusions interacting through a random matrix: universality via stochastic taylor expansion. Probability Theory and Related Fields, 180:1057–1097, 2021.
  • [13] A. Dembo and E. Subag. Dynamics for spherical spin glasses: disorder dependent initial conditions. Journal of Statistical Physics, 181:465–514, 2020.
  • [14] D. L. Donoho, A. Maleki, and A. Montanari. Message-passing algorithms for compressed sensing. Proceedings of the National Academy of Sciences, 106(45):18914–18919, 2009.
  • [15] R. Durrett. Probability: theory and examples, volume 49. Cambridge university press, 2019.
  • [16] S. F. Edwards and P. W. Anderson. Theory of spin glasses. Journal of Physics F: Metal Physics, 5(5):965, 1975.
  • [17] H.-O. Georgii. Gibbs measures and phase transitions, volume 9. Walter de Gruyter, 2011.
  • [18] H.-O. Georgii, O. Häggström, and C. Maes. The random geometry of equilibrium phases. In Phase transitions and critical phenomena, volume 18, pages 1–142. Elsevier, 2001.
  • [19] E. Kostlan. Complexity theory of numerical linear algebra. Journal of Computational and Applied Mathematics, 22(2-3):219–230, 1988.
  • [20] E. Kostlan. Statistical complexity of dominant eigenvector calculation. Journal of Complexity, 7(4):371–379, 1991.
  • [21] B. Landon. Free energy fluctuations of the two-spin spherical sk model at critical temperature. Journal of Mathematical Physics, 63(3):033301, 2022.
  • [22] J. O. Lee and J. Yin. A necessary and sufficient condition for edge universality of wigner matrices. Duke Mathematical Journal, 163(1):117–173, 2014.
  • [23] T. Liang, S. Sen, and P. Sur. High-dimensional asymptotics of Langevin dynamics in spiked matrix models. Information and Inference: A Journal of the IMA, 12(4):2720–2752, 10 2023.
  • [24] B. Øksendal. Stochastic differential equations. In Stochastic differential equations, pages 65–84. Springer, 2003.
  • [25] D. Sherrington and S. Kirkpatrick. Solvable model of a spin-glass. Physical review letters, 35(26):1792, 1975.
  • [26] S. Smale. Complexity theory and numerical analysis. Acta numerica, 6:523–551, 1997.
  • [27] D. L. Stein and C. M. Newman. Spin glasses and complexity, volume 4. Princeton University Press, 2013.
  • [28] T. Tao and V. Vu. Random matrices: universal properties of eigenvectors. Random Matrices: Theory and Applications, 1(01):1150001, 2012.
  • [29] C. A. Tracy and H. Widom. Level-spacing distributions and the airy kernel. Communications in Mathematical Physics, 159:151–174, 1994.
  • [30] C. A. Tracy and H. Widom. On orthogonal and symplectic matrix ensembles. Communications in Mathematical Physics, 177:727–754, 1996.
  • [31] E. Vincent, J. Hammann, M. Ocio, J.-P. Bouchaud, and L. F. Cugliandolo. Slow dynamics and aging in spin glasses. In Complex Behaviour of Glassy Systems: Proceedings of the XIV Sitges Conference Sitges, Barcelona, Spain, 10–14 June 1996, pages 184–219. Springer, 2007.
  • [32] L. Zdeborová and F. Krzakala. Statistical physics of inference: Thresholds and algorithms. Advances in Physics, 65(5):453–552, 2016.