Optimal Smoothed Analysis and Quantitative Universality for the Smallest Singular Value of Random Matrices

Haoyu Wang Department of Mathematics, Yale University, New Haven, CT 06520, USA, [email protected]
Abstract

The smallest singular value and condition number play important roles in numerical linear algebra and the analysis of algorithms. In numerical analysis with randomness, many previous works make Gaussian assumptions, which are not general enough to reflect the arbitrariness of the input. To overcome this drawback, we prove the first quantitative universality for the smallest singular value and condition number of random matrices.

Moreover, motivated by the study of smoothed analysis that random perturbation makes deterministic matrices well-conditioned, we consider an analog for random matrices. For a random matrix perturbed by independent Gaussian noise, we show that this matrix quickly becomes approximately Gaussian. In particular, we derive an optimal smoothed analysis for random matrices in terms of a sharp Gaussian approximation.

1 Introduction

In numerical analysis and the analysis of algorithms, the smallest singular value and condition number of matrices play important roles [TB97]. Broadly speaking, the smallest singular value reflects the invertibility of a matrix: a better non-degeneracy condition results in faster algorithms or better stability in computations. The condition number is a crucial quantity as well. It measures the hardness and accuracy of numerical computations, and its best-known practical meaning is the loss of precision in solving linear equations [Sma85]. For more concrete state-of-the-art applications, the smallest singular value affects the stability of fast algorithms for solving linear systems [PV21, Nie22]. In [GPV21], the condition number determines the iteration complexity of fast algorithms for $p$-norm regression. There are many other applications of these two quantities, and we do not intend to provide a complete review here.

In the seminal work of Spielman and Teng [ST04], smoothed analysis was introduced to understand why some algorithms with poor worst-case performance work successfully in practice. Roughly speaking, smoothed analysis is an interpolation between worst-case and average-case analysis. In the context of solving linear equations $Ax=b$, even if the matrix $A$ has a large condition number (and consequently a large loss of precision), a random perturbation $A+H$ becomes well-conditioned with high probability. Smoothed analysis for random perturbations of deterministic matrices has been well studied since its invention [Wsc04, VH14, TV10b, FV16, SS20], etc. It has also been applied to many algorithms, for example Gaussian elimination [SST06], matrix inversion [BC10], the conjugate descent method [MT16], tensor decomposition [BCMV14], the $k$-means method [AMR11], etc.

Classical results in smoothed analysis focus on random perturbations of deterministic matrices, and show that such a perturbation indeed makes the original matrix better suited for algorithms (in the sense, for example, of having a smaller condition number). Recently, numerical computations with random data have become more and more common. In the context of random matrices, we show that if a random matrix is perturbed by independent Gaussian noise, it quickly becomes approximately Gaussian. The smallest singular value and condition number of a Gaussian matrix have been well studied [Ede88], and therefore such Gaussian approximations enable us to analyze the more tractable Gaussian ensemble instead, with benign approximation error. Specifically, we prove an optimal smoothed analysis of random matrices in terms of a sharp Gaussian approximation.

Moreover, in numerical analysis with randomness, many previous works make Gaussian assumptions. Such assumptions are not general enough to reflect the arbitrariness of the input. To overcome this drawback, we prove the first quantitative universality for the smallest singular value and condition number of random matrices. Both the smoothed analysis for random matrices and the quantitative universality will play useful roles in computations with randomness.

1.1 Overview and our contributions

From the mathematical perspective, edge universality is a classical problem in random matrix theory and has seen tremendous progress in the past decade. However, most previous results are qualitative statements, and quantitative rates of convergence to the limiting law were only known for integrable models. Recently, the rate of convergence to the Tracy-Widom distribution for the largest eigenvalue was first obtained in [Bou22] for generalized Wigner matrices and in [Wan19] for sample covariance matrices. These results were further improved by Schnelli and Xu in [SX22a, SX21, SX22b]. Our work is the first result on quantitative universality for the smallest singular value and the condition number.

For an $M\times N$ matrix $H$ with $N/M\to d\in(0,1]$ as $N\to\infty$, the empirical spectral distribution of the sample covariance matrix $H^{\top}H$ converges to the Marchenko-Pastur law

\rho_{\mathsf{MP}}(x)=\frac{1}{2\pi d}\sqrt{\frac{(x-(1-\sqrt{d})^{2})((1+\sqrt{d})^{2}-x)}{x^{2}}},\ \ x\in[(1-\sqrt{d})^{2},(1+\sqrt{d})^{2}].

In the case $d<1$, the Marchenko-Pastur law has no singularity; such situations are called the soft edge case in the random matrix theory literature. Moreover, note that all eigenvalues are of order constant with high probability. In particular, thanks to the square-root decay of the Marchenko-Pastur law near the spectral edge, the extreme eigenvalues of the covariance matrix $H^{\top}H$ are highly concentrated at the spectral endpoints at scale $N^{-2/3}$, and the fluctuations follow the Tracy-Widom law. In this case, the optimal rate of convergence to the Tracy-Widom distribution has been established in [SX21], and quantitative universality for the smallest singular value is an easy consequence.

However, in the case $\lim N/M=1$, things become much more complicated. Note that in this case the Marchenko-Pastur law has a singularity at the origin

\rho_{\mathsf{MP}}(x)=\frac{1}{2\pi}\sqrt{\frac{4-x}{x}},\ \ x\in[0,4].

This singularity of the limiting spectral distribution makes the behaviour of the eigenvalues near the left endpoint $x=0$ very different. In particular, the typical eigenvalue spacing near the left edge is of order $N^{-1}$, which is much smaller than in the soft edge case, where the typical spacing is of order $N^{-2/3}$. In random matrix theory, this case is called the hard edge case. In this model, the left edge of the spectrum, with the singularity, is called the hard edge, and the right edge is called the soft edge. The study of spectral statistics near the hard edge is a notoriously tricky problem. Due to technical difficulties, the study of the hard edge case is mostly restricted to square matrices with $M\equiv N$. Because of this difficulty, we will also focus on the study of square $N\times N$ random matrices.

For the study of the smallest singular value of an $N\times N$ random matrix, the first result was obtained by Edelman for the Gaussian matrix in [Ede88]. Later, the distribution in the Gaussian ensemble was shown to be universal by Tao and Vu in [TV10a] and further generalized to sparse matrices by Che and Lopatto in [CL19]. However, both universality results are qualitative statements with unknown error terms.

In computer science and numerical computations, the accuracy or complexity of algorithms depends on the smallest singular value or condition number in a delicate way. Therefore, a qualitative universality is not enough to measure the performance of general random input, and quantitative estimates are necessary for practical purposes. In this work, we prove the first quantitative version of universality for the smallest singular value and the condition number. This solves a long-standing open problem (Problem 11 in the AIM workshop list of open problems on random matrices, http://www.aimath.org/WWN/randommatrices/randommatrices.pdf). Along the way to proving quantitative universality, we obtain a sharp estimate for matrices with Gaussian perturbations, and hence establish the optimal smoothed analysis for random matrices.

1.2 Models and main results

Thanks to the motivations from theoretical computer science, we mainly focus on real matrices. However, as mentioned in [CL19], our whole proof works for complex matrices as well.

Let $H=(H_{ij})$ be an $N\times N$ matrix with independent real-valued entries of mean $0$ and variance $N^{-1}$:

H_{ij}=N^{-1/2}h_{ij},\ \ \ \mathbb{E}h_{ij}=0,\ \ \ \mathbb{E}h_{ij}^{2}=1. (1)

We assume the entries $h_{ij}$ have sub-exponential decay, that is, there exists a constant $\theta>0$ such that for $u>1$,

\mathbb{P}(|h_{ij}|>u)\leqslant\theta^{-1}\exp(-u^{\theta}). (2)

We remark that this assumption is mainly for convenience, and other conditions such as the existence of a sufficiently high moment would also be enough.

For an $N\times N$ matrix $H$, let $\sigma_{1}(H)\leqslant\cdots\leqslant\sigma_{N}(H)$ denote its singular values in non-decreasing order, and use $\kappa(H):=\sigma_{N}(H)/\sigma_{1}(H)$ to denote the condition number. Throughout this paper, we let $G$ be an $N\times N$ Gaussian matrix with i.i.d. entries ${\mathcal{N}}(0,N^{-1})$.
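For concreteness, these quantities are straightforward to compute numerically. The following minimal Python sketch (an illustration only, assuming numpy; it plays no role in the proofs) samples a matrix with the normalization (1) and evaluates $\sigma_{1}$, $\sigma_{N}$ and $\kappa$:

```python
import numpy as np

def singular_stats(N, rng, gaussian=True):
    """Sample an N x N matrix with i.i.d. entries of mean 0 and variance 1/N,
    and return (sigma_1, sigma_N, kappa) in the notation of this section."""
    if gaussian:
        h = rng.standard_normal((N, N))
    else:
        # Rademacher entries: a non-Gaussian model satisfying (1) and (2)
        h = rng.choice([-1.0, 1.0], size=(N, N))
    s = np.linalg.svd(h / np.sqrt(N), compute_uv=False)  # non-increasing order
    return s[-1], s[0], s[0] / s[-1]

rng = np.random.default_rng(0)
print(singular_stats(500, rng))                  # sigma_1 = O(1/N), kappa = O(N)
print(singular_stats(500, rng, gaussian=False))  # similar statistics (universality)
```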

To state the main results, we first introduce two important probabilistic notions that are commonly used throughout the whole paper.

Definition 1 (Overwhelming probability).

Let $\{{\mathcal{E}}_{N}\}$ be a sequence of events. We say that ${\mathcal{E}}_{N}$ holds with overwhelming probability if for any (large) $D>0$, there exists $N_{0}(D)$ such that for all $N\geqslant N_{0}(D)$ we have

\mathbb{P}({\mathcal{E}}_{N})\geqslant 1-N^{-D}.
Definition 2 (Stochastic domination).

Let ${\mathcal{X}}=\{{\mathcal{X}}_{N}\}$ and ${\mathcal{Y}}=\{{\mathcal{Y}}_{N}\}$ be two families of nonnegative random variables. We say that ${\mathcal{X}}$ is stochastically dominated by ${\mathcal{Y}}$ if for all (small) $\varepsilon>0$ and (large) $D>0$, there exists $N_{0}(\varepsilon,D)>0$ such that for $N\geqslant N_{0}(\varepsilon,D)$ we have

\mathbb{P}\left({\mathcal{X}}_{N}>N^{\varepsilon}{\mathcal{Y}}_{N}\right)\leqslant N^{-D}.

The stochastic domination is always uniform in all parameters. If ${\mathcal{X}}$ is stochastically dominated by ${\mathcal{Y}}$, we use the notation ${\mathcal{X}}\prec{\mathcal{Y}}$.

Our main result is the following.

Theorem 1.

Let $H$ be an $N\times N$ random matrix satisfying (1) and (2). For any $\varepsilon>0$ and any $\lambda>N^{-1/2+\varepsilon}$, we have

\left|\sigma_{1}(H+\lambda G)-\sqrt{1+\lambda^{2}}\,\sigma_{1}(G)\right|\prec\frac{\sqrt{1+\lambda^{2}}}{N^{2}\log(1+\lambda^{2})}. (3)
Theorem 2.

Let $H$ be an $N\times N$ random matrix satisfying (1) and (2). For any $\delta\in(0,1)$ and $\varepsilon>0$, we have

\mathbb{P}\left(N\sigma_{1}(G)>r+N^{-\delta}\right)-N^{\varepsilon}\left(N^{-1+\delta}\vee N^{-\frac{1}{2}}\right)\leqslant\mathbb{P}\left(N\sigma_{1}(H)>r\right)\leqslant\mathbb{P}\left(N\sigma_{1}(G)>r-N^{-\delta}\right)+N^{\varepsilon}\left(N^{-1+\delta}\vee N^{-\frac{1}{2}}\right), (4)

where $a\vee b=\max\{a,b\}$ denotes the maximum of $a$ and $b$.

For the smallest singular value, one aspect of the complex-valued case is particularly interesting: the complex Gaussian model is explicitly integrable, i.e., the distribution of its smallest singular value is given by an exact formula. Specifically, let $G_{\mathbb{C}}$ be an $N\times N$ matrix whose entries are i.i.d. complex Gaussians whose real and imaginary parts are i.i.d. copies of $\tfrac{1}{\sqrt{2}}{\mathcal{N}}(0,N^{-1})$. For the complex Gaussian ensemble, Edelman proved in [Ede88] that the distribution of the (renormalized) smallest singular value is independent of $N$ and can be computed explicitly:

\mathbb{P}(N\sigma_{1}(G_{\mathbb{C}})\leqslant r)=\int_{0}^{r}e^{-x}\,\mathrm{d}x=1-e^{-r}.

Thanks to this exact formula for the integrable model, the edge universality for the smallest singular value can be quantified in terms of the Kolmogorov-Smirnov distance to the explicit law.

More precisely, let $H_{\mathbb{C}}$ be an $N\times N$ random matrix satisfying

(H_{\mathbb{C}})_{ij}=N^{-1/2}h_{ij},\ \ \ \mathbb{E}h_{ij}=0,\ \ \ \mathbb{E}[({\mathrm{Re}\,}h_{ij})^{2}]=\mathbb{E}[({\mathrm{Im}\,}h_{ij})^{2}]=\frac{1}{2},\ \ \mathbb{E}[({\mathrm{Re}\,}h_{ij})({\mathrm{Im}\,}h_{ij})]=0,

and the sub-exponential decay assumption (2). Then we have the following rate of convergence to the explicit law.

Corollary 1.

Let $H_{\mathbb{C}}$ be a complex $N\times N$ random matrix defined as above. Then for any $\varepsilon>0$ we have

\mathbb{P}\left(N\sigma_{1}(H_{\mathbb{C}})\leqslant r\right)=1-e^{-r}+O(N^{-\frac{1}{2}+\varepsilon}). (5)
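As a purely numerical illustration of such universality statements (not part of the proof), one can compare the empirical distribution of $N\sigma_{1}$ under the complex Gaussian ensemble and under a non-Gaussian model satisfying the same moment assumptions. A minimal sketch, assuming numpy; the complex Rademacher model below is one convenient choice of a matching ensemble:

```python
import numpy as np

def smallest_sv_samples(N, trials, sampler, rng):
    out = np.empty(trials)
    for i in range(trials):
        out[i] = N * np.linalg.svd(sampler(rng, N), compute_uv=False)[-1]
    return out

def complex_gaussian(rng, N):  # entries with E h = 0, E |h|^2 = 1/N
    return (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2 * N)

def complex_rademacher(rng, N):  # (+-1 +- i)/sqrt(2N): same first two moments
    return (rng.choice([-1, 1], (N, N)) + 1j * rng.choice([-1, 1], (N, N))) / np.sqrt(2 * N)

rng = np.random.default_rng(0)
g = smallest_sv_samples(100, 1000, complex_gaussian, rng)
b = smallest_sv_samples(100, 1000, complex_rademacher, rng)
# Kolmogorov-Smirnov distance between the two empirical laws
grid = np.linspace(0, 3, 300)
ks = np.max(np.abs(np.mean(g[:, None] <= grid, 0) - np.mean(b[:, None] <= grid, 0)))
print("empirical KS distance:", ks)  # small, consistent with Corollary 1
```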

Based on the analysis of the smallest singular value, combined with quantitative results for the largest singular value, we can also derive an optimal smoothed analysis and quantitative universality for the condition number.

Theorem 3.

Let $H$ be an $N\times N$ random matrix satisfying (1) and (2). For any $\varepsilon>0$ and any $\lambda>N^{-1/2+\varepsilon}$, we have

\left|\kappa(H+\lambda G)-\kappa(G)\right|\prec\frac{1}{\log(1+\lambda^{2})}. (6)
Theorem 4.

Let $H$ be an $N\times N$ random matrix satisfying (1) and (2). For any $\varepsilon>0$, we have

\mathbb{P}\left(\frac{\kappa(G)}{N}>r+N^{-\frac{2}{3}+\varepsilon}\right)-N^{-\frac{1}{3}-\varepsilon}\leqslant\mathbb{P}\left(\frac{\kappa(H)}{N}>r\right)\leqslant\mathbb{P}\left(\frac{\kappa(G)}{N}>r-N^{-\frac{2}{3}+\varepsilon}\right)+N^{-\frac{1}{3}-\varepsilon}. (7)

As mentioned previously, the general hard edge case $\lim N/M=1$ without $M\equiv N$ is a notoriously difficult problem in random matrix theory. To further generalize our results, we have the following slight extension towards the general case.

Theorem 5.

Let $H$ be an $M\times N$ random matrix satisfying (1) and (2) with $M=N+O(N^{o(1)})$. Then all results in Theorem 1, Theorem 2, Theorem 3 and Theorem 4 remain true.

1.3 Outline of proofs

The central idea of this paper is based on the Erdős-Schlein-Yau dynamical approach in random matrix theory. In their seminal work [ESYY12], the so-called three-step strategy was developed to prove universality phenomena for random matrices. Roughly speaking, the framework consists of the following three steps.

  (i) A priori estimates for spectral statistics. This step is based on analyzing the resolvent of the matrix; such analysis is called a local law in random matrix theory. A local law states that the spectral density converges to the limiting law on microscopic scales, and it implies the eigenvalue rigidity phenomenon: the eigenvalues are close to their typical locations. Such a priori control of the eigenvalue locations plays a significant role in the further analysis.

  (ii) Local relaxation of eigenvalues. This step is designed to prove universality for matrices with a tiny Gaussian component. We perturb the matrix by independent Gaussian noise; under this perturbation, the dynamics of the eigenvalues is governed by the Dyson Brownian motion (DBM). Moreover, the spectral distribution of the Gaussian ensemble is the equilibrium measure of DBM. The ergodicity of DBM results in fast convergence to the local equilibrium, and hence implies universality for matrices with a small Gaussian perturbation.

  (iii) Density arguments. For any probability distribution of the matrix elements, there exists a distribution with a small Gaussian component (in the sense of Step (ii)) such that the two associated random matrices have asymptotically identical spectral statistics. Typically, such an asymptotic identity is guaranteed by moment matching conditions and a comparison of resolvents.

For a systematic discussion of this method, we refer to the monograph [EY17]. Following this strategy, our main techniques can also be divided into the following three steps.

  • The first step is the local semicircle law for the symmetrization of the random matrix $H$. This local law guarantees the optimal rigidity estimates for the singular values. This step is based on classical works in random matrix theory such as [BEK+14, AEK14, AEK17].

  • The second step is to interpolate between the general matrix $H$ and the Gaussian matrix $G$, and to estimate the dynamics of the singular values. More specifically, we consider the interpolation $H_{t}=e^{-t/2}H+\sqrt{1-e^{-t}}\,G$, which solves the matrix Ornstein-Uhlenbeck stochastic differential equation

    \mathrm{d}H_{t}=\frac{1}{\sqrt{N}}\mathrm{d}B_{t}-\frac{1}{2}H_{t}\mathrm{d}t.

    Note that this interpolation $H_{t}$ is equivalent to the matrix perturbation in our smoothed analysis. We consider a weighted Stieltjes transform (defined in (15)). A key innovation of our work is that, combined with a symmetrization trick, the evolution of the weighted Stieltjes transform along the dynamics of $H_{t}$ satisfies a stochastic PDE that can be well approximated by a deterministic advection equation. This deterministic PDE yields a rough estimate for $|\sigma_{k}(H_{t})-\sigma_{k}(G)|$. Finally, using a delicate bootstrap argument, we show that the estimates for $|\sigma_{k}(H_{t})-\sigma_{k}(G)|$ are self-improving. Iterating the bootstrap argument to the optimal scale, we derive the optimal smoothed analysis for the smallest singular value.

  • The last step is a quantitative resolvent comparison. In particular, the difference between the resolvents of two different random matrices is explicitly controlled in terms of the difference of their fourth moments. This comparison is proved via the Lindeberg exchange method. Together with the optimal smoothed analysis, this comparison theorem establishes the quantitative universality.

1.4 Notations and paper organization

Throughout this paper, we denote by $C$ a generic constant which does not depend on any parameter but may vary from line to line. We write $A\lesssim B$ if $A\leqslant CB$ holds for some constant $C$, and similarly write $A\gtrsim B$ if $A\geqslant C^{-1}B$. We also write $A\sim B$ if both $C^{-1}B\leqslant A\leqslant CB$ hold. When $A$ and $B$ are complex-valued, $A\sim B$ means ${\mathrm{Re}\,}A\sim{\mathrm{Re}\,}B$ and ${\mathrm{Im}\,}A\sim{\mathrm{Im}\,}B$. We use $\llbracket A,B\rrbracket:=[A,B]\cap\mathbb{Z}$ to denote the set of integers between $A$ and $B$. We use $a\vee b:=\max\{a,b\}$ and $a\wedge b:=\min\{a,b\}$ to denote the maximum and minimum of $a$ and $b$, respectively.

The paper is organized as follows. In Section 2, we discuss some applications of our results in numerical analysis and algorithms. In Section 3, we discuss the smoothed analysis for the smallest singular value of random matrices via the study of singular value dynamics. In Section 4, we use the smoothed analysis to establish a full quantitative universality for the smallest singular value. In Section 5, we use the results on the smallest singular value to derive smoothed analysis and quantitative universality for the condition number. In Section 6, we extend the result for square matrices to a slightly more general non-square case. Finally, in the Appendix, we collect some auxiliary results and provide the deferred technical proofs.

2 Applications in Numerical Analysis and Algorithms

In this section, we discuss some applications of our results in numerical analysis and algorithms. There are numerous circumstances where the smallest singular value and condition number play important roles. We do not intend to mention all of them and just focus on two simple scenarios in the framework of solving linear systems to illustrate the usefulness of our results. We expect our results can be applied in more complicated models and more advanced algorithms.

2.1 Accuracy of least-square solution

Consider the linear least-square optimization

\min_{x\in\mathbb{R}^{N}}\|Ax-b\|_{2}^{2},

where $A$ is an $M\times N$ matrix satisfying $M-N=N^{o(1)}$, and $b\in\mathbb{R}^{M}$ is a fixed vector. The loss of precision of this problem, denoted by $\mathrm{LoP}(A,b)$, is the number of correct digits in the entries of the data $(A,b)$ minus the same quantity for the computed solution. Let $\mathrm{LoP}(A)$ denote the loss of precision for the worst $b$. Then, as shown in [Hig02], we have

\mathrm{LoP}(A)=\log MN^{3/2}+2\log\kappa(A)+O(1).

Let $H$ be an $M\times N$ random matrix satisfying (1) and (2), and let $G$ be an $M\times N$ Gaussian matrix. By Theorem 3 and Theorem 5, for any $\varepsilon>0$, with overwhelming probability we have

\mathrm{LoP}(H+\lambda G)\leqslant\log MN^{3/2}+2\log\kappa(G)+N^{-1+\varepsilon}\frac{1}{\log(1+\lambda^{2})}+O(1).

Also, using Theorem 4 and Theorem 5, for any $\varepsilon>0$, with probability at least $1-N^{-1/3-\varepsilon}$, we have

\mathrm{LoP}(H)\leqslant\log MN^{3/2}+2\log\kappa(G)+N^{-\frac{2}{3}+\varepsilon}+O(1).

The error terms are smaller than the $O(1)$ term. These results imply that a general random matrix, as well as its Gaussian perturbation, ensures accuracy as good as in the Gaussian case.
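The following rough numerical illustration (assuming numpy; illustrative only) evaluates the [Hig02] formula, with the $O(1)$ term dropped and $M=N$ taken for simplicity, for a non-Gaussian random matrix and for a Gaussian one; the base of the logarithm only rescales both values by the same factor:

```python
import numpy as np

def lop_proxy(A):
    # [Hig02] formula without the O(1) term; we take M = N for simplicity
    M, N = A.shape
    s = np.linalg.svd(A, compute_uv=False)
    return np.log(M * N**1.5) + 2 * np.log(s[0] / s[-1])

rng = np.random.default_rng(0)
N, trials = 200, 200
lop_H = [lop_proxy(rng.choice([-1.0, 1.0], (N, N)) / np.sqrt(N)) for _ in range(trials)]
lop_G = [lop_proxy(rng.standard_normal((N, N)) / np.sqrt(N)) for _ in range(trials)]
print(np.median(lop_H), np.median(lop_G))  # the medians nearly coincide
```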

2.2 Complexity of conjugate gradient method

Consider the linear equation

A^{\top}Ax=c,

where $A$ is an $M\times N$ matrix with $M-N=N^{o(1)}$, and $c\in\mathbb{R}^{N}$ is a fixed vector. This linear system can be solved via the conjugate gradient algorithm. Let $T_{\delta}(A)$ denote the number of iterations needed to obtain a $\delta$-approximation of the true solution in the worst case. Then it is known (see e.g. [TB97]) that

T_{\delta}(A)=\frac{1}{2}\kappa(A)\delta.

Let $H$ be an $M\times N$ random matrix satisfying (1) and (2), and let $G$ be an $M\times N$ Gaussian matrix. By Theorem 3 and Theorem 5, for any $\varepsilon>0$, with overwhelming probability we have

\left|T_{\delta}(H+\lambda G)-T_{\delta}(G)\right|\leqslant\frac{\delta N^{\varepsilon}}{\log(1+\lambda^{2})}\lesssim N^{\varepsilon}\frac{\delta}{\lambda^{2}}.

This shows that as long as $\lambda^{2}\gg\delta$, the Gaussian perturbation $H+\lambda G$ has time complexity as good as the Gaussian ensemble.

Similarly, using Theorem 4 and Theorem 5, for any $\varepsilon>0$, with probability at least $1-N^{-1/3-\varepsilon}$, we have

T_{\delta}(H)\leqslant T_{\delta}(G)+\delta N^{1/3+\varepsilon}.

This shows that as long as the required accuracy satisfies $\delta\ll N^{-1/3}$, the time complexity for a general random matrix is as good as for the Gaussian ensemble.

3 Smoothed Analysis and Gaussian Approximation

3.1 Singular value dynamics

In smoothed analysis, we are interested in matrix perturbations of the form $H+\lambda G$. After normalizing the variance, it is equivalent to study matrices of the form $H_{t}=e^{-t/2}H+\sqrt{1-e^{-t}}\,G$. More specifically, we have

H+\lambda G=\sqrt{1+\lambda^{2}}\left(\frac{1}{\sqrt{1+\lambda^{2}}}H+\frac{\lambda}{\sqrt{1+\lambda^{2}}}G\right)=\sqrt{1+\lambda^{2}}\,H_{\log(1+\lambda^{2})}. (8)

Let $B$ be an $N\times N$ matrix Brownian motion, i.e. the $B_{ij}$ are independent standard Brownian motions. Then the evolution of $H_{t}$ is governed by the following matrix-valued Ornstein-Uhlenbeck process:

\mathrm{d}H_{t}=\frac{1}{\sqrt{N}}\mathrm{d}B_{t}-\frac{1}{2}H_{t}\mathrm{d}t. (9)

Let $\{s_{k}(t)\}_{k=1}^{N}$ denote the singular values of $H_{t}$; then $\{s_{k}(t)\}_{k=1}^{N}$ satisfy the following system of stochastic differential equations [ESYY12, equation (5.8)]:

\mathrm{d}s_{k}=\frac{\mathrm{d}B_{k}}{\sqrt{N}}+\left[-\frac{1}{2}s_{k}+\frac{1}{2N}\sum_{\ell\neq k}\left(\frac{1}{s_{k}-s_{\ell}}+\frac{1}{s_{k}+s_{\ell}}\right)\right]\mathrm{d}t,\ \ \ 1\leqslant k\leqslant N. (10)

To handle these SDEs, an important idea is the following symmetrization trick (see [CL19, equation (3.9)]):

s_{-i}(t)=-s_{i}(t),\ \ \ B_{-i}(t)=-B_{i}(t),\ \ \forall t\geqslant 0,\ 1\leqslant i\leqslant N.

With these notations, we label the indices from $-1$ to $-N$ and $1$ to $N$, so that the zero index is omitted. Unless otherwise stated, this will be the convention, and we will not emphasize it explicitly in the remainder of the paper. After symmetrization, in the real case we have

\mathrm{d}s_{k}=\frac{\mathrm{d}B_{k}}{\sqrt{N}}+\left[-\frac{1}{2}s_{k}+\frac{1}{2N}\sum_{\ell\neq\pm k}\frac{1}{s_{k}-s_{\ell}}\right]\mathrm{d}t,\ \ \ -N\leqslant k\leqslant N,\ k\neq 0. (11)
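For intuition, these dynamics can be simulated directly. The sketch below (illustrative only, assuming numpy; not used in the proofs) runs a naive Euler-Maruyama discretization of (10), which is equivalent to (11) under the symmetrization above; it ignores the stiffness of the drift near collisions, so very small time steps are needed:

```python
import numpy as np

def evolve_singular_values(s0, T, steps, rng):
    """Naive Euler-Maruyama scheme for the singular value dynamics (10)."""
    N, dt = len(s0), T / steps
    s = s0.copy()
    for _ in range(steps):
        diff = s[:, None] - s[None, :]          # s_k - s_l
        ssum = s[:, None] + s[None, :]          # s_k + s_l
        np.fill_diagonal(diff, np.inf)          # exclude l = k from both sums
        np.fill_diagonal(ssum, np.inf)
        drift = -0.5 * s + (1.0 / diff + 1.0 / ssum).sum(axis=1) / (2 * N)
        s = s + drift * dt + np.sqrt(dt / N) * rng.standard_normal(N)
        s = np.sort(np.abs(s))                  # restore ordering
    return s

rng = np.random.default_rng(0)
N = 50
H = rng.choice([-1.0, 1.0], (N, N)) / np.sqrt(N)
s0 = np.sort(np.linalg.svd(H, compute_uv=False))
print(evolve_singular_values(s0, T=0.1, steps=5000, rng=rng)[:3])
```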

Now we use the coupling method introduced in [LSY19] to analyze these dynamics. Consider the interpolation between a general matrix $H$ and a Gaussian matrix $G$. Let $\{\sigma_{k}(H)\}_{k=-N}^{N}$ and $\{\sigma_{k}(G)\}_{k=-N}^{N}$ be the (symmetrized) singular values of $H$ and $G$, respectively. For $\nu\in[0,1]$, define

s_{k}^{(\nu)}(0)=(1-\nu)\sigma_{k}(H)+\nu\sigma_{k}(G).

With this initial condition, we denote the unique solution of (11) by $\{s_{k}^{(\nu)}(t)\}$. Also, let $\{\sigma_{k}(H,t)\}$ and $\{\sigma_{k}(G,t)\}$ denote the solutions of (11) with initial conditions $\{\sigma_{k}(H)\}$ and $\{\sigma_{k}(G)\}$, respectively.

It is well known that the empirical measure of the eigenvalues of $H^{*}H$ converges to the Marchenko-Pastur distribution

\rho_{\mathsf{MP}}(x)=\frac{1}{2\pi}\sqrt{\frac{4-x}{x}}.

For $1\leqslant k\leqslant N$, we define the typical position of the singular value $\sigma_{k}$ as the quantile $\gamma_{k}$ satisfying

\int_{-\infty}^{\gamma_{k}^{2}}\rho_{\mathsf{MP}}(x)\,\mathrm{d}x=\frac{k}{N}.

We also define $\gamma_{-k}=-\gamma_{k}$. By a change of variables, we have

\int_{-\infty}^{\gamma_{k}}\rho_{\mathsf{sc}}(x)\,\mathrm{d}x=\frac{N+k}{2N},\ \ \ \int_{-\infty}^{\gamma_{-k}}\rho_{\mathsf{sc}}(x)\,\mathrm{d}x=\frac{N-k}{2N}, (12)

where $\rho_{\mathsf{sc}}(x)=\tfrac{1}{2\pi}\sqrt{(4-x^{2})_{+}}$ is the semicircle law.
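The typical locations have no closed form, but they are easy to evaluate numerically from (12), using the explicit antiderivative of $\rho_{\mathsf{sc}}$. A minimal sketch (assuming numpy; illustrative only):

```python
import numpy as np

def semicircle_cdf(x):
    # F(x) = \int_{-2}^{x} rho_sc, with rho_sc(t) = sqrt(4 - t^2) / (2*pi)
    x = np.clip(x, -2.0, 2.0)
    return 0.5 + x * np.sqrt(4.0 - x * x) / (4.0 * np.pi) + np.arcsin(x / 2.0) / np.pi

def gamma(k, N, tol=1e-12):
    """Typical location gamma_k solving F(gamma_k) = (N + k) / (2N), via bisection."""
    target = (N + k) / (2.0 * N)
    lo, hi = -2.0, 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if semicircle_cdf(mid) < target else (lo, mid)
    return 0.5 * (lo + hi)

N = 1000
print(gamma(1, N), gamma(N // 2, N), gamma(N, N))  # near 0, in the bulk, near 2
```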

An important input of our proof is the following uniform rigidity estimates. For any fixed $\varepsilon>0$, consider the set of good trajectories

\mathscr{A}_{\varepsilon}=\left\{\left|s_{k}^{(\nu)}(t)-\gamma_{k}\right|<N^{-\frac{2}{3}+\varepsilon}(N+1-|k|)^{-\frac{1}{3}}\ \mbox{for all}\ 0\leqslant t\leqslant 1,\ -N\leqslant k\leqslant N,\ 0\leqslant\nu\leqslant 1\right\}. (13)

Such rigidity estimates for fixed $t$ and $\nu=0$ or $1$ were proved in [AEK14, AEK17, BEK+14, BYY14, CMS13]. The extension to estimates uniform in the parameters can be done by a discretization argument: (1) discretize in $t$ and $\nu$; (2) use Weyl's inequality to control increments over small time intervals; (3) use a maximum principle for the derivative with respect to the parameter $\nu$ (see Lemma 3.2) to control increments over small $\nu$-intervals. As a consequence, we have

Lemma 3.1.

For any $\varepsilon>0$, the event $\mathscr{A}_{\varepsilon}$ holds with overwhelming probability, i.e. for any $D>0$, there exists $N_{0}(\varepsilon,D)$ such that for any $N>N_{0}$ we have

\mathbb{P}(\mathscr{A}_{\varepsilon})>1-N^{-D}.

We consider

\varphi_{k}^{(\nu)}(t):=e^{\frac{t}{2}}\frac{\mathrm{d}}{\mathrm{d}\nu}s_{k}^{(\nu)}(t).

For simplicity of notation, we omit the parameter $\nu$ when the context is clear. Then $\varphi_{k}$ satisfies the following non-local parabolic-type equation:

\frac{\mathrm{d}}{\mathrm{d}t}\varphi_{k}=\frac{1}{2N}\sum_{\ell\neq\pm k}\frac{\varphi_{\ell}-\varphi_{k}}{\left(s_{\ell}-s_{k}\right)^{2}}. (14)

Let $\psi_{k}=\psi_{k}^{(\nu)}$ solve the same equation (14) as $\varphi_{k}$ but with initial condition $\psi_{k}(0)=|\varphi_{k}(0)|=|\sigma_{k}(H)-\sigma_{k}(G)|$. Following the same arguments as in [Wan19, Lemma 3.1], this equation satisfies a maximum principle.

Lemma 3.2.

For all $t\geqslant 0$ and $-N\leqslant k\leqslant N$, we have

\psi_{k}(t)=\psi_{-k}(t),\ \ \ \psi_{k}(t)\geqslant 0,\ \ \ |\psi_{k}(t)|\leqslant\max_{-N\leqslant\ell\leqslant N}|\psi_{\ell}(0)|,\ \ \ |\varphi_{k}(t)|\leqslant\psi_{k}(t).

We consider the following weighted Stieltjes transforms:

{\mathfrak{S}}_{t}(z):=e^{-\frac{t}{2}}\sum_{-N\leqslant k\leqslant N}\frac{\varphi_{k}(t)}{s_{k}(t)-z},\ \ \ \widetilde{{\mathfrak{S}}}_{t}(z):=e^{-\frac{t}{2}}\sum_{-N\leqslant k\leqslant N}\frac{\psi_{k}(t)}{s_{k}(t)-z}. (15)

Let $S_{t}(z)$ and $m_{\mathsf{sc}}(z)$ denote the Stieltjes transforms of the empirical measure of the singular values and of the semicircle law, respectively:

S_{t}(z)=\frac{1}{2N}\sum_{-N\leqslant k\leqslant N}\frac{1}{s_{k}-z},\ \ \ m_{\mathsf{sc}}(z)=\frac{-z+\sqrt{z^{2}-4}}{2}.

A well-known result in random matrix theory is the following local semicircle law for the Stieltjes transform $S_{t}(z)$. Let $\omega>0$ be an arbitrary fixed constant, and define the spectral domain

\mathbf{S}=\mathbf{S}_{\omega}:=\left\{z=E+\text{\rm i}\eta:|E|\leqslant\omega^{-1},\ N^{-1+\omega}\leqslant\eta\leqslant\omega^{-1}\right\}.

For any $\omega>0$ and $z\in\mathbf{S}_{\omega}$, it was shown in [AEK14, AEK17] that

\left|S_{t}(z)-m_{\mathsf{sc}}(z)\right|\prec\frac{1}{N\eta}. (16)
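The local law (16) is easy to observe numerically. The following sketch (illustrative only, assuming numpy) compares the empirical Stieltjes transform of the symmetrized singular values at $t=0$ with $m_{\mathsf{sc}}(z)$ at a bulk point:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 2000
H = rng.standard_normal((N, N)) / np.sqrt(N)
sv = np.linalg.svd(H, compute_uv=False)
s = np.concatenate([sv, -sv])                 # symmetrized singular values

z = 0.3 + 0.05j                               # bulk point with eta >> 1/N
S = np.mean(1.0 / (s - z))                    # S_0(z) = (1/2N) sum_k 1/(s_k - z)
m_sc = (-z + np.sqrt(z * z - 4)) / 2          # branch with Im m_sc(z) > 0
print(abs(S - m_sc), "vs 1/(N*eta) =", 1 / (N * z.imag))
```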

As computed in [Wan19, Lemma 3.3], a key result is that ${\mathfrak{S}}_{t}(z)$ and $\widetilde{{\mathfrak{S}}}_{t}(z)$ satisfy the following stochastic advection equation.

Lemma 3.3.

For ${\mathrm{Im}\,}z\neq 0$, we have

\mathrm{d}\widetilde{{\mathfrak{S}}}_{t}=\left(S_{t}(z)+\frac{z}{2}\right)(\partial_{z}\widetilde{{\mathfrak{S}}}_{t})\mathrm{d}t+\frac{1}{4N}(\partial_{zz}\widetilde{{\mathfrak{S}}}_{t})\mathrm{d}t+\left[\frac{e^{-\frac{t}{2}}}{2N}\sum_{-N\leqslant k\leqslant N}\frac{\psi_{k}}{(s_{k}-z)^{2}(s_{k}+z)}\right]\mathrm{d}t-\frac{e^{-\frac{t}{2}}}{\sqrt{N}}\sum_{-N\leqslant k\leqslant N}\frac{\psi_{k}}{(s_{k}-z)^{2}}\mathrm{d}B_{k}. (17)

Based on the local semicircle law (16), we expect that this stochastic differential equation can be approximated by the deterministic advection PDE

\partial_{t}h=\frac{\sqrt{z^{2}-4}}{2}\partial_{z}h. (18)

The above PDE has the following explicit characteristics:

z_{t}=\frac{e^{\frac{t}{2}}\left(z+\sqrt{z^{2}-4}\right)+e^{-\frac{t}{2}}\left(z-\sqrt{z^{2}-4}\right)}{2}. (19)

This implies $\widetilde{{\mathfrak{S}}}_{t}(z)\approx\widetilde{{\mathfrak{S}}}_{0}(z_{t})$, and we will justify this approximation in the next subsection. Moreover, we remark that ${\mathfrak{S}}_{t}$ satisfies the same equation (17) with $\psi_{k}$ replaced by $\varphi_{k}$.
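For completeness, here is a short verification, included only as a reasoning aid, that (19) is compatible with (18). Parametrize $z=w+w^{-1}$ with $|w|>1$, so that $\sqrt{z^{2}-4}=w-w^{-1}$ and $z_{t}=e^{t/2}w+e^{-t/2}w^{-1}$. Then

\partial_{t}z_{t}=\frac{e^{t/2}w-e^{-t/2}w^{-1}}{2}=\frac{\sqrt{z_{t}^{2}-4}}{2},\ \ \ \partial_{z}z_{t}=\frac{e^{t/2}-e^{-t/2}w^{-2}}{1-w^{-2}}=\frac{\sqrt{z_{t}^{2}-4}}{\sqrt{z^{2}-4}},

so that $\partial_{t}z_{t}=\frac{\sqrt{z^{2}-4}}{2}\partial_{z}z_{t}$, and hence $h(t,z):=h_{0}(z_{t})$ solves (18).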

Before moving to the main estimates, we first collect some basic results, including the geometry of the characteristics and a rough estimate for the initial condition. These can be proved via direct computations and the details can be found in [Bou22, Section 2].

Let $\xi(z)=\min\{|z-2|,|z+2|\}$. For any $\varepsilon>0$, we consider the curve and the domain

\mathscr{S}_{\varepsilon}=\left\{E+\text{\rm i}\eta:-2+N^{-\frac{2}{3}+4\varepsilon}<E<2-N^{-\frac{2}{3}+4\varepsilon},\ \eta=N^{-1+4\varepsilon}\xi(E)^{-\frac{1}{2}}\right\},\ \ \mathscr{R}_{\varepsilon}=\bigcup_{0\leqslant t\leqslant 1}\{z_{t}:z\in\mathscr{S}_{\varepsilon}\}.

We also define $a(z)=\operatorname{dist}(z,[-2,2])$ and $b(z)=\operatorname{dist}(z,[-2,2]^{c})$.

Lemma 3.4.

Uniformly in $0<t<1$ and $z=z_{0}$ satisfying $\eta={\mathrm{Im}\,}z>0$ and $|z-2|<\tfrac{1}{10}$, we have

{\mathrm{Re}\,}(z_{t}-z_{0})\sim t\frac{a(z)}{\xi(z)^{1/2}}+t^{2},\ \ \ {\mathrm{Im}\,}(z_{t}-z_{0})\sim t\frac{b(z)}{\xi(z)^{1/2}}.

In particular, for $\varepsilon>0$, if $z\in\mathscr{S}_{\varepsilon}$, then $z_{t}-z_{0}\sim(tN^{-1+4\varepsilon}\xi(E)^{-1}+t^{2})+\text{\rm i}\,\xi(E)^{1/2}t$.

Moreover, for any $\xi>0$ and $z=E+\text{\rm i}\eta\in[-2+\xi,2-\xi]\times[0,\xi^{-1}]$, we have ${\mathrm{Im}\,}(z_{t}-z_{0})\sim t$.

Lemma 3.5.

Let $\varepsilon>0$ be any small constant. In the set $\mathscr{A}_{\varepsilon}$, for any $z=E+\text{\rm i}\eta\in\mathscr{R}_{\varepsilon}$, we have ${\mathrm{Im}\,}\widetilde{{\mathfrak{S}}}_{0}(z)\lesssim N^{\varepsilon}$ if $\eta>\max(E-2,-E-2)$, and ${\mathrm{Im}\,}\widetilde{{\mathfrak{S}}}_{0}(z)\lesssim N^{\varepsilon}\xi(z)^{-1}\eta$ otherwise. The same bounds also hold for $|{\mathrm{Im}\,}{\mathfrak{S}}_{0}|$.

A key ingredient of our proof is the following a priori estimate for $\widetilde{{\mathfrak{S}}}_{t}$, whose proof is deferred to Appendix B.1.

Proposition 3.1.

Let $\varepsilon>0$ be any small constant. Uniformly for all $0<t<1$ and $z=E+\text{\rm i}\eta\in\mathscr{S}_{\varepsilon}$, with overwhelming probability we have

{\mathrm{Im}\,}\widetilde{{\mathfrak{S}}}_{t}(z)\lesssim N^{2\varepsilon}\frac{\xi(E)^{1/2}}{\xi(E)^{1/2}\vee t}.

This estimate yields a rough control on the decay of $\varphi_{k}(t)$, which will be an important input for more refined estimates.

Lemma 3.6.

For all $-N\leqslant k\leqslant N$ and $0\leqslant t\leqslant 1$, we have

|\varphi_{k}(t)|\prec\frac{1}{N}\frac{1}{(\frac{N+1-|k|}{N})^{1/3}\vee t}. (20)
Proof.

By Lemma 3.2, it suffices to control $\psi_{k}(t)$. By the nonnegativity of $\psi_{k}(t)$, we have

{\mathrm{Im}\,}\widetilde{{\mathfrak{S}}}_{t}(z)=\sum_{-N\leqslant k\leqslant N}\frac{\psi_{k}(t){\mathrm{Im}\,}z}{|s_{k}(t)-z|^{2}}\geqslant\psi_{k}(t)\frac{{\mathrm{Im}\,}z}{|s_{k}(t)-z|^{2}},

which implies

\psi_{k}(t)\leqslant{\mathrm{Im}\,}\widetilde{{\mathfrak{S}}}_{t}(z)\frac{|s_{k}(t)-z|^{2}}{{\mathrm{Im}\,}z}.

Let $\varepsilon>0$. For $(N+1-|k|)>N^{10\varepsilon}$, pick the point $z=\gamma_{k}+\text{\rm i}N^{-1+4\varepsilon}\xi(\gamma_{k})^{-1/2}\in\mathscr{S}_{\varepsilon}$. In this case we have $\xi(\gamma_{k})^{1/2}\sim(\tfrac{N+1-|k|}{N})^{1/3}$. Therefore, in the set $\mathscr{A}_{\varepsilon}$, by Proposition 3.1, uniformly for all $-N+N^{10\varepsilon}\leqslant k\leqslant N-N^{10\varepsilon}$ and $0\leqslant t\leqslant 1$, with overwhelming probability we have

|\psi_{k}(t)|<\frac{N^{8\varepsilon}}{N}\frac{1}{(\frac{N+1-|k|}{N})^{1/3}\vee t}.

For $(N+1-|k|)\leqslant N^{10\varepsilon}$, without loss of generality we consider $N+1-k\leqslant N^{10\varepsilon}$. In this case, let $k_{0}=N-N^{10\varepsilon}+1$ and consider $z=\gamma_{k_{0}}+\text{\rm i}N^{-1+4\varepsilon}\xi(\gamma_{k_{0}})^{-1/2}$. The same argument yields a similar bound with a larger $N^{20\varepsilon}$ factor. By the arbitrariness of $\varepsilon$, this completes the proof. ∎

3.2 Local relaxation at the hard edge

In this subsection we prove a quantitative estimate for the local relaxation flow (11) at the hard edge. The main estimate in this section is the following. We remark that Theorem 6 is equivalent to Theorem 1 using the rescaling (8).

Theorem 6.

For $\varepsilon_{0}>0$ arbitrarily small and any $N^{-1+\varepsilon_{0}}<t<1$, we have

|\sigma_{1}(H,t)-\sigma_{1}(G,t)|\prec\frac{1}{N^{2}t}. (21)

To estimate $\varphi_{k}(t)$ near the hard edge, we introduce the following quantity to approximate it. Let $\gamma_{k}^{t}=(\gamma_{k})_{t}$ with the convention $\gamma^{t}=(\gamma+\text{\rm i}0^{+})_{t}$, and define

\widehat{\varphi}_{k}(t):=\frac{1}{2N{\mathrm{Im}\,}m_{\mathsf{sc}}(\gamma_{k}^{t})}\sum_{-N\leqslant j\leqslant N}{\mathrm{Im}\,}\left(\frac{1}{\gamma_{j}-\gamma_{k}^{t}}\right)(\sigma_{j}(H)-\sigma_{j}(G)). (22)

Our goal is to prove the following estimate.

Proposition 3.2.

Let $0<c<1$ be a fixed small constant. For $\varepsilon_{0}>0$ arbitrarily small, any $N^{-1+\varepsilon_{0}}<t<1$ and $k\in\llbracket(c-1)N,(1-c)N\rrbracket$, we have

|\sigma_{k}(H,t)-\sigma_{k}(G,t)-\widehat{\varphi}_{k}(t)|\prec\frac{1}{N^{2}t}.

To obtain the optimal control for the local relaxation flow, we need to carefully estimate $\widehat{\varphi}_{k}$ near the hard edge. A first step towards such estimates is given in the following lemma.

Lemma 3.7.

Let $\varepsilon>0$ and $0<c<1$. For any $(k,\ell)\in\llbracket(c-1)N,(1-c)N\rrbracket^{2}$, $|E|<2-c$, and $s,t,\eta\in[N^{-1+4\varepsilon},1]$, in the set $\mathscr{A}_{\varepsilon}$, for $z=E+\text{\rm i}\eta$ we have

\left|\widehat{\varphi}_{k}(t)-\widehat{\varphi}_{\ell}(s)\right|\lesssim N^{\varepsilon}\left(\frac{|k-\ell|}{N^{2}(s\wedge t)}+\frac{|s-t|}{N(s\wedge t)}\right), (23)

\left|\frac{{\mathrm{Im}\,}{\mathfrak{S}}_{0}(z_{t})}{2N{\mathrm{Im}\,}S_{0}(z_{t})}-\widehat{\varphi}_{\ell}(s)\right|\lesssim N^{\varepsilon}\left(\frac{|E-\gamma_{\ell}|}{N(s\wedge(\eta+t))}+\frac{|\eta+t-s|}{N(s\wedge(\eta+t))}\right). (24)
Proof.

By the properties of the Stieltjes transform $m_{\mathsf{sc}}(z)$ (see e.g. [EY17, Section 6]) and direct computation, we have

{\mathrm{Im}\,}m_{\mathsf{sc}}(\gamma_{k}^{t})\gtrsim 1,\ \ {\mathrm{Im}\,}m_{\mathsf{sc}}(\gamma_{\ell}^{s})\gtrsim 1,

and

\left|{\mathrm{Im}\,}m_{\mathsf{sc}}(\gamma_{k}^{t})-{\mathrm{Im}\,}m_{\mathsf{sc}}(\gamma_{\ell}^{s})\right|+\left|\gamma_{k}^{t}-\gamma_{\ell}^{s}\right|\lesssim\frac{|k-\ell|}{N}+|s-t|. (25)

In the set $\mathscr{A}_{\varepsilon}$, the rigidity estimates imply that

|\sigma_{j}(H)-\sigma_{j}(G)|\leqslant N^{-\frac{2}{3}+\varepsilon}(N+1-|j|)^{-\frac{1}{3}}.

Then we have

\left|\frac{1}{2N}{\mathrm{Im}\,}\left(\sum_{-N\leqslant j\leqslant N}\frac{\sigma_{j}(H)-\sigma_{j}(G)}{\gamma_{j}-\gamma_{k}^{t}}\right)\right|\lesssim\frac{1}{2N}({\mathrm{Im}\,}\gamma_{k}^{t})\sum_{-N\leqslant j\leqslant N}\frac{N^{-\frac{2}{3}+\varepsilon}(N+1-|j|)^{-\frac{1}{3}}}{(\gamma_{j}-{\mathrm{Re}\,}\gamma_{k}^{t})^{2}+({\mathrm{Im}\,}\gamma_{k}^{t})^{2}} (26)
\lesssim N^{-1+\varepsilon}({\mathrm{Im}\,}\gamma_{k}^{t})\int_{-2}^{2}\frac{\xi(x)^{-\frac{1}{2}}}{(x-{\mathrm{Re}\,}\gamma_{k}^{t})^{2}+({\mathrm{Im}\,}\gamma_{k}^{t})^{2}}\,\mathrm{d}\rho_{\mathsf{sc}}(x)
\lesssim N^{-1+\varepsilon}({\mathrm{Im}\,}\gamma_{k}^{t})\int_{-2}^{2}\frac{1}{(x-{\mathrm{Re}\,}\gamma_{k}^{t})^{2}+({\mathrm{Im}\,}\gamma_{k}^{t})^{2}}\,\mathrm{d}x
\lesssim N^{-1+\varepsilon}.

By the triangle inequality, we have

\left|\widehat{\varphi}_{k}(t)-\widehat{\varphi}_{\ell}(s)\right|\leqslant\frac{1}{2N}\left|\left(\frac{1}{{\mathrm{Im}\,}m_{\mathsf{sc}}(\gamma_{k}^{t})}-\frac{1}{{\mathrm{Im}\,}m_{\mathsf{sc}}(\gamma_{\ell}^{s})}\right){\mathrm{Im}\,}\left(\sum_{-N\leqslant j\leqslant N}\frac{\sigma_{j}(H)-\sigma_{j}(G)}{\gamma_{j}-\gamma_{k}^{t}}\right)\right|
+\frac{1}{2N{\mathrm{Im}\,}m_{\mathsf{sc}}(\gamma_{\ell}^{s})}\sum_{-N\leqslant j\leqslant N}\left|{\mathrm{Im}\,}\left(\frac{1}{\gamma_{j}-\gamma_{k}^{t}}\right)-{\mathrm{Im}\,}\left(\frac{1}{\gamma_{j}-\gamma_{\ell}^{s}}\right)\right|\left|\sigma_{j}(H)-\sigma_{j}(G)\right|
=:I_{1}+I_{2}.

Using (25) and (26), we obtain

I_{1}\lesssim N^{\varepsilon}\left(\frac{|k-\ell|}{N^{2}}+\frac{|s-t|}{N}\right).

For the second term $I_{2}$, in the set $\mathscr{A}_{\varepsilon}$, the rigidity and (25) imply that

I_{2}\lesssim N^{-1+\varepsilon}\sum_{-N\leqslant j\leqslant N}N^{-\frac{2}{3}}(N+1-|j|)^{-\frac{1}{3}}\left|\frac{\gamma_{k}^{t}-\gamma_{\ell}^{s}}{(\gamma_{j}-\gamma_{k}^{t})(\gamma_{j}-\gamma_{\ell}^{s})}\right|
\lesssim N^{-1+\varepsilon}\left(\frac{|k-\ell|}{N}+|s-t|\right)\sum_{-N\leqslant j\leqslant N}N^{-\frac{2}{3}}(N+1-|j|)^{-\frac{1}{3}}\left(\frac{1}{|\gamma_{j}-\gamma_{k}^{t}|^{2}}+\frac{1}{|\gamma_{j}-\gamma_{\ell}^{s}|^{2}}\right).

Recall from Lemma 3.4 that ${\mathrm{Im}\,}\gamma_{k}^{t}\sim t$ and ${\mathrm{Im}\,}\gamma_{\ell}^{s}\sim s$. Using a similar argument as in (26), we obtain

I_{2}\lesssim N^{-1+\varepsilon}\left(\frac{|k-\ell|}{N}+|s-t|\right)\left(\frac{1}{t}+\frac{1}{s}\right)\lesssim N^{\varepsilon}\left(\frac{|k-\ell|}{N^{2}(s\wedge t)}+\frac{|s-t|}{N(s\wedge t)}\right).

This proves (23). The other part, (24), can be proved via the same arguments. ∎

As a consequence, we have good control on the size of $\widehat{\varphi}_{k}(t)$ away from the soft edge. This is based on the symmetric structure of $\{\widehat{\varphi}_{k}\}$.

Lemma 3.8.

Let $\varepsilon>0$ and $0<c<1$. For any $t\in[N^{-1+4\varepsilon},1]$ and $k\in\llbracket(c-1)N,(1-c)N\rrbracket$, with overwhelming probability we have

|\widehat{\varphi}_{k}(t)|\lesssim N^{\varepsilon}\frac{k}{N^{2}t}. (27)
Proof.

A key observation is the following symmetry:

{\mathrm{Re}\,}\gamma_{-k}^{t}=-{\mathrm{Re}\,}\gamma_{k}^{t},\ \ {\mathrm{Im}\,}\gamma_{-k}^{t}={\mathrm{Im}\,}\gamma_{k}^{t},\ \ {\mathrm{Re}\,}m_{\mathsf{sc}}(\gamma_{-k}^{t})=-{\mathrm{Re}\,}m_{\mathsf{sc}}(\gamma_{k}^{t}),\ \ {\mathrm{Im}\,}m_{\mathsf{sc}}(\gamma_{-k}^{t})={\mathrm{Im}\,}m_{\mathsf{sc}}(\gamma_{k}^{t}).

Therefore, we have

\widehat{\varphi}_{-k}(t)=\frac{1}{2N{\mathrm{Im}\,}m_{\mathsf{sc}}(\gamma_{-k}^{t})}\sum_{-N\leqslant j\leqslant N}{\mathrm{Im}\,}\left(\frac{1}{\gamma_{j}-\gamma_{-k}^{t}}\right)(\sigma_{j}(H)-\sigma_{j}(G))
=\frac{1}{2N{\mathrm{Im}\,}m_{\mathsf{sc}}(\gamma_{k}^{t})}\sum_{-N\leqslant j\leqslant N}\frac{{\mathrm{Im}\,}(\gamma_{-k}^{t})}{\left(\gamma_{j}-{\mathrm{Re}\,}(\gamma_{-k}^{t})\right)^{2}+({\mathrm{Im}\,}\gamma_{-k}^{t})^{2}}(\sigma_{j}(H)-\sigma_{j}(G))
=\frac{1}{2N{\mathrm{Im}\,}m_{\mathsf{sc}}(\gamma_{k}^{t})}\sum_{-N\leqslant j\leqslant N}\frac{{\mathrm{Im}\,}(\gamma_{k}^{t})}{\left(\gamma_{j}+{\mathrm{Re}\,}(\gamma_{k}^{t})\right)^{2}+({\mathrm{Im}\,}\gamma_{k}^{t})^{2}}(\sigma_{j}(H)-\sigma_{j}(G)).

Using the symmetrization of the singular values, we further have

\widehat{\varphi}_{-k}(t)=-\frac{1}{2N{\mathrm{Im}\,}m_{\mathsf{sc}}(\gamma_{k}^{t})}\sum_{-N\leqslant j\leqslant N}\frac{{\mathrm{Im}\,}(\gamma_{k}^{t})}{\left(\gamma_{-j}-{\mathrm{Re}\,}(\gamma_{k}^{t})\right)^{2}+({\mathrm{Im}\,}\gamma_{k}^{t})^{2}}(\sigma_{-j}(H)-\sigma_{-j}(G))
=-\frac{1}{2N{\mathrm{Im}\,}m_{\mathsf{sc}}(\gamma_{k}^{t})}\sum_{-N\leqslant j\leqslant N}{\mathrm{Im}\,}\left(\frac{1}{\gamma_{-j}-\gamma_{k}^{t}}\right)(\sigma_{-j}(H)-\sigma_{-j}(G))
=-\frac{1}{2N{\mathrm{Im}\,}m_{\mathsf{sc}}(\gamma_{k}^{t})}\sum_{-N\leqslant j\leqslant N}{\mathrm{Im}\,}\left(\frac{1}{\gamma_{j}-\gamma_{k}^{t}}\right)(\sigma_{j}(H)-\sigma_{j}(G))
=-\widehat{\varphi}_{k}(t).

Consequently, by (23) we have

|\widehat{\varphi}_{k}(t)-\widehat{\varphi}_{-k}(t)|=2|\widehat{\varphi}_{k}(t)|\lesssim N^{\varepsilon}\frac{2k}{N^{2}t}.

This shows the desired result. ∎

Finally, it is straightforward to derive Theorem 6 from Proposition 3.2 and Lemma 3.8. Thus, our primary goal is to prove Proposition 3.2.

3.3 Bootstrap arguments

We will prove the main technical estimate Proposition 3.2 via a bootstrap argument.

Definition 3 (Hypothesis ${\mathcal{H}}_{\alpha}$).

Consider the following hypothesis: for any fixed small $0<c<1$, the following holds for $\varepsilon_{0}>0$ arbitrarily small. For any $N^{-1+\varepsilon_{0}}<t<1$, $k\in\llbracket(c-1)N,(1-c)N\rrbracket$ and $\nu\in[0,1]$, we have

\left|\varphi_{k}^{(\nu)}(t)-\widehat{\varphi}_{k}(t)\right|\prec\frac{(Nt)^{\alpha}}{N^{2}t}. (28)

Proposition 3.2 is derived via a bootstrap of the hypothesis ${\mathcal{H}}_{\alpha}$. Specifically, we have the following two lemmas.

Lemma 3.9.

The hypothesis ${\mathcal{H}}_{1}$ is true.

Proof.

Recall from Lemma 3.6 that (20) implies $\varphi_{k}^{(\nu)}(t)\prec N^{-1}$. On the other hand, from the definition (22) of $\widehat{\varphi}_{k}$, using the rigidity and (26) we obtain $\widehat{\varphi}_{k}(t)\prec N^{-1}$, thanks to the arbitrariness of $\varepsilon_{0}$. Therefore, the triangle inequality yields $|\varphi_{k}^{(\nu)}(t)-\widehat{\varphi}_{k}(t)|\prec N^{-1}$, which completes the proof. ∎

Lemma 3.10.

If ${\mathcal{H}}_{\alpha}$ is true, then ${\mathcal{H}}_{3\alpha/4}$ is true, i.e.

\left|\varphi_{k}^{(\nu)}(t)-\widehat{\varphi}_{k}(t)\right|\prec\frac{(Nt)^{\frac{3\alpha}{4}}}{N^{2}t}.

The self-improving property of the hypothesis ${\mathcal{H}}_{\alpha}$ stated in Lemma 3.10 is the main technical part of the proof of Proposition 3.2. We defer its proof to Appendix B.2.

Finally, the optimal control (21) for the local relaxation flow at the hard edge follows from these two lemmas together with Lemma 3.8.

Proof of Proposition 3.2.

Note that

\sigma_{k}(H,t)-\sigma_{k}(G,t)-\widehat{\varphi}_{k}(t)=\int_{0}^{1}\varphi_{k}^{(\nu)}(t)\,\mathrm{d}\nu-\widehat{\varphi}_{k}(t)=\int_{0}^{1}\left(\varphi_{k}^{(\nu)}(t)-\widehat{\varphi}_{k}(t)\right)\mathrm{d}\nu. (29)

Consider an arbitrarily fixed $\delta>0$. Based on Lemma 3.9 and Lemma 3.10, after finitely many iterations, with overwhelming probability we have

\left|\varphi_{k}(t)-\widehat{\varphi}_{k}(t)\right|\leqslant\frac{N^{\delta}}{N^{2}t}.

This shows that for any fixed $\widetilde{D}$ and $p$, and for large enough $N$, we have

\mathbb{E}\left(\left|\varphi_{k}(t)-\widehat{\varphi}_{k}(t)\right|^{2p}\right)\leqslant\left(\frac{N^{\delta}}{N^{2}t}\right)^{2p}+N^{-\widetilde{D}}.

By (29) we obtain

\mathbb{E}\left(\left|\sigma_{k}(H,t)-\sigma_{k}(G,t)-\widehat{\varphi}_{k}(t)\right|^{2p}\right)\leqslant\int_{0}^{1}\mathbb{E}\left(\left|\varphi_{k}(t)-\widehat{\varphi}_{k}(t)\right|^{2p}\right)\mathrm{d}\nu\leqslant\left(\frac{N^{\delta}}{N^{2}t}\right)^{2p}+N^{-\widetilde{D}}.

We choose $p=D/\delta$ and $\widetilde{D}=D+100p$; then the Markov inequality yields

\mathbb{P}\left(\left|\sigma_{k}(H,t)-\sigma_{k}(G,t)-\widehat{\varphi}_{k}(t)\right|\leqslant\frac{N^{2\delta}}{N^{2}t}\right)>1-N^{-D},

which completes the proof, thanks to the arbitrariness of $\delta$ and $D$. ∎

4 Quantitative Universality

4.1 Quantitative resolvent comparison

In classical random matrix theory, spectral universality is proved by comparing the resolvents of matrices satisfying moment matching conditions. To obtain a quantitative universality for the smallest singular value, we need a quantitative version of the resolvent comparison theorem.

For a fixed constant $a\in(1,2)$, let $\rho=\rho(N)\in[N^{-a},N^{-1}]$ be a cutoff scale. Let $r>0$ and consider two symmetric functions $f_{1}(x),f_{2}(x)$ that are non-increasing in $|x|$, given by

f_{1}(x):=\begin{cases}0&\textrm{if }|x|>rN^{-1}\\ 1&\textrm{if }|x|<rN^{-1}-\rho\end{cases},\ \ \ f_{2}(x):=\begin{cases}0&\textrm{if }|x|>rN^{-1}+\rho\\ 1&\textrm{if }|x|<rN^{-1}\end{cases}.

Also, consider a fixed non-increasing smooth function $F$ such that $F(x)=1$ for $x\leqslant 0$ and $F(x)=0$ for $x\geqslant 1$.

A key observation is that the functions $f_{1},f_{2}$ and $F$ can bound the distribution of the smallest singular value $\sigma_{1}(H)$. For any function $f:\mathbb{R}\to\mathbb{R}$, we denote $\operatorname{Tr}f(H):=\sum_{i=-N}^{N}f(\sigma_{i}(H))$.

Lemma 4.1.

We have

\mathbb{E}\left[F\left(\operatorname{Tr}f_{2}(H)\right)\right]\leqslant\mathbb{P}\left(\sigma_{1}(H)>rN^{-1}\right)\leqslant\mathbb{E}\left[F\left(\operatorname{Tr}f_{1}(H)\right)\right]. (30)
Proof.

For the right-hand side, assume $\sigma_{1}(H)>rN^{-1}$. By the definition of the function $f_{1}$, we have $\sum_{i=-N}^{N}f_{1}(\sigma_{i}(H))=0$, which implies $F(\operatorname{Tr}f_{1}(H))=1$. Also note that $F\geqslant 0$. Therefore, we conclude $\mathbbm{1}\left\{\sigma_{1}(H)>rN^{-1}\right\}\leqslant F(\operatorname{Tr}f_{1}(H))$, and this yields

\mathbb{P}\left(\sigma_{1}(H)>rN^{-1}\right)=\mathbb{E}\left[\mathbbm{1}\left\{\sigma_{1}(H)>rN^{-1}\right\}\right]\leqslant\mathbb{E}\left[F(\operatorname{Tr}f_{1}(H))\right].

The left-hand side can be proved similarly. ∎

When estimating the distribution $\mathbb{P}\left(\sigma_{1}(H)>rN^{-1}\right)$, thanks to the rigidity of the singular values, we can assume $r<N^{\varepsilon}$ without loss of generality, where $\varepsilon>0$ is a constant that can be arbitrarily small. Based on Lemma 4.1, to compare the distributions of the smallest singular values of different random matrices, it suffices to compare the functionals $\operatorname{Tr}f_{1}$ and $\operatorname{Tr}f_{2}$. In the remaining part of this section, we provide a systematic treatment of such a comparison.

Pick a point $E\in\mathbb{R}$ with $0<E<N^{-1+\varepsilon}$. Let $f(x)$ be a smooth symmetric function that is non-increasing in $|x|$ and satisfies

f(x)=\begin{cases}0&\textrm{if }|x|>E\\ 1&\textrm{if }|x|<E-\rho\end{cases},\ \ \ \textrm{and }\ \|f^{(k)}\|_{\infty}\lesssim\rho^{-k}\ \textrm{for }k=1,2. (31)
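For concreteness, one admissible choice of $f$ (an illustrative construction; any smooth cutoff with these properties works) is the mollification

f(x)=\int_{\mathbb{R}}\mathbf{1}_{\{|y|\leqslant E-\rho/2\}}\,\theta_{\rho}(x-y)\,\mathrm{d}y,\ \ \ \theta_{\rho}(x):=\rho^{-1}\theta(x/\rho),

where $\theta$ is a fixed smooth probability density supported in $[-1/2,1/2]$. Indeed, $f=1$ on $\{|x|<E-\rho\}$, $f=0$ on $\{|x|>E\}$, and $\|f^{(k)}\|_{\infty}\leqslant\|\theta_{\rho}^{(k)}\|_{L^{1}}=\rho^{-k}\|\theta^{(k)}\|_{L^{1}}\lesssim\rho^{-k}$.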

For the test functions $f$ and $F$ defined as above, we have the following quantitative comparison of the resolvents, whose proof is deferred to Appendix C.

Proposition 4.1.

Let $X$ and $Y$ be two independent random matrices satisfying (1) and (2). Assume the first three moments of the entries are identical, i.e. $\mathbb{E}[X_{ij}^{k}]=\mathbb{E}[Y_{ij}^{k}]$ for all $1\leqslant i,j\leqslant N$ and $1\leqslant k\leqslant 3$. Suppose also that for some parameter $t=t(N)$ we have

\left|\mathbb{E}[(\sqrt{N}X_{ij})^{4}]-\mathbb{E}[(\sqrt{N}Y_{ij})^{4}]\right|\leqslant t,\ \ \textrm{for all }1\leqslant i,j\leqslant N. (32)

With the test functions $f$ and $F$ defined as above, there exists a constant $C>0$ such that for any $\varepsilon>0$,

\left|\mathbb{E}[F(\operatorname{Tr}f(X))]-\mathbb{E}[F(\operatorname{Tr}f(Y))]\right|\leqslant N^{C\varepsilon}\left(\frac{1}{\rho N^{2}}+\frac{\left(\rho N^{a}\right)^{5}}{\sqrt{N}}+t\rho N^{a}\right). (33)

4.2 Proof of Theorem 2

Using the quantitative comparison theorem (Proposition 4.1) and the smoothed analysis (Theorem 6), we now prove the quantitative universality.

For a general random matrix $H$ satisfying assumptions (1) and (2), there exists another matrix $H_{0}^{\prime}$ satisfying the same assumptions such that the matrix $H_{t}^{\prime}:=e^{-t/2}H_{0}^{\prime}+(1-e^{-t})^{1/2}G$ has the same first three moments as $H$ and the difference between the fourth moments (in the sense of (32)) is $O(t)$. This is guaranteed by [EYY11, Lemma 3.4].

Lemma 4.1 and Proposition 4.1 yield

\mathbb{E}\left[F(\operatorname{Tr}f_{2}(H_{t}^{\prime}))\right]-N^{C\varepsilon}\left(\frac{1}{\rho N^{2}}+\frac{\left(\rho N^{a}\right)^{5}}{\sqrt{N}}+t\rho N^{a}\right)\leqslant\mathbb{P}\left(\sigma_{1}(H)>rN^{-1}\right)\leqslant\mathbb{E}\left[F(\operatorname{Tr}f_{1}(H_{t}^{\prime}))\right]+N^{C\varepsilon}\left(\frac{1}{\rho N^{2}}+\frac{\left(\rho N^{a}\right)^{5}}{\sqrt{N}}+t\rho N^{a}\right). (34)

Using Lemma 4.1 for $H_{t}^{\prime}$ with $f_{1}$ and $f_{2}$ shifted by $\pm\rho$, we have

\mathbb{P}\left(\sigma_{1}(H_{t}^{\prime})>\frac{r}{N}+\rho\right)-N^{C\varepsilon}\left(\frac{1}{\rho N^{2}}+\frac{\left(\rho N^{a}\right)^{5}}{\sqrt{N}}+t\rho N^{a}\right)\leqslant\mathbb{P}\left(\sigma_{1}(H)>\frac{r}{N}\right)\leqslant\mathbb{P}\left(\sigma_{1}(H_{t}^{\prime})>\frac{r}{N}-\rho\right)+N^{C\varepsilon}\left(\frac{1}{\rho N^{2}}+\frac{\left(\rho N^{a}\right)^{5}}{\sqrt{N}}+t\rho N^{a}\right). (35)

Using the smoothed analysis (Theorem 6), we have

\mathbb{P}\left(\sigma_{1}(G)>\frac{r}{N}+\rho+\frac{1}{N^{2}t}\right)-N^{C\varepsilon}\left(\frac{1}{\rho N^{2}}+\frac{\left(\rho N^{a}\right)^{5}}{\sqrt{N}}+t\rho N^{a}\right)\leqslant\mathbb{P}\left(\sigma_{1}(H)>\frac{r}{N}\right)\leqslant\mathbb{P}\left(\sigma_{1}(G)>\frac{r}{N}-\rho-\frac{1}{N^{2}t}\right)+N^{C\varepsilon}\left(\frac{1}{\rho N^{2}}+\frac{\left(\rho N^{a}\right)^{5}}{\sqrt{N}}+t\rho N^{a}\right). (36)

Taking $\rho=N^{-a}$ and $t=N^{a-2}$, and setting $a=1+\delta$, we obtain the optimal bounds

\mathbb{P}\left(N\sigma_{1}(G)>r+N^{-\delta}\right)-N^{C\varepsilon}\left(N^{-1+\delta}\vee N^{-\frac{1}{2}}\right)\leqslant\mathbb{P}\left(N\sigma_{1}(H)>r\right)\leqslant\mathbb{P}\left(N\sigma_{1}(G)>r-N^{-\delta}\right)+N^{C\varepsilon}\left(N^{-1+\delta}\vee N^{-\frac{1}{2}}\right). (37)

Hence, thanks to the arbitrariness of $\varepsilon$, we have proved Theorem 2.

Finally, for the complex case, using the exact formula for the distribution of $\sigma_{1}(G_{\mathbb{C}})$, we obtain a rate of convergence to the limiting law. Recall that

\mathbb{P}(N\sigma_{1}(G_{\mathbb{C}})\leqslant r)=\int_{0}^{r}e^{-x}\,\mathrm{d}x=1-e^{-r}.
Proof of Corollary 1.

For the complex case, the previous arguments are still valid. Therefore, we still have (37). Since Nσ1(G)N\sigma_{1}(G_{\mathbb{C}}) has a bounded density, we have

\mathbb{P}\left(N\sigma_{1}(G_{\mathbb{C}})\leqslant r\right)-N^{\varepsilon}\left(N^{-\delta}+(N^{-1+\delta}\vee N^{-1/2})\right)\leqslant\mathbb{P}\left(N\sigma_{1}(H_{\mathbb{C}})\leqslant r\right)\\ \leqslant\mathbb{P}\left(N\sigma_{1}(G_{\mathbb{C}})\leqslant r\right)+N^{\varepsilon}\left(N^{-\delta}+(N^{-1+\delta}\vee N^{-1/2})\right). (38)

Choosing δ=1/2\delta=1/2, we obtain the optimal estimate

\mathbb{P}\left(N\sigma_{1}(H_{\mathbb{C}})\leqslant r\right)=1-e^{-r}+O\left(N^{-\frac{1}{2}+\varepsilon}\right),

which proves the desired result. ∎

5 Condition Number

5.1 Smoothed analysis

Note that the condition number \kappa(H) is scaling invariant in the sense that \kappa(aH)=\kappa(H) for any a>0. Therefore, in the smoothed analysis, it suffices to consider H_{t}=e^{-t/2}H+(1-e^{-t})^{1/2}G, whose singular values satisfy the stochastic differential equation (10).

Recall that in Theorem 6 we have shown that

|σ1(Ht)σ1(G)|1N2t.|\sigma_{1}(H_{t})-\sigma_{1}(G)|\prec\frac{1}{N^{2}t}.

Note that in Lemma 3.6, using the same arguments as in Proposition 3.2, we can derive that

|σN(Ht)σN(G)|1Nt.|\sigma_{N}(H_{t})-\sigma_{N}(G)|\prec\frac{1}{Nt}.

Then for any large D>0 and small \varepsilon_{1},\varepsilon_{2}>0, there exists N_{0}(\varepsilon_{1},\varepsilon_{2},D) such that for all N\geqslant N_{0} the following holds with probability at least 1-N^{-D}.

σ1(G)Nε1N2tσ1(Ht)σ1(G)+Nε1N2t,σN(G)Nε2NtσN(Ht)σN(G)+Nε2Nt.\sigma_{1}(G)-\frac{N^{\varepsilon_{1}}}{N^{2}t}\leqslant\sigma_{1}(H_{t})\leqslant\sigma_{1}(G)+\frac{N^{\varepsilon_{1}}}{N^{2}t},\ \ \ \sigma_{N}(G)-\frac{N^{\varepsilon_{2}}}{Nt}\leqslant\sigma_{N}(H_{t})\leqslant\sigma_{N}(G)+\frac{N^{\varepsilon_{2}}}{Nt}.

Without loss of generality, we assume that ε1ε2ε\varepsilon_{1}\sim\varepsilon_{2}\sim\varepsilon for some ε>0\varepsilon>0 that can be arbitrarily small. Then we have

κ(Ht)N=σN(Ht)Nσ1(Ht)σN(G)+Nε2NtNσ1(G)Nε1Ntκ(G)N+NCεNt,\frac{\kappa(H_{t})}{N}=\frac{\sigma_{N}(H_{t})}{N\sigma_{1}(H_{t})}\leqslant\frac{\sigma_{N}(G)+\frac{N^{\varepsilon_{2}}}{Nt}}{N\sigma_{1}(G)-\frac{N^{\varepsilon_{1}}}{Nt}}\leqslant\frac{\kappa(G)}{N}+\frac{N^{C\varepsilon}}{Nt},

where in the last inequality we use the fact that N\sigma_{1}(G) is of order constant with overwhelming probability. By the arbitrariness of \varepsilon>0, we can relabel the parameter and obtain

κ(Ht)κ(G)+Nεt.\kappa(H_{t})\leqslant\kappa(G)+\frac{N^{\varepsilon}}{t}.

Similarly, we can also prove a lower bound and conclude that

|κ(Ht)κ(G)|1t.|\kappa(H_{t})-\kappa(G)|\prec\frac{1}{t}.

For the matrix H+λGH+\lambda G, we can write the matrix as

H+λG=1+λ2(11+λ2H+λ1+λ2G)=1+λ2Hlog(1+λ2).H+\lambda G=\sqrt{1+\lambda^{2}}\left(\frac{1}{\sqrt{1+\lambda^{2}}}H+\frac{\lambda}{\sqrt{1+\lambda^{2}}}G\right)=\sqrt{1+\lambda^{2}}H_{\log(1+\lambda^{2})}.

Therefore we have that κ(H+λG)=κ(Hlog(1+λ2))\kappa(H+\lambda G)=\kappa(H_{\log(1+\lambda^{2})}). As a consequence, we obtain that

|κ(H+λG)κ(G)|1log(1+λ2),|\kappa(H+\lambda G)-\kappa(G)|\prec\frac{1}{\log(1+\lambda^{2})},

which completes the proof of Theorem 3.
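The time change used in the last step can be checked directly. Below is a minimal sketch (our own illustration, with ad hoc N and \lambda) verifying the algebraic identity behind H+\lambda G=\sqrt{1+\lambda^{2}}H_{\log(1+\lambda^{2})} and the scaling invariance of \kappa.

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 0.7
t = np.log(1 + lam**2)

# Coefficients of H_t at t = log(1 + lambda^2) match the decomposition above.
assert np.isclose(np.exp(-t / 2), 1 / np.sqrt(1 + lam**2))
assert np.isclose(np.sqrt(1 - np.exp(-t)), lam / np.sqrt(1 + lam**2))

# Scaling invariance kappa(aH) = kappa(H), with a = 1/sqrt(1 + lambda^2).
N = 50
H = rng.standard_normal((N, N)) / np.sqrt(N)
G = rng.standard_normal((N, N)) / np.sqrt(N)
s = np.linalg.svd(H + lam * G, compute_uv=False)
s_resc = np.linalg.svd((H + lam * G) / np.sqrt(1 + lam**2), compute_uv=False)
print(np.isclose(s[0] / s[-1], s_resc[0] / s_resc[-1]))  # True
```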

5.2 Quantitative universality

In this section, we prove Theorem 4, which establishes the quantitative universality for the condition number. We will use the universality of both the smallest and the largest singular values. In particular, since the smallest singular value is typically of order O(N^{-1}), a quantitative control is necessary. The largest singular value is of order constant, and therefore it is easier to deal with. Specifically, for the largest singular value, due to the Tracy-Widom limit for the largest eigenvalue of the sample covariance matrix, we know that |\sigma_{N}(H)-2|\leqslant N^{-2/3+\varepsilon} for any \varepsilon>0 with overwhelming probability.
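As a quick numerical sanity check of the last point (our own illustration, with arbitrary sizes and the normalization of iid entries with variance 1/N), the largest singular value indeed concentrates around 2 at the N^{-2/3} scale:

```python
import numpy as np

rng = np.random.default_rng(2)
for N in [100, 400, 1600]:
    H = rng.standard_normal((N, N)) / np.sqrt(N)   # iid entries of variance 1/N
    sN = np.linalg.svd(H, compute_uv=False)[0]     # largest singular value
    print(f"N = {N:4d}:  sigma_N = {sN:.4f},  N^(2/3)|sigma_N - 2| = {N**(2/3) * abs(sN - 2):.2f}")
```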

Let ε>0\varepsilon>0 be an arbitrarily small constant. For any large D>0D>0 and sufficiently large NN, we have

\mathbb{P}\left(\frac{\kappa(H)}{N}>r\right)=\mathbb{P}\left(N\sigma_{1}(H)<r^{-1}\sigma_{N}(H)\right)\leqslant\mathbb{P}\left(N\sigma_{1}(H)<r^{-1}(2+N^{-\frac{2}{3}+\varepsilon})\right)+N^{-D}.

Using Theorem 2, we obtain

(N1κ(H)>r)\displaystyle\mathbb{P}\left(N^{-1}\kappa(H)>r\right) (rNσ1(G)<2)+N13ε+ND\displaystyle\leqslant\mathbb{P}\left(rN\sigma_{1}(G)<2\right)+N^{-\frac{1}{3}-\varepsilon}+N^{-D}
(rNσ1(G)<σN(G)+N23+ε2)+N13ε+ND\displaystyle\leqslant\mathbb{P}\left(rN\sigma_{1}(G)<\sigma_{N}(G)+N^{-\frac{2}{3}+\frac{\varepsilon}{2}}\right)+N^{-\frac{1}{3}-\varepsilon}+N^{-D}
(κ(G)N>rNε/2N2/31Nσ1(G))+N13ε+ND\displaystyle\leqslant\mathbb{P}\left(\frac{\kappa(G)}{N}>r-\frac{N^{\varepsilon/2}}{N^{2/3}}\frac{1}{N\sigma_{1}(G)}\right)+N^{-\frac{1}{3}-\varepsilon}+N^{-D}
(κ(G)N>rN23+ε)+N13ε+ND\displaystyle\leqslant\mathbb{P}\left(\frac{\kappa(G)}{N}>r-N^{-\frac{2}{3}+\varepsilon}\right)+N^{-\frac{1}{3}-\varepsilon}+N^{-D}

where the third inequality follows from the fact that N\sigma_{1}(G) is of order constant with overwhelming probability. Taking D large enough, we have

(κ(H)N>r)(κ(G)N>rN23+ε)+O(N13ε).\mathbb{P}\left(\frac{\kappa(H)}{N}>r\right)\leqslant\mathbb{P}\left(\frac{\kappa(G)}{N}>r-N^{-\frac{2}{3}+\varepsilon}\right)+O\left(N^{-\frac{1}{3}-\varepsilon}\right).

This yields the upper bound in (7), and the lower bound can be proved similarly. Hence, we have proved Theorem 4.

6 Beyond Strictly Square Matrices

As mentioned in the Introduction, the optimal local law for an M\times N random matrix with general \lim N/M=1 is a notoriously hard problem. In particular, the optimal rigidity estimates in Lemma 3.1 are unknown unless we restrict to M\equiv N. In this section, we discuss a slight extension of the strictly square case. We show that in the regime M=N+O(N^{o(1)}), all of our theorems still hold.

This claim is based on the following important observation. All proofs in this paper rely only on the local law (as well as its consequences), and therefore it suffices to show that such a local law is still valid for a general M\times N matrix. More specifically, the main task is to show that the optimal local semicircle law (16) still holds for the Girko symmetrization of an M\times N random matrix. Modulo the optimal local law, the optimal rigidity is still valid as a by-product, via standard approaches in random matrix theory.

For an M\times N matrix H, we consider the augmented matrix H_{\mathsf{A}}, an M\times M matrix obtained by adding M-N columns to H with independent entries satisfying (1) and (2). Without loss of generality, we may assume that the added columns are the first M-N columns of H_{\mathsf{A}}. Since H_{\mathsf{A}} is a square matrix, the local semicircle law (see [AEK14, Theorem 1.1]) is still true. Specifically, for any fixed \omega>0, define the spectral domain

𝐒=𝐒ω:={z=x+iy:|x|ω1,M1+ωyω1}.\mathbf{S}=\mathbf{S}_{\omega}:=\left\{z=x+\text{\rm i}y:|x|\leqslant\omega^{-1},M^{-1+\omega}\leqslant y\leqslant\omega^{-1}\right\}.

Then for any z=x+iy𝐒z=x+\text{\rm i}y\in\mathbf{S}, define the resolvent G𝖠(z):=(H~𝖠z)1G^{\mathsf{A}}(z):=({\widetilde{H}}_{\mathsf{A}}-z)^{-1} and we have

|12MTrG𝖠(z)m𝗌𝖼(z)|1My,\left|{\frac{1}{2M}\operatorname{Tr}G^{\mathsf{A}}(z)-m_{\mathsf{sc}}(z)}\right|\prec\frac{1}{My},

and

maxi,j|Gij𝖠(z)δijm𝗌𝖼(z)|(1My+Immsc(z)My).\max_{i,j}\left|{G_{ij}^{\mathsf{A}}(z)-\delta_{ij}m_{\mathsf{sc}}(z)}\right|\prec\left(\frac{1}{My}+\sqrt{\frac{{\mathrm{Im}\,}m_{sc}(z)}{My}}\right).

For an n\times n matrix X and an integer 1\leqslant k\leqslant n, we define X^{(k)} as the (n-k)\times(n-k) matrix

X(k)=(Xij)k+1i,jn.X^{(k)}=(X_{ij})_{k+1\leqslant i,j\leqslant n}.

Recall the Girko symmetrization H~{\widetilde{H}} of a matrix HH. Then we have H~=H~𝖠(MN){\widetilde{H}}={\widetilde{H}}_{\mathsf{A}}^{(M-N)}.
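The augmentation and the minor relation can be made concrete in a few lines. The sketch below is our own illustration: it assumes the block convention \widetilde{X}=\begin{pmatrix}0&X^{*}\\ X&0\end{pmatrix} with the column indices listed first (the paper's Girko symmetrization is defined earlier in the text), and it uses ad hoc dimensions.

```python
import numpy as np

rng = np.random.default_rng(3)
M, N = 12, 9

def girko(X):
    # assumed convention: column indices first, so the result is (n+m) x (n+m)
    m, n = X.shape
    return np.block([[np.zeros((n, n)), X.conj().T],
                     [X, np.zeros((m, m))]])

H = rng.standard_normal((M, N)) / np.sqrt(N)
E = rng.standard_normal((M, M - N)) / np.sqrt(N)   # the added independent columns
H_A = np.hstack([E, H])                            # added columns come first

k = M - N
minor = girko(H_A)[k:, k:]           # remove the first M - N rows and columns
print(np.allclose(minor, girko(H)))  # True: tilde(H) = tilde(H_A)^{(M-N)}
```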

For z=x+iyz=x+\text{\rm i}y with y>0y>0, let G(k)(z):=(H~𝖠(k)z)1G^{(k)}(z):=({\widetilde{H}}_{\mathsf{A}}^{(k)}-z)^{-1}. In particular we have G(0)(z)=G𝖠(z)G^{(0)}(z)=G^{\mathsf{A}}(z) and G(MN)(z)=G(z):=(H~z)1G^{(M-N)}(z)=G(z):=({\widetilde{H}}-z)^{-1}. Then we have the following resolvent identity (see e.g. [BGK17, Lemma 3.5])

Gij𝖠(z)=Gij(1)(z)+Gi1𝖠(z)G1j𝖠(z)G11𝖠(z),for all 2i,j2M.G_{ij}^{\mathsf{A}}(z)=G_{ij}^{(1)}(z)+\frac{G_{i1}^{\mathsf{A}}(z)G_{1j}^{\mathsf{A}}(z)}{G_{11}^{\mathsf{A}}(z)},\ \ \mbox{for all }2\leqslant i,j\leqslant 2M.
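This identity is elementary linear algebra (a Schur complement computation) and is easy to confirm numerically. The following minimal check is our own illustration, with a generic Hermitian matrix standing in for the symmetrization and 0-based indices.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 8
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
W = (A + A.conj().T) / 2                    # generic Hermitian test matrix
z = 0.3 + 0.1j

G = np.linalg.inv(W - z * np.eye(n))                 # full resolvent
G1 = np.linalg.inv(W[1:, 1:] - z * np.eye(n - 1))    # resolvent of the minor

# G_ij = G^{(1)}_ij + G_i1 G_1j / G_11 for i, j away from the removed index
rhs = G1 + np.outer(G[1:, 0], G[0, 1:]) / G[0, 0]
print(np.allclose(G[1:, 1:], rhs))  # True
```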

From the local semicircle law for the square matrix H𝖠H_{\mathsf{A}}, with overwhelming probability we have

|Gi,1𝖠(z)|,|G1,j𝖠(z)|Mε(1My+Immsc(z)My),and |G1,1𝖠(z)|1.|G_{i,1}^{\mathsf{A}}(z)|,|G_{1,j}^{\mathsf{A}}(z)|\leqslant M^{\varepsilon}\left(\frac{1}{My}+\sqrt{\frac{{\mathrm{Im}\,}m_{sc}(z)}{My}}\right),\ \ \mbox{and }\ |G_{1,1}^{\mathsf{A}}(z)|\sim 1.

Using the local law for G_{ij}^{\mathsf{A}} again, this implies that

|Gij(1)(z)Gij𝖠(z)|M2εMy.|G_{ij}^{(1)}(z)-G_{ij}^{\mathsf{A}}(z)|\lesssim\frac{M^{2\varepsilon}}{My}.

More generally, we have

Gij(k)(z)=Gij(k+1)(z)+Gi,k+1(k)(z)Gk+1,j(k)(z)Gk+1,k+1(k)(z),for all k+2i,j2M.G_{ij}^{(k)}(z)=G_{ij}^{(k+1)}(z)+\frac{G_{i,k+1}^{(k)}(z)G_{k+1,j}^{(k)}(z)}{G_{k+1,k+1}^{(k)}(z)},\ \ \mbox{for all }k+2\leqslant i,j\leqslant 2M.

By the same arguments as above, for any fixed kk, we can derive that

|Gij(k+1)(z)Gij(k)(z)|M2εMy.|G_{ij}^{(k+1)}(z)-G_{ij}^{(k)}(z)|\lesssim\frac{M^{2\varepsilon}}{My}.

By a telescoping summation, we derive

|Gij(z)δijm𝗌𝖼(z)|\displaystyle|G_{ij}(z)-\delta_{ij}m_{\mathsf{sc}}(z)| k=0MN1|Gij(k+1)(z)Gij(k)(z)|+|Gij𝖠(z)δijm𝗌𝖼(z)|\displaystyle\leqslant\sum_{k=0}^{M-N-1}|G_{ij}^{(k+1)}(z)-G_{ij}^{(k)}(z)|+|G_{ij}^{\mathsf{A}}(z)-\delta_{ij}m_{\mathsf{sc}}(z)|
Mε(1My+Immsc(z)My)+(MN)M2εMy\displaystyle\leqslant M^{\varepsilon}\left(\frac{1}{My}+\sqrt{\frac{{\mathrm{Im}\,}m_{sc}(z)}{My}}\right)+(M-N)\frac{M^{2\varepsilon}}{My}

Since M-N=N^{o(1)}, the second term can be absorbed into the first term with a larger factor N^{3\varepsilon}. Thanks to the arbitrariness of \varepsilon, we have

|Gij(z)δijm𝗌𝖼(z)|(1Ny+Immsc(z)Ny).|G_{ij}(z)-\delta_{ij}m_{\mathsf{sc}}(z)|\prec\left(\frac{1}{Ny}+\sqrt{\frac{{\mathrm{Im}\,}m_{sc}(z)}{Ny}}\right). (39)

Moreover, the resolvent identity also yields

i=k+22MGii(k)(z)=TrG(k+1)(z)+1Gk+1,k+1(k)(z)i=k+22MGi,k+1(k)(z)Gk+1,i(k)(z).\sum_{i=k+2}^{2M}G_{ii}^{(k)}(z)=\operatorname{Tr}G^{(k+1)}(z)+\frac{1}{G_{k+1,k+1}^{(k)}(z)}\sum_{i=k+2}^{2M}G_{i,k+1}^{(k)}(z)G_{k+1,i}^{(k)}(z).

This yields

TrG(k)(z)TrG(k+1)(z)=1Gk+1,k+1(k)(z)i=k+12MGi,k+1(k)(z)Gk+1,i(k)(z).\operatorname{Tr}G^{(k)}(z)-\operatorname{Tr}G^{(k+1)}(z)=\frac{1}{G_{k+1,k+1}^{(k)}(z)}\sum_{i=k+1}^{2M}G_{i,k+1}^{(k)}(z)G_{k+1,i}^{(k)}(z).

Using the Ward identity, this implies that

|TrG(k)(z)TrG(k+1)(z)|\displaystyle\left|{\operatorname{Tr}G^{(k)}(z)-\operatorname{Tr}G^{(k+1)}(z)}\right| 1|Gk+1,k+1(k)(z)|i=k+12M|Gi,k+1(k)(z)|2\displaystyle\leqslant\frac{1}{|G_{k+1,k+1}^{(k)}(z)|}\sum_{i=k+1}^{2M}\left|{G_{i,k+1}^{(k)}(z)}\right|^{2}
1|Gk+1,k+1(k)(z)|ImGk+1,k+1(k)(z)y1y.\displaystyle\leqslant\frac{1}{|G_{k+1,k+1}^{(k)}(z)|}\frac{{\mathrm{Im}\,}G_{k+1,k+1}^{(k)}(z)}{y}\leqslant\frac{1}{y}.
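The Ward identity invoked in the second step, \sum_{i}|G_{ik}(z)|^{2}={\mathrm{Im}\,}G_{kk}(z)/y for a Hermitian matrix and z=x+\text{\rm i}y, can be verified directly; the snippet below is an illustrative check with arbitrary parameters.

```python
import numpy as np

rng = np.random.default_rng(5)
n, y, k = 10, 0.2, 3
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
W = (A + A.conj().T) / 2                              # Hermitian test matrix
G = np.linalg.inv(W - (0.5 + 1j * y) * np.eye(n))

# Ward identity: sum_i |G_ik|^2 = Im G_kk / y
print(np.allclose(np.sum(np.abs(G[:, k])**2), G[k, k].imag / y))  # True
```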

From the local law for G𝖠G^{{\mathsf{A}}}, with overwhelming probability we have

\left|{\frac{1}{2M}\operatorname{Tr}G^{{\mathsf{A}}}(z)-m_{\mathsf{sc}}(z)}\right|\leqslant\frac{M^{\varepsilon}}{My}.

Again, using MN=No(1)M-N=N^{o(1)}, the telescoping sum yields

|1M+NTrG(z)m𝗌𝖼(z)|NεNy+No(1)Ny.\left|{\frac{1}{M+N}\operatorname{Tr}G(z)-m_{\mathsf{sc}}(z)}\right|\leqslant\frac{N^{\varepsilon}}{Ny}+\frac{N^{o(1)}}{Ny}.

The second term can be absorbed into the first term for any fixed \varepsilon>0. By the arbitrariness of \varepsilon, we conclude that

|1M+NTrG(z)m𝗌𝖼(z)|1Ny.\left|{\frac{1}{M+N}\operatorname{Tr}G(z)-m_{\mathsf{sc}}(z)}\right|\prec\frac{1}{Ny}. (40)

Hence, (39) and (40) show that the local law also holds for H. The rigidity estimates in Lemma 3.1 also follow from a classical argument in random matrix theory (see e.g. [BGK17]). As a consequence, Theorem 1, Theorem 2, Theorem 3 and Theorem 4 all remain valid.

Acknowledgment

The author thanks Paul Bourgade for suggesting this problem and for insightful comments on an early version of this manuscript.

Appendix A Auxiliary Results

To make this paper self-contained, we collect some well-known results that are used in the paper.

The first result is about controlling the size of a martingale, which is used in Proposition 3.1 to bound the martingale term in the stochastic dynamics (42), and also in Lemma B.1 to bound (45). This is from [SW09, Appendix B.6, Equation (18)].

Lemma A.1.

For any continuous martingale MM and any λ,μ>0\lambda,\mu>0, we have

(sup0ut|Mu|λ,Mtμ)2eλ22μ.\mathbb{P}\left(\sup_{0\leqslant u\leqslant t}|M_{u}|\geqslant\lambda,\langle M\rangle_{t}\leqslant\mu\right)\leqslant 2e^{-\frac{\lambda^{2}}{2\mu}}.

The second result is the Helffer-Sjöstrand formula, which is a classical result in functional calculus. This formula is used in Proposition 4.1 to compute the trace of functions via the Stieltjes transform. We are using the version in [EY17, Section 11.2].

Lemma A.2 (Helffer-Sjöstrand formula).

Let fC1()f\in C^{1}(\mathbb{R}) with compact support and let χ(y)\chi(y) be a smooth cutoff function with support in [1,1][-1,1], with χ(y)=1\chi(y)=1 for |y|12|y|\leqslant\frac{1}{2} and with bounded derivatives. Then

f(\lambda)=\frac{1}{2\pi}\int_{\mathbb{R}^{2}}\frac{\text{\rm i}yf^{\prime\prime}(x)\chi(y)+\text{\rm i}(f(x)+\text{\rm i}yf^{\prime}(x))\chi^{\prime}(y)}{\lambda-x-\text{\rm i}y}\mathrm{d}x\mathrm{d}y.
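The formula can be tested numerically for a concrete pair (f,\chi). The sketch below is our own check, with an ad hoc smooth bump f, a standard smooth cutoff \chi, finite-difference derivatives, and a guard at the removable point of the integrand; it recovers f(\lambda) up to quadrature error.

```python
import numpy as np
from scipy.integrate import dblquad

def bump(x):
    # smooth function compactly supported on (-1, 1)
    return np.where(np.abs(x) < 1, np.exp(-1.0 / np.maximum(1 - x**2, 1e-300)), 0.0)

def smooth_step(u):
    g = lambda s: np.exp(-1.0 / s) if s > 0 else 0.0
    return g(u) / (g(u) + g(1 - u))

def chi(y):
    # = 1 for |y| <= 1/2, = 0 for |y| >= 1, smooth in between
    return smooth_step(2 * (1 - abs(y)))

h = 1e-4
fp = lambda x: (bump(x + h) - bump(x - h)) / (2 * h)
fpp = lambda x: (bump(x + h) - 2 * bump(x) + bump(x - h)) / h**2
chip = lambda y: (chi(y + h) - chi(y - h)) / (2 * h)

lam = 0.3
def integrand(y, x):
    den = lam - x - 1j * y
    if den == 0:                      # removable point of the integrand
        return 0.0
    num = 1j * y * fpp(x) * chi(y) + 1j * (bump(x) + 1j * y * fp(x)) * chip(y)
    return (num / den).real           # imaginary parts cancel by y -> -y symmetry

# x and y ranges cover the supports of f and chi
val, _ = dblquad(integrand, -1.5, 1.5, lambda x: -1.5, lambda x: 1.5)
print(val / (2 * np.pi), float(bump(lam)))  # approximately equal
```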

We also have the following resolvent expansion identity. This is a well-known result in linear algebra, and it is used in Proposition 4.1 to compare the resolvents of two matrices.

Lemma A.3 (Resolvent expansion).

For any two matrices AA and BB, we have

(A+B)1=A1(A+B)1BA1(A+B)^{-1}=A^{-1}-(A+B)^{-1}BA^{-1}

provided that all the matrix inverses exist.
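A short numerical confirmation (illustrative; generic matrices are used, so all the inverses exist almost surely):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 6
A = rng.standard_normal((n, n)) + 3 * np.eye(n)   # shifted to keep A well-conditioned
B = 0.1 * rng.standard_normal((n, n))

lhs = np.linalg.inv(A + B)
rhs = np.linalg.inv(A) - np.linalg.inv(A + B) @ B @ np.linalg.inv(A)
print(np.allclose(lhs, rhs))  # True
```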

Finally, we have some estimates for the Stieltjes transform of the semicircle law. For z=E+iηz=E+\text{\rm i}\eta with η>0\eta>0, recall that m𝗌𝖼(z)m_{\mathsf{sc}}(z) denotes the Stieltjes transform of the semicircle distribution. The following estimates are well known in random matrix theory (see e.g. [EY17, Lemma 6.2]).

Lemma A.4.

We have for all z=E+\text{\rm i}\eta with \eta>0 that

|m𝗌𝖼(z)|=|m𝗌𝖼(z)+z|11.|m_{\mathsf{sc}}(z)|=|m_{\mathsf{sc}}(z)+z|^{-1}\leqslant 1.

Furthermore, there is a constant c>0c>0 such that for E[10,10]E\in[-10,10] and η(0,10]\eta\in(0,10] we have

c|m𝗌𝖼(z)|1cη,|1m𝗌𝖼2(z)|ξ(E)+η,c\leqslant|m_{\mathsf{sc}}(z)|\leqslant 1-c\eta,\ \ |1-m_{\mathsf{sc}}^{2}(z)|\sim\sqrt{\xi(E)+\eta},

as well as

Imm𝗌𝖼(z){ξ(E)+ηif |E|2,ηξ(E)+ηif |E|2,{\mathrm{Im}\,}m_{\mathsf{sc}}(z)\sim\begin{cases}\sqrt{\xi(E)+\eta}&\mbox{if }|E|\leqslant 2,\\ \frac{\eta}{\sqrt{\xi(E)+\eta}}&\mbox{if }|E|\geqslant 2,\end{cases}

where ξ(E):=||E|2|\xi(E):=||E|-2| is the distance of EE to the spectral edge.
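These bounds are easy to inspect numerically from the closed form m_{\mathsf{sc}}(z)=(-z+\sqrt{z^{2}-4})/2, taking the square-root branch with {\mathrm{Im}\,}m_{\mathsf{sc}}(z)>0 for {\mathrm{Im}\,}z>0. The following sketch (our parameter choices) checks the self-consistent equation m_{\mathsf{sc}}(z)=-1/(m_{\mathsf{sc}}(z)+z) and the stated orders of magnitude.

```python
import numpy as np

def m_sc(z):
    # Stieltjes transform of the semicircle law; pick the branch with Im m > 0
    m = (-z + np.sqrt(z * z - 4)) / 2
    return m if m.imag > 0 else (-z - np.sqrt(z * z - 4)) / 2

for E, eta in [(0.0, 0.1), (1.5, 0.01), (2.5, 0.01)]:   # bulk, near-edge, outside
    z = E + 1j * eta
    m = m_sc(z)
    xi = abs(abs(E) - 2)
    print(f"E = {E}, eta = {eta}:  |m + 1/(m+z)| = {abs(m + 1/(m + z)):.1e},"
          f"  |m| = {abs(m):.3f},  Im m = {m.imag:.4f},  sqrt(xi+eta) = {np.sqrt(xi + eta):.3f}")
```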

Appendix B Proofs for Smoothed Analysis

B.1 Proof of Proposition 3.1

The proof is essentially the same as [Wan19, Proposition 3.8], and we briefly describe the key steps here for completeness. For any 1,mN101\leqslant\ell,m\leqslant N^{10}, we define t=N10t_{\ell}=\ell N^{-10} and z(m)=Em+iηm=Em+iN1+4εξ(Em)1/2z^{(m)}=E_{m}+\text{\rm i}\eta_{m}=E_{m}+\text{\rm i}N^{-1+4\varepsilon}\xi(E_{m})^{-1/2}, where Emdρ𝗌𝖼=mN10\int_{-\infty}^{E_{m}}\mathrm{d}\rho_{\mathsf{sc}}=mN^{-10}. Consider the stopping times

τ0\displaystyle\tau_{0} =inf{0t1:NkNs.t.|sk(t)γk|>N23+ε(N+1|k|)13},\displaystyle=\inf\left\{0\leqslant t\leqslant 1:\exists-N\leqslant k\leqslant N\ s.t.\ |s_{k}(t)-\gamma_{k}|>N^{-\frac{2}{3}+\varepsilon}(N+1-|k|)^{-\frac{1}{3}}\right\},
τ,m\displaystyle\tau_{\ell,m} =inf{0st:Im𝔖~s(zts(m))>N2ε2ξ(Em)1/2(ξ(Em)1/2t)},\displaystyle=\inf\left\{0\leqslant s\leqslant t_{\ell}:{\mathrm{Im}\,}\widetilde{{\mathfrak{S}}}_{s}\left(z_{t_{\ell}-s}^{(m)}\right)>\dfrac{N^{2\varepsilon}}{2}\dfrac{\xi(E_{m})^{1/2}}{\left(\xi(E_{m})^{1/2}\vee t_{\ell}\right)}\right\},
τ\displaystyle\tau =min{τ0,τ,m:0,mN10,ξ(Em)>N23+4ε},\displaystyle=\min\left\{\tau_{0},\tau_{\ell,m}:0\leqslant\ell,m\leqslant N^{10},\xi(E_{m})>N^{-\frac{2}{3}+4\varepsilon}\right\},

with the convention inf=\inf\emptyset=\infty. We claim that it suffices to show that τ=\tau=\infty with overwhelming probability.

To prove this claim, for any z𝒮εz\in\mathscr{S}_{\varepsilon} and 0t10\leqslant t\leqslant 1, we pick ttt+1t_{\ell}\leqslant t\leqslant t_{\ell+1} and |zz(m)|<N5|z-z^{(m)}|<N^{-5}. Note that the maximum principle Lemma 3.2 implies |ψk(t)|1|\psi_{k}(t)|\lesssim 1 for all kk and tt. Then we have |𝔖~t(z)𝔖~t(z(m))|N2|\widetilde{{\mathfrak{S}}}_{t}(z)-\widetilde{{\mathfrak{S}}}_{t}(z^{(m)})|\lesssim N^{-2}. Also, note that for z=E+iηz=E+i\eta we have |St(z)|η1|S_{t}(z)|\leqslant\eta^{-1}, and

|z𝔖~t(z)|Nmaxk|ψk(0)|η2Nη2,|zz𝔖~t(z)|Nη3.\left|{\partial_{z}\widetilde{{\mathfrak{S}}}_{t}(z)}\right|\lesssim N\max_{k}|\psi_{k}(0)|\eta^{-2}\lesssim N\eta^{-2},\ \ \left|{\partial_{zz}\widetilde{{\mathfrak{S}}}_{t}(z)}\right|\lesssim N\eta^{-3}.

Consider the events

,m,k:={suptut+1|tuev2ψk(v)(sk(v)z(m))2dBk(v)|<N4}.{\mathcal{E}}_{\ell,m,k}:=\left\{\sup_{t_{\ell}\leqslant u\leqslant t_{\ell+1}}\left|{\int_{t_{\ell}}^{u}\frac{e^{-\frac{v}{2}}\psi_{k}(v)}{(s_{k}(v)-z^{(m)})^{2}}\mathrm{d}B_{k}(v)}\right|<N^{-4}\right\}.

On the event k,m,k\bigcap_{k}{\mathcal{E}}_{\ell,m,k}, the above estimates imply that |𝔖~t(z(m))𝔖~t(z(m))|<N2|\widetilde{{\mathfrak{S}}}_{t}(z^{(m)})-\widetilde{{\mathfrak{S}}}_{t_{\ell}}(z^{(m)})|<N^{-2}. It further shows that

|𝔖~t(z)𝔖~t(z(m))|<N2.\left|{\widetilde{{\mathfrak{S}}}_{t}(z)-\widetilde{{\mathfrak{S}}}_{t_{\ell}}(z^{(m)})}\right|<N^{-2}.

Since this holds for all zz and tt, we have shown that

{τ=}(1,mN10,NkN,m,k)z𝒮ε,0t1{Im𝔖~t(z)N2εξ(E)1/2(ξ(E)1/2t)}.\left\{\tau=\infty\right\}\bigcap\left(\bigcap_{1\leqslant\ell,m\leqslant N^{10},-N\leqslant k\leqslant N}{\mathcal{E}}_{\ell,m,k}\right)\subset\bigcap_{z\in\mathscr{S}_{\varepsilon},0\leqslant t\leqslant 1}\left\{{\mathrm{Im}\,}\widetilde{{\mathfrak{S}}}_{t}(z)\leqslant N^{2\varepsilon}\frac{\xi(E)^{1/2}}{\left(\xi(E)^{1/2}\vee t\right)}\right\}. (41)

Moreover, note that

tt+1ev2ψk(v)(sk(v)z(m))2dBk(v)t+1tt+1ev|ψk(v)|2(sk(v)z(m))4dvN10(N1+4ε)4(maxk|ψk(0)|)2N6+16ε.\left\langle\int_{t_{\ell}}^{t_{\ell+1}}\frac{e^{-\frac{v}{2}}\psi_{k}(v)}{(s_{k}(v)-z^{(m)})^{2}}\mathrm{d}B_{k}(v)\right\rangle_{t_{\ell+1}}\\ \leqslant\int_{t_{\ell}}^{t_{\ell+1}}\frac{e^{-v}|\psi_{k}(v)|^{2}}{(s_{k}(v)-z^{(m)})^{4}}\mathrm{d}v\leqslant N^{-10}\left(N^{-1+4\varepsilon}\right)^{-4}\left(\max_{k}|\psi_{k}(0)|\right)^{2}\leqslant N^{-6+16\varepsilon}.

Using Lemma A.1, we conclude that the event {\mathcal{E}}_{\ell,m,k} happens with overwhelming probability. By a union bound, we further have that \bigcap_{\ell,m,k}{\mathcal{E}}_{\ell,m,k} happens with overwhelming probability. Together with the set inclusion (41), we conclude that the claim is true, i.e. it suffices to prove \tau=\infty with overwhelming probability.

To prove \tau=\infty with overwhelming probability, consider some fixed t=t_{\ell} and z=z^{(m)}=E+\text{\rm i}\eta, and define the function f_{u}(z):=\widetilde{{\mathfrak{S}}}_{u}(z_{t-u}). By Lemma 3.5, the initial condition is well controlled: {\mathrm{Im}\,}\widetilde{{\mathfrak{S}}}_{0}(z)\lesssim N^{2\varepsilon}\tfrac{\xi(E_{m})^{1/2}}{(\xi(E_{m})^{1/2}\vee t)}. To bound the increments, note that the dynamics (17) yields

dfuτ(z)=ϵu(ztu)d(uτ)eu2NNkNψk(u)(ztusk(u))2dBk(uτ),\mathrm{d}f_{u\wedge\tau}(z)=\epsilon_{u}(z_{t-u})\mathrm{d}(u\wedge\tau)-\dfrac{e^{-\frac{u}{2}}}{\sqrt{N}}\sum_{-N\leqslant k\leqslant N}\dfrac{\psi_{k}(u)}{(z_{t-u}-s_{k}(u))^{2}}\mathrm{d}B_{k}(u\wedge\tau), (42)

where

ϵu(z):=(Su(z)m𝗌𝖼(z))z𝔖~u+14N(zz𝔖~u)+eu22NNkNψk(u)(skz)2(sk+z)\epsilon_{u}(z):=(S_{u}(z)-m_{\mathsf{sc}}(z))\partial_{z}\widetilde{{\mathfrak{S}}}_{u}+\dfrac{1}{4N}(\partial_{zz}\widetilde{{\mathfrak{S}}}_{u})+\dfrac{e^{-\frac{u}{2}}}{2N}\sum_{-N\leqslant k\leqslant N}\dfrac{\psi_{k}(u)}{(s_{k}-z)^{2}(s_{k}+z)}

By the local semicircle law (16), we have

sup0st|0s(Su(ztu)m𝗌𝖼(ztu))z𝔖~u(ztu)d(uτ)|\displaystyle\quad\sup_{0\leqslant s\leqslant t}\left|\int_{0}^{s}\left(S_{u}(z_{t-u})-m_{\mathsf{sc}}(z_{t-u})\right)\partial_{z}\widetilde{{\mathfrak{S}}}_{u}(z_{t-u})\mathrm{d}(u\wedge\tau)\right| (43)
0tNεNIm(ztu)NkNψk(u)|sk(u)ztu|2d(uτ)\displaystyle\leqslant\int_{0}^{t}\dfrac{N^{\varepsilon}}{{N{\mathrm{Im}\,}(z_{t-u})}}\sum_{-N\leqslant k\leqslant N}\dfrac{\psi_{k}(u)}{|s_{k}(u)-z_{t-u}|^{2}}\mathrm{d}(u\wedge\tau)
0tNεIm𝔖~u(ztu)N(Imztu)2d(uτ)\displaystyle\leqslant\int_{0}^{t}\dfrac{N^{\varepsilon}{\mathrm{Im}\,}\widetilde{{\mathfrak{S}}}_{u}(z_{t-u})}{{N}({\mathrm{Im}\,}z_{t-u})^{2}}\mathrm{d}(u\wedge\tau)
0tN2εduN(η+(tu)b(z)ξ(z)1/2)2ξ(E)12(ξ(E)12t)\displaystyle\leqslant\int_{0}^{t}\dfrac{N^{2\varepsilon}\mathrm{d}u}{{N}\left(\eta+(t-u)\frac{b(z)}{\xi(z)^{1/2}}\right)^{2}}\dfrac{\xi(E)^{\frac{1}{2}}}{(\xi(E)^{\frac{1}{2}}\vee t)}
ξ(E)12(ξ(E)12t).\displaystyle\lesssim\dfrac{\xi(E)^{\frac{1}{2}}}{(\xi(E)^{\frac{1}{2}}\vee t)}.

Also, we have

sup0st|0s14N(zz𝔖~u(ztu))d(uτ)|0tIm𝔖~u(ztu)N(Im(ztu))2d(uτ)Nεξ(E)12(ξ(E)12t),\sup_{0\leqslant s\leqslant t}\left|\int_{0}^{s}\dfrac{1}{4N}(\partial_{zz}\widetilde{{\mathfrak{S}}}_{u}(z_{t-u}))\mathrm{d}(u\wedge\tau)\right|\lesssim\int_{0}^{t}\dfrac{{\mathrm{Im}\,}\widetilde{{\mathfrak{S}}}_{u}(z_{t-u})}{N\left({\mathrm{Im}\,}(z_{t-u})\right)^{2}}\mathrm{d}(u\wedge\tau)\lesssim N^{-\varepsilon}\dfrac{\xi(E)^{\frac{1}{2}}}{(\xi(E)^{\frac{1}{2}}\vee t)},

and

sup0st|0seu22NNkNψk(u)(skztu)2(sk+ztu)d(uτ)|1N0t1Im(ztu)NkNψk(u)|sk(u)ztu|2d(uτ)0tImfu(ztu)N(Im(ztu))2d(uτ)Nεξ(E)12(ξ(E)12t).\sup_{0\leqslant s\leqslant t}\left|\int_{0}^{s}\dfrac{e^{-\frac{u}{2}}}{2N}\sum_{-N\leqslant k\leqslant N}\dfrac{\psi_{k}(u)}{(s_{k}-z_{t-u})^{2}(s_{k}+z_{t-u})}\mathrm{d}(u\wedge\tau)\right|\\ \lesssim\dfrac{1}{N}\int_{0}^{t}\dfrac{1}{{\mathrm{Im}\,}(z_{t-u})}\sum_{-N\leqslant k\leqslant N}\dfrac{\psi_{k}(u)}{|s_{k}(u)-z_{t-u}|^{2}}\mathrm{d}(u\wedge\tau)\\ \lesssim\int_{0}^{t}\dfrac{{\mathrm{Im}\,}f_{u}(z_{t-u})}{N\left({\mathrm{Im}\,}(z_{t-u})\right)^{2}}\mathrm{d}(u\wedge\tau)\lesssim N^{-\varepsilon}\dfrac{\xi(E)^{\frac{1}{2}}}{(\xi(E)^{\frac{1}{2}}\vee t)}.

For the martingale part

Ms:=0seu2NNkNψk(u)(ztusk(u))2dBk(uτ),M_{s}:=\int_{0}^{s}\dfrac{e^{-\frac{u}{2}}}{\sqrt{N}}\sum_{-N\leqslant k\leqslant N}\dfrac{\psi_{k}(u)}{(z_{t-u}-s_{k}(u))^{2}}\mathrm{d}B_{k}(u\wedge\tau),

using the rigidity of singular values (Lemma 3.1), with overwhelming probability we have

sup0st|Ms|2Nε20t1NNkN|ψk(u)|2|ztuγk|4d(uτ).\sup_{0\leqslant s\leqslant t}|M_{s}|^{2}\lesssim N^{\frac{\varepsilon}{2}}\int_{0}^{t}\frac{1}{N}\sum_{-N\leqslant k\leqslant N}\frac{|\psi_{k}(u)|^{2}}{|z_{t-u}-\gamma_{k}|^{4}}\mathrm{d}(u\wedge\tau).

To estimate this integral, we chop the interval [N,N][-N,N] into 2N14ε2N^{1-4\varepsilon} subintervals Ij=kj,kj+1I_{j}=\llbracket k_{j},k_{j+1}\rrbracket where kj=N+jN4εk_{j}=-N+\lfloor jN^{4\varepsilon}\rfloor. We can bound the summation in the integral in the following way

1NNkN|ψk(u)|2|ztuγk|41N0j2N14ε(maxkIjψk(u))(maxkIj1|ztuγk|4)(kIjψk(u)).\frac{1}{N}\sum_{-N\leqslant k\leqslant N}\frac{|\psi_{k}(u)|^{2}}{|z_{t-u}-\gamma_{k}|^{4}}\leqslant\frac{1}{N}\sum_{0\leqslant j\leqslant 2N^{1-4\varepsilon}}\left(\max_{k\in I_{j}}\psi_{k}(u)\right)\left(\max_{k\in I_{j}}\frac{1}{|z_{t-u}-\gamma_{k}|^{4}}\right)\left(\sum_{k\in I_{j}}\psi_{k}(u)\right).

Using similar discretization arguments as above, we can derive

maxkIjψk(u)kIjψk(u)N6εN(ξ(γkj)1/2u),\max_{k\in I_{j}}\psi_{k}(u)\leqslant\sum_{k\in I_{j}}\psi_{k}(u)\leqslant\frac{N^{6\varepsilon}}{N(\xi(\gamma_{k_{j}})^{1/2}\vee u)},

and

maxkIj1|ztuγk|4N4εkIj1|ztuγk|4.\max_{k\in I_{j}}\frac{1}{|z_{t-u}-\gamma_{k}|^{4}}\leqslant N^{-4\varepsilon}\sum_{k\in I_{j}}\frac{1}{|z_{t-u}-\gamma_{k}|^{4}}.

Therefore, we obtain

sup0st|Ms|2N2+9ε0t221|ztux|4(ξ(x)u2)𝑑ρ𝗌𝖼(x)duNεξ(E)(ξ(E)t2).\sup_{0\leqslant s\leqslant t}|M_{s}|^{2}\leqslant N^{-2+9\varepsilon}\int_{0}^{t}\int_{-2}^{2}\frac{1}{|z_{t-u}-x|^{4}(\xi(x)\vee u^{2})}d\rho_{\mathsf{sc}}(x)\mathrm{d}u\lesssim N^{\varepsilon}\frac{\xi(E)}{(\xi(E)\vee t^{2})}.

Combining this estimate for the martingale term with previous estimates, a union bound shows that with overwhelming probability we have

sup,m,0st,ξ(Em)>φ2N2/3Imf~sτ(ztsτ(m))N2εξ(E)1/2(ξ(E)1/2t).\sup_{\ell,m,0\leqslant s\leqslant t_{\ell},\xi(E_{m})>\varphi^{2}N^{-2/3}}{\mathrm{Im}\,}\widetilde{f}_{s\wedge\tau}(z_{t_{\ell}-s\wedge\tau}^{(m)})\lesssim N^{2\varepsilon}\dfrac{\xi(E)^{1/2}}{\left(\xi(E)^{1/2}\vee t\right)}.

Now we have proved τ=\tau=\infty with overwhelming probability and hence the desired result is true.

B.2 Proof of Lemma 3.10

The proof of Lemma 3.10 is a delicate task. The key part of the proof is a careful analysis of the dynamics. The main idea is to approximate the dynamics with a short-range version, which will be easier to control. To do this, we show the finite speed of propagation estimate for the short-range kernel of the parabolic-type equation (14) satisfied by {φk}\{\varphi_{k}\}. Then we prove a short-range approximation of the original dynamics and introduce a regularized equation. Finally, we show that, with a well-behaved initial condition, the regularized equation gives us the desired good approximation.

To begin with, the core input of the bootstrap argument is the following technical lemma, which states that the estimate of the local average will improve along with the induction hypothesis α{\mathcal{H}}_{\alpha}.

Lemma B.1.

Assume α{\mathcal{H}}_{\alpha}. Let ξ>0\xi>0 be any fixed small constant. For any 0<t<10<t<1, any ε0>0\varepsilon_{0}>0 arbitrarily small and z=E+iηz=E+\text{\rm i}\eta satisfying N1+ε0<η<1N^{-1+\varepsilon_{0}}<\eta<1, |E|<2ξ|E|<2-\xi, we have

|Im𝔖t(z)et2ImSt(z)ImS0(zt)Im𝔖0(zt)|((Nt)αN2tη+1Nt)\left|{\mathrm{Im}\,}{\mathfrak{S}}_{t}(z)-e^{-\frac{t}{2}}\dfrac{{\mathrm{Im}\,}S_{t}(z)}{{\mathrm{Im}\,}S_{0}(z_{t})}{\mathrm{Im}\,}{\mathfrak{S}}_{0}(z_{t})\right|\prec\left(\dfrac{(Nt)^{\alpha}}{N^{2}t\eta}+\dfrac{1}{Nt}\right) (44)
Proof.

Fix tt and consider the function

gu(z):=𝔖u(ztu)eu2Im𝔖0(zt)ImS0(zt)Su(ztu), 0ut.g_{u}(z):={\mathfrak{S}}_{u}(z_{t-u})-e^{-\frac{u}{2}}\dfrac{{\mathrm{Im}\,}{\mathfrak{S}}_{0}(z_{t})}{{\mathrm{Im}\,}S_{0}(z_{t})}S_{u}(z_{t-u}),\ \ 0\leqslant u\leqslant t.

An observation is that e^{-u/2}S_{u}(z) satisfies the same stochastic advection equation (17) with \varphi_{k} replaced by \tfrac{1}{2N}. Therefore, we have

dgu\displaystyle\mathrm{d}g_{u} =(Su(ztu)m𝗌𝖼(ztu))(z𝔖u(ztu)eu2Im𝔖0(zt)ImS0(zt)zSu(ztu))du\displaystyle=\left(S_{u}(z_{t-u})-m_{\mathsf{sc}}(z_{t-u})\right)\left(\partial_{z}{\mathfrak{S}}_{u}(z_{t-u})-e^{-\frac{u}{2}}\dfrac{{\mathrm{Im}\,}{\mathfrak{S}}_{0}(z_{t})}{{\mathrm{Im}\,}S_{0}(z_{t})}\partial_{z}S_{u}(z_{t-u})\right)\mathrm{d}u (45)
+14N(zz𝔖u(ztu)eu2Im𝔖0(zt)ImS0(zt)zzSu(ztu))du\displaystyle\quad+\dfrac{1}{4N}\left(\partial_{zz}{\mathfrak{S}}_{u}(z_{t-u})-e^{-\frac{u}{2}}\dfrac{{\mathrm{Im}\,}{\mathfrak{S}}_{0}(z_{t})}{{\mathrm{Im}\,}S_{0}(z_{t})}\partial_{zz}S_{u}(z_{t-u})\right)\mathrm{d}u
+eu22NNkNθk(u)(sk(u)ztu)2(sk(u)+ztu)du\displaystyle\quad+\dfrac{e^{-\frac{u}{2}}}{2N}\sum_{-N\leqslant k\leqslant N}\dfrac{\theta_{k}(u)}{(s_{k}(u)-z_{t-u})^{2}(s_{k}(u)+z_{t-u})}\mathrm{d}u
eu2NNkNθk(u)(sk(u)ztu)2dBk,\displaystyle\quad-\dfrac{e^{-\frac{u}{2}}}{\sqrt{N}}\sum_{-N\leqslant k\leqslant N}\dfrac{\theta_{k}(u)}{(s_{k}(u)-z_{t-u})^{2}}\mathrm{d}B_{k},

where

θk(u)=φk(u)Im𝔖0(zt)2NImS0(zt).\theta_{k}(u)=\varphi_{k}(u)-\frac{{\mathrm{Im}\,}{\mathfrak{S}}_{0}(z_{t})}{2N{\mathrm{Im}\,}S_{0}(z_{t})}.

Similarly as in the proof of Proposition 3.1, for ε>0\varepsilon>0 and 0,m,pN100\leqslant\ell,m,p\leqslant N^{10}, define t=N10t_{\ell}=\ell N^{-10} and z(m,p)=Em+iηpz^{(m,p)}=E_{m}+\text{\rm i}\eta_{p} where Emdρ=mN10\int_{-\infty}^{E_{m}}\mathrm{d}\rho=mN^{-10} and ηp=N1+4ε+pN10\eta_{p}=N^{-1+4\varepsilon}+pN^{-10}. We also pick c>0c>0 such that (1c)N=argmink|γk(2ξ10)|\lfloor(1-c)N\rfloor=\arg\min_{k}|\gamma_{k}-(2-\tfrac{\xi}{10})|. Assuming α{\mathcal{H}}_{\alpha}, let ε0>0\varepsilon_{0}>0 be the arbitrarily small scale in the hypothesis. Let C>0C>0 be some suitably large constant. Recall the stopping times

τ0\displaystyle\tau_{0} =inf{0u1:NkNs.t.|sk(u)γk|>N23+ε(N+1|k|)13},\displaystyle=\inf\left\{0\leqslant u\leqslant 1:\exists-N\leqslant k\leqslant N\ s.t.\ |s_{k}(u)-\gamma_{k}|>N^{-\frac{2}{3}+\varepsilon}(N+1-|k|)^{-\frac{1}{3}}\right\},
τ1\displaystyle\tau_{1} =inf{N1+ε0u1:NkNs.t.|φk(u)|>NCεN1((N+1|k|N)1/3u)},\displaystyle=\inf\left\{N^{-1+\varepsilon_{0}}\leqslant u\leqslant 1:\exists-N\leqslant k\leqslant N\ s.t.\ |\varphi_{k}(u)|>\dfrac{N^{C\varepsilon}}{N}\dfrac{1}{\left((\frac{N+1-|k|}{N})^{1/3}\vee u\right)}\right\},

and consider the new stopping times

τ2\displaystyle\tau_{2} =inf{N1+ε0u1:k(c1)N,(1c)Ns.t.|φk(u)φ^k(u)|>NCε(Nt)αN2t},\displaystyle=\inf\left\{N^{-1+\varepsilon_{0}}\leqslant u\leqslant 1:\exists k\in\llbracket(c-1)N,(1-c)N\rrbracket\ s.t.\ |\varphi_{k}(u)-\widehat{\varphi}_{k}(u)|>N^{C\varepsilon}\dfrac{(Nt)^{\alpha}}{N^{2}t}\right\},
τ,m,p\displaystyle\tau_{\ell,m,p} =inf{0ut:|Imgu(t)(z(m,p))|>NCε((Nt)αN2tη+1Nt)}\displaystyle=\inf\left\{0\leqslant u\leqslant t_{\ell}:\left|{\mathrm{Im}\,}g_{u}^{(t_{\ell})}(z^{(m,p)})\right|>N^{C\varepsilon}\left(\dfrac{(Nt)^{\alpha}}{N^{2}t\eta}+\dfrac{1}{Nt}\right)\right\}
τ\displaystyle\tau =min{τ0,τ1,τ2,τ,m,p:0,m,pN10,|Em|<2ξ}.\displaystyle=\min\{\tau_{0},\tau_{1},\tau_{2},\tau_{\ell,m,p}:0\leqslant\ell,m,p\leqslant N^{10},|E_{m}|<2-\xi\}.

Recall the convention inf=\inf\emptyset=\infty. As shown in the proof of Proposition 3.1, it suffices to show that τ=\tau=\infty with overwhelming probability.

A key ingredient for the analysis of the dynamics of g_{u} is the following estimate on \theta_{k}(u). To derive it, we fix some t=t_{\ell} and z=z^{(m,p)} with |E_{m}|<2-\xi, and let N^{-1+\varepsilon_{0}}\leqslant u\leqslant t\wedge\tau and k\in\llbracket(c-1)N,(1-c)N\rrbracket.

On the one hand, we have a direct a priori estimate. Since uτ1u\leqslant\tau_{1}, we have

|φk(u)|N23+Cε(N+1|k|)13.|\varphi_{k}(u)|\lesssim N^{-\frac{2}{3}+C\varepsilon}(N+1-|k|)^{-\frac{1}{3}}.

Moreover, note that for z=E+iηz=E+i\eta with EE in the bulk and t<1t<1, uniformly we have ImS0(zt)1{\mathrm{Im}\,}S_{0}(z_{t})\gtrsim 1. By Lemma 3.5, this shows

Im𝔖0(zt)2NImS0(zt)N1+εN23+Cε(N+1|k|)13.\dfrac{{\mathrm{Im}\,}{\mathfrak{S}}_{0}(z_{t})}{2N{\mathrm{Im}\,}S_{0}(z_{t})}\lesssim N^{-1+\varepsilon}\lesssim N^{-\frac{2}{3}+C\varepsilon}(N+1-|k|)^{-\frac{1}{3}}.

As a consequence, we have

|θk(u)|N23+Cε(N+1|k|)13.|\theta_{k}(u)|\lesssim N^{-\frac{2}{3}+C\varepsilon}(N+1-|k|)^{-\frac{1}{3}}. (46)

On the other hand, the estimate can also be obtained via approximation

|\theta_{k}(u)|\leqslant|\varphi_{k}(u)-\widehat{\varphi}_{k}(u)|+|\widehat{\varphi}_{k}(u)-\widehat{\varphi}_{j}(t)|+\left|\widehat{\varphi}_{j}(t)-\frac{{\mathrm{Im}\,}{\mathfrak{S}}_{0}(z_{t})}{2N{\mathrm{Im}\,}S_{0}(z_{t})}\right|.

For the first term, since uτ2u\leqslant\tau_{2} we have

|\varphi_{k}(u)-\widehat{\varphi}_{k}(u)|\leqslant N^{C\varepsilon}\dfrac{(Nu)^{\alpha}}{N^{2}u}.

Choosing |γjE|N1+2ε|\gamma_{j}-E|\leqslant N^{-1+2\varepsilon}, the remaining two terms are controlled by Lemma 3.8

|\widehat{\varphi}_{k}(u)-\widehat{\varphi}_{j}(t)|+\left|\widehat{\varphi}_{j}(t)-\frac{{\mathrm{Im}\,}{\mathfrak{S}}_{0}(z_{t})}{2N{\mathrm{Im}\,}S_{0}(z_{t})}\right|\lesssim N^{\varepsilon}\left(\dfrac{|k-j|}{N^{2}u}+\dfrac{|t-u|}{Nu}+\dfrac{|E-\gamma_{j}|}{Nt}+\dfrac{\eta}{Nt}\right)\\ \lesssim N^{C\varepsilon}\left(\dfrac{|\gamma_{k}-E|}{Nu}+\dfrac{t-u}{Nu}+\dfrac{\eta}{Nt}\right).

Together, we decompose the error terms into two parts and obtain the following

|\theta_{k}(u)|\leqslant N^{C\varepsilon}\left(\dfrac{|\gamma_{k}-E|}{Nu}+\dfrac{t-u}{Nu}+\dfrac{\eta}{Nt}+\dfrac{(Nu)^{\alpha}}{N^{2}u}\right)=:N^{C\varepsilon}\left(\dfrac{|\gamma_{k}-E|}{Nu}+\Lambda(\alpha,N,t,\eta,u)\right) (47)

With the above control on θk(u)\theta_{k}(u), the dynamics (45) can be used to bound Im(gtg0){\mathrm{Im}\,}(g_{t}-g_{0}) similarly as in Proposition 3.1. For the first term, we have

0tτ|Su(ztu)m𝗌𝖼(ztu)||z𝔖u(ztu)eu2Im𝔖0(zt)ImS0(zt)zSu(ztu)|du\displaystyle\quad\int_{0}^{t\wedge\tau}|S_{u}(z_{t-u})-m_{\mathsf{sc}}(z_{t-u})|\left|\partial_{z}{\mathfrak{S}}_{u}(z_{t-u})-e^{-\frac{u}{2}}\dfrac{{\mathrm{Im}\,}{\mathfrak{S}}_{0}(z_{t})}{{\mathrm{Im}\,}S_{0}(z_{t})}\partial_{z}S_{u}(z_{t-u})\right|\mathrm{d}u (48)
0tτNCεNIm(ztu)NkN|θk(u)||sk(u)ztu|2du\displaystyle\leqslant\int_{0}^{t\wedge\tau}\dfrac{N^{C\varepsilon}}{{N{\mathrm{Im}\,}(z_{t-u})}}\sum_{-N\leqslant k\leqslant N}\dfrac{|\theta_{k}(u)|}{|s_{k}(u)-z_{t-u}|^{2}}\mathrm{d}u
0tτNCεNIm(ztu)(|k|(1c)N|θk(u)||γkztu|2+|k|<(1c)N|θk(u)||γkztu|2)du\displaystyle\leqslant\int_{0}^{t\wedge\tau}\dfrac{N^{C\varepsilon}}{{N{\mathrm{Im}\,}(z_{t-u})}}\left(\sum_{|k|\geqslant(1-c)N}\dfrac{|\theta_{k}(u)|}{|\gamma_{k}-z_{t-u}|^{2}}+\sum_{|k|<(1-c)N}\dfrac{|\theta_{k}(u)|}{|\gamma_{k}-z_{t-u}|^{2}}\right)\mathrm{d}u
=:I1+I2.\displaystyle=:I_{1}+I_{2}.

For the soft edge part |k|(1c)N|k|\geqslant(1-c)N, using (46) we obtain

I_{1}\leqslant N^{C\varepsilon}\int_{0}^{t\wedge\tau}\dfrac{1}{{N{\mathrm{Im}\,}(z_{t-u})}}\sum_{|k|\geqslant(1-c)N}N^{-\frac{2}{3}+C\varepsilon}(N+1-|k|)^{-\frac{1}{3}}\mathrm{d}u (49)
NCε0tτ1NIm(ztu)du\displaystyle\leqslant N^{C\varepsilon}\int_{0}^{t\wedge\tau}\dfrac{1}{{N{\mathrm{Im}\,}(z_{t-u})}}\mathrm{d}u
NCεlog(1+Nt)N.\displaystyle\leqslant N^{C\varepsilon}\dfrac{\log(1+Nt)}{N}.

For I2I_{2}, note that

|k|<(1c)N|θk(u)||γkztu|2\displaystyle\sum_{|k|<(1-c)N}\dfrac{|\theta_{k}(u)|}{|\gamma_{k}-z_{t-u}|^{2}} NCε(1Nu|k|<(1c)N|γkE||γkztu|2+Λ|k|<(1c)N1|γkztu|2)\displaystyle\leqslant N^{C\varepsilon}\left(\dfrac{1}{Nu}\sum_{|k|<(1-c)N}\dfrac{|\gamma_{k}-E|}{|\gamma_{k}-z_{t-u}|^{2}}+\Lambda\sum_{|k|<(1-c)N}\dfrac{1}{|\gamma_{k}-z_{t-u}|^{2}}\right)
NCε(1u+NΛη+tu)\displaystyle\leqslant N^{C\varepsilon}\left(\dfrac{1}{u}+N\dfrac{\Lambda}{\eta+t-u}\right)

This yields

\int_{N^{-1+\varepsilon_{0}}}^{t\wedge\tau}\dfrac{N^{C\varepsilon}}{{N{\mathrm{Im}\,}(z_{t-u})}}\sum_{|k|<(1-c)N}\dfrac{|\theta_{k}(u)|}{|\gamma_{k}-z_{t-u}|^{2}}\mathrm{d}u (50)
NCεN1+ε0tτ1N(η+tu)(1u+NΛη+tu)du\displaystyle\leqslant N^{C\varepsilon}\int_{N^{-1+\varepsilon_{0}}}^{t\wedge\tau}\dfrac{1}{{N(\eta+t-u)}}\left(\dfrac{1}{u}+N\dfrac{\Lambda}{\eta+t-u}\right)\mathrm{d}u
NCεN1+ε0tτ1N(η+tu)(1u+N1η+tu(tuNu+ηNt+(Nu)αN2u))du\displaystyle\leqslant N^{C\varepsilon}\int_{N^{-1+\varepsilon_{0}}}^{t\wedge\tau}\dfrac{1}{{N(\eta+t-u)}}\left(\dfrac{1}{u}+N\dfrac{1}{\eta+t-u}\left(\dfrac{t-u}{Nu}+\dfrac{\eta}{Nt}+\dfrac{(Nu)^{\alpha}}{N^{2}u}\right)\right)\mathrm{d}u
NCε((Nt)αN2tη+1Nt)\displaystyle\leqslant N^{C\varepsilon}\left(\dfrac{(Nt)^{\alpha}}{N^{2}t\eta}+\dfrac{1}{Nt}\right)

Moreover, without loss of generality, we may assume that ε0ε\varepsilon_{0}\sim\varepsilon. Then, using (46) we obtain

\int_{0}^{N^{-1+\varepsilon_{0}}}\dfrac{N^{C\varepsilon}}{{N{\mathrm{Im}\,}(z_{t-u})}}\sum_{|k|<(1-c)N}\dfrac{|\theta_{k}(u)|}{|\gamma_{k}-z_{t-u}|^{2}}\mathrm{d}u\leqslant\dfrac{N^{C\varepsilon}}{N^{2}(\eta+t)^{2}} (51)

Together with previous estimates, this shows

0tτ|Su(ztu)m𝗌𝖼(ztu)||z𝔖u(ztu)eu2Im𝔖0(zt)ImS0(zt)zSu(ztu)|duNCε((Nt)αN2tη+1Nt).\int_{0}^{t\wedge\tau}|S_{u}(z_{t-u})-m_{\mathsf{sc}}(z_{t-u})|\left|\partial_{z}{\mathfrak{S}}_{u}(z_{t-u})-e^{-\frac{u}{2}}\dfrac{{\mathrm{Im}\,}{\mathfrak{S}}_{0}(z_{t})}{{\mathrm{Im}\,}S_{0}(z_{t})}\partial_{z}S_{u}(z_{t-u})\right|\mathrm{d}u\leqslant N^{C\varepsilon}\left(\dfrac{(Nt)^{\alpha}}{N^{2}t\eta}+\dfrac{1}{Nt}\right). (52)

Similarly, we have

0tτ14N|zz𝔖u(ztu)eu2Im𝔖0(zt)ImS0(zt)zzSu(ztu)|duNCε((Nt)αN2tη+1Nt)\int_{0}^{t\wedge\tau}\dfrac{1}{4N}\left|\partial_{zz}{\mathfrak{S}}_{u}(z_{t-u})-e^{-\frac{u}{2}}\dfrac{{\mathrm{Im}\,}{\mathfrak{S}}_{0}(z_{t})}{{\mathrm{Im}\,}S_{0}(z_{t})}\partial_{zz}S_{u}(z_{t-u})\right|\mathrm{d}u\leqslant N^{C\varepsilon}\left(\dfrac{(Nt)^{\alpha}}{N^{2}t\eta}+\dfrac{1}{Nt}\right)

and

0tτ|eu22NNkNθk(u)(sk(u)ztu)2(sk(u)+ztu)|duNCε((Nt)αN2tη+1Nt)\int_{0}^{t\wedge\tau}\left|\dfrac{e^{-\frac{u}{2}}}{2N}\sum_{-N\leqslant k\leqslant N}\dfrac{\theta_{k}(u)}{(s_{k}(u)-z_{t-u})^{2}(s_{k}(u)+z_{t-u})}\right|\mathrm{d}u\leqslant N^{C\varepsilon}\left(\dfrac{(Nt)^{\alpha}}{N^{2}t\eta}+\dfrac{1}{Nt}\right)

It suffices to bound the martingale term

Ms:=0seu2NNkNθk(u)(sk(u)ztu)2dBk.M_{s}:=\int_{0}^{s}\dfrac{e^{-\frac{u}{2}}}{\sqrt{N}}\sum_{-N\leqslant k\leqslant N}\dfrac{\theta_{k}(u)}{(s_{k}(u)-z_{t-u})^{2}}\mathrm{d}B_{k}.

Again we decompose the integral into two parts

Mtτ\displaystyle\left\langle M\right\rangle_{t\wedge\tau} 1N0tτ|k|(1c)N|θk(u)|2|γkztu|4du+1N0tτ|k|<(1c)N|θk(u)|2|γkztu|4du\displaystyle\lesssim\dfrac{1}{N}\int_{0}^{t\wedge\tau}\sum_{|k|\geqslant(1-c)N}\dfrac{|\theta_{k}(u)|^{2}}{|\gamma_{k}-z_{t-u}|^{4}}\mathrm{d}u+\dfrac{1}{N}\int_{0}^{t\wedge\tau}\sum_{|k|<(1-c)N}\dfrac{|\theta_{k}(u)|^{2}}{|\gamma_{k}-z_{t-u}|^{4}}\mathrm{d}u
=:J1+J2.\displaystyle=:J_{1}+J_{2}.

The contribution from the soft edge is easy to control

J1NCεN0tτ|k|(1c)N(N23(N+1|k|)13)2duNCεtN2J_{1}\leqslant\dfrac{N^{C\varepsilon}}{N}\int_{0}^{t\wedge\tau}\sum_{|k|\geqslant(1-c)N}\left(N^{-\frac{2}{3}}(N+1-|k|)^{-\frac{1}{3}}\right)^{2}\mathrm{d}u\leqslant N^{C\varepsilon}\dfrac{t}{N^{2}}

For the other term, we use both (46) and (47)

J2\displaystyle J_{2} NCεN0tη1N2|k|<(1c)N1|γkztu|4du\displaystyle\lesssim\dfrac{N^{C\varepsilon}}{N}\int_{0}^{t\wedge\eta}\dfrac{1}{N^{2}}\sum_{|k|<(1-c)N}\dfrac{1}{|\gamma_{k}-z_{t-u}|^{4}}\mathrm{d}u
+NCεNtηtτ|k|<(1c)N1|γkztu|4[(|γkE|2N2u2+Λ2)1N2]du.\displaystyle\quad+\dfrac{N^{C\varepsilon}}{N}\int_{t\wedge\eta}^{t\wedge\tau}\sum_{|k|<(1-c)N}\dfrac{1}{|\gamma_{k}-z_{t-u}|^{4}}\left[\left(\dfrac{|\gamma_{k}-E|^{2}}{N^{2}u^{2}}+\Lambda^{2}\right)\wedge\dfrac{1}{N^{2}}\right]\mathrm{d}u.

Note that

NCεN0tη1N2|k|<(1c)N1|γkztu|4duNCεN0tη1N(η+tu)3duNCεN2t2.\dfrac{N^{C\varepsilon}}{N}\int_{0}^{t\wedge\eta}\dfrac{1}{N^{2}}\sum_{|k|<(1-c)N}\dfrac{1}{|\gamma_{k}-z_{t-u}|^{4}}\mathrm{d}u\leqslant\dfrac{N^{C\varepsilon}}{N}\int_{0}^{t\wedge\eta}\dfrac{1}{N(\eta+t-u)^{3}}\mathrm{d}u\leqslant\dfrac{N^{C\varepsilon}}{N^{2}t^{2}}.

For the other term, without loss of generality we may assume η<t\eta<t. For the |γkE|2N2u2\tfrac{|\gamma_{k}-E|^{2}}{N^{2}u^{2}} term, we have

NCεNηt|k|<(1c)N1|γkztu|4(|γkE|2N2u21N2)du\displaystyle\quad\dfrac{N^{C\varepsilon}}{N}\int_{\eta}^{t}\sum_{|k|<(1-c)N}\dfrac{1}{|\gamma_{k}-z_{t-u}|^{4}}\left(\dfrac{|\gamma_{k}-E|^{2}}{N^{2}u^{2}}\wedge\dfrac{1}{N^{2}}\right)\mathrm{d}u
NCεN2t2+NCεNηtk:|γkE|u1|γkztu|4|γkE|2N2u2du\displaystyle\leqslant\dfrac{N^{C\varepsilon}}{N^{2}t^{2}}+\dfrac{N^{C\varepsilon}}{N}\int_{\eta}^{t}\sum_{k:|\gamma_{k}-E|\leqslant u}\dfrac{1}{|\gamma_{k}-z_{t-u}|^{4}}\dfrac{|\gamma_{k}-E|^{2}}{N^{2}u^{2}}\mathrm{d}u
NCεN2t2+NCεηt1N2u2(uux2x4+(η+tu)4dx)du\displaystyle\leqslant\dfrac{N^{C\varepsilon}}{N^{2}t^{2}}+N^{C\varepsilon}\int_{\eta}^{t}\dfrac{1}{N^{2}u^{2}}\left(\int_{-u}^{u}\dfrac{x^{2}}{x^{4}+(\eta+t-u)^{4}}\mathrm{d}x\right)\mathrm{d}u
NCεN2t2.\displaystyle\leqslant\dfrac{N^{C\varepsilon}}{N^{2}t^{2}}.

For the contribution of \Lambda(\alpha,N,t,\eta,u), we have

NCεNηt|k|<(1c)N1|γkztu|4(Λ21N2)du\displaystyle\quad\dfrac{N^{C\varepsilon}}{N}\int_{\eta}^{t}\sum_{|k|<(1-c)N}\dfrac{1}{|\gamma_{k}-z_{t-u}|^{4}}\left(\Lambda^{2}\wedge\dfrac{1}{N^{2}}\right)\mathrm{d}u
NCεNηt|k|<(1c)N1|γkztu|4[((tu)2N2u2+(Nu)2αN4u2+η2N2t2)1N2]du\displaystyle\leqslant\dfrac{N^{C\varepsilon}}{N}\int_{\eta}^{t}\sum_{|k|<(1-c)N}\dfrac{1}{|\gamma_{k}-z_{t-u}|^{4}}\left[\left(\dfrac{(t-u)^{2}}{N^{2}u^{2}}+\dfrac{(Nu)^{2\alpha}}{N^{4}u^{2}}+\dfrac{\eta^{2}}{N^{2}t^{2}}\right)\wedge\dfrac{1}{N^{2}}\right]\mathrm{d}u
NCεηt1(η+tu)3[(Nu)2αN4u2+η2N2t2+((tu)2N2u21N2)]du.\displaystyle\leqslant N^{C\varepsilon}\int_{\eta}^{t}\dfrac{1}{(\eta+t-u)^{3}}\left[\dfrac{(Nu)^{2\alpha}}{N^{4}u^{2}}+\dfrac{\eta^{2}}{N^{2}t^{2}}+\left(\dfrac{(t-u)^{2}}{N^{2}u^{2}}\wedge\dfrac{1}{N^{2}}\right)\right]\mathrm{d}u.

The first two terms in the bracket give us

ηt1(η+tu)3((Nu)2αN4u2+η2N2t2)du(Nt)2αN4t2η2+1N2t2.\int_{\eta}^{t}\dfrac{1}{(\eta+t-u)^{3}}\left(\dfrac{(Nu)^{2\alpha}}{N^{4}u^{2}}+\dfrac{\eta^{2}}{N^{2}t^{2}}\right)\mathrm{d}u\leqslant\dfrac{(Nt)^{2\alpha}}{N^{4}t^{2}\eta^{2}}+\dfrac{1}{N^{2}t^{2}}.

For the remaining term, we have

ηt1(η+tu)3((tu)2N2u21N2)duηt21(η+tu)31N2du+t2t1(η+tu)3(tu)2N2u2dulogNN2t2\int_{\eta}^{t}\dfrac{1}{(\eta+t-u)^{3}}\left(\dfrac{(t-u)^{2}}{N^{2}u^{2}}\wedge\dfrac{1}{N^{2}}\right)\mathrm{d}u\\ \leqslant\int_{\eta}^{\frac{t}{2}}\dfrac{1}{(\eta+t-u)^{3}}\dfrac{1}{N^{2}}\mathrm{d}u+\int_{\frac{t}{2}}^{t}\dfrac{1}{(\eta+t-u)^{3}}\dfrac{(t-u)^{2}}{N^{2}u^{2}}\mathrm{d}u\leqslant\dfrac{\log N}{N^{2}t^{2}}

Combining these results shows

MtτNCε((Nt)2αN4t2η2+1N2t2).\left\langle M\right\rangle_{t\wedge\tau}\leqslant N^{C\varepsilon}\left(\dfrac{(Nt)^{2\alpha}}{N^{4}t^{2}\eta^{2}}+\dfrac{1}{N^{2}t^{2}}\right).

Using Lemma A.1 and a union bound, for any fixed large D>0D>0 and sufficiently large N>N0(ε,D)N>N_{0}(\varepsilon,D), we have

(sup,m,p,0stτ,|Em|<2ξ|Ms|NCε((Nt)αN2tη+1Nt))>1ND.\mathbb{P}\left(\sup_{\ell,m,p,0\leqslant s\leqslant t\wedge\tau,|E_{m}|<2-\xi}|M_{s}|\leqslant N^{C\varepsilon}\left(\dfrac{(Nt)^{\alpha}}{N^{2}t\eta}+\dfrac{1}{Nt}\right)\right)>1-N^{-D}.

Together with previous estimates, with overwhelming probability we have

sup,m,p,0stτ,|Em|<2ξ|gs(z)|NCε((Nt)αN2tη+1Nt).\sup_{\ell,m,p,0\leqslant s\leqslant t\wedge\tau,|E_{m}|<2-\xi}|g_{s}(z)|\leqslant N^{C\varepsilon}\left(\dfrac{(Nt)^{\alpha}}{N^{2}t\eta}+\dfrac{1}{Nt}\right).

This implies min,m,p{τ,m,p}=\min_{\ell,m,p}\{\tau_{\ell,m,p}\}=\infty with overwhelming probability. Moreover, we have shown in Lemma 3.1, Lemma 3.6 that τ0=\tau_{0}=\infty and τ1=\tau_{1}=\infty with overwhelming probability. Assuming the hypothesis α{\mathcal{H}}_{\alpha}, we also have τ2=\tau_{2}=\infty with overwhelming probability. These imply that τ=\tau=\infty with overwhelming probability. Hence we complete the proof. ∎

Now we move on to the short-range approximation of the dynamics. Recall that {φk}\{\varphi_{k}\} satisfies the parabolic equation (14), and we rewrite it as

ddtφk=(𝒫φ)k\dfrac{\mathrm{d}}{\mathrm{d}t}\varphi_{k}=\left({\mathcal{P}}\varphi\right)_{k}

where the time-dependent operator 𝒫{\mathcal{P}} is defined in the following way: For f:2Nf:\mathbb{R}\to\mathbb{R}^{2N},

(𝒫f)k:=j±kcjk(t)(fj(t)fk(t)),cjk(t):=12N(sj(t)sk(t))2.\left({\mathcal{P}}f\right)_{k}:=\sum_{j\neq\pm k}c_{jk}(t)(f_{j}(t)-f_{k}(t)),\ \ \ c_{jk}(t):=\dfrac{1}{2N(s_{j}(t)-s_{k}(t))^{2}}.

Consider some parameter l=l(N,\alpha), which will be determined later; we decompose the operator {\mathcal{P}} into two parts {\mathcal{P}}={\mathcal{P}}_{\mathrm{short}}+{\mathcal{P}}_{\mathrm{long}}. The operators {\mathcal{P}}_{\mathrm{short}} and {\mathcal{P}}_{\mathrm{long}} represent the short-range and long-range interactions respectively and are defined as follows

(𝒫shortf)k\displaystyle\left({\mathcal{P}}_{\mathrm{short}}f\right)_{k} =|jk|lcjk(t)(fj(t)fk(t)),\displaystyle=\sum_{|j-k|\leqslant l}c_{jk}(t)(f_{j}(t)-f_{k}(t)),
(𝒫longf)k\displaystyle\left({\mathcal{P}}_{\mathrm{long}}f\right)_{k} =|jk|>lcjk(t)(fj(t)fk(t)).\displaystyle=\sum_{|j-k|>l}c_{jk}(t)(f_{j}(t)-f_{k}(t)).
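For concreteness, here is a minimal sketch (our own indexing: particles labelled 0,\dots,n-1 rather than -N,\dots,N, and the j=-k exclusion dropped for simplicity) of the operator {\mathcal{P}} and the decomposition {\mathcal{P}}={\mathcal{P}}_{\mathrm{short}}+{\mathcal{P}}_{\mathrm{long}}.

```python
import numpy as np

def apply_P(f, s, allowed):
    # (Pf)_k = sum over allowed j != k of (f_j - f_k) / (2n (s_j - s_k)^2)
    n = len(s)
    out = np.zeros(n)
    for k in range(n):
        for j in range(n):
            if j != k and allowed(j, k):
                out[k] += (f[j] - f[k]) / (2 * n * (s[j] - s[k]) ** 2)
    return out

rng = np.random.default_rng(7)
n, l = 20, 3
s = np.sort(rng.standard_normal(n))        # a configuration of distinct points
f = rng.standard_normal(n)

full = apply_P(f, s, lambda j, k: True)
short = apply_P(f, s, lambda j, k: abs(j - k) <= l)
long_range = apply_P(f, s, lambda j, k: abs(j - k) > l)
print(np.allclose(full, short + long_range))  # P = P_short + P_long
```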

Note that the operators 𝒫short,𝒫long{\mathcal{P}}_{\mathrm{short}},{\mathcal{P}}_{\mathrm{long}} are also time dependent. Let 𝒯short(s,t){\mathscr{T}}_{\mathrm{short}}(s,t) denote the semigroup associated with the operator 𝒫short{\mathcal{P}}_{\mathrm{short}} in the sense

t𝒯short(s,t)=𝒫short(t)𝒯short(s,t),𝒯short(s,s)=𝖨𝖽.\partial_{t}{\mathscr{T}}_{\mathrm{short}}(s,t)={\mathcal{P}}_{\mathrm{short}}(t){\mathscr{T}}_{\mathrm{short}}(s,t),\ \ {\mathscr{T}}_{\mathrm{short}}(s,s)=\mathsf{Id}.

Also, let 𝒯{\mathscr{T}} denote the semigroup associated with 𝒫{\mathcal{P}}.

To prove the short-range approximation, we need the following finite speed of propagation estimate for the semigroup. Such estimates were proved in [CL19], and the arguments there apply here with minor changes.

Lemma B.2.

For any fixed small c>0c>0 and large D>0D>0, there exists N0(c,D)N_{0}(c,D) such that the following holds with probability at least 1ND1-N^{-D}. For any ε>0\varepsilon>0, N>N0N>N_{0}, 0<u<v<10<u<v<1, lN|uv|l\geqslant N|u-v|, |k|(1c)N|k|\leqslant(1-c)N and NjN-N\leqslant j\leqslant N such that |kj|>Nεl|k-j|>N^{\varepsilon}l, we have

(𝒯short(u,v)δk)(j)<ND.\left({\mathscr{T}}_{\mathrm{short}}(u,v)\delta_{k}\right)(j)<N^{-D}. (53)

With Lemma B.2, we have the following short-range approximation estimate. In particular, this short-range approximation can be improved based on the hypothesis α{\mathcal{H}}_{\alpha}.

Lemma B.3.

Assume α{\mathcal{H}}_{\alpha}. Let c>0c>0 be any fixed small constant. There exists a constant C>0C>0 such that for any ε>0\varepsilon>0, N1+Cε<t<1N^{-1+C\varepsilon}<t<1, t2u<vt\tfrac{t}{2}\leqslant u<v\leqslant t, lNεl\geqslant N^{\varepsilon} and |k|(1c)N|k|\leqslant(1-c)N, we have

|((𝒯(u,v)𝒯short(u,v))φ(u))k|(vu)(Nl(Nt)αN2t+1Nt).\left|\left(\left({\mathscr{T}}(u,v)-{\mathscr{T}}_{\mathrm{short}}(u,v)\right)\varphi(u)\right)_{k}\right|\prec(v-u)\left(\dfrac{N}{l}\dfrac{(Nt)^{\alpha}}{N^{2}t}+\dfrac{1}{Nt}\right). (54)
Proof.

Duhamel's principle implies

((𝒯(u,v)𝒯short(u,v))φ(u))k=uv(𝒯short(s,v)[(𝒫longφ)(s)])kds.\left(\left({\mathscr{T}}(u,v)-{\mathscr{T}}_{\mathrm{short}}(u,v)\right)\varphi(u)\right)_{k}=\int_{u}^{v}\left({\mathscr{T}}_{\mathrm{short}}(s,v)[\left({\mathcal{P}}_{\mathrm{long}}\,\varphi\right)(s)]\right)_{k}\mathrm{d}s.

On the event that Lemma B.2 holds, for |k|(13c)N|k|\leqslant(1-3c)N, the finite speed of propagation yields

(𝒯short(s,v)[(𝒫longφ)(s)])k=(𝒯short(s,v)[(𝒫longφ𝟙(2c1)N,(12c)N)(s)])k+ND,\left({\mathscr{T}}_{\mathrm{short}}(s,v)[\left({\mathcal{P}}_{\mathrm{long}}\,\varphi\right)(s)]\right)_{k}=\left({\mathscr{T}}_{\mathrm{short}}(s,v)[\left({\mathcal{P}}_{\mathrm{long}}\,\varphi\mathbbm{1}_{\llbracket(2c-1)N,(1-2c)N\rrbracket}\right)(s)]\right)_{k}+N^{-D},

where (φ𝟙(2c1)N,(12c)N)j=φj𝟙(2c1)N,(12c)N(j)(\varphi\mathbbm{1}_{\llbracket(2c-1)N,(1-2c)N\rrbracket})_{j}=\varphi_{j}\mathbbm{1}_{\llbracket(2c-1)N,(1-2c)N\rrbracket}(j). Moreover, using the property that 𝒯short{\mathscr{T}}_{\mathrm{short}} is an LL^{\infty} contraction, we have

|((𝒯(u,v)𝒯short(u,v))φ(u))k||uv|sup|j|(12c)N,u<s<v|(𝒫longφ)j(s)|+ND.\left|{\left(\left({\mathscr{T}}(u,v)-{\mathscr{T}}_{\mathrm{short}}(u,v)\right)\varphi(u)\right)_{k}}\right|\leqslant|u-v|\sup_{|j|\leqslant(1-2c)N,u<s<v}\left|{\left({\mathcal{P}}_{\mathrm{long}}\,\varphi\right)_{j}(s)}\right|+N^{-D}. (55)

For |i|(1c)N|i|\leqslant(1-c)N, assuming α{\mathcal{H}}_{\alpha} and on the event that Lemma 3.7 holds, there exists a constant C>0C>0 so that

|φi(s)φj(s)||φi(s)φ^i(s)|+|φj(s)φ^j(s)|+|φ^i(s)φ^j(s)|NCε((Nt)αN2t+|ij|N2t).\left|{\varphi_{i}(s)-\varphi_{j}(s)}\right|\leqslant\left|{\varphi_{i}(s)-\widehat{\varphi}_{i}(s)}\right|+\left|{\varphi_{j}(s)-\widehat{\varphi}_{j}(s)}\right|+\left|{\widehat{\varphi}_{i}(s)-\widehat{\varphi}_{j}(s)}\right|\lesssim N^{C\varepsilon}\left(\frac{(Nt)^{\alpha}}{N^{2}t}+\frac{|i-j|}{N^{2}t}\right).

For |i|>(1c)N|i|>(1-c)N, Lemma 3.6 yields

|φi(s)φj(s)||φi(s)|+|φj(s)|N23+Cε(N+1|i|)13.|\varphi_{i}(s)-\varphi_{j}(s)|\leqslant|\varphi_{i}(s)|+|\varphi_{j}(s)|\lesssim N^{-\frac{2}{3}+C\varepsilon}(N+1-|i|)^{-\frac{1}{3}}.

Therefore, using rigidity of singular values, we have

(𝒫longφ)j(s)\displaystyle\left({\mathcal{P}}_{\mathrm{long}}\varphi\right)_{j}(s) =|ij|>lφi(s)φj(s)2N(si(s)sj(s))2\displaystyle=\sum_{|i-j|>l}\frac{\varphi_{i}(s)-\varphi_{j}(s)}{2N(s_{i}(s)-s_{j}(s))^{2}}
N1+Cε|ij|>l1(ij)2((Nt)αN2t+|ij|N2t)+N1+Cε|i|>(1c)NN23(N+1|i|)13\displaystyle\lesssim N^{1+C\varepsilon}\sum_{|i-j|>l}\frac{1}{(i-j)^{2}}\left(\frac{(Nt)^{\alpha}}{N^{2}t}+\frac{|i-j|}{N^{2}t}\right)+N^{-1+C\varepsilon}\sum_{|i|>(1-c)N}N^{-\frac{2}{3}}(N+1-|i|)^{-\frac{1}{3}}
NCε(Nl(Nt)αN2t+1Nt).\displaystyle\lesssim N^{C\varepsilon}\left(\frac{N}{l}\frac{(Nt)^{\alpha}}{N^{2}t}+\frac{1}{Nt}\right).

Combined with (55), this implies the desired claim for |k|\leqslant(1-3c)N. Since c>0 is arbitrary, this concludes the proof. ∎

Further, we will show that we have nice control for a regularization of the short-range dynamics with well-behaved initial data. To do this, we follow the techniques developed in [BY17]. Consider fixed times u<t, the short-range parameter l, and an averaging space window scale r. Throughout the remaining parts of this section, for any fixed arbitrarily small \varepsilon>0, we make the following assumption on these parameters

N30ε(tu)<N20εlN<N10εr<t.N^{30\varepsilon}(t-u)<N^{20\varepsilon}\dfrac{l}{N}<N^{10\varepsilon}r<t. (56)

For a fixed index kk, as in [BY17], we define the flattening operator with parameter a>0a>0 by

(af)j(v):={fj(v)if|jk|aφ^k(t)if|jk|>afor uvt,\left({{\mathcal{F}}}_{a}f\right)_{j}(v):=\left\{\begin{aligned} &f_{j}(v)&\ &\mbox{if}\ |j-k|\leqslant a\\ &\widehat{\varphi}_{k}(t)&\ &\mbox{if}\ |j-k|>a\end{aligned}\right.\ \ \ \mbox{for }u\leqslant v\leqslant t,

and the averaging operator

(𝒜f)j:=1|Nr,2Nr|aNr,2Nr(af)j\left({\mathcal{A}}f\right)_{j}:=\dfrac{1}{|\llbracket Nr,2Nr\rrbracket|}\sum_{a\in\llbracket Nr,2Nr\rrbracket}\left({{\mathcal{F}}}_{a}f\right)_{j}

As shown in [BY17, Equation (7.4)], the averaging operator can also be represented through a Lipschitz weight, i.e. there exists a Lipschitz function h with |h_{i}-h_{j}|\leqslant\tfrac{|i-j|}{Nr} such that

(𝒜f)j=hjfj+(1hj)φ^k(t).\left({\mathcal{A}}f\right)_{j}=h_{j}f_{j}+(1-h_{j})\widehat{\varphi}_{k}(t). (57)
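The flattening and averaging operators, together with the representation (57), can be spelled out directly. The following sketch uses our own indices and constants, with phi_hat standing in for the constant value \widehat{\varphi}_{k}(t); it recovers the Lipschitz weights h_{j} explicitly.

```python
import numpy as np

rng = np.random.default_rng(8)
n, k, Nr = 200, 100, 20
window = np.arange(Nr, 2 * Nr + 1)     # the parameter a ranges over [Nr, 2Nr]
f = rng.standard_normal(n)
phi_hat = 0.5                          # hypothetical stand-in for hat-phi_k(t)

def flatten(f, a):
    # (F_a f)_j = f_j if |j - k| <= a, and phi_hat otherwise
    out = np.full_like(f, phi_hat)
    mask = np.abs(np.arange(n) - k) <= a
    out[mask] = f[mask]
    return out

avg = np.mean([flatten(f, a) for a in window], axis=0)

# Representation (57): (Af)_j = h_j f_j + (1 - h_j) phi_hat, where h_j is the
# fraction of a in the window with a >= |j - k|, hence |h_i - h_j| <= |i-j|/Nr.
h = np.array([np.mean(window >= abs(j - k)) for j in range(n)])
print(np.allclose(avg, h * f + (1 - h) * phi_hat))  # True
```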

Finally, for u<v<tu<v<t, consider the regularized dynamics

{ddvΓj(v)=(𝒫short(v)Γ)j,Γ(u)=𝒜φ(u)\left\{\begin{aligned} &\dfrac{\mathrm{d}}{\mathrm{d}v}\Gamma_{j}(v)=\left({\mathcal{P}}_{\mathrm{short}}(v)\Gamma\right)_{j},\\ &\Gamma(u)={\mathcal{A}}\varphi(u)\end{aligned}\right.

The following lemma shows that averaging the regularized dynamics gives a good approximation of \widehat{\varphi}_{k}.

Lemma B.4.

Assume α{\mathcal{H}}_{\alpha}. Let c>0c>0 be any fixed small constant. There exists a constant C>0C>0 such that for any ε>0\varepsilon>0, N1+Cε<η,t<1N^{-1+C\varepsilon}<\eta,t<1, t2u<vt\tfrac{t}{2}\leqslant u<v\leqslant t, l>Nεl>N^{\varepsilon}, j,k(2c1)N,(12c)Nj,k\in\llbracket(2c-1)N,(1-2c)N\rrbracket such that |γjγk|<10r|\gamma_{j}-\gamma_{k}|<10r, and z=γj+iηz=\gamma_{j}+\text{\rm i}\eta, we have

|12NIm|ij|<lΓi(v)si(v)z(12NIm|ij|<l1si(v)z)φ^k(v)|(rNt+ηNt+(Nt)αN2t(lNr+Nηl+N(tu)l+1Nη)).\left|\dfrac{1}{2N}{\mathrm{Im}\,}\sum_{|i-j|<l}\dfrac{\Gamma_{i}(v)}{s_{i}(v)-z}-\left(\dfrac{1}{2N}{\mathrm{Im}\,}\sum_{|i-j|<l}\dfrac{1}{s_{i}(v)-z}\right)\widehat{\varphi}_{k}(v)\right|\\ \prec\left(\dfrac{r}{Nt}+\dfrac{\eta}{Nt}+\dfrac{(Nt)^{\alpha}}{N^{2}t}\left(\dfrac{l}{Nr}+\dfrac{N\eta}{l}+\dfrac{N(t-u)}{l}+\dfrac{1}{N\eta}\right)\right). (58)
Proof.

We decompose the left-hand side of (58) into three parts

12NIm|ij|<lΓi(v)si(v)z(12NIm|ij|<l1si(v)z)φ^k(v)=:I1+I2+I3,\dfrac{1}{2N}{\mathrm{Im}\,}\sum_{|i-j|<l}\dfrac{\Gamma_{i}(v)}{s_{i}(v)-z}-\left(\dfrac{1}{2N}{\mathrm{Im}\,}\sum_{|i-j|<l}\dfrac{1}{s_{i}(v)-z}\right)\widehat{\varphi}_{k}(v)=:I_{1}+I_{2}+I_{3},

where

I1\displaystyle I_{1} =12NIm|ij|<l(𝒯short(u,v)𝒜φ(u)𝒜𝒯short(u,v)φ(u))isi(v)z\displaystyle=\dfrac{1}{2N}{\mathrm{Im}\,}\sum_{|i-j|<l}\dfrac{\left({\mathscr{T}}_{\mathrm{short}}(u,v){\mathcal{A}}\varphi(u)-{\mathcal{A}}{\mathscr{T}}_{\mathrm{short}}(u,v)\varphi(u)\right)_{i}}{s_{i}(v)-z}
I2\displaystyle I_{2} =12NIm|ij|<l(𝒜𝒯short(u,v)φ(u)𝒜𝒯(u,v)φ(u))isi(v)z\displaystyle=\dfrac{1}{2N}{\mathrm{Im}\,}\sum_{|i-j|<l}\dfrac{\left({\mathcal{A}}{\mathscr{T}}_{\mathrm{short}}(u,v)\varphi(u)-{\mathcal{A}}{\mathscr{T}}(u,v)\varphi(u)\right)_{i}}{s_{i}(v)-z}
I3\displaystyle I_{3} =12NIm|ij|<l(𝒜𝒯(u,v)φ(u))iφ^k(v)si(v)z.\displaystyle=\dfrac{1}{2N}{\mathrm{Im}\,}\sum_{|i-j|<l}\dfrac{\left({\mathcal{A}}{\mathscr{T}}(u,v)\varphi(u)\right)_{i}-\widehat{\varphi}_{k}(v)}{s_{i}(v)-z}.

For the first term I_{1}, note that

(𝒯short(u,v)𝒜φ(u)𝒜𝒯short(u,v)φ(u))i=1|Nr,2Nr|aNr,2Nr(𝒯short(u,v)aφ(u)a𝒯short(u,v)φ(u))i.\left({\mathscr{T}}_{\mathrm{short}}(u,v){\mathcal{A}}\varphi(u)-{\mathcal{A}}{\mathscr{T}}_{\mathrm{short}}(u,v)\varphi(u)\right)_{i}=\\ \dfrac{1}{|\llbracket Nr,2Nr\rrbracket|}\sum_{a\in\llbracket Nr,2Nr\rrbracket}\left({\mathscr{T}}_{\mathrm{short}}(u,v){{\mathcal{F}}}_{a}\varphi(u)-{{\mathcal{F}}}_{a}{\mathscr{T}}_{\mathrm{short}}(u,v)\varphi(u)\right)_{i}.

When |ik|<aNεl|i-k|<a-N^{\varepsilon}l, we have

(a𝒯short(u,v)φ(u))i=(𝒯short(u,v)φ(u))i,\left({{\mathcal{F}}}_{a}{\mathscr{T}}_{\mathrm{short}}(u,v)\varphi(u)\right)_{i}=\left({\mathscr{T}}_{\mathrm{short}}(u,v)\varphi(u)\right)_{i},

and the finite speed of propagation (53) yields

(𝒯short(u,v)aφ(u))i=(𝒯short(u,v)φ(u))i+ND.\left({\mathscr{T}}_{\mathrm{short}}(u,v){{\mathcal{F}}}_{a}\varphi(u)\right)_{i}=\left({\mathscr{T}}_{\mathrm{short}}(u,v)\varphi(u)\right)_{i}+N^{-D}.

This gives

(𝒯short(u,v)aφ(u)a𝒯short(u,v)φ(u))i<ND\left({\mathscr{T}}_{\mathrm{short}}(u,v){{\mathcal{F}}}_{a}\varphi(u)-{{\mathcal{F}}}_{a}{\mathscr{T}}_{\mathrm{short}}(u,v)\varphi(u)\right)_{i}<N^{-D}

Similarly, this bound also holds in the case |ik|>a+Nεl|i-k|>a+N^{\varepsilon}l. Now suppose aNεl|ik|a+Nεla-N^{\varepsilon}l\leqslant|i-k|\leqslant a+N^{\varepsilon}l. Applying (23) and Hypothesis α{\mathcal{H}}_{\alpha}, we obtain

|(𝒯short(u,v)aφ(u)a𝒯short(u,v)φ(u))i|\displaystyle\quad\left|\left({\mathscr{T}}_{\mathrm{short}}(u,v){{\mathcal{F}}}_{a}\varphi(u)-{{\mathcal{F}}}_{a}{\mathscr{T}}_{\mathrm{short}}(u,v)\varphi(u)\right)_{i}\right|
maxm:||mk|a|2Nεl|φm(v)φ^k(t)|+ND\displaystyle\leqslant\max_{m:||m-k|-a|\leqslant 2N^{\varepsilon}l}|\varphi_{m}(v)-\widehat{\varphi}_{k}(t)|+N^{-D}
maxm:||mk|a|2Nεl|φm(v)φ^m(v)|+maxm:||mk|a|2Nεl|φ^m(v)φ^k(t)|+ND\displaystyle\leqslant\max_{m:||m-k|-a|\leqslant 2N^{\varepsilon}l}|\varphi_{m}(v)-\widehat{\varphi}_{m}(v)|+\max_{m:||m-k|-a|\leqslant 2N^{\varepsilon}l}|\widehat{\varphi}_{m}(v)-\widehat{\varphi}_{k}(t)|+N^{-D}
NCε((Nt)αN2t+r+(tu)Nt).\displaystyle\leqslant N^{C\varepsilon}\left(\dfrac{(Nt)^{\alpha}}{N^{2}t}+\dfrac{r+(t-u)}{Nt}\right).

Combined with the estimate above, this implies

I1NCεlNr((Nt)αN2t+r+(tu)Nt).I_{1}\leqslant N^{C\varepsilon}\dfrac{l}{Nr}\left(\dfrac{(Nt)^{\alpha}}{N^{2}t}+\dfrac{r+(t-u)}{Nt}\right). (59)

For the term I2I_{2}, note that |ij|<l|i-j|<l implies |i|(1c)N|i|\leqslant(1-c)N. Therefore, using the Lipschitz representation of the averaging operator (57), the short-range approximation (54) gives us

|(𝒜𝒯short(u,v)φ(u)𝒜𝒯(u,v)φ(u))i||(𝒯short(u,v)φ(u)𝒯(u,v)φ(u))i|NCε(tu)(Nl(Nt)αN2t+1Nt).\left|\left({\mathcal{A}}{\mathscr{T}}_{\mathrm{short}}(u,v)\varphi(u)-{\mathcal{A}}{\mathscr{T}}(u,v)\varphi(u)\right)_{i}\right|\\ \leqslant|\left({\mathscr{T}}_{\mathrm{short}}(u,v)\varphi(u)-{\mathscr{T}}(u,v)\varphi(u)\right)_{i}|\leqslant N^{C\varepsilon}(t-u)\left(\dfrac{N}{l}\dfrac{(Nt)^{\alpha}}{N^{2}t}+\dfrac{1}{Nt}\right).

This shows

I2NCε(tu)(Nl(Nt)αN2t+1Nt).I_{2}\leqslant N^{C\varepsilon}(t-u)\left(\dfrac{N}{l}\dfrac{(Nt)^{\alpha}}{N^{2}t}+\dfrac{1}{Nt}\right). (60)

Finally, for the term I_{3}, by the Lipschitz representation of the averaging operator (57), it can be rewritten in the following way:

I3\displaystyle I_{3} =12NImNiNhj(φi(v)φ^k(v))siz12NIm|ij|lhj(φi(v)φ^k(v))siz\displaystyle=\dfrac{1}{2N}{\mathrm{Im}\,}\sum_{-N\leqslant i\leqslant N}\dfrac{h_{j}(\varphi_{i}(v)-\widehat{\varphi}_{k}(v))}{s_{i}-z}-\dfrac{1}{2N}{\mathrm{Im}\,}\sum_{|i-j|\geqslant l}\dfrac{h_{j}(\varphi_{i}(v)-\widehat{\varphi}_{k}(v))}{s_{i}-z}
+12NIm|ij|<l(hihj)(φi(v)φ^k(v))siz+12NIm|ij|<l(1hi)(φ^k(t)φ^k(v))siz\displaystyle\quad+\dfrac{1}{2N}{\mathrm{Im}\,}\sum_{|i-j|<l}\dfrac{(h_{i}-h_{j})(\varphi_{i}(v)-\widehat{\varphi}_{k}(v))}{s_{i}-z}+\dfrac{1}{2N}{\mathrm{Im}\,}\sum_{|i-j|<l}\dfrac{(1-h_{i})(\widehat{\varphi}_{k}(t)-\widehat{\varphi}_{k}(v))}{s_{i}-z}
=:J1+J2+J3+J4.\displaystyle=:J_{1}+J_{2}+J_{3}+J_{4}.

Using (24) and (44), we control J1J_{1} in the following way

J1\displaystyle J_{1} ev22N(Im𝔖v(z)ev2ImSv(z)ImS0(zv)Im𝔖0(zv))+ImSv(z)(Im𝔖0(zv)NImS0(zv)φ^k(v))\displaystyle\leqslant\dfrac{e^{\frac{v}{2}}}{2N}\left({\mathrm{Im}\,}{\mathfrak{S}}_{v}(z)-e^{-\frac{v}{2}}\dfrac{{\mathrm{Im}\,}S_{v}(z)}{{\mathrm{Im}\,}S_{0}(z_{v})}{\mathrm{Im}\,}{\mathfrak{S}}_{0}(z_{v})\right)+{\mathrm{Im}\,}S_{v}(z)\left(\dfrac{{\mathrm{Im}\,}{\mathfrak{S}}_{0}(z_{v})}{N{\mathrm{Im}\,}S_{0}(z_{v})}-\widehat{\varphi}_{k}(v)\right)
NCε((Nt)αN3tη+1N2t+η+rNt).\displaystyle\leqslant N^{C\varepsilon}\left(\dfrac{(Nt)^{\alpha}}{N^{3}t\eta}+\dfrac{1}{N^{2}t}+\dfrac{\eta+r}{Nt}\right).

Applying (23) to estimate J2J_{2}, we obtain

J2\displaystyle J_{2} 12N|ij|lη(γiγj)2+η2(|φi(v)φ^i(v)|+|φ^i(v)φ^k(v)|)\displaystyle\leqslant\dfrac{1}{2N}\sum_{|i-j|\geqslant l}\dfrac{\eta}{(\gamma_{i}-\gamma_{j})^{2}+\eta^{2}}\left(|\varphi_{i}(v)-\widehat{\varphi}_{i}(v)|+|\widehat{\varphi}_{i}(v)-\widehat{\varphi}_{k}(v)|\right)
NCε2N|ij|lη(γiγj)2+η2((Nt)αN2t+|ij|N2t+NrN2t)\displaystyle\leqslant\dfrac{N^{C\varepsilon}}{2N}\sum_{|i-j|\geqslant l}\dfrac{\eta}{(\gamma_{i}-\gamma_{j})^{2}+\eta^{2}}\left(\dfrac{(Nt)^{\alpha}}{N^{2}t}+\dfrac{|i-j|}{N^{2}t}+\dfrac{Nr}{N^{2}t}\right)
NCε((Nt)αN2tNηl+ηNt+rNt).\displaystyle\leqslant N^{C\varepsilon}\left(\dfrac{(Nt)^{\alpha}}{N^{2}t}\dfrac{N\eta}{l}+\dfrac{\eta}{Nt}+\dfrac{r}{Nt}\right).

Similarly, by the Lipschitz property of {hi}\{h_{i}\}, we estimate J3J_{3} as follows

J312N|ij|<lη(γiγj)2+η2|ij|Nr((Nt)αN2t+|ij|N2t+NrN2t)NCεlNr((Nt)αN2t+rNt)NCε((Nt)αN2tlNr+rNt).J_{3}\leqslant\dfrac{1}{2N}\sum_{|i-j|<l}\dfrac{\eta}{(\gamma_{i}-\gamma_{j})^{2}+\eta^{2}}\dfrac{|i-j|}{Nr}\left(\dfrac{(Nt)^{\alpha}}{N^{2}t}+\dfrac{|i-j|}{N^{2}t}+\dfrac{Nr}{N^{2}t}\right)\\ \leqslant N^{C\varepsilon}\dfrac{l}{Nr}\left(\dfrac{(Nt)^{\alpha}}{N^{2}t}+\dfrac{r}{Nt}\right)\leqslant N^{C\varepsilon}\left(\dfrac{(Nt)^{\alpha}}{N^{2}t}\dfrac{l}{Nr}+\dfrac{r}{Nt}\right).

Using the same arguments, by (23), we have

J4NCε2N|ij|<lη(γiγj)2+η2tvNtNCεrNt.J_{4}\leqslant\dfrac{N^{C\varepsilon}}{2N}\sum_{|i-j|<l}\dfrac{\eta}{(\gamma_{i}-\gamma_{j})^{2}+\eta^{2}}\dfrac{t-v}{Nt}\leqslant N^{C\varepsilon}\dfrac{r}{Nt}.

Combining the bounds on $J_{1}$, $J_{2}$, $J_{3}$ and $J_{4}$, this leads to

I3NCε(rNt+ηNt+(Nt)αN2t(lNr+Nηl+1Nη))I_{3}\leqslant N^{C\varepsilon}\left(\dfrac{r}{Nt}+\dfrac{\eta}{Nt}+\dfrac{(Nt)^{\alpha}}{N^{2}t}\left(\dfrac{l}{Nr}+\dfrac{N\eta}{l}+\dfrac{1}{N\eta}\right)\right)

Combined with (59) and (60), we obtain the desired result. ∎

Finally, we have all the tools to prove Lemma 3.10.

Proof of Lemma 3.10.

We fix some small $c>0$ and consider an arbitrarily small $\varepsilon>0$. Throughout the proof, we work on the overwhelming-probability event on which Lemma B.2, Lemma B.3 and Lemma B.4 all hold. For a fixed index $k\in\llbracket(2c-1)N,(1-2c)N\rrbracket$, we have

|φk(t)Γk(t)||((𝒯(u,t)𝒯short(u,t))φ(u))k|+|(𝒯short(u,t)(φ(u)Γ(u)))k|.|\varphi_{k}(t)-\Gamma_{k}(t)|\leqslant\left|\left(({\mathscr{T}}(u,t)-{\mathscr{T}}_{\mathrm{short}}(u,t))\varphi(u)\right)_{k}\right|+\left|\left({\mathscr{T}}_{\mathrm{short}}(u,t)(\varphi(u)-\Gamma(u))\right)_{k}\right|.

By the definition of the averaging operator, we know that $\Gamma(u)={\mathcal{A}}\varphi(u)=\varphi(u)$ on the set $\{j:|j-k|\leqslant Nr\}$. Therefore, combined with the finite speed of propagation estimate (53) for the second term and the short-range approximation (54) for the first term, we obtain

|φk(t)Γk(t)|NCε(tu)((Nt)αN2tNl+1Nt)+N2022.|\varphi_{k}(t)-\Gamma_{k}(t)|\leqslant N^{C\varepsilon}(t-u)\left(\dfrac{(Nt)^{\alpha}}{N^{2}t}\dfrac{N}{l}+\dfrac{1}{Nt}\right)+N^{-2022}. (61)

It remains to estimate $|\Gamma_{k}(t)-\widehat{\varphi}_{k}(t)|$. Consider the function

M(v):=maxNiN(Γi(v)φ^k(t)).M(v):=\max_{-N\leqslant i\leqslant N}\left(\Gamma_{i}(v)-\widehat{\varphi}_{k}(t)\right).

As in Lemma 3.2, one can show a parabolic maximum principle for $M$; consequently $M$ is non-increasing in time, as the display below makes explicit. Moreover, note that $\Gamma_{i}(u)=\widehat{\varphi}_{k}(t)$ if $|i-k|\geqslant 2Nr$.
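Concretely, at any index $j$ where the maximum is attained, the short-range dynamics gives

\frac{\mathrm{d}}{\mathrm{d}v}\Gamma_{j}(v)=\sum_{|i-j|<l}\frac{\Gamma_{i}(v)-\Gamma_{j}(v)}{2N(s_{i}(v)-s_{j}(v))^{2}}\leqslant 0,

since every summand is nonpositive.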

Let $j=j(v)$ denote the index attaining the maximum. If there exists a time $u\leqslant v\leqslant t$ such that $|j-k|>3Nr$, then the finite speed of propagation estimate (53) gives us

M(t)M(v)=Γj(v)φ^k(t)N2022.M(t)\leqslant M(v)=\Gamma_{j}(v)-\widehat{\varphi}_{k}(t)\leqslant N^{-2022}. (62)

Suppose instead that $|j(v)-k|<3Nr$ for all $u\leqslant v\leqslant t$. In this case, we have

ddv(Γj(v)φ^k(t))=|ij|<lΓi(v)Γj(v)2N(si(v)sj(v))212N|ij|<lΓi(v)Γj(v)(si(v)sj(v))2+η2.\dfrac{\mathrm{d}}{\mathrm{d}v}\left(\Gamma_{j}(v)-\widehat{\varphi}_{k}(t)\right)=\sum_{|i-j|<l}\dfrac{\Gamma_{i}(v)-\Gamma_{j}(v)}{2N(s_{i}(v)-s_{j}(v))^{2}}\leqslant\dfrac{1}{2N}\sum_{|i-j|<l}\dfrac{\Gamma_{i}(v)-\Gamma_{j}(v)}{(s_{i}(v)-s_{j}(v))^{2}+\eta^{2}}.

Here we used that each summand is nonpositive, since $j$ attains the maximum, so that adding $\eta^{2}$ to the denominators only increases the sum. Writing $z=s_{j}(v)+\text{\rm i}\eta$, this gives us

ddv(Γj(v)φ^k(t))12NηIm|ij|<lΓi(v)si(v)z(12NηIm|ij|<l1si(v)z)Γj(v)\dfrac{\mathrm{d}}{\mathrm{d}v}\left(\Gamma_{j}(v)-\widehat{\varphi}_{k}(t)\right)\leqslant\dfrac{1}{2N\eta}{\mathrm{Im}\,}\sum_{|i-j|<l}\dfrac{\Gamma_{i}(v)}{s_{i}(v)-z}-\left(\dfrac{1}{2N\eta}{\mathrm{Im}\,}\sum_{|i-j|<l}\dfrac{1}{s_{i}(v)-z}\right)\Gamma_{j}(v)

and therefore

ddv(Γj(v)φ^k(t))\displaystyle\dfrac{\mathrm{d}}{\mathrm{d}v}\left(\Gamma_{j}(v)-\widehat{\varphi}_{k}(t)\right) 1η(12NIm|ij|<lΓi(v)si(v)z(12NIm|ij|<l1si(v)z)φ^k(v))\displaystyle\leqslant\dfrac{1}{\eta}\left(\dfrac{1}{2N}{\mathrm{Im}\,}\sum_{|i-j|<l}\dfrac{\Gamma_{i}(v)}{s_{i}(v)-z}-\left(\dfrac{1}{2N}{\mathrm{Im}\,}\sum_{|i-j|<l}\dfrac{1}{s_{i}(v)-z}\right)\widehat{\varphi}_{k}(v)\right)
+1η(12NIm|ij|<l1si(v)z)(φ^k(v)φ^k(t))\displaystyle\quad+\dfrac{1}{\eta}\left(\dfrac{1}{2N}{\mathrm{Im}\,}\sum_{|i-j|<l}\dfrac{1}{s_{i}(v)-z}\right)\left(\widehat{\varphi}_{k}(v)-\widehat{\varphi}_{k}(t)\right)
+1η(12NIm|ij|<l1si(v)z)(φ^k(t)Γj(v)).\displaystyle\quad+\dfrac{1}{\eta}\left(\dfrac{1}{2N}{\mathrm{Im}\,}\sum_{|i-j|<l}\dfrac{1}{s_{i}(v)-z}\right)\left(\widehat{\varphi}_{k}(t)-\Gamma_{j}(v)\right).

Applying Lemma B.4 and Lemma 3.7 yields

ddvM(v+)1ηM(v)+NCεη(rNt+ηNt+(Nt)αN2t(lNr+Nηl+N(tu)l+1Nη)),\dfrac{\mathrm{d}}{\mathrm{d}v}M(v^{+})\lesssim-\dfrac{1}{\eta}M(v)+\dfrac{N^{C\varepsilon}}{\eta}\left(\dfrac{r}{Nt}+\dfrac{\eta}{Nt}+\dfrac{(Nt)^{\alpha}}{N^{2}t}\left(\dfrac{l}{Nr}+\dfrac{N\eta}{l}+\dfrac{N(t-u)}{l}+\dfrac{1}{N\eta}\right)\right),

where the left-hand side denotes the right derivative of $M$ at time $v$. Setting $\eta=(t-u)N^{-\varepsilon}$, the above inequality leads to

M(t)NCε(rNt+(Nt)αN2t(lNr+N(tu)l+1N(tu)))M(t)\leqslant N^{C\varepsilon}\left(\dfrac{r}{Nt}+\dfrac{(Nt)^{\alpha}}{N^{2}t}\left(\dfrac{l}{Nr}+\dfrac{N(t-u)}{l}+\dfrac{1}{N(t-u)}\right)\right)
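Indeed, writing $\mathcal{E}$ for the quantity in parentheses in the differential inequality, a Grönwall-type integration yields

M(t)\leqslant\mathrm{e}^{-c(t-u)/\eta}M(u)+CN^{C\varepsilon}\mathcal{E}=\mathrm{e}^{-cN^{\varepsilon}}M(u)+CN^{C\varepsilon}\mathcal{E},

and since $M(u)$ is at most polynomially large in $N$, the first term is $O(N^{-D})$ for any fixed $D>0$; substituting $\eta=(t-u)N^{-\varepsilon}$ into $\mathcal{E}$ and absorbing powers of $N^{\varepsilon}$ into $N^{C\varepsilon}$ gives the bound above (the term $\eta/(Nt)$ is dominated by $r/(Nt)$ for the parameter choice below).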

Choosing

r=(Nt)3α4N,l=(Nt)α2,(tu)=(Nt)α4N,r=\dfrac{(Nt)^{\frac{3\alpha}{4}}}{N},\ \ l=(Nt)^{\frac{\alpha}{2}},\ \ (t-u)=\dfrac{(Nt)^{\frac{\alpha}{4}}}{N}, (63)

we then obtain

M(t)<NCε(Nt)3α4N2tM(t)<N^{C\varepsilon}\dfrac{(Nt)^{\frac{3\alpha}{4}}}{N^{2}t}
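Indeed, the choice (63) balances the four terms in the previous bound:

\frac{l}{Nr}=\frac{N(t-u)}{l}=\frac{1}{N(t-u)}=(Nt)^{-\frac{\alpha}{4}},\qquad\frac{r}{Nt}=\frac{(Nt)^{\frac{3\alpha}{4}}}{N^{2}t},

so that all four terms equal $(Nt)^{\alpha}(Nt)^{-\frac{\alpha}{4}}/(N^{2}t)=(Nt)^{\frac{3\alpha}{4}}/(N^{2}t)$; the right-hand side of (61) is of the same order.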

Similarly, the same bound also holds for $\max_{-N\leqslant i\leqslant N}\left(\widehat{\varphi}_{k}(t)-\Gamma_{i}(t)\right)$, which controls the deviation from below. Combined with (61) and (62), this completes the proof. ∎

Appendix C Proofs for Quantitative Universality

In this section, we prove the quantitative resolvent comparison Proposition 4.1.

The key idea is based on the Lindeberg exchange method (for a detailed introduction we refer to the monographs [EY17, VH14]). We first fix an ordering map of the indices $\phi:\{(i,j):1\leqslant i,j\leqslant N\}\to\llbracket N^{2}\rrbracket$. For $0\leqslant k\leqslant N^{2}$, let $H_{k}$ be the random matrix defined as

(Hk)ij={Xijif ϕ(i,j)kYijotherwise,(H_{k})_{ij}=\begin{cases}X_{ij}&\textrm{if }\phi(i,j)\leqslant k\\ Y_{ij}&\textrm{otherwise}\end{cases},

so that $H_{0}=Y$ and $H_{N^{2}}=X$. By a telescoping summation, it suffices to show that the following bound holds uniformly in $1\leqslant k\leqslant N^{2}$,

|𝔼[F(Trf(Hk))]𝔼[F(Trf(Hk1))]|NCεN2(1ρN2+(ρNa)5N+tρNa)\left|\mathbb{E}[F(\operatorname{Tr}f(H_{k}))]-\mathbb{E}[F(\operatorname{Tr}f(H_{k-1}))]\right|\leqslant\frac{N^{C\varepsilon}}{N^{2}}\left(\frac{1}{\rho N^{2}}+\frac{\left(\rho N^{a}\right)^{5}}{\sqrt{N}}+t\rho N^{a}\right) (64)
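For reference, the telescoping step behind this reduction reads

\left|\mathbb{E}[F(\operatorname{Tr}f(X))]-\mathbb{E}[F(\operatorname{Tr}f(Y))]\right|\leqslant\sum_{k=1}^{N^{2}}\left|\mathbb{E}[F(\operatorname{Tr}f(H_{k}))]-\mathbb{E}[F(\operatorname{Tr}f(H_{k-1}))]\right|,

so that (64), summed over the $N^{2}$ values of $k$, yields a total error of $N^{C\varepsilon}\left(\frac{1}{\rho N^{2}}+\frac{(\rho N^{a})^{5}}{\sqrt{N}}+t\rho N^{a}\right)$.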

To prove (64), we use the Helffer-Sjöstrand formula. Let χ\chi be a smooth symmetric cutoff function such that χ(y)=1\chi(y)=1 if |y|<Na|y|<N^{-a} and χ(y)=0\chi(y)=0 if |y|>2Na|y|>2N^{-a}, with χNa\|\chi^{\prime}\|_{\infty}\leqslant N^{a}. For any matrix H{Hk}k=0N2H\in\left\{H_{k}\right\}_{k=0}^{N^{2}}, let H~{\widetilde{H}} denote its Girko symmetrization

H~=(0HH0){\widetilde{H}}=\left(\begin{matrix}0&H^{\top}\\ H&0\end{matrix}\right)

Recall that the symmetrized singular values {σi(H)}i=NN\left\{\sigma_{i}(H)\right\}_{i=-N}^{N} are the eigenvalues of H~{\widetilde{H}}. With the cutoff function χ\chi, applying Lemma A.2 to H~{\widetilde{H}} yields

Trf(H)=g(z)Tr(H~z)1d2z,\operatorname{Tr}f(H)=\int_{\mathbb{C}}g(z)\operatorname{Tr}({\widetilde{H}}-z)^{-1}\,\mathrm{d}^{2}z, (65)

where d2z\mathrm{d}^{2}z is the Lebesgue measure on \mathbb{C} and

g(z):=1π[iyf′′(x)χ(y)+i(f(x)+iyf(x))χ(y)],z=x+yi.g(z):=\frac{1}{\pi}\left[\text{\rm i}yf^{\prime\prime}(x)\chi(y)+\text{\rm i}\left(f(x)+\text{\rm i}yf^{\prime}(x)\right)\chi^{\prime}(y)\right],\ \ z=x+y\text{\rm i}.
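As an aside, the two constructions above (the interpolating matrices $H_{k}$ and the Girko symmetrization) are easy to check numerically. The following is a minimal Python sketch, not part of the proof; the helper names, the variance-$1/N$ Gaussian entries, and the row-major choice of $\phi$ are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
N = 4
X = rng.standard_normal((N, N)) / np.sqrt(N)  # illustrative variance-1/N entries
Y = rng.standard_normal((N, N)) / np.sqrt(N)

def H(k):
    # H_k: the first k entries in the (0-based) row-major ordering phi come
    # from X, the remaining ones from Y.
    mask = np.arange(N * N).reshape(N, N) < k
    return np.where(mask, X, Y)

assert np.array_equal(H(0), Y) and np.array_equal(H(N * N), X)  # H_0 = Y, H_{N^2} = X

def girko(M):
    # Girko symmetrization [[0, M^T], [M, 0]]: a symmetric matrix whose
    # eigenvalues are the symmetrized singular values of M.
    Z = np.zeros_like(M)
    return np.block([[Z, M.T], [M, Z]])

for k in (0, N, N * N):
    eigs = np.linalg.eigvalsh(girko(H(k)))             # ascending eigenvalues
    svals = np.linalg.svd(H(k), compute_uv=False)      # singular values
    assert np.allclose(eigs, np.sort(np.concatenate([-svals, svals])))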

The comparison proceeds in the following steps:

Step 1: Approximation of Trf(H)\operatorname{Tr}f(H). We first truncate the integral in (65) and define

𝒯(H):=|y|>N2g(z)Tr(H~z)1d2z.{\mathcal{T}}(H):=\int_{|y|>N^{-2}}g(z)\operatorname{Tr}({\widetilde{H}}-z)^{-1}\mathrm{d}^{2}z.

The approximation error can be bounded by

|Trf(H)𝒯(H)|\displaystyle\left|{\operatorname{Tr}f(H)-{\mathcal{T}}(H)}\right| |y|<N2,E<|x|<E+ρ|f′′(x)|NkNy2|σk(x+yi)|2dxdy\displaystyle\lesssim\iint_{|y|<N^{-2},E<|x|<E+\rho}|f^{\prime\prime}(x)|\sum_{-N\leqslant k\leqslant N}\frac{y^{2}}{|\sigma_{k}-(x+y\text{\rm i})|^{2}}\mathrm{d}x\mathrm{d}y
E<|x|<E+ρ1ρ2N2(1N2NkN1|σk(x+iN)|2)dx.\displaystyle\lesssim\int_{E<|x|<E+\rho}\frac{1}{\rho^{2}N^{2}}\left(\frac{1}{N^{2}}\sum_{-N\leqslant k\leqslant N}\frac{1}{|\sigma_{k}-(x+\frac{\text{\rm i}}{N})|^{2}}\right)\mathrm{d}x.
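The second inequality follows from an elementary two-case estimate: writing $a=|\sigma_{k}-x|$, for each $k$,

\int_{0}^{N^{-2}}\frac{y^{2}}{a^{2}+y^{2}}\,\mathrm{d}y\leqslant\min\left(N^{-2},\,\frac{N^{-6}}{3a^{2}}\right)\lesssim\frac{N^{-4}}{a^{2}+N^{-2}},

together with the bound $|f^{\prime\prime}(x)|\lesssim\rho^{-2}$, which is implicit in the first line since $f$ varies on scale $\rho$.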

For singular values near the origin, i.e. |k|NCε|k|\leqslant N^{C\varepsilon}, we have

E<|x|<E+ρ1|σk(x+iN)|2dxE<|x|<E+ρN2dxρN2.\int_{E<|x|<E+\rho}\frac{1}{|\sigma_{k}-(x+\frac{i}{N})|^{2}}\mathrm{d}x\leqslant\int_{E<|x|<E+\rho}N^{2}\mathrm{d}x\lesssim\rho N^{2}.

On the other hand, for |k|>NCε|k|>N^{C\varepsilon}, by the rigidity of singular values, we have the following overwhelming probability bound

E<|x|<E+ρ1|σk(x+iN)|2dxρ|Eγk|2.\int_{E<|x|<E+\rho}\frac{1}{|\sigma_{k}-(x+\frac{i}{N})|^{2}}\mathrm{d}x\leqslant\frac{\rho}{|E-\gamma_{k}|^{2}}.

Combining the two bounds above, we obtain

|Trf(H)𝒯(H)|NCερN2+1ρN4|k|>NCε1|Eγk|2NCερN2+1ρN3(1Nk11(k/N)2)NCερN2\left|{\operatorname{Tr}f(H)-{\mathcal{T}}(H)}\right|\leqslant\frac{N^{C\varepsilon}}{\rho N^{2}}+\frac{1}{\rho N^{4}}\sum_{|k|>N^{C\varepsilon}}\frac{1}{|E-\gamma_{k}|^{2}}\lesssim\frac{N^{C\varepsilon}}{\rho N^{2}}+\frac{1}{\rho N^{3}}\left(\frac{1}{N}\sum_{k\geqslant 1}\frac{1}{(k/N)^{2}}\right)\lesssim\frac{N^{C\varepsilon}}{\rho N^{2}}

with overwhelming probability.

Step 2: Expansions and moment matching. With the approximation by 𝒯(H){\mathcal{T}}(H), it suffices to show that

|𝔼[F(𝒯(Hk))]𝔼[F(𝒯(Hk1))]|NCεN2((ρNa)5N+tρNa)\left|{\mathbb{E}[F({\mathcal{T}}(H_{k}))]-\mathbb{E}[F({\mathcal{T}}(H_{k-1}))]}\right|\lesssim\frac{N^{C\varepsilon}}{N^{2}}\left(\frac{(\rho N^{a})^{5}}{\sqrt{N}}+t\rho N^{a}\right) (66)

uniformly for all 1kN21\leqslant k\leqslant N^{2}. Now consider a fixed 1ωN21\leqslant\omega\leqslant N^{2} corresponding to the index (i,j)(i,j), i.e. ϕ(i,j)=ω\phi(i,j)=\omega. We rewrite the matrices HωH_{\omega} and Hω1H_{\omega-1} in the following way

Hω=W+1NU,Hω1=W+1NV,H_{\omega}=W+\frac{1}{\sqrt{N}}U,\ \ H_{\omega-1}=W+\frac{1}{\sqrt{N}}V,

where the matrix $W$ coincides with $H_{\omega-1}$ and $H_{\omega}$ except on the $(i,j)$ entry, where $W_{ij}=0$. The matrices $U,V$ then satisfy $U_{ij}=\sqrt{N}X_{ij}$ and $V_{ij}=\sqrt{N}Y_{ij}$, with all other entries equal to zero. Recall the notation ${\widetilde{H}}$ for the Girko symmetrization, and consider the resolvents of the matrices ${\widetilde{W}}$ and ${\widetilde{H}}_{\omega}$

R:=(W~z)1,S:=(H~ωz)1.R:=({\widetilde{W}}-z)^{-1},\ \ S:=({\widetilde{H}}_{\omega}-z)^{-1}.

The Taylor expansion yields

𝔼[F(𝒯(Hω))]𝔼[F(𝒯(Hω1))]\displaystyle\mathbb{E}[F({\mathcal{T}}(H_{\omega}))]-\mathbb{E}[F({\mathcal{T}}(H_{\omega-1}))] (67)
=k=14𝔼[F(k)(𝒯(W))k!((𝒯(Hω)𝒯(W))k(𝒯(Hω1)𝒯(W))k)]\displaystyle\quad=\sum_{k=1}^{4}\mathbb{E}\left[\frac{F^{(k)}({\mathcal{T}}(W))}{k!}\left(({\mathcal{T}}(H_{\omega})-{\mathcal{T}}(W))^{k}-({\mathcal{T}}(H_{\omega-1})-{\mathcal{T}}(W))^{k}\right)\right]
+O(F(5))𝔼[(𝒯(Hω)𝒯(W))5+(𝒯(Hω1)𝒯(W))5].\displaystyle\quad\quad+O(\|F^{(5)}\|_{\infty})\,\mathbb{E}\left[({\mathcal{T}}(H_{\omega})-{\mathcal{T}}(W))^{5}+({\mathcal{T}}(H_{\omega-1})-{\mathcal{T}}(W))^{5}\right].

We first control the term corresponding to the fifth derivative. By Lemma A.3, the first order resolvent expansion gives us

\operatorname{Tr}S-\operatorname{Tr}R=-\frac{1}{\sqrt{N}}\operatorname{Tr}S{\widetilde{U}}R.

Consequently,

|𝒯(Hω)𝒯(W)|1N|y|>N2|g(z)||TrS(z)U~R(z)|d2z.\left|{{\mathcal{T}}(H_{\omega})-{\mathcal{T}}(W)}\right|\leqslant\frac{1}{\sqrt{N}}\int_{|y|>N^{-2}}|g(z)|\left|{\operatorname{Tr}S(z){\widetilde{U}}R(z)}\right|\mathrm{d}^{2}z.

We may restrict the integral to the domain $\left\{z=x+y\text{\rm i}:N^{-2}<|y|<2N^{-a},E<|x|<E+\rho\right\}$, as the contribution outside this region is negligible. Moreover, a key observation is that the matrix ${\widetilde{U}}$ has only two non-zero entries. Thus,

|𝒯(Hω)𝒯(W)|N12+CεN2<|y|<2Na,E<|x|<E+ρ|g(z)|(maxk|Sk(z)|)(maxk|Rk(z)|)d2z+N12+CεN2<|y|<2Na,E<|x|<E+ρ|g(z)|(maxk|Skk(z)|)(maxk|Rkk(z)|)d2z.\left|{{\mathcal{T}}(H_{\omega})-{\mathcal{T}}(W)}\right|\lesssim N^{\frac{1}{2}+C\varepsilon}\int_{N^{-2}<|y|<2N^{-a},E<|x|<E+\rho}|g(z)|\bigg{(}\max_{k\neq\ell}|S_{k\ell}(z)|\bigg{)}\bigg{(}\max_{k\neq\ell}|R_{k\ell}(z)|\bigg{)}\mathrm{d}^{2}z\\ +N^{-\frac{1}{2}+C\varepsilon}\int_{N^{-2}<|y|<2N^{-a},E<|x|<E+\rho}|g(z)|\left(\max_{k}|S_{kk}(z)|\right)\left(\max_{k}|R_{kk}(z)|\right)\mathrm{d}^{2}z.
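In more detail, ${\widetilde{U}}$ has exactly two non-zero entries, at the positions $(i+N,j)$ and $(j,i+N)$, and both are bounded by $N^{C\varepsilon}$ with overwhelming probability under the usual moment assumptions on the entries. Expanding the trace,

\frac{1}{\sqrt{N}}\left|\operatorname{Tr}S(z){\widetilde{U}}R(z)\right|\leqslant\frac{N^{C\varepsilon}}{\sqrt{N}}\sum_{(\alpha,\beta)}\sum_{1\leqslant\ell\leqslant 2N}|S_{\ell\alpha}(z)||R_{\beta\ell}(z)|,

where $(\alpha,\beta)$ runs over the two non-zero positions; the roughly $2N$ off-diagonal terms $\ell\notin\{i+N,j\}$ produce the first integrand above, and the $O(1)$ remaining terms produce the second.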

Note that in this integral domain the scale of $|y|$ is below the natural scale of the local law. We therefore use a suboptimal version of the local semicircle law, valid on a larger spectral domain, which was discussed in [EKYY13, LS18]. For $z=x+\text{\rm i}y$ in this integral domain, with overwhelming probability we have

maxk,|Sk(z)δkmsc(z)|NCε(1N+Ψ(z)),Ψ(z)=1Ny+Immsc(z)Ny.\max_{k,\ell}\left|{S_{k\ell}(z)-\delta_{k\ell}\,m_{sc}(z)}\right|\leqslant N^{C\varepsilon}\left(\frac{1}{\sqrt{N}}+\Psi(z)\right),\ \ \Psi(z)=\frac{1}{Ny}+\sqrt{\frac{{\mathrm{Im}\,}m_{sc}(z)}{Ny}}.

The same result also holds for Rk(z)R_{k\ell}(z). By Lemma A.4, we have Ψ(z)1Ny\Psi(z)\lesssim\frac{1}{\sqrt{Ny}} for zz in the integral domain. Note that the contribution of the diagonal resolvent entries is negligible. Therefore, with overwhelming probability we have

|𝒯(Hω)𝒯(W)|NCεN1/2N2<|y|<2Na,E<|x|<E+ρ|g(z)|Nyd2zNCεN1/2ρNa.\left|{{\mathcal{T}}(H_{\omega})-{\mathcal{T}}(W)}\right|\lesssim N^{C\varepsilon}N^{1/2}\int_{N^{-2}<|y|<2N^{-a},E<|x|<E+\rho}\frac{|g(z)|}{Ny}\mathrm{d}^{2}z\lesssim N^{C\varepsilon}N^{-1/2}\rho N^{a}.

Similarly, this bound also holds for |𝒯(Hω1)𝒯(W)||{\mathcal{T}}(H_{\omega-1})-{\mathcal{T}}(W)|, and we obtain

𝔼[(𝒯(Hω)𝒯(W))5]NCεN2(N12(ρNa)5),𝔼[(𝒯(Hω1)𝒯(W))5]NCεN2(N12(ρNa)5).\mathbb{E}[({\mathcal{T}}(H_{\omega})-{\mathcal{T}}(W))^{5}]\lesssim\frac{N^{C\varepsilon}}{N^{2}}\left(N^{-\frac{1}{2}}(\rho N^{a})^{5}\right),\ \ \mathbb{E}[({\mathcal{T}}(H_{\omega-1})-{\mathcal{T}}(W))^{5}]\lesssim\frac{N^{C\varepsilon}}{N^{2}}\left(N^{-\frac{1}{2}}(\rho N^{a})^{5}\right).
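Indeed, on the overwhelming-probability event above,

\left(N^{C\varepsilon}N^{-\frac{1}{2}}\rho N^{a}\right)^{5}=N^{5C\varepsilon}\,\frac{1}{N^{2}}\cdot\frac{(\rho N^{a})^{5}}{\sqrt{N}},

and after renaming $5C\varepsilon$ as $C\varepsilon$ this is the stated estimate; the complementary event contributes only $O(N^{-D})$ by the deterministic polynomial bound on the resolvent entries.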

Hence the fifth order term in (67) is bounded by

O(F(5))𝔼[(𝒯(Hω)𝒯(W))5+(𝒯(Hω1)𝒯(W))5]NCεN2(ρNa)5N.O(\|F^{(5)}\|_{\infty})\,\mathbb{E}\left[({\mathcal{T}}(H_{\omega})-{\mathcal{T}}(W))^{5}+({\mathcal{T}}(H_{\omega-1})-{\mathcal{T}}(W))^{5}\right]\lesssim\frac{N^{C\varepsilon}}{N^{2}}\frac{(\rho N^{a})^{5}}{\sqrt{N}}. (68)

Now we consider the first term k=1k=1 in the Taylor expansion (67). Denote

\widehat{R}:=\frac{1}{N}\operatorname{Tr}R,\ \ \widehat{R}_{X}^{(m)}:=\frac{(-1)^{m}}{N}\operatorname{Tr}(R{\widetilde{U}})^{m}R,\ \ \Omega_{X}:=-\frac{1}{N}\operatorname{Tr}(R{\widetilde{U}})^{5}S,

and also define

R^Y(m):=(1)mNTr(RV~)mR,ΩY:=1NTr(RV~)5(H~ω1z)1.\widehat{R}_{Y}^{(m)}:=\frac{(-1)^{m}}{N}\operatorname{Tr}(R{\widetilde{V}})^{m}R,\ \ \Omega_{Y}:=-\frac{1}{N}\operatorname{Tr}(R{\widetilde{V}})^{5}({\widetilde{H}}_{\omega-1}-z)^{-1}.

Using the resolvent expansion (Lemma A.3) up to the fifth order, we obtain

1NTrS=R^+m=14Nm2R^X(m)+N52ΩX.\frac{1}{N}\operatorname{Tr}S=\widehat{R}+\sum_{m=1}^{4}N^{-\frac{m}{2}}\widehat{R}_{X}^{(m)}+N^{-\frac{5}{2}}\Omega_{X}.
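This is the iterated resolvent identity: since ${\widetilde{H}}_{\omega}-{\widetilde{W}}=N^{-\frac{1}{2}}{\widetilde{U}}$, we have $S=R-N^{-\frac{1}{2}}R{\widetilde{U}}S$, and iterating five times gives

S=\sum_{m=0}^{4}\left(-\frac{1}{\sqrt{N}}\right)^{m}(R{\widetilde{U}})^{m}R+\left(-\frac{1}{\sqrt{N}}\right)^{5}(R{\widetilde{U}})^{5}S;

taking $\frac{1}{N}\operatorname{Tr}$ recovers the displayed expansion, with the signs absorbed into $\widehat{R}_{X}^{(m)}$ and $\Omega_{X}$.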

A similar expansion holds for $({\widetilde{H}}_{\omega-1}-z)^{-1}$, with ${\widetilde{U}}$ replaced by ${\widetilde{V}}$. Since ${\mathcal{T}}$ involves the unnormalized trace, each term carries an extra factor of $N$, and we have

𝔼[F(𝒯(W))(𝒯(Hω)𝒯(Hω1))]=𝔼[F(𝒯(W))|y|>N2g(z)(m=14Nm2+1(R^X(m)R^Y(m))+N32(ΩXΩY))d2z]\mathbb{E}\left[F^{\prime}({\mathcal{T}}(W))\left({\mathcal{T}}(H_{\omega})-{\mathcal{T}}(H_{\omega-1})\right)\right]\\ =\mathbb{E}\left[F^{\prime}({\mathcal{T}}(W))\int_{|y|>N^{-2}}g(z)\left(\sum_{m=1}^{4}N^{-\frac{m}{2}+1}(\widehat{R}_{X}^{(m)}-\widehat{R}_{Y}^{(m)})+N^{-\frac{3}{2}}(\Omega_{X}-\Omega_{Y})\right)\mathrm{d}^{2}z\right] (69)

A key observation is that for $1\leqslant m\leqslant 3$, the expectations of $\widehat{R}_{X}^{(m)}$ and $\widehat{R}_{Y}^{(m)}$ depend only on the first three moments of $X_{ij}$ and $Y_{ij}$. Recall that the first three moments of $X_{ij}$ and $Y_{ij}$ are identical. Therefore, the terms corresponding to $1\leqslant m\leqslant 3$ in (69) make no contribution.

Step 3: Higher order error. For the m=4m=4 term in (69), note that

Tr(RU~)4R=12N{αk,βk}={i+N,j}Rα1U~α1β1Rβ1α2U~α2β2Rβ2α3U~α3β3Rβ3α4U~α4β4Rβ4.\operatorname{Tr}(R{\widetilde{U}})^{4}R=\sum_{1\leqslant\ell\leqslant 2N}\sum_{\{\alpha_{k},\beta_{k}\}=\{i+N,j\}}R_{\ell\alpha_{1}}{\widetilde{U}}_{\alpha_{1}\beta_{1}}R_{\beta_{1}\alpha_{2}}{\widetilde{U}}_{\alpha_{2}\beta_{2}}R_{\beta_{2}\alpha_{3}}{\widetilde{U}}_{\alpha_{3}\beta_{3}}R_{\beta_{3}\alpha_{4}}{\widetilde{U}}_{\alpha_{4}\beta_{4}}R_{\beta_{4}\ell}. (70)

A similar formula holds for $\operatorname{Tr}(R{\widetilde{V}})^{4}R$. Note that typically $\ell\neq\alpha_{1}$ and $\ell\neq\beta_{4}$, but we may have $\beta_{1}=\alpha_{2}$, $\beta_{2}=\alpha_{3}$, $\beta_{3}=\alpha_{4}$. Moreover, the terms with either $\ell=i+N$ or $\ell=j$ are combinatorially negligible in the summation, and we may therefore ignore them in the following computations. Recall that the difference between the fourth moments of $X_{ij}$ and $Y_{ij}$ is bounded by $t$. Thus, we have

𝔼[N(R^X(4)R^Y(4))]=𝔼[Tr(RU~)4RTr(RV~)4R]Nt(maxk|Rk|)2(maxk|Rkk|)3.\mathbb{E}\left[N(\widehat{R}_{X}^{(4)}-\widehat{R}_{Y}^{(4)})\right]=\mathbb{E}\left[\operatorname{Tr}(R{\widetilde{U}})^{4}R-\operatorname{Tr}(R{\widetilde{V}})^{4}R\right]\lesssim Nt\left(\max_{k\neq\ell}|R_{k\ell}|\right)^{2}\left(\max_{k}|R_{kk}|\right)^{3}.

As mentioned above, the integral in (69) may be restricted to the domain $N^{-2}<|y|<2N^{-a}$, $E<|x|<E+\rho$. In this region, the entries of the resolvent are bounded by $\max_{k\neq\ell}|R_{k\ell}(z)|\lesssim\frac{N^{C\varepsilon}}{\sqrt{Ny}}$ and $\max_{k}|R_{kk}(z)|\lesssim N^{C\varepsilon}$. As a consequence,

|𝔼[F(𝒯(W))|y|>N2g(z)N42+1(R^X(4)R^Y(4))d2z]|NCεtNN2<|y|<2Na,E<|x|<E+ρ|g(z)|Nyd2zNCεN2(tρNa).\left|\mathbb{E}\left[F^{\prime}({\mathcal{T}}(W))\int_{|y|>N^{-2}}g(z)N^{-\frac{4}{2}+1}(\widehat{R}_{X}^{(4)}-\widehat{R}_{Y}^{(4)})\mathrm{d}^{2}z\right]\right|\\ \lesssim N^{C\varepsilon}\frac{t}{N}\int_{N^{-2}<|y|<2N^{-a},E<|x|<E+\rho}\frac{|g(z)|}{Ny}\mathrm{d}^{2}z\lesssim\frac{N^{C\varepsilon}}{N^{2}}(t\rho N^{a}). (71)

For the term $\Omega_{X}-\Omega_{Y}$, since these quantities involve higher moments of $X_{ij}$ and $Y_{ij}$, we simply bound the difference by the sizes of $\Omega_{X}$ and $\Omega_{Y}$. By an expansion similar to (70) and the local law, we have $|\Omega_{X}|,|\Omega_{Y}|\lesssim\frac{N^{C\varepsilon}}{Ny}$. Therefore,

|𝔼[F(𝒯(W))|y|>N2g(z)N32ΩXd2z]|NCεN32N2<|y|<2Na,E<|x|<E+ρ|g(z)|Nyd2zNCεN2ρNaNNCεN2(ρNa)5N.\left|\mathbb{E}\left[F^{\prime}({\mathcal{T}}(W))\int_{|y|>N^{-2}}g(z)N^{-\frac{3}{2}}\Omega_{X}\mathrm{d}^{2}z\right]\right|\\ \lesssim N^{C\varepsilon}N^{-\frac{3}{2}}\int_{N^{-2}<|y|<2N^{-a},E<|x|<E+\rho}\frac{|g(z)|}{Ny}d^{2}z\lesssim\frac{N^{C\varepsilon}}{N^{2}}\frac{\rho N^{a}}{\sqrt{N}}\leqslant\frac{N^{C\varepsilon}}{N^{2}}\frac{(\rho N^{a})^{5}}{\sqrt{N}}. (72)

The same bound also holds for ΩY\Omega_{Y}.

Finally, as explained in the classical literature on random matrix theory (see e.g. [EY17, Theorem 17.4]), the contributions of the higher order terms in the Taylor expansion (67) are of smaller order. Consequently, combining (68), (71) and (72) yields the claim (66), which implies the desired result (33).

References

  • [AEK14] Oskari Ajanki, László Erdős, and Torben Krüger. Local semicircle law with imprimitive variance matrix. Electron. Commun. Probab., 19:no. 33, 9, 2014.
  • [AEK17] Johannes Alt, László Erdős, and Torben Krüger. Local law for random Gram matrices. Electron. J. Probab., 22:Paper No. 25, 41, 2017.
  • [AMR11] David Arthur, Bodo Manthey, and Heiko Röglin. Smoothed analysis of the k-means method. Journal of the ACM (JACM), 58(5):1–31, 2011.
  • [BC10] Peter Bürgisser and Felipe Cucker. Smoothed analysis of Moore–Penrose inversion. SIAM Journal on Matrix Analysis and Applications, 31(5):2769–2783, 2010.
  • [BCMV14] Aditya Bhaskara, Moses Charikar, Ankur Moitra, and Aravindan Vijayaraghavan. Smoothed analysis of tensor decompositions. In Proceedings of the forty-sixth annual ACM symposium on Theory of computing, pages 594–603, 2014.
  • [BEK+14] Alex Bloemendal, László Erdős, Antti Knowles, Horng-Tzer Yau, and Jun Yin. Isotropic local laws for sample covariance and generalized Wigner matrices. Electron. J. Probab., 19:no. 33, 53, 2014.
  • [BGK17] Florent Benaych-Georges and Antti Knowles. Local semicircle law for Wigner matrices. In Advanced topics in random matrices, volume 53 of Panor. Synthèses, pages 1–90. Soc. Math. France, Paris, 2017.
  • [Bou22] Paul Bourgade. Extreme gaps between eigenvalues of Wigner matrices. Journal of the European Mathematical Society, 24(8):2823–2873, 2022.
  • [BY17] P. Bourgade and H.-T. Yau. The eigenvector moment flow and local quantum unique ergodicity. Comm. Math. Phys., 350(1):231–278, 2017.
  • [BYY14] Paul Bourgade, Horng-Tzer Yau, and Jun Yin. Local circular law for random matrices. Probability Theory and Related Fields, 159(3-4):545–595, 2014.
  • [CL19] Ziliang Che and Patrick Lopatto. Universality of the least singular value for sparse random matrices. Electron. J. Probab., 24:Paper No. 9, 53, 2019.
  • [CMS13] Claudio Cacciapuoti, Anna Maltsev, and Benjamin Schlein. Local Marchenko-Pastur law at the hard edge of sample covariance matrices. J. Math. Phys., 54(4):043302, 13, 2013.
  • [Ede88] Alan Edelman. Eigenvalues and condition numbers of random matrices. SIAM J. Matrix Anal. Appl., 9(4):543–560, 1988.
  • [EKYY13] László Erdős, Antti Knowles, Horng-Tzer Yau, and Jun Yin. Spectral statistics of Erdős–Rényi graphs I: Local semicircle law. The Annals of Probability, 41(3B):2279–2375, 2013.
  • [ESYY12] László Erdős, Benjamin Schlein, Horng-Tzer Yau, and Jun Yin. The local relaxation flow approach to universality of the local statistics for random matrices. Ann. Inst. Henri Poincaré Probab. Stat., 48(1):1–46, 2012.
  • [EY17] László Erdős and Horng-Tzer Yau. A dynamical approach to random matrix theory, volume 28 of Courant Lecture Notes in Mathematics. Courant Institute of Mathematical Sciences, New York; American Mathematical Society, Providence, RI, 2017.
  • [EYY11] László Erdős, Horng-Tzer Yau, and Jun Yin. Universality for generalized Wigner matrices with Bernoulli distribution. J. Comb., 2(1):15–81, 2011.
  • [FV16] Brendan Farrell and Roman Vershynin. Smoothed analysis of symmetric random matrices with continuous distributions. Proceedings of the American Mathematical Society, 144(5):2257–2261, 2016.
  • [GPV21] Mehrdad Ghadiri, Richard Peng, and Santosh S Vempala. Sparse regression faster than dωd^{\omega}. arXiv preprint arXiv:2109.11537, 2021.
  • [Hig02] Nicholas J Higham. Accuracy and stability of numerical algorithms. SIAM, 2002.
  • [LS18] Ji Oon Lee and Kevin Schnelli. Local law and Tracy–Widom limit for sparse random matrices. Probability Theory and Related Fields, 171(1):543–616, 2018.
  • [LSY19] Benjamin Landon, Philippe Sosoe, and Horng-Tzer Yau. Fixed energy universality of Dyson Brownian motion. Adv. Math., 346:1137–1332, 2019.
  • [MT16] Govind Menon and Thomas Trogdon. Smoothed analysis for the conjugate gradient algorithm. SIGMA. Symmetry, Integrability and Geometry: Methods and Applications, 12:109, 2016.
  • [Nie22] Zipei Nie. Matrix anti-concentration inequalities with applications. In Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, pages 568–581, 2022.
  • [PV21] Richard Peng and Santosh Vempala. Solving sparse linear systems faster than matrix multiplication. In Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 504–521. SIAM, 2021.
  • [Sma85] Steve Smale. On the efficiency of algorithms of analysis. Bulletin of the American Mathematical Society, 13(2):87–121, 1985.
  • [SS20] Rikhav Shah and Sandeep Silwal. Smoothed analysis of the condition number under low-rank perturbations. arXiv preprint arXiv:2009.01986, 2020.
  • [SST06] Arvind Sankar, Daniel A Spielman, and Shang-Hua Teng. Smoothed analysis of the condition numbers and growth factors of matrices. SIAM Journal on Matrix Analysis and Applications, 28(2):446–476, 2006.
  • [ST04] Daniel A Spielman and Shang-Hua Teng. Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time. Journal of the ACM (JACM), 51(3):385–463, 2004.
  • [SW09] Galen R. Shorack and Jon A. Wellner. Empirical processes with applications to statistics, volume 59 of Classics in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2009. Reprint of the 1986 original [MR0838963].
  • [SX21] Kevin Schnelli and Yuanyuan Xu. Convergence rate to the Tracy-Widom laws for the largest eigenvalue of sample covariance matrices. arXiv preprint arXiv:2108.02728, 2021.
  • [SX22a] Kevin Schnelli and Yuanyuan Xu. Convergence rate to the Tracy-Widom laws for the largest eigenvalue of Wigner matrices. Comm. Math. Phys., 393(2):839–907, 2022.
  • [SX22b] Kevin Schnelli and Yuanyuan Xu. Quantitative Tracy–Widom laws for the largest eigenvalue of generalized Wigner matrices. arXiv preprint arXiv:2207.00546, 2022.
  • [TB97] Lloyd N. Trefethen and David Bau, III. Numerical linear algebra. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1997.
  • [TV10a] Terence Tao and Van Vu. Random matrices: the distribution of the smallest singular values. Geom. Funct. Anal., 20(1):260–297, 2010.
  • [TV10b] Terence Tao and Van Vu. Smooth analysis of the condition number and the least singular value. Mathematics of computation, 79(272):2333–2352, 2010.
  • [VH14] Ramon van Handel. Probability in high dimension. Technical report, Princeton University, 2014.
  • [Wan19] Haoyu Wang. Quantitative universality for the largest eigenvalue of sample covariance matrices. arXiv preprint arXiv:1912.05473, 2019.
  • [Wsc04] Mario Wschebor. Smoothed analysis of $\kappa(A)$. Journal of Complexity, 20(1):97–107, 2004.