Optimal Smoothed Analysis and Quantitative Universality for the Smallest Singular Value of Random Matrices
Abstract
The smallest singular value and condition number play important roles in numerical linear algebra and the analysis of algorithms. In numerical analysis with randomness, many previous works make Gaussian assumptions, which are not general enough to reflect the arbitrariness of the input. To overcome this drawback, we prove the first quantitative universality for the smallest singular value and condition number of random matrices.
Moreover, motivated by the study of smoothed analysis that random perturbation makes deterministic matrices well-conditioned, we consider an analog for random matrices. For a random matrix perturbed by independent Gaussian noise, we show that this matrix quickly becomes approximately Gaussian. In particular, we derive an optimal smoothed analysis for random matrices in terms of a sharp Gaussian approximation.
1 Introduction
In numerical analysis and the analysis of algorithms, the smallest singular value and condition number of matrices play important roles [TB97]. Broadly speaking, the smallest singular value reflects the invertibility of a matrix, and a better non-degeneracy condition of the matrix results in faster algorithms or better stability in computations. The condition number is a crucial quantity as well. It measures the hardness and accuracy of numerical computations, and its best-known practical meaning is the loss of precision in solving linear equations [Sma85]. For more concrete state-of-the-art applications, the smallest singular value affects the stability of fast algorithms for solving linear systems [PV21, Nie22]. In [GPV21], the condition number determines the iteration complexity in fast algorithms for -norm regression. There are many other applications of these two quantities, and we do not intend to provide a complete review here.
In the seminal work of Spielman and Teng [ST04], smoothed analysis was introduced to understand why some algorithms with poor worst-case performance work successfully in practice. Roughly speaking, smoothed analysis interpolates between worst-case analysis and average-case analysis. In the context of solving linear equations $Ax = b$, even if the matrix $A$ has a large condition number (and consequently a large loss of precision), a small random perturbation of $A$ will be well-conditioned with high probability. Smoothed analysis for random perturbations of deterministic matrices has been well studied since its invention [Wsc04, VH14, TV10b, FV16, SS20]. It has also been applied to many algorithms, for example Gaussian elimination [SST06], matrix inversion [BC10], the conjugate gradient method [MT16], tensor decomposition [BCMV14], and the $k$-means method [AMR11].
Classical results of smoothed analysis focus on random perturbations of deterministic matrices, and show that the perturbation indeed makes the original matrix better for the implementation of algorithms (for instance, by giving it a smaller condition number). Recently, numerical computations with random data have become more and more common. In the context of random matrices, we show that if a random matrix is perturbed by independent Gaussian noise, it quickly becomes approximately Gaussian. The smallest singular value and condition number of a Gaussian matrix have been well studied [Ede88], and therefore such Gaussian approximations enable us to analyze the more tractable Gaussian ensemble instead, with benign approximation error. Specifically, we prove an optimal smoothed analysis of random matrices in terms of a sharp Gaussian approximation.
Moreover, in numerical analysis with randomness, many previous works make Gaussian assumptions. Such assumptions are not general enough to reflect the arbitrariness of the input. To overcome this drawback, we prove the first quantitative universality for the smallest singular value and condition number of random matrices. Both the smoothed analysis for random matrices and the quantitative universality will play useful roles in computations with randomness.
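As a quick empirical illustration of the universality phenomenon (this experiment is ours and not part of the paper's statements; the dimensions, trial counts, and the Rademacher comparison ensemble are arbitrary choices), one can compare the law of the rescaled smallest singular value across entry distributions:

```python
# Monte Carlo sketch of universality for the smallest singular value:
# the law of n * sigma_min(X) looks the same for Gaussian and Rademacher
# entries. All parameters below are ad hoc choices for illustration.
import numpy as np

rng = np.random.default_rng(0)
n, trials = 200, 300

def smallest_sv_samples(sampler):
    # samples of n * sigma_min(X) for X with i.i.d. entries of variance 1/n
    return np.array([n * np.linalg.svd(sampler((n, n)) / np.sqrt(n),
                                       compute_uv=False)[-1]
                     for _ in range(trials)])

gaussian = smallest_sv_samples(rng.standard_normal)
rademacher = smallest_sv_samples(lambda s: rng.choice([-1.0, 1.0], size=s))

# The two empirical medians should be close, reflecting universality.
print("medians:", np.median(gaussian), np.median(rademacher))
```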
1.1 Overview and our contributions
From the mathematical perspective, edge universality is a classical problem in random matrix theory, and it has seen tremendous progress in the past decade. However, most previous results are qualitative statements, and quantitative rates of convergence to the limiting law were only known for integrable models. Recently, the rate of convergence to the Tracy-Widom distribution for the largest eigenvalue was first obtained in [Bou22] for generalized Wigner matrices and in [Wan19] for sample covariance matrices. These results were further improved by Schnelli and Xu in [SX22a, SX21, SX22b]. Our work is the first result on quantitative universality for the smallest singular value and the condition number.
For an $N \times n$ matrix $X$ with $n/N \to \gamma \in (0, 1]$, the empirical spectral distribution of the sample covariance matrix $X^* X$ converges to the Marchenko-Pastur law
\[ \rho_\gamma(x) = \frac{1}{2\pi \gamma x}\sqrt{\big((\lambda_+ - x)(x - \lambda_-)\big)_+}\,, \qquad \lambda_\pm = (1 \pm \sqrt{\gamma})^2. \]
In the case $\gamma < 1$, the Marchenko-Pastur law has no singularity, and such situations are called the soft edge case in the literature of random matrix theory. Moreover, note that all eigenvalues are of constant order with high probability. In particular, thanks to the square-root decay of the Marchenko-Pastur law near the spectral edge, the extreme eigenvalues of the covariance matrix are highly concentrated at the spectral endpoints at the scale $N^{-2/3}$, and the fluctuation is given by the Tracy-Widom law. In this case, the optimal rate of convergence to the Tracy-Widom distribution has been established in [SX21], and the quantitative universality for the smallest singular value is an easy consequence.
However, in the case $\gamma = 1$, things become much more complicated. Note that in this case the Marchenko-Pastur law has a singularity at the origin:
\[ \rho_1(x) = \frac{1}{2\pi}\sqrt{\frac{(4 - x)_+}{x}}\,, \qquad 0 < x \le 4. \]
This singularity of the limiting spectral distribution makes the behaviour of the eigenvalues near the left endpoint very different. In particular, the typical eigenvalue spacing near the left edge is of order $N^{-2}$, which is much smaller than in the soft edge case, where the typical spacing is of order $N^{-2/3}$. In random matrix theory, this case is called the hard edge case. In this model, the left edge of the spectrum with the singularity is called the hard edge and the right edge is called the soft edge. The study of spectral statistics near the hard edge is a notoriously tricky problem. Due to technical difficulties, the study of the hard edge case is mostly restricted to square matrices with $N = n$. Because of this difficulty, we also focus on square random matrices.
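For intuition on the hard edge scale, the following quick simulation (ours, with ad hoc parameters) checks that the smallest singular value of a square matrix with variance-$1/n$ entries lives on scale $n^{-1}$, so that $n\,\sigma_{\min}$ stays of order one as $n$ grows:

```python
# Hard edge scale check: n * sigma_min should be O(1) across n for square
# matrices with i.i.d. entries of variance 1/n (illustration only).
import numpy as np

rng = np.random.default_rng(1)
for n in (100, 200, 400):
    vals = [n * np.linalg.svd(rng.standard_normal((n, n)) / np.sqrt(n),
                              compute_uv=False)[-1] for _ in range(100)]
    print(n, float(np.mean(vals)))   # roughly constant in n
```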
For the smallest singular value of an $n \times n$ random matrix, the first result was obtained by Edelman for the Gaussian matrix in [Ede88]. Later, the distribution for the Gaussian ensemble was shown to be universal by Tao and Vu in [TV10a], and the result was further generalized to sparse matrices by Che and Lopatto in [CL19]. However, both universality results are qualitative statements with unknown error terms.
In computer science and numerical computations, the accuracy or complexity of algorithms depends on the smallest singular value or condition number in a delicate way. Therefore, a qualitative universality is not enough to measure the performance of general random input, and quantitative estimates are necessary for practical purposes. In this work, we prove the first quantitative version of the universality for the smallest singular value and the condition number. This solves a long-standing open problem (Problem 11 of the open problems from the AIM workshop on random matrices, http://www.aimath.org/WWN/randommatrices/randommatrices.pdf). Along the way to quantitative universality, we obtain a sharp estimate for matrices with Gaussian perturbations, and hence establish the optimal smoothed analysis for random matrices.
1.2 Models and main results
Thanks to the motivation from theoretical computer science, we mainly focus on real matrices. However, as mentioned in [CL19], our whole proof works for complex matrices as well.
Let $X = (x_{ij})$ be an $n \times n$ matrix with independent real-valued entries with mean 0 and variance $n^{-1}$:
\[ \mathbb{E}\,x_{ij} = 0, \qquad \mathbb{E}\,x_{ij}^2 = \frac{1}{n}, \qquad 1 \le i, j \le n. \]
(1)
We assume the entries have sub-exponential decay; that is, there exists a constant $\vartheta > 0$ such that for $t > 1$,
\[ \mathbb{P}\big(\sqrt{n}\,|x_{ij}| > t\big) \le \vartheta^{-1} \exp\big(-t^{\vartheta}\big). \]
(2)
We remark that this assumption is mainly for convenience, and other conditions such as the existence of a sufficiently high moment would also be enough.
For an $n \times n$ matrix $X$, let $\sigma_1(X) \le \sigma_2(X) \le \cdots \le \sigma_n(X)$ denote the singular values in non-decreasing order, and let $\kappa(X) = \sigma_n(X)/\sigma_1(X)$ denote the condition number. Throughout this paper, we let $G$ be an $n \times n$ Gaussian matrix with i.i.d. entries $\mathcal{N}(0, n^{-1})$.
To state the main results, we first introduce two important probabilistic notions that are commonly used throughout the whole paper.
Definition 1 (Overwhelming probability).
Let $\{A_n\}$ be a sequence of events. We say that $A_n$ holds with overwhelming probability if for any (large) $D > 0$, there exists $n_0(D)$ such that for all $n \ge n_0(D)$ we have
\[ \mathbb{P}(A_n) \ge 1 - n^{-D}. \]
Definition 2 (Stochastic domination).
Let $\mathcal{X} = (\mathcal{X}_n)$ and $\mathcal{Y} = (\mathcal{Y}_n)$ be two families of nonnegative random variables. We say that $\mathcal{X}$ is stochastically dominated by $\mathcal{Y}$ if for all (small) $\varepsilon > 0$ and (large) $D > 0$, there exists $n_0(\varepsilon, D)$ such that for $n \ge n_0(\varepsilon, D)$ we have
\[ \mathbb{P}\big(\mathcal{X}_n > n^{\varepsilon}\,\mathcal{Y}_n\big) \le n^{-D}. \]
The stochastic domination is always uniform in all parameters. If $\mathcal{X}$ is stochastically dominated by $\mathcal{Y}$, we use the notation $\mathcal{X} \prec \mathcal{Y}$.
Our main result is the following.
Theorem 2.
For the smallest singular value, one aspect of the complex-valued case is particularly interesting: the complex Gaussian model is explicitly integrable, i.e., the distribution of its smallest singular value is given by an exact formula. Specifically, let $G^{\mathbb{C}}$ be an $n \times n$ matrix whose entries are i.i.d. complex Gaussians whose real and imaginary parts are i.i.d. copies of $\mathcal{N}(0, (2n)^{-1})$. For the complex Gaussian ensemble, Edelman proved in [Ede88] that the distribution of the (renormalized) smallest singular value is independent of $n$ and can be computed explicitly:
\[ \mathbb{P}\big( n\,\sigma_1(G^{\mathbb{C}}) \le r \big) = 1 - e^{-r^2}, \qquad r \ge 0. \]
Thanks to this exact formula for the integrable model, the edge universality for the smallest singular value can be quantified in terms of the Kolmogorov-Smirnov distance to the explicit law.
More precisely, let $X$ be an $n \times n$ complex random matrix whose entries satisfy the complex analog of the moment conditions (1) and the sub-exponential decay assumption (2). Then we have the following rate of convergence to the explicit law.
Corollary 1.
Let $X$ be a complex random matrix defined as above. Then for any $\varepsilon > 0$ we have
(5)
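The explicit law is also easy to probe numerically. The following Monte Carlo sanity check is ours and assumes the classical normalization under which, for the complex Gaussian ensemble with entry variance $1/n$, the rescaled smallest singular value $n\,\sigma_1$ has the exact distribution function $F(r) = 1 - e^{-r^2}$:

```python
# Empirical Kolmogorov-Smirnov distance to the explicit law for the complex
# Gaussian ensemble (a sketch; the normalization is stated in the lead-in).
import numpy as np

rng = np.random.default_rng(2)
n, trials = 100, 1000
samples = []
for _ in range(trials):
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    X /= np.sqrt(2 * n)                          # E|x_ij|^2 = 1/n
    samples.append(n * np.linalg.svd(X, compute_uv=False)[-1])
samples = np.sort(np.array(samples))

F = 1 - np.exp(-samples**2)                      # exact CDF at the samples
i = np.arange(1, trials + 1)
ks = np.max(np.maximum(i / trials - F, F - (i - 1) / trials))
print("KS distance:", ks)                        # shrinks as trials, n grow
```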
Based on the analysis of the smallest singular value, combined with quantitative results for the largest singular value, we can also derive an optimal smoothed analysis and quantitative universality for the condition number.
As mentioned previously, the general hard edge case beyond strictly square matrices is a notoriously difficult problem in random matrix theory. To further generalize our results, we prove the following slight extension towards the general case.
1.3 Outline of proofs
The central idea of this paper is based on the Erdős-Schlein-Yau dynamical approach in random matrix theory. In their seminal work [ESYY12], the so-called three-step strategy was developed to prove universality phenomena for random matrices. Roughly speaking, this framework consists of the following three steps.
- (i) A priori estimates for spectral statistics. This is based on analyzing the resolvent of the matrix, and such an analysis is called a local law in random matrix theory. The local law states that the spectral density converges to the limiting law on microscopic scales. It implies the eigenvalue rigidity phenomenon, which states that the eigenvalues are close to their typical locations. Such a priori control of the eigenvalue locations plays a significant role in the further analysis.
- (ii) Local relaxation of eigenvalues. This step is designed to prove universality for matrices with a tiny Gaussian component. We perturb the matrix by independent Gaussian noise; under this perturbation, the dynamics of the eigenvalues is governed by the Dyson Brownian motion (DBM). Moreover, the spectral distribution of the Gaussian ensemble is the equilibrium measure of DBM. The ergodicity of DBM results in fast convergence to the local equilibrium, and hence implies universality for matrices with a small Gaussian perturbation.
- (iii) Density arguments. For any probability distribution of the matrix elements, there exists a distribution with a small Gaussian component (in the sense of Step (ii)) such that the two associated random matrices have asymptotically identical spectral statistics. Typically, such an asymptotic identity is guaranteed by moment matching conditions and a comparison of resolvents.
For a systematic discussion of this method, we refer to the monograph [EY17]. Following this strategy, our main techniques can also be divided into the following three steps.
- The first step is to establish a priori estimates: uniform rigidity of the singular values (Lemma 3.1), which follows from the local laws for the symmetrized matrix.
- The second step is to interpolate the general matrix $X$ with the Gaussian matrix $G$ and to estimate the dynamics of the singular values. More specifically, we consider the interpolation $X_t$, which solves the matrix Ornstein-Uhlenbeck stochastic differential equation (9) below.
Note that this interpolation is equivalent to the matrix perturbation in our smoothed analysis. We consider a weighted Stieltjes transform (defined in (15)). A key innovation of our work is that, combined with a symmetrization trick, the evolution of the weighted Stieltjes transform along the dynamics of $X_t$ satisfies a stochastic PDE that can be well approximated by a deterministic advection equation. This deterministic PDE yields a rough estimate. Finally, using a delicate bootstrap argument, we show that the estimates are self-improving. Iterating the bootstrap argument to the optimal scale, we derive the optimal smoothed analysis for the smallest singular value.
- The last step is a quantitative resolvent comparison. In particular, the difference between the resolvents of two random matrices is explicitly controlled in terms of the difference of their fourth moments. This comparison is proved via the Lindeberg exchange method. Together with the optimal smoothed analysis, this comparison theorem establishes the quantitative universality.
1.4 Notations and paper organizations
Throughout this paper, we denote by $C$ a generic constant which does not depend on any parameter but may vary from line to line. We write $A \lesssim B$ if $A \le CB$ holds for some constant $C > 0$, and similarly write $A \gtrsim B$ if $A \ge cB$ for some constant $c > 0$. We also denote $A \asymp B$ if both hold. When $A$ and $B$ are complex valued, the same notation refers to the corresponding bounds for both the real and imaginary parts. We use $[\![a, b]\!]$ to denote the set of integers between $a$ and $b$. We use $a \vee b$ and $a \wedge b$ to denote the maximum and minimum of $a$ and $b$, respectively.
The paper is organized as follows. In Section 2, we discuss some applications of our results in numerical analysis and algorithms. In Section 3, we discuss the smoothed analysis for the smallest singular value of random matrices via the study of singular value dynamics. In Section 4, we use the smoothed analysis to establish a full quantitative universality for the smallest singular value. In Section 5, we use the results on the smallest singular value to derive smoothed analysis and quantitative universality for the condition number. In Section 6, we extend the result for square matrices to a slightly more general non-square case. Finally, in the Appendix, we collect some auxiliary results and provide the deferred technical proofs.
2 Applications in Numerical Analysis and Algorithms
In this section, we discuss some applications of our results in numerical analysis and algorithms. There are numerous circumstances where the smallest singular value and condition number play important roles. We do not intend to cover all of them, and instead focus on two simple scenarios in the framework of solving linear systems to illustrate the usefulness of our results. We expect our results to apply to more complicated models and more advanced algorithms.
2.1 Accuracy of least-square solution
Consider the linear least-squares optimization
\[ \min_{x} \|Ax - b\|_2, \]
where $A$ is an $n \times n$ matrix and $b$ is a fixed vector. The loss of precision of this problem, denoted by $\mathrm{LoP}(A, b)$, is the number of correct digits in the entries of the data minus the same quantity for the computed solution. Let $\mathrm{LoP}(A)$ denote the loss of precision for the worst $b$. Then, as shown in [Hig02], up to lower-order terms we have
\[ \mathrm{LoP}(A) = O\big(\log_{10} \kappa(A)\big). \]
Let $X$ be an $n \times n$ random matrix satisfying (1) and (2), and let $G$ be an $n \times n$ Gaussian matrix. By Theorem 3 and Theorem 5, for any $\varepsilon > 0$, with overwhelming probability we have
Also, using Theorems 4 and 5, for any $\varepsilon > 0$, with probability close to one we have
The error terms are smaller than the main term. These results imply that a general random matrix and its Gaussian perturbation ensure accuracy as good as in the Gaussian case.
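The rule of thumb behind these statements can be observed directly. The following snippet (an illustration under our own ad hoc setup, not a statement from the paper) compares the digits lost when solving a random system in double precision with $\log_{10}\kappa$:

```python
# Precision-loss heuristic: solving X x = b in double precision loses
# roughly log10(cond(X)) decimal digits in the worst case (sketch only).
import numpy as np

rng = np.random.default_rng(3)
n = 500
X = rng.standard_normal((n, n)) / np.sqrt(n)
x_true = rng.standard_normal(n)
b = X @ x_true

x_hat = np.linalg.solve(X, b)
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
print("digits lost    :", np.log10(rel_err) + 16)   # ~16 digits available
print("log10(cond(X)) :", np.log10(np.linalg.cond(X)))
# The observed loss is typically below the worst-case log10(cond) bound.
```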
2.2 Complexity of conjugate gradient method
Consider the linear equation
\[ Ax = b, \]
where $A$ is an $n \times n$ matrix and $b$ is a fixed vector. This linear system can be solved via the conjugate gradient algorithm. Let $k(\varepsilon)$ denote the number of iterations needed to obtain an $\varepsilon$-approximation of the true solution in the worst case. Then it is known (see e.g. [TB97]) that
\[ k(\varepsilon) = O\big(\sqrt{\kappa(A)}\,\log(1/\varepsilon)\big). \]
Let $X$ be an $n \times n$ random matrix satisfying (1) and (2), and let $G$ be an $n \times n$ Gaussian matrix. By Theorem 3 and Theorem 5, for any $\varepsilon > 0$, with overwhelming probability we have
This shows that, as long as the Gaussian component is not too small, the Gaussian perturbation has time complexity as good as that of the Gaussian ensemble.
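To see the iteration bound in action, here is a minimal conjugate gradient sketch (ours; it assumes a symmetric positive definite system, and the shift 0.1 below is an arbitrary choice to keep the condition number moderate):

```python
# Textbook conjugate gradient with an iteration counter, compared against
# the sqrt(kappa) * log(1/eps) bound (illustration only).
import numpy as np

rng = np.random.default_rng(4)
n = 300
X = rng.standard_normal((n, n)) / np.sqrt(n)
A = X.T @ X + 0.1 * np.eye(n)          # SPD test matrix
b = rng.standard_normal(n)

def cg(A, b, eps=1e-8):
    x = np.zeros_like(b)
    r = b.copy()                        # residual for the start point x = 0
    p = r.copy()
    rs = r @ r
    k = 0
    while np.sqrt(rs) > eps * np.linalg.norm(b):
        k += 1
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x, k

_, iters = cg(A, b)
kappa = np.linalg.cond(A)
print("CG iterations           :", iters)
print("sqrt(kappa) * log(1/eps):", np.sqrt(kappa) * np.log(1e8))
```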
3 Smoothed Analysis and Gaussian Approximation
3.1 Singular value dynamics
In smoothed analysis, we are interested in matrix perturbations of the form $X + \sqrt{t}\,G$. After normalization of the variance, it is equivalent to study matrices of the form $X_t$ below. More specifically, we have
\[ X_t \overset{d}{=} e^{-t/2}\, X_0 + \sqrt{1 - e^{-t}}\; G. \]
(8)
Let $B_t$ be an $n \times n$ matrix Brownian motion, i.e. the entries $(B_{ij}(t))_{1 \le i, j \le n}$ are independent standard Brownian motions. Then the evolution of $X_t$ is governed by the following matrix-valued Ornstein-Uhlenbeck process:
\[ \mathrm{d}X_t = \frac{1}{\sqrt{n}}\,\mathrm{d}B_t - \frac{1}{2}\,X_t\,\mathrm{d}t. \]
(9)
Let $s_1(t) \le s_2(t) \le \cdots \le s_n(t)$ denote the singular values of $X_t$. Then they satisfy the following system of stochastic differential equations [ESYY12, equation (5.8)]:
(10)
To handle these SDEs, an important idea is the following symmetrization trick (see [CL19, equation (3.9)]): set $s_{-k} := -s_k$ for $1 \le k \le n$.
With this notation, we label the indices from $-n$ to $-1$ and from $1$ to $n$, so that the zero index is omitted. Unless otherwise stated, this will be the convention, and we will not emphasize it explicitly in the remainder of the paper. After symmetrization, for the real case we have
(11)
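These dynamics are straightforward to simulate. The following Euler-Maruyama sketch (ours; it assumes the Ornstein-Uhlenbeck normalization of (9), with ad hoc step sizes) tracks the singular values of the matrix flow directly rather than through the SDE system:

```python
# Euler-Maruyama simulation of the matrix Ornstein-Uhlenbeck flow (9),
# tracking the smallest singular value along the interpolation
# (illustration with ad hoc parameters).
import numpy as np

rng = np.random.default_rng(5)
n, dt, steps = 100, 1e-3, 500
X = rng.choice([-1.0, 1.0], size=(n, n)) / np.sqrt(n)   # general initial law

for _ in range(steps):
    dB = rng.standard_normal((n, n)) * np.sqrt(dt)       # Brownian increment
    X += dB / np.sqrt(n) - 0.5 * X * dt

print("sigma_min at t = 0.5:", np.linalg.svd(X, compute_uv=False)[-1])
```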
Now we use the coupling method introduced in [LSY19] to analyze these dynamics. Consider the interpolation between a general matrix and a Gaussian matrix . Let and be the (symmetrized) singular values of and , respectively. For , define
With this initial condition, we denote the unique solution of (11) by . Also, let and denote the solutions of (11) with initial conditions and , respectively.
It is well known that the empirical measure of the eigenvalues of converges to the Marchenko-Pastur distribution
For , we define the typical position of the singular value as the quantile satisfying
We also define . By a change of variable, we have
(12)
where $\rho_{sc}(x) = \frac{1}{2\pi}\sqrt{(4 - x^2)_+}$ is the semicircle law.
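As a consistency check (a standard computation, recorded here for the reader's convenience), the symmetrization is compatible with the square-case Marchenko-Pastur law: if $\lambda$ has density $\rho_1(\lambda) = \frac{1}{2\pi}\sqrt{(4-\lambda)/\lambda}$ on $[0, 4]$, then the symmetrized points $x = \pm\sqrt{\lambda}$ have density
\[ \rho_{\mathrm{sym}}(x) = |x|\,\rho_1(x^2) = \frac{|x|}{2\pi}\cdot\frac{\sqrt{4 - x^2}}{|x|} = \frac{1}{2\pi}\sqrt{4 - x^2}, \qquad x \in [-2, 2], \]
which is exactly the semicircle law. This is why the symmetrized singular value dynamics can be compared with semicircle quantities.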
An important input of our proof is the following uniform rigidity estimates. For any fixed , consider the set of good trajectories
(13)
Such rigidity estimates for fixed parameters were proved in [AEK14, AEK17, BEK+14, BYY14, CMS13]. The extension to estimates uniform in the parameters can be done by a discretization argument: (1) discretize in time and in the coupling parameter; (2) use Weyl's inequality to control increments over small time intervals; (3) use a maximum principle for the derivative with respect to the coupling parameter (see Lemma 3.2) to control increments over small parameter intervals. As a consequence, we have
Lemma 3.1.
For any $\varepsilon > 0$, the event happens with overwhelming probability, i.e. for any $D > 0$, there exists $n_0(\varepsilon, D)$ such that for any $n \ge n_0(\varepsilon, D)$ we have
We consider
For simplicity of notation, we omit the parameter when the context is clear. Then it satisfies the following non-local parabolic-type equation.
(14)
Let solve the same equation as in (14) but with initial condition . Following the same arguments as in [Wan19, Lemma 3.1], this equation satisfies a maximum principle.
Lemma 3.2.
For all and , we have
We consider the following weighted Stieltjes transform
(15)
Let and denote the Stieltjes transforms of the empirical measure for the singular values and of the semicircle law
A well-known result in random matrix theory is the following local semicircle law for the Stieltjes transform . Let be an arbitrarily fixed constant. Define the spectral domain
For any and , it was shown in [AEK14, AEK17] that
(16)
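Numerically, the local law is easy to visualize; the following sketch (ours, with an arbitrary test point and scales) compares the symmetrized empirical Stieltjes transform with $m_{sc}(z) = \frac{-z + \sqrt{z^2 - 4}}{2}$:

```python
# Local semicircle law sketch: the symmetrized empirical Stieltjes transform
# stays close to m_sc(z) down to small eta (illustration only).
import numpy as np

rng = np.random.default_rng(6)
n = 400
s = np.linalg.svd(rng.standard_normal((n, n)) / np.sqrt(n),
                  compute_uv=False)
sym = np.concatenate([s, -s])            # symmetrized singular values

def m_emp(z):
    return np.mean(1.0 / (sym - z))

def m_sc(z):
    r = np.sqrt(z * z - 4 + 0j)
    if r.imag < 0:                       # branch with Im m_sc > 0 for Im z > 0
        r = -r
    return (-z + r) / 2

for eta in (1e-1, 1e-2):
    z = 0.5 + 1j * eta
    print(eta, abs(m_emp(z) - m_sc(z)))  # error of order 1/(n * eta)
```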
As computed in [Wan19, Lemma 3.3], a key result is that and satisfy the following stochastic advection equation.
Lemma 3.3.
For , we have
(17)
Based on the local semicircle law (16), we expect that this stochastic differential equation can be approximated by the deterministic advection PDE
(18)
The above PDE has the following explicit characteristics:
(19)
This implies , and we will justify this approximation in the next subsection. Moreover, we remark that satisfies the same equation (17) with replaced by .
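For the reader's convenience, we recall the general mechanism behind this approximation (a standard PDE fact, stated here in generic notation): a solution of the linear advection equation $\partial_t u(t, z) = a(z)\,\partial_z u(t, z)$ is constant along any curve $z_t$ solving $\dot z_t = -a(z_t)$, since
\[ \frac{\mathrm{d}}{\mathrm{d}t}\, u(t, z_t) = \partial_t u(t, z_t) + \dot z_t\,\partial_z u(t, z_t) = 0. \]
Applying this mechanism with the drift in (18) produces the characteristics in (19) and the transport approximation for the weighted Stieltjes transform.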
Before moving to the main estimates, we first collect some basic results, including the geometry of the characteristics and a rough estimate for the initial condition. These can be proved via direct computations and the details can be found in [Bou22, Section 2].
Let . For any , we consider the curve and the domain
We also define and .
Lemma 3.4.
Uniformly in and satisfying and , we have
In particular, for , if , then .
Moreover, for any , and , we have .
Lemma 3.5.
Let be any small constant. In the set , for any , we have if , and otherwise. The same bounds also hold for .
A key ingredient of our proof is the following a priori estimate for , whose proof is deferred to Appendix B.1.
Proposition 3.1.
Let be any small constant. Uniformly for all and , with overwhelming probability we have
This estimate yields a rough control for the decay of , which will be an important input for more refined estimates.
Lemma 3.6.
For all and , we have
(20)
Proof.
By Lemma 3.2, it suffices to control . By the nonnegativity of , we have
which implies
Let . For , pick the point . In this case we have . Therefore, in the set , by Proposition 3.1, uniformly for all and , with overwhelming probability we have
For , without loss of generality we consider . In this case, let and consider . The same argument results in a similar bound with a larger factor. By the arbitrariness of , this completes the proof. ∎
3.2 Local relaxation at the hard edge
In this subsection we prove a quantitative estimate for the local relaxation flow (11) at the hard edge. The main estimate in this section is the following. We remark that Theorem 6 is equivalent to Theorem 1 using the rescaling (8).
Theorem 6.
For arbitrarily small and any , we have
(21)
To estimate near the hard edge, we introduce the following quantity to approximate it. Let with the convention , and define
(22)
Our goal is to prove the following estimates
Proposition 3.2.
Let be a fixed small constant. For arbitrarily small with any and , we have
To obtain the optimal control for the local relaxation flow, we need to carefully estimate near the hard edge. A first step towards such estimates is given in the following lemma.
Lemma 3.7.
Let and . For any , , and , in the set , for we have
(23)
(24)
Proof.
By the properties of the Stieltjes transform (see e.g. [EY17, Section 6]) and direct computation, we have
and
(25)
In the set , the rigidity estimates imply that
Then we have
(26)
By the triangle inequality, we have
Using (25) and (26), we obtain
For the second term , in the set , the rigidity and (25) imply that
Recall from Lemma 3.4 that and . Using a similar argument as in (26) we obtain
Hence we have proved (23). For the other part (24), it can be proved via the same arguments. ∎
As a consequence, we have a good control for the size of away from the soft edge. This is based on the symmetric structure of .
Lemma 3.8.
Let and . For any and , with overwhelming probability we have
(27)
Proof.
A key observation is the following
Therefore, we have
Using the symmetrization of the singular values, we further have
Consequently, by (23) we have
This shows the desired result. ∎
3.3 Bootstrap arguments
We will prove the main technical estimate Proposition 3.2 via a bootstrap argument.
Definition 3 (Hypothesis ).
Consider the following hypothesis: For any fixed small , the following holds for arbitrarily small. For any , and , we have
(28)
Proposition 3.2 is derived via a bootstrap of the hypothesis . Specifically, we have the following two lemmas.
Lemma 3.9.
The hypothesis is true.
Proof.
Lemma 3.10.
If is true, then is true, i.e.
The self-improving property of the hypothesis stated in Lemma 3.10 is the main technical part of the proof for Proposition 3.2. We defer its proof to Appendix B.2.
Finally, the optimal control (21) for the local relaxation flow at the hard edge follows from these two lemmas together with Lemma 3.8.
Proof of Proposition 3.2.
Note that
(29)
Consider an arbitrarily fixed . Based on Lemma 3.9 and Lemma 3.10, after finitely many iterations, with overwhelming probability we have
This shows that for any fixed and , and for large enough , we have
By (29) we obtain
We choose and , and then the Markov inequality yields
which completes the proof thanks to the arbitrariness of and . ∎
4 Quantitative Universality
4.1 Quantitative resolvent comparison
In classical random matrix theory, spectral universality is proved by comparing the resolvents of matrices under suitable moment matching conditions. To obtain a quantitative universality for the smallest singular value, we need a quantitative version of the resolvent comparison theorem.
For a fixed constant , let be a cutoff scaling. Let and consider two symmetric functions that are non-increasing in , given by
Also, consider a fixed non-increasing smooth function such that for and for .
A key observation is that the functions and can bound the distribution of the smallest singular value . For any function , we denote .
Lemma 4.1.
We have
(30)
Proof.
For the right-hand side, assume . By definition of the function , we have , which implies . Also note that . Therefore, we conclude and this yields
The left-hand side can be proved similarly. ∎
When estimating the distribution , thanks to the rigidity of singular values, we can assume without loss of generality, where is a constant that can be arbitrarily small. Based on Lemma 4.1, to compare the distribution of the smallest singular values of different random matrices, it suffices to compare the functions and . In the remaining part of this section, we provide a systematic treatment of such a comparison.
Pick a point with . Let be a smooth symmetric function that is non-increasing in satisfying
(31)
For the test functions and defined as above, we have the following quantitative comparison of the resolvents, whose proof is deferred to Appendix C.
Proposition 4.1.
Let and be two independent random matrices satisfying (1) and (2). Assume the first three moments of the entries are identical, i.e. for all and . Suppose also that for some parameter we have
(32)
With the test functions and defined as above, there exists a constant such that the following is true for any
(33)
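Proposition 4.1 is proved in Appendix C via the Lindeberg exchange method. The mechanism is simple to demonstrate numerically; the toy sketch below (ours, with an ad hoc test statistic and dimensions) swaps the entries of a Rademacher matrix for Gaussian ones, one at a time, and records how little a smooth spectral statistic moves per swap:

```python
# Toy Lindeberg exchange: replace entries one by one and track a smooth
# resolvent statistic of the symmetrization (illustration only).
import numpy as np

rng = np.random.default_rng(7)
n = 40
A = rng.choice([-1.0, 1.0], size=(n, n)) / np.sqrt(n)   # start: Rademacher
B = rng.standard_normal((n, n)) / np.sqrt(n)            # target: Gaussian

def stat(M, z=0.5j):
    s = np.linalg.svd(M, compute_uv=False)
    return np.mean(1.0 / (np.concatenate([s, -s]) - z)).imag

M = A.copy()
max_step = 0.0
for i in range(n):
    for j in range(n):
        before = stat(M)
        M[i, j] = B[i, j]                                # one Lindeberg swap
        max_step = max(max_step, abs(stat(M) - before))

print("largest single-swap change:", max_step)           # small per swap
print("total drift after all swaps:", abs(stat(A) - stat(M)))
```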
4.2 Proof of Theorem 2
Using the quantitative comparison theorem (Proposition 4.1) and the smoothed analysis (Theorem 6), we now prove the quantitative universality.
For a general random matrix satisfying Assumptions (1) and (2), there exists another matrix that also satisfies the same assumptions such that the matrix has the same first three moments as and the difference between the fourth moments (in the sense of (32)) is . This is guaranteed by [EYY11, Lemma 3.4].
Lemma 4.1 and Proposition 4.1 yield
(34)
Using Lemma 4.1 for with and shifted by , we have
(35)
Using the smoothed analysis (Theorem 6), we have
(36)
Taking , and setting , we obtain the optimal bounds
(37)
Hence, thanks to the arbitrariness of , we have proved Theorem 2.
Finally, for the complex case, using the exact formula for the distribution of , we obtain a rate of convergence to the limiting law. Recall that
5 Condition Number
5.1 Smoothed analysis
Note that the condition number is scale invariant in the sense that $\kappa(aX) = \kappa(X)$ for any $a \neq 0$. Therefore, in the smoothed analysis, it suffices to consider $X_t$, whose singular values satisfy the stochastic differential equation (10).
Recall from Theorem 6, we have shown that
Note that in Lemma 3.6, using the same arguments as in Proposition 3.2, we can derive that
Then for any large and small , there exists such that the following holds with probability at least .
Without loss of generality, we assume that for some that can be arbitrarily small. Then we have
where in the last inequality we use the fact that is of constant order with overwhelming probability. By the arbitrariness of , we can relabel the parameter and obtain
Similarly, we can also prove a lower bound and conclude that
For the matrix , we can write the matrix as
Therefore we have that . As a consequence, we obtain that
which completes the proof for Theorem 3.
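These estimates are again easy to corroborate numerically (the following quick experiment is ours and purely illustrative): since $\sigma_{\max}$ is of constant order while $\sigma_{\min}$ is of order $n^{-1}$, the condition number should grow linearly in $n$.

```python
# Condition number scale check: kappa(X)/n roughly constant across n for
# square random matrices (illustration only).
import numpy as np

rng = np.random.default_rng(8)
for n in (100, 200, 400):
    kappas = [np.linalg.cond(rng.standard_normal((n, n)) / np.sqrt(n))
              for _ in range(50)]
    print(n, float(np.median(kappas)) / n)
```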
5.2 Quantitative universality
In this section, we prove Theorem 4, which establishes the quantitative universality for the condition number. We use universality for both the smallest and the largest singular values. In particular, since the smallest singular value is typically of order , quantitative control is necessary. The largest singular value is of constant order and is therefore easier to deal with. Specifically, for the largest singular value, due to the Tracy-Widom limit for the largest eigenvalue of the sample covariance matrix, we know that holds for any with overwhelming probability.
Let be an arbitrarily small constant. For any large and sufficiently large , we have
Using Theorem 2, we obtain
where the third inequality follows from the fact that is of constant order with overwhelming probability. Taking sufficiently large , we have
This yields the upper bound in (7), and the lower bound can be proved similarly. Hence, we have proved Theorem 4.
6 Beyond Strictly Square Matrices
As mentioned in the Introduction, the optimal local law for an $N \times n$ random matrix with general $N \neq n$ is a notoriously hard problem. In particular, the optimal rigidity estimates of Lemma 3.1 are unknown unless we restrict the difference $N - n$. In this section, we discuss a slight extension of the strictly-square case. We show that in the regime , all of our theorems still hold.
This claim is based on the following important observation. All proofs in this paper rely only on the local law (as well as its consequences), and therefore it suffices to show that such a local law is still valid for a general matrix. More specifically, the main task is to show that the optimal local semicircle law (16) still holds for the Girko symmetrization of an $N \times n$ random matrix. Modulo the optimal local law, the optimal rigidity remains valid as a by-product via standard approaches in random matrix theory.
For an $N \times n$ matrix $X$, we consider the augmented matrix, an $N \times N$ matrix obtained by adding $N - n$ columns to $X$ with independent entries satisfying (1) and (2). Without loss of generality, we may assume that the added columns are the first $N - n$ columns. Since the augmented matrix is square, the local semicircle law (see [AEK14, Theorem 1.1]) still holds. Specifically, for any fixed , define the spectral domain
Then for any , define the resolvent and we have
and
For an matrix and a subset , we define as the matrix
Recall the Girko symmetrization $\widetilde{X} = \begin{pmatrix} 0 & X \\ X^* & 0 \end{pmatrix}$ of a matrix $X$. Then we have .
For with , let . In particular we have and . Then we have the following resolvent identity (see e.g. [BGK17, Lemma 3.5])
From the local semicircle law for the square matrix , with overwhelming probability we have
Using local law for again, this implies that
More generally, we have
By the same arguments as above, for any fixed , we can derive that
By a telescoping summation, we derive
Since , the second term can be absorbed into the first term with a larger factor . Thanks to the arbitrariness of , we have
(39)
Moreover, the resolvent identity also yields
This yields
Using the Ward identity, this implies that
From the local law for , with overwhelming probability we have
Again, using , the telescoping sum yields
The second term can be absorbed into the first term for any fixed . The arbitrariness of concludes that
(40)
Acknowledgment
The author thanks Paul Bourgade for suggesting this problem and for insightful comments on an early version of this manuscript.
Appendix A Auxiliary Results
To make this paper self-contained, we collect some well-known results that are used in the paper.
The first result is about controlling the size of a martingale, which is used in Proposition 3.1 to bound the martingale term in the stochastic dynamics (42), and also in Lemma B.1 to bound (45). This is from [SW09, Appendix B.6, Equation (18)].
Lemma A.1.
For any continuous martingale and any , we have
The second result is the Helffer-Sjöstrand formula, which is a classical result in functional calculus. This formula is used in Proposition 4.1 to compute the trace of functions via the Stieltjes transform. We use the version in [EY17, Section 11.2].
Lemma A.2 (Helffer-Sjöstrand formula).
Let $f \in C^2(\mathbb{R})$ with compact support and let $\chi$ be a smooth cutoff function with support in $[-1, 1]$, with $\chi(y) = 1$ for $|y| \le 1/2$ and with bounded derivatives. Then
\[ f(\lambda) = \frac{1}{2\pi} \int_{\mathbb{R}^2} \frac{\mathrm{i}\,y f''(x)\chi(y) + \mathrm{i}\,\big(f(x) + \mathrm{i}\,y f'(x)\big)\chi'(y)}{\lambda - x - \mathrm{i}\,y}\,\mathrm{d}x\,\mathrm{d}y. \]
We also have the following resolvent expansion identity. This is a well-known result in linear algebra, and it is used in Proposition 4.1 to compare the resolvents of two matrices.
Lemma A.3 (Resolvent expansion).
For any two matrices $A$ and $B$ and any integer $m \ge 0$, we have
\[ (A + B)^{-1} = \sum_{k=0}^{m} (-1)^k A^{-1} \big(B A^{-1}\big)^{k} + (-1)^{m+1} (A + B)^{-1} \big(B A^{-1}\big)^{m+1}, \]
provided that all the matrix inverses exist.
Finally, we have some estimates for the Stieltjes transform of the semicircle law. For with , recall that denotes the Stieltjes transform of the semicircle distribution. The following estimates are well known in random matrix theory (see e.g. [EY17, Lemma 6.2]).
Lemma A.4.
We have for all with that
Furthermore, there is a constant such that for and we have
as well as
where is the distance of to the spectral edge.
Appendix B Proofs for Smoothed Analysis
B.1 Proof of Proposition 3.1
The proof is essentially the same as [Wan19, Proposition 3.8], and we briefly describe the key steps here for completeness. For any , we define and , where . Consider the stopping times
with the convention . We claim that it suffices to show that with overwhelming probability.
To prove this claim, for any and , we pick and . Note that the maximum principle (Lemma 3.2) implies for all and . Then we have . Also, note that for we have , and
Consider the events
On the event , the above estimates imply that . It further shows that
Since this holds for all and , we have shown that
(41)
Moreover, note that
Using Lemma A.1, we conclude that the event happens with overwhelming probability. By a union bound, we further have that happens with overwhelming probability. Together with the set inclusion (41), we conclude that the claim is true, i.e. it suffices to prove with overwhelming probability.
To prove with overwhelming probability, consider some fixed and , and define the function . By Lemma 3.5, the initial condition is well controlled . To bound the increments, note that the dynamics (17) yields
(42)
where
By the local semicircle law (16), we have
(43)
Also, we have
and
For the martingale part
using the rigidity of singular values (Lemma 3.1), with overwhelming probability we have
To estimate this integral, we chop the interval into subintervals where . We can bound the summation in the integral in the following way
Using similar discretization arguments as above, we can derive
and
Therefore, we obtain
Combining this estimate for the martingale term with previous estimates, a union bound shows that with overwhelming probability we have
Now we have proved with overwhelming probability and hence the desired result is true.
B.2 Proof of Lemma 3.10
The proof of Lemma 3.10 is a delicate task. The key part of the proof is a careful analysis of the dynamics. The main idea is to approximate the dynamics with a short-range version, which will be easier to control. To do this, we show the finite speed of propagation estimate for the short-range kernel of the parabolic-type equation (14) satisfied by . Then we prove a short-range approximation of the original dynamics and introduce a regularized equation. Finally, we show that, with a well-behaved initial condition, the regularized equation gives us the desired good approximation.
To begin with, the core input of the bootstrap argument is the following technical lemma, which states that the estimate of the local average will improve along with the induction hypothesis .
Lemma B.1.
Assume . Let be any fixed small constant. For any , any arbitrarily small and satisfying , , we have
(44)
Proof.
Fix and consider the function
An observation is that satisfy the same stochastic advection equation (17) with replaced by . Therefore, we have
(45)
where
Similarly as in the proof of Proposition 3.1, for and , define and where and . We also pick such that . Assuming , let be the arbitrarily small scale in the hypothesis. Let be some suitably large constant. Recall the stopping times
and consider the new stopping times
Recall the convention . As shown in the proof of Proposition 3.1, it suffices to show that with overwhelming probability.
A key ingredient for the analysis of the dynamics of is the following estimates on . To do this, we fix some and with , and let and .
On the one hand, we have a direct a priori estimate. Since , we have
Moreover, note that for with in the bulk and , uniformly we have . By Lemma 3.5, this shows
As a consequence, we have
(46)
On the other hand, the estimate can also be obtained via approximation
For the first term, since we have
Choosing , the remaining two terms are controlled by Lemma 3.8
Together, we decompose the error terms into two parts and obtain the following
(47)
With the above control on , the dynamics (45) can be used to bound similarly as in Proposition 3.1. For the first term, we have
(48)
For the soft edge part , using (46) we obtain
(49)
For , note that
This yields
(50)
Moreover, without loss of generality, we may assume that . Then, using (46) we obtain
(51)
Together with previous estimates, this shows
(52)
Similarly, we have
and
It suffices to bound the martingale term
Again we decompose the integral into two parts
The contribution from the soft edge is easy to control
For the other term, we use both (46) and (47)
Note that
For the other term, without loss of generality we may assume . For the term, we have
For the contribution of , we have
The first two terms in the bracket give us
For the remaining term, we have
Combining these results shows
Using Lemma A.1 and a union bound, for any fixed large and sufficiently large , we have
Together with previous estimates, with overwhelming probability we have
This implies with overwhelming probability. Moreover, we have shown in Lemma 3.1, Lemma 3.6 that and with overwhelming probability. Assuming the hypothesis , we also have with overwhelming probability. These imply that with overwhelming probability. Hence we complete the proof. ∎
Now we move on to the short-range approximation of the dynamics. Recall that satisfies the parabolic equation (14), and we rewrite it as
where the time-dependent operator is defined in the following way: For ,
Consider some parameter which will be determined later; we decompose the operator into two parts. The operators and represent the short-range and long-range interactions, respectively, and are defined as follows
Note that the operators are also time dependent. Let denote the semigroup associated with the operator in the sense
Also, let denote the semigroup associated with .
To prove the short-range approximation, we need the following finite speed of propagation estimate for the semigroup. Such estimates were proved in [CL19], and they carry over to our setting with minor changes.
Lemma B.2.
For any fixed small and large , there exists such that the following holds with probability at least . For any , , , , and such that , we have
(53)
With Lemma B.2, we have the following short-range approximation estimate. In particular, this short-range approximation can be improved based on the hypothesis .
Lemma B.3.
Assume . Let be any fixed small constant. There exists a constant such that for any , , , and , we have
(54)
Proof.
The Duhamel’s principle implies
On the event that Lemma B.2 holds, for , the finite speed of propagation yields
where . Moreover, using the property that is a contraction, we have
(55)
For , assuming and on the event that Lemma 3.7 holds, there exists a constant so that
For , Lemma 3.6 yields
Therefore, using rigidity of singular values, we have
Combined with (55), this implies the desired claim with . Note that is arbitrary, and thus it concludes the proof. ∎
Further, we will show that we have good control of a regularization of the short-range dynamics with well-behaved initial data. To do this, we follow the techniques developed in [BY17]. Consider some fixed times , the short-range parameter , and define an averaging space window scale . Throughout the remaining parts of this section, for any fixed arbitrarily small , we make the following assumption on these parameters
(56)
For a fixed index , as in [BY17], we define the flattening operator with parameter by
and the averaging operator
As shown in [BY17, Equation (7.4)], the averaging operator can also be represented as a combination of Lipschitz functions, i.e. there exists a Lipschitz function with such that
(57)
Finally, for , consider the regularized dynamics
The following lemma shows that averaging the regularized dynamics gives a good approximation for .
Lemma B.4.
Assume . Let be any fixed small constant. There exists a constant such that for any , , , , such that , and , we have
(58)
Proof.
For the first term , note that
When , we have
and the finite speed of propagation (53) yields
This gives
Similarly, this bound also holds in the case . Now suppose . Applying (23) and Hypothesis , we obtain
Combined with the estimate above, this implies
(59)
For the term , note that implies . Therefore, using the Lipschitz representation of the averaging operator (57), the short-range approximation (54) gives us
This shows
(60)
Finally, for the term , by the Lipschitz representation of the averaging operator (57), it can be rewritten in the following way:
Using (24) and (44), we control in the following way
Applying (23) to estimate , we obtain
Similarly, by the Lipschitz property of , we estimate as follows
Using the same arguments, by (23), we have
Together with the previous estimates, this leads to
Combined with (59) and (60), we obtain the desired result. ∎
Finally, we have all the tools to prove Lemma 3.10.
Proof of Lemma 3.10.
We fix some small and consider an arbitrarily small . Throughout the whole proof, we do all estimates on the overwhelming probability event where Lemma B.2, Lemma B.3 and Lemma B.4 hold. For a fixed index , we have
By the definition of the averaging operator, we know that on the set . Therefore, combined with the finite speed of propagation estimate (53) for the second term and the short-range approximation (54) for the first term, we obtain
(61)
It suffices to estimate . Consider the function
Similarly to Lemma 3.2, we can show a parabolic maximum principle for , and consequently it decreases in time. Moreover, note that if .
Let denote the index that attains the maximum. If there exists a time such that , then the finite speed of propagation (53) gives us
(62)
On the other hand, now we assume that for all . In this case, we have
This gives us
and therefore
Applying Lemma B.4 and Lemma 3.7 yields
where the left-hand side represents the right derivative of at time . Let ; then the above inequality leads to
Choosing
(63)
then we have
Similarly, this bound also holds for . Combined with (61) and (62), this completes the proof. ∎
Appendix C Proofs for Quantitative Universality
In this section, we prove the quantitative resolvent comparison Proposition 4.1.
The key idea is based on the Lindeberg exchange method (for a detailed introduction we refer to the monographs [EY17, VH14]). We first fix an ordering map of the indices . For , let be the random matrix defined as
so that and . By telescoping summation, it suffices to show the following is true uniformly in ,
(64)
To prove (64), we use the Helffer-Sjöstrand formula. Let be a smooth symmetric cutoff function such that if and if , with . For any matrix , let denote its Girko symmetrization
Recall that the symmetrized singular values are the eigenvalues of . With the cutoff function , applying Lemma A.2 to yields
(65)
where is the Lebesgue measure on and
The analysis of the comparison proceeds in the following steps:
Step 1: Approximation of . We first truncate the integral in (65) and define
The approximation error can be bounded by
For singular values near the origin, i.e. , we have
On the other hand, for , by the rigidity of singular values, we have the following overwhelming probability bound
Combining the above two bounds together, we obtain
with overwhelming probability.
Step 2: Expansions and moment matching. With the approximation by , it suffices to show that
(66)
uniformly for all . Now consider a fixed corresponding to the index , i.e. . We rewrite the matrices and in the following way
where the matrix coincides with and except on the entry with . Then note that the matrices satisfy and and all other entries are zero. Recall the notation for the Girko symmetrization. Consider the resolvents of the matrices and
The Taylor expansion yields
(67)
We first control the term corresponding to the fifth derivative. By Lemma A.3, the first order resolvent expansion gives us
Consequently,
We can restrict the integral to the domain , as the contribution outside this region is negligible. Moreover, a key observation is that the matrix only has two non-zero entries. Thus,
Note that in this integral domain, the scale of is smaller than the natural size of the local law. Therefore, we will use a suboptimal version of the local semicircle law for a larger spectral domain, which was discussed in [EKYY13, LS18]. For in this integral domain, with overwhelming probability we have
The same result also holds for . By Lemma A.4, we have for in the integral domain. Note that the contribution of the diagonal resolvent entries is negligible. Therefore, with overwhelming probability we have
Similarly, this bound also holds for , and we obtain
Hence the fifth order term in (67) is bounded by
(68)
Now we consider the first term in the Taylor expansion (67). Denote
and also define
Using the resolvent expansion (Lemma A.3) up to the fifth order, we obtain
A similar expansion also holds for . Then we have
(69)
A key observation is that for , the terms and only depend on the first three moments of and . Recall that the first three moments of and are identical. Therefore, the terms corresponding to in (69) make no contribution.
Step 3: Higher order error. For the term in (69), note that
(70)
A similar formula is also true for . Note that typically we have and , but we may have , , . Moreover, the terms with either or are combinatorially negligible in the summation and therefore we can ignore these terms in the following computations. Recall that the difference between the fourth moments of and is bounded by . Thus, we have
As mentioned above, for the integral in (69) we can restrict the integral domain to and . In this region, the entries of the resolvent are bounded by and . As a consequence,
(71)
For the term , since these terms involve the higher moments of and , we simply bound it by the size of and . By a similar expansion as in (70) and the local law, we have . Therefore,
(72)
The same bound also holds for .
References
- [AEK14] Oskari Ajanki, László Erdős, and Torben Krüger. Local semicircle law with imprimitive variance matrix. Electron. Commun. Probab., 19:no. 33, 9, 2014.
- [AEK17] Johannes Alt, László Erdős, and Torben Krüger. Local law for random Gram matrices. Electron. J. Probab., 22:Paper No. 25, 41, 2017.
- [AMR11] David Arthur, Bodo Manthey, and Heiko Röglin. Smoothed analysis of the k-means method. Journal of the ACM (JACM), 58(5):1–31, 2011.
- [BC10] Peter Bürgisser and Felipe Cucker. Smoothed analysis of Moore–Penrose inversion. SIAM Journal on Matrix Analysis and Applications, 31(5):2769–2783, 2010.
- [BCMV14] Aditya Bhaskara, Moses Charikar, Ankur Moitra, and Aravindan Vijayaraghavan. Smoothed analysis of tensor decompositions. In Proceedings of the forty-sixth annual ACM symposium on Theory of computing, pages 594–603, 2014.
- [BEK+14] Alex Bloemendal, László Erdős, Antti Knowles, Horng-Tzer Yau, and Jun Yin. Isotropic local laws for sample covariance and generalized Wigner matrices. Electron. J. Probab., 19:no. 33, 53, 2014.
- [BGK17] Florent Benaych-Georges and Antti Knowles. Local semicircle law for Wigner matrices. In Advanced topics in random matrices, volume 53 of Panor. Synthèses, pages 1–90. Soc. Math. France, Paris, 2017.
- [Bou22] Paul Bourgade. Extreme gaps between eigenvalues of Wigner matrices. Journal of the European Mathematical Society, 24(8):2823–2873, 2022.
- [BY17] P. Bourgade and H.-T. Yau. The eigenvector moment flow and local quantum unique ergodicity. Comm. Math. Phys., 350(1):231–278, 2017.
- [BYY14] Paul Bourgade, Horng-Tzer Yau, and Jun Yin. Local circular law for random matrices. Probability Theory and Related Fields, 159(3-4):545–595, 2014.
- [CL19] Ziliang Che and Patrick Lopatto. Universality of the least singular value for sparse random matrices. Electron. J. Probab., 24:Paper No. 9, 53, 2019.
- [CMS13] Claudio Cacciapuoti, Anna Maltsev, and Benjamin Schlein. Local Marchenko-Pastur law at the hard edge of sample covariance matrices. J. Math. Phys., 54(4):043302, 13, 2013.
- [Ede88] Alan Edelman. Eigenvalues and condition numbers of random matrices. SIAM J. Matrix Anal. Appl., 9(4):543–560, 1988.
- [EKYY13] László Erdős, Antti Knowles, Horng-Tzer Yau, and Jun Yin. Spectral statistics of Erdős–Rényi graphs I: Local semicircle law. The Annals of Probability, 41(3B):2279–2375, 2013.
- [ESYY12] László Erdős, Benjamin Schlein, Horng-Tzer Yau, and Jun Yin. The local relaxation flow approach to universality of the local statistics for random matrices. Ann. Inst. Henri Poincaré Probab. Stat., 48(1):1–46, 2012.
- [EY17] László Erdős and Horng-Tzer Yau. A dynamical approach to random matrix theory, volume 28 of Courant Lecture Notes in Mathematics. Courant Institute of Mathematical Sciences, New York; American Mathematical Society, Providence, RI, 2017.
- [EYY11] László Erdős, Horng-Tzer Yau, and Jun Yin. Universality for generalized Wigner matrices with Bernoulli distribution. J. Comb., 2(1):15–81, 2011.
- [FV16] Brendan Farrell and Roman Vershynin. Smoothed analysis of symmetric random matrices with continuous distributions. Proceedings of the American Mathematical Society, 144(5):2257–2261, 2016.
- [GPV21] Mehrdad Ghadiri, Richard Peng, and Santosh S Vempala. Sparse regression faster than $d^{\omega}$. arXiv preprint arXiv:2109.11537, 2021.
- [Hig02] Nicholas J Higham. Accuracy and stability of numerical algorithms. SIAM, 2002.
- [LS18] Ji Oon Lee and Kevin Schnelli. Local law and Tracy–Widom limit for sparse random matrices. Probability Theory and Related Fields, 171(1):543–616, 2018.
- [LSY19] Benjamin Landon, Philippe Sosoe, and Horng-Tzer Yau. Fixed energy universality of Dyson Brownian motion. Adv. Math., 346:1137–1332, 2019.
- [MT16] Govind Menon and Thomas Trogdon. Smoothed analysis for the conjugate gradient algorithm. SIGMA. Symmetry, Integrability and Geometry: Methods and Applications, 12:109, 2016.
- [Nie22] Zipei Nie. Matrix anti-concentration inequalities with applications. In Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, pages 568–581, 2022.
- [PV21] Richard Peng and Santosh Vempala. Solving sparse linear systems faster than matrix multiplication. In Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 504–521. SIAM, 2021.
- [Sma85] Steve Smale. On the efficiency of algorithms of analysis. Bulletin of the American Mathematical Society, 13(2):87–121, 1985.
- [SS20] Rikhav Shah and Sandeep Silwal. Smoothed analysis of the condition number under low-rank perturbations. arXiv preprint arXiv:2009.01986, 2020.
- [SST06] Arvind Sankar, Daniel A Spielman, and Shang-Hua Teng. Smoothed analysis of the condition numbers and growth factors of matrices. SIAM Journal on Matrix Analysis and Applications, 28(2):446–476, 2006.
- [ST04] Daniel A Spielman and Shang-Hua Teng. Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time. Journal of the ACM (JACM), 51(3):385–463, 2004.
- [SW09] Galen R. Shorack and Jon A. Wellner. Empirical processes with applications to statistics, volume 59 of Classics in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2009. Reprint of the 1986 original [MR0838963].
- [SX21] Kevin Schnelli and Yuanyuan Xu. Convergence rate to the Tracy-Widom laws for the largest eigenvalue of sample covariance matrices. arXiv preprint arXiv:2108.02728, 2021.
- [SX22a] Kevin Schnelli and Yuanyuan Xu. Convergence rate to the Tracy-Widom laws for the largest eigenvalue of Wigner matrices. Comm. Math. Phys., 393(2):839–907, 2022.
- [SX22b] Kevin Schnelli and Yuanyuan Xu. Quantitative Tracy-Widom laws for the largest eigenvalue of generalized Wigner matrices. arXiv preprint arXiv:2207.00546, 2022.
- [TB97] Lloyd N. Trefethen and David Bau, III. Numerical linear algebra. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1997.
- [TV10a] Terence Tao and Van Vu. Random matrices: the distribution of the smallest singular values. Geom. Funct. Anal., 20(1):260–297, 2010.
- [TV10b] Terence Tao and Van Vu. Smooth analysis of the condition number and the least singular value. Mathematics of computation, 79(272):2333–2352, 2010.
- [VH14] Ramon van Handel. Probability in high dimension. Technical report, Princeton University, 2014.
- [Wan19] Haoyu Wang. Quantitative universality for the largest eigenvalue of sample covariance matrices. arXiv preprint arXiv:1912.05473, 2019.
- [Wsc04] Mario Wschebor. Smoothed analysis of $\kappa(A)$. Journal of Complexity, 20(1):97–107, 2004.