
A new approach to strong convergence

Chi-Fang Chen, EECS, University of California, Berkeley, CA 94720, USA; Center for Theoretical Physics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA. [email protected]
Jorge Garza-Vargas, Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA. [email protected]
Joel A. Tropp, Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA. [email protected]
Ramon van Handel, Department of Mathematics, Princeton University, Princeton, NJ 08544, USA. [email protected]
Abstract.

A family of random matrices $\bm{X}^{N}=(X_{1}^{N},\ldots,X_{d}^{N})$ is said to converge strongly to a family of bounded operators $\bm{x}=(x_{1},\ldots,x_{d})$ when $\|P(\bm{X}^{N},\bm{X}^{N*})\|\to\|P(\bm{x},\bm{x}^{*})\|$ for every noncommutative polynomial $P$. This phenomenon plays a key role in several recent breakthroughs on random graphs, geometry, and operator algebras. However, proofs of strong convergence are notoriously delicate and have relied largely on problem-specific methods.

In this paper, we develop a new approach to strong convergence that uses only soft arguments. Our method exploits the fact that for many natural models, the expected trace of $P(\bm{X}^{N},\bm{X}^{N*})$ is a rational function of $\frac{1}{N}$ whose lowest-order asymptotics are easily understood. We develop a general technique to deduce strong convergence directly from these inputs using the inequality of A. and V. Markov for univariate polynomials and elementary Fourier analysis. To illustrate the method, we develop the following applications.

  1.

    We give a short proof of the result of Friedman that random regular graphs have a near-optimal spectral gap, and obtain a sharp understanding of the large deviation probabilities of the second eigenvalue.

  2.

    We prove a strong quantitative form of the strong convergence property of random permutation matrices due to Bordenave and Collins.

  3.

    We extend the above to any stable representation of the symmetric group, providing many new examples of the strong convergence phenomenon.

Key words and phrases:
Strong convergence; random permutations; random regular graphs
2010 Mathematics Subject Classification:
60B20; 15B52; 05C80; 46L53; 46L54

1. Introduction

Let $\bm{X}^{N}=(X_{1}^{N},\ldots,X_{d}^{N})$ be a sequence of $d$-tuples of random matrices of increasing dimension, and let $\bm{x}=(x_{1},\ldots,x_{d})$ be a $d$-tuple of bounded operators. Then $\bm{X}^{N}$ is said to converge strongly to $\bm{x}$ if

\|P(\bm{X}^{N},\bm{X}^{N*})\|\xrightarrow{N\to\infty}\|P(\bm{x},\bm{x}^{*})\|\quad\text{in probability}

for every noncommutative polynomial $P$.

The notion of strong convergence has its origin in the work of Haagerup and Thorbjørnsen [31] in the context of complex Gaussian (GUE) matrices. It has since proved to be a powerful tool in a broad range of applications. Notably, strong convergence plays a central role in a series of recent breakthroughs in the study of random lifts of graphs [7], random covering spaces [38, 46, 37, 48], random minimal surfaces [65], and operator algebras [31, 30, 34, 4, 35, 8]; we refer to the recent paper [8] for a more extensive discussion and bibliography.

In many cases, norm convergence can be highly nontrivial to establish even for the simplest polynomials $P$. For example, let $\bar{S}_{1}^{N},\ldots,\bar{S}_{d}^{N}$ be i.i.d. random permutation matrices of dimension $N$. Then the linear polynomial

\bar{A}^{N}=\bar{S}_{1}^{N}+\bar{S}_{1}^{N*}+\cdots+\bar{S}_{d}^{N}+\bar{S}_{d}^{N*}

is the adjacency matrix of a random $2d$-regular graph. Every $2d$-regular graph has largest eigenvalue $2d$ with eigenvector $1_{N}$, while the Alon–Boppana bound states that the second eigenvalue is at least $2\sqrt{2d-1}-o(1)$ [55]. It was conjectured by Alon, and proved in a breakthrough monograph of Friedman [27], that the random $2d$-regular graph satisfies $\|\bar{A}^{N}|_{\{1_{N}\}^{\perp}}\|=2\sqrt{2d-1}+o(1)$, so that such graphs have near-optimal second eigenvalue. The much more general result that random permutation matrices converge strongly, due to Bordenave and Collins [7], is of major importance in many recent applications cited above.
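
For readers who wish to see this phenomenon numerically, the following self-contained Python sketch (illustrative only, and not part of any argument in this paper; the function name and parameter values are ours) samples the permutation model described above and compares the second eigenvalue of $\bar{A}^{N}$ with $2\sqrt{2d-1}$.

    import numpy as np

    rng = np.random.default_rng(0)

    def second_eigenvalue(N, d):
        """Sample A = sum_i (S_i + S_i^T) for d i.i.d. uniform N x N permutation
        matrices S_i (the permutation model of a random 2d-regular multigraph) and
        return the largest remaining eigenvalue in absolute value after removing
        the trivial eigenvalue 2d (assumed simple)."""
        A = np.zeros((N, N))
        for _ in range(d):
            S = np.eye(N)[rng.permutation(N)]  # uniformly random permutation matrix
            A += S + S.T
        eigs = np.linalg.eigvalsh(A)           # sorted ascending; eigs[-1] is the trivial eigenvalue 2d
        return max(abs(eigs[0]), abs(eigs[-2]))

    N, d = 2000, 3
    print(second_eigenvalue(N, d), 2 * np.sqrt(2 * d - 1))

With high probability the two printed values are close for large $N$, in accordance with Friedman’s theorem.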

Given the power of the strong convergence phenomenon, it may be unsurprising that strong convergence has been notoriously difficult to prove (see, e.g., the discussion in [9]). In those cases where strong convergence has been established, the proofs rely on problem-specific methods and often require delicate computations. These obstacles have hampered efforts to establish strong convergence in new situations and to obtain a sharper understanding of this phenomenon.

In this paper, we develop a new approach to strong convergence that uses only soft arguments, and which requires limited problem-specific inputs as compared to previous methods. We will illustrate the method in the context of random permutation matrices and other representations of the symmetric group, resulting in surprisingly short proofs of strong convergence and new quantitative and qualitative information on the strong convergence phenomenon:

  • We prove a quantitative form of Friedman’s theorem on the second eigenvalue of random regular graphs [27] that reveals new probabilistic structure. In particular, we obtain a sharp characterization of the tail behavior of the second eigenvalue.

  • We obtain a quantitative form of the strong convergence of random permutation matrices due to Bordenave and Collins [7, 8], which significantly improves the best known convergence rate for arbitrary polynomials.

  • We establish strong convergence of random matrices defined by any stable representation of the symmetric group (in the sense of [25]). This provides a large new family of examples of the strong convergence phenomenon, and illustrates that strong convergence arises in settings far beyond the standard representation.

Further applications to several other models will appear in forthcoming work.

Let us emphasize at the outset that the main difficulty in establishing strong convergence is to prove the upper bound

\|P(\bm{X}^{N},\bm{X}^{N*})\|\leq\|P(\bm{x},\bm{x}^{*})\|+o(1)\quad\text{with probability }1-o(1)

as $N\to\infty$. The corresponding lower bound then follows automatically, either as a byproduct of the proof or from general principles (see, e.g., [46, pp. 16–19]). We therefore focus on upper bounds throughout this paper, and relegate a brief treatment of the corresponding lower bounds to Appendix A.

Organization of this paper

This paper is organized as follows. In section 2, we first outline the core ingredients of our approach in a general setting, while section 3 presents the main results obtained in this paper.

The next two sections introduce some general tools on which our methods are based. Section 4 recalls properties of polynomials and distributions that form the basis for our approach, while section 5 recalls basic properties of words in random permutation matrices that form the input for our methods.

The rest of the paper is devoted to the proofs of our main results. Sections 6 and 7 prove master inequalities for polynomials and smooth functions, respectively. These inequalities are applied in section 8 to random regular graphs, and in section 9 to strong convergence of random permutation matrices. Section 10 extends the above results to general stable representations of the symmetric group.

Finally, Appendix A is devoted to a brief treatment of lower bounds.

Notation

In the following, $a\lesssim b$ denotes that $a\leq Cb$ for a universal constant $C$. We denote $\mathbb{N}:=\{1,2,3,\ldots\}$ and $\mathbb{Z}_{+}:=\{0,1,2,\ldots\}$.

The symmetric group on $N$ letters is denoted as $\mathbf{S}_{N}$, the free group with free generators $g_{1},\ldots,g_{d}$ is denoted $\mathbf{F}_{d}$, and $e\in\mathbf{F}_{d}$ denotes the identity element.

We denote by $\mathcal{P}_{q}$ the space of real polynomials $h:\mathbb{R}\to\mathbb{R}$ of degree at most $q$, and by $\mathcal{P}$ the space of all real polynomials. We denote by $h^{(m)}$ the $m$th derivative of a univariate function, and by $\|h\|_{C^{m}[a,b]}:=\sum_{j=0}^{m}\sup_{x\in[a,b]}|h^{(j)}(x)|$.

We will denote by $\mathrm{M}_{N}(\mathbb{C})$ the space of $N\times N$ matrices with complex entries. The trace of a matrix $M$ is denoted as $\mathop{\mathrm{Tr}}M$, the normalized trace as $\mathop{\mathrm{tr}_{N}}M:=\frac{1}{N}\mathop{\mathrm{Tr}}M$, and the operator norm as $\|M\|$. The identity matrix is denoted $\mathbf{1}_{N}\in\mathrm{M}_{N}(\mathbb{C})$, and $1_{N}\in\mathbb{C}^{N}$ denotes the vector all of whose entries equal one. We use the convention that powers bind before the trace, e.g., $\mathop{\mathrm{tr}_{N}}(X^{N})^{p}:=\mathop{\mathrm{tr}_{N}}[(X^{N})^{p}]$.

If $X$ is a self-adjoint operator and $h:\mathbb{R}\to\mathbb{C}$, then the operator $h(X)$ is defined by the usual functional calculus (in particular, if $X$ is a self-adjoint matrix, $h$ is applied to the eigenvalues while keeping the eigenvectors fixed).

2. Outline

The main results of this paper will be presented in section 3, which can be read independently of the present section. However, as the new approach that underlies their proofs is much more broadly applicable, we begin in this section by introducing the core ingredients of the method in a general setting.

Throughout this section, we let $X^{N}$ be a sequence of $N$-dimensional self-adjoint random matrices, whose expected empirical eigenvalue distribution converges as $N\to\infty$ to a limiting probability measure $\nu_{0}$ with compact support. This weak convergence property is captured by convergence of the moments

\lim_{N\to\infty}\mathbf{E}[\mathop{\mathrm{tr}_{N}}(X^{N})^{p}]=\int x^{p}\,\nu_{0}(dx)\qquad\text{for all }p\in\mathbb{N}.

The moments $\mathbf{E}[\mathop{\mathrm{tr}_{N}}(X^{N})^{p}]$ generally admit a combinatorial description whose behavior as $N\to\infty$ is well understood, so that weak convergence is readily accessible. However, weak convergence can only ensure that a fraction $1-o(1)$ of eigenvalues converges to the support of $\nu_{0}$, and cannot rule out existence of outliers in the spectrum. This is the basic difficulty in strong convergence problems.
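
To make the difficulty concrete, note that a single outlier is invisible at the level of any fixed moment: if one eigenvalue of $X^{N}$ sits at a point $\lambda$ with $|\lambda|\leq K$ outside the support of $\nu_{0}$, its contribution to the normalized trace is at most

\frac{1}{N}\,|\lambda|^{p}\leq\frac{K^{p}}{N}\xrightarrow{N\to\infty}0\qquad\text{for each fixed }p\in\mathbb{N},

so the limits in the preceding display are unaffected by its presence.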

In the problems of interest in this paper, the random matrix $X^{N}=P(\bm{X}^{N},\bm{X}^{N*})$ is a noncommutative polynomial of a given family $\bm{X}^{N}=(X_{1}^{N},\ldots,X_{d}^{N})$ of random matrices. In this setting, it is natural to think of $\nu_{0}$ itself as the spectral distribution of a certain limiting operator $X_{\rm F}=P(\bm{x},\bm{x}^{*})$ in the sense that

\tau(X_{\rm F}^{p})=\int x^{p}\,\nu_{0}(dx)\qquad\text{for all }p\in\mathbb{N},

where $\bm{x}=(x_{1},\ldots,x_{d})$ is a limiting family of operators associated to $\bm{X}^{N}$ and $\tau$ is a positive linear functional that plays the role of the trace for the limiting operators. Such models are naturally formulated in the setting of $C^{*}$-probability spaces, for which we refer to [54] for an excellent introduction. However, this general framework will not be needed in this paper, as the limiting objects associated to the models we will study admit an explicit construction that is recalled in section 3.1.

Example 2.1.

While the aim of this section is to discuss the methods of this paper in a general setting, the reader may keep in mind the following guiding example. Let $X^{N}=\bar{A}^{N}|_{\{1_{N}\}^{\perp}}$ be the adjacency matrix of a random $2d$-regular graph with the trivial eigenvalue removed, as defined in the introduction. In this case, the limiting operator $X_{\rm F}$ may be viewed as the adjacency matrix of the infinite $2d$-regular tree, and $\tau(X)=\langle\delta_{e},X\delta_{e}\rangle$, where $\delta_{e}$ is the coordinate vector associated with the root of the tree. The spectral distribution $\nu_{0}$ of $X_{\rm F}$ is the Kesten–McKay distribution, which is supported in the interval $[-\|X_{\rm F}\|,\|X_{\rm F}\|]$ with $\|X_{\rm F}\|=2\sqrt{2d-1}$. A more precise description of this model is given in section 3.2 below.
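
For the reader’s convenience (the explicit formula is not needed in what follows), recall that the Kesten–McKay distribution of the $2d$-regular tree has density

\nu_{0}(dx)=\frac{d\,\sqrt{4(2d-1)-x^{2}}}{\pi\,(4d^{2}-x^{2})}\,1_{[-2\sqrt{2d-1},\,2\sqrt{2d-1}]}(x)\,dx,

which for $d=1$ reduces to the arcsine law on $[-2,2]$, the spectral distribution of the adjacency operator of the $2$-regular tree $\mathbb{Z}$.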

Remark 2.2.

Throughout this paper, we will work with self-adjoint random matrices $X^{N}$ and operators $X_{\rm F}$. This entails no loss of generality: since $\|X\|^{2}=\|X^{*}X\|$ for any bounded operator $X$, strong convergence of arbitrary noncommutative polynomials $P$ is equivalent to strong convergence of the self-adjoint polynomials $P^{*}P$. We may therefore restrict attention to self-adjoint polynomials $P$.

While it is not essential for the applicability of the methods of this paper, we will also assume for simplicity that the random matrices $X^{N}$ are uniformly bounded, i.e., there is a constant $K$ so that $\|X^{N}\|\leq K$ a.s. for all $N$. This is the case for all models considered in this paper; for example, $\|X^{N}\|\leq 2d$ a.s. in Example 2.1.

In the remainder of this section, we assume that weak convergence

\lim_{N\to\infty}\mathbf{E}[\mathop{\mathrm{tr}_{N}}(X^{N})^{p}]=\tau(X_{\rm F}^{p})\qquad\text{for all }p\in\mathbb{N} (2.1)

holds. The problem of strong convergence is to show that not only the moments, but also the operator norm, of $X^{N}$ converges to that of the limiting model $X_{\rm F}$, which ensures a lack of outliers in the spectrum. More precisely, we aim to explain in the sequel how to prove the strong convergence upper bound $\|X^{N}\|\leq\|X_{\rm F}\|+o(1)$, which is the main difficulty. The corresponding lower bound is typically an easy consequence of (2.1), as will be discussed in Appendix A.

To date, proofs of strong convergence have been based on resolvent equations [31, 64], interpolation [20, 57, 58, 2], or the moment method [7, 9, 8]. The first two approaches rely on analytic tools, such as integration by parts formulae or free stochastic calculus, that are only available for special models. In contrast, the only known way to access the properties of models such as those based on random permutations or group representations is through moment computations. We therefore begin by recalling the challenges faced by the moment method.

2.1. Prelude: the moment method

The basic premise behind the moment method is the observation that since $|\tau(X_{\rm F}^{p})|\leq\|X_{\rm F}\|^{p}$, (2.1) implies that

\big(\mathbf{E}\|X^{N}\|^{2p}\big)^{1/2p}\leq\big(\mathbf{E}[\mathop{\mathrm{Tr}}(X^{N})^{2p}]\big)^{1/2p}\leq N^{1/2p}\big(\|X_{\rm F}\|+o(1)\big) (2.2)

as $N\to\infty$. This does not in itself suffice to prove $\|X^{N}\|\leq\|X_{\rm F}\|+o(1)$, as the right-hand side diverges when $p$ is fixed. The essence of the moment method is that the desired bound would follow if (2.2) remained valid for $p=p(N)\gg\log N$, as then $N^{1/2p}=1+o(1)$. We emphasize that this is a much stronger property than (2.1). In practice, the implementation of this method faces two major obstacles.

  • While moment asymptotics for fixed $p$ are accessible in many situations, the case where $p$ grows rapidly with $N$ is often a difficult combinatorial problem. Despite the availability of tools to facilitate the analysis of large moments for strong convergence, such as nonbacktracking and linearization methods (see, e.g., [7, 8]), the core of the analysis remains dependent on delicate computations that do not readily carry over to new situations.

  • A more fundamental problem is that the inequality (2.2) that forms the foundation for the moment method may fail to hold altogether for any $p\gg\log N$. To see why, suppose that $\mathbf{P}[\|X^{N}\|\geq\|X_{\rm F}\|+\varepsilon]\geq N^{-c}$ for some $\varepsilon,c>0$. Then

    \mathbf{E}[\mathop{\mathrm{Tr}}(X^{N})^{2p}]\geq N^{-c}(\|X_{\rm F}\|+\varepsilon)^{2p},

    which precludes the validity of (2.2) for $p\gg\log N$. It was observed by Friedman [27] that this situation arises already in the setting of random regular graphs due to the presence with probability $N^{-c}$ of dense subgraphs called “tangles”. The application of the moment method in this setting requires conditioning on the absence of tangles, which significantly complicates the analysis.

2.2. A new approach

The approach of this paper is also based on moment computations, which are however used in an entirely different manner. As we will explain below, our method deduces norm convergence from moments by first letting NN\to\infty and then pp\to\infty, bypassing the challenges of the moment method.

Inputs

Our approach requires two basic inputs that we presently describe.

Let $h\in\mathcal{P}_{q}$ be a polynomial of degree at most $q$. Then $\mathbf{E}[\mathop{\mathrm{tr}_{N}}h(X^{N})]$ is a linear combination of the moments $\mathbf{E}[\mathop{\mathrm{tr}_{N}}(X^{N})^{p}]$ for $p\leq q$. We will exploit the fact that in many situations, the moments are rational functions of $\frac{1}{N}$:

\mathbf{E}[\mathop{\mathrm{tr}_{N}}h(X^{N})]=\Phi_{h}(\tfrac{1}{N})=\frac{f_{h}(\frac{1}{N})}{g_{q}(\frac{1}{N})}, (2.3)

where $f_{h}$ and $g_{q}$ are polynomials that depend only on $h$ and $q$, respectively. This phenomenon is very common; e.g., it arises from Weingarten calculus [21, 19] or from genus expansions [50, Chapter 1]. For the random permutation models considered in this paper, the relevant properties are recalled in section 5.

In general, the function $\Phi_{h}$ is extremely complicated. However, our methods will use only soft information on its structure: we require upper bounds on the degrees of $f_{h}$ and $g_{q}$ (which are proportional to $q$ for the models considered in this paper), and we must show that $g_{q}$ does not vanish near zero. Both properties can be read off almost immediately from a combinatorial expression for $\Phi_{h}$.

The second input to our method is the asymptotic expansion as $N\to\infty$

\mathbf{E}[\mathop{\mathrm{tr}_{N}}h(X^{N})]=\nu_{0}(h)+\frac{\nu_{1}(h)}{N}+O\bigg(\frac{1}{N^{2}}\bigg), (2.4)

where $\nu_{0}$ and $\nu_{1}$ are linear functionals on the space $\mathcal{P}$ of real polynomials. Clearly

\nu_{0}(h)=\Phi_{h}(0)=\lim_{N\to\infty}\mathbf{E}[\mathop{\mathrm{tr}_{N}}h(X^{N})]=\tau(h(X_{\rm F})),
\nu_{1}(h)=\Phi_{h}^{\prime}(0)=\lim_{N\to\infty}N\big(\mathbf{E}[\mathop{\mathrm{tr}_{N}}h(X^{N})]-\tau(h(X_{\rm F}))\big)

are defined by the lowest-order asymptotics of the moments. We will exploit the fact that simple combinatorial expressions for $\nu_{0}$ and $\nu_{1}$ are readily accessible in practice.

Remark 2.3.

It is evident from the above formulas that $\nu_{0}(h)=\int h(x)\,\nu_{0}(dx)$ coincides with the spectral distribution $\nu_{0}$ of $X_{\rm F}$, so we know a priori that it is defined by a probability measure. In particular, even though $\nu_{0}(h)$ appears in (2.4) only for polynomial test functions $h$, its definition automatically extends to any continuous function $h$. In contrast, there is no reason a priori to expect that $\nu_{1}(h)$ can be meaningfully defined for non-polynomial test functions $h$: there could be functions $h$ for which weak convergence holds at a slower rate than $\frac{1}{N}$, in which case the expansion (2.4) fails to hold. That (2.4) and the definition of $\nu_{1}(h)$ nonetheless extend to smooth functions $h$ will be a key output of our method.

Outline

Our basic aim is to achieve the following.

  (a)

    We aim to show that the validity of (2.4) extends from polynomials to smooth functions $h$. In particular, we will show that $\nu_{1}$ extends to a compactly supported distribution (in the sense of Schwartz, cf. Definition 4.6).

  (b)

    We aim to show that $\mathop{\mathrm{supp}}\nu_{1}\subseteq[-\|X_{\rm F}\|,\|X_{\rm F}\|]$. (That $\mathop{\mathrm{supp}}\nu_{0}\subseteq[-\|X_{\rm F}\|,\|X_{\rm F}\|]$ holds automatically, as $\nu_{0}$ is the spectral distribution of $X_{\rm F}$.)

A bound on the norm then follows easily. Let $\chi:\mathbb{R}\to[0,1]$ be a smooth function so that $\chi=0$ on $[-\|X_{\rm F}\|-\frac{\varepsilon}{2},\|X_{\rm F}\|+\frac{\varepsilon}{2}]$ and $\chi=1$ on $[-\|X_{\rm F}\|-\varepsilon,\|X_{\rm F}\|+\varepsilon]^{c}$. Then $\nu_{0}(\chi)=\nu_{1}(\chi)=0$ by (b), and thus (a) yields

\mathbf{P}[\|X^{N}\|\geq\|X_{\rm F}\|+\varepsilon]\leq\mathbf{E}[\mathop{\mathrm{Tr}}\chi(X^{N})]=O\bigg(\frac{1}{N}\bigg).

As $\varepsilon>0$ was arbitrary, $\|X^{N}\|\leq\|X_{\rm F}\|+o(1)$ in probability.
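
For completeness, let us spell out the two steps in the last display. On the event $\|X^{N}\|\geq\|X_{\rm F}\|+\varepsilon$, the matrix $\chi(X^{N})$ has an eigenvalue equal to one, so that

\mathbf{1}_{\{\|X^{N}\|\geq\|X_{\rm F}\|+\varepsilon\}}\leq\mathop{\mathrm{Tr}}\chi(X^{N}),

while taking expectations and applying (a) together with $\nu_{0}(\chi)=\nu_{1}(\chi)=0$ gives

\mathbf{E}[\mathop{\mathrm{Tr}}\chi(X^{N})]=N\,\mathbf{E}[\mathop{\mathrm{tr}_{N}}\chi(X^{N})]=N\bigg(\nu_{0}(\chi)+\frac{\nu_{1}(\chi)}{N}+O\bigg(\frac{1}{N^{2}}\bigg)\bigg)=O\bigg(\frac{1}{N}\bigg).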

We now explain the steps that will be used to prove (a) and (b).

Step 1: the polynomial method

At the heart of our approach lies a general method to obtain a quantitative form of (2.4): we will show that

\bigg|\mathbf{E}[\mathop{\mathrm{tr}_{N}}h(X^{N})]-\nu_{0}(h)-\frac{\nu_{1}(h)}{N}\bigg|\lesssim\frac{q^{8}}{N^{2}}\|h\|_{C^{0}[-K,K]} (2.5)

for any $h\in\mathcal{P}_{q}$, where $\|X^{N}\|\leq K$ a.s. for all $N$. While achieving such a bound by previous methods would require hard analysis, it arises here from a soft argument: we can “upgrade” an asymptotic expansion (2.4) to a strong nonasymptotic bound (2.5) merely by virtue of the fact that $\Phi_{h}$ is rational.

To this end, observe that the left-hand side of (2.5) is nothing other than the remainder in the first-order Taylor expansion of $\Phi_{h}$ at zero, so that

\big|\Phi_{h}(\tfrac{1}{N})-\Phi_{h}(0)-\tfrac{1}{N}\Phi_{h}^{\prime}(0)\big|\leq\tfrac{1}{2N^{2}}\|\Phi_{h}^{\prime\prime}\|_{C^{0}[0,\frac{1}{N}]}.

This bound appears useless at first sight, as it relies on the behavior of $\Phi_{h}(x)$ for $x\not\in J:=\{\frac{1}{N}:N\in\mathbb{N}\}$, where its spectral interpretation is lost. However, as $\Phi_{h}$ is rational, we can control its behavior by its values on $J$ alone by means of classical inequalities for arbitrary univariate polynomials $f$ of degree $q$:

  • The inequality $\|f^{\prime}\|_{C^{0}[-1,1]}\leq q^{2}\|f\|_{C^{0}[-1,1]}$ of A. and V. Markov (Lemma 4.1).

  • A corollary of the Markov inequality that $\|f\|_{C^{0}[-1,1]}\lesssim\sup_{x\in I}|f(x)|$ for any set $I\subset[-1,1]$ with $O(\frac{1}{q^{2}})$ spacing between its points (Lemma 4.2).

Applying these inequalities to the numerator and denominator of $\Phi_{h}$ yields (2.5) with minimal effort. We emphasize that the only inputs used here are upper bounds on the degrees of $f_{h}$ and $g_{q}$ in (2.3), and the fact that $g_{q}$ does not vanish near zero.
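
The following heuristic accounting (ours; all constants are suppressed, and the precise bookkeeping is carried out in section 6) indicates where the factor $q^{8}$ in (2.5) comes from. If $f_{h}$ and $g_{q}$ have degree $O(q)$ and $g_{q}$ is nearly constant on $[0,a]$, a rational form of the Markov inequality (Lemma 4.3) gives

\|\Phi_{h}^{\prime\prime}\|_{C^{0}[0,a]}\lesssim\frac{q^{4}}{a^{2}}\,\|\Phi_{h}\|_{C^{0}[0,a]},

while the interpolation inequality (Lemma 4.2) bounds $\|\Phi_{h}\|_{C^{0}[0,a]}$ by $\sup_{x\in J\cap[0,a]}|\Phi_{h}(x)|\leq\|h\|_{C^{0}[-K,K]}$, provided the points of $J\cap[0,a]$ have spacing $O(\frac{a}{q^{2}})$; since consecutive points of $J$ near $a$ are roughly $a^{2}$ apart, this forces $a\asymp q^{-2}$. Inserting these bounds into the Taylor remainder of the previous display (at least when $\frac{1}{N}\leq a$) yields a bound of order $\frac{q^{4}}{a^{2}N^{2}}\|h\|_{C^{0}[-K,K]}\asymp\frac{q^{8}}{N^{2}}\|h\|_{C^{0}[-K,K]}$.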

Step 2: the extension problem

We now aim to extend (2.5) to $h\in C^{\infty}$. This is not merely a technical issue: while $\mathbf{E}[\mathop{\mathrm{tr}_{N}}h(X^{N})]$ and $\nu_{0}(h)=\tau(h(X_{\rm F}))$ are defined for $h\in C^{\infty}$ by functional calculus, it is unclear that $\nu_{1}(h)$ can even be meaningfully defined when $h$ is not a polynomial (cf. Remark 2.3).

In order to address these issues, we must replace the degree-dependent bound (2.5) by a bound that depends only on the smoothness of the polynomial $h$. This is achieved as follows. Rather than applying (2.5) directly to $h\in\mathcal{P}_{q}$, we expand

h(x)=\sum_{j=0}^{q}a_{j}T_{j}(K^{-1}x)

in terms of Chebyshev polynomials of the first kind $T_{j}$, and apply (2.5) to each $T_{j}$ individually. As Chebyshev polynomials are uniformly bounded by $1$, this replaces $q^{8}\|h\|_{C^{0}[-K,K]}$ by $\sum_{j=1}^{q}j^{8}|a_{j}|\lesssim\|h\|_{C^{9}[-K,K]}$ in (2.5), where the latter inequality follows by classical Fourier analysis. We therefore obtain

\bigg|\mathbf{E}[\mathop{\mathrm{tr}_{N}}h(X^{N})]-\nu_{0}(h)-\frac{\nu_{1}(h)}{N}\bigg|\lesssim\frac{1}{N^{2}}\|h\|_{C^{9}[-K,K]} (2.6)

for every polynomial $h\in\mathcal{P}$. (We will in fact use a stronger inequality of Zygmund, cf. Lemma 4.4, that achieves better rates in our main results.)

Since the inequality (2.6) no longer depends on the degree of the polynomial $h$, it extends to arbitrary smooth functions $h$, as polynomials are dense in $C^{\infty}$. In particular, it ensures that the left-hand side extends uniquely to a compactly supported distribution, so that $\nu_{1}(h)$ can be defined for any $h\in C^{\infty}$.

Step 3: the asymptotic moment method

It remains to bound the support of $\nu_{1}$. To this end, we make fundamental use of a key property of compactly supported distributions: their support is bounded by the exponential growth rate of their moments (Lemma 4.9). In particular, if we define

\rho=\limsup_{p\to\infty}|\nu_{1}(x^{p})|^{1/p}=\limsup_{p\to\infty}\lim_{N\to\infty}\big|N\big(\mathbf{E}[\mathop{\mathrm{tr}_{N}}(X^{N})^{p}]-\tau(X_{\rm F}^{p})\big)\big|^{1/p},

then $\mathop{\mathrm{supp}}\nu_{1}\subseteq[-\rho,\rho]$. Thus our method ultimately reduces to a form of the moment method, but one where we first let $N\to\infty$ and only then $p\to\infty$.

In contrast to the moments of the random matrix $X^{N}$, the moments of $\nu_{1}$ turn out to be easy to analyze for the models considered in this paper, using a simple idea that is inspired by earlier work of Friedman [26, Lemma 2.4]: even though $\nu_{1}$ does not have a clear spectral interpretation, its moments can be expressed as a sum of products of matrix elements of powers of $X_{\rm F}$. As the sum has only polynomially many terms, the desired bound $\rho\leq\|X_{\rm F}\|$ follows readily.

2.3. The role of cancellations

A surprising feature of our approach is that tangles, which form a fundamental obstruction to the moment method, are completely ignored. This seems unexpected, as the method is based on an estimate (2.5) for traces of high-degree polynomials, which are merely linear combinations of moments. Indeed, as was explained in section 2.1, the presence of tangles causes the random matrix moments of large degree to be exponentially larger than their limiting values, that is, $\mathbf{E}[\mathop{\mathrm{tr}_{N}}(X^{N})^{2p}]\geq e^{cp}\tau(X_{\rm F}^{2p})$ for $p\gg\log N$. In contrast, (2.5) yields a bound on the difference between linear combinations of the random matrix moments and their limiting values that is only polynomial in the degree.

The key feature of (2.5) is that it involves a uniform bound $\|h\|_{C^{0}[-K,K]}$ on the test function $h$. It therefore yields no useful information on moments of order $p\gg\log N$, as $\|h\|_{C^{0}[-K,K]}=K^{p}$ is exponential in $p$ for $h(x)=x^{p}$. However, it yields a powerful bound for polynomials $h$ that are uniformly bounded on the interval $[-K,K]$ independently of their degree. The estimate (2.5) therefore reveals a new phenomenon: the effect of tangles on the individual moments cancels out when they are combined to form bounded test functions $h$. One of the key features of the polynomial method is that it is able to capture these cancellations.

2.4. Related work

The observation that norm bounds follow from the validity of (2.4) for smooth functions $h$ and a bound on the first-order support $\mathop{\mathrm{supp}}\nu_{1}$ has been widely used since the works of Haagerup and Thorbjørnsen [31] and Schultz [64] (see also the work of Parraud [58], where higher-order expansions are considered). However, these works rely heavily on analytic and operator-algebraic tools that are not available for the kind of models considered in this paper. What is fundamentally new here is that our method achieves this aim using only simple moment computations, which are much more broadly applicable.

The polynomial method that lies at the heart of our approach is inspired by work in complexity theory [1]. We are aware of little precedent for the use of this method in a random matrix context, besides a distantly related idea that appears in Bourgain and Tzafriri [11, Theorem 2.3]. The first author used a variant of this idea in concurrent work to establish weak convergence of certain random unitary matrices that arise in quantum information theory [15, 14].

We emphasize that the appearance of Chebyshev polynomials in Step 2 above is unrelated to their connection with nonbacktracking walks that is used, for example, in [27, Lemma 2.3]. Indeed, our Chebyshev polynomials are normalized by the a.s. bound $K$ on $\|X^{N}\|$ (e.g., $2d$ in Friedman’s theorem) as opposed to the support of the limiting spectrum $\|X_{\rm F}\|$ ($2\sqrt{2d-1}$ in Friedman’s theorem).

Finally, we mention the interesting works [6, 23] that develop a method to bound the spectral radius of certain non-Hermitian matrices using $N\to\infty$ moment asymptotics with fixed $p$, similarly avoiding the challenges of the moment method in that setting. However, the methods developed in these works are unrelated to our approach and are not applicable to the problems considered in this paper.

3. Main results

In this section, we formulate and discuss the main results of this paper. As our primary aim is to achieve strong convergence, we will focus the presentation on upper bounds on the norm as explained in section 1. Let us note, however, that a byproduct of the analysis will also yield a quantitative form of weak convergence, which is of independent interest; see Corollary 7.2 below.

3.1. Preliminaries

Before we turn to our main results, we must briefly recall some basic facts about random permutation matrices and their limiting model. The following definitions will remain in force throughout this paper.

Definition 3.1.

Let $\bar{S}_{1}^{N},\ldots,\bar{S}_{d}^{N}$ be i.i.d. random permutation matrices of dimension $N$, and denote by $S_{i}^{N}:=\bar{S}_{i}^{N}|_{\{1_{N}\}^{\perp}}$ their restriction to the invariant subspace $\{1_{N}\}^{\perp}\subset\mathbb{C}^{N}$. We will often write $\bm{S}^{N}=(S_{1}^{N},\ldots,S_{d}^{N})$ and $\bm{S}^{N*}=(S_{1}^{N*},\ldots,S_{d}^{N*})$.

Definition 3.2.

Let $\bm{s}=(s_{1},\ldots,s_{d})$ be defined by $s_{i}:=\lambda(g_{i})$, where $g_{1},\ldots,g_{d}$ are the free generators of $\mathbf{F}_{d}$ and $\lambda:\mathbf{F}_{d}\to B(l^{2}(\mathbf{F}_{d}))$ is the left-regular representation defined by $\lambda(g)\delta_{h}=\delta_{gh}$. Define the vector state $\tau(a):=\langle\delta_{e},a\delta_{e}\rangle$ on $B(l^{2}(\mathbf{F}_{d}))$.

The basic weak convergence property of random permutation matrices, due to Nica [52] (see Corollary 5.3 for a short proof), states that

\lim_{N\to\infty}\mathbf{E}\big[\mathop{\mathrm{tr}_{N}}P(\bm{S}^{N},\bm{S}^{N*})\big]=\tau\big(P(\bm{s},\bm{s}^{*})\big)

for every noncommutative polynomial $P$. This property plays the role of (2.1) in section 2. The aim of the strong convergence problem is to prove that this convergence holds not only for the trace but also for the norm.

The basic inputs to the methods of this paper, as described in section 2.2, are well known in the present setting. They will be reviewed in section 5 below.

Remark 3.3.

Even though the $S_{i}^{N}$ are $(N-1)$-dimensional matrices defined on $\{1_{N}\}^{\perp}$, we will normalize the trace $\mathop{\mathrm{tr}_{N}}$ by $N$ rather than by $N-1$, as this leads to cleaner expressions for the rational functions that arise in the proof. This makes no difference to our results, and is mainly done for notational convenience.

3.2. Random regular graphs

Let $\bar{A}^{N}$ be the adjacency matrix of the random $2d$-regular graph with $N$ vertices defined in section 1. Then $A^{N}:=\bar{A}^{N}|_{\{1_{N}\}^{\perp}}$ is defined by the linear polynomial of random permutation matrices

A^{N}:=S_{1}^{N}+S_{1}^{N*}+\cdots+S_{d}^{N}+S_{d}^{N*},

and the associated limiting model is

A_{\rm F}:=s_{1}+s_{1}^{*}+\cdots+s_{d}+s_{d}^{*}.

Note that $A_{\rm F}$ is nothing other than the adjacency matrix of the Cayley graph of $\mathbf{F}_{d}$ generated by $g_{1},\ldots,g_{d}$ and their inverses, that is, of the $2d$-regular tree.

It is a classical fact due to Kesten [43] that $\|A_{\rm F}\|=2\sqrt{2d-1}$. That

\|A^{N}\|\leq 2\sqrt{2d-1}+o(1)\quad\text{with probability }1-o(1)

is due to Friedman [27]. Friedman’s proof was simplified by Bordenave [5], and a new proof with an improved convergence rate was given by Huang and Yau [42]. Very recently, the latter approach was further developed in the impressive works of Huang, McKenzie and Yau [40, 41] to achieve the optimal convergence rate.

3.2.1. An effective Friedman theorem

As a first illustration of the methods of this paper, we will give a short new proof of Friedman’s theorem in section 8.1. A direct implementation of the approach described in section 2 yields the following.

Theorem 3.4.

For every $d\geq 2$, $N\geq 1$, and $\varepsilon<2d-2\sqrt{2d-1}$, we have

\mathbf{P}\big[\|A^{N}\|\geq 2\sqrt{2d-1}+\varepsilon\big]\lesssim\frac{1}{N}\,\bigg(\frac{d\log d}{\varepsilon}\bigg)^{8}\log\bigg(\frac{2ed}{\varepsilon}\bigg).

Theorem 3.4 implies that when $d$ is fixed as $N\to\infty$, we have (here $Z_{N}=O_{\mathrm{P}}(z_{N})$ denotes that $\{Z_{N}/z_{N}\}_{N\geq 1}$ is bounded in probability)

\|A^{N}\|\leq 2\sqrt{2d-1}+O_{\mathrm{P}}\big(\big(\tfrac{\log N}{N}\big)^{1/8}\big).
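
To deduce this from Theorem 3.4, one may for instance take

\varepsilon=C\,d\log d\,\bigg(\frac{\log N}{N}\bigg)^{1/8}

with a large constant $C$: the right-hand side of Theorem 3.4 is then of order $C^{-8}\,\frac{\log(2ed/\varepsilon)}{\log N}\lesssim C^{-8}$ for fixed $d$ and $N$ large, which can be made arbitrarily small by choosing $C$ large.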

This rate falls short of the optimal $N^{-2/3}$ rate in Friedman’s theorem that was very recently established in [40, 41]. However, the methods of [42, 40, 41] rely heavily on the special structure of random regular graphs, and it is unclear at present whether they could be applied to the study of strong convergence. In contrast, the methods of this paper will achieve the same rate for arbitrary polynomials of random permutation matrices, see section 3.3 below.

The quantitative nature of Theorem 3.4 also enables us to consider what happens when $d,N\to\infty$ simultaneously. It is an old question of Vu [66, §5] whether $\|A^{N}\|=(1+o(1))2\sqrt{2d-1}$ with probability $1-o(1)$ remains valid when $d,N\to\infty$ in an arbitrary manner. We can now settle this question for the permutation model of random regular graphs that is considered here (see Remark 3.8 below for a brief discussion of other models of random regular graphs).

Corollary 3.5.

$\|A^{N}\|=(1+o(1))2\sqrt{2d-1}$ with probability $1-o(1)$ whenever $N\to\infty$ and $d=d(N)$ depends on $N$ in an arbitrary manner.

Proof.

That the conclusion holds for $d\geq(\log N)^{5}$ was proved in [12, §3.2.2]. In the complementary regime $d\leq(\log N)^{5}$, Theorem 3.4 readily yields the upper bound $\|A^{N}\|\leq(1+o(1))2\sqrt{2d-1}$ with probability $1-o(1)$, while the corresponding lower bound follows from the Alon–Boppana theorem [55]. ∎

3.2.2. The staircase theorem

As was explained in section 2, the approach of this paper only requires an understanding of the first-order term $\nu_{1}$ in the $\frac{1}{N}$-expansion of the moments. However, in the setting of random regular graphs, a detailed understanding of the higher-order terms was achieved by Puder [61] using methods of combinatorial group theory. When such additional information is available, the approach of this paper is readily adapted to achieve stronger results.

The following theorem will be proved in section 8.2 by taking full advantage of the results of [61]. We emphasize that this is the only part of this paper where we will use asymptotics of expected traces beyond the lowest order.

Theorem 3.6.

Define

\rho_{m}:=\begin{cases}2\sqrt{2d-1}&\text{for }2m-1\leq\sqrt{2d-1},\\ 2m-1+\tfrac{2d-1}{2m-1}&\text{for }2m-1>\sqrt{2d-1},\end{cases}

and let $m_{*}:=\lfloor\frac{1}{2}(\sqrt{2d-1}+1)\rfloor$ be the largest integer $m$ so that $2m-1\leq\sqrt{2d-1}$.

Then for every $d\geq 2$, $m_{*}\leq m\leq d-1$, and $0<\varepsilon<\rho_{m+1}-\rho_{m}$, we have

\mathbf{P}\big[\|A^{N}\|\geq\rho_{m}+\varepsilon\big]\leq\frac{C_{d}}{N^{m}}\frac{1}{\varepsilon^{4(m+1)}}\log\bigg(\frac{2e}{\varepsilon}\bigg)
for all $N\geq 1$, and
\mathbf{P}\big[\|A^{N}\|\geq\rho_{m}+\varepsilon\big]\geq\frac{1-o(1)}{N^{m}}

as $N\to\infty$. Here $C_{d}$ is a constant that depends on $d$ only.

[Figure 3.1: staircase pattern of the tail probabilities of $\|A^{N}\|$. The plot shows $\mathbf{P}[\|A^{N}\|\geq x]$ as a function of $x$, dropping from $\tfrac{1}{N^{m_{*}}}$ to $\tfrac{1}{N^{d-1}}$ in steps located at $x=2\sqrt{2d-1},\rho_{m_{*}+1},\rho_{m_{*}+2},\rho_{m_{*}+3},\ldots,2d$.]

Theorem 3.6 reveals an unusual staircase pattern of the large deviations of $\|A^{N}\|$, which is illustrated in Figure 3.1. The lower bound, which follows from an elementary argument of Friedman [27, Theorem 2.11], arises due to the presence of tangles: Friedman shows that the presence of a vertex with $m>m_{*}$ self-loops, which happens with probability $\approx N^{1-m}$, gives rise to an outlier in the spectrum at location $\approx\rho_{m}$. The fact that our upper bound precisely matches this behavior shows that the tail probabilities of $\|A^{N}\|$ are completely dominated by these events.
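
For concreteness, consider $d=5$ (random $10$-regular graphs): then $\sqrt{2d-1}=3$, $m_{*}=2$, and the thresholds of Theorem 3.6 are

\rho_{2}=2\sqrt{2d-1}=6,\qquad\rho_{3}=5+\tfrac{9}{5}=6.8,\qquad\rho_{4}=7+\tfrac{9}{7}\approx 8.29,

so that $\mathbf{P}[\|A^{N}\|\geq\rho_{m}+\varepsilon]$ decays as $N^{-m}$ for $m=2,3,4$, with the staircase terminating at $\rho_{5}=2d=10$, the trivial a.s. bound on $\|A^{N}\|$.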

Remark 3.7.

Let us highlight two further consequences of Theorem 3.6.

  1.

    Theorem 3.6 shows that $\|A^{N}\|\leq 2\sqrt{2d-1}+\varepsilon$ with probability at least $1-\frac{c}{N^{m_{*}}}$ for any $\varepsilon>0$, closing the gap between the upper and lower bounds in the main result of Friedman’s original monograph [27, p. 2]. This was previously achieved by Friedman and Kohler [29] by a refinement of Friedman’s methods.

  2.

    The lower bound of Theorem 3.6 shows that the $O(\frac{1}{N})$ probability bound of Theorem 3.4 cannot be improved in general for fixed $\varepsilon>0$, as this bound is sharp when $m_{*}=1$ (that is, when $2\leq d\leq 4$).

Remark 3.8.

The random regular graph model considered in this paper is known as the permutation model. Several other models of random regular graphs are also considered in the literature. All these models are known to be contiguous to the permutation model as $N\to\infty$ for fixed degree $2d$, so that any statement that holds with probability $1-o(1)$ for the permutation model remains valid with probability $1-o(1)$ for the other models; cf. [27, pp. 2–3] and the references therein.

It should be emphasized, however, that low probability events are not preserved by contiguity. In particular, the detailed quantitative picture in Theorem 3.6 is specific to the permutation model. The corresponding picture for other models of random regular graphs must be investigated on a case by case basis.

Similarly, the different models of random regular graphs are no longer contiguous when the degree diverges $d\to\infty$, so that high degree results as in Corollary 3.5 must be investigated separately for each model. Partial results on this problem for the uniform model of random regular graphs may be found in [3, 63, 36].

3.3. Strong convergence of random permutation matrices

The adjacency matrix of a random regular graph is one very special example of a polynomial of random permutation matrices. The much more general fact that the norm of every noncommutative polynomial of random permutation matrices converges to that of its limiting model is an important result due to Bordenave and Collins [7, 8]. Here we give a short proof that yields an effective form of this result.

We will formulate our strong convergence results for general noncommutative polynomials $P\in\mathrm{M}_{D}(\mathbb{C})\otimes\mathbb{C}\langle\bm{s},\bm{s}^{*}\rangle$ with matrix coefficients, that is,

P(\bm{s},\bm{s}^{*})=\sum_{w}A_{w}\otimes w(\bm{s},\bm{s}^{*}),

where the sum is over a finite set of words $w$ in the symbols $s_{1},\ldots,s_{d},s_{1}^{*},\ldots,s_{d}^{*}$ and $A_{w}\in\mathrm{M}_{D}(\mathbb{C})$ are matrix coefficients. The case of scalar coefficients is recovered for $D=1$, but the more general setting considered here arises often in applications (e.g., in the study of random lifts [7]). For any such polynomial, we define the norm

\|P\|_{\mathrm{M}_{D}(\mathbb{C})\otimes C^{*}(\mathbf{F}_{d})}=\sup_{\bm{U}}\|P(\bm{U},\bm{U}^{*})\|,

where the supremum is taken over all $d$-tuples of unitary matrices of any dimension. The notation used here agrees with the standard definition in terms of the norm in the full $C^{*}$-algebra of the free group, see, e.g., [17, Theorem 7]. However, this is not important for our purposes, and the reader may simply view the above as the definition of the norm. We emphasize that $\|P\|_{\mathrm{M}_{D}(\mathbb{C})\otimes C^{*}(\mathbf{F}_{d})}$ is not the same as $\|P(\bm{s},\bm{s}^{*})\|$, which corresponds to the reduced $C^{*}$-algebra of the free group. For example, for $P(\bm{s},\bm{s}^{*})=s_{1}+s_{1}^{*}+\cdots+s_{d}+s_{d}^{*}$, we have $\|P(\bm{s},\bm{s}^{*})\|=2\sqrt{2d-1}$ and $\|P\|_{C^{*}(\mathbf{F}_{d})}=2d$. In practice, $\|P\|_{\mathrm{M}_{D}(\mathbb{C})\otimes C^{*}(\mathbf{F}_{d})}$ may be bounded trivially by the sum of the norms of the matrix coefficients of $P$.

We now state our main result on strong convergence of random permutations.

Theorem 3.9.

Let $d\geq 2$, and let $P\in\mathrm{M}_{D}(\mathbb{C})\otimes\mathbb{C}\langle\bm{s},\bm{s}^{*}\rangle$ be any self-adjoint noncommutative polynomial of degree $q_{0}$. Then we have

\mathbf{P}\big[\|P(\bm{S}^{N},\bm{S}^{N*})\|\geq\|P(\bm{s},\bm{s}^{*})\|+\varepsilon\big]\lesssim\frac{D}{N}\bigg(\frac{Kq_{0}\log d}{\varepsilon}\bigg)^{8}\log\bigg(\frac{eK}{\varepsilon}\bigg)

for all $\varepsilon<K-\|P(\bm{s},\bm{s}^{*})\|$, where $K=\|P\|_{\mathrm{M}_{D}(\mathbb{C})\otimes C^{*}(\mathbf{F}_{d})}$.

This theorem will be proved in section 9. Note that the limiting norm $\|P(\bm{s},\bm{s}^{*})\|$ can in principle be computed explicitly using the results of [44].

Remark 3.10.

The assumption that $P$ is self-adjoint entails no loss of generality: the analogous bound for a non-self-adjoint polynomial $P$ of degree $q_{0}$ follows by applying this result to the self-adjoint polynomial $P^{*}P$ of degree $2q_{0}$ (cf. Remark 2.2).

Theorem 3.9 shows that $\|P(\bm{S}^{N},\bm{S}^{N*})\|\leq\|P(\bm{s},\bm{s}^{*})\|+O_{\mathrm{P}}\big(\big(\frac{\log N}{N}\big)^{1/8}\big)$ for any polynomial $P$. This significantly improves the best known bound to date, due to Bordenave and Collins [8, Theorem 1.4], which yields fluctuations of order $\frac{\log\log N}{\log N}$. Our bound can be directly substituted in applications, such as to random lifts of graphs [7, §1.5], to improve the best known convergence rates.

Let us note that for fixed $\varepsilon>0$, the tail probability of order $\frac{1}{N}$ in Theorem 3.9 cannot be improved in general, as is illustrated by Theorem 3.6 with $d=2$.

Remark 3.11.

While Theorem 3.9 yields much stronger quantitative bounds for fixed $D$ than prior results, it can only be applied when $D=o(N)$. In contrast, it was recently shown in [8, Corollary 1.5] that strong convergence remains valid in the present setting even for $D$ as large as $N^{(\log N)^{1/2}}$. Such bounds could be achieved using our methods if the supports of the higher-order terms in the $\frac{1}{N}$-expansion of the moments were still bounded by $\|P(\bm{s},\bm{s}^{*})\|$. While this is the case for continuous models such as GUE [58], it is simply false in the present setting: the proof of Theorem 3.6 shows that tangles can already arise in the second-order term. It is an interesting question whether the approach of this paper can be combined with conditioning on the absence of tangles to achieve improved bounds.

3.4. Stable representations of the symmetric group

Let $\pi_{N}:\mathbf{S}_{N}\to B(V_{N})$ be a finite-dimensional unitary representation of the symmetric group $\mathbf{S}_{N}$. Then we can define random matrices $\Pi^{N}_{1},\ldots,\Pi^{N}_{d}$ of dimension $\dim V_{N}$ as

\Pi_{i}^{N}:=\pi_{N}(\sigma_{i}), (3.1)

where $\sigma_{1},\ldots,\sigma_{d}$ are i.i.d. uniformly distributed elements of $\mathbf{S}_{N}$. When $\pi_{N}$ is the standard representation, we recover the random permutation matrices $\Pi_{i}^{N}=S_{i}^{N}$ as a special case. Other representations, however, capture a much larger family of random matrix models. We will prove strong convergence for random matrices defined by any stable representation of $\mathbf{S}_{N}$ (see section 3.4.1), which yields a far-reaching generalization of Theorem 3.9. This result is of interest for several reasons:

  • It provides many new examples of the strong convergence phenomenon.

  • It shows that strong convergence can be achieved with much less randomness than in the standard representation: $\Pi^{N}_{i}$ uses the same number of random bits as $S_{i}^{N}$, but has much larger dimension ($\dim V_{N}\asymp N^{\alpha}$ with $\alpha$ arbitrarily large). See [9] for analogous questions in the context of the unitary group.

  • It provides new evidence in support of long-standing questions on the expansion of random Cayley graphs of the symmetric group, for which extensive numerical evidence and conjectures are available; see [62] and the references therein.

  • Random matrices defined by representations other than the standard representation arise in various applications [28, 33].

Remark 3.12.

The question whether strong asymptotic freeness can be derandomized has been investigated in the theoretical computer science literature [51, 56] by means of pseudorandom permutation matrices. The results of the present section suggest that high-dimensional representations of $\mathbf{S}_{N}$ can provide a different perspective on such questions. We omit a detailed discussion of the number of random bits needed by our bounds, as significantly stronger results in this direction were subsequently obtained by Cassidy [13] by combining the methods of this paper with new group-theoretic ideas; see Remark 3.16 below.

3.4.1. Stable representations

The approach of this paper requires that the moments of the random matrices of interest are rational functions of $\frac{1}{N}$. For this to be the case, we cannot choose an arbitrary representation $\pi_{N}$ of $\mathbf{S}_{N}$ for each $N$. Instead, we will work with stable representations [25, 18] that are defined consistently for different $N$. We briefly introduce the relevant notions here.

Following [32], denote by $\xi_{i}(\sigma)$ the number of fixed points of $\sigma^{i}$ for $\sigma\in\mathbf{S}_{N}$. The sequence $\xi_{1}(\sigma),\xi_{2}(\sigma),\ldots$ determines the conjugacy class of $\sigma$. If $\pi_{N}:\mathbf{S}_{N}\to B(V_{N})$ is any finite-dimensional unitary representation, its character $\sigma\mapsto\mathop{\mathrm{Tr}}\pi_{N}(\sigma)$ is a class function and can therefore be expressed as a polynomial of $\xi_{1}(\sigma),\xi_{2}(\sigma),\ldots$

Definition 3.13.

A finite-dimensional unitary representation $\pi_{N}:\mathbf{S}_{N}\to B(V_{N})$ of $\mathbf{S}_{N}$, defined for each $N\geq N_{0}$, is stable if there exist $r\in\mathbb{N}$ and a polynomial $\varphi\in\mathbb{C}[x_{1},\ldots,x_{r}]$ so that $\mathop{\mathrm{Tr}}\pi_{N}(\sigma)=\varphi(\xi_{1}(\sigma),\ldots,\xi_{r}(\sigma))$ for all $N\geq N_{0},\sigma\in\mathbf{S}_{N}$.

Thus stable representations are representations of $\mathbf{S}_{N}$ whose characters are defined by the same polynomial $\varphi$ for all $N\geq N_{0}$. For example, the standard representation is stable as it satisfies $\mathop{\mathrm{Tr}}\pi_{N}=\xi_{1}-1$ for all $N$.
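
As a further illustration (a standard character computation, included only as an example of Definition 3.13), the permutation representation of $\mathbf{S}_{N}$ on unordered pairs $\{i,j\}\subseteq\{1,\ldots,N\}$ is stable with $r=2$: a pair is fixed by $\sigma$ precisely when both of its points are fixed or when $\sigma$ swaps them, so that

\mathop{\mathrm{Tr}}\pi_{N}(\sigma)=\binom{\xi_{1}(\sigma)}{2}+\frac{\xi_{2}(\sigma)-\xi_{1}(\sigma)}{2}\qquad\text{for all }N\geq 2,\ \sigma\in\mathbf{S}_{N}.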

The irreducible stable representations of $\mathbf{S}_{N}$ can be constructed explicitly as follows. Fix a base partition $\lambda=(\lambda_{1}\geq\cdots\geq\lambda_{\ell}>0)\vdash|\lambda|$ of $|\lambda|=\sum_{i=1}^{\ell}\lambda_{i}$. Then for every $N\geq|\lambda|+\lambda_{1}$, the irreducible representation of $\mathbf{S}_{N}$ defined by

\lambda[N]=(N-|\lambda|\geq\lambda_{1}\geq\cdots\geq\lambda_{\ell})\vdash N (3.2)

is stable. Moreover, it follows from [32, Proposition B.2] that every stable representation in the sense of Definition 3.13 is a direct sum of such irreducible representations defined by fixed base partitions $\lambda^{(1)},\ldots,\lambda^{(s)}$.

3.4.2. Strong convergence

Fix a stable representation $\pi_{N}:\mathbf{S}_{N}\to B(V_{N})$ defined for $N\geq N_{0}$ by a character polynomial $\varphi\in\mathbb{C}[x_{1},\ldots,x_{r}]$. We aim to prove strong convergence of the random matrices $\Pi^{N}_{1},\ldots,\Pi^{N}_{d}$ defined by (3.1).

We will not require that $\pi_{N}$ is irreducible, but we assume it does not contain the trivial representation. The dimension of $\Pi_{i}^{N}$ is given by

D_{N}:=\dim(V_{N})=\mathop{\mathrm{Tr}}\pi_{N}(e)=\varphi(N,N,\ldots,N).

Thus $D_{N}$ is a polynomial in $N$; we denote its degree by $\alpha$, so that $D_{N}\asymp N^{\alpha}$. (For example, for the standard representation $\varphi(x_{1})=x_{1}-1$, so that $D_{N}=N-1$ and $\alpha=1$, and the exponent $4(\alpha+1)$ in Theorem 3.14 below matches the exponent $8$ in Theorem 3.9.)

We now formulate our main result on strong convergence of $\bm{\Pi}^{N}=(\Pi^{N}_{1},\ldots,\Pi^{N}_{d})$, whose proof is contained in section 10 below.

Theorem 3.14.

Let $d\geq 2$, and let $P\in\mathrm{M}_{D}(\mathbb{C})\otimes\mathbb{C}\langle\bm{s},\bm{s}^{*}\rangle$ be any self-adjoint noncommutative polynomial of degree $q_{0}$. Then we have

\mathbf{P}\big[\|P(\bm{\Pi}^{N},\bm{\Pi}^{N*})\|\geq\|P(\bm{s},\bm{s}^{*})\|+\varepsilon\big]\leq\frac{CD}{N}\bigg(\frac{Kq_{0}\log d}{\varepsilon}\bigg)^{4(\alpha+1)}\log\bigg(\frac{eK}{\varepsilon}\bigg)

for all $\varepsilon<K-\|P(\bm{s},\bm{s}^{*})\|$ and $N\geq N_{0}$. Here we define $K=\|P\|_{\mathrm{M}_{D}(\mathbb{C})\otimes C^{*}(\mathbf{F}_{d})}$, and $C$ is a constant that depends on the choice of stable representation.

Remark 3.15.

It is certainly possible to obtain an explicit expression for $C$ from the proof; we have suppressed the dependence on the choice of stable representation for simplicity of presentation, and we did not optimize this dependence in the proof.

Remark 3.16.

The bound $\|P(\bm{\Pi}^{N},\bm{\Pi}^{N*})\|\leq\|P(\bm{s},\bm{s}^{*})\|+O_{\mathrm{P}}\big(\big(\frac{\log N}{N}\big)^{1/4(\alpha+1)}\big)$ that follows from Theorem 3.14 becomes weaker when we consider stable representations of increasingly large dimension $D_{N}\asymp N^{\alpha}$. This is not a restriction of our method, however, but rather reflects a gap in the understanding of stable representations at the time that this paper was written [32, Conjecture 1.8]. Important progress in this direction, recently obtained by Cassidy [13], yields an improved probability bound where $\frac{1}{N\varepsilon^{4(\alpha+1)}}$ is replaced by $\frac{1}{N^{\alpha}\varepsilon^{8\alpha}}$, resulting in a convergence rate that is independent of $\alpha$. As is discussed in [13], this yields strong convergence uniformly for all stable representations with $\alpha\leq N^{\frac{1}{12}-\delta}$ for any $\delta>0$.

Theorem 3.14 can be readily applied to concrete situations. For example, it implies that random $2d$-regular Schreier graphs defined by the action of $\mathbf{S}_{N}$ on $k$-tuples of distinct elements of $\{1,\ldots,N\}$ have second eigenvalue $2\sqrt{2d-1}+o(1)$ with probability $1-o(1)$, settling a question discussed in [32, §1.4]. Indeed, it is not difficult to see (cf. [32, §8]) that the restricted adjacency matrix $A^{N}=\bar{A}^{N}|_{\{1\}^{\perp}}$ of such a random graph can be represented as

A^{N}=\Pi_{1}^{N}+\Pi_{1}^{N*}+\cdots+\Pi_{d}^{N}+\Pi_{d}^{N*}

for some stable representation of $\mathbf{S}_{N}$ (which depends on $k$) that does not contain the trivial representation, so that the conclusion follows immediately as a special case of Theorem 3.14. Applications of this model may be found in [28, 33]. (Let us note for completeness that the case $k=1$ reduces to Friedman’s theorem, while the case $k=2$ was previously studied by Bordenave and Collins [7, §1.6].)

4. Basic tools

The aim of this section is to introduce the general facts on polynomials and compactly supported distributions that form the basis for the methods of this paper. While most of the tools in this section follow readily from known results, it is useful to state them in the particular form in which they will be needed.

4.1. Markov inequalities

One of the key tools that will be used in our analysis is the following classical inequality of A. Markov and V. Markov. Here we recall that $(2k-1)!!:=(2k-1)(2k-3)\cdots 5\cdot 3\cdot 1=\frac{(2k)!}{2^{k}k!}$.

Lemma 4.1 (Markov inequality).

Let $h\in\mathcal{P}_{q}$ and $a>0$, $m\in\mathbb{N}$. Then we have

\|h^{(m)}\|_{C^{0}[0,a]}\leq\frac{1}{(2m-1)!!}\bigg(\frac{2q^{2}}{a}\bigg)^{m}\|h\|_{C^{0}[0,a]}.
Proof.

Apply [10, p. 256(d)] to $P(x)=h(\frac{a}{2}x+\frac{a}{2})$. ∎

Two basic issues will arise in our applications of this inequality. First, we will not be able to control the relevant functions on the entire interval [0,a][0,a], but only on a discrete subset thereof. This issue will be addressed using a standard interpolation inequality that is itself a direct consequence of the Markov inequality.

Lemma 4.2 (Interpolation).

Let $h\in\mathcal{P}_{q}$, and fix a subset $I\subseteq[0,a]$ such that $[0,a]\subseteq I+[-\frac{a}{4q^{2}},\frac{a}{4q^{2}}]$. Then we have

\|h\|_{C^{0}[0,a]}\leq 2\sup_{x\in I}|h(x)|.
Proof.

Apply [16, Lemma 3(i), p. 91] to $P(x)=h(\frac{a}{2}x+\frac{a}{2})$. ∎

The second issue is that we will aim to apply these inequalities to rational functions rather than to polynomials. However, in the cases of interest to us, the denominator of the rational function will be nearly constant. The following lemma extends the Markov inequality to this setting.

Lemma 4.3 (Rational Markov).

Let $r=\frac{f}{g}$ be a rational function with $f,g\in\mathcal{P}_{q}$, and let $a>0$ and $m\in\mathbb{N}$. Suppose that $c:=\sup_{x,y\in[0,a]}\big|\frac{g(x)}{g(y)}\big|<\infty$. Then

\|r^{(m)}\|_{C^{0}[0,a]}\leq m!\,\bigg(\frac{5cq^{2}}{a}\bigg)^{m}\|r\|_{C^{0}[0,a]}.
Proof.

Applying the product rule to $f=rg$ yields

r^{(m)}g=f^{(m)}-\sum_{k=1}^{m}{m\choose k}r^{(m-k)}g^{(k)}.

As $\frac{1}{c}\|\frac{h}{g}\|_{C^{0}[0,a]}\leq\frac{\|h\|_{C^{0}[0,a]}}{\|g\|_{C^{0}[0,a]}}\leq\|\frac{h}{g}\|_{C^{0}[0,a]}$ for every function $h$, we can estimate

\|r^{(m)}\|_{C^{0}[0,a]}\leq\bigg\|\frac{f^{(m)}}{g}\bigg\|_{C^{0}[0,a]}+\sum_{k=1}^{m}{m\choose k}\|r^{(m-k)}\|_{C^{0}[0,a]}\bigg\|\frac{g^{(k)}}{g}\bigg\|_{C^{0}[0,a]}
\leq 2c\sum_{k=1}^{m}{m\choose k}\frac{1}{(2k-1)!!}\bigg(\frac{2q^{2}}{a}\bigg)^{k}\|r^{(m-k)}\|_{C^{0}[0,a]}

by applying Lemma 4.1 to $f^{(m)}$ and $g^{(k)}$. Here the factor $2$ in the second line is due to the fact that the $f^{(m)}$ and $k=m$ terms in the first line yield the same bound.

We now reason by induction. Clearly the conclusion holds for m=0m=0. Now suppose the conclusion holds up to m1m-1. Then the above inequality yields

r(m)C0[0,a]\displaystyle\|r^{(m)}\|_{C^{0}[0,a]} (5cq2a)mrC0[0,a]2ck=1m(mk)(mk)!(2k1)!!(25c)k\displaystyle\leq\bigg{(}\frac{5cq^{2}}{a}\bigg{)}^{m}\|r\|_{C^{0}[0,a]}\cdot 2c\sum_{k=1}^{m}{m\choose k}\frac{(m-k)!}{(2k-1)!!}\bigg{(}\frac{2}{5c}\bigg{)}^{k}
m!(5cq2a)mrC0[0,a]2c(cosh(45c)1)\displaystyle\leq m!\,\bigg{(}\frac{5cq^{2}}{a}\bigg{)}^{m}\|r\|_{C^{0}[0,a]}\cdot 2c\,\Big{(}\cosh\Big{(}\sqrt{\tfrac{4}{5c}}\Big{)}-1\Big{)}

as (mk)(mk)!(2k1)!!=2km!(2k)!{m\choose k}\frac{(m-k)!}{(2k-1)!!}=\frac{2^{k}m!}{(2k)!}. Finally, use c1c\geq 1 and cosh(x)15x8\cosh(\sqrt{x})-1\leq\frac{5x}{8} for x[0,1]x\in[0,1]. ∎

The above bounds cannot be essentially improved without further assumptions. Lemma 4.1 attains equality for Chebyshev polynomials [10, p. 256(d)]. The optimality of Lemma 4.2 is discussed in [22], while the optimality of Lemma 4.3 is illustrated by considering r(x)=1uxr(x)=\frac{1}{u-x} on x[0,1]x\in[0,1] for u>1u>1.

4.2. Chebyshev polynomials

Let h𝒫qh\in\mathcal{P}_{q} and fix K>0K>0. In this section, we will consider the behavior of hh on the interval [K,K][-K,K].

Denote by TjT_{j} the Chebyshev polynomial of the first kind of degree jj, defined by Tj(cosθ)=cos(jθ)T_{j}(\cos\theta)=\cos(j\theta). Then hh can be uniquely expressed as

h(x)=j=0qajTj(K1x)h(x)=\sum_{j=0}^{q}a_{j}T_{j}(K^{-1}x) (4.1)

for some real coefficients a0,,aqa_{0},\ldots,a_{q}. Note that these coefficients are precisely the Fourier coefficients of the cosine series of h(Kcosθ)h(K\cos\theta). We can therefore apply a classical result of Zygmund on absolute convergence of trigonometric series.
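
For concreteness, the coefficients a_j can be computed numerically precisely in this way. The following sketch (with an arbitrary polynomial h and an arbitrary value of K, chosen only for illustration) recovers them by a discrete cosine sum and checks the expansion (4.1).

```python
import numpy as np

# Recover the coefficients a_j of (4.1) as cosine coefficients of f(theta) = h(K cos theta).
K, q = 3.0, 8                                   # arbitrary choices
rng = np.random.default_rng(1)
h = np.polynomial.Polynomial(rng.standard_normal(q + 1))

n = 4 * (q + 1)                                 # enough angles for the discrete sum to be exact
theta = np.pi * (np.arange(n) + 0.5) / n
f = h(K * np.cos(theta))
a = np.array([(2.0 / n) * np.sum(f * np.cos(j * theta)) for j in range(q + 1)])
a[0] /= 2.0

# Check the expansion h(x) = sum_j a_j T_j(x/K) on a few points of [-K, K].
x = np.linspace(-K, K, 7)
recon = sum(a[j] * np.cos(j * np.arccos(x / K)) for j in range(q + 1))
print(np.max(np.abs(recon - h(x))))             # of the order of machine precision
```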

Lemma 4.4 (Zygmund).

Let hh be as in (4.1) and define f(θ):=h(Kcosθ)f(\theta):=h(K\cos\theta). Then

|a0|hC0[K,K],|a_{0}|\leq\|h\|_{C^{0}[-K,K]},

and

j=1qjm|aj|βf(m+1)Lβ[0,2π]\sum_{j=1}^{q}j^{m}|a_{j}|\lesssim\beta_{*}\|f^{(m+1)}\|_{L^{\beta}[0,2\pi]}

for every m+m\in\mathbb{Z}_{+} and β>1\beta>1, where we defined 1/β:=11/β1/\beta_{*}:=1-1/\beta.

Proof.

That |a0|hC0[K,K]|a_{0}|\leq\|h\|_{C^{0}[-K,K]} follows immediately from a0=12π02πf(θ)𝑑θa_{0}=\frac{1}{2\pi}\int_{0}^{2\pi}f(\theta)\,d\theta. To obtain the second estimate, we note that

f(m)(θ)={j=0q(1)m/2jmajcos(jθ)for even m,j=0q(1)(m+1)/2jmajsin(jθ)for odd m.f^{(m)}(\theta)=\begin{cases}\sum_{j=0}^{q}(-1)^{m/2}j^{m}a_{j}\cos(j\theta)&\text{for even }m,\\ \sum_{j=0}^{q}(-1)^{(m+1)/2}j^{m}a_{j}\sin(j\theta)&\text{for odd }m.\end{cases}

The conclusion follows from [67, p. 242]. ∎

A direct consequence is the following.

Corollary 4.5.

Let hh be as in (4.1) and let m+m\in\mathbb{Z}_{+}. Then

j=0qjm|aj|cm,KhCm+1[K,K],\sum_{j=0}^{q}j^{m}|a_{j}|\leq c_{m,K}\,\|h\|_{C^{m+1}[-K,K]},

where the constant cm,Kc_{m,K} depends on m,Km,K only.

Proof.

That f(θ)=h(Kcosθ)f(\theta)=h(K\cos\theta) satisfies f(m+1)L[0,2π]cm,KhCm+1[K,K]\|f^{(m+1)}\|_{L^{\infty}[0,2\pi]}\leq c_{m,K}\|h\|_{C^{m+1}[-K,K]} follows by the chain rule. It remains to apply Lemma 4.4 with β=\beta=\infty. ∎

4.3. Compactly supported distributions

In this paper we will encounter only distributions on \mathbb{R}. We therefore adopt the following definition [39, §2.3].

Definition 4.6.

A linear functional ν\nu on C()C^{\infty}(\mathbb{R}) such that

|ν(f)|cfCm[K,K]for all fC()|\nu(f)|\leq c\|f\|_{C^{m}[-K,K]}\quad\text{for all }f\in C^{\infty}(\mathbb{R})

holds for some c,K>0c,K>0, m+m\in\mathbb{Z}_{+}, is called a compactly supported distribution.

The linear functionals that will arise in this paper will not be defined a priori on C()C^{\infty}(\mathbb{R}), but rather only on the space 𝒫\mathcal{P} of univariate polynomials. It will therefore be essential to understand when linear functionals on 𝒫\mathcal{P} can be extended to compactly supported distributions. As we have not located the following result in the literature, we prove it here for completeness.

Lemma 4.7 (Hausdorff moment problem for compactly supported distributions).

Let ν\nu be a linear functional on 𝒫\mathcal{P}. Then the following are equivalent.

  1. 1.

    There exist c,m,K0c,m,K\geq 0 so that |ν(h)|cqmhC0[K,K]|\nu(h)|\leq cq^{m}\|h\|_{C^{0}[-K,K]} for all h𝒫q,qh\in\mathcal{P}_{q},~{}q\in\mathbb{N}.

  2. 2.

    There exist c,m,K0c,m,K\geq 0 so that |ν(Tj(K1))|cjm|\nu(T_{j}(K^{-1}\cdot))|\leq cj^{m} for all jj\in\mathbb{N}.

  3. 3.

    ν\nu extends uniquely to a compactly supported distribution.

Only the implication 131\Rightarrow 3 will be used in this paper; the equivalence of 232\Leftrightarrow 3 appears implicitly in [24, p. 38].

Proof.

That 121\Rightarrow 2 is immediate as Tj(K1)C0[K,K]=1\|T_{j}(K^{-1}\cdot)\|_{C^{0}[-K,K]}=1 by the definition of the Chebyshev polynomials TjT_{j}.

To prove that 232\Rightarrow 3, fix any h𝒫h\in\mathcal{P} and express it in the form (4.1). Then

|ν(h)|j=0q|ν(Tj(K1))||aj|cj=0qjm|aj|cm,KhCm+1[K,K]|\nu(h)|\leq\sum_{j=0}^{q}|\nu(T_{j}(K^{-1}\cdot))|\,|a_{j}|\leq c\sum_{j=0}^{q}j^{m}|a_{j}|\leq c_{m,K}\|h\|_{C^{m+1}[-K,K]}

by Corollary 4.5. Thus the condition of Definition 4.6 is satisfied for all h𝒫h\in\mathcal{P}. As 𝒫\mathcal{P} is dense in C()C^{\infty}(\mathbb{R}) with respect to the norm Cm+1[K,K]\|\cdot\|_{C^{m+1}[-K,K]}, it is clear that ν\nu extends uniquely to a compactly supported distribution.

To prove that 313\Rightarrow 1, it suffices by Definition 4.6 to note that hCm[K,K]cm,Kq2mhC0[K,K]\|h\|_{C^{m}[-K,K]}\leq c_{m,K}q^{2m}\|h\|_{C^{0}[-K,K]} for every h𝒫qh\in\mathcal{P}_{q} and qq\in\mathbb{N} by Lemma 4.1. ∎

Next, we recall the definition of support [39, §2.2].

Definition 4.8.

The support suppν\mathop{\mathrm{supp}}\nu of a compactly supported distribution ν\nu is the smallest closed set AA\subset\mathbb{R} with the property that ν(f)=0\nu(f)=0 for every fC()f\in C^{\infty}(\mathbb{R}) such that f=0f=0 in a neighborhood of AA.

It is clear that when the condition of Definition 4.6 is satisfied, we must have suppν[K,K]\mathop{\mathrm{supp}}\nu\subseteq[-K,K]. However, the support may in fact be much smaller. A key property of compactly supported distributions for our purposes is that their support can be bounded by the growth rate of their moments. While the analogous property of measures is straightforward, its validity for distributions requires justification.

Lemma 4.9.

Let ν\nu be a compactly supported distribution. Then

suppν[ρ,ρ]withρ=lim supp|ν(xp)|1/p.\mathop{\mathrm{supp}}\nu\subseteq[-\rho,\rho]\qquad\text{with}\qquad\rho=\limsup_{p\to\infty}|\nu(x^{p})|^{1/p}.
Proof.

It is clear from the definition of a compactly supported distribution that we must have ρ<\rho<\infty. Thus the function FF defined by

F(z):=p=0ν(xp)zp+1F(z):=\sum_{p=0}^{\infty}\frac{\nu(x^{p})}{z^{p+1}}

is analytic for zz\in\mathbb{C} with |z||z| sufficiently large. It follows from [24, Lemma 1] (or from [64, Theorem 5.4]) that FF can be analytically continued to \suppν\mathbb{C}\backslash\mathop{\mathrm{supp}}\nu and that suppν\mathop{\mathrm{supp}}\nu is precisely the set of singular points of FF. As the above expansion of FF is absolutely convergent in a neighborhood of z=±(ρ+ε)z=\pm(\rho+\varepsilon) for all ε>0\varepsilon>0, it follows that ±(ρ+ε)suppν\pm(\rho+\varepsilon)\not\in\mathop{\mathrm{supp}}\nu for all ε>0\varepsilon>0, which yields the conclusion. ∎
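
To illustrate Lemma 4.9 in the simplest case of a compactly supported measure (which is a special case of a compactly supported distribution), recall that the semicircle distribution on [−2,2] has even moments given by the Catalan numbers and vanishing odd moments; the following short computation (purely illustrative) shows the moment growth rate recovering the edge ρ = 2 of the support.

```python
from math import comb

# Moments of the semicircle law on [-2, 2]: nu(x^{2p}) = Catalan(p), odd moments vanish.
for p in (5, 10, 20, 40, 80):
    catalan = comb(2 * p, p) // (p + 1)
    print(p, catalan ** (1.0 / (2 * p)))        # |nu(x^{2p})|^{1/2p} increases to rho = 2
```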

4.4. Test functions

We finally recall a standard construction of smooth approximations of the indicator function 1[ρ,ρ]c1_{[-\rho,\rho]^{c}} for which the bound of Lemma 4.4 is well controlled; its sharpness is discussed in [39, pp. 19–22]. Recall that 1/β=11/β1/\beta_{*}=1-1/\beta.

Lemma 4.10 (Test functions).

Fix m+m\in\mathbb{Z}_{+} and K,ρ,ε>0K,\rho,\varepsilon>0 so that ρ+ε<K\rho+\varepsilon<K. Then there exists a function χC()\chi\in C^{\infty}(\mathbb{R}) with the following properties.

  1. 1.

    χ(x)[0,1]\chi(x)\in[0,1] for all xx, χ(x)=0\chi(x)=0 for |x|ρ+ε2|x|\leq\rho+\frac{\varepsilon}{2}, and χ(x)=1\chi(x)=1 for |x|ρ+ε|x|\geq\rho+\varepsilon.

  2. 2.

    f(m+1)Lβ[0,2π](Cm)m(Kε)m+1/β\|f^{(m+1)}\|_{L^{\beta}[0,2\pi]}\leq(Cm)^{m}(\frac{K}{\varepsilon})^{m+1/\beta_{*}} for all β>1\beta>1.

Here f(θ):=χ(Kcosθ)f(\theta):=\chi(K\cos\theta), and CC is a universal constant.

Proof.

Let FC()F\in C^{\infty}(\mathbb{R}) be a nonnegative function that is supported in [0,1][0,1] with FL1()=1\|F\|_{L^{1}(\mathbb{R})}=1. For a>0a>0, define Fa(x):=1aF(xa)F_{a}(x):=\frac{1}{a}F(\frac{x}{a}) and Ha(x):=1a1[0,a](x)H_{a}(x):=\frac{1}{a}1_{[0,a]}(x), and let

h(x)=x(Fδ/2Hδ/2mHδ/2mm times)(y)𝑑y.h(x)=\int_{-\infty}^{x}(F_{\delta/2}*\underbrace{H_{\delta/2m}*\cdots*H_{\delta/2m}}_{m\text{ times}})(y)\,dy.

Here δ,φ>0\delta,\varphi>0 are parameters that will be chosen below.

Note that hC()h\in C^{\infty}(\mathbb{R}) by construction. Moreover, the integrand in the definition of hh is nonnegative, has L1()L^{1}(\mathbb{R})-norm one, and is supported in [0,δ][0,\delta]. Thus hh is nondecreasing, h(x)=0h(x)=0 for x0x\leq 0, and h(x)=1h(x)=1 for xδx\geq\delta. We define

χ(x)=h(arcsin(xK)φ)+h(arcsin(xK)φ)\chi(x)=h\big{(}\arcsin\big{(}\tfrac{x}{K}\big{)}-\varphi\big{)}+h\big{(}{-\arcsin\big{(}\tfrac{x}{K}\big{)}}-\varphi\big{)}

for x[K,K]x\in[-K,K], and define χ(x)=1\chi(x)=1 otherwise. We now choose the parameters φ=arcsin(ρ+ε/2K)\varphi=\arcsin(\frac{\rho+\varepsilon/2}{K}) and δ=arcsin(ρ+εK)arcsin(ρ+ε/2K)\delta=\arcsin(\frac{\rho+\varepsilon}{K})-\arcsin(\frac{\rho+\varepsilon/2}{K}) so that 1. holds. Moreover, χ(x)=1\chi(x)=1 is constant near x=±Kx=\pm K as ρ+ε<K\rho+\varepsilon<K, so χC()\chi\in C^{\infty}(\mathbb{R}) by the chain rule.

Now note that f(θ)=h(π2θφ)+h(θπ2φ)f(\theta)=h\big{(}\tfrac{\pi}{2}-\theta-\varphi\big{)}+h\big{(}\theta-\tfrac{\pi}{2}-\varphi\big{)} for θ[0,π]\theta\in[0,\pi]. Therefore

f(m+1)Lβ[0,2π]\displaystyle\|f^{(m+1)}\|_{L^{\beta}[0,2\pi]} 4Fδ/2Hδ/2mHδ/2mLβ()\displaystyle\leq 4\|F_{\delta/2}*H_{\delta/2m}^{\prime}*\cdots*H_{\delta/2m}^{\prime}\|_{L^{\beta}(\mathbb{R})}
4Fδ/2Lβ()Hδ/2mL1()m(4m)m(1δ)m+1/β8FL(),\displaystyle\leq 4\|F_{\delta/2}\|_{L^{\beta}(\mathbb{R})}\|H_{\delta/2m}^{\prime}\|_{L^{1}(\mathbb{R})}^{m}\leq(4m)^{m}\big{(}\tfrac{1}{\delta}\big{)}^{m+1/\beta_{*}}8\|F\|_{L^{\infty}(\mathbb{R})},

where the factor 44 in the first line arises by the triangle inequality and by splitting the LβL^{\beta} norm over the intervals [0,π][0,\pi] and [π,2π][\pi,2\pi], and the second line uses Young’s inequality and Ha=1a(δ0δa)H_{a}^{\prime}=\frac{1}{a}(\delta_{0}-\delta_{a}). Then 2. follows by noting that δε2K\delta\geq\frac{\varepsilon}{2K}. ∎
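
The construction in the above proof is completely explicit and can be realized numerically. The following sketch (illustrative only; the parameters m, K, ρ, ε are arbitrary, and a standard bump function plays the role of F) builds h by discrete convolutions on a grid and checks property 1. at a few sample points.

```python
import numpy as np

# Numerical realization of the test function chi of Lemma 4.10 (illustrative only).
m, K, rho, eps = 2, 4.0, 2.0, 0.5                    # arbitrary parameters with rho + eps < K
phi = np.arcsin((rho + eps / 2) / K)
delta = np.arcsin((rho + eps) / K) - phi

dx = delta / 4000
grid = np.arange(0.0, delta + dx, dx)
t = np.clip(2 * grid / delta, 0.0, 1.0)
F = np.where((t > 0) & (t < 1), np.exp(-1.0 / np.maximum(t * (1 - t), 1e-12)), 0.0)
F /= F.sum() * dx                                    # smooth bump of unit mass supported in [0, delta/2]
H = np.where(grid <= delta / (2 * m), 2 * m / delta, 0.0)
kernel = F
for _ in range(m):                                   # F_{delta/2} * H_{delta/2m} * ... * H_{delta/2m}
    kernel = np.convolve(kernel, H)[: len(grid)] * dx
h_vals = np.cumsum(kernel) * dx                      # h is 0 at 0 and approximately 1 at delta

def h(u):
    return np.interp(u, grid, h_vals, left=0.0, right=1.0)

def chi(x):
    x = np.asarray(x, dtype=float)
    s = np.arcsin(np.clip(x / K, -1.0, 1.0))
    return np.where(np.abs(x) <= K, h(s - phi) + h(-s - phi), 1.0)

print(chi(np.array([0.0, rho, rho + eps / 4, rho + eps, K, -K])))   # approximately [0, 0, 0, 1, 1, 1]
```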

5. Words in random permutation matrices

The aim of this section is to recall basic facts about random permutations that will form the input to the methods of this paper. These results are implicit in Nica [53], but are derived in simpler and more explicit form by Linial and Puder [45] whose presentation we follow. We emphasize that all the results that are reviewed in this section arise from elementary combinatorial reasoning; we refer the reader to the expository paper [21, §3] for a brief introduction.

We will work with random permutation matrices 𝑺N=(S1N,,SdN)\bm{S}^{N}=(S_{1}^{N},\ldots,S_{d}^{N}) and their limiting model 𝒔=(s1,,sd)\bm{s}=(s_{1},\ldots,s_{d}) as defined in section 3.1. In particular, we recall that si:=λ(gi)s_{i}:=\lambda(g_{i}), where g1,,gdg_{1},\ldots,g_{d} are the free generators of 𝐅d\mathbf{F}_{d}.

It will be convenient to extend these definitions by setting g0:=eg_{0}:=e, gd+i:=gi1g_{d+i}:=g_{i}^{-1} and analogously S0N:=𝟏S^{N}_{0}:=\mathbf{1}, Sd+iN:=(SiN)1=SiNS^{N}_{d+i}:=(S_{i}^{N})^{-1}=S_{i}^{N*} and s0:=𝟏s_{0}:=\mathbf{1}, sd+i:=si1=sis_{d+i}:=s_{i}^{-1}=s_{i}^{*} for 1id1\leq i\leq d. This convention will be in force in the remainder of the paper.

We aim to understand the expected trace of words in random permutation matrices and their inverses. To formalize this notion, denote by 𝐖d\mathbf{W}_{d} the set of all finite words in the letters g0,,g2dg_{0},\ldots,g_{2d}. We implicitly identify w𝐖dw\in\mathbf{W}_{d} with the word map w:GdGw:\mathrm{G}^{d}\to\mathrm{G} it induces on any group G\mathrm{G}. Thus w(g1,,gd)𝐅dw(g_{1},\ldots,g_{d})\in\mathbf{F}_{d} is the reduction of the word ww, while w(S1N,,SdN)w(S_{1}^{N},\ldots,S_{d}^{N}) is the random matrix obtained by substituting giSiNg_{i}\leftarrow S_{i}^{N} and multiplying these matrices in the order they appear in ww.

5.1. Rationality

The first property we will need is that the expected trace of any word is a rational function of 1N\frac{1}{N}. We emphasize that only very limited information on this function will be needed, which is collected in the following lemma.

Lemma 5.1.

Let w𝐖dw\in\mathbf{W}_{d} be any word of length at most qq and NqN\geq q. Then

𝐄[trNw(𝑺N)]=fw(1N)gq(1N),\mathbf{E}[\mathop{\mathrm{tr}_{N}}w(\bm{S}^{N})]=\frac{f_{w}(\tfrac{1}{N})}{g_{q}(\tfrac{1}{N})},

where

gq(x):=(1x)d1(12x)d2(1(q1)x)dq1g_{q}(x):=(1-x)^{d_{1}}(1-2x)^{d_{2}}\cdots(1-(q-1)x)^{d_{q-1}}

with dj:=min(d,qj+1)d_{j}:=\min\big{(}d,\lfloor\frac{q}{j+1}\rfloor\big{)}, and fw,gqf_{w},g_{q} are polynomials of degree at most q(1+logd)q(1+\log d).

Proof.

Observe that 𝐄[Trw(S1N,,SdN)]\mathbf{E}[\mathop{\mathrm{Tr}}w(S_{1}^{N},\ldots,S_{d}^{N})] is the expected number of fixed points of w(S¯1N,,S¯dN)w(\bar{S}_{1}^{N},\ldots,\bar{S}_{d}^{N}) minus one (as we restrict to {1N}\{1_{N}\}^{\perp}). Thus [45, eq. (7)] yields

𝐄[trNw(𝑺N)]=1N+Γ(1N)eΓvΓ+1l=1vΓ1(1lN)j=1dl=1eΓj1(1lN)\mathbf{E}[\mathop{\mathrm{tr}_{N}}w(\bm{S}^{N})]=-\frac{1}{N}+\sum_{\Gamma}\frac{(\frac{1}{N})^{e_{\Gamma}-v_{\Gamma}+1}\prod_{l=1}^{v_{\Gamma}-1}(1-\frac{l}{N})}{\prod_{j=1}^{d}\prod_{l=1}^{e_{\Gamma}^{j}-1}(1-\frac{l}{N})}

where the sum is over a certain collection of connected graphs Γ\Gamma with vΓqv_{\Gamma}\leq q vertices and eΓqe_{\Gamma}\leq q edges, and eΓj0e_{\Gamma}^{j}\geq 0 are integers so that eΓ1++eΓd=eΓe_{\Gamma}^{1}+\cdots+e_{\Gamma}^{d}=e_{\Gamma} [45, p. 105]. Note that eΓvΓ+10e_{\Gamma}-v_{\Gamma}+1\geq 0 as Γ\Gamma is connected.

The denominator inside the sum can be written equivalently as

j=1dl=1eΓj1(1lN)=(11N)d1Γ(12N)d2Γ(1q1N)dq1Γ,{\textstyle\prod_{j=1}^{d}\prod_{l=1}^{e_{\Gamma}^{j}-1}(1-\tfrac{l}{N})=(1-\tfrac{1}{N})^{d_{1}^{\Gamma}}(1-\tfrac{2}{N})^{d_{2}^{\Gamma}}\cdots(1-\tfrac{q-1}{N})^{d_{q-1}^{\Gamma}}},

with diΓ:=|{1jd:eΓji+1}|did_{i}^{\Gamma}:=|\{1\leq j\leq d:e_{\Gamma}^{j}\geq i+1\}|\leq d_{i} as eΓ1++eΓdqe_{\Gamma}^{1}+\cdots+e_{\Gamma}^{d}\leq q. Thus

𝐄[trNw(𝑺N)]=1Ngq(1N)+Γ(1N)eΓvΓ+1l=1vΓ1(1lN)i=1q1(1iN)didiΓgq(1N).\mathbf{E}[\mathop{\mathrm{tr}_{N}}w(\bm{S}^{N})]=\frac{-\frac{1}{N}g_{q}(\frac{1}{N})+\sum_{\Gamma}(\frac{1}{N})^{e_{\Gamma}-v_{\Gamma}+1}\prod_{l=1}^{v_{\Gamma}-1}(1-\frac{l}{N})\prod_{i=1}^{q-1}(1-\frac{i}{N})^{d_{i}-d_{i}^{\Gamma}}}{g_{q}(\frac{1}{N})}.

To conclude, note that i=1q1di1qmin(d,qx)𝑑xq(1+logd)min(d,q)\sum_{i=1}^{q-1}d_{i}\leq\int_{1}^{q}\min(d,\frac{q}{x})dx\leq q(1+\log d)-\min(d,q) and eΓi=1q1diΓ=|{1jd:eΓj1}|min(d,q)e_{\Gamma}-\sum_{i=1}^{q-1}d_{i}^{\Gamma}=|\{1\leq j\leq d:e_{\Gamma}^{j}\geq 1\}|\leq\min(d,q). Thus gqg_{q} has degree at most q(1+logd)1q(1+\log d)-1 and each term in the sum has degree at most q(1+logd)q(1+\log d). ∎

5.2. First-order asymptotics

We now recall the first-order asymptotic behavior of expected traces of words in random permutation matrices. The following is a special case of a result of Nica [53], see [45, p. 124] for a simple proof.

An element of the free group v𝐅dv\in\mathbf{F}_{d}, vev\neq e is called a proper power if it can be written as v=wkv=w^{k} for some w𝐅dw\in\mathbf{F}_{d} and k2k\geq 2, and is called a non-power otherwise. We denote by 𝐅dnp\mathbf{F}_{d}^{\rm np} the set of non-powers. Every v𝐅dv\in\mathbf{F}_{d}, vev\neq e can be written uniquely as v=wkv=w^{k} for some w𝐅dnpw\in\mathbf{F}_{d}^{\rm np} and k1k\geq 1, cf. [47, Proposition I.2.17].

Lemma 5.2.

Fix a word w𝐖dw\in\mathbf{W}_{d} that does not reduce to the identity, and express its reduction as w(g1,,gd)=vkw(g_{1},\ldots,g_{d})=v^{k} with v𝐅dnpv\in\mathbf{F}_{d}^{\rm np} and k1k\geq 1. Then

limNN𝐄[trNw(𝑺N)]=ω(k)1,\lim_{N\to\infty}N\,\mathbf{E}[\mathop{\mathrm{tr}_{N}}w(\bm{S}^{N})]=\omega(k)-1,

where ω(k)\omega(k) denotes the number of divisors of kk.

Proof.

This follows from [53, Corollary 1.3] by noting as in the proof of Lemma 5.1 that 𝐄[Trw(𝑺N)]\mathbf{E}[\mathop{\mathrm{Tr}}w(\bm{S}^{N})] is the expected number of fixed points of w(𝑺¯N)w(\bm{\bar{S}}^{N}) minus one. ∎
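
As a quick illustration of Lemma 5.2 (a simulation, not used in any proof), take w = g_1², so that v = g_1 and k = 2; then N 𝐄[tr_N w(𝑺^N)] equals 𝐄[#fixed points of σ²] − 1 and should be close to ω(2) − 1 = 1 already for moderate N.

```python
import numpy as np

# Monte Carlo check for w = g_1^2: N E[tr_N w(S^N)] = E[#fixed points of sigma^2] - 1 -> omega(2) - 1 = 1.
rng = np.random.default_rng(2)
N, trials = 200, 20000
acc = 0.0
for _ in range(trials):
    sigma = rng.permutation(N)
    fixed = np.count_nonzero(sigma[sigma] == np.arange(N))   # fixed points of sigma composed with itself
    acc += fixed - 1                                         # Tr w(S^N) after restricting to {1_N}^perp
print(acc / trials)                                          # close to 1
```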

In particular, Lemma 5.2 implies the following.

Corollary 5.3 (Weak convergence).

For every word w𝐖dw\in\mathbf{W}_{d}, we have

limN𝐄[trNw(𝑺N)]=τ(w(𝒔)).\lim_{N\to\infty}\mathbf{E}[\mathop{\mathrm{tr}_{N}}w(\bm{S}^{N})]=\tau\big{(}w(\bm{s})\big{)}.
Proof.

Lemma 5.2 implies that 𝐄[trNw(𝑺N)]=o(1)\mathbf{E}[\mathop{\mathrm{tr}_{N}}w(\bm{S}^{N})]=o(1) for any word ww that does not reduce to the identity. On the other hand, if ww reduces to the identity, then w(𝑺N)w(\bm{S}^{N}) is the identity matrix on {1N}\{1_{N}\}^{\perp} and thus 𝐄[trNw(𝑺N)]=11N\mathbf{E}[\mathop{\mathrm{tr}_{N}}w(\bm{S}^{N})]=1-\frac{1}{N}. The conclusion follows as clearly τ(si1sik)=δe,λ(gi1)λ(gik)δe=1gi1gik=e\tau(s_{i_{1}}\cdots s_{i_{k}})=\langle\delta_{e},\lambda(g_{i_{1}})\cdots\lambda(g_{i_{k}})\delta_{e}\rangle=1_{g_{i_{1}}\cdots g_{i_{k}}=e}. ∎

6. Master inequalities for random permutations I

6.1. Master inequalities

In this section, we develop a core ingredient of our method: inequalities for the normalized trace of polynomials of 𝑺N\bm{S}^{N}.

Theorem 6.1 (Master inequalities).

Fix a self-adjoint noncommutative polynomial PMD()𝐬,𝐬P\in\mathrm{M}_{D}(\mathbb{C})\otimes\mathbb{C}\langle\bm{s},\bm{s}^{*}\rangle of degree q0q_{0}, and let K=PMD()C(𝐅d)K=\|P\|_{\mathrm{M}_{D}(\mathbb{C})\otimes C^{*}(\mathbf{F}_{d})}. Then there exists a linear functional νi\nu_{i} on 𝒫\mathcal{P} for every i+i\in\mathbb{Z}_{+} so that

|𝐄[trDNh(P(𝑺N,𝑺N))]i=0m1νi(h)Ni|(4qq0(1+logd))4mNmhC0[K,K]\bigg{|}\mathbf{E}[\mathop{\mathrm{tr}_{DN}}h(P(\bm{S}^{N},\bm{S}^{N*}))]-\sum_{i=0}^{m-1}\frac{\nu_{i}(h)}{N^{i}}\bigg{|}\leq\frac{(4qq_{0}(1+\log d))^{4m}}{N^{m}}\|h\|_{C^{0}[-K,K]}

for every N,m,qN,m,q\in\mathbb{N} and h𝒫qh\in\mathcal{P}_{q}.

An immediate consequence of Theorem 6.1 is the following corollary that we spell out separately for future reference.

Corollary 6.2.

In the setting of Theorem 6.1, we have

|νm(h)|(4qq0(1+logd))4mhC0[K,K]|\nu_{m}(h)|\leq(4qq_{0}(1+\log d))^{4m}\,\|h\|_{C^{0}[-K,K]}

for every m+m\in\mathbb{Z}_{+}, qq\in\mathbb{N}, and h𝒫qh\in\mathcal{P}_{q}.

While we will need Theorem 6.1 in its full generality in some applications, many strong convergence results will already arise from the case m=2m=2 as was explained in section 2. The case m=1m=1 is of independent interest as it provides a quantitative form of Corollary 5.3; see, for example, Corollary 7.2 below.

6.2. Proof of Theorem 6.1 and Corollary 6.2

Throughout the proof, we fix a polynomial PP as in Theorem 6.1. We emphasize that all the objects that appear in the proof depend implicitly on the choice of PP.

We begin by noting that (trDid)(h(P(𝑺N,𝑺N)))({\mathop{\mathrm{tr}_{D}}}\otimes\mathrm{id})(h(P(\bm{S}^{N},\bm{S}^{N*}))) is a linear combination of words w(𝑺N)w(\bm{S}^{N}) of length at most qq0qq_{0} for every h𝒫qh\in\mathcal{P}_{q}. Thus Lemma 5.1 yields

𝐄[trDNh(P(𝑺N,𝑺N))]=Ψh(1N)=fh(1N)gqq0(1N)\mathbf{E}[\mathop{\mathrm{tr}_{DN}}h(P(\bm{S}^{N},\bm{S}^{N*}))]=\Psi_{h}(\tfrac{1}{N})=\frac{f_{h}(\frac{1}{N})}{g_{qq_{0}}(\frac{1}{N})}

for Nqq0N\geq qq_{0}, where fh,gqq0f_{h},g_{qq_{0}} are polynomials of degree at most qq0(1+logd)qq_{0}(1+\log d).

As Ψh\Psi_{h} has no pole at 0, we can define for each m+m\in\mathbb{Z}_{+}

νm(h):=Ψh(m)(0)m!.\nu_{m}(h):=\frac{\Psi_{h}^{(m)}(0)}{m!}.

It is clear from the definition of Ψh\Psi_{h} that νm\nu_{m} defines a linear functional on 𝒫\mathcal{P}. Moreover, it follows immediately from Taylor’s theorem that

|𝐄[trDNh(P(𝑺N,𝑺N))]i=0m1νi(h)Ni|Ψh(m)C0[0,1N]m!1Nm\bigg{|}\mathbf{E}[\mathop{\mathrm{tr}_{DN}}h(P(\bm{S}^{N},\bm{S}^{N*}))]-\sum_{i=0}^{m-1}\frac{\nu_{i}(h)}{N^{i}}\bigg{|}\leq\frac{\|\Psi_{h}^{(m)}\|_{C^{0}[0,\frac{1}{N}]}}{m!}\frac{1}{N^{m}} (6.1)

provided NN is large enough that Ψh\Psi_{h} has no poles on [0,1N][0,\frac{1}{N}]. The main idea of the proof is that we will use Lemma 4.3 to bound the remainder term in the Taylor expansion. To this end we must show that the function gqq0g_{qq_{0}} is nearly constant.
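
Before doing so, we note that the definition of ν_m is entirely concrete: the ν_m(h) are just the Taylor coefficients of the rational function Ψ_h at the origin. The following toy computation (with a made-up rational function standing in for Ψ_h, and x playing the role of 1/N) illustrates this.

```python
import sympy as sp

# Toy illustration of nu_m = Psi_h^{(m)}(0)/m! for a made-up rational Psi_h = f/g.
x = sp.symbols('x')                       # x plays the role of 1/N
Psi = (1 + 3*x - 2*x**2) / ((1 - x) * (1 - 2*x))
taylor = sp.series(Psi, x, 0, 4).removeO()
print([taylor.coeff(x, i) for i in range(4)])   # [1, 6, 14, 30] = [nu_0, nu_1, nu_2, nu_3] for this toy
```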

Lemma 6.3.

e1gq(x)1e^{-1}\leq g_{q}(x)\leq 1 for every x[0,1q2(1+logd)]x\in[0,\frac{1}{q^{2}(1+\log d)}] and qq\in\mathbb{N}.

Proof.

The definition of gqg_{q} in Lemma 5.1 implies (1qx)q(1+logd)1gq(x)1(1-qx)^{q(1+\log d)-1}\leq g_{q}(x)\leq 1 for x[0,1q]x\in[0,\frac{1}{q}], where we used that d1++dq1q(1+logd)1d_{1}+\cdots+d_{q-1}\leq q(1+\log d)-1 as shown in the proof of Lemma 5.1. The conclusion follows using (11a)a1e1(1-\frac{1}{a})^{a-1}\geq e^{-1} for a>1a>1. ∎

Next, we must obtain an upper bound on ΨhC0[0,1N]\|\Psi_{h}\|_{C^{0}[0,\frac{1}{N}]}.

Lemma 6.4.

ΨhC0[0,1M]2ehC0[K,K]\|\Psi_{h}\|_{C^{0}[0,\frac{1}{M}]}\leq 2e\|h\|_{C^{0}[-K,K]} for M=2(qq0(1+logd))2M=2(qq_{0}(1+\log d))^{2}.

Proof.

As P(𝑺N,𝑺N)PMD()C(𝐅d)\|P(\bm{S}^{N},\bm{S}^{N*})\|\leq\|P\|_{\mathrm{M}_{D}(\mathbb{C})\otimes C^{*}(\mathbf{F}_{d})}, we have

|Ψh(1N)|=|𝐄[trDNh(P(𝑺N,𝑺N))]|hC0[K,K]|\Psi_{h}(\tfrac{1}{N})|=|\mathbf{E}[\mathop{\mathrm{tr}_{DN}}h(P(\bm{S}^{N},\bm{S}^{N*}))]|\leq\|h\|_{C^{0}[-K,K]}

for all NN\in\mathbb{N}. We will extend this bound from the set I={1N:NM}I=\{\frac{1}{N}:N\geq M\} to the interval [0,1M][0,\frac{1}{M}] using polynomial interpolation. First, note that

|fh(1N)||Ψh(1N)|hC0[K,K]|f_{h}(\tfrac{1}{N})|\leq|\Psi_{h}(\tfrac{1}{N})|\leq\|h\|_{C^{0}[-K,K]}

as gqq0(1N)1g_{qq_{0}}(\frac{1}{N})\leq 1 for Nqq0N\geq qq_{0}. The assumption of Lemma 4.2 is satisfied for hfhh\leftarrow f_{h}, qqq0(1+logd)q\leftarrow qq_{0}(1+\log d), a=1Ma=\frac{1}{M} as soon as M2(qq0(1+logd))2M\geq 2(qq_{0}(1+\log d))^{2}, which yields

fhC0[0,1M]2hC0[K,K].\|f_{h}\|_{C^{0}[0,\frac{1}{M}]}\leq 2\|h\|_{C^{0}[-K,K]}.

It remains to apply Lemma 6.3 to estimate ΨhC0[0,1M]efhC0[0,1M]\|\Psi_{h}\|_{C^{0}[0,\frac{1}{M}]}\leq e\|f_{h}\|_{C^{0}[0,\frac{1}{M}]}. ∎

We can now conclude the proof.

Proof of Theorem 6.1 and Corollary 6.2.

Let M=2(qq0(1+logd))2M=2(qq_{0}(1+\log d))^{2}. Then

Ψh(m)C0[0,1M]m!(4qq0(1+logd))4mhC0[K,K]\frac{\|\Psi_{h}^{(m)}\|_{C^{0}[0,\frac{1}{M}]}}{m!}\leq(4qq_{0}(1+\log d))^{4m}\|h\|_{C^{0}[-K,K]}

by Lemmas 4.3, 6.3, and 6.4, where we used that (10e)m2e44m(10e)^{m}2e\leq 4^{4m} for m1m\geq 1. Thus the proof of Theorem 6.1 for NMN\geq M follows directly from (6.1).

The proof of Corollary 6.2 for m1m\geq 1 now follows by multiplying the bound of Theorem 6.1 by NmN^{m} and taking NN\to\infty. As ν0(h)=limN𝐄[trDNh(P(𝑺N,𝑺N))]\nu_{0}(h)=\lim_{N}\mathbf{E}[\mathop{\mathrm{tr}_{DN}}h(P(\bm{S}^{N},\bm{S}^{N*}))], the conclusion of Corollary 6.2 evidently also holds for m=0m=0.

It remains to prove Theorem 6.1 for N<MN<M. To this end, we can trivially bound the left-hand side of the inequality in Theorem 6.1 by

|𝐄[trDNh(P(𝑺N,𝑺N))]i=0m1νi(h)Ni|(1+i=0m1(4qq0(1+logd))4iNi)hC0[K,K]\bigg{|}\mathbf{E}[\mathop{\mathrm{tr}_{DN}}h(P(\bm{S}^{N},\bm{S}^{N*}))]-\sum_{i=0}^{m-1}\frac{\nu_{i}(h)}{N^{i}}\bigg{|}\leq\bigg{(}1+\sum_{i=0}^{m-1}\frac{(4qq_{0}(1+\log d))^{4i}}{N^{i}}\bigg{)}\|h\|_{C^{0}[-K,K]}

using the triangle inequality and Corollary 6.2. But note that 1<2k(qq0(1+logd))4kNk1<\frac{2^{k}(qq_{0}(1+\log d))^{4k}}{N^{k}} for every k1k\geq 1 as we assumed N<MN<M. We can therefore estimate

1+i=0m1(4qq0(1+logd))4iNi(2m+i=0m144i2mi)(qq0(1+logd))4mNm,1+\sum_{i=0}^{m-1}\frac{(4qq_{0}(1+\log d))^{4i}}{N^{i}}\leq\bigg{(}2^{m}+\sum_{i=0}^{m-1}4^{4i}2^{m-i}\bigg{)}\frac{(qq_{0}(1+\log d))^{4m}}{N^{m}},

and we conclude using 2m+i=0m144i2mi44m2^{m}+\sum_{i=0}^{m-1}4^{4i}2^{m-i}\leq 4^{4m}. ∎

7. Master inequalities for random permutations II

The aim of this short section is to extend the master inequalities of Theorem 6.1 from polynomials to general smooth test functions. This extension will play a key role in the proofs of strong convergence.

Theorem 7.1 (Smooth master inequalities).

Fix a self-adjoint noncommutative polynomial PMD()𝐬,𝐬P\in\mathrm{M}_{D}(\mathbb{C})\otimes\mathbb{C}\langle\bm{s},\bm{s}^{*}\rangle of degree q0q_{0}, and let K=PMD()C(𝐅d)K=\|P\|_{\mathrm{M}_{D}(\mathbb{C})\otimes C^{*}(\mathbf{F}_{d})}. Then there exists a compactly supported distribution νi\nu_{i} for every i+i\in\mathbb{Z}_{+} so that

|𝐄[trDNh(P(𝑺N,𝑺N))]i=0m1νi(h)Ni|(4q0(1+logd))4mNmβf(4m+1)Lβ[0,2π]\bigg{|}\mathbf{E}[\mathop{\mathrm{tr}_{DN}}h(P(\bm{S}^{N},\bm{S}^{N*}))]-\sum_{i=0}^{m-1}\frac{\nu_{i}(h)}{N^{i}}\bigg{|}\lesssim\frac{(4q_{0}(1+\log d))^{4m}}{N^{m}}\,\beta_{*}\|f^{(4m+1)}\|_{L^{\beta}[0,2\pi]}

for all N,mN,m\in\mathbb{N}, β>1\beta>1, hC()h\in C^{\infty}(\mathbb{R}), where f(θ):=h(Kcosθ)f(\theta):=h(K\cos\theta) and 1/β:=11/β1/\beta_{*}:=1-1/\beta.

Proof.

We first note that Corollary 6.2 ensures, by Lemma 4.7, that the linear functionals νi\nu_{i} in Theorem 6.1 extend uniquely to compactly supported distributions.

Let h𝒫qh\in\mathcal{P}_{q} and express it in the form (4.1). Rather than applying Theorem 6.1 to hh directly, we apply it to the Chebyshev polynomials Tj(K1)T_{j}(K^{-1}\cdot) to obtain

|𝐄[trDNh(P(𝑺N,𝑺N))]i=0m1νi(h)Ni|(4q0(1+logd))4mNmj=1qj4m|aj|.\bigg{|}\mathbf{E}[\mathop{\mathrm{tr}_{DN}}h(P(\bm{S}^{N},\bm{S}^{N*}))]-\sum_{i=0}^{m-1}\frac{\nu_{i}(h)}{N^{i}}\bigg{|}\leq\frac{(4q_{0}(1+\log d))^{4m}}{N^{m}}\sum_{j=1}^{q}j^{4m}|a_{j}|.

Here we used that the left-hand side of Theorem 6.1 vanishes when hh is the constant function (as then ν0(h)=h\nu_{0}(h)=h, νi(h)=0\nu_{i}(h)=0 for i1i\geq 1), so the constant term in the Chebyshev expansion cancels. The proof is completed for h𝒫qh\in\mathcal{P}_{q} by applying Lemma 4.4. The conclusion extends to hCh\in C^{\infty} as 𝒫\mathcal{P} is dense in C[K,K]C^{\infty}[-K,K]. ∎

The special case m=1m=1 is of independent interest, as it yields a quantitative formulation of weak convergence (cf. Corollary 5.3).

Corollary 7.2 (Quantitative weak convergence).

In the setting of Theorem 7.1,

|𝐄[trDNh(P(𝑺N,𝑺N))](trDτ)(h(P(𝒔,𝒔)))|(q0(1+logd))4Nβf(5)Lβ[0,2π]\big{|}\mathbf{E}[\mathop{\mathrm{tr}_{DN}}h(P(\bm{S}^{N},\bm{S}^{N*}))]-({\mathop{\mathrm{tr}_{D}}}\otimes{\tau})(h(P(\bm{s},\bm{s}^{*})))\big{|}{\,\lesssim\,}\frac{(q_{0}(1+\log d))^{4}}{N}\beta_{*}\|f^{(5)}\|_{L^{\beta}[0,2\pi]}

for all NN\in\mathbb{N}, β>1\beta>1, hC()h\in C^{\infty}(\mathbb{R}), where f(θ):=h(Kcosθ)f(\theta):=h(K\cos\theta) and 1/β:=11/β1/\beta_{*}:=1-1/\beta.

Proof.

This is simply a restatement of the special case of Theorem 7.1 with m=1m=1, where we note that ν0(h)=(trDτ)(h(P(𝒔,𝒔)))\nu_{0}(h)=({\mathop{\mathrm{tr}_{D}}}\otimes{\tau})(h(P(\bm{s},\bm{s}^{*}))) by Corollary 5.3. ∎

Remark 7.3.

When hh is chosen to be a smooth approximation of the indicator function of an interval, Corollary 7.2 shows that the spectral distribution of P(𝑺N,𝑺N)P(\bm{S}^{N},\bm{S}^{N*}) is well approximated by that of P(𝒔,𝒔)P(\bm{s},\bm{s}^{*}) at a mesoscopic scale. While far sharper results at the local scale are proved in [42, 40, 41] for random regular graphs, no result of this kind appears to be known to date for arbitrary polynomials PP.
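
The following simulation (illustrative only; the parameters d and N are arbitrary) compares the empirical spectral distribution of P(𝑺^N, 𝑺^{N*}) for P = s_1 + s_1^* + ⋯ + s_d + s_d^* with the Kesten–McKay law, which is the spectral distribution of P(𝒔, 𝒔^*) with respect to τ.

```python
import numpy as np

# Empirical spectrum of A^N = S_1 + S_1^* + ... + S_d + S_d^* versus the Kesten-McKay law.
rng = np.random.default_rng(3)
d, N = 3, 1500
A = np.zeros((N, N))
for _ in range(d):
    P = np.eye(N)[rng.permutation(N)]            # a uniformly random N x N permutation matrix
    A += P + P.T
eigs = np.linalg.eigvalsh(A)[:-1]                # drop the trivial eigenvalue 2d

def kesten_mckay(x, q):                          # spectral density of the q-regular tree (here q = 2d)
    x = np.asarray(x, dtype=float)
    inside = np.abs(x) < 2 * np.sqrt(q - 1)
    out = np.zeros_like(x)
    out[inside] = q * np.sqrt(4 * (q - 1) - x[inside] ** 2) / (2 * np.pi * (q ** 2 - x[inside] ** 2))
    return out

hist, edges = np.histogram(eigs, bins=40, density=True)
centers = (edges[:-1] + edges[1:]) / 2
print(np.max(np.abs(hist - kesten_mckay(centers, 2 * d))))   # modest, and decreases as N grows
```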

In the remainder of this paper, we adopt the notation for the distributions νi\nu_{i} as in Theorem 7.1. It should be emphasized, however, that the definition of νi\nu_{i} depends both on the model and on the noncommutative polynomial PP under consideration. As each of the following sections will be concerned with a specific model and choice of polynomial PP, the meaning of νi\nu_{i} will always be clear from context.

8. Random regular graphs

The aim of this section is to prove our main results on random regular graphs, the effective Friedman Theorem 3.4 and the staircase Theorem 3.6.

8.1. Proof of Theorem 3.4

The proof of Theorem 3.4 is based on Theorem 7.1 with m=2m=2. Let us spell out what this says in the present setting.

Lemma 8.1.

There exists a compactly supported distribution ν1\nu_{1} such that for every NN\in\mathbb{N}, β>1\beta>1, and hC()h\in C^{\infty}(\mathbb{R}), we have

|𝐄[trNh(AN)]τ(h(AF))1Nν1(h)|(1+logd)8N2βf(9)Lβ[0,2π],\big{|}\mathbf{E}[\mathop{\mathrm{tr}_{N}}h(A^{N})]-\tau(h(A_{\rm F}))-\tfrac{1}{N}\nu_{1}(h)\big{|}\lesssim\frac{(1+\log d)^{8}}{N^{2}}\,\beta_{*}\|f^{(9)}\|_{L^{\beta}[0,2\pi]},

where f(θ):=h(2dcosθ)f(\theta):=h(2d\cos\theta) and 1/β=11/β1/\beta_{*}=1-1/\beta.

Proof.

Apply Theorem 7.1 with m=2m=2 and P(𝒔,𝒔)=s1+s1++sd+sdP(\bm{s},\bm{s}^{*})=s_{1}+s_{1}^{*}+\cdots+s_{d}+s_{d}^{*}, using PC(𝐅d)=2d\|P\|_{C^{*}(\mathbf{F}_{d})}=2d and ν0(h)=limN𝐄[trNh(AN)]=τ(h(AF))\nu_{0}(h)=\lim_{N}\mathbf{E}[\mathop{\mathrm{tr}_{N}}h(A^{N})]=\tau(h(A_{\rm F})) by Corollary 5.3. ∎

The remaining ingredient of the proof is to show that suppν1[AF,AF]\mathop{\mathrm{supp}}\nu_{1}\subseteq[-\|A_{\rm F}\|,\|A_{\rm F}\|]. To this end, it suffices by Lemma 4.9 to understand the exponential growth rate of the moments of ν1\nu_{1}. We begin by computing a formula for the moments of ν1\nu_{1}. (In the special setting of random regular graphs, a nonrigorous derivation of an explicit formula for ν1\nu_{1}, that is, an explicit solution to the Hausdorff moment problem associated to Lemma 8.2, appears in the physics literature [49]. However, an explicit solution to the moment problem, which is not available in more general situations, is not needed to apply the methods of this paper.) Recall that 𝐅dnp\mathbf{F}_{d}^{\rm np} denotes the non-power elements of 𝐅d\mathbf{F}_{d} (see Lemma 5.2).

Lemma 8.2.

For all pp\in\mathbb{N}, we have

ν1(xp)=τ(AFp)+k=2p(ω(k)1)v𝐅dnpi1,,ip=12d1gi1gip=vk.\nu_{1}(x^{p})=-\tau(A_{\rm F}^{p})+\sum_{k=2}^{p}(\omega(k)-1)\sum_{v\in\mathbf{F}_{d}^{\rm np}}\sum_{i_{1},\ldots,i_{p}=1}^{2d}1_{g_{i_{1}}\cdots g_{i_{p}}=v^{k}}.
Proof.

Note that

ν1(xp)+τ(AFp)=limNN(𝐄[trN(AN)p](11N)τ(AFp))=limNi1,,ip=12dN(𝐄[trNSi1NSipN](11N)τ(λ(gi1)λ(gip))).\nu_{1}(x^{p})+\tau(A_{\rm F}^{p})=\lim_{N\to\infty}N\big{(}\mathbf{E}[\mathop{\mathrm{tr}_{N}}(A^{N})^{p}]-(1-\tfrac{1}{N})\,\tau(A_{\rm F}^{p})\big{)}\\ =\lim_{N\to\infty}\sum_{i_{1},\ldots,i_{p}=1}^{2d}N\big{(}\mathbf{E}[\mathop{\mathrm{tr}_{N}}S^{N}_{i_{1}}\cdots S^{N}_{i_{p}}]-(1-\tfrac{1}{N})\,\tau(\lambda(g_{i_{1}})\cdots\lambda(g_{i_{p}}))\big{)}.

As τ(λ(gi1)λ(gip))=1gi1gip=e\tau(\lambda(g_{i_{1}})\cdots\lambda(g_{i_{p}}))=1_{g_{i_{1}}\cdots g_{i_{p}}=e}, the terms with gi1gip=eg_{i_{1}}\cdots g_{i_{p}}=e cancel in the sum as in the proof of Corollary 5.3. The conclusion now follows from Lemma 5.2 (note that the k=1k=1 term does not appear in the sum as ω(1)1=0\omega(1)-1=0). ∎

In the present setting, a simple argument due to Friedman [26, Lemma 2.4] suffices to bound the growth rate of the moments. As a variant of this argument will be needed later in this paper, we recall the proof here.

Lemma 8.3.

For every k2k\geq 2 and p1p\geq 1, we have

v𝐅dnpi1,,ip=12d1gi1gip=vk(p+1)4AFp.\sum_{v\in\mathbf{F}_{d}^{\rm np}}\sum_{i_{1},\ldots,i_{p}=1}^{2d}1_{g_{i_{1}}\cdots g_{i_{p}}=v^{k}}\leq(p+1)^{4}\|A_{\rm F}\|^{p}.
Proof.

We would like to argue that if gi1gip=vkg_{i_{1}}\cdots g_{i_{p}}=v^{k}, there must exist a<ba<b so that gi1gia=vg_{i_{1}}\cdots g_{i_{a}}=v, gia+1gib=vg_{i_{a+1}}\cdots g_{i_{b}}=v, and gib+1gip=vk2g_{i_{b+1}}\cdots g_{i_{p}}=v^{k-2}. But this need not be true: if vv is not cyclically reduced, the last letters of vv may cancel the first letters of the next repetition of vv, and then the cancelled letters need not appear in gi1gipg_{i_{1}}\cdots g_{i_{p}}.

To eliminate this ambiguity, we note that any v𝐅dnpv\in\mathbf{F}_{d}^{\rm np} can be uniquely expressed as v=gwg1v=gwg^{-1} with w𝐅dnpw\in\mathbf{F}_{d}^{\rm np}, g𝐅dg\in\mathbf{F}_{d} such that ww is cyclically reduced and such that gwg1gwg^{-1} is reduced if geg\neq e. Thus we may write for any k2k\geq 2

v𝐅dnpi1,,ip=12d1gi1gip=vk(w,g)0t1t2t3t4pi1,,ip=12d(1gi1git1=g×1git1+1git2=w 1git2+1git3=w 1git3+1git4=wk2 1git4+1gip=g1),\sum_{v\in\mathbf{F}_{d}^{\rm np}}\sum_{i_{1},\ldots,i_{p}=1}^{2d}1_{g_{i_{1}}\cdots g_{i_{p}}=v^{k}}\leq\sum_{(w,g)}\sum_{0\leq t_{1}\leq t_{2}\leq t_{3}\leq t_{4}\leq p}\sum_{i_{1},\ldots,i_{p}=1}^{2d}\big{(}1_{g_{i_{1}}\cdots g_{i_{t_{1}}}=g}\times\mbox{}\\ 1_{g_{i_{t_{1}+1}}\cdots g_{i_{t_{2}}}=w}\,1_{g_{i_{t_{2}+1}}\cdots g_{i_{t_{3}}}=w}\,1_{g_{i_{t_{3}+1}}\cdots g_{i_{t_{4}}}=w^{k-2}}\,1_{g_{i_{t_{4}+1}}\cdots g_{i_{p}}=g^{-1}}\big{)},

where the sum is over pairs (w,g)(w,g) with the above properties.

The idea is now to express the indicators as 1gi1git=v=δv,λ(gi1)λ(git)δe1_{g_{i_{1}}\cdots g_{i_{t}}=v}=\langle\delta_{v},\lambda(g_{i_{1}})\cdots\lambda(g_{i_{t}})\delta_{e}\rangle. Substituting this identity into the right-hand side of the above inequality yields

v𝐅dnpi1,,ip=12d1gi1gip=vk(w,g)0t1t2t3t4p(δg,AFt1δeδw,AFt2t1δe×\displaystyle\sum_{v\in\mathbf{F}_{d}^{\rm np}}\sum_{i_{1},\ldots,i_{p}=1}^{2d}1_{g_{i_{1}}\cdots g_{i_{p}}=v^{k}}\leq\sum_{(w,g)}\sum_{0\leq t_{1}\leq t_{2}\leq t_{3}\leq t_{4}\leq p}\big{(}\langle\delta_{g},A_{\rm F}^{t_{1}}\delta_{e}\rangle\langle\delta_{w},A_{\rm F}^{t_{2}-t_{1}}\delta_{e}\rangle\times\mbox{}
δw,AFt3t2δeδwk2,AFt4t3δeδg1,AFpt4δe)\displaystyle\hskip 142.26378pt\langle\delta_{w},A_{\rm F}^{t_{3}-t_{2}}\delta_{e}\rangle\langle\delta_{w^{k-2}},A_{\rm F}^{t_{4}-t_{3}}\delta_{e}\rangle\langle\delta_{g^{-1}},A_{\rm F}^{p-t_{4}}\delta_{e}\rangle\big{)}
0t1t2t3t4pAFt4t3(g𝐅dδg,AFt1δeδg1,AFpt4δe)×\displaystyle\leq\sum_{0\leq t_{1}\leq t_{2}\leq t_{3}\leq t_{4}\leq p}\|A_{\rm F}\|^{t_{4}-t_{3}}\bigg{(}\sum_{g\in\mathbf{F}_{d}}\langle\delta_{g},A_{\rm F}^{t_{1}}\delta_{e}\rangle\langle\delta_{g^{-1}},A_{\rm F}^{p-t_{4}}\delta_{e}\rangle\bigg{)}\times\mbox{}
(w𝐅dδw,AFt2t1δeδw,AFt3t2δe)\displaystyle\hskip 170.71652pt\bigg{(}\sum_{w\in\mathbf{F}_{d}}\langle\delta_{w},A_{\rm F}^{t_{2}-t_{1}}\delta_{e}\rangle\langle\delta_{w},A_{\rm F}^{t_{3}-t_{2}}\delta_{e}\rangle\bigg{)}

where we used |δwk2,AFt4t3δe|AFt4t3|\langle\delta_{w^{k-2}},A_{\rm F}^{t_{4}-t_{3}}\delta_{e}\rangle|\leq\|A_{\rm F}\|^{t_{4}-t_{3}}. The conclusion now follows readily by applying the Cauchy–Schwarz inequality to the sums over gg and ww, as

v𝐅d|δv,AFtδe|2=AFtδe2AF2t\sum_{v\in\mathbf{F}_{d}}|\langle\delta_{v},A_{\rm F}^{t}\delta_{e}\rangle|^{2}=\|A_{\rm F}^{t}\delta_{e}\|^{2}\leq\|A_{\rm F}\|^{2t}

for any t0t\geq 0. ∎

Corollary 8.4.

suppν1[AF,AF]\mathop{\mathrm{supp}}\nu_{1}\subseteq[-\|A_{\rm F}\|,\|A_{\rm F}\|].

Proof.

Lemmas 8.2 and 8.3 imply that |ν1(xp)|(1+p2(p+1)4)AFp|\nu_{1}(x^{p})|\leq(1+p^{2}(p+1)^{4})\|A_{\rm F}\|^{p} for all p1p\geq 1, so that the conclusion follows from Lemma 4.9. ∎

We can now complete the proof of Theorem 3.4.

Proof of Theorem 3.4.

Let χ\chi be the test function provided by Lemma 4.10 with m=8m=8, K=2dK=2d, ρ=AF=22d1\rho=\|A_{\rm F}\|=2\sqrt{2d-1}. As χ(x)=0\chi(x)=0 for |x|AF+ε2|x|\leq\|A_{\rm F}\|+\frac{\varepsilon}{2}, we have τ(χ(AF))=0\tau(\chi(A_{\rm F}))=0 and ν1(χ)=0\nu_{1}(\chi)=0 by Corollary 8.4. Thus Lemmas 8.1 and 4.10 yield

𝐄[trNχ(AN)](logd)8N2β(dε)8+1/β\mathbf{E}[\mathop{\mathrm{tr}_{N}}\chi(A^{N})]\lesssim\frac{(\log d)^{8}}{N^{2}}\,\beta_{*}\bigg{(}\frac{d}{\varepsilon}\bigg{)}^{8+1/\beta_{*}}

for all ε<2d22d1\varepsilon<2d-2\sqrt{2d-1} and β1\beta_{*}\geq 1.

To conclude, we note that as χ(x)1\chi(x)\geq 1 for |x|AF+ε|x|\geq\|A_{\rm F}\|+\varepsilon, we have

𝐏[ANAF+ε]𝐏[Trχ(AN)1]1N(dlogdε)8log(2edε)\mathbf{P}[\|A^{N}\|\geq\|A_{\rm F}\|+\varepsilon]\leq\mathbf{P}[\mathop{\mathrm{Tr}}\chi(A^{N})\geq 1]\lesssim\frac{1}{N}\,\bigg{(}\frac{d\log d}{\varepsilon}\bigg{)}^{8}\log\bigg{(}\frac{2ed}{\varepsilon}\bigg{)}

for ε<2d22d1\varepsilon<2d-2\sqrt{2d-1}, where we chose β=1+log(2dε)\beta_{*}=1+\log(\frac{2d}{\varepsilon}). ∎
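
Theorem 3.4 is also easy to probe numerically. In the following simulation (illustrative only; d and N are arbitrary choices), the norm of the adjacency matrix of the 2d-regular multigraph generated by d i.i.d. uniform permutations, restricted to {1_N}^⊥, is compared with 2√(2d−1).

```python
import numpy as np

# Second eigenvalue of a random 2d-regular (multi)graph on N vertices versus 2*sqrt(2d-1).
rng = np.random.default_rng(5)
d, N = 3, 2000
A = np.zeros((N, N))
for _ in range(d):
    P = np.eye(N)[rng.permutation(N)]
    A += P + P.T
eigs = np.linalg.eigvalsh(A)                       # increasing order; eigs[-1] = 2d is the trivial eigenvalue
nontrivial = max(abs(eigs[0]), abs(eigs[-2]))      # norm of A^N restricted to {1_N}^perp
print(nontrivial, 2 * np.sqrt(2 * d - 1))          # typically very close to each other
```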

8.2. Proof of Theorem 3.6

The proof is based on results of Puder [61]. Let us begin by synthesizing the parts of [61] that we need here.

Lemma 8.5.

Define ρm\rho_{m} as in Theorem 3.6. Then for any 2md2\leq m\leq d, we have

lim supp|νm(xp)|1/pρm.\limsup_{p\to\infty}|\nu_{m}(x^{p})|^{1/p}\leq\rho_{m}.
Proof.

Fix a word w𝐖dw\in\mathbf{W}_{d} of length qq that does not reduce to the identity. By Lemmas 5.1 and 5.2, we can write for all sufficiently large NN

N𝐄[trNw(𝑺N)]=s=0bs(w)Ns,N\,\mathbf{E}[\mathop{\mathrm{tr}_{N}}w(\bm{S}^{N})]=\sum_{s=0}^{\infty}\frac{b_{s}(w)}{N^{s}},

as a rational function is analytic away from its poles. It is shown in [61, §5.5] that bs(w)=0b_{s}(w)=0 for sπ(w)2s\leq\pi(w)-2, that bπ(w)1(w)=|Crit(w)|b_{\pi(w)-1}(w)=|\mathrm{Crit}(w)|, and that |bs(w)|q2(s+1)|b_{s}(w)|\leq q^{2(s+1)} for sπ(w)s\geq\pi(w), where π(w){0,,d}{}\pi(w)\in\{0,\ldots,d\}\cup\{\infty\} denotes the primitivity rank of ww. We refer to [61, §2.2] for precise definitions of π(w)\pi(w) and Crit(w)\mathrm{Crit}(w), which are not needed in the following argument: the only properties we will use are that π(w)=0\pi(w)=0 if and only if w=ew=e, and that |Crit(w)|1|\mathrm{Crit}(w)|\geq 1 when π(w)\pi(w)\neq\infty.

Theorem 6.1 with m=d+1m=d+1 and P(𝒔,𝒔)=s1+s1++sd+sdP(\bm{s},\bm{s}^{*})=s_{1}+s_{1}^{*}+\cdots+s_{d}+s_{d}^{*} yields

i1,,ip=12dN𝐄[trNSi1NSipN] 1gi1gipe\displaystyle\sum_{i_{1},\ldots,i_{p}=1}^{2d}N\,\mathbf{E}[\mathop{\mathrm{tr}_{N}}S^{N}_{i_{1}}\cdots S^{N}_{i_{p}}]\,1_{g_{i_{1}}\cdots g_{i_{p}}\neq e} =N(𝐄[trN(AN)p](11N)ν0(xp))\displaystyle=N\big{(}\mathbf{E}[\mathop{\mathrm{tr}_{N}}(A^{N})^{p}]-(1-\tfrac{1}{N})\nu_{0}(x^{p})\big{)}
=ν0(xp)+i=0d1νi+1(xp)Ni+O(1Nd)\displaystyle=\nu_{0}(x^{p})+\sum_{i=0}^{d-1}\frac{\nu_{i+1}(x^{p})}{N^{i}}+O\bigg{(}\frac{1}{N^{d}}\bigg{)}

for NN\to\infty with p,dp,d fixed (cf. the proof of Lemma 8.2). It follows that

νm(xp)=i1,,ip=12dbm1(gi1gip) 1gi1gipe=r=1mw𝒲prbm1(w)\nu_{m}(x^{p})=\sum_{i_{1},\ldots,i_{p}=1}^{2d}b_{m-1}(g_{i_{1}}\cdots g_{i_{p}})\,1_{g_{i_{1}}\cdots g_{i_{p}}\neq e}=\sum_{r=1}^{m}\sum_{w\in\mathcal{W}_{p}^{r}}b_{m-1}(w)

for m2m\geq 2, where 𝒲pr={w=gi1gip:1i1,,ip2d,π(w)=r}\mathcal{W}_{p}^{r}=\{w=g_{i_{1}}\cdots g_{i_{p}}:1\leq i_{1},\ldots,i_{p}\leq 2d,\pi(w)=r\}. Thus

|νm(xp)|mp2mmaxr=1,,mw𝒲pr|Crit(w)|,|\nu_{m}(x^{p})|\leq mp^{2m}\max_{r=1,\ldots,m}\sum_{w\in\mathcal{W}_{p}^{r}}|\mathrm{Crit}(w)|,

where we used that |Crit(w)|1|\mathrm{Crit}(w)|\geq 1 for π(w)\pi(w)\neq\infty. The conclusion now follows from the second (unnumbered) theorem in [61, §2.4]. ∎

Corollary 8.6.

Let d2d\geq 2. Then suppνm[ρm,ρm]\mathop{\mathrm{supp}}\nu_{m}\subseteq[-\rho_{m},\rho_{m}] for all 0md0\leq m\leq d.

Proof.

For m=0,1m=0,1 this follows from AF=22d1\|A_{F}\|=2\sqrt{2d-1} and Corollary 8.4, while for m=2,,dm=2,\ldots,d this follows from Lemmas 8.5 and 4.9. ∎

We can now complete the proof of Theorem 3.6.

Proof of Theorem 3.6.

Fix mmd1m_{*}\leq m\leq d-1. Let χ\chi be the test function provided by Lemma 4.10 with m4(m+1)m\leftarrow 4(m+1), K2dK\leftarrow 2d, ρρm\rho\leftarrow\rho_{m}. Then νi(χ)=0\nu_{i}(\chi)=0 for imi\leq m by Corollary 8.6. Thus Theorem 7.1 with hχh\leftarrow\chi, mm+1m\leftarrow m+1 and Lemma 4.10 yield

𝐏[ANρm+ε]𝐏[Trχ(AN)1]CdNm1ε4(m+1)log(2eε),\mathbf{P}[\|A^{N}\|\geq\rho_{m}+\varepsilon]\leq\mathbf{P}[\mathop{\mathrm{Tr}}\chi(A^{N})\geq 1]\lesssim\frac{C_{d}}{N^{m}}\frac{1}{\varepsilon^{4(m+1)}}\log\bigg{(}\frac{2e}{\varepsilon}\bigg{)},

where we chose β=1+log(2ε)\beta_{*}=1+\log(\frac{2}{\varepsilon}) (note that β>1\beta_{*}>1 as ε<ρm+1ρm2\varepsilon<\rho_{m+1}-\rho_{m}\leq 2).

For the lower bound, it is shown in the proof of [27, Theorem 2.11] that

𝐏[ANα](1o(1))N1m\mathbf{P}[\|A^{N}\|\geq\alpha^{\prime}]\geq(1-o(1))N^{1-m}

for all m>mm>m_{*} and α<ρm\alpha^{\prime}<\rho_{m}. Choosing α=ρm1+ε\alpha^{\prime}=\rho_{m-1}+\varepsilon completes the proof. ∎

9. Strong convergence of random permutation matrices

The aim of this section is to prove Theorem 3.9. Most of the proof is identical to that of Theorem 3.4. The only difficulty in the present setting is to adapt the argument of Lemma 8.3 for bounding suppν1\mathop{\mathrm{supp}}\nu_{1}. This is not entirely straightforward, because the proof of Lemma 8.3 relied on an overcounting argument which is not applicable to general polynomials. Nonetheless, we will show that a more careful implementation of the idea behind the proof can avoid this issue.

Throughout the proof, we fix a polynomial PP as in Theorem 3.9, and define

XN:=P(𝑺N,𝑺N),XF:=P(𝒔,𝒔).X^{N}:=P(\bm{S}^{N},\bm{S}^{N*}),\qquad\quad X_{\rm F}:=P(\bm{s},\bm{s}^{*}).

To simplify the proof, we begin by applying a linearization trick of Pisier [59] in order to factor XFX_{\rm F} into polynomials of degree one.

Lemma 9.1.

For any ε>0\varepsilon>0, there exist qq\in\mathbb{N} and operators

Xj:=i=02dAj,iλ(gi),j=1,,qX_{j}:=\sum_{i=0}^{2d}A_{j,i}\otimes\lambda(g_{i}),\qquad j=1,\ldots,q

with matrix coefficients Aj,iA_{j,i} of dimensions Dj×Dj+1D_{j}\times D_{j+1} with D1=Dq+1=DD_{1}=D_{q+1}=D, so that XF=X1XqX_{\rm F}=X_{1}\cdots X_{q} and Xj(XF+ε)1/q\|X_{j}\|\leq(\|X_{\rm F}\|+\varepsilon)^{1/q} for all jj.

Proof.

This is an immediate consequence of [59, Theorem 1]. ∎

In the following, we fix ε>0\varepsilon>0 and a factorization of XFX_{\rm F} as in Lemma 9.1. We will also define XjNX_{j}^{N} by replacing λ(gi)SiN\lambda(g_{i})\leftarrow S_{i}^{N} in the definition of XjX_{j}. Note that as g1,,gdg_{1},\ldots,g_{d} are free, XF=X1XqX_{\rm F}=X_{1}\cdots X_{q} implies that XN=X1NXqNX^{N}=X^{N}_{1}\cdots X^{N}_{q}.

It will be convenient to extend the indexing of the above operators cyclically: we will define Xj:=X((j1)modq)+1X_{j}:=X_{((j-1)\mathop{\mathrm{mod}}q)+1} for all jj\in\mathbb{N}. This implies, for example, that XFp=X1X2XpqX_{\rm F}^{p}=X_{1}X_{2}\cdots X_{pq}. We similarly define Aj,i:=A((j1)modq)+1,iA_{j,i}:=A_{((j-1)\mathop{\mathrm{mod}}q)+1,i} for jj\in\mathbb{N}.

9.1. The first-order support

In the present setting, the moments of the linear functionals ν0\nu_{0} and ν1\nu_{1} in Theorem 6.1 satisfy

ν0(xp)\displaystyle\nu_{0}(x^{p}) =limN𝐄[trDN(XN)p]=τ(XFp),\displaystyle=\lim_{N\to\infty}\mathbf{E}[\mathop{\mathrm{tr}_{DN}}(X^{N})^{p}]=\tau(X_{F}^{p}),
ν1(xp)\displaystyle\nu_{1}(x^{p}) =limNN(𝐄[trDN(XN)p]τ(XFp)).\displaystyle=\lim_{N\to\infty}N\big{(}\mathbf{E}[\mathop{\mathrm{tr}_{DN}}(X^{N})^{p}]-\tau(X_{F}^{p})\big{)}.

We begin by writing an expression for ν1(xp)\nu_{1}(x^{p}).

Lemma 9.2.

For every pp\in\mathbb{N}, we have

ν1(xp)=τ(XFp)+k=2pq(ω(k)1)v𝐅dnpi1,,ipq=02dai1,,ipq 1gi1gipq=vk,\nu_{1}(x^{p})=-\tau(X_{\rm F}^{p})+\sum_{k=2}^{pq}(\omega(k)-1)\sum_{v\in\mathbf{F}_{d}^{\rm np}}\sum_{i_{1},\ldots,i_{pq}=0}^{2d}a_{i_{1},\ldots,i_{pq}}\,1_{g_{i_{1}}\cdots g_{i_{pq}}=v^{k}},

where we define

ai1,,ipq:=trD(A1,i1A2,i2Apq,ipq).a_{i_{1},\ldots,i_{pq}}:=\mathop{\mathrm{tr}_{D}}\big{(}A_{1,i_{1}}A_{2,i_{2}}\cdots A_{pq,i_{pq}}\big{)}.
Proof.

The proof is identical to that of Lemma 8.2. ∎

We would like to repeat the proof of Lemma 8.3 in the present setting. Recall that the idea of the proof is to write vk=gwwwk2g1v^{k}=gwww^{k-2}g^{-1} where ww is cyclically reduced, and then to reason that any word gi1gitg_{i_{1}}\cdots g_{i_{t}} that reduces to vkv^{k} is a concatenation of words gi1git1=gg_{i_{1}}\cdots g_{i_{t_{1}}}=g, git1+1git2=wg_{i_{t_{1}+1}}\cdots g_{i_{t_{2}}}=w, git2+1git3=wg_{i_{t_{2}+1}}\cdots g_{i_{t_{3}}}=w, etc. We can therefore sum over all such concatenations in the expression for ν1\nu_{1}.

The problem with this argument is that the above decomposition is not unique: for example, g1g2g21g1=g12g_{1}g_{2}g_{2}^{-1}g_{1}=g_{1}^{2} can be decomposed in two different ways (g1,g2g21g1)(g_{1},g_{2}g_{2}^{-1}g_{1}) or (g1g2g21,g1)(g_{1}g_{2}g_{2}^{-1},g_{1}). Thus when we sum over all possible ways of generating such concatenations, we are overcounting the number of words that reduce to vkv^{k}. Unlike in Lemma 8.3, the coefficients ai1,,ipqa_{i_{1},\ldots,i_{pq}} can have both positive and negative signs and thus we cannot upper bound the moments by overcounting.

We presently show how a more careful analysis can avoid overcounting. We begin by introducing the following basic notion.

Definition 9.3.

For any v𝐅dv\in\mathbf{F}_{d} and 0i1,,ik2d0\leq i_{1},\ldots,i_{k}\leq 2d, we write gi1gikvg_{i_{1}}\cdots g_{i_{k}}\triangleq v if gi1gik=vg_{i_{1}}\cdots g_{i_{k}}=v and gilgikvg_{i_{l}}\cdots g_{i_{k}}\neq v for 1<lk1<l\leq k.

To interpret this notion, one should think of any word gi1gikg_{i_{1}}\cdots g_{i_{k}} as defining a walk in the Cayley graph of 𝐅d\mathbf{F}_{d} when read from right to left, starting at the identity. Then gi1gikvg_{i_{1}}\cdots g_{i_{k}}\triangleq v states that the walk defined by gi1gikg_{i_{1}}\cdots g_{i_{k}} reaches vv for the first time at its endpoint. Just as we can write

1gi1gik=v=δv,λ(gi1)λ(gik)δe,1_{g_{i_{1}}\cdots g_{i_{k}}=v}=\langle\delta_{v},\lambda(g_{i_{1}})\cdots\lambda(g_{i_{k}})\delta_{e}\rangle,

the equivalence notion of Definition 9.3 may also be expressed in terms of matrix elements of the left-regular representation.

Lemma 9.4.

For any 0i1,,ik2d0\leq i_{1},\ldots,i_{k}\leq 2d and v𝐅dv\in\mathbf{F}_{d}, we have

1gi1gikv=δv,λ(gi1)λ(gik)δes=1k1δe,λ(gi1)Qλ(gi2)Qλ(gis)δeδv,λ(gis+1)λ(gik)δe,1_{g_{i_{1}}\cdots g_{i_{k}}\triangleq v}=\langle\delta_{v},\lambda(g_{i_{1}})\cdots\lambda(g_{i_{k}})\delta_{e}\rangle\\ -\sum_{s=1}^{k-1}\langle\delta_{e},\lambda(g_{i_{1}})Q\lambda(g_{i_{2}})\cdots Q\lambda(g_{i_{s}})\delta_{e}\rangle\langle\delta_{v},\lambda(g_{i_{s+1}})\cdots\lambda(g_{i_{k}})\delta_{e}\rangle,

where we defined Q:=𝟏δeδeQ:=\mathbf{1}-\delta_{e}\delta_{e}^{*} (that is, QQ is the orthogonal projection onto {δe}\{\delta_{e}\}^{\perp}).

Proof.

One may simply note that term ss in the sum is the indicator of the event that gi1gik=vg_{i_{1}}\cdots g_{i_{k}}=v, gis+1gik=vg_{i_{s+1}}\cdots g_{i_{k}}=v, and gitgikvg_{i_{t}}\cdots g_{i_{k}}\neq v for 1<t<s+11<t<s+1. ∎
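
Since the identity of Lemma 9.4 is purely combinatorial, it can be verified by brute force for small parameters. The following sketch (with arbitrary choices of d, k and v, and an ad hoc encoding of elements of 𝐅_d as reduced tuples of signed integers) checks it over all words of length k in the letters g_0, …, g_{2d}.

```python
from itertools import product

# Brute-force verification of the identity of Lemma 9.4 (illustrative only).
# Elements of F_d are encoded as reduced tuples of nonzero integers; i and -i are inverse generators.
def mult(u, v):
    out = list(u)
    for x in v:
        if out and out[-1] == -x:
            out.pop()
        else:
            out.append(x)
    return tuple(out)

def seg(word, a, b):                     # reduction of g_{i_{a+1}} ... g_{i_b}; letter 0 encodes g_0 = e
    out = ()
    for x in word[a:b]:
        out = mult(out, (x,) if x != 0 else ())
    return out

d, k, e = 2, 5, ()
v = (1, 2)                               # target element v = g_1 g_2 (an arbitrary choice)
letters = [0] + [i for i in range(1, d + 1)] + [-i for i in range(1, d + 1)]

for word in product(letters, repeat=k):
    lhs = int(seg(word, 0, k) == v and all(seg(word, l, k) != v for l in range(1, k)))
    rhs = int(seg(word, 0, k) == v) - sum(
        int(seg(word, 0, s) == e and all(seg(word, j, s) != e for j in range(1, s)))
        * int(seg(word, s, k) == v)
        for s in range(1, k)
    )
    assert lhs == rhs
print("Lemma 9.4 verified for all", len(letters) ** k, "words of length", k)
```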

We now use this identity to express the kind of sum that appears in Lemma 9.2 exactly (without overcounting) in terms of matrix elements of the operators XjX_{j}.

Lemma 9.5.

Fix v1,,ve𝐅dv_{1},\ldots,v_{\ell}\neq e\in\mathbf{F}_{d} (2)(\ell\geq 2) so that v1vv_{1}\cdots v_{\ell} is reduced. Define

X(t,s)\displaystyle X_{(t,s)} :=XtXs,\displaystyle:=X_{t}\cdots X_{s},
X[t,s]\displaystyle X_{[t,s]} :=Xt(𝟏Dt+1Q)Xt+1(𝟏DsQ)Xs\displaystyle:=-X_{t}(\mathbf{1}_{D_{t+1}}\otimes Q)X_{t+1}\cdots(\mathbf{1}_{D_{s}}\otimes Q)X_{s}

for tst\leq s and X(t,s)=X[t,s]:=𝟏X_{(t,s)}=X_{[t,s]}:=\mathbf{1} for t>st>s. Then

i1,,ipq=02dai1,,ipq 1gi1gipq=v1v=x2,y2,,x,y,x+11<t2s2<<tspqex+1δv1,X(1,t21)ex2δe×r=2exrδe,X[tr,sr1]eyrδeeyrδvr,X(sr,tr+11)exr+1δe\sum_{i_{1},\ldots,i_{pq}=0}^{2d}a_{i_{1},\ldots,i_{pq}}\,1_{g_{i_{1}}\cdots g_{i_{pq}}=v_{1}\cdots v_{\ell}}=\\ \sum_{x_{2},y_{2},\ldots,x_{\ell},y_{\ell},x_{\ell+1}}\sum_{1<t_{2}\leq s_{2}<\cdots<t_{\ell}\leq s_{\ell}\leq pq}\langle e_{x_{\ell+1}}\otimes\delta_{v_{1}},X_{(1,t_{2}-1)}\,e_{x_{2}}\otimes\delta_{e}\rangle\times\mbox{}\\ \prod_{r=2}^{\ell}\langle e_{x_{r}}\otimes\delta_{e},X_{[t_{r},s_{r}-1]}\,e_{y_{r}}\otimes\delta_{e}\rangle\langle e_{y_{r}}\otimes\delta_{v_{r}},X_{(s_{r},t_{r+1}-1)}\,e_{x_{r+1}}\otimes\delta_{e}\rangle

where we let t+1:=pq+1t_{\ell+1}:=pq+1.

Proof.

Suppose that gi1gipq=v1vg_{i_{1}}\cdots g_{i_{pq}}=v_{1}\cdots v_{\ell}. As we assumed that v1vv_{1}\cdots v_{\ell} is reduced, there are unique indices 1<t2<<tpq1<t_{2}<\cdots<t_{\ell}\leq pq so that

gi1git21=v1,gitrgitr+11vrfor 2r.g_{i_{1}}\cdots g_{i_{t_{2}-1}}=v_{1},\qquad g_{i_{t_{r}}}\cdots g_{i_{t_{r+1}-1}}\triangleq v_{r}\quad\text{for }2\leq r\leq\ell.

Thus we can write

1gi1gipq=v1v=1<t2<<tpq1gi1git21=v1r=21gitrgitr+11vr.1_{g_{i_{1}}\cdots g_{i_{pq}}=v_{1}\cdots v_{\ell}}=\sum_{1<t_{2}<\cdots<t_{\ell}\leq pq}1_{g_{i_{1}}\cdots g_{i_{t_{2}-1}}=v_{1}}\prod_{r=2}^{\ell}1_{g_{i_{t_{r}}}\cdots g_{i_{t_{r+1}-1}}\triangleq v_{r}}.

The conclusion follows by applying Lemma 9.4 to the indicators in this identity and summing over the indices i1,,ipqi_{1},\ldots,i_{pq}. ∎

With this identity in hand, we can proceed exactly as in Lemma 8.3. In the following lemma, recall that we chose an arbitrary ε>0\varepsilon>0 in Lemma 9.1.

Lemma 9.6.

For every k2k\geq 2, we have

|v𝐅dnpi1,,ipq=02dai1,,ipq 1gi1gipq=vk|2Dmax9(pq)8(XF+ε)p,\Bigg{|}\sum_{v\in\mathbf{F}_{d}^{\rm np}}\sum_{i_{1},\ldots,i_{pq}=0}^{2d}a_{i_{1},\ldots,i_{pq}}\,1_{g_{i_{1}}\cdots g_{i_{pq}}=v^{k}}\Bigg{|}\leq 2D_{\rm max}^{9}(pq)^{8}(\|X_{\rm F}\|+\varepsilon)^{p},

where Dmax=maxjDjD_{\rm max}=\max_{j}D_{j}.

Proof.

Suppose first that k3k\geq 3. Let \mathcal{F} be the set of v𝐅dnpv\in\mathbf{F}_{d}^{\rm np} that are cyclically reduced, and let :=𝐅dnp\\mathcal{F}^{\prime}:=\mathbf{F}_{d}^{\rm np}\backslash\mathcal{F}. Then every vv\in\mathcal{F}^{\prime} can be uniquely expressed as v=gwg1v=gwg^{-1} with ww\in\mathcal{F}, g𝐅dg\in\mathbf{F}_{d} so that gwg1gwg^{-1} is reduced. Applying Lemma 9.5 with =5\ell=5 and v1gv_{1}\leftarrow g, v2wv_{2}\leftarrow w, v3wv_{3}\leftarrow w, v4wk2v_{4}\leftarrow w^{k-2}, v5g1v_{5}\leftarrow g^{-1} yields

|vi1,,ipq=02dai1,,ipq 1gi1gipq=vk|1<t2s2<<t5s5pqX[t2,s21]X[t3,s31]X[t4,s41]X[t5,s51]X(s4,t51)×x2,y2,,x5,y5,x6w𝐅d|ey2δw,X(s2,t31)ex3δeey3δw,X(s3,t41)ex4δe|×g𝐅d|ex6δg,X(1,t21)ex2δeey5δg1,X(s5,pq)ex6δe|.\Bigg{|}\sum_{v\in\mathcal{F}^{\prime}}\sum_{i_{1},\ldots,i_{pq}=0}^{2d}a_{i_{1},\ldots,i_{pq}}\,1_{g_{i_{1}}\cdots g_{i_{pq}}=v^{k}}\Bigg{|}\leq\\ \sum_{1<t_{2}\leq s_{2}<\cdots<t_{5}\leq s_{5}\leq pq}\|X_{[t_{2},s_{2}-1]}\|\|X_{[t_{3},s_{3}-1]}\|\|X_{[t_{4},s_{4}-1]}\|\|X_{[t_{5},s_{5}-1]}\|\|X_{(s_{4},t_{5}-1)}\|\times\mbox{}\\ \sum_{x_{2},y_{2},\ldots,x_{5},y_{5},x_{6}}\sum_{w\in\mathbf{F}_{d}}|\langle e_{y_{2}}\otimes\delta_{w},X_{(s_{2},t_{3}-1)}\,e_{x_{3}}\otimes\delta_{e}\rangle\langle e_{y_{3}}\otimes\delta_{w},X_{(s_{3},t_{4}-1)}\,e_{x_{4}}\otimes\delta_{e}\rangle|\times\mbox{}\\ \sum_{g\in\mathbf{F}_{d}}|\langle e_{x_{6}}\otimes\delta_{g},X_{(1,t_{2}-1)}\,e_{x_{2}}\otimes\delta_{e}\rangle\langle e_{y_{5}}\otimes\delta_{g^{-1}},X_{(s_{5},pq)}\,e_{x_{6}}\otimes\delta_{e}\rangle|.

Using that X(t,s1)(XF+ε)(st)/q\|X_{(t,s-1)}\|\leq(\|X_{\rm F}\|+\varepsilon)^{(s-t)/q} and X[t,s1](XF+ε)(st)/q\|X_{[t,s-1]}\|\leq(\|X_{\rm F}\|+\varepsilon)^{(s-t)/q} by Lemma 9.1 and Q=1\|Q\|=1, and applying Cauchy–Schwarz as in the proof of Lemma 8.3, we can upper bound the right-hand side by Dmax9(pq)8(XF+ε)pD_{\rm max}^{9}(pq)^{8}(\|X_{\rm F}\|+\varepsilon)^{p}.

If we sum instead over vv\in\mathcal{F} on the left-hand side, the sum can be bounded in exactly the same manner by applying Lemma 9.5 with =3\ell=3 and v1wv_{1}\leftarrow w, v2wv_{2}\leftarrow w, v3wk2v_{3}\leftarrow w^{k-2}. Thus the proof is complete for the case k3k\geq 3.

The proof for k=2k=2 is identical, except that we now omit the wk2w^{k-2} terms. ∎

Corollary 9.7.

suppν1[XF,XF]\mathop{\mathrm{supp}}\nu_{1}\subseteq[-\|X_{\rm F}\|,\|X_{\rm F}\|].

Proof.

Lemmas 9.2 and 9.6 imply that |ν1(xp)|XFp+2Dmax9(pq)10(XF+ε)p|\nu_{1}(x^{p})|\leq\|X_{\rm F}\|^{p}+2D_{\rm max}^{9}(pq)^{10}(\|X_{\rm F}\|+\varepsilon)^{p}. The conclusion follows by first applying Lemma 4.9, and then letting ε0\varepsilon\to 0. ∎

9.2. Proof of Theorem 3.9

The rest of the proof contains no new ideas.

Proof of Theorem 3.9.

Let χ\chi be the test function provided by Lemma 4.10 with m=8m=8, K=PMD()C(𝐅d)K=\|P\|_{\mathrm{M}_{D}(\mathbb{C})\otimes C^{*}(\mathbf{F}_{d})}, and ρ=XF\rho=\|X_{\rm F}\|. Then ν0(χ)=ν1(χ)=0\nu_{0}(\chi)=\nu_{1}(\chi)=0 by Corollary 9.7. Thus Theorem 7.1 with hχh\leftarrow\chi, m2m\leftarrow 2 and Lemma 4.10 yield

𝐏[XNXF+ε]𝐏[Trχ(XN)1]DN(Kq0logdε)8log(eKε)\mathbf{P}[\|X^{N}\|\geq\|X_{\rm F}\|+\varepsilon]\leq\mathbf{P}[\mathop{\mathrm{Tr}}\chi(X^{N})\geq 1]\lesssim\frac{D}{N}\bigg{(}\frac{Kq_{0}\log d}{\varepsilon}\bigg{)}^{8}\log\bigg{(}\frac{eK}{\varepsilon}\bigg{)}

for all ε<KXF\varepsilon<K-\|X_{\rm F}\|, where we chose β=1+log(Kε)\beta_{*}=1+\log(\frac{K}{\varepsilon}). ∎

10. Stable representations of the symmetric group

The aim of this section is to prove Theorem 3.14. The main issue here is to extend the basic properties of random permutation matrices of section 5 to general stable representations. With analogues of these properties in place, the remainder of the proof is nearly the same as that of Theorem 3.9.

Throughout this section, we fix a stable representation as in section 3.4.2 and adopt the notations introduced there. In addition, we will denote by β\beta the degree of the polynomial φ(x1,x22,,xrr)\varphi(x_{1},x_{2}^{2},\ldots,x_{r}^{r}), and we set κ:=supNN0DNNα\kappa:=\sup_{N\geq N_{0}}\frac{D_{N}}{N^{\alpha}}.

10.1. Basic properties

We will deduce the requisite properties from the much more precise results of [32]. (In fact, the only facts that will be needed here are relatively elementary, as is explained in [45, Remark 31] and [32, §1.5.1].)

Let w𝐖dw\in\mathbf{W}_{d} be a word. Then clearly

w(𝚷N)=πN(w(𝝈))w(\bm{\Pi}^{N})=\pi_{N}(w(\bm{\sigma}))

where 𝝈=(σ1,,σd)\bm{\sigma}=(\sigma_{1},\ldots,\sigma_{d}) are i.i.d. uniformly distributed elements of 𝐒N\mathbf{S}_{N}. The distribution of w(𝝈)w(\bm{\sigma}) defines a probability measure 𝐏w,N\mathbf{P}_{w,N} on 𝐒N\mathbf{S}_{N}, called the word measure.

Lemma 10.1.

Let w𝐖dw\in\mathbf{W}_{d} be any word of length at most qq that does not reduce to the identity, and let Nmax{βq,N0}N\geq\max\{\beta q,N_{0}\}. Then we have

𝐄[Trw(𝚷N)]=fw(1N)gq(1N),\mathbf{E}[\mathop{\mathrm{Tr}}w(\bm{\Pi}^{N})]=\frac{f_{w}(\tfrac{1}{N})}{g_{q}(\tfrac{1}{N})},

where

gq(x):=(1x)d1(12x)d2(1(βq1)x)dβq1g_{q}(x):=(1-x)^{d_{1}}(1-2x)^{d_{2}}\cdots(1-(\beta q-1)x)^{d_{\beta q-1}}

with dj:=min(d,βqj+1)d_{j}:=\min\big{(}d,\lfloor\frac{\beta q}{j+1}\rfloor\big{)}, and fw,gqf_{w},g_{q} are polynomials of degree at most βq(1+logd)\beta q(1+\log d).

Proof.

We may assume without loss of generality that ww is cyclically reduced with length 1q1\leq\ell\leq q, as Trw(𝚷N)\mathop{\mathrm{Tr}}w(\bm{\Pi}^{N}) is invariant under cyclic reduction.

The definition of the stable representation implies that

𝐄[Trw(𝚷N)]=𝐄w,N[φ(ξ1,,ξr)].\mathbf{E}[\mathop{\mathrm{Tr}}w(\bm{\Pi}^{N})]=\mathbf{E}_{w,N}[\varphi(\xi_{1},\ldots,\xi_{r})]. (10.1)

Let ξ1α1ξrαr\xi_{1}^{\alpha_{1}}\cdots\xi_{r}^{\alpha_{r}} be a monomial of φ\varphi of degree at least one. Then for NN sufficiently large, [32, Example 3.6, Definition 6.6, and Proposition 6.8] imply that

𝐄w,N[ξ1α1ξrαr]=Γ(1N)eΓvΓi=1vΓ1(1iN)j=1di=1eΓj1(1iN),\mathbf{E}_{w,N}[\xi_{1}^{\alpha_{1}}\cdots\xi_{r}^{\alpha_{r}}]=\sum_{\Gamma}\frac{(\frac{1}{N})^{e_{\Gamma}-v_{\Gamma}}\prod_{i=1}^{v_{\Gamma}-1}(1-\frac{i}{N})}{\prod_{j=1}^{d}\prod_{i=1}^{e_{\Gamma}^{j}-1}(1-\frac{i}{N})}, (10.2)

where the sum is over a certain collection of quotients Γ\Gamma of the graph consisting of α1\alpha_{1} disjoint cycles of length \ell, α2\alpha_{2} disjoint cycles of length 22\ell, etc., whose edges are colored by {1,,d}\{1,\ldots,d\}. Here we denote by vΓv_{\Gamma} the number of vertices of Γ\Gamma, by eΓje_{\Gamma}^{j} the number of edges of Γ\Gamma colored by jj, and by eΓ=eΓ1++eΓde_{\Gamma}=e_{\Gamma}^{1}+\cdots+e_{\Gamma}^{d}.

It is immediate from the above construction that vΓi=1riαiβqv_{\Gamma}\leq\ell\sum_{i=1}^{r}i\alpha_{i}\leq\beta q, and analogously eΓβqe_{\Gamma}\leq\beta q. The validity of (10.2) for all NβqmaxjeΓjN\geq\beta q\geq\max_{j}e_{\Gamma}^{j} can be read off from the proof of [32, Proposition 6.8]. Moreover, we must have eΓvΓ0e_{\Gamma}-v_{\Gamma}\geq 0 for all Γ\Gamma that appear in the sum, as 𝐄w,N[ξ1α1ξrαr]=O(1)\mathbf{E}_{w,N}[\xi_{1}^{\alpha_{1}}\cdots\xi_{r}^{\alpha_{r}}]=O(1) by [32, Theorem 1.3] and the following discussion. Thus (10.2) is a rational function of 1N\frac{1}{N}. The remainder of the proof proceeds exactly as in the proof of Lemma 5.1. ∎

We now describe the lowest order asymptotics.

Lemma 10.2.

Fix a word w𝐖dw\in\mathbf{W}_{d} that does not reduce to the identity, and express its reduction as w(g1,,gd)=vkw(g_{1},\ldots,g_{d})=v^{k} with v𝐅dnpv\in\mathbf{F}_{d}^{\rm np} and k1k\geq 1. Then

limN𝐄[Trw(𝚷N)]=ϖ(k)\lim_{N\to\infty}\mathbf{E}[\mathop{\mathrm{Tr}}w(\bm{\Pi}^{N})]=\varpi(k)

with ϖ(1)=0\varpi(1)=0 and |ϖ(k)|ckβ|\varpi(k)|\leq ck^{\beta} for k2k\geq 2. Here cc is a constant that depends on φ\varphi.

Proof.

Recall that $\varphi$ is a sum of character polynomials of irreducible representations, and that we assumed that $\pi_{N}$ does not contain the trivial representation. The case $k=1$ therefore follows from Lemma 5.2 for the standard representation and from [32, Corollary 1.7] for all other irreducible components of $\varphi$.

For $k\geq 2$, let $\xi_{1}^{\alpha_{1}}\cdots\xi_{r}^{\alpha_{r}}$ be a monomial of $\varphi$. Applying [32, Theorem 1.3] and the following discussion, as well as [32, Remark 7.3], yields

\lim_{N\to\infty}\mathbf{E}_{w,N}[\xi_{1}^{\alpha_{1}}\cdots\xi_{r}^{\alpha_{r}}]=\lim_{N\to\infty}\mathbf{E}_{v,N}[\xi_{k}^{\alpha_{1}}\cdots\xi_{rk}^{\alpha_{r}}]=\sum_{\mathcal{P}\in\mathrm{Partitions}(S)}\prod_{A\in\mathcal{P}}\sum_{j|\mathrm{gcd}(A)}j^{|A|-1},

where $S$ is the multiset that contains $jk$ with multiplicity $\alpha_{j}$ for $j=1,\ldots,r$. Crudely estimating $j\leq\mathrm{gcd}(A)\leq rk$, we find that the right-hand side is bounded by $B_{|S|}(rk)^{|S|}$, where $B_{n}\leq n!$ denotes the number of partitions of a set with $n$ elements. The conclusion follows since $|S|=\sum_{j=1}^{r}\alpha_{j}\leq\beta$, using (10.1). ∎
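To illustrate the partition formula (purely as a sanity check, not needed for the proof), take the monomial $\xi_{1}$ and $k=2$, so that $S=\{2\}$: the only partition is $\{\{2\}\}$ and $\sum_{j|2}j^{0}=2$, so the limit equals $2$. For the monomial $\xi_{1}^{2}$ with $k=2$ one has $S=\{2,2\}$, and the two partitions $\{\{2\},\{2\}\}$ and $\{\{2,2\}\}$ contribute $2\cdot 2=4$ and $\sum_{j|2}j^{1}=3$ respectively, for a total of $7$.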

Note that an immediate consequence of Lemma 10.2 is that the weak convergence as in Corollary 5.3 remains valid in the present setting.

10.2. Proof of Theorem 3.14

Fix a self-adjoint noncommutative polynomial $P\in\mathrm{M}_{D}(\mathbb{C})\otimes\mathbb{C}\langle\bm{s},\bm{s}^{*}\rangle$ of degree $q_{0}$, and let $h\in\mathcal{P}_{q}$. Then $(\mathop{\mathrm{Tr}}\otimes\mathrm{id})(h(P(\bm{\Pi}^{N},\bm{\Pi}^{N*})))$ is a linear combination of words $w(\bm{\Pi}^{N})$ of length at most $qq_{0}$. Summing separately over words that do and do not reduce to the identity yields

\mathbf{E}[\mathop{\mathrm{Tr}}h(P(\bm{\Pi}^{N},\bm{\Pi}^{N*}))]=\frac{\tilde{f}_{h}(\frac{1}{N})}{g_{qq_{0}}(\frac{1}{N})}+D_{N}\,(\mathop{\mathrm{Tr}}\otimes\tau)(h(P(\bm{s},\bm{s}^{*})))

by Lemma 10.1, where $\tilde{f}_{h}$ is a polynomial of degree at most $\beta qq_{0}(1+\log d)$. Thus

\mathbf{E}[\mathop{\mathrm{tr}_{N^{\alpha}}}h(P(\bm{\Pi}^{N},\bm{\Pi}^{N*}))]=\Psi_{h}(\tfrac{1}{N})=\frac{f_{h}(\frac{1}{N})}{g_{qq_{0}}(\frac{1}{N})}

where $f_{h}$ is a polynomial of degree at most $\beta qq_{0}(1+\log d)+\alpha\leq 2\beta qq_{0}(1+\log d)$. Here we used that $\frac{D_{N}}{N^{\alpha}}$ is a polynomial in $\frac{1}{N}$ of degree $\alpha$.
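For example, for the standard representation one has $D_{N}=N-1$; with $\alpha=1$, this gives $\frac{D_{N}}{N^{\alpha}}=1-\frac{1}{N}$, a polynomial in $\frac{1}{N}$ of degree $1$.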

We can now repeat the proofs in sections 6–7 nearly verbatim with the replacements $q\leftarrow 2\beta N_{0}^{1/2}q$ (here we multiply by $N_{0}^{1/2}$ to ensure $M\geq N_{0}$ in Lemma 6.4) and $|\Psi_{h}(\frac{1}{N})|\leq\frac{DD_{N}}{N^{\alpha}}\|h\|_{C^{0}[-K,K]}\leq\kappa D\|h\|_{C^{0}[-K,K]}$ to conclude the following.

Lemma 10.3.

Let $d\geq 2$. Fix $P$ as in Theorem 3.14 and $K=\|P\|_{\mathrm{M}_{D}(\mathbb{C})\otimes C^{*}(\mathbf{F}_{d})}$. Then there exist compactly supported distributions $\nu_{i}$ such that

\bigg|\mathbf{E}[\mathop{\mathrm{tr}_{N^{\alpha}}}h(P(\bm{\Pi}^{N},\bm{\Pi}^{N*}))]-\sum_{i=0}^{\alpha}\frac{\nu_{i}(h)}{N^{i}}\bigg|\leq\frac{CD\,(q_{0}\log d)^{4(\alpha+1)}}{N^{\alpha+1}}\,\beta_{*}\|f^{(4(\alpha+1)+1)}\|_{L^{\beta}[0,2\pi]}

for all $N\geq N_{0}$, $\beta>1$, and $h\in C^{\infty}(\mathbb{R})$. Here $f(\theta):=h(K\cos\theta)$, $1/\beta_{*}:=1-1/\beta$, and $C$ is a constant that depends on the choice of stable representation.

Next, we must characterize the supports of the distributions $\nu_{i}$.

Lemma 10.4.

$\mathop{\mathrm{supp}}\nu_{i}\subseteq[-\|P(\bm{s},\bm{s}^{*})\|,\|P(\bm{s},\bm{s}^{*})\|]$ for $i=0,\ldots,\alpha$.

Proof.

Suppose first that $0\leq i<\alpha$. As Lemma 10.2 implies that

\mathbf{E}[\mathop{\mathrm{tr}_{N^{\alpha}}}h(P(\bm{\Pi}^{N},\bm{\Pi}^{N*}))]=\frac{D_{N}}{N^{\alpha}}\,(\mathop{\mathrm{Tr}}\otimes\tau)(h(P(\bm{s},\bm{s}^{*})))+O\bigg(\frac{1}{N^{\alpha}}\bigg)

as $N\to\infty$ for every $h\in\mathcal{P}$, it follows that for every $i<\alpha$ there is a constant $c_{i}$ so that $\nu_{i}(h)=c_{i}\,(\mathop{\mathrm{Tr}}\otimes\tau)(h(P(\bm{s},\bm{s}^{*})))$. The claim follows directly.
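In more detail (we spell out the coefficient matching for the reader's convenience), write $\frac{D_{N}}{N^{\alpha}}=\sum_{i=0}^{\alpha}c_{i}N^{-i}$. Comparing the coefficient of $N^{-i}$ with $i<\alpha$ in the above display with the expansion of Lemma 10.3 yields $\nu_{i}(h)=c_{i}\,(\mathop{\mathrm{Tr}}\otimes\tau)(h(P(\bm{s},\bm{s}^{*})))$ for every polynomial $h$; as compactly supported distributions are determined by their moments, the same identity holds for all test functions, and the support bound follows.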

We now consider $i=\alpha$. Then we can repeat the proof of Lemma 9.2, but using Lemma 10.2 instead of Lemma 5.2, to obtain the representation

\nu_{\alpha}(x^{p})=D_{0}\tau(P(\bm{s},\bm{s}^{*})^{p})+\sum_{k=2}^{pq}\varpi(k)\sum_{v\in\mathbf{F}_{d}^{\rm np}}\sum_{i_{1},\ldots,i_{pq}=0}^{2d}a_{i_{1},\ldots,i_{pq}}\,1_{g_{i_{1}}\cdots g_{i_{pq}}=v^{k}}

where $D_{0}:=\varphi(0,\ldots,0)$ is the constant term in the polynomial $N\mapsto D_{N}$. We therefore obtain $|\nu_{\alpha}(x^{p})|\lesssim D_{\rm max}^{9}(pq)^{9+\beta}(\|P(\bm{s},\bm{s}^{*})\|+\varepsilon)^{p}$ by Lemmas 9.6 and 10.2, and the conclusion follows by Lemma 4.9 and as $\varepsilon>0$ is arbitrary. ∎
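In particular (making the last step explicit), since $(D_{\rm max}^{9}(pq)^{9+\beta})^{1/p}\to 1$ as $p\to\infty$, the moment bound gives $\limsup_{p\to\infty}|\nu_{\alpha}(x^{p})|^{1/p}\leq\|P(\bm{s},\bm{s}^{*})\|+\varepsilon$, which is the kind of moment growth to which Lemma 4.9 applies; letting $\varepsilon\downarrow 0$ yields the stated support bound.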

The rest of the proof follows in the usual manner.

Proof of Theorem 3.14.

Let $\chi$ be the test function provided by Lemma 4.10 with $m\leftarrow 4(\alpha+1)$ and $\rho\leftarrow\|P(\bm{s},\bm{s}^{*})\|$. Then $\nu_{i}(\chi)=0$ for all $i\leq\alpha$ by Lemma 10.4. Thus Lemma 10.3 with $h\leftarrow\chi$ and Lemma 4.10 yield

\mathbf{P}[\|P(\bm{\Pi}^{N},\bm{\Pi}^{N*})\|\geq\|P(\bm{s},\bm{s}^{*})\|+\varepsilon]\leq\frac{CD}{N}\bigg(\frac{Kq_{0}\log d}{\varepsilon}\bigg)^{4(\alpha+1)}\log\bigg(\frac{eK}{\varepsilon}\bigg),

where we chose $\beta_{*}=1+\log(\frac{K}{\varepsilon})$. ∎

Appendix A Lower bounds

There is a general principle that any random matrix model that satisfies

\|P(\bm{X}^{N},\bm{X}^{N*})\|\leq\|P(\bm{x},\bm{x}^{*})\|+o(1)\quad\text{with probability }1-o(1)

for every noncommutative polynomial $P$, automatically also satisfies

\|P(\bm{X}^{N},\bm{X}^{N*})\|\geq\|P(\bm{x},\bm{x}^{*})\|-o(1)\quad\text{with probability }1-o(1)

provided that the $C^{*}$-algebra generated by $\bm{x}$ has a unique faithful trace; this can be shown, for example, along the lines of [46, pp. 16–19]. Since the latter property holds for the limiting model defined in section 3.1 (cf. [60]), this general principle applies to all the models considered in this paper.

For this reason, we have focused exclusively on strong convergence upper bounds in the main part of this paper. On the other hand, such a general principle is usually not needed in practice, as in most cases the lower bound already follows easily from weak convergence alone. For completeness, we now provide an elementary proof of the lower bound in the setting of this paper along these lines.

Theorem A.1.

Let $P\in\mathrm{M}_{D}(\mathbb{C})\otimes\mathbb{C}\langle\bm{s},\bm{s}^{*}\rangle$ be any noncommutative polynomial, and fix any stable representation as in section 3.4.2. Then

\|P(\bm{\Pi}^{N},\bm{\Pi}^{N*})\|\geq\|P(\bm{s},\bm{s}^{*})\|-o(1)\quad\text{with probability }1-o(1)

as $N\to\infty$.

Note that all the models in this paper may be viewed as special cases of stable representations, so that Theorem A.1 captures random permutation matrices and random regular graphs as well. The proof of Theorem A.1 could also form the basis for quantitative lower bounds, but we do not pursue this here.

We begin by recalling the basic fact that Theorem A.1 will follow directly once we establish weak convergence in the following form.

Lemma A.2 (Weak convergence in probability).

In the setting of Theorem A.1,

\mathop{\mathrm{tr}_{DD_{N}}}\big(P(\bm{\Pi}^{N},\bm{\Pi}^{N*})\big)=\big(\mathop{\mathrm{tr}_{D}}\otimes\tau\big)\big(P(\bm{s},\bm{s}^{*})\big)+o(1)

with probability $1-o(1)$ as $N\to\infty$ for every self-adjoint $P\in\mathrm{M}_{D}(\mathbb{C})\otimes\mathbb{C}\langle\bm{s},\bm{s}^{*}\rangle$.

Let us first use this result to complete the proof of Theorem A.1.

Proof of Theorem A.1.

Fix the polynomial $P$. Then

\|P(\bm{\Pi}^{N},\bm{\Pi}^{N*})\|^{2p}\geq\mathop{\mathrm{tr}_{DD_{N}}}\big(|P(\bm{\Pi}^{N},\bm{\Pi}^{N*})|^{2p}\big)\geq\big(\mathop{\mathrm{tr}_{D}}\otimes\tau\big)\big(|P(\bm{s},\bm{s}^{*})|^{2p}\big)-o(1)

with probability $1-o(1)$ as $N\to\infty$ for every $p\in\mathbb{N}$ by Lemma A.2. Moreover,

\big(\mathop{\mathrm{tr}_{D}}\otimes\tau\big)\big(|P(\bm{s},\bm{s}^{*})|^{2p}\big)^{1/2p}=\|P(\bm{s},\bm{s}^{*})\|-o(1)

as $p\to\infty$, due to the fact that $\tau$ defines a faithful state on the $C^{*}$-algebra $C^{*}_{\rm red}(\mathbf{F}_{d})$ that is generated by $\bm{s}$ [54, Proposition 3.12]. The conclusion follows. ∎
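To spell out the last step (an elementary $\varepsilon$-argument): given $\varepsilon>0$, first choose $p$ so large that $\big(\mathop{\mathrm{tr}_{D}}\otimes\tau\big)\big(|P(\bm{s},\bm{s}^{*})|^{2p}\big)^{1/2p}\geq\|P(\bm{s},\bm{s}^{*})\|-\frac{\varepsilon}{2}$; then the first display yields $\|P(\bm{\Pi}^{N},\bm{\Pi}^{N*})\|\geq\|P(\bm{s},\bm{s}^{*})\|-\varepsilon$ with probability $1-o(1)$ as $N\to\infty$. As $\varepsilon>0$ is arbitrary, Theorem A.1 follows.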

The only subtlety in implementing this idea is that the form of weak convergence that follows from Lemma 10.2 only yields convergence in expectation

\mathbf{E}\big[\mathop{\mathrm{tr}_{DD_{N}}}P(\bm{\Pi}^{N},\bm{\Pi}^{N*})\big]=\big(\mathop{\mathrm{tr}_{D}}\otimes\tau\big)\big(P(\bm{s},\bm{s}^{*})\big)+o(1), (A.1)

as opposed to convergence in probability as in Lemma A.2. We must therefore still show concentration around the mean. The latter can be deduced from (A.1) using a minor additional argument. Note that (A.1) does not require $P$ to be self-adjoint.

Proof of Lemma A.2.

Since every stable representation is the direct sum of irreducible stable representations (cf. section 3.4.1), we can assume in the rest of the proof that $\pi_{N}$ is irreducible. Moreover, given any $P\in\mathrm{M}_{D}(\mathbb{C})\otimes\mathbb{C}\langle\bm{s},\bm{s}^{*}\rangle$,

\mathop{\mathrm{tr}_{DD_{N}}}P(\bm{\Pi}^{N},\bm{\Pi}^{N*})=\frac{1}{D}\sum_{i=1}^{D}\mathop{\mathrm{tr}_{D_{N}}}P_{ii}(\bm{\Pi}^{N},\bm{\Pi}^{N*})

where $P_{ii}\in\mathbb{C}\langle\bm{s},\bm{s}^{*}\rangle$ are the diagonal elements of $P$ with respect to the matrix coefficients. We can therefore assume in the rest of the proof that $D=1$.

Let $\Pi^{N}_{0}=\pi_{N}(\sigma_{0})$, where $\sigma_{0}$ is a uniformly distributed random element of $\mathbf{S}_{N}$ that is independent of $\sigma_{1},\ldots,\sigma_{d}$. Moreover, let $s_{0}=\lambda(g_{0})$, where $g_{0},\ldots,g_{d}$ are free generators of $\mathbf{F}_{d+1}$. It follows from Schur's lemma that

\mathbf{E}[\Pi^{N}_{0}M\Pi^{N*}_{0}]=\mathop{\mathrm{tr}_{D_{N}}}(M)\,\mathbf{1}

for every $M\in\mathrm{M}_{D_{N}}(\mathbb{C})$. Therefore

\mathbf{E}\big[\mathop{\mathrm{tr}_{D_{N}}}\Pi^{N}_{0}P(\bm{\Pi}^{N},\bm{\Pi}^{N*})\Pi^{N*}_{0}P(\bm{\Pi}^{N},\bm{\Pi}^{N*})\big]=\mathbf{E}\big[(\mathop{\mathrm{tr}_{D_{N}}}P(\bm{\Pi}^{N},\bm{\Pi}^{N*}))^{2}\big].
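Indeed (making the conditioning explicit), write $M=P(\bm{\Pi}^{N},\bm{\Pi}^{N*})$ and condition on $\bm{\Pi}^{N}$: the previous display gives $\mathbf{E}[\mathop{\mathrm{tr}_{D_{N}}}\Pi^{N}_{0}M\Pi^{N*}_{0}M\mid\bm{\Pi}^{N}]=\mathop{\mathrm{tr}_{D_{N}}}\big(\mathbf{E}[\Pi^{N}_{0}M\Pi^{N*}_{0}\mid\bm{\Pi}^{N}]\,M\big)=(\mathop{\mathrm{tr}_{D_{N}}}M)^{2}$, and taking the expectation yields the identity above.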

Moreover, it is readily verified using the definitions in section 3.1 that

\tau\big(s_{0}P(\bm{s},\bm{s}^{*})s^{*}_{0}P(\bm{s},\bm{s}^{*})\big)=\big(\tau(P(\bm{s},\bm{s}^{*}))\big)^{2}.
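One way to verify this identity (taking $\tau$ to be the canonical trace on $C^{*}_{\rm red}(\mathbf{F}_{d+1})$, which is how we read the definitions in section 3.1): expand $P(\bm{s},\bm{s}^{*})=\sum_{w}c_{w}\lambda(w)$ over reduced words $w\in\mathbf{F}_{d}$ (recall that $D=1$) and note that $\tau(s_{0}\lambda(w)s_{0}^{*}\lambda(w'))=1_{g_{0}wg_{0}^{-1}w'=e}$ vanishes unless $w=w'=e$, since $w'$ contains no occurrence of $g_{0}$; thus both sides of the display equal $c_{e}^{2}$.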

Since the objects inside the traces on the left-hand side are noncommutative polynomials in $d+1$ variables and as $P$ is self-adjoint, applying (A.1) yields

\lim_{N\to\infty}\mathrm{Var}\big(\mathop{\mathrm{tr}_{D_{N}}}P(\bm{\Pi}^{N},\bm{\Pi}^{N*})\big)=0.

Since convergence in expectation also follows from (A.1) and the variance tends to zero, Chebyshev's inequality yields convergence in probability. ∎

Acknowledgments

​​We are grateful to Charles Bordenave, Benoît Collins, Michael Magee, Doron Puder, and Mikael de la Salle for very helpful discussions, and to Nick Cook, Srivatsav Kunnawalkam Elayavalli, Joel Friedman, Sidhanth Mohanty, Peter Sarnak, Nikhil Srivastava, Pierre Youssef, and Yizhe Zhu for helpful comments. We are especially indebted to Michael Magee for suggesting the application to stable representations, and for patiently explaining what they are. RvH thanks Peter Sarnak for organizing a stimulating seminar on strong convergence and related topics at Princeton in Fall 2023. Finally, we are grateful to the referee for suggestions that helped us improve the presentation of the paper.

CFC was supported by a Simons-CIQC postdoctoral fellowship through NSF grant QLCI-2016245. JGV was supported by NSF grant FRG-1952777, Caltech IST, and a Baer–Weiss postdoctoral fellowship. JAT was supported by the Caltech Carver Mead New Adventures Fund, NSF grant FRG-1952777, and ONR grants BRC-N00014-18-1-2363 and N00014-24-1-2223. RvH was supported by NSF grants DMS-2054565 and DMS-2347954.

References

  • [1] S. Aaronson. The polynomial method in quantum and classical computing. In Proc. 49th IEEE Symposium on Foundations of Computer Science, page 3. IEEE, 2008.
  • [2] A. S. Bandeira, M. T. Boedihardjo, and R. van Handel. Matrix concentration inequalities and free probability. Invent. Math., 234(1):419–487, 2023.
  • [3] R. Bauerschmidt, J. Huang, A. Knowles, and H.-T. Yau. Edge rigidity and universality of random regular graphs of intermediate degree. Geom. Funct. Anal., 30(3):693–769, 2020.
  • [4] S. Belinschi and M. Capitaine. Strong convergence of tensor products of independent GUE matrices, 2022. Preprint arXiv:2205.07695.
  • [5] C. Bordenave. A new proof of Friedman’s second eigenvalue theorem and its extension to random lifts. Ann. Sci. Éc. Norm. Supér. (4), 53(6):1393–1439, 2020.
  • [6] C. Bordenave, D. Chafaï, and D. García-Zelada. Convergence of the spectral radius of a random matrix through its characteristic polynomial. Probab. Theory Related Fields, 182(3-4):1163–1181, 2022.
  • [7] C. Bordenave and B. Collins. Eigenvalues of random lifts and polynomials of random permutation matrices. Ann. of Math. (2), 190(3):811–875, 2019.
  • [8] C. Bordenave and B. Collins. Norm of matrix-valued polynomials in random unitaries and permutations, 2024. Preprint arXiv:2304.05714v2.
  • [9] C. Bordenave and B. Collins. Strong asymptotic freeness for independent uniform variables on compact groups associated to nontrivial representations. Invent. Math., 237(1):221–273, 2024.
  • [10] P. Borwein and T. Erdélyi. Polynomials and polynomial inequalities, volume 161 of Graduate Texts in Mathematics. Springer-Verlag, New York, 1995.
  • [11] J. Bourgain and L. Tzafriri. On a problem of Kadison and Singer. J. Reine Angew. Math., 420:1–43, 1991.
  • [12] T. Brailovskaya and R. van Handel. Universality and sharp matrix concentration inequalities. Geom. Funct. Anal., 34(6):1734–1838, 2024.
  • [13] E. Cassidy. Random permutations acting on $k$-tuples have near-optimal spectral gap for $k=\mathrm{poly}(n)$, 2024. Preprint arXiv:2412.13941.
  • [14] C.-F. Chen, A. Bouland, F. G. S. L. Brandão, J. Docter, P. Hayden, and M. Xu. Efficient unitary designs and pseudorandom unitaries from permutations, 2024. Preprint arXiv:2404.16751.
  • [15] C.-F. Chen, J. Docter, M. Xu, A. Bouland, and P. Hayden. Efficient unitary $t$-designs from random sums, 2024. Preprint arXiv:2402.09335.
  • [16] E. W. Cheney. Introduction to approximation theory. AMS, Providence, RI, 1998.
  • [17] M. D. Choi. The full $C^{\ast}$-algebra of the free group on two generators. Pacific J. Math., 87(1):41–48, 1980.
  • [18] T. Church, J. S. Ellenberg, and B. Farb. FI-modules and stability for representations of symmetric groups. Duke Math. J., 164(9):1833–1910, 2015.
  • [19] B. Collins. Moment methods on compact groups: Weingarten calculus and its applications. In ICM Vol. IV. Sections 5–8, pages 3142–3164. EMS Press, Berlin, 2023.
  • [20] B. Collins, A. Guionnet, and F. Parraud. On the operator norm of non-commutative polynomials in deterministic matrices and iid GUE matrices. Camb. J. Math., 10(1):195–260, 2022.
  • [21] B. Collins, S. Matsumoto, and J. Novak. The Weingarten calculus. Notices Amer. Math. Soc., 69(5):734–745, 2022.
  • [22] D. Coppersmith and T. J. Rivlin. The growth of polynomials bounded at equally spaced points. SIAM J. Math. Anal., 23(4):970–983, 1992.
  • [23] S. Coste, G. Lambert, and Y. Zhu. The characteristic polynomial of sums of random permutations and regular digraphs. Int. Math. Res. Not. IMRN, (3):2461–2510, 2024.
  • [24] R. Estrada and R. P. Kanwal. Moment sequences for a certain class of distributions. Complex Variables Theory Appl., 9(1):31–39, 1987.
  • [25] B. Farb. Representation stability. In Proceedings of the International Congress of Mathematicians—Seoul 2014. Vol. II, pages 1173–1196. Kyung Moon Sa, Seoul, 2014.
  • [26] J. Friedman. Relative expanders or weakly relatively Ramanujan graphs. Duke Math. J., 118(1):19–35, 2003.
  • [27] J. Friedman. A proof of Alon’s second eigenvalue conjecture and related problems. Mem. Amer. Math. Soc., 195(910):viii+100, 2008.
  • [28] J. Friedman, A. Joux, Y. Roichman, J. Stern, and J.-P. Tillich. The action of a few random permutations on $r$-tuples and an application to cryptography. In STACS 96 (Grenoble, 1996), volume 1046 of Lecture Notes in Comput. Sci., pages 375–386. Springer, Berlin, 1996.
  • [29] J. Friedman and D.-E. Kohler. The relativized second eigenvalue conjecture of Alon, 2014. Preprint arXiv:1403.3462.
  • [30] U. Haagerup, H. Schultz, and S. Thorbjørnsen. A random matrix approach to the lack of projections in $C^{*}_{\rm red}(\mathbb{F}_{2})$. Adv. Math., 204(1):1–83, 2006.
  • [31] U. Haagerup and S. Thorbjørnsen. A new application of random matrices: ${\rm Ext}(C^{*}_{\rm red}(F_{2}))$ is not a group. Ann. of Math. (2), 162(2):711–775, 2005.
  • [32] L. Hanany and D. Puder. Word measures on symmetric groups. Int. Math. Res. Not. IMRN, (11):9221–9297, 2023.
  • [33] M. B. Hastings and A. W. Harrow. Classical and quantum tensor product expanders. Quantum Inf. Comput., 9(3-4):336–360, 2009.
  • [34] B. Hayes. A random matrix approach to the Peterson-Thom conjecture. Indiana Univ. Math. J., 71(3):1243–1297, 2022.
  • [35] B. Hayes, D. Jekel, and S. K. Elayavalli. Consequences of the random matrix solution to the Peterson-Thom conjecture. Anal. PDE, 2025. To appear.
  • [36] Y. He. Spectral gap and edge universality of dense random regular graphs. Commun. Math. Phys., 405:181, 2024.
  • [37] W. Hide. Effective lower bounds for spectra of random covers and random unitary bundles. Israel J. Math., 2025. To appear.
  • [38] W. Hide and M. Magee. Near optimal spectral gaps for hyperbolic surfaces. Ann. of Math. (2), 198(2):791–824, 2023.
  • [39] L. Hörmander. The analysis of linear partial differential operators. I. Classics in Mathematics. Springer-Verlag, Berlin, 2003. Distribution theory and Fourier analysis.
  • [40] J. Huang, T. McKenzie, and H.-T. Yau. Optimal eigenvalue rigidity of random regular graphs, 2024. Preprint arXiv:2405.12161.
  • [41] J. Huang, T. McKenzie, and H.-T. Yau. Ramanujan property and edge universality of random regular graphs, 2024. Preprint arXiv:2412.20263.
  • [42] J. Huang and H.-T. Yau. Spectrum of random $d$-regular graphs up to the edge. Comm. Pure Appl. Math., 77(3):1635–1723, 2024.
  • [43] H. Kesten. Symmetric random walks on groups. Trans. Amer. Math. Soc., 92:336–354, 1959.
  • [44] F. Lehner. Computing norms of free operators with matrix coefficients. Amer. J. Math., 121(3):453–486, 1999.
  • [45] N. Linial and D. Puder. Word maps and spectra of random graph lifts. Random Structures Algorithms, 37(1):100–135, 2010.
  • [46] L. Louder, M. Magee, and W. Hide. Strongly convergent unitary representations of limit groups. J. Funct. Anal., 288(6):Paper No. 110803, 2025.
  • [47] R. C. Lyndon and P. E. Schupp. Combinatorial group theory. Classics in Mathematics. Springer-Verlag, Berlin, 2001. Reprint of the 1977 edition.
  • [48] M. Magee and J. Thomas. Strongly convergent unitary representations of right-angled Artin groups, 2023. Preprint arXiv:2308.00863.
  • [49] F. L. Metz, G. Parisi, and L. Leuzzi. Finite-size corrections to the spectrum of regular random graphs: An analytical solution. Phys. Rev. E, 90:052109, Nov 2014.
  • [50] J. A. Mingo and R. Speicher. Free probability and random matrices, volume 35 of Fields Institute Monographs. Springer, New York, 2017.
  • [51] S. Mohanty, R. O’Donnell, and P. Paredes. Explicit near-Ramanujan graphs of every degree, 2019. Preprint arXiv:1909.06988v3.
  • [52] A. Nica. Asymptotically free families of random unitaries in symmetric groups. Pacific J. Math., 157(2):295–310, 1993.
  • [53] A. Nica. On the number of cycles of given length of a free word in several random permutations. Random Structures Algorithms, 5(5):703–730, 1994.
  • [54] A. Nica and R. Speicher. Lectures on the combinatorics of free probability. Cambridge, 2006.
  • [55] A. Nilli. On the second eigenvalue of a graph. Discrete Math., 91(2):207–210, 1991.
  • [56] R. O’Donnell and X. Wu. Explicit near-fully X-Ramanujan graphs, 2020. Preprint arXiv:2009.02595.
  • [57] F. Parraud. On the operator norm of non-commutative polynomials in deterministic matrices and iid Haar unitary matrices. Probab. Theory Related Fields, 182(3-4):751–806, 2022.
  • [58] F. Parraud. Asymptotic expansion of smooth functions in polynomials in deterministic matrices and iid GUE matrices. Comm. Math. Phys., 399(1):249–294, 2023.
  • [59] G. Pisier. On a linearization trick. Enseign. Math., 64(3-4):315–326, 2018.
  • [60] R. T. Powers. Simplicity of the $C^{\ast}$-algebra associated with the free group on two generators. Duke Math. J., 42:151–156, 1975.
  • [61] D. Puder. Expansion of random graphs: new proofs, new results. Invent. Math., 201(3):845–908, 2015.
  • [62] I. Rivin and N. T. Sardari. Quantum chaos on random Cayley graphs of ${\rm SL}_{2}[\mathbb{Z}/p\mathbb{Z}]$. Exp. Math., 28(3):328–341, 2019.
  • [63] A. Sarid. The spectral gap of random regular graphs. Random Structures Algorithms, 63(2):557–587, 2023.
  • [64] H. Schultz. Non-commutative polynomials of independent Gaussian random matrices. The real and symplectic cases. Probab. Theory Related Fields, 131(2):261–309, 2005.
  • [65] A. Song. Random minimal surfaces in spheres, 2024. Preprint arXiv:2402.10287.
  • [66] V. Vu. Random discrete matrices. In Horizons of combinatorics, volume 17 of Bolyai Soc. Math. Stud., pages 257–280. Springer, Berlin, 2008.
  • [67] A. Zygmund. Trigonometric series. Vol. I, II. Cambridge, third edition, 2002.