A new approach to strong convergence
Abstract.
A family of random matrices $X^N = (X^N_1,\dots,X^N_r)$ is said to converge strongly to a family of bounded operators $x = (x_1,\dots,x_r)$ when $\|P(X^N)\| \to \|P(x)\|$ for every noncommutative polynomial $P$. This phenomenon plays a key role in several recent breakthroughs on random graphs, geometry, and operator algebras. However, proofs of strong convergence are notoriously delicate and have relied largely on problem-specific methods.
In this paper, we develop a new approach to strong convergence that uses only soft arguments. Our method exploits the fact that for many natural models, the expected trace of $P(X^N)$ is a rational function of $\frac{1}{N}$ whose lowest order asymptotics are easily understood. We develop a general technique to deduce strong convergence directly from these inputs using the inequality of A. and V. Markov for univariate polynomials and elementary Fourier analysis. To illustrate the method, we develop the following applications.
1. We give a short proof of the result of Friedman that random regular graphs have a near-optimal spectral gap, and obtain a sharp understanding of the large deviation probabilities of the second eigenvalue.
2. We prove a strong quantitative form of the strong convergence property of random permutation matrices due to Bordenave and Collins.
3. We extend the above to any stable representation of the symmetric group, providing many new examples of the strong convergence phenomenon.
Key words and phrases:
Strong convergence; random permutations; random regular graphs
2010 Mathematics Subject Classification: 60B20; 15B52; 05C80; 46L53; 46L54
1. Introduction
Let $X^N = (X^N_1,\dots,X^N_r)$ be a sequence of $r$-tuples of random matrices of increasing dimension, and let $x = (x_1,\dots,x_r)$ be an $r$-tuple of bounded operators. Then $X^N$ is said to converge strongly to $x$ if
$\|P(X^N)\| \to \|P(x)\|$
for every noncommutative polynomial $P$.
The notion of strong convergence has its origin in the work of Haagerup and Thorbjørnsen [31] in the context of complex Gaussian (GUE) matrices. It has since proved to be a powerful tool in a broad range of applications. Notably, strong convergence plays a central role in a series of recent breakthroughs in the study of random lifts of graphs [7], random covering spaces [38, 46, 37, 48], random minimal surfaces [65], and operator algebras [31, 30, 34, 4, 35, 8]; we refer to the recent paper [8] for a more extensive discussion and bibliography.
In many cases, norm convergence can be highly nontrivial to establish even for the simplest polynomials. For example, let $S_1,\dots,S_r$ be i.i.d. uniformly distributed random permutation matrices of dimension $N$. Then the linear polynomial
$A = S_1 + S_1^* + \cdots + S_r + S_r^*$
is the adjacency matrix of a random $2r$-regular graph. Every $2r$-regular graph has largest eigenvalue $2r$ with the all-ones eigenvector, while the Alon–Boppana bound states that the second largest eigenvalue $\lambda_2(A)$ is at least $2\sqrt{2r-1} - o(1)$ [55]. It was conjectured by Alon, and proved in a breakthrough monograph of Friedman [27], that the random $2r$-regular graph satisfies $\lambda_2(A) \le 2\sqrt{2r-1} + o(1)$ with probability $1 - o(1)$, so that such graphs have a near-optimal second eigenvalue. The much more general result that random permutation matrices converge strongly, due to Bordenave and Collins [7], is of major importance in many recent applications cited above.
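This setup is easy to probe numerically. The following sketch (our own illustration, not part of the paper's argument; all variable names are ours) samples the permutation model and checks that the top eigenvalue is exactly $2r$ with the all-ones eigenvector, while the second eigenvalue sits near $2\sqrt{2r-1}$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, r = 500, 2  # N vertices; r permutations give a 2r-regular multigraph

# A = S_1 + S_1^T + ... + S_r + S_r^T for i.i.d. uniform permutation matrices
A = np.zeros((N, N))
for _ in range(r):
    S = np.eye(N)[rng.permutation(N)]
    A += S + S.T

eigs = np.sort(np.linalg.eigvalsh(A))
ones = np.ones(N) / np.sqrt(N)

assert abs(eigs[-1] - 2 * r) < 1e-8          # trivial eigenvalue 2r ...
assert np.allclose(A @ ones, 2 * r * ones)   # ... with the all-ones eigenvector
print(eigs[-2], 2 * np.sqrt(2 * r - 1))      # second eigenvalue vs. 2*sqrt(2r-1)
```

Since every row of each $S_i + S_i^*$ sums to $2$, the matrix $A$ has constant row sums $2r$, which forces the top eigenvalue to equal $2r$ exactly.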
Given the power of the strong convergence phenomenon, it may be unsurprising that strong convergence has been notoriously difficult to prove (see, e.g., the discussion in [9]). In those cases where strong convergence has been established, the proofs rely on problem-specific methods and often require delicate computations. These obstacles have hampered efforts to establish strong convergence in new situations and to obtain a sharper understanding of this phenomenon.
In this paper, we develop a new approach to strong convergence that uses only soft arguments, and which requires limited problem-specific inputs as compared to previous methods. We will illustrate the method in the context of random permutation matrices and other representations of the symmetric group, resulting in surprisingly short proofs of strong convergence and new quantitative and qualitative information on the strong convergence phenomenon:
• We prove a quantitative form of Friedman’s theorem on the second eigenvalue of random regular graphs [27] that reveals new probabilistic structure. In particular, we obtain a sharp characterization of the tail behavior of the second eigenvalue.
• We obtain a strong quantitative form of the strong convergence property of random permutation matrices due to Bordenave and Collins [7].
• We establish strong convergence of random matrices defined by any stable representation of the symmetric group (in the sense of [25]). This provides a large new family of examples of the strong convergence phenomenon, and illustrates that strong convergence arises in settings far beyond the standard representation.
Further applications to several other models will appear in forthcoming work.
Let us emphasize at the outset that the main difficulty in establishing strong convergence is to prove the upper bound
$\|P(X^N)\| \le \|P(x)\| + o(1)$
as $N \to \infty$. The corresponding lower bound then follows automatically, either as a byproduct of the proof or from general principles (see, e.g., [46, pp. 16–19]). We therefore focus on upper bounds throughout this paper, and relegate a brief treatment of the corresponding lower bounds to Appendix A.
Organization of this paper
This paper is organized as follows. In section 2, we first outline the core ingredients of our approach in a general setting, while section 3 presents the main results obtained in this paper.
The next two sections introduce some general tools on which our methods are based. Section 4 recalls properties of polynomials and distributions that form the basis for our approach, while section 5 recalls basic properties of words in random permutation matrices that form the input for our methods.
The rest of the paper is devoted to the proofs of our main results. Sections 6 and 7 prove master inequalities for polynomials and smooth functions, respectively. These inequalities are applied in section 8 to random regular graphs, and in section 9 to strong convergence of random permutation matrices. Section 10 extends the above results to general stable representations of the symmetric group.
Finally, Appendix A is devoted to a brief treatment of lower bounds.
Notation
In the following, $x \lesssim y$ denotes that $x \le Cy$ for a universal constant $C$. We denote $x \vee y := \max(x, y)$ and $x \wedge y := \min(x, y)$.
The symmetric group on $N$ letters is denoted $S_N$, the free group with $r$ free generators is denoted $F_r$, and $e$ denotes its identity element.
We denote by $\mathcal{P}_q$ the space of real polynomials of degree at most $q$, and by $\mathcal{P}$ the space of all real polynomials. We denote by $f^{(j)}$ the $j$th derivative of a univariate function $f$, and by $\|f\|_K := \sup_{x \in K} |f(x)|$ the uniform norm of $f$ on a set $K$.
We denote by $\mathrm{M}_D(\mathbb{C})$ the space of $D \times D$ matrices with complex entries. The trace of a matrix $M$ is denoted $\mathrm{Tr}\,M$, the normalized trace $\mathrm{tr}\,M := \frac{1}{D}\,\mathrm{Tr}\,M$, and the operator norm $\|M\|$. The identity matrix is denoted $\mathbf{1}$, and $\mathbf{e}$ denotes the vector all of whose entries equal one. We use the convention that powers bind before the trace, e.g., $\mathrm{Tr}\,M^2 := \mathrm{Tr}(M^2)$.
If $X$ is a self-adjoint operator and $f : \mathbb{R} \to \mathbb{R}$, then the operator $f(X)$ is defined by the usual functional calculus (in particular, if $X$ is a self-adjoint matrix, $f$ is applied to the eigenvalues while keeping the eigenvectors fixed).
2. Outline
The main results of this paper will be presented in section 3, which can be read independently of the present section. However, as the new approach that underlies their proofs is much more broadly applicable, we begin in this section by introducing the core ingredients of the method in a general setting.
Throughout this section, we let $X^N$ be a sequence of $N$-dimensional self-adjoint random matrices, whose expected empirical eigenvalue distribution converges as $N \to \infty$ to a limiting probability measure $\mu$ with compact support. This weak convergence property is captured by convergence of the moments
$\lim_{N\to\infty} \mathbf{E}\big[\mathrm{tr}\,(X^N)^p\big] = \int x^p\,\mu(dx) \quad \text{for all } p \in \mathbb{N}.$
The moments generally admit a combinatorial description whose behavior as $N \to \infty$ is well understood, so that weak convergence is readily accessible. However, weak convergence can only ensure that a $1 - o(1)$ fraction of the eigenvalues converges to the support of $\mu$, and cannot rule out the existence of outliers in the spectrum. This is the basic difficulty in strong convergence problems.
In the problems of interest in this paper, the random matrix $X^N$ is a noncommutative polynomial of a given family of random matrices. In this setting, it is natural to think of $\mu$ itself as the spectral distribution of a certain limiting operator $X$ in the sense that
$\int x^p\,\mu(dx) = \tau(X^p) \quad \text{for all } p \in \mathbb{N},$
where $X$ is defined by a limiting family of operators associated to the random matrices and $\tau$ is a positive linear functional that plays the role of the trace for the limiting operators. Such models are naturally formulated in the setting of $C^*$-probability spaces, for which we refer to [54] for an excellent introduction. However, this general framework will not be needed in this paper, as the limiting objects associated to the models we will study admit an explicit construction that is recalled in section 3.1.
Example 2.1.
While the aim of this section is to discuss the methods of this paper in a general setting, the reader may keep in mind the following guiding example. Let $X^N$ be the adjacency matrix of a random $2r$-regular graph with the trivial eigenvalue removed, as defined in the introduction. In this case, the limiting operator $X$ may be viewed as the adjacency matrix of the infinite $2r$-regular tree, and $\tau(X^p) = \langle \delta_o, X^p \delta_o \rangle$ where $\delta_o$ is the coordinate vector associated with the root $o$ of the tree. The spectral distribution of $X$ is the Kesten–McKay distribution, which is supported in the interval $[-\rho, \rho]$ with $\rho := 2\sqrt{2r-1}$. A more precise description of this model is given in section 3.2 below.
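The identification of $\tau(X^{2p})$ with closed-walk counts at the root of the tree can be verified numerically (a sketch in our own notation, not from the paper; we take $d = 2r = 4$ and use the standard Kesten–McKay density $\frac{d\sqrt{4(d-1)-x^2}}{2\pi(d^2-x^2)}$):

```python
import numpy as np

d = 4  # degree of the tree (d = 2r with r = 2)

# Adjacency matrix of the ball of radius R in the d-regular tree:
# the root has d children, every other internal vertex has d - 1 children.
R = 3
edges, frontier, next_id = [], [0], 1
for depth in range(R):
    new_frontier = []
    for v in frontier:
        for _ in range(d if depth == 0 else d - 1):
            edges.append((v, next_id))
            new_frontier.append(next_id)
            next_id += 1
    frontier = new_frontier
A = np.zeros((next_id, next_id))
for u, v in edges:
    A[u, v] = A[v, u] = 1

# tau(X^{2p}) = number of closed walks of length 2p at the root
walks = [np.linalg.matrix_power(A, 2 * p)[0, 0] for p in (1, 2, 3)]

# Moments of the Kesten-McKay density d*sqrt(4(d-1)-x^2)/(2*pi*(d^2-x^2))
edge = 2 * np.sqrt(d - 1)
x = np.linspace(-edge, edge, 400001)
rho = d * np.sqrt(np.maximum(4 * (d - 1) - x**2, 0)) / (2 * np.pi * (d**2 - x**2))
trap = lambda y: float(((y[1:] + y[:-1]) / 2 * np.diff(x)).sum())
moments = [trap(x**(2 * p) * rho) for p in (1, 2, 3)]

print(walks)    # [4.0, 28.0, 232.0]
print(moments)  # agrees up to quadrature error
```

A ball of radius $p$ suffices for walks of length $2p$, since a closed walk of that length cannot reach depth greater than $p$.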
Remark 2.2.
Throughout this paper, we will work with self-adjoint random matrices $X^N$ and operators $X$. This entails no loss of generality: since $\|T\|^2 = \|T^*T\|$ for any bounded operator $T$, strong convergence of arbitrary noncommutative polynomials $P$ is equivalent to strong convergence of the self-adjoint polynomials $P^*P$. We may therefore restrict attention to self-adjoint polynomials.
While it is not essential for the applicability of the methods of this paper, we will also assume for simplicity that the random matrices are uniformly bounded, i.e., there is a constant $K$ so that $\|X^N\| \le K$ a.s. for all $N$. This is the case for all models considered in this paper; for example, $\|X^N\| \le 2r$ a.s. in Example 2.1.
In the remainder of this section, we assume that weak convergence

(2.1)  $\lim_{N\to\infty} \mathbf{E}\big[\mathrm{tr}\,f(X^N)\big] = \tau(f(X)) \quad \text{for every } f \in \mathcal{P}$

holds. The problem of strong convergence is to show that not only the moments, but also the operator norm, of $X^N$ converges to that of the limiting model $X$, which ensures a lack of outliers in the spectrum. More precisely, we aim to explain in the sequel how to prove the strong convergence upper bound $\|X^N\| \le \|X\| + o(1)$, which is the main difficulty. The corresponding lower bound is typically an easy consequence of (2.1), as will be discussed in Appendix A.
To date, proofs of strong convergence have been based on resolvent equations [31, 64], interpolation [20, 57, 58, 2], or the moment method [7, 9, 8]. The first two approaches rely on analytic tools, such as integration by parts formulae or free stochastic calculus, that are only available for special models. In contrast, the only known way to access the properties of models such as those based on random permutations or group representations is through moment computations. We therefore begin by recalling the challenges faced by the moment method.
2.1. Prelude: the moment method
The basic premise behind the moment method is the observation that since $\|X^N\|^{2p} \le \mathrm{Tr}\,(X^N)^{2p} = N\,\mathrm{tr}\,(X^N)^{2p}$, (2.1) implies that

(2.2)  $\mathbf{E}\big[\|X^N\|^{2p}\big] \le N\,\big(\|X\|^{2p} + o(1)\big)$

as $N \to \infty$. This does not in itself suffice to prove $\|X^N\| \le \|X\| + o(1)$, as the right-hand side diverges when $p$ is fixed. The essence of the moment method is that the desired bound would follow if (2.2) remains valid for $p \gg \log N$, as then $N^{\frac{1}{2p}} = 1 + o(1)$. We emphasize this is a much stronger property than (2.1). In practice, the implementation of this method faces two major obstacles.
• While moment asymptotics for fixed $p$ are accessible in many situations, the case where $p$ grows rapidly with $N$ is often a difficult combinatorial problem. Despite the availability of tools to facilitate the analysis of large moments for strong convergence, such as nonbacktracking and linearization methods (see, e.g., [7, 8]), the core of the analysis remains dependent on delicate computations that do not readily carry over to new situations.
• A more fundamental problem is that the inequality (2.2) that forms the foundation for the moment method may fail to hold altogether for any $p \gg \log N$. To see why, suppose that $\|X^N\| \ge \|X\| + \delta$ with probability at least $N^{-c}$ for some $\delta, c > 0$. Then
$\mathbf{E}\big[\|X^N\|^{2p}\big] \ge N^{-c}\,(\|X\| + \delta)^{2p} \gg N\,\|X\|^{2p} \quad \text{for } p \gg \log N,$
which precludes the validity of (2.2) for such $p$. It was observed by Friedman [27] that this situation arises already in the setting of random regular graphs due to the presence with polynomially small probability of dense subgraphs called “tangles”. The application of the moment method in this setting requires conditioning on the absence of tangles, which significantly complicates the analysis.
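The arithmetic behind this obstruction is elementary and can be made concrete (toy numbers of our own choosing, not the paper's constants): an outlier at $\|X\| + \delta$ occurring with probability $N^{-c}$ overwhelms the right-hand side of (2.2) once $p$ exceeds a multiple of $\log N$.

```python
import numpy as np

# Toy numbers: ||X|| = 2, an outlier at ||X|| + delta = 2.5 occurring with
# probability N^(-c), c = 2.  Compare its contribution to E ||X^N||^(2p)
# against the right-hand side N * ||X||^(2p) of (2.2), in log scale.
N, norm_x, outlier, c = 10**4, 2.0, 2.5, 2

for p in (1, 5, 25, 125, 625):
    log_outlier_term = -c * np.log(N) + 2 * p * np.log(outlier)
    log_rhs = np.log(N) + 2 * p * np.log(norm_x)
    print(p, log_outlier_term > log_rhs)

# The outlier term wins as soon as p exceeds a multiple of log N:
p_star = (c + 1) * np.log(N) / (2 * np.log(outlier / norm_x))
print(round(p_star))  # ~ 62
```

Working in log scale avoids floating-point overflow for large $p$.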
2.2. A new approach
The approach of this paper is also based on moment computations, which, however, are used in an entirely different manner. As we will explain below, our method deduces norm convergence from moments by first letting $N \to \infty$ and then $p \to \infty$, bypassing the challenges of the moment method.
Inputs
Our approach requires two basic inputs that we presently describe.
Let $f$ be a polynomial of degree at most $q$. Then $\mathbf{E}[\mathrm{tr}\,f(X^N)]$ is a linear combination of the moments $\mathbf{E}[\mathrm{tr}\,(X^N)^p]$ for $p \le q$. We will exploit the fact that in many situations, the moments are rational functions of $\frac{1}{N}$:

(2.3)  $\mathbf{E}\big[\mathrm{tr}\,f(X^N)\big] = \Phi_f\Big(\frac{1}{N}\Big), \qquad \Phi_f := \frac{A_f}{B_q},$

where $A_f$ and $B_q$ are polynomials that depend only on $f$ and $q$, respectively. This phenomenon is very common; e.g., it arises from Weingarten calculus [21, 19] or from genus expansions [50, Chapter 1]. For the random permutation models considered in this paper, the relevant properties are recalled in section 5.
In general, the function $\Phi_f$ is extremely complicated. However, our methods will use only soft information on its structure: we require upper bounds on the degrees of $A_f$ and $B_q$ (which are proportional to $q$ for the models considered in this paper), and we must show that $B_q$ does not vanish near zero. Both properties can be read off almost immediately from a combinatorial expression for the moments.
The second input to our method is the asymptotic expansion as $N \to \infty$

(2.4)  $\mathbf{E}\big[\mathrm{tr}\,f(X^N)\big] = \tau(f(X)) + \frac{\nu(f)}{N} + o\Big(\frac{1}{N}\Big),$

where $\tau$ and $\nu$ are linear functionals on the space of real polynomials. Clearly
$\tau(f(X)) = \lim_{N\to\infty} \mathbf{E}\big[\mathrm{tr}\,f(X^N)\big], \qquad \nu(f) = \lim_{N\to\infty} N\,\Big(\mathbf{E}\big[\mathrm{tr}\,f(X^N)\big] - \tau(f(X))\Big)$
are defined by the lowest-order asymptotics of the moments. We will exploit that simple combinatorial expressions for $\tau$ and $\nu$ are readily accessible in practice.
Remark 2.3.
It is evident from the above formulas that $\tau$ coincides with the spectral distribution $\mu$ of $X$, so we know a priori that it is defined by a probability measure. In particular, even though $\tau$ appears in (2.4) only for polynomial test functions $f$, its definition automatically extends to any continuous function $f$. In contrast, there is no reason a priori to expect that $\nu$ can be meaningfully defined for non-polynomial test functions $f$: there could be functions for which weak convergence holds at a slower rate than $\frac{1}{N}$, in which case the expansion (2.4) fails to hold. That (2.4) and the definition of $\nu$ nonetheless extend to smooth functions will be a key output of our method.
Outline
Our basic aim is to achieve the following.
(a) We aim to show that the expansion (2.4) extends, in quantitative form, from polynomials to smooth test functions $f$.
(b) We aim to show that $\mathrm{supp}\,\nu \subseteq [-\|X\|, \|X\|]$. (That $\mathrm{supp}\,\tau \subseteq [-\|X\|, \|X\|]$ holds automatically as $\tau$ is the spectral distribution of $X$.)
A bound on the norm then follows easily. Let $g$ be a smooth function so that $g = 0$ on $[-\|X\| - \varepsilon, \|X\| + \varepsilon]$ and $g = 1$ outside $[-\|X\| - 2\varepsilon, \|X\| + 2\varepsilon]$. Then $\tau(g(X)) = \nu(g) = 0$ by (b), and thus (a) yields
$\mathbf{P}\big[\|X^N\| > \|X\| + 2\varepsilon\big] \le \mathbf{E}\big[\mathrm{Tr}\,g(X^N)\big] = N\,\mathbf{E}\big[\mathrm{tr}\,g(X^N)\big] = o(1).$
As $\varepsilon > 0$ was arbitrary, $\|X^N\| \le \|X\| + o(1)$ in probability.
We now explain the steps that will be used to prove (a) and (b).
Step 1: the polynomial method
At the heart of our approach lies a general method to obtain a quantitative form of (2.4): we will show that

(2.5)  $\Big|\mathbf{E}\big[\mathrm{tr}\,f(X^N)\big] - \tau(f(X)) - \frac{\nu(f)}{N}\Big| \lesssim \frac{q^{O(1)}}{N^2}\,\|f\|_{[-K,K]}$

for any $f \in \mathcal{P}_q$, where $K$ is a constant so that $\|X^N\| \le K$ a.s. for all $N$. While achieving such a bound by previous methods would require hard analysis, it arises here from a soft argument: we can “upgrade” an asymptotic expansion (2.4) to a strong nonasymptotic bound (2.5) merely by virtue of the fact that $\Phi_f$ is rational.
To this end, observe that the left-hand side of (2.5) is nothing other than the remainder in the first-order Taylor expansion of $\Phi_f$ at zero, so that
$\Big|\Phi_f\Big(\frac{1}{N}\Big) - \Phi_f(0) - \frac{1}{N}\,\Phi_f'(0)\Big| \le \frac{1}{2N^2}\,\sup_{t \in [0, \frac{1}{N}]} |\Phi_f''(t)|.$
This bound appears useless at first sight, as it relies on the behavior of $\Phi_f$ at points outside $\{\frac{1}{N} : N \in \mathbb{N}\}$ where its spectral interpretation is lost. However, as $\Phi_f$ is rational, we can control its behavior by its values on $\{\frac{1}{N} : N \in \mathbb{N}\}$ alone by means of classical inequalities for arbitrary univariate polynomials of degree $m$:
• The inequality of A. and V. Markov (Lemma 4.1).
• A corollary of the Markov inequality stating that the uniform norm of a degree-$m$ polynomial on $[0,1]$ is controlled by its values on any set with spacing of order $\frac{1}{m^2}$ between its points (Lemma 4.2).
Applying these inequalities to the numerator and denominator of $\Phi_f$ yields (2.5) with minimal effort. We emphasize that the only inputs used here are upper bounds on the degrees of $A_f$ and $B_q$ in (2.3), and the fact that $B_q$ does not vanish near zero.
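The Markov inequality itself is easy to check numerically (our own sketch; on $[-1,1]$ it reads $\sup|p'| \le m^2 \sup|p|$ for polynomials $p$ of degree $m$, with the Chebyshev polynomial $T_m$ as the extremal case):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

m = 7
x = np.linspace(-1, 1, 100001)

# The Chebyshev polynomial T_m is extremal for Markov's inequality on [-1,1]:
# |T_m| <= 1 there, while T_m'(1) = m^2 attains the bound sup|p'| <= m^2 sup|p|.
Tm = C.Chebyshev.basis(m)
assert np.max(np.abs(Tm(x))) <= 1 + 1e-12
assert abs(Tm.deriv()(1.0) - m**2) < 1e-8

# A random polynomial of the same degree satisfies the inequality with slack:
rng = np.random.default_rng(1)
p = np.polynomial.Polynomial(rng.standard_normal(m + 1))
ratio = np.max(np.abs(p.deriv()(x))) / np.max(np.abs(p(x)))
print(ratio, m**2)
assert ratio <= m**2 + 1e-9
```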
Step 2: the extension problem
We now aim to extend (2.5) to non-polynomial test functions $f$. This is not merely a technical issue: while $\mathbf{E}[\mathrm{tr}\,f(X^N)]$ and $\tau(f(X))$ are defined for any continuous $f$ by functional calculus, it is unclear that $\nu(f)$ can even be meaningfully defined when $f$ is not a polynomial (cf. Remark 2.3).
In order to address these issues, we must replace the degree-dependent bound (2.5) by a bound that depends only on the smoothness of the polynomial $f$. This is achieved as follows. Rather than applying (2.5) directly to $f \in \mathcal{P}_q$, we expand
$f = \sum_{k=0}^{q} \hat f_k\,T_k(\,\cdot\,/K)$
in terms of Chebyshev polynomials of the first kind $T_k$, and apply (2.5) to each $T_k(\,\cdot\,/K)$ individually. As Chebyshev polynomials are uniformly bounded by $1$, this replaces $\|f\|_{[-K,K]}$ by a weighted sum of the coefficients $|\hat f_k|$ in (2.5), which is in turn controlled by the smoothness of $f$ by classical Fourier analysis. We therefore obtain
(2.6)  $\Big|\mathbf{E}\big[\mathrm{tr}\,f(X^N)\big] - \tau(f(X)) - \frac{\nu(f)}{N}\Big| \lesssim \frac{1}{N^2} \sum_{k \ge 0} (1+k)^{O(1)}\,|\hat f_k|$

for every polynomial $f$. (We will in fact use a stronger inequality of Zygmund, cf. Lemma 4.4, that achieves better rates in our main results.)
Since the inequality (2.6) no longer depends on the degree of the polynomial $f$, it extends to arbitrary smooth functions $f$ as polynomials are dense in the relevant smoothness class. In particular, it ensures that the left-hand side extends uniquely to a compactly supported distribution, so that $\nu(f)$ can be defined for any smooth $f$.
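The Fourier-analytic fact used here is that Chebyshev coefficients of a smooth function decay rapidly, so their weighted sum stabilizes as the degree grows. A numerical sketch (our own, with the interval normalized to $K = 1$):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Chebyshev coefficients of f on [-1,1] via midpoint cosine quadrature:
# c_k ~ (2/n) * sum_j f(cos t_j) cos(k t_j), with c_0 halved.
def cheb_coeffs(f, q, n=4096):
    t = (np.arange(n) + 0.5) * np.pi / n
    fx = f(np.cos(t))
    return np.array([(2 - (k == 0)) / n * np.sum(fx * np.cos(k * t))
                     for k in range(q + 1)])

f = lambda x: np.exp(np.sin(3 * x))  # a smooth test function

l1 = {q: np.abs(cheb_coeffs(f, q)).sum() for q in (10, 20, 40, 80)}
print(l1)  # the l1-norm of the coefficients stabilizes as q grows

# The truncated Chebyshev expansion converges uniformly to f:
xs = np.linspace(-1, 1, 2001)
err = np.max(np.abs(C.chebval(xs, cheb_coeffs(f, 40)) - f(xs)))
assert err < 1e-6
```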
Step 3: the asymptotic moment method
It remains to bound the support of $\nu$. To this end, we make fundamental use of a key property of compactly supported distributions: their support is bounded by the exponential growth rate of their moments (Lemma 4.9). In particular, if we define
$\rho(\nu) := \limsup_{p\to\infty}\,|\nu(x^{2p})|^{\frac{1}{2p}},$
then $\mathrm{supp}\,\nu \subseteq [-\rho(\nu), \rho(\nu)]$. Thus our method ultimately reduces to a form of the moment method, but where we first let $N \to \infty$ and only then $p \to \infty$.
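For measures, this support principle is a one-line computation (our own illustration; Lemma 4.9 extends it to distributions): for the uniform distribution on $[-a, a]$, whose even moments are $m_{2p} = \frac{a^{2p}}{2p+1}$, the moment growth rate recovers $a$.

```python
import numpy as np

# supp(mu) = [-a, a] is recovered from m_{2p}^{1/(2p)} as p -> infinity.
a = 3.0
rates = []
for p in (5, 50, 500, 5000):
    log_m2p = 2 * p * np.log(a) - np.log(2 * p + 1)  # m_{2p} = a^{2p}/(2p+1)
    rates.append(np.exp(log_m2p / (2 * p)))
print(rates)  # increases toward 3.0
assert abs(rates[-1] - a) < 0.01
```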
In contrast to the moments of the random matrix $X^N$, the moments $\nu(x^p)$ turn out to be easy to analyze for the models considered in this paper using a simple idea that is inspired by earlier work of Friedman [26, Lemma 2.4]: even though $\nu$ does not have a clear spectral interpretation, its moments can be expressed as a sum of products of matrix elements of powers of $X$. As the sum only has polynomially many terms, the desired bound follows readily.
2.3. The role of cancellations
A surprising feature of our approach is that tangles, which form a fundamental obstruction to the moment method, are completely ignored. This seems unexpected, as the method is based on an estimate (2.5) for traces of high degree polynomials, which are merely linear combinations of moments. Indeed, as was explained in section 2.1, the presence of tangles causes the random matrix moments of large degree to be exponentially larger than their limiting values, that is, $\mathbf{E}[\mathrm{tr}\,(X^N)^{2p}] \gg \tau(X^{2p})$ for $p \gg \log N$. In contrast, (2.5) yields a bound on the difference between linear combinations of the random matrix moments and their limiting values that is only polynomial in the degree.
The key feature of (2.5) is that it involves a uniform bound on the test function $f$. It therefore yields no useful information on individual moments of order $p \gg \log N$, as $\|x^{2p}\|_{[-K,K]} = K^{2p}$ is exponential in $p$ for $K > 1$. However, it yields a powerful bound for polynomials that are uniformly bounded on the interval $[-K,K]$ independently of their degree. The estimate (2.5) therefore reveals a new phenomenon: the effect of tangles on the individual moments cancels out when they are combined to form bounded test functions $f$. One of the key features of the polynomial method is that it is able to capture these cancellations.
2.4. Related work
The observation that norm bounds follow from the validity of (2.4) for smooth functions and a bound on the support of the first-order term has been widely used since the works of Haagerup and Thorbjørnsen [31] and Schultz [64] (see also the work of Parraud [58] where higher-order expansions are considered). However, these works rely heavily on analytic and operator-algebraic tools that are not available for the kind of models considered in this paper. What is fundamentally new here is that our method achieves this aim using only simple moment computations, which are much more broadly applicable.
The polynomial method that lies at the heart of our approach is inspired by work in complexity theory [1]. We are aware of little precedent for the use of this method in a random matrix context, besides a distantly related idea that appears in Bourgain and Tzafriri [11, Theorem 2.3]. The first author used a variant of this idea in concurrent work to establish weak convergence of certain random unitary matrices that arise in quantum information theory [15, 14].
We emphasize that the appearance of Chebyshev polynomials in Step 2 above is unrelated to their connection with nonbacktracking walks that is used, for example, in [27, Lemma 2.3]. Indeed, our Chebyshev polynomials are normalized by the a.s. bound $K$ on $\|X^N\|$ (e.g., $K = 2r$ in Friedman’s theorem) as opposed to the support of the limiting spectrum ($2\sqrt{2r-1}$ in Friedman’s theorem).
Finally, we mention the interesting works [6, 23] that develop a method to bound the spectral radius of certain non-Hermitian matrices using moment asymptotics with fixed , similarly avoiding the challenges of the moment method in that setting. However, the methods developed in these works are unrelated to our approach and are not applicable to the problems considered in this paper.
3. Main results
In this section, we formulate and discuss the main results of this paper. As our primary aim is to achieve strong convergence, we will focus the presentation on upper bounds on the norm as explained in section 1. Let us note, however, that a byproduct of the analysis will also yield a quantitative form of weak convergence, which is of independent interest; see Corollary 7.2 below.
3.1. Preliminaries
Before we turn to our main results, we must briefly recall some basic facts about random permutation matrices and their limiting model. The following definitions will remain in force throughout this paper.
Definition 3.1.
Let $S_1^N,\dots,S_r^N$ be i.i.d. uniformly distributed random permutation matrices of dimension $N$, and denote by $\hat S_1^N,\dots,\hat S_r^N$ their restriction to the invariant subspace orthogonal to the all-ones vector. We will often write $S^N := (S_1^N,\dots,S_r^N)$ and $\hat S^N := (\hat S_1^N,\dots,\hat S_r^N)$.
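Concretely (a small numerical sketch with our own variable names), the restriction can be realized by conjugating with an orthonormal basis of the subspace orthogonal to the all-ones vector; the restricted matrix is again orthogonal, and its trace is the number of fixed points minus one:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 8
perm = rng.permutation(N)
S = np.eye(N)[perm]  # N-dimensional permutation matrix

# Orthonormal basis Q of the invariant subspace orthogonal to the all-ones vector
ones = np.ones((N, 1)) / np.sqrt(N)
M = np.concatenate([ones, rng.standard_normal((N, N - 1))], axis=1)
Q = np.linalg.qr(M)[0][:, 1:]
S_hat = Q.T @ S @ Q  # the (N-1)-dimensional restriction

fix = int(np.sum(perm == np.arange(N)))
assert np.allclose(S_hat.T @ S_hat, np.eye(N - 1))  # restriction is orthogonal
assert abs(np.trace(S_hat) - (fix - 1)) < 1e-10     # Tr S_hat = fix(sigma) - 1
```

The trace identity follows from $\mathrm{Tr}(Q^T S Q) = \mathrm{Tr}\,S - \langle\text{(normalized ones)}, S\,\text{(normalized ones)}\rangle$, since $QQ^T$ is the projection onto the complement of the all-ones vector.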
Definition 3.2.
Let $s = (s_1,\dots,s_r)$ be defined by $s_i := \lambda(g_i)$, where $g_1,\dots,g_r$ are the free generators of $F_r$ and $\lambda$ is the left-regular representation of $F_r$ on $\ell^2(F_r)$ defined by $\lambda(g)\delta_h := \delta_{gh}$. Define the vector state $\tau(a) := \langle \delta_e, a\,\delta_e \rangle$ on $B(\ell^2(F_r))$.
The basic weak convergence property of random permutation matrices, due to Nica [52] (see Corollary 5.3 for a short proof), states that
$\lim_{N\to\infty} \mathbf{E}\big[\mathrm{tr}\,P(\hat S^N)\big] = \tau(P(s))$
for every noncommutative polynomial $P$ in the matrices and their adjoints. This property plays the role of (2.1) in section 2. The aim of the strong convergence problem is to prove that this convergence holds not only for the trace but also for the norm.
The basic inputs to the methods of this paper, as described in section 2.2, are well known in the present setting. They will be reviewed in section 5 below.
Remark 3.3.
Even though $\hat S_1^N,\dots,\hat S_r^N$ are $(N-1)$-dimensional matrices defined on the subspace orthogonal to the all-ones vector, we will normalize the trace by $\frac{1}{N}$ rather than by $\frac{1}{N-1}$ as this leads to cleaner expressions for the rational functions that arise in the proof. This makes no difference to our results, and is mainly done for notational convenience.
3.2. Random regular graphs
Let $A^N$ be the adjacency matrix of the random $2r$-regular graph with $N$ vertices defined in section 1. Then the matrix $X^N$ obtained by removing the trivial eigenvalue is defined by the linear polynomial of random permutation matrices
$X^N = \hat S_1^N + \hat S_1^{N*} + \cdots + \hat S_r^N + \hat S_r^{N*},$
and the associated limiting model is
$X = s_1 + s_1^* + \cdots + s_r + s_r^*.$
Note that $X$ is nothing other than the adjacency matrix of the Cayley graph of $F_r$ generated by $g_1,\dots,g_r$ and their inverses, that is, of the $2r$-regular tree.
It is a classical fact due to Kesten [43] that $\|X\| = 2\sqrt{2r-1}$. That
$\|X^N\| \to 2\sqrt{2r-1} \quad \text{in probability as } N \to \infty$
is due to Friedman [27]. Friedman’s proof was simplified by Bordenave [5], and a new proof with an improved convergence rate was given by Huang and Yau [42]. Very recently, the latter approach was further developed in the impressive works of Huang, McKenzie and Yau [40, 41] to achieve the optimal convergence rate.
3.2.1. An effective Friedman theorem
As a first illustration of the methods of this paper, we will give a short new proof of Friedman’s theorem in section 8.1. A direct implementation of the approach described in section 2 yields the following.
Theorem 3.4.
For every , , and , we have
Theorem 3.4 implies a rate of convergence in probability of $\|X^N\|$ to $2\sqrt{2r-1}$ when $r$ is fixed as $N \to \infty$.¹
¹The notation $O_{\mathbf{P}}(\,\cdot\,)$ denotes a bound that holds up to a factor that is bounded in probability.
This rate falls short of the optimal rate in Friedman’s theorem that was very recently established in [40, 41]. However, the methods of [42, 40, 41] rely heavily on the special structure of random regular graphs, and it is unclear at present whether they could be applied to the study of strong convergence. In contrast, the methods of this paper will achieve the same rate for arbitrary polynomials of random permutation matrices, see section 3.3 below.
The quantitative nature of Theorem 3.4 also enables us to consider what happens when $r, N \to \infty$ simultaneously. It is an old question of Vu [66, §5] whether the bound $\lambda_2 = (2 + o(1))\sqrt{2r-1}$ with probability $1 - o(1)$ remains valid when $r \to \infty$ in an arbitrary manner. We can now settle this question for the permutation model of random regular graphs that is considered here (see Remark 3.8 below for a brief discussion of other models of random regular graphs).
Corollary 3.5.
We have $\|X^N\| = (2 + o(1))\sqrt{2r-1}$ with probability $1 - o(1)$ whenever $N \to \infty$ and $r = r(N)$ depends on $N$ in an arbitrary manner.
3.2.2. The staircase theorem
As was explained in section 2, the approach of this paper only requires an understanding of the first-order term in the -expansion of the moments. However, in the setting of random regular graphs, a detailed understanding of the higher-order terms was achieved by Puder [61] using methods of combinatorial group theory. When such additional information is available, the approach of this paper is readily adapted to achieve stronger results.
The following theorem will be proved in section 8.2 by taking full advantage of the results of [61]. We emphasize that this is the only part of this paper where we will use asymptotics of expected traces beyond the lowest order.
Theorem 3.6.
Define
and let be the largest integer so that .
Then for every , , and , we have
for all , and
as $N \to \infty$. Here $c_r$ is a constant that depends on $r$ only.
Theorem 3.6 reveals an unusual staircase pattern of the large deviations of $\|X^N\|$, which is illustrated in Figure 3.1. The lower bound, which follows from an elementary argument of Friedman [27, Theorem 2.11], arises due to the presence of tangles: Friedman shows that the presence of a vertex with a given number of self-loops, which happens with polynomially small probability, gives rise to an outlier in the spectrum at a corresponding location. The fact that our upper bound precisely matches this behavior shows that the tail probabilities of $\|X^N\|$ are completely dominated by these events.
Remark 3.7.
Let us highlight two further consequences of Theorem 3.6.
- 1.
- 2.
Remark 3.8.
The random regular graph model considered in this paper is known as the permutation model. Several other models of random regular graphs are also considered in the literature. All these models are known to be contiguous to the permutation model as $N \to \infty$ for fixed degree, so that any statement that holds with probability $1 - o(1)$ for the permutation model remains valid with probability $1 - o(1)$ for the other models; cf. [27, pp. 2–3] and the references therein.
It should be emphasized, however, that low probability events are not preserved by contiguity. In particular, the detailed quantitative picture in Theorem 3.6 is specific to the permutation model. The corresponding picture for other models of random regular graphs must be investigated on a case by case basis.
Similarly, the different models of random regular graphs are no longer contiguous when the degree diverges with $N$, so that high degree results as in Corollary 3.5 must be investigated separately for each model. Partial results on this problem for the uniform model of random regular graphs may be found in [3, 63, 36].
3.3. Strong convergence of random permutation matrices
The adjacency matrix of a random regular graph is one very special example of a polynomial of random permutation matrices. The much more general fact that the norm of every noncommutative polynomial of random permutation matrices converges to that of its limiting model is an important result due to Bordenave and Collins [7, 8]. Here we give a short proof that yields an effective form of this result.
We will formulate our strong convergence results for general noncommutative polynomials with matrix coefficients, that is,
$P = \sum_{w} a_w \otimes w,$
where the sum is over a finite set of words $w$ in the symbols $s_1, s_1^*, \dots, s_r, s_r^*$ and $a_w$ are matrix coefficients. The case of scalar coefficients is recovered for $1 \times 1$ coefficient matrices, but the more general setting considered here arises often in applications (e.g., in the study of random lifts [7]). For any such polynomial, we define the norm
$\|P\|_* := \sup\,\Big\| \sum_w a_w \otimes w(u_1,\dots,u_r) \Big\|,$
where the supremum is taken over all $r$-tuples $(u_1,\dots,u_r)$ of unitary matrices of any dimension. The notation used here agrees with the standard definition in terms of the norm in the full $C^*$-algebra of the free group, see, e.g., [17, Theorem 7]. However, this is not important for our purposes, and the reader may simply view the above as the definition of the norm $\|P\|_*$. We emphasize that $\|P\|_*$ is not the same as $\|P(s)\|$, which corresponds to the reduced $C^*$-algebra of the free group. For example, for $P = s_1 + s_1^* + \cdots + s_r + s_r^*$, we have $\|P\|_* = 2r$ and $\|P(s)\| = 2\sqrt{2r-1}$. In practice, $\|P\|_*$ may be bounded trivially by the sum of the norms of the matrix coefficients of $P$.
We now state our main result on strong convergence of random permutations.
Theorem 3.9.
Let , and let be any self-adjoint noncommutative polynomial of degree . Then we have
for all , where .
This theorem will be proved in section 9. Note that the limiting norm can in principle be computed explicitly using the results of [44].
Remark 3.10.
The assumption that $P$ is self-adjoint entails no loss of generality: the analogous bound for a non-self-adjoint polynomial $P$ of degree $q$ follows by applying this result to the self-adjoint polynomial $P^*P$ of degree $2q$ (cf. Remark 2.2).
Theorem 3.9 shows that the fluctuations of $\|P(\hat S^N)\|$ above the limiting norm are polynomially small in $N$ for any polynomial $P$. This significantly improves the best known bound to date, due to Bordenave and Collins [8, Theorem 1.4], which yields fluctuations of a much larger order. Our bound can be directly substituted in applications, such as to random lifts of graphs [7, §1.5], to improve the best known convergence rates.
Let us note that for fixed , the tail probability of order in Theorem 3.9 cannot be improved in general, as is illustrated by Theorem 3.6 with .
Remark 3.11.
While Theorem 3.9 yields much stronger quantitative bounds for fixed $r$ than prior results, it can only be applied when the degree of $P$ grows slowly with $N$. In contrast, it was recently shown in [8, Corollary 1.5] that strong convergence remains valid in the present setting even for polynomials of much larger degree. Such bounds could be achieved using our methods if the supports of the higher-order terms in the $\frac{1}{N}$-expansion of the moments were still bounded by the limiting spectrum. While this is the case for continuous models such as GUE [58], it is simply false in the present setting: the proof of Theorem 3.6 shows that tangles can already arise in the second-order term. It is an interesting question whether the approach of this paper can be combined with conditioning on the absence of tangles to achieve improved bounds.
3.4. Stable representations of the symmetric group
Let $\pi_N$ be a finite-dimensional unitary representation of the symmetric group $S_N$. Then we can define random matrices of dimension $\dim \pi_N$ as

(3.1)  $S_i^N := \pi_N(\sigma_i), \qquad i = 1,\dots,r,$

where $\sigma_1,\dots,\sigma_r$ are i.i.d. uniformly distributed elements of $S_N$. When $\pi_N$ is the standard representation, we recover the random permutation matrices as a special case. Other representations, however, capture a much larger family of random matrix models. We will prove strong convergence for random matrices defined by any stable representation of $S_N$ (see section 3.4.1), which yields a far-reaching generalization of Theorem 3.9. This result is of interest for several reasons:
• It provides many new examples of the strong convergence phenomenon.
• It shows that strong convergence can be achieved with much less randomness than in the standard representation: $\pi_N(\sigma_i)$ uses the same number of random bits as an $N$-dimensional permutation matrix, but can have much larger dimension (of order $N^k$ with $k$ arbitrarily large). See [9] for analogous questions in the context of the unitary group.
-
•
It provides new evidence in support of long-standing questions on the expansion of random Cayley graphs of the symmetric group, for which extensive numerical evidence and conjectures are available; see [62] and the references therein.
- •
Remark 3.12.
The question whether strong asymptotic freeness can be derandomized has been investigated in the theoretical computer science literature [51, 56] by means of pseudorandom permutation matrices. The results of the present section suggest that high-dimensional representations of can provide a different perspective on such questions. We omit a detailed discussion of the number of random bits needed by our bounds, as significantly stronger results in this direction were subsequently obtained by Cassidy [13] by combining the methods of this paper with new group-theoretic ideas; see Remark 3.16 below.
3.4.1. Stable representations
The approach of this paper requires that the moments of the random matrices of interest are rational functions of . For this to be the case, we cannot choose an arbitrary representation of for each . Instead, we will work with stable representations [25, 18] that are defined consistently for different . We briefly introduce the relevant notions here.
Following [32], denote by the number of fixed points of for . The sequence determines the conjugacy class of . If is any finite-dimensional unitary representation, its character is a class function and can therefore be expressed as a polynomial of
Definition 3.13.
A finite-dimensional unitary representation of , defined for each , is stable if there exists and a polynomial so that for all .
Thus stable representations are representations of whose characters are defined by the same polynomial for all . For example, the standard representation is stable as it satisfies for all .
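To make the stability of the standard representation concrete, here is a minimal numerical sketch (illustrative only, not from the paper; it assumes NumPy). The character of the standard representation can be computed by compressing the permutation matrix to the orthogonal complement of the all-ones vector, and it equals fix(σ) − 1 for every n, i.e., the same character polynomial in the fixed-point count for all n:

```python
import numpy as np

def perm_matrix(sigma):
    """Permutation matrix P whose column j is e_{sigma[j]}."""
    n = len(sigma)
    P = np.zeros((n, n))
    P[sigma, np.arange(n)] = 1.0
    return P

def standard_character(sigma):
    """Character of the standard representation: compress the permutation
    matrix to the hyperplane orthogonal to the all-ones vector."""
    sigma = np.asarray(sigma)
    n = len(sigma)
    P = perm_matrix(sigma)
    # orthonormal basis of {x : sum(x) = 0}; the first n-1 columns of the
    # QR factor of I - J/n span exactly this hyperplane
    Q = np.linalg.qr(np.eye(n) - np.ones((n, n)) / n)[0][:, : n - 1]
    return float(np.trace(Q.T @ P @ Q))

rng = np.random.default_rng(0)
for n in (5, 8, 12):
    sigma = rng.permutation(n)
    fix = int((sigma == np.arange(n)).sum())
    # same character polynomial fix(sigma) - 1 for every n (stability)
    assert abs(standard_character(sigma) - (fix - 1)) < 1e-8
```

The identity trace(P) = fix(σ) for permutation matrices makes the check immediate: the permutation representation splits as trivial ⊕ standard, so compressing away the all-ones direction subtracts exactly 1 from the trace.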
The irreducible stable representations of can be constructed explicitly as follows. Fix a base partition of . Then for every , the irreducible representation of defined by
(3.2)
is stable. Moreover, it follows from [32, Proposition B.2] that every stable representation in the sense of Definition 3.13 is a direct sum of such irreducible representations defined by fixed base partitions .
3.4.2. Strong convergence
Fix a stable representation defined for by a character polynomial . We aim to prove strong convergence of the random matrices defined by (3.1).
We will not require that is irreducible, but we assume it does not contain the trivial representation. The dimension of is given by
Thus is a polynomial in ; we denote its degree by , so that .
We now formulate our main result on strong convergence of , whose proof is contained in section 10 below.
Theorem 3.14.
Let , and let be any self-adjoint noncommutative polynomial of degree . Then we have
for all and . Here we define , and is a constant that depends on the choice of stable representation.
Remark 3.15.
It is certainly possible to obtain an explicit expression for from the proof; we have suppressed the dependence on the choice of stable representation for simplicity of presentation, and we did not optimize this dependence in the proof.
Remark 3.16.
The bound that follows from Theorem 3.14 becomes weaker when we consider stable representations of increasingly large dimension . This is not a restriction of our method, however, but rather reflects a gap in the understanding of stable representations at the time that this paper was written [32, Conjecture 1.8]. Important progress in this direction, recently obtained by Cassidy [13], yields an improved probability bound where is replaced by , resulting in a convergence rate that is independent of . As is discussed in [13], this yields strong convergence uniformly for all stable representations with for any .
Theorem 3.14 can be readily applied to concrete situations. For example, it implies that random -regular Schreier graphs defined by the action of on -tuples of distinct elements of have second eigenvalue with probability , settling a question discussed in [32, §1.4]. Indeed, it is not difficult to see (cf. [32, §8]) that the restricted adjacency matrix of such a random graph can be represented as
for some stable representation of (which depends on ) that does not contain the trivial representation, so that the conclusion follows immediately as a special case of Theorem 3.14. Applications of this model may be found in [28, 33]. (Let us note for completeness that the case reduces to Friedman’s theorem, while the case was previously studied by Bordenave and Collins [7, §1.6].)
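Since the simplest case reduces to Friedman's theorem, the near-optimal spectral gap is easy to probe numerically. The following sketch (illustrative only; the sizes, the slack 0.5, and the random seed are arbitrary choices, not from the paper) samples r uniform permutations, forms the 2r-regular multigraph with adjacency matrix A = Σᵢ (Pᵢ + Pᵢᵀ), and checks that its largest nontrivial eigenvalue is close to the Friedman bound 2√(2r−1):

```python
import numpy as np

rng = np.random.default_rng(42)

def regular_graph_eigs(n, r):
    """Top eigenvalue and largest nontrivial eigenvalue (in absolute value)
    of A = sum_i (P_i + P_i^T) for r i.i.d. uniform n x n permutation
    matrices P_i; A is the adjacency matrix of a 2r-regular multigraph."""
    A = np.zeros((n, n))
    for _ in range(r):
        sigma = rng.permutation(n)
        A[sigma, np.arange(n)] += 1.0   # P_i
        A[np.arange(n), sigma] += 1.0   # P_i^T
    ev = np.sort(np.linalg.eigvalsh(A))
    return ev[-1], max(abs(ev[0]), abs(ev[-2]))

n, r = 1000, 3
top, lam2 = regular_graph_eigs(n, r)
assert abs(top - 2 * r) < 1e-8                 # Perron eigenvalue of a 2r-regular graph
assert lam2 < 2 * np.sqrt(2 * r - 1) + 0.5     # Friedman: lam2 = 2*sqrt(2r-1) + o(1)
```

The slack 0.5 is generous at this size; with high probability lam2 lands within a few percent of 2√5 ≈ 4.47 already at n = 1000.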
4. Basic tools
The aim of this section is to introduce the general facts on polynomials and compactly supported distributions that form the basis for the methods of this paper. While most of the tools in this section follow readily from known results, it is useful to state them in the particular form in which they will be needed.
4.1. Markov inequalities
One of the key tools that will be used in our analysis is the following classical inequality of A. Markov and V. Markov. Here we recall that .
Lemma 4.1 (Markov inequality).
Let and , . Then we have
Proof.
Apply [10, p. 256(d)] to . ∎
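For a concrete sanity check (not part of the proof): on [−1, 1], the Markov inequality sup|p′| ≤ n² sup|p| for polynomials of degree n is saturated by the Chebyshev polynomial T_n, whose derivative at the endpoint equals n². A brief numerical sketch using NumPy's Chebyshev class:

```python
import numpy as np
from numpy.polynomial import chebyshev as C

x = np.linspace(-1.0, 1.0, 20001)
for n in range(1, 8):
    Tn = C.Chebyshev.basis(n)     # T_n has sup-norm 1 on [-1, 1]
    dTn = Tn.deriv()
    sup_p = np.max(np.abs(Tn(x)))
    sup_dp = np.max(np.abs(dTn(x)))
    # A. Markov: ||p'|| <= n^2 ||p|| on [-1, 1] ...
    assert sup_dp <= n ** 2 * sup_p + 1e-8
    # ... with equality attained by T_n at the endpoints: T_n'(1) = n^2
    assert abs(float(dTn(1.0)) - n ** 2) < 1e-8
```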
Two basic issues will arise in our applications of this inequality. First, we will not be able to control the relevant functions on the entire interval , but only on a discrete subset thereof. This issue will be addressed using a standard interpolation inequality that is itself a direct consequence of the Markov inequality.
Lemma 4.2 (Interpolation).
Let , and fix a subset such that . Then we have
Proof.
Apply [16, Lemma 3(i), p. 91] to . ∎
The second issue is that we will aim to apply these inequalities to rational functions rather than to polynomials. However, in the cases of interest to us, the denominator of the rational function will be nearly constant. The following lemma extends the Markov inequality to this setting.
Lemma 4.3 (Rational Markov).
Let be a rational function with , and let and . Suppose that . Then
Proof.
Applying the product rule to yields
As for every function , we can estimate
by applying Lemma 4.1 to and . Here the factor in the second line is due to the fact that the and terms in the first line yield the same bound.
We now reason by induction. Clearly the conclusion holds . Now suppose the conclusion holds up to . Then the above inequality yields
as . Finally, use and for . ∎
4.2. Chebyshev polynomials
Let and fix . In this section, we will consider the behavior of on the interval .
Denote by the Chebyshev polynomial of the first kind of degree , defined by . Then can be uniquely expressed as
(4.1)
for some real coefficients . Note that these coefficients are precisely the Fourier coefficients of the cosine series . We can therefore apply a classical result of Zygmund on absolute convergence of trigonometric series.
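To make the correspondence concrete (an illustrative sketch with an arbitrarily chosen cubic, not from the paper): substituting x = cos θ turns the Chebyshev expansion of f into the cosine series of f(cos θ), so the coefficients can be recovered as cosine Fourier integrals and compared against the exact basis change.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def trapezoid(y, x):
    """Composite trapezoid rule on a uniform grid."""
    dx = x[1] - x[0]
    return dx * (y.sum() - 0.5 * (y[0] + y[-1]))

# arbitrary test polynomial f(x) = 0.25 - 0.5*x + x^3 (power basis, low to high)
f_cheb = C.poly2cheb([0.25, -0.5, 0.0, 1.0])   # exact Chebyshev coefficients

theta = np.linspace(0.0, np.pi, 100001)
f_circle = 0.25 - 0.5 * np.cos(theta) + np.cos(theta) ** 3

for k in range(len(f_cheb)):
    # c_k is the k-th cosine Fourier coefficient of theta -> f(cos(theta))
    weight = 1.0 / np.pi if k == 0 else 2.0 / np.pi
    ck = weight * trapezoid(f_circle * np.cos(k * theta), theta)
    assert abs(ck - f_cheb[k]) < 1e-6
```

Here the exact coefficients follow from T₃ = 4x³ − 3x, so x³ = 0.75 T₁ + 0.25 T₃, and the quadrature recovers them to high accuracy.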
Lemma 4.4 (Zygmund).
Proof.
That follows immediately from . To obtain the second estimate, we note that
The conclusion follows from [67, p. 242]. ∎
A direct consequence is the following.
Corollary 4.5.
Proof.
That satisfies follows by the chain rule. It remains to apply Lemma 4.4 with . ∎
4.3. Compactly supported distributions
In this paper we will encounter only distributions on . We therefore adopt the following definition [39, §2.3].
Definition 4.6.
A linear functional on such that
holds for some , , is called a compactly supported distribution.
The linear functionals that will arise in this paper will not be defined a priori on , but rather only on the space of univariate polynomials. It will therefore be essential to understand when linear functionals on can be extended to compactly supported distributions. As we have not located the following result in the literature, we prove it here for completeness.
Lemma 4.7 (Hausdorff moment problem for compactly supported distributions).
Let be a linear functional on . Then the following are equivalent.
1. There exist so that for all .
2. There exist so that for all .
3. extends uniquely to a compactly supported distribution.
Only the implication will be used in this paper; the equivalence of appears implicitly in [24, p. 38].
Proof.
That is immediate as by the definition of the Chebyshev polynomials .
Next, we recall the definition of support [39, §2.2].
Definition 4.8.
The support of a compactly supported distribution is the smallest closed set with the property that for every such that in a neighborhood of .
It is clear that when the condition of Definition 4.6 is satisfied, we must have . However, the support may in fact be much smaller. A key property of compactly supported distributions for our purposes is that their support can be bounded by the growth rate of their moments. While the analogous property of measures is straightforward, its validity for distributions requires justification.
Lemma 4.9.
Let be a compactly supported distribution. Then
Proof.
It is clear from the definition of a compactly supported distribution that we must have . Thus the function defined by
is analytic for with sufficiently large. It follows from [24, Lemma 1] (or from [64, Theorem 5.4]) that can be analytically continued to and that is precisely the set of singular points of . As the above expansion of is absolutely convergent in a neighborhood of for all , it follows that for all , which yields the conclusion. ∎
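The mechanism of Lemma 4.9 is easiest to see for a measure: the edge of the support is recovered from the exponential growth rate of the moments. As an illustration (not from the paper), the even moments of the semicircle law on [−2, 2] are the Catalan numbers, and their growth rate recovers the edge 2:

```python
from math import comb

def catalan(k):
    """k-th Catalan number; equals the 2k-th moment of the semicircle law on [-2, 2]."""
    return comb(2 * k, k) // (k + 1)

# the growth rate of the moments recovers the edge of the support:
# catalan(k)^(1/(2k)) increases to 2 as k -> infinity
roots = [catalan(k) ** (1.0 / (2 * k)) for k in (5, 25, 100, 400)]
assert all(r < 2.0 for r in roots)              # moments are bounded by 4^k
assert roots == sorted(roots) and roots[-1] > 1.97
```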
4.4. Test functions
We finally recall a standard construction of smooth approximations of the indicator function for which the bound of Lemma 4.4 is well controlled; its sharpness is discussed in [39, pp. 19–22]. Recall that .
Lemma 4.10 (Test functions).
Fix and so that . Then there exists a function with the following properties.
-
1.
for all , for , and for .
-
2.
for all .
Here , and is a universal constant.
Proof.
Let be a nonnegative function that is supported in with . For , define and , and let
Here are parameters that will be chosen below.
Note that by construction. Moreover, the integrand in the definition of is nonnegative, has -norm one, and is supported in . Thus is nondecreasing, for , and for . We define
for , and define otherwise. We now choose the parameters and so that 1. holds. Moreover, is constant near as , so by the chain rule.
Now note that for . Therefore
where the factor in the first line arises by the triangle inequality and by splitting the norm over the intervals and , and the second line uses Young’s inequality and . Then 2. follows by noting that . ∎
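The following discrete sketch (illustrative, with arbitrary grid and width parameters; it assumes NumPy) shows the single-convolution step underlying this construction: mollifying an indicator with a normalized smooth bump yields a function equal to 1 on a slightly shrunken interval and 0 outside a slightly enlarged one. The full lemma iterates such convolutions with shrinking widths to control all derivatives simultaneously.

```python
import numpy as np

h = 1e-3
x = np.arange(-2.0, 2.0 + h, h)
eps = 0.1

def bump(t):
    """Smooth bump supported in (-eps, eps)."""
    out = np.zeros_like(t)
    inside = np.abs(t) < eps
    out[inside] = np.exp(-1.0 / (1.0 - (t[inside] / eps) ** 2))
    return out

phi = bump(np.arange(-eps, eps + h, h))
phi /= phi.sum()                                 # discrete L^1 normalization

indicator = ((x >= -1.0) & (x <= 1.0)).astype(float)
g = np.convolve(indicator, phi, mode="same")     # mollified indicator

assert np.all((g >= -1e-12) & (g <= 1 + 1e-12))              # 0 <= g <= 1
assert np.all(np.abs(g[np.abs(x) <= 1 - 2 * eps] - 1.0) < 1e-12)  # g = 1 inside
assert np.all(np.abs(g[np.abs(x) >= 1 + 2 * eps]) < 1e-12)        # g = 0 outside
```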
5. Words in random permutation matrices
The aim of this section is to recall basic facts about random permutations that will form the input to the methods of this paper. These results are implicit in Nica [53], but are derived in simpler and more explicit form by Linial and Puder [45] whose presentation we follow. We emphasize that all the results that are reviewed in this section arise from elementary combinatorial reasoning; we refer the reader to the expository paper [21, §3] for a brief introduction.
We will work with random permutation matrices and their limiting model as defined in section 3.1. In particular, we recall that , where are the free generators of .
It will be convenient to extend these definitions by setting , and analogously , and , for . This convention will be in force in the remainder of the paper.
We aim to understand the expected trace of words in random permutation matrices and their inverses. To formalize this notion, denote by the set of all finite words in the letters . We implicitly identify with the word map it induces on any group . Thus is the reduction of the word , while is the random matrix obtained by substituting and multiplying these matrices in the order they appear in .
5.1. Rationality
The first property we will need is that the expected trace of any word is a rational function of . We emphasize that only very limited information on this function will be needed, which is collected in the following lemma.
Lemma 5.1.
Let be any word of length at most and . Then
where
with , and are polynomials of degree at most .
Proof.
Observe that is the expected number of fixed points of minus one (as we restrict to ). Thus [45, eq. (7)] yields
where the sum is over a certain collection of connected graphs with vertices and edges, and are integers so that [45, p. 105]. Note that as is connected.
The denominator inside the sum can be written equivalently as
with as . Thus
To conclude, note that and . Thus has degree at most and each term in the sum has degree at most . ∎
5.2. First-order asymptotics
We now recall the first-order asymptotic behavior of expected traces of words in random permutation matrices. The following is a special case of a result of Nica [53], see [45, p. 124] for a simple proof.
An element of the free group is called a proper power if it can be written as for some and , and is called a non-power otherwise. We denote by the set of non-powers. Every nonidentity element can be written uniquely as for some and ; cf. [47, Proposition I.2.17].
Lemma 5.2.
Fix a word that does not reduce to the identity, and express its reduction as with and . Then
where denotes the number of divisors of .
Proof.
In particular, Lemma 5.2 implies the following.
Corollary 5.3 (Weak convergence).
For every word , we have
Proof.
Lemma 5.2 implies that for any word that does not reduce to the identity. On the other hand, if reduces to the identity, then is the identity matrix on and thus . The conclusion follows as clearly . ∎
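In the simplest case of a single letter, the first-order term in Lemma 5.2 can be checked exactly: for a uniform σ ∈ S_n the expected number of j-cycles is 1/j whenever j ≤ n, so E[fix(σ^k)] = Σ_{j | k} 1 = d(k), the number of divisors of k, exactly once n ≥ k. A small exact verification by enumerating S_6 (illustrative, not from the paper):

```python
from itertools import permutations
from math import factorial

def fix_of_power(sigma, k):
    """Number of fixed points of sigma^k, computed by iterating sigma."""
    count = 0
    for i in range(len(sigma)):
        j = i
        for _ in range(k):
            j = sigma[j]
        if j == i:
            count += 1
    return count

def divisors(k):
    """Number of divisors d(k)."""
    return sum(1 for j in range(1, k + 1) if k % j == 0)

n = 6
for k in range(1, n + 1):
    total = sum(fix_of_power(sigma, k) for sigma in permutations(range(n)))
    # E[fix(sigma^k)] = d(k) exactly for n >= k
    assert total == divisors(k) * factorial(n)
```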
6. Master inequalities for random permutations I
6.1. Master inequalities
In this section, we develop a core ingredient of our method: inequalities for the normalized trace of polynomials of .
Theorem 6.1 (Master inequalities).
Fix a self-adjoint noncommutative polynomial of degree , and let . Then there exists a linear functional on for every so that
for every and .
An immediate consequence of Theorem 6.1 is the following corollary that we spell out separately for future reference.
Corollary 6.2.
6.2. Proof of Theorem 6.1 and Corollary 6.2
Throughout the proof, we fix a polynomial as in Theorem 6.1. We emphasize that all the objects that appear in the proof depend implicitly on the choice of .
We begin by noting that is a linear combination of words of length at most for every . Thus Lemma 5.1 yields
for , where are polynomials of degree at most .
As has no pole at , we can define for each
It is clear from the definition of that defines a linear functional on . Moreover, it follows immediately from Taylor’s theorem that
(6.1)
provided is large enough that has no poles on . The main idea of the proof is that we will use Lemma 4.3 to bound the remainder term in the Taylor expansion. To this end we must show that the function is nearly constant.
Lemma 6.3.
for every and .
Proof.
Next, we must obtain an upper bound on .
Lemma 6.4.
for .
Proof.
We can now conclude the proof.
Proof of Theorem 6.1 and Corollary 6.2.
Let . Then
by Lemmas 4.3, 6.3, and 6.4, where we used that for . Thus the proof of Theorem 6.1 for follows directly from (6.1).
7. Master inequalities for random permutations II
The aim of this short section is to extend the master inequalities of Theorem 6.1 from polynomials to general smooth test functions. This extension will play a key role in the proofs of strong convergence.
Theorem 7.1 (Smooth master inequalities).
Fix a self-adjoint noncommutative polynomial of degree , and let . Then there exists a compactly supported distribution for every so that
for all , , , where and .
Proof.
We first note that Corollary 6.2 ensures, by Lemma 4.7, that the linear functionals in Theorem 6.1 extend uniquely to compactly supported distributions.
Let and express it in the form (4.1). Rather than applying Theorem 6.1 to directly, we apply it to the Chebyshev polynomials to obtain
Here we used that the left-hand side of Theorem 6.1 vanishes when is the constant function (as then , for ), so the constant term in the Chebyshev expansion cancels. The proof is completed for by applying Lemma 4.4. The conclusion extends to as is dense in . ∎
The special case is of independent interest, as it yields a quantitative formulation of weak convergence (cf. Corollary 5.3).
Corollary 7.2 (Quantitative weak convergence).
Proof.
Remark 7.3.
When is chosen to be a smooth approximation of the indicator function of an interval, Corollary 7.2 shows that the spectral distribution of is well approximated by that of at a mesoscopic scale. While far sharper results at the local scale are proved in [42, 40, 41] for random regular graphs, no result of this kind appears to be known to date for arbitrary polynomials .
In the remainder of this paper, we adopt the notation for the distributions as in Theorem 7.1. It should be emphasized, however, that the definition of depends both on the model and on the noncommutative polynomial under consideration. As each of the following sections will be concerned with a specific model and choice of polynomial , the meaning of will always be clear from context.
8. Random regular graphs
The aim of this section is to prove our main results on random regular graphs, the effective Friedman Theorem 3.4 and the staircase Theorem 3.6.
8.1. Proof of Theorem 3.4
The proof of Theorem 3.4 is based on Theorem 7.1 with . Let us spell out what this says in the present setting.
Lemma 8.1.
There exists a compactly supported distribution such that for every , , and , we have
where and .
The remaining ingredient of the proof is to show that . To this end, it suffices by Lemma 4.9 to understand the exponential growth rate of the moments of . We begin by computing a formula for the moments of . (In the special setting of random regular graphs, a nonrigorous derivation of an explicit formula for, that is, an explicit solution to the Hausdorff moment problem associated to Lemma 8.2, appears in the physics literature [49]. However, an explicit solution to the moment problem, which is not available in more general situations, is not needed to apply the methods of this paper.) Recall that denotes the non-power elements of (see Lemma 5.2).
Lemma 8.2.
For all , we have
Proof.
In the present setting, a simple argument due to Friedman [26, Lemma 2.4] suffices to bound the growth rate of the moments. As a variant of this argument will be needed later in this paper, we recall the proof here.
Lemma 8.3.
For every and , we have
Proof.
We would like to argue that if , there must exist so that , , and . But this need not be true: if is not cyclically reduced, the last letters of may cancel the first letters of the next repetition of , and then the cancelled letters need not appear in .
To eliminate this ambiguity, we note that any can be uniquely expressed as with , such that is cyclically reduced and such that is reduced if . Thus we may write for any
where the sum is over pairs with the above properties.
The idea is now to express the indicators as . Substituting this identity into the right-hand side of the above inequality yields
where we used . The conclusion now follows readily by applying the Cauchy–Schwarz inequality to the sums over and , as
for any . ∎
Corollary 8.4.
.
We can now complete the proof of Theorem 3.4.
8.2. Proof of Theorem 3.6
The proof is based on results of Puder [61]. Let us begin by synthesizing the parts of [61] that we need here.
Lemma 8.5.
Define as in Theorem 3.6. Then for any , we have
Proof.
Fix a word of length that does not reduce to the identity. By Lemmas 5.1 and 5.2, we can write for all sufficiently large
as a rational function is analytic away from its poles. It is shown in [61, §5.5] that for , that , and that for , where denotes the primitivity rank of . We refer to [61, §2.2] for precise definitions of and , which are not needed in the following argument: the only properties we will use are that if and only if , and that when .
Corollary 8.6.
Let . Then for all .
We can now complete the proof of Theorem 3.6.
Proof of Theorem 3.6.
Fix . Let be the test function provided by Lemma 4.10 with , , . Then for by Corollary 8.6. Thus Theorem 7.1 with , and Lemma 4.10 yield
where we chose (note that as ).
For the lower bound, it is shown in the proof of [27, Theorem 2.11] that
for all and . Choosing completes the proof. ∎
9. Strong convergence of random permutation matrices
The aim of this section is to prove Theorem 3.9. Most of the proof is identical to that of Theorem 3.4. The only difficulty in the present setting is to adapt the argument of Lemma 8.3 for bounding . This is not entirely straightforward, because the proof of Lemma 8.3 relied on an overcounting argument which is not applicable to general polynomials. Nonetheless, we will show that a more careful implementation of the idea behind the proof can avoid this issue.
Throughout the proof, we fix a polynomial as in Theorem 3.9, and define
To simplify the proof, we begin by applying a linearization trick of Pisier [59] in order to factor into polynomials of degree one.
Lemma 9.1.
For any , there exist and operators
with matrix coefficients of dimensions with , so that and for all .
Proof.
This is an immediate consequence of [59, Theorem 1]. ∎
In the following, we fix and a factorization of as in Lemma 9.1. We will also define by replacing in the definition of . Note that as are free, implies that .
It will be convenient to extend the indexing of the above operators cyclically: we will define for all . This implies, for example, that . We similarly define for .
9.1. The first-order support
In the present setting, the moments of the linear functionals and in Theorem 6.1 satisfy
We begin by writing an expression for .
Lemma 9.2.
For every , we have
where we define
Proof.
The proof is identical to that of Lemma 8.2. ∎
We would like to repeat the proof of Lemma 8.3 in the present setting. Recall that the idea of the proof is to write where is cyclically reduced, and then to reason that any word that reduces to is a concatenation of words , , , etc. We can therefore sum over all such concatenations in the expression for .
The problem with this argument is that the above decomposition is not unique: for example, can be decomposed in two different ways or . Thus when we sum over all possible ways of generating such concatenations, we are overcounting the number of words that reduce to . Unlike in Lemma 8.3, the coefficients can have both positive and negative signs and thus we cannot upper bound the moments by overcounting.
We presently show how a more careful analysis can avoid overcounting. We begin by introducing the following basic notion.
Definition 9.3.
For any and , we write if and for .
To interpret this notion, one should think of any word as defining a walk in the Cayley graph of when read from right to left, starting at the identity. Then states that the walk defined by reaches for the first time at its endpoint. Just as we can write
the equivalence notion of Definition 9.3 may also be expressed in terms of matrix elements of the left-regular representation.
Lemma 9.4.
For any and , we have
where we defined (that is, is the orthogonal projection onto ).
Proof.
One may simply note that each term in the sum is the indicator of the event that , , and for . ∎
We now use this identity to express the kind of sum that appears in Lemma 9.2 exactly (without overcounting) in terms of matrix elements of the operators .
Lemma 9.5.
Fix so that is reduced. Define
for and for . Then
where we let .
Proof.
Suppose that . As we assumed that is reduced, there are unique indices so that
Thus we can write
The conclusion follows by applying Lemma 9.4 to the indicators in this identity and summing over the indices . ∎
With this identity in hand, we can proceed exactly as in Lemma 8.3. In the following lemma, recall that we chose an arbitrary in Lemma 9.1.
Lemma 9.6.
For every , we have
where .
Proof.
Suppose first that . Let be the set of that are cyclically reduced, and let . Then every can be uniquely expressed as with , so that is reduced. Applying Lemma 9.5 with and , , , , yields
Using that and by Lemma 9.1 and , and applying Cauchy–Schwarz as in the proof of Lemma 8.3, we can upper bound the right-hand side by .
If we sum instead over on the left-hand side, we can bound in exactly the same manner by applying Lemma 9.5 with and , , . Thus the proof is complete for the case .
The proof for is identical, except that we now omit the terms. ∎
Corollary 9.7.
.
9.2. Proof of Theorem 3.9
The rest of the proof contains no new ideas.
10. Stable representations of the symmetric group
The aim of this section is to prove Theorem 3.14. The main issue here is to extend the basic properties of random permutation matrices of section 5 to general stable representations. With analogues of these properties in place, the remainder of the proof is nearly the same as that of Theorem 3.9.
Throughout this section, we fix a stable representation as in section 3.4.2 and adopt the notations introduced there. In addition, we will denote by the degree of the polynomial and by .
10.1. Basic properties
We will deduce the requisite properties from the much more precise results of [32]. (In fact, the only facts that will be needed here are relatively elementary, as is explained in [45, Remark 31] and [32, §1.5.1].)
Let be a word. Then clearly
where are i.i.d. uniformly distributed elements of . The distribution of defines a probability measure on , called the word measure.
Lemma 10.1.
Let be any word of length at most that does not reduce to the identity, and let . Then we have
where
with , and are polynomials of degree at most .
Proof.
We may assume without loss of generality that is cyclically reduced with length , as is invariant under cyclic reduction.
The definition of the stable representation implies that
(10.1)
Let be a monomial of of degree at least one. Then for sufficiently large, [32, Example 3.6, Definition 6.6, and Proposition 6.8] imply that
(10.2)
where the sum is over a certain collection of quotients of the graph consisting of disjoint cycles of length , disjoint cycles of length , etc., whose edges are colored by . Here we denote by the number of vertices of , by the number of edges of colored by , and by .
It is immediate from the above construction that , and analogously . The validity of (10.2) for all can be read off from the proof of [32, Proposition 6.8]. Moreover, we must have for all that appear in the sum, as by [32, Theorem 1.3] and the following discussion. Thus (10.2) is a rational function of . The remainder of the proof proceeds exactly as in the proof of Lemma 5.1. ∎
We now describe the lowest order asymptotics.
Lemma 10.2.
Fix a word that does not reduce to the identity, and express its reduction as with and . Then
with and for . Here is a constant that depends on .
Proof.
Recall that is a sum of character polynomials of irreducible representations, and that we assumed that does not contain the trivial representation. The case therefore follows from Lemma 5.2 for the standard representation and from [32, Corollary 1.7] for all other irreducible components of .
For , let be a monomial of . Applying [32, Theorem 1.3] and the following discussion as well as [32, Remark 7.3] yields
where is the multiset that contains with multiplicity for . By crudely estimating , the right-hand side is bounded by where denotes the number of partitions of a set with elements. The conclusion follows as and using (10.1). ∎
10.2. Proof of Theorem 3.14
Fix a self-adjoint noncommutative polynomial of degree , and let . Then is a linear combination of words of length at most . Summing separately over words that do and do not reduce to the identity yields
by Lemma 10.1, where is a polynomial of degree at most . Thus
where is the polynomial of degree at most . Here we used that is a polynomial of of degree .
We can now repeat the proofs in sections 6–7 nearly verbatim with the replacements (here we multiply by to ensure in Lemma 6.4) and to conclude the following.
Lemma 10.3.
Let . Fix as in Theorem 3.14 and . Then there exist compactly supported distributions such that
for all , , and . Here , , and is a constant that depends on the choice of stable representation.
Next, we must characterize the supports of the distributions .
Lemma 10.4.
for .
Proof.
Suppose first that . As Lemma 10.2 implies that
as for every , it follows that there is a constant for every so that . The claim follows directly.
The rest of the proof follows in the usual manner.
Appendix A Lower bounds
There is a general principle that any random matrix model that satisfies
for every noncommutative polynomial , automatically also satisfies
provided that the -algebra generated by has a unique faithful trace; this can be shown, for example, along the lines of [46, pp. 16–19]. Since the latter property holds for the limiting model defined in section 3.1 (cf. [60]), this general principle applies to all the models considered in this paper.
For this reason, we have focused exclusively on strong convergence upper bounds in the main part of this paper. On the other hand, such a general principle is usually not needed in practice, as in most cases the lower bound already follows easily from weak convergence alone. For completeness, we presently provide an elementary proof of the lower bound in the setting of this paper along these lines.
Theorem A.1.
Let be any noncommutative polynomial, and fix any stable representation as in section 3.4.2. Then
as .
Note that all the models in this paper may be viewed as special cases of stable representations, so that Theorem A.1 captures random permutation matrices and random regular graphs as well. The proof of Theorem A.1 could also form the basis for quantitative lower bounds, but we do not pursue this here.
We begin by recalling the basic fact that Theorem A.1 will follow directly once we establish weak convergence in the following form.
Lemma A.2 (Weak convergence in probability).
Let us first use this result to complete the proof of Theorem A.1.
Proof of Theorem A.1.
The only subtlety in implementing this idea is that the form of weak convergence that follows from Lemma 10.2 only yields convergence in expectation
(A.1)
as opposed to convergence in probability as in Lemma A.2. We must therefore still show concentration around the mean. The latter can be deduced from (A.1) using a minor additional argument. Note that (A.1) does not require to be self-adjoint.
Proof of Lemma A.2.
Since every stable representation is the direct sum of irreducible stable representations (cf. section 3.4.1), we can assume in the rest of the proof that is irreducible. Moreover, given any ,
where are the diagonal elements of with respect to the matrix coefficients. We can therefore assume in the rest of the proof that .
Let , where is a uniformly distributed random element of that is independent of . Moreover, let where are free generators of . It follows from Schur’s lemma that
for every . Therefore
Moreover, it is readily verified using the definitions in section 3.1 that
Since the objects inside the traces on the left-hand side are noncommutative polynomials in variables and as is self-adjoint, applying (A.1) yields
Given that convergence in expectation follows from (A.1) and that the variance converges to zero, convergence in probability follows immediately. ∎
Acknowledgments
We are grateful to Charles Bordenave, Benoît Collins, Michael Magee, Doron Puder, and Mikael de la Salle for very helpful discussions, and to Nick Cook, Srivatsav Kunnawalkam Elayavalli, Joel Friedman, Sidhanth Mohanty, Peter Sarnak, Nikhil Srivastava, Pierre Youssef, and Yizhe Zhu for helpful comments. We are especially indebted to Michael Magee for suggesting the application to stable representations, and for patiently explaining what they are. RvH thanks Peter Sarnak for organizing a stimulating seminar on strong convergence and related topics at Princeton in Fall 2023. Finally, we are grateful to the referee for suggestions that helped us improve the presentation of the paper.
CFC was supported by a Simons-CIQC postdoctoral fellowship through NSF grant QLCI-2016245. JGV was supported by NSF grant FRG-1952777, Caltech IST, and a Baer–Weiss postdoctoral fellowship. JAT was supported by the Caltech Carver Mead New Adventures Fund, NSF grant FRG-1952777, and ONR grants BRC-N00014-18-1-2363 and N00014-24-1-2223. RvH was supported by NSF grants DMS-2054565 and DMS-2347954.
References
- [1] S. Aaronson. The polynomial method in quantum and classical computing. In Proc. 49th IEEE Symposium on Foundations of Computer Science, page 3. IEEE, 2008.
- [2] A. S. Bandeira, M. T. Boedihardjo, and R. van Handel. Matrix concentration inequalities and free probability. Invent. Math., 234(1):419–487, 2023.
- [3] R. Bauerschmidt, J. Huang, A. Knowles, and H.-T. Yau. Edge rigidity and universality of random regular graphs of intermediate degree. Geom. Funct. Anal., 30(3):693–769, 2020.
- [4] S. Belinschi and M. Capitaine. Strong convergence of tensor products of independent GUE matrices, 2022. Preprint arXiv:2205.07695.
- [5] C. Bordenave. A new proof of Friedman’s second eigenvalue theorem and its extension to random lifts. Ann. Sci. Éc. Norm. Supér. (4), 53(6):1393–1439, 2020.
- [6] C. Bordenave, D. Chafaï, and D. García-Zelada. Convergence of the spectral radius of a random matrix through its characteristic polynomial. Probab. Theory Related Fields, 182(3-4):1163–1181, 2022.
- [7] C. Bordenave and B. Collins. Eigenvalues of random lifts and polynomials of random permutation matrices. Ann. of Math. (2), 190(3):811–875, 2019.
- [8] C. Bordenave and B. Collins. Norm of matrix-valued polynomials in random unitaries and permutations, 2024. Preprint arXiv:2304.05714v2.
- [9] C. Bordenave and B. Collins. Strong asymptotic freeness for independent uniform variables on compact groups associated to nontrivial representations. Invent. Math., 237(1):221–273, 2024.
- [10] P. Borwein and T. Erdélyi. Polynomials and polynomial inequalities, volume 161 of Graduate Texts in Mathematics. Springer-Verlag, New York, 1995.
- [11] J. Bourgain and L. Tzafriri. On a problem of Kadison and Singer. J. Reine Angew. Math., 420:1–43, 1991.
- [12] T. Brailovskaya and R. van Handel. Universality and sharp matrix concentration inequalities. Geom. Funct. Anal., 34(6):1734–1838, 2024.
- [13] E. Cassidy. Random permutations acting on -tuples have near-optimal spectral gap for , 2024. Preprint arXiv:2412.13941.
- [14] C.-F. Chen, A. Bouland, F. G. S. L. Brandão, J. Docter, P. Hayden, and M. Xu. Efficient unitary designs and pseudorandom unitaries from permutations, 2024. Preprint arXiv:2404.16751.
- [15] C.-F. Chen, J. Docter, M. Xu, A. Bouland, and P. Hayden. Efficient unitary -designs from random sums, 2024. Preprint arXiv:2402.09335.
- [16] E. W. Cheney. Introduction to approximation theory. AMS, Providence, RI, 1998.
- [17] M. D. Choi. The full -algebra of the free group on two generators. Pacific J. Math., 87(1):41–48, 1980.
- [18] T. Church, J. S. Ellenberg, and B. Farb. FI-modules and stability for representations of symmetric groups. Duke Math. J., 164(9):1833–1910, 2015.
- [19] B. Collins. Moment methods on compact groups: Weingarten calculus and its applications. In ICM Vol. IV. Sections 5–8, pages 3142–3164. EMS Press, Berlin, 2023.
- [20] B. Collins, A. Guionnet, and F. Parraud. On the operator norm of non-commutative polynomials in deterministic matrices and iid GUE matrices. Camb. J. Math., 10(1):195–260, 2022.
- [21] B. Collins, S. Matsumoto, and J. Novak. The Weingarten calculus. Notices Amer. Math. Soc., 69(5):734–745, 2022.
- [22] D. Coppersmith and T. J. Rivlin. The growth of polynomials bounded at equally spaced points. SIAM J. Math. Anal., 23(4):970–983, 1992.
- [23] S. Coste, G. Lambert, and Y. Zhu. The characteristic polynomial of sums of random permutations and regular digraphs. Int. Math. Res. Not. IMRN, (3):2461–2510, 2024.
- [24] R. Estrada and R. P. Kanwal. Moment sequences for a certain class of distributions. Complex Variables Theory Appl., 9(1):31–39, 1987.
- [25] B. Farb. Representation stability. In Proceedings of the International Congress of Mathematicians—Seoul 2014. Vol. II, pages 1173–1196. Kyung Moon Sa, Seoul, 2014.
- [26] J. Friedman. Relative expanders or weakly relatively Ramanujan graphs. Duke Math. J., 118(1):19–35, 2003.
- [27] J. Friedman. A proof of Alon’s second eigenvalue conjecture and related problems. Mem. Amer. Math. Soc., 195(910):viii+100, 2008.
- [28] J. Friedman, A. Joux, Y. Roichman, J. Stern, and J.-P. Tillich. The action of a few random permutations on -tuples and an application to cryptography. In STACS 96 (Grenoble, 1996), volume 1046 of Lecture Notes in Comput. Sci., pages 375–386. Springer, Berlin, 1996.
- [29] J. Friedman and D.-E. Kohler. The relativized second eigenvalue conjecture of Alon, 2014. Preprint arXiv:1403.3462.
- [30] U. Haagerup, H. Schultz, and S. Thorbjørnsen. A random matrix approach to the lack of projections in . Adv. Math., 204(1):1–83, 2006.
- [31] U. Haagerup and S. Thorbjørnsen. A new application of random matrices: is not a group. Ann. of Math. (2), 162(2):711–775, 2005.
- [32] L. Hanany and D. Puder. Word measures on symmetric groups. Int. Math. Res. Not. IMRN, (11):9221–9297, 2023.
- [33] M. B. Hastings and A. W. Harrow. Classical and quantum tensor product expanders. Quantum Inf. Comput., 9(3-4):336–360, 2009.
- [34] B. Hayes. A random matrix approach to the Peterson-Thom conjecture. Indiana Univ. Math. J., 71(3):1243–1297, 2022.
- [35] B. Hayes, D. Jekel, and S. K. Elayavalli. Consequences of the random matrix solution to the Peterson-Thom conjecture. Anal. PDE, 2025. To appear.
- [36] Y. He. Spectral gap and edge universality of dense random regular graphs. Commun. Math. Phys., 405:181, 2024.
- [37] W. Hide. Effective lower bounds for spectra of random covers and random unitary bundles. Israel J. Math., 2025. To appear.
- [38] W. Hide and M. Magee. Near optimal spectral gaps for hyperbolic surfaces. Ann. of Math. (2), 198(2):791–824, 2023.
- [39] L. Hörmander. The analysis of linear partial differential operators. I. Classics in Mathematics. Springer-Verlag, Berlin, 2003. Distribution theory and Fourier analysis.
- [40] J. Huang, T. McKenzie, and H.-T. Yau. Optimal eigenvalue rigidity of random regular graphs, 2024. Preprint arXiv:2405.12161.
- [41] J. Huang, T. McKenzie, and H.-T. Yau. Ramanujan property and edge universality of random regular graphs, 2024. Preprint arXiv:2412.20263.
- [42] J. Huang and H.-T. Yau. Spectrum of random -regular graphs up to the edge. Comm. Pure Appl. Math., 77(3):1635–1723, 2024.
- [43] H. Kesten. Symmetric random walks on groups. Trans. Amer. Math. Soc., 92:336–354, 1959.
- [44] F. Lehner. Computing norms of free operators with matrix coefficients. Amer. J. Math., 121(3):453–486, 1999.
- [45] N. Linial and D. Puder. Word maps and spectra of random graph lifts. Random Structures Algorithms, 37(1):100–135, 2010.
- [46] L. Louder, M. Magee, and W. Hide. Strongly convergent unitary representations of limit groups. J. Funct. Anal., 288(6):Paper No. 110803, 2025.
- [47] R. C. Lyndon and P. E. Schupp. Combinatorial group theory. Classics in Mathematics. Springer-Verlag, Berlin, 2001. Reprint of the 1977 edition.
- [48] M. Magee and J. Thomas. Strongly convergent unitary representations of right-angled Artin groups, 2023. Preprint arXiv:2308.00863.
- [49] F. L. Metz, G. Parisi, and L. Leuzzi. Finite-size corrections to the spectrum of regular random graphs: An analytical solution. Phys. Rev. E, 90:052109, 2014.
- [50] J. A. Mingo and R. Speicher. Free probability and random matrices, volume 35 of Fields Institute Monographs. Springer, New York, 2017.
- [51] S. Mohanty, R. O’Donnell, and P. Paredes. Explicit near-Ramanujan graphs of every degree, 2019. Preprint arXiv:1909.06988v3.
- [52] A. Nica. Asymptotically free families of random unitaries in symmetric groups. Pacific J. Math., 157(2):295–310, 1993.
- [53] A. Nica. On the number of cycles of given length of a free word in several random permutations. Random Structures Algorithms, 5(5):703–730, 1994.
- [54] A. Nica and R. Speicher. Lectures on the combinatorics of free probability. Cambridge, 2006.
- [55] A. Nilli. On the second eigenvalue of a graph. Discrete Math., 91(2):207–210, 1991.
- [56] R. O’Donnell and X. Wu. Explicit near-fully X-Ramanujan graphs, 2020. Preprint arXiv:2009.02595.
- [57] F. Parraud. On the operator norm of non-commutative polynomials in deterministic matrices and iid Haar unitary matrices. Probab. Theory Related Fields, 182(3-4):751–806, 2022.
- [58] F. Parraud. Asymptotic expansion of smooth functions in polynomials in deterministic matrices and iid GUE matrices. Comm. Math. Phys., 399(1):249–294, 2023.
- [59] G. Pisier. On a linearization trick. Enseign. Math., 64(3-4):315–326, 2018.
- [60] R. T. Powers. Simplicity of the -algebra associated with the free group on two generators. Duke Math. J., 42:151–156, 1975.
- [61] D. Puder. Expansion of random graphs: new proofs, new results. Invent. Math., 201(3):845–908, 2015.
- [62] I. Rivin and N. T. Sardari. Quantum chaos on random Cayley graphs of . Exp. Math., 28(3):328–341, 2019.
- [63] A. Sarid. The spectral gap of random regular graphs. Random Structures Algorithms, 63(2):557–587, 2023.
- [64] H. Schultz. Non-commutative polynomials of independent Gaussian random matrices. The real and symplectic cases. Probab. Theory Related Fields, 131(2):261–309, 2005.
- [65] A. Song. Random minimal surfaces in spheres, 2024. Preprint arXiv:2402.10287.
- [66] V. Vu. Random discrete matrices. In Horizons of combinatorics, volume 17 of Bolyai Soc. Math. Stud., pages 257–280. Springer, Berlin, 2008.
- [67] A. Zygmund. Trigonometric series. Vol. I, II. Cambridge, third edition, 2002.