Non-backtracking spectra of weighted inhomogeneous random graphs
Abstract
We study a model of random graphs in which each edge is drawn independently of the others (but not necessarily with the same distribution), and then assigned a random weight. When the mean degree of such a graph is low, it is known that the spectrum of the adjacency matrix deviates significantly from that of its expected value.
In contrast, we show that over a wide range of parameters, the top eigenvalues of the non-backtracking matrix — a matrix whose powers count the non-backtracking walks between two edges — are close to those of , and that all other eigenvalues are confined to a bulk of known radius. We also obtain a precise characterization of the scalar products between the eigenvectors of and their deterministic counterparts derived from the model parameters.
This result has many applications, in domains ranging from (noisy) matrix completion to community detection, as well as matrix perturbation theory. In particular, we establish as a corollary that a result known as the Baik-Ben Arous-Péché phase transition, previously established only for rotationally invariant random matrices, holds more generally for matrices as above under a mild concentration hypothesis.
Mathematics Subject Classification (2020): 60B20.
Keywords: random graphs, community detection, non-backtracking matrix.
1 Introduction
Let be a symmetric matrix with entries in , and a (symmetric) weight matrix with independent random entries. We define the inhomogeneous undirected random graph associated with the couple as follows: the vertex set is simply , each edge is present in independently with probability , and it carries weight .
The entrywise expected value and variance of the weighted adjacency matrix of are
(1)
2 Applications
2.1 Phase transition in random graphs
Matrix perturbation theory focuses on finding the eigenvalues and eigenvectors of matrices of the form , where is a known matrix and is a perturbation assumed to be “small” in some sense. Celebrated results in this field include the Bauer-Fike theorem [bauer_norms_1960] for asymmetric matrices, and the Weyl [weyl_asymptotische_1912] and Davis-Kahan [yu_useful_2015] theorems for symmetric ones; incidentally, the present paper makes use of those results in its proofs. Finding sharp general theorems without additional assumptions is known to be hard, since the eigenvalues and eigenvectors depend on the interactions between the eigenspaces of and .
In the last two decades, growing attention has been paid to problems of the following form: finding the eigenvectors of (or, in its multiplicative form, ), where is an matrix with low rank (usually fixed) and known eigenvalues, and is a random matrix with known distribution. Examples of this setting are the spiked covariance model [baik_phase_2005, johnstone_pca_2018] and additive perturbations of Wigner matrices [peche_largest_2006, feral_largest_2007, capitaine_largest_2009]. A more systematic study has been performed in [benaychgeorges_singular_2012, benaychgeorges_spectral_2020] on orthogonally invariant random matrices.
A common feature of these results is the existence of a so-called BBP phase transition (named after Baik-Ben Arous-Péché, from the seminal article [baik_phase_2005]): in the limit , each eigenvalue of that lies above a certain threshold is reflected (albeit perturbed) in the spectrum of , and the associated eigenvector is correlated with the corresponding eigenvector of .
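For orientation, the simplest rank-one additive instance of this transition (in the Wigner setting of [benaych-georges_eigenvalues_2011]; the symbols $\theta$, $\sigma$, $v$ below are generic and not tied to the present paper's notation) reads as follows: if $X_n = \theta v v^* + W_n/\sqrt{n}$ with $v$ a unit vector, $\theta > 0$ and $W_n$ a Wigner matrix with entry variance $\sigma^2$, then almost surely
\[
\lambda_1(X_n) \xrightarrow[n \to \infty]{} \begin{cases} \theta + \sigma^2/\theta & \text{if } \theta > \sigma, \\ 2\sigma & \text{otherwise,} \end{cases}
\qquad
|\langle v, u_1(X_n) \rangle|^2 \xrightarrow[n \to \infty]{} \begin{cases} 1 - \sigma^2/\theta^2 & \text{if } \theta > \sigma, \\ 0 & \text{otherwise,} \end{cases}
\]
where $u_1(X_n)$ denotes a unit eigenvector associated with the largest eigenvalue $\lambda_1(X_n)$.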
Phase transition for the adjacency matrix
The adjacency matrix of our random graph can be viewed as a perturbation model by writing
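A minimal generic form of this decomposition (the symbols $A$, $P$, $W$ below are illustrative and need not match the notation used elsewhere in the paper) is
\[
A \;=\; \underbrace{\mathbb{E}A}_{\text{deterministic, low rank}} \;+\; \underbrace{(A - \mathbb{E}A)}_{\text{centered noise}},
\qquad (\mathbb{E}A)_{ij} = P_{ij}\,\mathbb{E}[W_{ij}],
\]
where $A$ stands for the weighted adjacency matrix, $P_{ij}$ for the edge probability and $W_{ij}$ for the edge weight; a further lower-order correction term, omitted in this sketch, may also appear in the exact decomposition.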
The term being negligible with respect to the others, we can see as the sum of a deterministic low-rank matrix and a random noise matrix with i.i.d. centered entries. Further, the entrywise variance of is equal (up to a negligible term) to , so the parameter plays the role of the variance in the Wigner model. We thus expect, whenever (so that is the actual threshold in Theorem LABEL:th:main), to find a phase transition akin to the one in [benaych-georges_eigenvalues_2011]; and indeed the following theorem holds:
Theorem 1.
Let be a matrix couple of size and as above. Assume further that:
-
(i)
the Perron-Frobenius eigenvector of is ; that is ,
-
(ii)
the above eigenvector equation concentrates, i.e. with high probability there exists such that for all ,
(2)
Then, if is such that , there exists an eigenvalue of that verifies
(3)
Further, if the mean degree for all is equal to , and is such that (with and defined in (LABEL:eq:sigma_def) and (LABEL:eq:delta_i_def)), then there exists a normed eigenvector of with corresponding eigenvalue such that
(4)
Whenever , and goes to zero as , then the condition is always verified and the term in (3) vanishes, and the obtained expansion is therefore asymptotically correct. The presence of renders a similar result on the scalar product harder to obtain; however, assuming (that is, the eigenvalues of are somewhat regularly spaced) implies similarly that the term in (4) vanishes.
The obtained expression for , as well as the scalar product expansion, are identical to the ones in [benaych-georges_eigenvalues_2011], for low-rank additive perturbations of Gaussian Wigner matrices. Our result is thus a direct extension of [benaych-georges_eigenvalues_2011] to a larger class of matrices, under a sparsity and concentration condition. Such an extension is not unexpected, in view of results concerning the universality of the semicircle law for Bernoulli random matrices, such as [erdos_local_2013].
An especially interesting particular case of Theorem 1 is the unweighted random graph setting, where for all . In this case, we have , so the eigenvector equation is equivalent to all the average degrees being equal, i.e. for . It is a well-known fact (see for example [feige_spectral_2005]) that for unweighted random graphs the degree concentration property holds with . A slight modification of the proof of Theorem 1 further removes several error terms, and the following corollary ensues:
Corollary 1.
Let be a matrix and as above, with . Assume further that for all ,
Then for all , there exists an eigenvalue of that verifies
and if is such that , there exists a normed eigenvector of such that
In particular we have
This is an improvement on the results of [benaychgeorges_spectral_2020], which only give . The condition ensures that the degrees of concentrate. Since our result is only meaningful whenever , so that the error term is negligible compared with , we do not perform the same detailed analysis as in [alt_extremal_2020]. However, a more precise phase transition around is not excluded.
Theorem 1 is derived from Theorem LABEL:th:main through an adaptation of the Ihara-Bass formula [bass_iharaselberg_1992], obtained by expanding arguments from [benaychgeorges_largest_2019, watanabe_graph_2009].
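For orientation, recall the classical Ihara-Bass formula in the unweighted case: for a graph with $n$ vertices, $m$ edges, adjacency matrix $A$, degree matrix $D$ and non-backtracking matrix $B$ acting on the $2m$ directed edges,
\[
\det(I_{2m} - u B) \;=\; (1 - u^2)^{\,m - n}\, \det\!\big(I_n - u A + u^2 (D - I_n)\big), \qquad u \in \mathbb{C}.
\]
The following proposition provides the weighted analogue needed here: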
Proposition 1.
Let be an eigenvector of the matrix with associated eigenvalue , such that for every . Define the weighted adjacency matrix and the diagonal degree matrix by
3 A Bauer-Fike type bound for almost orthogonal diagonalization
An important tool for tying together the local analysis of is a matrix perturbation theorem derived from the Bauer-Fike theorem. It is essentially a simplification and adaptation of Theorem 8.2 in [bordenave_detection_2020], tailored to our needs. We begin by recalling the original Bauer-Fike theorem:
Theorem 2 (Bauer-Fike Theorem [bauer_norms_1960]).
Let be a diagonalizable matrix, such that for some invertible matrix and . Let be any matrix of size . Then, any eigenvalue of satisfies
(5)
for some , where is the condition number of .
Let be the RHS of (5), and the ball centered at with radius (in ). Let be a set of indices such that
Then the number of eigenvalues of in is exactly .
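In generic notation, the content of the first part of the theorem can be summarized as follows: if $A = V D V^{-1}$ with $D = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$ and $E$ is an arbitrary $n \times n$ matrix, then every eigenvalue $\mu$ of $A + E$ satisfies
\[
\min_{1 \le i \le n} |\mu - \lambda_i| \;\le\; \kappa(V)\, \|E\|, \qquad \kappa(V) := \|V\| \, \|V^{-1}\|,
\]
where $\|\cdot\|$ denotes the operator norm; the symbols above are illustrative and need not match the paper's notation.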
3.1 A custom perturbation lemma for almost diagonalizable matrices
Building on this theorem, we now state this section's first result. Let and be matrices; our nearly diagonalizable matrix shall be with . We shall assume that the are in decreasing order of modulus:
Now, let be a matrix, not necessarily diagonalizable. The assumptions needed for our results are as follows:
-
(i)
For some small constant ,
-
(ii)
The matrices and are well-conditioned: both and are nonsingular, and there exist two constants such that
-
(iii)
There exists another constant such that
-
(iv)
The are well-separated from , in the sense that
(6)
where an exact expression for will be given over the course of the proof.
Then the following result, whose statement and proof (regarding the eigenvalue perturbation) are adapted from [bordenave_detection_2020], holds:
Theorem 3.
Let be a matrix satisfying assumptions (i)-(iv) above, and let be the eigenvalues of with largest modulus. There exists a permutation such that for all
and the other eigenvalues of all have modulus at most . Additionally, if is such that
(7)
then there exists a normed eigenvector associated with such that
where is the minimum distance from to another eigenvalue:
Proof.
We begin by defining an alternative matrix such that . Let be the subspace of such that
and consider the vectors and defined as
4 Preliminary computations
We begin the proof of Theorem LABEL:th:bl_u_bounds with some elementary computations on the entries of and , which will be of use in the later parts of the proof. Most of the results from this section are adapted from [bordenave_detection_2020], although sometimes improved and tailored to our setting.
Bounding and from below
We begin with a simple bound on ; by the Courant-Fischer theorem, for every unit vector , and applying it to yields
where we used that and the Jensen inequality. The Frobenius norm of is then greater than , which in turn implies
(8)
so that is bounded away from zero. In order to prove a similar bound on , we write for
Squaring and summing those inequalities over gives
so that as with ,
(9)
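The variational bound used at the start of this computation is the standard Courant-Fischer characterization: for any symmetric matrix $M$ (generic notation),
\[
\lambda_{\max}(M) \;=\; \max_{\|x\| = 1} x^{\top} M x ,
\]
so that evaluating the quadratic form at any convenient unit vector yields a lower bound on the largest eigenvalue.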
A scalar product lemma
Our second step is a lemma that will be important in the remainder of the proof, leveraging the entrywise bounds on :
Lemma 1.
Let be any unit vectors. Then, for any ,
Proof.
We write the eigendecomposition of as
with the Perron-Frobenius eigenvalue of and its rank. Then, for all ,
This is akin to a delocalization property on the eigenvectors of .
We can now prove the above lemma:
where we extensively used the Cauchy-Schwarz inequality, as well as the bound from (8). ∎
Entrywise bounds for
For more precise entrywise estimates, we define the scale-invariant delocalization parameter
Using the same proof technique as in (9), as well as (8), we have
for any . Recall that, as shown in the proof of Lemma 1, for all
Now, for and ,
where we again used the Cauchy-Schwarz inequality in the last line. This yields
(10)
for any and .
The covariance matrices
We now study the covariance matrices and defined in (LABEL:eq:def_gamma). Our aim is to prove the following lemma:
Lemma 2.
For all , the matrix (resp. ) is positive definite, with all its eigenvalues greater than 1 (resp. ), and such that
5 Local study of
It is a well-known fact (see for example [bordenave_nonbacktracking_2018]) that when the mean degree is low enough (), the graph is locally tree-like — that is, vertex neighbourhoods behave almost like random trees. The goal of this section is to establish this result rigorously, and to provide bounds on neighbourhood sizes.
5.1 Setting and definitions
Labeled rooted graphs
A labeled rooted graph is a triplet consisting of a graph , a root , and a mark function with finite support. We shall denote by the set of labeled rooted graphs with , and will often write for an element of , dropping the mark function. Notions of subgraphs, induced subgraphs and distance extend naturally from regular graphs to this setting.
Labeling trees and graphs
We recall that is the inhomogeneous random graph defined earlier. For each vertex , we can define the associated element of as follows: the root is set to , each vertex is given a mark , and we let for all . The resulting triple is a random element of .
Now, let ; we define the inhomogeneous random tree as follows: first, the root is given a mark . Then, for each vertex already labeled, we draw the number of children of according to , where we recall that
Each child of receives a label drawn independently at random from the distribution
(11)
which sums to 1 by definition. The resulting tree is a random element of , denoted by .
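As an illustration, the sampling procedure just described can be sketched in code. This is only a schematic rendering: the kernel `Q`, the Poisson offspring law with mean `Q[x].sum()`, the child-label distribution proportional to `Q[x, :]`, and all names are assumptions made for the sketch, not the paper's definitions.

```python
import numpy as np

def sample_inhomogeneous_tree(Q, root_mark, depth, rng=None):
    """Sample a multi-type branching tree down to a given depth.

    Hypothetical sketch: Q is an (n x n) nonnegative array standing in for
    the mean-offspring kernel; a vertex with mark x gets Poisson(Q[x].sum())
    children, and each child independently receives mark y with probability
    Q[x, y] / Q[x].sum(), mirroring the normalized distribution (11).
    Returns one list of (parent_index, mark) pairs per generation.
    """
    rng = np.random.default_rng() if rng is None else rng
    generations = [[(None, root_mark)]]  # generation 0: the root
    for _ in range(depth):
        current, nxt = generations[-1], []
        for parent_idx, (_, x) in enumerate(current):
            mean = Q[x].sum()
            if mean == 0:
                continue
            k = rng.poisson(mean)  # offspring count of this vertex
            marks = rng.choice(len(Q), size=k, p=Q[x] / mean)
            nxt.extend((parent_idx, int(y)) for y in marks)
        generations.append(nxt)
    return generations
```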
5.2 Growth properties of trees and graphs
A number of growth properties for neighbourhoods in and are needed to ensure the successful couplings below. By definition of , (resp. ) is dominated by an Erdős-Rényi graph (resp. a Galton-Watson tree with offspring distribution ); we are thus able to directly lift properties from [bordenave_nonbacktracking_2018], Sections 8 and 9.
Lemma 3.
Let be an arbitrary vertex in ; then, there exist absolute constants such that for every , we have
(12)
The same result holds when replacing with the tree defined above.
Taking in the above inequality, one gets
(13)
for any . Summing these inequalities for yields a similar bound for the whole ball: with probability at least , we have
(14)
for all and . In particular, this implies the following useful bound: for any ,
Another consequence of (12) is the following useful lemma:
Lemma 4.
For every , there is a constant such that
(15)
An important note is that the above results apply to any collection of random variables satisfying an inequality like (12); in particular, they also apply to an i.i.d. collection of inhomogeneous random trees of size .
5.3 Local tree-like structure
We first check that the random graph is locally tree-like. We say that a graph is -tangle-free if there is at most one cycle in the -neighbourhood of every vertex in the graph. As mentioned before, the random graph is dominated by an Erdős-Rényi graph ; we can therefore lift the desired properties from [bordenave_nonbacktracking_2018].
Lemma 5.
Let be any integer parameter.
-
(i)
the random graph is -tangle-free with probability at least .
-
(ii)
the probability that a given vertex has a cycle in its -neighbourhood is at most .
We shall assume in the following that the -tangle-free property holds with probability at least for some , which is the case whenever
(17)
We now gather all the results of the current section into one proposition, for ease of reading. The bound assumed above is used to simplify the inequalities below.
Proposition 2.
Let be an inhomogeneous random graph, and a family of random trees as defined above. Let be small enough so that (17) holds. Then there exists an event with probability at least , under which:
-
(i)
the graph is -tangle-free,
-
(ii)
for all , , we have
(18)
-
(iii)
for any , the number of vertices in whose -neighbourhood contains a cycle is at most
Furthermore, for any and , we have
(19)
and the same holds for the family .
5.4 Coupling between rooted graphs and trees
We now turn to the main argument of this proof: we bound the total variation distance between the neighbourhoods of and up to size .
First, recall some definitions: if are two probability measures on the space , their total variation distance is defined as
The following two characterizations of the total variation distance shall be useful: first, whenever is countable, we have
(20)
Additionally,
(21)
where denotes the set of all couplings between and , i.e. probability measures on such that the marginal distributions are and .
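In generic notation, these characterizations can be summarized as
\[
d_{\mathrm{TV}}(\mu, \nu) \;=\; \sup_{A} |\mu(A) - \nu(A)| \;=\; \tfrac{1}{2} \sum_{x} |\mu(x) - \nu(x)| \;=\; \inf_{(X,Y)} \mathbb{P}(X \neq Y),
\]
where the supremum runs over measurable sets, the middle expression is valid when the underlying space is countable, and the infimum runs over all couplings $(X, Y)$ with $X \sim \mu$ and $Y \sim \nu$.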
Denoting by the probability distribution of a variable , the aim of this section is to prove the following:
Proposition 3.
Let for some constant . Then, for every vertex ,
(22)
5.4.1 A total variation distance lemma for sampling processes
For an integer , denote by the set of all multisets with elements in , and by the powerset of . Let , with and , and consider the two probability laws on :
-
•
: each element of is picked with probability ,
-
•
: the size of the multiset is drawn according to a distribution, and each element of receives an i.i.d. label with distribution .
Note that is actually supported on .
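Although the precise bound is the object of Proposition 4 below, the comparison is of classical Poisson-approximation flavour. For instance, Le Cam's inequality (stated here in generic notation, independent of the paper's) asserts that for independent Bernoulli variables $X_1, \dots, X_n$ with parameters $p_1, \dots, p_n$,
\[
d_{\mathrm{TV}}\Big( \mathcal{L}\big(\textstyle\sum_{i=1}^n X_i\big),\ \mathrm{Poi}\big(\textstyle\sum_{i=1}^n p_i\big) \Big) \;\le\; \sum_{i=1}^n p_i^2 .
\]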
Proposition 4.
Let be defined as above. Then
Proof.
Using characterization (20), we have
(23)
We shall treat those two terms separately. First, notice that for , we have
(24)
(25)
and thus by summing over all sets ,
Using the classical inequality , we can bound the second term of (23) as follows:
Both absolute values above can be removed since the expressions inside are nonnegative; further, for , we have . Combining all those estimates, we find
where we again used the logarithm inequalities extensively. Finally, for , we have , which allows us to finish the computation:
(26)
∎
We now introduce a family of probability laws on ; for a subset , let be the measure corresponding to picking each element of with probability .
The total variation distance between these laws and is then easier to bound:
Lemma 6.
For any , we have:
Proof.
Consider the following coupling: we take a realization of , and set . Then, , and we find
This ends the proof, since (21) ensures that . ∎
5.4.2 Proof of Proposition 3
Gathering all the previous results, we are now ready to prove Proposition 3:
Proof.
Define the classical breadth-first exploration process on the neighbourhood of a vertex as follows: start with , and at stage , if is not empty, take a vertex at minimal distance from , reveal its neighbours in , and update . We denote by the filtration generated by the , by the set of vertices already visited at time , and by the first time at which all vertices in have been revealed.
We perform the same exploration process in parallel on , which corresponds to a breadth-first search of the tree. At step , we denote by the distribution of given , and the distribution of the offspring of in (no conditioning is needed there).
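A schematic version of this exploration is sketched below; it is only an illustration, with a hypothetical callback `neighbours(v)` standing in for the revelation of the edges adjacent to a vertex, and it does not capture the coupling with the tree exploration.

```python
from collections import deque

def explore_ball(neighbours, root, radius):
    """Breadth-first revelation of the ball of given radius around `root`.

    Sketch only: `neighbours(v)` is an assumed callback returning the list
    of neighbours of v at the moment v is examined; names are illustrative.
    Returns the visited vertex set and the order in which vertices appeared.
    """
    visited, order = {root}, [root]
    queue = deque([(root, 0)])
    while queue:
        v, dist = queue.popleft()
        if dist == radius:
            continue  # do not expand beyond the prescribed radius
        for w in neighbours(v):
            if w not in visited:
                visited.add(w)
                order.append(w)
                queue.append((w, dist + 1))
    return visited, order
```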
Let denote the event that is a tree and contains no more than vertices; from (14) and Lemma 5, we can choose such that has probability at least for some absolute constant . By iteration, it suffices to show that if holds, there exists a constant such that
(27)
Given , the probability measure is as follows: each element of is selected with probability . Let denote the same probability measure, but where the selection is made over all of . Using Lemma 6, we first find that
On the other hand, Proposition 4 yields
Equation (27) then results from a straightforward application of the triangle inequality. ∎
6 Near eigenvectors of
6.1 Functionals on
6.1.1 Vertex functionals on trees
Similarly to [bordenave_nonbacktracking_2018], quantities of interest in the study of will be tied to functionals on the random inhomogeneous tree defined above. Define a functional on the set of labeled rooted trees by
where is the unique path of length between and . Then the following proposition holds:
Proposition 5.
Let be an integer. For any , the following identities are true:
(28)
(29)
(30)
where we recall that .
6.1.2 Adapting functionals to non-backtracking paths
The matrix considered here acts on (directed) edges, whereas the functionals considered so far are defined on vertices. Consequently, we define the following transformation: for a function , and a random vector with expected value , let
where denotes the graph with the edge removed.
The expectations from Proposition 5 are then adapted as follows:
Proposition 6.
Let be an integer. For any , and , the following identities are true:
(31)
(32)
(33)
The proof of these results makes use of properties specific to moments of Poisson random variables; as with the preceding results, it is deferred to a later section.
6.2 Spatial averaging of graph functionals
In this section, we leverage the coupling obtained above to provide bounds on quantities of the form , for local functions . The tools and results used in this section are essentially identical to those in [bordenave_nonbacktracking_2018], with a few improvements and clarifications added when necessary.
We begin with a result that encodes the fact that the -neighbourhoods in are approximately independent. We say that a function from to is -local if is only a function of .
Proposition 7.
Let for some constant . Let be two -local functions such that for all and is non-decreasing under the addition of edges. Then
Proof.
For , denote by the set ; the vector has independent entries, and we have
for some measurable function .
Define now the graph with vertex set and edge set , and set
The random variable is -measurable, so the Efron-Stein inequality applies:
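in its generic form, for independent random variables $X_1, \dots, X_m$ and $Z = f(X_1, \dots, X_m)$ (illustrative symbols, not necessarily the paper's), the inequality states
\[
\operatorname{Var}(Z) \;\le\; \tfrac{1}{2} \sum_{k=1}^{m} \mathbb{E}\big[ (Z - Z^{(k)})^2 \big],
\]
where $Z^{(k)} = f(X_1, \dots, X_{k-1}, X_k', X_{k+1}, \dots, X_m)$ is obtained by resampling the $k$-th coordinate from an independent copy $X_k'$.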
For a given , the difference is always zero except if , due to the locality property; consequently,
where we used the non-decreasing property of in the last line. By the Cauchy-Schwarz inequality and equation (16), we can write
Using that , and the linearity of expectation, yields the desired bound. ∎
We now use our previous coupling results to provide a concentration bound between a functional on graphs and its expectation on trees:
Proposition 8.
Let and be as in the previous proposition. Then, with probability at least , the following inequality holds:
where is defined as
Proof.
Using the Chebyshev inequality and the variance bound from the preceding proposition, we have with probability at least
It then remains to bound the difference between the expectation term and its counterpart on trees. For , let denote the event that the coupling between and fails; by the locality property, on . Therefore, using the Cauchy-Schwarz inequality,
It is then straightforward to check that both obtained bounds are less than the RHS in the proposition, upon adjusting . ∎
6.3 Structure of near eigenvectors
In the following, the aim is to obtain bounds on the norms and scalar products of the near eigenvectors and defined in (LABEL:eq:def_ui_vi). The main result of this section is as follows:
Proposition 9.
Let be small enough so that (17) holds. On an event with probability , the following inequalities hold for all , and some absolute constant :
(34)
(35)
(36)
(37)
(38)
Proof.
The proof of those inequalities relies on careful applications of Proposition 8 to previously considered functionals. We aim to prove that each of those inequalities holds with probability ; we fix in the following an integer and . Let be the set of vertices such that is not a tree; we place ourselves in the event described in Proposition 2, and as a consequence
We first prove (34); let
The function is clearly -local, and
The function thus defined is non-decreasing under the addition of edges. When , we notice that
hence,
since by the tangle-free property there are at most two paths from to any vertex in . Furthermore, using the results in subsection 5.2, we find that with probability at least
7 Proof of Theorem LABEL:th:bl_u_bounds
Having shown Proposition 9, all that remains is to gather the preceding bounds and simplify them into an easy-to-read summary. Bounds (LABEL:eq:Ustar_U)-(LABEL:eq:Ustar_V), as well as (LABEL:eq:norm_Bl), being straightforward computations, are deferred to the appendix.
7.1 A telescopic trick: proof of (LABEL:eq:Bl_U)
Notice that for a matrix , we have
(39)
where are the columns (or rows) of . To apply this inequality, we write
(40)
and (38) yields
Since , the bounds apply, so that
(41)
We now use the (very crude) inequality inside (41):
The terms in the sum are all less than 1 since , and implies
The bound holds by definition of , and (LABEL:eq:Bl_U) ensues via (39).
7.2 Bounding
Having established candidate values and error bounds for the top eigenvalues of , it remains to bound the remaining eigenvalues (also called the bulk) of the matrix. This is done using a method first employed in [massoulie_community_2014], and leveraged again in a similar setting in [bordenave_nonbacktracking_2018, bordenave_detection_2020]. Our approach will be based on the latter two, adapting the non-backtracking method to the weighted case.
Our first preliminary step is the following lemma:
Lemma 7.
On an event with probability at least , for any , any unit vector and , one has
This bound is proved through the same telescopic sum trick as above; the proof is given in the appendix.
7.2.1 Tangle-free decomposition of
We adapt here the decomposition first used in [bordenave_nonbacktracking_2018] to our setting. Throughout the remainder of this section, we shall consider as an operator on instead of , setting whenever or . This yields a matrix with as a principal submatrix and zeros everywhere else, so the non-zero spectrum is unchanged.
For , and , we define the set of non-backtracking paths of length from to ; further, for an edge we define the indicator variable of , and , so that is the (weighted) adjacency matrix of .
We then have that
Define the set of -tangle-free paths (i.e. the set of paths such that the subgraph induced by is tangle-free). Then, whenever the graph is tangle-free, for all the matrix is equal to , with
Define now the “centered” versions of the weighted and unweighted adjacency matrices and by
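A plausible generic form of this centering — with hypothetical symbols $M_{uv}$ for the edge indicator and $A_{uv} = M_{uv} W_{uv}$ for the weighted adjacency entry, which need not match the paper's notation — is
\[
\underline{A}_{uv} \;:=\; A_{uv} - \mathbb{E}[A_{uv}] \;=\; M_{uv} W_{uv} - P_{uv}\, \mathbb{E}[W_{uv}],
\qquad
\underline{M}_{uv} \;:=\; M_{uv} - P_{uv},
\]
so that both centered matrices have entrywise mean zero.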
Appendix A Applications of Theorem LABEL:th:main
A.1 Proof of Proposition 1
Let be an eigenvector of associated with the eigenvalue ; the eigenvalue equation for reads
(42)
On the other hand, the definition expands to
Applying equation (42) to and yields