
Non-backtracking spectra of weighted inhomogeneous random graphs

Ludovic Stephan
[email protected]
Corresponding author
Département d’informatique de l’ENS, ENS, CNRS, PSL University, Paris, France; Inria, Paris, France; Sorbonne Université, Paris, France
   Laurent Massoulié
[email protected]
Microsoft Research-Inria Joint Centre, Paris, France
Abstract

We study a model of random graphs where each edge is drawn independently of the others (but not necessarily with the same distribution), and then assigned a random weight. When the mean degree of such a graph is low, it is known that the spectrum of the adjacency matrix A deviates significantly from that of its expected value 𝔼A.

In contrast, we show that over a wide range of parameters the top eigenvalues of the non-backtracking matrix B — a matrix whose powers count the non-backtracking walks between two edges — are close to those of 𝔼A, and all other eigenvalues are confined in a bulk with known radius. We also obtain a precise characterization of the scalar product between the eigenvectors of B and their deterministic counterparts derived from the model parameters.

This result has many applications, in domains ranging from (noisy) matrix completion to community detection, as well as matrix perturbation theory. In particular, we establish as a corollary that a result known as the Baik-Ben Arous-Péché phase transition, previously established only for rotationally invariant random matrices, holds more generally for matrices A as above under a mild concentration hypothesis.

Mathematics Subject Classification (2020): 60B20.
Keywords: random graphs, community detection, non-backtracking matrix.

1 Introduction

Let P ∈ ℳₙ(ℝ) be a symmetric n×n matrix with entries in [0,1], and W a (symmetric) weight matrix with independent random entries. We define the inhomogeneous undirected random graph G = (V,E) associated with the couple (P,W) as follows: the vertex set is simply V = [n], and each edge {u,v} is present in E independently with probability P_uv, in which case it holds weight W_uv.
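To fix ideas, here is a minimal sketch (in Python, with hypothetical helper names; the weight sampler passed as `sample_weight` is an illustrative assumption) of how one realization of this model can be drawn:

```python
import numpy as np

def sample_inhomogeneous_graph(P, sample_weight, rng):
    """Sample the weighted adjacency matrix of G = (V, E): each edge {u, v}
    is present independently with probability P[u, v], and a present edge
    carries the random weight returned by sample_weight."""
    n = P.shape[0]
    A = np.zeros((n, n))
    for u in range(n):
        for v in range(u + 1, n):  # undirected: only u < v, then symmetrize
            if rng.random() < P[u, v]:
                A[u, v] = A[v, u] = sample_weight(u, v, rng)
    return A

rng = np.random.default_rng(0)
n = 8
P = np.full((n, n), 0.5)
np.fill_diagonal(P, 0.0)          # no self-loops
A = sample_inhomogeneous_graph(P, lambda u, v, r: r.normal(1.0, 0.1), rng)
```

The resulting matrix is symmetric with zero diagonal, matching the undirected graph described above.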

The entrywise expected value and variance of the weighted adjacency matrix of G are

\mathbb{E}A=P\circ\mathbb{E}W,\qquad\operatorname{Var}(A_{uv})=P_{uv}\,\mathbb{E}\left[W_{uv}^{2}\right]-\left(P_{uv}\,\mathbb{E}W_{uv}\right)^{2}. (1)

2 Applications

2.1 Phase transition in random graphs

Matrix perturbation theory focuses on finding the eigenvalues and eigenvectors of matrices of the form X+Δ, where X is a known matrix and Δ is a perturbation assumed “small” in a suitable sense. Celebrated results in this field include the Bauer-Fike theorem [bauer_norms_1960] for asymmetric matrices, and the Weyl [weyl_asymptotische_1912] and Davis-Kahan [yu_useful_2015] theorems for symmetric ones; incidentally, the present paper makes use of those results in its proofs. Finding sharp general theorems without additional assumptions is known to be hard, since the eigenvalues and eigenvectors depend on the interactions between the eigenspaces of X and Δ.

In the last two decades, growing attention has been paid to problems of the following form: finding the eigenvectors of X_n + P_n (or, in its multiplicative form, X_n(I_n + P_n)), where P_n is an n×n matrix with low rank r ≪ n (usually fixed) and known eigenvalues, and X_n is a random matrix with known distribution. Examples of this setting are the spiked covariance model [baik_phase_2005, johnstone_pca_2018] and additive perturbations of Wigner matrices [peche_largest_2006, feral_largest_2007, capitaine_largest_2009]. A more systematic study has been performed in [benaychgeorges_singular_2012, benaychgeorges_spectral_2020] on orthogonally invariant random matrices.

A staple of those results is the existence of a so-called BBP phase transition (named after Baik-Ben Arous-Péché, from the seminal article [baik_phase_2005]): in the limit n → ∞, each eigenvalue of P_n that is above a certain threshold gets reflected (albeit perturbed) in the spectrum of X_n + P_n, with the associated eigenvector correlated to the one of P_n.
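This transition is easy to observe numerically. The following sketch (a Gaussian Wigner matrix plus a rank-one spike — an illustrative choice of X_n and P_n, not the model of this paper) shows the top eigenvalue detaching from the bulk edge 2 and landing near θ + 1/θ once the spike θ exceeds the threshold 1:

```python
import numpy as np

rng = np.random.default_rng(1)
n, theta = 2000, 3.0

# Wigner noise X_n, normalized so that its bulk spectrum fills [-2, 2]
X = rng.normal(size=(n, n)) / np.sqrt(n)
X = (X + X.T) / np.sqrt(2)

# rank-one deterministic perturbation P_n with eigenvalue theta > 1
u = np.ones(n) / np.sqrt(n)
lam1 = np.linalg.eigvalsh(X + theta * np.outer(u, u))[-1]

# BBP prediction: lam1 is close to theta + 1/theta, while without the
# spike the top eigenvalue would stick to the bulk edge 2
```

With θ below 1, the same experiment leaves the top eigenvalue at the bulk edge, which is the other side of the transition.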

Phase transition for the adjacency matrix

The adjacency matrix A of our random graph G can be viewed as a perturbation model by writing

A=\mathbb{E}A+(A-\mathbb{E}A)=Q-\operatorname{diag}(Q)+(A-\mathbb{E}A).

The term \operatorname{diag}(Q) being negligible with respect to the others, we can see A as the sum of a deterministic low-rank matrix and a random noise matrix with independent centered entries. Further, the entrywise variance of A is equal (up to a negligible term) to K, so the parameter ρ can be seen as an equivalent of the variance in the Wigner model. We thus expect, whenever √ρ ≫ L (so that √ρ is the actual threshold in Theorem LABEL:th:main), to find a phase transition akin to the one in [benaych-georges_eigenvalues_2011]; and indeed the following theorem holds:

Theorem 1.

Let (P,W) be a matrix couple of size n×n and r, b, d, τ, L as above. Assume further that:

  1. (i)

    the Perron-Frobenius eigenvector of K is 𝟏; that is, K𝟏 = ρ𝟏,

  2. (ii)

    the above eigenvector equation concentrates, i.e. with high probability there exists ε ≤ 1/2 such that for all i ∈ [n],

    \left|\sum_{j\sim i}W_{ij}^{2}-\rho\right|\leq\varepsilon\rho (2)

Then, if i ∈ [r₀] is such that μᵢ² ≥ 2L², there exists an eigenvalue νᵢ of A that verifies

\nu_{i}=\mu_{i}+\frac{\rho}{\mu_{i}}+\frac{\rho}{\mu_{i}}\cdot O\left(\frac{L}{\mu_{i}}+\frac{L^{2}}{\mu_{i}^{2}}+\varepsilon\right). (3)

Further, if the mean degree d_j is equal to d₀ > 1 for all j, and i is such that δᵢ ≥ 2σ (with σ and δᵢ defined in (LABEL:eq:sigma_def) and (LABEL:eq:delta_i_def)), then there exists a normed eigenvector ζ of A with corresponding eigenvalue νᵢ such that

\langle\zeta,\varphi_{i}\rangle=\sqrt{1-\frac{\rho}{\mu_{i}^{2}}}+O\left[\frac{1}{\delta_{i}-\sigma}\left(\frac{L\rho}{\mu_{i}^{2}}+\frac{L^{2}\rho}{\mu_{i}^{3}}+\varepsilon\frac{\rho}{\mu_{i}}\right)\right]. (4)

Whenever ρ ≫ L² and ε goes to zero as n → ∞, the condition μᵢ² ≥ 2L² is always satisfied and the O(·) term in (3) vanishes; the obtained expansion is therefore asymptotically exact. The presence of δᵢ makes a similar result on the scalar product harder to obtain; however, assuming δᵢ = Θ(√ρ) (that is, the eigenvalues of Q are somewhat regularly spaced) similarly implies that the O(·) term in (4) vanishes.

The obtained expression for νᵢ, as well as the scalar product expansion, are identical to the ones in [benaych-georges_eigenvalues_2011] for low-rank additive perturbations of Gaussian Wigner matrices. Our result is thus a direct extension of [benaych-georges_eigenvalues_2011] to a larger class of matrices, under a sparsity and concentration condition. Such an extension isn’t unexpected, in view of results concerning the universality of the semicircle law for Bernoulli random matrices, such as [erdos_local_2013].

An especially interesting particular case of Theorem 1 is the unweighted random graph setting, where W_ij = 1 for all i, j. In this case we have K = P, so the eigenvector equation K𝟏 = ρ𝟏 is equivalent to all the average degrees being equal, i.e. dᵢ = d₀ = ρ for i ∈ [n]. It is a well-known fact (see for example [feige_spectral_2005]) that for unweighted random graphs the degree concentration property holds with ε = 2√(log(n)/d₀). A slight modification of the proof of Theorem 1 further removes several error terms, and the following corollary ensues:

Corollary 1.

Let P be an n×n matrix and r, b, d, τ as above, with W = 𝟏𝟏^*. Assume further that for all i ∈ [n],

\sum_{j\in[n]}P_{ij}=d_{0}>16\log(n).

Then for all i ∈ [r₀], there exists an eigenvalue νᵢ of A that verifies

\nu_{i}=\mu_{i}+\frac{d_{0}}{\mu_{i}}+O\left(\sqrt{\frac{\log(n)}{d_{0}}}\frac{d_{0}}{\mu_{i}}\right),

and if i is such that δᵢ > 2σ, there exists a normed eigenvector ζ of A such that

\langle\zeta,\varphi_{i}\rangle=\sqrt{1-\frac{d_{0}}{\mu_{i}^{2}}}+O\left(\frac{1}{\delta_{i}-\sigma}\sqrt{\frac{\log(n)}{d_{0}}}\frac{d_{0}}{\mu_{i}}\right).

In particular we have

\nu_{1}=d_{0}+1+O\left(\sqrt{\frac{\log(n)}{d_{0}}}\right).

This is an improvement on the results of [benaychgeorges_spectral_2020], which only give νᵢ = μᵢ + O(√d₀). The condition d₀ > 16 log(n) ensures that the degrees of G concentrate. Since our result is only meaningful when d₀ ≫ log(n), so that the error term is negligible compared to d₀/μᵢ, we do not perform the same detailed analysis as in [alt_extremal_2020]. However, a more precise phase transition around d₀ ≍ log(n) is not excluded.
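As a numerical sanity check of the ν₁ = d₀ + 1 + O(√(log n/d₀)) expansion (an illustrative simulation, not part of the proof), one can sample an Erdős-Rényi graph with constant expected degree d₀ > 16 log(n) and compare its top adjacency eigenvalue to d₀ + 1:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d0 = 2000, 150.0          # mean degree d0 > 16 log(n), as in Corollary 1

# symmetric 0/1 adjacency matrix of G(n, d0/n), no self-loops
upper = np.triu((rng.random((n, n)) < d0 / n).astype(float), 1)
A = upper + upper.T

nu1 = np.linalg.eigvalsh(A)[-1]
# expected: nu1 close to d0 + 1, up to O(sqrt(log(n)/d0)) fluctuations
```

Here μ₁ = d₀, so the corollary predicts ν₁ ≈ 151, while a naive μ₁ + O(√d₀) bound would only locate it within roughly 12 of d₀.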


Theorem 1 is derived from Theorem LABEL:th:main through an adaptation of the Ihara-Bass formula [bass_iharaselberg_1992], obtained by expanding arguments from [benaychgeorges_largest_2019, watanabe_graph_2009]:

Proposition 1.

Let x be an eigenvector of the matrix B with associated eigenvalue λ, such that λ² ≠ W_ij² for every i, j. Define the weighted adjacency matrix Ã(λ) and the diagonal degree matrix D̃(λ) by

\tilde{A}(\lambda)_{ij}=\mathbf{1}\{i\sim j\}\frac{\lambda W_{ij}}{\lambda^{2}-W_{ij}^{2}},\qquad\tilde{D}(\lambda)_{ii}=1+\sum_{j\sim i}\frac{W_{ij}^{2}}{\lambda^{2}-W_{ij}^{2}}.

3 A Bauer-Fike type bound for almost orthogonal diagonalization

One important tool in tying together the local analysis of G is a matrix perturbation theorem derived from the Bauer-Fike theorem. It mostly consists in a simplification and adaptation of Theorem 8.2 in [bordenave_detection_2020], tailored to our needs. We begin by recalling the original Bauer-Fike theorem:

Theorem 2 (Bauer-Fike Theorem [bauer_norms_1960]).

Let D be a diagonalizable matrix, such that D = V⁻¹ΛV for some invertible matrix V and Λ = diag(λ₁, …, λₙ). Let E be any matrix of size n×n. Then any eigenvalue μ of D+E satisfies

|\mu-\lambda_{i}|\leq\lVert E\rVert\,\kappa(V), (5)

for some i ∈ [n], where κ(V) = ‖V‖‖V⁻¹‖ is the condition number of V.

Let R be the RHS of (5), and let Cᵢ := ℬ(λᵢ, R) be the ball (in ℂ) centered at λᵢ with radius R. Let ℐ ⊆ [n] be a set of indices such that

\left(\bigcup_{i\in\mathcal{I}}C_{i}\right)\cap\left(\bigcup_{i\notin\mathcal{I}}C_{i}\right)=\emptyset.

Then the number of eigenvalues of D+E in ⋃_{i∈ℐ} Cᵢ is exactly |ℐ|.
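The theorem can be illustrated with a small numerical sketch (the matrices below are arbitrary illustrative choices): every eigenvalue of the perturbed matrix lies within the Bauer-Fike radius ‖E‖κ(V) of some eigenvalue of the original.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6

# a diagonalizable D = V Lambda V^{-1} with known eigenvalues 1, ..., 6
V = rng.normal(size=(n, n)) + 3 * np.eye(n)   # reasonably well-conditioned
lam = np.arange(1.0, n + 1.0)
D = V @ np.diag(lam) @ np.linalg.inv(V)

E = 1e-3 * rng.normal(size=(n, n))            # a small perturbation
R = np.linalg.norm(E, 2) * np.linalg.cond(V)  # Bauer-Fike radius ||E|| kappa(V)

# distance from each eigenvalue of D + E to the spectrum of D
dists = [np.min(np.abs(lam - mu)) for mu in np.linalg.eigvals(D + E)]
```

Since the λᵢ are well separated here, each perturbed eigenvalue also sits in its own ball Cᵢ, matching the counting statement above.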

3.1 A custom perturbation lemma for almost diagonalizable matrices

Building on this theorem, we now present this section’s first result. Let U = (u₁, …, u_r) and V = (v₁, …, v_r) be n×r matrices; our nearly diagonalizable matrix shall be S = UΣV* with Σ = diag(θ₁, …, θ_r). We shall assume that the θᵢ are in decreasing order of modulus:

|\theta_{r}|\leq|\theta_{r-1}|\leq\cdots\leq|\theta_{1}|=1.

Now, let A be an n×n matrix, not necessarily diagonalizable. The assumptions needed for our results are as follows:

  1. (i)

    For some small constant ε>0\varepsilon>0,

    \lVert A-S\rVert\leq\varepsilon.
  2. (ii)

    The matrices UU and VV are well-conditioned: both UUU^{*}U and VVV^{*}V are nonsingular, and there exist two constants α,β>1\alpha,\beta>1 such that

    \lVert U^{*}U\rVert\leq\alpha,\qquad\lVert V^{*}V\rVert\leq\alpha,
    \lVert(U^{*}U)^{-1}\rVert\leq\beta,\qquad\lVert(V^{*}V)^{-1}\rVert\leq\beta.
  3. (iii)

    There exists another constant 0<δ<10<\delta<1 such that

    \lVert U^{*}V-I_{r}\rVert_{\infty}\leq\delta.
  4. (iv)

    The θi\theta_{i} are well-separated from 0, in the sense that

    |\theta_{r}|>2\sigma, (6)

    where an exact expression for σ\sigma will be given over the course of the proof.

Then the following result, whose statement and proof (regarding the eigenvalue perturbation) are adapted from [bordenave_detection_2020], holds:

Theorem 3.

Let A be a matrix satisfying assumptions (i)-(iv) above, and let |λ₁| ≥ |λ₂| ≥ ⋯ ≥ |λ_r| be the r eigenvalues of A with largest modulus. There exists a permutation π such that, for all i ∈ [r],

|\lambda_{\pi(i)}-\theta_{i}|\leq r\sigma,

and the other n−r eigenvalues of A all have modulus at most σ. Additionally, if i is such that

B(\theta_{i},\sigma)\cap\left(\bigcup_{j\neq i}B(\theta_{j},\sigma)\right)=\emptyset, (7)

then there exists a normed eigenvector ξ associated with λ_{π(i)} such that

\left\lVert\xi-\frac{u_{i}}{\lVert u_{i}\rVert}\right\rVert\leq\frac{3\sigma}{\delta_{i}-\sigma},

where δᵢ is the minimum distance from θᵢ to another eigenvalue:

\delta_{i}=\min_{j\neq i}|\theta_{j}-\theta_{i}|\geq 2\sigma.
Proof.

We begin by defining an alternative matrix Ū such that Ū*V = I_r. Let Hᵢ be the subspace of ℝⁿ given by

H_{i}=\operatorname{vect}(v_{j}\ |\ j\neq i),

and consider the vectors ũᵢ and ūᵢ defined as

\tilde{u}_{i}=u_{i}-P_{H_{i}}(u_{i}),\qquad\bar{u}_{i}=\frac{\tilde{u}_{i}}{\langle\tilde{u}_{i},v_{i}\rangle}.

4 Preliminary computations

We begin the proof of Theorem LABEL:th:bl_u_bounds with some elementary computations on the entries of KK and Γ(t)\Gamma^{(t)}, which will be of use in the later parts of the proof. Most of the results from this section are adapted from [bordenave_detection_2020], although sometimes improved and adapted to our setting.

Bounding ρ and L from below

We begin with a simple bound on ρ = ρ(K); by the Courant-Fischer theorem, ρ ≥ ⟨w, Kw⟩ for every unit vector w, and applying this to w = 𝟏/√n yields

\rho\geq\frac{\langle\mathbf{1},K\mathbf{1}\rangle}{n}=\frac{1}{n}\sum_{i,j\in[n]}P_{ij}\,\mathbb{E}\left[W_{ij}^{2}\right]\geq\frac{1}{d}\sum_{i,j\in[n]}P_{ij}^{2}\,\mathbb{E}\left[W_{ij}\right]^{2}=\frac{\lVert Q\rVert_{F}^{2}}{d},

where we used that P_ij ≤ d/n and the Jensen inequality. The squared Frobenius norm of Q is then at least μ₁² = 1, which in turn implies

\rho\geq\frac{1}{d}, (8)

so that ρ is bounded away from zero. In order to prove a similar bound on L, we write, for x ∈ [n], using Qφ₁ = φ₁ and the Cauchy-Schwarz inequality,

\varphi_{1}(x)=\sum_{y\in[n]}Q_{xy}\varphi_{1}(y)\leq\lVert\varphi_{1}\rVert\sqrt{\sum_{y}Q_{xy}^{2}}\leq\frac{dL}{\sqrt{n}}.

Squaring and summing those inequalities over xx gives

1=\lVert\varphi_{1}\rVert^{2}\leq d^{2}L^{2},

so that as with ρ\rho,

L\geq\frac{1}{d}. (9)
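The bound (8) can be checked numerically on a synthetic instance (an illustrative sketch; here d plays the role of the uniform upper bound n·max P_ij, and the matrices are rescaled so that μ₁ = ‖Q‖ = 1 as in the text):

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 200, 20.0

# P_ij <= d/n, symmetric, zero diagonal
P = (d / n) * rng.random((n, n)); P = (P + P.T) / 2; np.fill_diagonal(P, 0)
m = 1.0 + rng.random((n, n)); m = (m + m.T) / 2     # E[W_ij]
s = m**2 + 0.1                                      # E[W_ij^2] >= (E W_ij)^2

Q = P * m
scale = np.linalg.norm(Q, 2)       # normalize so that mu_1 = ||Q|| = 1
Q = Q / scale
K = P * s / scale**2               # K_ij = P_ij E[W_ij^2], same rescaling

rho = np.linalg.eigvalsh(K)[-1]
# the chain from the text: rho >= ||Q||_F^2 / d >= mu_1^2 / d = 1/d
```

Both inequalities in the chain hold for any choice of the moments, since they only use P_ij ≤ d/n and Jensen.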
A scalar product lemma

Our second step is an important lemma for the following proof, leveraging the entrywise bounds on W:

Lemma 1.

Let φ, φ′ ∈ ℝⁿ be any unit vectors. Then, for any t ≥ 0,

\langle\mathbf{1},K^{t}\varphi\circ\varphi^{\prime}\rangle\leq rd^{2}L^{2}\rho^{t}.
Proof.

We write the eigendecomposition of KK as

K=\sum_{k=1}^{s}\nu_{k}\psi_{k}\psi_{k}^{*},

with ν₁ = ρ the Perron-Frobenius eigenvalue of K and s ≤ r² its rank. Then, for all i ∈ [n],

\sum_{k=1}^{s}\nu_{k}^{2}\psi_{k}(i)^{2}=(K^{2})_{ii}=\sum_{j\in[n]}K_{ij}^{2}=\sum_{j\in[n]}P_{ij}^{2}\,\mathbb{E}\left[W_{ij}^{2}\right]^{2}\leq\sum_{j\in[n]}\left(\frac{d}{n}\right)^{2}L^{4}\leq\frac{d^{2}L^{4}}{n}.

This is akin to a delocalization property on the eigenvectors of KK.

We can now prove the above lemma:

\langle\mathbf{1},K^{t}\varphi\circ\varphi^{\prime}\rangle=\sum_{k=1}^{s}\nu_{k}^{t}\langle\mathbf{1},\psi_{k}\rangle\langle\psi_{k},\varphi\circ\varphi^{\prime}\rangle
\leq\rho^{t-1}\sum_{k=1}^{s}\lVert\psi_{k}\rVert\lVert\mathbf{1}\rVert\cdot|\nu_{k}|\left|\langle\psi_{k},\varphi\circ\varphi^{\prime}\rangle\right|
\leq\rho^{t-1}\sqrt{n}\sum_{i\in[n]}|\varphi(i)||\varphi^{\prime}(i)|\sum_{k=1}^{s}|\nu_{k}||\psi_{k}(i)|
\leq\rho^{t}d\sqrt{n}\sum_{i\in[n]}|\varphi(i)||\varphi^{\prime}(i)|\sqrt{s}\sqrt{\sum_{k=1}^{s}\nu_{k}^{2}\psi_{k}(i)^{2}}
\leq\rho^{t}d\sqrt{n}\sqrt{s}\,\frac{dL^{2}}{\sqrt{n}}\sum_{i}|\varphi(i)||\varphi^{\prime}(i)|
\leq rd^{2}L^{2}\rho^{t},

where we extensively used the Cauchy-Schwarz inequality, as well as the bound ρ⁻¹ ≤ d from (8). ∎

Entrywise bounds for KtK^{t}

For a more precise estimation of entrywise bounds, we define the scale-invariant delocalization parameter

\Psi=\frac{dL^{2}}{\rho}.

Using the same proof technique as in (9), as well as (8), we have

1\leq\Psi\leq d^{2}L^{2}.

Recall that, as shown in the proof of Lemma 1, for all i ∈ [n],

(K^{2})_{ii}\leq\frac{d^{2}L^{4}}{n}=\frac{\Psi^{2}}{n}\rho^{2}.

Now, for t0t\geq 0 and i,j[n]i,j\in[n],

(K^{t})_{ij}=\sum_{k\in[s]}\nu_{k}^{t}\psi_{k}(i)\psi_{k}(j)\leq\rho^{t-2}\sum_{k}\nu_{k}^{2}\left|\psi_{k}(i)\right|\left|\psi_{k}(j)\right|\leq\rho^{t-2}\sqrt{(K^{2})_{ii}(K^{2})_{jj}},

where we again used the Cauchy-Schwarz inequality at the last line. This yields

(K^{t})_{ij}\leq\frac{\Psi^{2}}{n}\rho^{t} (10)

for any t ≥ 1 and i, j ∈ [n].
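The entrywise bound (10) can likewise be verified on a random instance (illustrative parameters; 𝔼[W_ij²] ≤ L² is enforced by construction):

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, L, t = 150, 15.0, 2.0, 4

P = (d / n) * rng.random((n, n)); P = (P + P.T) / 2; np.fill_diagonal(P, 0)
W2 = L**2 * rng.random((n, n)); W2 = (W2 + W2.T) / 2   # E[W_ij^2] <= L^2

K = P * W2                            # K_ij = P_ij E[W_ij^2]
rho = np.linalg.eigvalsh(K)[-1]       # Perron-Frobenius eigenvalue of K
Psi = d * L**2 / rho                  # the delocalization parameter

Kt_max = np.linalg.matrix_power(K, t).max()
bound = (Psi**2 / n) * rho**t         # (K^t)_ij <= Psi^2 rho^t / n
```

Since K is symmetric with nonnegative entries, ρ is its spectral radius and the Cauchy-Schwarz step above applies verbatim, so the bound holds for every entry.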

The covariance matrices

We now study the covariance matrices ΓU(t)\Gamma_{U}^{(t)} and ΓV(t)\Gamma_{V}^{(t)} defined in (LABEL:eq:def_gamma). Our aim is to prove the following lemma:

Lemma 2.

For all t ≥ 1, the matrix Γ_U^{(t)} (resp. Γ_V^{(t)}) is a positive definite matrix, with all its eigenvalues greater than 1 (resp. c⁻¹), and such that

1\leq\lVert\Gamma_{U}^{(t)}\rVert\leq\frac{r^{2}d^{3}L^{2}}{1-\tau}.

5 Local study of GG

It is a well-known fact (see for example [bordenave_nonbacktracking_2018]) that when the mean degree is low enough (d = n^{o(1)}), the graph G is locally tree-like — that is, vertex neighbourhoods behave almost like random trees. The goal of this section is to establish this result rigorously, as well as to provide bounds on neighbourhood sizes.

5.1 Setting and definitions

Labeled rooted graphs

A labeled rooted graph is a triplet g⁎ = (g, o, ι) consisting of a graph g = (V,E), a root o ∈ V, and a mark function ι: V → ℕ with finite support. We shall denote by 𝒢⁎ the set of labeled rooted graphs with V = ℕ, and will often write g⁎ = (g, o) for an element of 𝒢⁎, dropping the mark function. Notions of subgraphs, induced subgraphs and distance extend naturally from regular graphs to this setting.

Labeling trees and graphs

We recall that G is the inhomogeneous random graph defined earlier. For each vertex x ∈ V, we can define the associated element of 𝒢⁎ as follows: the root is set to x, each vertex y ∈ [n] is given a mark ι(y) = y, and we let ι(z) = 0 for all z ∈ ℕ∖[n]. The resulting triplet (G, x, ι) is a random element of 𝒢⁎.

Now, let o ∈ [n]; we define the inhomogeneous random tree as follows: first, the root is given a mark ι(o) = o. Then, for each vertex x already labeled, we draw the number of children of x according to Poi(d_{ι(x)}), where we recall that

d_{\iota(x)}=\sum_{j}P_{\iota(x),j}\leq d.

Each child y of x receives a label drawn independently at random from the distribution

\pi_{\iota(x)}=\left(\frac{P_{\iota(x),1}}{d_{\iota(x)}},\dots,\frac{P_{\iota(x),n}}{d_{\iota(x)}}\right), (11)

which sums to 1 by definition. The resulting tree is a random element of 𝒢⁎, denoted by (T, o).
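The construction above translates directly into a breadth-first sampler (a sketch; the depth cutoff `max_depth` is an added assumption, used only to keep the sampled tree finite):

```python
import numpy as np
from collections import deque

def sample_inhomogeneous_tree(P, o, max_depth, rng):
    """Sample (T, o): a vertex with mark k has Poi(d_k) children, and each
    child receives a mark drawn from pi_k = (P[k,0]/d_k, ..., P[k,n-1]/d_k)."""
    n = P.shape[0]
    degrees = P.sum(axis=1)                 # d_k = sum_j P[k, j]
    marks, edges = {0: o}, []
    queue, nxt = deque([(0, 0)]), 1         # (vertex id, depth), next free id
    while queue:
        v, depth = queue.popleft()
        if depth == max_depth:
            continue
        k = marks[v]
        for _ in range(rng.poisson(degrees[k])):
            marks[nxt] = int(rng.choice(n, p=P[k] / degrees[k]))
            edges.append((v, nxt))
            queue.append((nxt, depth + 1))
            nxt += 1
    return edges, marks

rng = np.random.default_rng(6)
n = 10
P = np.full((n, n), 0.3)                    # every mark has mean offspring 3
edges, marks = sample_inhomogeneous_tree(P, o=0, max_depth=3, rng=rng)
```

Each call returns the edge list of one realization together with the mark of every sampled vertex, the root carrying mark o as prescribed.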

5.2 Growth properties of trees and graphs

A number of growth properties for neighbourhoods in T and G are needed to ensure the successful couplings below. By definition of d, G (resp. (T,o)) is dominated by an Erdős-Rényi graph 𝒢(n, d/n) (resp. a Galton-Watson tree with offspring distribution Poi(d)); we are thus able to directly lift properties from [bordenave_nonbacktracking_2018], Sections 8 and 9.

Lemma 3.

Let v be an arbitrary vertex in G; then there exist absolute constants c₀, c₁ > 0 such that for every s > 0, we have

\mathbb{P}\left(\forall t\geq 1,\ \left|\partial(G,v)_{t}\right|\leq sd^{t}\right)\geq 1-c_{0}e^{-c_{1}s}. (12)

The same result holds when replacing (G,v) with the tree (T,o) defined above.

Taking s = c₁⁻¹ log(c₀n²) in the above inequality, one gets

\mathbb{P}\left(\forall t\geq 1,\ \forall v\in V,\ \left|\partial(G,v)_{t}\right|\leq c_{3}\log(n)d^{t}\right)\geq 1-\frac{1}{n}, (13)

for any n ≥ 3. Summing these inequalities for 1 ≤ t ≤ ℓ yields a similar bound for the whole ball: with probability at least 1 − 1/n, we have

|(G,v)_{t}|\leq c_{4}\log(n)d^{t} (14)

for all v ∈ V and t ≥ 1. In particular, this implies the following useful bound: for any v ∈ V,

\deg(v)\leq c_{4}d\log(n).

Another consequence of (12) is the following useful lemma:

Lemma 4.

For every p ≥ 2, there is a constant c_p such that

\mathbb{E}\left[\max_{v\in V}\sup_{t\geq 1}\left(\frac{\left|\partial(G,v)_{t}\right|}{d^{t}}\right)^{p}\right]\leq c_{p}\log(n)^{p}. (15)

Similarly to the proof of (14), we have

\max_{v\in V}|(G,v)_{t}|^{p}\leq d^{tp}t^{p}\max_{v\in V}\sup_{s\leq t}\frac{\left|\partial(G,v)_{s}\right|^{p}}{d^{sp}},

which yields

\mathbb{E}\left[\max_{v\in V}|(G,v)_{t}|^{p}\right]\leq c_{p}t^{p}\log(n)^{p}d^{tp}. (16)

An important note is that the above results apply to any collection of n random variables satisfying an inequality like (12); in particular, they also apply to an i.i.d. collection of n inhomogeneous random trees.

5.3 Local tree-like structure

We first check that the random graph G is tree-like. We say that a graph g is ℓ-tangle-free if there is at most one cycle in the ℓ-neighbourhood of every vertex in the graph. As mentioned before, the random graph G is dominated by an Erdős-Rényi graph 𝒢(n, d/n); we can therefore lift the desired properties from [bordenave_nonbacktracking_2018].

Lemma 5.

Let ℓ ≤ n be any integer parameter.

  1. (i)

    the random graph G is ℓ-tangle-free with probability at least 1 − ca²d^{4ℓ}/n.

  2. (ii)

    the probability that a given vertex v has a cycle in its ℓ-neighbourhood is at most cad^{2ℓ}/n.

We shall assume in the following that the 2ℓ-tangle-free property holds with probability at least 1 − cn^{−ϵ} for some ϵ > 0, which happens whenever

\ell\leq\frac{1-\epsilon}{10}\log_{d}(n)\leq c_{3}\log(n). (17)

We now gather all the results of the current section into one proposition, for ease of reading. The bound ℓ ≤ c log(n) assumed above is used to simplify the inequalities below.

Proposition 2.

Let G be an inhomogeneous random graph, and (T_x, x)_{x∈[n]} a family of random trees as defined above. Let ℓ be small enough so that (17) holds. Then there exists an event ℰ with probability at least 1 − 1/log(n), under which:

  1. (i)

    the graph G is 2ℓ-tangle-free,

  2. (ii)

    for all v ∈ V, t ≤ 2ℓ, we have

    |(G,v)_{t}|\leq c\log(n)d^{t}, (18)
  3. (iii)

    for any t ≤ 2ℓ, the number of vertices in G whose t-neighbourhood contains a cycle is at most c log(n)²d^{t+1}.

Furthermore, for any t ≤ 2ℓ and p ≥ 1, we have

\mathbb{E}\left[\max_{v\in V}|(G,v)_{t}|^{p}\right]^{\frac{1}{p}}\leq c\log(n)^{2}d^{t}, (19)

and the same holds for the family (T_x, x)_{x∈[n]}.

5.4 Coupling between rooted graphs and trees

We now turn to the main argument of this proof: we bound the total variation distance between the neighbourhoods of (G,x) and (T,x) up to depth ℓ.

First, recall some definitions: if 1,2\mathbb{P}_{1},\mathbb{P}_{2} are two probability measures on the space (Ω,)(\Omega,\mathcal{F}), their total variation distance is defined as

dTV(1,2)=sup𝒜|1(𝒜)2(𝒜)|.\operatorname{d_{TV}}(\mathbb{P}_{1},\mathbb{P}_{2})=\sup_{\mathcal{A}\in\mathcal{F}}|\mathbb{P}_{1}(\mathcal{A})-\mathbb{P}_{2}(\mathcal{A})|.

The following two characterizations of the total variation distance shall be useful: first, whenever Ω\Omega is countable, we have

\operatorname{d_{TV}}(\mathbb{P}_{1},\mathbb{P}_{2})=\frac{1}{2}\left\lVert\mathbb{P}_{1}-\mathbb{P}_{2}\right\rVert_{1}=\frac{1}{2}\sum_{\omega\in\Omega}|\mathbb{P}_{1}(\omega)-\mathbb{P}_{2}(\omega)|. (20)

Additionally,

\operatorname{d_{TV}}(\mathbb{P}_{1},\mathbb{P}_{2})=\min_{\mathbb{P}\in\pi(X_{1},X_{2})}\mathbb{P}(X_{1}\neq X_{2}), (21)

where π(X₁, X₂) denotes the set of all couplings between ℙ₁ and ℙ₂, i.e. probability measures on (Ω², ℱ⊗ℱ) whose marginal distributions are ℙ₁ and ℙ₂.
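On a finite space, the two characterizations can be checked to agree (a toy numerical example; the optimal coupling puts mass min(ℙ₁(ω), ℙ₂(ω)) on the diagonal {X₁ = X₂}):

```python
import numpy as np

# two probability distributions on a four-point space
p1 = np.array([0.5, 0.3, 0.2, 0.0])
p2 = np.array([0.25, 0.25, 0.25, 0.25])

# characterization (20): half of the L1 distance
tv_l1 = 0.5 * np.abs(p1 - p2).sum()

# characterization (21): under the best coupling, P(X1 = X2) = sum min(p1, p2),
# so the minimal disagreement probability is
tv_coupling = 1.0 - np.minimum(p1, p2).sum()
```

Both quantities coincide, as guaranteed by the coupling characterization of the total variation distance.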

Denoting by ℒ(X) the probability distribution of a variable X, the aim of this section is to prove the following:

Proposition 3.

Let ℓ ≤ c₀ log(n) for some constant c₀ > 0. Then, for every vertex v ∈ V,

\operatorname{d_{TV}}(\mathcal{L}((G,v)_{\ell}),\mathcal{L}((T,v)_{\ell}))\leq\frac{c\log(n)^{2}d^{2\ell+2}}{n}. (22)

5.4.1 A total variation distance lemma for sampling processes

For an integer n, denote by 𝒮(n) the set of all multisets with elements in [n], and by 𝒫(n) ⊂ 𝒮(n) the powerset of [n]. Let p₁, …, pₙ ∈ [0,1/2], with Σpᵢ = λ and Σpᵢ² = α, and consider the two probability laws on 𝒮(n):

  • ℙ₁: each element i of [n] is picked with probability pᵢ,

  • ℙ₂: the size of the multiset S is drawn according to a Poi(λ) distribution, and each element of S has an i.i.d. label with distribution (p₁/λ, …, pₙ/λ).

Note that ℙ₁ is actually supported on 𝒫(n).

Proposition 4.

Let ℙ₁, ℙ₂ be defined as above. Then

\operatorname{d_{TV}}(\mathbb{P}_{1},\mathbb{P}_{2})\leq\alpha+\frac{e^{2\alpha}-1}{2}.
Proof.

Using characterization (20), we have

2\operatorname{d_{TV}}(\mathbb{P}_{1},\mathbb{P}_{2})=\sum_{S\in\mathcal{P}(n)}\left|\mathbb{P}_{1}(S)-\mathbb{P}_{2}(S)\right|+\mathbb{P}_{2}(S\notin\mathcal{P}(n)). (23)

We shall treat those two terms separately. First, notice that for S ∈ 𝒫(n), we have

\mathbb{P}_{1}(S)=\prod_{i\in S}p_{i}\prod_{i\notin S}(1-p_{i}), (24)
\mathbb{P}_{2}(S)=\frac{e^{-\lambda}\lambda^{|S|}}{|S|!}\times|S|!\prod_{i\in S}\frac{p_{i}}{\lambda}=e^{-\lambda}\prod_{i\in S}p_{i}, (25)

and thus by summing over all sets SS,

\mathbb{P}_{2}(S\in\mathcal{P}(n))=e^{-\lambda}\prod_{i=1}^{n}(1+p_{i}).

Using the classical inequality log(1+x)xx2/2\log(1+x)\geq x-x^{2}/2, we can bound the second member of (23) as follows:

\mathbb{P}_{2}(S\notin\mathcal{P}(n))=1-e^{-\lambda}\prod_{i=1}^{n}(1+p_{i})\leq 1-e^{-\lambda}e^{\lambda-\alpha/2}\leq\alpha/2.

On the other hand, using again (24) and (25), the first term reduces to

\sum_{S\in\mathcal{P}(n)}\left|\mathbb{P}_{1}(S)-\mathbb{P}_{2}(S)\right|=\sum_{S\in\mathcal{P}(n)}\prod_{i\in S}p_{i}\left|\prod_{i\notin S}(1-p_{i})-e^{-\lambda}\right|
S𝒫(n)iSpi(|eλi=1n(1pi)|+|iS(1pi)i=1n(1pi)|).\displaystyle\leq\sum_{S\in\mathcal{P}(n)}\prod_{i\in S}p_{i}\left(\left|e^{-\lambda}-\prod_{i=1}^{n}(1-p_{i})\right|+\left|\prod_{i\notin S}(1-p_{i})-\prod_{i=1}^{n}(1-p_{i})\right|\right).

Both absolute values above can be removed since the expressions inside are nonnegative; further, for 0p1/20\leq p\leq 1/2, we have log(1x)xx2\log(1-x)\geq-x-x^{2}. Combining all those estimates, we find

\sum_{S\in\mathcal{P}(n)}\left|\mathbb{P}_{1}(S)-\mathbb{P}_{2}(S)\right|\leq e^{-\lambda}(1-e^{-\alpha})\sum_{S\in\mathcal{P}(n)}\prod_{i\in S}p_{i}+\prod_{i=1}^{n}(1-p_{i})\sum_{S\in\mathcal{P}(n)}\prod_{i\in S}p_{i}\left(\prod_{i\in S}\frac{1}{1-p_{i}}-1\right)
\leq\alpha e^{-\lambda}\prod_{i=1}^{n}(1+p_{i})+e^{-\lambda}\left(\prod_{i=1}^{n}\left(1+\frac{p_{i}}{1-p_{i}}\right)-\prod_{i=1}^{n}(1+p_{i})\right)
\leq\alpha+e^{-\lambda}\exp\left(\sum_{i=1}^{n}\frac{p_{i}}{1-p_{i}}\right)-e^{-\frac{\alpha}{2}},

where we again used the logarithm inequalities extensively. Finally, for 0p1/20\leq p\leq 1/2, we have p/(1p)p+2p2p/(1-p)\leq p+2p^{2}, which allows us to finish the computation:

\sum_{S\in\mathcal{P}(n)}\left|\mathbb{P}_{1}(S)-\mathbb{P}_{2}(S)\right|\leq\frac{3}{2}\alpha+e^{2\alpha}-1. (26)

Combining (26) with (23) implies the proposition. ∎
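Proposition 4 can be verified exactly on a small instance: under ℙ₂, by Poisson thinning, the multiplicity of each element i is an independent Poi(pᵢ) variable, which makes both measures products over the coordinates (an illustrative check; truncating the Poisson tails only decreases the computed distance, so the comparison with the bound remains valid):

```python
import math
from itertools import product

p = [0.3, 0.2, 0.1]                  # pick probabilities, all <= 1/2
alpha = sum(q * q for q in p)

def P1(counts):                      # Bernoulli model, supported on 0/1 counts
    if any(c > 1 for c in counts):
        return 0.0
    return math.prod(q if c else 1 - q for q, c in zip(p, counts))

def P2(counts):                      # Poissonized model: counts_i ~ Poi(p_i)
    return math.prod(math.exp(-q) * q**c / math.factorial(c)
                     for q, c in zip(p, counts))

# (truncated) exact total variation over multisets with multiplicities <= 10
tv = 0.5 * sum(abs(P1(c) - P2(c)) for c in product(range(11), repeat=len(p)))
bound = alpha + (math.exp(2 * alpha) - 1) / 2
```

The computed distance sits comfortably below the bound α + (e^{2α} − 1)/2 of the proposition.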

We now introduce a family of probability laws on 𝒮(n); for a subset S ⊆ [n], let ℙ_S be the measure corresponding to picking each element i of S with probability pᵢ.

The total variation distance between those laws and ℙ₁ = ℙ_{[n]} is then easier to bound:

Lemma 6.

For any S ⊆ [n], we have:

\operatorname{d_{TV}}(\mathbb{P}_{1},\mathbb{P}_{S})\leq\sum_{i\notin S}p_{i}.
Proof.

Consider the following coupling: take a realization X of ℙ₁, and set Y = X ∩ S. Then Y ∼ ℙ_S, and we find

(XY)=1(XSc)𝔼[|XSc|]=iSpi\mathbb{P}(X\neq Y)=\mathbb{P}_{1}(X\cap S^{c}\neq\emptyset)\leq\mathbb{E}\left[|X\cap S^{c}|\right]=\sum_{i\notin S}p_{i}

This ends the proof, since (21) ensures that dTV(1,S)(XY)\operatorname{d_{TV}}(\mathbb{P}_{1},\mathbb{P}_{S})\leq\mathbb{P}(X\neq Y). ∎
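As a sanity check, the coupling bound of Lemma 6 can be verified exactly on a toy instance; the probabilities `p` and the subset `S` below are hypothetical, and both laws are computed by enumeration:

```python
from itertools import product

# Exact verification of Lemma 6 on a small instance: P_1 selects each
# i in [n] independently with probability p[i]; P_S does the same, but
# only elements of S can ever be selected.
p = [0.3, 0.1, 0.25, 0.4]   # hypothetical selection probabilities
n = len(p)
S = {0, 2}                  # hypothetical subset

def law(keep):
    """Return the probability of every subset, encoded as a bit tuple,
    when only indices in `keep` may be selected."""
    probs = {}
    for bits in product([0, 1], repeat=n):
        pr = 1.0
        for i, b in enumerate(bits):
            if i in keep:
                pr *= p[i] if b else 1 - p[i]
            elif b:          # indices outside `keep` are never selected
                pr = 0.0
        probs[bits] = pr
    return probs

P1, PS = law(set(range(n))), law(S)
tv = 0.5 * sum(abs(P1[b] - PS[b]) for b in P1)
bound = sum(p[i] for i in range(n) if i not in S)
print(tv, bound)  # the total variation distance respects the bound
```

The coupling of the proof is implicit here: `law(S)` is exactly the law of X∩S when X∼P₁, so the enumeration confirms the bound.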

5.4.2 Proof of Proposition 3

Gathering all the previous results, we are now ready to prove Proposition 3:

Proof.

Define the classical breadth-first exploration process on the neighbourhood of a vertex vv as follows: start with A0={v}A_{0}=\{v\} and at stage t0t\geq 0, if AtA_{t} is not empty, take a vertex vtAtv_{t}\in A_{t} at minimal distance from vv, reveal its neighbours NtN_{t} in VAtV\setminus A_{t}, and update At+1=(AtNt){vt}A_{t+1}=(A_{t}\cup N_{t})\setminus\{v_{t}\}. We denote by (t)t0{(\mathcal{F}_{t})}_{t\geq 0} the filtration generated by (At)t0{(A_{t})}_{t\geq 0}, by Dt=stAsD_{t}=\bigcup_{s\leq t}A_{s} the set of vertices visited up to time tt, and by τ\tau the first time at which all vertices of (G,v){(G,v)}_{\ell} have been revealed.
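A minimal sketch of this exploration in code (the adjacency-set representation and all names are ours; neighbours are revealed among vertices not yet visited, the standard breadth-first convention):

```python
from collections import deque

def explore(adj, v, depth):
    """Breadth-first exploration of the depth-neighbourhood of v.
    adj maps each vertex to its set of neighbours.  Returns the set of
    visited vertices and the parent map of the exploration tree."""
    dist = {v: 0}              # visited vertices, with distance to v
    active = deque([v])        # the active set A_t, ordered by distance
    parent = {}
    while active:
        vt = active.popleft()  # vertex of A_t at minimal distance from v
        if dist[vt] == depth:
            continue           # do not expand beyond the required depth
        for u in adj[vt]:      # reveal the neighbours N_t of v_t
            if u not in dist:
                dist[u] = dist[vt] + 1
                parent[u] = vt
                active.append(u)
    return set(dist), parent

# toy graph: path 0-1-2-3 with a pendant vertex 4 attached to 1
adj = {0: {1}, 1: {0, 2, 4}, 2: {1, 3}, 3: {2}, 4: {1}}
visited, parent = explore(adj, 0, 2)
print(sorted(visited))
```

When the revealed neighbourhood is a tree, the parent map recovers exactly the tree structure, which is the situation compared with (T,v) in the coupling below.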

We perform the same exploration process in parallel on (T,v)(T,v), which corresponds to a breadth-first search of the tree. At step tt, we denote by t\mathbb{P}_{t} the distribution of NtN_{t} given t\mathcal{F}_{t}, and t\mathbb{Q}_{t} the distribution of the offspring of vtv_{t} in TT (no conditioning is needed there).

Let EE_{\ell} denote the event that (G,v){(G,v)}_{\ell} is a tree and contains no more than c1log(n)dc_{1}\log(n)d^{\ell} vertices; from (14) and Lemma 5, we can choose c1c_{1} such that EE_{\ell} has probability at least 1c2d2+1/n1-c_{2}d^{2\ell+1}/n for some absolute constant c2c_{2}. By iteration, it suffices to show that there exists a constant c3>0c_{3}>0 such that, on the event EE_{\ell},

dTV(t,t)c3log(n)d+2nfor alltτ.\operatorname{d_{TV}}(\mathbb{P}_{t},\mathbb{Q}_{t})\leq\frac{c_{3}\log(n)d^{\ell+2}}{n}\quad\text{for all}\quad t\leq\tau. (27)

Given t\mathcal{F}_{t}, the probability measure t\mathbb{P}_{t} is as follows: each element ii of VAtV\setminus A_{t} is selected with probability pi=Pvtip_{i}=P_{v_{t}i}. Let t\mathbb{P}^{\prime}_{t} denote the same probability measure, but where the selection is made over all of VV. Using Lemma 6, we first find that

dTV(t,t)iAtPvtic1log(n)ddn.\operatorname{d_{TV}}(\mathbb{P}_{t},\mathbb{P}^{\prime}_{t})\leq\sum_{i\in A_{t}}P_{v_{t}i}\leq c_{1}\log(n)d^{\ell}\cdot\frac{d}{n}.

On the other hand, Proposition 4 yields

dTV(t,t)c4i=1nPvti2c5d2n.\operatorname{d_{TV}}(\mathbb{P}^{\prime}_{t},\mathbb{Q}_{t})\leq c_{4}\sum_{i=1}^{n}P_{v_{t}i}^{2}\leq c_{5}\,\frac{d^{2}}{n}.

Equation (27) then results from a straightforward application of the triangle inequality. ∎

6 Near eigenvectors of GG

6.1 Functionals on (T,o)(T,o)

6.1.1 Vertex functionals on trees

Similarly to [bordenave_nonbacktracking_2018], quantities of interest in the study of BB will be tied to functionals on the random inhomogeneous tree defined above. Define a functional fφ,tf_{\varphi,t} on the set of labeled rooted trees 𝒯𝒢\mathcal{T}_{*}\subset\mathcal{G}_{*} by

fφ,t(T,o)=xt(T,o)tWι(o),ι(x1)Wι(xt1),ι(xt)φ(ι(xt)),f_{\varphi,t}(T,o)=\sum_{x_{t}\in\partial{(T,o)}_{t}}{W_{\iota(o),\iota(x_{1})}\dots W_{\iota(x_{t-1}),\iota(x_{t})}\varphi(\iota(x_{t}))},

where (o,x1,,xt)(o,x_{1},\dots,x_{t}) is the unique path of length tt between oo and xtx_{t}. Then the following proposition holds:

Proposition 5.

Let t0t\geq 0 be an integer. For any i,j[r]i,j\in[r], the following identities are true:

𝔼[fφi,t(T,x)]=μitφi(x),\displaystyle\mathbb{E}\left[f_{\varphi_{i},t}(T,x)\right]=\mu_{i}^{t}\,\varphi_{i}(x), (28)
𝔼[fφi,t(T,x)fφj,t(T,x)]=(μiμj)ts=0t[Ksφi,j](x)(μiμj)s,\displaystyle\mathbb{E}\left[f_{\varphi_{i},t}(T,x)f_{\varphi_{j},t}(T,x)\right]={(\mu_{i}\mu_{j})}^{t}\sum_{s=0}^{t}{\frac{[K^{s}\varphi^{i,j}](x)}{{(\mu_{i}\mu_{j})}^{s}}}, (29)
𝔼[(fφi,t+1(T,x)μifφi,t(T,x))2]=[Kt+1φi,i](x).\displaystyle\mathbb{E}\left[{\left(f_{\varphi_{i},t+1}(T,x)-\mu_{i}f_{\varphi_{i},t}(T,x)\right)}^{2}\right]=[K^{t+1}\varphi^{i,i}](x). (30)

where we recall that φi,j=φiφj\varphi^{i,j}=\varphi_{i}\odot\varphi_{j}.

6.1.2 Adapting functionals to non-backtracking paths

The matrix BB considered here acts on (directed) edges, whereas the functionals considered so far are defined on vertices. Consequently, we define the following transformation: for a function f:𝒢f:\mathcal{G}_{*}\to\mathbb{R}, and a random vector wVw\in\mathbb{R}^{V} with expected value w¯\bar{w}, let

wf(g,o)=e:e2=owe1f(ge,o),\vec{\partial}_{w}f(g,o)=\sum_{e:e_{2}=o}w_{e_{1}}f(g_{e},o),

where geg_{e} denotes the graph gg with the edge {e1,e2}\{e_{1},e_{2}\} removed.

The expectations from Proposition 5 are then adapted as follows:

Proposition 6.

Let t0t\geq 0 be an integer. For any i,j[r]i,j\in[r], and ϕker(P)\phi\in\ker(P), the following identities are true:

𝔼[wfφi,t(Tx,x)]=[Pw¯](x)𝔼[fφi,t(Tx,x)],\displaystyle\mathbb{E}\left[\vec{\partial}_{w}f_{\varphi_{i},t}(T_{x},x)\right]=[P\bar{w}](x)\cdot\mathbb{E}\left[f_{\varphi_{i},t}(T_{x},x)\right], (31)
𝔼[w(fφi,tfφj,t)(Tx,x)]=[Pw¯](x)𝔼[fφi,t(Tx,x)fφj,t(Tx,x)],\displaystyle\mathbb{E}\left[\vec{\partial}_{w}(f_{\varphi_{i},t}\cdot f_{\varphi_{j},t})(T_{x},x)\right]=[P\bar{w}](x)\cdot\mathbb{E}\left[f_{\varphi_{i},t}(T_{x},x)f_{\varphi_{j},t}(T_{x},x)\right], (32)
𝔼[w[(fφi,t+1μifφi,t)2](Tx,x)]\displaystyle\mathbb{E}\left[\vec{\partial}_{w}[{(f_{\varphi_{i},t+1}-\mu_{i}f_{\varphi_{i},t})}^{2}](T_{x},x)\right]
=[Pw¯](x)𝔼[(fφi,t+1(Tx,x)μifφi,t(Tx,x))2].\displaystyle\qquad\qquad=[P\bar{w}](x)\cdot\mathbb{E}\left[{\left(f_{\varphi_{i},t+1}(T_{x},x)-\mu_{i}f_{\varphi_{i},t}(T_{x},x)\right)}^{2}\right]. (33)

The proof of these results relies on properties specific to the moments of Poisson random variables; like the preceding results, it is deferred to a later section.

6.2 Spatial averaging of graph functionals

In this section, we leverage the coupling obtained above to provide bounds on quantities of the form 1nxVf(G,x)\frac{1}{n}\sum_{x\in V}f(G,x), for local functions ff. The tools and results used in this section are essentially identical to those in [bordenave_nonbacktracking_2018], with a few improvements and clarifications added when necessary.

We begin with a result that encodes the fact that the tt-neighbourhoods in GG are approximately independent. We say that a function ff from 𝒢\mathcal{G}_{*} to \mathbb{R} is tt-local if f(g,o)f(g,o) depends only on (g,o)t{(g,o)}_{t}.

Proposition 7.

Let tc0log(n)t\leq c_{0}\log(n) for some constant c0>0c_{0}>0. Let f,ψ:𝒢f,\psi:\mathcal{G}_{*}\to\mathbb{R} be two tt-local functions such that |f(g,o)|ψ(g,o)|f(g,o)|\leq\psi(g,o) for all (g,o)𝒢(g,o)\in\mathcal{G}_{*} and ψ\psi is non-decreasing under the addition of edges. Then

Var(oVf(G,o))clog(n)4nd2t𝔼[maxoVψ(G,o)4].\operatorname{Var}\left(\sum_{o\in V}f(G,o)\right)\leq c\log{(n)}^{4}nd^{2t}\cdot\sqrt{\mathbb{E}\left[\max_{o\in V}\psi{(G,o)}^{4}\right]}.
Proof.

For xVx\in V, denote by ExE_{x} the set {{u,x}E|ux}\left\{\{u,x\}\in E\;\middle|\;u\leq x\right\}; the components of the vector (E1,,En)(E_{1},\dots,E_{n}) are independent, and we have

Y:=vVf(G,v)=F(E1,,En),Y:=\sum_{v\in V}f(G,v)=F(E_{1},\dots,E_{n}),

for some measurable function FF.
Define now GxG_{x} the graph with vertex set VV and edge set yxEy\bigcup_{y\neq x}E_{y}, and set

Yx=vVf(Gx,v).Y_{x}=\sum_{v\in V}f(G_{x},v).

The random variable YxY_{x} is yxEy\bigcup_{y\neq x}E_{y}-measurable, so the Efron-Stein inequality applies:

Var(Y)x[n]𝔼[(YYx)2].\operatorname{Var}(Y)\leq\sum_{x\in[n]}\mathbb{E}\left[{(Y-Y_{x})}^{2}\right].

For a given xVx\in V, the difference f(G,o)f(Gx,o)f(G,o)-f(G_{x},o) vanishes unless x(G,o)tx\in{(G,o)}_{t}, due to the locality property; consequently,

|YYx|\displaystyle|Y-Y_{x}| oV|f(G,o)f(Gx,o)|\displaystyle\leq\sum_{o\in V}|f(G,o)-f(G_{x},o)|
o(G,x)t(ψ(G,o)+ψ(Gx,o))\displaystyle\leq\sum_{o\in{(G,x)}_{t}}\left(\psi(G,o)+\psi(G_{x},o)\right)
2maxx[n]|(G,x)t|maxoVψ(G,o),\displaystyle\leq 2\max_{x\in[n]}\left|{(G,x)}_{t}\right|\cdot\max_{o\in V}\psi(G,o),

where we used the non-decreasing property of ψ\psi in the last line. By the Cauchy-Schwarz inequality and equation (16), we can write

𝔼[(YYx)2]\displaystyle\mathbb{E}\left[{(Y-Y_{x})}^{2}\right] 4𝔼[maxx[n]|(G,x)t|4]𝔼[maxoVψ(G,o)4]\displaystyle\leq 4\sqrt{\mathbb{E}\left[\max_{x\in[n]}\left|{(G,x)}_{t}\right|^{4}\right]}\cdot\sqrt{\mathbb{E}\left[\max_{o\in V}\psi{(G,o)}^{4}\right]}
c1t2log(n)2d2t𝔼[maxoVψ(G,o)4].\displaystyle\leq c_{1}t^{2}\log{(n)}^{2}d^{2t}\cdot\sqrt{\mathbb{E}\left[\max_{o\in V}\psi{(G,o)}^{4}\right]}.

Using that tc0log(n)t\leq c_{0}\log(n), and the linearity of expectation, yields the desired bound. ∎
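The Efron-Stein step above can be checked exactly on a toy example (our own construction, unrelated to the graph setting): take independent fair bits, let Y count adjacent equal pairs, and let Y_x drop every pair involving coordinate x, so that Y_x does not depend on the x-th bit:

```python
from itertools import product

n = 4  # number of independent fair bits

def Y(bits, skip=None):
    """Number of adjacent equal pairs, excluding pairs touching `skip`."""
    return sum(1 for i in range(n - 1)
               if i != skip and i + 1 != skip and bits[i] == bits[i + 1])

outcomes = list(product([0, 1], repeat=n))
m = len(outcomes)
mean = sum(Y(b) for b in outcomes) / m
var = sum((Y(b) - mean) ** 2 for b in outcomes) / m
# right-hand side of the Efron-Stein inequality: sum_x E[(Y - Y_x)^2]
rhs = sum(sum((Y(b) - Y(b, skip=x)) ** 2 for b in outcomes) / m
          for x in range(n))
print(var, rhs)  # the variance is dominated by the Efron-Stein sum
```

As in the proof, Y_x is an arbitrary function of the other coordinates, which is all the inequality requires.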

We now use our previous coupling results to provide a concentration bound between a functional on graphs and its expectation on trees:

Proposition 8.

Let tt\in\mathbb{N} and f,ψ:𝒢f,\psi:\mathcal{G}_{*}\to\mathbb{R} be as in the previous proposition. Then, with probability at least 11r2log(n)21-\frac{1}{r^{2}\log{(n)}^{2}}, the following inequality holds:

|vVf(G,v)𝔼[x[n]f(Tx,x)]|crlog(n)3dt+1nψ,\left|\sum_{v\in V}f(G,v)-\mathbb{E}\left[\sum_{x\in[n]}f(T_{x},x)\right]\right|\leq c\,r\log{(n)}^{3}d^{t+1}\sqrt{n}\left\lVert\psi\right\rVert_{\star},

where ψ\left\lVert\psi\right\rVert_{\star} is defined as

ψ=(𝔼[maxvVψ(G,v)4])14(maxx[n]𝔼[ψ(Tx,x)2])12.\left\lVert\psi\right\rVert_{\star}={\left(\mathbb{E}\left[\max_{v\in V}\psi{(G,v)}^{4}\right]\right)}^{\frac{1}{4}}\,\vee\,{\left(\max_{x\in[n]}\mathbb{E}\left[\psi{(T_{x},x)}^{2}\right]\right)}^{\frac{1}{2}}.
Proof.

Using the Chebyshev inequality and the variance bound from the preceding proposition, we have with probability at least 11r2log(n)21-\frac{1}{r^{2}\log{(n)}^{2}}

|vVf(G,v)𝔼[vVf(G,v)]|c1rlog(n)3dtnψ.\left|\sum_{v\in V}f(G,v)-\mathbb{E}\left[\sum_{v\in V}f(G,v)\right]\right|\leq c_{1}\,r\log{(n)}^{3}d^{t}\sqrt{n}\left\lVert\psi\right\rVert_{\star}.

It then remains to bound the difference between the expectation term and its counterpart on trees. For xVx\in V, let x\mathcal{E}_{x} denote the event that the coupling between (G,x)t{(G,x)}_{t} and (Tx,x)t{(T_{x},x)}_{t} fails; by the locality property, f(G,x)=f(Tx,x)f(G,x)=f(T_{x},x) outside of x\mathcal{E}_{x}. Therefore, using the Cauchy-Schwarz inequality,

|x[n]𝔼[f(G,x)f(Tx,x)]|x[n]𝔼[|f(G,x)|1x+|f(Tx,x)|1x]\displaystyle\left|\sum_{x\in[n]}\mathbb{E}\left[f(G,x)-f(T_{x},x)\right]\right|\leq\sum_{x\in[n]}\mathbb{E}\left[|f(G,x)|1_{\mathcal{E}_{x}}+|f(T_{x},x)|1_{\mathcal{E}_{x}}\right]
x[n](x)(𝔼[ψ(G,x)2]+𝔼[ψ(Tx,x)2])\displaystyle\leq\sum_{x\in[n]}\sqrt{\mathbb{P}\left\lparen\mathcal{E}_{x}\right\rparen}\left(\sqrt{\mathbb{E}\left[\psi{(G,x)}^{2}\right]}+\sqrt{\mathbb{E}\left[\psi{(T_{x},x)}^{2}\right]}\right)
c2log(n)2d2t+2nx[n](𝔼[ψ(G,x)4]14+𝔼[ψ(Tx,x)2])\displaystyle\leq\sqrt{\frac{c_{2}\log{(n)}^{2}d^{2t+2}}{n}}\cdot\sum_{x\in[n]}\left(\mathbb{E}\left[\psi{(G,x)}^{4}\right]^{\frac{1}{4}}+\sqrt{\mathbb{E}\left[\psi{(T_{x},x)}^{2}\right]}\right)
c3log(n)dt+1nψ.\displaystyle\leq c_{3}\log(n)d^{t+1}\sqrt{n}\,\left\lVert\psi\right\rVert_{\star}.

It is then straightforward to check that both obtained bounds are less than the RHS in the proposition, upon adjusting cc. ∎

6.3 Structure of near eigenvectors

In the following, the aim is to obtain bounds on the norms and scalar products of the near eigenvectors uiu_{i} and viv_{i} defined in (LABEL:eq:def_ui_vi). The main result of this section is as follows:

Proposition 9.

Let \ell be small enough so that (17) holds. On an event with probability at least 1c1/log(n)1-c_{1}/\log(n), the following inequalities hold for all i,j[r]i,j\in[r] and t2t\leq 2\ell, with some absolute constant c>0c>0:

|Btχi,χjμitφi,DPφj|crb2d2log(n)6d2tLtn,\displaystyle\left|\langle B^{t}\chi_{i},\chi_{j}\rangle-\mu_{i}^{t}\langle\varphi_{i},D_{P}\varphi_{j}\rangle\right|\leq\frac{c\,rb^{2}d^{2}\log{(n)}^{6}d^{2t}L^{t}}{\sqrt{n}}, (34)
|Btχi,DWχˇjμit+1δij|crb2d3Llog(n)6d2tLtn,\displaystyle\left|\langle B^{t}\chi_{i},D_{W}\check{\chi}_{j}\rangle-\mu_{i}^{t+1}\delta_{ij}\right|\leq\frac{c\,rb^{2}d^{3}L\log{(n)}^{6}d^{2t}L^{t}}{\sqrt{n}}, (35)
|Btχi,BtχjμitμjtΓU,ij(t)|crb2d2log(n)7d3tL2tn,\displaystyle\left|\langle B^{t}\chi_{i},B^{t}\chi_{j}\rangle-\mu_{i}^{t}\mu_{j}^{t}\Gamma_{U,ij}^{(t)}\right|\leq\frac{c\,rb^{2}d^{2}\log{(n)}^{7}d^{3t}L^{2t}}{\sqrt{n}}, (36)
|(B)tDWχˇi,(B)tDWχˇjμit+1μjt+1Γij(t+1)|crb2d2L2log(n)6d3tL2tn,\displaystyle\left|\langle{(B^{*})}^{t}D_{W}\check{\chi}_{i},{(B^{*})}^{t}D_{W}\check{\chi}_{j}\rangle-\mu_{i}^{t+1}\mu_{j}^{t+1}\Gamma_{ij}^{(t+1)}\right|\leq\frac{c\,rb^{2}d^{2}L^{2}\log{(n)}^{6}d^{3t}L^{2t}}{\sqrt{n}}, (37)
Bt+1χiμiBtχi2rd3L2ρt+1+crb2d3log(n)7d3tL2tn.\displaystyle\left\lVert B^{t+1}\chi_{i}-\mu_{i}B^{t}\chi_{i}\right\rVert^{2}\leq rd^{3}L^{2}\rho^{t+1}+\frac{crb^{2}d^{3}\log{(n)}^{7}d^{3t}L^{2t}}{\sqrt{n}}. (38)
Proof.

The proof of these inequalities relies on careful applications of Proposition 8 to the previously considered functionals. We aim to prove that each of them holds with probability 1c2/rlog(n)1-c_{2}/r\log(n); in what follows, we fix an integer t2t\leq 2\ell and i,j[r]i,j\in[r]. Let 𝒱t\mathcal{V}_{t} be the set of vertices vv such that (G,v)t{(G,v)}_{t} is not a tree; we place ourselves on the event described in Proposition 2, so that

|𝒱t|c3log(n)2dt+1.\left|\mathcal{V}_{t}\right|\leq c_{3}\log{(n)}^{2}d^{t+1}.

We first prove (34); let

f(g,o)=𝟏(g,o)t has no cyclesφj(o)𝟏fφi,t(g,o).f(g,o)=\mathbf{1}_{{(g,o)}_{t}\text{ has no cycles}}\,\varphi_{j}(o)\vec{\partial}_{\mathbf{1}}f_{\varphi_{i},t}(g,o).

The function ff is clearly tt-local, and

|f(g,o)|\displaystyle\left|f(g,o)\right| φiφjdeg(o)|(g,o)t|Lt\displaystyle\leq\left\lVert\varphi_{i}\right\rVert_{\infty}\left\lVert\varphi_{j}\right\rVert_{\infty}\deg(o)\left|\partial{(g,o)}_{t}\right|L^{t}
b2ndeg(o)|(g,o)t|Lt:=ψ(g,o).\displaystyle\leq\frac{b^{2}}{n}\deg(o)\left|{(g,o)}_{t}\right|L^{t}:=\psi(g,o).

The function ψ\psi thus defined is non-decreasing under the addition of edges. When v𝒱tv\notin\mathcal{V}_{t}, we notice that

f(G,v)=φj(v)[TBtχi](v),f(G,v)=\varphi_{j}(v)\cdot[T^{*}B^{t}\chi_{i}](v),

hence,

|Btχi,χjvVf(G,v)|=|v𝒱tφj(v)[TBtχi](v)|2|𝒱t|maxvψ(G,v),\left|\langle B^{t}\chi_{i},\chi_{j}\rangle-\sum_{v\in V}f(G,v)\right|=\left|\sum_{v\in\mathcal{V}_{t}}\varphi_{j}(v)[T^{*}B^{t}\chi_{i}](v)\right|\leq 2|\mathcal{V}_{t}|\max_{v}\psi(G,v),

since by the tangle-free property there are at most two paths from vv to any vertex in (G,v)t{(G,v)}_{t}. Furthermore, using the results in subsection 5.2, we find that with probability at least 11/n1-1/n

maxvψ(G,v)c4b2log(n)2dt+1Ltn.\max_{v}\psi(G,v)\leq\frac{c_{4}\,b^{2}\log{(n)}^{2}d^{t+1}L^{t}}{n}.

7 Proof of Theorem LABEL:th:bl_u_bounds

Having shown Proposition 9, all that remains is to gather the preceding bounds and simplify them into an easy-to-read summary. Bounds (LABEL:eq:Ustar_U)-(LABEL:eq:Ustar_V), as well as (LABEL:eq:norm_Bl), are straightforward computations and are deferred to the appendix.

7.1 A telescopic trick: proof of (LABEL:eq:Bl_U)

Notice that for an r0×r0r_{0}\times r_{0} matrix MM, we have

Mr0maxiMi.\lVert M\rVert\leq r_{0}\max_{i}\lVert M_{i}\rVert. (39)

where the MiM_{i} are the columns (or rows) of MM. To apply this inequality, we write

Buiμiuit=01μit1Bt+1uiμiBtui,\lVert B^{\ell}u_{i}-\mu_{i}^{\ell}u_{i}\rVert\leq\sum_{t=0}^{\ell-1}\mu_{i}^{\ell-t-1}\lVert B^{t+1}u_{i}-\mu_{i}B^{t}u_{i}\rVert, (40)
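For completeness, (40) is simply the triangle inequality applied to the following telescoping identity, whose intermediate terms cancel pairwise:

```latex
B^{\ell}u_{i}-\mu_{i}^{\ell}u_{i}
  =\sum_{t=0}^{\ell-1}\mu_{i}^{\ell-t-1}\left(B^{t+1}u_{i}-\mu_{i}B^{t}u_{i}\right).
```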

and (38) yields

Bt+1uiμiBtui2μi2Bt++1χiμiBt+χi2\displaystyle\lVert B^{t+1}u_{i}-\mu_{i}B^{t}u_{i}\rVert^{2}\leq\mu_{i}^{-2\ell}\lVert B^{t+\ell+1}\chi_{i}-\mu_{i}B^{t+\ell}\chi_{i}\rVert^{2}
μi2(rd3L2ρt++1+crb2d3log(n)7d3(t+)L2(t+)n).\displaystyle\qquad\qquad\leq\mu_{i}^{-2\ell}\left(rd^{3}L^{2}\rho^{t+\ell+1}+\frac{crb^{2}d^{3}\log{(n)}^{7}d^{3(t+\ell)}L^{2(t+\ell)}}{\sqrt{n}}\right).

Since ir0i\leq r_{0}, the bounds μi2ρ1/d\mu_{i}^{2}\geq\rho\geq 1/d apply, so that

Bt+1uiμiBtui2rd3L2ρt++1μi2+crb2d3log(n)7d3t+5L2(t+)n.\lVert B^{t+1}u_{i}-\mu_{i}B^{t}u_{i}\rVert^{2}\leq rd^{3}L^{2}\rho^{t+\ell+1}\mu_{i}^{-2\ell}+\frac{crb^{2}d^{3}\log{(n)}^{7}d^{3t+5\ell}L^{2(t+\ell)}}{\sqrt{n}}. (41)

We now use the (very crude) inequality x+yx+y\sqrt{x+y}\leq\sqrt{x}+\sqrt{y} inside (41):

Buiμiui\displaystyle\lVert B^{\ell}u_{i}-\mu_{i}^{\ell}u_{i}\rVert t=01[μit1rd3/2Lρt++12μi+c1bd3/2log(n)7/2d3t+52Lt+n1/4]\displaystyle\leq\sum_{t=0}^{\ell-1}\left[\mu_{i}^{\ell-t-1}\sqrt{r}d^{3/2}L\rho^{\frac{t+\ell+1}{2}}\mu_{i}^{-\ell}+\frac{c_{1}\,bd^{3/2}\log{(n)}^{7/2}d^{\frac{3t+5\ell}{2}}L^{t+\ell}}{n^{1/4}}\right]
rd3/2Lρ/2t=01(ρμi)t+1+c2bd2log(n)9/2(Ld4)n1/4L.\displaystyle\leq\sqrt{r}d^{3/2}L\rho^{\ell/2}\sum_{t=0}^{\ell-1}{\left(\frac{\sqrt{\rho}}{\mu_{i}}\right)}^{t+1}+c_{2}\,bd^{2}\log{(n)}^{9/2}\frac{{(Ld^{4})}^{\ell}}{n^{1/4}}L^{\ell}.

The terms in the sum are all less than 1 since ir0i\leq r_{0}, and <c3log(n)\ell<c_{3}\log(n) implies

Buiμiuic3rd3/2Llog(n)ρ/2+c2bd2log(n)9/2(Ld4)n1/4L.\lVert B^{\ell}u_{i}-\mu_{i}^{\ell}u_{i}\rVert\leq c_{3}\sqrt{r}d^{3/2}L\log(n)\rho^{\ell/2}+c_{2}bd^{2}\log{(n)}^{9/2}\frac{{(Ld^{4})}^{\ell}}{n^{1/4}}L^{\ell}.

The bound (Ld4)n1/4{(Ld^{4})}^{\ell}\leq n^{1/4} holds by definition of \ell, and (LABEL:eq:Bl_U) ensues via (39).
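Both inequalities driving this subsection, (39) and (40), are deterministic linear-algebra facts, and can be sanity-checked numerically on random data (a sketch with arbitrary sizes and seed):

```python
import numpy as np

rng = np.random.default_rng(0)

# (39): the operator norm of an r0 x r0 matrix is at most r0 times the
# largest Euclidean norm of its columns.
r0 = 5
M = rng.normal(size=(r0, r0))
col_max = max(np.linalg.norm(M[:, i]) for i in range(r0))
assert np.linalg.norm(M, 2) <= r0 * col_max + 1e-9

# (40): telescoping bound, valid for any matrix B, vector u, scalar mu > 0.
B = rng.normal(size=(8, 8)) / 3
u = rng.normal(size=8)
mu, l = 1.3, 5
pw = [u]
for _ in range(l):
    pw.append(B @ pw[-1])          # pw[t] = B^t u
lhs = np.linalg.norm(pw[l] - mu ** l * u)
rhs = sum(mu ** (l - t - 1) * np.linalg.norm(pw[t + 1] - mu * pw[t])
          for t in range(l))
assert lhs <= rhs + 1e-9
print("inequalities (39) and (40) hold on this sample")
```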

7.2 Bounding BPH\lVert B^{\ell}P_{H^{\bot}}\rVert

Having established the candidates and error bounds for the upper eigenvalues of BB^{\ell}, it remains to bound the remaining eigenvalues (also called the bulk) of the matrix. This is done using a method first employed in [massoulie_community_2014], and leveraged again in a similar setting in [bordenave_nonbacktracking_2018, bordenave_detection_2020]. Our approach will be based on the latter two, adapting the non-backtracking method to the weighted case.


Our first preliminary step is the following lemma:

Lemma 7.

On an event with probability at least 11/log(n)1-1/\log(n), for any tt\leq\ell, any unit vector wHw\in H^{\bot} and i[r0]i\in[r_{0}], one has

|(B)tDWχˇi,w|rd3/2L2ρt/2+c4bd3/2log(n)9/2d2Ln1/4.\left|\langle{(B^{*})}^{t}D_{W}\check{\chi}_{i},w\rangle\right|\leq\sqrt{r}d^{3/2}L^{2}\rho^{t/2}+\frac{c_{4}\,bd^{3/2}\log{(n)}^{9/2}d^{2\ell}L^{\ell}}{n^{1/4}}.

This bound is proved via the same telescoping trick as above; the details are given in the appendix.

7.2.1 Tangle-free decomposition of BB^{\ell}

We adapt here the decomposition first used in [bordenave_nonbacktracking_2018] to our setting. Throughout the remainder of this section, we shall consider BB as an operator on E(V)\vec{E}(V) instead of E\vec{E}, setting Bef=0B_{ef}=0 whenever eEe\notin\vec{E} or fEf\notin\vec{E}. This yields a matrix with BB as a principal submatrix and zeros everywhere else, so the non-zero spectrum is unchanged.

For e,fE(V)e,f\in\vec{E}(V) and k0k\geq 0, we define Γkef\Gamma^{k}_{ef} as the set of non-backtracking paths of length kk from ee to ff; further, for an edge ee we define XeX_{e} as the indicator variable of eEe\in\vec{E}, and Ae=XeWeA_{e}=X_{e}W_{e}, so that AA is the (weighted) adjacency matrix of GG.

We then have that

(Bk)ef=γΓk+1efXes=1kAγsγs+1.{(B^{k})}_{ef}=\sum_{\gamma\in\Gamma^{k+1}_{ef}}X_{e}\prod_{s=1}^{k}A_{\gamma_{s}\gamma_{s+1}}.
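This identity can be verified directly on a small weighted graph (our own toy example, with the convention that B has entry W_f on each non-backtracking step e → f, and X_e = 1 for every edge of the realized graph): build B on directed edges and compare its powers with a brute-force sum over non-backtracking paths.

```python
import numpy as np
from itertools import product

# Toy weighted graph: a triangle plus a pendant edge, with made-up weights.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
w = dict(zip(edges, [1.0, 2.0, 0.5, 3.0]))
dir_edges = edges + [(v, u) for u, v in edges]
idx = {e: i for i, e in enumerate(dir_edges)}

def wt(e):
    """Weight of a directed edge (weights are symmetric)."""
    return w.get(e) or w[(e[1], e[0])]

m = len(dir_edges)
B = np.zeros((m, m))
for e, f in product(dir_edges, repeat=2):
    # e -> f iff f starts where e ends and does not backtrack on e
    if e[1] == f[0] and f != (e[1], e[0]):
        B[idx[e], idx[f]] = wt(f)

def nb_sum(e, f, k):
    """Brute-force sum of weight products over non-backtracking
    paths of k steps from edge e to edge f."""
    if k == 0:
        return 1.0 if e == f else 0.0
    return sum(wt(g) * nb_sum(g, f, k - 1)
               for g in dir_edges if e[1] == g[0] and g != (e[1], e[0]))

k = 3
Bk = np.linalg.matrix_power(B, k)
assert all(abs(Bk[idx[e], idx[f]] - nb_sum(e, f, k)) < 1e-9
           for e, f in product(dir_edges, repeat=2))
print("B^%d agrees with the non-backtracking path sums" % k)
```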

Define FkefF^{k}_{ef} as the set of \ell-tangle-free paths in Γkef\Gamma^{k}_{ef} (i.e. the paths γ\gamma such that the subgraph induced by γ\gamma is tangle-free). Then, whenever the graph GG is tangle-free, for all kk\leq\ell the matrix BkB^{k} is equal to B(k)B^{(k)}, with

(B(k))ef=γFk+1efXes=1kAγsγs+1.{(B^{(k)})}_{ef}=\sum_{\gamma\in F^{k+1}_{ef}}X_{e}\prod_{s=1}^{k}A_{\gamma_{s}\gamma_{s+1}}.

Define now the “centered” versions of the weighted and unweighted adjacency matrices A¯\underline{A} and X¯\underline{X} by

A¯ij=AijQijandX¯ij=XijPij.\underline{A}_{ij}=A_{ij}-Q_{ij}\quad\text{and}\quad\underline{X}_{ij}=X_{ij}-P_{ij}.

Appendix A Applications of Theorem LABEL:th:main

A.1 Proof of Proposition 1

Let xx be an eigenvector of BB associated with the eigenvalue λ\lambda; the eigenvalue equation for xx reads

λxe=efWfxf.\lambda x_{e}=\sum_{e\rightarrow f}W_{f}x_{f}. (42)

On the other hand, the definition y=SDWxy=S^{*}D_{W}x expands to

yi=e:e1=iWexe.y_{i}=\sum_{e:e_{1}=i}W_{e}x_{e}.

Applying equation (42) to ee and e1e^{-1} yields

λxe=ye2Wexe1.\lambda x_{e}=y_{e_{2}}-W_{e}x_{e^{-1}}.
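The computation above can be checked numerically: with B_{ef} = W_f on each non-backtracking step e → f, and y = S*D_W x, i.e. y_i = Σ_{e: e_1 = i} W_e x_e, every vector x (not only eigenvectors) satisfies (Bx)_e = y_{e_2} − W_e x_{e^{-1}}. A sketch on a toy weighted graph of our own:

```python
import numpy as np

edges = [(0, 1), (1, 2), (0, 2), (2, 3)]            # toy graph
w = dict(zip(edges, [1.0, 2.0, 0.5, 3.0]))          # made-up weights
dir_edges = edges + [(v, u) for u, v in edges]
idx = {e: i for i, e in enumerate(dir_edges)}
n, m = 4, len(dir_edges)

def wt(e):
    """Weight of a directed edge (weights are symmetric)."""
    return w.get(e) or w[(e[1], e[0])]

B = np.zeros((m, m))
for e in dir_edges:
    for f in dir_edges:
        if e[1] == f[0] and f != (e[1], e[0]):      # non-backtracking step
            B[idx[e], idx[f]] = wt(f)

rng = np.random.default_rng(0)
x = rng.normal(size=m)                              # arbitrary test vector
y = np.zeros(n)
for e in dir_edges:
    y[e[0]] += wt(e) * x[idx[e]]                    # y = S* D_W x

Bx = B @ x
ok = all(abs(Bx[idx[e]] - (y[e[1]] - wt(e) * x[idx[(e[1], e[0])]])) < 1e-9
         for e in dir_edges)
print(ok)
```

For an eigenvector, (Bx)_e = λx_e, which recovers the displayed identity.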