Non-backtracking spectra of weighted inhomogeneous random graphs
Abstract
We study a model of random graphs in which each edge is drawn independently of the others (but not necessarily with the same distribution), and then assigned a random weight. When the mean degree of such a graph is low, it is known that the spectrum of the adjacency matrix deviates significantly from that of its expected value.
In contrast, we show that over a wide range of parameters, the top eigenvalues of the non-backtracking matrix — a matrix whose powers count the non-backtracking walks between two edges — are close to those of , and that all other eigenvalues are confined to a bulk of known radius. We also obtain a precise characterization of the scalar products between the eigenvectors of and their deterministic counterparts derived from the model parameters.
This result has many applications, in domains ranging from (noisy) matrix completion to community detection, as well as matrix perturbation theory. In particular, we establish as a corollary that a result known as the Baik-Ben Arous-Péché phase transition, previously established only for rotationally invariant random matrices, holds more generally for matrices as above under a mild concentration hypothesis.
Mathematics Subject Classification (2020): 60B20.
Keywords: random graphs, community detection, non-backtracking matrix.
1 Introduction
Let be a symmetric matrix with entries in , and a (symmetric) weight matrix with independent random entries. We define the inhomogeneous undirected random graph associated with the couple as follows: the vertex set is simply , each edge is present in independently with probability , and it carries weight .
The entrywise expected value and variance of the weighted adjacency matrix of are
(1)
2 Applications
2.1 Phase transition in random graphs
Matrix perturbation theory focuses on finding the eigenvalues and eigenvectors of matrices of the form , where is a known matrix and is a perturbation assumed to be “small” in some sense. Celebrated results in this field include the Bauer-Fike theorem [bauer_norms_1960] for asymmetric matrices, and the Weyl [weyl_asymptotische_1912] and Davis-Kahan [yu_useful_2015] theorems for symmetric ones; incidentally, the present paper makes use of those results in its proofs. Finding sharp general theorems without additional assumptions is known to be hard, since the eigenvalues and eigenvectors depend on the interactions between the eigenspaces of and .
In the last two decades, growing attention has been paid to problems of the following form: finding the eigenvectors of (or, in its multiplicative form, ), where is an matrix with low rank (usually fixed) and known eigenvalues, and is a random matrix with known distribution. Examples of this setting are the spiked covariance model [baik_phase_2005, johnstone_pca_2018] and additive perturbations of Wigner matrices [peche_largest_2006, feral_largest_2007, capitaine_largest_2009]. A more systematic study has been performed in [benaychgeorges_singular_2012, benaychgeorges_spectral_2020] on orthogonally invariant random matrices.
A common feature of these results is the existence of a so-called BBP phase transition (named after Baik-Ben Arous-Péché, from the seminal article [baik_phase_2005]): in the limit , each eigenvalue of that lies above a certain threshold is reflected (albeit perturbed) in the spectrum of , and the associated eigenvector is correlated with the corresponding eigenvector of .
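For orientation, the simplest rank-one additive instance of this transition (in the Wigner setting of [benaych-georges_eigenvalues_2011]; the symbols $\theta$, $\sigma$, $v$ below are generic and not tied to the present paper's notation) reads as follows: if $X_n = \theta v v^* + W_n/\sqrt{n}$ with $v$ a unit vector, $\theta > 0$ and $W_n$ a Wigner matrix with entry variance $\sigma^2$, then almost surely
\[
\lambda_1(X_n) \xrightarrow[n \to \infty]{} \begin{cases} \theta + \sigma^2/\theta & \text{if } \theta > \sigma, \\ 2\sigma & \text{otherwise,} \end{cases}
\qquad
|\langle v, u_1(X_n) \rangle|^2 \xrightarrow[n \to \infty]{} \begin{cases} 1 - \sigma^2/\theta^2 & \text{if } \theta > \sigma, \\ 0 & \text{otherwise,} \end{cases}
\]
where $u_1(X_n)$ denotes a unit eigenvector associated with the largest eigenvalue $\lambda_1(X_n)$.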
Phase transition for the adjacency matrix
The adjacency matrix of our random graph can be viewed as a perturbation model by writing
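A minimal generic form of this decomposition (the symbols $A$, $P$, $W$ below are illustrative and need not match the notation used elsewhere in the paper) is
\[
A \;=\; \underbrace{\mathbb{E}A}_{\text{deterministic, low rank}} \;+\; \underbrace{(A - \mathbb{E}A)}_{\text{centered noise}},
\qquad (\mathbb{E}A)_{ij} = P_{ij}\,\mathbb{E}[W_{ij}],
\]
where $A$ stands for the weighted adjacency matrix, $P_{ij}$ for the edge probability and $W_{ij}$ for the edge weight; a further lower-order correction term, omitted in this sketch, may also appear in the exact decomposition.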
The term being negligible with respect to the others, we can see as the sum of a deterministic low-rank matrix and a random noise matrix with i.i.d. centered entries. Further, the entrywise variance of is equal (up to a negligible term) to , so the parameter plays the role of the variance in the Wigner model. We thus expect, whenever (so that is the actual threshold in Theorem LABEL:th:main), to find a phase transition akin to the one in [benaych-georges_eigenvalues_2011]; and indeed the following theorem holds:
Theorem 1.
Let be a matrix couple of size and as above. Assume further that:
-
(i)
the Perron-Frobenius eigenvector of is ; that is ,
-
(ii)
the above eigenvector equation concentrates, i.e. with high probability there exists such that for all ,
(2)
Then, if is such that , there exists an eigenvalue of that verifies
(3)
Further, if the mean degree for all is equal to , and is such that (with and defined in (LABEL:eq:sigma_def) and (LABEL:eq:delta_i_def)), then there exists a normed eigenvector of with corresponding eigenvalue such that
(4)
Whenever , and goes to zero as , then the condition is always verified and the term in (3) vanishes, and the obtained expansion is therefore asymptotically correct. The presence of renders a similar result on the scalar product harder to obtain; however, assuming (that is, the eigenvalues of are somewhat regularly spaced) implies similarly that the term in (4) vanishes.
The obtained expression for , as well as the scalar product expansion, are identical to the ones in [benaych-georges_eigenvalues_2011], for low-rank additive perturbations of Gaussian Wigner matrices. Our result is thus a direct extension of [benaych-georges_eigenvalues_2011] to a larger class of matrices, under a sparsity and concentration condition. Such an extension is not unexpected, in view of results concerning the universality of the semicircle law for Bernoulli random matrices, such as [erdos_local_2013].
An especially interesting particular case of Theorem 1 is the unweighted random graph setting, where for all . In this case, we have , so the eigenvector equation is equivalent to all the average degrees being equal, i.e. for . It is a well-known fact (see for example [feige_spectral_2005]) that for unweighted random graphs the degree concentration property holds with . A slight modification of the proof of Theorem 1 further removes several error terms, and the following corollary ensues:
Corollary 1.
Let be a matrix and as above, with . Assume further that for all ,
Then for all , there exists an eigenvalue of that verifies
and if is such that , there exists a normed eigenvector of such that
In particular we have
This is an improvement on the results of [benaychgeorges_spectral_2020], which only give . The condition ensures that the degrees of concentrate. Since our result is only meaningful whenever , so that the error term is negligible compared with , we do not perform the same detailed analysis as in [alt_extremal_2020]. However, a more precise phase transition around is not excluded.
Theorem 1 is derived from Theorem LABEL:th:main through an adaptation of the Ihara-Bass formula [bass_iharaselberg_1992], obtained by expanding arguments from [benaychgeorges_largest_2019, watanabe_graph_2009].
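For orientation, recall the classical Ihara-Bass formula in the unweighted case: for a graph with $n$ vertices, $m$ edges, adjacency matrix $A$, degree matrix $D$ and non-backtracking matrix $B$ acting on the $2m$ directed edges,
\[
\det(I_{2m} - u B) \;=\; (1 - u^2)^{\,m - n}\, \det\!\big(I_n - u A + u^2 (D - I_n)\big), \qquad u \in \mathbb{C}.
\]
The following proposition provides the weighted analogue needed here: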
Proposition 1.
Let be an eigenvector of the matrix with associated eigenvalue , such that for every . Define the weighted adjacency matrix and the diagonal degree matrix by
3 A Bauer-Fike type bound for almost orthogonal diagonalization
An important tool for tying together the local analysis of is a matrix perturbation theorem derived from the Bauer-Fike theorem. It is essentially a simplification and adaptation of Theorem 8.2 in [bordenave_detection_2020], tailored to our needs. We begin by recalling the original Bauer-Fike theorem:
Theorem 2 (Bauer-Fike Theorem [bauer_norms_1960]).
Let be a diagonalizable matrix, such that for some invertible matrix and . Let be any matrix of size . Then, any eigenvalue of satisfies
(5)
for some , where is the condition number of .
Let be the RHS of (5), and the ball centered at with radius (in ). Let be a set of indices such that
Then the number of eigenvalues of in is exactly .
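In generic notation, the content of the first part of the theorem can be summarized as follows: if $A = V D V^{-1}$ with $D = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$ and $E$ is an arbitrary $n \times n$ matrix, then every eigenvalue $\mu$ of $A + E$ satisfies
\[
\min_{1 \le i \le n} |\mu - \lambda_i| \;\le\; \kappa(V)\, \|E\|, \qquad \kappa(V) := \|V\| \, \|V^{-1}\|,
\]
where $\|\cdot\|$ denotes the operator norm; the symbols above are illustrative and need not match the paper's notation.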
3.1 A custom perturbation lemma for almost diagonalizable matrices
Building on this theorem, we now state this section's first result. Let and be matrices; our nearly diagonalizable matrix shall be with . We shall assume that the are in decreasing order of modulus:
Now, let be a matrix, not necessarily diagonalizable. The assumptions needed for our results are as follows:
-
(i)
For some small constant ,
-
(ii)
The matrices and are well-conditioned: both and are nonsingular, and there exist two constants such that
-
(iii)
There exists another constant such that
-
(iv)
The are well-separated from , in the sense that
(6)
where an exact expression for will be given over the course of the proof.
Then the following result, whose statement and proof (regarding the eigenvalue perturbation) are adapted from [bordenave_detection_2020], holds:
Theorem 3.
Let be a matrix satisfying assumptions (i)-(iv) above, and let be the eigenvalues of with largest modulus. There exists a permutation such that for all
and the other eigenvalues of all have modulus at most . Additionally, if is such that
(7)
then there exists a normed eigenvector associated with such that
where is the minimum distance from to another eigenvalue:
Proof.
We begin by defining an alternative matrix such that . Let be the subspace of such that
and consider the vectors and defined as
4 Preliminary computations
We begin the proof of Theorem LABEL:th:bl_u_bounds with some elementary computations on the entries of and , which will be of use in the later parts of the proof. Most of the results from this section are adapted from [bordenave_detection_2020], although sometimes improved and tailored to our setting.
Bounding and from below
We begin with a simple bound on ; by the Courant-Fischer theorem, for every unit vector , and applying it to yields
where we used that and the Jensen inequality. The Frobenius norm of is then greater than , which in turn implies
(8)
so that is bounded away from zero. In order to prove a similar bound on , we write for
Squaring and summing those inequalities over gives
so that as with ,
(9)
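The variational bound used at the start of this computation is the standard Courant-Fischer characterization: for any symmetric matrix $M$ (generic notation),
\[
\lambda_{\max}(M) \;=\; \max_{\|x\| = 1} x^{\top} M x ,
\]
so that evaluating the quadratic form at any convenient unit vector yields a lower bound on the largest eigenvalue.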
A scalar product lemma
Our second step is a lemma that will be important in the remainder of the proof, leveraging the entrywise bounds on :
Lemma 1.
Let be any unit vectors. Then, for any ,
Proof.
We write the eigendecomposition of as
with the Perron-Frobenius eigenvalue of and its rank. Then, for all ,
This is akin to a delocalization property on the eigenvectors of .
We can now prove the above lemma:
where we extensively used the Cauchy-Schwarz inequality, as well as the bound from (8). ∎
Entrywise bounds for
For more precise entrywise estimates, we define the scale-invariant delocalization parameter
Using the same proof technique as in (9), as well as (8), we have
for any . Recall that, as shown in the proof of Lemma 1, for all
Now, for and ,
where we again used the Cauchy-Schwarz inequality in the last line. This yields
(10)
for any and .
The covariance matrices
We now study the covariance matrices and defined in (LABEL:eq:def_gamma). Our aim is to prove the following lemma:
Lemma 2.
For all , the matrix (resp. ) is positive definite, with all its eigenvalues greater than 1 (resp. ), and such that
5 Local study of
It is a well-known fact (see for example [bordenave_nonbacktracking_2018]) that when the mean degree is low enough (), the graph is locally tree-like — that is, vertex neighbourhoods behave almost like random trees. The goal of this section is to establish this result rigorously, and to provide bounds on neighbourhood sizes.
5.1 Setting and definitions
Labeled rooted graphs
A labeled rooted graph is a triplet consisting of a graph , a root , and a mark function with finite support. We shall denote by the set of labeled rooted graphs with , and will often write for an element of , dropping the mark function. Notions of subgraphs, induced subgraphs and distance extend naturally from regular graphs to this setting.
Labeling trees and graphs
We recall that is the inhomogeneous random graph defined earlier. For each vertex , we can define the associated element of as follows: the root is set to , each vertex is given a mark , and we let for all . The resulting triple is a random element of .
Now, let ; we define the inhomogeneous random tree as follows: first, the root is given a mark . Then, for each vertex already labeled, we draw the number of children of according to , where we recall that
Each child of receives a label drawn independently at random from the distribution
(11)
which sums to 1 by definition. The resulting tree is a random element of , denoted by .
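As an illustration, the sampling procedure just described can be sketched in code. This is only a schematic rendering: the kernel `Q`, the Poisson offspring law with mean `Q[x].sum()`, the child-label distribution proportional to `Q[x, :]`, and all names are assumptions made for the sketch, not the paper's definitions.

```python
import numpy as np

def sample_inhomogeneous_tree(Q, root_mark, depth, rng=None):
    """Sample a multi-type branching tree down to a given depth.

    Hypothetical sketch: Q is an (n x n) nonnegative array standing in for
    the mean-offspring kernel; a vertex with mark x gets Poisson(Q[x].sum())
    children, and each child independently receives mark y with probability
    Q[x, y] / Q[x].sum(), mirroring the normalized distribution (11).
    Returns one list of (parent_index, mark) pairs per generation.
    """
    rng = np.random.default_rng() if rng is None else rng
    generations = [[(None, root_mark)]]  # generation 0: the root
    for _ in range(depth):
        current, nxt = generations[-1], []
        for parent_idx, (_, x) in enumerate(current):
            mean = Q[x].sum()
            if mean == 0:
                continue
            k = rng.poisson(mean)  # offspring count of this vertex
            marks = rng.choice(len(Q), size=k, p=Q[x] / mean)
            nxt.extend((parent_idx, int(y)) for y in marks)
        generations.append(nxt)
    return generations
```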
5.2 Growth properties of trees and graphs
A number of growth properties for neighbourhoods in and are needed to ensure the successful couplings below. By definition of , (resp. ) is dominated by an Erdős-Rényi graph (resp. a Galton-Watson tree with offspring distribution ); we are thus able to directly lift properties from [bordenave_nonbacktracking_2018], Sections 8 and 9.
Lemma 3.
Let be an arbitrary vertex in ; then, there exist absolute constants such that for every , we have
(12)
The same result holds when replacing with the tree defined above.
Taking in the above inequality, one gets
(13)
for any . Summing these inequalities for yields a similar bound for the whole ball: with probability at least , we have
(14)
for all and . In particular, this implies the following useful bound: for any ,
Another consequence of (12) is the following useful lemma:
Lemma 4.
For every , there is a constant such that
(15)
An important note is that the above results apply to any collection of random variables satisfying an inequality like (12); in particular, they also apply to an i.i.d. collection of inhomogeneous random trees of size .
5.3 Local tree-like structure
We first check that the random graph is locally tree-like. We say that a graph is -tangle-free if there is at most one cycle in the -neighbourhood of every vertex in the graph. As mentioned before, the random graph is dominated by an Erdős-Rényi graph ; we can therefore lift the desired properties from [bordenave_nonbacktracking_2018].
Lemma 5.
Let be any integer parameter.
-
(i)
the random graph is -tangle-free with probability at least .
-
(ii)
the probability that a given vertex has a cycle in its -neighbourhood is at most .
We shall assume in the following that the -tangle-free property holds with probability at least for some , which is the case whenever
(17)
We now gather all the results of the current section into one proposition, for ease of reading. The bound assumed above is used to simplify the inequalities below.
Proposition 2.
Let be an inhomogeneous random graph, and a family of random trees as defined above. Let be small enough so that (17) holds. Then there exists an event with probability at least , under which:
-
(i)
the graph is -tangle-free,
-
(ii)
for all , , we have
(18)
-
(iii)
for any , the number of vertices in whose -neighbourhood contains a cycle is at most
Furthermore, for any and , we have
(19)
and the same holds for the family .
5.4 Coupling between rooted graphs and trees
We now turn to the main argument of this proof: we bound the total variation distance between the neighbourhoods of and up to size .
First, recall some definitions: if are two probability measures on the space , their total variation distance is defined as
The following two characterizations of the total variation distance shall be useful: first, whenever is countable, we have
(20)
Additionally,
(21)
where denotes the set of all couplings between and , i.e. probability measures on such that the marginal distributions are and .
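In generic notation, these characterizations can be summarized as
\[
d_{\mathrm{TV}}(\mu, \nu) \;=\; \sup_{A} |\mu(A) - \nu(A)| \;=\; \tfrac{1}{2} \sum_{x} |\mu(x) - \nu(x)| \;=\; \inf_{(X,Y)} \mathbb{P}(X \neq Y),
\]
where the supremum runs over measurable sets, the middle expression is valid when the underlying space is countable, and the infimum runs over all couplings $(X, Y)$ with $X \sim \mu$ and $Y \sim \nu$.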
Denoting by the probability distribution of a variable , the aim of this section is to prove the following:
Proposition 3.
Let for some constant . Then, for every vertex ,
(22)
5.4.1 A total variation distance lemma for sampling processes
For an integer , denote by the set of all multisets with elements in , and by the powerset of . Let , with and , and consider the two probability laws on :
-
•
: each element of is picked with probability ,
-
•
: the size of the multiset is drawn according to a distribution, and each element of receives an i.i.d. label with distribution .
Note that is actually supported on .
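Although the precise bound is the object of Proposition 4 below, the comparison is of classical Poisson-approximation flavour. For instance, Le Cam's inequality (stated here in generic notation, independent of the paper's) asserts that for independent Bernoulli variables $X_1, \dots, X_n$ with parameters $p_1, \dots, p_n$,
\[
d_{\mathrm{TV}}\Big( \mathcal{L}\big(\textstyle\sum_{i=1}^n X_i\big),\ \mathrm{Poi}\big(\textstyle\sum_{i=1}^n p_i\big) \Big) \;\le\; \sum_{i=1}^n p_i^2 .
\]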
Proposition 4.
Let be defined as above. Then
Proof.
Using characterization (20), we have
(23)
We shall treat those two terms separately. First, notice that for , we have
(24)
(25)
and thus by summing over all sets ,
Using the classical inequality , we can bound the second term of (23) as follows:
Both absolute values above can be removed since the expressions inside are nonnegative; further, for , we have . Combining all those estimates, we find
where we again used the logarithm inequalities extensively. Finally, for , we have , which allows us to finish the computation:
(26)
∎
We now introduce a family of probability laws on ; for a subset , let be the measure corresponding to picking each element of with probability .
The total variation distance between these laws and is then easier to bound:
Lemma 6.
For any , we have:
Proof.
Consider the following coupling: we take a realization of , and set . Then, , and we find
This ends the proof, since (21) ensures that . ∎
5.4.2 Proof of Proposition 3
Gathering all the previous results, we are now ready to prove Proposition 3:
Proof.
Define the classical breadth-first exploration process on the neighbourhood of a vertex as follows: start with , and at stage , if is not empty, take a vertex at minimal distance from , reveal its neighbours in , and update . We denote by the filtration generated by the , by the set of vertices already visited at time , and by the first time at which all vertices in have been revealed.
We perform the same exploration process in parallel on , which corresponds to a breadth-first search of the tree. At step , we denote by the distribution of given , and the distribution of the offspring of in (no conditioning is needed there).
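A schematic version of this exploration is sketched below; it is only an illustration, with a hypothetical callback `neighbours(v)` standing in for the revelation of the edges adjacent to a vertex, and it does not capture the coupling with the tree exploration.

```python
from collections import deque

def explore_ball(neighbours, root, radius):
    """Breadth-first revelation of the ball of given radius around `root`.

    Sketch only: `neighbours(v)` is an assumed callback returning the list
    of neighbours of v at the moment v is examined; names are illustrative.
    Returns the visited vertex set and the order in which vertices appeared.
    """
    visited, order = {root}, [root]
    queue = deque([(root, 0)])
    while queue:
        v, dist = queue.popleft()
        if dist == radius:
            continue  # do not expand beyond the prescribed radius
        for w in neighbours(v):
            if w not in visited:
                visited.add(w)
                order.append(w)
                queue.append((w, dist + 1))
    return visited, order
```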
Let denote the event that is a tree and contains no more than vertices; from (14) and Lemma 5, we can choose such that has probability at least for some absolute constant . By iteration, it suffices to show that if holds, there exists a constant such that
(27)
Given , the probability measure is as follows: each element of is selected with probability . Let denote the same probability measure, but where the selection is made over all of . Using Lemma 6, we first find that
On the other hand, Proposition 4 yields
Equation (27) then results from a straightforward application of the triangle inequality. ∎
6 Near eigenvectors of
6.1 Functionals on
6.1.1 Vertex functionals on trees
Similarly to [bordenave_nonbacktracking_2018], quantities of interest in the study of will be tied to functionals on the random inhomogeneous tree defined above. Define a functional on the set of labeled rooted trees by
where is the unique path of length between and . Then the following proposition holds:
Proposition 5.
Let be an integer. For any , the following identities are true:
(28)
(29)
(30)
where we recall that .
6.1.2 Adapting functionals to non-backtracking paths
The matrix considered here acts on (directed) edges, whereas the functionals considered so far are defined on vertices. Consequently, we define the following transformation: for a function , and a random vector with expected value , let
where denotes the graph with the edge removed.
The expectations from Proposition 5 are then adapted as follows:
Proposition 6.
Let be an integer. For any , and , the following identities are true:
(31)
(32)
(33)
The proof of these results makes use of properties specific to moments of Poisson random variables; as with the preceding results, it is deferred to a later section.
6.2 Spatial averaging of graph functionals
In this section, we leverage the coupling obtained above to provide bounds on quantities of the form , for local functions . The tools and results used in this section are essentially identical to those in [bordenave_nonbacktracking_2018], with a few improvements and clarifications added when necessary.
We begin with a result that encodes the fact that the -neighbourhoods in are approximately independent. We say that a function from to is -local if is only a function of .
Proposition 7.
Let for some constant . Let be two -local functions such that for all and is non-decreasing under the addition of edges. Then
Proof.
For , denote by the set ; the vector has independent entries, and we have
for some measurable function .
Define now the graph with vertex set and edge set , and set
The random variable is -measurable, so the Efron-Stein inequality applies:
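in its generic form, for independent random variables $X_1, \dots, X_m$ and $Z = f(X_1, \dots, X_m)$ (illustrative symbols, not necessarily the paper's), the inequality states
\[
\operatorname{Var}(Z) \;\le\; \tfrac{1}{2} \sum_{k=1}^{m} \mathbb{E}\big[ (Z - Z^{(k)})^2 \big],
\]
where $Z^{(k)} = f(X_1, \dots, X_{k-1}, X_k', X_{k+1}, \dots, X_m)$ is obtained by resampling the $k$-th coordinate from an independent copy $X_k'$.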
For a given , the difference is always zero except if , due to the locality property; consequently,
where we used the non-decreasing property of in the last line. By the Cauchy-Schwarz inequality and equation (16), we can write
Using that , and the linearity of expectation, yields the desired bound. ∎
We now use our previous coupling results to provide a concentration bound between a functional on graphs and its expectation on trees:
Proposition 8.
Let and be as in the previous proposition. Then, with probability at least , the following inequality holds:
where is defined as
Proof.
Using the Chebyshev inequality and the variance bound from the preceding proposition, we have with probability at least
It then remains to bound the difference between the expectation term and its counterpart on trees. For , let denote the event that the coupling between and fails; by the locality property, on . Therefore, using the Cauchy-Schwarz inequality,
It is then straightforward to check that both obtained bounds are less than the RHS in the proposition, upon adjusting . ∎
6.3 Structure of near eigenvectors
In the following, the aim is to obtain bounds on the norms and scalar products of the near eigenvectors and defined in (LABEL:eq:def_ui_vi). The main result of this section is as follows:
Proposition 9.
Let be small enough so that (17) holds. On an event with probability , the following inequalities hold for all , and some absolute constant :
(34)
(35)
(36)
(37)
(38)
Proof.
The proof of those inequalities relies on careful applications of Proposition 8 to previously considered functionals. We aim to prove that each of those inequalities holds with probability ; we fix in the following an integer and . Let be the set of vertices such that is not a tree; we place ourselves in the event described in Proposition 2, and as a consequence
We first prove (34); let
The function is clearly -local, and
The function thus defined is non-decreasing under the addition of edges. When , we notice that
hence,
since by the tangle-free property there are at most two paths from to any vertex in . Furthermore, using the results in subsection 5.2, we find that with probability at least
7 Proof of Theorem LABEL:th:bl_u_bounds
Having shown Proposition 9, all that remains is to gather the preceding bounds and simplify them into an easy-to-read summary. Bounds (LABEL:eq:Ustar_U)-(LABEL:eq:Ustar_V), as well as (LABEL:eq:norm_Bl), being straightforward computations, are deferred to the appendix.
7.1 A telescopic trick: proof of (LABEL:eq:Bl_U)
Notice that for a matrix , we have
(39)
where are the columns (or rows) of . To apply this inequality, we write
(40)
and (38) yields
Since , the bounds apply, so that
(41)
We now use the (very crude) inequality inside (41):
The terms in the sum are all less than 1 since , and implies
The bound holds by definition of , and (LABEL:eq:Bl_U) ensues via (39).
7.2 Bounding
Having established candidate values and error bounds for the top eigenvalues of , it remains to bound the remaining eigenvalues (also called the bulk) of the matrix. This is done using a method first employed in [massoulie_community_2014], and leveraged again in a similar setting in [bordenave_nonbacktracking_2018, bordenave_detection_2020]. Our approach will be based on the latter two, adapting the non-backtracking method to the weighted case.
Our first preliminary step is the following lemma:
Lemma 7.
On an event with probability at least , for any , any unit vector and , one has
This bound is proved through the same telescopic sum trick as above; the proof is given in the appendix.
7.2.1 Tangle-free decomposition of
We adapt here the decomposition first used in [bordenave_nonbacktracking_2018] to our setting. Throughout the remainder of this section, we shall consider as an operator on instead of , setting whenever or . This yields a matrix with as a principal submatrix and zeros everywhere else, so the non-zero spectrum is unchanged.
For , and , we define the set of non-backtracking paths of length from to ; further, for an edge we define the indicator variable of , and , so that is the (weighted) adjacency matrix of .
We then have that
Define the set of -tangle-free paths (i.e. the set of paths such that the subgraph induced by is tangle-free). Then, whenever the graph is tangle-free, for all the matrix is equal to , with
Define now the “centered” versions of the weighted and unweighted adjacency matrices and by
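A plausible generic form of this centering — with hypothetical symbols $M_{uv}$ for the edge indicator and $A_{uv} = M_{uv} W_{uv}$ for the weighted adjacency entry, which need not match the paper's notation — is
\[
\underline{A}_{uv} \;:=\; A_{uv} - \mathbb{E}[A_{uv}] \;=\; M_{uv} W_{uv} - P_{uv}\, \mathbb{E}[W_{uv}],
\qquad
\underline{M}_{uv} \;:=\; M_{uv} - P_{uv},
\]
so that both centered matrices have entrywise mean zero.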
Appendix A Applications of Theorem LABEL:th:main
A.1 Proof of Proposition 1
Let be an eigenvector of associated with the eigenvalue ; the eigenvalue equation for reads
(42)
On the other hand, the definition expands to
Applying equation (42) to and yields