
A polynomial-time iterative algorithm for random graph matching with non-vanishing correlation

Jian Ding
Peking University
   Zhangsong Li
Peking University
Abstract

We propose an efficient algorithm for matching two correlated Erdős–Rényi graphs with $n$ vertices whose edges are correlated through a latent vertex correspondence. When the edge density $q=n^{-\alpha+o(1)}$ for a constant $\alpha\in[0,1)$, we show that our algorithm has polynomial running time and succeeds in recovering the latent matching as long as the edge correlation is non-vanishing. This is closely related to our previous work on a polynomial-time algorithm that matches two Gaussian Wigner matrices with non-vanishing correlation, and provides the first polynomial-time random graph matching algorithm (regardless of the regime of $q$) when the edge correlation is below the square root of Otter's constant (which is $\approx 0.338$).

1 Introduction

In this paper, we study the algorithmic perspective of recovering the latent matching between two correlated Erdős–Rényi graphs. To be mathematically precise, we first need to choose a model for a pair of correlated Erdős–Rényi graphs, and a natural choice is that the two graphs are independently sub-sampled from a common Erdős–Rényi graph, as described more precisely next. For two vertex sets $V$ and $\mathsf{V}$ with cardinality $n$, let $E_{0}$ be the set of unordered pairs $(u,v)$ with $u,v\in V$, $u\neq v$, and define $\mathsf{E}_{0}$ similarly with respect to $\mathsf{V}$. For model parameters $p,s\in(0,1)$, we generate a pair of correlated random graphs $G=(V,E)$ and $\mathsf{G}=(\mathsf{V},\mathsf{E})$ by the following procedure: sample a uniform bijection $\pi:V\to\mathsf{V}$, independent Bernoulli variables $I_{(u,v)}$ with parameter $p$ for $(u,v)\in E_{0}$, as well as independent Bernoulli variables $J_{(u,v)},\mathsf{J}_{(\mathsf{u},\mathsf{v})}$ with parameter $s$ for $(u,v)\in E_{0}$ and $(\mathsf{u},\mathsf{v})\in\mathsf{E}_{0}$. Let

$G_{(u,v)}=I_{(u,v)}J_{(u,v)},\ \forall (u,v)\in E_{0}\,,\qquad \mathsf{G}_{(\mathsf{u},\mathsf{v})}=I_{(\pi^{-1}(\mathsf{u}),\pi^{-1}(\mathsf{v}))}\mathsf{J}_{(\mathsf{u},\mathsf{v})},\ \forall (\mathsf{u},\mathsf{v})\in\mathsf{E}_{0}\,,$ (1.1)

and let $E=\{e\in E_{0}:G_{e}=1\}$ and $\mathsf{E}=\{\mathsf{e}\in\mathsf{E}_{0}:\mathsf{G}_{\mathsf{e}}=1\}$. It is obvious that marginally $G$ is an Erdős–Rényi graph with edge density $q=ps$, and so is $\mathsf{G}$. In addition, the edge correlation $\rho$ is given by $\rho=s(1-p)/(1-ps)$.
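To make the model concrete, the following is a minimal sampling sketch of (1.1); the function name and the encoding of $\pi$ as a permutation array are illustrative choices rather than anything prescribed by the paper.

```python
import numpy as np

def sample_correlated_er(n, p, s, rng=None):
    """Sample (G, GG, pi) from the correlated Erdos-Renyi model (1.1)."""
    rng = rng or np.random.default_rng()
    pi = rng.permutation(n)              # latent bijection: v in V -> pi[v] in V'
    inv = np.argsort(pi)                 # pi^{-1}
    U = np.triu(rng.random((n, n)) < p, 1)
    I = U | U.T                          # symmetric parent indicators I_{(u,v)}
    J = rng.random((n, n)) < s           # sub-sampling masks (upper triangle used)
    Js = rng.random((n, n)) < s
    G = np.triu(I & J, 1)                # G_{(u,v)} = I_{(u,v)} J_{(u,v)}
    H = np.triu(I[np.ix_(inv, inv)] & Js, 1)  # parent edge pulled back through pi
    return G | G.T, H | H.T, pi          # both marginals are ER with q = p*s
```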

An important question is to recover the latent matching $\pi$ from the observation of $(G,\mathsf{G})$. More precisely, we wish to find an estimator $\hat{\pi}$ which is measurable with respect to $(G,\mathsf{G})$ such that $\hat{\pi}=\pi$. Our main contribution is an efficient matching algorithm, as incorporated in the theorem below.

Theorem 1.1.

Suppose that $q=n^{-\alpha+o(1)}\leq 1/2$ for some constant $\alpha\in[0,1)$ and that $\rho\in(0,1]$ is a constant. Then there exist a constant $C=C(\alpha,\rho)$ and an algorithm (see Algorithm 2 in Section 2.6 below) with time complexity $O(n^{C})$ that recovers the latent matching with probability $1-o(1)$. That is, this polynomial-time algorithm takes $(G,\mathsf{G})$ as the input and outputs an estimator $\hat{\pi}$ such that $\hat{\pi}=\pi$ with probability tending to 1 as $n\to\infty$.

1.1 Background and related works

The random graph matching problem is motivated by questions from various applied fields such as social network analysis [40, 41], computer vision [8, 4], computational biology [48, 49, 19] and natural language processing [30]. In biology, an important problem is to identify proteins with similar structures/functions across different species. Toward this goal, directly comparing the amino acid sequences that constitute proteins is often complicated, since genetic mutations of the species can result in significant variations of such sequences. However, despite these variations, proteins typically maintain similar functions within each species' metabolism. In light of this, biologists employ graph-based representations, such as Protein-Protein Interaction (PPI) graphs, for each species. Under the assumption that the topological structures of PPI graphs are similar across species, researchers can then effectively match proteins with similar functions by taking advantage of such similarity (and possibly of some other domain knowledge). This approach has turned out to be successful and offers a nuanced understanding of phenomena such as the evolution of protein complexes. We refer the reader to [19] for more information on the topic. In the domain of social networks, data privacy is a fundamental and complicated problem. One complication arises from the fact that a user may have accounts on multiple social network platforms, where the user shares similar content or engages in comparable activities. Thus, from such similarities across different platforms, it is possible to infer user identities by aligning the respective graphs representing user interactions and content consumption. That is to say, it is possible to use graph matching techniques to deanonymize private social networks using information gleaned from public social platforms. Well-known examples in this scope include the deanonymization of Netflix using data from IMDb [40] and the deanonymization of Twitter using Flickr [41]. Viewing this problem from the opposite perspective, a deeper understanding of the random graph matching problem may offer insights on better mechanisms to protect data privacy. With the aforementioned applications in mind, the graph matching problem has recently been extensively studied from a theoretical point of view.

It is natural that different applications have their own distinct features, which mathematically boils down to a careful choice of the underlying random graph model suitable for the desired application. Similar to most previous works on the random graph matching problem, in this paper we consider the correlated Erdős–Rényi random graph model, which is possibly an over-idealization of any realistic network but nevertheless offers a good playground to develop insights and methods for this problem in general. Thanks to the collective efforts in [10, 9, 31, 53, 52, 27, 13, 14], it is fair to say that we now have a fairly complete understanding of the information thresholds for the problems of correlation detection and vertex matching. In contrast, the understanding of the computational aspect is far from complete, and in what follows we briefly review the progress on this front.

A huge amount of effort has been devoted to developing efficient algorithms for graph matching [42, 54, 34, 33, 24, 47, 1, 18, 21, 22, 5, 11, 12, 39, 26, 35, 36, 37, 28, 29, 38]. Prior to this work, arguably the best result was the recent work [38] (see also [28, 29] for a remarkable result on partial recovery of a similar flavor when the average degree is $O(1)$), where the authors substantially improved a groundbreaking work [36] and obtained a polynomial-time algorithm which succeeds as long as the correlation is above the square root of Otter's constant (Otter's constant is around $0.338$). In terms of methods, the present work is drastically different from these existing works; instead, it is closely related to our previous work [17] on a polynomial-time iterative algorithm that matches two correlated Gaussian Wigner matrices. We encourage the reader to see [38, Section 1.1] and [17, Section 1.1] for a more elaborate review of previous algorithms, and [17, Section 1.2] for discussions of the novel features of this iterative algorithm, especially in comparison with the message-passing algorithm [29, 43] and a recent greedy algorithm for aligning two independent graphs [15].

1.2 Our contributions

While the present work can be viewed as an extension of [17], we do feel that we have overcome substantial obstacles and made significant contributions to this topic, as we describe below.

  • While in [38] (see also [29]) polynomial-time matching algorithms were obtained when the correlation is above the square root of Otter's constant, our work establishes a polynomial-time algorithm as long as the average degree grows polynomially and the correlation is non-vanishing. In addition, the power in the running time tends to $\infty$ only as the correlation tends to 0 (for each fixed $\alpha<1$), and we believe that this is best possible; this belief is supported by a recent work on the complexity of low-degree polynomials for graph matching [16].

  • From a conceptual point of view, our work demonstrates the robustness of the iterative matching algorithm proposed in [17]. This type of "algorithmic universality" is closely related to the universality phenomenon in random matrix theory, which, roughly speaking, postulates that the particular distribution of the entries of a random matrix is often irrelevant for the spectral properties under investigation. Our work also encourages future study of even more ambitious notions of robustness, for instance algorithms that are robust to assumptions on the underlying random graph model. This is of major interest since realistic networks are usually captured better by more structured graph models, such as the random geometric graph model [50], the random growing graph model [44] and the stochastic block model [45].

  • In terms of techniques, our work employs a method which argues that the Gaussian approximation is typically valid. There are a couple of major challenges: (1) the Gaussian approximation is valid only in the rather weak sense that the Radon–Nikodym derivative is not too large; (2) we cannot simply ignore the atypical cases where the Gaussian approximation is invalid, since we have to analyze the conditional behavior given all the information the algorithm has exploited so far. The latter raises a major obstacle in our proof of the theoretical guarantee; see Section 3.1 for more detailed discussions on how this obstacle is addressed.

In addition, it is natural to suspect that our work brings us one step closer to understanding computational phase transitions for random graph matching problems, as well as algorithmic perspectives for other matching problems (see, e.g., [6]). We refer the interested reader to [17, Section 1.3] and omit further discussions here.

1.3 Notations

We record in this subsection some notation conventions, and we point out that a list of commonly used notations is included at the end of the paper for better reference.

Denote the identity matrix by $\mathrm{I}$ and the zero matrix by $\mathrm{O}$. For a $d*m$ matrix $\mathrm{A}$, we use $\mathrm{A}^{*}$ to denote its transpose, and let $\|\mathrm{A}\|_{\mathrm{HS}}$ denote the Hilbert–Schmidt norm of $\mathrm{A}$. For $1\leq s\leq\infty$, define the $s$-norm of $\mathrm{A}$ by $\|\mathrm{A}\|_{s}=\sup\{\|\mathrm{A}x^{*}\|_{s}:\|x\|_{s}=1\}$. Note that when $\mathrm{A}$ is a symmetric square matrix, we have for $\tfrac{1}{s}+\tfrac{1}{t}=1$

$\|\mathrm{A}\|_{s}=\sup\{y\mathrm{A}x^{*}:\|x\|_{s}=1,\|y\|_{t}=1\}=\sup\{\|y\mathrm{A}\|_{t}:\|y\|_{t}=1\}=\|\mathrm{A}^{*}\|_{t}=\|\mathrm{A}\|_{t}\,.$

We further denote the operator norm of $\mathrm{A}$ by $\|\mathrm{A}\|_{\mathrm{op}}=\|\mathrm{A}\|_{2}$. If $m=d$, denote by $\det(\mathrm{A})$ and $\operatorname{tr}(\mathrm{A})$ the determinant and the trace of $\mathrm{A}$, respectively. For $d$-dimensional vectors $x,y$ and a $d*d$ symmetric matrix $\Sigma$, let $\langle x,y\rangle_{\Sigma}=x\Sigma y^{*}$ be the "inner product" of $x,y$ with respect to $\Sigma$, and we further denote $\|x\|_{\Sigma}^{2}=x\Sigma x^{*}$. For two vectors $\gamma,\mu\in\mathbb{R}^{d}$, we say $\gamma\geq\mu$ (or equivalently $\mu\leq\gamma$) if their entries satisfy $\gamma(i)\geq\mu(i)$ for all $1\leq i\leq d$. The indicator function of a set $A$ is denoted by $\mathbf{1}_{A}$.

Without further specification, all asymptotics are taken with respect to $n\to\infty$. We also use standard asymptotic notation: for two sequences $\{a_{n}\}$ and $\{b_{n}\}$, we write $a_{n}=O(b_{n})$ or $a_{n}\lesssim b_{n}$ if $|a_{n}|\leq C|b_{n}|$ for some absolute constant $C$ and all $n$. We write $a_{n}=\Omega(b_{n})$ or $a_{n}\gtrsim b_{n}$ if $b_{n}=O(a_{n})$; we write $a_{n}=\Theta(b_{n})$ or $a_{n}\asymp b_{n}$ if $a_{n}=O(b_{n})$ and $a_{n}=\Omega(b_{n})$; we write $a_{n}=o(b_{n})$ or $b_{n}=\omega(a_{n})$ if $\frac{a_{n}}{b_{n}}\to 0$ as $n\to\infty$. We write $a_{n}\sim b_{n}$ if $\frac{a_{n}}{b_{n}}\to 1$.

We denote by $\mathrm{Ber}(p)$ the Bernoulli distribution with parameter $p$, by $\mathrm{Bin}(n,p)$ the binomial distribution with $n$ trials and success probability $p$, by $\mathcal{N}(\mu,\sigma^{2})$ the normal distribution with mean $\mu$ and variance $\sigma^{2}$, and by $\mathcal{N}(\mu,\Sigma)$ the multivariate normal distribution with mean $\mu$ and covariance matrix $\Sigma$. We say $(X,Y)$ is a pair of correlated binomial random variables, denoted as $\mathrm{CorBin}(N,M,p;m,\rho)$ for $m\leq\min\{N,M\}$ and $\rho\in[0,1]$, if $(X,Y)\sim\big(\sum_{k=1}^{N}b_{k},\sum_{k=1}^{M}b^{\prime}_{k}\big)$ with $b_{k},b^{\prime}_{l}\sim\mathrm{Ber}(p)$ such that: for $k\leq m$, $\{b_{k},b^{\prime}_{k}\}$ is independent of $\{b_{1},\ldots,b_{N},b^{\prime}_{1},\ldots,b^{\prime}_{M}\}\setminus\{b_{k},b^{\prime}_{k}\}$ and the correlation between $b_{k}$ and $b^{\prime}_{k}$ is $\rho$; and for $k,l>m$, $b_{k}$ is independent of $\{b_{1},\ldots,b_{N},b^{\prime}_{1},\ldots,b^{\prime}_{M}\}\setminus\{b_{k}\}$ and $b^{\prime}_{l}$ is independent of $\{b_{1},\ldots,b_{N},b^{\prime}_{1},\ldots,b^{\prime}_{M}\}\setminus\{b^{\prime}_{l}\}$. We say $X$ is a sub-Gaussian variable if there exists a positive constant $C$ such that $\mathbb{P}(|X|\geq t)\leq 2e^{-t^{2}/C^{2}}$, and we use $\|X\|_{\psi_{2}}=\inf\big\{C>0:\mathbb{E}[\exp\{\frac{X^{2}}{C^{2}}\}]\leq 2\big\}$ to denote its sub-Gaussian norm.
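For intuition, here is a one-draw sampler for $\mathrm{CorBin}$, assuming the standard coupling in which $b^{\prime}_{k}$ copies $b_{k}$ with probability $\rho$ and is resampled otherwise for $k\leq m$ (this coupling indeed yields correlation $\rho$ between the Bernoulli marginals); the function name is ours.

```python
import numpy as np

def sample_corbin(N, M, p, m, rho, rng=None):
    """One draw of (X, Y) ~ CorBin(N, M, p; m, rho)."""
    rng = rng or np.random.default_rng()
    b = rng.random(N) < p                # b_1, ..., b_N ~ Ber(p)
    bp = rng.random(M) < p               # b'_1, ..., b'_M ~ Ber(p)
    copy = rng.random(m) < rho           # for k <= m: b'_k copies b_k w.p. rho
    bp[:m] = np.where(copy, b[:m], bp[:m])
    return int(b.sum()), int(bp.sum())
```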

Acknowledgment. We thank Zongming Ma, Yihong Wu, Jiaming Xu and Fan Yang for stimulating discussions on random graph matching problems. J. Ding is partially supported by NSFC Key Program Project No. 12231002 and the Xplorer Prize.

2 An iterative matching algorithm

We first describe the heuristics underlying our algorithm (the reader is strongly encouraged to consult [17, Section 2] for a description of the iterative matching algorithm for correlated Gaussian Wigner matrices). Since we expect that Wigner matrices and Erdős–Rényi graphs (with sufficient edge density) should belong to the same algorithmic universality class, it is natural to try to extend the algorithm proposed in [17] to the case of correlated random graphs. As in [17], our wish is to iteratively construct a sequence of paired sets $\big(\Gamma^{(t)}_{k},\Pi^{(t)}_{k}\big)_{1\leq k\leq K_{t}}$ for $t\geq 0$ (with $\Gamma^{(t)}_{k}\subset V$ and $\Pi^{(t)}_{k}\subset\mathsf{V}$), where each $\big(\Gamma^{(t)}_{k},\Pi^{(t)}_{k}\big)$ contains more true pairs of the form $(v,\pi(v))$ than would be the case if the two sets were sampled uniformly and independently. In addition, we may further require $|\Gamma^{(t)}_{k}|,|\Pi^{(t)}_{k}|\approx\mathfrak{a}_{t}n$ for convenience of the analysis later.

For the initialization in [17], we obtain $K_{0}$ true pairs via brute-force search, and provided with these $K_{0}$ true pairs we then, for each such pair, define $\big(\Gamma^{(0)}_{k},\Pi^{(0)}_{k}\big)$ to be the collections of their neighbors whose corresponding edge weights exceed a certain threshold. In this work, however, due to the sparsity of Erdős–Rényi graphs (when $\alpha>0$) we cannot produce an efficient initialization by simply looking at the 1-neighborhoods of some true pairs. In order to address this, we instead look at their $\chi$-neighborhoods for a carefully chosen $\chi$ (see the definition of $\big(\Gamma^{(0)}_{k},\Pi^{(0)}_{k}\big)$ in (2.8) below). This requires a significantly more complicated analysis, since the initialization will influence later iterations. The idea to address this is to argue that in the initialization we have only used information on a small fraction of the edges; this is why $\chi$ has to be chosen carefully.

Provided with the initialization, the iteration of the algorithm is similar to that in [17] (although we will introduce some modifications in order to facilitate our analysis later). Since each pair $\big(\Gamma^{(t)}_{k},\Pi^{(t)}_{k}\big)$ carries some signal, we hope to construct more paired sets at time $t+1$ by considering various linear combinations of vertex degrees restricted to each $\Gamma^{(t)}_{k}$ (or to $\Pi^{(t)}_{k}$). As a key novelty of this iterative algorithm, as in [17] we will use the increase in the number of paired sets to compensate for the decrease in the signal carried by each pair. As we hope, once the iteration progresses to time $t=t^{*}$ for some well-chosen $t^{*}$ (see (2.35) below), we will have accumulated enough total signal so that we can complete the matching directly in the next step, as described in Section 2.4.

However, controlling the correlation among different iterative steps is a much more sophisticated job in this setting. In [17] we used Gaussian projection to remove the influence of conditioning on information obtained in previous steps. This is indeed a powerful technique, but it crucially relies on properties of the Gaussian process. Although there are examples where the universality of iterative algorithms has been established (see, e.g., [2, 7, 23] for developments on this front for approximate message passing), we are not sure how those techniques can help solve our problem, since dealing with a pair of correlated matrices poses a substantial and novel challenge. Instead, we compare the Bernoulli process obtained in the iterative algorithm with the corresponding Gaussian process obtained by replacing $\{G_{u,v},\mathsf{G}_{\mathsf{u},\mathsf{v}}\}$ with a Gaussian process of the same mean and covariance structure. In order to facilitate such a comparison, we also apply Gaussian smoothing to our Bernoulli process in the algorithm below (see (2.29), where we introduce external Gaussian noise for smoothing purposes). However, since we need to analyze the conditional behavior of the two processes, we need to compare their densities; this is much more demanding than controlling, e.g., the transportation distance between the two processes, and in fact the density ratio of these two processes is fairly large. To address this, on the one hand, we show that if we ignore a vanishing fraction of vertices (a highly non-trivial step, as we elaborate in Section 3.1), the density ratio is under control while still being fairly large; on the other hand, we show that in the Gaussian setting certain really bad events occur with tiny probability (and thus still with small probability even after multiplying by this fairly large density ratio). We refer to Sections 3.4 and 3.5 for more detailed discussions on this point.

Finally, due to the aforementioned complications we are only able to show that our iterative algorithm constructs an almost exact matching. To obtain an exact matching, we will employ the method of seeded graph matching, as developed in previous works [1, 39, 38].

In the rest of this section, we will describe in detail our iterative algorithm, which consists of a few steps including preprocessing (see Section 2.1), initialization (see Section 2.2), iteration (see Section 2.3), finishing (see Section 2.4) and seeded graph matching (see Section 2.5). We formally present our algorithm in Section 2.6. In Section 2.7 we analyze the time complexity of the algorithm.

2.1 Preprocessing

Similarly to [17], we preprocess the random graphs so that we only need to consider graphs with directed edges. We first make the technical assumption that $\rho$ is a sufficiently small constant, which can easily be arranged by keeping each edge independently with a sufficiently small constant probability.

Now, we define $\overrightarrow{G}$ from $G$. For any $u\neq v\in V$, we do the following:

  • if $(u,v)\in E(G)$, then independently among all such $(u,v)$:

    with probability $\frac{1}{2}-\frac{q}{4}$, set $\overrightarrow{(u,v)}\in\overrightarrow{G},\ \overrightarrow{(v,u)}\not\in\overrightarrow{G}$,
    with probability $\frac{1}{2}-\frac{q}{4}$, set $\overrightarrow{(u,v)}\not\in\overrightarrow{G},\ \overrightarrow{(v,u)}\in\overrightarrow{G}$,
    with probability $\frac{q}{4}$, set $\overrightarrow{(u,v)}\in\overrightarrow{G},\ \overrightarrow{(v,u)}\in\overrightarrow{G}$,
    with probability $\frac{q}{4}$, set $\overrightarrow{(u,v)}\not\in\overrightarrow{G},\ \overrightarrow{(v,u)}\not\in\overrightarrow{G}$;
  • if $(u,v)\not\in E(G)$, then set $\overrightarrow{(u,v)}\not\in\overrightarrow{G},\ \overrightarrow{(v,u)}\not\in\overrightarrow{G}$.

We define $\overrightarrow{\mathsf{G}}$ from $\mathsf{G}$ in the same manner, so that $\overrightarrow{G}$ and $\overrightarrow{\mathsf{G}}$ are conditionally independent given $(G,\mathsf{G})$. We continue to use the convention that $\overrightarrow{G}_{u,v}=\mathbf{1}_{\{\overrightarrow{(u,v)}\in\overrightarrow{G}\}}$. It is then straightforward to verify that $\{\overrightarrow{G}_{u,v}:u\neq v\}$ and $\{\overrightarrow{\mathsf{G}}_{\mathsf{u},\mathsf{v}}:\mathsf{u}\neq\mathsf{v}\}$ are two families of i.i.d. Bernoulli random variables with parameter $\frac{q}{2}$. In addition, we have

$\mathbb{E}[\overrightarrow{G}_{u,v}\overrightarrow{\mathsf{G}}_{\pi(u),\pi(v)}]=\mathbb{E}[\overrightarrow{G}_{u,v}\overrightarrow{\mathsf{G}}_{\pi(v),\pi(u)}]=\frac{q(q+\rho(1-q))}{4}\,.$

Thus, $\overrightarrow{G},\overrightarrow{\mathsf{G}}$ are edge-correlated directed graphs, denoted as $\overrightarrow{\mathcal{G}}(n,\hat{q},\hat{\rho})$, with $\hat{q}=\frac{q}{2}$ and $\hat{\rho}=\frac{1-q}{2-q}\rho$. Also note that $\hat{q}\geq n^{-\alpha+o(1)}$ and $\hat{\rho}\in[\frac{\rho}{3},\frac{\rho}{2})$ since $q\leq 1/2$. From now on we will work with the directed graphs $(\overrightarrow{G},\overrightarrow{\mathsf{G}})$.
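A minimal sketch of this edge-direction step, where `Gdir[u, v]` plays the role of $\overrightarrow{G}_{u,v}$ (the encoding is an illustrative choice):

```python
import numpy as np

def direct_edges(G, q, rng=None):
    """Preprocessing of Section 2.1: from an undirected adjacency matrix G with
    edge density q, produce a directed matrix with i.i.d. Ber(q/2) entries."""
    rng = rng or np.random.default_rng()
    n = G.shape[0]
    Gdir = np.zeros((n, n), dtype=bool)
    iu, ju = np.triu_indices(n, 1)
    edges = np.flatnonzero(G[iu, ju])
    # categories: 0 = forward only, 1 = backward only, 2 = both, 3 = neither
    c = rng.choice(4, size=edges.size, p=[0.5 - q / 4, 0.5 - q / 4, q / 4, q / 4])
    u, v = iu[edges], ju[edges]
    Gdir[u, v] = (c == 0) | (c == 2)
    Gdir[v, u] = (c == 1) | (c == 2)
    return Gdir
```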

2.2 Initialization

For a pair of standard bivariate normal variables $(X,Y)$ with correlation $u$, we define $\phi:[-1,1]\mapsto[0,1]$ by (below, the number 10 is somewhat arbitrarily chosen)

$\phi(u)=\mathbb{P}(|X|\geq 10,|Y|\geq 10)\,.$ (2.1)

In addition, we define

$\iota_{\mathrm{ub}}=\sup_{x\in(0,1]}\Big\{\frac{\phi(x)-\phi(0)}{x^{2}}\Big\}\ \mbox{ and }\ \iota_{\mathrm{lb}}=\inf_{x\in(0,1]}\Big\{\frac{\phi(x)-\phi(0)}{x^{2}}\Big\}\,.$ (2.2)

From the definition we know that $\phi$ is strictly increasing, and by [17, Claims 2.6 and 2.8] we have $\phi^{\prime}(0)=0$ and $\phi^{\prime\prime}(0)>0$; thus both $\iota_{\mathrm{ub}}$ and $\iota_{\mathrm{lb}}$ are positive and finite. Also we write $\mathfrak{a}=\phi(1)=\mathbb{P}(|X|\geq 10)$. Recalling from Subsection 2.1 that $\rho$ may be assumed to be a sufficiently small constant, from now on we will assume that

$\hat{\rho}\leq\rho\leq\min\big\{\mathfrak{a}-\mathfrak{a}^{2},\iota_{\mathrm{ub}}^{-1},\tfrac{1}{10}\big\}\,.$ (2.3)

Let $\kappa=\kappa(\hat{\rho})$ be a sufficiently large constant depending on $\hat{\rho}$ whose exact value will be decided later in (2.10), and set $K_{0}=\kappa$. We then arbitrarily choose a sequence $A=(u_{1},u_{2},\ldots,u_{K_{0}})$ where the $u_{i}$'s are distinct vertices in $V$, and we list all the sequences of length $K_{0}$ with distinct elements in $\mathsf{V}$ as $\mathsf{A}_{1},\mathsf{A}_{2},\ldots,\mathsf{A}_{\mathtt{M}}$, where $\mathtt{M}=\mathtt{M}(n,\hat{\rho},\hat{q})=\prod_{i=0}^{K_{0}-1}(n-i)$. As in [17], for each $1\leq\mathtt{m}\leq\mathtt{M}$ we will run a procedure of initialization and iteration, and clearly for one of them (although a priori we do not know which one) we are running the algorithm as if we had $K_{0}$ true pairs as seeds. For convenience, when describing the initialization and the iteration we will drop $\mathtt{m}$ from the notation, but we emphasize that this procedure will be applied to each $\mathsf{A}_{\mathtt{m}}$. Having clarified this, we take a fixed $\mathtt{m}$ and denote $\mathsf{A}_{\mathtt{m}}=(\mathsf{u}_{1},\ldots,\mathsf{u}_{K_{0}})$. In what follows, we abuse notation and write $V\setminus A$ when regarding $A$ as a set (and similarly for $\mathsf{A}_{\mathtt{m}}$).

We next describe our initialization procedure. As discussed earlier, contrary to the case of Wigner matrices, we have to explore the neighborhood around a seed up to a certain depth in order to gather information on a large number of vertices. To this end, we choose an integer $\chi\leq\frac{1}{1-\alpha}$ as the depth such that

$(n\hat{q})^{\chi}=o\big(ne^{-(\log\log n)^{100}}\big)\ \mbox{ and }\ (n\hat{q})^{\chi+1}=\Omega\big(ne^{-(\log\log n)^{100}}\big)$ (2.4)

(this is possible since $\alpha<1$). We choose such a $\chi$ since on the one hand we wish to see a large number of vertices near each seed, while on the other hand we want to reveal only a vanishing fraction of the edges. Now for $1\leq k\leq K_{0}$, define the seeds

$\aleph^{(0)}_{k}=\{u_{k}\},\ \Upsilon^{(0)}_{k}=\{\mathsf{u}_{k}\}\,,\ \mbox{ and }\ \vartheta_{0}=\varsigma_{0}=1/n\,.$ (2.5)

Then for $1\leq a\leq\chi$, we iteratively define the $a$-neighborhood of each seed by

$\aleph^{(a)}_{k}=\Big\{v\in V\setminus\big(\cup_{1\leq k\leq K_{0},0\leq j\leq a-1}\aleph^{(j)}_{k}\big):\overrightarrow{G}_{v,u}=1\mbox{ for some }u\in\aleph^{(a-1)}_{k}\Big\}\,,$
$\Upsilon^{(a)}_{k}=\Big\{\mathsf{v}\in\mathsf{V}\setminus\big(\cup_{1\leq k\leq K_{0},0\leq j\leq a-1}\Upsilon^{(j)}_{k}\big):\overrightarrow{\mathsf{G}}_{\mathsf{v},\mathsf{u}}=1\mbox{ for some }\mathsf{u}\in\Upsilon^{(a-1)}_{k}\Big\}\,.$ (2.6)

Also, for $1\leq a\leq\chi$ we iteratively define

$\vartheta_{a}=\mathbb{P}(X\geq 1)\ \mbox{ where }X\sim\mathrm{Bin}(\vartheta_{a-1}n,\hat{q})\,,$
$\varsigma_{a}=\mathbb{P}(X\geq 1,Y\geq 1)\ \mbox{ where }(X,Y)\sim\mathrm{CorBin}(\vartheta_{a-1}n,\vartheta_{a-1}n,\hat{q};\varsigma_{a-1}n,\hat{\rho})\,.$ (2.7)

We will show in Subsection 3.3 that actually we have

$|\aleph^{(a)}_{k}|/n,\ |\Upsilon^{(a)}_{k}|/n\approx\vartheta_{a}\quad\mbox{and}\quad|\pi(\aleph^{(a)}_{k})\cap\Upsilon^{(a)}_{k}|/n\approx\varsigma_{a}\,.$

Let $\mathtt{d}_{\chi}=\mathtt{d}_{\chi}(n,\hat{q})$ be the minimal integer such that $\mathbb{P}(\mathrm{Bin}(n\vartheta_{\chi},\hat{q})\geq\mathtt{d}_{\chi})<\frac{1}{2}$, and set

$\Gamma^{(0)}_{k}=\aleph^{(\chi+1)}_{k}=\Big\{v\in V\setminus\big(\cup_{1\leq k\leq K_{0},0\leq j\leq\chi}\aleph^{(j)}_{k}\big):\sum_{u\in\aleph^{(\chi)}_{k}}\overrightarrow{G}_{v,u}\geq\mathtt{d}_{\chi}\Big\}\,,$
$\Pi^{(0)}_{k}=\Upsilon^{(\chi+1)}_{k}=\Big\{\mathsf{v}\in\mathsf{V}\setminus\big(\cup_{1\leq k\leq K_{0},0\leq j\leq\chi}\Upsilon^{(j)}_{k}\big):\sum_{\mathsf{u}\in\Upsilon^{(\chi)}_{k}}\overrightarrow{\mathsf{G}}_{\mathsf{v},\mathsf{u}}\geq\mathtt{d}_{\chi}\Big\}\,.$ (2.8)

And we further define

$\vartheta=\vartheta_{\chi+1}=\mathbb{P}(X\geq\mathtt{d}_{\chi})\ \mbox{ where }X\sim\mathrm{Bin}(\vartheta_{\chi}n,\hat{q})\,,$
$\varsigma=\varsigma_{\chi+1}=\mathbb{P}(X,Y\geq\mathtt{d}_{\chi})\ \mbox{ where }(X,Y)\sim\mathrm{CorBin}(\vartheta_{\chi}n,\vartheta_{\chi}n,\hat{q};\varsigma_{\chi}n,\hat{\rho})\,.$ (2.9)
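As a numerical sanity check, the recursion for $\vartheta$ and the threshold $\mathtt{d}_{\chi}$ can be computed directly from (2.5), (2.7) and (2.9); the sketch below does this with scipy (the joint quantities $\varsigma_{a}$ involve the $\mathrm{CorBin}$ law and would require, e.g., Monte Carlo, so they are omitted here).

```python
from scipy.stats import binom

def theta_recursion(n, q_hat, chi):
    """Compute theta_0, ..., theta_{chi+1} and the threshold d_chi."""
    theta = [1.0 / n]                                 # (2.5): theta_0 = 1/n
    for _ in range(chi):                              # (2.7): P(Bin(theta*n, q) >= 1)
        theta.append(1.0 - binom.pmf(0, int(theta[-1] * n), q_hat))
    d = 1                                             # minimal d with P(Bin >= d) < 1/2
    while binom.sf(d - 1, int(n * theta[-1]), q_hat) >= 0.5:
        d += 1
    theta.append(binom.sf(d - 1, int(n * theta[-1]), q_hat))  # (2.9)
    return theta, d
```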

We may then choose $K_{0}=\kappa$ sufficiently large such that $K_{0}\geq\frac{10^{34}\iota_{\mathrm{ub}}^{2}\hat{\rho}^{-20}}{\iota_{\mathrm{lb}}^{2}(\mathfrak{a}-\mathfrak{a}^{2})^{2}}$ and

$\frac{\log\big(K_{0}\iota_{\mathrm{lb}}^{2}(\mathfrak{a}-\mathfrak{a}^{2})^{2}\hat{\rho}^{20}/10^{30}\iota_{\mathrm{ub}}^{2}\big)}{\log\big(K_{0}\iota_{\mathrm{lb}}^{4}\hat{\rho}^{24}\varepsilon_{0}^{2}/16\cdot 10^{30}\iota_{\mathrm{ub}}^{2}\big)}\leq 1.01\quad\mbox{where }\varepsilon_{0}=\frac{\varsigma-\vartheta^{2}}{2(\vartheta-\vartheta^{2})}\,.$ (2.10)

In addition, we define $\Phi^{(0)},\Psi^{(0)}$ to be $K_{0}*K_{0}$ matrices by

$\Phi^{(0)}=\mathrm{I}\quad\mbox{and}\quad\Psi^{(0)}=\frac{\varsigma-\vartheta^{2}}{2(\vartheta-\vartheta^{2})}\mathrm{I}\,,$ (2.11)

and in the iterative steps we will also construct $K_{t},\varepsilon_{t},\Gamma^{(t)}_{k},\Pi^{(t)}_{k}$ and $\Phi^{(t)},\Psi^{(t)}$ for $t\geq 1$. Similarly to [17], the matrices $\Phi^{(t)}$ and $\Psi^{(t)}$ are supposed to approximate the cardinalities and overlaps of the sets $\{\Gamma^{(t)}_{k},\Pi^{(t)}_{k}\}$ in the following sense. Write

$\mathfrak{a}_{t}=\begin{cases}\mathfrak{a},&t\geq 1\,;\\ \vartheta,&t=0\,.\end{cases}$ (2.12)

Then somewhat informally, we expect that

$\frac{1}{n}|\Gamma^{(t)}_{i}|,\ \frac{1}{n}|\Pi^{(t)}_{i}|\approx\mathfrak{a}_{t}\,,$ (2.13)
$\frac{\frac{1}{n}|\Gamma^{(t)}_{i}\cap\Gamma^{(t)}_{j}|-\frac{\mathfrak{a}_{t}}{n}|\Gamma^{(t)}_{i}|-\frac{\mathfrak{a}_{t}}{n}|\Gamma^{(t)}_{j}|+\mathfrak{a}_{t}^{2}}{\mathfrak{a}_{t}-\mathfrak{a}_{t}^{2}}\approx\frac{\frac{1}{n}|\Gamma^{(t)}_{i}\cap\Gamma^{(t)}_{j}|-\mathfrak{a}_{t}^{2}}{\mathfrak{a}_{t}-\mathfrak{a}_{t}^{2}}\approx\Phi^{(t)}(i,j)\,,$ (2.14)
$\frac{\frac{1}{n}|\Pi^{(t)}_{i}\cap\Pi^{(t)}_{j}|-\frac{\mathfrak{a}_{t}}{n}|\Pi^{(t)}_{i}|-\frac{\mathfrak{a}_{t}}{n}|\Pi^{(t)}_{j}|+\mathfrak{a}_{t}^{2}}{\mathfrak{a}_{t}-\mathfrak{a}_{t}^{2}}\approx\frac{\frac{1}{n}|\Pi^{(t)}_{i}\cap\Pi^{(t)}_{j}|-\mathfrak{a}_{t}^{2}}{\mathfrak{a}_{t}-\mathfrak{a}_{t}^{2}}\approx\Phi^{(t)}(i,j)\,,$ (2.15)
$\frac{\frac{1}{n}|\pi(\Gamma^{(t)}_{i})\cap\Pi^{(t)}_{j}|-\frac{\mathfrak{a}_{t}}{n}|\Gamma^{(t)}_{i}|-\frac{\mathfrak{a}_{t}}{n}|\Pi^{(t)}_{j}|+\mathfrak{a}_{t}^{2}}{\mathfrak{a}_{t}-\mathfrak{a}_{t}^{2}}\approx\frac{\frac{1}{n}|\pi(\Gamma^{(t)}_{i})\cap\Pi^{(t)}_{j}|-\mathfrak{a}_{t}^{2}}{\mathfrak{a}_{t}-\mathfrak{a}_{t}^{2}}\approx\Psi^{(t)}(i,j)\,.$ (2.16)

As in [17, Lemma 2.1], in order to facilitate our analysis later we will also need an important property of the eigenvalues of $\Phi^{(t)}$ and $\Psi^{(t)}$:

$\Phi^{(t)}$ has at least $\tfrac{3}{4}K_{t}$ eigenvalues between $0.9$ and $1.1$, (2.17)
and $\Psi^{(t)}$ has at least $\tfrac{3}{4}K_{t}$ eigenvalues between $0.9\varepsilon_{t}$ and $1.1\varepsilon_{t}$. (2.18)

We will show in Subsection 3.3 that (2.13)–(2.18) are satisfied for $t=0$. The main challenge is to construct $(\Gamma_{k}^{(t+1)},\Pi_{k}^{(t+1)})$ and $\Phi^{(t+1)},\Psi^{(t+1)}$ such that (2.13)–(2.18) hold for $t+1$, under the inductive assumption that they hold for $t$. We conclude this subsection with some bounds on $(\vartheta_{k},\varsigma_{k})$.

Lemma 2.1.

We have $\vartheta_{k},\varsigma_{k}=\Theta(n^{-1}(n\hat{q})^{k})$ for $0\leq k\leq\chi$ and $\vartheta_{\chi+1},\varsigma_{\chi+1}=\Omega(e^{-(\log\log n)^{100}})$. Also, we have $\varsigma_{k}-\vartheta_{k}^{2}=\Theta(\vartheta_{k})$ for $0\leq k\leq\chi+1$. In addition, we have either $\vartheta_{\chi+1}=\Theta(1)$ or $\vartheta_{\chi}\leq n^{-\alpha+o(1)}$.

Proof.

We prove the first claim by induction. The claim trivially holds for $k=0$. Now suppose the claim holds up to some $k\leq\chi-1$. Using (2.7) and Poisson approximation (note that when $k\leq\chi-1$ we have $n\hat{q}\vartheta_{k}=\Theta(n^{-1}(n\hat{q})^{k+1})=o(1)$), we get

$\vartheta_{k+1}=\Theta(\vartheta_{k}n\hat{q})=\Theta(n^{-1}(n\hat{q})^{k+1})\quad\mbox{and}\quad\vartheta_{k+1}\geq\varsigma_{k+1}\geq\Theta(\hat{\rho}\varsigma_{k}n\hat{q})=\Theta(\vartheta_{k+1})\,,$

which verifies the claim for $k+1$ and thus the first claim (for $0\leq k\leq\chi$). If $\vartheta_{\chi}n\hat{q}=\Theta(n^{-1}(n\hat{q})^{\chi+1})\ll 1$, we have $\mathtt{d}_{\chi}=1$ and thus $\vartheta_{\chi+1},\varsigma_{\chi+1}=\Theta(n^{-1}(n\hat{q})^{\chi+1})=\Omega(e^{-(\log\log n)^{100}})$; if $\vartheta_{\chi}n\hat{q}=\Omega(1)$, we have $\varsigma_{\chi}n\hat{q}=\Omega(1)$ and thus by the choice of $\mathtt{d}_{\chi}$ we have $\vartheta_{\chi+1},\varsigma_{\chi+1}=\Theta(1)$ and $\varsigma_{\chi+1}-\vartheta_{\chi+1}^{2}=\Theta(1)$ using Poisson approximation. Thus, we have $\varsigma_{k}-\vartheta_{k}^{2}=\Theta(\vartheta_{k})$ for $k=\chi+1$ (the case $1\leq k\leq\chi$ can be checked in a straightforward manner). In addition, if $\vartheta_{\chi}=n^{-\alpha+\epsilon+o(1)}$ for some arbitrarily small but fixed $\epsilon>0$, then $n\hat{q}\vartheta_{\chi}\gg 1$ and thus $\vartheta_{\chi+1}=\Theta(1)$. This completes the proof of the lemma. ∎

2.3 Iteration

We reiterate that in this subsection we describe the iteration for a fixed $1\leq\mathtt{m}\leq\mathtt{M}$; eventually this iterative procedure will be applied to each $\mathtt{m}$. Define

$K_{t+1}=\frac{1}{\varkappa}K_{t}^{2}\quad\mbox{where }\varkappa=\varkappa(\hat{\rho})=\frac{10^{30}\iota_{\mathrm{ub}}^{2}\hat{\rho}^{-20}}{\iota_{\mathrm{lb}}^{2}(\mathfrak{a}-\mathfrak{a}^{2})^{2}}$ (2.19)

for $t\geq 0$. Since we have assumed $K_{0}\geq 10^{4}\varkappa$, we can then prove by induction that

$10^{30}\hat{\rho}^{20}(\mathfrak{a}-\mathfrak{a}^{2})^{2}K_{t}^{2}\geq K_{t+1}\geq 10^{4}K_{t}\,.$ (2.20)

We now suppose that $(\Gamma^{(s)}_{k},\Pi^{(s)}_{k})_{1\leq k\leq K_{s}}$ and $\Phi^{(s)},\Psi^{(s)}$ have been constructed for $s\leq t$ (which will be implemented inductively via (2.29), as described next). Recall that we are working under the assumption that (2.13)–(2.18) hold for $s\leq t$. For $v\in V$ and $\mathsf{v}\in\mathsf{V}$, define $D^{(t)}_{v},\mathsf{D}^{(t)}_{\mathsf{v}}\in\mathbb{R}^{K_{t}}$ to be the "normalized degrees" of $v$ in $\Gamma^{(t)}_{k}$ and of $\mathsf{v}$ in $\Pi^{(t)}_{k}$, as follows:

$D^{(t)}_{v}(k)=\frac{1}{\sqrt{(\mathfrak{a}_{t}-\mathfrak{a}_{t}^{2})n\hat{q}(1-\hat{q})}}\sum_{u\in V}(\mathbf{1}_{u\in\Gamma^{(t)}_{k}}-\mathfrak{a}_{t})(\overrightarrow{G}_{v,u}-\hat{q})\,,$
$\mathsf{D}^{(t)}_{\mathsf{v}}(k)=\frac{1}{\sqrt{(\mathfrak{a}_{t}-\mathfrak{a}_{t}^{2})n\hat{q}(1-\hat{q})}}\sum_{\mathsf{u}\in\mathsf{V}}(\mathbf{1}_{\mathsf{u}\in\Pi^{(t)}_{k}}-\mathfrak{a}_{t})(\overrightarrow{\mathsf{G}}_{\mathsf{v},\mathsf{u}}-\hat{q})\,.$ (2.21)

Recalling (2.12), we note that the definition (2.21) differs between $t=0$ and $t\geq 1$; this is because $\Gamma^{(0)}_{k}$ and $\Pi^{(0)}_{k}$ may contain only a vanishing fraction of the vertices. We also point out that, similarly to [17], in the above definition we used the "centered" versions of $\mathbf{1}_{u\in\Gamma^{(t)}_{k}}$ and $\mathbf{1}_{\mathsf{u}\in\Pi^{(t)}_{k}}$, since (2.13) suggests that intuitively each vertex $u$ (respectively, $\mathsf{u}$) has probability approximately $\mathfrak{a}_{t}$ of belonging to $\Gamma^{(t)}_{k}$ (respectively, $\Pi^{(t)}_{k}$); such centering will be useful for our proof later, as it leads to additional cancellation.
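In matrix form, (2.21) computes all normalized degrees at once via one centered matrix product; a sketch (the array layout is an illustrative choice):

```python
import numpy as np

def normalized_degrees(Gdir, Gamma, a_t, q_hat):
    """Sketch of (2.21): the (n, K_t) matrix whose v-th row is D_v^{(t)}.

    Gdir  : (n, n) boolean directed adjacency matrix,
    Gamma : (K_t, n) boolean membership matrix with Gamma[k, u] = 1_{u in Gamma_k}.
    """
    n = Gdir.shape[0]
    scale = np.sqrt((a_t - a_t**2) * n * q_hat * (1 - q_hat))
    centered_sets = Gamma.astype(float) - a_t      # 1_{u in Gamma_k} - a_t
    centered_adj = Gdir.astype(float) - q_hat      # G_{v,u} - q_hat
    return centered_adj @ centered_sets.T / scale  # sum over u, for each (v, k)
```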

Assuming Lemma 2.2 (stated below), we can write the spectral decompositions of $\Phi^{(t)}$ and $\Psi^{(t)}$ as

$\Phi^{(t)}=\sum^{K_{t}}_{i=1}\lambda^{(t)}_{i}\big(\nu^{(t)}_{i}\big)^{*}\big(\nu^{(t)}_{i}\big)\quad\mbox{and}\quad\Psi^{(t)}=\sum_{i=1}^{K_{t}}\mu^{(t)}_{i}\big(\xi^{(t)}_{i}\big)^{*}\big(\xi^{(t)}_{i}\big)$ (2.22)

where

$\lambda^{(t)}_{i}\in(0.9,1.1),\ \mu^{(t)}_{i}\in(0.9\varepsilon_{t},1.1\varepsilon_{t})\quad\mbox{for }1\leq i\leq\tfrac{3K_{t}}{4}$ (2.23)

and $\nu_{i}^{(t)},\xi_{i}^{(t)}$ are the unit eigenvectors with respect to $\lambda_{i}^{(t)},\mu_{i}^{(t)}$, respectively. Next, for $s,t$ we define $\mathrm{M}_{\Gamma}^{(t,s)},\mathrm{M}_{\Pi}^{(t,s)},\mathrm{P}_{\Gamma,\Pi}^{(t,s)}$ to be $K_{t}*K_{s}$ matrices as follows:

$\mathrm{M}_{\Gamma}^{(t,s)}(i,j)=\frac{|\Gamma^{(t)}_{i}\cap\Gamma^{(s)}_{j}|-\mathfrak{a}_{s}|\Gamma^{(t)}_{i}|-\mathfrak{a}_{t}|\Gamma^{(s)}_{j}|+\mathfrak{a}_{s}\mathfrak{a}_{t}n}{\sqrt{(\mathfrak{a}_{s}-\mathfrak{a}_{s}^{2})(\mathfrak{a}_{t}-\mathfrak{a}_{t}^{2})}\,n}\,,$
$\mathrm{M}_{\Pi}^{(t,s)}(i,j)=\frac{|\Pi^{(t)}_{i}\cap\Pi^{(s)}_{j}|-\mathfrak{a}_{s}|\Pi^{(t)}_{i}|-\mathfrak{a}_{t}|\Pi^{(s)}_{j}|+\mathfrak{a}_{s}\mathfrak{a}_{t}n}{\sqrt{(\mathfrak{a}_{s}-\mathfrak{a}_{s}^{2})(\mathfrak{a}_{t}-\mathfrak{a}_{t}^{2})}\,n}\,,$
$\mathrm{P}_{\Gamma,\Pi}^{(t,s)}(i,j)=\frac{|\pi(\Gamma^{(t)}_{i})\cap\Pi^{(s)}_{j}|-\mathfrak{a}_{s}|\Gamma^{(t)}_{i}|-\mathfrak{a}_{t}|\Pi^{(s)}_{j}|+\mathfrak{a}_{s}\mathfrak{a}_{t}n}{\sqrt{(\mathfrak{a}_{s}-\mathfrak{a}_{s}^{2})(\mathfrak{a}_{t}-\mathfrak{a}_{t}^{2})}\,n}\,.$ (2.24)

These matrices actually represent the covariance matrices for random vectors of the form $D^{(t)}_{v}$ and $\mathsf{D}^{(s)}_{\pi(v)}$. To get a rough intuition for this, we (formally incorrectly) regard $D_{v}^{(s)}$ as a linear combination of $\{G_{u,v}\}$ with deterministic coefficients (and the same applies to $\mathsf{D}^{(s)}_{\mathsf{v}}$). Then we can see that for all $v\in V$, the "correlation" between $D^{(t)}_{v}(i)$ and $D^{(s)}_{v}(j)$ equals

$\frac{1}{\sqrt{(\mathfrak{a}_{t}-\mathfrak{a}_{t}^{2})(\mathfrak{a}_{s}-\mathfrak{a}_{s}^{2})}\,n}\sum_{u\in V\setminus A}(\mathbf{1}_{u\in\Gamma^{(t)}_{i}}-\mathfrak{a}_{t})(\mathbf{1}_{u\in\Gamma^{(s)}_{j}}-\mathfrak{a}_{s})=\mathrm{M}_{\Gamma}^{(t,s)}(i,j)\,.$

This justifies our definition of $\mathrm{M}_{\Gamma}^{(t,s)}$, which aims to record the correlation between $D^{(t)}_{v}$ and $D^{(s)}_{v}$. Similarly, under the same simplification, $\mathrm{M}_{\Pi}^{(t,s)}$ (respectively, $\hat{\rho}\mathrm{P}_{\Gamma,\Pi}^{(t,s)}$) is the correlation matrix between $\mathsf{D}^{(t)}_{\mathsf{v}}$ and $\mathsf{D}^{(s)}_{\mathsf{v}}$ (respectively, between $D^{(t)}_{v}$ and $\mathsf{D}^{(s)}_{\pi(v)}$). In addition, from (2.13)–(2.16) we expect that $\mathrm{M}_{\Gamma}^{(t,t)},\mathrm{M}_{\Pi}^{(t,t)}\approx\Phi^{(t)}$ and $\mathrm{P}_{\Gamma,\Pi}^{(t,t)}\approx\Psi^{(t)}$. Note that $\mathrm{M}_{\Gamma},\mathrm{M}_{\Pi}$ are accessible to the algorithm but $\mathrm{P}_{\Gamma,\Pi}$ is not (since it relies on the latent matching). We further define two linear subspaces as follows:

$\mathrm{W}^{(t)}\overset{\triangle}{=}\big\{x\in\mathbb{R}^{K_{t}}:x\mathrm{M}_{\Gamma}^{(t,s)}=0,\ x\mathrm{M}_{\Pi}^{(t,s)}=0\mbox{ for all }s<t\big\}\,,$
$\mathrm{V}^{(t)}\overset{\triangle}{=}\mathrm{span}\big\{\nu^{(t)}_{1},\nu^{(t)}_{2},\ldots,\nu^{(t)}_{\frac{3}{4}K_{t}}\big\}\cap\mathrm{span}\big\{\xi^{(t)}_{1},\xi^{(t)}_{2},\ldots,\xi^{(t)}_{\frac{3}{4}K_{t}}\big\}\cap\mathrm{W}^{(t)}\,.$ (2.25)

We refer to [17, Remark 3.3] for the reasoning behind this definition. Note that the number of linear constraints imposed on $\mathrm{W}^{(t)}$ is at most $2\sum_{i=1}^{t}K_{i-1}$. So

$\dim(\mathrm{V}^{(t)})\geq\frac{3}{4}K_{t}+\frac{3}{4}K_{t}+\dim(\mathrm{W}^{(t)})-2K_{t}\geq\frac{1}{2}K_{t}-2\sum_{i=1}^{t}K_{i-1}\overset{(2.20)}{\geq}0.49K_{t}\,.$

As proved in [17, (2.10) and (2.11)], we can choose $\eta^{(t)}_{1},\eta^{(t)}_{2},\ldots,\eta^{(t)}_{\frac{1}{12}K_{t}}$ from $\mathrm{V}^{(t)}$ such that

$\eta^{(t)}_{i}\mathrm{M}_{\Gamma}^{(t,t)}\big(\eta^{(t)}_{j}\big)^{*}=\eta^{(t)}_{i}\mathrm{M}_{\Pi}^{(t,t)}\big(\eta^{(t)}_{j}\big)^{*}=\eta^{(t)}_{i}\Psi^{(t)}\big(\eta^{(t)}_{j}\big)^{*}=0\quad\mbox{for }i\neq j\,,$ (2.26)
$\eta^{(t)}_{i}\Phi^{(t)}\big(\eta^{(t)}_{i}\big)^{*}=1,\qquad 2\varepsilon_{t}\geq\eta^{(t)}_{i}\Psi^{(t)}\big(\eta^{(t)}_{i}\big)^{*}\geq 0.5\varepsilon_{t}\,.$ (2.27)

Furthermore, we must have $\big\|\eta^{(t)}_{i}\big\|^{2}\in(\frac{1}{2},2)$. As in [17], we will project the degrees $D^{(t)}_{v},\mathsf{D}^{(t)}_{\mathsf{v}}$ onto a set of carefully chosen directions in the space spanned by all the $\eta_{i}$'s. These directions are defined as follows: we sample $\beta^{(t)}_{k}(j)$ as i.i.d. uniform variables on $\{-1,1\}$. By [17, Proposition 2.4], these $\beta^{(t)}_{k}(j)$'s satisfy [17, (2.21)–(2.24)] with probability at least 0.5. As in [17], we keep resampling until these requirements are satisfied. Define

$\sigma_{k}^{(t)}=\sqrt{\frac{12}{K_{t}}}\sum_{j=1}^{\frac{1}{12}K_{t}}\beta^{(t)}_{k}(j)\eta_{j}^{(t)}\quad\mbox{for }k=1,2,\ldots,K_{t+1}\,.$ (2.28)

We sample i.i.d. standard normal variables $\{W^{(t)}_{v}(i),\mathsf{W}^{(t)}_{\mathsf{v}}(i):1\leq i\leq\frac{K_{t}}{12}\}$ and complete our iteration by setting

$\Gamma^{(t+1)}_{k}=\Big\{v\in V:\frac{1}{\sqrt{2}}\Big|\sqrt{\tfrac{12}{K_{t}}}\langle\beta^{(t)}_{k},W^{(t)}_{v}\rangle+\langle\sigma^{(t)}_{k},D^{(t)}_{v}\rangle\Big|\geq 10\Big\}\,,$
$\Pi^{(t+1)}_{k}=\Big\{\mathsf{v}\in\mathsf{V}:\frac{1}{\sqrt{2}}\Big|\sqrt{\tfrac{12}{K_{t}}}\langle\beta^{(t)}_{k},\mathsf{W}^{(t)}_{\mathsf{v}}\rangle+\langle\sigma^{(t)}_{k},\mathsf{D}^{(t)}_{\mathsf{v}}\rangle\Big|\geq 10\Big\}\,.$ (2.29)
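A sketch of one update step, combining (2.28) and (2.29) for one of the two graphs; the resampling conditions on the $\beta^{(t)}_{k}$'s from [17, Proposition 2.4] are omitted for brevity.

```python
import numpy as np

def iterate_once(D, eta, K_next, rng=None):
    """One pass of (2.28)-(2.29) on one side of the graph pair.

    D   : (n, K_t) matrix of normalized degrees from (2.21),
    eta : (K_t/12, K_t) matrix whose rows are the directions eta_j^{(t)}.
    Returns the membership matrix Gamma_next[k, v] = 1_{v in Gamma_k^{(t+1)}}
    together with the sign vectors beta_k^{(t)}.
    """
    rng = rng or np.random.default_rng()
    n, K_t = D.shape
    m = eta.shape[0]                                  # m = K_t / 12
    beta = rng.choice([-1.0, 1.0], size=(K_next, m))  # i.i.d. signs beta_k^{(t)}(j)
    sigma = np.sqrt(12 / K_t) * (beta @ eta)          # (2.28): sigma_k^{(t)}
    W = rng.standard_normal((n, m))                   # Gaussian smoothing field
    stat = (np.sqrt(12 / K_t) * (W @ beta.T) + D @ sigma.T) / np.sqrt(2)
    return (np.abs(stat) >= 10).T, beta               # (2.29): threshold at 10
```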

In (2.29), we introduced a Gaussian smoothing $\{W^{(t)}_{v}(i),\mathsf{W}^{(t)}_{\mathsf{v}}(i):1\leq i\leq\frac{K_{t}}{12}\}$. We believe this is not essential but provides technical convenience: on the one hand it probably reduces the efficiency of the algorithm slightly since it weakens the signal, but on the other hand it facilitates the analysis since it brings the distribution closer to Gaussian. In addition, we have used the absolute value of the statistic rather than the statistic itself, with the purpose of introducing more symmetry as in [17] (e.g., to bound (3.95) below). Recall that $\mathrm{M}_{\Gamma}^{(t,t)}$ records the covariance matrix of $D^{(t)}_{v}$ for all $v\in V$. Thus, we expect the correlation between $\langle\sigma^{(t)}_{k},D^{(t)}_{v}\rangle$ and $\langle\sigma^{(t)}_{l},D^{(t)}_{v}\rangle$ to be approximately

$\frac{12}{K_{t}}\sum_{i,j=1}^{\frac{1}{12}K_{t}}\beta^{(t)}_{k}(i)\beta^{(t)}_{l}(j)\eta^{(t)}_{i}\mathrm{M}_{\Gamma}^{(t,t)}\big(\eta^{(t)}_{j}\big)^{*}\overset{(2.26),(2.27)}{=}\frac{12}{K_{t}}\sum_{i=1}^{\frac{1}{12}K_{t}}\beta^{(t)}_{k}(i)\beta^{(t)}_{l}(i)=\frac{12}{K_{t}}\big\langle\beta^{(t)}_{k},\beta^{(t)}_{l}\big\rangle\,.$

In particular, the variance of each $\langle\sigma^{(t)}_{k},D^{(t)}_{v}\rangle$ is approximately 1. Similarly, we can show that the correlation between $\langle\sigma^{(t)}_{k},\mathsf{D}^{(t)}_{\mathsf{v}}\rangle$ and $\langle\sigma^{(t)}_{l},\mathsf{D}^{(t)}_{\mathsf{v}}\rangle$ is approximately $\frac{12}{K_{t}}\big\langle\beta^{(t)}_{k},\beta^{(t)}_{l}\big\rangle$, and the correlation between $\langle\sigma^{(t)}_{k},D^{(t)}_{v}\rangle$ and $\langle\sigma^{(t)}_{l},\mathsf{D}^{(t)}_{\pi(v)}\rangle$ is approximately $\hat{\rho}\cdot\frac{12}{K_{t}}\big\langle\hat{\beta}^{(t)}_{k},\hat{\beta}^{(t)}_{l}\big\rangle$, where

$\hat{\beta}^{(t)}_{k}(j)=\Big(\eta^{(t)}_{j}\Psi^{(t)}\big(\eta^{(t)}_{j}\big)^{*}\Big)^{1/2}\cdot\beta^{(t)}_{k}(j)\,.$ (2.30)

(Here we also used that (2.16) implies $\mathrm{P}_{\Gamma,\Pi}^{(t,t)}\approx\Psi^{(t)}$.) Recall our desire for (2.13)–(2.16) to hold for $t+1$. Thus, the signal contained in each pair at time $t+1$ is approximately

$\varepsilon_{t+1}=\frac{1}{\mathfrak{a}-\mathfrak{a}^{2}}\Big(\phi\Big(\frac{\hat{\rho}}{2}\cdot\frac{12}{K_{t}}\sum_{j=1}^{\frac{K_{t}}{12}}\eta^{(t)}_{j}\Psi^{(t)}\big(\eta^{(t)}_{j}\big)^{*}\Big)-\phi(0)\Big)\,.$ (2.31)

By (2.27), we have that

$\varepsilon_{t+1}\in\Big[\frac{\iota_{\mathrm{lb}}\hat{\rho}^{2}}{4(\mathfrak{a}-\mathfrak{a}^{2})}(0.5\varepsilon_{t})^{2},\ \frac{\iota_{\mathrm{ub}}\hat{\rho}^{2}}{4(\mathfrak{a}-\mathfrak{a}^{2})}(2\varepsilon_{t})^{2}\Big]\,.$ (2.32)

Recalling (2.3), we have $\varepsilon_{t+1}\leq\varepsilon_{t}^{2}$, and thus (recall from (2.10) that $\varepsilon_{0}<\tfrac{1}{2}$)

$\varepsilon_{t+1}\leq\varepsilon_{t}\leq\ldots\leq\varepsilon_{0}\leq\tfrac{1}{2}\,,$ (2.33)

which verifies our statement that the signal $\varepsilon_{t}$ carried by each pair is decreasing. We then finish the iteration by defining $\Phi^{(t+1)},\Psi^{(t+1)}$ to be $K_{t+1}*K_{t+1}$ matrices such that

$\Phi^{(t+1)}(i,j)=(\mathfrak{a}-\mathfrak{a}^{2})^{-1}\Big\{\phi\Big(\frac{12}{K_{t}}\langle\beta^{(t)}_{i},\beta^{(t)}_{j}\rangle\Big)-\mathfrak{a}^{2}\Big\}\,,$
$\Psi^{(t+1)}(i,j)=(\mathfrak{a}-\mathfrak{a}^{2})^{-1}\Big\{\phi\Big(\frac{\hat{\rho}}{2}\cdot\frac{12}{K_{t}}\langle\hat{\beta}^{(t)}_{i},\hat{\beta}^{(t)}_{j}\rangle\Big)-\mathfrak{a}^{2}\Big\}\,.$ (2.34)

Next, we state a lemma which then inductively justifies (2.17) and (2.18).

Lemma 2.2.

Let $(\Phi^{(t)},\Psi^{(t)})$ be initialized as in (2.11) and inductively defined as in (2.34), and let $\varepsilon_{t}$ be initialized as in (2.10) and iteratively defined as in (2.31). Then $\Phi^{(t)}$ has at least $\frac{3}{4}K_{t}$ eigenvalues between $0.9$ and $1.1$, and $\Psi^{(t)}$ has at least $\frac{3}{4}K_{t}$ eigenvalues between $0.9\varepsilon_{t}$ and $1.1\varepsilon_{t}$.

We note that the definition (2.34) is identical to that of [17, (2.15)], and thus Lemma 2.2 is identical to [17, Lemma 2.1].

2.4 Almost exact matching

In this subsection we describe how to obtain an almost exact matching once we have accumulated enough signal through the iterations. To this end, define

$t^{*}=\min\{t\geq 0:K_{t}\geq(\log n)^{2}\}\,.$ (2.35)

Obviously $K_{t^{*}}\leq(\log n)^{4}$. By (2.19), we have $K_{t}=K_{0}^{2^{t}}/\varkappa^{2^{t}-1}$, and as a result $t^{*}=O(\log\log\log n)$. In addition, recalling (2.32), we have

$K_{t+1}\varepsilon_{t+1}^{2}\geq\frac{\hat{\rho}^{20}\iota_{\mathrm{lb}}^{2}(\mathfrak{a}-\mathfrak{a}^{2})}{10^{30}\iota_{\mathrm{ub}}^{2}}K_{t}^{2}\cdot\Big(\frac{\iota_{\mathrm{lb}}\hat{\rho}^{2}}{16(\mathfrak{a}-\mathfrak{a}^{2})}\varepsilon_{t}^{2}\Big)^{2}=\frac{\hat{\rho}^{24}\iota_{\mathrm{lb}}^{4}}{16^{2}\cdot 10^{30}\iota_{\mathrm{ub}}^{2}(\mathfrak{a}-\mathfrak{a}^{2})}(K_{t}\varepsilon_{t}^{2})^{2}\,.$

Using the choice of $K_{0}=\kappa$ in (2.10), we see that the total signal $K_{t}\varepsilon_{t}^{2}$ is increasing in $t$. We also have

$K_{t^{*}}\varepsilon_{t^{*}}^{2}\geq\Big(\frac{K_{0}\iota_{\mathrm{lb}}^{2}\hat{\rho}^{4}\varepsilon^{2}_{0}}{16(\mathfrak{a}-\mathfrak{a}^{2})^{2}\varkappa}\Big)^{2^{t^{*}}}\overset{(2.10)}{\geq}\Big(\frac{K_{0}}{\varkappa}\Big)^{2^{t^{*}}/1.01}\geq K_{t^{*}}^{1/1.01}\geq(\log n)^{1.9}\,.$ (2.36)
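As a toy sanity check on (2.19) and (2.35), one can trace the recursion $K_{t+1}=K_{t}^{2}/\varkappa$ until the stopping rule fires; the constants below are placeholders rather than the actual values of $K_{0}$ or $\varkappa$.

```python
import math

def stopping_time(K0, kappa_const, n):
    """Compute t* = min{t : K_t >= (log n)^2} for K_{t+1} = K_t^2 / kappa (2.19)."""
    K, t = float(K0), 0
    while K < math.log(n) ** 2:
        K, t = K * K / kappa_const, t + 1
    return t, K

# e.g. stopping_time(50, 10.0, 10**6) == (1, 250.0); the doubly exponential
# growth of K_t is what makes t* = O(log log log n).
```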

For each $1\leq\mathtt{m}\leq\mathtt{M}$, we run the initialization procedure and then run the iteration up to time $t^{*}$, and then we construct a permutation $\pi_{\mathtt{m}}$ (with respect to $\mathsf{A}_{\mathtt{m}}$) as follows. For $A=(u_{1},\ldots,u_{K_{0}})$ and $\mathsf{A}_{\mathtt{m}}=(\mathsf{u}_{1},\ldots,\mathsf{u}_{K_{0}})$, set $\pi_{\mathtt{m}}(u_{j})=\mathsf{u}_{j}$ for $1\leq j\leq K_{0}$. We fix an ordering of $V\setminus A$ and of $\mathsf{V}\setminus\mathsf{A}_{\mathtt{m}}$ as $V\setminus A=\{v_{1},\ldots,v_{n-K_{0}}\}$ and $\mathsf{V}\setminus\mathsf{A}_{\mathtt{m}}=\{\mathsf{v}_{1},\ldots,\mathsf{v}_{n-K_{0}}\}$, initialize the set $\mathrm{CAND}$ to be $\mathsf{V}\setminus\mathsf{A}_{\mathtt{m}}$, and initialize the sets $\mathrm{SUC}$, $\mathrm{PAIRED}$ and $\mathrm{FAIL}$ to be empty. The algorithm processes the $v_{k}$'s in increasing order of $k$: for each $v_{k}$, we find the minimal $\mathsf{k}$ such that $\mathsf{v}_{\mathsf{k}}\in\mathrm{CAND}$ and

$\sum_{j=1}^{\frac{1}{12}K_{t^{*}}}\big(W^{(t^{*})}_{v_{k}}(j)+\langle\eta^{(t^{*})}_{j},D^{(t^{*})}_{v_{k}}\rangle\big)\big(\mathsf{W}^{(t^{*})}_{\mathsf{v}_{\mathsf{k}}}(j)+\langle\eta^{(t^{*})}_{j},\mathsf{D}^{(t^{*})}_{\mathsf{v}_{\mathsf{k}}}\rangle\big)\geq\frac{1}{100}K_{t^{*}}\varepsilon_{t^{*}}\,.$ (2.37)

We then define $\pi_{\mathtt{m}}(v_{k})=\mathsf{v}_{\mathsf{k}}$, put $v_{k}$ into $\mathrm{SUC}$, and move $\mathsf{v}_{\mathsf{k}}$ from $\mathrm{CAND}$ to $\mathrm{PAIRED}$. If there is no $\mathsf{k}$ satisfying (2.37), we put $v_{k}$ into $\mathrm{FAIL}$. Having processed all vertices in $V\setminus A$, we pair the vertices in $\mathrm{FAIL}$ with the (remaining) vertices in $\mathrm{CAND}$ in an arbitrary but pre-fixed manner to obtain the matching $\pi_{\mathtt{m}}$.
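A sketch of this greedy finishing step; the arrays below encode the quantities appearing in (2.37), and their layout is an illustrative choice.

```python
import numpy as np

def finish_matching(W, DW, Wg, DWg, K_star, eps_star):
    """Greedy pairing via (2.37).

    W, DW   : (n, K*/12) arrays of W_v(j) and <eta_j, D_v> for the first graph;
    Wg, DWg : the analogous arrays for the second graph.
    Returns match[k] = index of the paired vertex, or -1 for FAIL vertices.
    """
    n = W.shape[0]
    left = W + DW                        # rows: (W_v(j) + <eta_j, D_v>)_j
    right = Wg + DWg
    cand = np.ones(n, dtype=bool)
    match = -np.ones(n, dtype=int)
    thr = K_star * eps_star / 100
    for k in range(n):
        hits = np.flatnonzero(cand & (right @ left[k] >= thr))
        if hits.size:                    # minimal eligible index, as in the text
            match[k] = hits[0]
            cand[hits[0]] = False
    return match
```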

We say a pair of sequences $A=(u_{1},u_{2},\ldots,u_{K_{0}})$ and $\mathsf{A}=(\mathsf{u}_{1},\mathsf{u}_{2},\ldots,\mathsf{u}_{K_{0}})$ is a good pair if

$\mathsf{u}_{j}=\pi(u_{j})\ \mbox{ for }1\leq j\leq K_{0}\,.$ (2.38)

The success of our algorithm relies on the following proposition, which states that starting from a good pair, $\pi_{\mathtt{m}}$ correctly recovers almost all vertices.

Proposition 2.3.

For a pair $(A,\mathsf{A})$, define $\pi(A,\mathsf{A})=\pi_{\mathtt{m}}$ if $\mathsf{A}=\mathsf{A}_{\mathtt{m}}$. If $(A,\mathsf{A})$ is a good pair, then with probability $1-o(1)$ we have

$|\{v:\pi(A,\mathsf{A})(v)=\pi(v)\}|\geq\Big(1-\frac{10}{\log n}\Big)n\,.$

2.5 From almost exact matching to exact matching

In this subsection, we employ a seeded matching algorithm [1] (see also [39, 55]) to enhance an almost exact matching (which we denote as $\tilde{\pi}$ in what follows) to an exact matching. Our matching algorithm is a simplified version of [1, Algorithm 4].

  Algorithm 1 Seeded Matching Algorithm

 

1:  Input: A triple $(G,\mathsf{G},\tilde{\pi})$ where $(G,\mathsf{G})\sim\mathcal{G}(n,q,\rho)$ and $\tilde{\pi}$ agrees with $\pi$ on a $1-o(1)$ fraction of vertices.
2:  For $u\in V(G),\mathsf{v}\in\mathsf{V}(\mathsf{G})$, define their 1-neighborhood count $N(u,\mathsf{v})=|\{w\in V:u\sim w,\ \mathsf{v}\sim\tilde{\pi}(w)\}|$.
3:  Define $\Delta=\frac{\rho^{2}nq}{100}$ and set $\hat{\pi}=\tilde{\pi}$.
4:  Repeat the following: if there exists a pair $(u,\mathsf{v})$ such that $N(u,\mathsf{v})\geq\Delta$ and $N(u,\hat{\pi}(u)),N(\hat{\pi}^{-1}(\mathsf{v}),\mathsf{v})<\frac{1}{10}\Delta$, then modify $\hat{\pi}$ to map $u$ to $\mathsf{v}$ and map $\hat{\pi}^{-1}(\mathsf{v})$ to $\hat{\pi}(u)$; otherwise, move to Step 5.
5:  Output: $\hat{\pi}$.

 
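For concreteness, here is a minimal Python transcription of Algorithm 1, under the reading that the counts $N$ are recomputed with the current $\hat{\pi}$ after each swap; it is a sketch, not an optimized implementation.

```python
import numpy as np

def seeded_matching(G, GG, pi_tilde, q, rho):
    """Algorithm 1: boost an almost exact matching pi_tilde to an exact one.

    G, GG are boolean adjacency matrices; pi_tilde maps v to pi_tilde[v].
    """
    n = G.shape[0]
    pi_hat = pi_tilde.copy()
    Delta = rho**2 * n * q / 100
    improved = True
    while improved:
        improved = False
        # N[u, v] = #{w : u ~ w in G and v ~ pi_hat(w) in GG}
        N = G.astype(int) @ GG[pi_hat].astype(int)
        inv = np.argsort(pi_hat)                      # pi_hat^{-1}
        for u in range(n):
            for v in range(n):
                if (N[u, v] >= Delta and N[u, pi_hat[u]] < Delta / 10
                        and N[inv[v], v] < Delta / 10):
                    w = inv[v]                        # current preimage of v
                    pi_hat[u], pi_hat[w] = v, pi_hat[u]   # swap the two images
                    improved = True
                    break
            if improved:
                break
    return pi_hat
```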

At this point, we can run Algorithm 1 for each $\pi_{\mathtt{m}}$ (which serves as the input $\tilde{\pi}$) and obtain the corresponding refined matching $\hat{\pi}_{\mathtt{m}}$ (the output $\hat{\pi}$). By [1, Lemma 4.2] and Proposition 2.3, we see that $\hat{\pi}_{\mathtt{m}}=\pi$ with probability $1-o(1)$ (note that [1, Lemma 4.2] applies to an adversarially chosen input $\tilde{\pi}$ as long as $\tilde{\pi}$ agrees with $\pi$ on a $1-o(1)$ fraction of vertices). Finally, we set

$\hat{\pi}_{\diamond}=\arg\max_{\hat{\pi}_{\mathtt{m}}}\Big\{\sum_{(u,v)\in E(V)}G_{u,v}\mathsf{G}_{\hat{\pi}_{\mathtt{m}}(u),\hat{\pi}_{\mathtt{m}}(v)}\Big\}\,.$ (2.39)

Combined with [52, Theorem 4], this yields the following theorem.

Theorem 2.4.

With probability $1-o(1)$, we have $\hat{\pi}_{\diamond}=\pi$.

2.6 Formal description of the algorithm

We are now ready to present our algorithm formally.

  Algorithm 2 Random Graph Matching Algorithm

 

1:  Define G,𝖦,q^,ρ^,A,ϕ,𝙼,ιlb,ιub,𝔞,ϰ,κ,χ\overrightarrow{G},\overrightarrow{\mathsf{G}},\hat{q},\hat{\rho},A,\phi,\mathtt{M},\iota_{\mathrm{lb}},\iota_{\mathrm{ub}},\mathfrak{a},\varkappa,\kappa,\chi and Φ(0),Ψ(0)\Phi^{(0)},\Psi^{(0)} as above.
2:  List all sequences with K0K_{0} distinct elements in 𝖵\mathsf{V} by 𝖠1,𝖠2,,𝖠𝙼\mathsf{A}_{1},\mathsf{A}_{2},\ldots,\mathsf{A}_{\mathtt{M}}.
3:  for 𝚖=1,,𝙼\mathtt{m}=1,\ldots,\mathtt{M} do
4:     Define k(a),Υk(a)\aleph^{(a)}_{k},\Upsilon^{(a)}_{k} for 0aχ,1kK00\leq a\leq\chi,1\leq k\leq K_{0} as in (2.5) and (2.6).
5:     Define Γk(0),Πk(0)\Gamma^{(0)}_{k},\Pi^{(0)}_{k} for 1kK01\leq k\leq K_{0} as in (2.8).
6:     Define ε0,K0\varepsilon_{0},K_{0} as above.
7:     Set π𝚖(vj)=𝗏j\pi_{\mathtt{m}}(v_{j})=\mathsf{v}_{j} where vj,𝗏jv_{j},\mathsf{v}_{j} are the jj-th coordinate of A,𝖠𝚖A,\mathsf{A}_{\mathtt{m}} respectively.
8:     while  Kt(logn)2K_{t}\leq(\log n)^{2}  do
9:        Calculate Kt+1K_{t+1} according to (2.19).
10:        Calculate MΓ(t,s),MΠ(t,s)\mathrm{M}^{(t,s)}_{\Gamma},\mathrm{M}^{(t,s)}_{\Pi} for 0st0\leq s\leq t according to (2.24).
11:        Calculate the eigenvalues and eigenvectors of Φ(t),Ψ(t)\Phi^{(t)},\Psi^{(t)}, as in (2.22).
12:        Define η1(t),η2(t),,ηKt12(t)\eta^{(t)}_{1},\eta^{(t)}_{2},\ldots,\eta^{(t)}_{\frac{K_{t}}{12}} according to (2.26) and (2.27).
13:        Calculate εt+1\varepsilon_{t+1} according to (2.31).
14:        Sample random vectors βk(t)\beta^{(t)}_{k} for 1kKt+11\leq k\leq K_{t+1} as described below (2.23).
15:        Define σk(t)\sigma^{(t)}_{k} for 1kKt+11\leq k\leq K_{t+1} according to (2.28).
16:        Define Φ(t+1),Ψ(t+1)\Phi^{(t+1)},\Psi^{(t+1)} according to (2.34).
17:        Define Γk(t+1),Πk(t+1)\Gamma^{(t+1)}_{k},\Pi^{(t+1)}_{k} for 1kKt+11\leq k\leq K_{t+1} according to (2.29).
18:     end while
19:     Suppose we stop at t=tt=t^{*}.
20:     Define η1(t),η2(t),,ηKt12(t)\eta^{(t^{*})}_{1},\eta^{(t^{*})}_{2},\ldots,\eta^{(t^{*})}_{\frac{K_{t^{*}}}{12}} according to (2.26) and (2.27).
21:     List VAV\setminus A and 𝖵𝖠𝚖\mathsf{V}\setminus\mathsf{A}_{\mathtt{m}} in a prefixed order VA={v1,,vnK0}V\setminus A=\{v_{1},\ldots,v_{n-K_{0}}\} and 𝖵𝖠𝚖={𝗏1,,𝗏nK0}\mathsf{V}\setminus\mathsf{A}_{\mathtt{m}}=\{\mathsf{v}_{1},\ldots,\mathsf{v}_{n-K_{0}}\}.
22:     Set SUC,PAIRED,FAIL=\mathrm{SUC},\mathrm{PAIRED},\mathrm{FAIL}=\emptyset and CAND=𝖵𝖠𝚖\mathrm{CAND}=\mathsf{V}\setminus\mathsf{A}_{\mathtt{m}}.
23:     for  1knK01\leq k\leq n-K_{0}  do
24:        Set Svk=0\textup{S}_{v_{k}}=0.
25:        for  1𝗄nK01\leq\mathsf{k}\leq n-K_{0}  do
26:           if  𝗏𝗄CAND\mathsf{v}_{\mathsf{k}}\in\mathrm{CAND} and (vk,𝗏𝗄)(v_{k},\mathsf{v}_{\mathsf{k}}) satisfies (2.37then
27:              Define π𝚖(vk)=𝗏𝗄\pi_{\mathtt{m}}(v_{k})=\mathsf{v}_{\mathsf{k}}.
28:              Set Svk=1\textup{S}_{v_{k}}=1.
29:              Put vkv_{k} into SUC\mathrm{SUC} and move 𝗏𝗄\mathsf{v}_{\mathsf{k}} into PAIRED\mathrm{PAIRED}.
30:           end if
31:        end for
32:        if  Svk=0\textup{S}_{v_{k}}=0  then
33:           Put vkv_{k} into FAIL\mathrm{FAIL}.
34:        end if
35:     end for
36:     Complete π𝚖\pi_{\mathtt{m}} into an entire matching by mapping FAIL\mathrm{FAIL} to CAND\mathrm{CAND} in an arbitrary but prefixed manner.
37:     Run Algorithm 2.5 with the input (G,𝖦,π𝚖)(G,\mathsf{G},{\pi}_{\mathtt{m}}) and denote the output as π^𝚖\hat{\pi}_{\mathtt{m}}.
38:  end for
39:  Find π^𝚖\hat{\pi}_{\mathtt{m}^{*}} which maximizes (u,v)E(V)Gu,v𝖦π^𝚖(u),π^𝚖(v)\sum_{(u,v)\in E(V)}G_{u,v}\mathsf{G}_{\hat{\pi}_{\mathtt{m}}(u),\hat{\pi}_{\mathtt{m}}(v)} among {π^𝚖:1𝚖𝙼}\{\hat{\pi}_{\mathtt{m}}:1\leq\mathtt{m}\leq\mathtt{M}\}.
40:  return  π^=π^𝚖\hat{\pi}_{\diamond}=\hat{\pi}_{\mathtt{m}^{*}}.

 

2.7 Running time analysis

In this subsection, we analyze the running time for Algorithm 2.6.

Proposition 2.5.

The running time for computing each π𝚖\pi_{\mathtt{m}} is O(n3)O(n^{3}). Furthermore, the running time for Algorithm 2.6 is O(nκ+3)O(n^{\kappa+3}).

Proof.

We first prove the first claim. For each 𝚖\mathtt{m}, it takes O(n2)O(n^{2}) time to compute all Γk(0),Πk(0)\Gamma^{(0)}_{k},\Pi^{(0)}_{k} and [17, Proposition 2.13] can be easily adapted to show that computing π𝚖\pi_{\mathtt{m}} based on the initialization takes time O(n2+o(1))O(n^{2+o(1)}). In addition, it is easy to see that Algorithm 2.5 runs in time O(n3)O(n^{3}). Altogether, this yields the claim.

We now prove the second claim. Since 𝙼nκ\mathtt{M}\leq n^{\kappa}, the running time for computing all π^𝚖\hat{\pi}_{\mathtt{m}} is O(nκ+3)O(n^{\kappa+3}). In addition, finding π^\hat{\pi}_{\diamond} from {π^𝚖}\{\hat{\pi}_{\mathtt{m}}\} takes O(nκ+2)O(n^{\kappa+2}) time. So the total running time is O(nκ+3)O(n^{\kappa+3}). ∎

We complete this section by pointing out that Theorem 1.1 follows directly from Theorem 2.4 and Proposition 2.5.

3 Analysis of the matching algorithm

The main goal of this section is to prove Proposition 2.3.

3.1 Outline of the proof

We fix a good pair (A,𝖠)(A,\mathsf{A}). As in [17], the basic intuition is that each pair (Γk(t),Πk(t))(\Gamma^{(t)}_{k},\Pi^{(t)}_{k}) carries signal of strength at least εt\varepsilon_{t}, and thus the total signal strength of all KtK_{t} pairs will grow in tt (recall (2.36)). A natural attempt is to prove this via induction, for which a key challenge is to control correlations among different iterative steps. As a related challenge, we also need to show that the signals carried by different pairs are essentially non-repetitive. To this end, we will (more or less) follow [17] and propose the following admissibility conditions on (Γk(t),Πk(t))(\Gamma^{(t)}_{k},\Pi^{(t)}_{k}), which we hope to verify by induction. Define the targeted approximation error at time tt by

Δt=Δt(q^,ρ^)=e(loglogn)10(logn)10titKi100.\displaystyle\Delta_{t}=\Delta_{t}(\hat{q},\hat{\rho})=e^{-(\log\log n)^{10}}(\log n)^{10t}\prod_{i\leq t}K_{i}^{100}\,. (3.1)

Since KtKt(logn)4K_{t}\leq K_{t^{*}}\leq(\log n)^{4} and 2t2loglogn2^{t^{*}}\leq 2\log\log n, we have

ΔtΔte(loglogn)81.{}\Delta_{t}\leq\Delta_{t^{*}}\leq e^{-(\log\log n)^{8}}\ll 1\,. (3.2)
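To spell out the arithmetic behind (3.2): since K_{i}\leq(\log n)^{4} for every i\leq t^{*}, we have \prod_{i\leq t}K_{i}^{100}\leq(\log n)^{400(t+1)}, so \log\Delta_{t}\leq-(\log\log n)^{10}+410(t+1)\log\log n; and since 2^{t^{*}}\leq 2\log\log n forces t^{*}=O(\log\log\log n), the positive term is O(\log\log n\cdot\log\log\log n), which is dominated by (\log\log n)^{10}-(\log\log n)^{8}.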
Definition 3.1.

For t0t\geq 0 and a collection of pairs (Γk(s),Πk(s))1kKs,0st(\Gamma^{(s)}_{k},\Pi^{(s)}_{k})_{1\leq k\leq K_{s},0\leq s\leq t} with Γk(s)V\Gamma^{(s)}_{k}\subset V and Πk(s)𝖵\Pi^{(s)}_{k}\subset\mathsf{V}, we say (Γk(s),Πk(s))1kKs,0st(\Gamma^{(s)}_{k},\Pi^{(s)}_{k})_{1\leq k\leq K_{s},0\leq s\leq t} is tt-admissible if the following hold:

  1. (i.)

    ||Γk(0)|nϑ|,||Πk(0)|nϑ|<ϑΔ0\Big{|}\frac{|\Gamma^{(0)}_{k}|}{n}-\vartheta\Big{|},\Big{|}\frac{|\Pi^{(0)}_{k}|}{n}-\vartheta\Big{|}<\vartheta\Delta_{0} for 1kK01\leq k\leq K_{0};

  2. (ii.)

    ||Γk(0)Γl(0)|nϑ2|,||Πk(0)Πl(0)|nϑ2|<ϑΔ0\Big{|}\frac{|\Gamma^{(0)}_{k}\cap\Gamma^{(0)}_{l}|}{n}-\vartheta^{2}\Big{|},\Big{|}\frac{|\Pi^{(0)}_{k}\cap\Pi^{(0)}_{l}|}{n}-\vartheta^{2}\Big{|}<\vartheta\Delta_{0} for 1klK01\leq k\neq l\leq K_{0};

  3. (iii.)

    ||π(Γk(0))Πk(0)|nς|<ϑΔ0\Big{|}\frac{|\pi(\Gamma^{(0)}_{k})\cap\Pi^{(0)}_{k}|}{n}-\varsigma\Big{|}<\vartheta\Delta_{0} and ||π(Γk(0))Πl(0)|nϑ2|<ϑΔ0\Big{|}\frac{|\pi(\Gamma^{(0)}_{k})\cap\Pi^{(0)}_{l}|}{n}-\vartheta^{2}\Big{|}<\vartheta\Delta_{0} for 1klK01\leq k\neq l\leq K_{0};

  4. (iv.)

    ||Γk(s)|n𝔞|,||Πk(s)|n𝔞|<𝔞Δs\Big{|}\frac{|\Gamma^{(s)}_{k}|}{n}-\mathfrak{a}\Big{|},\Big{|}\frac{|\Pi^{(s)}_{k}|}{n}-\mathfrak{a}\Big{|}<\mathfrak{a}\Delta_{s} for 1kKs1\leq k\leq K_{s} and 1st1\leq s\leq t;

  5. (v.)

    ||Πk(s)Πl(s)|nϕ(12Ks1βk(s1),βl(s1))|,||Γk(s)Γl(s)|nϕ(12Ks1βk(s1),βl(s1))|<𝔞Δs\Big{|}\frac{|\Pi^{(s)}_{k}\cap\Pi^{(s)}_{l}|}{n}-\phi(\frac{12}{K_{s-1}}\langle{\beta}^{(s-1)}_{k},{\beta}^{(s-1)}_{l}\rangle)\Big{|},\Big{|}\frac{|\Gamma^{(s)}_{k}\cap\Gamma^{(s)}_{l}|}{n}-\phi(\frac{12}{K_{s-1}}\langle{\beta}^{(s-1)}_{k},{\beta}^{(s-1)}_{l}\rangle)\Big{|}<\mathfrak{a}\Delta_{s} for 1k,lKs1\leq k,l\leq K_{s} and 1st1\leq s\leq t;

  6. (vi.)

    ||π(Γk(s))Πl(s)|nϕ(ρ^212Ks1β^k(s1),β^l(s1))|<𝔞Δs\Big{|}\frac{|\pi(\Gamma^{(s)}_{k})\cap\Pi^{(s)}_{l}|}{n}-\phi(\frac{\hat{\rho}}{2}\frac{12}{K_{s-1}}\langle\hat{\beta}^{(s-1)}_{k},\hat{\beta}^{(s-1)}_{l}\rangle)\Big{|}<\mathfrak{a}\Delta_{s} for 1k,lKs1\leq k,l\leq K_{s} and 1st1\leq s\leq t;

  7. (vii.)

    ||Γk(s)Γl(r)|n𝔞2|,||Πk(s)Πl(r)|n𝔞2|<𝔞Δs\Big{|}\frac{|\Gamma^{(s)}_{k}\cap\Gamma^{(r)}_{l}|}{n}-\mathfrak{a}^{2}\Big{|},\Big{|}\frac{|\Pi^{(s)}_{k}\cap\Pi^{(r)}_{l}|}{n}-\mathfrak{a}^{2}\Big{|}<\mathfrak{a}\Delta_{s} for 1kKs1\leq k\leq K_{s}, 1lKr1\leq l\leq K_{r} and 1r<st1\leq r<s\leq t;

  8. (viii.)

    ||π(Γk(s))Πl(r)|n𝔞2|<𝔞Δmax(s,r)\Big{|}\frac{|\pi(\Gamma^{(s)}_{k})\cap\Pi^{(r)}_{l}|}{n}-\mathfrak{a}^{2}\Big{|}<\mathfrak{a}\Delta_{\max(s,r)} for 1kKs1\leq k\leq K_{s}, 1lKr1\leq l\leq K_{r} and 1rst1\leq r\not=s\leq t.

  9. (ix.)

    ||Γk(s)Γl(0)|n𝔞ϑ|,||Πk(s)Πl(0)|n𝔞ϑ|<ϑ𝔞Δs\Big{|}\frac{|\Gamma^{(s)}_{k}\cap\Gamma^{(0)}_{l}|}{n}-\mathfrak{a}\vartheta\Big{|},\Big{|}\frac{|\Pi^{(s)}_{k}\cap\Pi^{(0)}_{l}|}{n}-\mathfrak{a}\vartheta\Big{|}<\sqrt{\vartheta\mathfrak{a}}\Delta_{s} for 1kKs1\leq k\leq K_{s}, 1lK01\leq l\leq K_{0} and 0<st0<s\leq t;

  10. (x.)

    ||π(Γk(s))Πl(0)|n𝔞ϑ|,||π(Γl(0))Πk(s)|n𝔞ϑ|<ϑ𝔞Δs\Big{|}\frac{|\pi(\Gamma^{(s)}_{k})\cap\Pi^{(0)}_{l}|}{n}-\mathfrak{a}\vartheta\Big{|},\Big{|}\frac{|\pi(\Gamma^{(0)}_{l})\cap\Pi^{(s)}_{k}|}{n}-\mathfrak{a}\vartheta\Big{|}<\sqrt{\vartheta\mathfrak{a}}\Delta_{s} for 1kKs1\leq k\leq K_{s}, 1lK01\leq l\leq K_{0} and 0<st0<s\leq t.

Here 𝔞,ϑ,Kt,ϕ,β(t)\mathfrak{a},\vartheta,K_{t},\phi,\beta^{(t)} and β^(t)\hat{\beta}^{(t)} are defined previously in Section 2.

Proofs for [17, (3.3)-(3.8)] can be easily adapted (with no essential change) to show that under the assumption of admissibility, the matrices MΓ(t,t),MΠ(t,t)\mathrm{M}^{(t,t)}_{\Gamma},\mathrm{M}^{(t,t)}_{\Pi} and PΓ,Π(t,t)\mathrm{P}^{(t,t)}_{\Gamma,\Pi} concentrate around Φ(t),Φ(t)\Phi^{(t)},\Phi^{(t)} and Ψ(t)\Psi^{(t)} respectively with error Δt\Delta_{t}, and MΓ(t,s),MΠ(t,s),PΓ,Π(t,s)\mathrm{M}^{(t,s)}_{\Gamma},\mathrm{M}^{(t,s)}_{\Pi},\mathrm{P}^{(t,s)}_{\Gamma,\Pi} have entries bounded by Δt\Delta_{t}. For notational convenience, define

t={(Γk(s),Πk(s))0st,1kKs is t-admissible}.\displaystyle\mathcal{E}_{t}=\{(\Gamma^{(s)}_{k},\Pi^{(s)}_{k})_{0\leq s\leq t,1\leq k\leq K_{s}}\mbox{ is $t$-admissible}\}\,. (3.3)

As hinted earlier, to verify t\mathcal{E}_{t} inductively, the main difficulty is the complicated dependency among the iteration steps. In [17], much effort was dedicated to this issue even with the help of Gaussianity. In the present work, this is much harder than in [17] due to the lack of Gaussianity (for instance, a crucial Gaussian property is that, conditioned on linear statistics, a Gaussian process remains Gaussian). To this end, our rough intuition is to compare our process with a Gaussian process whenever possible, and (as we will see) the major challenge arises when such a comparison is out of control.

We first consider the initialization. Define

REV(a)=0ja1kK0(k(j)π1(Υk(j)))\mathrm{REV}^{(a)}=\cup_{0\leq j\leq a}\cup_{1\leq k\leq K_{0}}\big{(}\aleph^{(j)}_{k}\cup\pi^{-1}(\Upsilon^{(j)}_{k})\big{)} (3.4)

to be the set of vertices that we have explored for initialization either directly or indirectly (e.g., through correlation of the latent matching). We further denote REV=REV(χ)\mathrm{REV}=\mathrm{REV}^{(\chi)}. Define

𝔖init=σ{REV,{Gu,w,𝖦π(u),π(w):uREV or wREV}}\displaystyle\mathfrak{S}_{\mathrm{init}}=\sigma\big{\{}\mathrm{REV},\{\overrightarrow{G}_{u,w},\overrightarrow{\mathsf{G}}_{\pi(u),\pi(w)}:u\in\mathrm{REV}\mbox{ or }w\in\mathrm{REV}\}\big{\}}

(note that REV=A\mathrm{REV}=A is measurable with respect to {Gu,w,𝖦π(u),π(w):uA or wA}\{\overrightarrow{G}_{u,w},\overrightarrow{\mathsf{G}}_{\pi(u),\pi(w)}:u\in A\mbox{ or }w\in A\}). Then {Γk(0),Πk(0)}\{\Gamma^{(0)}_{k},\Pi^{(0)}_{k}\} is measurable with respect to 𝔖init\mathfrak{S}_{\mathrm{init}}. We will show that conditioning on a realization of (j(a),Υj(a))(\aleph^{(a)}_{j},\Upsilon^{(a)}_{j}) does not affect the degree of “most” vertices, thus verifying the concentration of j(a+1),Υj(a+1)\aleph^{(a+1)}_{j},\Upsilon^{(a+1)}_{j} inductively. This will eventually yield that 0\mathcal{E}_{0} holds with probability 1o(1)1-o(1), as incorporated in Section 3.3.

We now consider the iterations. When comparing our process to the case of Wigner matrices, a key challenge is that at each time tt we can only control the behavior (i.e., show they are close to “Gaussian”) of all but a vanishing fraction of ηk(t),Dv(t)\langle\eta^{(t)}_{k},D^{(t)}_{v}\rangle. This set of uncontrollable vertices will be inductively defined as BADt\mathrm{BAD}_{t} in (3.15). However, the algorithm forces us to deal with the behavior of ηk(t+1),Dv(t+1)\langle\eta^{(t+1)}_{k},D^{(t+1)}_{v}\rangle conditioned on all {ηk(s),Dv(s):0st}\{\langle\eta^{(s)}_{k},D^{(s)}_{v}\rangle:0\leq s\leq t\} (since the algorithm has explored all these variables), and thus we also need to control the influence from these “bad” vertices. To address this problem, we will separate ηk(t+1),Dv(t+1)\langle\eta^{(t+1)}_{k},D^{(t+1)}_{v}\rangle into two parts, one involving bad vertices and one not. To this end, for 0st+10\leq s\leq t+1 and for vBADtv\not\in\mathrm{BAD}_{t}, we decompose Dv(s)(k)D^{(s)}_{v}(k) into a sum of two terms with Dv(s)(k)=btDv(s)(k)+gtDv(s)(k)D^{(s)}_{v}(k)=b_{t}{D}^{(s)}_{v}(k)+g_{t}{D}^{(s)}_{v}(k) (and similarly for the mathsf version), where

btDv(s)(k)=1(𝔞s𝔞s2)nq^(1q^)uBADt(𝟏uΓk(s)𝔞s)(Gv,uq^).\displaystyle b_{t}{D}^{(s)}_{v}(k)\overset{\triangle}{=}\frac{1}{\sqrt{(\mathfrak{a}_{s}-\mathfrak{a}_{s}^{2})n\hat{q}(1-\hat{q})}}\sum_{u\in\mathrm{BAD}_{t}}(\mathbf{1}_{u\in\Gamma^{(s)}_{k}}-\mathfrak{a}_{s})(\overrightarrow{G}_{v,u}-\hat{q})\,. (3.5)
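In code, the split (3.5) can be sketched as follows (our own naming; vertices are indexed 0,…,n-1, and we take the directed matrix \overrightarrow{G} to have zero diagonal, so whether u=v is excluded from the sum is immaterial here):

```python
import numpy as np

def split_degree_statistic(Gdir, Gamma_sk, bad, a_s, q_hat, v):
    """Sketch of (3.5): the standardized statistic D_v^{(s)}(k) and its
    split into the 'biased' part b_t D (summands u in BAD_t) and the
    'free' part g_t D (the rest).  Gdir: directed 0/1 matrix with zero
    diagonal; Gamma_sk: vertex list of Gamma_k^{(s)}; bad: BAD_t."""
    n = Gdir.shape[0]
    norm = np.sqrt((a_s - a_s ** 2) * n * q_hat * (1 - q_hat))
    ind = np.zeros(n)
    ind[list(Gamma_sk)] = 1.0
    terms = (ind - a_s) * (Gdir[v] - q_hat)
    mask = np.zeros(n, dtype=bool)
    mask[list(bad)] = True
    b = terms[mask].sum() / norm      # b_t D_v^{(s)}(k), as in (3.5)
    g = terms[~mask].sum() / norm     # g_t D_v^{(s)}(k)
    return b + g, b, g                # D_v^{(s)}(k) = b_t D + g_t D
```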

Further, we define 𝔖t\mathfrak{S}_{t} to be the σ\sigma-field generated by

{Wv(s)(k)+ηk(s),Dv(s),𝖶π(v)(s)(k)+ηk(s),𝖣π(v)(s):1kKs12,0st,vV},\displaystyle\{W^{(s)}_{v}(k)+\langle\eta^{(s)}_{k},D^{(s)}_{v}\rangle,\mathsf{W}^{(s)}_{\pi(v)}(k)+\langle\eta^{(s)}_{k},\mathsf{D}^{(s)}_{\pi(v)}\rangle:1\leq k\leq\tfrac{K_{s}}{12},0\leq s\leq t,v\in V\}\,, (3.6)
BADt and {Gu,w,𝖦π(u),π(w):u or wBADt}.\displaystyle\mathrm{BAD}_{t}\mbox{ and }\{\overrightarrow{G}_{u,w},\overrightarrow{\mathsf{G}}_{\pi(u),\pi(w)}:u\mbox{ or }w\in\mathrm{BAD}_{t}\}\,.

Then we see that Dv(s)D_{v}^{(s)} is fixed under 𝔖t\mathfrak{S}_{t} for vBADtv\in\mathrm{BAD}_{t} and 1st1\leq s\leq t, and that for vBADtv\not\in\mathrm{BAD}_{t}, conditioned on 𝔖t\mathfrak{S}_{t} we may in a sense view btDv(s)(k)b_{t}{D}^{(s)}_{v}(k) and gtDv(s)(k)g_{t}{D}^{(s)}_{v}(k) as the “biased” part and the “free” part, respectively. We set BAD1=REV\mathrm{BAD}_{-1}=\mathrm{REV}, and inductively define

BIASD,t,s,k={vVBADt1:|bt1Dv(s)(k)|>e10(loglogn)10},\displaystyle\mathrm{BIAS}_{D,t,s,k}=\Big{\{}v\in V\setminus\mathrm{BAD}_{t-1}:|b_{t-1}{D}^{(s)}_{v}(k)|>e^{-10(\log\log n)^{10}}\Big{\}}\,, (3.7)
BIAS𝖣,t,s,k={vVBADt1:|bt1𝖣π(v)(s)(k)|>e10(loglogn)10},\displaystyle\mathrm{BIAS}_{\mathsf{D},t,s,k}=\Big{\{}v\in V\setminus\mathrm{BAD}_{t-1}:|b_{t-1}\mathsf{D}^{(s)}_{\pi(v)}(k)|>e^{-10(\log\log n)^{10}}\Big{\}}\,,

and then define BIASt=0st1kKs(BIASD,t,s,kBIAS𝖣,t,s,k)\mathrm{BIAS}_{t}=\cup_{0\leq s\leq t}\cup_{1\leq k\leq K_{s}}\big{(}\mathrm{BIAS}_{D,t,s,k}\cup\mathrm{BIAS}_{\mathsf{D},t,s,k}\big{)} to be the collection of vertices that are overly biased by the set BADt1\mathrm{BAD}_{t-1} (see (3.15)). Accordingly, we will give up on controlling the behavior for vertices in BIASt\mathrm{BIAS}_{t}.

Now we turn to the term gtDv(t+1)g_{t}D^{(t+1)}_{v}. We will try to argue that this “free” part behaves like a suitably chosen Gaussian process via the technique of density comparison, as in Section 3.4. To this end, we sample a pair of Gaussian matrices (Z,𝖹)(\overrightarrow{Z},\overrightarrow{\mathsf{Z}}) with zero diagonal terms such that their off-diagonal entries {Zu,v,𝖹𝗎,𝗏}\{\overrightarrow{Z}_{u,v},\overrightarrow{\mathsf{Z}}_{\mathsf{u,v}}\} form a centered Gaussian family with variance q^(1q^)\hat{q}(1-\hat{q}), and the only non-zero covariances occur on pairs of the form (Zu,v,𝖹π(u),π(v))(\overrightarrow{Z}_{u,v},\overrightarrow{\mathsf{Z}}_{\pi(u),\pi(v)}) or (Zu,v,𝖹π(v),π(u))(\overrightarrow{Z}_{u,v},\overrightarrow{\mathsf{Z}}_{\pi(v),\pi(u)}) (for uvVu\neq v\in V) where 𝔼[Zu,v𝖹π(u),π(v)]=𝔼[Zu,v𝖹π(v),π(u)]=ρ^\mathbb{E}[\overrightarrow{Z}_{u,v}\overrightarrow{\mathsf{Z}}_{\pi(u),\pi(v)}]=\mathbb{E}[\overrightarrow{Z}_{u,v}\overrightarrow{\mathsf{Z}}_{\pi(v),\pi(u)}]=\hat{\rho} (this is analogous to the process defined in [17, Section 2.1]). In addition, we sample i.i.d. standard normal variables W~v(s)(k),𝖶~π(v)(s)(k)\tilde{W}^{(s)}_{v}(k),\tilde{\mathsf{W}}^{(s)}_{\pi(v)}(k) for 0st,vV,1kKs/120\leq s\leq t^{*},v\in V,1\leq k\leq K_{s}/12. (We emphasize that we will sample (Z,𝖹,W~,𝖶~)(\overrightarrow{Z},\overrightarrow{\mathsf{Z}},\tilde{W},\tilde{\mathsf{W}}) only once and then will stick to it throughout the analysis.) We can then define the following “Gaussian substitution”, where we replace each Gv,u\overrightarrow{G}_{v,u} with Zv,u\overrightarrow{Z}_{v,u} for each u,vBADtu,v\not\in\mathrm{BAD}_{t}: for vBADtv\not\in\mathrm{BAD}_{t}, define gtD~v(s)g_{t}\tilde{D}^{(s)}_{v} to be a st+1Ks\sum_{s\leq t+1}K_{s}-dimensional vector whose kk-th entry is given by

gtD~v(s)(k)=uVBADt(𝟏uΓk(s)𝔞s)Zv,u(𝔞s𝔞s2)nq^(1q^) for 0st+1,\displaystyle g_{t}\tilde{D}^{(s)}_{v}(k)=\frac{\sum_{u\in V\setminus\mathrm{BAD}_{t}}(\mathbf{1}_{u\in\Gamma^{(s)}_{k}}-\mathfrak{a}_{s})\overrightarrow{Z}_{v,u}}{\sqrt{(\mathfrak{a}_{s}-\mathfrak{a}_{s}^{2})n\hat{q}(1-\hat{q})}}\mbox{ for }0\leq s\leq t+1\,, (3.8)

and we define gt𝖣~𝗏(s)g_{t}\tilde{\mathsf{D}}^{(s)}_{\mathsf{v}} similarly. We will use a delicate Lindeberg’s interpolation argument to bound the ratio of the densities before and after the substitution. To be more precise, we will process each pair (u,w)(u,w) sequentially (in an arbitrarily prefixed order) where we replace {Gu,wq^,Gw,uq^,𝖦π(u),π(w)q^,𝖦π(w),π(u)q^}\{\overrightarrow{G}_{u,w}-\hat{q},\overrightarrow{G}_{w,u}-\hat{q},\overrightarrow{\mathsf{G}}_{\pi(u),\pi(w)}-\hat{q},\overrightarrow{\mathsf{G}}_{\pi(w),\pi(u)}-\hat{q}\} by {Zu,w,Zw,u,𝖹π(u),π(w),𝖹π(w),π(u)}\{\overrightarrow{Z}_{u,w},\overrightarrow{Z}_{w,u},\overrightarrow{\mathsf{Z}}_{\pi(u),\pi(w)},\overrightarrow{\mathsf{Z}}_{\pi(w),\pi(u)}\}. Define this operation as 𝐎{u,w}\mathbf{O}_{\{u,w\}}, and the key is to bound the change of density ratio for each operation. To this end, list {𝐎{u,w}:(u,w)E0,uw}\{\mathbf{O}_{\{u,w\}}:(u,w)\in E_{0},u\neq w\} as {𝐎{u1,w1},,𝐎{uN,wN}}\{\mathbf{O}_{\{u_{1},w_{1}\}},\ldots,\mathbf{O}_{\{u_{N},w_{N}\}}\} in the aforementioned prefixed order, and define a corresponding operation 𝐔{u,w}\mathbf{U}_{\{u,w\}} which replaces {Gu,wq^,Gw,uq^,𝖦π(u),π(w)q^,𝖦π(w),π(u)q^}\{\overrightarrow{G}_{u,w}-\hat{q},\overrightarrow{G}_{w,u}-\hat{q},\overrightarrow{\mathsf{G}}_{\pi(u),\pi(w)}-\hat{q},\overrightarrow{\mathsf{G}}_{\pi(w),\pi(u)}-\hat{q}\} by {0,0,0,0}\{0,0,0,0\}. For 0jN0\leq j\leq N and for any random variable 𝐗\mathbf{X}, define

𝐗(j)=ij𝐎{ui,wi}(𝐗),𝐗j=ij𝐔{ui,wi}(𝐗),𝐗[j]=𝐗(j)𝐗j,\displaystyle\mathbf{X}_{(j)}=\circ_{i\leq j}\mathbf{O}_{\{u_{i},w_{i}\}}\big{(}\mathbf{X}\big{)},\quad\mathbf{X}_{\langle j\rangle}=\circ_{i\leq j}\mathbf{U}_{\{u_{i},w_{i}\}}\big{(}\mathbf{X}\big{)},\quad\mathbf{X}_{[j]}=\mathbf{X}_{(j)}-\mathbf{X}_{\langle j\rangle}\,, (3.9)

where ij𝐎{ui,wi}\circ_{i\leq j}\mathbf{O}_{\{u_{i},w_{i}\}} is the composition of the first jj operations {𝐎{u1,w1},,𝐎{uj,wj}}\{\mathbf{O}_{\{u_{1},w_{1}\}},\ldots,\mathbf{O}_{\{u_{j},w_{j}\}}\} and ij𝐔{ui,wi}\circ_{i\leq j}\mathbf{U}_{\{u_{i},w_{i}\}} is the composition of the first jj operations {𝐔{u1,w1},,𝐔{uj,wj}}\{\mathbf{U}_{\{u_{1},w_{1}\}},\ldots,\mathbf{U}_{\{u_{j},w_{j}\}}\}. For notational convenience, we simply define i0𝐎{ui,wi}=i0𝐔{ui,wi}=𝐈𝐝\circ_{i\leq 0}\mathbf{O}_{\{u_{i},w_{i}\}}=\circ_{i\leq 0}\mathbf{U}_{\{u_{i},w_{i}\}}=\mathbf{Id} to be the identity map. We will see that a crucial point of our argument is to employ suitable truncations; such truncations will be useful when establishing Lemma 3.15. To this end, we define LARGEt,s,k(0)\mathrm{LARGE}^{(0)}_{t,s,k} to be the collection of vertices vVBADt1v\in V\setminus\mathrm{BAD}_{t-1} such that for some 0jN0\leq j\leq N

|Wv(s)(k)| or |ηk(s),gt1Dv(s)j| or |𝖶π(v)(s)(k)| or |ηk(s),gt1𝖣π(v)(s)j|>n1logloglogn.\big{|}W^{(s)}_{v}(k)\big{|}\mbox{ or }\big{|}\langle\eta^{(s)}_{k},g_{t-1}{D}^{(s)}_{v}\rangle_{\langle j\rangle}\big{|}\mbox{ or }\big{|}\mathsf{W}^{(s)}_{\pi(v)}(k)\big{|}\mbox{ or }\big{|}\langle\eta^{(s)}_{k},g_{t-1}{\mathsf{D}}^{(s)}_{\pi(v)}\rangle_{\langle j\rangle}\big{|}>n^{\frac{1}{\log\log\log n}}\,. (3.10)

Let LARGEt(0)=s=0tk=1Ks12LARGEt,s,k(0)\mathrm{LARGE}^{(0)}_{t}=\cup_{s=0}^{t}\cup_{k=1}^{\frac{K_{s}}{12}}\mathrm{LARGE}^{(0)}_{t,s,k}. We then define

bt,0Dv(s)(k)\displaystyle b_{t,0}{D}^{(s)}_{v}(k) =1(𝔞s𝔞s2)nq^(1q^)uLARGEt(0)BIAStPRBt(𝟏uΓk(s)𝔞s)(Gv,uq^),\displaystyle=\frac{1}{\sqrt{(\mathfrak{a}_{s}-\mathfrak{a}_{s}^{2})n\hat{q}(1-\hat{q})}}\sum_{u\in\mathrm{LARGE}_{t}^{(0)}\cup\mathrm{BIAS}_{t}\cup\mathrm{PRB}_{t}}(\mathbf{1}_{u\in\Gamma^{(s)}_{k}}-\mathfrak{a}_{s})(\overrightarrow{G}_{v,u}-\hat{q})\,,
bt,0𝖣π(v)(s)(k)\displaystyle b_{t,0}{\mathsf{D}}^{(s)}_{\pi(v)}(k) =1(𝔞s𝔞s2)nq^(1q^)uLARGEt(0)BIAStPRBt(𝟏π(u)Πk(s)𝔞s)(𝖦π(v),π(u)q^).\displaystyle=\frac{1}{\sqrt{(\mathfrak{a}_{s}-\mathfrak{a}_{s}^{2})n\hat{q}(1-\hat{q})}}\sum_{u\in\mathrm{LARGE}_{t}^{(0)}\cup\mathrm{BIAS}_{t}\cup\mathrm{PRB}_{t}}(\mathbf{1}_{\pi(u)\in\Pi^{(s)}_{k}}-\mathfrak{a}_{s})(\overrightarrow{\mathsf{G}}_{\pi(v),\pi(u)}-\hat{q})\,.

Here PRBt\mathrm{PRB}_{t} is defined in (3.14) below; we will explain later why this (seemingly circular) usage is valid. For a0a\geq 0, we inductively define LARGEt,s,k(a+1)\mathrm{LARGE}^{(a+1)}_{t,s,k} to be the collection of vertices vVBADt1v\in V\setminus\mathrm{BAD}_{t-1} such that for some 0jN0\leq j\leq N

|ηk(s),bt,aDv(s)j| or |ηk(s),bt,a𝖣π(v)(s)j|>n1logloglogn,\big{|}\langle\eta^{(s)}_{k},b_{t,a}{D}^{(s)}_{v}\rangle_{\langle j\rangle}\big{|}\mbox{ or }\big{|}\langle\eta^{(s)}_{k},b_{t,a}{\mathsf{D}}^{(s)}_{\pi(v)}\rangle_{\langle j\rangle}\big{|}>n^{\frac{1}{\log\log\log n}}\,, (3.11)

define LARGEt(a+1)=s=0tk=1Ks12LARGEt,s,k(a+1)\mathrm{LARGE}^{(a+1)}_{t}=\bigcup_{s=0}^{t}\bigcup_{k=1}^{\frac{K_{s}}{12}}\mathrm{LARGE}^{(a+1)}_{t,s,k}, and define

bt,a+1Dv(s)(k)=1(𝔞s𝔞s2)nq^(1q^)uLARGEt(a+1)(𝟏uΓk(s)𝔞s)(Gv,uq^).\displaystyle b_{t,a+1}{D}^{(s)}_{v}(k)=\frac{1}{\sqrt{(\mathfrak{a}_{s}-\mathfrak{a}_{s}^{2})n\hat{q}(1-\hat{q})}}\sum_{u\in\mathrm{LARGE}^{(a+1)}_{t}}(\mathbf{1}_{u\in\Gamma^{(s)}_{k}}-\mathfrak{a}_{s})(\overrightarrow{G}_{v,u}-\hat{q})\,.

Also we define bt,a+1𝖣π(v)(s)(k)b_{t,a+1}{\mathsf{D}}^{(s)}_{\pi(v)}(k) similarly (in analogy with the case a=0a=0). Having completed this inductive definition, we finally write LARGEt=a=0LARGEt(a)\mathrm{LARGE}_{t}=\cup_{a=0}^{\infty}\mathrm{LARGE}^{(a)}_{t}. We will argue that after removing LARGEt\mathrm{LARGE}_{t} the remaining random variables have a smoothed density whose change in each substitution can be bounded, thereby verifying that their original joint density is not too far from that of a Gaussian process via a delicate Lindeberg argument. The details of this Lindeberg argument are incorporated in Section 3.4.
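To illustrate the mechanism only (our actual argument interpolates pair by pair with the truncations above), here is a toy Lindeberg telescoping in Python: coordinates of a centered Bernoulli vector are swapped one at a time for Gaussians of the same variance, and the change of the expectation of a smooth test function accumulates over n small per-swap increments.

```python
import numpy as np

rng = np.random.default_rng(0)

def lindeberg_increments(f, n=50, q=0.2, mc=2000):
    """Toy telescoping: replace centered Bernoulli(q) coordinates one at a
    time by Gaussians of the same variance and record the average change
    of f at each swap; the total drift of E f is the sum of n increments."""
    sd = np.sqrt(q * (1 - q))
    inc = np.zeros(n)
    for _ in range(mc):
        x = rng.binomial(1, q, n) - q        # Bernoulli side
        g = rng.normal(0.0, sd, n)           # Gaussian side
        prev = f(x)
        for j in range(n):
            x[j] = g[j]                      # one substitution operation
            cur = f(x)
            inc[j] += (cur - prev) / mc
            prev = cur
    return inc

smooth = lambda x: np.cos(x.sum() / np.sqrt(len(x)))
print(np.abs(lindeberg_increments(smooth)).sum())   # small total drift
```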

Thanks to the above discussions, we have essentially reduced the problem to analyzing the corresponding Gaussian process (not quite yet, since we still need to consider one more type of bad vertices, arising from the Gaussian approximation as in (3.14) below). To this end, we will employ the techniques of Gaussian projection. Define t=σ(𝔉t)\mathcal{F}_{t}=\sigma(\mathfrak{F}_{t}) where

𝔉t={W~v(s)(k)+ηk(s),gtD~v(s)𝖶~𝗏(s)(k)+ηk(s),gt𝖣~𝗏(s):0st,1kKs12,v,π1(𝗏)BADt}.\mathfrak{F}_{t}=\Bigg{\{}\begin{aligned} \tilde{W}^{(s)}_{v}(k)+\langle\eta^{(s)}_{k},g_{t}\tilde{D}^{(s)}_{v}\rangle\\ \tilde{\mathsf{W}}^{(s)}_{\mathsf{v}}(k)+\langle\eta^{(s)}_{k},g_{t}\tilde{\mathsf{D}}^{(s)}_{\mathsf{v}}\rangle\end{aligned}:0\leq s\leq t,1\leq k\leq\frac{K_{s}}{12},v,\pi^{-1}(\mathsf{v})\not\in\mathrm{BAD}_{t}\Bigg{\}}\,. (3.12)

We will condition on (see (3.51) below)

{Γk(s),Πk(s),BADs:0st,1kKs}\{\Gamma^{(s)}_{k},\Pi^{(s)}_{k},\mathrm{BAD}_{s}:0\leq s\leq t,1\leq k\leq K_{s}\} (3.13)

and thus 𝔉t\mathfrak{F}_{t} is viewed as a Gaussian process. We can then obtain the conditional distribution of W~v(t+1)(k)+ηk(t+1),gtD~v(t+1)\tilde{W}^{(t+1)}_{v}(k)+\langle\eta^{(t+1)}_{k},g_{t}\tilde{D}^{(t+1)}_{v}\rangle given t\mathcal{F}_{t} (see Remark 3.22). In particular, we will show that the projection of W~v(t+1)(k)+ηk(t+1),gtD~v(t+1)\tilde{W}^{(t+1)}_{v}(k)+\langle\eta^{(t+1)}_{k},g_{t}\tilde{D}^{(t+1)}_{v}\rangle onto t\mathcal{F}_{t} has the form

(gt[Y~]tgt[𝖸~]t)𝐐t(Ht+1,k,v𝖧t+1,k,v).\displaystyle\begin{pmatrix}g_{t}[\tilde{Y}]_{t}&g_{t}[\tilde{\mathsf{Y}}]_{t}\end{pmatrix}\mathbf{Q}_{t}\begin{pmatrix}H_{t+1,k,v}&\mathsf{H}_{t+1,k,v}\end{pmatrix}^{*}\,.

Here gt[Y~]t(s,k,u)=W~u(s)(k)+ηk(s),gtD~u(s),gt[𝖸~]t(s,k,𝗎)=𝖶~𝗎(s)(k)+ηk(s),gt𝖣~𝗎(s)g_{t}[\tilde{Y}]_{t}(s,k,u)=\tilde{W}^{(s)}_{u}(k)+\langle\eta^{(s)}_{k},g_{t}\tilde{D}^{(s)}_{u}\rangle,g_{t}[\tilde{\mathsf{Y}}]_{t}(s,k,\mathsf{u})=\tilde{\mathsf{W}}^{(s)}_{\mathsf{u}}(k)+\langle\eta^{(s)}_{k},g_{t}\tilde{\mathsf{D}}^{(s)}_{\mathsf{u}}\rangle, and 𝐐t1\mathbf{Q}^{-1}_{t} is the conditional covariance matrix of 𝔉t\mathfrak{F}_{t} given (3.13). In addition, Ht+1,k,v(s,l,u){H}_{t+1,k,v}(s,l,u) is the conditional covariance between W~v(t+1)(k)+ηk(t+1),gtD~v(t+1)\tilde{W}^{(t+1)}_{v}(k)+\langle\eta^{(t+1)}_{k},g_{t}\tilde{D}^{(t+1)}_{v}\rangle and W~u(s)(l)+ηl(s),gtD~u(s)\tilde{W}^{(s)}_{u}(l)+\langle\eta^{(s)}_{l},g_{t}\tilde{D}^{(s)}_{u}\rangle, and 𝖧t+1,k,v(s,l,𝗎)\mathsf{H}_{t+1,k,v}(s,l,\mathsf{u}) is that between W~v(t+1)(k)+ηk(t+1),gtD~v(t+1)\tilde{W}^{(t+1)}_{v}(k)+\langle\eta^{(t+1)}_{k},g_{t}\tilde{D}^{(t+1)}_{v}\rangle and 𝖶~𝗎(s)(l)+ηl(s),gt𝖣~𝗎(s)\tilde{\mathsf{W}}^{(s)}_{\mathsf{u}}(l)+\langle\eta^{(s)}_{l},g_{t}\tilde{\mathsf{D}}^{(s)}_{\mathsf{u}}\rangle. See (3.60) and Remark 3.22 for precise definitions of 𝐐\mathbf{Q} and H,𝖧H,\mathsf{H}. Furthermore, we define

gt1[Y]t1(s,k,u)\displaystyle g_{t-1}[Y]_{t-1}(s,k,u) =Wu(s)(k)+ηk(s),gt1Du(s),\displaystyle=W^{(s)}_{u}(k)+\langle\eta^{(s)}_{k},g_{t-1}D^{(s)}_{u}\rangle\,,
[gY]t1(s,k,u)\displaystyle[gY]_{t-1}(s,k,u) =Wu(s)(k)+ηk(s),gs1Du(s),\displaystyle=W^{(s)}_{u}(k)+\langle\eta^{(s)}_{k},g_{s-1}D^{(s)}_{u}\rangle\,,

and define the mathsf version similarly. We let PRBt,k\mathrm{PRB}_{t,k} be the collection of vVv\in V such that

|([gY]t1gt1[Y]t1[g𝖸]t1gt1[𝖸]t1)𝐐t1(Ht,k,v𝖧t,k,v)|>Δt\displaystyle\big{|}\begin{pmatrix}[gY]_{t-1}-g_{t-1}[Y]_{t-1}&[g\mathsf{Y}]_{t-1}-g_{t-1}[\mathsf{Y}]_{t-1}\end{pmatrix}\mathbf{Q}_{t-1}\begin{pmatrix}H_{t,k,v}&\mathsf{H}_{t,k,v}\end{pmatrix}^{*}\big{|}>\Delta_{t} (3.14)

and let PRBt=1kKtPRBt,k\mathrm{PRB}_{t}=\cup_{1\leq k\leq K_{t}}\mathrm{PRB}_{t,k}; we will see that removing the vertices in PRBt\mathrm{PRB}_{t} is crucial for establishing Lemma 3.26. Since [gY]t1,gt1[Y]t1[gY]_{t-1},g_{t-1}[Y]_{t-1}, 𝐐t1\mathbf{Q}_{t-1} and Ht,k,vH_{t,k,v} are all measurable with respect to 𝔖t1\mathfrak{S}_{t-1}, we could have defined PRBt\mathrm{PRB}_{t} before defining LARGEt(a)\mathrm{LARGE}^{(a)}_{t} as in (3.11); we chose to postpone the definition of PRBt\mathrm{PRB}_{t} until now since it is only natural to introduce it after writing down the form of the Gaussian projection. Finally, we are ready to complete the inductive definition for the “bad” set as follows:

BADt=BADt1LARGEtBIAStPRBt.\displaystyle\mathrm{BAD}_{t}=\mathrm{BAD}_{t-1}\cup\mathrm{LARGE}_{t}\cup\mathrm{BIAS}_{t}\cup\mathrm{PRB}_{t}\,. (3.15)

We summarize in Figure 1 the logic flow for defining BADt\mathrm{BAD}_{t} from BADt1\mathrm{BAD}_{t-1} and variables in 𝔖t1\mathfrak{S}_{t-1}, which should illustrate clearly that our definitions are not “cyclic”.

Figure 1: Logic of the definition
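The Gaussian projection above is the standard conditioning formula for jointly Gaussian vectors; a minimal sketch (our naming, with Sigma_FF playing the role of 𝐐_t^{-1}, i.e., the covariance of the conditioned family, and Sigma_xF collecting the cross-covariances H, 𝖧):

```python
import numpy as np

def gaussian_projection(Sigma_FF, Sigma_xF, y):
    """Conditional mean of a centered Gaussian coordinate x given the
    jointly Gaussian vector F = y: E[x | F = y] = Sigma_xF Sigma_FF^{-1} y."""
    return Sigma_xF @ np.linalg.solve(Sigma_FF, y)

# the conditional variance is the Schur complement:
# var(x) - Sigma_xF @ inv(Sigma_FF) @ Sigma_xF.T
```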

Recall (3.4) and recall BAD1=REV\mathrm{BAD}_{-1}=\mathrm{REV}. Let 𝒯1={|BAD1|(4nK02ϑχ+n/q^(1q^)+n12logloglogn)}\mathcal{T}_{-1}=\{|\mathrm{BAD}_{-1}|\leq(4nK_{0}^{2}\vartheta_{\chi}+\sqrt{n/\hat{q}(1-\hat{q})}+n^{1-\frac{2}{\log\log\log n}})\}, and for t0t\geq 0 let 𝒯t\mathcal{T}_{t} be the event such that for sts\leq t

|BADs|e30(s+1)(loglogn)20ϑ3(s+1)(4nK02ϑχ+n/q^(1q^)+n12logloglogn),\displaystyle|\mathrm{BAD}_{s}|\leq e^{30(s+1)(\log\log n)^{20}}\vartheta^{-3(s+1)}(4nK_{0}^{2}\vartheta_{\chi}+\sqrt{n/\hat{q}(1-\hat{q})}+n^{1-\frac{2}{\log\log\log n}})\,, (3.16)
and LARGEs(logn)=.\displaystyle\mbox{and }\mathrm{LARGE}_{s}^{(\log n)}=\emptyset\,.

By Lemma 2.1 and the fact that t2logloglognt\leq 2\log\log\log n, we have |BADt|nϑ10Δt10|\mathrm{BAD}_{t}|\ll n\vartheta^{10}\Delta_{t}^{10} on the event 𝒯t\mathcal{T}_{t}. Our hope is that, on the one hand, 𝒯t\mathcal{T}_{t} occurs typically (as stated in Proposition 3.2 below) so that the number of “bad” vertices is under control, and on the other hand, on 𝒯t\mathcal{T}_{t} most vertices can be dealt with by techniques of Gaussian projection, which then allows us to control the conditional distribution.

Proposition 3.2.

We have (0tt𝒯tt)=1o(1)\mathbb{P}(\cap_{0\leq t\leq t^{*}}\mathcal{T}_{t}\cap\mathcal{E}_{t})=1-o(1).

In Section 3.6, we will prove Proposition 3.2 via induction.

3.2 Preliminaries on probability and linear algebra

In this subsection we collect some standard lemmas on probability and linear algebra that will be useful for our further analysis.

The following version of Bernstein’s inequality appeared in [20, Theorem 1.4].

Lemma 3.3.

(Bernstein’s inequality). Let X=i=1mXiX=\sum_{i=1}^{m}X_{i}, where XiX_{i}’s are independent random variables such that |Xi|K|X_{i}|\leq K almost surely. Then, for s>0s>0 we have

(|X𝔼[X]|>s)2exp{s22(σ2+Ks/3)},\displaystyle\mathbb{P}(|X-\mathbb{E}[X]|>s)\leq 2\exp\Big{\{}-\frac{s^{2}}{2(\sigma^{2}+Ks/3)}\Big{\}}\,,

where σ2=i=1mVar(Xi)\sigma^{2}=\sum_{i=1}^{m}\mathrm{Var}(X_{i}) is the variance of XX.
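For orientation, the right-hand side as a function (a trivial sketch; in later applications such as (3.22) one plugs in explicit σ² and K):

```python
import math

def bernstein_tail(s, sigma2, K):
    """Right-hand side of Bernstein's inequality (Lemma 3.3)."""
    return 2.0 * math.exp(-s * s / (2.0 * (sigma2 + K * s / 3.0)))

# e.g. for a sum of m Bernoulli(p) variables: sigma2 = m*p*(1-p), K = 1
```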

We will also use the Hanson–Wright inequality (see [32, 51, 25] and [46, Theorem 1.1]), which is useful for controlling quadratic forms of sub-Gaussian random variables.

Lemma 3.4.

(Hanson-Wright Inequality). Let X=(X1,,Xm)X=(X_{1},\ldots,X_{m}) be a random vector with independent components which satisfy 𝔼[Xi]=0\mathbb{E}[X_{i}]=0 and Xiψ2K\|X_{i}\|_{\psi_{2}}\leq K for all 1im1\leq i\leq m. If A\mathrm{A} is an mmm\!*\!m symmetric matrix, then for an absolute constant c>0c>0 we have

(|XAX𝔼[XAX]|>s)2exp{cmin(s2K4AHS2,sK2Aop)}.\mathbb{P}\big{(}|X\mathrm{A}X^{*}-\mathbb{E}[X\mathrm{A}X^{*}]|>s\big{)}\leq 2\exp\Big{\{}-c\min\Big{(}\frac{s^{2}}{K^{4}\|\mathrm{A}\|^{2}_{\mathrm{HS}}},\frac{s}{K^{2}\|\mathrm{A}\|_{\mathrm{op}}}\Big{)}\Big{\}}\,. (3.17)
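As with Bernstein's inequality, one can record the shape of the right-hand side of (3.17); since the absolute constant c is unspecified in the statement, the value below is a placeholder:

```python
import math

def hanson_wright_tail(s, K, hs_norm, op_norm, c=1.0):
    """Shape of the right-hand side of (3.17); hs_norm and op_norm are the
    Hilbert-Schmidt and operator norms of A, and c=1.0 is a placeholder."""
    return 2.0 * math.exp(-c * min(s * s / (K ** 4 * hs_norm ** 2),
                                   s / (K ** 2 * op_norm)))
```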

The following is a corollary of Hanson-Wright inequality (see [17, Lemma 3.8]).

Lemma 3.5.

Let X1,,Xm,Y1,,YmX_{1},\ldots,X_{m},Y_{1},\ldots,Y_{m} be mean-zero variables with Xiψ2,Yiψ2K\|X_{i}\|_{\psi_{2}},\|Y_{i}\|_{\psi_{2}}\leq K for all 1im1\leq i\leq m. In addition, assume for all ii that (Xi,Yi)(X_{i},Y_{i}) is independent of (Xi,Yi)(X_{\setminus i},Y_{\setminus i}), where XiX_{\setminus i} is obtained from XX by dropping its ii-th component (and similarly for YiY_{\setminus i}). Let A\mathrm{A} be an mmm\!*\!m matrix with diagonal entries being 0. Then for an absolute constant c>0c>0 and for every s>0s>0

(|XAY|>s)2exp{cmin(s2K4AHS2,sK2Aop)}.\displaystyle\mathbb{P}(|X\mathrm{A}Y^{*}|>s)\leq 2\exp\Big{\{}-c\min\Big{(}\frac{s^{2}}{K^{4}\|\mathrm{A}\|^{2}_{\mathrm{HS}}},\frac{s}{K^{2}\|\mathrm{A}\|_{\mathrm{op}}}\Big{)}\Big{\}}\,.

The following inequality is standard for posterior estimation.

Lemma 3.6.

For a random variable XX and an event AA in the same probability space, we have xX|A((A|X=x)ϵ(A))1ϵ\mathbb{P}_{x\sim X|A}\big{(}\mathbb{P}(A|X=x)\geq\epsilon\mathbb{P}(A)\big{)}\geq 1-\epsilon for ϵ[0,1]\epsilon\in[0,1].

Proof.

We have that

xX|A((A|X=x)ϵ(A))=\displaystyle\mathbb{P}_{x\sim X|A}\big{(}\mathbb{P}(A|X=x)\geq\epsilon\mathbb{P}(A)\big{)}= 𝔼xX[1(A)(A|X=x)𝟏{(A|X=x)ϵ(A)}]\displaystyle\mathbb{E}_{x\sim X}\Big{[}\frac{1}{\mathbb{P}(A)}\mathbb{P}(A|X=x)\mathbf{1}_{\{\mathbb{P}(A|X=x)\geq\epsilon\mathbb{P}(A)\}}\Big{]}
\displaystyle\geq 1(A)𝔼xX[(A|X=x)ϵ(A)]=1ϵ,\displaystyle\frac{1}{\mathbb{P}(A)}\mathbb{E}_{x\sim X}[\mathbb{P}(A|X=x)-\epsilon\mathbb{P}(A)]=1-\epsilon\,,

completing the proof of this lemma. ∎

We also need some results in linear algebra.

Lemma 3.7.

For two mmm\!*\!m matrices A,B\mathrm{A,B}, if (A+B)1opC,A1L\|\mathrm{(A+B)}^{-1}\|_{\mathrm{op}}\leq C,\|\mathrm{A}^{-1}\|_{\infty}\leq L and the entries of B\mathrm{B} are bounded by Km\frac{K}{m}, then (A+B)1max{2KCL,2L}\|\mathrm{(A+B)}^{-1}\|_{\infty}\leq\max\{2KCL,2L\}.

Proof.

It suffices to show that (A+B)xmin{12KCL,12L}\|\mathrm{(A+B)}x^{*}\|_{\infty}\geq\min\{\frac{1}{2KCL},\frac{1}{2L}\} for any x=1\|x\|_{\infty}=1. First we consider the case when x2m2KL\|x\|_{2}\geq\frac{\sqrt{m}}{2KL}. Since (A+B)x2(A+B)1op1x2C1x2\|\mathrm{(A+B)}x^{*}\|_{2}\geq\|\mathrm{(A+B)}^{-1}\|_{\mathrm{op}}^{-1}\|x\|_{2}\geq C^{-1}\|x\|_{2}, we get (A+B)x2(A+B)x22m14K2C2L2\|\mathrm{(A+B)}x^{*}\|^{2}_{\infty}\geq\frac{\|\mathrm{(A+B)}x^{*}\|^{2}_{2}}{m}\geq\frac{1}{4K^{2}C^{2}L^{2}}. Next we consider the case when x2m2KL\|x\|_{2}\leq\frac{\sqrt{m}}{2KL}. In this case x1m2KL\|x\|_{1}\leq\frac{m}{2KL}. Thus, AxA11xL1\|\mathrm{A}x^{*}\|_{\infty}\geq\|\mathrm{A}^{-1}\|_{\infty}^{-1}\|x\|_{\infty}\geq L^{-1} and BxKmx1(2L)1\|\mathrm{B}x^{*}\|_{\infty}\leq\frac{K}{m}\|x\|_{1}\leq(2L)^{-1}. Therefore, (A+B)xAxBx12L\|\mathrm{(A+B)}x^{*}\|_{\infty}\geq\|\mathrm{A}x^{*}\|_{\infty}-\|\mathrm{B}x^{*}\|_{\infty}\geq\frac{1}{2L}, as desired. ∎
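A quick numerical sanity check of Lemma 3.7 (a sketch with our own arbitrary choices of A and B satisfying the hypotheses; here the matrix ∞-norm is the induced ℓ∞ operator norm, i.e., the maximum absolute row sum):

```python
import numpy as np

rng = np.random.default_rng(1)
m, K = 300, 1.0
A = np.diag(rng.uniform(1.0, 2.0, m))       # diagonal, so ||A^{-1}||_inf <= 1
B = rng.uniform(-K / m, K / m, (m, m))      # entries bounded by K/m
M = A + B
C = 1.0 / np.linalg.svd(M, compute_uv=False).min()  # ||M^{-1}||_op
L = np.abs(np.linalg.inv(A)).sum(axis=1).max()      # ||A^{-1}||_inf
lhs = np.abs(np.linalg.inv(M)).sum(axis=1).max()    # ||M^{-1}||_inf
print(lhs <= max(2 * K * C * L, 2 * L))             # True, as the lemma asserts
```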

Lemma 3.8.

For an mm*\ell matrix A\mathrm{A}, suppose that there exist two partitions {1,,m}=k=1Kk\{1,\ldots,m\}=\sqcup_{k=1}^{K}\mathcal{I}_{k} and {1,,}=k=1K𝒥k\{1,\ldots,\ell\}=\sqcup_{k=1}^{K}\mathcal{J}_{k} with |k|,|𝒥k|D|\mathcal{I}_{k}|,|\mathcal{J}_{k}|\leq D (for 1kK1\leq k\leq K) such that |Aa,b|δ|\mathrm{A}_{a,b}|\leq\delta for 1kK1\leq k\leq K and (a,b)k×𝒥k(a,b)\in\mathcal{I}_{k}\times\mathcal{J}_{k}, and that kl(a,b)k×𝒥lAa,b2C2\sum_{k\neq l}\sum_{(a,b)\in\mathcal{I}_{k}\times\mathcal{J}_{l}}\mathrm{A}_{a,b}^{2}\leq C^{2}. Then we have AopDδ+C\|\mathrm{A}\|_{\mathrm{op}}\leq D\delta+C.

Proof.

Denote |k|=mk|\mathcal{I}_{k}|=m_{k} and |𝒥k|=k|\mathcal{J}_{k}|=\ell_{k}. Define Adiag\mathrm{A}^{\mathrm{diag}} such that Aa,bdiag=Aa,b\mathrm{A}^{\mathrm{diag}}_{a,b}=\mathrm{A}_{a,b} for (a,b)k×𝒥k(a,b)\in\mathcal{I}_{k}\times\mathcal{J}_{k} and that Aa,bdiag=0\mathrm{A}^{\mathrm{diag}}_{a,b}=0 otherwise. Then, there exist two permutation matrices Q1,Q2\mathrm{Q_{1},Q_{2}} such that

Q1AdiagQ2=(A1Om12Om1KOm21A2Om2KOmK1OmK2AK).\mathrm{Q}_{1}\mathrm{A}^{\mathrm{diag}}\mathrm{Q}_{2}=\begin{pmatrix}\mathrm{A}_{1}&\mathrm{O}_{m_{1}*\ell_{2}}&\cdots&\mathrm{O}_{m_{1}*\ell_{K}}\\ \mathrm{O}_{m_{2}*\ell_{1}}&\mathrm{A}_{2}&\cdots&\mathrm{O}_{m_{2}*\ell_{K}}\\ \vdots&\vdots&\vdots&\vdots\\ \mathrm{O}_{m_{K}*\ell_{1}}&\mathrm{O}_{m_{K}*\ell_{2}}&\cdots&\mathrm{A}_{K}\end{pmatrix}\,.

Here Ak\mathrm{A}_{k} is a matrix of size mkkm_{k}\!*\!\ell_{k} with entries bounded by δ\delta. Thus Adiagop=max1kKAkopDδ\|\mathrm{A}^{\mathrm{diag}}\|_{\mathrm{op}}=\max_{1\leq k\leq K}\|\mathrm{A}_{k}\|_{\mathrm{op}}\leq D\delta. Also we have AAdiagop2AAdiagHS2C2\|\mathrm{A}-\mathrm{A}^{\mathrm{diag}}\|_{\mathrm{op}}^{2}\leq\|\mathrm{A}-\mathrm{A}^{\mathrm{diag}}\|_{\mathrm{HS}}^{2}\leq C^{2}, and thus the result follows from the triangle inequality. ∎

3.3 Analysis of initialization

In this subsection we analyze the initialization. We will prove a concentration result for k(a),Υk(a)\aleph^{(a)}_{k},\Upsilon^{(a)}_{k} for 1aχ+1,1kK01\leq a\leq\chi+1,1\leq k\leq K_{0} in Lemma 3.13, which then implies that (0)=1o(1)\mathbb{P}(\mathcal{E}_{0})=1-o(1) as in Lemma 3.14. As preparations, we first collect a few technical estimates on binomial variables.

Lemma 3.9.

For mNp1m\ll N\ll p^{-1} and l=Θ(1)l=\Theta(1), we have

(Bin(N+m,p)l)(Bin(N,p)l)2mlN(Bin(N,p)l).\displaystyle\mathbb{P}(\mathrm{Bin}(N+m,p)\geq l)-\mathbb{P}(\mathrm{Bin}(N,p)\geq l)\leq\frac{2ml}{N}\mathbb{P}(\mathrm{Bin}(N,p)\geq l)\,.
Proof.

The left-hand side equals k=1l(Bin(m,p)=k)(l>Bin(N,p)lk)\sum_{k=1}^{l}\mathbb{P}(\mathrm{Bin}(m,p)=k)*\mathbb{P}(l>\mathrm{Bin}(N,p)\geq l-k). Since mp1mp\ll 1, a straightforward computation yields that (Bin(m,p)k)(mp)kk!\mathbb{P}(\mathrm{Bin}(m,p)\geq k)\sim\frac{(mp)^{k}}{k!} for any fixed kk. Since Np1Np\ll 1, we also have that (Bin(N,p)lk)(Bin(N,p)l)l!(lk)!(Np)k\frac{\mathbb{P}(\mathrm{Bin}(N,p)\geq l-k)}{\mathbb{P}(\mathrm{Bin}(N,p)\geq l)}\sim\frac{l!}{(l-k)!}(Np)^{-k}. Thus,

(Bin(N+m,p)l)(Bin(N,p)l)(Bin(N,p)l)1.5k=1ll!(lk)!(Np)k1k!(mp)k2mlN,\displaystyle\frac{\mathbb{P}(\mathrm{Bin}(N+m,p)\geq l)-\mathbb{P}(\mathrm{Bin}(N,p)\geq l)}{\mathbb{P}(\mathrm{Bin}(N,p)\geq l)}\leq 1.5\sum_{k=1}^{l}\frac{l!}{(l-k)!}(Np)^{-k}\cdot\frac{1}{k!}(mp)^{k}\leq\frac{2ml}{N},

which yields the desired bound. ∎
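A numerical sanity check of Lemma 3.9 in a representative regime m≪N≪p^{-1} with l=Θ(1) (using scipy's binomial survival function; the parameter values are our own choice):

```python
from scipy.stats import binom

N, m, p, l = 10_000, 50, 1e-6, 3
lhs = binom.sf(l - 1, N + m, p) - binom.sf(l - 1, N, p)   # P(Bin >= l) difference
rhs = (2 * m * l / N) * binom.sf(l - 1, N, p)
print(lhs <= rhs)   # True in this regime
```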

Corollary 3.10.

For mN,Mp1m\ll N,M\ll p^{-1} and l=Θ(1)l=\Theta(1), we have

(CorBin(N+m,N+m,p;M+m,ρ)(l,l))(CorBin(N,N,p;M,ρ)(l,l))\displaystyle\mathbb{P}(\mathrm{CorBin}(N+m,N+m,p;M+m,\rho)\geq(l,l))-\mathbb{P}(\mathrm{CorBin}(N,N,p;M,\rho)\geq(l,l))
4lmN(Bin(N,p)l).\displaystyle\leq\frac{4lm}{N}\mathbb{P}(\mathrm{Bin}(N,p)\geq l)\,.
Proof.

Let (X,Y)=𝑑CorBin(N,N,p;M,ρ)(X,Y)\overset{d}{=}\mathrm{CorBin}(N,N,p;M,\rho) and (U,U)=𝑑CorBin(m,m,p;m,ρ)(U,U^{\prime})\overset{d}{=}\mathrm{CorBin}(m,m,p;m,\rho) be such that (X,Y)(X,Y) is independent of (U,U)(U,U^{\prime}). Then the left-hand side (in the statement of the corollary) equals

(X+U,Y+Ul)(X,Yl)(X+Ul>X)+(Y+Ul>Y)\displaystyle\mathbb{P}(X+U,Y+U^{\prime}\geq l)-\mathbb{P}(X,Y\geq l)\leq\mathbb{P}(X+U\geq l>X)+\mathbb{P}(Y+U^{\prime}\geq l>Y)
=\displaystyle=\ 2((Bin(N+m,p)l)(Bin(N,p)l))4lmN(Bin(N,p)l),\displaystyle 2(\mathbb{P}(\mathrm{Bin}(N+m,p)\geq l)-\mathbb{P}(\mathrm{Bin}(N,p)\geq l))\leq\frac{4lm}{N}\mathbb{P}(\mathrm{Bin}(N,p)\geq l)\,,

where the last inequality follows from Lemma 3.9. This gives the desired bound. ∎

Lemma 3.11.

For all N1,M,m,lN\gg 1,M,m,l\in\mathbb{N} and p,ϵ>0p,\epsilon>0 we have

(Bin(N+m,p)l)(Bin(N,p)l)ϵ2+(mp+ϵ1mp(1p))logNNp(1p).\displaystyle\mathbb{P}(\mathrm{Bin}(N+m,p)\geq l)-\mathbb{P}(\mathrm{Bin}(N,p)\geq l)\lesssim\epsilon^{2}+\big{(}mp+\epsilon^{-1}\sqrt{mp(1-p)}\big{)}\frac{\log N}{\sqrt{Np(1-p)}}\,.
Proof.

Writing Q=mp+ϵ1mp(1p)Q=mp+\epsilon^{-1}\sqrt{mp(1-p)}, we have that the left hand side is bounded by (Bin(m,p)Q)+(lQ<Bin(N,p)<l)\mathbb{P}(\mathrm{Bin}(m,p)\geq Q)+\mathbb{P}(l-Q<\mathrm{Bin}(N,p)<l). Applying Chebyshev’s inequality, we have (Bin(m,p)Q)ϵ2\mathbb{P}(\mathrm{Bin}(m,p)\geq Q)\leq\epsilon^{2}. In addition, we have

(lQ<Bin(N,p)<l)Qmaxk>0{(Bin(N,p)=k)}\displaystyle\mathbb{P}(l-Q<\mathrm{Bin}(N,p)<l)\leq Q\max_{k>0}\{\mathbb{P}(\mathrm{Bin}(N,p)=k)\}
\displaystyle\leq\ Qmaxk{[Np],[Np]+1},k0{(Nk)pk(1p)Nk}QlogNNp(1p),\displaystyle Q\cdot\max_{k\in\{[Np],[Np]+1\},k\neq 0}\Big{\{}\binom{N}{k}p^{k}(1-p)^{N-k}\Big{\}}\lesssim\frac{Q\log N}{\sqrt{Np(1-p)}}\,, (3.18)

where [Np]=max{r:rNp}[Np]=\max\{r:r\leq Np\}. Here the last inequality above can be verified as follows: if Np(1p)=O(1)Np(1-p)=O(1), then logNNp(1p)1\frac{\log N}{\sqrt{Np(1-p)}}\gg 1 (and thus the bound holds); if Np(1p)1Np(1-p)\gg 1, then by Stirling’s formula we have (Nk)pk(1p)Nk=O(logNNp(1p))\binom{N}{k}p^{k}(1-p)^{N-k}=O(\frac{\log N}{\sqrt{Np(1-p)}}), as desired. ∎

Corollary 3.12.

For N1,M,m1,m2=o(N),lN\gg 1,M,m_{1},m_{2}=o(N),l\in\mathbb{N} and p,ϵ>0p,\epsilon>0, we have

(CorBin(N+m1,N+m1,p;M+m2,ρ)(l,l))(CorBin(N,N,p;M,ρ)(l,l))\displaystyle\mathbb{P}(\mathrm{CorBin}(N+m_{1},N+m_{1},p;M+m_{2},\rho)\geq(l,l))-\mathbb{P}(\mathrm{CorBin}(N,N,p;M,\rho)\geq(l,l))
\displaystyle\lesssim\ ϵ2+(2m1p+4ϵ1m1p(1p)+4ϵ1m2p(1p))logNNp(1p).\displaystyle\epsilon^{2}+(2m_{1}p+4\epsilon^{-1}\sqrt{m_{1}p(1-p)}+4\epsilon^{-1}\sqrt{m_{2}p(1-p)})\frac{\log N}{\sqrt{Np(1-p)}}\,.
Proof.

Let (X,Y)=𝑑CorBin(Nm2,Nm2,p;M,ρ)(X,Y)\overset{d}{=}\mathrm{CorBin}(N-m_{2},N-m_{2},p;M,\rho), (Z,Z)=𝑑CorBin(m2,m2,p;0,ρ)(Z,Z^{\prime})\overset{d}{=}\mathrm{CorBin}(m_{2},m_{2},p;0,\rho), (W,W)=𝑑CorBin(m2,m2,p;m2,ρ)(W,W^{\prime})\overset{d}{=}\mathrm{CorBin}(m_{2},m_{2},p;m_{2},\rho) and (U,U)=𝑑CorBin(m1,m1,p;0,ρ)(U,U^{\prime})\overset{d}{=}\mathrm{CorBin}(m_{1},m_{1},p;0,\rho) be independent pairs of variables. Let Q1=m1p+2ϵ1m1p(1p)Q_{1}=m_{1}p+2\epsilon^{-1}\sqrt{m_{1}p(1-p)} and Q2=2ϵ1m2p(1p)Q_{2}=2\epsilon^{-1}\sqrt{m_{2}p(1-p)}. Then the difference of probabilities in the statement is equal to

(X+U+W,Y+U+Wl)(X+Z,Y+Zl)\displaystyle\mathbb{P}(X+U+W,Y+U^{\prime}+W^{\prime}\geq l)-\mathbb{P}(X+Z,Y+Z^{\prime}\geq l)
\displaystyle\leq\ (X+U+Wl>X+Z)+(Y+U+Wl>Y+Z)\displaystyle\mathbb{P}(X+U+W\geq l>X+Z)+\mathbb{P}(Y+U^{\prime}+W^{\prime}\geq l>Y+Z^{\prime})
\displaystyle\leq\ 2((|WZ|>Q2)+(U>Q1)+(lQ1Q2X+Z<l))\displaystyle 2\big{(}\mathbb{P}(|W-Z|>Q_{2})+\mathbb{P}(U>Q_{1})+\mathbb{P}(l-Q_{1}-Q_{2}\leq X+Z<l)\big{)}
\displaystyle\lesssim\ 4ϵ2+2(Q1+Q2)(logN)/Np(1p),\displaystyle 4\epsilon^{2}+2(Q_{1}+Q_{2})(\log N)/\sqrt{Np(1-p)}\,,

where the last transition follows from Chebyshev’s inequality and (3.18). ∎

Recall (2.7). Define the targeted approximation error in the aa-th iteration of the initialization by

Λa=100a(nq^)12(logn)ϑa for 0aχ.\displaystyle\Lambda_{a}=100^{a}(n\hat{q})^{-\frac{1}{2}}(\log n)\vartheta_{a}\mbox{ for }0\leq a\leq\chi\,. (3.19)
Lemma 3.13.

The following hold with probability 1o(1)1-o(1) for all 0aχ0\leq a\leq\chi:
(1) ||k(a)|nϑa|,||Υk(a)|nϑa|Λa\Big{|}\frac{|\aleph^{(a)}_{k}|}{n}-\vartheta_{a}\Big{|},\Big{|}\frac{|\Upsilon^{(a)}_{k}|}{n}-\vartheta_{a}\Big{|}\leq\Lambda_{a} for 1kK01\leq k\leq K_{0};
(2) ||k(a)l(a)|nϑa2|,||Υk(a)Υl(a)|nϑa2|,||π(k(a))Υl(a)|nϑa2|Λa\Big{|}\frac{|\aleph^{(a)}_{k}\cap\aleph^{(a)}_{l}|}{n}-\vartheta_{a}^{2}\Big{|},\Big{|}\frac{|\Upsilon^{(a)}_{k}\cap\Upsilon^{(a)}_{l}|}{n}-\vartheta_{a}^{2}\Big{|},\Big{|}\frac{|\pi(\aleph^{(a)}_{k})\cap\Upsilon^{(a)}_{l}|}{n}-\vartheta_{a}^{2}\Big{|}\leq\Lambda_{a} for 1klK01\leq k\neq l\leq K_{0};
(3) ||π(k(a))Υk(a)|nςa|Λa\Big{|}\frac{|\pi(\aleph^{(a)}_{k})\cap\Upsilon^{(a)}_{k}|}{n}-\varsigma_{a}\Big{|}\leq\Lambda_{a} for 1kK01\leq k\leq K_{0}.

Proof.

The proof is by induction on aa. The base case for a=0a=0 is trivial. Now suppose that Items (1), (2) and (3) hold up to some aχ1a\leq\chi-1 and we wish to prove that (1), (2) and (3) hold with probability 1o(1)1-o(1) for a+1a+1. To this end, applying (2.4) and Lemma 2.1 we have ϑa=Θ(n1(nq^)χ1)1nq^\vartheta_{a}=\Theta(n^{-1}(n\hat{q})^{\chi-1})\ll\frac{1}{n\hat{q}}. Recall the definition of REV(a)\mathrm{REV}^{(a)} as in (3.4), which records the collection of vertices explored by our algorithm. By the induction hypothesis we know

|REV(a)|4K0ϑanΛa+1n,\displaystyle|\mathrm{REV}^{(a)}|\leq 4K_{0}\vartheta_{a}n\ll\Lambda_{a+1}n\,, (3.20)

where the last transition follows from Lemma 2.1. Thus, it suffices to control the concentration of |k(a+1)REV(a)||\aleph^{(a+1)}_{k}\setminus\mathrm{REV}^{(a)}| in order to control that for |k(a+1)||\aleph^{(a+1)}_{k}|. Note that

|k(a+1)REV(a)|nϑa+1=1nuVREV(a)(𝟏{uk(a+1)}ϑa+1)+O(ϑa+1ϑa),\displaystyle\frac{|\aleph^{(a+1)}_{k}\setminus\mathrm{REV}^{(a)}|}{n}-\vartheta_{a+1}=\frac{1}{n}\sum_{u\in V\setminus\mathrm{REV}^{(a)}}\Big{(}\mathbf{1}_{\{u\in\aleph^{(a+1)}_{k}\}}-\vartheta_{a+1}\Big{)}+O(\vartheta_{a+1}\vartheta_{a})\,,

where ϑa+1ϑaϑa+1/nq^Λa+1\vartheta_{a+1}\vartheta_{a}\ll\vartheta_{a+1}/n\hat{q}\ll\Lambda_{a+1} by Lemma 2.1. Since the indicators in the above sum are measurable with respect to {Gv,u:vk(a)}\{\overrightarrow{G}_{v,u}:v\in\aleph^{(a)}_{k}\}, we see that conditioned on a realization of {k(a),Υk(a)}\{\aleph^{(a)}_{k},\Upsilon^{(a)}_{k}\} we have that {𝟏{uk(a+1)}:uVREV(a)}\{\mathbf{1}_{\{u\in\aleph^{(a+1)}_{k}\}}:u\in V\setminus\mathrm{REV}^{(a)}\} is a collection of i.i.d. Bernoulli random variables with parameter given by

pk(a+1)=(uk(a+1))=(Bin(|k(a)|,q^)1).\displaystyle p^{(a+1)}_{k}=\mathbb{P}\big{(}u\in\aleph^{(a+1)}_{k}\big{)}=\mathbb{P}\big{(}\mathrm{Bin}(|\aleph^{(a)}_{k}|,\hat{q})\geq 1\big{)}\,.

By the induction hypothesis, we have ||k(a)|nϑa|nΛa\big{|}|\aleph^{(a)}_{k}|-n\vartheta_{a}\big{|}\leq n\Lambda_{a}. Combined with Lemma 3.9, it yields that

|ϑa+1pk(a+1)||(Bin(nϑa,q^)1)(Bin(nϑa+nΛa,q^)1)|\displaystyle\big{|}\vartheta_{a+1}-p^{(a+1)}_{k}\big{|}\leq\big{|}\mathbb{P}\big{(}\mathrm{Bin}(n\vartheta_{a},\hat{q})\geq 1\big{)}-\mathbb{P}\big{(}\mathrm{Bin}(n\vartheta_{a}+n\Lambda_{a},\hat{q})\geq 1\big{)}\big{|}
\displaystyle\leq\ 2nΛanϑa(Bin(nϑa,q^)1)(2.7),(3.19)110Λa+1.\displaystyle\frac{2n\Lambda_{a}}{n\vartheta_{a}}\mathbb{P}(\mathrm{Bin}(n\vartheta_{a},\hat{q})\geq 1)\overset{\eqref{equ-def-iter-vartheta-varsigma},\eqref{equ-def-Lambda}}{\leq}\frac{1}{10}\Lambda_{a+1}\,. (3.21)

Thus, we may apply Lemma 3.3 and get that

(||k(a+1)|nϑa+1|>Λa+1)(3.20)(||k(a+1)REV(a)|nϑa+1|>910Λa+1)\displaystyle\mathbb{P}\Big{(}\Big{|}\frac{|\aleph^{(a+1)}_{k}|}{n}-\vartheta_{a+1}\Big{|}>\Lambda_{a+1}\Big{)}\overset{\eqref{eq-REV-approximation}}{\leq}\mathbb{P}\Big{(}\Big{|}\frac{|\aleph^{(a+1)}_{k}\setminus\mathrm{REV}^{(a)}|}{n}-\vartheta_{a+1}\Big{|}>\frac{9}{10}\Lambda_{a+1}\Big{)}
(3.21)\displaystyle\overset{\eqref{equ-bound-p-a+1-minus-vartheta-a+1}}{\leq} (1n|Bin(n|REV(a)|,pk(a+1))(n|REV(a)|)pk(a+1)|>12Λa+1)\displaystyle\mathbb{P}\Big{(}\frac{1}{n}\Big{|}\mathrm{Bin}(n-|\mathrm{REV}^{(a)}|,p^{(a+1)}_{k})-(n-|\mathrm{REV}^{(a)}|)p^{(a+1)}_{k}\Big{|}>\frac{1}{2}\Lambda_{a+1}\Big{)}
\displaystyle\leq\ 2exp{(12nΛa+1)22(npk(a+1)+nΛa+1/3)}2exp{q^1ϑa+1(logn)2}.\displaystyle 2\exp\Big{\{}-\frac{(\frac{1}{2}n\Lambda_{a+1})^{2}}{2(np^{(a+1)}_{k}+n\Lambda_{a+1}/3)}\Big{\}}\leq 2\exp\{-\hat{q}^{-1}\vartheta_{a+1}(\log n)^{2}\}\,. (3.22)

Similar results hold for Υk(a+1)\Upsilon^{(a+1)}_{k}. We now move to Item (2). Similarly, conditioned on a realization of {k(a),Υk(a)}\{\aleph^{(a)}_{k},\Upsilon^{(a)}_{k}\}, we have that

|k(a+1)l(a+1)REV(a)|n=1nuVREV(a)𝟏{uk(a+1)l(a+1)}\displaystyle\frac{|\aleph^{(a+1)}_{k}\cap\aleph^{(a+1)}_{l}\setminus\mathrm{REV}^{(a)}|}{n}=\frac{1}{n}\sum_{u\in V\setminus\mathrm{REV}^{(a)}}\mathbf{1}_{\{u\in\aleph^{(a+1)}_{k}\cap\aleph^{(a+1)}_{l}\}}

is a (normalized) sum of i.i.d. Bernoulli random variables with parameter given by

pk,l(a+1)=\displaystyle p^{(a+1)}_{k,l}= (uk(a+1)l(a+1))=(wk(a)Gw,u,wl(a)Gw,u1),\displaystyle\mathbb{P}\Big{(}u\in\aleph^{(a+1)}_{k}\cap\aleph^{(a+1)}_{l}\Big{)}=\mathbb{P}\Big{(}\sum_{w\in\aleph^{(a)}_{k}}\overrightarrow{G}_{w,u},\sum_{w\in\aleph^{(a)}_{l}}\overrightarrow{G}_{w,u}\geq 1\Big{)}\,,

where (wk(a)Gw,u,wl(a)Gw,u)=𝑑CorBin(|k(a)|,|l(a)|,q^;|k(a)l(a)|,ρ^)\big{(}\sum_{w\in\aleph^{(a)}_{k}}\overrightarrow{G}_{w,u},\sum_{w\in\aleph^{(a)}_{l}}\overrightarrow{G}_{w,u}\big{)}\overset{d}{=}\mathrm{CorBin}(|\aleph^{(a)}_{k}|,|\aleph^{(a)}_{l}|,\hat{q};|\aleph^{(a)}_{k}\cap\aleph^{(a)}_{l}|,\hat{\rho}). By the induction hypothesis we have ||k(a)|nϑa|,||l(a)|nϑa|,||k(a)l(a)|nϑa2|nΛa\big{|}|\aleph^{(a)}_{k}|-n\vartheta_{a}\big{|},\big{|}|\aleph^{(a)}_{l}|-n\vartheta_{a}\big{|},\big{|}|\aleph^{(a)}_{k}\cap\aleph^{(a)}_{l}|-n\vartheta_{a}^{2}\big{|}\leq n\Lambda_{a}. In addition, by Lemma 2.1 we have ϑa2ϑa/nq^Λa\vartheta_{a}^{2}\leq\vartheta_{a}/n\hat{q}\ll\Lambda_{a} and thus |k(a)l(a)|1.1nΛa|\aleph^{(a)}_{k}\cap\aleph^{(a)}_{l}|\leq 1.1n\Lambda_{a}. Combined with Corollary 3.10, these yield that

|(uk(a+1)l(a+1))ϑa+12|5Λaϑaϑa+1110Λa+1.\displaystyle\Big{|}\mathbb{P}\Big{(}u\in\aleph^{(a+1)}_{k}\cap\aleph^{(a+1)}_{l}\Big{)}-\vartheta_{a+1}^{2}\Big{|}\leq\frac{5\Lambda_{a}}{\vartheta_{a}}\vartheta_{a+1}\leq\frac{1}{10}\Lambda_{a+1}\,.

Applying Lemma 3.3 again, we get that

(||k(a+1)l(a+1)|nϑa+12|>Λa+1)\displaystyle\mathbb{P}\Big{(}\Big{|}\frac{|\aleph^{(a+1)}_{k}\cap\aleph^{(a+1)}_{l}|}{n}-\vartheta_{a+1}^{2}\Big{|}>\Lambda_{a+1}\Big{)}
\displaystyle\leq\ (1n|Bin(n|REV(a)|,pk,l(a+1))(n|REV(a)|)pk,l(a+1)|>12Λa+1)\displaystyle\mathbb{P}\Big{(}\frac{1}{n}\Big{|}\mathrm{Bin}(n-|\mathrm{REV}^{(a)}|,p^{(a+1)}_{k,l})-(n-|\mathrm{REV}^{(a)}|)p^{(a+1)}_{k,l}\Big{|}>\frac{1}{2}\Lambda_{a+1}\Big{)}
\displaystyle\leq\ 2exp{q^1ϑa+1(logn)2}.\displaystyle 2\exp\{-\hat{q}^{-1}\vartheta_{a+1}(\log n)^{2}\}\,. (3.23)

The terms |Υk(a+1)Υl(a+1)|n\frac{|\Upsilon^{(a+1)}_{k}\cap\Upsilon^{(a+1)}_{l}|}{n} and |π(k(a+1))Υl(a+1)|n\frac{|\pi(\aleph^{(a+1)}_{k})\cap\Upsilon^{(a+1)}_{l}|}{n} can be bounded in the same way. We next turn to Item (3). In order to bound ||π(k(a+1))Υk(a+1)|nςa+1|\big{|}\frac{|\pi(\aleph^{(a+1)}_{k})\cap\Upsilon^{(a+1)}_{k}|}{n}-\varsigma_{a+1}\big{|} (to lighten notation, below we write k(a+1)Υk(a+1)\aleph^{(a+1)}_{k}\cap\Upsilon^{(a+1)}_{k} for π(k(a+1))Υk(a+1)\pi(\aleph^{(a+1)}_{k})\cap\Upsilon^{(a+1)}_{k}), note that

|k(a+1)Υk(a+1)REV(a)|nςa+1=1nuVREV(a)(𝟏{uk(a+1)Υk(a+1)}ςa+1)+O(ςa+1ϑa).\displaystyle\frac{|\aleph^{(a+1)}_{k}\cap\Upsilon^{(a+1)}_{k}\setminus\mathrm{REV}^{(a)}|}{n}-\varsigma_{a+1}=\frac{1}{n}\sum_{u\in V\setminus\mathrm{REV}^{(a)}}\Big{(}\mathbf{1}_{\{u\in\aleph^{(a+1)}_{k}\cap\Upsilon^{(a+1)}_{k}\}}-\varsigma_{a+1}\Big{)}+O(\varsigma_{a+1}\vartheta_{a})\,.

Given a realization of {k(a),Υk(a)}\{\aleph^{(a)}_{k},\Upsilon^{(a)}_{k}\}, we have that {𝟏{uk(a+1)Υk(a+1)}}\{\mathbf{1}_{\{u\in\aleph^{(a+1)}_{k}\cap\Upsilon^{(a+1)}_{k}\}}\} is a collection of i.i.d. Bernoulli variables, with parameter satisfying (by Corollary 3.10 and the induction hypothesis again)

|(uk(a+1)Υk(a+1))ςa+1|=|(X,Y1)ςa+1|5Λaϑaϑa+1110Λa+1,\displaystyle\Big{|}\mathbb{P}\Big{(}u\in\aleph^{(a+1)}_{k}\cap\Upsilon^{(a+1)}_{k}\Big{)}-\varsigma_{a+1}\Big{|}=\Big{|}\mathbb{P}(X,Y\geq 1)-\varsigma_{a+1}\Big{|}\leq\frac{5\Lambda_{a}}{\vartheta_{a}}\vartheta_{a+1}\leq\frac{1}{10}\Lambda_{a+1}\,,

where (X,Y)CorBin(|k(a)|,|Υk(a)|,q^;|k(a)Υk(a)|,ρ^)(X,Y)\sim\mathrm{CorBin}(|\aleph^{(a)}_{k}|,|\Upsilon^{(a)}_{k}|,\hat{q};|\aleph^{(a)}_{k}\cap\Upsilon^{(a)}_{k}|,\hat{\rho}). By Lemma 3.3 again, we get that

(||k(a+1)Υk(a+1)|nςa+1|>Λa+1)\displaystyle\mathbb{P}\Big{(}\Big{|}\frac{|\aleph^{(a+1)}_{k}\cap\Upsilon^{(a+1)}_{k}|}{n}-\varsigma_{a+1}\Big{|}>\Lambda_{a+1}\Big{)}
\displaystyle\leq\ (||k(a+1)Υk(a+1)REV(a)|nςa+1|>910Λa+1)\displaystyle\mathbb{P}\Big{(}\Big{|}\frac{|\aleph^{(a+1)}_{k}\cap\Upsilon^{(a+1)}_{k}\setminus\mathrm{REV}^{(a)}|}{n}-\varsigma_{a+1}\Big{|}>\frac{9}{10}\Lambda_{a+1}\Big{)}
\displaystyle\leq\ 2exp{(12Λa+1n)22(nςa+1+nΛa+1/3)}2exp{q^1ϑa+1(logn)2}.\displaystyle 2\exp\Big{\{}-\frac{(\frac{1}{2}\Lambda_{a+1}n)^{2}}{2(n\varsigma_{a+1}+n\Lambda_{a+1}/3)}\Big{\}}\leq 2\exp\{-\hat{q}^{-1}\vartheta_{a+1}(\log n)^{2}\}\,. (3.24)

Combining (3.22), (3.23), and (3.24) and applying a union bound, we see that (assuming (1), (2) and (3) hold for aa) the conditional probability for (1), (2) or (3) to fail for a+1a+1 is at most 2K02exp{(logn)2}2K_{0}^{2}\exp\{-(\log n)^{2}\} since ϑa+1ϑ1=q^\vartheta_{a+1}\geq\vartheta_{1}=\hat{q}. Therefore, we complete the proof of Lemma 3.13 by induction (note that χ=O(1)\chi=O(1)). ∎

Lemma 3.14.

We have (0)=1o(1)\mathbb{P}(\mathcal{E}_{0})=1-o(1).

Proof.

By Lemma 3.13, with probability 1o(1)1-o(1) we have

|REV|4K0nϑχnϑΔ0.|\mathrm{REV}|\leq 4K_{0}n\vartheta_{\chi}\ll n\vartheta\Delta_{0}\,.

Here the last inequality can be derived as follows: from Lemma 2.1, we have either ϑ=Θ(1)\vartheta=\Theta(1) or ϑχnα+o(1)\vartheta_{\chi}\leq n^{-\alpha+o(1)}. In addition, when ϑ=Θ(1)\vartheta=\Theta(1) we have (recall (2.9))

nϑΔ0=Θ(nΔ0)(3.1)ne(loglogn)100(2.4)4K0nϑχ;n\vartheta\Delta_{0}=\Theta(n\Delta_{0})\overset{\eqref{equ-def-delta}}{\gg}ne^{-(\log\log n)^{100}}\overset{\eqref{eq-def-chi}}{\gg}4K_{0}n\vartheta_{\chi}\,;

and when ϑχnα+o(1)\vartheta_{\chi}\leq n^{-\alpha+o(1)} we have

4K0nϑχ=n1α+o(1)ne(loglogn)200(2.4),(3.1)nϑΔ0.4K_{0}n\vartheta_{\chi}=n^{1-\alpha+o(1)}\ll ne^{-(\log\log n)^{200}}\overset{\eqref{eq-def-chi},\eqref{equ-def-delta}}{\ll}n\vartheta\Delta_{0}\,.

Provided with the preceding bound on |REV||\mathrm{REV}|, it suffices to analyze |Γk(0)REV|n\frac{|\Gamma^{(0)}_{k}\setminus\mathrm{REV}|}{n}, whose conditional distribution given {k(a),Υk(a):1kK0,0aχ}\{\aleph^{(a)}_{k},\Upsilon^{(a)}_{k}:1\leq k\leq K_{0},0\leq a\leq\chi\} is that of 1n\frac{1}{n} times a sum of n|REV|n-|\mathrm{REV}| i.i.d. Bernoulli variables with parameter given by

(uΓk(0))=(Bin(|k(χ)|,q^)𝚍χ).\displaystyle\mathbb{P}(u\in\Gamma^{(0)}_{k})=\mathbb{P}(\mathrm{Bin}(|\aleph^{(\chi)}_{k}|,\hat{q})\geq\mathtt{d}_{\chi})\,.

By Lemma 3.13 again, we have ||k(χ)|ϑχn|nΛχ\Big{|}|\aleph^{(\chi)}_{k}|-\vartheta_{\chi}n\Big{|}\leq n\Lambda_{\chi} with probability 1o(1)1-o(1). Provided with this and applying Lemma 3.11 (with ϵ=ϑΔ0\epsilon=\vartheta\Delta_{0}), we get that

|(uΓk(0))ϑ|\displaystyle|\mathbb{P}(u\in\Gamma^{(0)}_{k})-\vartheta|\leq |(Bin(ϑχn+nΛχ,q^)𝚍χ)(Bin(ϑχn,q^)𝚍χ)|\displaystyle|\mathbb{P}(\mathrm{Bin}(\vartheta_{\chi}n+n\Lambda_{\chi},\hat{q})\geq\mathtt{d}_{\chi})-\mathbb{P}(\mathrm{Bin}(\vartheta_{\chi}n,\hat{q})\geq\mathtt{d}_{\chi})|
\displaystyle\leq (ϑΔ0)2+(logn)nΛχq^+2(ϑΔ0)1nΛχq^ϑχnq^ϑΔ0,\displaystyle(\vartheta\Delta_{0})^{2}+(\log n)\frac{n\Lambda_{\chi}\hat{q}+2(\vartheta\Delta_{0})^{-1}\sqrt{n\Lambda_{\chi}\hat{q}}}{\sqrt{\vartheta_{\chi}n\hat{q}}}\ll\vartheta\Delta_{0}\,,

where we used (3.1) and the inequalities nΛχq^ϑχnq^=100χϑχlognϑΔ0/logn\frac{n\Lambda_{\chi}\hat{q}}{\sqrt{\vartheta_{\chi}n\hat{q}}}=100^{\chi}\sqrt{\vartheta_{\chi}}\log n\ll\vartheta\Delta_{0}/\log n (by Lemma 2.1) as well as (ϑΔ0)1nΛχq^ϑχnq^=(nq^)14(ϑΔ0)1ϑΔ0/logn\frac{(\vartheta\Delta_{0})^{-1}\sqrt{n\Lambda_{\chi}\hat{q}}}{\sqrt{\vartheta_{\chi}n\hat{q}}}=(n\hat{q})^{-\frac{1}{4}}(\vartheta\Delta_{0})^{-1}\ll\vartheta\Delta_{0}/\log n. By Lemma 3.3,

(||Γk(0)|nϑ|>ϑΔ0)=(||Γk(0)REV|nϑ|>910ϑΔ0)\displaystyle\mathbb{P}\Big{(}\Big{|}\frac{|\Gamma^{(0)}_{k}|}{n}-\vartheta\Big{|}>\vartheta\Delta_{0}\Big{)}=\mathbb{P}\Big{(}\Big{|}\frac{|\Gamma^{(0)}_{k}\setminus\mathrm{REV}|}{n}-\vartheta\Big{|}>\frac{9}{10}\vartheta\Delta_{0}\Big{)}
\displaystyle\leq\ (|1nBin(n|REV|,ϑ+o(ϑΔ0))ϑ|>910ϑΔ0)\displaystyle\mathbb{P}\Big{(}\Big{|}\frac{1}{n}\mathrm{Bin}(n-|\mathrm{REV}|,\vartheta+o(\vartheta\Delta_{0}))-\vartheta\Big{|}>\frac{9}{10}\vartheta\Delta_{0}\Big{)}
\displaystyle\leq\ 2exp{(12ϑΔ0n)22(nϑ+ϑΔ0n/3)}2exp{2ϑΔ02n}.\displaystyle 2\exp\Big{\{}-\frac{(\frac{1}{2}\vartheta\Delta_{0}n)^{2}}{2(n\vartheta+\vartheta\Delta_{0}n/3)}\Big{\}}\leq 2\exp\{-2\vartheta\Delta_{0}^{2}n\}\,. (3.25)

We can obtain the concentration for |Πk(0)|n\frac{|\Pi^{(0)}_{k}|}{n}, |Γk(0)Γl(0)|n\frac{|\Gamma^{(0)}_{k}\cap\Gamma^{(0)}_{l}|}{n}, |Πk(0)Πl(0)|n\frac{|\Pi^{(0)}_{k}\cap\Pi^{(0)}_{l}|}{n} and |π(Γk(0))Πl(0)|n\frac{|\pi(\Gamma^{(0)}_{k})\cap\Pi^{(0)}_{l}|}{n} similarly. For instance, for |π(Γk(0))Πl(0)|n\frac{|\pi(\Gamma^{(0)}_{k})\cap\Pi^{(0)}_{l}|}{n}, we note that given {k(a),Υk(a):1kK0,0aχ}\{\aleph^{(a)}_{k},\Upsilon^{(a)}_{k}:1\leq k\leq K_{0},0\leq a\leq\chi\},

|Γk(0)π1(Πl(0))REV|n=1nuVREV𝟏{uΓk(0)π1(Πl(0))}\displaystyle\frac{|\Gamma^{(0)}_{k}\cap\pi^{-1}(\Pi^{(0)}_{l})\setminus\mathrm{REV}|}{n}=\frac{1}{n}\sum_{u\in V\setminus\mathrm{REV}}\mathbf{1}_{\{u\in\Gamma^{(0)}_{k}\cap\pi^{-1}(\Pi^{(0)}_{l})\}}

is a (normalized) sum of i.i.d. Bernoulli variables with parameter given by

(uΓk(0)π1(Πl(0)))=(CorBin(|k(χ)|,|Υl(χ)|,q^;|k(χ)π1(Υl(χ))|,ρ^)(𝚍χ,𝚍χ)).\displaystyle\mathbb{P}(u\in\Gamma^{(0)}_{k}\cap\pi^{-1}(\Pi^{(0)}_{l}))=\mathbb{P}(\mathrm{CorBin}(|\aleph^{(\chi)}_{k}|,|\Upsilon^{(\chi)}_{l}|,\hat{q};|\aleph^{(\chi)}_{k}\cap\pi^{-1}(\Upsilon^{(\chi)}_{l})|,\hat{\rho})\geq(\mathtt{d}_{\chi},\mathtt{d}_{\chi}))\,.

By Lemma 3.13, we have ||k(χ)|nϑχ|,||Υl(χ)|nϑχ|,||k(χ)π1(Υl(χ))|nςχ|nΛχ\big{|}|\aleph^{(\chi)}_{k}|-n\vartheta_{\chi}\big{|},\big{|}|\Upsilon^{(\chi)}_{l}|-n\vartheta_{\chi}\big{|},\big{|}|\aleph^{(\chi)}_{k}\cap\pi^{-1}(\Upsilon^{(\chi)}_{l})|-n\varsigma_{\chi}\big{|}\leq n\Lambda_{\chi} with probability 1o(1)1-o(1). Provided with this and applying Corollary 3.12 again (with ϵ=ϑΔ0\epsilon=\vartheta\Delta_{0}), we get that (uΓk(0)π1(Πl(0)))=ς+o(ϑΔ0)\mathbb{P}(u\in\Gamma^{(0)}_{k}\cap\pi^{-1}(\Pi^{(0)}_{l}))=\varsigma+o(\vartheta\Delta_{0}). Thus we can obtain a similar concentration bound using Lemma 3.3. We omit further details due to similarity.

By (3.25) (and its analogues) and a union bound, we deduce that

(0c)20K02exp{2ϑΔ02n}=o(1),\mathbb{P}(\mathcal{E}_{0}^{c})\leq 20K_{0}^{2}\exp\{-2\vartheta\Delta_{0}^{2}n\}=o(1)\,, (3.26)

where for the last step we recalled (3.1) and Lemma 2.1. This completes the proof. ∎

3.4 Density comparison

Our proof of the admissibility along the iteration relies on a direct comparison of the smoothed Bernoulli density and the Gaussian density, which then allows us to use the techniques developed in [17] for correlated Gaussian Wigner matrices. Recall (3.6), (3.8) and (3.12). Our main result in this subsection is Lemma 3.16 below, and we need to introduce more notation before its statement.

For a random variable XX and a σ\sigma-field \mathcal{F}, we denote by p{X}p_{\{X\mid\mathcal{F}\}} the conditional density of XX given \mathcal{F}. For a realization Ξt={ξk(s),ζk(s):st,1kKs}\Xi_{t}=\{\xi^{(s)}_{k},\zeta^{(s)}_{k}:s\leq t,1\leq k\leq K_{s}\} of {Γk(s),Πk(s):st,1kKs}\{\Gamma^{(s)}_{k},\Pi^{(s)}_{k}:s\leq t,1\leq k\leq K_{s}\} and a realization Bt1\mathrm{B}_{t-1} of BADt1\mathrm{BAD}_{t-1}, we define vector-valued functions φv(s)(Ξt,Bt1),ψv(s)(Ξt,Bt1)\varphi^{(s)}_{v}(\Xi_{t},\mathrm{B}_{t-1}),\psi^{(s)}_{v}(\Xi_{t},\mathrm{B}_{t-1}) for vBt1v\not\in\mathrm{B}_{t-1} and 0st0\leq s\leq t, where for 1kKs1\leq k\leq K_{s} the kk-th component is given by

φv,k(s)(Ξt,Bt1)\displaystyle\varphi^{(s)}_{v,k}(\Xi_{t},\mathrm{B}_{t-1}) =1(𝔞s𝔞s2)nq^(1q^)uBt1(𝟏uξk(s)𝔞s)(Gv,uq^),\displaystyle=\frac{1}{\sqrt{(\mathfrak{a}_{s}-\mathfrak{a}_{s}^{2})n\hat{q}(1-\hat{q})}}\sum_{u\not\in\mathrm{B}_{t-1}}(\mathbf{1}_{u\in\xi^{(s)}_{k}}-\mathfrak{a}_{s})(\overrightarrow{G}_{v,u}-\hat{q})\,, (3.27)
ψv,k(s)(Ξt,Bt1)\displaystyle\psi^{(s)}_{v,k}(\Xi_{t},\mathrm{B}_{t-1}) =1(𝔞s𝔞s2)nq^(1q^)uBt1(𝟏uξk(s)𝔞s)Zv,u.\displaystyle=\frac{1}{\sqrt{(\mathfrak{a}_{s}-\mathfrak{a}_{s}^{2})n\hat{q}(1-\hat{q})}}\sum_{u\not\in\mathrm{B}_{t-1}}(\mathbf{1}_{u\in\xi^{(s)}_{k}}-\mathfrak{a}_{s})\overrightarrow{Z}_{v,u}\,.

Similarly, we define φπ(v)(s)(Ξt,Bt1),ψπ(v)(s)(Ξt,Bt1)\varphi^{(s)}_{\pi(v)}(\Xi_{t},\mathrm{B}_{t-1}),\psi^{(s)}_{\pi(v)}(\Xi_{t},\mathrm{B}_{t-1}) where for 1kKs1\leq k\leq K_{s} the kk-th component is given by

φπ(v),k(s)(Ξt,Bt1)\displaystyle\varphi^{(s)}_{\pi(v),k}(\Xi_{t},\mathrm{B}_{t-1}) =1(𝔞s𝔞s2)nq^(1q^)uBt1(𝟏π(u)ζk(s)𝔞s)(𝖦π(v),π(u)q^),\displaystyle=\frac{1}{\sqrt{(\mathfrak{a}_{s}-\mathfrak{a}_{s}^{2})n\hat{q}(1-\hat{q})}}\sum_{u\not\in\mathrm{B}_{t-1}}(\mathbf{1}_{\pi(u)\in\zeta^{(s)}_{k}}-\mathfrak{a}_{s})(\overrightarrow{\mathsf{G}}_{\pi(v),\pi(u)}-\hat{q})\,, (3.28)
ψπ(v),k(s)(Ξt,Bt1)\displaystyle\psi^{(s)}_{\pi(v),k}(\Xi_{t},\mathrm{B}_{t-1}) =1(𝔞s𝔞s2)nq^(1q^)uBt1(𝟏π(u)ζk(s)𝔞s)𝖹π(v),π(u).\displaystyle=\frac{1}{\sqrt{(\mathfrak{a}_{s}-\mathfrak{a}_{s}^{2})n\hat{q}(1-\hat{q})}}\sum_{u\not\in\mathrm{B}_{t-1}}(\mathbf{1}_{\pi(u)\in\zeta^{(s)}_{k}}-\mathfrak{a}_{s})\overrightarrow{\mathsf{Z}}_{\pi(v),\pi(u)}\,.

In addition, define Bt=Bt(Ξt,Bt1,G,𝖦,W,𝖶)\mathrm{B}_{t}=\mathrm{B}_{t}(\Xi_{t},\mathrm{B}_{t-1},\overrightarrow{G},\overrightarrow{\mathsf{G}},W,\mathsf{W}) to be the corresponding realization of BADt\mathrm{BAD}_{t}, i.e., Bt\mathrm{B}_{t} is the collection of vertices satisfying either of (3.7), (3.10), (3.11) and (3.14) with (Γk(s),Πk(s))(\Gamma^{(s)}_{k},\Pi^{(s)}_{k}) replaced by (ξk(s),ζk(s))(\xi^{(s)}_{k},\zeta^{(s)}_{k}) and BADt1\mathrm{BAD}_{t-1} replaced by Bt1\mathrm{B}_{t-1}. Define a random vector 𝐗t=𝐗t(Bt,Bt1)\mathbf{X}^{\leq t}=\mathbf{X}^{\leq t}(\mathrm{B}_{t},\mathrm{B}_{t-1}) by

𝐗t(s,k,v)=Wv(s)(k)+ηk(s),φv(s) and 𝐗t(s,k,π(v))=𝖶π(v)(s)(k)+ηk(s),φπ(v)(s)\displaystyle\mathbf{X}^{\leq t}(s,k,v)=W^{(s)}_{v}(k)+\langle\eta^{(s)}_{k},\varphi^{(s)}_{v}\rangle\mbox{ and }\mathbf{X}^{\leq t}(s,k,\pi(v))=\mathsf{W}^{(s)}_{\pi(v)}(k)+\langle\eta^{(s)}_{k},\varphi^{(s)}_{\pi(v)}\rangle

where 0st,1kKs120\leq s\leq t,1\leq k\leq\frac{K_{s}}{12}, and vBt1v\not\in\mathrm{B}_{t-1} when s<ts<t, and vBtv\not\in\mathrm{B}_{t} when s=ts=t. Define 𝐘t\mathbf{Y}^{\leq t} similarly by replacing φv(s),φπ(v)(s)\varphi^{(s)}_{v},\varphi^{(s)}_{\pi(v)} with ψv(s),ψπ(v)(s)\psi^{(s)}_{v},\psi^{(s)}_{\pi(v)} and replacing W,𝖶W,\mathsf{W} with W~,𝖶~\tilde{W},\tilde{\mathsf{W}}. Let 𝐗=t\mathbf{X}^{=t} be the vector obtained from 𝐗t\mathbf{X}^{\leq t} by keeping its coordinates with s=ts=t, and let 𝐗<t\mathbf{X}^{<t} be the vector obtained from 𝐗t\mathbf{X}^{\leq t} by keeping its coordinates with s<ts<t. Define 𝐘=t\mathbf{Y}^{=t} and 𝐘<t\mathbf{Y}^{<t} with respect to 𝐘t\mathbf{Y}^{\leq t} similarly. We also define 𝐗^t(s,k,v)=𝐗t(s,k,v)Wv(s)(k)\hat{\mathbf{X}}^{\leq t}(s,k,v)=\mathbf{X}^{\leq t}(s,k,v)-W^{(s)}_{v}(k) and 𝐘^t(s,k,v)=𝐘t(s,k,v)W~v(s)(k)\hat{\mathbf{Y}}^{\leq t}(s,k,v)=\mathbf{Y}^{\leq t}(s,k,v)-\tilde{W}^{(s)}_{v}(k). Also, define (GB,𝖦B)=(Gu,w,𝖦π(u),π(w):u or wBt1)(\overrightarrow{G}_{\mathrm{B}},\overrightarrow{\mathsf{G}}_{\mathrm{B}})=(\overrightarrow{G}_{u,w},\overrightarrow{\mathsf{G}}_{\pi(u),\pi(w)}:u\mbox{ or }w\in\mathrm{B}_{t-1}), and define (GB,𝖦B)=(Gu,w,𝖦π(u),π(w):u,wBt1)(\overrightarrow{G}_{\setminus\mathrm{B}},\overrightarrow{\mathsf{G}}_{\setminus\mathrm{B}})=(\overrightarrow{G}_{u,w},\overrightarrow{\mathsf{G}}_{\pi(u),\pi(w)}:u,w\not\in\mathrm{B}_{t-1}). Denote by (gB,𝗀B)(\overrightarrow{g}_{\mathrm{B}},\overrightarrow{\mathsf{g}}_{\mathrm{B}}) the realization of (GB,𝖦B)(\overrightarrow{G}_{\mathrm{B}},\overrightarrow{\mathsf{G}}_{\mathrm{B}}). For any fixed realization (Ξt,Bt,Bt1,gB,𝗀B)(\Xi_{t},\mathrm{B}_{t},\mathrm{B}_{t-1},\overrightarrow{g}_{\mathrm{B}},\overrightarrow{\mathsf{g}}_{\mathrm{B}}), we further define 𝚙{𝐗t}(xt)=p(xt,Ξt,Bt,Bt1,gB,𝗀B)\mathtt{p}_{\{\mathbf{X}^{\leq t}\}}(x^{\leq t})=p({x}^{\leq t},\Xi_{t},\mathrm{B}_{t},\mathrm{B}_{t-1},\overrightarrow{g}_{\mathrm{B}},\overrightarrow{\mathsf{g}}_{\mathrm{B}}) to be the conditional density as follows:

𝚙{𝐗t}(xt)=p{𝐗tBADt=Bt;BADt1=Bt1;(GB,𝖦B)=(gB,𝗀B)}(xt),\displaystyle\mathtt{p}_{\{\mathbf{X}^{\leq t}\}}(x^{\leq t})=p_{\{\mathbf{X}^{\leq t}\mid\mathrm{BAD}_{t}=\mathrm{B}_{t};\mathrm{BAD}_{t-1}=\mathrm{B}_{t-1};(\overrightarrow{G}_{\mathrm{B}},\overrightarrow{\mathsf{G}}_{\mathrm{B}})=(\overrightarrow{g}_{\mathrm{B}},\overrightarrow{\mathsf{g}}_{\mathrm{B}})\}}(x^{\leq t})\,,

where the support of xtx^{\leq t} is consistent with the choice of (Bt,Bt1)(\mathrm{B}_{t},\mathrm{B}_{t-1}) (i.e., xtx^{\leq t} is a legitimate realization for 𝐗t=𝐗t(Bt,Bt1)\mathbf{X}^{\leq t}=\mathbf{X}^{\leq t}(\mathrm{B}_{t},\mathrm{B}_{t-1})). Define 𝚙{𝐘t}(xt)\mathtt{p}_{\{\mathbf{Y}^{\leq t}\}}({x}^{\leq t}) similarly but with respect to 𝐘t\mathbf{Y}^{\leq t}. For the purpose of truncation later, we say a realization (Bt,Bt1)(\mathrm{B}_{t},\mathrm{B}_{t-1}) for (BADt,BADt1)(\mathrm{BAD}_{t},\mathrm{BAD}_{t-1}) is an amenable set-realization, if

(BADt=Bt,BADt1=Bt1)exp{nΔt9}.\mathbb{P}(\mathrm{BAD}_{t}=\mathrm{B}_{t},\mathrm{BAD}_{t-1}=\mathrm{B}_{t-1})\geq\exp\{-n\Delta_{t}^{9}\}\,. (3.29)

Also, we say (gB,𝗀B)(\overrightarrow{g}_{\mathrm{B}},\overrightarrow{\mathsf{g}}_{\mathrm{B}}) is an amenable bias-realization with respect to (Bt,Bt1)(\mathrm{B}_{t},\mathrm{B}_{t-1}), if

(BADt=Bt,BADt1=Bt1|(GB,𝖦B)=(gB,𝗀B))exp{nΔt8}.\mathbb{P}(\mathrm{BAD}_{t}=\mathrm{B}_{t},\mathrm{BAD}_{t-1}=\mathrm{B}_{t-1}|(\overrightarrow{G}_{\mathrm{B}},\overrightarrow{\mathsf{G}}_{\mathrm{B}})=(\overrightarrow{g}_{\mathrm{B}},\overrightarrow{\mathsf{g}}_{\mathrm{B}}))\geq\exp\{-n\Delta_{t}^{8}\}\,. (3.30)

In addition, we say a realization xt=xt(Bt,Bt1)x^{\leq t}=x^{\leq t}(\mathrm{B}_{t},\mathrm{B}_{t-1}) for 𝐗t(Bt,Bt1)\mathbf{X}^{\leq t}(\mathrm{B}_{t},\mathrm{B}_{t-1}) is an amenable variable-realization with respect to (Bt,Bt1)(\mathrm{B}_{t},\mathrm{B}_{t-1}) and (gB,𝗀B)(\overrightarrow{g}_{\mathrm{B}},\overrightarrow{\mathsf{g}}_{\mathrm{B}}), if it is consistent with the choice of (Bt,Bt1)(\mathrm{B}_{t},\mathrm{B}_{t-1}) and (gB,𝗀B)(\overrightarrow{g}_{\mathrm{B}},\overrightarrow{\mathsf{g}}_{\mathrm{B}}), and (below the vector x<tx^{<t} is obtained by keeping the coordinates of xtx^{\leq t} with s<ts<t)

xt2(logn)n1logloglogn,\displaystyle\|x^{\leq t}\|_{\infty}\leq 2(\log n)n^{\frac{1}{\log\log\log n}}, (3.31)
p{𝐘<t|BADt=Bt,BADt1=Bt1,(GB,𝖦B)=(gB,𝗀B)}(x<t)p{𝐗<t|BADt=Bt,BADt1=Bt1,(GB,𝖦B)=(gB,𝗀B)}(x<t)exp{nΔt10},\displaystyle\frac{p_{\{\mathbf{Y}^{<t}|\mathrm{BAD}_{t}=\mathrm{B}_{t},\mathrm{BAD}_{t-1}=\mathrm{B}_{t-1},(\overrightarrow{G}_{\mathrm{B}},\overrightarrow{\mathsf{G}}_{\mathrm{B}})=(\overrightarrow{g}_{\mathrm{B}},\overrightarrow{\mathsf{g}}_{\mathrm{B}})\}}(x^{<t})}{p_{\{\mathbf{X}^{<t}|\mathrm{BAD}_{t}=\mathrm{B}_{t},\mathrm{BAD}_{t-1}=\mathrm{B}_{t-1},(\overrightarrow{G}_{\mathrm{B}},\overrightarrow{\mathsf{G}}_{\mathrm{B}})=(\overrightarrow{g}_{\mathrm{B}},\overrightarrow{\mathsf{g}}_{\mathrm{B}})\}}(x^{<t})}\leq\exp\{n\Delta_{t}^{10}\}\,, (3.32)
and p{𝐘t|BADt=Bt,BADt1=Bt1,(GB,𝖦B)=(gB,𝗀B)}(xt)p{𝐗t|BADt=Bt,BADt1=Bt1,(GB,𝖦B)=(gB,𝗀B)}(xt)exp{nΔt10}.\displaystyle\frac{p_{\{\mathbf{Y}^{\leq t}|\mathrm{BAD}_{t}=\mathrm{B}_{t},\mathrm{BAD}_{t-1}=\mathrm{B}_{t-1},(\overrightarrow{G}_{\mathrm{B}},\overrightarrow{\mathsf{G}}_{\mathrm{B}})=(\overrightarrow{g}_{\mathrm{B}},\overrightarrow{\mathsf{g}}_{\mathrm{B}})\}}(x^{\leq t})}{p_{\{\mathbf{X}^{\leq t}|\mathrm{BAD}_{t}=\mathrm{B}_{t},\mathrm{BAD}_{t-1}=\mathrm{B}_{t-1},(\overrightarrow{G}_{\mathrm{B}},\overrightarrow{\mathsf{G}}_{\mathrm{B}})=(\overrightarrow{g}_{\mathrm{B}},\overrightarrow{\mathsf{g}}_{\mathrm{B}})\}}(x^{\leq t})}\leq\exp\{n\Delta_{t}^{10}\}\,. (3.33)
Lemma 3.15.

On the event 𝒯t\mathcal{T}_{t}, with probability 1O(exp{nΔt10})1-O(\exp\{-n\Delta_{t}^{10}\}) we have that

  • (Bt,Bt1)(\mathrm{B}_{t},\mathrm{B}_{t-1}) sampled according to (BADt,BADt1)(\mathrm{BAD}_{t},\mathrm{BAD}_{t-1}) is an amenable set-realization;

  • (gB,𝗀B)(\overrightarrow{g}_{\mathrm{B}},\overrightarrow{\mathsf{g}}_{\mathrm{B}}) sampled according to {(GB,𝖦B)|BADt=Bt,BADt1=Bt1}\{(\overrightarrow{G}_{\mathrm{B}},\overrightarrow{\mathsf{G}}_{\mathrm{B}})|\mathrm{BAD}_{t}=\mathrm{B}_{t},\mathrm{BAD}_{t-1}=\mathrm{B}_{t-1}\} is an amenable bias-realization with respect to (Bt,Bt1)(\mathrm{B}_{t},\mathrm{B}_{t-1});

  • xt(Bt,Bt1)x^{\leq t}(\mathrm{B}_{t},\mathrm{B}_{t-1}) sampled according to {𝐗t|BADt=Bt;BADt1=Bt1;(GB,𝖦B)=(gB,𝗀B)}\{\mathbf{X}^{\leq t}|\mathrm{BAD}_{t}=\mathrm{B}_{t};\mathrm{BAD}_{t-1}=\mathrm{B}_{t-1};(\overrightarrow{G}_{\mathrm{B}},\overrightarrow{\mathsf{G}}_{\mathrm{B}})=(\overrightarrow{g}_{\mathrm{B}},\overrightarrow{\mathsf{g}}_{\mathrm{B}})\} is an amenable variable-realization with respect to (Bt,Bt1)(\mathrm{B}_{t},\mathrm{B}_{t-1}) and (gB,𝗀B)(\overrightarrow{g}_{\mathrm{B}},\overrightarrow{\mathsf{g}}_{\mathrm{B}}).

Proof.

We first consider (Bt,Bt1)(\mathrm{B}_{t},\mathrm{B}_{t-1}). On 𝒯t\mathcal{T}_{t} we have |BADt|,|BADt1|nΔt10|\mathrm{BAD}_{t}|,|\mathrm{BAD}_{t-1}|\leq n\Delta_{t}^{10}. Note that (BADt,BADt1)(\mathrm{BAD}_{t},\mathrm{BAD}_{t-1}) are two subsets of VV and each has at most (nnΔt10)exp{nΔt10log(e/Δt10)}\binom{n}{n\Delta_{t}^{10}}\leq\exp\{n\Delta_{t}^{10}\log(e/\Delta_{t}^{10})\} possible values. Thus,

(BADt,BADt1{amenable set-realization};𝒯t)((nnΔt10))2exp{nΔt9}e12nΔt9.\displaystyle\mathbb{P}(\mathrm{BAD}_{t},\mathrm{BAD}_{t-1}\not\in\{\mbox{amenable set-realization}\};\mathcal{T}_{t})\leq\Big{(}\binom{n}{n\Delta_{t}^{10}}\Big{)}^{2}\exp\{-n\Delta_{t}^{9}\}\ll e^{-\tfrac{1}{2}n\Delta_{t}^{9}}\,.

Given an amenable set-realization (Bt,Bt1)(\mathrm{B}_{t},\mathrm{B}_{t-1}), we now consider (gB,𝗀B)(\overrightarrow{g}_{\mathrm{B}},\overrightarrow{\mathsf{g}}_{\mathrm{B}}). By Lemma 3.6,

(gB,𝗀B){(GB,𝖦B)|BADt=Bt,BADt1=Bt1}((gB,𝗀B) is an amenable bias-realization)1enΔt9.\displaystyle\mathbb{P}_{(\overrightarrow{g}_{\mathrm{B}},\overrightarrow{\mathsf{g}}_{\mathrm{B}})\sim\{(\overrightarrow{G}_{\mathrm{B}},\overrightarrow{\mathsf{G}}_{\mathrm{B}})|\mathrm{BAD}_{t}=\mathrm{B}_{t},\mathrm{BAD}_{t-1}=\mathrm{B}_{t-1}\}}\big{(}(\overrightarrow{g}_{\mathrm{B}},\overrightarrow{\mathsf{g}}_{\mathrm{B}})\mbox{ is an amenable bias-realization}\big{)}\geq 1-e^{-n\Delta_{t}^{-9}}\,.

Finally we consider 𝐗t\mathbf{X}^{\leq t}. Combining Markov’s inequality and (below we write 𝒜={BADt=Bt;BADt1=Bt1;(GB,𝖦B)=(gB,𝗀B)}\mathcal{A}=\{\mathrm{BAD}_{t}=\mathrm{B}_{t};\mathrm{BAD}_{t-1}=\mathrm{B}_{t-1};(\overrightarrow{G}_{\mathrm{B}},\overrightarrow{\mathsf{G}}_{\mathrm{B}})=(\overrightarrow{g}_{\mathrm{B}},\overrightarrow{\mathsf{g}}_{\mathrm{B}})\})

𝔼xt{𝐗t𝒜}[p{𝐘t𝒜}(xt)p{𝐗t𝒜}(xt)]=1,\displaystyle\mathbb{E}_{x^{\leq t}\sim\{\mathbf{X}^{\leq t}\mid\mathcal{A}\}}\Big{[}\frac{p_{\{\mathbf{Y}^{\leq t}\mid\mathcal{A}\}}(x^{\leq t})}{p_{\{\mathbf{X}^{\leq t}\mid\mathcal{A}\}}(x^{\leq t})}\Big{]}=1\,,

we get that conditioned on 𝒜\mathcal{A}, the (random) realization xtx^{\leq t} for 𝐗t\mathbf{X}^{\leq t} satisfies (3.33) with probability at least 1exp{nΔt10}1-\exp\{-n\Delta_{t}^{10}\}. Similarly, the realization xtx^{\leq t} satisfies (3.32) with probability at least 1exp{nΔt10}1-\exp\{-n\Delta_{t}^{10}\}. In addition, for any st1s\leq t-1 and vBt1v\not\in\mathrm{B}_{t-1}, recalling that Bt1\mathrm{B}_{t-1} is the realization for BADt1\mathrm{BAD}_{t-1} and using 𝒯t1\mathcal{T}_{t-1}, (3.10) and (3.11), we derive from the triangle inequality that

|𝐗t(s,k,v)|\displaystyle|\mathbf{X}^{\leq t}(s,k,v)| |Wv(s)(k)|+|ηk(s),gt2φv(s)|+a=0logn|ηk(s),bt1,aφv(s)|\displaystyle\leq|W^{(s)}_{v}(k)|+|\langle\eta^{(s)}_{k},g_{t-2}\varphi^{(s)}_{v}\rangle|+\sum_{a=0}^{\log n}|\langle\eta^{(s)}_{k},b_{t-1,a}\varphi^{(s)}_{v}\rangle|
(3+logn)n1logloglogn,\displaystyle\leq(3+\log n)n^{\frac{1}{\log\log\log n}}\,, (3.34)

where gt2φv(s)(k)g_{t-2}\varphi^{(s)}_{v}(k) denotes the (first) expression in (3.27) with the summation taken over all uBADt2u\not\in\mathrm{BAD}_{t-2}, bt1,0φv(s)b_{t-1,0}\varphi^{(s)}_{v} denotes the expression in (3.27) with the summation taken over all uLARGEt1(0)BIASt1PRBt1u\in\mathrm{LARGE}^{(0)}_{t-1}\cup\mathrm{BIAS}_{t-1}\cup\mathrm{PRB}_{t-1}, and bt1,aφv(s)b_{t-1,a}\varphi^{(s)}_{v} denotes the expression in (3.27) with the summation taken over all uLARGEt1(a)u\in\mathrm{LARGE}^{(a)}_{t-1}. Similarly we have |𝐗t(s,k,π(v))|(3+logn)n1logloglogn|\mathbf{X}^{\leq t}(s,k,\pi(v))|\leq(3+\log n)n^{\frac{1}{\log\log\log n}}. Since LARGEtBt\mathrm{LARGE}_{t}\subset\mathrm{B}_{t}, for vBtv\not\in\mathrm{B}_{t} choosing s=ts=t in (3.10) gives that

|𝐗t(t,k,v)||Wv(t)(k)|+|ηk(t),φv(t)|2n1logloglogn.\displaystyle|\mathbf{X}^{\leq t}(t,k,v)|\leq|W^{(t)}_{v}(k)|+|\langle\eta^{(t)}_{k},\varphi^{(t)}_{v}\rangle|\leq 2n^{\frac{1}{\log\log\log n}}\,. (3.35)

Combining (3.34) and (3.35), we see that {BADt=Bt,BADt1=Bt1}\{\mathrm{BAD}_{t}=\mathrm{B}_{t},\mathrm{BAD}_{t-1}=\mathrm{B}_{t-1}\} implies 𝐗t(3+logn)n1logloglogn2(logn)n1logloglogn\|\mathbf{X}^{\leq t}\|_{\infty}\leq(3+\log n)n^{\frac{1}{\log\log\log n}}\leq 2(\log n)n^{\frac{1}{\log\log\log n}} for large nn (recall that the coordinates indexed by π(v)\pi(v) are part of 𝐗t\mathbf{X}^{\leq t}), so that (3.31) holds. Altogether, this completes the proof of the lemma. ∎
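We remark that the Markov-inequality step in the proof above is the standard likelihood-ratio trick: a density ratio has expectation one under the law in the denominator, hence it exceeds e^{a} with probability at most e^{-a}. A minimal Python sketch of this principle (a toy example with two Gaussian laws of our own choosing, not the densities of the lemma):

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(size=1_000_000)        # samples from P = N(0, 1)
    L = np.exp(0.5 * x - 0.125)           # L = dQ/dP for Q = N(1/2, 1); E_P[L] = 1

    a = 1.0
    print("E_P[L] ~", L.mean())           # close to 1
    print("P(L >= e^a) =", (L >= np.exp(a)).mean(), "<= e^{-a} =", np.exp(-a))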

Lemma 3.16.

For ttt\leq t^{*}, on the event t𝒯t\mathcal{E}_{t}\cap\mathcal{T}_{t}, fix an amenable set-realization (Bt,Bt1)(\mathrm{B}_{t},\mathrm{B}_{t-1}), fix an amenable bias-realization (gB,𝗀B)(\overrightarrow{g}_{\mathrm{B}},\overrightarrow{\mathsf{g}}_{\mathrm{B}}) with respect to (Bt,Bt1)(\mathrm{B}_{t},\mathrm{B}_{t-1}), and fix an amenable variable-realization xtx^{\leq t} with respect to (Bt,Bt1)(\mathrm{B}_{t},\mathrm{B}_{t-1}) and (gB,𝗀B)(\overrightarrow{g}_{\mathrm{B}},\overrightarrow{\mathsf{g}}_{\mathrm{B}}). Then we have (below the vector x=tx^{=t} is defined by keeping the coordinates of xtx^{\leq t} such that s=ts=t)

p{𝐗=t|𝔖t1;BADt=Bt}(x=t)p{𝐘=t|t1}(x=t)=exp{O(nΔt5)}.\displaystyle\frac{p_{\{\mathbf{X}^{=t}|\mathfrak{S}_{t-1};\mathrm{BAD}_{t}=\mathrm{B}_{t}\}}(x^{=t})}{p_{\{\mathbf{Y}^{=t}|\mathcal{F}_{t-1}\}}(x^{=t})}=\exp\big{\{}O\big{(}n\Delta_{t}^{5}\big{)}\big{\}}\,. (3.36)
Remark 3.17.

Since {BADt=Bt}\{\mathrm{BAD}_{t}=\mathrm{B}_{t}\} is measurable with respect to (G,𝖦,W,𝖶)(\overrightarrow{G},\overrightarrow{\mathsf{G}},W,\mathsf{W}) and thus is independent of (Z,𝖹,W~,𝖶~)(\overrightarrow{Z},\overrightarrow{\mathsf{Z}},\tilde{W},\tilde{\mathsf{W}}), we have that p{𝐘t}(xt)=p{𝐘t|BADt=Bt}(xt){p}_{\{\mathbf{Y}^{\leq t}\}}(x^{\leq t})={p}_{\{\mathbf{Y}^{\leq t}|\mathrm{BAD}_{t}=\mathrm{B}_{t}\}}(x^{\leq t}). That is, we may add conditioning on BADt=Bt\mathrm{BAD}_{t}=\mathrm{B}_{t} in the denominator of (3.36), but this will not change its conditional density.

The key to the proof of Lemma 3.16 is the following bound on the “joint” density.

Lemma 3.18.

For ttt\leq t^{*}, on the event t𝒯t\mathcal{E}_{t}\cap\mathcal{T}_{t}, for an amenable set-realization (Bt,Bt1)(\mathrm{B}_{t},\mathrm{B}_{t-1}), an amenable bias-realization (gB,𝗀B)(\overrightarrow{g}_{\mathrm{B}},\overrightarrow{\mathsf{g}}_{\mathrm{B}}) with respect to (Bt,Bt1)(\mathrm{B}_{t},\mathrm{B}_{t-1}) and an amenable variable-realization xtx^{\leq t} with respect to (Bt,Bt1)(\mathrm{B}_{t},\mathrm{B}_{t-1}) and (gB,𝗀B)(\overrightarrow{g}_{\mathrm{B}},\overrightarrow{\mathsf{g}}_{\mathrm{B}}), we have

𝚙{𝐗t}(xt)𝚙{𝐘t}(xt),𝚙{𝐗<t}(x<t)𝚙{𝐘<t}(x<t)=exp{O(nΔt5)}.\frac{\mathtt{p}_{\{\mathbf{X}^{\leq t}\}}(x^{\leq t})}{\mathtt{p}_{\{\mathbf{Y}^{\leq t}\}}(x^{\leq t})},\frac{\mathtt{p}_{\{\mathbf{X}^{<t}\}}(x^{<t})}{\mathtt{p}_{\{\mathbf{Y}^{<t}\}}(x^{<t})}=\exp\big{\{}O(n\Delta_{t}^{5})\big{\}}\,. (3.37)
Proof of Lemma 3.16 assuming Lemma 3.18.

Applying Lemma 3.18 for both tt and t1t-1, we get that

𝚙{𝐗=t|𝐗<t}(x=t|x<t)𝚙{𝐘=t|𝐘<t}(x=t|x<t)=exp{O(nΔt5)}.\displaystyle\frac{\mathtt{p}_{\{\mathbf{X}^{=t}|\mathbf{X}^{<t}\}}(x^{=t}|x^{<t})}{\mathtt{p}_{\{\mathbf{Y}^{=t}|\mathbf{Y}^{<t}\}}(x^{=t}|x^{<t})}=\exp\{O(n\Delta_{t}^{5})\}\,.

Since p{𝐗=t|𝔖t1,BADt=Bt}(x=t)=𝚙{𝐗t}(xt)𝚙{𝐗<t}(x<t)p_{\{\mathbf{X}^{=t}|\mathfrak{S}_{t-1},\mathrm{BAD}_{t}=\mathrm{B}_{t}\}}(x^{=t})=\frac{\mathtt{p}_{\{\mathbf{X}^{\leq t}\}}(x^{\leq t})}{\mathtt{p}_{\{\mathbf{X}^{<t}\}}(x^{<t})} (and similarly p_{\{\mathbf{Y}^{=t}|\mathcal{F}_{t-1}\}}(x^{=t})=\mathtt{p}_{\{\mathbf{Y}^{\leq t}\}}(x^{\leq t})/\mathtt{p}_{\{\mathbf{Y}^{<t}\}}(x^{<t})), we complete the proof of the lemma. ∎

The rest of this subsection is devoted to the proof of Lemma 3.18. Due to similarity, we only prove it for 𝚙{𝐗t}(xt)𝚙{𝐘t}(xt)\frac{\mathtt{p}_{\{\mathbf{X}^{\leq t}\}}(x^{\leq t})}{\mathtt{p}_{\{\mathbf{Y}^{\leq t}\}}(x^{\leq t})}. Also, since xtx^{\leq t} is an amenable variable-realization, the lower bound is obvious, so we only need to prove the upper bound. Note that (BADt,BADt1)=(BADt(G,𝖦,W,𝖶),BADt1(G,𝖦,W,𝖶))(\mathrm{BAD}_{t},\mathrm{BAD}_{t-1})=(\mathrm{BAD}_{t}(\overrightarrow{G},\overrightarrow{\mathsf{G}},W,\mathsf{W}),\mathrm{BAD}_{t-1}(\overrightarrow{G},\overrightarrow{\mathsf{G}},W,\mathsf{W})) is a function of (G,𝖦,W,𝖶)(\overrightarrow{G},\overrightarrow{\mathsf{G}},W,\mathsf{W}), and that (GB,𝖦B)(\overrightarrow{G}_{\mathrm{B}},\overrightarrow{\mathsf{G}}_{\mathrm{B}}) is independent of 𝐗t(Bt,Bt1,W,𝖶)\mathbf{X}^{\leq t}(\mathrm{B}_{t},\mathrm{B}_{t-1},W,\mathsf{W}) and of (GB,𝖦B,W,𝖶)(\overrightarrow{G}_{\setminus\mathrm{B}},\overrightarrow{\mathsf{G}}_{\setminus\mathrm{B}},W,\mathsf{W}). Since for any independent random vectors X,YX,Y and any function ff

(f(X,Y)|X=x)=𝑑f(x,Y),(f(X,Y)|X=x)\overset{d}{=}f(x,Y)\,, (3.38)

we can then apply (3.38) and get that (note that the forms of pp are different in the equality below)

𝚙{𝐗t}(xt)=p{𝐗t|𝒜¯}(xt)\mathtt{p}_{\{\mathbf{X}^{\leq t}\}}(x^{\leq t})=p_{\{\mathbf{X}^{\leq t}|\bar{\mathcal{A}}\}}(x^{\leq t}) (3.39)

where 𝒜¯=r{t1,t}{BADr((gB,𝗀B),(GB,𝖦B),W,𝖶)=Br}\bar{\mathcal{A}}=\cap_{r\in\{t-1,t\}}\big{\{}\mathrm{BAD}_{r}((\overrightarrow{g}_{\mathrm{B}},\overrightarrow{\mathsf{g}}_{\mathrm{B}}),(\overrightarrow{G}_{\setminus\mathrm{B}},\overrightarrow{\mathsf{G}}_{\setminus\mathrm{B}}),W,\mathsf{W})=\mathrm{B}_{r}\big{\}}. For an amenable set-realization (Bt,Bt1)(\mathrm{B}_{t},\mathrm{B}_{t-1}) (for convenience we will drop (Bt,Bt1)(\mathrm{B}_{t},\mathrm{B}_{t-1}) from the notation in what follows), recall (3.9) and define 𝐗^jt\hat{\mathbf{X}}^{\leq t}_{\langle j\rangle} from 𝐗^t\hat{\mathbf{X}}^{\leq t} via the procedure in (3.9) (and similarly define 𝐗(j)t\mathbf{X}^{\leq t}_{(j)} from 𝐗t\mathbf{X}^{\leq t}). For 1jN1\leq j\leq N, let j=j(Bt,Bt1)\mathcal{M}_{j}=\mathcal{M}_{j}(\mathrm{B}_{t},\mathrm{B}_{t-1}) be the event that

𝐗^itn1logloglogn for jiN.\displaystyle\|\hat{\mathbf{X}}^{\leq t}_{\langle i\rangle}\|_{\infty}\leq n^{\frac{1}{\log\log\log n}}\mbox{ for }j\leq i\leq N\,. (3.40)

Recalling Remark 3.17, we get from (3.39) that

𝚙{𝐗t}(xt)𝚙{𝐘t}(xt)=p{𝐗t|𝒜¯}(xt)p{𝐘t}(xt)=p{𝐗t|𝒜¯}(xt)p{𝐗t|1}(xt)p{𝐗t|1}(xt)p{𝐘t}(xt).\displaystyle\frac{\mathtt{p}_{\{\mathbf{X}^{\leq t}\}}(x^{\leq t})}{\mathtt{p}_{\{\mathbf{Y}^{\leq t}\}}(x^{\leq t})}=\frac{p_{\{\mathbf{X}^{\leq t}|\bar{\mathcal{A}}\}}(x^{\leq t})}{p_{\{\mathbf{Y}^{\leq t}\}}(x^{\leq t})}=\frac{{p}_{\{\mathbf{X}^{\leq t}|\bar{\mathcal{A}}\}}(x^{\leq t})}{{p}_{\{\mathbf{X}^{\leq t}|\mathcal{M}_{1}\}}(x^{\leq t})}\cdot\frac{{p}_{\{\mathbf{X}^{\leq t}|\mathcal{M}_{1}\}}(x^{\leq t})}{{p}_{\{\mathbf{Y}^{\leq t}\}}(x^{\leq t})}\,.

Note that 𝒜¯1\bar{\mathcal{A}}\subset\mathcal{M}_{1}. Applying (3.38) with f=(BADt,BADt1)f=(\mathrm{BAD}_{t},\mathrm{BAD}_{t-1}), X=(GB,𝖦B)X=(\overrightarrow{G}_{\mathrm{B}},\overrightarrow{\mathsf{G}}_{\mathrm{B}}) and Y=(GB,𝖦B,W,𝖶)Y=(\overrightarrow{G}_{\setminus\mathrm{B}},\overrightarrow{\mathsf{G}}_{\setminus\mathrm{B}},W,\mathsf{W}), we get that

(𝒜¯)=(BADt=Bt,BADt1=Bt1(GB,𝖦B)=(gB,𝗀B))exp{nΔt8}.\displaystyle\mathbb{P}(\bar{\mathcal{A}})=\mathbb{P}(\mathrm{BAD}_{t}=\mathrm{B}_{t},\mathrm{BAD}_{t-1}=\mathrm{B}_{t-1}\mid(\overrightarrow{G}_{\mathrm{B}},\overrightarrow{\mathsf{G}}_{\mathrm{B}})=(\overrightarrow{g}_{\mathrm{B}},\overrightarrow{\mathsf{g}}_{\mathrm{B}}))\geq\exp\{-n\Delta_{t}^{8}\}\,.

Thus, for an amenable set-realization (Bt,Bt1)(\mathrm{B}_{t},\mathrm{B}_{t-1}) and an amenable bias-realization (gB,𝗀B)(\overrightarrow{g}_{\mathrm{B}},\overrightarrow{\mathsf{g}}_{\mathrm{B}}),

p{𝐗t|𝒜¯}(xt)(1)p{𝐗t|1}(xt)1(𝒜¯)exp{nΔt8}.\displaystyle\frac{{p}_{\{\mathbf{X}^{\leq t}|\bar{\mathcal{A}}\}}(x^{\leq t})}{\mathbb{P}(\mathcal{M}_{1})\cdot{p}_{\{\mathbf{X}^{\leq t}|\mathcal{M}_{1}\}}(x^{\leq t})}\leq\frac{1}{\mathbb{P}(\bar{\mathcal{A}})}\leq\exp\{n\Delta_{t}^{8}\}\,.

Therefore, it remains to show that for an amenable variable-realization xtx^{\leq t}

(1)p{𝐗t|1}(xt)p{𝐘t}(xt)exp{O(nΔt8)}.\displaystyle\frac{\mathbb{P}(\mathcal{M}_{1})\cdot{p}_{\{\mathbf{X}^{\leq t}|\mathcal{M}_{1}\}}(x^{\leq t})}{{p}_{\{\mathbf{Y}^{\leq t}\}}(x^{\leq t})}\leq\exp\big{\{}O\big{(}n\Delta_{t}^{8}\big{)}\big{\}}\,. (3.41)

For a random variable XX, define pX;j(x)p_{X;\mathcal{M}_{j}}(x) to be the density of XX on the event j\mathcal{M}_{j}, i.e., xApX;j(x)𝑑x=(XA;j)\int_{x\in A}p_{X;\mathcal{M}_{j}}(x)dx=\mathbb{P}(X\in A;\mathcal{M}_{j}) for any AA. From the definition we see that j\mathcal{M}_{j} is increasing in jj (and thus pX;j(x)p_{X;\mathcal{M}_{j}}(x) is increasing in jj). Combined with the facts that pX;j(x)pX(x)p_{X;\mathcal{M}_{j}}(x)\leq p_{X}(x) and pX;j(x)=(j)pX|j(x)p_{X;\mathcal{M}_{j}}(x)=\mathbb{P}(\mathcal{M}_{j})p_{X|\mathcal{M}_{j}}(x), this yields that the left-hand side of (3.41) is equal to (also note that 𝐗(N)t=𝑑𝐘t\mathbf{X}^{\leq t}_{(N)}\overset{d}{=}\mathbf{Y}^{\leq t})

p{𝐗(0)t;1}(xt)p{𝐗(N)t}(xt)p{𝐗(0)t;1}(xt)p{𝐗(N)t;N}(xt)j=1Np{𝐗(j1)t;j}(xt)p{𝐗(j)t;j}(xt).\displaystyle\frac{p_{\{\mathbf{X}^{\leq t}_{(0)};\mathcal{M}_{1}\}}(x^{\leq t})}{{p}_{\{\mathbf{X}^{\leq t}_{(N)}\}}(x^{\leq t})}\leq\frac{p_{\{\mathbf{X}^{\leq t}_{(0)};\mathcal{M}_{1}\}}(x^{\leq t})}{{p}_{\{\mathbf{X}^{\leq t}_{(N)};\mathcal{M}_{N}\}}(x^{\leq t})}\leq\prod_{j=1}^{N}\frac{{p}_{\{\mathbf{X}^{\leq t}_{(j-1)};\mathcal{M}_{j}\}}(x^{\leq t})}{{p}_{\{\mathbf{X}^{\leq t}_{(j)};\mathcal{M}_{j}\}}(x^{\leq t})}\,. (3.42)

Since Nn2N\leq n^{2}, we can conclude the proof of Lemma 3.18 by combining (3.42) with Lemma 3.19 below.
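The product on the right-hand side of (3.42) is a Lindeberg-type telescoping: each factor measures the effect of swapping a single group of Bernoulli entries for Gaussian ones with matching first two moments. The Python sketch below (our own illustration at the level of expectations of a generic smooth test function, with arbitrary toy parameters) exhibits the same telescoping structure:

    import numpy as np

    rng = np.random.default_rng(2)
    N, q, trials = 30, 0.3, 200_000
    sigma = np.sqrt(q * (1 - q))                   # match the first two moments

    def f(x):                                      # a fixed smooth test function
        return np.cos(x.sum(axis=1) / np.sqrt(x.shape[1]))

    B = rng.binomial(1, q, size=(trials, N)) - q   # centered Bernoulli coordinates
    G = rng.normal(0.0, sigma, size=(trials, N))   # Gaussian replacements

    vals = []                                      # hybrid: first j coordinates Gaussian
    for j in range(N + 1):
        hybrid = np.concatenate([G[:, :j], B[:, j:]], axis=1)
        vals.append(f(hybrid).mean())
    swaps = np.abs(np.diff(vals))                  # per-swap drift (third-order small)
    print("max single-swap drift:", swaps.max())
    print("total |E f(B) - E f(G)|:", abs(vals[-1] - vals[0]))

Each swap contributes only a third-order error (up to Monte Carlo noise), so the accumulated drift stays small even after all N swaps; (3.42) plays the same game at the level of densities.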

Lemma 3.19.

For an amenable set-realization (Bt,Bt1)(\mathrm{B}_{t},\mathrm{B}_{t-1}) and an amenable variable-realization xtx^{\leq t}, we have for all 1jN1\leq j\leq N

p{𝐗(j1)t;j}(xt)p{𝐗(j)t;j}(xt)=1+O(Kt20ϑ3(logn)3n3logloglogn/nnq^).\displaystyle\frac{{p}_{\{\mathbf{X}^{\leq t}_{(j-1)};\mathcal{M}_{j}\}}(x^{\leq t})}{{p}_{\{\mathbf{X}^{\leq t}_{(j)};\mathcal{M}_{j}\}}(x^{\leq t})}=1+O\big{(}K_{t}^{20}\vartheta^{-3}(\log n)^{3}n^{\frac{3}{\log\log\log n}}/n\sqrt{n\hat{q}}\big{)}\,.

The proof of Lemma 3.19 requires a couple of results on the Gaussian-smoothed density.

Lemma 3.20.

For 1dm1\leq d\leq m and C>0C>0, let U=(U1,,Um)U=(U_{1},\ldots,U_{m}) be a random vector such that |Uk|C|U_{k}|\leq C for 1kd1\leq k\leq d, and let X1,,Xd,Y1,,YdX_{1},\ldots,X_{d},Y_{1},\ldots,Y_{d} be sub-Gaussian random variables independent of {U1,,Ud}\{U_{1},\ldots,U_{d}\} such that 𝔼[Xk]=𝔼[Yk]\mathbb{E}[X_{k}]=\mathbb{E}[Y_{k}] and 𝔼[XkXl]=𝔼[YkYl]\mathbb{E}[X_{k}X_{l}]=\mathbb{E}[Y_{k}Y_{l}] for any 1k,ld1\leq k,l\leq d. Define X~\tilde{X} such that X~k=Xk\tilde{X}_{k}=X_{k} for 1kd1\leq k\leq d and X~k=0\tilde{X}_{k}=0 for d+1kmd+1\leq k\leq m (and define Y~\tilde{Y} similarly). Then, for any positive definite m×mm\times m matrix A\mathrm{A},

|𝔼[e12(UA2+2U,X~AX~A2)e12(UA2+2U,Y~AY~A2)]|\displaystyle\Big{|}\mathbb{E}\Big{[}e^{\frac{1}{2}(-\|U\|^{2}_{\mathrm{A}}+2\langle U,\tilde{X}\rangle_{\mathrm{A}}-\|\tilde{X}\|^{2}_{\mathrm{A}})}-e^{\frac{1}{2}(-\|U\|^{2}_{\mathrm{A}}+2\langle U,\tilde{Y}\rangle_{\mathrm{A}}-\|\tilde{Y}\|^{2}_{\mathrm{A}})}\Big{]}\Big{|} (3.43)
\displaystyle\leq 100d3(CA+AopA1op1/2)3𝔼[e12UA2]𝔼[eCAX1X3+eCAY1Y3].\displaystyle 100d^{3}\big{(}C\|\mathrm{A}\|_{\infty}+\|\mathrm{A}\|_{\mathrm{op}}\|\mathrm{A}^{-1}\|_{\mathrm{op}}^{1/2}\big{)}^{3}\mathbb{E}\Big{[}e^{-\frac{1}{2}\|U\|^{2}_{\mathrm{A}}}\Big{]}\mathbb{E}\Big{[}e^{C\|\mathrm{A}\|_{\infty}\|{X}\|_{1}}\|X\|^{3}+e^{C\|\mathrm{A}\|_{\infty}\|{Y}\|_{1}}\|Y\|^{3}\Big{]}.
Proof.

For xdx\in\mathbb{R}^{d} write ψ(x)=e12x~A2+U,x~A\psi(x)=e^{-\frac{1}{2}\|\tilde{x}\|^{2}_{\mathrm{A}}+\langle U,\tilde{x}\rangle_{\mathrm{A}}}, where x~i=xi\tilde{x}_{i}=x_{i} for 1id1\leq i\leq d and x~i=0\tilde{x}_{i}=0 for d+1imd+1\leq i\leq m. Then (3.43) is equal to |𝔼[e12UA2(ψ(X1,,Xd)ψ(Y1,,Yd))]|\big{|}\mathbb{E}\big{[}e^{-\frac{1}{2}\|U\|^{2}_{\mathrm{A}}}\big{(}\psi(X_{1},\ldots,X_{d})-\psi(Y_{1},\ldots,Y_{d})\big{)}\big{]}\big{|}. This motivates a high-dimensional Taylor expansion. Define ψa=xaψ(0,,0)\psi^{\prime}_{a}=\frac{\partial}{\partial x_{a}}\psi(0,\ldots,0), ψab′′=2xaxbψ(0,,0)\psi^{\prime\prime}_{ab}=\frac{\partial^{2}}{\partial x_{a}\partial x_{b}}\psi(0,\ldots,0) and ψabc′′′(t1,,td)=3xaxbxcψ(t1,,td)\psi^{\prime\prime\prime}_{abc}(t_{1},\ldots,t_{d})=\frac{\partial^{3}}{\partial x_{a}\partial x_{b}\partial x_{c}}\psi(t_{1},\ldots,t_{d}). Then,

|ψ(x1,,xd)ψ(0,,0)a=1dψaxa12a,b=1dψab′′xaxb|𝐑ψd3x3,\displaystyle\Big{|}\psi(x_{1},\ldots,x_{d})-\psi(0,\ldots,0)-\sum_{a=1}^{d}\psi^{\prime}_{a}x_{a}-\frac{1}{2}\sum_{a,b=1}^{d}\psi^{\prime\prime}_{ab}x_{a}x_{b}\Big{|}\leq\mathbf{R}_{\psi}d^{3}\|x\|^{3}\,,

where the remainder 𝐑ψ\mathbf{R}_{\psi} is bounded by

|𝐑ψ(x1,,xd)|max1a,b,cdsup|tj||xj|,1jd|ψabc′′′(t1,,td)|.\displaystyle|\mathbf{R}_{\psi}(x_{1},\ldots,x_{d})|\leq\max_{1\leq a,b,c\leq d}\sup_{|t_{j}|\leq|x_{j}|,1\leq j\leq d}|\psi^{\prime\prime\prime}_{abc}(t_{1},\ldots,t_{d})|\,. (3.44)

Since ψ,ψ′′\psi^{\prime},\psi^{\prime\prime} are random variables measurable with respect to {Uk:1km}\{U_{k}:1\leq k\leq m\}, we get

𝔼[e12UA2ψa(XaYa)]=𝔼[e12UA2ψa]𝔼[(XaYa)]=0,\displaystyle\mathbb{E}\Big{[}e^{-\frac{1}{2}\|U\|^{2}_{\mathrm{A}}}\psi^{\prime}_{a}(X_{a}-Y_{a})\Big{]}=\mathbb{E}\Big{[}e^{-\frac{1}{2}\|U\|^{2}_{\mathrm{A}}}\psi^{\prime}_{a}\Big{]}\mathbb{E}\Big{[}(X_{a}-Y_{a})\Big{]}=0\,, (3.45)
𝔼[e12UA2ψab′′(XaXbYaYb)]=𝔼[e12UA2ψab′′]𝔼[(XaXbYaYb)]=0.\displaystyle\mathbb{E}\Big{[}e^{-\frac{1}{2}\|U\|^{2}_{\mathrm{A}}}\psi^{\prime\prime}_{ab}(X_{a}X_{b}-Y_{a}Y_{b})\Big{]}=\mathbb{E}\Big{[}e^{-\frac{1}{2}\|U\|^{2}_{\mathrm{A}}}\psi^{\prime\prime}_{ab}\Big{]}\mathbb{E}\Big{[}(X_{a}X_{b}-Y_{a}Y_{b})\Big{]}=0\,.

Define fjf_{j} to be the jj-th standard basis vector in m\mathbb{R}^{m}, and define t~\tilde{t} from tt in the same way that x~\tilde{x} is defined from xx. Since

t~,UAAUt~1UAt~1\langle\tilde{t},U\rangle_{\mathrm{A}}\leq\|\mathrm{A}U^{*}\|_{\infty}\|\tilde{t}\|_{1}\leq\|U\|_{\infty}\|\mathrm{A}\|_{\infty}\|\tilde{t}\|_{1}

which is bounded by CAx1C\|\mathrm{A}\|_{\infty}\|x\|_{1} if |tj||xj||t_{j}|\leq|x_{j}| for all jj, we have that for distinct a,b,ca,b,c

sup|tj||xj||ψabc′′′(t1,,td)|=sup|tj||xj||e12t~A2+t~,UAτ{a,b,c}(fτ,UAfτ,t~A)|\displaystyle\sup_{|t_{j}|\leq|x_{j}|}|\psi^{\prime\prime\prime}_{abc}(t_{1},\ldots,t_{d})|=\sup_{|t_{j}|\leq|x_{j}|}\Big{|}e^{-\frac{1}{2}\|\tilde{t}\|_{\mathrm{A}}^{2}+\langle\tilde{t},U\rangle_{\mathrm{A}}}\prod_{\tau\in\{a,b,c\}}(\langle f_{\tau},U\rangle_{\mathrm{A}}-\langle f_{\tau},\tilde{t}\rangle_{\mathrm{A}})\Big{|}
eCAx1sup|tj||xj||τ{a,b,c}((fτ,UAfτ,t~A)e16t~A2)|,\displaystyle\leq e^{C\|\mathrm{A}\|_{\infty}\|x\|_{1}}\sup_{|t_{j}|\leq|x_{j}|}\Big{|}\prod_{\tau\in\{a,b,c\}}\Big{(}(\langle f_{\tau},U\rangle_{\mathrm{A}}-\langle f_{\tau},\tilde{t}\rangle_{\mathrm{A}})e^{-\frac{1}{6}\|\tilde{t}\|_{\mathrm{A}}^{2}}\Big{)}\Big{|}\,,

where e16t~A2e^{-\frac{1}{6}\|\tilde{t}\|_{\mathrm{A}}^{2}} arises since we split e12t~A2e^{-\frac{1}{2}\|\tilde{t}\|_{\mathrm{A}}^{2}} into the product of three copies of e16t~A2e^{-\frac{1}{6}\|\tilde{t}\|_{\mathrm{A}}^{2}}. Since e16x2x1e^{-\frac{1}{6}x^{2}}x\leq 1 for all xx\in\mathbb{R}, we have that

e16t~A2|fτ,t~A|e16t~A2t~Ae16A1op1t~2Aopt~AopA1op12.\displaystyle e^{-\frac{1}{6}\|\tilde{t}\|_{\mathrm{A}}^{2}}|\langle f_{\tau},\tilde{t}\rangle_{\mathrm{A}}|\leq e^{-\frac{1}{6}\|\tilde{t}\|_{\mathrm{A}}^{2}}\|\tilde{t}\mathrm{A}\|\leq e^{-\frac{1}{6}\|\mathrm{A}^{-1}\|_{\mathrm{op}}^{-1}\|\tilde{t}\|^{2}}\|\mathrm{A}\|_{\mathrm{op}}\|\tilde{t}\|\leq\|\mathrm{A}\|_{\mathrm{op}}\|\mathrm{A}^{-1}\|_{\mathrm{op}}^{\frac{1}{2}}\,.

Combining the preceding two inequalities with

|fτ,UA|AUfτ1UAfτ1CA,|\langle f_{\tau},U\rangle_{\mathrm{A}}|\leq\|\mathrm{A}U^{*}\|_{\infty}\|f_{\tau}\|_{1}\leq\|U\|_{\infty}\|\mathrm{A}\|_{\infty}\|f_{\tau}\|_{1}\leq C\|\mathrm{A}\|_{\infty}\,,

we get that |ψabc′′′(t1,,td)|eCAx1(CA+AopA1op1/2)3|\psi^{\prime\prime\prime}_{abc}(t_{1},\ldots,t_{d})|\leq e^{C\|\mathrm{A}\|_{\infty}\|x\|_{1}}(C\|\mathrm{A}\|_{\infty}+\|\mathrm{A}\|_{\mathrm{op}}\|\mathrm{A}^{-1}\|_{\mathrm{op}}^{1/2})^{3}. Similarly, we have

sup|tj||xj||ψaab′′′(t1,,td)|eCAx1(2CA+4AopA1op1/2)3,\displaystyle\sup_{|t_{j}|\leq|x_{j}|}|\psi^{\prime\prime\prime}_{aab}(t_{1},\ldots,t_{d})|\leq e^{C\|\mathrm{A}\|_{\infty}\|x\|_{1}}(2C\|\mathrm{A}\|_{\infty}+4\|\mathrm{A}\|_{\mathrm{op}}\|\mathrm{A}^{-1}\|_{\mathrm{op}}^{1/2})^{3}\,,
sup|tj||xj||ψaaa′′′(t1,,td)|eCAx1(4(CA)3+100(AopA1op1/2)3).\displaystyle\sup_{|t_{j}|\leq|x_{j}|}|\psi^{\prime\prime\prime}_{aaa}(t_{1},\ldots,t_{d})|\leq e^{C\|\mathrm{A}\|_{\infty}\|x\|_{1}}\big{(}4(C\|\mathrm{A}\|_{\infty})^{3}+100(\|\mathrm{A}\|_{\mathrm{op}}\|\mathrm{A}^{-1}\|_{\mathrm{op}}^{1/2})^{3}\big{)}\,.

Therefore, 𝐑ψ(x1,,xd)100(CA+AopA1op1/2)3exp{CAx1}\mathbf{R}_{\psi}(x_{1},\ldots,x_{d})\leq 100\big{(}C\|\mathrm{A}\|_{\infty}+\|\mathrm{A}\|_{\mathrm{op}}\|\mathrm{A}^{-1}\|_{\mathrm{op}}^{1/2}\big{)}^{3}\exp\{C\|\mathrm{A}\|_{\infty}\|x\|_{1}\}. Combined with (3.44) and (3.45), this yields that (3.43) is bounded by

100d3(CA+AopA1op1/2)3𝔼[e12UA2(eCAX1X3+eCAY1Y3)],\displaystyle 100d^{3}\big{(}C\|\mathrm{A}\|_{\infty}+\|\mathrm{A}\|_{\mathrm{op}}\|\mathrm{A}^{-1}\|_{\mathrm{op}}^{1/2}\big{)}^{3}\mathbb{E}\Big{[}e^{-\frac{1}{2}\|U\|^{2}_{\mathrm{A}}}\Big{(}e^{C\|\mathrm{A}\|_{\infty}\|{X}\|_{1}}\|X\|^{3}+e^{C\|\mathrm{A}\|_{\infty}\|{Y}\|_{1}}\|Y\|^{3}\Big{)}\Big{]}\,,

completing the proof since {X1,,Xd,Y1,,Yd}\{X_{1},\ldots,X_{d},Y_{1},\ldots,Y_{d}\} is independent of {U1,,Ud}\{U_{1},\ldots,U_{d}\}. ∎
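To see the cubic error of Lemma 3.20 concretely, both expectations can be computed in closed form in a one-dimensional toy case: take ψ(y)=e^{-y²/2+uy}, X a centered Bernoulli and Y a Gaussian with the same mean and variance, each scaled by a small ε. The gap then decays at rate ε³, with a constant governed by the third-moment mismatch. A sketch (all parameter values below are our own arbitrary choices):

    import numpy as np

    q, u = 0.2, 0.7                                  # arbitrary toy parameters
    sig2 = q * (1 - q)                               # common variance of X and Y
    psi = lambda y: np.exp(-0.5 * y ** 2 + u * y)

    for eps in [0.2, 0.1, 0.05, 0.025]:
        # E psi(eps X) for the two-point law X = 1-q w.p. q, X = -q w.p. 1-q (exact)
        bern = q * psi(eps * (1 - q)) + (1 - q) * psi(-eps * q)
        # E psi(eps Y) for Y ~ N(0, sig2): a Gaussian integral in closed form
        a = 1.0 / sig2 + eps ** 2
        gauss = np.exp((u * eps) ** 2 / (2 * a)) / np.sqrt(sig2 * a)
        gap = abs(bern - gauss)
        print(f"eps={eps:<6} gap={gap:.3e}  gap/eps^3={gap / eps ** 3:.4f}")

The ratio gap/ε³ stabilizes as ε decreases, since the first- and second-order Taylor terms cancel exactly by the moment matching, leaving the third-order term as the leading contribution.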

The next lemma will be useful in bounding the density change when locally replacing Bernoulli variables by Gaussian variables.

Lemma 3.21.

For 1dm1\leq d\leq m, let Z=(Z1,,Zm)𝒩(0,Σ)Z=(Z_{1},\ldots,Z_{m})\sim\mathcal{N}(0,\Sigma) be a normal vector, and let U=(U1,,Um)U=(U_{1},\ldots,U_{m}) be a sub-Gaussian vector independent of ZZ such that |Uk|C|U_{k}|\leq C. Let B,B,𝖡,𝖡B,B^{\prime},\mathsf{B},\mathsf{B}^{\prime}, G,G,𝖦,𝖦G,G^{\prime},\mathsf{G},\mathsf{G}^{\prime} be random variables independent of Z,UZ,U such that B,B,𝖡,𝖡B,B^{\prime},\mathsf{B},\mathsf{B}^{\prime} all have the same law as Ber(q)q\mathrm{Ber}(q)-q, and G,G,𝖦,𝖦𝒩(0,q(1q))G,G^{\prime},\mathsf{G},\mathsf{G}^{\prime}\sim\mathcal{N}(0,q(1-q)). Also, suppose that (B,B,𝖡,𝖡)(B,B^{\prime},\mathsf{B},\mathsf{B}^{\prime}) and (G,G,𝖦,𝖦)(G,G^{\prime},\mathsf{G},\mathsf{G}^{\prime}) have the same covariance matrix. For any α,β,θ,γd\alpha,\beta,\theta,\gamma\in\mathbb{R}^{d} with \ell_{\infty}-norms at most ϵ\epsilon, define α~m\tilde{\alpha}\in\mathbb{R}^{m} such that α~(i)=α(i)\tilde{\alpha}(i)=\alpha(i) for 1id1\leq i\leq d and α~(i)=0\tilde{\alpha}(i)=0 for d+1imd+1\leq i\leq m; we similarly define β~,θ~,γ~m\tilde{\beta},\tilde{\theta},\tilde{\gamma}\in\mathbb{R}^{m}. Then for all λm\lambda\in\mathbb{R}^{m}, the joint densities of (Z+U+Bα~+Bβ~+𝖡θ~+𝖡γ~)(Z+U+B\tilde{\alpha}+B^{\prime}\tilde{\beta}+\mathsf{B}\tilde{\theta}+\mathsf{B}^{\prime}\tilde{\gamma}) and (Z+U+Gα~+Gβ~+𝖦θ~+𝖦γ~)(Z+U+G\tilde{\alpha}+G^{\prime}\tilde{\beta}+\mathsf{G}\tilde{\theta}+\mathsf{G}^{\prime}\tilde{\gamma}) satisfy that

|p{Z+U+Bα~+Bβ~+𝖡θ~+𝖡γ~}(λ)p{Z+U+Gα~+Gβ~+𝖦θ~+𝖦γ~}(λ)1|\displaystyle\Big{|}\frac{p_{\{Z+U+B\tilde{\alpha}+B^{\prime}\tilde{\beta}+\mathsf{B}\tilde{\theta}+\mathsf{B}^{\prime}\tilde{\gamma}\}}(\lambda)}{p_{\{Z+U+G\tilde{\alpha}+G^{\prime}\tilde{\beta}+\mathsf{G}\tilde{\theta}+\mathsf{G}^{\prime}\tilde{\gamma}\}}(\lambda)}-1\Big{|}\leq\ 104d5ϵ3qe8dϵ2q((λ+C)Σ1+Σ1opΣop1/2)3\displaystyle 10^{4}d^{5}\epsilon^{3}qe^{8d\epsilon^{2}q}\big{(}(\|\lambda\|_{\infty}+C)\|\Sigma^{-1}\|_{\infty}+\|\Sigma^{-1}\|_{\mathrm{op}}\|\Sigma\|_{\mathrm{op}}^{1/2}\big{)}^{3}
(e16d2ϵΣ1(C+λ)+e4d3qϵ2Σ12(C+λ)2).\displaystyle*\Big{(}e^{16d^{2}\epsilon\|\Sigma^{-1}\|_{\infty}(C+\|\lambda\|_{\infty})}+e^{4d^{3}q\epsilon^{2}\|\Sigma^{-1}\|^{2}_{\infty}(C+\|\lambda\|_{\infty})^{2}}\Big{)}\,.
Proof.

We have that pZ+U(λ)=(det(Σ))12(2π)d2𝔼U[exp{12λUΣ12}]p_{Z+U}(\lambda)=(\mathrm{det}(\Sigma))^{-\frac{1}{2}}(2\pi)^{-\frac{d}{2}}\mathbb{E}_{U}\big{[}\exp\{-\frac{1}{2}\|\lambda-U\|_{\Sigma^{-1}}^{2}\}\big{]}, where 𝔼U\mathbb{E}_{U} is the expectation by averaging over UU. Writing X~=Bα~+Bβ~+𝖡θ~+𝖡γ~\tilde{X}=B\tilde{\alpha}+B^{\prime}\tilde{\beta}+\mathsf{B}\tilde{\theta}+\mathsf{B}^{\prime}\tilde{\gamma} and writing Y~=Gα~+Gβ~+𝖦θ~+𝖦γ~\tilde{Y}=G\tilde{\alpha}+G^{\prime}\tilde{\beta}+\mathsf{G}\tilde{\theta}+\mathsf{G}^{\prime}\tilde{\gamma}, we then have

pZ+U+X~(λ)pZ+U+Y~(λ)1=𝔼[e12λUX~Σ12]𝔼[e12λUY~Σ12]1=1×2\displaystyle\frac{p_{Z+U+\tilde{X}}(\lambda)}{p_{Z+U+\tilde{Y}}(\lambda)}-1=\frac{\mathbb{E}\big{[}e^{-\frac{1}{2}\|\lambda-U-\tilde{X}\|^{2}_{\Sigma^{-1}}}\big{]}}{\mathbb{E}\big{[}e^{-\frac{1}{2}\|\lambda-U-\tilde{Y}\|^{2}_{\Sigma^{-1}}}\big{]}}-1=\mathfrak{I}_{1}\times\mathfrak{I}_{2} (3.46)

where 1=𝔼[e12λUΣ12]𝔼[e12λUY~Σ12]\mathfrak{I}_{1}=\frac{\mathbb{E}\big{[}e^{-\frac{1}{2}\|\lambda-U\|^{2}_{\Sigma^{-1}}}\big{]}}{\mathbb{E}\big{[}e^{-\frac{1}{2}\|\lambda-U-\tilde{Y}\|^{2}_{\Sigma^{-1}}}\big{]}} and

2=𝔼[e12λUΣ12(eλU,X~Σ112X~Σ12eλU,Y~Σ112Y~Σ12)]𝔼[e12λUΣ12].\displaystyle\mathfrak{I}_{2}=\frac{\mathbb{E}\big{[}e^{-\frac{1}{2}\|\lambda-U\|^{2}_{\Sigma^{-1}}}\big{(}e^{\langle\lambda-U,\tilde{X}\rangle_{\Sigma^{-1}}-\frac{1}{2}\|\tilde{X}\|^{2}_{\Sigma^{-1}}}-e^{\langle\lambda-U,\tilde{Y}\rangle_{\Sigma^{-1}}-\frac{1}{2}\|\tilde{Y}\|^{2}_{\Sigma^{-1}}}\big{)}\big{]}}{\mathbb{E}\big{[}e^{-\frac{1}{2}\|\lambda-U\|^{2}_{\Sigma^{-1}}}\big{]}}\,. (3.47)

Here the equality in (3.46) holds since one may simply cancel out the denominator in 2\mathfrak{I}_{2} with the factor in 1\mathfrak{I}_{1}. Thus, it suffices to bound 1\mathfrak{I}_{1} and 2\mathfrak{I}_{2} separately.

We first bound 2\mathfrak{I}_{2}. Applying Lemma 3.20 with A=Σ1\mathrm{A}=\Sigma^{-1} and using |λkUk||λk|+Cλ+C|\lambda_{k}-U_{k}|\leq|\lambda_{k}|+C\leq\|\lambda\|_{\infty}+C and X1dX\|X\|_{1}\leq d\|X\| we get that

2\displaystyle\mathfrak{I}_{2}\leq\ 100d3((λ+C)Σ1+Σ1opΣop1/2)3\displaystyle 100d^{3}((\|\lambda\|_{\infty}+C)\|\Sigma^{-1}\|_{\infty}+\|\Sigma^{-1}\|_{\mathrm{op}}\|\Sigma\|_{\mathrm{op}}^{1/2})^{3}
\displaystyle* (𝔼[ed(C+λ)Σ1XX3]+𝔼[ed(C+λ)Σ1YY3]).\displaystyle\Big{(}\mathbb{E}\Big{[}e^{d(C+\|\lambda\|_{\infty})\|\Sigma^{-1}\|_{\infty}\|X\|}\|X\|^{3}\Big{]}+\mathbb{E}\Big{[}e^{d(C+\|\lambda\|_{\infty})\|\Sigma^{-1}\|_{\infty}\|Y\|}\|Y\|^{3}\Big{]}\Big{)}\,.

For notational convenience, we denote by 𝙴X\mathtt{E}_{X} and 𝙴Y\mathtt{E}_{Y} the two expectations in the preceding inequality. Since |Xk|=|Bαk+Bβk+𝖡θk+𝖡γk|4ϵ|X_{k}|=|B\alpha_{k}+B^{\prime}\beta_{k}+\mathsf{B}\theta_{k}+\mathsf{B}^{\prime}\gamma_{k}|\leq 4\epsilon, we have X2dϵ\|X\|\leq 2\sqrt{d}\epsilon and [X=0][B,B,𝖡,𝖡=0]14q\mathbb{P}[\|X\|=0]\geq\mathbb{P}[B,B^{\prime},\mathsf{B},\mathsf{B}^{\prime}=0]\geq 1-4q. Thus,

𝙴X\displaystyle\mathtt{E}_{X} 𝔼X[e4d2ϵ(C+λ)Σ1(2dϵ)3𝟏X0]100d2ϵ3qe4d2ϵ(C+λ)Σ1.\displaystyle\leq\mathbb{E}_{X}\Big{[}e^{4d^{2}\epsilon(C+\|\lambda\|_{\infty})\|\Sigma^{-1}\|_{\infty}}(2\sqrt{d}\epsilon)^{3}\mathbf{1}_{\|X\|\neq 0}\Big{]}\leq 100d^{2}\epsilon^{3}qe^{4d^{2}\epsilon(C+\|\lambda\|_{\infty})\|\Sigma^{-1}\|_{\infty}}\,. (3.48)

Since Yk=Gαk+Gβk+𝖦θk+𝖦γkY_{k}=G\alpha_{k}+G^{\prime}\beta_{k}+\mathsf{G}\theta_{k}+\mathsf{G}^{\prime}\gamma_{k} is a sub-Gaussian variable with sub-Gaussian norm at most 16ϵ2q16\epsilon^{2}q, we see that [Yt]2det216ϵ2dq\mathbb{P}\big{[}\|Y\|\geq t\Big{]}\leq 2de^{-\frac{t^{2}}{16\epsilon^{2}dq}}. Consequently,

𝙴Y100d2ϵ3qe4d3qϵ2Σ12(C+λ)2.\displaystyle\mathtt{E}_{Y}\leq 100d^{2}\epsilon^{3}qe^{4d^{3}q\epsilon^{2}\|\Sigma^{-1}\|_{\infty}^{2}(C+\|\lambda\|_{\infty})^{2}}\,.

Combined with (3.48), it yields that

2104d5ϵ3q((λ+C)Σ1+Σ1opΣop1/2)3\displaystyle\mathfrak{I}_{2}\leq 10^{4}d^{5}\epsilon^{3}q\big{(}(\|\lambda\|_{\infty}+C)\|\Sigma^{-1}\|_{\infty}+\|\Sigma^{-1}\|_{\mathrm{op}}\|\Sigma\|_{\mathrm{op}}^{1/2}\big{)}^{3}
(e4d2ϵΣ1(C+λ)+e4d3qϵ2Σ12(C+λ)2).\displaystyle*\Big{(}e^{4d^{2}\epsilon\|\Sigma^{-1}\|_{\infty}(C+\|\lambda\|_{\infty})}+e^{4d^{3}q\epsilon^{2}\|\Sigma^{-1}\|^{2}_{\infty}(C+\|\lambda\|_{\infty})^{2}}\Big{)}\,. (3.49)

We next bound 1\mathfrak{I}_{1}. Applying Jensen’s inequality, we get that

𝔼U,Y[e12λUY~Σ12]𝔼U[e𝔼Y[12λUY~Σ12]]\displaystyle\mathbb{E}_{U,Y}\Big{[}e^{-\frac{1}{2}\|\lambda-U-\tilde{Y}\|^{2}_{\Sigma^{-1}}}\Big{]}\geq\mathbb{E}_{U}\Big{[}e^{\mathbb{E}_{Y}[-\frac{1}{2}\|\lambda-U-\tilde{Y}\|^{2}_{\Sigma^{-1}}]}\Big{]}
\displaystyle\geq\ eΣ1op𝔼[Y~2]𝔼U[e12λUΣ12]e16dqϵ2Σ1op𝔼U[e12λUΣ12],\displaystyle e^{-\|\Sigma^{-1}\|_{\mathrm{op}}\mathbb{E}[\|\tilde{Y}\|^{2}]}\mathbb{E}_{U}\Big{[}e^{-\frac{1}{2}\|\lambda-U\|^{2}_{\Sigma^{-1}}}\Big{]}\geq e^{-16dq\epsilon^{2}\|\Sigma^{-1}\|_{\mathrm{op}}}\mathbb{E}_{U}\Big{[}e^{-\frac{1}{2}\|\lambda-U\|^{2}_{\Sigma^{-1}}}\Big{]}\,,

where the second inequality uses independence and the last inequality uses 𝔼Yk216qϵ2\mathbb{E}Y_{k}^{2}\leq 16q\epsilon^{2}. Thus, 1e16dϵ2qΣ1op\mathfrak{I}_{1}\leq e^{16d\epsilon^{2}q\|\Sigma^{-1}\|_{\mathrm{op}}}. Combined with (3.49), this yields the desired bound. ∎
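In one dimension (with m=d=1, U=0 and only the B-term kept, all simplifications of our own), the two smoothed densities appearing in Lemma 3.21 have closed forms, which makes the ε³q scale of the density ratio directly visible:

    import numpy as np

    def nrm(x, m, v):                                # density of N(m, v)
        return np.exp(-(x - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)

    q = 0.3
    lam = np.linspace(-3.0, 3.0, 13)                 # evaluation points lambda
    for eps in [0.2, 0.1, 0.05]:
        # density of Z + eps*B with Z ~ N(0,1), B = Ber(q) - q: a two-point mixture
        p_bern = (1 - q) * nrm(lam, -eps * q, 1.0) + q * nrm(lam, eps * (1 - q), 1.0)
        # density of Z + eps*G with G ~ N(0, q(1-q)): a Gaussian with matched variance
        p_gauss = nrm(lam, 0.0, 1.0 + eps ** 2 * q * (1 - q))
        err = np.max(np.abs(p_bern / p_gauss - 1.0))
        print(f"eps={eps}: max |ratio - 1| on [-3, 3] is {err:.2e}")

Halving ε shrinks the maximal relative error by roughly a factor of eight on any compact window, consistent with the cubic dependence on ε in the lemma.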

In light of Lemma 3.21, we need to employ suitable truncations in order to control the density ratio; this is why we defined LARGEtBADt\mathrm{LARGE}_{t}\subset\mathrm{BAD}_{t} as in (3.11). We now prove Lemma 3.19.

Proof of Lemma 3.19.

Fix 1jN1\leq j\leq N. We now set the framework for applying Lemma 3.21. Recall (the third equality of) (3.9). Define

Z(s,k,v)=Wv(s)(k)+𝐗[j1]t(s,k,v),λ(s,k,v)=xt(s,k,v),\displaystyle Z(s,k,v)=W^{(s)}_{v}(k)+\mathbf{X}^{\leq t}_{[j-1]}(s,k,v),\quad\lambda(s,k,v)=x^{\leq t}(s,k,v)\,,
Z(s,k,𝗏)=𝖶𝗏(s)(k)+𝐗[j1]t(s,k,𝗏),λ(s,k,𝗏)=xt(s,k,𝗏).\displaystyle Z(s,k,\mathsf{v})=\mathsf{W}^{(s)}_{\mathsf{v}}(k)+\mathbf{X}^{\leq t}_{[j-1]}(s,k,\mathsf{v}),\quad\lambda(s,k,\mathsf{v})={x}^{\leq t}(s,k,\mathsf{v})\,.

Let (B,B,𝖡,𝖡)=(Guj,wjq^,Gwj,ujq^,𝖦π(uj),π(wj)q^,𝖦π(wj),π(uj)q^)(B,B^{\prime},\mathsf{B},\mathsf{B}^{\prime})=(\overrightarrow{G}_{u_{j},w_{j}}-\hat{q},\overrightarrow{G}_{w_{j},u_{j}}-\hat{q},\overrightarrow{\mathsf{G}}_{\pi(u_{j}),\pi(w_{j})}-\hat{q},\overrightarrow{\mathsf{G}}_{\pi(w_{j}),\pi(u_{j})}-\hat{q}), and let

α(s,k,v)=i=1Ksηk(s)(i)𝟏{wjξi(s)}𝔞s(𝔞s𝔞s2)nq^(1q^) for v=uj and α(s,k,v)=0 for vuj.\alpha(s,k,v)=\sum_{i=1}^{K_{s}}\eta^{(s)}_{k}(i)\frac{\mathbf{1}_{\{w_{j}\in\xi^{(s)}_{i}\}}-\mathfrak{a}_{s}}{\sqrt{(\mathfrak{a}_{s}-\mathfrak{a}_{s}^{2})n\hat{q}(1-\hat{q})}}\mbox{ for }v=u_{j}\mbox{ and }\alpha(s,k,v)=0\mbox{ for }v\neq u_{j}\,.

Thus we see that α(s,k,v)\alpha(s,k,v) is the coefficient of Guj,wjq^\overrightarrow{G}_{u_{j},w_{j}}-\hat{q} in ηk(s),φv(s)\langle\eta^{(s)}_{k},\varphi^{(s)}_{v}\rangle. Similarly denote by β(s,k,v)\beta(s,k,v) the coefficient of (Gwj,ujq^)(\overrightarrow{G}_{w_{j},u_{j}}-\hat{q}) in ηk(s),φv(s)\langle\eta^{(s)}_{k},\varphi^{(s)}_{v}\rangle, denote by θ(s,k,𝗏)\theta(s,k,\mathsf{v}) the coefficient of (𝖦π(uj),π(wj)q^)(\overrightarrow{\mathsf{G}}_{\pi(u_{j}),\pi(w_{j})}-\hat{q}) in ηk(s),ψ𝗏(s)\langle\eta^{(s)}_{k},\psi^{(s)}_{\mathsf{v}}\rangle, and denote by γ(s,k,𝗏)\gamma(s,k,\mathsf{v}) the coefficient of (𝖦π(wj),π(uj)q^)(\overrightarrow{\mathsf{G}}_{\pi(w_{j}),\pi(u_{j})}-\hat{q}) in ηk(s),ψ𝗏(s)\langle\eta^{(s)}_{k},\psi^{(s)}_{\mathsf{v}}\rangle. Further, define U(s,k,v)=ηk(s),φv(s)jU(s,k,v)=\langle\eta^{(s)}_{k},\varphi^{(s)}_{v}\rangle_{\langle j\rangle} and U(s,k,π(v))=ηk(s),φπ(v)(s)jU(s,k,\pi(v))=\langle\eta^{(s)}_{k},\varphi^{(s)}_{\pi(v)}\rangle_{\langle j\rangle}. Then we have 𝐗(j1)t=U+Z+(Bα+Bβ+𝖡θ+𝖡γ)\mathbf{X}^{\leq t}_{(j-1)}=U+Z+(B\alpha+B^{\prime}\beta+\mathsf{B}\theta+\mathsf{B}^{\prime}\gamma), and on j\mathcal{M}_{j} we have that Un1logloglogn\|U\|_{\infty}\leq n^{\frac{1}{\log\log\log n}}. Also, we have |xt(s,k,v)|2(logn)n1logloglogn|x^{\leq t}(s,k,v)|\leq 2(\log n)n^{\frac{1}{\log\log\log n}} since xtx^{\leq t} is an amenable variable-realization. Thus, by ηk22\|\eta_{k}\|^{2}\leq 2 and the Cauchy–Schwarz inequality,

|α(s,k,uj)|,|β(s,k,wj)|,|θ(s,k,π(uj))|,|γ(s,k,π(wj))|(Ks/𝔞snq^)1/2.\displaystyle|\alpha(s,k,u_{j})|,|\beta(s,k,w_{j})|,|\theta(s,k,\pi(u_{j}))|,|\gamma(s,k,\pi(w_{j}))|\leq(K_{s}/\mathfrak{a}_{s}n\hat{q})^{1/2}\,.

Finally, let Σ[j1]\Sigma_{[j-1]} be the covariance matrix of ZZ. Since Σ[j1]I\Sigma_{[j-1]}-\mathrm{I} is the covariance matrix of {𝐗[j1]t}\{\mathbf{X}^{\leq t}_{[j-1]}\}, we have Σ[j1]1op1\|\Sigma^{-1}_{[j-1]}\|_{\mathrm{op}}\leq 1. Also, Σ[j1]2\|\Sigma_{[j-1]}\|_{\infty}\leq 2 and

Σ[j1]((s,k,u);(r,l,v))=O((𝟏viΓi(s)𝔞s)(𝟏uiΓi(r)𝔞r)/n𝔞s𝔞r) for uv.\Sigma_{[j-1]}((s,k,u);(r,l,v))=O\big{(}(\mathbf{1}_{v\in\cup_{i}\Gamma^{(s)}_{i}}-\mathfrak{a}_{s})(\mathbf{1}_{u\in\cup_{i}\Gamma^{(r)}_{i}}-\mathfrak{a}_{r})/n\sqrt{\mathfrak{a}_{s}\mathfrak{a}_{r}}\big{)}\mbox{ for }u\neq v\,.

Thus, we may apply Lemma 3.8 with

v=𝒥v={(s,k,v),(s,k,π(v)):0st,1kKs/12}\displaystyle\mathcal{I}_{v}=\mathcal{J}_{v}=\big{\{}(s,k,v),(s,k,\pi(v)):0\leq s\leq t,1\leq k\leq K_{s}/12\big{\}}

and derive that Σ[j1]op2Kt2\|\Sigma_{[j-1]}\|_{\mathrm{op}}\leq 2K_{t}^{2}. Furthermore, we can bound Σ[j1]1\|\Sigma^{-1}_{[j-1]}\|_{\infty} by Lemma 3.7 as follows. Set A\mathrm{A} (in Lemma 3.7) to be a matrix with A((s,k,v),(r,l,v))=Σ[j1]((s,k,v),(r,l,v))\mathrm{A}((s,k,v),(r,l,v))=\Sigma_{[j-1]}((s,k,v),(r,l,v)) and all other entries being 0, and set B=Σ[j1]A\mathrm{B}=\Sigma_{[j-1]}-\mathrm{A}. Then (A+B)1op=Σ[j1]1op1\|(\mathrm{A+B})^{-1}\|_{\mathrm{op}}=\|\Sigma_{[j-1]}^{-1}\|_{\mathrm{op}}\leq 1, the entries of B\mathrm{B} are bounded by ϑ1Kt2/n\vartheta^{-1}K_{t}^{2}/n, and A=diag(I+Av)\mathrm{A}=\mathrm{diag}(\mathrm{I}+\mathrm{A}_{v}) is a block-diagonal matrix where each block Av\mathrm{A}_{v} is the covariance matrix of

{𝐗[j1]t(s,k,v),𝐗[j1]t(s,k,π(v)):0st,1kKs/12}.\big{\{}\mathbf{X}^{\leq t}_{[j-1]}(s,k,v),\mathbf{X}^{\leq t}_{[j-1]}(s,k,\pi(v)):0\leq s\leq t,1\leq k\leq K_{s}/12\big{\}}\,.

So Av\mathrm{A}_{v} is positive semi-definite and thus (I+Av)1op1\|(\mathrm{I}+\mathrm{A}_{v})^{-1}\|_{\mathrm{op}}\leq 1. Since also the dimension of Av\mathrm{A}_{v} is bounded by KtK_{t}, we have (I+Av)1Kt2(I+Av)1opKt2\|(\mathrm{I}+\mathrm{A}_{v})^{-1}\|_{\infty}\leq K_{t}^{2}\|(\mathrm{I}+\mathrm{A}_{v})^{-1}\|_{\mathrm{op}}\leq K_{t}^{2}. Thus, we have that A1=maxv{(I+Av)1}Kt2\|\mathrm{A}^{-1}\|_{\infty}=\max_{v}\{\|(\mathrm{I}+\mathrm{A}_{v})^{-1}\|_{\infty}\}\leq K_{t}^{2}. Therefore, we can apply Lemma 3.7 with C=1,m=Ktn,K=ϑ1Kt3C=1,m=K_{t}n,K=\vartheta^{-1}K_{t}^{3} and L=Kt2L=K_{t}^{2} and obtain that Σ[j1]12Kt5ϑ1\|\Sigma_{[j-1]}^{-1}\|_{\infty}\leq 2K_{t}^{5}\vartheta^{-1}. Applying Lemma 3.21 and using independence between {j,U}\{\mathcal{M}_{j},U\} and ZZ, we get that

|p{𝐗(j1)t;j}(xt)p{𝐗(j)t;j}(xt)1|\displaystyle\Big{|}\frac{{p}_{\{\mathbf{X}^{\leq t}_{(j-1)};\mathcal{M}_{j}\}}(x^{\leq t})}{{p}_{\{\mathbf{X}^{\leq t}_{(j)};\mathcal{M}_{j}\}}(x^{\leq t})}-1\Big{|}
\displaystyle\lesssim\ Kt5(nq^)3q^((logn)n1logloglognKt5ϑ1)3exp{4KtKt/ϑnq^2(logn)n1logloglogn}\displaystyle K_{t}^{5}(\sqrt{n\hat{q}})^{-3}\hat{q}((\log n)n^{\frac{1}{\log\log\log n}}K_{t}^{5}\vartheta^{-1})^{3}\exp\big{\{}4K_{t}\sqrt{K_{t}/\vartheta n\hat{q}}\cdot 2(\log n)n^{\frac{1}{\log\log\log n}}\big{\}}
\displaystyle\leq\ Kt20ϑ3(logn)3n3logloglogn/nnq^.\displaystyle K_{t}^{20}\vartheta^{-3}(\log n)^{3}n^{\frac{3}{\log\log\log n}}/n\sqrt{n\hat{q}}\,.\qed
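The elementary norm comparisons used in the proof above (that the operator norm of the inverse is at most 1 once the covariance dominates the identity, and that passing from the operator norm to entrywise or row-sum norms costs at most the block dimension) can be sanity-checked numerically. A small Python sketch with a random positive semi-definite block, where d plays the role of K_t (all sizes our own toy choices):

    import numpy as np

    rng = np.random.default_rng(3)
    d = 8                                        # block dimension, playing the role of K_t
    R = rng.normal(size=(d, d))
    A_v = R @ R.T                                # a positive semi-definite block
    M = np.linalg.inv(np.eye(d) + A_v)           # (I + A_v)^{-1}

    op = np.linalg.norm(M, 2)                    # spectral norm; <= 1 since A_v >= 0
    row_sum = np.abs(M).sum(axis=1).max()        # ell_inf -> ell_inf operator norm
    print("operator norm:", op, "<= 1")
    print("row-sum norm:", row_sum, "<= sqrt(d) * op =", np.sqrt(d) * op)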

3.5 Gaussian analysis

Since

(3.13) is independent of Gaussian variables {Wv(s),𝖶𝗏(s),Zv,u,𝖹𝗏,𝗎},\eqref{eq-to-be-conditioned-on}\mbox{ is independent of Gaussian variables }\big{\{}W^{(s)}_{v},\mathsf{W}^{(s)}_{\mathsf{v}},\overrightarrow{Z}_{v,u},\overrightarrow{\mathsf{Z}}_{\mathsf{v},\mathsf{u}}\big{\}}\,, (3.50)

when analyzing the process defined by (3.8) it would be convenient to, and thus we will,

 condition on the realization of (3.13).\mbox{ condition on the realization of }\eqref{eq-to-be-conditioned-on}\,. (3.51)

As such, the following process defined by (3.8) can be viewed as a Gaussian process:

{W~v(s)(k)+ηk(s),gtD~v(s)𝖶~π(v)(s)(k)+ηk(s),gt𝖣~π(v)(s):0st+1,1kKs12,vVBADt}.\displaystyle\Bigg{\{}\begin{split}&\tilde{W}^{(s)}_{v}(k)+\langle\eta^{(s)}_{k},g_{t}\tilde{D}^{(s)}_{v}\rangle\\ &\tilde{\mathsf{W}}^{(s)}_{\pi(v)}(k)+\langle\eta^{(s)}_{k},g_{t}\tilde{\mathsf{D}}^{(s)}_{\pi(v)}\rangle\end{split}:0\leq s\leq t+1,1\leq k\leq\frac{K_{s}}{12},v\in V\setminus\mathrm{BAD}_{t}\Bigg{\}}.

Note that our convention here is consistent with (3.36) (see also Remark 3.17). Recall that t\mathcal{F}_{t} is the σ\sigma-field generated by 𝔉t\mathfrak{F}_{t} (see (3.12)), which is slightly different from the above process since in 𝔉t\mathfrak{F}_{t} we have sts\leq t. We will study the conditional law of W~v(t+1)(k)+ηk(t+1),gtD~v(t+1)\tilde{W}^{(t+1)}_{v}(k)+\langle\eta^{(t+1)}_{k},g_{t}\tilde{D}^{(t+1)}_{v}\rangle given t\mathcal{F}_{t}. A plausible approach is to apply the techniques of Gaussian projection developed in [17] (see also e.g. [3] for important developments on problems related to a single random matrix). To this end, we define the operation 𝔼^\hat{\mathbb{E}} as follows: for any function hh (of the form h(Γ,Π,BADt,Z,𝖹,W~,𝖶~)h(\Gamma,\Pi,\mathrm{BAD}_{t},\overrightarrow{Z},\overrightarrow{\mathsf{Z}},\tilde{W},\tilde{\mathsf{W}})), define

𝔼^[h(Γ,Π,BADt,Z,𝖹,W~,𝖶~)]=𝔼[hΓ,Π,BADt].\hat{\mathbb{E}}[h(\Gamma,\Pi,\mathrm{BAD}_{t},\overrightarrow{Z},\overrightarrow{\mathsf{Z}},\tilde{W},\tilde{\mathsf{W}})]=\mathbb{E}[h\mid\Gamma,\Pi,\mathrm{BAD}_{t}]\,. (3.52)

Our definition of 𝔼^\hat{\mathbb{E}} appears to be simpler than that in [17] thanks to (3.50). We emphasize that the two definitions are in fact identical, and the simplicity of the expression in (3.52) is due to the fact that we have already introduced an independent copy of the Gaussian process for the purpose of applying Lindeberg’s argument. Further, when calculating 𝔼^\hat{\mathbb{E}} with respect to gtD~v(s)g_{t}\tilde{D}^{(s)}_{v} and gt𝖣~𝗏(s)g_{t}\tilde{\mathsf{D}}^{(s)}_{\mathsf{v}}, we regard gtD~v(s)=ψv(s)g_{t}\tilde{D}^{(s)}_{v}=\psi_{v}^{(s)} and gt𝖣~π(v)(s)=ψπ(v)(s)g_{t}\tilde{\mathsf{D}}^{(s)}_{\pi(v)}=\psi_{\pi(v)}^{(s)} as vector-valued functions defined in (3.27) and (3.28). For definiteness, we list the variables in 𝔉t\mathfrak{F}_{t} in the following order: first we list all W~v(s)(k)+ηk(s),gtD~v(s)\tilde{W}^{(s)}_{v}(k)+\langle\eta^{(s)}_{k},g_{t}\tilde{D}^{(s)}_{v}\rangle indexed by (s,k,v)(s,k,v) in the dictionary order and then we list all 𝖶~𝗏(s)(k)+ηk(s),gt𝖣~𝗏(s)\tilde{\mathsf{W}}^{(s)}_{\mathsf{v}}(k)+\langle\eta^{(s)}_{k},g_{t}\tilde{\mathsf{D}}^{(s)}_{\mathsf{v}}\rangle indexed by (s,k,𝗏)(s,k,\mathsf{v}) in the dictionary order. Since {W~v(s)(k),𝖶~𝗏(s)(k)}\{\tilde{W}^{(s)}_{v}(k),\tilde{\mathsf{W}}^{(s)}_{\mathsf{v}}(k)\} are i.i.d. standard Gaussian variables, it suffices to calculate correlations between variables in the collection {ηk(s),gtD~v(s),ηk(s),gt𝖣~𝗏(s)}\{\langle\eta^{(s)}_{k},g_{t}\tilde{D}^{(s)}_{v}\rangle,\langle\eta^{(s)}_{k},g_{t}\tilde{\mathsf{D}}^{(s)}_{\mathsf{v}}\rangle\}. Under this ordering, on the event t\mathcal{E}_{t} we have for all r,str,s\leq t

𝔼^[gtD~v(r)(k)gtD~u(s)(l)]=𝔼^[gt𝖣~𝗏(r)(k)gt𝖣~𝗎(s)(l)]=0 for uv,𝗎𝗏,\displaystyle\hat{\mathbb{E}}[g_{t}\tilde{D}^{(r)}_{v}(k)g_{t}\tilde{D}^{(s)}_{u}(l)]=\hat{\mathbb{E}}[g_{t}\tilde{\mathsf{D}}^{(r)}_{\mathsf{v}}(k)g_{t}\tilde{\mathsf{D}}^{(s)}_{\mathsf{u}}(l)]=0\mbox{ for }u\neq v,\mathsf{u\neq v}\,,

since {Zu,w:uw}\{\overrightarrow{Z}_{u,w}:u\neq w\} is a collection of independent variables (and similarly for 𝖹\overrightarrow{\mathsf{Z}}). Also, for all 0r,st0\leq r,s\leq t (recall that 𝔞s=𝔞\mathfrak{a}_{s}=\mathfrak{a} for s1s\geq 1 and 𝔞0=ϑ\mathfrak{a}_{0}=\vartheta), on the event 𝒯t\mathcal{T}_{t} we have

𝔼^[gtD~v(r)(k)gtD~v(s)(l)]=1n(𝔞r𝔞r2)(𝔞s𝔞s2)wVBADt(𝟏wΓk(r)𝔞r)(𝟏wΓl(s)𝔞s)\displaystyle\hat{\mathbb{E}}[g_{t}\tilde{D}^{(r)}_{v}(k)g_{t}\tilde{D}^{(s)}_{v}(l)]=\frac{1}{n\sqrt{(\mathfrak{a}_{r}-\mathfrak{a}_{r}^{2})(\mathfrak{a}_{s}-\mathfrak{a}_{s}^{2})}}\sum_{w\in V\setminus\mathrm{BAD}_{t}}(\mathbf{1}_{w\in\Gamma^{(r)}_{k}}-\mathfrak{a}_{r})(\mathbf{1}_{w\in\Gamma^{(s)}_{l}}-\mathfrak{a}_{s})
=\displaystyle= |Γk(r)Γl(s)BADt|𝔞s|Γk(r)BADt|𝔞r|Γl(s)BADt|+𝔞r𝔞s(n|BADt|)n(𝔞r𝔞r2)(𝔞s𝔞s2)\displaystyle\frac{\big{|}\Gamma^{(r)}_{k}\cap\Gamma^{(s)}_{l}\setminus\mathrm{BAD}_{t}\big{|}-\mathfrak{a}_{s}\big{|}\Gamma^{(r)}_{k}\setminus\mathrm{BAD}_{t}\big{|}-\mathfrak{a}_{r}\big{|}\Gamma^{(s)}_{l}\setminus\mathrm{BAD}_{t}\big{|}+\mathfrak{a}_{r}\mathfrak{a}_{s}(n-|\mathrm{BAD}_{t}|)}{n\sqrt{(\mathfrak{a}_{r}-\mathfrak{a}_{r}^{2})(\mathfrak{a}_{s}-\mathfrak{a}_{s}^{2})}}
=\displaystyle= |Γk(r)Γl(s)|𝔞s|Γk(r)|𝔞r|Γl(s)|+𝔞r𝔞snn(𝔞r𝔞r2)(𝔞s𝔞s2)+O(|BADt|nϑ)=MΓ(r,s)(k,l)+o(Δ02),\displaystyle\frac{\big{|}\Gamma^{(r)}_{k}\cap\Gamma^{(s)}_{l}\big{|}-\mathfrak{a}_{s}\big{|}\Gamma^{(r)}_{k}\big{|}-\mathfrak{a}_{r}\big{|}\Gamma^{(s)}_{l}\big{|}+\mathfrak{a}_{r}\mathfrak{a}_{s}n}{n\sqrt{(\mathfrak{a}_{r}-\mathfrak{a}_{r}^{2})(\mathfrak{a}_{s}-\mathfrak{a}_{s}^{2})}}+O\big{(}\frac{|\mathrm{BAD}_{t}|}{n\vartheta}\big{)}=\mathrm{M}^{(r,s)}_{\Gamma}(k,l)+o(\Delta^{2}_{0})\,,

where the last transition follows from (3.16). Similarly we have (again on 𝒯t\mathcal{T}_{t})

𝔼^[gt𝖣~𝗏(r)(k)gt𝖣~𝗏(s)(l)]=MΠ(r,s)(k,l)+o(Δ02),\displaystyle\hat{\mathbb{E}}[g_{t}\tilde{\mathsf{D}}^{(r)}_{\mathsf{v}}(k)g_{t}\tilde{\mathsf{D}}^{(s)}_{\mathsf{v}}(l)]=\mathrm{M}^{(r,s)}_{\Pi}(k,l)+o(\Delta^{2}_{0})\,,
𝔼^[gtD~v(r)(k)gt𝖣~π(v)(s)(l)]=PΓ,Π(r,s)(k,l)+o(Δ02).\displaystyle\hat{\mathbb{E}}[g_{t}\tilde{{D}}^{(r)}_{v}(k)g_{t}\tilde{\mathsf{D}}^{(s)}_{\pi(v)}(l)]=\mathrm{P}^{(r,s)}_{\Gamma,\Pi}(k,l)+o(\Delta^{2}_{0})\,.

Thus, on t𝒯t\mathcal{E}_{t}\cap\mathcal{T}_{t} we have

𝔼^[ηk(r),gtD~v(r)ηm(s),gtD~u(s)]\displaystyle\hat{\mathbb{E}}[\langle\eta^{(r)}_{k},g_{t}\tilde{D}^{(r)}_{v}\rangle\langle\eta^{(s)}_{m},g_{t}\tilde{D}^{(s)}_{u}\rangle] =𝔼^[ηk(r),gt𝖣~𝗏(r)ηm(s),gt𝖣~𝗎(s)]=0 for uv,𝗎𝗏,\displaystyle=\hat{\mathbb{E}}[\langle\eta^{(r)}_{k},g_{t}\tilde{\mathsf{D}}^{(r)}_{\mathsf{v}}\rangle\langle\eta^{(s)}_{m},g_{t}\tilde{\mathsf{D}}^{(s)}_{\mathsf{u}}\rangle]=0\mbox{ for }u\neq v,\mathsf{u}\neq\mathsf{v}\,, (3.53)
𝔼^[ηk(r),gtD~v(r)ηm(s),gtD~v(s)]\displaystyle\hat{\mathbb{E}}[\langle\eta^{(r)}_{k},g_{t}\tilde{D}^{(r)}_{v}\rangle\langle\eta^{(s)}_{m},g_{t}\tilde{D}^{(s)}_{v}\rangle] =ηk(r)MΓ(r,s)(ηm(s))+o(KtΔ02)\displaystyle=\eta^{(r)}_{k}\mathrm{M}_{\Gamma}^{(r,s)}\big{(}\eta^{(s)}_{m}\big{)}^{*}+o(K_{t}\Delta_{0}^{2})
=(2.25),(2.26)o(KtΔ02) for (r,k)(s,m),\displaystyle\overset{\eqref{equ-linear-space},\eqref{equ-vector-orthogonal}}{=}o(K_{t}\Delta_{0}^{2})\mbox{ for }(r,k)\neq(s,m)\,, (3.54)
𝔼^[ηk(r),gt𝖣~𝗏(r)ηm(s),gs𝖣~𝗏(s)]\displaystyle\hat{\mathbb{E}}[\langle\eta^{(r)}_{k},g_{t}\tilde{\mathsf{D}}^{(r)}_{\mathsf{v}}\rangle\langle\eta^{(s)}_{m},g_{s}\tilde{\mathsf{D}}^{(s)}_{\mathsf{v}}\rangle] =ηk(r)MΠ(r,s)(ηm(s))+o(KtΔ02)\displaystyle=\eta^{(r)}_{k}\mathrm{M}_{\Pi}^{(r,s)}\big{(}\eta^{(s)}_{m}\big{)}^{*}+o(K_{t}\Delta_{0}^{2})
=(2.25),(2.26)o(KtΔ02) for (r,k)(s,m).\displaystyle\overset{\eqref{equ-linear-space},\eqref{equ-vector-orthogonal}}{=}o(K_{t}\Delta_{0}^{2})\mbox{ for }(r,k)\neq(s,m)\,. (3.55)

In addition, we have

𝔼^[ηk(r),gtD~v(r)ηm(s),gs𝖣~π(v)(s)]\displaystyle\hat{\mathbb{E}}[\langle\eta^{(r)}_{k},g_{t}\tilde{{D}}^{(r)}_{v}\rangle\langle\eta^{(s)}_{m},g_{s}\tilde{\mathsf{D}}^{(s)}_{\pi(v)}\rangle] =ρ^ηk(r)PΓ,Π(r,s)(ηm(s))+o(KtΔ02)\displaystyle=\hat{\rho}\cdot\eta^{(r)}_{k}\mathrm{P}_{\Gamma,\Pi}^{(r,s)}\big{(}\eta^{(s)}_{m}\big{)}^{*}+o(K_{t}\Delta_{0}^{2})
=(2.25),(2.26)o(KtΔt) for (r,k)(s,m),\displaystyle\overset{\eqref{equ-linear-space},\eqref{equ-vector-orthogonal}}{=}o(K_{t}\Delta_{t})\mbox{ for }(r,k)\neq(s,m)\,, (3.56)

where in the second equality above we also used that the entries of PΓ,Π(r,s),MΓ(r,s),MΠ(r,s)\mathrm{P}^{(r,s)}_{\Gamma,\Pi},\mathrm{M}^{(r,s)}_{\Gamma},\mathrm{M}^{(r,s)}_{\Pi} are bounded by Δt\Delta_{t} when rsr\neq s, and PΓ,Π(s,s),MΓ(s,s),MΠ(s,s)\mathrm{P}^{(s,s)}_{\Gamma,\Pi},\mathrm{M}^{(s,s)}_{\Gamma},\mathrm{M}^{(s,s)}_{\Pi} concentrate around Ψ(s),Φ(s),Φ(s)\Psi^{(s)},\Phi^{(s)},\Phi^{(s)} respectively with error Δt\Delta_{t}. In addition, we have that

|𝔼^[ηk(r),gtD~v(r)2]1|=|ηk(r)MΓ(r,r)(ηk(r))1|+o(KtΔ02)\displaystyle\Big{|}\hat{\mathbb{E}}[\langle\eta^{(r)}_{k},g_{t}\tilde{D}^{(r)}_{v}\rangle^{2}]-1\Big{|}=\Big{|}\eta^{(r)}_{k}\mathrm{M}_{\Gamma}^{(r,r)}\big{(}\eta^{(r)}_{k}\big{)}^{*}-1\Big{|}+o(K_{t}\Delta_{0}^{2})
=\displaystyle=\ |ηk(r)Φ(r)(ηk(r))+ηk(r)(MΓ(r,r)Φ(r))(ηk(r))1|+o(KtΔ02)(2.27)KtΔt,\displaystyle\Big{|}\eta^{(r)}_{k}\Phi^{(r)}\big{(}\eta^{(r)}_{k}\big{)}^{*}+\eta^{(r)}_{k}\big{(}\mathrm{M}_{\Gamma}^{(r,r)}-\Phi^{(r)}\big{)}\big{(}\eta^{(r)}_{k}\big{)}^{*}-1\Big{|}+o(K_{t}\Delta_{0}^{2})\overset{\eqref{equ-vector-unit}}{\leq}K_{t}\Delta_{t}\,, (3.57)
|𝔼^[ηk(r),gt𝖣~𝗏(r)2]1|=|ηk(r)MΠ(r,r)(ηk(r))1|+o(KtΔ02)\displaystyle\Big{|}\hat{\mathbb{E}}[\langle\eta^{(r)}_{k},g_{t}\tilde{\mathsf{D}}^{(r)}_{\mathsf{v}}\rangle^{2}]-1\Big{|}=\Big{|}\eta^{(r)}_{k}\mathrm{M}_{\Pi}^{(r,r)}\big{(}\eta^{(r)}_{k}\big{)}^{*}-1\Big{|}+o(K_{t}\Delta_{0}^{2})
=\displaystyle=\ |ηk(r)Φ(r)(ηk(r))+ηk(r)(MΠ(r,r)Φ(r))(ηk(r))1|+o(KtΔ02)(2.27)KtΔt,\displaystyle\Big{|}\eta^{(r)}_{k}\Phi^{(r)}\big{(}\eta^{(r)}_{k}\big{)}^{*}+\eta^{(r)}_{k}\big{(}\mathrm{M}_{\Pi}^{(r,r)}-\Phi^{(r)}\big{)}\big{(}\eta^{(r)}_{k}\big{)}^{*}-1\Big{|}+o(K_{t}\Delta_{0}^{2})\overset{\eqref{equ-vector-unit}}{\leq}K_{t}\Delta_{t}\,, (3.58)
|𝔼^[ηk(r),gtD~v(r)ηk(r),gt𝖣~π(v)(r)]ρ^ηk(r)Ψ(r)(ηk(r))|\displaystyle\Big{|}\hat{\mathbb{E}}[\langle\eta^{(r)}_{k},g_{t}\tilde{D}^{(r)}_{v}\rangle\langle\eta^{(r)}_{k},g_{t}\tilde{\mathsf{D}}^{(r)}_{\pi(v)}\rangle]-\hat{\rho}\cdot\eta^{(r)}_{k}\Psi^{(r)}\big{(}\eta^{(r)}_{k}\big{)}^{*}\Big{|}
=\displaystyle=\ ρ^|ηk(r)(PΓ,Π(r,r)Ψ(r))(ηk(r))|+o(KtΔ02)KtΔt.\displaystyle\hat{\rho}\cdot\Big{|}\eta^{(r)}_{k}\big{(}\mathrm{P}_{\Gamma,\Pi}^{(r,r)}-\Psi^{(r)}\big{)}\big{(}\eta^{(r)}_{k}\big{)}^{*}\Big{|}+o(K_{t}\Delta_{0}^{2})\leq K_{t}\Delta_{t}\,. (3.59)
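The covariance computations above all reduce to expanding a sum of centered indicators into set overlaps, as in the first display of this computation; the expansion is purely mechanical and can be checked directly (the Python sketch below uses random sets of our own choosing and ignores the BAD_t correction):

    import numpy as np

    rng = np.random.default_rng(4)
    n, a_r, a_s = 10_000, 0.3, 0.3               # toy sizes and densities
    Gk = rng.random(n) < a_r                     # indicator vector of Gamma_k^{(r)}
    Gl = rng.random(n) < a_s                     # indicator vector of Gamma_l^{(s)}

    norm = n * np.sqrt((a_r - a_r ** 2) * (a_s - a_s ** 2))
    lhs = ((Gk - a_r) * (Gl - a_s)).sum() / norm
    rhs = ((Gk & Gl).sum() - a_s * Gk.sum() - a_r * Gl.sum() + a_r * a_s * n) / norm
    print(lhs, rhs, "difference:", abs(lhs - rhs))   # agree up to floating-point error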

Recall that 𝔉t\mathfrak{F}_{t} consists of variables in (3.12) where {Wv(t)(k),𝖶𝗏(t)(k)}\{W^{(t)}_{v}(k),\mathsf{W}^{(t)}_{\mathsf{v}}(k)\} is a collection of standard Gaussian variables independent of {ηk(t),gtD~v(t),ηk(t),gt𝖣~𝗏(t)}\{\langle\eta^{(t)}_{k},g_{t}\tilde{D}^{(t)}_{v}\rangle,\langle\eta^{(t)}_{k},g_{t}\tilde{\mathsf{D}}^{(t)}_{\mathsf{v}}\rangle\}. Therefore, we may write the 𝔼^\hat{\mathbb{E}}-correlation matrix of 𝔉t\mathfrak{F}_{t} as (I+𝐀t𝐁t𝐁tI+𝐂t)\begin{pmatrix}\mathrm{I}+\mathbf{A}_{t}&\mathbf{B}_{t}\\ \mathbf{B}_{t}^{*}&\mathrm{I}+\mathbf{C}_{t}\end{pmatrix}, such that the following hold:

  • 𝐀t,𝐂t\mathbf{A}_{t},\mathbf{C}_{t} have diagonal entries in (1KtΔt,1+KtΔt)(1-K_{t}\Delta_{t},1+K_{t}\Delta_{t});

  • for each fixed (s,k,u)(s,k,u), there are at most 2Kt2K_{t} non-zero non-diagonal 𝐀t((s,k,u);(r,l,u))\mathbf{A}_{t}((s,k,u);(r,l,u)) (and also the same for 𝐂t\mathbf{C}_{t}) and these entries are all bounded by KtΔtK_{t}\Delta_{t} (this fact implies that 𝐀tIop,𝐂tIop=O(Kt2Δt)\|\mathbf{A}_{t}-\mathrm{I}\|_{\mathrm{op}},\|\mathbf{C}_{t}-\mathrm{I}\|_{\mathrm{op}}=O(K_{t}^{2}\Delta_{t}));

  • 𝐁t\mathbf{B}_{t} is the matrix with row indexed by (s,k,u)(s,k,u) for 0st,1kKs12,uVBADt0\leq s\leq t,1\leq k\leq\frac{K_{s}}{12},u\in V\setminus\mathrm{BAD}_{t} and column indexed by (r,l,𝗐)(r,l,\mathsf{w}) for 0rt,1lKr12,𝗐𝖵π(BADt)0\leq r\leq t,1\leq l\leq\frac{K_{r}}{12},\mathsf{w}\in\mathsf{V}\setminus\pi(\mathrm{BAD}_{t}), and with entries 𝐁t((s,k,u);(r,l,𝗐))\mathbf{B}_{t}((s,k,u);(r,l,\mathsf{w})) given by 𝔼^[ηk(s),gtD~u(s)ηl(r),gt𝖣~𝗐(r)]\hat{\mathbb{E}}[\langle\eta^{(s)}_{k},g_{t}\tilde{D}^{(s)}_{u}\rangle\langle\eta^{(r)}_{l},g_{t}\tilde{\mathsf{D}}^{(r)}_{\mathsf{w}}\rangle].

Thus, by [17, Lemma 3.10] we have

𝔼[ηk(t+1),gtD~v(t+1)|t]=(gt[Y~]tgt[𝖸~]t)(I+𝐀t𝐁t𝐁tI+𝐂t)1(Ht+1,k,v𝖧t+1,k,v).\displaystyle\mathbb{E}[\langle\eta^{(t+1)}_{k},g_{t}\tilde{D}^{(t+1)}_{v}\rangle|\mathcal{F}_{t}]=\begin{pmatrix}g_{t}[\tilde{Y}]_{t}&g_{t}[\tilde{\mathsf{Y}}]_{t}\end{pmatrix}\begin{pmatrix}\mathrm{I}+\mathbf{A}_{t}&\mathbf{B}_{t}\\ \mathbf{B}_{t}^{*}&\mathrm{I}+\mathbf{C}_{t}\end{pmatrix}^{-1}\begin{pmatrix}H_{t+1,k,v}^{*}\\ \mathsf{H}_{t+1,k,v}^{*}\end{pmatrix}\,. (3.60)

Here Ht+1,k,v,𝖧t+1,k,vH_{t+1,k,v},\mathsf{H}_{t+1,k,v} and gt[Y~]t,gt[𝖸~]tg_{t}[\tilde{Y}]_{t},g_{t}[\tilde{\mathsf{Y}}]_{t} are all 0stKs12(n|BADt|)\sum_{0\leq s\leq t}\frac{K_{s}}{12}(n-|\mathrm{BAD}_{t}|) dimensional vectors; Ht+1,k,vH_{t+1,k,v} and gt[Y~]tg_{t}[\tilde{Y}]_{t} are indexed by the triple (s,l,u)(s,l,u) with 0st,1lKs12,uVBADt0\leq s\leq t,1\leq l\leq\frac{K_{s}}{12},u\in V\setminus\mathrm{BAD}_{t} in the dictionary order; 𝖧t+1,k,v\mathsf{H}_{t+1,k,v} and gt[𝖸~]tg_{t}[\tilde{\mathsf{Y}}]_{t} are indexed by the triple (s,l,𝗎)(s,l,\mathsf{u}) with 0st,1lKs12,𝗎𝖵π(BADt)0\leq s\leq t,1\leq l\leq\frac{K_{s}}{12},\mathsf{u}\in\mathsf{V}\setminus\pi(\mathrm{BAD}_{t}) in the dictionary order. Also gt[Y~]tg_{t}[\tilde{Y}]_{t} and gt[𝖸~]tg_{t}[\tilde{\mathsf{Y}}]_{t} can be divided into sub-vectors as follows:

gt[Y~]t=[gtY~t|gtY~t1||gtY~0] and gt[𝖸~]t=[gt𝖸~t|gt𝖸~t1||gt𝖸~0],\displaystyle g_{t}[\tilde{Y}]_{t}=[g_{t}\tilde{Y}_{t}\,|\,g_{t}\tilde{Y}_{t-1}\,|\,\ldots\,|\,g_{t}\tilde{Y}_{0}]\mbox{ and }g_{t}[\tilde{\mathsf{Y}}]_{t}=[g_{t}\tilde{\mathsf{Y}}_{t}\,|\,g_{t}\tilde{\mathsf{Y}}_{t-1}\,|\,\ldots\,|\,g_{t}\tilde{\mathsf{Y}}_{0}]\,,

where gtY~sg_{t}\tilde{Y}_{s} and gt𝖸~sg_{t}\tilde{\mathsf{Y}}_{s} are Ks12(n|BADt|)\frac{K_{s}}{12}(n-|\mathrm{BAD}_{t}|) dimensional vectors indexed by (k,u)(k,u) and (k,𝗎)(k,\mathsf{u}), respectively. In addition, their entries are given by

gtY~s(l,u)=W~u(s)(l)+ηl(s),gtD~u(s),Ht+1,k,v(s,l,u)=𝔼^[ηk(t+1),gtD~v(t+1)ηl(s),gtD~u(s)];\displaystyle g_{t}\tilde{Y}_{s}(l,u)=\tilde{W}^{(s)}_{u}(l)+\langle\eta^{(s)}_{l},g_{t}\tilde{D}^{(s)}_{u}\rangle,\quad H_{t+1,k,v}(s,l,u)=\hat{\mathbb{E}}[\langle\eta^{(t+1)}_{k},g_{t}\tilde{D}^{(t+1)}_{v}\rangle\langle\eta^{(s)}_{l},g_{t}\tilde{D}^{(s)}_{{u}}\rangle]\,;
gt𝖸~s(l,𝗎)=𝖶~𝗎(s)(l)+ηl(s),gt𝖣~𝗎(s),𝖧t+1,k,v(s,l,𝗎)=𝔼^[ηk(t+1),gtD~v(t+1)ηl(s),gt𝖣~𝗎(s)].\displaystyle g_{t}\tilde{\mathsf{Y}}_{s}(l,\mathsf{u})=\tilde{\mathsf{W}}^{(s)}_{\mathsf{u}}(l)+\langle\eta^{(s)}_{l},g_{t}\tilde{\mathsf{D}}^{(s)}_{\mathsf{u}}\rangle,\quad\mathsf{H}_{t+1,k,v}(s,l,\mathsf{u})=\hat{\mathbb{E}}[\langle\eta^{(t+1)}_{k},g_{t}\tilde{D}^{(t+1)}_{v}\rangle\langle\eta^{(s)}_{l},g_{t}\tilde{\mathsf{D}}^{(s)}_{\mathsf{u}}\rangle]\,.
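To make (3.60) concrete, recall the standard fact it rests on: for a jointly Gaussian vector, the conditional expectation of one coordinate given the others is the observed vector multiplied by the inverse covariance matrix and the cross-covariance column, and the residual is uncorrelated with every conditioned-on coordinate. The following minimal numpy sketch (all dimensions, seeds and values are hypothetical stand-ins, unrelated to the actual objects of the proof) illustrates this projection property:

import numpy as np

rng = np.random.default_rng(0)
m = 5                                      # hypothetical number of conditioned-on coordinates
A = rng.standard_normal((m + 1, m + 1))
Sigma = A @ A.T + np.eye(m + 1)            # positive definite covariance of (X_1, ..., X_m, Y)
L = np.linalg.cholesky(Sigma)

Z = rng.standard_normal((200000, m + 1)) @ L.T   # samples of (X, Y)
X, Y = Z[:, :m], Z[:, m]

# conditional mean E[Y | X] = X @ Sigma_XX^{-1} @ Sigma_XY, the analogue of (3.60)
beta = np.linalg.solve(Sigma[:m, :m], Sigma[:m, m])
resid = Y - X @ beta

# the residual is (up to Monte Carlo error) uncorrelated with each coordinate of X
print(np.abs(X.T @ resid / len(Y)).max())        # close to 0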
Remark 3.22.

In conclusion, we have shown that

((gtY~t+1gt𝖸~t+1)|t)=𝑑((gt[Y~]tgt[𝖸~]t)(I+𝐀t𝐁t𝐁tI+𝐂t)1𝐇t+1|t)\displaystyle\Big{(}\begin{pmatrix}g_{t}\tilde{Y}_{t+1}&g_{t}\tilde{\mathsf{Y}}_{t+1}\end{pmatrix}\big{|}\mathcal{F}_{t}\Big{)}\overset{d}{=}\Big{(}\begin{pmatrix}g_{t}[\tilde{Y}]_{t}&g_{t}[\tilde{\mathsf{Y}}]_{t}\end{pmatrix}\begin{pmatrix}\mathrm{I}+\mathbf{A}_{t}&\mathbf{B}_{t}\\ \mathbf{B}_{t}^{*}&\mathrm{I}+\mathbf{C}_{t}\end{pmatrix}^{-1}\mathbf{H}_{t+1}^{*}\big{|}\mathcal{F}_{t}\Big{)} (3.61)
+(gtY~t+1gt𝖸~t+1)(gt[Y~]tgt[𝖸~]t)(I+𝐀t𝐁t𝐁tI+𝐂t)1𝐇t+1.\displaystyle+\begin{pmatrix}g_{t}\tilde{Y}_{t+1}^{\diamond}&g_{t}\tilde{\mathsf{Y}}_{t+1}^{\diamond}\end{pmatrix}-\begin{pmatrix}g_{t}[\tilde{Y}]_{t}^{\diamond}&g_{t}[\tilde{\mathsf{Y}}]_{t}^{\diamond}\end{pmatrix}\begin{pmatrix}\mathrm{I}+\mathbf{A}_{t}&\mathbf{B}_{t}\\ \mathbf{B}_{t}^{*}&\mathrm{I}+\mathbf{C}_{t}\end{pmatrix}^{-1}\mathbf{H}_{t+1}^{*}\,. (3.62)

In the above 𝐇t+1\mathbf{H}_{t+1} is given by

𝐇t+1((k,τ1);(s,l,τ2))=Ht+1,k,τ1(s,l,τ2) for τ1,τ2(VBADt)(𝖵π(BADt)).\mathbf{H}_{t+1}((k,\tau_{1});(s,l,\tau_{2}))=H_{t+1,k,\tau_{1}}(s,l,\tau_{2})\mbox{ for }\tau_{1},\tau_{2}\in(V\setminus\mathrm{BAD}_{t})\cup(\mathsf{V}\setminus\pi(\mathrm{BAD}_{t}))\,. (3.63)

In addition, gtY~sg_{t}\tilde{Y}^{\diamond}_{s} is given by gtY~s(l,v)=W~v(s)(l)+ηl(s),(gtD~v(s))g_{t}\tilde{Y}^{\diamond}_{s}(l,v)=\tilde{W}^{(s)}_{v}(l)+\langle\eta^{(s)}_{l},(g_{t}\tilde{D}^{(s)}_{v})^{\diamond}\rangle, where

gtD~v(t)(k)=1n(𝔞t𝔞t2)uVBADt(𝟏uΓk(t)𝔞t)Zv,u\displaystyle g_{t}\tilde{D}^{(t)}_{v}(k)^{\diamond}=\frac{1}{\sqrt{n(\mathfrak{a}_{t}-\mathfrak{a}_{t}^{2})}}\sum_{u\in V\setminus\mathrm{BAD}_{t}}(\mathbf{1}_{u\in\Gamma^{(t)}_{k}}-\mathfrak{a}_{t})\overrightarrow{Z}_{v,u}^{\diamond}

is a linear combination of Gaussian variables {Zv,u}\{\overrightarrow{Z}_{v,u}^{\diamond}\} with deterministic coefficients (recall (3.51)), and {Zv,u}\{\overrightarrow{Z}_{v,u}^{\diamond}\} is an independent copy of {Zv,u}\{\overrightarrow{Z}_{v,u}\} (and similarly for gt𝖣~𝗏(t)(k)g_{t}\tilde{\mathsf{D}}^{(t)}_{\mathsf{v}}(k)^{\diamond}). For notational convenience, we denote (3.61) as PROJ((gtY~t+1,gt𝖸~t+1))\textup{PROJ}\big{(}(g_{t}\tilde{Y}_{t+1},g_{t}\tilde{\mathsf{Y}}_{t+1})\big{)}, which is a vector with entries given by the following (the analogue for the mathsf version also holds):

PROJ((gtY~t+1,gt𝖸~t+1))(k,v)\displaystyle\textup{PROJ}\big{(}(g_{t}\tilde{Y}_{t+1},g_{t}\tilde{\mathsf{Y}}_{t+1})\big{)}(k,v) =PROJ(gtY~t+1(k,v))\displaystyle=\mathrm{PROJ}(g_{t}\tilde{Y}_{t+1}(k,v))
=PROJ(W~v(t+1)(k)+ηk(t+1),gtD~v(t+1)) for vVBADt.\displaystyle=\textup{PROJ}(\tilde{W}^{(t+1)}_{v}(k)+\langle\eta^{(t+1)}_{k},g_{t}\tilde{D}^{(t+1)}_{v}\rangle)\mbox{ for }v\in V\setminus\mathrm{BAD}_{t}\,.

We also denote (3.62) as (gtY~t+1GAUS(gtY~t+1)gt𝖸~t+1GAUS(gt𝖸~t+1))\begin{pmatrix}g_{t}\tilde{Y}_{t+1}^{\diamond}-\mathrm{GAUS}(g_{t}\tilde{Y}_{t+1})&g_{t}\tilde{\mathsf{Y}}_{t+1}^{\diamond}-\mathrm{GAUS}(g_{t}\tilde{\mathsf{Y}}_{t+1})\end{pmatrix}. We will further use the notation PROJ(gtYt+1(k,v))\mathrm{PROJ}(g_{t}Y_{t+1}(k,v)) (note that there is no tilde here) to denote the projection obtained from PROJ(gtY~t+1(k,v))\mathrm{PROJ}(g_{t}\tilde{Y}_{t+1}(k,v)) by replacing each Zv,u,W~v(s)(k)\overrightarrow{Z}_{v,u},\tilde{W}^{(s)}_{v}(k) with Gv,u,Wv(s)(k)\overrightarrow{G}_{v,u},W^{(s)}_{v}(k) therein (and similarly for PROJ(gt𝖸t+1(k,𝗏))\mathrm{PROJ}(g_{t}\mathsf{Y}_{t+1}(k,\mathsf{v}))).

Denote 𝐐t=(I+𝐀t𝐁t𝐁tI+𝐂t)1\mathbf{Q}_{t}=\begin{pmatrix}\mathrm{I}+\mathbf{A}_{t}&\mathbf{B}_{t}\\ \mathbf{B}_{t}^{*}&\mathrm{I}+\mathbf{C}_{t}\end{pmatrix}^{-1}. We next control norms of these matrices.

Lemma 3.23.

On the event t𝒯t\mathcal{E}_{t}\cap\mathcal{T}_{t}, we have 𝐐top100\|\mathbf{Q}_{t}\|_{\mathrm{op}}\leq 100 if ρ^<0.1\hat{\rho}<0.1.

Lemma 3.24.

On the event t𝒯t\mathcal{E}_{t}\cap\mathcal{T}_{t}, we have 𝐐t=𝐐t1100Kt10ϑ2\|\mathbf{Q}_{t}\|_{\infty}=\|\mathbf{Q}_{t}\|_{1}\leq 100K_{t}^{10}\vartheta^{-2}.

Similar versions of Lemmas 3.23 and 3.24 were proved in [17, Lemmas 3.13 and 3.15], and those proofs can be adapted easily. Indeed, by the proofs in [17], in order to bound the operator norm it suffices to use the fact that ηk(s),gtD~v(s)span{Zu,w},ηk(s),gt𝖣~𝗏(s)span{𝖹𝗎,𝗐}\langle\eta^{(s)}_{k},g_{t}\tilde{D}^{(s)}_{v}\rangle\in\mathrm{span}\{\overrightarrow{Z}_{u,w}\},\langle\eta^{(s)}_{k},g_{t}\tilde{\mathsf{D}}^{(s)}_{\mathsf{v}}\rangle\in\mathrm{span}\{\overrightarrow{\mathsf{Z}}_{\mathsf{u,w}}\}. Also, in order to bound the \infty-norm, it suffices to show that the operator norm is bounded by a constant and that 𝐐t1((s,k,v);(s,k,𝗎))=O(Ktϑ1)\mathbf{Q}^{-1}_{t}((s,k,v);(s^{\prime},k^{\prime},\mathsf{u}))=O(K_{t}\vartheta^{-1}) when 𝗎π(v)\mathsf{u}\neq\pi(v). All of these can be easily checked and we thus omit further details.
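To illustrate the flavor of Lemma 3.23 (though not its actual proof), one can check numerically that a symmetric matrix of the above block form, with diagonal blocks close to 2I2\mathrm{I} and small off-diagonal entries, has an inverse of uniformly bounded operator norm. A minimal sketch in which all sizes and perturbation levels are hypothetical stand-ins:

import numpy as np

rng = np.random.default_rng(1)
n = 200                      # hypothetical block dimension
eps = 0.01                   # stand-in for the K_t * Delta_t perturbation scale
rho = 0.05                   # stand-in for the cross-correlation (rho-hat < 0.1)

A = eps * rng.uniform(-1, 1, (n, n)); A = (A + A.T) / 2
C = eps * rng.uniform(-1, 1, (n, n)); C = (C + C.T) / 2
B = rho * np.eye(n) + eps * rng.uniform(-1, 1, (n, n))

# the block matrix (I + A_t, B_t; B_t^*, I + C_t), with A_t, C_t having diagonal near 1
M = np.block([[2 * np.eye(n) + A, B],
              [B.T, 2 * np.eye(n) + C]])
print(np.linalg.norm(np.linalg.inv(M), 2))   # O(1), far below the bound 100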

Lemma 3.25.

On the event t𝒯t1\mathcal{E}_{t}\cap\mathcal{T}_{t-1}, we have

𝐇tHS2nKt4Δt2,𝐇t,𝐇t1Kt3ϑ and 𝐇top2Kt3.\|\mathbf{H}_{t}\|_{\mathrm{HS}}^{2}\leq nK_{t}^{4}\Delta_{t}^{2}\,,\quad\|\mathbf{H}_{t}\|_{\infty},\|\mathbf{H}_{t}\|_{1}\leq\frac{K^{3}_{t}}{\sqrt{\vartheta}}\,\mbox{ and }\,\|\mathbf{H}_{t}\|_{\mathrm{op}}\leq 2K_{t}^{3}\,.
Proof.

Recall (3.63). By (3.53), (3.54), (3.55) and (3.56) we get 𝐇t((k,v);(s,l,u))=0\mathbf{H}_{t}((k,v);(s,l,u))=0 for uvu\neq v and that 𝐇t((k,v);(s,l,v)),𝐇t((k,v);(s,l,π(v)))=O(KtΔt)\mathbf{H}_{t}((k,v);(s,l,v)),\mathbf{H}_{t}((k,v);(s,l,\pi(v)))=O(K_{t}\Delta_{t}); similar results hold for 𝐇t((k,𝗏);(s,l,𝗎))\mathbf{H}_{t}((k,\mathsf{v});(s,l,\mathsf{u})). In addition, for π(v)𝗎\pi(v)\neq\mathsf{u} we have that 𝐇t((k,v);(s,l,𝗎))\mathbf{H}_{t}((k,v);(s,l,\mathsf{u})) is equal to

𝔼^[i=1Ktj=1Ksw1,w2ηk(t)(i)ηl(s)(j)(𝟏w1Γi(t)𝔞t)(𝟏π(w2)Πj(s)𝔞s)Zv,w1𝖹𝗎,π(w2)nq^(1q^)(𝔞t𝔞t2)(𝔞s𝔞s2)]\displaystyle\hat{\mathbb{E}}\Bigg{[}\frac{\sum_{i=1}^{K_{t}}\sum_{j=1}^{K_{s}}\sum_{w_{1},w_{2}}\eta^{(t)}_{k}(i)\eta^{(s)}_{l}(j)(\mathbf{1}_{w_{1}\in\Gamma^{(t)}_{i}}-\mathfrak{a}_{t})(\mathbf{1}_{\pi(w_{2})\in\Pi^{(s)}_{j}}-\mathfrak{a}_{s})\overrightarrow{Z}_{v,w_{1}}\overrightarrow{\mathsf{Z}}_{\mathsf{u},\pi(w_{2})}}{n\hat{q}(1-\hat{q})\sqrt{(\mathfrak{a}_{t}-\mathfrak{a}_{t}^{2})(\mathfrak{a}_{s}-\mathfrak{a}_{s}^{2})}}\Bigg{]}
=O(Kt2n𝔞t𝔞s(𝟏π(v)jΠj(s)𝔞s)(𝟏uiΓi(t)𝔞t))\displaystyle=O\Big{(}\frac{K^{2}_{t}}{n\sqrt{\mathfrak{a}_{t}\mathfrak{a}_{s}}}(\mathbf{1}_{\pi(v)\in\cup_{j}\Pi^{(s)}_{j}}-\mathfrak{a}_{s})(\mathbf{1}_{u\in\cup_{i}\Gamma^{(t)}_{i}}-\mathfrak{a}_{t})\Big{)}

and a similar bound applies to (𝗎,v)(\mathsf{u},v) with vV,𝗎𝖵v\in V,\mathsf{u}\in\mathsf{V} and 𝗎π(v)\mathsf{u}\neq\pi(v). Combined with Items (i) and (iv) in Definition 3.1 for t\mathcal{E}_{t}, this yields that 𝐇tHS2nKt4Δt2\|\mathbf{H}_{t}\|_{\mathrm{HS}}^{2}\leq nK_{t}^{4}\Delta_{t}^{2} and 𝐇t,𝐇t1Kt3ϑ\|\mathbf{H}_{t}\|_{\infty},\|\mathbf{H}_{t}\|_{1}\leq\frac{K^{3}_{t}}{\sqrt{\vartheta}}.

We next bound 𝐇top\|\mathbf{H}_{t}\|_{\mathrm{op}}. Applying Lemma 3.8 by setting δ=KtΔt\delta=K_{t}\Delta_{t}, C2=Kt6C^{2}=K_{t}^{6} and v=𝒥v={(s,k,v),(s,k,π(v)):0st,1kKs/12}\mathcal{I}_{v}=\mathcal{J}_{v}=\{(s,k,v),(s,k,\pi(v)):0\leq s\leq t,1\leq k\leq K_{s}/12\}, we can then derive that 𝐇topKt3+4Kt2Δt2Kt3\|\mathbf{H}_{t}\|_{\mathrm{op}}\leq K_{t}^{3}+4K_{t}^{2}\Delta_{t}\leq 2K_{t}^{3}. ∎

Remark 3.22 provides an explicit expression for the conditional law of W~v(t+1)(k)+ηk(t+1),gtD~v(t+1)\tilde{W}^{(t+1)}_{v}(k)+\langle\eta^{(t+1)}_{k},g_{t}\tilde{D}^{(t+1)}_{v}\rangle. However, the projection PROJ(gtY~t+1(k,v))\mathrm{PROJ}(g_{t}\tilde{Y}_{t+1}(k,v)) is not easy to deal with since the expression of every variable in 𝔉t\mathfrak{F}_{t} depends on BADt\mathrm{BAD}_{t} (even for those indexed by s<ts<t). A more tractable “projection” is the projection of W~v(t+1)(k)+ηk(t+1),gtD~v(t+1)\tilde{W}^{(t+1)}_{v}(k)+\langle\eta^{(t+1)}_{k},g_{t}\tilde{D}^{(t+1)}_{v}\rangle onto t\mathcal{F}^{\prime}_{t} for t=σ(𝔉t)\mathcal{F}^{\prime}_{t}=\sigma(\mathfrak{F}^{\prime}_{t}) where

𝔉t={W~u(s)(l)+ηl(s),gs1D~u(s)𝖶~π(u)(s)(l)+ηl(s),gs1𝖣~π(u)(s):0st,1lKs12,uBADt}.\displaystyle\mathfrak{F}^{\prime}_{t}=\Bigg{\{}\begin{split}&\tilde{W}^{(s)}_{u}(l)+\langle\eta^{(s)}_{l},g_{s-1}\tilde{D}^{(s)}_{u}\rangle\\ &\tilde{\mathsf{W}}^{(s)}_{\pi(u)}(l)+\langle\eta^{(s)}_{l},g_{s-1}\tilde{\mathsf{D}}^{(s)}_{\pi(u)}\rangle\end{split}:0\leq s\leq t,1\leq l\leq\frac{K_{s}}{12},u\not\in\mathrm{BAD}_{t}\Bigg{\}}. (3.64)

We can similarly show that 𝔼[(gtY~t+1,gt𝖸~t+1)|t]\mathbb{E}[(g_{t}\tilde{Y}_{t+1},g_{t}\tilde{\mathsf{Y}}_{t+1})|\mathcal{F}^{\prime}_{t}] (recall that the conditional expectation is the same as the projection) has the form

(PROJ(gtY~t+1)PROJ(gt𝖸~t+1))=([gY~]t[g𝖸~]t)𝐏t𝐉t+1,\displaystyle\begin{pmatrix}{\mathrm{PROJ}}^{\prime}(g_{t}\tilde{Y}_{t+1})&{\mathrm{PROJ}}^{\prime}(g_{t}\tilde{\mathsf{Y}}_{t+1})\end{pmatrix}=\begin{pmatrix}[g\tilde{Y}]_{t}&[g\tilde{\mathsf{Y}}]_{t}\end{pmatrix}\mathbf{P}_{t}\mathbf{J}_{t+1}^{*}\,, (3.65)

where [gY~]t(s,l,u)=W~u(s)(l)+ηl(s),gs1D~u(s)[g\tilde{Y}]_{t}(s,l,u)=\tilde{W}^{(s)}_{u}(l)+\langle\eta^{(s)}_{l},g_{s-1}\tilde{D}^{(s)}_{u}\rangle, PROJ(gtY~t+1)(k,v)=PROJ(gtY~t+1(k,v)){\mathrm{PROJ}}^{\prime}(g_{t}\tilde{Y}_{t+1})(k,v)={\mathrm{PROJ}}^{\prime}(g_{t}\tilde{Y}_{t+1}(k,v)) is the projection of W~v(t+1)(k)+ηk(t+1),gtD~v(t+1)\tilde{W}^{(t+1)}_{v}(k)+\langle\eta^{(t+1)}_{k},g_{t}\tilde{D}^{(t+1)}_{v}\rangle onto t\mathcal{F}^{\prime}_{t}, 𝐉t+1\mathbf{J}_{t+1} is defined by

𝐉t+1((k,v),(s,l,u))=𝔼^[ηk(t+1),gtD~v(t+1)ηl(s),gs1D~u(s)] for u,vBADt,\mathbf{J}_{t+1}((k,v),(s,l,u))=\hat{\mathbb{E}}[\langle\eta^{(t+1)}_{k},g_{t}\tilde{D}^{(t+1)}_{v}\rangle\langle\eta^{(s)}_{l},g_{s-1}\tilde{D}^{(s)}_{u}\rangle]\mbox{ for }u,v\not\in\mathrm{BAD}_{t}\,, (3.66)

and 𝐏t1\mathbf{P}_{t}^{-1} is defined to be the covariance matrix of 𝔉t\mathfrak{F}^{\prime}_{t}. Adapting the proofs of Lemmas 3.23, 3.24 and 3.25, we can show that under t+1𝒯t\mathcal{E}_{t+1}\cap\mathcal{T}_{t}, we have 𝐉t+1op2Kt3\|\mathbf{J}_{t+1}\|_{\mathrm{op}}\leq 2K_{t}^{3} and 𝐏top100\|\mathbf{P}_{t}\|_{\mathrm{op}}\leq 100.

Similarly, we denote by PROJ(gtYt+1){\mathrm{PROJ}}^{\prime}(g_{t}Y_{t+1}) the vector obtained by replacing the substituted Gaussian entries with the original Bernoulli variables in the projection PROJ(gtY~t+1){\mathrm{PROJ}}^{\prime}(g_{t}\tilde{Y}_{t+1}) (i.e., on the right-hand side of (3.65)). The projection PROJ(gtYt+1)\mathrm{PROJ}^{\prime}(g_{t}Y_{t+1}) is more tractable than PROJ(gtYt+1)\mathrm{PROJ}(g_{t}Y_{t+1}) (recall its definition in Remark 3.22) since Wv(s)(k)+ηk(s),gs1Dv(s)W_{v}^{(s)}(k)+\langle\eta^{(s)}_{k},g_{s-1}D^{(s)}_{v}\rangle is measurable with respect to 𝔖s\mathfrak{S}_{s} (recall (3.6)) and thus we can use induction to control PROJ\mathrm{PROJ}^{\prime}. Another advantage is that the matrix 𝐏t\mathbf{P}_{t} is measurable with respect to 𝔖t1\mathfrak{S}_{t-1}.

Lemma 3.26.

On the event t+1𝒯t\mathcal{E}_{t+1}\cap\mathcal{T}_{t}, we have

vBADt+1|PROJ(gtYt+1(k,v))PROJ(gtYt+1(k,v))|2nΔt+12+Δt+12([gY]t[g𝖸]t)2.\displaystyle\sum_{v\not\in\mathrm{BAD}_{t+1}}\big{|}\mathrm{PROJ}(g_{t}Y_{t+1}(k,v))-{\mathrm{PROJ}}^{\prime}(g_{t}Y_{t+1}(k,v))\big{|}^{2}\leq n\Delta_{t+1}^{2}+\Delta_{t+1}^{2}\big{\|}\begin{pmatrix}[gY]_{t}&[g\mathsf{Y}]_{t}\end{pmatrix}\big{\|}^{2}\,.

A similar result holds for PROJ(gt𝖸t+1)PROJ(gt𝖸t+1)\mathrm{PROJ}(g_{t}\mathsf{Y}_{t+1})-{\mathrm{PROJ}}^{\prime}(g_{t}\mathsf{Y}_{t+1}).

Proof.

By the elementary inequality (a+b)22a2+2b2(a+b)^{2}\leq 2a^{2}+2b^{2}, the left-hand side in the lemma statement can be written and bounded as

vBADt+1((gt[Y]tgt[𝖸]t)𝐐t𝐇t+1(k,v)([gY]t[g𝖸]t)𝐏t𝐉t+1(k,v))2\displaystyle\sum_{v\not\in\mathrm{BAD}_{t+1}}\Big{(}\begin{pmatrix}g_{t}[Y]_{t}&g_{t}[\mathsf{Y}]_{t}\end{pmatrix}\mathbf{Q}_{t}\mathbf{H}_{t+1}^{*}(k,v)-\begin{pmatrix}[gY]_{t}&[g\mathsf{Y}]_{t}\end{pmatrix}\mathbf{P}_{t}\mathbf{J}_{t+1}^{*}(k,v)\Big{)}^{2}
\displaystyle\leq\ 2vBADt+1(([gY]tgt[Y]t[g𝖸]tgt[𝖸]t)𝐐t𝐇t+1(k,v))2\displaystyle 2\sum_{v\not\in\mathrm{BAD}_{t+1}}\Big{(}\begin{pmatrix}[gY]_{t}-g_{t}[Y]_{t}&[g\mathsf{Y}]_{t}-g_{t}[\mathsf{Y}]_{t}\end{pmatrix}\mathbf{Q}_{t}\mathbf{H}_{t+1}^{*}(k,v)\Big{)}^{2} (3.67)
+\displaystyle+\ 2([gY]t[g𝖸]t)(𝐏t𝐉t+1𝐐t𝐇t+1)2.\displaystyle 2\big{\|}\begin{pmatrix}[gY]_{t}&[g\mathsf{Y}]_{t}\end{pmatrix}(\mathbf{P}_{t}\mathbf{J}_{t+1}^{*}-\mathbf{Q}_{t}\mathbf{H}_{t+1}^{*})\big{\|}^{2}\,. (3.68)

By (3.14), we have

(3.67)=vBADt+1(([gY]tgt[Y]t[g𝖸]tgt[𝖸]t)𝐐t(Ht+1,k,v𝖧t+1,k,v))2nΔt+12.\displaystyle\eqref{equ-dif-proj-part-I}=\sum_{v\not\in\mathrm{BAD}_{t+1}}\Big{(}\begin{pmatrix}[gY]_{t}-g_{t}[Y]_{t}&[g\mathsf{Y}]_{t}-g_{t}[\mathsf{Y}]_{t}\end{pmatrix}\mathbf{Q}_{t}\begin{pmatrix}H_{t+1,k,v}&\mathsf{H}_{t+1,k,v}\end{pmatrix}^{*}\Big{)}^{2}\leq n\Delta_{t+1}^{2}\,.

It remains to bound (3.68). Note that for u,vVu,v\in V and τu{u,π(u)}\tau_{u}\in\{u,\pi(u)\} and τv{v,π(v)}\tau_{v}\in\{v,\pi(v)\}

(𝐉t+1𝐇t+1)((k,τv);(s,l,τu))={0,vu,u,vBADt;O(1n),vu,vBADt or uBADt;O(|BADt|/n),v=u.\displaystyle\big{(}\mathbf{J}_{t+1}-\mathbf{H}_{t+1}\big{)}((k,\tau_{v});(s,l,\tau_{u}))=\begin{cases}0,&v\neq u,u,v\not\in\mathrm{BAD}_{t};\\ O(\frac{1}{n}),&v\neq u,v\in\mathrm{BAD}_{t}\mbox{ or }u\in\mathrm{BAD}_{t};\\ O(|\mathrm{BAD}_{t}|/n),&v=u\,.\end{cases}

As in the proof of Lemma 3.25, we can choose v=𝒥v={(s,k,v),(s,k,π(v)):0st,1kKs/12}\mathcal{I}_{v}=\mathcal{J}_{v}=\{(s,k,v),(s,k,\pi(v)):0\leq s\leq t,1\leq k\leq K_{s}/12\}, δ=|BADt|n\delta=\frac{|\mathrm{BAD}_{t}|}{n} and C2=|BADt|nΔt10C^{2}=\frac{|\mathrm{BAD}_{t}|}{n}\ll\Delta_{t}^{10} (since we are on the event 𝒯t\mathcal{T}_{t}). We can then apply Lemma 3.8 and get that 𝐉t+1𝐇t+1op2Δt+15\|\mathbf{J}_{t+1}-\mathbf{H}_{t+1}\|_{\mathrm{op}}\ll 2\Delta_{t+1}^{5}. Again by applying Lemma 3.8 for such v,𝒥v,δ\mathcal{I}_{v},\mathcal{J}_{v},\delta and C2C^{2}, we get 𝐐t1𝐏t1op2Δt+15\|\mathbf{Q}_{t}^{-1}-\mathbf{P}_{t}^{-1}\|_{\mathrm{op}}\ll 2\Delta_{t+1}^{5}. Thus,

𝐐t𝐏top𝐐top𝐐t1𝐏t1op𝐏top1002Δt+15.\displaystyle\|\mathbf{Q}_{t}-\mathbf{P}_{t}\|_{\mathrm{op}}\leq\|\mathbf{Q}_{t}\|_{\mathrm{op}}\|\mathbf{Q}_{t}^{-1}-\mathbf{P}_{t}^{-1}\|_{\mathrm{op}}\|\mathbf{P}_{t}\|_{\mathrm{op}}\leq 100^{2}\Delta_{t+1}^{5}\,.

Since in addition 𝐉t+1,𝐇t+1,𝐏t,𝐐t\mathbf{J}_{t+1},\mathbf{H}_{t+1},\mathbf{P}_{t},\mathbf{Q}_{t} have operator norms bounded by O(Kt+13)O(K^{3}_{t+1}), we get that

𝐏t𝐉t+1𝐐t𝐇t+1op𝐏t𝐐top𝐉t+1op+𝐉t+1𝐇t+1op𝐐topΔt+12.\displaystyle\|\mathbf{P}_{t}\mathbf{J}_{t+1}^{*}-\mathbf{Q}_{t}\mathbf{H}_{t+1}^{*}\|_{\mathrm{op}}\leq\|\mathbf{P}_{t}-\mathbf{Q}_{t}\|_{\mathrm{op}}\|\mathbf{J}_{t+1}\|_{\mathrm{op}}+\|\mathbf{J}_{t+1}-\mathbf{H}_{t+1}\|_{\mathrm{op}}\|\mathbf{Q}_{t}\|_{\mathrm{op}}\leq\Delta_{t+1}^{2}\,.

This implies that (3.68)Δt+12([gY]t2+[g𝖸]t2)\eqref{equ-dif-proj-part-II}\leq\Delta_{t+1}^{2}(\|[gY]_{t}\|^{2}+\|[g\mathsf{Y}]_{t}\|^{2}), as required. ∎

3.6 Proof of Proposition 3.2

In this subsection, we prove Proposition 3.2 by induction on tt. Recall the definition of t\mathcal{E}_{t} (see (3.3)) and 𝒯t\mathcal{T}_{t} (see (3.16)). For a given realization (Bt,Bt1)(\mathrm{B}_{t},\mathrm{B}_{t-1}) for (BADt,BADt1)(\mathrm{BAD}_{t},\mathrm{BAD}_{t-1}), define vectors W~t\tilde{W}_{t} and 𝖶~t\tilde{\mathsf{W}}_{t} where W~t(k,v)=W~v(t)(k)\tilde{W}_{t}(k,v)=\tilde{W}^{(t)}_{v}(k) and 𝖶~t(k,π(v))=𝖶~π(v)(t)(k)\tilde{\mathsf{W}}_{t}(k,\pi(v))=\tilde{\mathsf{W}}^{(t)}_{\pi(v)}(k) for vBtv\not\in\mathrm{B}_{t}. We recall gt1Y~sg_{t-1}\tilde{Y}_{s} and define gt1Ysg_{t-1}{Y}_{s} as follows:

gt1Y~s(k,v)=W~v(s)(k)+ηk(s),gt1D~v(s),gt1Ys(k,v)=Wv(s)(k)+ηk(s),gt1Dv(s),\displaystyle g_{t-1}\tilde{Y}_{s}(k,v)=\tilde{W}^{(s)}_{v}(k)+\langle\eta^{(s)}_{k},g_{t-1}\tilde{D}^{(s)}_{v}\rangle,\quad g_{t-1}{Y}_{s}(k,v)=W^{(s)}_{v}(k)+\langle\eta^{(s)}_{k},g_{t-1}{D}^{(s)}_{v}\rangle\,,
gt1𝖸~s(k,𝗏)=𝖶~𝗏(s)(k)+ηk(s),gt1𝖣~𝗏(s),gt1𝖸s(k,𝗏)=𝖶𝗏(s)(k)+ηk(s),gt1𝖣𝗏(s),\displaystyle g_{t-1}\tilde{\mathsf{Y}}_{s}(k,\mathsf{v})=\tilde{\mathsf{W}}^{(s)}_{\mathsf{v}}(k)+\langle\eta^{(s)}_{k},g_{t-1}\tilde{\mathsf{D}}^{(s)}_{\mathsf{v}}\rangle,\quad g_{t-1}{\mathsf{Y}}_{s}(k,\mathsf{v})=\mathsf{W}^{(s)}_{\mathsf{v}}(k)+\langle\eta^{(s)}_{k},g_{t-1}\mathsf{D}^{(s)}_{\mathsf{v}}\rangle\,,

where 0st,1kKs120\leq s\leq t,1\leq k\leq\frac{K_{s}}{12}, and v,π1(𝗏)Bt1v,\pi^{-1}(\mathsf{v})\not\in\mathrm{B}_{t-1} when s<ts<t, as well as v,π1(𝗏)Btv,\pi^{-1}(\mathsf{v})\not\in\mathrm{B}_{t} when s=ts=t. In what follows, we will use xt={xk,v(t)}x_{t}=\{x^{(t)}_{k,v}\} and 𝗑t={𝗑k,𝗏(t)}\mathsf{x}_{t}=\{\mathsf{x}^{(t)}_{k,\mathsf{v}}\} to denote realizations of gt1Ytg_{t-1}Y_{t} and gt1𝖸tg_{t-1}\mathsf{Y}_{t}, respectively. In addition, we define a mean-zero Gaussian process

{ηk(s),Dˇv(s),ηk(s),𝖣ˇπ(v)(s):0st,1kKs12,vV}\big{\{}\langle\eta^{(s)}_{k},\check{D}^{(s)}_{v}\rangle,\langle\eta^{(s)}_{k},\check{\mathsf{D}}^{(s)}_{\pi(v)}\rangle:0\leq s\leq t,1\leq k\leq\tfrac{K_{s}}{12},v\in V\big{\}} (3.69)

where each variable has variance 1 and the only non-zero covariance is given by

𝔼[ηk(s),Dˇv(s)ηk(s),𝖣ˇπ(v)(s)]=ρ^ηk(s)Ψ(s)(ηk(s)) for 0st,1kKs12 and vV.\displaystyle\mathbb{E}[\langle\eta^{(s)}_{k},\check{D}^{(s)}_{v}\rangle\langle\eta^{(s)}_{k},\check{\mathsf{D}}^{(s)}_{\pi(v)}\rangle]=\hat{\rho}\eta^{(s)}_{k}\Psi^{(s)}(\eta^{(s)}_{k})^{*}\mbox{ for }0\leq s\leq t,1\leq k\leq\tfrac{K_{s}}{12}\mbox{ and }v\in V\,.
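For concreteness, the comparison process (3.69) is straightforward to sample: pairs indexed by distinct (s,k,v)(s,k,v) are independent, and each pair is a bivariate Gaussian with unit variances and the prescribed covariance. A minimal sampling sketch, with a scalar rr standing in for ρ^ηk(s)Ψ(s)(ηk(s))\hat{\rho}\eta^{(s)}_{k}\Psi^{(s)}(\eta^{(s)}_{k})^{*} (hypothetical values throughout):

import numpy as np

rng = np.random.default_rng(2)
n_pairs = 1000               # hypothetical number of (s, k, v) triples
r = 0.05                     # stand-in for rho-hat * eta Psi eta^*

# each pair is bivariate normal with unit variances and covariance r;
# distinct pairs are independent
L = np.linalg.cholesky(np.array([[1.0, r], [r, 1.0]]))
pairs = rng.standard_normal((n_pairs, 2)) @ L.T

print(pairs.std(axis=0))             # both entries close to 1
print(np.corrcoef(pairs.T)[0, 1])    # close to r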

We introduce (3.69) since we will eventually show that it is a good approximation of our actual process (see the definition of t\mathcal{B}_{t} below). For v,π1(𝗏)Btv,\pi^{-1}(\mathsf{v})\not\in\mathrm{B}_{t}, we also define

σk(t),Dˇv(t)=12Ktj=1Kt12βk(t)(j)ηj(t),Dˇv(t) and Yˇt(k,v)=W~v(t)(k)+ηk(t),Dˇv(t),\displaystyle\langle\sigma^{(t)}_{k},\check{D}^{(t)}_{v}\rangle=\sqrt{\frac{12}{K_{t}}}\sum_{j=1}^{\frac{K_{t}}{12}}\beta^{(t)}_{k}(j)\langle\eta^{(t)}_{j},\check{D}^{(t)}_{v}\rangle\mbox{ and }\check{Y}_{t}(k,v)=\tilde{W}^{(t)}_{v}(k)+\langle\eta^{(t)}_{k},\check{D}^{(t)}_{v}\rangle\,,

and we make analogous definitions for the mathsf version for 𝗏\mathsf{v}. For t0t\geq 0, we define

𝒜t=\displaystyle\mathcal{A}_{t}= {vVBADt|gt1Yt(k,v)|2+|gt1𝖸t(k,π(v))|2100n for all k},\displaystyle\Big{\{}\sum_{v\in V\setminus\mathrm{BAD}_{t}}|g_{t-1}Y_{t}(k,v)|^{2}+|g_{t-1}\mathsf{Y}_{t}(k,\pi(v))|^{2}\leq 100n\mbox{ for all }k\Big{\}}\,,
t=\displaystyle\mathcal{B}_{t}= {(gt1Yt,gt1𝖸t){(xt,𝗑t):p{gt1Yt,gt1𝖸t|𝔖t1;BADt}(xt,𝗑t)p{Yˇt,𝖸ˇt}(xt,𝗑t)exp{nKt30Δt2}}},\displaystyle\Big{\{}(g_{t-1}Y_{t},g_{t-1}\mathsf{Y}_{t})\in\Big{\{}(x_{t},\mathsf{x}_{t}):\frac{p_{\{g_{t-1}Y_{t},g_{t-1}\mathsf{Y}_{t}|\mathfrak{S}_{t-1};\mathrm{BAD}_{t}\}}(x_{t},\mathsf{x}_{t})}{p_{\{\check{Y}_{t},\check{\mathsf{Y}}_{t}\}}(x_{t},\mathsf{x}_{t})}\leq\exp\{nK_{t}^{30}\Delta_{t}^{2}\}\Big{\}}\Big{\}}\,,
t=\displaystyle\mathcal{H}_{t}= {vVBADt|PROJ(gt1Yt(k,v))|2+|PROJ(gt1𝖸t(k,π(v)))|2nKt6Δt2 for all k}.\displaystyle\Big{\{}\sum_{v\in V\setminus\mathrm{BAD}_{t}}\big{|}\textup{PROJ}(g_{t-1}Y_{t}(k,v))\big{|}^{2}+\big{|}\textup{PROJ}(g_{t-1}\mathsf{Y}_{t}(k,\pi(v)))\big{|}^{2}\leq nK_{t}^{6}\Delta^{2}_{t}\mbox{ for all }k\Big{\}}\,.

Note that 0\mathcal{H}_{0} obviously holds. For notational convenience in the induction, we also take 𝒜1\mathcal{A}_{-1} and 1\mathcal{B}_{-1} to be the whole space. Also, by Lemma 3.13 𝒯1\mathcal{T}_{-1} holds with probability 1o(1)1-o(1) since BAD1=REV\mathrm{BAD}_{-1}=\mathrm{REV}. In addition, we have (0)=1o(1)\mathbb{P}(\mathcal{E}_{0})=1-o(1) by Lemma 3.14. With these clarified, our inductive proof consists of the following steps:

Step 1. If 𝒯t1\mathcal{T}_{t-1} holds for 0tt10\leq t\leq t^{*}-1, then 𝒯t\mathcal{T}_{t} holds with probability 1o(1)1-o(1);

Step 2. If 𝒜t1,t1,𝒯t,t,t\mathcal{A}_{t-1},\mathcal{B}_{t-1},\mathcal{T}_{t},\mathcal{E}_{t},\mathcal{H}_{t} hold for 0tt10\leq t\leq t^{*}-1, then t\mathcal{B}_{t} holds with probability 1o(1)1-o(1);

Step 3. If t\mathcal{B}_{t} holds for 0tt10\leq t\leq t^{*}-1, then 𝒜t\mathcal{A}_{t} holds with probability 1o(1)1-o(1);

Step 4. If 𝒜t,t,t,t,𝒯t\mathcal{A}_{t},\mathcal{B}_{t},\mathcal{E}_{t},\mathcal{H}_{t},\mathcal{T}_{t} hold for 0tt10\leq t\leq t^{*}-1, then t+1\mathcal{E}_{t+1} holds with probability 1o(1)1-o(1);

Step 5. If 𝒜t,t,t,𝒯t,t+1\mathcal{A}_{t},\mathcal{B}_{t},\mathcal{H}_{t},\mathcal{T}_{t},\mathcal{E}_{t+1} hold for 0tt10\leq t\leq t^{*}-1, then t+1\mathcal{H}_{t+1} holds with probability 1o(1)1-o(1).

3.6.1 Step 1: 𝒯t\mathcal{T}_{t}

In what follows, we assume 𝒯t1\mathcal{T}_{t-1} holds without further notice. As we will see, the philosophy of our proof throughout this subsection is to first consider an arbitrarily fixed realization of, e.g., {Γk(r),Πk(r):0rt},BADt1\{\Gamma^{(r)}_{k},\Pi^{(r)}_{k}:0\leq r\leq t\},\mathrm{BAD}_{t-1} and BIASD,t,s,l\mathrm{BIAS}_{D,t,s,l}, and to then prove a bound on the tail probability of some “bad” event. We emphasize that we will not compute the conditional probability, as that would be difficult to implement; instead we compute the probability (which we denote by ^\hat{\mathbb{P}}) by simply treating {Γk(r),Πk(r):0rt,1kKr},BADt1\{\Gamma^{(r)}_{k},\Pi^{(r)}_{k}:0\leq r\leq t,1\leq k\leq K_{r}\},\mathrm{BAD}_{t-1} and BIASD,t,s,l\mathrm{BIAS}_{D,t,s,l} as deterministic objects. Formally, we define the operation 𝔼^\hat{\mathbb{E}} as follows: for any function hh (of the form h(Γ,Π,BADt1,G,𝖦,W,𝖶)h(\Gamma,\Pi,\mathrm{BAD}_{t-1},\overrightarrow{G},\overrightarrow{\mathsf{G}},W,\mathsf{W})) and any realization Ξ,B\Xi,\mathrm{B} for {Γ,Π}\{\Gamma,\Pi\} and BADt1\mathrm{BAD}_{t-1}, define

f(Ξ,B)=𝔼{G,𝖦,W,𝖶}[h(Ξ,B,G,𝖦,W,𝖶)].f(\Xi,\mathrm{B})=\mathbb{E}_{\{\overrightarrow{G},\overrightarrow{\mathsf{G}},W,\mathsf{W}\}}\big{[}h(\Xi,\mathrm{B},\overrightarrow{G},\overrightarrow{\mathsf{G}},W,\mathsf{W})\big{]}\,.

Then the operator 𝔼^\hat{\mathbb{E}} is defined such that

𝔼^[h(Γ,Π,BADt1,G,𝖦,W,𝖶)]=f(Γ,Π,BADt1).\hat{\mathbb{E}}\big{[}h(\Gamma,\Pi,\mathrm{BAD}_{t-1},\overrightarrow{G},\overrightarrow{\mathsf{G}},W,\mathsf{W})\big{]}=f(\Gamma,\Pi,\mathrm{BAD}_{t-1})\,.

Note that this definition of 𝔼^\hat{\mathbb{E}} is consistent with that in [17]. It is also consistent with (3.52), except that in the special case considered in (3.52) simplifications were applied thanks to (3.50).

Provided with 𝔼^\hat{\mathbb{E}}, we can now precisely define ^(A)=𝔼^[𝟏A]\hat{\mathbb{P}}(A)=\hat{\mathbb{E}}[\mathbf{1}_{A}] for any event AA. After bounding the ^\hat{\mathbb{P}}-probability, we apply a union bound over all possible realizations, which then justifies that the bad event indeed typically will not occur; this union bound is necessary exactly because what we have computed earlier is not the conditional probability. The key to our success is that the tail probability is so small that we can afford a union bound.

Lemma 3.27.

We have

(|BIASt|8ϑ1Kt4(|BADt1|+n/q^(1q^))e20(loglogn)10)1o(enKt).\displaystyle\mathbb{P}\Big{(}|\mathrm{BIAS}_{t}|\leq 8\vartheta^{-1}K^{4}_{t}\big{(}|\mathrm{BAD}_{t-1}|+\sqrt{n/\hat{q}(1-\hat{q})}\big{)}e^{20(\log\log n)^{10}}\Big{)}\geq 1-o(e^{-nK_{t}})\,.
Proof.

Recall the definition of BIASD,t,s,k\mathrm{BIAS}_{D,t,s,k} as in (3.7). We first consider an arbitrarily fixed realization of {Γk(r),Πk(r):0rt,1kKr}\big{\{}\Gamma^{(r)}_{k},\Pi^{(r)}_{k}:0\leq r\leq t,1\leq k\leq K_{r}\big{\}} and BADt1\mathrm{BAD}_{t-1}. Since the events vBIASD,t,s,kv\in\mathrm{BIAS}_{D,t,s,k} over vBADt1v\not\in\mathrm{BAD}_{t-1} are independent of each other and by Lemma 3.3 each such event occurs with probability at most 2exp{nϑq^(1q^)e20(loglogn)102(|BADt1|q^(1q^)+nq^(1q^))}2\exp\Big{\{}-\frac{n\vartheta\hat{q}(1-\hat{q})e^{-20(\log\log n)^{10}}}{2(|\mathrm{BAD}_{t-1}|\hat{q}(1-\hat{q})+\sqrt{n\hat{q}(1-\hat{q})})}\Big{\}}, we can then apply Lemma 3.3 (again) and derive that

^(|BIASD,t,s,k|>4ϑ1Kt2(|BADt1|+n/q^(1q^))e20(loglogn)10)enKt2.\displaystyle\hat{\mathbb{P}}\Big{(}|\mathrm{BIAS}_{D,t,s,k}|>4\vartheta^{-1}K_{t}^{2}(|\mathrm{BAD}_{t-1}|+\sqrt{n/\hat{q}(1-\hat{q})})e^{20(\log\log n)^{10}}\Big{)}\leq e^{-nK^{2}_{t}}\,.

Clearly, a similar estimate holds for BIAS𝖣,t,s,k\mathrm{BIAS}_{\mathsf{D},t,s,k}. We next apply a union bound over all admissible realizations for {Γk(r),Πk(r):rt,1kKr},BADt1\{\Gamma^{(r)}_{k},\Pi^{(r)}_{k}:r\leq t,1\leq k\leq K_{r}\},\mathrm{BAD}_{t-1}. Since we need to choose at most 4Kt4K_{t} subsets of VV (or 𝖵\mathsf{V}), the enumeration is bounded by 24Ktn2^{4K_{t}n}. Therefore, applying a union bound over all these realizations and over s,ks,k, we obtain the desired estimate (note that 24KtnenKt2=o(enKt)2^{4K_{t}n}\cdot e^{-nK_{t}^{2}}=o(e^{-nK_{t}}) since Kt2K_{t}^{2} dominates 4Ktlog2+Kt4K_{t}\log 2+K_{t}) by recalling that BIASt=0st1kKsBIASD,t,s,kBIAS𝖣,t,s,k\mathrm{BIAS}_{t}=\cup_{0\leq s\leq t}\cup_{1\leq k\leq K_{s}}\mathrm{BIAS}_{D,t,s,k}\cup\mathrm{BIAS}_{\mathsf{D},t,s,k}. ∎

Lemma 3.28.

We have (|PRBt|𝙰)1o(enKt)\mathbb{P}(|\mathrm{PRB}_{t}|\leq\mathtt{A})\geq 1-o(e^{-nK_{t}}) where

𝙰=Kt20ϑ2Δt2(|BADt1|+n/q^(1q^)).\displaystyle\mathtt{A}=K_{t}^{20}\vartheta^{-2}\Delta_{t}^{-2}\big{(}|\mathrm{BAD}_{t-1}|+\sqrt{n/\hat{q}(1-\hat{q})}\big{)}\,.
Proof.

Recall (3.14). For each fixed admissible realization of {Γk(r),Πk(r):0rt,1kKr}\{\Gamma^{(r)}_{k},\Pi^{(r)}_{k}:0\leq r\leq t,1\leq k\leq K_{r}\} and a realization of BADt1\mathrm{BAD}_{t-1}, we have that the matrices 𝐐t1\mathbf{Q}_{t-1} and Ht,k,v,𝖧t,k,vH_{t,k,v},\mathsf{H}_{t,k,v} (and thus the matrix 𝐇t\mathbf{H}_{t}) are fixed. Define a vector χt,k\chi_{t,k} such that χt,k(k,v)=𝟏vPRBt,k\chi_{t,k}(k,v)=\mathbf{1}_{v\in\mathrm{PRB}_{t,k}} and χt,k(l,v)=0\chi_{t,k}(l,v)=0 for lkl\neq k for each vBADt1v\not\in\mathrm{BAD}_{t-1}. Then, we have

|([gY]t1gt1[Y]t10)𝐐t1𝐇tχt,k|>12Δtχt,k2=12Δt|PRBt|,\displaystyle\big{|}\begin{pmatrix}[gY]_{t-1}-g_{t-1}[Y]_{t-1}&0\end{pmatrix}\mathbf{Q}_{t-1}\mathbf{H}_{t}^{*}\chi^{*}_{t,k}\big{|}>\tfrac{1}{2}\Delta_{t}\|\chi_{t,k}\|^{2}=\tfrac{1}{2}\Delta_{t}|\mathrm{PRB}_{t}|\,, (3.70)

or we have a version of (3.70) with YY replaced by 𝖸\mathsf{Y}. Without loss of generality, in what follows we assume that (3.70) holds. For vuv\neq u, we have that

𝔼[([gY]t1(k,v)gt1[Y]t1(k,v))([gY]t1(k,u)gt1[Y]t1(k,u))]=0,\displaystyle\mathbb{E}[([gY]_{t-1}(k,v)-g_{t-1}[Y]_{t-1}(k,v))([gY]_{t-1}(k,u)-g_{t-1}[Y]_{t-1}(k,u))]=0\,,
𝔼[([gY]t1(k,v)gt1[Y]t1(k,v))([gY]t1(l,v)gt1[Y]t1(l,v))]Kt12|BADt1|/n.\displaystyle\mathbb{E}[([gY]_{t-1}(k,v)-g_{t-1}[Y]_{t-1}(k,v))([gY]_{t-1}(l,v)-g_{t-1}[Y]_{t-1}(l,v))]\leq K_{t-1}^{2}|\mathrm{BAD}_{t-1}|/n\,.

So the covariance matrix of ([gY]t1gt1[Y]t10)\begin{pmatrix}[gY]_{t-1}-g_{t-1}[Y]_{t-1}&0\end{pmatrix} (which we denote by 𝐑t1\mathbf{R}_{t-1}) is a block-diagonal matrix with each block of dimension at most Kt1K_{t-1} and with entries bounded by Kt12|BADt1|n\frac{K_{t-1}^{2}|\mathrm{BAD}_{t-1}|}{n}. As a result, 𝐑t1opKt14|BADt1|/n\|\mathbf{R}_{t-1}\|_{\mathrm{op}}\leq K_{t-1}^{4}|\mathrm{BAD}_{t-1}|/n. Thus, regarding χt,k\chi_{t,k} as a deterministic vector, we have that ([gY]t1gt1[Y]t10)𝐐t1𝐇tχt,k\begin{pmatrix}[gY]_{t-1}-g_{t-1}[Y]_{t-1}&0\end{pmatrix}\mathbf{Q}_{t-1}\mathbf{H}_{t}^{*}\chi^{*}_{t,k} is a linear combination of Gu,wq^\overrightarrow{G}_{u,w}-\hat{q}, with variance given by

χt,k𝐇t𝐐t1𝐑t1𝐐t1𝐇tχt,kχt,k2𝐑t1op𝐐t1op2𝐇top21nKt10|BADt1|χt,k2.\displaystyle\chi_{t,k}\mathbf{H}_{t}\mathbf{Q}_{t-1}\mathbf{R}_{t-1}\mathbf{Q}_{t-1}\mathbf{H}_{t}^{*}\chi_{t,k}^{*}\leq\|\chi_{t,k}\|^{2}\|\mathbf{R}_{t-1}\|_{\mathrm{op}}\|\mathbf{Q}_{t-1}\|^{2}_{\mathrm{op}}\|\mathbf{H}_{t}\|^{2}_{\mathrm{op}}\leq\frac{1}{n}K_{t}^{10}|\mathrm{BAD}_{t-1}|\|\chi_{t,k}\|^{2}\,.

In addition, the coefficient of each Gu,wq^\overrightarrow{G}_{u,w}-\hat{q} can be bounded as follows: for 0st1,1kKs12,vBADt10\leq s\leq t-1,1\leq k\leq\frac{K_{s}}{12},v\not\in\mathrm{BAD}_{t-1}, denoting by τu,w(s,k,v)\tau_{u,w}(s,k,v) the coefficient of Gu,wq^\overrightarrow{G}_{u,w}-\hat{q} in [gY]t1gt1[Y]t1[gY]_{t-1}-g_{t-1}[Y]_{t-1}, we have τu,w(s,k,v)=0\tau_{u,w}(s,k,v)=0 for v{u,w}v\not\in\{u,w\} and |τu,w(s,k,u)|,|τu,w(s,k,w)|=O(Ks𝔞snq^)|\tau_{u,w}(s,k,u)|,|\tau_{u,w}(s,k,w)|=O(\frac{K_{s}}{\sqrt{\mathfrak{a}_{s}n\hat{q}}}). Combined with Lemmas 3.24 and 3.25, this yields that the coefficient of Gu,w\overrightarrow{G}_{u,w} in the linear combination satisfies

|τu,w𝐐t1𝐇tχt,k|τu,w1𝐐t1𝐇tχt,kτu,w1𝐐t1𝐇tχt,kKt13ϑ2nq^(1q^).\displaystyle|\tau_{u,w}\mathbf{Q}_{t-1}\mathbf{H}_{t}^{*}\chi_{t,k}^{*}|\leq\|\tau_{u,w}\|_{1}\|\mathbf{Q}_{t-1}\mathbf{H}_{t}^{*}\chi_{t,k}^{*}\|_{\infty}\leq\|\tau_{u,w}\|_{1}\|\mathbf{Q}_{t-1}\mathbf{H}_{t}^{*}\|_{\infty}\|\chi_{t,k}^{*}\|_{\infty}\leq\frac{K_{t}^{13}}{\vartheta^{2}\sqrt{n\hat{q}(1-\hat{q})}}\,.

Thus, recalling (3.70), we can apply Lemma 3.3 to each realization of χt,k\chi_{t,k} and derive that (noting that on |PRBt|𝙰|\mathrm{PRB}_{t}|\geq\mathtt{A} we have χt,k2𝙰\|\chi_{t,k}\|^{2}\geq\mathtt{A})

^(|PRBt|𝙰)2Ktn𝙰𝙰exp{(0.5Δt𝙰)2Kt10|BADt1|𝙰/n+Δt𝙰Kt13/ϑ2nq^(1q^)},\displaystyle\hat{\mathbb{P}}(|\mathrm{PRB}_{t}|\geq\mathtt{A})\leq 2^{K_{t}n}\sum_{\mathtt{A}^{\prime}\geq\mathtt{A}}\exp\Big{\{}-\frac{(0.5\Delta_{t}\mathtt{A}^{\prime})^{2}}{K_{t}^{10}|\mathrm{BAD}_{t-1}|\mathtt{A}^{\prime}/n+\Delta_{t}\mathtt{A}^{\prime}K_{t}^{13}/\vartheta^{2}\sqrt{n\hat{q}(1-\hat{q})}}\Big{\}}\,,

which is bounded by exp{nKt2}\exp\{-nK_{t}^{2}\}. Here the factor 2Ktn2^{K_{t}n} in the above display counts the enumeration of possible realizations of χt,k\chi_{t,k}. At this point, we apply a union bound over all possible realizations of {Γk(r),Πk(r):0rt,1kKr}\{\Gamma^{(r)}_{k},\Pi^{(r)}_{k}:0\leq r\leq t,1\leq k\leq K_{r}\} and BADt1\mathrm{BAD}_{t-1} (whose enumeration is again bounded by 24Ktn2^{4K_{t}n}), completing the proof of the lemma. ∎

In the next few lemmas, we control |LARGEt||\mathrm{LARGE}_{t}|.

Lemma 3.29.

With probability 1o(enKt)1-o(e^{-nK_{t}}) we have |LARGEt(0)|8ϑ1Kt4n12logloglogn|\mathrm{LARGE}^{(0)}_{t}|\leq 8\vartheta^{-1}K^{4}_{t}n^{1-\frac{2}{\log\log\log n}}.

Proof.

Recall (3.10). For each fixed realization of {Γk(r),Πk(r):rt,1kKr}\{\Gamma^{(r)}_{k},\Pi^{(r)}_{k}:r\leq t,1\leq k\leq K_{r}\} and BADt1\mathrm{BAD}_{t-1} and for each jj we can apply Lemma 3.3 and obtain that

^(|ηk(s),gt1Dv(s)j|>n1logloglogn)2exp{12ϑn2logloglogn}.\displaystyle\hat{\mathbb{P}}\Big{(}|\langle\eta^{(s)}_{k},g_{t-1}D^{(s)}_{v}\rangle_{\langle j\rangle}|>n^{\frac{1}{\log\log\log n}}\Big{)}\leq 2\exp\{-\tfrac{1}{2}\vartheta n^{\frac{2}{\log\log\log n}}\}\,.

Also (|Wk(s)(v)|>n1logloglogn)exp{12n2logloglogn}\mathbb{P}(|W^{(s)}_{k}(v)|>n^{\frac{1}{\log\log\log n}})\leq\exp\{-\frac{1}{2}n^{\frac{2}{\log\log\log n}}\}. Then applying a union bound over jj, we get that

^(vLARGEt,s,k(0))2n2exp{12ϑn2logloglogn}exp{14ϑn2logloglogn}\displaystyle\hat{\mathbb{P}}\big{(}v\in\mathrm{LARGE}^{(0)}_{t,s,k}\big{)}\leq 2n^{2}\exp\big{\{}-\tfrac{1}{2}\vartheta n^{\frac{2}{\log\log\log n}}\big{\}}\leq\exp\big{\{}-\tfrac{1}{4}\vartheta n^{\frac{2}{\log\log\log n}}\big{\}}

where in the last inequality we use the bound of ϑ=ϑχ+1\vartheta=\vartheta_{\chi+1} in Lemma 2.1. Under the ^\hat{\mathbb{P}}-measure, the events vLARGEt,s,k(0)v\in\mathrm{LARGE}^{(0)}_{t,s,k} over vv are independent of each other. Thus, another application of Lemma 3.3 yields that

^(|LARGEt,s,k|>ϑ1Kt2n11logloglogn)exp{14ϑn2logloglognϑ1Kt2n12logloglogn},\displaystyle\hat{\mathbb{P}}\Big{(}|\mathrm{LARGE}_{t,s,k}|>\vartheta^{-1}K^{2}_{t}n^{1-\frac{1}{\log\log\log n}}\Big{)}\leq\exp\big{\{}-\tfrac{1}{4}\vartheta n^{\frac{2}{\log\log\log n}}\cdot\vartheta^{-1}K^{2}_{t}n^{1-\frac{2}{\log\log\log n}}\big{\}}\,,

which is bounded by eKt2n/4e^{-K_{t}^{2}n/4}. Now, a union bound over all possible realizations (of which there are at most 24Ktn2^{4K_{t}n}) and over s,ks,k (as well as for the mathsf version) completes the proof. ∎

Lemma 3.30.

With probability 1o(enKt)1-o(e^{-nK_{t}}) we have

|LARGEt(1)|\displaystyle|\mathrm{LARGE}^{(1)}_{t}| ϑ1Kt4n2logloglogn(|BIASt|+|PRBt|+|LARGEt(0)|),\displaystyle\leq\vartheta^{-1}K_{t}^{4}n^{-\frac{2}{\log\log\log n}}(|\mathrm{BIAS}_{t}|+|\mathrm{PRB}_{t}|+|\mathrm{LARGE}^{(0)}_{t}|)\,, (3.71)
and |LARGEt(a+1)|\displaystyle\mbox{ and }|\mathrm{LARGE}^{(a+1)}_{t}| ϑ1Kt4n2logloglogn|LARGEt(a)| for a1.\displaystyle\leq\vartheta^{-1}K_{t}^{4}n^{-\frac{2}{\log\log\log n}}|\mathrm{LARGE}^{(a)}_{t}|\mbox{ for }a\geq 1\,. (3.72)
Proof.

We will prove (3.71); the proof of (3.72) is similar. Recall (3.11). For each fixed realization of {Γk(r),Πk(r):rt,1kKr}\{\Gamma^{(r)}_{k},\Pi^{(r)}_{k}:r\leq t,1\leq k\leq K_{r}\} and BADt1,LARGEt(0),BIASt\mathrm{BAD}_{t-1},\mathrm{LARGE}^{(0)}_{t},\mathrm{BIAS}_{t}, we apply Lemma 3.3 and obtain that

^(vLARGEt,s,k(1))n2exp{ϑn2logloglogn(|BIASt|+|PRBt|+|LARGEt(0)|)/n}.\displaystyle\hat{\mathbb{P}}(v\in\mathrm{LARGE}^{(1)}_{t,s,k})\leq n^{2}\exp\Big{\{}-\frac{\vartheta n^{\frac{2}{\log\log\log n}}}{(|\mathrm{BIAS}_{t}|+|\mathrm{PRB}_{t}|+|\mathrm{LARGE}^{(0)}_{t}|)/n}\Big{\}}\,.

Since under the ^\hat{\mathbb{P}}-measure we have independence among {vLARGEt(1)}\{v\in\mathrm{LARGE}^{(1)}_{t}\} for different vv, we can then apply Lemma 3.3 again and get that

^(the complement of (3.71))\displaystyle\hat{\mathbb{P}}(\mbox{the complement of }\eqref{eq-LARGE-1})
\displaystyle\leq\ exp{ϑn2logloglognϑ1Kt2n2logloglogn(|BIASt|+|PRBt|+|LARGEt(0)|)(|BIASt|+|PRBt|+|LARGEt(0)|)/n}eKt2n.\displaystyle\exp\Big{\{}-\frac{\vartheta n^{\frac{2}{\log\log\log n}}\cdot\vartheta^{-1}K_{t}^{2}n^{-\frac{2}{\log\log\log n}}(|\mathrm{BIAS}_{t}|+|\mathrm{PRB}_{t}|+|\mathrm{LARGE}^{(0)}_{t}|)}{(|\mathrm{BIAS}_{t}|+|\mathrm{PRB}_{t}|+|\mathrm{LARGE}^{(0)}_{t}|)/n}\Big{\}}\leq e^{-K_{t}^{2}n}\,.

Then a union bound over all possible realizations (whose enumeration is bounded by 24Ktn2^{4K_{t}n}) completes the proof. ∎

We may assume that all the typical events as described in Lemmas 3.27, 3.28, 3.29 and 3.30 hold (note that this occurs with probability 1Kt2enKt1-K_{t}^{2}e^{-nK_{t}}). Then, we see that LARGEt(logn)=\mathrm{LARGE}_{t}^{(\log n)}=\emptyset (by (3.72)). In addition, we have that

|LARGEt||LARGEt(0)|+a=1logn|LARGEt(a)|\displaystyle|\mathrm{LARGE}_{t}|\leq|\mathrm{LARGE}^{(0)}_{t}|+\sum_{a=1}^{\log n}|\mathrm{LARGE}_{t}^{(a)}|
\displaystyle\leq\ 8ϑ1Kt4(n12logloglogn+(|BIASt|+|PRBt|+n12logloglogn)a=1logn(ϑ1Kt4)analogloglogn)\displaystyle 8\vartheta^{-1}K_{t}^{4}\Big{(}n^{1-\frac{2}{\log\log\log n}}+(|\mathrm{BIAS}_{t}|+|\mathrm{PRB}_{t}|+n^{1-\frac{2}{\log\log\log n}})\sum_{a=1}^{\log n}(\vartheta^{-1}K_{t}^{4})^{a}n^{-\frac{a}{\log\log\log n}}\Big{)}
\displaystyle\leq\ 20ϑ2Kt8n12logloglogn.\displaystyle 20\vartheta^{-2}K_{t}^{8}n^{1-\frac{2}{\log\log\log n}}\,. (3.73)

This (together with events in Lemmas 3.27 and 3.28) implies that

|BADt|20ϑ3Kt30e20(loglogn)10(|BADt1|+n12logloglogn).\displaystyle|\mathrm{BAD}_{t}|\leq 20\vartheta^{-3}K_{t}^{30}e^{20(\log\log n)^{10}}\big{(}|\mathrm{BAD}_{t-1}|+n^{1-\frac{2}{\log\log\log n}}\big{)}\,.

Combined with the induction hypothesis 𝒯t1\mathcal{T}_{t-1}, this yields that

(𝒯tc;𝒯t1)Kt2exp{nKt}.\displaystyle\mathbb{P}(\mathcal{T}_{t}^{c};\mathcal{T}_{t-1})\leq K_{t}^{2}\exp\{-nK_{t}\}\,. (3.74)

3.6.2 Step 2: t\mathcal{B}_{t}

Before controlling t\mathcal{B}_{t}, we prove a couple of lemmas as preparation.

Lemma 3.31.

For any two matrices A,B\mathrm{A,B} of compatible dimensions, we have ABHS,BAHSAopBHS\|\mathrm{AB}\|_{\mathrm{HS}},\|\mathrm{BA}\|_{\mathrm{HS}}\leq\|\mathrm{A}\|_{\mathrm{op}}\|\mathrm{B}\|_{\mathrm{HS}}.

Proof.

Since AAop=Aop2\|\mathrm{A^{*}A}\|_{\mathrm{op}}=\|\mathrm{A}\|_{\mathrm{op}}^{2}, the matrix Aop2IAA\|\mathrm{A}\|_{\mathrm{op}}^{2}\mathrm{I}-\mathrm{A^{*}A} is positive semi-definite. Thus,

ABHS2=tr(BAAB)=Aop2tr(BB)tr(B(Aop2IAA)B)\displaystyle\|\mathrm{AB}\|_{\mathrm{HS}}^{2}=\mathrm{tr}(\mathrm{B^{*}A^{*}AB})=\|\mathrm{A}\|_{\mathrm{op}}^{2}\mathrm{tr}(\mathrm{B^{*}B})-\mathrm{tr}(\mathrm{B^{*}(\|\mathrm{A}\|_{\mathrm{op}}^{2}\mathrm{I}-\mathrm{A^{*}A})B})
\displaystyle\leq\ Aop2tr(BB)=Aop2BHS2,\displaystyle\|\mathrm{A}\|_{\mathrm{op}}^{2}\mathrm{tr}(\mathrm{B^{*}B})=\|\mathrm{A}\|_{\mathrm{op}}^{2}\|\mathrm{B}\|_{\mathrm{HS}}^{2}\,,

which yields ABHSAopBHS\|\mathrm{AB}\|_{\mathrm{HS}}\leq\|\mathrm{A}\|_{\mathrm{op}}\|\mathrm{B}\|_{\mathrm{HS}}. Similarly we can show BAHSAopBHS\|\mathrm{BA}\|_{\mathrm{HS}}\leq\|\mathrm{A}\|_{\mathrm{op}}\|\mathrm{B}\|_{\mathrm{HS}}. ∎
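A quick numerical sanity check of Lemma 3.31 on random matrices (sizes are hypothetical; the Hilbert–Schmidt norm coincides with the Frobenius norm):

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((40, 30))
B = rng.standard_normal((30, 50))

lhs = np.linalg.norm(A @ B, 'fro')                      # ||AB||_HS
rhs = np.linalg.norm(A, 2) * np.linalg.norm(B, 'fro')   # ||A||_op ||B||_HS
assert lhs <= rhs + 1e-12
print(lhs, rhs)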

Lemma 3.32.

For any m1m\geq 1, let μm\mu\in\mathbb{R}^{m} and let ΣX,ΣY\Sigma_{X},\Sigma_{Y} be mmm\!*\!m positive definite matrices. Suppose that X𝒩(0,ΣX)X\sim\mathcal{N}(0,\Sigma_{X}) and Y𝒩(μ,ΣY)Y\sim\mathcal{N}(\mu,\Sigma_{Y}). Then for all umu\in\mathbb{R}^{m}

pY(u)pX(u)exp{\displaystyle\frac{p_{Y}(u)}{p_{X}(u)}\leq\exp\Big{\{} ΣXop2ΣY1op2ΣYop2ΣY1ΣX1HS2+(ΣX1op+ΣY1op)μ2\displaystyle\|\Sigma_{X}\|_{\mathrm{op}}^{2}\|\Sigma_{Y}^{-1}\|_{\mathrm{op}}^{2}\|\Sigma_{Y}\|_{\mathrm{op}}^{2}\|\Sigma_{Y}^{-1}-\Sigma_{X}^{-1}\|_{\mathrm{HS}}^{2}+(\|\Sigma_{X}^{-1}\|_{\mathrm{op}}+\|\Sigma_{Y}^{-1}\|_{\mathrm{op}})\|\mu\|^{2}
+μ,uΣY1+12u(ΣX1ΣY1)212𝔼[Y(ΣX1ΣY1)2]}.\displaystyle+\langle\mu,u\rangle_{\Sigma_{Y}^{-1}}+\frac{1}{2}\|u\|^{2}_{(\Sigma_{X}^{-1}-\Sigma_{Y}^{-1})}-\frac{1}{2}\mathbb{E}\big{[}\|Y\|^{2}_{(\Sigma_{X}^{-1}-\Sigma_{Y}^{-1})}\big{]}\Big{\}}\,.
Proof.

Recalling the formula for Gaussian density, we have that

pY(u)pX(u)\displaystyle\frac{p_{Y}(u)}{p_{X}(u)} =det(ΣX)det(ΣY)exp{12μΣY12+u,μΣY1+12u(ΣX1ΣY1)2}.\displaystyle=\sqrt{\frac{\textup{det}(\Sigma_{X})}{\textup{det}(\Sigma_{Y})}}\cdot\exp\Big{\{}-\frac{1}{2}\|\mu\|_{\Sigma_{Y}^{-1}}^{2}+\langle u,\mu\rangle_{\Sigma^{-1}_{Y}}+\frac{1}{2}\|u\|^{2}_{(\Sigma_{X}^{-1}-\Sigma_{Y}^{-1})}\Big{\}}\,. (3.75)

Let ΛY\Lambda_{Y} be a positive definite matrix such that ΣY=ΛYΛY\Sigma_{Y}=\Lambda_{Y}\Lambda_{Y}^{*}. We have

𝔼[Y(ΣX1ΣY1)2]\displaystyle\mathbb{E}\big{[}\|Y\|^{2}_{(\Sigma_{X}^{-1}-\Sigma_{Y}^{-1})}\big{]} =μ(ΣX1ΣY1)2+tr(ΛY(ΣX1ΣY1)ΛY)\displaystyle=\|\mu\|^{2}_{(\Sigma_{X}^{-1}-\Sigma_{Y}^{-1})}+\mathrm{tr}\big{(}\Lambda_{Y}^{*}(\Sigma_{X}^{-1}-\Sigma_{Y}^{-1})\Lambda_{Y}\big{)}
=μ(ΣX1ΣY1)2+tr(ΛYΣX1ΛYI),\displaystyle=\|\mu\|^{2}_{(\Sigma_{X}^{-1}-\Sigma_{Y}^{-1})}+\mathrm{tr}\big{(}\Lambda_{Y}^{*}\Sigma_{X}^{-1}\Lambda_{Y}-\mathrm{I}\big{)}\,, (3.76)

where I\mathrm{I} is the identity matrix. We next control the determinants of ΣX,ΣY\Sigma_{X},\Sigma_{Y}. Let ϱ1,,ϱm0\varrho_{1},\ldots,\varrho_{m}\geq 0 be the eigenvalues of ΛYΣX1ΛY\Lambda_{Y}^{*}\Sigma_{X}^{-1}\Lambda_{Y}. Then k=1m(ϱk1)=tr(ΛYΣX1ΛYI)\sum_{k=1}^{m}(\varrho_{k}-1)=\mathrm{tr}(\Lambda_{Y}^{*}\Sigma_{X}^{-1}\Lambda_{Y}-\mathrm{I}) and k=1mϱk=det(ΛYΣX1ΛY)=det(ΣY)det(ΣX)\prod_{k=1}^{m}\varrho_{k}=\mathrm{det}(\Lambda_{Y}^{*}\Sigma_{X}^{-1}\Lambda_{Y})=\frac{\mathrm{det}(\Sigma_{Y})}{\mathrm{det}(\Sigma_{X})}. Also, using ϱk1(ΛYΣX1ΛY)1opΣXopΣY1op\varrho_{k}^{-1}\leq\|(\Lambda_{Y}^{*}\Sigma_{X}^{-1}\Lambda_{Y})^{-1}\|_{\mathrm{op}}\leq\|\Sigma_{X}\|_{\mathrm{op}}\|\Sigma_{Y}^{-1}\|_{\mathrm{op}} and the fact that x1logxc2(x1)2x-1-\log x\leq c^{-2}(x-1)^{2} for xcx\geq c with 0<c10<c\leq 1, we have that

log{det(ΣX)det(ΣY)}+tr(ΛYΣX1ΛYI)=k=1m(logϱk+ϱk1)\displaystyle\log\Big{\{}\frac{\textup{det}(\Sigma_{X})}{\textup{det}(\Sigma_{Y})}\Big{\}}+\mathrm{tr}(\Lambda_{Y}^{*}\Sigma_{X}^{-1}\Lambda_{Y}-\mathrm{I})=\sum_{k=1}^{m}(-\log\varrho_{k}+\varrho_{k}-1)
\displaystyle\leq\ ΣXop2ΣY1op2k=1m(ϱk1)2=ΣXop2ΣY1op2IΛYΣX1ΛYHS2\displaystyle\|\Sigma_{X}\|_{\mathrm{op}}^{2}\|\Sigma_{Y}^{-1}\|_{\mathrm{op}}^{2}\sum_{k=1}^{m}(\varrho_{k}-1)^{2}=\|\Sigma_{X}\|_{\mathrm{op}}^{2}\|\Sigma_{Y}^{-1}\|_{\mathrm{op}}^{2}\|\mathrm{I}-\Lambda_{Y}^{*}\Sigma_{X}^{-1}\Lambda_{Y}\|_{\mathrm{HS}}^{2}
=\displaystyle=\ ΣXop2ΣY1op2ΛY(ΣY1ΣX1)ΛYHS2ΣXop2ΣY1op2ΛYop4ΣY1ΣX1HS2\displaystyle\|\Sigma_{X}\|_{\mathrm{op}}^{2}\|\Sigma_{Y}^{-1}\|_{\mathrm{op}}^{2}\|\Lambda_{Y}^{*}(\Sigma_{Y}^{-1}-\Sigma_{X}^{-1})\Lambda_{Y}\|_{\mathrm{HS}}^{2}\leq\|\Sigma_{X}\|_{\mathrm{op}}^{2}\|\Sigma_{Y}^{-1}\|_{\mathrm{op}}^{2}\|\Lambda_{Y}\|_{\mathrm{op}}^{4}\|\Sigma_{Y}^{-1}-\Sigma_{X}^{-1}\|_{\mathrm{HS}}^{2}
=\displaystyle=\ ΣXop2ΣY1op2ΣYop2ΣY1ΣX1HS2,\displaystyle\|\Sigma_{X}\|_{\mathrm{op}}^{2}\|\Sigma_{Y}^{-1}\|_{\mathrm{op}}^{2}\|\Sigma_{Y}\|_{\mathrm{op}}^{2}\|\Sigma_{Y}^{-1}-\Sigma_{X}^{-1}\|_{\mathrm{HS}}^{2}\,,

where the last inequality follows from Lemma 3.31. Combined with (3.75) and (3.76), this completes the proof of the lemma. ∎
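The exact density-ratio identity (3.75) that the proof starts from can be verified numerically against densities computed directly from the Gaussian density formula; a minimal sketch in hypothetical dimension m = 3 with random inputs:

import numpy as np

rng = np.random.default_rng(4)
m = 3

def rand_pd():
    A = rng.standard_normal((m, m))
    return A @ A.T + np.eye(m)     # random positive definite matrix

Sx, Sy = rand_pd(), rand_pd()      # covariances of X ~ N(0, Sx) and Y ~ N(mu, Sy)
mu = rng.standard_normal(m)
u = rng.standard_normal(m)

def log_density(v, mean, S):
    d = v - mean
    _, logdet = np.linalg.slogdet(S)
    return -0.5 * (m * np.log(2 * np.pi) + logdet + d @ np.linalg.solve(S, d))

direct = log_density(u, mu, Sy) - log_density(u, 0 * mu, Sx)

# right-hand side of (3.75), in log form
_, ldx = np.linalg.slogdet(Sx)
_, ldy = np.linalg.slogdet(Sy)
Syi, Sxi = np.linalg.inv(Sy), np.linalg.inv(Sx)
formula = (0.5 * (ldx - ldy) - 0.5 * mu @ Syi @ mu + u @ Syi @ mu
           + 0.5 * u @ (Sxi - Syi) @ u)
assert np.isclose(direct, formula)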

We now return to t\mathcal{B}_{t}. Recall (3.36) as in Lemma 3.16. It remains to bound the density ratio between {gt1Y~t,gt1𝖸~t|t1}\big{\{}g_{t-1}\tilde{Y}_{t},g_{t-1}\tilde{\mathsf{Y}}_{t}|\mathcal{F}_{t-1}\big{\}} and {Yˇt,𝖸ˇt}\big{\{}\check{Y}_{t},\check{\mathsf{Y}}_{t}\big{\}}. Recall that in Remark 3.22 we have shown

(gt1Y~t(k,v)|t1)=𝑑\displaystyle(g_{t-1}\tilde{Y}_{t}(k,v)|\mathcal{F}_{t-1})\overset{d}{=} gt1Y~t(k,v)GAUS(gt1Y~t(k,v))+PROJ(gt1Y~t(k,v)).\displaystyle g_{t-1}\tilde{Y}_{t}^{\diamond}(k,v)-\mathrm{GAUS}(g_{t-1}\tilde{Y}_{t}(k,v))+\mathrm{PROJ}(g_{t-1}\tilde{Y}_{t}(k,v))\,.

Let Σ~t\tilde{\Sigma}_{t} be the covariance matrix of the process

{gt1Y~t(k,v)GAUS(gt1Y~t(k,v))gt1𝖸~t(k,𝗏)GAUS(gt1𝖸~t(k,𝗏)):v,π1(𝗏)Bt,1kKt12},\displaystyle\Bigg{\{}\begin{split}g_{t-1}\tilde{Y}_{t}^{\diamond}(k,v)-\mathrm{GAUS}(g_{t-1}\tilde{Y}_{t}(k,v))\\ g_{t-1}\tilde{\mathsf{Y}}_{t}^{\diamond}(k,\mathsf{v})-\mathrm{GAUS}(g_{t-1}\tilde{\mathsf{Y}}_{t}(k,\mathsf{v}))\end{split}:v,\pi^{-1}(\mathsf{v})\not\in\mathrm{B}_{t},1\leq k\leq\frac{K_{t}}{12}\Bigg{\}}\,,

let Σˇt\check{\Sigma}_{t} be the covariance matrix of

𝔉ˇt={Yˇt(k,v),𝖸ˇt(k,𝗏):v,π1(𝗏)Bt,1kKt12},\check{\mathfrak{F}}_{t}=\big{\{}\check{Y}_{t}(k,v),\check{\mathsf{Y}}_{t}(k,\mathsf{v}):v,\pi^{-1}(\mathsf{v})\not\in\mathrm{B}_{t},1\leq k\leq\tfrac{K_{t}}{12}\big{\}}\,, (3.77)

and let Σt\Sigma^{\diamond}_{t} be the covariance matrix of

{gt1Y~t(k,v),gt1𝖸~t(k,𝗏):v,π1(𝗏)Bt,1kKt12}.\displaystyle\big{\{}g_{t-1}\tilde{Y}^{\diamond}_{t}(k,v),g_{t-1}\tilde{\mathsf{Y}}^{\diamond}_{t}(k,\mathsf{v}):v,\pi^{-1}(\mathsf{v})\not\in\mathrm{B}_{t},1\leq k\leq\tfrac{K_{t}}{12}\big{\}}\,.

Also define vectors L(t),𝖫(t)L^{(t)},\mathsf{L}^{(t)} such that for 1kKt12,vBt,𝗏π(Bt)1\leq k\leq\frac{K_{t}}{12},v\not\in\mathrm{B}_{t},\mathsf{v}\not\in\pi(\mathrm{B}_{t})

L(t)(k,v)=PROJ(gt1Y~t(k,v)) and 𝖫(t)(k,𝗏)=PROJ(gt1𝖸~t(k,𝗏)).\displaystyle L^{(t)}(k,v)=\mathrm{PROJ}(g_{t-1}\tilde{Y}_{t}(k,v))\mbox{ and }\mathsf{L}^{(t)}(k,\mathsf{v})=\mathrm{PROJ}(g_{t-1}\tilde{\mathsf{Y}}_{t}(k,\mathsf{v}))\,.

Applying Lemma 3.32 we have

p{gt1Y~t,gt1𝖸~t|t1}(xt,𝗑t)p{Yˇt,𝖸ˇt}(xt,𝗑t)\displaystyle\frac{p_{\{g_{t-1}\tilde{Y}_{t},g_{t-1}\tilde{\mathsf{Y}}_{t}|\mathcal{F}_{t-1}\}}(x_{t},\mathsf{x}_{t})}{p_{\{\check{Y}_{t},\check{\mathsf{Y}}_{t}\}}(x_{t},\mathsf{x}_{t})}
\displaystyle\leq\ exp{Σˇtop2Σ~t1op2Σ~top2Σ~t1Σˇt1HS2+(Σˇt1op+Σ~t1op)(L(t),𝖫(t))2\displaystyle\exp\Big{\{}\|\check{\Sigma}_{t}\|_{\mathrm{op}}^{2}\|\tilde{\Sigma}_{t}^{-1}\|_{\mathrm{op}}^{2}\|\tilde{\Sigma}_{t}\|_{\mathrm{op}}^{2}\|\tilde{\Sigma}_{t}^{-1}-\check{\Sigma}_{t}^{-1}\|_{\mathrm{HS}}^{2}+(\|\check{\Sigma}_{t}^{-1}\|_{\mathrm{op}}+\|\tilde{\Sigma}_{t}^{-1}\|_{\mathrm{op}})\|(L^{(t)},\mathsf{L}^{(t)})\|^{2}
+(L(t),𝖫(t))Σ~t1(xt,𝗑t)+12(xt,𝗑t)(Σ~t1Σˇt1)212𝔼[(Xt,𝖷t)(Σ~t1Σˇt1)2]},\displaystyle+(L^{(t)},\mathsf{L}^{(t)})\tilde{\Sigma}_{t}^{-1}(x_{t},\mathsf{x}_{t})^{*}+\frac{1}{2}\|(x_{t},\mathsf{x}_{t})\|^{2}_{(\tilde{\Sigma}_{t}^{-1}-\check{\Sigma}_{t}^{-1})}-\frac{1}{2}\mathbb{E}\big{[}\|(X_{t},\mathsf{X}_{t})\|^{2}_{(\tilde{\Sigma}_{t}^{-1}-\check{\Sigma}_{t}^{-1})}\big{]}\Big{\}}\,, (3.78)

where the expectation is taken over (Xt,𝖷t)((gt1Y~t,gt1𝖸~t)|t1)(X_{t},\mathsf{X}_{t})\sim((g_{t-1}\tilde{Y}_{t},g_{t-1}\tilde{\mathsf{Y}}_{t})|\mathcal{F}_{t-1}). We need a few estimates on Σ~t,Σˇt\tilde{\Sigma}_{t},\check{\Sigma}_{t} and L(t),𝖫(t)L^{(t)},\mathsf{L}^{(t)}.

Claim 3.33.

On the event t\mathcal{H}_{t}, we have L(t)2,𝖫(t)2nKt6Δt2\|L^{(t)}\|^{2},\|\mathsf{L}^{(t)}\|^{2}\leq nK_{t}^{6}\Delta_{t}^{2}.

Proof.

On t\mathcal{H}_{t} we know k,v(PROJ(gt1Y~t(k,v)))2nKt6Δt2\sum_{k,v}\big{(}\mathrm{PROJ}(g_{t-1}\tilde{Y}_{t}(k,v))\big{)}^{2}\leq nK_{t}^{6}\Delta_{t}^{2}. Thus, L(t)2nKt6Δt2\|L^{(t)}\|^{2}\leq nK_{t}^{6}\Delta_{t}^{2}. We can bound 𝖫(t)2\|\mathsf{L}^{(t)}\|^{2} similarly. ∎

Claim 3.34.

We have Σ~t1op,(Σt)1op,Σˇt1op1\|\tilde{\Sigma}_{t}^{-1}\|_{\mathrm{op}},\|(\Sigma^{\diamond}_{t})^{-1}\|_{\mathrm{op}},\|\check{\Sigma}_{t}^{-1}\|_{\mathrm{op}}\leq 1.

Proof.

By definition, we see that Σ~t\tilde{\Sigma}_{t} is the sum of the identity matrix and a positive semi-definite matrix, and thus we have Σ~t1op1\|\tilde{\Sigma}_{t}^{-1}\|_{\mathrm{op}}\leq 1. Similar results hold for Σt\Sigma^{\diamond}_{t} and Σˇt\check{\Sigma}_{t}. ∎

Claim 3.35.

On 𝒯t1t\mathcal{T}_{t-1}\cap\mathcal{E}_{t}, we have Σ~tΣˇtop200Kt6\|\tilde{\Sigma}_{t}-\check{\Sigma}_{t}\|_{\mathrm{op}}\leq 200K_{t}^{6} and Σ~tΣˇtHS2nKt11Δt2\|\tilde{\Sigma}_{t}-\check{\Sigma}_{t}\|_{\mathrm{HS}}^{2}\leq nK_{t}^{11}\Delta_{t}^{2}.

Proof.

By [17, (3.12) and (3.61)], which follow from standard properties of general Gaussian processes, we have

𝔼[(gt1Y~t(k,v)GAUS(gt1Y~t(k,v)))(gt1Y~t(l,u)GAUS(gt1Y~t(l,u)))]\displaystyle\mathbb{E}\Big{[}(g_{t-1}\tilde{Y}_{t}^{\diamond}(k,v)-\mathrm{GAUS}(g_{t-1}\tilde{Y}_{t}(k,v)))(g_{t-1}\tilde{Y}_{t}^{\diamond}(l,u)-\mathrm{GAUS}(g_{t-1}\tilde{Y}_{t}(l,u)))\Big{]}
=\displaystyle= 𝔼[gt1Y~t(k,v)gt1Y~t(l,u)]𝔼[GAUS(gt1Y~t(k,v))GAUS(gt1Y~t(l,u))].\displaystyle\mathbb{E}\Big{[}g_{t-1}\tilde{Y}_{t}^{\diamond}(k,v)g_{t-1}\tilde{Y}_{t}^{\diamond}(l,u)\Big{]}-\mathbb{E}\Big{[}\mathrm{GAUS}(g_{t-1}\tilde{Y}_{t}(k,v))\mathrm{GAUS}(g_{t-1}\tilde{Y}_{t}(l,u))\Big{]}\,.

Here the coefficient gt1g_{t-1} does not matter since this result follows from the fact that

Cov(Y1{X1,,Xn},Y2{X1,,Xn})\displaystyle\mathrm{Cov}\Big{(}Y_{1}\mid\{X_{1},\ldots,X_{n}\},Y_{2}\mid\{X_{1},\ldots,X_{n}\}\Big{)}
=\displaystyle=\ 𝔼[Y1Y2]𝔼[𝔼[Y1X1,,Xn]𝔼[Y2X1,,Xn]]\displaystyle\mathbb{E}[Y_{1}Y_{2}]-\mathbb{E}\big{[}\mathbb{E}[Y_{1}\mid X_{1},\ldots,X_{n}]\mathbb{E}[Y_{2}\mid X_{1},\ldots,X_{n}]\big{]}

for a general Gaussian process {X1,,Xn,Y1,Y2}\{X_{1},\ldots,X_{n},Y_{1},Y_{2}\}. Recalling (3.62), we see that the covariance matrix of {GAUS(gt1Y~t),GAUS(gt1𝖸~t)}\{\mathrm{GAUS}(g_{t-1}\tilde{Y}_{t}),\mathrm{GAUS}(g_{t-1}\tilde{\mathsf{Y}}_{t})\} equals

𝔼[𝐇t𝐐t1(gt1[Y~]t1gt1[𝖸~]t1)(gt1[Y~]t1gt1[𝖸~]t1)𝐐t1𝐇t]=𝐇t𝐐t1𝐇t.\displaystyle\mathbb{E}\Big{[}\mathbf{H}_{t}\mathbf{Q}_{t-1}\begin{pmatrix}g_{t-1}[\tilde{Y}]_{t-1}^{\diamond}\\ g_{t-1}[\tilde{\mathsf{Y}}]_{t-1}^{\diamond}\end{pmatrix}\begin{pmatrix}g_{t-1}[\tilde{Y}]_{t-1}^{\diamond}&g_{t-1}[\tilde{\mathsf{Y}}]_{t-1}^{\diamond}\end{pmatrix}\mathbf{Q}_{t-1}\mathbf{H}_{t}^{*}\Big{]}=\mathbf{H}_{t}\mathbf{Q}_{t-1}\mathbf{H}_{t}^{*}\,.

Thus, we have Σ~t=Σt𝐇t𝐐t1(𝐇t)\tilde{\Sigma}_{t}=\Sigma^{\diamond}_{t}-\mathbf{H}_{t}\mathbf{Q}_{t-1}(\mathbf{H}_{t})^{*}. Combined with Lemmas 3.23 and 3.25, this yields that Σ~tΣtop100Kt6\|\tilde{\Sigma}_{t}-\Sigma^{\diamond}_{t}\|_{\mathrm{op}}\leq 100K_{t}^{6}. In addition, applying Lemmas 3.23, 3.25 and 3.31, we get

Σ~tΣtHS2=𝐇t𝐐t1𝐇tHS2𝐐t1op2𝐇top2𝐇tHS2105nKt10Δt2.\displaystyle\|\tilde{\Sigma}_{t}-\Sigma^{\diamond}_{t}\|_{\mathrm{HS}}^{2}=\|\mathbf{H}_{t}\mathbf{Q}_{t-1}\mathbf{H}_{t}^{*}\|_{\mathrm{HS}}^{2}\leq\|\mathbf{Q}_{t-1}\|_{\mathrm{op}}^{2}\|\mathbf{H}_{t}\|_{\mathrm{op}}^{2}\|\mathbf{H}_{t}\|_{\mathrm{HS}}^{2}\leq 10^{5}nK_{t}^{10}\Delta_{t}^{2}\,. (3.79)

Furthermore, by (3.54) and (3.57) we get that Σt((k,v),(l,v))Σˇt((k,v),(l,v))=O(KtΔt)\Sigma^{\diamond}_{t}((k,v),(l,v))-\check{\Sigma}_{t}((k,v),(l,v))=O(K_{t}\Delta_{t}); by (3.55) and (3.58) we get that Σt((k,𝗏),(l,𝗏))Σˇt((k,𝗏),(l,𝗏))=O(KtΔt)\Sigma^{\diamond}_{t}((k,\mathsf{v}),(l,\mathsf{v}))-\check{\Sigma}_{t}((k,\mathsf{v}),(l,\mathsf{v}))=O(K_{t}\Delta_{t}); by (3.56) and (3.59) we get that Σt((k,v),(l,π(v)))Σˇt((k,v),(l,π(v)))=O(KtΔt)\Sigma^{\diamond}_{t}((k,v),(l,\pi(v)))-\check{\Sigma}_{t}((k,v),(l,\pi(v)))=O(K_{t}\Delta_{t}). Also, for uvu\neq v, by (3.53) we have for τu{u,π(u)}\tau_{u}\in\{u,\pi(u)\} and τv{v,π(v)}\tau_{v}\in\{v,\pi(v)\}

Σt((k,τv),(l,τu))Σˇt((k,τv),(l,τu))\displaystyle\Sigma^{\diamond}_{t}((k,\tau_{v}),(l,\tau_{u}))-\check{\Sigma}_{t}((k,\tau_{v}),(l,\tau_{u})) =Σt((k,τv),(l,τu))\displaystyle=\Sigma^{\diamond}_{t}((k,\tau_{v}),(l,\tau_{u}))
=O(Kt𝔞tn(𝟏viΓ(t)i𝔞t)(𝟏uiΓ(t)i𝔞t)).\displaystyle=O\Big{(}\frac{K_{t}}{\mathfrak{a}_{t}n}(\mathbf{1}_{v\in\cup_{i}\Gamma^{(t)}_{i}}-\mathfrak{a}_{t})(\mathbf{1}_{u\in\cup_{i}\Gamma^{(t)}_{i}}-\mathfrak{a}_{t})\Big{)}\,.

Combined with Items (i) and (iv) of t\mathcal{E}_{t} in Definition 3.1, this implies that ΣˇtΣtHS2nKt6Δt2\|\check{\Sigma}_{t}-\Sigma^{\diamond}_{t}\|_{\mathrm{HS}}^{2}\leq nK_{t}^{6}\Delta_{t}^{2}. Applying Lemma 3.8 by setting v=𝒥v={(s,k,v),(s,k,π(v))}\mathcal{I}_{v}=\mathcal{J}_{v}=\{(s,k,v),(s,k,\pi(v))\}, δ=KtΔt\delta=K_{t}\Delta_{t} and C=10Kt3C=10K_{t}^{3}, we can deduce that ΣˇtΣtop100Kt3\|\check{\Sigma}_{t}-\Sigma^{\diamond}_{t}\|_{\mathrm{op}}\leq 100K_{t}^{3}. Combined with (3.79) and the fact that Σ~tΣtop100Kt6\|\tilde{\Sigma}_{t}-\Sigma^{\diamond}_{t}\|_{\mathrm{op}}\leq 100K_{t}^{6}, this completes the proof by the triangle inequality. ∎
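The Gaussian conditional-covariance identity invoked at the start of the above proof amounts to the following: for a jointly Gaussian mean-zero vector (X,Y1,Y2)(X,Y_{1},Y_{2}) one has 𝔼[Yi|X]=ΣYiXΣXX1X\mathbb{E}[Y_{i}|X]=\Sigma_{Y_{i}X}\Sigma_{XX}^{-1}X, so both sides of the identity reduce to ΣY1Y2ΣY1XΣXX1ΣXY2\Sigma_{Y_{1}Y_{2}}-\Sigma_{Y_{1}X}\Sigma_{XX}^{-1}\Sigma_{XY_{2}}. A minimal numerical sketch with hypothetical sizes:

import numpy as np

rng = np.random.default_rng(5)
n = 4                                    # hypothetical number of X-coordinates
A = rng.standard_normal((n + 2, n + 2))
S = A @ A.T + np.eye(n + 2)              # covariance of (X_1, ..., X_n, Y_1, Y_2)

Sxx = S[:n, :n]
Sx1, Sx2 = S[:n, n], S[:n, n + 1]

# closed form: Cov(Y1, Y2 | X) = Sigma_{Y1 Y2} - Sigma_{Y1 X} Sigma_{XX}^{-1} Sigma_{X Y2}
cond_cov = S[n, n + 1] - Sx1 @ np.linalg.solve(Sxx, Sx2)

# Monte Carlo check of E[Y1 Y2] - E[ E[Y1|X] E[Y2|X] ]
L = np.linalg.cholesky(S)
Z = rng.standard_normal((400000, n + 2)) @ L.T
X, Y1, Y2 = Z[:, :n], Z[:, n], Z[:, n + 1]
E1 = X @ np.linalg.solve(Sxx, Sx1)       # E[Y1 | X]
E2 = X @ np.linalg.solve(Sxx, Sx2)       # E[Y2 | X]
print(cond_cov, np.mean(Y1 * Y2) - np.mean(E1 * E2))   # approximately equal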

Corollary 3.36.

On the event 𝒯t1t\mathcal{T}_{t-1}\cap\mathcal{E}_{t}, we have Σ~t1Σˇt1op200Kt6\|\tilde{\Sigma}_{t}^{-1}-\check{\Sigma}_{t}^{-1}\|_{\mathrm{op}}\leq 200K_{t}^{6} as well as Σ~t1Σˇt12HSnKt11Δt2\|\tilde{\Sigma}_{t}^{-1}-\check{\Sigma}_{t}^{-1}\|^{2}_{\mathrm{HS}}\leq nK_{t}^{11}\Delta_{t}^{2}.

Proof.

Combining Claims 3.34 and 3.35, we get

Σ~t1Σˇt1op\displaystyle\|\tilde{\Sigma}_{t}^{-1}-\check{\Sigma}_{t}^{-1}\|_{\mathrm{op}} =Σ~t1(Σ~tΣˇt)Σˇt1op\displaystyle=\|\tilde{\Sigma}_{t}^{-1}(\tilde{\Sigma}_{t}-\check{\Sigma}_{t})\check{\Sigma}_{t}^{-1}\|_{\mathrm{op}}
Σ~t1opΣ~tΣˇtopΣˇt1op200Kt6.\displaystyle\leq\|\tilde{\Sigma}_{t}^{-1}\|_{\mathrm{op}}\|\tilde{\Sigma}_{t}-\check{\Sigma}_{t}\|_{\mathrm{op}}\|\check{\Sigma}_{t}^{-1}\|_{\mathrm{op}}\leq 200K_{t}^{6}\,.

It remains to control the HS-norm. By Lemma 3.31 and Claim 3.34, we have

Σ~t1Σˇt12HS\displaystyle\|\tilde{\Sigma}_{t}^{-1}-\check{\Sigma}_{t}^{-1}\|^{2}_{\mathrm{HS}} =Σ~t1(Σ~tΣˇt)Σˇt12HS\displaystyle=\|\tilde{\Sigma}_{t}^{-1}(\tilde{\Sigma}_{t}-\check{\Sigma}_{t})\check{\Sigma}_{t}^{-1}\|^{2}_{\mathrm{HS}}
Σ~t12opΣˇt12opΣ~tΣˇt2HSnKt11Δt2.\displaystyle\leq\|\tilde{\Sigma}_{t}^{-1}\|^{2}_{\mathrm{op}}\|\check{\Sigma}_{t}^{-1}\|^{2}_{\mathrm{op}}\|\tilde{\Sigma}_{t}-\check{\Sigma}_{t}\|^{2}_{\mathrm{HS}}\leq nK_{t}^{11}\Delta_{t}^{2}\,.\qed
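The algebraic step behind both estimates above is the exact identity Σ~t1Σˇt1=Σ~t1(ΣˇtΣ~t)Σˇt1\tilde{\Sigma}_{t}^{-1}-\check{\Sigma}_{t}^{-1}=\tilde{\Sigma}_{t}^{-1}(\check{\Sigma}_{t}-\tilde{\Sigma}_{t})\check{\Sigma}_{t}^{-1}, after which submultiplicativity of the operator norm (and Lemma 3.31 for the HS-norm) gives the bounds; the sign of the middle factor is immaterial once norms are taken. A one-off numerical check on random positive definite matrices of hypothetical size:

import numpy as np

rng = np.random.default_rng(6)
n = 30
A = rng.standard_normal((n, n)); A = A @ A.T + np.eye(n)
B = A + 0.01 * rng.standard_normal((n, n)); B = (B + B.T) / 2   # nearby PD matrix

Ai, Bi = np.linalg.inv(A), np.linalg.inv(B)
assert np.allclose(Ai - Bi, Ai @ (B - A) @ Bi)                  # the exact identity
print(np.linalg.norm(Ai - Bi, 2),                               # bounded by the product below
      np.linalg.norm(Ai, 2) * np.linalg.norm(A - B, 2) * np.linalg.norm(Bi, 2))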
Corollary 3.37.

On the event 𝒯t1t\mathcal{T}_{t-1}\cap\mathcal{E}_{t} we have Σˇtop2\|\check{\Sigma}_{t}\|_{\mathrm{op}}\leq 2 and Σ~top300Kt6\|\tilde{\Sigma}_{t}\|_{\mathrm{op}}\leq 300K_{t}^{6}.

Proof.

By the definition of Σˇt\check{\Sigma}_{t}, we see that Σˇt=diag(Σˇt,k,v)\check{\Sigma}_{t}=\mathrm{diag}(\check{\Sigma}_{t,k,v}) is a block-diagonal matrix where for 1kKt12,vBADt1\leq k\leq\frac{K_{t}}{12},v\not\in\mathrm{BAD}_{t} the block Σˇt,k,v\check{\Sigma}_{t,k,v} is a 222\!*\!2 matrix with diagonal entries 1 and non-diagonal entries

ρ^η(t)kΨ(t)(η(t)k)(2.27)2ρ^εt(2.33)ρ^(2.3)0.1.\hat{\rho}\eta^{(t)}_{k}\Psi^{(t)}(\eta^{(t)}_{k})^{*}\overset{\eqref{equ-vector-unit}}{\leq}2\hat{\rho}\varepsilon_{t}\overset{\eqref{eq-decrease-varepsilon}}{\leq}\hat{\rho}\overset{\eqref{eq-assumetion-rho}}{\leq}0.1\,.

Thus, Σˇtop2\|\check{\Sigma}_{t}\|_{\mathrm{op}}\leq 2. By Claim 3.35 and the triangle inequality, we get that Σ~topΣ~tΣˇtop+Σˇtop300Kt6\|\tilde{\Sigma}_{t}\|_{\mathrm{op}}\leq\|\tilde{\Sigma}_{t}-\check{\Sigma}_{t}\|_{\mathrm{op}}+\|\check{\Sigma}_{t}\|_{\mathrm{op}}\leq 300K_{t}^{6}. ∎
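Since the operator norm of a block-diagonal matrix is the maximum of the operator norms of its blocks, the bound on Σˇtop\|\check{\Sigma}_{t}\|_{\mathrm{op}} reduces to the 222\!*\!2 computation above; a minimal numerical check with a stand-in correlation r=0.1r=0.1:

import numpy as np

r = 0.1                                      # stand-in for rho-hat * eta Psi eta^*
# 500 identical 2*2 blocks [[1, r], [r, 1]] assembled block-diagonally
S = np.kron(np.eye(500), np.array([[1.0, r], [r, 1.0]]))
print(np.linalg.norm(S, 2))                  # equals 1 + r = 1.1 <= 2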

We are now ready to show that t\mathcal{B}_{t} typically occurs. In what follows, we work on the event 𝒜t1t1𝒯ttt\mathcal{A}_{t-1}\cap\mathcal{B}_{t-1}\cap\mathcal{T}_{t}\cap\mathcal{E}_{t}\cap\mathcal{H}_{t} without further notice. Formally, we abuse the notation ()\mathbb{P}(\cdot) by meaning (𝒜t1t1𝒯ttt)\mathbb{P}(\cdot\cap\mathcal{A}_{t-1}\cap\mathcal{B}_{t-1}\cap\mathcal{T}_{t}\cap\mathcal{E}_{t}\cap\mathcal{H}_{t}); we abuse notation this way (and similarly in later subsections) since it shortens the notation and the meaning should be clear from the context. Define 𝒞\mathcal{C} to be the event that the realizations of {BADt,BADt1},(gt1Yt,gt1𝖸t)\{\mathrm{BAD}_{t},\mathrm{BAD}_{t-1}\},(g_{t-1}Y_{t},g_{t-1}\mathsf{Y}_{t}) and {Gu,w,𝖦π(u),π(w):u or wBADt1}\{\overrightarrow{G}_{u,w},\overrightarrow{\mathsf{G}}_{\pi(u),\pi(w)}:u\mbox{ or }w\in\mathrm{BAD}_{t-1}\} are amenable. By Lemma 3.15 we have

(𝒞)1o(exp{n1logloglogn}).{}\mathbb{P}(\mathcal{C})\geq 1-o(\exp\{-n^{\frac{1}{\log\log\log n}}\})\,. (3.80)

By Lemma 3.16, under 𝒞\mathcal{C} we have (recall that BADt1=Bt1\mathrm{BAD}_{t-1}=\mathrm{B}_{t-1} is implied in 𝔖t1\mathfrak{S}_{t-1})

p{gt1Yt,gt1𝖸t|𝔖t1,BADt=Bt}(xt,𝗑t)p{gt1Y~t,gt1𝖸~t|t1}(xt,𝗑t)exp{nΔt5}.\displaystyle\frac{p_{\{g_{t-1}Y_{t},g_{t-1}\mathsf{Y}_{t}|\mathfrak{S}_{t-1},\mathrm{BAD}_{t}=\mathrm{B}_{t}\}}(x_{t},\mathsf{x}_{t})}{p_{\{g_{t-1}\tilde{Y}_{t},g_{t-1}\tilde{\mathsf{Y}}_{t}|\mathcal{F}_{t-1}\}}(x_{t},\mathsf{x}_{t})}\leq\exp\{n\Delta_{t}^{5}\}\,.

Plugging Claims 3.33 and 3.34 and Corollaries 3.36 and 3.37 into (3.78), we have under 𝒞\mathcal{C}

p{gt1Y~t,gt1𝖸~t|t1}(xt,𝗑t)p{Yˇt,𝖸ˇt}(xt,𝗑t)exp{nKt29Δt2\displaystyle\frac{p_{\{g_{t-1}\tilde{Y}_{t},g_{t-1}\tilde{\mathsf{Y}}_{t}|\mathcal{F}_{t-1}\}}(x_{t},\mathsf{x}_{t})}{p_{\{\check{Y}_{t},\check{\mathsf{Y}}_{t}\}}(x_{t},\mathsf{x}_{t})}\leq\exp\Big{\{}nK_{t}^{29}\Delta_{t}^{2}
+(L(t),𝖫(t))Σ~t1(xt,𝗑t)+12(xt,𝗑t)2(Σ~t1Σˇt1)12𝔼[(X,𝖷)2(Σ~t1Σˇt1)]},\displaystyle+(L^{(t)},\mathsf{L}^{(t)})\tilde{\Sigma}_{t}^{-1}(x_{t},\mathsf{x}_{t})^{*}+\frac{1}{2}\|(x_{t},\mathsf{x}_{t})\|^{2}_{(\tilde{\Sigma}_{t}^{-1}-\check{\Sigma}_{t}^{-1})}-\frac{1}{2}\mathbb{E}\big{[}\|(X,\mathsf{X})\|^{2}_{(\tilde{\Sigma}_{t}^{-1}-\check{\Sigma}_{t}^{-1})}\big{]}\Big{\}}\,,

where (X,𝖷)((gt1Y~t,gt1𝖸~t)|t1)(X,\mathsf{X})\sim((g_{t-1}\tilde{Y}_{t},g_{t-1}\tilde{\mathsf{Y}}_{t})|\mathcal{F}_{t-1}). Altogether, we get that

p{gt1Yt,gt1𝖸t|𝔖t1,BADt=Bt}(xt,𝗑t)p{Yˇt,𝖸ˇt}(xt,𝗑t)exp{2nKt29Δt2\displaystyle\frac{p_{\{g_{t-1}Y_{t},g_{t-1}\mathsf{Y}_{t}|\mathfrak{S}_{t-1},\mathrm{BAD}_{t}=\mathrm{B}_{t}\}}(x_{t},\mathsf{x}_{t})}{p_{\{\check{Y}_{t},\check{\mathsf{Y}}_{t}\}}(x_{t},\mathsf{x}_{t})}\leq\exp\Big{\{}2nK_{t}^{29}\Delta_{t}^{2}
+(L(t),𝖫(t))Σ~t1(xt,𝗑t)+12(xt,𝗑t)2(Σ~t1Σˇt1)12𝔼[(X,𝖷)2(Σ~t1Σˇt1)]}.\displaystyle+(L^{(t)},\mathsf{L}^{(t)})\tilde{\Sigma}_{t}^{-1}(x_{t},\mathsf{x}_{t})^{*}+\frac{1}{2}\|(x_{t},\mathsf{x}_{t})\|^{2}_{(\tilde{\Sigma}_{t}^{-1}-\check{\Sigma}_{t}^{-1})}-\frac{1}{2}\mathbb{E}\big{[}\|(X,\mathsf{X})\|^{2}_{(\tilde{\Sigma}_{t}^{-1}-\check{\Sigma}_{t}^{-1})}\big{]}\Big{\}}\,.

Thus, to estimate the probability of t\mathcal{B}_{t} it suffices to show that the preceding upper bound gets out of control only with probability o(1)o(1). By Lemma 3.16, as we will show, it suffices to control this probability under the measure p{gt1Y~t,gt1𝖸~t|t1}p_{\{g_{t-1}\tilde{Y}_{t},g_{t-1}\tilde{\mathsf{Y}}_{t}|\mathcal{F}_{t-1}\}}. To this end, define

𝒰(I)={(x,𝗑):(L(t),𝖫(t))Σ~t1(x,𝗑)nKt29Δt2},\displaystyle\mathcal{U}^{(I)}=\{(x,\mathsf{x}):(L^{(t)},\mathsf{L}^{(t)})\tilde{\Sigma}_{t}^{-1}(x,\mathsf{x})^{*}\geq nK_{t}^{29}\Delta_{t}^{2}\}\,,
𝒰(II)={(x,𝗑):(x,𝗑)(Σ~t1Σˇt1)2𝔼[(X,𝖷)2(Σ~t1Σˇt1)]nKt29Δt2}.\displaystyle\mathcal{U}^{(II)}=\{(x,\mathsf{x}):\|(x,\mathsf{x})\|_{(\tilde{\Sigma}_{t}^{-1}-\check{\Sigma}_{t}^{-1})}^{2}-\mathbb{E}\big{[}\|(X,\mathsf{X})\|^{2}_{(\tilde{\Sigma}_{t}^{-1}-\check{\Sigma}_{t}^{-1})}\big{]}\geq nK_{t}^{29}\Delta_{t}^{2}\}\,.

By Claims 3.33 and 3.34, we have

Var((L(t),𝖫(t)),(X,𝖷)Σ~t1)=(L(t),𝖫(t))Σ~t1(L(t),𝖫(t))nKt10Δt2.\displaystyle\mathrm{Var}\big{(}\big{\langle}(L^{(t)},\mathsf{L}^{(t)}),(X,\mathsf{X})\big{\rangle}_{\tilde{\Sigma}_{t}^{-1}}\big{)}=(L^{(t)},\mathsf{L}^{(t)})\tilde{\Sigma}_{t}^{-1}(L^{(t)},\mathsf{L}^{(t)})^{*}\leq nK_{t}^{10}\Delta_{t}^{2}\,.

Since the mean is equal to the variance in this case and (X,𝖷)((gt1Y~t,gt1𝖸~t)|t1)(X,\mathsf{X})\sim((g_{t-1}\tilde{Y}_{t},g_{t-1}\tilde{\mathsf{Y}}_{t})|\mathcal{F}_{t-1}), we then obtain from the tail probability of the normal distribution that

((gt1Y~t,gt1𝖸~t)𝒰(I)|t1)exp{nKt29Δt2}.\displaystyle\mathbb{P}((g_{t-1}\tilde{Y}_{t},g_{t-1}\tilde{\mathsf{Y}}_{t})\in\mathcal{U}^{(I)}|\mathcal{F}_{t-1})\leq\exp\{-nK_{t}^{29}\Delta_{t}^{2}\}\,. (3.81)
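Explicitly, the exponent can be verified as follows (a routine computation; we use that K_{t}\geq 2, so that the variance bound is at most half the threshold). Writing Z for the linear statistic above, so that Z\sim N(v,v) with v\leq nK_{t}^{10}\Delta_{t}^{2}, we have

\mathbb{P}\big{(}Z\geq nK_{t}^{29}\Delta_{t}^{2}\big{)}\leq\exp\Big{\{}-\frac{(nK_{t}^{29}\Delta_{t}^{2}-v)^{2}}{2v}\Big{\}}\leq\exp\Big{\{}-\frac{nK_{t}^{48}\Delta_{t}^{2}}{8}\Big{\}}\leq\exp\{-nK_{t}^{29}\Delta_{t}^{2}\}\,.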

Next, we consider \mathcal{U}^{(II)}. On the event \{(X,\mathsf{X})\in\mathcal{U}^{(II)}\}, we have

(X,𝖷)2(Σ~t1Σˇt1)𝔼[(X,𝖷)2(Σ~t1Σˇt1)]>nKt29Δt2.\displaystyle\|(X,\mathsf{X})\|^{2}_{(\tilde{\Sigma}_{t}^{-1}-\check{\Sigma}_{t}^{-1})}-\mathbb{E}\big{[}\|(X,\mathsf{X})\|^{2}_{(\tilde{\Sigma}_{t}^{-1}-\check{\Sigma}_{t}^{-1})}\big{]}>nK_{t}^{29}\Delta_{t}^{2}\,.

Recalling the definitions of L^{(t)},\mathsf{L}^{(t)}, we have that (X-L^{(t)},\mathsf{X}-\mathsf{L}^{(t)}) is a mean-zero Gaussian vector. This motivates us to write

\displaystyle\|(X,\mathsf{X})\|^{2}_{(\tilde{\Sigma}_{t}^{-1}-\check{\Sigma}_{t}^{-1})}-\mathbb{E}\big{[}\|(X,\mathsf{X})\|^{2}_{(\tilde{\Sigma}_{t}^{-1}-\check{\Sigma}_{t}^{-1})}\big{]}=2\langle(L^{(t)},\mathsf{L}^{(t)}),(X-L^{(t)},\mathsf{X}-\mathsf{L}^{(t)})\rangle_{(\tilde{\Sigma}_{t}^{-1}-\check{\Sigma}_{t}^{-1})}
+\|(X-L^{(t)},\mathsf{X}-\mathsf{L}^{(t)})\|^{2}_{(\tilde{\Sigma}_{t}^{-1}-\check{\Sigma}_{t}^{-1})}-\mathbb{E}\big{[}\|(X-L^{(t)},\mathsf{X}-\mathsf{L}^{(t)})\|^{2}_{(\tilde{\Sigma}_{t}^{-1}-\check{\Sigma}_{t}^{-1})}\big{]}\,.

By Claim 3.33 and Corollary 3.36, \langle(L^{(t)},\mathsf{L}^{(t)}),(X-L^{(t)},\mathsf{X}-\mathsf{L}^{(t)})\rangle_{(\tilde{\Sigma}_{t}^{-1}-\check{\Sigma}_{t}^{-1})} is a Gaussian variable with mean 0 and variance O(nK_{t}^{12}\Delta_{t}^{2}). Then,

((gt1Y~t,gt1𝖸~t)𝒰(II)|t1)(𝖯1+𝖯2),\displaystyle\mathbb{P}((g_{t-1}\tilde{Y}_{t},g_{t-1}\tilde{\mathsf{Y}}_{t})\in\mathcal{U}^{(II)}|\mathcal{F}_{t-1})\leq(\mathsf{P}_{1}+\mathsf{P}_{2})\,, (3.82)

where \mathsf{P}_{1}=\mathbb{P}\big{(}2\langle(L^{(t)},\mathsf{L}^{(t)}),(X-L^{(t)},\mathsf{X}-\mathsf{L}^{(t)})\rangle_{(\tilde{\Sigma}_{t}^{-1}-\check{\Sigma}_{t}^{-1})}>nK_{t}^{28}\Delta_{t}^{2}\big{)}\leq\exp\{-nK_{t}^{28}\Delta_{t}^{2}\} and

𝖯2=((XL(t),𝖷𝖫(t))2(Σ~t1Σˇt1)𝔼[(XL(t),𝖷𝖫(t))2(Σ~t1Σˇt1)]nKt28Δt2).\displaystyle\mathsf{P}_{2}=\mathbb{P}\big{(}\|(X-L^{(t)},\mathsf{X}-\mathsf{L}^{(t)})\|^{2}_{(\tilde{\Sigma}_{t}^{-1}-\check{\Sigma}_{t}^{-1})}-\mathbb{E}\big{[}\|(X-L^{(t)},\mathsf{X}-\mathsf{L}^{(t)})\|^{2}_{(\tilde{\Sigma}_{t}^{-1}-\check{\Sigma}_{t}^{-1})}\big{]}\geq nK_{t}^{28}\Delta_{t}^{2}\big{)}\,.

It remains to bound 𝖯2\mathsf{P}_{2}. To this end, we see that there exist a linear transform 𝐓t\mathbf{T}_{t} and a standard normal random vector (Ut,𝖴t)(U_{t},\mathsf{U}_{t}) such that

(XL(t),𝖷𝖫(t))=𝐓t(Ut,𝖴t)\displaystyle(X-L^{(t)},\mathsf{X}-\mathsf{L}^{(t)})=\mathbf{T}_{t}(U_{t},\mathsf{U}_{t})

and 𝐓t𝐓t=Σ~t\mathbf{T}_{t}^{*}\mathbf{T}_{t}=\tilde{\Sigma}_{t} (so in particular Σ~top=𝐓t2op\|\tilde{\Sigma}_{t}\|_{\mathrm{op}}=\|\mathbf{T}_{t}\|^{2}_{\mathrm{op}}). Thus,

(XL(t),𝖷𝖫(t))2(Σ~t1Σˇt1)=(Ut,𝖴t)2𝐓t(Σ~t1Σˇt1)𝐓t\displaystyle\|(X-L^{(t)},\mathsf{X}-\mathsf{L}^{(t)})\|^{2}_{(\tilde{\Sigma}_{t}^{-1}-\check{\Sigma}_{t}^{-1})}=\|(U_{t},\mathsf{U}_{t})\|^{2}_{\mathbf{T}_{t}(\tilde{\Sigma}_{t}^{-1}-\check{\Sigma}_{t}^{-1})\mathbf{T}_{t}^{*}}

is a quadratic form of a standard Gaussian vector. We also have the following estimate:

𝐓t(Σ~t1Σˇt1)𝐓top𝐓t2opΣ~t1Σˇt1op=Σ~topΣ~t1Σˇt1opKt20,\displaystyle\|\mathbf{T}_{t}(\tilde{\Sigma}_{t}^{-1}-\check{\Sigma}_{t}^{-1})\mathbf{T}_{t}^{*}\|_{\mathrm{op}}\leq\|\mathbf{T}_{t}\|^{2}_{\mathrm{op}}\|\tilde{\Sigma}_{t}^{-1}-\check{\Sigma}_{t}^{-1}\|_{\mathrm{op}}=\|\tilde{\Sigma}_{t}\|_{\mathrm{op}}\|\tilde{\Sigma}_{t}^{-1}-\check{\Sigma}_{t}^{-1}\|_{\mathrm{op}}\leq K_{t}^{20}\,,

where the second inequality follows from Corollaries 3.36 and 3.37. In addition, we have

\displaystyle\|\mathbf{T}_{t}(\tilde{\Sigma}_{t}^{-1}-\check{\Sigma}_{t}^{-1})\mathbf{T}_{t}^{*}\|_{\mathrm{HS}}^{2}\leq\|\mathbf{T}_{t}\|^{4}_{\mathrm{op}}\|\tilde{\Sigma}_{t}^{-1}-\check{\Sigma}_{t}^{-1}\|_{\mathrm{HS}}^{2}=\|\tilde{\Sigma}_{t}\|_{\mathrm{op}}^{2}\|\tilde{\Sigma}_{t}^{-1}-\check{\Sigma}_{t}^{-1}\|_{\mathrm{HS}}^{2}\leq nK_{t}^{24}\Delta_{t}^{2}\,,

where the first inequality follows from Lemma 3.31 and the second inequality follows from Corollaries 3.36 and 3.37. We can then apply Lemma 3.4 and obtain that

𝖯22exp{Ω(1)min(nKt28Δt2Kt20,(nKt28Δt2)2nKt24Δt2)}2exp{Ω(nKt8Δt2)}.\displaystyle\mathsf{P}_{2}\leq 2\exp\Big{\{}-\Omega(1)\min\Big{(}\frac{nK_{t}^{28}\Delta_{t}^{2}}{K_{t}^{20}},\frac{(nK_{t}^{28}\Delta_{t}^{2})^{2}}{nK_{t}^{24}\Delta_{t}^{2}}\Big{)}\Big{\}}\leq 2\exp\{-\Omega(nK_{t}^{8}\Delta_{t}^{2})\}\,.
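As an illustration of the quadratic-form concentration just invoked, one can check a Hanson–Wright-type tail bound by simulation. The sketch below is ours and purely illustrative (the test matrix, sample size, and the constant 1/4 are placeholders, not quantities from Lemma 3.4):

```python
import numpy as np

# Monte Carlo sanity check (illustrative only, not part of the proof): for a
# standard Gaussian vector U and a symmetric matrix M, a Hanson--Wright-type
# inequality bounds P(|U^T M U - tr(M)| >= s) by
# 2 exp(-c * min(s^2 / ||M||_HS^2, s / ||M||_op)); here c = 1/4 is a placeholder.
rng = np.random.default_rng(0)
d, n_samples = 200, 5000
A = rng.standard_normal((d, d))
M = (A + A.T) / 2.0                                  # symmetric test matrix
hs2 = float(np.sum(M ** 2))                          # ||M||_HS^2
op = float(np.max(np.abs(np.linalg.eigvalsh(M))))    # ||M||_op

s = 3.0 * np.sqrt(hs2)                               # deviation level
U = rng.standard_normal((n_samples, d))
quad = np.einsum('ij,jk,ik->i', U, M, U)             # U^T M U for each sample
emp_tail = float(np.mean(np.abs(quad - np.trace(M)) >= s))
bound = 2.0 * np.exp(-0.25 * min(s ** 2 / hs2, s / op))
print(f"empirical tail {emp_tail:.4f} <= Hanson-Wright-style bound {bound:.4f}")
```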

Plugging the estimates of \mathsf{P}_{1},\mathsf{P}_{2} into (3.82), we get that the left hand side of (3.82) is bounded by 3\exp\{-\Omega(nK_{t}^{8}\Delta_{t}^{2})\}. Combined with (3.81) and Lemma 3.16, this yields that

((gt1Yt,gt1𝖸t)𝒰(I)𝒰(II);𝒞|𝔖t1;BADt=Bt)exp{nΔt5nKt8Δt2}.\displaystyle\mathbb{P}((g_{t-1}Y_{t},g_{t-1}\mathsf{Y}_{t})\in\mathcal{U}^{(I)}\cup\mathcal{U}^{(II)};\mathcal{C}|\mathfrak{S}_{t-1};\mathrm{BAD}_{t}=\mathrm{B}_{t})\leq\exp\{n\Delta_{t}^{5}-nK_{t}^{8}\Delta_{t}^{2}\}\,. (3.83)

Combined with (3.80), this yields (by recalling (3.1) and noting that \mathcal{B}_{t}^{c}\subset\mathcal{C}^{c}\cup\{(g_{t-1}Y_{t},g_{t-1}\mathsf{Y}_{t})\in\mathcal{U}^{(I)}\cup\mathcal{U}^{(II)}\}) that

(tc;𝒜t1,t1,𝒯t,t,t)exp{12n1logloglogn}.\displaystyle\mathbb{P}(\mathcal{B}_{t}^{c};\mathcal{A}_{t-1},\mathcal{B}_{t-1},\mathcal{T}_{t},\mathcal{E}_{t},\mathcal{H}_{t})\leq\exp\{-\tfrac{1}{2}n^{\frac{1}{\log\log\log n}}\}\,. (3.84)

3.6.3 Step 3: 𝒜t\mathcal{A}_{t}

It is straightforward to bound the probability of \mathcal{A}_{t}^{c} on the event \mathcal{B}_{t}. Indeed,

(vBADt(gt1Yt(k,v))2>100n;t)exp{nKt30Δt2}(vBADtYˇt(k,v)2>100n),\displaystyle\mathbb{P}\Big{(}\sum_{v\not\in\mathrm{BAD}_{t}}(g_{t-1}Y_{t}(k,v))^{2}>100n;\mathcal{B}_{t}\Big{)}\leq\exp\{nK_{t}^{30}\Delta_{t}^{2}\}\cdot\mathbb{P}\Big{(}\sum_{v\not\in\mathrm{BAD}_{t}}\check{Y}_{t}(k,v)^{2}>100n\Big{)}\,,

where the latter probability is bounded by e^{-2n} using a Chernoff bound. Thus, applying a union bound over k we have

(𝒜tc;t)Ktexp{n}.\displaystyle\mathbb{P}(\mathcal{A}_{t}^{c};\mathcal{B}_{t})\leq K_{t}\exp\{-n\}\,. (3.85)
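For completeness, here is the standard chi-square Chernoff computation behind the bound e^{-2n} (a routine verification; the choice \lambda=1/4 is ours). For i.i.d. Z_{i}\sim N(0,1),

\mathbb{P}\Big{(}\sum_{i=1}^{n}Z_{i}^{2}>100n\Big{)}\leq e^{-100\lambda n}\big{(}\mathbb{E}e^{\lambda Z_{1}^{2}}\big{)}^{n}=e^{-25n}(1-2\lambda)^{-n/2}\Big{|}_{\lambda=1/4}=e^{-25n}2^{n/2}\leq e^{-24n}\,,

which applies a fortiori to the sum over v\not\in\mathrm{BAD}_{t} (a sum of at most n terms).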

3.6.4 Step 4: t+1\mathcal{E}_{t+1}

Recall Definition 3.1 and (3.3). The goal of this subsection is to prove

(t+1c;𝒜t,t,t,t,𝒯t)2Kt2exp{nΔt2}.\mathbb{P}(\mathcal{E}_{t+1}^{c};\mathcal{A}_{t},\mathcal{B}_{t},\mathcal{E}_{t},\mathcal{H}_{t},\mathcal{T}_{t})\leq 2K_{t}^{2}\exp\{-n\Delta_{t}^{2}\}\,. (3.86)

To this end, we will verify Conditions (i.)–(x.) in Definition 3.1. Since (i.), (ii.) and (iii.) are controlled by (3.26), we focus on the other conditions. In what follows, we always assume that \mathcal{A}_{t},\mathcal{B}_{t},\mathcal{E}_{t},\mathcal{H}_{t} and \mathcal{T}_{t} hold. Crucially, thanks to \mathcal{B}_{t} we will reduce our analysis of events concerning \{W^{(t)}_{v}(k)+\langle\eta^{(t)}_{k},g_{t-1}{D}^{(t)}_{v}\rangle,\mathsf{W}^{(t)}_{\pi(v)}(k)+\langle\eta^{(t)}_{k},g_{t-1}\mathsf{D}^{(t)}_{\pi(v)}\rangle:1\leq k\leq\frac{K_{t}}{12},v\not\in\mathrm{BAD}_{t}\} under the conditioning on \mathfrak{S}_{t-1} and \mathrm{BAD}_{t}=\mathrm{B}_{t} to the corresponding events for \check{\mathfrak{F}}_{t} (recalling (3.77)); the latter are much easier to estimate. To be more precise, note that

{|Γ(t+1)kBADt|n𝔞>𝔞Δt+1}\displaystyle\Big{\{}\frac{|\Gamma^{(t+1)}_{k}\setminus\mathrm{BAD}_{t}|}{n}-\mathfrak{a}>\mathfrak{a}\Delta_{t+1}\Big{\}} (3.87)
=\displaystyle= {1nvVBADt(𝟏{|12(12/Ktβ(t)k,W(t)v+σ(t)k,D(t)v)|10}𝔞)𝔞Δt+1}.\displaystyle\Big{\{}\frac{1}{n}\sum_{v\in V\setminus\mathrm{BAD}_{t}}\Big{(}\mathbf{1}_{\{|\frac{1}{\sqrt{2}}(\sqrt{12/K_{t}}\langle\beta^{(t)}_{k},W^{(t)}_{v}\rangle+\langle\sigma^{(t)}_{k},D^{(t)}_{v}\rangle)|\geq 10\}}-\mathfrak{a}\Big{)}\geq\mathfrak{a}\Delta_{t+1}\Big{\}}\,.

For vBADtv\not\in\mathrm{BAD}_{t}, by (3.7) we have bt1D(t)vKte10(loglogn)10Δt10\|b_{t-1}{D}^{(t)}_{v}\|\leq K_{t}e^{-10(\log\log n)^{10}}\leq\Delta_{t}^{10}, and as a result |σ(t)k,D(t)vσ(t)k,gt1D(t)v|KtΔt10Δt2|\langle\sigma^{(t)}_{k},D^{(t)}_{v}\rangle-\langle\sigma^{(t)}_{k},g_{t-1}{D}^{(t)}_{v}\rangle|\leq K_{t}\Delta_{t}^{10}\ll\Delta_{t}^{2}. Thus, under the conditioning of 𝔖t1\mathfrak{S}_{t-1} and BADt=Bt\mathrm{BAD}_{t}=\mathrm{B}_{t} we have

(3.87){1nvVBADt(𝟏{|12(12/Ktβ(t)k,W(t)v+σ(t)k,gt1D(t)v)|10Δt2}𝔞)𝔞Δt+12}.\displaystyle\eqref{eq-recall-Gamma-Bad-sum}\subset\Big{\{}\frac{1}{n}\sum_{v\in V\setminus\mathrm{BAD}_{t}}\Big{(}\mathbf{1}_{\{|\frac{1}{\sqrt{2}}(\sqrt{12/K_{t}}\langle\beta^{(t)}_{k},W^{(t)}_{v}\rangle+\langle\sigma^{(t)}_{k},g_{t-1}{D}^{(t)}_{v}\rangle)|\geq 10-\Delta_{t}^{2}\}}-\mathfrak{a}\Big{)}\geq\frac{\mathfrak{a}\Delta_{t+1}}{2}\Big{\}}\,.

Therefore, the conditional probability of (3.87) can be bounded by the conditional probability of the right hand side in the preceding inequality under the conditioning of 𝔖t1\mathfrak{S}_{t-1} and BADt=Bt\mathrm{BAD}_{t}=\mathrm{B}_{t}. We will tilt the measure to the same event on 𝔉ˇt\check{\mathfrak{F}}_{t} (as explained earlier), and on t\mathcal{B}_{t} we know this tilting loses at most a factor of exp{nKt30Δt2}\exp\{nK_{t}^{30}\Delta_{t}^{2}\}. Also on 𝒯t\mathcal{T}_{t} we have |BADt|n𝔞Δt+1\frac{|\mathrm{BAD}_{t}|}{n}\ll\mathfrak{a}\Delta_{t+1}. Therefore, the conditional probability of (3.87) is bounded by the following probability up to a factor of exp{nKt30Δt2}\exp\{nK_{t}^{30}\Delta_{t}^{2}\}:

(1nvVBt(𝟏{|12(12/Ktβ(t)k,W~(t)v+σ(t)k,Dˇ(t)v)|10Δt2}𝔞)>𝔞Δt+12).\displaystyle\mathbb{P}\Big{(}\frac{1}{n}\sum_{v\in V\setminus\mathrm{B}_{t}}\Big{(}\mathbf{1}_{\{|\frac{1}{\sqrt{2}}(\sqrt{12/K_{t}}\langle\beta^{(t)}_{k},\tilde{W}^{(t)}_{v}\rangle+\langle\sigma^{(t)}_{k},\check{D}^{(t)}_{v}\rangle)|\geq 10-\Delta_{t}^{2}\}}-\mathfrak{a}\Big{)}>\frac{\mathfrak{a}\Delta_{t+1}}{2}\Big{)}\,. (3.88)

Since {12(12/Ktβ(t)k,W~(t)v+σ(t)k,Dˇ(t)v):vVBt}\{\frac{1}{\sqrt{2}}(\sqrt{12/K_{t}}\langle\beta^{(t)}_{k},\tilde{W}^{(t)}_{v}\rangle+\langle\sigma^{(t)}_{k},\check{D}^{(t)}_{v}\rangle):v\in V\setminus\mathrm{B}_{t}\} is a collection of i.i.d. standard normal variables, we have (3.88)exp{n𝔞2Δt+1210}\eqref{eq-checker-probability-Gamma}\leq\exp\{-\frac{n\mathfrak{a}^{2}\Delta_{t+1}^{2}}{10}\}, and thus

((3.87)𝔖t1;BADt=Bt)exp{n𝔞2Δt+1220}.\displaystyle\mathbb{P}(\eqref{eq-recall-Gamma-Bad-sum}\mid\mathfrak{S}_{t-1};\mathrm{BAD}_{t}=\mathrm{B}_{t})\leq\exp\{-\frac{n\mathfrak{a}^{2}\Delta_{t+1}^{2}}{20}\}\,.
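For the record, one standard route to the last two displays is as follows (a sketch; we ignore the negligible shift of the threshold from 10 to 10-\Delta_{t}^{2}, and we use, as in the rest of this subsection, that the tilting cost nK_{t}^{30}\Delta_{t}^{2} is negligible compared to n\mathfrak{a}^{2}\Delta_{t+1}^{2}). By the multiplicative Chernoff bound for i.i.d. Bernoulli(\mathfrak{a}) indicators B_{i},

\mathbb{P}\Big{(}\sum_{i=1}^{n}(B_{i}-\mathfrak{a})\geq\tfrac{1}{2}n\mathfrak{a}\Delta_{t+1}\Big{)}\leq\exp\Big{\{}-\frac{n\mathfrak{a}\Delta_{t+1}^{2}}{12}\Big{\}}\leq\exp\Big{\{}-\frac{n\mathfrak{a}^{2}\Delta_{t+1}^{2}}{10}\Big{\}}\,,

where the last inequality uses \mathfrak{a}\leq 5/6; absorbing the tilting factor \exp\{nK_{t}^{30}\Delta_{t}^{2}\} then at most halves the exponent, as in the display above.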

A lower-deviation bound for \frac{|\Gamma^{(t+1)}_{k}|}{n}-\mathfrak{a} can be derived similarly, completing the verification of (iv.). The bounds on \frac{|\Gamma^{(t+1)}_{k}\cap\Gamma^{(t+1)}_{l}|}{n}, \frac{|\Pi^{(t+1)}_{k}\cap\Pi^{(t+1)}_{l}|}{n} and \frac{|\pi(\Gamma^{(t+1)}_{k})\cap\Pi^{(t+1)}_{l}|}{n} (which correspond to (v.), (vi.) and (vii.) respectively) can be proved similarly.

Furthermore, we bound \frac{|\pi(\Gamma^{(t+1)}_{k})\cap\Pi^{(s)}_{l}|}{n},\frac{|\Gamma^{(t+1)}_{k}\cap\Gamma^{(s)}_{l}|}{n},\frac{|\Pi^{(t+1)}_{k}\cap\Pi^{(s)}_{l}|}{n} for s\leq t (which correspond to the remaining conditions (viii.), (ix.) and (x.) respectively). Note that under \mathfrak{S}_{t-1}, the \Pi^{(s)}_{l}'s are fixed subsets for s\leq t. In addition, on the event \mathcal{E}_{t} we have \big{|}\frac{|\Pi^{(s)}_{l}|}{n}-\mathfrak{a}_{s}\big{|}<\mathfrak{a}_{s}\Delta_{s}. Thus,

|Γ(t+1)kΠ(s)l|n𝔞t+1𝔞s=𝔞s(1𝔞snuΠ(s)l(𝟏uΓ(t+1)k𝔞t+1))+𝔞t+1(|Π(s)l|n𝔞s).\displaystyle\frac{|\Gamma^{(t+1)}_{k}\cap\Pi^{(s)}_{l}|}{n}-\mathfrak{a}_{t+1}\mathfrak{a}_{s}=\mathfrak{a}_{s}\Big{(}\frac{1}{\mathfrak{a}_{s}n}\sum_{u\in\Pi^{(s)}_{l}}\Big{(}\mathbf{1}_{u\in\Gamma^{(t+1)}_{k}}-\mathfrak{a}_{t+1}\Big{)}\Big{)}+\mathfrak{a}_{t+1}\Big{(}\frac{|\Pi^{(s)}_{l}|}{n}-\mathfrak{a}_{s}\Big{)}\,.

Since \big{|}\mathfrak{a}_{t+1}\big{(}\frac{|\Pi^{(s)}_{l}|}{n}-\mathfrak{a}_{s}\big{)}\big{|}\leq\mathfrak{a}_{t+1}\mathfrak{a}_{s}\Delta_{s} on the event \mathcal{E}_{t}, the above can be bounded similarly to the bound for \frac{|\Gamma^{(t+1)}_{k}|}{n}-\mathfrak{a}. The same applies to the other two items here; we omit further details since the modifications are minor.

Putting the above together completes the proof of (3.86).

3.6.5 Step 5: t+1\mathcal{H}_{t+1}

We assume that 𝒜t,t,t,𝒯t,t+1\mathcal{A}_{t},\mathcal{B}_{t},\mathcal{H}_{t},\mathcal{T}_{t},\mathcal{E}_{t+1} hold throughout this subsection without further notice. Thus, we have

vBADt+1|PROJ(η(t+1)k,gtD(t+1)v)PROJ(η(t+1)k,gtD(t+1)v)|2\displaystyle\sum_{v\not\in\mathrm{BAD}_{t+1}}\big{|}{\mathrm{PROJ}}^{\prime}(\langle\eta^{(t+1)}_{k},g_{t}D^{(t+1)}_{v}\rangle)-\mathrm{PROJ}(\langle\eta^{(t+1)}_{k},g_{t}D^{(t+1)}_{v}\rangle)\big{|}^{2}
\displaystyle\leq\ Δt+12s,kvBADtη(s)k,gs1D(s)v2+nΔt+122nKt+13Δt+12,\displaystyle\Delta_{t+1}^{2}\sum_{s,k}\sum_{v\not\in\mathrm{BAD}_{t}}\langle\eta^{(s)}_{k},g_{s-1}D^{(s)}_{v}\rangle^{2}+n\Delta_{t+1}^{2}\leq 2nK_{t+1}^{3}\Delta_{t+1}^{2}\,,

where the first inequality follows from Lemma 3.26 and the second inequality relies on our assumption that 𝒜t\mathcal{A}_{t} holds. In light of this, to show t+1\mathcal{H}_{t+1} it suffices to bound for each kk the conditional probability given 𝔖t1\mathfrak{S}_{t-1} and BADt\mathrm{BAD}_{t} of the event

vBADt|PROJ(η(t+1)k,gtD(t+1)v)|2>14nKt+16Δt+12.\displaystyle\sum_{v\not\in\mathrm{BAD}_{t}}\big{|}{\mathrm{PROJ}}^{\prime}(\langle\eta^{(t+1)}_{k},g_{t}D^{(t+1)}_{v}\rangle)\big{|}^{2}>\frac{1}{4}nK_{t+1}^{6}\Delta_{t+1}^{2}\,. (3.89)

For notational convenience, in the rest of this subsection we will abbreviate \mathbb{P}(\cdot\mid\mathfrak{S}_{t-1};\mathrm{BAD}_{t}) and \mathbb{E}(\cdot\mid\mathfrak{S}_{t-1};\mathrm{BAD}_{t}) as \mathbb{P} and \mathbb{E}. In order to bound (3.89), we expand the matrix product in (3.65) into a summation as follows (below we write [gY,g\mathsf{Y}]_{t}(r,l,w)=[gY]_{t}(r,l,w) and [gY,g\mathsf{Y}]_{t}(r,l,\pi(w))=[g\mathsf{Y}]_{t}(r,l,\pi(w)) for w\not\in\mathrm{BAD}_{t}):

s,r=1tl=1Krm=1Ksτ1,τ2𝐉t+1((k,v);(s,m,τ1))𝐏t((s,m,τ1);(r,l,τ2))[gY,g𝖸]t(r,l,τ2)\displaystyle\sum_{s,r=1}^{t}\sum_{l=1}^{K_{r}}\sum_{m=1}^{K_{s}}\sum_{\tau_{1},\tau_{2}}\mathbf{J}_{t+1}((k,v);(s,m,\tau_{1}))\mathbf{P}_{t}((s,m,\tau_{1});(r,l,\tau_{2}))[gY,g\mathsf{Y}]_{t}(r,l,\tau_{2}) (3.90)

where the summation is taken over \tau_{1},\tau_{2}\in\big{(}V\setminus\mathrm{BAD}_{t}\big{)}\cup\big{(}\mathsf{V}\setminus\pi(\mathrm{BAD}_{t})\big{)}. So we know that \sum_{v}\big{|}\mathrm{PROJ}^{\prime}(\langle\eta^{(t+1)}_{k},g_{t}D^{(t+1)}_{v}\rangle)\big{|}^{2} is bounded, up to a factor of 4K_{t}^{2}, by the maximum of

v|τ1𝒱1,τ2𝒱2𝐉t+1((k,v);(s,m,τ1))𝐏t((s,m,τ1);(r,l,τ2))[gY,g𝖸]t(r,l,τ2)|2,\displaystyle\sum_{v}\Big{|}\sum_{\tau_{1}\in\mathcal{V}_{1},\tau_{2}\in\mathcal{V}_{2}}\mathbf{J}_{t+1}((k,v);(s,m,\tau_{1}))\mathbf{P}_{t}((s,m,\tau_{1});(r,l,\tau_{2}))[gY,g\mathsf{Y}]_{t}(r,l,\tau_{2})\Big{|}^{2}\,,

where the maximum is taken over s,rt,1mKs,1lKr,𝒱i{VBADt,𝖵π(BADt)}s,r\leq t,1\leq m\leq K_{s},1\leq l\leq K_{r},\mathcal{V}_{i}\in\big{\{}V\setminus\mathrm{BAD}_{t},\mathsf{V}\setminus\pi(\mathrm{BAD}_{t})\big{\}}. Thus, it suffices to bound each term in the maximum. For simplicity, we only demonstrate how to bound terms of the form:

v|u,wVBADt𝐉t+1((k,v);(s,l,π(u)))𝐏t((s,m,π(u));(r,l,w))[gY,g𝖸]t(r,l,w)|2.\displaystyle\sum_{v}\Big{|}\sum_{u,w\in V\setminus\mathrm{BAD}_{t}}\mathbf{J}_{t+1}((k,v);(s,l,\pi(u)))\mathbf{P}_{t}((s,m,\pi(u));(r,l,w))[gY,g\mathsf{Y}]_{t}(r,l,w)\Big{|}^{2}\,. (3.91)
Lemma 3.38.

We have for all s,r,m,ls,r,m,l

((3.91)12nKt+15Δt+12;𝒜t,t,t,𝒯t,t+1)exp{12nΔt+12}.\displaystyle\mathbb{P}\Big{(}\eqref{equ_one_part_projection}\geq\frac{1}{2}nK_{t+1}^{5}\Delta_{t+1}^{2};\mathcal{A}_{t},\mathcal{B}_{t},\mathcal{H}_{t},\mathcal{T}_{t},\mathcal{E}_{t+1}\Big{)}\leq\exp\big{\{}-\tfrac{1}{2}n\Delta_{t+1}^{2}\big{\}}\,.

Since the other terms can be bounded similarly, Lemma 3.38 and a union bound give \mathbb{P}((3.89);\mathcal{A}_{t},\mathcal{B}_{t},\mathcal{H}_{t},\mathcal{T}_{t},\mathcal{E}_{t+1})\leq 10K_{t}^{2}\exp\{-n\Delta_{t+1}^{2}/2\}. A further union bound over k then yields that

(t+1c;𝒜t,t,t,𝒯t,t+1)20Kt+14exp{12nΔt+12}.\mathbb{P}(\mathcal{H}_{t+1}^{c};\mathcal{A}_{t},\mathcal{B}_{t},\mathcal{H}_{t},\mathcal{T}_{t},\mathcal{E}_{t+1})\leq 20K_{t+1}^{4}\exp\big{\{}-\tfrac{1}{2}n\Delta_{t+1}^{2}\big{\}}\,. (3.92)
Proof of Lemma 3.38.

Recall (3.66). We first divide the summation in (3.91)\eqref{equ_one_part_projection} into three parts 𝒮1,𝒮2,𝒮3\mathcal{S}_{1},\mathcal{S}_{2},\mathcal{S}_{3}, where 𝒮1\mathcal{S}_{1} accounts for the summation over u=vu=v and can be written as

v|w𝔼^[η(t+1)k,gtD~(t+1)vη(s)m,gs1𝖣~(s)π(v)]𝐏t((s,m,π(v));(r,l,w))gr1Yr(l,w)|2,\displaystyle\sum_{v}\Big{|}\sum_{w}\hat{\mathbb{E}}\Big{[}\langle\eta^{(t+1)}_{k},g_{t}\tilde{D}^{(t+1)}_{v}\rangle\langle\eta^{(s)}_{m},g_{s-1}\tilde{\mathsf{D}}^{(s)}_{\pi(v)}\rangle\Big{]}\mathbf{P}_{t}((s,m,\pi(v));(r,l,w))g_{r-1}Y_{r}(l,w)\Big{|}^{2}\,,

and 𝒮2\mathcal{S}_{2} accounts for the summation over uv,r=tu\neq v,r=t and can be written as

v|u,w(𝟏π(v)Π(s)j𝔞s)(𝟏uΓ(t+1)i𝔞t+1)n(𝔞t+1𝔞t+12)(𝔞s𝔞s2)𝐏t((s,m,π(u));(t,l,w))gt1Yt(l,w)|2,\displaystyle\sum_{v}\Big{|}\sum_{u,w}\frac{(\mathbf{1}_{\pi(v)\in\Pi^{(s)}_{j}}-\mathfrak{a}_{s})(\mathbf{1}_{u\in\Gamma^{(t+1)}_{i}}-\mathfrak{a}_{t+1})}{n\sqrt{(\mathfrak{a}_{t+1}-\mathfrak{a}_{t+1}^{2})(\mathfrak{a}_{s}-\mathfrak{a}_{s}^{2})}}\mathbf{P}_{t}((s,m,\pi(u));(t,l,w))g_{t-1}Y_{t}(l,w)\Big{|}^{2}\,,

and 𝒮3\mathcal{S}_{3} accounts for the summation over uv,r<tu\neq v,r<t and can be written as

v|u,w(𝟏π(v)Π(s)j𝔞s)(𝟏uΓ(t+1)i𝔞t+1)n(𝔞t+1𝔞t+12)(𝔞s𝔞s2)𝐏t((s,m,π(u));(r,l,w))gr1Yr(l,w)|2.\displaystyle\sum_{v}\Big{|}\sum_{u,w}\frac{(\mathbf{1}_{\pi(v)\in\Pi^{(s)}_{j}}-\mathfrak{a}_{s})(\mathbf{1}_{u\in\Gamma^{(t+1)}_{i}}-\mathfrak{a}_{t+1})}{n\sqrt{(\mathfrak{a}_{t+1}-\mathfrak{a}_{t+1}^{2})(\mathfrak{a}_{s}-\mathfrak{a}_{s}^{2})}}\mathbf{P}_{t}((s,m,\pi(u));(r,l,w))g_{r-1}Y_{r}(l,w)\Big{|}^{2}\,.

By the Cauchy–Schwarz inequality, we have (3.91)\leq 3(\mathcal{S}_{1}+\mathcal{S}_{2}+\mathcal{S}_{3}). We first bound \mathcal{S}_{1}. Using (3.54), on the event \mathcal{E}_{t+1} we have

𝔼^[η(t+1)k,gtD~(t+1)vη(s)m,gs1𝖣~(s)π(v)]=η(t+1)kPΓ,Π(t+1,s)(η(s)m)+o(Δt+1)Kt+1Δt+1.\displaystyle\hat{\mathbb{E}}\left[\langle\eta^{(t+1)}_{k},g_{t}\tilde{D}^{(t+1)}_{v}\rangle\langle\eta^{(s)}_{m},g_{s-1}\tilde{\mathsf{D}}^{(s)}_{\pi(v)}\rangle\right]=\eta^{(t+1)}_{k}\mathrm{P}_{\Gamma,\Pi}^{(t+1,s)}\left(\eta^{(s)}_{m}\right)^{*}+o(\Delta_{t+1})\leq K_{t+1}\Delta_{t+1}\,.

Thus, recalling 𝐏top100\|\mathbf{P}_{t}\|_{\mathrm{op}}\leq 100 we have

𝒮1\displaystyle\mathcal{S}_{1}\leq\ vKt+12Δt+12|w𝐏t((s,m,π(v));(r,l,w))gr1Yr(l,w)|2\displaystyle\sum_{v}K_{t+1}^{2}\Delta_{t+1}^{2}\Big{|}\sum_{w}\mathbf{P}_{t}((s,m,\pi(v));(r,l,w))g_{r-1}Y_{r}(l,w)\Big{|}^{2}
\displaystyle\leq\ Kt+12Δt+12𝐏t2opw|gr1Yr(l,w)|2𝒜t104Kt+14Δt+12n.\displaystyle K_{t+1}^{2}\Delta_{t+1}^{2}\|\mathbf{P}_{t}\|^{2}_{\mathrm{op}}\sum_{w}\big{|}g_{r-1}Y_{r}(l,w)\big{|}^{2}\overset{\mathcal{A}_{t}}{\leq}10^{4}K_{t+1}^{4}\Delta_{t+1}^{2}n\,. (3.93)

Next we bound 𝒮2\mathcal{S}_{2}. A straightforward calculation yields that

𝒮2=\displaystyle\mathcal{S}_{2}=\ Kt+12((12𝔞s)|Π(s)jBADt|+𝔞s2n)(𝔞𝔞2)(𝔞s𝔞s2)n2\displaystyle\frac{K_{t+1}^{2}((1-2\mathfrak{a}_{s})|\Pi^{(s)}_{j}\setminus\mathrm{BAD}_{t}|+\mathfrak{a}_{s}^{2}n)}{(\mathfrak{a}-\mathfrak{a}^{2})(\mathfrak{a}_{s}-\mathfrak{a}_{s}^{2})n^{2}}
(u,w(𝟏uΓ(t+1)i𝔞)𝐏t((s,m,π(u));(t,l,w))gt1Yt(l,w))2\displaystyle*\Big{(}\sum_{u,w}(\mathbf{1}_{u\in\Gamma^{(t+1)}_{i}}-\mathfrak{a})\mathbf{P}_{t}((s,m,\pi(u));(t,l,w))g_{t-1}Y_{t}(l,w)\Big{)}^{2}
\displaystyle\leq\ Kt+12𝔞n(u,w(𝟏uΓ(t+1)i𝔞)𝐏t((s,m,π(u));(t,l,w))gt1Yt(l,w))2.\displaystyle\frac{K_{t+1}^{2}}{\mathfrak{a}n}\Big{(}\sum_{u,w}(\mathbf{1}_{u\in\Gamma^{(t+1)}_{i}}-\mathfrak{a})\mathbf{P}_{t}((s,m,\pi(u));(t,l,w))g_{t-1}Y_{t}(l,w)\Big{)}^{2}\,.

We tilt the measure on {gt1Yt|𝔖t1;BADt}\{g_{t-1}Y_{t}|\mathfrak{S}_{t-1};\mathrm{BAD}_{t}\} to {Yˇt}\{\check{Y}_{t}\} again. Write Xu=Yˇt(l,u)X_{u}=\check{Y}_{t}(l,u) and 𝚋u=(𝟏{|12(12/Ktβ(t)i,W~(t)u+σ(t)i,Dˇ(t)u+σ(t)i,bt1D(t)u)|10}𝔞)\mathtt{b}_{u}=(\mathbf{1}_{\{|\frac{1}{\sqrt{2}}(\sqrt{12/K_{t}}\langle\beta^{(t)}_{i},\tilde{W}^{(t)}_{u}\rangle+\langle\sigma^{(t)}_{i},\check{D}^{(t)}_{u}\rangle+\langle\sigma^{(t)}_{i},b_{t-1}{D}^{(t)}_{u}\rangle)|\geq 10\}}-\mathfrak{a}). We will first bound

(|u,w𝚋u𝐏t((s,m,π(u));(t,l,w))Xw|>110𝔞Kt+1nΔt+1).\displaystyle\mathbb{P}\Big{(}\big{|}\sum_{u,w}\mathtt{b}_{u}\mathbf{P}_{t}((s,m,\pi(u));(t,l,w))X_{w}\big{|}>\frac{1}{10}\mathfrak{a}K_{t+1}n\Delta_{t+1}\Big{)}\,. (3.94)

Note that

(3.94)\displaystyle\eqref{equ-tail-quadratic}\leq\ (|u𝐏t((s,m,π(u));(t,l,u))𝚋uXu|>130𝔞Kt+1nΔt+1)\displaystyle\mathbb{P}\Big{(}\big{|}\sum_{u}\mathbf{P}_{t}((s,m,\pi(u));(t,l,u))\mathtt{b}_{u}X_{u}\big{|}>\frac{1}{30}\mathfrak{a}K_{t+1}n\Delta_{t+1}\Big{)} (3.95)
+\displaystyle+\ (|uw𝐏t((s,m,π(u));(t,l,w))𝔼[𝚋u]Xw|>130𝔞Kt+1nΔt+1)\displaystyle\mathbb{P}\Big{(}\big{|}\sum_{u\neq w}\mathbf{P}_{t}((s,m,\pi(u));(t,l,w))\mathbb{E}[\mathtt{b}_{u}]X_{w}\big{|}>\frac{1}{30}\mathfrak{a}K_{t+1}n\Delta_{t+1}\Big{)} (3.96)
+\displaystyle+\ (|uw𝐏t((s,m,π(u));(t,l,w))(𝚋u𝔼[𝚋u])Xw|>130𝔞Kt+1nΔt+1).\displaystyle\mathbb{P}\Big{(}\big{|}\sum_{u\neq w}\mathbf{P}_{t}((s,m,\pi(u));(t,l,w))(\mathtt{b}_{u}-\mathbb{E}[\mathtt{b}_{u}])X_{w}\big{|}>\frac{1}{30}\mathfrak{a}K_{t+1}n\Delta_{t+1}\Big{)}\,. (3.97)

To bound (3.95), note that the \mathtt{b}_{u}X_{u}'s for different u are independent, and the entries of \mathbf{P}_{t} are bounded by 100 (since \|\mathbf{P}_{t}\|_{\mathrm{op}}\leq 100). Also, we have |\langle\sigma^{(t)}_{i},b_{t-1}{D}^{(t)}_{u}\rangle|\leq K_{t}\Delta_{t}^{10} for u\not\in\mathrm{BIAS}_{t}, and thus |\mathbb{E}[\mathtt{b}_{u}X_{u}]|\lesssim K_{t}\Delta_{t}^{10} (this is the place where we use the symmetry gained from taking absolute values, as discussed below (2.29); in fact |\mathbb{E}[\mathtt{b}_{u}X_{u}]| would have been 0 if \langle\sigma^{(t)}_{i},b_{t-1}{D}^{(t)}_{u}\rangle were 0). We then get that (3.95)\leq\exp\{-\Omega(n\mathfrak{a}^{2}K_{t+1}^{2}\Delta_{t+1}^{2})\} by a Chernoff bound. To bound (3.96), note that \sum_{u\neq w}\mathbf{P}_{t}((s,m,\pi(u));(t,l,w))\mathbb{E}[\mathtt{b}_{u}]X_{w} is a mean-zero Gaussian variable, with variance bounded by

w(uw𝐏t((s,m,π(u));(t,l,w))𝔼[𝚋u])2𝐏top2u(𝔼[𝚋u])2nKt2Δt20\displaystyle\sum_{w}\Big{(}\sum_{u\neq w}\mathbf{P}_{t}((s,m,\pi(u));(t,l,w))\mathbb{E}[\mathtt{b}_{u}]\Big{)}^{2}\leq\|\mathbf{P}_{t}\|_{\mathrm{op}}^{2}\sum_{u}(\mathbb{E}[\mathtt{b}_{u}])^{2}\leq nK_{t}^{2}\Delta_{t}^{20}

using |\mathbb{E}[\mathtt{b}_{u}]|\leq K_{t}\Delta_{t}^{10} again. Thus we get (3.96)\leq\exp\{-\frac{(n\Delta_{t+1})^{2}}{nK_{t+1}^{2}\Delta_{t}^{20}}\}\leq\exp\{-n\}. We now bound (3.97). Note that \{\mathtt{b}_{u}-\mathbb{E}[\mathtt{b}_{u}],X_{u}\} are mean-zero sub-Gaussian variables, and are independent of \{\mathtt{b}_{w}-\mathbb{E}[\mathtt{b}_{w}],X_{w}:w\neq u\}. In addition, we have \|\mathbf{P}_{t}\|_{\mathrm{op}}\leq 100 and \|\mathbf{P}_{t}\|^{2}_{\mathrm{HS}}\leq nK_{t}\|\mathbf{P}_{t}\|_{\mathrm{op}}^{2}\leq K_{t}^{2}n. Thus, we can apply Lemma 3.5 and get that

(3.97)2exp{Ω(1)min((𝔞Kt+1nΔt+1)2𝐏t2HS,𝔞Kt+1nΔt+1𝐏top)}2exp{Ω(nΔt+12Kt+1)}.\displaystyle\eqref{equ-tail-quadratic-part-III}\leq 2\exp\Big{\{}-\Omega(1)\min\Big{(}\frac{(\mathfrak{a}K_{t+1}n\Delta_{t+1})^{2}}{\|\mathbf{P}_{t}\|^{2}_{\mathrm{HS}}},\frac{\mathfrak{a}K_{t+1}n\Delta_{t+1}}{\|\mathbf{P}_{t}\|_{\mathrm{op}}}\Big{)}\Big{\}}\leq 2\exp\{-\Omega(n\Delta_{t+1}^{2}K_{t+1})\}\,.

Combining the bounds on (3.95), (3.96) and (3.97), we have (3.94)=O(e^{-\Omega(n\Delta_{t+1}^{2}K_{t+1})}). Thus, recalling the definition of \mathcal{B}_{t} and averaging over \mathfrak{S}_{t-1} and \mathrm{BAD}_{t}, we have that

(𝒮2>110Kt+14nΔt+12)exp{nΔt+12}.\displaystyle\mathbb{P}(\mathcal{S}_{2}>\tfrac{1}{10}K_{t+1}^{4}n\Delta_{t+1}^{2})\leq\exp\{-n\Delta_{t+1}^{2}\}\,. (3.98)

It remains to bound \mathcal{S}_{3}. Again, using the Cauchy–Schwarz inequality we have

𝒮3Kt+12𝔞n(u,w(𝟏uΓ(t+1)i𝔞)𝐏t((s,m,π(u));(r,l,w))gr1Yr(l,w))2.\displaystyle\mathcal{S}_{3}\leq\frac{K_{t+1}^{2}}{\mathfrak{a}n}\Big{(}\sum_{u,w}(\mathbf{1}_{u\in\Gamma^{(t+1)}_{i}}-\mathfrak{a})\mathbf{P}_{t}((s,m,\pi(u));(r,l,w))g_{r-1}Y_{r}(l,w)\Big{)}^{2}\,.

Again, by tilting the measure to {Yˇt}\{\check{Y}_{t}\}, we get that (𝒮3>110Kt+14nΔt+12)\mathbb{P}(\mathcal{S}_{3}>\frac{1}{10}K_{t+1}^{4}n\Delta_{t+1}^{2}) is bounded by (recall from above that 𝚋u=(𝟏{|12(12/Ktβ(t)i,W~(t)u+σ(t)i,Dˇ(t)u+σ(t)i,bt1D(t)u)|10}𝔞)\mathtt{b}_{u}=(\mathbf{1}_{\{|\frac{1}{\sqrt{2}}(\sqrt{12/K_{t}}\langle\beta^{(t)}_{i},\tilde{W}^{(t)}_{u}\rangle+\langle\sigma^{(t)}_{i},\check{D}^{(t)}_{u}\rangle+\langle\sigma^{(t)}_{i},b_{t-1}{D}^{(t)}_{u}\rangle)|\geq 10\}}-\mathfrak{a}))

exp{nKt30Δt2}(\displaystyle\exp\{nK_{t}^{30}\Delta_{t}^{2}\}\cdot\mathbb{P}\Big{(} u,w𝚋u𝐏t((s,m,π(u));(t,l,w))gr1Yr(l,w)>110𝔞nKt+1Δt+1).\displaystyle\sum_{u,w}\mathtt{b}_{u}\mathbf{P}_{t}((s,m,\pi(u));(t,l,w))g_{r-1}Y_{r}(l,w)>\frac{1}{10}\mathfrak{a}nK_{t+1}\Delta_{t+1}\Big{)}\,.

We can write u,w𝚋u𝐏t((s,m,π(u));(t,l,w))gr1Yr(l,w)=uλu𝚋u\sum_{u,w}\mathtt{b}_{u}\mathbf{P}_{t}((s,m,\pi(u));(t,l,w))g_{r-1}Y_{r}(l,w)=\sum_{u}\lambda_{u}\mathtt{b}_{u}, where λu\lambda_{u} is given by λu=w𝐏t((s,m,π(u));(t,l,w))gr1Yr(l,w)\lambda_{u}=\sum_{w}\mathbf{P}_{t}((s,m,\pi(u));(t,l,w))g_{r-1}Y_{r}(l,w). We see that λu\lambda_{u}’s satisfy that

uλu2\displaystyle\sum_{u}\lambda_{u}^{2} =u(w𝐏t((s,m,π(u));(t,l,w))gr1Yr(l,w))2\displaystyle=\sum_{u}\Big{(}\sum_{w}\mathbf{P}_{t}((s,m,\pi(u));(t,l,w))g_{r-1}Y_{r}(l,w)\Big{)}^{2}
𝐏top2w(gr1Yr(l,w))2𝒜t104n.\displaystyle\leq\|\mathbf{P}_{t}\|_{\mathrm{op}}^{2}\sum_{w}\big{(}g_{r-1}Y_{r}(l,w)\big{)}^{2}\overset{\mathcal{A}_{t}}{\leq}10^{4}n\,.

Using |\langle\sigma^{(t)}_{i},b_{t-1}{D}^{(t)}_{u}\rangle|\leq K_{t}\Delta_{t}^{10} again, we have \mathbb{E}[\mathtt{b}_{u}]=O(K_{t}\Delta_{t}^{10}). Thus, we get that |\mathbb{E}[\sum_{u}\lambda_{u}\mathtt{b}_{u}]|\leq\sum_{u\not\in\mathrm{BAD}_{t-1}}|\lambda_{u}|K_{t}\Delta_{t}^{10}\ll nK_{t}\Delta_{t}^{2}. Combined with the Azuma–Hoeffding inequality, this yields that

\displaystyle\mathbb{P}\big{(}\sum_{u}\lambda_{u}\mathtt{b}_{u}>\tfrac{1}{10}\mathfrak{a}K_{t+1}n\Delta_{t+1}\big{)}\leq 2\exp\Big{\{}-\frac{(\frac{1}{10}\mathfrak{a}K_{t+1}n\Delta_{t+1})^{2}}{4\sum_{u}\lambda_{u}^{2}}\Big{\}}\leq 2\exp\{-\Omega(nK_{t+1}\Delta_{t+1}^{2})\}\,.

This then implies that (by using t\mathcal{B}_{t} and averaging again)

(𝒮3>110Kt+14nΔt+12)exp{nΔt+12}.\displaystyle\mathbb{P}(\mathcal{S}_{3}>\frac{1}{10}K_{t+1}^{4}n\Delta_{t+1}^{2})\leq\exp\{-n\Delta_{t+1}^{2}\}\,. (3.99)

Combined with (3.93) and (3.98), this completes the proof of Lemma 3.38. ∎

3.6.6 Conclusion

By putting together (3.74), (3.84), (3.85), (3.86) and (3.92), we have proved Steps 1–5 listed at the beginning of this subsection. In addition, since t^{*}\leq\log\log n, our quantitative bounds imply that all these hold simultaneously for t=0,\ldots,t^{*} with probability 1-o(1). By the inductive logic explained at the beginning of this subsection, we complete the proof of Proposition 3.2. We also point out that we have in fact shown that

\mathbb{P}(\mathcal{A}_{t^{*}}\cap\mathcal{B}_{t^{*}}\cap\mathcal{H}_{t^{*}}\cap\mathcal{T}_{t^{*}}\cap\mathcal{E}_{t^{*}})=1-o(1)\,, (3.100)

which will be used in Section 4.

4 Almost exact matching

In this section we show that on the event =𝒜tttt𝒯t\mathcal{E}_{\diamond}=\mathcal{A}_{t^{*}}\cap\mathcal{B}_{t^{*}}\cap\mathcal{E}_{t^{*}}\cap\mathcal{H}_{t^{*}}\cap\mathcal{T}_{t^{*}}, our algorithm matches all but a vanishing fraction of vertices with probability 1o(1)1-o(1), thereby proving Proposition 2.3 (recall (3.100)). For notational convenience, we will drop tt^{*} from subscripts (unless we wish to emphasize it). That is, we will write ε,K,Δ,ηl,Dv,Wv,Yˇ(k,v),BAD\varepsilon,K,\Delta,\eta_{l},D_{v},W_{v},\check{Y}(k,v),\mathrm{BAD} instead of εt,Kt,Δt,η(t)l,D(t)v,W(t)v,Yˇt(k,v),BADt\varepsilon_{t^{*}},K_{t^{*}},\Delta_{t^{*}},\eta^{(t^{*})}_{l},D^{(t^{*})}_{v},W^{(t^{*})}_{v},\check{Y}_{t^{*}}(k,v),\mathrm{BAD}_{t^{*}}.

In light of (2.37), we define 𝚄\mathtt{U} to be the collection of vVv\in V such that

k=1112K(Wv(k)+ηk,Dv)(𝖶π(v)(k)+ηk,𝖣π(v))<1100Kε,\displaystyle\sum_{k=1}^{\frac{1}{12}K}\big{(}W_{v}(k)+\langle\eta_{k},D_{v}\rangle\big{)}\big{(}\mathsf{W}_{\pi(v)}(k)+\langle\eta_{k},\mathsf{D}_{\pi(v)}\rangle\big{)}<\frac{1}{100}K\varepsilon\,,

and we define \mathtt{E} to be the collection of directed edges (u,w)\in(V\cap\mathrm{BAD}^{c})\times(V\cap\mathrm{BAD}^{c}) (with u\neq w) such that

k=1112K(Wu(k)+ηk,Du)(𝖶π(w)(k)+ηk,𝖣π(w))1100Kε.\displaystyle\sum_{k=1}^{\frac{1}{12}K}\big{(}W_{u}(k)+\langle\eta_{k},D_{u}\rangle\big{)}\big{(}\mathsf{W}_{\pi(w)}(k)+\langle\eta_{k},\mathsf{D}_{\pi(w)}\rangle\big{)}\geq\frac{1}{100}K\varepsilon\,.

It is clear that \mathtt{U} and \mathtt{E} are what may lead our algorithm to mis-match vertices in the finishing stage. As a result, our proof requires bounds on them.
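For intuition, the finishing-stage test compares an inner product of (K/12)-dimensional signature vectors against the threshold K\varepsilon/100. The following sketch is ours and purely illustrative: the Gaussian signatures S_v stand in for (W_{v}(k)+\langle\eta_{k},D_{v}\rangle)_{k}, and the values of K and eps are hypothetical, chosen so that K\varepsilon^{2} is large (mirroring the regime of the paper).

```python
import numpy as np

# Hedged illustration of the finishing-stage test: S_v stands in for the
# signature vector (W_v(k) + <eta_k, D_v>)_{k <= K/12}, and a candidate pair
# is accepted iff the inner product of its signatures is at least K*eps/100.
# K and eps are hypothetical values, not the paper's actual parameters.
rng = np.random.default_rng(1)
K, eps = 12000, 0.5
dim = K // 12
S_v = rng.standard_normal(dim)
# correlated partner: coordinate-wise correlation ~ eps
S_true = eps * S_v + np.sqrt(1 - eps ** 2) * rng.standard_normal(dim)
S_wrong = rng.standard_normal(dim)               # an unrelated vertex
thresh = K * eps / 100
print(f"threshold        : {thresh:.1f}")
print(f"true-pair score  : {S_v @ S_true:.1f} (mean ~ eps*K/12 = {eps * dim:.0f})")
print(f"wrong-pair score : {S_v @ S_wrong:.1f} (mean 0, sd ~ sqrt(K/12) = {dim ** 0.5:.0f})")
```

With these (hypothetical) parameters the true-pair score concentrates around \varepsilon K/12, comfortably above the threshold K\varepsilon/100, while a wrong-pair score is centered at 0 with standard deviation \sqrt{K/12}; Lemmas 4.1 and 4.2 below quantify how rarely either side of this separation fails.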

Lemma 4.1.

We have (|𝚄|n/logn;)=o(1)\mathbb{P}(|\mathtt{U}|\geq n/\log n;\mathcal{E}_{\diamond})=o(1).

Proof.

We work on the event \mathcal{E}_{\diamond}. If |\mathtt{U}|>\frac{n}{\log n}, then |\mathtt{U}\cap\mathrm{BAD}^{c}|>\frac{n}{2\log n} since |\mathrm{BAD}|\ll\frac{n}{(\log n)^{2}}. Let U be a realization of \mathtt{U}\cap\mathrm{BAD}^{c}. Then \langle\eta_{k},b_{t^{*}-1}D_{v}\rangle\ll\Delta^{10} for v\in U. Thus,

k=1112K(gt1Y(k,v)+o(Δ10))(gt1𝖸(k,π(v))+o(Δ10))<1100Kε for vU.\displaystyle\sum_{k=1}^{\frac{1}{12}K}\big{(}g_{t^{*}-1}Y(k,v)+o(\Delta^{10})\big{)}\big{(}g_{t^{*}-1}\mathsf{Y}(k,\pi(v))+o(\Delta^{10})\big{)}<\frac{1}{100}K\varepsilon\mbox{ for }v\in U\,. (4.1)

We again use the tilted measure. By the definition of {ηk,Dˇv,ηk,𝖣ˇ𝗏}\{\langle\eta_{k},\check{D}_{v}\rangle,\langle\eta_{k},\check{\mathsf{D}}_{\mathsf{v}}\rangle\}, the events

{k=1112K(Yˇ(k,v)+o(Δ10))(𝖸ˇ(k,π(v))+o(Δ10))<1100Kε}\displaystyle\Big{\{}\sum_{k=1}^{\frac{1}{12}K}\big{(}\check{Y}(k,v)+o(\Delta^{10})\big{)}\big{(}\check{\mathsf{Y}}(k,\pi(v))+o(\Delta^{10})\big{)}<\frac{1}{100}K\varepsilon\Big{\}}

are independent for different vv, and each has probability at most (by Δe(loglogn)8ε\Delta\leq e^{-(\log\log n)^{8}}\ll\varepsilon and Lemma 3.4 again)

2exp{(Kε)2K}exp{(logn)1.8}.\displaystyle 2\exp\big{\{}-\frac{(K\varepsilon)^{2}}{K}\big{\}}\leq\exp\{-(\log n)^{1.8}\}\,.

Recalling the definition of \mathcal{B}_{t^{*}} and that \mathcal{E}_{\diamond}\subset\mathcal{B}_{t^{*}}, and noting that the events above are independent over the at least \frac{n}{2\log n} vertices in U (so their joint probability is at most \exp\{-(\log n)^{1.8}\cdot\frac{n}{2\log n}\}=\exp\{-\frac{1}{2}n(\log n)^{0.8}\}), we derive that

\displaystyle\mathbb{P}((4.1);\mathcal{E}_{\diamond}\mid\mathfrak{S}_{t^{*}-1},\mathrm{BAD}_{t^{*}})\leq\exp\{n\Delta^{2}\}\cdot\exp\{-\tfrac{1}{2}n(\log n)^{0.8}\}\leq\exp\{-\tfrac{1}{4}n(\log n)^{0.8}\}\,.

Since the number of possible realizations of \mathtt{U} is at most 2^{n}=e^{n\log 2}, this completes the proof by a simple union bound. ∎

Lemma 4.2.

On \mathcal{E}_{\diamond}, with probability 1-o(1) the following holds: any subset of \mathtt{E} in which each vertex is incident to at most one edge has cardinality at most \frac{n}{\log n}.

Proof.

Suppose otherwise; then there exists U=\{v_{1},\ldots,v_{M}\}\subset V\cap\mathrm{BAD}^{c} with M=\frac{2n}{\log n} such that for all 1\leq i\leq M/2

k=1K/12(gt1Y(k,v2i1)+o(Δ10))(gt1𝖸(k,π(v2i))+o(Δ10))1100Kε.\displaystyle\mbox{$\sum_{k=1}^{K/12}$}\big{(}g_{t^{*}-1}Y(k,v_{2i-1})+o(\Delta^{10})\big{)}\big{(}g_{t^{*}-1}\mathsf{Y}(k,\pi(v_{2i}))+o(\Delta^{10})\big{)}\geq\tfrac{1}{100}K\varepsilon\,. (4.2)

We again tilt the measure. Note that the events

{k=1K/12(Yˇ(k,v2i1)+o(Δ10))(𝖸ˇ(k,π(v2i))+o(Δ10))1100Kε} for 1iM/2\displaystyle\Big{\{}\mbox{$\sum_{k=1}^{K/12}$}\Big{(}\check{Y}(k,v_{2i-1})+o(\Delta^{10})\big{)}\big{(}\check{\mathsf{Y}}(k,\pi(v_{2i}))+o(\Delta^{10})\Big{)}\geq\tfrac{1}{100}K\varepsilon\Big{\}}\mbox{ for }1\leq i\leq M/2

are independent and each occurs with probability at most \exp\{-(\log n)^{1.8}\}. Therefore, by tilting via \mathcal{B}_{t^{*}} as before and applying a union bound, we complete the proof of the lemma. ∎

We are now ready to provide the proof of Proposition 2.3.

Proof of Proposition 2.3.

By Lemmas 4.1 and 4.2, we may assume without loss of generality that the events described in these two lemmas both occur. Let V_{\mathrm{fail}}=\{v\in V:\hat{\pi}(v)\neq\pi(v)\}. Suppose that in the finishing step (i.e., the step of computing (2.37)) our algorithm processes V_{\mathrm{fail}}\setminus\mathtt{U} in the order w_{1},w_{2},\ldots,w_{m}. For w_{k}\not\in\mathtt{U}, in order to have \hat{\pi}(w_{k})\neq\pi(w_{k}), either our algorithm assigns a wrong matching to w_{k}, or at the time of processing w_{k} the vertex \pi(w_{k}) has already been matched to some other vertex. We then construct a directed graph \overrightarrow{H} on the vertices \{w_{1},w_{2},\ldots,w_{m}\}\cup\mathtt{U} as follows: for each v\in\{w_{1},w_{2},\ldots,w_{m}\}\cup\mathtt{U}, if the finishing step puts v into \mathrm{SUC} and matches v to some \pi(u) with \pi(u)\neq\pi(v), then we draw a directed edge from v to u. Note that our algorithm never matches a vertex twice, so all vertices have in-degree and out-degree at most 1. Also, for 1\leq k\leq m, if w_{k} has out-degree 0, then \pi(w_{k}) must have been matched to some u and thus there is a directed edge (u,w_{k})\in\overrightarrow{H}. Thus, the directed graph \overrightarrow{H} is a collection of non-overlapping directed chains. Since there are at least \frac{m}{2} edges in \overrightarrow{H} (recall that each w_{k} is incident to at least one edge in \overrightarrow{H}), we can extract a matching of cardinality at least \frac{m}{4}. Since |\mathrm{BAD}|\ll n/(\log n)^{2}, we can then extract a matching restricted to V\cap\mathrm{BAD}^{c} with cardinality at least m/4-n/(\log n)^{2}, whose edges all belong to \mathtt{E}. By the event in Lemma 4.2, we see that m/4-n/(\log n)^{2}\leq n/\log n, so m=O(n/\log n) and hence |V_{\mathrm{fail}}|\leq m+|\mathtt{U}|=o(n), completing the proof. ∎
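The chain-to-matching step in the proof above can be made concrete with a short sketch (ours, for illustration only): in a directed graph whose vertices have in-degree and out-degree at most 1 and which contains no cycles, the edges form vertex-disjoint chains, and keeping every other edge along each chain yields a matching of size at least half the number of edges.

```python
# Illustrative sketch (ours, not from the paper) of the chain argument: keep
# alternate edges along each directed chain to obtain a vertex-disjoint
# matching of size >= |edges| / 2.
def matching_from_chains(edges):
    nxt = {u: v for u, v in edges}           # out-neighbor map (out-degree <= 1)
    has_pred = {v for _, v in edges}         # vertices with in-degree 1
    matching = []
    for head in (u for u in nxt if u not in has_pred):  # start of each chain
        u, take = head, True
        while u in nxt:                      # walk the chain, alternating edges
            if take:
                matching.append((u, nxt[u]))
            u, take = nxt[u], not take
    return matching

edges = [(1, 2), (2, 3), (3, 4), (5, 6)]     # two chains: 1->2->3->4 and 5->6
print(matching_from_chains(edges))           # [(1, 2), (3, 4), (5, 6)]: 3 >= 4/2
```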

Appendix A Index of notation

Here we record some commonly used symbols in the paper, along with their meanings and the locations where they are first defined. Local notation is not included.

  • G,𝖦\overrightarrow{G},\overrightarrow{\mathsf{G}}: pre-processed graphs; Subsection 2.1.

  • q^\hat{q}: pre-processed edge density; Subsection 2.1.

  • ρ^\hat{\rho}: pre-processed edge correlation; Subsection 2.1.

  • ιlb\iota_{\mathrm{lb}}, ιub\iota_{\mathrm{ub}}: lower and upper bounds for the increment of ϕ\phi; (2.2).

  • κ\kappa: number of sets generated in initialization; (2.10).

  • χ\chi: depth of neighborhood in initialization; (2.4).

  • (a)k,Υ(a)k\aleph^{(a)}_{k},\Upsilon^{(a)}_{k}: aa-neighborhood of the seeds in initialization; (2.5), (2.6).

  • ϑa,ςa\vartheta_{a},\varsigma_{a}: fraction of aa-neighborhood and their interactions; (2.7).

  • Γ(t)k,Π(t)k\Gamma^{(t)}_{k},\Pi^{(t)}_{k}: sets generated at time tt in iteration; (2.8), (2.29).

  • \Phi^{(t)},\Psi^{(t)}: matrices recording the overlap structure of the sets; (2.11), (2.34).

  • 𝔞t\mathfrak{a}_{t}: fraction of sets generated at time tt; (2.12).

  • KtK_{t}: number of sets generated at time tt in iteration; (2.19).

  • εt\varepsilon_{t}: signal contained in each pair at time tt; (2.31).

  • η(t)k\eta^{(t)}_{k}: basis of projection spaces at time tt; (2.27), (2.26).

  • σ(t)k\sigma^{(t)}_{k}: direction of projections at time tt; (2.28).

  • tt^{*}: time when the iteration stops; (2.35).

  • D^{(t)}_{v},\mathsf{D}^{(t)}_{\mathsf{v}}: degrees of vertices into the sets at time t; (2.21).

  • Δt\Delta_{t}: targeted approximation error at time tt; (3.1).

  • \mathrm{M}_{\Gamma},\mathrm{M}_{\Pi},\mathrm{P}_{\Gamma,\Pi}: matrices recording the correlations along the iteration; (2.24).

  • REV\mathrm{REV}: the vertices revealed in initialization; (3.4).

  • Wv(t),𝖶𝗏(t)W_{v}^{(t)},\mathsf{W}_{\mathsf{v}}^{(t)}: Gaussian vectors introduced for smoothing; Section 2.3.

  • 𝔖t\mathfrak{S}_{t}: the information used up to time tt; (3.6).

  • b_{t}D_{v}^{(t)},g_{t}{\mathsf{D}}_{\mathsf{v}}^{(t)}: the bias part and the good part of the degree; (3.5).

  • Z,𝖹\overrightarrow{Z},\overrightarrow{\mathsf{Z}}: independently sampled Gaussian matrices; Subsection 3.1.

  • 𝔉t\mathfrak{F}_{t}: the Gaussian process up to time tt; (3.12).

  • t\mathcal{F}_{t}: the information generated by the Gaussian process up to time tt; (3.12).

  • BADt\mathrm{BAD}_{t}: the collection of bad vertices up to time tt; (3.15).

  • BIASt\mathrm{BIAS}_{t}: the collection of vertices with large bias; (3.7).

  • LARGEt\mathrm{LARGE}_{t}: the collection of vertices with large degree; (3.11).

  • PRBt\mathrm{PRB}_{t}: the collection of vertices with bad projection; (3.14).

  • gtD~v(t),gt𝖣~𝗏(t)g_{t}\tilde{D}_{v}^{(t)},g_{t}\tilde{\mathsf{D}}_{\mathsf{v}}^{(t)}: the degree replaced by Gaussian; (3.8).

  • 𝐐t\mathbf{Q}_{t}: covariance matrix of Gaussian variables at time tt; Remark 3.22.

  • 𝐏t\mathbf{P}_{t}: covariance matrix of revised Gaussian variables at time tt; Subsection 3.5.

  • 𝐇t\mathbf{H}_{t}: coefficient matrix of Gaussian variables at time tt; Remark 3.22.

  • 𝐉t\mathbf{J}_{t}: coefficient matrix of revised Gaussian variables at time tt; Subsection 3.5.

  • Dˇ(t)v,𝖣ˇ(t)𝗏\check{D}^{(t)}_{v},\check{\mathsf{D}}^{(t)}_{\mathsf{v}}: Gaussian process with simple covariance structure; (3.69).

  • 𝔉t\mathfrak{F}_{t}^{\prime}: the revised Gaussian process up to time tt; (3.64).

  • t\mathcal{F}_{t}^{\prime}: the information generated by revised Gaussian process; (3.64).
