
Inferences on mixing probabilities and ranking in mixed-membership models

The research is supported in part by ONR grant N00014-22-1-2340 and NSF grants DMS-2052926, DMS-2053832 and DMS-2210833.

Sohom Bhattacharya, Jianqing Fan, Jikai Hou
Department of Statistics, University of Florida; Department of Operations Research and Financial Engineering, Princeton University
Abstract

Network data is prevalent in numerous big data applications, including economics and health networks, where it is of prime importance to understand the latent structure of the network. In this paper, we model the network using the Degree-Corrected Mixed Membership (DCMM) model. In the DCMM model, for each node $i$, there exists a membership vector $\boldsymbol{\pi}_{i}=(\boldsymbol{\pi}_{i}(1),\boldsymbol{\pi}_{i}(2),\ldots,\boldsymbol{\pi}_{i}(K))$, where $\boldsymbol{\pi}_{i}(k)$ denotes the weight that node $i$ puts on community $k$. We derive a novel finite-sample expansion for the $\boldsymbol{\pi}_{i}(k)$'s, which allows us to obtain asymptotic distributions and confidence intervals for the membership mixing probabilities and other related population quantities. This fills an important gap in uncertainty quantification for the membership profile. We further develop a ranking scheme of the vertices based on the membership mixing probabilities on certain communities and perform relevant statistical inferences. A multiplier bootstrap method is proposed for ranking inference of an individual member's profile with respect to a given community. The validity of our theoretical results is further demonstrated via numerical experiments on both real and synthetic data examples.

Keywords: Network data, Mixed membership models, Asymptotic distributions, Ranking inference

1 Introduction

In various fields of study, such as citation networks, protein interactions, health, finance, trade, and social networks, we often come across large amounts of data that describe the relationships between objects. There are numerous approaches to understanding and analyzing such network data. Algorithmic methods are commonly used to optimize specific criteria, as shown in [New13a, New13b] and [ZM14]. Alternatively, model-based methods rely on probabilistic models with specific structures, which are reviewed by [GZF+10]. One of the earliest models in which the nodes (or vertices) of the network belong to some latent community is the Stochastic Block Model (SBM) [HLL83, WW87, Abb17]. Several improvements over this model have been proposed to overcome its limitations; two of them are relevant to our paper. First, [KN11] introduced the degree-corrected SBM, where a degree parameter is used for each vertex to make the expected degrees match the observed ones. Second, [ABFX08, ABEF14] study the mixed membership model, where each individual can belong to several communities with a mixing probability profile. In this paper, we study membership profiles in the Degree-Corrected Mixed Membership (DCMM) model, which combines both of the above benefits. In DCMM, every node $i$ is assumed to have a community membership probability vector $\boldsymbol{\pi}_{i}\in\mathbb{R}^{K}$, where $K$ is the number of communities and the $k$-th entry of $\boldsymbol{\pi}_{i}$ specifies the mixture proportion of node $i$ in community $k$ (see [FFHL22a, JKL23]). For example, a newspaper can be $40\%$ conservative and $60\%$ liberal. In addition, each node is allowed to have its own degree.

Given such a network, estimation and inference of membership profiles has drawn some attention recently. For example, [JKL23] provides an algorithm to estimate the $\boldsymbol{\pi}_{i}$'s, and [FFHL22b, FFLY22] consider the hypothesis testing problem of whether two nodes have the same membership profile. However, they do not address uncertainty quantification for $\boldsymbol{\pi}_{i}(k)$. In addition, to the best of our knowledge, none of the prior works is concerned with the problem of ranking nodes in a particular profile. As an example, one might ask: Is newspaper A more liberal than newspaper B? Or how many newspapers should I pick to ensure that the top-$K$ conservative newspapers are selected? Such a ranking question has applications in finance, where one might be interested in knowing whether a particular stock is among the top $K$ technology stocks before investing in it. In our work, we devise a framework to perform ranking inference based on the $\boldsymbol{\pi}_{i}$'s.

Our work lies in the intersection of three research directions which we delineate here.

  1. Community detection: Our estimation and inference procedure relies crucially on spectral clustering, which is one of the oldest methods of community detection (cf. [VL07] for a tutorial). In the last decade, [RCY11, Jin15, LR15] have developed both the theory and methods of spectral clustering. Another line of research related to community detection involves showing optimality of the detection boundary [Abb17] or the link-prediction problem [NK07, LWLZ23]. Recently, [JKL23] developed an algorithm to estimate the $\boldsymbol{\pi}_{i}$'s in $l_{2}$ norm; however, it lacks inferential guarantees and asymptotic distributions. Furthermore, there has been a significant number of works on hypothesis testing in network data. [ACV14, VAC15] formulated community detection as a hypothesis detection problem. [FFHL22b, FFLY22] study the testing problem of whether two vertices have the same membership profile in DCMM. Under stochastic block models, detection of the number of blocks has been studied by [BS16, Lei16, WB17], among others.

  2. Ranking inference: Most of the literature on ranking problems deals with pairwise comparisons or multiple partial ranking models, such as Bradley-Terry-Luce and other assortative network models. Prominent examples of ranking involve individual choices in economics [Luc12, M+73], websites [DKNS01], ranking of journals [JJKL21, Sti94], and alleles in genetics [SC95]. Hence, the ranking problem has been extensively studied in statistics, machine learning, and operations research; see, for example, [Hun04, CS15, CFMW19, CCF+21, GSZ23, FLWY22] for more details. However, none of the above works is concerned with ranking in the DCMM model, and hence our work is significantly different from the aforementioned papers.

  3. $l_{\infty}$ and $l_{2,\infty}$ perturbation theory: Often, for uncertainty quantification of unknown parameters, it is not enough to obtain an $l_{2}$ error bound on the estimators; rather, one needs a leave-one-out style analysis to get more refined $l_{\infty}$ and $l_{2,\infty}$ error bounds. See [CCF+21, Chapter 4] for an introduction. Such analysis has been used in matrix completion [CFMY19], principal component analysis [YCF21], and ranking analysis [FLWY22, FHY22, GSZ23]. We develop a novel subspace perturbation theory to obtain finite-sample expansions of the individual $\boldsymbol{\pi}_{i}(k)$'s and use them to obtain asymptotic distributions.

To perform inference about ranks and hypothesis testing, we employ an inference framework for ranking items through a maximum pairwise difference statistic whose distribution is approximated by a newly proposed multiplier bootstrap method. A similar framework was recently introduced by [FLWY22] in the context of the Bradley-Terry-Luce model.

The rest of the paper is structured as follows. Section 2 formulates the problem and describes the estimation procedure. Section 3 and Section 4 delineate the vertex hunting and membership reconstruction steps of our estimation procedure, respectively. Using the results we establish, we develop some distribution theory and answer inference questions in Section 5. We complement our theoretical findings with numerical experiments in Section 6, where we perform simulations on both synthetic and real datasets. Section 7 provides a brief outline of the major proofs of our paper. Finally, Section 8 contains all the proofs.

2 Problem Formulation

2.1 Model setting

Consider an undirected graph $G=(V,E)$ on $n$ nodes (or vertices), where $V=[n]:=\{1,2,\ldots,n\}$ is the set of nodes and $E\subseteq[n]\times[n]$ denotes the set of edges or links between the nodes. Given such a graph $G$, consider its symmetric adjacency matrix $X\in\mathbb{R}^{n\times n}$, which captures the connectivity structure of $G$, namely $x_{ij}=1$ if there exists a link or edge between the nodes $i$ and $j$, i.e., $(i,j)\in E$, and $x_{ij}=0$ otherwise. We allow the presence of potential self-loops in the graph $G$. If there are no self-loops, we let $x_{ii}=0$ for $i\in[n]$. Under a probabilistic model, we will assume that $x_{ij}$ is an independent realization from a Bernoulli random variable for all upper triangular entries of the random matrix $X$.

Given a network on $n$ nodes, we assume that there is an underlying latent community structure which contains $K$ communities. Each node $i$ is associated with a community membership probability vector $\boldsymbol{\pi}_{i}=(\boldsymbol{\pi}_{i}(1),\boldsymbol{\pi}_{i}(2),\dots,\boldsymbol{\pi}_{i}(K))\in\mathbb{R}^{K}$ such that

$$\mathbb{P}(\text{node }i\text{ belongs to community }k)=\boldsymbol{\pi}_{i}(k),\quad k=1,2,\dots,K.$$

A node $i\in[n]$ is called a pure node if there exists $k\in[K]$ such that $\boldsymbol{\pi}_{i}(k)=1$. The degree-corrected mixed membership (DCMM) model assumes (cf. [JKL23]) that the probability that an edge exists between node $i$ and node $j$ is given by

$$\mathbb{P}(\text{edge exists between node }i\text{ and node }j)=\theta_{i}\theta_{j}\sum_{k=1}^{K}\sum_{l=1}^{K}\boldsymbol{\pi}_{i}(k)\boldsymbol{\pi}_{j}(l)p_{kl}. \tag{1}$$

Here, $\theta_{i}>0$ captures the degree heterogeneity for node $i\in[n]$, and $p_{kl}>0$ can be viewed as the probability that a typical member of community $k$ ($\theta_{i}=1$, say) connects with a typical member of community $l$ ($\theta_{j}=1$, say). The mixture probabilities can be written in matrix form as follows. Let $\boldsymbol{\Theta}=\operatorname{diag}(\theta_{1},\theta_{2},\dots,\theta_{n})$ be a diagonal matrix that captures the degree heterogeneity, $\boldsymbol{\Pi}=(\boldsymbol{\pi}_{1},\boldsymbol{\pi}_{2},\dots,\boldsymbol{\pi}_{n})^{\top}\in\mathbb{R}^{n\times K}$ be the matrix of community membership probability vectors, and $\boldsymbol{P}=(p_{kl})\in\mathbb{R}^{K\times K}$ be a nonsingular matrix with $p_{kl}\in[0,1]$. Then, the mixing probability matrix can be expressed as

$$\boldsymbol{H}=\boldsymbol{\Theta}\boldsymbol{\Pi}\boldsymbol{P}\boldsymbol{\Pi}^{\top}\boldsymbol{\Theta}. \tag{2}$$

Let $\boldsymbol{X}\in\mathbb{R}^{n\times n}$ be the symmetric adjacency matrix of these $n$ nodes, i.e., $X_{ij}=1$ if there is a link connecting nodes $i$ and $j$, and $X_{ij}=0$ otherwise. Note that $\mathbb{E}(X_{ij})=H_{ij}$ for $i\neq j$. We set $X_{ii}=0$ in the case without self-loops. We also allow the case with self-loops, in which we assume that $\mathbb{P}(X_{ii}=1)=H_{ii}$. In both cases, we write

$$\boldsymbol{X}=\boldsymbol{H}+\boldsymbol{W}, \tag{3}$$

where $\boldsymbol{W}$ is a symmetric random matrix with mean zero and independent entries above the diagonal (and on the diagonal in the case with self-loops). In the following, since our theory applies to both cases, we will not distinguish between them. Therefore, the notation (3) is used throughout the article.
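The model (1)-(3) can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code: the function name `simulate_dcmm`, the Dirichlet memberships, the range of $\theta_i$ and the particular $\boldsymbol{P}$ are all assumptions chosen only so that every entry of $\boldsymbol{H}$ is a valid probability.

```python
import numpy as np

def simulate_dcmm(theta, Pi, P, rng, self_loops=False):
    """Return (X, H): one adjacency draw and its mean matrix H from (2).

    Hypothetical helper for illustration; not part of the paper.
    """
    scaled = theta[:, None] * Pi                 # row i is theta_i * pi_i
    H = scaled @ P @ scaled.T                    # H = Theta Pi P Pi^T Theta
    if H.min() < 0.0 or H.max() > 1.0:
        raise ValueError("entries of H must be valid probabilities")
    n = H.shape[0]
    draws = rng.random((n, n)) < H               # independent Bernoulli(H_ij) draws
    # Keep the upper triangle (with or without the diagonal), then mirror it.
    upper = np.triu(draws, k=0 if self_loops else 1).astype(float)
    X = upper + np.triu(upper, k=1).T
    return X, H

rng = np.random.default_rng(0)
n, K = 300, 3
Pi = rng.dirichlet(np.ones(K), size=n)           # assumed membership distribution
Pi[:K] = np.eye(K)                               # one pure node per community
theta = rng.uniform(0.5, 1.0, size=n)            # assumed degree parameters
P = 0.3 * np.ones((K, K)) + 0.7 * np.eye(K)      # assumed nonsingular P
X, H = simulate_dcmm(theta, Pi, P, rng)
```

Only the upper triangle is sampled and then mirrored, which matches the requirement that the entries above the diagonal be independent while $\boldsymbol{X}$ stays symmetric.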

Our goal is to study the community membership probability matrix $\boldsymbol{\Pi}$ and conduct inference on its entries. Based on their uncertainty quantification, we provide a framework for ranking inference and address some hypothesis testing problems on the rank of each node's mixing probability in a community. To this end, we first need to estimate $\boldsymbol{\Pi}$ using the Mixed-SCORE algorithm [JKL23]. We invoke a slight modification of the algorithm along with the $l_{2}$-$l_{\infty}$ perturbation theory to get the desired entry-wise expansion.

We impose the following identifiability condition for the DCMM model (1).

Assumption 1.

Each community $\mathcal{C}_{k}$, $1\leq k\leq K$, has at least one pure node, namely, there exists a vertex $i\in[n]$ such that $\boldsymbol{\pi}_{i}(k)=1$.

2.2 Estimation procedure

We describe the version of the Mixed-SCORE algorithm (cf. [JKL23]) which will be used to estimate $\boldsymbol{\Pi}$. The algorithm consists of three key steps. First, we map each node to a $(K-1)$-dimensional space using the observed $\boldsymbol{X}$. Ideally, in the absence of noise, i.e., $\boldsymbol{W}=0$, these $n$ points in the $(K-1)$-dimensional space would form a simplex with $K$ vertices, with the pure nodes defined by Assumption 1 becoming the vertices. The presence of such a simplex structure is discussed in Lemma 2.1 of [JKL23].

In the presence of noise, as long as the noise level is mild, we can still hope they are approximately a simplex. So, we apply a vertex hunting algorithm to estimate these KK vertices. After that, we estimate the membership vector 𝝅i\boldsymbol{\pi}_{i} for each node based on the estimated vertices. Below, we describe the procedure mathematically.

  • SCORE step: Let $\widehat{\lambda}_{1},\widehat{\lambda}_{2},\dots,\widehat{\lambda}_{K}$ be the $K$ largest eigenvalues (in magnitude) of $\boldsymbol{X}$, sorted in descending order. Let $\widehat{\boldsymbol{u}}_{1},\widehat{\boldsymbol{u}}_{2},\dots,\widehat{\boldsymbol{u}}_{K}$ be the corresponding eigenvectors. Calculate the following $n$ vectors:

    $$\widehat{\boldsymbol{r}}_{i}:=\left[\frac{(\widehat{\boldsymbol{u}}_{2})_{i}}{(\widehat{\boldsymbol{u}}_{1})_{i}},\frac{(\widehat{\boldsymbol{u}}_{3})_{i}}{(\widehat{\boldsymbol{u}}_{1})_{i}},\dots,\frac{(\widehat{\boldsymbol{u}}_{K})_{i}}{(\widehat{\boldsymbol{u}}_{1})_{i}}\right]^{\top}\in\mathbb{R}^{K-1},\quad\forall i\in[n]. \tag{4}$$

  • Vertex Hunting step: Apply the vertex hunting algorithm (see Section 3 for details) to $\widehat{\boldsymbol{r}}_{1},\widehat{\boldsymbol{r}}_{2},\dots,\widehat{\boldsymbol{r}}_{n}$ and get estimated vertices $\widehat{\boldsymbol{b}}_{1},\widehat{\boldsymbol{b}}_{2},\dots,\widehat{\boldsymbol{b}}_{K}\in\mathbb{R}^{K-1}$.

  • Membership Reconstruction step: For each $i\in[n]$, let $\widehat{\boldsymbol{a}}_{i}=(\widehat{a}_{i}(1),\ldots,\widehat{a}_{i}(K))\in\mathbb{R}^{K}$ be the unique solution of the linear equations:

    $$\widehat{\boldsymbol{r}}_{i}=\sum_{k=1}^{K}\widehat{a}_{i}(k)\widehat{\boldsymbol{b}}_{k},\qquad\sum_{k=1}^{K}\widehat{a}_{i}(k)=1. \tag{5}$$

    Define a vector $\widehat{\boldsymbol{\pi}}^{\prime}_{i}\in\mathbb{R}^{K}$ with $\widehat{\boldsymbol{\pi}}^{\prime}_{i}(k)=\widehat{a}_{i}(k)/\left[\widehat{\boldsymbol{c}}\right]_{k}$, where

    $$\left[\widehat{\boldsymbol{c}}\right]_{k}=\left[\widehat{\lambda}_{1}+\widehat{\boldsymbol{b}}_{k}^{\top}\operatorname{diag}(\widehat{\lambda}_{2},\widehat{\lambda}_{3},\dots,\widehat{\lambda}_{K})\widehat{\boldsymbol{b}}_{k}\right]^{-1/2},\quad 1\leq k\leq K. \tag{6}$$

    Estimate $\boldsymbol{\pi}_{i}$ as $\widehat{\boldsymbol{\pi}}_{i}=\widehat{\boldsymbol{\pi}}^{\prime}_{i}/\sum_{k=1}^{K}\widehat{\boldsymbol{\pi}}^{\prime}_{i}(k)$.
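The SCORE and Membership Reconstruction steps above can be sketched numerically. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the vertex hunting step is replaced by an oracle that reads the vertices off known pure nodes, the input is the noiseless matrix $\boldsymbol{H}$ (so $\boldsymbol{W}=0$), and $\boldsymbol{P}$ is taken to have unit diagonal so that the normalisation (6) recovers $\boldsymbol{\Pi}$ exactly.

```python
import numpy as np

def score_embedding(M, K):
    """SCORE step (4): leading K eigenpairs of M and the entrywise ratios r_i."""
    vals, vecs = np.linalg.eigh(M)
    idx = np.argsort(-np.abs(vals))[:K]          # K largest eigenvalues in magnitude
    idx = idx[np.argsort(-vals[idx])]            # sorted in descending order
    lam, U = vals[idx], vecs[:, idx]
    R = U[:, 1:] / U[:, [0]]                     # row i is r_i in R^{K-1}
    return lam, R

def reconstruct_membership(R, B, lam):
    """Membership reconstruction (5)-(6) given vertices B of shape (K, K-1)."""
    n, K = R.shape[0], B.shape[0]
    system = np.vstack([B.T, np.ones((1, K))])   # stacks sum_k a(k) b_k and sum_k a(k)
    rhs = np.vstack([R.T, np.ones((1, n))])      # right-hand sides [r_i; 1]
    A = np.linalg.solve(system, rhs).T           # row i is a_i
    c = (lam[0] + np.einsum("kj,j,kj->k", B, lam[1:], B)) ** -0.5
    Pi_prime = A / c
    return Pi_prime / Pi_prime.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n, K = 200, 3
Pi = rng.dirichlet(np.ones(K), size=n)
Pi[:K] = np.eye(K)                               # nodes 0..K-1 are pure (Assumption 1)
theta = rng.uniform(0.5, 1.0, size=n)
P = 0.3 * np.ones((K, K)) + 0.7 * np.eye(K)      # unit diagonal (assumption of this sketch)
H = (theta[:, None] * Pi) @ P @ (theta[:, None] * Pi).T

lam, R = score_embedding(H, K)
B = R[:K]                                        # oracle vertices, in place of vertex hunting
Pi_hat = reconstruct_membership(R, B, lam)
print(np.max(np.abs(Pi_hat - Pi)))
```

With the observed $\boldsymbol{X}$ in place of $\boldsymbol{H}$ and estimated vertices in place of the oracle, the same two functions would return a noisy estimate; quantifying that noise is the subject of the analysis that follows.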

To analyze the algorithm, we begin by analyzing the SCORE step, i.e., investigating the $\widehat{\boldsymbol{r}}_{i}$'s.

We introduce some notation first. Since $\boldsymbol{H}$ and $\boldsymbol{X}$ are symmetric, we write their eigen-decompositions as

$$\begin{aligned}
\boldsymbol{H}&=\sum_{i=1}^{n}\lambda_{i}^{*}\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}=\sum_{i=1}^{n}|\lambda_{i}^{*}|\boldsymbol{u}_{i}^{*}\left(\operatorname{sgn}\left(\lambda_{i}^{*}\right)\boldsymbol{u}_{i}^{*}\right)^{\top},\\
\boldsymbol{X}&=\sum_{i=1}^{n}\widehat{\lambda}_{i}\widehat{\boldsymbol{u}}_{i}\widehat{\boldsymbol{u}}_{i}^{\top}=\sum_{i=1}^{n}|\widehat{\lambda}_{i}|\widehat{\boldsymbol{u}}_{i}\left(\operatorname{sgn}(\widehat{\lambda}_{i})\widehat{\boldsymbol{u}}_{i}\right)^{\top}.
\end{aligned}$$

The eigenvalues are sorted in the following way. First, $\lambda_{1}^{*},\lambda_{2}^{*},\dots,\lambda_{K}^{*}$ are the $K$ largest eigenvalues of $\boldsymbol{H}$ in magnitude, and $\widehat{\lambda}_{1},\widehat{\lambda}_{2},\dots,\widehat{\lambda}_{K}$ are the $K$ largest eigenvalues of $\boldsymbol{X}$ in magnitude. Second, $\lambda_{1}^{*}>\lambda_{2}^{*}>\cdots>\lambda_{K}^{*}$ and $\widehat{\lambda}_{1}>\widehat{\lambda}_{2}>\cdots>\widehat{\lambda}_{K}$ are sorted in descending order, while the other eigenvalues can be sorted in any order. By Lemma C.3 of [JKL23], we can choose the sign of $\boldsymbol{u}_{1}^{*}$ to make sure that $(\boldsymbol{u}_{1}^{*})_{i}>0$ for $1\leq i\leq n$. The direction of $\widehat{\boldsymbol{u}}_{1}$ is chosen to make sure that $\widehat{\boldsymbol{u}}_{1}^{\top}\boldsymbol{u}_{1}^{*}\geq 0$. Define

$$\sigma_{i}^{*}:=\left|\lambda_{i}^{*}\right|,\qquad\boldsymbol{v}_{i}^{*}:=\operatorname{sgn}\left(\lambda_{i}^{*}\right)\boldsymbol{u}_{i}^{*},\qquad\widehat{\sigma}_{i}:=|\widehat{\lambda}_{i}|,\qquad\widehat{\boldsymbol{v}}_{i}:=\operatorname{sgn}(\widehat{\lambda}_{i})\widehat{\boldsymbol{u}}_{i}. \tag{7}$$

We further define

$$\overline{\boldsymbol{U}}^{*}=[\boldsymbol{u}_{2}^{*},\boldsymbol{u}_{3}^{*},\dots,\boldsymbol{u}_{K}^{*}]\in\mathbb{R}^{n\times(K-1)}\quad\text{and}\quad\overline{\boldsymbol{U}}=[\widehat{\boldsymbol{u}}_{2},\widehat{\boldsymbol{u}}_{3},\dots,\widehat{\boldsymbol{u}}_{K}]\in\mathbb{R}^{n\times(K-1)}. \tag{8}$$

We also denote by

$$\boldsymbol{r}_{i}^{*}:=\left[\frac{(\boldsymbol{u}_{2}^{*})_{i}}{(\boldsymbol{u}_{1}^{*})_{i}},\frac{(\boldsymbol{u}_{3}^{*})_{i}}{(\boldsymbol{u}_{1}^{*})_{i}},\dots,\frac{(\boldsymbol{u}_{K}^{*})_{i}}{(\boldsymbol{u}_{1}^{*})_{i}}\right]^{\top}\in\mathbb{R}^{K-1},\quad\forall i\in[n], \tag{9}$$

the noiseless counterpart of $\widehat{\boldsymbol{r}}_{i}$ in (4).

Note from the expressions of $\boldsymbol{r}_{i}^{*}$ and $\widehat{\boldsymbol{r}}_{i}$ that, in order to analyze $\widehat{\boldsymbol{r}}_{i}$, we need to study the difference between $\boldsymbol{u}_{1}^{*}$ and $\widehat{\boldsymbol{u}}_{1}$, and the difference between $\overline{\boldsymbol{U}}^{*}$ and $\overline{\boldsymbol{U}}$. To this end, we need the following four assumptions for the theoretical analysis, introduced in Section 3 of [JKL23], which are necessary for the Mixed-SCORE algorithm to work.

Denote by $\boldsymbol{G}:=K\left\|\boldsymbol{\theta}\right\|_{2}^{-2}(\boldsymbol{\Pi}^{\top}\boldsymbol{\Theta}^{2}\boldsymbol{\Pi})\in\mathbb{R}^{K\times K}$ and $\theta_{\max}:=\max_{i\in[n]}\theta_{i}$. Note that the eigenvalues of $\boldsymbol{P}\boldsymbol{G}$ are real, since $\boldsymbol{P}$ is symmetric and $\boldsymbol{G}$ is positive-definite.
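A short numerical check makes the role of $\boldsymbol{P}\boldsymbol{G}$ concrete. The sketch below is an illustration with assumed parameter values, not part of the paper. It uses two standard linear-algebra facts: with the Cholesky factor $\boldsymbol{G}=\boldsymbol{L}\boldsymbol{L}^{\top}$, the matrix $\boldsymbol{P}\boldsymbol{G}$ is similar to the symmetric matrix $\boldsymbol{L}^{\top}\boldsymbol{P}\boldsymbol{L}$, which is why its eigenvalues are real; and the nonzero eigenvalues of $\boldsymbol{H}$ coincide with those of $K^{-1}\|\boldsymbol{\theta}\|_{2}^{2}\,\boldsymbol{P}\boldsymbol{G}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 150, 3
Pi = rng.dirichlet(np.ones(K), size=n)           # assumed memberships
theta = rng.uniform(0.5, 1.0, size=n)            # assumed degree parameters
P = np.array([[1.0, 0.3, 0.2],
              [0.3, 1.0, 0.4],
              [0.2, 0.4, 1.0]])                  # assumed symmetric, nonsingular P

theta_sq_norm = np.sum(theta**2)
G = K * (Pi.T * theta**2) @ Pi / theta_sq_norm   # G = K ||theta||^{-2} Pi^T Theta^2 Pi

# P G is similar to the symmetric matrix L^T P L, so its eigenvalues are real.
L = np.linalg.cholesky(G)
eig_PG = np.sort(np.linalg.eigvalsh(L.T @ P @ L))[::-1]

# The K leading eigenvalues of H, compared with the rescaled eigenvalues of P G.
H = (theta[:, None] * Pi) @ P @ (theta[:, None] * Pi).T
eig_H = np.linalg.eigvalsh(H)
top_H = np.sort(eig_H[np.argsort(-np.abs(eig_H))[:K]])[::-1]
scaled = theta_sq_norm / K * eig_PG
print(top_H, scaled)
```

This correspondence is why conditions on the spectrum of $\boldsymbol{P}\boldsymbol{G}$ translate directly into eigen-gap statements for $\boldsymbol{H}$.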

Assumption 2.

There exist constants $C,C^{\prime},C^{\prime\prime}>0$ such that

$$\theta_{\max}\leq C\min_{i\in[n]}\theta_{i}\quad\text{and}\quad C^{\prime}\sqrt{\frac{\log n}{n}}\leq\theta_{\max}\leq C^{\prime\prime}.$$

Assumption 3.

There exists a constant $C>0$ such that

$$\left\|\boldsymbol{P}\right\|_{\max}\leq C,\quad\left\|\boldsymbol{G}\right\|\leq C,\quad\left\|\boldsymbol{G}^{-1}\right\|\leq C.$$

Assumption 4.

There exists $c_{1}>0$ such that $|\lambda_{2}(\boldsymbol{P}\boldsymbol{G})|\leq(1-c_{1})\lambda_{1}(\boldsymbol{P}\boldsymbol{G})$ and $c_{1}\beta_{n}\leq|\lambda_{K}(\boldsymbol{P}\boldsymbol{G})|\leq|\lambda_{2}(\boldsymbol{P}\boldsymbol{G})|\leq c_{1}^{-1}\beta_{n}$, where $\{\beta_{n}\}_{n=1}^{\infty}$ is a sequence of positive real numbers such that $\beta_{n}\leq 1$, and $\lambda_{i}(\boldsymbol{P}\boldsymbol{G})$ is the $i$-th largest right eigenvalue of $\boldsymbol{P}\boldsymbol{G}$.

Assumption 5.

There exists a constant $C>0$ such that $\min_{1\leq k\leq K}\boldsymbol{\eta}_{1}(k)>0$ and

$$\frac{\max_{1\leq k\leq K}\boldsymbol{\eta}_{1}(k)}{\min_{1\leq k\leq K}\boldsymbol{\eta}_{1}(k)}\leq C.$$

Here $\boldsymbol{\eta}_{1}$ is the right eigenvector corresponding to $\lambda_{1}(\boldsymbol{P}\boldsymbol{G})$.

Here, Assumption 2 and Assumption 3 ensure that the underlying model is not spiky. In other words, the signal is spread across all nodes. Assumption 4 is the eigen-gap assumption, and as we will see next, it ensures that there is a sufficient gap between the first eigenvalue and the remaining eigenvalues of $\boldsymbol{H}$, and that the remaining nonzero eigenvalues of $\boldsymbol{H}$ are of the same order. Assumption 5 seems less straightforward, but it is actually satisfied by a very wide range of models (see [JKL23] for examples). We will make Assumptions 2-5 for the remainder of the paper.

With these assumptions in hand, the following lemma lists some important properties of the eigenvalues of $\boldsymbol{H}$ and $\boldsymbol{X}$.

Lemma 1.

(Lemma C.2 of [JKL23]) Let $\boldsymbol{\theta}=[\theta_{1},\ldots,\theta_{n}]^{\top}$ and let $\beta_{n}$ be as defined in Assumption 4. Under Assumptions 2-5, the following statements hold:

  • $C_{1}^{-1}K^{-1}\left\|\boldsymbol{\theta}\right\|_{2}^{2}\leq\lambda_{1}^{*}\leq C_{1}\left\|\boldsymbol{\theta}\right\|_{2}^{2}$. If $\beta_{n}=o(1)$, then $\lambda_{1}^{*}\asymp\left\|\boldsymbol{\theta}\right\|_{2}^{2}$.

  • $\lambda_{1}^{*}-\left|\lambda_{i}^{*}\right|\asymp\lambda_{1}^{*}$, for $2\leq i\leq K$.

  • $|\lambda_{i}^{*}|\asymp\beta_{n}K^{-1}\left\|\boldsymbol{\theta}\right\|_{2}^{2}$, for $2\leq i\leq K$.

Combining this lemma with the fact that $\left\|\boldsymbol{W}\right\|\lesssim\sqrt{n}\theta_{\max}$ with high probability (shown later in Lemma 4), we know that the eigenvalues of $\boldsymbol{H}$ and $\boldsymbol{X}$ can be divided into three groups as long as $\sqrt{n}\theta_{\max}\ll\beta_{n}K^{-1}\left\|\boldsymbol{\theta}\right\|_{2}^{2}$:

$$\begin{aligned}
&\{\lambda_{1}^{*}\}\gtrsim\{|\lambda_{2}^{*}|,|\lambda_{3}^{*}|,\dots,|\lambda_{K}^{*}|\}\gg\{\lambda_{K+1}^{*},\lambda_{K+2}^{*},\dots,\lambda_{n}^{*}\},\\
&\{\widehat{\lambda}_{1}\}\gtrsim\{|\widehat{\lambda}_{2}|,|\widehat{\lambda}_{3}|,\dots,|\widehat{\lambda}_{K}|\}\gg\{\widehat{\lambda}_{K+1},\widehat{\lambda}_{K+2},\dots,\widehat{\lambda}_{n}\}.
\end{aligned}$$

This also motivates us to derive two results separately. First, we obtain the expansion of $\widehat{\boldsymbol{u}}_{1}-\boldsymbol{u}_{1}^{*}$. Second, we analyze the difference between $\overline{\boldsymbol{U}}^{*}$ and $\overline{\boldsymbol{U}}$ by writing the expansion of $\overline{\boldsymbol{U}}\boldsymbol{R}-\overline{\boldsymbol{U}}^{*}$, where

$$\boldsymbol{R}:=\operatorname*{arg\,min}_{\boldsymbol{O}\in\mathcal{O}^{(K-1)\times(K-1)}}\left\|\overline{\boldsymbol{U}}\boldsymbol{O}-\overline{\boldsymbol{U}}^{*}\right\|_{F} \tag{10}$$

is an orthogonal matrix which best "matches" $\overline{\boldsymbol{U}}$ and $\overline{\boldsymbol{U}}^{*}$. Here $\mathcal{O}^{(K-1)\times(K-1)}$ stands for the set of all $(K-1)\times(K-1)$ orthogonal matrices.
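The minimiser in (10) is the solution of the classical orthogonal Procrustes problem and has a closed form: if $\overline{\boldsymbol{U}}^{\top}\overline{\boldsymbol{U}}^{*}=\boldsymbol{A}\boldsymbol{\Sigma}\boldsymbol{B}^{\top}$ is a singular value decomposition, then $\boldsymbol{R}=\boldsymbol{A}\boldsymbol{B}^{\top}$. The sketch below illustrates this on synthetic inputs; the function name `procrustes_rotation` and the test matrices are assumptions made for the example.

```python
import numpy as np

def procrustes_rotation(U_hat, U_star):
    """Orthogonal O minimising ||U_hat @ O - U_star||_F, via the SVD of U_hat^T U_star."""
    A, _, Bt = np.linalg.svd(U_hat.T @ U_star)
    return A @ Bt

rng = np.random.default_rng(2)
n, d = 50, 2                                           # d plays the role of K - 1
U_star, _ = np.linalg.qr(rng.standard_normal((n, d)))  # orthonormal columns
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))       # a random orthogonal matrix
U_hat = U_star @ Q.T                                   # same subspace, rotated basis

R = procrustes_rotation(U_hat, U_star)
print(np.linalg.norm(U_hat @ R - U_star))              # alignment error after rotation
```

In this noiseless example the two bases span the same subspace, so the rotation undoes the change of basis and the alignment error is at machine precision; with noisy eigenvectors the residual $\overline{\boldsymbol{U}}\boldsymbol{R}-\overline{\boldsymbol{U}}^{*}$ is exactly the quantity the analysis expands.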

The analysis of $\overline{\boldsymbol{U}}\boldsymbol{R}-\overline{\boldsymbol{U}}^{*}$ is similar to a matrix denoising problem, so we define some quantities which are commonly used in the matrix denoising literature [YCF21]. We define

$$\begin{aligned}
&\sigma_{\max}^{*}:=\max_{2\leq i\leq K}|\lambda_{i}^{*}|,\quad\sigma_{\min}^{*}:=\min_{2\leq i\leq K}|\lambda_{i}^{*}|,\quad\kappa^{*}:=\sigma_{\max}^{*}/\sigma_{\min}^{*},\\
&\widehat{\sigma}_{\max}:=\max_{2\leq i\leq K}|\widehat{\lambda}_{i}|,\quad\widehat{\sigma}_{\min}:=\min_{2\leq i\leq K}|\widehat{\lambda}_{i}|.
\end{aligned} \tag{11}$$

Our results regarding uncertainty quantification contribute to the literature on estimating the subspace spanned by partial eigenvectors, cf. [AFWZ20, AFW22].

To state our results, we introduce an incoherence condition on the matrix $\boldsymbol{H}$, which is a natural modification of the standard incoherence assumption [YCF21, Assumption 2], adapted to our setting by separating the first eigenspace from the second to $K$-th eigenspaces.

Definition 1.

(Incoherence) The symmetric matrix $\boldsymbol{H}$ is said to be $\mu^{*}$-incoherent if

$$\left\|\boldsymbol{u}_{1}^{*}\right\|_{\infty}\leq\sqrt{\frac{\mu^{*}}{n}}\quad\text{and}\quad\left\|\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}\leq\sqrt{\frac{\mu^{*}(K-1)}{n}}. \tag{12}$$

Note that, unlike [YCF21], we do not need to impose an incoherence assumption in this article. This is because a combination of Assumption 2 and Assumption 3 shows that $\boldsymbol{H}$ is $\mu^{*}$-incoherent with $\mu^{*}\asymp 1$; see Remark 1 and Section 8.22 for more details. Also, Lemma 1 implies $\kappa^{*}\asymp 1$. However, the quantities $\mu^{*}$ and $\kappa^{*}$ are two key parameters in the matrix denoising literature, and we keep them in our results.

Remark 1.

By Lemma C.3 of [JKL23], we obtain $(\boldsymbol{u}_{1}^{*})_{i}\asymp\theta_{i}/\|\boldsymbol{\theta}\|_{2}$ for $1\leq i\leq n$. Therefore, an incoherence condition on $\boldsymbol{u}_{1}^{*}$ can be interpreted as an incoherence condition on $\boldsymbol{\theta}$, which is implied by Assumption 2.
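For a given $\boldsymbol{H}$, the smallest $\mu^{*}$ satisfying (12) can be computed directly from its leading eigenvectors. The sketch below is illustrative only: the function name `incoherence` and the model parameters are assumptions. It returns the larger of $n\|\boldsymbol{u}_{1}^{*}\|_{\infty}^{2}$ and $n\|\overline{\boldsymbol{U}}^{*}\|_{2,\infty}^{2}/(K-1)$.

```python
import numpy as np

def incoherence(H, K):
    """Smallest mu for which the symmetric matrix H satisfies both bounds in (12)."""
    n = H.shape[0]
    vals, vecs = np.linalg.eigh(H)
    idx = np.argsort(-np.abs(vals))[:K]          # K largest eigenvalues in magnitude
    idx = idx[np.argsort(-vals[idx])]            # sorted in descending order
    U = vecs[:, idx]
    mu_first = n * np.max(np.abs(U[:, 0])) ** 2                   # bound on u_1
    mu_rest = n * np.max(np.sum(U[:, 1:] ** 2, axis=1)) / (K - 1)  # bound on U-bar
    return max(mu_first, mu_rest)

rng = np.random.default_rng(3)
n, K = 200, 3
Pi = rng.dirichlet(np.ones(K), size=n)           # assumed memberships
theta = rng.uniform(0.5, 1.0, size=n)            # assumed degrees, comparable across nodes
P = 0.3 * np.ones((K, K)) + 0.7 * np.eye(K)      # assumed P
H = (theta[:, None] * Pi) @ P @ (theta[:, None] * Pi).T

mu = incoherence(H, K)
print(mu)
```

Since $\boldsymbol{u}_{1}^{*}$ has unit norm, the returned value is always at least $1$ and at most $n$; values that stay bounded as $n$ grows correspond to the regime $\mu^{*}\asymp 1$ discussed above.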

Now, we are ready to state our results. We first state the following bound on $\widehat{\boldsymbol{u}}_{1}-\boldsymbol{u}_{1}^{*}$, whose proof is deferred.

Theorem 1.

If $\max\{\sqrt{n}\theta_{\max},\log n\}\ll\lambda_{1}^{*}$, then with probability at least $1-O(n^{-10})$,

$$\left\|\widehat{\boldsymbol{u}}_{1}-\boldsymbol{u}_{1}^{*}\right\|_{\infty}\lesssim\frac{K\log^{0.5}n+K^{1.5}\sqrt{\mu^{*}}}{n\theta_{\max}}+\frac{K\sqrt{\mu^{*}}\log n}{n^{1.5}\theta_{\max}^{2}}+\frac{K^{3}}{n^{1.5}\theta_{\max}^{3}},$$

where $\mu^{*}$ is given by Definition 1. Moreover, we have the following expansion:

$$\widehat{\boldsymbol{u}}_{1}-\boldsymbol{u}_{1}^{*}=\sum_{i=2}^{n}\frac{\boldsymbol{u}_{i}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}}{\lambda_{1}^{*}-\lambda_{i}^{*}}\boldsymbol{u}_{i}^{*}+\boldsymbol{\delta},$$

where with probability at least $1-O(n^{-10})$ we have $\|\boldsymbol{\delta}\|_{2}\lesssim K^{2}/(n\theta_{\max}^{2})$ and

$$\left\|\boldsymbol{\delta}\right\|_{\infty}\lesssim\frac{K^{2.5}\sqrt{\mu^{*}}+K^{2}\log^{0.5}n}{n^{1.5}\theta_{\max}^{2}}+\frac{K^{3}}{n^{1.5}\theta_{\max}^{3}}+\frac{K^{2}\log^{1.5}n+K^{2.5}\sqrt{\mu^{*}}}{n^{2}\theta_{\max}^{3}}+\frac{\log^{2}n\sqrt{\mu^{*}}}{n^{2.5}\theta_{\max}^{4}}.$$

Furthermore, if $K,\mu^{*},\theta_{\max}\asymp 1$, we obtain

$$\left\|\widehat{\boldsymbol{u}}_{1}-\boldsymbol{u}_{1}^{*}\right\|_{\infty}\lesssim\frac{\log^{0.5}n}{n}\quad\text{and}\quad\left\|\boldsymbol{\delta}\right\|_{\infty}\lesssim\frac{\log^{2}n}{n^{1.5}}.$$
Proof.

Combine Lemma 1 with Theorem 8, Theorem 9 and Lemma 5. ∎

Theorem 1 provides an error bound for $\widehat{\boldsymbol{u}}_{1}$ through the quantity $\boldsymbol{\delta}$. Both $l_{2}$ and $l_{\infty}$ bounds on $\boldsymbol{\delta}$ are provided. The following result states the expansion for $\overline{\boldsymbol{U}}\boldsymbol{R}-\overline{\boldsymbol{U}}^{*}$. Define

$$\begin{aligned}
\varepsilon_{0}:={}&\left(\frac{\kappa^{*}K^{2.5}\sqrt{\mu^{*}}}{n^{1.5}\beta_{n}^{2}\theta_{\max}^{2}}+\frac{K^{2.5}\log^{0.5}n}{n^{1.5}\beta_{n}^{2}\theta_{\max}^{3}}+\frac{K^{3.5}}{n^{1.5}\beta_{n}^{2}\theta_{\max}^{3}}+\frac{K^{1.5}\log^{0.5}n}{n\beta_{n}\theta_{\max}}+\frac{K^{2}}{n\beta_{n}\theta_{\max}^{2}}+\frac{K^{0.5}\sqrt{\mu^{*}}}{n^{0.5}}\right)\\
&\cdot\left(\frac{\kappa^{*}K^{2}}{n\beta_{n}^{2}\theta_{\max}^{2}}+\frac{K^{1.5}\log^{0.5}n}{n\beta_{n}\theta_{\max}}\right)+\frac{K^{2.5}\log^{0.5}n}{n^{1.5}\beta_{n}^{2}\theta_{\max}^{4}}+\frac{K^{3.5}}{n^{1.5}\beta_{n}^{2}\theta_{\max}^{3}}.
\end{aligned} \tag{13}$$

Define $\overline{\boldsymbol{\Lambda}}^{*}:=\operatorname{diag}(\lambda_{2}^{*},\lambda_{3}^{*},\dots,\lambda_{K}^{*})$.

Theorem 2.

Assume that $\max\{\sqrt{n}\theta_{\max},\log n\}\ll\sigma_{\min}^{*}$ and $\sqrt{(K-1)\log n}\,\theta_{\max}/\sigma_{\min}^{*}+\kappa^{*}n\theta_{\max}^{2}/\sigma_{\min}^{*2}\ll 1$, where $\kappa^{*}$ and $\mu^{*}$ are given by (11) and Definition 1, respectively. Write

$$\overline{\boldsymbol{U}}\boldsymbol{R}-\overline{\boldsymbol{U}}^{*}=\left[\boldsymbol{W}-\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{N}\right]\overline{\boldsymbol{U}}^{*}\left(\overline{\boldsymbol{\Lambda}}^{*}\right)^{-1}+\boldsymbol{\Psi}_{\overline{\boldsymbol{U}}},$$

where $\boldsymbol{R}$ is defined by (10) and

$$\boldsymbol{N}=\sum_{j=2}^{n}\frac{\lambda_{1}^{*}}{\lambda_{1}^{*}-\lambda_{j}^{*}}\boldsymbol{u}_{j}^{*}\boldsymbol{u}_{j}^{*\top}. \tag{14}$$

If $n\gtrsim\max\left\{\mu^{*}\log^{2}n,K\log n\right\}$, then with probability at least $1-O(n^{-10})$, $\boldsymbol{\Psi}_{\overline{\boldsymbol{U}}}$ satisfies

$$\left\|\boldsymbol{\Psi}_{\overline{\boldsymbol{U}}}\right\|_{2,\infty}\lesssim\varepsilon_{0},$$

where $\varepsilon_{0}$ is defined by (13). If, in addition, $K,\mu^{*},\kappa^{*},\beta_{n},\theta_{\max}\asymp 1$ as in Theorem 1, we obtain $\left\|\boldsymbol{\Psi}_{\overline{\boldsymbol{U}}}\right\|_{2,\infty}\lesssim\frac{\log^{0.5}n}{n^{1.5}}$.

Proof.

See Section 8.11. ∎

We are now in a position to state our main result about the $\widehat{\boldsymbol{r}}_{i}$'s defined by (4). This is our final result regarding the finite-sample analysis of the SCORE step. The proof involves both Theorem 1 and Theorem 2, and is deferred.

Theorem 3.

Assume the conditions in Theorem 2 hold. In addition, we assume

Klog0.5n+K1.5μn0.5θmax+Kμlognnθmax2+K3nθmax31.\displaystyle\frac{K\log^{0.5}n+K^{1.5}\sqrt{\mu^{*}}}{n^{0.5}\theta_{\text{max}}}+\frac{K\sqrt{\mu^{*}}\log n}{n\theta_{\text{max}}^{2}}+\frac{K^{3}}{n\theta_{\text{max}}^{3}}\ll 1.

Recall the orthogonal matrix 𝐑\boldsymbol{R} defined by (10) and

𝒘i={[𝑾𝒖1𝒖1𝑾𝑵]i,𝑼¯(𝚲¯)1},i[n],\displaystyle\boldsymbol{w}_{i}=\left\{\left[\boldsymbol{W}-\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{N}\right]_{i,\cdot}\overline{\boldsymbol{U}}^{*}\left(\overline{\boldsymbol{\Lambda}}^{*}\right)^{-1}\right\}^{\top},\quad\forall i\in[n],

where the matrix NN was defined by (14). Then we have the following decomposition

𝑹𝒓^i𝒓i=1+γi(𝒖1)i(𝒘i1λ1[𝑵𝑾𝒖1]i𝒓i)+[𝚿𝒓]i,,i[n],\displaystyle\boldsymbol{R}^{\top}\widehat{\boldsymbol{r}}_{i}-\boldsymbol{r}_{i}^{*}=\frac{1+\gamma_{i}}{(\boldsymbol{u}_{1}^{*})_{i}}\left(\boldsymbol{w}_{i}-\frac{1}{\lambda_{1}^{*}}\left[\boldsymbol{N}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right]_{i}\boldsymbol{r}_{i}^{*}\right)+[\boldsymbol{\Psi}_{\boldsymbol{r}}]_{i,\cdot}^{\top},\quad\forall i\in[n],

such that with probability at least 1O(n10)1-O(n^{-10}), for all i[n]i\in[n] we have

|γi|Klog0.5n+K1.5μn0.5θmax+Kμlognnθmax2+K3nθmax3,\displaystyle|\gamma_{i}|\lesssim\frac{K\log^{0.5}n+K^{1.5}\sqrt{\mu^{*}}}{n^{0.5}\theta_{\text{max}}}+\frac{K\sqrt{\mu^{*}}\log n}{n\theta_{\text{max}}^{2}}+\frac{K^{3}}{n\theta_{\text{max}}^{3}},
[𝚿𝒓]i,2n(𝚿𝑼¯2,+𝒓i2𝜹),\displaystyle\left\|[\boldsymbol{\Psi}_{\boldsymbol{r}}]_{i,\cdot}\right\|_{2}\lesssim\sqrt{n}\left(\left\|\boldsymbol{\Psi}_{\overline{\boldsymbol{U}}}\right\|_{2,\infty}+\left\|\boldsymbol{r}_{i}^{*}\right\|_{2}\left\|\boldsymbol{\delta}\right\|_{\infty}\right),

where $\left\|\boldsymbol{\delta}\right\|_{\infty}$ and $\left\|\boldsymbol{\Psi}_{\overline{\boldsymbol{U}}}\right\|_{2,\infty}$ are controlled by Theorem 1 and Theorem 2. Specifically, if $K,\mu^{*},\kappa^{*},\beta_{n},\theta_{\max}\asymp 1$, for all $i\in[n]$ we have

|γi|log0.5nn0.5and[𝚿𝒓]i,2log2nn.\displaystyle|\gamma_{i}|\lesssim\frac{\log^{0.5}n}{n^{0.5}}\quad\text{and}\quad\left\|[\boldsymbol{\Psi}_{\boldsymbol{r}}]_{i,\cdot}\right\|_{2}\lesssim\frac{\log^{2}n}{n}.
Proof.

See Section 8.12. ∎

Remark 2.

Regarding the number of communities allowed, if we assume $\mu^{*},\kappa^{*},\beta_{n},\theta_{\max}\asymp 1$, then the assumptions in Theorems 1-3 require $K\ll n^{1/3}$. Regarding $\theta_{\max}$, if $K,\mu^{*},\kappa^{*},\beta_{n}\asymp 1$, the assumptions in Theorems 1-3 require $\theta_{\max}\gg n^{-1/3}$.

We finish the section with our result regarding estimation error bound on the 𝒓^i\widehat{\boldsymbol{r}}_{i}s. To this end, we need to define the following quantities:

ε1:=\displaystyle\varepsilon_{1}:= Kμn0.5θmax+K0.5lognμn0.5θmax2+K1.5log0.5nn0.5βnθmax+K1.5lognμnβnθmax2;\displaystyle\frac{K\sqrt{\mu^{*}}}{n^{0.5}\theta_{\text{max}}}+\frac{K^{0.5}\log n\sqrt{\mu^{*}}}{n^{0.5}\theta_{\text{max}}^{2}}+\frac{K^{1.5}\log^{0.5}n}{n^{0.5}\beta_{n}\theta_{\text{max}}}+\frac{K^{1.5}\log n\sqrt{\mu^{*}}}{n\beta_{n}\theta_{\text{max}}^{2}};
ε2:=\displaystyle\varepsilon_{2}:= (Klog0.5n+K1.5μn0.5θmax+Kμlognnθmax2+K3nθmax3)ε1+K3μ+K2.5log0.5nnθmax2\displaystyle\left(\frac{K\log^{0.5}n+K^{1.5}\sqrt{\mu^{*}}}{n^{0.5}\theta_{\text{max}}}+\frac{K\sqrt{\mu^{*}}\log n}{n\theta_{\text{max}}^{2}}+\frac{K^{3}}{n\theta_{\text{max}}^{3}}\right)\varepsilon_{1}+\frac{K^{3}\sqrt{\mu^{*}}+K^{2.5}\log^{0.5}n}{n\theta_{\text{max}}^{2}}
+K2.5log1.5n+K3μn1.5θmax3+K0.5log2nμn2θmax4+ε0,\displaystyle+\frac{K^{2.5}\log^{1.5}n+K^{3}\sqrt{\mu^{*}}}{n^{1.5}\theta_{\text{max}}^{3}}+\frac{K^{0.5}\log^{2}n\sqrt{\mu^{*}}}{n^{2}\theta_{\text{max}}^{4}}+\varepsilon_{0}, (15)

where ε0\varepsilon_{0} is defined by (2.2).
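To get a feel for the size of $\varepsilon_{1}$, the following sketch evaluates its four terms exactly as defined above. The function name `eps1` and its argument names are illustrative; $\varepsilon_{2}$ is not included because it depends on $\varepsilon_{0}$ from (2.2), which is defined elsewhere in the paper.

```python
import math

def eps1(n, K, mu, beta, theta_max):
    """Evaluate epsilon_1 term by term from its definition.

    Arguments stand for n, K, mu^*, beta_n and theta_max respectively.
    """
    log_n = math.log(n)
    root_n = math.sqrt(n)
    root_mu = math.sqrt(mu)
    t1 = K * root_mu / (root_n * theta_max)
    t2 = math.sqrt(K) * log_n * root_mu / (root_n * theta_max ** 2)
    t3 = K ** 1.5 * math.sqrt(log_n) / (root_n * beta * theta_max)
    t4 = K ** 1.5 * log_n * root_mu / (n * beta * theta_max ** 2)
    return t1 + t2 + t3 + t4
```

With all parameters other than $n$ set to one, the ratio `eps1(n, 1, 1, 1, 1) * math.sqrt(n) / math.log(n)` stays bounded as $n$ grows, in line with the rate $\log n/n^{0.5}$.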

Despite the complicated expressions, these two quantities are easily interpretable: $\varepsilon_{1}$ controls the estimation error $\|\boldsymbol{R}^{\top}\widehat{\boldsymbol{r}}_{i}-\boldsymbol{r}_{i}^{*}\|_{2}$ according to Theorem 4 below, while $\varepsilon_{2}$ controls the expansion error $\left\|\boldsymbol{\Psi}_{\overline{\boldsymbol{U}}}\right\|_{2,\infty}$ according to Theorem 3. If we assume $K,\mu^{*},\kappa^{*},\beta_{n},\theta_{\max}\asymp 1$, then we have

ε1lognn0.5andε2log2nn.\displaystyle\varepsilon_{1}\asymp\frac{\log n}{n^{0.5}}\quad\text{and}\quad\varepsilon_{2}\asymp\frac{\log^{2}n}{n}.

That is, the expansion error decays faster than the estimation error by a factor of $\sqrt{n}$, up to logarithmic factors, so the first-order term dominates the remainder. As mentioned before, the following theorem shows that $\varepsilon_{1}$ controls the estimation error.

Theorem 4.

Assume the conditions in Theorem 3 hold. Assume

nmax{\displaystyle n\gtrsim\max\Bigg{\{} K2βn2θmax6,K4βn2θmax4logn,κ2K2μβn2θmax2logn,Kμ,κK2.5βn2θmax3log0.5n,κK2βn2θmax2,\displaystyle\frac{K^{2}}{\beta_{n}^{2}\theta_{\text{max}}^{6}},\frac{K^{4}}{\beta_{n}^{2}\theta_{\text{max}}^{4}\log n},\frac{\kappa^{*2}K^{2}\mu^{*}}{\beta_{n}^{2}\theta_{\text{max}}^{2}\log n},K\mu^{*},\frac{\kappa^{*}K^{2.5}}{\beta_{n}^{2}\theta_{\text{max}}^{3}\log^{0.5}n},\frac{\kappa^{*}K^{2}}{\beta_{n}^{2}\theta_{\text{max}}^{2}},
K1.5log0.5nβnθmax,K4θmax2,K3θmax4μ}\displaystyle\frac{K^{1.5}\log^{0.5}n}{\beta_{n}\theta_{\text{max}}},\frac{K^{4}}{\theta_{\text{max}}^{2}},\frac{K^{3}}{\theta_{\text{max}}^{4}\mu^{*}}\Bigg{\}} (16)

and

n1.5max{κK4βn3θmax4log0.5n,κK3βn3θmax4,K2.5log0.5nβn2θmax3,κ2K3μβn3θmax3log0.5n,κK2.5μβn2θmax2}.\displaystyle n^{1.5}\gtrsim\max\Bigg{\{}\frac{\kappa^{*}K^{4}}{\beta_{n}^{3}\theta_{\text{max}}^{4}\log^{0.5}n},\frac{\kappa^{*}K^{3}}{\beta_{n}^{3}\theta_{\text{max}}^{4}},\frac{K^{2.5}\log^{0.5}n}{\beta_{n}^{2}\theta_{\text{max}}^{3}},\frac{\kappa^{*2}K^{3}\sqrt{\mu^{*}}}{\beta_{n}^{3}\theta_{\text{max}}^{3}\log^{0.5}n},\frac{\kappa^{*}K^{2.5}\sqrt{\mu^{*}}}{\beta_{n}^{2}\theta_{\text{max}}^{2}}\Bigg{\}}. (17)

Then with probability at least 1O(n10)1-O(n^{-10}) we have

max1in𝑹𝒓^i𝒓i2ε1.\displaystyle\max_{1\leq i\leq n}\left\|\boldsymbol{R}^{\top}\widehat{\boldsymbol{r}}_{i}-\boldsymbol{r}_{i}^{*}\right\|_{2}\lesssim\varepsilon_{1}. (18)
Proof.

See Section 8.13. ∎

The above result gives a finite-sample error bound for the $\widehat{\boldsymbol{r}}_{i}$'s defined by (4). By Theorem 3, we know that

max1in𝑹𝒓^i𝒓iΔ𝒓i2\displaystyle\max_{1\leq i\leq n}\left\|\boldsymbol{R}^{\top}\widehat{\boldsymbol{r}}_{i}-\boldsymbol{r}_{i}^{*}-\Delta\boldsymbol{r}_{i}\right\|_{2} max1in|γi|2max1in𝑹𝒓^i𝒓i2+max1in[𝚿𝒓]i,2\displaystyle\lesssim\max_{1\leq i\leq n}|\gamma_{i}|_{2}\cdot\max_{1\leq i\leq n}\left\|\boldsymbol{R}^{\top}\widehat{\boldsymbol{r}}_{i}-\boldsymbol{r}_{i}^{*}\right\|_{2}+\max_{1\leq i\leq n}\left\|[\boldsymbol{\Psi}_{\boldsymbol{r}}]_{i,\cdot}\right\|_{2}
max1in|γi|2ε1+max1in[𝚿𝒓]i,2ε2\displaystyle\lesssim\max_{1\leq i\leq n}|\gamma_{i}|_{2}\cdot\varepsilon_{1}+\max_{1\leq i\leq n}\left\|[\boldsymbol{\Psi}_{\boldsymbol{r}}]_{i,\cdot}\right\|_{2}\lesssim\varepsilon_{2} (19)

with probability at least 1O(n10)1-O(n^{-10}), where we define

\[\Delta\boldsymbol{r}_{i}:=\left(\boldsymbol{w}_{i}-\left[\boldsymbol{N}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right]_{i}\boldsymbol{r}_{i}^{*}/\lambda_{1}^{*}\right)/(\boldsymbol{u}_{1}^{*})_{i},\quad i\in[n]. \tag{20}\]
Remark 3.

We remark here that (16) and (17) are sufficient, but not necessary, conditions for (18) to hold. In fact, these two conditions ensure that the upper bound of the first-order term $\left\|\Delta\boldsymbol{r}_{i}\right\|_{2}$ dominates the upper bound of the expansion error given by Theorem 3, which simplifies the upper bound of the estimation error $\max_{1\leq i\leq n}\|\boldsymbol{R}^{\top}\widehat{\boldsymbol{r}}_{i}-\boldsymbol{r}_{i}^{*}\|_{2}$. In other words, the results in the rest of the paper hold without (16) and (17), but a more complicated expression for $\varepsilon_{1}$ is then required.

3 Vertex Hunting

In this section, we describe how to estimate the $K$ underlying vertices of the simplex from the data. To this end, define disjoint subsets $\mathbb{V}_{1},\mathbb{V}_{2},\dots,\mathbb{V}_{K}\subset[n]$ such that $\mathbb{V}_{k}$ is the collection of all the pure nodes of the $k$-th community, and let $\boldsymbol{b}_{k}^{*}$ be the vertex of the corresponding community, i.e.,

𝕍k={i[n]:𝝅i(k)=1}and𝒃k=(i𝕍k𝒓i)/|𝕍k|k[K].\displaystyle\mathbb{V}_{k}=\{i\in[n]:\boldsymbol{\pi}_{i}(k)=1\}\quad\text{and}\quad\boldsymbol{b}_{k}^{*}=\left(\sum_{i\in\mathbb{V}_{k}}\boldsymbol{r}_{i}^{*}\right)/|\mathbb{V}_{k}|\quad\forall k\in[K]. (21)

The following quantity

Δ𝒓=mink[K]mini[n]\𝕍k𝒓i𝒃k2\displaystyle\Delta_{\boldsymbol{r}}=\min_{k\in[K]}\min_{i\in[n]\backslash\mathbb{V}_{k}}\left\|\boldsymbol{r}_{i}^{*}-\boldsymbol{b}_{k}^{*}\right\|_{2} (22)

measures the gap between any vertex and the other points. We will use the following successive projection algorithm given by Algorithm 1 for the vertex hunting step.

Algorithm 1 Successive projection
1:  Input $\widehat{\boldsymbol{r}}_{1},\widehat{\boldsymbol{r}}_{2},\dots,\widehat{\boldsymbol{r}}_{n}$, radius $\phi$
2:  Initialize $\boldsymbol{Z}_{i}=(1,\widehat{\boldsymbol{r}}_{i}^{\top})^{\top}$, for $i\in[n]$
3:  for $k=1,2,\dots,K$ do
4:     Let $i_{k}=\operatorname*{arg\,max}_{1\leq i\leq n}\left\|\boldsymbol{Z}_{i}\right\|_{2}$ and $\widehat{\boldsymbol{b}}_{k}^{\prime}=\widehat{\boldsymbol{r}}_{i_{k}}$
5:     Update $\boldsymbol{Z}_{i}\leftarrow\boldsymbol{Z}_{i}-\widehat{\boldsymbol{r}}_{i_{k}}\widehat{\boldsymbol{r}}_{i_{k}}^{\top}\boldsymbol{Z}_{i}/\left\|\widehat{\boldsymbol{r}}_{i_{k}}\right\|_{2}^{2}$, for $i\in[n]$
6:  end for
7:  Let $\widehat{\mathbb{V}}_{k}=\left\{i\in[n]:\left\|\widehat{\boldsymbol{r}}_{i}-\widehat{\boldsymbol{b}}_{k}^{\prime}\right\|_{2}\leq\phi\right\}$ and $\widehat{\boldsymbol{b}}_{k}=\frac{1}{|\widehat{\mathbb{V}}_{k}|}\sum_{i\in\widehat{\mathbb{V}}_{k}}\widehat{\boldsymbol{r}}_{i}$, for $k\in[K]$
8:  return  $\widehat{\mathbb{V}}_{1},\widehat{\mathbb{V}}_{2},\dots,\widehat{\mathbb{V}}_{K}$ and $\widehat{\boldsymbol{b}}_{1},\widehat{\boldsymbol{b}}_{2},\dots,\widehat{\boldsymbol{b}}_{K}$

The successive projection algorithm (SPA) is a forward variable selection method introduced by [ASG+01]. Our version of SPA borrows from [GV13], who derived its theoretical guarantee. Alternative vertex hunting algorithms in the spirit of [JKL23] could also be considered; we leave this for future research.
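The steps of Algorithm 1 can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: the function name `successive_projection` is hypothetical, and the projection step uses the current lifted residual $\boldsymbol{Z}_{i_k}$ of the selected point, as in the standard form of SPA, which is an assumption about how the update in Algorithm 1 is meant to be applied to the lifted points.

```python
import numpy as np

def successive_projection(r_hat, K, phi):
    """Sketch of vertex hunting by successive projection.

    r_hat : (n, K-1) array whose rows are the estimated points r_i.
    K     : number of vertices to select.
    phi   : radius used to collect the points around each selected vertex.
    Returns the list of index sets V_hat and the (K, K-1) array b_hat.
    """
    n = r_hat.shape[0]
    Z = np.hstack([np.ones((n, 1)), r_hat])    # lifted points Z_i = (1, r_i^T)^T
    picked = []
    for _ in range(K):
        norms = np.linalg.norm(Z, axis=1)
        ik = int(np.argmax(norms))             # point with the largest residual norm
        picked.append(ik)
        u = Z[ik] / norms[ik]                  # unit direction of the selected residual
        # Assumption: project every residual onto the orthogonal complement of u.
        Z = Z - np.outer(Z @ u, u)
    b_prime = r_hat[picked]                    # preliminary vertices
    V_hat, b_hat = [], []
    for k in range(K):
        dist = np.linalg.norm(r_hat - b_prime[k], axis=1)
        members = np.flatnonzero(dist <= phi)  # points within radius phi
        V_hat.append(members)
        b_hat.append(r_hat[members].mean(axis=0))
    return V_hat, np.array(b_hat)
```

The final averaging over each ball of radius $\phi$ mirrors step 7 of Algorithm 1; the returned vertices are ordered by selection, so they match the true vertices only up to a permutation.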

Our goal for the remainder of the section is to understand how effective SPA is in selecting the underlying vertices. If $\Delta_{\boldsymbol{r}}$, defined by (22), is not too small, we expect the vertex hunting algorithm to retrieve all the vertices of the simplex, a property known as selection consistency. This is the main result of this section. The proof follows by combining Theorem 4 with Theorem 3 of [GV13].

Theorem 5.

Assume the conditions in Theorem 4 hold and the estimation error bound on the right hand side of (18) is at most of a constant order. If we further have

Δ𝒓>2ϕ>CSPε1\displaystyle\Delta_{\boldsymbol{r}}>2\phi>C_{\text{SP}}\cdot\varepsilon_{1}

for some constant $C_{\text{SP}}>0$, then with probability at least $1-O(n^{-10})$, there exists a permutation $\rho$ of $[K]$ such that $\widehat{\mathbb{V}}_{\rho(k)}=\mathbb{V}_{k}$ for all $k\in[K]$.

Proof.

See Section 8.14. ∎

Theorem 5, along with Theorem 3, yields the following result regarding $\widehat{\boldsymbol{b}}_{k}$.

Corollary 1.

Assume the conditions in Theorem 5 hold. Then with probability at least 1O(n10)1-O(n^{-10}), there exists a permutation ρ\rho of [K][K], such that

𝑹𝒃^ρ(k)𝒃k=1|𝕍k|i𝕍kΔ𝒓i+[𝚿𝒃]k,,k[K],\displaystyle\boldsymbol{R}^{\top}\widehat{\boldsymbol{b}}_{\rho(k)}-\boldsymbol{b}_{k}^{*}=\frac{1}{\left|\mathbb{V}_{k}\right|}\sum_{i\in\mathbb{V}_{k}}\Delta\boldsymbol{r}_{i}+[\boldsymbol{\Psi}_{\boldsymbol{b}}]_{k,\cdot}^{\top},\quad\forall k\in[K], (23)

where $\Delta\boldsymbol{r}_{i}$ is defined by (20). Furthermore, $\left\|\boldsymbol{\Psi}_{\boldsymbol{b}}\right\|_{2,\infty}\lesssim\varepsilon_{2}$.

Proof.

See Section 8.15. ∎

The leading term of RHS of (23) will be denoted by

Δ𝒃k:=(i𝕍kΔ𝒓i)/|𝕍k|,k[K].\displaystyle\Delta\boldsymbol{b}_{k}:=\left(\sum_{i\in\mathbb{V}_{k}}\Delta\boldsymbol{r}_{i}\right)/|\mathbb{V}_{k}|,\quad k\in[K]. (24)

4 Membership Reconstruction

In this section, we characterize the behavior of $\widehat{\boldsymbol{\pi}}_{i}$. To this end, we require expansions of the following three terms:

\[\widehat{\lambda}_{1},\quad\widehat{\boldsymbol{b}}_{k}^{\top}\textbf{diag}(\widehat{\lambda}_{2},\widehat{\lambda}_{3},\dots,\widehat{\lambda}_{K})\widehat{\boldsymbol{b}}_{k},\ k\in[K],\quad\text{and}\quad\widehat{\boldsymbol{a}}_{i},\ i\in[n].\]

The expansion of $\widehat{\lambda}_{1}$ is given directly by Theorem 10, and we defer the precise statement to Section 7. We now derive the expansion of $\widehat{\boldsymbol{b}}_{k}^{\top}\textbf{diag}(\widehat{\lambda}_{2},\widehat{\lambda}_{3},\dots,\widehat{\lambda}_{K})\widehat{\boldsymbol{b}}_{k}=\widehat{\boldsymbol{b}}_{k}^{\top}\overline{\boldsymbol{\Lambda}}\widehat{\boldsymbol{b}}_{k}$. We will use the following notation: given any vector $\boldsymbol{v}\in\mathbb{R}^{K}$ and permutation $\rho(\cdot)$ of $[K]$, set

ρ(𝒗)=[vρ(1),vρ(2),,vρ(K)].\displaystyle\rho(\boldsymbol{v})=\left[v_{\rho(1)},v_{\rho(2)},\dots,v_{\rho(K)}\right]^{\top}.

And, given a matrix 𝑨=[𝒂1,𝒂2,,𝒂K]\boldsymbol{A}=[\boldsymbol{a}_{1},\boldsymbol{a}_{2},\dots,\boldsymbol{a}_{K}] with KK columns, define

ρ(𝑨)=[𝒂ρ(1),𝒂ρ(2),,𝒂ρ(K)].\displaystyle\rho(\boldsymbol{A})=\left[\boldsymbol{a}_{\rho(1)},\boldsymbol{a}_{\rho(2)},\dots,\boldsymbol{a}_{\rho(K)}\right].

Now, we are in a position to state our result regarding 𝒃^k𝚲¯𝒃^k\widehat{\boldsymbol{b}}_{k}^{\top}\overline{\boldsymbol{\Lambda}}\widehat{\boldsymbol{b}}_{k}.

Lemma 2.

Assume the conditions in Theorem 5 hold and ε2ε1K1\varepsilon_{2}\lesssim\varepsilon_{1}\lesssim\sqrt{K-1}, where ε1\varepsilon_{1} and ε2\varepsilon_{2} are defined via (15). Then with probability at least 1O(n10)1-O(n^{-10}), for all k[K]k\in[K], we have the following expansion:

𝒃^ρ(k)𝚲¯𝒃^ρ(k)𝒃k𝚲¯𝒃k=2𝒃k𝚲¯Δ𝒃k+[𝝍]k,\displaystyle\widehat{\boldsymbol{b}}_{\rho(k)}^{\top}\overline{\boldsymbol{\Lambda}}\widehat{\boldsymbol{b}}_{\rho(k)}-\boldsymbol{b}_{k}^{*\top}\overline{\boldsymbol{\Lambda}}^{*}\boldsymbol{b}_{k}^{*}=2\boldsymbol{b}_{k}^{*\top}\overline{\boldsymbol{\Lambda}}^{*}\Delta\boldsymbol{b}_{k}+\left[\boldsymbol{\psi}\right]_{k},

where $\Delta\boldsymbol{b}_{k}$ is defined by (24), $\boldsymbol{b}_{k}^{*}$ is defined by (21), $\overline{\boldsymbol{\Lambda}}^{*}:=\textbf{diag}(\lambda_{2}^{*},\ldots,\lambda_{K}^{*})$, and the error $\boldsymbol{\psi}$ satisfies

𝝍κK2βn+K1.5θmaxlog0.5n+(K0.5ε2+ε12)σmax.\displaystyle\left\|\boldsymbol{\psi}\right\|_{\infty}\lesssim\frac{\kappa^{*}K^{2}}{\beta_{n}}+K^{1.5}\theta_{\text{max}}\log^{0.5}n+\left(K^{0.5}\varepsilon_{2}+\varepsilon_{1}^{2}\right)\sigma_{\textbf{max}}^{*}.

For the estimation error, if κK1.5βn+Kθmaxlog0.5nε1σmax\frac{\kappa^{*}K^{1.5}}{\beta_{n}}+K\theta_{\text{max}}\log^{0.5}n\lesssim\varepsilon_{1}\sigma_{\textbf{max}}^{*}, we have

|𝒃^ρ(k)𝚲¯𝒃^ρ(k)𝒃k𝚲¯𝒃k|K0.5ε1σmax.\displaystyle\left|\widehat{\boldsymbol{b}}_{\rho(k)}^{\top}\overline{\boldsymbol{\Lambda}}\widehat{\boldsymbol{b}}_{\rho(k)}-\boldsymbol{b}_{k}^{*\top}\overline{\boldsymbol{\Lambda}}^{*}\boldsymbol{b}_{k}^{*}\right|\lesssim K^{0.5}\varepsilon_{1}\sigma_{\textbf{max}}^{*}.
Proof.

See Section 8.17. ∎

The following lemma gives an expansion of $\widehat{\boldsymbol{a}}_{i}$, $i\in[n]$.

Lemma 3.

Assume the conditions in Theorem 5 hold and ε2ε1C\varepsilon_{2}\lesssim\varepsilon_{1}\leq C for some constant C>0C>0. Let ρ()\rho(\cdot) be the permutation in Theorem 5. Then with probability at least 1O(n10)1-O(n^{-10}), for all i[n]i\in[n], 𝐚^i\widehat{\boldsymbol{a}}_{i} can be expanded as

ρ(𝒂^i)𝒂i=(𝑩)1[Δ𝒓i0](𝑩)1Δ𝑩𝒂i+[𝚿𝒂]i,,\displaystyle\rho(\widehat{\boldsymbol{a}}_{i})-\boldsymbol{a}^{*}_{i}=(\boldsymbol{B}^{*})^{-1}\begin{bmatrix}\Delta\boldsymbol{r}_{i}\\ 0\end{bmatrix}-(\boldsymbol{B}^{*})^{-1}\Delta\boldsymbol{B}\boldsymbol{a}_{i}^{*}+\left[\boldsymbol{\Psi}_{\boldsymbol{a}}\right]_{i,\cdot}^{\top},

where

𝑩=[𝒃1𝒃2𝒃K111],𝒂i=(𝑩)1[𝒓i1],Δ𝑩=[Δ𝒃1Δ𝒃2Δ𝒃K000],\displaystyle\boldsymbol{B}^{*}=\begin{bmatrix}\boldsymbol{b}_{1}^{*}&\boldsymbol{b}_{2}^{*}&\dots&\boldsymbol{b}_{K}^{*}\\ 1&1&\dots&1\end{bmatrix},\quad\boldsymbol{a}_{i}^{*}=(\boldsymbol{B}^{*})^{-1}\begin{bmatrix}\boldsymbol{r}_{i}^{*}\\ 1\end{bmatrix},\quad\Delta\boldsymbol{B}=\begin{bmatrix}\Delta\boldsymbol{b}_{1}&\Delta\boldsymbol{b}_{2}&\dots&\Delta\boldsymbol{b}_{K}\\ 0&0&\dots&0\end{bmatrix}, (25)

and

[𝚿𝒂]i,2ε2+ε1ρ(𝒂^i)𝒂i2.\displaystyle\|\left[\boldsymbol{\Psi}_{\boldsymbol{a}}\right]_{i,\cdot}\|_{2}\lesssim\varepsilon_{2}+\varepsilon_{1}\left\|\rho(\widehat{\boldsymbol{a}}_{i})-\boldsymbol{a}^{*}_{i}\right\|_{2}.

Furthermore, the estimation error can be controlled as

ρ(𝒂^i)𝒂i2ε1.\displaystyle\left\|\rho(\widehat{\boldsymbol{a}}_{i})-\boldsymbol{a}^{*}_{i}\right\|_{2}\lesssim\varepsilon_{1}.
Proof.

See Section 8.18. ∎
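The population quantity $\boldsymbol{a}_{i}^{*}=(\boldsymbol{B}^{*})^{-1}[\boldsymbol{r}_{i}^{*\top},1]^{\top}$ in (25) is the vector of barycentric coordinates of $\boldsymbol{r}_{i}^{*}$ with respect to the vertices $\boldsymbol{b}_{1}^{*},\dots,\boldsymbol{b}_{K}^{*}$. The sketch below computes these coordinates for a batch of points; the function name `barycentric_coordinates` and the array layout are illustrative assumptions.

```python
import numpy as np

def barycentric_coordinates(r, b):
    """Solve B a_i = (r_i, 1) for every point r_i.

    r : (n, K-1) array of points.
    b : (K, K-1) array whose rows are the K vertices.
    Returns an (n, K) array whose i-th row is a_i.
    """
    K = b.shape[0]
    n = r.shape[0]
    B = np.vstack([b.T, np.ones((1, K))])     # K x K matrix [b_1 ... b_K; 1 ... 1]
    rhs = np.vstack([r.T, np.ones((1, n))])   # columns are (r_i, 1)
    return np.linalg.solve(B, rhs).T          # solve instead of forming B^{-1}
```

Each row of the output sums to one because of the last row of $\boldsymbol{B}^{*}$, and points inside the simplex have nonnegative coordinates.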

For simplicity, we define, for i[n]i\in[n]

Δ𝒂i=(𝑩)1[Δ𝒓i0](𝑩)1Δ𝑩𝒂i,\displaystyle\Delta\boldsymbol{a}_{i}=(\boldsymbol{B}^{*})^{-1}\begin{bmatrix}\Delta\boldsymbol{r}_{i}\\ 0\end{bmatrix}-(\boldsymbol{B}^{*})^{-1}\Delta\boldsymbol{B}\boldsymbol{a}_{i}^{*}, (26)

where $\boldsymbol{B}^{*}$ and $\boldsymbol{a}_{i}^{*}$ are defined by (25). As a counterpart of $\widehat{\boldsymbol{c}}$ in the membership construction step (6), we define $c_{k}^{*}=[\lambda_{1}^{*}+\boldsymbol{b}_{k}^{*\top}\overline{\boldsymbol{\Lambda}}^{*}\boldsymbol{b}_{k}^{*}]^{-1/2}$ for $k\in[K]$, where $\boldsymbol{b}_{k}^{*}$ is given by (21). The expansion of $\widehat{\boldsymbol{c}}$ follows from Corollary 3 and Lemma 2. Combining them with Lemma 3, we obtain the following expansion of $\widehat{\boldsymbol{\pi}}_{i}$ that is linear in $\boldsymbol{W}$.

Theorem 6.

Assume the conditions in Theorem 5 hold and ε2ε1C\varepsilon_{2}\lesssim\varepsilon_{1}\leq C for some constant C>0C>0. Let ρ()\rho(\cdot) be the permutation in Theorem 5. Then with probability at least 1O(n10)1-O(n^{-10}), for all i[n]i\in[n] and k[K]k\in[K], we have

𝝅^i(ρ(k))𝝅i(k)=(1+ηi)Δ𝝅i(k)+[𝚿𝚷]i,k,\displaystyle\widehat{\boldsymbol{\pi}}_{i}(\rho(k))-\boldsymbol{\pi}_{i}(k)=(1+\eta_{i})\Delta\boldsymbol{\pi}_{i}(k)+\left[\boldsymbol{\Psi}_{\boldsymbol{\Pi}}\right]_{i,k},

where

Δ𝝅i(k)=\displaystyle\Delta\boldsymbol{\pi}_{i}(k)= 1(l=1K(𝒂i)l/cl)2{lk,l[K]Tr[𝑾𝒖1𝒖1+2𝑵𝑾𝒖1𝒖1](ck2clcl2ck)(𝒂i)k(𝒂i)l\displaystyle\frac{1}{\left(\sum_{l=1}^{K}(\boldsymbol{a}^{*}_{i})_{l}/c^{*}_{l}\right)^{2}}\Bigg{\{}\sum_{l\neq k,l\in[K]}\textbf{Tr}\left[\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}+2\boldsymbol{N}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right]\left(\frac{c_{k}^{*}}{2c_{l}^{*}}-\frac{c_{l}^{*}}{2c_{k}^{*}}\right)(\boldsymbol{a}_{i}^{*})_{k}(\boldsymbol{a}_{i}^{*})_{l}
+(Δ𝒂i)k(𝒂i)l(Δ𝒂i)l(𝒂i)kckcl+(𝒃k𝚲¯Δ𝒃kckcl𝒃l𝚲¯Δ𝒃lclck)(𝒂i)k(𝒂i)l},\displaystyle\quad\quad+\frac{(\Delta\boldsymbol{a}_{i})_{k}(\boldsymbol{a}_{i}^{*})_{l}-(\Delta\boldsymbol{a}_{i})_{l}(\boldsymbol{a}_{i}^{*})_{k}}{c_{k}^{*}c_{l}^{*}}+\left(\frac{\boldsymbol{b}_{k}^{*\top}\overline{\boldsymbol{\Lambda}}^{*}\Delta\boldsymbol{b}_{k}c_{k}^{*}}{c_{l}^{*}}-\frac{\boldsymbol{b}_{l}^{*\top}\overline{\boldsymbol{\Lambda}}^{*}\Delta\boldsymbol{b}_{l}c_{l}^{*}}{c_{k}^{*}}\right)(\boldsymbol{a}_{i}^{*})_{k}(\boldsymbol{a}_{i}^{*})_{l}\Bigg{\}}, (27)

$|\eta_{i}|\lesssim K\varepsilon_{1}$ and

|[𝚿𝚷]i,k|Knθmax2(κK2βn+K1.5θmaxlog0.5n+(K0.5ε2+ε12)σmax)+K(ε2+ε12).\displaystyle\left|\left[\boldsymbol{\Psi}_{\boldsymbol{\Pi}}\right]_{i,k}\right|\lesssim\frac{K}{n\theta_{\text{max}}^{2}}\left(\frac{\kappa^{*}K^{2}}{\beta_{n}}+K^{1.5}\theta_{\text{max}}\log^{0.5}n+\left(K^{0.5}\varepsilon_{2}+\varepsilon_{1}^{2}\right)\sigma_{\textbf{max}}^{*}\right)+K(\varepsilon_{2}+\varepsilon_{1}^{2}).
Proof.

See Section 8.19. ∎

Given ε1\varepsilon_{1} and ε2\varepsilon_{2} in (15), we define

ε3:=Knθmax2(κK2βn+K1.5θmaxlog0.5n+(K0.5ε2+ε12)σmax)+K(ε2+ε12),\displaystyle\varepsilon_{3}:=\frac{K}{n\theta_{\text{max}}^{2}}\left(\frac{\kappa^{*}K^{2}}{\beta_{n}}+K^{1.5}\theta_{\text{max}}\log^{0.5}n+\left(K^{0.5}\varepsilon_{2}+\varepsilon_{1}^{2}\right)\sigma_{\textbf{max}}^{*}\right)+K(\varepsilon_{2}+\varepsilon_{1}^{2}),

where $\kappa^{*}$ is defined by (2.2). Since $\sigma_{\textbf{max}}^{*}\asymp\beta_{n}K^{-1}\|\boldsymbol{\theta}\|_{2}^{2}\lesssim K^{-1}\|\boldsymbol{\theta}\|_{2}^{2}$, $\max_{k\in[K]}\|\Delta\boldsymbol{b}_{k}\|_{2}\lesssim\varepsilon_{1}$ and $\max_{i\in[n]}\|\Delta\boldsymbol{a}_{i}\|_{2}\lesssim\varepsilon_{1}$, from Theorem 6 one can show that

|𝝅^i(ρ(k))𝝅i(k)Δ𝝅i(k)|ε3\displaystyle|\widehat{\boldsymbol{\pi}}_{i}(\rho(k))-\boldsymbol{\pi}_{i}(k)-\Delta\boldsymbol{\pi}_{i}(k)|\lesssim\varepsilon_{3}

with probability at least 1O(n10)1-O(n^{-10}). Specifically, if K,μ,κ,βn,θmax1K,\mu^{*},\kappa^{*},\beta_{n},\theta_{\max}\asymp 1, we have

ε3log2nn.\displaystyle\varepsilon_{3}\asymp\frac{\log^{2}n}{n}.

5 Distributional Theory and Rank Inference

In this section, we tackle specific inference problems based on the uncertainty quantification results stated above. First, we establish distributional guarantees using the first-order expansion derived in Theorem 6. Second, we apply the distributional results to related inference problems, in particular rank inference. To state the distributional results for $\widehat{\boldsymbol{\pi}}_{i}$, we need the following notation. Define the non-random $n\times n$ matrices

𝑪i,k𝒓:=𝚲¯kk(𝒖1)i[𝒆i(𝑼¯,k)]𝚲¯kk[𝒖1(𝑼¯,k)𝑵](𝒓i)k(𝒖1)iλ1𝒖1𝑵i,,\displaystyle\boldsymbol{C}^{\boldsymbol{r}}_{i,k}:=\frac{\overline{\boldsymbol{\Lambda}}_{kk}^{*}}{(\boldsymbol{u}_{1})_{i}}\left[\boldsymbol{e}_{i}\left(\overline{\boldsymbol{U}}_{\cdot,k}^{*}\right)^{\top}\right]-\overline{\boldsymbol{\Lambda}}_{kk}^{*}\left[\boldsymbol{u}_{1}^{*}\left(\overline{\boldsymbol{U}}_{\cdot,k}^{*}\right)^{\top}\boldsymbol{N}\right]-\frac{(\boldsymbol{r}_{i}^{*})_{k}}{(\boldsymbol{u}_{1}^{*})_{i}\lambda_{1}^{*}}\boldsymbol{u}_{1}^{*}\boldsymbol{N}_{i,\cdot},
𝑪i,k𝒂:=l[K1](𝑩)k,l1𝑪i,l𝒓l[K1]t[K](𝑩)k,l1(𝒂i)t𝑪t,l𝒃,𝑪i,k𝒃:=1|𝕍i|j𝕍i𝑪j,k𝒓,\displaystyle\boldsymbol{C}^{\boldsymbol{a}}_{i,k}:=\sum_{l\in[K-1]}\left(\boldsymbol{B}^{*}\right)_{k,l}^{-1}\boldsymbol{C}^{\boldsymbol{r}}_{i,l}-\sum_{l\in[K-1]}\sum_{t\in[K]}\left(\boldsymbol{B}^{*}\right)_{k,l}^{-1}(\boldsymbol{a}_{i}^{*})_{t}\boldsymbol{C}^{\boldsymbol{b}}_{t,l},\quad\boldsymbol{C}^{\boldsymbol{b}}_{i,k}:=\frac{1}{\left|\mathbb{V}_{i}\right|}\sum_{j\in\mathbb{V}_{i}}\boldsymbol{C}^{\boldsymbol{r}}_{j,k},
𝑪i,k𝝅:=1(l=1K(𝒂i)l/cl)2lk,l[K]{(𝒖1𝒖1+2𝒖1𝒖1𝑵)(ck2clcl2ck)(𝒂i)k(𝒂i)l\displaystyle\boldsymbol{C}^{\boldsymbol{\pi}}_{i,k}:=\frac{1}{\left(\sum_{l=1}^{K}(\boldsymbol{a}^{*}_{i})_{l}/c^{*}_{l}\right)^{2}}\sum_{l\neq k,l\in[K]}\Bigg{\{}\left(\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}+2\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{N}\right)\left(\frac{c_{k}^{*}}{2c_{l}^{*}}-\frac{c_{l}^{*}}{2c_{k}^{*}}\right)(\boldsymbol{a}_{i}^{*})_{k}(\boldsymbol{a}_{i}^{*})_{l}
+(𝒂i)l𝑪i,k𝒂(𝒂i)k𝑪i,l𝒂ckcl+t[K1](𝒃kt𝚲¯ttck𝑪k,t𝒃cl𝒃lt𝚲¯ttcl𝑪l,t𝒃ck)(𝒂i)k(𝒂i)l}.\displaystyle\quad\quad+\frac{(\boldsymbol{a}_{i}^{*})_{l}\boldsymbol{C}^{\boldsymbol{a}}_{i,k}-(\boldsymbol{a}_{i}^{*})_{k}\boldsymbol{C}^{\boldsymbol{a}}_{i,l}}{c_{k}^{*}c_{l}^{*}}+\sum_{t\in[K-1]}\left(\frac{\boldsymbol{b}_{kt}^{*\top}\overline{\boldsymbol{\Lambda}}^{*}_{tt}c_{k}^{*}\boldsymbol{C}^{\boldsymbol{b}}_{k,t}}{c_{l}^{*}}-\frac{\boldsymbol{b}_{lt}^{*\top}\overline{\boldsymbol{\Lambda}}^{*}_{tt}c_{l}^{*}\boldsymbol{C}^{\boldsymbol{b}}_{l,t}}{c_{k}^{*}}\right)(\boldsymbol{a}_{i}^{*})_{k}(\boldsymbol{a}_{i}^{*})_{l}\Bigg{\}}. (28)

One can see that

(Δ𝒓i)k=Tr[𝑪i,k𝒓𝑾],(Δ𝒃i)k=Tr[𝑪i,k𝒃𝑾],\displaystyle(\Delta\boldsymbol{r}_{i})_{k}=\textbf{Tr}\left[\boldsymbol{C}_{i,k}^{\boldsymbol{r}}\boldsymbol{W}\right],\quad(\Delta\boldsymbol{b}_{i})_{k}=\textbf{Tr}\left[\boldsymbol{C}_{i,k}^{\boldsymbol{b}}\boldsymbol{W}\right],
(Δ𝒂i)k=Tr[𝑪i,k𝒂𝑾],Δ𝝅i(k)=Tr[𝑪i,k𝝅𝑾],\displaystyle(\Delta\boldsymbol{a}_{i})_{k}=\textbf{Tr}\left[\boldsymbol{C}_{i,k}^{\boldsymbol{a}}\boldsymbol{W}\right],\quad\Delta\boldsymbol{\pi}_{i}(k)=\textbf{Tr}\left[\boldsymbol{C}_{i,k}^{\boldsymbol{\pi}}\boldsymbol{W}\right],

where $\Delta\boldsymbol{r}_{i}$, $\Delta\boldsymbol{b}_{i}$, $\Delta\boldsymbol{a}_{i}$, $\Delta\boldsymbol{\pi}_{i}(k)$ are defined by (20), (24), (26) and (27), respectively. For any matrices $\boldsymbol{M},\boldsymbol{M}_{1},\boldsymbol{M}_{2}\in\mathbb{R}^{n\times n}$, we define the variance of $\textbf{Tr}[\boldsymbol{M}\boldsymbol{W}]$ as

V𝑴:=i[n]Mii2Hii(1Hii)+1i<jn(Mij+Mji)2Hij(1Hij),\displaystyle V_{\boldsymbol{M}}:=\sum_{i\in[n]}M_{ii}^{2}H_{ii}(1-H_{ii})+\sum_{1\leq i<j\leq n}(M_{ij}+M_{ji})^{2}H_{ij}(1-H_{ij}),

and the covariance of Tr[𝑴1𝑾]\textbf{Tr}[\boldsymbol{M}_{1}\boldsymbol{W}] and Tr[𝑴2𝑾]\textbf{Tr}[\boldsymbol{M}_{2}\boldsymbol{W}] as

V𝑴1,𝑴2:=i[n]M1,iiM2,iiHii(1Hii)+1i<jn(M1,ij+M1,ji)(M2,ij+M2,ji)Hij(1Hij).\displaystyle V_{\boldsymbol{M}_{1},\boldsymbol{M}_{2}}:=\sum_{i\in[n]}M_{1,ii}M_{2,ii}H_{ii}(1-H_{ii})+\sum_{1\leq i<j\leq n}(M_{1,ij}+M_{1,ji})(M_{2,ij}+M_{2,ji})H_{ij}(1-H_{ij}).
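The variance and covariance formulas above translate directly into a few array operations. The sketch below assumes, consistently with the factors $H_{ij}(1-H_{ij})$, that the entries of $\boldsymbol{W}$ on and above the diagonal are independent centered Bernoulli variables with success probabilities $H_{ij}$; the function names are illustrative.

```python
import numpy as np

def trace_covariance(M1, M2, H):
    """Covariance of Tr[M1 W] and Tr[M2 W] for a symmetric noise matrix W.

    Assumption: W_ij (i <= j) are independent with variance H_ij (1 - H_ij).
    """
    v = H * (1.0 - H)                      # entrywise variances of W
    S1 = M1 + M1.T                         # coefficient of W_ij for i < j
    S2 = M2 + M2.T
    iu = np.triu_indices_from(H, k=1)      # strictly upper-triangular pairs
    diag_part = np.sum(np.diag(M1) * np.diag(M2) * np.diag(v))
    off_part = np.sum(S1[iu] * S2[iu] * v[iu])
    return diag_part + off_part

def trace_variance(M, H):
    """Variance of Tr[M W]; the special case M1 = M2 = M."""
    return trace_covariance(M, M, H)
```

Replacing `H` by a plug-in estimate of the probability matrix gives the corresponding estimated variance.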

The asymptotic distribution of $\widehat{\boldsymbol{\pi}}_{i}$ is given by the following theorem.

Theorem 7.

Let ρ()\rho(\cdot) be the permutation in Theorem 5. Let rr be a fixed integer and

={(i1,k1),(i2,k2),,(ir,kr)}\displaystyle\mathcal{I}=\left\{(i_{1},k_{1}),(i_{2},k_{2}),\dots,(i_{r},k_{r})\right\}

be rr distinct fixed pairs from [n]×[K][n]\times[K]. We consider vectors

𝝅^:=(𝝅^i1(ρ(k1)),𝝅^i2(ρ(k2)),,𝝅^ir(ρ(kr))) and 𝝅:=(𝝅i1(k1),𝝅i2(k2),,𝝅ir(kr)),\displaystyle\widehat{\boldsymbol{\pi}}_{\mathcal{I}}:=\left(\widehat{\boldsymbol{\pi}}_{i_{1}}(\rho(k_{1})),\widehat{\boldsymbol{\pi}}_{i_{2}}(\rho(k_{2})),\dots,\widehat{\boldsymbol{\pi}}_{i_{r}}(\rho(k_{r}))\right)^{\top}\text{ and }\boldsymbol{\pi}_{\mathcal{I}}:=\left(\boldsymbol{\pi}_{i_{1}}(k_{1}),\boldsymbol{\pi}_{i_{2}}(k_{2}),\dots,\boldsymbol{\pi}_{i_{r}}(k_{r})\right)^{\top},

Denote by $\boldsymbol{\Sigma}$ the covariance matrix whose $j$-th diagonal entry is $V_{\boldsymbol{C}^{\boldsymbol{\pi}}_{i_{j},k_{j}}}$, $j\in[r]$, and whose $(j,k)$ off-diagonal entry is $V_{\boldsymbol{C}^{\boldsymbol{\pi}}_{i_{j},k_{j}},\boldsymbol{C}^{\boldsymbol{\pi}}_{i_{k},k_{k}}}$, $j\neq k\in[r]$. Then for any convex set $\mathcal{D}\subset\mathbb{R}^{r}$, we have

|(𝝅^𝝅𝒟)(𝒩(𝟎r,𝚺)𝒟)|=o(1),\displaystyle\left|\mathbb{P}(\widehat{\boldsymbol{\pi}}_{\mathcal{I}}-\boldsymbol{\pi}_{\mathcal{I}}\in\mathcal{D})-\mathbb{P}(\mathcal{N}(\boldsymbol{0}_{r},\boldsymbol{\Sigma})\in\mathcal{D})\right|=o(1),

as long as

r5/4max1ijn𝚺1/2𝝎ij2=o(1) and λr1/2(𝚺)r3/4ε3=o(1),\displaystyle r^{5/4}\max_{1\leq i\leq j\leq n}\left\|\boldsymbol{\Sigma}^{-1/2}\boldsymbol{\omega}_{ij}\right\|_{2}=o(1)\text{ and }\lambda_{r}^{-1/2}(\boldsymbol{\Sigma})r^{3/4}\varepsilon_{3}=o(1), (29)

where for each pair (i,j)[n]×[n](i,j)\in[n]\times[n] such that iji\leq j,

𝝎ij={((𝑪i1,k1𝝅)ij+(𝑪i1,k1𝝅)ji,(𝑪i2,k2𝝅)ij+(𝑪i2,k2𝝅)ji,,(𝑪ir,kr𝝅)ij+(𝑪ir,kr𝝅)ji)ij;((𝑪i1,k1𝝅)ii,(𝑪i2,k2𝝅)ii,,(𝑪ir,kr𝝅)ii)i=j.\displaystyle\boldsymbol{\omega}_{ij}=\begin{cases}\left((\boldsymbol{C}^{\boldsymbol{\pi}}_{i_{1},k_{1}})_{ij}+(\boldsymbol{C}^{\boldsymbol{\pi}}_{i_{1},k_{1}})_{ji},(\boldsymbol{C}^{\boldsymbol{\pi}}_{i_{2},k_{2}})_{ij}+(\boldsymbol{C}^{\boldsymbol{\pi}}_{i_{2},k_{2}})_{ji},\dots,(\boldsymbol{C}^{\boldsymbol{\pi}}_{i_{r},k_{r}})_{ij}+(\boldsymbol{C}^{\boldsymbol{\pi}}_{i_{r},k_{r}})_{ji}\right)^{\top}&i\neq j;\\ \left((\boldsymbol{C}^{\boldsymbol{\pi}}_{i_{1},k_{1}})_{ii},(\boldsymbol{C}^{\boldsymbol{\pi}}_{i_{2},k_{2}})_{ii},\dots,(\boldsymbol{C}^{\boldsymbol{\pi}}_{i_{r},k_{r}})_{ii}\right)^{\top}&i=j.\end{cases}
Proof.

See Section 8.20. ∎

Next, we apply Theorem 7 to answer some inference questions. In many practical applications, one is concerned with the characteristics of each community rather than the index (e.g., $1,2,\dots,K$) of the community. For this reason, we will assume that the permutation $\rho(\cdot)$ is the identity map, that is, $\rho(i)=i$ for every index $i$.

Example 1.

Given a node $i$ in the network, it is natural to ask which community it is closest to. This amounts to finding the largest component of $\{\boldsymbol{\pi}_{i}(k)\}_{k=1}^{K}$ and can be formulated as $K$ hypothesis testing problems. For $k\in[K]$, we consider the following testing problem:

Hk0:𝝅i(k)maxl[K]\{k}𝝅i(l) v.s. Hka:𝝅i(k)>maxl[K]\{k}𝝅i(l).\displaystyle H_{k0}:\boldsymbol{\pi}_{i}(k)\leq\max_{l\in[K]\backslash\{k\}}\boldsymbol{\pi}_{i}(l)\quad\text{ v.s. }\quad H_{ka}:\boldsymbol{\pi}_{i}(k)>\max_{l\in[K]\backslash\{k\}}\boldsymbol{\pi}_{i}(l).

According to Theorem 7 with r=2r=2, we consider the following Bonferroni-adjusted critical region at a significance level of α\alpha

\[\left\{\widehat{\boldsymbol{\pi}}_{i}(k)>\widehat{\boldsymbol{\pi}}_{i}(l)+\Phi^{-1}\left(1-\frac{\alpha}{K-1}\right)\sqrt{\widehat{V}_{\boldsymbol{M}}},\;\forall l\in[K]\backslash\{k\}\right\},\]

where $\boldsymbol{M}=\widehat{\boldsymbol{C}}^{\boldsymbol{\pi}}_{i,k}-\widehat{\boldsymbol{C}}^{\boldsymbol{\pi}}_{i,l}$ and, with $\widehat{\boldsymbol{H}}:=\sum_{i=1}^{K}\widehat{\lambda}_{i}\widehat{\boldsymbol{u}}_{i}\widehat{\boldsymbol{u}}_{i}^{\top}$,

V^𝑴=i[n]Mii2H^ii(1H^ii)+1i<jn(Mij+Mji)2H^ij(1H^ij).\displaystyle\widehat{V}_{\boldsymbol{M}}=\sum_{i\in[n]}M_{ii}^{2}\widehat{H}_{ii}(1-\widehat{H}_{ii})+\sum_{1\leq i<j\leq n}(M_{ij}+M_{ji})^{2}\widehat{H}_{ij}(1-\widehat{H}_{ij}). (30)

It is easy to see from the above critical region that at most one of $H_{10},H_{20},\dots,H_{K0}$ can be rejected. Once $H_{k0}$ is rejected, we can conclude that node $i$ is closest to community $k$ at a significance level of $\alpha$.
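The decision rule of this example can be sketched as follows. The code takes the estimated memberships of one node and a table of estimated variances of the pairwise differences as given inputs, and it uses the upper standard normal quantile $\Phi^{-1}(1-\alpha/(K-1))$ as a one-sided, Bonferroni-adjusted threshold; this reading of the threshold, the function name `closest_community` and the input layout are assumptions of the sketch.

```python
from statistics import NormalDist

def closest_community(pi_hat_i, var_diff, alpha=0.05):
    """Return the community k whose null hypothesis H_k0 is rejected, or None.

    pi_hat_i : sequence of K estimated memberships of node i.
    var_diff : K x K nested sequence; var_diff[k][l] is the estimated variance
               of pi_hat_i[k] - pi_hat_i[l] (diagonal entries are unused).
    """
    K = len(pi_hat_i)
    # Assumption: one-sided threshold with Bonferroni correction over K - 1 pairs.
    z = NormalDist().inv_cdf(1.0 - alpha / (K - 1))
    for k in range(K):
        beats_all = all(
            pi_hat_i[k] > pi_hat_i[l] + z * var_diff[k][l] ** 0.5
            for l in range(K) if l != k
        )
        if beats_all:
            return k        # with a positive threshold at most one k can qualify
    return None
```

Because the threshold is positive, two communities cannot both exceed each other by it, so returning the first qualifying index is unambiguous.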

Example 2.

Moving beyond the question of closest community detection, it is often of interest to understand the rank of node $i$ with respect to community $k$, such as a ranking of the conservativeness of a journal or a book. Let $R_{i,k}$ be the rank of $\boldsymbol{\pi}_{i}(k)$ among $\boldsymbol{\pi}_{1}(k),\boldsymbol{\pi}_{2}(k),\dots,\boldsymbol{\pi}_{n}(k)$. We adopt the method proposed by [FLWY22] to construct a rank confidence interval for $R_{i,k}$. Consider the following random variable $\mathcal{T}$ and its bootstrap counterpart $\mathcal{G}$:

𝒯\displaystyle\mathcal{T} =maxj:ji|(𝝅^j(k)𝝅^i(k))(𝝅j(k)𝝅i(k))V𝑪j,k𝝅𝑪i,k𝝅|,\displaystyle=\max_{j:j\neq i}\left|\frac{(\widehat{\boldsymbol{\pi}}_{j}(k)-\widehat{\boldsymbol{\pi}}_{i}(k))-(\boldsymbol{\pi}_{j}(k)-\boldsymbol{\pi}_{i}(k))}{\sqrt{V_{\boldsymbol{C}^{\boldsymbol{\pi}}_{j,k}-\boldsymbol{C}^{\boldsymbol{\pi}}_{i,k}}}}\right|,
𝒢\displaystyle\mathcal{G} =maxj:ji|Tr[(𝑪j,k𝝅𝑪i,k𝝅)(𝑾𝑮)]V𝑪j,k𝝅𝑪i,k𝝅1abnGab(n2+n)/2Tr[(𝑪j,k𝝅𝑪i,k𝝅)𝑾]V𝑪j,k𝝅𝑪i,k𝝅|,\displaystyle=\max_{j:j\neq i}\left|\frac{\textbf{Tr}\left[\left(\boldsymbol{C}^{\boldsymbol{\pi}}_{j,k}-\boldsymbol{C}^{\boldsymbol{\pi}}_{i,k}\right)\left(\boldsymbol{W}\odot\boldsymbol{G}\right)\right]}{\sqrt{V_{\boldsymbol{C}^{\boldsymbol{\pi}}_{j,k}-\boldsymbol{C}^{\boldsymbol{\pi}}_{i,k}}}}-\frac{\sum_{1\leq a\leq b\leq n}G_{ab}}{(n^{2}+n)/2}\frac{\textbf{Tr}\left[\left(\boldsymbol{C}^{\boldsymbol{\pi}}_{j,k}-\boldsymbol{C}^{\boldsymbol{\pi}}_{i,k}\right)\boldsymbol{W}\right]}{\sqrt{V_{\boldsymbol{C}^{\boldsymbol{\pi}}_{j,k}-\boldsymbol{C}^{\boldsymbol{\pi}}_{i,k}}}}\right|,

where $\boldsymbol{G}\in\mathbb{R}^{n\times n}$ is a symmetric random matrix whose upper triangular entries (including the diagonal) are i.i.d. standard Gaussian random variables, and $\boldsymbol{A}\odot\boldsymbol{B}$ denotes the elementwise product of matrices $\boldsymbol{A}$ and $\boldsymbol{B}$. Given any $\alpha\in(0,1)$, we define $c_{1-\alpha}$ as the $(1-\alpha)$th quantile of the conditional distribution of $\mathcal{G}$ given $\boldsymbol{W}$. Then by [CCKK22, Theorem 2.2], one can show that

|(𝒯>c1α)α|0\displaystyle\left|\mathbb{P}(\mathcal{T}>c_{1-\alpha})-\alpha\right|\to 0 (31)

under mild regularity conditions (see Section 8.23 for more details). Using the plug-in estimators and the estimated critical value $\widehat{c}_{1-\alpha}$ from the bootstrap samples, we construct the following simultaneous confidence intervals for $\{\boldsymbol{\pi}_{j}(k)-\boldsymbol{\pi}_{i}(k)\}_{j\in[n]\backslash\{i\}}$ with a confidence level of $1-\alpha$:

\[\left[C_{L}(j),C_{U}(j)\right]:=\left[\widehat{\boldsymbol{\pi}}_{j}(k)-\widehat{\boldsymbol{\pi}}_{i}(k)\pm\widehat{c}_{1-\alpha}\sqrt{\widehat{V}_{\widehat{\boldsymbol{C}}^{\boldsymbol{\pi}}_{j,k}-\widehat{\boldsymbol{C}}^{\boldsymbol{\pi}}_{i,k}}}\right].\]

In other words, with probability at least $1-\alpha$, we have $\boldsymbol{\pi}_{j}(k)-\boldsymbol{\pi}_{i}(k)\in[C_{L}(j),C_{U}(j)]$ simultaneously for all $j\in[n]\backslash\{i\}$. Now $C_{L}(j)>0$ implies $\boldsymbol{\pi}_{j}(k)>\boldsymbol{\pi}_{i}(k)$, and counting the number of such $j$'s gives a lower bound on the rank of $\boldsymbol{\pi}_{i}(k)$. Similarly, $C_{U}(j)<0$ implies $\boldsymbol{\pi}_{j}(k)<\boldsymbol{\pi}_{i}(k)$, and counting the number of such $j$'s gives an upper bound on the rank of $\boldsymbol{\pi}_{i}(k)$. As a result,

\[\left[1+\sum_{j\in[n]\backslash\{i\}}\mathbb{I}(C_{L}(j)>0),\; n-\sum_{j\in[n]\backslash\{i\}}\mathbb{I}(C_{U}(j)<0)\right]\]

forms a $100(1-\alpha)\%$ confidence interval for $R_{i,k}$.
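The counting step above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `rank_confidence_interval` and its arguments are hypothetical, and the inputs (the estimates $\widehat{\boldsymbol{\pi}}_{j}(k)$, the estimated variances of the pairwise differences, and the bootstrap critical value $\widehat{c}_{1-\alpha}$) are assumed to have been computed beforehand. Rank $1$ corresponds to the largest weight.

```python
import numpy as np

def rank_confidence_interval(pi_hat_k, var_hat, i, c_hat):
    """Rank confidence interval for node i in community k (hypothetical helper).

    pi_hat_k : (n,) estimated weights pi_hat_j(k) for all nodes j.
    var_hat  : (n,) estimated variances of pi_hat_j(k) - pi_hat_i(k);
               the entry at position i is ignored.
    c_hat    : estimated critical value c_hat_{1-alpha} (assumed given).
    """
    pi_hat_k = np.asarray(pi_hat_k, dtype=float)
    var_hat = np.asarray(var_hat, dtype=float)
    n = pi_hat_k.shape[0]
    others = np.arange(n) != i

    # Simultaneous intervals [C_L(j), C_U(j)] for pi_j(k) - pi_i(k), j != i.
    diff = pi_hat_k[others] - pi_hat_k[i]
    half_width = c_hat * np.sqrt(var_hat[others])
    c_low, c_up = diff - half_width, diff + half_width

    # Nodes surely above i push its rank down the list; nodes surely below cap it.
    lower = 1 + int(np.sum(c_low > 0))
    upper = n - int(np.sum(c_up < 0))
    return lower, upper
```

For instance, with four nodes, one of which is clearly above node $i$ and one clearly below, the interval returned is $[2,3]$.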

Example 3.

[FFHL22b] proposed the SIMPLE test for statistical inference on membership profiles. Specifically, for each pair of nodes $i\neq j\in[n]$, we are interested in the following testing problem:

\[H_{0}:\boldsymbol{\pi}_{i}=\boldsymbol{\pi}_{j}\quad\text{ v.s. }\quad H_{a}:\boldsymbol{\pi}_{i}\neq\boldsymbol{\pi}_{j}.\]

Theorem 7 allows us to recover their result. To see this, we take $r=2(K-1)$,

\[\mathcal{I}=\{(i,1),(i,2),\dots,(i,K-1),(j,1),(j,2),\dots,(j,K-1)\}.\]

We define the matrix $\boldsymbol{T}\in\mathbb{R}^{(K-1)\times(2K-2)}$ by

\[T_{pq}=\begin{cases}1&q=p\\ -1&q=p+K-1\\ 0&\text{otherwise}\end{cases},\quad\forall(p,q)\in[K-1]\times[2K-2].\]

As a result, by Theorem 7, as long as condition (29) holds, under the null hypothesis $H_{0}:\boldsymbol{\pi}_{i}=\boldsymbol{\pi}_{j}$ we have

\[\Bigl((\widehat{\boldsymbol{\pi}}_{i})_{1:K-1}-(\widehat{\boldsymbol{\pi}}_{j})_{1:K-1}\Bigr)^{\top}\left(\boldsymbol{T}\boldsymbol{\Sigma}\boldsymbol{T}^{\top}\right)^{-1}\Bigl((\widehat{\boldsymbol{\pi}}_{i})_{1:K-1}-(\widehat{\boldsymbol{\pi}}_{j})_{1:K-1}\Bigr)\to\chi^{2}_{K-1}.\]

This Hotelling-type statistic can be used to test the null hypothesis for two individual nodes, and it recovers the result in [FFHL22b].
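To make the construction in Example 3 concrete, the following sketch builds the contrast matrix $\boldsymbol{T}$ and evaluates the quadratic form. It is only an illustration under assumptions: the helper names are hypothetical, and `Sigma` is assumed to be an already available estimate of the $2(K-1)\times 2(K-1)$ covariance matrix from Theorem 7, ordered as in $\mathcal{I}$. The returned statistic would then be compared with a quantile of the $\chi^{2}_{K-1}$ distribution.

```python
import numpy as np

def contrast_matrix(K):
    """T in R^{(K-1) x (2K-2)} with T[p, p] = 1 and T[p, p + K - 1] = -1."""
    T = np.zeros((K - 1, 2 * K - 2))
    idx = np.arange(K - 1)
    T[idx, idx] = 1.0
    T[idx, idx + K - 1] = -1.0
    return T

def hotelling_statistic(pi_hat_i, pi_hat_j, Sigma):
    """Quadratic form d^T (T Sigma T^T)^{-1} d, with d the difference of the
    first K-1 coordinates of the two estimated membership vectors."""
    pi_hat_i = np.asarray(pi_hat_i, dtype=float)
    pi_hat_j = np.asarray(pi_hat_j, dtype=float)
    K = pi_hat_i.shape[0]
    T = contrast_matrix(K)
    d = pi_hat_i[: K - 1] - pi_hat_j[: K - 1]
    M = T @ np.asarray(Sigma, dtype=float) @ T.T
    # Solve M x = d instead of forming the inverse explicitly.
    return float(d @ np.linalg.solve(M, d))
```

Only the first $K-1$ coordinates enter because the last coordinate of each membership vector is determined by the others.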

6 Numerical Studies

In this section, we conduct numerical experiments on both synthetic and real data to complement our theoretical results. We first validate our distributional results by simulations. Then, we apply our approach to a stock dataset to study the simplex structure and perform rank inference, as mentioned in the previous examples.

6.1 Synthetic Data Simulation

Here, we conduct synthetic data experiments to verify our uncertainty quantification results in Theorems 6 and 7. Our simulation setup is as follows. We set the number of nodes to $n=2000$ and the number of communities to $K=2$. To generate the membership matrix $\boldsymbol{\Pi}$, we first set the first two rows of the $2000\times 2$ matrix $\boldsymbol{\Pi}$ to $[1,0]$ and $[0,1]$, which serve as two pure nodes. The first entries of the remaining $1998$ rows are sampled independently from the uniform distribution over the interval $[0.1,0.9]$, and the second entries are then determined by the constraint that each row sums to $1$. We then randomly shuffle the rows of $\boldsymbol{\Pi}$. For the matrix $\boldsymbol{P}\in\mathbb{R}^{2\times 2}$, we set its diagonal entries to $1$ and its off-diagonal entries to $0.2$. For the $\theta_{i}$'s, which partially represent the signal strength, we consider three settings: (i) $\theta_{i}=0.6$ for all nodes; (ii) the $\theta_{i}$'s are sampled independently from the uniform distribution over the interval $[0.3,0.9]$; (iii) $\theta_{i}=0.9$ for all nodes. In each setting, we generate the network and compute the estimated mixed membership matrix $\widehat{\boldsymbol{\Pi}}$ $500$ times. We then record the realizations of the following standardized random variable

\[\frac{\widehat{\boldsymbol{\pi}}_{1}(1)-\boldsymbol{\pi}_{1}(1)}{\sqrt{\widehat{V}_{\widehat{\boldsymbol{C}}^{\boldsymbol{\pi}}_{1,1}}}},\]

where $\widehat{V}$ is defined by (30) and $\widehat{\boldsymbol{C}}^{\boldsymbol{\pi}}_{1,1}$ is the plug-in estimator of $\boldsymbol{C}^{\boldsymbol{\pi}}_{1,1}$ given by (28). Figure 1 summarizes the results collected from the $500$ simulations. The three plots in the first row show the histograms of the results from each setting, and the orange curve is the density of the standard normal distribution. The three Q-Q plots in the second row further examine normality in the three settings. These results suggest that the random variable is approximately normally distributed and that the estimated asymptotic variance is accurate. They in turn lend further support to Theorem 6 and Theorem 7.
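The data-generating step of this simulation can be sketched as follows. This is a minimal sketch rather than the authors' code: the function name `simulate_dcmm` is hypothetical, the edge probabilities are taken to be $H_{ij}=\theta_{i}\theta_{j}\boldsymbol{\pi}_{i}^{\top}\boldsymbol{P}\boldsymbol{\pi}_{j}$, and we assume that the upper-triangular entries (including the diagonal) are drawn as independent Bernoulli variables. The estimation of $\widehat{\boldsymbol{\Pi}}$ and of the variance is not shown.

```python
import numpy as np

def simulate_dcmm(n=2000, theta=0.6, seed=0):
    """Draw one adjacency matrix from the two-community setup described above.

    theta may be a scalar (settings (i) and (iii)) or an (n,) array
    (setting (ii), e.g. rng.uniform(0.3, 0.9, size=n)).
    """
    rng = np.random.default_rng(seed)

    # Two pure nodes plus n - 2 mixed nodes; rows sum to one.
    first = rng.uniform(0.1, 0.9, size=n - 2)
    Pi = np.vstack([[1.0, 0.0], [0.0, 1.0], np.column_stack([first, 1.0 - first])])
    Pi = rng.permutation(Pi)  # shuffle the rows

    P = np.array([[1.0, 0.2], [0.2, 1.0]])
    theta_vec = np.broadcast_to(np.asarray(theta, dtype=float), (n,))

    # H_ij = theta_i * theta_j * pi_i^T P pi_j
    H = np.outer(theta_vec, theta_vec) * (Pi @ P @ Pi.T)

    # Assumption: independent Bernoulli(H_ij) draws on the upper triangle
    # (diagonal included), mirrored to obtain a symmetric matrix.
    upper = np.triu(rng.random((n, n)) < H)
    X = (upper | upper.T).astype(float)
    return X, Pi, H
```

Repeating the call with different seeds gives the independent replications over which the standardized statistic is recorded.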

Figure 1: Histograms and Q-Q plots for validating the normality of $(\widehat{\boldsymbol{\pi}}_{1}(1)-\boldsymbol{\pi}_{1}(1))/\sqrt{\widehat{\textbf{var}}(\widehat{\boldsymbol{C}}^{\boldsymbol{\pi}}_{1,1})}$. The orange curves in the first row of plots are the density of the standard normal distribution. The three columns represent the three choices of $\theta_{i}$, and in each setting the simulation is repeated $500$ times.

6.2 Real Data Experiments

In this subsection, we apply our uncertainty quantification results to a financial dataset. Our dataset consists of the daily close prices of the S&P 500 stocks from January 1, 2010 to December 31, 2022, obtained from Yahoo Finance (https://pypi.org/project/yfinance/), from which we calculate the log returns. We clean the data by removing the stocks with more than one missing value. For those with just one missing value, we set the corresponding log returns to zero. We would like to construct a network based on the correlation of the log returns. It is well known in finance that some common factors account for much of this correlation. Similar to [FFHL19], we first fit a factor model with five factors to remove these common factors and then construct the network based on the covariance matrix of the idiosyncratic components. Letting $\boldsymbol{A}$ be the covariance matrix of the idiosyncratic components, we draw an edge between nodes $i$ and $j$ if and only if $A_{ij}>0.1$. In this way, we obtain the adjacency matrix $\boldsymbol{X}$. After the preprocessing steps, $n=433$ stocks remain.
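The network construction just described can be sketched as below. Several choices here are our assumptions and are not specified in the text: the factor model is fitted by principal components (via an SVD of the centered return matrix), the diagonal of the adjacency matrix is set to zero, and any rescaling of the returns before thresholding is left to the caller. The function name `residual_network` is hypothetical.

```python
import numpy as np

def residual_network(returns, n_factors=5, threshold=0.1):
    """Threshold the covariance of idiosyncratic components into a network.

    returns : (T, n) array of log returns (T days, n stocks).
    """
    R = returns - returns.mean(axis=0)

    # Assumption: principal-component factor fit; the leading n_factors
    # singular directions play the role of the common factors.
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    common = (U[:, :n_factors] * s[:n_factors]) @ Vt[:n_factors]
    resid = R - common

    # Covariance matrix of the idiosyncratic components, symmetrized to
    # remove floating-point asymmetry.
    A = np.cov(resid, rowvar=False)
    A = (A + A.T) / 2.0

    # Edge between i and j iff A_ij exceeds the threshold.
    X = (A > threshold).astype(int)
    np.fill_diagonal(X, 0)  # assumption: no self-loops
    return X, A
```

The default threshold of $0.1$ matches the value quoted above; whether it is meaningful depends on how the returns are scaled, which the sketch does not address.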

We take $K=3$ and apply the SCORE normalization step to the leading eigenvectors of $\boldsymbol{X}$. On the left side of Figure 2 we display the scatter plot of $\widehat{\boldsymbol{r}}_{i}$ for $i\in[n]$, which shows the $2$-dimensional simplex structure. As we can see from the figure, it has a clear triangular structure. Looking closely at the nodes, we can find some characteristics of the three corners of this triangle. Many financial companies (PNC, NDAQ, TROW, RE, PSA) are very close to the vertex of the right corner, which can be viewed as the pure node of this community. Many other financial companies (AJG, WRB, AXP, BRO, C) are also closer to this corner than to the other two corners. In the top corner, we find companies (REGN, EW, HOLX, ILMN) belonging to the healthcare industry. Moreover, some other healthcare companies (PFE, HUM, LH, TMO, ABT) can also be viewed as mixed members which are closer to this community. Similarly, companies related to the energy industry such as MPC, BA, TDY, EMR, AEP, DOV, XEL, and FE make up a large part of the bottom corner, while other similar companies (ED, LNT) seem to be mixed members close to this corner. To validate these observations, we apply Theorem 7 to conduct the following tests for each $i\in[n]$, as stated in Example 1:

\begin{align*}
&H_{01}:\boldsymbol{\pi}_{i}(1)\leq\max\{\boldsymbol{\pi}_{i}(2),\boldsymbol{\pi}_{i}(3)\}\quad\text{ v.s. }\quad H_{a1}:\boldsymbol{\pi}_{i}(1)>\max\{\boldsymbol{\pi}_{i}(2),\boldsymbol{\pi}_{i}(3)\};\\
&H_{02}:\boldsymbol{\pi}_{i}(2)\leq\max\{\boldsymbol{\pi}_{i}(1),\boldsymbol{\pi}_{i}(3)\}\quad\text{ v.s. }\quad H_{a2}:\boldsymbol{\pi}_{i}(2)>\max\{\boldsymbol{\pi}_{i}(1),\boldsymbol{\pi}_{i}(3)\}; \tag{32}\\
&H_{03}:\boldsymbol{\pi}_{i}(3)\leq\max\{\boldsymbol{\pi}_{i}(1),\boldsymbol{\pi}_{i}(2)\}\quad\text{ v.s. }\quad H_{a3}:\boldsymbol{\pi}_{i}(3)>\max\{\boldsymbol{\pi}_{i}(1),\boldsymbol{\pi}_{i}(2)\}.
\end{align*}

At most one of $H_{01}$, $H_{02}$ and $H_{03}$ can be rejected. On the right side of Figure 2, we show the results of the aforementioned tests. We use red/blue/green to indicate that $H_{01}$/$H_{02}$/$H_{03}$ is rejected, respectively. If none of these three null hypotheses is rejected, we color the point grey. As we can see from Figure 2, for most ($293/433=67.7\%$) of the nodes, one of $H_{01}$, $H_{02}$ and $H_{03}$ is rejected. In other words, although many nodes have mixed membership, we can identify them as being closer to one community. Furthermore, the test results confirm our earlier observations about the three corners. It is worth mentioning that the information technology companies, which make up a large proportion of the S&P 500 list, can be found in abundance in every part of the simplex. This also indicates that the prosperous development of the information technology industry is a result of the nurturing of traditional industries.

Figure 2: Left: Scatter plot of $\widehat{\boldsymbol{r}}_{i}$ for $i\in[n]$. Several representative companies in each corner are highlighted. Right: The test results of $H_{01}$, $H_{02}$ and $H_{03}$. Red/blue/green means that $H_{01}$/$H_{02}$/$H_{03}$ is rejected, while grey means that none of these three null hypotheses is rejected.
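The points $\widehat{\boldsymbol{r}}_{i}$ plotted in the left panel come from the SCORE normalization, which divides the second through $K$-th leading eigenvectors entrywise by the leading one. A minimal sketch is given below; the function name `score_ratios` is hypothetical, eigenvalues are ordered by absolute value, and nodes whose leading-eigenvector entry is zero (for example, isolated nodes) are assumed to have been removed beforehand.

```python
import numpy as np

def score_ratios(X, K):
    """Return the n x (K-1) matrix whose i-th row is
    (u_2(i)/u_1(i), ..., u_K(i)/u_1(i)) for a symmetric matrix X."""
    vals, vecs = np.linalg.eigh(X)
    # Keep the K eigenvectors with the largest eigenvalues in magnitude.
    order = np.argsort(-np.abs(vals))[:K]
    U = vecs[:, order]
    # Entrywise ratios remove the node-specific degree parameter.
    return U[:, 1:] / U[:, [0]]
```

In the noiseless model, the degree parameters cancel in these ratios, so pure nodes of the same community map to the same point, which is what produces the triangular shape when $K=3$.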

Next, we apply our approach in Example 2 to construct rank confidence intervals for the nodes. We summarize our inference results in Table 1. We select $11$ stocks as representatives for presentation. First, we include the three estimated vertices. Second, from each of the four categories (red/blue/green/grey) shown in Figure 2, we randomly select two stocks and label them with R/B/G/C (C stands for center). In Table 1 we include the estimated mixed membership vector $\widehat{\boldsymbol{\pi}}_{i}$ for each stock as well as the three $95\%$ rank confidence intervals for $\boldsymbol{\pi}_{i}(1)$, $\boldsymbol{\pi}_{i}(2)$ and $\boldsymbol{\pi}_{i}(3)$, which we denote by RCI I, RCI II and RCI III. As we can see from Table 1, our approach provides meaningful rank confidence intervals for the stocks and the categories to which they are close.

\begin{tabular}{lcccc}
Symbol & $\widehat{\boldsymbol{\pi}}_{i}$ & RCI I & RCI II & RCI III \\
\hline
AAL ($V_{I}$) & $[1,0,0]$ & $[1,11]$ & $[206,433]$ & $[382,433]$ \\
LLY ($V_{II}$) & $[0,1,0]$ & $[256,433]$ & $[1,13]$ & $[381,433]$ \\
MPC ($V_{III}$) & $[0,0,1]$ & $[137,433]$ & $[148,433]$ & $[1,7]$ \\
META (C) & $[0.457,0.177,0.366]$ & $[96,127]$ & $[163,341]$ & $[179,255]$ \\
EBAY (C) & $[0.379,0.220,0.401]$ & $[111,157]$ & $[150,277]$ & $[157,228]$ \\
DHR (R) & $[0.959,0.017,0.025]$ & $[1,19]$ & $[218,433]$ & $[381,433]$ \\
HOLX (R) & $[0.933,0.024,0.044]$ & $[1,23]$ & $[227,433]$ & $[369,433]$ \\
PNC (B) & $[0.009,0.983,0.008]$ & $[253,433]$ & $[1,17]$ & $[380,433]$ \\
MOS (B) & $[0.037,0.947,0.016]$ & $[243,433]$ & $[1,23]$ & $[378,433]$ \\
AEP (G) & $[0.080,0.063,0.857]$ & $[190,433]$ & $[178,421]$ & $[10,37]$ \\
LYB (G) & $[0.159,0.071,0.771]$ & $[166,400]$ & $[221,433]$ & $[35,51]$ \\
\end{tabular}
Table 1: Estimated mixed membership profile vectors $\widehat{\boldsymbol{\pi}}_{i}$ and $95\%$ confidence intervals for the ranks of some representative stocks' membership profiles with respect to the three estimated vertices, denoted respectively by RCI I, RCI II, and RCI III. AAL, LLY, and MPC are the estimated vertices of the three categories. R/B/G/C stands for red/blue/green/grey (center), the color groups shown in Figure 2.

Next, we investigate whether the simplex structure changed between the period before COVID-19 and the period after the pandemic began. We use stock data from January 1, 2017 to January 1, 2020 as the before-COVID-19 data, and stock data from May 1, 2020 to May 1, 2023 as the after-COVID-19 data. We follow the same data preprocessing procedure as before, except that the threshold for $A_{ij}$ is replaced by $0.12$ when dealing with the after-COVID-19 data. We also apply the aforementioned tests (32) to these two datasets. Figure 3 shows the experimental results, with the test results indicated by the colors.

Recall that we have identified three categories with finance, healthcare, and energy as their respective representatives in Figure 2. From the top two plots of Figure 3 we can see that the ‘finance corner’ (red) and the ‘energy corner’ (blue) are consistent across the two periods, and these structures are similar to what we have observed in Figure 2 except for some companies (e.g., AXP, BRO, ED, LNT, WRB). However, a structural change in the remaining corner leads to some interesting observations. First, the remaining corner (green) in the top left plot (before COVID-19) of Figure 3 is a mix of companies from many different industries, and no single industry can be considered representative of this corner. Second, the healthcare companies from the ‘healthcare corner’ in Figure 2 are now in the center cluster of the top left plot. The bottom left plot of Figure 3 is a zoomed view of the center cluster of the top left plot, and all the companies (EW, HUM, HOLX, ABT, REGN, TMO, LH, ILMN, PFE) which have been identified as representatives of the ‘healthcare corner’ in Figure 2 are there. However, when we look at the after-COVID-19 plots (top right and bottom right of Figure 3), we find that these healthcare companies move from the center cluster to the green corner, and consequently make the green corner a ‘healthcare corner’. The bottom right plot of Figure 3 is a zoomed view of the green corner of the top right plot, and again we can see that all nine aforementioned healthcare companies are there. To sum up, the ‘healthcare corner’ we have observed in Figure 2 does not exist before COVID-19. At that time, the healthcare companies are closer to the center of the simplex, which indicates that they are a mix of the different communities. After the pandemic began, the healthcare industry grew dramatically, and it became the representative of the third corner, alongside the ‘finance corner’ and the ‘energy corner’.

Figure 3: Top left: the simplex structure before COVID-19. Top right: the simplex structure after the pandemic began. Red/blue/green means the corresponding $H_{01}$/$H_{02}$/$H_{03}$ is rejected, while grey means none of $H_{01}$/$H_{02}$/$H_{03}$ is rejected. Bottom left: a zoomed view of the center cluster of the top left plot. Bottom right: a zoomed view of the green corner of the top right plot.

7 Proof Outline

In this section, we sketch the main steps of the proof. Details of the proof will be provided in Section 8. We begin by showing an “eigen-gap” property, which is fundamental to the matrix analysis. To this end, we provide the high-probability bound on the spectral norm of the noise matrix 𝑾\boldsymbol{W}.

Lemma 4.

The following event $\mathcal{A}_{1}$ happens with probability at least $1-O(n^{-10})$:

\[\mathcal{A}_{1}=\left\{\left\|\boldsymbol{W}\right\|\leq C_{2}\theta_{\text{max}}\sqrt{n}\right\}. \tag{33}\]
Proof.

By definition we know that $|W_{ij}|\leq 1$. Assumption 3 implies

\[\max_{i,j}\mathbb{E}\left[W_{ij}^{2}\right]=\max_{i,j}H_{ij}\left(1-H_{ij}\right)\leq\max_{i,j}H_{ij}\leq\max_{i,j}\theta_{i}\theta_{j}\boldsymbol{\pi}_{i}^{\top}\boldsymbol{P}\boldsymbol{\pi}_{j}\lesssim\theta_{\text{max}}^{2}.\]

As a result, by [CCF+21, Theorem 3.4] we know that

\[\left\|\boldsymbol{W}\right\|\lesssim\theta_{\text{max}}\sqrt{n}+\sqrt{\log n}\lesssim\theta_{\text{max}}\sqrt{n}\]

with probability at least $1-O(n^{-10})$. ∎
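As an informal sanity check of the scaling in Lemma 4 (not a substitute for the proof), one can simulate a noise matrix $\boldsymbol{W}=\boldsymbol{X}-\boldsymbol{H}$ and compare its spectral norm with $\theta_{\text{max}}\sqrt{n}$. The sketch below assumes the two-community design of Section 6.1 with a common degree parameter and Bernoulli edges; the function name `noise_spectral_ratio` is hypothetical.

```python
import numpy as np

def noise_spectral_ratio(n=400, theta_max=0.6, seed=0):
    """Return ||W|| / (theta_max * sqrt(n)) for one simulated network."""
    rng = np.random.default_rng(seed)

    # Mixed memberships over K = 2 communities; rows sum to one.
    first = rng.uniform(0.1, 0.9, size=n)
    Pi = np.column_stack([first, 1.0 - first])
    P = np.array([[1.0, 0.2], [0.2, 1.0]])

    # Common degree parameter theta_i = theta_max for every node.
    H = theta_max**2 * (Pi @ P @ Pi.T)

    # Symmetric Bernoulli adjacency matrix and centered noise W = X - H.
    upper = np.triu(rng.random((n, n)) < H)
    X = (upper | upper.T).astype(float)
    W = X - H

    # ord=2 gives the largest singular value, i.e. the spectral norm.
    return np.linalg.norm(W, 2) / (theta_max * np.sqrt(n))
```

If the bound in (33) captures the right order, this ratio should stay bounded by a moderate constant as $n$ grows.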

We are going to prove our results under this favorable event $\mathcal{A}_{1}$. We begin by listing the key results leading to Theorem 1 and Theorem 2. A key assumption in all these results will be either $\theta_{\text{max}}\sqrt{n}\ll\lambda_{1}^{*}$ (for results regarding $\widehat{\boldsymbol{u}}_{1}-\boldsymbol{u}_{1}^{*}$) or $\theta_{\text{max}}\sqrt{n}\ll\sigma_{\textbf{min}}^{*}$ (for results regarding $\overline{\boldsymbol{U}}\boldsymbol{R}-\overline{\boldsymbol{U}}^{*}$) to guarantee the eigen-gap. From Lemma 1, we already know that these two assumptions are mild.

We now state the results for $\widehat{\boldsymbol{u}}_{1}-\boldsymbol{u}_{1}^{*}$. The proofs rely on contour integrals, similar in spirit to [FFHL22a].

Theorem 8.

Assume that $\theta_{\text{max}}\sqrt{n}\ll\lambda_{1}^{*}$. Under the event $\mathcal{A}_{1}$ defined in (33), we can write the following expansion

\[\widehat{\boldsymbol{u}}_{1}-\boldsymbol{u}_{1}^{*}=\sum_{i=2}^{n}\frac{\boldsymbol{u}_{i}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}}{\lambda_{1}^{*}-\lambda_{i}^{*}}\boldsymbol{u}_{i}^{*}+\boldsymbol{\delta}, \tag{34}\]

where $\|\boldsymbol{\delta}\|_{2}\lesssim\theta_{\text{max}}^{2}n/\lambda_{1}^{*2}$.

Proof.

See Section 8.1. ∎

Theorem 8 expresses the quantity $\widehat{\boldsymbol{u}}_{1}-\boldsymbol{u}_{1}^{*}$ as the sum of a leading term (the first summand on the right-hand side of (34)) and an error term $\boldsymbol{\delta}$, and bounds the $\ell_{2}$ norm of the latter. We also need an $\ell_{\infty}$ bound on $\boldsymbol{\delta}$, which is provided by the following theorem.

Theorem 9.

Assume that $\max\{\sqrt{n}\theta_{\text{max}},\log n\}\ll\lambda_{1}^{*}$. Then with probability at least $1-O(n^{-10})$, we have

\begin{align*}
\left\|\boldsymbol{\delta}\right\|_{\infty}\lesssim{} &\frac{\theta_{\text{max}}^{3}n^{1.5}}{\lambda_{1}^{*3}}+\frac{\sqrt{n((K-1)\mu^{*}+\log n)}\theta_{\text{max}}^{2}}{\lambda_{1}^{*2}}\\
&+\frac{\log n\left(\sqrt{\log n}+\sqrt{(K-1)\mu^{*}}\right)\theta_{\text{max}}+\log^{2}n\sqrt{\mu^{*}/n}}{\lambda_{1}^{*2}},
\end{align*}

where $\boldsymbol{\delta}$ is defined via (34).

Proof.

See Section 8.2. ∎

Combining Theorem 8 and Theorem 9 with triangle inequality provides the next lemma.

Lemma 5.

Assume that $\max\{\sqrt{n}\theta_{\text{max}},\log n\}\ll\lambda_{1}^{*}$. Then with probability at least $1-O(n^{-10})$, we have

\[\left\|\widehat{\boldsymbol{u}}_{1}-\boldsymbol{u}_{1}^{*}\right\|_{\infty}\lesssim\frac{\left(\sqrt{\log n}+\sqrt{(K-1)\mu^{*}}\right)\theta_{\text{max}}+\log n\sqrt{\mu^{*}/n}}{\lambda_{1}^{*}}+\frac{\theta_{\text{max}}^{3}n^{1.5}}{\lambda_{1}^{*3}}.\]
Proof.

See Section 8.3. ∎

The combination of Theorem 8, Theorem 9 and Lemma 5 is exactly Theorem 1.

The next goal is to analyze $\overline{\boldsymbol{U}}\boldsymbol{R}-\overline{\boldsymbol{U}}^{*}$. This is a matrix denoising problem with ground truth $\overline{\boldsymbol{H}}$ and noisy matrix $\overline{\boldsymbol{X}}$, where

\[\overline{\boldsymbol{H}}=\sum_{i=2}^{K}\lambda_{i}^{*}\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top},\quad\overline{\boldsymbol{X}}=\sum_{i=2}^{n}\widehat{\lambda}_{i}\widehat{\boldsymbol{u}}_{i}\widehat{\boldsymbol{u}}_{i}^{\top}.\]

Define the noise matrix

\[\overline{\boldsymbol{W}}:=\overline{\boldsymbol{X}}-\overline{\boldsymbol{H}}=\boldsymbol{W}-\left[\widehat{\lambda}_{1}\widehat{\boldsymbol{u}}_{1}\widehat{\boldsymbol{u}}_{1}^{\top}-\lambda_{1}^{*}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right]. \tag{35}\]

Unlike $\boldsymbol{W}$, the matrix $\overline{\boldsymbol{W}}$ does not have a closed-form expression. The following expansion for $\overline{\boldsymbol{W}}$ will be useful for our results.

Lemma 6.

Assume that $\sqrt{n}\ll\lambda_{1}^{*}$. Then we have

\[\overline{\boldsymbol{W}}=\boldsymbol{W}-\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}-\boldsymbol{N}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}-\left(\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right)^{\top}\boldsymbol{N}-\boldsymbol{\Delta},\]

where $\boldsymbol{N}$ is defined by (14).

Further, with probability at least $1-O(n^{-10})$ we have

\[\|\boldsymbol{\Delta}\|\lesssim n/\lambda_{1}^{*}\quad\text{and}\quad\left\|\boldsymbol{\Delta}\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}\lesssim\sqrt{\frac{(K-1)\mu^{*}n}{\lambda_{1}^{*2}}}+\frac{n^{1.5}}{\lambda_{1}^{*2}}.\]
Proof.

See Theorem 10 and Lemma 11. ∎

In order to study the expansion of $\overline{\boldsymbol{W}}$, defined via (35), it is enough to expand $\widehat{\lambda}_{1}\widehat{\boldsymbol{u}}_{1}\widehat{\boldsymbol{u}}_{1}^{\top}-\lambda_{1}^{*}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}$. We state our result here. The proof, which uses contour integrals, is deferred to Section 8.4.

Theorem 10.

Assume that $\sqrt{n}\theta_{\text{max}}\ll\lambda_{1}^{*}$. Then under the event $\mathcal{A}_{1}$ defined by (33), we have the following expansion:

\[\widehat{\lambda}_{1}\widehat{\boldsymbol{u}}_{1}\widehat{\boldsymbol{u}}_{1}^{\top}-\lambda_{1}^{*}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}=\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}+\boldsymbol{N}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}+\left(\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right)^{\top}\boldsymbol{N}+\boldsymbol{\Delta}, \tag{36}\]

where $\boldsymbol{N}\overset{\Delta}{=}\sum_{i=2}^{n}\frac{\lambda_{1}^{*}}{\lambda_{1}^{*}-\lambda_{i}^{*}}\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}$ is a symmetric matrix and $\|\boldsymbol{\Delta}\|\lesssim n\theta_{\text{max}}^{2}/\lambda_{1}^{*}$.

Theorem 10 shows the first part of Lemma 6, while the second part, the bound for $\|\boldsymbol{\Delta}\overline{\boldsymbol{U}}^{*}\|_{2,\infty}$, will be shown later. From Theorem 10 we can directly deduce the following corollary.

Corollary 2.

Assume that $\sqrt{n}\theta_{\text{max}}\ll\lambda_{1}^{*}$. Then under event $\mathcal{A}_{1}$, we have

\[\left\|\widehat{\lambda}_{1}\widehat{\boldsymbol{u}}_{1}\widehat{\boldsymbol{u}}_{1}^{\top}-\lambda_{1}^{*}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right\|\lesssim\sqrt{n}\theta_{\text{max}}.\]
Proof.

See Section 8.5. ∎

Corollary 2, coupled with Lemma 4, shows that under event $\mathcal{A}_{1}$,

\[\|\overline{\boldsymbol{W}}\|\leq\|\boldsymbol{W}\|+\left\|\widehat{\lambda}_{1}\widehat{\boldsymbol{u}}_{1}\widehat{\boldsymbol{u}}_{1}^{\top}-\lambda_{1}^{*}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right\|\lesssim\sqrt{n}\theta_{\text{max}}. \tag{37}\]

Equipped with Theorem 5, we are ready to prove results regarding matrix denoising. We show the following five results, whose combination proves Theorem 2. Note that the first four results here are similar to [YCF21, Lemma 1-4], while the fifth result is exactly the second part of Lemma 6. To state these results, we first need to introduce some notation. Define

\[\boldsymbol{L}:=\overline{\boldsymbol{U}}^{\top}\overline{\boldsymbol{U}}^{*},\quad\overline{\boldsymbol{V}}:=[\widehat{\boldsymbol{v}}_{2},\dots,\widehat{\boldsymbol{v}}_{K}],\quad\overline{\boldsymbol{V}}^{*}:=[\boldsymbol{v}_{2}^{*},\dots,\boldsymbol{v}_{K}^{*}],\quad\overline{\boldsymbol{\Lambda}}:=\textbf{diag}(\widehat{\lambda}_{2},\dots,\widehat{\lambda}_{K}), \tag{38}\]

where the $\widehat{\boldsymbol{v}}_{i}$'s are defined as in (7). Recall the definition of $\boldsymbol{R}$ from (10). Here we state the five lemmas, whose proofs are deferred.

Lemma 7.

Assume that $\sqrt{n}\theta_{\text{max}}\ll\sigma_{\textbf{min}}^{*}$. Then under event $\mathcal{A}_{1}$ we have

\[\left\|\overline{\boldsymbol{U}}\boldsymbol{R}-\overline{\boldsymbol{U}}^{*}\right\|\lesssim\frac{\sqrt{n}\theta_{\text{max}}}{\sigma_{\textbf{min}}^{*}},\quad\left\|\boldsymbol{L}-\boldsymbol{R}\right\|\lesssim\frac{n\theta_{\text{max}}^{2}}{\sigma_{\textbf{min}}^{*2}}\quad\text{and}\quad\frac{1}{2}\leq\sigma_{i}(\boldsymbol{L})\leq 2,\quad 1\leq i\leq K-1.\]

Furthermore, under event $\mathcal{A}_{1}$ we have $\overline{\boldsymbol{V}}=\overline{\boldsymbol{U}}\boldsymbol{D}$ and $\overline{\boldsymbol{V}}^{*}=\overline{\boldsymbol{U}}^{*}\boldsymbol{D}$, where

\[\boldsymbol{D}=\textbf{diag}(\textbf{sgn}(\lambda_{2}^{*}),\dots,\textbf{sgn}(\lambda_{K}^{*}))\in\mathcal{O}^{(K-1)\times(K-1)}. \tag{39}\]

This immediately implies $\overline{\boldsymbol{V}}^{\top}\overline{\boldsymbol{V}}^{*}=\boldsymbol{D}\boldsymbol{L}\boldsymbol{D}$,

\[\boldsymbol{D}\boldsymbol{R}\boldsymbol{D}=\mathop{\arg\min}_{\boldsymbol{O}\in\mathcal{O}^{(K-1)\times(K-1)}}\left\|\overline{\boldsymbol{V}}\boldsymbol{O}-\overline{\boldsymbol{V}}^{*}\right\|_{F},\]

and the same results for $\overline{\boldsymbol{V}}$ and $\overline{\boldsymbol{V}}^{*}$ are also true. In fact, we have

\[\left\|\overline{\boldsymbol{V}}\boldsymbol{D}\boldsymbol{R}\boldsymbol{D}-\overline{\boldsymbol{V}}^{*}\right\|=\left\|\overline{\boldsymbol{U}}\boldsymbol{R}-\overline{\boldsymbol{U}}^{*}\right\|.\]
Proof.

See Section 8.6. ∎

Lemma 8.

Assume that $\sqrt{n}\theta_{\text{max}}\ll\sigma_{\textbf{min}}^{*}$ and $n^{2}\theta_{\text{max}}^{2}\gtrsim(K-1)\mu^{*2}\log n$. Then with probability exceeding $1-O(n^{-10})$ we have

\begin{align*}
\left\|\boldsymbol{R}^{\top}\overline{\boldsymbol{\Lambda}}\boldsymbol{R}-\overline{\boldsymbol{\Lambda}}^{*}\right\|&\lesssim\frac{\kappa^{*}n\theta_{\text{max}}^{2}}{\sigma_{\textbf{min}}^{*}}+\sqrt{(K-1)\log n}\theta_{\text{max}},\\
\left\|\boldsymbol{L}^{\top}\overline{\boldsymbol{\Lambda}}\boldsymbol{L}-\overline{\boldsymbol{\Lambda}}^{*}\right\|&\lesssim\frac{\theta_{\text{max}}^{3}n^{1.5}}{\sigma_{\textbf{min}}^{*2}}+\sqrt{(K-1)\log n}\theta_{\text{max}}.
\end{align*}
Proof.

See Section 8.7. ∎

Lemma 9.

Assume that $\sqrt{(K-1)\log n}\theta_{\text{max}}/\sigma_{\textbf{min}}^{*}+\kappa^{*}n\theta_{\text{max}}^{2}/\sigma_{\textbf{min}}^{*2}\ll 1$ and $\max\{\sqrt{n}\theta_{\text{max}},\log n\}\ll\sigma_{\textbf{min}}^{*}$. Then with probability at least $1-O(n^{-10})$ we have

\begin{align*}
\left\|\overline{\boldsymbol{U}}\overline{\boldsymbol{\Lambda}}\boldsymbol{L}-\overline{\boldsymbol{X}}\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}\lesssim{} &\left(\frac{n\theta_{\text{max}}^{2}}{\sigma_{\textbf{min}}^{*}}+\log n\right)\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}+\frac{\sqrt{(K-1)n\log n}}{\sigma_{\textbf{min}}^{*}}\theta_{\text{max}}\\
&+\frac{(\sqrt{(K-1)n}\theta_{\text{max}}+\log n)n\theta_{\text{max}}^{2}}{\lambda_{1}^{*}\sigma_{\textbf{min}}^{*}}+\frac{(\sqrt{n}\theta_{\text{max}}+\log n)^{2}\sqrt{(K-1)\mu^{*}/n}}{\sigma_{\textbf{min}}^{*}}\\
&+\frac{\kappa^{*}}{\sigma_{\textbf{min}}^{*}}\sqrt{(K-1)\mu^{*}n}\theta_{\text{max}}^{2}.
\end{align*}
Proof.

See Section 8.8. ∎

Lemma 10.

Assume that $\sqrt{(K-1)\log n}\theta_{\text{max}}/\sigma_{\textbf{min}}^{*}+\kappa^{*}n\theta_{\text{max}}^{2}/\sigma_{\textbf{min}}^{*2}\ll 1$ and $n\gtrsim\mu^{*}\max\{\log^{2}n,K-1\}$. Then with probability at least $1-O(n^{-10})$ we have

\begin{align*}
\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}\lesssim{} &\frac{\kappa^{*}}{\sigma_{\textbf{min}}^{*2}}\sqrt{(K-1)\mu^{*}n}\theta_{\text{max}}^{2}+\frac{\sqrt{(K-1)n\log n}}{\sigma_{\textbf{min}}^{*2}}\theta_{\text{max}}+\frac{\sqrt{K-1}n^{1.5}\theta_{\text{max}}^{3}}{\lambda_{1}^{*}\sigma_{\textbf{min}}^{*2}}\\
&+\frac{\sqrt{(K-1)\log n}\theta_{\text{max}}+\log n\sqrt{(K-1)\mu^{*}/n}+\sqrt{\mu^{*}}\theta_{\text{max}}}{\sigma_{\textbf{min}}^{*}}+\frac{n\theta_{\text{max}}^{2}}{\lambda_{1}^{*}\sigma_{\textbf{min}}^{*}}.
\end{align*}
Proof.

See Section 8.9. ∎

Lemma 11.

Assume that $\max\{\sqrt{n}\theta_{\text{max}},\log n\}\ll\lambda_{1}^{*}$. Then with probability at least $1-O(n^{-10})$ we have

\[\left\|\boldsymbol{\Delta}\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}\lesssim\frac{\sqrt{(K-1)\mu^{*}n}\theta_{\text{max}}^{2}}{\lambda_{1}^{*}}+\frac{\sqrt{K-1}\mu^{*}\log^{2}n}{n\lambda_{1}^{*}}+\frac{n^{1.5}\theta_{\text{max}}^{3}}{\lambda_{1}^{*2}}.\]
Proof.

See Section 8.10. ∎

Similar to [YCF21], we prove Theorem 2 by combining these five lemmas. The proof of Theorem 2 is included in Section 8.11. Next, we combine Theorem 1 and Theorem 2 to yield Theorem 3. See Section 8.12 for details.

Finally, to obtain the membership reconstruction results in Section 4, we need the following result regarding $\widehat{\lambda}_{1}$, which is a direct corollary of Theorem 10.

Corollary 3.

Assume that $\max\{\sqrt{n}\theta_{\text{max}},\log n\}\ll\lambda_{1}^{*}$. Then under the event $\mathcal{A}_{1}$ defined by (33), we have the following expansion:

\[\widehat{\lambda}_{1}-\lambda_{1}^{*}=\textbf{Tr}\left[\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}+2\boldsymbol{N}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right]+\textbf{Tr}\left[\boldsymbol{\Delta}\right],\]

where $|\textbf{Tr}\left[\boldsymbol{\Delta}\right]|\lesssim n\theta_{\text{max}}^{2}/\lambda_{1}^{*}$. In terms of the estimation error, we have $|\widehat{\lambda}_{1}-\lambda_{1}^{*}|\lesssim\sqrt{n}\theta_{\text{max}}$.

Proof.

See Section 8.16. ∎

References

  • [Abb17] Emmanuel Abbe. Community detection and stochastic block models: recent developments. The Journal of Machine Learning Research, 18(1):6446–6531, 2017.
  • [ABEF14] Edoardo M Airoldi, David Blei, Elena A Erosheva, and Stephen E Fienberg. Handbook of mixed membership models and their applications. CRC press, 2014.
  • [ABFX08] Edo M Airoldi, David Blei, Stephen Fienberg, and Eric Xing. Mixed membership stochastic blockmodels. Advances in neural information processing systems, 21, 2008.
  • [ACV14] Ery Arias-Castro and Nicolas Verzelen. Community detection in dense random networks. The Annals of Statistics, 42(3):940 – 969, 2014.
  • [AFW22] Emmanuel Abbe, Jianqing Fan, and Kaizheng Wang. An $\ell_{p}$ theory of PCA and spectral clustering. The Annals of Statistics, 50(4):2359–2385, 2022.
  • [AFWZ20] Emmanuel Abbe, Jianqing Fan, Kaizheng Wang, and Yiqiao Zhong. Entrywise eigenvector analysis of random matrices with low expected rank. Annals of statistics, 48(3):1452, 2020.
  • [ASG+01] Mário César Ugulino Araújo, Teresa Cristina Bezerra Saldanha, Roberto Kawakami Harrop Galvao, Takashi Yoneyama, Henrique Caldas Chame, and Valeria Visani. The successive projections algorithm for variable selection in spectroscopic multicomponent analysis. Chemometrics and intelligent laboratory systems, 57(2):65–73, 2001.
  • [BS16] Peter J Bickel and Purnamrita Sarkar. Hypothesis testing for automated community detection in networks. Journal of the Royal Statistical Society: Series B: Statistical Methodology, pages 253–273, 2016.
  • [CCF+21] Yuxin Chen, Yuejie Chi, Jianqing Fan, Cong Ma, et al. Spectral methods for data science: A statistical perspective. Foundations and Trends® in Machine Learning, 14(5):566–806, 2021.
  • [CCK15] Victor Chernozhukov, Denis Chetverikov, and Kengo Kato. Comparison and anti-concentration bounds for maxima of gaussian random vectors. Probability Theory and Related Fields, 162:47–70, 2015.
  • [CCKK22] Victor Chernozhukov, Denis Chetverikov, Kengo Kato, and Yuta Koike. Improved central limit theorem and bootstrap approximations in high dimensions. The Annals of Statistics, 50(5):2562–2586, 2022.
  • [CFMW19] Yuxin Chen, Jianqing Fan, Cong Ma, and Kaizheng Wang. Spectral method and regularized mle are both optimal for top-k ranking. Annals of statistics, 47(4):2204, 2019.
  • [CFMY19] Yuxin Chen, Jianqing Fan, Cong Ma, and Yuling Yan. Inference and uncertainty quantification for noisy matrix completion. Proceedings of the National Academy of Sciences, 116(46):22931–22937, 2019.
  • [CS15] Yuxin Chen and Changho Suh. Spectral mle: Top-k rank aggregation from pairwise comparisons. In International Conference on Machine Learning, pages 371–380. PMLR, 2015.
  • [DKNS01] Cynthia Dwork, Ravi Kumar, Moni Naor, and Dandapani Sivakumar. Rank aggregation methods for the web. In Proceedings of the 10th international conference on World Wide Web, pages 613–622, 2001.
  • [FFHL19] Jianqing Fan, Yingying Fan, Xiao Han, and Jinchi Lv. Simple: Statistical inference on membership profiles in large networks. arXiv preprint arXiv:1910.01734, 2019.
  • [FFHL22a] Jianqing Fan, Yingying Fan, Xiao Han, and Jinchi Lv. Asymptotic theory of eigenvectors for random matrices with diverging spikes. Journal of the American Statistical Association, 117(538):996–1009, 2022.
  • [FFHL22b] Jianqing Fan, Yingying Fan, Xiao Han, and Jinchi Lv. Simple: Statistical inference on membership profiles in large networks. Journal of the Royal Statistical Society Series B: Statistical Methodology, 84(2):630–653, 2022.
  • [FFLY22] Jianqing Fan, Yingying Fan, Jinchi Lv, and Fan Yang. Simple-rc: Group network inference with non-sharp nulls and weak signals. arXiv preprint arXiv:2211.00128, 2022.
  • [FHY22] Jianqing Fan, Jikai Hou, and Mengxin Yu. Uncertainty quantification of mle for entity ranking with covariates. arXiv preprint arXiv:2212.09961, 2022.
  • [FLWY22] Jianqing Fan, Zhipeng Lou, Weichen Wang, and Mengxin Yu. Ranking inferences based on the top choice of multiway comparisons. arXiv preprint arXiv:2211.11957, 2022.
  • [GSZ23] Chao Gao, Yandi Shen, and Anderson Y Zhang. Uncertainty quantification in the bradley–terry–luce model. Information and Inference: A Journal of the IMA, 12(2):1073–1140, 2023.
  • [GV13] Nicolas Gillis and Stephen A Vavasis. Fast and robust recursive algorithms for separable nonnegative matrix factorization. IEEE transactions on pattern analysis and machine intelligence, 36(4):698–714, 2013.
  • [GZF+10] Anna Goldenberg, Alice X Zheng, Stephen E Fienberg, Edoardo M Airoldi, et al. A survey of statistical network models. Foundations and Trends® in Machine Learning, 2(2):129–233, 2010.
  • [HLL83] Paul W Holland, Kathryn Blackmond Laskey, and Samuel Leinhardt. Stochastic blockmodels: First steps. Social networks, 5(2):109–137, 1983.
  • [Hun04] David R Hunter. Mm algorithms for generalized bradley-terry models. The annals of statistics, 32(1):384–406, 2004.
  • [Jin15] Jiashun Jin. Fast community detection by SCORE. The Annals of Statistics, 43(1):57 – 89, 2015.
  • [JJKL21] P Ji, J Jin, ZT Ke, and W Li. Meta-analysis on citations for statisticians. Manuscript, 2021.
  • [JKL23] Jiashun Jin, Zheng Tracy Ke, and Shengming Luo. Mixed membership estimation for social networks. Journal of Econometrics, 2023.
  • [KN11] Brian Karrer and Mark EJ Newman. Stochastic blockmodels and community structure in networks. Physical review E, 83(1):016107, 2011.
  • [Lei16] Jing Lei. A goodness-of-fit test for stochastic block models. The Annals of Statistics, 44(1):401 – 424, 2016.
  • [LR15] Jing Lei and Alessandro Rinaldo. Consistency of spectral clustering in stochastic block models. The Annals of Statistics, 43(1):215 – 237, 2015.
  • [Luc12] R Duncan Luce. Individual choice behavior: A theoretical analysis. Courier Corporation, 2012.
  • [LWLZ23] Tianxi Li, Yun-Jhong Wu, Elizaveta Levina, and Ji Zhu. Link prediction for egocentrically sampled networks. Journal of Computational and Graphical Statistics, pages 1–24, 2023.
  • [M+73] Daniel McFadden et al. Conditional logit analysis of qualitative choice behavior. 1973.
  • [New13a] Mark EJ Newman. Community detection and graph partitioning. Europhysics Letters, 103(2):28003, 2013.
  • [New13b] Mark EJ Newman. Spectral methods for community detection and graph partitioning. Physical Review E, 88(4):042822, 2013.
  • [NK07] D Liben-Nowell and J Kleinberg. The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology, 58(7):1019–1031, 2007.
  • [Rai19] Martin Raič. A multivariate Berry–Esseen theorem with explicit constants. Bernoulli, 25(4A):2824 – 2853, 2019.
  • [RCY11] Karl Rohe, Sourav Chatterjee, and Bin Yu. Spectral clustering and the high-dimensional stochastic blockmodel. The Annals of Statistics, 39(4):1878 – 1915, 2011.
  • [SC95] PC Sham and D Curtis. An extended transmission/disequilibrium test (tdt) for multi-allele marker loci. Annals of human genetics, 59(3):323–336, 1995.
  • [Sti94] Stephen M Stigler. Citation patterns in the journals of statistics and probability. Statistical Science, pages 94–108, 1994.
  • [T+15] Joel A Tropp et al. An introduction to matrix concentration inequalities. Foundations and Trends® in Machine Learning, 8(1-2):1–230, 2015.
  • [VAC15] Nicolas Verzelen and Ery Arias-Castro. Community detection in sparse random networks. The Annals of Applied Probability, 25(6):3465 – 3510, 2015.
  • [VL07] Ulrike Von Luxburg. A tutorial on spectral clustering. Statistics and computing, 17:395–416, 2007.
  • [WB17] Y. X. Rachel Wang and Peter J. Bickel. Likelihood-based model selection for stochastic block models. The Annals of Statistics, 45(2):500 – 528, 2017.
  • [WW87] Yuchung J Wang and George Y Wong. Stochastic blockmodels for directed graphs. Journal of the American Statistical Association, 82(397):8–19, 1987.
  • [YCF21] Yuling Yan, Yuxin Chen, and Jianqing Fan. Inference for heteroskedastic pca with missing data. arXiv preprint arXiv:2107.12365, 2021.
  • [ZM14] Pan Zhang and Cristopher Moore. Scalable detection of statistically significant communities and hierarchies, using message passing for modularity. Proceedings of the National Academy of Sciences, 111(51):18144–18149, 2014.

8 Proofs

8.1 Proof of Theorem 8

Proof.

Define $r:=\frac{\lambda_{1}^{*}-|\lambda_{2}^{*}|}{2}$ and let $\mathcal{C}_{1}$ be the circular contour around $\lambda_{1}^{*}$ with radius $r$. Then $\lambda_{1}^{*}$ is the only eigenvalue of $\boldsymbol{H}$ that lies inside $\mathcal{C}_{1}$. Under the event $\mathcal{A}_{1}$ defined by (33), Weyl's theorem gives

\begin{align*}
\widehat{\lambda}_{1}&=\widehat{\sigma}_{1}\geq\sigma_{1}^{*}-C_{2}\theta_{\text{max}}\sqrt{n}=\lambda_{1}^{*}-C_{2}\theta_{\text{max}}\sqrt{n},\\
\widehat{\lambda}_{i}&\leq\widehat{\sigma}_{\text{max}}\leq\sigma_{\text{max}}^{*}+C_{2}\theta_{\text{max}}\sqrt{n}=\max_{2\leq j\leq n}|\lambda_{j}^{*}|+C_{2}\theta_{\text{max}}\sqrt{n},\quad\text{for }2\leq i\leq K.
\end{align*}

As a result, $\widehat{\lambda}_{1}$ is the only eigenvalue of $\boldsymbol{X}$ that lies inside $\mathcal{C}_{1}$. For $\lambda\in\mathbb{C}$ outside the spectra of $\boldsymbol{H}$ and $\boldsymbol{X}$, we have

\begin{align}
\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}&=\sum_{i=1}^{n}\frac{1}{\lambda-\lambda_{i}^{*}}\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}, \tag{40}\\
\left(\lambda\boldsymbol{I}-\boldsymbol{X}\right)^{-1}&=\left(\lambda\boldsymbol{I}-\boldsymbol{H}-\boldsymbol{W}\right)^{-1}=\sum_{i=1}^{n}\frac{1}{\lambda-\widehat{\lambda}_{i}}\widehat{\boldsymbol{u}}_{i}\widehat{\boldsymbol{u}}_{i}^{\top}. \tag{41}
\end{align}

As a result, for $\lambda$ on the contour $\mathcal{C}_{1}$, we know that

\begin{align}
\left\|(\lambda\boldsymbol{I}-\boldsymbol{H})^{-1}\right\|&=\max_{i\in[n]}\frac{1}{|\lambda-\lambda_{i}^{*}|}=\frac{1}{r}\asymp\frac{1}{\lambda_{1}^{*}}, \notag\\
\left\|(\lambda\boldsymbol{I}-\boldsymbol{X})^{-1}\right\|&=\max_{i\in[n]}\frac{1}{|\lambda-\widehat{\lambda}_{i}|}\leq\max_{i\in[n]}\frac{1}{|\lambda-\lambda_{i}^{*}|-|\widehat{\lambda}_{i}-\lambda_{i}^{*}|}\leq\frac{1}{r-C_{2}\theta_{\text{max}}\sqrt{n}}\asymp\frac{1}{\lambda_{1}^{*}} \tag{42}
\end{align}

under the event $\mathcal{A}_{1}$. Using (40) and (41), we know that

\[
\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}d\lambda=\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\overset{\Delta}{=}\boldsymbol{P}_{1}^{*},\quad\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\left(\lambda\boldsymbol{I}-\boldsymbol{X}\right)^{-1}d\lambda=\widehat{\boldsymbol{u}}_{1}\widehat{\boldsymbol{u}}_{1}^{\top}\overset{\Delta}{=}\widehat{\boldsymbol{P}}_{1}. \tag{43}
\]
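As a numerical sanity check on (43), the following sketch approximates the contour integral of the resolvent with the trapezoid rule and compares the result with $\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}$. The toy matrix, its eigenvalues, the helper name `riesz_projector`, and the number of quadrature nodes are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def riesz_projector(M, center, radius, num_points=400):
    """Approximate (1/(2*pi*i)) * contour integral of (lambda*I - M)^{-1}
    over the circle |lambda - center| = radius with the trapezoid rule."""
    n = M.shape[0]
    identity = np.eye(n)
    total = np.zeros((n, n), dtype=complex)
    for t in 2 * np.pi * np.arange(num_points) / num_points:
        lam = center + radius * np.exp(1j * t)
        # d(lambda) = i * radius * e^{it} dt, so after the 1/(2*pi*i) prefactor
        # each node carries the weight radius * e^{it} / num_points.
        weight = radius * np.exp(1j * t) / num_points
        total += np.linalg.inv(lam * identity - M) * weight
    return total

rng = np.random.default_rng(0)
n = 6
# Toy symmetric H with a well-separated top eigenvalue (assumed values).
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
eigs = np.array([10.0, 3.0, -2.0, 1.0, 0.5, 0.0])
H = Q @ np.diag(eigs) @ Q.T
u1 = Q[:, 0]

r = (eigs[0] - np.max(np.abs(eigs[1:]))) / 2  # r = (lambda_1 - |lambda_2|) / 2
P1 = riesz_projector(H, center=eigs[0], radius=r)
err = np.max(np.abs(P1 - np.outer(u1, u1)))
print(err)
```

Because the integrand is analytic in a neighbourhood of the contour, the trapezoid rule converges geometrically here, so a few hundred nodes are more than enough for this toy example.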

We denote by

\[
\Delta\boldsymbol{P}_{1}=\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}d\lambda.
\]

Then we know that

\[
\left\|\widehat{\boldsymbol{P}}_{1}-\boldsymbol{P}_{1}^{*}-\Delta\boldsymbol{P}_{1}\right\|\leq\frac{1}{2\pi}\oint_{\mathcal{C}_{1}}\left\|\left(\lambda\boldsymbol{I}-\boldsymbol{X}\right)^{-1}-\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}-\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\right\|d\lambda.
\]

The integrand can be reformulated as

\begin{align*}
&\left(\lambda\boldsymbol{I}-\boldsymbol{X}\right)^{-1}-\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}-\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\\
&\quad=\left(\lambda\boldsymbol{I}-\boldsymbol{X}\right)^{-1}\left[\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)-\left(\lambda\boldsymbol{I}-\boldsymbol{X}\right)\right]\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}-\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\\
&\quad=\left[\left(\lambda\boldsymbol{I}-\boldsymbol{X}\right)^{-1}-\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\right]\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\\
&\quad=\left(\lambda\boldsymbol{I}-\boldsymbol{X}\right)^{-1}\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}.
\end{align*}
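The reformulation of the integrand is a purely algebraic identity between resolvents, so it can be checked numerically at any point $\lambda$ off both spectra. The sketch below does this for small random symmetric matrices; the matrix sizes, the noise scale, and the chosen $\lambda$ are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
# Toy symmetric matrices standing in for H and the noise W (assumed values).
A = rng.standard_normal((n, n))
H = (A + A.T) / 2
B = rng.standard_normal((n, n))
W = 0.1 * (B + B.T) / 2
X = H + W

lam = 7.0 + 2.0j                      # non-real, hence off both real spectra
identity = np.eye(n)
R_H = np.linalg.inv(lam * identity - H)   # resolvent of H
R_X = np.linalg.inv(lam * identity - X)   # resolvent of X = H + W

# Left side: resolvent difference minus its first-order term.
lhs = R_X - R_H - R_H @ W @ R_H
# Right side: the second-order remainder from the last line of the display.
rhs = R_X @ W @ R_H @ W @ R_H
gap = np.max(np.abs(lhs - rhs))
print(gap)
```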

As a result, we have, under the event $\mathcal{A}_{1}$,

\begin{align*}
&\left\|\left(\lambda\boldsymbol{I}-\boldsymbol{X}\right)^{-1}-\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}-\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\right\|\\
&\quad\leq\left\|\left(\lambda\boldsymbol{I}-\boldsymbol{X}\right)^{-1}\right\|\left\|\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\right\|^{2}\left\|\boldsymbol{W}\right\|^{2}\lesssim\frac{\theta_{\text{max}}^{2}n}{\lambda_{1}^{*3}},
\end{align*}

where the first inequality uses the reformulation of the integrand above, and the second uses the resolvent bounds in (42) together with $\|\boldsymbol{W}\|\lesssim\theta_{\text{max}}\sqrt{n}$ under $\mathcal{A}_{1}$. Hence, under the event $\mathcal{A}_{1}$,

\[
\left\|\widehat{\boldsymbol{P}}_{1}-\boldsymbol{P}_{1}^{*}-\Delta\boldsymbol{P}_{1}\right\|\lesssim\frac{1}{2\pi}\oint_{\mathcal{C}_{1}}\frac{\theta_{\text{max}}^{2}n}{\lambda_{1}^{*3}}d\lambda=\frac{\theta_{\text{max}}^{2}nr}{\lambda_{1}^{*3}}\lesssim\frac{\theta_{\text{max}}^{2}n}{\lambda_{1}^{*2}},
\]

since $r\lesssim\lambda_{1}^{*}$. This immediately yields

\[
\left\|\left(\widehat{\boldsymbol{P}}_{1}-\boldsymbol{P}_{1}^{*}-\Delta\boldsymbol{P}_{1}\right)\boldsymbol{u}_{1}^{*}\right\|_{2}\lesssim\frac{\theta_{\text{max}}^{2}n}{\lambda_{1}^{*2}}. \tag{44}
\]

By (43), we know that $\boldsymbol{P}_{1}^{*}\boldsymbol{u}_{1}^{*}=\boldsymbol{u}_{1}^{*}$ and $\widehat{\boldsymbol{P}}_{1}\boldsymbol{u}_{1}^{*}=(\widehat{\boldsymbol{u}}_{1}^{\top}\boldsymbol{u}_{1}^{*})\widehat{\boldsymbol{u}}_{1}$. Moreover, expanding both resolvents in $\Delta\boldsymbol{P}_{1}$ via (40) and applying the residue theorem, we get

\begin{align*}
\Delta\boldsymbol{P}_{1}&=\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{1}{(\lambda-\lambda_{i}^{*})(\lambda-\lambda_{j}^{*})}\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}\boldsymbol{W}\boldsymbol{u}_{j}^{*}\boldsymbol{u}_{j}^{*\top}d\lambda\\
&=\sum_{i=1}^{n}\sum_{j=1}^{n}\textbf{Res}\left(\frac{1}{(\lambda-\lambda_{i}^{*})(\lambda-\lambda_{j}^{*})},\lambda_{1}^{*}\right)\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}\boldsymbol{W}\boldsymbol{u}_{j}^{*}\boldsymbol{u}_{j}^{*\top}\\
&=\sum_{i=2}^{n}\frac{1}{\lambda_{1}^{*}-\lambda_{i}^{*}}\left(\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}+\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}\right).
\end{align*}

As a result, we know that

\[
\Delta\boldsymbol{P}_{1}\boldsymbol{u}_{1}^{*}=\sum_{i=2}^{n}\frac{\boldsymbol{u}_{i}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}}{\lambda_{1}^{*}-\lambda_{i}^{*}}\boldsymbol{u}_{i}^{*}.
\]

Therefore, we obtain

\begin{align}
\widehat{\boldsymbol{u}}_{1}-\boldsymbol{u}_{1}^{*}-\sum_{i=2}^{n}\frac{\boldsymbol{u}_{i}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}}{\lambda_{1}^{*}-\lambda_{i}^{*}}\boldsymbol{u}_{i}^{*}&=\left((\widehat{\boldsymbol{u}}_{1}^{\top}\boldsymbol{u}_{1}^{*})\widehat{\boldsymbol{u}}_{1}-\boldsymbol{u}_{1}^{*}-\sum_{i=2}^{n}\frac{\boldsymbol{u}_{i}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}}{\lambda_{1}^{*}-\lambda_{i}^{*}}\boldsymbol{u}_{i}^{*}\right)+\left(\widehat{\boldsymbol{u}}_{1}-(\widehat{\boldsymbol{u}}_{1}^{\top}\boldsymbol{u}_{1}^{*})\widehat{\boldsymbol{u}}_{1}\right) \notag\\
&=\underbrace{\left(\widehat{\boldsymbol{P}}_{1}-\boldsymbol{P}_{1}^{*}-\Delta\boldsymbol{P}_{1}\right)\boldsymbol{u}_{1}^{*}}_{T_{1}}+\underbrace{\left(\widehat{\boldsymbol{u}}_{1}-(\widehat{\boldsymbol{u}}_{1}^{\top}\boldsymbol{u}_{1}^{*})\widehat{\boldsymbol{u}}_{1}\right)}_{T_{2}}. \tag{45}
\end{align}

Now, $\|T_{1}\|_{2}\lesssim\theta_{\text{max}}^{2}n/\lambda_{1}^{*2}$ by (44). To bound $\|T_{2}\|_{2}$, note that $\|T_{2}\|_{2}=\|(\widehat{\boldsymbol{u}}_{1}^{\top}\boldsymbol{u}_{1}^{*})\widehat{\boldsymbol{u}}_{1}-\widehat{\boldsymbol{u}}_{1}\|_{2}=|\widehat{\boldsymbol{u}}_{1}^{\top}\boldsymbol{u}_{1}^{*}-1|$, since $\|\widehat{\boldsymbol{u}}_{1}\|_{2}=1$. Moreover,

\[
|\widehat{\boldsymbol{u}}_{1}^{\top}\boldsymbol{u}_{1}^{*}-1|=1-\widehat{\boldsymbol{u}}_{1}^{\top}\boldsymbol{u}_{1}^{*}=\frac{\boldsymbol{u}_{1}^{*\top}\boldsymbol{u}_{1}^{*}+\widehat{\boldsymbol{u}}_{1}^{\top}\widehat{\boldsymbol{u}}_{1}-\widehat{\boldsymbol{u}}_{1}^{\top}\boldsymbol{u}_{1}^{*}-\boldsymbol{u}_{1}^{*\top}\widehat{\boldsymbol{u}}_{1}}{2}=\frac{\left\|\boldsymbol{u}_{1}^{*}-\widehat{\boldsymbol{u}}_{1}\right\|_{2}^{2}}{2}. \tag{46}
\]

Also, by Wedin's $\sin\Theta$ theorem [CCF+21, Theorem 2.9],

\[
\left\|\boldsymbol{u}_{1}^{*}-\widehat{\boldsymbol{u}}_{1}\right\|_{2}\lesssim\frac{\left\|\boldsymbol{W}\right\|}{\lambda_{1}^{*}}\lesssim\frac{\theta_{\text{max}}\sqrt{n}}{\lambda_{1}^{*}}, \tag{47}
\]

under $\mathcal{A}_{1}$. Combining (46) and (47), we obtain

\[
\|(\widehat{\boldsymbol{u}}_{1}^{\top}\boldsymbol{u}_{1}^{*})\widehat{\boldsymbol{u}}_{1}-\widehat{\boldsymbol{u}}_{1}\|_{2}=\frac{\left\|\boldsymbol{u}_{1}^{*}-\widehat{\boldsymbol{u}}_{1}\right\|_{2}^{2}}{2}\lesssim\frac{\theta_{\text{max}}^{2}n}{\lambda_{1}^{*2}}.
\]
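The identity (46) relies only on both vectors having unit norm and a nonnegative inner product. A minimal numerical check, with randomly generated unit vectors standing in for $\boldsymbol{u}_{1}^{*}$ and $\widehat{\boldsymbol{u}}_{1}$ (an assumption for illustration), could look like this:

```python
import numpy as np

rng = np.random.default_rng(3)
dim = 10
# u plays the role of u_1^*, v the role of its estimate (toy stand-ins).
u = rng.standard_normal(dim)
u /= np.linalg.norm(u)
v = u + 0.1 * rng.standard_normal(dim)
v /= np.linalg.norm(v)

lhs = np.linalg.norm((v @ u) * v - v)   # ||(v^T u) v - v||_2
mid = 1.0 - v @ u                       # 1 - v^T u
rhs = np.linalg.norm(u - v) ** 2 / 2    # ||u - v||_2^2 / 2
print(lhs, mid, rhs)
```

All three printed quantities should agree up to floating-point error, which is why a first-order bound on $\|\boldsymbol{u}_{1}^{*}-\widehat{\boldsymbol{u}}_{1}\|_{2}$ turns into a second-order bound on this term.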

Therefore, we get, under the event $\mathcal{A}_{1}$,

\begin{align*}
\|\boldsymbol{\delta}\|_{2}&\leq\left\|\widehat{\boldsymbol{u}}_{1}-\boldsymbol{u}_{1}^{*}-\sum_{i=2}^{n}\frac{\boldsymbol{u}_{i}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}}{\lambda_{1}^{*}-\lambda_{i}^{*}}\boldsymbol{u}_{i}^{*}\right\|_{2}\\
&\leq\left\|\left(\widehat{\boldsymbol{P}}_{1}-\boldsymbol{P}_{1}^{*}-\Delta\boldsymbol{P}_{1}\right)\boldsymbol{u}_{1}^{*}\right\|_{2}+\|(\widehat{\boldsymbol{u}}_{1}^{\top}\boldsymbol{u}_{1}^{*})\widehat{\boldsymbol{u}}_{1}-\widehat{\boldsymbol{u}}_{1}\|_{2}\lesssim\frac{\theta_{\text{max}}^{2}n}{\lambda_{1}^{*2}}.
\end{align*}
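The bound just derived says that, after removing the first-order term $\sum_{i\geq 2}\frac{\boldsymbol{u}_{i}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}}{\lambda_{1}^{*}-\lambda_{i}^{*}}\boldsymbol{u}_{i}^{*}$, the remainder is of second order in $\|\boldsymbol{W}\|/\lambda_{1}^{*}$. The sketch below illustrates this scaling on a toy example: doubling the noise level should roughly quadruple the remainder. The spectrum, the Gaussian noise, and the helper names are assumptions for illustration and are not the DCMM setting of the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
# Toy symmetric H with known eigenpairs (assumed values).
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
eigs = np.array([20.0, 4.0, -3.0, 2.0, 1.0, 0.5, 0.0, -1.0])
H = Q @ np.diag(eigs) @ Q.T
u1 = Q[:, 0]
B = rng.standard_normal((n, n))
W0 = (B + B.T) / 2                      # symmetric noise direction

def remainder_norm(scale):
    """l2 norm of u_hat - u1 - (first-order term) for the noise scale * W0."""
    W = scale * W0
    _, vecs = np.linalg.eigh(H + W)     # eigenvalues returned in ascending order
    u_hat = vecs[:, -1]
    if u_hat @ u1 < 0:                  # resolve the sign ambiguity
        u_hat = -u_hat
    coeffs = (Q[:, 1:].T @ W @ u1) / (eigs[0] - eigs[1:])
    first_order = Q[:, 1:] @ coeffs
    return np.linalg.norm(u_hat - u1 - first_order)

r_small = remainder_norm(0.05)
r_large = remainder_norm(0.10)
ratio = r_large / r_small
print(r_small, r_large, ratio)
```

A ratio close to 4 is what a quadratic remainder predicts; deviations from 4 reflect third-order terms, which shrink as the noise scale decreases.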

8.2 Proof of Theorem 9

Proof.

Recall from (45) that

\[
\boldsymbol{\delta}=\widehat{\boldsymbol{u}}_{1}-\boldsymbol{u}_{1}^{*}-\Delta\boldsymbol{P}_{1}\boldsymbol{u}_{1}^{*}=\left(\widehat{\boldsymbol{P}}_{1}-\boldsymbol{P}_{1}^{*}-\Delta\boldsymbol{P}_{1}\right)\boldsymbol{u}_{1}^{*}+(1-\widehat{\boldsymbol{u}}_{1}^{\top}\boldsymbol{u}_{1}^{*})\widehat{\boldsymbol{u}}_{1}. \tag{48}
\]

We begin by bounding $\|(\widehat{\boldsymbol{P}}_{1}-\boldsymbol{P}_{1}^{*}-\Delta\boldsymbol{P}_{1})\boldsymbol{u}_{1}^{*}\|_{\infty}$. Recall that

\[
\widehat{\boldsymbol{P}}_{1}-\boldsymbol{P}_{1}^{*}-\Delta\boldsymbol{P}_{1}=\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\left(\lambda\boldsymbol{I}-\boldsymbol{X}\right)^{-1}\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}d\lambda.
\]

We split the integrand into the sum of the following two quantities:

\begin{align*}
\boldsymbol{\Delta}_{1}&:=\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1};\\
\boldsymbol{\Delta}_{2}&:=\left[\left(\lambda\boldsymbol{I}-\boldsymbol{X}\right)^{-1}-\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\right]\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\\
&=\left(\lambda\boldsymbol{I}-\boldsymbol{X}\right)^{-1}\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1},
\end{align*}

so that $\widehat{\boldsymbol{P}}_{1}-\boldsymbol{P}_{1}^{*}-\Delta\boldsymbol{P}_{1}=\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\left(\boldsymbol{\Delta}_{1}+\boldsymbol{\Delta}_{2}\right)d\lambda$. Under the event $\mathcal{A}_{1}$,

\begin{align}
\left\|\left[\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\boldsymbol{\Delta}_{2}d\lambda\right]\boldsymbol{u}_{1}^{*}\right\|_{\infty}&\leq\left\|\left[\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\boldsymbol{\Delta}_{2}d\lambda\right]\boldsymbol{u}_{1}^{*}\right\|_{2}\leq\left\|\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\boldsymbol{\Delta}_{2}d\lambda\right\| \notag\\
&\leq\frac{1}{2\pi}\oint_{\mathcal{C}_{1}}\left\|\boldsymbol{\Delta}_{2}\right\|d\lambda\lesssim\frac{\theta_{\text{max}}^{3}n^{1.5}r}{\lambda_{1}^{*4}}\lesssim\frac{\theta_{\text{max}}^{3}n^{1.5}}{\lambda_{1}^{*3}}, \tag{49}
\end{align}

where the fourth inequality uses the resolvent bounds in (42) together with $\|\boldsymbol{W}\|\lesssim\theta_{\text{max}}\sqrt{n}$ under $\mathcal{A}_{1}$. It remains to bound $\left\|\left[\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\boldsymbol{\Delta}_{1}d\lambda\right]\boldsymbol{u}_{1}^{*}\right\|_{\infty}$. Since $\boldsymbol{\Delta}_{1}$ can be expanded as

\[
\boldsymbol{\Delta}_{1}=\sum_{i,j,k=1}^{n}\frac{1}{(\lambda-\lambda_{i}^{*})(\lambda-\lambda_{j}^{*})(\lambda-\lambda_{k}^{*})}\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}\boldsymbol{W}\boldsymbol{u}_{j}^{*}\boldsymbol{u}_{j}^{*\top}\boldsymbol{W}\boldsymbol{u}_{k}^{*}\boldsymbol{u}_{k}^{*\top},
\]

we have

\[
\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\boldsymbol{\Delta}_{1}d\lambda=\sum_{i,j,k=1}^{n}\textbf{Res}\left(\frac{1}{(\lambda-\lambda_{i}^{*})(\lambda-\lambda_{j}^{*})(\lambda-\lambda_{k}^{*})},\lambda_{1}^{*}\right)\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}\boldsymbol{W}\boldsymbol{u}_{j}^{*}\boldsymbol{u}_{j}^{*\top}\boldsymbol{W}\boldsymbol{u}_{k}^{*}\boldsymbol{u}_{k}^{*\top}.
\]

Since $\boldsymbol{u}_{k}^{*\top}\boldsymbol{u}_{1}^{*}=\mathbbm{1}_{k=1}$, we have

\begin{align*}
\left[\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\boldsymbol{\Delta}_{1}d\lambda\right]\boldsymbol{u}_{1}^{*}={}&\sum_{i,j=1}^{n}\textbf{Res}\left(\frac{1}{(\lambda-\lambda_{i}^{*})(\lambda-\lambda_{j}^{*})(\lambda-\lambda_{1}^{*})},\lambda_{1}^{*}\right)\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}\boldsymbol{W}\boldsymbol{u}_{j}^{*}\boldsymbol{u}_{j}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\\
={}&\sum_{i,j=2}^{n}\frac{1}{(\lambda_{1}^{*}-\lambda_{i}^{*})(\lambda_{1}^{*}-\lambda_{j}^{*})}\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}\boldsymbol{W}\boldsymbol{u}_{j}^{*}\boldsymbol{u}_{j}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\\
&-\sum_{i=2}^{n}\frac{1}{(\lambda_{1}^{*}-\lambda_{i}^{*})^{2}}\left[\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}+\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right].
\end{align*}

Define the matrices

\[
\boldsymbol{N}_{1}:=\sum_{i=2}^{n}\frac{1}{\lambda_{1}^{*}-\lambda_{i}^{*}}\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top},\quad\boldsymbol{N}_{2}:=\sum_{i=2}^{n}\frac{1}{(\lambda_{1}^{*}-\lambda_{i}^{*})^{2}}\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}. \tag{50}
\]

Then we can write

\[
\left[\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\boldsymbol{\Delta}_{1}d\lambda\right]\boldsymbol{u}_{1}^{*}=\boldsymbol{N}_{1}\boldsymbol{W}\boldsymbol{N}_{1}\boldsymbol{W}\boldsymbol{u}_{1}^{*}-\boldsymbol{N}_{2}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}-\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{N}_{2}\boldsymbol{W}\boldsymbol{u}_{1}^{*}. \tag{51}
\]
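The three groups of terms in (51) come from three residue patterns at $\lambda_{1}^{*}$: a simple pole when $i,j\geq 2$, a double pole when exactly one of $i,j$ equals $1$, and a triple pole (with zero residue) when $i=j=1$. The sketch below evaluates these residues numerically by contour integration; the eigenvalues, the radius, and the helper name `residue_at` are hypothetical choices made only for illustration.

```python
import numpy as np

def residue_at(poles, center, radius, num_points=512):
    """(1/(2*pi*i)) * contour integral of 1 / prod_p (lambda - p) over the
    circle |lambda - center| = radius, computed with the trapezoid rule."""
    t = 2 * np.pi * np.arange(num_points) / num_points
    lam = center + radius * np.exp(1j * t)
    f = np.ones_like(lam)
    for p in poles:
        f = f / (lam - p)
    # d(lambda) = i * radius * e^{it} dt cancels the 1/(2*pi*i) prefactor.
    return np.sum(f * radius * np.exp(1j * t)) / num_points

lam1, lam_i, lam_j = 10.0, 3.0, -2.0   # toy eigenvalues (assumed)
radius = 3.5                           # the circle encloses lam1 only

simple = residue_at([lam_i, lam_j, lam1], lam1, radius)  # i, j >= 2
double = residue_at([lam1, lam_j, lam1], lam1, radius)   # i = 1, j >= 2
triple = residue_at([lam1, lam1, lam1], lam1, radius)    # i = j = 1

print(simple, 1 / ((lam1 - lam_i) * (lam1 - lam_j)))
print(double, -1 / (lam1 - lam_j) ** 2)
print(triple)
```

The negative sign of the double-pole residue is the source of the two subtracted terms in (51).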

We analyse the three terms separately now.

  • (i)

    Control $\left\|\boldsymbol{N}_{1}\boldsymbol{W}\boldsymbol{N}_{1}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right\|_{\infty}$: We introduce the leave-one-out matrix $\boldsymbol{W}^{(i)}$, obtained by replacing all the entries in the $i$-th row and $i$-th column of the original $\boldsymbol{W}$ with $0$, for $i\in[n]$. Then, for any $i\in[n]$, $\boldsymbol{W}_{i,\cdot}$ is independent of $\boldsymbol{W}^{(i)}$. We write

    \begin{align}
    \left\|\boldsymbol{W}\boldsymbol{N}_{1}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right\|_{\infty}&=\max_{1\leq i\leq n}\left|\boldsymbol{W}_{i,\cdot}\boldsymbol{N}_{1}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right| \notag\\
    &\leq\max_{1\leq i\leq n}\left\{\left|\boldsymbol{W}_{i,\cdot}\boldsymbol{N}_{1}\boldsymbol{W}^{(i)}\boldsymbol{u}_{1}^{*}\right|+\left|\boldsymbol{W}_{i,\cdot}\boldsymbol{N}_{1}\Big(\boldsymbol{W}-\boldsymbol{W}^{(i)}\Big)\boldsymbol{u}_{1}^{*}\right|\right\}. \tag{52}
    \end{align}

    For a fixed $i\in[n]$, by Lemma 13 and Lemma 12, we obtain

    \begin{align}
    \left|\boldsymbol{W}_{i,\cdot}\boldsymbol{N}_{1}\boldsymbol{W}^{(i)}\boldsymbol{u}_{1}^{*}\right|&\lesssim\sqrt{\log n}\,\theta_{\text{max}}\Big\|\boldsymbol{N}_{1}\boldsymbol{W}^{(i)}\boldsymbol{u}_{1}^{*}\Big\|_{2}+\log n\Big\|\boldsymbol{N}_{1}\boldsymbol{W}^{(i)}\boldsymbol{u}_{1}^{*}\Big\|_{\infty} \notag\\
    &\lesssim\frac{\sqrt{n\log n}\,\theta_{\text{max}}^{2}}{\lambda_{1}^{*}}+\log n\left\|\boldsymbol{N}_{1}\boldsymbol{W}^{(i)}\boldsymbol{u}_{1}^{*}\right\|_{\infty} \tag{53}
    \end{align}

    with probability at least $1-O(n^{-15})$ under $\mathcal{A}_{1}$. By Corollary 4 and Lemma 15, we have

    \begin{align}
    \left\|\boldsymbol{N}_{1}\boldsymbol{W}^{(i)}\boldsymbol{u}_{1}^{*}\right\|_{\infty}&\lesssim\frac{1}{\lambda_{1}^{*}}\left\|\boldsymbol{W}^{(i)}\boldsymbol{u}_{1}^{*}\right\|_{\infty}+\sqrt{\frac{(K-1)\mu^{*}}{n\lambda_{1}^{*2}}}\left\|\boldsymbol{W}^{(i)}\boldsymbol{u}_{1}^{*}\right\|_{2} \notag\\
    &\lesssim\frac{\left(\sqrt{\log n}+\sqrt{(K-1)\mu^{*}}\right)\theta_{\text{max}}+\log n\sqrt{\mu^{*}/n}}{\lambda_{1}^{*}}=:\rho_{1} \tag{54}
    \end{align}

    with probability at least $1-O(n^{-15})$ under the event $\mathcal{A}_{1}$. Plugging (54) into (53) gives

    \[
    \left|\boldsymbol{W}_{i,\cdot}\boldsymbol{N}_{1}\boldsymbol{W}^{(i)}\boldsymbol{u}_{1}^{*}\right|\lesssim\frac{\sqrt{n\log n}\,\theta_{\text{max}}^{2}}{\lambda_{1}^{*}}+\log n\,\rho_{1}=:\eta_{1} \tag{55}
    \]

    with probability at least $1-O(n^{-15})$ under the event $\mathcal{A}_{1}$.

    The second summand in (52) can be bounded using Lemma 12 and Lemma 16 as

    \begin{align}
    \left|\boldsymbol{W}_{i,\cdot}\boldsymbol{N}_{1}\Big(\boldsymbol{W}-\boldsymbol{W}^{(i)}\Big)\boldsymbol{u}_{1}^{*}\right|&\lesssim\left\|\boldsymbol{W}_{i,\cdot}\right\|_{2}\left\|\boldsymbol{N}_{1}\right\|\left\|\Big(\boldsymbol{W}-\boldsymbol{W}^{(i)}\Big)\boldsymbol{u}_{1}^{*}\right\|_{2} \notag\\
    &\lesssim\frac{\left\|\boldsymbol{W}\right\|}{\lambda_{1}^{*}}\left(\sqrt{\log n}\,\theta_{\text{max}}+\left(\log n+\sqrt{n}\theta_{\text{max}}\right)\sqrt{\mu^{*}/n}\right) \notag\\
    &\lesssim\frac{\sqrt{n(\mu^{*}+\log n)}\,\theta_{\text{max}}^{2}+\log n\sqrt{\mu^{*}}\,\theta_{\text{max}}}{\lambda_{1}^{*}} \tag{56}
    \end{align}

    with probability at least $1-O(n^{-15})$. Plugging (55) and (56) into (52), we get

    \[
    \left\|\boldsymbol{W}\boldsymbol{N}_{1}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right\|_{\infty}\lesssim\frac{\sqrt{n\mu^{*}}\,\theta_{\text{max}}^{2}}{\lambda_{1}^{*}}+\eta_{1}
    \]

    with probability at least $1-O(n^{-10})$. Then, by Corollary 4, we have

    \begin{align*}
    \left\|\boldsymbol{N}_{1}\boldsymbol{W}\boldsymbol{N}_{1}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right\|_{\infty}&\lesssim\frac{1}{\lambda_{1}^{*}}\left\|\boldsymbol{W}\boldsymbol{N}_{1}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right\|_{\infty}+\sqrt{\frac{(K-1)\mu^{*}}{n\lambda_{1}^{*2}}}\left\|\boldsymbol{W}\boldsymbol{N}_{1}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right\|_{2}\\
    &\lesssim\frac{1}{\lambda_{1}^{*}}\left\|\boldsymbol{W}\boldsymbol{N}_{1}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right\|_{\infty}+\sqrt{\frac{(K-1)\mu^{*}}{n\lambda_{1}^{*2}}}\left\|\boldsymbol{W}\right\|^{2}\left\|\boldsymbol{N}_{1}\right\|\left\|\boldsymbol{u}_{1}^{*}\right\|_{2}\\
    &\lesssim\frac{\sqrt{n\mu^{*}}\,\theta_{\text{max}}^{2}}{\lambda_{1}^{*2}}+\frac{\eta_{1}}{\lambda_{1}^{*}}+\frac{\sqrt{(K-1)\mu^{*}}}{\sqrt{n}\lambda_{1}^{*}}\,n\theta_{\text{max}}^{2}\,\frac{1}{\lambda_{1}^{*}}\\
    &\lesssim\frac{\sqrt{n(K-1)\mu^{*}}\,\theta_{\text{max}}^{2}}{\lambda_{1}^{*2}}+\frac{\eta_{1}}{\lambda_{1}^{*}}
    \end{align*}

    with probability at least $1-O(n^{-10})$.

  • (ii)

    Control $\left\|\boldsymbol{N}_{2}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right\|_{\infty}$: First, by Lemma 17, with probability at least $1-O(n^{-15})$, we have $|\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}|\lesssim\sqrt{\log n}$. Second, by Lemma 15, we have

    \[
    \left\|\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right\|_{\infty}\lesssim\sqrt{\log n}\,\theta_{\text{max}}+\log n\sqrt{\mu^{*}/n} \tag{57}
    \]

    with probability at least $1-O(n^{-14})$. Using the definition of $\rho_{1}$ from (54),

    \begin{align*}
    \left\|\boldsymbol{N}_{2}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right\|_{\infty}&=\left|\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right|\left\|\boldsymbol{N}_{2}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right\|_{\infty}\lesssim\sqrt{\log n}\left\|\boldsymbol{N}_{2}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right\|_{\infty}\\
    &\lesssim\sqrt{\log n}\Big(\frac{1}{\lambda_{1}^{*2}}\left\|\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right\|_{\infty}+\sqrt{\frac{(K-1)\mu^{*}}{n\lambda_{1}^{*4}}}\left\|\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right\|_{2}\Big)\lesssim\frac{\sqrt{\log n}}{\lambda_{1}^{*}}\rho_{1}
    \end{align*}

    with probability at least $1-O(n^{-10})$, where the second inequality uses Corollary 4.

  • (iii)

    Control $\left\|\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{N}_{2}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right\|_{\infty}$: Using Lemma 12, under the event $\mathcal{A}_{1}$,

    \begin{align*}
    \left\|\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{N}_{2}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right\|_{\infty}&=\left\|\boldsymbol{u}_{1}^{*}\right\|_{\infty}\left|\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{N}_{2}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right|\\
    &\leq\left\|\boldsymbol{u}_{1}^{*}\right\|_{\infty}\left\|\boldsymbol{u}_{1}^{*}\right\|_{2}^{2}\left\|\boldsymbol{W}\right\|^{2}\left\|\boldsymbol{N}_{2}\right\|\lesssim\frac{\sqrt{\mu^{*}n}\,\theta_{\text{max}}^{2}}{\lambda_{1}^{*2}}
    \end{align*}

    with probability at least $1-O(n^{-10})$, where the final inequality uses (12).

Combining these three parts with (51), we get

\[\left\|\left[\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\boldsymbol{\Delta}_{1}d\lambda\right]\boldsymbol{u}_{1}^{*}\right\|_{\infty}\lesssim\frac{\sqrt{n(K-1)\mu^{*}}\theta_{\text{max}}^{2}}{\lambda_{1}^{*2}}+\frac{\eta_{1}}{\lambda_{1}}=:\eta_{2} \tag{58}\]

with probability at least $1-O(n^{-10})$. Combining (58) and (49), we get

\[\left\|(\widehat{\boldsymbol{P}}_{1}-\boldsymbol{P}_{1}-\Delta\boldsymbol{P}_{1})\boldsymbol{u}_{1}^{*}\right\|_{\infty}\lesssim\frac{\theta_{\text{max}}^{3}n^{1.5}}{\lambda_{1}^{*3}}+\eta_{2}. \tag{59}\]

It remains to bound $\|(1-\widehat{\boldsymbol{u}}_{1}^{\top}\boldsymbol{u}_{1}^{*})\widehat{\boldsymbol{u}}_{1}\|_{\infty}$. To this end, using (46) and (47),

\[\begin{aligned}
\|(1-\widehat{\boldsymbol{u}}_{1}^{\top}\boldsymbol{u}_{1}^{*})\widehat{\boldsymbol{u}}_{1}\|_{\infty}
&=|1-\widehat{\boldsymbol{u}}_{1}^{\top}\boldsymbol{u}_{1}^{*}|\,\|\widehat{\boldsymbol{u}}_{1}\|_{\infty}\lesssim\frac{n\theta_{\text{max}}^{2}}{\lambda_{1}^{*2}}\left(\left\|\boldsymbol{u}_{1}^{*}\right\|_{\infty}+\left\|\widehat{\boldsymbol{u}}_{1}-\boldsymbol{u}_{1}^{*}\right\|_{\infty}\right)\\
&\leq\frac{n\theta_{\text{max}}^{2}}{\lambda_{1}^{*2}}\left(\sqrt{\frac{\mu^{*}}{n}}+\left\|\widehat{\boldsymbol{u}}_{1}-\boldsymbol{u}_{1}^{*}\right\|_{2}\right)\lesssim\frac{\sqrt{\mu^{*}n}\theta_{\text{max}}^{2}}{\lambda_{1}^{*2}}+\frac{\theta_{\text{max}}^{3}n^{1.5}}{\lambda_{1}^{*3}},
\end{aligned} \tag{60}\]

where the second inequality is by (12). Plugging (59) and (60) into (48), we obtain

\[\left\|\boldsymbol{\delta}\right\|_{\infty}\leq\left\|(\widehat{\boldsymbol{P}}_{1}-\boldsymbol{P}_{1}-\Delta\boldsymbol{P}_{1})\boldsymbol{u}_{1}^{*}\right\|_{\infty}+\|(1-\widehat{\boldsymbol{u}}_{1}^{\top}\boldsymbol{u}_{1}^{*})\widehat{\boldsymbol{u}}_{1}\|_{\infty}\lesssim\frac{\theta_{\text{max}}^{3}n^{1.5}}{\lambda_{1}^{*3}}+\eta_{2}\]

with probability at least $1-O(n^{-10})$, completing the proof. ∎

8.3 Proof of Lemma 5

Proof.

By Theorem 8 we know that $\widehat{\boldsymbol{u}}_{1}-\boldsymbol{u}_{1}^{*}=\boldsymbol{N}_{1}\boldsymbol{W}\boldsymbol{u}_{1}^{*}+\boldsymbol{\delta}$. By Lemma 4, Corollary 4 and (57) we know that, with probability at least $1-O(n^{-10})$,

\[\begin{aligned}
\left\|\boldsymbol{N}_{1}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right\|_{\infty}
&\leq\frac{1}{\lambda_{1}^{*}}\left\|\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right\|_{\infty}+\sqrt{\frac{(K-1)\mu^{*}}{n\lambda_{1}^{*2}}}\left\|\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right\|_{2}\\
&\leq\frac{1}{\lambda_{1}^{*}}\left\|\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right\|_{2,\infty}+\sqrt{\frac{(K-1)\mu^{*}}{n\lambda_{1}^{*2}}}\left\|\boldsymbol{W}\right\|\lesssim\rho_{1},
\end{aligned}\]

where $\rho_{1}$ is defined by (54). Combining this with Theorem 9, we get, with probability at least $1-O(n^{-10})$,

\[\left\|\widehat{\boldsymbol{u}}_{1}-\boldsymbol{u}_{1}^{*}\right\|_{\infty}\leq\left\|\boldsymbol{N}_{1}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right\|_{\infty}+\left\|\boldsymbol{\delta}\right\|_{\infty}\lesssim\rho_{1}+\frac{\theta_{\text{max}}^{3}n^{1.5}}{\lambda_{1}^{*3}}+\frac{\rho_{1}\log n}{\lambda_{1}}\lesssim\rho_{1}+\frac{\theta_{\text{max}}^{3}n^{1.5}}{\lambda_{1}^{*3}}\]

by our assumption $\lambda_{1}^{*}\gg\log n$. This completes the proof. ∎

8.4 Proof of Theorem 10

Proof.

We use the same $r$ and $\mathcal{C}_{1}$ as in the proof of Theorem 8. Since

\[\boldsymbol{H}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}=\sum_{i=1}^{n}\frac{\lambda_{i}^{*}}{\lambda-\lambda_{i}^{*}}\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top},\quad\boldsymbol{X}\left(\lambda\boldsymbol{I}-\boldsymbol{X}\right)^{-1}=\sum_{i=1}^{n}\frac{\widehat{\lambda}_{i}}{\lambda-\widehat{\lambda}_{i}}\widehat{\boldsymbol{u}}_{i}\widehat{\boldsymbol{u}}_{i}^{\top},\]

for the same reason as in the proof of Theorem 8, under event $\mathcal{A}_{1}$, we have

\[\begin{aligned}
\widehat{\lambda}_{1}\widehat{\boldsymbol{u}}_{1}\widehat{\boldsymbol{u}}_{1}^{\top}-\lambda_{1}^{*}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}
&=\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\boldsymbol{X}\left(\lambda\boldsymbol{I}-\boldsymbol{X}\right)^{-1}d\lambda-\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\boldsymbol{H}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}d\lambda\\
&=\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\left[\boldsymbol{X}\left(\lambda\boldsymbol{I}-\boldsymbol{X}\right)^{-1}-\boldsymbol{H}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\right]d\lambda.
\end{aligned}\]

We expand the integrand as a sum of two quantities in the following way:

\[\begin{aligned}
&\boldsymbol{X}\left(\lambda\boldsymbol{I}-\boldsymbol{X}\right)^{-1}-\boldsymbol{H}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\\
={}&\left(\boldsymbol{X}-\boldsymbol{H}\right)\left(\lambda\boldsymbol{I}-\boldsymbol{X}\right)^{-1}+\boldsymbol{H}\left[\left(\lambda\boldsymbol{I}-\boldsymbol{X}\right)^{-1}-\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\right]\\
={}&\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{X}\right)^{-1}+\boldsymbol{H}\left(\lambda\boldsymbol{I}-\boldsymbol{X}\right)^{-1}\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\\
={}&\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}+\boldsymbol{H}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\\
&+\boldsymbol{W}\left[\left(\lambda\boldsymbol{I}-\boldsymbol{X}\right)^{-1}-\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\right]+\boldsymbol{H}\left[\left(\lambda\boldsymbol{I}-\boldsymbol{X}\right)^{-1}-\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\right]\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\\
={}&\underbrace{\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}+\boldsymbol{H}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}}_{\boldsymbol{\Delta}_{1}}\\
&+\underbrace{\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{X}\right)^{-1}\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}+\boldsymbol{H}\left(\lambda\boldsymbol{I}-\boldsymbol{X}\right)^{-1}\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}}_{\boldsymbol{\Delta}_{2}}.
\end{aligned} \tag{61}\]
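The decomposition in (61) is a purely algebraic identity, so it can be sanity-checked numerically. The sketch below does this for an illustrative $2\times 2$ example. The matrices `H`, `W` and the point `lam` are arbitrary values chosen for the check rather than quantities from the paper, and the helper functions are hypothetical names introduced here.

```python
# Numerical sanity check of the resolvent expansion (61) on a 2x2 example.
# H, W and lam are illustrative values, not taken from the paper.

def mat_mul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_add(a, b):
    """Entrywise sum of two 2x2 matrices."""
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def mat_sub(a, b):
    """Entrywise difference of two 2x2 matrices."""
    return [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]

def resolvent(m, lam):
    """Return (lam * I - m)^{-1} for a 2x2 matrix m via the adjugate formula."""
    a, b = lam - m[0][0], -m[0][1]
    c, d = -m[1][0], lam - m[1][1]
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def chain(*mats):
    """Multiply several 2x2 matrices from left to right."""
    out = mats[0]
    for m in mats[1:]:
        out = mat_mul(out, m)
    return out

H = [[3.0, 1.0], [1.0, 2.0]]          # plays the role of the signal matrix
W = [[0.10, -0.20], [-0.20, 0.05]]    # symmetric perturbation
X = mat_add(H, W)                     # observed matrix X = H + W
lam = 5.0 + 1.0j                      # a point away from both spectra

RX, RH = resolvent(X, lam), resolvent(H, lam)

lhs = mat_sub(mat_mul(X, RX), mat_mul(H, RH))
delta1 = mat_add(mat_mul(W, RH), chain(H, RH, W, RH))
delta2 = mat_add(chain(W, RX, W, RH), chain(H, RX, W, RH, W, RH))
rhs = mat_add(delta1, delta2)

max_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(max_gap)  # should be at floating-point rounding level
```

The first-order part `delta1` is linear in `W`, while `delta2` is quadratic, which is what makes the later split into a leading term and a remainder possible.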

For $\boldsymbol{\Delta}_{1}$, the contour integral can be calculated as

\[\begin{aligned}
\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\boldsymbol{\Delta}_{1}d\lambda
={}&\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\left(\sum_{i=1}^{n}\frac{1}{\lambda-\lambda_{i}^{*}}\boldsymbol{W}\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}+\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{\lambda_{i}^{*}}{(\lambda-\lambda_{i}^{*})(\lambda-\lambda_{j}^{*})}\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}\boldsymbol{W}\boldsymbol{u}_{j}^{*}\boldsymbol{u}_{j}^{*\top}\right)d\lambda\\
={}&\sum_{i=1}^{n}\textbf{Res}\left(\frac{1}{\lambda-\lambda_{i}^{*}},\lambda_{1}^{*}\right)\boldsymbol{W}\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}\\
&+\sum_{i,j=1}^{n}\textbf{Res}\left(\frac{\lambda_{i}^{*}}{(\lambda-\lambda_{i}^{*})(\lambda-\lambda_{j}^{*})},\lambda_{1}^{*}\right)\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}\boldsymbol{W}\boldsymbol{u}_{j}^{*}\boldsymbol{u}_{j}^{*\top}\\
={}&\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}+\sum_{i=2}^{n}\frac{\lambda_{i}^{*}}{\lambda_{1}^{*}-\lambda_{i}^{*}}\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}+\sum_{i=2}^{n}\frac{\lambda_{1}^{*}}{\lambda_{1}^{*}-\lambda_{i}^{*}}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}\\
={}&\left(\boldsymbol{I}+\sum_{i=2}^{n}\frac{\lambda_{i}^{*}}{\lambda_{1}^{*}-\lambda_{i}^{*}}\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}\right)\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}+\left(\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right)^{\top}\sum_{i=2}^{n}\frac{\lambda_{1}^{*}}{\lambda_{1}^{*}-\lambda_{i}^{*}}\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}\\
={}&\left(\sum_{i=1}^{n}\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}+\sum_{i=2}^{n}\frac{\lambda_{i}^{*}}{\lambda_{1}^{*}-\lambda_{i}^{*}}\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}\right)\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}+\left(\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right)^{\top}\boldsymbol{N}\\
={}&\left(\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}+\sum_{i=2}^{n}\frac{\lambda_{1}^{*}}{\lambda_{1}^{*}-\lambda_{i}^{*}}\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}\right)\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}+\left(\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right)^{\top}\boldsymbol{N}\\
={}&\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}+\boldsymbol{N}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}+\left(\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right)^{\top}\boldsymbol{N}.
\end{aligned}\]
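The scalar residues used in this calculation can be verified numerically. The following sketch approximates $\frac{1}{2\pi i}\oint$ over a circle around $\lambda_{1}^{*}$ by an equispaced average and compares it with the closed-form residues. The three eigenvalues, the radius, and the function names are illustrative assumptions, not values from the paper.

```python
import cmath
import math

def contour_average(f, center, radius, num_points=4000):
    """Approximate (1 / (2*pi*i)) * contour integral of f over a circle."""
    total = 0j
    for k in range(num_points):
        offset = radius * cmath.exp(2j * math.pi * k / num_points)
        # d(lambda) = i * offset * d(theta); the factor 1/(2*pi*i) leaves
        # offset * d(theta) / (2*pi), i.e. a plain average over the nodes.
        total += f(center + offset) * offset
    return total / num_points

# Illustrative spectrum: lam[0] plays the role of lambda_1^*.
lam = [10.0, 4.0, -3.0]
center, radius = lam[0], 2.0   # the circle encloses lam[0] only

def simple_pole(i):
    """Residue-type integral of 1 / (z - lam_i)."""
    return contour_average(lambda z: 1.0 / (z - lam[i]), center, radius)

def double_term(i, j):
    """Residue-type integral of lam_i / ((z - lam_i)(z - lam_j))."""
    return contour_average(
        lambda z: lam[i] / ((z - lam[i]) * (z - lam[j])), center, radius)

res_first = [simple_pole(i) for i in range(3)]   # equals 1 only for i = 0
res_i_other_j_one = double_term(1, 0)            # lam_i / (lam_1 - lam_i)
res_i_one_j_other = double_term(0, 2)            # lam_1 / (lam_1 - lam_j)
res_both_one = double_term(0, 0)                 # double pole, residue 0
res_both_other = double_term(1, 2)               # no pole inside, integral 0
```

Only the terms with exactly one index equal to 1 survive in the double sum, which is why the result collapses to the two single sums over $i\geq 2$.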

Next, we bound the spectral norm of $\boldsymbol{\Delta}_{2}$ under the event $\mathcal{A}_{1}$ as:

\[\begin{aligned}
\left\|\boldsymbol{\Delta}_{2}\right\|
&\leq\left\|\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{X}\right)^{-1}\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\right\|+\left\|\boldsymbol{H}\left(\lambda\boldsymbol{I}-\boldsymbol{X}\right)^{-1}\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\right\|\\
&\leq\left\|\boldsymbol{W}\right\|^{2}\left\|\left(\lambda\boldsymbol{I}-\boldsymbol{X}\right)^{-1}\right\|\left\|\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\right\|+\left\|\boldsymbol{H}\right\|\left\|\left(\lambda\boldsymbol{I}-\boldsymbol{X}\right)^{-1}\right\|\left\|\boldsymbol{W}\right\|^{2}\left\|\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\right\|^{2}\\
&\lesssim\frac{n\theta_{\text{max}}^{2}}{\lambda_{1}^{*2}}+\frac{\lambda_{1}^{*}n\theta_{\text{max}}^{2}}{\lambda_{1}^{*3}}\lesssim\frac{n\theta_{\text{max}}^{2}}{\lambda_{1}^{*2}},
\end{aligned}\]

where the third inequality uses (8.1). As a result, the contour integral of $\boldsymbol{\Delta}_{2}$ can be bounded as

\[\left\|\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\boldsymbol{\Delta}_{2}d\lambda\right\|\lesssim\frac{1}{2\pi}\oint_{\mathcal{C}_{1}}\frac{n\theta_{\text{max}}^{2}}{\lambda_{1}^{*2}}d\lambda=\frac{n\theta_{\text{max}}^{2}r}{\lambda_{1}^{*2}}\lesssim\frac{n\theta_{\text{max}}^{2}}{\lambda_{1}^{*}}.\]
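For completeness, this is the standard length-times-supremum estimate for contour integrals. The sketch below assumes that $\mathcal{C}_{1}$ is a circle of radius $r$ and, as the last step of the display suggests, that $r\lesssim\lambda_{1}^{*}$:

```latex
\[
\left\|\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\boldsymbol{\Delta}_{2}\,d\lambda\right\|
\leq\frac{1}{2\pi}\cdot\underbrace{2\pi r}_{\text{length of }\mathcal{C}_{1}}\cdot\sup_{\lambda\in\mathcal{C}_{1}}\left\|\boldsymbol{\Delta}_{2}\right\|
\lesssim r\cdot\frac{n\theta_{\text{max}}^{2}}{\lambda_{1}^{*2}}
\lesssim\frac{n\theta_{\text{max}}^{2}}{\lambda_{1}^{*}}.
\]
```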

Noting that the contour integral of $\boldsymbol{\Delta}_{2}$ is exactly

\[\widehat{\lambda}_{1}\widehat{\boldsymbol{u}}_{1}\widehat{\boldsymbol{u}}_{1}^{\top}-\lambda_{1}^{*}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}-\left[\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}+\boldsymbol{N}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}+\left(\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right)^{\top}\boldsymbol{N}\right]\]

gives us the desired result. ∎

8.5 Proof of Corollary 2

Proof.

Since $\boldsymbol{u}_{2}^{*},\boldsymbol{u}_{3}^{*},\dots,\boldsymbol{u}_{K}^{*}$ are orthogonal to each other, we have

\[\left\|\boldsymbol{N}\right\|=\left\|\sum_{i=2}^{n}\frac{\lambda_{1}^{*}}{\lambda_{1}^{*}-\lambda_{i}^{*}}\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}\right\|\lesssim\max_{2\leq i\leq n}\frac{\lambda_{1}^{*}}{\lambda_{1}^{*}-\lambda_{i}^{*}}\lesssim 1. \tag{62}\]

Since $\|\boldsymbol{W}\|\lesssim\sqrt{n}\theta_{\text{max}}$ on the set $\mathcal{A}_{1}$ we have, by (10),

\[\begin{aligned}
\left\|\widehat{\lambda}_{1}\widehat{\boldsymbol{u}}_{1}\widehat{\boldsymbol{u}}_{1}^{\top}-\lambda_{1}^{*}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right\|
&\leq\left\|\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right\|+2\left\|\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right\|\left\|\boldsymbol{N}\right\|+\left\|\boldsymbol{\Delta}\right\|\\
&\leq\left\|\boldsymbol{W}\right\|+2\left\|\boldsymbol{W}\right\|\left\|\boldsymbol{N}\right\|+\left\|\boldsymbol{\Delta}\right\|\lesssim\sqrt{n}\theta_{\text{max}}+\frac{n\theta_{\text{max}}^{2}}{\lambda_{1}^{*}}\lesssim\sqrt{n}\theta_{\text{max}},
\end{aligned}\]

where the third inequality used (62). ∎

8.6 Proof of Lemma 7

Proof.

Recall the definitions of $\boldsymbol{L}$ and $\boldsymbol{R}$ from (38) and (10) respectively. We write the SVD of $\boldsymbol{L}$ as $\boldsymbol{L}=\boldsymbol{Y}_{1}(\cos\boldsymbol{\Omega})\boldsymbol{Y}_{2}^{\top}$, where $\boldsymbol{Y}_{1},\boldsymbol{Y}_{2}\in\mathbb{R}^{(K-1)\times(K-1)}$. Then we have $\boldsymbol{R}=\boldsymbol{Y}_{1}\boldsymbol{Y}_{2}^{\top}$. Therefore, we have

\[\begin{aligned}
\left\|\overline{\boldsymbol{U}}\boldsymbol{R}-\overline{\boldsymbol{U}}^{*}\right\|^{2}
&=\left\|\overline{\boldsymbol{U}}\boldsymbol{Y}_{1}\boldsymbol{Y}_{2}^{\top}-\overline{\boldsymbol{U}}^{*}\right\|^{2}=\left\|\left(\overline{\boldsymbol{U}}\boldsymbol{Y}_{1}\boldsymbol{Y}_{2}^{\top}-\overline{\boldsymbol{U}}^{*}\right)^{\top}\left(\overline{\boldsymbol{U}}\boldsymbol{Y}_{1}\boldsymbol{Y}_{2}^{\top}-\overline{\boldsymbol{U}}^{*}\right)\right\|\\
&=\left\|2\boldsymbol{I}-\boldsymbol{Y}_{2}\boldsymbol{Y}_{1}^{\top}\overline{\boldsymbol{U}}^{\top}\overline{\boldsymbol{U}}^{*}-\overline{\boldsymbol{U}}^{*\top}\overline{\boldsymbol{U}}\boldsymbol{Y}_{1}\boldsymbol{Y}_{2}^{\top}\right\|\\
&=\left\|2\boldsymbol{I}-\boldsymbol{Y}_{2}\boldsymbol{Y}_{1}^{\top}\boldsymbol{Y}_{1}(\cos\boldsymbol{\Omega})\boldsymbol{Y}_{2}^{\top}-\boldsymbol{Y}_{2}(\cos\boldsymbol{\Omega})\boldsymbol{Y}_{1}^{\top}\boldsymbol{Y}_{1}\boldsymbol{Y}_{2}^{\top}\right\|\\
&=2\left\|\boldsymbol{I}-\boldsymbol{Y}_{2}(\cos\boldsymbol{\Omega})\boldsymbol{Y}_{2}^{\top}\right\|=2\left\|\boldsymbol{I}-\cos\boldsymbol{\Omega}\right\|\leq 2\left\|\boldsymbol{I}-\cos^{2}\boldsymbol{\Omega}\right\|=2\left\|\sin\boldsymbol{\Omega}\right\|^{2}.
\end{aligned}\]
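In the rank-one case $K-1=1$ the chain above reduces to a scalar statement that is easy to check numerically. The sketch below is only an illustration under that simplifying assumption: it takes unit vectors $\overline{\boldsymbol{u}}^{*}=(1,0)$ and $\overline{\boldsymbol{u}}=(\cos\omega,\sin\omega)$, so that $L=\cos\omega$ and $R=\textbf{sgn}(L)$. The function name is hypothetical.

```python
import math

def rank_one_gap(omega):
    """Return (||u R - u*||^2, 2 sin^2(omega)) for a pair of unit vectors in R^2."""
    u_star = (1.0, 0.0)
    u = (math.cos(omega), math.sin(omega))
    inner = u[0] * u_star[0] + u[1] * u_star[1]   # L = u^T u* = cos(omega)
    rot = 1.0 if inner >= 0 else -1.0             # R = sgn(L), a 1x1 "rotation"
    lhs = sum((rot * a - b) ** 2 for a, b in zip(u, u_star))
    rhs = 2.0 * math.sin(omega) ** 2
    return lhs, rhs

for omega in (0.0, 0.1, 0.5, 1.0, 1.5):
    lhs, rhs = rank_one_gap(omega)
    print(f"omega={omega:.1f}  lhs={lhs:.6f}  2*sin^2={rhs:.6f}")
```

Here the left-hand side equals $2(1-|\cos\omega|)$, and the inequality is exactly $1-c\leq 1-c^{2}$ for $c=|\cos\omega|\in[0,1]$.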

By Wedin’s $\sin\Theta$ Theorem [CCF+21, Theorem 2.9], we have

\[\left\|\sin\boldsymbol{\Omega}\right\|\leq\textbf{dist}(\overline{\boldsymbol{U}},\overline{\boldsymbol{U}}^{*})\leq\frac{\sqrt{2}\left\|\overline{\boldsymbol{X}}-\overline{\boldsymbol{H}}\right\|}{\sigma_{K-1}(\overline{\boldsymbol{X}})-\sigma_{K}(\overline{\boldsymbol{H}})}=\frac{\sqrt{2}\left\|\overline{\boldsymbol{W}}\right\|}{\sigma_{K}(\boldsymbol{X})-\sigma_{K+1}(\boldsymbol{H})}.\]

Recall that $\|\overline{\boldsymbol{W}}\|\lesssim\sqrt{n}\theta_{\text{max}}$ under event $\mathcal{A}_{1}$, by (37). On the other hand, we have

\[\begin{aligned}
\sigma_{K}(\boldsymbol{X})-\sigma_{K+1}(\boldsymbol{H})
&\geq\sigma_{K}(\boldsymbol{X})-\left\|\textbf{diag}(\boldsymbol{\Theta}\boldsymbol{\Pi}\boldsymbol{P}\boldsymbol{\Pi}^{\top}\boldsymbol{\Theta})\right\|\\
&\geq\sigma_{K}(\boldsymbol{H})-\left\|\boldsymbol{W}\right\|-\left\|\textbf{diag}(\boldsymbol{\Theta}\boldsymbol{\Pi}\boldsymbol{P}\boldsymbol{\Pi}^{\top}\boldsymbol{\Theta})\right\|\asymp\sigma_{\textbf{min}}^{*}.
\end{aligned}\]

Therefore, we have

\[\left\|\overline{\boldsymbol{U}}\boldsymbol{R}-\overline{\boldsymbol{U}}^{*}\right\|\lesssim\left\|\sin\boldsymbol{\Omega}\right\|\lesssim\frac{\sqrt{n}\theta_{\text{max}}}{\sigma_{\textbf{min}}^{*}}. \tag{63}\]

Again, by (63),

\[\left\|\boldsymbol{L}-\boldsymbol{R}\right\|=\left\|\boldsymbol{Y}_{1}(\cos\boldsymbol{\Omega})\boldsymbol{Y}_{2}^{\top}-\boldsymbol{Y}_{1}\boldsymbol{Y}_{2}^{\top}\right\|=\left\|\boldsymbol{I}-\cos\boldsymbol{\Omega}\right\|\leq 2\left\|\sin\boldsymbol{\Omega}\right\|^{2}\lesssim\frac{n\theta_{\text{max}}^{2}}{\sigma_{\textbf{min}}^{*2}}.\]

Since we have assumed $n\theta_{\text{max}}^{2}\ll\sigma_{\textbf{min}}^{*2}$, we have $\|\boldsymbol{L}-\boldsymbol{R}\|=o(1)$. Since $\boldsymbol{R}$ is orthogonal, $\sigma_{i}(\boldsymbol{R})=1$, and hence, by Weyl's inequality, for every $i$,

\[\begin{aligned}
\sigma_{i}(\boldsymbol{L})&\leq\sigma_{i}(\boldsymbol{R})+\left\|\boldsymbol{L}-\boldsymbol{R}\right\|\leq 1+o(1)\leq 2,\\
\sigma_{i}(\boldsymbol{L})&\geq\sigma_{i}(\boldsymbol{R})-\left\|\boldsymbol{L}-\boldsymbol{R}\right\|\geq 1-o(1)\geq\frac{1}{2}.
\end{aligned}\]

This proves the first part of this lemma.

Moving on to the proof of the second part, by (7), we have $\boldsymbol{v}_{i}^{*}=\textbf{sgn}\left(\lambda_{i}^{*}\right)\boldsymbol{u}_{i}^{*}$, yielding $\overline{\boldsymbol{V}}^{*}=\overline{\boldsymbol{U}}^{*}\boldsymbol{D}$. It remains to show that $\widehat{\boldsymbol{v}}_{i}=\textbf{sgn}(\lambda_{i}^{*})\widehat{\boldsymbol{u}}_{i}$ under event $\mathcal{A}_{1}$. To this end, set $s:=|\{i\in[K]:\lambda_{i}^{*}>0\}|$. By Lemma 1 we know that

\[\begin{aligned}
&\lambda_{1}^{*}>\lambda_{2}^{*}\geq\cdots\geq\lambda_{s}^{*}\asymp\beta_{n}K^{-1}\left\|\boldsymbol{\theta}\right\|_{2}^{2}>0>-\beta_{n}K^{-1}\left\|\boldsymbol{\theta}\right\|_{2}^{2}\asymp\lambda_{s+1}^{*}\geq\cdots\geq\lambda_{K}^{*},\\
&|\lambda_{i}^{*}|\leq\left\|\textbf{diag}(\boldsymbol{\Theta}\boldsymbol{\Pi}\boldsymbol{P}\boldsymbol{\Pi}^{\top}\boldsymbol{\Theta})\right\|\lesssim\theta_{\text{max}}^{2}\ll\beta_{n}K^{-1}\left\|\boldsymbol{\theta}\right\|_{2}^{2},\quad\forall i>K.
\end{aligned}\]

Recall that $\widehat{\lambda}_{1},\widehat{\lambda}_{2},\dots,\widehat{\lambda}_{K}$ are the $K$ largest eigenvalues in magnitude among all $n$ eigenvalues of $\boldsymbol{X}$, sorted in descending order. By Weyl’s theorem, under event $\mathcal{A}_{1}$,

\[\begin{aligned}
&\widehat{\lambda}_{1}>\widehat{\lambda}_{2}\geq\cdots\geq\widehat{\lambda}_{s}\asymp\beta_{n}K^{-1}\left\|\boldsymbol{\theta}\right\|_{2}^{2}>0>-\beta_{n}K^{-1}\left\|\boldsymbol{\theta}\right\|_{2}^{2}\asymp\widehat{\lambda}_{s+1}\geq\cdots\geq\widehat{\lambda}_{K},\\
&|\widehat{\lambda}_{i}|\lesssim\sqrt{n}\theta_{\text{max}}+\left\|\textbf{diag}(\boldsymbol{\Theta}\boldsymbol{\Pi}\boldsymbol{P}\boldsymbol{\Pi}^{\top}\boldsymbol{\Theta})\right\|\lesssim\sqrt{n}\theta_{\text{max}}\ll\beta_{n}K^{-1}\left\|\boldsymbol{\theta}\right\|_{2}^{2},\quad\forall i>K.
\end{aligned}\]

As a result, we get $\textbf{sgn}(\lambda_{i}^{*})=\textbf{sgn}(\widehat{\lambda}_{i})$. This leads to, using (7),

\[\widehat{\boldsymbol{v}}_{i}=\textbf{sgn}(\widehat{\lambda}_{i})\widehat{\boldsymbol{u}}_{i}=\textbf{sgn}(\lambda_{i}^{*})\widehat{\boldsymbol{u}}_{i},\]

implying $\overline{\boldsymbol{V}}=\overline{\boldsymbol{U}}\boldsymbol{D}$. This completes the proof of the lemma. ∎

8.7 Proof of Lemma 8

Proof.

Recall the definitions of $\boldsymbol{R}$ and $\boldsymbol{L}$ from (10) and (38) respectively. By the triangle inequality,

\[\left\|\boldsymbol{R}^{\top}\overline{\boldsymbol{\Lambda}}\boldsymbol{R}-\overline{\boldsymbol{\Lambda}}^{*}\right\|\leq\underbrace{\left\|\boldsymbol{R}^{\top}\overline{\boldsymbol{\Lambda}}\boldsymbol{R}-\boldsymbol{L}^{\top}\overline{\boldsymbol{\Lambda}}\boldsymbol{L}\right\|}_{\alpha_{1}}+\underbrace{\left\|\boldsymbol{L}^{\top}\overline{\boldsymbol{\Lambda}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*\top}\overline{\boldsymbol{X}}\overline{\boldsymbol{U}}^{*}\right\|}_{\alpha_{2}}+\underbrace{\left\|\overline{\boldsymbol{U}}^{*\top}\overline{\boldsymbol{X}}\overline{\boldsymbol{U}}^{*}-\overline{\boldsymbol{\Lambda}}^{*}\right\|}_{\alpha_{3}}.\]

We bound the three quantities separately. For the first term $\alpha_{1}$, we have, on the event $\mathcal{A}_{1}$, using Lemma 7,

\[\begin{aligned}
\alpha_{1}
&\leq\left\|(\boldsymbol{R}-\boldsymbol{L})^{\top}\overline{\boldsymbol{\Lambda}}\boldsymbol{R}\right\|+\left\|\boldsymbol{L}^{\top}\overline{\boldsymbol{\Lambda}}(\boldsymbol{R}-\boldsymbol{L})\right\|\\
&\leq\left\|\boldsymbol{R}-\boldsymbol{L}\right\|\left\|\overline{\boldsymbol{\Lambda}}\right\|\left\|\boldsymbol{R}\right\|+\left\|\boldsymbol{L}\right\|\left\|\overline{\boldsymbol{\Lambda}}\right\|\left\|\boldsymbol{R}-\boldsymbol{L}\right\|\\
&\lesssim\frac{n\theta_{\text{max}}^{2}}{\sigma_{\textbf{min}}^{*2}}\left\|\overline{\boldsymbol{\Lambda}}\right\|\leq\frac{n\theta_{\text{max}}^{2}}{\sigma_{\textbf{min}}^{*2}}\left(\left\|\overline{\boldsymbol{\Lambda}}^{*}\right\|+\left\|\overline{\boldsymbol{W}}\right\|\right)\lesssim\frac{n\theta_{\text{max}}^{2}}{\sigma_{\textbf{min}}^{*2}}\left(\sigma_{\textbf{max}}^{*}+\sqrt{n}\theta_{\text{max}}\right)\lesssim\frac{n\theta_{\text{max}}^{2}\kappa^{*}}{\sigma_{\textbf{min}}^{*}},
\end{aligned}\]

where the third inequality uses the fact that $\boldsymbol{R}$ is orthogonal and $\|\boldsymbol{L}\|\leq 2$ by Lemma 7, and the last inequality uses our assumption that $\sqrt{n}\theta_{\text{max}}\ll\sigma_{\textbf{min}}^{*}$.
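The last step of the chain can be spelled out as follows, assuming (as is standard, though not restated in this section) that $\kappa^{*}=\sigma_{\textbf{max}}^{*}/\sigma_{\textbf{min}}^{*}$ and that $n$ is large enough for $\sqrt{n}\theta_{\text{max}}\leq\sigma_{\textbf{min}}^{*}\leq\sigma_{\textbf{max}}^{*}$ to hold:

```latex
\[
\frac{n\theta_{\text{max}}^{2}}{\sigma_{\textbf{min}}^{*2}}\left(\sigma_{\textbf{max}}^{*}+\sqrt{n}\theta_{\text{max}}\right)
\leq\frac{n\theta_{\text{max}}^{2}}{\sigma_{\textbf{min}}^{*2}}\cdot 2\sigma_{\textbf{max}}^{*}
=\frac{2n\theta_{\text{max}}^{2}}{\sigma_{\textbf{min}}^{*}}\cdot\frac{\sigma_{\textbf{max}}^{*}}{\sigma_{\textbf{min}}^{*}}
=\frac{2n\theta_{\text{max}}^{2}\kappa^{*}}{\sigma_{\textbf{min}}^{*}}.
\]
```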

For the second quantity $\alpha_{2}$, one can see that

\[\boldsymbol{L}^{\top}\overline{\boldsymbol{\Lambda}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*\top}\overline{\boldsymbol{X}}\overline{\boldsymbol{U}}^{*}=\overline{\boldsymbol{U}}^{*\top}\overline{\boldsymbol{U}}\overline{\boldsymbol{\Lambda}}\overline{\boldsymbol{U}}^{\top}\overline{\boldsymbol{U}}^{*}-\overline{\boldsymbol{U}}^{*\top}\overline{\boldsymbol{X}}\overline{\boldsymbol{U}}^{*}=-\overline{\boldsymbol{U}}^{*\top}\overline{\boldsymbol{U}}_{\perp}\overline{\boldsymbol{\Sigma}}_{\perp}\overline{\boldsymbol{V}}_{\perp}^{\top}\overline{\boldsymbol{U}}^{*}, \tag{64}\]

where $\overline{\boldsymbol{U}}_{\perp},\overline{\boldsymbol{\Sigma}}_{\perp}$ and $\overline{\boldsymbol{V}}_{\perp}$ come from the SVD of $\overline{\boldsymbol{X}}$:

\[\overline{\boldsymbol{X}}\overset{\text{SVD}}{=}\left[\begin{array}{ll}\overline{\boldsymbol{U}}&\overline{\boldsymbol{U}}_{\perp}\end{array}\right]\left[\begin{array}{cc}\overline{\boldsymbol{\Lambda}}\boldsymbol{D}&\boldsymbol{0}\\ \boldsymbol{0}&\overline{\boldsymbol{\Sigma}}_{\perp}\end{array}\right]\left[\begin{array}{c}(\overline{\boldsymbol{U}}\boldsymbol{D})^{\top}\\ \overline{\boldsymbol{V}}_{\perp}^{\top}\end{array}\right]=\overline{\boldsymbol{U}}\overline{\boldsymbol{\Lambda}}\overline{\boldsymbol{U}}^{\top}+\overline{\boldsymbol{U}}_{\perp}\overline{\boldsymbol{\Sigma}}_{\perp}\overline{\boldsymbol{V}}_{\perp}^{\top}.\]

By Weyl’s theorem we have

\[\left\|\overline{\boldsymbol{\Sigma}}_{\perp}\right\|\leq\sigma_{K}(\overline{\boldsymbol{H}})+\left\|\boldsymbol{W}\right\|\leq\left\|\boldsymbol{W}\right\|+\left\|\textbf{diag}(\boldsymbol{H})\right\|\lesssim\sqrt{n}\theta_{\text{max}} \tag{65}\]

under event $\mathcal{A}_{1}$. On the other hand, by [CCF+21, Lemma 2.5] and Lemma 7 we have

\[\left\|\overline{\boldsymbol{U}}^{*\top}\overline{\boldsymbol{U}}_{\perp}\right\|=\left\|\overline{\boldsymbol{U}}_{\perp}^{\top}\overline{\boldsymbol{U}}^{*}\right\|=\left\|\sin\boldsymbol{\Omega}\right\|\lesssim\frac{\sqrt{n}\theta_{\text{max}}}{\sigma_{\textbf{min}}^{*}}, \tag{66}\]

\[\left\|\overline{\boldsymbol{V}}^{\top}_{\perp}\overline{\boldsymbol{U}}^{*}\right\|=\left\|\overline{\boldsymbol{V}}^{\top}_{\perp}\overline{\boldsymbol{U}}^{*}\boldsymbol{D}\right\|=\left\|\overline{\boldsymbol{V}}^{\top}_{\perp}\overline{\boldsymbol{V}}^{*}\right\|=\left\|\sin\boldsymbol{\Omega}\right\|\lesssim\frac{\sqrt{n}\theta_{\text{max}}}{\sigma_{\textbf{min}}^{*}}. \tag{67}\]

Combining (64), (65), (66) and (67), we get

\[\alpha_{2}\leq\left\|\overline{\boldsymbol{U}}^{*\top}\overline{\boldsymbol{U}}_{\perp}\right\|\left\|\overline{\boldsymbol{\Sigma}}_{\perp}\right\|\left\|\overline{\boldsymbol{V}}^{\top}_{\perp}\overline{\boldsymbol{U}}^{*}\right\|\lesssim\frac{\theta_{\text{max}}^{3}n^{1.5}}{\sigma_{\textbf{min}}^{*2}}.\]
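Explicitly, the rate for $\alpha_{2}$ is simply the product of the three bounds (66), (65) and (67):

```latex
\[
\underbrace{\frac{\sqrt{n}\theta_{\text{max}}}{\sigma_{\textbf{min}}^{*}}}_{(66)}
\cdot\underbrace{\sqrt{n}\theta_{\text{max}}}_{(65)}
\cdot\underbrace{\frac{\sqrt{n}\theta_{\text{max}}}{\sigma_{\textbf{min}}^{*}}}_{(67)}
=\frac{\theta_{\text{max}}^{3}n^{1.5}}{\sigma_{\textbf{min}}^{*2}}.
\]
```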

Next, we analyze $\alpha_{3}$. By definition we have

\[\overline{\boldsymbol{U}}^{*\top}\overline{\boldsymbol{X}}\overline{\boldsymbol{U}}^{*}-\overline{\boldsymbol{\Lambda}}^{*}=\overline{\boldsymbol{U}}^{*\top}(\overline{\boldsymbol{H}}+\overline{\boldsymbol{W}})\overline{\boldsymbol{U}}^{*}-\overline{\boldsymbol{U}}^{*\top}\overline{\boldsymbol{H}}\overline{\boldsymbol{U}}^{*}=\overline{\boldsymbol{U}}^{*\top}\overline{\boldsymbol{W}}\overline{\boldsymbol{U}}^{*}.\]

Using the notation of Theorem 10, we can write

\[\begin{aligned}
\overline{\boldsymbol{W}}
&=\boldsymbol{W}-\left[\widehat{\lambda}_{1}\widehat{\boldsymbol{u}}_{1}\widehat{\boldsymbol{u}}_{1}^{\top}-\lambda_{1}^{*}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right]\\
&=\boldsymbol{W}-\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}-\boldsymbol{N}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}-\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{N}-\boldsymbol{\Delta}.
\end{aligned} \tag{68}\]

Since $\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\overline{\boldsymbol{U}}^{*}=\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}[\boldsymbol{u}_{2}^{*},\boldsymbol{u}_{3}^{*},\dots,\boldsymbol{u}_{K}^{*}]=\boldsymbol{0}$, we know that

\[\overline{\boldsymbol{U}}^{*\top}\overline{\boldsymbol{W}}\overline{\boldsymbol{U}}^{*}=\overline{\boldsymbol{U}}^{*\top}\boldsymbol{W}\overline{\boldsymbol{U}}^{*}-\overline{\boldsymbol{U}}^{*\top}\boldsymbol{\Delta}\overline{\boldsymbol{U}}^{*}.\]

Now we bound the two terms separately. The second quantity is immediately bounded by Theorem 10:

\[\left\|\overline{\boldsymbol{U}}^{*\top}\boldsymbol{\Delta}\overline{\boldsymbol{U}}^{*}\right\|\leq\left\|\boldsymbol{\Delta}\right\|\lesssim\frac{n\theta_{\text{max}}^{2}}{\lambda_{1}^{*}}.\]

On the other hand, similar to [YCF21, Lemma 2], we use the matrix Bernstein inequality [T+15, Theorem 6.1.1] to control $\left\|\overline{\boldsymbol{U}}^{*\top}\boldsymbol{W}\overline{\boldsymbol{U}}^{*}\right\|$. Define $\boldsymbol{Y}_{ii}:=\overline{\boldsymbol{U}}^{*\top}_{i,\cdot}\overline{\boldsymbol{U}}^{*}_{i,\cdot}$, $i\in[n]$, and $\boldsymbol{Y}_{ij}:=\overline{\boldsymbol{U}}^{*\top}_{i,\cdot}\overline{\boldsymbol{U}}^{*}_{j,\cdot}+\overline{\boldsymbol{U}}^{*\top}_{j,\cdot}\overline{\boldsymbol{U}}^{*}_{i,\cdot}$, $1\leq i<j\leq n$. Then

\[\overline{\boldsymbol{U}}^{*\top}\boldsymbol{W}\overline{\boldsymbol{U}}^{*}=\sum_{i=1}^{n}W_{ii}\boldsymbol{Y}_{ii}+\sum_{1\leq i<j\leq n}W_{ij}\boldsymbol{Y}_{ij}.\]

By (12),

\[\max_{1\leq i\leq j\leq n}\left\|W_{ij}\boldsymbol{Y}_{ij}\right\|\leq\max_{1\leq i\leq j\leq n}\left\|\boldsymbol{Y}_{ij}\right\|\leq 2\left(\max_{i\in[n]}\left\|\overline{\boldsymbol{U}}^{*}_{i,\cdot}\right\|_{2}\right)^{2}\leq\frac{2\mu^{*}(K-1)}{n}.\]

Second, since $\overline{\boldsymbol{U}}^{*\top}\boldsymbol{W}\overline{\boldsymbol{U}}^{*}$ is symmetric, we know that

\[\left(\overline{\boldsymbol{U}}^{*\top}\boldsymbol{W}\overline{\boldsymbol{U}}^{*}\right)^{\top}\overline{\boldsymbol{U}}^{*\top}\boldsymbol{W}\overline{\boldsymbol{U}}^{*}=\overline{\boldsymbol{U}}^{*\top}\boldsymbol{W}\overline{\boldsymbol{U}}^{*}\left(\overline{\boldsymbol{U}}^{*\top}\boldsymbol{W}\overline{\boldsymbol{U}}^{*}\right)^{\top}=\overline{\boldsymbol{U}}^{*\top}\boldsymbol{W}\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{U}}^{*\top}\boldsymbol{W}\overline{\boldsymbol{U}}^{*}.\]

Therefore, we only have to bound $\left\|\mathbb{E}\overline{\boldsymbol{U}}^{*\top}\boldsymbol{W}\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{U}}^{*\top}\boldsymbol{W}\overline{\boldsymbol{U}}^{*}\right\|$. Since $\|\overline{\boldsymbol{U}}^{*}\|\leq 1$, we have

\[\left\|\mathbb{E}\overline{\boldsymbol{U}}^{*\top}\boldsymbol{W}\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{U}}^{*\top}\boldsymbol{W}\overline{\boldsymbol{U}}^{*}\right\|=\left\|\overline{\boldsymbol{U}}^{*\top}\left(\mathbb{E}\boldsymbol{W}\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{U}}^{*\top}\boldsymbol{W}\right)\overline{\boldsymbol{U}}^{*}\right\|\leq\left\|\mathbb{E}\boldsymbol{W}\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{U}}^{*\top}\boldsymbol{W}\right\|.\]

For any $1\leq i\neq j\leq n$, we have

\[\left[\mathbb{E}\boldsymbol{W}\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{U}}^{*\top}\boldsymbol{W}\right]_{ij}=\mathbb{E}\left[\boldsymbol{W}\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{U}}^{*\top}\boldsymbol{W}\right]_{ij}=\mathbb{E}\sum_{k,l=1}^{n}W_{ik}\left[\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{U}}^{*\top}\right]_{kl}W_{lj}=\left[\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{U}}^{*\top}\right]_{j,i}\mathbb{E}W_{ij}^{2}.\]

And, for any 1in1\leq i\leq n, we have

\begin{align*}
\left[\mathbb{E}\boldsymbol{W}\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{U}}^{*\top}\boldsymbol{W}\right]_{ii}
&=\mathbb{E}\sum_{k,l=1}^{n}W_{ik}\left[\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{U}}^{*\top}\right]_{kl}W_{li}=\sum_{j=1}^{n}\left[\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{U}}^{*\top}\right]_{jj}\mathbb{E}W_{ij}^{2}\\
&\lesssim\sum_{j=1}^{n}\left[\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{U}}^{*\top}\right]_{jj}\theta_{\text{max}}^{2}=\operatorname{tr}\left[\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{U}}^{*\top}\right]\theta_{\text{max}}^{2}=\left\|\overline{\boldsymbol{U}}^{*}\right\|_{F}^{2}\theta_{\text{max}}^{2}=(K-1)\theta_{\text{max}}^{2}. \tag{69}
\end{align*}
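The two entrywise identities above follow from a short index-matching argument. The sketch below assumes, as is standard for this noise model but not restated in this section, that $\boldsymbol{W}$ is symmetric with independent, mean-zero entries on and above the diagonal. The symbol $\boldsymbol{M}$ is shorthand introduced here for $\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{U}}^{*\top}$.

```latex
% Assumption (not restated in this section): W is symmetric with
% independent, mean-zero entries on and above the diagonal.
% M := \overline{U}^{*}\overline{U}^{*\top} is shorthand used only here.
\begin{align*}
&\mathbb{E}\left[W_{ik}W_{lj}\right]\neq 0
  \quad\text{only if}\quad \{i,k\}=\{l,j\}\ \text{as unordered pairs}.\\[4pt]
&\text{Case } i\neq j:\ \{i,k\}=\{l,j\}\ \Longrightarrow\ l=i,\ k=j,
  \quad\text{so}\quad
  \mathbb{E}\sum_{k,l=1}^{n}W_{ik}M_{kl}W_{lj}=M_{ji}\,\mathbb{E}W_{ij}^{2}.\\[4pt]
&\text{Case } i=j:\ \{i,k\}=\{l,i\}\ \Longrightarrow\ k=l,
  \quad\text{so}\quad
  \mathbb{E}\sum_{k,l=1}^{n}W_{ik}M_{kl}W_{li}=\sum_{k=1}^{n}M_{kk}\,\mathbb{E}W_{ik}^{2}.
\end{align*}
```

In the diagonal case the sum runs over the second index of $W$, which is why the second moment appearing there is that of the off-diagonal entries $W_{ik}$ as well as of $W_{ii}$.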

Define $\boldsymbol{A}$ by setting $A_{ij}:=\left(\mathbb{E}\boldsymbol{W}\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{U}}^{*\top}\boldsymbol{W}\right)_{ij}\mathbbm{1}_{i=j}$. Then, by (69),

𝔼𝑾𝑼¯𝑼¯𝑾𝑨+𝔼𝑾𝑼¯𝑼¯𝑾𝑨(K1)θmax2+𝔼𝑾𝑼¯𝑼¯𝑾𝑨F.\displaystyle\left\|\mathbb{E}\boldsymbol{W}\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{U}}^{*\top}\boldsymbol{W}\right\|\leq\left\|\boldsymbol{A}\right\|+\left\|\mathbb{E}\boldsymbol{W}\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{U}}^{*\top}\boldsymbol{W}-\boldsymbol{A}\right\|\leq(K-1)\theta_{\text{max}}^{2}+\left\|\mathbb{E}\boldsymbol{W}\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{U}}^{*\top}\boldsymbol{W}-\boldsymbol{A}\right\|_{F}.

The second summand of the above display can be bounded as:

\begin{align*}
\left\|\mathbb{E}\boldsymbol{W}\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{U}}^{*\top}\boldsymbol{W}-\boldsymbol{A}\right\|_{F}
&=\sqrt{\sum_{1\leq i\neq j\leq n}\left[\mathbb{E}\boldsymbol{W}\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{U}}^{*\top}\boldsymbol{W}\right]_{ij}^{2}}=\sqrt{\sum_{1\leq i\neq j\leq n}\left[\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{U}}^{*\top}\right]_{ji}^{2}\left[\mathbb{E}W_{ij}^{2}\right]^{2}}\\
&\lesssim\sqrt{\sum_{i,j=1}^{n}\left[\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{U}}^{*\top}\right]_{ji}^{2}\theta_{\text{max}}^{4}}=\left\|\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{U}}^{*\top}\right\|_{F}\theta_{\text{max}}^{2}\leq\left\|\overline{\boldsymbol{U}}^{*}\right\|_{F}^{2}\theta_{\text{max}}^{2}=(K-1)\theta_{\text{max}}^{2}.
\end{align*}

In conclusion, we get

𝔼𝑼¯𝑾𝑼¯𝑼¯𝑾𝑼¯𝔼𝑾𝑼¯𝑼¯𝑾(K1)θmax2.\displaystyle\left\|\mathbb{E}\overline{\boldsymbol{U}}^{*\top}\boldsymbol{W}\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{U}}^{*\top}\boldsymbol{W}\overline{\boldsymbol{U}}^{*}\right\|\leq\left\|\mathbb{E}\boldsymbol{W}\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{U}}^{*\top}\boldsymbol{W}\right\|\lesssim(K-1)\theta_{\text{max}}^{2}.

Then, by the matrix Bernstein inequality [T+15, Theorem 6.1.1], with probability at least $1-O(n^{-10})$,

α3=𝑼¯𝑾𝑼¯(K1)θmax2logn+μ(K1)nlogn.\displaystyle\alpha_{3}=\left\|\overline{\boldsymbol{U}}^{*\top}\boldsymbol{W}\overline{\boldsymbol{U}}^{*}\right\|\lesssim\sqrt{(K-1)\theta_{\text{max}}^{2}\log n}+\frac{\mu^{*}(K-1)}{n}\log n.

This implies $\alpha_{3}\lesssim\sqrt{(K-1)\log n}\,\theta_{\text{max}}$, since we assumed $n^{2}\theta_{\text{max}}^{2}\gtrsim(K-1)\mu^{*2}\log n$. Finally, combining the bounds for $\alpha_{1}$, $\alpha_{2}$ and $\alpha_{3}$, we get

\begin{align*}
\left\|\boldsymbol{R}^{\top}\overline{\boldsymbol{\Lambda}}\boldsymbol{R}-\overline{\boldsymbol{\Lambda}}^{*}\right\|\leq\alpha_{1}+\alpha_{2}+\alpha_{3}
&\lesssim\frac{\kappa^{*}n\theta_{\text{max}}^{2}}{\sigma_{\textbf{min}}^{*}}+\frac{\theta_{\text{max}}^{3}n^{1.5}}{\sigma_{\textbf{min}}^{*2}}+\sqrt{(K-1)\log n}\,\theta_{\text{max}}\\
&\asymp\frac{\kappa^{*}n\theta_{\text{max}}^{2}}{\sigma_{\textbf{min}}^{*}}+\sqrt{(K-1)\log n}\,\theta_{\text{max}},
\end{align*}

since $\sigma_{\textbf{max}}^{*}\geq\sigma_{\textbf{min}}^{*}\gg\sqrt{n}\theta_{\text{max}}$. Also,

𝑳𝚲¯𝑳𝚲¯\displaystyle\|\boldsymbol{L}^{\top}\overline{\boldsymbol{\Lambda}}\boldsymbol{L}-\overline{\boldsymbol{\Lambda}}^{*}\| 𝑳𝚲¯𝑳𝑼¯𝑿¯𝑼¯+𝑼¯𝑿¯𝑼¯𝚲¯=α2+α3\displaystyle\leq\left\|\boldsymbol{L}^{\top}\overline{\boldsymbol{\Lambda}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*\top}\overline{\boldsymbol{X}}\overline{\boldsymbol{U}}^{*}\right\|+\left\|\overline{\boldsymbol{U}}^{*\top}\overline{\boldsymbol{X}}\overline{\boldsymbol{U}}^{*}-\overline{\boldsymbol{\Lambda}}^{*}\right\|=\alpha_{2}+\alpha_{3}
θmax3n1.5σmin2+(K1)lognθmax,\displaystyle\lesssim\frac{\theta_{\text{max}}^{3}n^{1.5}}{\sigma_{\textbf{min}}^{*2}}+\sqrt{(K-1)\log n}\theta_{\text{max}},

which completes the proof of the lemma. ∎
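Two simplifications in the proof above reduce to comparing ratios of terms. The display below is a worked check under the assumptions stated in the proof, namely $n^{2}\theta_{\text{max}}^{2}\gtrsim(K-1)\mu^{*2}\log n$ and $\sigma_{\textbf{min}}^{*}\gg\sqrt{n}\theta_{\text{max}}$. It additionally assumes $\kappa^{*}\geq 1$, which holds if $\kappa^{*}$ denotes the condition number $\sigma_{\textbf{max}}^{*}/\sigma_{\textbf{min}}^{*}$ (an assumption on notation, since the definition is not restated in this section).

```latex
% Ratio of the two terms in the matrix Bernstein bound for \alpha_3,
% and ratio of the \alpha_2 term to the \alpha_1 term.
% Assumption: \kappa^{*} \ge 1 (e.g. a condition number).
\begin{align*}
\frac{\mu^{*}(K-1)\log n/n}{\sqrt{(K-1)\theta_{\text{max}}^{2}\log n}}
  &=\frac{\mu^{*}\sqrt{(K-1)\log n}}{n\theta_{\text{max}}}\lesssim 1
  \quad\Longleftrightarrow\quad
  n^{2}\theta_{\text{max}}^{2}\gtrsim(K-1)\mu^{*2}\log n,\\[4pt]
\frac{\theta_{\text{max}}^{3}n^{1.5}/\sigma_{\textbf{min}}^{*2}}
     {\kappa^{*}n\theta_{\text{max}}^{2}/\sigma_{\textbf{min}}^{*}}
  &=\frac{\sqrt{n}\theta_{\text{max}}}{\kappa^{*}\sigma_{\textbf{min}}^{*}}\ll 1 .
\end{align*}
```

The first line is why the second Bernstein term can be dropped from the bound on $\alpha_{3}$, and the second line is why the $\alpha_{2}$ term is absorbed into the $\alpha_{1}$ term in the final bound.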

8.8 Proof of Lemma 9

Proof.

Since $\overline{\boldsymbol{U}}\,\overline{\boldsymbol{\Lambda}}=\overline{\boldsymbol{X}}\,\overline{\boldsymbol{U}}$ and $\overline{\boldsymbol{X}}=\overline{\boldsymbol{H}}+\overline{\boldsymbol{W}}$, the triangle inequality gives

𝑼¯𝚲¯𝑳𝑿¯𝑼¯2,=𝑿¯𝑼¯𝑳𝑿¯𝑼¯2,𝑯¯(𝑼¯𝑳𝑼¯)2,+𝑾¯(𝑼¯𝑳𝑼¯)2,.\displaystyle\left\|\overline{\boldsymbol{U}}\overline{\boldsymbol{\Lambda}}\boldsymbol{L}-\overline{\boldsymbol{X}}\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}=\left\|\overline{\boldsymbol{X}}\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{X}}\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}\leq\left\|\overline{\boldsymbol{H}}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{2,\infty}+\left\|\overline{\boldsymbol{W}}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{2,\infty}. (70)

We begin with the first term on the RHS, treating the cases with and without self-loops separately.

With self-loop: The first term can be bounded as

𝑯¯(𝑼¯𝑳𝑼¯)2,=𝑼¯𝚲¯𝑼¯(𝑼¯𝑳𝑼¯)2,𝑼¯2,σmax𝑼¯(𝑼¯𝑳𝑼¯).\displaystyle\left\|\overline{\boldsymbol{H}}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{2,\infty}=\left\|\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{\Lambda}}^{*}\overline{\boldsymbol{U}}^{*\top}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{2,\infty}\lesssim\left\|\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}\sigma_{\textbf{max}}^{*}\left\|\overline{\boldsymbol{U}}^{*\top}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\|.

Let $\boldsymbol{L}=\boldsymbol{Y}_{1}(\cos\boldsymbol{\Omega})\boldsymbol{Y}_{2}^{\top}$ be the SVD of $\boldsymbol{L}$. Then

\[
\left\|\overline{\boldsymbol{U}}^{*\top}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\|=\left\|\boldsymbol{L}^{\top}\boldsymbol{L}-\boldsymbol{I}\right\|=\left\|\boldsymbol{Y}_{2}\left(\cos^{2}\boldsymbol{\Omega}\right)\boldsymbol{Y}_{2}^{\top}-\boldsymbol{I}\right\|=\left\|\cos^{2}\boldsymbol{\Omega}-\boldsymbol{I}\right\|=\left\|\sin\boldsymbol{\Omega}\right\|^{2}.
\]

By (63), we have $\left\|\sin\boldsymbol{\Omega}\right\|\lesssim\sqrt{n}\theta_{\text{max}}/\sigma_{\textbf{min}}^{*}$. Therefore,

𝑯¯(𝑼¯𝑳𝑼¯)2,𝑼¯2,σmaxsin𝛀2κσmin(K1)μnθmax2.\displaystyle\left\|\overline{\boldsymbol{H}}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{2,\infty}\lesssim\left\|\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}\sigma_{\textbf{max}}^{*}\left\|\sin\boldsymbol{\Omega}\right\|^{2}\lesssim\frac{\kappa^{*}}{\sigma_{\textbf{min}}^{*}}\sqrt{(K-1)\mu^{*}n}\theta_{\text{max}}^{2}. (71)
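The last inequality in (71) is a direct substitution. A worked version follows, assuming that the incoherence bound takes the form $\|\overline{\boldsymbol{U}}^{*}\|_{2,\infty}\leq\sqrt{(K-1)\mu^{*}/n}$ (consistent with how (12) is used elsewhere in this proof) and that $\sigma_{\textbf{max}}^{*}=\kappa^{*}\sigma_{\textbf{min}}^{*}$ (an assumption on the meaning of $\kappa^{*}$, which is not restated here).

```latex
% Assumptions: \|\overline{U}^{*}\|_{2,\infty} \le \sqrt{(K-1)\mu^{*}/n},
% \sigma_max^{*} = \kappa^{*}\sigma_min^{*}, and (63) for \|\sin\Omega\|.
\begin{align*}
\left\|\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}\,
\sigma_{\textbf{max}}^{*}\,
\left\|\sin\boldsymbol{\Omega}\right\|^{2}
&\lesssim
\sqrt{\frac{(K-1)\mu^{*}}{n}}\cdot
\kappa^{*}\sigma_{\textbf{min}}^{*}\cdot
\frac{n\theta_{\text{max}}^{2}}{\sigma_{\textbf{min}}^{*2}}\\
&=\frac{\kappa^{*}}{\sigma_{\textbf{min}}^{*}}\,
\sqrt{(K-1)\mu^{*}}\cdot\frac{n}{\sqrt{n}}\,\theta_{\text{max}}^{2}
=\frac{\kappa^{*}}{\sigma_{\textbf{min}}^{*}}\sqrt{(K-1)\mu^{*}n}\,\theta_{\text{max}}^{2}.
\end{align*}
```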

Without self-loop: The first term can be bounded as

𝑯¯(𝑼¯𝑳𝑼¯)2,𝑼¯𝚲¯𝑼¯(𝑼¯𝑳𝑼¯)2,+(𝑯¯𝑼¯𝚲¯𝑼¯)(𝑼¯𝑳𝑼¯)2,.\displaystyle\left\|\overline{\boldsymbol{H}}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{2,\infty}\leq\left\|\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{\Lambda}}^{*}\overline{\boldsymbol{U}}^{*\top}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{2,\infty}+\left\|\left(\overline{\boldsymbol{H}}-\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{\Lambda}}^{*}\overline{\boldsymbol{U}}^{*\top}\right)\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{2,\infty}.

The first summand is bounded as in the self-loop case. For the second summand, since

𝑯¯𝑼¯𝚲¯𝑼¯σK+1(𝑯)diag(𝚯𝚷𝑷𝚷𝚯)θmax2,\displaystyle\left\|\overline{\boldsymbol{H}}-\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{\Lambda}}^{*}\overline{\boldsymbol{U}}^{*\top}\right\|\leq\sigma_{K+1}(\boldsymbol{H})\leq\left\|\textbf{diag}(\boldsymbol{\Theta}\boldsymbol{\Pi}\boldsymbol{P}\boldsymbol{\Pi}^{\top}\boldsymbol{\Theta})\right\|\lesssim\theta_{\text{max}}^{2},

one can see that

\begin{align*}
\left\|\left(\overline{\boldsymbol{H}}-\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{\Lambda}}^{*}\overline{\boldsymbol{U}}^{*\top}\right)\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{2,\infty}
&\leq\left\|\overline{\boldsymbol{H}}-\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{\Lambda}}^{*}\overline{\boldsymbol{U}}^{*\top}\right\|\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|\lesssim\theta_{\text{max}}^{2}\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|\\
&\leq\theta_{\text{max}}^{2}\left\|\overline{\boldsymbol{U}}\,\overline{\boldsymbol{U}}^{\top}\overline{\boldsymbol{U}}^{*}-\overline{\boldsymbol{U}}^{*}\right\|=\theta_{\text{max}}^{2}\left\|\left(\overline{\boldsymbol{U}}\,\overline{\boldsymbol{U}}^{\top}-\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{U}}^{*\top}\right)\overline{\boldsymbol{U}}^{*}\right\|\\
&\leq\theta_{\text{max}}^{2}\left\|\overline{\boldsymbol{U}}\,\overline{\boldsymbol{U}}^{\top}-\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{U}}^{*\top}\right\|=\theta_{\text{max}}^{2}\left\|\sin\boldsymbol{\Omega}\right\|\lesssim\frac{\sqrt{n}\theta_{\text{max}}^{3}}{\sigma_{\textbf{min}}^{*}}. \tag{72}
\end{align*}

Since $\theta_{\text{max}}\lesssim 1$, combining (71) and (72), we get

𝑯¯(𝑼¯𝑳𝑼¯)2,κσmin(K1)μnθmax2.\displaystyle\left\|\overline{\boldsymbol{H}}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{2,\infty}\lesssim\frac{\kappa^{*}}{\sigma_{\textbf{min}}^{*}}\sqrt{(K-1)\mu^{*}n}\theta_{\text{max}}^{2}. (73)
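The reason (72) does not appear in (73) is that it is dominated by (71). A quick ratio check follows, under the extra assumptions that $\kappa^{*}\geq 1$ and $\mu^{*}\geq 1$ (both are typical for a condition number and an incoherence parameter, but neither is restated in this section) and $K\geq 2$.

```latex
% Ratio of the bound (72) to the bound (71).
% Assumptions: \kappa^{*} \ge 1, \mu^{*} \ge 1, K \ge 2, \theta_max \lesssim 1.
\begin{align*}
\frac{\sqrt{n}\,\theta_{\text{max}}^{3}/\sigma_{\textbf{min}}^{*}}
     {\kappa^{*}\sqrt{(K-1)\mu^{*}n}\,\theta_{\text{max}}^{2}/\sigma_{\textbf{min}}^{*}}
=\frac{\theta_{\text{max}}}{\kappa^{*}\sqrt{(K-1)\mu^{*}}}
\leq\theta_{\text{max}}\lesssim 1 .
\end{align*}
```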

It remains to bound the second term $\left\|\overline{\boldsymbol{W}}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{2,\infty}$. Recall from (8.7),

𝑾¯=𝑾𝒖1𝒖1𝑾𝒖1𝒖1𝑵𝑾𝒖1𝒖1(𝑾𝒖1𝒖1)𝑵𝚫.\displaystyle\overline{\boldsymbol{W}}=\boldsymbol{W}-\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}-\boldsymbol{N}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}-\left(\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right)^{\top}\boldsymbol{N}-\boldsymbol{\Delta}. (74)

We control the five summands on the RHS separately.

  1. (a)

    Control $\left\|\boldsymbol{W}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{2,\infty}$: We use the definition of the leave-one-out matrix $\boldsymbol{W}^{(i)}$ from the proof of Theorem 9. Define $\boldsymbol{X}^{(i)}:=\boldsymbol{H}+\boldsymbol{W}^{(i)}$ and let $\widehat{\lambda}_{1}^{(i)},\widehat{\lambda}_{2}^{(i)},\dots,\widehat{\lambda}_{K}^{(i)}$ be the $K$ largest eigenvalues of $\boldsymbol{X}^{(i)}$ in magnitude, sorted in decreasing order. Let $\widehat{\boldsymbol{u}}_{1}^{(i)},\widehat{\boldsymbol{u}}_{2}^{(i)},\dots,\widehat{\boldsymbol{u}}_{K}^{(i)}$ be the corresponding eigenvectors. Let $\overline{\boldsymbol{X}}^{(i)}=\boldsymbol{X}^{(i)}-\widehat{\lambda}_{1}^{(i)}\widehat{\boldsymbol{u}}_{1}^{(i)}\widehat{\boldsymbol{u}}_{1}^{(i)\top}$, $\overline{\boldsymbol{U}}^{(i)}=[\widehat{\boldsymbol{u}}_{2}^{(i)},\dots,\widehat{\boldsymbol{u}}_{K}^{(i)}]$ and $\boldsymbol{L}^{(i)}=\overline{\boldsymbol{U}}^{(i)\top}\overline{\boldsymbol{U}}^{*}$. Then $\overline{\boldsymbol{U}}^{(i)}$ and $\boldsymbol{L}^{(i)}$ are independent of $\boldsymbol{W}-\boldsymbol{W}^{(i)}$. Moreover, the results in Lemma 4, Theorem 10 (where we also define $\boldsymbol{\Delta}^{(i)}$), Corollary 2 and Lemma 7 also apply to the leave-one-out matrices. As a result, we have

    𝑾(𝑼¯𝑳𝑼¯)2,\displaystyle\left\|\boldsymbol{W}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{2,\infty} =max1in𝑾i,(𝑼¯𝑳𝑼¯)2\displaystyle=\max_{1\leq i\leq n}\left\|\boldsymbol{W}_{i,\cdot}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{2}
    max1in{𝑾i,(𝑼¯(i)𝑳(i)𝑼¯)2+𝑾i,(𝑼¯𝑳𝑼¯(i)𝑳(i))2}\displaystyle\leq\max_{1\leq i\leq n}\left\{\left\|\boldsymbol{W}_{i,\cdot}\left(\overline{\boldsymbol{U}}^{(i)}\boldsymbol{L}^{(i)}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{2}+\left\|\boldsymbol{W}_{i,\cdot}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{(i)}\boldsymbol{L}^{(i)}\right)\right\|_{2}\right\}

    By Lemma 14, we have

    𝑾i,(𝑼¯(i)𝑳(i)𝑼¯)2\displaystyle\left\|\boldsymbol{W}_{i,\cdot}\left(\overline{\boldsymbol{U}}^{(i)}\boldsymbol{L}^{(i)}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{2}\lesssim lognθmax𝑼¯(i)𝑳(i)𝑼¯F+logn𝑼¯(i)𝑳(i)𝑼¯2,\displaystyle\sqrt{\log n}\theta_{\text{max}}\left\|\overline{\boldsymbol{U}}^{(i)}\boldsymbol{L}^{(i)}-\overline{\boldsymbol{U}}^{*}\right\|_{F}+\log n\left\|\overline{\boldsymbol{U}}^{(i)}\boldsymbol{L}^{(i)}-\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}
    \displaystyle\leq lognθmax𝑼¯𝑳𝑼¯F+logn𝑼¯𝑳𝑼¯2,\displaystyle\sqrt{\log n}\theta_{\text{max}}\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|_{F}+\log n\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}
    +lognθmax𝑼¯(i)𝑳(i)𝑼¯𝑳F+logn𝑼¯(i)𝑳(i)𝑼¯𝑳2,\displaystyle+\sqrt{\log n}\theta_{\text{max}}\left\|\overline{\boldsymbol{U}}^{(i)}\boldsymbol{L}^{(i)}-\overline{\boldsymbol{U}}\boldsymbol{L}\right\|_{F}+\log n\left\|\overline{\boldsymbol{U}}^{(i)}\boldsymbol{L}^{(i)}-\overline{\boldsymbol{U}}\boldsymbol{L}\right\|_{2,\infty}
    \displaystyle\leq lognθmax𝑼¯𝑳𝑼¯F+logn𝑼¯𝑳𝑼¯2,\displaystyle\sqrt{\log n}\theta_{\text{max}}\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|_{F}+\log n\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}
    +logn𝑼¯(i)𝑳(i)𝑼¯𝑳F\displaystyle+\log n\left\|\overline{\boldsymbol{U}}^{(i)}\boldsymbol{L}^{(i)}-\overline{\boldsymbol{U}}\boldsymbol{L}\right\|_{F}

    with probability at least $1-O(n^{-14})$. By [CCF+21, Lemma 2.5], since $\overline{\boldsymbol{U}}^{*\top}\overline{\boldsymbol{U}}^{*}=\boldsymbol{I}$, we have

    \begin{align*}
    \left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|_{F}
    &=\left\|\overline{\boldsymbol{U}}\,\overline{\boldsymbol{U}}^{\top}\overline{\boldsymbol{U}}^{*}-\overline{\boldsymbol{U}}^{*}\right\|_{F}=\left\|\left(\overline{\boldsymbol{U}}\,\overline{\boldsymbol{U}}^{\top}-\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{U}}^{*\top}\right)\overline{\boldsymbol{U}}^{*}\right\|_{F}\\
    &\leq\left\|\overline{\boldsymbol{U}}\,\overline{\boldsymbol{U}}^{\top}-\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{U}}^{*\top}\right\|_{F}=\sqrt{2}\left\|\sin\boldsymbol{\Omega}\right\|_{F}\leq\sqrt{2(K-1)}\left\|\sin\boldsymbol{\Omega}\right\|\\
    &\lesssim\frac{\sqrt{(K-1)n}}{\sigma_{\textbf{min}}^{*}}\theta_{\text{max}}, \tag{75}
    \end{align*}

    where we used (63) for the last inequality. As a result, we have

    𝑾i,(𝑼¯(i)𝑳(i)𝑼¯)2\displaystyle\left\|\boldsymbol{W}_{i,\cdot}\left(\overline{\boldsymbol{U}}^{(i)}\boldsymbol{L}^{(i)}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{2}
    \displaystyle\lesssim (K1)nlognσminθmax+logn𝑼¯𝑳𝑼¯2,+logn𝑼¯(i)𝑳(i)𝑼¯𝑳Fα.\displaystyle\underbrace{\frac{\sqrt{(K-1)n\log n}}{\sigma_{\textbf{min}}^{*}}\theta_{\text{max}}+\log n\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}+\log n\left\|\overline{\boldsymbol{U}}^{(i)}\boldsymbol{L}^{(i)}-\overline{\boldsymbol{U}}\boldsymbol{L}\right\|_{F}}_{\alpha}. (76)

    On the other hand, we have

    𝑾i,(𝑼¯𝑳𝑼¯(i)𝑳(i))2𝑾i,2𝑼¯𝑳𝑼¯(i)𝑳(i)nθmax𝑼¯𝑳𝑼¯(i)𝑳(i)F.\displaystyle\left\|\boldsymbol{W}_{i,\cdot}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{(i)}\boldsymbol{L}^{(i)}\right)\right\|_{2}\leq\left\|\boldsymbol{W}_{i,\cdot}\right\|_{2}\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{(i)}\boldsymbol{L}^{(i)}\right\|\leq\sqrt{n}\theta_{\text{max}}\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{(i)}\boldsymbol{L}^{(i)}\right\|_{F}. (77)

    As a result, it remains to bound $\beta:=\left\|\overline{\boldsymbol{U}}^{(i)}\boldsymbol{L}^{(i)}-\overline{\boldsymbol{U}}\boldsymbol{L}\right\|_{F}$. One can see that

    β=(𝑼¯(i)𝑼¯(i)𝑼¯𝑼¯)𝑼¯F𝑼¯(i)𝑼¯(i)𝑼¯𝑼¯F.\displaystyle\beta=\left\|\left(\overline{\boldsymbol{U}}^{(i)}\overline{\boldsymbol{U}}^{(i)\top}-\overline{\boldsymbol{U}}\overline{\boldsymbol{U}}^{\top}\right)\overline{\boldsymbol{U}}^{*}\right\|_{F}\leq\left\|\overline{\boldsymbol{U}}^{(i)}\overline{\boldsymbol{U}}^{(i)\top}-\overline{\boldsymbol{U}}\overline{\boldsymbol{U}}^{\top}\right\|_{F}.

    By the Davis-Kahan $\sin\Theta$ theorem [CCF+21, Theorem 2.7], we have

    β=𝑼¯(i)𝑳(i)𝑼¯𝑳F\displaystyle\beta=\left\|\overline{\boldsymbol{U}}^{(i)}\boldsymbol{L}^{(i)}-\overline{\boldsymbol{U}}\boldsymbol{L}\right\|_{F} (𝑿¯𝑿¯(i))𝑼¯(i)Fmin2jK|λ^j|maxj>K|λ^j(i)|\displaystyle\lesssim\frac{\left\|\left(\overline{\boldsymbol{X}}-\overline{\boldsymbol{X}}^{(i)}\right)\overline{\boldsymbol{U}}^{(i)}\right\|_{F}}{\min_{2\leq j\leq K}\left|\widehat{\lambda}_{j}\right|-\max_{j>K}\left|\widehat{\lambda}_{j}^{(i)}\right|}
    (𝑿¯𝑿¯(i))𝑼¯(i)Fσmin𝑾𝑾(i)(𝑿¯𝑿¯(i))𝑼¯(i)Fσmin,\displaystyle\lesssim\frac{\left\|\left(\overline{\boldsymbol{X}}-\overline{\boldsymbol{X}}^{(i)}\right)\overline{\boldsymbol{U}}^{(i)}\right\|_{F}}{\sigma_{\textbf{min}}^{*}-\left\|\boldsymbol{W}\right\|-\left\|\boldsymbol{W}^{(i)}\right\|}\lesssim\frac{\left\|\left(\overline{\boldsymbol{X}}-\overline{\boldsymbol{X}}^{(i)}\right)\overline{\boldsymbol{U}}^{(i)}\right\|_{F}}{\sigma_{\textbf{min}}^{*}}, (78)

    since $\|\boldsymbol{W}^{(i)}\|\leq\|\boldsymbol{W}\|\lesssim\sqrt{n}\theta_{\text{max}}\ll\sigma_{\textbf{min}}^{*}$. By Theorem 10, $\overline{\boldsymbol{X}}-\overline{\boldsymbol{X}}^{(i)}$ can be decomposed as

    𝑿¯𝑿¯(i)=\displaystyle\overline{\boldsymbol{X}}-\overline{\boldsymbol{X}}^{(i)}= 𝑾𝑾(i)𝒖1𝒖1(𝑾𝑾(i))𝒖1𝒖1𝑵(𝑾𝑾(i))𝒖1𝒖1\displaystyle\boldsymbol{W}-\boldsymbol{W}^{(i)}-\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\Big{(}\boldsymbol{W}-\boldsymbol{W}^{(i)}\Big{)}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}-\boldsymbol{N}\Big{(}\boldsymbol{W}-\boldsymbol{W}^{(i)}\Big{)}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}
    ((𝑾𝑾(i))𝒖1𝒖1)𝑵(𝚫𝚫(i)).\displaystyle-\left(\Big{(}\boldsymbol{W}-\boldsymbol{W}^{(i)}\Big{)}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right)^{\top}\boldsymbol{N}-\left(\boldsymbol{\Delta}-\boldsymbol{\Delta}^{(i)}\right).

    We bound the numerator of (78) by controlling the five summands on the RHS of the above display separately.

    1. (a)

      Control (𝑾𝑾(i))𝑼¯(i)F\left\|\Big{(}\boldsymbol{W}-\boldsymbol{W}^{(i)}\Big{)}\overline{\boldsymbol{U}}^{(i)}\right\|_{F}: By triangle inequality we have

      (𝑾𝑾(i))𝑼¯(i)F\displaystyle\left\|\Big{(}\boldsymbol{W}-\boldsymbol{W}^{(i)}\Big{)}\overline{\boldsymbol{U}}^{(i)}\right\|_{F} =(𝑾𝑾(i))𝑼¯(i)𝑳(i)(𝑳(i))1F\displaystyle=\left\|\Big{(}\boldsymbol{W}-\boldsymbol{W}^{(i)}\Big{)}\overline{\boldsymbol{U}}^{(i)}\boldsymbol{L}^{(i)}\left(\boldsymbol{L}^{(i)}\right)^{-1}\right\|_{F}
      (𝑾𝑾(i))𝑼¯(i)𝑳(i)F\displaystyle\lesssim\left\|\Big{(}\boldsymbol{W}-\boldsymbol{W}^{(i)}\Big{)}\overline{\boldsymbol{U}}^{(i)}\boldsymbol{L}^{(i)}\right\|_{F}
      (𝑾𝑾(i))𝑼¯F+(𝑾𝑾(i))(𝑼¯(i)𝑳(i)𝑼¯)F=:ϑ1\displaystyle\leq\left\|\Big{(}\boldsymbol{W}-\boldsymbol{W}^{(i)}\Big{)}\overline{\boldsymbol{U}}^{*}\right\|_{F}+\underbrace{\left\|\Big{(}\boldsymbol{W}-\boldsymbol{W}^{(i)}\Big{)}\left(\overline{\boldsymbol{U}}^{(i)}\boldsymbol{L}^{(i)}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{F}}_{=:\vartheta_{1}} (79)

      On one hand, by Lemma 16 and (12) we have

      (𝑾𝑾(i))𝑼¯Flognθmax𝑼¯F+(nθmax+logn)𝑼¯2,\displaystyle\left\|\Big{(}\boldsymbol{W}-\boldsymbol{W}^{(i)}\Big{)}\overline{\boldsymbol{U}}^{*}\right\|_{F}\lesssim\sqrt{\log n}\theta_{\text{max}}\left\|\overline{\boldsymbol{U}}^{*}\right\|_{F}+(\sqrt{n}\theta_{\text{max}}+\log n)\left\|\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}
      (K1)lognθmax+(nθmax+logn)(K1)μ/n=:α1\displaystyle\lesssim\sqrt{(K-1)\log n}\theta_{\text{max}}+(\sqrt{n}\theta_{\text{max}}+\log n)\sqrt{(K-1)\mu^{*}/n}=:\alpha_{1} (80)

      with probability at least 1O(n14)1-O(n^{-14}). On the other hand, we have

      ϑ12=\displaystyle\vartheta^{2}_{1}= 𝑾i,(𝑼¯(i)𝑳(i)𝑼¯)22+j[n],jiWji2(𝑼¯(i)𝑳(i)𝑼¯)j,22.\displaystyle\left\|\boldsymbol{W}_{i,\cdot}\left(\overline{\boldsymbol{U}}^{(i)}\boldsymbol{L}^{(i)}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{2}^{2}+\sum_{j\in[n],j\neq i}W_{ji}^{2}\left\|\left(\overline{\boldsymbol{U}}^{(i)}\boldsymbol{L}^{(i)}-\overline{\boldsymbol{U}}^{*}\right)_{j,\cdot}\right\|_{2}^{2}.

      Since $\sum_{j\in[n],j\neq i}W_{ji}^{2}\leq\left\|\boldsymbol{W}_{\cdot,i}\right\|_{2}^{2}\leq\left\|\boldsymbol{W}\right\|^{2}\lesssim n\theta_{\text{max}}^{2}$, we get

      ϑ12α2+nθmax2𝑼¯(i)𝑳(i)𝑼¯2,2,\displaystyle\vartheta^{2}_{1}\lesssim\alpha^{2}+n\theta_{\text{max}}^{2}\left\|\overline{\boldsymbol{U}}^{(i)}\boldsymbol{L}^{(i)}-\overline{\boldsymbol{U}}^{*}\right\|^{2}_{2,\infty},

      where α\alpha is defined by (76). As a result, we have

      \begin{align*}
      \vartheta_{1}
      &\lesssim\alpha+\sqrt{n}\theta_{\text{max}}\left\|\overline{\boldsymbol{U}}^{(i)}\boldsymbol{L}^{(i)}-\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}\\
      &\lesssim\alpha+\sqrt{n}\theta_{\text{max}}\left\|\overline{\boldsymbol{U}}^{(i)}\boldsymbol{L}^{(i)}-\overline{\boldsymbol{U}}\boldsymbol{L}\right\|_{2,\infty}+\sqrt{n}\theta_{\text{max}}\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}\\
      &\lesssim\alpha+\sqrt{n}\theta_{\text{max}}\beta+\sqrt{n}\theta_{\text{max}}\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty},
      \end{align*}

      where $\beta$ is defined by (78). Combining this with (79) and (80), we have

      (𝑾𝑾(i))𝑼¯(i)F\displaystyle\left\|\Big{(}\boldsymbol{W}-\boldsymbol{W}^{(i)}\Big{)}\overline{\boldsymbol{U}}^{(i)}\right\|_{F}\lesssim α+nθmaxβ+nθmax𝑼¯𝑳𝑼¯2,+α1.\displaystyle\;\alpha+\sqrt{n}\theta_{\text{max}}\beta+\sqrt{n}\theta_{\text{max}}\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}+\alpha_{1}. (81)

      with probability at least 1O(n14)1-O(n^{-14}).

    2. (b)

      Control 𝒖1𝒖1(𝑾𝑾(i))𝒖1𝒖1𝑼¯(i)F\left\|\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\Big{(}\boldsymbol{W}-\boldsymbol{W}^{(i)}\Big{)}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\overline{\boldsymbol{U}}^{(i)}\right\|_{F}: Since 𝒖1𝒖1(𝑾𝑾(i))𝒖1𝒖1𝑼¯(i)\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\Big{(}\boldsymbol{W}-\boldsymbol{W}^{(i)}\Big{)}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\overline{\boldsymbol{U}}^{(i)} is a rank-11 matrix, we know that

      𝒖1𝒖1(𝑾𝑾(i))𝒖1𝒖1𝑼¯(i)F=𝒖12𝒖1(𝑾𝑾(i))𝒖1𝒖1𝑼¯(i)2\displaystyle\left\|\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\Big{(}\boldsymbol{W}-\boldsymbol{W}^{(i)}\Big{)}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\overline{\boldsymbol{U}}^{(i)}\right\|_{F}=\left\|\boldsymbol{u}_{1}^{*}\right\|_{2}\left\|\boldsymbol{u}_{1}^{*\top}\Big{(}\boldsymbol{W}-\boldsymbol{W}^{(i)}\Big{)}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\overline{\boldsymbol{U}}^{(i)}\right\|_{2}
      (𝑾𝑾(i))𝒖1𝒖1(𝑾𝑾(i))𝒖1𝒖1F.\displaystyle\leq\left\|\Big{(}\boldsymbol{W}-\boldsymbol{W}^{(i)}\Big{)}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right\|\leq\left\|\Big{(}\boldsymbol{W}-\boldsymbol{W}^{(i)}\Big{)}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right\|_{F}.

      By Lemma 16 we know that

      (𝑾𝑾(i))𝒖1𝒖1F\displaystyle\left\|\Big{(}\boldsymbol{W}-\boldsymbol{W}^{(i)}\Big{)}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right\|_{F} lognθmax𝒖1𝒖1F+(nθmax+logn)𝒖1𝒖12,\displaystyle\lesssim\sqrt{\log n}\theta_{\text{max}}\left\|\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right\|_{F}+(\sqrt{n}\theta_{\text{max}}+\log n)\left\|\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right\|_{2,\infty}
      lognθmax+(nθmax+logn)μ/n=:ρ2.\displaystyle\leq\sqrt{\log n}\theta_{\text{max}}+(\sqrt{n}\theta_{\text{max}}+\log n)\sqrt{\mu^{*}/n}=:\rho_{2}. (82)

      with probability at least 1O(n14)1-O(n^{-14}), where the last inequality uses (12). Hence,

      𝒖1𝒖1(𝑾𝑾(i))𝒖1𝒖1𝑼¯(i)Fρ2\displaystyle\left\|\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\Big{(}\boldsymbol{W}-\boldsymbol{W}^{(i)}\Big{)}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\overline{\boldsymbol{U}}^{(i)}\right\|_{F}\lesssim\rho_{2} (83)

      with probability at least 1O(n14)1-O(n^{-14}).

    3. (c)

      Control 𝑵(𝑾𝑾(i))𝒖1𝒖1𝑼¯(i)F\left\|\boldsymbol{N}\Big{(}\boldsymbol{W}-\boldsymbol{W}^{(i)}\Big{)}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\overline{\boldsymbol{U}}^{(i)}\right\|_{F}: By (62), we have 𝑵1\left\|\boldsymbol{N}\right\|\lesssim 1 implying

      𝑵(𝑾𝑾(i))𝒖1𝒖1𝑼¯(i)F\displaystyle\left\|\boldsymbol{N}\Big{(}\boldsymbol{W}-\boldsymbol{W}^{(i)}\Big{)}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\overline{\boldsymbol{U}}^{(i)}\right\|_{F} 𝑵𝑼¯(i)(𝑾𝑾(i))𝒖1𝒖1F\displaystyle\leq\left\|\boldsymbol{N}\right\|\left\|\overline{\boldsymbol{U}}^{(i)}\right\|\left\|\Big{(}\boldsymbol{W}-\boldsymbol{W}^{(i)}\Big{)}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right\|_{F}
      (𝑾𝑾(i))𝒖1𝒖1Fρ2.\displaystyle\lesssim\left\|\Big{(}\boldsymbol{W}-\boldsymbol{W}^{(i)}\Big{)}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right\|_{F}\lesssim\rho_{2}.

      with probability at least $1-O(n^{-14})$, using (82).

    4. (d)

      Control ((𝑾𝑾(i))𝒖1𝒖1)𝑵𝑼¯(i)F\left\|\left(\Big{(}\boldsymbol{W}-\boldsymbol{W}^{(i)}\Big{)}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right)^{\top}\boldsymbol{N}\overline{\boldsymbol{U}}^{(i)}\right\|_{F}: Since ((𝑾𝑾(i))𝒖1𝒖1)𝑵𝑼¯(i)\left(\Big{(}\boldsymbol{W}-\boldsymbol{W}^{(i)}\Big{)}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right)^{\top}\boldsymbol{N}\overline{\boldsymbol{U}}^{(i)} is a rank-11 matrix and 𝑵=λ1𝑵1\boldsymbol{N}=\lambda_{1}^{*}\boldsymbol{N}_{1}, by Lemma 12 and (16) we know that

      ((𝑾𝑾(i))𝒖1𝒖1)𝑵𝑼¯(i)F\displaystyle\left\|\left(\Big{(}\boldsymbol{W}-\boldsymbol{W}^{(i)}\Big{)}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right)^{\top}\boldsymbol{N}\overline{\boldsymbol{U}}^{(i)}\right\|_{F} =𝒖12𝒖1(𝑾𝑾(i))𝑵𝑼¯(i)2\displaystyle=\left\|\boldsymbol{u}_{1}^{*}\right\|_{2}\left\|\boldsymbol{u}_{1}^{*\top}\Big{(}\boldsymbol{W}-\boldsymbol{W}^{(i)}\Big{)}\boldsymbol{N}\overline{\boldsymbol{U}}^{(i)}\right\|_{2}
      (𝑾𝑾(i))𝒖12ρ2\displaystyle\lesssim\left\|\Big{(}\boldsymbol{W}-\boldsymbol{W}^{(i)}\Big{)}\boldsymbol{u}_{1}^{*}\right\|_{2}\lesssim\rho_{2} (84)

      with probability at least 1O(n15)1-O(n^{-15}).

    5. (e)

      Control (𝚫𝚫(i))𝑼¯(i)F\left\|\left(\boldsymbol{\Delta}-\boldsymbol{\Delta}^{(i)}\right)\overline{\boldsymbol{U}}^{(i)}\right\|_{F}: By Theorem 10, the ranks of 𝚫\boldsymbol{\Delta} and 𝚫(i)\boldsymbol{\Delta}^{(i)} are at most 33. Hence,

      (𝚫𝚫(i))𝑼¯(i)F𝚫𝚫(i)F𝑼¯(i)𝚫𝚫(i)nθmax2λ1.\displaystyle\left\|\left(\boldsymbol{\Delta}-\boldsymbol{\Delta}^{(i)}\right)\overline{\boldsymbol{U}}^{(i)}\right\|_{F}\lesssim\left\|\boldsymbol{\Delta}-\boldsymbol{\Delta}^{(i)}\right\|_{F}\left\|\overline{\boldsymbol{U}}^{(i)}\right\|\lesssim\left\|\boldsymbol{\Delta}-\boldsymbol{\Delta}^{(i)}\right\|\lesssim\frac{n\theta_{\text{max}}^{2}}{\lambda_{1}^{*}}. (85)

    Recall the definitions of α,β\alpha,\beta from (76) and (78) respectively. Combining (81)-(85), we get

    (𝑿¯𝑿¯(i))𝑼¯(i)F\displaystyle\left\|\left(\overline{\boldsymbol{X}}-\overline{\boldsymbol{X}}^{(i)}\right)\overline{\boldsymbol{U}}^{(i)}\right\|_{F}\lesssim α+nθmaxβ+nθmax𝑼¯𝑳𝑼¯2,+nθmax2λ1+α1\displaystyle\;\alpha+\sqrt{n}\theta_{\text{max}}\beta+\sqrt{n}\theta_{\text{max}}\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}+\frac{n\theta_{\text{max}}^{2}}{\lambda_{1}^{*}}+\alpha_{1}

    with probability at least 1O(n14)1-O(n^{-14}). This, along with (78) yields

    β\displaystyle\beta\lesssim nθmaxσminβ+1σminα+nθmaxσmin𝑼¯𝑳𝑼¯2,+nθmax2λ1σmin\displaystyle\frac{\sqrt{n}\theta_{\text{max}}}{\sigma_{\textbf{min}}^{*}}\beta+\frac{1}{\sigma_{\textbf{min}}^{*}}\alpha+\frac{\sqrt{n}\theta_{\text{max}}}{\sigma_{\textbf{min}}^{*}}\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}+\frac{n\theta_{\text{max}}^{2}}{\lambda_{1}^{*}\sigma_{\textbf{min}}^{*}}
    +(K1)lognθmax+(nθmax+logn)(K1)μ/nσmin\displaystyle+\frac{\sqrt{(K-1)\log n}\theta_{\text{max}}+(\sqrt{n}\theta_{\text{max}}+\log n)\sqrt{(K-1)\mu^{*}/n}}{\sigma_{\textbf{min}}^{*}}

    with probability at least 1O(n14)1-O(n^{-14}). Since nθmaxσmin\sqrt{n}\theta_{\text{max}}\ll\sigma_{\textbf{min}}^{*},

    β\displaystyle\beta\lesssim 1σminα+nθmaxσmin𝑼¯𝑳𝑼¯2,+nθmax2λ1σmin+α1σmin\displaystyle\frac{1}{\sigma_{\textbf{min}}^{*}}\alpha+\frac{\sqrt{n}\theta_{\text{max}}}{\sigma_{\textbf{min}}^{*}}\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}+\frac{n\theta_{\text{max}}^{2}}{\lambda_{1}^{*}\sigma_{\textbf{min}}^{*}}+\frac{\alpha_{1}}{\sigma_{\textbf{min}}^{*}}

    with probability at least 1O(n14)1-O(n^{-14}). Recall that α\alpha is defined as

    α=(K1)nlognσminθmax+logn𝑼¯𝑳𝑼¯2,+βlogn\displaystyle\alpha=\frac{\sqrt{(K-1)n\log n}}{\sigma_{\textbf{min}}^{*}}\theta_{\text{max}}+\log n\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}+\beta\log n

    in (76). Substituting the bound on $\beta$ from the previous display, we have

    α\displaystyle\alpha\lesssim (K1)nlognσminθmax+logn𝑼¯𝑳𝑼¯2,+lognσminα\displaystyle\frac{\sqrt{(K-1)n\log n}}{\sigma_{\textbf{min}}^{*}}\theta_{\text{max}}+\log n\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}+\frac{\log n}{\sigma_{\textbf{min}}^{*}}\alpha
    +nθmaxlognσmin𝑼¯𝑳𝑼¯2,+nθmax2lognλ1σmin+α1lognσmin\displaystyle+\frac{\sqrt{n}\theta_{\text{max}}\log n}{\sigma_{\textbf{min}}^{*}}\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}+\frac{n\theta_{\text{max}}^{2}\log n}{\lambda_{1}^{*}\sigma_{\textbf{min}}^{*}}+\frac{\alpha_{1}\log n}{\sigma_{\textbf{min}}^{*}}

    with probability at least $1-O(n^{-14})$. By assumption, $\max\{\sqrt{n}\theta_{\text{max}},\log n\}\ll\sigma_{\textbf{min}}^{*}$, which yields

    α\displaystyle\alpha\lesssim logn𝑼¯𝑳𝑼¯2,+(K1)nlognσminθmax+nθmax2lognλ1σmin\displaystyle\log n\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}+\frac{\sqrt{(K-1)n\log n}}{\sigma_{\textbf{min}}^{*}}\theta_{\text{max}}+\frac{n\theta_{\text{max}}^{2}\log n}{\lambda_{1}^{*}\sigma_{\textbf{min}}^{*}}
    +(nθmax+logn)(K1)μ/nlognσmin\displaystyle+\frac{(\sqrt{n}\theta_{\text{max}}+\log n)\sqrt{(K-1)\mu^{*}/n}\log n}{\sigma_{\textbf{min}}^{*}} (86)

    with probability at least $1-O(n^{-14})$. This also implies

    \begin{align*}
    \beta\lesssim{}& \frac{\sqrt{n}\theta_{\text{max}}+\log n}{\sigma_{\textbf{min}}^{*}}\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}+\frac{n\theta_{\text{max}}^{2}}{\lambda_{1}^{*}\sigma_{\textbf{min}}^{*}} \\
    &+\frac{\sqrt{(K-1)\log n}\theta_{\text{max}}+(\sqrt{n}\theta_{\text{max}}+\log n)\sqrt{(K-1)\mu^{*}/n}}{\sigma_{\textbf{min}}^{*}} \tag{87}
    \end{align*}

    with probability at least $1-O(n^{-14})$. Plugging the quantities $\alpha,\beta$ into (76), we get

    \begin{align*}
    \left\|\boldsymbol{W}_{i,\cdot}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{2}\leq{}& \left\|\boldsymbol{W}_{i,\cdot}\left(\overline{\boldsymbol{U}}^{(i)}\boldsymbol{L}^{(i)}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{2}+\left\|\boldsymbol{W}_{i,\cdot}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{(i)}\boldsymbol{L}^{(i)}\right)\right\|_{2} \\
    \lesssim{}& \alpha+\sqrt{n}\theta_{\text{max}}\beta
    \end{align*}

    with probability at least $1-O(n^{-14})$. Taking a union bound over $i\in[n]$ and noting that $\mathbb{P}(\mathcal{A}_{1})\geq 1-O(n^{-10})$, we have, with probability at least $1-O(n^{-10})$,

    \begin{align*}
    \left\|\boldsymbol{W}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{2,\infty}\lesssim{}& \left(\frac{n\theta_{\text{max}}^{2}}{\sigma_{\textbf{min}}^{*}}+\log n\right)\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}+\frac{\sqrt{(K-1)n\log n}}{\sigma_{\textbf{min}}^{*}}\theta_{\text{max}} \\
    &+\frac{(\sqrt{n}\theta_{\text{max}}+\log n)n\theta_{\text{max}}^{2}}{\lambda_{1}^{*}\sigma_{\textbf{min}}^{*}}+\frac{(\sqrt{n}\theta_{\text{max}}+\log n)^{2}\sqrt{(K-1)\mu^{*}/n}}{\sigma_{\textbf{min}}^{*}}, \tag{88}
    \end{align*}

    where we used the bounds (86) and (87).

  2. (b)

    Control $\left\|\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{2,\infty}$: Since the matrix is rank one, we can write

    \begin{align*}
    \left\|\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{2,\infty}=\left\|\boldsymbol{u}_{1}^{*}\right\|_{\infty}\left|\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right|\left\|\boldsymbol{u}_{1}^{*\top}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{2}. \tag{89}
    \end{align*}

    By Lemma 17, $\left|\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right|\lesssim\sqrt{\log n}$ with probability at least $1-O(n^{-15})$. Since $\overline{\boldsymbol{U}}^{*\top}\overline{\boldsymbol{U}}^{*}=\boldsymbol{I}$,

    \begin{align*}
    \left\|\boldsymbol{u}_{1}^{*\top}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{2} &\leq\left\|\boldsymbol{u}_{1}^{*}\right\|_{2}\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|=\left\|\overline{\boldsymbol{U}}\overline{\boldsymbol{U}}^{\top}\overline{\boldsymbol{U}}^{*}-\overline{\boldsymbol{U}}^{*}\right\|=\left\|\left(\overline{\boldsymbol{U}}\overline{\boldsymbol{U}}^{\top}-\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{U}}^{*\top}\right)\overline{\boldsymbol{U}}^{*}\right\| \\
    &\leq\left\|\overline{\boldsymbol{U}}\overline{\boldsymbol{U}}^{\top}-\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{U}}^{*\top}\right\|=\left\|\sin\boldsymbol{\Omega}\right\|\lesssim\frac{\sqrt{n}\theta_{\text{max}}}{\sigma_{\textbf{min}}^{*}},
    \end{align*}

    where the last inequality uses (63). Plugging this into (89),

    \begin{align*}
    \left\|\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{2,\infty}\lesssim\sqrt{\frac{\mu^{*}}{n}}\sqrt{\log n}\frac{\sqrt{n}\theta_{\text{max}}}{\sigma_{\textbf{min}}^{*}}=\frac{\sqrt{\mu^{*}\log n}\theta_{\text{max}}}{\sigma_{\textbf{min}}^{*}} \tag{90}
    \end{align*}

    with probability at least $1-O(n^{-15})$.

  3. (c)

    Control $\left\|\boldsymbol{N}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{2,\infty}$: Since $\boldsymbol{N}=\lambda_{1}^{*}\boldsymbol{N}_{1}$, we apply Lemma 50 to get

    \begin{align*}
    &\left\|\boldsymbol{N}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{2,\infty} \\
    \lesssim{}& \left\|\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{2,\infty}+\left\|\left(\boldsymbol{N}-\boldsymbol{I}\right)\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{2,\infty} \\
    \lesssim{}& \left\|\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{2,\infty}+\left\|\boldsymbol{N}-\boldsymbol{I}\right\|_{2,\infty}\left\|\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\| \\
    \lesssim{}& \left\|\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right\|_{2,\infty}\left\|\boldsymbol{u}_{1}^{*\top}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\|+\sqrt{\frac{(K-1)\mu^{*}}{n}}\left\|\boldsymbol{W}\right\|\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|.
    \end{align*}

    On one hand, under event $\mathcal{A}_{1}$, we have $\|\boldsymbol{W}\|\lesssim\sqrt{n}\theta_{\text{max}}$ and, using (63), $\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|=\left\|\sin\boldsymbol{\Omega}\right\|\lesssim\sqrt{n}\theta_{\text{max}}/\sigma_{\textbf{min}}^{*}$, yielding

    \begin{align*}
    \sqrt{\frac{(K-1)\mu^{*}}{n}}\left\|\boldsymbol{W}\right\|\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|\lesssim\frac{\sqrt{(K-1)n\mu^{*}}\theta_{\text{max}}^{2}}{\sigma_{\textbf{min}}^{*}}.
    \end{align*}

    On the other hand, by Lemma 15, $\left\|\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right\|_{2,\infty}\lesssim\sqrt{\log n}\theta_{\text{max}}+\log n\sqrt{\mu^{*}/n}$ with probability at least $1-O(n^{-14})$. As a result,

    \begin{align*}
    \left\|\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right\|_{2,\infty}\left\|\boldsymbol{u}_{1}^{*\top}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\| &\lesssim\left(\sqrt{\log n}\theta_{\text{max}}+\log n\sqrt{\mu^{*}/n}\right)\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\| \\
    &\lesssim\frac{\sqrt{n\log n}\theta_{\text{max}}^{2}+\log n\sqrt{\mu^{*}}\theta_{\text{max}}}{\sigma_{\textbf{min}}^{*}}
    \end{align*}

    with probability at least $1-O(n^{-14})$. To sum up, we get

    \begin{align*}
    \left\|\boldsymbol{N}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{2,\infty}\lesssim\frac{(\sqrt{(K-1)\mu^{*}}+\sqrt{\log n})\sqrt{n}\theta_{\text{max}}^{2}+\log n\sqrt{\mu^{*}}\theta_{\text{max}}}{\sigma_{\textbf{min}}^{*}} \tag{91}
    \end{align*}

    with probability at least $1-O(n^{-14})$.

  4. (d)

    Control $\left\|\left(\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right)^{\top}\boldsymbol{N}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{2,\infty}$: It can be bounded as

    \begin{align*}
    \left\|\left(\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right)^{\top}\boldsymbol{N}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{2,\infty} &=\left\|\boldsymbol{u}_{1}^{*}\right\|_{\infty}\left\|\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{N}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{2} \\
    &\leq\left\|\boldsymbol{u}_{1}^{*}\right\|_{\infty}\left\|\boldsymbol{u}_{1}^{*}\right\|_{2}\left\|\boldsymbol{W}\right\|\left\|\boldsymbol{N}\right\|\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|.
    \end{align*}

    Recall that $\boldsymbol{N}=\lambda_{1}^{*}\boldsymbol{N}_{1}$. As we have shown in Lemma 12 and (63), $\left\|\boldsymbol{N}\right\|\lesssim 1$ and $\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|=\left\|\sin\boldsymbol{\Omega}\right\|\lesssim\sqrt{n}\theta_{\text{max}}/\sigma_{\textbf{min}}^{*}$. Therefore, we have

    \begin{align*}
    \left\|\left(\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right)^{\top}\boldsymbol{N}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{2,\infty}\lesssim\sqrt{\frac{\mu^{*}}{n}}\sqrt{n}\theta_{\text{max}}\frac{\sqrt{n}\theta_{\text{max}}}{\sigma_{\textbf{min}}^{*}}=\frac{\sqrt{\mu^{*}n}\theta_{\text{max}}^{2}}{\sigma_{\textbf{min}}^{*}}. \tag{92}
    \end{align*}

  5. (e)

    Control $\left\|\boldsymbol{\Delta}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{2,\infty}$: According to Lemma 10, we simply have

    \begin{align*}
    \left\|\boldsymbol{\Delta}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{2,\infty}\leq\left\|\boldsymbol{\Delta}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{F}\leq\left\|\boldsymbol{\Delta}\right\|\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|_{F}\lesssim\frac{n\theta_{\text{max}}^{2}}{\lambda_{1}^{*}}\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|_{F}.
    \end{align*}

    As we have shown in (75), $\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|_{F}\lesssim\sqrt{(K-1)n}\theta_{\text{max}}/\sigma_{\textbf{min}}^{*}$. As a result, we have

    \begin{align*}
    \left\|\boldsymbol{\Delta}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{2,\infty}\lesssim\frac{n\theta_{\text{max}}^{2}}{\lambda_{1}^{*}}\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|_{F}\lesssim\frac{\sqrt{K-1}\theta_{\text{max}}^{3}n^{1.5}}{\lambda_{1}^{*}\sigma_{\textbf{min}}^{*}}. \tag{93}
    \end{align*}

Finally, we sum up the five terms bounded above. Specifically, a combination of (88), (90), (91), (92), (93) and (74) yields

\begin{align*}
\left\|\overline{\boldsymbol{W}}\left(\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right)\right\|_{2,\infty}\lesssim{}& \left(\frac{n\theta_{\text{max}}^{2}}{\sigma_{\textbf{min}}^{*}}+\log n\right)\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}+\frac{\sqrt{(K-1)n\log n}}{\sigma_{\textbf{min}}^{*}}\theta_{\text{max}} \\
&+\frac{(\sqrt{(K-1)n}\theta_{\text{max}}+\log n)n\theta_{\text{max}}^{2}}{\lambda_{1}^{*}\sigma_{\textbf{min}}^{*}}+\frac{(\sqrt{n}\theta_{\text{max}}+\log n)^{2}\sqrt{(K-1)\mu^{*}/n}}{\sigma_{\textbf{min}}^{*}}
\end{align*}

with probability at least $1-O(n^{-10})$. Plugging this bound, along with the bounds (71) and (73), into (70) gives the desired conclusion.
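
The rank-one factorization in (89) and the identification of $\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|$ with $\left\|\sin\boldsymbol{\Omega}\right\|$ can be sanity-checked numerically. The following is only an illustrative sketch, not part of the proof: it assumes $\boldsymbol{L}=\overline{\boldsymbol{U}}^{\top}\overline{\boldsymbol{U}}^{*}$ (as the display after (89) suggests), uses random stand-ins for $\boldsymbol{W}$, $\boldsymbol{u}_{1}^{*}$, $\overline{\boldsymbol{U}}$ and $\overline{\boldsymbol{U}}^{*}$, and the helper names `two_to_inf`, `rank_one_check` and `sin_theta_check` are hypothetical.

```python
import numpy as np

def two_to_inf(M):
    """Largest Euclidean row norm of M, i.e. the (2, infinity) norm."""
    return np.linalg.norm(np.atleast_2d(M), axis=1).max()

def rank_one_check(n=60, r=4, seed=0):
    """Compare both sides of the rank-one identity (89) on random inputs."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)                 # unit vector standing in for u_1^*
    G = rng.standard_normal((n, n))
    W = (G + G.T) / 2                      # symmetric stand-in for the noise W
    M = rng.standard_normal((n, r))        # stand-in for U L - U^*
    P = np.outer(u, u)
    lhs = two_to_inf(P @ W @ P @ M)
    rhs = np.abs(u).max() * abs(u @ W @ u) * np.linalg.norm(u @ M)
    return lhs, rhs

def sin_theta_check(n=60, r=4, seed=1):
    """Compare ||U L - U^*||, ||U U^T - U^* U^*^T|| and the largest principal sine."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((n, r)))       # orthonormal columns
    U_star, _ = np.linalg.qr(rng.standard_normal((n, r)))  # orthonormal columns
    L = U.T @ U_star                                       # assumed form of L
    cosines = np.clip(np.linalg.svd(L, compute_uv=False), 0.0, 1.0)
    sin_max = np.sqrt(1.0 - cosines.min() ** 2)
    residual = np.linalg.norm(U @ L - U_star, 2)
    proj_diff = np.linalg.norm(U @ U.T - U_star @ U_star.T, 2)
    return residual, proj_diff, sin_max

lhs, rhs = rank_one_check()
residual, proj_diff, sin_max = sin_theta_check()
print(lhs, rhs)
print(residual, proj_diff, sin_max)
```

Under these assumptions the two numbers on the first printed line should agree up to rounding, and so should the three numbers on the second line.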

8.9 Proof of Lemma 10

Proof.

Since $\overline{\boldsymbol{X}}\overline{\boldsymbol{U}}^{*}=\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{\Lambda}}^{*}+\overline{\boldsymbol{W}}\overline{\boldsymbol{U}}^{*}$, we consider the following decomposition

\begin{align*}
\sigma_{\textbf{min}}^{*}\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty} &\leq\left\|\overline{\boldsymbol{U}}\boldsymbol{L}\overline{\boldsymbol{\Lambda}}^{*}-\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{\Lambda}}^{*}\right\|_{2,\infty} \\
&\leq\left\|\overline{\boldsymbol{U}}\overline{\boldsymbol{\Lambda}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\overline{\boldsymbol{\Lambda}}^{*}\right\|_{2,\infty}+\left\|\overline{\boldsymbol{U}}\boldsymbol{L}\overline{\boldsymbol{\Lambda}}^{*}-\overline{\boldsymbol{U}}\overline{\boldsymbol{\Lambda}}\boldsymbol{L}\right\|_{2,\infty} \\
&\leq\left\|\overline{\boldsymbol{U}}\overline{\boldsymbol{\Lambda}}\boldsymbol{L}-\overline{\boldsymbol{X}}\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}+\left\|\overline{\boldsymbol{W}}\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}+\left\|\overline{\boldsymbol{U}}\boldsymbol{L}\overline{\boldsymbol{\Lambda}}^{*}-\overline{\boldsymbol{U}}\overline{\boldsymbol{\Lambda}}\boldsymbol{L}\right\|_{2,\infty}. \tag{94}
\end{align*}

Since the upper bound on $\left\|\overline{\boldsymbol{U}}\overline{\boldsymbol{\Lambda}}\boldsymbol{L}-\overline{\boldsymbol{X}}\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}$ follows from Lemma 9, we only have to deal with the second and third quantities. We begin with the third quantity. By Lemma 7, since $\sigma_{K-1}(\boldsymbol{L})\geq 1/2$, we have

\begin{align*}
\left\|\overline{\boldsymbol{U}}\boldsymbol{L}\overline{\boldsymbol{\Lambda}}^{*}-\overline{\boldsymbol{U}}\overline{\boldsymbol{\Lambda}}\boldsymbol{L}\right\|_{2,\infty} &\leq\left\|\overline{\boldsymbol{U}}\right\|_{2,\infty}\left\|\boldsymbol{L}\overline{\boldsymbol{\Lambda}}^{*}-\overline{\boldsymbol{\Lambda}}\boldsymbol{L}\right\|\lesssim\left\|\overline{\boldsymbol{U}}\boldsymbol{L}\right\|_{2,\infty}\left\|\boldsymbol{L}\overline{\boldsymbol{\Lambda}}^{*}-\overline{\boldsymbol{\Lambda}}\boldsymbol{L}\right\| \\
&\leq\left(\left\|\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}+\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}\right)\left\|\boldsymbol{L}\overline{\boldsymbol{\Lambda}}^{*}-\overline{\boldsymbol{\Lambda}}\boldsymbol{L}\right\| \\
&\leq\left(\sqrt{\frac{(K-1)\mu^{*}}{n}}+\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}\right)\left\|\boldsymbol{L}\overline{\boldsymbol{\Lambda}}^{*}-\overline{\boldsymbol{\Lambda}}\boldsymbol{L}\right\|, \tag{95}
\end{align*}

where the last inequality follows from (12). In addition,

\begin{align*}
\left\|\boldsymbol{L}\overline{\boldsymbol{\Lambda}}^{*}-\overline{\boldsymbol{\Lambda}}\boldsymbol{L}\right\| &=\left\|\left(\boldsymbol{L}^{\top}\right)^{-1}\left(\boldsymbol{L}^{\top}\boldsymbol{L}\overline{\boldsymbol{\Lambda}}^{*}-\boldsymbol{L}^{\top}\overline{\boldsymbol{\Lambda}}\boldsymbol{L}\right)\right\|\lesssim\left\|\boldsymbol{L}^{\top}\boldsymbol{L}\overline{\boldsymbol{\Lambda}}^{*}-\boldsymbol{L}^{\top}\overline{\boldsymbol{\Lambda}}\boldsymbol{L}\right\| \\
&\leq\left\|\overline{\boldsymbol{\Lambda}}^{*}-\boldsymbol{L}^{\top}\overline{\boldsymbol{\Lambda}}\boldsymbol{L}\right\|+\left\|\boldsymbol{L}^{\top}\boldsymbol{L}\overline{\boldsymbol{\Lambda}}^{*}-\overline{\boldsymbol{\Lambda}}^{*}\right\| \\
&\lesssim\frac{\theta_{\text{max}}^{3}n^{1.5}}{\sigma_{\textbf{min}}^{*2}}+\sqrt{(K-1)\log n}\theta_{\text{max}}+\sigma_{\textbf{max}}^{*}\left\|\boldsymbol{L}^{\top}\boldsymbol{L}-\boldsymbol{I}\right\|. \tag{96}
\end{align*}

Since $\boldsymbol{R}^{\top}\boldsymbol{R}=\boldsymbol{I}$, by Lemma 7 we have

\begin{align*}
\left\|\boldsymbol{L}^{\top}\boldsymbol{L}-\boldsymbol{I}\right\| =\left\|\boldsymbol{L}^{\top}\boldsymbol{L}-\boldsymbol{R}^{\top}\boldsymbol{R}\right\|\leq\left\|\boldsymbol{L}^{\top}\left(\boldsymbol{L}-\boldsymbol{R}\right)\right\|+\left\|\left(\boldsymbol{L}-\boldsymbol{R}\right)^{\top}\boldsymbol{R}\right\|\lesssim\left\|\boldsymbol{L}-\boldsymbol{R}\right\|\lesssim\frac{n\theta_{\text{max}}^{2}}{\sigma_{\textbf{min}}^{*2}}. \tag{97}
\end{align*}

Combining (96) and (97), we get

\begin{align*}
\left\|\boldsymbol{L}\overline{\boldsymbol{\Lambda}}^{*}-\overline{\boldsymbol{\Lambda}}\boldsymbol{L}\right\| &\lesssim\frac{\theta_{\text{max}}^{3}n^{1.5}}{\sigma_{\textbf{min}}^{*2}}+\sqrt{(K-1)\log n}\theta_{\text{max}}+\sigma_{\textbf{max}}^{*}\frac{n\theta_{\text{max}}^{2}}{\sigma_{\textbf{min}}^{*2}} \\
&\lesssim\sqrt{(K-1)\log n}\theta_{\text{max}}+\frac{\kappa^{*}n\theta_{\text{max}}^{2}}{\sigma_{\textbf{min}}^{*}}=:\xi_{1} \tag{98}
\end{align*}

since $\sqrt{n}\theta_{\text{max}}\ll\sigma_{\textbf{min}}^{*}\leq\sigma_{\textbf{max}}^{*}$. Therefore, by (95) we have

\begin{align*}
\left\|\overline{\boldsymbol{U}}\boldsymbol{L}\overline{\boldsymbol{\Lambda}}^{*}-\overline{\boldsymbol{U}}\overline{\boldsymbol{\Lambda}}\boldsymbol{L}\right\|_{2,\infty}\lesssim\xi_{1}\left(\sqrt{\frac{(K-1)\mu^{*}}{n}}+\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}\right). \tag{99}
\end{align*}

We now turn to bounding the second summand of (94), namely $\left\|\overline{\boldsymbol{W}}\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}$. Recall the decomposition of $\overline{\boldsymbol{W}}$ from (8.7). By Lemma 15 we have

\begin{align*}
\left\|\boldsymbol{W}\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty} &\lesssim\sqrt{\log n}\theta_{\text{max}}\left\|\overline{\boldsymbol{U}}^{*}\right\|_{F}+\log n\left\|\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty} \\
&\lesssim\sqrt{(K-1)\log n}\theta_{\text{max}}+\log n\sqrt{(K-1)\mu^{*}/n} \tag{100}
\end{align*}

with probability at least $1-O(n^{-14})$. Moreover, since $\left(\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right)^{\top}\boldsymbol{N}\overline{\boldsymbol{U}}^{*}=\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{N}\overline{\boldsymbol{U}}^{*}$ is a rank-$1$ matrix, we have

\begin{align*}
\left\|\left(\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right)^{\top}\boldsymbol{N}\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty} &=\left\|\boldsymbol{u}_{1}^{*}\right\|_{\infty}\left\|\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{N}\overline{\boldsymbol{U}}^{*}\right\|_{2}\leq\left\|\boldsymbol{u}_{1}^{*}\right\|_{\infty}\left\|\boldsymbol{u}_{1}^{*}\right\|_{2}\left\|\boldsymbol{W}\right\|\left\|\boldsymbol{N}\right\|\left\|\overline{\boldsymbol{U}}^{*}\right\| \\
&\lesssim\sqrt{\frac{\mu^{*}}{n}}\sqrt{n}\theta_{\text{max}}=\sqrt{\mu^{*}}\theta_{\text{max}}. \tag{101}
\end{align*}

Furthermore, since $\left\|\boldsymbol{\Delta}\right\|_{2,\infty}\leq\left\|\boldsymbol{\Delta}\right\|$, we have via (12),

\begin{align*}
\left\|\boldsymbol{\Delta}\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}\leq\left\|\boldsymbol{\Delta}\right\|_{2,\infty}\left\|\overline{\boldsymbol{U}}^{*}\right\|\leq\left\|\boldsymbol{\Delta}\right\|\left\|\overline{\boldsymbol{U}}^{*}\right\|\lesssim\frac{n\theta_{\text{max}}^{2}}{\lambda_{1}^{*}}. \tag{102}
\end{align*}

Finally, since $\boldsymbol{u}_{1}^{*\top}\overline{\boldsymbol{U}}^{*}=\boldsymbol{0}$, we know that $\left\|\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}=\left\|\boldsymbol{N}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}=0$. This, along with (100), (101) and (102), yields

\begin{align*}
\left\|\overline{\boldsymbol{W}}\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}\lesssim{}& \sqrt{(K-1)\log n}\theta_{\text{max}}+\log n\sqrt{(K-1)\mu^{*}/n}+\sqrt{\mu^{*}}\theta_{\text{max}}+\frac{n\theta_{\text{max}}^{2}}{\lambda_{1}^{*}} \\
=:{}& \xi_{2}. \tag{103}
\end{align*}

Combining (94), (99), (103) and Lemma 9 we have

\begin{align*}
\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}\lesssim{}& \left(\frac{\xi_{1}+\log n}{\sigma_{\textbf{min}}^{*}}\right)\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}+\frac{\xi_{1}}{\sigma_{\textbf{min}}^{*}}\sqrt{\frac{(K-1)\mu^{*}}{n}}+\frac{\xi_{2}}{\sigma_{\textbf{min}}^{*}} \\
&+\frac{\kappa^{*}}{\sigma_{\textbf{min}}^{*2}}\sqrt{(K-1)\mu^{*}n}\theta_{\text{max}}^{2}+\frac{\sqrt{(K-1)n\log n}}{\sigma_{\textbf{min}}^{*2}}\theta_{\text{max}} \\
&+\frac{(\sqrt{(K-1)n}\theta_{\text{max}}+\log n)n\theta_{\text{max}}^{2}}{\lambda_{1}^{*}\sigma_{\textbf{min}}^{*2}}+\frac{(\sqrt{n}\theta_{\text{max}}+\log n)^{2}\sqrt{(K-1)\mu^{*}/n}}{\sigma_{\textbf{min}}^{*2}} \\
\lesssim{}& \left(\frac{\xi_{1}+\log n}{\sigma_{\textbf{min}}^{*}}\right)\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}+\frac{\xi_{2}}{\sigma_{\textbf{min}}^{*}} \\
&+\frac{\kappa^{*}}{\sigma_{\textbf{min}}^{*2}}\sqrt{(K-1)\mu^{*}n}\theta_{\text{max}}^{2}+\frac{\sqrt{(K-1)n\log n}}{\sigma_{\textbf{min}}^{*2}}\theta_{\text{max}}+\frac{\sqrt{K-1}n^{1.5}\theta_{\text{max}}^{3}}{\lambda_{1}^{*}\sigma_{\textbf{min}}^{*2}}
\end{align*}

since $n\gtrsim\mu^{*}\max\{\log^{2}n,K-1\}$. When we further have $\xi_{1}/\sigma_{\textbf{min}}^{*}\ll 1$, the first summand is negligible and we obtain the desired conclusion. ∎
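
The final absorption step can be spelled out as follows. This is a sketch in shorthand introduced here only for illustration: write $x=\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}$, $a=(\xi_{1}+\log n)/\sigma_{\textbf{min}}^{*}$, let $b$ collect the remaining summands on the right-hand side, and let $C>0$ denote the implicit constant in $\lesssim$. Since $\log n\ll\sigma_{\textbf{min}}^{*}$ and $\xi_{1}/\sigma_{\textbf{min}}^{*}\ll 1$, we may take $Ca\leq 1/2$ for $n$ large enough, so that

\begin{align*}
x\leq C\left(ax+b\right)\quad\Longrightarrow\quad x\left(1-Ca\right)\leq Cb\quad\Longrightarrow\quad x\leq\frac{Cb}{1-Ca}\leq 2Cb,
\end{align*}

that is, $x\lesssim b$.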

8.10 Proof of Lemma 11

Proof.

From (61) we know that

\begin{align*}
\boldsymbol{\Delta}={}& \frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{X}\right)^{-1}\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}d\lambda \\
&+\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\boldsymbol{H}\left(\lambda\boldsymbol{I}-\boldsymbol{X}\right)^{-1}\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}d\lambda.
\end{align*}

We define the following four matrices:

\begin{align*}
\boldsymbol{\Delta}_{1} &=\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}, \\
\boldsymbol{\Delta}_{2} &=\boldsymbol{W}\left[\left(\lambda\boldsymbol{I}-\boldsymbol{X}\right)^{-1}-\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\right]\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1} \\
&=\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{X}\right)^{-1}\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}, \\
\boldsymbol{\Delta}_{3} &=\boldsymbol{H}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}, \\
\boldsymbol{\Delta}_{4} &=\boldsymbol{H}\left[\left(\lambda\boldsymbol{I}-\boldsymbol{X}\right)^{-1}-\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\right]\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1} \\
&=\boldsymbol{H}\left(\lambda\boldsymbol{I}-\boldsymbol{X}\right)^{-1}\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}.
\end{align*}
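
The second expressions for $\boldsymbol{\Delta}_{2}$ and $\boldsymbol{\Delta}_{4}$ rest on the resolvent identity $\left(\lambda\boldsymbol{I}-\boldsymbol{X}\right)^{-1}-\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}=\left(\lambda\boldsymbol{I}-\boldsymbol{X}\right)^{-1}\boldsymbol{W}\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}$, which holds when $\boldsymbol{X}=\boldsymbol{H}+\boldsymbol{W}$. A minimal numerical sketch of that identity is given below; the relation $\boldsymbol{X}=\boldsymbol{H}+\boldsymbol{W}$ is an assumption inferred from the displayed equalities, the matrices are random stand-ins, and the function name `resolvent_gap` is hypothetical.

```python
import numpy as np

def resolvent_gap(n=8, lam=50.0 + 3.0j, seed=0):
    """Max entrywise gap between the two sides of the resolvent identity."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n))
    H = (A + A.T) / 2                  # symmetric stand-in for H
    B = rng.standard_normal((n, n))
    W = 0.1 * (B + B.T) / 2            # small symmetric stand-in for W
    X = H + W                          # assumed relation X = H + W
    I = np.eye(n)
    # lam has a nonzero imaginary part, so both resolvents exist
    # (real symmetric matrices have real eigenvalues).
    RX = np.linalg.inv(lam * I - X)
    RH = np.linalg.inv(lam * I - H)
    return np.abs((RX - RH) - RX @ W @ RH).max()

print(resolvent_gap())
```

The printed gap should be at the level of floating-point rounding error.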

Control $\left\|\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\boldsymbol{\Delta}_{1}d\lambda\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}$: For $2\leq i\leq K$, one can show that

\begin{align*}
\left[\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\boldsymbol{\Delta}_{1}d\lambda\overline{\boldsymbol{U}}^{*}\right]_{\cdot,i-1} &=\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\sum_{j,k=1}^{n}\frac{1}{(\lambda-\lambda_{j}^{*})(\lambda-\lambda_{k}^{*})}\boldsymbol{W}\boldsymbol{u}_{j}^{*}\boldsymbol{u}_{j}^{*\top}\boldsymbol{W}\boldsymbol{u}_{k}^{*}\boldsymbol{u}_{k}^{*\top}\boldsymbol{u}_{i}^{*}d\lambda \\
&=\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\sum_{j=1}^{n}\frac{1}{(\lambda-\lambda_{j}^{*})(\lambda-\lambda_{i}^{*})}\boldsymbol{W}\boldsymbol{u}_{j}^{*}\boldsymbol{u}_{j}^{*\top}\boldsymbol{W}\boldsymbol{u}_{i}^{*}d\lambda \\
&=\sum_{j=1}^{n}\textbf{Res}\left(\frac{1}{(\lambda-\lambda_{j}^{*})(\lambda-\lambda_{i}^{*})},\lambda_{1}^{*}\right)\boldsymbol{W}\boldsymbol{u}_{j}^{*}\boldsymbol{u}_{j}^{*\top}\boldsymbol{W}\boldsymbol{u}_{i}^{*} \\
&=\frac{1}{\lambda_{1}^{*}-\lambda_{i}^{*}}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{i}^{*}=\frac{\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{i}^{*}}{\lambda_{1}^{*}-\lambda_{i}^{*}}\boldsymbol{W}\boldsymbol{u}_{1}^{*}.
\end{align*}

As a result, we have

\begin{align*}
\left\|\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\boldsymbol{\Delta}_{1}d\lambda\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}=\left\|\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right\|_{\infty}\sqrt{\sum_{i=2}^{K}\left(\frac{\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{i}^{*}}{\lambda_{1}^{*}-\lambda_{i}^{*}}\right)^{2}}.
\end{align*}

By Lemma 15, we know that $\left\|\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right\|_{\infty}\lesssim\sqrt{\log n}\theta_{\text{max}}+\log n\sqrt{\mu^{*}/n}$ with probability at least $1-O(n^{-14})$. For each $2\leq i\leq K$, by Lemma 18 we know that $\left|\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{i}^{*}\right|\lesssim\sqrt{\log n}\theta_{\text{max}}+\log n\sqrt{\mu^{*}/n}$ with probability at least $1-O(n^{-15})$. As a result, we know that

\begin{align*}
\left\|\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\boldsymbol{\Delta}_{1}d\lambda\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty} \lesssim\left(\sqrt{\log n}\theta_{\text{max}}+\log n\sqrt{\mu^{*}/n}\right)^{2}\frac{\sqrt{K-1}}{\lambda_{1}^{*}} \tag{104}
\end{align*}

with probability at least $1-O(n^{-14})$.
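
The residue step in the control of $\boldsymbol{\Delta}_{1}$ keeps only the $j=1$ term, because for $i\geq 2$ the integrand $1/\left((\lambda-\lambda_{j}^{*})(\lambda-\lambda_{i}^{*})\right)$ has a pole inside $\mathcal{C}_{1}$ only when $j=1$. The sketch below illustrates this with a numerical contour integral. It assumes that $\mathcal{C}_{1}$ is a circle around $\lambda_{1}^{*}$ enclosing no other eigenvalue; the eigenvalues, the radius, and the helper names `contour_average` and `pair_integral` are hypothetical choices made for illustration.

```python
import numpy as np

def contour_average(f, center, radius, m=4096):
    """Trapezoid-rule value of (1/(2*pi*i)) * (integral of f) over a circle."""
    t = 2.0 * np.pi * np.arange(m) / m
    offset = radius * np.exp(1j * t)       # z - center along the circle
    # dz = i * offset * dt, so the prefactor 1/(2*pi*i) leaves a plain average.
    return np.mean(f(center + offset) * offset)

def pair_integral(lam_j, lam_i, lam_1, radius):
    """Contour integral of 1/((z - lam_j)(z - lam_i)) around lam_1."""
    return contour_average(lambda z: 1.0 / ((z - lam_j) * (z - lam_i)), lam_1, radius)

lam = np.array([10.0, 3.0, 2.0, -1.0])     # hypothetical eigenvalues; lam[0] plays lambda_1^*
radius = 2.0                               # small enough that only lam[0] is enclosed
i = 1                                      # a fixed index with lam[i] != lam[0]
values = [pair_integral(lam[j], lam[i], lam[0], radius) for j in range(len(lam))]
print(values[0], 1.0 / (lam[0] - lam[i]))  # j = 1 term versus 1/(lambda_1 - lambda_i)
print(np.abs(values[1:]).max())            # all other terms vanish
```

With these choices the first term should match $1/(\lambda_{1}^{*}-\lambda_{i}^{*})$ and the remaining terms should be zero up to rounding.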

Control $\left\|\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\boldsymbol{\Delta}_{2}d\lambda\overline{\boldsymbol{U}}^{*}\right\|$: By definition we have

\begin{align*}
\left\|\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\boldsymbol{\Delta}_{2}d\lambda\overline{\boldsymbol{U}}^{*}\right\| &\leq\left\|\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\boldsymbol{\Delta}_{2}d\lambda\right\|\leq\frac{1}{2\pi}\oint_{\mathcal{C}_{1}}\left\|\boldsymbol{\Delta}_{2}\right\|d\lambda \\
&\leq\frac{1}{2\pi}\oint_{\mathcal{C}_{1}}\left\|\boldsymbol{W}\right\|^{3}\left\|\left(\lambda\boldsymbol{I}-\boldsymbol{X}\right)^{-1}\right\|\left\|\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\right\|^{2}d\lambda \\
&\lesssim\oint_{\mathcal{C}_{1}}\frac{(\sqrt{n}\theta_{\text{max}})^{3}}{\lambda_{1}^{*3}}d\lambda\lesssim\frac{n^{1.5}\theta_{\text{max}}^{3}r}{\lambda_{1}^{*3}}\lesssim\frac{n^{1.5}\theta_{\text{max}}^{3}}{\lambda_{1}^{*2}}. \tag{105}
\end{align*}

Control 12πi𝒞1𝚫3𝑑λ𝑼¯2,\left\|\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\boldsymbol{\Delta}_{3}d\lambda\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}: For 2iK2\leq i\leq K, one can show that

[12πi𝒞1𝚫3𝑑λ𝑼¯],i1=12πi𝒞1j,k,l=1nλj(λλj)(λλk)(λλl)𝒖j𝒖j𝑾𝒖k𝒖k𝑾𝒖l𝒖l𝒖idλ\displaystyle\left[\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\boldsymbol{\Delta}_{3}d\lambda\overline{\boldsymbol{U}}^{*}\right]_{\cdot,i-1}=\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\sum_{j,k,l=1}^{n}\frac{\lambda_{j}^{*}}{(\lambda-\lambda_{j}^{*})(\lambda-\lambda_{k}^{*})(\lambda-\lambda_{l}^{*})}\boldsymbol{u}_{j}^{*}\boldsymbol{u}_{j}^{*\top}\boldsymbol{W}\boldsymbol{u}_{k}^{*}\boldsymbol{u}_{k}^{*\top}\boldsymbol{W}\boldsymbol{u}_{l}\boldsymbol{u}_{l}^{*\top}\boldsymbol{u}_{i}^{*}d\lambda
=\displaystyle= 12πi𝒞1j,k=1nλj(λλj)(λλk)(λλi)𝒖j𝒖j𝑾𝒖k𝒖k𝑾𝒖idλ\displaystyle\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\sum_{j,k=1}^{n}\frac{\lambda_{j}^{*}}{(\lambda-\lambda_{j}^{*})(\lambda-\lambda_{k}^{*})(\lambda-\lambda_{i}^{*})}\boldsymbol{u}_{j}^{*}\boldsymbol{u}_{j}^{*\top}\boldsymbol{W}\boldsymbol{u}_{k}^{*}\boldsymbol{u}_{k}^{*\top}\boldsymbol{W}\boldsymbol{u}_{i}^{*}d\lambda
=\displaystyle= j,k=1nRes(λj(λλj)(λλk)(λλi),λ1)𝒖j𝒖j𝑾𝒖k𝒖k𝑾𝒖i\displaystyle\sum_{j,k=1}^{n}\textbf{Res}\left(\frac{\lambda_{j}^{*}}{(\lambda-\lambda_{j}^{*})(\lambda-\lambda_{k}^{*})(\lambda-\lambda_{i}^{*})},\lambda_{1}^{*}\right)\boldsymbol{u}_{j}^{*}\boldsymbol{u}_{j}^{*\top}\boldsymbol{W}\boldsymbol{u}_{k}^{*}\boldsymbol{u}_{k}^{*\top}\boldsymbol{W}\boldsymbol{u}_{i}^{*}
=\displaystyle= j=2nλj(λ1λj)(λ1λi)𝒖j𝒖j𝑾𝒖1𝒖1𝑾𝒖i\displaystyle\sum_{j=2}^{n}\frac{\lambda_{j}^{*}}{(\lambda_{1}^{*}-\lambda_{j}^{*})(\lambda_{1}^{*}-\lambda_{i}^{*})}\boldsymbol{u}_{j}^{*}\boldsymbol{u}_{j}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{i}^{*}
+j=2nλ1(λ1λj)(λ1λi)𝒖1𝒖1𝑾𝒖j𝒖j𝑾𝒖i\displaystyle+\sum_{j=2}^{n}\frac{\lambda_{1}^{*}}{(\lambda_{1}^{*}-\lambda_{j}^{*})(\lambda_{1}^{*}-\lambda_{i}^{*})}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{j}^{*}\boldsymbol{u}_{j}^{*\top}\boldsymbol{W}\boldsymbol{u}_{i}^{*}
λ1(λ1λi)2𝒖1𝒖1𝑾𝒖1𝒖1𝑾𝒖i\displaystyle-\frac{\lambda_{1}^{*}}{(\lambda_{1}^{*}-\lambda_{i}^{*})^{2}}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{i}^{*}
=\displaystyle= 1λ1λi(𝑵𝑰)𝑾𝒖1𝒖1𝑾𝒖i+1λ1λi𝒖1𝒖1𝑾𝑵𝑾𝒖i\displaystyle\frac{1}{\lambda_{1}^{*}-\lambda_{i}^{*}}\left(\boldsymbol{N}-\boldsymbol{I}\right)\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{i}^{*}+\frac{1}{\lambda_{1}^{*}-\lambda_{i}^{*}}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{N}\boldsymbol{W}\boldsymbol{u}_{i}^{*}
λ1(λ1λi)2𝒖1𝒖1𝑾𝒖1𝒖1𝑾𝒖i.\displaystyle-\frac{\lambda_{1}^{*}}{(\lambda_{1}^{*}-\lambda_{i}^{*})^{2}}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{i}^{*}.
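The residue step above can be checked numerically. The following is a minimal sketch, not part of the original proof: it integrates the scalar factor of the integrand over a small circle around $\lambda_{1}^{*}$ with the trapezoid rule and compares the result with the closed-form residues. The eigenvalues `lam1`, `lam_a`, `lam_b` and the helper names are hypothetical and chosen only for illustration.

```python
import cmath
import math


def contour_average(f, center, radius, steps=400):
    """Approximate (1 / (2*pi*i)) times the integral of f over a circle.

    Uses the trapezoid rule in the angle t; with z = center + radius * e^{it}
    we have dz = i * radius * e^{it} dt, so the factor i cancels.
    """
    total = 0j
    for s in range(steps):
        phase = cmath.exp(2j * math.pi * s / steps)
        total += f(center + radius * phase) * radius * phase
    return total / steps


def kernel(lam_j, lam_k, lam_i):
    """Scalar factor lam_j / ((z - lam_j)(z - lam_k)(z - lam_i)) of the integrand."""
    return lambda z: lam_j / ((z - lam_j) * (z - lam_k) * (z - lam_i))


# Hypothetical eigenvalues: only lam1 lies inside the contour.
lam1, lam_a, lam_b = 10.0, 3.0, -2.0

# j >= 2, k = 1: simple pole at lam1.
case_j = contour_average(kernel(lam_a, lam1, lam_b), lam1, 1.0)
# j = 1, k >= 2: simple pole at lam1.
case_k = contour_average(kernel(lam1, lam_a, lam_b), lam1, 1.0)
# j = k = 1: double pole at lam1, which is where the minus sign comes from.
case_jk = contour_average(kernel(lam1, lam1, lam_b), lam1, 1.0)
# j, k >= 2: no pole inside the contour, so the term vanishes.
case_none = contour_average(kernel(lam_a, lam_a, lam_b), lam1, 1.0)
```

The three non-zero cases correspond to the three sums in the display, and the double-pole case explains the squared denominator $(\lambda_{1}^{*}-\lambda_{i}^{*})^{2}$.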

Let

\begin{align*}
\boldsymbol{C}_{1} &=\textbf{diag}\left(\frac{1}{\lambda_{1}^{*}-\lambda_{2}^{*}},\frac{1}{\lambda_{1}^{*}-\lambda_{3}^{*}},\dots,\frac{1}{\lambda_{1}^{*}-\lambda_{K}^{*}}\right);\\
\boldsymbol{C}_{2} &=\textbf{diag}\left(\frac{\lambda_{1}^{*}}{(\lambda_{1}^{*}-\lambda_{2}^{*})^{2}},\frac{\lambda_{1}^{*}}{(\lambda_{1}^{*}-\lambda_{3}^{*})^{2}},\dots,\frac{\lambda_{1}^{*}}{(\lambda_{1}^{*}-\lambda_{K}^{*})^{2}}\right).
\end{align*}

Then we have

\begin{align*}
\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\boldsymbol{\Delta}_{3}d\lambda\overline{\boldsymbol{U}}^{*}={}&\left(\boldsymbol{N}-\boldsymbol{I}\right)\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\overline{\boldsymbol{U}}^{*}\boldsymbol{C}_{1}+\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{N}\boldsymbol{W}\overline{\boldsymbol{U}}^{*}\boldsymbol{C}_{1}\\
&-\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\overline{\boldsymbol{U}}^{*}\boldsymbol{C}_{2}. \tag{106}
\end{align*}

Note that $\boldsymbol{N}=\lambda_{1}^{*}\boldsymbol{N}_{1}$ for $\boldsymbol{N}_{1}$ defined in (50). In the proof of Lemma 12 we actually show that $\left\|\boldsymbol{N}-\boldsymbol{I}\right\|_{2,\infty}\lesssim\sqrt{\frac{(K-1)\mu^{*}}{n}}$, which implies

\begin{align*}
\left\|\left(\boldsymbol{N}-\boldsymbol{I}\right)\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\overline{\boldsymbol{U}}^{*}\boldsymbol{C}_{1}\right\|_{2,\infty} &\leq\left\|\boldsymbol{N}-\boldsymbol{I}\right\|_{2,\infty}\left\|\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\overline{\boldsymbol{U}}^{*}\boldsymbol{C}_{1}\right\|\\
&\leq\left\|\boldsymbol{N}-\boldsymbol{I}\right\|_{2,\infty}\left\|\boldsymbol{W}\right\|^{2}\left\|\boldsymbol{C}_{1}\right\|\\
&\lesssim\sqrt{\frac{(K-1)\mu^{*}}{n}}\frac{n\theta_{\text{max}}^{2}}{\lambda_{1}^{*}}=\sqrt{\frac{(K-1)\mu^{*}n}{\lambda_{1}^{*2}}}\theta_{\text{max}}^{2}. \tag{107}
\end{align*}
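The first inequality in the display uses the standard fact $\left\|\boldsymbol{A}\boldsymbol{B}\right\|_{2,\infty}\leq\left\|\boldsymbol{A}\right\|_{2,\infty}\left\|\boldsymbol{B}\right\|$, which holds because each row of $\boldsymbol{A}\boldsymbol{B}$ is a row of $\boldsymbol{A}$ multiplied by $\boldsymbol{B}$. A small numerical illustration is sketched below; the matrices and helper names are hypothetical, and the spectral norm is computed in closed form only for the $2\times 2$ case.

```python
import math


def two_to_infinity_norm(matrix):
    """Largest Euclidean norm among the rows of a matrix given as a list of rows."""
    return max(math.sqrt(sum(x * x for x in row)) for row in matrix)


def spectral_norm_2x2(b):
    """Spectral norm of a 2x2 matrix via the largest eigenvalue of b^T b."""
    g11 = b[0][0] ** 2 + b[1][0] ** 2
    g22 = b[0][1] ** 2 + b[1][1] ** 2
    g12 = b[0][0] * b[0][1] + b[1][0] * b[1][1]
    trace, det = g11 + g22, g11 * g22 - g12 * g12
    top_eig = (trace + math.sqrt(max(trace * trace - 4.0 * det, 0.0))) / 2.0
    return math.sqrt(top_eig)


def matmul(a, b):
    """Plain matrix product for lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


# Hypothetical test matrices: A is 3x2, B is 2x2.
A = [[1.0, 2.0], [0.5, -1.5], [-2.0, 0.25]]
B = [[0.3, -0.7], [1.1, 0.4]]

lhs = two_to_infinity_norm(matmul(A, B))            # ||A B||_{2,inf}
rhs = two_to_infinity_norm(A) * spectral_norm_2x2(B)  # ||A||_{2,inf} * ||B||
```

Here `lhs` never exceeds `rhs`, in line with how $\boldsymbol{N}-\boldsymbol{I}$ is separated from the remaining factors in the proof.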

For $\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{N}\boldsymbol{W}\overline{\boldsymbol{U}}^{*}\boldsymbol{C}_{1}$ and $\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\overline{\boldsymbol{U}}^{*}\boldsymbol{C}_{2}$, since both of them are rank-$1$ matrices, we have, using $\left\|\boldsymbol{N}\right\|\lesssim 1$ from Lemma 12,

\begin{align*}
\left\|\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{N}\boldsymbol{W}\overline{\boldsymbol{U}}^{*}\boldsymbol{C}_{1}\right\|_{2,\infty} &=\left\|\boldsymbol{u}_{1}^{*}\right\|_{\infty}\left\|\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{N}\boldsymbol{W}\overline{\boldsymbol{U}}^{*}\boldsymbol{C}_{1}\right\|\\
&\leq\left\|\boldsymbol{u}_{1}^{*}\right\|_{\infty}\left\|\boldsymbol{W}\right\|^{2}\left\|\boldsymbol{N}\right\|\left\|\boldsymbol{C}_{1}\right\|\lesssim\sqrt{\frac{\mu^{*}}{n}}\frac{n\theta_{\text{max}}^{2}}{\lambda_{1}^{*}}=\sqrt{\frac{\mu^{*}n}{\lambda_{1}^{*2}}}\theta_{\text{max}}^{2} \tag{108}
\end{align*}

and

\begin{align*}
\left\|\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\overline{\boldsymbol{U}}^{*}\boldsymbol{C}_{2}\right\|_{2,\infty} &=\left\|\boldsymbol{u}_{1}^{*}\right\|_{\infty}\left\|\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\overline{\boldsymbol{U}}^{*}\boldsymbol{C}_{2}\right\|\\
&\leq\left\|\boldsymbol{u}_{1}^{*}\right\|_{\infty}\left\|\boldsymbol{W}\right\|^{2}\left\|\boldsymbol{N}\right\|\left\|\boldsymbol{C}_{2}\right\|\lesssim\sqrt{\frac{\mu^{*}n}{\lambda_{1}^{*2}}}\theta_{\text{max}}^{2}. \tag{109}
\end{align*}
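The equalities in the two rank-one bounds rely on the identity $\left\|\boldsymbol{u}\boldsymbol{v}^{\top}\right\|_{2,\infty}=\left\|\boldsymbol{u}\right\|_{\infty}\left\|\boldsymbol{v}\right\|_{2}$: row $i$ of $\boldsymbol{u}\boldsymbol{v}^{\top}$ is $u_{i}\boldsymbol{v}^{\top}$, whose Euclidean norm is $|u_{i}|\left\|\boldsymbol{v}\right\|_{2}$. A minimal sketch with hypothetical vectors `u` and `v`:

```python
import math


def two_to_infinity_norm(matrix):
    """Largest Euclidean norm among the rows of a matrix given as a list of rows."""
    return max(math.sqrt(sum(x * x for x in row)) for row in matrix)


def outer(u, v):
    """Rank-one matrix u v^T as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]


# Hypothetical vectors, chosen so that the norms are easy to read off.
u = [0.5, -0.1, 0.3]
v = [1.0, 2.0, -2.0]

lhs = two_to_infinity_norm(outer(u, v))                            # ||u v^T||_{2,inf}
rhs = max(abs(x) for x in u) * math.sqrt(sum(x * x for x in v))  # ||u||_inf * ||v||_2
```

In the proof, $\boldsymbol{u}=\boldsymbol{u}_{1}^{*}$ and $\boldsymbol{v}^{\top}$ is the row vector multiplying it, so the incoherence bound on $\left\|\boldsymbol{u}_{1}^{*}\right\|_{\infty}$ enters directly.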

Now, combining (106), (107), (108) and (109) yields

\begin{align*}
\Big\|\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\boldsymbol{\Delta}_{3}d\lambda\overline{\boldsymbol{U}}^{*}\Big\|\lesssim\sqrt{\frac{(K-1)\mu^{*}n}{\lambda_{1}^{*2}}}\theta_{\text{max}}^{2}. \tag{110}
\end{align*}

Control $\left\|\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\boldsymbol{\Delta}_{4}d\lambda\overline{\boldsymbol{U}}^{*}\right\|$: By definition we have

\begin{align*}
\Big\|\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\boldsymbol{\Delta}_{4}d\lambda\overline{\boldsymbol{U}}^{*}\Big\| &\leq\Big\|\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\boldsymbol{\Delta}_{4}d\lambda\Big\|\leq\frac{1}{2\pi}\oint_{\mathcal{C}_{1}}\left\|\boldsymbol{\Delta}_{4}\right\|d\lambda\\
&\leq\frac{1}{2\pi}\oint_{\mathcal{C}_{1}}\left\|\boldsymbol{H}\right\|\|\left(\lambda\boldsymbol{I}-\boldsymbol{X}\right)^{-1}\|\left\|\boldsymbol{W}\right\|^{3}\|\left(\lambda\boldsymbol{I}-\boldsymbol{H}\right)^{-1}\|^{3}d\lambda\\
&\lesssim\oint_{\mathcal{C}_{1}}\frac{\lambda_{1}^{*}(\sqrt{n}\theta_{\text{max}})^{3}}{\lambda_{1}^{*4}}d\lambda\lesssim\frac{n^{1.5}\theta_{\text{max}}^{3}r}{\lambda_{1}^{*3}}\lesssim\frac{n^{1.5}\theta_{\text{max}}^{3}}{\lambda_{1}^{*2}}, \tag{111}
\end{align*}

where the second inequality uses (8.1).

Next, we combine the four bounds obtained above. More precisely, using (104), (105), (110) and (111) together with the triangle inequality, we obtain

\begin{align*}
\left\|\boldsymbol{\Delta}\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}\leq{}&\sum\limits_{i=1}^{4}\left\|\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\boldsymbol{\Delta}_{i}d\lambda\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}\\
\leq{}&\sum\limits_{i=1,3}\left\|\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\boldsymbol{\Delta}_{i}d\lambda\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}+\sum\limits_{i=2,4}\left\|\frac{1}{2\pi i}\oint_{\mathcal{C}_{1}}\boldsymbol{\Delta}_{i}d\lambda\overline{\boldsymbol{U}}^{*}\right\|\\
\lesssim{}&\left(\sqrt{\log n}\theta_{\text{max}}+\log n\sqrt{\mu^{*}/n}\right)^{2}\frac{\sqrt{K-1}}{\lambda_{1}^{*}}+\frac{n^{1.5}\theta_{\text{max}}^{3}}{\lambda_{1}^{*2}}+\sqrt{\frac{(K-1)\mu^{*}n}{\lambda_{1}^{*2}}}\theta_{\text{max}}^{2}\\
\lesssim{}&\frac{\sqrt{(K-1)\mu^{*}n}\theta_{\text{max}}^{2}}{\lambda_{1}^{*}}+\frac{\sqrt{K-1}\mu^{*}\log^{2}n}{n\lambda_{1}^{*}}+\frac{n^{1.5}\theta_{\text{max}}^{3}}{\lambda_{1}^{*2}}
\end{align*}

with probability at least $1-O(n^{-10})$. ∎

8.11 Proof of Theorem 2

Proof.

As for the first step, we have

\begin{align*}
\overline{\boldsymbol{X}}\overline{\boldsymbol{U}}^{*}\left(\overline{\boldsymbol{\Lambda}}^{*}\right)^{-1}={}&\left(\overline{\boldsymbol{H}}+\overline{\boldsymbol{W}}\right)\overline{\boldsymbol{U}}^{*}\left(\overline{\boldsymbol{\Lambda}}^{*}\right)^{-1}=\overline{\boldsymbol{U}}^{*}+\overline{\boldsymbol{W}}\overline{\boldsymbol{U}}^{*}\left(\overline{\boldsymbol{\Lambda}}^{*}\right)^{-1}\\
={}&\overline{\boldsymbol{U}}^{*}+\left[\boldsymbol{W}-\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}-\boldsymbol{N}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}-\left(\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right)^{\top}\boldsymbol{N}\right]\overline{\boldsymbol{U}}^{*}\left(\overline{\boldsymbol{\Lambda}}^{*}\right)^{-1}\\
&-\boldsymbol{\Delta}\overline{\boldsymbol{U}}^{*}\left(\overline{\boldsymbol{\Lambda}}^{*}\right)^{-1}\\
={}&\overline{\boldsymbol{U}}^{*}+\left[\boldsymbol{W}-\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{N}\right]\overline{\boldsymbol{U}}^{*}\left(\overline{\boldsymbol{\Lambda}}^{*}\right)^{-1}-\boldsymbol{\Delta}\overline{\boldsymbol{U}}^{*}\left(\overline{\boldsymbol{\Lambda}}^{*}\right)^{-1},
\end{align*}

and

\begin{align*}
\overline{\boldsymbol{U}}\boldsymbol{R} &=\overline{\boldsymbol{U}}\boldsymbol{R}\overline{\boldsymbol{\Lambda}}^{*}\left(\overline{\boldsymbol{\Lambda}}^{*}\right)^{-1}=\left[\overline{\boldsymbol{U}}\overline{\boldsymbol{\Lambda}}\boldsymbol{R}+\overline{\boldsymbol{U}}\left(\boldsymbol{R}\overline{\boldsymbol{\Lambda}}^{*}-\overline{\boldsymbol{\Lambda}}\boldsymbol{R}\right)\right]\left(\overline{\boldsymbol{\Lambda}}^{*}\right)^{-1}\\
&=\left[\overline{\boldsymbol{U}}\overline{\boldsymbol{\Lambda}}\boldsymbol{L}+\overline{\boldsymbol{U}}\overline{\boldsymbol{\Lambda}}\left(\boldsymbol{R}-\boldsymbol{L}\right)+\overline{\boldsymbol{U}}\left(\boldsymbol{R}\overline{\boldsymbol{\Lambda}}^{*}-\overline{\boldsymbol{\Lambda}}\boldsymbol{R}\right)\right]\left(\overline{\boldsymbol{\Lambda}}^{*}\right)^{-1}\\
&=\left[\overline{\boldsymbol{X}}\overline{\boldsymbol{U}}^{*}+\left(\overline{\boldsymbol{U}}\overline{\boldsymbol{\Lambda}}\boldsymbol{L}-\overline{\boldsymbol{X}}\overline{\boldsymbol{U}}^{*}\right)+\overline{\boldsymbol{U}}\overline{\boldsymbol{\Lambda}}\left(\boldsymbol{R}-\boldsymbol{L}\right)+\overline{\boldsymbol{U}}\left(\boldsymbol{R}\overline{\boldsymbol{\Lambda}}^{*}-\overline{\boldsymbol{\Lambda}}\boldsymbol{R}\right)\right]\left(\overline{\boldsymbol{\Lambda}}^{*}\right)^{-1}\\
&=\overline{\boldsymbol{X}}\overline{\boldsymbol{U}}^{*}\left(\overline{\boldsymbol{\Lambda}}^{*}\right)^{-1}+\left[\left(\overline{\boldsymbol{U}}\overline{\boldsymbol{\Lambda}}\boldsymbol{L}-\overline{\boldsymbol{X}}\overline{\boldsymbol{U}}^{*}\right)+\overline{\boldsymbol{U}}\overline{\boldsymbol{\Lambda}}\left(\boldsymbol{R}-\boldsymbol{L}\right)+\overline{\boldsymbol{U}}\left(\boldsymbol{R}\overline{\boldsymbol{\Lambda}}^{*}-\overline{\boldsymbol{\Lambda}}\boldsymbol{R}\right)\right]\left(\overline{\boldsymbol{\Lambda}}^{*}\right)^{-1}.
\end{align*}
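The chain of equalities for $\overline{\boldsymbol{U}}\boldsymbol{R}$ is purely algebraic: the inserted terms telescope, so it holds for arbitrary matrices of compatible sizes. The sketch below checks this on random inputs; all sizes, names and the random draws are hypothetical and serve only as a sanity check, not as an implementation of the estimator.

```python
import random


def matmul(a, b):
    """Plain matrix product for lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def diag(values):
    size = len(values)
    return [[values[i] if i == j else 0.0 for j in range(size)] for i in range(size)]


def decomposition_gap(seed=0, n=4, m=2):
    """Max entrywise gap between U_bar R and the four-term decomposition."""
    rng = random.Random(seed)
    U_bar, U_star = rand_matrix(n, m, rng), rand_matrix(n, m, rng)
    X_bar = rand_matrix(n, n, rng)
    R, L = rand_matrix(m, m, rng), rand_matrix(m, m, rng)
    Lam = diag([rng.uniform(1.0, 2.0) for _ in range(m)])
    lam_star = [rng.uniform(1.0, 2.0) for _ in range(m)]
    Lam_star, Lam_star_inv = diag(lam_star), diag([1.0 / v for v in lam_star])

    XU = matmul(X_bar, U_star)
    ULam = matmul(U_bar, Lam)
    term1 = sub(matmul(ULam, L), XU)                  # U Lam L - X U*
    term2 = matmul(ULam, sub(R, L))                   # U Lam (R - L)
    term3 = matmul(U_bar, sub(matmul(R, Lam_star), matmul(Lam, R)))  # U (R Lam* - Lam R)
    bracket = add(add(add(XU, term1), term2), term3)
    rhs = matmul(bracket, Lam_star_inv)
    lhs = matmul(U_bar, R)
    return max(abs(x - y) for rl, rr in zip(lhs, rhs) for x, y in zip(rl, rr))
```

Calling `decomposition_gap()` with different seeds returns values at the level of floating-point round-off, which reflects that no property of $\boldsymbol{R}$, $\boldsymbol{L}$ or the eigenstructure is needed for this step.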

As a result, we know that

\begin{align*}
\boldsymbol{\Psi}_{\overline{\boldsymbol{U}}}=\left[\left(\overline{\boldsymbol{U}}\overline{\boldsymbol{\Lambda}}\boldsymbol{L}-\overline{\boldsymbol{X}}\overline{\boldsymbol{U}}^{*}\right)+\overline{\boldsymbol{U}}\overline{\boldsymbol{\Lambda}}\left(\boldsymbol{R}-\boldsymbol{L}\right)+\overline{\boldsymbol{U}}\left(\boldsymbol{R}\overline{\boldsymbol{\Lambda}}^{*}-\overline{\boldsymbol{\Lambda}}\boldsymbol{R}\right)-\boldsymbol{\Delta}\overline{\boldsymbol{U}}^{*}\right]\left(\overline{\boldsymbol{\Lambda}}^{*}\right)^{-1}.
\end{align*}

This implies that

\begin{align*}
\left\|\boldsymbol{\Psi}_{\overline{\boldsymbol{U}}}\right\|_{2,\infty}\leq\frac{\left\|\overline{\boldsymbol{U}}\overline{\boldsymbol{\Lambda}}\boldsymbol{L}-\overline{\boldsymbol{X}}\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}+\left\|\overline{\boldsymbol{U}}\overline{\boldsymbol{\Lambda}}\left(\boldsymbol{R}-\boldsymbol{L}\right)\right\|_{2,\infty}+\left\|\overline{\boldsymbol{U}}\left(\boldsymbol{R}\overline{\boldsymbol{\Lambda}}^{*}-\overline{\boldsymbol{\Lambda}}\boldsymbol{R}\right)\right\|_{2,\infty}+\left\|\boldsymbol{\Delta}\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}}{\sigma_{\textbf{min}}^{*}}.
\end{align*}

We now bound the numerator. Note that the last term is already bounded by Lemma 11.

Control $\left\|\overline{\boldsymbol{U}}\overline{\boldsymbol{\Lambda}}\boldsymbol{L}-\overline{\boldsymbol{X}}\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}$: Combining (94) and (99), we know that

\begin{align*}
\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}\leq{}&\frac{1}{\sigma_{\textbf{min}}^{*}}\left\|\overline{\boldsymbol{U}}\overline{\boldsymbol{\Lambda}}\boldsymbol{L}-\overline{\boldsymbol{X}}\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}+\frac{1}{\sigma_{\textbf{min}}^{*}}\left\|\overline{\boldsymbol{W}}\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}+\frac{1}{\sigma_{\textbf{min}}^{*}}\left\|\overline{\boldsymbol{U}}\boldsymbol{L}\overline{\boldsymbol{\Lambda}}^{*}-\overline{\boldsymbol{U}}\overline{\boldsymbol{\Lambda}}\boldsymbol{L}\right\|_{2,\infty}\\
\lesssim{}&\frac{1}{\sigma_{\textbf{min}}^{*}}\left\|\overline{\boldsymbol{U}}\overline{\boldsymbol{\Lambda}}\boldsymbol{L}-\overline{\boldsymbol{X}}\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}+\frac{1}{\sigma_{\textbf{min}}^{*}}\left\|\overline{\boldsymbol{W}}\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}\\
&+\frac{\xi_{1}}{\sigma_{\textbf{min}}^{*}}\left(\sqrt{\frac{(K-1)\mu^{*}}{n}}+\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}\right),
\end{align*}

where $\xi_{1}$ is defined by (8.9). Since we assumed $\xi_{1}/\sigma_{\textbf{min}}^{*}\ll 1$, we further have

\begin{align*}
\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}\lesssim\frac{1}{\sigma_{\textbf{min}}^{*}}\left\|\overline{\boldsymbol{U}}\overline{\boldsymbol{\Lambda}}\boldsymbol{L}-\overline{\boldsymbol{X}}\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}+\frac{1}{\sigma_{\textbf{min}}^{*}}\left\|\overline{\boldsymbol{W}}\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}+\frac{\xi_{1}}{\sigma_{\textbf{min}}^{*}}\sqrt{\frac{(K-1)\mu^{*}}{n}}.
\end{align*}
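The absorption step behind the last display can be spelled out as follows. This is only a sketch of the standard argument; the shorthand $x$, $a$, $b$, $c$ and the absolute constant $C$ hidden in $\lesssim$ are introduced here for illustration and do not appear in the original.

```latex
\begin{align*}
x &:= \left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty},\quad
a := \frac{1}{\sigma_{\textbf{min}}^{*}}\left(\left\|\overline{\boldsymbol{U}}\overline{\boldsymbol{\Lambda}}\boldsymbol{L}-\overline{\boldsymbol{X}}\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}+\left\|\overline{\boldsymbol{W}}\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}\right),\quad
b := \sqrt{\frac{(K-1)\mu^{*}}{n}},\quad
c := \frac{\xi_{1}}{\sigma_{\textbf{min}}^{*}},\\
x &\leq C\left(a+c\,(b+x)\right)
\;\Longrightarrow\; (1-Cc)\,x\leq C\,(a+cb)
\;\Longrightarrow\; x\leq 2C\,(a+cb)\quad\text{whenever } Cc\leq \tfrac{1}{2}.
\end{align*}
```

Since $c\ll 1$, the condition $Cc\leq 1/2$ holds for all sufficiently large $n$, so the term involving $x$ on the right-hand side can be moved to the left at the cost of a constant factor.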

Plugging this into Lemma 9, we get

\begin{align*}
\left\|\overline{\boldsymbol{U}}\overline{\boldsymbol{\Lambda}}\boldsymbol{L}-\overline{\boldsymbol{X}}\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}\lesssim{}&\Bigg(\frac{n\theta_{\text{max}}^{2}}{\sigma_{\textbf{min}}^{*2}}+\frac{\log n}{\sigma_{\textbf{min}}^{*}}\Bigg)\Bigg(\left\|\overline{\boldsymbol{U}}\overline{\boldsymbol{\Lambda}}\boldsymbol{L}-\overline{\boldsymbol{X}}\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}+\left\|\overline{\boldsymbol{W}}\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}+\sqrt{\frac{(K-1)\mu^{*}}{n}}\frac{\xi_{1}}{\sigma_{\textbf{min}}^{*}}\Bigg)\\
&+\frac{\sqrt{(K-1)n\log n}}{\sigma_{\textbf{min}}^{*}}\theta_{\text{max}}+\frac{\kappa^{*}}{\sigma_{\textbf{min}}^{*}}\sqrt{(K-1)\mu^{*}n}\theta_{\text{max}}^{2}\\
&+\frac{(\sqrt{(K-1)n}\theta_{\text{max}}+\log n)n\theta_{\text{max}}^{2}}{\lambda_{1}^{*}\sigma_{\textbf{min}}^{*}}+\frac{(\sqrt{n}\theta_{\text{max}}+\log n)^{2}\sqrt{(K-1)\mu^{*}/n}}{\sigma_{\textbf{min}}^{*}}.
\end{align*}

Since $\max\{\sqrt{n}\theta_{\text{max}},\log n\}\ll\sigma_{\textbf{min}}^{*}$, and by (103), we know that

\begin{align*}
\left\|\overline{\boldsymbol{U}}\overline{\boldsymbol{\Lambda}}\boldsymbol{L}-\overline{\boldsymbol{X}}\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}\lesssim{}&\left(\frac{n\theta_{\text{max}}^{2}}{\sigma_{\textbf{min}}^{*2}}+\frac{\log n}{\sigma_{\textbf{min}}^{*}}\right)\left\|\overline{\boldsymbol{W}}\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}+\frac{\sqrt{(K-1)n\log n}}{\sigma_{\textbf{min}}^{*}}\theta_{\text{max}}+\frac{\kappa^{*}}{\sigma_{\textbf{min}}^{*}}\sqrt{(K-1)\mu^{*}n}\theta_{\text{max}}^{2}\\
&+\left(\frac{n\theta_{\text{max}}^{2}}{\sigma_{\textbf{min}}^{*2}}+\frac{\log n}{\sigma_{\textbf{min}}^{*}}\right)\sqrt{\frac{(K-1)\mu^{*}}{n}}\frac{\xi_{1}}{\sigma_{\textbf{min}}^{*}}\\
&+\frac{(\sqrt{(K-1)n}\theta_{\text{max}}+\log n)n\theta_{\text{max}}^{2}}{\lambda_{1}^{*}\sigma_{\textbf{min}}^{*}}+\frac{(\sqrt{n}\theta_{\text{max}}+\log n)^{2}\sqrt{(K-1)\mu^{*}/n}}{\sigma_{\textbf{min}}^{*}}\\
\lesssim{}&\frac{\sqrt{(K-1)n\log n}+\kappa^{*}\sqrt{(K-1)\mu^{*}n}\theta_{\text{max}}^{2}}{\sigma_{\textbf{min}}^{*}}+\frac{n\theta_{\text{max}}^{2}\log n+\sqrt{K-1}n^{1.5}\theta_{\text{max}}^{3}}{\lambda_{1}^{*}\sigma_{\textbf{min}}^{*}}
\end{align*}

since we assumed $\mu^{*}\log^{2}n\lesssim n$. According to Lemma 1, since $K^{-1}\left\|\boldsymbol{\theta}\right\|_{2}^{2}\lesssim\lambda_{1}^{*}$ and $\sigma_{\textbf{min}}^{*}\asymp\beta_{n}K^{-1}\left\|\boldsymbol{\theta}\right\|_{2}^{2}$, we have, using $K\log n\lesssim n$,

\begin{align*}
\left\|\overline{\boldsymbol{U}}\overline{\boldsymbol{\Lambda}}\boldsymbol{L}-\overline{\boldsymbol{X}}\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty} &\lesssim\frac{K^{1.5}\log^{0.5}n}{n^{0.5}\beta_{n}\theta_{\text{max}}^{2}}+\frac{\kappa^{*}K^{1.5}\sqrt{\mu^{*}}}{n^{0.5}\beta_{n}}+\frac{K^{2}\log n}{n\beta_{n}\theta_{\text{max}}^{2}}+\frac{K^{2.5}}{n^{0.5}\beta_{n}\theta_{\text{max}}}\\
&\lesssim\frac{K^{1.5}\log^{0.5}n}{n^{0.5}\beta_{n}\theta_{\text{max}}^{2}}+\frac{\kappa^{*}K^{1.5}\sqrt{\mu^{*}}}{n^{0.5}\beta_{n}}+\frac{K^{2.5}}{n^{0.5}\beta_{n}\theta_{\text{max}}}. \tag{112}
\end{align*}

Control $\left\|\overline{\boldsymbol{U}}\overline{\boldsymbol{\Lambda}}\left(\boldsymbol{R}-\boldsymbol{L}\right)\right\|_{2,\infty}+\left\|\overline{\boldsymbol{U}}\left(\boldsymbol{R}\overline{\boldsymbol{\Lambda}}^{*}-\overline{\boldsymbol{\Lambda}}\boldsymbol{R}\right)\right\|_{2,\infty}$: By Lemma 7 we have

\begin{align*}
\left\|\overline{\boldsymbol{U}}\overline{\boldsymbol{\Lambda}}\left(\boldsymbol{R}-\boldsymbol{L}\right)\right\|_{2,\infty} &\leq\left\|\overline{\boldsymbol{U}}\right\|_{2,\infty}\left\|\overline{\boldsymbol{\Lambda}}\left(\boldsymbol{R}-\boldsymbol{L}\right)\right\|\leq\sigma_{\textbf{max}}^{*}\left\|\overline{\boldsymbol{U}}\right\|_{2,\infty}\left\|\boldsymbol{R}-\boldsymbol{L}\right\|\\
&\lesssim\frac{n\theta_{\text{max}}^{2}\sigma_{\textbf{max}}^{*}}{\sigma_{\textbf{min}}^{*2}}\left\|\overline{\boldsymbol{U}}\right\|_{2,\infty}=\frac{\kappa^{*}n\theta_{\text{max}}^{2}}{\sigma_{\textbf{min}}^{*}}\left\|\overline{\boldsymbol{U}}\right\|_{2,\infty}.
\end{align*}

By Lemma 8 we have

\begin{align*}
\left\|\overline{\boldsymbol{U}}\left(\boldsymbol{R}\overline{\boldsymbol{\Lambda}}^{*}-\overline{\boldsymbol{\Lambda}}\boldsymbol{R}\right)\right\|_{2,\infty} &\leq\left\|\overline{\boldsymbol{U}}\right\|_{2,\infty}\left\|\boldsymbol{R}\overline{\boldsymbol{\Lambda}}^{*}-\overline{\boldsymbol{\Lambda}}\boldsymbol{R}\right\|=\left\|\overline{\boldsymbol{U}}\right\|_{2,\infty}\left\|\overline{\boldsymbol{\Lambda}}^{*}-\boldsymbol{R}^{\top}\overline{\boldsymbol{\Lambda}}\boldsymbol{R}\right\|\\
&\lesssim\left(\frac{\kappa^{*}n\theta_{\text{max}}^{2}}{\sigma_{\textbf{min}}^{*}}+\sqrt{(K-1)\log n}\theta_{\text{max}}\right)\left\|\overline{\boldsymbol{U}}\right\|_{2,\infty}.
\end{align*}

As a result, by Lemma 1 we know that

\begin{align*}
\left\|\overline{\boldsymbol{U}}\overline{\boldsymbol{\Lambda}}\left(\boldsymbol{R}-\boldsymbol{L}\right)\right\|_{2,\infty}+\left\|\overline{\boldsymbol{U}}\left(\boldsymbol{R}\overline{\boldsymbol{\Lambda}}^{*}-\overline{\boldsymbol{\Lambda}}\boldsymbol{R}\right)\right\|_{2,\infty} &\lesssim\left(\frac{\kappa^{*}n\theta_{\text{max}}^{2}}{\sigma_{\textbf{min}}^{*}}+\sqrt{(K-1)\log n}\theta_{\text{max}}\right)\left\|\overline{\boldsymbol{U}}\right\|_{2,\infty}\\
&\lesssim\left(\frac{\kappa^{*}K}{\beta_{n}}+K^{0.5}\log^{0.5}n\theta_{\text{max}}\right)\left\|\overline{\boldsymbol{U}}\right\|_{2,\infty}. \tag{113}
\end{align*}

By Lemma 7 we know that $\sigma_{K-1}(\boldsymbol{L})\geq 1/2$. As a result, by Lemma 10 and Lemma 1 we have

\begin{align*}
\left\|\overline{\boldsymbol{U}}\right\|_{2,\infty}\leq{}&2\left\|\overline{\boldsymbol{U}}\boldsymbol{L}\right\|_{2,\infty}\lesssim\left\|\overline{\boldsymbol{U}}\boldsymbol{L}-\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}+\left\|\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}\\
\lesssim{}&\frac{\kappa^{*}}{\sigma_{\textbf{min}}^{*2}}\sqrt{(K-1)\mu^{*}n}\theta_{\text{max}}^{2}+\frac{\sqrt{(K-1)n\log n}}{\sigma_{\textbf{min}}^{*2}}\theta_{\text{max}}+\frac{\sqrt{K-1}n^{1.5}\theta_{\text{max}}^{3}}{\lambda_{1}^{*}\sigma_{\textbf{min}}^{*2}}\\
&+\frac{\sqrt{(K-1)\log n}\theta_{\text{max}}+\log n\sqrt{(K-1)\mu^{*}/n}+\sqrt{\mu^{*}}\theta_{\text{max}}}{\sigma_{\textbf{min}}^{*}}+\frac{n\theta_{\text{max}}^{2}}{\lambda_{1}^{*}\sigma_{\textbf{min}}^{*}}+\sqrt{\frac{(K-1)\mu^{*}}{n}}\\
\lesssim{}&\frac{\kappa^{*}}{\sigma_{\textbf{min}}^{*2}}\sqrt{(K-1)\mu^{*}n}\theta_{\text{max}}^{2}+\frac{\sqrt{(K-1)n\log n}}{\sigma_{\textbf{min}}^{*2}}\theta_{\text{max}}+\frac{\sqrt{K-1}n^{1.5}\theta_{\text{max}}^{3}}{\lambda_{1}^{*}\sigma_{\textbf{min}}^{*2}}\\
&+\frac{\sqrt{(K-1)\log n}\theta_{\text{max}}}{\sigma_{\textbf{min}}^{*}}+\frac{n\theta_{\text{max}}^{2}}{\lambda_{1}^{*}\sigma_{\textbf{min}}^{*}}+\sqrt{\frac{(K-1)\mu^{*}}{n}}\\
\lesssim{}&\frac{\kappa^{*}K^{2.5}\sqrt{\mu^{*}}}{n^{1.5}\beta_{n}^{2}\theta_{\text{max}}^{2}}+\frac{K^{2.5}\log^{0.5}n}{n^{1.5}\beta_{n}^{2}\theta_{\text{max}}^{3}}+\frac{K^{3.5}}{n^{1.5}\beta_{n}^{2}\theta_{\text{max}}^{3}}+\frac{K^{1.5}\log^{0.5}n}{n\beta_{n}\theta_{\text{max}}}+\frac{K^{2}}{n\beta_{n}\theta_{\text{max}}^{2}}+\frac{K^{0.5}\sqrt{\mu^{*}}}{n^{0.5}}=:\xi_{3} \tag{114}
\end{align*}

with probability at least $1-O(n^{-10})$. Combining (113) and (114), we get

\begin{align*}
\left\|\overline{\boldsymbol{U}}\overline{\boldsymbol{\Lambda}}\left(\boldsymbol{R}-\boldsymbol{L}\right)\right\|_{2,\infty}+\left\|\overline{\boldsymbol{U}}\left(\boldsymbol{R}\overline{\boldsymbol{\Lambda}}^{*}-\overline{\boldsymbol{\Lambda}}\boldsymbol{R}\right)\right\|_{2,\infty}\lesssim\xi_{3}\left(\frac{\kappa^{*}K}{\beta_{n}}+K^{0.5}\log^{0.5}n\theta_{\text{max}}\right) \tag{115}
\end{align*}

with probability at least $1-O(n^{-10})$.

Control $\left\|\boldsymbol{\Delta}\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}$: Plugging Lemma 1 into Lemma 11, we get

\begin{align*}
\left\|\boldsymbol{\Delta}\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty} &\lesssim\frac{\sqrt{(K-1)\mu^{*}n}\theta_{\text{max}}^{2}}{\lambda_{1}^{*}}+\frac{\sqrt{K-1}\mu^{*}\log^{2}n}{n\lambda_{1}^{*}}+\frac{n^{1.5}\theta_{\text{max}}^{3}}{\lambda_{1}^{*2}}\\
&\lesssim\frac{K^{1.5}\sqrt{\mu^{*}}}{n^{0.5}}+\frac{K^{1.5}\mu^{*}\log^{2}n}{n^{2}\theta_{\text{max}}^{2}}+\frac{K^{2}}{n^{0.5}\theta_{\text{max}}}\lesssim\frac{K^{1.5}\sqrt{\mu^{*}}}{n^{0.5}}+\frac{K^{2}}{n^{0.5}\theta_{\text{max}}} \tag{116}
\end{align*}

since $\mu^{*}\log^{2}n\lesssim n$.

Combining (112), (115) and (116), we get

\begin{align*}
&\left\|\overline{\boldsymbol{U}}\overline{\boldsymbol{\Lambda}}\boldsymbol{L}-\overline{\boldsymbol{X}}\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}+\left\|\overline{\boldsymbol{U}}\overline{\boldsymbol{\Lambda}}\left(\boldsymbol{R}-\boldsymbol{L}\right)\right\|_{2,\infty}+\left\|\overline{\boldsymbol{U}}\left(\boldsymbol{R}\overline{\boldsymbol{\Lambda}}^{*}-\overline{\boldsymbol{\Lambda}}\boldsymbol{R}\right)\right\|_{2,\infty}+\left\|\boldsymbol{\Delta}\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}\\
\lesssim{}&\xi_{3}\left(\frac{\kappa^{*}K}{\beta_{n}}+K^{0.5}\log^{0.5}n\theta_{\text{max}}\right)\\
&+\frac{K^{1.5}\log^{0.5}n}{n^{0.5}\beta_{n}\theta_{\text{max}}^{2}}+\frac{\kappa^{*}K^{1.5}\sqrt{\mu^{*}}}{n^{0.5}\beta_{n}}+\frac{K^{2.5}}{n^{0.5}\beta_{n}\theta_{\text{max}}}+\frac{K^{1.5}\sqrt{\mu^{*}}}{n^{0.5}}+\frac{K^{2}}{n^{0.5}\theta_{\text{max}}}\\
\lesssim{}&\xi_{3}\left(\frac{\kappa^{*}K}{\beta_{n}}+K^{0.5}\log^{0.5}n\theta_{\text{max}}\right)+\frac{K^{1.5}\log^{0.5}n}{n^{0.5}\beta_{n}\theta_{\text{max}}^{2}}+\frac{K^{2.5}}{n^{0.5}\beta_{n}\theta_{\text{max}}}=:\xi_{4}
\end{align*}

with probability at least $1-O(n^{-10})$, since $n\gtrsim\max\left\{\mu^{*}\log^{2}n,K\log n\right\}$ by our assumption. Therefore, for $\boldsymbol{\Psi}_{\overline{\boldsymbol{U}}}$ we have

\begin{align*}
\left\|\boldsymbol{\Psi}_{\overline{\boldsymbol{U}}}\right\|_{2,\infty}\lesssim\frac{\xi_{4}}{\theta_{\text{max}}^{2}}
\end{align*}

with probability at least $1-O(n^{-10})$, completing the proof. ∎

8.12 Proof of Theorem 3

Proof.

Note that by [JKL23, Lemma B.2], we have $(\boldsymbol{u}_{1}^{*})_{i}\asymp\theta_{i}/\left\|\boldsymbol{\theta}\right\|_{2}$ for all $i\in[n]$. Combining this with Lemma 5, we can ensure $(\widehat{\boldsymbol{u}}_{1})_{i}>0$ and $(\widehat{\boldsymbol{u}}_{1})_{i}\asymp(\boldsymbol{u}_{1}^{*})_{i}$ for all $i\in[n]$ as long as

\begin{align*}
\left\|\widehat{\boldsymbol{u}}_{1}-\boldsymbol{u}_{1}^{*}\right\|_{\infty} &\lesssim\frac{K\log^{0.5}n+K^{1.5}\sqrt{\mu^{*}}}{n\theta_{\text{max}}}+\frac{K\sqrt{\mu^{*}}\log n}{n^{1.5}\theta_{\text{max}}^{2}}+\frac{K^{3}}{n^{1.5}\theta_{\text{max}}^{3}}\\
&\ll\min_{i\in[n]}(\boldsymbol{u}_{1}^{*})_{i}\asymp\frac{\min_{i\in[n]}\theta_{i}}{\left\|\boldsymbol{\theta}\right\|_{2}}\asymp\frac{1}{\sqrt{n}}.
\end{align*}

We have already derived the expansions of $\widehat{\boldsymbol{u}}_{1}-\boldsymbol{u}_{1}^{*}$ and $\overline{\boldsymbol{U}}\boldsymbol{R}-\overline{\boldsymbol{U}}^{*}$ in Theorem 8, Theorem 9 and Theorem 2. For $i\in[n]$ we have

\begin{align*}
(\widehat{\boldsymbol{u}}_{1})_{i} &=(\boldsymbol{u}_{1}^{*})_{i}+\left[\boldsymbol{N}_{1}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right]_{i}+\delta_{i},\\
[\overline{\boldsymbol{U}}\boldsymbol{R}]_{i,\cdot}^{\top} &=[\overline{\boldsymbol{U}}^{*}]_{i,\cdot}^{\top}+\boldsymbol{w}_{i}+\left[\boldsymbol{\Psi}_{\overline{\boldsymbol{U}}}\right]_{i,\cdot}^{\top}.
\end{align*}

Since $[\overline{\boldsymbol{U}}\boldsymbol{R}]_{i,\cdot}^{\top}=\boldsymbol{R}^{\top}[\overline{\boldsymbol{U}}]_{i,\cdot}^{\top}$ and $\widehat{\boldsymbol{r}}_{i}=[\overline{\boldsymbol{U}}]_{i,\cdot}^{\top}/(\widehat{\boldsymbol{u}}_{1})_{i}$, we know that $[\overline{\boldsymbol{U}}\boldsymbol{R}]_{i,\cdot}^{\top}/(\widehat{\boldsymbol{u}}_{1})_{i}=\boldsymbol{R}^{\top}\widehat{\boldsymbol{r}}_{i}$. As a result, we have

\begin{align*}
\boldsymbol{R}^{\top}\widehat{\boldsymbol{r}}_{i}-\boldsymbol{r}_{i}^{*}={}&\frac{[\overline{\boldsymbol{U}}\boldsymbol{R}]_{i,\cdot}^{\top}}{(\widehat{\boldsymbol{u}}_{1})_{i}}-\frac{[\overline{\boldsymbol{U}}^{*}]_{i,\cdot}^{\top}}{(\boldsymbol{u}_{1}^{*})_{i}}=\frac{[\overline{\boldsymbol{U}}^{*}]_{i,\cdot}^{\top}+\boldsymbol{w}_{i}+\left[\boldsymbol{\Psi}_{\overline{\boldsymbol{U}}}\right]_{i,\cdot}^{\top}}{(\boldsymbol{u}_{1}^{*})_{i}+\left[\boldsymbol{N}_{1}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right]_{i}+\delta_{i}}-\frac{[\overline{\boldsymbol{U}}^{*}]_{i,\cdot}^{\top}}{(\boldsymbol{u}_{1}^{*})_{i}}\\
={}&\frac{(\boldsymbol{u}_{1}^{*})_{i}\boldsymbol{w}_{i}+(\boldsymbol{u}_{1}^{*})_{i}\left[\boldsymbol{\Psi}_{\overline{\boldsymbol{U}}}\right]_{i,\cdot}^{\top}-\left[\boldsymbol{N}_{1}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right]_{i}[\overline{\boldsymbol{U}}^{*}]_{i,\cdot}^{\top}-\delta_{i}[\overline{\boldsymbol{U}}^{*}]_{i,\cdot}^{\top}}{\left((\boldsymbol{u}_{1}^{*})_{i}+\left[\boldsymbol{N}_{1}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right]_{i}+\delta_{i}\right)(\boldsymbol{u}_{1}^{*})_{i}}\\
={}&\frac{(\boldsymbol{u}_{1}^{*})_{i}\boldsymbol{w}_{i}-\left[\boldsymbol{N}_{1}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right]_{i}[\overline{\boldsymbol{U}}^{*}]_{i,\cdot}^{\top}}{(\boldsymbol{u}_{1}^{*})_{i}^{2}}\cdot\frac{(\boldsymbol{u}_{1}^{*})_{i}}{(\boldsymbol{u}_{1}^{*})_{i}+\left[\boldsymbol{N}_{1}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right]_{i}+\delta_{i}}\\
&+\frac{(\boldsymbol{u}_{1}^{*})_{i}\left[\boldsymbol{\Psi}_{\overline{\boldsymbol{U}}}\right]_{i,\cdot}^{\top}-\delta_{i}[\overline{\boldsymbol{U}}^{*}]_{i,\cdot}^{\top}}{\left((\boldsymbol{u}_{1}^{*})_{i}+\left[\boldsymbol{N}_{1}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right]_{i}+\delta_{i}\right)(\boldsymbol{u}_{1}^{*})_{i}}\\
={}&\frac{1}{(\boldsymbol{u}_{1}^{*})_{i}}\left(\boldsymbol{w}_{i}-\left[\boldsymbol{N}_{1}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right]_{i}\boldsymbol{r}_{i}^{*}\right)\left(1-\frac{\left[\boldsymbol{N}_{1}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right]_{i}+\delta_{i}}{(\boldsymbol{u}_{1}^{*})_{i}+\left[\boldsymbol{N}_{1}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right]_{i}+\delta_{i}}\right)\\
&+\frac{\left[\boldsymbol{\Psi}_{\overline{\boldsymbol{U}}}\right]_{i,\cdot}^{\top}-\delta_{i}\boldsymbol{r}_{i}^{*}}{(\boldsymbol{u}_{1}^{*})_{i}+\left[\boldsymbol{N}_{1}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right]_{i}+\delta_{i}}.
\end{align*}

We let

\begin{align*}
\gamma_{i}=-\frac{\left[\boldsymbol{N}_{1}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right]_{i}+\delta_{i}}{(\boldsymbol{u}_{1}^{*})_{i}+\left[\boldsymbol{N}_{1}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right]_{i}+\delta_{i}}\quad\text{and}\quad[\boldsymbol{\Psi}_{\boldsymbol{r}}]_{i,\cdot}^{\top}=\frac{\left[\boldsymbol{\Psi}_{\overline{\boldsymbol{U}}}\right]_{i,\cdot}^{\top}-\delta_{i}\boldsymbol{r}_{i}^{*}}{(\boldsymbol{u}_{1}^{*})_{i}+\left[\boldsymbol{N}_{1}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right]_{i}+\delta_{i}}.
\end{align*}

Then the expansion can be written as

\begin{align*}
\boldsymbol{R}^{\top}\widehat{\boldsymbol{r}}_{i}-\boldsymbol{r}_{i}^{*}=\frac{1+\gamma_{i}}{(\boldsymbol{u}_{1}^{*})_{i}}\left(\boldsymbol{w}_{i}-\left[\boldsymbol{N}_{1}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right]_{i}\boldsymbol{r}_{i}^{*}\right)+[\boldsymbol{\Psi}_{\boldsymbol{r}}]_{i,\cdot}^{\top}.
\end{align*}
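The expansion above is an exact algebraic identity for the ratio of the two perturbed quantities, so it can be sanity-checked with arbitrary numbers. The sketch below does this for a single node; the names `a`, `w`, `psi`, `u`, `m`, `delta` are hypothetical stand-ins for $[\overline{\boldsymbol{U}}^{*}]_{i,\cdot}^{\top}$, $\boldsymbol{w}_{i}$, $[\boldsymbol{\Psi}_{\overline{\boldsymbol{U}}}]_{i,\cdot}^{\top}$, $(\boldsymbol{u}_{1}^{*})_{i}$, $[\boldsymbol{N}_{1}\boldsymbol{W}\boldsymbol{u}_{1}^{*}]_{i}$ and $\delta_{i}$, and the numerical values are made up.

```python
def expansion_gap(a, w, psi, u, m, delta):
    """Max entrywise gap between the ratio difference and its expansion.

    a, w, psi are vectors (lists of floats); u, m, delta are scalars.
    """
    denom = u + m + delta                      # perturbed leading-eigenvector entry
    r_star = [x / u for x in a]                # population ratio r_i^*
    gamma = -(m + delta) / denom               # gamma_i as defined in the text
    psi_r = [(p - delta * r) / denom for p, r in zip(psi, r_star)]

    # Left-hand side: perturbed ratio minus population ratio.
    lhs = [(x + y + z) / denom - r for x, y, z, r in zip(a, w, psi, r_star)]
    # Right-hand side: first-order term scaled by (1 + gamma) plus the remainder.
    rhs = [(1.0 + gamma) / u * (y - m * r) + q
           for y, r, q in zip(w, r_star, psi_r)]
    return max(abs(l - r) for l, r in zip(lhs, rhs))


# Hypothetical values for one node with a two-dimensional embedding.
gap = expansion_gap(a=[0.3, -0.2], w=[0.01, 0.02], psi=[0.001, -0.003],
                    u=0.5, m=0.02, delta=0.004)
```

The value of `gap` is at the level of floating-point round-off, which confirms that $1+\gamma_{i}=(\boldsymbol{u}_{1}^{*})_{i}/(\widehat{\boldsymbol{u}}_{1})_{i}$ is the only factor separating the leading term from its first-order form.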

On the other hand, by Lemma 5 we know that

\begin{align*}
\max_{1\leq i\leq n}\left|\left[\boldsymbol{N}_{1}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right]_{i}+\delta_{i}\right|=\left\|\widehat{\boldsymbol{u}}_{1}-\boldsymbol{u}_{1}^{*}\right\|_{\infty}&\lesssim\frac{K\log^{0.5}n+K^{1.5}\sqrt{\mu^{*}}}{n\theta_{\text{max}}}+\frac{K\sqrt{\mu^{*}}\log n}{n^{1.5}\theta_{\text{max}}^{2}}+\frac{K^{3}}{n^{1.5}\theta_{\text{max}}^{3}}\\
&\ll\min_{1\leq i\leq n}(\boldsymbol{u}_{1}^{*})_{i}\asymp\frac{1}{\sqrt{n}}.
\end{align*}

Therefore, $\gamma_{i}$ and $[\boldsymbol{\Psi}_{\boldsymbol{r}}]_{i,\cdot}^{\top}$ can be controlled as

\begin{align*}
|\gamma_{i}|&\lesssim\frac{\left\|\widehat{\boldsymbol{u}}_{1}-\boldsymbol{u}_{1}^{*}\right\|_{\infty}}{(\boldsymbol{u}_{1}^{*})_{i}}\lesssim\frac{K\log^{0.5}n+K^{1.5}\sqrt{\mu^{*}}}{n^{0.5}\theta_{\text{max}}}+\frac{K\sqrt{\mu^{*}}\log n}{n\theta_{\text{max}}^{2}}+\frac{K^{3}}{n\theta_{\text{max}}^{3}},\\
\left\|[\boldsymbol{\Psi}_{\boldsymbol{r}}]_{i,\cdot}\right\|_{2}&\lesssim\frac{\left\|\left[\boldsymbol{\Psi}_{\overline{\boldsymbol{U}}}\right]_{i,\cdot}\right\|_{2}+|\delta_{i}|\left\|\boldsymbol{r}_{i}^{*}\right\|_{2}}{(\boldsymbol{u}_{1}^{*})_{i}}\lesssim\sqrt{n}\left(\left\|\boldsymbol{\Psi}_{\overline{\boldsymbol{U}}}\right\|_{2,\infty}+\left\|\boldsymbol{r}_{i}^{*}\right\|_{2}\left\|\boldsymbol{\delta}\right\|_{\infty}\right).
\end{align*}

8.13 Proof of Theorem 4

Proof.

By Theorem 3, with probability at least $1-O(n^{-10})$, for all $i\in[n]$ we have

\begin{align*}
\left\|\boldsymbol{R}^{\top}\widehat{\boldsymbol{r}}_{i}-\boldsymbol{r}_{i}^{*}\right\|_{2}&\lesssim\frac{1}{(\boldsymbol{u}_{1}^{*})_{i}}\left(\left\|\boldsymbol{w}_{i}\right\|_{2}+\frac{1}{\lambda_{1}^{*}}\left|\left[\boldsymbol{N}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right]_{i}\right|\left\|\boldsymbol{r}_{i}^{*}\right\|_{2}\right)+\left\|[\boldsymbol{\Psi}_{\boldsymbol{r}}]_{i,\cdot}\right\|_{2}\\
&\lesssim\frac{1}{(\boldsymbol{u}_{1}^{*})_{i}}\left(\left\|\boldsymbol{w}_{i}\right\|_{2}+\left\|\boldsymbol{\Psi}_{\overline{\boldsymbol{U}}}\right\|_{2,\infty}+\frac{1}{\lambda_{1}^{*}}\left|\left[\boldsymbol{N}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right]_{i}\right|\left\|\boldsymbol{r}_{i}^{*}\right\|_{2}+\left\|\boldsymbol{\delta}\right\|_{\infty}\left\|\boldsymbol{r}_{i}^{*}\right\|_{2}\right).\tag{117}
\end{align*}

On one hand, we know that

\begin{align*}
\left\|\boldsymbol{w}_{i}\right\|_{2}&\leq\left\|\boldsymbol{W}_{i,\cdot}\overline{\boldsymbol{U}}^{*}\left(\overline{\boldsymbol{\Lambda}}^{*}\right)^{-1}\right\|_{2}+\left\|\left[\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{N}\right]_{i,\cdot}\overline{\boldsymbol{U}}^{*}\left(\overline{\boldsymbol{\Lambda}}^{*}\right)^{-1}\right\|_{2}\\
&=\left\|\boldsymbol{W}_{i,\cdot}\overline{\boldsymbol{U}}^{*}\left(\overline{\boldsymbol{\Lambda}}^{*}\right)^{-1}\right\|_{2}+(\boldsymbol{u}_{1}^{*})_{i}\left\|\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{N}\overline{\boldsymbol{U}}^{*}\left(\overline{\boldsymbol{\Lambda}}^{*}\right)^{-1}\right\|_{2}.\tag{118}
\end{align*}

By Lemma 14 we know that

\begin{align*}
\left\|\boldsymbol{W}_{i,\cdot}\overline{\boldsymbol{U}}^{*}\left(\overline{\boldsymbol{\Lambda}}^{*}\right)^{-1}\right\|_{2}&\lesssim\sqrt{\log n}\theta_{\text{max}}\left\|\overline{\boldsymbol{U}}^{*}\left(\overline{\boldsymbol{\Lambda}}^{*}\right)^{-1}\right\|_{F}+\log n\left\|\overline{\boldsymbol{U}}^{*}\left(\overline{\boldsymbol{\Lambda}}^{*}\right)^{-1}\right\|_{2,\infty}\\
&\lesssim\frac{\sqrt{(K-1)\log n}\theta_{\text{max}}}{\sigma_{\textbf{min}}^{*}}+\frac{\log n}{\sigma_{\textbf{min}}^{*}}\sqrt{\frac{(K-1)\mu^{*}}{n}}\tag{119}
\end{align*}

with probability at least $1-O(n^{-14})$. And, by Lemma 18 and Lemma 12 we have

\begin{align*}
&\left\|\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{N}\overline{\boldsymbol{U}}^{*}\left(\overline{\boldsymbol{\Lambda}}^{*}\right)^{-1}\right\|_{2}=\sqrt{\sum_{i=2}^{K}\left(\frac{\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{N}\boldsymbol{u}_{i}^{*}}{\lambda_{i}^{*}}\right)^{2}}\leq\frac{1}{\sigma_{\textbf{min}}^{*}}\sqrt{\sum_{i=2}^{K}\left(\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{N}\boldsymbol{u}_{i}^{*}\right)^{2}}\\
\lesssim{}&\frac{\sqrt{K-1}}{\sigma_{\textbf{min}}^{*}}\left(\sqrt{\log n}\theta_{\text{max}}+\log n\sqrt{\mu^{*}/n}\right)\left\|\boldsymbol{N}\right\|\lesssim\frac{\sqrt{(K-1)\log n}\theta_{\text{max}}}{\sigma_{\textbf{min}}^{*}}+\frac{\log n}{\sigma_{\textbf{min}}^{*}}\sqrt{\frac{(K-1)\mu^{*}}{n}}\tag{120}
\end{align*}

with probability at least $1-O(n^{-14})$. Plugging (119) and (120) into (118) we get

\begin{align*}
\left\|\boldsymbol{w}_{i}\right\|_{2}&\lesssim(1+(\boldsymbol{u}_{1}^{*})_{i})\left(\frac{\sqrt{(K-1)\log n}\theta_{\text{max}}}{\sigma_{\textbf{min}}^{*}}+\frac{\log n}{\sigma_{\textbf{min}}^{*}}\sqrt{\frac{(K-1)\mu^{*}}{n}}\right)\\
&\lesssim\frac{\sqrt{(K-1)\log n}\theta_{\text{max}}}{\sigma_{\textbf{min}}^{*}}+\frac{\log n}{\sigma_{\textbf{min}}^{*}}\sqrt{\frac{(K-1)\mu^{*}}{n}}
\end{align*}

with probability at least $1-O(n^{-14})$. Combining this with Theorem 2, we get

\[
\left\|\boldsymbol{w}_{i}\right\|_{2}+\left\|\boldsymbol{\Psi}_{\overline{\boldsymbol{U}}}\right\|_{2,\infty}\lesssim\frac{K^{1.5}\log^{0.5}n}{n\beta_{n}\theta_{\text{max}}}+\frac{K^{1.5}\log n\sqrt{\mu^{*}}}{n^{1.5}\beta_{n}\theta_{\text{max}}^{2}}\tag{121}
\]

for all $i\in[n]$ with probability at least $1-O(n^{-10})$. On the other hand, by Corollary 4 and Lemma 15 we know that

\begin{align*}
\frac{1}{\lambda_{1}^{*}}\left|\left[\boldsymbol{N}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right]_{i}\right|&\leq\left\|\boldsymbol{N}_{1}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right\|_{\infty}\lesssim\frac{1}{\lambda_{1}^{*}}\left\|\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right\|_{\infty}+\sqrt{\frac{(K-1)\mu^{*}}{n\lambda_{1}^{*2}}}\left\|\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right\|_{2}\\
&\lesssim\frac{1}{\lambda_{1}^{*}}\left(\sqrt{\log n}\theta_{\text{max}}+\log n\sqrt{\mu^{*}/n}\right)+\sqrt{\frac{(K-1)\mu^{*}}{n\lambda_{1}^{*2}}}\sqrt{n}\theta_{\text{max}}\\
&=\frac{\left(\sqrt{(K-1)\mu^{*}}+\sqrt{\log n}\right)\theta_{\text{max}}+\log n\sqrt{\mu^{*}/n}}{\lambda_{1}^{*}}
\end{align*}

for all $i\in[n]$ with probability at least $1-O(n^{-13})$. Combining this with Theorem 1, we get

\[
\frac{1}{\lambda_{1}^{*}}\left|\left[\boldsymbol{N}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\right]_{i}\right|\left\|\boldsymbol{r}_{i}^{*}\right\|_{2}+\left\|\boldsymbol{\delta}\right\|_{\infty}\left\|\boldsymbol{r}_{i}^{*}\right\|_{2}\lesssim\frac{K\sqrt{\mu^{*}}+K^{0.5}\log^{0.5}n}{n\theta_{\text{max}}}+\frac{K^{0.5}\log n\sqrt{\mu^{*}}}{n\theta_{\text{max}}^{2}}\tag{122}
\]

for all $i\in[n]$ with probability at least $1-O(n^{-10})$.

Plugging (121) and (122) into (117) we get

\begin{align*}
\left\|\boldsymbol{R}^{\top}\widehat{\boldsymbol{r}}_{i}-\boldsymbol{r}_{i}^{*}\right\|_{2}&\lesssim\frac{1}{(\boldsymbol{u}_{1}^{*})_{i}}\left(\frac{K\sqrt{\mu^{*}}}{n\theta_{\text{max}}}+\frac{K^{0.5}\log n\sqrt{\mu^{*}}}{n\theta_{\text{max}}^{2}}+\frac{K^{1.5}\log^{0.5}n}{n\beta_{n}\theta_{\text{max}}}+\frac{K^{1.5}\log n\sqrt{\mu^{*}}}{n^{1.5}\beta_{n}\theta_{\text{max}}^{2}}\right)\\
&\lesssim\frac{K\sqrt{\mu^{*}}}{n^{0.5}\theta_{\text{max}}}+\frac{K^{0.5}\log n\sqrt{\mu^{*}}}{n^{0.5}\theta_{\text{max}}^{2}}+\frac{K^{1.5}\log^{0.5}n}{n^{0.5}\beta_{n}\theta_{\text{max}}}+\frac{K^{1.5}\log n\sqrt{\mu^{*}}}{n\beta_{n}\theta_{\text{max}}^{2}}
\end{align*}

for all $i\in[n]$ with probability at least $1-O(n^{-10})$. ∎

8.14 Proof of Theorem 5

Proof.

Let $\mathcal{K}=\{i_{1},i_{2},\dots,i_{K}\}$. For the same reason as in the proof of [JKL23, Lemma E.1], we know that

\begin{align*}
\max_{k\in[K]}\min_{i\in\mathcal{K}}\left\|\widehat{\boldsymbol{r}}_{i}-\boldsymbol{b}_{k}^{*}\right\|_{2}&\lesssim\max_{i\in[n]}\left\|\boldsymbol{R}^{\top}\widehat{\boldsymbol{r}}_{i}-\boldsymbol{r}_{i}^{*}\right\|_{2}\\
&\lesssim\frac{K\sqrt{\mu^{*}}}{n^{0.5}\theta_{\text{max}}}+\frac{K^{0.5}\log n\sqrt{\mu^{*}}}{n^{0.5}\theta_{\text{max}}^{2}}+\frac{K^{1.5}\log^{0.5}n}{n^{0.5}\beta_{n}\theta_{\text{max}}}+\frac{K^{1.5}\log n\sqrt{\mu^{*}}}{n\beta_{n}\theta_{\text{max}}^{2}}
\end{align*}

with probability at least $1-O(n^{-10})$. We denote this event by $\mathcal{A}_{2}$. We let $\rho(k)=\operatorname*{arg\,min}_{l\in[K]}\left\|\widehat{\boldsymbol{r}}_{i_{l}}-\boldsymbol{b}_{k}^{*}\right\|_{2}$ for $k\in[K]$. By the triangle inequality, under event $\mathcal{A}_{2}$ we know that

\begin{align*}
\left\|\boldsymbol{r}_{i_{\rho(k)}}^{*}-\boldsymbol{b}_{k}^{*}\right\|_{2}&\leq\left\|\boldsymbol{r}_{i_{\rho(k)}}^{*}-\widehat{\boldsymbol{r}}_{i_{\rho(k)}}\right\|_{2}+\left\|\widehat{\boldsymbol{r}}_{i_{\rho(k)}}-\boldsymbol{b}_{k}^{*}\right\|_{2}\\
&\lesssim\frac{K\sqrt{\mu^{*}}}{n^{0.5}\theta_{\text{max}}}+\frac{K^{0.5}\log n\sqrt{\mu^{*}}}{n^{0.5}\theta_{\text{max}}^{2}}+\frac{K^{1.5}\log^{0.5}n}{n^{0.5}\beta_{n}\theta_{\text{max}}}+\frac{K^{1.5}\log n\sqrt{\mu^{*}}}{n\beta_{n}\theta_{\text{max}}^{2}}.\tag{123}
\end{align*}

If $i_{\rho(k)}\notin\mathbb{V}_{k}$, we know that $\|\boldsymbol{r}_{i_{\rho(k)}}^{*}-\boldsymbol{b}_{k}^{*}\|_{2}\geq\min_{l\in[K]}\min_{i\in[n]\backslash\mathbb{V}_{l}}\left\|\boldsymbol{r}_{i}^{*}-\boldsymbol{b}_{l}^{*}\right\|_{2}=\Delta_{\boldsymbol{r}}$. In this case, (123) cannot hold for appropriately chosen $C_{\text{SP}}$. Therefore, for appropriately chosen $C_{\text{SP}}$, we must have $i_{\rho(k)}\in\mathbb{V}_{k}$. This also implies that $\rho$ is a permutation of $[K]$, since the cardinality of $\mathcal{K}$ is exactly $K$.

For any $k\in[K]$ and $j\in[n]$, if $j\in\mathbb{V}_{k}$, by the triangle inequality we have

\begin{align*}
\left\|\widehat{\boldsymbol{r}}_{j}-\widehat{\boldsymbol{r}}_{i_{\rho(k)}}\right\|_{2}&=\left\|\boldsymbol{R}^{\top}\left(\widehat{\boldsymbol{r}}_{j}-\widehat{\boldsymbol{r}}_{i_{\rho(k)}}\right)\right\|_{2}\leq\left\|\boldsymbol{R}^{\top}\widehat{\boldsymbol{r}}_{j}-\boldsymbol{r}_{j}^{*}\right\|_{2}+\left\|\boldsymbol{r}_{j}^{*}-\boldsymbol{r}_{i_{\rho(k)}}^{*}\right\|_{2}+\left\|\boldsymbol{r}_{i_{\rho(k)}}^{*}-\boldsymbol{R}^{\top}\widehat{\boldsymbol{r}}_{i_{\rho(k)}}\right\|_{2}\\
&\leq\left\|\boldsymbol{b}_{k}^{*}-\boldsymbol{b}_{k}^{*}\right\|_{2}+2\max_{i\in[n]}\left\|\boldsymbol{R}^{\top}\widehat{\boldsymbol{r}}_{i}-\boldsymbol{r}_{i}^{*}\right\|_{2}\\
&\lesssim\frac{K\sqrt{\mu^{*}}}{n^{0.5}\theta_{\text{max}}}+\frac{K^{0.5}\log n\sqrt{\mu^{*}}}{n^{0.5}\theta_{\text{max}}^{2}}+\frac{K^{1.5}\log^{0.5}n}{n^{0.5}\beta_{n}\theta_{\text{max}}}+\frac{K^{1.5}\log n\sqrt{\mu^{*}}}{n\beta_{n}\theta_{\text{max}}^{2}}
\end{align*}

under event $\mathcal{A}_{2}$. As a result, for appropriately chosen $C_{\text{SP}}$, it holds that $\|\widehat{\boldsymbol{r}}_{j}-\widehat{\boldsymbol{b}}_{\rho(k)}^{\prime}\|_{2}=\|\widehat{\boldsymbol{r}}_{j}-\widehat{\boldsymbol{r}}_{i_{\rho(k)}}\|_{2}\leq\phi$. In other words, we must have $j\in\widehat{\mathbb{V}}_{\rho(k)}$. On the other hand, if $j\notin\mathbb{V}_{k}$, again by the triangle inequality we have

\begin{align*}
\left\|\boldsymbol{r}_{j}^{*}-\boldsymbol{b}_{k}^{*}\right\|_{2}=\left\|\boldsymbol{r}_{j}^{*}-\boldsymbol{r}_{i_{\rho(k)}}^{*}\right\|_{2}&\leq\left\|\boldsymbol{r}_{j}^{*}-\boldsymbol{R}^{\top}\widehat{\boldsymbol{r}}_{j}\right\|_{2}+\left\|\boldsymbol{R}^{\top}\left(\widehat{\boldsymbol{r}}_{j}-\widehat{\boldsymbol{r}}_{i_{\rho(k)}}\right)\right\|_{2}+\left\|\boldsymbol{R}^{\top}\widehat{\boldsymbol{r}}_{i_{\rho(k)}}-\boldsymbol{r}_{i_{\rho(k)}}^{*}\right\|_{2}\\
&=\left\|\boldsymbol{r}_{j}^{*}-\boldsymbol{R}^{\top}\widehat{\boldsymbol{r}}_{j}\right\|_{2}+\left\|\widehat{\boldsymbol{r}}_{j}-\widehat{\boldsymbol{r}}_{i_{\rho(k)}}\right\|_{2}+\left\|\boldsymbol{R}^{\top}\widehat{\boldsymbol{r}}_{i_{\rho(k)}}-\boldsymbol{r}_{i_{\rho(k)}}^{*}\right\|_{2}.
\end{align*}

As a result, $\|\widehat{\boldsymbol{r}}_{j}-\widehat{\boldsymbol{r}}_{i_{\rho(k)}}\|_{2}$ can be lower bounded as

\begin{align*}
\left\|\widehat{\boldsymbol{r}}_{j}-\widehat{\boldsymbol{r}}_{i_{\rho(k)}}\right\|_{2}&\geq\left\|\boldsymbol{r}_{j}^{*}-\boldsymbol{b}_{k}^{*}\right\|_{2}-\left\|\boldsymbol{r}_{j}^{*}-\boldsymbol{R}^{\top}\widehat{\boldsymbol{r}}_{j}\right\|_{2}-\left\|\boldsymbol{R}^{\top}\widehat{\boldsymbol{r}}_{i_{\rho(k)}}-\boldsymbol{r}_{i_{\rho(k)}}^{*}\right\|_{2}\\
&\geq\min_{l\in[K]}\min_{i\in[n]\backslash\mathbb{V}_{l}}\left\|\boldsymbol{r}_{i}^{*}-\boldsymbol{b}_{l}^{*}\right\|_{2}-2\max_{i\in[n]}\left\|\boldsymbol{R}^{\top}\widehat{\boldsymbol{r}}_{i}-\boldsymbol{r}_{i}^{*}\right\|_{2}\\
&=\Delta_{\boldsymbol{r}}-2\max_{i\in[n]}\left\|\boldsymbol{R}^{\top}\widehat{\boldsymbol{r}}_{i}-\boldsymbol{r}_{i}^{*}\right\|_{2}>\phi
\end{align*}

as long as $C_{\text{SP}}$ satisfies

\[
\max_{i\in[n]}\left\|\boldsymbol{R}^{\top}\widehat{\boldsymbol{r}}_{i}-\boldsymbol{r}_{i}^{*}\right\|_{2}\leq\frac{C_{\text{SP}}}{4}\varepsilon_{1}
\]

under event $\mathcal{A}_{2}$. In that case, we have $\|\widehat{\boldsymbol{r}}_{j}-\widehat{\boldsymbol{r}}_{i_{\rho(k)}}\|_{2}>\phi$, which implies $j\notin\widehat{\mathbb{V}}_{\rho(k)}$. To sum up, we have $\widehat{\mathbb{V}}_{\rho(k)}=\mathbb{V}_{k}$ for all $k\in[K]$ under event $\mathcal{A}_{2}$. ∎
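The separation argument above can be illustrated numerically. The following is a minimal sketch, not the paper's Algorithm 1: it assumes that each estimated pure-node set collects the nodes whose estimated row lies within `phi` of a selected vertex row, as the proof suggests. The toy points, the noise values, the selected indices `vertex_idx`, and all function names are hypothetical. With $K=3$ the rows live in $\mathbb{R}^{2}$; the rotation $\boldsymbol{R}$ is taken to be the identity.

```python
import math

def dist(p, q):
    """Euclidean distance between two points in R^2."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def recover_pure_sets(r_hat, vertex_idx, phi):
    """For each selected index i_l, collect all j with ||r_hat[j] - r_hat[i_l]|| <= phi."""
    return [
        {j for j, rj in enumerate(r_hat) if dist(rj, r_hat[i]) <= phi}
        for i in vertex_idx
    ]

# Simplex vertices b_1, b_2, b_3 (toy values).
vertices = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
# Nodes 0-5 are pure (two per vertex); nodes 6-8 are mixed.
r_star = [vertices[0], vertices[0], vertices[1], vertices[1],
          vertices[2], vertices[2], (0.5, 0.0), (0.0, 0.5), (0.3, 0.3)]
true_sets = [{0, 1}, {2, 3}, {4, 5}]

# Fixed small perturbations standing in for the estimation error.
noise = [(0.02, -0.01), (-0.01, 0.02), (0.01, 0.01), (-0.02, 0.0),
         (0.0, -0.02), (0.01, -0.01), (0.02, 0.02), (-0.01, 0.01), (0.0, 0.02)]
r_hat = [(p[0] + e[0], p[1] + e[1]) for p, e in zip(r_star, noise)]

# Largest row-wise error, and the gap Delta_r between vertices and other rows.
err = max(dist(p, q) for p, q in zip(r_star, r_hat))
delta_r = min(
    dist(r_star[i], vertices[l])
    for l in range(3)
    for i in range(len(r_star))
    if i not in true_sets[l]
)

phi = 0.1              # threshold with 2*err <= phi < delta_r - 2*err
vertex_idx = [1, 2, 5]  # one (hypothetical) selected pure node per vertex
recovered = recover_pure_sets(r_hat, vertex_idx, phi)
```

When the threshold satisfies `2 * err <= phi < delta_r - 2 * err`, pure nodes of the same vertex stay within `phi` of each other and every other node stays farther away, which mirrors the two triangle-inequality steps of the proof.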

8.15 Proof of Corollary 1

Proof.

Let $\rho(\cdot)$ be the permutation from Theorem 5. According to Theorem 5, with probability at least $1-O(n^{-10})$, we have $\widehat{\mathbb{V}}_{\rho(k)}=\mathbb{V}_{k}$ for all $k\in[K]$. Combining this fact with Algorithm 1, one can see that

\[
\widehat{\boldsymbol{b}}_{\rho(k)}=\frac{1}{|\widehat{\mathbb{V}}_{\rho(k)}|}\sum_{i\in\widehat{\mathbb{V}}_{\rho(k)}}\widehat{\boldsymbol{r}}_{i}=\frac{1}{|\mathbb{V}_{k}|}\sum_{i\in\mathbb{V}_{k}}\widehat{\boldsymbol{r}}_{i}.
\]

As a result, since $\boldsymbol{r}_{i}^{*}=\boldsymbol{b}_{k}^{*}$ for every $i\in\mathbb{V}_{k}$, we can write

\[
\boldsymbol{R}^{\top}\widehat{\boldsymbol{b}}_{\rho(k)}-\boldsymbol{b}_{k}^{*}=\frac{1}{|\mathbb{V}_{k}|}\sum_{i\in\mathbb{V}_{k}}\left(\boldsymbol{R}^{\top}\widehat{\boldsymbol{r}}_{i}-\boldsymbol{b}_{k}^{*}\right)=\frac{1}{|\mathbb{V}_{k}|}\sum_{i\in\mathbb{V}_{k}}\left(\boldsymbol{R}^{\top}\widehat{\boldsymbol{r}}_{i}-\boldsymbol{r}_{i}^{*}\right).
\]

Then by (19) we know that

\begin{align*}
\left\|[\boldsymbol{\Psi}_{\boldsymbol{b}}]_{k,\cdot}\right\|_{2}&=\left\|\frac{1}{|\mathbb{V}_{k}|}\sum_{i\in\mathbb{V}_{k}}\left(\boldsymbol{R}^{\top}\widehat{\boldsymbol{r}}_{i}-\boldsymbol{r}_{i}^{*}\right)-\frac{1}{\left|\mathbb{V}_{k}\right|}\sum_{i\in\mathbb{V}_{k}}\Delta\boldsymbol{r}_{i}\right\|_{2}\\
&\leq\frac{1}{|\mathbb{V}_{k}|}\left\|\sum_{i\in\mathbb{V}_{k}}\left(\boldsymbol{R}^{\top}\widehat{\boldsymbol{r}}_{i}-\boldsymbol{r}_{i}^{*}-\Delta\boldsymbol{r}_{i}\right)\right\|_{2}\lesssim\varepsilon_{2}
\end{align*}

with probability at least $1-O(n^{-10})$. ∎

8.16 Proof of Corollary 3

Proof.

Considering the trace on both sides of (36) we get

\begin{align*}
\widehat{\lambda}_{1}-\lambda_{1}^{*}&=\textbf{Tr}\left[\widehat{\lambda}_{1}\widehat{\boldsymbol{u}}_{1}\widehat{\boldsymbol{u}}_{1}^{\top}-\lambda_{1}^{*}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right]\\
&=\textbf{Tr}\left[\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}+\boldsymbol{N}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}+\left(\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right)^{\top}\boldsymbol{N}+\boldsymbol{\Delta}\right]\\
&=\textbf{Tr}\left[\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}+\boldsymbol{N}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}+\boldsymbol{N}^{\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right]+\textbf{Tr}\left[\boldsymbol{\Delta}\right]\\
&=\textbf{Tr}\left[\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}+2\boldsymbol{N}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right]+\textbf{Tr}\left[\boldsymbol{\Delta}\right].
\end{align*}

And, from (36) we also know that

\begin{align*}
\boldsymbol{\Delta}&=\widehat{\lambda}_{1}\widehat{\boldsymbol{u}}_{1}\widehat{\boldsymbol{u}}_{1}^{\top}-\lambda_{1}^{*}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}-\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}-\boldsymbol{N}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}-\left(\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right)^{\top}\boldsymbol{N}\\
&=\widehat{\lambda}_{1}\widehat{\boldsymbol{u}}_{1}\widehat{\boldsymbol{u}}_{1}^{\top}-\boldsymbol{u}_{1}^{*}\left(\lambda_{1}^{*}\boldsymbol{u}_{1}^{*\top}+\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}+\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{N}\right)-\boldsymbol{N}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}.
\end{align*}

As a result, the rank of $\boldsymbol{\Delta}$ is at most $3$. Therefore, the trace of $\boldsymbol{\Delta}$ can be bounded as $\left|\textbf{Tr}\left[\boldsymbol{\Delta}\right]\right|\leq\textbf{Rank}\left[\boldsymbol{\Delta}\right]\left\|\boldsymbol{\Delta}\right\|\lesssim\frac{n\theta_{\text{max}}^{2}}{\lambda_{1}^{*}}$, completing the proof. ∎
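The rank-trace inequality used in the last step can be spelled out as follows. This is a sketch of the standard argument rather than part of the original proof; the singular values $s_{j}(\boldsymbol{\Delta})$ and the shorthand $r$ are notation introduced here. The three rank-one pieces are the three terms in the second expression for $\boldsymbol{\Delta}$ above, each an outer product of two vectors.

```latex
\begin{align*}
\left|\textbf{Tr}\left[\boldsymbol{\Delta}\right]\right|
&\leq\sum_{j=1}^{r}s_{j}(\boldsymbol{\Delta})
  && \text{(the trace is bounded by the nuclear norm; } r=\textbf{Rank}\left[\boldsymbol{\Delta}\right]\leq 3\text{)}\\
&\leq r\,\max_{1\leq j\leq r}s_{j}(\boldsymbol{\Delta})
 =\textbf{Rank}\left[\boldsymbol{\Delta}\right]\left\|\boldsymbol{\Delta}\right\|.
\end{align*}
```

Only the $r$ nonzero singular values contribute to the sum, which is why a rank bound of $3$ turns a spectral-norm bound on $\boldsymbol{\Delta}$ into a trace bound of the same order.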

8.17 Proof of Lemma 2

Proof.

By Corollary 1 we know that

\begin{align*}
\widehat{\boldsymbol{b}}_{\rho(k)}^{\top}\overline{\boldsymbol{\Lambda}}\widehat{\boldsymbol{b}}_{\rho(k)}={}&\left(\boldsymbol{b}_{k}^{*}+\Delta\boldsymbol{b}_{k}+[\boldsymbol{\Psi}_{\boldsymbol{b}}]_{k,\cdot}\right)^{\top}\boldsymbol{R}^{\top}\overline{\boldsymbol{\Lambda}}\boldsymbol{R}\left(\boldsymbol{b}_{k}^{*}+\Delta\boldsymbol{b}_{k}+[\boldsymbol{\Psi}_{\boldsymbol{b}}]_{k,\cdot}\right)\\
={}&\left(\boldsymbol{b}_{k}^{*}+\Delta\boldsymbol{b}_{k}+[\boldsymbol{\Psi}_{\boldsymbol{b}}]_{k,\cdot}\right)^{\top}\overline{\boldsymbol{\Lambda}}^{*}\left(\boldsymbol{b}_{k}^{*}+\Delta\boldsymbol{b}_{k}+[\boldsymbol{\Psi}_{\boldsymbol{b}}]_{k,\cdot}\right)\\
&+\left(\boldsymbol{b}_{k}^{*}+\Delta\boldsymbol{b}_{k}+[\boldsymbol{\Psi}_{\boldsymbol{b}}]_{k,\cdot}\right)^{\top}\left(\boldsymbol{R}^{\top}\overline{\boldsymbol{\Lambda}}\boldsymbol{R}-\overline{\boldsymbol{\Lambda}}^{*}\right)\left(\boldsymbol{b}_{k}^{*}+\Delta\boldsymbol{b}_{k}+[\boldsymbol{\Psi}_{\boldsymbol{b}}]_{k,\cdot}\right).\tag{124}
\end{align*}

On one hand, by Lemma 8 we have

\begin{align*}
&\left|\left(\boldsymbol{b}_{k}^{*}+\Delta\boldsymbol{b}_{k}+[\boldsymbol{\Psi}_{\boldsymbol{b}}]_{k,\cdot}\right)^{\top}\left(\boldsymbol{R}^{\top}\overline{\boldsymbol{\Lambda}}\boldsymbol{R}-\overline{\boldsymbol{\Lambda}}^{*}\right)\left(\boldsymbol{b}_{k}^{*}+\Delta\boldsymbol{b}_{k}+[\boldsymbol{\Psi}_{\boldsymbol{b}}]_{k,\cdot}\right)\right|\\
\leq{}&\left\|\boldsymbol{R}^{\top}\overline{\boldsymbol{\Lambda}}\boldsymbol{R}-\overline{\boldsymbol{\Lambda}}^{*}\right\|\left\|\boldsymbol{b}_{k}^{*}+\Delta\boldsymbol{b}_{k}+[\boldsymbol{\Psi}_{\boldsymbol{b}}]_{k,\cdot}\right\|^{2}_{2}\\
\lesssim{}&\left(\frac{\kappa^{*}n\theta_{\text{max}}^{2}}{\sigma_{\textbf{min}}^{*}}+\sqrt{(K-1)\log n}\theta_{\text{max}}\right)\left(\sqrt{K-1}+\varepsilon_{1}\right)^{2}\lesssim\frac{\kappa^{*}K^{2}}{\beta_{n}}+K^{1.5}\theta_{\text{max}}\log^{0.5}n\tag{125}
\end{align*}

with probability exceeding $1-O(n^{-10})$. On the other hand, by the triangle inequality we know that

\begin{align*}
&\left|\left(\boldsymbol{b}_{k}^{*}+\Delta\boldsymbol{b}_{k}+[\boldsymbol{\Psi}_{\boldsymbol{b}}]_{k,\cdot}\right)^{\top}\overline{\boldsymbol{\Lambda}}^{*}\left(\boldsymbol{b}_{k}^{*}+\Delta\boldsymbol{b}_{k}+[\boldsymbol{\Psi}_{\boldsymbol{b}}]_{k,\cdot}\right)-\boldsymbol{b}_{k}^{*\top}\overline{\boldsymbol{\Lambda}}^{*}\boldsymbol{b}_{k}^{*}-2\boldsymbol{b}_{k}^{*\top}\overline{\boldsymbol{\Lambda}}^{*}\Delta\boldsymbol{b}_{k}\right|\\
\leq{}&\left|[\boldsymbol{\Psi}_{\boldsymbol{b}}]_{k,\cdot}^{\top}\overline{\boldsymbol{\Lambda}}^{*}\left(\boldsymbol{b}_{k}^{*}+\Delta\boldsymbol{b}_{k}+[\boldsymbol{\Psi}_{\boldsymbol{b}}]_{k,\cdot}\right)\right|+\left|\left(\boldsymbol{b}_{k}^{*}+\Delta\boldsymbol{b}_{k}\right)^{\top}\overline{\boldsymbol{\Lambda}}^{*}[\boldsymbol{\Psi}_{\boldsymbol{b}}]_{k,\cdot}\right|+\left|\Delta\boldsymbol{b}_{k}^{\top}\overline{\boldsymbol{\Lambda}}^{*}\Delta\boldsymbol{b}_{k}\right|\\
\lesssim{}&\left(\sqrt{K-1}\varepsilon_{2}+\varepsilon_{1}^{2}\right)\sigma_{\textbf{max}}^{*}.\tag{126}
\end{align*}

Plugging (125) and (126) into (124) we get

\begin{align*}
&\left|\widehat{\boldsymbol{b}}_{\rho(k)}^{\top}\overline{\boldsymbol{\Lambda}}\widehat{\boldsymbol{b}}_{\rho(k)}-\boldsymbol{b}_{k}^{*\top}\overline{\boldsymbol{\Lambda}}^{*}\boldsymbol{b}_{k}^{*}-2\boldsymbol{b}_{k}^{*\top}\overline{\boldsymbol{\Lambda}}^{*}\Delta\boldsymbol{b}_{k}\right|\\
\lesssim{}&\frac{\kappa^{*}K^{2}}{\beta_{n}}+K^{1.5}\mu^{*}\theta_{\text{max}}\log^{0.5}n+\left(K^{0.5}\varepsilon_{2}+\varepsilon_{1}^{2}\right)\sigma_{\textbf{max}}^{*}
\end{align*}

with probability at least $1-O(n^{-10})$. Moreover, the left-hand side is of smaller order than $K^{0.5}\varepsilon_{1}\sigma_{\textbf{max}}^{*}$, which controls $\boldsymbol{b}_{k}^{*\top}\overline{\boldsymbol{\Lambda}}^{*}\Delta\boldsymbol{b}_{k}$. As a result, the estimation error can be controlled by $K^{0.5}\varepsilon_{1}\sigma_{\textbf{max}}^{*}$. ∎

8.18 Proof of Lemma 3

Proof.

We denote by

\[
\boldsymbol{\widehat{B}}=\begin{bmatrix}\widehat{\boldsymbol{b}}_{1}&\widehat{\boldsymbol{b}}_{2}&\dots&\widehat{\boldsymbol{b}}_{K}\\ 1&1&\dots&1\end{bmatrix}\in\mathbb{R}^{K\times K}.
\]

By the definition of $\widehat{\boldsymbol{a}}_{i}$, we know that $\boldsymbol{\widehat{B}}\widehat{\boldsymbol{a}}_{i}=[\widehat{\boldsymbol{r}}_{i}^{\top},1]^{\top}$. We also denote by

\[
\widetilde{\boldsymbol{R}}=\begin{bmatrix}\boldsymbol{R}&\boldsymbol{0}_{(K-1)\times 1}\\ \boldsymbol{0}_{1\times(K-1)}&1\end{bmatrix}\in\mathbb{R}^{K\times K}.
\]

Let $\rho(\cdot)$ be the permutation from Theorem 5, then we have

\[
\widetilde{\boldsymbol{R}}^{\top}\begin{bmatrix}\widehat{\boldsymbol{r}}_{i}\\ 1\end{bmatrix}=\widetilde{\boldsymbol{R}}^{\top}\boldsymbol{\widehat{B}}\widehat{\boldsymbol{a}}_{i}=\widetilde{\boldsymbol{R}}^{\top}\sum_{j=1}^{K}\begin{bmatrix}\widehat{\boldsymbol{b}}_{j}\\ 1\end{bmatrix}(\widehat{\boldsymbol{a}}_{i})_{j}=\widetilde{\boldsymbol{R}}^{\top}\sum_{j=1}^{K}\begin{bmatrix}\widehat{\boldsymbol{b}}_{\rho(j)}\\ 1\end{bmatrix}(\widehat{\boldsymbol{a}}_{i})_{\rho(j)}=\widetilde{\boldsymbol{R}}^{\top}\rho(\boldsymbol{\widehat{B}})\rho(\widehat{\boldsymbol{a}}_{i}).
\]

Therefore, we can write

\begin{align*}
\boldsymbol{B}^{*}\left(\rho(\widehat{\boldsymbol{a}}_{i})-\boldsymbol{a}_{i}^{*}\right)&=\boldsymbol{B}^{*}\rho(\widehat{\boldsymbol{a}}_{i})-\begin{bmatrix}\boldsymbol{r}_{i}^{*}\\ 1\end{bmatrix}=\widetilde{\boldsymbol{R}}^{\top}\begin{bmatrix}\widehat{\boldsymbol{r}}_{i}\\ 1\end{bmatrix}-\begin{bmatrix}\boldsymbol{r}_{i}^{*}\\ 1\end{bmatrix}-\left(\widetilde{\boldsymbol{R}}^{\top}\rho(\boldsymbol{\widehat{B}})-\boldsymbol{B}^{*}\right)\rho(\widehat{\boldsymbol{a}}_{i})\\
&=\widetilde{\boldsymbol{R}}^{\top}\begin{bmatrix}\widehat{\boldsymbol{r}}_{i}\\ 1\end{bmatrix}-\begin{bmatrix}\boldsymbol{r}_{i}^{*}\\ 1\end{bmatrix}-\left(\widetilde{\boldsymbol{R}}^{\top}\rho(\boldsymbol{\widehat{B}})-\boldsymbol{B}^{*}\right)\boldsymbol{a}^{*}_{i}-\left(\widetilde{\boldsymbol{R}}^{\top}\rho(\boldsymbol{\widehat{B}})-\boldsymbol{B}^{*}\right)\left(\rho(\widehat{\boldsymbol{a}}_{i})-\boldsymbol{a}^{*}_{i}\right).\tag{127}
\end{align*}

Denote by $\boldsymbol{\Psi}_{\boldsymbol{B}}=[\boldsymbol{\Psi}_{\boldsymbol{b}},\boldsymbol{0}_{K\times 1}]^{\top}$. By Corollary 1 we know that

\[
\widetilde{\boldsymbol{R}}^{\top}\rho(\boldsymbol{\widehat{B}})-\boldsymbol{B}^{*}=\Delta\boldsymbol{B}+\boldsymbol{\Psi}_{\boldsymbol{B}},\tag{128}
\]

and

\begin{align*}
&\left\|\widetilde{\boldsymbol{R}}^{\top}\rho(\boldsymbol{\widehat{B}})-\boldsymbol{B}^{*}\right\|\leq\left\|\widetilde{\boldsymbol{R}}^{\top}\rho(\boldsymbol{\widehat{B}})-\boldsymbol{B}^{*}\right\|_{F}\leq\sqrt{K}\sup_{i\in[n]}\left\|\boldsymbol{R}^{\top}\widehat{\boldsymbol{r}}_{i}-\boldsymbol{r}_{i}^{*}\right\|_{2}\lesssim\sqrt{K}\varepsilon_{1};\\
&\left\|\boldsymbol{\Psi}_{\boldsymbol{B}}\right\|\leq\sqrt{K}\left\|\boldsymbol{\Psi}_{\boldsymbol{b}}\right\|_{2,\infty}\lesssim\sqrt{K}\varepsilon_{2}.
\end{align*}

Next, by Theorem 3,

\[
\widetilde{\boldsymbol{R}}^{\top}\begin{bmatrix}\widehat{\boldsymbol{r}}_{i}\\ 1\end{bmatrix}-\begin{bmatrix}\boldsymbol{r}_{i}^{*}\\ 1\end{bmatrix}=\begin{bmatrix}\Delta\boldsymbol{r}_{i}\\ 0\end{bmatrix}+\begin{bmatrix}[\boldsymbol{\Psi}_{\boldsymbol{r}}]_{i,\cdot}^{\top}\\ 0\end{bmatrix},\tag{129}
\]

where $\|[\boldsymbol{\Psi}_{\boldsymbol{r}}]_{i,\cdot}\|\lesssim\varepsilon_{2}$. Plugging (128) and (129) into (127), we get

\[
\boldsymbol{B}^{*}\left(\rho(\widehat{\boldsymbol{a}}_{i})-\boldsymbol{a}_{i}^{*}\right)=\begin{bmatrix}\Delta\boldsymbol{r}_{i}\\ 0\end{bmatrix}-\Delta\boldsymbol{B}\boldsymbol{a}_{i}^{*}+\begin{bmatrix}[\boldsymbol{\Psi}_{\boldsymbol{r}}]_{i,\cdot}^{\top}\\ 0\end{bmatrix}-\boldsymbol{\Psi}_{\boldsymbol{B}}\boldsymbol{a}_{i}^{*}-\left(\widetilde{\boldsymbol{R}}^{\top}\rho(\boldsymbol{\widehat{B}})-\boldsymbol{B}^{*}\right)\left(\rho(\widehat{\boldsymbol{a}}_{i})-\boldsymbol{a}^{*}_{i}\right).\tag{130}
\]

Since $\boldsymbol{b}_{1}^{*},\boldsymbol{b}_{2}^{*},\dots,\boldsymbol{b}_{K}^{*}$ form a simplex and all $\boldsymbol{r}_{i}^{*},i\in[n]$ are inside it, we know that the entries of $\boldsymbol{a}_{i}^{*}$ are within $[0,1]$. As a result, we know that $\|\boldsymbol{a}_{i}^{*}\|_{2}^{2}=\sum_{j=1}^{K}(\boldsymbol{a}_{i}^{*})_{j}^{2}\leq\sum_{j=1}^{K}(\boldsymbol{a}_{i}^{*})_{j}=1$. That is to say, we have

\[
\left\|\begin{bmatrix}[\boldsymbol{\Psi}_{\boldsymbol{r}}]_{i,\cdot}^{\top}\\ 0\end{bmatrix}-\boldsymbol{\Psi}_{\boldsymbol{B}}\boldsymbol{a}_{i}^{*}\right\|_{2}\leq\|[\boldsymbol{\Psi}_{\boldsymbol{r}}]_{i,\cdot}\|+\left\|\boldsymbol{\Psi}_{\boldsymbol{B}}\right\|\left\|\boldsymbol{a}_{i}^{*}\right\|_{2}\lesssim\sqrt{K}\varepsilon_{2}.\tag{131}
\]

On the other hand, we have

\[
\left\|\left(\widetilde{\boldsymbol{R}}^{\top}\rho(\boldsymbol{\widehat{B}})-\boldsymbol{B}^{*}\right)\left(\rho(\widehat{\boldsymbol{a}}_{i})-\boldsymbol{a}^{*}_{i}\right)\right\|_{2}\lesssim\sqrt{K}\varepsilon_{1}\left\|\rho(\widehat{\boldsymbol{a}}_{i})-\boldsymbol{a}^{*}_{i}\right\|_{2}.\tag{132}
\]

According to [JKL23, (C.26)], we know that $\|(\boldsymbol{B}^{*})^{-1}\|\lesssim 1/\sqrt{K}$. As a result, plugging (131) and (132) into (130) we get

\[
\left\|\rho(\widehat{\boldsymbol{a}}_{i})-\boldsymbol{a}^{*}_{i}-(\boldsymbol{B}^{*})^{-1}\begin{bmatrix}\Delta\boldsymbol{r}_{i}\\ 0\end{bmatrix}+(\boldsymbol{B}^{*})^{-1}\Delta\boldsymbol{B}\boldsymbol{a}_{i}^{*}\right\|_{2}\lesssim\varepsilon_{2}+\varepsilon_{1}\left\|\rho(\widehat{\boldsymbol{a}}_{i})-\boldsymbol{a}^{*}_{i}\right\|_{2}.\tag{133}
\]

And, since

\[
\left\|(\boldsymbol{B}^{*})^{-1}\begin{bmatrix}\Delta\boldsymbol{r}_{i}\\ 0\end{bmatrix}\right\|_{2}\leq\left\|(\boldsymbol{B}^{*})^{-1}\right\|\left\|\Delta\boldsymbol{r}_{i}\right\|_{2}\lesssim\frac{\varepsilon_{1}}{\sqrt{K}}\tag{134}
\]

and

\[
\left\|(\boldsymbol{B}^{*})^{-1}\Delta\boldsymbol{B}\boldsymbol{a}_{i}^{*}\right\|_{2}\lesssim\frac{\left\|\Delta\boldsymbol{B}\right\|_{F}\left\|\boldsymbol{a}^{*}_{i}\right\|_{2}}{\sqrt{K}}\leq\frac{\sqrt{K}\max_{k\in[K]}\left\|\Delta\boldsymbol{b}_{k}\right\|_{2}}{\sqrt{K}}\lesssim\varepsilon_{1},\tag{135}
\]

combining (134) and (135) with (133) we get

\[
\left\|\rho(\widehat{\boldsymbol{a}}_{i})-\boldsymbol{a}^{*}_{i}\right\|_{2}\lesssim\varepsilon_{1}+\varepsilon_{2}+\varepsilon_{1}\left\|\rho(\widehat{\boldsymbol{a}}_{i})-\boldsymbol{a}^{*}_{i}\right\|_{2}.
\]

Since $\varepsilon_{2}\lesssim\varepsilon_{1}\leq C$, we know that $\left\|\rho(\widehat{\boldsymbol{a}}_{i})-\boldsymbol{a}^{*}_{i}\right\|_{2}\lesssim\varepsilon_{1}$, completing the proof. ∎
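The linear system $\boldsymbol{\widehat{B}}\widehat{\boldsymbol{a}}_{i}=[\widehat{\boldsymbol{r}}_{i}^{\top},1]^{\top}$ that defines the weight vector, and the way a relabeling of the vertices permutes its entries, can be checked on a toy example. This is a minimal sketch for $K=3$ with made-up vertices and a made-up point; the helper names `solve` and `barycentric` are hypothetical and use only the standard library.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        # Move the row with the largest pivot candidate into position c.
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        # Eliminate column c from every other row.
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def barycentric(vertices, r):
    """Solve B a = [r; 1], where column k of B is [b_k; 1]."""
    K = len(vertices)
    B = [[vertices[k][d] for k in range(K)] for d in range(K - 1)]
    B.append([1.0] * K)  # last row of ones forces the weights to sum to 1
    return solve(B, list(r) + [1.0])

vertices = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # toy b_1, b_2, b_3
r = (0.2, 0.3)                                   # toy row inside the simplex

a = barycentric(vertices, r)

# Relabel the vertices by a permutation: the weights are permuted the same way.
perm = [2, 0, 1]
a_perm = barycentric([vertices[p] for p in perm], r)
```

For this configuration the weights are approximately `(0.5, 0.2, 0.3)`, and `a_perm[j]` agrees with `a[perm[j]]`, which is the relabeling identity used at the start of the proof.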

8.19 Proof of Theorem 6

Proof.

First we focus on $\widehat{c}_{k},k\in[K]$. Define $f(x):=\sqrt{x}$ and $c_{k}^{*}:=(\lambda_{1}^{*}+\boldsymbol{b}_{k}^{*\top}\overline{\boldsymbol{\Lambda}}^{*}\boldsymbol{b}_{k}^{*})^{-1/2}$. Set $x_{0}:=\lambda_{1}^{*}+\boldsymbol{b}_{k}^{*\top}\overline{\boldsymbol{\Lambda}}^{*}\boldsymbol{b}_{k}^{*}$, $x_{1}:=\widehat{\lambda}_{1}+\widehat{\boldsymbol{b}}_{\rho(k)}^{\top}\overline{\boldsymbol{\Lambda}}\widehat{\boldsymbol{b}}_{\rho(k)}$. By Taylor expansion, we know that there exists some $\tilde{x}$ between $x_{0}$ and $x_{1}$, such that

\[
\frac{1}{\widehat{c}_{\rho(k)}}-\frac{1}{c_{k}^{*}}=f(x_{1})-f(x_{0})=f^{\prime}(x_{0})(x_{1}-x_{0})+\frac{f^{\prime\prime}(\tilde{x})}{2}(x_{1}-x_{0})^{2}.
\]

Combining Corollary 3 and Lemma 2, we have

|x1x0(Tr[𝑾𝒖1𝒖1+2𝑵𝑾𝒖1𝒖1]+2𝒃k𝚲¯Δ𝒃k)|\displaystyle\left|x_{1}-x_{0}-\left(\textbf{Tr}\left[\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}+2\boldsymbol{N}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right]+2\boldsymbol{b}_{k}^{*\top}\overline{\boldsymbol{\Lambda}}^{*}\Delta\boldsymbol{b}_{k}\right)\right|
\displaystyle\leq |Tr[𝚫]|+𝝍κK2βn+K1.5θmaxlog0.5n+(K0.5ε2+ε12)σmax.\displaystyle\left|\textbf{Tr}\left[\boldsymbol{\Delta}\right]\right|+\left\|\boldsymbol{\psi}\right\|_{\infty}\lesssim\frac{\kappa^{*}K^{2}}{\beta_{n}}+K^{1.5}\theta_{\text{max}}\log^{0.5}n+\left(K^{0.5}\varepsilon_{2}+\varepsilon_{1}^{2}\right)\sigma_{\textbf{max}}^{*}.

Note that $f^{\prime}(x_{0})=0.5x_{0}^{-1/2}=0.5c_{k}^{*}$. Since $K^{0.5}\varepsilon_{1}\sigma_{\textbf{max}}^{*}\ll x_{0}\asymp n\theta_{\text{max}}^{2}$, we have $|f^{\prime\prime}(\tilde{x})|\asymp|f^{\prime\prime}(x_{0})|\asymp c_{k}^{*3}$. Hence,

|1c^ρ(k)1ckck2(Tr[𝑾𝒖1𝒖1+2𝑵𝑾𝒖1𝒖1]+2𝒃k𝚲¯Δ𝒃k)|\displaystyle\left|\frac{1}{\widehat{c}_{\rho(k)}}-\frac{1}{c_{k}^{*}}-\frac{c_{k}^{*}}{2}\left(\textbf{Tr}\left[\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}+2\boldsymbol{N}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right]+2\boldsymbol{b}_{k}^{*\top}\overline{\boldsymbol{\Lambda}}^{*}\Delta\boldsymbol{b}_{k}\right)\right|
\displaystyle\lesssim (κK2βn+K1.5θmaxlog0.5n+(K0.5ε2+ε12)σmax)ck+Kε12σmax2ck3.\displaystyle\left(\frac{\kappa^{*}K^{2}}{\beta_{n}}+K^{1.5}\theta_{\text{max}}\log^{0.5}n+\left(K^{0.5}\varepsilon_{2}+\varepsilon_{1}^{2}\right)\sigma_{\textbf{max}}^{*}\right)c_{k}^{*}+K\varepsilon_{1}^{2}\sigma_{\textbf{max}}^{*2}c_{k}^{*3}. (136)

Again by Taylor expansion, this time to first order, there exists some $\tilde{x}^{\prime}$ between $x_{0}$ and $x_{1}$ such that

1c^ρ(k)1ck=f(x1)f(x0)=f(x~)(x1x0).\displaystyle\frac{1}{\widehat{c}_{\rho(k)}}-\frac{1}{c_{k}^{*}}=f(x_{1})-f(x_{0})=f^{\prime}(\tilde{x}^{\prime})(x_{1}-x_{0}).

Since $\sqrt{K-1}\varepsilon_{1}\sigma_{\textbf{max}}^{*}\ll x_{0}\asymp n\theta_{\text{max}}^{2}$, we have $f^{\prime}(\tilde{x}^{\prime})\asymp f^{\prime}(x_{0})\asymp c_{k}^{*}$. So,

|1c^ρ(k)1ck|ck|x1x0|K0.5ε1σmaxck.\displaystyle\left|\frac{1}{\widehat{c}_{\rho(k)}}-\frac{1}{c_{k}^{*}}\right|\asymp c_{k}^{*}\left|x_{1}-x_{0}\right|\lesssim K^{0.5}\varepsilon_{1}\sigma_{\textbf{max}}^{*}c_{k}^{*}. (137)
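
The two Taylor steps for $f(x)=\sqrt{x}$ can be sanity-checked numerically. The sketch below uses made-up values for $x_{0}$ and $x_{1}$ (they are not taken from the paper) and a hypothetical helper `sqrt_expansion`; it compares the first-order error with the Lagrange remainder bound $\tfrac{1}{2}\max|f^{\prime\prime}|\,(x_{1}-x_{0})^{2}$.

```python
import math


def sqrt_expansion(x0: float, x1: float):
    """Return (first-order error, Lagrange remainder bound) for f(x) = sqrt(x)."""
    f_prime = 0.5 / math.sqrt(x0)          # f'(x0) = 0.5 * x0^{-1/2}
    linear = f_prime * (x1 - x0)           # first-order term of the expansion
    error = abs(math.sqrt(x1) - math.sqrt(x0) - linear)
    # |f''(x)| = 0.25 * x^{-3/2} is decreasing in x, so its maximum on the
    # interval between x0 and x1 is attained at the smaller endpoint.
    bound = 0.5 * 0.25 * min(x0, x1) ** (-1.5) * (x1 - x0) ** 2
    return error, bound


# Illustrative values only: x1 is a small relative perturbation of x0.
x0, x1 = 400.0, 404.0
error, bound = sqrt_expansion(x0, x1)
print(f"first-order error = {error:.3e}, remainder bound = {bound:.3e}")
```

When $|x_{1}-x_{0}|\ll x_{0}$, as assumed in the proof, the second derivative at any intermediate point is of the same order as at $x_{0}$, which is why the remainder scales like $c_{k}^{*3}(x_{1}-x_{0})^{2}$.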

Second, for the permutation $\rho(\cdot)$ from Theorem 5 and any $i\in[n]$, we have

(𝒂^i)ρ(k)c^ρ(k)(𝒂i)kck\displaystyle\frac{(\widehat{\boldsymbol{a}}_{i})_{\rho(k)}}{\widehat{c}_{\rho(k)}}-\frac{(\boldsymbol{a}_{i}^{*})_{k}}{c_{k}^{*}} =(𝒂^i)ρ(k)(𝒂i)kck+(1c^ρ(k)1ck)(𝒂i)k+(1c^ρ(k)1ck)((𝒂^i)ρ(k)(𝒂i)k)\displaystyle=\frac{(\widehat{\boldsymbol{a}}_{i})_{\rho(k)}-(\boldsymbol{a}_{i}^{*})_{k}}{c_{k}^{*}}+\left(\frac{1}{\widehat{c}_{\rho(k)}}-\frac{1}{c_{k}^{*}}\right)(\boldsymbol{a}_{i}^{*})_{k}+\left(\frac{1}{\widehat{c}_{\rho(k)}}-\frac{1}{c_{k}^{*}}\right)\left((\widehat{\boldsymbol{a}}_{i})_{\rho(k)}-(\boldsymbol{a}_{i}^{*})_{k}\right)
=(ρ(𝒂^i)𝒂i)kck+(1c^ρ(k)1ck)(𝒂i)k+(1c^ρ(k)1ck)(ρ(𝒂^i)𝒂i)k.\displaystyle=\frac{(\rho(\widehat{\boldsymbol{a}}_{i})-\boldsymbol{a}_{i}^{*})_{k}}{c_{k}^{*}}+\left(\frac{1}{\widehat{c}_{\rho(k)}}-\frac{1}{c_{k}^{*}}\right)(\boldsymbol{a}_{i}^{*})_{k}+\left(\frac{1}{\widehat{c}_{\rho(k)}}-\frac{1}{c_{k}^{*}}\right)(\rho(\widehat{\boldsymbol{a}}_{i})-\boldsymbol{a}_{i}^{*})_{k}.

Combining Lemma 3 and (136), we obtain the following expansion

(𝒂^i)ρ(k)c^ρ(k)(𝒂i)kck=(Δ𝒂i)kck+ck2(Tr[𝑾𝒖1𝒖1+2𝑵𝑾𝒖1𝒖1]+2𝒃k𝚲¯Δ𝒃k)(𝒂i)k+[𝚿𝒂/𝒄]i,k,\displaystyle\frac{(\widehat{\boldsymbol{a}}_{i})_{\rho(k)}}{\widehat{c}_{\rho(k)}}-\frac{(\boldsymbol{a}_{i}^{*})_{k}}{c_{k}^{*}}=\frac{(\Delta\boldsymbol{a}_{i})_{k}}{c_{k}^{*}}+\frac{c_{k}^{*}}{2}\left(\textbf{Tr}\left[\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}+2\boldsymbol{N}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right]+2\boldsymbol{b}_{k}^{*\top}\overline{\boldsymbol{\Lambda}}^{*}\Delta\boldsymbol{b}_{k}\right)(\boldsymbol{a}_{i}^{*})_{k}+\left[\boldsymbol{\Psi}_{\boldsymbol{a}/\boldsymbol{c}}\right]_{i,k},

where

|[𝚿𝒂/𝒄]i,k|\displaystyle\left|\left[\boldsymbol{\Psi}_{\boldsymbol{a}/\boldsymbol{c}}\right]_{i,k}\right|\leq |1c^ρ(k)1ckck2(Tr[𝑾𝒖1𝒖1+2𝑵𝑾𝒖1𝒖1]+2𝒃k𝚲¯Δ𝒃k)|(𝒂i)k\displaystyle\left|\frac{1}{\widehat{c}_{\rho(k)}}-\frac{1}{c_{k}^{*}}-\frac{c_{k}^{*}}{2}\left(\textbf{Tr}\left[\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}+2\boldsymbol{N}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right]+2\boldsymbol{b}_{k}^{*\top}\overline{\boldsymbol{\Lambda}}^{*}\Delta\boldsymbol{b}_{k}\right)\right|(\boldsymbol{a}_{i}^{*})_{k}
+|(ρ(𝒂^i)𝒂iΔ𝒂i)k|ck+|1c^ρ(k)1ck||(ρ(𝒂^i)𝒂i)k|\displaystyle+\frac{\left|(\rho(\widehat{\boldsymbol{a}}_{i})-\boldsymbol{a}_{i}^{*}-\Delta\boldsymbol{a}_{i})_{k}\right|}{c_{k}^{*}}+\left|\frac{1}{\widehat{c}_{\rho(k)}}-\frac{1}{c_{k}^{*}}\right|\left|(\rho(\widehat{\boldsymbol{a}}_{i})-\boldsymbol{a}_{i}^{*})_{k}\right|
\displaystyle\lesssim (κK2βn+K1.5θmaxlog0.5n+(K0.5ε2+ε12)σmax)ck+Kε12σmax2ck3\displaystyle\left(\frac{\kappa^{*}K^{2}}{\beta_{n}}+K^{1.5}\theta_{\text{max}}\log^{0.5}n+\left(K^{0.5}\varepsilon_{2}+\varepsilon_{1}^{2}\right)\sigma_{\textbf{max}}^{*}\right)c_{k}^{*}+K\varepsilon_{1}^{2}\sigma_{\textbf{max}}^{*2}c_{k}^{*3}
+ε2+ε12ck+K0.5ε12σmaxck\displaystyle+\frac{\varepsilon_{2}+\varepsilon_{1}^{2}}{c_{k}^{*}}+K^{0.5}\varepsilon_{1}^{2}\sigma_{\textbf{max}}^{*}c_{k}^{*}
(i)\displaystyle\overset{(i)}{\lesssim} 1𝜽2(κK2βn+K1.5θmaxlog0.5n+(K0.5ε2+ε12)σmax)+𝜽2(ε2+ε12).\displaystyle\frac{1}{\left\|\boldsymbol{\theta}\right\|_{2}}\left(\frac{\kappa^{*}K^{2}}{\beta_{n}}+K^{1.5}\theta_{\text{max}}\log^{0.5}n+\left(K^{0.5}\varepsilon_{2}+\varepsilon_{1}^{2}\right)\sigma_{\textbf{max}}^{*}\right)+\left\|\boldsymbol{\theta}\right\|_{2}(\varepsilon_{2}+\varepsilon_{1}^{2}).

Here $(i)$ holds because $c_{k}^{*}\asymp 1/\left\|\boldsymbol{\theta}\right\|_{2}$ according to [JKL23, C.22] and $\sigma_{\textbf{max}}^{*}\asymp\beta_{n}K^{-1}\left\|\boldsymbol{\theta}\right\|_{2}^{2}$ according to Lemma 1. In terms of the estimation error, by Theorem 3 and (137) we have

|(𝒂^i)ρ(k)c^ρ(k)(𝒂i)kck||(𝒂^i)ρ(k)c^ρ(k)(𝒂i)kc^ρ(k)|+|(𝒂i)kc^ρ(k)(𝒂i)kck|ε1ck+K0.5ε1σmaxckε1𝜽2.\displaystyle\left|\frac{(\widehat{\boldsymbol{a}}_{i})_{\rho(k)}}{\widehat{c}_{\rho(k)}}-\frac{(\boldsymbol{a}_{i}^{*})_{k}}{c_{k}^{*}}\right|\leq\left|\frac{(\widehat{\boldsymbol{a}}_{i})_{\rho(k)}}{\widehat{c}_{\rho(k)}}-\frac{(\boldsymbol{a}_{i}^{*})_{k}}{\widehat{c}_{\rho(k)}}\right|+\left|\frac{(\boldsymbol{a}_{i}^{*})_{k}}{\widehat{c}_{\rho(k)}}-\frac{(\boldsymbol{a}_{i}^{*})_{k}}{c_{k}^{*}}\right|\lesssim\frac{\varepsilon_{1}}{c_{k}^{*}}+K^{0.5}\varepsilon_{1}\sigma_{\textbf{max}}^{*}c_{k}^{*}\lesssim\varepsilon_{1}\left\|\boldsymbol{\theta}\right\|_{2}. (138)

Now we are ready to derive the expansion of $\widehat{\boldsymbol{\pi}}_{i}$. For any $i\in[n]$ and $k\in[K]$, we have

\begin{align*}
\widehat{\boldsymbol{\pi}}_{i}(\rho(k))-\boldsymbol{\pi}_{i}^{*}(k)
&=\frac{(\widehat{\boldsymbol{a}}_{i})_{\rho(k)}/\widehat{c}_{\rho(k)}}{\sum_{l=1}^{K}(\widehat{\boldsymbol{a}}_{i})_{\rho(l)}/\widehat{c}_{\rho(l)}}-\frac{(\boldsymbol{a}_{i}^{*})_{k}/c^{*}_{k}}{\sum_{l=1}^{K}(\boldsymbol{a}_{i}^{*})_{l}/c^{*}_{l}}\\
&=\frac{\sum_{l\neq k,l\in[K]}\left[\left((\widehat{\boldsymbol{a}}_{i})_{\rho(k)}/\widehat{c}_{\rho(k)}\right)\cdot\left((\boldsymbol{a}_{i}^{*})_{l}/c^{*}_{l}\right)-\left((\widehat{\boldsymbol{a}}_{i})_{\rho(l)}/\widehat{c}_{\rho(l)}\right)\cdot\left((\boldsymbol{a}_{i}^{*})_{k}/c^{*}_{k}\right)\right]}{\left(\sum_{l=1}^{K}(\widehat{\boldsymbol{a}}_{i})_{\rho(l)}/\widehat{c}_{\rho(l)}\right)\left(\sum_{l=1}^{K}(\boldsymbol{a}_{i}^{*})_{l}/c^{*}_{l}\right)}\\
&=(1+\eta_{i})\Delta\boldsymbol{\pi}_{i}(k)+\left[\boldsymbol{\Psi}_{\boldsymbol{\Pi}}\right]_{i,k},
\end{align*}

where

ηi=\displaystyle\eta_{i}= l=1K(𝒂i)l/cl(𝒂^i)ρ(l)/c^ρ(l)l=1K(𝒂^i)ρ(l)/c^ρ(l),\displaystyle\frac{\sum_{l=1}^{K}(\boldsymbol{a}_{i}^{*})_{l}/c^{*}_{l}-(\widehat{\boldsymbol{a}}_{i})_{\rho(l)}/\widehat{c}_{\rho(l)}}{\sum_{l=1}^{K}(\widehat{\boldsymbol{a}}_{i})_{\rho(l)}/\widehat{c}_{\rho(l)}},
Δ𝝅i(k)=\displaystyle\Delta\boldsymbol{\pi}_{i}(k)= 1(l=1K(𝒂i)l/cl)2{lk,l[K]Tr[𝑾𝒖1𝒖1+2𝑵𝑾𝒖1𝒖1](ck2clcl2ck)(𝒂i)k(𝒂i)l\displaystyle\frac{1}{\left(\sum_{l=1}^{K}(\boldsymbol{a}^{*}_{i})_{l}/c^{*}_{l}\right)^{2}}\Bigg{\{}\sum_{l\neq k,l\in[K]}\textbf{Tr}\left[\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}+2\boldsymbol{N}\boldsymbol{W}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right]\left(\frac{c_{k}^{*}}{2c_{l}^{*}}-\frac{c_{l}^{*}}{2c_{k}^{*}}\right)(\boldsymbol{a}_{i}^{*})_{k}(\boldsymbol{a}_{i}^{*})_{l}
+(Δ𝒂i)k(𝒂i)l(Δ𝒂i)l(𝒂i)kckcl+(𝒃k𝚲¯Δ𝒃kckcl𝒃l𝚲¯Δ𝒃lclck)(𝒂i)k(𝒂i)l},\displaystyle\quad\quad+\frac{(\Delta\boldsymbol{a}_{i})_{k}(\boldsymbol{a}_{i}^{*})_{l}-(\Delta\boldsymbol{a}_{i})_{l}(\boldsymbol{a}_{i}^{*})_{k}}{c_{k}^{*}c_{l}^{*}}+\left(\frac{\boldsymbol{b}_{k}^{*\top}\overline{\boldsymbol{\Lambda}}^{*}\Delta\boldsymbol{b}_{k}c_{k}^{*}}{c_{l}^{*}}-\frac{\boldsymbol{b}_{l}^{*\top}\overline{\boldsymbol{\Lambda}}^{*}\Delta\boldsymbol{b}_{l}c_{l}^{*}}{c_{k}^{*}}\right)(\boldsymbol{a}_{i}^{*})_{k}(\boldsymbol{a}_{i}^{*})_{l}\Bigg{\}},
[𝚿𝚷]i,k=\displaystyle\left[\boldsymbol{\Psi}_{\boldsymbol{\Pi}}\right]_{i,k}= lk,l[K][𝚿𝒂/𝒄]i,k(𝒂i)l/cl[𝚿𝒂/𝒄]i,l(𝒂i)k/ck(l=1K(𝒂^i)ρ(l)/c^ρ(l))(l=1K(𝒂i)l/cl).\displaystyle\frac{\sum_{l\neq k,l\in[K]}\left[\boldsymbol{\Psi}_{\boldsymbol{a}/\boldsymbol{c}}\right]_{i,k}(\boldsymbol{a}_{i}^{*})_{l}/c_{l}^{*}-\left[\boldsymbol{\Psi}_{\boldsymbol{a}/\boldsymbol{c}}\right]_{i,l}(\boldsymbol{a}_{i}^{*})_{k}/c_{k}^{*}}{\left(\sum_{l=1}^{K}(\widehat{\boldsymbol{a}}_{i})_{\rho(l)}/\widehat{c}_{\rho(l)}\right)\left(\sum_{l=1}^{K}(\boldsymbol{a}_{i}^{*})_{l}/c^{*}_{l}\right)}.
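
The algebra behind this decomposition is a generic identity for normalized scores, and a small numerical check may help. In the sketch below, `v_star` and `v_hat` are hypothetical stand-ins for the vectors with entries $(\boldsymbol{a}_{i}^{*})_{l}/c_{l}^{*}$ and $(\widehat{\boldsymbol{a}}_{i})_{\rho(l)}/\widehat{c}_{\rho(l)}$; the numbers are invented, and the permutation $\rho$ is taken to be the identity for simplicity.

```python
def normalize(v):
    """Map nonnegative scores v_l = a_l / c_l to mixing probabilities."""
    s = sum(v)
    return [x / s for x in v]


def difference_via_cross_terms(v_hat, v_star, k):
    """pi_hat(k) - pi_star(k) written as a sum of cross terms over l != k."""
    num = sum(v_hat[k] * v_star[l] - v_hat[l] * v_star[k]
              for l in range(len(v_star)) if l != k)
    return num / (sum(v_hat) * sum(v_star))


# Hypothetical scores for K = 3 communities.
v_star = [2.0, 1.0, 1.0]    # plays the role of (a_i^*)_l / c_l^*
v_hat = [2.1, 0.9, 1.05]    # plays the role of (a_hat_i)_l / c_hat_l

pi_star, pi_hat = normalize(v_star), normalize(v_hat)
eta = (sum(v_star) - sum(v_hat)) / sum(v_hat)   # eta_i as defined in the text

for k in range(3):
    direct = pi_hat[k] - pi_star[k]
    cross = difference_via_cross_terms(v_hat, v_star, k)
    # The same numerator divided by (sum v_star)^2 and rescaled by (1 + eta_i).
    rescaled = (1 + eta) * cross * sum(v_hat) / sum(v_star)
    print(k, direct, cross, rescaled)
```

The three printed quantities agree for each $k$ up to rounding, which reflects that $1+\eta_{i}$ is exactly the ratio of the true normalizer to the estimated one.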

Since $\varepsilon_{1}\leq C$ for some appropriate $C>0$, we have

|ckc^ρ(k)1|K0.5ε1σmaxck2K0.5ε1σmax𝜽22ε1K0.5|ckc^ρ(k)1|0.5.\displaystyle\left|\frac{c_{k}^{*}}{\widehat{c}_{\rho(k)}}-1\right|\lesssim K^{0.5}\varepsilon_{1}\sigma_{\textbf{max}}^{*}c_{k}^{*2}\lesssim K^{0.5}\varepsilon_{1}\frac{\sigma_{\textbf{max}}^{*}}{\left\|\boldsymbol{\theta}\right\|_{2}^{2}}\lesssim\frac{\varepsilon_{1}}{K^{0.5}}\Rightarrow\left|\frac{c_{k}^{*}}{\widehat{c}_{\rho(k)}}-1\right|\leq 0.5.

As a result, we have

l=1K(𝒂^i)ρ(l)c^ρ(l)l=1K(𝒂^i)ρ(l)maxt[K]c^t=1maxt[K]c^t1maxt[K]ct𝜽2.\displaystyle\sum_{l=1}^{K}\frac{(\widehat{\boldsymbol{a}}_{i})_{\rho(l)}}{\widehat{c}_{\rho(l)}}\geq\sum_{l=1}^{K}\frac{(\widehat{\boldsymbol{a}}_{i})_{\rho(l)}}{\max_{t\in[K]}\widehat{c}_{t}}=\frac{1}{\max_{t\in[K]}\widehat{c}_{t}}\gtrsim\frac{1}{\max_{t\in[K]}c^{*}_{t}}\asymp\left\|\boldsymbol{\theta}\right\|_{2}.

Combining this with (138), for $\eta_{i}$, $i\in[n]$, we have

|ηi|\displaystyle|\eta_{i}| l=1K|(𝒂i)l/cl(𝒂^i)ρ(l)/c^ρ(l)|l=1K(𝒂^i)ρ(l)/c^ρ(l)Kε1𝜽2𝜽2=Kε1.\displaystyle\leq\frac{\sum_{l=1}^{K}\left|(\boldsymbol{a}_{i}^{*})_{l}/c^{*}_{l}-(\widehat{\boldsymbol{a}}_{i})_{\rho(l)}/\widehat{c}_{\rho(l)}\right|}{\sum_{l=1}^{K}(\widehat{\boldsymbol{a}}_{i})_{\rho(l)}/\widehat{c}_{\rho(l)}}\lesssim\frac{K\varepsilon_{1}\left\|\boldsymbol{\theta}\right\|_{2}}{\left\|\boldsymbol{\theta}\right\|_{2}}=K\varepsilon_{1}.

Since $\sum_{l=1}^{K}(\boldsymbol{a}_{i}^{*})_{l}/c^{*}_{l}\geq 1/\max_{t\in[K]}c_{t}^{*}\asymp\left\|\boldsymbol{\theta}\right\|_{2}$, we have

|[𝚿𝚷]i,k|\displaystyle\left|\left[\boldsymbol{\Psi}_{\boldsymbol{\Pi}}\right]_{i,k}\right| lk,l[K]|[𝚿𝒂/𝒄]i,k|(𝒂i)l𝜽2+|[𝚿𝒂/𝒄]i,l|(𝒂i)k𝜽2𝜽22\displaystyle\lesssim\frac{\sum_{l\neq k,l\in[K]}\left|\left[\boldsymbol{\Psi}_{\boldsymbol{a}/\boldsymbol{c}}\right]_{i,k}\right|(\boldsymbol{a}_{i}^{*})_{l}\left\|\boldsymbol{\theta}\right\|_{2}+\left|\left[\boldsymbol{\Psi}_{\boldsymbol{a}/\boldsymbol{c}}\right]_{i,l}\right|(\boldsymbol{a}_{i}^{*})_{k}\left\|\boldsymbol{\theta}\right\|_{2}}{\left\|\boldsymbol{\theta}\right\|_{2}^{2}}
Kmaxj[n],l[K]|[𝚿𝒂/𝒄]j,l|𝜽2\displaystyle\leq\frac{K\max_{j\in[n],l\in[K]}\left|\left[\boldsymbol{\Psi}_{\boldsymbol{a}/\boldsymbol{c}}\right]_{j,l}\right|}{\left\|\boldsymbol{\theta}\right\|_{2}}
K𝜽22(κK2βn+K1.5θmaxlog0.5n+(K0.5ε2+ε12)σmax)+K(ε2+ε12).\displaystyle\lesssim\frac{K}{\left\|\boldsymbol{\theta}\right\|_{2}^{2}}\left(\frac{\kappa^{*}K^{2}}{\beta_{n}}+K^{1.5}\theta_{\text{max}}\log^{0.5}n+\left(K^{0.5}\varepsilon_{2}+\varepsilon_{1}^{2}\right)\sigma_{\textbf{max}}^{*}\right)+K(\varepsilon_{2}+\varepsilon_{1}^{2}).

8.20 Proof of Theorem 7

Proof.

Define

Δ𝝅:=(Δ𝝅i1(k1),Δ𝝅i2(k2),,Δ𝝅ir(kr))=1ijnWij𝝎ij.\displaystyle\Delta\boldsymbol{\pi}_{\mathcal{I}}:=\left(\Delta\boldsymbol{\pi}_{i_{1}}(k_{1}),\Delta\boldsymbol{\pi}_{i_{2}}(k_{2}),\dots,\Delta\boldsymbol{\pi}_{i_{r}}(k_{r})\right)^{\top}=\sum_{1\leq i\leq j\leq n}W_{ij}\boldsymbol{\omega}_{ij}.

By the Berry-Esseen theorem [Rai19], for any convex set $\mathcal{D}\subset\mathbb{R}^{r}$, we have

|(Δ𝝅𝒟)(𝒩(𝟎r,𝚺)𝒟)|r1/41ijn𝔼[𝚺1/2Wij𝝎ij23]\displaystyle\left|\mathbb{P}(\Delta\boldsymbol{\pi}_{\mathcal{I}}\in\mathcal{D})-\mathbb{P}(\mathcal{N}(\boldsymbol{0}_{r},\boldsymbol{\Sigma})\in\mathcal{D})\right|\lesssim r^{1/4}\sum_{1\leq i\leq j\leq n}\mathbb{E}\left[\left\|\boldsymbol{\Sigma}^{-1/2}W_{ij}\boldsymbol{\omega}_{ij}\right\|_{2}^{3}\right]
\displaystyle\lesssim r1/41ijn𝚺1/2𝝎ij23𝔼[|Wij|3]r1/41ijn𝚺1/2𝝎ij23Hij(1Hij)\displaystyle r^{1/4}\sum_{1\leq i\leq j\leq n}\left\|\boldsymbol{\Sigma}^{-1/2}\boldsymbol{\omega}_{ij}\right\|_{2}^{3}\mathbb{E}\left[\left|W_{ij}\right|^{3}\right]\leq r^{1/4}\sum_{1\leq i\leq j\leq n}\left\|\boldsymbol{\Sigma}^{-1/2}\boldsymbol{\omega}_{ij}\right\|_{2}^{3}H_{ij}(1-H_{ij})
\displaystyle\lesssim r1/4max1ijn𝚺1/2𝝎ij21ijn𝚺1/2𝝎ij22Hij(1Hij).\displaystyle r^{1/4}\max_{1\leq i\leq j\leq n}\left\|\boldsymbol{\Sigma}^{-1/2}\boldsymbol{\omega}_{ij}\right\|_{2}\sum_{1\leq i\leq j\leq n}\left\|\boldsymbol{\Sigma}^{-1/2}\boldsymbol{\omega}_{ij}\right\|_{2}^{2}H_{ij}(1-H_{ij}). (139)
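
The third-moment step in this chain can be made explicit. The following is a sketch under the assumption that $W_{ij}$ is a centered Bernoulli variable with success probability $H_{ij}$, which is what the variance $H_{ij}(1-H_{ij})$ used in the display suggests.

```latex
% Assumption: W_{ij} = 1 - H_{ij} with probability H_{ij}, and -H_{ij} otherwise.
\begin{align*}
\mathbb{E}\left[|W_{ij}|^{3}\right]
&=H_{ij}(1-H_{ij})^{3}+(1-H_{ij})H_{ij}^{3}\\
&=H_{ij}(1-H_{ij})\left[(1-H_{ij})^{2}+H_{ij}^{2}\right]
\leq H_{ij}(1-H_{ij})=\mathbb{E}\left[W_{ij}^{2}\right].
\end{align*}
```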

Since $\boldsymbol{\Sigma}$ is the covariance of $\Delta\boldsymbol{\pi}_{\mathcal{I}}$, we know that

1ijn𝚺1/2𝝎ij22Hij(1Hij)=1ijn𝝎ij𝚺1𝝎ijHij(1Hij)\displaystyle\sum_{1\leq i\leq j\leq n}\left\|\boldsymbol{\Sigma}^{-1/2}\boldsymbol{\omega}_{ij}\right\|_{2}^{2}H_{ij}(1-H_{ij})=\sum_{1\leq i\leq j\leq n}\boldsymbol{\omega}_{ij}^{\top}\boldsymbol{\Sigma}^{-1}\boldsymbol{\omega}_{ij}H_{ij}(1-H_{ij})
=\displaystyle= 1ijnTr[𝚺1𝝎ij𝝎ij]Hij(1Hij)=Tr[𝚺11ijn𝝎ij𝝎ijHij(1Hij)]\displaystyle\sum_{1\leq i\leq j\leq n}\textbf{Tr}\left[\boldsymbol{\Sigma}^{-1}\boldsymbol{\omega}_{ij}\boldsymbol{\omega}_{ij}^{\top}\right]H_{ij}(1-H_{ij})=\textbf{Tr}\left[\boldsymbol{\Sigma}^{-1}\sum_{1\leq i\leq j\leq n}\boldsymbol{\omega}_{ij}\boldsymbol{\omega}_{ij}^{\top}H_{ij}(1-H_{ij})\right]
=\displaystyle= Tr[𝚺11ijn𝔼[(Wij𝝎ij)(Wij𝝎ij)]]=Tr[𝚺1𝚺]=r.\displaystyle\textbf{Tr}\left[\boldsymbol{\Sigma}^{-1}\sum_{1\leq i\leq j\leq n}\mathbb{E}\left[(W_{ij}\boldsymbol{\omega}_{ij})(W_{ij}\boldsymbol{\omega}_{ij})^{\top}\right]\right]=\textbf{Tr}\left[\boldsymbol{\Sigma}^{-1}\boldsymbol{\Sigma}\right]=r. (140)
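
The trace identity in (140) holds for any weights and variances, as long as $\boldsymbol{\Sigma}$ is invertible. The sketch below checks it for $r=2$ with invented vectors `omegas` and variances `variances` standing in for $\boldsymbol{\omega}_{ij}$ and $H_{ij}(1-H_{ij})$; the helper names are hypothetical.

```python
def outer(u):
    """Outer product u u^T of a vector u in R^2."""
    return [[u[0] * u[0], u[0] * u[1]], [u[1] * u[0], u[1] * u[1]]]


def inv2(m):
    """Inverse of an invertible 2 x 2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def quad_form(m, u):
    """Quadratic form u^T m u."""
    return sum(u[a] * m[a][b] * u[b] for a in range(2) for b in range(2))


# Hypothetical weights omega_ij in R^2 and variances v_ij = H_ij (1 - H_ij).
omegas = [(1.0, 0.5), (-0.3, 2.0), (0.7, -1.2), (0.1, 0.4)]
variances = [0.21, 0.16, 0.09, 0.25]

# Sigma = sum_ij v_ij * omega_ij omega_ij^T, the covariance of sum_ij W_ij omega_ij.
sigma = [[0.0, 0.0], [0.0, 0.0]]
for u, v in zip(omegas, variances):
    o = outer(u)
    for a in range(2):
        for b in range(2):
            sigma[a][b] += v * o[a][b]

sigma_inv = inv2(sigma)
# sum_ij ||Sigma^{-1/2} omega_ij||_2^2 v_ij = sum_ij omega_ij^T Sigma^{-1} omega_ij v_ij
total = sum(v * quad_form(sigma_inv, u) for u, v in zip(omegas, variances))
print(total)  # equals the dimension r = 2 up to rounding
```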

Combining (139) and (140), we obtain

|(Δ𝝅𝒟)(𝒩(𝟎r,𝚺)𝒟)|r5/4max1ijn𝚺1/2𝝎ij2.\displaystyle\left|\mathbb{P}(\Delta\boldsymbol{\pi}_{\mathcal{I}}\in\mathcal{D})-\mathbb{P}(\mathcal{N}(\boldsymbol{0}_{r},\boldsymbol{\Sigma})\in\mathcal{D})\right|\lesssim r^{5/4}\max_{1\leq i\leq j\leq n}\left\|\boldsymbol{\Sigma}^{-1/2}\boldsymbol{\omega}_{ij}\right\|_{2}. (141)

It remains to control $\left|\mathbb{P}(\widehat{\boldsymbol{\pi}}_{\mathcal{I}}-\boldsymbol{\pi}_{\mathcal{I}}\in\mathcal{D})-\mathbb{P}(\Delta\boldsymbol{\pi}_{\mathcal{I}}\in\mathcal{D})\right|$. For any convex set $\mathcal{D}\subset\mathbb{R}^{r}$ and point $x\in\mathbb{R}^{r}$, we define

δ𝒟(x):={minyr\𝒟xy2, if x𝒟miny𝒟xy2, if x𝒟 and 𝒟ε:={xr:δ𝒟(x)ε}.\displaystyle\delta_{\mathcal{D}}(x):=\begin{cases}-\min_{y\in\mathbb{R}^{r}\backslash\mathcal{D}}\left\|x-y\right\|_{2},&\text{ if }x\in\mathcal{D}\\ \min_{y\in\mathcal{D}}\left\|x-y\right\|_{2},&\text{ if }x\notin\mathcal{D}\end{cases}\text{ and }\mathcal{D}^{\varepsilon}:=\left\{x\in\mathbb{R}^{r}:\delta_{\mathcal{D}}(x)\leq\varepsilon\right\}.

With this definition, we have

\begin{align*}
\mathbb{P}(\boldsymbol{\Sigma}^{-1/2}\Delta\boldsymbol{\pi}_{\mathcal{I}}\in\mathcal{D}^{-\varepsilon})={}&\mathbb{P}\left(\boldsymbol{\Sigma}^{-1/2}\Delta\boldsymbol{\pi}_{\mathcal{I}}\in\mathcal{D}^{-\varepsilon},\left\|\boldsymbol{\Sigma}^{-1/2}\left(\widehat{\boldsymbol{\pi}}_{\mathcal{I}}-\boldsymbol{\pi}_{\mathcal{I}}-\Delta\boldsymbol{\pi}_{\mathcal{I}}\right)\right\|_{2}\leq\varepsilon\right)\\
&+\mathbb{P}\left(\boldsymbol{\Sigma}^{-1/2}\Delta\boldsymbol{\pi}_{\mathcal{I}}\in\mathcal{D}^{-\varepsilon},\left\|\boldsymbol{\Sigma}^{-1/2}\left(\widehat{\boldsymbol{\pi}}_{\mathcal{I}}-\boldsymbol{\pi}_{\mathcal{I}}-\Delta\boldsymbol{\pi}_{\mathcal{I}}\right)\right\|_{2}>\varepsilon\right)\\
\leq{}&\mathbb{P}\left(\boldsymbol{\Sigma}^{-1/2}(\widehat{\boldsymbol{\pi}}_{\mathcal{I}}-\boldsymbol{\pi}_{\mathcal{I}})\in\mathcal{D}\right)+\mathbb{P}\left(\left\|\boldsymbol{\Sigma}^{-1/2}\left(\widehat{\boldsymbol{\pi}}_{\mathcal{I}}-\boldsymbol{\pi}_{\mathcal{I}}-\Delta\boldsymbol{\pi}_{\mathcal{I}}\right)\right\|_{2}>\varepsilon\right).
\end{align*}
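
The inclusion used in the last inequality is geometric: a point of $\mathcal{D}^{-\varepsilon}$ stays inside $\mathcal{D}$ after any perturbation of Euclidean norm at most $\varepsilon$. The sketch below illustrates this for one concrete convex set, a Euclidean ball in $\mathbb{R}^{2}$, for which the signed distance $\delta_{\mathcal{D}}$ has a closed form. The choice of set, the radius `R` and the helper names are assumptions made for illustration only.

```python
import math
import random

R = 1.0  # radius of the convex set D = {x : ||x||_2 <= R} (illustrative choice)


def signed_distance(x):
    """delta_D(x) for the ball D: negative inside, positive outside."""
    return math.hypot(*x) - R


def in_shrunk_set(x, eps):
    """Membership in D^{-eps} = {x : delta_D(x) <= -eps}."""
    return signed_distance(x) <= -eps


rng = random.Random(0)
eps = 0.1
violations = 0
for _ in range(10_000):
    x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
    if not in_shrunk_set(x, eps):
        continue
    # Perturbation of Euclidean norm at most eps, in a random direction.
    angle, radius = rng.uniform(0, 2 * math.pi), rng.uniform(0, eps)
    y = (x[0] + radius * math.cos(angle), x[1] + radius * math.sin(angle))
    if signed_distance(y) > 1e-12:   # y fell outside D (up to rounding)
        violations += 1
print("violations:", violations)
```

No sampled point leaves the ball, by the triangle inequality. In the proof, the perturbation is $\boldsymbol{\Sigma}^{-1/2}(\widehat{\boldsymbol{\pi}}_{\mathcal{I}}-\boldsymbol{\pi}_{\mathcal{I}}-\Delta\boldsymbol{\pi}_{\mathcal{I}})$.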

Taking $\varepsilon=\lambda_{r}^{-1/2}(\boldsymbol{\Sigma})r^{1/2}\varepsilon_{3}$, by Theorem 6 we know that $\|\boldsymbol{\Sigma}^{-1/2}\left(\widehat{\boldsymbol{\pi}}_{\mathcal{I}}-\boldsymbol{\pi}_{\mathcal{I}}-\Delta\boldsymbol{\pi}_{\mathcal{I}}\right)\|_{2}\leq\varepsilon$ with probability at least $1-O(n^{-9})$. As a result, we have

(𝚺1/2Δ𝝅𝒟ε)(𝚺1/2(𝝅^𝝅)𝒟)+O(n9).\displaystyle\mathbb{P}(\boldsymbol{\Sigma}^{-1/2}\Delta\boldsymbol{\pi}_{\mathcal{I}}\in\mathcal{D}^{-\varepsilon})\leq\mathbb{P}\left(\boldsymbol{\Sigma}^{-1/2}(\widehat{\boldsymbol{\pi}}_{\mathcal{I}}-\boldsymbol{\pi}_{\mathcal{I}})\in\mathcal{D}\right)+O(n^{-9}). (142)

On the other hand, one can see that

|(𝚺1/2Δ𝝅𝒟ε)(𝚺1/2Δ𝝅𝒟)||(𝚺1/2Δ𝝅𝒟ε)(𝒩(𝟎r,𝑰r)𝒟ε)|\displaystyle|\mathbb{P}(\boldsymbol{\Sigma}^{-1/2}\Delta\boldsymbol{\pi}_{\mathcal{I}}\in\mathcal{D}^{-\varepsilon})-\mathbb{P}(\boldsymbol{\Sigma}^{-1/2}\Delta\boldsymbol{\pi}_{\mathcal{I}}\in\mathcal{D})|\leq|\mathbb{P}(\boldsymbol{\Sigma}^{-1/2}\Delta\boldsymbol{\pi}_{\mathcal{I}}\in\mathcal{D}^{-\varepsilon})-\mathbb{P}(\mathcal{N}(\boldsymbol{0}_{r},\boldsymbol{I}_{r})\in\mathcal{D}^{-\varepsilon})|
+|(𝒩(𝟎r,𝑰r)𝒟ε)(𝒩(𝟎r,𝑰r)𝒟)|+|(𝒩(𝟎r,𝑰r)𝒟)(𝚺1/2Δ𝝅𝒟)|.\displaystyle+|\mathbb{P}(\mathcal{N}(\boldsymbol{0}_{r},\boldsymbol{I}_{r})\in\mathcal{D}^{-\varepsilon})-\mathbb{P}(\mathcal{N}(\boldsymbol{0}_{r},\boldsymbol{I}_{r})\in\mathcal{D})|+|\mathbb{P}(\mathcal{N}(\boldsymbol{0}_{r},\boldsymbol{I}_{r})\in\mathcal{D})-\mathbb{P}(\boldsymbol{\Sigma}^{-1/2}\Delta\boldsymbol{\pi}_{\mathcal{I}}\in\mathcal{D})|. (143)

By [Rai19, Theorem 1.2] we know that

|(𝒩(𝟎r,𝑰r)𝒟ε)(𝒩(𝟎r,𝑰r)𝒟)|r1/4ελr1/2(𝚺)r3/4ε3.\displaystyle|\mathbb{P}(\mathcal{N}(\boldsymbol{0}_{r},\boldsymbol{I}_{r})\in\mathcal{D}^{-\varepsilon})-\mathbb{P}(\mathcal{N}(\boldsymbol{0}_{r},\boldsymbol{I}_{r})\in\mathcal{D})|\lesssim r^{1/4}\varepsilon\lesssim\lambda_{r}^{-1/2}(\boldsymbol{\Sigma})r^{3/4}\varepsilon_{3}.

Plugging this and (141) into (143), we have

|(𝚺1/2Δ𝝅𝒟ε)(𝚺1/2Δ𝝅𝒟)|r5/4max1ijn𝚺1/2𝝎ij2+λr1/2(𝚺)r3/4ε3.\displaystyle|\mathbb{P}(\boldsymbol{\Sigma}^{-1/2}\Delta\boldsymbol{\pi}_{\mathcal{I}}\in\mathcal{D}^{-\varepsilon})-\mathbb{P}(\boldsymbol{\Sigma}^{-1/2}\Delta\boldsymbol{\pi}_{\mathcal{I}}\in\mathcal{D})|\lesssim r^{5/4}\max_{1\leq i\leq j\leq n}\left\|\boldsymbol{\Sigma}^{-1/2}\boldsymbol{\omega}_{ij}\right\|_{2}+\lambda_{r}^{-1/2}(\boldsymbol{\Sigma})r^{3/4}\varepsilon_{3}.

Combining this with (142), and since the convex set $\mathcal{D}$ is arbitrary, we get

(Δ𝝅𝒟)((𝝅^𝝅)𝒟)+O(r5/4max1ijn𝚺1/2𝝎ij2+λr1/2(𝚺)r3/4ε3).\displaystyle\mathbb{P}(\Delta\boldsymbol{\pi}_{\mathcal{I}}\in\mathcal{D})\leq\mathbb{P}\left((\widehat{\boldsymbol{\pi}}_{\mathcal{I}}-\boldsymbol{\pi}_{\mathcal{I}})\in\mathcal{D}\right)+O\left(r^{5/4}\max_{1\leq i\leq j\leq n}\left\|\boldsymbol{\Sigma}^{-1/2}\boldsymbol{\omega}_{ij}\right\|_{2}+\lambda_{r}^{-1/2}(\boldsymbol{\Sigma})r^{3/4}\varepsilon_{3}\right).

Similarly, it can also be shown that

((𝝅^𝝅)𝒟)(Δ𝝅𝒟)+O(r5/4max1ijn𝚺1/2𝝎ij2+λr1/2(𝚺)r3/4ε3).\displaystyle\mathbb{P}\left((\widehat{\boldsymbol{\pi}}_{\mathcal{I}}-\boldsymbol{\pi}_{\mathcal{I}})\in\mathcal{D}\right)\leq\mathbb{P}(\Delta\boldsymbol{\pi}_{\mathcal{I}}\in\mathcal{D})+O\left(r^{5/4}\max_{1\leq i\leq j\leq n}\left\|\boldsymbol{\Sigma}^{-1/2}\boldsymbol{\omega}_{ij}\right\|_{2}+\lambda_{r}^{-1/2}(\boldsymbol{\Sigma})r^{3/4}\varepsilon_{3}\right).

As a result, we know that

|(𝝅^𝝅𝒟)(Δ𝝅𝒟)|r5/4max1ijn𝚺1/2𝝎ij2+λr1/2(𝚺)r3/4ε3.\displaystyle\left|\mathbb{P}(\widehat{\boldsymbol{\pi}}_{\mathcal{I}}-\boldsymbol{\pi}_{\mathcal{I}}\in\mathcal{D})-\mathbb{P}(\Delta\boldsymbol{\pi}_{\mathcal{I}}\in\mathcal{D})\right|\lesssim r^{5/4}\max_{1\leq i\leq j\leq n}\left\|\boldsymbol{\Sigma}^{-1/2}\boldsymbol{\omega}_{ij}\right\|_{2}+\lambda_{r}^{-1/2}(\boldsymbol{\Sigma})r^{3/4}\varepsilon_{3}. (144)

Combining (141) and (144), we get

|(𝝅^𝝅𝒟)(𝒩(𝟎r,𝚺)𝒟)|r5/4max1ijn𝚺1/2𝝎ij2+λr1/2(𝚺)r3/4ε3.\displaystyle\left|\mathbb{P}(\widehat{\boldsymbol{\pi}}_{\mathcal{I}}-\boldsymbol{\pi}_{\mathcal{I}}\in\mathcal{D})-\mathbb{P}(\mathcal{N}(\boldsymbol{0}_{r},\boldsymbol{\Sigma})\in\mathcal{D})\right|\lesssim r^{5/4}\max_{1\leq i\leq j\leq n}\left\|\boldsymbol{\Sigma}^{-1/2}\boldsymbol{\omega}_{ij}\right\|_{2}+\lambda_{r}^{-1/2}(\boldsymbol{\Sigma})r^{3/4}\varepsilon_{3}.

8.21 Auxiliary Lemmas

Lemma 12.

For $\boldsymbol{N}_{1}$ and $\boldsymbol{N}_{2}$ defined in (50), we have

𝑵11λ1𝑰2,(K1)μnλ12,𝑵11λ1\displaystyle\left\|\boldsymbol{N}_{1}-\frac{1}{\lambda_{1}^{*}}\boldsymbol{I}\right\|_{2,\infty}\lesssim\sqrt{\frac{(K-1)\mu^{*}}{n\lambda_{1}^{*2}}},\quad\;\left\|\boldsymbol{N}_{1}\right\|\lesssim\frac{1}{\lambda_{1}^{*}}
𝑵21λ12𝑰2,(K1)μnλ14,𝑵21λ12.\displaystyle\left\|\boldsymbol{N}_{2}-\frac{1}{\lambda_{1}^{*2}}\boldsymbol{I}\right\|_{2,\infty}\lesssim\sqrt{\frac{(K-1)\mu^{*}}{n\lambda_{1}^{*4}}},\quad\left\|\boldsymbol{N}_{2}\right\|\lesssim\frac{1}{\lambda_{1}^{*2}}.
Proof.

We prove the desired results in two settings: with self-loops and without self-loops. When self-loops are allowed, the rank of $\boldsymbol{H}$ is exactly $K$. When there are no self-loops, $\boldsymbol{H}$ is approximately rank $K$.

With self-loops: The spectral norm bounds follow directly from (50):

𝑵1=max2in1λ1λi1λ1,𝑵2=max2in1(λ1λi)21λ12.\displaystyle\left\|\boldsymbol{N}_{1}\right\|=\max_{2\leq i\leq n}\frac{1}{\lambda_{1}^{*}-\lambda_{i}^{*}}\lesssim\frac{1}{\lambda_{1}^{*}},\quad\left\|\boldsymbol{N}_{2}\right\|=\max_{2\leq i\leq n}\frac{1}{(\lambda_{1}^{*}-\lambda_{i}^{*})^{2}}\lesssim\frac{1}{\lambda_{1}^{*2}}.

By definition, for $\boldsymbol{N}_{1}$ we have

𝑵11λ1𝑰=i=2n1λ1λi𝒖i𝒖i1λ1i=1n𝒖i𝒖i=1λ1𝒖1𝒖1+i=2Kλiλ1(λ1λi)𝒖i𝒖i.\displaystyle\boldsymbol{N}_{1}-\frac{1}{\lambda_{1}^{*}}\boldsymbol{I}=\sum_{i=2}^{n}\frac{1}{\lambda_{1}^{*}-\lambda_{i}^{*}}\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}-\frac{1}{\lambda_{1}^{*}}\sum_{i=1}^{n}\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}=-\frac{1}{\lambda_{1}^{*}}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}+\sum_{i=2}^{K}\frac{\lambda_{i}^{*}}{\lambda_{1}^{*}(\lambda_{1}^{*}-\lambda_{i}^{*})}\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}.
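
The last equality in this display follows from a one-line computation on the coefficients. As a sketch, under the with-self-loop assumption that $\boldsymbol{H}$ has rank exactly $K$, so that $\lambda_{i}^{*}=0$ for $i>K$:

```latex
% Coefficient of u_i^* u_i^{*\top} for 2 <= i <= n; it vanishes when \lambda_i^* = 0.
\[
\frac{1}{\lambda_{1}^{*}-\lambda_{i}^{*}}-\frac{1}{\lambda_{1}^{*}}
=\frac{\lambda_{1}^{*}-(\lambda_{1}^{*}-\lambda_{i}^{*})}{\lambda_{1}^{*}(\lambda_{1}^{*}-\lambda_{i}^{*})}
=\frac{\lambda_{i}^{*}}{\lambda_{1}^{*}(\lambda_{1}^{*}-\lambda_{i}^{*})}.
\]
```

The $i=1$ term has no counterpart in $\boldsymbol{N}_{1}$, which leaves the contribution $-\frac{1}{\lambda_{1}^{*}}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}$.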

On one hand, by (12) we have

1λ1𝒖1𝒖12,=1λ1𝒖1𝒖12μnλ12.\displaystyle\left\|\frac{1}{\lambda_{1}^{*}}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}\right\|_{2,\infty}=\frac{1}{\lambda_{1}^{*}}\left\|\boldsymbol{u}_{1}^{*}\right\|_{\infty}\left\|\boldsymbol{u}_{1}^{*}\right\|_{2}\leq\sqrt{\frac{\mu^{*}}{n\lambda_{1}^{*2}}}.
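
The equality $\|\boldsymbol{u}\boldsymbol{u}^{\top}\|_{2,\infty}=\|\boldsymbol{u}\|_{\infty}\|\boldsymbol{u}\|_{2}$ used here holds because row $i$ of $\boldsymbol{u}\boldsymbol{u}^{\top}$ is $u_{i}\boldsymbol{u}^{\top}$. A minimal numerical check, with an arbitrary example vector and a hypothetical helper `two_to_inf_norm`:

```python
import math


def two_to_inf_norm(m):
    """Largest Euclidean norm among the rows of the matrix m."""
    return max(math.sqrt(sum(x * x for x in row)) for row in m)


u = [0.5, -0.5, 0.1, 0.7]                    # arbitrary example vector
norm = math.sqrt(sum(x * x for x in u))
u = [x / norm for x in u]                    # unit vector, like u_1^*

rank_one = [[a * b for b in u] for a in u]   # the matrix u u^T
lhs = two_to_inf_norm(rank_one)
rhs = max(abs(x) for x in u) * math.sqrt(sum(x * x for x in u))
print(lhs, rhs)
```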

On the other hand, defining

𝑪:=diag(λ2λ1(λ1λ2),λ3λ1(λ1λ3),,λKλ1(λ1λK)),\boldsymbol{C}:=\textbf{diag}\left(\frac{\lambda_{2}^{*}}{\lambda_{1}^{*}(\lambda_{1}^{*}-\lambda_{2}^{*})},\frac{\lambda_{3}^{*}}{\lambda_{1}^{*}(\lambda_{1}^{*}-\lambda_{3}^{*})},\dots,\frac{\lambda_{K}^{*}}{\lambda_{1}^{*}(\lambda_{1}^{*}-\lambda_{K}^{*})}\right),

by (12) we have,

i=2Kλiλ1(λ1λi)𝒖i𝒖i2,\displaystyle\left\|\sum_{i=2}^{K}\frac{\lambda_{i}^{*}}{\lambda_{1}^{*}(\lambda_{1}^{*}-\lambda_{i}^{*})}\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}\right\|_{2,\infty} =𝑼¯𝑪𝑼¯2,𝑼¯2,𝑪𝑼¯(K1)μnλ12.\displaystyle=\left\|\overline{\boldsymbol{U}}\boldsymbol{C}\overline{\boldsymbol{U}}^{\top}\right\|_{2,\infty}\leq\left\|\overline{\boldsymbol{U}}\right\|_{2,\infty}\left\|\boldsymbol{C}\overline{\boldsymbol{U}}^{\top}\right\|\lesssim\sqrt{\frac{(K-1)\mu^{*}}{n\lambda_{1}^{*2}}}.

As a result, we get $\left\|\boldsymbol{N}_{1}-\frac{1}{\lambda_{1}^{*}}\boldsymbol{I}\right\|_{2,\infty}\lesssim\sqrt{\frac{(K-1)\mu^{*}}{n\lambda_{1}^{*2}}}$. Similarly, for $\boldsymbol{N}_{2}$ we have

𝑵21λ12𝑰=1λ12𝒖1𝒖1+i=2K(1(λ1λi)21λ12)𝒖i𝒖i.\displaystyle\boldsymbol{N}_{2}-\frac{1}{\lambda_{1}^{*2}}\boldsymbol{I}=-\frac{1}{\lambda_{1}^{*2}}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}+\sum_{i=2}^{K}\left(\frac{1}{(\lambda_{1}^{*}-\lambda_{i}^{*})^{2}}-\frac{1}{\lambda_{1}^{*2}}\right)\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}.

Again, we have $\|\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}/\lambda_{1}^{*2}\|_{2,\infty}\lesssim\sqrt{\mu^{*}/(n\lambda_{1}^{*4})}$. Defining

𝑪1=diag(1(λ1λ2)21λ12,1(λ1λ3)21λ12,,1(λ1λK)21λ12),\boldsymbol{C}_{1}=\textbf{diag}\left(\frac{1}{(\lambda_{1}^{*}-\lambda_{2}^{*})^{2}}-\frac{1}{\lambda_{1}^{*2}},\frac{1}{(\lambda_{1}^{*}-\lambda_{3}^{*})^{2}}-\frac{1}{\lambda_{1}^{*2}},\dots,\frac{1}{(\lambda_{1}^{*}-\lambda_{K}^{*})^{2}}-\frac{1}{\lambda_{1}^{*2}}\right),

we have

i=2K(1(λ1λi)21λ12)𝒖i𝒖i2,\displaystyle\left\|\sum_{i=2}^{K}\left(\frac{1}{(\lambda_{1}^{*}-\lambda_{i}^{*})^{2}}-\frac{1}{\lambda_{1}^{*2}}\right)\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}\right\|_{2,\infty} =𝑼¯𝑪1𝑼¯2,𝑼¯2,𝑪1𝑼¯(K1)μnλ14.\displaystyle=\left\|\overline{\boldsymbol{U}}\boldsymbol{C}_{1}\overline{\boldsymbol{U}}^{\top}\right\|_{2,\infty}\leq\left\|\overline{\boldsymbol{U}}\right\|_{2,\infty}\left\|\boldsymbol{C}_{1}\overline{\boldsymbol{U}}^{\top}\right\|\lesssim\sqrt{\frac{(K-1)\mu^{*}}{n\lambda_{1}^{*4}}}.

Combining these bounds, we get the desired conclusion $\|\boldsymbol{N}_{2}-\frac{1}{\lambda_{1}^{*2}}\boldsymbol{I}\|_{2,\infty}\lesssim\sqrt{\frac{(K-1)\mu^{*}}{n\lambda_{1}^{*4}}}$.

Without self-loops: The spectral norm bounds follow in the same way as in the previous case. For the rest, by definition of $\boldsymbol{N}_{1}$,

𝑵11λ1𝑰=i=2n1λ1λi𝒖i𝒖i1λ1i=1n𝒖i𝒖i=\displaystyle\boldsymbol{N}_{1}-\frac{1}{\lambda_{1}^{*}}\boldsymbol{I}=\sum_{i=2}^{n}\frac{1}{\lambda_{1}^{*}-\lambda_{i}^{*}}\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}-\frac{1}{\lambda_{1}^{*}}\sum_{i=1}^{n}\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}= 1λ1𝒖1𝒖1+i=2Kλiλ1(λ1λi)𝒖i𝒖i\displaystyle-\frac{1}{\lambda_{1}^{*}}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}+\sum_{i=2}^{K}\frac{\lambda_{i}^{*}}{\lambda_{1}^{*}(\lambda_{1}^{*}-\lambda_{i}^{*})}\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}
+i=K+1nλiλ1(λ1λi)𝒖i𝒖i\displaystyle+\sum_{i=K+1}^{n}\frac{\lambda_{i}^{*}}{\lambda_{1}^{*}(\lambda_{1}^{*}-\lambda_{i}^{*})}\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}

The bounds on the first two summands are the same as before. Hence it remains to control $\left\|\sum_{i=K+1}^{n}\frac{\lambda_{i}^{*}}{\lambda_{1}^{*}(\lambda_{1}^{*}-\lambda_{i}^{*})}\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}\right\|_{2,\infty}$. One can see that

i=K+1nλiλ1(λ1λi)𝒖i𝒖i2,\displaystyle\left\|\sum_{i=K+1}^{n}\frac{\lambda_{i}^{*}}{\lambda_{1}^{*}(\lambda_{1}^{*}-\lambda_{i}^{*})}\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}\right\|_{2,\infty} i=K+1nλiλ1(λ1λi)𝒖i𝒖imaxK+1in|λiλ1(λ1λi)|\displaystyle\leq\left\|\sum_{i=K+1}^{n}\frac{\lambda_{i}^{*}}{\lambda_{1}^{*}(\lambda_{1}^{*}-\lambda_{i}^{*})}\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}\right\|\leq\max_{K+1\leq i\leq n}\left|\frac{\lambda_{i}^{*}}{\lambda_{1}^{*}(\lambda_{1}^{*}-\lambda_{i}^{*})}\right|
diag(𝚯𝚷𝑷𝚷𝚯)λ12θmax2λ12.\displaystyle\lesssim\frac{\left\|\textbf{diag}(\boldsymbol{\Theta}\boldsymbol{\Pi}\boldsymbol{P}\boldsymbol{\Pi}^{\top}\boldsymbol{\Theta})\right\|}{\lambda_{1}^{*2}}\leq\frac{\theta_{\text{max}}^{2}}{\lambda_{1}^{*2}}.

As a result, we get

𝑵11λ1𝑰2,(K1)μnλ12+θmax2λ12(K1)μnλ12,\displaystyle\left\|\boldsymbol{N}_{1}-\frac{1}{\lambda_{1}^{*}}\boldsymbol{I}\right\|_{2,\infty}\lesssim\sqrt{\frac{(K-1)\mu^{*}}{n\lambda_{1}^{*2}}}+\frac{\theta_{\text{max}}^{2}}{\lambda_{1}^{*2}}\lesssim\sqrt{\frac{(K-1)\mu^{*}}{n\lambda_{1}^{*2}}},

since $\sqrt{n}\theta_{\text{max}}^{2}\lesssim\sqrt{n}\theta_{\text{max}}\ll\lambda_{1}^{*}$. Similarly, for $\boldsymbol{N}_{2}$ we have

𝑵21λ12𝑰=1λ12𝒖1𝒖1+i=2K(1(λ1λi)21λ12)𝒖i𝒖i+i=K+1n(1(λ1λi)21λ12)𝒖i𝒖i.\displaystyle\boldsymbol{N}_{2}-\frac{1}{\lambda_{1}^{*2}}\boldsymbol{I}=-\frac{1}{\lambda_{1}^{*2}}\boldsymbol{u}_{1}^{*}\boldsymbol{u}_{1}^{*\top}+\sum_{i=2}^{K}\left(\frac{1}{(\lambda_{1}^{*}-\lambda_{i}^{*})^{2}}-\frac{1}{\lambda_{1}^{*2}}\right)\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}+\sum_{i=K+1}^{n}\left(\frac{1}{(\lambda_{1}^{*}-\lambda_{i}^{*})^{2}}-\frac{1}{\lambda_{1}^{*2}}\right)\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}.

We bound the first two summands as before. For the third term,

i=K+1n(1(λ1λi)21λ12)𝒖i𝒖i2,i=K+1n(1(λ1λi)21λ12)𝒖i𝒖i\displaystyle\left\|\sum_{i=K+1}^{n}\left(\frac{1}{(\lambda_{1}^{*}-\lambda_{i}^{*})^{2}}-\frac{1}{\lambda_{1}^{*2}}\right)\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}\right\|_{2,\infty}\leq\left\|\sum_{i=K+1}^{n}\left(\frac{1}{(\lambda_{1}^{*}-\lambda_{i}^{*})^{2}}-\frac{1}{\lambda_{1}^{*2}}\right)\boldsymbol{u}_{i}^{*}\boldsymbol{u}_{i}^{*\top}\right\|
\displaystyle\lesssim maxK+1in|1(λ1λi)21λ12|maxK+1in|λiλ13|diag(𝚯𝚷𝑷𝚷𝚯)λ13θmax2λ13.\displaystyle\max_{K+1\leq i\leq n}\left|\frac{1}{(\lambda_{1}^{*}-\lambda_{i}^{*})^{2}}-\frac{1}{\lambda_{1}^{*2}}\right|\lesssim\max_{K+1\leq i\leq n}\left|\frac{\lambda_{i}^{*}}{\lambda_{1}^{*3}}\right|\leq\frac{\left\|\textbf{diag}(\boldsymbol{\Theta}\boldsymbol{\Pi}\boldsymbol{P}\boldsymbol{\Pi}^{\top}\boldsymbol{\Theta})\right\|}{\lambda_{1}^{*3}}\leq\frac{\theta_{\text{max}}^{2}}{\lambda_{1}^{*3}}.

Combining these bounds, we get

𝑵21λ12𝑰2,(K1)μnλ14+θmax2λ13(K1)μnλ14,\displaystyle\left\|\boldsymbol{N}_{2}-\frac{1}{\lambda_{1}^{*2}}\boldsymbol{I}\right\|_{2,\infty}\lesssim\sqrt{\frac{(K-1)\mu^{*}}{n\lambda_{1}^{*4}}}+\frac{\theta_{\text{max}}^{2}}{\lambda_{1}^{*3}}\lesssim\sqrt{\frac{(K-1)\mu^{*}}{n\lambda_{1}^{*4}}},

since $\sqrt{n}\theta_{\text{max}}^{2}\lesssim\sqrt{n}\theta_{\text{max}}\ll\lambda_{1}^{*}$. This completes the proof. ∎

From Lemma 12, we immediately have the following corollary.

Corollary 4.

For $\boldsymbol{N}_{1}$ and $\boldsymbol{N}_{2}$ defined in (50), we have for all $\boldsymbol{x}\in\mathbb{R}^{n}$

𝑵i𝒙1λ1i𝒙+(K1)μnλ12i𝒙2,i=1,2.\displaystyle\left\|\boldsymbol{N}_{i}\boldsymbol{x}\right\|_{\infty}\lesssim\frac{1}{\lambda_{1}^{*i}}\left\|\boldsymbol{x}\right\|_{\infty}+\sqrt{\frac{(K-1)\mu^{*}}{n\lambda_{1}^{*2i}}}\left\|\boldsymbol{x}\right\|_{2},\quad i=1,2.
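
The step from Lemma 12 to this bound is short. As a sketch, split off the multiple of the identity and apply the Cauchy-Schwarz inequality row by row:

```latex
% For i = 1, 2: add and subtract \lambda_1^{*-i} x, then bound each row by Cauchy-Schwarz.
\begin{align*}
\left\|\boldsymbol{N}_{i}\boldsymbol{x}\right\|_{\infty}
&\leq\frac{1}{\lambda_{1}^{*i}}\left\|\boldsymbol{x}\right\|_{\infty}
+\left\|\left(\boldsymbol{N}_{i}-\frac{1}{\lambda_{1}^{*i}}\boldsymbol{I}\right)\boldsymbol{x}\right\|_{\infty}\\
&\leq\frac{1}{\lambda_{1}^{*i}}\left\|\boldsymbol{x}\right\|_{\infty}
+\left\|\boldsymbol{N}_{i}-\frac{1}{\lambda_{1}^{*i}}\boldsymbol{I}\right\|_{2,\infty}\left\|\boldsymbol{x}\right\|_{2}
\lesssim\frac{1}{\lambda_{1}^{*i}}\left\|\boldsymbol{x}\right\|_{\infty}
+\sqrt{\frac{(K-1)\mu^{*}}{n\lambda_{1}^{*2i}}}\left\|\boldsymbol{x}\right\|_{2}.
\end{align*}
```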
Lemma 13.

For any $i\in[n]$ and a fixed vector $\boldsymbol{x}\in\mathbb{R}^{n}$, we have

|𝑾i,𝒙|lognθmax𝒙2+logn𝒙\displaystyle\left|\boldsymbol{W}_{i,\cdot}\boldsymbol{x}\right|\lesssim\sqrt{\log n}\theta_{\text{max}}\left\|\boldsymbol{x}\right\|_{2}+\log n\left\|\boldsymbol{x}\right\|_{\infty}

with probability at least $1-O(n^{-15})$. Here the constant hidden in $\lesssim$ is free of $n$, $\boldsymbol{x}$ and $\theta_{\text{max}}$.

Proof.

Since $|W_{ij}|\leq 1$, by Bernstein's inequality, with probability at least $1-O(n^{-15})$,

\begin{align*}
\left|\boldsymbol{W}_{i,\cdot}\boldsymbol{x}\right|
&\lesssim\sqrt{\log n\sum_{j=1}^{n}\mathbb{E}\left[W_{ij}^{2}\right]x_{j}^{2}}+\log n\left\|\boldsymbol{x}\right\|_{\infty}
\lesssim\sqrt{\log n}\,\theta_{\text{max}}\left\|\boldsymbol{x}\right\|_{2}+\log n\left\|\boldsymbol{x}\right\|_{\infty},
\end{align*}

where the last step uses $\mathbb{E}[W_{ij}^{2}]\lesssim\theta_{\text{max}}^{2}$. ∎
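The bound in Lemma 13 can be illustrated with a small Monte Carlo experiment. This is a sketch under simplifying assumptions that are not taken from the paper: every entry of the row is a centered Bernoulli($p$) variable with the same $p$, and $\theta_{\text{max}}$ is replaced by the stand-in $\sqrt{p}$. The function name `row_bound_ratio` is hypothetical.

```python
import numpy as np

def row_bound_ratio(n=2000, p=0.05, trials=500, seed=0):
    """Ratio |W_i x| / (sqrt(log n) * theta_max * ||x||_2 + log n * ||x||_inf)
    over independent draws of a centered Bernoulli(p) row W_i.

    Assumption (illustration only): all entries share the same p, and
    theta_max is taken to be sqrt(p), so that E[W_ij^2] <= theta_max^2.
    """
    rng = np.random.default_rng(seed)
    theta_max = np.sqrt(p)
    x = rng.standard_normal(n)  # a fixed vector, drawn once
    bound = (np.sqrt(np.log(n)) * theta_max * np.linalg.norm(x)
             + np.log(n) * np.max(np.abs(x)))
    # Each row of w is one independent realization of W_{i,.}
    w = (rng.random((trials, n)) < p).astype(float) - p
    return np.abs(w @ x) / bound

ratios = row_bound_ratio()
print("largest observed ratio:", ratios.max())
```

If the lemma's scaling is right, the ratios should stay bounded by a modest constant as `n` grows; the experiment does not identify the constant hidden in $\lesssim$.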

Lemma 14.

For any $i\in[n]$ and a fixed matrix $\boldsymbol{A}\in\mathbb{R}^{n\times m}$, we have with probability at least $1-O(n^{-15}m)$,

\begin{align*}
\left\|\boldsymbol{W}_{i,\cdot}\boldsymbol{A}\right\|_{2}\lesssim\sqrt{\log n}\,\theta_{\text{max}}\left\|\boldsymbol{A}\right\|_{F}+\log n\left\|\boldsymbol{A}\right\|_{2,\infty}.
\end{align*}
Proof.

Taking $E=\boldsymbol{W}_{i,\cdot}$ in [YCF21, Lemma 5] gives the desired statement. ∎

Lemma 15.

For any fixed matrix $\boldsymbol{A}\in\mathbb{R}^{n\times m}$, we have with probability at least $1-O(n^{-14}m)$,

\begin{align*}
\left\|\boldsymbol{W}\boldsymbol{A}\right\|_{2,\infty}\lesssim\sqrt{\log n}\,\theta_{\text{max}}\left\|\boldsymbol{A}\right\|_{F}+\sqrt{m}\log n\left\|\boldsymbol{A}\right\|_{\text{max}}.
\end{align*}
Proof.

Recall that $\|\boldsymbol{W}\boldsymbol{A}\|_{2,\infty}=\max_{1\leq i\leq n}\|\boldsymbol{W}_{i,\cdot}\boldsymbol{A}\|_{2}$. Applying Lemma 14 to each $i\in[n]$, taking a union bound over the $n$ rows, and using $\|\boldsymbol{A}\|_{2,\infty}\leq\sqrt{m}\|\boldsymbol{A}\|_{\text{max}}$, we get the desired conclusion. ∎

Lemma 16.

For any $i\in[n]$ and a fixed matrix $\boldsymbol{A}\in\mathbb{R}^{n\times m}$, we have with probability at least $1-O(n^{-15}m)$,

\begin{align*}
\left\|\Big(\boldsymbol{W}-\boldsymbol{W}^{(i)}\Big)\boldsymbol{A}\right\|_{F}\lesssim\sqrt{\log n}\,\theta_{\text{max}}\left\|\boldsymbol{A}\right\|_{F}+(\sqrt{n}\theta_{\text{max}}+\log n)\left\|\boldsymbol{A}\right\|_{2,\infty}.
\end{align*}
Proof.

By definition we have

\begin{align*}
\left\|\Big(\boldsymbol{W}-\boldsymbol{W}^{(i)}\Big)\boldsymbol{A}\right\|_{F}
&=\sqrt{\left\|\boldsymbol{W}_{i,\cdot}\boldsymbol{A}\right\|_{2}^{2}+\sum_{j\in[n],j\neq i}W_{ji}^{2}\left\|\boldsymbol{A}_{i,\cdot}\right\|_{2}^{2}} \\
&\lesssim\left\|\boldsymbol{W}_{i,\cdot}\boldsymbol{A}\right\|_{2}+\left\|\boldsymbol{W}_{\cdot,i}\right\|_{2}\left\|\boldsymbol{A}_{i,\cdot}\right\|_{2}. \tag{145}
\end{align*}

By Bernstein's inequality, we know that

\begin{align*}
\left|\sum_{j\in[n]}W_{ji}^{2}-\sum_{j\in[n]}\mathbb{E}\left[W_{ji}^{2}\right]\right|
&\lesssim\sqrt{\log n\sum_{j=1}^{n}\mathbb{E}\left[W_{ji}^{4}\right]}+\log n \\
&\lesssim\sqrt{n\log n\,\theta_{\text{max}}^{2}}+\log n\lesssim\sqrt{n\log n}\,\theta_{\text{max}} \tag{146}
\end{align*}

with probability at least $1-O(n^{-15})$. On the other hand, we know that

\begin{align*}
\left|\sum_{j\in[n]}\mathbb{E}\left[W_{ji}^{2}\right]\right|\lesssim n\theta_{\text{max}}^{2}. \tag{147}
\end{align*}

Combining (146) and (147), we obtain

\begin{align*}
\left|\sum_{j\in[n]}W_{ji}^{2}\right|\lesssim\sqrt{n\log n}\,\theta_{\text{max}}+n\theta_{\text{max}}^{2}\lesssim n\theta_{\text{max}}^{2}
\end{align*}

with probability at least $1-O(n^{-15})$, that is, $\|\boldsymbol{W}_{\cdot,i}\|_{2}\lesssim\sqrt{n}\theta_{\text{max}}$. Plugging this bound and Lemma 14 into (145), and using $\|\boldsymbol{A}_{i,\cdot}\|_{2}\leq\|\boldsymbol{A}\|_{2,\infty}$, we get

\begin{align*}
\left\|\Big(\boldsymbol{W}-\boldsymbol{W}^{(i)}\Big)\boldsymbol{A}\right\|_{F}\lesssim\sqrt{\log n}\,\theta_{\text{max}}\left\|\boldsymbol{A}\right\|_{F}+(\sqrt{n}\theta_{\text{max}}+\log n)\left\|\boldsymbol{A}\right\|_{2,\infty}
\end{align*}

with probability at least $1-O(n^{-15}m)$. ∎
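The decomposition (145) can be checked numerically. The sketch below assumes that $\boldsymbol{W}^{(i)}$ is the leave-one-out matrix obtained from $\boldsymbol{W}$ by zeroing its $i$-th row and $i$-th column; that definition appears elsewhere in the paper and is only assumed here. The helper name `leave_one_out_gap` and the random test matrices are hypothetical.

```python
import numpy as np

def leave_one_out_gap(W, A, i):
    """Return (lhs, rhs, upper) for the decomposition in (145).

    Assumption: W^{(i)} equals W with row i and column i set to zero.
    """
    W_loo = W.copy()
    W_loo[i, :] = 0.0
    W_loo[:, i] = 0.0
    lhs = np.linalg.norm((W - W_loo) @ A, "fro")
    # Row i of (W - W_loo) A is W_{i,.} A; row j != i is W_{ji} * A_{i,.}
    row_term = np.linalg.norm(W[i, :] @ A) ** 2
    col_term = (np.sum(W[:, i] ** 2) - W[i, i] ** 2) * np.linalg.norm(A[i, :]) ** 2
    rhs = np.sqrt(row_term + col_term)
    upper = (np.linalg.norm(W[i, :] @ A)
             + np.linalg.norm(W[:, i]) * np.linalg.norm(A[i, :]))
    return lhs, rhs, upper

rng = np.random.default_rng(1)
n, m, i = 50, 3, 7
G = rng.uniform(-1.0, 1.0, size=(n, n))
W = np.triu(G) + np.triu(G, 1).T      # symmetric, entries in [-1, 1]
A = rng.standard_normal((n, m))
lhs, rhs, upper = leave_one_out_gap(W, A, i)
print(lhs, rhs, upper)
```

Under this assumption, `lhs` and `rhs` agree up to floating-point error and `upper` dominates both, which corresponds to the two lines of (145).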

Lemma 17.

For any fixed $\boldsymbol{u},\boldsymbol{v}\in\mathbb{R}^{n}$, we have with probability at least $1-O(n^{-15})$,

\begin{align*}
\left|\boldsymbol{u}^{\top}\boldsymbol{W}\boldsymbol{v}\right|\lesssim\sqrt{\log n}\,\|\boldsymbol{u}\|_{2}\|\boldsymbol{v}\|_{2}.
\end{align*}

In particular, for any $i,j\in[n]$, we have with probability at least $1-O(n^{-15})$,

\begin{align*}
\left|\boldsymbol{u}_{i}^{*\top}\boldsymbol{W}\boldsymbol{u}_{j}^{*}\right|\lesssim\sqrt{\log n}.
\end{align*}
Proof.

Since $|W_{kl}|\leq 1$, by Hoeffding's inequality, we have

\begin{align*}
\mathbb{P}\left(\left|\boldsymbol{u}^{\top}\boldsymbol{W}\boldsymbol{v}\right|\geq t\right)
&\leq 2\exp\left(-\frac{2t^{2}}{\sum_{k=1}^{n}\left(2u_{k}v_{k}\right)^{2}+\sum_{1\leq k<l\leq n}\left(2u_{k}v_{l}+2u_{l}v_{k}\right)^{2}}\right) \\
&\leq 2\exp\left(-\frac{t^{2}}{2\sum_{k=1}^{n}u_{k}^{2}v_{k}^{2}+4\sum_{1\leq k<l\leq n}[u_{k}^{2}v_{l}^{2}+u_{l}^{2}v_{k}^{2}]}\right) \\
&\leq 2\exp\left(-\frac{t^{2}}{4\sum_{k,l=1}^{n}u_{k}^{2}v_{l}^{2}}\right)=2\exp\left(-\frac{t^{2}}{4\|\boldsymbol{u}\|_{2}^{2}\|\boldsymbol{v}\|_{2}^{2}}\right).
\end{align*}

Taking $t=C\sqrt{\log n}\,\|\boldsymbol{u}\|_{2}\|\boldsymbol{v}\|_{2}$ for a sufficiently large constant $C$, we conclude that, with probability at least $1-O(n^{-15})$, $|\boldsymbol{u}^{\top}\boldsymbol{W}\boldsymbol{v}|\lesssim\sqrt{\log n}\,\|\boldsymbol{u}\|_{2}\|\boldsymbol{v}\|_{2}$. The second claim follows since $\|\boldsymbol{u}_{i}^{*}\|_{2}=1$ for all $i$. ∎
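The three Hoeffding denominators in the proof of Lemma 17 can be compared numerically. The sketch below writes each bound in the form $2\exp(-2t^{2}/d)$ and checks that the denominators $d_{1}\leq d_{2}\leq d_{3}$ are ordered as the proof requires; the function name `hoeffding_denominators` and the random test vectors are hypothetical.

```python
import numpy as np

def hoeffding_denominators(u, v):
    """Denominators d1 <= d2 <= d3 such that the three bounds in the proof
    read 2*exp(-2 t^2 / d1), 2*exp(-2 t^2 / d2), 2*exp(-2 t^2 / d3)."""
    n = len(u)
    iu = np.triu_indices(n, k=1)                 # index pairs with k < l
    cross = np.outer(u, v) + np.outer(v, u)      # cross[k, l] = u_k v_l + u_l v_k
    sq = np.outer(u ** 2, v ** 2)                # sq[k, l] = u_k^2 v_l^2
    # Sum of squared ranges of the independent summands
    d1 = np.sum((2 * u * v) ** 2) + np.sum((2 * cross[iu]) ** 2)
    # After (a + b)^2 <= 2 (a^2 + b^2)
    d2 = 4 * np.sum((u * v) ** 2) + 8 * np.sum(sq[iu] + sq.T[iu])
    # After bounding by the full double sum: 8 ||u||^2 ||v||^2
    d3 = 8 * np.sum(u ** 2) * np.sum(v ** 2)
    return d1, d2, d3

rng = np.random.default_rng(4)
u = rng.standard_normal(30)
v = rng.standard_normal(30)
d1, d2, d3 = hoeffding_denominators(u, v)
print(d1, d2, d3)
```

Since $2t^{2}/d_{3}=t^{2}/(4\|\boldsymbol{u}\|_{2}^{2}\|\boldsymbol{v}\|_{2}^{2})$, the last denominator reproduces the final exponent in the display.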

Lemma 18.

For any $\boldsymbol{v}\in\mathbb{R}^{n}$ with $\|\boldsymbol{v}\|_{2}=1$, with probability at least $1-O(n^{-15})$,

\begin{align*}
\left|\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{v}\right|\lesssim\sqrt{\log n}\,\theta_{\text{max}}+\log n\sqrt{\mu^{*}/n}.
\end{align*}
Proof.

We write

\begin{align*}
\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{v}=\sum_{i=1}^{n}W_{ii}(\boldsymbol{u}_{1}^{*})_{i}v_{i}+\sum_{1\leq i<j\leq n}W_{ij}\big((\boldsymbol{u}_{1}^{*})_{i}v_{j}+(\boldsymbol{u}_{1}^{*})_{j}v_{i}\big).
\end{align*}

Since $\|\boldsymbol{v}\|_{2}=1$, we know, by (12),

\begin{align*}
\max_{i\in[n]}|(\boldsymbol{u}_{1}^{*})_{i}v_{i}|\leq\sqrt{\mu^{*}/n}\quad\text{and}\quad\max_{1\leq i<j\leq n}|(\boldsymbol{u}_{1}^{*})_{i}v_{j}+(\boldsymbol{u}_{1}^{*})_{j}v_{i}|\leq 2\sqrt{\mu^{*}/n}.
\end{align*}

As a result, by Bernstein's inequality we have, with probability at least $1-O(n^{-15})$,

\begin{align*}
\left|\boldsymbol{u}_{1}^{*\top}\boldsymbol{W}\boldsymbol{v}\right|
&\lesssim\sqrt{\log n\Bigg(\sum_{i=1}^{n}((\boldsymbol{u}_{1}^{*})_{i}v_{i})^{2}\mathbb{E}[W_{ii}^{2}]+\sum_{1\leq i<j\leq n}((\boldsymbol{u}_{1}^{*})_{i}v_{j}+(\boldsymbol{u}_{1}^{*})_{j}v_{i})^{2}\mathbb{E}[W_{ij}^{2}]\Bigg)}+\log n\sqrt{\mu^{*}/n} \\
&\lesssim\sqrt{\log n\sum_{i,j=1}^{n}((\boldsymbol{u}_{1}^{*})_{i}v_{j})^{2}\theta_{\text{max}}^{2}}+\log n\sqrt{\mu^{*}/n}\lesssim\sqrt{\log n}\,\theta_{\text{max}}+\log n\sqrt{\mu^{*}/n},
\end{align*}

where the last step uses $\|\boldsymbol{u}_{1}^{*}\|_{2}=\|\boldsymbol{v}\|_{2}=1$. ∎

8.22 The Incoherence Parameter $\mu^{*}$

In this subsection, we aim to show that the incoherence condition (12) holds with $\mu^{*}\asymp 1$. As we pointed out in Remark 1, [JKL23, Lemma C.3] and Assumption 2 guarantee $\left\|\boldsymbol{u}_{1}^{*}\right\|_{\infty}\lesssim\sqrt{\frac{1}{n}}$. Therefore it remains to show

\begin{align*}
\left\|\overline{\boldsymbol{U}}^{*}\right\|_{2,\infty}\lesssim\sqrt{\frac{K}{n}}.
\end{align*}

We write $\overline{\boldsymbol{U}}^{*}_{1}$ for $\overline{\boldsymbol{U}}^{*}$ when self-loops are allowed, and $\overline{\boldsymbol{U}}^{*}_{2}$ for $\overline{\boldsymbol{U}}^{*}$ when self-loops are not allowed.

With self-loop: Recall the definition of $\boldsymbol{B}^{*}$ in Lemma 3. By [JKL23, (C.26)], we know that $\left\|\boldsymbol{B}^{*}\right\|\lesssim\sqrt{K}$. As a result, we have $\|\boldsymbol{b}^{*}_{k}\|_{2}\lesssim\sqrt{K}$ for $1\leq k\leq K$. Since the $\boldsymbol{r}_{i}^{*}$, $i\in[n]$, are convex combinations of the $\boldsymbol{b}_{k}^{*}$, one can see that $\|\boldsymbol{r}^{*}_{i}\|_{2}\lesssim\sqrt{K}$ for $1\leq i\leq n$. Hence, by the definition of $\boldsymbol{r}_{i}^{*}$ in (9) and the bound on $\|\boldsymbol{u}_{1}^{*}\|_{\infty}$ above,

\begin{align*}
\left\|\left(\overline{\boldsymbol{U}}^{*}_{1}\right)_{i,\cdot}\right\|_{2}=\left\|(\boldsymbol{u}_{1}^{*})_{i}\boldsymbol{r}_{i}^{*}\right\|_{2}\lesssim\sqrt{\frac{K}{n}}.
\end{align*}

Without self-loop: Define $\boldsymbol{R}_{\overline{\boldsymbol{U}}}$ as the rotation matrix that best aligns $\overline{\boldsymbol{U}}_{1}^{*}$ with $\overline{\boldsymbol{U}}_{2}^{*}$:

\begin{align*}
\boldsymbol{R}_{\overline{\boldsymbol{U}}}:=\operatorname*{arg\,min}_{\boldsymbol{O}\in\mathcal{O}^{(K-1)\times(K-1)}}\left\|\overline{\boldsymbol{U}}_{1}^{*}\boldsymbol{O}-\overline{\boldsymbol{U}}^{*}_{2}\right\|_{F}.
\end{align*}

By Wedin's $\sin\Theta$ Theorem [CCF+21, Theorem 2.9], we have

\begin{align*}
\left\|\overline{\boldsymbol{U}}_{1}^{*}\boldsymbol{R}_{\overline{\boldsymbol{U}}}-\overline{\boldsymbol{U}}^{*}_{2}\right\|\lesssim\frac{\left\|\textbf{diag}(\boldsymbol{\Theta}\boldsymbol{\Pi}\boldsymbol{P}\boldsymbol{\Pi}^{\top}\boldsymbol{\Theta})\right\|}{\sigma_{\textbf{min}}^{*}}\lesssim\frac{\theta_{\text{max}}^{2}}{\beta_{n}K^{-1}n\theta_{\text{max}}^{2}}\lesssim\frac{K}{\beta_{n}n}.
\end{align*}

As a result, we have

\begin{align*}
\left\|\overline{\boldsymbol{U}}^{*}_{2}\right\|_{2,\infty}
&\leq\left\|\overline{\boldsymbol{U}}^{*}_{1}\boldsymbol{R}_{\overline{\boldsymbol{U}}}\right\|_{2,\infty}+\left\|\overline{\boldsymbol{U}}_{1}^{*}\boldsymbol{R}_{\overline{\boldsymbol{U}}}-\overline{\boldsymbol{U}}^{*}_{2}\right\|_{2,\infty}\leq\left\|\overline{\boldsymbol{U}}^{*}_{1}\boldsymbol{R}_{\overline{\boldsymbol{U}}}\right\|_{2,\infty}+\left\|\overline{\boldsymbol{U}}_{1}^{*}\boldsymbol{R}_{\overline{\boldsymbol{U}}}-\overline{\boldsymbol{U}}^{*}_{2}\right\| \\
&\lesssim\sqrt{\frac{K}{n}}+\frac{K}{\beta_{n}n}\lesssim\sqrt{\frac{K}{n}}
\end{align*}

as long as $n\gtrsim K\beta_{n}^{-2}$, which is a relatively mild assumption and is assumed to hold in Theorems 2 through 7 (Theorem 1 does not involve $\overline{\boldsymbol{U}}^{*}$).
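The rotation $\boldsymbol{R}_{\overline{\boldsymbol{U}}}$ is the solution of an orthogonal Procrustes problem, which has a closed form through the singular value decomposition of $\overline{\boldsymbol{U}}_{1}^{*\top}\overline{\boldsymbol{U}}_{2}^{*}$. The sketch below illustrates this and the triangle-inequality step of the argument on synthetic matrices. The matrices `U1` and `U2` are random stand-ins rather than eigenvector matrices of a DCMM model, and the helper names are hypothetical.

```python
import numpy as np

def procrustes_rotation(U1, U2):
    """Orthogonal O minimizing ||U1 @ O - U2||_F (orthogonal Procrustes).
    If U1.T @ U2 = X diag(s) Y^T, the minimizer is X @ Y^T."""
    X, _, Yt = np.linalg.svd(U1.T @ U2)
    return X @ Yt

def two_to_inf(M):
    """Largest row l2 norm, i.e. the (2, infinity) norm."""
    return np.max(np.linalg.norm(M, axis=1))

rng = np.random.default_rng(2)
n, r = 200, 3
U1, _ = np.linalg.qr(rng.standard_normal((n, r)))   # orthonormal columns
Q, _ = np.linalg.qr(rng.standard_normal((r, r)))    # an arbitrary rotation
# U2: a slightly perturbed, rotated copy of U1, re-orthonormalized
U2, _ = np.linalg.qr(U1 @ Q + 0.01 * rng.standard_normal((n, r)))

R = procrustes_rotation(U1, U2)
spectral_gap = np.linalg.norm(U1 @ R - U2, 2)
print(two_to_inf(U2), two_to_inf(U1 @ R) + spectral_gap)
```

The printed pair shows the inequality $\|\overline{\boldsymbol{U}}_{2}^{*}\|_{2,\infty}\leq\|\overline{\boldsymbol{U}}_{1}^{*}\boldsymbol{R}\|_{2,\infty}+\|\overline{\boldsymbol{U}}_{1}^{*}\boldsymbol{R}-\overline{\boldsymbol{U}}_{2}^{*}\|$ used in the last display, with the spectral norm dominating the $(2,\infty)$ norm of the difference.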

8.23 Proof of (31)

We formally state the theoretical guarantee of the Gaussian multiplier bootstrap method described in Example 2 below.

Theorem 11.

Assume the conditions in Theorem 6 hold. As long as

\begin{align*}
\max_{j:j\neq i}\frac{\|\boldsymbol{C}^{\boldsymbol{\pi}}_{j,k}-\boldsymbol{C}^{\boldsymbol{\pi}}_{i,k}\|_{\text{max}}\log^{5/2}n}{\sqrt{V_{\boldsymbol{C}^{\boldsymbol{\pi}}_{j,k}-\boldsymbol{C}^{\boldsymbol{\pi}}_{i,k}}}}=o(1)\quad\text{ and }\quad\max_{j:j\neq i}\frac{\varepsilon_{3}\sqrt{\log n}}{\sqrt{V_{\boldsymbol{C}^{\boldsymbol{\pi}}_{j,k}-\boldsymbol{C}^{\boldsymbol{\pi}}_{i,k}}}}=o(1),
\end{align*}

we have

\begin{align*}
\left|\mathbb{P}(\mathcal{T}>c_{1-\alpha})-\alpha\right|\to 0.
\end{align*}
Proof.

We define

\begin{align*}
\mathcal{T}^{\sharp}=\max_{j:j\neq i}\left|\frac{\textbf{Tr}\left[\left(\boldsymbol{C}^{\boldsymbol{\pi}}_{j,k}-\boldsymbol{C}^{\boldsymbol{\pi}}_{i,k}\right)\boldsymbol{W}\right]}{\sqrt{V_{\boldsymbol{C}^{\boldsymbol{\pi}}_{j,k}-\boldsymbol{C}^{\boldsymbol{\pi}}_{i,k}}}}\right|.
\end{align*}

Since $|W_{ij}|\leq 1$, we know that [CCKK22, Condition E, M] holds with $b_{1}=b_{2}=1$ and $B_{n}\asymp n\|\boldsymbol{C}^{\boldsymbol{\pi}}_{j,k}-\boldsymbol{C}^{\boldsymbol{\pi}}_{i,k}\|_{\text{max}}/\sqrt{V_{\boldsymbol{C}^{\boldsymbol{\pi}}_{j,k}-\boldsymbol{C}^{\boldsymbol{\pi}}_{i,k}}}$. As a result, by [CCKK22, Theorem 2.2] we have

\begin{align*}
\left|\mathbb{P}(\mathcal{T}^{\sharp}>c_{1-\alpha})-\alpha\right|
&\lesssim\left(\frac{B_{n}^{2}\log^{5}((n-1)(n^{2}+n)/2)}{(n^{2}+n)/2}\right)^{1/4} \\
&\lesssim\log^{5/4}n\sqrt{\frac{\|\boldsymbol{C}^{\boldsymbol{\pi}}_{j,k}-\boldsymbol{C}^{\boldsymbol{\pi}}_{i,k}\|_{\text{max}}}{\sqrt{V_{\boldsymbol{C}^{\boldsymbol{\pi}}_{j,k}-\boldsymbol{C}^{\boldsymbol{\pi}}_{i,k}}}}}\to 0. \tag{148}
\end{align*}

Therefore, to prove (31), it is enough to show

\begin{align*}
\sup_{x\in\mathbb{R}}\left|\mathbb{P}(\mathcal{T}^{\sharp}\leq x)-\mathbb{P}(\mathcal{T}\leq x)\right|\to 0. \tag{149}
\end{align*}

One can see that

\begin{align*}
\sup_{x\in\mathbb{R}}\left|\mathbb{P}(\mathcal{T}^{\sharp}\leq x)-\mathbb{P}(\mathcal{T}\leq x)\right|\leq\mathbb{P}(|\mathcal{T}^{\sharp}-\mathcal{T}|>\delta)+\sup_{x\in\mathbb{R}}\mathbb{P}(x<\mathcal{T}^{\sharp}\leq x+\delta),\quad\forall\delta>0. \tag{150}
\end{align*}
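Inequality (150) follows from a standard comparison of events. A short sketch, writing $\Delta:=|\mathcal{T}^{\sharp}-\mathcal{T}|$ (a shorthand introduced only for this sketch): for every $x\in\mathbb{R}$, the inclusions $\{\mathcal{T}\leq x\}\subseteq\{\mathcal{T}^{\sharp}\leq x+\delta\}\cup\{\Delta>\delta\}$ and $\{\mathcal{T}^{\sharp}\leq x-\delta\}\subseteq\{\mathcal{T}\leq x\}\cup\{\Delta>\delta\}$ give

```latex
\begin{align*}
\mathbb{P}(\mathcal{T}\leq x)-\mathbb{P}(\mathcal{T}^{\sharp}\leq x)
&\leq\mathbb{P}(\mathcal{T}^{\sharp}\leq x+\delta)+\mathbb{P}(\Delta>\delta)-\mathbb{P}(\mathcal{T}^{\sharp}\leq x)
 =\mathbb{P}(\Delta>\delta)+\mathbb{P}(x<\mathcal{T}^{\sharp}\leq x+\delta), \\
\mathbb{P}(\mathcal{T}^{\sharp}\leq x)-\mathbb{P}(\mathcal{T}\leq x)
&\leq\mathbb{P}(\mathcal{T}^{\sharp}\leq x)-\mathbb{P}(\mathcal{T}^{\sharp}\leq x-\delta)+\mathbb{P}(\Delta>\delta)
 =\mathbb{P}(\Delta>\delta)+\mathbb{P}(x-\delta<\mathcal{T}^{\sharp}\leq x).
\end{align*}
```

Taking the supremum over $x$ in both lines (and shifting $x$ by $\delta$ in the second) bounds both differences by the right-hand side of (150).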

We take $\delta\asymp\max_{j:j\neq i}\varepsilon_{3}/\sqrt{V_{\boldsymbol{C}^{\boldsymbol{\pi}}_{j,k}-\boldsymbol{C}^{\boldsymbol{\pi}}_{i,k}}}$. Then by Theorem 6 we know that

\begin{align*}
\mathbb{P}(|\mathcal{T}^{\sharp}-\mathcal{T}|>\delta)=O(n^{-10}). \tag{151}
\end{align*}

Next we show that $\sup_{x\in\mathbb{R}}\mathbb{P}(x<\mathcal{T}^{\sharp}\leq x+\delta)\to 0$. Let $(Z_{1},Z_{2},\dots,Z_{n-1})$ be a centered Gaussian random vector with the same covariance structure as

\begin{align*}
\left(\frac{\textbf{Tr}\left[\left(\boldsymbol{C}^{\boldsymbol{\pi}}_{j,k}-\boldsymbol{C}^{\boldsymbol{\pi}}_{i,k}\right)\boldsymbol{W}\right]}{\sqrt{V_{\boldsymbol{C}^{\boldsymbol{\pi}}_{j,k}-\boldsymbol{C}^{\boldsymbol{\pi}}_{i,k}}}}:j\in[n]\backslash i\right)\in\mathbb{R}^{n-1},
\end{align*}

so that, in particular, $Z_{k}\sim\mathcal{N}(0,1)$ for all $k\in[n-1]$. Then by [CCKK22, Theorem 2.1] we know that

\begin{align*}
\sup_{x\in\mathbb{R}}\left|\mathbb{P}(\mathcal{T}^{\sharp}\leq x)-\mathbb{P}\left(\max_{1\leq k\leq n-1}|Z_{k}|\leq x\right)\right|
&\lesssim\left(\frac{B_{n}^{2}\log^{5}((n-1)(n^{2}+n)/2)}{(n^{2}+n)/2}\right)^{1/4} \\
&\lesssim\log^{5/4}n\sqrt{\frac{\|\boldsymbol{C}^{\boldsymbol{\pi}}_{j,k}-\boldsymbol{C}^{\boldsymbol{\pi}}_{i,k}\|_{\text{max}}}{\sqrt{V_{\boldsymbol{C}^{\boldsymbol{\pi}}_{j,k}-\boldsymbol{C}^{\boldsymbol{\pi}}_{i,k}}}}}\to 0.
\end{align*}

As a result, we have

\begin{align*}
&\left|\sup_{x\in\mathbb{R}}\mathbb{P}(x<\mathcal{T}^{\sharp}\leq x+\delta)-\sup_{x\in\mathbb{R}}\mathbb{P}\left(x<\max_{1\leq k\leq n-1}|Z_{k}|\leq x+\delta\right)\right| \\
\leq{}&\sup_{x\in\mathbb{R}}\left|\mathbb{P}(x<\mathcal{T}^{\sharp}\leq x+\delta)-\mathbb{P}\left(x<\max_{1\leq k\leq n-1}|Z_{k}|\leq x+\delta\right)\right| \\
\leq{}&2\sup_{x\in\mathbb{R}}\left|\mathbb{P}(\mathcal{T}^{\sharp}\leq x)-\mathbb{P}\left(\max_{1\leq k\leq n-1}|Z_{k}|\leq x\right)\right|\to 0. \tag{152}
\end{align*}

On the other hand, by [CCK15, Theorem 3] we know that

\begin{align*}
\sup_{x\in\mathbb{R}}\mathbb{P}\left(x<\max_{1\leq k\leq n-1}|Z_{k}|\leq x+\delta\right)\lesssim\delta\sqrt{\log n}\to 0. \tag{153}
\end{align*}

Combining (152) with (153), we obtain

\begin{align*}
\sup_{x\in\mathbb{R}}\mathbb{P}(x<\mathcal{T}^{\sharp}\leq x+\delta)\to 0. \tag{154}
\end{align*}

Plugging (151) and (154) into (150), we prove (149). This, along with (148), proves that

\begin{align*}
\left|\mathbb{P}(\mathcal{T}>c_{1-\alpha})-\alpha\right|\to 0,
\end{align*}

concluding our proof. ∎
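For intuition about the critical value $c_{1-\alpha}$, the following is a minimal sketch of a generic Gaussian multiplier bootstrap for a max-type statistic. It is not the specific procedure of Example 2: the array `X` of independent summands, the function name `multiplier_bootstrap_quantile`, and all parameter values are assumptions made for illustration only.

```python
import numpy as np

def multiplier_bootstrap_quantile(X, alpha=0.05, n_boot=2000, seed=0):
    """(1 - alpha)-quantile of max_j |N^{-1/2} sum_i e_i (X_ij - mean_j)|,
    where the multipliers e_i are i.i.d. N(0, 1).

    X: (N, p) array of independent summands (a hypothetical stand-in for the
    normalized terms whose maximum defines the test statistic).
    """
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    Xc = X - X.mean(axis=0)                    # center each coordinate
    G = rng.standard_normal((n_boot, N))       # Gaussian multipliers
    stats = np.abs(G @ Xc / np.sqrt(N)).max(axis=1)
    return np.quantile(stats, 1 - alpha)

rng = np.random.default_rng(3)
N, p = 400, 20
X = np.sqrt(3.0) * rng.uniform(-1.0, 1.0, size=(N, p))   # bounded, unit variance
T = np.abs(X.sum(axis=0) / np.sqrt(N)).max()             # observed max statistic
c95 = multiplier_bootstrap_quantile(X, alpha=0.05)
c90 = multiplier_bootstrap_quantile(X, alpha=0.10)
print("statistic:", T, "critical values:", c90, c95)
```

In this toy setting the test rejects when `T > c95`. The theorem above concerns the analogous comparison of $\mathcal{T}$ with $c_{1-\alpha}$, where the summands come from the entries of $\boldsymbol{W}$ rather than from synthetic data.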