Inferences on mixing probabilities and ranking in mixed-membership models

The research is supported in part by ONR grant N00014-22-1-2340, NSF grants DMS-2052926, DMS-2053832 and DMS-2210833.
Abstract
Network data is prevalent in numerous big data applications, including economics and health networks, where it is of prime importance to understand the latent structure of the network. In this paper, we model the network using the Degree-Corrected Mixed Membership (DCMM) model. In the DCMM model, for each node $i$ there exists a membership vector $\pi_i$, whose $k$-th entry $\pi_i(k)$ denotes the weight that node $i$ puts in community $k$. We derive novel finite-sample expansions for the $\pi_i$'s, which allow us to obtain asymptotic distributions and confidence intervals for the membership mixing probabilities and other related population quantities. This fills an important gap in uncertainty quantification of membership profiles. We further develop a ranking scheme for the vertices based on the membership mixing probabilities in certain communities and perform the relevant statistical inferences. A multiplier bootstrap method is proposed for ranking inference of an individual member's profile with respect to a given community. The validity of our theoretical results is further demonstrated via numerical experiments on both real and synthetic data examples.
1 Introduction
In various fields of study, such as citation networks, protein interactions, health, finance, trade, and social networks, we often come across large amounts of data that describe the relationships between objects. There are numerous approaches to understanding and analyzing such network data. Algorithmic methods are commonly used to optimize specific criteria, as shown in [New13a, New13b] and [ZM14]. Alternatively, model-based methods rely on probabilistic models with specific structures, which are reviewed by [GZF+10]. One of the earliest models in which the nodes (or vertices) of the network belong to some latent community is the Stochastic Block Model (SBM) [HLL83, WW87, Abb17]. Several improvements have been proposed to overcome its limitations, two of which are relevant to our paper. First, [KN11] introduced the degree-corrected SBM, where a degree parameter is used for each vertex to make the expected degrees match the observed ones. Second, [ABFX08, ABEF14] study the mixed membership model, where each individual can belong to several communities with a mixing probability profile. In this paper, we study membership profiles in the Degree-Corrected Mixed Membership (DCMM) model, which combines both of the above benefits. In the DCMM model, every node $i$ is assumed to have a community membership probability vector $\pi_i$, where $K$ is the number of communities and the $k$-th entry of $\pi_i$ specifies the mixture proportion of node $i$ in community $k$ (see [FFHL22a, JKL23]). For example, a newspaper can be partly conservative and partly liberal. In addition, each node is allowed to have its own degree.
Given such a network, estimation and inference of membership profiles have drawn considerable attention recently. For example, [JKL23] provides an algorithm to estimate the $\pi_i$'s, and [FFHL22b, FFLY22] consider the hypothesis testing problem of whether two nodes have the same membership profile. However, these works do not address uncertainty quantification of the $\pi_i$'s. In addition, to the best of our knowledge, none of the prior works concerns the problem of ranking nodes within a particular profile. As an example, one might ask: Is newspaper A more liberal than newspaper B? Or, how many newspapers should I pick to ensure that the top-K conservative newspapers are selected? Such ranking questions have applications in finance, where one might be interested in knowing whether a particular stock is among the top technology stocks before investing in it. In our work, we devise a framework to perform ranking inference based on the $\pi_i$'s.
Our work lies at the intersection of three research directions, which we delineate here.
-
1.
Community detection: Our estimation and inference procedures crucially rely on spectral clustering, which is one of the oldest methods for community detection (cf. [VL07] for a tutorial). In the last decade, [RCY11, Jin15, LR15] have developed both the theory and methods of spectral clustering. Another line of research related to community detection involves showing the optimality of detection boundaries [Abb17] or the link-prediction problem [NK07, LWLZ23]. Recently, [JKL23] developed an algorithm to estimate the $\pi_i$'s with error guarantees in norm; however, it lacks inferential guarantees and asymptotic distributions. Furthermore, there has been a significant number of works on hypothesis testing in network data. [ACV14, VAC15] formulated community detection as a hypothesis testing problem. [FFHL22b, FFLY22] study the problem of testing whether two vertices have the same membership profiles in the DCMM model. Under stochastic block models, detection of the number of blocks has been studied by [BS16, Lei16, WB17], among others.
-
2.
Ranking inference: Most of the literature on ranking problems deals with pairwise comparisons or multiple partial ranking models, such as the Bradley-Terry-Luce model and other assortative network models. Prominent examples of ranking involve individual choices in economics [Luc12, M+73], websites [DKNS01], ranking of journals [JJKL21, Sti94], and alleles in genetics [SC95]. Hence, the ranking problem has been extensively studied in statistics, machine learning, and operations research; see, for example, [Hun04, CS15, CFMW19, CCF+21, GSZ23, FLWY22] for more details. However, none of the above works is concerned with ranking in the DCMM model, and hence our work is significantly different from the aforementioned papers.
-
3.
Entrywise perturbation theory: Often, for uncertainty quantification of unknown parameters, it is not enough to obtain an error bound on the estimators; rather, one needs a leave-one-out style analysis to get more refined entrywise error bounds. See [CCF+21, Chapter 4] for an introduction. Such analysis has been used in matrix completion [CFMY19], principal component analysis [YCF21], and ranking [FLWY22, FHY22, GSZ23]. We develop novel subspace perturbation theory to obtain finite-sample expansions of the individual $\pi_i$'s and use them to derive asymptotic distributions.
To perform inference on ranks and conduct hypothesis testing, we employ an inference framework that ranks items through the maximum pairwise difference statistic, whose distribution is approximated by a newly proposed multiplier bootstrap method. A similar framework was recently introduced by [FLWY22] in the context of the Bradley-Terry-Luce model.
The rest of the paper is structured as follows. Section 2 formulates the problem and describes the estimation procedure. Sections 3 and 4 delineate the vertex hunting and membership reconstruction steps of our estimation procedure, respectively. Using the results we establish, we develop distribution theory and answer inference questions in Section 5. We complement our theoretical findings with numerical experiments in Section 6, where we perform simulations on both synthetic and real datasets. Section 7 provides a brief outline of the major proofs of the paper. Finally, Section 8 contains all the proofs.
2 Problem Formulation
2.1 Model setting
Consider an undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ on $n$ nodes (or vertices), where $\mathcal{V}$ is the set of nodes and $\mathcal{E}$ denotes the set of edges or links between the nodes. Given such a graph $\mathcal{G}$, consider its symmetric adjacency matrix $A$, which captures the connectivity structure of $\mathcal{G}$: $A_{ij} = 1$ if there exists a link or edge between nodes $i$ and $j$, i.e., $(i, j) \in \mathcal{E}$, and $A_{ij} = 0$ otherwise. We allow the presence of potential self-loops in the graph $\mathcal{G}$. If there are no self-loops, we set $A_{ii} = 0$ for $i = 1, \dots, n$. Under a probabilistic model, we assume that each upper-triangular entry of the random matrix $A$ is an independent realization of a Bernoulli random variable.
Given a network on $n$ nodes, we assume that there is an underlying latent community structure consisting of $K$ communities. Each node $i$ is associated with a community membership probability vector $\pi_i \in [0, 1]^K$ such that $\sum_{k=1}^{K} \pi_i(k) = 1$.
A node $i$ is called a pure node if there exists $k \in \{1, \dots, K\}$ such that $\pi_i(k) = 1$. The degree-corrected mixed membership (DCMM) model assumes (cf. [JKL23]) that the probability that an edge exists between node $i$ and node $j$ is given by
$$\mathbb{P}(A_{ij} = 1) = \theta_i \theta_j\, \pi_i^\top P\, \pi_j. \qquad (1)$$
Here, $\theta_i$ captures the degree heterogeneity for node $i$, and $P(k, l)$ can be viewed as the probability that a typical member of community $k$ connects with a typical member of community $l$. The mixing probabilities can be written in matrix form as follows. Let $\Theta = \mathrm{diag}(\theta_1, \dots, \theta_n)$ be a diagonal matrix that captures the degree heterogeneity, $\Pi = (\pi_1, \dots, \pi_n)^\top$ be the matrix of community membership probability vectors, and $P \in \mathbb{R}^{K \times K}$ be a nonsingular matrix. Then the mixing probability matrix $H$, with entries $H_{ij} = \theta_i \theta_j \pi_i^\top P \pi_j$, can be expressed as
$$H = \Theta \Pi P \Pi^\top \Theta. \qquad (2)$$
Let $A$ be the symmetric adjacency matrix of these $n$ nodes, i.e., $A_{ij} = 1$ if there is a link connecting nodes $i$ and $j$, and $A_{ij} = 0$ otherwise. Note that $A_{ij} = A_{ji}$ for $i \neq j$. We set $A_{ii} = 0$ in the case without self-loops. We also allow the case with self-loops; in that case, we assume that each $A_{ii}$ is an independent Bernoulli random variable with mean $H_{ii}$. In both cases, we write
$$A = H + W, \qquad (3)$$
where $W$ is a symmetric random matrix whose entries above the diagonal (and on the diagonal, in the case with self-loops) are independent with mean zero. In the following, since our theory applies to both cases, we will not distinguish between them, and the notation (3) is used throughout the article.
Our goal is to study the community membership probability matrix $\Pi$ and conduct inference on its entries. Based on these uncertainty quantifications, we provide a framework for ranking inference and carry out hypothesis tests on the rank of each node's mixing probability within a community. To this end, we first need to estimate $\Pi$ using the Mixed-SCORE algorithm [JKL23]. We invoke a slight modification of the algorithm, along with our perturbation theory, to obtain the desired entrywise expansions.
We impose the following identifiability condition for the DCMM model (1).
Assumption 1.
Each community $k = 1, \dots, K$ has at least one pure node; namely, there exists a vertex $i$ such that $\pi_i(k) = 1$.
2.2 Estimation procedure
We describe the version of the Mixed-SCORE algorithm (cf. [JKL23]) that will be used to estimate $\Pi$. The algorithm consists of three key steps. First, we map each node to a $(K-1)$-dimensional space using the observed adjacency matrix $A$. Ideally, in the absence of noise, i.e., when $A = H$, these points in the $(K-1)$-dimensional space would form a simplex with $K$ vertices, with the pure nodes defined in Assumption 1 located at the vertices. The presence of such a simplex structure is discussed in Lemma 2.1 of [JKL23].
In the presence of noise, as long as the noise level is mild, we can still expect these points to form an approximate simplex. We therefore apply a vertex hunting algorithm to estimate the vertices, and then estimate the membership vector of each node based on the estimated vertices. Below, we describe the procedure mathematically; a schematic code sketch follows the steps.
-
•
SCORE step: Let $\hat\lambda_1, \dots, \hat\lambda_K$ be the $K$ largest eigenvalues (in magnitude) of $A$, sorted in descending order, and let $\hat u_1, \dots, \hat u_K$ be the corresponding eigenvectors. Calculate the following vectors
$$\hat r_i = \left(\frac{\hat u_2(i)}{\hat u_1(i)}, \dots, \frac{\hat u_K(i)}{\hat u_1(i)}\right)^\top, \qquad i = 1, \dots, n. \qquad (4)$$
-
•
Vertex Hunting step: Apply a vertex hunting algorithm (see Section 3 for details) to $\hat r_1, \dots, \hat r_n$ and obtain estimated vertices $\hat v_1, \dots, \hat v_K$.
-
•
Membership Reconstruction step: For each , let be the unique solution of the linear equations:
(5) Define a vector with , where
(6) Estimate as .
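For concreteness, the following Python sketch mirrors the three steps above. It is a schematic implementation under assumptions of our own: the function names, the sign convention for the leading eigenvector, and the degree-correction weights `b1` (borrowed from the form commonly used in the Mixed-SCORE literature) are not taken from the paper, and the exact formulas in (5)-(6) may differ.

```python
import numpy as np

def score_embedding(A, K):
    """SCORE step: compute the (K-1)-dimensional ratio vectors r_i from the
    top-K eigenpairs (by magnitude) of the adjacency matrix A."""
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(-np.abs(vals))[:K]      # K largest eigenvalues in magnitude
    idx = idx[np.argsort(-vals[idx])]        # sorted in descending order
    lam, U = vals[idx], vecs[:, idx]
    if U[:, 0].sum() < 0:                    # fix the sign of the leading eigenvector
        U[:, 0] = -U[:, 0]
    R = U[:, 1:] / U[:, :1]                  # ratio vectors of (4), shape (n, K-1)
    return R, lam

def reconstruct_membership(R, V, lam):
    """Membership reconstruction: solve the barycentric system for each node,
    reweight, and renormalize.  V holds the K estimated vertices as rows.
    The reweighting b1 below is an assumption borrowed from the Mixed-SCORE
    literature; the paper's exact formula (6) may differ."""
    n, K = R.shape[0], V.shape[0]
    M = np.vstack([V.T, np.ones((1, K))])    # linear system (5) with sum-to-one constraint
    B = np.vstack([R.T, np.ones((1, n))])
    W = np.clip(np.linalg.solve(M, B).T, 0.0, None)   # barycentric weights, shape (n, K)
    b1 = 1.0 / np.sqrt(np.maximum(lam[0] + (V ** 2) @ lam[1:], 1e-12))
    Pi = W * b1                              # undo the degree-induced distortion
    return Pi / np.maximum(Pi.sum(axis=1, keepdims=True), 1e-12)
```

The vertex hunting routine that produces the matrix `V` of estimated vertices is sketched in Section 3.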
To analyze the algorithm, we begin with the SCORE step, i.e., we investigate the $\hat r_i$'s.
We introduce some notations first. Since and are symmetric, we write their eigen-decomposition as
The eigenvalues are sorted in the following way. First, are the largest eigenvalues of in magnitude, and are the largest eigenvalues of in magnitude. Second, , are sorted in descending order, while the other eigenvalues can be sorted in any order. By Lemma C.3 of [JKL23], we can choose the sign of to make sure , . The directions of are chosen to make sure .
(7) |
We further define,
(8) |
We also denote by
(9) |
Note from the expression of and , in order to analyze , we need to study the difference between and , and the difference between and . To this end, we need the following four assumptions for the theoretical analysis, introduced in Section 3 of [JKL23], which are necessary for the Mixed-SCORE algorithm to work.
Denote by and . Note that the eigenvalues of are real, since is positive definite.
Assumption 2.
There exist constants such that
Assumption 3.
There exists a constant such that
Assumption 4.
There exists such that and , where is a sequence of positive real numbers such that , and is the -th largest right eigenvalue of .
Assumption 5.
There exists a constant such that and
Here is the right eigenvector corresponding to .
Here, Assumption 2 and Assumption 3 ensure that the underlying model is not spiky; in other words, the signal spreads across all nodes. Assumption 4 is the eigen-gap assumption, and as we will see next, it ensures that there is a sufficient gap between the first eigenvalue and the remaining eigenvalues of , and that the remaining non-zero eigenvalues of are of the same order. Assumption 5 seems less straightforward, but it is actually satisfied by a very wide range of models (see [JKL23] for examples). We will impose Assumptions 2-5 for the remainder of the paper.
With these assumptions in hand, the following lemma lists some important properties of the eigenvalues of and .
Lemma 1.
Combining this lemma with the fact that (as will be shown later in Lemma 4) with high probability, we know that the eigenvalues of and can be divided into three groups as long as .
This also motivates us to derive two results separately. First, we obtain the expansion of . Second, we analyze the difference between and by writing the expansion of , where
(10) |
is an orthogonal matrix which best “matches” and . Here stands for the set of all the orthogonal matrices.
The analysis of is similar to a matrix denoising problem, so we define some quantities which are commonly used in the matrix denoising literature [YCF21]. We define
(11) |
Our results regarding uncertainty quantification contribute to the literature on estimation of the subspace spanned by partial eigenvectors, cf. [AFWZ20, AFW22].
To state our results, we introduce an incoherence condition on the matrix , which is a natural modification of the standard incoherence assumption [YCF21, Assumption 2], adapted to our setting by separating the first eigenspace from the second to -th eigenspaces.
Definition 1.
(Incoherence) The symmetric matrix is said to be -incoherent if
(12) |
Note that, unlike [YCF21], we do not need to assume an incoherence condition in this article. This is because a combination of Assumption 2 and Assumption 3 shows that is incoherent with ; see Remark 1 and Section 8.22 for more details. Also, Lemma 1 implies . However, the quantities and are two key parameters in the matrix denoising literature, and we keep them in our results.
Remark 1.
Now, we are ready to state our results. We first state the following bound on whose proof is deferred.
Theorem 1.
If , with probability at least ,
where is given by Definition 1. Moreover, we have the following expansion
where, with probability at least , we have and
Furthermore, if , we obtain
Theorem 1 provides an error bound for through the quantity . Both and bounds on are provided. The following result states the expansion for . Define
(13) |
Define .
Theorem 2.
Proof.
See Section 8.11. ∎
We are now in a position to state our main result about the $\hat r_i$'s defined by (4). This is our final result regarding the finite-sample analysis of the SCORE step. The proof involves both Theorem 1 and Theorem 2, and is deferred.
Theorem 3.
Assume the conditions in Theorem 2 hold. In addition, we assume
Recall the orthogonal matrix defined by (10) and
where the matrix was defined by (14). Then we have the following decomposition
such that with probability at least , for all we have
where and are controlled by Theorem 1 and Theorem 2. Specifically, if , for all we have
Proof.
See Section 8.12. ∎
Remark 2.
We finish the section with our result regarding the estimation error bound on the $\hat r_i$'s. To this end, we need to define the following quantities:
(15) |
where is defined by (2.2).
Despite the complicated expressions, these two quantities are easily interpretable. controls the estimation error according to Theorem 4 below, while controls the expansion error according to Theorem 3. If we assume , for all , then one has
That is, the expansion error decays faster than the estimation error by up to logarithmic factors. This validates our theoretical results. As we have mentioned before, the following theorem shows that controls the estimation error.
Theorem 4.
Assume the conditions in Theorem 3 hold. Assume
(16) |
and
(17) |
Then with probability at least we have
(18) |
Proof.
See Section 8.13. ∎
The above result gives a finite-sample error bound for the $\hat r_i$'s defined by (4). By Theorem 3, we know that
(19) |
with probability at least , where we define
(20) |
Remark 3.
We remark here that (16) and (17) are two sufficient conditions to ensure that (18) holds, but they are not necessary. In fact, these two conditions ensure that the upper bound of the first-order term dominates the upper bound of the expansion error, which is given by Theorem 3, in order to simplify the upper bound of the estimation error . In other words, the results in the rest of the paper hold without (16) and (17), but a more complicated expression of is then required.
3 Vertex Hunting
In this section, we describe how to estimate the underlying vertices of the simplex based on the dataset. To this end, define disjoint subsets such that is the collection of all the pure nodes of the -th community, and let be the vertex of the corresponding community, i.e.,
(21) |
The following quantity
(22) |
measures the gap between any vertex and the other points. We will use the following successive projection algorithm given by Algorithm 1 for the vertex hunting step.
The successive projection algorithm (SPA) is a forward variable selection method introduced by [ASG+01]. Our version of SPA borrows from [GV13], who derived its theoretical guarantee. One might consider alternative vertex hunting algorithms similar in spirit to [JKL23]; we leave this for future research. A schematic implementation of SPA is sketched below.
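The following is a minimal Python sketch of SPA, under the assumption that the input rows are the SCORE embeddings $\hat r_i$; it can be plugged into the reconstruction sketch of Section 2.2 and is not a verbatim transcription of Algorithm 1.

```python
import numpy as np

def successive_projection(R, K):
    """SPA sketch in the spirit of [GV13]: repeatedly pick the point with the
    largest norm and project all points onto the orthogonal complement of the
    chosen one.  The K selected rows of R estimate the simplex vertices."""
    X = np.column_stack([R, np.ones(R.shape[0])])   # affine lift so vertices are extreme points
    selected = []
    for _ in range(K):
        j = int(np.argmax(np.linalg.norm(X, axis=1)))
        selected.append(j)
        u = X[j] / np.linalg.norm(X[j])
        X = X - np.outer(X @ u, u)                  # remove the direction of the chosen vertex
    return R[selected]

# Hypothetical usage together with the Section 2.2 sketch:
#   R, lam = score_embedding(A, K)
#   V_hat = successive_projection(R, K)
#   Pi_hat = reconstruct_membership(R, V_hat, lam)
```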
Our goal for the remainder of this section is to understand how effective SPA is at selecting the underlying vertices. If , defined by (22), is not too small, we expect the vertex hunting algorithm to retrieve all the vertices of the simplex, namely, selection consistency. This is the main result of this section. The proof follows by combining Theorem 4 with Theorem 3 of [GV13].
Theorem 5.
Proof.
See Section 8.14. ∎
Corollary 1.
Proof.
See Section 8.15. ∎
The leading term of the RHS of (23) will be denoted by
(24) |
4 Membership Reconstruction
In this section we characterize the behavior of . To this end, we will require the expansion of the following three terms
The expansion of can be obtained directly from Theorem 10, and we defer the precise statement to Section 7. We now turn to the expansion of . We will use the following notation: given any vector and permutation of , set
And, given a matrix with columns, define
Now, we are in a position to state our result regarding .
Lemma 2.
Proof.
See Section 8.17. ∎
The following Lemma gives an expansion of .
Lemma 3.
Proof.
See Section 8.18. ∎
For simplicity, we define, for
(26) |
where and are defined by (25). As a counterpart of in the membership construction step (6), we define for , where is given by (21). The expansion of can be seen from Corollary 3 and Lemma 2. Combining them with Lemma 3, we obtain the following expansion of that is linear in .
Theorem 6.
Proof.
See Section 8.19. ∎
5 Distributional Theory and Rank Inference
In this section, we tackle specific inference problems based on the uncertainty quantification results stated above. First, we establish distributional guarantees using the first-order expansion derived in Theorem 6. Second, we apply the distributional results to related inference problems, especially the ranking inference application. To state the distributional results for , we need the following notation; these are non-random matrices of dimension
(28) |
One can see that
where , , , are defined by (20), (24), (26) and (27) respectively. For any matrices , we define the variance of as
and the covariance of and as
The asymptotic distribution of is given by the following Theorem:
Theorem 7.
Let be the permutation in Theorem 5. Let be a fixed integer and
be distinct fixed pairs from . We consider vectors
Denote by the covariance matrix whose diagonal entry is and off-diagonal entry is . Then for any convex set , we have
as long as
(29) |
where for each pair such that ,
Proof.
See Section 8.20. ∎
Next, we will apply Theorem 7 to answer some inference questions. In many practical applications, one is concerned with the characteristics of each community rather than the index (e.g., ) of the community. For this reason, we will assume that the permutation is the identity map, that is, for all indices .
Example 1.
Given a node in the network, it is natural to ask which community it is closest to. This amounts to finding the largest component of and can be formulated as a collection of hypothesis testing problems. For , we consider the following testing problem:
According to Theorem 7 with , we consider the following Bonferroni-adjusted critical region at a significance level of
where with and ,
(30) |
It is easy to see from the above critical region that at most one of can be rejected. Once is rejected, we can conclude that node is closest to community at a significance level of .
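The following sketch illustrates this Bonferroni-adjusted procedure for a single node. The function name and the inputs (in particular `se_diff`, a plug-in standard error of the pairwise differences built from (28) and (30)) are our own notation, and the rejection rule shown here is one natural reading of the critical region above rather than a verbatim transcription.

```python
import numpy as np
from scipy.stats import norm

def closest_community_test(pi_hat_i, se_diff, alpha=0.05):
    """For a single node i, test for each k whether node i is closest to
    community k.  pi_hat_i is the length-K vector of estimated mixing
    probabilities; se_diff[k, l] is a plug-in standard error of
    pi_hat_i[k] - pi_hat_i[l].  Returns the community whose null hypothesis
    is rejected, or None (at most one can be rejected)."""
    K = len(pi_hat_i)
    z = norm.ppf(1.0 - alpha / (K - 1))      # Bonferroni adjustment over K-1 comparisons
    for k in range(K):
        t_stats = [(pi_hat_i[k] - pi_hat_i[l]) / se_diff[k, l]
                   for l in range(K) if l != k]
        if min(t_stats) > z:                 # pi_hat_i[k] significantly exceeds all other entries
            return k
    return None
```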
Example 2.
Moving beyond the question of closest community detection, it is often of interest to understand the rank of node with respect to community , such as a ranking of the conservativeness of journals or books. Let be the rank of among . We adopt the method proposed by [FLWY22] to construct a rank confidence interval for . Consider the following random variable and its bootstrap counterpart :
where is a symmetric random matrix whose upper triangular (including diagonal) entries are i.i.d. standard Gaussian random variables, and denotes the elementwise product of the matrices and . Given any , we define as the -th quantile of the conditional distribution of given . Then, by [CCKK22, Theorem 2.2], one can show that
(31) |
under mild regularity conditions (see Section 8.23 for more details). Using the plug-in estimators and the estimated critical value from the bootstrap samples, we construct the following simultaneous confidence intervals for with confidence level as
In other words, for all , with probability at least . Now, implies , and counting the number of such gives a lower bound on the rank of . Similarly, implies , and this gives an upper bound on the rank of . As a result,
forms a confidence interval for .
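A sketch of this construction is given below. It presupposes that the multiplier-bootstrap replicates of the maximum standardized pairwise-difference statistic have already been generated (their generation relies on the expansion in Theorem 6 and the Gaussian multiplier matrix above, which we do not reproduce), so the snippet only illustrates how the bootstrap critical value is turned into a rank confidence interval; variable names are ours.

```python
import numpy as np

def rank_confidence_interval(pi_hat_k, se_diff_m, boot_max_stats, m, alpha=0.05):
    """Rank confidence interval for node m within a given community.
    pi_hat_k[j] is the estimated mixing probability of node j in that community;
    se_diff_m[j] is a plug-in standard error of pi_hat_k[j] - pi_hat_k[m];
    boot_max_stats holds B multiplier-bootstrap replicates of the maximum
    standardized pairwise-difference statistic (generated separately)."""
    n = len(pi_hat_k)
    others = [j for j in range(n) if j != m]
    c = np.quantile(boot_max_stats, 1.0 - alpha)     # bootstrap critical value
    diff = pi_hat_k[others] - pi_hat_k[m]
    lower = diff - c * se_diff_m[others]
    upper = diff + c * se_diff_m[others]
    surely_above = int(np.sum(lower > 0))            # nodes certainly ranked above m
    possibly_above = int(np.sum(upper > 0))          # nodes possibly ranked above m
    return surely_above + 1, possibly_above + 1      # (lower, upper) bound on the rank
```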
Example 3.
[FFHL22b] proposed the SIMPLE test for statistical inference on membership profiles. Specifically, for each node pair , we are interested in the following testing problem:
Theorem 7 allows us to recover their result. To see this, we take ,
We define matrix by
As a result, by Theorem 7, as long as condition (29) holds, under null hypothesis we have
This Hotelling-type statistic can be used to test the null hypothesis for two individual nodes and recovers the result in [FFHL22b].
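A schematic version of this test is shown below; the plug-in covariance estimate and the degrees of freedom are assumptions for illustration (we use $K-1$ to reflect the sum-to-one constraint of the membership vectors), not the paper's exact calibration.

```python
import numpy as np
from scipy.stats import chi2

def simple_type_test(pi_hat_i, pi_hat_j, Sigma_diff_hat, alpha=0.05):
    """Hotelling-type statistic for testing whether nodes i and j share the
    same membership profile.  Sigma_diff_hat is a plug-in estimate of the
    asymptotic covariance of pi_hat_i - pi_hat_j; the pseudo-inverse handles
    the singularity induced by the sum-to-one constraint."""
    d = np.asarray(pi_hat_i) - np.asarray(pi_hat_j)
    T = float(d @ np.linalg.pinv(Sigma_diff_hat) @ d)
    df = len(d) - 1          # K-1 degrees of freedom assumed here
    return T, T > chi2.ppf(1.0 - alpha, df)
```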
6 Numerical Studies
In this section, we conduct numerical experiments on both synthetic and real data to complement our theoretical results. We first validate our distributional results via simulations. Then, we apply our approach to a stock dataset to study the simplex structure and perform ranking inference, as mentioned in the previous examples.
6.1 Synthetic Data Simulation
Here, we conduct synthetic data experiments to verify our uncertainty quantification results in Theorems 6 and 7. Our simulation setup is as follows: set the number of nodes and the number of communities . To generate the membership matrix , we first set the first two rows of the matrix as and , as two pure nodes. The first entries of the remaining rows are sampled independently from the uniform distribution over the interval , while the second entries are determined by the first entries since each row sums to . Then we randomly shuffle the rows of . For the matrix , we set its diagonal entries to and its off-diagonal entries to . In terms of , which partially represents the signal strength, we consider three settings: (i) for all nodes; (ii) the 's are sampled independently from the uniform distribution over the interval ; (iii) for all nodes. In each setting, we generate the network and compute the mixed membership matrix estimator times. We then record the realizations of the following standardized random variable
where is defined by (30) and is the plug-in estimator of given by (28). Figure 1 summarizes the results collected from the simulations. The three plots in the first row show the histograms of the results from each setting, and the orange curve is the density of the standard normal distribution. The three Q-Q plots in the second row further examine the normality in the three settings. These results suggest that the random variable is nearly normally distributed and that the estimated asymptotic variance is accurate. They in turn further support our theoretical results, Theorems 6 and 7.
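For reference, a minimal sketch of the data-generating part of this simulation is given below. The specific numerical values (sizes, the entries of the connectivity matrix, and the range of the degree parameters) are illustrative placeholders, since the paper's exact choices are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 2000, 2                          # illustrative sizes
theta = rng.uniform(0.3, 0.9, n)        # degree heterogeneity, setting (ii); (i)/(iii) use a constant
P = np.array([[0.8, 0.2],
              [0.2, 0.8]])              # illustrative community connectivity matrix

# Membership matrix: two pure nodes, the remaining rows uniform, then shuffled.
Pi = np.empty((n, K))
Pi[0], Pi[1] = (1.0, 0.0), (0.0, 1.0)
Pi[2:, 0] = rng.uniform(0.0, 1.0, n - 2)
Pi[2:, 1] = 1.0 - Pi[2:, 0]
Pi = Pi[rng.permutation(n)]

# Edge probabilities H = Theta Pi P Pi^T Theta and a symmetric Bernoulli adjacency matrix.
H = theta[:, None] * (Pi @ P @ Pi.T) * theta[None, :]
A_upper = np.triu(rng.random((n, n)) < H, 1).astype(float)
A = A_upper + A_upper.T                 # no self-loops in this sketch

# The Mixed-SCORE sketch from Section 2.2 can now be applied to A, and the
# standardized estimation errors compared against the N(0, 1) limit.
```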
Figure 1: Histograms (top row) and Q-Q plots (bottom row) of the standardized statistic in the three settings, compared with the standard normal density.
6.2 Real Data Experiments
In this subsection, we apply our uncertainty quantification results to a financial dataset. Our dataset consists of the daily close prices of the S&P 500 stocks from January 1, 2010 to December 31, 2022, obtained from Yahoo Finance (https://pypi.org/project/yfinance/), from which we calculate the log returns. We clean the data by removing the stocks with more than one missing value; for those with exactly one missing value, we set the corresponding log return to zero. We would like to construct a network based on the correlation of the log returns. It is well known in finance that some common factors account for much of this correlation. Similar to [FFHL19], we first fit a factor model with five factors to remove these common factors and then construct the network based on the covariance matrix of the idiosyncratic components. Letting be the covariance matrix of the idiosyncratic components, we draw an edge between nodes and if and only if . In this way, we obtain the adjacency matrix . After these preprocessing steps, stocks remain.
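A sketch of this preprocessing pipeline is given below. The ticker list, the correlation threshold `tau`, and the use of principal components to remove the five common factors are illustrative assumptions standing in for the paper's exact choices (which follow [FFHL19]); only `yfinance.download` and standard NumPy/pandas operations are used.

```python
import numpy as np
import yfinance as yf

# Illustrative tickers; the paper uses the full S&P 500 list.
tickers = ["AAPL", "MSFT", "JPM", "GS", "PFE", "JNJ", "XOM", "CVX"]
prices = yf.download(tickers, start="2010-01-01", end="2022-12-31")["Close"]
log_ret = np.log(prices).diff().iloc[1:]

# Keep stocks with at most one missing value; set remaining missing returns to zero.
log_ret = log_ret.loc[:, log_ret.isna().sum() <= 1].fillna(0.0)

# Remove five common factors via principal components (a stand-in for the
# factor model of [FFHL19]), then form the idiosyncratic covariance matrix.
X = log_ret.to_numpy()
X = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_idio = X - U[:, :5] @ np.diag(s[:5]) @ Vt[:5]
Sigma_u = np.cov(X_idio, rowvar=False)

# Build the network: connect two stocks when their idiosyncratic correlation
# exceeds a threshold tau (the paper's threshold is not reproduced here).
tau = 0.1
corr = Sigma_u / np.sqrt(np.outer(np.diag(Sigma_u), np.diag(Sigma_u)))
A = (np.abs(corr) > tau).astype(float)
np.fill_diagonal(A, 0.0)
```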
We take and apply the SCORE normalization step to the leading eigenvectors of . On the left side of Figure 2 we display the scatter plot of for , showing the -dimensional simplex structure. As we can see from the figure, it has a clear triangular structure. Looking closely at the nodes, we can find some characteristics of the three corners of this triangle. Many financial companies (PNC, NDAQ, TROW, RE, PSA) are very close to the vertex of the right corner, which can be viewed as the pure node of this community. Many other financial companies (AJG, WRB, AXP, BRO, C) are also closer to this corner than to the other two corners. In the top corner, we can find companies (REGN, EW, HOLX, ILMN) belonging to the healthcare industry. Moreover, some other healthcare companies (PFE, HUM, LH, TMO, ABT) can also be viewed as mixed members that are closer to this community. Similarly, companies related to the energy industry, such as MPC, BA, TDY, EMR, AEP, DOV, XEL, FE, make up a large part of the bottom corner, while other similar companies (ED, LNT) seem to be mixed members close to this corner. To validate these observations, we apply Theorem 7 to conduct the following tests for each , as stated in Example 1.
(32)
At most one of and can be rejected. On the right side of Figure 2, we show the results of the aforementioned tests. We use red/blue/green to indicate that // is rejected, respectively. If none of these three null hypotheses is rejected, we color the point grey. As we can see from Figure 2, for most () of the nodes, one of and is rejected. In other words, although many nodes have mixed membership, we can identify them as being closer to one community. Furthermore, the test results confirm our observations about the three corners mentioned before. It is worth mentioning that the information technology companies, which make up a large proportion of the S&P 500 list, can be found in abundance in every part of the simplex. This also indicates that the prosperous development of the information technology industry is a result of the nurturing of traditional industries.
Figure 2: Left: scatter plot of the SCORE embeddings of the S&P 500 stocks, showing the simplex (triangle) structure. Right: results of the tests (32), where red/blue/green indicates which null hypothesis is rejected and grey indicates no rejection.
Next, we apply our approach in Example 2 to construct rank confidence intervals for the nodes. We summarize our inference results in Table 1. We select stocks as representatives for presentation. First, we include the three estimated vertices. Second, of the four categories (red/blue/green/grey) shown in Figure 2, we randomly select two from each category and label them R/B/G/C (C stands for center). In Table 1 we include the estimated mixed membership vectors for each stock as well as the three rank confidence intervals for and , which we denote by RCI I, RCI II and RCI III. As we can see from Table 1, our approach provides meaningful rank confidence intervals for the stocks and the categories to which they are close.
Symbol | Estimated $\hat\pi_i$ | RCI I | RCI II | RCI III
---|---|---|---|---
AAL () | | | |
LLY () | | | |
MPC () | | | |
META (C) | | | |
EBAY (C) | | | |
DHR (R) | | | |
HOLX (R) | | | |
PNC (B) | | | |
MOS (B) | | | |
AEP (G) | | | |
LYB (G) | | | |
Next, we investigate whether there has been a change in the simplex structure between the period before COVID-19 and the period after the pandemic began. We use stock data from January 1, 2017 to January 1, 2020 as the pre-COVID-19 data, and stock data from May 1, 2020 to May 1, 2023 as the post-COVID-19 data. We follow the same data preprocessing procedure as before, except that the threshold for is replaced by when dealing with the post-COVID-19 data. We also apply the aforementioned tests (32) to these two datasets. In Figure 3 we present the experimental results, with the outcomes of the tests again indicated by colors.
Recall that we identified three categories, with finance, healthcare, and energy as their respective representatives, in Figure 2. From the top two plots of Figure 3 we can see that the 'finance corner' (red) and the 'energy corner' (blue) are consistent with each other, and these structures are similar to what we observed in Figure 2, except for some companies (e.g., AXP, BRO, ED, LNT, WRP). However, a structural change in the remaining corner brings some interesting observations. First, the remaining corner (green) in the top left plot (before COVID-19) of Figure 3 is a mix of companies from many different industries, and no single industry can be considered representative of this corner. Second, the healthcare companies from the 'healthcare corner' in Figure 2 are now in the center cluster of the top left plot. The bottom left plot of Figure 3 is a zoomed view of the center cluster of the top left plot, and all the companies (EW, HUM, HOLX, ABT, REGN, TMO, LH, ILMN, PFE) that were identified as representatives of the 'healthcare corner' in Figure 2 are there. However, when we look at the post-COVID-19 plots (top right and bottom right of Figure 3), we find that these healthcare companies move from the center cluster to the green corner, consequently making the green corner a 'healthcare corner'. The bottom right plot of Figure 3 is a zoomed view of the green corner of the top right plot, and again we can see that all nine aforementioned healthcare companies are there. To sum up, the 'healthcare corner' we observed in Figure 2 did not exist before COVID-19. At that time, the healthcare companies were closer to the center of the simplex, indicating that they behaved like a mix of the different industries. After the pandemic began, the healthcare industry grew dramatically, and it now represents the third corner along with the 'finance corner' and the 'energy corner'.
Figure 3: Simplex structures and test results before COVID-19 (left column) and after the pandemic began (right column), with zoomed views of the center cluster (bottom left) and of the green corner (bottom right).
7 Proof Outline
In this section, we sketch the main steps of the proofs. Details are provided in Section 8. We begin by showing an "eigen-gap" property, which is fundamental to the matrix analysis. To this end, we provide a high-probability bound on the spectral norm of the noise matrix .
Lemma 4.
The following event happens with probability at least :
(33) |
Proof.
We are going to prove our results under this favorable set . We begin by listing the key results leading to Theorem 1 and Theorem 2. A key assumption in all these results will be either (for results regarding ) or (for results regarding ) to guarantee the eigen-gap. From Lemma 1, we already know that these two assumptions are mild.
We now state the results for . The proofs rely on contour integrals, similar in spirit to [FFHL22a].
Theorem 8.
Proof.
See Section 8.1. ∎
Theorem 8 expresses the quantity as the sum of a leading term (the first summand on the RHS of (34)) and an error term , along with its bound. We also need a bound on , which is provided by the following theorem.
Theorem 9.
Proof.
See Section 8.2. ∎
Lemma 5.
Assume that . Then with probability at least , we have
Proof.
See Section 8.3. ∎
The next goal is to analyze . This is a matrix denoising problem with ground truth and noisy matrix , where
Define the noise matrix
(35) |
Unlike , the matrix does not have a closed-form expression. The following expansion of will be useful for our results.
Lemma 6.
Further, with probability at least we have
In order to study the expansion of , defined via (35), it is enough to expand . We state our result here. The proof, which uses contour integrals, is deferred to Section 8.4.
Theorem 10.
Assume that . Then under event defined by (33), we have the following expansion:
(36) |
where is a symmetric matrix and .
Theorem 10 shows the first part of Lemma 6, while the second part, the bound for , will be shown later. From Theorem 10 we can directly deduce the following corollary.
Corollary 2.
Assume that . Then under event , we have
Proof.
See Section 8.5. ∎
Corollary 2, coupled with Lemma 4, shows that under event ,
(37) |
Equipped with Theorem 5, we are ready to prove results regarding matrix denoising. We show the following five results, whose combination proves Theorem 2. Note that the first four results here are similar to [YCF21, Lemmas 1-4], while the fifth result is exactly the second part of Lemma 6. To state these results, we need to introduce some notations first. Define
(38) |
where ’s are defined as (7). Recall the definition of from (10). Here we state the five lemmas whose proofs are deferred.
Lemma 7.
Assume that . Then under event we have
Furthermore, under event we have , , where
(39) |
This immediately implies ,
and the same results for and are also true. In fact, we have
Proof.
See Section 8.6. ∎
Lemma 8.
Assume that and . Then with probability exceeding we have
Proof.
See Section 8.7. ∎
Lemma 9.
Assume that and . Then with probability at least we have
Proof.
See Section 8.8. ∎
Lemma 10.
Assume that and . Then with probability at least we have
Proof.
See Section 8.9. ∎
Lemma 11.
Assume that . Then with probability at least we have
Proof.
See Section 8.10. ∎
Similar to [YCF21], we prove Theorem 2 by combining these five lemmas. The proof of Theorem 2 is included in Section 8.11. Next, we combine Theorem 1 and Theorem 2 to yield Theorem 3. See Section 8.12 for details.
Finally, to obtain the membership reconstruction results in Section 4, we need the following result regarding , which is a direct corollary of Theorem 10.
Corollary 3.
Assume that . Then under event defined by (33), we have the following expansion:
where . In terms of the estimation error, we have .
Proof.
See Section 8.16. ∎
References
- [Abb17] Emmanuel Abbe. Community detection and stochastic block models: recent developments. The Journal of Machine Learning Research, 18(1):6446–6531, 2017.
- [ABEF14] Edoardo M Airoldi, David Blei, Elena A Erosheva, and Stephen E Fienberg. Handbook of mixed membership models and their applications. CRC press, 2014.
- [ABFX08] Edo M Airoldi, David Blei, Stephen Fienberg, and Eric Xing. Mixed membership stochastic blockmodels. Advances in neural information processing systems, 21, 2008.
- [ACV14] Ery Arias-Castro and Nicolas Verzelen. Community detection in dense random networks. The Annals of Statistics, 42(3):940 – 969, 2014.
- [AFW22] Emmanuel Abbe, Jianqing Fan, and Kaizheng Wang. An $\ell_p$ theory of PCA and spectral clustering. The Annals of Statistics, 50(4):2359–2385, 2022.
- [AFWZ20] Emmanuel Abbe, Jianqing Fan, Kaizheng Wang, and Yiqiao Zhong. Entrywise eigenvector analysis of random matrices with low expected rank. Annals of statistics, 48(3):1452, 2020.
- [ASG+01] Mário César Ugulino Araújo, Teresa Cristina Bezerra Saldanha, Roberto Kawakami Harrop Galvao, Takashi Yoneyama, Henrique Caldas Chame, and Valeria Visani. The successive projections algorithm for variable selection in spectroscopic multicomponent analysis. Chemometrics and intelligent laboratory systems, 57(2):65–73, 2001.
- [BS16] Peter J Bickel and Purnamrita Sarkar. Hypothesis testing for automated community detection in networks. Journal of the Royal Statistical Society: Series B: Statistical Methodology, pages 253–273, 2016.
- [CCF+21] Yuxin Chen, Yuejie Chi, Jianqing Fan, Cong Ma, et al. Spectral methods for data science: A statistical perspective. Foundations and Trends® in Machine Learning, 14(5):566–806, 2021.
- [CCK15] Victor Chernozhukov, Denis Chetverikov, and Kengo Kato. Comparison and anti-concentration bounds for maxima of gaussian random vectors. Probability Theory and Related Fields, 162:47–70, 2015.
- [CCKK22] Victor Chernozhuokov, Denis Chetverikov, Kengo Kato, and Yuta Koike. Improved central limit theorem and bootstrap approximations in high dimensions. The Annals of Statistics, 50(5):2562–2586, 2022.
- [CFMW19] Yuxin Chen, Jianqing Fan, Cong Ma, and Kaizheng Wang. Spectral method and regularized mle are both optimal for top-k ranking. Annals of statistics, 47(4):2204, 2019.
- [CFMY19] Yuxin Chen, Jianqing Fan, Cong Ma, and Yuling Yan. Inference and uncertainty quantification for noisy matrix completion. Proceedings of the National Academy of Sciences, 116(46):22931–22937, 2019.
- [CS15] Yuxin Chen and Changho Suh. Spectral mle: Top-k rank aggregation from pairwise comparisons. In International Conference on Machine Learning, pages 371–380. PMLR, 2015.
- [DKNS01] Cynthia Dwork, Ravi Kumar, Moni Naor, and Dandapani Sivakumar. Rank aggregation methods for the web. In Proceedings of the 10th international conference on World Wide Web, pages 613–622, 2001.
- [FFHL19] Jianqing Fan, Yingying Fan, Xiao Han, and Jinchi Lv. Simple: Statistical inference on membership profiles in large networks. arXiv preprint arXiv:1910.01734, 2019.
- [FFHL22a] Jianqing Fan, Yingying Fan, Xiao Han, and Jinchi Lv. Asymptotic theory of eigenvectors for random matrices with diverging spikes. Journal of the American Statistical Association, 117(538):996–1009, 2022.
- [FFHL22b] Jianqing Fan, Yingying Fan, Xiao Han, and Jinchi Lv. Simple: Statistical inference on membership profiles in large networks. Journal of the Royal Statistical Society Series B: Statistical Methodology, 84(2):630–653, 2022.
- [FFLY22] Jianqing Fan, Yingying Fan, Jinchi Lv, and Fan Yang. Simple-rc: Group network inference with non-sharp nulls and weak signals. arXiv preprint arXiv:2211.00128, 2022.
- [FHY22] Jianqing Fan, Jikai Hou, and Mengxin Yu. Uncertainty quantification of mle for entity ranking with covariates. arXiv preprint arXiv:2212.09961, 2022.
- [FLWY22] Jianqing Fan, Zhipeng Lou, Weichen Wang, and Mengxin Yu. Ranking inferences based on the top choice of multiway comparisons. arXiv preprint arXiv:2211.11957, 2022.
- [GSZ23] Chao Gao, Yandi Shen, and Anderson Y Zhang. Uncertainty quantification in the bradley–terry–luce model. Information and Inference: A Journal of the IMA, 12(2):1073–1140, 2023.
- [GV13] Nicolas Gillis and Stephen A Vavasis. Fast and robust recursive algorithms for separable nonnegative matrix factorization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(4):698–714, 2013.
- [GZF+10] Anna Goldenberg, Alice X Zheng, Stephen E Fienberg, Edoardo M Airoldi, et al. A survey of statistical network models. Foundations and Trends® in Machine Learning, 2(2):129–233, 2010.
- [HLL83] Paul W Holland, Kathryn Blackmond Laskey, and Samuel Leinhardt. Stochastic blockmodels: First steps. Social networks, 5(2):109–137, 1983.
- [Hun04] David R Hunter. Mm algorithms for generalized bradley-terry models. The annals of statistics, 32(1):384–406, 2004.
- [Jin15] Jiashun Jin. Fast community detection by SCORE. The Annals of Statistics, 43(1):57 – 89, 2015.
- [JJKL21] P. Ji, J. Jin, Z. T. Ke, and W. Li. Meta-analysis on citations for statisticians. Manuscript, 2021.
- [JKL23] Jiashun Jin, Zheng Tracy Ke, and Shengming Luo. Mixed membership estimation for social networks. Journal of Econometrics, 2023.
- [KN11] Brian Karrer and Mark EJ Newman. Stochastic blockmodels and community structure in networks. Physical review E, 83(1):016107, 2011.
- [Lei16] Jing Lei. A goodness-of-fit test for stochastic block models. The Annals of Statistics, 44(1):401 – 424, 2016.
- [LR15] Jing Lei and Alessandro Rinaldo. Consistency of spectral clustering in stochastic block models. The Annals of Statistics, 43(1):215 – 237, 2015.
- [Luc12] R Duncan Luce. Individual choice behavior: A theoretical analysis. Courier Corporation, 2012.
- [LWLZ23] Tianxi Li, Yun-Jhong Wu, Elizaveta Levina, and Ji Zhu. Link prediction for egocentrically sampled networks. Journal of Computational and Graphical Statistics, pages 1–24, 2023.
- [M+73] Daniel McFadden et al. Conditional logit analysis of qualitative choice behavior. 1973.
- [New13a] Mark EJ Newman. Community detection and graph partitioning. Europhysics Letters, 103(2):28003, 2013.
- [New13b] Mark EJ Newman. Spectral methods for community detection and graph partitioning. Physical Review E, 88(4):042822, 2013.
- [NK07] David Liben-Nowell and Jon Kleinberg. The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology, 58(7):1019–1031, 2007.
- [Rai19] Martin Raič. A multivariate Berry–Esseen theorem with explicit constants. Bernoulli, 25(4A):2824 – 2853, 2019.
- [RCY11] Karl Rohe, Sourav Chatterjee, and Bin Yu. Spectral clustering and the high-dimensional stochastic blockmodel. The Annals of Statistics, 39(4):1878 – 1915, 2011.
- [SC95] PC Sham and D Curtis. An extended transmission/disequilibrium test (tdt) for multi-allele marker loci. Annals of human genetics, 59(3):323–336, 1995.
- [Sti94] Stephen M Stigler. Citation patterns in the journals of statistics and probability. Statistical Science, pages 94–108, 1994.
- [T+15] Joel A Tropp et al. An introduction to matrix concentration inequalities. Foundations and Trends® in Machine Learning, 8(1-2):1–230, 2015.
- [VAC15] Nicolas Verzelen and Ery Arias-Castro. Community detection in sparse random networks. The Annals of Applied Probability, 25(6):3465 – 3510, 2015.
- [VL07] Ulrike Von Luxburg. A tutorial on spectral clustering. Statistics and computing, 17:395–416, 2007.
- [WB17] Y. X. Rachel Wang and Peter J. Bickel. Likelihood-based model selection for stochastic block models. The Annals of Statistics, 45(2):500 – 528, 2017.
- [WW87] Yuchung J Wang and George Y Wong. Stochastic blockmodels for directed graphs. Journal of the American Statistical Association, 82(397):8–19, 1987.
- [YCF21] Yuling Yan, Yuxin Chen, and Jianqing Fan. Inference for heteroskedastic pca with missing data. arXiv preprint arXiv:2107.12365, 2021.
- [ZM14] Pan Zhang and Cristopher Moore. Scalable detection of statistically significant communities and hierarchies, using message passing for modularity. Proceedings of the National Academy of Sciences, 111(51):18144–18149, 2014.
8 Proofs
8.1 Proof of Theorem 8
Proof.
Define and let be the circular contour around with radius . Then is the only eigenvalue of that is inside . Under event defined by (33), by Weyl’s theorem, we know that
As a result, is the only eigenvalue of that is inside . For , we have
(40)
(41)
As a result, we know that
(42) |
under event . Using (40) and (41) we know that
(43) |
We denote by
Then we know that
The integrand can be reformulated as,
As a result, we have, under event ,
where the first inequality uses (8.1). Hence, under event ,
since . This immediately yields
(44) |
By (43), we know that , . Therefore,
As a result, we know that
Therefore, we obtain,
(45) |
Now, by (44). To bound , it is enough to bound . To this end, note that,
(46) |
Also, by Wedin’s sin Theorem [CCF+21, Theorem 2.9],
(47) |
under . Combining (46) and (47), we obtain
Therefore, we get, under the event ,
∎
8.2 Proof of Theorem 9
Proof.
Recall from (45) that
(48) |
We begin with bounding . Recall that,
We split the integrand as a sum of the following two quantities,
so that . Under the event ,
(49) |
where the fourth inequality uses (8.1). It remains to bound . Since can be expanded as
we have
Since , we have
Define the matrices
(50) |
Then we can write
(51) |
We now analyze the three terms separately.
-
(i)
Control : We introduce leave-one-out matrix by replacing all the elements in -th row and -th column of original with for . Then, for any , is independent of . We write
(52) For a fixed , by Lemma 13 and Lemma 12, we obtain
(53) with probability at least under . By Corollary 4 and Lemma 15, we have
(54) with probability at least under event . Plugging (54) in (53) tells us
(55) with probability at least under event .
- (ii)
- (iii)
Combine these three parts with (51), we get
(58) |
with probability at least . Combining (58) and (49), we get
(59) |
8.3 Proof of Lemma 5
8.4 Proof of Theorem 10
Proof.
We use the same and as in the proof of Theorem 8. Since
for the same reason as the proof of Theorem 8, under event , we have
We expand the integrand as a sum of two quantities in the following way:
(61) |
For , the contour integral can be calculated as
Next, we bound the spectral norm of under the event , as:
where the third inequality uses (8.1). As a result, the contour integral of , can be bounded as
Noting that the contour integral of is exactly
gives us the desired result. ∎
8.5 Proof of Corollary 2
8.6 Proof of Lemma 7
Proof.
Recall the definitions of and from (38) and (10) respectively. We write the SVD of as , where . Then we have . Therefore, we have
By Wedin’s sin Theorem [CCF+21, Theorem 2.9], we have
Recall that, under event , by (37). On the other hand, we have
Therefore, we have
(63) |
Again, by (63),
Since we have assumed , we have
This proves the first part of this lemma.
Moving on to the proof of the second part, by (7), we have , yielding . It remains to show that under event . To this end, set . By Lemma 1 we know that
Recall that are the largest eigenvalues in magnitude among all eigenvalues of , and are sorted descendingly. By Weyl’s theorem, under event ,
As a result, we get . This leads to, using (7),
implying . This completes the proof of the Lemma. ∎
8.7 Proof of Lemma 8
Proof.
Recall the definitions of and from (10) and (38) respectively. By triangle inequality,
We bound the three quantities separately. For the first term , we have, on the event , using Lemma 7
where the third inequality uses the fact that , are orthogonal matrices and the last inequality uses our assumption that .
For the second quantity , one can see that
(64) |
where and come from the SVD of
By Weyl’s theorem we have
(65) |
under event . On the other hand, by [CCF+21, Lemma 2.5] and Lemma 7 we have
(66)
(67)
Combining (64), (65), (66) and (67), we get
Next, we analyze . By definition we have
Using the notation of Theorem 10, we can write
(68) |
Since , we know that
Now we bound the two terms separately. The second quantity is immediately bounded by Theorem 10:
On the other hand, similar to [YCF21, Lemma 2], we use matrix Bernstein inequality [T+15, Theorem 6.1.1] to control . Define , and , . Then
By (12),
Second, since is symmetric, we know that
Therefore, we only have to bound . Since , we have
For any , we have
And, for any , we have
(69) |
Define by setting . Then, by (8.7),
The second summand of the above display can be bounded as:
In conclusion, we get
Then by matrix Bernstein inequality [T+15, Theorem 6.1.1], with probability at least ,
This implies since we assumed . Finally, combining the bounds for and , we get
since . Also,
completing the proof of the Lemma. ∎
8.8 Proof of Lemma 9
Proof.
Since , by triangle inequality we have
(70) |
We first control the first term on the RHS. We consider the case with self-loops and the case without self-loops separately.
With self-loop: The first term can be bounded as
Let be the SVD of , then
By (63), we have . Therefore,
(71) |
Without self-loop: The first term can be bounded as
The first summand is bounded as in the previous case. For the second term,
one can see that
(72) |
Since , combining (71) and (72) we get
(73) |
It remains to bound the second term . Recall from (8.7),
(74) |
We control the five summands of RHS separately.
-
(a)
Control : We use the leave-one-out matrix defined in the proof of Theorem 9. Define and let be the largest eigenvalues of in magnitude, sorted in decreasing order. Let be the corresponding eigenvectors. Let , and . Then and are independent with . One can easily see that the results in Lemma 4, Theorem 10 (we also define ), Corollary 2 and Lemma 7 also apply to the leave-one-out matrices. As a result, we have
By Lemma 14, we have
with probability at least . By [CCF+21, Lemma 2.5], since , we have
(75) where we used (63) for the last equality. As a result, we have
(76) On the other hand, we have
(77) As a result, it remains to bound . One can see that
By Davis-Kahan’s sin theorem [CCF+21, Theorem 2.7], we have
(78) since . By Theorem 10, can be decomposed as
We bound the numerator of (78) by controlling the five summands on the RHS of the above display separately.
-
(a)
Control : By triangle inequality we have
(79) On one hand, by Lemma 16 and (12) we have
(80) with probability at least . On the other hand, we have
Since , we get
where is defined by (76). As a result, we have
where is defined by (78). Combine this with (79) and (80) we have
(81) with probability at least .
- (b)
- (c)
- (d)
-
(e)
Control : By Theorem 10, the ranks of and are at most . Hence,
(85)
Recall the definitions of from (76) and (78) respectively. Combining (81)-(85), we get
with probability at least . This, along with (78) yields
with probability at least . Since ,
with probability at least . Recall that is defined as
in (76), we have
with probability at least . By our assumption , yielding
(86) with probability at least . This also implies
(87) with probability at least . Plugging the quantities in (76) we get
with probability at least . Taking a union bound for and noting that , we have with probability at least ,
(88) -
(a)
- (b)
- (c)
- (d)
- (e)
Finally, we can sum up all five terms bounded above. Specifically, a combination of (88), (90), (91), (92), (93) and (74) yields
with probability at least . Plugging in this bound, along with the bound (71) and (73) in (70) provides the desired conclusion.
∎
8.9 Proof of Lemma 10
Proof.
Since , we consider the following decomposition
(94) |
Since the upper bound of follows from Lemma 9, we only have to deal with the second and third quantities. We begin with the second. By Lemma 7, since , we have
(95) |
where the last inequality follows from (12). In addition,
(96) |
Since , by Lemma 7 we have
(97) |
Combining (96) and (97), we get
(98) |
since . Therefore, by (95) we have
(99) |
We now turn to bound the third summand of (94), namely, . Recall the decomposition of from (8.7). By Lemma 15 we have
(100) |
with probability at least . And, since is a rank- matrix, we have
(101) |
And, since , we have via (12),
(102) |
Finally, since , we know that . This, along with (100), (101) and (102) yields
(103) |
since . When we further have , the first summand is negligible and we obtain the desired conclusion. ∎
8.10 Proof of Lemma 11
Proof.
From (61) we know that
We define the following four matrices
Control : For , one can show that
As a result, we have
By Lemma 15, we know that with probability at least . For each , by Lemma 18 we know that with probability at least . As a result, we know that
(104) |
with probability at least .
Control : By definition we have
(105) |
Control : For , one can show that
Let
Then we have
(106) |
Note that for defined in (50). In the proof of Lemma 12 we actually show that , which implies
(107) |
For and , since both of them are rank- matrices, we have, using from Lemma 12,
(108) |
and
(109) |
Now, combining (106), (107), (108) and (109) yields
(110) |
8.11 Proof of Theorem 2
Proof.
As for the first step, we have
and
As a result, we know that
This implies that
We now bound the numerator. Note that the last term is already bounded by Lemma 11.
Control : Combine (94) and (99) we know that
where is defined by (8.9). Since we assumed , we further have
Plugging this in Lemma 9 we get
Since , and by (103), we know that
since we assumed . According to Lemma 1, since and , we have, using ,
(112) |
8.12 Proof of Theorem 3
8.13 Proof of Theorem 4
Proof.
By Theorem 3, with probability at least , for all we have
(117) |
On one hand, we know that
(118) |
By Lemma 14 we know that
(119) |
with probability at least . And, by Lemma 18 and Lemma 12 we have
(120) |
with probability at least . Plugging (119) and (120) in (118) we get
with probability at least . Combine this with Theorem 2, we get
(121) |
for all with probability at least . On the other hand, by Corollary 4 and Lemma 15 we know that
for all with probability at least . Combine this with Theorem 1, we get
(122) |
for all with probability at least .
8.14 Proof of Theorem 5
Proof.
Let . For the same reason as in the proof of [JKL23, Lemma E.1], we know that
with probability at least . And we denote by this event. We let for . By triangle inequality, under event we know that
(123) |
If , we know that . In this case, (123) cannot hold for appropriately chosen . Therefore, for appropriately chosen , we must have . This also implies that is a permutation of , since the cardinality of is exactly .
For any and , if , by triangle inequality we have
under event . As a result, for for appropriately chosen , it holds that . In other words, we must have . On the other hand, if , again by triangle inequality we have
As a result, can be lower bounded as
as long as satisfies
under event , we have . This implies . To sum up, we have for all under event . ∎
8.15 Proof of Corollary 1
8.16 Proof of Corollary 3
8.17 Proof of Lemma 2
Proof.
By Corollary 1 we know that
(124) |
On one hand, by Lemma 8 we have
(125) |
with probability exceeding . On the other hand, by triangle inequality we know that
(126) |
Plugging (125) and (126) in (124) we get
with probability at least . Moreover, the left-hand side is a small-order term compared to , which controls . As a result, the estimation error can be controlled by . ∎
8.18 Proof of Lemma 3
Proof.
We denote by
By the definition of , we know that . We also denote by
Let be the permutation from Theorem 5, then we have
Therefore, we can write
(127) |
Denote by . By Corollary 1 we know that
(128) |
and
Next, by Theorem 3,
(129) |
where . Plugging (128) and (129) in (127), we get
(130) |
Since form a simplex and all are inside it, we know that the entries of are within . As a result, we know that . That is to say, we have
(131) |
On the other hand, we have
(132) |
According to [JKL23, (C.26)], we know that . As a result, plugging (131) and (132) in (130) we get
(133) |
And, since
(134) |
and
(135) |
Combining (134) and (135) with (133), we get
Since , we know that , completing the proof. ∎
8.19 Proof of Theorem 6
Proof.
First we focus on . Define and . Set , . By Taylor expansion, we know that there exists some between and , such that
Combining Corollary 3 and Lemma 2, we have
Note that . Since , we have . Hence,
(136) |
Again by Taylor expansion, we know that there exists some between and , such that
Since , we have . So,
(137) |
Second, for the permutation from Theorem 5 and any , we have
Combining Lemma 3 and (136), we have the following expansion
where
Here holds because according to [JKL23, C.22] and according to Lemma 1. In terms of the estimation error, by Theorem 3 and (137) we have
(138) |
Now we are ready to derive the expansion of . For any and , we have
where
Since for some appropriate , we have
As a result, we have
Combining this with (138), for we have
Since , we have
∎
8.20 Proof of Theorem 7
Proof.
Define
By Berry-Esseen theorem [Rai19], for any convex set , we have
(139) |
Since is the covariance of , we know that
(140) |
Combining (139) and (140), we know that
(141) |
It remains to control . For any convex set and point , we define
With this definition, we have
Taking , by Theorem 6 we know that with probability at least . As a result, we have
(142) |
On the other hand, one can see that
(143) |
By [Rai19, Theorem 1.2] we know that
Plugging this and (141) in (143), we have
Combining this with (142), and by the arbitrariness of , we get
Similarly, it can also be shown that
As a result, we know that
(144) |
Combining (141) and (144), we get
∎
8.21 Auxiliary Lemmas
Lemma 12.
For and defined in (50), we have
Proof.
We prove the desired results in the following two settings: with self-loop and without self-loop. When self-loops are allowed, the rank of is exactly . When there is no self-loop, is approximately rank .
With self-loop: The spectral norm bounds follow directly from (50)
By definition, for we have
On one hand, by (12) we have
On the other hand, defining
by (12) we have,
As a result, we get . Similarly, for we have
Again, we have . Defining
we have
Combining these together, we get the desired conclusion .
Without self-loop: The spectral norm bounds follow as in the previous case. For the rest, by the definition of ,
The bounds on the first two summands are the same as before. Hence it remains to control . One can see that
As a result, we get
since . Similarly, for we have
We bound the first two summands as before. For the third term,
Combining them, we get
since . This completes the proof. ∎
From Lemma 12, we immediately have the following corollary.
Corollary 4.
For and defined in (50), we have for all
Lemma 13.
For any and a fixed vector , we have
with probability at least . Here the constant hidden in is free of , and .
Proof.
Since , by Bernstein's inequality, with probability at least ,
∎
Lemma 14.
For any and a fixed matrix , we have with probability at least ,
Proof.
Taking in [YCF21, Lemma 5] gives us the desired statement. ∎
Lemma 15.
For any fixed matrix , we have with probability at least ,
Proof.
Recall that . Applying Lemma 14 for , we get the desired conclusion. ∎
Lemma 16.
For any and a fixed matrix , we have with probability at least ,
Proof.
Lemma 17.
For any fixed , we have with probability at least ,
In particular, for any , we have with probability at least ,
Proof.
Since , by Hoeffding’s inequality, we have
As a result, with probability at least , we have .
∎
Lemma 18.
For any with , with probability at least ,
Proof.
We write
Since , we know, by (12),
As a result, by Bernstein inequality we have, with probability at least ,
∎
8.22 The Incoherence Parameter
In this subsection, we aim to show that the incoherence condition (12) holds with . As we pointed out in Remark 1, [JKL23, Lemma C.3] and Assumption 2 guarantee . Therefore it remains to show
We use the notation for the case where self-loops exist, and the notation for the case where self-loops are not allowed.
8.23 Proof of (31)
We formally state the theoretical guarantee of the Gaussian multiplier bootstrap method described in Example 2 as below.
Theorem 11.
Proof.
We define
Since , we know that [CCKK22, Condition E, M] holds with and . As a result, by [CCKK22, Theorem 2.2] we have
(148) |
Therefore, to prove (31), it is enough to show
(149) |
One can see that
(150) |
We take . Then by Theorem 6 we know that
(151) |
Next we show that . Let be a centered Gaussian random vector with the same covariance structure (thus we know that ) as
Then by [CCKK22, Theorem 2.1] we know that
As a result, we have
(152) |
On the other hand, by [CCK15, Theorem 3] we know that
(153) |
Combining (152) with (153), we know that
(154) |
Plugging (151) and (154) into (150), we prove (149). This, along with (148), proves that
concluding our proof.
∎