Inferences on mixing probabilities and ranking in mixed-membership models

The research is supported in part by ONR grant N00014-22-1-2340, NSF grants DMS-2052926, DMS-2053832 and DMS-2210833.
Abstract
Network data is prevalent in numerous big data applications, including economics and health networks, where it is of prime importance to understand the latent structure of the network. In this paper, we model the network using the Degree-Corrected Mixed Membership (DCMM) model. In the DCMM model, for each node $i$ there exists a membership vector $\pi_i$, whose $k$-th entry $\pi_i(k)$ denotes the weight that node $i$ puts in community $k$. We derive novel finite-sample expansions for the $\pi_i$'s, which allow us to obtain asymptotic distributions and confidence intervals for the membership mixing probabilities and other related population quantities. This fills an important gap in uncertainty quantification of membership profiles. We further develop a ranking scheme for the vertices based on the membership mixing probabilities in certain communities and perform the relevant statistical inferences. A multiplier bootstrap method is proposed for ranking inference of an individual member's profile with respect to a given community. The validity of our theoretical results is further demonstrated via numerical experiments on both real and synthetic data examples.
1 Introduction
In various fields of study, such as citation networks, protein interactions, health, finance, trade, and social networks, we often come across large amounts of data that describe the relationships between objects. There are numerous approaches to understanding and analyzing such network data. Algorithmic methods are commonly used to optimize specific criteria, as shown in [New13a, New13b] and [ZM14]. Alternatively, model-based methods rely on probabilistic models with specific structures, which are reviewed by [GZF+10]. One of the earliest models in which the nodes (or vertices) of the network belong to some latent community is the Stochastic Block Model (SBM) [HLL83, WW87, Abb17]. Several improvements have been proposed to overcome its limitations, two of which are relevant to our paper. First, [KN11] introduced the degree-corrected SBM, where a degree parameter is used for each vertex to make the expected degrees match the observed ones. Second, [ABFX08, ABEF14] study the mixed membership model, where each individual can belong to several communities with a mixing probability profile. In this paper, we study membership profiles in the Degree-Corrected Mixed Membership (DCMM) model, which combines both of the above benefits. In the DCMM model, every node $i$ is assumed to have a community membership probability vector $\pi_i$, where $K$ is the number of communities and the $k$-th entry of $\pi_i$ specifies the mixture proportion of node $i$ in community $k$ (see [FFHL22a, JKL23]). For example, a newspaper can be partly conservative and partly liberal. In addition, each node is allowed to have its own degree.
Given such a network, estimation and inference of membership profiles have drawn considerable attention recently. For example, [JKL23] provides an algorithm to estimate the $\pi_i$'s, and [FFHL22b, FFLY22] consider the hypothesis testing problem of whether two nodes have the same membership profile. However, these works do not address uncertainty quantification of the $\pi_i$'s. In addition, to the best of our knowledge, none of the prior works concerns the problem of ranking nodes within a particular profile. As an example, one might ask: Is newspaper A more liberal than newspaper B? Or, how many newspapers should I pick to ensure that the top-K conservative newspapers are selected? Such ranking questions have applications in finance, where one might be interested in knowing whether a particular stock is among the top technology stocks before investing in it. In our work, we devise a framework to perform ranking inference based on the $\pi_i$'s.
Our work lies at the intersection of three research directions, which we delineate here.
-
1.
Community detection: Our estimation and inference procedures crucially rely on spectral clustering, which is one of the oldest methods for community detection (cf. [VL07] for a tutorial). In the last decade, [RCY11, Jin15, LR15] have developed both the theory and methods of spectral clustering. Another line of research related to community detection involves showing the optimality of detection boundaries [Abb17] or the link-prediction problem [NK07, LWLZ23]. Recently, [JKL23] developed an algorithm to estimate the $\pi_i$'s with error guarantees in norm; however, it lacks inferential guarantees and asymptotic distributions. Furthermore, there has been a significant number of works on hypothesis testing in network data. [ACV14, VAC15] formulated community detection as a hypothesis testing problem. [FFHL22b, FFLY22] study the problem of testing whether two vertices have the same membership profiles in the DCMM model. Under stochastic block models, detection of the number of blocks has been studied by [BS16, Lei16, WB17], among others.
-
2.
Ranking inference: Most of the literature on ranking problems deals with pairwise comparisons or multiple partial ranking models, such as the Bradley-Terry-Luce model and other assortative network models. Prominent examples of ranking involve individual choices in economics [Luc12, M+73], websites [DKNS01], ranking of journals [JJKL21, Sti94], and alleles in genetics [SC95]. Hence, the ranking problem has been extensively studied in statistics, machine learning, and operations research; see, for example, [Hun04, CS15, CFMW19, CCF+21, GSZ23, FLWY22] for more details. However, none of the above works is concerned with ranking in the DCMM model, and hence our work is significantly different from the aforementioned papers.
-
3.
Entrywise perturbation theory: Often, for uncertainty quantification of unknown parameters, it is not enough to obtain an error bound on the estimators; rather, one needs a leave-one-out style analysis to get more refined entrywise error bounds. See [CCF+21, Chapter 4] for an introduction. Such analysis has been used in matrix completion [CFMY19], principal component analysis [YCF21], and ranking [FLWY22, FHY22, GSZ23]. We develop novel subspace perturbation theory to obtain finite-sample expansions of the individual $\pi_i$'s and use them to derive asymptotic distributions.
To perform inference on ranks and conduct hypothesis testing, we employ an inference framework that ranks items through the maximum pairwise difference statistic, whose distribution is approximated by a newly proposed multiplier bootstrap method. A similar framework was recently introduced by [FLWY22] in the context of the Bradley-Terry-Luce model.
The rest of the paper is structured as follows. Section 2 formulates the problem and describes the estimation procedure. Sections 3 and 4 delineate the vertex hunting and membership reconstruction steps of our estimation procedure, respectively. Using the results we establish, we develop distribution theory and answer inference questions in Section 5. We complement our theoretical findings with numerical experiments in Section 6, where we perform simulations on both synthetic and real datasets. Section 7 provides a brief outline of the major proofs of the paper. Finally, Section 8 contains all the proofs.
2 Problem Formulation
2.1 Model setting
Consider an undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ on $n$ nodes (or vertices), where $\mathcal{V}$ is the set of nodes and $\mathcal{E}$ denotes the set of edges or links between the nodes. Given such a graph $\mathcal{G}$, consider its symmetric adjacency matrix $A$, which captures the connectivity structure of $\mathcal{G}$: $A_{ij} = 1$ if there exists a link or edge between nodes $i$ and $j$, i.e., $(i, j) \in \mathcal{E}$, and $A_{ij} = 0$ otherwise. We allow the presence of potential self-loops in the graph $\mathcal{G}$. If there are no self-loops, we set $A_{ii} = 0$ for $i = 1, \dots, n$. Under a probabilistic model, we assume that each upper-triangular entry of the random matrix $A$ is an independent realization of a Bernoulli random variable.
Given a network on $n$ nodes, we assume that there is an underlying latent community structure consisting of $K$ communities. Each node $i$ is associated with a community membership probability vector $\pi_i \in [0, 1]^K$ such that $\sum_{k=1}^{K} \pi_i(k) = 1$.
A node $i$ is called a pure node if there exists $k \in \{1, \dots, K\}$ such that $\pi_i(k) = 1$. The degree-corrected mixed membership (DCMM) model assumes (cf. [JKL23]) that the probability that an edge exists between node $i$ and node $j$ is given by
$$\mathbb{P}(A_{ij} = 1) = \theta_i \theta_j\, \pi_i^\top P\, \pi_j. \qquad (1)$$
Here, $\theta_i$ captures the degree heterogeneity for node $i$, and $P(k, l)$ can be viewed as the probability that a typical member of community $k$ connects with a typical member of community $l$. The mixing probabilities can be written in matrix form as follows. Let $\Theta = \mathrm{diag}(\theta_1, \dots, \theta_n)$ be a diagonal matrix that captures the degree heterogeneity, $\Pi = (\pi_1, \dots, \pi_n)^\top$ be the matrix of community membership probability vectors, and $P \in \mathbb{R}^{K \times K}$ be a nonsingular matrix. Then the mixing probability matrix $H$, with entries $H_{ij} = \theta_i \theta_j \pi_i^\top P \pi_j$, can be expressed as
$$H = \Theta \Pi P \Pi^\top \Theta. \qquad (2)$$
Let $A$ be the symmetric adjacency matrix of these $n$ nodes, i.e., $A_{ij} = 1$ if there is a link connecting nodes $i$ and $j$, and $A_{ij} = 0$ otherwise. Note that $A_{ij} = A_{ji}$ for $i \neq j$. We set $A_{ii} = 0$ in the case without self-loops. We also allow the case with self-loops; in that case, we assume that each $A_{ii}$ is an independent Bernoulli random variable with mean $H_{ii}$. In both cases, we write
$$A = H + W, \qquad (3)$$
where $W$ is a symmetric random matrix whose entries above the diagonal (and on the diagonal, in the case with self-loops) are independent with mean zero. In the following, since our theory applies to both cases, we will not distinguish between them, and the notation (3) is used throughout the article.
Our goal is to study the community membership probability matrix $\Pi$ and conduct inference on its entries. Based on these uncertainty quantifications, we provide a framework for ranking inference and carry out hypothesis tests on the rank of each node's mixing probability within a community. To this end, we first need to estimate $\Pi$ using the Mixed-SCORE algorithm [JKL23]. We invoke a slight modification of the algorithm, along with our perturbation theory, to obtain the desired entrywise expansions.
We impose the following identifiability condition for the DCMM model (1).
Assumption 1.
Each community $k = 1, \dots, K$ has at least one pure node; namely, there exists a vertex $i$ such that $\pi_i(k) = 1$.
2.2 Estimation procedure
We describe the version of the Mixed-SCORE algorithm (cf. [JKL23]) that will be used to estimate $\Pi$. The algorithm consists of three key steps. First, we map each node to a $(K-1)$-dimensional space using the observed adjacency matrix $A$. Ideally, in the absence of noise, i.e., when $A = H$, these points in the $(K-1)$-dimensional space would form a simplex with $K$ vertices, with the pure nodes defined in Assumption 1 located at the vertices. The presence of such a simplex structure is discussed in Lemma 2.1 of [JKL23].
In the presence of noise, as long as the noise level is mild, we can still expect these points to form an approximate simplex. We therefore apply a vertex hunting algorithm to estimate the vertices, and then estimate the membership vector of each node based on the estimated vertices. Below, we describe the procedure mathematically; a schematic code sketch follows the steps.
-
•
SCORE step: Let $\hat\lambda_1, \dots, \hat\lambda_K$ be the $K$ largest eigenvalues (in magnitude) of $A$, sorted in descending order, and let $\hat u_1, \dots, \hat u_K$ be the corresponding eigenvectors. Calculate the following vectors
$$\hat r_i = \left(\frac{\hat u_2(i)}{\hat u_1(i)}, \dots, \frac{\hat u_K(i)}{\hat u_1(i)}\right)^\top, \qquad i = 1, \dots, n. \qquad (4)$$
-
•
Vertex Hunting step: Apply a vertex hunting algorithm (see Section 3 for details) to $\hat r_1, \dots, \hat r_n$ and obtain estimated vertices $\hat v_1, \dots, \hat v_K$.
-
•
Membership Reconstruction step: For each , let be the unique solution of the linear equations:
(5) Define a vector with , where
(6) Estimate as .
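For concreteness, the following Python sketch mirrors the three steps above. It is a schematic implementation under assumptions of our own: the function names, the sign convention for the leading eigenvector, and the degree-correction weights `b1` (borrowed from the form commonly used in the Mixed-SCORE literature) are not taken from the paper, and the exact formulas in (5)-(6) may differ.

```python
import numpy as np

def score_embedding(A, K):
    """SCORE step: compute the (K-1)-dimensional ratio vectors r_i from the
    top-K eigenpairs (by magnitude) of the adjacency matrix A."""
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(-np.abs(vals))[:K]      # K largest eigenvalues in magnitude
    idx = idx[np.argsort(-vals[idx])]        # sorted in descending order
    lam, U = vals[idx], vecs[:, idx]
    if U[:, 0].sum() < 0:                    # fix the sign of the leading eigenvector
        U[:, 0] = -U[:, 0]
    R = U[:, 1:] / U[:, :1]                  # ratio vectors of (4), shape (n, K-1)
    return R, lam

def reconstruct_membership(R, V, lam):
    """Membership reconstruction: solve the barycentric system for each node,
    reweight, and renormalize.  V holds the K estimated vertices as rows.
    The reweighting b1 below is an assumption borrowed from the Mixed-SCORE
    literature; the paper's exact formula (6) may differ."""
    n, K = R.shape[0], V.shape[0]
    M = np.vstack([V.T, np.ones((1, K))])    # linear system (5) with sum-to-one constraint
    B = np.vstack([R.T, np.ones((1, n))])
    W = np.clip(np.linalg.solve(M, B).T, 0.0, None)   # barycentric weights, shape (n, K)
    b1 = 1.0 / np.sqrt(np.maximum(lam[0] + (V ** 2) @ lam[1:], 1e-12))
    Pi = W * b1                              # undo the degree-induced distortion
    return Pi / np.maximum(Pi.sum(axis=1, keepdims=True), 1e-12)
```

The vertex hunting routine that produces the matrix `V` of estimated vertices is sketched in Section 3.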
To analyze the algorithm, we begin with the SCORE step, i.e., we investigate the $\hat r_i$'s.
We introduce some notations first. Since and are symmetric, we write their eigen-decomposition as
The eigenvalues are sorted in the following way. First, are the largest eigenvalues of in magnitude, and are the largest eigenvalues of in magnitude. Second, , are sorted in descending order, while the other eigenvalues can be sorted in any order. By Lemma C.3 of [JKL23], we can choose the sign of to make sure , . The directions of are chosen to make sure .
(7) |
We further define,
(8) |
We also denote by
(9) |
Note from the expression of and , in order to analyze , we need to study the difference between and , and the difference between and . To this end, we need the following four assumptions for the theoretical analysis, introduced in Section 3 of [JKL23], which are necessary for the Mixed-SCORE algorithm to work.
Denote by and . Note that the eigenvalues of are real, since is positive definite.
Assumption 2.
There exist constants such that
Assumption 3.
There exists a constant such that
Assumption 4.
There exists such that and , where is a sequence of positive real numbers such that , and is the -th largest right eigenvalue of .
Assumption 5.
There exists a constant such that and
Here is the right eigenvector corresponding to .
Here, Assumption 2 and Assumption 3 ensure that the underlying model is not spiky; in other words, the signal spreads across all nodes. Assumption 4 is the eigen-gap assumption, and as we will see next, it ensures that there is a sufficient gap between the first eigenvalue and the remaining eigenvalues of , and that the remaining non-zero eigenvalues of are of the same order. Assumption 5 seems less straightforward, but it is actually satisfied by a very wide range of models (see [JKL23] for examples). We will impose Assumptions 2-5 for the remainder of the paper.
With these assumptions in hand, the following lemma lists some important properties of the eigenvalues of and .
Lemma 1.
Combining this lemma with the fact that (as will be shown later in Lemma 4) with high probability, we know that the eigenvalues of and can be divided into three groups as long as .
This also motivates us to derive two results separately. First, we obtain the expansion of . Second, we analyze the difference between and by writing the expansion of , where
(10) |
is an orthogonal matrix which best “matches” and . Here stands for the set of all the orthogonal matrices.
The analysis of is similar to a matrix denoising problem, so we define some quantities which are commonly used in the matrix denoising literature [YCF21]. We define
(11) |
Our results regarding uncertainty quantification contribute to the literature on estimation of the subspace spanned by partial eigenvectors, cf. [AFWZ20, AFW22].
To state our results, we introduce an incoherence condition on the matrix , which is a natural modification of the standard incoherence assumption [YCF21, Assumption 2], adapted to our setting by separating the first eigenspace from the second to -th eigenspaces.
Definition 1.
(Incoherence) The symmetric matrix is said to be -incoherent if
(12) |
Note that, unlike [YCF21], we do not need to assume an incoherence condition in this article. This is because a combination of Assumption 2 and Assumption 3 shows that is incoherent with ; see Remark 1 and Section 8.22 for more details. Also, Lemma 1 implies . However, the quantities and are two key parameters in the matrix denoising literature, and we keep them in our results.
Remark 1.
Now, we are ready to state our results. We first state the following bound on whose proof is deferred.
Theorem 1.
If , with probability at least ,
where is given by Definition 1. Moreover, we have the following expansion
where, with probability at least , we have and
Furthermore, if , we obtain
Theorem 1 provides an error bound for through the quantity . Both and bounds on are provided. The following result states the expansion for . Define
(13) |
Define .
Theorem 2.
Proof.
See Section 8.11. ∎
We are now in a position to state our main result about the $\hat r_i$'s defined by (4). This is our final result regarding the finite-sample analysis of the SCORE step. The proof involves both Theorem 1 and Theorem 2, and is deferred.
Theorem 3.
Assume the conditions in Theorem 2 hold. In addition, we assume
Recall the orthogonal matrix defined by (10) and
where the matrix was defined by (14). Then we have the following decomposition
such that with probability at least , for all we have
where and are controlled by Theorem 1 and Theorem 2. Specifically, if , for all we have
Proof.
See Section 8.12. ∎
Remark 2.
We finish the section with our result regarding the estimation error bound on the $\hat r_i$'s. To this end, we need to define the following quantities:
(15) |
where is defined by (2.2).
Despite the complicated expressions, these two quantities are easily interpretable. controls the estimation error according to Theorem 4 below, while controls the expansion error according to Theorem 3. If we assume , for all , then one has
That is, the expansion error decays faster than the estimation error by up to logarithmic factors. This validates our theoretical results. As we have mentioned before, the following theorem shows that controls the estimation error.
Theorem 4.
Assume the conditions in Theorem 3 hold. Assume
(16) |
and
(17) |
Then with probability at least we have
(18) |
Proof.
See Section 8.13. ∎
The above result gives a finite-sample error bound for the $\hat r_i$'s defined by (4). By Theorem 3, we know that
(19) |
with probability at least , where we define
(20) |
Remark 3.
We remark here that (16) and (17) are two sufficient conditions to ensure that (18) holds, but they are not necessary. In fact, these two conditions ensure that the upper bound of the first-order term dominates the upper bound of the expansion error, which is given by Theorem 3, in order to simplify the upper bound of the estimation error . In other words, the results in the rest of the paper hold without (16) and (17), but a more complicated expression of is then required.
3 Vertex Hunting
In this section, we describe how to estimate the underlying vertices of the simplex based on the dataset. To this end, define disjoint subsets such that is the collection of all the pure nodes of the -th community, and let be the vertex of the corresponding community, i.e.,
(21) |
The following quantity
(22) |
measures the gap between any vertex and the other points. We will use the following successive projection algorithm given by Algorithm 1 for the vertex hunting step.
The successive projection algorithm (SPA) is a forward variable selection method introduced by [ASG+01]. Our version of SPA borrows from [GV13], who derived its theoretical guarantee. One might consider alternative vertex hunting algorithms similar in spirit to [JKL23]; we leave this for future research. A schematic implementation of SPA is sketched below.
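The following is a minimal Python sketch of SPA, under the assumption that the input rows are the SCORE embeddings $\hat r_i$; it can be plugged into the reconstruction sketch of Section 2.2 and is not a verbatim transcription of Algorithm 1.

```python
import numpy as np

def successive_projection(R, K):
    """SPA sketch in the spirit of [GV13]: repeatedly pick the point with the
    largest norm and project all points onto the orthogonal complement of the
    chosen one.  The K selected rows of R estimate the simplex vertices."""
    X = np.column_stack([R, np.ones(R.shape[0])])   # affine lift so vertices are extreme points
    selected = []
    for _ in range(K):
        j = int(np.argmax(np.linalg.norm(X, axis=1)))
        selected.append(j)
        u = X[j] / np.linalg.norm(X[j])
        X = X - np.outer(X @ u, u)                  # remove the direction of the chosen vertex
    return R[selected]

# Hypothetical usage together with the Section 2.2 sketch:
#   R, lam = score_embedding(A, K)
#   V_hat = successive_projection(R, K)
#   Pi_hat = reconstruct_membership(R, V_hat, lam)
```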
Our goal for the remainder of this section is to understand how effective SPA is at selecting the underlying vertices. If , defined by (22), is not too small, we expect the vertex hunting algorithm to retrieve all the vertices of the simplex, namely, selection consistency. This is the main result of this section. The proof follows by combining Theorem 4 with Theorem 3 of [GV13].
Theorem 5.
Proof.
See Section 8.14. ∎
Corollary 1.
Proof.
See Section 8.15. ∎
The leading term of the RHS of (23) will be denoted by
(24) |
4 Membership Reconstruction
In this section we characterize the behavior of . To this end, we will require the expansion of the following three terms
The expansion of can be obtained directly from Theorem 10, and we defer the precise statement to Section 7. We now turn to the expansion of . We will use the following notation: given any vector and permutation of , set
And, given a matrix with columns, define
Now, we are in a position to state our result regarding .
Lemma 2.
Proof.
See Section 8.17. ∎
The following Lemma gives an expansion of .
Lemma 3.
Proof.
See Section 8.18. ∎
For simplicity, we define, for
(26) |
where and are defined by (25). As a counterpart of in the membership construction step (6), we define for , where is given by (21). The expansion of can be seen from Corollary 3 and Lemma 2. Combining them with Lemma 3, we obtain the following expansion of that is linear in .
Theorem 6.
Proof.
See Section 8.19. ∎
5 Distributional Theory and Rank Inference
In this section, we tackle specific inference problems based on the uncertainty quantification results stated above. First, we establish distributional guarantees using the first-order expansion derived in Theorem 6. Second, we apply the distributional results to related inference problems, especially the ranking inference application. To state the distributional results for , we need the following notation; these are non-random matrices of dimension
(28) |
One can see that
where , , , are defined by (20), (24), (26) and (27) respectively. For any matrices , we define the variance of as
and the covariance of and as
The asymptotic distribution of is given by the following Theorem:
Theorem 7.
Let be the permutation in Theorem 5. Let be a fixed integer and
be distinct fixed pairs from . We consider vectors
Denote by the covariance matrix whose diagonal entry is and off-diagonal entry is . Then for any convex set , we have
as long as
(29) |
where for each pair such that ,
Proof.
See Section 8.20. ∎
Next, we will apply Theorem 7 to answer some inference questions. In many practical applications, one is concerned with the characteristics of each community rather than the index (e.g., ) of the community. For this reason, we will assume that the permutation is the identity map, that is, for all indices .
Example 1.
Given a node in the network, it is natural to ask which community it is closest to. This amounts to finding the largest component of and can be formulated as a collection of hypothesis testing problems. For , we consider the following testing problem:
According to Theorem 7 with , we consider the following Bonferroni-adjusted critical region at a significance level of
where with and ,
(30) |
It is easy to see from the above critical region that at most one of can be rejected. Once is rejected, we can conclude that node is closest to community at a significance level of .
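The following sketch illustrates this Bonferroni-adjusted procedure for a single node. The function name and the inputs (in particular `se_diff`, a plug-in standard error of the pairwise differences built from (28) and (30)) are our own notation, and the rejection rule shown here is one natural reading of the critical region above rather than a verbatim transcription.

```python
import numpy as np
from scipy.stats import norm

def closest_community_test(pi_hat_i, se_diff, alpha=0.05):
    """For a single node i, test for each k whether node i is closest to
    community k.  pi_hat_i is the length-K vector of estimated mixing
    probabilities; se_diff[k, l] is a plug-in standard error of
    pi_hat_i[k] - pi_hat_i[l].  Returns the community whose null hypothesis
    is rejected, or None (at most one can be rejected)."""
    K = len(pi_hat_i)
    z = norm.ppf(1.0 - alpha / (K - 1))      # Bonferroni adjustment over K-1 comparisons
    for k in range(K):
        t_stats = [(pi_hat_i[k] - pi_hat_i[l]) / se_diff[k, l]
                   for l in range(K) if l != k]
        if min(t_stats) > z:                 # pi_hat_i[k] significantly exceeds all other entries
            return k
    return None
```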
Example 2.
Moving beyond the question of closest community detection, it is often of interest to understand the rank of node with respect to community , such as a ranking of the conservativeness of journals or books. Let be the rank of among . We adopt the method proposed by [FLWY22] to construct a rank confidence interval for . Consider the following random variable and its bootstrap counterpart :
where is a symmetric random matrix whose upper triangular (including diagonal) entries are i.i.d. standard Gaussian random variables, and denotes the elementwise product of the matrices and . Given any , we define as the -th quantile of the conditional distribution of given . Then, by [CCKK22, Theorem 2.2], one can show that
(31) |
under mild regularity conditions (see Section 8.23 for more details). Using the plug-in estimators and the estimated critical value from the bootstrap samples, we construct the following simultaneous confidence intervals for with confidence level as
In other words, for all , with probability at least . Now, implies , and counting the number of such gives a lower bound on the rank of . Similarly, implies , and this gives an upper bound on the rank of . As a result,
forms a confidence interval for .
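A sketch of this construction is given below. It presupposes that the multiplier-bootstrap replicates of the maximum standardized pairwise-difference statistic have already been generated (their generation relies on the expansion in Theorem 6 and the Gaussian multiplier matrix above, which we do not reproduce), so the snippet only illustrates how the bootstrap critical value is turned into a rank confidence interval; variable names are ours.

```python
import numpy as np

def rank_confidence_interval(pi_hat_k, se_diff_m, boot_max_stats, m, alpha=0.05):
    """Rank confidence interval for node m within a given community.
    pi_hat_k[j] is the estimated mixing probability of node j in that community;
    se_diff_m[j] is a plug-in standard error of pi_hat_k[j] - pi_hat_k[m];
    boot_max_stats holds B multiplier-bootstrap replicates of the maximum
    standardized pairwise-difference statistic (generated separately)."""
    n = len(pi_hat_k)
    others = [j for j in range(n) if j != m]
    c = np.quantile(boot_max_stats, 1.0 - alpha)     # bootstrap critical value
    diff = pi_hat_k[others] - pi_hat_k[m]
    lower = diff - c * se_diff_m[others]
    upper = diff + c * se_diff_m[others]
    surely_above = int(np.sum(lower > 0))            # nodes certainly ranked above m
    possibly_above = int(np.sum(upper > 0))          # nodes possibly ranked above m
    return surely_above + 1, possibly_above + 1      # (lower, upper) bound on the rank
```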
Example 3.
[FFHL22b] proposed the SIMPLE test for statistical inference on membership profiles. Specifically, for each node pair , we are interested in the following testing problem:
Theorem 7 allows us to recover their result. To see this, we take ,
We define matrix by
As a result, by Theorem 7, as long as condition (29) holds, under null hypothesis we have
This Hotelling-type statistic can be used to test the null hypothesis for two individual nodes and recovers the result in [FFHL22b].
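A schematic version of this test is shown below; the plug-in covariance estimate and the degrees of freedom are assumptions for illustration (we use $K-1$ to reflect the sum-to-one constraint of the membership vectors), not the paper's exact calibration.

```python
import numpy as np
from scipy.stats import chi2

def simple_type_test(pi_hat_i, pi_hat_j, Sigma_diff_hat, alpha=0.05):
    """Hotelling-type statistic for testing whether nodes i and j share the
    same membership profile.  Sigma_diff_hat is a plug-in estimate of the
    asymptotic covariance of pi_hat_i - pi_hat_j; the pseudo-inverse handles
    the singularity induced by the sum-to-one constraint."""
    d = np.asarray(pi_hat_i) - np.asarray(pi_hat_j)
    T = float(d @ np.linalg.pinv(Sigma_diff_hat) @ d)
    df = len(d) - 1          # K-1 degrees of freedom assumed here
    return T, T > chi2.ppf(1.0 - alpha, df)
```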
6 Numerical Studies
In this section, we conduct numerical experiments on both synthetic and real data to complement our theoretical results. We first validate our distributional results via simulations. Then, we apply our approach to a stock dataset to study the simplex structure and perform ranking inference, as mentioned in the previous examples.
6.1 Synthetic Data Simulation
Here, we conduct synthetic data experiments to verify our uncertainty quantification results in Theorems 6 and 7. Our simulation setup is as follows: set the number of nodes and the number of communities . To generate the membership matrix , we first set the first two rows of the matrix as and , as two pure nodes. The first entries of the remaining rows are sampled independently from the uniform distribution over the interval , while the second entries are determined by the first entries since each row sums to . Then we randomly shuffle the rows of . For the matrix , we set its diagonal entries to and its off-diagonal entries to . In terms of , which partially represents the signal strength, we consider three settings: (i) for all nodes; (ii) the 's are sampled independently from the uniform distribution over the interval ; (iii) for all nodes. In each setting, we generate the network and compute the mixed membership matrix estimator times. We then record the realizations of the following standardized random variable
where is defined by (30) and is the plug-in estimator of given by (28). Figure 1 summarizes the results collected from the simulations. The three plots in the first row show the histograms of the results from each setting, and the orange curve is the density of the standard normal distribution. The three Q-Q plots in the second row further examine the normality in the three settings. These results suggest that the random variable is nearly normally distributed and that the estimated asymptotic variance is accurate. They in turn further support our theoretical results, Theorems 6 and 7.
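For reference, a minimal sketch of the data-generating part of this simulation is given below. The specific numerical values (sizes, the entries of the connectivity matrix, and the range of the degree parameters) are illustrative placeholders, since the paper's exact choices are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 2000, 2                          # illustrative sizes
theta = rng.uniform(0.3, 0.9, n)        # degree heterogeneity, setting (ii); (i)/(iii) use a constant
P = np.array([[0.8, 0.2],
              [0.2, 0.8]])              # illustrative community connectivity matrix

# Membership matrix: two pure nodes, the remaining rows uniform, then shuffled.
Pi = np.empty((n, K))
Pi[0], Pi[1] = (1.0, 0.0), (0.0, 1.0)
Pi[2:, 0] = rng.uniform(0.0, 1.0, n - 2)
Pi[2:, 1] = 1.0 - Pi[2:, 0]
Pi = Pi[rng.permutation(n)]

# Edge probabilities H = Theta Pi P Pi^T Theta and a symmetric Bernoulli adjacency matrix.
H = theta[:, None] * (Pi @ P @ Pi.T) * theta[None, :]
A_upper = np.triu(rng.random((n, n)) < H, 1).astype(float)
A = A_upper + A_upper.T                 # no self-loops in this sketch

# The Mixed-SCORE sketch from Section 2.2 can now be applied to A, and the
# standardized estimation errors compared against the N(0, 1) limit.
```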
Figure 1: Histograms (top row) and Q-Q plots (bottom row) of the standardized statistic in the three settings, compared with the standard normal density.
6.2 Real Data Experiments
In this subsection, we apply our uncertainty quantification results to a financial dataset. Our dataset consists of the daily close prices of the S&P 500 stocks from January 1, 2010 to December 31, 2022, obtained from Yahoo Finance (https://pypi.org/project/yfinance/), from which we calculate the log returns. We clean the data by removing the stocks with more than one missing value; for those with exactly one missing value, we set the corresponding log return to zero. We would like to construct a network based on the correlation of the log returns. It is well known in finance that some common factors account for much of this correlation. Similar to [FFHL19], we first fit a factor model with five factors to remove these common factors and then construct the network based on the covariance matrix of the idiosyncratic components. Letting be the covariance matrix of the idiosyncratic components, we draw an edge between nodes and if and only if . In this way, we obtain the adjacency matrix . After these preprocessing steps, stocks remain.
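A sketch of this preprocessing pipeline is given below. The ticker list, the correlation threshold `tau`, and the use of principal components to remove the five common factors are illustrative assumptions standing in for the paper's exact choices (which follow [FFHL19]); only `yfinance.download` and standard NumPy/pandas operations are used.

```python
import numpy as np
import yfinance as yf

# Illustrative tickers; the paper uses the full S&P 500 list.
tickers = ["AAPL", "MSFT", "JPM", "GS", "PFE", "JNJ", "XOM", "CVX"]
prices = yf.download(tickers, start="2010-01-01", end="2022-12-31")["Close"]
log_ret = np.log(prices).diff().iloc[1:]

# Keep stocks with at most one missing value; set remaining missing returns to zero.
log_ret = log_ret.loc[:, log_ret.isna().sum() <= 1].fillna(0.0)

# Remove five common factors via principal components (a stand-in for the
# factor model of [FFHL19]), then form the idiosyncratic covariance matrix.
X = log_ret.to_numpy()
X = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_idio = X - U[:, :5] @ np.diag(s[:5]) @ Vt[:5]
Sigma_u = np.cov(X_idio, rowvar=False)

# Build the network: connect two stocks when their idiosyncratic correlation
# exceeds a threshold tau (the paper's threshold is not reproduced here).
tau = 0.1
corr = Sigma_u / np.sqrt(np.outer(np.diag(Sigma_u), np.diag(Sigma_u)))
A = (np.abs(corr) > tau).astype(float)
np.fill_diagonal(A, 0.0)
```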
We take and apply the SCORE normalization step to the leading eigenvectors of . On the left side of Figure 2 we display the scatter plot of for , showing the -dimensional simplex structure. As we can see from the figure, it has a clear triangular structure. Looking closely at the nodes, we can find some characteristics of the three corners of this triangle. Many financial companies (PNC, NDAQ, TROW, RE, PSA) are very close to the vertex of the right corner, which can be viewed as the pure node of this community. Many other financial companies (AJG, WRB, AXP, BRO, C) are also closer to this corner than to the other two corners. In the top corner, we can find companies (REGN, EW, HOLX, ILMN) belonging to the healthcare industry. Moreover, some other healthcare companies (PFE, HUM, LH, TMO, ABT) can also be viewed as mixed members that are closer to this community. Similarly, companies related to the energy industry, such as MPC, BA, TDY, EMR, AEP, DOV, XEL, FE, make up a large part of the bottom corner, while other similar companies (ED, LNT) seem to be mixed members close to this corner. To validate these observations, we apply Theorem 7 to conduct the following tests for each , as stated in Example 1.
(32)
At most one of and can be rejected. On the right side of Figure 2, we show the results of the aforementioned tests. We use red/blue/green to indicate that // is rejected, respectively. If none of these three null hypotheses is rejected, we color the point grey. As we can see from Figure 2, for most () of the nodes, one of and is rejected. In other words, although many nodes have mixed membership, we can identify them as being closer to one community. Furthermore, the test results confirm our observations about the three corners mentioned before. It is worth mentioning that the information technology companies, which make up a large proportion of the S&P 500 list, can be found in abundance in every part of the simplex. This also indicates that the prosperous development of the information technology industry is a result of the nurturing of traditional industries.
Figure 2: Left: scatter plot of the SCORE embeddings of the S&P 500 stocks, showing the simplex (triangle) structure. Right: results of the tests (32), where red/blue/green indicates which null hypothesis is rejected and grey indicates no rejection.
Next, we apply our approach in Example 2 to construct rank confidence intervals for the nodes. We summarize our inference results in Table 1. We select stocks as representatives for presentation. First, we include the three estimated vertices. Second, of the four categories (red/blue/green/grey) shown in Figure 2, we randomly select two from each category and label them R/B/G/C (C stands for center). In Table 1 we include the estimated mixed membership vectors for each stock as well as the three rank confidence intervals for and , which we denote by RCI I, RCI II and RCI III. As we can see from Table 1, our approach provides meaningful rank confidence intervals for the stocks and the categories to which they are close.
Symbol | Estimated $\hat\pi_i$ | RCI I | RCI II | RCI III
---|---|---|---|---
AAL () | | | |
LLY () | | | |
MPC () | | | |
META (C) | | | |
EBAY (C) | | | |
DHR (R) | | | |
HOLX (R) | | | |
PNC (B) | | | |
MOS (B) | | | |
AEP (G) | | | |
LYB (G) | | | |
Next, we investigate whether there has been a change in the simplex structure between the period before COVID-19 and the period after the pandemic began. We use stock data from January 1, 2017 to January 1, 2020 as the pre-COVID-19 data, and stock data from May 1, 2020 to May 1, 2023 as the post-COVID-19 data. We follow the same data preprocessing procedure as before, except that the threshold for is replaced by when dealing with the post-COVID-19 data. We also apply the aforementioned tests (32) to these two datasets. In Figure 3 we present the experimental results, with the outcomes of the tests again indicated by colors.
Recall that we identified three categories, with finance, healthcare, and energy as their respective representatives, in Figure 2. From the top two plots of Figure 3 we can see that the 'finance corner' (red) and the 'energy corner' (blue) are consistent with each other, and these structures are similar to what we observed in Figure 2, except for some companies (e.g., AXP, BRO, ED, LNT, WRP). However, a structural change in the remaining corner brings some interesting observations. First, the remaining corner (green) in the top left plot (before COVID-19) of Figure 3 is a mix of companies from many different industries, and no single industry can be considered representative of this corner. Second, the healthcare companies from the 'healthcare corner' in Figure 2 are now in the center cluster of the top left plot. The bottom left plot of Figure 3 is a zoomed view of the center cluster of the top left plot, and all the companies (EW, HUM, HOLX, ABT, REGN, TMO, LH, ILMN, PFE) that were identified as representatives of the 'healthcare corner' in Figure 2 are there. However, when we look at the post-COVID-19 plots (top right and bottom right of Figure 3), we find that these healthcare companies move from the center cluster to the green corner, consequently making the green corner a 'healthcare corner'. The bottom right plot of Figure 3 is a zoomed view of the green corner of the top right plot, and again we can see that all nine aforementioned healthcare companies are there. To sum up, the 'healthcare corner' we observed in Figure 2 did not exist before COVID-19. At that time, the healthcare companies were closer to the center of the simplex, indicating that they behaved like a mix of the different industries. After the pandemic began, the healthcare industry grew dramatically, and it now represents the third corner along with the 'finance corner' and the 'energy corner'.
Figure 3: Simplex structures and test results before COVID-19 (left column) and after the pandemic began (right column), with zoomed views of the center cluster (bottom left) and of the green corner (bottom right).
7 Proof Outline
In this section, we sketch the main steps of the proofs. Details are provided in Section 8. We begin by showing an "eigen-gap" property, which is fundamental to the matrix analysis. To this end, we provide a high-probability bound on the spectral norm of the noise matrix .
Lemma 4.
The following event happens with probability at least :
(33) |
Proof.
We are going to prove our results under this favorable set . We begin by listing the key results leading to Theorem 1 and Theorem 2. A key assumption in all these results will be either (for results regarding ) or (for results regarding ) to guarantee the eigen-gap. From Lemma 1, we already know that these two assumptions are mild.
We now state the results for . The proofs rely on contour integrals, similar in spirit to [FFHL22a].
Theorem 8.
Proof.
See Section 8.1. ∎
Theorem 8 expresses the quantity as the sum of a leading term (the first summand on the RHS of (34)) and an error term , along with its bound. We also need a bound on , which is provided by the following theorem.
Theorem 9.
Proof.
See Section 8.2. ∎
Lemma 5.
Assume that . Then with probability at least , we have
Proof.
See Section 8.3. ∎
The next goal is to analyze . This is a matrix denoising problem with ground truth and noisy matrix , where
Define the noise matrix
(35) |
Unlike , the matrix does not have a closed-form expression. The following expansion of will be useful for our results.
Lemma 6.
Further, with probability at least we have
In order to study the expansion of , defined via (35), it is enough to expand . We state our result here. The proof, which uses contour integrals, is deferred to Section 8.4.
Theorem 10.
Assume that . Then under event defined by (33), we have the following expansion:
(36) |
where is a symmetric matrix and .
Theorem 10 shows the first part of Lemma 6, while the second part, the bound for , will be shown later. From Theorem 10 we can directly deduce the following corollary.
Corollary 2.
Assume that . Then under event , we have
Proof.
See Section 8.5. ∎
Corollary 2, coupled with Lemma 4, shows that under event ,
(37) |
Equipped with Theorem 5, we are ready to prove results regarding matrix denoising. We show the following five results, whose combination proves Theorem 2. Note that the first four results here are similar to [YCF21, Lemmas 1-4], while the fifth result is exactly the second part of Lemma 6. To state these results, we need to introduce some notations first. Define
(38) |
where ’s are defined as (7). Recall the definition of from (10). Here we state the five lemmas whose proofs are deferred.
Lemma 7.
Assume that . Then under event we have
Furthermore, under event we have , , where
(39) |
This immediately implies ,
and the same results for and are also true. In fact, we have
Proof.
See Section 8.6. ∎
Lemma 8.
Assume that and . Then with probability exceeding we have
Proof.
See Section 8.7. ∎
Lemma 9.
Assume that and . Then with probability at least we have
Proof.
See Section 8.8. ∎
Lemma 10.
Assume that and . Then with probability at least we have
Proof.
See Section 8.9. ∎
Lemma 11.
Assume that . Then with probability at least we have
Proof.
See Section 8.10. ∎
Similar to [YCF21], we prove Theorem 2 by combining these five lemmas. The proof of Theorem 2 is included in Section 8.11. Next, we combine Theorem 1 and Theorem 2 to yield Theorem 3. See Section 8.12 for details.
Finally, to obtain the membership reconstruction results in Section 4, we need the following result regarding , which is a direct corollary of Theorem 10.
Corollary 3.
Assume that . Then under event defined by (33), we have the following expansion:
where . In terms of the estimation error, we have .
Proof.
See Section 8.16. ∎
References
- [Abb17] Emmanuel Abbe. Community detection and stochastic block models: recent developments. The Journal of Machine Learning Research, 18(1):6446–6531, 2017.
- [ABEF14] Edoardo M Airoldi, David Blei, Elena A Erosheva, and Stephen E Fienberg. Handbook of mixed membership models and their applications. CRC press, 2014.
- [ABFX08] Edo M Airoldi, David Blei, Stephen Fienberg, and Eric Xing. Mixed membership stochastic blockmodels. Advances in neural information processing systems, 21, 2008.
- [ACV14] Ery Arias-Castro and Nicolas Verzelen. Community detection in dense random networks. The Annals of Statistics, 42(3):940 – 969, 2014.
- [AFW22] Emmanuel Abbe, Jianqing Fan, and Kaizheng Wang. An $\ell_p$ theory of PCA and spectral clustering. The Annals of Statistics, 50(4):2359–2385, 2022.
- [AFWZ20] Emmanuel Abbe, Jianqing Fan, Kaizheng Wang, and Yiqiao Zhong. Entrywise eigenvector analysis of random matrices with low expected rank. Annals of statistics, 48(3):1452, 2020.
- [ASG+01] Mário César Ugulino Araújo, Teresa Cristina Bezerra Saldanha, Roberto Kawakami Harrop Galvao, Takashi Yoneyama, Henrique Caldas Chame, and Valeria Visani. The successive projections algorithm for variable selection in spectroscopic multicomponent analysis. Chemometrics and intelligent laboratory systems, 57(2):65–73, 2001.
- [BS16] Peter J Bickel and Purnamrita Sarkar. Hypothesis testing for automated community detection in networks. Journal of the Royal Statistical Society: Series B: Statistical Methodology, pages 253–273, 2016.
- [CCF+21] Yuxin Chen, Yuejie Chi, Jianqing Fan, Cong Ma, et al. Spectral methods for data science: A statistical perspective. Foundations and Trends® in Machine Learning, 14(5):566–806, 2021.
- [CCK15] Victor Chernozhukov, Denis Chetverikov, and Kengo Kato. Comparison and anti-concentration bounds for maxima of gaussian random vectors. Probability Theory and Related Fields, 162:47–70, 2015.
- [CCKK22] Victor Chernozhuokov, Denis Chetverikov, Kengo Kato, and Yuta Koike. Improved central limit theorem and bootstrap approximations in high dimensions. The Annals of Statistics, 50(5):2562–2586, 2022.
- [CFMW19] Yuxin Chen, Jianqing Fan, Cong Ma, and Kaizheng Wang. Spectral method and regularized mle are both optimal for top-k ranking. Annals of statistics, 47(4):2204, 2019.
- [CFMY19] Yuxin Chen, Jianqing Fan, Cong Ma, and Yuling Yan. Inference and uncertainty quantification for noisy matrix completion. Proceedings of the National Academy of Sciences, 116(46):22931–22937, 2019.
- [CS15] Yuxin Chen and Changho Suh. Spectral mle: Top-k rank aggregation from pairwise comparisons. In International Conference on Machine Learning, pages 371–380. PMLR, 2015.
- [DKNS01] Cynthia Dwork, Ravi Kumar, Moni Naor, and Dandapani Sivakumar. Rank aggregation methods for the web. In Proceedings of the 10th international conference on World Wide Web, pages 613–622, 2001.
- [FFHL19] Jianqing Fan, Yingying Fan, Xiao Han, and Jinchi Lv. Simple: Statistical inference on membership profiles in large networks. arXiv preprint arXiv:1910.01734, 2019.
- [FFHL22a] Jianqing Fan, Yingying Fan, Xiao Han, and Jinchi Lv. Asymptotic theory of eigenvectors for random matrices with diverging spikes. Journal of the American Statistical Association, 117(538):996–1009, 2022.
- [FFHL22b] Jianqing Fan, Yingying Fan, Xiao Han, and Jinchi Lv. Simple: Statistical inference on membership profiles in large networks. Journal of the Royal Statistical Society Series B: Statistical Methodology, 84(2):630–653, 2022.
- [FFLY22] Jianqing Fan, Yingying Fan, Jinchi Lv, and Fan Yang. Simple-rc: Group network inference with non-sharp nulls and weak signals. arXiv preprint arXiv:2211.00128, 2022.
- [FHY22] Jianqing Fan, Jikai Hou, and Mengxin Yu. Uncertainty quantification of mle for entity ranking with covariates. arXiv preprint arXiv:2212.09961, 2022.
- [FLWY22] Jianqing Fan, Zhipeng Lou, Weichen Wang, and Mengxin Yu. Ranking inferences based on the top choice of multiway comparisons. arXiv preprint arXiv:2211.11957, 2022.
- [GSZ23] Chao Gao, Yandi Shen, and Anderson Y Zhang. Uncertainty quantification in the bradley–terry–luce model. Information and Inference: A Journal of the IMA, 12(2):1073–1140, 2023.
- [GV13] Nicolas Gillis and Stephen A Vavasis. Fast and robust recursive algorithms for separable nonnegative matrix factorization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(4):698–714, 2013.
- [GZF+10] Anna Goldenberg, Alice X Zheng, Stephen E Fienberg, Edoardo M Airoldi, et al. A survey of statistical network models. Foundations and Trends® in Machine Learning, 2(2):129–233, 2010.
- [HLL83] Paul W Holland, Kathryn Blackmond Laskey, and Samuel Leinhardt. Stochastic blockmodels: First steps. Social networks, 5(2):109–137, 1983.
- [Hun04] David R Hunter. Mm algorithms for generalized bradley-terry models. The annals of statistics, 32(1):384–406, 2004.
- [Jin15] Jiashun Jin. Fast community detection by SCORE. The Annals of Statistics, 43(1):57 – 89, 2015.
- [JJKL21] P. Ji, J. Jin, Z. T. Ke, and W. Li. Meta-analysis on citations for statisticians. Manuscript, 2021.
- [JKL23] Jiashun Jin, Zheng Tracy Ke, and Shengming Luo. Mixed membership estimation for social networks. Journal of Econometrics, 2023.
- [KN11] Brian Karrer and Mark EJ Newman. Stochastic blockmodels and community structure in networks. Physical review E, 83(1):016107, 2011.
- [Lei16] Jing Lei. A goodness-of-fit test for stochastic block models. The Annals of Statistics, 44(1):401 – 424, 2016.
- [LR15] Jing Lei and Alessandro Rinaldo. Consistency of spectral clustering in stochastic block models. The Annals of Statistics, 43(1):215 – 237, 2015.
- [Luc12] R Duncan Luce. Individual choice behavior: A theoretical analysis. Courier Corporation, 2012.
- [LWLZ23] Tianxi Li, Yun-Jhong Wu, Elizaveta Levina, and Ji Zhu. Link prediction for egocentrically sampled networks. Journal of Computational and Graphical Statistics, pages 1–24, 2023.
- [M+73] Daniel McFadden et al. Conditional logit analysis of qualitative choice behavior. 1973.
- [New13a] Mark EJ Newman. Community detection and graph partitioning. Europhysics Letters, 103(2):28003, 2013.
- [New13b] Mark EJ Newman. Spectral methods for community detection and graph partitioning. Physical Review E, 88(4):042822, 2013.
- [NK07] David Liben-Nowell and Jon Kleinberg. The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology, 58(7):1019–1031, 2007.
- [Rai19] Martin Raič. A multivariate Berry–Esseen theorem with explicit constants. Bernoulli, 25(4A):2824 – 2853, 2019.
- [RCY11] Karl Rohe, Sourav Chatterjee, and Bin Yu. Spectral clustering and the high-dimensional stochastic blockmodel. The Annals of Statistics, 39(4):1878 – 1915, 2011.
- [SC95] PC Sham and D Curtis. An extended transmission/disequilibrium test (tdt) for multi-allele marker loci. Annals of human genetics, 59(3):323–336, 1995.
- [Sti94] Stephen M Stigler. Citation patterns in the journals of statistics and probability. Statistical Science, pages 94–108, 1994.
- [T+15] Joel A Tropp et al. An introduction to matrix concentration inequalities. Foundations and Trends® in Machine Learning, 8(1-2):1–230, 2015.
- [VAC15] Nicolas Verzelen and Ery Arias-Castro. Community detection in sparse random networks. The Annals of Applied Probability, 25(6):3465 – 3510, 2015.
- [VL07] Ulrike Von Luxburg. A tutorial on spectral clustering. Statistics and computing, 17:395–416, 2007.
- [WB17] Y. X. Rachel Wang and Peter J. Bickel. Likelihood-based model selection for stochastic block models. The Annals of Statistics, 45(2):500 – 528, 2017.
- [WW87] Yuchung J Wang and George Y Wong. Stochastic blockmodels for directed graphs. Journal of the American Statistical Association, 82(397):8–19, 1987.
- [YCF21] Yuling Yan, Yuxin Chen, and Jianqing Fan. Inference for heteroskedastic pca with missing data. arXiv preprint arXiv:2107.12365, 2021.
- [ZM14] Pan Zhang and Cristopher Moore. Scalable detection of statistically significant communities and hierarchies, using message passing for modularity. Proceedings of the National Academy of Sciences, 111(51):18144–18149, 2014.
8 Proofs
8.1 Proof of Theorem 8
Proof.
Define and let be the circular contour around with radius . Then is the only eigenvalue of that is inside . Under event defined by (33), by Weyl’s theorem, we know that
As a result, is the only eigenvalue of that is inside . For , we have
(40)
(41)
As a result, we know that
(42) |
under event . Using (40) and (41) we know that
(43) |
We denote by
Then we know that
The integrand can be reformulated as,
As a result, we have, under event ,
where the first inequality uses (8.1). Hence, under event ,
since . This immediately yields
(44) |
By (43), we know that , . Therefore,
As a result, we know that
Therefore, we obtain,
(45) |
Now, by (44). To bound , it is enough to bound . To this end, note that,
(46) |
Also, by Wedin’s sin Theorem [CCF+21, Theorem 2.9],
(47) |
under . Combining (46) and (47), we obtain
Therefore, we get, under the event ,
∎
8.2 Proof of Theorem 9
Proof.
Recall from (45) that
(48) |
We begin with bounding . Recall that,
We split the integrand as a sum of the following two quantities,
so that . Under the event ,
(49) |
where the fourth inequality uses (8.1). It remains to bound . Since can be expanded as
we have
Since , we have
Define the matrices
(50) |
Then we can write
(51) |
We now analyze the three terms separately.
-
(i)
Control : We introduce leave-one-out matrix by replacing all the elements in -th row and -th column of original with for . Then, for any , is independent of . We write
(52) For a fixed , by Lemma 13 and Lemma 12, we obtain
(53) with probability at least under . By Corollary 4 and Lemma 15, we have
(54) with probability at least under event . Plugging (54) in (53) tells us
(55) with probability at least under event .
- (ii)
- (iii)
Combine these three parts with (51), we get
(58) |
with probability at least . Combining (58) and (49), we get
(59) |
8.3 Proof of Lemma 5
8.4 Proof of Theorem 10
Proof.
We use the same and as in the proof of Theorem 8. Since
for the same reason as the proof of Theorem 8, under event , we have
We expand the integrand as a sum of two quantities in the following way:
(61) |
For , the contour integral can be calculated as
Next, we bound the spectral norm of under the event , as:
where the third inequality uses (8.1). As a result, the contour integral of , can be bounded as
Noting that the contour integral of is exactly
gives us the desired result. ∎
8.5 Proof of Corollary 2
8.6 Proof of Lemma 7
Proof.
Recall the definitions of and from (38) and (10) respectively. We write the SVD of as , where . Then we have . Therefore, we have
By Wedin’s sin Theorem [CCF+21, Theorem 2.9], we have
Recall that, under event , by (37). On the other hand, we have
Therefore, we have
(63) |
Again, by (63),
Since we have assumed , we have
This proves the first part of this lemma.
Moving on to the proof of the second part, by (7), we have , yielding . It remains to show that under event . To this end, set . By Lemma 1 we know that
Recall that are the largest eigenvalues in magnitude among all eigenvalues of , and are sorted descendingly. By Weyl’s theorem, under event ,
As a result, we get . This leads to, using (7),
implying . This completes the proof of the Lemma. ∎
8.7 Proof of Lemma 8
Proof.
Recall the definitions of and from (10) and (38) respectively. By triangle inequality,
We bound the three quantities separately. For the first term , we have, on the event , using Lemma 7
where the third inequality uses the fact that , are orthogonal matrices and the last inequality uses our assumption that .
For the second quantity , one can see that
(64) |
where and come from the SVD of
By Weyl’s theorem we have
(65) |
under event . On the other hand, by [CCF+21, Lemma 2.5] and Lemma 7 we have
(66)
(67)
Combining (64), (65), (66) and (67), we get
Next, we analyze . By definition we have
Using the notation of Theorem 10, we can write
(68) |
Since , we know that
Now we bound the two terms separately. The second quantity is immediately bounded by Theorem 10:
On the other hand, similar to [YCF21, Lemma 2], we use matrix Bernstein inequality [T+15, Theorem 6.1.1] to control . Define , and , . Then
By (12),
Second, since is symmetric, we know that
Therefore, we only have to bound . Since , we have
For any , we have
And, for any , we have
(69) |
Define by setting . Then, by (8.7),
The second summand of the above display can be bounded as:
In conclusion, we get
Then by matrix Bernstein inequality [T+15, Theorem 6.1.1], with probability at least ,
This implies since we assumed . Finally, combining the bounds for and , we get
since . Also,
completing the proof of the Lemma. ∎
8.8 Proof of Lemma 9
Proof.
Since , by triangle inequality we have
(70) |
We first control the first term on the RHS. We consider the case with self-loops and the case without self-loops separately.
With self-loop: The first term can be bounded as
Let be the SVD of , then
By (63), we have . Therefore,
(71) |
Without self-loop: The first term can be bounded as
The first summand is bounded as in the previous case. For the second term,
one can see that
(72) |
Since , combining (71) and (72) we get
(73) |
It remains to bound the second term . Recall from (8.7),
(74) |
We control the five summands of RHS separately.
-
(a)
Control : We use the leave-one-out matrix defined in the proof of Theorem 9. Define and let be the largest eigenvalues of in magnitude, sorted in decreasing order. Let be the corresponding eigenvectors. Let , and . Then and are independent with . One can easily see that the results in Lemma 4, Theorem 10 (we also define ), Corollary 2 and Lemma 7 also apply to the leave-one-out matrices. As a result, we have
By Lemma 14, we have
with probability at least . By [CCF+21, Lemma 2.5], since , we have
(75) where we used (63) for the last equality. As a result, we have
(76) On the other hand, we have
(77) As a result, it remains to bound . One can see that
By Davis-Kahan’s sin theorem [CCF+21, Theorem 2.7], we have
(78) since . By Theorem 10, can be decomposed as
We bound the numerator of (78) by controlling the five summands on the RHS of the above display separately.
-
(a)
Control : By triangle inequality we have
(79) On one hand, by Lemma 16 and (12) we have
(80) with probability at least . On the other hand, we have
Since , we get
where is defined by (76). As a result, we have
where is defined by (78). Combine this with (79) and (80) we have
(81) with probability at least .
- (b)
- (c)
- (d)
-
(e)
Control : By Theorem 10, the ranks of and are at most . Hence,
(85)
Recall the definitions of from (76) and (78) respectively. Combining (81)-(85), we get
with probability at least . This, along with (78) yields
with probability at least . Since ,
with probability at least . Recall that is defined as
in (76), we have
with probability at least . By our assumption , yielding
(86) with probability at least . This also implies
(87) with probability at least . Plugging the quantities in (76) we get
with probability at least . Taking a union bound for and noting that , we have with probability at least ,
(88) -
(a)
- (b)
- (c)
- (d)
- (e)
Finally, we can sum up all five terms bounded above. Specifically, a combination of (88), (90), (91), (92), (93) and (74) yields
with probability at least . Plugging in this bound, along with the bound (71) and (73) in (70) provides the desired conclusion.
∎
8.9 Proof of Lemma 10
Proof.
Since , we consider the following decomposition
(94) |
Since the upper bound of follows from Lemma 9, we only have to deal with the second and third quantities. We begin with the second. By Lemma 7, since , we have
(95) |
where the last inequality follows from (12). In addition,
(96) |
Since , by Lemma 7 we have
(97) |
Combining (96) and (97), we get
(98) |
since . Therefore, by (95) we have
(99) |
We now turn to bound the third summand of (94), namely, . Recall the decomposition of from (8.7). By Lemma 15 we have
(100) |
with probability at least . And, since is a rank- matrix, we have
(101) |
And, since , we have via (12),
(102) |
Finally, since , we know that . This, along with (100), (101) and (102) yields
(103) |
since . When we further have , the first summand is negligible and we obtain the desired conclusion. ∎
8.10 Proof of Lemma 11
Proof.
From (61) we know that
We define the following four matrices
Control : For , one can show that
As a result, we have
By Lemma 15, we know that with probability at least . For each , by Lemma 18 we know that with probability at least . As a result, we know that
(104) |
with probability at least .
Control : By definition we have
(105) |
Control : For , one can show that
Let
Then we have
(106) |
Note that for defined in (50). In the proof of Lemma 12 we actually show that , which implies
(107) |
For and , since both of them are rank- matrices, we have, using from Lemma 12,
(108) |
and
(109) |
Now, combining (106), (107), (108) and (109) yields
(110) |
8.11 Proof of Theorem 2
Proof.
As for the first step, we have
and
As a result, we know that
This implies that
We now bound the numerator. Note that the last term is already bounded by Lemma 11.
Control : Combine (94) and (99) we know that
where is defined by (8.9). Since we assumed , we further have
Plugging this in Lemma 9 we get
Since , and by (103), we know that
since we assumed . According to Lemma 1, since and , we have, using ,
(112) |
8.12 Proof of Theorem 3
8.13 Proof of Theorem 4
Proof.
By Theorem 3, with probability at least , for all we have
(117) |
On one hand, we know that
(118) |
By Lemma 14 we know that
(119) |
with probability at least . And, by Lemma 18 and Lemma 12 we have
(120) |
with probability at least . Plugging (119) and (120) in (118) we get
with probability at least . Combine this with Theorem 2, we get
(121) |
for all with probability at least . On the other hand, by Corollary 4 and Lemma 15 we know that
for all with probability at least . Combine this with Theorem 1, we get
(122) |
for all with probability at least .
8.14 Proof of Theorem 5
Proof.
Let . For the same reason as in the proof of [JKL23, Lemma E.1], we know that
with probability at least . And we denote by this event. We let for . By triangle inequality, under event we know that
(123) |
If , we know that . In this case, (123) cannot hold for appropriately chosen . Therefore, for appropriately chosen , we must have . This also implies that is a permutation of , since the cardinality of is exactly .
For any and , if , by triangle inequality we have
under event . As a result, for for appropriately chosen , it holds that . In other words, we must have . On the other hand, if , again by triangle inequality we have
As a result, can be lower bounded as
as long as satisfies
under event , we have . This implies . To sum up, we have for all under event . ∎
8.15 Proof of Corollary 1
8.16 Proof of Corollary 3
8.17 Proof of Lemma 2
Proof.
By Corollary 1 we know that
(124) |
On one hand, by Lemma 8 we have
(125) |
with probability exceeding . On the other hand, by triangle inequality we know that
(126) |
Plugging (125) and (126) in (124) we get
with probability at least . Moreover, the left-hand side is a small-order term compared to , which controls . As a result, the estimation error can be controlled by . ∎
8.18 Proof of Lemma 3
Proof.
We denote by
By the definition of , we know that . We also denote by
Let be the permutation from Theorem 5, then we have
Therefore, we can write
(127) |
Denote by . By Corollary 1 we know that
(128) |
and
Next, by Theorem 3,
(129) |
where . Plugging (128) and (129) in (127), we get
(130) |
Since form a simplex and all are inside it, we know that the entries of are within . As a result, we know that . That is to say, we have
(131) |
On the other hand, we have
(132) |
According to [JKL23, (C.26)], we know that . As a result, plugging (131) and (132) in (130) we get
(133) |
And, since
(134) |
and
(135) |
Combining (134) and (135) with (133), we get
Since , we know that , completing the proof. ∎
8.19 Proof of Theorem 6
Proof.
First we focus on . Define and . Set , . By Taylor expansion, we know that there exists some between and , such that
Combining Corollary 3 and Lemma 2, we have
Note that . Since , we have . Hence,
(136) |
Again by Taylor expansion, we know that there exists some between and , such that
Since , we have . So,
(137) |
Second, for the permutation from Theorem 5 and any , we have
Combining Lemma 3 and (136), we have the following expansion
where
Here holds because according to [JKL23, C.22] and according to Lemma 1. In terms of the estimation error, by Theorem 3 and (137) we have
(138) |
Now we are ready to derive the expansion of . For any and , we have
where
Since for some appropriate , we have
As a result, we have
Combining this with (138), for we have
Since , we have
∎
8.20 Proof of Theorem 7
Proof.
Define
By Berry-Esseen theorem [Rai19], for any convex set , we have
(139) |
Since is the covariance of , we know that
(140) |
Combining (139) and (140), we know that
(141) |
It remains to control . For any convex set and point , we define
With this definition, we have
Taking , by Theorem 6 we know that with probability at least . As a result, we have
(142) |
On the other hand, one can see that
(143) |
By [Rai19, Theorem 1.2] we know that
Plugging this and (141) in (143), we have
Combining this with (142), and by the arbitrariness of , we get
Similarly, it can also be shown that
As a result, we know that
(144) |
Combining (141) and (144), we get
∎
8.21 Auxiliary Lemmas
Lemma 12.
For and defined in (50), we have
Proof.
We prove the desired results in the following two settings: with self-loop and without self-loop. When self-loops are allowed, the rank of is exactly . When there is no self-loop, is approximately rank .
With self-loop: The spectral norm bounds follow directly from (50)
By definition, for we have
On one hand, by (12) we have
On the other hand, defining
by (12) we have,
As a result, we get . Similarly, for we have
Again, we have . Defining
we have
Combining these together, we get the desired conclusion .
Without self-loop: The spectral norm bounds follow as in the previous case. For the rest, by the definition of ,
The bounds on the first two summands are the same as before. Hence it remains to control . One can see that
As a result, we get
since . Similarly, for we have
We bound the first two summands as before. For the third term,
Combining them, we get
since . This completes the proof. ∎
From Lemma 12, we immediately have the following corollary.
Corollary 4.
For and defined in (50), we have for all
Lemma 13.
For any and a fixed vector , we have
with probability at least . Here the constant hidden in is free of , and .
Proof.
Since , by Bernstein's inequality, with probability at least ,
∎
Lemma 14.
For any and a fixed matrix , we have with probability at least ,
Proof.
Taking in [YCF21, Lemma 5] gives us the desired statement. ∎
Lemma 15.
For any fixed matrix , we have with probability at least ,
Proof.
Recall that . Applying Lemma 14 for , we get the desired conclusion. ∎
Lemma 16.
For any and a fixed matrix , we have with probability at least ,
Proof.
Lemma 17.
For any fixed , we have with probability at least ,
In particular, for any , we have with probability at least ,
Proof.
Since , by Hoeffding’s inequality, we have
As a result, with probability at least , we have .
∎
Lemma 18.
For any with , with probability at least ,
Proof.
We write
Since , we know, by (12),
As a result, by Bernstein inequality we have, with probability at least ,
∎
8.22 The Incoherence Parameter
In this subsection, we aim to show that the incoherence condition (12) holds with . As we pointed out in Remark 1, [JKL23, Lemma C.3] and Assumption 2 guarantee . Therefore it remains to show
We use the notation for the case where self-loops exist, and the notation for the case where self-loops are not allowed.
8.23 Proof of (31)
We formally state the theoretical guarantee of the Gaussian multiplier bootstrap method described in Example 2 as below.
Theorem 11.
Proof.
We define
Since , we know that [CCKK22, Condition E, M] holds with and . As a result, by [CCKK22, Theorem 2.2] we have
(148) |
Therefore, to prove (31), it is enough to show
(149) |
One can see that
(150) |
We take . Then by Theorem 6 we know that
(151) |
Next we show that . Let be a centered Gaussian random vector with the same covariance structure (thus we know that ) as
Then by [CCKK22, Theorem 2.1] we know that
As a result, we have
(152) |
On the other hand, by [CCK15, Theorem 3] we know that
(153) |
Combining (152) with (153), we know that
(154) |
Plugging (151) and (154) into (150), we prove (149). This, along with (148), proves that
concluding our proof.
∎