Resampling Sensitivity of High-Dimensional PCA
Abstract
The study of stability and sensitivity of statistical methods or algorithms with respect to their data is an important problem in machine learning and statistics. The performance of an algorithm under resampling of the data is a fundamental way to measure its stability and is closely related to the generalization or privacy of the algorithm. In this paper, we study the resampling sensitivity of principal component analysis (PCA). Given an random matrix , let be the matrix obtained from by resampling randomly chosen entries of . Let and denote the principal components of and . In the proportional growth regime , we establish the sharp threshold for the sensitivity/stability transition of PCA. When , the principal components and are asymptotically orthogonal. On the other hand, when , the principal components and are asymptotically collinear. In words, we show that PCA is sensitive to the input data in the sense that resampling even a negligible portion of the input may completely change the output.
1 Introduction
The study of stability and sensitivity of statistical methods and algorithms with respect to the input data is an important task in machine learning and statistics [BE02, EEPK05, MNPR06, HRS16, DHS21]. The notion of stability for algorithms is also closely related to differential privacy [DR14] and generalization error [KN02]. To measure algorithmic stability, one fundamental question is to study the performance of the algorithm under resampling of its input data [BCRT21, KB21]. Originating from the analysis of Boolean functions, resampling sensitivity (also called noise sensitivity) is an important concept in theoretical computer science, which refers to the phenomenon that resampling a small portion of the random input data may lead to decorrelation of the output. Such a remarkable phenomenon was first studied in the pioneering work of Benjamini, Kalai and Schramm [BKS99], and we refer to the monograph [GS14] for a systematic discussion on this topic.
In this work, we study the resampling sensitivity of principal component analysis (PCA). As one of the most commonly used statistical methods, PCA is widely applied for dimension reduction, feature extraction, etc. [Joh07, DT11]. It is also used in other fields such as economics [VK06], finance [ASX17], genetics [Rin08], and so on. The performance of PCA under additive or multiplicative independent perturbations of the data matrix has been well studied (see e.g. [BBAP05, BS06, Pau07, BGN11, CLMW11, FWZ18]).
However, how resampling of the data matrix affects the outcome remains unclear. In this paper, we address this problem for the first time. Here, we emphasize that the resampling of the input data may not have any structure; the specific resampling procedure is given in the next subsection. In our main results, we show that PCA is resampling sensitive, in the sense that, above a certain threshold, resampling even a negligible portion of the data may completely change the resulting principal component (i.e. make it orthogonal to the original direction).
Compared with previous work that mainly focused on PCA with additive or multiplicative independent noise, our setting is very different. In our model, if the resampling effect is written as an additive or multiplicative perturbation, then this noise is not independent of the signal and does not possess any special structure. In contrast, previous work often imposed low-rank assumptions on the structure of the matrix or the noise, or some kind of incoherence condition. In our work, we make almost no assumption on the data other than a sub-exponential decay condition. Moreover, we highlight that our results are universal. In particular, we do not need to know the specific distribution of the data, and we do not require the data to be i.i.d. sampled.
1.1 Model and main results
Let be an data matrix with independent real valued entries with mean 0 and variance ,
(1) |
Note that we do not require the i.i.d. condition for the data. Furthermore, we assume the entries have a sub-exponential decay, that is, there exists a constant such that for ,
(2) |
This sub-exponential decay assumption is mainly for convenience, and other conditions such as the finiteness of a sufficiently high moment would be enough.
Motivated by high-dimensional statistics, we will work in the proportional growth regime . Throughout this paper, to avoid trivial eigenvalues, we will be working in the regime
In the case , our assumption is due to technical reasons in random matrix theory. Specifically, the proof relies on the delocalization of eigenvectors in the whole spectrum. As one of the major open problems in random matrix theory, delocalization of eigenvectors near the lower spectral edge is not known in the general case with just . The strictly square assumption can be slightly relaxed to (see e.g. [Wan22]), but we do not pursue such an extension for simplicity.
The sample covariance matrix corresponding to data matrix is defined by . We order the eigenvalues of as , and use to denote the unit eigenvector corresponding to the eigenvalue . If the context is clear, we just use and to denote the largest eigenvalue and the top eigenvector. We also consider the eigenvalues and eigenvectors of the Gram matrix . Note that and have the same non-trivial eigenvalues, and the spectrum of is given by with . We denote the unit eigenvectors of associated with the eigenvalue by .
Let and . These eigenvectors are related through the singular value decomposition of the data matrix , where with corresponds to the singular values. For convenience, we also denote . Therefore, up to the sign of the eigenvectors, we have
We now describe the resampling procedure. For a positive number , define the random matrix in the following way. Let be a set of pairs chosen uniformly at random without replacement from the set of all ordered pairs of indices with and . We assume that the set is independent of the entries of . The entries of are given by
where are independent random variables, independent of , and has the same distribution as . In other words, is obtained from by resampling random entries of the matrix, and therefore clearly has the same distribution as . Let be the sample covariance matrix corresponding to the resampled data matrix . Denote the eigenvalues and the corresponding normalized eigenvectors of by and . When the context is clear, the principal component is just denoted by and . Similarly, denote the eigenvector of the matrix associated with the eigenvalue by .
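To make the procedure concrete, the following minimal NumPy sketch constructs the resampled matrix; the function name resample_entries, the sampler argument, and the variance-1/n Gaussian entries in the usage example are our own illustrative choices standing in for the assumptions (1) and (2).

```python
import numpy as np

def resample_entries(X, k, sampler, rng):
    """Return a copy of X in which k uniformly chosen entries (drawn without
    replacement) are replaced by fresh independent draws from `sampler`.
    `sampler(size)` must return i.i.d. copies of the entry distribution."""
    n, p = X.shape
    X_tilde = X.copy()
    flat_idx = rng.choice(n * p, size=k, replace=False)  # the random set of resampled positions
    rows, cols = np.unravel_index(flat_idx, (n, p))
    X_tilde[rows, cols] = sampler(k)                     # independent copies of the selected entries
    return X_tilde

# example: centered Gaussian entries with variance 1/n (one common normalization)
rng = np.random.default_rng(0)
n, p, k = 300, 150, 2000
entry = lambda size: rng.standard_normal(size) / np.sqrt(n)
X = entry((n, p))
X_tilde = resample_entries(X, k, entry, rng)
```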
With the resampling parameter in two different regimes, we have the following results.
Theorem 1 (Noise sensitivity under excessive resampling).
Theorem 2 (Noise sensitivity under moderate resampling).
These two theorems together state that the critical threshold for the resampling strength is of order . Note that this threshold is negligible compared with the total number of inputs . We show that, above the threshold , resampling even a negligible portion of the data will result in a dramatic change of the principal component, in the sense that the new principal component is asymptotically orthogonal to the old one; below the threshold, a relatively mild resampling has almost no effect on the corresponding new principal component. In terms of the eigenvector overlaps and , these quantities exhibit a sharp phase transition from to near the critical threshold .
We remark that the phase transition stated in the above theorems is not restricted to the top eigenvectors . With essentially the same arguments, we can prove that for any fixed , the -th leading eigenvectors and exhibit the same phase transition at the critical threshold of the same order .
1.2 High-Level Proof Scheme
The high-level idea is the “superconcentration implies chaos” phenomenon established by Chatterjee [Cha14], which means that a small perturbation (beyond a certain threshold) of a super-concentrated system (in the sense that it is characterized by some quantity with small variance) leads to a dramatic change of the system. For random matrix models, the super-concentrated quantities are usually the eigenvalues. Resampling sensitivity for random matrices was first studied for Wigner matrices and sparse Erdős-Rényi graphs in [BLZ20, BL22]. For the PCA model, the key difference is that the entries of the sample covariance matrix are correlated. Moreover, resampling a single entry of the data will change entries in the sample covariance matrix. These two differences make the proofs more technical, and a linearization trick is important to reduce the interdependency of the matrix entries. In addition, we also need a technical variance formula related to resampling to compute the variance of the top eigenvalue, together with tools from random matrix theory, in particular the local Marchenko-Pastur law for the resolvent and the delocalization of eigenvectors.
For the sensitivity of the principal components, we show that the inner products and are closely related to the variance of the top eigenvalue of the sample covariance matrix. Using the variance formula for resampling, we obtain a precise relation between the inner product of the top eigenvectors and the concentration of the top eigenvalue. The perturbation effect of the resampling is studied via the variational representation of the eigenvalue.
For the stability under moderate resampling, the key idea of our proof is to study the stability of the resolvent. On the one hand, the stability of the resolvent implies the stability of the top eigenvalue. On the other hand, the resolvent can be used to approximate certain useful statistics of the top eigenvector. The stability of the resolvent is proved via a Lindeberg exchange strategy. The resampling procedure can be decomposed into a martingale, and the difference between the resolvents can therefore be bounded by martingale concentration inequalities combined with the local Marchenko-Pastur law of the resolvent for a priori estimates.
2 Sensitivity under Excessive Resampling
We provide a heuristic argument for deriving the threshold for the sensitivity regime. We consider the derivative of the top eigenvalue as a function of the matrix entries. Then we have the approximation
Note that the matrix in the parenthesis has only non-zero entries, and each entry is roughly of size . Also, the eigenvector is delocalized in the sense that for all . A central limit theorem yields that approximately we have
By this heuristic argument and central limit theorem, we have
Note that from random matrix theory, we know that is of order . Therefore, if we have (this corresponds to ), then the difference between the two top eigenvalues and is much smaller than the gap between the first two eigenvalues and of the matrix . This implies that the perturbation effect on is small, and therefore in this case it is plausible to believe that is just a small perturbation of . Thus, for the sensitivity regime, we must have .
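To make the order of the threshold explicit, here is a back-of-the-envelope version of this computation under one concrete, purely illustrative normalization (our choice, not necessarily the paper's): $p \asymp n \asymp N$, entries of variance $N^{-1}$, set of resampled indices denoted $T$ with $|T| = k$, top singular vectors $u_1, v_1$, and top eigenvalue $\lambda_1 = O(1)$ of the sample covariance matrix viewed as a function of the data matrix. Differentiating the top eigenvalue in the matrix entries gives
\[
\lambda_1(\widetilde{X}) - \lambda_1(X) \;\approx\; \sum_{(i,j)\in T}\frac{\partial \lambda_1}{\partial x_{ij}}\,\big(\widetilde{x}_{ij}-x_{ij}\big)
\;=\; 2\sqrt{\lambda_1}\sum_{(i,j)\in T} u_1(i)\,v_1(j)\,\big(\widetilde{x}_{ij}-x_{ij}\big).
\]
By delocalization, $|u_1(i)|, |v_1(j)| \approx N^{-1/2}$, and $|\widetilde{x}_{ij}-x_{ij}| \approx N^{-1/2}$, so each summand has size roughly $N^{-3/2}$, and the central limit theorem suggests that the sum is of order $\sqrt{k}\,N^{-3/2}$. Comparing this with the $N^{-2/3}$ scale of the top-eigenvalue fluctuations (and of the gap between the two largest eigenvalues) gives the critical order
\[
\sqrt{k}\,N^{-3/2} \;\asymp\; N^{-2/3} \qquad\Longleftrightarrow\qquad k \;\asymp\; N^{5/3}
\]
under this illustrative normalization.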
Our proof is essentially trying to make the above heuristics rigorous. To do this, a key observation is that the inner products and can be related to the variance of the leading eigenvalue.
2.1 Connection with Variance of Top Eigenvalue
As mentioned above, the key step in the proof for the sensitivity regime is to establish a connection between the inner products of the top eigenvectors and the variance of the top eigenvalue. Specifically, we will prove
where is some universal constant and is the leading singular value. A similar result is also true for and . For more details, we refer to Section C.2.
From random matrix theory, we have . Then, based on this inequality, we derive the threshold for the sensitivity regime.
3 Stability under Moderate Resampling
To establish the stability of PCA when the resampling strength is mild, we will utilize tools from random matrix theory; specifically, the proof relies on the analysis of the resolvent matrix. Also, to simplify the Gram matrix structure of the sample covariance matrix, when considering the resolvent we use a linearization trick. For any with , the resolvent is defined as
Similarly, we denote the resolvent of by . The key idea for the proofs in the stability regime is that eigenvectors can be approximated by resolvents and the resolvents are stable under moderate resampling.
3.1 Resolvent Approximation
To illustrate the usefulness of the resolvent, we show that the entries of the resolvent can be used to approximate the product of entries in the eigenvector. For some small , let with . In the regime for some , there exists some small such that for all , we have
and
A similar result also holds for and . For more details, we refer to Lemma D.5.
3.2 Stability of the Resolvent
Since the eigenvector can be approximated by the resolvent, it suffices to show the stability of the resolvent. Consider the regime for some . For some small and all that is close to the upper spectral edge and , there exists a small constant such that the following is true for all and ,
and
This is the main technical part of the whole argument, and its proof relies on the Lindeberg exchange method and a martingale concentration argument. For more details, we refer to Lemma D.3.
Combining the stability of the resolvents with the resolvent approximation for eigenvectors, we can conclude that and must be close (similarly, also for and ).
3.3 Stability of the Top Eigenvalue
As a byproduct, we derive a stability estimate for the top eigenvalues, which may be of independent interest. For with some arbitrary and an arbitrary ,
where is some small constant. Since the fluctuation of the top eigenvalue around its typical location is of order , this result shows that the top eigenvalues under moderate resampling are indeed non-trivially stable. For more details, we refer to Lemma D.4.
4 Discussions and Applications
We have shown that PCA is sensitive to the input data, in the sense that resampling a negligible fraction of the data results in decorrelation between the new output and the old output. We further prove that this threshold is sharp: a moderate resampling below this threshold has essentially no effect on the outcome.
Moreover, besides demonstrating an exciting phenomenon, our results have broad implications in other related fields. We briefly discuss a few potential extensions and applications that would be worth further exploration.
4.1 Extensions to Broader PCA Models
Sparse PCA
A natural extension is to consider sparse data; this corresponds to sparse PCA, which has received a lot of attention in the past decade (see e.g. [MWA05, ZHT06, CMW13]). However, establishing the sharp phase transition for sparse PCA requires several technical ingredients that are currently missing. In particular, for the sensitivity regime, the eigenvalue gap property (i.e. tail estimates for the eigenvalue gap, see Lemma B.3) is unknown. Also, to establish the sharp threshold for the stability regime, an improved local law for the resolvent is needed (cf. Lemma D.1). Though we expect these missing ingredients to hold, proving them is beyond the scope of this paper, and we leave them to future work. Assuming the eigenvalue gap property and the improved local law, the phase transition can be proved by the same arguments as in this paper.
PCA with General Population
For practical purposes, it is important to consider data with a general population covariance profile (see e.g. [Nad08]). The corresponding matrices were studied in [BPZ15, LS16]. The only missing ingredient we need is the eigenvalue gap property. Under reasonable assumptions on the population matrix such that the eigenvalue gap property holds, our arguments in this paper yield the same stability-sensitivity transition.
4.2 Extensions to Other Statistical Methods
Within the general PCA framework, one important variant is kernel PCA, which is closely related to the widely used spectral clustering [NJW01, VL07, CBG16]. The corresponding kernel random matrices were studied in [EK10, CS13]. However, these kernel random matrices are far from being well understood. In particular, the study of their eigenvectors is very limited, and the indispensable delocalization property is still unknown.
It would be interesting to explore whether other statistical methods share the same resampling sensitivity phenomenon. Random matrices associated with canonical correlation analysis (CCA) or multivariate analysis of variance (MANOVA) are well studied [HPZ16, HPY18, Yan22]. In particular, the Tracy-Widom concentration of the top eigenvalue, one of the most important ingredients that we need, has been proved. We anticipate that these models exhibit a similar stability-sensitivity transition as in PCA.
4.3 Differential Privacy
In our paper, we study the stability of the top eigenvalue and the top eigenvector under resampling in terms of bounding the distance. Such stability estimates can be regarded as the global sensitivity of PCA performed on neighboring datasets. This measurement is closely related to the analysis of differential privacy [DR14]. PCA under differential privacy was previously studied in [BDMN05, CSS13], among others. Our results revisit the problem of designing a private algorithm for computing the principal component. Here we remark that, though the statements in Theorem 1 and Theorem 2 are qualitative, a careful examination of the proof can yield quantitative estimates. Based on the stability estimates in terms of the metric, a simple Laplace mechanism produces a differentially private version of PCA for computing the top eigenvalue or the top eigenvector. However, compared with [CSS13], our results are limited in the sense that their results are non-asymptotic for all sample sizes and data dimensions , while ours are restricted to the proportional growth regime.
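As an illustration of the last point, here is a minimal Laplace-mechanism sketch for releasing the top eigenvalue. The function name, the placeholder sensitivity value, and the convention of passing the sensitivity bound as an argument are our own choices; calibrating that bound for a given notion of neighboring datasets is exactly where stability estimates of the kind discussed above would enter.

```python
import numpy as np

def private_top_eigenvalue(X, epsilon, sensitivity, rng):
    """Release the top eigenvalue of X X^T with Laplace noise of scale
    sensitivity / epsilon.  Given a valid global-sensitivity bound over the
    chosen notion of neighboring datasets, the standard Laplace mechanism
    makes this release epsilon-differentially private."""
    lam1 = np.linalg.eigvalsh(X @ X.T)[-1]               # largest eigenvalue
    return lam1 + rng.laplace(scale=sensitivity / epsilon)

# usage with a purely hypothetical sensitivity bound (0.01 is a placeholder,
# not a derived estimate)
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 150)) / np.sqrt(300)
released = private_top_eigenvalue(X, epsilon=1.0, sensitivity=0.01, rng=rng)
```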
Moreover, previous works on differentially private PCA focused on neighbouring datasets that differ by one sample vector. Our result may be seen as a refined notion of privacy, since we can analyze the sensitivity of PCA over two “neighbouring” datasets with different entries for any .
Meanwhile, the largest eigenvalue of the sample covariance matrix plays an important role in hypothesis testing. For example, Roy’s largest root test is used in many problems (see e.g. [JN17]). Our result may provide useful insights for constructing a differentially private test statistic based on the top eigenvalue.
4.4 Database Alignment
Database alignment (or, in some cases, graph matching) refers to the optimization problem in which we are given two datasets and the goal is to find an optimal correspondence between the samples and features that maximally aligns the data. For datasets , we look for permutations and to solve the optimization problem
where and are the sets of all permutations on and , respectively. This problem is closely related to the Quadratic Assignment Problem (QAP), which is known to be NP-hard to solve or even approximate.
The study of the alignment problem for correlated random databases has a long history. Previous work mainly considered matrices that are correlated through some additive perturbation, and some more general models were studied with a homogeneous correlation (i.e. the correlation between all corresponding pairs is the same). See for example [DCK19, WXS22] and many other works.
Our resampling procedure may be regarded as an adversarial corruption of the dataset, which is a different kind of correlation compared with previous work. To our knowledge, this is the first work to consider database alignment with adversarial corruption. To state the setting of the problem, we have two matrices
where is a random matrix satisfying (1) and (2), and and are permutation matrices of order and chosen uniformly at random. The goal is to recover the permutations and based on the observations and . Here, we can think of as the unlabeled version of with adversarial corruption. By considering the covariance matrices, we have
and similarly we consider
A natural idea to reconstruct the permutations (and ) is to align the top eigenvectors of the matrices and (and and ). See Algorithm 1 for details. This spectral method is a natural technique for database alignment and graph matching. We are interested in under what resampling strength the PCA-Recovery algorithm can almost perfectly reconstruct the permutations, and under what conditions this method completely fails.
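As a concrete illustration, the following Python sketch shows one natural way such a spectral matching step could be implemented, assuming the alignment is done by sorting the entries of the top eigenvectors up to a global sign. The function names and the sign-resolution rule are our own illustrative choices, not necessarily those of Algorithm 1; the column permutation can be estimated analogously from the Gram matrices.

```python
import numpy as np

def top_eigvec(M):
    """Unit eigenvector of the symmetric matrix M for its largest eigenvalue."""
    _, vecs = np.linalg.eigh(M)          # eigenvalues returned in ascending order
    return vecs[:, -1]

def align_by_sorting(u, v):
    """Permutation pi (as an index array) matching order statistics:
    the index of the i-th smallest entry of u is sent to that of v."""
    pi = np.empty(len(u), dtype=int)
    pi[np.argsort(u)] = np.argsort(v)
    return pi

def pca_recovery_rows(X, Y):
    """Sketch of the row-alignment step: match the top eigenvectors of
    X X^T and Y Y^T by sorting their entries.  Eigenvectors are only defined
    up to sign, so both signs are tried and the one giving the larger
    correlation between the sorted entries is kept."""
    u, v = top_eigvec(X @ X.T), top_eigvec(Y @ Y.T)
    if np.dot(np.sort(u), np.sort(v)) < np.dot(np.sort(u), np.sort(-v)):
        v = -v
    return align_by_sorting(u, v)
```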
Spectral methods have been studied and applied in many scenarios (see e.g. [FMWX20, FMWX22] and [GLM22]). In particular, in [GLM22], a similar PCA method was studied to match two symmetric Gaussian matrices correlated via additive Gaussian noise. Their work proved a similar transition for the inner product of the top eigenvectors, which leads to an all-or-nothing phenomenon in the alignment problem, i.e. the accuracy of the recovery undergoes a sharp transition from to near some critical threshold. However, the arguments in [GLM22] are not applicable in our case. Their proof heavily depends on the Gaussian assumption on the matrices and the additive structure of the noise; in particular, it crucially relies on the orthogonal invariance of the Gaussian noise. In our case, by contrast, the noise is given in terms of the resampling strength, and there is no way to write the “noise” in an additive form that is independent of the “signal”. Even in the Gaussian case, a rigorous analysis of the PCA-Recovery algorithm seems difficult.
Nevertheless, our results on the sensitivity of the eigenvector inner products suggest that, when , the two eigenvectors are approximately decorrelated, so that they share almost no common information. Consequently, recovering the correspondence by aligning the principal components would be essentially random guessing. Therefore, we conjecture that if , PCA-Recovery fails to recover the latent permutations in the sense that it can only achieve a fraction of correct matches with the ground truth. On the other hand, when , the performance of our algorithm remains unclear.
In Section 5, we empirically check the performance of the PCA-Recovery algorithm. Numerical simulations suggest that when , the performance of PCA-Recovery is indeed poor, in the sense that the accuracy of the recovery is almost . On the other hand, when , experiments show that we cannot expect a sharp all-or-nothing phenomenon similar to that in [GLM22].
Finally, we remark that what PCA-Recovery actually addresses is a more difficult task, as it does not need direct observations of and . We can consider a harder problem (in both the statistical and the computational sense), which we call alignment from covariance profile. In this problem, we only have access to the covariance between the samples, and we aim to recover the correspondence between the samples from the two databases. A similar problem with Gaussian data and additive noise was considered in [WWXY22] as a prototype for matching random geometric dot-product graphs. The analysis of such database alignment problems with adversarial corruption will be an interesting direction for future studies.
5 Numerical Experiments
We validate our theoretical prediction by checking the performance of PCA on synthetic data. To highlight the universality of our results, we consider both Gaussian data and Bernoulli data. The Gaussian data matrix consists of i.i.d. entries. The Bernoulli data matrix consists of i.i.d. entries taking values in with equal probability . To visualize the stability-sensitivity transition, we focus on the overlap of the leading eigenvectors and as the observable. Note that, in the stability regime, the asymptotic collinearity (4) implies that . Therefore, we expect a phase transition from 1 to 0 at the critical threshold .
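A minimal simulation of this observable, with Gaussian or symmetric Bernoulli entries as above, can be written as follows; the matrix sizes and the grid of resampling strengths below are illustrative placeholders, not the settings used for the figures.

```python
import numpy as np

def overlap_after_resampling(n, p, k, entry, rng):
    """One draw of the overlap |<v, v_tilde>| between the top eigenvectors of
    the sample covariance matrices of an n x p data matrix and of its copy
    with k uniformly chosen entries resampled.  `entry(size)` draws i.i.d.
    entries (the overall scaling does not affect the eigenvectors)."""
    X = entry((n, p))
    X_tilde = X.copy()
    idx = rng.choice(n * p, size=k, replace=False)
    X_tilde[np.unravel_index(idx, (n, p))] = entry(k)
    v = np.linalg.eigh(X @ X.T)[1][:, -1]
    v_tilde = np.linalg.eigh(X_tilde @ X_tilde.T)[1][:, -1]
    return abs(v @ v_tilde)

rng = np.random.default_rng(0)
gauss = lambda size: rng.standard_normal(size)
bern = lambda size: rng.choice([-1.0, 1.0], size=size)
for k in (100, 5000, 40000):          # illustrative resampling strengths
    print(k, overlap_after_resampling(400, 200, k, gauss, rng),
          overlap_after_resampling(400, 200, k, bern, rng))
```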
We first focus on rectangular data matrices. For concreteness, we set and . As shown in Figure 1, there is a clear phase transition for the eigenvector overlap varying from to . Also, it provides good evidence that the transition happens at the critical threshold .
[Figure 1: eigenvector overlap as a function of the resampling strength for rectangular data matrices (Gaussian and Bernoulli data).]
We also check square data matrices. As shown in Figure 2, for both Gaussian data and Bernoulli data, we have the same phase transition for the overlap of the leading eigenvectors. Again, this numerical simulation supports our theoretical prediction that the transition happens at the critical threshold .
[Figure 2: eigenvector overlap as a function of the resampling strength for square data matrices (Gaussian and Bernoulli data).]
We also check the performance of the PCA-Recovery algorithm for database alignment. Again, we consider both Gaussian data and Bernoulli data. As shown in Figure 3, in both the rectangular case and the square case, the accuracy of the algorithm is almost as long as the resampling strength surpasses . This is consistent with the theoretical prediction that in this case the top eigenvectors approximately decorrelate. On the other hand, numerical simulations suggest that PCA-Recovery is brittle to resampling. In particular, we cannot expect an all-or-nothing phenomenon similar to that in [GLM22]. Identifying the critical threshold below which PCA-Recovery can achieve almost perfect recovery is an interesting open problem for future research.
[Figure 3: accuracy of the PCA-Recovery algorithm as a function of the resampling strength (Gaussian and Bernoulli data).]
Appendix A Notations and Organization
We use to denote a generic constant, which may differ from one appearance to the next. We write if there exists a universal constant such that , and if for some universal . We write if and .
For the analysis of the sample covariance matrix, it is useful to apply the linearization trick (see e.g. [Tro12, DY18, FWZ18]). Specifically, we will also analyze the symmetrization of , which is defined as
(5) |
The spectrum of the symmetrization is given by the singular values , the symmetrized singular values , and the trivial eigenvalue with multiplicity . Let be the concatenation of the eigenvectors and . Then is an eigenvector of associated with the eigenvalue . Indeed, we have
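For concreteness, one standard way to write the linearization and its eigenstructure, in our own notation (with $X$ of size $n\times p$, singular values $\sigma_i$ and left/right singular vectors $u_i$, $v_i$; the paper's normalization may differ), is
\[
H \;=\; \begin{pmatrix} 0 & X \\ X^{\top} & 0 \end{pmatrix},
\qquad
H\begin{pmatrix} u_i \\ \pm\, v_i \end{pmatrix} \;=\; \pm\,\sigma_i \begin{pmatrix} u_i \\ \pm\, v_i \end{pmatrix},
\]
so that the eigenvalues of $H$ are $\pm\sigma_1,\dots,\pm\sigma_{n\wedge p}$ together with $0$ of multiplicity $|n-p|$, and the unit eigenvectors are the normalized concatenations $\tfrac{1}{\sqrt{2}}(u_i,\pm v_i)$ of the left and right singular vectors.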
An important probabilistic concept that will be used repeatedly is the notion of overwhelming probability.
Definition 1 (Overwhelming probability).
Let be a sequence of events. We say that holds with overwhelming probability if for any (large) , there exists such that for all we have
Organization
The appendix is organized as follows. In Section B, we collect some useful tools for the proof, including a variance formula for resampling and classical results from random matrix theory. In Section C, we prove the sensitivity of PCA under excessive resampling. In Section D, we prove that PCA is stable if resampling of the data is moderate.
Appendix B Preliminaries
B.1 Variance formula and resampling
An essential technique for our proof is the formula for the variance of a function of independent random variables. This formula represents the variance via resampling of the random variables. This idea is first due to Chatterjee [Cha05], and in this paper we will use a slight extension of it as in [BLZ20].
Let be independent random variables taking values in some set , and let be some measurable function. Let and be an independent copy of . We denote
And in general, for , we define to be the random vector obtained from by replacing the components indexed by by corresponding components of . By a result of Chatterjee [Cha05], we have the following variance formula
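In one standard formulation (our notation: $X^{i}$ denotes $X$ with only its $i$-th coordinate replaced by $X_i'$, and $X^{[i]} = (X_1',\dots,X_i',X_{i+1},\dots,X_n)$ with $X^{[0]}=X$), Chatterjee's identity reads
\[
\operatorname{Var}\big(f(X)\big) \;=\; \frac{1}{2}\sum_{i=1}^{n} \mathbb{E}\Big[\big(f(X)-f(X^{i})\big)\big(f(X^{[i-1]})-f(X^{[i]})\big)\Big].
\]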
We remark that this variance formula does not depend on the order of the random variables. Therefore, we can consider an arbitrary permutation of . Specifically, let be a random permutation sampled uniformly from the symmetric group and denote . Then we have
Note that, in the formula above, the expectation is taken with respect to both , and the random permutation .
Let have uniform distribution over . Let denote the vector obtained from by replacing its -th component by another independent copy of the random variable in the following way: If belongs to , then we replace by ; if is not in , then we replace by , where and are independent copies of such that are independent. With this notation, we have the following estimates.
Lemma B.1 (Lemma 3 in [BLZ20]).
Assume that is chosen uniformly at random from the set , independently of the other random variables involved. Then we have, for any ,
where for any ,
and the expectation is taken with respect to components of vectors, random permutations and the random variable .
B.2 Tools from random matrix theory
In this section we collect some classical results in random matrix theory, which will be indispensable for proving the main theorems. These include concentration of the top eigenvalue, eigenvalue rigidity estimates, and eigenvector delocalization.
To begin with, we first state some basic settings and notations. It is well known that the empirical distribution of the spectrum of the sample covariance matrix converges to the Marchenko-Pastur distribution
(6) |
where the endpoints of the spectrum are given by
(7) |
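For reference, under the variance normalization $\mathbb{E}|x_{ij}|^2 = n^{-1}$ and aspect ratio $\gamma = p/n \in (0,1]$ (our convention for this aside; the paper's constants may be arranged differently), the Marchenko-Pastur density and its spectral edges take the familiar form
\[
\rho_{\mathrm{MP}}(x) \;=\; \frac{1}{2\pi\gamma x}\sqrt{\big[(\lambda_{+}-x)(x-\lambda_{-})\big]_{+}},
\qquad
\lambda_{\pm} \;=\; \big(1\pm\sqrt{\gamma}\,\big)^{2}.
\]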
Define the typical locations of the eigenvalues:
A classical result in random matrix theory is the rigidity estimate for the eigenvalues [PY14, BEK+14]. Let . For any small and large , there exists such that the following holds for any ,
(8) |
We remark that the square case is actually significantly different, due to the singularity of the Marchenko-Pastur law at the spectral edge . Near this edge, the typical eigenvalue spacing would be of order . In this case, it would be more convenient to state the rigidity estimates in terms of the singular values of the data matrix. The following result was proved in [AEK14]. For any small and large , there exists such that the following is true for any ,
(9) |
The intuition for this case is that the singular values of a square data matrix behave like the eigenvalues of a Wigner matrix. More specifically, the singular values and their symmetrization are the eigenvalues of the symmetrized matrix defined in (5), and can be seen as a Wigner matrix with an imprimitive variance profile (see [AEK14]). This is explained in more detail in [Wan19, Wan22].
Another important result is the Tracy-Widom limit for the top eigenvalue (see e.g. [PY14, DY18, Wan19, SX21]). Specifically,
Lemma B.2.
For any , with overwhelming probability, we have
Moreover, for any , there exists a constant such that
Proof.
The first result follows from the well-known Tracy-Widom limit for the top eigenvalue. Specifically,
where is a constant depending only on the ratio and is the type-1 Tracy-Widom distribution (in particular, [Wan19, SX21] provided quantitative rate of convergence). For the variance part, the Gaussian case was proved in [LR10], and the general case follows from universality, i.e. for any fixed
where denotes the probability measure associated with the Gaussian matrix. The spectral gap estimate also follows from universality, namely that the spectral statistics of the sample covariance matrix are the same as those of the Gaussian Orthogonal Ensemble (GOE), i.e. for any fixed
For GOE, the desired spectral gap estimate was proved in e.g. [AGZ10]. ∎
Moreover, an estimate on the eigenvalue gap near the spectral edge is needed. The following result was proved in [TV12, Wan12]
Lemma B.3.
For any , there exists such that for every , with probability at least , we have
The property of eigenvectors is also a key ingredient for our proof. In particular, we extensively rely on the following delocalization property (see e.g. [PY14, BEK+14, DY18, Wan22])
Lemma B.4.
For any , with overwhelming probability, we have
Appendix C Proofs for the Sensitivity Regime
C.1 Sensitivity analysis for neighboring data matrices
As a first step, we show that resampling a single entry has little perturbation effect on the top eigenvectors. This will be helpful to control the single-entry resampling term in the variance formula (see Lemma B.1).
For any fixed and , let be the matrix obtained from by replacing the entry with . Define the corresponding covariance matrix , and use to denote its unit top eigenvector. Similarly, we denote by the unit top eigenvector of .
Lemma C.1.
Let small and . For all and , on the event , with overwhelming probability
(10) |
and similarly
Proof.
Let denote the eigenvalues of the matrix , and let denote the unit eigenvector associated with the eigenvalue . Similarly, we define the unit eigenvectors for the matrix . Using the variational characterization of the eigenvalues, we have
Since and differ only at the entry, we have
Thus, setting , we have
(11)
This gives us
(12) |
Similarly,
(13) |
From the sub-exponential decay assumption (2), we know with overwhelming probability for any . Moreover, by the delocalization of eigenvectors, with overwhelming probability, we have
Moreover, by the rigidity estimates (8), with overwhelming probability we have
Therefore, combining with a union bound, the above two inequalities (12) and (13) together yield
(14) |
with overwhelming probability.
We write in the orthonormal basis of eigenvectors :
Using this representation and the spectral theorem,
As a consequence,
For , taking inner product with yields
which implies
(15) |
By a similar computation as in (11), we have
(16)
with overwhelming probability, where the second step follows from rigidity of eigenvalues and the last step follows from the sub-exponential decay assumption and delocalization of eigenvectors.
Consider the event . Fix some small. By rigidity of eigenvalues (8), on the event , with overwhelming probability
(17) |
On the event , using (14), (16) and (17), with overwhelming probability we have
(18) |
Choose . Note that . Thanks to the delocalization of eigenvectors, with overwhelming probability, we have
Thus, on the event , it follows from (17) that
Choosing and small enough so that , we conclude that (10) is true.
A similar bound for can be shown by the same arguments for . Hence, we have shown the desired results. ∎
C.2 Proof of Theorem 1
Now we are ready to prove the resampling sensitivity.
Let be a copy of that is independent of and . For an arbitrary index with and , we introduce another random matrix obtained from by replacing the entry by . Similarly, we denote the matrix obtained via the same procedure from . For the matrix , we perform a similar resampling procedure in the following way: if , then replace with ; if , then replace with , where is another independent copy of , and .
In the following analysis, we choose an index uniformly at random from the set of all pairs . Let be the top singular value of and use and to denote the normalized top left and right singular vectors of . Similarly, we define , and for . We also denote by and the concatenations of and , respectively. Finally, let , and be the symmetrizations (5) of , and , respectively. When the context is clear, we will omit the index for convenience of notation.
Step 1.
By Lemma B.1, we have
(19) |
Using the variational characterization of the top singular value
This implies
(20) |
Applying the same arguments to and , we have
Since the matrices and differ only at the entry, for any vectors and we have
where
Therefore,
Consider
Then we have
(21) |
Step 2.
Next we claim that the contribution of when does not hold is negligible. Specifically, we have
(26) |
Without loss of generality, it suffices to show that
(27) |
To see this, using , we decompose the expectation into two parts
where
For the first term , by delocalization and the relation , we have
(28) |
Note that the random variable and the event are dependent. To decouple these variables, we introduce a new event. Consider the event , where
Then,
Observe that the random variables and are independent of the event , and the random variable is independent of . Therefore, by Lemma B.3,
By Lemma C.1, we have for any fixed large . Using the Cauchy-Schwarz inequality, we have
Choosing , then (28) yields
For the term , note that , , and are unit vectors. We have that
Recall that holds with overwhelming probability. By the Cauchy-Schwarz inequality, for any large , we have
Hence we have shown the desired claim (27).
Step 3.
Combining (21), (25) and (26), we obtain
Since , by (19) we have
(29) |
Since the random index is uniformly sampled, we have
(30) |
Note that
In either case, we have . Therefore,
Consequently, this implies
(31) |
Moreover, we claim that
(32) |
For ease of notation, we set . It suffices to show that for all pairs we have
(33) |
To see this, we introduce another copy of , denoted by , which is independent of all previous random variables . For an arbitrarily fixed index , we define matrices and by resampling the entry of and with . Let , be the left and right top singular vector of , and similarly , for . Define
A crucial observation is that and are independent. This is because, conditioned on , the matrices and are independent of . Such a conditional independence is also true for the singular vectors, and hence also holds for . On the other hand, by definition, the variable only depends on . Therefore,
Thus, we reduce (33) to showing
(34) |
The proof of (34) is similar to the previous arguments. Consider the events
On the event , we have . Note that since . As a consequence,
(35) |
Using the same argument as in (27), we have
(36) |
where is the constant in the gap property (Lemma B.3). Thus, by (35) and (36), we have shown the desired claim (34).
Step 4.
Appendix D Proofs for the Stability Regime
Throughout the whole section, we will focus on the behaviour of and . Similar results also hold for and via the same arguments.
D.1 Linearization and local law of resolvent
As mentioned in the introduction, in certain cases it is more convenient to consider the symmetrization of the matrix (defined as in (5)) when studying its spectral properties. For with , we introduce the resolvent of this symmetrization
(38) |
Note that is not the conventional definition of the resolvent matrix, but we still call it the resolvent for convenience. For ease of notation, we relabel the indices in in the following way:
Definition 2 (Index sets).
We define the index sets
For a general matrix , we label the indices of the matrix elements in the following way: for , if , then typically we use Latin letters to represent them; if , we use the corresponding Greek letters and to represent them.
The resolvent is closely related to the eigenvalues and eigenvectors of the covariance matrix. As discussed in [DY18, Equations (3.9) and (3.10)], we have
(39) |
and
An important result is the local Marchenko-Pastur law for the resolvent matrix . This was first proved in [DY18, Lemma 3.11], and we also refer to [HLS19, Proposition 2.13] for a version that can be directly applied. Specifically, the resolvent matrix has a deterministic limit, defined by
(40) |
where is the Stieltjes transform of the Marchenko-Pastur law (6), given by
where denotes the square root on the complex plane whose branch cut is the negative real line. With this choice we always have when .
To state the local law, we will focus on the spectral domain
(41) |
Lemma D.1 (Local Marchenko-Pastur law).
For any , the following estimate holds with overwhelming probability, uniformly for ,
(42) |
To give a precise characterization of the resolvent, we rely on the following estimates for the Stieltjes transform of the Marchenko-Pastur law. We refer to e.g. [BKYY16, Lemma 3.6] for more details.
Lemma D.2 (Estimate for ).
For , let denote the distance to the spectral edge. If , we have
(43) |
In the following analysis, we will work with satisfying and , where is some parameter. Uniformly in this regime, the local law yields that the following is true with overwhelming probability for all and some universal constant ,
(44) |
These estimates will be used repeatedly in the following subsections.
D.2 Stability of the resolvent
In this subsection, we prove the main technical result for the proof of resampling stability. Specifically, we will show that under moderate resampling, the resolvent matrices are stable. Since the resolvent is closely related to various spectral statistics, this stability lemma will be a key ingredient of our proof.
Lemma D.3.
Assume for some . There exists such that for all , uniformly for with and , there exists such that the following is true with overwhelming probability
(45) |
Proof.
Recall that is the random subset of matrix indices whose elements are resampled in the matrix . For , let be the matrix obtained from by resampling the entries and let be the -algebra generated by the random variables , and . For , we define
Let be an arbitrarily fixed parameter, and let be the constant in (44). Consider the event where for all with and we have
Define as the matrix obtained from by replacing the entry with , and also define its symmetrization as in (5). Note that is -measurable. We write
where are symmetric matrices whose elements are all except at the and entries, satisfying
Define the resolvents for the matrices and as in (38):
Using first-order resolvent expansion, we obtain
(46) |
The triangle inequality yields
Note that has only two non-zero entries,
Recall that with overwhelming probability thanks to the sub-exponential decay assumption (2). Then on the event , we have
Similarly,
We use the trivial bound for the last term. Then, on the event , we have
Therefore, we have shown that, on the event ,
(47) |
Similarly, using the first-order resolvent expansion for around , we have
By the same arguments as above, on the event , we can derive
Next, we use the resolvent identity (or zeroth-order expansion) for and :
This leads to
Thus, on the event , we conclude
(48) |
Meanwhile, the second-order resolvent expansion of around yields
A key observation is that is -measurable, and . For simplicity of notations, we set
where is the symmetrization of the matrix whose elements are all except . Then we have
(49) |
Similarly, using resolvent expansion of around , we obtain
By the same arguments as above, on the event , we deduce that
(50) |
where
(51) |
Combining (49) and (50) yields
(52) |
By a telescopic summation, we obtain
(53)
where the remainder is bounded by (52)
Recall the expression of . To estimate the remainder, we need to control the size of the set . Note that . By a Bernstein-type inequality (see e.g. [Cha07, Proposition 1.1]), for any , we have
Recall that . The inequality together with a union bound implies that
with overwhelming probability. We denote this event by . On the event , we have
(54) |
For the first term in (53), we set
Note that . This implies that . Moreover, by (48), on the event we have . Further, on the event ,
Using the Azuma-Hoeffding inequality, for any , we have
Moreover,
Recall that holds with overwhelming probability, and consequently for any . Choosing implies that with overwhelming probability
(55) |
We deal with the next two terms in (53) by introducing a backward filtration. Let be the -algebra generated by the random variables , and with and . Similarly to the above, we consider the event that for all with and we have
Using resolvent expansion, the same arguments for (47) yield that, on the event , we have
A key observation is that defined in (51) is -measurable. Also, we have . Consider
Then we have since we also have . Note that
The second term is negligible since holds with overwhelming probability. To estimate the first term, we use the Azuma-Hoeffding inequality as before. By arguments similar to those for (48), we deduce
By considering the event and using Azuma-Hoeffding inequality as in (55), we can conclude that with overwhelming probability,
As a consequence, with overwhelming probability
(56) |
For the third term in (53), by the same arguments, we have
(57) |
Finally, combining (53), (54), (55), (56) and (57), we have shown that
Recall that , , and . Then we obtain
(58) |
Choosing yields the desired bound (45) for a fixed .
So far, we have proved the desired result for a fixed . To extend this result to a uniform estimate, we simply invoke a standard net argument. To do this, we divide the interval into sub-intervals, and consider with taking values in each sub-interval. Note that
For , associated with the same sub-interval, we have
which is of lower order compared with the error bound in (58). This shows that, up to a small multiplicative factor, the desired error bound (45) holds uniformly in each sub-interval with overwhelming probability. Finally, thanks to the overwhelming probability, a union bound over the sub-intervals yields the desired uniform estimate (45) for all with and .
Using the same arguments, we can prove a similar bound for the and blocks. Hence, we have shown the desired results. ∎
D.3 Stability of the top eigenvalue
As a consequence of the stability of the resolvent, we also have stability of the top eigenvalue. This stability of the eigenvalue will play a crucial role in the resolvent approximation of eigenvector statistics in the next subsection.
Lemma D.4.
Assume for some . Let with as in Lemma D.3. For any , with overwhelming probability, we have
Proof.
Without loss of generality, we assume that . Set . By the spectral representation of the resolvent (39), we have
By the pigeonhole principle, we know that there exists such that . Choosing this and , we obtain
(59) |
On the other hand, using the spectral representation of resolvent again for , we have
Picking , we decompose the summation into two parts
Using delocalization of eigenvectors, for any , with overwhelming probability, we have
(60) |
By the Tracy-Widom limit of the top eigenvalue (Lemma B.2), for any , with overwhelming probability, we have . Also, as discussed in (17), the rigidity of eigenvalues yields that for all , with overwhelming probability,
Then using delocalization again, with overwhelming probability, we have
(61) |
Again, since , by choosing we have . Therefore, by (60) and (61), we have shown that with overwhelming probability
Note that the minimum is attained by . This shows that
Using Lemma D.3 and (59), we have
Therefore, we have shown that, with overwhelming probability,
Recall , and we conclude that
which proves the desired result thanks to the arbitrariness of . ∎
D.4 Proof of Theorem 2
The final ingredient to prove the resampling stability is the following approximation lemma, which asserts that the product of entries in the eigenvector can be well approximated by the resolvent entries.
Lemma D.5.
Assume that for some . Let be as in Lemma D.3. Then, for with , there exists such that with probability we have
Similarly, we also have
Proof.
For any , we consider a general with . From the spectral representation of the resolvent (39), we have
Picking some , we decompose the summation on the right-hand side into three parts
where
Using the same arguments as in (61), for any , with overwhelming probability we have
For the term , we consider the following event
For any , we can find an appropriate such that . Then, for with , on the event , we have
Let with . On the event , for all with and , we have
This yields
Choosing , we obtain
(62) |
Similarly, we can apply the same arguments to . Consider the event
By previous arguments, we know . This gives us . Finally, note that , by Lemma D.4, with overwhelming probability we have . This implies that we can choose in both (62) and . Thus, we have shown the desired result for and by choosing .
On the other hand, from (39) we also have
Using the same methods as above yields the desired result for and . ∎
Now we prove the main result Theorem 2 on the stability of PCA under moderate resampling.
Proof of Theorem 2.
Let be as in Lemma D.5. By Lemmas D.3 and D.5, we know that, with probability , for all , we have
Denote , and we have
For any fixed , we consider the event
Since delocalization of eigenvectors holds with overwhelming probability, we know that .
By the pigeonhole principle, there exists such that . We choose the phases of and in the way that and are non-negative. On the event , we obtain
Moreover, for any entry and , if holds, the triangle inequality gives us
Choosing sufficiently small, this implies the desired result (4).
For and , note that
By Lemma D.3, we have
As a consequence, we have
The desired result for and then follows from the same arguments above for and . ∎
Acknowledgment
The author thanks Yihong Wu and Paul Bourgade for helpful discussions at the early stage of this project. Thanks also to Yiyun He for helpful discussion on differential privacy.
References
- [AEK14] Oskari Ajanki, Lászlo Erdős, and Torben Krüger. Local semicircle law with imprimitive variance matrix. Electronic Communications in Probability, 19:1–9, 2014.
- [AGZ10] Greg W Anderson, Alice Guionnet, and Ofer Zeitouni. An introduction to random matrices. Number 118. Cambridge University Press, 2010.
- [ASX17] Yacine Ait-Sahalia and Dacheng Xiu. Using principal component analysis to estimate a high dimensional factor model with high-frequency data. Journal of Econometrics, 201(2):384–399, 2017.
- [BBAP05] Jinho Baik, Gérard Ben Arous, and Sandrine Péché. Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices. Annals of Probability, pages 1643–1697, 2005.
- [BCRT21] Rina Foygel Barber, Emmanuel J Candes, Aaditya Ramdas, and Ryan J Tibshirani. Predictive inference with the jackknife+. The Annals of Statistics, 49(1):486–507, 2021.
- [BDMN05] Avrim Blum, Cynthia Dwork, Frank McSherry, and Kobbi Nissim. Practical privacy: the sulq framework. In Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 128–138, 2005.
- [BE02] Olivier Bousquet and André Elisseeff. Stability and generalization. The Journal of Machine Learning Research, 2:499–526, 2002.
- [BEK+14] Alex Bloemendal, László Erdős, Antti Knowles, Horng-Tzer Yau, and Jun Yin. Isotropic local laws for sample covariance and generalized Wigner matrices. Electron. J. Probab., 19:no. 33, 53, 2014.
- [BGN11] Florent Benaych-Georges and Raj Rao Nadakuditi. The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Advances in Mathematics, 227(1):494–521, 2011.
- [BKS99] Itai Benjamini, Gil Kalai, and Oded Schramm. Noise sensitivity of Boolean functions and applications to percolation. Inst. Hautes Études Sci. Publ. Math., (90):5–43 (2001), 1999.
- [BKYY16] Alex Bloemendal, Antti Knowles, Horng-Tzer Yau, and Jun Yin. On the principal components of sample covariance matrices. Probab. Theory Related Fields, 164(1-2):459–552, 2016.
- [BL22] Charles Bordenave and Jaehun Lee. Noise sensitivity for the top eigenvector of a sparse random matrix. Electronic Journal of Probability, 27:1–50, 2022.
- [BLZ20] Charles Bordenave, Gábor Lugosi, and Nikita Zhivotovskiy. Noise sensitivity of the top eigenvector of a Wigner matrix. Probability Theory and Related Fields, 177(3):1103–1135, 2020.
- [BPZ15] Zhigang Bao, Guangming Pan, and Wang Zhou. Universality for the largest eigenvalue of sample covariance matrices with general population. Ann. Statist., 43(1):382–421, 2015.
- [BS06] Jinho Baik and Jack W Silverstein. Eigenvalues of large sample covariance matrices of spiked population models. Journal of multivariate analysis, 97(6):1382–1408, 2006.
- [CBG16] Romain Couillet and Florent Benaych-Georges. Kernel spectral clustering of large dimensional data. Electronic Journal of Statistics, 10(1):1393–1454, 2016.
- [Cha05] Sourav Chatterjee. Concentration inequalities with exchangeable pairs. page 105, 2005. Thesis (Ph.D.)–Stanford University.
- [Cha07] Sourav Chatterjee. Stein’s method for concentration inequalities. Probab. Theory Related Fields, 138(1-2):305–321, 2007.
- [Cha14] Sourav Chatterjee. Superconcentration and related topics. Springer Monographs in Mathematics. Springer, Cham, 2014.
- [CLMW11] Emmanuel J Candès, Xiaodong Li, Yi Ma, and John Wright. Robust principal component analysis? Journal of the ACM (JACM), 58(3):1–37, 2011.
- [CMW13] T Tony Cai, Zongming Ma, and Yihong Wu. Sparse PCA: optimal rates and adaptive estimation. The Annals of Statistics, 41(6):3074, 2013.
- [CS13] Xiuyuan Cheng and Amit Singer. The spectrum of random inner-product kernel matrices. Random Matrices: Theory and Applications, 2(04):1350010, 2013.
- [CSS13] Kamalika Chaudhuri, Anand D Sarwate, and Kaushik Sinha. A near-optimal algorithm for differentially-private principal components. Journal of Machine Learning Research, 14, 2013.
- [DCK19] Osman E Dai, Daniel Cullina, and Negar Kiyavash. Database alignment with Gaussian features. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 3225–3233. PMLR, 2019.
- [DHS21] Zhun Deng, Hangfeng He, and Weijie Su. Toward better generalization bounds with locally elastic stability. In International Conference on Machine Learning, pages 2590–2600. PMLR, 2021.
- [DR14] Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3–4):211–407, 2014.
- [DT11] Momar Dieng and Craig A Tracy. Application of random matrix theory to multivariate statistics. In Random Matrices, Random Processes and Integrable Systems, pages 443–507. Springer, 2011.
- [DY18] Xiucai Ding and Fan Yang. A necessary and sufficient condition for edge universality at the largest singular values of covariance matrices. The Annals of Applied Probability, 28(3):1679–1738, 2018.
- [EEPK05] Andre Elisseeff, Theodoros Evgeniou, Massimiliano Pontil, and Leslie Pack Kaelbing. Stability of randomized learning algorithms. Journal of Machine Learning Research, 6(1), 2005.
- [EK10] Noureddine El Karoui. The spectrum of kernel random matrices. Ann. Statist., 38(1):1–50, 2010.
- [FMWX20] Zhou Fan, Cheng Mao, Yihong Wu, and Jiaming Xu. Spectral graph matching and regularized quadratic relaxations: Algorithm and theory. In International conference on machine learning, pages 2985–2995. PMLR, 2020.
- [FMWX22] Zhou Fan, Cheng Mao, Yihong Wu, and Jiaming Xu. Spectral graph matching and regularized quadratic relaxations II. Foundations of Computational Mathematics, pages 1–51, 2022.
- [FWZ18] Jianqing Fan, Weichen Wang, and Yiqiao Zhong. An eigenvector perturbation bound and its application to robust covariance estimation. Journal of Machine Learning Research, 18(207):1–42, 2018.
- [GLM22] Luca Ganassali, Marc Lelarge, and Laurent Massoulié. Spectral alignment of correlated Gaussian matrices. Advances in Applied Probability, 54(1):279–310, 2022.
- [GS14] Christophe Garban and Jeffrey E Steif. Noise sensitivity of Boolean functions and percolation, volume 5. Cambridge University Press, 2014.
- [HLS19] Jong Yun Hwang, Ji Oon Lee, and Kevin Schnelli. Local law and Tracy–Widom limit for sparse sample covariance matrices. The Annals of Applied Probability, 29(5):3006–3036, 2019.
- [HPY18] Xiao Han, Guangming Pan, and Qing Yang. A unified matrix model including both CCA and F matrices in multivariate analysis: The largest eigenvalue and its applications. Bernoulli, 24(4B):3447–3468, 2018.
- [HPZ16] Xiao Han, Guangming Pan, and Bo Zhang. The Tracy–Widom law for the largest eigenvalue of F type matrices. The Annals of Statistics, 44(4):1564–1592, 2016.
- [HRS16] Moritz Hardt, Ben Recht, and Yoram Singer. Train faster, generalize better: Stability of stochastic gradient descent. In International conference on machine learning, pages 1225–1234. PMLR, 2016.
- [JN17] Iain M Johnstone and Boaz Nadler. Roy’s largest root test under rank-one alternatives. Biometrika, 104(1):181–193, 2017.
- [Joh07] Iain M. Johnstone. High dimensional statistical inference and random matrices. In International Congress of Mathematicians. Vol. I, pages 307–333. Eur. Math. Soc., Zürich, 2007.
- [KB21] Byol Kim and Rina Foygel Barber. Black box tests for algorithmic stability. arXiv preprint arXiv:2111.15546, 2021.
- [KN02] Samuel Kutin and Partha Niyogi. Almost-everywhere algorithmic stability and generalization error. In Proceedings of the Eighteenth conference on Uncertainty in artificial intelligence, pages 275–282, 2002.
- [LR10] Michel Ledoux and Brian Rider. Small deviations for beta ensembles. Electronic Journal of Probability, 15:1319–1343, 2010.
- [LS16] Ji Oon Lee and Kevin Schnelli. Tracy-Widom distribution for the largest eigenvalue of real sample covariance matrices with general population. Ann. Appl. Probab., 26(6):3786–3839, 2016.
- [MNPR06] Sayan Mukherjee, Partha Niyogi, Tomaso Poggio, and Ryan Rifkin. Learning theory: stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization. Advances in Computational Mathematics, 25(1):161–193, 2006.
- [MWA05] Baback Moghaddam, Yair Weiss, and Shai Avidan. Spectral bounds for sparse PCA: exact and greedy algorithms. Advances in neural information processing systems, 18, 2005.
- [Nad08] Boaz Nadler. Finite sample approximation results for principal component analysis: A matrix perturbation approach. The Annals of Statistics, pages 2791–2817, 2008.
- [NJW01] Andrew Ng, Michael Jordan, and Yair Weiss. On spectral clustering: Analysis and an algorithm. Advances in neural information processing systems, 14, 2001.
- [Pau07] Debashis Paul. Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statistica Sinica, pages 1617–1642, 2007.
- [PY14] Natesh S. Pillai and Jun Yin. Universality of covariance matrices. Ann. Appl. Probab., 24(3):935–1001, 2014.
- [Rin08] Markus Ringnér. What is principal component analysis? Nature biotechnology, 26(3):303–304, 2008.
- [SX21] Kevin Schnelli and Yuanyuan Xu. Convergence rate to the Tracy–Widom laws for the largest eigenvalue of sample covariance matrices. arXiv preprint arXiv:2108.02728, 2021.
- [Tro12] Joel A Tropp. User-friendly tail bounds for sums of random matrices. Foundations of computational mathematics, 12(4):389–434, 2012.
- [TV12] Terence Tao and Van Vu. Random covariance matrices: universality of local statistics of eigenvalues. Ann. Probab., 40(3):1285–1315, 2012.
- [VK06] Seema Vyas and Lilani Kumaranayake. Constructing socio-economic status indices: how to use principal components analysis. Health policy and planning, 21(6):459–468, 2006.
- [VL07] Ulrike Von Luxburg. A tutorial on spectral clustering. Statistics and computing, 17:395–416, 2007.
- [Wan12] Ke Wang. Random covariance matrices: Universality of local statistics of eigenvalues up to the edge. Random Matrices: Theory and Applications, 1(01):1150005, 2012.
- [Wan19] Haoyu Wang. Quantitative universality for the largest eigenvalue of sample covariance matrices. arXiv preprint arXiv:1912.05473, 2019.
- [Wan22] Haoyu Wang. Optimal smoothed analysis and quantitative universality for the smallest singular value of random matrices. arXiv preprint arXiv:2211.03975, 2022.
- [WWXY22] Haoyu Wang, Yihong Wu, Jiaming Xu, and Israel Yolou. Random graph matching in geometric models: the case of complete graphs. In Conference on Learning Theory, pages 3441–3488. PMLR, 2022.
- [WXS22] Yihong Wu, Jiaming Xu, and Sophie H. Yu. Settling the sharp reconstruction thresholds of random graph matching. IEEE Transactions on Information Theory, 2022.
- [Yan22] Fan Yang. Sample canonical correlation coefficients of high-dimensional random vectors: Local law and Tracy–Widom limit. Random Matrices: Theory and Applications, 11(01):2250007, 2022.
- [ZHT06] Hui Zou, Trevor Hastie, and Robert Tibshirani. Sparse principal component analysis. Journal of computational and graphical statistics, 15(2):265–286, 2006.