Asymptotic Normality of Log Likelihood Ratio and Fundamental Limit of the Weak Detection for Spiked Wigner Matrices
Abstract
We consider the problem of detecting the presence of a signal in a rank-one spiked Wigner model. For general non-Gaussian noise, assuming that the signal is drawn from the Rademacher prior, we prove that the log likelihood ratio (LR) of the spiked model against the null model converges to a Gaussian when the signal-to-noise ratio is below a certain threshold. The threshold is optimal in the sense that reliable detection is possible above it by a transformed principal component analysis (PCA). From the mean and the variance of the limiting Gaussian for the log-LR, we compute the limit of the sum of the Type-I and Type-II errors of the likelihood ratio test. We also prove similar results for a rank-one spiked IID model, where the noise is asymmetric but the signal is symmetric.
1 Introduction
One of the most fundamental questions in statistics and data science is how to detect a signal from noisy data. The simplest model for such ‘signal-plus-noise’ data is the spiked Wigner matrix, where the signal is a vector and the noise is a symmetric random matrix. In this model, the data matrix is of the form
$$M = \sqrt{\lambda}\, \boldsymbol{x} \boldsymbol{x}^{\top} + H,$$
where the signal $\boldsymbol{x}$ is an $N$-dimensional vector with $\|\boldsymbol{x}\|_2 = 1$ and the noise matrix $H$ is an $N \times N$ Wigner matrix. (See Definition 2.1 and Assumption 2.2 for a precise definition.) The parameter $\lambda$ is the signal-to-noise ratio (SNR).
Signal detection and recovery problems for the spiked Wigner matrix have been widely studied in the past decades. One of the most natural ways to detect the signal is to analyze the eigenvalues of the data matrix, especially the largest ones, which is known as principal component analysis (PCA). With the normalization $\|\boldsymbol{x}\|_2 = 1$ and $\mathbb{E}[H_{ij}^2] = N^{-1}$ for $i \neq j$, it is well known in random matrix theory that the bulk of the spectrum is supported on $[-2, 2]$ and that the largest eigenvalue exhibits the following sharp phase transition. If $\lambda > 1$, the largest eigenvalue of $M$ separates from the bulk, and hence the signal can be reliably detected by PCA. On the other hand, if $\lambda < 1$, the largest eigenvalue converges to $2$ and its fluctuation does not depend on $\lambda$; thus, the largest eigenvalue carries no information on the signal and PCA fails.
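This transition is easy to observe numerically. The following minimal numpy sketch (our own illustration, not code from the paper; it assumes the Rademacher spike and the normalization above) compares the top eigenvalue of a simulated spiked Wigner matrix with the BBP prediction $\sqrt{\lambda} + 1/\sqrt{\lambda}$:

```python
import numpy as np

def top_eigenvalue(n, snr, rng):
    """Largest eigenvalue of M = sqrt(snr) * x x^T + H for a unit Rademacher
    spike x and a Wigner matrix H with off-diagonal variance 1/n."""
    x = rng.choice([-1.0, 1.0], size=n) / np.sqrt(n)
    g = rng.standard_normal((n, n))
    h = (g + g.T) / np.sqrt(2 * n)          # symmetric Gaussian noise
    m = np.sqrt(snr) * np.outer(x, x) + h
    return np.linalg.eigvalsh(m)[-1]

rng = np.random.default_rng(0)
for snr in [0.5, 1.0, 1.5, 2.0]:
    # below snr = 1 the top eigenvalue sticks to the bulk edge 2;
    # above it, it concentrates near sqrt(snr) + 1/sqrt(snr)
    pred = 2.0 if snr <= 1 else np.sqrt(snr) + 1 / np.sqrt(snr)
    print(f"snr={snr:.1f}: top eig {top_eigenvalue(1000, snr, rng):.3f}, "
          f"prediction {pred:.3f}")
```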
The spectral threshold for PCA is thus $\lambda = 1$, and it is optimal when the noise is Gaussian, in the sense that reliable detection is impossible below it [23]. If the noise is non-Gaussian, however, the conventional PCA is sub-optimal and can be improved by transforming the data entrywise [4, 23]. In this case, with an optimal entrywise transformation, under suitable conditions, the threshold is lowered to $1/F_H$, where $F_H$ is the Fisher information of the (off-diagonal) noise distribution, which is strictly larger than $1$ if the noise is non-Gaussian. (See (2.1) for its definition.)
Below the threshold $1/F_H$ in the non-Gaussian noise case, reliable detection is impossible [23], and one can only hope to perform hypothesis testing between the null model $\boldsymbol{H}_0$ ($\lambda = 0$) and the alternative $\boldsymbol{H}_1$ ($\lambda > 0$), which is called weak detection. The fundamental limit of weak detection can be established by analyzing the performance of the likelihood ratio (LR) test, which minimizes the sum of the Type-I and Type-II errors by the Neyman–Pearson lemma. For Gaussian noise, the log-LR coincides with the free energy of the Sherrington–Kirkpatrick (SK) model of spin glass. The limiting fluctuation of the free energy of the SK model is Gaussian in the high-temperature regime, and this result is directly applicable to proving the asymptotic normality of the log-LR and the limiting error of the LR test.
While the performance of the LR test is of fundamental importance, to the best of our knowledge, no results have been proved when the noise is non-Gaussian. The main difficulty in the analysis of the LR test is that the connection between the log-LR and the free energy of the SK model is absent when the noise is non-Gaussian; though the free energy of the SK model with non-Gaussian interaction is known to converge to a normal distribution in some cases [1, 3], this alone does not prove the Gaussian convergence of the log-LR.
In this paper, we focus on the likelihood ratio between the null hypothesis $\boldsymbol{H}_0$ and the alternative hypothesis $\boldsymbol{H}_1$ in the subcritical case, under the assumption that the distribution of the signal, called the prior, is Rademacher, i.e., the normalized entries of the spike are i.i.d., taking the values $\pm 1$ with equal probability. Our goals are to prove that the asymptotic normality holds even with non-Gaussian noise under mild assumptions on the noise, and to compute the exact mean and variance of the limiting Gaussian, which enables us to find the limiting error of the LR test.
An interesting variant of the spiked Wigner matrix is the spiked IID matrix, where the entries of the noise matrix are independent and identically distributed without any symmetry constraint. Such a model is natural when the $(i, j)$-entry and the $(j, i)$-entry are sampled separately. While it may seem natural to apply the singular value decomposition (SVD) in this case, it is in fact better to symmetrize the matrix first and then apply PCA, to exploit the symmetry of the signal; for example, with Gaussian noise, the threshold for the former is $\lambda = 1$, whereas it is $\lambda = 1/2$ for the latter. The idea of symmetrization can even be combined with the entrywise transformation in [23] if the noise is non-Gaussian, to prove that reliable detection is possible when the SNR exceeds half the reciprocal of the Fisher information of the noise density. However, to the best of our knowledge, no results are known for the subcritical case.
Our goal for the spiked IID model is to prove the results that correspond to those for the spiked Wigner matrices, including the asymptotic normality of the log-LR for the spiked IID matrices and precise formulas for the mean and the variance of the limiting Gaussian in the subcritical regime. We remark that a consequence of the asymptotic normality of the log-LR and Le Cam’s first lemma is the impossibility of reliable detection in this regime, which establishes the threshold for reliable detection for the spiked IID matrices in the non-Gaussian noise case.
1.1 Main contribution on spiked Wigner matrices
For the spiked Wigner matrices, we prove the asymptotic normality of the log-LR (Theorem 2.3) and compute the limit of the sum of the Type-I and Type-II errors of the LR test (Corollary 2.5).
Let $\mathbb{P}_1$ and $\mathbb{P}_0$ be the probability distributions of a spiked Wigner matrix under $\boldsymbol{H}_1$ and $\boldsymbol{H}_0$, respectively. If the noise is Gaussian, from the normalization in Definition 2.1 and Assumption 2.2, the likelihood ratio of $\mathbb{P}_1$ with respect to $\mathbb{P}_0$ is given by
(1.1)
where we used that $x_i^2 = 1/N$ for all $i$. The random off-diagonal term in (1.1),
coincides with the free energy of the Sherrington–Kirkpatrick model, for which the asymptotic normality is known. (The diagonal part is independent of the off-diagonal part, and its asymptotic (log-)normality can be easily proved by the CLT.)
If the noise is non-Gaussian, however, then we instead have
(1.2)
where $g$ and $g_d$ are the densities of the (normalized) off-diagonal terms and the diagonal terms, respectively. Applying the Taylor expansion,
and comparing it with the Taylor expansion of the exponential function, it is possible to rewrite the leading-order terms of the off-diagonal part of the LR as
(1.3)
where the coefficients depend only on $g$ and its derivatives. (See (4.3)–(4.6) for precise definitions.) Since the two sums in (1.3) are not independent, the asymptotic normality of (1.3) is not obvious and in fact requires a delicate analysis.
The main technical difficulty in the analysis of the log-LR arises from the fact that the mechanisms behind the Gaussian convergence of the two sums in (1.3), which we call the spin glass term and the CLT term, respectively, are markedly different. While the latter can be readily handled by the CLT, the former is a result from the theory of spin glass, based on a combinatorial analysis of the graph structures formed by the spike entries. To overcome the difficulty, adopting the strategy of [1], we decompose the spin glass term into two parts that are asymptotically orthogonal to each other, conditional on the noise. (See Proposition 4.4 for more detail.) We can then identify the part that depends on the CLT term and express the dependence through a conditional expectation. Combining this part of the spin glass term with the CLT term, we can prove the Gaussian convergence with precise formulas for the mean and the variance of the limiting distribution.
From the asymptotic normality, it is immediate to obtain the limit of the sum of the Type-I and Type-II errors of the LR test. The limiting error in Corollary 2.5 converges to $0$ as the SNR approaches the threshold, which shows that the detection threshold is indeed the reciprocal of the Fisher information. While the limiting error of the LR test also converges to $0$ in the supercritical case, which can be indirectly deduced from the fact that the transformed PCA in [23] reliably detects the signal there, our result is not directly applicable to that case; the asymptotic normality holds only in the subcritical regime, and the asymptotic behavior of the log-LR is expected to change above the threshold, which corresponds to the low-temperature regime in the theory of spin glass.
The Rademacher prior we consider in this work is one of the most natural models for rank-one signal-plus-noise data, and it also corresponds to the most fundamental spin configuration in the SK model. Our method relies on this special structure of the spike, at least at the technical level. We believe that our main result would change with a different prior, e.g., the spherical prior instead of the Rademacher prior, but the change should be restricted to a subleading term in the mean and the variance of the limiting Gaussian. Note that such a change would not occur with Gaussian noise, where the corresponding quadratic term vanishes; this can be made rigorous since the asymptotic behavior of the log-LR with the spherical prior coincides with that of many i.i.d. priors, including the Rademacher prior [13]. See the discussion at the end of Section 2.2 and also the numerical experiments in Section 3 for details.
1.2 Main contribution on spiked IID matrices
For the spiked IID matrices, we also prove the asymptotic normality of the log-LR (Theorem 2.8) and compute the limit of the sum of the Type-I and Type-II errors of the LR test (Corollary 2.9).
Suppose that the data matrix is a rank-one spike plus noise, where the signal is an $N$-dimensional vector and the noise is an $N \times N$ IID matrix. (See Definition 2.7 for a precise definition.) If the noise is Gaussian, similarly to the spiked Wigner case in (1.1), we find that the likelihood ratio is
(1.4)
Since the sum of an entry and its transposed counterpart is again a Gaussian random variable with twice the variance, under $\boldsymbol{H}_0$ the exponent in the off-diagonal term is equal in distribution to its Wigner-case counterpart with the SNR doubled. Thus, compared to the exponent in the off-diagonal term in (1.1), we find that the SNR is effectively doubled, and in particular, reliable detection is impossible if $\lambda < 1/2$. Note that in this case it is possible to reliably detect the signal by PCA applied to the symmetrized matrix if $\lambda > 1/2$, whereas the largest singular value of the data matrix separates from the other singular values only when $\lambda > 1$ (see, e.g., Section 3.1 of [6]).
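In formulas, with Gaussian noise the doubling can be seen directly (a standard computation, written here in our own notation, with $W$ the IID Gaussian noise array and $\widetilde{W}$ a fresh symmetric Gaussian array):

```latex
\sum_{i \neq j} x_i x_j W_{ij}
  \;=\; \sum_{i<j} x_i x_j \bigl(W_{ij} + W_{ji}\bigr)
  \;\overset{d}{=}\; \sqrt{2}\,\sum_{i<j} x_i x_j \widetilde{W}_{ij},
```

since $W_{ij} + W_{ji}$ is a centered Gaussian with twice the variance; the exponent thus has twice the variance of its Wigner-case counterpart, which is the claimed doubling of the effective SNR.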
If the noise is non-Gaussian, however, we face the same issue as in the spiked Wigner matrices; the log-LR can be decomposed into the spin glass term and the CLT term as in (1.3), which are not independent. The issue can be resolved by adapting the analysis for the spiked Wigner matrices, where we need to consider and in place of and . It is then possible to prove the asymptotic normality of the log-LR with precise formulas for the mean and the variance.
The results in Theorem 2.8 and Corollary 2.9 show that reliable detection is impossible in the subcritical regime. On the other hand, if we transform the data matrix entrywise and then symmetrize the transformed matrix, the largest eigenvalue of the resulting matrix becomes an outlier above this threshold, and hence reliable detection is possible by PCA. Thus, we can conclude that this is indeed the threshold for reliable detection. See the discussion at the end of Section 2.3 for details.
1.3 Related works
The phase transition of the largest eigenvalue of the spiked Wigner matrix has been extensively studied in random matrix theory. It is known as the BBP transition after the seminal work by Baik, Ben Arous, and Péché [2] for spiked (complex) Wishart ensembles. Corresponding results for the spiked Wigner matrix were proved under various assumptions [22, 14, 7, 5]. For the behavior of the eigenvector associated with the largest eigenvalue of the spiked Wigner matrix, we refer to [5].
The impossibility of reliable detection in the subcritical case with Gaussian noise was considered by Montanari, Reichman, and Zeitouni [18] and by Johnstone and Onatski [16]. The optimality of PCA for Gaussian noise, and the sub-optimality of PCA and the optimality of the transformed PCA for non-Gaussian noise, were proved by Perry, Wein, Bandeira, and Moitra [23] under mild assumptions on the prior. We remark that the threshold may deviate from this value depending on the prior [13].
The LR test and its fundamental limit were analyzed by El Alaoui, Krzakala, and Jordan [13] under the assumption that the signal is an i.i.d. random vector and the noise is Gaussian. We remark that there also exists a non-LR test based on the linear spectral statistics of the data matrix, which is optimal if the noise is Gaussian [9].
The detection problem for the spiked rectangular model, introduced by Johnstone [15], has also been widely studied. We refer to [19, 20, 12, 17, 10] for the weak detection under Gaussian noise, including the optimal error of the hypothesis test, and [23, 17] for the sub-optimality of the PCA under non-Gaussian noise. For the spiked IID matrices and their applications, we refer to [8] and references therein.
The SK model is one of the most fundamental models of spin glass. The model was introduced by Sherrington and Kirkpatrick [24] as a mean-field version of the Edwards–Anderson model [11]. The (deterministic) limit of the free energy of the SK model was first predicted by Parisi [21] and later rigorously proved by Talagrand [25]. The fluctuation of the free energy of the SK model was studied by Aizenman, Lebowitz, and Ruelle [1]; they showed the Gaussian convergence of the fluctuation of the free energy in the high-temperature case, which also holds with non-Gaussian interactions. To the best of our knowledge, the fluctuation in the low-temperature case, including the zero-temperature case, remains open.
1.4 Organization of the paper
The rest of the paper is organized as follows. In Section 2, we precisely define the model and state the main results. In Section 3, we present some examples and conduct numerical experiments to compare the outcomes with the theoretical results. In Sections 4 and 5, we prove our main results on the spiked Wigner matrices. We conclude the paper in Section 6 by summarizing our results and proposing future research directions. Some details of the proof, including the proof for the results on the spiked IID matrices, can be found in Appendix.
2 Main results
2.1 Definition of the model
We begin by precisely defining the model we consider in this paper. We suppose that the noise matrix is a Wigner matrix, which is defined as follows:
Definition 2.1 (Wigner matrix).
We say an $N \times N$ random matrix $H = (H_{ij})$ is a Wigner matrix if $H$ is a real symmetric matrix and its upper-triangular entries $H_{ij}$ ($i \le j$) are independent centered real random variables satisfying
- (Normalization) $N\,\mathbb{E}[H_{ij}^2] = 1$ for all $i < j$, and $N\,\mathbb{E}[H_{ii}^2] = w_2$ for all $i$, for some constant $w_2 > 0$, independent of $N$.
- (Finite moment) For any $D \in \mathbb{N}$, there exists a constant $C_D$, independent of $N$, such that $\mathbb{E}\,|\sqrt{N} H_{ij}|^D \le C_D$ for all $i \le j$.
For the data matrix $M$, we assume that it is a spiked Wigner matrix, i.e., $M = \sqrt{\lambda}\, \boldsymbol{x} \boldsymbol{x}^{\top} + H$, and we additionally assume the following:
Assumption 2.2.
For the spike $\boldsymbol{x} = (x_1, \dots, x_N)$, we assume that the normalized entries $\sqrt{N} x_i$ are i.i.d. random variables with the Rademacher distribution.
For the noise matrix $H$, let $g$ and $g_d$ be the densities of the normalized off-diagonal entries $\sqrt{N} H_{ij}$ ($i < j$) and the normalized diagonal entries, respectively. We assume the following:
- The density functions $g$ and $g_d$ are smooth, positive everywhere, and symmetric (about $0$).
- The functions $g$, $g_d$, and all of their derivatives vanish at infinity.
- The functions $g^{(k)}/g$ and $g_d^{(k)}/g_d$ are polynomially bounded, i.e., for any positive integer $k$ there exist constants $C_k, c_k > 0$, independent of $N$, such that $|g^{(k)}(w)| \le C_k (1 + |w|)^{c_k}\, g(w)$ and $|g_d^{(k)}(w)| \le C_k (1 + |w|)^{c_k}\, g_d(w)$ uniformly on $\mathbb{R}$. (Here, $g^{(k)}$ and $g_d^{(k)}$ are the $k$-th derivatives of $g$ and $g_d$, respectively.)
Note that we assume the off-diagonal entries to be identically distributed. While it is possible to consider a spiked Wigner matrix with a non-identically distributed noise matrix by imposing suitable conditions on the density functions, we refrain from doing so to avoid unnecessary complications. Similarly, we remark that some conditions in Definition 2.1 and Assumption 2.2 are of a technical nature, especially the finite moment condition and the symmetry condition on the density of the noise, and we believe that they can be relaxed. However, for the sake of brevity, we do not attempt to relax them further in this paper.
2.2 Spiked Wigner matrices
Recall that the SNR $\lambda$ for the alternative $\boldsymbol{H}_1$ is fixed. Our main result is the following asymptotic normality of the log-LR statistic for testing $\boldsymbol{H}_0$ versus $\boldsymbol{H}_1$.
Theorem 2.3.
Remark 2.4.
Applying Le Cam’s third lemma, we also find from Theorem 2.3 that the log likelihood ratio under $\boldsymbol{H}_1$ converges in distribution to a Gaussian with the same variance and with the mean shifted by the variance.
If the off-diagonal density is Gaussian, the constants in Theorem 2.3 simplify, and the result coincides with Theorem 2 of [13] (and the remark below it), which states that the log-LR converges in distribution to a Gaussian with
From the comparison with the Gaussian case, ignoring the subleading term in (2.2), the result in Theorem 2.3 heuristically shows that the effective SNR is $\lambda F_H$ when the noise is non-Gaussian.
The change of the effective SNR with non-Gaussian noise can be understood as follows: under $\boldsymbol{H}_1$, if we apply a function $h$ to a normalized entry of the data matrix, then by the Taylor expansion,
(2.3)
Approximating the right-hand side further by its leading terms,
we find that the matrix obtained by transforming the data entrywise via $h$ is again approximately a spiked Wigner matrix with a modified SNR, which shows that the SNR is effectively changed. The approximation can be made rigorous [23], and after optimizing over $h$, it can be shown that the effective SNR is $\lambda F_H$, which is independent of the prior under mild assumptions. We remark that the optimal transformation is $h = -g'/g$, and it is unique up to a constant factor.
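The following numpy sketch (our own illustration, not code from [23]) demonstrates the effect for a hypothetical bimodal noise density $g = \tfrac{1}{2}\mathcal{N}(-a, s^2) + \tfrac{1}{2}\mathcal{N}(a, s^2)$ with unit variance, using the score transform $h = -g'/g$ described above; the Fisher information $F_H$ is estimated by Monte Carlo:

```python
import numpy as np

a, s2 = 0.95, 1 - 0.95**2      # mixture 0.5*N(-a,s2) + 0.5*N(a,s2), unit variance

def h(w):
    # score function -g'(w)/g(w) of the Gaussian-mixture density above
    return (w - a * np.tanh(a * w / s2)) / s2

rng = np.random.default_rng(1)
n, lam = 1000, 0.5             # SNR below the classical PCA threshold 1

# Fisher information F_H = E[h(W)^2], estimated by Monte Carlo
w_mc = rng.choice([-a, a], 10**5) + np.sqrt(s2) * rng.standard_normal(10**5)
F = np.mean(h(w_mc) ** 2)

x = rng.choice([-1.0, 1.0], size=n) / np.sqrt(n)
w = rng.choice([-a, a], (n, n)) + np.sqrt(s2) * rng.standard_normal((n, n))
w = np.triu(w, 1) + np.triu(w, 1).T + np.diag(np.diag(w))   # make it symmetric
y = np.sqrt(lam) * np.outer(x, x) + w / np.sqrt(n)

yt = h(np.sqrt(n) * y) / np.sqrt(n * F)   # entrywise transform + renormalization
print("estimated Fisher information:", F)                       # well above 1
print("plain PCA top eigenvalue:", np.linalg.eigvalsh(y)[-1])   # near bulk edge 2
print("transformed PCA top eig.:", np.linalg.eigvalsh(yt)[-1])  # outlier: lam*F > 1
```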
An immediate consequence of Theorem 2.3 is the following estimate for the error probability of the LR test.
Corollary 2.5.
Note that the error converges to $0$ as $\lambda \to 1/F_H$, which is consistent with the discussion below Theorem 2.3 that the effective SNR is $\lambda F_H$.
Remark 2.6.
Since the likelihood ratio test minimizes the sum of the Type-I and Type-II errors by the Neyman–Pearson lemma, we find that for any hypothesis test of $\boldsymbol{H}_0$ versus $\boldsymbol{H}_1$, the sum of its Type-I and Type-II errors is asymptotically bounded from below by the limiting error in Corollary 2.5.
The most non-trivial part of Theorem 2.3 and Corollary 2.5 is the third term in (2.2),
(2.4)
We conjecture that this term is due to the Rademacher prior and that it would change under a different prior. Our heuristic argument is as follows. In [9], an algorithm was proposed for hypothesis testing based on the linear spectral statistics (LSS) of the data matrix, which is optimal if the noise is Gaussian; the LSS of a matrix whose eigenvalues are $\mu_1, \dots, \mu_N$ is $\sum_{i=1}^{N} f(\mu_i)$
for a function $f$ continuous on an open neighborhood of $[-2, 2]$. If the noise is non-Gaussian, the algorithm in [9] can be improved by pre-transforming the data matrix entrywise as in (2.3). The error of the improved algorithm is
(2.5)
where
with
The error (2.5) differs from the error in Corollary 2.5 only in the term corresponding to (2.4). The hypothesis test based on the eigenvalues is spherical in nature, since it does not depend on the eigenvectors or on any specific direction. This means that the prior naturally associated with the LSS-based test is the spherical prior, i.e., the uniform distribution on the unit sphere. We thus conjecture that the error term in (2.4) depends on the prior of the model.
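For concreteness, computing an LSS is a one-liner; here is a minimal numpy sketch with a generic test function (our own illustration; the optimal $f$ of [9] is not reproduced here):

```python
import numpy as np

def linear_spectral_statistic(m, f):
    """LSS of a symmetric matrix: the sum of f over its eigenvalues."""
    return np.sum(f(np.linalg.eigvalsh(m)))

# toy usage on a pure-noise Wigner matrix; any f continuous near [-2, 2] works
rng = np.random.default_rng(2)
n = 1000
g = rng.standard_normal((n, n))
h = (g + g.T) / np.sqrt(2 * n)
print(linear_spectral_statistic(h, lambda mu: mu ** 2))  # ~ n for the semicircle law
```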
2.3 Spiked IID matrices
The results on the spiked Wigner matrices in Theorem 2.3 can be naturally extended to the spiked IID matrices defined as follows:
Definition 2.7 (IID matrix).
We say an $N \times N$ real random matrix $A = (A_{ij})$ is an IID matrix if the entries $A_{ij}$ ($1 \le i, j \le N$) are independent and identically distributed centered real random variables satisfying (i) $N\,\mathbb{E}[A_{ij}^2] = 1$ and (ii) for any $D \in \mathbb{N}$, there exists a constant $C_D$, independent of $N$, such that $\mathbb{E}\,|\sqrt{N} A_{ij}|^D \le C_D$.
For a spiked IID matrix of the form $Y = \sqrt{\lambda}\, \boldsymbol{x} \boldsymbol{x}^{\top} + A$, we have the following result.
Theorem 2.8.
Suppose that $Y$ is a spiked IID matrix and that Assumption 2.2 holds for $g$, where $g$ is the density of the normalized entries of the noise matrix. Let
(2.6)
If the SNR is subcritical, then the log likelihood ratio of $\mathbb{P}_1$ with respect to $\mathbb{P}_0$ under $\boldsymbol{H}_0$, defined by
(2.7)
converges in distribution as $N \to \infty$ to a Gaussian, where
(2.8)
See Appendix C for the proof of Theorem 2.8. From Theorem 2.8, we can obtain the following estimate for the error probability of the LR test.
Corollary 2.9.
Comparing Theorem 2.8 with Theorem 2.3, we find that the effective SNR is doubled from the effective SNR for the spiked Wigner matrices. The doubling of the effective SNR can also be checked heuristically from the LR as follows. By definition,
(2.9)
The off-diagonal term can be further decomposed into the upper-triangular part and the lower-triangular part, i.e.,
The upper-triangular part and the lower-triangular part are almost independent, which effectively squares the LR and hence doubles the log-LR. However, these parts are not exactly independent, due to the spike, and this is why the quadratic term in (2.8) is not simply four times the quadratic term in (2.2), which is what one would obtain by doubling the SNR in (2.2). Note that the linear term vanishes in (2.8) due to the centering assumption: the contribution arising from the off-diagonal terms gets doubled for the spiked IID matrices, while the contribution originating from the diagonal terms does not.
Theorem 2.8 implies the impossibility of reliable detection in the subcritical regime. To conclude that this is indeed the threshold for reliable detection, we can consider the following variant of PCA. We first apply the function $h$ entrywise to the data matrix as in (2.3). After a suitable normalization, the transformed matrix is approximately a spiked IID matrix. In the next step, we symmetrize the (normalized) transformed matrix, i.e., we compute
From the Taylor expansion, we find that
which is approximately a spiked Wigner matrix. Following the proof in [23], it is then immediate to show that reliable detection is possible by applying PCA to the symmetrized matrix in the supercritical regime.
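A minimal numpy sketch of the symmetrization step (our own illustration, with Gaussian noise and an assumed SNR $1/2 < \lambda < 1$, so that plain SVD fails while PCA on the symmetrized matrix succeeds):

```python
import numpy as np

rng = np.random.default_rng(3)
n, lam = 1000, 0.9                 # 1/2 < lam < 1
x = rng.choice([-1.0, 1.0], size=n) / np.sqrt(n)
a = rng.standard_normal((n, n))    # IID (asymmetric) Gaussian noise
y = np.sqrt(lam) * np.outer(x, x) + a / np.sqrt(n)

sym = (y + y.T) / np.sqrt(2)       # symmetrization: effective SNR doubles to 2*lam
# the top singular value of y stays at the bulk edge (near 2) since lam < 1,
# while the top eigenvalue of sym separates from the bulk since 2*lam > 1
print("top singular value of Y:     ", np.linalg.svd(y, compute_uv=False)[0])
print("top eigenvalue of (Y+Y^T)/√2:", np.linalg.eigvalsh(sym)[-1])
```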
3 Examples and numerical experiments
In this section, we consider a spiked Wigner model with non-Gaussian noise and conduct numerical experiments to compare the error from the simulation with the theoretical error in Corollary 2.5.
Suppose that the density of the noise matrix is
We sample the normalized noise entries from this density and construct the noise matrix accordingly. We also sample the spike so that its normalized entries are i.i.d. Rademacher, and we form the data matrix as in Definition 2.1. From a direct calculation, it is straightforward to see that the constants in Theorem 2.3 are
Assume that the SNR is subcritical. By Corollary 2.5, the limiting error of the LR test is
(3.1)
We perform a numerical experiment with samples of the matrix under $\boldsymbol{H}_0$ and $\boldsymbol{H}_1$, respectively, varying the SNR. Since finding the exact log-LR in (1.2) for each sample is not computationally viable (as it requires averaging the likelihood ratios over all $2^N$ possible spikes), we apply a Monte Carlo method as follows: for each sample, we compute
(3.2)
for $k = 1, 2, \dots, K$, where the sampled spikes are i.i.d. Rademacher, independent of the data. We use the test statistic given by the average of these values, i.e.,
(3.3)
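The following numpy sketch (our own illustration) mirrors this Monte Carlo procedure; the helper `mc_log_lr` and the placeholder Gaussian density are ours, and the diagonal factor of the LR is omitted for brevity:

```python
import numpy as np

def mc_log_lr(y, lam, log_g, k, rng):
    """Monte Carlo proxy for the log-LR, mirroring (3.2)-(3.3): average, over
    k sampled Rademacher spikes, the likelihood ratio with that spike fixed.
    `log_g` is the log of the normalized off-diagonal noise density."""
    n = y.shape[0]
    iu = np.triu_indices(n, 1)
    w = (np.sqrt(n) * y)[iu]                  # normalized off-diagonal entries
    log_lrs = np.empty(k)
    for j in range(k):
        x = rng.choice([-1.0, 1.0], size=n)   # sqrt(n) times a Rademacher spike
        shift = (np.sqrt(lam / n) * np.outer(x, x))[iu]
        log_lrs[j] = np.sum(log_g(w - shift) - log_g(w))
    m = log_lrs.max()                         # log-sum-exp for stability
    return m + np.log(np.mean(np.exp(log_lrs - m)))

# toy usage under H0 with a placeholder standard-Gaussian density
def log_g(w):
    return -0.5 * w**2 - 0.5 * np.log(2 * np.pi)

rng = np.random.default_rng(4)
n = 300
g_mat = rng.standard_normal((n, n))
y0 = (g_mat + g_mat.T) / np.sqrt(2 * n)
print("MC log-LR under H0:", mc_log_lr(y0, lam=0.5, log_g=log_g, k=200, rng=rng))
```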
The behavior of the test statistic under and for the SNR and can be found in Figure 1.
To compare the error from the numerical experiment with the theoretical error, we consider the test that accepts $\boldsymbol{H}_0$ if the test statistic is below a threshold and rejects it otherwise. The result can be found in Figure 2(a), which shows that the empirical LLR error closely matches the theoretical limiting error in (3.1).
In the algorithm proposed in [9], the data matrix was pre-transformed by
The test statistic based on the LSS of was used for the hypothesis test, which is given by
We accept if
and reject otherwise. The limiting error of this test is
(3.4) |
Note that the error of this algorithm is larger than that of the LR test in (3.1), since the limiting error is a decreasing function of the effective SNR. See also Figure 2(a). We remark that the difference between the empirical error and the limiting error increases as the SNR increases, possibly due to a finite-size effect, i.e., the contribution from the small terms that were ignored in the asymptotic expansions.
To numerically test the conjecture introduced in the last paragraph of Section 2.2, we perform a simulation with the spherical prior instead of the Rademacher prior. More precisely, we use the same density for the noise matrix, while we sample the spike by first drawing a vector of i.i.d. standard Gaussians and then normalizing it,
so that the spike is distributed uniformly on the unit sphere. In the numerical experiment with the spherical prior, we again generate samples of the matrix under $\boldsymbol{H}_0$ and $\boldsymbol{H}_1$, respectively, varying the SNR as before. For the LR test, we use the test statistic in (3.3), with the sampled spikes drawn uniformly from the unit sphere. The numerical error of the LR test can be found in Figure 2(b), which shows that the empirical LLR error closely matches the theoretical limiting error of the LSS test in (3.4). From the numerical experiment, we can check that the error of the LR test depends on the prior, and the experiment also suggests that the LSS-based test is optimal in the sense that it minimizes the sum of the Type-I and Type-II errors.
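For completeness, drawing a spherical-prior spike is a two-line operation (numpy assumed; our own snippet):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2000
z = rng.standard_normal(n)
x = z / np.linalg.norm(z)   # uniformly distributed on the unit sphere in R^n
```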
4 Proof of main theorem
In this section, we provide the detailed proof of Theorem 2.3. We use the following shorthand notations for the proof.
Notational Remark 4.1.
For and , which can be deterministic numbers and/or random variables depending on , we use the notation if for any (small) and (large) there exists such that whenever . For an event , we say that holds with high probability if for any (large) there exists such that whenever . For a sequence of random variables, the notation denotes the convergence in distribution as $N \to \infty$.
The following lemma will be frequently used in the proof of Theorem 2.3.
Lemma 4.2.
For any and for any positive integer ,
Moreover, if we define
then
Lemma 4.2 can be proved by applying standard arguments using Markov’s inequality and the finite moment condition in Definition 2.1. See Appendix A for the detailed proof.
From Lemma 4.2, we find that for any given , , , and are uniformly bounded by with high probability. Thus, we have the following estimate for the i.i.d. sums of these random variables.
Lemma 4.3.
Define
(4.1)
For any ,
(4.2)
4.1 Decomposition of LR
Assume $\boldsymbol{H}_0$. Recall that
(4.3)
We first focus on the off-diagonal part of (4.3). Recall the definition of in Lemma 4.3. Note that is an odd function of if is odd, whereas it is an even function if is even. Recall that . Since from Lemma 4.2, by the Taylor expansion,
(4.4)
uniformly on and . Taking the logarithm and Taylor expanding again, we obtain
(4.5)
As briefly explained in Section 1.1, we first analyze the term involving , which we call the spin glass term, by applying the strategy of [1] conditionally on . We can then identify the part dependent on , which is in Proposition 4.4. The sum of and the term involving in (4.5), which we call the CLT term, can be written as a sum of i.i.d. random variables that depend only on , and we apply the central limit theorem to prove its convergence. Since , the sum and its fluctuation are negligible by the law of large numbers; we call the term involving in (4.5) the LLN term.
4.2 Spin glass term
We have the following result for the convergence of the term involving . Let and denote the conditional expectation and the conditional variance given , respectively.
Proposition 4.4.
Let be as defined in (4.6). Set
(4.7)
Then, there exist random variables and such that
(4.8)
where and satisfy the following:
1. The conditional distribution of given converges (in distribution) to , where
(4.9)
2. Conditional on ,
(4.10)
where is a random variable whose asymptotic law is a centered Gaussian with variance
(4.11)
3. The fluctuations and are asymptotically orthogonal to each other under , the conditional -norm given .
Here, we would like to recall the definition of the conditional distribution. For random variables and with density, let be the joint density function of and and the marginal density function of . Then, the conditional distribution of given has the conditional density function satisfying the relation
Note that is a function of and .
Remark 4.5.
From the first part of Proposition 4.4, we can also find that converges in distribution to without conditioning on . In general, suppose that the conditional distribution of a sequence of random variables given converges and the limiting distribution is independent of . Then, as for some that is independent of . Since , we find that and thus converges in distribution to the same limiting distribution.
Remark 4.6.
Let be the sign function, defined as if and if . Then, is a random variable such that . Since is an even function of under , we can see that and are independent. Since is an odd function of under , this implies that the conditional density function of given is symmetric (about ), and in particular, for any odd .
Example.
To better understand the decomposition in Proposition 4.4, we consider the following example that was considered in Section 3. Recall that in this example the density of the normalized entries of the noise matrix is
and it is assumed that . The derivatives of are given by
Then, from the definition in (LABEL:eq:ABC),
and
Applying the identity , we thus find that
(See Remark 4.6 for a definition of .) Recall that depends only on . In Section 5, under , we will see that depends only on whereas depends only on . Due to the independence between and discussed in Remark 4.6, we find that the conditional distribution of depends only on the random variables . Thus, Equation (4.8) in this case means the decomposition of , conditional on , into one part that depends only on and the other part that depends only on . It is then obvious that the fluctuations and are asymptotically orthogonal to each other under , due to their independence.
4.3 CLT part
From Proposition 4.4, we find that the part involving in the spin glass term is asymptotically . Thus, combining it with the term involving in (4.5), we are led to consider
(4.12)
which is the sum of i.i.d. random variables that depend only on . (The contribution from in (4.10) will be considered separately at the end of the proof.) By the central limit theorem, (4.12) converges to a normal random variable whose mean and variance we compute in this subsection.
We first compute the mean of (4.12). By definition,
(4.13)
Note that for any ,
(4.14)
and
We also have from the definition of and the law of total expectation, , that
(4.15)
Using integration by parts, we obtain that
(4.16)
Note that the boundary term in the integration by parts used in (4.16) vanishes since for some sequences and such that and , respectively, which can be checked from the fact that
(See Assumption 2.2 for the definition of the constants and .) Thus, from (4.15),
(4.17)
and we conclude from (4.12), (4.13), (4.14), and (4.15) that
(4.18)
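The integration-by-parts identities used here are the standard ones; for the reader's convenience, under the decay assumptions in Assumption 2.2 (and writing $F_H$ for the Fisher information of $g$),

```latex
\int_{\mathbb{R}} g'(w)\,\mathrm{d}w = 0, \qquad
\int_{\mathbb{R}} w\, g'(w)\,\mathrm{d}w = -\int_{\mathbb{R}} g(w)\,\mathrm{d}w = -1,
\qquad
F_H = \int_{\mathbb{R}} \frac{(g'(w))^2}{g(w)}\,\mathrm{d}w
    = -\int_{\mathbb{R}} \Bigl(\frac{g'}{g}\Bigr)'(w)\, g(w)\,\mathrm{d}w,
```

with all boundary terms vanishing since $g$ and its derivatives vanish at infinity.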
4.4 LLN part
4.5 Diagonal part
For the diagonal part in (4.3), it is not hard to see that
(4.30)
which does not depend on the spike. Applying the integration-by-parts formulas, one can evaluate the mean and the variance to show that
(4.31)
4.6 Proof of Theorem 2.3
We now combine Proposition 4.4 and Equations (4.5), (4.26), (4.29), and (4.31) to find that the log-LR converges to the Gaussian whose mean is
(4.32)
where we also applied the integration-by-parts formulas (4.16) and (4.28). (See also Remark 4.5.) Similarly, we can also find that the variance of the limiting Gaussian is
(4.33)
This concludes the proof of Theorem 2.3.
5 Asymptotic independence of fluctuations
In this section, we prove Proposition 4.4. The key idea for the proof is to decompose the fluctuation of into two parts. First, notice that
which can be checked by considering the cases and separately. Thus, if we let
(5.1)
then it is direct to see that
(5.2)
Recall that . By Taylor expanding the right hand side, we get
(5.3)
and we define
(5.4)
It is then obvious that .
In the rest of this section, we first find the fluctuation of , which can be readily obtained by the CLT. Then, we adopt the strategy for the proof of Proposition 2.2 in [1] to conclude the proof of Proposition 4.4 by finding the asymptotic fluctuation of .
5.1 Conditional law of
Recall the definition of in (5.4). Since
and , we find that
We want to apply the central limit theorem to the array , conditional on . Since and are -independent random variables, we find that is a non-negative, -independent random variable. In particular, is a non-negative, -independent constant.
Suppose that . Then, since
we find that there exists a constant , independent of , such that
almost surely. Further, since
Lyapunov’s condition for the central limit theorem is satisfied with the array as
almost surely. Since Lyapunov’s condition is scaling-invariant, we find that the central limit theorem holds for , conditional on , and the asymptotic law of the random variable
(5.5)
is given by a centered Gaussian with variance defined in (4.11), since
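For reference, Lyapunov's condition for a triangular array $\{X_{N,i}\}_{i \le N}$ (applied above conditionally on the noise, so that all expectations are conditional) reads: for some $\delta > 0$,

```latex
s_N^2 = \sum_{i=1}^{N} \operatorname{Var}(X_{N,i}), \qquad
\lim_{N \to \infty} \frac{1}{s_N^{2+\delta}}
  \sum_{i=1}^{N} \mathbb{E}\,\bigl|X_{N,i} - \mathbb{E} X_{N,i}\bigr|^{2+\delta} = 0,
```

in which case $s_N^{-1} \sum_{i} (X_{N,i} - \mathbb{E} X_{N,i})$ converges in distribution to a standard Gaussian.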
5.2 Conditional law of
In this subsection, we prove the first part of Proposition 4.4 by a combinatorial approach. Consider the product in (5.1) as a polynomial of . A term in this polynomial is of the form
(5.6)
If the multiplicity of any factor in (5.6) is odd, then the term (5.6) is negligible in (5.1); for instance, if the multiplicity of the first factor is odd, then the contributions of the term (5.6) from the two spin values exactly cancel each other. In particular, this implies that (5.6) is actually of the form
(5.7)
where the pairs do not repeat. We then find a one-to-one correspondence between the right-hand side of (5.7) and a simple and closed graph on the vertex set , i.e., a graph with no self-edges, no repeated edges, and no vertices of odd degree.
where denotes the edge set of . A straightforward calculation shows that
(5.8)
We further decompose by considering the simple loops, or cycles, contained in . (A simple loop is a graph in which every vertex has degree $2$.) For example, if the edges of are , , , , , , then the degree of the vertex is and thus is not a simple loop. However, can be decomposed into simple loops and whose sets of edges are and , respectively.
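A counting fact used repeatedly below (and in Appendix B) is the number of simple loops of a given length: for $\ell \ge 3$,

```latex
\#\{\text{simple loops of length } \ell \text{ on } N \text{ labeled vertices}\}
  = \binom{N}{\ell} \frac{(\ell-1)!}{2}
  = \frac{N(N-1)\cdots(N-\ell+1)}{2\ell}
  \sim \frac{N^{\ell}}{2\ell}.
```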
Let us denote a simple loop by the lowercase . Note that for a simple loop , , since the ’s are independent and their conditional density functions given are symmetric. (See Remark 4.6.) We introduce the following result from [1]:
Lemma 5.1.
The random variable converges in probability to 0.
For the proof of the first part of Proposition 4.4, we claim that
(5.9)
To prove the claim, we first notice that if , i.e., a simple loop has edges, then
(5.10)
for any and any positive integer , which can be checked from an inequality . Similarly, applying Markov’s inequality, we can show that
(5.11)
for any positive integer , where is the set of edges of . We thus find from the Taylor expansion that
From the expansion (5.9), we find that it remains to prove convergence results for the sums of and . We claim the following:
Lemma 5.2.
Let , . Let be given as in (4.9). Then, as ,
(i) The conditional law given converges in probability to a centered Gaussian with variance , and
(ii) converges in probability to .
Since
it is immediate to prove the first part of Proposition 4.4 from Lemmas 5.1 and 5.2. Furthermore, since the source of the fluctuation of is , which is orthogonal to the part that generates the fluctuation of under , we find that the third part of Proposition 4.4, the asymptotic orthogonality of and , holds.
6 Conclusion and future works
In this paper, we studied the detection problem for the spiked Wigner model. Assuming the Rademacher prior, we proved the asymptotic normality of the log likelihood ratio with precise formulas for the mean and the variance of the limiting Gaussian. We computed the limiting error of the likelihood ratio test, and we conducted numerical experiments to compare the results from the experiments with the theoretical error. We also obtained the corresponding results for the spiked IID model where the noise is asymmetric.
A natural extension of the current work is to prove the corresponding results for other models, especially the spiked Wigner models with the spherical prior or the sparse Rademacher prior, and prove the conjecture that the limiting distribution of the log likelihood ratio depends on the prior. It is also of interest to extend the techniques in this paper to spiked rectangular models.
Acknowledgments
We thank the anonymous referees for their helpful comments and suggestions. The third author thanks Ji Hyung Jung and Seong-Mi Seo for helpful comments. The work of the second author was performed while he was affiliated with the Stochastic Analysis and Application Research Center (SAARC) at KAIST. The first author was supported in part by a National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2021R1C1C11008539 and No. 2023R1A2C1005843). The second and third authors were supported in part by a National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2019R1A5A1028324).
Appendix A Proof of Lemmas 4.2 and 4.3
In this section, we provide the detail of the proof of the high probability estimates, Lemmas 4.2 and 4.3.
Proof of Lemma 4.2.
We first prove the estimate for . For given , choose an integer . By applying Markov’s inequality,
for any sufficiently large , where we used the finite moment assumption for in Definition 2.1. This proves that . Note that it also implies that .
From the polynomial bound assumption for in Assumption 2.2,
Thus, for any
for any sufficiently large . Since is arbitrary, this proves that . The estimate for can be proved in a similar manner.
The last part of the lemma follows from the fact that and can be written as polynomials of ’s and ’s, respectively. ∎
Proof of Lemma 4.3.
We prove the first inequality for the case only; the general case can be handled in a similar manner.
Set . From Lemma 4.2, we find that for any , are uniformly bounded by with high probability. Define
Since holds with high probability, by taking a union bound, also holds with high probability. For any , applying Hoeffding’s inequality (or a high-order Markov inequality) on , we obtain
(A.1)
for any sufficiently large . Moreover, since is polynomially bounded, for any ,
for any sufficiently large , which shows that
(A.2)
and after multiplying both sides by , we conclude that the first inequality of Lemma 4.3 holds. ∎
Appendix B Proof of Lemma 5.2
We first prove the limit of . Let
Note that for distinct simple loops and , and are orthogonal with respect to both and . To check this, assume that the edge is in but not in . Then, from the independence of ’s,
A similar idea can also be used to prove the orthogonality of and with respect to .
We now consider . Let . Since the number of possible simple loops of length in the vertex set is , we have from the orthogonality of ’s that
(B.1)
Also, for any given ,
(B.2)
Since and , by the orthogonality and the dominated convergence theorem we have
(B.3)
as .
We next consider . Our goal is to show that as , and for this purpose, it suffices to prove that for any fixed where we let , since (B.1) and (B.2) imply that
as in , uniformly in . Note that
(B.4)
The covariance vanishes if there is no common edge, and otherwise
(B.5)
for large , where is a fixed number greater than and is the number of common edges of and . However, the number of isomorphism types of and is finite for a fixed , and the number of possible choices of and for each isomorphism type is , verifying that the right hand side of (B.4) converges to . Similarly, we can show that as , which combined with (B.3) proves (ii).
It remains to estimate . Our goal is to show that
(B.6)
in probability. The coefficient is the number of pairings of , or equivalently the -th moment of the Gaussian. Since the conditional odd moments of given are zero, (i) immediately follows from (B.3) and (B.6). For the proof of (B.6), we notice that it suffices to prove the statement for as in the paragraph above.
By conditional independence of ’s, we have
(B.7)
where ’s run over simple loops with length at most . In the first term, however, the contribution from the case where the ’s are paired into non-intersecting double loops cancels with the second term. Thus, we have
(B.8)
where runs over multigraphs with even edge multiplicity, which can be described as a union of loops of length at most , but not as a disjoint union of double loops. From an estimate
which can be proved as (5.11), each summand in the right hand side of (B.8) is . However, the sum of the degrees of all the vertices, which is equal to , is at least , by the assumptions on . Therefore, the contribution of each isomorphism type to the sum is , hence . Since there are finitely many isomorphism types of , we obtain (B.6). This completes the proof of Lemma 5.2.
Appendix C Proof of Theorem 2.8
In this section, we provide the detail of the proof of Theorem 2.8.
C.1 Preliminaries
Recall that we have the following estimates from Lemma 4.2: for any and for any ,
(C.1)
In particular, for any , and are uniformly bounded by with high probability. We set
(C.2)
where we denote by the -th derivative of . We also have from Lemma 4.3 that
(C.3)
for any .
Assume $\boldsymbol{H}_0$. The likelihood ratio is
(C.4)
As in the spiked Wigner case, the diagonal part can be handled separately, since it does not depend on the spike or on the off-diagonal part. Since is an odd function of if is odd and an even function if is even, by the Taylor expansion,
(C.5)
uniformly on and . Following the decomposition in (4.5), we thus get
(C.6)
where we let
(C.7)
Note that from Lemma 4.2,
(C.8)
C.2 Spin glass part
We have the following result for the convergence of the term involving . Let and denote the conditional expectation and the conditional variance given , respectively. The counterpart of Proposition 4.4 is as follows:
Proposition C.1.
Set
(C.9)
Then, there exist random variables and such that
(C.10)
where and satisfy the following:
1. The conditional distribution of given converges in distribution to , with
(C.11)
2. Conditional on ,
(C.12)
where is a random variable whose asymptotic law is a centered Gaussian with variance
(C.13)
3. The fluctuations and are asymptotically orthogonal to each other under , the conditional -norm given .
C.3 CLT part
From Proposition C.1, we find that the terms involving in the right hand side of (C.6) are
(C.14)
which is the sum of i.i.d. random variables that depend only on . By the central limit theorem, it converges to a normal random variable whose mean and variance we compute in this subsection.
We first compute the mean of (C.14). By definition,
and using the integration-by-parts formulas in (4.14) and (4.16), we find that
(C.15)
where we used the independence of and and also that . We thus get
(C.16)
C.4 LLN part and the diagonal part
From the definition in (4.6) and the integration-by-parts formula in (4.28), we find that
(C.19)
For the diagonal part in (C.4), it can be readily checked that
(C.20)
C.5 Proof of Theorem 2.8
Appendix D Proof of Proposition C.1
In this section, we prove Proposition C.1 by following the strategy in Section 5. We decompose the fluctuation of into two parts. Let
(D.1)
From a direct calculation involving the Taylor expansion as in Section 5, we get
(D.2)
and we define
(D.3)
By definition, .
For the second part of Proposition C.1, we notice that the array almost surely satisfies Lyapunov’s condition for the central limit theorem conditional on . Thus, if we define the random variable
(D.4)
its asymptotic law is a centered Gaussian with variance
(D.5)
This proves the second part of Proposition C.1.
Recall that is a simple graph on the vertex set with no self-edges and no vertices of odd degree and is the edge set of . We define
which gives us
We now introduce the results on that correspond to Lemmas 5.1 and 5.2. Recall that we denote simple loops by the lowercase .
Lemma D.1.
Let , . Let be given as in (C.11). Then, as ,
(i) The random variable converges in probability to 0,
(ii) The conditional law given converges in probability to a centered Gaussian with variance , and
(iii) converges in probability to .
As in the proof of Proposition 4.4, from the relation
the first part of Proposition C.1 follows from Lemma D.1. The third part of Proposition C.1 is also obvious, since , which generates the fluctuation of , is orthogonal to under .
We now prove Lemma D.1. In order to prove the limit of , we let
Recall that and the number of possible simple loops of length in the vertex set is . We then have
(D.6)
Moreover, for any given , there exists a deterministic such that for all ,
(D.7)
Thus, if , by the orthogonality and the dominated convergence theorem we have
(D.8)
as . Now, the proof of the second and the third parts of Lemma D.1 is a verbatim copy of the proof of Lemma 5.2 in Appendix B, and we omit the details.
Appendix E Sketch of the proof for the Spiked IID matrices
In this section, we provide a sketch of the proof of Theorem 2.8. Recall that
(E.1)
As in the spiked Wigner case, the diagonal part can be handled separately, since it does not depend on the spike or on the off-diagonal part. Following the decomposition in (4.5), we set
(E.2)
Then, we get
(E.3)
where we let
(E.4)
We first consider the spin glass part and prove a counterpart of Proposition 4.4. Set
(E.5)
Then, there exist random variables and such that
(E.6)
where and are asymptotically orthogonal to each other under and satisfy the following: the conditional distribution of given converges in distribution to , with
(E.7)
Further,
(E.8)
where is a random variable whose asymptotic law is a centered Gaussian with variance
(E.9)
We next follow the analysis in Section 4.3 for the CLT part. The terms involving in the right hand side of (E.3) are
By definition,
and
(E.10)
where we used the independence of and and also that . We thus get
(E.11)
The computation of the variance is similar, and we only record the important changes here:
and
Using the identities
and
we get
(E.12)
For the LLN part, we simply get
(E.13)
The computation for the diagonal part coincides with that for the spiked Wigner case, and we get
(E.14)
References
- [1] M. Aizenman, J. L. Lebowitz, and D. Ruelle. Some rigorous results on the Sherrington-Kirkpatrick spin glass model. Comm. Math. Phys., 112(1):3–20, 1987.
- [2] J. Baik, G. Ben Arous, and S. Péché. Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices. Ann. Probab., 33(5):1643–1697, 2005.
- [3] J. Baik and J. O. Lee. Fluctuations of the free energy of the spherical Sherrington-Kirkpatrick model. J. Stat. Phys., 165(2):185–224, 2016.
- [4] J. Barbier, M. Dia, N. Macris, F. Krzakala, T. Lesieur, and L. Zdeborová. Mutual information for symmetric rank-one matrix estimation: A proof of the replica formula. In Advances in Neural Information Processing Systems 29, pages 424–432. 2016.
- [5] F. Benaych-Georges and R. R. Nadakuditi. The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Adv. Math., 227(1):494–521, 2011.
- [6] F. Benaych-Georges and R. R. Nadakuditi. The singular values and vectors of low rank perturbations of large rectangular random matrices. J. Multivariate Anal., 111:120–135, 2012.
- [7] M. Capitaine, C. Donati-Martin, and D. Féral. The largest eigenvalues of finite rank deformation of large Wigner matrices: convergence and nonuniversality of the fluctuations. Ann. Probab., 37(1):1–47, 2009.
- [8] Y. Chen, C. Cheng, and J. Fan. Asymmetry helps: Eigenvalue and eigenvector analyses of asymmetrically perturbed low-rank matrices. Ann. Statist., 49(1):435, 2021.
- [9] H. W. Chung and J. O. Lee. Weak detection of signal in the spiked Wigner model. In Proceedings of the 36th International Conference on Machine Learning, volume 97, pages 1233–1241, 2019.
- [10] E. Dobriban. Sharp detection in PCA under correlations: All eigenvalues matter. Ann. Statist., 45(4):1810–1833, 2017.
- [11] S. F. Edwards and P. W. Anderson. Theory of spin glasses. J. Phys. F, 5(5):965, 1975.
- [12] A. El Alaoui and M. I. Jordan. Detection limits in the high-dimensional spiked rectangular model. In Conference On Learning Theory, pages 410–438, 2018.
- [13] A. El Alaoui, F. Krzakala, and M. I. Jordan. Fundamental limits of detection in the spiked Wigner model. Ann. Statist., 48(2):863–885, 2020.
- [14] D. Féral and S. Péché. The largest eigenvalue of rank one deformation of large Wigner matrices. Comm. Math. Phys., 272(1):185–228, 2007.
- [15] I. M. Johnstone. On the distribution of the largest eigenvalue in principal components analysis. Ann. Statist., 29(2):295–327, 2001.
- [16] I. M. Johnstone and A. Onatski. Testing in high-dimensional spiked models. Ann. Statist., 48(3):1231–1254, 2020.
- [17] J. H. Jung, H. W. Chung, and J. O. Lee. Detection of signal in the spiked rectangular models. In Proceedings of the 38th International Conference on Machine Learning, volume 139, pages 5158–5167, 2021.
- [18] A. Montanari, D. Reichman, and O. Zeitouni. On the limitation of spectral methods: from the Gaussian hidden clique problem to rank one perturbations of Gaussian tensors. IEEE Trans. Inform. Theory, 63(3):1572–1579, 2017.
- [19] A. Onatski, M. J. Moreira, and M. Hallin. Asymptotic power of sphericity tests for high-dimensional data. Ann. Statist., 41(3):1204–1231, 2013.
- [20] A. Onatski, M. J. Moreira, and M. Hallin. Signal detection in high dimension: the multispiked case. Ann. Statist., 42(1):225–254, 2014.
- [21] G. Parisi. A sequence of approximated solutions to the SK model for spin glasses. J. Phys. A, 13(4):L115, 1980.
- [22] S. Péché. The largest eigenvalue of small rank perturbations of Hermitian random matrices. Probab. Theory Related Fields, 134(1):127–173, 2006.
- [23] A. Perry, A. S. Wein, A. S. Bandeira, and A. Moitra. Optimality and sub-optimality of PCA I: Spiked random matrix models. Ann. Statist., 46(5):2416–2451, 2018.
- [24] D. Sherrington and S. Kirkpatrick. Solvable model of a spin-glass. Phys. Rev. Lett., 35(26):1792, 1975.
- [25] M. Talagrand. The Parisi formula. Ann. of Math. (2), 163(1):221–263, 2006.