
Asymptotic Normality of Log Likelihood Ratio and Fundamental Limit of the Weak Detection for Spiked Wigner Matrices

Hye Won Chung (School of Electrical Engineering, KAIST, Daejeon, 34141, Korea; email: [email protected])
Jiho Lee (Korea Science Academy of KAIST, Busan, 10547, Korea; email: [email protected])
Ji Oon Lee (Department of Mathematical Sciences, KAIST, Daejeon, 34141, Korea; email: [email protected])
Abstract

We consider the problem of detecting the presence of a signal in a rank-one spiked Wigner model. For general non-Gaussian noise, assuming that the signal is drawn from the Rademacher prior, we prove that the log likelihood ratio (LR) of the spiked model against the null model converges to a Gaussian when the signal-to-noise ratio is below a certain threshold. The threshold is optimal in the sense that reliable detection is possible above it by a transformed principal component analysis (PCA). From the mean and the variance of the limiting Gaussian for the log-LR, we compute the limit of the sum of the Type-I and Type-II errors of the likelihood ratio test. We also prove similar results for a rank-one spiked IID model where the noise is asymmetric but the signal is symmetric.

1 Introduction

One of the most fundamental questions in statistics and data science is to detect a signal from noisy data. The simplest model for the ‘signal-plus-noise’ data is the spiked Wigner matrix, where the signal is a vector and the noise is a symmetric random matrix. In this model, the data matrix is of the form

M=H+\sqrt{\lambda}\,{\boldsymbol{x}}{\boldsymbol{x}}^{T},

where the signal ${\boldsymbol{x}}$ is an $N$-dimensional vector and the noise matrix $H$ is an $N\times N$ Wigner matrix. (See Definition 2.1 and Assumption 2.2 for a precise definition.) The parameter $\lambda$ is the signal-to-noise ratio (SNR).

Signal detection/recovery problems for the spiked Wigner matrix have been widely studied in the past decades. One of the most natural ways of detecting the signal is to analyze the eigenvalues of the data matrix, especially its largest eigenvalues, which is called the principal component analysis (PCA). With the normalization $\mathbb{E}[|H_{ij}|^{2}]=N^{-1}$ for $i\neq j$ and $\|{\boldsymbol{x}}\|=1$, it is well-known in random matrix theory that the bulk of the spectrum is supported on $[-2,2]$ and the largest eigenvalue exhibits the following sharp phase transition. If $\lambda>1$, the largest eigenvalue of $M$ separates from the bulk, and hence the signal can be reliably detected by PCA. On the other hand, if $\lambda<1$, the largest eigenvalue converges to $2$ and its fluctuation does not depend on $\lambda$; thus, the largest eigenvalue carries no information on the signal and PCA fails.
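To make the transition concrete, the following minimal sketch (ours, in Python with NumPy; the matrix size and SNR values are illustrative) samples a spiked Wigner matrix with Gaussian noise and a Rademacher spike and compares the largest eigenvalue with the bulk edge at $2$; for $\lambda>1$ it concentrates near $\sqrt{\lambda}+1/\sqrt{\lambda}>2$, while for $\lambda<1$ it sticks to the bulk edge.

```python
import numpy as np

def spiked_wigner(N, lam, rng):
    """Sample M = H + sqrt(lam) * x x^T with Gaussian Wigner noise
    (off-diagonal variance 1/N) and a Rademacher spike (sqrt(N) x_i = +-1)."""
    G = rng.standard_normal((N, N))
    H = (G + G.T) / np.sqrt(2 * N)          # symmetric, E[H_ij^2] = 1/N
    x = rng.choice([-1.0, 1.0], size=N) / np.sqrt(N)
    return H + np.sqrt(lam) * np.outer(x, x)

rng = np.random.default_rng(0)
for lam in (0.5, 2.0):
    M = spiked_wigner(2000, lam, rng)
    top = np.linalg.eigvalsh(M)[-1]
    print(f"lambda = {lam}: largest eigenvalue = {top:.3f}  (bulk edge = 2)")
```

For $\lambda=2$ the printed value is close to $\sqrt{2}+1/\sqrt{2}\approx 2.12$, whereas for $\lambda=0.5$ it stays near $2$.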

The spectral threshold for PCA is $\lambda_{c}=1$, and it is optimal when the noise is Gaussian in the sense that reliable detection is impossible below the spectral threshold [23]. If the noise is non-Gaussian, however, the conventional PCA is sub-optimal and can be improved by transforming the data entrywise [4, 23]. In this case, with an optimal entrywise transformation, under suitable conditions, the threshold is lowered to $1/F$, where $F$ is the Fisher information of the (off-diagonal) noise distribution, which is strictly larger than $1$ if the noise is non-Gaussian. (See (2.1) for its definition.)

Below the threshold $\lambda_{c}=1/F$ in the non-Gaussian noise case, reliable detection is impossible [23], and it is only possible to consider hypothesis testing between the null model $\lambda=0$ and the alternative $\lambda>0$, which is called weak detection. The fundamental limit of weak detection can be obtained by analyzing the performance of the likelihood ratio (LR) test, which minimizes the sum of the Type-I and Type-II errors by the Neyman–Pearson lemma. For Gaussian noise, the log-LR coincides with the free energy of the Sherrington–Kirkpatrick (SK) model of spin glass. The limiting fluctuation of the free energy of the SK model is Gaussian in the high-temperature case, and the result is directly applicable to the proof of the asymptotic normality of the log-LR and the limiting error of the LR test.

While the performance of the LR test is of fundamental importance, to the best of our knowledge, no results have been proved when the noise is non-Gaussian. The main difficulty in the analysis of the LR test is that the connection between the log-LR and the free energy of the SK model is absent when the noise is non-Gaussian; though the free energy of the SK model with non-Gaussian interaction is known to converge to a normal distribution in some cases [1, 3], this does not prove the Gaussian convergence of the log-LR.

In this paper, we focus on the likelihood ratio between the null hypothesis ${\boldsymbol{H}}_{0}:\lambda=0$ and the alternative hypothesis ${\boldsymbol{H}}_{1}:\lambda=\omega>0$ in the subcritical case $\omega<1/F$, under the assumption that the distribution of the signal, called the prior, is Rademacher, i.e., the $x_{i}$'s are i.i.d. with $\mathbb{P}(\sqrt{N}x_{i}=1)=\mathbb{P}(\sqrt{N}x_{i}=-1)=1/2$. Our goals are to prove that the asymptotic normality holds even with non-Gaussian noise under mild assumptions on the noise, and to compute the exact mean and variance of the limiting Gaussian, which enables us to find the limiting error of the LR test.

An interesting variant of the spiked Wigner matrix is the spiked IID matrix, where the entries of the noise matrix are independent and identically distributed without the symmetry constraint. Such a model is natural when the $(i,j)$-entry and the $(j,i)$-entry are sampled separately. While it seems natural to apply the singular value decomposition (SVD) in this case, it is in fact better to symmetrize the matrix first and then apply PCA to exploit the symmetry of the signal; for example, with Gaussian noise, the threshold for the former is $\lambda_{c}=1$, whereas it is $\lambda_{c}=1/2$ for the latter. The idea of symmetrization can even be combined with the entrywise transformation in [23] if the noise is non-Gaussian, to prove that reliable detection is possible if $\lambda>1/(2F)$. However, to the best of our knowledge, no results are known for the subcritical case $\lambda<1/(2F)$.

Our goal for the spiked IID model is to prove the results that correspond to those for the spiked Wigner matrices, including the asymptotic normality of the log-LR for the spiked IID matrices and the precise formulas for the mean and the variance of the limiting Gaussian, for $\lambda<1/(2F)$. We remark that a consequence of the asymptotic normality of the log-LR and Le Cam's first lemma is the impossibility of reliable detection for $\lambda<1/(2F)$, which establishes that the threshold for reliable detection for the spiked IID matrices is given by $\lambda_{c}=1/(2F)$ in the non-Gaussian noise case.

1.1 Main contribution on spiked Wigner matrices

For the spiked Wigner matrices,

  • (Asymptotic normality - Theorem 2.3) we prove that the log-LR converges to a Gaussian, and

  • (Optimal error - Corollary 2.5) we compute the limit of the sum of the Type-I and Type-II errors of the LR test.

Let $\mathbb{P}_{0}$ and $\mathbb{P}_{1}$ be the probability distributions of a spiked Wigner matrix $M$ under ${\boldsymbol{H}}_{0}$ and ${\boldsymbol{H}}_{1}$, respectively. If the noise is Gaussian, from the normalization in Definition 2.1 and Assumption 2.2, the likelihood ratio of $\mathbb{P}_{1}$ with respect to $\mathbb{P}_{0}$ is given by

\begin{split}&{\mathcal{L}}(M;\omega)=\frac{\mathrm{d}\mathbb{P}_{1}}{\mathrm{d}\mathbb{P}_{0}}\\ &:=\frac{1}{2^{N}}\sum_{{\boldsymbol{x}}}\prod_{i<j}\frac{\exp(-\frac{N}{2}(\sqrt{N}M_{ij}-\sqrt{\omega N}x_{i}x_{j})^{2})}{\exp(-\frac{N}{2}(\sqrt{N}M_{ij})^{2})}\prod_{k}\frac{\exp(-\frac{N}{2}(\sqrt{N}M_{kk}-\sqrt{\omega N}x_{k}^{2})^{2})}{\exp(-\frac{N}{2}(\sqrt{N}M_{kk})^{2})}\\ &=\frac{1}{2^{N}}\sum_{{\boldsymbol{x}}}\prod_{i<j}\exp\left(\sqrt{\omega}N^{2}M_{ij}x_{i}x_{j}-\frac{\omega}{2}\right)\prod_{k}\exp\left(N\sqrt{\omega}M_{kk}-\frac{\omega}{2}\right),\end{split} (1.1)

where we used that $x_{i}^{2}=N^{-1}$ for all $i$. The random off-diagonal term in (1.1),

\frac{1}{2^{N}}\sum_{{\boldsymbol{x}}}\prod_{i<j}\exp\left(\sqrt{\omega}N^{2}M_{ij}x_{i}x_{j}\right)=\frac{1}{2^{N}}\sum_{{\boldsymbol{x}}}\exp\left(\sqrt{\omega}N^{2}\sum_{i<j}M_{ij}x_{i}x_{j}\right),

coincides with the free energy of the Sherrington–Kirkpatrick model, for which the asymptotic normality is known. (The diagonal part is independent of the off-diagonal part, and its asymptotic (log-)normality can be easily proved by the CLT.)

If the noise is non-Gaussian, however, we instead have

{\mathcal{L}}(M;\omega)=\frac{1}{2^{N}}\sum_{{\boldsymbol{x}}}\prod_{i<j}\frac{p(\sqrt{N}M_{ij}-\sqrt{\omega N}x_{i}x_{j})}{p(\sqrt{N}M_{ij})}\prod_{k}\frac{p_{d}(\sqrt{N}M_{kk}-\sqrt{\omega N}x_{k}^{2})}{p_{d}(\sqrt{N}M_{kk})}\,, (1.2)

where $p$ and $p_{d}$ are the densities of the (normalized) off-diagonal terms and the diagonal terms, respectively. Applying the Taylor expansion,

\frac{p(\sqrt{N}M_{ij}-\sqrt{\omega N}x_{i}x_{j})}{p(\sqrt{N}M_{ij})}\simeq 1-\sqrt{\omega N}\,\frac{p^{\prime}(\sqrt{N}M_{ij})}{p(\sqrt{N}M_{ij})}x_{i}x_{j}+\frac{\omega}{2N}\frac{p^{\prime\prime}(\sqrt{N}M_{ij})}{p(\sqrt{N}M_{ij})}\,,

and after comparing it to the Taylor expansion of the exponential function, it is possible to rewrite the leading order terms of the off-diagonal part of the LR as

\frac{1}{2^{N}}\sum_{{\boldsymbol{x}}}\exp\left(\sum_{i<j}(A_{ij}x_{i}x_{j}+B_{ij})\right), (1.3)

where $A_{ij}$ and $B_{ij}$ depend only on $p(\sqrt{N}M_{ij})$ and its derivatives. (See (4.3)–(4.5) for a precise definition.) Since $A_{ij}$ and $B_{ij}$ are not independent, the asymptotic normality of (1.3) is not obvious and, in fact, requires a delicate analysis.

The main technical difficulty in the analysis of the log-LR arises from the fact that the mechanisms for the Gaussian convergence of the terms containing $A_{ij}$, which we call the spin glass term, and of those containing $B_{ij}$, which we call the CLT term, are markedly different. While the latter can be readily proved by the CLT, the former is a result from the theory of spin glass, based on a combinatorial analysis of the graph structure involving the $A_{ij}$'s. To overcome the difficulty, adopting the strategy of [1], we decompose the spin glass term, $2^{-N}\sum_{{\boldsymbol{x}}}\exp(\sum_{i<j}A_{ij}x_{i}x_{j})$, into two parts that are asymptotically orthogonal to each other, conditional on $(B_{ij})$. (See Proposition 4.4 for more detail.) We can then identify the part that depends on $(B_{ij})$ and express the dependence by a conditional expectation on $(B_{ij})$. Combining the $B$-dependent part of the spin glass term with the CLT term, we can prove the Gaussian convergence with precise formulas for the mean and the variance of the limiting distribution.

From the asymptotic normality, it is immediate to obtain the sum of the Type-I and Type-II errors of the LR test. The limiting error in Corollary 2.5 converges to $0$ as $\lambda\nearrow 1/F$, which shows that the detection threshold is given by $\lambda_{c}=1/F$. While the limiting error of the LR test in the case $\lambda>1/F$ also converges to $0$, which can be indirectly deduced from the fact that the transformed PCA in [23] can reliably detect the signal, our result is not directly applicable to this case; the asymptotic normality holds only in the regime $\lambda<1/F$, and it is expected that the asymptotic behavior of the log-LR changes if $\lambda>1/F$, which corresponds to the low-temperature regime in the theory of spin glass.

The Rademacher prior we consider in this work is one of the most natural models for rank-one signal-plus-noise data, and it also corresponds to the most fundamental spin configuration in the SK model. Our method relies on this special structure of the spike, at least at the technical level. We believe that our main result would change with a different prior, e.g., the spherical prior instead of the Rademacher prior, but that the change is restricted to a subleading term (proportional to $\omega^{2}$) in the mean and the variance of the limiting Gaussian. Note that such a change would not occur with Gaussian noise, where the quadratic term (in $\omega$) vanishes; this can be made rigorous since the asymptotic behavior of the log-LR with the spherical prior coincides with that for many i.i.d. priors, including the Rademacher prior [13]. See the discussion at the end of Section 2.2 and also the numerical experiments in Section 3 for details.

1.2 Main contribution on spiked IID matrices

For the spiked IID matrices, we also prove the asymptotic normality of the log-LR (Theorem 2.8) and compute the limit of the sum of the Type-I and Type-II errors of the LR test (Corollary 2.9).

Suppose that the data matrix $Y$ is of the form $Y=X+\sqrt{\lambda}{\boldsymbol{x}}{\boldsymbol{x}}^{T}$, where the signal ${\boldsymbol{x}}$ is an $N$-dimensional vector and the noise $X$ is an $N\times N$ IID matrix. (See Definition 2.7 for a precise definition.) If the noise is Gaussian, similar to the spiked Wigner case in (1.1), we find that the likelihood ratio

\begin{split}{\mathcal{L}}(Y;\omega)&:=\frac{\mathrm{d}\mathbb{P}_{1}}{\mathrm{d}\mathbb{P}_{0}}:=\frac{1}{2^{N}}\sum_{{\boldsymbol{x}}}\prod_{i,j=1}^{N}\frac{\exp(-\frac{N}{2}(\sqrt{N}Y_{ij}-\sqrt{\omega N}x_{i}x_{j})^{2})}{\exp(-\frac{N}{2}(\sqrt{N}Y_{ij})^{2})}\\ &=\frac{1}{2^{N}}\sum_{{\boldsymbol{x}}}\prod_{i<j}\exp\left(\sqrt{\omega}N^{2}(Y_{ij}+Y_{ji})x_{i}x_{j}-\omega\right)\prod_{k}\exp\left(N\sqrt{\omega}Y_{kk}-\frac{\omega}{2}\right).\end{split} (1.4)

Since $(Y_{ij}+Y_{ji})$ is again a Gaussian random variable whose variance is $2N^{-1}$, under ${\boldsymbol{H}}_{0}$ the exponent in the off-diagonal term, $(\sqrt{\omega}N^{2}(Y_{ij}+Y_{ji})x_{i}x_{j}-\omega)$, is equal in distribution to $(\sqrt{2\omega}N^{2}Y_{ij}x_{i}x_{j}-\omega)$. Thus, when compared with the exponent in the off-diagonal term in (1.1), we find that the SNR $\omega$ is effectively doubled, and in particular, reliable detection is impossible if $\omega<1/2$. Note that in this case it is possible to reliably detect the signal if $\omega>1/2$ by PCA for the symmetrized matrix $(Y+Y^{T})/\sqrt{2}$, whereas the largest singular value of $Y$ is separated from the other singular values only when $\omega>1$ (see, e.g., Section 3.1 of [6]).

If the noise is non-Gaussian, however, we face the same issue as for the spiked Wigner matrices; the log-LR can be decomposed into the spin glass term and the CLT term as in (1.3), which are not independent. The issue can be resolved by adapting the analysis for the spiked Wigner matrices, where we need to consider $A_{ij}+A_{ji}$ and $B_{ij}+B_{ji}$ in place of $A_{ij}$ and $B_{ij}$. It is then possible to prove the asymptotic normality of the log-LR with precise formulas for the mean and the variance.

The results in Theorem 2.8 and Corollary 2.9 show that reliable detection is impossible if $\lambda<1/(2F)$. On the other hand, if we transform the data matrix entrywise and then symmetrize the transformed matrix, the largest eigenvalue of the resulting matrix becomes an outlier when $\lambda>1/(2F)$, and hence reliable detection is possible by PCA. Thus, we can conclude that the threshold for reliable detection is $\lambda_{c}=1/(2F)$. See the discussion at the end of Section 2.3 for details.

1.3 Related works

The phase transition of the largest eigenvalue of the spiked Wigner matrix has been extensively studied in random matrix theory. It is known as the BBP transition after the seminal work by Baik, Ben Arous, and Péché [2] for spiked (complex) Wishart ensembles. Corresponding results for the spiked Wigner matrix were proved under various assumptions [22, 14, 7, 5]. For the behavior of the eigenvector associated with the largest eigenvalue of the spiked Wigner matrix, we refer to [5].

The impossibility of reliable detection in the case $\lambda<1$ with Gaussian noise was considered by Montanari, Reichman, and Zeitouni [18] and by Johnstone and Onatski [16]. The optimality of the PCA for Gaussian noise, as well as the sub-optimality of the PCA and the optimality of the transformed PCA for non-Gaussian noise, were proved by Perry, Wein, Bandeira, and Moitra [23] under mild assumptions on the prior. We remark that the threshold $\lambda_{c}$ may not be exactly $1$, depending on the prior [13].

The analysis of the LR test and its fundamental limit were studied by El Alaoui, Krzakala, and Jordan [13] under the assumption that the signal is an i.i.d. random vector and the noise is Gaussian. We remark that there exists a non-LR test based on the linear spectral statistics of the data matrix, which is optimal if the noise is Gaussian [9].

The detection problem for the spiked rectangular model, introduced by Johnstone [15], has also been widely studied. We refer to [19, 20, 12, 17, 10] for the weak detection under Gaussian noise, including the optimal error of the hypothesis test, and [23, 17] for the sub-optimality of the PCA under non-Gaussian noise. For the spiked IID matrices and their applications, we refer to [8] and references therein.

The SK model is one of the most fundamental models of spin glass. The model was introduced by Sherrington and Kirkpatrick [24] as a mean-field version of the Edwards–Anderson model [11]. The (deterministic) limit of the free energy of the SK model was first predicted by Parisi [21], and it was later rigorously proved by Talagrand [25]. The fluctuation of the free energy of the SK model was studied by Aizenman, Lebowitz, and Ruelle [1]; they showed the Gaussian convergence of the fluctuation of the free energy in the high-temperature case, which also holds with non-Gaussian noise. To the best of our knowledge, the fluctuation in the low-temperature case, including the zero-temperature case, still remains open.

1.4 Organization of the paper

The rest of the paper is organized as follows. In Section 2, we precisely define the model and state the main results. In Section 3, we present some examples and conduct numerical experiments to compare the outcomes with the theoretical results. In Sections 4 and 5, we prove our main results on the spiked Wigner matrices. We conclude the paper in Section 6 by summarizing our results and proposing future research directions. Some details of the proof, including the proofs of the results on the spiked IID matrices, can be found in the Appendix.

2 Main results

2.1 Definition of the model

We begin by precisely defining the model we consider in this paper. We suppose that the noise matrix is a Wigner matrix, which is defined as follows:

Definition 2.1 (Wigner matrix).

We say an $N\times N$ random matrix $H=(H_{ij})$ is a Wigner matrix if $H$ is a real symmetric matrix and the $H_{ij}$ ($1\leq i\leq j\leq N$) are independent centered real random variables satisfying

  • (Normalization) $N\mathbb{E}[H_{ij}^{2}]=1$ for all $1\leq i<j\leq N$, and $N\mathbb{E}[H_{ii}^{2}]=w_{2}$ for some constant $w_{2}$, independent of $N$;

  • (Finite moment) for any $D>0$, there exists a constant $C_{D}$, independent of $N$, such that $N^{\frac{D}{2}}\mathbb{E}[H_{ij}^{D}]\leq C_{D}$ for all $1\leq i\leq j\leq N$.

For the data matrix $M$, we assume that it is a spiked Wigner matrix, i.e., $M=H+\sqrt{\lambda}{\boldsymbol{x}}{\boldsymbol{x}}^{T}$, and we additionally assume the following:

Assumption 2.2.

For the spike ${\boldsymbol{x}}$, we assume that the normalized entries $\sqrt{N}x_{i}$ are i.i.d. random variables with the Rademacher distribution.

For the noise matrix $H$, let $p$ and $p_{d}$ be the densities of the normalized off-diagonal entries $\sqrt{N}H_{ij}$ and the normalized diagonal entries $\sqrt{N}H_{ii}$, respectively. We assume the following:

  • The density functions $p$ and $p_{d}$ are smooth, positive everywhere, and symmetric (about $0$).

  • The functions $p$, $p_{d}$, and all their derivatives vanish at infinity.

  • The functions $p^{(s)}/p$ and $p_{d}^{(s)}/p_{d}$ are polynomially bounded, i.e., for any positive integer $s$ there exist constants $C_{s},r_{s}>0$, independent of $N$, such that $|p^{(s)}(x)/p(x)|,|p_{d}^{(s)}(x)/p_{d}(x)|\leq C_{s}|x|^{r_{s}}$ uniformly in $x$. (Here, $p^{(s)}$ and $p_{d}^{(s)}$ are the $s$-th derivatives of $p$ and $p_{d}$, respectively.)

Note that we assume the off-diagonal entries are identically distributed. While it is possible to consider a spiked Wigner matrix with a non-identically distributed noise matrix by assuming certain conditions on the density functions of its entries, we refrain from doing so to avoid unnecessary complications. Similarly, we remark that some conditions in Definition 2.1 and Assumption 2.2, especially the finite moment condition and the symmetry condition for the density of the noise, are due to technical reasons, and we believe that it is possible to relax these conditions. However, for the sake of brevity, we do not attempt to relax them further in this paper.

2.2 Spiked Wigner matrices

Recall that the SNR $\omega$ for the alternative ${\boldsymbol{H}}_{1}$ is fixed. Our main result is the following asymptotic normality of the log-LR statistic when testing ${\boldsymbol{H}}_{0}:\lambda=0$ against ${\boldsymbol{H}}_{1}:\lambda=\omega>0$.

Theorem 2.3.

Suppose that $M$ is a spiked Wigner matrix satisfying Assumption 2.2. Define

F:=\int_{-\infty}^{\infty}\frac{(p^{\prime}(x))^{2}}{p(x)}\mathrm{d}x,\quad F_{d}:=\int_{-\infty}^{\infty}\frac{(p_{d}^{\prime}(x))^{2}}{p_{d}(x)}\mathrm{d}x,\quad G:=\int_{-\infty}^{\infty}\frac{(p^{\prime\prime}(x))^{2}}{p(x)}\mathrm{d}x\,. (2.1)

If $\omega F<1$, then the log likelihood ratio $\log{\mathcal{L}}(M;\omega)$ of $\mathbb{P}_{1}$ with respect to $\mathbb{P}_{0}$ under ${\boldsymbol{H}}_{0}$ (defined in (1.2)) converges in distribution to ${\mathcal{N}}(-\rho,2\rho)$ as $N\to\infty$, where

\rho:=-\frac{1}{4}\left(\log(1-\omega F)+\omega(F-2F_{d})+\frac{\omega^{2}}{4}(2F^{2}-G)\right). (2.2)
Remark 2.4.

Applying Le Cam's third lemma, we also find from Theorem 2.3 that the log likelihood ratio $\log{\mathcal{L}}(M;\omega)$ under ${\boldsymbol{H}}_{1}$ converges in distribution to ${\mathcal{N}}(\rho,2\rho)$.

If the off-diagonal density $p$ is Gaussian, we find that $F=1$ and $G=2$, and the result in Theorem 2.3 coincides with Theorem 2 (and the Remark below it) of [13], which states that $\log{\mathcal{L}}(M;\omega)$ converges in distribution to ${\mathcal{N}}(-\rho^{\prime},2\rho^{\prime})$, where

\rho^{\prime}=-\frac{1}{4}\big{(}\log(1-\omega)+\omega(1-2F_{d})\big{)}.

From the comparison with the Gaussian case, ignoring the term $(\omega^{2}/4)(2F^{2}-G)$ in (2.2), the result in Theorem 2.3 heuristically shows that the effective SNR is $\omega F$ when the noise is non-Gaussian.

The change of the effective SNR with non-Gaussian noise can be understood as follows: under ${\boldsymbol{H}}_{1}$, if we apply a function $q$ to a normalized entry $\sqrt{N}M_{ij}$, then by the Taylor expansion,

q(\sqrt{N}M_{ij})=q(\sqrt{N}H_{ij}+\sqrt{\omega N}x_{i}x_{j})\approx q(\sqrt{N}H_{ij})+\sqrt{\omega N}q^{\prime}(\sqrt{N}H_{ij})x_{i}x_{j}. (2.3)

Approximating the right-hand side further by

\begin{split}q(\sqrt{N}H_{ij})+\sqrt{\omega N}q^{\prime}(\sqrt{N}H_{ij})x_{i}x_{j}&\approx q(\sqrt{N}H_{ij})+\sqrt{\omega N}\mathbb{E}[q^{\prime}(\sqrt{N}H_{ij})]x_{i}x_{j}\\ &=\sqrt{N}\left(\frac{q(\sqrt{N}H_{ij})}{\sqrt{N}}+\sqrt{\omega}\mathbb{E}[q^{\prime}(\sqrt{N}H_{ij})]x_{i}x_{j}\right),\end{split}

we find that the matrix obtained by transforming $M$ entrywise via $q$ is of the form $\sqrt{N}(Q+\sqrt{\omega^{\prime}}{\boldsymbol{x}}{\boldsymbol{x}}^{T})$, with $Q_{ij}=q(\sqrt{N}H_{ij})/\sqrt{N}$ and $\sqrt{\omega^{\prime}}=\sqrt{\omega}\mathbb{E}[q^{\prime}(\sqrt{N}H_{ij})]$, which shows that the SNR is effectively changed. The approximation can be made rigorous [23], and after optimizing over $q$, it can be shown that the effective SNR is $\omega F$, which is independent of the prior under mild assumptions. We remark that the optimal $q$ is given by $-p^{\prime}/p$, and it is unique up to a constant factor.
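As an illustration of the transformed PCA (a sketch of ours, not the implementation of [23]), for the sech-type density used in Section 3 the optimal transformation is $q(y)=-p^{\prime}(y)/p(y)=(\pi/2)\tanh(\pi y/2)$; after rescaling so that the transformed noise again has variance $1/N$, the spike strength becomes $\sqrt{\omega F}$, i.e., the effective SNR is $\omega F$.

```python
import numpy as np

def transformed_pca_top_eigenvalue(M):
    """Apply q(y) = (pi/2) * tanh(pi * y / 2) entrywise to the normalized
    entries sqrt(N) * M_ij, then rescale by sqrt(N * F) so that the noise
    part has variance 1/N again; F = pi^2 / 8 for the sech density.
    The top eigenvalue exits the bulk [-2, 2] when omega * F > 1
    (asymptotically; near the threshold this needs very large N)."""
    N = M.shape[0]
    F = np.pi ** 2 / 8.0
    Q = (np.pi / 2.0) * np.tanh(np.pi * np.sqrt(N) * M / 2.0)
    return np.linalg.eigvalsh(Q / np.sqrt(N * F))[-1]
```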

An immediate consequence of Theorem 2.3 is the following estimate for the error probability of the LR test.

Corollary 2.5.

Suppose that the assumptions of Theorem 2.3 hold. Then, the error probability of the likelihood ratio test

\mathrm{err}(\omega):=\mathbb{P}(\mathrm{Reject}\,{\boldsymbol{H}}_{0}|{\boldsymbol{H}}_{0})+\mathbb{P}(\mathrm{Do}\,\,\mathrm{not}\,\,\mathrm{reject}\,{\boldsymbol{H}}_{0}|{\boldsymbol{H}}_{1})

converges to

\mathrm{erfc}\left(\frac{\sqrt{\rho}}{2}\right)=\frac{2}{\sqrt{\pi}}\int_{\frac{\sqrt{\rho}}{2}}^{\infty}e^{-x^{2}}\mathrm{d}x

as $N\to\infty$, where $\rho$ is as defined in (2.2).

The proof of Corollary 2.5 from Theorem 2.3 is standard; we refer to the proof of Theorem 2 of [9].

Note that the error $\mathrm{err}(\omega)$ converges to $0$ as $\omega F\nearrow 1$, which is consistent with the discussion below Theorem 2.3 that the effective SNR is $\omega F$.
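For reference, the limiting error of Corollary 2.5 is easy to evaluate numerically; the helper below (our sketch, assuming SciPy is available) takes the three constants from (2.1).

```python
import numpy as np
from scipy.special import erfc

def limiting_lr_error(omega, F, Fd, G):
    """Limiting sum of Type-I and Type-II errors, erfc(sqrt(rho)/2),
    with rho from (2.2); valid in the subcritical regime omega * F < 1."""
    rho = -0.25 * (np.log(1.0 - omega * F) + omega * (F - 2.0 * Fd)
                   + (omega ** 2 / 4.0) * (2.0 * F ** 2 - G))
    return erfc(np.sqrt(rho) / 2.0)
```

For instance, with Gaussian noise ($F=F_{d}=1$, $G=2$) this reduces to $\mathrm{erfc}\big(\tfrac{1}{4}\sqrt{-\log(1-\omega)+\omega}\big)$.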

Remark 2.6.

Since the likelihood ratio test minimizes the sum of the Type-I error and the Type-II error by the Neyman–Pearson lemma, we find that for any hypothesis test of ${\boldsymbol{H}}_{0}$ versus ${\boldsymbol{H}}_{1}$, $\mathrm{err}(\omega)$ is asymptotically bounded below by $\mathrm{erfc}(\sqrt{\rho}/2)$.

The most non-trivial part of Theorem 2.3 and Corollary 2.5 is the third term of $\rho$ in (2.2),

\frac{\omega^{2}}{4}(2F^{2}-G). (2.4)

We conjecture that this term is due to the Rademacher prior and that it would change if we assumed a different prior. Our heuristic argument for this is as follows. In [9], an algorithm was proposed for hypothesis testing based on the linear spectral statistics (LSS) of the data matrix $M$, which is optimal if the noise is Gaussian; the LSS of the matrix $M$, whose eigenvalues are $\lambda_{1}\geq\lambda_{2}\geq\dots\geq\lambda_{N}$, is

\sum_{i=1}^{N}f(\lambda_{i})

for a function $f$ continuous on an open neighborhood of $[-2,2]$. If the noise is non-Gaussian, the algorithm in [9] can be improved by pre-transforming the data matrix entrywise as in (2.3). The error of the improved algorithm is

\mathrm{erfc}\left(\sqrt{\widetilde{\rho}}/2\right), (2.5)

where

\widetilde{\rho}=-\frac{1}{4}\left(\log(1-\omega F)+\omega(F-2F_{d})+\frac{\omega^{2}}{4}(2F^{2}-\widetilde{G})\right)

with

\widetilde{G}=\left(\int_{-\infty}^{\infty}\frac{(p^{\prime}(x))^{2}p^{\prime\prime}(x)}{p(x)}\mathrm{d}x\right)^{2}\Big{/}\left(\frac{3}{2}\int_{-\infty}^{\infty}\frac{(p^{\prime}(x))^{2}p^{\prime\prime}(x)}{p(x)}\mathrm{d}x-F^{2}\right).

The error (2.5) differs from the error in Corollary 2.5 only in the term involving $\omega^{2}$. The hypothesis test based on the eigenvalues is spherical in nature, since it does not depend on the eigenvectors, or on specific directions. This means that the prior naturally associated with the LSS-based test is the spherical prior, i.e., the uniform distribution on the unit sphere. We thus conjecture that the error term in (2.4) depends on the prior of the model.

2.3 Spiked IID matrices

The results on the spiked Wigner matrices in Theorem 2.3 can be naturally extended to the spiked IID matrices defined as follows:

Definition 2.7 (IID matrix).

We say an $N\times N$ real random matrix $X=(X_{ij})$ is an IID matrix if the $X_{ij}$ ($1\leq i,j\leq N$) are independent and identically distributed centered real random variables satisfying that (i) $N\mathbb{E}[X_{ij}^{2}]=1$ and (ii) for any $D>0$, there exists a constant $C_{D}$, independent of $N$, such that $N^{\frac{D}{2}}\mathbb{E}[X_{ij}^{D}]\leq C_{D}$.

For a spiked IID matrix of the form $Y=X+\sqrt{\lambda}{\boldsymbol{x}}{\boldsymbol{x}}^{T}$, we have the following result.

Theorem 2.8.

Suppose that $Y=X+\sqrt{\lambda}{\boldsymbol{x}}{\boldsymbol{x}}^{T}$ is a spiked IID matrix such that Assumption 2.2 holds for $Y$, where $p$ is the density of the normalized entries $\sqrt{N}X_{ij}$. Let

F:=\int_{-\infty}^{\infty}\frac{(p^{\prime}(x))^{2}}{p(x)}\mathrm{d}x,\quad G:=\int_{-\infty}^{\infty}\frac{(p^{\prime\prime}(x))^{2}}{p(x)}\mathrm{d}x\,. (2.6)

If $2\omega F<1$, then the log likelihood ratio $\log{\mathcal{L}}(Y;\omega)$ of $\mathbb{P}_{1}$ with respect to $\mathbb{P}_{0}$ under ${\boldsymbol{H}}_{0}$, defined by

{\mathcal{L}}(Y;\omega):=\frac{1}{2^{N}}\sum_{{\boldsymbol{x}}}\prod_{i,j=1}^{N}\frac{p(\sqrt{N}Y_{ij}-\sqrt{\omega N}x_{i}x_{j})}{p(\sqrt{N}Y_{ij})} (2.7)

converges in distribution to ${\mathcal{N}}(-\rho^{*},2\rho^{*})$ as $N\to\infty$, where

\rho^{*}:=-\frac{1}{4}\left(\log(1-2\omega F)+\frac{\omega^{2}}{2}(2F^{2}-G)\right). (2.8)

See Appendix C for the proof of Theorem 2.8. From Theorem 2.8, we can obtain the following estimate for the error probability of the LR test.

Corollary 2.9.

Suppose that the assumptions of Theorem 2.8 hold. Then, the error probability of the likelihood ratio test

\mathrm{err}(\omega):=\mathbb{P}(\mathrm{Reject}\,{\boldsymbol{H}}_{0}|{\boldsymbol{H}}_{0})+\mathbb{P}(\mathrm{Do}\,\,\mathrm{not}\,\,\mathrm{reject}\,{\boldsymbol{H}}_{0}|{\boldsymbol{H}}_{1})

converges to

\mathrm{erfc}\left(\frac{\sqrt{\rho^{*}}}{2}\right)=\frac{2}{\sqrt{\pi}}\int_{\frac{\sqrt{\rho^{*}}}{2}}^{\infty}e^{-x^{2}}\mathrm{d}x

as $N\to\infty$, where $\rho^{*}$ is as defined in (2.8).

We omit the proof of Corollary 2.9; see the proof of Theorem 2 of [9].

Comparing Theorem 2.8 with Theorem 2.3, we find that the effective SNR is $2\omega F$, which is double the effective SNR $\omega F$ for the spiked Wigner matrices. The doubling of the effective SNR can also be heuristically checked from the LR as follows. By definition,

{\mathcal{L}}(Y;\omega)=\frac{1}{2^{N}}\sum_{{\boldsymbol{x}}}\prod_{i\neq j}\frac{p(\sqrt{N}Y_{ij}-\sqrt{\omega N}x_{i}x_{j})}{p(\sqrt{N}Y_{ij})}\prod_{k}\frac{p(\sqrt{N}Y_{kk}-\sqrt{\omega N}x_{k}^{2})}{p(\sqrt{N}Y_{kk})}\,. (2.9)

The off-diagonal term can be further decomposed into an upper-triangular term and a lower-triangular term, i.e.,

\prod_{i\neq j}\frac{p(\sqrt{N}Y_{ij}-\sqrt{\omega N}x_{i}x_{j})}{p(\sqrt{N}Y_{ij})}=\prod_{i<j}\frac{p(\sqrt{N}Y_{ij}-\sqrt{\omega N}x_{i}x_{j})}{p(\sqrt{N}Y_{ij})}\prod_{i>j}\frac{p(\sqrt{N}Y_{ij}-\sqrt{\omega N}x_{i}x_{j})}{p(\sqrt{N}Y_{ij})}.

The upper-triangular term and the lower-triangular term are almost independent, which effectively squares the LR and doubles the log-LR. However, these terms are not exactly independent due to the spike $x_{i}x_{j}$, and this is why the quadratic term $(\omega^{2}/2)(2F^{2}-G)$ in $\rho^{*}$ is not the same as four times the quadratic term $(\omega^{2}/4)(2F^{2}-G)$, which is what one would obtain by changing $\omega$ to $2\omega$ in (2.2). Note that the linear term $\omega(F-2F_{d})$ vanishes in (2.8) due to the fact that $F_{d}=F$ here: the term $\omega F$, arising from the off-diagonal entries, gets doubled for the spiked IID matrices, but $2\omega F_{d}$ does not, as it originates from the diagonal entries.

Theorem 2.8 implies the impossibility of reliable detection when $\lambda<1/(2F)$. To conclude that the threshold for reliable detection is $\lambda_{c}=1/(2F)$, we can consider the following variant of PCA. We first apply the function $q$ entrywise to the data matrix as in (2.3). After normalizing the transformed matrix by dividing it by $\sqrt{N}$, it is approximately a spiked IID matrix. In the next step, we symmetrize the (normalized) transformed matrix, i.e., we compute

\widetilde{Y}_{ij}=\frac{q(\sqrt{N}Y_{ij})+q(\sqrt{N}Y_{ji})}{\sqrt{2N}}.

From the Taylor expansion, we find that

\begin{split}\frac{q(\sqrt{N}Y_{ij})+q(\sqrt{N}Y_{ji})}{\sqrt{2N}}\approx\frac{q(\sqrt{N}X_{ij})+q(\sqrt{N}X_{ji})}{\sqrt{2N}}+\sqrt{2\omega}\mathbb{E}[q^{\prime}(\sqrt{N}X_{ij})]x_{i}x_{j},\end{split}

which is approximately a spiked Wigner matrix. Following the proof in [23], it is then immediate to prove that reliable detection is possible by applying PCA to $\widetilde{Y}$ if $\lambda>1/(2F)$.

3 Examples and numerical experiments

In this section, we consider a spiked Wigner model with non-Gaussian noise and conduct numerical experiments to compare the error from the simulation with the theoretical error in Corollary 2.5.

Suppose that the density of the noise matrix is

p(x)=p_{d}(x)=\frac{\operatorname{sech}(\pi x/2)}{2}=\frac{1}{e^{\pi x/2}+e^{-\pi x/2}}.

We sample $W_{ij}$ ($i\leq j$) from the density $p$ and let $H_{ij}=W_{ij}/\sqrt{N}$. We also sample the spike ${\boldsymbol{x}}=(x_{1},x_{2},\dots,x_{N})$ so that the $\sqrt{N}x_{i}$'s are i.i.d. Rademacher. The data matrix is $M=H+\sqrt{\lambda}{\boldsymbol{x}}{\boldsymbol{x}}^{T}$. From a direct calculation, it is straightforward to see that the constants in Theorem 2.3 are

F=F_{d}=\frac{\pi^{2}}{8},\quad G=\frac{\pi^{4}}{4}.
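The Fisher information can be double-checked numerically; for instance, the following snippet (ours, assuming SciPy is available) verifies $F=\pi^{2}/8\approx 1.2337$ by quadrature:

```python
import numpy as np
from scipy.integrate import quad

def p(x):
    """Density sech(pi * x / 2) / 2 of the normalized noise entries."""
    return 0.5 / np.cosh(np.pi * x / 2.0)

def dp(x):
    """Derivative p'(x) = -(pi / 2) * tanh(pi * x / 2) * p(x)."""
    return -(np.pi / 2.0) * np.tanh(np.pi * x / 2.0) * p(x)

F, _ = quad(lambda x: dp(x) ** 2 / p(x), -np.inf, np.inf)
print(F, np.pi ** 2 / 8.0)   # both are ~1.2337
```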

Assume that $\omega<1/F=8/\pi^{2}$. By Corollary 2.5, the limiting error of the LR test is

\mathrm{erfc}\left(\frac{1}{4}\sqrt{-\log\big{(}1-\frac{\pi^{2}\omega}{8}\big{)}+\frac{\pi^{2}\omega}{8}+\frac{7\pi^{4}\omega^{2}}{128}}\right). (3.1)

We perform a numerical experiment with $500$ samples of the $32\times 32$ matrix $M$ under ${\boldsymbol{H}}_{0}$ ($\lambda=0$) and ${\boldsymbol{H}}_{1}$ ($\lambda=\omega$), respectively, varying the SNR $\omega$ from $0$ to $0.5$. Since finding the exact log-LR in (1.2) for each sample is not computationally viable (as it requires computing the average of the likelihood ratios over $2^{32}$ different spikes), we apply a Monte Carlo method as follows: for each $M$, we compute

{\mathcal{L}}^{(\ell)}:=\prod_{i\leq j}\frac{p(\sqrt{N}M_{ij}-\sqrt{\omega N}x^{(\ell)}_{i}x^{(\ell)}_{j})}{p(\sqrt{N}M_{ij})}=\prod_{i\leq j}\frac{\operatorname{sech}\big{(}(\pi\sqrt{N}M_{ij}-\pi\sqrt{\omega N}x^{(\ell)}_{i}x^{(\ell)}_{j})/2\big{)}}{\operatorname{sech}(\pi\sqrt{N}M_{ij}/2)}, (3.2)

for $1\leq\ell\leq 10^{5}$, where the $\sqrt{N}x^{(\ell)}_{i}$'s are i.i.d. Rademacher, independent of ${\boldsymbol{x}}$. We use the test statistic $\overline{\mathcal{L}}$, which is the average of the ${\mathcal{L}}^{(\ell)}$'s, i.e.,

\overline{\mathcal{L}}:=\frac{1}{10^{5}}\sum_{\ell=1}^{10^{5}}{\mathcal{L}}^{(\ell)}. (3.3)
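In code, the Monte Carlo approximation of (3.3) can be carried out as in the following sketch (ours; the products are evaluated in log space for numerical stability, and the number of Monte Carlo spikes is a parameter, reduced from $10^{5}$ for illustration):

```python
import numpy as np

def log_sech_ratio(y, shift):
    """log[ p(y - shift) / p(y) ] for p(x) = sech(pi x / 2) / 2, computed
    entrywise via log cosh(t) = logaddexp(t, -t) - log 2."""
    a = np.pi * (y - shift) / 2.0
    b = np.pi * y / 2.0
    return np.logaddexp(b, -b) - np.logaddexp(a, -a)

def mc_lr_statistic(M, omega, n_mc=10**4, rng=None):
    """Monte Carlo estimate of the LR statistic: the average of the
    L^(ell) in (3.2) over n_mc independent Rademacher sample spikes."""
    rng = np.random.default_rng() if rng is None else rng
    N = M.shape[0]
    Y = np.sqrt(N) * M
    iu = np.triu_indices(N)                 # indices with i <= j, as in (3.2)
    log_L = np.empty(n_mc)
    for ell in range(n_mc):
        x = rng.choice([-1.0, 1.0], size=N) / np.sqrt(N)
        shift = np.sqrt(omega * N) * np.outer(x, x)
        log_L[ell] = log_sech_ratio(Y, shift)[iu].sum()
    m = log_L.max()                         # log-sum-exp for the average
    return np.exp(m) * np.mean(np.exp(log_L - m))
```

We accept ${\boldsymbol{H}}_{0}$ if the returned statistic is at most $1$, as in the test described below.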

The behavior of the test statistic $\overline{\mathcal{L}}$ under ${\boldsymbol{H}}_{0}$ and ${\boldsymbol{H}}_{1}$ for the SNRs $\omega=0.3$ and $\omega=0.4$ can be found in Figure 1.

Figure 1: The histograms of the log likelihood ratios under $H_{0}$ and $H_{1}$, respectively, for the SNRs $\omega=0.3$ and $\omega=0.4$ after 500 runs. The limiting distributions of the log likelihood ratio are also plotted.

To compare the error from the numerical experiment with the theoretical error, we consider the test in which we accept ${\boldsymbol{H}}_{0}$ if $\overline{\mathcal{L}}\leq 1$ and reject ${\boldsymbol{H}}_{0}$ otherwise. The result can be found in Figure 2(a), which shows that the empirical LLR error closely matches the theoretical limiting error in (3.1).

In the algorithm proposed in [9], the data matrix was pre-transformed by

\widetilde{M}_{ij}=\sqrt{\frac{2}{N}}\tanh\left(\frac{\pi\sqrt{N}}{2}M_{ij}\right).

The test statistic used for the hypothesis test, based on the LSS of $\widetilde{M}$, is given by

\begin{split}\widetilde{L}_{\omega}=-\log\det\left(\Big{(}1+\frac{\pi^{2}\omega}{8}\Big{)}I-\sqrt{\frac{\pi^{2}\omega}{8}}\widetilde{M}\right)+\frac{\pi^{2}\omega N}{16}+\frac{\pi\sqrt{\omega}}{2\sqrt{2}}\operatorname{Tr}\widetilde{M}+\frac{\pi^{2}\omega}{16}(\operatorname{Tr}\widetilde{M}^{2}-N).\end{split}

We accept ${\boldsymbol{H}}_{0}$ if

\widetilde{L}_{\omega}\leq-\log\left(1-\frac{\pi^{2}\omega}{8}\right)-\frac{3\pi^{4}\omega^{2}}{512}

and reject ${\boldsymbol{H}}_{0}$ otherwise. The limiting error of this test is

\mathrm{erfc}\left(\frac{1}{4}\sqrt{-\log\big{(}1-\frac{\pi^{2}\omega}{8}\big{)}+\frac{\pi^{2}\omega}{8}}\right). (3.4)

Note that the error of this algorithm is larger than that of the LR test in (3.1), since $\mathrm{erfc}(z)$ is a decreasing function of $z$; see also Figure 2(a). We remark that the difference between the empirical error and the limiting error increases as the SNR $\omega$ increases, which is possibly due to the finite-size effect, i.e., the contribution from the small terms that were ignored in the asymptotic expansions.
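For completeness, a direct transcription of the LSS-based test above into code (our sketch, not the implementation of [9]):

```python
import numpy as np

def lss_test_accepts_H0(M, omega):
    """Pre-transform M, evaluate the LSS statistic displayed above, and
    compare it with the acceptance threshold; True means H0 is accepted."""
    N = M.shape[0]
    Mt = np.sqrt(2.0 / N) * np.tanh(np.pi * np.sqrt(N) * M / 2.0)
    c = np.pi ** 2 * omega / 8.0
    # -log det((1 + c) I - sqrt(c) Mt) via the eigenvalues of the argument
    ev = np.linalg.eigvalsh((1.0 + c) * np.eye(N) - np.sqrt(c) * Mt)
    stat = (-np.sum(np.log(ev)) + np.pi ** 2 * omega * N / 16.0
            + np.pi * np.sqrt(omega) / (2.0 * np.sqrt(2.0)) * np.trace(Mt)
            + (c / 2.0) * (np.trace(Mt @ Mt) - N))
    threshold = -np.log(1.0 - c) - 3.0 * np.pi ** 4 * omega ** 2 / 512.0
    return stat <= threshold
```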

Figure 2: The empirical LLR errors (blue solid curves) with varying SNR $\omega$. (a) Rademacher prior: the empirical LLR error is compared with the limiting error of the LLR (black dash-dot curve) and that of the linear spectral statistics (LSS) (red dashed curve); the empirical LLR error closely matches its limiting error and is lower than the limiting LSS error. (b) Spherical prior: the empirical LLR error is compared with the limiting error of the LSS, which it closely matches.

To numerically test the conjecture introduced in the last paragraph of Section 2.2, we perform a simulation with the spherical prior instead of the Rademacher prior. More precisely, we use the same density for the noise matrix, while we sample the spike by first sampling the $\widetilde{x}_{i}$'s as i.i.d. standard Gaussians and then normalizing,

x_{i}=\frac{\widetilde{x}_{i}}{\sqrt{\widetilde{x}_{1}^{2}+\widetilde{x}_{2}^{2}+\dots+\widetilde{x}_{N}^{2}}},

so that the spike ${\boldsymbol{x}}=(x_{1},x_{2},\dots,x_{N})$ is distributed uniformly on the unit sphere. In the numerical experiment with the spherical prior, we again generated $500$ samples of the $32\times 32$ matrix $M$ under ${\boldsymbol{H}}_{0}$ ($\lambda=0$) and ${\boldsymbol{H}}_{1}$ ($\lambda=\omega$), respectively, varying the SNR $\omega$ from $0$ to $0.5$. For the LR test, we used the test statistic in (3.3), for which the random sample spikes ${\boldsymbol{x}}^{(\ell)}=(x^{(\ell)}_{1},x^{(\ell)}_{2},\dots,x^{(\ell)}_{N})$ are drawn uniformly from the unit sphere. The numerical error of the LR test can be found in Figure 2(b), which shows that the empirical LLR error closely matches the theoretical limiting error of the LSS test in (3.4). From the numerical experiment, we can check that the error of the LR test depends on the prior, and the result also suggests that the LSS-based test is optimal in the sense that it minimizes the sum of the Type-I error and the Type-II error.

4 Proof of main theorem

In this section, we provide the detailed proof of Theorem 2.3. We use the following shorthand notations for the proof.

Notational Remark 4.1.

For $X$ and $Y$, which can be deterministic numbers and/or random variables depending on $N$, we use the notation $X={\mathcal{O}}(Y)$ if for any (small) $\varepsilon>0$ and (large) $D>0$ there exists $N_{0}\equiv N_{0}(\varepsilon,D)$ such that $\mathbb{P}(|X|>N^{\varepsilon}|Y|)<N^{-D}$ whenever $N>N_{0}$. For an event $\Omega$, we say that $\Omega$ holds with high probability if for any (large) $D>0$ there exists $N_{0}\equiv N_{0}(D)$ such that $\mathbb{P}(\Omega^{c})<N^{-D}$ whenever $N>N_{0}$. For a sequence of random variables, the notation $\Rightarrow$ denotes convergence in distribution as $N\rightarrow\infty$.

The following lemma will be frequently used in the proof of Theorem 2.3.

Lemma 4.2.

For any $1\leq i,j\leq N$ and for any positive integer $s$,

H_{ij}={\mathcal{O}}(N^{-\frac{1}{2}}),\quad\frac{p^{(s)}(\sqrt{N}H_{ij})}{p(\sqrt{N}H_{ij})},\ \frac{p_{d}^{(s)}(\sqrt{N}H_{ii})}{p_{d}(\sqrt{N}H_{ii})}={\mathcal{O}}(1).

Moreover, if we define

h(x):=-\frac{p^{\prime}(x)}{p(x)},\quad h_{d}(x):=-\frac{p_{d}^{\prime}(x)}{p_{d}(x)},

then

h^{(s)}(\sqrt{N}H_{ij}),\ h_{d}^{(s)}(\sqrt{N}H_{ii})={\mathcal{O}}(1).

Lemma 4.2 can be proved by applying standard arguments using Markov’s inequality and the finite moment condition in Definition 2.1. See Appendix A for the detailed proof.

From Lemma 4.2, we find that for any given $\epsilon>0$, the quantities $\sqrt{N}H_{ij}$, $p^{(s)}(\sqrt{N}H_{ij})/p(\sqrt{N}H_{ij})$, and $p_{d}^{(s)}(\sqrt{N}H_{ii})/p_{d}(\sqrt{N}H_{ii})$ are uniformly bounded by $N^{\epsilon}$ with high probability. Thus, we have the following estimate for the i.i.d. sums of these random variables.

Lemma 4.3.

Define

P^{(s)}_{ij}:=\frac{p^{(s)}(\sqrt{N}H_{ij})}{p(\sqrt{N}H_{ij})},\quad(P_{d}^{(s)})_{ii}:=\frac{p_{d}^{(s)}(\sqrt{N}H_{ii})}{p_{d}(\sqrt{N}H_{ii})}. (4.1)

For any $s_{1},s_{2},\dots,s_{k}\geq 1$,

\begin{split}\frac{2}{N(N-1)}\sum_{i<j}P^{(s_{1})}_{ij}P^{(s_{2})}_{ij}\dots P^{(s_{k})}_{ij}&=\mathbb{E}[P^{(s_{1})}_{12}P^{(s_{2})}_{12}\dots P^{(s_{k})}_{12}]+{\mathcal{O}}(N^{-1}),\\ \frac{1}{N}\sum_{i=1}^{N}(P_{d}^{(s_{1})})_{ii}(P_{d}^{(s_{2})})_{ii}\dots(P_{d}^{(s_{k})})_{ii}&=\mathbb{E}[(P_{d}^{(s_{1})})_{11}(P_{d}^{(s_{2})})_{11}\dots(P_{d}^{(s_{k})})_{11}]+{\mathcal{O}}(N^{-\frac{1}{2}}).\end{split} (4.2)

See Appendix A for the detailed proof of Lemma 4.3.

4.1 Decomposition of LR

Assume ${\boldsymbol{H}}_{0}$. Recall that

{\mathcal{L}}(M;\omega)=\frac{1}{2^{N}}\sum_{{\boldsymbol{x}}}\prod_{i<j}\frac{p(\sqrt{N}M_{ij}-\sqrt{\omega N}x_{i}x_{j})}{p(\sqrt{N}M_{ij})}\prod_{k}\frac{p_{d}(\sqrt{N}M_{kk}-\sqrt{\omega N}x_{k}^{2})}{p_{d}(\sqrt{N}M_{kk})}\,. (4.3)

We first focus on the off-diagonal part of (4.3) and assume $i<j$. Recall the definition of $P^{(s)}_{ij}$ in Lemma 4.3. Note that $P^{(s)}_{ij}$ is an odd function of $\sqrt{N}M_{ij}$ if $s$ is odd, whereas it is an even function if $s$ is even. Recall also that $x_{i}^{2}=N^{-1}$. Since $P^{(s)}_{ij}={\mathcal{O}}(1)$ by Lemma 4.2, by the Taylor expansion,

\begin{split}\frac{p(\sqrt{N}M_{ij}-\sqrt{\omega N}x_{i}x_{j})}{p(\sqrt{N}M_{ij})}=1-\sqrt{\omega N}P^{(1)}_{ij}x_{i}x_{j}+\frac{\omega P^{(2)}_{ij}}{2N}-\frac{\omega\sqrt{\omega}}{6\sqrt{N}}P^{(3)}_{ij}x_{i}x_{j}+\frac{\omega^{2}P^{(4)}_{ij}}{24N^{2}}+{\mathcal{O}}(N^{-\frac{5}{2}}),\end{split} (4.4)

uniformly in $i$ and $j$. Taking the logarithm and Taylor expanding again, we obtain

\begin{split}&\log\left(\frac{p(\sqrt{N}M_{ij}-\sqrt{\omega N}x_{i}x_{j})}{p(\sqrt{N}M_{ij})}\right)\\ &=-\sqrt{\omega N}x_{i}x_{j}\left(P^{(1)}_{ij}+\frac{\omega}{6N}\Big{(}P^{(3)}_{ij}-3P^{(1)}_{ij}P^{(2)}_{ij}+2(P^{(1)}_{ij})^{3}\Big{)}\right)+\frac{\omega}{2N}\left(P^{(2)}_{ij}-(P^{(1)}_{ij})^{2}\right)\\ &\qquad+\frac{\omega^{2}}{24N^{2}}\left(P^{(4)}_{ij}-3(P^{(2)}_{ij})^{2}-4P^{(1)}_{ij}P^{(3)}_{ij}+12(P^{(1)}_{ij})^{2}P^{(2)}_{ij}-6(P^{(1)}_{ij})^{4}\right)+{\mathcal{O}}(N^{-\frac{5}{2}}).\end{split}

We define the $N\times N$ matrices $A$, $B$, $C$ by

\begin{split}&A_{ij}:=-\sqrt{\omega N}\left(P^{(1)}_{ij}+\frac{\omega}{6N}\Big{(}P^{(3)}_{ij}-3P^{(1)}_{ij}P^{(2)}_{ij}+2(P^{(1)}_{ij})^{3}\Big{)}\right),\quad B_{ij}:=\frac{\omega}{2N}\left(P^{(2)}_{ij}-(P^{(1)}_{ij})^{2}\right),\\ &C_{ij}:=\frac{\omega^{2}}{24N^{2}}\left(P^{(4)}_{ij}-3(P^{(2)}_{ij})^{2}-4P^{(1)}_{ij}P^{(3)}_{ij}+12(P^{(1)}_{ij})^{2}P^{(2)}_{ij}-6(P^{(1)}_{ij})^{4}\right).\end{split} (4.5)

Note that, from Lemma 4.2, $A_{ij}={\mathcal{O}}(\sqrt{N})$, $B_{ij}={\mathcal{O}}(N^{-1})$, and $C_{ij}={\mathcal{O}}(N^{-2})$. We then have

\begin{split}&\log\frac{1}{2^{N}}\sum_{{\boldsymbol{x}}}\prod_{i<j}\frac{p(\sqrt{N}M_{ij}-\sqrt{\omega N}x_{i}x_{j})}{p(\sqrt{N}M_{ij})}\\ &=\log\frac{1}{2^{N}}\sum_{{\boldsymbol{x}}}\exp\left(\sum_{i<j}\log\left(\frac{p(\sqrt{N}M_{ij}-\sqrt{\omega N}x_{i}x_{j})}{p(\sqrt{N}M_{ij})}\right)\right)\\ &=\log\frac{1}{2^{N}}\sum_{{\boldsymbol{x}}}\exp\left(\sum_{i<j}(A_{ij}x_{i}x_{j}+B_{ij}+C_{ij})+{\mathcal{O}}(N^{-\frac{1}{2}})\right)\\ &=\log\frac{1}{2^{N}}\sum_{{\boldsymbol{x}}}\exp\left(\sum_{i<j}A_{ij}x_{i}x_{j}+\sum_{i<j}(B_{ij}+C_{ij})+{\mathcal{O}}(N^{-\frac{1}{2}})\right).\end{split} (4.6)

As briefly explained in Section 1.1, we first analyze the term involving $A_{ij}$, which we call the spin glass term, by applying the strategy of [1] conditional on $B$. We can then identify the part dependent on $B$, which is $\zeta^{\prime}$ in Proposition 4.4. The sum of $\zeta^{\prime}$ and the term involving $B$ in (4.6), which we call the CLT term, can be written as a sum of i.i.d. random variables that depend only on $B$, and we apply the central limit theorem to prove its convergence. Since $C_{ij}={\mathcal{O}}(N^{-2})$, the sum $\sum_{i<j}C_{ij}={\mathcal{O}}(1)$ and its fluctuation is negligible by the law of large numbers; we call the term involving $C$ in (4.6) the LLN term.

4.2 Spin glass term

We have the following result for the convergence of the term involving $A$. Let $\mathbb{E}_{B}$ and $\operatorname{Var}_{B}$ denote the conditional expectation and the conditional variance given $B$, respectively.

Proposition 4.4.

Let $A$ be as defined in (4.5). Set

Z:=\frac{1}{2^{N}}\sum_{{\boldsymbol{x}}}\exp\left(\sum_{i<j}A_{ij}x_{i}x_{j}\right). (4.7)

Then, there exist random variables $\zeta$ and $\zeta^{\prime}$ such that

\log Z=\log\zeta+\zeta^{\prime}+{\mathcal{O}}(N^{-1}), (4.8)

where $\zeta$ and $\zeta^{\prime}$ satisfy the following:

  1. The conditional distribution of $\log\zeta$ given $B$ converges (in distribution) to ${\mathcal{N}}(-\nu,2\nu)$, where

    \nu:=\sum_{k=3}^{\infty}\frac{(\omega F)^{k}}{4k}=-\frac{1}{4}\left(\log(1-\omega F)+\omega F+\frac{\omega^{2}F^{2}}{2}\right). (4.9)
  2. Conditional on $B$,

    \zeta^{\prime}=\frac{1}{2N^{2}}\sum_{i<j}\mathbb{E}_{B}[A_{ij}^{2}]-\frac{\omega^{2}}{24}\mathbb{E}[(P^{(1)}_{12})^{4}]+U, (4.10)

    where $U$ is a random variable whose asymptotic law is a centered Gaussian with variance

    \theta:=\frac{\omega^{2}}{8}\mathbb{E}\left[\operatorname{Var}_{B}\big{(}(P^{(1)}_{12})^{2}\big{)}\right]. (4.11)
  3. The fluctuations $\log\zeta$ and $\zeta^{\prime}$ are asymptotically orthogonal to each other under $L_{B}^{2}$, the conditional $L^{2}$-norm given $B$.

We will prove Proposition 4.4 in Section 5.

Here, we recall the definition of the conditional distribution. For random variables $X_{1}$ and $X_{2}$ with a joint density, let $f_{X_{1},X_{2}}$ be the joint density function of $X_{1}$ and $X_{2}$, and let $f_{X_{1}}$ be the marginal density function of $X_{1}$. Then, the conditional distribution of $X_{2}$ given $X_{1}$ has the conditional density function $f_{X_{2}|X_{1}}$ satisfying the relation

f_{X_{2}|X_{1}}(x_{2}|x_{1})f_{X_{1}}(x_{1})=f_{X_{1},X_{2}}(x_{1},x_{2}).

Note that $f_{X_{2}|X_{1}}(x_{2}|x_{1})$ is a function of both $x_{1}$ and $x_{2}$.

Remark 4.5.

From the first part of Proposition 4.4, we can also find that $\log\zeta$ converges in distribution to ${\mathcal{N}}(-\nu,2\nu)$ without conditioning on $B$. In general, suppose that the conditional distribution of a sequence of random variables $(X_{n})$ given $Y$ converges and the limiting distribution is independent of $Y$. Then, $\mathbb{E}[e^{\mathrm{i}tX_{n}}|Y]\to\phi(t)$ as $n\to\infty$ for some $\phi(t)$ that is independent of $Y$. Since $\mathbb{E}[e^{\mathrm{i}tX_{n}}]=\mathbb{E}[\mathbb{E}[e^{\mathrm{i}tX_{n}}|Y]]$, we find that $\mathbb{E}[e^{\mathrm{i}tX_{n}}]\to\phi(t)$, and thus $(X_{n})$ converges in distribution to the same limiting distribution.

Remark 4.6.

Let $\text{sgn}(t)$ be the sign function, defined as $\text{sgn}(t)=1$ if $t\geq 0$ and $\text{sgn}(t)=-1$ if $t<0$. Then, $\text{sgn}(H_{ij})$ is a random variable such that $\mathbb{P}(\text{sgn}(H_{ij})=1)=\mathbb{P}(\text{sgn}(H_{ij})=-1)=1/2$. Since $B_{ij}$ is an even function of $\sqrt{N}H_{ij}$ under ${\boldsymbol{H}}_{0}$, we can see that $B_{ij}$ and $\text{sgn}(H_{ij})$ are independent. Since $A_{ij}$ is an odd function of $\sqrt{N}H_{ij}$ under ${\boldsymbol{H}}_{0}$, this implies that the conditional density function of $A_{ij}$ given $B_{ij}$ is symmetric (about $0$), and in particular, $\mathbb{E}[A_{ij}^{s}|B_{ij}]=0$ for any odd $s\in\mathbb{Z}^{+}$.

Example.

To better understand the decomposition in Proposition 4.4, we revisit the example considered in Section 3. Recall that in this example the density of the normalized entries of the noise matrix is

p(x)=p_{d}(x)=\frac{\operatorname{sech}(\pi x/2)}{2}=\frac{1}{e^{\pi x/2}+e^{-\pi x/2}}

and it is assumed that $\omega<1/F=8/\pi^{2}$. The derivatives of $p$ are given by

\begin{split}p^{\prime}(x)&=-\frac{\pi}{2}\tanh(\pi x/2)\cdot p(x),\quad p^{\prime\prime}(x)=\frac{\pi^{2}}{4}\left(1-2\operatorname{sech}^{2}(\pi x/2)\right)p(x),\\ p^{\prime\prime\prime}(x)&=-\frac{\pi^{3}}{8}\tanh(\pi x/2)\left(6\operatorname{sech}^{2}(\pi x/2)-1\right)\cdot p(x).\end{split}

Then, from the definition in (4.5),

A_{ij}=-\sqrt{\omega N}\left(-\frac{\pi}{2}\tanh\Big{(}\frac{\pi\sqrt{N}M_{ij}}{2}\Big{)}-\frac{\omega}{6N}\cdot\frac{\pi^{3}}{4}\tanh\Big{(}\frac{\pi\sqrt{N}M_{ij}}{2}\Big{)}\left(5\operatorname{sech}^{2}\Big{(}\frac{\pi\sqrt{N}M_{ij}}{2}\Big{)}-1\right)\right)

and

B_{ij}=-\frac{\omega}{2N}\cdot\frac{\pi^{2}}{4}\operatorname{sech}^{2}\Big{(}\frac{\pi\sqrt{N}M_{ij}}{2}\Big{)}.

Applying the identity $1-\tanh^{2}x=\operatorname{sech}^{2}x$, we thus find that

A_{ij}=\frac{\sqrt{N}}{2}\sqrt{\omega\pi^{2}+8NB_{ij}}\left(1-\frac{10B_{ij}}{3}-\frac{\omega\pi^{2}}{12N}\right)\cdot\text{sgn}(M_{ij}).

(See Remark 4.6 for a definition of $\text{sgn}(M_{ij})$.) Recall that $Z$ depends only on the $A_{ij}$'s. In Section 5, under ${\boldsymbol{H}}_{0}$, we will see that $\zeta$ depends only on $\tanh(A_{ij}/N)$ whereas $\zeta^{\prime}$ depends only on $A_{ij}^{2}$. Due to the independence between $B_{ij}$ and $\text{sgn}(M_{ij})$ discussed in Remark 4.6, we find that the conditional distribution of $\log\zeta$ depends only on the random variables $\text{sgn}(M_{ij})$. Thus, Equation (4.8) in this case means the decomposition of $Z$, conditional on $B$, into one part that depends only on $\text{sgn}(M_{ij})$ and another part that depends only on $B_{ij}$. It is then obvious that the fluctuations $\log\zeta$ and $\zeta^{\prime}$ are asymptotically orthogonal to each other under $L_{B}^{2}$, due to their independence.

4.3 CLT part

From Proposition 4.4, we find that the part involving $B$ in the spin glass term is asymptotically $(2N^{2})^{-1}\sum_{i<j}\mathbb{E}_{B}[A_{ij}^{2}]$. Thus, combining it with the term involving $B$ in (4.6), we are led to consider

\sum_{i<j}B_{ij}+\frac{1}{2N^{2}}\sum_{i<j}\mathbb{E}[A_{ij}^{2}|B_{ij}], (4.12)

which is a sum of i.i.d. random variables that depend only on the $B_{ij}$'s. (The contribution from $U$ in (4.10) will be considered separately at the end of the proof.) By the central limit theorem, (4.12) converges to a normal random variable, whose mean and variance we compute in this subsection.

We first compute the mean of (4.12). By definition,

\mathbb{E}[B_{ij}]=\frac{\omega}{2N}\mathbb{E}\left[P^{(2)}_{ij}-(P^{(1)}_{ij})^{2}\right]. (4.13)

Note that for any $s\geq 1$,

\mathbb{E}\left[P^{(s)}_{ij}\right]=\int_{\mathbb{R}}\frac{p^{(s)}(t)}{p(t)}\,p(t)\,\mathrm{d}t=\int_{\mathbb{R}}p^{(s)}(t)\,\mathrm{d}t=0, (4.14)

and

F=\int_{\mathbb{R}}\frac{p^{\prime}(t)^{2}}{p(t)}\mathrm{d}t=\mathbb{E}\left[(P^{(1)}_{ij})^{2}\right].

We also have, from the definition of $A_{ij}$ and the law of total expectation $\mathbb{E}\left[\mathbb{E}[A_{ij}^{2}|B_{ij}]\right]=\mathbb{E}[A_{ij}^{2}]$, that

\begin{split}\mathbb{E}\left[\mathbb{E}[A_{ij}^{2}|B_{ij}]\right]=\mathbb{E}[A_{ij}^{2}]=\omega N\mathbb{E}\left[(P^{(1)}_{ij})^{2}\right]+\frac{\omega^{2}}{3}\mathbb{E}\left[P^{(1)}_{ij}P^{(3)}_{ij}-3(P^{(1)}_{ij})^{2}P^{(2)}_{ij}+2(P^{(1)}_{ij})^{4}\right]+O(N^{-1}).\end{split} (4.15)

Using integration by parts, we obtain that

\begin{split}\mathbb{E}\left[(P^{(1)}_{ij})^{4}\right]&=\int_{\mathbb{R}}\frac{p^{\prime}(t)^{4}}{p(t)^{3}}\mathrm{d}t=-\frac{1}{2}\int_{\mathbb{R}}\left(\frac{1}{p(t)^{2}}\right)^{\prime}(p^{\prime}(t)^{3})\,\mathrm{d}t=\frac{3}{2}\int_{\mathbb{R}}\frac{p^{\prime}(t)^{2}p^{\prime\prime}(t)}{p(t)^{2}}\mathrm{d}t=\frac{3}{2}\mathbb{E}\left[(P^{(1)}_{ij})^{2}P^{(2)}_{ij}\right].\end{split} (4.16)
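As a quick sanity check of (4.16), assume for the moment that $p$ is the standard Gaussian density, so that $P^{(1)}_{ij}=-t$ and $P^{(2)}_{ij}=t^{2}-1$ with $t=\sqrt{N}H_{ij}\sim{\mathcal{N}}(0,1)$; then

\mathbb{E}\left[(P^{(1)}_{ij})^{4}\right]=\mathbb{E}[t^{4}]=3,\qquad\frac{3}{2}\,\mathbb{E}\left[(P^{(1)}_{ij})^{2}P^{(2)}_{ij}\right]=\frac{3}{2}\,\mathbb{E}[t^{2}(t^{2}-1)]=\frac{3}{2}(3-1)=3,

so the two sides of (4.16) indeed agree.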

Note that the boundary term in the integration by parts used in (4.16) vanishes since $p^{\prime}(t^{\pm}_{n})^{3}/p(t^{\pm}_{n})^{2}\to 0$ for some sequences $(t^{+}_{n})_{n=1,2,\dots}$ and $(t^{-}_{n})_{n=1,2,\dots}$ such that $t^{+}_{n}\to\infty$ and $t^{-}_{n}\to-\infty$, respectively, which can be checked from the fact that

\int_{\mathbb{R}}\left|\frac{p^{\prime}(t)^{3}}{p(t)^{2}}\right|\mathrm{d}t\leq C_{3}\int_{\mathbb{R}}|t|^{r_{3}}p(t)\,\mathrm{d}t=C_{3}\mathbb{E}[|\sqrt{N}H_{ij}|^{r_{3}}]<\infty.

(See Assumption 2.2 for the definition of the constants $C_{3}$ and $r_{3}$.) Thus, from (4.15) and (4.16),

𝔼[𝔼[Aij2|Bij]]=ωN𝔼[(Pij(1))2]+ω23𝔼[Pij(1)Pij(3)]+O(N1),\mathbb{E}\left[\mathbb{E}[A_{ij}^{2}|B_{ij}]\right]=\omega N\mathbb{E}\left[(P^{(1)}_{ij})^{2}\right]+\frac{\omega^{2}}{3}\mathbb{E}\left[P^{(1)}_{ij}P^{(3)}_{ij}\right]+O(N^{-1}), (4.17)

and we conclude from (4.12), (4.13), (4.14), and (4.17) that

𝔼[i<jBij+12N2i<j𝔼[Aij2|Bij]]=ω26N2i<j𝔼[Pij(1)Pij(3)]+O(N1)=ω212𝔼[P12(1)P12(3)]+O(N1).\begin{split}\mathbb{E}\left[\sum_{i<j}B_{ij}+\frac{1}{2N^{2}}\sum_{i<j}\mathbb{E}[A_{ij}^{2}|B_{ij}]\right]&=\frac{\omega^{2}}{6N^{2}}\sum_{i<j}\mathbb{E}\left[P^{(1)}_{ij}P^{(3)}_{ij}\right]+O(N^{-1})\\ &=\frac{\omega^{2}}{12}\mathbb{E}\left[P^{(1)}_{12}P^{(3)}_{12}\right]+O(N^{-1}).\end{split} (4.18)
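As a quick sanity check on the integration by parts identity (4.16), the two sides can be compared numerically for a concrete non-Gaussian density. The following sketch uses the logistic density, for which p'(t)/p(t) = -tanh(t/2); the choice of density and all numerical parameters are ours, for illustration only.

```python
import numpy as np
from scipy.integrate import quad

# Logistic density, a concrete non-Gaussian example (illustrative choice).
p  = lambda t: np.exp(-t) / (1 + np.exp(-t))**2
P1 = lambda t: -np.tanh(t / 2)                 # p'(t)/p(t)
P2 = lambda t: 1.5 * np.tanh(t / 2)**2 - 0.5   # p''(t)/p(t)

E = lambda f: quad(lambda t: f(t) * p(t), -40, 40)[0]

lhs = E(lambda t: P1(t)**4)                # E[(P^(1))^4]
rhs = 1.5 * E(lambda t: P1(t)**2 * P2(t))  # (3/2) E[(P^(1))^2 P^(2)]
print(lhs, rhs)
```

Both quadratures return 1/5: under the logistic law, tanh(t/2) is uniform on (-1,1), so the two sides of (4.16) agree as they should.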

We next compute the variance of (4.12). Since 𝔼[BijAij2|Bij]=Bij𝔼[Aij2|Bij]\mathbb{E}[B_{ij}A_{ij}^{2}|B_{ij}]=B_{ij}\mathbb{E}[A_{ij}^{2}|B_{ij}] by the definition of the conditional expectation, we have

𝔼[(Bij+𝔼[Aij2|Bij]2N2)2]=𝔼[Bij2]+1N2𝔼[Aij2Bij]+14N4𝔼[(𝔼[Aij2|Bij])2].\begin{split}\mathbb{E}\left[\left(B_{ij}+\frac{\mathbb{E}[A_{ij}^{2}|B_{ij}]}{2N^{2}}\right)^{2}\right]=\mathbb{E}[B_{ij}^{2}]+\frac{1}{N^{2}}\mathbb{E}[A_{ij}^{2}B_{ij}]+\frac{1}{4N^{4}}\mathbb{E}[(\mathbb{E}[A_{ij}^{2}|B_{ij}])^{2}].\end{split} (4.19)

Again, by definition,

𝔼[Bij2]=ω24N2𝔼[(Pij(2))22(Pij(1))2Pij(2)+(Pij(1))4],\mathbb{E}[B_{ij}^{2}]=\frac{\omega^{2}}{4N^{2}}\mathbb{E}\left[(P^{(2)}_{ij})^{2}-2(P^{(1)}_{ij})^{2}P^{(2)}_{ij}+(P^{(1)}_{ij})^{4}\right], (4.20)
1N2𝔼[Aij2Bij]=ω22N2𝔼[(Pij(1))2(Pij(2)(Pij(1))2)]+O(N3),\frac{1}{N^{2}}\mathbb{E}[A_{ij}^{2}B_{ij}]=\frac{\omega^{2}}{2N^{2}}\mathbb{E}\left[(P^{(1)}_{ij})^{2}\Big{(}P^{(2)}_{ij}-(P^{(1)}_{ij})^{2}\Big{)}\right]+O(N^{-3}), (4.21)

and

14N4𝔼[(𝔼[Aij2|Bij])2]=ω24N2𝔼[(𝔼[(Pij(1))2|Bij])2]+O(N3).\frac{1}{4N^{4}}\mathbb{E}[(\mathbb{E}[A_{ij}^{2}|B_{ij}])^{2}]=\frac{\omega^{2}}{4N^{2}}\mathbb{E}[(\mathbb{E}[(P^{(1)}_{ij})^{2}|B_{ij}])^{2}]+O(N^{-3}). (4.22)

Adding (4.20) and (4.21), we find that

𝔼[Bij2]+1N2𝔼[Aij2Bij]=ω24N2𝔼[(Pij(2))2(Pij(1))4]+O(N3).\mathbb{E}[B_{ij}^{2}]+\frac{1}{N^{2}}\mathbb{E}[A_{ij}^{2}B_{ij}]=\frac{\omega^{2}}{4N^{2}}\mathbb{E}\left[(P^{(2)}_{ij})^{2}-(P^{(1)}_{ij})^{4}\right]+O(N^{-3}). (4.23)

Using the identity

𝔼[(Pij(1))4]=𝔼[(𝔼[(Pij(1))2|Bij])2+Var((Pij(1))2|Bij)],\mathbb{E}[(P^{(1)}_{ij})^{4}]=\mathbb{E}\left[(\mathbb{E}[(P^{(1)}_{ij})^{2}|B_{ij}])^{2}+\operatorname{Var}\big{(}(P^{(1)}_{ij})^{2}|B_{ij}\big{)}\right], (4.24)

and combining (4.19) and (4.23), we have

𝔼[(Bij+𝔼[Aij2|Bij]2N2)2]=ω24N2𝔼[(Pij(2))2Var((Pij(1))2|Bij)]+O(N3).\mathbb{E}\left[\left(B_{ij}+\frac{\mathbb{E}[A_{ij}^{2}|B_{ij}]}{2N^{2}}\right)^{2}\right]=\frac{\omega^{2}}{4N^{2}}\mathbb{E}\left[(P^{(2)}_{ij})^{2}-\operatorname{Var}\big{(}(P^{(1)}_{ij})^{2}|B_{ij}\big{)}\right]+O(N^{-3}). (4.25)

From (4.13) and (4.15), we know that

𝔼[Bij+𝔼[Aij2|Bij]2N2]=O(N2),\mathbb{E}\left[B_{ij}+\frac{\mathbb{E}[A_{ij}^{2}|B_{ij}]}{2N^{2}}\right]=O(N^{-2}),

and hence

i<j𝔼[(Bij+𝔼[Aij2|Bij]2N2)2]Var(i<jBij+12N2i<j𝔼[Aij2|Bij])=O(N2).\sum_{i<j}\mathbb{E}\left[\left(B_{ij}+\frac{\mathbb{E}[A_{ij}^{2}|B_{ij}]}{2N^{2}}\right)^{2}\right]-\operatorname{Var}\left(\sum_{i<j}B_{ij}+\frac{1}{2N^{2}}\sum_{i<j}\mathbb{E}[A_{ij}^{2}|B_{ij}]\right)=O(N^{-2}).

We thus conclude from (4.18) and (4.25) that

i<jBij+12N2i<j𝔼[Aij2|Bij]𝒩(ω212𝔼[P12(1)P12(3)],ω28𝔼[(P12(2))2Var((P12(1))2|B12)]).\begin{split}\sum_{i<j}B_{ij}+\frac{1}{2N^{2}}\sum_{i<j}\mathbb{E}[A_{ij}^{2}|B_{ij}]\Rightarrow{\mathcal{N}}\left(\frac{\omega^{2}}{12}\mathbb{E}\left[P^{(1)}_{12}P^{(3)}_{12}\right],\frac{\omega^{2}}{8}\mathbb{E}\left[(P^{(2)}_{12})^{2}-\operatorname{Var}\big{(}(P^{(1)}_{12})^{2}|B_{12}\big{)}\right]\right).\end{split} (4.26)

4.4 LLN part

From the definition of CijC_{ij}, applying Lemma 4.3, we find that

i<jCij=ω248𝔼[P12(4)3(P12(2))24P12(1)P12(3)+12(P12(1))2P12(2)6(P12(1))4]+𝒪(N1).\sum_{i<j}C_{ij}=\frac{\omega^{2}}{48}\mathbb{E}\left[P^{(4)}_{12}-3(P^{(2)}_{12})^{2}-4P^{(1)}_{12}P^{(3)}_{12}+12(P^{(1)}_{12})^{2}P^{(2)}_{12}-6(P^{(1)}_{12})^{4}\right]+{\mathcal{O}}(N^{-1}). (4.27)

Since 𝔼[P12(4)]=0\mathbb{E}[P^{(4)}_{12}]=0 from (4.14), applying the integration by parts formulas (4.16) and

𝔼[(Pij(1))2Pij(2)]=p(t)2p′′(t)p(t)2dt=(1p(t))(p(t)p′′(t))dt=p′′(t)2+p(t)p′′′(t)p(t)dt=𝔼[(Pij(2))2+Pij(1)Pij(3)],\begin{split}\mathbb{E}\left[(P^{(1)}_{ij})^{2}P^{(2)}_{ij}\right]&=\int_{\mathbb{R}}\frac{p^{\prime}(t)^{2}p^{\prime\prime}(t)}{p(t)^{2}}\mathrm{d}t=-\int_{\mathbb{R}}\left(\frac{1}{p(t)}\right)^{\prime}(p^{\prime}(t)p^{\prime\prime}(t))\mathrm{d}t\\ &=\int_{\mathbb{R}}\frac{p^{\prime\prime}(t)^{2}+p^{\prime}(t)p^{\prime\prime\prime}(t)}{p(t)}\mathrm{d}t=\mathbb{E}\left[(P^{(2)}_{ij})^{2}+P^{(1)}_{ij}P^{(3)}_{ij}\right],\end{split} (4.28)

we find that

i<jCij=ω248𝔼[P12(1)P12(3)]+𝒪(N1).\sum_{i<j}C_{ij}=-\frac{\omega^{2}}{48}\mathbb{E}[P^{(1)}_{12}P^{(3)}_{12}]+{\mathcal{O}}(N^{-1}). (4.29)

4.5 Diagonal part

For the diagonal part in (4.3), it is not hard to see that

logkpd(NMkkωNxk2)pd(NMkk)=k(ωNpd(NMkk)pd(NMkk)+ω2N(pd′′(NMkk)pd(NMkk)pd(NMkk)2pd(NMkk)2))+𝒪(N12),\begin{split}&\log\prod_{k}\frac{p_{d}(\sqrt{N}M_{kk}-\sqrt{\omega N}x_{k}^{2})}{p_{d}(\sqrt{N}M_{kk})}\\ &=\sum_{k}\left(-\sqrt{\frac{\omega}{N}}\frac{p_{d}^{\prime}(\sqrt{N}M_{kk})}{p_{d}(\sqrt{N}M_{kk})}+\frac{\omega}{2N}\left(\frac{p_{d}^{\prime\prime}(\sqrt{N}M_{kk})p_{d}(\sqrt{N}M_{kk})-p_{d}^{\prime}(\sqrt{N}M_{kk})^{2}}{p_{d}(\sqrt{N}M_{kk})^{2}}\right)\right)+{\mathcal{O}}(N^{-\frac{1}{2}}),\end{split} (4.30)

which does not depend on 𝒙{\boldsymbol{x}}. Applying the integration by parts formulas, one can evaluate the mean and the variance to show that

logkpd(NMkkωNxk2)pd(NMkk)𝒩(ωFd2,ωFd).\log\prod_{k}\frac{p_{d}(\sqrt{N}M_{kk}-\sqrt{\omega N}x_{k}^{2})}{p_{d}(\sqrt{N}M_{kk})}\Rightarrow{\mathcal{N}}(-\frac{\omega F_{d}}{2},\omega F_{d}). (4.31)
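For instance, when pdp_{d} is the standard Gaussian density (so that Fd=1F_{d}=1), the log-ratio in (4.30) is exactly linear in the diagonal entries, and (4.31) can be checked by a direct simulation under 𝑯0{\boldsymbol{H}}_{0}. The values of N, ω, and the number of trials in the following minimal sketch are illustrative choices, not prescriptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, omega, trials = 2000, 0.5, 5000   # illustrative values only

# For standard Gaussian p_d, log p_d(t - a) - log p_d(t) = a*t - a^2/2 with
# a = sqrt(omega/N); summing over the N diagonal entries gives a Gaussian
# with mean -omega/2 and variance omega (here F_d = 1).
a = np.sqrt(omega / N)
samples = a * rng.standard_normal((trials, N)).sum(axis=1) - omega / 2
print(samples.mean(), samples.var())  # ≈ -omega*F_d/2 and omega*F_d
```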

4.6 Proof of Theorem 2.3

We now combine Proposition 4.4 with Equations (4.26), (4.29), and (4.31) to find that log(M;λ)\log{\mathcal{L}}(M;\lambda) converges to the Gaussian whose mean is

14(log(1ωF)+ωF+ω2F22)ω224𝔼[(P12(1))4]+ω212𝔼[P12(1)P12(3)]ω248𝔼[P12(1)P12(3)]ωFd2=14(log(1ωF)+ωF+ω2F22)ω216𝔼[(P12(2))2]ωFd2=ρ\begin{split}&\frac{1}{4}\left(\log(1-\omega F)+\omega F+\frac{\omega^{2}F^{2}}{2}\right)-\frac{\omega^{2}}{24}\mathbb{E}[(P^{(1)}_{12})^{4}]+\frac{\omega^{2}}{12}\mathbb{E}[P^{(1)}_{12}P^{(3)}_{12}]-\frac{\omega^{2}}{48}\mathbb{E}[P^{(1)}_{12}P^{(3)}_{12}]-\frac{\omega F_{d}}{2}\\ &=\frac{1}{4}\left(\log(1-\omega F)+\omega F+\frac{\omega^{2}F^{2}}{2}\right)-\frac{\omega^{2}}{16}\mathbb{E}[(P^{(2)}_{12})^{2}]-\frac{\omega F_{d}}{2}=-\rho\end{split} (4.32)

where we also applied the integration by parts formulas (4.16) and (4.28). (See also Remark 4.5.) Similarly, we can also find that the variance of the limiting Gaussian is

12(log(1ωF)+ωF+ω2F22)+ω28𝔼[VarB((P12(1))2)]+ω28𝔼[(P12(2))2VarB((P12(1))2)]+ωFd=2ρ.\begin{split}&-\frac{1}{2}\left(\log(1-\omega F)+\omega F+\frac{\omega^{2}F^{2}}{2}\right)+\frac{\omega^{2}}{8}\mathbb{E}\left[\operatorname{Var}_{B}\big{(}(P^{(1)}_{12})^{2}\big{)}\right]\\ &\qquad\qquad+\frac{\omega^{2}}{8}\mathbb{E}\left[(P^{(2)}_{12})^{2}-\operatorname{Var}_{B}\big{(}(P^{(1)}_{12})^{2}\big{)}\right]+\omega F_{d}=2\rho.\end{split} (4.33)

This concludes the proof of Theorem 2.3.

5 Asymptotic independence of fluctuations

In this section, we prove Proposition 4.4. The key idea for the proof is to decompose the fluctuation of ZZ into two parts. First, notice that

exp(Aijxixj)=coshAijN+NxixjsinhAijN,\exp\left(A_{ij}x_{i}x_{j}\right)=\cosh\frac{A_{ij}}{N}+Nx_{i}x_{j}\sinh\frac{A_{ij}}{N},

which can be checked by considering the cases xixj=N1x_{i}x_{j}=N^{-1} and xixj=N1x_{i}x_{j}=-N^{-1} separately. Thus, if we let

ζ:=12N𝒙i<j(1+NxixjtanhAijN),\zeta:=\frac{1}{2^{N}}\sum_{{\boldsymbol{x}}}\prod_{i<j}\left(1+Nx_{i}x_{j}\tanh\frac{A_{ij}}{N}\right), (5.1)

then it is direct to see that

logZlogζ=logi<jcosh(AijN).\log Z-\log\zeta=\log\prod_{i<j}\cosh\left(\frac{A_{ij}}{N}\right). (5.2)

Recall that Aij=𝒪(N)A_{ij}={\mathcal{O}}(\sqrt{N}). Taylor expanding the right-hand side of (5.2), we get

logZlogζ=i<j(Aij22N2Aij412N4)+𝒪(N1),\log Z-\log\zeta=\sum_{i<j}\left(\frac{A_{ij}^{2}}{2N^{2}}-\frac{A_{ij}^{4}}{12N^{4}}\right)+{\mathcal{O}}(N^{-1}), (5.3)

and we define

ζ:=i<j(Aij22N2Aij412N4).\zeta^{\prime}:=\sum_{i<j}\left(\frac{A_{ij}^{2}}{2N^{2}}-\frac{A_{ij}^{4}}{12N^{4}}\right). (5.4)

It is then obvious that logZ=logζ+ζ+𝒪(N1)\log Z=\log\zeta+\zeta^{\prime}+{\mathcal{O}}(N^{-1}).
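Since Z=ζi<jcosh(Aij/N)Z=\zeta\prod_{i<j}\cosh(A_{ij}/N) is an exact identity, (5.2) can be verified by brute force for small N. The sketch below uses Gaussian stand-ins for the A_ij (an arbitrary choice with the correct O(√N) scale, not the model's actual A_ij) and enumerates all 2^N sign patterns of the Rademacher prior.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
N = 5
A = np.sqrt(N) * rng.standard_normal((N, N))   # stand-in, A_ij = O(sqrt(N))
iu = np.triu_indices(N, k=1)

Z = zeta = 0.0
for signs in product([1.0, -1.0], repeat=N):
    x = np.array(signs) / np.sqrt(N)           # Rademacher prior, x_i = ±1/sqrt(N)
    xx = np.outer(x, x)[iu]
    Z    += np.exp(np.sum(A[iu] * xx))
    zeta += np.prod(1 + N * xx * np.tanh(A[iu] / N))
Z, zeta = Z / 2**N, zeta / 2**N

# log Z - log zeta equals the sum of log cosh(A_ij / N), exactly.
print(np.log(Z) - np.log(zeta), np.log(np.cosh(A[iu] / N)).sum())
```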

In the rest of this section, we first find the fluctuation of ζ\zeta^{\prime}, which can be readily obtained by the CLT. Then, we adopt the strategy for the proof of Proposition 2.2 in [1] to conclude the proof of Proposition 4.4 by finding the asymptotic fluctuation of logζ\log\zeta.

5.1 Conditional law of ζ\zeta^{\prime}

Recall the definition of ζ\zeta^{\prime} in (5.4). Since

Aij=ωNPij(1)+𝒪(N1/2)A_{ij}=-\sqrt{\omega N}P^{(1)}_{ij}+{\mathcal{O}}(N^{-1/2})

and Pij(1)=𝒪(1)P^{(1)}_{ij}={\mathcal{O}}(1), we find that

i<jAij4N4ω22𝔼[(P12(1))4].\sum_{i<j}\frac{A_{ij}^{4}}{N^{4}}\to\frac{\omega^{2}}{2}\mathbb{E}[(P^{(1)}_{12})^{4}].

We want to apply the central limit theorem to the array (Aij2/(2N2):1i<jN)(A_{ij}^{2}/(2N^{2}):1\leq i<j\leq N), conditional on BB. Since Pij(1)P^{(1)}_{ij} and NBijNB_{ij} are NN-independent random variables, we find that VarB((Pij(1))2)\operatorname{Var}_{B}((P^{(1)}_{ij})^{2}) is a non-negative, NN-independent random variable. In particular, 𝔼[VarB((Pij(1))2)]\mathbb{E}[\operatorname{Var}_{B}((P^{(1)}_{ij})^{2})] is a non-negative, NN-independent constant.

Suppose that 𝔼[VarB((Pij(1))2)]>0\mathbb{E}[\operatorname{Var}_{B}((P^{(1)}_{ij})^{2})]>0. Then, since

2N(N1)i<jVarB(Aij2ωN)𝔼[VarB((Pij(1))2)]>0,\frac{2}{N(N-1)}\sum_{i<j}\operatorname{Var}_{B}\big{(}\frac{A_{ij}^{2}}{\omega N}\big{)}\to\mathbb{E}[\operatorname{Var}_{B}((P^{(1)}_{ij})^{2})]>0,

we find that there exists a constant cc, independent of NN, such that

i<jVarB(Aij2)>cN4\sum_{i<j}\operatorname{Var}_{B}\big{(}A_{ij}^{2}\big{)}>cN^{4}

almost surely. Further, since

𝔼[(Aij2𝔼B[Aij2])4]=O(N4),\mathbb{E}[(A_{ij}^{2}-\mathbb{E}_{B}[A_{ij}^{2}])^{4}]=O(N^{4}),

Lyapunov’s condition for the central limit theorem is satisfied with the array (Aij2:1i<jN)(A_{ij}^{2}:1\leq i<j\leq N) as

(i<j𝔼[(Aij2𝔼B[Aij2])4])/(i<jVarB(Aij2))2=O(N2)\left(\sum_{i<j}\mathbb{E}\left[\Big{(}A_{ij}^{2}-\mathbb{E}_{B}\big{[}A_{ij}^{2}\big{]}\Big{)}^{4}\right]\right)\Big{/}\left(\sum_{i<j}\operatorname{Var}_{B}\big{(}A_{ij}^{2}\big{)}\right)^{2}=O(N^{-2})

almost surely. Since Lyapunov’s condition is scaling-invariant, we find that the central limit theorem holds for (Aij2/(2N2):1i<jN)(A_{ij}^{2}/(2N^{2}):1\leq i<j\leq N), conditional on BB, and the asymptotic law of the random variable

U:=ζ(12N2i<j𝔼B[Aij2]ω224𝔼[(P12(1))4])U:=\zeta^{\prime}-\left(\frac{1}{2N^{2}}\sum_{i<j}\mathbb{E}_{B}[A_{ij}^{2}]-\frac{\omega^{2}}{24}\mathbb{E}[(P^{(1)}_{12})^{4}]\right) (5.5)

is given by a centered Gaussian with variance θ\theta defined in (4.11), since

(i<jVarB(Aij22N2))ω28𝔼[VarB((P12(1))2)].\left(\sum_{i<j}\operatorname{Var}_{B}\big{(}\frac{A_{ij}^{2}}{2N^{2}}\big{)}\right)\to\frac{\omega^{2}}{8}\mathbb{E}[\operatorname{Var}_{B}\big{(}(P^{(1)}_{12})^{2}\big{)}].
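The O(N^{-2}) decay of the Lyapunov ratio can also be observed numerically. The following sketch samples an unconditional stand-in for P^{(1)} from a uniform law and drops the conditioning on B (both simplifications are ours, for illustration only); the unconditional ratio exhibits the same scaling.

```python
import numpy as np

rng = np.random.default_rng(2)
omega = 0.5                                  # illustrative SNR
for N in (50, 100, 200, 400):
    n_pairs = N * (N - 1) // 2
    # Stand-in for A_ij ≈ -sqrt(omega*N) P^(1)_ij with P^(1) uniform on
    # [-sqrt(3), sqrt(3)] (unit variance), so A_ij^2 = omega*N*(P^(1))^2.
    P1 = rng.uniform(-np.sqrt(3), np.sqrt(3), size=n_pairs)
    A2 = omega * N * P1**2
    ratio = ((A2 - A2.mean())**4).sum() / ((A2 - A2.mean())**2).sum()**2
    print(N, ratio * N**2)                   # roughly constant in N
```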

If 𝔼[VarB((Pij(1))2)]=0\mathbb{E}[\operatorname{Var}_{B}((P^{(1)}_{ij})^{2})]=0, then (Pij(1))2=𝔼B[(Pij(1))2](P^{(1)}_{ij})^{2}=\mathbb{E}_{B}[(P^{(1)}_{ij})^{2}] almost surely, and hence Aij2=𝔼B[Aij2]+𝒪(1)A_{ij}^{2}=\mathbb{E}_{B}[A_{ij}^{2}]+{\mathcal{O}}(1). We then find that

(i<jVarB(Aij22N2))0\left(\sum_{i<j}\operatorname{Var}_{B}\big{(}\frac{A_{ij}^{2}}{2N^{2}}\big{)}\right)\to 0

almost surely, which implies that

12N2i<jAij212N2i<j𝔼B[Aij2]0\frac{1}{2N^{2}}\sum_{i<j}A_{ij}^{2}-\frac{1}{2N^{2}}\sum_{i<j}\mathbb{E}_{B}[A_{ij}^{2}]\to 0

almost surely. Thus, in this case, UU converges to 0 and the claim holds with θ=0\theta=0, which completes the proof of the second part of Proposition 4.4.

5.2 Conditional law of ζ\zeta

In this subsection, we prove the first part of Proposition 4.4 by a combinatorial approach. Consider the product i<j(1+Nxixjtanh(Aij/N))\prod_{i<j}\left(1+Nx_{i}x_{j}\tanh(A_{ij}/N)\right) in (5.1) as a polynomial in the xkx_{k}’s. A term in this polynomial is of the form

Nxi1xj1xi2xj2xixjtanhAi1j1NtanhAi2j2NtanhAijN.N^{\ell}x_{i_{1}}x_{j_{1}}x_{i_{2}}x_{j_{2}}\dots x_{i_{\ell}}x_{j_{\ell}}\tanh\frac{A_{i_{1}j_{1}}}{N}\tanh\frac{A_{i_{2}j_{2}}}{N}\dots\tanh\frac{A_{i_{\ell}j_{\ell}}}{N}. (5.6)

If the multiplicity of any of the xi1,,xi,xj1,,xjx_{i_{1}},\dots,x_{i_{\ell}},x_{j_{1}},\dots,x_{j_{\ell}} in (5.6) is odd, then the term (5.6) does not contribute to (5.1); for instance, if the multiplicity of xi1x_{i_{1}} is odd, then the contribution of the term (5.6) from the case xi1=1/Nx_{i_{1}}=1/\sqrt{N} and that from the case xi1=1/Nx_{i_{1}}=-1/\sqrt{N} exactly cancel each other. In particular, any non-vanishing term (5.6) is actually of the form

Nxi1xj1xi2xj2xixjtanhAi1j1NtanhAi2j2NtanhAijN=tanhAi1j1NtanhAi2j2NtanhAijN,\begin{split}&N^{\ell}x_{i_{1}}x_{j_{1}}x_{i_{2}}x_{j_{2}}\dots x_{i_{\ell}}x_{j_{\ell}}\tanh\frac{A_{i_{1}j_{1}}}{N}\tanh\frac{A_{i_{2}j_{2}}}{N}\dots\tanh\frac{A_{i_{\ell}j_{\ell}}}{N}\\ &=\tanh\frac{A_{i_{1}j_{1}}}{N}\tanh\frac{A_{i_{2}j_{2}}}{N}\dots\tanh\frac{A_{i_{\ell}j_{\ell}}}{N},\end{split} (5.7)

where the pair (ik,jk)(i_{k},j_{k}) does not repeat. We then find a one-to-one correspondence between the terms of the form (5.7) and the simple and closed graphs on the vertex set [N]:={1,2,,N}[N]:=\{1,2,\dots,N\}, i.e., graphs with no self-edges, no repeated edges, and no vertices of odd degree. Define for a simple and closed graph Γ\Gamma

w(Γ):=(i,j)E(Γ)tanh(AijN),w(\Gamma):=\prod_{(i,j)\in E(\Gamma)}\tanh\Big{(}\frac{A_{ij}}{N}\Big{)},

where E(Γ)E(\Gamma) denotes the edge set of Γ\Gamma. A straightforward calculation shows that

ζ=Γw(Γ).\zeta=\sum_{\Gamma}w(\Gamma). (5.8)
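The expansion (5.8) can also be verified by brute force for small N: the sketch below (again with Gaussian stand-ins for the A_ij, an illustrative choice) compares ζ computed from (5.1) with the sum of w(Γ) over all edge sets in which every vertex has even degree, the empty graph contributing w = 1.

```python
import numpy as np
from itertools import combinations, product

rng = np.random.default_rng(3)
N = 5
A = np.sqrt(N) * rng.standard_normal((N, N))
edges = list(combinations(range(N), 2))
T = {e: np.tanh(A[e] / N) for e in edges}

zeta = 0.0                                   # left side, from (5.1)
for signs in product([1.0, -1.0], repeat=N):
    x = np.array(signs) / np.sqrt(N)
    zeta += np.prod([1 + N * x[i] * x[j] * T[(i, j)] for i, j in edges])
zeta /= 2**N

total = 0.0                                  # right side: even-degree edge sets
for mask in product([0, 1], repeat=len(edges)):
    deg = [0] * N
    for b, (i, j) in zip(mask, edges):
        deg[i] += b; deg[j] += b
    if all(d % 2 == 0 for d in deg):
        total += np.prod([T[e] for b, e in zip(mask, edges) if b] or [1.0])

print(zeta, total)                           # the two values agree
```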

We further decompose w(Γ)w(\Gamma) by considering simple loops, or cycles, contained in Γ\Gamma. (A simple loop is a connected graph in which every vertex has degree 2.) For example, if the edges of Γ\Gamma are (1,2)(1,2), (1,3)(1,3), (2,3)(2,3), (3,4)(3,4), (3,5)(3,5), (4,5)(4,5), then the degree of the vertex 33 is 44 and thus Γ\Gamma is not a simple loop. However, Γ\Gamma can be decomposed into simple loops γ1\gamma_{1} and γ2\gamma_{2} whose sets of edges are {(1,2),(1,3),(2,3)}\{(1,2),(1,3),(2,3)\} and {(3,4),(3,5),(4,5)}\{(3,4),(3,5),(4,5)\}, respectively.

Let us denote a simple loop by the lowercase γ\gamma. Note that for a simple loop γ\gamma, 𝔼B[w(γ)]=0\mathbb{E}_{B}[w(\gamma)]=0, since the AijA_{ij}’s are independent and their conditional densities given BB are symmetric. (See Remark 4.6.) We introduce the following result from [1]:

Lemma 5.1.

The random variable ζγ(1+w(γ))\zeta-\prod_{\gamma}(1+w(\gamma)) converges in probability to 0.

For the proof of Lemma 5.1, see the end of Section 3 in [1].

For the proof of the first part of Proposition 4.4, we claim for γ(1+w(γ))\prod_{\gamma}(1+w(\gamma)) that

γ(1+w(γ))=exp(γ(w(γ)w(γ)22)(1+𝒪(N1))).\prod_{\gamma}\left(1+w(\gamma)\right)=\exp\left(\sum_{\gamma}\Big{(}w(\gamma)-\frac{w(\gamma)^{2}}{2}\Big{)}(1+{\mathcal{O}}(N^{-1}))\right). (5.9)

To prove the claim, we first notice that if |E(γ)|=k|E(\gamma)|=k, i.e., a simple loop γ\gamma has kk edges, then

𝔼|w(γ)|2(ωF+ϵ)kNk,𝔼|w(γ)|s=O(Nsk2)\mathbb{E}|w(\gamma)|^{2}\leq\frac{(\omega F+\epsilon)^{k}}{N^{k}},\quad\mathbb{E}|w(\gamma)|^{s}=O(N^{-\frac{sk}{2}}) (5.10)

for any ϵ>0\epsilon>0 and any positive integer ss, which can be checked from the inequality |tanhx||x||\tanh x|\leq|x|. Similarly, applying Markov’s inequality, we can show that

maxγ:|E(γ)|=k|w(γ)|s=𝒪(Nsk2),maxγ:|E(γ)|=k𝔼B|w(γ)|s=𝒪(Nsk2)\max_{\gamma:|E(\gamma)|=k}|w(\gamma)|^{s}={\mathcal{O}}(N^{-\frac{sk}{2}}),\quad\max_{\gamma:|E(\gamma)|=k}\mathbb{E}_{B}|w(\gamma)|^{s}={\mathcal{O}}(N^{-\frac{sk}{2}}) (5.11)

for any positive integer ss, where E(γ)E(\gamma) is the set of edges of γ\gamma. We thus find from the Taylor expansion that

1+w(γ)=exp((w(γ)w(γ)22)(1+𝒪(N1))).1+w(\gamma)=\exp\left(\Big{(}w(\gamma)-\frac{w(\gamma)^{2}}{2}\Big{)}(1+{\mathcal{O}}(N^{-1}))\right).

From the expansion (5.9), it remains to prove convergence results for the sums of w(γ)w(\gamma) and w(γ)2w(\gamma)^{2}. We claim the following:

Lemma 5.2.

Let ξ:=γw(γ)\xi:=\sum_{\gamma}w(\gamma), η:=γw(γ)2\eta:=\sum_{\gamma}w(\gamma)^{2}. Let ν\nu be given as in (4.9). Then, as NN\to\infty,

  1. (i)

The conditional law of ξ\xi given BB converges in probability to a centered Gaussian with variance 2ν2\nu, and

  2. (ii)

    η\eta converges in probability to 2ν2\nu.

See Appendix B for the proof of Lemma 5.2.

Since

γ(1+w(γ))=exp(ξη2+𝒪(N1)),\prod_{\gamma}\left(1+w(\gamma)\right)=\exp\left(\xi-\frac{\eta}{2}+{\mathcal{O}}(N^{-1})\right),

it is immediate to prove the first part of Proposition 4.4 from Lemmas 5.1 and 5.2. Furthermore, the source of the fluctuation of logζ\log\zeta is ξ=γw(γ)\xi=\sum_{\gamma}w(\gamma), which is orthogonal under LB2L_{B}^{2} to the Aij2A_{ij}^{2} that generate the fluctuation of ζ\zeta^{\prime}; hence the third part of Proposition 4.4, the asymptotic orthogonality of logζ\log\zeta and ζ\zeta^{\prime}, also holds.

6 Conclusion and future works

In this paper, we studied the detection problem for the spiked Wigner model. Assuming the Rademacher prior, we proved the asymptotic normality of the log likelihood ratio with precise formulas for the mean and the variance of the limiting Gaussian. We computed the limiting error of the likelihood ratio test and conducted numerical experiments to compare the empirical error with the theoretical one. We also obtained the corresponding results for the spiked IID model where the noise is asymmetric.

A natural extension of the current work is to prove the corresponding results for other models, especially the spiked Wigner models with the spherical prior or the sparse Rademacher prior, and prove the conjecture that the limiting distribution of the log likelihood ratio depends on the prior. It is also of interest to extend the techniques in this paper to spiked rectangular models.

Acknowledgments

We thank the anonymous referees for their helpful comments and suggestions. The third author thanks Ji Hyung Jung and Seong-Mi Seo for helpful comments. The work of the second author was performed when he was affiliated with the Stochastic Analysis and Application Research Center (SAARC) at KAIST. The first author was supported in part by National Research Foundation of Korea (NRF) grants funded by the Korea government (MSIT) (No. 2021R1C1C11008539 and No. 2023R1A2C1005843). The second author and the third author were supported in part by a National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2019R1A5A1028324).

Appendix A Proof of Lemmas 4.2 and 4.3

In this section, we provide the details of the proofs of the high-probability estimates, Lemmas 4.2 and 4.3.

Proof of Lemma 4.2.

We first prove the estimate for HijH_{ij}. For given ϵ,D>0\epsilon,D>0, choose an integer D>D/ϵD^{\prime}>D/\epsilon. By applying Markov’s inequality,

(|Hij|>N12+ϵ)=(ND2|Hij|D>NDϵ)NDϵND2𝔼[|Hij|D]<CDNDϵ<ND\mathbb{P}(|H_{ij}|>N^{-\frac{1}{2}+\epsilon})=\mathbb{P}(N^{\frac{D^{\prime}}{2}}|H_{ij}|^{D^{\prime}}>N^{D^{\prime}\epsilon})\leq N^{-D^{\prime}\epsilon}N^{\frac{D^{\prime}}{2}}\mathbb{E}[|H_{ij}|^{D^{\prime}}]<C_{D^{\prime}}N^{-D^{\prime}\epsilon}<N^{-D}

for any sufficiently large NN, where we used the finite moment assumption for HijH_{ij} in Definition 2.1. This proves that Hij=𝒪(N12)H_{ij}={\mathcal{O}}(N^{-\frac{1}{2}}). Note that it also implies that NHij=𝒪(1)\sqrt{N}H_{ij}={\mathcal{O}}(1).

From the polynomial bound assumption for p(s)/pp^{(s)}/p in Assumption 2.2,

|p(s)(NHij)p(NHij)|Cs|NHij|rs.\left|\frac{p^{(s)}(\sqrt{N}H_{ij})}{p(\sqrt{N}H_{ij})}\right|\leq C_{s}|\sqrt{N}H_{ij}|^{r_{s}}.

Thus, for any ϵ,D>0\epsilon^{\prime},D>0

(|p(s)(NHij)p(NHij)|>Nϵ)(|NHij|>(NϵCs)1rs)\mathbb{P}\left(\left|\frac{p^{(s)}(\sqrt{N}H_{ij})}{p(\sqrt{N}H_{ij})}\right|>N^{\epsilon^{\prime}}\right)\leq\mathbb{P}\left(|\sqrt{N}H_{ij}|>\left(\frac{N^{\epsilon^{\prime}}}{C_{s}}\right)^{\frac{1}{r_{s}}}\right)

for any sufficiently large NN. Since ϵ\epsilon^{\prime} is arbitrary, this proves that p(s)(NHij)p(NHij)=𝒪(1)\frac{p^{(s)}(\sqrt{N}H_{ij})}{p(\sqrt{N}H_{ij})}={\mathcal{O}}(1). The estimate for pd(s)(NHij)pd(NHij)\frac{p_{d}^{(s)}(\sqrt{N}H_{ij})}{p_{d}(\sqrt{N}H_{ij})} can be proved in a similar manner.

The last part of the lemma follows from the fact that h(s)h^{(s)} and hd(s)h_{d}^{(s)} can be written as polynomials of the (p(s)/p)(p^{(s)}/p)’s and the (pd(s)/pd)(p_{d}^{(s)}/p_{d})’s, respectively. ∎

Proof of Lemma 4.3.

We prove the first inequality for the case k=1k=1 only; the general case can be handled in a similar manner.

Set s:=s1s:=s_{1}. From Lemma 4.2, we find that for any ϵ>0\epsilon>0, Pij(s)P_{ij}^{(s)} are uniformly bounded by NϵN^{\epsilon} with high probability. Define

Ωij:={|Pij(s)|<Nϵ},Ω:=i,jΩij.\Omega_{ij}:=\{|P_{ij}^{(s)}|<N^{\epsilon}\},\quad\Omega:=\bigcap_{i,j}\Omega_{ij}.

Since each Ωij\Omega_{ij} holds with high probability, by taking a union bound, Ω\Omega also holds with high probability. For any D>0D>0, applying Hoeffding’s inequality (or a high-order Markov inequality) on Ω\Omega, we obtain

(|i<j(Pij(s)𝔼[Pij(s)|Ωij])|N1+2ϵ)<ND\mathbb{P}\left(\left|\sum_{i<j}\left(P_{ij}^{(s)}-\mathbb{E}[P_{ij}^{(s)}|\Omega_{ij}]\right)\right|\geq N^{1+2\epsilon}\right)<N^{-D} (A.1)

for any sufficiently large NN. Moreover, since Pij(s)P_{ij}^{(s)} is polynomially bounded, the Cauchy–Schwarz inequality yields, for any D>0D>0,

|𝔼[Pij(s)𝟏Ωijc]|(𝔼[|Pij(s)|2])12(Ωijc)12Cs(𝔼[|NHij|2rs])12(Ωijc)12<ND\left|\mathbb{E}[P_{ij}^{(s)}\mathbf{1}_{\Omega_{ij}^{c}}]\right|\leq\left(\mathbb{E}[|P_{ij}^{(s)}|^{2}]\right)^{\frac{1}{2}}\mathbb{P}(\Omega_{ij}^{c})^{\frac{1}{2}}\leq C_{s}\left(\mathbb{E}[|\sqrt{N}H_{ij}|^{2r_{s}}]\right)^{\frac{1}{2}}\mathbb{P}(\Omega_{ij}^{c})^{\frac{1}{2}}<N^{-D}

for any sufficiently large NN, which shows that

𝔼[Pij(s)]𝔼[Pij(s)|Ωij]=𝔼[Pij(s)]𝔼[Pij(s)𝟏Ωij](Ωij)=𝔼[Pij(s)𝟏Ωijc]+O(ND)=O(ND).\mathbb{E}[P_{ij}^{(s)}]-\mathbb{E}[P_{ij}^{(s)}|\Omega_{ij}]=\mathbb{E}[P_{ij}^{(s)}]-\frac{\mathbb{E}[P_{ij}^{(s)}\mathbf{1}_{\Omega_{ij}}]}{\mathbb{P}(\Omega_{ij})}=\mathbb{E}[P_{ij}^{(s)}\mathbf{1}_{\Omega_{ij}^{c}}]+O(N^{-D})=O(N^{-D}). (A.2)

Combining (A.1) and (A.2),

i<jPij(s)=N(N1)2𝔼[P12(s)]+𝒪(N),\sum_{i<j}P_{ij}^{(s)}=\frac{N(N-1)}{2}\mathbb{E}[P_{12}^{(s)}]+{\mathcal{O}}(N),

and after multiplying both sides by 2N(N1)\frac{2}{N(N-1)}, we conclude that the first inequality of Lemma 4.3 holds. ∎
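As a concrete illustration of the concentration in Lemma 4.3, the following sketch estimates F=𝔼[(P12(1))2]F=\mathbb{E}[(P_{12}^{(1)})^{2}] empirically when the off-diagonal density is logistic, in which case p'(t)/p(t) = -tanh(t/2) and F = 1/3; the logistic choice and the sample sizes are ours, for illustration only.

```python
import numpy as np

rng = np.random.default_rng(4)
for N in (100, 400, 1600):
    t = rng.logistic(size=(N, N))      # stand-in for the entries sqrt(N) H_ij
    P1 = -np.tanh(t / 2)               # p'(t)/p(t) for the logistic density
    iu = np.triu_indices(N, k=1)
    print(N, (P1[iu]**2).mean())       # concentrates around F = 1/3
```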

Appendix B Proof of Lemma 5.2

We first compute the limit of 𝔼[η]\mathbb{E}[\eta]. Let

ξk:=γ:|E(γ)|=kw(γ).\xi_{k}:=\sum_{\gamma:|E(\gamma)|=k}w(\gamma).

Note that for distinct simple loops γ1\gamma_{1} and γ2\gamma_{2}, w(γ1)w(\gamma_{1}) and w(γ2)w(\gamma_{2}) are orthogonal with respect to both LB2L^{2}_{B} and L2L^{2}. To check this, assume that the edge (i1,j1)(i_{1},j_{1}) is in γ1\gamma_{1} but not in γ2\gamma_{2}. Then, from the independence of AijA_{ij}’s,

𝔼[w(γ1)w(γ2)]=𝔼[tanhAi1j1N(i,j):(i,j)E(γ1)E(γ2),(i,j)(i1,j1)tanhAijN]=𝔼[tanhAi1j1N]𝔼[(i,j):(i,j)E(γ1)E(γ2),(i,j)(i1,j1)tanhAijN]=0.\begin{split}\mathbb{E}[w(\gamma_{1})w(\gamma_{2})]&=\mathbb{E}\left[\tanh\frac{A_{i_{1}j_{1}}}{N}\prod_{(i,j):(i,j)\in E(\gamma_{1})\cup E(\gamma_{2}),(i,j)\neq(i_{1},j_{1})}\tanh\frac{A_{ij}}{N}\right]\\ &=\mathbb{E}\left[\tanh\frac{A_{i_{1}j_{1}}}{N}\right]\mathbb{E}\left[\prod_{(i,j):(i,j)\in E(\gamma_{1})\cup E(\gamma_{2}),(i,j)\neq(i_{1},j_{1})}\tanh\frac{A_{ij}}{N}\right]=0.\end{split}

A similar idea can also be used to prove the orthogonality of w(γ1)w(\gamma_{1}) and w(γ2)w(\gamma_{2}) with respect to LB2L^{2}_{B}.

We now consider 𝔼ξkLB22\mathbb{E}||\xi_{k}||^{2}_{L^{2}_{B}}. Let P(N,k):=N!(Nk)!P(N,k):=\frac{N!}{(N-k)!}. Since the number of possible simple loops of length kk in the vertex set [N][N] is P(N,k)/(2k)P(N,k)/(2k), we have from the orthogonality of the w(γ)w(\gamma)’s that

𝔼ξkLB22=γ:|E(γ)|=k(i,j)E(γ)𝔼[tanh2AijN]=P(N,k)2k𝔼[tanh2A12N]kNk2k𝔼[(ωN)(P12(1))2N2]k=(ωF)k2k.\begin{split}\mathbb{E}||\xi_{k}||_{L^{2}_{B}}^{2}&=\sum_{\gamma:|E(\gamma)|=k}\prod_{(i,j)\in E(\gamma)}\mathbb{E}\left[\tanh^{2}\frac{A_{ij}}{N}\right]=\frac{P(N,k)}{2k}\mathbb{E}\left[\tanh^{2}\frac{A_{12}}{N}\right]^{k}\\ &\to\frac{N^{k}}{2k}\mathbb{E}\left[(\omega N)\frac{(P^{(1)}_{12})^{2}}{N^{2}}\right]^{k}=\frac{(\omega F)^{k}}{2k}.\end{split} (B.1)

Also, for any given ϵ>0\epsilon>0 and any sufficiently large NN,

𝔼ξkLB22(ωF+ϵ)k2k.\mathbb{E}||\xi_{k}||_{L^{2}_{B}}^{2}\leq\frac{(\omega F+\epsilon)^{k}}{2k}. (B.2)

Since ξ=kNξk\xi=\sum_{k\leq N}\xi_{k} and ωF<1\omega F<1, by the orthogonality and the dominated convergence theorem we have

𝔼[η]=𝔼[ξ2]=𝔼[𝔼B[ξ2]]k=3(ωF)k2k=2ν\mathbb{E}[\eta]=\mathbb{E}[\xi^{2}]=\mathbb{E}[\mathbb{E}_{B}[\xi^{2}]]\to\sum_{k=3}^{\infty}\frac{(\omega F)^{k}}{2k}=2\nu (B.3)

as NN\to\infty.
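The closed form used here (and, with ωF replaced by 2ωF, in (D.8) below) is just the Taylor series of -log(1-y) with its first two terms removed; a two-line numerical check, with an arbitrary admissible value standing in for y = ωF:

```python
import numpy as np

y = 0.7                                             # stands for omega*F < 1
series = sum(y**k / (2 * k) for k in range(3, 10000))
closed = -0.5 * (np.log(1 - y) + y + y**2 / 2)
print(series, closed)                               # agree to high precision
```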

We next consider Var(𝔼B[ξ2])\operatorname{Var}(\mathbb{E}_{B}[\xi^{2}]). Our goal is to show that Var(𝔼B[ξ2])0\operatorname{Var}(\mathbb{E}_{B}[\xi^{2}])\to 0 as NN\to\infty, and for this purpose, it suffices to prove that Var(𝔼B[(ξ(j))2])0\operatorname{Var}(\mathbb{E}_{B}[(\xi^{(j)})^{2}])\to 0 for any fixed jj where we let ξ(j):=kjξk\xi^{(j)}:=\sum_{k\leq j}\xi_{k}, since (B.1) and (B.2) imply that

kjξk0\sum_{k\geq j}\xi_{k}\to 0

as jj\to\infty in L2L^{2}, uniformly in NN. Note that

Var(𝔼B[(ξ(j))2])=γ1,γ2:|E(γ1)|,|E(γ2)|jCov(𝔼B[w(γ1)2],𝔼B[w(γ2)2]).\operatorname{Var}(\mathbb{E}_{B}[(\xi^{(j)})^{2}])=\sum_{\gamma_{1},\gamma_{2}:|E(\gamma_{1})|,|E(\gamma_{2})|\leq j}\operatorname{Cov}\left(\mathbb{E}_{B}[w(\gamma_{1})^{2}],\mathbb{E}_{B}[w(\gamma_{2})^{2}]\right). (B.4)

The covariance vanishes if there is no common edge, and otherwise

Cov(𝔼B[w(γ1)2],𝔼B[w(γ2)2])Jm(ωF)|E(γ1)|+|E(γ2)|2mN(|E(γ1)|+|E(γ2)|)\operatorname{Cov}\Big{(}\mathbb{E}_{B}[w(\gamma_{1})^{2}],\mathbb{E}_{B}[w(\gamma_{2})^{2}]\Big{)}\leq J^{m}(\omega F)^{|E(\gamma_{1})|+|E(\gamma_{2})|-2m}N^{-(|E(\gamma_{1})|+|E(\gamma_{2})|)} (B.5)

for large NN, where JJ is a fixed number greater than N2Var(Aij2)N^{-2}\operatorname{Var}(A_{ij}^{2}) and mm is the number of common edges of γ1\gamma_{1} and γ2\gamma_{2}. On the other hand, the number of isomorphism types of γ1\gamma_{1} and γ2\gamma_{2} is finite for a fixed jj, and since γ1\gamma_{1} and γ2\gamma_{2} share at least one edge, the number of possible choices of γ1\gamma_{1} and γ2\gamma_{2} for each isomorphism type is O(N|E(γ1)|+|E(γ2)|2)O(N^{|E(\gamma_{1})|+|E(\gamma_{2})|-2}); this verifies that the right hand side of (B.4) converges to 0. Similarly, we can show that Var(η)0\operatorname{Var}(\eta)\to 0 as NN\to\infty, which combined with (B.3) proves (ii).

It remains to estimate 𝔼B[ξ2s]\mathbb{E}_{B}[\xi^{2s}]. Our goal is to show that

𝔼B[ξ2s](2s)!2ss!(𝔼B[ξ2])s0\mathbb{E}_{B}[\xi^{2s}]-\frac{(2s)!}{2^{s}s!}(\mathbb{E}_{B}[\xi^{2}])^{s}\to 0 (B.6)

in probability. The coefficient ((2s)!)/(2ss!)((2s)!)/(2^{s}s!) is the number of pairings of [2s][2s], or equivalently the 2s2s-th moment of the standard Gaussian. Since the conditional odd moments of ξ\xi given BB are zero, (i) immediately follows from (B.3) and (B.6). For the proof of (B.6), we notice that it suffices to prove the statement for ξ(j)\xi^{(j)} as in the paragraph above.

By conditional independence of w(γ)w(\gamma)’s, we have

𝔼B[(ξ(j))2s](2s)!2ss!(𝔼B[(ξ(j))2])s=γi:i2s𝔼B[i2sw(γi)](2s)!2ss!γi:i2sis𝔼B[w(γ2i1)w(γ2i)],\begin{split}\mathbb{E}_{B}[(\xi^{(j)})^{2s}]-\frac{(2s)!}{2^{s}s!}(\mathbb{E}_{B}[(\xi^{(j)})^{2}])^{s}=\sum_{\gamma_{i}:i\leq 2s}\mathbb{E}_{B}\left[\prod_{i\leq 2s}w(\gamma_{i})\right]-\frac{(2s)!}{2^{s}s!}\sum_{\gamma_{i}:i\leq 2s}\prod_{i\leq s}\mathbb{E}_{B}[w(\gamma_{2i-1})w(\gamma_{2i})],\end{split} (B.7)

where the γi\gamma_{i}’s run over simple loops of length at most jj. In the first term, however, the contribution from the case where the γi\gamma_{i}’s are paired into non-intersecting double loops cancels with the second term. Thus, we have

𝔼B[(ξ(j))2s](2s)!2ss!(𝔼B[(ξ(j))2])s=M𝔼B[w(M)],\mathbb{E}_{B}[(\xi^{(j)})^{2s}]-\frac{(2s)!}{2^{s}s!}(\mathbb{E}_{B}[(\xi^{(j)})^{2}])^{s}=\sum_{M}\mathbb{E}_{B}[w(M)], (B.8)

where MM runs over multigraphs with even edge multiplicities that can be described as a union of 2s2s loops of length at most jj, but not as a disjoint union of ss double loops. From the estimate

maxM:|E(M)|=k𝔼B|w(M)|s=𝒪(Ns2|E(M)|),\max_{M:|E(M)|=k}\mathbb{E}_{B}|w(M)|^{s}={\mathcal{O}}(N^{-\frac{s}{2}|E(M)|}),

which can be proved in the same way as (5.11), each summand on the right hand side of (B.8) is 𝒪(Ns){\mathcal{O}}(N^{-s}). Moreover, the sum of the degrees of all the vertices of MM, which is equal to 4s4s, is at least 4|V(M)|+44|V(M)|+4 by the assumptions on MM, so that |V(M)|s1|V(M)|\leq s-1. Therefore, the contribution of each isomorphism type to the sum is 𝒪(Ns+|V(M)|){\mathcal{O}}(N^{-s+|V(M)|}), hence 𝒪(N1){\mathcal{O}}(N^{-1}). Since there are finitely many isomorphism types of MM, we obtain (B.6). This completes the proof of Lemma 5.2.

Appendix C Proof of Theorem 2.8

In this section, we provide the detail of the proof of Theorem 2.8.

C.1 Preliminaries

Recall that we have the following estimates from Lemma 4.2: For any 1i,jN1\leq i,j\leq N and for any s1s\geq 1,

Xij=𝒪(N12),h(s)(NXij)=𝒪(1).X_{ij}={\mathcal{O}}(N^{-\frac{1}{2}}),\quad h^{(s)}(\sqrt{N}X_{ij})={\mathcal{O}}(1). (C.1)

In particular, for any ϵ>0\epsilon>0, NXij\sqrt{N}X_{ij} and h(s)(NXij)h^{(s)}(\sqrt{N}X_{ij}) are uniformly bounded by NϵN^{\epsilon} with high probability. We set

Qij(s):=p(s)(NYij)p(NYij),Q^{(s)}_{ij}:=\frac{p^{(s)}(\sqrt{N}Y_{ij})}{p(\sqrt{N}Y_{ij})}, (C.2)

where we denote by p(s)p^{(s)} the ss-th derivative of pp. We also have from Lemma 4.3 that

1N(N1)ijQij(s1)Qij(s2)Qij(sk)=𝔼[Q12(s1)Q12(s2)Q12(sk)]+𝒪(N1),1Ni=1NQii(s1)Qii(s2)Qii(sk)=𝔼[Q11(s1)Q11(s2)Q11(sk)]+𝒪(N12),\begin{split}\frac{1}{N(N-1)}\sum_{i\neq j}Q_{ij}^{(s_{1})}Q_{ij}^{(s_{2})}\dots Q_{ij}^{(s_{k})}&=\mathbb{E}[Q_{12}^{(s_{1})}Q_{12}^{(s_{2})}\dots Q_{12}^{(s_{k})}]+{\mathcal{O}}(N^{-1}),\\ \frac{1}{N}\sum_{i=1}^{N}Q_{ii}^{(s_{1})}Q_{ii}^{(s_{2})}\dots Q_{ii}^{(s_{k})}&=\mathbb{E}[Q_{11}^{(s_{1})}Q_{11}^{(s_{2})}\dots Q_{11}^{(s_{k})}]+{\mathcal{O}}(N^{-\frac{1}{2}}),\end{split} (C.3)

for any s1,s2,,sk1s_{1},s_{2},\dots,s_{k}\geq 1.

Assume 𝑯0{\boldsymbol{H}}_{0}. The likelihood ratio is

(Y;ω)=12N𝒙i,j=1Np(NYijωNxixj)p(NYij)=12N𝒙ijp(NYijωNxixj)p(NYij)kp(NYkkωNxk2)p(NYkk).\begin{split}{\mathcal{L}}(Y;\omega)&=\frac{1}{2^{N}}\sum_{{\boldsymbol{x}}}\prod_{i,j=1}^{N}\frac{p(\sqrt{N}Y_{ij}-\sqrt{\omega N}x_{i}x_{j})}{p(\sqrt{N}Y_{ij})}\\ &=\frac{1}{2^{N}}\sum_{{\boldsymbol{x}}}\prod_{i\neq j}\frac{p(\sqrt{N}Y_{ij}-\sqrt{\omega N}x_{i}x_{j})}{p(\sqrt{N}Y_{ij})}\prod_{k}\frac{p(\sqrt{N}Y_{kk}-\sqrt{\omega N}x_{k}^{2})}{p(\sqrt{N}Y_{kk})}\,.\end{split} (C.4)

As in the spiked Wigner case, the diagonal part can be handled separately, since it does not depend on the spike or the off-diagonal part. Since Qij(s)Q^{(s)}_{ij} is an odd function of NYij\sqrt{N}Y_{ij} if ss is odd and an even function if ss is even, by the Taylor expansion,

p(NYijωNxixj)p(NYij)=1ωNQij(1)xixj+ωQij(2)2Nωω6NQij(3)xixj+ω2Qij(4)24N2+𝒪(N52),\begin{split}&\frac{p(\sqrt{N}Y_{ij}-\sqrt{\omega N}x_{i}x_{j})}{p(\sqrt{N}Y_{ij})}\\ &=1-\sqrt{\omega N}Q^{(1)}_{ij}x_{i}x_{j}+\frac{\omega Q^{(2)}_{ij}}{2N}-\frac{\omega\sqrt{\omega}}{6\sqrt{N}}Q^{(3)}_{ij}x_{i}x_{j}+\frac{\omega^{2}Q^{(4)}_{ij}}{24N^{2}}+{\mathcal{O}}(N^{-\frac{5}{2}}),\end{split} (C.5)

uniformly in ii and jj. Following the decomposition used in Section 4 for the spiked Wigner case, we thus get

log12N𝒙ijp(NYijωNxixj)p(NYij)=log12N𝒙exp(i<jAijxixj+i<j(Bij+Cij)+𝒪(N12)),\begin{split}\log\frac{1}{2^{N}}\sum_{{\boldsymbol{x}}}\prod_{i\neq j}\frac{p(\sqrt{N}Y_{ij}-\sqrt{\omega N}x_{i}x_{j})}{p(\sqrt{N}Y_{ij})}=\log\frac{1}{2^{N}}\sum_{{\boldsymbol{x}}}\exp\left(\sum_{i<j}A^{*}_{ij}x_{i}x_{j}+\sum_{i<j}(B^{*}_{ij}+C^{*}_{ij})+{\mathcal{O}}(N^{-\frac{1}{2}})\right),\end{split} (C.6)

where we let

Aij:=ωN(Qij(1)+Qji(1)+ω6N(Qij(3)+Qji(3)3Qij(1)Qij(2)3Qji(1)Qji(2)+2(Qij(1))3+2(Qji(1))3)),Bij:=ω2N(Qij(2)+Qji(2)(Qij(1))2(Qji(1))2),Cij:=ω224N2(Qij(4)+Qji(4)3(Qij(2))23(Qji(2))24Qij(1)Qij(3)4Qji(1)Qji(3)+12(Qij(1))2Qij(2)+12(Qji(1))2Qji(2)6(Qij(1))46(Qji(1))4).\begin{split}&A^{*}_{ij}:=-\sqrt{\omega N}\bigg{(}Q^{(1)}_{ij}+Q^{(1)}_{ji}+\frac{\omega}{6N}\Big{(}Q^{(3)}_{ij}+Q^{(3)}_{ji}-3Q^{(1)}_{ij}Q^{(2)}_{ij}-3Q^{(1)}_{ji}Q^{(2)}_{ji}+2(Q^{(1)}_{ij})^{3}+2(Q^{(1)}_{ji})^{3}\Big{)}\bigg{)},\\ &B^{*}_{ij}:=\frac{\omega}{2N}\left(Q^{(2)}_{ij}+Q^{(2)}_{ji}-(Q^{(1)}_{ij})^{2}-(Q^{(1)}_{ji})^{2}\right),\\ &C^{*}_{ij}:=\frac{\omega^{2}}{24N^{2}}\bigg{(}Q^{(4)}_{ij}+Q^{(4)}_{ji}-3(Q^{(2)}_{ij})^{2}-3(Q^{(2)}_{ji})^{2}-4Q^{(1)}_{ij}Q^{(3)}_{ij}-4Q^{(1)}_{ji}Q^{(3)}_{ji}\\ &\qquad\qquad\qquad\qquad+12(Q^{(1)}_{ij})^{2}Q^{(2)}_{ij}+12(Q^{(1)}_{ji})^{2}Q^{(2)}_{ji}-6(Q^{(1)}_{ij})^{4}-6(Q^{(1)}_{ji})^{4}\bigg{)}.\end{split} (C.7)

Note that from Lemma 4.2,

Aij=𝒪(N),Bij=𝒪(N1),Cij=𝒪(N2).A^{*}_{ij}={\mathcal{O}}(\sqrt{N}),\quad B^{*}_{ij}={\mathcal{O}}(N^{-1}),\quad C^{*}_{ij}={\mathcal{O}}(N^{-2}). (C.8)

C.2 Spin glass part

We have the following result for the convergence of the term involving AA^{*}. Let 𝔼B\mathbb{E}_{B^{*}} and VarB\operatorname{Var}_{B^{*}} denote the conditional expectation and the conditional variance given BB^{*}, respectively. The counterpart of Proposition 4.4 is as follows:

Proposition C.1.

Set

Z:=12N𝒙exp(i<jAijxixj).Z^{*}:=\frac{1}{2^{N}}\sum_{{\boldsymbol{x}}}\exp\left(\sum_{i<j}A^{*}_{ij}x_{i}x_{j}\right). (C.9)

Then, there exist random variables ζ\zeta^{*} and (ζ)(\zeta^{*})^{\prime} such that

logZ=logζ+(ζ)+𝒪(N1),\log Z^{*}=\log\zeta^{*}+(\zeta^{*})^{\prime}+{\mathcal{O}}(N^{-1}), (C.10)

where ζ\zeta^{*} and (ζ)(\zeta^{*})^{\prime} satisfy the following

  1. 1.

    The conditional distribution of logζ\log\zeta^{*} given BB^{*} converges in distribution to 𝒩(ν,2ν){\mathcal{N}}(-\nu^{*},2\nu^{*}), with

    ν:=k=3(2ωF)k4k=14(log(12ωF)+2ωF+2ω2F2).\nu^{*}:=\sum_{k=3}^{\infty}\frac{(2\omega F)^{k}}{4k}=-\frac{1}{4}\left(\log(1-2\omega F)+2\omega F+2\omega^{2}F^{2}\right). (C.11)
  2. 2.

    Conditional on BB^{*},

    (ζ)=12N2i<j𝔼B[(Aij)2]ω212𝔼[(Q12(1))4]ω24𝔼[(Q12(1))2]2+U,(\zeta^{*})^{\prime}=\frac{1}{2N^{2}}\sum_{i<j}\mathbb{E}_{B^{*}}[(A^{*}_{ij})^{2}]-\frac{\omega^{2}}{12}\mathbb{E}[(Q^{(1)}_{12})^{4}]-\frac{\omega^{2}}{4}\mathbb{E}[(Q^{(1)}_{12})^{2}]^{2}+U^{*}, (C.12)

    where UU^{*} is a random variable whose asymptotic law is a centered Gaussian with variance

θ:=ω28𝔼[VarB((Q12(1)+Q21(1))2)].\theta^{*}:=\frac{\omega^{2}}{8}\mathbb{E}\left[\operatorname{Var}_{B^{*}}\big{(}(Q^{(1)}_{12}+Q^{(1)}_{21})^{2}\big{)}\right]. (C.13)
  3. 3.

    The fluctuations logζ\log\zeta^{*} and (ζ)(\zeta^{*})^{\prime} are asymptotically orthogonal to each other under LB2L_{B^{*}}^{2}, the conditional L2L^{2}-norm given BB^{*}.

Proposition C.1 is proved in Section D.

C.3 CLT part

From Proposition C.1, we find that the terms involving BB^{*} in the right hand side of (C.6) are

i<jBij+12N2i<j𝔼[(Aij)2|Bij],\sum_{i<j}B^{*}_{ij}+\frac{1}{2N^{2}}\sum_{i<j}\mathbb{E}[(A^{*}_{ij})^{2}|B^{*}_{ij}], (C.14)

which is the sum of i.i.d. random variables that depend only on BijB^{*}_{ij}. By the central limit theorem, it converges to a normal random variable whose mean and variance we compute in this subsection.

We first compute the mean of (C.14). By definition,

𝔼[Bij]=ωN𝔼[Qij(2)(Qij(1))2]=ωFN\mathbb{E}[B^{*}_{ij}]=\frac{\omega}{N}\mathbb{E}\left[Q^{(2)}_{ij}-(Q^{(1)}_{ij})^{2}\right]=-\frac{\omega F}{N}

and using (4.14) and the integration by parts formula (4.16), we find that

𝔼[𝔼[(Aij)2|Bij]]=𝔼[(Aij)2]=2ωN𝔼[(Qij(1))2]+2ω23𝔼[Qij(1)Qij(3)3(Qij(1))2Qij(2)+2(Qij(1))4]+O(N1)=2ωN𝔼[(Qij(1))2]+2ω23𝔼[Qij(1)Qij(3)]+O(N1),\begin{split}&\mathbb{E}\left[\mathbb{E}[(A^{*}_{ij})^{2}|B^{*}_{ij}]\right]=\mathbb{E}[(A^{*}_{ij})^{2}]=2\omega N\mathbb{E}\left[(Q^{(1)}_{ij})^{2}\right]+\frac{2\omega^{2}}{3}\mathbb{E}\left[Q^{(1)}_{ij}Q^{(3)}_{ij}-3(Q^{(1)}_{ij})^{2}Q^{(2)}_{ij}+2(Q^{(1)}_{ij})^{4}\right]+O(N^{-1})\\ &=2\omega N\mathbb{E}\left[(Q^{(1)}_{ij})^{2}\right]+\frac{2\omega^{2}}{3}\mathbb{E}\left[Q^{(1)}_{ij}Q^{(3)}_{ij}\right]+O(N^{-1}),\end{split} (C.15)

where we used the independence of Qij(s)Q^{(s)}_{ij} and Qji(s)Q^{(s)}_{ji} and also that 𝔼[Qij(1)]=0\mathbb{E}[Q^{(1)}_{ij}]=0. We thus get

𝔼[i<jBij+12N2i<j𝔼[(Aij)2|Bij]]=ω26𝔼[Q12(1)Q12(3)]+O(N1).\begin{split}\mathbb{E}\left[\sum_{i<j}B^{*}_{ij}+\frac{1}{2N^{2}}\sum_{i<j}\mathbb{E}[(A^{*}_{ij})^{2}|B^{*}_{ij}]\right]=\frac{\omega^{2}}{6}\mathbb{E}\left[Q^{(1)}_{12}Q^{(3)}_{12}\right]+O(N^{-1}).\end{split} (C.16)

We next compute the variance of (C.14). We have from the definition that

𝔼[(Bij)2]=ω22N2𝔼[(Qij(2))22(Qij(1))2Qij(2)+(Qij(1))4]+ω2F22N2,\mathbb{E}[(B^{*}_{ij})^{2}]=\frac{\omega^{2}}{2N^{2}}\mathbb{E}\left[(Q^{(2)}_{ij})^{2}-2(Q^{(1)}_{ij})^{2}Q^{(2)}_{ij}+(Q^{(1)}_{ij})^{4}\right]+\frac{\omega^{2}F^{2}}{2N^{2}},
1N2𝔼[(Aij)2Bij]=ω2N2𝔼[(Qij(1))2(Qij(2)(Qij(1))2)]ω2F2N2+O(N3),\frac{1}{N^{2}}\mathbb{E}[(A^{*}_{ij})^{2}B^{*}_{ij}]=\frac{\omega^{2}}{N^{2}}\mathbb{E}\left[(Q^{(1)}_{ij})^{2}\Big{(}Q^{(2)}_{ij}-(Q^{(1)}_{ij})^{2}\Big{)}\right]-\frac{\omega^{2}F^{2}}{N^{2}}+O(N^{-3}),

and

14N4𝔼[(𝔼[(Aij)2|Bij])2]=ω24N2𝔼[(𝔼[(Qij(1)+Qji(1))2|Bij])2]+O(N3).\frac{1}{4N^{4}}\mathbb{E}[(\mathbb{E}[(A^{*}_{ij})^{2}|B^{*}_{ij}])^{2}]=\frac{\omega^{2}}{4N^{2}}\mathbb{E}[(\mathbb{E}[(Q^{(1)}_{ij}+Q^{(1)}_{ji})^{2}|B^{*}_{ij}])^{2}]+O(N^{-3}).

We apply the identity

𝔼[(Qij(1)+Qji(1))4]=𝔼[(𝔼[(Qij(1)+Qji(1))2|Bij])2+Var((Qij(1)+Qji(1))2|Bij)],\mathbb{E}[(Q^{(1)}_{ij}+Q^{(1)}_{ji})^{4}]=\mathbb{E}\left[(\mathbb{E}[(Q^{(1)}_{ij}+Q^{(1)}_{ji})^{2}|B^{*}_{ij}])^{2}+\operatorname{Var}\big{(}(Q^{(1)}_{ij}+Q^{(1)}_{ji})^{2}|B^{*}_{ij}\big{)}\right],

which corresponds to (4.24), and also

𝔼[(Qij(1)+Qji(1))4]=2𝔼[(Qij(1))4]+6𝔼[(Qij(1))2]2.\mathbb{E}[(Q^{(1)}_{ij}+Q^{(1)}_{ji})^{4}]=2\mathbb{E}[(Q^{(1)}_{ij})^{4}]+6\mathbb{E}[(Q^{(1)}_{ij})^{2}]^{2}\,.

We then obtain

𝔼[(Bij+𝔼[(Aij)2|Bij]2N2)2]=ω24N2𝔼[2(Qij(2))2Var((Qij(1)+Qji(1))2|Bij)]+ω2F2N2+O(N3).\begin{split}\mathbb{E}\left[\left(B^{*}_{ij}+\frac{\mathbb{E}[(A^{*}_{ij})^{2}|B^{*}_{ij}]}{2N^{2}}\right)^{2}\right]=\frac{\omega^{2}}{4N^{2}}\mathbb{E}\left[2(Q^{(2)}_{ij})^{2}-\operatorname{Var}\big{(}(Q^{(1)}_{ij}+Q^{(1)}_{ji})^{2}|B^{*}_{ij}\big{)}\right]+\frac{\omega^{2}F^{2}}{N^{2}}+O(N^{-3}).\end{split} (C.17)

We thus conclude from (C.16) and (C.17) that

i<jBij+12N2i<j𝔼[(Aij)2|Bij]𝒩(ω26𝔼[Q12(1)Q12(3)],ω28𝔼[2(Q12(2))2VarB((Q12(1)+Q21(1))2)]).\begin{split}&\sum_{i<j}B^{*}_{ij}+\frac{1}{2N^{2}}\sum_{i<j}\mathbb{E}[(A^{*}_{ij})^{2}|B^{*}_{ij}]\\ &\qquad\Rightarrow{\mathcal{N}}\left(\frac{\omega^{2}}{6}\mathbb{E}\left[Q^{(1)}_{12}Q^{(3)}_{12}\right],\frac{\omega^{2}}{8}\mathbb{E}\left[2(Q^{(2)}_{12})^{2}-\operatorname{Var}_{B^{*}}\big{(}(Q^{(1)}_{12}+Q^{(1)}_{21})^{2}\big{)}\right]\right).\end{split} (C.18)

C.4 LLN part and the diagonal part

From the definition of CijC^{*}_{ij} in (C.7) and the integration by parts formula in (4.28), we find that

i<jCij=ω224𝔼[Q12(1)Q12(3)]+𝒪(N1).\sum_{i<j}C^{*}_{ij}=-\frac{\omega^{2}}{24}\mathbb{E}[Q^{(1)}_{12}Q^{(3)}_{12}]+{\mathcal{O}}(N^{-1}). (C.19)

For the diagonal part in (C.4), it can be readily checked that

logkp(NYkkωNxk2)p(NYkk)𝒩(ωF2,ωF).\log\prod_{k}\frac{p(\sqrt{N}Y_{kk}-\sqrt{\omega N}x_{k}^{2})}{p(\sqrt{N}Y_{kk})}\Rightarrow{\mathcal{N}}(-\frac{\omega F}{2},\omega F). (C.20)

C.5 Proof of Theorem 2.8

We now combine Proposition C.1 and Equations (C.18), (C.19), and (C.20) to find that log(Y;ω)\log{\mathcal{L}}(Y;\omega) in (C.4) converges to the Gaussian whose mean is

14(log(12ωF)+2ωF+2ω2F2)ω212𝔼[(Q12(1))4]ω2F24+ω26𝔼[Q12(1)Q12(3)]ω224𝔼[Q12(1)Q12(3)]ωF2=14(log(12ωF)+ω2F2)ω28𝔼[(Q12(2))2]=ρ\begin{split}&\frac{1}{4}\left(\log(1-2\omega F)+2\omega F+2\omega^{2}F^{2}\right)-\frac{\omega^{2}}{12}\mathbb{E}[(Q^{(1)}_{12})^{4}]-\frac{\omega^{2}F^{2}}{4}+\frac{\omega^{2}}{6}\mathbb{E}[Q^{(1)}_{12}Q^{(3)}_{12}]-\frac{\omega^{2}}{24}\mathbb{E}[Q^{(1)}_{12}Q^{(3)}_{12}]-\frac{\omega F}{2}\\ &=\frac{1}{4}\left(\log(1-2\omega F)+\omega^{2}F^{2}\right)-\frac{\omega^{2}}{8}\mathbb{E}[(Q^{(2)}_{12})^{2}]=-\rho^{*}\end{split} (C.21)

and variance

12(log(12ωF)+2ωF+2ω2F2)+ω28𝔼[VarB((Q12(1)+Q21(1))2)]+ω28𝔼[2(Q12(2))2VarB((Q12(1)+Q21(1))2)]+ω2F22+ωF=2ρ.\begin{split}&-\frac{1}{2}\left(\log(1-2\omega F)+2\omega F+2\omega^{2}F^{2}\right)+\frac{\omega^{2}}{8}\mathbb{E}\left[\operatorname{Var}_{B^{*}}\big{(}(Q^{(1)}_{12}+Q^{(1)}_{21})^{2}\big{)}\right]\\ &\qquad\qquad+\frac{\omega^{2}}{8}\mathbb{E}\left[2(Q^{(2)}_{12})^{2}-\operatorname{Var}_{B^{*}}\big{(}(Q^{(1)}_{12}+Q^{(1)}_{21})^{2}\big{)}\right]+\frac{\omega^{2}F^{2}}{2}+\omega F=2\rho^{*}.\end{split} (C.22)

This proves Theorem 2.8.

Appendix D Proof of Proposition C.1

In this section, we prove Proposition C.1 by following the strategy in Section 5. We decompose the fluctuation of ZZ^{*} into two parts. Let

ζ:=12N𝒙i<j(1+NxixjtanhAijN).\zeta^{*}:=\frac{1}{2^{N}}\sum_{{\boldsymbol{x}}}\prod_{i<j}\left(1+Nx_{i}x_{j}\tanh\frac{A^{*}_{ij}}{N}\right). (D.1)

From the direct calculation involving the Taylor expansion as in Section 5, we get

logZlogζ=i<j((Aij)22N2(Aij)412N4)+𝒪(N1),\log Z^{*}-\log\zeta^{*}=\sum_{i<j}\left(\frac{(A^{*}_{ij})^{2}}{2N^{2}}-\frac{(A^{*}_{ij})^{4}}{12N^{4}}\right)+{\mathcal{O}}(N^{-1}), (D.2)

and we define

(ζ):=i<j((Aij)22N2(Aij)412N4).(\zeta^{*})^{\prime}:=\sum_{i<j}\left(\frac{(A^{*}_{ij})^{2}}{2N^{2}}-\frac{(A^{*}_{ij})^{4}}{12N^{4}}\right). (D.3)

By definition, logZ=logζ+(ζ)+𝒪(N1)\log Z^{*}=\log\zeta^{*}+(\zeta^{*})^{\prime}+{\mathcal{O}}(N^{-1}).

For the second part of Proposition C.1, we notice that the array ((Aij)22N2:1i<jN)(\frac{(A^{*}_{ij})^{2}}{2N^{2}}:1\leq i<j\leq N) almost surely satisfies Lyapunov’s condition for the central limit theorem conditional on BB^{*}. Thus, if we define the random variable

U=(ζ)(12N2i<j𝔼B[(Aij)2]ω212𝔼[(Q12(1))4]ω24𝔼[(Q12(1))2]2),U^{*}=(\zeta^{*})^{\prime}-\left(\frac{1}{2N^{2}}\sum_{i<j}\mathbb{E}_{B^{*}}[(A^{*}_{ij})^{2}]-\frac{\omega^{2}}{12}\mathbb{E}[(Q^{(1)}_{12})^{4}]-\frac{\omega^{2}}{4}\mathbb{E}[(Q^{(1)}_{12})^{2}]^{2}\right), (D.4)

its asymptotic law is a centered Gaussian with variance

θ:=ω28𝔼[VarB((Q12(1)+Q21(1))2)].\theta^{*}:=\frac{\omega^{2}}{8}\mathbb{E}\left[\operatorname{Var}_{B^{*}}\big{(}(Q^{(1)}_{12}+Q^{(1)}_{21})^{2}\big{)}\right]. (D.5)

This proves the second part of Proposition C.1.

Recall that Γ\Gamma is a simple and closed graph on the vertex set [N]:={1,2,,N}[N]:=\{1,2,\dots,N\}, i.e., a graph with no self-edges, no repeated edges, and no vertices of odd degree, and that E(Γ)E(\Gamma) is the edge set of Γ\Gamma. We define

w(Γ):=(i,j)E(Γ)tanh(AijN),w^{*}(\Gamma):=\prod_{(i,j)\in E(\Gamma)}\tanh\Big{(}\frac{A^{*}_{ij}}{N}\Big{)},

which gives us

ζ=Γw(Γ).\zeta^{*}=\sum_{\Gamma}w^{*}(\Gamma).

We now introduce the results on ww^{*} that correspond to Lemmas 5.1 and 5.2. Recall that we denote simple loops by the lowercase γ\gamma.

Lemma D.1.

Let ξ:=γw(γ)\xi^{*}:=\sum_{\gamma}w^{*}(\gamma), η:=γw(γ)2\eta^{*}:=\sum_{\gamma}w^{*}(\gamma)^{2}. Let ν\nu^{*} be given as in (C.11). Then, as NN\to\infty,

  1. (i)

    The random variable ζγ(1+w(γ))\zeta^{*}-\prod_{\gamma}(1+w^{*}(\gamma)) converges in probability to 0,

  2. (ii)

The conditional law of ξ\xi^{*} given BB^{*} converges in probability to a centered Gaussian with variance 2ν2\nu^{*}, and

  3. (iii)

η\eta^{*} converges in probability to 2ν2\nu^{*}.

As in the proof of Proposition 4.4, from the relation

γ(1+w(γ))=exp(γ(w(γ)w(γ)22)(1+𝒪(N1)))=exp(ξη2+𝒪(N1)),\prod_{\gamma}\left(1+w^{*}(\gamma)\right)=\exp\left(\sum_{\gamma}\Big{(}w^{*}(\gamma)-\frac{w^{*}(\gamma)^{2}}{2}\Big{)}(1+{\mathcal{O}}(N^{-1}))\right)=\exp\left(\xi^{*}-\frac{\eta^{*}}{2}+{\mathcal{O}}(N^{-1})\right),

the first part of Proposition C.1 follows from Lemma D.1. The third part of Proposition C.1 is also obvious since (Aij)2(A^{*}_{ij})^{2}, which generates fluctuation of (ζ)(\zeta^{*})^{\prime}, is orthogonal to tanhAijN\tanh\frac{A^{*}_{ij}}{N} under LB2L_{B^{*}}^{2}.

We now prove Lemma D.1. In order to compute the limit of 𝔼[η]\mathbb{E}[\eta^{*}], we let

ξk:=γ:|E(γ)|=kw(γ).\xi^{*}_{k}:=\sum_{\gamma:|E(\gamma)|=k}w^{*}(\gamma).

Recall that P(N,k)=N!(Nk)!P(N,k)=\frac{N!}{(N-k)!} and the number of possible simple loops of length kk in the vertex set [N][N] is P(N,k)2k\frac{P(N,k)}{2k}. We then have

𝔼ξkLB22=γ:|E(γ)|=k(i,j)E(γ)𝔼[tanh2AijN]=P(N,k)2k𝔼[tanh2A12N]kNk2k𝔼[(ωN)(Q12(1)+Q21(1))2N2]k=(2ωF)k2k.\begin{split}\mathbb{E}||\xi^{*}_{k}||_{L^{2}_{B^{*}}}^{2}&=\sum_{\gamma:|E(\gamma)|=k}\prod_{(i,j)\in E(\gamma)}\mathbb{E}\left[\tanh^{2}\frac{A^{*}_{ij}}{N}\right]=\frac{P(N,k)}{2k}\mathbb{E}\left[\tanh^{2}\frac{A^{*}_{12}}{N}\right]^{k}\\ &\to\frac{N^{k}}{2k}\mathbb{E}\left[(\omega N)\frac{(Q^{(1)}_{12}+Q^{(1)}_{21})^{2}}{N^{2}}\right]^{k}=\frac{(2\omega F)^{k}}{2k}.\end{split} (D.6)

Moreover, for any given ϵ>0\epsilon>0, there exists a deterministic N0N_{0}\in\mathbb{N} such that for all NN0N\geq N_{0},

𝔼ξkLB22(2ωF+ϵ)k2k.\mathbb{E}||\xi^{*}_{k}||_{L^{2}_{B^{*}}}^{2}\leq\frac{(2\omega F+\epsilon)^{k}}{2k}. (D.7)

Thus, if 2ωF<12\omega F<1, by the orthogonality and the dominated convergence theorem we have

𝔼[η]=𝔼[(ξ)2]=𝔼[𝔼B[(ξ)2]]k=3(2ωF)k2k=2ν\mathbb{E}[\eta^{*}]=\mathbb{E}[(\xi^{*})^{2}]=\mathbb{E}[\mathbb{E}_{B^{*}}[(\xi^{*})^{2}]]\to\sum_{k=3}^{\infty}\frac{(2\omega F)^{k}}{2k}=2\nu^{*} (D.8)

as NN\to\infty. Now, the proof of the second and the third parts of Lemma D.1 is verbatim the same as the proof of Lemma 5.2 in Appendix B, and we omit the details.

Appendix E Sketch of the proof for the Spiked IID matrices

In this section, we provide a sketch of the proof of Theorem 2.8. Recall that

(Y;ω)=12N𝒙i,j=1Np(NYijωNxixj)p(NYij)=12N𝒙ijp(NYijωNxixj)p(NYij)kp(NYkkωNxk2)p(NYkk).\begin{split}{\mathcal{L}}(Y;\omega)&=\frac{1}{2^{N}}\sum_{{\boldsymbol{x}}}\prod_{i,j=1}^{N}\frac{p(\sqrt{N}Y_{ij}-\sqrt{\omega N}x_{i}x_{j})}{p(\sqrt{N}Y_{ij})}\\ &=\frac{1}{2^{N}}\sum_{{\boldsymbol{x}}}\prod_{i\neq j}\frac{p(\sqrt{N}Y_{ij}-\sqrt{\omega N}x_{i}x_{j})}{p(\sqrt{N}Y_{ij})}\prod_{k}\frac{p(\sqrt{N}Y_{kk}-\sqrt{\omega N}x_{k}^{2})}{p(\sqrt{N}Y_{kk})}\,.\end{split} (E.1)

As in the spiked Wigner case, the diagonal part can be handled separately, since it does not depend on the spike or the off-diagonal part. Following the decomposition used in Section 4, we set

Qij(s):=p(s)(NYij)p(NYij).Q^{(s)}_{ij}:=\frac{p^{(s)}(\sqrt{N}Y_{ij})}{p(\sqrt{N}Y_{ij})}. (E.2)

Then, we get

log12N𝒙ijp(NYijωNxixj)p(NYij)=log12N𝒙exp(i<jAijxixj+i<j(Bij+Cij)+𝒪(N12)),\begin{split}&\log\frac{1}{2^{N}}\sum_{{\boldsymbol{x}}}\prod_{i\neq j}\frac{p(\sqrt{N}Y_{ij}-\sqrt{\omega N}x_{i}x_{j})}{p(\sqrt{N}Y_{ij})}\\ &=\log\frac{1}{2^{N}}\sum_{{\boldsymbol{x}}}\exp\left(\sum_{i<j}A^{*}_{ij}x_{i}x_{j}+\sum_{i<j}(B^{*}_{ij}+C^{*}_{ij})+{\mathcal{O}}(N^{-\frac{1}{2}})\right),\end{split} (E.3)

where we let

Aij:=ωN(Qij(1)+Qji(1)+ω6N(Qij(3)+Qji(3)3Qij(1)Qij(2)3Qji(1)Qji(2)+2(Qij(1))3+2(Qji(1))3)),Bij:=ω2N(Qij(2)+Qji(2)(Qij(1))2(Qji(1))2),Cij:=ω224N2(Qij(4)+Qji(4)3(Qij(2))23(Qji(2))24Qij(1)Qij(3)4Qji(1)Qji(3)+12(Qij(1))2Qij(2)+12(Qji(1))2Qji(2)6(Qij(1))46(Qji(1))4).\begin{split}&A^{*}_{ij}:=-\sqrt{\omega N}\bigg{(}Q^{(1)}_{ij}+Q^{(1)}_{ji}+\frac{\omega}{6N}\Big{(}Q^{(3)}_{ij}+Q^{(3)}_{ji}-3Q^{(1)}_{ij}Q^{(2)}_{ij}-3Q^{(1)}_{ji}Q^{(2)}_{ji}+2(Q^{(1)}_{ij})^{3}+2(Q^{(1)}_{ji})^{3}\Big{)}\bigg{)},\\ &B^{*}_{ij}:=\frac{\omega}{2N}\left(Q^{(2)}_{ij}+Q^{(2)}_{ji}-(Q^{(1)}_{ij})^{2}-(Q^{(1)}_{ji})^{2}\right),\\ &C^{*}_{ij}:=\frac{\omega^{2}}{24N^{2}}\bigg{(}Q^{(4)}_{ij}+Q^{(4)}_{ji}-3(Q^{(2)}_{ij})^{2}-3(Q^{(2)}_{ji})^{2}-4Q^{(1)}_{ij}Q^{(3)}_{ij}-4Q^{(1)}_{ji}Q^{(3)}_{ji}\\ &\qquad\qquad\qquad+12(Q^{(1)}_{ij})^{2}Q^{(2)}_{ij}+12(Q^{(1)}_{ji})^{2}Q^{(2)}_{ji}-6(Q^{(1)}_{ij})^{4}-6(Q^{(1)}_{ji})^{4}\bigg{)}.\end{split} (E.4)

We first consider the spin glass part and prove a counterpart of Proposition 4.4. Set

Z:=12N𝒙exp(i<jAijxixj).Z^{*}:=\frac{1}{2^{N}}\sum_{{\boldsymbol{x}}}\exp\left(\sum_{i<j}A^{*}_{ij}x_{i}x_{j}\right). (E.5)

Then, there exist random variables ζ\zeta^{*} and (ζ)(\zeta^{*})^{\prime} such that

logZ=logζ+(ζ)+𝒪(N1),\log Z^{*}=\log\zeta^{*}+(\zeta^{*})^{\prime}+{\mathcal{O}}(N^{-1}), (E.6)

where logζ\log\zeta^{*} and (ζ)(\zeta^{*})^{\prime} are asymptotically orthogonal to each other under LB2L_{B^{*}}^{2} and satisfy the following: The conditional distribution of logζ\log\zeta^{*} given BB^{*} converges in distribution to 𝒩(ν,2ν){\mathcal{N}}(-\nu^{*},2\nu^{*}), with

ν:=k=3(2ωF)k4k=14(log(12ωF)+2ωF+2ω2F2).\nu^{*}:=\sum_{k=3}^{\infty}\frac{(2\omega F)^{k}}{4k}=-\frac{1}{4}\left(\log(1-2\omega F)+2\omega F+2\omega^{2}F^{2}\right). (E.7)

Further,

(ζ)=12N2i<j𝔼B[(Aij)2]ω212𝔼[(Q12(1))4]ω24𝔼[(Q12(1))2]2+U,(\zeta^{*})^{\prime}=\frac{1}{2N^{2}}\sum_{i<j}\mathbb{E}_{B^{*}}[(A^{*}_{ij})^{2}]-\frac{\omega^{2}}{12}\mathbb{E}[(Q^{(1)}_{12})^{4}]-\frac{\omega^{2}}{4}\mathbb{E}[(Q^{(1)}_{12})^{2}]^{2}+U^{*}, (E.8)

where UU^{*} is a random variable whose asymptotic law is a centered Gaussian with variance

θ:=ω28𝔼[VarB((Q12(1)+Q21(1))2)].\theta^{*}:=\frac{\omega^{2}}{8}\mathbb{E}\left[\operatorname{Var}_{B^{*}}\big{(}(Q^{(1)}_{12}+Q^{(1)}_{21})^{2}\big{)}\right]. (E.9)

We next follow the analysis in Section 4.3 for the CLT part. The terms involving BB^{*} in the right hand side of (E.3) are

i<jBij+12N2i<j𝔼[(Aij)2|Bij].\sum_{i<j}B^{*}_{ij}+\frac{1}{2N^{2}}\sum_{i<j}\mathbb{E}[(A^{*}_{ij})^{2}|B^{*}_{ij}].

By definition,

𝔼[Bij]=ωN𝔼[Qij(2)(Qij(1))2]=ωFN\mathbb{E}[B^{*}_{ij}]=\frac{\omega}{N}\mathbb{E}\left[Q^{(2)}_{ij}-(Q^{(1)}_{ij})^{2}\right]=-\frac{\omega F}{N}

and

𝔼[𝔼[(Aij)2|Bij]]=𝔼[(Aij)2]=2ωN𝔼[(Qij(1))2]+2ω23𝔼[Qij(1)Qij(3)3(Qij(1))2Qij(2)+2(Qij(1))4]+O(N1)=2ωN𝔼[(Qij(1))2]+2ω23𝔼[Qij(1)Qij(3)]+O(N1),\begin{split}&\mathbb{E}\left[\mathbb{E}[(A^{*}_{ij})^{2}|B^{*}_{ij}]\right]=\mathbb{E}[(A^{*}_{ij})^{2}]\\ &=2\omega N\mathbb{E}\left[(Q^{(1)}_{ij})^{2}\right]+\frac{2\omega^{2}}{3}\mathbb{E}\left[Q^{(1)}_{ij}Q^{(3)}_{ij}-3(Q^{(1)}_{ij})^{2}Q^{(2)}_{ij}+2(Q^{(1)}_{ij})^{4}\right]+O(N^{-1})\\ &=2\omega N\mathbb{E}\left[(Q^{(1)}_{ij})^{2}\right]+\frac{2\omega^{2}}{3}\mathbb{E}\left[Q^{(1)}_{ij}Q^{(3)}_{ij}\right]+O(N^{-1}),\end{split} (E.10)

where we used the independence of Qij(s)Q^{(s)}_{ij} and Qji(s)Q^{(s)}_{ji} and also that 𝔼[Qij(1)]=0\mathbb{E}[Q^{(1)}_{ij}]=0. We thus get

𝔼[i<jBij+12N2i<j𝔼[(Aij)2|Bij]]=ω26𝔼[Q12(1)Q12(3)]+O(N1).\begin{split}\mathbb{E}\left[\sum_{i<j}B^{*}_{ij}+\frac{1}{2N^{2}}\sum_{i<j}\mathbb{E}[(A^{*}_{ij})^{2}|B^{*}_{ij}]\right]=\frac{\omega^{2}}{6}\mathbb{E}\left[Q^{(1)}_{12}Q^{(3)}_{12}\right]+O(N^{-1}).\end{split} (E.11)

The computation of the variance is similar, and we only record the important changes:

𝔼[(Bij)2]=ω22N2𝔼[(Qij(2))22(Qij(1))2Qij(2)+(Qij(1))4]+ω2F22N2,\mathbb{E}[(B^{*}_{ij})^{2}]=\frac{\omega^{2}}{2N^{2}}\mathbb{E}\left[(Q^{(2)}_{ij})^{2}-2(Q^{(1)}_{ij})^{2}Q^{(2)}_{ij}+(Q^{(1)}_{ij})^{4}\right]+\frac{\omega^{2}F^{2}}{2N^{2}},
1N2𝔼[(Aij)2Bij]=ω2N2𝔼[(Qij(1))2(Qij(2)(Qij(1))2)]ω2F2N2+O(N3),\frac{1}{N^{2}}\mathbb{E}[(A^{*}_{ij})^{2}B^{*}_{ij}]=\frac{\omega^{2}}{N^{2}}\mathbb{E}\left[(Q^{(1)}_{ij})^{2}\Big{(}Q^{(2)}_{ij}-(Q^{(1)}_{ij})^{2}\Big{)}\right]-\frac{\omega^{2}F^{2}}{N^{2}}+O(N^{-3}),

and

14N4𝔼[(𝔼[(Aij)2|Bij])2]=ω24N2𝔼[(𝔼[(Qij(1)+Qji(1))2|Bij])2]+O(N3).\frac{1}{4N^{4}}\mathbb{E}[(\mathbb{E}[(A^{*}_{ij})^{2}|B^{*}_{ij}])^{2}]=\frac{\omega^{2}}{4N^{2}}\mathbb{E}[(\mathbb{E}[(Q^{(1)}_{ij}+Q^{(1)}_{ji})^{2}|B^{*}_{ij}])^{2}]+O(N^{-3}).

Using the identities

𝔼[(Qij(1)+Qji(1))4]=𝔼[(𝔼[(Qij(1)+Qji(1))2|Bij])2+Var((Qij(1)+Qji(1))2|Bij)]\mathbb{E}[(Q^{(1)}_{ij}+Q^{(1)}_{ji})^{4}]=\mathbb{E}\left[(\mathbb{E}[(Q^{(1)}_{ij}+Q^{(1)}_{ji})^{2}|B^{*}_{ij}])^{2}+\operatorname{Var}\big{(}(Q^{(1)}_{ij}+Q^{(1)}_{ji})^{2}|B^{*}_{ij}\big{)}\right]

and

𝔼[(Qij(1)+Qji(1))4]=2𝔼[(Qij(1))4]+6𝔼[(Qij(1))2]2,\mathbb{E}[(Q^{(1)}_{ij}+Q^{(1)}_{ji})^{4}]=2\mathbb{E}[(Q^{(1)}_{ij})^{4}]+6\mathbb{E}[(Q^{(1)}_{ij})^{2}]^{2},

we get

𝔼[(Bij+𝔼[(Aij)2|Bij]2N2)2]=ω24N2𝔼[2(Qij(2))2Var((Qij(1)+Qji(1))2|Bij)]+ω2F2N2+O(N3).\begin{split}\mathbb{E}\left[\left(B^{*}_{ij}+\frac{\mathbb{E}[(A^{*}_{ij})^{2}|B^{*}_{ij}]}{2N^{2}}\right)^{2}\right]=\frac{\omega^{2}}{4N^{2}}\mathbb{E}\left[2(Q^{(2)}_{ij})^{2}-\operatorname{Var}\big{(}(Q^{(1)}_{ij}+Q^{(1)}_{ji})^{2}|B^{*}_{ij}\big{)}\right]+\frac{\omega^{2}F^{2}}{N^{2}}+O(N^{-3}).\end{split} (E.12)

For the LLN part, we simply get

i<jCij=ω224𝔼[Q12(1)Q12(3)]+𝒪(N1).\sum_{i<j}C^{*}_{ij}=-\frac{\omega^{2}}{24}\mathbb{E}[Q^{(1)}_{12}Q^{(3)}_{12}]+{\mathcal{O}}(N^{-1}). (E.13)

The computation for the diagonal part coincides with that for the spiked Wigner case, and we get

logkp(NYkkωNxk2)p(NYkk)𝒩(ωF2,ωF).\log\prod_{k}\frac{p(\sqrt{N}Y_{kk}-\sqrt{\omega N}x_{k}^{2})}{p(\sqrt{N}Y_{kk})}\Rightarrow{\mathcal{N}}(-\frac{\omega F}{2},\omega F). (E.14)

Combining Equations (E.1), (E.6), (E.11), (E.12), (E.13), and (E.14), we find that log(Y;ω)\log{\mathcal{L}}(Y;\omega) converges to the Gaussian whose mean is

14(log(12ωF)+2ωF+2ω2F2)ω212𝔼[(Q12(1))4]ω2F24+ω26𝔼[Q12(1)Q12(3)]ω224𝔼[Q12(1)Q12(3)]ωF2=14(log(12ωF)+ω2F2)ω28𝔼[(Q12(2))2]=ρ\begin{split}&\frac{1}{4}\left(\log(1-2\omega F)+2\omega F+2\omega^{2}F^{2}\right)-\frac{\omega^{2}}{12}\mathbb{E}[(Q^{(1)}_{12})^{4}]-\frac{\omega^{2}F^{2}}{4}+\frac{\omega^{2}}{6}\mathbb{E}[Q^{(1)}_{12}Q^{(3)}_{12}]-\frac{\omega^{2}}{24}\mathbb{E}[Q^{(1)}_{12}Q^{(3)}_{12}]-\frac{\omega F}{2}\\ &=\frac{1}{4}\left(\log(1-2\omega F)+\omega^{2}F^{2}\right)-\frac{\omega^{2}}{8}\mathbb{E}[(Q^{(2)}_{12})^{2}]=-\rho^{*}\end{split} (E.15)

and variance

12(log(12ωF)+2ωF+2ω2F2)+ω28𝔼[VarB((Q12(1)+Q21(1))2)]+ω28𝔼[2(Q12(2))2VarB((Q12(1)+Q21(1))2)]+ω2F22+ωF=2ρ.\begin{split}&-\frac{1}{2}\left(\log(1-2\omega F)+2\omega F+2\omega^{2}F^{2}\right)+\frac{\omega^{2}}{8}\mathbb{E}\left[\operatorname{Var}_{B^{*}}\big{(}(Q^{(1)}_{12}+Q^{(1)}_{21})^{2}\big{)}\right]\\ &\qquad\qquad+\frac{\omega^{2}}{8}\mathbb{E}\left[2(Q^{(2)}_{12})^{2}-\operatorname{Var}_{B^{*}}\big{(}(Q^{(1)}_{12}+Q^{(1)}_{21})^{2}\big{)}\right]+\frac{\omega^{2}F^{2}}{2}+\omega F=2\rho^{*}.\end{split} (E.16)

This proves Theorem 2.8.

References

  • [1] M. Aizenman, J. L. Lebowitz, and D. Ruelle. Some rigorous results on the Sherrington-Kirkpatrick spin glass model. Comm. Math. Phys., 112(1):3–20, 1987.
  • [2] J. Baik, G. Ben Arous, and S. Péché. Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices. Ann. Probab., 33(5):1643–1697, 2005.
  • [3] J. Baik and J. O. Lee. Fluctuations of the free energy of the spherical Sherrington-Kirkpatrick model. J. Stat. Phys., 165(2):185–224, 2016.
  • [4] J. Barbier, M. Dia, N. Macris, F. Krzakala, T. Lesieur, and L. Zdeborová. Mutual information for symmetric rank-one matrix estimation: A proof of the replica formula. In Advances in Neural Information Processing Systems 29, pages 424–432. 2016.
  • [5] F. Benaych-Georges and R. R. Nadakuditi. The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Adv. Math., 227(1):494–521, 2011.
  • [6] F. Benaych-Georges and R. R. Nadakuditi. The singular values and vectors of low rank perturbations of large rectangular random matrices. J. Multivariate Anal., 111:120–135, 2012.
  • [7] M. Capitaine, C. Donati-Martin, and D. Féral. The largest eigenvalues of finite rank deformation of large Wigner matrices: convergence and nonuniversality of the fluctuations. Ann. Probab., 37(1):1–47, 2009.
• [8] Y. Chen, C. Cheng, and J. Fan. Asymmetry helps: Eigenvalue and eigenvector analyses of asymmetrically perturbed low-rank matrices. Ann. Statist., 49(1):435, 2021.
• [9] H. W. Chung and J. O. Lee. Weak detection of signal in the spiked Wigner model. In Proceedings of the 36th International Conference on Machine Learning, volume 97, pages 1233–1241, 2019.
  • [10] E. Dobriban. Sharp detection in PCA under correlations: All eigenvalues matter. Ann. Statist., 45(4):1810–1833, 2017.
  • [11] S. F. Edwards and P. W. Anderson. Theory of spin glasses. J. Phys. F, 5(5):965, 1975.
  • [12] A. El Alaoui and M. I. Jordan. Detection limits in the high-dimensional spiked rectangular model. In Conference On Learning Theory, pages 410–438, 2018.
  • [13] A. El Alaoui, F. Krzakala, and M. I. Jordan. Fundamental limits of detection in the spiked Wigner model. Ann. Statist., 48(2):863–885, 2020.
  • [14] D. Féral and S. Péché. The largest eigenvalue of rank one deformation of large Wigner matrices. Comm. Math. Phys., 272(1):185–228, 2007.
  • [15] I. M. Johnstone. On the distribution of the largest eigenvalue in principal components analysis. Ann. Statist., 29(2):295–327, 2001.
• [16] I. M. Johnstone and A. Onatski. Testing in high-dimensional spiked models. Ann. Statist., 48(3):1231–1254, 2020.
  • [17] J. H. Jung, H. W. Chung, and J. O. Lee. Detection of signal in the spiked rectangular models. In Proceedings of the 38th International Conference on Machine Learning, volume 139, pages 5158–5167, 2021.
  • [18] A. Montanari, D. Reichman, and O. Zeitouni. On the limitation of spectral methods: from the Gaussian hidden clique problem to rank one perturbations of Gaussian tensors. IEEE Trans. Inform. Theory, 63(3):1572–1579, 2017.
  • [19] A. Onatski, M. J. Moreira, and M. Hallin. Asymptotic power of sphericity tests for high-dimensional data. Ann. Statist., 41(3):1204–1231, 2013.
  • [20] A. Onatski, M. J. Moreira, and M. Hallin. Signal detection in high dimension: the multispiked case. Ann. Statist., 42(1):225–254, 2014.
• [21] G. Parisi. A sequence of approximated solutions to the SK model for spin glasses. J. Phys. A, 13(4):L115, 1980.
  • [22] S. Péché. The largest eigenvalue of small rank perturbations of Hermitian random matrices. Probab. Theory Related Fields, 134(1):127–173, 2006.
  • [23] A. Perry, A. S. Wein, A. S. Bandeira, and A. Moitra. Optimality and sub-optimality of PCA I: Spiked random matrix models. Ann. Statist., 46(5):2416–2451, 2018.
  • [24] D. Sherrington and S. Kirkpatrick. Solvable model of a spin-glass. Phys. Rev. Lett., 35(26):1792, 1975.
  • [25] M. Talagrand. The Parisi formula. Ann. of Math. (2), 163(1):221–263, 2006.