This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Phase Transition in the Generalized Stochastic Block Model

Sun Min Lee111Department of Mathematical Sciences, KAIST, Daejeon 34141, Korea
e-mail: [email protected]
   Ji Oon Lee222Department of Mathematical Sciences, KAIST, Daejeon 34141, Korea
email: [email protected]
Abstract

We study the problem of detecting the community structure from the generalized stochastic block model (GSBM). Based on the analysis of the Stieljtes transform of the empirical spectral distribution, we prove a BBP-type transition for the largest eigenvalue of the GSBM. For specific models such as a hidden community model and an unbalanced stochastic model, we provide precise formulas for the two largest eigenvalues, establishing the gap in the BBP-type transition.

1 Introduction

One of the most fundamental and natural problems in data science is to understand an underlying structure from data sets that can be viewed as networks. The problem is known as the clustering or community detection, and it appears in diverse field of studies involving real-world networks.

The stochastic block model (SBM) is one of the most fundamental mathematical models to understand the community structure in networks. An SBM is a random graph with NN nodes, partitioned into KK disjoint subsets, called the communities, C1,C2,,CKC_{1},C_{2},\dots,C_{K}. One can characterize an SBM via its adjacency matrix, which is a symmetric (random) matrix M~\widetilde{M}, whose (i,j)(i,j)-entry M~ij\widetilde{M}_{ij} is a Bernoulli random variable depending only on the communities to which the nodes ii and jj belong. For the clustering of an SBM, it is often useful to analyze the eigenvalues of the adjacency matrix and their associated eigenvectors, which is known as a spectral method.

One of the most prominent examples of spectral methods is the principal component analysis (PCA) in which the behavior of the eigenvectors associated with the extremal eigenvalues are considered to obtain the community structure of the SBM. For an SBM with two communities, the expectation of its adjacency matrix M~\widetilde{M} has a block structure, i.e.,

𝔼[M~]=(P11P12P21P22).\mathbb{E}[\widetilde{M}]=\left(\begin{array}[]{c|c}P_{11}&P_{12}\\ \hline\cr P_{21}&P_{22}\end{array}\right). (1.1)

In the simplest case of a balanced SBM with P11=P12=pP_{11}=P_{12}=p, P12=P21=qP_{12}=P_{21}=q, and the two communities are of equal size, it can be easily checked that 𝔼[M~]\mathbb{E}[\widetilde{M}] has at most two non-zero eigenvalues, N(p+q)/2N(p+q)/2 and N(pq)/2N(p-q)/2. Thus, if N(pq)/2N(p-q)/2 is sufficiently large, the perturbation M~𝔼[M~]\widetilde{M}-\mathbb{E}[\widetilde{M}] is negligible for the two largest eigenvalues M~\widetilde{M} and it is possible to determine the community structure from the eigenvector associated with the second largest eigenvalue of M~\widetilde{M}. Equivalently, after subtracting (p+q)/2(p+q)/2 from each entry, the (shifted) adjacency matrix becomes the sum of a rank-11 deterministic matrix and a random matrix with centered entries, and one can use the eigenvector associated with the largest eigenvalue of the shifted adjacency matrix for clustering.

The sum of a deterministic matrix and a random matrix has been extensively studied in random matrix theory. When the deterministic matrix is rank-11 and the random matrix is a Wigner matrix, it is called a (rank-11) spiked Wigner matrix. The behavior of the largest eigenvalue of a spiked Wigner matrix is known to exhibit a sharp phase transition depending on the ratio between the spectral norms of the deterministic part and the random part. This type of phase transition is called the BBP transition, after the seminal work of Baik, Ben Arous, and Péché [5] for spiked (complex) Wishart matrix. From the BBP transition, we can immediately see that the detection of the signal is possible via PCA when the signal-to-noise ratio (SNR) is above a certain threshold.

While the BBP transition has been proved for spiked Wigner matrices under various assumptions [20, 15, 8, 7], it is not directly applicable to the SBM, since the entries in a Wigner matrix are i.i.d. (up to symmetry constraint) whereas those in the adjacency matrix of an SBM are not. The proof of the BBP transition with an SBM is substantially harder. For example, unless the SBM is balanced, the empirical spectral distribution (ESD) of M~\widetilde{M} does not even converge to the semi-circle distribution, which is the limiting ESD of a Wigner matrix; the limiting ESD in this case is not given by a simple formula as the semi-circle distribution but by an implicit formula via its Stieltjes transform.

Main contribution

In this paper, we consider a model that generalizes the SBM, called the generalized stochastic block model (GSBM), with two communities. In this model, the mean of the matrix has the same block structure as that of the SBM in (1.1), but the entries are not necessarily Bernoulli random variables. See Definition 2.1 for the precise definition of the GSBM.

For the GSBM, We prove the BBP-type transition for its largest eigenvalue (Theorem 2.1). The proof is based on the analysis of the Stiejtes transform of the ESD, which involves the resolvent of the random part of the GSBM. Due to the community structure, the random part is not a Wigner matrix, but a generalization of a Wigner matrix, known as a Wigner-type matrix. The local properties of eigenvalues of Wigner-type matrices are now well-established by recent developments of random matrix theory; see, e.g., [3, 4, 11].

In our main result, Theorem 2.1, we only state the existence of the critical values and the limiting gap between the two largest eigenvalue but refrain from writing the precise formulas for them. We instead apply our results to specific examples naturally arising in applications, hidden community model and unbalanced stochastic block model, and present the results from numerical experiments. (In terms of the edge probability, the former corresponds to the case P11=pP_{11}=p and P12=P21=P22=qP_{12}=P_{21}=P_{22}=q, while the latter P11=P22=pP_{11}=P_{22}=p and P12=P21=qP_{12}=P_{21}=q (but γ1/2\gamma\neq 1/2)).

Related works

The local law for Wigner-type matrices and the behavior of Quadratic vector equations (QVE), which are crucial in the analysis for Wigner-type matrices, were thoroughly investigated by Ajanki, Erdős and Krüger [3, 4]. A related result on the local law at the cusp for the Wigner-type matrix was also proved [13]. For more results on general Wigner-type matrices, we refer to [11, 14, 23] and references therein.

The phase transition of the largest eigenvalue was first proved proved by Baik, Ben Arous and Péché [5] for spiked Wishart matrices and later extend to other models, including the spiked Wigner matrix under various assumptions [20, 15, 8, 7]. If the SNR is below the threshold given by the BBP transition, the largest eigenvalue has no information on the signal and we cannot use the PCA for the detection of the signal. For this case, the PCA can be improved by an entrywise transformation that effectively increase the SNR [6, 21]. Reliable detection is impossible below a certain threshold [21], and it is only possible to consider a weak detection, which is a hypothesis testing between the null model (without spike) and the alternative (with spike). For more detail about the weak detection, we refer to [12, 10, 19].

The problem of recovering a hidden community from a symmetric matrix for two important cases, the Bernoulli and Gaussian entries, was discussed by Hajek, Wu, and Xu [16]. A threshold for exact recovery in SBM was discussed in [1, 2, 9, 17]. Recovering community at the Kesten–Stigum threshold for SBM was considered in [18]. For more results and Applications on SBM, we refer to [22] and references therein.

Organization of the paper

The rest of the paper is organized as follows: In Section 2, we define the model and state the main result. In Section 3, we introduce the hidden community model and unbalanced stochastic model to provide the results from numerical experiments around the transition threshold. In Section 4, we prove the main theorem. A summary of our results and future research directions was discussed in Section 5. Appendix A contains the definition of the Wigner-type matrices and preliminary results on this model. The detailed analysis for the specific models can be found in Appendix B.

2 Main Results

In this section, we precisely define the matrix model that we consider in this paper and state our main theorem. We begin by introducing a shifted, rescaled matrix for a generalized stochastic block model with two communities.

Definition 2.1 (Generalized Stochastic Block Model (GSBM)).

An N×NN\times N matrix MM is a generalized stochastic block model if

M=H+λuuTM=H+\lambda uu^{T}

where λ0\lambda\geq 0 is a constant, u=(u1,u2,,uN)Nu=(u_{1},u_{2},\dots,u_{N})\in\mathbb{R}^{N} with u=1\|u\|=1, and H=[Hij]H=[H_{ij}] is an N×NN\times N real symmetric matrix, satisfying the following:

  • There exist S[N]:={1,2,,N}S\subset[N]:=\{1,2,\dots,N\} and constants θ1,θ2\theta_{1},\theta_{2} such that

    ui={θ1ifiS,θ2ifiS.u_{i}=\begin{cases}\theta_{1}&\text{if}\quad i\in S\,,\\ \theta_{2}&\text{if}\quad i\notin S\,.\end{cases}

    We further assume that |S|N,(1|S|N)>c>0\frac{|S|}{N},(1-\frac{|S|}{N})>c>0 for some (NN-independent) constant cc.

  • Upper diagonal entries Hij(ij)H_{ij}(i\leq j) are centered independent random variables such that

    • there exist (NN-independent) constants α1\alpha_{1} and α2\alpha_{2} such that

      𝔼[Hij2]={α1N1ifi,jSα2N1ifi,jSN1otherwise\mathbb{E}[H_{ij}^{2}]=\begin{cases}\alpha_{1}N^{-1}&\text{if}\quad i,j\in S\\ \alpha_{2}N^{-1}&\text{if}\quad i,j\notin S\\ N^{-1}&\text{otherwise}\end{cases}
    • for any (NN-independent) D>0D>0, there exists a constant CDC_{D} such that for all iji\leq j

      𝔼[HijD]CDND2.\mathbb{E}[H_{ij}^{D}]\leq C_{D}N^{-\frac{D}{2}}.

For an adjacency matrix M~\widetilde{M} in (1.1), if P11=p1P_{11}=p_{1}, P22=p2P_{22}=p_{2}, and P12=P21=qP_{12}=P_{21}=q, then after shifting and rescaling, we find that

α1=p1(1p1)q(1q),α2=p2(1p2)q(1q).\alpha_{1}=\frac{p_{1}(1-p_{1})}{q(1-q)},\qquad\alpha_{2}=\frac{p_{2}(1-p_{2})}{q(1-q)}. (2.1)

(See Appendix B for more detail.)

We remark that HijH_{ij} are not necessarily Bernoulli random variables. The assumption on the finite moment means that the model is in the dense regime. The most typical balanced stochastic block model with two communities correspond to the choice of parameters |S|=N/2|S|=N/2 and α1=α2>1\alpha_{1}=\alpha_{2}>1.

Our main theorem is the following result on the phase transition for the spectral gap of GSBM.

Theorem 2.1.

Let MM be a generalized stochastic block model defined in Definition 2.1. Denote by λ1\lambda_{1} and λ2\lambda_{2} the largest and the second largest eigenvalue of MM. Assume that γ:=N1/N\gamma:=N_{1}/N is fixed. Then, there exists a constant λc\lambda_{c}, depending only on θ1,θ2,α1,α2,γ\theta_{1},\theta_{2},\alpha_{1},\alpha_{2},\gamma, such that

  • (Subcritical case) if λ<λc\lambda<\lambda_{c}, then λ1λ20\lambda_{1}-\lambda_{2}\to 0 as NN\to\infty, almost surely.

  • (Supercritical case) if λ>λc\lambda>\lambda_{c}, then λ1λ2g\lambda_{1}-\lambda_{2}\to g as NN\to\infty, almost surely, for some (NN-independent) constant gg.

We do not include the precise formulas for the critical value λc\lambda_{c} and the gap gg in the statement of Theorem 2.1 for the general cases since they are lengthy but not particularly informative. The formulas for the special cases can be found in Section 3.

3 Examples and Experiments

In this section, we focus on several specific models and check how the main result, Theorem 2.1, applies to them.

3.1 Hidden community model

In the hidden community model, only one of the intra-community connection probability is larger than the inter-community connection probability, and the other intra-connection probability coincides with the inter-community connection. The precise definition for such a model is as follows:

Definition 3.1 (Hidden Community Model).

Let C[n]C\subset[n] such that |C|=K|C|=K. Let define that SS is a N×NN\times N symmetric matrix with Sii=0S_{ii}=0 where SijS_{ij} are independent for 1ijN1\leq i\leq j\leq N and

Sij{P,if i,jCQ,otherwise\displaystyle S_{ij}\sim\begin{cases}P,&\mbox{if }i,j\in C\\ Q,&\mbox{otherwise}\\ \end{cases}

for given probability measures PP and QQ.

We consider the BBP-type transition of the hidden community model with Bernoulli entries, i.e., P=Bernoulli(p)P=\mathrm{Bernoulli}(p) and Q=Bernoulli(q)Q=\mathrm{Bernoulli}(q) with pqp\neq q, which also corresponds to the case α2=1\alpha_{2}=1 or p2=qp_{2}=q in (2.1). It is not hard to find that the transition occurs in the regime

p1:=p=wN+qp_{1}:=p=\frac{w}{\sqrt{N}}+q

for some (possibly NN-dependent) w=Θ(1)w=\Theta(1). After shifting and rescaling, we find that λ22\lambda_{2}\to 2 and

λ1{γwq(1q)+q(1q)γw if w>q(1q)γ,2 if w<q(1q)γ.\lambda_{1}\to\begin{cases}\frac{\gamma w}{\sqrt{q(1-q)}}+\frac{\sqrt{q(1-q)}}{\gamma w}&\text{ if }w>\frac{\sqrt{q(1-q)}}{\gamma}\,,\\ 2&\text{ if }w<\frac{\sqrt{q(1-q)}}{\gamma}\,.\end{cases} (3.1)

See Appendix B.1 for the detail.

We performed the numerical simulation for the hidden community model. We set N=2500N=2500, γ=1/4\gamma=1/4, and q=0.2q=0.2. Following the analysis in Appendix B.1, we find that an outlier eigenvalue occurs if

p>q+q(1q)γN=0.232.p>q+\frac{\sqrt{q(1-q)}}{\gamma\sqrt{N}}=0.232.

In Figure 1, we compare the histograms of the eigenvalues of the shifted, rescaled adjacency matrices with p=0.2p=0.2 and p=0.25p=0.25, respectively. As predicted by the analysis, the outlier appears only for the case p=0.25p=0.25.

Refer to caption
(a)
Refer to caption
(b)
Figure 1: The histograms of the eigenvalues for the hidden community model with p=0.2p=0.2 (left) and p=0.25p=0.25 (right).

3.2 Unbalanced stochastic block model

We next consider the case p1=p2p_{1}=p_{2} or α1=α2\alpha_{1}=\alpha_{2} with γ1/2\gamma\neq 1/2, which we will refer to an unbalanced stochastic block model. As in the hidden community model, the transition occurs in the regime p1=p2:=p=wN+qp_{1}=p_{2}:=p=\frac{w}{\sqrt{N}}+q. After shifting and rescaling, we find that λ22\lambda_{2}\to 2 and

λ1{w2q(1q)+2q(1q)w if w>2q(1q),2 if w<2q(1q).\lambda_{1}\to\begin{cases}\frac{w}{2\sqrt{q(1-q)}}+\frac{2\sqrt{q(1-q)}}{w}&\text{ if }w>2\sqrt{q(1-q)}\,,\\ 2&\text{ if }w<2\sqrt{q(1-q)}\,.\end{cases}

See Appendix B.2 for the detail. Note that the transition does not depend on γ\gamma.

We performed the numerical simulation for the unbalanced stochastic block model. As in the hidden community model, we set N=2500N=2500, γ=1/4\gamma=1/4, and q=0.2q=0.2. An outlier eigenvalue occurs if

p>q+2q(1q)N=0.216.p>q+\frac{2\sqrt{q(1-q)}}{\sqrt{N}}=0.216.

In Figure 2, we compare the histograms of the eigenvalues of the shifted, rescaled adjacency matrices with p=0.2p=0.2 and p=0.25p=0.25, respectively. Again, as predicted by the analysis, the outlier appears only for the case p=0.25p=0.25.

Refer to caption
(a)
Refer to caption
(b)
Figure 2: The histograms of the eigenvalues for the unbalanced stochastic block model with p=0.2p=0.2 (left) and p=0.25p=0.25 (right).

4 Proof of Theorem 2.1

Recall that we denote by λ1\lambda_{1} and λ2\lambda_{2} the two largest eigenvalues of MM. Let μ1\mu_{1} and μ2\mu_{2} be the two largest eigenvalues of HH. From the result on the Wigner-type matrices, we find that μ1\mu_{1} and μ2\mu_{2} converge to the rightmost edge of the limiting ESD of HH. By the Cauchy interlacing formula, we have the inequality

μ2λ2μ1λ1,\mu_{2}\leq\lambda_{2}\leq\mu_{1}\leq\lambda_{1},

which shows that λ2\lambda_{2} also converges to the rightmost edge of the limiting ESD of HH.

From the minimax principle,

λ1=maxx=1x,mx=maxx=1(x,Hx+λ|x,u|2),\lambda_{1}=\max_{\|x\|=1}\langle x,mx\rangle=\max_{\|x\|=1}\left(\langle x,Hx\rangle+\lambda|\langle x,u\rangle|^{2}\right),

which shows that λ1\lambda_{1} is an increasing function of λ\lambda. Further, since λ1μ1\lambda_{1}\geq\mu_{1} and

λμ1λ1λ+μ1,\lambda-\mu_{1}\leq\lambda_{1}\leq\lambda+\mu_{1},

we find that λ1μ1=o(1)\lambda_{1}-\mu_{1}=o(1) if λ=o(1)\lambda=o(1) and λ1>μ1+1\lambda_{1}>\mu_{1}+1 if λ>2μ1+1\lambda>2\mu_{1}+1. Thus, since λ1\lambda_{1} is a continuous function of λ\lambda, conditional on HH there exists λc\lambda_{c} such that the statement of Theorem 2.1 holds. It thus remains to show that λc\lambda_{c} and μ\mu are deterministic in the sense that the same statement holds without conditioning on HH.

Our proof is based on the Stieltjes transform method in random matrix theory for which we use the following definition:

Definition 4.1 (Stieltjes Transform).

Let μ\mu be a probability measure on the real line. The Stieltjes transform of μ\mu is defined by

Sμ(z)=1xzdμ(x)S_{\mu}(z)=\int_{\mathbb{R}}\frac{1}{x-z}\mathrm{d}\mu(x)

for z\supp(μ)z\in\mathbb{C}\backslash\mathrm{supp}\,(\mu).

For the noise HH, we consider its resolvent G(z)G(z) defined by

G(z):=(HzI)1G(z):=(H-zI)^{-1} (4.1)

for z\spec(H)z\in\mathbb{C}\backslash\mathrm{spec}\,(H). Note that the normalized trace m:=N1TrGm:=N^{-1}\operatorname{Tr}G is equal to the Stieltjes transform of the empirical spectral distribution (ESD) of HH.

To find the largest eigenvalue of M=H+λuuTM=H+\lambda uu^{T}, we recall that any eigenvalue zz of MM satisfies

det(H+λuuTzI)=0,\det(H+\lambda uu^{T}-zI)=0, (4.2)

which can be further decomposed into

0=det(H+λuuTzI)=det(HzI)(I+(HzI)1λuuT)=det(HzI)det(I+(HzI)1λuuT).\begin{split}0&=\det(H+\lambda uu^{T}-zI)=\det(H-zI)(I+(H-zI)^{-1}\lambda uu^{T})\\ &=\det(H-zI)\cdot\det(I+(H-zI)^{-1}\lambda uu^{T}).\end{split} (4.3)

Thus, if zz is not an eigenvalue of HH, we find that det(HzI)=0\det(H-zI)=0, and hence

det(I+(HzI)1λuuT)=0.\det(I+(H-zI)^{-1}\lambda uu^{T})=0.

We now claim that (HzI)1λuuT(H-zI)^{-1}\lambda uu^{T} has rank one. To prove the claim, we notice that for any vNv\in\mathbb{R}^{N},

(HzI)1λuuTv=u,v(HzI)1λu.(H-zI)^{-1}\lambda uu^{T}v=\langle u,v\rangle(H-zI)^{-1}\lambda u.

It implies that the range of (HzI)1λuuT(H-zI)^{-1}\lambda uu^{T} is contained in span((HzI)1λu)\mathrm{span}\,((H-zI)^{-1}\lambda u), which is a 1-dimensional space.

Since (HzI)1λuuT(H-zI)^{-1}\lambda uu^{T} has rank 1, it has only one non-zero eigenvalue, which we call λ0\lambda_{0}. Then, λ0\lambda_{0} is 1-1 for otherwise every eigenvalues of I+(HzI)1λuuTI+(H-zI)^{-1}\lambda uu^{T} is non-zero, contradicting (4.3). Furthermore, it is also obvious that (HzI)1λu(H-zI)^{-1}\lambda u is an eigenvector associated with the eigenvalue 1-1. Thus,

(HzI)1λuuT(HzI)1u=(HzI)1u,(H-zI)^{-1}\lambda uu^{T}(H-zI)^{-1}u=-(H-zI)^{-1}u,

which leads us to the equation

uT(HzI)1u=u,G(z)u=1λ.u^{T}(H-zI)^{-1}u=\langle u,G(z)u\rangle=-\frac{1}{\lambda}. (4.4)

For the noise matrix HH, which is a Wigner-type matrix considered in [4], we have that

u,G(z)ui=1Nmi(z)ui2\langle u,G(z)u\rangle\simeq\sum_{i=1}^{N}m_{i}(z)u_{i}^{2} (4.5)

for any zz not contained in an open neighborhood of the support of the limiting ESD of HH, where we let 𝐦:=(m1,m2,,mN)\mathbf{m}:=(m_{1},m_{2},\dots,m_{N}) be the solution to the quadratic vector equation (QVE)

1mi(z)=z+j=1N𝔼[Hij2]mj(z).-\frac{1}{m_{i}(z)}=z+\sum^{N}_{j=1}\mathbb{E}[H_{ij}^{2}]m_{j}(z). (4.6)

is satisfied for i,j=1,2,,Ni,\ j=1,2,...,N. (See Appendix A for the precise statement of (4.5).) We remark that the uniqueness of the solution mm for (4.6) is also known [3].

To solve the equations (4.4) and (4.6), we need to estimate m(z)m(z) from the assumption on the community structure in Definition 2.1. Assume for the simplicity that S={1,2,,N1}S=\{1,2,\dots,N_{1}\}. From the symmetry, we have an ansatz

m1(z)=m2(z)==mN1(z),mN1+1(z)==mN(z).m_{1}(z)=m_{2}(z)=\dots=m_{N_{1}}(z),\quad m_{N_{1}+1}(z)=\dots=m_{N}(z).

Then, we can rewrite (4.6) as

1mi(z)={z+j=1N1α1Nmj(z)+j=N1+1N1Nmj(z)if 1iN1z+j=1N11Nmj(z)+j=N1+1Nα2Nmj(z)if N1+1iN,\displaystyle-\frac{1}{m_{i}(z)}=\begin{cases}z+\sum^{N_{1}}_{j=1}\frac{\alpha_{1}}{N}m_{j}(z)+\sum^{N}_{j=N_{1}+1}\frac{1}{N}m_{j}(z)&\text{if }\quad 1\leq i\leq N_{1}\\ z+\sum^{N_{1}}_{j=1}\frac{1}{N}m_{j}(z)+\sum^{N}_{j=N_{1}+1}\frac{\alpha_{2}}{N}m_{j}(z)&\text{if }\quad N_{1}+1\leq i\leq N\\ \end{cases},

which can be further simplified (after omitting the zz-dependence) to

1=zm1+α1γ(m1)2+(1γ)m1mN1=zmN+γm1mN+α2(1γ)(mN)2.\begin{split}-1&=zm_{1}+\alpha_{1}\gamma(m_{1})^{2}+(1-\gamma)m_{1}m_{N}\,\\ -1&=zm_{N}+\gamma m_{1}m_{N}+\alpha_{2}(1-\gamma)(m_{N})^{2}\,.\end{split} (4.7)

We can thus conclude that if there exists real zz that solves (4.7) under the assumption

N1m1θ12+(NN1)mNθ22=N(γm1θ12+(1γ)mNθ22)=1λN_{1}m_{1}\theta_{1}^{2}+(N-N_{1})m_{N}\theta_{2}^{2}=N\left(\gamma m_{1}\theta_{1}^{2}+(1-\gamma)m_{N}\theta_{2}^{2}\right)=-\frac{1}{\lambda} (4.8)

then λ1\lambda_{1} converges to zz with high probability. Since (4.8) is deterministic, we find that the gap gg is deterministic in the supercritical case λ>λc\lambda>\lambda_{c}.

It remains to find the critical λc\lambda_{c}. Recall that we set m:=N1TrGm:=N^{-1}\operatorname{Tr}G. From the Stieltjes inversion formula, the (normalized) imaginary part of m(z)m(z) corresponds to the density of the limiting ESD of HH at Rez\operatorname{Re}z. Thus, after changing (4.7) as a single equation involving zz and mm only, i.e., f(z,m)=0f(z,m)=0, we find that the upper edge L+L_{+} of the ESD of HH is the largest real number such that f(L+,m)=0f(L_{+},m)=0 has a double root when considered as an equation for mm. (Note that technically the condition can be checked by solving f(L+,m)=0f(L_{+},m)=0 and mf(L+,m)=0\frac{\partial}{\partial m}f(L_{+},m)=0 simultaneously.) We can thus conclude that λc\lambda_{c} is determined as the largest number such that when λ=λc\lambda=\lambda_{c} the solution zz for the equation (4.7) under the assumption (4.8) coincides with L+L_{+}. This in particular shows that λc\lambda_{c} is also deterministic and completes the proof of Theorem 2.1.

5 Conclusion and Future Works

In this paper, we considered the generalized stochastic block model with two communities. We showed the phase transition in the GSBM where the random part is the Wigner-type matrix, which extends the BBP transition. For the precise formulas, we discussed a hidden community model and unbalanced stochastic block model with Bernoulli distribution and Gaussian distribution at the Kesten–Stigum threshold. Both models can be improved for a non-Gaussian case.

We believe that it is possible to prove the phase transition for the sparse matrix in which the data matrix is not necessarily symmetric and most of the elements are composed of zeros. We also hope to extend our result to the GSBM with more than two communities.

Acknowledgments

The authors were supported in part by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. 2019R1A5A1028324).

Appendix A Local law for Wigner-type matrices

In this section, we provide a precise statement of the local law for Wigner-type matrices, which was used in the proof of Theorem 2.1 in Section 4. Wigner-type matrices are defined as follows:

Definition A.1.

(Wigner-type matrix) We say an N×NN\times N random matrix H=(Hij)H=(H_{ij}) is a Wigner-type matrix if the entries of HH are independent real symmetric variables satisfying the following conditions:

  • 𝔼(Hij)=0\mathbb{E}(H_{ij})=0 for all i,ji,j.

  • The variance matrix 𝑺=(Sij)\boldsymbol{S}=(S_{ij}) where Sij=𝔼|Hij|2S_{ij}=\mathbb{E}|H_{ij}|^{2} satisfies

    (SL)ijρN and sijSN, 1i,jN(S^{L})_{ij}\geq\frac{\rho}{N}\text{ and }s_{ij}\leq\frac{S_{*}}{N},\ \ 1\leq i,j\leq N

    for finite parameters ρ,S,L\rho,S_{*},L.

For the precise statement of the local law, we use the following definitions, which are frequently used in the analysis involving rare events in random matrix theory.

Definition A.2.

(Overwhelming probability) An event Ω\Omega holds with overwhelming probability if for any big enough D>0D>0 , P(Ω)NDP(\Omega)\leq N^{-D} for any sufficient large NN.

Definition A.3.

(Stochastic domination) Let consider two families of non-negative random variables:

ψ={ψ(N)(u)|N,uU(N)}\psi=\{\psi^{(N)}(u)|N\in\mathbb{N},\ u\in U^{(N)}\}
ϕ={ϕ(N)(u)|N,uU(N)}\phi=\{\phi^{(N)}(u)|N\in\mathbb{N},\ u\in U^{(N)}\}

where U(N)U^{(N)} is N-dependent parameter set. Suppose N0:(0,)2N_{0}:(0,\infty)^{2}\to\mathbb{N} is a given function depending on p,q,np,\ q,\ n and μ\mu. If for ϵ>0\epsilon>0 small enough and D>0D>0 big enough, we have

P(ϕ(N)>Nϵψ(N))ND,for NN0(ϵ,D),P\left(\phi^{(N)}>N^{\epsilon}\psi^{(N)}\right)\leq N^{-D},\ \ \ \text{for }N\geq N_{0}(\epsilon,D),

then ϕ\phi is stochastically dominated by ψ\psi which denoted by ϕψ\phi\prec\psi.

We are now ready to state the local law. Let mi(z)m_{i}(z) be the solution of QVE in (4.6) and ρ\rho is the density defined as

ρ(τ):=limρ01πNi=1NImmi(τ+iη).\rho(\tau):=\lim_{\rho\searrow 0}\frac{1}{\pi N}\sum_{i=1}^{N}\operatorname{Im}m_{i}(\tau+\mathrm{i}\eta).

(See also Corollary 1.3 of [4] for more detail.)

Theorem A.1 (Local law).

Let HH be a Wigner-type matrix and fix an arbitrary γ(0,1)\gamma\in(0,1). Then, uniformly for all z=a+biz=a+bi with bNγ1b\geq N^{\gamma-1}, the resolvent G(z)=(HzI)1G(z)=(H-zI)^{-1} satisfy

maxi,j|Gij(z)mi(z)δij|1+ρ(z)bN+1bNmax_{i,j}\left|G_{ij}(z)-m_{i}(z)\delta_{ij}\right|\prec\frac{1+\sqrt{\rho(z)}}{\sqrt{bN}}+\frac{1}{bN}

Furthermore, for any deterministic vector wNw\in\mathbb{C}^{N} with maxi|wi|1\max_{i}|w_{i}|\geq 1, we have

|i,j=1Nwi¯(Gij(z)mi(z))|1bN\left|\sum^{N}_{i,j=1}\overline{w_{i}}\left(G_{ij}(z)-m_{i}(z)\right)\right|\prec\frac{1}{\sqrt{bN}}

The local law can be generalized to the anisotropic local law as follows.

Theorem A.2 (Anisotropic law).

Suppose that the assumptions in Theorem A.1 hold. Then, uniformly for all z=a+biz=a+bi with bNγ1b\geq N^{\gamma-1}, and for any two deterministic 2\ell^{2}-normalized vectors w,vNw,v\in\mathbb{C}^{N}, we have

|i,j=1Nwi¯Gij(z)vji=1Nmi(z)wi¯vj|1+ρ(z)bN+1bN\left|\sum^{N}_{i,j=1}\overline{w_{i}}G_{ij}(z)v_{j}-\sum^{N}_{i=1}m_{i}(z)\overline{w_{i}}v_{j}\right|\prec\frac{1+\sqrt{\rho(z)}}{\sqrt{bN}}+\frac{1}{bN}

Appendix B Examples from stochastic block models

In this appendix, we consider stochastic block models, which corresponds to GSBMs with Bernoulli distribution in our setting. Suppose that H^=[H^ij]i,j=1N\widehat{H}=[\widehat{H}_{ij}]_{i,j=1}^{N} is an SBM such that

H^={H^ijBernoulli(p1), if 1i,jN1,H^ijBernoulli(p2), if N1+1i,jN,H^ijBernoulli(q), otherwise.\widehat{H}=\begin{cases}\widehat{H}_{ij}\sim Bernoulli(p_{1}),&\mbox{ if }1\leq i,j\leq N_{1}\,,\\ \widehat{H}_{ij}\sim Bernoulli(p_{2}),&\mbox{ if }N_{1}+1\leq i,j\leq N\,,\\ \widehat{H}_{ij}\sim Bernoulli(q),&\mbox{ otherwise}\,.\\ \end{cases} (B.1)

In what follows, we will call the (i,j)(i,j)-entry is in the diagonal block if 1i,jN11\leq i,j\leq N_{1} or N1+1i,jNN_{1}+1\leq i,j\leq N, and otherwise it is in the off-diagonal block. In the block matrix form, it can also be expressed as follows:

H^\displaystyle\widehat{H}\explainA\explainB(Bernoulli(p1) Bernoulli(q) Bernoulli(q) Bernoulli(p2) )\explainC\explainDUNKNOWN{}

Our goal is to shift and rescale H^\widehat{H} to convert it into a GSBM M=H+λuuTM=H+\lambda uu^{T} in Definition 2.1. We first notice that the variances of the entries of H^\widehat{H} are p1(1p1)p_{1}(1-p_{1}) and p2(1p2)p_{2}(1-p_{2}) for the diagonal block and q(1q)q(1-q) for the off-diagonal block. Since we assume that the variance of the entry HijH_{ij} in the off-diagonal block is N1N^{-1}, we find that the matrix must be divided by Nq(1q)\sqrt{Nq(1-q)}. It is then immediate to find that

α1=p1(1p1)q(1q),α2=p2(1p2)q(1q).\alpha_{1}=\frac{p_{1}(1-p_{1})}{q(1-q)},\qquad\alpha_{2}=\frac{p_{2}(1-p_{2})}{q(1-q)}. (B.2)

as in (2.1).

The mean matrix

𝔼[H^]=(p1qqp2)\mathbb{E}[\widehat{H}]=\begin{pmatrix}p_{1}&q\\ q&p_{2}\end{pmatrix}

is a rank-22 matrix, and thus we need to subtract each entry by a deterministic number, which depends on the parameters p1,p2p_{1},p_{2}, and qq.

B.1 Hidden community model

Suppose that p1=pp_{1}=p and p2=qp_{2}=q. It is then easy to find that 𝔼[H^]\mathbb{E}[\widehat{H}] becomes a rank-11 matrix after subtracting each entry by qq, i.e., if we let E0E_{0} be the N×NN\times N matrix whose all entries are qq, then

𝔼[H^]E0=(pq000).\mathbb{E}[\widehat{H}]-E_{0}=\begin{pmatrix}p-q&0\\ 0&0\end{pmatrix}.

Thus, we find that

M=1Nq(1q)(H^E0)M=\frac{1}{\sqrt{Nq(1-q)}}(\widehat{H}-E_{0}) (B.3)

and

𝔼[M]=1Nq(1q)(pq000).\mathbb{E}[M]=\frac{1}{\sqrt{Nq(1-q)}}\begin{pmatrix}p-q&0\\ 0&0\end{pmatrix}. (B.4)

Recall that N1=γNN_{1}=\gamma N and p=wN+qp=\frac{w}{\sqrt{N}}+q. Since λuuT=𝔼[M]\lambda uu^{T}=\mathbb{E}[M], we get

u= (1γN0 )\explainC\explainD,UNKNOWN{}

i.e., θ1=1/γN\theta_{1}=1/\sqrt{\gamma N} and θ2=0\theta_{2}=0. We also find that

λ=N1(pq)Nq(1q)=γwq(1q).\lambda=\frac{N_{1}(p-q)}{\sqrt{Nq(1-q)}}=\frac{\gamma w}{\sqrt{q(1-q)}}.

Following the proof of Theorem 2.1 in Section 4, we solve the system of equation in (4.7),

1=zm1+p(1p)q(1q)γ(m1)2+(1γ)m1mN,1=zmN+γm1mN+(1γ)(mN)2.\begin{split}-1&=zm_{1}+\frac{p(1-p)}{q(1-q)}\gamma(m_{1})^{2}+(1-\gamma)m_{1}m_{N}\,,\\ -1&=zm_{N}+\gamma m_{1}m_{N}+(1-\gamma)(m_{N})^{2}\,.\end{split} (B.5)

Since pq=O(N1/2)p-q=O(N^{-1/2}), we consider an ansatz mN=m1+O(N1/2)m_{N}=m_{1}+O(N^{-1/2}), which shows for m=γm1+(1γ)mNm=\gamma m_{1}+(1-\gamma)m_{N} that

1+zm+m2=O(N1/2).1+zm+m^{2}=O(N^{-1/2}).

Following the analysis in the last paragraph of Section 4, we find that the upper edge L+=2+O(N1/2)L_{+}=2+O(N^{-1/2}). By Theorem 2.1, it also implies that λ22\lambda_{2}\to 2 as NN\to\infty.

In order to determine the location of the largest eigenvalue λ1\lambda_{1}, we consider (B.5) under the assumption in (4.8),

N(γm1θ12+(1γ)mNθ22)=m1=1λ=q(1q)γw.N\left(\gamma m_{1}\theta_{1}^{2}+(1-\gamma)m_{N}\theta_{2}^{2}\right)=m_{1}=-\frac{1}{\lambda}=-\frac{\sqrt{q(1-q)}}{\gamma w}. (B.6)

We remark that the ansatz mN=m1+O(N1/2)m_{N}=m_{1}+O(N^{-1/2}) can be directly checked in this case; by plugging (B.6) into (B.5) and eliminating zz,

(p(1p)q(1q)1)γ(q(1q)γw)2mN+mN+q(1q)γw=0,\left(\frac{p(1-p)}{q(1-q)}-1\right)\gamma\left(\frac{\sqrt{q(1-q)}}{\gamma w}\right)^{2}m_{N}+m_{N}+\frac{\sqrt{q(1-q)}}{\gamma w}=0,

whose solution is

mN=q(1q)/(γw)1+(p(1p)q(1q)1)γ(q(1q)γw)2=q(1q)γw(1N(12q)wNγw+N(12q)w).m_{N}=-\frac{\sqrt{q(1-q)}/(\gamma w)}{1+\left(\frac{p(1-p)}{q(1-q)}-1\right)\gamma\left(\frac{\sqrt{q(1-q)}}{\gamma w}\right)^{2}}=-\frac{\sqrt{q(1-q)}}{\gamma w}\left(1-\frac{\sqrt{N}(1-2q)-w}{N\gamma w+\sqrt{N}(1-2q)-w}\right).

To find the location of the largest eigenvalue, we need to check whether the assumption (B.6) is valid. However, we can instead find the value of zz by first assuming that the solution exists. Then,

z=γwq(1q)+q(1q)γw+O(N1/2).z=\frac{\gamma w}{\sqrt{q(1-q)}}+\frac{\sqrt{q(1-q)}}{\gamma w}+O(N^{-1/2}).

At the critical λc\lambda_{c} for the phase transition in Theorem 2.1, the location of the largest eigenvalue coincides with the location of the upper edge L+L_{+} in the limit NN\to\infty, or equivalently, γwq(1q)=1\frac{\gamma w}{\sqrt{q(1-q)}}=1. Thus, we conclude that

λ1{γwq(1q)+q(1q)γw if w>q(1q)γ,2 if w<q(1q)γ.\lambda_{1}\to\begin{cases}\frac{\gamma w}{\sqrt{q(1-q)}}+\frac{\sqrt{q(1-q)}}{\gamma w}&\text{ if }w>\frac{\sqrt{q(1-q)}}{\gamma}\,,\\ 2&\text{ if }w<\frac{\sqrt{q(1-q)}}{\gamma}\,.\end{cases}

B.2 Unbalanced stochastic model

Suppose that p1=p2=pp_{1}=p_{2}=p. Following the strategy in Appendix B.1, we let E1E_{1} be the N×NN\times N matrix whose all entries are (p+q)/2(p+q)/2. Then,

𝔼[H^]E1=((pq)/2(qp)/2(qp)/2(pq)/2).\mathbb{E}[\widehat{H}]-E_{1}=\begin{pmatrix}(p-q)/2&(q-p)/2\\ (q-p)/2&(p-q)/2\end{pmatrix}.

Thus, we find that

M=1Nq(1q)(H^E1)M=\frac{1}{\sqrt{Nq(1-q)}}(\widehat{H}-E_{1}) (B.7)

and

𝔼[M]=1Nq(1q)((pq)/2(qp)/2(qp)/2(pq)/2).\mathbb{E}[M]=\frac{1}{\sqrt{Nq(1-q)}}\begin{pmatrix}(p-q)/2&(q-p)/2\\ (q-p)/2&(p-q)/2\end{pmatrix}. (B.8)

From λuuT=𝔼[M]\lambda uu^{T}=\mathbb{E}[M], we get

u= (1N-1N)\explainC\explainD,UNKNOWN{}

i.e., θ1=1/N\theta_{1}=1/\sqrt{N} and θ2=1/N\theta_{2}=-1/\sqrt{N}. Also,

λ=N(pq)2Nq(1q)=w2q(1q).\lambda=\frac{N(p-q)}{2\sqrt{Nq(1-q)}}=\frac{w}{2\sqrt{q(1-q)}}.

With α1=α2=p(1p)q(1q)\alpha_{1}=\alpha_{2}=\frac{p(1-p)}{q(1-q)}, we solve the system of equation in (4.7),

1=zm1+p(1p)q(1q)γ(m1)2+(1γ)m1mN,1=zmN+γm1mN+p(1p)q(1q)(1γ)(mN)2.\begin{split}-1&=zm_{1}+\frac{p(1-p)}{q(1-q)}\gamma(m_{1})^{2}+(1-\gamma)m_{1}m_{N}\,,\\ -1&=zm_{N}+\gamma m_{1}m_{N}+\frac{p(1-p)}{q(1-q)}(1-\gamma)(m_{N})^{2}\,.\end{split} (B.9)

Again, we consider an ansatz mN=m1+O(N1/2)m_{N}=m_{1}+O(N^{-1/2}), which leads us to the result that the upper edge L+=2+O(N1/2)L_{+}=2+O(N^{-1/2}) and λ22\lambda_{2}\to 2 as NN\to\infty. The assumption in (4.8) becomes

γm1+(1γ)mN=1λ=2q(1q)w.\gamma m_{1}+(1-\gamma)m_{N}=-\frac{1}{\lambda}=-\frac{2\sqrt{q(1-q)}}{w}. (B.10)

If the solution to the equation (B.9) exists, it would be

z=w2q(1q)+2q(1q)w+O(N1/2).z=\frac{w}{2\sqrt{q(1-q)}}+\frac{2\sqrt{q(1-q)}}{w}+O(N^{-1/2}).

At the critical λc\lambda_{c}, w2q(1q)=1\frac{w}{2\sqrt{q(1-q)}}=1, and thus we conclude that

λ1{w2q(1q)+2q(1q)w if w>2q(1q),2 if w<2q(1q).\lambda_{1}\to\begin{cases}\frac{w}{2\sqrt{q(1-q)}}+\frac{2\sqrt{q(1-q)}}{w}&\text{ if }w>2\sqrt{q(1-q)}\,,\\ 2&\text{ if }w<2\sqrt{q(1-q)}\,.\end{cases}

References

  • [1] E. Abbe. Community detection and stochastic block models: recent developments. The Journal of Machine Learning Research, 18(1):6446–6531, 2017.
  • [2] E. Abbe, A. Bandeira, and G. Hall. Exact recovery in the stochastic block model. IEEE Transactions on Information Theory, 62(1):471–487, 2014.
  • [3] O. Ajanki, L. Erdős, and T. Krüger. Quadratic vector equations on complex upper half-plane. American Mathematical Society, 261(1261), 2019.
  • [4] O. Ajanki, L. Erdős, and T. Krüger. Universality for general wigner-type matrices. Probability Theory and Related Fields, 169(3):667–727, 2017.
  • [5] J. Baik, G. Ben Arous, and S. Péché. Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices. The Annals of Probability, 33(5):1643–1697, 2005.
  • [6] J. Barbier, M. Dia, N. Macris, F. Krzakala, T. Lesieur, and L. Zdeborová. Mutual information for symmetric rank-one matrix estimation: A proof of the replica formula. Advances in Neural Information Processing Systems, 29:424–432, 2016.
  • [7] F. Benaych-Georges and R. R. Nadakuditi. The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Advances in Mathematics, 227(1):494–521, 2011.
  • [8] M. Capitaine, C. Donati-Martin, and D. Féral. The largest eigenvalues of finite rank deformation of large Wigner matrices: convergence and nonuniversality of the fluctuations. The Annals of Probability, 37(1):1–47, 2009.
  • [9] P.-Y. Chen and A. Hero. Universal phase transition in community detectability under a stochastic block model. Physical Review E, 91(3):032804, 2015.
  • [10] H. W. Chung and J. O. Lee. Weak detection of signal in the spiked wigner model. In International Conference on Machine Learning, 97:1233–1241, 2019.
  • [11] I. Dumitriu and Y. Zhu. Sparse general Wigner-type matrices: Local law and eigenvector delocalization. Journal of Mathematical Physics, 60(2):023301, 2019.
  • [12] A. El Alaoui, F. Krzakala, and M. I. Jordan. Fundamental limits of detection in the spiked Wigner model. The Annals of Statistics, 48(2):863–885, 2020.
  • [13] L. Erdős, T. Krüger, and D. Schröder. Cusp universality for random matrices I: local law and the complex hermitian case. Communications in Mathematical Physics, 378(2):1203–1278, 2020.
  • [14] L. Erdős and P. Mühlbacher. Bounds on the norm of Wigner-type random matrices. Random Matrices: Theory and Applications, 8(03):1950009, 2019.
  • [15] D. Féral and S. Péché. The largest eigenvalue of rank one deformation of large Wigner matrices. Communications in mathematical physics, 272(1):185–228, 2007.
  • [16] B. Hajek, Y. Wu, and J. Xu. Information limits for recovering a hidden community. IEEE Transactions on Information Theory, 63(8):4729-4745, 2017.
  • [17] B. Hajek, Y. Wu, and J. Xu. Achieving exact cluster recovery threshold via semidefinite programming. IEEE Transactions on Information Theory, 62(5):2788–2797, 2016.
  • [18] B. Hajek, Y. Wu, and J. Xu. Recovering a hidden community beyond the Kesten–Stigum threshold in O(|E|log|V|)O(|E|log*|V|) time. Journal of Applied Probability, 55(2):325–352, 2018.
  • [19] J. H. Jung, H. W. Chung, and J. O. Lee. Weak detection in the spiked wigner model with general rank. arXiv:2001.05676, 2020.
  • [20] S. Péché. The largest eigenvalue of small rank perturbations of Hermitian random matrices. Probability Theory and Related Fields, 134(1):127–173, 2006.
  • [21] A. Perry, A. S. Wein, A. S. Bandeira, and A. Moitra. Optimality and sub-optimality of PCA I: Spiked random matrix models. The Annals of Statistics, 46(5):2416–2451, 2018.
  • [22] N. Stanley, T. Bonacci, R. Kwitt, M. Niethammer, and P. J. Mucha. Stochastic block models with multiple continuous attributes. Applied Network Science, 4(1):1–22, 2019.
  • [23] Y. Zhu. A graphon approach to limiting spectral distributions of wigner-type matrices. Random Structures & Algorithms, 56(1):251–279, 2020.