
Two-Sample Test for Stochastic Block Models via Maximum Entry-Wise Deviation

Kang Fu, Seydou Keita and Jianwei Hu
School of Mathematics and Statistics, Central China Normal University, Wuhan
Abstract

The stochastic block model is a popular tool for detecting community structure in network data, and detecting the difference between two community structures is an important problem for such models. However, the two-sample test for stochastic block models has been a largely under-explored domain, and little work has been devoted to it. In this article, based on the maximum entry-wise deviation of the two centered and rescaled adjacency matrices, we propose a novel statistic for testing two samples of stochastic block models. We prove that the null distribution of the proposed test statistic converges in distribution to a Gumbel distribution, and we show that a change between the two stochastic block models can be detected via the proposed method. We then show that the proposed test has an asymptotic power guarantee against alternative models. One noticeable advantage of the proposed test statistic is that the number of communities is allowed to grow linearly with the number of nodes, up to a logarithmic factor. Further, we extend the proposed method to the degree-corrected stochastic block model. Both simulation studies and real-world data examples indicate that the proposed method works well.

Keywords: Gumbel distribution; Network data; Stochastic block model; Two-sample test

1 Introduction

Network data analysis has become a popular research topic in many fields, including gene classification, social relationship investigation, and financial risk management. In the past decades, the majority of works have focused on large-scale network data with community structure, see, e.g., Newman & Girvan (2004); Newman (2006); Steinhaeuser & Chawla (2010). In network data analysis, the stochastic block model proposed by Holland et al. (1983) is a popular tool for fitting network data with community structure, see, e.g., Snijders & Nowicki (1997); Nowicki & Snijders (2001); Bickel & Chen (2009); Rohe et al. (2011); Choi et al. (2012); Jin (2015); Zhang & Zhou (2016). The stochastic block model with $K$ communities assumes that the $n$ nodes of the network are clustered into $K$ communities, that is, there exists a mapping of community membership (also known as the community membership label) $g:[n]\rightarrow[K]$, where $[n]=\{1,\ldots,n\}$. Formally, $g(i)=k$ means that node $i$ belongs to community $k$. An unweighted and undirected graph $G$ can be represented by a binary symmetric adjacency matrix $A$, that is, $A_{ij}=1$ if there is a connection (or an edge) between node $i$ and node $j$, and $A_{ij}=0$ otherwise. Given the community membership label $g$, the stochastic block model assumes that the entries $A_{ij}$ $(i>j)$ of the adjacency matrix $A$ are mutually independent Bernoulli random variables with probabilities $P_{ij}=B_{g(i)g(j)}$ for a certain symmetric probability matrix $B\in[0,1]^{K\times K}$, where the matrix $P$ is called the edge-probability matrix. The stochastic block model is then completely and uniquely determined by the pair $(g,B)$, up to label permutations of nodes.

The fundamental issues in the stochastic block model are model selection and community detection. Given an adjacency matrix $A$, the goal of model selection is to estimate the number of clusters or communities, and the goal of community detection is to cluster all nodes into different communities such that the connections between nodes in the same community are dense and the connections between nodes in different communities are sparse. To enhance the flexibility of the model, variants of the stochastic block model have also been proposed: the degree-corrected stochastic block model of Karrer & Newman (2011) addresses degree heterogeneity by introducing additional node activeness parameters, and Airoldi et al. (2008) proposed the mixed membership stochastic block model, in which a single node may belong to multiple communities. For model selection, many methods have been used to estimate the number of communities, including sequential testing methods (Lei, 2016; Hu et al., 2021) and likelihood-based methods (Saldaña et al., 2017; Wang & Bickel, 2017; Hu et al., 2020). Meanwhile, many efficient methods have been proposed to recover the community structure, such as modularity (Newman, 2006), variational methods (Daudin et al., 2008), profile-likelihood maximization (Bickel & Chen, 2009), spectral clustering (Rohe et al., 2011; Jin, 2015), pseudo-likelihood maximization (Amini et al., 2013), and profile-pseudo likelihood methods (Wang et al., 2021). The corresponding asymptotic properties of the estimated community labels have been obtained, see, e.g., Rohe et al. (2011); Zhao et al. (2012); Choi et al. (2012); Lei & Rinaldo (2015); Sarkar & Bickel (2015); Zhang & Zhou (2016); Wang et al. (2021).

In the past decades, much of the literature has focused on the study of a single sample of stochastic block models. In a social network, however, people surrounding one leader may gradually develop closer relationships with another leader, see, e.g., Barnett & Onnela (2016). This phenomenon raises the question of whether the community structure of a network (the number of communities $K$, the community probability matrix $B$, and the community membership label $g$) changes over time or across environments.

Two-sample networks can also be viewed as the simplest case of the multi-layer stochastic block model. A multi-layer stochastic block model can be classified into several more specific situations: first, all layers come from the identical stochastic block model $(g,B)$; second, each layer comes from a different model but with the same community structure, that is, $g$ is identical across all layers; third, each layer comes from a different model, that is, both $g$ and $B$ differ across layers. In fact, the third case can be considered a mixture of multiple stochastic block models. How, then, can we distinguish these three situations? Intuitively, given two samples from the corresponding models, we should judge whether the two models are the same. A common inference method for this purpose is hypothesis testing. Tang et al. (2017) considered whether two independent finite-dimensional random dot product graphs are generated by the identical model; using the adjacency spectral embedding of each sample, they constructed a statistic based on a kernel function. Ghoshdastidar & von Luxburg (2018) used the largest singular value of a scaled and centralized matrix to construct a statistic and proved that its null distribution converges in distribution to a Tracy-Widom distribution. Ghoshdastidar et al. (2020) proposed two test statistics, based on the Frobenius norm and the spectral norm, for testing whether two samples of networks are generated from the identical edge-probability matrix; however, this testing procedure requires choosing an appropriate threshold. Recently, Chen et al. (2021a) simplified the statistics in Ghoshdastidar & von Luxburg (2018) and proposed a procedure for the two-sample network test. Based on random matrix theory, Chen et al. (2021b) used the trace of a constructed matrix to obtain a statistic and proved that its asymptotic null distribution is the standard normal distribution.

One defect of the methods mentioned above is that they require either that the edge-probability matrices of the two populations differ greatly or that multiple samples be available. Hence, when only two samples are available, these methods may not work well, as there is less information. In addition, it is worth noting that when the community structure changes slightly, the edge-probability matrix may change very little, especially when the network size is relatively large. The above methods for testing $H_{0}:P_{1}=P_{2}$, where $P_{1}$ and $P_{2}$ are the edge-probability matrices of the two network models, may then fail: we cannot detect the difference between the two models when the difference is tiny but the network size is very large. Thus, detecting the difference between two samples of stochastic block models is an interesting research problem. For the two-sample test of the stochastic block model, Wu et al. (2022) proposed a test based on a locally smoothed adjacency matrix. To construct the smoothed adjacency matrix, their method separates each community into several non-overlapping neighboring sub-communities and averages the entries of the adjacency matrix over non-overlapping local neighborhoods within communities. However, the procedure is complex and only applicable to a small number of communities. In this article, we construct a statistic that allows $K$ to diverge with $n$.

In this article, based on the maximum entry-wise deviation of the two centered and rescaled adjacency matrices, we propose a two-sample test statistic to detect the change between two stochastic block models from two observed adjacency matrices. We show that the asymptotic null distribution of the test statistic is a Gumbel distribution when $K=o(n/\log^{2}n)$. This testing method therefore allows $K$ to grow linearly with $n$ up to a logarithmic factor. It is well known that the number of communities must be less than the number of nodes in the stochastic block model. Since $K$ cannot grow faster, Rohe et al. (2014) called the scenario $K\rightarrow\infty$ the highest dimensional stochastic block model. Compared with the method in Wu et al. (2022), we relax the condition that the number of communities $K$ is fixed. Moreover, we show that the proposed test is asymptotically powerful against several alternative models and does not need additional methods to improve its power. Finally, to improve the flexibility of the proposed method, we extend it to the degree-corrected stochastic block model.

Next, we formally describe the problem. Let $\mathcal{X}=(X_{ij})_{n\times n}$ and $\mathcal{Y}=(Y_{ij})_{n\times n}$ be two binary symmetric adjacency matrices from two stochastic block models parametrized by $(g_{x},B_{x})$ and $(g_{y},B_{y})$, respectively. For any $i$, let $X_{ii}=Y_{ii}=0$, i.e., no self-loop exists. In this article, we assume that the node labels of the two networks are identical, though not necessarily the community labels of the nodes. Hence, the two networks $\mathcal{X}$ and $\mathcal{Y}$ can be viewed as repeated observations on the same individuals. Given the two sample networks $\mathcal{X}$ and $\mathcal{Y}$, the testing problem can be formulated as

H_{0}:(g_{x},B_{x})=(g_{y},B_{y})\qquad\mathrm{v.s.}\qquad H_{1}:(g_{x},B_{x})\neq(g_{y},B_{y}). (1)

Intuitively, there are four mutually exclusive scenarios for the alternative hypothesis: (i) $K_{x}\neq K_{y}$; (ii) $K_{x}=K_{y}$, $g_{x}=g_{y}$, but $B_{x}\neq B_{y}$; (iii) $K_{x}=K_{y}$, $B_{x}=B_{y}$, but $g_{x}\neq g_{y}$; and (iv) $K_{x}=K_{y}$, $g_{x}\neq g_{y}$, $B_{x}\neq B_{y}$.

Our testing procedure for (1) goes as follows. First, we obtain consistent estimators of $K_{x}$ and $K_{y}$, denoted by $\widehat{K}_{x}$ and $\widehat{K}_{y}$, respectively. If $\widehat{K}_{x}\neq\widehat{K}_{y}$, then $(g_{x},B_{x})=(g_{y},B_{y})$ cannot be true, so we directly reject the null hypothesis. If $\widehat{K}_{x}=\widehat{K}_{y}=\widehat{K}$, we obtain strongly consistent estimators $\widehat{g}_{x}$ and $\widehat{g}_{y}$ of the community membership labels $g_{x}$ and $g_{y}$. Second, we estimate the entries of $B_{x}$ and $B_{y}$ by the sample proportions of each community based on $(\mathcal{X},\widehat{g}_{x})$ and $(\mathcal{Y},\widehat{g}_{y})$. Third, we use $\widehat{B}_{y}$ ($\widehat{B}_{x}$) to center and rescale the adjacency matrix $\mathcal{X}$ ($\mathcal{Y}$), sum the resulting matrices under specific rules, and obtain a combined information matrix, see (4). Finally, we obtain the test statistic as the maximum of the elements of the combined information matrix. The basic principle of this method is that if the null hypothesis is true, the entries of the combined information matrix asymptotically follow the normal distribution. According to the results of Zhou (2007), the asymptotic distribution of the maximum of the elements after normalization is a Gumbel distribution. On the other hand, under the alternative hypothesis, the adjacency matrix is incorrectly centered and rescaled, and this deviation is magnified to a very large value by the normalization term. This implies that the test statistic can successfully separate $H_{0}$ and $H_{1}$ in (1).

The remainder of the article is organized as follows. In Section 2, we introduce the new test statistic and state its asymptotic null distribution and asymptotic power. In Section 3, we extend the proposed method to the degree-corrected stochastic block model. Simulation studies and real-world data examples are given in Sections 4 and 5, respectively. All technical proofs are postponed to the Appendix.

2 A NEW TWO-SAMPLE TEST FOR THE STOCHASTIC BLOCK MODEL

Consider a stochastic block model on $n$ nodes with community label $g$ and probability matrix $B$. Given the number of communities $K$ and a community label $g$, the maximum likelihood estimator of $B$ is given by

\widehat{B}_{uv}=\begin{dcases}\dfrac{\sum_{i\in\mathcal{N}_{u},j\in\mathcal{N}_{v}}X_{ij}}{n_{u}n_{v}},\quad&u\neq v,\\ \dfrac{\sum_{i,j\in\mathcal{N}_{u},i\neq j}X_{ij}}{n_{u}(n_{u}-1)},\quad&u=v,\end{dcases} (2)

where $\mathcal{N}_{u}=\{i:g(i)=u\}$ for $1\leq u\leq K$, $i\in\{1,\ldots,n\}$ stands for the label of a node, and $n_{u}=|\mathcal{N}_{u}|$. Hereafter, $\mathcal{N}_{u}^{x}=\{i:g_{x}(i)=u\}$ and $n_{u}^{x}=|\mathcal{N}^{x}_{u}|$, with $\mathcal{N}_{u}^{y}$ and $n_{u}^{y}$ defined analogously.
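As a concrete illustration, the block-probability estimator in equation (2) can be sketched as follows, assuming a symmetric 0/1 adjacency matrix with zero diagonal and integer community labels in $\{0,\ldots,K-1\}$; the function name and interface are ours, not the paper's.

```python
import numpy as np

def estimate_B(X, g, K):
    """Maximum likelihood estimator of B in (2), given community labels g."""
    B_hat = np.zeros((K, K))
    for u in range(K):
        Nu = np.where(g == u)[0]
        for v in range(u, K):
            Nv = np.where(g == v)[0]
            block = X[np.ix_(Nu, Nv)]
            if u != v:
                # off-diagonal block: average over n_u * n_v pairs
                B_hat[u, v] = B_hat[v, u] = block.sum() / (len(Nu) * len(Nv))
            else:
                # diagonal block: exclude self-pairs, n_u (n_u - 1) ordered pairs
                B_hat[u, u] = block.sum() / (len(Nu) * (len(Nu) - 1))
    return B_hat
```

Because $X$ is symmetric, summing the full diagonal block and dividing by $n_u(n_u-1)$ counts each edge twice in both numerator and denominator, matching the $u=v$ case of (2).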

For an adjacency matrix $A$, Hu et al. (2021) used a test statistic based on the maximum entry-wise deviation to test the following two hypotheses:

(1) $H_{0}:K=K_{0}\quad\mathrm{v.s.}\quad H_{1}:K>K_{0}$, and

(2) $H_{0}:g=g_{0}\quad\mathrm{v.s.}\quad H_{1}:g\neq g_{0}$,

where $K$ and $g$ denote the true number of communities and the true community label, respectively, and $K_{0}$ and $g_{0}$ denote a hypothetical number of communities and a hypothetical community label, respectively. Let

L_{n}(K_{0},g_{0}):=\max_{1\leq i\leq n,1\leq v\leq K_{0}}|\widehat{\rho}_{iv}|,

where $\widehat{\rho}_{iv}=\dfrac{1}{\sqrt{|g^{-1}_{0}(v)/\{i\}|}}\sum_{j\in g^{-1}_{0}(v)/\{i\}}\dfrac{A_{ij}-\widehat{B}_{g_{0}(i)g_{0}(j)}}{\sqrt{\widehat{B}_{g_{0}(i)g_{0}(j)}(1-\widehat{B}_{g_{0}(i)g_{0}(j)})}}$, $g_{0}^{-1}(v)=\{i:g_{0}(i)=v\}$, $|g_{0}^{-1}(v)|$ is the number of nodes in block $v$, $g_{0}^{-1}(v)/\{i\}$ denotes the set of nodes that belong to community $v$ under $g_{0}$, excluding node $i$, and $\widehat{B}_{g_{0}(i)g_{0}(j)}$ is as defined in (2). Under the null hypothesis $H_{0}:K=K_{0},g=g_{0}$, if $K=o(n/\log^{2}n)$, Hu et al. (2021) showed that

\lim_{n\rightarrow\infty}\mathbb{P}\{L_{n}^{2}(K_{0},g_{0})-2\log(2K_{0}n)+\log\log(2K_{0}n)\leq y\}=\exp\left\{-\dfrac{1}{2\sqrt{\pi}}e^{-y/2}\right\},

and proposed the following test statistic:

T_{n}=L_{n}^{2}(K_{0},g_{0})-2\log(2K_{0}n)+\log\log(2K_{0}n).

Then the corresponding level-$\alpha$ rejection rule is

\mathrm{Reject}\ H_{0}:K=K_{0},g=g_{0}\ \mathrm{if}\ T_{n}\geq Q_{1-\alpha},

where $Q_{1-\alpha}$ is the $(1-\alpha)$th quantile of the Gumbel distribution with location $\mu=-2\log(2\sqrt{\pi})$ and scale $\beta=2$.
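A minimal sketch of this rejection rule: inverting the Gumbel CDF $F(y)=\exp\{-e^{-(y-\mu)/\beta}\}$ gives $Q_{1-\alpha}=\mu-\beta\log(-\log(1-\alpha))$. The function names are illustrative.

```python
import math

MU = -2.0 * math.log(2.0 * math.sqrt(math.pi))  # location mu = -2 log(2 sqrt(pi))
BETA = 2.0                                       # scale beta = 2

def gumbel_quantile(p, mu=MU, beta=BETA):
    # Invert the Gumbel CDF F(y) = exp(-exp(-(y - mu) / beta)).
    return mu - beta * math.log(-math.log(p))

def reject(T_n, alpha=0.05):
    """Level-alpha rule: reject H0 when T_n >= Q_{1-alpha}."""
    return T_n >= gumbel_quantile(1.0 - alpha)
```

For $\alpha=0.05$ the critical value is $Q_{0.95}=\mu-2\log(-\log 0.95)\approx 3.41$.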

In this article, we aim to develop a new test statistic that can be used to test two samples. For two samples $\mathcal{X}$ and $\mathcal{Y}$ from stochastic block models $(g_{x},B_{x})$ and $(g_{y},B_{y})$, respectively, let $\widehat{K}_{x}$ and $\widehat{K}_{y}$ be obtained by some estimation method, such as the recursive approach in Zhao et al. (2011), the sequential testing method in Lei (2016), the likelihood-based method in Saldaña et al. (2017), the corrected Bayesian information criterion in Hu et al. (2020), the test based on maximum entry-wise deviation in Hu et al. (2021), or the spectral methods in Le & Levina (2022). In this article, we use the corrected Bayesian information criterion in Hu et al. (2020) to obtain consistent estimators $\widehat{K}_{x}$ and $\widehat{K}_{y}$. Recall that, given the numbers of communities $K_{x}$ and $K_{y}$, the community labels $g_{x}$ and $g_{y}$ can be consistently estimated, with estimators denoted by $\widehat{g}_{x}$ and $\widehat{g}_{y}$, by existing strongly consistent community detection procedures, e.g., the majority voting algorithm in Gao et al. (2017) and the profile-pseudo likelihood method in Wang et al. (2021). In view of this, throughout the paper, we formally assume that

\mathbb{P}\left\{\widehat{g}_{x}=g_{x},\widehat{g}_{y}=g_{y},\widehat{K}_{x}=K_{x},\widehat{K}_{y}=K_{y}\right\}\rightarrow 1, (3)

which requires that all estimators be strongly consistent. Hence, we can obtain $\widehat{B}_{x}$ and $\widehat{B}_{y}$ from equation (2). Note that the community membership label $\widehat{g}_{x}$ and the probability matrix $\widehat{B}_{x}$ depend on the number of communities $\widehat{K}_{x}$. If $\widehat{g}_{x}$ is a strongly consistent estimator, then $\widehat{K}_{x}$ must be a consistent estimator. Hence, Condition (3) can be relaxed to $\mathbb{P}\{\widehat{g}_{x}=g_{x},\widehat{g}_{y}=g_{y}\}\rightarrow 1$. In fact, if $\widehat{K}_{x}\neq\widehat{K}_{y}$, then naturally $\widehat{g}_{x}\neq\widehat{g}_{y}$ and $\widehat{B}_{x}\neq\widehat{B}_{y}$; since $\widehat{g}_{x}$ and $\widehat{g}_{y}$ are consistent estimators, we can reject the null hypothesis with probability tending to 1. Meanwhile, case (iv) suffers from an identifiability problem: because both $g$ and $B$ are allowed to vary, existing methods cannot reasonably resolve the identifiability issue. Thus, in this article, we mainly focus on cases (ii) and (iii) among the four alternative scenarios, and we assume $K_{x}=K_{y}=K$. Without loss of generality, we write $\widehat{K}=\widehat{K}_{x}=\widehat{K}_{y}$ when $\widehat{K}_{x}=\widehat{K}_{y}$. Next, we formally state the test statistic. As mentioned in the introduction, our test statistic is motivated by the contrast of $\mathcal{X}$ (or $\mathcal{Y}$) with $\widehat{B}_{y}$ (or $\widehat{B}_{x}$), i.e.,

\tilde{\rho}_{iv}(g_{x},g_{y})=\dfrac{1}{\sqrt{\left|g_{x}^{-1}(v)/\{i\}\right|+\left|g_{y}^{-1}(v)/\{i\}\right|}}\times\left[\sum\limits_{j\in g_{y}^{-1}(v)/\{i\}}\dfrac{X_{ij}-\widehat{B}^{y}_{g_{y}(i)g_{y}(j)}}{\sqrt{\widehat{B}^{y}_{g_{y}(i)g_{y}(j)}\left(1-\widehat{B}^{y}_{g_{y}(i)g_{y}(j)}\right)}}+\sum\limits_{j\in g_{x}^{-1}(v)/\{i\}}\dfrac{Y_{ij}-\widehat{B}^{x}_{g_{x}(i)g_{x}(j)}}{\sqrt{\widehat{B}^{x}_{g_{x}(i)g_{x}(j)}\left(1-\widehat{B}^{x}_{g_{x}(i)g_{x}(j)}\right)}}\right]. (4)

Moreover, based on the maximum entry-wise deviation, the proposed test statistic has the following form:

L_{n}(g_{x},g_{y})=\max_{1\leq i\leq n,1\leq v\leq K}\left|\tilde{\rho}_{iv}(g_{x},g_{y})\right|.
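To make the construction concrete, the following sketch computes $\tilde{\rho}_{iv}(g_x,g_y)$ in (4) and its maximum $L_n(g_x,g_y)$ directly from the definitions; the function name and the assumption of zero-based integer labels are ours, not the paper's.

```python
import numpy as np

def two_sample_stat(X, Y, gx, gy, Bx, By, K):
    """Maximum entry-wise deviation L_n(gx, gy) from the combined matrix (4)."""
    n = X.shape[0]
    L = 0.0
    for i in range(n):
        for v in range(K):
            Jy = [j for j in range(n) if gy[j] == v and j != i]
            Jx = [j for j in range(n) if gx[j] == v and j != i]
            # contrast X against the model estimated from Y, and vice versa
            sx = sum((X[i, j] - By[gy[i], gy[j]])
                     / np.sqrt(By[gy[i], gy[j]] * (1 - By[gy[i], gy[j]]))
                     for j in Jy)
            sy = sum((Y[i, j] - Bx[gx[i], gx[j]])
                     / np.sqrt(Bx[gx[i], gx[j]] * (1 - Bx[gx[i], gx[j]]))
                     for j in Jx)
            rho = (sx + sy) / np.sqrt(len(Jx) + len(Jy))
            L = max(L, abs(rho))
    return L
```

The returned value is then plugged into $T_n = L_n^2 - 2\log(2Kn) + \log\log(2Kn)$ before comparison with the Gumbel quantile.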

2.1 The Asymptotic Null Distribution

To obtain the asymptotic result for the statistic $L_{n}(g_{x},g_{y})$, we first make the following assumptions:

Assumption 1.

The entries of $B_{x}$ and $B_{y}$ are uniformly bounded away from 0 and 1, and neither $B_{x}$ nor $B_{y}$ has two identical rows.

Assumption 2.

There exist constants $C_{1}$, $C_{2}$, $C_{3}$, and $C_{4}$ such that

C_{1}n/K_{x}\leq\min_{1\leq u\leq K_{x}}n_{u}^{x}\leq\max_{1\leq u\leq K_{x}}n_{u}^{x}\leq C_{2}n^{2}/(K_{x}^{2}\log^{2}n),

and

C_{3}n/K_{y}\leq\min_{1\leq u\leq K_{y}}n_{u}^{y}\leq\max_{1\leq u\leq K_{y}}n_{u}^{y}\leq C_{4}n^{2}/(K_{y}^{2}\log^{2}n),

for all nn.

Assumption 1 requires that the entries of the probability matrices $B_{x}$ and $B_{y}$ be uniformly bounded away from 0 and 1. This is similar to the corresponding condition in Lei (2016) and Hu et al. (2021). Assumption 1 also requires that $B_{x}$ and $B_{y}$ be identifiable, which is a basic condition (Wang & Bickel, 2017). Assumption 2 not only requires a lower bound on the number of nodes in the smallest community, but also gives an upper bound on the number of nodes in the largest community. The lower bound requires that the number of nodes in the smallest community of $\mathcal{X}$ ($\mathcal{Y}$) be at least proportional to $n/K_{x}$ ($n/K_{y}$). This is a mild assumption and easy to satisfy. For example, the lower bound holds when the community label $g_{x}$ is generated from a multinomial distribution with $n$ trials and parameter $\bm{\pi}=(\pi_{1},\cdots,\pi_{K})$ such that $\min_{u}\pi_{u}\geq C_{1}/K_{x}$. For the upper bound, Zhang & Zhou (2016) and Gao et al. (2017) considered a similar condition, which is used to control the maximum within-group deviation between $B_{x}$ ($B_{y}$) and its estimator $\widehat{B}_{x}$ ($\widehat{B}_{y}$).
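The multinomial example above can be sketched as follows: with balanced probabilities $\pi_u = 1/K$, the condition $\min_u \pi_u \geq C_1/K$ holds with $C_1 = 1$, and each community size concentrates around $n/K$. The variable names and seed are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 600, 3
pi = np.full(K, 1.0 / K)            # min_u pi_u = 1/K, i.e. C_1 = 1
g = rng.choice(K, size=n, p=pi)     # one multinomial label per node
sizes = np.bincount(g, minlength=K) # community sizes, each of order n/K
```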

We now give the asymptotic property of the test statistic $L_{n}(g_{x},g_{y})$ and defer the proof to the Appendix.

Theorem 1.

Suppose that Assumptions 1 and 2 hold. Then under the null hypothesis $H_{0}:(g_{x},B_{x})=(g_{y},B_{y})$, as $n\rightarrow\infty$, if $K=o(n/\log^{2}n)$, we have

\mathbb{P}\left\{L_{n}^{2}(g_{x},g_{y})-2\log(2Kn)+\log\log(2Kn)\leq y\right\}\rightarrow\exp\left\{-\dfrac{1}{2\sqrt{\pi}}e^{-y/2}\right\}, (5)

where the right-hand side of (5) is the cumulative distribution function of the Gumbel distribution with location $\mu=-2\log(2\sqrt{\pi})$ and scale $\beta=2$.

Note that Theorem 1 shows that $L_{n}^{2}(g_{x},g_{y})-2\log(2Kn)+\log\log(2Kn)$ converges in distribution to a Gumbel distribution under $H_{0}$.

Using Theorem 1, we can implement hypothesis test (1) as follows. First, we estimate the numbers of communities $\widehat{K}_{x}$ and $\widehat{K}_{y}$, and then the pairs $(\widehat{g}_{x},\widehat{B}_{x})$ and $(\widehat{g}_{y},\widehat{B}_{y})$. Then, we compute the statistic

T_{n}=L_{n}^{2}(\widehat{g}_{x},\widehat{g}_{y})-2\log(2\widehat{K}n)+\log\log(2\widehat{K}n).

Since $\widehat{g}_{x}$ and $\widehat{g}_{y}$ are strongly consistent estimators, $T_{n}$ asymptotically follows the Gumbel distribution with location $\mu=-2\log(2\sqrt{\pi})$ and scale $\beta=2$. To carry out the hypothesis test, we use the rejection rule

\mathrm{Reject}\ \ H_{0},\quad T_{n}\geq Q_{1-\alpha},

where $Q_{1-\alpha}$ is the $(1-\alpha)$th quantile of the Gumbel distribution with location $\mu=-2\log(2\sqrt{\pi})$ and scale $\beta=2$. In Section 4, we investigate the finite-sample performance of the proposed test statistic through simulation studies.

In general, the null distribution converges to the Gumbel distribution slowly, which is also confirmed in our simulation studies. Bickel & Sarkar (2015) considered a bootstrap method for finite-sample correction of the distribution, which was also used by Lei (2016) and Hu et al. (2021). For our statistic, the bootstrap corrected test statistic is calculated as follows:

1. Use a consistent estimation method to estimate $\widehat{K}_{x}=\widehat{K}_{y}=\widehat{K}$, then obtain the estimates $(\widehat{g}_{x},\widehat{B}_{x})$ and $(\widehat{g}_{y},\widehat{B}_{y})$ by a strongly consistent clustering method;

2. For $m=1,\ldots,M$, generate $\mathcal{X}^{(m)}$ and $\mathcal{Y}^{(m)}$ from the edge-probability matrix $\widehat{P}_{ij}=(\widehat{P}_{ij}^{x}+\widehat{P}_{ij}^{y})/2=(\widehat{B}_{\widehat{g}_{x}(i)\widehat{g}_{x}(j)}^{x}+\widehat{B}_{\widehat{g}_{y}(i)\widehat{g}_{y}(j)}^{y})/2$, and calculate $T_{n}^{(m)}$ based on $\mathcal{X}^{(m)}$ and $\mathcal{Y}^{(m)}$;

3. Use $\{T_{n}^{(m)}:m=1,\ldots,M\}$ to estimate the location and scale parameters $\widehat{\mu}$ and $\widehat{\beta}$ of the Gumbel distribution by maximum likelihood;

4. Calculate the bootstrap corrected test statistic as

T_{n}^{boot}=\mu+\beta\left(\dfrac{T_{n}-\widehat{\mu}}{\widehat{\beta}}\right),

where $\mu=-2\log(2\sqrt{\pi})$ and $\beta=2$.
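The bootstrap correction can be sketched as below, assuming a user-supplied `test_statistic(X, Y)` that returns $T_n^{(m)}$ for a bootstrap pair; `scipy.stats.gumbel_r.fit` provides the maximum likelihood estimates of the location and scale. All names are illustrative, not from the paper.

```python
import numpy as np
from scipy.stats import gumbel_r

def bootstrap_correct(T_n, P_hat, M, test_statistic, rng):
    """Map T_n onto the limiting Gumbel(mu, beta) scale via M bootstrap draws."""
    mu, beta = -2.0 * np.log(2.0 * np.sqrt(np.pi)), 2.0
    n = P_hat.shape[0]
    T_boot = []
    for _ in range(M):
        # generate symmetric adjacency matrices X^(m), Y^(m) from P_hat
        U = rng.random((n, n))
        X = np.triu((U < P_hat).astype(int), k=1)
        X = X + X.T
        U = rng.random((n, n))
        Y = np.triu((U < P_hat).astype(int), k=1)
        Y = Y + Y.T
        T_boot.append(test_statistic(X, Y))
    mu_hat, beta_hat = gumbel_r.fit(T_boot)  # MLE of location and scale
    return mu + beta * (T_n - mu_hat) / beta_hat
```

In practice `test_statistic` would re-estimate the labels on each bootstrap pair and return $T_n^{(m)}$ as defined above.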

2.2 The Asymptotic Power

In this subsection, we investigate the asymptotic power of the proposed test procedure. To guarantee good testing power, we need the following theoretical assumption.

Assumption 3.

The maximum grouped difference between $B_{x}$ and $B_{y}$ satisfies

\max_{i,v}\left|\dfrac{1}{\sqrt{|g_{x}^{-1}(v)|}}\sum_{j\in g_{x}^{-1}(v)}(B_{g_{y}(i)g_{y}(j)}^{y}-B_{g_{x}(i)g_{x}(j)}^{x})\right|/\sqrt{\log n}\rightarrow\infty,

and

\max_{i,v}\left|\dfrac{1}{\sqrt{|g_{y}^{-1}(v)|}}\sum_{j\in g_{y}^{-1}(v)}(B_{g_{x}(i)g_{x}(j)}^{x}-B_{g_{y}(i)g_{y}(j)}^{y})\right|/\sqrt{\log n}\rightarrow\infty.

Assumption 3 specifies that, under the alternative $H_{1}$, the maximum grouped difference between $B_{x}$ and $B_{y}$ diverges faster than $\sqrt{\log n}$. This assumption is analogous to (A2) in Hu et al. (2021). The lower bound on the growth rate of the test statistic $T_{n}$ under the alternative hypothesis $H_{1}:(g_{x},B_{x})\neq(g_{y},B_{y})$ is given in the following theorem.

Theorem 2.

Suppose that Assumptions 1, 2, and 3 hold. Then under the alternative hypothesis $H_{1}:(g_{x},B_{x})\neq(g_{y},B_{y})$, as $n\rightarrow\infty$, if $K=o(n/\log^{2}n)$, we have

L_{n}^{2}(g_{x},g_{y})\geq 2\log(2Kn)-\log\log(2Kn)+c_{1}\log n+O_{p}(1),

for some constant $c_{1}>0$.

The proof is collected in the Appendix. Theorem 2 shows that $T_{n}$ diverges at rate at least $\log n$ as $n\rightarrow\infty$ under the alternative hypothesis $H_{1}$. We then have the following corollary, which states that the asymptotic size is close to the nominal level and the asymptotic power tends to 1.

Corollary 1.

Suppose that Assumptions 1 and 2 hold. Given a nominal level $\alpha$, we have

\mathbb{P}_{H_{0}}\{T_{n}>Q_{1-\alpha}\}\rightarrow\alpha,

and, if in addition Assumption 3 holds, we have

\mathbb{P}_{H_{1}}\{T_{n}>Q_{1-\alpha}\}\rightarrow 1.

Therefore, the asymptotic null distribution in Theorem 1 and the growth rate in Theorem 2 show that the null hypothesis and the alternative hypothesis are well separated, and our proposed test is asymptotically powerful against the alternative hypothesis $H_{1}:(g_{x},B_{x})\neq(g_{y},B_{y})$.

3 EXTENSION

In this section, we extend the proposed method to the degree-corrected stochastic block model. In reality, a network may contain both high-degree and low-degree nodes, that is, it exhibits degree heterogeneity. To address this issue, Karrer & Newman (2011) proposed the degree-corrected stochastic block model. Specifically, for an undirected network $\mathcal{X}$, this model assumes that the edge $X_{ij}$ satisfies $\mathbb{P}\{X_{ij}=1\}=\theta_{i}\theta_{j}B_{g(i)g(j)}$, where $\theta_{i}$ is the degree parameter of node $i$. To ensure the identifiability of the model, we assume that $\sum_{i}\theta_{i}/n=1$. For two sample networks $\mathcal{X}$ and $\mathcal{Y}$, the testing problem takes the following form:

H^{\prime}_{0}:(g_{x},B_{x},\theta_{x})=(g_{y},B_{y},\theta_{y})\qquad\mathrm{v.s.}\qquad H^{\prime}_{1}:(g_{x},B_{x},\theta_{x})\neq(g_{y},B_{y},\theta_{y}), (6)

where $\theta_{x}$ and $\theta_{y}$ are the node degree parameters of networks $\mathcal{X}$ and $\mathcal{Y}$, respectively.

Similar to the general stochastic block model, for the two-sample test of the degree-corrected stochastic block model, we consider the following test statistic:

L_{D}(g_{x},g_{y})=\max_{i,v}|\tau_{iv}|,

where

\tau_{iv}=\dfrac{1}{\sqrt{\left|g_{x}^{-1}(v)/\{i\}\right|+\left|g_{y}^{-1}(v)/\{i\}\right|}}\times\left[\sum\limits_{j\in g_{y}^{-1}(v)/\{i\}}\dfrac{X_{ij}-\widehat{\theta}^{y}_{i}\widehat{\theta}^{y}_{j}\widehat{B}^{y}_{g_{y}(i)g_{y}(j)}}{\sqrt{\widehat{\theta}^{y}_{i}\widehat{\theta}^{y}_{j}\widehat{B}^{y}_{g_{y}(i)g_{y}(j)}\left(1-\widehat{\theta}^{y}_{i}\widehat{\theta}^{y}_{j}\widehat{B}^{y}_{g_{y}(i)g_{y}(j)}\right)}}+\sum\limits_{j\in g_{x}^{-1}(v)/\{i\}}\dfrac{Y_{ij}-\widehat{\theta}^{x}_{i}\widehat{\theta}^{x}_{j}\widehat{B}^{x}_{g_{x}(i)g_{x}(j)}}{\sqrt{\widehat{\theta}^{x}_{i}\widehat{\theta}^{x}_{j}\widehat{B}^{x}_{g_{x}(i)g_{x}(j)}\left(1-\widehat{\theta}^{x}_{i}\widehat{\theta}^{x}_{j}\widehat{B}^{x}_{g_{x}(i)g_{x}(j)}\right)}}\right], (7)

where $\widehat{\theta}^{x}_{i}=|g_{x}^{-1}(g_{x}(i))|d_{i}/\sum_{j\in\{j:g_{x}(j)=g_{x}(i)\}}d_{j}$ is the maximum likelihood estimator of $\theta^{x}_{i}$, $d_{i}=\sum_{j}X_{ij}$, and $\widehat{\theta}^{y}_{i}$ is defined analogously. Note that it is difficult to obtain the asymptotic null distribution of $L_{D}(g_{x},g_{y})$ because of the complex dependence among the entries $\tau_{iv}$. In our simulation studies, we find that the empirical distribution of $T_{D}=L_{D}^{2}(\widehat{g}_{x},\widehat{g}_{y})-2\log(2\widehat{K}n)+\log\log(2\widehat{K}n)$ is also well approximated by the Gumbel distribution with location $\mu=-2\log(2\sqrt{\pi})$ and scale $\beta=2$. We then use the rejection rule

\mathrm{Reject}\ \ H^{\prime}_{0},\quad T_{D}\geq Q_{1-\alpha},

where $Q_{1-\alpha}$ is the $(1-\alpha)$th quantile of the Gumbel distribution with location $\mu=-2\log(2\sqrt{\pi})$ and scale $\beta=2$. As in the general stochastic block model, the bootstrap corrected statistic can be obtained by the identical method.
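The degree-parameter estimator above can be sketched as follows: $\widehat{\theta}_i$ rescales node $i$'s degree by its community average, so the estimates average to 1 within each community. The function name is illustrative.

```python
import numpy as np

def estimate_theta(X, g):
    """MLE of the degree parameters: theta_i = |community| * d_i / sum of degrees."""
    d = X.sum(axis=1)                      # node degrees d_i
    theta = np.empty_like(d, dtype=float)
    for u in np.unique(g):
        members = (g == u)
        theta[members] = members.sum() * d[members] / d[members].sum()
    return theta
```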

4 SIMULATION

In this section, we evaluate the performance of the proposed test statistics in various simulation studies. First, the number of communities is estimated by the corrected Bayesian information criterion proposed by Hu et al. (2020), and the community membership label is estimated by the strongly consistent estimation methods mentioned above; that is, the profile-pseudo likelihood method in Wang et al. (2021), initialized by spectral clustering with permutations, is used to obtain the community membership label. In the simulations, we consider the test statistic $T_{n}=L_{n}^{2}(\widehat{g}_{x},\widehat{g}_{y})-2\log(2\widehat{K}n)+\log\log(2\widehat{K}n)$ and the bootstrap corrected test statistic $T_{n}^{boot}$. In the comparative experiments, we use the maximum deviation method (referred to as TST-MD) proposed by Chen et al. (2021a) and the spectral-based method (referred to as TST-S) proposed by Chen et al. (2021b).

4.1 Simulation 1: The Null Distribution Under the Stochastic Block Model

In the first simulation, we consider the finite-sample null distribution of the test statistic T_n and empirically verify Theorem 1 for two samples from a common stochastic block model. Since the limiting distribution of the test statistic is proven to be a Gumbel distribution, the location and scale parameters can be estimated from a modest number of bootstrap samples to correct the test statistic at low cost. In all of our simulations, we use M = 100 bootstrap samples.
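The correction idea can be sketched as follows: draw M statistics under the fitted null model, then map the observed statistic onto the limiting Gumbel law. The moment-matching step below (using the Gumbel mean μ + γβ and variance π²β²/6) is our own illustrative choice, not necessarily the authors' exact procedure:

```python
import math
import numpy as np

EULER_GAMMA = 0.5772156649015329
MU0 = -2 * math.log(2 * math.sqrt(math.pi))  # target Gumbel location
BETA0 = 2.0                                  # target Gumbel scale

def bootstrap_correct(T_obs, T_boot):
    """Map the observed statistic onto the limiting Gumbel(MU0, BETA0) law
    by moment-matching the M bootstrap statistics T_boot:
    Gumbel mean = mu + gamma * beta, variance = pi^2 * beta^2 / 6."""
    T_boot = np.asarray(T_boot, dtype=float)
    beta_hat = math.sqrt(6.0) * T_boot.std() / math.pi
    mu_hat = T_boot.mean() - EULER_GAMMA * beta_hat
    return MU0 + BETA0 * (np.asarray(T_obs, dtype=float) - mu_hat) / beta_hat
```

By construction, the corrected bootstrap statistics have exactly the mean and standard deviation of the target Gumbel law.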

We generate data from the stochastic block model with B_uv^x = B_uv^y = 0.1(1 + 4×𝟙(u=v)). In this setting, we set K_x = K_y = 3 with π_1 = π_2 = π_3 = 1/3. The sample size is either n = 600 or n = 1200. We use a strongly consistent method, e.g., the profile-pseudo likelihood method of Wang et al. (2022), to obtain ĝ_x and ĝ_y, and then B̂_x and B̂_y. In Figure 1, based on 1000 data replications, we plot the sample distribution of the test statistic T_n with and without bootstrap correction. Figure 1 demonstrates that the empirical probability density function of T_n is biased upward, but the bias decreases as the sample size increases. Compared with the distribution of T_n, the distribution of T_n^boot fits the true distribution better.
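The data-generating step can be sketched as follows; this is our own minimal implementation (function and variable names are illustrative, not code from the paper):

```python
import numpy as np

def sample_sbm(g, B, rng):
    """Symmetric binary adjacency matrix with A_ij ~ Bernoulli(B[g(i), g(j)])
    independently for i > j, and A_ii = 0."""
    n = len(g)
    P = B[np.ix_(g, g)]                    # edge probabilities P_ij = B_{g(i)g(j)}
    upper = np.triu(rng.random((n, n)) < P, k=1)
    A = upper.astype(int)
    return A + A.T

rng = np.random.default_rng(0)
K, n = 3, 600
B = 0.1 * (1 + 4 * np.eye(K))              # B_uv = 0.1(1 + 4 * 1(u=v))
g = np.repeat(np.arange(K), n // K)        # balanced blocks, pi_k = 1/3
A = sample_sbm(g, B, rng)
```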

Figure 1: Null densities under the stochastic block model in Simulation 1 with n=600n=600 (left plot) and n=1200n=1200 (right plot). The red dashed lines, blue dash-dotted lines, and black solid lines show the densities of the test statistic TnT_{n}, the bootstrap corrected test statistic TnbootT_{n}^{boot}, and the theoretical limit, respectively.

4.2 Simulation 2: Empirical Size for Hypothesis (1)

In this subsection, we study the empirical size under varying K_x, K_y, B_x and B_y. We set the edge probability between communities u and v as B_uv^x = B_uv^y = r(1 + 3×𝟙(u=v)), where r measures the sparsity of the network. Let K_x, K_y ∈ {2, 3, 4, 5, 6, 10, 15, 20, 30}, r ∈ {0.05, 0.1, 0.2}, and the size of each block be 200. Table 1 reports the results from 200 data replications. From Table 1, the Type I errors of T_n are close to the nominal level when K is small, while those of T_n^boot are close to the nominal level even when K is large. Comparing the methods, T_n^boot and TST-S outperform TST-MD: in some cases TST-MD does not work well, whereas the empirical sizes of T_n^boot and TST-S remain close to the nominal level. It is worth noting that as the sample size increases, the empirical size of T_n gradually increases, which seems contrary to the asymptotic theory. In fact, as the sample size increases, the number of communities also increases, and the commonly used community label estimation algorithms then cannot exactly recover the community labels. With the bootstrap correction, however, T_n^boot has good empirical size, which implies that the corrected statistic works well. We also note that when the network is sparse, the test statistic T_n does not have good empirical size and tends to be slightly oversized.

Table 1: Empirical size at nominal level α=0.05\alpha=0.05 for hypothesis test H0:(gx,Bx)=(gy,By)v.s.H1:(gx,Bx)(gy,By)H_{0}:(g_{x},B_{x})=(g_{y},B_{y})\ \mathrm{v.s.}\ H_{1}:(g_{x},B_{x})\neq(g_{y},B_{y}). In the results of TnbootT_{n}^{boot}, the values in the parentheses are the empirical size of TST-MD (left) and TST-S (right), respectively.
TnT_{n} TnbootT_{n}^{boot}
r=0.05r=0.05 r=0.1r=0.1 r=0.2r=0.2 r=0.05r=0.05 r=0.1r=0.1 r=0.2r=0.2
K=2K=2 0.04 0.09 0.03 0.04 (0.09,0.03) 0.08 (0.1,0.02) 0.05 (0.01,0.05)
K=3K=3 0.09 0.06 0.07 0.05 (0.085,0.04) 0.05 (0.04,0.03) 0.04 (0.01,0.03)
K=4K=4 0.09 0.08 0.06 0.04 (0.01,0.04) 0.05 (0,0.03) 0.05 (0,0.04)
K=5K=5 0.09 0.09 0.05 0.04 (0,0.04) 0.06 (0,0.04) 0.06 (0,0.03)
K=6K=6 0.08 0.09 0.04 0.04 (0.005,0.05) 0.04 (0,0.05) 0.04 (0,0.03)
K=10K=10 0.20 0.10 0.06 0.06 (0,0.04) 0.05 (0,0.04) 0.04 (0,0.06)
K=15K=15 0.46 0.16 0.09 0.08 (0.01,0.04) 0.05 (0,0.05) 0.05 (0,0.04)
K=20K=20 0.77 0.14 0.10 0.10 (0.03,0.04) 0.02 (0,0.05) 0.08 (0,0.05)
K=30K=30 0.99 0.23 0.13 0.05 (0.04,0.05) 0.02 (0.005,0.06) 0.04 (0,0.05)

4.3 Simulation 3: Empirical Power for Hypothesis (1)

In this subsection, we investigate the empirical power under hypothesis test (1). We consider two kinds of alternatives: i) B_x ≠ B_y but g_x = g_y, and ii) g_x ≠ g_y but B_x = B_y. Similar to Simulation 2, let K_x, K_y ∈ {2, 3, 4, 5, 6, 10, 15, 20, 30}. The sample size is either n = 600 or n = 1200. For the first alternative, we generate 𝒳 with edge probability 3.5r within communities and 0.5r between communities, and 𝒴 with edge probability 8r within communities and 3r between communities, for r ∈ {0.01, 0.05, 0.1}. For the second alternative, we generate 𝒳 and 𝒴 with edge probability 3r within communities and r between communities for r ∈ {0.01, 0.05, 0.1}. We set g_x = (1, …, 1, …, K_x, …, K_x) and draw g_y from the multinomial distribution with n trials and probability 𝝅 = (1/K_y, …, 1/K_y). Again, all simulations are run 200 times. For both alternatives, Tables 2-3 report the empirical power for hypothesis test (1). We observe that the proposed test statistics and TST-MD can successfully detect all alternative hypotheses. Under the second alternative, however, the empirical power of TST-S is less than 1. As discussed in the introduction, if the change in community structure is small, there is little difference between the two edge-probability matrices; as a result, TST-S cannot separate the null and alternative hypotheses well.

Table 2: Empirical power of TnT_{n} at nominal level α=0.05\alpha=0.05 for the first alternative. The values in the parentheses are the empirical power of TnbootT_{n}^{boot}(left), TST-MD (middle), and TST-S (right), respectively.
n=600n=600 n=1200n=1200
r=0.01r=0.01 r=0.05r=0.05 r=0.1r=0.1 r=0.01r=0.01 r=0.05r=0.05 r=0.1r=0.1
K=2K=2 1 (1,1,1) 1 (1,1,1) 1 (1,1,1) 1 (1,1,1) 1 (1,1,1) 1 (1,1,1)
K=3K=3 1 (1,1,1) 1 (1,1,1) 1 (1,1,1) 1 (1,1,1) 1 (1,1,1) 1 (1,1,1)
K=4K=4 1 (1,1,1) 1 (1,1,1) 1 (1,1,1) 1 (1,1,1) 1 (1,1,1) 1 (1,1,1)
K=5K=5 1 (1,1,1) 1 (1,1,1) 1 (1,1,1) 1 (1,1,1) 1 (1,1,1) 1 (1,1,1)
K=6K=6 1 (1,1,1) 1 (1,1,1) 1 (1,1,1) 1 (1,1,1) 1 (1,1,1) 1 (1,1,1)
K=10K=10 1 (1,1,1) 1 (1,1,1) 1 (1,1,1) 1 (1,1,1) 1 (1,1,1) 1 (1,1,1)
K=15K=15 1 (1,1,1) 1 (1,1,1) 1 (1,1,1) 1 (1,1,1) 1 (1,1,1) 1 (1,1,1)
K=20K=20 1 (1,1,1) 1 (1,1,1) 1 (1,1,1) 1 (1,1,1) 1 (1,1,1) 1 (1,1,1)
K=30K=30 1 (1,1,1) 1 (1,1,1) 1 (1,1,1) 1 (1,1,1) 1 (1,1,1) 1 (1,1,1)
Table 3: Empirical power of TnT_{n} at nominal level α=0.05\alpha=0.05 for the second alternative. The values in the parentheses are the empirical power of TnbootT_{n}^{boot}(left), TST-MD (middle), and TST-S (right), respectively.
n=600n=600 n=1200n=1200
r=0.01r=0.01 r=0.05r=0.05 r=0.1r=0.1 r=0.01r=0.01 r=0.05r=0.05 r=0.1r=0.1
K=2K=2 1 (1,1,0.04) 1 (1,1,0.17) 1 (1,1,0.54) 1 (1,1,0.04) 1 (1,1,0.27) 1 (1,1,0.57)
K=3K=3 1 (1,1,0.06) 1 (1,1,0.21) 1 (1,1,0.51) 1 (1,1,0.02) 1 (1,1,0.39) 1 (1,1,0.55)
K=4K=4 1 (1,1,0.045) 1 (1,1,0.16) 1 (1,1,0.45) 1 (1,1,0.09) 1 (1,1,0.35) 1 (1,1,0.63)
K=5K=5 1 (1,1,0.09) 1 (1,1,0.13) 1 (1,1,0.32) 1 (1,1,0.07) 1 (1,1,0.33) 1 (1,1,0.58)
K=6K=6 1 (1,1,0.06) 1 (1,1,0.08) 1 (1,1,0.19) 1 (1,1,0.05) 1 (1,1,0.18) 1 (1,1,0.5)
K=10K=10 1 (1,0.91,0.09) 1 (1,1,0.06) 1 (1,1,0.09) 1 (1,1,0.08) 1 (1,1,0.07) 1 (1,1,0.09)
K=15K=15 1 (1,1,0.07) 1 (1,0.65,0.07) 1 (1,0.99,0.09) 1 (1,1,0.08) 1 (1,1,0.07) 1 (1,1,0.1)
K=20K=20 1 (1,1,0.12) 1 (1,0.82,0.06) 1 (1,0.79,0.06) 1 (1,1,0.09) 1 (1,1,0.07) 1 (1,1,0.1)
K=30K=30 1 (1,1,0.1) 1 (1,0.95,0.09) 1 (1,0.75,0.07) 1 (1,1,0.73) 1 (1,1,0.06) 1 (1,1,0.09)

4.4 Simulation 4: The Null Distribution Under the Degree-Corrected Stochastic Block Model

In this subsection, we investigate the asymptotic null distribution of the proposed statistic T_D. Similar to Simulation 1, we also consider the bootstrap-corrected statistic; the correction is analogous to that for the standard stochastic block model and is given in the Appendix.

Following Zhao et al. (2012), we generate the degree-correction parameters θ as follows:

θi={ξi,w.p.0.8,9/11,w.p.0.1,13/11,w.p.0.1,\theta_{i}=\begin{dcases}\xi_{i},&\mathrm{w.p.}0.8,\\ 9/11,&\mathrm{w.p.}0.1,\\ 13/11,&\mathrm{w.p.}0.1,\end{dcases}

where ξ_i is a uniform random variable on the interval [4/5, 6/5]. Similar to Simulation 1, we set K_x = K_y = 3 with π_1 = π_2 = π_3 = 1/3 and B_uv^x = B_uv^y = 0.1(1 + 4×𝟙(u=v)). We also consider sample sizes n = 600 and 1200. Based on 1000 data replications, we report the results in Figure 2. Figure 2 shows that the empirical distribution of the test statistic T_D matches the Gumbel distribution only up to a location and scale shift. As the sample size increases, the deviation between the empirical distribution of T_D and the limiting distribution does not decrease. After the bootstrap correction, the empirical distribution of T_D^boot is close to the limiting distribution. This also implies that the empirical size of the test statistic T_D is smaller than the nominal level.
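The θ mixture above can be sketched in a few lines; this helper is our own illustration, not code from the paper:

```python
import numpy as np

def sample_theta(n, rng):
    """Degree-correction parameters: xi ~ Uniform[4/5, 6/5] w.p. 0.8,
    and the atoms 9/11 and 13/11 w.p. 0.1 each, so that E[theta_i] = 1."""
    u = rng.random(n)
    theta = rng.uniform(4 / 5, 6 / 5, size=n)  # the xi_i component
    theta[(u >= 0.8) & (u < 0.9)] = 9 / 11
    theta[u >= 0.9] = 13 / 11
    return theta
```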

Figure 2: Null densities under the degree-corrected stochastic block model in Simulation 4 with n=600n=600 (left plot) and n=1200n=1200 (right plot). The red dashed lines, blue dash-dotted lines, and black solid lines show the densities of the test statistic TDT_{D}, the bootstrap corrected test statistic TDbootT_{D}^{boot}, and the theoretical limit, respectively.

4.5 Simulation 5: Empirical Size for Hypothesis (6)

In this subsection, we consider the empirical size under the degree-corrected stochastic block model. The basic settings are similar to Simulation 2 except that K_x, K_y ∈ {2, 3, 4, 6}. The degree-correction parameters are generated by the method in Simulation 4. Table 4 shows the simulation results. The probability of a Type I error is close to the nominal level. At the same time, the empirical size of the statistic T_D is smaller than that of T_D^boot, which is consistent with the results of Simulation 4.

Table 4: Empirical size at nominal level α=0.05 for hypothesis test H_0: (g_x, B_x, θ_x) = (g_y, B_y, θ_y) v.s. H_1: (g_x, B_x, θ_x) ≠ (g_y, B_y, θ_y).
TDT_{D} TDbootT_{D}^{boot}
r=0.05r=0.05 r=0.1r=0.1 r=0.2r=0.2 r=0.05r=0.05 r=0.1r=0.1 r=0.2r=0.2
K=2K=2 0.01 0.07 0.05 0.04 0.05 0.06
K=3K=3 0.01 0.01 0.04 0.04 0.06 0.04
K=4K=4 0.04 0.01 0.04 0.05 0.04 0.05
K=6K=6 0.06 0.01 0.05 0.06 0.04 0.05

4.6 Simulation 6: Empirical Power for Hypothesis (6)

In this simulation, we investigate the testing power of the two-sample test under the degree-corrected stochastic block model. We only consider the case g_x ≠ g_y. The probability matrix and the community labels are generated in the same way as in Simulation 3. We consider the settings n = 600 and 1200. The degree-correction parameters are generated as in Simulation 4. The empirical results are shown in Table 5. From Table 5, the proposed test statistic can successfully detect the alternative hypotheses, as the test has good power.

Table 5: Empirical power of TDT_{D} at nominal level α=0.05\alpha=0.05 for hypothesis test H0:(gx,Bx,θx)=(gy,By,θy)v.s.H1:(gx,Bx,θx)(gy,By,θy)H_{0}:(g_{x},B_{x},\theta_{x})=(g_{y},B_{y},\theta_{y})\ \mathrm{v.s.}\ H_{1}:(g_{x},B_{x},\theta_{x})\neq(g_{y},B_{y},\theta_{y}).
TDT_{D} TDbootT_{D}^{boot}
r=0.05r=0.05 r=0.1r=0.1 r=0.2r=0.2 r=0.05r=0.05 r=0.1r=0.1 r=0.2r=0.2
K=2K=2 1 1 1 1 1 1
K=3K=3 1 1 1 1 1 1
K=4K=4 1 1 1 1 1 1
K=6K=6 0.98 1 1 1 1 1

5 DATA EXAMPLE

5.1 Gene co-expression data

In this subsection, we apply the proposed method to a gene expression dataset from the medial prefrontal cortex of developing rhesus monkeys, originally collected by Bakken et al. (2016). Previous analyses suggest this dataset is appropriate, as there is good evidence that gene co-expression patterns in this brain region change significantly during development. For the prenatal period, a 6-layer network is considered, corresponding to six age stages; we label these networks E40 to E120 to indicate the number of embryonic days of age. For the postnatal period, a 5-layer network is considered, corresponding to the five layers within the medial prefrontal cortex; we label these networks L2 to L6. With these data, we aim to test whether two gene co-expression networks from two different periods can be considered to come from a common stochastic block model using the proposed method.

Preprocessing Procedure. The microarray dataset contains n = 9173 genes measured in many samples across the L = 11 layers. Since we consider the difference between two samples, we only use the gene co-expression networks in the two periods E40 and E50, that is, development from 40 days to 50 days in the embryo. Similar to other work, e.g., Langfelder & Horvath (2008), we preprocess the adjacency matrix as follows. First, for a layer l, we construct the Pearson correlation matrix: define the co-expression similarity s_ij as the correlation coefficient between the profiles of nodes i and j, s_ij = corr(x_i, x_j), 1 ≤ i, j ≤ n. To map the similarities into [0, 1], let s′_ij = (1 + s_ij)/2. Then, using a thresholding procedure, the co-expression similarity matrix S = (s′_ij)_{n×n} is transformed into the adjacency matrix 𝒳. An unweighted network adjacency X_ij between gene expression profiles x_i and x_j is defined by hard thresholding the co-expression similarity s′_ij as

Xij={1|sij|τ,0|sij|<τ,andXii=0,X_{ij}=\begin{dcases}1&|s^{\prime}_{ij}|\geq\tau,\\ 0&|s^{\prime}_{ij}|<\tau,\end{dcases}\quad\mathrm{and}\quad X_{ii}=0,

where τ is the “hard” threshold parameter. Thus, two genes are linked (X_ij = 1) if the absolute correlation between their expression profiles exceeds the hard threshold τ. In this article, we set τ = 0.72 to obtain two adjacency matrices 𝒳 and 𝒴. Lastly, we remove all genes corresponding to nodes whose total degree in 𝒳 and 𝒴 is less than 90; the purpose is to remove nodes with few connections, a procedure also used in Lei & Lin (2022). Finally, we have two adjacency matrices 𝒳, 𝒴 ∈ {0,1}^{4722×4722}, each representing a network over the same 4722 genes.
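The preprocessing steps can be sketched as follows. This is our own simplified implementation for a single network (the paper filters genes on the combined degree of 𝒳 and 𝒴), with illustrative names:

```python
import numpy as np

def hard_threshold_network(expr, tau=0.72, min_degree=90):
    """Build an unweighted co-expression network by hard thresholding.

    expr: (n_genes, n_samples) matrix whose rows are gene expression profiles.
    Returns the adjacency matrix restricted to genes with degree at least
    min_degree, together with the indices of the retained genes.
    """
    s = np.corrcoef(expr)                  # Pearson correlations s_ij
    s_prime = (1 + s) / 2                  # map similarities into [0, 1]
    A = (np.abs(s_prime) >= tau).astype(int)
    np.fill_diagonal(A, 0)                 # X_ii = 0
    keep = A.sum(axis=0) >= min_degree     # drop weakly connected genes
    return A[np.ix_(keep, keep)], np.flatnonzero(keep)
```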

Result and Interpretation. We now test the difference between the gene co-expression networks of E40 and E50 using the proposed method based on the maximum entry-wise deviation. Before applying our method, we set the number of communities to K_E40 = K_E50 = 8, which is considered reasonable, as described in Lei & Lin (2022). We then use T_n and T_n^boot as test statistics and obtain T_n = 355748.3 and T_n^boot = 83.35, respectively. Since Q_0.95 = 3.41 for the Gumbel distribution, we reject H_0: (g_E40, B_E40) = (g_E50, B_E50) at the 0.05 level. As discussed in Lei & Lin (2022), as development proceeds from 40 to 50 embryonic days, different gene communities become the most connected: at 40 embryonic days, community 1 was highly coordinated (i.e., densely connected), while at 50 embryonic days, community 3 was highly coordinated. Hence, the two networks in the periods E40 and E50 can be considered to come from two different stochastic block models. The results are similar to those of Liu et al. (2018) and Lei & Lin (2022).

5.2 International trade data

Now, we study an international trade dataset originally analyzed by Westveld & Hoff (2011), containing yearly international trade data between n = 58 countries from 1981 to 2000. For this network, an adjacency matrix A_t is formed by first considering a weight matrix W_t with W_ijt = Trade_ijt + Trade_jit in a given year t, where Trade_ijt denotes the value of exports from country i to country j in year t. We then define A_ijt = 1 if W_ijt ≥ W_{0.5,t}, and A_ijt = 0 otherwise, where W_{0.5,t} denotes the 50th percentile of {W_ijt}_{1≤i<j≤n} in year t. In this article, we focus on the international trade networks in 1995 and 1999, which gives two sample networks 𝒳 and 𝒴. For the network in 1995, the number of communities was estimated to be 7 in Hu et al. (2021); for the network in 1999, using the corrected Bayesian information criterion, the number of communities is also estimated to be 7. We therefore set K̂_x = K̂_y = 7 and apply the proposed method, obtaining T_n = 113.7352 and T_n^boot = 92.46558. Since Q_0.95 = 3.41 for the Gumbel distribution, we reject H_0: (g_x, B_x) = (g_y, B_y) at the 0.05 level. Hence, we can assert that the trade relations among these countries changed over the four years from 1995 to 1999.
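The adjacency construction above can be sketched as follows; this is our own minimal helper (names are illustrative), not code from the paper:

```python
import numpy as np

def trade_adjacency(trade):
    """Binary trade network for one year: W_ij = Trade_ij + Trade_ji, with an
    edge placed when W_ij reaches the median of the upper-triangular weights
    {W_ij : i < j}."""
    W = trade + trade.T
    w50 = np.median(W[np.triu_indices_from(W, k=1)])  # 50th percentile
    A = (W >= w50).astype(int)
    np.fill_diagonal(A, 0)
    return A
```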

6 DISCUSSION

In this article, we have proposed a novel two-sample test statistic based on the maximum entry-wise deviation of the centered and rescaled observed adjacency matrices, and have demonstrated that the asymptotic null distribution of the test statistic is a Gumbel distribution when K_x = K_y = o(n/log²n). This test extends the method of Hu et al. (2021) to two samples. In our study, the difference between two networks is assessed by the sum of two residual matrices, where the centered and rescaled matrix of one sample is obtained using the estimated label and probability matrix of the other sample. Empirically, we have demonstrated that the size and the power of the test are valid. In this article, we assume that the numbers of communities for 𝒳 and 𝒴 are equal. In practice, we must first determine whether K_x equals K_y. We can independently estimate the numbers of communities by existing methods, such as sequential testing (Lei, 2016; Hu et al., 2021) or model selection (Hu et al., 2020). If K̂_x = K̂_y, the proposed method can be used directly. When K_x ≠ K_y, we can consider the new community structure obtained by combining the two stochastic block models; Figure 3 gives a diagram. Let Z_x ∈ {0,1}^{n×K_x}, Z_y ∈ {0,1}^{n×K_y}, and Z′ ∈ {0,1}^{n×K′} be membership matrices; each row of Z_x (or Z_y) has exactly one nonzero entry. For two nodes i_1 and i_2, Z′_{i_1·} = Z′_{i_2·} if and only if Z̄_{i_1·} = Z̄_{i_2·}, where Z̄ = (Z_x, Z_y) ∈ {0,1}^{n×(K_x+K_y)} is the combined membership matrix.
Hence, we can use ĝ_x and ĝ_y to obtain g′, and then obtain B̂_x and B̂_y from g′ and K′. We can then apply the proposed method to test the difference between the two samples after combining the two community structures.
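Operationally, the combined label g′ can be obtained by treating each distinct pair (g_x(i), g_y(i)) as one new community, as in this minimal sketch (our own helper, not from the paper):

```python
import numpy as np

def combine_labels(g_x, g_y):
    """Combined membership g': two nodes share a community under g' iff they
    share a community under both g_x and g_y (i.e., the corresponding rows of
    the stacked membership matrix (Z_x, Z_y) coincide)."""
    pairs = list(zip(g_x, g_y))
    relabel = {p: k for k, p in enumerate(dict.fromkeys(pairs))}
    return np.array([relabel[p] for p in pairs])
```

For example, g_x = (1, 1, 2, 2) and g_y = (1, 2, 2, 2) combine into K′ = 3 communities {1}, {2}, {3, 4}.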

Figure 3: An example of a community combination

It is worth noting that the proposed testing method works well when the network is dense and K is small in our simulation studies. However, when the network is sparse or K is large, condition (3) may be violated, i.e., an exact community label estimator is difficult to obtain. Hence, it would be of interest to investigate sparse networks and large K in future work. In addition, although the proposed method can identify whether the two models are the same, it cannot tell whether a detected difference is caused by a change in g or in B, which is crucial in practical applications. Intuitively, we can judge whether g_x equals g_y through the strongly consistent estimators ĝ_x and ĝ_y, but there is no theoretical guarantee. To further improve the two-sample test for the stochastic block model, we will study these issues in future work.

7 APPENDIX

We start with three lemmas that will be used in the proof. The following Poisson approximation result is essentially a special case of Theorem 1 in Arratia et al. (1989).

Lemma 1.

(Arratia et al., 1989). Let I be an index set and {𝐁_α, α ∈ I} be a set of subsets of I, that is, 𝐁_α ⊂ I. Let also {η_α, α ∈ I} be random variables. For a given t ∈ ℝ, set λ = Σ_{α∈I} ℙ{η_α > t}. Then

|{maxαIηαt}eλ|(1λ1)(b1+b2+b3),\left|\mathbb{P}\{\max_{\alpha\in I}\eta_{\alpha}\leq t\}-e^{-\lambda}\right|\leq(1\wedge\lambda^{-1})(b_{1}+b_{2}+b_{3}),

where

b1=αIβ𝐁α{ηα>t}{ηβ>t},b2=αIαβ𝐁α{ηα>t,ηβ>t},b_{1}=\sum_{\alpha\in I}\sum_{\beta\in\mathbf{B}_{\alpha}}\mathbb{P}\{\eta_{\alpha}>t\}\mathbb{P}\{\eta_{\beta}>t\},b_{2}=\sum_{\alpha\in I}\sum_{\alpha\neq\beta\in\mathbf{B}_{\alpha}}\mathbb{P}\{\eta_{\alpha}>t,\eta_{\beta}>t\},
b_{3}=\sum_{\alpha\in I}\mathbb{E}\left|\mathbb{P}\{\eta_{\alpha}>t\,|\,\sigma(\eta_{\beta},\beta\notin\mathbf{B}_{\alpha})\}-\mathbb{P}\{\eta_{\alpha}>t\}\right|,

and σ(ηβ,β𝐁α)\sigma(\eta_{\beta},\beta\notin\mathbf{B}_{\alpha}) is the σ\sigma-algebra generated by {ηβ,β𝐁α}\{\eta_{\beta},\beta\notin\mathbf{B}_{\alpha}\}. In particular, if ηα\eta_{\alpha} is independent of {ηβ,β𝐁α}\{\eta_{\beta},\beta\notin\mathbf{B}_{\alpha}\} for each α\alpha then b3=0b_{3}=0.

Lemma 2.

(Chen, 1990). Suppose ξ_1, …, ξ_n are i.i.d. random variables with 𝔼ξ_1 = 0 and 𝔼ξ_1² = 1. Set S_n = Σ_{i=1}^n ξ_i. Let 0 < α ≤ 1 and {a_n : n ≥ 1} satisfy a_n → ∞ and a_n = o(n^{α/(2(2−α))}). If 𝔼e^{t_0|ξ_1|^α} < ∞ for some t_0 > 0, then

limn1an2log{Snnanμ}=μ22\lim_{n}\frac{1}{a_{n}^{2}}\log\mathbb{P}\{\frac{S_{n}}{\sqrt{n}a_{n}}\geq\mu\}=-\frac{\mu^{2}}{2}

for any μ>0\mu>0.

Lemma 3.

(Cai & Jiang, 2011). Suppose ξ_1, …, ξ_n are i.i.d. random variables with 𝔼ξ_1 = 0, 𝔼ξ_1² = 1, and 𝔼e^{t_0|ξ_1|^α} < ∞ for some t_0 > 0 and 0 < α ≤ 1. Set S_n = Σ_{i=1}^n ξ_i and β = α/(2+α). Then, for any {p_n : n ≥ 1} with 0 < p_n → ∞ and log p_n = o(n^β) and {y_n : n ≥ 1} with y_n → y > 0,

{Snnlogpn>yn}pnyn2/2(logpn)1/22πy\mathbb{P}\{\dfrac{S_{n}}{\sqrt{n\log p_{n}}}>y_{n}\}\sim\dfrac{p_{n}^{-y_{n}^{2}/2}(\log p_{n})^{-1/2}}{\sqrt{2\pi}y}

as nn\rightarrow\infty.

7.1 Proof of Theorem 1

Although the idea of the proof of Theorem 1 is similar to that of Theorem 1 in Hu et al. (2021), some details differ, such as the decomposition of certain quantities. We now prove Theorem 1.

Under H_0: (g_x, B_x) = (g_y, B_y), we write g = g_x = g_y and B = B_x = B_y for simplicity. By Bernstein's inequality, we have

{|B^uvBuv|>ϵlognnunv}2exp(ϵ2log2n2Buv(1Buv)+23ϵlogn/nunv)\mathbb{P}\{|\widehat{B}_{uv}-B_{uv}|>\dfrac{\epsilon\log n}{\sqrt{n_{u}n_{v}}}\}\leq 2\exp\left(-\dfrac{\epsilon^{2}\log^{2}n}{2B_{uv}(1-B_{uv})+\frac{2}{3}\epsilon\log n/\sqrt{n_{u}n_{v}}}\right)

for any ϵ > 0. By Assumption 2, we obtain

B^uvBuv=op(lognnunv)=op(Klognn)\widehat{B}_{uv}-B_{uv}=o_{p}(\dfrac{\log n}{\sqrt{n_{u}n_{v}}})=o_{p}(\dfrac{K\log n}{n})

uniformly in uu and vv as nn\rightarrow\infty.

Notice that

ρ~iv,x\displaystyle\tilde{\rho}_{iv,x} :=jgy1(v)/{i}XijB^gy(i)gy(j)yB^gy(i)gy(j)y(1B^gy(i)gy(j)y)\displaystyle:=\sum\limits_{j\in g_{y}^{-1}(v)/\{i\}}\dfrac{X_{ij}-\widehat{B}^{y}_{g_{y}(i)g_{y}(j)}}{\sqrt{\widehat{B}^{y}_{g_{y}(i)g_{y}(j)}\left(1-\widehat{B}^{y}_{g_{y}(i)g_{y}(j)}\right)}}
=jg1(v)/{i}XijBg(i)g(j)+op(Klognn)Bg(i)g(j)(1Bg(i)g(j))(1+op(Klognn))\displaystyle\ =\sum\limits_{j\in g^{-1}(v)/\{i\}}\dfrac{X_{ij}-B_{g(i)g(j)}+o_{p}(\frac{K\log n}{n})}{\sqrt{B_{g(i)g(j)}\left(1-B_{g(i)g(j)}\right)}}\left(1+o_{p}(\sqrt{\dfrac{K\log n}{n}})\right)
=ρiv,x+ρiv,xop(Klognn)+op(1),\displaystyle\ =\rho_{iv,x}+\rho_{iv,x}o_{p}(\sqrt{\dfrac{K\log n}{n}})+o_{p}(1),

similarly,

ρ~iv,y\displaystyle\tilde{\rho}_{iv,y} :=jgx1(v)/{i}YijB^gx(i)gx(j)xB^gx(i)gx(j)x(1B^gx(i)gx(j)x)\displaystyle:=\sum\limits_{j\in g_{x}^{-1}(v)/\{i\}}\dfrac{Y_{ij}-\widehat{B}^{x}_{g_{x}(i)g_{x}(j)}}{\sqrt{\widehat{B}^{x}_{g_{x}(i)g_{x}(j)}\left(1-\widehat{B}^{x}_{g_{x}(i)g_{x}(j)}\right)}}
=jg1(v)/{i}YijBg(i)g(j)+op(Klognn)Bg(i)g(j)(1Bg(i)g(j))(1+op(Klognn))\displaystyle\ =\sum\limits_{j\in g^{-1}(v)/\{i\}}\dfrac{Y_{ij}-B_{g(i)g(j)}+o_{p}(\frac{K\log n}{n})}{\sqrt{B_{g(i)g(j)}\left(1-B_{g(i)g(j)}\right)}}\left(1+o_{p}(\sqrt{\dfrac{K\log n}{n}})\right)
=ρiv,y+ρiv,yop(Klognn)+op(1),\displaystyle\ =\rho_{iv,y}+\rho_{iv,y}o_{p}(\sqrt{\dfrac{K\log n}{n}})+o_{p}(1),

where ρiv,x=jg1(v)/{i}XijBg(i)g(j)Bg(i)g(j)(1Bg(i)g(j))\rho_{iv,x}=\sum\limits_{j\in g^{-1}(v)/\{i\}}\dfrac{X_{ij}-B_{g(i)g(j)}}{\sqrt{B_{g(i)g(j)}\left(1-B_{g(i)g(j)}\right)}} and ρiv,y=jg1(v)/{i}YijBg(i)g(j)Bg(i)g(j)(1Bg(i)g(j))\rho_{iv,y}=\sum\limits_{j\in g^{-1}(v)/\{i\}}\dfrac{Y_{ij}-B_{g(i)g(j)}}{\sqrt{B_{g(i)g(j)}\left(1-B_{g(i)g(j)}\right)}}.

Let

Ln,0\displaystyle L_{n,0} :=maxi,v|ρ~iv|\displaystyle:=\max_{i,v}|\tilde{\rho}_{iv}|
=maxi,v|1|gx1(v)/{i}|+|gy1(v)/{i}|(ρ~iv,x+ρ~iv,y)|\displaystyle\ =\max_{i,v}\left|\dfrac{1}{\sqrt{\left|g_{x}^{-1}(v)/\{i\}\right|+\left|g_{y}^{-1}(v)/\{i\}\right|}}\left(\tilde{\rho}_{iv,x}+\tilde{\rho}_{iv,y}\right)\right|
=maxi,v|12|g1(v)/{i}|[ρiv,x+ρiv,y+(ρiv,x+ρiv,y)op(Klognn)+op(1)]|\displaystyle\ =\max_{i,v}\left|\dfrac{1}{\sqrt{2\left|g^{-1}(v)/\{i\}\right|}}\left[\rho_{iv,x}+\rho_{iv,y}+(\rho_{iv,x}+\rho_{iv,y})o_{p}(\sqrt{\dfrac{K\log n}{n}})+o_{p}(1)\right]\right|
=Ln,1+Ln,1op(Klognn)+op(1),\displaystyle\ =L_{n,1}+L_{n,1}o_{p}(\sqrt{\dfrac{K\log n}{n}})+o_{p}(1),

where

Ln,1\displaystyle L_{n,1} :=maxi,v|ρiv|\displaystyle:=\max_{i,v}\left|\rho_{iv}\right|
=maxi,v|12|g1(v)/{i}|(ρiv,x+ρiv,y)|\displaystyle\ =\max_{i,v}\left|\dfrac{1}{\sqrt{2\left|g^{-1}(v)/\{i\}\right|}}\left(\rho_{iv,x}+\rho_{iv,y}\right)\right|
=maxi,v|1|g1(v)/{i}|jg1(v)/{i}Xij+Yij2Bg(i)g(j)2Bg(i)g(j)(1Bg(i)g(j))|.\displaystyle\ =\max_{i,v}\left|\dfrac{1}{\sqrt{\left|g^{-1}(v)/\{i\}\right|}}\sum\limits_{j\in g^{-1}(v)/\{i\}}\dfrac{X_{ij}+Y_{ij}-2B_{g(i)g(j)}}{\sqrt{2B_{g(i)g(j)}\left(1-B_{g(i)g(j)}\right)}}\right|.

If K=o(n/log2n)K=o(n/\log^{2}n) and Ln,1=Op(logn)L_{n,1}=O_{p}(\sqrt{\log n}), we have

Ln,0=Ln,1+op(1).L_{n,0}=L_{n,1}+o_{p}(1).

Thus, to prove Theorem 1 (5), it is sufficient to show

limn{Ln,122log(2Kn)+loglog(2Kn)y}=exp{12πey/2}.\lim_{n}\mathbb{P}\left\{L_{n,1}^{2}-2\log(2Kn)+\log\log(2Kn)\leq y\right\}=\exp\left\{-\dfrac{1}{2\sqrt{\pi}}e^{-y/2}\right\}.

Let yn=y+2log(2Kn)loglog(2Kn),I={(i,v)|1in,1vK},𝐁iv={(s,t)I/(i,v)|s=i}y_{n}=\sqrt{y+2\log(2Kn)-\log\log(2Kn)},I=\{(i,v)|1\leq i\leq n,1\leq v\leq K\},\mathbf{B}_{iv}=\{(s,t)\in I/(i,v)|s=i\}. Then |𝐁iv|=K1|\mathbf{B}_{iv}|=K-1. Note that 𝔼{Xij+Yij2Bg(i)g(j)2Bg(i)g(j)(1Bg(i)g(j))}=0\mathbb{E}\{\dfrac{X_{ij}+Y_{ij}-2B_{g(i)g(j)}}{\sqrt{2B_{g(i)g(j)}\left(1-B_{g(i)g(j)}\right)}}\}=0 and 𝔼{Xij+Yij2Bg(i)g(j)2Bg(i)g(j)(1Bg(i)g(j))}2=1\mathbb{E}\{\dfrac{X_{ij}+Y_{ij}-2B_{g(i)g(j)}}{\sqrt{2B_{g(i)g(j)}\left(1-B_{g(i)g(j)}\right)}}\}^{2}=1. By Lemma 1, we have

|{maxi,v|ρiv|yn}eλn|b1+b2,\left|\mathbb{P}\{\max_{i,v}|\rho_{iv}|\leq y_{n}\}-e^{-\lambda_{n}}\right|\leq b_{1}+b_{2},

where λn=i,v{|ρiv|>yn}\lambda_{n}=\sum_{i,v}\mathbb{P}\{|\rho_{iv}|>y_{n}\}. By Lemma 3, we have

\mathbb{P}\{\rho_{iv}>y_{n}\}
=\mathbb{P}\left\{\dfrac{\rho_{iv}}{\sqrt{\log(2Kn)}}>\sqrt{\dfrac{y+2\log(2Kn)-\log\log(2Kn)}{\log(2Kn)}}\right\}
\sim(2Kn)^{-\frac{y+2\log(2Kn)-\log\log(2Kn)}{2\log(2Kn)}}(\log(2Kn))^{-1/2}/(2\sqrt{\pi})
=(2Kn)^{-1}(2Kn)^{-\frac{y}{2\log(2Kn)}}(2Kn)^{\frac{\log\log(2Kn)}{2\log(2Kn)}}(\log(2Kn))^{-\frac{1}{2}}/(2\sqrt{\pi})
=(2Kn)^{-1}e^{-y/2}e^{\frac{1}{2}\log\log(2Kn)}(\log(2Kn))^{-\frac{1}{2}}/(2\sqrt{\pi})
=(2Kn)^{-1}e^{-y/2}(\log(2Kn))^{\frac{1}{2}}(\log(2Kn))^{-\frac{1}{2}}/(2\sqrt{\pi})
=(2Kn)^{-1}e^{-y/2}/(2\sqrt{\pi}).

Hence,

λn\displaystyle\lambda_{n} =i,v{|ρiv|>yn}\displaystyle=\sum_{i,v}\mathbb{P}\{|\rho_{iv}|>y_{n}\}
=Kn(Kn)12πey/2\displaystyle=Kn\dfrac{(Kn)^{-1}}{2\sqrt{\pi}}e^{-y/2}
=12πey/2.\displaystyle=\dfrac{1}{2\sqrt{\pi}}e^{-y/2}.

Meanwhile,

b_{1}=\sum_{\alpha\in I}\sum_{\beta\in\mathbf{B}_{\alpha}}\mathbb{P}\{\eta_{\alpha}>y_{n}\}\mathbb{P}\{\eta_{\beta}>y_{n}\}
<4K^{2}ne^{-y-2\log(2Kn)+\log\log(2Kn)}
=e^{\log(4K^{2}n)-y-2\log(2Kn)+\log\log(2Kn)}
=o(1),
b_{2}=\sum_{\alpha\in I}\sum_{\alpha\neq\beta\in\mathbf{B}_{\alpha}}\mathbb{P}\{\eta_{\alpha}>y_{n},\eta_{\beta}>y_{n}\}
<4K^{2}ne^{-y-2\log(2Kn)+\log\log(2Kn)}
=e^{\log(4K^{2}n)-y-2\log(2Kn)+\log\log(2Kn)}
=o(1).

Thus, we have

\lim_{n}\mathbb{P}\{L_{n,1}^{2}\leq y_{n}^{2}\}=\exp\left\{-\dfrac{1}{2\sqrt{\pi}}e^{-y/2}\right\}.

Combining this with L_{n,0} = L_{n,1} + o_p(1), we conclude that (5) holds. ∎

7.2 Proof of Theorem 2

Although the idea of the proof of Theorem 2 is similar to that of Theorem 2 in Hu et al. (2021), some details differ, such as the decomposition of certain quantities. We now prove Theorem 2.

Similar to the proof of Theorem 1, we have

B^uvxBuvx=op(Klognn),andB^uvyBuvy=op(Klognn).\widehat{B}_{uv}^{x}-B_{uv}^{x}=o_{p}(\dfrac{K\log n}{n}),\ \mathrm{and}\ \widehat{B}_{uv}^{y}-B_{uv}^{y}=o_{p}(\dfrac{K\log n}{n}).

Let

\rho_{iv,x}=\dfrac{1}{\sqrt{|g_{y}^{-1}(v)/\{i\}|}}\sum\limits_{j\in g_{y}^{-1}(v)/\{i\}}\dfrac{X_{ij}-B_{g_{x}(i)g_{x}(j)}^{x}}{\sqrt{B_{g_{y}(i)g_{y}(j)}^{y}\left(1-B_{g_{y}(i)g_{y}(j)}^{y}\right)}},
\displaystyle\tilde{\rho}_{iv,x} :=\dfrac{1}{\sqrt{|g_{y}^{-1}(v)/\{i\}|}}\sum\limits_{j\in g_{y}^{-1}(v)/\{i\}}\dfrac{X_{ij}-\widehat{B}^{y}_{g_{y}(i)g_{y}(j)}}{\sqrt{\widehat{B}^{y}_{g_{y}(i)g_{y}(j)}\left(1-\widehat{B}^{y}_{g_{y}(i)g_{y}(j)}\right)}}
\displaystyle\ =\dfrac{1}{\sqrt{|g_{y}^{-1}(v)/\{i\}|}}
\displaystyle\qquad\times\sum\limits_{j\in g_{y}^{-1}(v)/\{i\}}\dfrac{X_{ij}-B_{g_{x}(i)g_{x}(j)}^{x}+B_{g_{x}(i)g_{x}(j)}^{x}-B^{y}_{g_{y}(i)g_{y}(j)}+B^{y}_{g_{y}(i)g_{y}(j)}-\widehat{B}^{y}_{g_{y}(i)g_{y}(j)}}{\sqrt{\widehat{B}^{y}_{g_{y}(i)g_{y}(j)}\left(1-\widehat{B}^{y}_{g_{y}(i)g_{y}(j)}\right)}}
\displaystyle\ =\dfrac{1}{\sqrt{|g_{y}^{-1}(v)/\{i\}|}}\sum\limits_{j\in g_{y}^{-1}(v)/\{i\}}\dfrac{X_{ij}-B_{g_{x}(i)g_{x}(j)}^{x}+B_{g_{x}(i)g_{x}(j)}^{x}-B^{y}_{g_{y}(i)g_{y}(j)}+o_{p}\left(\dfrac{K\log n}{n}\right)}{\sqrt{B^{y}_{g_{y}(i)g_{y}(j)}\left(1-B^{y}_{g_{y}(i)g_{y}(j)}\right)}}
\displaystyle\qquad\times\left(1+o_{p}\left(\sqrt{\dfrac{K\log n}{n}}\right)\right)
\displaystyle\ =\rho_{iv,x}+\dfrac{1}{\sqrt{|g_{y}^{-1}(v)/\{i\}|}}\sum\limits_{j\in g_{y}^{-1}(v)/\{i\}}\dfrac{B_{g_{x}(i)g_{x}(j)}^{x}-B^{y}_{g_{y}(i)g_{y}(j)}}{\sqrt{B^{y}_{g_{y}(i)g_{y}(j)}\left(1-B^{y}_{g_{y}(i)g_{y}(j)}\right)}}+o_{p}(1).

By Hoeffding’s inequality, we have

\displaystyle\mathbb{P}\left\{\max_{i,v}\left|\dfrac{1}{\sqrt{|g_{y}^{-1}(v)/\{i\}|}}\sum\limits_{j\in g_{y}^{-1}(v)/\{i\}}\dfrac{X_{ij}-B_{g_{x}(i)g_{x}(j)}^{x}}{\sqrt{B_{g_{y}(i)g_{y}(j)}^{y}\left(1-B_{g_{y}(i)g_{y}(j)}^{y}\right)}}\right|>t\right\}
\displaystyle\leq\ \sum_{i,v}\mathbb{P}\left\{\left|\sum\limits_{j\in g_{y}^{-1}(v)/\{i\}}\dfrac{X_{ij}-B_{g_{x}(i)g_{x}(j)}^{x}}{\sqrt{B_{g_{y}(i)g_{y}(j)}^{y}\left(1-B_{g_{y}(i)g_{y}(j)}^{y}\right)}}\right|>t\sqrt{|g_{y}^{-1}(v)/\{i\}|}\right\}
\displaystyle\leq\ 2e^{\log(Kn)-2C_{1}^{2}t^{2}}.

Hence, \max_{i,v}|\rho_{iv,x}|=O_{p}(\sqrt{\log n}). Denote

\dfrac{1}{\sqrt{|g_{y}^{-1}(v)/\{i\}|}}\sum\limits_{j\in g_{y}^{-1}(v)/\{i\}}\dfrac{B_{g_{x}(i)g_{x}(j)}^{x}-B^{y}_{g_{y}(i)g_{y}(j)}}{\sqrt{B^{y}_{g_{y}(i)g_{y}(j)}\left(1-B^{y}_{g_{y}(i)g_{y}(j)}\right)}}

by \ell_{iv}(y,x). Then we have

\max_{i,v}|\tilde{\rho}_{iv,x}|=\max_{i,v}|\ell_{iv}(y,x)|+O_{p}(\sqrt{\log n}).

Similarly, we have

\rho_{iv,y}=\dfrac{1}{\sqrt{|g_{x}^{-1}(v)/\{i\}|}}\sum\limits_{j\in g_{x}^{-1}(v)/\{i\}}\dfrac{Y_{ij}-B_{g_{y}(i)g_{y}(j)}^{y}}{\sqrt{B_{g_{x}(i)g_{x}(j)}^{x}\left(1-B_{g_{x}(i)g_{x}(j)}^{x}\right)}},
\displaystyle\tilde{\rho}_{iv,y} :=\dfrac{1}{\sqrt{|g_{x}^{-1}(v)/\{i\}|}}\sum\limits_{j\in g_{x}^{-1}(v)/\{i\}}\dfrac{Y_{ij}-\widehat{B}^{x}_{g_{x}(i)g_{x}(j)}}{\sqrt{\widehat{B}^{x}_{g_{x}(i)g_{x}(j)}\left(1-\widehat{B}^{x}_{g_{x}(i)g_{x}(j)}\right)}}
\displaystyle\ =\rho_{iv,y}+\ell_{iv}(x,y)+o_{p}(1),

where \ell_{iv}(x,y)=\dfrac{1}{\sqrt{|g_{x}^{-1}(v)/\{i\}|}}\sum\limits_{j\in g_{x}^{-1}(v)/\{i\}}\dfrac{B_{g_{y}(i)g_{y}(j)}^{y}-B^{x}_{g_{x}(i)g_{x}(j)}}{\sqrt{B^{x}_{g_{x}(i)g_{x}(j)}\left(1-B^{x}_{g_{x}(i)g_{x}(j)}\right)}}, and

\max_{i,v}|\tilde{\rho}_{iv,y}|=\max_{i,v}|\ell_{iv}(x,y)|+O_{p}(\sqrt{\log n}).

Hence,

\displaystyle L_{n,0} :=\max_{i,v}|\tilde{\rho}_{iv}|
\displaystyle\ =\max_{i,v}\left|\dfrac{1}{\sqrt{\left|g_{x}^{-1}(v)/\{i\}\right|+\left|g_{y}^{-1}(v)/\{i\}\right|}}\left(\sqrt{\left|g_{y}^{-1}(v)/\{i\}\right|}\tilde{\rho}_{iv,x}+\sqrt{\left|g_{x}^{-1}(v)/\{i\}\right|}\tilde{\rho}_{iv,y}\right)\right|
\displaystyle\ \geq\max_{i,v}\left|\dfrac{\sqrt{\left|g_{x}^{-1}(v)/\{i\}\right|}+\sqrt{\left|g_{y}^{-1}(v)/\{i\}\right|}}{\sqrt{\left|g_{x}^{-1}(v)/\{i\}\right|+\left|g_{y}^{-1}(v)/\{i\}\right|}}(\tilde{\rho}_{iv,x}\wedge\tilde{\rho}_{iv,y})\right|
\displaystyle\ =\dfrac{\sqrt{\left|g_{x}^{-1}(v)/\{i\}\right|}+\sqrt{\left|g_{y}^{-1}(v)/\{i\}\right|}}{\sqrt{\left|g_{x}^{-1}(v)/\{i\}\right|+\left|g_{y}^{-1}(v)/\{i\}\right|}}\max_{i,v}|\tilde{\rho}_{iv,x}\wedge\tilde{\rho}_{iv,y}|.

By Assumptions 2 and 3, we have L_{n,0}^{2}/\log n\rightarrow\infty.

Let T_{n}=L_{n}^{2}(g_{x},g_{y})-2\log(2Kn)+\log\log(2Kn). Then, as n\rightarrow\infty, we have

T_{n}/\log n\rightarrow\infty.

\square
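Since the asymptotic level-\alpha critical value of the Gumbel limit, -2\log(2\sqrt{\pi})-2\log(-\log(1-\alpha)), does not depend on n, the divergence T_{n}/\log n\rightarrow\infty implies

\mathbb{P}\left\{T_{n}>-2\log(2\sqrt{\pi})-2\log(-\log(1-\alpha))\right\}\rightarrow 1,

that is, the test is asymptotically powerful against such alternatives.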

7.3 The Bootstrap corrected test statistic under the degree-corrected stochastic block model

For two sample networks \mathcal{X} and \mathcal{Y}, under the framework of the degree-corrected stochastic block model, the bootstrap corrected test statistic is calculated as follows:

1. Estimate (\widehat{g}_{x},\widehat{B}_{x}) and (\widehat{g}_{y},\widehat{B}_{y}) by a consistent clustering method, and estimate \widehat{\theta}_{x} and \widehat{\theta}_{y} by their maximum likelihood estimators. Calculate the statistic T_{n} using \mathcal{X}, \mathcal{Y}, (\widehat{g}_{x},\widehat{B}_{x},\widehat{\theta}_{x}) and (\widehat{g}_{y},\widehat{B}_{y},\widehat{\theta}_{y});

2. For m=1,\ldots,M, generate \mathcal{X}^{(m)} and \mathcal{Y}^{(m)} from the edge-probability matrix \widehat{P}_{ij}=(\widehat{P}_{ij}^{x}+\widehat{P}_{ij}^{y})/2=(\widehat{\theta}_{i}^{x}\widehat{\theta}_{j}^{x}\widehat{B}_{\widehat{g}_{x}(i)\widehat{g}_{x}(j)}^{x}+\widehat{\theta}_{i}^{y}\widehat{\theta}_{j}^{y}\widehat{B}_{\widehat{g}_{y}(i)\widehat{g}_{y}(j)}^{y})/2, and calculate T_{n}^{(m)} based on \mathcal{X}^{(m)} and \mathcal{Y}^{(m)};

3. Use \{T_{n}^{(m)}:m=1,\ldots,M\} to estimate the location and scale parameters \widehat{\mu} and \widehat{\beta} of the Gumbel distribution by the maximum likelihood method;

4. The bootstrap corrected test statistic is calculated as

T_{n}^{boot}=\mu+\beta\left(\dfrac{T_{n}-\widehat{\mu}}{\widehat{\beta}}\right),

where \mu=-2\log(2\sqrt{\pi}) and \beta=2.
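Steps 3 and 4 can be sketched in code. The following is an illustrative sketch (the function names are ours, not from the paper) that assumes the bootstrap statistics T_{n}^{(m)} have already been computed, and substitutes simple method-of-moments estimates of the Gumbel location and scale for the maximum likelihood fit in Step 3:

```python
import math
import numpy as np

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def fit_gumbel(samples):
    """Method-of-moments estimates of the Gumbel location and scale
    (a simple stand-in for the maximum likelihood fit in Step 3)."""
    samples = np.asarray(samples, dtype=float)
    beta_hat = samples.std(ddof=1) * math.sqrt(6.0) / math.pi
    mu_hat = samples.mean() - EULER_GAMMA * beta_hat
    return mu_hat, beta_hat

def bootstrap_corrected(t_n, boot_stats):
    """Step 4: rescale T_n onto the theoretical Gumbel(mu, beta) scale,
    where mu = -2*log(2*sqrt(pi)) and beta = 2."""
    mu = -2.0 * math.log(2.0 * math.sqrt(math.pi))
    beta = 2.0
    mu_hat, beta_hat = fit_gumbel(boot_stats)
    return mu + beta * (t_n - mu_hat) / beta_hat
```

When the bootstrap replicates already follow the limiting Gumbel law, \widehat{\mu}\approx\mu and \widehat{\beta}\approx\beta, so the correction leaves T_{n} essentially unchanged.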

References

  • [1] Airoldi, E. M., Blei, D. M., Fienberg, S. E., & Xing, E. P. (2008). Mixed membership stochastic blockmodels. Journal of Machine Learning Research, 9:1981 - 2014.
  • [2] Amini, A. A., Chen, A., Bickel, P. J., & Levina, E. (2013). Pseudo-Likelihood Methods for Community Detection in Large Sparse Networks. The Annals of Statistics, 41:2097 - 2122.
  • [3] Arratia, R., Goldstein, L., & Gordon, L. (1989). Two moments suffice for Poisson approximations: The Chen-Stein method. The Annals of Probability, 17:9 - 25.
  • [4] Bakken, T. E., Miller, J. A., Ding, S.-L., Sunkin, S. M., Smith, K. A., Ng, L., Szafer, A., Dalley, R. A., Royall, J. J., Lemon, T., et al. (2016). A comprehensive transcriptional map of primate brain development. Nature, 535:367 - 375.
  • [5] Barnett, I. & Onnela, J.-P. (2016). Change point detection in correlation networks. Scientific Reports, 6:18893.
  • [6] Bickel, P. J. & Chen, A. (2009). A nonparametric view of network models and Newman-Girvan and other modularities. Proceedings of the National Academy of Sciences, 106:21068 - 21073.
  • [7] Bickel, P. J. & Sarkar, P. (2015). Hypothesis testing for automated community detection in networks. Journal of the Royal Statistical Society, Series B, 78:253 - 273.
  • [8] Cai, T. T. & Jiang, T. (2011). Limiting laws of coherence of random matrices with applications to testing covariance structure and construction of compressed sensing matrices. The Annals of Statistics, 39:1496 - 1525.
  • [9] Chen, X. (1990). Probabilities of moderate deviations for B-valued independent random vectors. Chinese Annals of Mathematics, 11:621 - 629.
  • [10] Chen, L., Zhou, J. & Lin, L. (2021a). Hypothesis testing for populations of networks. Communications in Statistics - Theory and Methods, online.
  • [11] Chen, L., Josephs, N., Lin, L., Zhou, J. & Kolaczyk, E. D. (2021b). A spectral-based framework for hypothesis testing in populations of networks. Statistica Sinica, online.
  • [12] Choi, D. S., Wolfe, P. J., & Airoldi, E. M. (2012). Stochastic blockmodels with a growing number of classes. Biometrika, 99:273 - 284.
  • [13] Daudin, J.-J., Picard, F., & Robin, S. (2008). A mixture model for random graphs. Statistics and Computing, 18:173 - 183.
  • [14] Gao, C., Ma, Z., Zhang, A. Y., & Zhou, H. H. (2017). Achieving optimal misclassification proportion in stochastic block models. The Journal of Machine Learning Research, 18:1980 - 2024.
  • [15] Ghoshdastidar, D., & von Luxburg, U. (2018). Practical methods for graph two-sample testing. In Advances in Neural Information Processing Systems, 3019-3028. Montréal, Canada.
  • [16] Ghoshdastidar, D., Gutzeit, M., Carpentier, A., & von Luxburg, U. (2020). Two-sample hypothesis testing for inhomogeneous random graphs. The Annals of Statistics, 48:2208 - 2229.
  • [17] Holland, P. W., Laskey, K. B., & Leinhardt, S. (1983). Stochastic blockmodels: First steps. Social Networks, 5:109 - 137.
  • [18] Hu, J., Qin, H., Yan, T., & Zhao, Y. (2020). Corrected Bayesian Information Criterion for Stochastic Block Models. Journal of the American Statistical Association, 115:1771 - 1783.
  • [19] Hu, J., Zhang, J., Qin, H., Yan, T., & Zhu, J. (2021). Using Maximum Entry-Wise Deviation to Test the Goodness of Fit for Stochastic Block Models. Journal of the American Statistical Association, 116:1373 - 1382.
  • [20] Jin, J. (2015). Fast Community Detection by SCORE. The Annals of Statistics, 43:57 - 89.
  • [21] Karrer, B. & Newman, M. E. J. (2011). Stochastic blockmodels and community structure in networks. Physical Review E, 83:016107.
  • [22] Langfelder, P. & Horvath, S. (2008). WGCNA: An R package for weighted correlation network analysis. BMC Bioinformatics, 9:559.
  • [23] Le, C. M. & Levina, E. (2022). Estimating the number of communities in networks by spectral methods. Electronic Journal of Statistics, 16:3315 - 3342.
  • [24] Lei, J. (2016). A goodness-of-fit test for stochastic block models. The Annals of Statistics, 44:401 - 424.
  • [25] Lei, J. & Lin, K. Z. (2022). Bias-adjusted spectral clustering in multi-layer stochastic block models. Journal of the American Statistical Association, online.
  • [26] Lei, J. & Rinaldo, A. (2015). Consistency of spectral clustering in stochastic block models. The Annals of Statistics, 43:215 - 237.
  • [27] Liu, F., Choi, D., Xie, L., & Roeder, K. (2018). Global spectral clustering in dynamic networks. Proceedings of the National Academy of Sciences, 115:927 - 932.
  • [28] Newman, M. E. J. (2006). Modularity and community structure in networks. Proceedings of the National Academy of Sciences of the United States of America, 103:8577 - 8582.
  • [29] Newman, M. E. J. & Girvan, M. (2004). Finding and evaluating community structure in networks. Physical Review E, 69:026113.
  • [30] Nowicki, K. & Snijders, T. (2001). Estimation and prediction for stochastic block structures. Journal of the American Statistical Association, 96:1077 - 1087.
  • [31] Rohe, K., Chatterjee, S., & Yu, B. (2011). Spectral clustering and the high-dimensional stochastic blockmodel. The Annals of Statistics, 39:1878 - 1915.
  • [32] Rohe, K., Qin, T., & Fan, H. (2014). The highest dimensional stochastic blockmodel with a regularized estimator. Statistica Sinica, 24:1771 - 1786.
  • [33] Saldaña, D. F., Yu, Y., & Feng, Y. (2017). How Many Communities Are There? Journal of Computational and Graphical Statistics, 26:171 - 181.
  • [34] Sarkar, P. & Bickel, P. J. (2015). Role of normalization in spectral clustering for stochastic blockmodels. The Annals of Statistics, 43:962 - 990.
  • [35] Snijders, T. & Nowicki, K. (1997). Estimation and prediction for stochastic blockmodels for graphs with latent block structure. Journal of Classification, 14:75 - 100.
  • [36] Steinhaeuser, K. & Chawla, N. V. (2010). Identifying and evaluating community structure in complex networks. Pattern Recognition Letters, 31:413 - 421.
  • [37] Tang, M., Athreya, A., Sussman, A. L., Lyzinski, V. & Priebe, C. E. (2017). A nonparametric two-sample hypothesis testing problem for random graphs. Bernoulli, 23:1599 - 1630.
  • [38] Wang, J., Zhang, J., Liu, B., Zhu, J., & Guo, J. (2021). Fast network community detection with profile-pseudo likelihood methods. Journal of the American Statistical Association, online.
  • [39] Wang, Y. R. & Bickel, P. J. (2017). Likelihood-based model selection for stochastic block models. The Annals of Statistics, 45:500 - 528.
  • [40] Westveld, A. H. & Hoff, P. D. (2011). A Mixed effects model for longitudinal relational and network data with applications to international trade and conflict. The Annals of Applied Statistics, 5:843 - 872.
  • [41] Wu, F., Kong, X. & Xu, C. (2022). Test on stochastic block model: local smoothing and extreme value theory. Journal of Systems Science and Complexity, 35:1535 - 1556.
  • [42] Zhang, A. Y. & Zhou, H. H. (2016). Minimax rates of community detection in stochastic block models. The Annals of Statistics, 44:2252 - 2280.
  • [43] Zhao, Y., Levina, E., & Zhu, J. (2011). Community extraction for social networks. Proceedings of the National Academy of Sciences of the United States of America, 108:7321 - 7326.
  • [44] Zhao, Y., Levina, E., & Zhu, J. (2012). Consistency of community detection in networks under degree-corrected stochastic block models. The Annals of Statistics, 40:2266 - 2292.
  • [45] Zhou, W. (2007). Asymptotic distribution of the largest off-diagonal entry of correlation matrices. Transactions of the American Mathematical Society, 359:5345 - 5363.