Two–Sample Test for Stochastic Block Models via Maximum Entry–Wise Deviation

Kang Fu, Seydou Keita and Jianwei Hu
School of Mathematics and Statistics, Central China Normal University, Wuhan

Abstract

The stochastic block model is a popular tool for detecting community structures in network data. Detecting the difference between two community structures is an important issue for stochastic block models. However, the two-sample test has been a largely under-explored domain, and too little work has been devoted to it. In this article, based on the maximum entry–wise deviation of the two centered and rescaled adjacency matrices, we propose a novel test statistic to test two samples of stochastic block models. We prove that the null distribution of the proposed test statistic converges in distribution to a Gumbel distribution, and we show the change of the two samples from stochastic block models can be tested via the proposed method. Then, we show that the proposed test has an asymptotic power guarantee against alternative models. One noticeable advantage of the proposed test statistic is that the number of communities can be allowed to grow linearly up to a logarithmic factor. Further, we extend the proposed method to the degree-corrected stochastic block model. Both simulation studies and real-world data examples indicate that the proposed method works well.

Keywords: Gumbel distribution; Network data; Stochastic block model; Two-sample test

1 Introduction

Network data analysis has become a popular research topic in many fields, including gene classification, social relationship investigation, and financial risk management. In the past decades, the majority of works mainly focused on the large-scale network data with community structure, see, e.g., Newman & Girvan (2004); Newman (2006); Steinhaeuser & Chawla (2010). In the network data analysis, the stochastic block model proposed by Holland et al. (1983) is a popular tool to fit the network data with community structure, see, e.g., Snijders & Nowicki (1997); Nowicki & Snijders (2001); Bickel & Chen (2009); Rohe et al. (2011); Choi et al. (2012); Jin (2015); Zhang & Zhou (2016). The stochastic block model with $K$ communities assumes that $n$ nodes of the network are clustered into $K$ communities, that is, there exists a mapping of community membership (also known as community membership label) $g:[n]\rightarrow[K]^{n}$ , where $[n]=\{1,\ldots,n\}$ . Formally, $g(i)=k$ means that the node $i$ belongs to the community $k$ . For an unweighted and undirected graph $G$ , it can be represented by a binary symmetric adjacency matrix $A$ , that is, $A_{ij}=1$ if there is a connection (or an edge) between node $i$ and node $j$ and $A_{ij}=0$ otherwise. Given the community membership label $g$ , the stochastic block model assumes that the entries $A_{ij}(i>j)$ of the adjacency matrix $A$ are mutually independent Bernoulli random variables with probabilities $P_{ij}=B_{g(i)g(j)}$ for a certain symmetric probability matrix $B\in[0,1]^{K\times K}$ , where matrix $P$ is called as edge-probability matrix. Then the stochastic block model is completely and uniquely determined by the pair $(g,B)$ up to label permutations of nodes.

The fundamental issues in the stochastic block model are model selection and community detection. Given an adjacency matrix $A$ , the goal of model selection is to estimate the number of clusters or communities, and the goal of community detection is to cluster all nodes into different communities such that the connections between the nodes in the same community are dense and the connections between the nodes in the different communities are sparse. To enhance the flexibility of the model, variations of the stochastic block model were also proposed, such as the degree-corrected stochastic block model proposed by Karrer & Newman (2011) addressed the degree heterogeneity by introducing the additional node activeness parameters, and Airoldi et al. (2008) proposed the mixed membership stochastic block model, where a single node may belong to multiple communities. For the model selection, many methods are used to estimate the number of communities, including the sequential testing methods (Lei, 2016; Hu et al., 2021), and the likelihood-based methods (Saldña et al., 2017; Wang & Bickel, 2017; Hu et al., 2020). Meanwhile, the majority of efficient methods also have been proposed to recover the community structure, such as modularity (Newman, 2006), variational methods (Daudin et al., 2008), profile-likelihood maximization (Bickel & Chen, 2009), spectral clustering (Rohe et al., 2011; Jin, 2015), pseudo-likelihood maximization (Amini et al., 2013), and profile-pseudo likelihood methods (Wang et al., 2021). The corresponding asymptotic properties of estimation of community label have been obtained, see, e.g., Rohe et al. (2011); Zhao et al. (2012); Choi et al. (2012); Lei & Rinaldo (2015); Sarkar & Bickel (2015); Zhang & Zhou (2016); Wang et al. (2021).

In the past decades, the research interest of much literature focuses on the study of one sample of stochastic block models. Usually, in the social network, people surrounding a leader may tend to develop a closer relationship with another leader, see, e.g., Barnett & Onnela (2016). This phenomenon may lead to an issue, that is, whether the community structure (the number of communities $K$ , the community probability matrix $B$ , and the community membership label $g$ ) of the network will change over time or the environment.

As the simplest case of the multi-layer stochastic block model, two sample networks may also appear. For a multi-layer stochastic block model, it can be classified into a variety of more specific situations: First, all layer networks come from the identical stochastic block model $(g,B)$ ; second, each layer of the network comes from a different model but they have the same community structure, that is, $g$ is identical across all layers; third, each layer of the network comes from different models, that is, $g$ and $B$ are different in each layer. In fact, for the third case, it can be considered as a mixed model of multiple stochastic block models. However, how could we distinguish the three situations? Intuitively, we should judge whether two models are the same when we have two samples from the corresponding models. A common inference method to judge whether two models are the same is the hypothesis testing method. Tang et al. (2017) considered whether two independent finite-dimensional random dot product graphs are generated by the identical model. Using the adjacency spectral embedding for each sample, they constructed a statistic based on the kernel function. Ghoshdastidar & von Luxburg (2018) used the largest singular value of a scaled and centralized matrix to construct the statistic and proved that the null distribution convergences in distribution to a Tracy-Widom distribution. Ghoshdastidar et al. (2020) proposed two test statistics using the Frobenius norm and spectral norm to test whether two samples of networks are generated from the identical edge-probability matrix. However, this testing procedure requires choosing an appropriate threshold. Recently, Chen et al. (2021a) simplified the statistics in Ghoshdastidar & von Luxburg (2018) and proposed a test procedure for a two-sample network test. Based on the random matrix theory, Chen et al. (2021b) used the trace of a constructed matrix to obtain the statistic and proved that the asymptotic null distribution is the standard normal distribution.

For the methods mentioned above, one defect is that either the edge-probability matrices of the two populations differ greatly or multiple samples are required. Hence, when we only have two samples, the methods mentioned above may not work well as there is less information. In addition, it is worth noting that when the community structure changes slightly, the edge-probability matrix may change less, especially when the network size is relatively large. The above methods of testing $H_{0}:P_{1}=P_{2}$ may not work well, where $P_{1}$ and $P_{2}$ are the edge-probability matrices of two network models. It implies that we cannot test the difference between the two models when the difference is tiny but the network size is very large. Thus, detecting the difference between two samples of stochastic block models is an interesting research problem. Under the two-sample test of the stochastic block model, Wu et al. (2022) proposed a test method based on the locally smoothed adjacency matrix. To construct the smoothed adjacency matrix, their method separates a community into serval non-overlapping neighboring sub-communities and averages the entries of the adjacency matrix in non-overlapping local neighborhoods within communities. However, the procedure is complex and only applicable to a small number of communities. In this article, we want to construct a statistic that allows $K$ to diverge with $n$ .

In this article, based on the maximum entry-wise deviation of the two centered and rescaled adjacency matrices, we propose a two-sample test statistic to detect the change of two stochastic block models under two observed adjacency matrices. We show that the asymptotic null distribution of the test statistic is a Gumbel distribution when $K=o(n/\log^{2}n)$ . This testing method allows $K$ to grow linearly with $n$ up to a logarithmic factor. It is well known that the number of communities must be less than the number of nodes in the stochastic block model. Since $K$ cannot grow faster, Rohe et al. (2014) called this scenario ( $K\rightarrow\infty$ ) the highest dimensional stochastic block model. Compared with the method in Wu et al.(2022), we relax the condition that the number of communities $K$ is fixed. Moreover, we also show that the proposed test is asymptotically powerful against serval alternative models, and does not need additional methods to improve the power of the proposed test. Finally, to improve the flexibility of the proposed method, we extend the proposed method to the degree-corrected stochastic block model.

Next, we formally describe this issue. Let $\mathcal{X}=(X_{ij})_{n\times n}$ and $\mathcal{Y}=(Y_{ij})_{n\times n}$ be two binary symmetric adjacency matrices from two stochastic block models parametrized by $(g_{x},B_{x})$ and $(g_{y},B_{y})$ , respectively. For any $i$ , let $X_{ii}=Y_{ii}=0$ , i.e., no loop edge exists. In this article, we assume that the node label of the two networks is identical, instead of the community label of the node. Hence, the two networks $\mathcal{X}$ and $\mathcal{Y}$ can be viewed as repeated observations in the same individuals. Then, given two sample networks $\mathcal{X}$ and $\mathcal{Y}$ , the testing problem can be formulated as

H_{0}:(g_{x},B_{x})=(g_{y},B_{y})\qquad\mathrm{v.s.}\qquad H_{1}:(g_{x},B_{x})\neq(g_{y},B_{y}).

(1)

Intuitively, there are four mutually exclusive scenarios for the alternative hypothesis: (i) $K_{x}\neq K_{y}$ , (ii) $K_{x}=K_{y},g_{x}=g_{y}$ , but $B_{x}\neq B_{y}$ , (iii) $K_{x}=K_{y},B_{x}=B_{y}$ , but $g_{x}\neq g_{y}$ , and (iv) $K_{x}=K_{y},g_{x}\neq g_{y},B_{x}\neq B_{y}$ .

Our testing procedure for (1) goes as follows. First, we get consistent estimators of $K_{x}$ and $K_{y}$ , denoted by $\widehat{K}_{x}$ and $\widehat{K}_{y}$ , respectively. If $\widehat{K}_{x}\neq\widehat{K}_{y}$ , it means that $(g_{x},B_{x})=(g_{y},B_{y})$ cannot be true, so we can directly reject the null hypothesis. If $\widehat{K}_{x}=\widehat{K}_{y}$ , based on $\widehat{K}_{x}=\widehat{K}_{y}=\widehat{K}$ , we obtain the strongly consistent estimators $\widehat{g}_{x}$ and $\widehat{g}_{y}$ for the community membership labels $g_{x}$ and $g_{y}$ . Second, we estimate the entries of $B_{x}$ and $B_{y}$ by the sample proportions of each community based on $(\mathcal{X},\widehat{g}_{x})$ and $(\mathcal{Y},\widehat{g}_{y})$ . Third, we use $\widehat{B}_{y}$ ( $\widehat{B}_{x}$ ) to center and rescale adjacency matrix $\mathcal{X}$ ( $\mathcal{Y}$ ), and sum for new matrix under specific rules, and get a combined information matrix, see (7). Finally, we obtain the test statistic by the maximum of the elements of the combined information matrix. The basic principle of this method is that if the null hypothesis is true, the entries of the combined information matrix asymptotically follow the normal distribution. According to the results of Zhou (2007), the asymptotic distribution of the maximum of elements after normalization is a Gumbel distribution. On the other hand, under the alternative hypothesis, the adjacency is incorrectly centered and scaled, and this deviation is magnified to a very large value by the normalization term. This implies that the test statistic can successfully separate $H_{0}$ and $H_{1}$ in (1).

The remainder of the article is organized as follows. In Section 2, we introduce the new test statistic and state its asymptotic null distribution and asymptotic power. In Section 3, we extend the proposed method to the degree-corrected stochastic block model. Simulation studies and real-world data examples are given in Sections 4 and 5, respectively. All technical proofs are postponed to the Appendix.

2 A NEW TWO-SAMPLE TEST FOR THE STOCHASTIC BLOCK MODEL

Consider a stochastic block model on $n$ nodes with the community label $g$ and probability matrix $B$ . Given a number of communities $K$ and a community label $g$ , the maximum likelihood estimator of $B$ is given by

\widehat{B}_{uv}=\begin{dcases}\dfrac{\sum_{i\in\mathcal{N}_{u},j\in\mathcal{N}_{v}}X_{ij}}{n_{u}n_{v}},\quad&u\neq v,\\ \dfrac{\sum_{i,j\in\mathcal{N}_{u},i\neq j}X_{ij}}{n_{u}(n_{v}-1)},\quad&u=v,\end{dcases}

(2)

where $\mathcal{N}_{u}=\left\{i:g(i)=u\right\}$ for $1\leq u\leq K$ , $i\in\{1,\ldots,n\}$ stands for the label of a node, and $n_{u}=\left|\mathcal{N}_{u}\right|$ . After here, $\mathcal{N}_{u}^{x}=\left\{i:g_{x}(i)=u\right\}$ and $n_{u}^{x}=\left|\mathcal{N}^{x}_{u}\right|$ , where $x$ can be replaced by $y$ .

For an adjacency matrix $A$ , Hu et al. (2021) used a test statistic, based on the maximum entry-wise deviation, to test the following two hypothesis tests:

(1) $H_{0}:K=K_{0}\quad\mathrm{v.s.}\quad H_{1}:K>K_{0}$ , and

(2) $H_{0}:g=g_{0}\quad\mathrm{v.s.}\quad H_{1}:g\neq g_{0}$ ,

where $K$ and $g$ denote the true number of communities and the true community label, respectively, and $K_{0}$ and $g_{0}$ denote a hypothetical number of communities and hypothetical community label, respectively. Let

L_{n}(K_{0},g_{0}):=\max_{1\leq i\leq n,1\leq v\leq K_{0}}|\widehat{\rho}_{iv}|,

where $\widehat{\rho}_{iv}=\dfrac{1}{\sqrt{|g^{-1}_{0}(v)/\{i\}|}}\sum_{j\in g^{-1}_{0}(v)/\{i\}}\dfrac{A_{ij}-\widehat{B}_{g_{0}(i)g_{0}(j)}}{\sqrt{\widehat{B}_{g_{0}(i)g_{0}(j)}(1-\widehat{B}_{g_{0}(i)g_{0}(j)})}}$ , and $g_{0}^{-1}(v)=\{i:g_{0}(i)=v\}$ , and $\left|g_{0}^{-1}(v)\right|$ is the number of nodes in block $v$ , and $g_{0}^{-1}(v)/\{i\}$ denotes the set of nodes that belong to community $v$ in $g_{x}$ but excluding node $i$ and $\widehat{B}_{g_{0}(i)g_{0}(j)}$ as defined in (2). Under the null hypothesis $H_{0}:K=K_{0},g=g_{0}$ , if $K=o(n/\log^{2}n)$ , Hu et al. (2021) showed that

\lim_{n\rightarrow\infty}\mathbb{P}\{L_{n}^{2}(K_{0},g_{0})-2\log(2K_{0}n)+\log\log(2K_{0}n)\leq y\}=\exp\left\{-\dfrac{1}{2\sqrt{\pi}}e^{-y/2}\right\},

and proposed the following test statistic:

T_{n}=L_{n}^{2}(K_{0},g_{0})-2\log(2K_{0}n)+\log\log(2K_{0}n).

Then the corresponding level- $\alpha$ rejection rule is

\mathrm{Reject:}\ H_{0}:K=K_{0},g=g_{0}\ \mathrm{if}\ T_{n}\geq t_{(1-\alpha)},

where $Q_{1-\alpha}$ is the $\alpha$ th quantile of the Gumbel distribution with $\mu=-2\log(2\sqrt{\pi})$ and $\beta=2$ .

In this article, We aim to develop a new test statistic that allows it can be used to test two samples. For two samples $\mathcal{X}$ and $\mathcal{Y}$ from stochastic block models $(g_{x},B_{x})$ and $(g_{y},B_{y})$ , respectively, let $\widehat{K}_{x}$ and $\widehat{K}_{y}$ be obtained by some estimation methods, such as the recursive approach in Zhao et al. (2011), the sequential testing method given in Lei (2016), the likelihood-based method in Saldña et al. (2017), the corrected Bayesian information criterion in Hu et al. (2020), the test based on maximum entry-wise deviation in Hu et al. (2021), and the spectral methods in Le & Levina (2022). In this article, we use the corrected Bayesian information criterion in Hu et al. (2020) to get consistent estimators $\widehat{K}_{x}$ and $\widehat{K}_{y}$ . Recall that given the number of communities $K_{x}$ and $K_{y}$ , the community labels $g_{x}$ and $g_{y}$ can be consistently estimated, denote by $\widehat{g}_{x}$ and $\widehat{g}_{y}$ , by some existing strongly consistent community detection procedures, e.g., the majority voting algorithm in Gao et al. (2017) and the profile-pseudo likelihood method in Wang et al. (2021). In view of this, throughout the paper, we formally assume that

\mathbb{P}\left\{\widehat{g}_{x}=g_{x},\widehat{g}_{y}=g_{y},\widehat{K}_{x}=K_{x},\widehat{K}_{y}=K_{y}\right\}\rightarrow 1,

(3)

which implies that we require that all estimators are strongly consistent. Hence, we can get the $\widehat{B}_{x}$ and $\widehat{B}_{y}$ by equation (2). Note that the community membership label $\widehat{g}_{x}$ and the probability matrix $\widehat{B}_{x}$ depend on the number of communities $\widehat{K}_{x}$ . If $\widehat{g}_{x}$ is a strongly consistent estimator, $\widehat{K}_{x}$ must be the consistent estimator. Then, Condition (3) can be relaxed to $\mathbb{P}\left\{\widehat{g}_{x}=g_{x},\widehat{g}_{y}=g_{y}\right\}\rightarrow 1$ . In fact, if $\widehat{K}_{x}\neq\widehat{K}_{y}$ , then it is natural that $\widehat{g}_{x}\neq\widehat{g}_{y}$ and $\widehat{B}_{x}\neq\widehat{B}_{y}$ . Since $\widehat{g}_{x}$ and $\widehat{g}_{y}$ are consistent estimators, we can reject the null hypothesis with probability tending to 1. Meanwhile, For case (iv), there is indeed a problem of identifiability. Because both $g$ and $B$ are allowed to vary, the existing methods cannot reasonably solve the problem of identifiability. Thus, in this article, we mainly focus on cases (ii) and (iii) for four alternative hypotheses. Then, we assume $K_{x}=K_{y}=K$ . Without loss of generality, we write $\widehat{K}=\widehat{K}_{x}=\widehat{K}_{y}$ when $\widehat{K}_{x}=\widehat{K}_{y}$ . Next, we formally state the test statistic. As mentioned in the introduction, our test statistic is motivated by the contrast of $\mathcal{X}$ (or $\mathcal{Y}$ ) and $\widehat{B}_{y}$ (or $\widehat{B}_{x}$ ), i.e.:

\tilde{\rho}_{iv}(g_{x},g_{y})=\dfrac{1}{\sqrt{\left|g_{x}^{-1}(v)/\{i\}\right|+\left|g_{y}^{-1}(v)/\{i\}\right|}}\times\\ \left[\sum\limits_{j\in g_{y}^{-1}(v)/\{i\}}\dfrac{X_{ij}-\widehat{B}^{y}_{g_{y}(i)g_{y}(j)}}{\sqrt{\widehat{B}^{y}_{g_{y}(i)g_{y}(j)}\left(1-\widehat{B}^{y}_{g_{y}(i)g_{y}(j)}\right)}}+\sum\limits_{j\in g_{x}^{-1}(v)/\{i\}}\dfrac{Y_{ij}-\widehat{B}^{x}_{g_{x}(i)g_{x}(j)}}{\sqrt{\widehat{B}^{x}_{g_{x}(i)g_{x}(j)}\left(1-\widehat{B}^{x}_{g_{x}(i)g_{x}(j)}\right)}}\right]

(4)

Moreover, based on the maximum entry-wise deviation, the proposed test statistic has the following form:

L_{n}(g_{x},g_{y})=\max_{1\leq i\leq n,1\leq v\leq K}\left|\tilde{\rho}_{iv}(g_{x},g_{y})\right|.

2.1 The Asymptotic Null Distribution

To obtain the asymptotic result for the statistic $L_{n}(g_{x},g_{y})$ , we first make the following assumptions:

Assumption 1.

The entries of $B_{x}$ and $B_{y}$ are uniformly bounded away from 0 and 1, and both $B_{x}$ and $B_{y}$ have no identical rows.

Assumption 2.

There exist $C_{1}$ , $C_{2}$ , $C_{3}$ and $C_{4}$ such that

C_{1}n/K_{x}\leq\min_{1\leq u\leq K_{x}}n_{u}^{x}\leq\max_{1\leq u\leq K_{x}}n_{u}^{x}\leq C_{2}n^{2}/(K_{x}^{2}\log^{2}n),

and,

C_{3}n/K_{y}\leq\min_{1\leq u\leq K_{y}}n_{u}^{y}\leq\max_{1\leq u\leq K_{y}}n_{u}^{y}\leq C_{4}n^{2}/(K_{y}^{2}\log^{2}n),

for all $n$ .

Assumption 1 requires that the entries of the probability matrices $B_{x}$ and $B_{y}$ are uniformly bounded away from 0 and 1. This assumption is similar to the corresponding condition in Lei (2016) and Hu et al. (2021). Meantime, Assumption 1 also requires that $B_{x}$ and $B_{y}$ are identifiable, which is a basic condition (Wang & Bickel, 2017). Assumption 2 not only requires a lower bound for the number of nodes in the smallest community, but also gives an upper bound for the number of nodes in the largest community. The lower bound requires that the number of nodes in the smallest community for $\mathcal{X}$ ( $\mathcal{Y}$ ) is at least proportional to $n/K_{x}$ ( $n/K_{y}$ ). This is a mild assumption and easy to be achieved. For example, this lower bound can be achieved when the community label $g_{x}$ is generated from a multinomial distribution with $n$ trials and parameter $\bm{\pi}=(\pi_{1},\cdots,\pi_{K})$ such that $\min_{u}\pi_{u}\geq C_{1}/K_{x}$ . For the assumption about the upper bound, Zhang & Zhou (2016) and Gao et al. (2017) also considered a similar condition, which is used to control the maximum within-group deviation between the $B_{x}$ ( $B_{y}$ ) and its estimator $\widehat{B}_{x}$ ( $\widehat{B}_{y}$ ).

We now give the asymptotic property of the test statistic $L_{n}(g_{x},g_{y})$ and delay the proof to the Appendix.

Theorem 1.

Suppose that Assumptions (1) and (2) hold. Then under the null hypothesis $H_{0}:(g_{x},B_{x})=(g_{y},B_{y})$ , as $n\rightarrow\infty$ , if $K=o(n/\log^{2}n)$ , we have

\mathbb{P}\left\{L_{n}^{2}(g_{x},g_{y})-2\log(2Kn)+\log\log(2Kn)\leq y\right\}\rightarrow\exp\left\{-\dfrac{1}{2\sqrt{\pi}}e^{-y/2}\right\},

(5)

where the right hand-side of (5) is the cumulative distribution function of the Gumbel distribution with $\mu=-2\log(2\sqrt{\pi})$ and $\beta=2$ .

Note that Theorem 1 demonstrates that $L_{n}^{2}(g_{x},g_{y})-2\log(2Kn)+\log\log(2Kn)$ converges in distribution to a Gumbel distribution under $H_{0}$ .

Using the above Theorem 1, we can implement hypothesis test (1) as follows. First, we estimate the number of communities $\widehat{K}_{x}$ and $\widehat{K}_{y}$ , and the community membership labels $(\widehat{g}_{x},\widehat{B}_{x})$ and $(\widehat{g}_{y},\widehat{B}_{y})$ , respectively. Then, we compute the statistic

T_{n}=L_{n}^{2}(\widehat{g}_{x},\widehat{g}_{y})-2\log(2\widehat{K}n)+\log\log(2\widehat{K}n).

Since $\widehat{g}_{x}$ and $\widehat{g}_{y}$ are strongly consistent estimators, we have that $T_{n}$ intuitively follows the Gumbel distribution with $\mu=-2\log(2\sqrt{\pi})$ and $\beta=2$ . To carry out the hypothesis test, we have a rejection rule:

\mathrm{Reject}\ \ H_{0},\quad T_{n}\geq Q_{1-\alpha},

where $Q_{1-\alpha}$ is the $\alpha$ th quantile of the Gumbel distribution with $\mu=-2\log(2\sqrt{\pi})$ and $\beta=2$ . In Section 4, by simulation studies, we investigate the finite sample performance of the proposed test statistic.

In general, the null distribution converges to the Gumbel distribution slowly, which is also confirmed in simulation studies. Bickel & Sarkar (2015) considered a bootstrap method for the correction of the distribution of the finite sample, which was also used by Lei (2016) and Hu et al. (2021). For our statistic, the bootstrap corrected test statistic is calculated as follows:

1. Using the consistent estimation method to estimate $\widehat{K}_{x}=\widehat{K}_{y}=\widehat{K}$ , then get the estimation $(\widehat{g}_{x},\widehat{B}_{x})$ and $(\widehat{g}_{y},\widehat{B}_{y})$ by the strongly consistent clustering method;

2. For $m=1,\ldots,M$ , generate $\mathcal{X}^{(m)}$ and $\mathcal{Y}^{(m)}$ from the edge-probability matrix $\widehat{P}_{ij}=(\widehat{P}_{ij}^{x}+\widehat{P}_{ij}^{y})/2=(\widehat{B}_{\widehat{g}_{x}(i)\widehat{g}_{x}(j)}^{x}+\widehat{B}_{\widehat{g}_{y}(i)\widehat{g}_{y}(j)}^{y})/2$ , and calculate $T_{n}^{(m)}$ based on $\mathcal{X}^{(m)}$ , $\mathcal{Y}^{(m)}$ ;

3. Using $\{T_{n}^{(m)}:m=1,\ldots,M\}$ to estimate the location and scale parameters $\widehat{\mu}$ and $\widehat{\beta}$ of the Gumbel distribution through maximum likelihood method.

4. The bootstrap corrected test statistic is calculated as

T_{n}^{boot}=\mu+\beta\left(\dfrac{T_{n}-\widehat{\mu}}{\widehat{\beta}}\right),

where $\mu=-2\log(2\sqrt{\pi})$ and $\beta=2$ .

2.2 THE ASYMPTOTIC POWER

In this subsection, we investigate the asymptotic power of the proposed test procedure. To guarantee good testing power, we need the following theoretical assumption.

Assumption 3.

The maximum grouped difference between $B_{x}$ and $B_{y}$ satisfies:

\max_{i,v}|\dfrac{1}{\sqrt{|g_{x}^{-1}(v)|}}\sum_{j\in g_{x}^{-1}(v)}(B_{g_{y}(i)g_{y}(j)}^{y}-B_{g_{x}(i)g_{x}(j)}^{x})|/\sqrt{\log n}\rightarrow\infty,

and

\max_{i,v}|\dfrac{1}{\sqrt{|g_{y}^{-1}(v)|}}\sum_{j\in g_{y}^{-1}(v)}(B_{g_{x}(i)g_{x}(j)}^{x}-B_{g_{y}(i)g_{y}(j)}^{y})|/\sqrt{\log n}\rightarrow\infty.

Assumption 3 specifies that under the alternative $H_{1}$ , the maximum grouped difference between $B_{x}$ and $B_{y}$ diverges faster than $\sqrt{\log n}$ . This assumption is analogous to (A2^′) in Hu et al. (2021). Then, the lower bound of the growth rate of the test statistic $T_{n}$ under the alternative hypothesis $H_{1}:(g_{x},B_{x})\neq(g_{y},B_{y})$ is given in the following Theorem.

Theorem 2.

Suppose that Assumptions (1), (2), and (3) hold. Then under the alternative hypothesis $H_{1}:(g_{x},B_{x})\neq(g_{y},B_{y})$ , as $n\rightarrow\infty$ , if $K=o(n/\log^{2}n)$ , we have

L_{n}^{2}(g_{x},g_{y})\geq 2\log(2Kn)-\log\log(2Kn)+c_{1}\log n+O_{p}(1),

for some some consistent $c_{1}>0$ .

The proof is collected in the Appendix. The above Theorem 2 shows that $T_{n}$ diverges faster than $\log n$ as $n\rightarrow\infty$ under the alternative hypothesis $H_{1}$ . Then, we have the following corollary saying that the asymptotic size is close to the nominal level and the asymptotic power is almost equal to 1.

Corollary 1.

Suppose that Assumptions (1), and (2) hold, given a nominal level $\alpha$ , we have

\mathbb{P}_{H_{0}}\{T_{n}>Q_{1-\alpha}\}\rightarrow\alpha,

and further, suppose that Assumption (3) holds, we have

\mathbb{P}_{H_{1}}\{T_{n}>Q_{1-\alpha}\}\rightarrow 1.

Therefore, the asymptotic null distribution in Theorem 1 and the growth rate in Theorem 2 suggest that the null hypothesis and the alternative hypothesis are well separated, and our proposed test is asymptotically powerful against alternative hypothesis $H_{1}:(g_{x},B_{x})\neq(g_{y},B_{y})$ .

3 EXTENSION

In this section, we extend the proposed method to the degree-corrected stochastic block model. In reality, a network may contain many high-degree nodes and lower-degree nodes, that is, the degree heterogeneity of nodes. To address this issue, Karrer & Newman (2011) proposed a degree-corrected stochastic block model. Specifically, for an undirected network $\mathcal{X}$ , this model assume that the edge $X_{ij}$ satisfies $\mathbb{P}\{X_{ij}=1\}=\theta_{i}\theta_{j}B_{g(i)g(j)}$ , where $\theta_{i}$ is the degree parameter for node $i$ . To ensure the identifiability of the model, we assume that $\sum_{i}\theta_{i}/n=1$ . For two sample networks $\mathcal{X}$ and $\mathcal{Y}$ , the testing problem has the following form:

H^{\prime}_{0}:(g_{x},B_{x},\theta_{x})=(g_{y},B_{y},\theta_{y})\qquad\mathrm{v.s.}\qquad H^{\prime}_{1}:(g_{x},B_{x},\theta_{x})\neq(g_{y},B_{y},\theta_{y}),

(6)

where $\theta_{x}$ and $\theta_{y}$ are the node degree parameters for networks $\mathcal{X}$ and $\mathcal{Y}$ , respectively.

Similar to the general stochastic block model, for the two-sample test of the degree-corrected stochastic block model, we consider the following test statistic:

L_{D}(g_{x},g_{y})=\max_{i,v}|\tau_{iv}|,

where

\tau_{iv}=\dfrac{1}{\sqrt{\left|g_{x}^{-1}(v)/\{i\}\right|+\left|g_{y}^{-1}(v)/\{i\}\right|}}\times\\ \left[\sum\limits_{j\in g_{y}^{-1}(v)/\{i\}}\dfrac{X_{ij}-\widehat{\theta}^{y}_{i}\widehat{\theta}^{y}_{j}\widehat{B}^{y}_{g_{y}(i)g_{y}(j)}}{\sqrt{\widehat{\theta}^{y}_{i}\widehat{\theta}^{y}_{j}\widehat{B}^{y}_{g_{y}(i)g_{y}(j)}\left(1-\widehat{\theta}^{y}_{i}\widehat{\theta}^{y}_{j}\widehat{B}^{y}_{g_{y}(i)g_{y}(j)}\right)}}+\right.\\ \left.\sum\limits_{j\in g_{x}^{-1}(v)/\{i\}}\dfrac{Y_{ij}-\widehat{\theta}^{x}_{i}\widehat{\theta}^{x}_{j}\widehat{B}^{x}_{g_{x}(i)g_{x}(j)}}{\sqrt{\widehat{\theta}^{x}_{i}\widehat{\theta}^{x}_{j}\widehat{B}^{x}_{g_{x}(i)g_{x}(j)}\left(1-\widehat{\theta}^{x}_{i}\widehat{\theta}^{x}_{j}\widehat{B}^{x}_{g_{x}(i)g_{x}(j)}\right)}}\right],

(7)

where $\widehat{\theta}^{x}_{i}=|g_{x}^{-1}(u)|d_{i}/\sum_{j\in\{j:g_{x}(j)=g_{x}(i)\}}d_{j}$ is the maximum likelihood estimator of $\theta^{x}_{i}$ and $d_{i}=\sum_{j}X_{ij}$ . $\widehat{\theta}^{y}_{i}$ is similar to that of $\widehat{\theta}^{x}$ . Note that it is difficult to obtain the asymptotic null distribution of $L_{D}(g_{x},g_{y})$ as the complex dependency between the entries $\tau_{iv}$ . By the simulation studies, we find that the empirical distribution of $T_{D}=L_{D}^{2}(\widehat{g}_{x},\widehat{g}_{y})-2\log(2\widehat{K}n)+\log\log(2\widehat{K}n)$ is also the Gumbel distribution with $\mu=-2\log(2\sqrt{\pi})$ and $\beta=2$ . Then, we have a rejection rule:

\mathrm{Reject}\ \ H^{\prime}_{0},\quad T_{D}\geq Q_{1-\alpha},

where $Q_{1-\alpha}$ is the $\alpha$ th quantile of the Gumbel distribution with $\mu=-2\log(2\sqrt{\pi})$ and $\beta=2$ . Similar to the general stochastic block model, we can obtain the bootstrap corrected statistic through the identical method.

4 SIMULATION

In this section, we evaluate the performance of the proposed test statistics in various simulation studies. Firstly, the number of communities is estimated by the Bayesian information criterion proposed by Hu et al. (2020), and the community membership label is estimated by strongly consistent estimation methods mentioned above, i.e., the profile-pseudo likelihood method in Wang et al. (2022), initialized by spectral clustering with permutations, is used to obtain the community membership label. In the simulation, we consider the test statistic $T_{n}=L_{n}^{2}(\widehat{g}_{x},\widehat{g}_{y})-2\log(2\widehat{K}n)+\log\log(2\widehat{K}n)$ and the bootstrap corrected test statistic $T_{n}^{boot}$ . In comparative experiments, the maximum deviation method (referred to as TST-MD) proposed by Chen et al. (2021a) and the spectral-based method (referred to as TST-S) proposed by Chen et al. (2021b) are used.

4.1 Simulation 1: The Null Distribution Under the Stochastic Block Model

In the first simulation, we consider the finite sample null distribution of test statistic $T_{n}$ and empirically verify Theorem 1 for two samples from a common stochastic block model. Since the limiting distribution of the test statistic is proven to be a Gumbel distribution, the location and scale parameters can be estimated by generating some bootstrap samples to correct the test statistic at a lower cost. In all of our simulations, we use $M=100$ .

We generate data from the stochastic block model with $B_{uv}^{x}=B_{uv}^{y}=0.1(1+4\times\mathds{1}(u=v))$ . In this setting, we set $K_{x}=K_{y}=3$ with $\pi_{1}=\pi_{2}=\pi_{3}=1/3$ . The sample size is either $n=600$ or $n=1200$ . We use the strongly consistent method to get $\widehat{g}_{x}$ and $\widehat{g}_{y}$ , then get $\widehat{B}_{x}$ and $\widehat{B}_{y}$ , e.g., the profile-pseudo likelihood method in Wang et al. (2021). In Figure 1, based on 1000 data replications, we plot the sample distribution of the test statistic $T_{n}$ with and without bootstrap correction. Figure 1 demonstrates that the empirical probability density function of $T_{n}$ biases upward, but the bias decrease as the sample size increases. Compared with the distribution of $T_{n}$ , the distribution of $T_{n}^{boot}$ fits the true distribution better.

Refer to caption — Figure 1: Null densities under the stochastic block model in Simulation 1 with $n=600$ (left plot) and $n=1200$ (right plot). The red dashed lines, blue dash-dotted lines, and black solid lines show the densities of the test statistic $T_{n}$ , the bootstrap corrected test statistic $T_{n}^{boot}$ , and the theoretical limit, respectively.

4.2 Simulation 2: Empirical Size for Hypothesis (1)

In this subsection, we study the empirical size under varying $K_{x}$ , $K_{y}$ , $B_{x}$ and $B_{y}$ . We set the edge probability between communities $u$ and $v$ as $B_{uv}^{x}=B_{uv}^{y}=r(1+3\times\mathds{1}(u=v))$ , where $r$ measures the sparsity of the network. Let $K_{x},K_{y}\in\{2,3,4,5,6,10,15,20,30\},r\in\{0.05,0.1,0.2\}$ , and the size of each block be 200. Table 1 reports the result from 200 data replications. From Table 1, $T_{n}$ ’s Type I errors are close to the nominal level when $K$ is small, $T_{n}^{boot}$ ’s Type I errors are close to the nominal level even when $K$ is large. Compared with TST-MD and TST-S, the performances of $T_{n}^{boot}$ and TST-S are better than TST-MD. In some cases, TST-MS does not work well. Inversely, the empirical size of $T_{n}^{boot}$ and TST-S are close to the nominal level. It is worth noting that as the sample size increases, the empirical size of the $T_{n}$ gradually increases, which seems to be contrary to fundamental theory. In fact, as the sample size increases, the number of communities also increases. As a result, the commonly used community label estimation algorithms cannot exactly recover the community label. But by the bootstrap correction, the bootstrap corrected statistic $T_{n}^{boot}$ has a good empirical size, which implies that the statistic with the bootstrap correction works well. We also notice that when the network is sparse, the test statistic $T_{n}$ does not have good empirical size, and tends to be a little oversized.

Table 1: Empirical size at nominal level

\alpha=0.05

for hypothesis test

H_{0}:(g_{x},B_{x})=(g_{y},B_{y})\ \mathrm{v.s.}\ H_{1}:(g_{x},B_{x})\neq(g_{y},B_{y})

. In the results of

T_{n}^{boot}

, the values in the parentheses are the empirical size of TST-MD (left) and TST-S (right), respectively.

	$T_{n}$			$T_{n}^{boot}$
	$r=0.05$	$r=0.1$	$r=0.2$	$r=0.05$	$r=0.1$	$r=0.2$
$K=2$	0.04	0.09	0.03	0.04 (0.09,0.03)	0.08 (0.1,0.02)	0.05 (0.01,0.05)
$K=3$	0.09	0.06	0.07	0.05 (0.085,0.04)	0.05 (0.04,0.03)	0.04 (0.01,0.03)
$K=4$	0.09	0.08	0.06	0.04 (0.01,0.04)	0.05 (0,0.03)	0.05 (0,0.04)
$K=5$	0.09	0.09	0.05	0.04 (0,0.04)	0.06 (0,0.04)	0.06 (0,0.03)
$K=6$	0.08	0.09	0.04	0.04 (0.005,0.05)	0.04 (0,0.05)	0.04 (0,0.03)
$K=10$	0.20	0.10	0.06	0.06 (0,0.04)	0.05 (0,0.04)	0.04 (0,0.06)
$K=15$	0.46	0.16	0.09	0.08 (0.01,0.04)	0.05 (0,0.05)	0.05 (0,0.04)
$K=20$	0.77	0.14	0.10	0.10 (0.03,0.04)	0.02 (0,0.05)	0.08 (0,0.05)
$K=30$	0.99	0.23	0.13	0.05 (0.04,0.05)	0.02 (0.005,0.06)	0.04 (0,0.05)

4.3 Simulation 3: Empirical Power for Hypothesis (1)

In this subsection, we investigate the empirical power under hypothesis test (1). We consider two kinds of alternatives: i) $B_{x}\neq B_{y}$ but $g_{x}=g_{y}$ and ii) $g_{x}\neq g_{y}$ but $B_{x}=B_{y}$ . Similar to simulation 2, let $K_{x},K_{y}\in\{2,3,4,5,6,10,20,30\}$ . The sample size is either $n=600$ or $n=1200$ . For the first alternative, we generate $\mathcal{X}$ with edge possibility $3.5r$ within community and $0.5r$ between communities, and $\mathcal{Y}$ with edge possibility $8r$ within community and $3r$ between communities for $r\in\{0.01,0.05,0.1\}$ . For the second alternative, we generate $\mathcal{X}$ and $\mathcal{Y}$ with edge probability $3r$ within community and $r$ between communities for $r\in\{0.05,0.1,0.2\}$ . We set $g_{x}=(1,\ldots,1,\ldots,K_{x},\ldots,K_{x})$ and $g_{y}$ from the multinomial distribution with $n$ trials and probability $\bm{\pi}=(1/K_{y},\cdots,1/K_{y})$ . Similarly, all simulations are run 200 times. For both alternatives, Tables 2 - 3 report the empirical power for hypothesis test (1). We observe that the proposed test statistics and TST-MD can successfully detect all alternative hypotheses. In the second alternative hypothesis, however, the empirical power of TST-S is less than 1. As discussed in the introduction, if the change of community structure is small, there will be little difference between the two edge-probability matrices. As a result, the TST-S will not separate the null hypothesis and the alternative hypothesis well.

Table 2: Empirical power of

T_{n}

at nominal level

\alpha=0.05

for the first alternative. The values in the parentheses are the empirical power of

T_{n}^{boot}

(left), TST-MD (middle), and TST-S (right), respectively.

	$n=600$			$n=1200$
	$r=0.01$	$r=0.05$	$r=0.1$	$r=0.01$	$r=0.05$	$r=0.1$
$K=2$	1 (1,1,1)	1 (1,1,1)	1 (1,1,1)	1 (1,1,1)	1 (1,1,1)	1 (1,1,1)
$K=3$	1 (1,1,1)	1 (1,1,1)	1 (1,1,1)	1 (1,1,1)	1 (1,1,1)	1 (1,1,1)
$K=4$	1 (1,1,1)	1 (1,1,1)	1 (1,1,1)	1 (1,1,1)	1 (1,1,1)	1 (1,1,1)
$K=5$	1 (1,1,1)	1 (1,1,1)	1 (1,1,1)	1 (1,1,1)	1 (1,1,1)	1 (1,1,1)
$K=6$	1 (1,1,1)	1 (1,1,1)	1 (1,1,1)	1 (1,1,1)	1 (1,1,1)	1 (1,1,1)
$K=10$	1 (1,1,1)	1 (1,1,1)	1 (1,1,1)	1 (1,1,1)	1 (1,1,1)	1 (1,1,1)
$K=15$	1 (1,1,1)	1 (1,1,1)	1 (1,1,1)	1 (1,1,1)	1 (1,1,1)	1 (1,1,1)
$K=20$	1 (1,1,1)	1 (1,1,1)	1 (1,1,1)	1 (1,1,1)	1 (1,1,1)	1 (1,1,1)
$K=30$	1 (1,1,1)	1 (1,1,1)	1 (1,1,1)	1 (1,1,1)	1 (1,1,1)	1 (1,1,1)

Table 3: Empirical power of

T_{n}

at nominal level

\alpha=0.05

for the second alternative. The values in the parentheses are the empirical power of

T_{n}^{boot}

(left), TST-MD (middle), and TST-S (right), respectively.

	$n=600$			$n=1200$
	$r=0.01$	$r=0.05$	$r=0.1$	$r=0.01$	$r=0.05$	$r=0.1$
$K=2$	1 (1,1,0.04)	1 (1,1,0.17)	1 (1,1,0.54)	1 (1,1,0.04)	1 (1,1,0.27)	1 (1,1,0.57)
$K=3$	1 (1,1,0.06)	1 (1,1,0.21)	1 (1,1,0.51)	1 (1,1,0.02)	1 (1,1,0.39)	1 (1,1,0.55)
$K=4$	1 (1,1,0.045)	1 (1,1,0.16)	1 (1,1,0.45)	1 (1,1,0.09)	1 (1,1,0.35)	1 (1,1,0.63)
$K=5$	1 (1,1,0.09)	1 (1,1,0.13)	1 (1,1,0.32)	1 (1,1,0.07)	1 (1,1,0.33)	1 (1,1,0.58)
$K=6$	1 (1,1,0.06)	1 (1,1,0.08)	1 (1,1,0.19)	1 (1,1,0.05)	1 (1,1,018)	1 (1,1,0.5)
$K=10$	1 (1,0.91,0.09)	1 (1,1,0.06)	1 (1,1,0.09)	1 (1,1,0.08)	1 (1,1,0.07)	1 (1,1,0.09)
$K=15$	1 (1,1,0.07)	1 (1,0.65,0.07)	1 (1,0.99,0.09)	1 (1,1,0.08)	1 (1,1,0.07)	1 (1,1,0.1)
$K=20$	1 (1,1,0.12)	1 (1,0.82,0.06)	1 (1,0.79,0.06)	1 (1,1,0.09)	1 (1,1,0.07)	1 (1,1,0.1)
$K=30$	1 (1,1,0.1)	1 (1,0.95,0.09)	1 (1,0.75,0.07)	1 (1,1,0.73)	1 (1,1,0.06)	1 (1,1,0.09)

4.4 Simulation 4: The Null Distribution Under the Degree-Corrected Stochastic Block Model

In this subsection, we investigate the asymptotic null distribution of the proposed statistic $T_{D}$ . Similar to Simulation 1, we also consider the bootstrap corrected statistic. The bootstrap corrected method is similar to the general stochastic block model and is given in the Appendix.

As shown in Zhao et al. (2012), we use the same approach to generate the degree corrected parameters $\theta$ . The details are as follows:

\theta_{i}=\begin{dcases}\xi_{i},&\mathrm{w.p.}0.8,\\ 9/11,&\mathrm{w.p.}0.1,\\ 13/11,&\mathrm{w.p.}0.1,\end{dcases}

where $\xi_{i}$ is a uniformly random variable on the interval $[4/5,6/5]$ . Similar to Simulation 1, we set $K_{x}=K_{y}=3$ with $\pi_{1}=\pi_{2}=\pi_{3}=1/3$ and $B_{uv}^{x}=B_{uv}^{y}=0.1(1+4\times\mathds{1}(u=v))$ . We also consider sample sizes $n=600$ and $1200$ . Based on 1000 data replications, we report the results in Figure 2. Figure 2 shows the empirical distribution of the test statistic $T_{D}$ is the Gumbel distribution by a location and scale shift. With the sample size increasing, the deviation between the empirical distribution of $T_{D}$ and the limiting distribution does not decrease. After using the bootstrap correction, the empirical distribution of $T_{D}^{boot}$ is close to the limiting distribution. It also implies that the empirical size of the test statistic $T_{D}$ is smaller than the nominal level.

4.5 Simulation 5: Empirical Size for Hypothesis (6)

In this subsection, we consider the empirical size under the framework of the degree-corrected stochastic block model. The basic settings are similar to Simulation 2 except $K_{x},K_{y}\in\{2,3,4,6\}$ . The degree corrected parameters are generated by the method in Simulation 4. Table 4 shows the simulation results. It is observed that the probability of Type I error is close to the nominal level. At the same time, the empirical size of the statistic $T_{D}$ is less than that of $T_{D}^{boot}$ , which is also consistent with the results of Simulation 4.

Table 4: Empirical size at nominal level

\alpha=0.05

for hypothesis test

H_{0}:(g_{x},B_{x},\theta_{x})=(g_{y},B_{y},\theta_{y})\ \mathrm{v.s.}\ H_{1}:(g_{x},B_{x},\theta_{x})\neq(g_{y},B_{y},\theta_{y})

. The values in the parentheses are the empirical power of

T_{n}^{boot}

	$T_{D}$			$T_{D}^{boot}$
	$r=0.05$	$r=0.1$	$r=0.2$	$r=0.05$	$r=0.1$	$r=0.2$
$K=2$	0.01	0.07	0.05	0.04	0.05	0.06
$K=3$	0.01	0.01	0.04	0.04	0.06	0.04
$K=4$	0.04	0.01	0.04	0.05	0.04	0.05
$K=6$	0.06	0.01	0.05	0.06	0.04	0.05

4.6 Simulation 6: Empirical Power for Hypothesis (6)

In this simulation, we investigate the testing power for the two-sample test under the degree-corrected stochastic block model. We only consider the case of $g_{x}\neq g_{y}$ . The probability matrix and the community label are generated the same as in Simulation 3. We consider the settings that $n=600$ and $1200$ . The degree corrected parameters are similar to Simulation 4. The empirical results are shown in Table 5. From Table 5, we can know that the proposed test statistic can successfully detect the alternative hypotheses as the test has good power.

Table 5: Empirical power of

T_{D}

at nominal level

\alpha=0.05

for hypothesis test

H_{0}:(g_{x},B_{x},\theta_{x})=(g_{y},B_{y},\theta_{y})\ \mathrm{v.s.}\ H_{1}:(g_{x},B_{x},\theta_{x})\neq(g_{y},B_{y},\theta_{y})

	$T_{D}$			$T_{D}^{boot}$
	$r=0.05$	$r=0.1$	$r=0.2$	$r=0.05$	$r=0.1$	$r=0.2$
$K=2$	1	1	1	1	1	1
$K=3$	1	1	1	1	1	1
$K=4$	1	1	1	1	1	1
$K=6$	0.98	1	1	1	1	1

5 DATA EXAMPLE

5.1 Gene co-expression data

In this subsection, we apply the proposed method to the gene dataset of developing rhesus monkeys’ tissue from the medial prefrontal cortex. This dataset was originally collected by Bakken et al. (2016). Other work analyzing the data suggests this is an appropriate dataset, as other work already has good evidence that gene co-expression patterns in monkey tissues in this brain region change significantly as they develop. For the prenatal period, a 6-layer network is considered, which corresponds to the 6 age stage. We label the 6-layer network as E40 to E120 to indicate the number of embryonic days of age. For the postnatal period, a 5-layer network is considered, which indicates the 5 layers within the medial prefrontal cortex. We label the 5-layer network as L2 to L6. With this data, we aim to show whether two gene co-expression networks in two different periods can be considered to come from a common stochastic block model using the proposed method.

Preprocessing Procedure. The microarray dataset contains $n=9173$ genes measured among many samples across the $L=11$ layer. In this article, since we consider the difference between the two samples, we only choose the gene co-expression networks in two periods E40 and E50, that is, the development occurs from 40 days to 50 days in the embryo. Similar to other work, e.g., Langfelder & Horvath (2008), we preprocess the adjacency matrix as follows. First, to calculate the adjacency matrix, for a layer $l$ , we construct the Pearson correlation matrix. Define the co-expression similarity $s_{ij}$ as the correlation coefficient between the profiles of nodes $i$ and $j$ : $s_{ij}=cov(x_{i},x_{j}),1\leq i,j\leq n$ . In order to avoid the outliers, let $s_{ij}^{\prime}=(1+s_{ij})/2$ . Then, using a thresholding procedure, the co-expression similarity matrix $S=(s^{\prime}_{ij})_{n\times n}$ is transformed into the adjacency matrix $\mathcal{X}$ . An unweighted network adjacency $X_{ij}$ between gene expression profiles $x_{i}$ and $x_{j}$ can be defined by hard thresholding the co-expression similarity $s^{\prime}_{ij}$ as

X_{ij}=\begin{dcases}1&|s^{\prime}_{ij}|\geq\tau,\\ 0&|s^{\prime}_{ij}|<\tau,\end{dcases}\quad\mathrm{and}\quad X_{ii}=0,

where $\tau$ is the “hard” threshold parameter. Thus, two genes are linked ( $X_{ij}=1$ ) if the absolute correlation between their expression profiles exceeds the hard threshold $\tau$ . In this article, we set $\tau=0.72$ to get two adjacency matrices $\mathcal{X}$ and $\mathcal{Y}$ . Lastly, we remove all the genes corresponding to nodes whose total degree for $\mathcal{X}$ and $\mathcal{Y}$ is less than 90. The purpose of this is to remove those nodes with few connections, which process can be seen in Lei & Lin (2022). Finally, we have two adjacency matrices $\mathcal{X},\mathcal{Y}\in\{0,1\}^{4722\times 4722}$ , each representing a network corresponding to 4722 genes.

Result and Interpretation. The following results show the difference between the gene co-expression networks of E40 and E50 using the proposed method based on the maximum entry-wise deviation. Prior to using our method, we select the number of communities to be $K_{E40}=K_{E50}=8$ , which is considered reasonable, as described in Lei & Lin (2022). Then, we use $T_{n}$ and $T_{n}^{boot}$ as test statistic, and obtain $T_{n}=355748.3$ and $T_{n}^{boot}=83.35$ , respectively. Since, $Q_{0.95}=3.41$ for the Gumbel distribution, we reject $H_{0}:(g_{E40},B_{E40})=(g_{E50},B_{E50})$ at the level of 0.05. As the discussion in Lei & Lin (2022), as the development occurs from 40 days to 50 days in the embryo, there are different gene communities that are most connected. From 40 days in embryo, community 1 was highly coordinated (i.e. densely connected), and to 40 days in embryo, community 3 was highly coordinated. Hence, it can be considered that the two networks in two periods E40 and E50 come from two different stochastic block models. The results are similar to that of Liu et al. (2018) and Lei & Lin (2022).

5.2 International trade data

Now, we study an international trade dataset originally analyzed by Westveld & Hoff (2011), containing yearly international trade data between $n=58$ countries from 1981 to 2000. For this network, an adjacency matrix $A_{t}$ can be formed by first considering a weight matrix $W_{t}$ with $W_{ijt}=\mathrm{Trade}_{ijt}+\mathrm{Trade}_{jit}$ in given year $t$ , where $\mathrm{Trade}_{ijt}$ denotes the value of exports from country $i$ to country $j$ in year $t$ . Finally, we define $A_{ijt}=1$ if $W_{ijt}\geq W_{0.5,t}$ , and $A_{ijt}=0$ otherwise, where $W_{0.5,t}$ denotes the 50th percentile of $\{W_{ijt}\}_{1\leq i<j\leq n}$ in year $t$ . In this article, we focus on the international trade networks in 1995 and 1999. Thus, we can obtain two sample networks $\mathcal{X}$ and $\mathcal{Y}$ in 1995 and 1999. For the network in 1995, the number of communities is estimated to be 7 in Hu et al. (2021). For the network in 1999, using the corrected Bayesian information criterion, the number of communities in 1999 is also estimated to be $7$ . Then, we set $\widehat{K}_{x}=\widehat{K}_{y}=7$ and continue to implement the proposed method. We obtain $T_{n}=113.7352$ and $T_{n}^{boot}=92.46558$ . Since, $Q_{0.95}=3.41$ for the Gumbel distribution, we reject $H_{0}:(g_{x},B_{x})=(g_{y},B_{y})$ at the level of 0.05. Hence, we can assert that the trade situation of different countries has changed in the four years from 1995 to 1999.

6 DISSCUSSION

In this article, we have proposed a novel two-sample test statistic based on the maximum entry-wise deviation of staggered centered and rescale observed adjacency matrix, and have demonstrated that the asymptotic null distribution of the test statistic is a Gumbel distribution when $K_{x}=K_{y}=o(n/\log^{2}n)$ . This test has extended the method of Hu et al.(2021) to two samples. In our study, the difference between two networks is assessed by the sum of two residual matrices, where the centered and rescaled matrix of another sample is obtained using the estimation of the label and the probability matrix of one sample. Empirically, we have demonstrated that the size and the power of the test are valid. In this article, we assume that the numbers of communities for $\mathcal{X}$ and $\mathcal{Y}$ are equal. In reality, we should determine whether $K_{x}$ is equal to $K_{y}$ . In fact, we can independently estimate the number of communities by some existing methods, such as the sequentially testing methods (Lei, 2016: Hu et al., 2021) and methods based on the model selection (Hu et al., 2020). If $\widehat{K}_{x}=\widehat{K}_{y}$ , the proposed method can be directly used. When $K_{x}\neq K_{y}$ , we can consider the new community structure by combining two stochastic block models. Figure 3 gives a diagram. Let $Z_{x}\in\{0,1\}^{n\times K_{x}},Z_{y}\in\{0,1\}^{n\times K_{y}},Z^{\prime}\in\{0,1\}^{n\times K^{\prime}}$ be membership matrices. Note that each row of $Z_{x}$ (or $Z_{y}$ ) has exactly one entry that is nonzero. For two nodes $i_{1}$ and $i_{2}$ , $Z^{\prime}_{i_{1}\cdot}=Z^{\prime}_{i_{2}\cdot}$ if and only if $\bar{Z}_{i_{1}\cdot}=\bar{Z}_{i_{2}\cdot}$ , where $\bar{Z}=(Z_{x},Z_{y})\in\{0,1\}^{n\times(K_{x}+K_{y})}$ be the combined membership matrix. Hence, we can use $\widehat{g}_{x}$ and $\widehat{g}_{y}$ to obtain $g^{\prime}$ , then obtain $\widehat{B}_{x}$ and $\widehat{B}_{y}$ by $g^{\prime}$ and $K^{\prime}$ . Then, we can use our proposed method to test the difference between two samples after combining two community structures.

It is worth noting that the proposed testing method works well when the network is dense and $K$ is small in our simulation studies. However, when the network is sparse or $K$ is large, condition (3) may be violated, i.e., the exact community label estimator is hard to be obtained. Hence, it would be of interest to investigate the possibility of the sparse network and big $K$ in future work. In addition, note that although the proposed method can identify whether the two models are the same through two samples, we cannot know whether this difference is caused by changes in $g$ or $B$ , which is crucial in practical applications. Intuitively, we can judge whether $g_{x}$ is equal to $g_{y}$ by the strong consistent estimator $\widehat{g}_{x}$ and $\widehat{g}_{y}$ . However, there is no theoretical guarantee. This issue will be considered in future work. In order to better improve the two-sample test of the stochastic block model, we will continue to study this issue in future work.

7 APPENDIX

We start with three lemmas that will be used in the proof. The following Poisson approximation result is essentially a special case of Theorem 1 in Arratia et al. (1989).

Lemma 1.

lemma 1.[Arratia et al. (1989)] Let $I$ be an index set and $\{\mathbf{B}_{\alpha},\alpha\in I\}$ be a set of subsets of $I$ , that is, $\mathbf{B}_{\alpha}\subset I$ . Let also $\{\eta_{\alpha},\alpha\subset I\}$ be random variables. For a given $t\in\mathbb{R}$ , set $\lambda=\sum_{\alpha\in I}\mathbb{P}\{\eta_{\alpha}>t\}$ . Then

\left|\mathbb{P}\{\max_{\alpha\in I}\eta_{\alpha}\leq t\}-e^{-\lambda}\right|\leq(1\wedge\lambda^{-1})(b_{1}+b_{2}+b_{3}),

where

b_{1}=\sum_{\alpha\in I}\sum_{\beta\in\mathbf{B}_{\alpha}}\mathbb{P}\{\eta_{\alpha}>t\}\mathbb{P}\{\eta_{\beta}>t\},b_{2}=\sum_{\alpha\in I}\sum_{\alpha\neq\beta\in\mathbf{B}_{\alpha}}\mathbb{P}\{\eta_{\alpha}>t,\eta_{\beta}>t\},

b_{3}=\sum_{\alpha\in I}\mathbb{E}|\mathbb{P}\{\eta_{\alpha}>t|\sigma(\eta_{\beta},\beta\neq{\mathbf{B}_{\alpha})\}}-\mathbb{P}\{\eta_{\beta}>t\}|,

and $\sigma(\eta_{\beta},\beta\notin\mathbf{B}_{\alpha})$ is the $\sigma-$ algebra generated by $\{\eta_{\beta},\beta\notin\mathbf{B}_{\alpha}\}$ . In particular, if $\eta_{\alpha}$ is independent of $\{\eta_{\beta},\beta\notin\mathbf{B}_{\alpha}\}$ for each $\alpha$ then $b_{3}=0$ .

Lemma 2.

lemma 2.[Chen (1990)] Suppose $\xi_{1},\ldots,\xi_{n}$ are i.i.d random variables with $\mathbb{E}\xi_{1}=0$ and $\mathbb{E}\xi_{1}^{2}=1$ . Set $S_{n}=\sum_{i=1}^{n}\xi_{i}$ . Let $0<\alpha\leq 1$ and $\{a_{n}:n\geq 1\}$ satisfy that $a_{n}\rightarrow\infty$ and $a_{n}=o(n^{\alpha/(2(2-\alpha))})$ . If $\mathbb{E}e^{t_{0}|\xi_{1}|^{\alpha}}<\infty$ for some $t_{0}>0$ , then

\lim_{n}\frac{1}{a_{n}^{2}}\log\mathbb{P}\{\frac{S_{n}}{\sqrt{n}a_{n}}\geq\mu\}=-\frac{\mu^{2}}{2}

for any $\mu>0$ .

Lemma 3.

lemma 3.[Cai & Jiang (2011)] Suppose $\xi_{1},\ldots,\xi_{n}$ are i.i.d random variables with $\mathbb{E}\xi_{1}=0$ and $\mathbb{E}\xi_{1}^{2}=1$ and $\mathbb{E}e^{t_{0}|\xi_{1}|^{\alpha}}<\infty$ for some $t_{0}>0$ and $0<\alpha\leq 1$ . Set $S_{n}=\sum_{i=1}^{n}\xi_{i}$ and $\beta=\alpha/(2+\alpha)$ . Then, for any $\{p_{n}:n\geq 1\}$ with $0<p_{n}\rightarrow\infty$ and $\log p_{n}=o(n^{\beta})$ and $\{y_{n}:n\geq 1\}$ with $y_{n}\rightarrow y>0$ ,

\mathbb{P}\{\dfrac{S_{n}}{\sqrt{n\log p_{n}}}>y_{n}\}\sim\dfrac{p_{n}^{-y_{n}^{2}/2}(\log p_{n})^{-1/2}}{\sqrt{2\pi}y}

as $n\rightarrow\infty$ .

7.1 Proof of Theorem 1

Although the idea of proof of the Theorem 1 is similar to Theorem 1 in Hu et al. (2021), there are some differences in some details, such as decomposition for some quantities. Next, we prove Theorem 1.

Since, $H_{0}:(g_{x},B_{x})=(g_{y},B_{y})$ , we denote $g=g_{x}=g_{y}$ and $B=B_{x}=B_{y}$ for simplicity. By Bernstein’s inequality, we have

\mathbb{P}\{|\widehat{B}_{uv}-B_{uv}|>\dfrac{\epsilon\log n}{\sqrt{n_{u}n_{v}}}\}\leq 2\exp\left(-\dfrac{\epsilon^{2}\log^{2}n}{2B_{uv}(1-B_{uv})+\frac{2}{3}\epsilon\log n/\sqrt{n_{u}n_{v}}}\right)

for any $\epsilon>0$ , According to Assumption 2, we can get

\widehat{B}_{uv}-B_{uv}=o_{p}(\dfrac{\log n}{\sqrt{n_{u}n_{v}}})=o_{p}(\dfrac{K\log n}{n})

uniformly in $u$ and $v$ as $n\rightarrow\infty$ .

Notice that

	$\displaystyle\tilde{\rho}_{iv,x}$	$\displaystyle:=\sum\limits_{j\in g_{y}^{-1}(v)/\{i\}}\dfrac{X_{ij}-\widehat{B}^{y}_{g_{y}(i)g_{y}(j)}}{\sqrt{\widehat{B}^{y}_{g_{y}(i)g_{y}(j)}\left(1-\widehat{B}^{y}_{g_{y}(i)g_{y}(j)}\right)}}$
		$\displaystyle\ =\sum\limits_{j\in g^{-1}(v)/\{i\}}\dfrac{X_{ij}-B_{g(i)g(j)}+o_{p}(\frac{K\log n}{n})}{\sqrt{B_{g(i)g(j)}\left(1-B_{g(i)g(j)}\right)}}\left(1+o_{p}(\sqrt{\dfrac{K\log n}{n}})\right)$
		$\displaystyle\ =\rho_{iv,x}+\rho_{iv,x}o_{p}(\sqrt{\dfrac{K\log n}{n}})+o_{p}(1),$

similarly,

	$\displaystyle\tilde{\rho}_{iv,y}$	$\displaystyle:=\sum\limits_{j\in g_{x}^{-1}(v)/\{i\}}\dfrac{Y_{ij}-\widehat{B}^{x}_{g_{x}(i)g_{x}(j)}}{\sqrt{\widehat{B}^{x}_{g_{x}(i)g_{x}(j)}\left(1-\widehat{B}^{x}_{g_{x}(i)g_{x}(j)}\right)}}$
		$\displaystyle\ =\sum\limits_{j\in g^{-1}(v)/\{i\}}\dfrac{Y_{ij}-B_{g(i)g(j)}+o_{p}(\frac{K\log n}{n})}{\sqrt{B_{g(i)g(j)}\left(1-B_{g(i)g(j)}\right)}}\left(1+o_{p}(\sqrt{\dfrac{K\log n}{n}})\right)$
		$\displaystyle\ =\rho_{iv,y}+\rho_{iv,y}o_{p}(\sqrt{\dfrac{K\log n}{n}})+o_{p}(1),$

where $\rho_{iv,x}=\sum\limits_{j\in g^{-1}(v)/\{i\}}\dfrac{X_{ij}-B_{g(i)g(j)}}{\sqrt{B_{g(i)g(j)}\left(1-B_{g(i)g(j)}\right)}}$ and $\rho_{iv,y}=\sum\limits_{j\in g^{-1}(v)/\{i\}}\dfrac{Y_{ij}-B_{g(i)g(j)}}{\sqrt{B_{g(i)g(j)}\left(1-B_{g(i)g(j)}\right)}}$ .

Let

	$\displaystyle L_{n,0}$	$\displaystyle:=\max_{i,v}\|\tilde{\rho}_{iv}\|$
		$\displaystyle\ =\max_{i,v}\left\|\dfrac{1}{\sqrt{\left\|g_{x}^{-1}(v)/\{i\}\right\|+\left\|g_{y}^{-1}(v)/\{i\}\right\|}}\left(\tilde{\rho}_{iv,x}+\tilde{\rho}_{iv,y}\right)\right\|$
		$\displaystyle\ =\max_{i,v}\left\|\dfrac{1}{\sqrt{2\left\|g^{-1}(v)/\{i\}\right\|}}\left[\rho_{iv,x}+\rho_{iv,y}+(\rho_{iv,x}+\rho_{iv,y})o_{p}(\sqrt{\dfrac{K\log n}{n}})+o_{p}(1)\right]\right\|$
		$\displaystyle\ =L_{n,1}+L_{n,1}o_{p}(\sqrt{\dfrac{K\log n}{n}})+o_{p}(1),$

where

	$\displaystyle L_{n,1}$	$\displaystyle:=\max_{i,v}\left\|\rho_{iv}\right\|$
		$\displaystyle\ =\max_{i,v}\left\|\dfrac{1}{\sqrt{2\left\|g^{-1}(v)/\{i\}\right\|}}\left(\rho_{iv,x}+\rho_{iv,y}\right)\right\|$
		$\displaystyle\ =\max_{i,v}\left\|\dfrac{1}{\sqrt{\left\|g^{-1}(v)/\{i\}\right\|}}\sum\limits_{j\in g^{-1}(v)/\{i\}}\dfrac{X_{ij}+Y_{ij}-2B_{g(i)g(j)}}{\sqrt{2B_{g(i)g(j)}\left(1-B_{g(i)g(j)}\right)}}\right\|.$

If $K=o(n/\log^{2}n)$ and $L_{n,1}=O_{p}(\sqrt{\log n})$ , we have

L_{n,0}=L_{n,1}+o_{p}(1).

Thus, to prove Theorem 1 (5), it is sufficient to show

\lim_{n}\mathbb{P}\left\{L_{n,1}^{2}-2\log(2Kn)+\log\log(2Kn)\leq y\right\}=\exp\left\{-\dfrac{1}{2\sqrt{\pi}}e^{-y/2}\right\}.

Let $y_{n}=\sqrt{y+2\log(2Kn)-\log\log(2Kn)},I=\{(i,v)|1\leq i\leq n,1\leq v\leq K\},\mathbf{B}_{iv}=\{(s,t)\in I/(i,v)|s=i\}$ . Then $|\mathbf{B}_{iv}|=K-1$ . Note that $\mathbb{E}\{\dfrac{X_{ij}+Y_{ij}-2B_{g(i)g(j)}}{\sqrt{2B_{g(i)g(j)}\left(1-B_{g(i)g(j)}\right)}}\}=0$ and $\mathbb{E}\{\dfrac{X_{ij}+Y_{ij}-2B_{g(i)g(j)}}{\sqrt{2B_{g(i)g(j)}\left(1-B_{g(i)g(j)}\right)}}\}^{2}=1$ . By Lemma 1, we have

\left|\mathbb{P}\{\max_{i,v}|\rho_{iv}|\leq y_{n}\}-e^{-\lambda_{n}}\right|\leq b_{1}+b_{2},

where $\lambda_{n}=\sum_{i,v}\mathbb{P}\{|\rho_{iv}|>y_{n}\}$ . By Lemma 3, we have

	$\displaystyle\mathbb{P}\{\rho_{iv}>y_{n}\}$	$\displaystyle=\mathbb{P}\{\dfrac{\rho_{iv}}{\sqrt{\log(2Kn)}}>\sqrt{\dfrac{y+2\log(2Kn)-\log\log(2Kn)}{\log(2Kn)}}\}$
		$\displaystyle\sim(2Kn)^{-\frac{y+2\log(2Kn)-\log\log(2Kn)}{\log(2Kn)}}(\log(2Kn))^{-1/2}/(2\sqrt{\pi})$
		$\displaystyle=(2Kn)^{-1}(2Kn)^{-\frac{y}{2\log(2Kn)}}(2Kn)^{\frac{\log\log(2Kn)}{2\log(2Kn)}}(\log(2Kn))^{-\frac{1}{2}}/(2\sqrt{\pi})$
		$\displaystyle=(2Kn)^{-1}e^{-\frac{y}{2\log(2Kn)}\log(2Kn)}e^{\frac{\log\log(2Kn)}{2\log(2Kn)}\log(2Kn)}(\log(2Kn))^{-\frac{1}{2}}/(2\sqrt{\pi})$
		$\displaystyle=(2Kn)^{-1}e^{-y/2}e^{\log(\log(2Kn))^{\frac{1}{2}}}(\log(2Kn))^{-\frac{1}{2}}/(2\sqrt{\pi})$
		$\displaystyle=(2Kn)^{-1}e^{-y/2}(\log(2Kn))^{\frac{1}{2}}(\log(2Kn))^{-\frac{1}{2}}/(2\sqrt{\pi})$
		$\displaystyle=(2Kn)^{-1}e^{-y/2}/(2\sqrt{\pi}).$

Hence,

	$\displaystyle\lambda_{n}$	$\displaystyle=\sum_{i,v}\mathbb{P}\{\|\rho_{iv}\|>y_{n}\}$
		$\displaystyle=Kn\dfrac{(Kn)^{-1}}{2\sqrt{\pi}}e^{-y/2}$
		$\displaystyle=\dfrac{1}{2\sqrt{\pi}}e^{-y/2}.$

Meanwhile,

	$\displaystyle b_{1}$	$\displaystyle=\sum_{\alpha\in I}\sum_{\beta\in\mathbf{B}_{\alpha}}\mathbb{P}\{\eta_{\alpha}>y_{n}\}\mathbb{P}\{\eta_{\alpha}>y_{n}\}$
		$\displaystyle<4K^{2}ne^{-y-2\log(2Kn)+\log\log(2Kn)}$
		$\displaystyle=e^{\log(4K^{2}n)-y-2\log(2Kn)+\log\log(2Kn)}$
		$\displaystyle=o(1),$

	$\displaystyle b_{2}$	$\displaystyle=\sum_{\alpha\in I}\sum_{\alpha\neq\beta\in\mathbf{B}_{\alpha}}\mathbb{P}\{\eta_{\alpha}>y_{n},\eta_{\alpha}>y_{n}\}$
		$\displaystyle<4K^{2}ne^{-y-2\log(2Kn)+\log\log(2Kn)}$
		$\displaystyle=e^{\log(4K^{2}n)-y-2\log(2Kn)+\log\log(2Kn)}$
		$\displaystyle=o(1).$

Thus, we have

\lim_{n}\mathbb{P}\{L_{n,1}^{2}\leq y_{n}\}=\exp\left\{-\dfrac{1}{2\sqrt{\pi}}e^{-y/2}\right\},

Combining this with $L_{n,0}=L_{n,1}+o_{p}(1)$ , we know that (5) hlods. $\square$

7.2 Proof of Theorem 2

Although the idea of proof of the Theorem 2 is similar to Theorem 2 in Hu et al. (2021), there are some differences in some details, such as decomposition for some quantities. Next, we prove Theorem 2.

Similar to the proof of Theorem 1, we have

\widehat{B}_{uv}^{x}-B_{uv}^{x}=o_{p}(\dfrac{K\log n}{n}),\ \mathrm{and}\ \widehat{B}_{uv}^{y}-B_{uv}^{y}=o_{p}(\dfrac{K\log n}{n}).

Let

\rho_{iv,x}=\dfrac{1}{\sqrt{|g_{y}^{-1}(v)/\{i\}|}}\sum\limits_{j\in g_{y}^{-1}(v)/\{i\}}\dfrac{X_{ij}-B_{g_{x}(i)g_{x}(j)}^{x}}{\sqrt{B_{g_{y}(i)g_{y}(j)}^{y}\left(1-B_{g_{y}(i)g_{y}(j)}^{y}\right)}},

	$\displaystyle\tilde{\rho}_{iv,x}$	$\displaystyle:=\dfrac{1}{\sqrt{\|g_{y}^{-1}(v)/\{i\}\|}}\sum\limits_{j\in g_{y}^{-1}(v)/\{i\}}\dfrac{X_{ij}-\widehat{B}^{y}_{g_{y}(i)g_{y}(j)}}{\sqrt{\widehat{B}^{y}_{g_{y}(i)g_{y}(j)}\left(1-\widehat{B}^{y}_{g_{y}(i)g_{y}(j)}\right)}}$
		$\displaystyle\ =\dfrac{1}{\sqrt{\|g_{y}^{-1}(v)/\{i\}\|}}$
		$\displaystyle\qquad\times\sum\limits_{j\in g_{y}^{-1}(v)/\{i\}}\dfrac{X_{ij}-B_{g_{x}(i)g_{x}(j)}^{x}+B_{g_{x}(i)g_{x}(j)}^{x}-B^{y}_{g_{y}(i)g_{y}(j)}+B^{y}_{g_{y}(i)g_{y}(j)}-\widehat{B}^{y}_{g_{y}(i)g_{y}(j)}}{\sqrt{\widehat{B}^{y}_{g_{y}(i)g_{y}(j)}\left(1-\widehat{B}^{y}_{g_{y}(i)g_{y}(j)}\right)}}$
		$\displaystyle\ =\dfrac{1}{\sqrt{\|g_{y}^{-1}(v)/\{i\}\|}}\sum\limits_{j\in g_{y}^{-1}(v)/\{i\}}\dfrac{X_{ij}-B_{g_{x}(i)g_{x}(j)}^{x}+B_{g_{x}(i)g_{x}(j)}^{x}-B^{y}_{g_{y}(i)g_{y}(j)}+o_{p}(\dfrac{K\log n}{n})}{\sqrt{B^{y}_{g_{y}(i)g_{y}(j)}\left(1-B^{y}_{g_{y}(i)g_{y}(j)}\right)}}$
		$\displaystyle\qquad\times\left(1+o_{p}(\sqrt{\dfrac{K\log n}{n}})\right)$
		$\displaystyle\ =\rho_{iv,x}+\dfrac{1}{\sqrt{\|g_{y}^{-1}(v)/\{i\}\|}}\sum\limits_{j\in g_{y}^{-1}(v)/\{i\}}\dfrac{B_{g_{x}(i)g_{x}(j)}^{x}-B^{y}_{g_{y}(i)g_{y}(j)}}{\sqrt{B^{y}_{g_{y}(i)g_{y}(j)}\left(1-B^{y}_{g_{y}(i)g_{y}(j)}\right)}}+o_{p}(1).$

By Hoeffding’s inequality, we have

		$\displaystyle\mathbb{P}\{\max_{i,v}\|\dfrac{1}{\sqrt{\|g_{y}^{-1}(v)/\{i\}\|}}\sum\limits_{j\in g_{y}^{-1}(v)/\{i\}}\dfrac{X_{ij}-B_{g_{x}(i)g_{x}(j)}^{x}}{\sqrt{B_{g_{y}(i)g_{y}(j)}^{y}\left(1-B_{g_{y}(i)g_{y}(j)}^{y}\right)}}\|>t\}$
	$\displaystyle\leq$	$\displaystyle\ \sum_{i,v}\mathbb{P}\{\|\sum\limits_{j\in g_{y}^{-1}(v)/\{i\}}\dfrac{X_{ij}-B_{g_{x}(i)g_{x}(j)}^{x}}{\sqrt{B_{g_{y}(i)g_{y}(j)}^{y}\left(1-B_{g_{y}(i)g_{y}(j)}^{y}\right)}}\|>t\sqrt{\|g_{y}^{-1}(v)/\{i\}\|}\}$
	$\displaystyle\leq$	$\displaystyle\ 2e^{\log(Kn)-2C_{1}^{2}t^{2}}.$

Hence, $\max_{i,v}|\rho_{iv,x}|=O_{p}(\sqrt{\log n})$ . Denote

\dfrac{1}{\sqrt{|g_{y}^{-1}(v)/\{i\}|}}\sum\limits_{j\in g_{y}^{-1}(v)/\{i\}}\dfrac{B_{g_{x}(i)g_{x}(j)}^{x}-B^{y}_{g_{y}(i)g_{y}(j)}}{\sqrt{B^{y}_{g_{y}(i)g_{y}(j)}\left(1-B^{y}_{g_{y}(i)g_{y}(j)}\right)}}

by $\ell_{iv}(y,x)$ , we have

\max_{i,v}|\tilde{\rho}_{iv,x}|=\max_{i,v}|\ell_{iv}(y,x)|+O_{p}(\sqrt{\log n}).

Similarly, we have

\rho_{iv,y}=\dfrac{1}{\sqrt{|g_{x}^{-1}(v)/\{i\}|}}\sum\limits_{j\in g_{x}^{-1}(v)/\{i\}}\dfrac{Y_{ij}-B_{g_{y}(i)g_{y}(j)}^{y}}{\sqrt{B_{g_{x}(i)g_{x}(j)}^{x}\left(1-B_{g_{x}(i)g_{x}(j)}^{x}\right)}},

	$\displaystyle\tilde{\rho}_{iv,y}$	$\displaystyle:=\dfrac{1}{\sqrt{\|g_{x}^{-1}(v)/\{i\}\|}}\sum\limits_{j\in g_{x}^{-1}(v)/\{i\}}\dfrac{Y_{ij}-\widehat{B}^{x}_{g_{x}(i)g_{x}(j)}}{\sqrt{\widehat{B}^{x}_{g_{x}(i)g_{x}(j)}\left(1-\widehat{B}^{x}_{g_{x}(i)g_{x}(j)}\right)}}$
		$\displaystyle\ =\rho_{iv,y}+\ell_{iv}(x,y)+o_{p}(1).$

where $\ell_{iv}(x,y)=\dfrac{1}{\sqrt{|g_{x}^{-1}(v)/\{i\}|}}\sum\limits_{j\in g_{x}^{-1}(v)/\{i\}}\dfrac{B_{g_{y}(i)g_{y}(j)}^{y}-B^{x}_{g_{x}(i)g_{x}(j)}}{\sqrt{B^{x}_{g_{x}(i)g_{x}(j)}\left(1-B^{x}_{g_{x}(i)g_{x}(j)}\right)}}$ , and

\max_{i,v}|\tilde{\rho}_{iv,y}|=\max_{i,v}|\ell_{iv}(x,y)|+O_{p}(\sqrt{\log n}).

Hence,

	$\displaystyle L_{n,0}$	$\displaystyle:=\max_{i,v}\|\tilde{\rho}_{iv}\|$
		$\displaystyle\ =\max_{i,v}\|\dfrac{1}{\sqrt{\left\|g_{x}^{-1}(v)/\{i\}\right\|+\left\|g_{y}^{-1}(v)/\{i\}\right\|}}(\sqrt{\left\|g_{y}^{-1}(v)/\{i\}\right\|}\tilde{\rho}_{iv,x}+\sqrt{\left\|g_{x}^{-1}(v)/\{i\}\right\|}\tilde{\rho}_{iv,y})\|$
		$\displaystyle\ \geq\max_{i,v}\|\dfrac{\sqrt{\left\|g_{x}^{-1}(v)/\{i\}\right\|}+\sqrt{\left\|g_{y}^{-1}(v)/\{i\}\right\|}}{\sqrt{\left\|g_{x}^{-1}(v)/\{i\}\right\|+\left\|g_{y}^{-1}(v)/\{i\}\right\|}}(\tilde{\rho}_{iv,x}\wedge\tilde{\rho}_{iv,y})\|$
		$\displaystyle\ =\dfrac{\sqrt{\left\|g_{x}^{-1}(v)/\{i\}\right\|}+\sqrt{\left\|g_{y}^{-1}(v)/\{i\}\right\|}}{\sqrt{\left\|g_{x}^{-1}(v)/\{i\}\right\|+\left\|g_{y}^{-1}(v)/\{i\}\right\|}}\max_{i,v}\|(\tilde{\rho}_{iv,x}\wedge\tilde{\rho}_{iv,y})\|$

By Assumptions 2 and 3, we have $L_{n,0}^{2}/\log n\rightarrow\infty$ .

Let $T_{n}=L_{n}^{2}(g_{x},g_{y})-2\log(2Kn)+\log\log(2Kn)$ , as $n\rightarrow\infty$ , we have

T_{n}/\log n\rightarrow\infty.

$\square$

7.3 The Bootstrap corrected test statistic under the degree-corrected stochastic block model

For two sample networks $\mathcal{X}$ and $\mathcal{Y}$ , under the framework of the degree-corrected stochastic block model, the bootstrap corrected test statistic is calculated as the following:

1. Estimating $(\widehat{g}_{x},\widehat{B}_{x})$ and $(\widehat{g}_{y},\widehat{B}_{y})$ by the consistent clustering method and $\widehat{\theta}_{x}$ and $\widehat{\theta}_{y}$ using their maximum likelihood estimators. Calculate the statistic $T_{n}$ using $\mathcal{X},\mathcal{Y},(\widehat{g}_{x},\widehat{B}_{x},\widehat{\theta}_{x})$ and $(\widehat{g}_{y},\widehat{B}_{y},\widehat{\theta}_{y})$ ;

2. For $m=1,\ldots,M$ , generate $\mathcal{X}^{(m)}$ and $\mathcal{Y}^{(m)}$ from the edge-probability matrix $\widehat{P}_{ij}=(\widehat{P}_{ij}^{x}+\widehat{P}_{ij}^{y})/2=(\widehat{\theta}_{i}^{x}\widehat{\theta}_{j}^{x}\widehat{B}_{\widehat{g}_{x}(i)\widehat{g}_{x}(j)}^{x}+\widehat{\theta}_{i}^{y}\widehat{\theta}_{j}^{y}\widehat{B}_{\widehat{g}_{y}(i)\widehat{g}_{y}(j)}^{y})/2$ , and calculate $T_{n}^{(m)}$ based on $\mathcal{X}^{(m)}$ , $\mathcal{Y}^{(m)}$ ;

3. Using $\{T_{n}^{(m)}:m=1,\ldots,M\}$ to estimate the location and scale parameters $\widehat{\mu}$ and $\widehat{\beta}$ of the Gumbel distribution through maximum likelihood method.

4. The bootstrap corrected test statistic is calculated as

T_{n}^{boot}=\mu+\beta\left(\dfrac{T_{n}-\widehat{\mu}}{\widehat{\beta}}\right),

where $\mu=-2\log(2\sqrt{\pi})$ and $\beta=2$ .

References

[1] Airoldi, E. M., Blei, D. M., Fienberg, S. E., & Xing, E. P. (2008). Mixed membership stochastic blockmodels. Journal of machine learning research, 9:1981 - 2014.
[2] Amini, A. A., Chen, A., Bickel, P. J., & Levina, E. (2013). Pseudo-Likelihood Methods for Community Detection in Large Sparse Networks. The Annals of Statistics, 41:2097 - 2122.
[3] Arratia, R., Goldstein, L., & Gordon, L. (1989). Two moments suffice for Poisson approximations: The Chen-Stein method. The Annals of Probability, 17:9 - 25.
[4] Bakken, T. E., Miller, J. A., Ding, S.-L., Sunkin, S. M., Smith, K. A., Ng, L., Szafer, A., Dalley, R. A., Royall, J. J., Lemon, T., et al (2016). A comprehensive transcriptional map of primate brain development. Nature, 535:367 - 375.
[5] Barnett, I. & Onnela, J.-P. (2016). Change point detection in correlation networks. Scientic reports, 6:18893.
[6] Bickel, P. J. & Chen, A. (2009). A nonparametric view of network models and Newman-Girvan and other modularities. Proceedings of the National Academy of Sciences, 106:21068 - 21073.
[7] Bickel, P. J., & Sarkar, P. (2015), Hypothesis testing for automated community detection in networks, Journal of the Royal Statistical Society, Series B, 78:253–273.
[8] Cai, T. T. & Jiang, T. (2011). Limiting laws of coherence of random matrices with applications to testing covariance structure and construction of compressed sensing matrices. The Annals of Statistics, 39:1496 - 1525.
[9] Chen, X. (1990). Probabilities of moderate deviations for $B$ -valued independent random vectors. Chinese Annals of Mathematics, 11:621 - 629.
[10] Chen, L., Zhou, J. & Lin, L. (2021a). Hypothesis testing for populations of networks. Communications in Statistics - Theory and Methods, online.
[11] Chen, L., Josephs, N., Lin, L., Zhou, J. & Kolaczyk, E. D. (2021b). A spectral-based framework for hypothesis testing in populations of networks. Statistica Sinica, online.
[12] Choi, D. S., Wolfe, P. J., & Airoldi, E. M. (2012). Stochastic blockmodels with a growing number of classes. Biometrika, 99:273 - 284.
[13] Daudin, J.-J., Picard, F., & Robin, S. (2008). A mixture model for random graphs. Statistics and Computing, 18:173 - 183.
[14] Gao, C., Ma, Z., Zhang, A. Y., & Zhou, H. H. (2017). Achieving optimal misclassification proportion in stochastic blockModels. The Journal of Machine Learning Research, 18:1980 - 2024.
[15] Ghoshdastidar, D., & von Luxburg, U. (2018). Practical methods for graph two-sample testing. In Advances in Neural Information Processing Systems, 3019-3028. Montréal, Canada.
[16] Ghoshdastidar, D., Gutzeit, M., Carpentier, A. & Von Luxburg, U.(2020). Two-sample hypothesis testing for inhomogeneous random graphs. The Annals of Statistics, 48: 2208 - 2229.
[17] Holland, P. W., Laskey, K. B., & Leinhardt, S. (1983). Stochastic blockmodels: First steps. Social Networks, 5:109 - 137.
[18] Hu, J., Qin, H., Yan, T., , & Zhao, Y. (2020). Corrected Bayesian Information Criterion for Stochastic Block Models. Journal of the American Statistical Association, 115:1771 - 1783.
[19] Hu, J., Zhang, J., Qin, H., Yan, T., & Zhu, J. (2021). Using Maximum Entry-Wise Deviation to Test the Goodness of Fit for Stochastic Block Models. Journal of The American Statistical Association, 116:1373 - 1382.
[20] Jin, J. (2015). Fast Community Detection by SCORE. The Annals of Statistics, 43:57- 89.
[21] Karrer, B. & Newman, M. E. J. (2011). Stochastic blockmodels and community structure in networks. Physical Review. E, 83:016107.
[22] Langfelder, P. & Horvath, S. (2008). WGCNA: An R package for weighted correlation network analysis. BMC Bioinformatics, 9:559.
[23] Le, C. M. & Levina, E. (2022). Estimating the number of communities in networks by spectral methods. Electronic Journal of Statistics, 16:3315 - 3342.
[24] Lei, J. (2016). A goodness-of-fit test for stochastic block models. The Annals of Statistics, 44:401 - 424.
[25] Lei, J. & Lin, K. Z. (2022). Bias-adjusted spectral clustering in multi-layer stochastic block models. Journal of the American Statistical Association, online.
[26] Lei, J. & Rinaldo, A. (2015). Consistency of spectral clustering in stochastic block models. The Annals of Statistics, 43:215 - 237.
[27] Liu, F., Choi, D., Xie, L., & Roeder, K. (2018). Global spectral clustering in dynamic networks. Proceedings of the National Academy of Sciences, 115:927 - 932.
[28] Newman, M. E. J. (2006). Modularity and community structure in networks. Proceedings of the National Academy of Sciences of the United States of America, 103:8577 - 8582.
[29] Newman, M. E. J. & Girvan, M. (2004). Finding and evaluating community structure in networks. Physical review. E, 69:026113.
[30] Nowicki, K. & Snijders, T. (2001). Estimation and prediction for stochastic block structures. Journal of The American Statistical Association, 96:1077 - 1087.
[31] Rohe, K., Chatterjee, S., & Yu, B. (2011). Spectral clustering and the high-dimensional stochastic blockmodel. The Annals of Statistics, 39:1878 - 1915.
[32] Rohe, K., Qin, T., & Fan, H. (2014). The highest dimensional stochastic blockmodel with a regularized estimator. Statistica Sinica, 24:1771 - 1786.
[33] Saldña, D. F., Yu, Y., & Feng, Y. (2017). How Many Communities Are There? Journal of Computational and Graphical Statistics, 26:171 - 181.
[34] Sarkar, P. & Bickel, P. J. (2015). Role of normalization in spectral clustering for stochastic blockmodels. The Annals of Statistics, 43:962 - 990.
[35] Snijders, T. & Nowicki, K. (1997). Estimation and prediction for stochastic block structures for graphs with latent block structure. Journal of Classification, 14:75 - 100.
[36] Steinhaeuser, K. & Chawla, N. V. (2010). Identifying and evaluating community structure in complex networks. Pattern Recognition Letters, 31:413 - 421.
[37] Tang, M., Athreya, A., Sussman, A. L., Lyzinski, V. & Priebe, C. E. (2017). A nonparametric two-sample hypothesis testing problem for random graphs. Bernoulli, 23:1599 - 1630.
[38] Wang, J., Zhang, J., Liu, B., Zhu, J., & Guo, J. (2021). Fast network community detection with profile-pseudo likelihood methods. Journal of the American Statistical Association, online.
[39] Wang, Y. R. & Bickel, P. J. (2017). Likelihood-based model selection for stochastic block models. The Annals of Statistics, 45:500 - 528.
[40] Westveld, A. H. & Hoff, P. D. (2011). A Mixed effects model for longitudinal relational and network data with applications to international trade and conflict. The Annals of Applied Statistics, 5:843 - 872.
[41] Wu, F., Kong, X. & Xu, C. (2022). Test on stochastic block model: local smoothing and extreme value theory. Journal of Systems Science and Complexity, 35:1535 - 1556.
[42] Zhang, A. Y. & Zhou, H. H. (2016). Minimax rates of community detection in stochastic block models. The Annals of Statistics, 44:2252 - 2280.
[43] Zhao, Y., Levina, E., & Zhu, J. (2011). Community extraction for social networks. Proceedings of the National Academy of Sciences of the United States of America, 108:7321 - 7326.
[44] Zhao, Y., Levina, E., & Zhu, J. (2012). Consistency of community detection in networks under degree-corrected stochastic block models. The Annals of Statistics, 40:2266 - 2292.
[45] Zhou, W. (2007). Asymptotic distribution of the largest off-diagonal entry of correlation matrices. Transactions of the American Mathematical Society, 359:5345 - 5363.

	$\displaystyle L_{n,0}$	$\displaystyle:=\max_{i,v}\|\tilde{\rho}_{iv}\|$
		$\displaystyle\ =\max_{i,v}\left\|\dfrac{1}{\sqrt{\left\|g_{x}^{-1}(v)/\{i\}\right\|+\left\|g_{y}^{-1}(v)/\{i\}\right\|}}\left(\tilde{\rho}_{iv,x}+\tilde{\rho}_{iv,y}\right)\right\|$
		$\displaystyle\ =\max_{i,v}\left\|\dfrac{1}{\sqrt{2\left\|g^{-1}(v)/\{i\}\right\|}}\left[\rho_{iv,x}+\rho_{iv,y}+(\rho_{iv,x}+\rho_{iv,y})o_{p}(\sqrt{\dfrac{K\log n}{n}})+o_{p}(1)\right]\right\|$
		$\displaystyle\ =L_{n,1}+L_{n,1}o_{p}(\sqrt{\dfrac{K\log n}{n}})+o_{p}(1),$

	$\displaystyle L_{n,1}$	$\displaystyle:=\max_{i,v}\left\|\rho_{iv}\right\|$
		$\displaystyle\ =\max_{i,v}\left\|\dfrac{1}{\sqrt{2\left\|g^{-1}(v)/\{i\}\right\|}}\left(\rho_{iv,x}+\rho_{iv,y}\right)\right\|$
		$\displaystyle\ =\max_{i,v}\left\|\dfrac{1}{\sqrt{\left\|g^{-1}(v)/\{i\}\right\|}}\sum\limits_{j\in g^{-1}(v)/\{i\}}\dfrac{X_{ij}+Y_{ij}-2B_{g(i)g(j)}}{\sqrt{2B_{g(i)g(j)}\left(1-B_{g(i)g(j)}\right)}}\right\|.$

	$\displaystyle\tilde{\rho}_{iv,x}$	$\displaystyle:=\dfrac{1}{\sqrt{\|g_{y}^{-1}(v)/\{i\}\|}}\sum\limits_{j\in g_{y}^{-1}(v)/\{i\}}\dfrac{X_{ij}-\widehat{B}^{y}_{g_{y}(i)g_{y}(j)}}{\sqrt{\widehat{B}^{y}_{g_{y}(i)g_{y}(j)}\left(1-\widehat{B}^{y}_{g_{y}(i)g_{y}(j)}\right)}}$
		$\displaystyle\ =\dfrac{1}{\sqrt{\|g_{y}^{-1}(v)/\{i\}\|}}$
		$\displaystyle\qquad\times\sum\limits_{j\in g_{y}^{-1}(v)/\{i\}}\dfrac{X_{ij}-B_{g_{x}(i)g_{x}(j)}^{x}+B_{g_{x}(i)g_{x}(j)}^{x}-B^{y}_{g_{y}(i)g_{y}(j)}+B^{y}_{g_{y}(i)g_{y}(j)}-\widehat{B}^{y}_{g_{y}(i)g_{y}(j)}}{\sqrt{\widehat{B}^{y}_{g_{y}(i)g_{y}(j)}\left(1-\widehat{B}^{y}_{g_{y}(i)g_{y}(j)}\right)}}$
		$\displaystyle\ =\dfrac{1}{\sqrt{\|g_{y}^{-1}(v)/\{i\}\|}}\sum\limits_{j\in g_{y}^{-1}(v)/\{i\}}\dfrac{X_{ij}-B_{g_{x}(i)g_{x}(j)}^{x}+B_{g_{x}(i)g_{x}(j)}^{x}-B^{y}_{g_{y}(i)g_{y}(j)}+o_{p}(\dfrac{K\log n}{n})}{\sqrt{B^{y}_{g_{y}(i)g_{y}(j)}\left(1-B^{y}_{g_{y}(i)g_{y}(j)}\right)}}$
		$\displaystyle\qquad\times\left(1+o_{p}(\sqrt{\dfrac{K\log n}{n}})\right)$
		$\displaystyle\ =\rho_{iv,x}+\dfrac{1}{\sqrt{\|g_{y}^{-1}(v)/\{i\}\|}}\sum\limits_{j\in g_{y}^{-1}(v)/\{i\}}\dfrac{B_{g_{x}(i)g_{x}(j)}^{x}-B^{y}_{g_{y}(i)g_{y}(j)}}{\sqrt{B^{y}_{g_{y}(i)g_{y}(j)}\left(1-B^{y}_{g_{y}(i)g_{y}(j)}\right)}}+o_{p}(1).$

	$\displaystyle L_{n,0}$	$\displaystyle:=\max_{i,v}\|\tilde{\rho}_{iv}\|$
		$\displaystyle\ =\max_{i,v}\|\dfrac{1}{\sqrt{\left\|g_{x}^{-1}(v)/\{i\}\right\|+\left\|g_{y}^{-1}(v)/\{i\}\right\|}}(\sqrt{\left\|g_{y}^{-1}(v)/\{i\}\right\|}\tilde{\rho}_{iv,x}+\sqrt{\left\|g_{x}^{-1}(v)/\{i\}\right\|}\tilde{\rho}_{iv,y})\|$
		$\displaystyle\ \geq\max_{i,v}\|\dfrac{\sqrt{\left\|g_{x}^{-1}(v)/\{i\}\right\|}+\sqrt{\left\|g_{y}^{-1}(v)/\{i\}\right\|}}{\sqrt{\left\|g_{x}^{-1}(v)/\{i\}\right\|+\left\|g_{y}^{-1}(v)/\{i\}\right\|}}(\tilde{\rho}_{iv,x}\wedge\tilde{\rho}_{iv,y})\|$
		$\displaystyle\ =\dfrac{\sqrt{\left\|g_{x}^{-1}(v)/\{i\}\right\|}+\sqrt{\left\|g_{y}^{-1}(v)/\{i\}\right\|}}{\sqrt{\left\|g_{x}^{-1}(v)/\{i\}\right\|+\left\|g_{y}^{-1}(v)/\{i\}\right\|}}\max_{i,v}\|(\tilde{\rho}_{iv,x}\wedge\tilde{\rho}_{iv,y})\|$