Beyond Asymptotics: Practical Insights into Community Detection in Complex Networks
Abstract
The stochastic block model (SBM) is a fundamental tool for community detection in networks, yet the finite-sample performance of inference methods remains underexplored. We evaluate key algorithms—spectral methods, variational inference, and Gibbs sampling—under varying conditions, including signal-to-noise ratios, heterogeneous community sizes, and multimodality. Our results highlight significant performance variations: spectral methods, especially SCORE, excel in computational efficiency and scalability, while Gibbs sampling dominates in small, well-separated networks. Variational Expectation-Maximization strikes a balance between accuracy and cost in larger networks but struggles with optimization in highly imbalanced settings. These findings underscore the practical trade-offs among methods and provide actionable guidance for algorithm selection in real-world applications. Our results also call for further theoretical investigation in SBMs with complex structures. The code can be found at https://github.com/Toby-X/SBM_computation.
1 Introduction
The idea of using networks to find latent community structures has been ubiquitous in biology (Chen and Yuan, 2006; Marcotte et al., 1999), sociology (Fortunato, 2010) and machine learning (Linden et al., 2003). Among various approaches, the stochastic block model (SBM, Holland et al., 1983) stands out as the simplest generative model for identifying community structures. In undirected networks, symmetric adjacency matrices are used to represent connections, where the $(i,j)$-th entry is $1$ if and only if there is an edge between node $i$ and node $j$, and $0$ otherwise. For an undirected network with $n$ nodes and $K$ latent communities, the SBM assumes that the expectation of the observed network has the following structure,
$$ \mathbb{E}[A] \;=\; Z B Z^{\top}, $$
where $A \in \{0,1\}^{n \times n}$ is the observed adjacency matrix, $Z \in \{0,1\}^{n \times K}$ is the community structure matrix, and $B \in [0,1]^{K \times K}$ is a symmetric matrix whose $(k,l)$-th entry represents the probability that there is an edge between community $k$ and community $l$. Each row of $Z$ contains exactly one entry equal to $1$, reflecting that each node belongs to precisely one community.
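As an illustration of this generative model, the following minimal sketch (not the code from the accompanying repository; function names and parameter values are our own) samples an adjacency matrix from an SBM given a label vector and a connectivity matrix $B$.

```python
import numpy as np

def sample_sbm(z, B, rng=None):
    """Sample a symmetric adjacency matrix A with E[A] = Z B Z^T (zero diagonal).

    z : (n,) array of community labels in {0, ..., K-1}
    B : (K, K) symmetric matrix of edge probabilities
    """
    rng = np.random.default_rng(rng)
    n = len(z)
    P = B[np.ix_(z, z)]                            # P[i, j] = B[z_i, z_j]
    upper = np.triu(rng.random((n, n)) < P, k=1)   # sample each unordered pair once
    A = (upper | upper.T).astype(int)              # symmetrize; diagonal stays zero
    return A

# Example: three balanced communities with stronger within-community connectivity.
z = np.repeat([0, 1, 2], 50)
B = np.full((3, 3), 0.05) + 0.20 * np.eye(3)
A = sample_sbm(z, B, rng=0)
```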
The popularity of the SBM lies in its sharp information-theoretic phase transitions for statistical inference, making it a useful tool for understanding the limits of information extraction in networks and evaluating algorithmic efficiency for community detection. Recent theoretical advancements in the SBM have been extensively reviewed by Abbe (2018). Notably, Lei et al. (2024) show that, under a specific asymptotic regime, the information-theoretic threshold for exact community structure recovery is the same for polynomial-time algorithms and the maximum likelihood estimator (MLE) in bipartite networks, with straightforward extensions to networks with more than two latent communities. These results suggest that computationally efficient polynomial-time methods are sufficient for SBMs, obviating the need for more computationally intensive approaches. Despite these advances, several gaps remain. First, the theoretical results are confined to asymptotic settings, leaving the finite-sample performance of these algorithms unclear. Second, it remains uncertain which polynomial-time method, such as approximate message passing (Feng et al., 2022) or spectral clustering (Von Luxburg, 2007), is most effective in practice. Factors such as sample size, the number of latent communities, community size imbalance, and inter-community connectivity can significantly affect algorithmic performance, yet their influence remains underexplored.
To address this gap, we conduct a comprehensive investigation into the finite-sample performance of various statistical inference methods for community detection in SBMs across a wide range of challenging scenarios. Specifically, we evaluate performance across extensive signal-to-noise ratios (SNRs), heterogeneous community sizes, and multimodal connectivity structures. Our findings provide actionable insights into the practical applicability of these algorithms while highlighting gaps in the theoretical understanding of finite-sample performance, particularly in the presence of noisy or unbalanced data.
2 Methods
In this section, we briefly discuss the methods considered in this paper. Spectral methods, being polynomial-time algorithms, are the most computationally efficient. Variational methods follow in terms of computational cost, while Gibbs sampling requires the most intensive computation.
2.1 Gibbs Sampling
We implement a Gibbs sampler for the Bayesian formulation of the SBM (Nowicki and Snijders, 2001; McDaid et al., 2013). The full conditional distributions used for sampling are given in Appendix A.3.
2.2 Variational Bayes
We consider a general variational Bayes method from Zhang (2020), which studies the optimization problem
$$ \hat{q} \;=\; \operatorname*{argmin}_{q \in \mathcal{S}^{\mathrm{MF}}} \mathrm{KL}\big(q \,\|\, \Pi(\cdot \mid A)\big), \qquad (2.1) $$
where $\mathcal{S}^{\mathrm{MF}}$ is the mean-field variational class and $\Pi(\cdot \mid A)$ is the posterior distribution. Thanks to the mean-field assumption, we can obtain a variational algorithm for the stochastic block model with explicit update formulas. The complete description of the variational class and the algorithm can be found in Appendix A.4.
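Because the posterior $\Pi(\cdot \mid A)$ is intractable, the KL objective in (2.1) is in practice handled through the evidence lower bound (ELBO). The following standard identity, stated here for completeness with $\theta$ collecting all latent quantities (memberships and connectivity parameters), makes the equivalence explicit; it is a textbook fact rather than a result specific to Zhang (2020).

```latex
\begin{align*}
\mathrm{KL}\big(q \,\|\, \Pi(\cdot \mid A)\big)
  &= \mathbb{E}_{q}\!\left[\log \frac{q(\theta)}{\Pi(\theta \mid A)}\right]
   = \mathbb{E}_{q}\big[\log q(\theta)\big] - \mathbb{E}_{q}\big[\log p(A, \theta)\big] + \log p(A) \\
  &= \log p(A)
     - \underbrace{\Big( \mathbb{E}_{q}\big[\log p(A, \theta)\big]
                       - \mathbb{E}_{q}\big[\log q(\theta)\big] \Big)}_{\mathrm{ELBO}(q)} .
\end{align*}
```

Since $\log p(A)$ does not depend on $q$, minimizing the KL divergence over the mean-field class is equivalent to maximizing the ELBO, which is the objective targeted by the updates in Algorithm 1.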
2.3 Variational-EM
The variational Expectation-Maximization method (Variational-EM) (Leger, 2016) considers the model
$$ z_i \overset{\mathrm{iid}}{\sim} \mathrm{Multinomial}(1; \alpha), \qquad A_{ij} \mid z_i = k,\, z_j = l \;\sim\; f(\cdot\,; \theta_{kl}), $$
where $\alpha$ is the vector of community proportions and $f(\cdot\,;\theta_{kl})$ is the model for communities $k$ and $l$. A variational form of the likelihood is considered,
$$ \mathcal{J}(\tau, \alpha, \theta) \;=\; \log p(A; \alpha, \theta) \;-\; \mathrm{KL}\big(q_{\tau}(Z) \,\|\, p(Z \mid A; \alpha, \theta)\big), $$
where $\tau$ denotes the variational parameters of the multinomial distribution $q_{\tau}$, which approximates $p(Z \mid A)$. Details can be found in Appendix A.5.
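To make the alternating updates concrete, here is a minimal NumPy sketch of variational EM for the Bernoulli SBM with a fixed-point E step. It is an illustrative simplification, not the blockmodels R package of Leger (2016); the function name, initialization, and defaults are our own assumptions.

```python
import numpy as np

def variational_em_sbm(A, K, n_iter=100, tol=1e-6, seed=0):
    """Simplified variational EM for a Bernoulli SBM.

    A : (n, n) symmetric 0/1 adjacency matrix with zero diagonal.
    Returns soft memberships tau, proportions alpha, block probabilities theta, labels.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    tau = rng.dirichlet(np.ones(K) * 10, size=n)   # near-uniform random start
    eps = 1e-10
    for _ in range(n_iter):
        # M step: update proportions and block connectivities.
        alpha = tau.mean(axis=0)
        pairs = tau.T @ (np.ones((n, n)) - np.eye(n)) @ tau   # expected pair counts
        theta = (tau.T @ A @ tau) / np.clip(pairs, eps, None) # expected edges / pairs
        theta = np.clip(theta, eps, 1 - eps)
        # E step: fixed-point update of the variational multinomial parameters tau.
        loglik = A @ tau @ np.log(theta).T \
               + (1 - A - np.eye(n)) @ tau @ np.log(1 - theta).T
        log_tau = np.log(alpha + eps) + loglik
        log_tau -= log_tau.max(axis=1, keepdims=True)
        new_tau = np.exp(log_tau)
        new_tau /= new_tau.sum(axis=1, keepdims=True)
        converged = np.max(np.abs(new_tau - tau)) < tol
        tau = new_tau
        if converged:
            break
    labels = tau.argmax(axis=1)    # hard assignment: community with maximal tau
    return tau, alpha, theta, labels
```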
2.4 Spectral Methods
Spectral clustering has been successfully applied to various fields for its simplicity and accuracy (Von Luxburg, 2007). In the context of the stochastic block model (SBM), the top eigenspace of the adjacency matrix exhibits a simplex structure, where each cluster corresponds to a vertex of the simplex (Chen and Gu, 2024). Building on this insight, spectral-based methods have been extended to degree-corrected stochastic block models (Karrer and Newman, 2011), leading to the development of approaches such as SCORE (Jin, 2015), one-class SVM (Mao et al., 2018), and regularized spectral clustering (Qin and Rohe, 2013). Although these methods are designed to address broader scenarios than the standard SBM, it remains an open question whether they can effectively handle challenging settings such as community imbalance and sparsity. Given the diversity of spectral clustering techniques, we detail the specific methods considered in this paper and their terminology for clarity.
Spectral Clustering.
For vanilla spectral clustering, we first perform the top-$K$ eigenvalue decomposition of the adjacency matrix $A$,
$$ A \;\approx\; \hat{U} \hat{\Lambda} \hat{U}^{\top}, $$
where $\hat{U} \in \mathbb{R}^{n \times K}$ contains the top-$K$ eigenvectors and $\hat{\Lambda} \in \mathbb{R}^{K \times K}$ is the diagonal matrix of the top-$K$ eigenvalues. Then, we apply K-means clustering to the rows of $\hat{U}$ to estimate $Z$.
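A minimal implementation of this procedure using SciPy and scikit-learn might look as follows (function name and defaults are illustrative):

```python
import numpy as np
from scipy.sparse.linalg import eigsh
from sklearn.cluster import KMeans

def spectral_clustering(A, K, seed=0):
    """Vanilla spectral clustering: K-means on the top-K eigenvectors of A."""
    A = np.asarray(A, dtype=float)
    vals, vecs = eigsh(A, k=K, which="LM")       # top-K eigenpairs by magnitude
    order = np.argsort(-np.abs(vals))            # leading eigenvector first
    vals, vecs = vals[order], vecs[:, order]
    labels = KMeans(n_clusters=K, n_init=10, random_state=seed).fit_predict(vecs)
    return labels, vecs, vals
```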
SCORE.
SCORE normalizes the eigenvector matrix by dividing each column by the first column element-wise,
$$ \hat{R}_{ik} \;=\; \frac{\hat{U}_{i,k+1}}{\hat{U}_{i,1}}, \qquad i = 1, \dots, n, \quad k = 1, \dots, K-1. $$
K-means clustering is then applied to the rows of $\hat{R} \in \mathbb{R}^{n \times (K-1)}$ to estimate $Z$.
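A corresponding sketch of the SCORE step, taking the eigenvector matrix from the previous block as input; the optional clipping of extreme ratios is a common stabilization device and is listed here as an assumption, not necessarily a step used in our experiments.

```python
import numpy as np
from sklearn.cluster import KMeans

def score_clustering(U, K, seed=0, thresh=None):
    """SCORE (Jin, 2015): K-means on entry-wise eigenvector ratios.

    U : (n, K) top-K eigenvectors of A, leading eigenvector in column 0.
    """
    R = U[:, 1:] / U[:, [0]]          # (n, K-1) ratio matrix
    if thresh is not None:            # optional truncation of extreme ratios
        R = np.clip(R, -thresh, thresh)
    return KMeans(n_clusters=K, n_init=10, random_state=seed).fit_predict(R)
```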
Normalization.
In this method, each row of $\hat{U}$ is normalized by its $\ell_2$ norm:
$$ \hat{U}^{*}_{i\cdot} \;=\; \frac{\hat{U}_{i\cdot}}{\|\hat{U}_{i\cdot}\|_2}, $$
where $\hat{U}_{i\cdot}$ is the $i$-th row of $\hat{U}$. The resulting normalized matrix, denoted as $\hat{U}^{*}$, is then clustered using K-means to estimate $Z$.
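The row-normalization variant is a small modification of the same pipeline (illustrative sketch, reusing the eigenvectors from the first spectral block):

```python
import numpy as np
from sklearn.cluster import KMeans

def row_normalized_clustering(U, K, seed=0):
    """K-means on the l2 row-normalized top-K eigenvector matrix."""
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    U_star = U / np.clip(norms, 1e-12, None)   # guard against all-zero rows
    return KMeans(n_clusters=K, n_init=10, random_state=seed).fit_predict(U_star)
```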
Regularized Spectral Clustering.
This method constructs a regularized graph Laplacian
$$ L_{\tau} \;=\; D_{\tau}^{-1/2} A D_{\tau}^{-1/2}, $$
where $D_{\tau} = D + \tau I$, $D$ is the diagonal degree matrix with $D_{ii} = \sum_{j} A_{ij}$, and $\tau$ is set to the average node degree, as suggested in Qin and Rohe (2013). K-means is then applied to the row-normalized top-$K$ eigenvectors of $L_{\tau}$ to estimate $Z$.
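A sketch of regularized spectral clustering under these choices; setting $\tau$ to the average degree by default and row-normalizing the eigenvectors before K-means follows Qin and Rohe (2013), while the function name and defaults are our own.

```python
import numpy as np
from scipy.sparse.linalg import eigsh
from sklearn.cluster import KMeans

def regularized_spectral_clustering(A, K, tau=None, seed=0):
    """Regularized spectral clustering in the spirit of Qin and Rohe (2013)."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)
    tau = d.mean() if tau is None else tau          # tau = average degree by default
    inv_sqrt = 1.0 / np.sqrt(d + tau)
    L_tau = inv_sqrt[:, None] * A * inv_sqrt[None, :]
    _, vecs = eigsh(L_tau, k=K, which="LM")
    # Row-normalize the eigenvectors before K-means.
    vecs /= np.clip(np.linalg.norm(vecs, axis=1, keepdims=True), 1e-12, None)
    return KMeans(n_clusters=K, n_init=10, random_state=seed).fit_predict(vecs)
```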
3 Numerical Experiments
3.1 Simulation Settings
We consider a broad range of scenarios to evaluate the performance of the methods. The number of nodes $n$ ranges from moderate-sized graphs to larger ones, and the number of communities $K$ ranges from simpler structures to more complex configurations. Notably, even with balanced community sizes, combining a small $n$ with a large $K$ poses a significant challenge due to the limited information available for distinguishing a large number of diverse communities. To introduce heterogeneity in the cluster sizes, we adopt the approach outlined in Zhang et al. (2022): the community sizes $(n_1, \dots, n_K)$ are generated according to a heterogeneity parameter $\beta$, where a larger $\beta$ results in a greater imbalance among the $n_k$. We vary $\beta$ in our experiments to explore different degrees of heterogeneity. For the kernel probability matrix $B$, we follow a standard assumption in the literature (Lei et al., 2024), in which the entries of $B$ are governed by a signal-strength parameter controlling the separation between within-community and between-community edge probabilities. A particular value of this parameter achieves exact recovery in asymptotic theory (Lei et al., 2024). To evaluate performance under varying levels of information, we select several values of the signal-strength parameter, ranging from weak to strong signal levels and including the asymptotic threshold for perfect recovery. However, perfect recovery may not always be attainable in finite samples, which is the focus of this study. Specific initialization can be found in Appendix A.1.
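For concreteness, the sketch below generates imbalanced community sizes from power-law weights controlled by $\beta$ and an assortative two-value kernel matrix. This is an illustrative stand-in, not the exact size-generation mechanism of Zhang et al. (2022) or the kernel parametrization of Lei et al. (2024), and all numeric values shown are placeholders.

```python
import numpy as np

def community_sizes(n, K, beta):
    """Illustrative imbalanced community sizes with weights proportional to k**(-beta).

    beta = 0 gives balanced sizes; larger beta gives stronger imbalance.
    (An illustrative scheme, not necessarily the one used in our experiments.)
    """
    w = np.arange(1, K + 1, dtype=float) ** (-beta)
    sizes = np.floor(n * w / w.sum()).astype(int)
    sizes[0] += n - sizes.sum()        # give the remainder to the largest community
    return sizes

def two_value_kernel(K, p_in, p_out):
    """Assortative kernel matrix B: p_in within communities, p_out between them."""
    return np.full((K, K), p_out) + (p_in - p_out) * np.eye(K)

# Example (placeholder values): 1000 nodes, 5 communities, moderate imbalance/signal.
sizes = community_sizes(1000, 5, beta=1.0)
z = np.repeat(np.arange(5), sizes)
B = two_value_kernel(5, p_in=0.10, p_out=0.02)
```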
3.2 Results

In this project, we evaluate the performance of clustering algorithms using the adjusted Rand index (ARI; Hubert and Arabie, 1985) and normalized mutual information (NMI; Strehl and Ghosh, 2002) as metrics. ARI quantifies the agreement between the estimated and true cluster assignments, corrected for chance, and takes values in $[-1, 1]$; higher ARI values indicate more accurate clustering. As ARI and NMI yield similar results in our experiments, we focus our analysis on ARI, with the NMI results provided in Appendix A.2.
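Both metrics are available in scikit-learn; a minimal usage example with hypothetical label vectors:

```python
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

# ARI and NMI compare estimated labels against the ground truth; both are
# invariant to permutations of the community labels.
z_true = [0, 0, 0, 1, 1, 1, 2, 2, 2]
z_hat  = [1, 1, 1, 0, 0, 2, 2, 2, 2]   # hypothetical clustering output
print(adjusted_rand_score(z_true, z_hat))
print(normalized_mutual_info_score(z_true, z_hat))
```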
The results, shown in Figure 1, reveal that the performance of the chosen inference methods varies across scenarios. Note that all methods fail in the sparsest setting, so we exclude it from the figure. This highlights a dichotomy between finite-sample performance and the asymptotic guarantees discussed in Lei et al. (2024). We hypothesize that achieving the asymptotic information-theoretic limit requires significantly larger sample sizes.
One of the most peculiar phenomena occurs in the settings with the largest numbers of communities. There, the performance of all algorithms improves as the heterogeneity parameter $\beta$ increases, while all methods fail in the balanced case. Since the balanced case gives every community the same size, this failure likely arises because the sample sizes for individual clusters are too small to extract meaningful information; for instance, spectral clustering achieves an ARI close to zero in that case. This indicates that increasing $\beta$ does not necessarily make the task harder, as the latent communities with larger sample sizes can be learned more easily under imbalance. Hence, an improvement in ARI can be observed as we increase the imbalance of the communities.
Variational Bayes exhibits significantly worse performance in the balanced setting than the other methods; in the other scenarios, its performance typically falls between the spectral methods and variational EM. We suspect that, in the balanced setting, the evidence lower bound (ELBO) deviates more substantially from the posterior distribution, resulting in poor performance. Another explanation is that VB may converge more slowly in the balanced scenario. Hence, we recommend using variational EM instead of variational Bayes, as it offers similar computational complexity and better performance.
Among the eigenvalue decomposition-based methods, regularized spectral clustering performs the worst, despite claims that it mitigates sparsity. Vanilla spectral clustering and row normalization perform similarly, while SCORE dominates. Although a vanilla SBM does not incorporate degree heterogeneity, using row normalization or SCORE does not degrade performance. The superiority of SCORE may stem from its projection of the $K$-dimensional eigenspace into a $(K-1)$-dimensional space, effectively improving the estimation in larger clusters by trading off smaller ones. This is especially evident in the larger, more imbalanced scenarios, where SCORE consistently outperforms all other methods. In a nutshell, we recommend row normalization over vanilla spectral clustering for its similar performance and its generalizability to degree-corrected stochastic block models. While SCORE dominates all spectral methods in our settings, it may run into stability issues on real-world datasets due to its unique normalization scheme, so further investigation is needed. Nonetheless, SCORE seems to be a plausible recommendation.
Variational EM (VEM) generally outperforms vanilla spectral clustering, despite its reliance on non-convex optimization. We suspect this results from its initialization with spectral clustering: the subsequent optimization is capable of improving on the local maximum near its starting point. Since spectral clustering gives robust results, VEM also gives robust results that improve upon spectral clustering. Another interesting observation is that, in the challenging scenarios with many communities and strong imbalance, VEM's performance deteriorates as $\beta$ increases, both in median accuracy and in variation. We attribute this to two factors. First, the data generation mechanism we adopt yields drastically different datasets across replications, especially when $\beta$ is large; as a larger $\beta$ amplifies the variability in the data, we can expect an increase in the variation of the estimation accuracy. Second, VEM may suffer from its optimization landscape when the data have a more complicated structure, and the inability to find the global optimum of the non-convex problem may contribute to its performance deterioration.
We initially conjectured that Gibbs sampling would dominate, because variational methods rely on approximations that introduce error and spectral methods do not utilize the likelihood information from the model. However, Gibbs outperforms the other methods only when both $n$ and $K$ are small. We propose two potential explanations for this phenomenon. First, when $K$ is large, Gibbs may struggle to sample equally from all modes in practice, leading to estimation bias. Second, when $n$ is large, the performance of Gibbs is only slightly worse than that of VEM, which may be attributed to the influence of an uninformative prior: as $K$ increases, the effective sample size for each parameter shrinks, giving the prior greater sway in the posterior and degrading Gibbs sampling performance relative to VEM.
4 Conclusion
In this work, we investigated the efficacy of various statistical methods for community detection in stochastic block models under noisy, heterogeneous, and multimodal conditions. Our findings highlight significant differences in performance across methods, emphasizing the importance of context when selecting an inference algorithm.
For small networks with simple structures, Gibbs sampling consistently outperforms the other methods. For larger networks with more communities, variational Expectation-Maximization achieves the best balance between computational cost and accuracy. In scenarios requiring scalability, spectral clustering emerges as a practical choice.
Future work could explore more effective initialization schemes for Gibbs sampling and variational EM, especially initialization with SCORE. Additionally, designing metrics or simulation settings that ensure proper discovery of all clusters may better evaluate these methods. Our work also calls for deeper theoretical analysis of recovering all communities under imbalance, where we conjecture there should be a lower bound on the sample size of the smallest community.
References
- Abbe (2018) Abbe, E. (2018). Community detection and stochastic block models: recent developments. Journal of Machine Learning Research 18 1–86.
- Chen and Yuan (2006) Chen, J. and Yuan, B. (2006). Detecting functional modules in the yeast protein–protein interaction network. Bioinformatics 22 2283–2290.
- Chen and Gu (2024) Chen, L. and Gu, Y. (2024). A spectral method for identifiable grade of membership analysis with binary responses. Psychometrika 1–32.
- Feng et al. (2022) Feng, O. Y., Venkataramanan, R., Rush, C., Samworth, R. J. et al. (2022). A unifying tutorial on approximate message passing. Foundations and Trends® in Machine Learning 15 335–536.
- Fortunato (2010) Fortunato, S. (2010). Community detection in graphs. Physics reports 486 75–174.
- Golightly and Wilkinson (2005) Golightly, A. and Wilkinson, D. J. (2005). Bayesian inference for stochastic kinetic models using a diffusion approximation. Biometrics 61 781–788.
- Holland et al. (1983) Holland, P. W., Laskey, K. B. and Leinhardt, S. (1983). Stochastic blockmodels: First steps. Social networks 5 109–137.
- Hubert and Arabie (1985) Hubert, L. and Arabie, P. (1985). Comparing partitions. Journal of classification 2 193–218.
- Jin (2015) Jin, J. (2015). Fast community detection by SCORE. The Annals of Statistics 43 57–89.
- Karrer and Newman (2011) Karrer, B. and Newman, M. E. (2011). Stochastic blockmodels and community structure in networks. Physical Review E—Statistical, Nonlinear, and Soft Matter Physics 83 016107.
- Leger (2016) Leger, J.-B. (2016). Blockmodels: A r-package for estimating in latent block model and stochastic block model, with various probability functions, with or without covariates. arXiv preprint arXiv:1602.07587 .
- Lei et al. (2024) Lei, J., Zhang, A. R. and Zhu, Z. (2024). Computational and statistical thresholds in multi-layer stochastic block models. The Annals of Statistics 52 2431–2455.
- Linden et al. (2003) Linden, G., Smith, B. and York, J. (2003). Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing 7 76–80.
- Mao et al. (2018) Mao, X., Sarkar, P. and Chakrabarti, D. (2018). Overlapping clustering models, and one (class) svm to bind them all. Advances in Neural Information Processing Systems 31.
- Marcotte et al. (1999) Marcotte, E. M., Pellegrini, M., Ng, H.-L., Rice, D. W., Yeates, T. O. and Eisenberg, D. (1999). Detecting protein function and protein-protein interactions from genome sequences. Science 285 751–753.
- McDaid et al. (2013) McDaid, A. F., Murphy, T. B., Friel, N. and Hurley, N. J. (2013). Improved bayesian inference for the stochastic block model with application to large networks. Computational Statistics & Data Analysis 60 12–31.
- Nowicki and Snijders (2001) Nowicki, K. and Snijders, T. A. B. (2001). Estimation and prediction for stochastic blockstructures. Journal of the American statistical association 96 1077–1087.
- Qin and Rohe (2013) Qin, T. and Rohe, K. (2013). Regularized spectral clustering under the degree-corrected stochastic blockmodel. Advances in neural information processing systems 26.
- Strehl and Ghosh (2002) Strehl, A. and Ghosh, J. (2002). Cluster ensembles—a knowledge reuse framework for combining multiple partitions. Journal of machine learning research 3 583–617.
- Von Luxburg (2007) Von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and computing 17 395–416.
- Zhang et al. (2022) Zhang, A. R., Cai, T. T. and Wu, Y. (2022). Heteroskedastic pca: Algorithm, optimality, and applications. The Annals of Statistics 50 53–80.
- Zhang (2020) Zhang, F. (2020). Theoretical Guarantees of Variational Inference and Its Applications. Ph.D. thesis, The University of Chicago.
Appendix A Appendix
A.1 Initialization
For Gibbs sampling, we use a common default initialization for all parameters. Variational Bayes uses a uniform initialization of the memberships and a standard initialization for the variational parameters as specified in Algorithm 1. Variational-EM is initialized with spectral clustering for both models. We run each scenario with 100 different random seeds and report the results in the sections below.
A.2 NMI

A.3 Gibbs Sampling
Let $n_k = \sum_{i=1}^{n} \mathbb{1}\{z_i = k\}$ denote the number of nodes in community $k$, $O_{kl}$ the number of (unordered) node pairs between communities $k$ and $l$ that are connected by an edge, and $n_{kl}$ the total number of such pairs. With conjugate priors $\pi \sim \mathrm{Dirichlet}(\alpha_0 \mathbf{1}_K)$ on the community proportions and $B_{kl} \sim \mathrm{Beta}(a_0, b_0)$ on the connectivity probabilities, the full conditional distribution of each parameter can be derived as
$$
\begin{aligned}
\pi \mid Z &\;\sim\; \mathrm{Dirichlet}\big(\alpha_0 + n_1, \dots, \alpha_0 + n_K\big),\\
B_{kl} \mid Z, A &\;\sim\; \mathrm{Beta}\big(a_0 + O_{kl},\; b_0 + n_{kl} - O_{kl}\big),\\
\Pr\big(z_i = k \mid Z_{-i}, B, \pi, A\big) &\;\propto\; \pi_k \prod_{j \neq i} B_{k z_j}^{A_{ij}} \, (1 - B_{k z_j})^{1 - A_{ij}}.
\end{aligned}
$$
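A minimal NumPy sketch implementing these full conditionals follows; the hyperparameter defaults, sweep count, and burn-in rule are illustrative assumptions rather than the exact settings of our experiments.

```python
import numpy as np

def gibbs_sbm(A, K, n_sweeps=500, a0=1.0, b0=1.0, alpha0=1.0, seed=0):
    """Gibbs sampler for a Bernoulli SBM with Beta(a0, b0) priors on B and a
    Dirichlet(alpha0) prior on the community proportions (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    z = rng.integers(K, size=n)                    # random initial memberships
    kept = []
    for sweep in range(n_sweeps):
        # pi | Z ~ Dirichlet(alpha0 + community counts)
        counts = np.bincount(z, minlength=K)
        pi = rng.dirichlet(alpha0 + counts)
        # B_{kl} | Z, A ~ Beta(a0 + edge count, b0 + pair count - edge count)
        Z = np.eye(K)[z]                           # (n, K) one-hot memberships
        edges = Z.T @ A @ Z
        pairs = np.outer(counts, counts) - np.diag(counts)
        np.fill_diagonal(edges, np.diag(edges) / 2)   # within-block counts were doubled
        np.fill_diagonal(pairs, np.diag(pairs) / 2)
        B = rng.beta(a0 + edges, b0 + pairs - edges)
        B = np.triu(B) + np.triu(B, 1).T              # one draw per unordered block pair
        # z_i | rest: categorical proportional to pi_k times the edge likelihood
        logB, log1mB = np.log(B + 1e-12), np.log(1 - B + 1e-12)
        for i in range(n):
            others = np.delete(np.arange(n), i)
            Ai = A[i, others]
            M = np.eye(K)[z[others]]                  # one-hot labels of the other nodes
            loglik = logB @ (M.T @ Ai) + log1mB @ (M.T @ (1 - Ai))
            logp = np.log(pi + 1e-12) + loglik
            p = np.exp(logp - logp.max())
            z[i] = rng.choice(K, p=p / p.sum())
        if sweep >= n_sweeps // 2:                    # keep post-burn-in samples
            kept.append(z.copy())
    return np.array(kept)
```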
A.4 Variational Bayes
The mean-field variational class $\mathcal{S}^{\mathrm{MF}}$ is specified as a product of variational families, one factor for each block of parameters, each restricted to the support of the corresponding parameter. In Equation (2.1), these factors denote the distribution families for the community memberships and for the remaining model parameters, respectively. Zhang (2020) considers a mass point distribution family for the community memberships; in addition, a normal distribution family and a parametric distribution family indexed by a variational parameter are used for the remaining parameters. With a slight abuse of notation, the connectivity matrix $B$ is treated in vectorized form throughout.
The complete algorithm can be obtained as in Algorithm 1.
A.5 Variational-EM
The EM procedure is described as follows:
1. E step: maximization with respect to the variational parameters $\tau$.
2. M step: maximization with respect to the original parameters $\alpha$ and the function parameters $\theta$ in $f$.

We consider Bernoulli and Gaussian choices for the function model $f$, as described below. For both models, we assign each node to the community with the largest variational probability.
A.5.1 Bernoulli
This is the standard stochastic block model. The model is defined as follows:
$$ A_{ij} \mid z_i = k,\, z_j = l \;\sim\; \mathrm{Bernoulli}(\theta_{kl}), $$
where $\theta_{kl} \in (0,1)$ for $1 \le k, l \le K$.
A.5.2 Gaussian
This model assumes a normal distribution for the community links, defined as:
$$ A_{ij} \mid z_i = k,\, z_j = l \;\sim\; \mathcal{N}(\mu_{kl}, \sigma^2), $$
where $\mu_{kl} \in \mathbb{R}$ and $\sigma^2 > 0$ for $1 \le k, l \le K$.