rMultiNet: An R Package For Multilayer Networks Analysis

Ting Li
Department of Applied Mathematics, The Hong Kong Polytechnic University
Zhongyuan Lyu
Department of Mathematics, The Hong Kong University of Science and Technology
Chenyu Ren
Department of Applied Mathematics, The Hong Kong Polytechnic University
Dong Xia
Department of Mathematics, The Hong Kong University of Science and Technology

Abstract

This paper develops an R package, rMultiNet, for analyzing multilayer network data. We provide two general frameworks from the recent literature, namely the mixture multilayer stochastic block model (MMSBM) and the mixture multilayer latent space model (MMLSM), to generate multilayer networks. We also provide several methods to learn embeddings of both nodes and layers, followed by further data analysis methods such as clustering. Three real data examples are included in the package. The source code of rMultiNet is available at https://github.com/ChenyuzZZ73/rMultiNet.

Keywords: Multilayer networks, Tensor decomposition, Community detection

1 Introduction

The past decade has witnessed a fast-growing demand for processing and analyzing complex networks. While there are numerous studies of the single static network (Amini et al. (2013); Gao et al. (2017, 2018); Wang et al. (2021); Jing et al. (2022); Yu et al. (2022)), researchers have shown increasing interest in the multilayer network (Paul and Chen (2020); Le et al. (2018); Lei et al. (2020); Arroyo et al. (2021); Jing et al. (2021); Li et al. (2021); Chen et al. (2022)), which is a more powerful representation of multi-relational data. Many kinds of real-world data can be recorded as multilayer networks, such as brain connectivity networks, gene-gene interaction networks and world trading networks.
Since community structure is commonly observed in static network analysis, a natural question is how to define and detect community structure in multilayer networks. In a multilayer network, the nodes represent individuals of interest and the edges in different layers represent different types of relationships between them. Such complex relations make it challenging to identify and analyze the community structure of a multilayer network. In particular, the heterogeneity across layers can be characterized by individual links, by the group memberships of nodes, or by the connectivity patterns within and between communities. Recently, Jing et al. (2021) proposed a mixture multilayer SBM (MMSBM) together with a new tensor-based method, TWIST, to simultaneously cluster networks and identify the global and local group memberships of vertices. Moreover, Lyu et al. (2021) introduced a mixture multilayer LSM (MMLSM) that estimates the latent positions of nodes via generalized low-rank tensor decomposition.
In this paper, we propose an R package, rMultiNet, to analyze mixture multilayer networks. Fig. 1 gives an overview of the package. We provide several prevalent methods for users to study the latent features of multilayer networks. First, we provide three real datasets as examples, along with two generative models, the MMSBM and the MMLSM, to generate mixture multilayer networks. Then, we provide several fitting methods from Jing et al. (2021) and Lyu et al. (2021) to compute embeddings of the multilayer network, followed by several prevalent clustering methods to analyze the embedding results. Last, several data visualization functions are provided to present the outcomes.

Figure 1: Overview of the rMultiNet package.

2 Package overview

rMultiNet includes models to generate multilayer networks, several algorithms to learn complex mixture multilayer networks’ latent structures, multiple clustering methods to further analyze the embedding results, and several visualization functions for presentation. The package is organized into the modules listed below:

• Generation: rMultiNet adopts the mixture multilayer stochastic block model (MMSBM) (Jing et al. (2021)) and the mixture multilayer latent space model (MMLSM) (Lyu et al. (2021)) to generate mixture multilayer networks.
• Embedding: rMultiNet contains two tensor decomposition algorithms for mixture multilayer network embedding, namely TWIST, proposed by Jing et al. (2021), and ProjectedGD, introduced by Lyu et al. (2021). Moreover, two naive methods discussed in Jing et al. (2021), spectral clustering on the sum of the adjacency matrices from all layers (Sum-Adj) and spectral clustering on the mode-3 flattening (M3-SC), are included as baselines for comparison.
• Clustering: rMultiNet provides several clustering methods to analyze the embedding results, such as K-means (Likas et al. (2003)), spectral clustering (Dong et al. (2012)) and density-based spatial clustering of applications with noise (DBSCAN) (Hahsler et al. (2019)).
• Datasets: rMultiNet contains three datasets for study.

Human malaria parasite gene network: The data under investigation are the 9 highly variable regions of the malaria parasite gene sequence. Each layer is a network on the same 212 nodes, which appear on all 9 layers. More details about the background and the data pre-processing can be found in Larremore et al. (2013) and Jing et al. (2021).
Worldwide food trading network: In this multilayer network, layers represent different products, nodes are countries, and edges at each layer represent trading relationships of a specific food product among countries. The data were collected by De Domenico et al. (2015) and are available at http://www.fao.org. After data pre-processing (Jing et al. (2021)), we obtain a 30-layer network with 99 nodes.
UN Commodity trading network: The dataset contains annual trade information for countries in 2019 from the UN Comtrade Database (https://comtrade.un.org). We focus on the top 48 countries ranked by exports of goods and services in US dollars. Each layer represents one type of commodity, with commodities classified into 97 categories (Lyu et al. (2021)).

3 Functionality and Examples

In this section, we describe the usage of the rMultiNet package in detail. The notation used in this part is defined in Jing et al. (2021) and Lyu et al. (2021). The multilayer network to be explored can be either generated from the package or loaded from external files. Data loaded from external files must be in the form of the ‘tensor’ class defined in the package rTensor. We provide two approaches to generate the adjacency tensor of the mixture multilayer network in rMultiNet as follows, corresponding to MMSBM and MMLSM respectively.
> library(rMultiNet)
> GenerateMMSBM(n, m, L, K, d = NULL, r = NULL)
> GenerateMMLSM(n, m, L, rank, U_mean = 0.5, cmax = 1, d, int_type = 'Uniform', kernel_fun = 'logit', scale_par = 1)
Here, $n$ is the number of vertices, $m$ is the number of types of the network, $L$ is the number of layers, $K$ is the number of groups of vertices, $d$ is the average degree of the network in each layer and $r$ is the out-in ratio in each layer. In the function GenerateMMLSM, rank is the rank of the latent position matrix $U$, U_mean is the mean of the normal distribution of each entry of $U$, cmax is the entry-wise upper bound of the core tensor $C$, int_type specifies how the tensor $C$ is generated (‘Uniform’ or ‘Norm’), kernel_fun is the link function used to generate the adjacency tensor (‘logit’ or ‘probit’) and scale_par is the scaling factor of the parameter tensor. The output is a list including an adjacency tensor and the generating parameters $\Theta$.
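To make the roles of $n$, $m$, $L$, $K$ and $r$ concrete, the following base-R sketch simulates a small mixture of stochastic block models by hand. It is not the implementation behind GenerateMMSBM: the helper name simulate_mmsbm_sketch, the within-community probability p_in (used here in place of the average degree $d$), and the convention that the out-in ratio $r$ equals the between-community probability divided by p_in are all assumptions made for illustration.

```r
# Illustrative sketch only; not the code used by GenerateMMSBM.
# Assumed conventions: each layer is assigned one of m network types, each type
# has its own partition of the n nodes into K communities, and r = p_out / p_in.
simulate_mmsbm_sketch <- function(n, m, L, K, p_in = 0.4, r = 0.25) {
  layer_type  <- sample(rep_len(seq_len(m), L))                # type label of each layer
  memberships <- replicate(m, sample(rep_len(seq_len(K), n)))  # n x m community labels
  A <- array(0L, dim = c(n, n, L))
  upper <- upper.tri(matrix(0, n, n))
  for (l in seq_len(L)) {
    z <- memberships[, layer_type[l]]
    P <- ifelse(outer(z, z, "=="), p_in, p_in * r)             # edge probabilities
    layer <- matrix(0L, n, n)
    layer[upper] <- rbinom(sum(upper), size = 1, prob = P[upper])
    A[, , l] <- layer + t(layer)                               # symmetric, zero diagonal
  }
  list(adjacency = A, layer_type = layer_type, memberships = memberships)
}

set.seed(1)
sim <- simulate_mmsbm_sketch(n = 60, m = 2, L = 8, K = 3)
dim(sim$adjacency)
```

The result is a plain n x n x L array. To pass such an array to the package it would first have to be converted to the rTensor ‘tensor’ class, for example with rTensor::as.tensor(sim$adjacency).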
The multilayer network, whether loaded or generated as above, is stored as an adjacency tensor. rMultiNet provides algorithms to learn the latent structure of the adjacency tensor.
> InitializationMMSBM(tnsr, ranks = NULL)
> PowerIteration(tnsr, ranks = c(2,2,2), type = "TWIST", U_0_list, delta1 = 1000, delta2 = 1000, max_iter = 25, tol = 1e-05)
The function InitializationMMSBM outputs the initialization U_0_list, which can be passed to the function PowerIteration. Here tnsr is the adjacency tensor of the network, type specifies the iterative algorithm to run (‘TWIST’ or ‘Tucker’), delta1 and delta2 are the tuning parameters for regularization in mode-1 and mode-2, max_iter is the maximum number of iterations and tol is the convergence tolerance. Note that ranks gives the rank of the core tensor, computed as $m\times K-(m-1)$ (see Jing et al. (2021)). The output is a list including the core tensor $Z$, the network embedding and the node embedding.
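As a small worked example of the rank expression quoted above, with $m=3$ network types and $K=2$ groups it evaluates to $3\times 2-(3-1)=4$. The helper below is hypothetical (it is not a function of rMultiNet) and simply evaluates the expression.

```r
# Hypothetical helper: evaluates m * K - (m - 1), the rank expression quoted
# from Jing et al. (2021). It is not a function exported by rMultiNet.
core_rank <- function(m, K) {
  stopifnot(m >= 1, K >= 1)
  m * K - (m - 1)
}

core_rank(m = 3, K = 2)  # 4
core_rank(m = 2, K = 2)  # 3
```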
> SpecClustering(tnsr, rank, embedding_type = "Layer")
In the function SpecClustering, tnsr is the adjacency tensor, rank is the number of columns of the output matrix $U$, and embedding_type specifies the embedding type (Sum-Adj for ‘Node’ and M3-SC for ‘Layer’). The output matrix $U$ can then be passed to clustering methods such as K-means.
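The two baselines can be sketched in base R as follows. This is a minimal illustration under stated assumptions rather than the code used by SpecClustering: the toy data, the variable names, and the choice of three node eigenvectors and two layer singular vectors are all made up for the example, which has two layer types with two node groups each.

```r
# Illustrative sketch only; not the code used by SpecClustering.
set.seed(2)
n <- 40; L <- 6
z1 <- rep(1:2, each = n / 2)          # node labels used by layers of type 1
z2 <- rep(1:2, times = n / 2)         # node labels used by layers of type 2
layer_type <- rep(1:2, each = L / 2)
A <- array(0, dim = c(n, n, L))
up <- upper.tri(matrix(0, n, n))
for (l in seq_len(L)) {
  z <- if (layer_type[l] == 1) z1 else z2
  P <- ifelse(outer(z, z, "=="), 0.6, 0.1)       # toy edge probabilities
  layer <- matrix(0, n, n)
  layer[up] <- rbinom(sum(up), 1, P[up])
  A[, , l] <- layer + t(layer)
}

# Sum-Adj (node embedding): leading eigenvectors of the summed adjacency matrix.
sum_adj <- apply(A, c(1, 2), sum)
eig <- eigen(sum_adj, symmetric = TRUE)
top <- order(abs(eig$values), decreasing = TRUE)[1:3]
U_node <- eig$vectors[, top, drop = FALSE]        # n x 3

# M3-SC (layer embedding): leading left singular vectors of the mode-3 flattening.
A3 <- t(apply(A, 3, as.vector))                   # L x n^2, one row per layer
U_layer <- svd(A3, nu = 2, nv = 0)$u              # L x 2

node_clusters  <- kmeans(U_node,  centers = 4, nstart = 10)$cluster
layer_clusters <- kmeans(U_layer, centers = 2, nstart = 10)$cluster
```

Here the node embedding uses $2\times 2-(2-1)=3$ columns, and the four node clusters correspond to the combinations of the two type-specific labels.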
> InitializationLSM(gen_list, n, rank, M, perturb = 0.1, int_type)
> ProjectedGD(Ini_list, Cmax, eta_outer = 1e-04, tmax_outer = 35, p_type = 'logit', rd = 'Non', show = TRUE, sgma = 1, sample_size = 5000)
In the function InitializationLSM, gen_list is a list including the adjacency tensor and the parameter $\Theta$ of the mixture multilayer network, $n$ is the number of nodes, rank is the rank of $U$, M is the number of network types, perturb specifies the upper bound of the uniform distribution, and int_type specifies the method used to initialize $U$ and $W$ (‘spec’, ‘rand’ or ‘warm’). The output of InitializationLSM is a list including the adjacency tensor, $U_{0}$, $W_{0}$ and the tuning parameters $\{\delta_{1},\delta_{2},\delta_{3}\}$. In the function ProjectedGD, Ini_list is the output of InitializationLSM, Cmax is the upper bound used in the coefficient constraint, eta_outer is the learning rate of the gradient descent, tmax_outer is the number of gradient descent iterations, p_type specifies the type of link function (‘logit’, ‘probit’ or ‘poisson’), rd specifies whether to use stochastic sampling (‘rand’ or ‘Non’) and sgma is the link function parameter $\sigma$. The output contains the embedding results for nodes and layers.
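The link functions named by p_type and kernel_fun map an entry of the parameter tensor to an edge probability. The sketch below illustrates the ‘logit’ and ‘probit’ options on a single layer. The form $U C U^{\top}$ for one layer of the parameter tensor, the variable names, and the omission of $\sigma$ and of the ‘poisson’ option are simplifying assumptions; the code is not taken from GenerateMMLSM or ProjectedGD.

```r
# Illustrative sketch only; not the code used by GenerateMMLSM or ProjectedGD.
set.seed(3)
n <- 30; rank <- 2
U <- matrix(rnorm(n * rank, mean = 0.5), n, rank)  # latent positions (assumed Gaussian)
C <- matrix(runif(rank^2, 0, 1), rank, rank)
C <- (C + t(C)) / 2                                # symmetric core slice, entries in [0, 1]
Theta <- U %*% C %*% t(U)                          # parameter matrix of one layer (assumed form)

# Map parameters to probabilities with the chosen link function.
link <- function(theta, type = c("logit", "probit")) {
  type <- match.arg(type)
  switch(type, logit = plogis(theta), probit = pnorm(theta))
}

P_logit  <- link(Theta, "logit")
P_probit <- link(Theta, "probit")

# Draw one symmetric binary adjacency layer from the logit probabilities.
up <- upper.tri(Theta)
A <- matrix(0, n, n)
A[up] <- rbinom(sum(up), 1, P_logit[up])
A <- A + t(A)
```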
rMultiNet also implements functions for visualization.
> Embedding_network(network_membership, L, paxis = 2)
This function produces plots of the network embedding and the node embedding. Here paxis specifies the number of eigenvectors to use in the plot. If more than two eigenvectors are requested, a table of plots is generated. By default, it plots the second eigenvector against the third eigenvector.
> Community_cluster_km(embedding, type, cluster_number)
> Community_cluster_dbscan(embedding, type, eps_value = .05, pts_value = 5)
Clustering algorithms such as K-means and DBSCAN can be applied to the embedding results. Here, type can be either node embedding ‘n’ or network embedding ‘N’, cluster_number is the number of clusters for K-means, and eps_value and pts_value are the parameters for DBSCAN.
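For readers who want to see what this clustering step amounts to, the sketch below applies stats::kmeans, and dbscan::dbscan when the dbscan package is installed, directly to a toy embedding matrix. The toy data and variable names are assumptions, and these calls are not the internals of Community_cluster_km or Community_cluster_dbscan; the DBSCAN parameters simply mirror the defaults eps_value = .05 and pts_value = 5 listed above.

```r
# Illustrative sketch only; the toy embedding stands in for the output of an
# embedding step, and these calls are not the package's Community_cluster_* code.
set.seed(4)
embedding <- rbind(
  matrix(rnorm(40, mean = 0,   sd = 0.02), ncol = 2),  # 20 points near (0, 0)
  matrix(rnorm(40, mean = 0.5, sd = 0.02), ncol = 2)   # 20 points near (0.5, 0.5)
)

# K-means with the number of clusters fixed in advance.
km <- kmeans(embedding, centers = 2, nstart = 10)
print(table(km$cluster))

# DBSCAN needs the 'dbscan' package (Hahsler et al. (2019)); skipped if absent.
if (requireNamespace("dbscan", quietly = TRUE)) {
  db <- dbscan::dbscan(embedding, eps = 0.05, minPts = 5)
  print(table(db$cluster))   # label 0 marks noise points
}
```

Unlike K-means, DBSCAN does not take the number of clusters as input, so it can be a useful check when that number is unknown.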

4 Summary

rMultiNet is an R package that includes a variety of traditional and state-of-the-art tensor decomposition methods for mixture multilayer network analysis. The package is organized as a modular pipeline: generative modeling, embedding algorithms, clustering, and visualization. In response to the growing volume of data and interactions across different networks, rMultiNet aims to help study complex networks, especially mixture multilayer networks. The package also applies to dynamic networks, which can be regarded as a special case of multilayer networks with layers indexed by time.

References

Amini et al. (2013) Arash A Amini, Aiyou Chen, Peter J Bickel, and Elizaveta Levina. Pseudo-likelihood methods for community detection in large sparse networks. The Annals of Statistics, 41(4):2097–2122, 2013.
Arroyo et al. (2021) Jesús Arroyo, Avanti Athreya, Joshua Cape, Guodong Chen, Carey E Priebe, and Joshua T Vogelstein. Inference for multiple heterogeneous networks with a common invariant subspace. Journal of Machine Learning Research, 22(142):1–49, 2021.
Chen et al. (2022) Shuxiao Chen, Sifan Liu, and Zongming Ma. Global and individualized community detection in inhomogeneous multilayer networks. The Annals of Statistics, 50(5):2664–2693, 2022.
De Domenico et al. (2015) Manlio De Domenico, Vincenzo Nicosia, Alexandre Arenas, and Vito Latora. Structural reducibility of multilayer networks. Nature Communications, 6(1):1–9, 2015.
Dong et al. (2012) Xiaowen Dong, Pascal Frossard, Pierre Vandergheynst, and Nikolai Nefedov. Clustering with multi-layer graphs: A spectral perspective. IEEE Transactions on Signal Processing, 60(11):5820–5831, 2012.
Gao et al. (2017) Chao Gao, Zongming Ma, Anderson Y Zhang, and Harrison H Zhou. Achieving optimal misclassification proportion in stochastic block models. The Journal of Machine Learning Research, 18(1):1980–2024, 2017.
Gao et al. (2018) Chao Gao, Zongming Ma, Anderson Y Zhang, and Harrison H Zhou. Community detection in degree-corrected block models. The Annals of Statistics, 46(5):2153–2185, 2018.
Hahsler et al. (2019) Michael Hahsler, Matthew Piekenbrock, and Derek Doran. dbscan: Fast density-based clustering with R. Journal of Statistical Software, 91:1–30, 2019.
Jing et al. (2021) Bing-Yi Jing, Ting Li, Zhongyuan Lyu, and Dong Xia. Community detection on mixture multilayer networks via regularized tensor decomposition. The Annals of Statistics, 49(6):3181–3205, 2021.
Jing et al. (2022) Bingyi Jing, Ting Li, Ningchen Ying, and Xianshi Yu. Community detection in sparse networks using the symmetrized Laplacian inverse matrix (SLIM). Statistica Sinica, 32(1):1, 2022.
Larremore et al. (2013) Daniel B Larremore, Aaron Clauset, and Caroline O Buckee. A network approach to analyzing highly recombinant malaria parasite genes. PLoS Computational Biology, 9(10):e1003268, 2013.
Le et al. (2018) Can M Le, Keith Levin, and Elizaveta Levina. Estimating a network from multiple noisy realizations. Electronic Journal of Statistics, 12(2):4697–4740, 2018.
Lei et al. (2020) Jing Lei, Kehui Chen, and Brian Lynch. Consistent community detection in multi-layer network data. Biometrika, 107(1):61–73, 2020.
Li et al. (2021) Ting Li, Jianchang Hu, Shiying Wang, and Heping Zhang. Super-variants identification for brain connectivity. Human Brain Mapping, 42(5):1304–1312, 2021.
Likas et al. (2003) Aristidis Likas, Nikos Vlassis, and Jakob J Verbeek. The global k-means clustering algorithm. Pattern Recognition, 36(2):451–461, 2003.
Lyu et al. (2021) Zhongyuan Lyu, Dong Xia, and Yuan Zhang. Latent space model for higher-order networks and generalized tensor decomposition. arXiv preprint arXiv:2106.16042, 2021.
Paul and Chen (2020) Subhadeep Paul and Yuguo Chen. Spectral and matrix factorization methods for consistent community detection in multi-layer networks. The Annals of Statistics, 48(1):230–250, 2020.
Wang et al. (2021) Jiangzhou Wang, Jingfei Zhang, Binghui Liu, Ji Zhu, and Jianhua Guo. Fast network community detection with profile-pseudo likelihood methods. Journal of the American Statistical Association, pages 1–14, 2021.
Yu et al. (2022) Xianshi Yu, Ting Li, Ningchen Ying, and Bing-Yi Jing. Collaborative filtering with awareness of social networks. Journal of Business & Economic Statistics, 40(4):1629–1641, 2022.