
RaftGP: Random Fast Graph Partitioning

Yu Gao^{1,2}, Meng Qin^{1,2,4}, Yibin Ding^{2}, Li Zeng^{3}, Chaorui Zhang^{2}, Weixi Zhang^{2}, Wei Han^{2}, Rongqian Zhao^{3}, Bo Bai^{2}
^{2}Theory Lab, Central Research Institute, 2012 Labs, Huawei Technologies Co., Ltd.
^{3}Algorithm & Technology Development Department, Global Technical Service Department, Huawei Technologies Co., Ltd.
^{4}Department of Computer Science & Engineering, Hong Kong University of Science & Technology
^{1}The first two authors contributed equally. Corresponding Author: Meng Qin ([email protected]).
(Innovation Award Winner of IEEE HPEC 2023 Graph Challenge: https://graphchallenge.mit.edu/champions)
Abstract

Graph partitioning (GP), a.k.a. community detection, is a classic problem that divides the node set of a graph into densely-connected blocks. Following prior work on the IEEE HPEC Graph Challenge benchmark and recent advances in graph machine learning, we propose a novel RAndom FasT Graph Partitioning (RaftGP) method based on an efficient graph embedding scheme. It uses the Gaussian random projection to extract community-preserving features from classic GP objectives. These features are fed into a graph neural network (GNN) to derive low-dimensional node embeddings. Surprisingly, our experiments demonstrate that a randomly initialized GNN even without training is enough for RaftGP to derive informative community-preserving embeddings and support high-quality GP. To enable the derived embeddings to tackle GP, we introduce a hierarchical model selection algorithm that simultaneously determines the number of blocks and the corresponding GP result. We evaluate RaftGP on the Graph Challenge benchmark and compare the performance with five baselines, where our method can achieve a better trade-off between quality and efficiency. In particular, compared to the baseline algorithm [1] of the IEEE HPEC Graph Challenge, our method is 6.68x – 23.9x faster on graphs with 1E3 – 5E4 nodes and at least 64.5x faster on larger (1E5 node) graphs on which the baseline takes more than 1E4 seconds. Our method achieves better accuracy on all test cases. We also develop a new graph generator to address some limitations of the original generator in the benchmark.

Index Terms:
Graph Partitioning, Graph Clustering, Community Detection, Graph Embedding

I Introduction

For many real-world complex systems (e.g., social, communication, and biological networks), graphs serve as a generic abstraction of system entities and their relations in terms of nodes and edges. Graph partitioning (GP), a.k.a. disjoint community detection, is a classic problem that divides the node set of a graph into disjoint blocks (communities) with dense linkages distinct from other blocks. The partitioning of blocks (communities) can reveal the structures and functions of a graph [2, 3, 4] while supporting advanced applications (e.g., parallel task assignment [5], wireless network decomposition [6], and traffic classification [7]).

GP on large graphs is usually challenging because it can be formulated as several NP-hard combinatorial optimization objectives. Previous studies on the IEEE HPEC Graph Challenge benchmark [8] have provided a series of solutions that alleviate the NP-hard challenge of GP. Knyazev et al. [9] accelerated spectral clustering by solving the eigen-decomposition of graph Laplacians via the locally optimal block preconditioned conjugate gradient method. Durbeck et al. [10] explored the ability of graph summarization techniques to preserve the underlying community structures of graphs. Uppal et al. [11] and Wanye et al. [12] proposed a top-down divide-and-conquer algorithm and a sampling strategy, respectively, to speed up the Bayesian inference of the stochastic block model (SBM).

Different from the aforementioned studies that focus on the engineering improvements of existing GP techniques (e.g., spectral clustering and Bayesian SBM), we propose a novel RAndom FasT Graph Partitioning (RaftGP) method following recent advances in graph machine learning [13]. It adopts an efficient community-preserving graph embedding scheme including (i) random feature extraction, (ii) random embedding derivation, and (iii) hierarchical model selection.

We first use the efficient Gaussian random projection [14] to extract two sets of low-dimensional features from classic GP objectives of normalized cut (NCut) minimization [15] and modularity maximization [16], which induces two variants of RaftGP. The extracted features are believed to encode the community structures of graphs because the random projection can preserve the geometric structures (e.g., in terms of distances) between inputs (i.e., components about graph structures in the two GP objectives) with rigorous theoretical guarantees [14].

To derive low-dimensional embeddings, we then feed the extracted features into a graph neural network (GNN), which further enhances their ability to preserve community structures via multi-hop feature aggregation while reducing the dimensionality. In contrast to existing GNN-based methods [17, 18, 19] that require time-consuming training, we take inspiration from the efficient random projection and consider an extreme design of RaftGP, where we do not apply any training procedures to the GNN. Surprisingly, our experiments demonstrate that one feedforward propagation of a randomly initialized GNN, even without training, is enough for RaftGP to derive informative community-preserving embeddings.

To enable the embedding scheme to tackle GP, we introduce a hierarchical model selection algorithm that simultaneously determines the proper number of blocks and the corresponding block membership (i.e., the GP result). It recursively partitions the node set into two subsets by applying KMeans to the derived node embeddings, where a novel local modularity metric is used to stop the recursion.

We follow the experiment settings of the IEEE HPEC Graph Challenge to evaluate RaftGP and compare the performance with five baselines. However, we notice some limitations of the original data generators in the benchmark. To address these limitations, we develop a new algorithm that can generate graphs from the exact SBM distribution in nearly linear time.

II Problem Statements & Preliminaries

In this study, we consider undirected unweighted graphs. In general, a graph can be represented as $G=(V,E)$, where $V=\{v_1,v_2,\cdots,v_N\}$ and $E=\{(v_i,v_j)|v_i,v_j\in V\}$ are the sets of nodes and edges. The topology structure of a graph can be described by an adjacency matrix $\mathbf{A}\in\{0,1\}^{N\times N}$, where $\mathbf{A}_{ij}=\mathbf{A}_{ji}=1$ if $(v_i,v_j)\in E$ and $\mathbf{A}_{ij}=\mathbf{A}_{ji}=0$ otherwise. We study the following graph partitioning (GP) problem.

Definition 1 (Graph Partitioning, GP).

Given a graph $G$, GP (a.k.a. disjoint community detection) aims to partition the node set $V$ into $K$ disjoint subsets (i.e., blocks or communities) $C=\{C_1,\cdots,C_K\}$ such that (i) within each block the linkage is dense but (ii) between different blocks the linkage is relatively loose. In this study, we assume that $K$ is not given and should be determined by a model selection procedure.

Our RaftGP method follows a novel community-preserving embedding scheme based on some classic combinatorial optimization objectives of GP (e.g., NCut minimization [15] and modularity maximization [16]).

Definition 2 (Graph Embedding).

Given a graph $G$, graph embedding (a.k.a. graph representation learning) learns a function $f:V\mapsto\mathbb{R}^d$ that maps each node $v_i$ to a low-dimensional vector representation (a.k.a. embedding) $\mathbf{z}_i\in\mathbb{R}^d$ ($d\ll N$). The derived embeddings $\{\mathbf{z}_i\}$ are expected to preserve some major properties (e.g., community structures) of graph topology. For instance, nodes $(v_i,v_j)$ with similar properties (e.g., in the same block) should have close embeddings $(\mathbf{z}_i,\mathbf{z}_j)$ (e.g., in terms of distance). Graph embedding is a unified framework that can support various graph inference tasks by combining $\{\mathbf{z}_i\}$ with specific downstream modules. For GP, one can apply the KMeans clustering algorithm to $\{\mathbf{z}_i\}$, which derives the GP result $C$ w.r.t. the determined number of blocks $K$.

Definition 3 (NCut Minimization).

Given a graph $G$ and the number of blocks $K$ to be partitioned, NCut minimization aims to derive a GP result $C$ that minimizes the NCut metric:

$$\min_{C}~\mathrm{NCut}(G,K)\equiv\frac{1}{2}\sum\nolimits_{r=1}^{K}[\mathrm{cut}(C_r,\overline{C}_r)/\mathrm{vol}(C_r)], \quad (1)$$

where $\overline{C}_r=V-C_r$ denotes the complementary set of $C_r$; $\mathrm{cut}(C_r,\overline{C}_r)=\sum\nolimits_{v_i\in C_r,v_j\in\overline{C}_r}\mathbf{A}_{ij}$ is the cut between $C_r$ and $\overline{C}_r$; $\mathrm{vol}(C_r)=\sum\nolimits_{v_i\in C_r,v_j\in V}\mathbf{A}_{ij}$ is the volume of $C_r$. (1) can be further rewritten into the following matrix form:

$$\min_{\mathbf{H}}~\mathrm{tr}(\mathbf{H}^T\mathbf{L}\mathbf{H})~~\mathrm{s.t.}~~\mathbf{H}_{ir}=\begin{cases}[\deg(v_i)/\mathrm{vol}(C_r)]^{0.5}, & v_i\in C_r\\ 0, & \mathrm{otherwise}\end{cases}, \quad (2)$$

where $\mathbf{L}\equiv\mathbf{I}_N-\mathbf{D}^{-0.5}\mathbf{A}\mathbf{D}^{-0.5}$ is the normalized Laplacian matrix of $\mathbf{A}$; $\mathbf{D}\equiv\mathrm{diag}(\deg(v_1),\cdots,\deg(v_N))$ is defined as a degree diagonal matrix with $\deg(v_i)$ as the degree of node $v_i$; $\mathbf{I}_N$ is an $N$-dimensional identity matrix; $\mathbf{H}$ consists of the scaled block membership indicators.
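The equivalence between (1) and (2) can be checked numerically. The following sketch (our illustration, not code from the paper) builds a toy graph of two triangles joined by one bridge edge and verifies that $\mathrm{tr}(\mathbf{H}^T\mathbf{L}\mathbf{H})$ equals $2\cdot\mathrm{NCut}(G,K)$ for a candidate partition, so the two objectives share the same minimizers.

```python
import numpy as np

# Toy undirected graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
N = 6
A = np.zeros((N, N))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

deg = A.sum(axis=1)
L = np.eye(N) - np.diag(deg ** -0.5) @ A @ np.diag(deg ** -0.5)  # normalized Laplacian

blocks = [[0, 1, 2], [3, 4, 5]]            # a candidate partition C with K = 2
H = np.zeros((N, len(blocks)))             # scaled membership indicators of (2)
for r, block in enumerate(blocks):
    H[block, r] = np.sqrt(deg[block] / deg[block].sum())

ncut = 0.0                                 # NCut metric of (1)
for block in blocks:
    comp = [v for v in range(N) if v not in block]
    ncut += 0.5 * A[np.ix_(block, comp)].sum() / deg[block].sum()

# Both values are 2/7 ≈ 0.2857: tr(H^T L H) = 2 * NCut(G, K).
print(np.trace(H.T @ L @ H), 2 * ncut)
```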

Definition 4 (Modularity Maximization).

For the given $G$ and $K$, modularity maximization aims to derive a GP result $C$ that maximizes the following modularity metric:

$$\max_{C}~\mathrm{Mod}(C,K)\equiv\frac{1}{2e}\sum\limits_{r=1}^{K}\sum\limits_{v_i,v_j\in C_r}\left[\mathbf{A}_{ij}-\frac{\deg(v_i)\deg(v_j)}{2e}\right], \quad (3)$$

where $\deg(v_i)$ is the degree of node $v_i$; $e$ is the number of edges. Similarly, (3) can also be rewritten into a matrix form:

$$\min_{\mathbf{H}}~-\mathrm{tr}(\mathbf{H}^T\mathbf{Q}\mathbf{H})~~\mathrm{s.t.}~~\mathbf{H}_{ir}=\begin{cases}1, & v_i\in C_r\\ 0, & \mathrm{otherwise}\end{cases}, \quad (4)$$

where $\mathbf{Q}\in\mathbb{R}^{N\times N}$ is defined as the modularity matrix with $\mathbf{Q}_{ij}=\mathbf{A}_{ij}-\deg(v_i)\deg(v_j)/(2e)$; $\mathbf{H}$ consists of the block membership indicators.

Although NCut minimization and modularity maximization are NP-hard and need a pre-set number of blocks $K$, our method does not directly solve these combinatorial optimization problems. Instead, we only focus on how to efficiently extract informative community-preserving features from (2) and (4), which is detailed later in Section III-A.

The benchmarks of the IEEE HPEC Graph Challenge contain graphs drawn from the SBM with different settings. One baseline algorithm [1] is also based on maximizing the posterior probability of the SBM.

Definition 5 (Stochastic Block Model, SBM).

Let $\boldsymbol{\theta}\in\mathbb{R}^N_+$ and $\mathbf{c}\in[K]^N$ be the degree correction vector and the block assignment vector. Let $\mathbf{\Omega}\in\mathbb{R}_+^{K\times K}$ be the block interaction matrix. The SBM independently generates an edge for each pair of nodes $(v_i,v_j)$ (i.e., each entry of the adjacency matrix $\mathbf{A}$) via a Poisson distribution $\mathbf{A}_{ij}\sim\mathrm{Pois}(\lambda_{ij}\equiv\boldsymbol{\theta}_i\boldsymbol{\theta}_j\mathbf{\Omega}_{c_ic_j})$.
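To make Definition 5 concrete, the following minimal sketch (our illustration, not the benchmark generator) draws a small graph directly from the model: every entry of $\mathbf{A}$ is an independent Poisson variable with rate $\boldsymbol{\theta}_i\boldsymbol{\theta}_j\mathbf{\Omega}_{c_ic_j}$. This naive loop costs $O(N^2)$; Section IV gives a nearly linear-time generator.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 8, 2
theta = rng.uniform(0.5, 1.5, size=N)            # degree corrections
c = rng.integers(0, K, size=N)                   # block assignments
Omega = np.array([[1.0, 0.1],
                  [0.1, 1.0]])                   # block interaction matrix

A = np.zeros((N, N), dtype=int)
for i in range(N):
    for j in range(i + 1, N):                    # undirected: upper triangle only
        lam = theta[i] * theta[j] * Omega[c[i], c[j]]
        A[i, j] = A[j, i] = rng.poisson(lam)
print(A)
```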

III Methodology

To alleviate the NP-hard challenge of GP, we propose a novel RaftGP method following an efficient community-preserving embedding scheme. In this section, we elaborate on the three major procedures of RaftGP, which are (i) random feature extraction, (ii) random embedding derivation, and (iii) hierarchical model selection.

III-A Random Feature Extraction

We first extract informative features from the key components regarding graph structures in classic GP objectives. For NCut minimization defined in (2), $\mathbf{M}\equiv\mathbf{D}^{-0.5}\mathbf{A}\mathbf{D}^{-0.5}$ in the Laplacian matrix $\mathbf{L}$ is used to describe the graph topology. For modularity maximization, the modularity matrix $\mathbf{Q}$ in (4) is the primary component regarding graph structures.

In particular, $\{\mathbf{M},\mathbf{Q}\}$ can be considered as a reweighting of the original neighbor structures described by the adjacency matrix $\mathbf{A}$, where the similarity between $(\mathbf{M}_{i,:},\mathbf{M}_{j,:})$ (or $(\mathbf{Q}_{i,:},\mathbf{Q}_{j,:})$) indicates the neighbor similarity between nodes $(v_i,v_j)$. In general, nodes $(v_i,v_j)$ with a higher neighbor similarity (i.e., denser local linkage) are more likely to be partitioned into the same block. Hence, we believe that $\{\mathbf{M},\mathbf{Q}\}$ encode key characteristics about the community structures of a graph. Tian et al. [20] and Yang et al. [21] have also validated the potential of $\mathbf{M}$ and $\mathbf{Q}$ to derive community-preserving embeddings via deep auto-encoders. Instead of using auto-encoders, we propose a more efficient strategy to extract community-preserving features (denoted by $\mathbf{Y}\in\mathbb{R}^{N\times L}$) from $\{\mathbf{M},\mathbf{Q}\}$, with the feature dimensionality $L\ll N$.

Since most real-world graphs have sparse topology, $\mathbf{M}$ can be treated as a sparse matrix. However, $\mathbf{Q}$ is still a dense matrix. To fully utilize the sparsity of topology, we introduce a reduced modularity matrix $\tilde{\mathbf{Q}}\in\mathbb{R}^{N\times N}$, where $\tilde{\mathbf{Q}}_{ij}=\mathbf{Q}_{ij}$ if $(v_i,v_j)\in E$ and $\tilde{\mathbf{Q}}_{ij}=0$ otherwise. Although the reduction of $\mathbf{Q}$ may lose some information, our experiments demonstrate that $\tilde{\mathbf{Q}}$ is enough to derive informative community-preserving embeddings to support high-quality GP.

Let $\mathbf{X}\in\{\mathbf{M},\tilde{\mathbf{Q}}\}$ be the structural component from NCut minimization or modularity maximization, which induces two variants of RaftGP. We extract the features $\mathbf{Y}$ via

$$\mathbf{Y}=\mathbf{X}\mathbf{\Theta}~~\mathrm{with}~~\mathbf{\Theta}\in\mathbb{R}^{N\times L},~\mathbf{\Theta}_{ir}\sim\mathcal{N}(0,L^{-0.5}), \quad (5)$$

where $\mathbf{Y}_{i,:}$ is the extracted feature of node $v_i$. Concretely, (5) is the Gaussian random projection [14] of $\mathbf{X}\in\mathbb{R}^{N\times N}$, an efficient dimension reduction technique that can preserve the geometric structures (e.g., in terms of distances) of input features in finite-dimensional $l_1$ [22] and $l_2$ [23] spaces with rigorous theoretical guarantees. For instance, nodes $(v_i,v_j)$ with close $(\mathbf{X}_{i,:},\mathbf{X}_{j,:})$ are guaranteed to have close $(\mathbf{Y}_{i,:},\mathbf{Y}_{j,:})$.

Since $\mathbf{X}$ is sparse, the time complexity of the random projection is $O(|E|L)$ by using sparse-dense matrix multiplication. Given a graph $G$, the complexities to compute $\mathbf{M}$ and $\tilde{\mathbf{Q}}$ are both $O(|E|)$. In summary, the overall complexity of the random feature extraction step is $O(|E|L)$.
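As a concrete reference point, a minimal sketch of this step is given below (our illustration with numpy/scipy, assuming a scipy CSR adjacency matrix of an unweighted graph without isolated nodes; it is not the paper's released code). The returned $\mathbf{Y}$ is then used as the input $\mathbf{Z}^{(0)}$ of the GNN in Section III-B.

```python
import numpy as np
import scipy.sparse as sp

def extract_features(A: sp.csr_matrix, L: int, variant: str = "ncut", seed: int = 0):
    """Random feature extraction of Section III-A: Y = X @ Theta as in (5)."""
    rng = np.random.default_rng(seed)
    deg = np.asarray(A.sum(axis=1)).ravel().astype(float)
    if variant == "ncut":
        # X = M = D^{-1/2} A D^{-1/2} from the NCut objective (2).
        d_inv_sqrt = sp.diags(deg ** -0.5)
        X = d_inv_sqrt @ A @ d_inv_sqrt
    else:
        # X = reduced modularity matrix Q~: Q_ij kept only where A_ij = 1.
        two_e = deg.sum()
        X = A.tocoo().astype(float)
        X.data = X.data - deg[X.row] * deg[X.col] / two_e
        X = X.tocsr()
    Theta = rng.normal(0.0, L ** -0.5, size=(A.shape[0], L))  # Gaussian projection matrix
    return X @ Theta   # Y in R^{N x L}; the sparse-dense product costs O(|E| L)
```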

III-B Random Embedding Derivation

Although the extracted features (described by $\mathbf{Y}\in\mathbb{R}^{N\times L}$) have the initial ability to encode characteristics about community structures, we derive the final community-preserving embeddings (denoted by $\mathbf{Z}\in\mathbb{R}^{N\times d}$) by feeding $\mathbf{Y}$ into a multi-layer GNN, which further enhances the extracted features via the feature aggregation of GNN while reducing the feature dimensionality to $d<L$. We adopt GCN [24] to build the multi-layer GNN because it is easy to implement and has a low complexity of feature aggregation.

Let $\mathbf{Z}^{(k-1)}\in\mathbb{R}^{N\times L_{k-1}}$ and $\mathbf{Z}^{(k)}\in\mathbb{R}^{N\times L_k}$ be the input and output of the $k$-th GNN layer, where $\mathbf{Z}^{(0)}=\mathbf{Y}$; $L_{k-1}$ and $L_k$ are the input and output feature dimensionality. The $k$-th GNN layer can be described as

$$\mathbf{Z}^{(k)}=f_{\mathrm{act}}(\hat{\mathbf{D}}^{-0.5}\hat{\mathbf{A}}\hat{\mathbf{D}}^{-0.5}\mathbf{Z}^{(k-1)}\mathbf{W}^{(k)}), \quad (6)$$

where $\mathbf{Z}^{(k)}_{i,:}$ is the (intermediate) representation of node $v_i$; $f_{\mathrm{act}}(\cdot)$ is an activation function to be specified (e.g., tanh in our implementation); $\hat{\mathbf{A}}\equiv\mathbf{A}+\mathbf{I}_N$ is the adjacency matrix with self-edges; $\hat{\mathbf{D}}$ is the degree diagonal matrix of $\hat{\mathbf{A}}$; $\mathbf{W}^{(k)}\in\mathbb{R}^{L_{k-1}\times L_k}$ is a trainable parameter. In particular, $\mathbf{Z}^{(k)}_{i,:}$ is the non-linear aggregation (i.e., weighted mean) of features w.r.t. $\{v_i\}\cup\mathrm{N}(v_i)$ from the previous layer, with $\mathrm{N}(v_i)$ as the neighbor set of $v_i$. The feature aggregation forces nodes $(v_i,v_j)$ with similar neighbor sets $(\mathrm{N}(v_i),\mathrm{N}(v_j))$ (i.e., dense local linkage) to have similar features $(\mathbf{Z}^{(k)}_{i,:},\mathbf{Z}^{(k)}_{j,:})$, thus enhancing the ability to encode characteristics regarding community structures. The multi-layer structure can further extend the feature aggregation to the local topology induced by multi-hop neighbors.

Before feeding $\mathbf{Z}^{(k)}$ into the next GNN layer, we recommend conducting a row-wise $l_2$-normalization via $\mathbf{Z}^{(k)}_{i,:}\leftarrow\mathbf{Z}^{(k)}_{i,:}/\|\mathbf{Z}^{(k)}_{i,:}\|_2$, which controls the scale of the (intermediate) representation of each node $v_i$. We treat the normalized representations from the last GNN layer (denoted by $\mathbf{Z}\in\mathbb{R}^{N\times d}$) as the final community-preserving embeddings. In the rest of this paper, we use $\mathbf{z}_i\in\mathbb{R}^d$ to represent the embedding of $v_i$.

The powerful inference abilities of some state-of-the-art GNN-based GP methods (e.g., ClusterNet [17], LGNN [18], and ICD [19]) rely on the offline training of model parameters (e.g., $\{\mathbf{W}^{(k)}\}$) via time-consuming gradient descent.

Inspired by the efficient random projection described in (5), we consider an extreme design of RaftGP, where we do not apply any training procedures to the multi-layer GNN. Our experiments demonstrate that one feedforward propagation through a randomly initialized GNN without any training is enough for RaftGP to derive informative community-preserving embeddings and support high-quality GP. A reasonable interpretation is that each GNN layer described in (6) can be considered as a special random projection similar to (5). In the $k$-th layer, the geometric properties (e.g., relative distances) between the aggregated representations (i.e., $\hat{\mathbf{D}}^{-0.5}\hat{\mathbf{A}}\hat{\mathbf{D}}^{-0.5}\mathbf{Z}^{(k-1)}$) can be preserved with theoretical guarantees, even though they are multiplied by a random matrix $\mathbf{W}^{(k)}$ and mapped to another space with lower dimensionality. The nonlinear activation and $l_2$-normalization will also not largely change the geometric properties.

For the GNN feature aggregation in (6), one can treat $\hat{\mathbf{D}}^{-0.5}\hat{\mathbf{A}}\hat{\mathbf{D}}^{-0.5}$ as a sparse matrix and use efficient sparse-dense matrix multiplication for implementation. In this setting, the complexity of the GNN feedforward propagation is $O(|E|L+NLL_1+|E|L_1+NL_1L_2+\cdots)=O(|E|L+NLL_1)$, where we usually set $L_k>L_{k+1}$. Assume that $\{L,L_1\}$ have the same magnitude as the embedding dimensionality $d$. The complexity of the embedding derivation step is then $O(|E|d+Nd^2)$.
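For illustration, the sketch below (ours; written with numpy/scipy rather than the paper's PyTorch implementation) performs this untrained feedforward pass over the layer dimensions of Table I. The Gaussian scale used for the random weights reuses the $\mathcal{N}(0,1/\sqrt{\mathrm{fan\text{-}in}})$ choice of (5), which is our assumption rather than a setting stated in the paper.

```python
import numpy as np
import scipy.sparse as sp

def random_gnn_embeddings(A: sp.csr_matrix, Y: np.ndarray, layer_dims, seed: int = 0):
    """One feedforward pass of the untrained GCN of (6) with row-wise l2 normalization."""
    rng = np.random.default_rng(seed)
    N = A.shape[0]
    A_hat = A + sp.identity(N, format="csr")                     # A + I_N (self-edges)
    deg_hat = np.asarray(A_hat.sum(axis=1)).ravel().astype(float)
    d_inv_sqrt = sp.diags(deg_hat ** -0.5)
    P = d_inv_sqrt @ A_hat @ d_inv_sqrt                          # sparse propagation operator
    Z = Y
    for d_out in layer_dims:                                     # e.g. [128, 64] when L = 256
        # Random, untrained weights (scale is our assumption, see lead-in).
        W = rng.normal(0.0, Z.shape[1] ** -0.5, size=(Z.shape[1], d_out))
        Z = np.tanh(P @ Z @ W)                                   # aggregation + tanh of (6)
        Z = Z / (np.linalg.norm(Z, axis=1, keepdims=True) + 1e-12)  # row-wise l2 normalization
    return Z                                                     # final embeddings Z in R^{N x d}
```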

III-C Hierarchical Model Selection

Graph embedding is a unified framework that can support various inference tasks by combining the derived embeddings $\{\mathbf{z}_i\}$ with specific downstream modules. To enable RaftGP to support the GP task stated in Definition 1, the model selection problem (i.e., determining a proper number of blocks $K$) is critical. We introduce an efficient hierarchical model selection algorithm based on a novel local modularity metric.

Definition 6 (Local Modularity).

Let $U\subseteq V$ be a subset of nodes in graph $G=(V,E)$. Let $\deg_G(v)$ be the degree of node $v$ in $G$. Suppose $U$ is partitioned into $K$ disjoint blocks $(U_1,\cdots,U_K)$. We define the local modularity metric as

$$\mathrm{LMod}(U_1,\cdots,U_K;G)\equiv\sum\nolimits_{r=1}^{K}\left[\frac{m_r}{e}-\left(\frac{\overline{m}_r}{2e}\right)^2\right], \quad (7)$$

where $m_r$ is the number of intra-block edges of $U_r$; $\overline{m}_r=\sum\nolimits_{v\in U_r}\deg_G(v)$ is the total degree of nodes in $U_r$; $2e=\sum\nolimits_{r=1}^{K}\overline{m}_r$ is the total degree of all nodes in $U$.

Different from the standard modularity on the induced subgraph $G[U]$, which considers only the edges within $U$, local modularity also takes into account the edges from $U$ to $V-U$. When partitioning a subgraph, local modularity thus keeps its interaction with the remaining graph, which helps ensure that we do not further partition a true block into sub-blocks.
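A minimal sketch of this metric (ours, assuming a networkx graph) is given below; the partition of $U$ is passed as a list of node lists, and node degrees are always taken in the full graph $G$.

```python
import networkx as nx

def local_modularity(blocks, G: nx.Graph) -> float:
    """Local modularity LMod of (7); `blocks` is a partition of a node subset U of G."""
    # \bar{m}_r: total degree (in the full graph G) of the nodes in U_r.
    bar_m = [sum(d for _, d in G.degree(block)) for block in blocks]
    two_e = float(sum(bar_m))                       # total degree of all nodes in U
    lmod = 0.0
    for block, bm in zip(blocks, bar_m):
        m_r = G.subgraph(block).number_of_edges()   # intra-block edges of U_r
        lmod += m_r / (two_e / 2.0) - (bm / two_e) ** 2
    return lmod
```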

Algorithm 1: Hierarchical Model Selection
1: global variables
2:     Graph $G=(V,E)$; derived node embeddings $\{\mathbf{z}_i\}$; current GP result $C$ (with $C\leftarrow\emptyset$ at the beginning)
3: procedure Partition($U,G$)
4:     $m_1\leftarrow\mathrm{LMod}(U;G)$ via (7)
5:     Apply KMeans to $\{\mathbf{z}_i|v_i\in U\}$ with $K=2$ and get two subsets $\{U_1,U_2\}$
6:     $m_2\leftarrow\mathrm{LMod}(U_1,U_2;G)$ via (7)
7:     if $|U_1|\leq\epsilon$ or $|U_2|\leq\epsilon$ or $m_1>m_2$ then
8:         Add $U$ as a new block to $C$
9:     else
10:        Partition($U_1,G$)
11:        Partition($U_2,G$)
Figure 1: The hierarchical model selection procedure on a sample embedding. Dots represent vertex embeddings. Rectangles represent final or intermediate blocks. Dashed lines represent partitions returned by the KMeans algorithm.

Our hierarchical model selection procedure based on the local modularity metric is summarized in Algorithm 1. We demonstrate it by an example (Fig. 1). For a set of nodes $U\subseteq V$ to be partitioned, we first compute the local modularity $m_1$ that treats $U$ as a single block (Line 4). The KMeans clustering algorithm is then applied to the embeddings $\{\mathbf{z}_i|v_i\in U\}$ (Line 5), which tries to divide $U$ into two blocks $\{U_1,U_2\}$ based on the distances between embeddings. We then compute the local modularity $m_2$ w.r.t. the partition $\{U_1,U_2\}$ (Line 6) and apply a heuristic stopping rule based on the sizes of $\{U_1,U_2\}$ and the values of $\{m_1,m_2\}$. Concretely, we adopt the set $U$ as a new block if $|U_1|$ or $|U_2|$ is less than or equal to a threshold $\epsilon$ (i.e., $\epsilon=5$ in our implementation) or $m_1>m_2$ (Lines 7-8). Otherwise, we repeat the aforementioned procedures to further partition $U_1$ and $U_2$ (Lines 10-11).

In particular, the stopping rule in Line 7 effectively avoids partitioning $U$ into too small or imbalanced blocks (e.g., with $|U_1|\leq\epsilon$ or $|U_2|\leq\epsilon$). Consistent with the modularity maximization objective, it also tries to avoid a decrease of modularity after the partitioning.

The complexity of this algorithm depends on the quality of the embeddings. If the embeddings are of good quality so that KMeans always partitions the node set $U$ into two balanced subsets, KMeans is run on a total of $O(N\log N)$ nodes, and the complexity in this case is $O(NT\log N)$, where $T$ is the number of iterations of KMeans. In the worst case, the set $U$ is always partitioned into a single block and the remaining blocks; KMeans is then run on $O(KN)$ nodes, where $K$ is the number of blocks, and the complexity is $O(KTN)$.
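A compact sketch of Algorithm 1 (ours, reusing the local_modularity helper above and assuming that Z is the N x d embedding matrix, G is a networkx graph with nodes 0..N-1, and KMeans comes from scikit-learn) could look as follows.

```python
import numpy as np
from sklearn.cluster import KMeans

def partition(U, G, Z, C, eps=5):
    """A sketch of Algorithm 1: recursive bisection of U guided by local modularity."""
    U = np.asarray(U)
    m1 = local_modularity([U.tolist()], G)                      # U kept as a single block
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(Z[U])  # bisect by embeddings
    U1, U2 = U[labels == 0].tolist(), U[labels == 1].tolist()
    m2 = local_modularity([U1, U2], G)
    if len(U1) <= eps or len(U2) <= eps or m1 > m2:             # stopping rule (Line 7)
        C.append(U.tolist())                                    # adopt U as a final block
    else:
        partition(U1, G, Z, C, eps)
        partition(U2, G, Z, C, eps)

# Usage: blocks = []; partition(list(range(Z.shape[0])), G, Z, blocks)
```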

III-D Extension to Streaming Graphs

The aforementioned designs of RaftGP can be easily extended to streaming graphs. Let $E'$ be the set of newly added edges in the streaming setting. Since the addition of $E'$ only changes the degrees of at most $2|E'|$ nodes, one can use an incremental strategy to update the input statistics $\mathbf{X}\in\{\mathbf{M},\tilde{\mathbf{Q}}\}$ with a complexity of $O(|E'|)$. Let $V'$ be the set of nodes incident to $E'$. Note that only the multi-hop neighbors of $V'$ (denoted by $V''$) will participate in the GNN feature aggregation of $V'$. Let $E''$ be the set of edges induced by $V'\cup V''$. One can also incrementally update the embeddings $\{\mathbf{z}_i\}$ via the random projection and GNN feedforward propagation induced by $(V'\cup V'',E'')$. Therefore, the overall complexity of the embedding derivation is $O(|E''|d+|V'\cup V''|d)$, where we usually have $|E''|<|E|$ and $|V'\cup V''|<N$.

For the hierarchical model selection, one can formulate the partitioning procedure as a tree, where each tree node represents a subset of nodes (i.e., $U\subseteq V$) to be partitioned; the children of this tree node represent a partition of $U$ (e.g., $(U_1,U_2)$); each leaf corresponds to a block $C_r$. The tree structure allows one to update only a tree path (i.e., a path from a leaf to the root) once the embedding $\mathbf{z}_i$ of a node $v_i$ is updated.

IV Fast Graph Generating from the SBM

The stochastic block partition benchmark of the IEEE HPEC Graph Challenge generates test graphs using the SBM generators (i.e., ‘blockmodel’ and ‘blockmodel-degree’) implemented by graph-tool [25]. We argue that there are some limitations in the implementations of these generators.

For instance, ‘blockmodel’ ignores the degree correction terms (i.e., $\boldsymbol{\theta}$ in Definition 5). ‘blockmodel-degree’ uses the Markov chain Monte Carlo (MCMC) sampling algorithm to rewire the edges generated from another random model. As a result, the distribution $\mathcal{X}$ of its output approaches the SBM distribution $\mathcal{S}$ only after sufficiently many iterations of MCMC. In practice, the number of iterations is set to a constant, which makes $\mathcal{X}$ an approximation of $\mathcal{S}$ without a provable distance guarantee.

Algorithm 2: Fast Graph Generating from the SBM
1: procedure RangeSum($r,s,x,y$)
2:     return the sum of $\{\lambda_{ij}\equiv\boldsymbol{\theta}_i\boldsymbol{\theta}_j\mathbf{\Omega}_{c_ic_j}\}$ over the $x$-th to the $y$-th potential edge between blocks $C_r$ and $C_s$
3: procedure Sample($N,K,\mathbf{c},\boldsymbol{\theta},\mathbf{\Omega}$)
4:     Let $C_r\leftarrow\{v_i|\mathbf{c}_i=r\}$ be the $r$-th block
5:     for all pairs $(r,s)$ from $[K]\times[K]$ do
6:         $m\leftarrow 1$ // counter of potential edges
7:         while $m\leq|C_r||C_s|$ do
8:             Sample $p\sim U[0,1]$
9:             Let $t$ be the smallest integer in $[m,|C_r||C_s|]$ s.t. $p>\exp(-\mathrm{RangeSum}(r,s,m,t))$
10:            if $t$ does not exist then
11:                break
12:            Let $(v_i,v_j)$ be the $t$-th potential edge from $C_r$ to $C_s$
13:            Generate $\mathbf{A}_{ij}\sim\mathrm{Pois}(\lambda_{ij})$ s.t. $\mathbf{A}_{ij}\neq 0$
14:            $m\leftarrow t+1$

To address the limitations, we introduce a new fast graph generating algorithm. It generates graphs from the exact SBM distribution $\mathcal{S}(\boldsymbol{\theta},\mathbf{c},\mathbf{\Omega})$ and has a nearly linear time complexity of $O((|E|+K^2)\log N)$, where $\{\boldsymbol{\theta},\mathbf{c},\mathbf{\Omega}\}$ are defined in Definition 5; $K$, $N$, and $|E|$ are the numbers of blocks, nodes, and edges in the generated graph.

Algorithm 2 summarizes the graph generating procedure, where we enumerate all pairs of blocks $(C_r,C_s)$ ($r\leq s$) and sample the edges between these two blocks. The algorithm repeatedly finds the next potential edge with a nonzero weight (Lines 8-9) and determines its weight from a Poisson distribution (Line 13). To implement Line 9, we perform a binary search on $t$. To compute $\mathrm{RangeSum}(r,s,x,y)$, we arrange all the possible edges between $(C_r,C_s)$ in a virtual $|C_r|\times|C_s|$ table; a precomputed 2D prefix sum then gives $\mathrm{RangeSum}(r,s,x,y)$ in $O(1)$ time. Note that for undirected graphs, we should ignore the lower-diagonal entries in $\mathbf{A}$.
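To make the skipping idea concrete, the sketch below (our simplified illustration, not the paper's implementation) samples the edges between one pair of blocks. It uses dense prefix sums over the flattened rate table instead of the paper's virtual 2D table, and it omits the triangle handling needed when $r=s$ for undirected graphs. Here Cr and Cs are lists of node indices, theta the degree corrections, Omega the block interaction matrix, and rng a numpy random generator.

```python
import numpy as np

def sample_block_pair(Cr, Cs, theta, Omega, r, s, rng):
    """Skip directly to the next nonzero-weight slot between blocks C_r and C_s."""
    # Rates lambda_ij of all |C_r| * |C_s| potential edges, flattened row by row.
    lam = (np.outer(theta[Cr], theta[Cs]) * Omega[r, s]).ravel()
    prefix = np.concatenate(([0.0], np.cumsum(lam)))    # prefix[t] = sum of lam[:t]
    edges, m = [], 0                                     # m: next unvisited slot (0-indexed)
    while m < lam.size:
        p = 1.0 - rng.uniform()                          # p in (0, 1]
        # Smallest t >= m with p > exp(-(prefix[t+1] - prefix[m])): a binary search
        # on the prefix sums plays the role of RangeSum in Algorithm 2.
        t = np.searchsorted(prefix, prefix[m] - np.log(p), side="right") - 1
        if t >= lam.size:
            break                                        # all remaining slots stay empty
        w = 0
        while w == 0:                                    # zero-truncated Poisson weight
            w = rng.poisson(lam[t])
        edges.append((Cr[t // len(Cs)], Cs[t % len(Cs)], w))
        m = t + 1
    return edges
```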

V Experiments

V-A Experiment Setups

V-A1 Datasets

TABLE I: Details of the Generated Benchmark Datasets
N   | |E|            | K  | deg   | density | Layer Configuration
1E3 | 1.69E4~1.74E4  | 11 | 9~112 | 3E-2    | [256, 128, 64]
5E3 | 9.51E4~9.69E4  | 19 | 5~164 | 8E-3    | [1024, 512, 256, 128]
1E4 | 1.95E5~1.96E5  | 25 | 6~180 | 4E-3    | [4096, 2048, 1024, 512, 256]
5E4 | 9.97E5~1.01E6  | 44 | 5~205 | 8E-4    | [4096, 2048, 1024, 512, 256]
1E5 | 2.01E6~2.02E6  | 56 | 4~209 | 4E-4    | [4096, 2048, 1024, 512, 256]
2E5 | 4.04E6~4.06E6  | 71 | 5~230 | 2E-4    | [4096, 2048, 1024, 512, 256]

We followed the stochastic block partitioning benchmark of the IEEE HPEC Graph Challenge to generate test datasets from the SBM. As mentioned in Section IV, we develop a new generator to overcome the limitations of the original implementations in the benchmark.

We set (i) the ratio between the numbers of within-block edges and between-block edges to 2.5 and (ii) the block size heterogeneity to 3. These configurations correspond to the hardest setting of the benchmark. We generated graphs with different scales by setting the number of nodes $N$ to 1E3, 5E3, 1E4, 5E4, 1E5, and 2E5, respectively. For each setting of $N$, we independently generated five graphs. Statistics of the generated datasets are shown in Table I, where $|E|$ and $K$ are the numbers of edges and blocks and deg is the range of node degrees.

V-A2 Baselines

We compared RaftGP with five baselines, including MC-SBM [1], greedy modularity maximization (GMod) [26], spectral modularity maximization (SMod) [16], Par-SBM [27], and BP-Mod [28]. As stated in Definition 1, we consider the GP task where the number of blocks KK is unknown and determined by the method to be evaluated. Hence, some other baselines that need a pre-set KK (e.g., metis [29], GraClus [30], and spectral clustering [15]) could not be included in our experiments.

Note that RaftGP has two variants, whose features are extracted from NCut minimization (i.e., $\mathbf{X}=\mathbf{M}$) and modularity maximization (i.e., $\mathbf{X}=\tilde{\mathbf{Q}}$); they are denoted as RaftGP-C and RaftGP-M, respectively.

V-A3 Evaluation Criteria

We followed the IEEE HPEC Graph Challenge to adopt accuracy, adjusted Rand index (ARI), precision, and recall as quality metrics. Given precision and recall, we also computed the corresponding F1-score as a comprehensive metric considering both aspects. The runtime (in seconds) of a method to output its GP result was used as the efficiency metric. We also define that a method encounters the out-of-time (OOT) exception if it cannot obtain a GP result within 1E4 seconds.
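For reference, a minimal sketch of how such metrics can be computed is shown below (ours; the Graph Challenge ships its own scoring scripts, which are authoritative). It uses the common pairwise co-clustering definitions of precision and recall; accuracy additionally requires matching predicted blocks to true blocks and is omitted here.

```python
import numpy as np
from itertools import combinations
from sklearn.metrics import adjusted_rand_score

def pairwise_scores(true_labels, pred_labels):
    """Pairwise precision/recall/F1 over all node pairs (O(N^2); fine for small checks)."""
    pairs = list(combinations(range(len(true_labels)), 2))
    same_true = np.array([true_labels[i] == true_labels[j] for i, j in pairs])
    same_pred = np.array([pred_labels[i] == pred_labels[j] for i, j in pairs])
    tp = np.sum(same_true & same_pred)
    precision = tp / max(np.sum(same_pred), 1)
    recall = tp / max(np.sum(same_true), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    return precision, recall, f1

# ARI comes directly from scikit-learn:
# ari = adjusted_rand_score(true_labels, pred_labels)
```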

V-A4 Parameter & Environment Settings

Layer configurations of RaftGP on all the datasets are shown in Table I, where the first and last numbers denote the input feature dimensionality $L$ and the embedding dimensionality $d$. We used Python 3.7 and PyTorch 1.13 to implement RaftGP and adopted the official or widely-used implementations of the baselines (in C++ and Python). All the experiments were conducted on a server with two Intel Xeon 6130 CPUs (16 cores), one GeForce RTX 3090 GPU (24GB GPU memory), 32GB main memory, and Ubuntu 18.04.5 LTS.

V-B Quantitative Evaluation & Discussions

TABLE II: Quantitative Evaluation Results
N   | Methods  | Acc↑   | ARI↑   | F1↑ (Recall, Precision)  | Time↓ (s)
1E3 | MC-SBM   | 0.9764 | 0.9689 | 0.9736 (0.9486, 1.0000)  | 10.4589
    | GMod     | 0.7098 | 0.6813 | 0.7309 (0.9823, 0.5820)  | 4.7274
    | SMod     | 0.7098 | 0.5879 | 0.6531 (0.8675, 0.5237)  | 42.4847
    | Par-SBM  | 0.9984 | 0.9976 | 0.9978 (0.9959, 0.9998)  | 0.4022
    | BP-SBM   | 0.6904 | 0.6924 | 0.7508 (0.9999, 0.6010)  | 261.7399
    | RaftGP-C | 0.9996 | 0.9994 | 0.9994 (0.9997, 0.9992)  | 1.6172
    | RaftGP-M | 0.9996 | 0.9994 | 0.9994 (0.9997, 0.9992)  | 1.5634
5E3 | MC-SBM   | 0.9841 | 0.9858 | 0.9868 (0.9739, 1.0000)  | 83.0489
    | GMod     | 0.6172 | 0.5930 | 0.6289 (0.9627, 0.4670)  | 120.4837
    | SMod     | 0.3639 | 0.1939 | 0.2848 (0.7149, 0.1778)  | 1234.89
    | Par-SBM  | 0.9916 | 0.9894 | 0.9904 (0.9984, 0.9825)  | 1.1748
    | BP-SBM   | 0.4746 | 0.4755 | 0.5282 (0.9999, 0.3589)  | 3994.94
    | RaftGP-C | 0.9998 | 0.9998 | 0.9997 (0.9999, 0.9996)  | 5.9843
    | RaftGP-M | 0.9998 | 0.9998 | 0.9997 (0.9999, 0.9996)  | 6.1395
1E4 | MC-SBM   | 0.9608 | 0.9495 | 0.9539 (0.9119, 0.9999)  | 313.1485
    | GMod     | OOT    | OOT    | OOT                      | >10,000
    | SMod     | 0.4693 | 0.3066 | 0.3721 (0.7592, 0.2464)  | 6609.61
    | Par-SBM  | 0.9908 | 0.9959 | 0.9962 (0.9992, 0.9933)  | 4.4148
    | BP-SBM   | OOT    | OOT    | OOT                      | >10,000
    | RaftGP-C | 0.9981 | 0.9974 | 0.9975 (0.9999, 0.9952)  | 15.6803
    | RaftGP-M | 0.9981 | 0.9966 | 0.9968 (0.9999, 0.9938)  | 15.6424
5E4 | MC-SBM   | 0.9769 | 0.9701 | 0.9711 (0.9439, 1.0000)  | 1658.27
    | GMod     | OOT    | OOT    | OOT                      | >10,000
    | SMod     | OOM    | OOM    | OOM                      | OOM
    | Par-SBM  | 0.9748 | 0.9747 | 0.9758 (0.9998, 0.9529)  | 14.2388
    | BP-SBM   | OOT    | OOT    | OOT                      | >10,000
    | RaftGP-C | 0.9998 | 0.9998 | 0.9998 (1.0000, 0.9996)  | 71.1237
    | RaftGP-M | 0.9998 | 0.9998 | 0.9998 (1.0000, 0.9996)  | 69.4147
1E5 | MC-SBM   | OOT    | OOT    | OOT                      | >10,000
    | GMod     | OOT    | OOT    | OOT                      | >10,000
    | SMod     | OOM    | OOM    | OOM                      | OOM
    | Par-SBM  | 0.9061 | 0.9096 | 0.9130 (0.9998, 0.8401)  | 38.9002
    | BP-SBM   | OOT    | OOT    | OOT                      | >10,000
    | RaftGP-C | 1.0000 | 1.0000 | 1.0000 (1.0000, 1.0000)  | 150.6443
    | RaftGP-M | 1.0000 | 1.0000 | 1.0000 (1.0000, 1.0000)  | 154.8510
2E5 | MC-SBM   | OOT    | OOT    | OOT                      | >10,000
    | GMod     | OOT    | OOT    | OOT                      | >10,000
    | SMod     | OOM    | OOM    | OOM                      | OOM
    | Par-SBM  | 0.9341 | 0.9445 | 0.9461 (0.9999, 0.8978)  | 109.4829
    | BP-SBM   | OOT    | OOT    | OOT                      | >10,000
    | RaftGP-C | 1.0000 | 1.0000 | 1.0000 (1.0000, 1.0000)  | 377.0465
    | RaftGP-M | 1.0000 | 1.0000 | 1.0000 (1.0000, 1.0000)  | 367.5187
TABLE III: Detailed Runtime of RaftGP in terms of Seconds
N   | Methods  | Total (s) | Feat (s) | Emb (s) | Model (s)
1E3 | RaftGP-C | 1.6172    | 0.1172   | 0.2454  | 1.2547
    | RaftGP-M | 1.5634    | 0.1136   | 0.2320  | 1.2177
5E3 | RaftGP-C | 5.9843    | 0.5228   | 0.2525  | 5.2089
    | RaftGP-M | 6.1395    | 0.4374   | 0.2537  | 5.4484
1E4 | RaftGP-C | 15.6803   | 2.7233   | 0.3040  | 12.6531
    | RaftGP-M | 15.6424   | 2.4969   | 0.2788  | 12.8667
5E4 | RaftGP-C | 71.1237   | 11.2939  | 0.6683  | 59.1615
    | RaftGP-M | 69.4147   | 10.4094  | 0.4556  | 58.5497
1E5 | RaftGP-C | 150.6443  | 23.1460  | 0.7190  | 126.7793
    | RaftGP-M | 154.8510  | 24.0318  | 0.7432  | 130.0760
2E5 | RaftGP-C | 377.0465  | 62.1050  | 1.3236  | 313.6178
    | RaftGP-M | 367.5187  | 57.6826  | 1.5124  | 308.3236

The average quality metrics and runtime over the five generated graphs w.r.t. each setting of $N$ are reported in Table II; OOT and OOM denote the out-of-time and out-of-memory exceptions. In addition to the total runtime, Table III reports the time of (i) random feature extraction (Feat), (ii) embedding derivation (Emb, i.e., GNN feedforward propagation), and (iii) model selection (Model) for RaftGP.

Note that RaftGP does not include any training procedures. Surprisingly, both variants of RaftGP achieve the best quality metrics in most cases, significantly outperforming MC-SBM, GMod, SMod, and BP-SBM. This verifies our discussion in Section III-B that one feedforward propagation through a randomly initialized GNN is a special random projection that preserves the geometric properties of the aggregated features, and thus can still derive informative embeddings. Although Par-SBM has close quality and slightly shorter runtime compared with RaftGP when $N$ is small, our method achieves much better quality metrics on the larger graphs. In this sense, we believe that RaftGP achieves a better trade-off between quality and efficiency, especially when $N$ is large.

On all the datasets, the two variants of RaftGP have close quality metrics and both perform the best. This validates our motivation of using $\mathbf{M}$ and $\tilde{\mathbf{Q}}$ (extracted from the NCut minimization and modularity maximization objectives) as informative statistics for the derivation of community-preserving embeddings.

As shown in Table III, model selection is the major bottleneck for both variants of RaftGP, accounting for more than 80% of the total runtime. This implies that the random feature extraction and GNN-based embedding derivation are efficient designs for RaftGP, whose combined runtime is even shorter than that of Par-SBM.

V-C Ablation Study

TABLE IV: Ablation Study of RaftGP
N   | Methods      | Acc↑   | ARI↑   | F1↑ (Recall, Precision)
1E3 | RaftGP-C     | 0.9996 | 0.9994 | 0.9994 (0.9997, 0.9992)
    | (i) w/o Feat | 0.9996 | 0.9994 | 0.9994 (0.9997, 0.9992)
    | (ii) w/o Emb | 0.9806 | 0.9903 | 0.9916 (0.9997, 0.9836)
    | RaftGP-M     | 0.9996 | 0.9994 | 0.9994 (0.9997, 0.9992)
    | (i) w/o Feat | 0.9996 | 0.9994 | 0.9994 (0.9997, 0.9992)
    | (ii) w/o Emb | 0.9000 | 0.8936 | 0.9190 (0.9998, 0.8502)
5E3 | RaftGP-C     | 0.9998 | 0.9998 | 0.9997 (0.9999, 0.9996)
    | (i) w/o Feat | 0.9998 | 0.9998 | 0.9997 (0.9999, 0.9996)
    | (ii) w/o Emb | 0.9846 | 0.9784 | 0.9798 (0.9999, 0.9605)
    | RaftGP-M     | 0.9998 | 0.9998 | 0.9997 (0.9999, 0.9996)
    | (i) w/o Feat | 0.9998 | 0.9998 | 0.9997 (0.9999, 0.9996)
    | (ii) w/o Emb | 0.1257 | 0.0067 | 0.1266 (0.9996, 0.0676)
1E4 | RaftGP-C     | 0.9981 | 0.9974 | 0.9975 (0.9999, 0.9952)
    | (i) w/o Feat | 0.9981 | 0.9967 | 0.9968 (0.9999, 0.9938)
    | (ii) w/o Emb | 0.8452 | 0.7023 | 0.7534 (0.9998, 0.6045)
    | RaftGP-M     | 0.9981 | 0.9966 | 0.9968 (0.9999, 0.9938)
    | (i) w/o Feat | 0.9981 | 0.9966 | 0.9968 (0.9999, 0.9938)
    | (ii) w/o Emb | 0.2725 | 0.0914 | 0.1966 (1.0000, 0.1090)
5E4 | RaftGP-C     | 0.9998 | 0.9998 | 0.9998 (1.0000, 0.9996)
    | (i) w/o Feat | OOM    | OOM    | OOM
    | (ii) w/o Emb | 0.0537 | 0.0000 | 0.0543 (1.0000, 0.0279)
    | RaftGP-M     | 0.9998 | 0.9998 | 0.9998 (1.0000, 0.9996)
    | (i) w/o Feat | OOM    | OOM    | OOM
    | (ii) w/o Emb | 0.0829 | 0.0261 | 0.0791 (0.9998, 0.0412)

We also validated the effects of (i) the random feature extraction and (ii) the embedding derivation (with GNN) of RaftGP by removing the corresponding components. In case (i), $\mathbf{X}\in\{\mathbf{M},\tilde{\mathbf{Q}}\}$ was directly used as the input of the GNN (without random projection). In case (ii), we directly treated the features $\mathbf{Y}$ derived via random projection as the embedding input of model selection (without using the GNN). Results of our ablation study on all the datasets are shown in Table IV.

Although case (i) can achieve quality competitive to the original RaftGP model when $N$ is small, it encounters the OOM exception when $N\geq$ 5E4. This indicates that the Gaussian random projection can reduce the feature dimensionality while preserving the key geometric properties of the input features, as discussed in Section III-A. Moreover, the GNN-based embedding derivation is essential to ensure the high quality of RaftGP, because there are significant quality declines in case (ii) as $N$ increases. This verifies our discussion in Section III-B that the feature aggregation of GNN can enhance the ability of RaftGP to preserve community structures.

VI Conclusion

In this paper, we focused on the GP problem and proposed the RaftGP method based on an efficient community-preserving embedding scheme, including random feature extraction, random embedding derivation, and hierarchical model selection. We found that a randomly initialized GNN, even without training, is enough to derive informative community-preserving embeddings and support high-quality GP. We evaluated our method on the stochastic block partitioning benchmark of the IEEE HPEC Graph Challenge and compared its performance with five baselines, where a new graph generating algorithm was developed to address some limitations of the original data generators in the benchmark. Compared to the baseline algorithm [1] of the IEEE HPEC Graph Challenge, our method achieves better accuracy on all test cases and is at least 64.5x faster on large graphs. In our future work, we plan to extend RaftGP to dynamic graphs [31, 32, 33, 34].

References

  • [1] T. Peixoto, “Efficient monte carlo and greedy heuristic for the inference of stochastic block models,” Physical Review. E, Statistical, Nonlinear, & Soft Matter Physics, vol. 89, p. 012804, 01 2014.
  • [2] M. Qin, D. Jin, K. Lei, B. Gabrys, and K. Musial-Gabrys, “Adaptive community detection incorporating topology and content in social networks,” Knowledge-Based Systems, vol. 161, pp. 342–356, 2018.
  • [3] W. Li, M. Qin, and K. Lei, “Identifying interpretable link communities with user interactions and messages in social networks,” in Proceedings of the 2019 IEEE International Conference on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom).   IEEE, 2019, pp. 271–278.
  • [4] M. Qin and K. Lei, “Dual-channel hybrid community detection in attributed networks,” Information Sciences, vol. 551, pp. 146–167, 2021.
  • [5] B. Hendrickson and T. G. Kolda, “Graph partitioning models for parallel computing,” Parallel Computing, vol. 26, no. 12, pp. 1519–1534, 2000.
  • [6] L. Dai and B. Bai, “Optimal decomposition for large-scale infrastructure-based wireless networks,” IEEE Transactions on Wireless Communications, vol. 16, no. 8, pp. 4956–4969, 2017.
  • [7] M. Qin, K. Lei, B. Bai, and G. Zhang, “Towards a profiling view for unsupervised traffic classification by exploring the statistic features and link patterns,” in Proceedings of the 2019 ACM SIGCOMM NetAI Workshop, 2019, pp. 50–56.
  • [8] E. K. Kao, V. Gadepally, M. B. Hurley, M. Jones, J. Kepner, S. Mohindra, P. Monticciolo, A. Reuther, S. Samsi, W. Song, D. Staheli, and S. T. Smith, “Streaming graph challenge: Stochastic block partition,” in Proceedings of the 2017 IEEE High Performance Extreme Computing Conference (HPEC), 2017, pp. 1–12.
  • [9] D. Zhuzhunashvili and A. Knyazev, “Preconditioned spectral clustering for stochastic block partition streaming graph challenge (preliminary version at arxiv.),” in Proceedings of the 2017 IEEE High Performance Extreme Computing Conference (HPEC).   IEEE, 2017, pp. 1–6.
  • [10] L. Durbeck and P. Athanas, “Dpgs graph summarization preserves community structure,” in Proceedings of the 2021 IEEE High Performance Extreme Computing Conference (HPEC).   IEEE, 2021, pp. 1–9.
  • [11] A. J. Uppal, J. Choi, T. B. Rolinger, and H. H. Huang, “Faster stochastic block partition using aggressive initial merging, compressed representation, and parallelism control,” in Proceedings of the 2021 IEEE High Performance Extreme Computing Conference (HPEC).   IEEE, 2021, pp. 1–7.
  • [12] F. Wanye, V. Gleyzer, and W.-c. Feng, “Fast stochastic block partitioning via sampling,” in Proceedings of the 2019 IEEE High Performance Extreme Computing Conference (HPEC).   IEEE, 2019, pp. 1–7.
  • [13] Z. Zhang, P. Cui, and W. Zhu, “Deep learning on graphs: A survey,” IEEE Transactions on Knowledge & Data Engineering (TKDE), vol. 34, no. 1, pp. 249–270, 2020.
  • [14] R. I. Arriaga and S. Vempala, “An algorithmic theory of learning: Robust concepts and random projection,” Machine Learning, vol. 63, pp. 161–182, 2006.
  • [15] U. Von Luxburg, “A tutorial on spectral clustering,” Statistics & Computing, vol. 17, no. 4, pp. 395–416, 2007.
  • [16] M. E. Newman, “Modularity and community structure in networks,” Proceedings of the National Academy of Sciences (PNAS), vol. 103, no. 23, pp. 8577–8582, 2006.
  • [17] B. Wilder, E. Ewing, B. Dilkina, and M. Tambe, “End to end learning and optimization on graphs,” Proceedings of the 2019 Advances in Neural Information Processing Systems (NIPS), vol. 32, 2019.
  • [18] Z. Chen, J. Bruna, and L. Li, “Supervised community detection with line graph neural networks,” in Proceedings of the 7th International Conference on Learning Representations (ICLR), 2019.
  • [19] M. Qin, C. Zhang, B. Bai, G. Zhang, and D.-Y. Yeung, “Towards a better trade-off between quality and efficiency of community detection: An inductive embedding method across graphs,” ACM Transactions on Knowledge Discovery from Data (TKDD), 2023.
  • [20] F. Tian, B. Gao, Q. Cui, E. Chen, and T.-Y. Liu, “Learning deep representations for graph clustering,” in Proceedings of the 28th AAAI Conference on Artificial Intelligence, vol. 28, no. 1, 2014.
  • [21] L. Yang, X. Cao, D. He, C. Wang, X. Wang, and W. Zhang, “Modularity based community detection with deep learning,” in Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI), vol. 16, 2016, pp. 2252–2258.
  • [22] J. Bourgain, “On lipschitz embedding of finite metric spaces in hilbert space,” Israel Journal of Mathematics, vol. 52, no. 1-2, pp. 46–52, 1985.
  • [23] W. B. Johnson and J. Lindenstrauss, “Extensions of lipschitz mappings into a hilbert space,” Contemporary Mathematics, vol. 26, pp. 189–206, 1984.
  • [24] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in Proceedings of the 5th International Conference on Learning Representations (ICLR), 2017.
  • [25] T. P. Peixoto, “The graph-tool python library,” figshare, 2014. [Online]. Available: http://figshare.com/articles/graph_tool/1164194
  • [26] A. Clauset, M. E. Newman, and C. Moore, “Finding community structure in very large networks,” Physical Review E, vol. 70, no. 6, p. 066111, 2004.
  • [27] C. Peng, Z. Zhang, K.-C. Wong, X. Zhang, and D. E. Keyes, “A scalable community detection algorithm for large graphs using stochastic block models,” Intelligent Data Analysis, vol. 21, no. 6, pp. 1463–1485, 2017.
  • [28] P. Zhang and C. Moore, “Scalable detection of statistically significant communities and hierarchies, using message passing for modularity,” Proceedings of the National Academy of Sciences (PNAS), vol. 111, no. 51, pp. 18 144–18 149, 2014.
  • [29] B. Hendrickson, R. W. Leland et al., “A multi-level algorithm for partitioning graphs,” Proceedings Supercomputing, vol. 95, no. 28, pp. 1–14, 1995.
  • [30] I. S. Dhillon, Y. Guan, and B. Kulis, “Weighted graph cuts without eigenvectors a multilevel approach,” IEEE Transactions on Pattern Analysis & Machine Intelligence (TPAMI), vol. 29, no. 11, pp. 1944–1957, 2007.
  • [31] K. Lei, M. Qin, B. Bai, and G. Zhang, “Adaptive multiple non-negative matrix factorization for temporal link prediction in dynamic networks,” in Proceedings of the 2018 SIGCOMM NetAI Workshop, 2018, pp. 28–34.
  • [32] K. Lei, M. Qin, B. Bai, G. Zhang, and M. Yang, “Gcn-gan: A non-linear temporal link prediction model for weighted dynamic networks,” in Proceedings of the 2019 IEEE Conference on Computer Communications (INFOCOM).   IEEE, 2019, pp. 388–396.
  • [33] M. Qin and D.-Y. Yeung, “Temporal link prediction: A unified framework, taxonomy, and review,” arXiv preprint arXiv:2210.08765, 2022.
  • [34] M. Qin, C. Zhang, B. Bai, G. Zhang, and D.-Y. Yeung, “High-quality temporal link prediction for weighted dynamic graphs via inductive embedding aggregation,” IEEE Transactions on Knowledge & Data Engineering (TKDE), vol. 35, no. 9, pp. 9378–9393, 2023.