This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

.

thanks: [email protected]

Edge Augmentation on Disconnected Graphs via Eigenvalue Elevation

Tianyi Li 1Department of Decision Sciences and Managerial Economics, CUHK Business School, Hong Kong, China
2CAS Key Laboratory for Theoretical Physics, Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China
Abstract

The graph-theoretical task of determining most likely inter-community edges based on disconnected subgraphs’ intra-community connectivity is proposed. An algorithm is developed for this edge augmentation task, based on elevating the zero eigenvalues of graph’s spectrum. Upper bounds for eigenvalue elevation amplitude and for the corresponding augmented edge density are derived and are authenticated with simulation on random graphs. The algorithm works consistently across synthetic and real networks, yielding desirable performance at connecting graph components. Edge augmentation reverse-engineers graph partition under different community detection methods (Girvan-Newman method, greedy modularity maximization, label propagation, Louvain method, and fluid community), in most cases producing inter-community edges at >50%>50\% frequency.

It is common in social studies employing network analysis that the entire network is comprised of disconnected communities, e.g., villages (Banerjee et al., 2013; Cai et al., 2015) or schools (Paluck et al., 2016) in a geological region, or online chatrooms on a digital platform (Huffaker, 2010). Analysis is conducted on each distinct community, and results are compared across multiple communities. Although communities possess distinct features, they nevertheless share certain commonalities that are indicative of the background graph consisting of them all. It is useful to recover this background graph from these disconnected components, in order that tasks that operate on the entire graph can be conducted.

This asks for augmentation of the disconnected graph by introducing new edges. We view the observed subgraphs as downsampled from the entire background graph consisting of all nodes, and try to recover a more detailed connectivity between existing nodes. In particular, complementing existing graph connectivity within subgraphs (i.e., intra-community edges), we establish connections between current communities (i.e., inter-community edges). That is, address the following task: based on subgraphs’ intra-community connectivity, determine most likely inter-community edges that are currently missing on the disconnected graph.

This graph-theoretical task is seldom touched to the best of our knowledge. More broadly, despite being an important problem, graph augmentation has yet to receive substantial research attention. Previous studies formulate graph augmentation as an optimization problem – determining a minimum-cost set of edges to add to a graph to satisfy a specified property, such as biconnectivity, bridge-connectivity or strong connectivity (Frederickson and Ja’Ja’, 1981). Since the problem is NP-hard, approximate algorithms are developed (Frederickson and Ja’Ja’, 1981; Watanabe and Nakamura, 1987; Cai and Sun, 1989; Khuller and Thurimella, 1993). This task witnesses an unexpected revival in recent works (Klicpera et al., 2019; Rong et al., 2019; Gao et al., 2021; Zhao et al., 2021), where graph augmentation, as a subordinate of data augmentation, is studied to improve the generalizability of machine learning on graphs (see reviews in Ding et al. (2022); Zhao et al. (2022)). Those studies focus primarily on edge dropping for removing the over-smoothing of graph neural networks, or on node augmentation for enriching the dataset, paying less attention to edge augmentation.

Method.– Consider graph GG consisting of MM subgraphs (disconnected communities): G(N,E)={Gm}m=1:M={(Nm,Em)}m=1:MG(N,E)=\{G_{m}\}_{m=1:M}=\{(N_{m},E_{m})\}_{m=1:M}. Subgraph GmG_{m} has |Nm||N_{m}| nodes and |Em||E_{m}| edges: Nm={nm,i},Em={em,i}N_{m}=\{n^{m,i}\},E_{m}=\{e^{m,i}\}. Subgraphs G1,G2,,GMG_{1},G_{2},...,G_{M} are ordered by their size, e.g., |N1|<|N2||N_{1}|<|N_{2}|.

We consider that the observed edges {Em}m=1:M\{E_{m}\}_{m=1:M} are a subset of the edges in the whole graph. With additional edges ΔE\Delta E, the augmented edge set (sup) could facilitate a fully connected (fc) graph where current communities are all interconnected, and even beyond, until possibly reaching the complete graph having |N|(|N|1)/2|N|(|N|-1)/2 edges.

We parameterize edge sets at different scenarios with edge density θ1\theta\geq 1: at the current edge set, θ=1\theta=1; at augmented edge sets, θ>1\theta>1; θ\theta may reach maximum θmax\theta_{max}, which is no greater than |N|(|N|1)/2|E||N|(|N|-1)/2|E| at the complete graph (see below). Edge sets are further distinguished by the multiplicity of zero eigenvalues of GG’s Laplacian, denoted with MM^{\dagger}, which equals the number of connected components (i.e., disconnected communities) in the graph (Chung and Graham, 1997; Von Luxburg, 2007). For current graph GG, the number of components M=MM^{\dagger}=M; with additional edges, MM^{\dagger} decreases, until the graph is fully connected, in which case M=1M^{\dagger}=1. The span of edge set scenarios is:

E(θ=1,M=M)Esup(θ>1,MM)Efc(θ>1,M=1)Emax(θ=θmax|N|(|N|1)2|E|,M=1).E(\theta=1,M^{\dagger}=M)\subseteq E_{sup}(\theta>1,M^{\dagger}\leq M)\subseteq E_{fc}(\theta>1,M^{\dagger}=1)\subseteq E_{max}(\theta=\theta_{max}\leq\frac{|N|(|N|-1)}{2|E|},M^{\dagger}=1). (1)

If we have information on node attributes, we can use this data to project which edges are likely to appear, for example, considering the problem of link prediction (Liben‐Nowell and Kleinberg, 2007; Lü and Zhou, 2011). When we do not have such data and only have graph connectivity information, it is difficult to employ link prediction methods to establish inter-community edges – as connectivity-based link prediction relies on common neighbors between a node pair to predict their potential linkage, new edges are not connecting nodes having zero existing path.

Refer to caption
Figure 1: Illustration of graph augmentation via eigenvalue elevation.

(A) Graph augmentation via eigenvalue elevation

In this case, we augment the graph through working on the spectrum of graph Laplacian. Consider graph GG’s adjacency matrix A={aij}A=\{a_{ij}\}. Graph Laplacian L=DAL=D-A where DD is diagonal matrix whose elements are node degrees, D=diag(di)D=diag(d_{i}). Eigendecompose L=UΛUTL=U\Lambda U^{T}, where Λ\Lambda is diagonal matrix whose elements are eigenvalues of GG, Λ=diag(λ1,λ2,,λN)\Lambda=diag(\lambda_{1},\lambda_{2},...,\lambda_{N}), and U=(v1,v2,,vN)TU=(v_{1},v_{2},...,v_{N})^{T} are the corresponding eigenvectors. Arrange eigenvalues in increasing order, with λ1=0\lambda_{1}=0 being the smallest eigenvalue. As the graph has M>1M>1 disconnected communities, the first MM eigenvalues λ1,,λM\lambda_{1},...,\lambda_{M} are zero. To connect communities, we elevate zero eigenvalues to reduce the number of connected components. Consider h[1,M1]h\in[1,M-1] eigenvalues elevated uniformly to whw_{h} from 0, encoded in Λδ=diag(0,wh,,whh,0,,0)\Lambda_{\delta}=diag(0,\underbrace{w_{h},...,w_{h}}_{h},0,...,0). The modified Laplacian

L=U(Λ+Λδ)UT=L+UΛδUT=L+q=1:hwhvqvqT,L^{\prime}=U(\Lambda+\Lambda_{\delta})U^{T}=L+U\Lambda_{\delta}U^{T}=L+\sum_{q=1:h}w_{h}v_{q}v_{q}^{T}, (2)

where vq=(u1q,u2q,,uNq)Tv_{q}=(u_{1q},u_{2q},...,u_{Nq})^{T} are the eigenvectors corresponding to the hh elevated eigenvalues (uu_{\bullet} are elements in UU).

(B) Upper bound for edge density

As tr(vqvqT)=vqTvq=1tr(v_{q}v_{q}^{T})=v_{q}^{T}v_{q}=1, there is

tr(L)=tr(L)+q=1:hwh=tr(L)+hwh.tr(L^{\prime})=tr(L)+\sum_{q=1:h}w_{h}=tr(L)+hw_{h}. (3)

On the other hand, tr(L)=tr(DA)=tr(D)=vol(A)=2|E|tr(L)=tr(D-A)=tr(D)=vol(A)=2|E| (Note 1, ), and similarly, tr(L)=vol(A)=2|E|tr(L^{\prime})=vol(A^{\prime})=2|E^{\prime}|, i.e., the trace of LL^{\prime} equals the volume (e.g., twice the edge number) of the augmented graph AA^{\prime}. Recall the definition of θ\theta as the indication of edge density, |E|=θ|E||E^{\prime}|=\theta|E|, there is the expression

θ=1+hwh2|E|.\theta=1+\frac{hw_{h}}{2|E|}. (4)

By determining the number of eigenvalues to be elevated and the amplitude of elevation, hh and whw_{h}, we control the density θ\theta of augmented edges. We derive the upper bound for θ\theta. For each value of hh, the number of edges in the augmented graph is the largest if the graph is turned into a complete graph on the largest connected component having N(M1h)N-(M-1-h) nodes, besides the hh isolated nodes each forming a single-node component. That gives

θ^hmax=[|N|(M1h)][|N|(M1h)1]/2|E|\displaystyle\hat{\theta}^{max}_{h}=[|N|-(M-1-h)][|N|-(M-1-h)-1]/2|E| (5)
w^hmax=(|N|M+1+h)(|N|M+h)2|E|/h.\displaystyle\Longrightarrow\hat{w}_{h}^{max}=(|N|-M+1+h)(|N|-M+h)-2|E|/h.

Both dw^hmax/dhd\hat{w}_{h}^{max}/dh and dθ^max/dhd\hat{\theta}^{max}/dh are positive, thus

w^max=maxw^hmax=|N|(|N|1)2|E|/(M1),\displaystyle\hat{w}^{max}=\underset{h}{max}\ \hat{w}_{h}^{max}=|N|(|N|-1)-2|E|/(M-1), (6)
θ^max=maxθ^hmax=|N|(|N|1)/2|E|.\displaystyle\hat{\theta}^{max}=\underset{h}{max}\ \hat{\theta}_{h}^{max}=|N|(|N|-1)/2|E|.

This maximum value of whw_{h} cannot be realised, however. As whw_{h} continues to grow, it will surpass the largest eigenvalue λN\lambda_{N} of the original LL and become the largest eigenvalue of LL^{\prime}. Its value is then capped by the size of the augmented graph (Anderson and Morley, 1985). For hM1h\geq M-1, the graph has one connected component, thus the largest eigenvalue whw_{h} is less than graph size |N||N| (in fact, wh|N|w_{h}\leq|N| for any hh). For h<M1h<M-1, graph has more than one component. Consider addition node degree Δdi\Delta d_{i}. On the original LL, existing node degree di=K=1|N|λkuik2d_{i}=\sum_{K=1}^{|N|}\lambda_{k}u_{ik}^{2}; on LL^{\prime}, augmented node degree did^{\prime}_{i} takes similar form, thus

Δdi=didi=K=1|N|λkuik2K=1|N|λkuik2=whK=2hλkuik2.\Delta d_{i}=d^{\prime}_{i}-d_{i}=\sum_{K=1}^{|N|}\lambda^{\prime}_{k}u_{ik}^{2}-\sum_{K=1}^{|N|}\lambda_{k}u_{ik}^{2}=w_{h}\sum_{K=2}^{h}\lambda_{k}u_{ik}^{2}. (7)

Consider elements uiku_{ik} of the same row at different eigenvectors to be roughly independent and identically distributed. As iuik2=1\sum_{i}u_{ik}^{2}=1, uik2¯1/|N|\bar{u_{ik}^{2}}\sim 1/|N|; thus Δdiwhh/|N|\Delta d_{i}\sim w_{h}h/|N|. Ensuring successful assignment of extra degree Δdi\Delta d_{i}, the augmented did^{\prime}_{i} is upper bounded by the maximum possible size of the augmented graph, which is adding the size of the largest h+1h+1 components among the MM components (Note 2, ). This is for a predefined graph topology in terms of the sequence {|Nm|m=1:M}\{|N_{m}|_{m=1:M}\} of community sizes; considering arbitrary graph topologies, the largest size of the aggregated h+1h+1 components is when the rest M(h+1)M-(h+1) components are all isolate nodes. Each degree in did^{\prime}_{i} has 1/2 probability to be in Δdi\Delta d_{i} (the other 1/2 probability in did_{i}); thus there is

2Δdi\displaystyle 2\Delta d_{i} s=MhM1|Ns||N|(Mh1)1\displaystyle\leq\sum_{s=M-h}^{M-1}|N_{s}|\leq|N|-(M-h-1)-1 (8)
2whh|N||N|M+h.\displaystyle\Rightarrow 2\frac{w_{h}h}{|N|}\leq|N|-M+h.

Overall, the upper bound for whw_{h} is

wh{|N|(|N|M+h)2h,0<h<M1|N|.hw_{h}\leq\Biggl{\{}\begin{aligned} &\frac{|N|(|N|-M+h)}{2h},\ \ \ &&0<h<M-1\\ &|N|.\ \ \ &&\forall h\\ \end{aligned} (9)

The two upper bounds coincide when |N|(|N|M+h)/2h=|N||N|(|N|-M+h)/2h=|N| at h=M1h=M-1, which gives M=|N|/2M=|N|/2, i.e., the number of connected components reaches half graph size. One such case is Erdos-Renyi (ER) graph at the threshold for giant component emergence (Erdős and Rényi, 1960), i.e., G=ER(|N|,p=1/|N|)G=ER(|N|,p=1/|N|), in which case multiple non-trivial components exist besides the giant component, while the rest bulk of components are isolates. Most real-world graphs have M<|N|/2M<|N|/2, in which case the second upper bound wh<|N|w_{h}<|N| is tighter, entering into the region of 0<h<M10<h<M-1; extreme graph topology may have M>|N|/2M>|N|/2, leaving a discontinuity of this upper bound.

Consider M<|N|/2M<|N|/2 at common graphs, for θ\theta, there is upper bound

θhmax=1+hwhmax2|E|1+h|N|2|E|=1+hkave,\theta^{max}_{h}=1+\frac{hw^{max}_{h}}{2|E|}\leq 1+\frac{h|N|}{2|E|}=1+\frac{h}{k_{ave}}, (10)

where kavek_{ave} is average degree of the original graph GG. Further, θmax=1+(M1)wmax2|E|1+(M1)|N|2|E|=1+M1kave\theta^{max}=1+\frac{(M-1)w^{max}}{2|E|}\leq 1+\frac{(M-1)|N|}{2|E|}=1+\frac{M-1}{k_{ave}}, considering all values of hh. As |N||N| is always no greater than w^hmax=[(|N|M+1+h)(|N|M+h)2|E|]/h\hat{w}_{h}^{max}=[(|N|-M+1+h)(|N|-M+h)-2|E|]/h (equation (5)), w^hmax\hat{w}_{h}^{max} will be realized only in one extreme case: h=M1h=M-1 and the connected component is complete graph. In general, the above upper bound θhmax\theta_{h}^{max} is tighter than θ^max=|N|(|N1|)/2|E||N|/kave\hat{\theta}^{max}=|N|(|N-1|)/2|E|\sim|N|/k_{ave}, as in (6). (10) suggests that the amplitude of successful graph augmentation depends on original graph topology, via kavek_{ave} and MM, i.e., depending on original graph’s capacity of accommodating new edges. For dense graphs where kavek_{ave} approaches |N||N|, this upper bound 1+(M1)/kave1+(M-1)/k_{ave} approaches |N|/kave1|N|/k_{ave}\sim 1, converging to the complete graph scenario at (6).

(C) Determining new edges

The location of new edges is determined after we obtain LL^{\prime}. Recall L=DAL=D-A and L=DAL^{\prime}=D^{\prime}-A^{\prime}. DD and DD^{\prime} are the diagonal elements of LL and LL^{\prime}, respectively; diagonals of AA and AA^{\prime} are zero. We first get DD^{\prime} from the diagonal of LL^{\prime}, then indicate the number of additional degrees of each node, ΔD\Delta D, by comparing DD^{\prime} to DD. Diagonal elements of ΔD\Delta D, Δdi\Delta d_{i}, may not always be non-negative integer; let ΔdiMAX(Δdi,0)\Delta d_{i}\leftarrow MAX(\lfloor\Delta d_{i}\rfloor,0) in these cases. The above threshold for whw_{h} ensures that the maximum element in ΔD\Delta D, i.e., the maximum additional degree of a node, can be assigned in the augmented graph. That is, the maximum additional degree is at most the size of the new (largest) connected component minus one; whw_{h} beyond the threshold will lead to unrealistic elements in ΔD\Delta D that are too large to be realisable node degrees.

Even below the threshold of whw_{h}, the additional node degrees are not guaranteed to be realizable. This is because the set of nodes that are going to have additional degrees, i.e., the nonzero elements of ΔD\Delta D, may not be as large as some elements of ΔD\Delta D. Although elements of ΔD\Delta D are ensured to be no larger than the component size limit, new degrees are formulated only between nodes corresponding to the nonzero elements of ΔD\Delta D, not between a node in this set and a node that has zero entry in ΔD\Delta D. This suggests that realized degrees are no greater than the sum of the diagonals of ΔD\Delta D, which equals tr(L)tr(L)=hwhtr(L^{\prime})-tr(L)=hw_{h} (Note 3, ). Consequently, the realized number of components, MM^{\dagger}, can be larger than MhM-h.

We use the following algorithm to realize additional node degrees in ΔD\Delta D, i.e., to determine the additional adjacency ΔA=AA\Delta A=A^{\prime}-A. Sort the nodes having nonzero Δd\Delta d in descending order of Δd\Delta d. Starting from the first node in the list, add an edge between the current node and each node below it in the list that still has quota for new degree, if such an edge does not exist; stop if reaching list end or when Δd\Delta d has been all assigned; the realized Δd\Delta d^{\dagger} is then no more than Δd\Delta d. Continue until reaching list end. The complete algorithm of graph augmentation via eigenvalue elevation is summarized in the Supplemental Material. As mentioned, truncating additional degrees Δd\Delta d as integers and assigning them only among modified nodes make realized new edges less than projected new edges; edge realization ratio ϕ=2Δd/hwh<1\phi=2\sum\Delta d^{\dagger}/hw_{h}<1.

Refer to caption
Figure 2: Upper bound for (a) whw_{h}, (b) θ\theta; and (c) edge realization ratio ϕ\phi, in the (wh,h)(w_{h},h) space. Dashed lines are (9). Cross point on (b) is θmax\theta^{max} at h=M1h=M-1. Each point is averaging over 10 Erdos-Renyi random graph instances ER(|N|=200,p=1/|N|)ER(|N|=200,p=1/|N|).

(D) Evaluation of augmentation outcome

The outcome of edge augmentation is evaluated from two aspects. We first check the fraction of inter-community edges among proposed new edges, which include both intra- and inter-community edges. This fraction ρ\rho indicates algorithm’s shear ability in establishing inter-community connections. Note that, although in current problem setting (i.e., connecting communities), inter-community edges are preferred over intra-community edges, a large ρ\rho may not be desired in practice, as too much inter-community linkage destroys original community structures. So happens often with large whw_{h}: when whw_{h} exceeds original eigenvalues and departs from the bulk of them, it is imposing new low-dimension structures that aggregates existing communities (e.g., (Chauhan et al., 2009)), drastically altering graph topology. In practice, whw_{h} can be chosen at kavek_{ave} or λave\lambda_{ave} to prevent this failure.

This ambiguity at ρ\rho brings the second evaluation, where we investigate inter-community edge determination against ground truth, i.e. to which extent the algorithm can recover real inter-community links that are hidden from view. This uses graph data that have ground truth community structures; inter-community edges are covered and are checked against algorithm-determined edges.

In a further sense, as hinted above, since community structure on graphs is ill-defined and there is in fact no “ground truth” community (Peel et al., 2017; Li and Zhang, 2020), this task of inter-community edge augmentation can be viewed as the inverse task for community detection. The above edge augmentation algorithm thus reverse-engineers community detection algorithms. Hence the problem of whether the algorithm can recover ground truth inter-community connections is better formulated as to which extent this algorithm can recover the inter-community connections determined by which community detection algorithm, i.e., this edge augmentation heuristic is an effective reverse-engineering companion of which community detection heuristic.

Results.(A) Upper bound for whw_{h} and θ\theta. Consider Erdos-Renyi random graph G=ER(|N|=200,p=1/|N|)G=ER(|N|=200,p=1/|N|) at the threshold for giant component emergence, in order that multiple non-trivial components exist and that M=|N|/2M=|N|/2. For each h[0,200]h\in[0,200] and wh[0,500]w_{h}\in[0,500], average results over 10 random graphs. Count the number of times Δdmax\Delta d^{max}, determined under whw_{h} and hh, can be realised during edge augmentation, i.e, when the augmenting node set |NΔ|Δdmax|N^{\Delta}|\geq\Delta d^{max}. Compute realized edge density θ=1+Δd/|E|\theta=1+\sum\Delta d^{\dagger}/|E| at each (wh,h)(w_{h},h).

Upper bound for whw_{h} holds consistently (Figure 2a). At region h<M1h<M-1, fewer instances of random ER graphs approach the upper bound, which is for arbitrary graph topology and is bounding at the most extreme case with a giant complete component and MhM-h isolates. Average component number M=|N|/2=100M=|N|/2=100; corresponding upper bound for θ\theta, 1+h/kave1+h/k_{ave}, holds at each value of hh and in particular at h=M1h=M-1 (Figure 2b; kave=1k_{ave}=1). Consistently, edge realization ratio ϕ\phi remains high in the realizable (wh,h)(w_{h},h) region, slightly below 1 due to finite effects (Figure 2c) (Note 4, ).

Refer to caption
Figure 3: Fraction ρ\rho of inter-community edges among realized edges during the recovery of inter-community connections at SBM. Dashed line marks the boundary of realisable/unrealisable region.

(B) Recovering inter-community connections of SBM. We evaluate algorithm’s ability of recovering inter-community connections at the stochastic block model (SBM) (Holland et al., 1983; Karrer and Newman, 2011). Initiate M=5M=5 blocks each containing 50 nodes, totaling at |N|=250|N|=250; pin,pout=0.5,0.1p_{in},p_{out}=0.5,0.1; expected inter-block/intra-block edge ratio is 0.8. Hide all inter-block edges and conduct edge augmentation. Average results over 5 random graph instances for each h[1,10]h\in[1,10] and wh[0,1000]w_{h}\in[0,1000]. For this graph, as |N|(|N|M+h)/2h|N|(|N|-M+h)/2h is always greater than |N||N|, the upper bound wh|N|w_{h}\leq|N| holds across the range of hh.

The fraction ρ\rho of inter-block edges among realized edges increases as hh increases and roughly remains constant as whw_{h} increases (Figure 3). Inter-block connections become denser when a larger graph eigenspace is being modulated via altering more eigenvalues; the amplitude of augmentation plays a lesser role. With nodes in SBM communities being homogeneous, the recovery rate ϵ\epsilon of exact hidden inter-block edges is small (<0.1<0.1, not shown).

MM^{\dagger}/ρ\rho/ϵ\epsilon hh Girvan-Newman Greedy modularity Label propagation Louvain Fluid community
Davis southern women 0 2/-/- 3/-/- 2/-/- 3/-/- 2/-/-
(|N|=32,|E|=89|N|=32,|E|=89) 1 2/0/0 3/0/0 2/0/0 3/0/0 2/0/0
2 2/0/0 3/0/0 1/0.97/0.25 2/0.86/0 1/0.65/0.03
3 2/0/0 1/0.46/0.11 1/0.64/0 2/0.97/0.04 1/0.49/0.09
4 1/0.44/0.20 1/0.74/0.18 1/0.48/0 1/0.96/0.04 1/0.61/0.17
Karate club 0 2/-/- 3/-/- 4/-/- 4/-/- 2/-/-
(|N|=34,|E|=78|N|=34,|E|=78) 1 2/0/0 2/0.82/0.11 3/1.00/0.06 4/0/0 2/0/0
2 2/0/0 2/0.76/0.21 2/0.97/0.12 3/0.82/0 2/0/0
3 2/0/0 2/0.55/0.16 1/0.66/0.31 2/0.74/0 1/0.42/0.09
4 1/0.51/0.10 2/0.59/0.16 1/0.71/0.31 1/0.78/0.05 1/0.53/0.09
Dolphin 0 2/-/- 4/-/- 5/-/- 5/-/- 2/-/-
(|N|=62,|E|=159|N|=62,|E|=159) 1 1/0.55/0 3/0.06/0 4/0.93/0.06 4/0.07/0 2/0/0
2 1/0.48/0 2/0.49/0 3/0.77/0.06 3/0.58/0 1/0.52/0.29
3 1/0.63/0 2/0.81/0.14 2/0.82/0.12 2/0.62/0.03 1/0.58/0.24
4 1/0.53/0 2/0.75/0.14 2/0.87/0.15 1/0.67/0.05 1/0.53/0.12
Facebook TV show 0 2/-/- 59/-/- 410/0/0 49/-/- 9/-/-
(|N|=3892,|E|=17239|N|=3892,|E|=17239) 1 2/0/0 27/0.51/0 263/0.88/0 22/0.48/0 6/0.99/0
2 1/0.68/0 16/0.66/0 186/0.80/0 11/0.45/0 5/0.54/0
3 1/0.51/0 10/0.63/0 94/0.81/0 7/0.52/0 4/0.63/0
4 1/0.60/0 6/0.65/0 52/0.83/0.01 4/0.51/0 3/0.80/0
5 1/0.49/0 4/0.67/0 21/0.85/0.01 2/0.46/0 3/0.97/0
6 1/0.37/0 3/0.66/0 14/0.86/0.01 2/0.51/0 2/0.98/0
Table 1: Reverse-engineering community detection on real networks via edge augmentation.

(C) Reverse-engineering community detection at real networks. Apply a panel of community detection algorithms (Girvan-Newman method (Girvan and Newman, 2002), greedy modularity maximization (Clauset et al., 2004), label propagation (Raghavan et al., 2007), Louvain method (Blondel et al., 2008), and fluid community (Paluck et al., 2017)) on small-to-medium real networks (Davis southern women (Davis et al., 2009), Karate club (Zachary, 1977), Dolphin network (Lusseau et al., 2003), and a recent Facebook network (Rozemberczki et al., 2019)) for demonstration.

On each graph, after community detection, hide inter-community edges and recover via edge augmentation. Set whw_{h} equal to network size; vary hh. Compute the resulting component number MM^{\dagger}, fraction ρ\rho of inter-community edges among new edges, and recovery rate ϵ\epsilon of hidden ground-truth inter-community edges.

Results are shown in Table 1 (showing one set of results at indeterministic community detection; results are robust across multiple runs). Consistently, for the five community detection methods, as hh increases, components are always better connected (i.e., MM^{\dagger} decreases; showing the first pair of communities at Girvan-Newman to illustrate M=21M^{\dagger}=2\rightarrow 1). A small hh is sufficient for considerably connecting the partitioned real network (Facebook TV show). In most cases, over half of proposed edges are inter-community (ρ>0.5\rho>0.5), while the recovery rate ϵ\epsilon of exact ground-truth inter-community connections is not high. Overall, this edge augmentation method most reverse-engineers community detection under label propagation, yielding large ρ\rho across different graphs and hh values.

Concluding Remarks.– We propose the task on graphs of determining most likely inter-community edges based on components’ intra-community connectivity. We develop an algorithm for this edge augmentation task based on elevating zero eigenvalues of graph’s spectrum. Upper bounds for eigenvalue elevation amplitude and for the corresponding augmented edge density are derived and are validated on random graphs. Algorithm performs consistently on synthetic and real networks, showing desirable performance at connecting graph components and varied reverse-engineering compatibility towards different community detection methods.

We assume uniform whw_{h} in eigenvalue elevation; non-uniform whw_{h} may lead to more general results. The method can further extend to using alternative graph Laplacian (e.g., normalized, random-walk). The matching of inter-community edge augmentation heuristic with community detection ideas is prefatory and asks extensive empirical analysis.

The “most likely” inter-community edges, as we set out to determine, can be defined in different ways. This suggests that edge augmentation outcome can be evaluated on resulting graph’s performance at specific tasks that welcomes global connectivity. One suitable task is semi-supervised learning on graphs, for example, using graph convolution neural networks (Zhang et al., 2019), with the convolution taking place in the spectral (e.g., GCN (Kipf and Welling, 2016)) or spatial space (e.g., DCNN (Atwood and Towsley, 2016)). We investigate which edges to add to the disconnected graph, such that semi-supervised classification can better generalize; that is, the evaluation of inter-community connections anchors on their effect in enhancing classification performance. Application of this edge augmentation algorithm to semi-supervised classification on disconnected graphs points to an interesting future work.

References

  • Anderson and Morley (1985) Anderson Jr, W. N., &\& Morley, T. D. (1985), Eigenvalues of the Laplacian of a graph, Linear and Multilinear Algebra, 18(2), 141-145.
  • Atwood and Towsley (2016) Atwood, J., &\& Towsley, D. (2016), Diffusion-convolutional neural networks, Adv. in NIPS, 29.
  • Banerjee et al. (2013) Banerjee, A., Chandrasekhar, A. G., Duflo, E., &\& Jackson, M. O. (2013), The diffusion of microfinance, Science, 341(6144), 1236498.
  • Blondel et al. (2008) Blondel, V. D., Guillaume, J. L., Lambiotte, R., &\& Lefebvre, E. (2008). Fast unfolding of communities in large networks, J. of Statistical Mechanics, 2008(10), P10008.
  • Breiger and Pattison (1986) Breiger, R. L., &\& Pattison, P. E. (1986), Cumulated social roles: The duality of persons and their algebras, Social Networks, 8(3), 215-256.
  • Cai et al. (2015) Cai, J., De Janvry, A., &\& Sadoulet, E. (2015), Social networks and the decision to insure, A. E. J., 7(2), 81-108.
  • Cai and Sun (1989) Cai, G. R., &\& Sun, Y. G. (1989), The minimum augmentation of any graph to a K edge connected graph, Networks, 19(1), 151-172.
  • Chauhan et al. (2009) Chauhan, S., Girvan, M., &\& Ott, E. (2009), Spectral properties of networks with community structure, PRE, 80(5), 056114.
  • Chung and Graham (1997) Chung, F. R., &\& Graham, F. C. (1997), Spectral graph theory (No. 92), American Mathematical Soc..
  • Clauset et al. (2004) Clauset, A., Newman, M. E., &\& Moore, C. (2004), Finding community structure in very large networks, PRE, 70(6), 066111.
  • Davis et al. (2009) Davis, A., Gardner, B. B., &\& Gardner, M. R. (2009), Deep South: A social anthropological study of caste and class, Univ of South Carolina Press.
  • Ding et al. (2022) Ding, K., Xu, Z., Tong, H., &\& Liu, H. (2022), Data augmentation for deep graph learning: A survey, arXiv:2202.08235.
  • Erdős and Rényi (1960) Erdős, P., &\& Rényi, A. (1960), On the evolution of random graphs, Publ. Math. Inst. Hung. Acad. Sci, 5(1), 17-60.
  • Frederickson and Ja’Ja’ (1981) Frederickson, G. N., &\& Ja’Ja’, J. (1981), Approximation algorithms for several graph augmentation problems, SIAM J. on Computing, 10(2), 270-283.
  • Gao et al. (2021) Gao, Z., Bhattacharya, S., …, &\& Sadler, B. M. (2021), Training robust graph neural networks with topology adaptive edge dropping, arXiv:2106.02892.
  • Girvan and Newman (2002) Girvan, M., &\& Newman, M. E. (2002), Community structure in social and biological networks, PNAS, 99(12), 7821-7826.
  • Holland et al. (1983) Holland, P. W., Laskey, K. B., &\& Leinhardt, S. (1983), Stochastic blockmodels: First steps, Social Networks, 5(2), 109-137.
  • Huffaker (2010) Huffaker, D. (2010), Dimensions of leadership and social influence in online communities, Hu. Com. Res., 36(4), 593-617.
  • Karrer and Newman (2011) Karrer, B., &\& Newman, M. E. (2011), Stochastic blockmodels and community structure in networks, PRE, 83(1), 016107.
  • Kipf and Welling (2016) Kipf, T. N., &\& Welling, M. (2016), Semi-supervised classification with graph convolutional networks, arXiv:1609.02907.
  • Khuller and Thurimella (1993) Khuller, S., &\& Thurimella, R. (1993), Approximation algorithms for graph augmentation, J. of Algorithms, 14(2), 214-225.
  • Klicpera et al. (2019) Klicpera, J., Weisenberger, S., &\& Günnemann, S. (2019), Diffusion improves graph learning, arXiv:1911.05485.
  • Li and Zhang (2020) Li, T., &\& Zhang, P. (2020), Self-falsifiable hierarchical detection of overlapping communities on social networks, NJP, 22(3), 033014.
  • Liben‐Nowell and Kleinberg (2007) Liben‐Nowell, D., &\& Kleinberg, J. (2007), The link prediction problem for social networks, J. of ASIST, 58(7), 1019-1031.
  • Lü and Zhou (2011) Lü, L., &\& Zhou, T. (2011), Link prediction in complex networks: A survey, Physica A, 390(6), 1150-1170.
  • Lusseau et al. (2003) Lusseau, D., Schneider, K., …, &\& Dawson, S. M. (2003), The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations, Beh. Eco. and Sociobio., 54(4), 396-405.
  • Paluck et al. (2016) Paluck, E. L., Shepherd, H., &\& Aronow, P. M. (2016), Changing climates of conflict: A social network experiment in 56 schools, PNAS, 113(3), 566-571.
  • Paluck et al. (2017) Parés, F., Gasulla, D. G., … &\& Suzumura, T. (2017, November), Fluid communities: A competitive, scalable and diverse community detection algorithm, In Inter. Conf. on Comp. Net. and Their App. (pp. 229-240).
  • Peel et al. (2017) Peel, L., Larremore, D. B., &\& Clauset, A. (2017), The ground truth about metadata and community detection in networks, Sci. Adv., 3(5), e1602548.
  • Raghavan et al. (2007) Raghavan, U. N., Albert, R., &\& Kumara, S. (2007), Near linear time algorithm to detect community structures in large-scale networks, PRE, 76(3), 036106.
  • Rong et al. (2019) Rong, Y., Huang, W., Xu, T., &\& Huang, J. (2019), Dropedge: Towards deep graph convolutional networks on node classification, arXiv:1907.10903.
  • Rozemberczki et al. (2019) Rozemberczki, B., Davies, R., Sarkar, R., &\& Sutton, C. (2019, August). Gemsec: Graph embedding with self clustering, 2019 IEEE/ACM Inter. Conf. on Adv. in Soc. Net. Ana. and Mining (pp. 65-72).
  • Von Luxburg (2007) Von Luxburg, U. (2007), A tutorial on spectral clustering, Statistics and Computing, 17(4), 395-416.
  • Watanabe and Nakamura (1987) Watanabe, T., &\& Nakamura, A. (1987), Edge-connectivity augmentation problems, J. of Computer and System Sciences, 35(1), 96-144.
  • Zachary (1977) Zachary, W. W. (1977), An information flow model for conflict and fission in small groups, J. of Anthropological Research, 33(4), 452-473.
  • Zhang et al. (2019) Zhang, S., Tong, H., Xu, J., &\& Maciejewski, R. (2019), Graph convolutional networks: a comprehensive review, Computational Social Networks, 6(1), 1-23.
  • Zhao et al. (2022) Zhao, T., Liu, G., Günnemann, S., &\& Jiang, M. (2022), Graph Data Augmentation for Graph Machine Learning: A Survey, arXiv:2202.08871.
  • Zhao et al. (2021) Zhao, T., Liu, Y., …, &\& Shah, N. (2021, May), Data augmentation for graph neural networks, In Proc. of the AAAI Conf. on AI, 35(12), 11015-11023.
  • (39) Assume no self-loops on nodes; diagonal elements of AA are zero. Moreover, tr(L)=tr(Λ)=iλitr(L)=tr(\Lambda)=\sum_{i}\lambda_{i}.
  • (40) Removing hh zero eigenvalues corresponds to reducing hh components and thus aggregating h+1h+1 components.
  • (41) For a valid ΔD\Delta D, let the product hwhhw_{h} be an odd number when determining hh and whw_{h}.
  • (42) In the experiment, edge assignment is allowed to proceed in the unrealizable (wh,h)(w_{h},h) region, realizing a portion of proposed new degrees. Smaller realization ratio ϕ\phi scatters the contour of θ\theta in this region (Figure 2b).