^†^†thanks: [email protected]

Edge Augmentation on Disconnected Graphs via Eigenvalue Elevation

Tianyi Li ¹Department of Decision Sciences and Managerial Economics, CUHK Business School, Hong Kong, China
²CAS Key Laboratory for Theoretical Physics, Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China

Abstract

The graph-theoretical task of determining most likely inter-community edges based on disconnected subgraphs’ intra-community connectivity is proposed. An algorithm is developed for this edge augmentation task, based on elevating the zero eigenvalues of graph’s spectrum. Upper bounds for eigenvalue elevation amplitude and for the corresponding augmented edge density are derived and are authenticated with simulation on random graphs. The algorithm works consistently across synthetic and real networks, yielding desirable performance at connecting graph components. Edge augmentation reverse-engineers graph partition under different community detection methods (Girvan-Newman method, greedy modularity maximization, label propagation, Louvain method, and fluid community), in most cases producing inter-community edges at $>50\%$ frequency.

It is common in social studies employing network analysis that the entire network is comprised of disconnected communities, e.g., villages (Banerjee et al., 2013; Cai et al., 2015) or schools (Paluck et al., 2016) in a geological region, or online chatrooms on a digital platform (Huffaker, 2010). Analysis is conducted on each distinct community, and results are compared across multiple communities. Although communities possess distinct features, they nevertheless share certain commonalities that are indicative of the background graph consisting of them all. It is useful to recover this background graph from these disconnected components, in order that tasks that operate on the entire graph can be conducted.

This asks for augmentation of the disconnected graph by introducing new edges. We view the observed subgraphs as downsampled from the entire background graph consisting of all nodes, and try to recover a more detailed connectivity between existing nodes. In particular, complementing existing graph connectivity within subgraphs (i.e., intra-community edges), we establish connections between current communities (i.e., inter-community edges). That is, address the following task: based on subgraphs’ intra-community connectivity, determine most likely inter-community edges that are currently missing on the disconnected graph.

This graph-theoretical task is seldom touched to the best of our knowledge. More broadly, despite being an important problem, graph augmentation has yet to receive substantial research attention. Previous studies formulate graph augmentation as an optimization problem – determining a minimum-cost set of edges to add to a graph to satisfy a specified property, such as biconnectivity, bridge-connectivity or strong connectivity (Frederickson and Ja’Ja’, 1981). Since the problem is NP-hard, approximate algorithms are developed (Frederickson and Ja’Ja’, 1981; Watanabe and Nakamura, 1987; Cai and Sun, 1989; Khuller and Thurimella, 1993). This task witnesses an unexpected revival in recent works (Klicpera et al., 2019; Rong et al., 2019; Gao et al., 2021; Zhao et al., 2021), where graph augmentation, as a subordinate of data augmentation, is studied to improve the generalizability of machine learning on graphs (see reviews in Ding et al. (2022); Zhao et al. (2022)). Those studies focus primarily on edge dropping for removing the over-smoothing of graph neural networks, or on node augmentation for enriching the dataset, paying less attention to edge augmentation.

Method.– Consider graph $G$ consisting of $M$ subgraphs (disconnected communities): $G(N,E)=\{G_{m}\}_{m=1:M}=\{(N_{m},E_{m})\}_{m=1:M}$ . Subgraph $G_{m}$ has $|N_{m}|$ nodes and $|E_{m}|$ edges: $N_{m}=\{n^{m,i}\},E_{m}=\{e^{m,i}\}$ . Subgraphs $G_{1},G_{2},...,G_{M}$ are ordered by their size, e.g., $|N_{1}|<|N_{2}|$ .

We consider that the observed edges $\{E_{m}\}_{m=1:M}$ are a subset of the edges in the whole graph. With additional edges $\Delta E$ , the augmented edge set (sup) could facilitate a fully connected (fc) graph where current communities are all interconnected, and even beyond, until possibly reaching the complete graph having $|N|(|N|-1)/2$ edges.

We parameterize edge sets at different scenarios with edge density $\theta\geq 1$ : at the current edge set, $\theta=1$ ; at augmented edge sets, $\theta>1$ ; $\theta$ may reach maximum $\theta_{max}$ , which is no greater than $|N|(|N|-1)/2|E|$ at the complete graph (see below). Edge sets are further distinguished by the multiplicity of zero eigenvalues of $G$ ’s Laplacian, denoted with $M^{\dagger}$ , which equals the number of connected components (i.e., disconnected communities) in the graph (Chung and Graham, 1997; Von Luxburg, 2007). For current graph $G$ , the number of components $M^{\dagger}=M$ ; with additional edges, $M^{\dagger}$ decreases, until the graph is fully connected, in which case $M^{\dagger}=1$ . The span of edge set scenarios is:

E(\theta=1,M^{\dagger}=M)\subseteq E_{sup}(\theta>1,M^{\dagger}\leq M)\subseteq E_{fc}(\theta>1,M^{\dagger}=1)\subseteq E_{max}(\theta=\theta_{max}\leq\frac{|N|(|N|-1)}{2|E|},M^{\dagger}=1).

(1)

If we have information on node attributes, we can use this data to project which edges are likely to appear, for example, considering the problem of link prediction (Liben‐Nowell and Kleinberg, 2007; Lü and Zhou, 2011). When we do not have such data and only have graph connectivity information, it is difficult to employ link prediction methods to establish inter-community edges – as connectivity-based link prediction relies on common neighbors between a node pair to predict their potential linkage, new edges are not connecting nodes having zero existing path.

Refer to caption — Figure 1: Illustration of graph augmentation via eigenvalue elevation.

(A) Graph augmentation via eigenvalue elevation

In this case, we augment the graph through working on the spectrum of graph Laplacian. Consider graph $G$ ’s adjacency matrix $A=\{a_{ij}\}$ . Graph Laplacian $L=D-A$ where $D$ is diagonal matrix whose elements are node degrees, $D=diag(d_{i})$ . Eigendecompose $L=U\Lambda U^{T}$ , where $\Lambda$ is diagonal matrix whose elements are eigenvalues of $G$ , $\Lambda=diag(\lambda_{1},\lambda_{2},...,\lambda_{N})$ , and $U=(v_{1},v_{2},...,v_{N})^{T}$ are the corresponding eigenvectors. Arrange eigenvalues in increasing order, with $\lambda_{1}=0$ being the smallest eigenvalue. As the graph has $M>1$ disconnected communities, the first $M$ eigenvalues $\lambda_{1},...,\lambda_{M}$ are zero. To connect communities, we elevate zero eigenvalues to reduce the number of connected components. Consider $h\in[1,M-1]$ eigenvalues elevated uniformly to $w_{h}$ from 0, encoded in $\Lambda_{\delta}=diag(0,\underbrace{w_{h},...,w_{h}}_{h},0,...,0)$ . The modified Laplacian

L^{\prime}=U(\Lambda+\Lambda_{\delta})U^{T}=L+U\Lambda_{\delta}U^{T}=L+\sum_{q=1:h}w_{h}v_{q}v_{q}^{T},

(2)

where $v_{q}=(u_{1q},u_{2q},...,u_{Nq})^{T}$ are the eigenvectors corresponding to the $h$ elevated eigenvalues ( $u_{\bullet}$ are elements in $U$ ).

(B) Upper bound for edge density

As $tr(v_{q}v_{q}^{T})=v_{q}^{T}v_{q}=1$ , there is

tr(L^{\prime})=tr(L)+\sum_{q=1:h}w_{h}=tr(L)+hw_{h}.

(3)

On the other hand, $tr(L)=tr(D-A)=tr(D)=vol(A)=2|E|$ (Note 1, ), and similarly, $tr(L^{\prime})=vol(A^{\prime})=2|E^{\prime}|$ , i.e., the trace of $L^{\prime}$ equals the volume (e.g., twice the edge number) of the augmented graph $A^{\prime}$ . Recall the definition of $\theta$ as the indication of edge density, $|E^{\prime}|=\theta|E|$ , there is the expression

\theta=1+\frac{hw_{h}}{2|E|}.

(4)

By determining the number of eigenvalues to be elevated and the amplitude of elevation, $h$ and $w_{h}$ , we control the density $\theta$ of augmented edges. We derive the upper bound for $\theta$ . For each value of $h$ , the number of edges in the augmented graph is the largest if the graph is turned into a complete graph on the largest connected component having $N-(M-1-h)$ nodes, besides the $h$ isolated nodes each forming a single-node component. That gives

		$\displaystyle\hat{\theta}^{max}_{h}=[\|N\|-(M-1-h)][\|N\|-(M-1-h)-1]/2\|E\|$		(5)
		$\displaystyle\Longrightarrow\hat{w}_{h}^{max}=(\|N\|-M+1+h)(\|N\|-M+h)-2\|E\|/h.$		(5)

Both $d\hat{w}_{h}^{max}/dh$ and $d\hat{\theta}^{max}/dh$ are positive, thus

		$\displaystyle\hat{w}^{max}=\underset{h}{max}\ \hat{w}_{h}^{max}=\|N\|(\|N\|-1)-2\|E\|/(M-1),$		(6)
		$\displaystyle\hat{\theta}^{max}=\underset{h}{max}\ \hat{\theta}_{h}^{max}=\|N\|(\|N\|-1)/2\|E\|.$		(6)

This maximum value of $w_{h}$ cannot be realised, however. As $w_{h}$ continues to grow, it will surpass the largest eigenvalue $\lambda_{N}$ of the original $L$ and become the largest eigenvalue of $L^{\prime}$ . Its value is then capped by the size of the augmented graph (Anderson and Morley, 1985). For $h\geq M-1$ , the graph has one connected component, thus the largest eigenvalue $w_{h}$ is less than graph size $|N|$ (in fact, $w_{h}\leq|N|$ for any $h$ ). For $h<M-1$ , graph has more than one component. Consider addition node degree $\Delta d_{i}$ . On the original $L$ , existing node degree $d_{i}=\sum_{K=1}^{|N|}\lambda_{k}u_{ik}^{2}$ ; on $L^{\prime}$ , augmented node degree $d^{\prime}_{i}$ takes similar form, thus

\Delta d_{i}=d^{\prime}_{i}-d_{i}=\sum_{K=1}^{|N|}\lambda^{\prime}_{k}u_{ik}^{2}-\sum_{K=1}^{|N|}\lambda_{k}u_{ik}^{2}=w_{h}\sum_{K=2}^{h}\lambda_{k}u_{ik}^{2}.

(7)

Consider elements $u_{ik}$ of the same row at different eigenvectors to be roughly independent and identically distributed. As $\sum_{i}u_{ik}^{2}=1$ , $\bar{u_{ik}^{2}}\sim 1/|N|$ ; thus $\Delta d_{i}\sim w_{h}h/|N|$ . Ensuring successful assignment of extra degree $\Delta d_{i}$ , the augmented $d^{\prime}_{i}$ is upper bounded by the maximum possible size of the augmented graph, which is adding the size of the largest $h+1$ components among the $M$ components (Note 2, ). This is for a predefined graph topology in terms of the sequence $\{|N_{m}|_{m=1:M}\}$ of community sizes; considering arbitrary graph topologies, the largest size of the aggregated $h+1$ components is when the rest $M-(h+1)$ components are all isolate nodes. Each degree in $d^{\prime}_{i}$ has 1/2 probability to be in $\Delta d_{i}$ (the other 1/2 probability in $d_{i}$ ); thus there is

	$\displaystyle 2\Delta d_{i}$	$\displaystyle\leq\sum_{s=M-h}^{M-1}\|N_{s}\|\leq\|N\|-(M-h-1)-1$		(8)
		$\displaystyle\Rightarrow 2\frac{w_{h}h}{\|N\|}\leq\|N\|-M+h.$		(8)

Overall, the upper bound for $w_{h}$ is

w_{h}\leq\Biggl{\{}\begin{aligned} &\frac{|N|(|N|-M+h)}{2h},\ \ \ &&0<h<M-1\\ &|N|.\ \ \ &&\forall h\\ \end{aligned}

(9)

The two upper bounds coincide when $|N|(|N|-M+h)/2h=|N|$ at $h=M-1$ , which gives $M=|N|/2$ , i.e., the number of connected components reaches half graph size. One such case is Erdos-Renyi (ER) graph at the threshold for giant component emergence (Erdős and Rényi, 1960), i.e., $G=ER(|N|,p=1/|N|)$ , in which case multiple non-trivial components exist besides the giant component, while the rest bulk of components are isolates. Most real-world graphs have $M<|N|/2$ , in which case the second upper bound $w_{h}<|N|$ is tighter, entering into the region of $0<h<M-1$ ; extreme graph topology may have $M>|N|/2$ , leaving a discontinuity of this upper bound.

Consider $M<|N|/2$ at common graphs, for $\theta$ , there is upper bound

\theta^{max}_{h}=1+\frac{hw^{max}_{h}}{2|E|}\leq 1+\frac{h|N|}{2|E|}=1+\frac{h}{k_{ave}},

(10)

where $k_{ave}$ is average degree of the original graph $G$ . Further, $\theta^{max}=1+\frac{(M-1)w^{max}}{2|E|}\leq 1+\frac{(M-1)|N|}{2|E|}=1+\frac{M-1}{k_{ave}}$ , considering all values of $h$ . As $|N|$ is always no greater than $\hat{w}_{h}^{max}=[(|N|-M+1+h)(|N|-M+h)-2|E|]/h$ (equation (5)), $\hat{w}_{h}^{max}$ will be realized only in one extreme case: $h=M-1$ and the connected component is complete graph. In general, the above upper bound $\theta_{h}^{max}$ is tighter than $\hat{\theta}^{max}=|N|(|N-1|)/2|E|\sim|N|/k_{ave}$ , as in (6). (10) suggests that the amplitude of successful graph augmentation depends on original graph topology, via $k_{ave}$ and $M$ , i.e., depending on original graph’s capacity of accommodating new edges. For dense graphs where $k_{ave}$ approaches $|N|$ , this upper bound $1+(M-1)/k_{ave}$ approaches $|N|/k_{ave}\sim 1$ , converging to the complete graph scenario at (6).

The location of new edges is determined after we obtain $L^{\prime}$ . Recall $L=D-A$ and $L^{\prime}=D^{\prime}-A^{\prime}$ . $D$ and $D^{\prime}$ are the diagonal elements of $L$ and $L^{\prime}$ , respectively; diagonals of $A$ and $A^{\prime}$ are zero. We first get $D^{\prime}$ from the diagonal of $L^{\prime}$ , then indicate the number of additional degrees of each node, $\Delta D$ , by comparing $D^{\prime}$ to $D$ . Diagonal elements of $\Delta D$ , $\Delta d_{i}$ , may not always be non-negative integer; let $\Delta d_{i}\leftarrow MAX(\lfloor\Delta d_{i}\rfloor,0)$ in these cases. The above threshold for $w_{h}$ ensures that the maximum element in $\Delta D$ , i.e., the maximum additional degree of a node, can be assigned in the augmented graph. That is, the maximum additional degree is at most the size of the new (largest) connected component minus one; $w_{h}$ beyond the threshold will lead to unrealistic elements in $\Delta D$ that are too large to be realisable node degrees.

Even below the threshold of $w_{h}$ , the additional node degrees are not guaranteed to be realizable. This is because the set of nodes that are going to have additional degrees, i.e., the nonzero elements of $\Delta D$ , may not be as large as some elements of $\Delta D$ . Although elements of $\Delta D$ are ensured to be no larger than the component size limit, new degrees are formulated only between nodes corresponding to the nonzero elements of $\Delta D$ , not between a node in this set and a node that has zero entry in $\Delta D$ . This suggests that realized degrees are no greater than the sum of the diagonals of $\Delta D$ , which equals $tr(L^{\prime})-tr(L)=hw_{h}$ (Note 3, ). Consequently, the realized number of components, $M^{\dagger}$ , can be larger than $M-h$ .

We use the following algorithm to realize additional node degrees in $\Delta D$ , i.e., to determine the additional adjacency $\Delta A=A^{\prime}-A$ . Sort the nodes having nonzero $\Delta d$ in descending order of $\Delta d$ . Starting from the first node in the list, add an edge between the current node and each node below it in the list that still has quota for new degree, if such an edge does not exist; stop if reaching list end or when $\Delta d$ has been all assigned; the realized $\Delta d^{\dagger}$ is then no more than $\Delta d$ . Continue until reaching list end. The complete algorithm of graph augmentation via eigenvalue elevation is summarized in the Supplemental Material. As mentioned, truncating additional degrees $\Delta d$ as integers and assigning them only among modified nodes make realized new edges less than projected new edges; edge realization ratio $\phi=2\sum\Delta d^{\dagger}/hw_{h}<1$ .

(D) Evaluation of augmentation outcome

The outcome of edge augmentation is evaluated from two aspects. We first check the fraction of inter-community edges among proposed new edges, which include both intra- and inter-community edges. This fraction $\rho$ indicates algorithm’s shear ability in establishing inter-community connections. Note that, although in current problem setting (i.e., connecting communities), inter-community edges are preferred over intra-community edges, a large $\rho$ may not be desired in practice, as too much inter-community linkage destroys original community structures. So happens often with large $w_{h}$ : when $w_{h}$ exceeds original eigenvalues and departs from the bulk of them, it is imposing new low-dimension structures that aggregates existing communities (e.g., (Chauhan et al., 2009)), drastically altering graph topology. In practice, $w_{h}$ can be chosen at $k_{ave}$ or $\lambda_{ave}$ to prevent this failure.

This ambiguity at $\rho$ brings the second evaluation, where we investigate inter-community edge determination against ground truth, i.e. to which extent the algorithm can recover real inter-community links that are hidden from view. This uses graph data that have ground truth community structures; inter-community edges are covered and are checked against algorithm-determined edges.

In a further sense, as hinted above, since community structure on graphs is ill-defined and there is in fact no “ground truth” community (Peel et al., 2017; Li and Zhang, 2020), this task of inter-community edge augmentation can be viewed as the inverse task for community detection. The above edge augmentation algorithm thus reverse-engineers community detection algorithms. Hence the problem of whether the algorithm can recover ground truth inter-community connections is better formulated as to which extent this algorithm can recover the inter-community connections determined by which community detection algorithm, i.e., this edge augmentation heuristic is an effective reverse-engineering companion of which community detection heuristic.

Results.– (A) Upper bound for $w_{h}$ and $\theta$ . Consider Erdos-Renyi random graph $G=ER(|N|=200,p=1/|N|)$ at the threshold for giant component emergence, in order that multiple non-trivial components exist and that $M=|N|/2$ . For each $h\in[0,200]$ and $w_{h}\in[0,500]$ , average results over 10 random graphs. Count the number of times $\Delta d^{max}$ , determined under $w_{h}$ and $h$ , can be realised during edge augmentation, i.e, when the augmenting node set $|N^{\Delta}|\geq\Delta d^{max}$ . Compute realized edge density $\theta=1+\sum\Delta d^{\dagger}/|E|$ at each $(w_{h},h)$ .

Upper bound for $w_{h}$ holds consistently (Figure 2a). At region $h<M-1$ , fewer instances of random ER graphs approach the upper bound, which is for arbitrary graph topology and is bounding at the most extreme case with a giant complete component and $M-h$ isolates. Average component number $M=|N|/2=100$ ; corresponding upper bound for $\theta$ , $1+h/k_{ave}$ , holds at each value of $h$ and in particular at $h=M-1$ (Figure 2b; $k_{ave}=1$ ). Consistently, edge realization ratio $\phi$ remains high in the realizable $(w_{h},h)$ region, slightly below 1 due to finite effects (Figure 2c) (Note 4, ).

(B) Recovering inter-community connections of SBM. We evaluate algorithm’s ability of recovering inter-community connections at the stochastic block model (SBM) (Holland et al., 1983; Karrer and Newman, 2011). Initiate $M=5$ blocks each containing 50 nodes, totaling at $|N|=250$ ; $p_{in},p_{out}=0.5,0.1$ ; expected inter-block/intra-block edge ratio is 0.8. Hide all inter-block edges and conduct edge augmentation. Average results over 5 random graph instances for each $h\in[1,10]$ and $w_{h}\in[0,1000]$ . For this graph, as $|N|(|N|-M+h)/2h$ is always greater than $|N|$ , the upper bound $w_{h}\leq|N|$ holds across the range of $h$ .

The fraction $\rho$ of inter-block edges among realized edges increases as $h$ increases and roughly remains constant as $w_{h}$ increases (Figure 3). Inter-block connections become denser when a larger graph eigenspace is being modulated via altering more eigenvalues; the amplitude of augmentation plays a lesser role. With nodes in SBM communities being homogeneous, the recovery rate $\epsilon$ of exact hidden inter-block edges is small ( $<0.1$ , not shown).

$M^{\dagger}$ / $\rho$ / $\epsilon$	$h$	Girvan-Newman	Greedy modularity	Label propagation	Louvain	Fluid community
Davis southern women	0	2/-/-	3/-/-	2/-/-	3/-/-	2/-/-
( $\|N\|=32,\|E\|=89$ )	1	2/0/0	3/0/0	2/0/0	3/0/0	2/0/0
	2	2/0/0	3/0/0	1/0.97/0.25	2/0.86/0	1/0.65/0.03
	3	2/0/0	1/0.46/0.11	1/0.64/0	2/0.97/0.04	1/0.49/0.09
	4	1/0.44/0.20	1/0.74/0.18	1/0.48/0	1/0.96/0.04	1/0.61/0.17
Karate club	0	2/-/-	3/-/-	4/-/-	4/-/-	2/-/-
( $\|N\|=34,\|E\|=78$ )	1	2/0/0	2/0.82/0.11	3/1.00/0.06	4/0/0	2/0/0
	2	2/0/0	2/0.76/0.21	2/0.97/0.12	3/0.82/0	2/0/0
	3	2/0/0	2/0.55/0.16	1/0.66/0.31	2/0.74/0	1/0.42/0.09
	4	1/0.51/0.10	2/0.59/0.16	1/0.71/0.31	1/0.78/0.05	1/0.53/0.09
Dolphin	0	2/-/-	4/-/-	5/-/-	5/-/-	2/-/-
( $\|N\|=62,\|E\|=159$ )	1	1/0.55/0	3/0.06/0	4/0.93/0.06	4/0.07/0	2/0/0
	2	1/0.48/0	2/0.49/0	3/0.77/0.06	3/0.58/0	1/0.52/0.29
	3	1/0.63/0	2/0.81/0.14	2/0.82/0.12	2/0.62/0.03	1/0.58/0.24
	4	1/0.53/0	2/0.75/0.14	2/0.87/0.15	1/0.67/0.05	1/0.53/0.12
Facebook TV show	0	2/-/-	59/-/-	410/0/0	49/-/-	9/-/-
( $\|N\|=3892,\|E\|=17239$ )	1	2/0/0	27/0.51/0	263/0.88/0	22/0.48/0	6/0.99/0
	2	1/0.68/0	16/0.66/0	186/0.80/0	11/0.45/0	5/0.54/0
	3	1/0.51/0	10/0.63/0	94/0.81/0	7/0.52/0	4/0.63/0
	4	1/0.60/0	6/0.65/0	52/0.83/0.01	4/0.51/0	3/0.80/0
	5	1/0.49/0	4/0.67/0	21/0.85/0.01	2/0.46/0	3/0.97/0
	6	1/0.37/0	3/0.66/0	14/0.86/0.01	2/0.51/0	2/0.98/0

Table 1: Reverse-engineering community detection on real networks via edge augmentation.

(C) Reverse-engineering community detection at real networks. Apply a panel of community detection algorithms (Girvan-Newman method (Girvan and Newman, 2002), greedy modularity maximization (Clauset et al., 2004), label propagation (Raghavan et al., 2007), Louvain method (Blondel et al., 2008), and fluid community (Paluck et al., 2017)) on small-to-medium real networks (Davis southern women (Davis et al., 2009), Karate club (Zachary, 1977), Dolphin network (Lusseau et al., 2003), and a recent Facebook network (Rozemberczki et al., 2019)) for demonstration.

On each graph, after community detection, hide inter-community edges and recover via edge augmentation. Set $w_{h}$ equal to network size; vary $h$ . Compute the resulting component number $M^{\dagger}$ , fraction $\rho$ of inter-community edges among new edges, and recovery rate $\epsilon$ of hidden ground-truth inter-community edges.

Results are shown in Table 1 (showing one set of results at indeterministic community detection; results are robust across multiple runs). Consistently, for the five community detection methods, as $h$ increases, components are always better connected (i.e., $M^{\dagger}$ decreases; showing the first pair of communities at Girvan-Newman to illustrate $M^{\dagger}=2\rightarrow 1$ ). A small $h$ is sufficient for considerably connecting the partitioned real network (Facebook TV show). In most cases, over half of proposed edges are inter-community ( $\rho>0.5$ ), while the recovery rate $\epsilon$ of exact ground-truth inter-community connections is not high. Overall, this edge augmentation method most reverse-engineers community detection under label propagation, yielding large $\rho$ across different graphs and $h$ values.

Concluding Remarks.– We propose the task on graphs of determining most likely inter-community edges based on components’ intra-community connectivity. We develop an algorithm for this edge augmentation task based on elevating zero eigenvalues of graph’s spectrum. Upper bounds for eigenvalue elevation amplitude and for the corresponding augmented edge density are derived and are validated on random graphs. Algorithm performs consistently on synthetic and real networks, showing desirable performance at connecting graph components and varied reverse-engineering compatibility towards different community detection methods.

We assume uniform $w_{h}$ in eigenvalue elevation; non-uniform $w_{h}$ may lead to more general results. The method can further extend to using alternative graph Laplacian (e.g., normalized, random-walk). The matching of inter-community edge augmentation heuristic with community detection ideas is prefatory and asks extensive empirical analysis.

The “most likely” inter-community edges, as we set out to determine, can be defined in different ways. This suggests that edge augmentation outcome can be evaluated on resulting graph’s performance at specific tasks that welcomes global connectivity. One suitable task is semi-supervised learning on graphs, for example, using graph convolution neural networks (Zhang et al., 2019), with the convolution taking place in the spectral (e.g., GCN (Kipf and Welling, 2016)) or spatial space (e.g., DCNN (Atwood and Towsley, 2016)). We investigate which edges to add to the disconnected graph, such that semi-supervised classification can better generalize; that is, the evaluation of inter-community connections anchors on their effect in enhancing classification performance. Application of this edge augmentation algorithm to semi-supervised classification on disconnected graphs points to an interesting future work.

References

Anderson and Morley (1985) Anderson Jr, W. N., $\&$ Morley, T. D. (1985), Eigenvalues of the Laplacian of a graph, Linear and Multilinear Algebra, 18(2), 141-145.
Atwood and Towsley (2016) Atwood, J., $\&$ Towsley, D. (2016), Diffusion-convolutional neural networks, Adv. in NIPS, 29.
Banerjee et al. (2013) Banerjee, A., Chandrasekhar, A. G., Duflo, E., $\&$ Jackson, M. O. (2013), The diffusion of microfinance, Science, 341(6144), 1236498.
Blondel et al. (2008) Blondel, V. D., Guillaume, J. L., Lambiotte, R., $\&$ Lefebvre, E. (2008). Fast unfolding of communities in large networks, J. of Statistical Mechanics, 2008(10), P10008.
Breiger and Pattison (1986) Breiger, R. L., $\&$ Pattison, P. E. (1986), Cumulated social roles: The duality of persons and their algebras, Social Networks, 8(3), 215-256.
Cai et al. (2015) Cai, J., De Janvry, A., $\&$ Sadoulet, E. (2015), Social networks and the decision to insure, A. E. J., 7(2), 81-108.
Cai and Sun (1989) Cai, G. R., $\&$ Sun, Y. G. (1989), The minimum augmentation of any graph to a K edge connected graph, Networks, 19(1), 151-172.
Chauhan et al. (2009) Chauhan, S., Girvan, M., $\&$ Ott, E. (2009), Spectral properties of networks with community structure, PRE, 80(5), 056114.
Chung and Graham (1997) Chung, F. R., $\&$ Graham, F. C. (1997), Spectral graph theory (No. 92), American Mathematical Soc..
Clauset et al. (2004) Clauset, A., Newman, M. E., $\&$ Moore, C. (2004), Finding community structure in very large networks, PRE, 70(6), 066111.
Davis et al. (2009) Davis, A., Gardner, B. B., $\&$ Gardner, M. R. (2009), Deep South: A social anthropological study of caste and class, Univ of South Carolina Press.
Ding et al. (2022) Ding, K., Xu, Z., Tong, H., $\&$ Liu, H. (2022), Data augmentation for deep graph learning: A survey, arXiv:2202.08235.
Erdős and Rényi (1960) Erdős, P., $\&$ Rényi, A. (1960), On the evolution of random graphs, Publ. Math. Inst. Hung. Acad. Sci, 5(1), 17-60.
Frederickson and Ja’Ja’ (1981) Frederickson, G. N., $\&$ Ja’Ja’, J. (1981), Approximation algorithms for several graph augmentation problems, SIAM J. on Computing, 10(2), 270-283.
Gao et al. (2021) Gao, Z., Bhattacharya, S., …, $\&$ Sadler, B. M. (2021), Training robust graph neural networks with topology adaptive edge dropping, arXiv:2106.02892.
Girvan and Newman (2002) Girvan, M., $\&$ Newman, M. E. (2002), Community structure in social and biological networks, PNAS, 99(12), 7821-7826.
Holland et al. (1983) Holland, P. W., Laskey, K. B., $\&$ Leinhardt, S. (1983), Stochastic blockmodels: First steps, Social Networks, 5(2), 109-137.
Huffaker (2010) Huffaker, D. (2010), Dimensions of leadership and social influence in online communities, Hu. Com. Res., 36(4), 593-617.
Karrer and Newman (2011) Karrer, B., $\&$ Newman, M. E. (2011), Stochastic blockmodels and community structure in networks, PRE, 83(1), 016107.
Kipf and Welling (2016) Kipf, T. N., $\&$ Welling, M. (2016), Semi-supervised classification with graph convolutional networks, arXiv:1609.02907.
Khuller and Thurimella (1993) Khuller, S., $\&$ Thurimella, R. (1993), Approximation algorithms for graph augmentation, J. of Algorithms, 14(2), 214-225.
Klicpera et al. (2019) Klicpera, J., Weisenberger, S., $\&$ Günnemann, S. (2019), Diffusion improves graph learning, arXiv:1911.05485.
Li and Zhang (2020) Li, T., $\&$ Zhang, P. (2020), Self-falsifiable hierarchical detection of overlapping communities on social networks, NJP, 22(3), 033014.
Liben‐Nowell and Kleinberg (2007) Liben‐Nowell, D., $\&$ Kleinberg, J. (2007), The link prediction problem for social networks, J. of ASIST, 58(7), 1019-1031.
Lü and Zhou (2011) Lü, L., $\&$ Zhou, T. (2011), Link prediction in complex networks: A survey, Physica A, 390(6), 1150-1170.
Lusseau et al. (2003) Lusseau, D., Schneider, K., …, $\&$ Dawson, S. M. (2003), The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations, Beh. Eco. and Sociobio., 54(4), 396-405.
Paluck et al. (2016) Paluck, E. L., Shepherd, H., $\&$ Aronow, P. M. (2016), Changing climates of conflict: A social network experiment in 56 schools, PNAS, 113(3), 566-571.
Paluck et al. (2017) Parés, F., Gasulla, D. G., … $\&$ Suzumura, T. (2017, November), Fluid communities: A competitive, scalable and diverse community detection algorithm, In Inter. Conf. on Comp. Net. and Their App. (pp. 229-240).
Peel et al. (2017) Peel, L., Larremore, D. B., $\&$ Clauset, A. (2017), The ground truth about metadata and community detection in networks, Sci. Adv., 3(5), e1602548.
Raghavan et al. (2007) Raghavan, U. N., Albert, R., $\&$ Kumara, S. (2007), Near linear time algorithm to detect community structures in large-scale networks, PRE, 76(3), 036106.
Rong et al. (2019) Rong, Y., Huang, W., Xu, T., $\&$ Huang, J. (2019), Dropedge: Towards deep graph convolutional networks on node classification, arXiv:1907.10903.
Rozemberczki et al. (2019) Rozemberczki, B., Davies, R., Sarkar, R., $\&$ Sutton, C. (2019, August). Gemsec: Graph embedding with self clustering, 2019 IEEE/ACM Inter. Conf. on Adv. in Soc. Net. Ana. and Mining (pp. 65-72).
Von Luxburg (2007) Von Luxburg, U. (2007), A tutorial on spectral clustering, Statistics and Computing, 17(4), 395-416.
Watanabe and Nakamura (1987) Watanabe, T., $\&$ Nakamura, A. (1987), Edge-connectivity augmentation problems, J. of Computer and System Sciences, 35(1), 96-144.
Zachary (1977) Zachary, W. W. (1977), An information flow model for conflict and fission in small groups, J. of Anthropological Research, 33(4), 452-473.
Zhang et al. (2019) Zhang, S., Tong, H., Xu, J., $\&$ Maciejewski, R. (2019), Graph convolutional networks: a comprehensive review, Computational Social Networks, 6(1), 1-23.
Zhao et al. (2022) Zhao, T., Liu, G., Günnemann, S., $\&$ Jiang, M. (2022), Graph Data Augmentation for Graph Machine Learning: A Survey, arXiv:2202.08871.
Zhao et al. (2021) Zhao, T., Liu, Y., …, $\&$ Shah, N. (2021, May), Data augmentation for graph neural networks, In Proc. of the AAAI Conf. on AI, 35(12), 11015-11023.
(39) Assume no self-loops on nodes; diagonal elements of $A$ are zero. Moreover, $tr(L)=tr(\Lambda)=\sum_{i}\lambda_{i}$ .
(40) Removing $h$ zero eigenvalues corresponds to reducing $h$ components and thus aggregating $h+1$ components.
(41) For a valid $\Delta D$ , let the product $hw_{h}$ be an odd number when determining $h$ and $w_{h}$ .
(42) In the experiment, edge assignment is allowed to proceed in the unrealizable $(w_{h},h)$ region, realizing a portion of proposed new degrees. Smaller realization ratio $\phi$ scatters the contour of $\theta$ in this region (Figure 2b).