.
Edge Augmentation on Disconnected Graphs via Eigenvalue Elevation
Abstract
The graph-theoretical task of determining most likely inter-community edges based on disconnected subgraphs’ intra-community connectivity is proposed. An algorithm is developed for this edge augmentation task, based on elevating the zero eigenvalues of graph’s spectrum. Upper bounds for eigenvalue elevation amplitude and for the corresponding augmented edge density are derived and are authenticated with simulation on random graphs. The algorithm works consistently across synthetic and real networks, yielding desirable performance at connecting graph components. Edge augmentation reverse-engineers graph partition under different community detection methods (Girvan-Newman method, greedy modularity maximization, label propagation, Louvain method, and fluid community), in most cases producing inter-community edges at frequency.
It is common in social studies employing network analysis that the entire network is comprised of disconnected communities, e.g., villages (Banerjee et al., 2013; Cai et al., 2015) or schools (Paluck et al., 2016) in a geological region, or online chatrooms on a digital platform (Huffaker, 2010). Analysis is conducted on each distinct community, and results are compared across multiple communities. Although communities possess distinct features, they nevertheless share certain commonalities that are indicative of the background graph consisting of them all. It is useful to recover this background graph from these disconnected components, in order that tasks that operate on the entire graph can be conducted.
This asks for augmentation of the disconnected graph by introducing new edges. We view the observed subgraphs as downsampled from the entire background graph consisting of all nodes, and try to recover a more detailed connectivity between existing nodes. In particular, complementing existing graph connectivity within subgraphs (i.e., intra-community edges), we establish connections between current communities (i.e., inter-community edges). That is, address the following task: based on subgraphs’ intra-community connectivity, determine most likely inter-community edges that are currently missing on the disconnected graph.
This graph-theoretical task is seldom touched to the best of our knowledge. More broadly, despite being an important problem, graph augmentation has yet to receive substantial research attention. Previous studies formulate graph augmentation as an optimization problem – determining a minimum-cost set of edges to add to a graph to satisfy a specified property, such as biconnectivity, bridge-connectivity or strong connectivity (Frederickson and Ja’Ja’, 1981). Since the problem is NP-hard, approximate algorithms are developed (Frederickson and Ja’Ja’, 1981; Watanabe and Nakamura, 1987; Cai and Sun, 1989; Khuller and Thurimella, 1993). This task witnesses an unexpected revival in recent works (Klicpera et al., 2019; Rong et al., 2019; Gao et al., 2021; Zhao et al., 2021), where graph augmentation, as a subordinate of data augmentation, is studied to improve the generalizability of machine learning on graphs (see reviews in Ding et al. (2022); Zhao et al. (2022)). Those studies focus primarily on edge dropping for removing the over-smoothing of graph neural networks, or on node augmentation for enriching the dataset, paying less attention to edge augmentation.
Method.– Consider graph consisting of subgraphs (disconnected communities): . Subgraph has nodes and edges: . Subgraphs are ordered by their size, e.g., .
We consider that the observed edges are a subset of the edges in the whole graph. With additional edges , the augmented edge set (sup) could facilitate a fully connected (fc) graph where current communities are all interconnected, and even beyond, until possibly reaching the complete graph having edges.
We parameterize edge sets at different scenarios with edge density : at the current edge set, ; at augmented edge sets, ; may reach maximum , which is no greater than at the complete graph (see below). Edge sets are further distinguished by the multiplicity of zero eigenvalues of ’s Laplacian, denoted with , which equals the number of connected components (i.e., disconnected communities) in the graph (Chung and Graham, 1997; Von Luxburg, 2007). For current graph , the number of components ; with additional edges, decreases, until the graph is fully connected, in which case . The span of edge set scenarios is:
(1) |
If we have information on node attributes, we can use this data to project which edges are likely to appear, for example, considering the problem of link prediction (Liben‐Nowell and Kleinberg, 2007; Lü and Zhou, 2011). When we do not have such data and only have graph connectivity information, it is difficult to employ link prediction methods to establish inter-community edges – as connectivity-based link prediction relies on common neighbors between a node pair to predict their potential linkage, new edges are not connecting nodes having zero existing path.

(A) Graph augmentation via eigenvalue elevation
In this case, we augment the graph through working on the spectrum of graph Laplacian. Consider graph ’s adjacency matrix . Graph Laplacian where is diagonal matrix whose elements are node degrees, . Eigendecompose , where is diagonal matrix whose elements are eigenvalues of , , and are the corresponding eigenvectors. Arrange eigenvalues in increasing order, with being the smallest eigenvalue. As the graph has disconnected communities, the first eigenvalues are zero. To connect communities, we elevate zero eigenvalues to reduce the number of connected components. Consider eigenvalues elevated uniformly to from 0, encoded in . The modified Laplacian
(2) |
where are the eigenvectors corresponding to the elevated eigenvalues ( are elements in ).
(B) Upper bound for edge density
As , there is
(3) |
On the other hand, (Note 1, ), and similarly, , i.e., the trace of equals the volume (e.g., twice the edge number) of the augmented graph . Recall the definition of as the indication of edge density, , there is the expression
(4) |
By determining the number of eigenvalues to be elevated and the amplitude of elevation, and , we control the density of augmented edges. We derive the upper bound for . For each value of , the number of edges in the augmented graph is the largest if the graph is turned into a complete graph on the largest connected component having nodes, besides the isolated nodes each forming a single-node component. That gives
(5) | ||||
Both and are positive, thus
(6) | ||||
This maximum value of cannot be realised, however. As continues to grow, it will surpass the largest eigenvalue of the original and become the largest eigenvalue of . Its value is then capped by the size of the augmented graph (Anderson and Morley, 1985). For , the graph has one connected component, thus the largest eigenvalue is less than graph size (in fact, for any ). For , graph has more than one component. Consider addition node degree . On the original , existing node degree ; on , augmented node degree takes similar form, thus
(7) |
Consider elements of the same row at different eigenvectors to be roughly independent and identically distributed. As , ; thus . Ensuring successful assignment of extra degree , the augmented is upper bounded by the maximum possible size of the augmented graph, which is adding the size of the largest components among the components (Note 2, ). This is for a predefined graph topology in terms of the sequence of community sizes; considering arbitrary graph topologies, the largest size of the aggregated components is when the rest components are all isolate nodes. Each degree in has 1/2 probability to be in (the other 1/2 probability in ); thus there is
(8) | ||||
Overall, the upper bound for is
(9) |
The two upper bounds coincide when at , which gives , i.e., the number of connected components reaches half graph size. One such case is Erdos-Renyi (ER) graph at the threshold for giant component emergence (Erdős and Rényi, 1960), i.e., , in which case multiple non-trivial components exist besides the giant component, while the rest bulk of components are isolates. Most real-world graphs have , in which case the second upper bound is tighter, entering into the region of ; extreme graph topology may have , leaving a discontinuity of this upper bound.
Consider at common graphs, for , there is upper bound
(10) |
where is average degree of the original graph . Further, , considering all values of . As is always no greater than (equation (5)), will be realized only in one extreme case: and the connected component is complete graph. In general, the above upper bound is tighter than , as in (6). (10) suggests that the amplitude of successful graph augmentation depends on original graph topology, via and , i.e., depending on original graph’s capacity of accommodating new edges. For dense graphs where approaches , this upper bound approaches , converging to the complete graph scenario at (6).
(C) Determining new edges
The location of new edges is determined after we obtain . Recall and . and are the diagonal elements of and , respectively; diagonals of and are zero. We first get from the diagonal of , then indicate the number of additional degrees of each node, , by comparing to . Diagonal elements of , , may not always be non-negative integer; let in these cases. The above threshold for ensures that the maximum element in , i.e., the maximum additional degree of a node, can be assigned in the augmented graph. That is, the maximum additional degree is at most the size of the new (largest) connected component minus one; beyond the threshold will lead to unrealistic elements in that are too large to be realisable node degrees.
Even below the threshold of , the additional node degrees are not guaranteed to be realizable. This is because the set of nodes that are going to have additional degrees, i.e., the nonzero elements of , may not be as large as some elements of . Although elements of are ensured to be no larger than the component size limit, new degrees are formulated only between nodes corresponding to the nonzero elements of , not between a node in this set and a node that has zero entry in . This suggests that realized degrees are no greater than the sum of the diagonals of , which equals (Note 3, ). Consequently, the realized number of components, , can be larger than .
We use the following algorithm to realize additional node degrees in , i.e., to determine the additional adjacency . Sort the nodes having nonzero in descending order of . Starting from the first node in the list, add an edge between the current node and each node below it in the list that still has quota for new degree, if such an edge does not exist; stop if reaching list end or when has been all assigned; the realized is then no more than . Continue until reaching list end. The complete algorithm of graph augmentation via eigenvalue elevation is summarized in the Supplemental Material. As mentioned, truncating additional degrees as integers and assigning them only among modified nodes make realized new edges less than projected new edges; edge realization ratio .

(D) Evaluation of augmentation outcome
The outcome of edge augmentation is evaluated from two aspects. We first check the fraction of inter-community edges among proposed new edges, which include both intra- and inter-community edges. This fraction indicates algorithm’s shear ability in establishing inter-community connections. Note that, although in current problem setting (i.e., connecting communities), inter-community edges are preferred over intra-community edges, a large may not be desired in practice, as too much inter-community linkage destroys original community structures. So happens often with large : when exceeds original eigenvalues and departs from the bulk of them, it is imposing new low-dimension structures that aggregates existing communities (e.g., (Chauhan et al., 2009)), drastically altering graph topology. In practice, can be chosen at or to prevent this failure.
This ambiguity at brings the second evaluation, where we investigate inter-community edge determination against ground truth, i.e. to which extent the algorithm can recover real inter-community links that are hidden from view. This uses graph data that have ground truth community structures; inter-community edges are covered and are checked against algorithm-determined edges.
In a further sense, as hinted above, since community structure on graphs is ill-defined and there is in fact no “ground truth” community (Peel et al., 2017; Li and Zhang, 2020), this task of inter-community edge augmentation can be viewed as the inverse task for community detection. The above edge augmentation algorithm thus reverse-engineers community detection algorithms. Hence the problem of whether the algorithm can recover ground truth inter-community connections is better formulated as to which extent this algorithm can recover the inter-community connections determined by which community detection algorithm, i.e., this edge augmentation heuristic is an effective reverse-engineering companion of which community detection heuristic.
Results.– (A) Upper bound for and . Consider Erdos-Renyi random graph at the threshold for giant component emergence, in order that multiple non-trivial components exist and that . For each and , average results over 10 random graphs. Count the number of times , determined under and , can be realised during edge augmentation, i.e, when the augmenting node set . Compute realized edge density at each .
Upper bound for holds consistently (Figure 2a). At region , fewer instances of random ER graphs approach the upper bound, which is for arbitrary graph topology and is bounding at the most extreme case with a giant complete component and isolates. Average component number ; corresponding upper bound for , , holds at each value of and in particular at (Figure 2b; ). Consistently, edge realization ratio remains high in the realizable region, slightly below 1 due to finite effects (Figure 2c) (Note 4, ).

(B) Recovering inter-community connections of SBM. We evaluate algorithm’s ability of recovering inter-community connections at the stochastic block model (SBM) (Holland et al., 1983; Karrer and Newman, 2011). Initiate blocks each containing 50 nodes, totaling at ; ; expected inter-block/intra-block edge ratio is 0.8. Hide all inter-block edges and conduct edge augmentation. Average results over 5 random graph instances for each and . For this graph, as is always greater than , the upper bound holds across the range of .
The fraction of inter-block edges among realized edges increases as increases and roughly remains constant as increases (Figure 3). Inter-block connections become denser when a larger graph eigenspace is being modulated via altering more eigenvalues; the amplitude of augmentation plays a lesser role. With nodes in SBM communities being homogeneous, the recovery rate of exact hidden inter-block edges is small (, not shown).
// | Girvan-Newman | Greedy modularity | Label propagation | Louvain | Fluid community | |
---|---|---|---|---|---|---|
Davis southern women | 0 | 2/-/- | 3/-/- | 2/-/- | 3/-/- | 2/-/- |
() | 1 | 2/0/0 | 3/0/0 | 2/0/0 | 3/0/0 | 2/0/0 |
2 | 2/0/0 | 3/0/0 | 1/0.97/0.25 | 2/0.86/0 | 1/0.65/0.03 | |
3 | 2/0/0 | 1/0.46/0.11 | 1/0.64/0 | 2/0.97/0.04 | 1/0.49/0.09 | |
4 | 1/0.44/0.20 | 1/0.74/0.18 | 1/0.48/0 | 1/0.96/0.04 | 1/0.61/0.17 | |
Karate club | 0 | 2/-/- | 3/-/- | 4/-/- | 4/-/- | 2/-/- |
() | 1 | 2/0/0 | 2/0.82/0.11 | 3/1.00/0.06 | 4/0/0 | 2/0/0 |
2 | 2/0/0 | 2/0.76/0.21 | 2/0.97/0.12 | 3/0.82/0 | 2/0/0 | |
3 | 2/0/0 | 2/0.55/0.16 | 1/0.66/0.31 | 2/0.74/0 | 1/0.42/0.09 | |
4 | 1/0.51/0.10 | 2/0.59/0.16 | 1/0.71/0.31 | 1/0.78/0.05 | 1/0.53/0.09 | |
Dolphin | 0 | 2/-/- | 4/-/- | 5/-/- | 5/-/- | 2/-/- |
() | 1 | 1/0.55/0 | 3/0.06/0 | 4/0.93/0.06 | 4/0.07/0 | 2/0/0 |
2 | 1/0.48/0 | 2/0.49/0 | 3/0.77/0.06 | 3/0.58/0 | 1/0.52/0.29 | |
3 | 1/0.63/0 | 2/0.81/0.14 | 2/0.82/0.12 | 2/0.62/0.03 | 1/0.58/0.24 | |
4 | 1/0.53/0 | 2/0.75/0.14 | 2/0.87/0.15 | 1/0.67/0.05 | 1/0.53/0.12 | |
Facebook TV show | 0 | 2/-/- | 59/-/- | 410/0/0 | 49/-/- | 9/-/- |
() | 1 | 2/0/0 | 27/0.51/0 | 263/0.88/0 | 22/0.48/0 | 6/0.99/0 |
2 | 1/0.68/0 | 16/0.66/0 | 186/0.80/0 | 11/0.45/0 | 5/0.54/0 | |
3 | 1/0.51/0 | 10/0.63/0 | 94/0.81/0 | 7/0.52/0 | 4/0.63/0 | |
4 | 1/0.60/0 | 6/0.65/0 | 52/0.83/0.01 | 4/0.51/0 | 3/0.80/0 | |
5 | 1/0.49/0 | 4/0.67/0 | 21/0.85/0.01 | 2/0.46/0 | 3/0.97/0 | |
6 | 1/0.37/0 | 3/0.66/0 | 14/0.86/0.01 | 2/0.51/0 | 2/0.98/0 |
(C) Reverse-engineering community detection at real networks. Apply a panel of community detection algorithms (Girvan-Newman method (Girvan and Newman, 2002), greedy modularity maximization (Clauset et al., 2004), label propagation (Raghavan et al., 2007), Louvain method (Blondel et al., 2008), and fluid community (Paluck et al., 2017)) on small-to-medium real networks (Davis southern women (Davis et al., 2009), Karate club (Zachary, 1977), Dolphin network (Lusseau et al., 2003), and a recent Facebook network (Rozemberczki et al., 2019)) for demonstration.
On each graph, after community detection, hide inter-community edges and recover via edge augmentation. Set equal to network size; vary . Compute the resulting component number , fraction of inter-community edges among new edges, and recovery rate of hidden ground-truth inter-community edges.
Results are shown in Table 1 (showing one set of results at indeterministic community detection; results are robust across multiple runs). Consistently, for the five community detection methods, as increases, components are always better connected (i.e., decreases; showing the first pair of communities at Girvan-Newman to illustrate ). A small is sufficient for considerably connecting the partitioned real network (Facebook TV show). In most cases, over half of proposed edges are inter-community (), while the recovery rate of exact ground-truth inter-community connections is not high. Overall, this edge augmentation method most reverse-engineers community detection under label propagation, yielding large across different graphs and values.
Concluding Remarks.– We propose the task on graphs of determining most likely inter-community edges based on components’ intra-community connectivity. We develop an algorithm for this edge augmentation task based on elevating zero eigenvalues of graph’s spectrum. Upper bounds for eigenvalue elevation amplitude and for the corresponding augmented edge density are derived and are validated on random graphs. Algorithm performs consistently on synthetic and real networks, showing desirable performance at connecting graph components and varied reverse-engineering compatibility towards different community detection methods.
We assume uniform in eigenvalue elevation; non-uniform may lead to more general results. The method can further extend to using alternative graph Laplacian (e.g., normalized, random-walk). The matching of inter-community edge augmentation heuristic with community detection ideas is prefatory and asks extensive empirical analysis.
The “most likely” inter-community edges, as we set out to determine, can be defined in different ways. This suggests that edge augmentation outcome can be evaluated on resulting graph’s performance at specific tasks that welcomes global connectivity. One suitable task is semi-supervised learning on graphs, for example, using graph convolution neural networks (Zhang et al., 2019), with the convolution taking place in the spectral (e.g., GCN (Kipf and Welling, 2016)) or spatial space (e.g., DCNN (Atwood and Towsley, 2016)). We investigate which edges to add to the disconnected graph, such that semi-supervised classification can better generalize; that is, the evaluation of inter-community connections anchors on their effect in enhancing classification performance. Application of this edge augmentation algorithm to semi-supervised classification on disconnected graphs points to an interesting future work.
References
- Anderson and Morley (1985) Anderson Jr, W. N., Morley, T. D. (1985), Eigenvalues of the Laplacian of a graph, Linear and Multilinear Algebra, 18(2), 141-145.
- Atwood and Towsley (2016) Atwood, J., Towsley, D. (2016), Diffusion-convolutional neural networks, Adv. in NIPS, 29.
- Banerjee et al. (2013) Banerjee, A., Chandrasekhar, A. G., Duflo, E., Jackson, M. O. (2013), The diffusion of microfinance, Science, 341(6144), 1236498.
- Blondel et al. (2008) Blondel, V. D., Guillaume, J. L., Lambiotte, R., Lefebvre, E. (2008). Fast unfolding of communities in large networks, J. of Statistical Mechanics, 2008(10), P10008.
- Breiger and Pattison (1986) Breiger, R. L., Pattison, P. E. (1986), Cumulated social roles: The duality of persons and their algebras, Social Networks, 8(3), 215-256.
- Cai et al. (2015) Cai, J., De Janvry, A., Sadoulet, E. (2015), Social networks and the decision to insure, A. E. J., 7(2), 81-108.
- Cai and Sun (1989) Cai, G. R., Sun, Y. G. (1989), The minimum augmentation of any graph to a K edge connected graph, Networks, 19(1), 151-172.
- Chauhan et al. (2009) Chauhan, S., Girvan, M., Ott, E. (2009), Spectral properties of networks with community structure, PRE, 80(5), 056114.
- Chung and Graham (1997) Chung, F. R., Graham, F. C. (1997), Spectral graph theory (No. 92), American Mathematical Soc..
- Clauset et al. (2004) Clauset, A., Newman, M. E., Moore, C. (2004), Finding community structure in very large networks, PRE, 70(6), 066111.
- Davis et al. (2009) Davis, A., Gardner, B. B., Gardner, M. R. (2009), Deep South: A social anthropological study of caste and class, Univ of South Carolina Press.
- Ding et al. (2022) Ding, K., Xu, Z., Tong, H., Liu, H. (2022), Data augmentation for deep graph learning: A survey, arXiv:2202.08235.
- Erdős and Rényi (1960) Erdős, P., Rényi, A. (1960), On the evolution of random graphs, Publ. Math. Inst. Hung. Acad. Sci, 5(1), 17-60.
- Frederickson and Ja’Ja’ (1981) Frederickson, G. N., Ja’Ja’, J. (1981), Approximation algorithms for several graph augmentation problems, SIAM J. on Computing, 10(2), 270-283.
- Gao et al. (2021) Gao, Z., Bhattacharya, S., …, Sadler, B. M. (2021), Training robust graph neural networks with topology adaptive edge dropping, arXiv:2106.02892.
- Girvan and Newman (2002) Girvan, M., Newman, M. E. (2002), Community structure in social and biological networks, PNAS, 99(12), 7821-7826.
- Holland et al. (1983) Holland, P. W., Laskey, K. B., Leinhardt, S. (1983), Stochastic blockmodels: First steps, Social Networks, 5(2), 109-137.
- Huffaker (2010) Huffaker, D. (2010), Dimensions of leadership and social influence in online communities, Hu. Com. Res., 36(4), 593-617.
- Karrer and Newman (2011) Karrer, B., Newman, M. E. (2011), Stochastic blockmodels and community structure in networks, PRE, 83(1), 016107.
- Kipf and Welling (2016) Kipf, T. N., Welling, M. (2016), Semi-supervised classification with graph convolutional networks, arXiv:1609.02907.
- Khuller and Thurimella (1993) Khuller, S., Thurimella, R. (1993), Approximation algorithms for graph augmentation, J. of Algorithms, 14(2), 214-225.
- Klicpera et al. (2019) Klicpera, J., Weisenberger, S., Günnemann, S. (2019), Diffusion improves graph learning, arXiv:1911.05485.
- Li and Zhang (2020) Li, T., Zhang, P. (2020), Self-falsifiable hierarchical detection of overlapping communities on social networks, NJP, 22(3), 033014.
- Liben‐Nowell and Kleinberg (2007) Liben‐Nowell, D., Kleinberg, J. (2007), The link prediction problem for social networks, J. of ASIST, 58(7), 1019-1031.
- Lü and Zhou (2011) Lü, L., Zhou, T. (2011), Link prediction in complex networks: A survey, Physica A, 390(6), 1150-1170.
- Lusseau et al. (2003) Lusseau, D., Schneider, K., …, Dawson, S. M. (2003), The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations, Beh. Eco. and Sociobio., 54(4), 396-405.
- Paluck et al. (2016) Paluck, E. L., Shepherd, H., Aronow, P. M. (2016), Changing climates of conflict: A social network experiment in 56 schools, PNAS, 113(3), 566-571.
- Paluck et al. (2017) Parés, F., Gasulla, D. G., … Suzumura, T. (2017, November), Fluid communities: A competitive, scalable and diverse community detection algorithm, In Inter. Conf. on Comp. Net. and Their App. (pp. 229-240).
- Peel et al. (2017) Peel, L., Larremore, D. B., Clauset, A. (2017), The ground truth about metadata and community detection in networks, Sci. Adv., 3(5), e1602548.
- Raghavan et al. (2007) Raghavan, U. N., Albert, R., Kumara, S. (2007), Near linear time algorithm to detect community structures in large-scale networks, PRE, 76(3), 036106.
- Rong et al. (2019) Rong, Y., Huang, W., Xu, T., Huang, J. (2019), Dropedge: Towards deep graph convolutional networks on node classification, arXiv:1907.10903.
- Rozemberczki et al. (2019) Rozemberczki, B., Davies, R., Sarkar, R., Sutton, C. (2019, August). Gemsec: Graph embedding with self clustering, 2019 IEEE/ACM Inter. Conf. on Adv. in Soc. Net. Ana. and Mining (pp. 65-72).
- Von Luxburg (2007) Von Luxburg, U. (2007), A tutorial on spectral clustering, Statistics and Computing, 17(4), 395-416.
- Watanabe and Nakamura (1987) Watanabe, T., Nakamura, A. (1987), Edge-connectivity augmentation problems, J. of Computer and System Sciences, 35(1), 96-144.
- Zachary (1977) Zachary, W. W. (1977), An information flow model for conflict and fission in small groups, J. of Anthropological Research, 33(4), 452-473.
- Zhang et al. (2019) Zhang, S., Tong, H., Xu, J., Maciejewski, R. (2019), Graph convolutional networks: a comprehensive review, Computational Social Networks, 6(1), 1-23.
- Zhao et al. (2022) Zhao, T., Liu, G., Günnemann, S., Jiang, M. (2022), Graph Data Augmentation for Graph Machine Learning: A Survey, arXiv:2202.08871.
- Zhao et al. (2021) Zhao, T., Liu, Y., …, Shah, N. (2021, May), Data augmentation for graph neural networks, In Proc. of the AAAI Conf. on AI, 35(12), 11015-11023.
- (39) Assume no self-loops on nodes; diagonal elements of are zero. Moreover, .
- (40) Removing zero eigenvalues corresponds to reducing components and thus aggregating components.
- (41) For a valid , let the product be an odd number when determining and .
- (42) In the experiment, edge assignment is allowed to proceed in the unrealizable region, realizing a portion of proposed new degrees. Smaller realization ratio scatters the contour of in this region (Figure 2b).