Maintenance of Structural Hole Spanners in Dynamic Networks
Abstract
Structural Hole (SH) spanners are the set of users who bridge different groups of users and are vital in numerous applications. Despite their importance, existing work for identifying SH spanners focuses only on static networks. However, real-world networks are highly dynamic where the underlying structure of the network evolves continuously. Consequently, we study SH spanner problem for dynamic networks. We propose an efficient solution for updating SH spanners in dynamic networks. Our solution reuses the information obtained during the initial runs of the static algorithm and avoids the recomputations for the nodes unaffected by the updates. Experimental results show that the proposed solution achieves a minimum speedup of 3.24 over recomputation. To the best of our knowledge, this is the first attempt to address the problem of maintaining SH spanners in dynamic networks.
Keywords:
Structural hole spanners; dynamic networks; pairwise connectivity; connected components.I Introduction
Over the past few years, various large-scale networks have emerged, such as collaboration networks, social networks, etc. The topological structure of these networks exhibits a community structure where the nodes are tightly connected within the community. The presence of community structure in the network leads to the formation of structural holes, which are the gaps formed due to lack of connectivity between the communities [1]. It has been shown that the information circulated within the community tends to be similar. Therefore, the presence of SH between the communities leads to the redundant flow of information within the community. Hence, a community needs to have connectivity with different communities to access novel information. SH spanners are those individuals who fill structural holes by acting as a bridge between different communities that are otherwise disconnected [1]. Figure 1 illustrates the SH spanners in the network. SH spanners have numerous real-world applications such as information diffusion, preventing the spread of rumours, community detection, etc. For example, identifying SH spanners in the network and filtering the information passing through them can prevent the spread of rumors in other communities. Several studies [1, 2, 3, 4, 5, 6, 7, 8] have been conducted for discovering SH spanners in static networks. However, real-world networks are highly dynamic and change rapidly. As a result, identified SH spanners change, and therefore, it is crucial to track updated spanners in the evolving networks. We study SH spanner problem for the dynamic networks. We define SH spanner as a node whose removal minimizes the pairwise connectivity of the residual network. This definition aims to capture the nodes that are located between otherwise disconnected groups of nodes. While the traditional SH spanner problem focuses on discovering the spanner nodes that minimize the pairwise connectivity of the network, the SH spanner tracking problem aims to update the previously identified spanner nodes as the network evolves. This paper proposes an efficient solution that maintains top- SH spanners for decremental edge updates in the network. Our solution performs greedy exchange, each time replacing an old spanner node with a high score node from the network.

Our contributions can be summarized as follows. First, we study SH spanner problem for dynamic networks and formulate Structural hole Spanner Tracking (SST) problem. We then propose an efficient solution for SST problem that maintains spanner nodes for single decremental edge updates in the network by discovering a set of affected nodes. We also design a method to compute the pairwise connectivity score of the nodes efficiently. We evaluate the performance of the proposed solution by conducting extensive experiments, and the results demonstrate that our solution achieves a minimum speedup of 3.24 over recomputation.
II RELATED WORK
SH theory was first studied by Burt [9] to identify the critical personnel in the company to integrate various operations. Goyal et al. [4] designed a model to illustrate how a node act as a SH spanner in the network. They formulated the SH spanner problem as a set of nodes that pass through numerous shortest paths between distinct pair of nodes. Tang et al. [8] designed a 2-step mechanism to identify SH spanners. For every node, the model only considers the shortest paths having length two, on which the node resides, and the rest of the paths are ignored. Rezvani et al. [2] designed several heuristics and argued that eliminating those nodes that bridge multiple communities results in an increase in the sum of all-pair shortest distance in the network. Based on [2], Xu et al. [6] designed fast and scalable algorithms for identifying spanner nodes. Gong et al. [3] designed a machine learning model to discover SH spanners in the online social network. Burt [10] studied the correlation between the strength of the links with which a node is connected to its bridged communities and the bridging advantage of that node. Based on [10], Xu et al. [7] designed maxBlock algorithm to discover SH spanners that connect many communities and have substantial relations with these communities. Lou et al. [1] designed a model to discover SH spanners and argued that eliminating the SH spanners from the network leads to a decrease in the minimal cut of the communities. The model requires prior community information however, community identification is an expensive process. He et al. [5] designed a model that jointly discovers communities and SH spanners.
III PRELIMINARIES AND PROBLEM STATEMENT
III-A Network Model
A network can be modeled as an undirected graph111We consider only graphs without self-loops or multiple edges. , where and are the set of nodes and edges in the network. Let and denotes the number of nodes and edges in the graph, respectively. A path from node to in an undirected graph is a sequence of nodes such that each pair is an edge in . A pair of nodes , is connected if there is a path between and . A connected component or component in an undirected graph is a maximal set of nodes in which a path connects each pair of node. The pairwise connectivity for any node pair is quantified as if node and are connected, and otherwise.
Total pairwise connectivity , i.e., pairwise connectivity across all node pairs in the graph, is given by:
(1) |
Pairwise connectivity score of node is the contribution of node to the total pairwise connectivity score of the graph. Pairwise connectivity score of node is calculated as follows:
(2) |
III-B Structural Hole Spanner Problem
Structural Hole Spanner Problem. Given a graph , and a positive integer , SH spanner problem is to identify a set of spanners Top- in , where Top- and Top-, such that the removal of nodes in Top- from minimizes the total pairwise connectivity in the residual subgraph Top-.
(3) |
where Top- and Top-.
Algorithm 1 presents a heuristic approach for discovering the SH spanners in the static network. The algorithm works by repeatedly selecting a node with a maximum pairwise connectivity score, which minimizes the total pairwise connectivity of the residual network when removed from the network.
III-C Structural Hole Spanner Tracking Problem
Structural Hole Spanner Tracking (SST) Problem. Given a graph , spanner set Top-, and edge update , SST problem is to identify a spanner set Top′- with cardinality in by updating Top- such that the removal of nodes in Top′- from minimizes the total pairwise connectivity in the residual subgraph Top′-).
IV PROPOSED SOLUTION
We propose an efficient solution that maintains top- spanners in the dynamic network. Instead of constructing the spanner set from the ground, we start from old Top- and repeatedly update it.
IV-A Finding Affected Nodes
Whenever there is an edge update in the network, by identifying the set of affected nodes, we need to recompute the scores of these nodes only. When an edge is deleted from the network, affected nodes are the nodes reachable from node , in case edge is non-bridge. On the other hand, if edge is a bridge, we have two sets of affected nodes and , representing the nodes reachable from node and , respectively. Let be the original network, as shown in Figure 2(a), and Figure 2(b) shows the updated network . When an edge is deleted from , the score of nodes changes as shown in Figure 2(b), and these nodes are called affected nodes whereas score of the nodes does not change, and therefore, these nodes are unaffected.

Lemma 1. Given a graph and an edge update , any node is an affected node if or , resulting in before deletion not equal to after deletion. Otherwise, the node is unaffected.
Here, is the pairwise connectivity score of node in updated graph .
IV-B Various Cases for Edge Deletion
In this section, we discuss various cases due to the deletion of an edge from the network.
Case 1: No change in connected component (Non-bridge edge)
When the deleted edge is non-bridge, then there is no change in the connected components of the updated network. Node and still belong to the same connected component and the score of only few nodes in the component changes, i.e., . Here, represents the connected component containing node .
Case 2: Split connected component (Bridge edge)
When edge is a bridge, its deletion splits the connected component into two new connected components. The score of the nodes in the component containing node and node changes, i.e., or .
IV-C Fast Computation of Pairwise Connectivity Score
With the change in the structure of the network, it is important to update the pairwise connectivity score of the affected nodes. Let be a set of nodes not reachable from node . Therefore, nodes in do not contribute to the pairwise connectivity score of node . Hence, it is not required to traverse the whole network to compute the score of node (as in equation 2), instead traversing the component to which node belongs is sufficient. Updated pairwise connectivity score for node can be calculated as:
(4) |
Here, denotes the pairwise connectivity score of the component containing node .
IV-D Updating top- Spanners
This section discusses the procedure for updating top- spanners. We use Algorithm 1 to obtain the initial spanner set and pairwise connectivity score of the nodes in the original network. In addition, we use max-heap priority queue , where the nodes are sorted by their pairwise connectivity score. Besides, we have maintained min-heap priority queue for the spanner nodes in Top-.
Lemma 2. Given a graph and an edge update , if the deleted edge is a non-bridge edge, then , .
Lemma 3. Given a graph and an edge update , if the deleted edge is a bridge edge, then , .
When an edge is deleted from the network, it is first determined if it is a bridge or non-bridge edge. We then identify the set of affected nodes using the procedure discussed in Section IV(A). We compute the new pairwise connectivity score of the affected nodes using equation 4 and update the score of these nodes in the priority queue . The score of the nodes in Top- may also changes due to updates in the network, and therefore, we need to update the score of affected nodes in Top-.
Once we have updated score of all the nodes, we update the Top- spanner set. Let denotes the node with the maximum score in . Now, we compare the score of with the minimum score node in Top-, and if Top-, we terminate the process and return Top-. In contrast, if Top-, and node is already present in Top-, we remove from the network and update score of the nodes in the component containing node in . On the other hand, if node is not present in Top-, we remove the minimum score node from Top- and add node to Top-. Finally, is removed from , and the score of the nodes in the priority queue is updated. The process for updating top- spanner is repeated for a maximum of times or earlier, incase the procedure terminates.
V Experimental Results
This section analyses the performance of the proposed solution. We implemented our method in Python 3.7. The experiments are performed on Window 10 PC with CPU 3.20 GHz and 16 GB RAM. We analyse the performance of the proposed solution on four real networks, Karate222http://www-personal.umich.edu/~mejn/netdata/karate.zip, Dolphins333http://www-personal.umich.edu/~mejn/netdata/dolphins.zip, American College Football444http://www-personal.umich.edu/~mejn/netdata/football.zip, and HC-BIOGRID555https://www.pilucrescenzi.it/wp/networks/biological/. The characteristics of the datasets are summarized in Table I.
\hlineB2.5 Dataset | Nodes | Edges | Avg. degree |
\hlineB2.5 Karate | 34 | 78 | 4 |
Dolphins | 62 | 159 | 5 |
Football | 115 | 613 | 10 |
HC-BIOGRID | 4039 | 14342 | 7 |
\hlineB2.5 |
To evaluate the performance of the proposed solution, we compare it against static recomputation. Table II shows the speedup achieved by the proposed solution over recomputation. Here, speedup is the ratio of the run time of static recomputation to the dynamic solution. In order to determine how the two solutions (static and proposed dynamic solution) perform for the dynamic network, we start with a full network and randomly remove 50 edges, one at a time. We then compute the geometric mean of the speedup for the proposed solution in terms of its execution time against the static recomputation. The column Gmean, Min and Max contain geometric mean of the speedup (over 50 edge deletions), minimum speedup, and maximum speedup achieved.
\hlineB2.5 Dataset | ||||||
Gmean | Min | Max | Gmean | Min | Max | |
\hlineB2.5 Karate | 2.35 | 1.73 | 3.1 | 3.92 | 2.98 | 4.18 |
Dolphins | 3.34 | 2.11 | 4.18 | 4.16 | 3.06 | 5.33 |
Football | 3.72 | 3.42 | 4.21 | 10.17 | 9.6 | 11.47 |
HC-BIOGRID | 3.76 | 1.85 | 4.11 | 11.16 | 10.21 | 11.89 |
\hlineB2.5 Mean (Geometric) | 3.24 | 2.19 | 3.87 | 6.56 | 5.47 | 7.42 |
\hlineB2.5 |
We run our solution for 2 different values of , i.e., = 1 and 5. For real networks, the gmean speedup is always at least 2.35 for = 1 and 3.92 for = 5. The experimental results demonstrate that speedup increases with the value of . The average speedup reaches 11.16 for = 5 from a speedup of 3.76 for = 1 (HC-BIOGRID dataset), which shows a significant improvement for a larger value of . The average speedup over all tested datasets is 3.24 for = 1 and 6.56 for = 5. The minimum speedup achieved by the proposed solution is 1.73 for the Karate dataset, and the maximum speedup achieved is 11.89 for HC-BIOGRID dataset. In addition, it has been observed that the speedup also increases with the size of the network. For a small size network (Karate dataset), the speedup is 3.92 for = 5. In contrast, for the same value of , the speedup increases significantly to 11.16 for the large size network (HC-BIOGRID dataset).
VI Conclusion
This paper studied the SH spanner identification problem for the dynamic networks, namely SST problem. We proposed an efficient solution for SST problem that maintains top- spanners dynamically by identifying the set of affected nodes whose pairwise connectivity score changes due to updates in the network. We also designed a fast mechanism for computing the pairwise connectivity score of the nodes. We analysed the performance of our solution experimentally and showed that the proposed single edge update solution speedup the process by a minimum factor of 3.24 over recomputation.
VII ACKNOWLEDGMENTS
This work was completed under Australian Government Research Training Program Scholarship at the University of Adelaide. It is supported by Key-Area Research and Development Plan of Guangdong Province #2020B010164003. The corresponding author is Hong Shen.
References
- [1] T. Lou and J. Tang, “Mining structural hole spanners through information diffusion in social networks,” in WWW, 2013, pp. 825–836.
- [2] M. Rezvani, W. Liang, W. Xu, and C. Liu, “Identifying top-k structural hole spanners in large-scale social networks,” in CIKM, 2015, pp. 263–272.
- [3] Q. Gong, J. Zhang, X. Wang, and Y. Chen, “Identifying structural hole spanners in online social networks using machine learning,” in SIGCOMM, 2019, pp. 93–95.
- [4] S. Goyal and F. Vega-Redondo, “Structural holes in social networks,” Journal of Economic Theory, vol. 137, no. 1, pp. 460–492, 2007.
- [5] L. He, C.-T. Lu, J. Ma, J. Cao, L. Shen, and P. S. Yu, “Joint community and structural hole spanner detection via harmonic modularity,” in SIGKDD, 2016, pp. 875–884.
- [6] W. Xu, M. Rezvani, W. Liang, J. X. Yu, and C. Liu, “Efficient algorithms for the identification of top- structural hole spanners in large social networks,” TKDE, vol. 29, no. 5, pp. 1017–1030, 2017.
- [7] W. Xu, T. Li, W. Liang, J. X. Yu, N. Yang, and S. Gao, “Identifying structural hole spanners to maximally block information propagation,” Information Sciences, vol. 505, pp. 100–126, 2019.
- [8] J. Tang, T. Lou, and J. Kleinberg, “Inferring social ties across heterogenous networks,” in WSDM, 2012, pp. 743–752.
- [9] R. S. Burt, Structural holes: The social structure of competition. Harvard university press, 2009.
- [10] R. Burt, “Structural holes in virtual worlds,” Chicago, IL: University of Chicago, 2011.