This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Maintenance of Structural Hole Spanners in Dynamic Networks

Diksha Goel School of Computer Science
University of Adelaide, Australia
[email protected]
   Hong Shen {@IEEEauthorhalign} Hui Tian School of Computer Science and Engineering
Sun Yat-sen University, Guangzhou, China
[email protected]
School of Information and Communication Technology
Griffith University, Australia
[email protected]
   Mingyu Guo School of Computer Science
University of Adelaide, Australia
[email protected]
Abstract

Structural Hole (SH) spanners are the set of users who bridge different groups of users and are vital in numerous applications. Despite their importance, existing work for identifying SH spanners focuses only on static networks. However, real-world networks are highly dynamic where the underlying structure of the network evolves continuously. Consequently, we study SH spanner problem for dynamic networks. We propose an efficient solution for updating SH spanners in dynamic networks. Our solution reuses the information obtained during the initial runs of the static algorithm and avoids the recomputations for the nodes unaffected by the updates. Experimental results show that the proposed solution achieves a minimum speedup of 3.24 over recomputation. To the best of our knowledge, this is the first attempt to address the problem of maintaining SH spanners in dynamic networks.

Keywords:
Structural hole spanners; dynamic networks; pairwise connectivity; connected components.

I Introduction

Over the past few years, various large-scale networks have emerged, such as collaboration networks, social networks, etc. The topological structure of these networks exhibits a community structure where the nodes are tightly connected within the community. The presence of community structure in the network leads to the formation of structural holes, which are the gaps formed due to lack of connectivity between the communities [1]. It has been shown that the information circulated within the community tends to be similar. Therefore, the presence of SH between the communities leads to the redundant flow of information within the community. Hence, a community needs to have connectivity with different communities to access novel information. SH spanners are those individuals who fill structural holes by acting as a bridge between different communities that are otherwise disconnected [1]. Figure 1 illustrates the SH spanners in the network. SH spanners have numerous real-world applications such as information diffusion, preventing the spread of rumours, community detection, etc. For example, identifying SH spanners in the network and filtering the information passing through them can prevent the spread of rumors in other communities. Several studies [1, 2, 3, 4, 5, 6, 7, 8] have been conducted for discovering SH spanners in static networks. However, real-world networks are highly dynamic and change rapidly. As a result, identified SH spanners change, and therefore, it is crucial to track updated spanners in the evolving networks. We study SH spanner problem for the dynamic networks. We define SH spanner as a node whose removal minimizes the pairwise connectivity of the residual network. This definition aims to capture the nodes that are located between otherwise disconnected groups of nodes. While the traditional SH spanner problem focuses on discovering the spanner nodes that minimize the pairwise connectivity of the network, the SH spanner tracking problem aims to update the previously identified spanner nodes as the network evolves. This paper proposes an efficient solution that maintains top-kk SH spanners for decremental edge updates in the network. Our solution performs greedy exchange, each time replacing an old spanner node with a high score node from the network.

Refer to caption
Fig. 1: Illustration of structural hole spanners.

Our contributions can be summarized as follows. First, we study SH spanner problem for dynamic networks and formulate Structural hole Spanner Tracking (SST) problem. We then propose an efficient solution for SST problem that maintains spanner nodes for single decremental edge updates in the network by discovering a set of affected nodes. We also design a method to compute the pairwise connectivity score of the nodes efficiently. We evaluate the performance of the proposed solution by conducting extensive experiments, and the results demonstrate that our solution achieves a minimum speedup of 3.24 over recomputation.

II RELATED WORK

SH theory was first studied by Burt [9] to identify the critical personnel in the company to integrate various operations. Goyal et al. [4] designed a model to illustrate how a node act as a SH spanner in the network. They formulated the SH spanner problem as a set of nodes that pass through numerous shortest paths between distinct pair of nodes. Tang et al. [8] designed a 2-step mechanism to identify SH spanners. For every node, the model only considers the shortest paths having length two, on which the node resides, and the rest of the paths are ignored. Rezvani et al. [2] designed several heuristics and argued that eliminating those nodes that bridge multiple communities results in an increase in the sum of all-pair shortest distance in the network. Based on [2], Xu et al. [6] designed fast and scalable algorithms for identifying spanner nodes. Gong et al. [3] designed a machine learning model to discover SH spanners in the online social network. Burt [10] studied the correlation between the strength of the links with which a node is connected to its bridged communities and the bridging advantage of that node. Based on [10], Xu et al. [7] designed maxBlock algorithm to discover SH spanners that connect many communities and have substantial relations with these communities. Lou et al. [1] designed a model to discover SH spanners and argued that eliminating the SH spanners from the network leads to a decrease in the minimal cut of the communities. The model requires prior community information however, community identification is an expensive process. He et al. [5] designed a model that jointly discovers communities and SH spanners.

III PRELIMINARIES AND PROBLEM STATEMENT

III-A Network Model

A network can be modeled as an undirected graph111We consider only graphs without self-loops or multiple edges. G=(V,E)G=(V,E), where VV and EE are the set of nodes and edges in the network. Let n=|V|n=|V| and m=|E|m=|E| denotes the number of nodes and edges in the graph, respectively. A path pijp_{ij} from node ii to jj in an undirected graph GG is a sequence of nodes {vi,vi+1,,vj}{\{v_{i},v_{i+1},...,v_{j}\}} such that each pair (vi,vi+1)(v_{i},v_{i+1}) is an edge in EE. A pair of nodes ii, jVj\in V is connected if there is a path between ii and jj. A connected component or component CC in an undirected graph GG is a maximal set of nodes in which a path connects each pair of node. The pairwise connectivity u(i,j)u(i,j) for any node pair (i,j)V×V(i,j)\in V\times V is quantified as 11 if node ii and jj are connected, and 0 otherwise.

Total pairwise connectivity P(G)P(G), i.e., pairwise connectivity across all node pairs in the graph, is given by:

P(G)=i,jV×V,iju(i,j){\displaystyle P(G)=\sum_{i,j\in V\times V,i\neq j}{\textstyle u(i,j)}} (1)

Pairwise connectivity score c(i)c(i) of node ii is the contribution of node ii to the total pairwise connectivity score of the graph. Pairwise connectivity score c(i)c(i) of node ii is calculated as follows:

c(i)=P(G)P(G\{i})c(i)=P(G)-P(G\backslash\{i\}) (2)

III-B Structural Hole Spanner Problem

Structural Hole Spanner Problem. Given a graph G=(V,E)G=(V,E), and a positive integer kk, SH spanner problem is to identify a set of spanners Top-kk in G(V,E)G(V,E), where Top-kk V\subset V and ||Top-k|=kk|=k, such that the removal of nodes in Top-kk from GG minimizes the total pairwise connectivity in the residual subgraph G(V\G(V\backslashTop-k)k).

Top-k=min{P(G\Top-k)}\text{Top-}k={min}\,\,\{{P(G\backslash\text{Top-}k)}\} (3)

where Top-kVk\,\subset\,V and ||Top-k|=kk|=k.

Algorithm 1 Structural hole spanner identification.
0:  Graph G(V,E)G(V,E), kk
0:  top-kk spanner set Top-kk
1:  Initialize Top-k=ϕk=\phi
2:  while ||Top-k|<kk|<k do
3:     v=argmaxvVc(v)v^{\prime}=argmax_{v\,\text{$\in$}\,V}\,c(v)
4:     G=G\{v}G=G\backslash\{v^{\prime}\}
5:     Top-k=k=Top-k{v}k\bigcup\{v^{\prime}\}
6:  end while
7:  return  Top-kk

Algorithm 1 presents a heuristic approach for discovering the SH spanners in the static network. The algorithm works by repeatedly selecting a node vv^{\prime} with a maximum pairwise connectivity score, which minimizes the total pairwise connectivity of the residual network when removed from the network.

III-C Structural Hole Spanner Tracking Problem

Structural Hole Spanner Tracking (SST) Problem. Given a graph G=(V,E)G=(V,E), spanner set Top-kk, and edge update ΔE\Delta E, SST problem is to identify a spanner set Top-kk with cardinality kk in G(V,E+ΔE)G^{\prime}(V,E+\Delta E) by updating Top-kk such that the removal of nodes in Top-kk from GG^{\prime} minimizes the total pairwise connectivity in the residual subgraph G(V\G^{\prime}(V\backslashTop-kk).

A naive solution for SST problem is to apply algorithm 1 after every update, which will return the new SH spanner set. However, this solution is computationally expensive. This paper focuses on updating top-kk spanners without explicitly running algorithm 1 after every update.

IV PROPOSED SOLUTION

We propose an efficient solution that maintains top-kk spanners in the dynamic network. Instead of constructing the spanner set from the ground, we start from old Top-kk and repeatedly update it.

IV-A Finding Affected Nodes

Whenever there is an edge update in the network, by identifying the set of affected nodes, we need to recompute the scores of these nodes only. When an edge (x,y)(x,y) is deleted from the network, affected nodes AA are the nodes reachable from node xx, in case edge (x,y)(x,y) is non-bridge. On the other hand, if edge (x,y)(x,y) is a bridge, we have two sets of affected nodes AxA_{x} and AyA_{y}, representing the nodes reachable from node xx and yy, respectively. Let G(V,E)G(V,E) be the original network, as shown in Figure 2(a), and Figure 2(b) shows the updated network G=G(V,E\(g,h))G^{\prime}=G(V,E\backslash(g,h)). When an edge (g,h)(g,h) is deleted from GG, the score of nodes {f,g,k,l,h,i,j,m}\{f,g,k,l,h,i,j,m\} changes as shown in Figure 2(b), and these nodes are called affected nodes whereas score of the nodes {a,b,c,d,e}\{a,b,c,d,e\} does not change, and therefore, these nodes are unaffected.

Refer to caption
Fig. 2: Illustration of affected and unaffected nodes due to updates in the network (a) Original network (b) Updated network.

Lemma 1. Given a graph G=(V,E)G=(V,E) and an edge update (a,b)(a,b), any node vVv\in V is an affected node if u(v,a)=1u(v,a)=1 or u(v,b)=1u(v,b)=1, resulting in c(v)c(v) before deletion not equal to c(v)c^{\prime}(v) after deletion. Otherwise, the node is unaffected.
Here, c(v)c^{\prime}(v) is the pairwise connectivity score of node vv in updated graph GG^{\prime}.

IV-B Various Cases for Edge Deletion

In this section, we discuss various cases due to the deletion of an edge from the network.

Case 1: No change in connected component (Non-bridge edge)

When the deleted edge (a,b)(a,b) is non-bridge, then there is no change in the connected components of the updated network. Node aa and bb still belong to the same connected component and the score of only few nodes in the component changes, i.e., c(r)c(r),rC(a)c(r)\neq c^{\prime}(r),\,r\in C(a). Here, C(a)C(a) represents the connected component containing node aa.
Case 2: Split connected component (Bridge edge)

When edge (g,h)(g,h) is a bridge, its deletion splits the connected component into two new connected components. The score of the nodes in the component containing node gg and node hh changes, i.e., c(r)c(r)rC(g)c(r)\neq c^{\prime}(r)\;\forall\,r\in C(g) or C(h)C(h).

IV-C Fast Computation of Pairwise Connectivity Score

With the change in the structure of the network, it is important to update the pairwise connectivity score of the affected nodes. Let vRv\in R be a set of nodes not reachable from node ii. Therefore, nodes in RR do not contribute to the pairwise connectivity score of node ii. Hence, it is not required to traverse the whole network to compute the score of node ii (as in equation 2), instead traversing the component to which node ii belongs is sufficient. Updated pairwise connectivity score for node ii can be calculated as:

c(i)=P(C(i))P(C(i)\{i})c(i)=P(C(i))-P(C(i)\backslash\{i\}) (4)

Here, P(C(i))P(C(i)) denotes the pairwise connectivity score of the component containing node ii.

IV-D Updating top-kk Spanners

This section discusses the procedure for updating top-kk spanners. We use Algorithm 1 to obtain the initial spanner set and pairwise connectivity score of the nodes in the original network. In addition, we use max-heap priority queue QQ, where the nodes are sorted by their pairwise connectivity score. Besides, we have maintained min-heap priority queue for the spanner nodes in Top-kk.
Lemma 2. Given a graph G=(V,E)G=(V,E) and an edge update (a,b)(a,b), if the deleted edge (a,b)(a,b) is a non-bridge edge, then c(v)c(v)c^{\prime}(v)\geq c(v), vA\forall\,v\in A.
Lemma 3. Given a graph G=(V,E)G=(V,E) and an edge update (a,b)(a,b), if the deleted edge (a,b)(a,b) is a bridge edge, then c(v)<c(v)c^{\prime}(v)<c(v), vA\forall\,v\in A.

When an edge (a,b)(a,b) is deleted from the network, it is first determined if it is a bridge or non-bridge edge. We then identify the set of affected nodes using the procedure discussed in Section IV(A). We compute the new pairwise connectivity score of the affected nodes using equation 4 and update the score of these nodes in the priority queue QQ. The score of the nodes in Top-kk may also changes due to updates in the network, and therefore, we need to update the score of affected nodes in Top-kk.

Once we have updated score of all the nodes, we update the Top-kk spanner set. Let ww denotes the node with the maximum score in QQ. Now, we compare the score of ww with the minimum score node in Top-kk, and if c(w)c(w)\leq Top-k.getMin()k.getMin(), we terminate the process and return Top-kk. In contrast, if c(w)>c(w)> Top-k.getMin()k.getMin(), and node ww is already present in Top-kk, we remove ww from the network and update score of the nodes in the component containing node ww in QQ. On the other hand, if node ww is not present in Top-kk, we remove the minimum score node from Top-kk and add node ww to Top-kk. Finally, ww is removed from GG, and the score of the nodes in the priority queue is updated. The process for updating top-kk spanner is repeated for a maximum of kk times or earlier, incase the procedure terminates.

V Experimental Results

This section analyses the performance of the proposed solution. We implemented our method in Python 3.7. The experiments are performed on Window 10 PC with CPU 3.20 GHz and 16 GB RAM. We analyse the performance of the proposed solution on four real networks, Karate222http://www-personal.umich.edu/~mejn/netdata/karate.zip, Dolphins333http://www-personal.umich.edu/~mejn/netdata/dolphins.zip, American College Football444http://www-personal.umich.edu/~mejn/netdata/football.zip, and HC-BIOGRID555https://www.pilucrescenzi.it/wp/networks/biological/. The characteristics of the datasets are summarized in Table I.

TABLE I: Summary of real datasets.
\hlineB2.5 Dataset Nodes Edges Avg. degree
\hlineB2.5 Karate 34 78 4
Dolphins 62 159 5
Football 115 613 10
HC-BIOGRID 4039 14342 7
\hlineB2.5

To evaluate the performance of the proposed solution, we compare it against static recomputation. Table II shows the speedup achieved by the proposed solution over recomputation. Here, speedup is the ratio of the run time of static recomputation to the dynamic solution. In order to determine how the two solutions (static and proposed dynamic solution) perform for the dynamic network, we start with a full network and randomly remove 50 edges, one at a time. We then compute the geometric mean of the speedup for the proposed solution in terms of its execution time against the static recomputation. The column Gmean, Min and Max contain geometric mean of the speedup (over 50 edge deletions), minimum speedup, and maximum speedup achieved.

TABLE II: Speedup on recomputation on real datasets.
\hlineB2.5 Dataset k=1k=1 k=5k=5
Gmean Min Max Gmean Min Max
\hlineB2.5 Karate 2.35 1.73 3.1 3.92 2.98 4.18
Dolphins 3.34 2.11 4.18 4.16 3.06 5.33
Football 3.72 3.42 4.21 10.17 9.6 11.47
HC-BIOGRID 3.76 1.85 4.11 11.16 10.21 11.89
\hlineB2.5 Mean (Geometric) 3.24 2.19 3.87 6.56 5.47 7.42
\hlineB2.5

We run our solution for 2 different values of kk, i.e., kk = 1 and 5. For real networks, the gmean speedup is always at least 2.35 for kk = 1 and 3.92 for kk = 5. The experimental results demonstrate that speedup increases with the value of kk. The average speedup reaches 11.16 for kk = 5 from a speedup of 3.76 for kk = 1 (HC-BIOGRID dataset), which shows a significant improvement for a larger value of kk. The average speedup over all tested datasets is 3.24 for kk = 1 and 6.56 for kk = 5. The minimum speedup achieved by the proposed solution is 1.73 for the Karate dataset, and the maximum speedup achieved is 11.89 for HC-BIOGRID dataset. In addition, it has been observed that the speedup also increases with the size of the network. For a small size network (Karate dataset), the speedup is 3.92 for kk = 5. In contrast, for the same value of kk, the speedup increases significantly to 11.16 for the large size network (HC-BIOGRID dataset).

VI Conclusion

This paper studied the SH spanner identification problem for the dynamic networks, namely SST problem. We proposed an efficient solution for SST problem that maintains top-kk spanners dynamically by identifying the set of affected nodes whose pairwise connectivity score changes due to updates in the network. We also designed a fast mechanism for computing the pairwise connectivity score of the nodes. We analysed the performance of our solution experimentally and showed that the proposed single edge update solution speedup the process by a minimum factor of 3.24 over recomputation.

VII ACKNOWLEDGMENTS

This work was completed under Australian Government Research Training Program Scholarship at the University of Adelaide. It is supported by Key-Area Research and Development Plan of Guangdong Province #2020B010164003. The corresponding author is Hong Shen.

References

  • [1] T. Lou and J. Tang, “Mining structural hole spanners through information diffusion in social networks,” in WWW, 2013, pp. 825–836.
  • [2] M. Rezvani, W. Liang, W. Xu, and C. Liu, “Identifying top-k structural hole spanners in large-scale social networks,” in CIKM, 2015, pp. 263–272.
  • [3] Q. Gong, J. Zhang, X. Wang, and Y. Chen, “Identifying structural hole spanners in online social networks using machine learning,” in SIGCOMM, 2019, pp. 93–95.
  • [4] S. Goyal and F. Vega-Redondo, “Structural holes in social networks,” Journal of Economic Theory, vol. 137, no. 1, pp. 460–492, 2007.
  • [5] L. He, C.-T. Lu, J. Ma, J. Cao, L. Shen, and P. S. Yu, “Joint community and structural hole spanner detection via harmonic modularity,” in SIGKDD, 2016, pp. 875–884.
  • [6] W. Xu, M. Rezvani, W. Liang, J. X. Yu, and C. Liu, “Efficient algorithms for the identification of top-kk structural hole spanners in large social networks,” TKDE, vol. 29, no. 5, pp. 1017–1030, 2017.
  • [7] W. Xu, T. Li, W. Liang, J. X. Yu, N. Yang, and S. Gao, “Identifying structural hole spanners to maximally block information propagation,” Information Sciences, vol. 505, pp. 100–126, 2019.
  • [8] J. Tang, T. Lou, and J. Kleinberg, “Inferring social ties across heterogenous networks,” in WSDM, 2012, pp. 743–752.
  • [9] R. S. Burt, Structural holes: The social structure of competition.   Harvard university press, 2009.
  • [10] R. Burt, “Structural holes in virtual worlds,” Chicago, IL: University of Chicago, 2011.