This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Permanence based Hidden Community and Graph Recovery in Social Networks thanks: Corresponding author. School of Computing, Gachon University, Republic of Korea. (email: [email protected])

Jaeyoung Choi and Wooseok Sim
Abstract

Due to the recent development of data analysis techniques, technologies for detecting communities through information expressed in social networks have been developed. Although it has several advantages, including the ability to effectively share recommended items through an estimated community, it may cause personal privacy issues. Therefore, recently, the problem of hiding the real community well is being studied at the same time. As an example, Mittal et al., proposed an algorithm called NEURAL that can hide this community well based on a metric called permanence. Based on this, in this study, we propose a Reverse NEURAL (R-NEURAL) algorithm that restores the community as well as the original graph structure using permanence. The proposed algorithm includes a method for well restoring not only the community hidden by the NEURAL algorithm but also the original graph structure modified by recovered edges. We conduct experiments on real-world graphs and found that the proposed algorithm recovers well the hidden community as well as the graph structure.

I Introduction

The community detection (CD) problem is a problem that infers which nodes are included in which groups in a network or graph through graph connectivity information. In social networks, it is regarded as an important problem since many users are organized into nodes and each user has complex connections [1, 2]. In this CD problem, the community is inferred by classifying cases in which the connectivity between nodes in a specific group is greater than the connectivity with other groups. Recently, many algorithms have been proposed to solve this CD based on the increase of data in social networks and the development of many machine-learning techniques [3, 4]. Many of the proposed CD algorithms are made based on the score function for the community structure in the graph. Examples include modularity [5], conductance [6], coverage, or partition quality [2]. Modularity is the most used metric which is the ratio between edges in the community and the edges distributed over the graph. This indicates how well it is structured when compared to uniform connections. The conductance is a fraction of edges that have the outside direction from the community and the total number of edges in the community. Coverage is the coverage of all edges in the graph and the edges of the intracommunity. Partition quality is the ratio of the number of two pairs in the same community. In addition, a new score function called permanence has recently been proposed, which is a score function based on the vertices of the graph [7]. This metric is based on the ratio of connections between inter-community and intra-community. Unlike the community detection approaches so far, the authors in [8] conducted a study that could well hide the community structure in the graph. They proposed a community deception algorithm named NEURAL, which is a method of selecting a specific edge in the graph based on this permanence score and adding or deleting it to intracommunity or inter-community. Motivated by the algorithm they proposed, we suggest a method called Reverse NEURAL (R-NEURAL) that can simultaneously restore the original community structure and graph structure after NEURAL (a deception algorithm) is applied as shown in Figure 1. In R-NEURAL, the main approach is to recover edges from hidden graphs based on permanence.

Refer to caption
Figure 1: Example of community and graph recovery process after the deception of the target community using edge updates. (CS: Community Structure).

II Model and Algorithms

II-A Preliminaries

We consider an undirected connected graph G=(V,E)G=(V,E), where VV is a set of nodes and EE is the set of edges of the form (i,j)(i,j) for i,jVi,j\in V. After applying a CD algorithm on G,G, we assume that there are k1k\geq 1 detected communities and we denote a set of detected communities by CS={C1,,Ck}.CS=\{C_{1},...,C_{k}\}. We also assume that there are no overlapping communities, i.e., CiCj=C_{i}\cap C_{j}=\emptyset for any ij.i\neq j. For each CiC_{i}, if an edge (u,v)(u,v) is in Ci,C_{i}, it is denoted by intracommunity edge whereas for an edge (u,v)(u,v), where uCiu\in C_{i} and vCjv\in C_{j} with CiCj=C_{i}\cap C_{j}=\emptyset, we denote it as a intercommunity edge.

II-B Permanence

As a measure of community structure in the graph, we use the permanence proposed by Chakraborty et al.[7]. This is a vertex-centric score function that quantifies how well node vv is affiliated with community CC. To see this, let I(v)I(v) be the internal connections of a node vv within its own community and let Emax(v)E_{max}(v) be the maximum connections of vv to its neighboring communities. Next, we let Cin(v)C_{in}(v) be the fraction of actual and possible number of edges among the internal neighbors of vv. Then, the permanence of vv in GG is given by:

Perm(v,G)=I(v)Emax(v)×1deg(v)(1Cin(v)),\displaystyle Perm(v,G)=\frac{I(v)}{E_{max}(v)}\times\frac{1}{deg(v)}-(1-C_{in}(v)), (1)

where deg(v)deg(v) is the node degree of v.v. This score function indicates that if the internal connectivity I(v)I(v) of vertex vv is greater than the external connectivity Emax(v)E_{max}(v), the corresponding value increases and forming a close clique and is divided by the degree of vv to normalize it [7]. As an extension, the permanence of the graph GG is defined by Perm(G)=vGPerm(v,G)/|V|,Perm(G)=\sum_{v\in G}Perm(v,G)/|V|, where |V||V| is the number of nodes in GG. Then the graph permanence is an average of node permanence over the graph.

II-C Community Deception Algorithm - NEURAL [8]

1
Input: Graph G,G, Target community C,C, Budget B>0B>0
Output: Updated graph GG^{\prime}
2
3
4𝒫l,ad=0\mathcal{P}_{l,ad}=0 and 𝒫l,de=0\mathcal{P}_{l,de}=0;
5while B>0B>0 do
6       Step1: For uCu\in C and vCv\in C^{\prime}, where CC^{\prime} is the community of maximum external pull for u,u, set GG^{\prime} to be the graph with added edge (u,v)(u,v). Compute Perm(u,G)Perm(u,G) and Perm(u,G)Perm(u,G^{\prime}) and choose the best edge satisfying:
7      (u,v)=argmaxuC,vCPerm(u,G)Perm(u,G)(u^{*},v^{*})=\arg\max_{u\in C,v\in C^{\prime}}Perm(u,G)-Perm(u,G^{\prime});
8      Set 𝒫l,ad=𝒫(G)𝒫(G)\mathcal{P}_{l,ad}=\mathcal{P}(G)-\mathcal{P}(G^{\prime}) for (u,v)(u^{*},v^{*});
9      Step2: For w,zCw,z\in C, set GG^{\prime} to be the graph with deleted edge (w,z)(w,z). Compute Perm(w,G)Perm(w,G) and Perm(w,G)Perm(w,G^{\prime}) and choose the best edge satisfying:
10      (w,z)=argmaxw,zCPerm(w,G)Perm(w,G)(w^{*},z^{*})=\arg\max_{w,z\in C}Perm(w,G)-Perm(w,G^{\prime});
11      Set 𝒫l,de=𝒫(G)𝒫(G)\mathcal{P}_{l,de}=\mathcal{P}(G)-\mathcal{P}(G^{\prime}) for (w,z)(w^{*},z^{*});
12      if 𝒫l,ad𝒫l,de\mathcal{P}_{l,ad}\geq\mathcal{P}_{l,de} and 𝒫l,ad>0\mathcal{P}_{l,ad}>0 then
13             G(V,E(u,v))G\leftarrow(V,E\cup(u^{*},v^{*}))
14      else if  𝒫l,de>0\mathcal{P}_{l,de}>0 then
15             G(V,E(w,z))G\leftarrow(V,E\setminus(w^{*},z^{*}))
16      BB1B\leftarrow B-1;
17GGG^{\prime}\leftarrow G;
18Return GG^{\prime};
Algorithm 1 NEURAL [8]

For community deception, the authors [8] proposed Network Deception Using Permanence Loss (NEURAL) algorithm in their work. The main idea of the algorithm is that some intracommunity or inter-community edges are updated (added or deleted) based on the permanence difference between before the update and after the edge update. To see this more precisely, we first define a permanence loss [8] of two graphs GG and GG^{\prime}, where GG^{\prime} is the graph result from some edge update on G,G, as follows:

𝒫l=Perm(G)Perm(G).\displaystyle\mathcal{P}_{l}=Perm(G)-Perm(G^{\prime}). (2)

Then, the goal of designing the NEURAL is to maximize the 𝒫l\mathcal{P}_{l}. For this, it is necessary to minimize the right term Perm(G)Perm(G^{\prime}) since the permanence of GG, Perm(G)Perm(G) is fixed if the graph is given. The NEURAL algorithm includes two main steps as depicted in Algorithm 1. Step 1 describes the process of finding an edge to add to another community in the target community. That is, for given original graph GG, Target community CC and budget B>0B>0, which will be used for the edge update, the NEURAL first select nodes pair (u,v)(u,v), for uCu\in C and vCv\in C^{\prime}, where CC^{\prime} is the community of maximum external pull of u.u. Then, it computes the permanence Perm(u,G)Perm(u,G) for the original graph GG and Perm(u,G)Perm(u,G^{\prime}) for the updated graph GG^{\prime} with adding the edge (u,v)(u,v). Next, it computes the difference between them for all uCu\in C and vCv\in C^{\prime} and finds the maximum (u,v)(u^{*},v^{*}) of the difference (lines 3-4). Then, it calculated the difference of permanences between the original graph GG and the updated graph GG^{\prime} as 𝒫l,ad\mathcal{P}_{l,ad}. On the other hand, Step 2 includes the process of finding an edge to delete within the same community. In this process, similar to step 1, calculate the permanence of the selected edge to find the edge (w,z)(w^{*},z^{*}) that maximizes the difference and put it in 𝒫l,de\mathcal{P}_{l,de} (lines 6-8). 111The reason why it is possible to find edges greedily as in Step 1 and Step 2 is that all permanences satisfy submodularity as shown in [8]. Adding an edge in the inter-community and deleting an edge in the intra-community is an operation that increases the permanence gain as proven in [8], so only these two have been considered. Finally, the algorithm compares the values of 𝒫l,ad\mathcal{P}_{l,ad} and 𝒫l,de\mathcal{P}_{l,de} obtained in this way and performs an edge update that satisfies a larger permanence gain, one budget is used (lines 9-13). This process is repeated until the total budget BB is exhausted, and the last updated graph is output as GG^{\prime}.

II-D Proposed Algorithm - Reverse NEURAL (R-NEURAL)

As a simple approach to the recovery of hidden communities that applied the NEURAL algorithm, we propose a Reverse-NEURAL (R-NEURAL) in this paper. This includes the reverse application of the two cases below based on NEURAL. Before describing that, we let GG^{\prime} be the output graph after applying the NEURAL and let G′′G^{\prime\prime} be the graph after updating edge deleting, or adding on GG^{\prime} to recover the original graph GG. Then, we introduce permanence gain, 𝒫g\mathcal{P}_{g}, as follows:

𝒫g=Perm(G′′)Perm(G).\displaystyle\mathcal{P}_{g}=Perm(G^{\prime\prime})-Perm(G^{\prime}). (3)

The above permanence gain seems quite similar to the permanence loss in (2). However, the goal of maximizing the permanence loss is to find GG^{\prime} that minimizes the permanence of the new graph GG^{\prime} whereas the goal of permanence gain 𝒫g\mathcal{P}_{g} is to maximize the permanence of the graph G′′G^{\prime\prime} recovered from GG^{\prime}. To do this, we reverse the following two in Algorithm 1, which is described in Algorithm 2.

  • (1)(1)

    In Step 1, since inter-community edges are newly connected in Step 1 of the NEURAL algorithm, we work to find and remove edges that exist between intercommunity. Due to the lack of knowledge of which edges have been added, R-NEURAL uses also a similar measure, permanence gain 𝒫g,de\mathcal{P}_{g,de}, to select edges to be deleted.

  • (2)(2)

    In Step 2, based on the method of finding the permanence loss while removing the intracommunity edge in NEURAL, we consider the addition of an intracommunity edge to GG^{\prime} and raise the permanence of G′′G^{\prime\prime} to 𝒫g,ad\mathcal{P}_{g,ad}.

Finally, we compare 𝒫g,de\mathcal{P}_{g,de} and 𝒫g,ad\mathcal{P}_{g,ad} and proceed with the operation for the larger one and consume one budget. We examine how well the community structure hidden by NEURAL can be recovered through this simple opposite approach.

1
Input: Output Graph G,G^{\prime}, Target community C,C, Budget B>0B>0
Output: Updated graph G′′G^{\prime\prime}
2
3
4𝒫l,ad=0\mathcal{P}_{l,ad}=0 and 𝒫l,de=0\mathcal{P}_{l,de}=0;
5while B>0B>0 do
6       Step1: For uCu\in C and vCv\in C^{\prime}, where CC^{\prime} is the community of maximum external pull for u,u, set G′′G^{\prime\prime} to be the graph with deleted edge (u,v)(u,v). Compute Perm(u,G)Perm(u,G) and Perm(u,G)Perm(u,G^{\prime}) and choose the best edge (u,v)(u^{*},v^{*}) with
7      argmaxuC,vCPerm(u,G′′)Perm(u,G)\arg\max_{u\in C,v\in C^{\prime}}Perm(u,G^{\prime\prime})-Perm(u,G^{\prime});
8      Set 𝒫g,de=𝒫(G′′)𝒫(G)\mathcal{P}_{g,de}=\mathcal{P}(G^{\prime\prime})-\mathcal{P}(G^{\prime}) for (u,v)(u^{*},v^{*});
9      Step2: For w,zCw,z\in C, set G′′G^{\prime\prime} to be the graph with added edge (w,z)(w,z). Compute Perm(w,G)Perm(w,G) and Perm(w,G)Perm(w,G^{\prime}) and choose the best edge satisfying:
10      (w,z)=argmaxw,zCPerm(w,G′′)Perm(w,G)(w^{*},z^{*})=\arg\max_{w,z\in C}Perm(w,G^{\prime\prime})-Perm(w,G^{\prime});
11      Set 𝒫g,ad=𝒫(G′′)𝒫(G)\mathcal{P}_{g,ad}=\mathcal{P}(G^{\prime\prime})-\mathcal{P}(G^{\prime}) for (w,z)(w^{*},z^{*});
12      if 𝒫g,de𝒫g,ad\mathcal{P}_{g,de}\geq\mathcal{P}_{g,ad} and 𝒫g,de>0\mathcal{P}_{g,de}>0 then
13             G(V,E(u,v))G^{\prime}\leftarrow(V,E\setminus(u^{*},v^{*}))
14      else if  𝒫g,ad>0\mathcal{P}_{g,ad}>0 then
15             G(V,E(w,z))G^{\prime}\leftarrow(V,E\cup(w^{*},z^{*}))
16      BB1B\leftarrow B-1;
17G′′GG^{\prime\prime}\leftarrow G^{\prime};
18Return G′′G^{\prime\prime};
Algorithm 2 Reverse NEURAL (R-NEURAL)

III Numerical and Simulation Results

In the simulation, we apply several different CD algorithms for (i)(i) original graph GG, (ii)(ii) updated graph GG^{\prime} after NEURAL, and (iii)(iii) recovered graph G′′G^{\prime\prime} after R-NEURAL, respectively. As an original graph, we use three different real-world graphs: Dolphin social network (Dol, 62 nodes) [9], and word adjacencies (Adjn, 112 nodes) [9], and Polbook (105 nodes) [9], respectively. As evaluation metrics, we consider three measures: modularity (M), coverage (C), and partition quality (PQ). In the experiments, we set B=0.3|VC|B=0.3|V_{C}|, i.e., 30% of the size of the target community CC as in [8]. As result, in Table I, we obtain three evaluation results for two real-world graphs, we see that there is an increase in most evaluation metrics for each graph G′′G^{\prime\prime} compared to those of GG^{\prime}. Among them, both graphs show that the partition quality score improves the most, which is due to the process of community recovery based on permanence. Next, we use two CD algorithms: Louvain [10] and InfoMap [11] in Table II. We obtain the three evaluation results with respect to the applied CD algorithms for one real-world graph (Dol). In this case, we also check that the detection performance is improved in the recovered graph G′′G^{\prime\prime} in most cases. Finally, to check how similar the recovered graph was to the original graph, we obtained the similarity measure between the two graphs using the well-known eigenvector similarity. To do this, we first measured the similarity between GG and the recovered graph G′′G^{\prime\prime} (sim(GG,G′′G^{\prime\prime})) together with the similarity between the original graph GG and the updated graph GG^{\prime} (sim(GG,GG^{\prime})) for deception in Figure 2. We compared the similarity of Dol, Adjn, and Polbook graphs. As a result, all graph G′′G^{\prime\prime} in the three real-world graphs can recover the original graph from the updated graph GG^{\prime}.

TABLE I: Two real-world graphs: Dol (left) and Adjn (right).
Metrics GG (Dol) GG^{\prime} G′′G^{\prime\prime} GG (Adjn) GG^{\prime} G′′G^{\prime\prime}
M 0.5202 0.4822 0.4866 0.2941 0.2911 0.2920
C 0.7447 0.7119 0.7123 0.4682 0.4479 0.4448
PQ 0.8381 0.8239 0.8324 0.8394 0.8323 0.8351
TABLE II: Two CD algorithms: Louvain (left) and Infomap (right).
Metrics GG (Dol) GG^{\prime} G′′G^{\prime\prime} GG (Dol) GG^{\prime} G′′G^{\prime\prime}
M 0.5205 0.4796 0.4854 0.5219 0.4198 0.4570
C 0.7418 0.7068 0.7063 0.7494 0.8400 0.7198
PQ 0.8407 0.8285 0.8525 0.8358 0.7298 0.8371
Refer to caption
Figure 2: Eigenvector similarity from the original graph GG. The orange line shows how close the graph G′′G^{\prime\prime} is recovered to GG compared to GG^{\prime}.

IV Conclusion

In this study, we proposed a reverse NEURAL algorithm that restores the community and graph structure using permanence. From the simulation on real-world graphs, we have checked that our proposed algorithm recovered not only the target community hidden by the NEURAL algorithm but also the original graph structure modified by edges. As future work, we will consider an additional query process for the selected edges in the algorithm to check whether it was in the original graph or not to improve the recovery performance.

References

  • [1] S. Papadopoulos, Y. Kompatsiaris, A. Vakali and Ploutarchos Spyridonos Community detection in Social Media. In Data Mining and Knowledge Discovery, VOL. 24, pp. 515-554, June 2011.
  • [2] P. Bedi and C. Sharma Community detection in social networks. In WIREs Data Mining Knowl Discov, VOL. 6, pp.115-135, 2016.
  • [3] D. Jin, Z. Yu, P. Jiao, S. Pan, D. He, J. Wu, P. S. Yu, and W. Zhang A Survey of Community Detection Approaches: From Statistical Modeling to Deep Learning. In IEEE TKDE, VOL. 35, pp.1149-1170, Feb. 2023.
  • [4] J. Kim and J. Lee Community Detection in Multi-Layer Graphs: A Survey. In ACM SIGMOD, VOL. 44(3), pp.37–48, September 2015.
  • [5] M. E. Newman Modularity and community structure in networks. In PNAS, VOL. 103(23), pp.8577-8582, June 2006.
  • [6] J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. In Internet Mathematics, VOL. 6(1), 2009.
  • [7] T. Chakraborty, S. Srinivasan, N. Ganguly, A. Mukherjee, and S. Bhowmick On the permanence of vertices in network communities. In ACM KDD, VOL. 6(1), pp.1396–1405, August 2014.
  • [8] S. Mittal, D. Sengupta and T. Chakraborty Hide and Seek: Outwitting Community Detection Algorithms In IEEE Transactions on Computational Social Systems, VOL. 8(4), pp.799 - 808, March 2021.
  • [9] Real-world graph data. http://www-personal.umich.edu/~mejn/netdata/
  • [10] V. D. Blondel, J.L. Guillaume, R. Lambiotte, and E. Lefebvre, Fast unfolding of communities in large networks. In J. Stat. Mech. Theory Exp, VOL. 2008(10), October 2008.
  • [11] M. Rosvall and C. T. Bergstrom Maps of random walks on complex networks reveal community structure In Proc. Nat. Acad. Sci. USA, VOL 105(4), pp.1118–1123, January 2008.