Permanence based Hidden Community and Graph Recovery in Social Networks ^†^†thanks: ^∗ Corresponding author. School of Computing, Gachon University, Republic of Korea. (email: [email protected])

Jaeyoung Choi^∗ and Wooseok Sim

Abstract

Due to the recent development of data analysis techniques, technologies for detecting communities through information expressed in social networks have been developed. Although it has several advantages, including the ability to effectively share recommended items through an estimated community, it may cause personal privacy issues. Therefore, recently, the problem of hiding the real community well is being studied at the same time. As an example, Mittal et al., proposed an algorithm called NEURAL that can hide this community well based on a metric called permanence. Based on this, in this study, we propose a Reverse NEURAL (R-NEURAL) algorithm that restores the community as well as the original graph structure using permanence. The proposed algorithm includes a method for well restoring not only the community hidden by the NEURAL algorithm but also the original graph structure modified by recovered edges. We conduct experiments on real-world graphs and found that the proposed algorithm recovers well the hidden community as well as the graph structure.

I Introduction

The community detection (CD) problem is a problem that infers which nodes are included in which groups in a network or graph through graph connectivity information. In social networks, it is regarded as an important problem since many users are organized into nodes and each user has complex connections [1, 2]. In this CD problem, the community is inferred by classifying cases in which the connectivity between nodes in a specific group is greater than the connectivity with other groups. Recently, many algorithms have been proposed to solve this CD based on the increase of data in social networks and the development of many machine-learning techniques [3, 4]. Many of the proposed CD algorithms are made based on the score function for the community structure in the graph. Examples include modularity [5], conductance [6], coverage, or partition quality [2]. Modularity is the most used metric which is the ratio between edges in the community and the edges distributed over the graph. This indicates how well it is structured when compared to uniform connections. The conductance is a fraction of edges that have the outside direction from the community and the total number of edges in the community. Coverage is the coverage of all edges in the graph and the edges of the intracommunity. Partition quality is the ratio of the number of two pairs in the same community. In addition, a new score function called permanence has recently been proposed, which is a score function based on the vertices of the graph [7]. This metric is based on the ratio of connections between inter-community and intra-community. Unlike the community detection approaches so far, the authors in [8] conducted a study that could well hide the community structure in the graph. They proposed a community deception algorithm named NEURAL, which is a method of selecting a specific edge in the graph based on this permanence score and adding or deleting it to intracommunity or inter-community. Motivated by the algorithm they proposed, we suggest a method called Reverse NEURAL (R-NEURAL) that can simultaneously restore the original community structure and graph structure after NEURAL (a deception algorithm) is applied as shown in Figure 1. In R-NEURAL, the main approach is to recover edges from hidden graphs based on permanence.

Refer to caption — Figure 1: Example of community and graph recovery process after the deception of the target community using edge updates. (CS: Community Structure).

II Model and Algorithms

II-A Preliminaries

We consider an undirected connected graph $G=(V,E)$ , where $V$ is a set of nodes and $E$ is the set of edges of the form $(i,j)$ for $i,j\in V$ . After applying a CD algorithm on $G,$ we assume that there are $k\geq 1$ detected communities and we denote a set of detected communities by $CS=\{C_{1},...,C_{k}\}.$ We also assume that there are no overlapping communities, i.e., $C_{i}\cap C_{j}=\emptyset$ for any $i\neq j.$ For each $C_{i}$ , if an edge $(u,v)$ is in $C_{i},$ it is denoted by intracommunity edge whereas for an edge $(u,v)$ , where $u\in C_{i}$ and $v\in C_{j}$ with $C_{i}\cap C_{j}=\emptyset$ , we denote it as a intercommunity edge.

II-B Permanence

As a measure of community structure in the graph, we use the permanence proposed by Chakraborty et al.[7]. This is a vertex-centric score function that quantifies how well node $v$ is affiliated with community $C$ . To see this, let $I(v)$ be the internal connections of a node $v$ within its own community and let $E_{max}(v)$ be the maximum connections of $v$ to its neighboring communities. Next, we let $C_{in}(v)$ be the fraction of actual and possible number of edges among the internal neighbors of $v$ . Then, the permanence of $v$ in $G$ is given by:

\displaystyle Perm(v,G)=\frac{I(v)}{E_{max}(v)}\times\frac{1}{deg(v)}-(1-C_{in}(v)),

(1)

where $deg(v)$ is the node degree of $v.$ This score function indicates that if the internal connectivity $I(v)$ of vertex $v$ is greater than the external connectivity $E_{max}(v)$ , the corresponding value increases and forming a close clique and is divided by the degree of $v$ to normalize it [7]. As an extension, the permanence of the graph $G$ is defined by $Perm(G)=\sum_{v\in G}Perm(v,G)/|V|,$ where $|V|$ is the number of nodes in $G$ . Then the graph permanence is an average of node permanence over the graph.

II-C Community Deception Algorithm - NEURAL [8]

Input: Graph

G,

Target community

C,

Budget

B>0

Output: Updated graph

G^{\prime}

\mathcal{P}_{l,ad}=0

and

\mathcal{P}_{l,de}=0

;

5while $B>0$ do

6 Step1: For

u\in C

and

v\in C^{\prime}

, where

C^{\prime}

is the community of maximum external pull for

u,

set

G^{\prime}

to be the graph with added edge

(u,v)

. Compute

Perm(u,G)

and

Perm(u,G^{\prime})

and choose the best edge satisfying:

(u^{*},v^{*})=\arg\max_{u\in C,v\in C^{\prime}}Perm(u,G)-Perm(u,G^{\prime})

;

8 Set

\mathcal{P}_{l,ad}=\mathcal{P}(G)-\mathcal{P}(G^{\prime})

for

(u^{*},v^{*})

;

9 Step2: For

w,z\in C

, set

G^{\prime}

to be the graph with deleted edge

(w,z)

. Compute

Perm(w,G)

and

Perm(w,G^{\prime})

and choose the best edge satisfying:

(w^{*},z^{*})=\arg\max_{w,z\in C}Perm(w,G)-Perm(w,G^{\prime})

;

11 Set

\mathcal{P}_{l,de}=\mathcal{P}(G)-\mathcal{P}(G^{\prime})

for

(w^{*},z^{*})

;

12 if $\mathcal{P}_{l,ad}\geq\mathcal{P}_{l,de}$ and $\mathcal{P}_{l,ad}>0$ then

G\leftarrow(V,E\cup(u^{*},v^{*}))

14 else if $\mathcal{P}_{l,de}>0$ then

G\leftarrow(V,E\setminus(w^{*},z^{*}))

B\leftarrow B-1

;

G^{\prime}\leftarrow G

;

18Return

G^{\prime}

;

Algorithm 1 NEURAL [8]

For community deception, the authors [8] proposed Network Deception Using Permanence Loss (NEURAL) algorithm in their work. The main idea of the algorithm is that some intracommunity or inter-community edges are updated (added or deleted) based on the permanence difference between before the update and after the edge update. To see this more precisely, we first define a permanence loss [8] of two graphs $G$ and $G^{\prime}$ , where $G^{\prime}$ is the graph result from some edge update on $G,$ as follows:

\displaystyle\mathcal{P}_{l}=Perm(G)-Perm(G^{\prime}).

(2)

Then, the goal of designing the NEURAL is to maximize the $\mathcal{P}_{l}$ . For this, it is necessary to minimize the right term $Perm(G^{\prime})$ since the permanence of $G$ , $Perm(G)$ is fixed if the graph is given. The NEURAL algorithm includes two main steps as depicted in Algorithm 1. Step 1 describes the process of finding an edge to add to another community in the target community. That is, for given original graph $G$ , Target community $C$ and budget $B>0$ , which will be used for the edge update, the NEURAL first select nodes pair $(u,v)$ , for $u\in C$ and $v\in C^{\prime}$ , where $C^{\prime}$ is the community of maximum external pull of $u.$ Then, it computes the permanence $Perm(u,G)$ for the original graph $G$ and $Perm(u,G^{\prime})$ for the updated graph $G^{\prime}$ with adding the edge $(u,v)$ . Next, it computes the difference between them for all $u\in C$ and $v\in C^{\prime}$ and finds the maximum $(u^{*},v^{*})$ of the difference (lines 3-4). Then, it calculated the difference of permanences between the original graph $G$ and the updated graph $G^{\prime}$ as $\mathcal{P}_{l,ad}$ . On the other hand, Step 2 includes the process of finding an edge to delete within the same community. In this process, similar to step 1, calculate the permanence of the selected edge to find the edge $(w^{*},z^{*})$ that maximizes the difference and put it in $\mathcal{P}_{l,de}$ (lines 6-8). ¹¹1The reason why it is possible to find edges greedily as in Step 1 and Step 2 is that all permanences satisfy submodularity as shown in [8]. Adding an edge in the inter-community and deleting an edge in the intra-community is an operation that increases the permanence gain as proven in [8], so only these two have been considered. Finally, the algorithm compares the values of $\mathcal{P}_{l,ad}$ and $\mathcal{P}_{l,de}$ obtained in this way and performs an edge update that satisfies a larger permanence gain, one budget is used (lines 9-13). This process is repeated until the total budget $B$ is exhausted, and the last updated graph is output as $G^{\prime}$ .

II-D Proposed Algorithm - Reverse NEURAL (R-NEURAL)

As a simple approach to the recovery of hidden communities that applied the NEURAL algorithm, we propose a Reverse-NEURAL (R-NEURAL) in this paper. This includes the reverse application of the two cases below based on NEURAL. Before describing that, we let $G^{\prime}$ be the output graph after applying the NEURAL and let $G^{\prime\prime}$ be the graph after updating edge deleting, or adding on $G^{\prime}$ to recover the original graph $G$ . Then, we introduce permanence gain, $\mathcal{P}_{g}$ , as follows:

\displaystyle\mathcal{P}_{g}=Perm(G^{\prime\prime})-Perm(G^{\prime}).

(3)

The above permanence gain seems quite similar to the permanence loss in (2). However, the goal of maximizing the permanence loss is to find $G^{\prime}$ that minimizes the permanence of the new graph $G^{\prime}$ whereas the goal of permanence gain $\mathcal{P}_{g}$ is to maximize the permanence of the graph $G^{\prime\prime}$ recovered from $G^{\prime}$ . To do this, we reverse the following two in Algorithm 1, which is described in Algorithm 2.

$(1)$

In Step 1, since inter-community edges are newly connected in Step 1 of the NEURAL algorithm, we work to find and remove edges that exist between intercommunity. Due to the lack of knowledge of which edges have been added, R-NEURAL uses also a similar measure, permanence gain $\mathcal{P}_{g,de}$ , to select edges to be deleted.
$(2)$

In Step 2, based on the method of finding the permanence loss while removing the intracommunity edge in NEURAL, we consider the addition of an intracommunity edge to $G^{\prime}$ and raise the permanence of $G^{\prime\prime}$ to $\mathcal{P}_{g,ad}$ .

Finally, we compare $\mathcal{P}_{g,de}$ and $\mathcal{P}_{g,ad}$ and proceed with the operation for the larger one and consume one budget. We examine how well the community structure hidden by NEURAL can be recovered through this simple opposite approach.

Input: Output Graph

G^{\prime},

Target community

C,

Budget

B>0

Output: Updated graph

G^{\prime\prime}

\mathcal{P}_{l,ad}=0

and

\mathcal{P}_{l,de}=0

;

5while $B>0$ do

6 Step1: For

u\in C

and

v\in C^{\prime}

, where

C^{\prime}

is the community of maximum external pull for

u,

set

G^{\prime\prime}

to be the graph with deleted edge

(u,v)

. Compute

Perm(u,G)

and

Perm(u,G^{\prime})

and choose the best edge

(u^{*},v^{*})

with

\arg\max_{u\in C,v\in C^{\prime}}Perm(u,G^{\prime\prime})-Perm(u,G^{\prime})

;

8 Set

\mathcal{P}_{g,de}=\mathcal{P}(G^{\prime\prime})-\mathcal{P}(G^{\prime})

for

(u^{*},v^{*})

;

9 Step2: For

w,z\in C

, set

G^{\prime\prime}

to be the graph with added edge

(w,z)

. Compute

Perm(w,G)

and

Perm(w,G^{\prime})

and choose the best edge satisfying:

(w^{*},z^{*})=\arg\max_{w,z\in C}Perm(w,G^{\prime\prime})-Perm(w,G^{\prime})

;

11 Set

\mathcal{P}_{g,ad}=\mathcal{P}(G^{\prime\prime})-\mathcal{P}(G^{\prime})

for

(w^{*},z^{*})

;

12 if $\mathcal{P}_{g,de}\geq\mathcal{P}_{g,ad}$ and $\mathcal{P}_{g,de}>0$ then

G^{\prime}\leftarrow(V,E\setminus(u^{*},v^{*}))

14 else if $\mathcal{P}_{g,ad}>0$ then

G^{\prime}\leftarrow(V,E\cup(w^{*},z^{*}))

B\leftarrow B-1

;

G^{\prime\prime}\leftarrow G^{\prime}

;

18Return

G^{\prime\prime}

;

Algorithm 2 Reverse NEURAL (R-NEURAL)

III Numerical and Simulation Results

In the simulation, we apply several different CD algorithms for $(i)$ original graph $G$ , $(ii)$ updated graph $G^{\prime}$ after NEURAL, and $(iii)$ recovered graph $G^{\prime\prime}$ after R-NEURAL, respectively. As an original graph, we use three different real-world graphs: Dolphin social network (Dol, 62 nodes) [9], and word adjacencies (Adjn, 112 nodes) [9], and Polbook (105 nodes) [9], respectively. As evaluation metrics, we consider three measures: modularity (M), coverage (C), and partition quality (PQ). In the experiments, we set $B=0.3|V_{C}|$ , i.e., 30% of the size of the target community $C$ as in [8]. As result, in Table I, we obtain three evaluation results for two real-world graphs, we see that there is an increase in most evaluation metrics for each graph $G^{\prime\prime}$ compared to those of $G^{\prime}$ . Among them, both graphs show that the partition quality score improves the most, which is due to the process of community recovery based on permanence. Next, we use two CD algorithms: Louvain [10] and InfoMap [11] in Table II. We obtain the three evaluation results with respect to the applied CD algorithms for one real-world graph (Dol). In this case, we also check that the detection performance is improved in the recovered graph $G^{\prime\prime}$ in most cases. Finally, to check how similar the recovered graph was to the original graph, we obtained the similarity measure between the two graphs using the well-known eigenvector similarity. To do this, we first measured the similarity between $G$ and the recovered graph $G^{\prime\prime}$ (sim( $G$ , $G^{\prime\prime}$ )) together with the similarity between the original graph $G$ and the updated graph $G^{\prime}$ (sim( $G$ , $G^{\prime}$ )) for deception in Figure 2. We compared the similarity of Dol, Adjn, and Polbook graphs. As a result, all graph $G^{\prime\prime}$ in the three real-world graphs can recover the original graph from the updated graph $G^{\prime}$ .

TABLE I: Two real-world graphs: Dol (left) and Adjn (right).

Metrics	$G$ (Dol)	$G^{\prime}$	$G^{\prime\prime}$	$G$ (Adjn)	$G^{\prime}$	$G^{\prime\prime}$
M	0.5202	0.4822	0.4866	0.2941	0.2911	0.2920
C	0.7447	0.7119	0.7123	0.4682	0.4479	0.4448
PQ	0.8381	0.8239	0.8324	0.8394	0.8323	0.8351

TABLE II: Two CD algorithms: Louvain (left) and Infomap (right).

Metrics	$G$ (Dol)	$G^{\prime}$	$G^{\prime\prime}$	$G$ (Dol)	$G^{\prime}$	$G^{\prime\prime}$
M	0.5205	0.4796	0.4854	0.5219	0.4198	0.4570
C	0.7418	0.7068	0.7063	0.7494	0.8400	0.7198
PQ	0.8407	0.8285	0.8525	0.8358	0.7298	0.8371

IV Conclusion

In this study, we proposed a reverse NEURAL algorithm that restores the community and graph structure using permanence. From the simulation on real-world graphs, we have checked that our proposed algorithm recovered not only the target community hidden by the NEURAL algorithm but also the original graph structure modified by edges. As future work, we will consider an additional query process for the selected edges in the algorithm to check whether it was in the original graph or not to improve the recovery performance.

References

[1] S. Papadopoulos, Y. Kompatsiaris, A. Vakali and Ploutarchos Spyridonos Community detection in Social Media. In Data Mining and Knowledge Discovery, VOL. 24, pp. 515-554, June 2011.
[2] P. Bedi and C. Sharma Community detection in social networks. In WIREs Data Mining Knowl Discov, VOL. 6, pp.115-135, 2016.
[3] D. Jin, Z. Yu, P. Jiao, S. Pan, D. He, J. Wu, P. S. Yu, and W. Zhang A Survey of Community Detection Approaches: From Statistical Modeling to Deep Learning. In IEEE TKDE, VOL. 35, pp.1149-1170, Feb. 2023.
[4] J. Kim and J. Lee Community Detection in Multi-Layer Graphs: A Survey. In ACM SIGMOD, VOL. 44(3), pp.37–48, September 2015.
[5] M. E. Newman Modularity and community structure in networks. In PNAS, VOL. 103(23), pp.8577-8582, June 2006.
[6] J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. In Internet Mathematics, VOL. 6(1), 2009.
[7] T. Chakraborty, S. Srinivasan, N. Ganguly, A. Mukherjee, and S. Bhowmick On the permanence of vertices in network communities. In ACM KDD, VOL. 6(1), pp.1396–1405, August 2014.
[8] S. Mittal, D. Sengupta and T. Chakraborty Hide and Seek: Outwitting Community Detection Algorithms In IEEE Transactions on Computational Social Systems, VOL. 8(4), pp.799 - 808, March 2021.
[9] Real-world graph data. http://www-personal.umich.edu/~mejn/netdata/
[10] V. D. Blondel, J.L. Guillaume, R. Lambiotte, and E. Lefebvre, Fast unfolding of communities in large networks. In J. Stat. Mech. Theory Exp, VOL. 2008(10), October 2008.
[11] M. Rosvall and C. T. Bergstrom Maps of random walks on complex networks reveal community structure In Proc. Nat. Acad. Sci. USA, VOL 105(4), pp.1118–1123, January 2008.

Permanence based Hidden Community and Graph Recovery in Social Networks ††thanks: ∗ Corresponding author. School of Computing, Gachon University, Republic of Korea. (email: [email protected])

Abstract

I Introduction

II Model and Algorithms

II-A Preliminaries

II-B Permanence

II-C Community Deception Algorithm - NEURAL [8]

II-D Proposed Algorithm - Reverse NEURAL (R-NEURAL)

III Numerical and Simulation Results

IV Conclusion

References

Permanence based Hidden Community and Graph Recovery in Social Networks ^†^†thanks: ^∗ Corresponding author. School of Computing, Gachon University, Republic of Korea. (email: [email protected])