
Continuous Influence-based Community Partition for Social Networks

Qiufen Ni, Jianxiong Guo, Weili Wu, and Chuanhe Huang. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. Qiufen Ni (Corresponding author) is with School of Computers, Guangdong University of Technology, Guangzhou 510006, China. Jianxiong Guo and Weili Wu are with Department of Computer Science, The University of Texas at Dallas, Richardson, Texas 75080, USA. Chuanhe Huang is with School of Computer Science, Wuhan University, Wuhan 430072, China.
Abstract

Community partition is of great importance in social networks because of the rapidly increasing network scale, data, and applications. We consider the community partition problem under the LT model in social networks, a combinatorial optimization problem that divides the social network into m disjoint communities. Our goal is to maximize the sum of influence propagation by maximizing it within each community. As the influence propagation function of the community partition problem is supermodular under the LT model, we use the Lovász Extension to relax the target influence function and transform our goal into maximizing the relaxed function over a matroid polytope. Next, we propose a continuous greedy algorithm that uses the properties of the relaxed function to solve our problem, which needs to be discretized in concrete implementation. Then, a random rounding technique is used to convert the fractional solution into an integer solution. We present a theoretical analysis with a 1-1/e approximation ratio for the proposed algorithms. Extensive experiments are conducted to evaluate the performance of the proposed continuous greedy algorithms on real-world online social network datasets, and the results demonstrate that the continuous community partition method can effectively improve influence spread and the accuracy of community partition.

Index Terms:
Community Partition, Influence Maximization, Lovász Extension, Matroid Polytope, Social Networks

I Introduction

Nowadays, social networks have gained popularity in real-world applications, and their structures have become increasingly complex and enormously large. The growing number of social network users may cause network traffic congestion, storing data across different servers may increase communication cost, and the complexity of data sources makes them difficult to process and analyse. A social network can be seen as a graph of social individuals and the relationships between them, where the social individuals are the vertices and the connections between individuals constitute the edges of the graph [1]. A valuable approach to analysing large complex social networks is community partition. A community is a group of nodes with dense inner connections and relatively sparse connections to nodes outside the group. Community partition divides a complex social network into several medium-sized sub-network communities, which helps us understand the relationships and characteristics of individuals more clearly and improves the process of analysis, since each community has a node distribution similar to the original complex social network. Some applications, such as community-based rumor blocking and community-based active friending, take advantage of the community structure of social networks to help us solve problems more effectively.

Influence maximization is an important problem in social networks. Its target is to select a small set of seed users that triggers a large cascade of influence propagation, which is widely used in viral marketing [2, 3, 4]. To capture the uncertainty of human behaviour and the mutual influence of decision making, Kempe et al. [5] proposed two classic probabilistic influence propagation models: the Linear Threshold (LT) model and the Independent Cascade (IC) model. They proved that the expected number of active users, called the influence spread, is monotone and submodular with respect to the seed set, and they proposed a greedy algorithm to maximize the influence propagation in social networks. Lu et al. [6] and Wang et al. [7] show that the influence maximization problem is NP-hard under the IC model while polynomial-time solvable under the LT model. In this paper, we study the community partition problem with the goal of maximizing the influence propagation within each community under the LT model.

Community partition has attracted extensive attention. Most community partition algorithms consider the social network as a graph where each node belongs to only one community [8]. A second approach models the social network as a hypergraph in which nodes may overlap between communities; there are some related studies [9]. A third approach uses concept graphs or Galois lattices, where nodes share some common characteristics or knowledge; this is a more complex structure than the first two categories and can give more community semantics to the network structure. The output is a Galois hierarchy hypergraph marked with lattice intents [10, 11]. In this work, we investigate the influence-based disjoint community partition problem in a graph. Social networks with disjoint community structures exist in practice, such as communities composed of experts in different fields or the departmental and personnel divisions of a work unit, so this research problem has practical significance. We adopt the Lovász extension to relax the objective function and design a continuous greedy algorithm to partition a social network into m disjoint communities with the goal of maximizing the influence propagation within each community. Different from existing heuristic methods for the community partition problem, the proposed algorithm ensures accuracy in community partition and achieves a good theoretical approximation guarantee at the same time. Besides community partition, some other works study local influence propagation instead of the global one [12, 13].

The main contributions of this paper can be summarized as follows:

  • An innovative continuous influence-based community partition method is developed. First, we formulate the community partition problem as partitioning a social network into m disjoint communities with the goal of maximizing the influence propagation within each community.

  • We use the Lovász Extension to relax the objective function and introduce a partition matroid as the domain of the relaxed problem.

  • A continuous greedy algorithm is devised based on the continuous extension of the objective function, and its discrete form is proposed for concrete implementation.

  • We analyze the performance guarantee and obtain a 1-1/e approximation ratio for the proposed algorithms.

  • We conduct simulations on three real-world online social network datasets to verify the quality and accuracy of the proposed continuous algorithms.

The rest of this paper is organized as follows. Section II summarizes existing related work. Section III introduces the network model and problem formulation. Section IV gives the detailed solution for the proposed community partition problem. Section V provides the theoretical proof of the proposed algorithms, Section VI presents the simulation results, and Section VII concludes the paper.

II Related Work

Community partition is one kind of community detection method. Community detection is not only important in social networks but is also widely studied in other fields, such as biological networks [14, 15] and technological networks [16, 17]. Most previous work on community detection approaches the problem from the perspective of network structure: (1) Hierarchy-based methods. M. Girvan et al. [18] propose a method for community detection in social and biological networks: a greedy algorithm that progressively removes the edges with the highest betweenness from the network graph. (2) Modularity-based methods. In these methods, each node is initially an independent community; if merging two communities gains a larger modularity, the merging continues until the modularity is stable. Newman et al. [19] define a modularity measure Q and divide the network into communities based on the value of Q; when Q drops to zero, there are no more intra-community edges than would be expected by random chance. However, these algorithms are limited to small networks. (3) Spectrum-based methods. These methods are based on multi-way partitioning to achieve spectral clustering. In order to obtain overlapping communities, C. Gui et al. [20] propose a new algorithm that establishes a hierarchical structure while detecting the overlap of communities; the algorithm balances overlap and hierarchy through spectral analysis. (4) Dynamic-based methods. These dynamically choose nodes with higher gains of a community measure function to partition the community. Q. Liu et al. [21] design an algorithm to discover the link community structure of a dynamic weighted network based on the fitness of weighted edges and partition density. This method can show the evolution of the link community structure, and it can also detect overlapping communities. By adjusting the value of a parameter, we can obtain the link communities' hierarchical structure. (5) Label propagation methods. These are local community detection methods that depend on the label propagation of nodes. U. N. Raghavan et al. [22] assume that every node in the network is allocated a unique label. Each node iteratively adopts the label used by most of its neighbours until all node labels reach a stable state; nodes with the same labels are then partitioned into the same community.

There are some community detection works based on node attributes. A user's topic, tag, and behaviour information can be regarded as node attributes, and communities can be partitioned based on these attribute characteristics. X. Teng et al. [23] propose an overlapping community detection algorithm based on node attributes in attribute networks. They design two objectives, an extended modularity value and an attribute similarity value, and use an encode-and-decode approach to realize the overlapping community partition. X. Wang et al. [24] consider group features instead of the individual characteristics used in most previous works. They classify the "Internet water army" into six communities based on behaviour and design a community detection method based on the logistic regression model.

There are also other classical community partition methods in complex networks, such as those based on multi-objective evolutionary algorithms and those based on local expansion. In [25], Z. Feng et al. propose a discrete Gaussian process-based inverse model and a multi-objective optimization method to solve the community detection problem in complex networks. Manuel Guerrero et al. [26] study multi-objective variants of the community detection problem, including modularity, the conductance metric, and the imbalance of nodes in communities. They propose a Pareto-based multi-objective evolutionary algorithm to optimize the different objectives simultaneously. Kamal Berahmand et al. [27] propose a local approach based on the expansion and detection of core nodes; since the edge weights detected from node similarity are more accurate, the local algorithm is more precise. The proposed algorithm can detect all the communities of a graph locally by identifying the roles of the various nodes. X. Ding et al. [28] propose an overlapping community detection algorithm to identify different community structures, which performs a local expansion method and boundary re-checking sub-processes in order.

In recent years, some community detection methods combined with new technologies have emerged. In [29], W. Feifan et al. propose an extreme learning machine-based community detection algorithm for complex networks, which combines k-means and unsupervised ELM; the algorithm is verified to have low complexity. G. Sperlì [30] proposes a community detection algorithm based on deep learning and network topology, using a convolutional neural network to handle the large dimensionality of a social network. Z. Chen et al. [31] propose a novel community detection method with Graph Neural Networks (GNNs) under the supervised learning setting. The experimental results and analyses demonstrate that the loss of local minima is low in linear GNN models.

There are also community detection works based on influence propagation, where each node is partitioned to the community in which it achieves larger influence. Y. Wang et al. [32] present a community selection approach based on node influence in Mobile Social Networks (MSNs), which mines the top-K most influential nodes within each community and then combines them as the best K nodes. Z. Lu et al. [33] consider the community partition problem as a combinatorial optimization problem that divides the network into K disjoint communities with the goal of maximizing the influence within each community, and propose an MKCP algorithm to solve it. In particular, when K=2, they develop an optimal algorithm to partition the social network into two disjoint sub-networks. N. Barbieri et al. [34] define a stochastic framework to model the social influence of users within a community and present an expectation maximization (EM) learning algorithm, which allows the automatic detection of the most appropriate number of communities by enabling a community annihilation mechanism. All these existing influence-based community partition methods are discrete and heuristic, and they lack theoretical guarantees.

III Network Model and Problem Formulation

III-A The Network Model

A social network is modelled as a directed graph G=(V,E), where each vertex i in V is an individual and each edge e=(i,j) in E is the social tie between users i and j. Let N⁻(i) and N⁺(i) denote the sets of incoming neighbours and outgoing neighbours of node i, respectively. In the LT model, each edge e∈E is associated with a weight w_ij, and for each node i∈V the weights of its incoming neighbours j satisfy ∑_{j∈N⁻(i)} w_ij ≤ 1. In addition, each node i∈V has a threshold θ_i drawn uniformly at random from the interval [0,1]. The information diffusion process proceeds in discrete steps: all nodes that are active in step t-1 remain active in step t. An inactive node i becomes active if the total weight of its active incoming neighbours is at least θ_i, i.e. ∑_{j∈N⁻(i)} w_ij ≥ θ_i. The propagation process ends when no new node is activated.
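The diffusion process above can be sketched as a short simulation. The following is a minimal illustration (the function name and toy graph are our own, not from the paper); for determinism it fixes the thresholds θ_i instead of drawing them uniformly at random:

```python
def lt_diffuse(nodes, in_weights, seeds, thresholds):
    """One realization of LT diffusion with fixed thresholds.

    in_weights[i] maps each incoming neighbour j of node i to the edge
    weight w_ij; the LT model requires these weights to sum to at most 1.
    """
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for i in nodes:
            if i in active:
                continue
            # i activates once the weight of its active in-neighbours reaches theta_i
            weight = sum(w for j, w in in_weights.get(i, {}).items() if j in active)
            if weight >= thresholds[i]:
                active.add(i)
                changed = True
    return active

# toy 4-node path 0 -> 1 -> 2 -> 3 with all thresholds fixed at 0.5
nodes = [0, 1, 2, 3]
in_weights = {1: {0: 0.6}, 2: {1: 0.5}, 3: {2: 0.4}}
thresholds = {0: 0.5, 1: 0.5, 2: 0.5, 3: 0.5}
print(sorted(lt_diffuse(nodes, in_weights, {0}, thresholds)))  # [0, 1, 2]
```

Here node 3 stays inactive because its only incoming weight (0.4) never reaches its threshold (0.5); averaging such realizations over random thresholds yields the influence spread.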

III-B Problem Formulation

Assume there are m communities. We allocate a community identifier s_j ∈ M = {1,2,...,m} to each node j, so all nodes in the same community have the same community identifier, i.e. S_i = {j | s_j = i} represents the node set of community S_i, where 1 ≤ i ≤ m. For a community S_k and a node i ∈ S_k, we use σ_{S_k}(i) = ∑_{j∈(S_k∖i)} p_{S_k}(i,j) to denote the influence propagation of node i within community S_k. For a non-empty subset D ⊆ S_k, the total influence propagation of all nodes in D within community S_k is denoted by σ_{S_k}(D) = ∑_{i∈D} σ_{S_k}(i), which reflects the reciprocal influence strength among the nodes in community S_k. In the rest of the paper, we use σ(Y) to denote the influence propagation of community Y instead of σ_Y(Y) for simplicity. We denote the total influence propagation in the social network after partitioning into m communities as f(S_1, S_2, ..., S_m) = ∑_{i=1}^{m} σ(S_i). Next, we describe the community partition problem we want to solve as follows:

Influence Maximization for Community Partition Problem (IMCPP): Given a graph G=(V,E) as a social network whose information diffusion follows the LT model, we seek a partition of the social network into m disjoint sets {S_1, S_2, ..., S_m} satisfying: (1) ⋃_{k=1}^{m} S_k = V; (2) ∀i≠j, S_i ∩ S_j = ∅. Our goal is to maximize the influence propagation function f(S_1, S_2, ..., S_m) = ∑_{k=1}^{m} σ(S_k).

Z. Lu et al. [33] proved that the maximum K-community partition problem is NP-hard. Since the K-community partition problem can be reduced to our IMCPP, the IMCPP is also NP-hard.

IV Solution for IMCPP

In this section, we prove some important properties of the objective function and show, step by step, how to solve the problem efficiently.

IV-A Property of Influence Propagation Function f

First, we need to analyse the properties of the influence propagation function f. The first property of f is monotonicity.

Lemma 1.

The influence propagation function f for the community partition problem is monotone under the LT model.

Proof.

Assume that the influence propagation function within community S_k is σ(S_k). When adding a seed node i to this community, the conditional expected marginal gain produced by i for community S_k can be denoted as Δ(i|S_k) = E[σ(S_k + i) − σ(S_k)]. Obviously, Δ(i|S_k) ≥ 0, so σ(S_k) is monotone. Since the influence propagation function f is f(S_1, S_2, ..., S_m) = ∑_{k=1}^{m} σ(S_k), f is also monotone. ∎

We need the definition of a supermodular function before introducing the second property of f. Let X with |X| = n be a ground set. A set function on X is a function h: 2^X → R.

Definition 1 (Supermodular function).

A set function h: 2^X → R is supermodular if for any A ⊆ B ⊆ X and u ∈ X∖B, we have h(A∪{u}) − h(A) ≤ h(B∪{u}) − h(B). An equivalent definition of supermodularity is h(A∪B) + h(A∩B) ≥ h(A) + h(B).
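Definition 1 can be checked by brute force on a small ground set. The sketch below (our own illustration, not from the paper) verifies the marginal-gain condition for a toy supermodular function h(S) = |S|², whose marginal gain 2|S|+1 grows with the set:

```python
from itertools import combinations

def subsets(ground):
    """All subsets of a small ground set."""
    return [set(c) for r in range(len(ground) + 1) for c in combinations(ground, r)]

def is_supermodular(h, ground):
    """Brute-force check of Definition 1:
    h(A ∪ {u}) - h(A) <= h(B ∪ {u}) - h(B) for all A ⊆ B and u ∉ B."""
    for B in subsets(ground):
        for A in subsets(B):
            for u in ground - B:
                if h(A | {u}) - h(A) > h(B | {u}) - h(B):
                    return False
    return True

ground = {1, 2, 3, 4}
print(is_supermodular(lambda S: len(S) ** 2, ground))   # True: growing marginal gains
print(is_supermodular(lambda S: len(S) ** 0.5, ground)) # False: shrinking marginal gains
```

The square-root function fails the check because its marginal gains shrink as the set grows, i.e. it is submodular rather than supermodular.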

We find that the influence propagation function f satisfies supermodularity, as shown in the following lemma:

Lemma 2.

The influence propagation function f for the community partition problem is supermodular under the LT model.

Proof.

Assume there are two communities S_a and S_b with S_a ⊂ S_b. We have to prove that for any node q ∉ S_b, σ(S_a∪{q}) − σ(S_a) ≤ σ(S_b∪{q}) − σ(S_b), which is the condition for a function to be supermodular. If there is a live-edge path from seed node i to j, node j is influenced by seed i. Let p_{S_a}(i,j) be the probability that node j receives influence from node i through nodes within community S_a, and p_{S_b}(i,j) be the probability that node j receives influence from node i through nodes within community S_b. We have σ(S_a∪{q}) − σ(S_a) = ∑_{j∈S_a} p_{S_a}(q,j) + ∑_{i∈S_a} p_{S_a}(i,q) + ∑_{i,j∈S_a: i≠j} {p_{S_a∪{q}}(i,j) − p_{S_a}(i,j)}, which is the sum of the probabilities of paths that pass through q exactly once in community S_a∪{q}. Accordingly, the sum of the probabilities of paths that pass through q exactly once in community S_b∪{q} is σ(S_b∪{q}) − σ(S_b) = ∑_{j∈S_b} p_{S_b}(q,j) + ∑_{i∈S_b} p_{S_b}(i,q) + ∑_{i,j∈S_b: i≠j} {p_{S_b∪{q}}(i,j) − p_{S_b}(i,j)}. Since S_a ⊂ S_b, we get ∑_{j∈S_a} p_{S_a}(q,j) ≤ ∑_{j∈S_b} p_{S_b}(q,j) and ∑_{i∈S_a} p_{S_a}(i,q) ≤ ∑_{i∈S_b} p_{S_b}(i,q); because S_a∪{q} is also a subset of S_b∪{q}, it follows that p_{S_a∪{q}}(i,j) − p_{S_a}(i,j) ≤ p_{S_b∪{q}}(i,j) − p_{S_b}(i,j). Therefore σ(S_a∪{q}) − σ(S_a) ≤ σ(S_b∪{q}) − σ(S_b).
Hence, the influence propagation function σ within each community is supermodular under the LT model. As f(S_1, S_2, ..., S_m) = ∑_{k=1}^{m} σ(S_k), the total influence propagation function f in a social network for the community partition problem under the LT model is also supermodular. ∎

IV-B Reformulation of the IMCPP

First, we need to introduce some basic definitions about matroids and matroid polytopes, which will be used later.

Definition 2 (Matroid polytopes).

Given a matroid 𝓜 = (X, 𝓘), the matroid polytope P(𝓜) is the convex hull of the indicator vectors of the independent sets of 𝓜 and is defined as:

P(𝓜) = conv{1_I : I ∈ 𝓘}.

Here 𝓘 is a family of subsets of the ground set X (called independent sets).

The matroid polytope P(𝓜) is down-monotone because it satisfies the property that for any 0 ≤ x ≤ y, y ∈ P(𝓜) ⇒ x ∈ P(𝓜).

Then, we generalize the IMCPP problem to a matroid constraint, which is easier to solve. We define a new ground set U = M × V, where M is the community set and V is the node set of the given graph. Let A ⊆ U be a feasible solution, namely a feasible community partition combination. Here, (i,j) ∈ A means that we partition node j into community i. Since we cannot partition the same node into more than one community, a feasible solution satisfies the following constraint:

∀j ∈ V, |{i | (i,j) ∈ A}| ≤ 1

Then the influence function of a partition A can be denoted as:

f(A) = ∑_{i∈M} σ({j | (i,j) ∈ A})

Thus, the IMCPP can be written as follows:

max_{A⊆U} f(A)
s.t. ∀j ∈ V, |{i | (i,j) ∈ A}| ≤ 1    (1)

Therefore, let us define a partition matroid 𝓜 = (U, 𝓘) as follows:

𝓘 = {X ⊆ U : |X ∩ (M × {j})| ≤ 1 for all j ∈ V}

Then the IMCPP problem is equivalent to maximizing {f(A) : A ∈ 𝓘}. Any set A ∈ 𝓘 is called an independent set.
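Independence in this partition matroid is easy to test: a set A ⊆ U is independent exactly when no node appears in two (community, node) pairs. A minimal sketch (function and variable names are our own illustration):

```python
from collections import Counter

def is_independent(A):
    """A is a set of (community, node) pairs. In the partition matroid,
    A is independent iff every node is assigned to at most one community."""
    counts = Counter(j for _, j in A)
    return all(c <= 1 for c in counts.values())

print(is_independent({(1, 'a'), (1, 'b'), (2, 'c')}))  # True
print(is_independent({(1, 'a'), (2, 'a')}))            # False: node 'a' appears twice
```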

IV-C Relaxation of IMCPP

In this section, we give a continuous relaxation of our optimization problem, shown as Equation 1. First, we introduce a continuous extension for an arbitrary set function: the Lovász extension, defined by Lovász in [35].

Definition 3 (Lovász extension).

For a function h: 2^X → R and x ∈ R^X, assume that the elements of the ground set X = {v_1, v_2, ..., v_n} are sorted from maximum to minimum such that x_1 ≥ x_2 ≥ ... ≥ x_n. Let S_i = {v_1, ..., v_i} for each v_i ∈ X. The Lovász Extension ĥ(x): [0,1]^X → R of h at x is defined as:

ĥ(x) = ∑_{i=1}^{n-1} (x_i − x_{i+1}) h(S_i) + x_n h(S_n)

There are other equivalent forms of the Lovász extension, but in this paper our discussion is based on this form. Note that ĥ is well-defined and positively homogeneous. It satisfies:

ĥ(1_S) = h(S), ∀S ⊆ X

where 1_S ∈ {0,1}^X is the characteristic vector of S.

There is an equivalent definition of the Lovász extension: ĥ(x) = E[h({v_i ∈ X : x_i > λ})], where λ is uniformly random in [0,1]. That is,

ĥ(x) = ∫_{λ=0}^{1} h({v_i ∈ X : x_i > λ}) dλ

We can interpret the Lovász extension ĥ(x) as the expected value of h over a distribution Ô(x). The distribution Ô(x) returns the largest subset S_n with probability x(v_n), the next largest subset S_{n-1} with probability x(v_{n-1}) − x(v_n), and so on. Note that the definition of ĥ(x) is oblivious: it does not depend on a particular function h.
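Definition 3 can be evaluated directly with the sorted-prefix formula. The sketch below (our own illustration, with a toy supermodular h) also demonstrates the property ĥ(1_S) = h(S) on an indicator vector:

```python
def lovasz_extension(h, x):
    """Evaluate the Lovász extension of h at x (a dict: element -> value)
    via the sorted-prefix formula of Definition 3."""
    items = sorted(x.items(), key=lambda kv: kv[1], reverse=True)
    total, prefix = 0.0, set()
    for idx, (elem, val) in enumerate(items):
        prefix.add(elem)
        # x_{i+1} for the next element, or 0 past the end (the x_n h(S_n) term)
        next_val = items[idx + 1][1] if idx + 1 < len(items) else 0.0
        total += (val - next_val) * h(prefix)
    return total

h = lambda S: len(S) ** 2  # toy supermodular set function
# on an indicator vector the extension agrees with h itself
print(lovasz_extension(h, {'a': 1.0, 'b': 1.0, 'c': 0.0}))  # 4.0 == h({'a','b'})
```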

Then we describe the process of relaxing the IMCPP. We introduce a decision variable x_ij ∈ [0,1] for each (i,j) ∈ M × V, where x_ij is the probability that node j is allocated to community i. Thus,

∑_{i∈M} x_ij ≤ 1, ∀j ∈ V

The domain of the relaxed problem can be denoted as:

P(𝓜) = {x ∈ [0,1]^{m×n} : ∀j ∈ V, ∑_{i∈M} x_ij ≤ 1}

We use the Lovász extension f̂(x) to relax the influence function f as follows:

f̂(x) = E_{λ∼[0,1]}[f({(i,j) ∈ U : x_ij > λ})]    (2)

where λ is uniformly random in [0, 1].

The relaxation of our problem, Equation 1, can be expressed as follows:

max_x f̂(x)
s.t. x ∈ P(𝓜)    (3)

Thus, we transform our goal into maximizing the Lovász extension f̂(x) of the influence function f over the matroid polytope P(𝓜).

Lemma 3.

A set function h: 2^X → R is submodular (supermodular) if and only if its Lovász extension ĥ is convex (concave).

Proof.

This conclusion was shown by Jan Vondrák in [36]. ∎

Theorem 1.

The relaxation f̂(x) of the objective function of IMCPP, shown as Equation 2, is monotone and concave.

Proof.

From Lemma 1 and Lemma 2, we know that the influence propagation function f is monotone and supermodular. Based on Lemma 3, its Lovász extension f̂(x) is monotone and concave. ∎

IV-D The Continuous Greedy Process

Based on the monotonicity and concavity of f̂(x), we design a continuous process that produces a point x ∈ P(𝓜) approximating the optimum OPT = max{f̂(x) : x ∈ P(𝓜)}. The vector moves in a direction constrained by P(𝓜) until it achieves a local maximum gain.

In order to observe how f̂(x) behaves along the coordinate axes, we need the following property of the derivative of f̂(x):

Lemma 4.

The partial derivative of the Lovász extension ĥ(x) of a set function h with respect to x_i is

∂ĥ(x)/∂x_i = h(S_i) − h(S_{i-1})

where x_1 ≥ x_2 ≥ ... ≥ x_n and S_i = {1, 2, ..., i}.

Proof.

As h(S_0) = h(∅) = 0, based on Definition 3, we have

ĥ(x) = ∑_{i=1}^{n-1} (x_i − x_{i+1}) h(S_i) + x_n h(S_n)
     = (x_1 − x_2) h(S_1) + (x_2 − x_3) h(S_2) + ... + (x_{n-1} − x_n) h(S_{n-1}) + x_n h(S_n)
     = h(S_1) x_1 + (h(S_2) − h(S_1)) x_2 + ... + (h(S_n) − h(S_{n-1})) x_n

So we can get that

∂ĥ(x)/∂x_i = h(S_i) − h(S_{i-1}) ∎

In our IMCPP problem, we have x ∈ [0,1]^{M×V}. Let |V| = n; we can consider x as a vector such that

x = (x_11, ..., x_1n, ..., x_i1, ..., x_in, ..., x_m1, ..., x_mn)

Then, we sort this vector x to get the sorted vector

x′ = (x′_11, ..., x′_1n, ..., x′_i1, ..., x′_in, ..., x′_m1, ..., x′_mn)    (4)

satisfying x′_ij ≥ x′_lk if i < l, or i = l and j < k. We denote Ω(x_ij) = x′_lk and Ω⁻¹(x′_lk) = x_ij, which means that the element x_ij in vector x corresponds to the element x′_lk in the sorted vector x′. Let Γ(x_ij) = (i,j); then we have

S′_lk = {Γ(Ω⁻¹(x′_11)), ..., Γ(Ω⁻¹(x′_1n)), ..., Γ(Ω⁻¹(x′_l1)), ..., Γ(Ω⁻¹(x′_lk))}

To compute the derivative of f̂(x), we have

∂f̂(x)/∂x_ij = f(S′_lk) − f(S′_l(k-1))    (5)

where Ω(x_ij) = x′_lk.

From Equation 5, we can see that the derivative of f̂(x) with respect to x_ij, where Ω(x_ij) = x′_lk, equals the marginal gain in influence propagation from partitioning node j into community i. We can take advantage of this property to find a solution.
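Lemma 4 and Equation 5 can be checked numerically: perturbing one coordinate slightly (without changing the sorted order) changes the extension by exactly the corresponding marginal gain h(S_i) − h(S_{i-1}). A self-contained sketch using our own toy set function:

```python
def lovasz(h, x):
    """Sorted-prefix Lovász extension (Definition 3); x maps elements to values."""
    items = sorted(x.items(), key=lambda kv: kv[1], reverse=True)
    total, prefix = 0.0, set()
    for idx, (elem, val) in enumerate(items):
        prefix.add(elem)
        next_val = items[idx + 1][1] if idx + 1 < len(items) else 0.0
        total += (val - next_val) * h(prefix)
    return total

h = lambda S: len(S) ** 2            # toy supermodular set function
x = {'a': 0.9, 'b': 0.6, 'c': 0.3}   # sorted order gives S_1={a}, S_2={a,b}
eps = 1e-6
# perturb the 2nd-largest coordinate without changing the sorted order
grad_b = (lovasz(h, {**x, 'b': x['b'] + eps}) - lovasz(h, x)) / eps
print(round(grad_b, 3))  # 3.0 == h(S_2) - h(S_1) = 4 - 1
```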

As f̂(x) is non-decreasing and monotone, for any (i,j) ∈ M × V, ∂f̂(x)/∂x_ij ≥ 0. Thus the gradient of f̂(x) is a nonnegative vector, i.e.

∇f̂(x) = [ ∂f̂(x)/∂x_11  ...  ∂f̂(x)/∂x_1n
          ...                ...
          ∂f̂(x)/∂x_m1  ...  ∂f̂(x)/∂x_mn ] ≥ 0

Next, we design the continuous process. Let x start from x(0) = 0 and follow a certain flow over a unit time interval:

dx(t)/dt = v_max(x(t)),

where v_max(x) is defined as

v_max(x(t)) = argmax_{v∈P(𝓜)} (v · ∇f̂(x(t)))

v_max(x) is the direction in which the rate of change of f̂(x) is greatest when an element j is added to community i at time t. Based on Equation 5, this brings the greatest gain to the influence propagation function f. As t ∈ [0,1], we have

x(t) = ∫_0^t (dx(τ)/dτ) dτ = ∫_0^t v_max(x(τ)) dτ    (6)

Next, we propose the continuous greedy algorithm for the problem, which is shown in Algorithm 1.

Algorithm 1 Continuous Greedy Algorithm
Input: Graph $G$, matroid $\mathcal{M}=(U,\mathcal{I})$, influence function $f$
Output: $\vec{x}(1)$
1: Initialize $\vec{x}(0)=\vec{0}$
2: for each $t\in[0,1]$ do
3:     For each $(i,j)\in M\times V$, let $w_{ij}(t)=f(S^{\prime}_{lk})-f(S^{\prime}_{l(k-1)})$, where $\Omega(x_{ij})=x^{\prime}_{lk}$
4:     $\vec{v}_{max}(\vec{x}(t))=\arg\max_{\vec{v}\in P}(\vec{v}\cdot\vec{w}(t))$
5:     Increase $\vec{x}(t)$ at a rate of $\vec{v}_{max}(\vec{x}(t))$
6: end for
7: return $\vec{x}(1)$

In this algorithm, $t$ ranges from 0 to 1. At each time step, we calculate the value of $w_{ij}(t)$, whose meaning was illustrated in Equation 5. Step 4 shows that $\vec{v}_{max}(\vec{x}(t))$ is the vector that maximizes $\vec{v}\cdot\vec{w}(t)$ in every iteration; that is, we obtain the maximum marginal gain of $\hat{f}(\vec{x})$ when setting $\vec{v}(\vec{x}(t))\leftarrow\vec{v}_{max}(\vec{x}(t))$. Then $\vec{x}(t)$ increases at the rate $\vec{v}_{max}(\vec{x}(t))$ obtained in step 4. After the for loop, we get $\vec{x}(1)$, which is a convex combination of indicator vectors of independent sets.
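Because $P$ here is a partition-matroid polytope (each node assigned to at most one community) and $\vec{w}(t)$ is non-negative, the linear maximization in step 4 decomposes per node: pick, for each node, the community with the largest weight. A small Python sketch under that assumption (names are illustrative):

```python
def max_weight_direction(w, communities, nodes):
    """Maximize v . w over the partition-matroid polytope where each
    node j may be assigned to at most one community i.

    w: dict (i, j) -> weight w_ij(t), assumed non-negative here.
    Returns the 0/1 indicator vector of the best base as a dict.
    """
    v = {(i, j): 0 for i in communities for j in nodes}
    for j in nodes:
        # The linear objective separates over nodes: take the best
        # community for each node independently.
        best = max(communities, key=lambda i: w[(i, j)])
        if w[(best, j)] >= 0:  # always true for non-negative w
            v[(best, j)] = 1
    return v

# Node 1 goes to community "A" (weight 5 > 2), node 2 to "B" (4 > 1).
w = {("A", 1): 5.0, ("B", 1): 2.0, ("A", 2): 1.0, ("B", 2): 4.0}
v = max_weight_direction(w, ["A", "B"], [1, 2])
```

This per-node decomposition is what makes the maximization step cheap relative to evaluating $f$.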

IV-E Discrete Implementation

Actually, the continuous greedy algorithm solves our objective by computing the integral in Equation 6, which is usually hard to implement directly. So in this section we discretize the continuous greedy algorithm. Given the time step $\Delta t$, the discrete version is as follows:

  1. Start with $t=0$ and $\vec{x}(0)=\vec{0}$.

  2. Obtain $\vec{w}(t)$: Sort vector $\vec{x}(t)$ from maximum to minimum to get vector $\vec{x}^{\prime}(t)$, as in Equation 4. For each element $x_{ij}(t)$, define $w_{ij}(t)=f(S^{\prime}_{lk}(t))-f(S^{\prime}_{l(k-1)}(t))$, where $\Omega(x_{ij}(t))=x^{\prime}_{lk}(t)$.

  3. Let $I^{*}(t)$ be the maximum-weight independent set in $\mathcal{I}$ according to $\vec{w}(t)$.

  4. $\vec{x}(t+\Delta t)\leftarrow\vec{x}(t)+\vec{1}_{I^{*}(t)}\cdot\Delta t$.

  5. Increment $t=t+\Delta t$; if $t<1$, go back to step 2; otherwise, return $\vec{x}(1)$.

where we denote $\nabla\hat{f}(\vec{x}(t))$ by $\vec{w}(t)$. Because $\vec{v}_{max}(\vec{x}(t))\in P$ and $\vec{w}(t)$ is non-negative, $\vec{v}_{max}(\vec{x}(t))$ corresponds to a base of matroid $\mathcal{M}$. In other words, we find an $I^{*}(t)\in\mathcal{I}$ such that

I^{*}(t)\in\arg\max_{I(t)\in\mathcal{I}}(\vec{w}(t)\cdot\vec{1}_{I(t)})

where $I^{*}(t)$ is the maximum-weight independent set at time step $t$, which can be obtained by a hill-climbing (greedy) strategy. Then $t$ increases discretely by $\Delta t$ in each step, and the algorithm terminates once the vector $\vec{x}(1)$ is obtained.
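Steps 1–5 above can be sketched end-to-end in Python. This is a minimal illustration, not the paper's implementation: $f$ is assumed to be any monotone set function on (community, node) pairs (in practice, a Monte Carlo estimate of influence propagation), and the max-weight base of the partition matroid is found by picking the best community per node:

```python
def discretized_continuous_greedy(f, communities, nodes, dt=0.1):
    """Discretized continuous greedy, returning the fractional x(1).

    f: set function on frozensets of (community, node) pairs.
    Returns dict (i, j) -> value in [0, 1].
    """
    x = {(i, j): 0.0 for i in communities for j in nodes}
    t = 0.0
    while t < 1.0 - 1e-9:
        # Step 2: w(t) from the sorted order of x(t) (Lovasz gradient).
        order = sorted(x, key=x.get, reverse=True)
        w, prefix, prev = {}, set(), f(frozenset())
        for e in order:
            prefix.add(e)
            cur = f(frozenset(prefix))
            w[e] = cur - prev
            prev = cur
        # Step 3: max-weight base of the partition matroid: for each
        # node, keep the community with the largest weight.
        best = {}
        for (i, j), wij in w.items():
            if j not in best or wij > w[(best[j], j)]:
                best[j] = i
        # Step 4: move x by dt in the direction of the chosen base.
        for j, i in best.items():
            x[(i, j)] = min(1.0, x[(i, j)] + dt)
        t += dt  # Step 5
    return x
```

With $\Delta t=0.1$ the loop runs ten times, and each coordinate of $\vec{x}(1)$ is a multiple of $\Delta t$, matching the description above.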

After that, we have obtained the fractional vector returned by the discretized continuous greedy algorithm. We then take the fractional solution $\vec{x}(1)$ and apply the randomized rounding technique: partition node $j$ into community $i$ with probability $x_{ij}(1)$ independently, guaranteeing that each node belongs to at most one community; i.e., $x_{ij}=1$ with probability $x_{ij}(1)$ and $x_{ij}=0$ with probability $1-x_{ij}(1)$, and for any $j\in V$, $\sum_{i\in M}x_{ij}\leq 1$.
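The rounding step described above can be sketched as follows (a minimal illustration; names are ours). Since $\sum_{i\in M}x_{ij}(1)\leq 1$ for each node, one uniform draw per node suffices, and a node may be left unassigned with the remaining probability:

```python
import random

def randomized_rounding(x, communities, nodes, rng=random):
    """Round the fractional solution x(1): assign node j to community i
    with probability x_ij(1), so each node lands in at most one
    community. Nodes left unassigned are omitted from the result.
    """
    assignment = {}
    for j in nodes:
        r, acc = rng.random(), 0.0
        for i in communities:
            acc += x[(i, j)]  # requires sum_i x_ij <= 1
            if r < acc:       # node j falls into community i's interval
                assignment[j] = i
                break
    return assignment

# A node with x_Aj = 1.0 is always assigned to community "A".
x = {("A", 1): 1.0, ("B", 1): 0.0}
picked = randomized_rounding(x, ["A", "B"], [1])
```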

V Performance Analysis

In this section, we prove that the vector returned by Algorithm 1 is an approximate solution of our problem, Equation 1. Before obtaining the final approximation ratio, we need the following lemma.

Lemma 5.

Suppose there exists $\vec{x}^{*}\in P$ such that $OPT=\hat{f}(\vec{x}^{*})$, where $OPT$ is the optimal value of the problem, Equation 3. For any $\vec{x}\in\mathbb{R}^{n}$, we have $\vec{x}^{*}\cdot\nabla\hat{f}(\vec{x})\geq OPT-\hat{f}(\vec{x})$.

Proof.

We know that $\hat{f}(\vec{x})$ is concave along non-negative directions, and since $\vec{x}^{*}\in P$ we only need to consider such directions. Consider the direction $\vec{d}=(\vec{x}^{*}-\vec{x})\vee\vec{0}$, where $x\vee y=\max(x,y)$ is taken coordinate-wise, so $\vec{d}\geq\vec{0}$ is a feasible moving direction. By concavity along $\vec{d}$, we have $\hat{f}(\vec{x}+\vec{d})-\hat{f}(\vec{x})\leq\vec{d}\cdot\nabla\hat{f}(\vec{x})$.

We discuss the problem in two cases:

1. $\vec{x}+\vec{d}=\vec{x}^{*}\vee\vec{x}\geq\vec{x}^{*}$. As $\hat{f}(\vec{x})$ is non-decreasing, we get $\hat{f}(\vec{x}+\vec{d})\geq\hat{f}(\vec{x}^{*})$, so $\hat{f}(\vec{x}+\vec{d})-\hat{f}(\vec{x})\geq\hat{f}(\vec{x}^{*})-\hat{f}(\vec{x})$.

2. Since $\vec{d}=(\vec{x}^{*}-\vec{x})\vee\vec{0}\leq\vec{x}^{*}$ coordinate-wise (as $\vec{x}\geq\vec{0}$), and $\nabla\hat{f}(\vec{x})\geq\vec{0}$, we have $\vec{d}\cdot\nabla\hat{f}(\vec{x})\leq\vec{x}^{*}\cdot\nabla\hat{f}(\vec{x})$.

Combining the above two facts with the concavity bound, we obtain:

\vec{x}^{*}\cdot\nabla\hat{f}(\vec{x})\geq\vec{d}\cdot\nabla\hat{f}(\vec{x})\geq\hat{f}(\vec{x}+\vec{d})-\hat{f}(\vec{x})\geq\hat{f}(\vec{x}^{*})-\hat{f}(\vec{x})=OPT-\hat{f}(\vec{x})

Theorem 2.

When $\hat{f}(\vec{x})$ is the Lovász extension of the influence propagation function $f$ for IMCPP, the vector $\vec{x}(1)$ returned by Algorithm 1 satisfies $\vec{x}(1)\in P$ and $\hat{f}(\vec{x}(1))\geq(1-\frac{1}{e})OPT$.

Proof.

Based on Equation 6, we can calculate $\vec{x}(1)$ as follows:

\vec{x}(1)=\int_{0}^{1}\frac{d\vec{x}(t)}{dt}dt=\int_{0}^{1}\vec{v}_{max}(\vec{x}(t))dt

Since $t$ ranges from 0 to 1, we can express this integral as the limit of a Riemann sum:

\vec{x}(1)=\lim\limits_{n\rightarrow\infty}\frac{1}{n}\sum_{i=1}^{n}\vec{v}_{max}\left(\vec{x}\left(\frac{i}{n}\right)\right)

By the definition of $\vec{v}_{max}(\vec{x})$, we know that $\vec{v}_{max}(\vec{x})\in P$ for any $\vec{x}$. The term inside the limit is a convex combination of vectors belonging to $P$. Since $P$ is a closed convex set, the limit lies in $P$, which proves $\vec{x}(1)\in P$.

From Lemma 5, there exists $\vec{v}\in P$ such that $\vec{v}\cdot\nabla\hat{f}(\vec{x})\geq OPT-\hat{f}(\vec{x})$. Since $\vec{v}_{max}(\vec{x})$ maximizes $\vec{v}\cdot\nabla\hat{f}(\vec{x})$ over $P$, we also get $\vec{v}_{max}(\vec{x})\cdot\nabla\hat{f}(\vec{x})\geq OPT-\hat{f}(\vec{x})$.

Since $\frac{d\hat{f}(\vec{x}(t))}{dt}=\vec{v}_{max}(\vec{x}(t))\cdot\nabla\hat{f}(\vec{x}(t))$ along the trajectory, we further get $\frac{d\hat{f}(\vec{x}(t))}{dt}\geq OPT-\hat{f}(\vec{x}(t))$. Define $g(t)=\frac{d\hat{f}(\vec{x}(t))}{dt}+\hat{f}(\vec{x}(t))\geq OPT$. Solving this linear differential equation with $\hat{f}(\vec{x}(0))=0$ gives $\hat{f}(\vec{x}(t))=\int_{0}^{t}e^{(x-t)}g(x)dx$. So $\hat{f}(\vec{x}(1))=\int_{0}^{1}e^{(x-1)}g(x)dx\geq\int_{0}^{1}e^{(x-1)}OPT\,dx=OPT[e^{x-1}]^{1}_{0}=OPT(1-1/e)$. This proves $\hat{f}(\vec{x}(1))\geq(1-\frac{1}{e})OPT$. ∎
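The differential inequality $\frac{d\hat{f}}{dt}\geq OPT-\hat{f}$ is what drives the $1-1/e$ bound. As an illustrative numeric check (not part of the proof), Euler-integrating the tight case $f'(t)=OPT-f(t)$, $f(0)=0$ over $[0,1]$ recovers the bound:

```python
import math

# Euler-integrate f'(t) = OPT - f(t), f(0) = 0, over t in [0, 1].
# The exact solution is f(1) = OPT * (1 - 1/e), the bound of Theorem 2.
OPT, dt, f = 1.0, 1e-4, 0.0
steps = int(round(1.0 / dt))
for _ in range(steps):
    f += dt * (OPT - f)

target = OPT * (1 - math.exp(-1))  # about 0.63212
```

Any trajectory whose derivative dominates $OPT-\hat{f}$ can only end higher, which is exactly the argument above.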

In the second stage, we use randomized rounding to convert the fractional solution into an integer solution. Since the relationship between the rounded result $\hat{f}_{R}(\vec{x})$ and the continuous solution $\hat{f}(\vec{x})$ is $\hat{f}_{R}(\vec{x})\geq\hat{f}(\vec{x})$, the final result compared with the optimal solution is $\hat{f}_{R}(\vec{x})\geq(1-\frac{1}{e})OPT$.

Theorem 3.

Algorithm 1 returns a (11e)(1-\frac{1}{e})-approximation (in expectation) for the problem, Equation 3.

Theorem 2 and the proof above imply the result of Theorem 3.

We now discuss the complexity of the proposed algorithm, which is relatively high for large-scale social networks.

Theorem 4.

The complexity of the discretized continuous greedy algorithm is upper bounded by $O((\log(mn)+mn|E|r)/\Delta t)$.

Proof.

First, in step (2), we sort the elements of $\vec{x}(t)$ from maximum to minimum; there are $m\times n$ elements in total, and the stated cost of this step is $O(\log(mn))$. Then we estimate the objective function $f(\cdot)$ by Monte Carlo simulations; the running time of evaluating $f(\{v\})$ for a given node $v$ is $O(|E|r)$, where $r$ is the number of Monte Carlo simulations and $|E|$ is the cardinality of the edge set $E$ of the social network. The average number of nodes in $S^{\prime}_{lk}(t)$ is $mn/2$; thus, the total running time of step (2) is $O(\log(mn)+mn|E|r)$.

The running time of the discretized continuous greedy algorithm is dominated by step (2), which is executed $1/\Delta t$ times, so its time complexity is $O((\log(mn)+mn|E|r)/\Delta t)$. ∎

Theorem 4 shows that the complexity is high, which results in poor scalability on large networks.

VI Experiments

We show the experimental results of the proposed algorithms in this section. First, we describe the datasets used and the parameter settings.

VI-A Experimental Setup

Datasets: Three datasets of different magnitudes are used in our experiments. The first two are from networkrepository.com, an online network repository containing diverse kinds of networks. The first dataset is a co-authorship network of scientists in the field of network theory and experiment; it depicts relationships among 379 users with 914 edges between them. The second dataset is the Wiki-Vote network, namely the Wikipedia who-votes-on-whom network; it records the voting relationships among 914 users with 2914 edges. The third dataset, from arXiv, is NetHEPT, a co-authorship network from the “High Energy Physics” section; it consists of 15299 users and 31376 edges.

Influence Model: The LT model is the influence spread model in our problem. We set the spread probability of each directed edge $e$ entering node $i$ as $p(e)=1/d(i)$, where $d(i)$ is the in-degree of node $i$. This way of setting $p(e)$ has been widely adopted in previous research [37, 38, 39]. The threshold at which a node becomes active is generated uniformly at random between 0 and 1.
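A minimal Monte Carlo sketch of this LT setup (the function name and graph representation are ours, not the paper's code): each incoming edge of a node with in-degree $d$ carries weight $1/d$, thresholds are redrawn uniformly per simulation, and a node activates once the total weight of its active in-neighbors reaches its threshold:

```python
import random

def simulate_lt(nodes, in_edges, seeds, rounds=500, rng=random):
    """Monte Carlo estimate of LT-model influence spread of a seed set.

    in_edges: dict v -> list of in-neighbors of v; every incoming edge
    of v carries weight 1/in-degree(v), as in the experimental setup.
    """
    total = 0
    for _ in range(rounds):
        theta = {v: rng.random() for v in nodes}  # fresh random thresholds
        active = set(seeds)
        changed = True
        while changed:
            changed = False
            for v in nodes:
                if v in active or not in_edges.get(v):
                    continue
                w = 1.0 / len(in_edges[v])         # p(e) = 1/d(v)
                if w * sum(u in active for u in in_edges[v]) >= theta[v]:
                    active.add(v)
                    changed = True
        total += len(active)
    return total / rounds
```

In the experiments, 500 such simulations are averaged to estimate $f(\cdot)$.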

Comparison Methods: To evaluate the effectiveness of the proposed algorithm, we compare the discrete continuous greedy algorithm with two baseline algorithms: the random method and the label propagation algorithm. Besides, the Split Algorithm for Maximum K-Community Partition (SAMKCP) and the Merge Algorithm for Maximum K-Community Partition (MAMKCP) described in [33] are also used as comparison algorithms.

Random: It randomly partitions nodes into different communities; a classical baseline algorithm.

Label Propagation [40]: A classical disjoint community detection algorithm. First, each node is given a unique label; these labels then spread through the network, with each node iteratively adopting the label held by most of its neighbors. When the algorithm ends, connected nodes with the same label make up one community.

SAMKCP: A heuristic algorithm in which all nodes belong to one community at first, and one of the communities is then split recursively.

MAMKCP: A heuristic algorithm in which each node starts as its own community, and pairs of communities are then merged recursively into new communities.
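The label propagation baseline described above can be sketched as follows (an illustrative implementation, not the one used in [40]'s survey or our experiments):

```python
import random
from collections import Counter

def label_propagation(adj, max_iters=100, rng=random):
    """Label propagation baseline: every node starts with a unique
    label and repeatedly adopts the most common label among its
    neighbors; connected nodes sharing a final label form a community.

    adj: dict node -> list of neighbors (undirected).
    """
    labels = {v: v for v in adj}            # unique initial labels
    order = list(adj)
    for _ in range(max_iters):
        rng.shuffle(order)                  # asynchronous update order
        changed = False
        for v in order:
            if not adj[v]:
                continue
            counts = Counter(labels[u] for u in adj[v])
            best = counts.most_common(1)[0][0]  # majority neighbor label
            if labels[v] != best:
                labels[v] = best
                changed = True
        if not changed:                     # labels are stable
            break
    return labels

# On a star graph, every node ends up with the same label (one community).
labels = label_propagation({0: [1, 2, 3], 1: [0], 2: [0], 3: [0]})
```

Its near-linear per-iteration cost is why it scales better than our algorithm, at some loss of partition quality.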

VI-B Result Analysis

To estimate the influence propagation of a community, we first have to extract its sub-graph at each step of the experiment. The extraction process is as follows: given the node sets of the communities, we go through all the edges and add an edge to the edge set of a community's sub-graph when both of its endpoints are in that community's node set. Simulations are then run on these sub-graphs. To estimate $f(\cdot)$, the number of Monte Carlo simulations is set to 500.
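The extraction step can be sketched as (names are illustrative):

```python
def extract_subgraphs(edges, communities):
    """Extract the induced sub-graph of each community: an edge is kept
    for a community only when both endpoints are in that community's
    node set, as described above.

    edges: iterable of (u, v) pairs.
    communities: dict community name -> set of member nodes.
    Returns dict community name -> list of edges inside that community.
    """
    sub = {c: [] for c in communities}
    for (u, v) in edges:
        for c, members in communities.items():
            if u in members and v in members:  # both endpoints inside
                sub[c].append((u, v))
    return sub

# Edge (1, 2) stays inside community "A"; the cross edge (2, 3), whose
# endpoints lie in different communities, is dropped from both.
sub = extract_subgraphs([(1, 2), (2, 3), (3, 4)],
                        {"A": {1, 2}, "B": {3, 4}})
```

The dropped cross edges are exactly the influence "leakage" between communities discussed in the results below.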


Figure 1: Continuous results on dataset 1


Figure 2: Continuous results on dataset 2


Figure 3: Continuous results on dataset 3

Varying the value of $m$ and the datasets. The experiment in Figure 1 is conducted on dataset 1. We show how the total influence propagation after community partitioning varies with the time step $\Delta t$ when the discrete continuous greedy method partitions the network into different numbers of communities. The bars show that when $m=1$ the influence propagation does not change regardless of $\Delta t$, since no partitioning is actually needed, and the influence propagation attains its maximum value. When $m=2$ the influence propagation is always larger than when $m=3$, whatever the time step $\Delta t$ is. So the larger the number of communities, the smaller the total influence propagation in the social network. If we instead fix the number of communities $m$, we see that as the time step $\Delta t$ decreases from 0.2 to 0.05, the influence propagation increases from 385.3 to 400.1 when $m=2$ and from 363.2 to 380.5 when $m=3$. Thus, the smaller the time step $\Delta t$, the larger the total influence propagation after partitioning; a smaller $\Delta t$ makes the community partition more accurate. Figure 1 also shows that when $\Delta t$ decreases from 0.2 to 0.1, the increment of influence propagation is 12.5 for $m=2$ and 14.9 for $m=3$; but when $\Delta t$ decreases from 0.1 to 0.05, the increment is only 2.3 for $m=2$ and 2.4 for $m=3$, respectively. The increment shrinks greatly as $\Delta t$ decreases, and at $\Delta t=0.1$ the influence propagation is already relatively stable. This indicates that the community partition is already fairly accurate at $\Delta t=0.1$, although a smaller $\Delta t$ gives a more accurate result. The total influence propagation becomes smaller as the network is partitioned into more communities, because a finer partition cuts more edges, so more influence propagation leaks out between communities and is lost.

The experiments in Figures 2 and 3 are conducted on the two larger datasets, dataset 2 and dataset 3. The trends of the bar charts in Figures 2 and 3 are similar to the results in Figure 1, which further confirms the correctness and validity of our results.



Figure 4: Comparative results on dataset 1


Figure 5: Comparative results on dataset 2


Figure 6: Comparative results on dataset 3

Comparison with other methods. To show the effectiveness of our approach, we compare the discrete continuous greedy algorithm with the random community partition method, the label propagation method, and the two methods SAMKCP and MAMKCP proposed in [33]. The comparison results are shown in Figures 4, 5 and 6. The random community partition method is intuitive: it simply assigns nodes to communities at random. The y-axis shows the influence propagation when we partition datasets 1, 2 and 3 into different numbers of communities using our algorithm, the random method, label propagation, SAMKCP and MAMKCP. The time step of the proposed algorithm is set to 0.05. The influence propagation achieved by our algorithm is clearly much larger than that of the random method, whether $m=2$ or $m=3$, on both dataset 1 and dataset 2. On dataset 3, the result of the random community partition method is much worse than the other four methods, so we omit it. Our method is also superior to SAMKCP, MAMKCP and the label propagation method, which shows that our algorithm trades time complexity for more accurate performance: SAMKCP, MAMKCP and label propagation have low computational complexity but suffer some loss in performance. These results illustrate that the proposed continuous greedy method works well for maximizing the influence propagation within each community.

VII Conclusion

We use the Lovász extension theory to relax the target function of the Influence Maximization for Community Partition Problem (IMCPP) and introduce a partition matroid as the domain of the relaxed problem. We then propose a continuous greedy algorithm and its discrete form to solve the problem. To convert the fractional solutions obtained by the two algorithms into integer solutions, we apply the randomized rounding method to their results in a second stage. We analyze the performance of the proposed algorithm and obtain a $1-1/e$ approximation ratio. Finally, we run simulations on three datasets from real-world social networks to show the advantages of the proposed methods.

In the future, we would like to design an approximation algorithm for the influence-based community partition problem under the IC model with the sandwich method or the DS decomposition method. Based on this work, we also want to study a two-stage algorithm for rumor blocking: performing influence-based community partition in the first stage and blocking the rumor at the bridge ends of the community containing the rumor source in the second stage.

References

  • [1] Q. Ni, J. Guo, C. Huang, and W. Wu, “Information coverage maximization for multiple products in social networks,” Theoretical Computer Science, pp. 32–41, 2020.
  • [2] P. Banerjee, W. Chen, and L. V. Lakshmanan, “Maximizing welfare in social networks under a utility driven influence diffusion model,” in Proceedings of the 2019 International Conference on Management of Data.   ACM, 2019, pp. 1078–1095.
  • [3] X. Liu, X. Kong, and P. S. Yu, “Active opinion maximization in social networks,” in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.   ACM, 2018, pp. 1840–1849.
  • [4] A. Talukder, M. G. R. Alam, N. H. Tran, and C. S. Hong, “A cost optimized reverse influence maximization in social networks,” in NOMS 2018-2018 IEEE/IFIP Network Operations and Management Symposium.   IEEE, 2018, pp. 1–9.
  • [5] D. Kempe, J. Kleinberg, and É. Tardos, “Maximizing the spread of influence through a social network,” in Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining.   ACM, 2003, pp. 137–146.
  • [6] Z. Lu, Z. Zhang, and W. Wu, “Solution of bharathi–kempe–salek conjecture for influence maximization on arborescence,” Journal of Combinatorial Optimization, vol. 33, no. 2, pp. 803–808, 2017.
  • [7] H. Wang, B. Liu, X. Zhang, L. Wu, W. Wu, and H. Gao, “List edge and list total coloring of planar graphs with maximum degree 8,” Journal of Combinatorial Optimization, vol. 32, no. 1, pp. 188–197, 2016.
  • [8] H. Huang, H. Shen, Z. Meng, H. Chang, and H. He, “Community-based influence maximization for viral marketing,” Applied Intelligence, vol. 49, no. 6, pp. 2137–2150, 2019.
  • [9] W. Yang, G. Wang, K.-K. R. Choo, and S. Chen, “Hepart: A balanced hypergraph partitioning algorithm for big data applications,” Future Generation Computer Systems, vol. 83, pp. 250–268, 2018.
  • [10] M. Plantié and M. Crampes, “From photo networks to social networks, creation and use of a social network derived with photos,” in Proceedings of the 18th ACM international conference on Multimedia.   ACM, 2010, pp. 1047–1050.
  • [11] M. Fazlali, E. Moradi, and H. T. Malazi, “Adaptive parallel louvain community detection on a multicore platform,” Microprocessors and Microsystems, vol. 54, pp. 26–34, 2017.
  • [12] A. Bartal, N. Pliskin, and G. Ravid, “Modeling influence on posting engagement in online social networks: Beyond neighborhood effects,” Social Networks, vol. 59, pp. 61–76, 2019.
  • [13] W. Chen, C. Wang, and Y. Wang, “Scalable influence maximization for prevalent viral marketing in large-scale social networks,” in Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining.   ACM, 2010, pp. 1029–1038.
  • [14] C. Zapp, A. Obarska-Kosinska, B. Rennekamp, D. Mercadante, U. Barayeu, T. P. Dick, V. Denysenkov, T. Prisner, M. Bennati, C. Daday et al., “Mechanoradicals in tensed tendon collagen as a new source of oxidative stress,” arXiv preprint arXiv:1910.12190, 2019.
  • [15] O. Al-Harazi, A. El Allali, and D. Colak, “Biomolecular databases and subnetwork identification approaches of interest to big data community: An expert review,” Omics: a journal of integrative biology, vol. 23, no. 3, pp. 138–151, 2019.
  • [16] B. Kamiński, V. Poulin, P. Prałat, P. Szufel, and F. Theberge, “Clustering via hypergraph modularity,” PloS one, vol. 14, no. 11, 2019.
  • [17] R. Kishore, A. K. Gogineni, Z. Nussinov, and K. K. Sahu, “A nature inspired modularity function for unsupervised learning involving spatially embedded networks,” Scientific reports, vol. 9, no. 1, p. 2631, 2019.
  • [18] M. Girvan and M. E. Newman, “Community structure in social and biological networks,” Proceedings of the national academy of sciences, vol. 99, no. 12, pp. 7821–7826, 2002.
  • [19] M. E. Newman, “Fast algorithm for detecting community structure in networks,” Physical review E, vol. 69, no. 6, p. 066133, 2004.
  • [20] C. Gui, R. Zhang, R. Hu, G. Huang, and J. Wei, “Overlapping communities detection based on spectral analysis of line graphs,” Physica A: Statistical Mechanics and Its Applications, vol. 498, pp. 50–65, 2018.
  • [21] Q. Liu, C. Liu, J. Wang, X. Wang, B. Zhou, and P. Zou, “Evolutionary link community structure discovery in dynamic weighted networks,” Physica A: Statistical Mechanics and its Applications, vol. 466, pp. 370–388, 2017.
  • [22] U. N. Raghavan, R. Albert, and S. Kumara, “Near linear time algorithm to detect community structures in large-scale networks,” Physical review E, vol. 76, no. 3, p. 036106, 2007.
  • [23] X. Teng, J. Liu, and M. Li, “Overlapping community detection in directed and undirected attributed networks using a multiobjective evolutionary algorithm,” IEEE transactions on cybernetics, 2019.
  • [24] X. Wang, B. Zhou, Y. Jia, and S. Li, “Detecting internet hidden paid posters based on group and individual characteristics,” in International Conference on Web Information Systems Engineering.   Springer, 2015, pp. 109–123.
  • [25] F. Zou, D. Chen, D.-S. Huang, R. Lu, and X. Wang, “Inverse modelling-based multi-objective evolutionary algorithm with decomposition for community detection in complex networks,” Physica A: Statistical Mechanics and its Applications, vol. 513, pp. 662–674, 2019.
  • [26] M. Guerrero, C. Gil, F. G. Montoya, A. Alcayde, and R. Baños, “Multi-objective evolutionary algorithms to find community structures in large networks,” Mathematics, vol. 8, no. 11, p. 2048, 2020.
  • [27] K. Berahmand, A. Bouyer, and M. Vasighi, “Community detection in complex networks by detecting and expanding core nodes through extended local similarity of nodes,” IEEE Transactions on Computational Social Systems, vol. 5, no. 4, pp. 1021–1033, 2018.
  • [28] X. Ding, J. Zhang, and J. Yang, “Node-community membership diversifies community structures: An overlapping community detection algorithm based on local expansion and boundary re-checking,” Knowledge-Based Systems, p. 105935, 2020.
  • [29] W. Feifan, Z. Baihai, C. Senchun, and X. Yuanqing, “An extreme learning machine-based community detection algorithm in complex networks,” Complexity, vol. 2018, pp. 1–10, 2018.
  • [30] G. Sperlí, “A deep learning based community detection approach,” in Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing, ser. SAC ’19.   New York, NY, USA: Association for Computing Machinery, 2019, p. 1107–1110. [Online]. Available: https://doi.org/10.1145/3297280.3297574
  • [31] Z. Chen, X. Li, and J. Bruna, “Supervised community detection with line graph neural networks,” 2017.
  • [32] Y. Wang, G. Cong, G. Song, and K. Xie, “Community-based greedy algorithm for mining top-k influential nodes in mobile social networks,” in Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining.   ACM, 2010, pp. 1039–1048.
  • [33] Z. Lu, Y. Zhu, W. Li, W. Wu, and X. Cheng, “Influence-based community partition for social networks,” Computational Social Networks, vol. 1, no. 1, p. 1, 2014.
  • [34] N. Barbieri, F. Bonchi, and G. Manco, “Efficient methods for influence-based network-oblivious community detection,” ACM Transactions on Intelligent Systems and Technology (TIST), vol. 8, no. 2, pp. 1–31, 2016.
  • [35] L. Lovász, “Submodular functions and convexity,” in Mathematical Programming The State of the Art.   Springer, 1983, pp. 235–257.
  • [36] J. Vondrák, “Continuous extensions of submodular functions,” CS 369P: Polyhedral techniques in combinatorial optimization, lecture notes, 2010.
  • [37] Y. Yang, X. Mao, J. Pei, and X. He, “Continuous influence maximization: What discounts should we offer to social network users?” in Proceedings of the 2016 international conference on management of data.   ACM, 2016, pp. 727–741.
  • [38] C. Wang, W. Chen, and Y. Wang, “Scalable influence maximization for independent cascade model in large-scale social networks,” Data Mining and Knowledge Discovery, vol. 25, no. 3, pp. 545–576, 2012.
  • [39] Y. Tang, X. Xiao, and Y. Shi, “Influence maximization: Near-optimal time complexity meets practical efficiency,” in Proceedings of the 2014 ACM SIGMOD international conference on Management of data.   ACM, 2014, pp. 75–86.
  • [40] S. E. Garza and S. E. Schaeffer, “Community detection with the label propagation algorithm: A survey,” Physica A: Statistical Mechanics and its Applications, vol. 534, p. 122058, 2019.