A Polyhedral Approach to Least Cost Influence Maximization in Social Networks
Abstract
The least cost influence maximization problem aims to determine minimum cost of partial (e.g., monetary) incentives initially given to the influential spreaders on a social network, so that these early adopters exert influence toward their neighbors and prompt influence propagation to reach a desired penetration rate by the end of cascading processes. We first conduct polyhedral analysis on a substructure that describes influence propagation assuming influence weights are unequal, linear and additively separable. Two classes of facet-defining inequalities based on a mixed 0-1 knapsack set contained in this substructure are proposed. We characterize another exponential class of valid and facet-defining inequalities utilizing the concept of minimum influencing subset. We show that these inequalities can be separated in polynomial time efficiently. Furthermore, a polynomial-time dynamic programming recursion is presented to solve this problem on a simple cycle graph. For arbitrary graphs, we propose a new exponential class of valid inequalities that dominates the cycle elimination constraints and an efficient separation algorithm for them. A compact convex hull description for a special case is presented. We illustrate the effectiveness of these inequalities via a delayed cut generation algorithm in the computational experiments.
Keywords— Influence Maximization, Social Networks, Valid Inequalities, Delayed Cut Generation
1 Introduction
Intricate connections between entities in many natural and man-made systems form large complex networks. Of particular interest in network science is to gain insight into the dynamic of cascading processes in complex networks. For instance, the spreading of new behaviors, opinions, technologies, conventions, and gossips from person to person through a social network may play an important role in designing competitive marketing strategies. Indeed, the collection of social ties among consumers can be exploited to select pivotal early adopters for initiation and anticipate the time and cost required for the information propagation. Therefore, a key research question has been focused on how to efficiently identify a set of users to disseminate a certain information within the network, result in an increasing trend in studying social influence and information propagation in social networks (see, e.g., [1, 2]). As it may be expected, the spreading of social influence is commonly studied on network graphs that consist of nodes and links representing users and their connections, as well as a framework that describes how the information propagates among various intermediate users over time. Granovetter [3] first presented the threshold model to simulate the collective force exerted by a group on each of its members for predicting innovation and rumor diffusion, voting trends, and migration. A distinct threshold value is assigned to each user representing the proportion of neighbors who make a decision before a particular user makes such decision. Many extensions built on the threshold model have been proposed to encompass different circumstances. Among them, the linear threshold model and independent cascade model are the two most popular and well-studied one. Kempe et al. [4] study the linear threshold and independent cascade models under the influence maximization (IM) problem. The goal of IM is to activate as many nodes as possible by the end of propagation process given selected influential nodes within budget. Specifically, they show that the expected influence spread is a monotone submodular function, which can be approximated by greedy algorithm with performance guarantee . On the other hand, the target set selection (TSS) introduced by [5] aims to find the minimum number of users required initially and activate the entire network through the propagation process. These two types of problems are fundamental in social network analysis and have many variants. Here we refer the reader to a survey of hardness and solution approaches for TSS [6], a review on approximation algorithms for IM [7], and a comprehensive review on introduction to types of social networks, properties, evaluation metrics and known method to solve IM [8]. Recently, Azaouzi et al. [9] presented a comprehensive review on models and methods of group-level IM and IM under privacy protection.
In this paper, we study an optimization problem arising in social network analytics, referred to as the least cost influence maximization (LCIM) problem [10, 11, 12]. The propagation process in LCIM is based on the linear threshold model, where each user is assigned a real-valued threshold and each link between users is assigned a weight to represent influence level. If the summation of influence weights from neighbors exceeds one’s threshold value, then the individual is activated, meaning that the information is adopted or the person changes their opinion to align with friends (neighbor nodes) in the social network. In LCIM, the concept of influence weights is extended with the idea of giving partial incentives such as free samples or coupons to individual as a motivation for spreading information. The assumption of the linear threshold model allows the incentives and influence weights in the activation processes to be linearly additive, which immediately admits a mixed-integer optimization formulation for the influence propagation. Our goal is to develop an exact computational method based on the polyhedral structure. We assume that all the parameters are deterministic, all the nodes are inactive initially and nodes remain activated once the influence weights exceeds the threshold. A similar assumption on using deterministic parameters in linear threshold model can be found in [13]. From a practical point of view, the assumption of deterministic linear threshold depends on the accuracy of estimation of influence factors and threshold parameters. Machine learning and data mining techniques may enable one to obtain accurate predictions on those parameters from massive amounts of data available nowadays. Note that when incentive is allowed and the influence weights and thresholds are considered known in the problem, the influence propagation is no longer submodular, thus, a greedy algorithm is not applicable for arbitrary graphs. However, the problem is still challenging to solve due to the combinatorial nature. The minimum subset of nodes required under dynamic activation is also referred to as the dynamic monopoly, where nodes become activated if at least half of the neighbors are activated in the previous time period. Hence, an extra index of time is incorporated into the integer programming formulation to describe step-by-step activation of nodes, see [14, 15, 16]. We do not consider time dynamics in our problem despite the fact that the influence propagation occurs in discrete steps. Moreover, the induced optimal influence propagation graph for LCIM in the static network setting given by the mixed-integer programming formulation is known to be acyclic [10].
Even though IM, TSS and LCIM problems share certain similarities, most of the previous studies on IM or TSS mainly focus on developing degree-based or centrality-based metaheuristics and approximation algorithms. Existing studies on using exact mixed-integer optimization techniques for LCIM under deterministic data settings are relatively limited. The computational complexity of LCIM is established in [12]. In particular, they show that LCIM is NP-hard on arbitrary graphs and bipartite graphs for both equal and unequal influence when 100% penetration rate is not required. They also give a greedy algorithm and a total unimodular formulation for LCIM with equal influence on a tree and 100% penetration rate. Günneç et al. [11] develop a branch-and-cut algorithm using this total unimodular formulation for LCIM on arbitrary graphs. Fischetti et al. [10] present a novel set covering formulation for a generalized LCIM, where the activation function can be adjusted to be nonlinear in order to capture the situation of diminishing marginal influences or over-proportional effect from peers. They propose strengthened generalized propagation inequalities and a price-cut-and-branch algorithm to deal with the exponential number of variables and constraints. Recent developments on using exact mixed-integer optimization methods for stochastic influence maximization problem include a delayed constraint generation algorithm for a two-stage stochastic influence maximization problem [17] and a branch-cut-and-price algorithms for robust a influence maximization, where node thresholds and arc influence factors are subject to budget uncertainty [18].
1.1 Notation and problem definition
For convenience, we use the notation to denote the set for , and for . For a set , we use to denote its convex hull of solutions. Formally, an oriented network (e.g., a social network) is represented by a graph with the set of nodes corresponding to people or users and the set of bidirectional arcs with cardinality corresponding to connections and influence directions between people in the network. Hence, each node has identical number of predecessors and successors, denoted by : , where . Each node has a threshold value measuring the “difficulty level” of an individual to be “activated”. Each arc is associated with an influence weight . The coverage (penetration) rate is denoted by , where is assumed. We also assume that and are positive integers such that and for all such that throughout this paper to omit trivial cases. For any and for some , we pre-process the data to set . All nodes are assumed inactive initially and nodes remain active once influences from neighbors and incentives reach or exceed the threshold. For each node , let nonnegative continuous variables be the amount of partial incentives given to user , binary variables indicate whether influence is exerted from node to nor not, and binary variables indicate whether node is activated or not. The arc-based formulation of static LCIM is given by
s.t. | (1a) | |||
(1b) | ||||
(1c) | ||||
(1d) | ||||
Node propagation constraints (1a) follow the linear threshold model by evaluating the total incoming influence from neighbor plus the incentives given to a node. Constraints (1b) state that for every two nodes with bidirectional arcs, the influence exertion is allowed in one way only. The cardinality constraint (1c) describes the desired number of activated nodes given a predetermined penetration rate , where and . Constraints (1d) are generalized cycle elimination constraints (GCEC) where . They are necessary to produce the acyclic optimal influence propagation graph. Note that the arc-based formulation proposed by [19] is different from this paper as the influence weights in their model are coming solely from their neighbors without incentives. Fischetti et al. [10] adopt this arc-based formulation on arbitrary graphs for computational performance comparison, but possible values of incentives are discretized by a set of an exponential number of binary variables.
1.2 Main contributions
The key contributions of this paper are summarized as follows. We conduct a polyhedral study for the hidden mixed 0-1 knapsack substructure in (1a) of LCIM. We propose three classes of strong valid inequalities, namely, the continuous cover, continuous packing and the minimum influencing subset inequalities from this substructure and specify the conditions under which they are facet-defining. The coefficient of these inequalities can be adjusted to equal influence assumption directly. We have improved the run time of the separation algorithm for the the continuous cover and continuous packing inequalities compared with our preliminary results presented in [20]. In addition, we provide a polynomial-size complete linear description of the polyhedron of LCIM on a tree when equal influence weights for every node and 100% coverage are assumed. For LCIM on arbitrary bidirectional network graphs, we derive a novel class of strong valid inequalities called the the inequalities and show they dominate the general cycle elimination constraints. Finally, we augment the preliminary computations in [20] with different formulations, cutting plane strategies, density and scale of networks in extensive computational experiments.
1.3 Outline
The remainder of this paper is organized as follows. In Section 2, we derive strong valid inequalities from the mixed 0-1 knapsack substructure and give an exact polynomial time separation algorithm for these inequalities. In Section 3, we discuss an time dynamic programming recursion for LCIM on a simple cycle. We propose the inequalities for arbitrary bidirectional graphs and develop a polynomial time separation algorithm via solving the longest path problem on a directed acyclic graph. We present the convex hull description of LCIM on a special case in Section 4. Finally, we illustrate the effectiveness of our proposed valid inequalities in the computational experiments in Section 5 and conclude with Section 6.
2 Valid inequalities based on mixed 0-1 knapsack polyhedron
To develop a strong formulation for LCIM, we study the polyhedral structure of constraints (1a). For , let
The set is a mixing set with a binary variable on the right-hand side value. For any inequality that is facet-defining for , it is facet-defining for as well. Therefore, we now consider a single node propagation by dropping the subscript and obtain the following set
Observe that the set contains a mixed 0-1 knapsack structure. Let set obtained from by setting , and , then we have a mixed 0-1 knapsack set with weight for each item and the capacity of knapsack plus an unbounded continuous variable in the following
The mixed 0-1 knapsack set is a special case of traditional 0-1 knapsack problem where the knapsack size is expanded with additional capacity. Observe that is full-dimensional and contains the origin. There are trivial facets for and their verification is straightforward.
Proposition 1.
The following facet-defining inequalities for are trivial.
-
(i)
The inequality is facet-defining for if for all .
-
(ii)
The inequality is facet-defining for if for all .
-
(iii)
The inequality is facet-defining for .
-
(iv)
The inequality is facet-defining for .
The first result of Proposition 1 validates our data pre-processing assumption for influence weight and threshold when . To show this, consider a single node influence propagation inequality
Clearly, this inequality is dominated by
as the second inequality is faced-defining. Hence, we can obtain the facet-defining inequality by simply substituting any influence weight with for all if . Marchand and Wolsey [21] propose two classes of valid inequalities for based on mixed-integer rounding and lifting function, namely, the continuous cover and continuous packing (reversed cover) inequalities. We follow a similar idea to strengthen the formulation for LCIM with moderate modification as . Applications of continuous cover and continuous packing inequalities for can be seen in several fields, including delay management for public transportation [22], job scheduling with uncertain multiple resources [23], discrete lot sizing [24] and single-item capacitated lot sizing [25]. They can also be extended to solve general mixed-integer optimization that contains mixed 0-1 knapsack set with bounded continuous variables, see [26, 27, 28].
2.1 Continuous cover and continuous packing inequalities
Consider a continuous cover such that and for any . Let be in non-increasing order with , for and .
Proposition 2.
The continuous cover inequality
(2) |
where
(3) |
is valid for .
Similarly, consider a continuous packing such that and for any . Let be in non-increasing order with , for and .
Proposition 3.
The continuous packing inequality
(4) |
where
(5) |
is valid for .
Here we omit the proofs as the validity of inequalities (2) and (4) and their facet-defining conditions for directly follow from [21] when , while when , we must have for all , both inequalities are trivially satisfied and facet-defining according to Proposition 1.
Example 1.
Let and , we list the facet-defining inequalities of inequality (2) and (4) in Table 1. For example, for , we have and . Then the lifting function is given by
Hence, the coefficient of is , which generates
set | facet-defining inequality |
---|---|
, | |
, | |
, | |
, | |
, | |
, | |
, |
2.2 Minimum influencing subset inequalities
We use the definition of minimal influencing set from [10], which we include here for the reader’s convenience.
Definition 1 ([10]).
Let be an incentive payment to node and be a set of active neighbors of node , such that . We say is a minimal influencing subset for node if and only if for a fixed incentive payment , it satisfies and for any . In other words, a strict subset of with the same incentive payment is not sufficient to activate node .
Proposition 4.
Let be a minimum influencing subset with an incentive payment . The minimal influencing subset inequality
(6) |
is valid for .
Proof.
If then inequality (6) is trivially satisfied. If for all , either for or for . Assume that none of these cases hold, given a , rewrite the left term of the inequality in the following form:
which implies
∎
Proposition 5.
Inequality (6) is facet-defining for if and only if . Moreover, for a given and a set , for each such that , the minimal influencing subset inequality
(7) |
is facet-defining for .
Proof.
Recall that by definition. If , then inequality (6) reduces to ; therefore, is necessary. To show the sufficiency that inequality (6) is facet-defining for , we exhibit linearly independent points on the face defined by inequality (6). Let be the unit vector corresponding to for . Consider the two feasible points and , then, consider the feasible points for . It is straightforward to verify that these points are linearly independent and satisfy inequality (6) at equality. To prove the second part of this proposition, let , and for be the unit vectors corresponding to variables , and , respectively. The component of is 1 if it has the same position with in the feasible solution; all other components of are 0. Similar setting is made to for and for , respectively. For , consider the points , also, consider the points . For and , consider the points . These points are linearly independent and satisfy inequality (7) at equality, which completes the proof. ∎
Example 1 (Continued).
set | facet-defining inequality |
---|---|
Although inequalities (2), (4) and (6) define a large number of facets for , they are not sufficient to completely describe in its original space of variables. Particularly, the following inequality is valid and facet-defining for Example 1 but can not be obtained through either inequalities (2), (4) or (6):
2.3 Separation of minimal influencing subset inequalities
In this section, we give an exact polynomial time separation algorithm for finding the most violated minimal influencing subset inequality. From inequality (7), we observe that finding the most violated inequality for a given fractional solution consists of choosing a set such that is maximized. Let .
Proposition 6.
Given a fractional solution from solving LCIM, there exists an time separation algorithm for inequality (7).
Proof.
A violated inequality (7) can be found if
which implies that it suffices to consider for some such that and . To do so, we sort in non-decreasing order for with indices such that . For , we sum up first elements, then we check if and , until . These elements constitute the subset and simultaneously and ensure and in order to generate a violated cut. The set corresponds to the most violated cut can be determined by evaluating . If , then there are no violated cuts. The sorting process runs in time and the evaluation takes , since we have to check for every node , overall the separation algorithm runs in time. ∎
2.4 Separation of continuous cover and continuous packing inequalities
Up to this point, we presented an exact polynomial time separation algorithm for inequalities (7). Next, we show that a violated continuous cover inequality for can be identified by exploiting the result of Proposition 6. Then we can obtain a violated continuous packing inequality in polynomial time after finding a violated continuous cover inequality. We first formally establish the relationship between , and in the following lemma.
Lemma 1.
If and there exists such that and for any , then , , and .
Proof.
By rearranging the terms in the definition of ,
Hence, we can see that is equivalent to and if . Next, if there exists an element such that , immediately we have by definition and . Since , we have , , and . ∎
Following Lemma 1, we present an efficient separation procedure to determine violated continuous cover and continuous packing inequalities by using the information of the set . Note that here we add an index to inequalities (2) similar to (7) for all .
Proposition 7.
There exists a violated continuous cover inequality if a violated inequality (7) is identified.
Proof.
Recall that inequality (7) is violated if
or equivalently by Lemma 1,
Now, a continuous cover inequality for a fixed node is violated if
Suppose for all , then the left term of the continuous cover inequality can be further written as
Since holds and the lifting function is nonnegative, the rest of the terms already violate the current solution , we then obtain a violated continuous cover inequality with being the minimum among . This suggests that, when a violated inequality (7) is found, it suffices to generate a violated continuous cover inequality concurrently. ∎
On the other hand, a violated continuous packing inequality can not be obtained directly from separating inequality (7). However, when a violated continuous cover inequality is found, we can check every element in to see if there exists an element such that . For every satisfying this condition, the packing set is then determined. A violated continuous packing inequality is found if
holds. Furthermore, suppose , the process of checking elements in takes time and the function can be constructed in time using binary search proposed in [26].
3 Valid inequalities for LCIM with cycles
In this section, we expand the study of to incorporate the remaining constraints in LCIM on an arbitrary bidirectional graph that contains cycles. The polyhedron that describes the intersection of these constraints is
To simplify the notation, we let and be the coefficients associated with variables and corresponding to continuous cover and continuous packing inequalities. In other words, we express them in the following form:
(8) |
and it is facet-defining for . Due to the straightforward structure of a cycle, the minimum incentive of the influence propagation can be easily characterized. Raghavan and Zhang [29] give an time algorithm for the weighted target set selection problem on a cycle. We also present a polynomial time dynamic programming to solve LCIM on a simple cycle. Then we give a class of exponential number of valid inequalities that forms acyclic influence propagation by exploiting inequality (8) as the base inequality. We also demonstrate that the separation for this class of valid inequalities can be done in polynomial time for arbitrary bidirectional graphs that contain cycles.
3.1 Dynamic programming recursion for LCIM on a simple cycle
Without loss of generality, we assume in this section. Hence, for a smple cycle graph, we have , and with for all . We still use for the set of all nodes and for set of all bidirectional arcs to focus on the discussion of LCIM on a simple cycle for consistency. Observe that due to the cycle structure, the influence propagation occurs on an induced path of a cycle as it is one-way and consecutive after a particular node is activated by paying full incentive to it. Given the cardinality requirement , the cost of activating other nodes on this path is equivalent to the threshold of the inactivated node minus the influence weights exerted from the activated predecessor on the side. For , the cost of activating the last node is zero as it receives influence exertion from both its predecessor and the firstly activated node. Therefore, we only need to evaluate the cases for .

Given , for , construct a node set that contains node with forward arcs connecting nodes starting from with total number of nodes equals to . Let be the indices in . The corresponding set of forward arcs is then as illustrated in Figure 1. We apply the similar logic for backward set of nodes and arcs . Let denote the minimum cost of activating nodes on the cycle beginning with node . For , the minimum cost of activating nodes on a cycle is given by
(9) |
where
(10) |
Proposition 8.
3.2 Valid inequalities for influence propagation over a cycle
Since every network consists of trees and cycles as substructures, observe that for every cycle in the network, at least one node is either paid with full incentive or the activation requires influence exertion from nodes outside the cycle. Fischetti et al. [10] first recognized this observation and proposed a generalized propagation constraints in a different space of variables. Here we propose an exponential class of valid inequalities that captures this observation as well as ensures the influence propagation is acyclic for .
Proposition 9.
Given an inequality (8) and a cycle with set of nodes and set of arcs , for , the inequality
(11) |
is valid for , where
-
(i)
,
-
(ii)
if ,
-
(iii)
computes the least common multiple of for if ,
-
(iv)
.
Proof.
Due to constraint (1c), there exists for some . Let . We partition into two disjoint sets and such that and , where and . We distinguish two main cases:
Case 1.
We first consider for all and for all . In this case, the left hand side of the inequality is equal to 0, whereas in the right hand side, we have and , if there exists at least one , we must have , which leads to . Consequently,
holds, inequality (11) is thus valid.
Case 2.
Next we consider for all and for all . In this case, we must have for all as influence exertion towards nodes belong to is unnecessary. Then the inequality reduces to
Regardless of whether for all or for some , there exists at least one index such that and . In other words, at least one node is activated with full incentive payment in order to launch the propagation for nodes in . For other nodes that receive partial or zero incentives, their corresponding term in the left hand side is zero. To compute the possible range of the left hand side, we rearrange the terms and obtain the following
Consequently, the left hand side is at most and at least with one particular node with full incentive , which is valid. This completes the proof.
∎
Example 2.
Consider a graph illustrated in Figure 2. The number next to each arc is the influence weight , while the number inside the brackets next to each node is the threshold .

For node 2, take with , the corresponding continuous packing inequality (8) is
Similarly, for node 3, take with , the corresponding continuous packing inequality (8) is
There are two cycles and in Figure 2. For with cycle , we have , , , then the inequality is
Furthermore, consider the cycle from another direction, the inequality is
Next, we give the condition in which inequality (11) is stronger than the generalized cycle elimination constraints. Without loss of generality, we assume that for .
Proposition 10.
Inequality (11) with dominates the generalized cycle elimination constraints.
Proof.
Consider a particular cycle , the GCEC can be states as
where is an arbitrary choice of index among . For a inequality (11) with on this cycle, we obtain
Clearly, the GCEC is weaker then the inequality unless . Furthermore, if we have , then must hold. This implies that there exist violated inequalities for every violated cycle identified. ∎
3.3 Separation of inequalities
Since the size of of inequalities (11) is exponential, we explore a separation scheme to find the most violated inequality corresponding to set in polynomial time. We again assume that in this section.
Proposition 11.
Separation problem for inequality (11) can be solved in time.
Proof.
Inequality (11) is violated if
(12) |
For a given fractional point , we determine a set such that the left hand side of (12) is maximized. With the observation in Proposition 10, for every violated cycle detected, we construct a longest path problem on a directed acyclic network to solve the separation problem.

Consider a directed acyclic network with a source vertex and a sink vertex . Define for all . It is possible that there exists multiple inequality (8) for a fixed . We select the one with the minimum value of accordingly. Let index set such that . Node is sorted according the the value of and each node has a unique mapping to each node . The node set is then . The arc set is .
Next, we assign length on each arc in . For arc , we let
Let . For arcs , we set the length . For arcs , we set the length
This longest path problem depicted in Figure 3 can be solved by Dijkstra’s algorithm. There exists a violated inequality (11) if and only if the longest path is strictly positive and the nodes on this path determine the elements in set . The sorting process of takes time, the evaluation of takes time, and the longest path on a directed acyclic graph takes time as there are nodes and arcs. Since we have to solve this problem for every violated cycle found by Dijkstra algorithm, the separation algorithm runs time overall. ∎
Example 3.

Consider solving LCIM with on a graph depicted in Figure 2. The initial linear programming relaxation solution without enumerating any cycle elimination constraints is
and the objective function value is 8.52. The violated cycle is for this solution. Then we obtain , and and sort them in non-decreasing order. Since , the set is corresponding to the original node set . This leads to a directed acyclic network illustrated in Figure 4 with new node set . The length of and are and , respectively. The lengths of the remaining arcs are
The longest path is , which determines with maximum violation 6. We add the following violated inequality (11)
to cut off this fractional solution. We then obtain the new objective function value 10.2 and the solution sets
which is very close to the true optimal objective function value 11 in this example.
4 LCIM under equal influence and 100% adoption
Since the optimal propagation subgraph of LCIM is acyclic, the solution of LCIM on a tree provides a valid lower bound for LCIM on a graph with cycles. Moreover, in practical applications, it is common to assume that both threshold and influence exertion are identical for every node, due to simplicity or lack of accurate estimation. For example, in the unanimous threshold model [5], for all is assumed. This diffusion model is normally considered as the most influence resistant one, and it has applications in complex computer network security problems. In addition, the majority threshold model [30] assumes for all . Both information diffusion models assume that for all .
In this special case of LCIM where equal influence is assumed for all and 100% coverage is required on a tree, we have for all and GCEC can be discarded. The LCIM formulation corresponding to equal influence and 100% coverage on a tree graph is given by
s.t. | (13a) | |||
(13b) | ||||
(13c) | ||||
(13d) |
Let denote the set of feasible solutions to LCIM-TE on a tree graph and let denote the set of feasible solutions to the linear programming relaxation of (13a) - (13d). Günneç et al. [12] prove that LCIM-TE is polynomial solvable on a tree graph. They propose a compact extended formulation with total unimodular constraints by considering three types of incoming influence for every node. However, this extended formulation cannot be applied to unequal influence weights directly, since there would be exponentially many distinct possible types of incoming influence for a node, depending on the influence weights received from its activated neighbors. We give the complete linear description of in the original space of variables with additional constraints and show they are a special case of the continuous cover and continuous packing inequalities by adjusting the influence weights.
Proposition 12.
Let and for all , the inequality
(14) |
is facet-defining for if and only if and . Furthermore, the complete linear description of is given by
Proof.
If , then and inequality (14) coincides with . Similarly, if , we also have and inequality (14) is reduced to . To prove the sufficiency, we demonstrate that inequality (14) is a special case of the continuous cover and continuous packing inequalities. Observe that is the minimum number that exceeds if multiplied by , which implies that . Therefore, is the cardinality of set corresponding to the continuous packing inequality with equal influence weights. We thus obtain and . Equivalently,
hence is the cardinality of set in the continuous cover inequality. Following the result of Lemma 1 for the interchangeable relationship between sets and , and coincide with the coefficients of the continuous cover and continuous packing inequalities, respectively.
For the second part of this Proposition, we assume and holds for all without loss of generality. Observe that for , the possible values of for , where is an implicit upper bound of number of activated neighbors for node , namely, . We prove that for any choice of , we must have integral in the following three cases:
Case 1.
Case 2.
Case 3.
Suppose . First, let , then . We have in both inequality (13a) and inequality (14). Following Case 1 and Case 2, we have with and for some such that . Next, let and holds in inequality (13a). While in inequality (14),
Since we assume , inequality (13a) dominates inequality (14). By mathematical induction, for , we conclude that inequality (13a) always dominates inequality (14). Furthermore, we must have from the implicit upper bound and the value of and are either 0 or 1 following Case 1 and Case 2. We have now demonstrated that inequality (14) is facet-defining and are integral for any choice of , thus the proof is completed.
∎
We close this section by noting that the inequalities (11) and the separation algorithm in Section 3.3 can be directly applied to LCIM-TE on a graph with cycles.
Proposition 13.
The inequality for equal influence weights of a cycle is given by
(15) |
for .
5 Computational Experiments
In this section, we give a detailed description of the data generation and algorithm settings. We test the effectiveness of a delayed cut generation algorithm that incorporates the proposed valid inequalities in solving LCIM under different conditions. All the experiments were conducted on a single thread of a Windows 10 Enterprise server with Intel(R) Core i7-4770 CPU at 3.40 GHz x-64 based processor and 8GB of RAM using Python 3.8 and Gurobi 9.1.2 with default settings as the optimization solver. A 3600 seconds time limit was imposed for each experiment.
5.1 Data generation and algorithm settings
We follow the exact data generation scheme in [10], except for the fact that we generate bidirectional arcs between every two nodes. The small-world network topology for each instance is generated based on watts_strogatz_graph function in the NetworkX package of Python [31]. The instances have the following properties: size of node set , average node degree , rewiring probability and we set penetration rate . Influence weight for all are generated from discrete uniform distribution between 1 and 10. Let and be a random variable follows normal distribution for all . We set . For each setting, we generate three instances and report the average.
The effectiveness of two delayed cut generation algorithms and one alternative reformulation are compared in our study:
- 1.
-
2.
CB: formulation LCIM with cut-and-branch enhancement, and
-
3.
LN: layered-network formulation.
To implement the delayed cut generation, the GCEC (1d) is separated via lazy constraint callback only for integer solutions for DEF and CB. Grötschel et al. [32] give a shortest path algorithm for separating (1d) and we utilize the existing shortest path function dijkstra_path in the NetworkX package to perform such task. For the cut-and-branch enhancement in algorithm CB, we add the proposed inequalities via user-cut callback at the root node to tighten the linear programming relaxation of formulation LCIM.
We use the layered-network formulation proposed by [33] to replace (1d) in algorithm LN as this formulation gives a directed acyclic graph with the additional layer assignment variables for all . We are interested in testing whether this cycle-free formulation is beneficial to solve LCIM compared with GCEC (1d). In addition, this formulation allows us to keep the mixed 0-1 knapsack substructure so the valid inequalities can be applied to it directly. In our preliminary experiments, we observe that this formulation does not produce better optimality gap for by relaxing constraints (16a) compared with DEF and CB (average optimality gap ), hence, we only report the computation for . The layered-network formulation used in algorithm LN is given by
where
(16a) | |||
(16b) | |||
(16c) |
5.2 Analysis of results
=50 | Nodes | Cuts | Time[Gap]* | ||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
DEF | CB | LN | DEF | CB | LN | DEF | CB | LN | |||
4-200 | 0.1 | 0.1 | 18 | 12 | 12 | 12 | 0.30 | 0.31 | |||
0.25 | 121 | 53 | 25 | 15 | 0.45 | 0.40 | |||||
0.5 | 492 | 254 | 52 | 21 | 0.84 | 0.65 | |||||
0.75 | 1562 | 1680 | 76 | 45 | 2.53 | 2.74 | |||||
1 | 921 | 636 | 1041 | 70 | 27 | 233 | 1.15 | 0.78 | 17.84 | ||
4-200 | 0.3 | 0.1 | 1 | 5 | 5 | 4 | 0.29 | 0.20 | |||
0.25 | 1 | 1 | 13 | 3 | 0.33 | 0.27 | |||||
0.5 | 33 | 9 | 28 | 10 | 0.41 | 0.39 | |||||
0.75 | 129 | 99 | 45 | 18 | 0.61 | 0.56 | |||||
1 | 207 | 1 | 140 | 47 | 16 | 83 | 0.57 | 0.42 | 5.02 | ||
8-400 | 0.1 | 0.1 | 85 | 115 | 18 | 15 | 1.02 | 1.71 | |||
0.25 | 429 | 392 | 14 | 14 | 2.41 | 2.85 | |||||
0.5 | 6077 | 8831 | 57 | 65 | 19.08 | 27.31 | |||||
0.75 | 457292 | 196845 | 189 | 169 | 1367.06[7.31]* | 682.35 | |||||
1 | 838587 | 658888 | 69506 | 339 | 341 | 2491 | [8.22]*** | [4.70]*** | [8.56]*** | ||
8-400 | 0.3 | 0.1 | 104 | 85 | 19 | 14 | 0.96 | 1.49 | |||
0.25 | 375 | 497 | 17 | 18 | 2.50 | 3.81 | |||||
0.5 | 3009 | 3592 | 35 | 40 | 15.88 | 31.34 | |||||
0.75 | 323777 | 185520 | 148 | 170 | 2028.51[5.11]* | 1486.88 | |||||
1 | 418483 | 404013 | 57946 | 297 | 265 | 2372 | [10.47]*** | [7.13]*** | [12.08]*** |
=75 | Nodes | Cuts | Time[Gap]* | ||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
DEF | CB | LN | DEF | CB | LN | DEF | CB | LN | |||
4-300 | 0.1 | 0.1 | 91 | 17 | 11 | 8 | 0.57 | 0.31 | |||
0.25 | 833 | 742 | 46 | 22 | 1.87 | 1.35 | |||||
0.5 | 1094 | 1543 | 82 | 48 | 2.11 | 3.17 | |||||
0.75 | 3378 | 1880 | 119 | 69 | 5.92 | 4.17 | |||||
1 | 1753 | 1384 | 1252 | 130 | 69 | 366 | 2.08 | 2.15 | 18.39 | ||
4-300 | 0.3 | 0.1 | 10 | 12 | 17 | 9 | 0.64 | 0.70 | |||
0.25 | 309 | 76 | 34 | 13 | 0.85 | 0.71 | |||||
0.5 | 805 | 151 | 78 | 21 | 1.78 | 1.24 | |||||
0.75 | 1537 | 604 | 80 | 37 | 3.50 | 2.66 | |||||
1 | 976 | 965 | 1067 | 107 | 53 | 445 | 2.44 | 2.12 | 16.63 | ||
8-600 | 0.1 | 0.1 | 229 | 195 | 13 | 10 | 2.73 | 3.45 | |||
0.25 | 3178 | 912 | 26 | 20 | 18.82 | 7.81 | |||||
0.5 | 216230 | 212500 | 192 | 78 | 227.75[4.77]* | 857.06 | |||||
0.75 | 564387 | 348760 | 248 | 169 | [6.15]*** | 508.18[3.36]** | |||||
1 | 360594 | 379776 | 50255 | 368 | 324 | 2923 | [9.50]*** | [5.71]*** | [9.64]*** | ||
8-600 | 0.3 | 0.1 | 214 | 127 | 13 | 5 | 2.10 | 2.78 | |||
0.25 | 1078 | 1224 | 19 | 15 | 10.54 | 10.27 | |||||
0.5 | 9948 | 3776 | 68 | 75 | 129.29 | 113.47 | |||||
0.75 | 161508 | 223027 | 184 | 194 | [5.00]*** | 3489.72[1.99]** | |||||
1 | 191592 | 167297 | 41342 | 266 | 356 | 2701 | [9.48]*** | [6.38]*** | [10.30]*** | ||
12-900 | 0.1 | 0.1 | 812 | 635 | 9 | 10 | 13.93 | 18.00 | |||
0.25 | 1803 | 2376 | 18 | 13 | 53.28 | 62.91 | |||||
0.5 | 369964 | 229309 | 36 | 63 | 2803.78[2.11]* | [2.96]*** | |||||
0.75 | 18614 | 31354 | 111 | 144 | [13.68]*** | [10.28]*** | |||||
1 | 48188 | 33974 | 26375 | 228 | 230 | 3317 | [16.49]*** | [16.18]*** | [17.85]*** | ||
12-900 | 0.3 | 0.1 | 1163 | 1275 | 10 | 2 | 15.08 | 30.70 | |||
0.25 | 3757 | 5019 | 42 | 19 | 115.19 | 224.71 | |||||
0.5 | 27217 | 22714 | 35 | 48 | 1363.66[2.83]* | 1827.81[9.39]* | |||||
0.75 | 11042 | 12625 | 58 | 120 | [24.75]*** | [15.69]*** | |||||
1 | 20954 | 25899 | 23253 | 199 | 283 | 3071 | [22.71]*** | [17.21]*** | [26.88]*** |
We summarize our computational results in TABLE 3, 4 and 5 under various settings of . We report the average number of branch-and-cut tree nodes explored in the column Nodes. The column Cuts shows the average number of Gurobi cuts added during the optimization process. In column Time[Gap]* we report the average solution time (in seconds) of the instances that are solved to optimality, and the average of the optimality gap of the instances that are not solved to optimality when reaching time limit (in brackets). Each asterisk sign indicates an unsolved instance and the gap is calculated by 100 where ub and lb are the best integer feasible solution obtained and best lower bound generated by the algorithm within time limit, respectively.
We observe that the major factor that contributes to the unsolved instances with positive optimality gap is the average node degree rather than the number of nodes. This observation can be justified by comparing the instances = (100,4,400) and = (50,8,400), as the former is easier to solve than the later. A similar observation is also established in [10] in their set covering formulation using the price-cut-and-branch algorithm. Despite the average number of nodes and cuts are the greatest in LN for =50, there exists no clear domination relationship in columns Nodes and Cuts between DEF, CB and LN for =75 or 100. Moreover, the average number of cuts added is not very large, which indicates that Gurobi cuts do not complement the optimization process. For the unsolved instances in Table 3 and 4, CB outperforms DEF and LN except for instances = (75,12,900,0.1,0.5) and (75,12,900,0.3,0.5). In Table 5 where =100, CB still outperforms DEF and LN for {4,8,12} except for one instance (100,12,1200,0.3,0.5), where the optimality gap difference is 0.92%. The common setting =0.5 shared in these exceptions suggests that the symmetry created by the cardinality constraint requires additional improvement. We also notice that LN produces better optimality gap than DEF for instances (100,8,800,0.1,1) and (100,12,1200,0.1,1). However, LN suffers from slow improvement of both upper and lower bound and results in larger ending gap in general in our experiments. We begin to see the degraded performance of algorithm CB for instances with =16. A large number of valid inequalities added to the root node could be a possible reason that decelerates the optimization process. Five out of ten results generated by CB under this category are no better than DEF. Nevertheless, the average optimality gap produced by CB for these five instances are 1.49% higher than DEF. To sum up, although the linear programming relaxation of the arc-based formulation is very weak with zero objective function value in most cases, our proposed valid inequalities significantly improve the strength of the lower bound and effectively reduce or close the optimality gap.
=100 | Nodes | Cuts | Time[Gap]* | ||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
DEF | CB | LN | DEF | CB | LN | DEF | CB | LN | |||
4-400 | 0.1 | 0.1 | 28 | 1 | 11 | 4 | 0.95 | 0.85 | |||
0.25 | 148 | 118 | 50 | 12 | 1.24 | 1.09 | |||||
0.5 | 1554 | 998 | 80 | 33 | 3.77 | 2.49 | |||||
0.75 | 1705 | 834 | 128 | 61 | 6.58 | 3.62 | |||||
1 | 669 | 1600 | 670 | 120 | 63 | 318 | 3.09 | 2.67 | 13.01 | ||
4-400 | 0.3 | 0.1 | 15 | 11 | 15 | 3 | 0.53 | 0.36 | |||
0.25 | 83 | 123 | 22 | 7 | 1.27 | 1.07 | |||||
0.5 | 365 | 391 | 92 | 26 | 2.40 | 1.60 | |||||
0.75 | 2045 | 697 | 117 | 53 | 8.55 | 2.92 | |||||
1 | 904 | 213 | 994 | 124 | 42 | 383 | 3.84 | 1.52 | 18.65 | ||
8-800 | 0.1 | 0.1 | 511 | 295 | 17 | 7 | 7.11 | 7.28 | |||
0.25 | 16138 | 9611 | 16 | 59 | 96.09 | 88.48 | |||||
0.5 | 364834 | 278716 | 297 | 157 | 1499.18[4.20]** | 1247.35[2.37]* | |||||
0.75 | 175014 | 231825 | 282 | 306 | [9.81]*** | [7.10]*** | |||||
1 | 121123 | 154716 | 46686 | 374 | 439 | 2988 | [13.84]*** | [8.39]*** | [13.02]*** | ||
8-800 | 0.3 | 0.1 | 434 | 345 | 16 | 12 | 5.74 | 7.27 | |||
0.25 | 5682 | 4509 | 19 | 21 | 56.08 | 72.36 | |||||
0.5 | 65328 | 25567 | 91 | 129 | 735.63[1.79]* | 685.49 | |||||
0.75 | 13119 | 37109 | 190 | 233 | [14.13]*** | [6.43]*** | |||||
1 | 40550 | 68921 | 35379 | 261 | 407 | 2849 | [15.19]*** | [9.39]*** | [17.02]*** | ||
12-1200 | 0.1 | 0.1 | 1866 | 1249 | 6 | 4 | 44.77 | 83.37 | |||
0.25 | 6425 | 6356 | 29 | 48 | 296.98 | 372.08 | |||||
0.5 | 40341 | 41264 | 41 | 52 | [11.17]*** | [9.17]*** | |||||
0.75 | 8292 | 7915 | 143 | 173 | [19.69]*** | [16.40]*** | |||||
1 | 15506 | 16755 | 20873 | 187 | 388 | 4447 | [29.78]*** | [18.11]*** | [20.74]*** | ||
12-1200 | 0.3 | 0.1 | 3984 | 2870 | 0 | 9 | 39.06 | 145.17 | |||
0.25 | 55111 | 33660 | 14 | 65 | 1187.31 | 2191.00 | |||||
0.5 | 15457 | 11751 | 44 | 51 | [21.00]*** | [21.92]*** | |||||
0.75 | 4426 | 5168 | 102 | 128 | [30.63]*** | [24.46]*** | |||||
1 | 8704 | 10036 | 20128 | 174 | 241 | 3688 | [21.11]*** | [19.05]*** | [21.31]*** | ||
16-1600 | 0.1 | 0.1 | 4585 | 5403 | 14 | 11 | 193.29 | 474.52 | |||
0.25 | 9590 | 11593 | 26 | 28 | 1244.07 | 2010.94 | |||||
0.5 | 5384 | 4268 | 9 | 14 | [22.36]*** | [22.42]*** | |||||
0.75 | 3360 | 3591 | 68 | 92 | [32.71]*** | [30.71]*** | |||||
1 | 8464 | 6772 | 11907 | 148 | 158 | 3443 | [22.47]*** | [23.43]*** | [24.10]*** | ||
16-1600 | 0.3 | 0.1 | 11982 | 13063 | 5 | 20 | 286.70 | 852.37 | |||
0.25 | 51785 | 23015 | 19 | 33 | [19.46]*** | [22.45]*** | |||||
0.5 | 5271 | 4520 | 17 | 18 | [34.36]*** | [35.58]*** | |||||
0.75 | 2636 | 2724 | 85 | 89 | [42.24]*** | [44.46]*** | |||||
1 | 4441 | 4485 | 11802 | 78 | 91 | 3408 | [25.78]*** | [24.78]*** | [25.79]*** |
6 Conclusion
We study the least cost influence maximization problem in social networks where the influence propagation behavior among users is captured by the deterministic linear threshold model. A typical application of this problem is to obtain an estimation of the partial incentives given to early product adopters in viral marketing while achieving a desired coverage rate by the end of information spreading. We focus on the case where influence weights exerted from peers are heterogeneous and derive several classes of valid inequalities from the hidden mixed 0-1 knapsack substructure in the mixed-integer programming formulation. Despite the fact that the set of feasible solutions is hard to convexify due to the knapsack constraints and the linear programming relaxation being very weak, our computational experiments show that the delayed cut generation algorithm exploiting these inequalities can effectively reduce the optimality gap. For the case with equal influence weights and 100% adoption on a tree, we characterize the complete linear description of the convex hull in its natural space of incentive, arc propagation and activation variables. The convex hull of the LCIM with equal influence weights and arbitrary adoption on a tree is still an open question and requires further investigation. We observe that the bottleneck of computational improvements are mostly in the instances where 100% adoption is not required. A promising future research direction is to identify more explicit valid inequalities from the intersections of the cardinality constraint that controls the penetration rate with the rest of the formulation.
Acknowledgements
We are grateful to the two anonymous reviewers who provided constructive feedback that helped us improve the content of this paper. We also thank Demetrios Papazaharias in the early discussion of the separation problem for the minimum influencing subset inequalities. This material is based upon work supported by the Air Force Research Laboratory (AFRL) Mathematical Modeling and Optimization Institute and AFRL award FA8651-16-2-0009. Preliminary results of this study were presented at the 9th International Conference on Computational Data and Social Networks (CSoNet 2020, December 11–13, 2020, Dallas, TX) and published in the respective conference proceedings volume [20].
Data Availability Statement
The datasets analyzed during the current study are available from the corresponding author on reasonable request.
References
- [1] Wei Chen, Laks VS Lakshmanan, and Carlos Castillo. Information and influence propagation in social networks. Synthesis Lectures on Data Management, 5(4):1–177, 2013.
- [2] David Kempe, Jon Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. Theory of Computing, 11(4):105–147, 2015.
- [3] Mark Granovetter. Threshold models of collective behavior. American journal of sociology, 83(6):1420–1443, 1978.
- [4] David Kempe, Jon Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 137–146, 2003.
- [5] Ning Chen. On the approximability of influence in social networks. SIAM Journal on Discrete Mathematics, 23(3):1400–1415, 2009.
- [6] Suman Banerjee, Mamata Jenamani, and Dilip Kumar Pratihar. A survey on influence maximization in a social network. Knowledge and Information Systems, 62(9):3417–3455, 2020.
- [7] Yuchen Li, Ju Fan, Yanhao Wang, and Kian-Lee Tan. Influence maximization on social graphs: A survey. IEEE Transactions on Knowledge and Data Engineering, 30(10):1852–1872, 2018.
- [8] Sancheng Peng, Yongmei Zhou, Lihong Cao, Shui Yu, Jianwei Niu, and Weijia Jia. Influence analysis in social networks: A survey. Journal of Network and Computer Applications, 106:17–32, 2018.
- [9] Mehdi Azaouzi, Wassim Mnasri, and Lotfi Ben Romdhane. New trends in influence maximization models. Computer Science Review, 40:100393, 2021.
- [10] Matteo Fischetti, Michael Kahr, Markus Leitner, Michele Monaci, and Mario Ruthmair. Least cost influence propagation in (social) networks. Mathematical Programming, 170(1):293–325, 2018.
- [11] Dilek Günneç, S Raghavan, and Rui Zhang. A branch-and-cut approach for the least cost influence problem on social networks. Networks, 76(1):84–105, 2020.
- [12] Dilek Günneç, Subramanian Raghavan, and Rui Zhang. Least-cost influence maximization on social networks. INFORMS Journal on Computing, 32(2):289–302, 2020.
- [13] Furkan Gursoy and Dilek Günneç. Influence maximization in social networks under deterministic linear threshold model. Knowledge-Based Systems, 161:111–123, 2018.
- [14] Babak Moazzez and Hossein Soltani. Integer programming approach to static monopolies in graphs. Journal of Combinatorial Optimization, 35(4):1009–1041, 2018.
- [15] Babak Moazzez and Hossein Soltani. Facets of the dynamic monopoly polytope: Linear ordering formulation. Discrete Optimization, 40:100641, 2021.
- [16] Hossein Soltani and Babak Moazzez. A polyhedral study of dynamic monopolies. Annals of Operations Research, 279(1):71–87, 2019.
- [17] Hao-Hsiang Wu and Simge Küçükyavuz. A two-stage stochastic programming approach for influence maximization in social networks. Computational Optimization and Applications, 69(3):563–595, 2018.
- [18] Giacomo Nannicini, Giorgio Sartor, Emiliano Traversi, and Roberto Wolfler Calvo. An exact algorithm for robust influence maximization. Mathematical Programming, 183(1):419–453, 2020.
- [19] Eyal Ackerman, Oren Ben-Zwi, and Guy Wolfovitz. Combinatorial model and bounds for target set selection. Theoretical Computer Science, 411(44-46):4017–4022, 2010.
- [20] Cheng-Lung Chen, Eduardo L Pasiliao, and Vladimir Boginski. A cutting plane method for least cost influence maximization. In International Conference on Computational Data and Social Networks, pages 499–511. Springer, 2020.
- [21] Hugues Marchand and Laurence A Wolsey. The 0-1 knapsack problem with a single continuous variable. Mathematical Programming, 85(1):15–33, 1999.
- [22] Veronica Dal Sasso, Luigi De Giovanni, and Martine Labbé. Strengthened formulations and valid inequalities for single delay management in public transportation. Transportation Science, 53(5):1271–1286, 2019.
- [23] Brian Keller and Güzin Bayraksan. Scheduling jobs sharing multiple resources under uncertainty: A stochastic programming approach. IIE Transactions, 42(1):16–30, 2009.
- [24] Marko Loparic, Hugues Marchand, and Laurence A Wolsey. Dynamic knapsack sets and capacitated lot-sizing. Mathematical Programming, 95(1):53–69, 2003.
- [25] Andrew J Miller, George L Nemhauser, and Martin WP Savelsbergh. On the capacitated lot-sizing and continuous 0–1 knapsack polyhedra. European Journal of Operational Research, 125(2):298–315, 2000.
- [26] Amar K Narisetty, Jean-Philippe P Richard, and George L Nemhauser. Lifted tableaux inequalities for 0–1 mixed-integer programs: A computational study. INFORMS Journal on Computing, 23(3):416–424, 2011.
- [27] J-PP Richard, Ismael R de Farias Jr, and George L Nemhauser. Lifted inequalities for 0-1 mixed integer programming: Basic theory and algorithms. Mathematical programming, 98(1):89–113, 2003.
- [28] J-PP Richard, Ismael R de Farias Jr, and George L Nemhauser. Lifted inequalities for 0-1 mixed integer programming: Superlinear lifting. Mathematical programming, 98(1):115–143, 2003.
- [29] S Raghavan and Rui Zhang. Weighted target set selection on trees and cycles. Networks, 77(4):587–609, 2021.
- [30] Thomas W Valente. Social network thresholds in the diffusion of innovations. Social networks, 18(1):69–89, 1996.
- [31] Aric A. Hagberg, Daniel A. Schult, and Pieter J. Swart. Exploring network structure, dynamics, and function using networkx. In Gaël Varoquaux, Travis Vaught, and Jarrod Millman, editors, Proceedings of the 7th Python in Science Conference, pages 11 – 15, Pasadena, CA USA, 2008.
- [32] Martin Grötschel, Michael Jünger, and Gerhard Reinelt. On the acyclic subgraph polytope. Mathematical Programming, 33(1):28–42, 1985.
- [33] Hasan Manzour, Simge Küçükyavuz, Hao-Hsiang Wu, and Ali Shojaie. Integer programming for learning directed acyclic graphs from continuous data. Informs Journal on Optimization, 3(1):46–73, 2021.