A Polyhedral Approach to Least Cost Influence Maximization in Social Networks

Cheng-Lung Chen
Industrial Engineering & Management Systems, University of Central Florida, USA.
[email protected]
Eduardo L. Pasiliao
Air Force Research Laboratory, Eglin AFB, USA.
[email protected]
Vladimir Boginski
Industrial Engineering & Management Systems, University of Central Florida, USA.
[email protected]

Abstract

The least cost influence maximization problem aims to determine minimum cost of partial (e.g., monetary) incentives initially given to the influential spreaders on a social network, so that these early adopters exert influence toward their neighbors and prompt influence propagation to reach a desired penetration rate by the end of cascading processes. We first conduct polyhedral analysis on a substructure that describes influence propagation assuming influence weights are unequal, linear and additively separable. Two classes of facet-defining inequalities based on a mixed 0-1 knapsack set contained in this substructure are proposed. We characterize another exponential class of valid and facet-defining inequalities utilizing the concept of minimum influencing subset. We show that these inequalities can be separated in polynomial time efficiently. Furthermore, a polynomial-time dynamic programming recursion is presented to solve this problem on a simple cycle graph. For arbitrary graphs, we propose a new exponential class of valid inequalities that dominates the cycle elimination constraints and an efficient separation algorithm for them. A compact convex hull description for a special case is presented. We illustrate the effectiveness of these inequalities via a delayed cut generation algorithm in the computational experiments.

Keywords— Influence Maximization, Social Networks, Valid Inequalities, Delayed Cut Generation

1 Introduction

Intricate connections between entities in many natural and man-made systems form large complex networks. Of particular interest in network science is to gain insight into the dynamic of cascading processes in complex networks. For instance, the spreading of new behaviors, opinions, technologies, conventions, and gossips from person to person through a social network may play an important role in designing competitive marketing strategies. Indeed, the collection of social ties among consumers can be exploited to select pivotal early adopters for initiation and anticipate the time and cost required for the information propagation. Therefore, a key research question has been focused on how to efficiently identify a set of users to disseminate a certain information within the network, result in an increasing trend in studying social influence and information propagation in social networks (see, e.g., [1, 2]). As it may be expected, the spreading of social influence is commonly studied on network graphs that consist of nodes and links representing users and their connections, as well as a framework that describes how the information propagates among various intermediate users over time. Granovetter [3] first presented the threshold model to simulate the collective force exerted by a group on each of its members for predicting innovation and rumor diffusion, voting trends, and migration. A distinct threshold value is assigned to each user representing the proportion of neighbors who make a decision before a particular user makes such decision. Many extensions built on the threshold model have been proposed to encompass different circumstances. Among them, the linear threshold model and independent cascade model are the two most popular and well-studied one. Kempe et al. [4] study the linear threshold and independent cascade models under the influence maximization (IM) problem. The goal of IM is to activate as many nodes as possible by the end of propagation process given selected influential nodes within budget. Specifically, they show that the expected influence spread is a monotone submodular function, which can be approximated by greedy algorithm with performance guarantee $1-\frac{1}{e}-\epsilon$ . On the other hand, the target set selection (TSS) introduced by [5] aims to find the minimum number of users required initially and activate the entire network through the propagation process. These two types of problems are fundamental in social network analysis and have many variants. Here we refer the reader to a survey of hardness and solution approaches for TSS [6], a review on approximation algorithms for IM [7], and a comprehensive review on introduction to types of social networks, properties, evaluation metrics and known method to solve IM [8]. Recently, Azaouzi et al. [9] presented a comprehensive review on models and methods of group-level IM and IM under privacy protection.

In this paper, we study an optimization problem arising in social network analytics, referred to as the least cost influence maximization (LCIM) problem [10, 11, 12]. The propagation process in LCIM is based on the linear threshold model, where each user is assigned a real-valued threshold and each link between users is assigned a weight to represent influence level. If the summation of influence weights from neighbors exceeds one’s threshold value, then the individual is activated, meaning that the information is adopted or the person changes their opinion to align with friends (neighbor nodes) in the social network. In LCIM, the concept of influence weights is extended with the idea of giving partial incentives such as free samples or coupons to individual as a motivation for spreading information. The assumption of the linear threshold model allows the incentives and influence weights in the activation processes to be linearly additive, which immediately admits a mixed-integer optimization formulation for the influence propagation. Our goal is to develop an exact computational method based on the polyhedral structure. We assume that all the parameters are deterministic, all the nodes are inactive initially and nodes remain activated once the influence weights exceeds the threshold. A similar assumption on using deterministic parameters in linear threshold model can be found in [13]. From a practical point of view, the assumption of deterministic linear threshold depends on the accuracy of estimation of influence factors and threshold parameters. Machine learning and data mining techniques may enable one to obtain accurate predictions on those parameters from massive amounts of data available nowadays. Note that when incentive is allowed and the influence weights and thresholds are considered known in the problem, the influence propagation is no longer submodular, thus, a greedy algorithm is not applicable for arbitrary graphs. However, the problem is still challenging to solve due to the combinatorial nature. The minimum subset of nodes required under dynamic activation is also referred to as the dynamic monopoly, where nodes become activated if at least half of the neighbors are activated in the previous time period. Hence, an extra index of time is incorporated into the integer programming formulation to describe step-by-step activation of nodes, see [14, 15, 16]. We do not consider time dynamics in our problem despite the fact that the influence propagation occurs in discrete steps. Moreover, the induced optimal influence propagation graph for LCIM in the static network setting given by the mixed-integer programming formulation is known to be acyclic [10].

Even though IM, TSS and LCIM problems share certain similarities, most of the previous studies on IM or TSS mainly focus on developing degree-based or centrality-based metaheuristics and approximation algorithms. Existing studies on using exact mixed-integer optimization techniques for LCIM under deterministic data settings are relatively limited. The computational complexity of LCIM is established in [12]. In particular, they show that LCIM is NP-hard on arbitrary graphs and bipartite graphs for both equal and unequal influence when 100% penetration rate is not required. They also give a greedy algorithm and a total unimodular formulation for LCIM with equal influence on a tree and 100% penetration rate. Günneç et al. [11] develop a branch-and-cut algorithm using this total unimodular formulation for LCIM on arbitrary graphs. Fischetti et al. [10] present a novel set covering formulation for a generalized LCIM, where the activation function can be adjusted to be nonlinear in order to capture the situation of diminishing marginal influences or over-proportional effect from peers. They propose strengthened generalized propagation inequalities and a price-cut-and-branch algorithm to deal with the exponential number of variables and constraints. Recent developments on using exact mixed-integer optimization methods for stochastic influence maximization problem include a delayed constraint generation algorithm for a two-stage stochastic influence maximization problem [17] and a branch-cut-and-price algorithms for robust a influence maximization, where node thresholds and arc influence factors are subject to budget uncertainty [18].

1.1 Notation and problem definition

For convenience, we use the notation $[a,b]$ to denote the set $\{a,a+1,\ldots,b\}$ for $a\leq b$ , and $[a,b]=\varnothing$ for $a>b$ . For a set $\mathcal{R}\subseteq\mathbb{R}^{n}$ , we use $\operatorname{conv}{(\mathcal{R})}$ to denote its convex hull of solutions. Formally, an oriented network (e.g., a social network) is represented by a graph $G=(V,E)$ with the set of nodes $V:=\{1,\ldots,n\}$ corresponding to people or users and the set of bidirectional arcs $E\subseteq\{(i,j)\in V\times V:(j,i)\in V\times V,i\neq j\}$ with cardinality $m$ corresponding to connections and influence directions between people in the network. Hence, each node has identical number of predecessors and successors, denoted by $v_{i}$ : $v_{i}=|N(i)|$ , where $N(i):=\{j\in V:(j,i)\in E\}$ . Each node $i\in V$ has a threshold value $h_{i}$ measuring the “difficulty level” of an individual to be “activated”. Each arc $(i,j)\in E$ is associated with an influence weight $d_{ij}$ . The coverage (penetration) rate is denoted by $a$ , where $0<a\leq 1$ is assumed. We also assume that $d_{ij}$ and $h_{i}$ are positive integers such that $\max\{d_{ji}:j\in N(i)\}\leq h_{i}$ and $\sum_{j\in N(i)}d_{ji}>h_{i}$ for all $i\in V$ such that $|N(i)|\geq 2$ throughout this paper to omit trivial cases. For any $d_{ji}>h_{i}$ and $|N(i)|=1$ for some $i\in V$ , we pre-process the data to set $d_{ji}=h_{i}$ . All nodes are assumed inactive initially and nodes remain active once influences from neighbors and incentives reach or exceed the threshold. For each node $i\in V$ , let nonnegative continuous variables $x_{i}$ be the amount of partial incentives given to user $i$ , binary variables $y_{ij}$ indicate whether influence is exerted from node $i$ to $j$ nor not, and binary variables $z_{i}$ indicate whether node $i$ is activated or not. The arc-based formulation of static LCIM is given by


$\displaystyle\text{(LCIM) }\min\limits_{x,y,z}\quad$	$\displaystyle\sum_{i\in V}x_{i}$
s.t.	$\displaystyle x_{i}+\sum_{j\in N(i)}d_{ji}y_{ji}\geq h_{i}z_{i}\quad\forall i\in V$	(1a)
	$\displaystyle y_{ij}+y_{ji}\leq z_{i}\quad\forall(i,j)\in E$	(1b)
	$\displaystyle\sum\limits_{i\in V}z_{i}\geq b$	(1c)
	$\displaystyle\sum_{(i,j)\in C}y_{ij}\leq\sum_{i\in V(C)\setminus\{k\}}z_{i}\quad\forall k\in V(C),\forall\text{ cycles }C\subseteq E$	(1d)
	$\displaystyle x\in\mathbb{R}_{+}^{n}$
	$\displaystyle y\in\mathbb{B}^{m},z\in\mathbb{B}^{n}.$

Node propagation constraints (1a) follow the linear threshold model by evaluating the total incoming influence from neighbor plus the incentives given to a node. Constraints (1b) state that for every two nodes with bidirectional arcs, the influence exertion is allowed in one way only. The cardinality constraint (1c) describes the desired number of activated nodes given a predetermined penetration rate $a$ , where $b=\lceil an\rceil$ and $1\leq b\leq n$ . Constraints (1d) are generalized cycle elimination constraints (GCEC) where $V(C)=\{i\in V:(i,j)\in C\}$ . They are necessary to produce the acyclic optimal influence propagation graph. Note that the arc-based formulation proposed by [19] is different from this paper as the influence weights in their model are coming solely from their neighbors without incentives. Fischetti et al. [10] adopt this arc-based formulation on arbitrary graphs for computational performance comparison, but possible values of incentives are discretized by a set of an exponential number of binary variables.

1.2 Main contributions

The key contributions of this paper are summarized as follows. We conduct a polyhedral study for the hidden mixed 0-1 knapsack substructure in (1a) of LCIM. We propose three classes of strong valid inequalities, namely, the continuous cover, continuous packing and the minimum influencing subset inequalities from this substructure and specify the conditions under which they are facet-defining. The coefficient of these inequalities can be adjusted to equal influence assumption directly. We have improved the run time of the separation algorithm for the the continuous cover and continuous packing inequalities compared with our preliminary results presented in [20]. In addition, we provide a polynomial-size complete linear description of the polyhedron of LCIM on a tree when equal influence weights for every node and 100% coverage are assumed. For LCIM on arbitrary bidirectional network graphs, we derive a novel class of strong valid inequalities called the the $(U,C)$ inequalities and show they dominate the general cycle elimination constraints. Finally, we augment the preliminary computations in [20] with different formulations, cutting plane strategies, density and scale of networks in extensive computational experiments.

1.3 Outline

The remainder of this paper is organized as follows. In Section 2, we derive strong valid inequalities from the mixed 0-1 knapsack substructure and give an exact polynomial time separation algorithm for these inequalities. In Section 3, we discuss an $O(n)$ time dynamic programming recursion for LCIM on a simple cycle. We propose the $(U,C)$ inequalities for arbitrary bidirectional graphs and develop a polynomial time separation algorithm via solving the longest path problem on a directed acyclic graph. We present the convex hull description of LCIM on a special case in Section 4. Finally, we illustrate the effectiveness of our proposed valid inequalities in the computational experiments in Section 5 and conclude with Section 6.

2 Valid inequalities based on mixed 0-1 knapsack polyhedron

To develop a strong formulation for LCIM, we study the polyhedral structure of constraints (1a). For $i\in V$ , let

\displaystyle\mathcal{P}_{i}=\left\{(x_{i},y,z_{i})\in\mathbb{R}_{+}\times\mathbb{B}^{v_{i}+1}:x_{i}+\sum_{j\in N(i)}d_{ji}y_{ji}\geq h_{i}z_{i}\right\}.

The set $\mathcal{P}_{i}$ is a mixing set with a binary variable on the right-hand side value. For any inequality that is facet-defining for $\operatorname{conv}{(\mathcal{P}_{i})}$ , it is facet-defining for $\operatorname{conv}{(\cap_{i\in V}\mathcal{P}_{i})}$ as well. Therefore, we now consider a single node propagation by dropping the subscript $i$ and obtain the following set

\displaystyle\mathcal{P}=\left\{(x,y,z)\in\mathbb{R}_{+}\times\mathbb{B}^{v+1}:x+\sum_{j\in N}d_{j}y_{j}\geq hz\right\}.

Observe that the set $\mathcal{P}$ contains a mixed 0-1 knapsack structure. Let set $\mathcal{\overline{P}}$ obtained from $\mathcal{P}$ by setting $\overline{y}_{j}=1-y_{j}$ , $j\in N$ and $z=1$ , then we have a mixed 0-1 knapsack set $\overline{\mathcal{P}}$ with weight $d_{j}$ for each item $j\in N$ and the capacity of knapsack $\sum_{j\in N}d_{j}-h$ plus an unbounded continuous variable $x$ in the following

\displaystyle\overline{\mathcal{P}}=\left\{(x,\overline{y},z)\in\mathbb{R}_{+}\times\mathbb{B}^{v}\times\{1\}:\sum_{j\in N}d_{j}\overline{y}_{j}\leq\left(\sum_{j\in N}d_{j}-hz\right)+x\right\}.

The mixed 0-1 knapsack set $\overline{\mathcal{P}}$ is a special case of traditional 0-1 knapsack problem where the knapsack size is expanded with additional capacity. Observe that $\dim(\mathcal{P})$ is full-dimensional and contains the origin. There are trivial facets for $\operatorname{conv}{(\mathcal{P})}$ and their verification is straightforward.

Proposition 1.

The following facet-defining inequalities for $\operatorname{conv}{(\mathcal{P})}$ are trivial.

(i)

The inequality $x+\sum_{j\in N}d_{j}y_{j}\geq hz$ is facet-defining for $\operatorname{conv}{(\mathcal{P})}$ if $d_{j}\leq h$ for all $j\in N$ .
(ii)

The inequality $x\geq 0$ is facet-defining for $\operatorname{conv}{(\mathcal{P})}$ if $d_{j}\leq h$ for all $j\in N$ .
(iii)

The inequality $y_{j}\geq 0$ is facet-defining for $\operatorname{conv}{(\mathcal{P})}$ .
(iv)

The inequality $y_{j}\leq 1$ is facet-defining for $\operatorname{conv}{(\mathcal{P})}$ .

The first result of Proposition 1 validates our data pre-processing assumption for influence weight and threshold when $d_{ij}>h_{i}$ . To show this, consider a single node influence propagation inequality

\displaystyle x+\sum_{j\in N:d_{j}\leq h}d_{j}y_{j}+\sum_{j\in N:d_{j}>h}d_{j}y_{j}\geq hz.

Clearly, this inequality is dominated by

\displaystyle x+\sum_{j\in N:d_{j}\leq h}d_{j}y_{j}+\sum_{j\in N:d_{j}>h}hy_{j}\geq hz

as the second inequality is faced-defining. Hence, we can obtain the facet-defining inequality by simply substituting any influence weight $d_{ji}$ with $h_{i}$ for all $j\in N(i)$ if $d_{ji}>h_{i}$ . Marchand and Wolsey [21] propose two classes of valid inequalities for $\overline{\mathcal{P}}$ based on mixed-integer rounding and lifting function, namely, the continuous cover and continuous packing (reversed cover) inequalities. We follow a similar idea to strengthen the formulation for LCIM with moderate modification as $\overline{\mathcal{P}}\subset\mathcal{P}$ . Applications of continuous cover and continuous packing inequalities for $\mathcal{\overline{P}}$ can be seen in several fields, including delay management for public transportation [22], job scheduling with uncertain multiple resources [23], discrete lot sizing [24] and single-item capacitated lot sizing [25]. They can also be extended to solve general mixed-integer optimization that contains mixed 0-1 knapsack set with bounded continuous variables, see [26, 27, 28].

2.1 Continuous cover and continuous packing inequalities

Consider a continuous cover $S:=\{1,\ldots,s\}\subseteq N$ such that $h+\sum_{j\in S}d_{j}-\sum_{j\in N}d_{j}=\pi>0$ and $h+\sum_{j\in S\setminus\{k\}}d_{j}-\sum_{j\in N}d_{j}<0$ for any $k\in S$ . Let $d_{j}\in S$ be in non-increasing order with $d_{1}\geq\ldots\geq d_{r}>\pi\geq d_{r+1}\geq\ldots\geq d_{s}$ , $D_{j}=\sum_{k=1}^{j}d_{k}$ for $j\leq r$ and $D_{0}=0$ .

Proposition 2.

The continuous cover inequality

\displaystyle x+\sum_{j\in S}\min\{\pi,d_{j}\}y_{j}+\sum_{j\in N\setminus S}\Phi(d_{j})y_{j}\geq\left(\min\limits_{j\in S}\{\pi,d_{j}\}+\sum_{j\in N\setminus S}\Phi(d_{j})\right)z

(2)

where

\displaystyle\Phi(d)=\begin{cases}j\pi&D_{j}\leq d\leq D_{j+1}-\pi,\quad j\in[0,r-1]\\ j\pi+d-D_{j}&D_{j}-\pi\leq d\leq D_{j},\quad j\in[1,r-1]\\ r\pi+d-D_{r}&D_{r}-\pi\leq d,\end{cases}

(3)

is valid for $\mathcal{P}$ .

Similarly, consider a continuous packing $L:=\{1,\ldots,l\}\subseteq N$ such that $\sum_{j\in L}d_{j}-h=\lambda>0$ and $\sum_{j\in L\setminus\{k\}}d_{j}-h<0$ for any $k\in L$ . Let $d_{j}\in L$ be in non-increasing order with $d_{1}\geq\ldots\geq d_{r}>\lambda\geq d_{r+1}\geq\ldots\geq d_{l}$ , $D_{j}=\sum_{k=1}^{j}d_{k}$ for $j\leq r$ and $D_{0}=0$ .

Proposition 3.

The continuous packing inequality

\displaystyle x+\sum_{j\in L}\max\{0,d_{j}-\lambda\}y_{j}+\sum_{j\in N\setminus L}\Psi(d_{j})y_{j}\geq\left(\sum_{j\in L}\max\{0,d_{j}-\lambda\}\right)z

(4)

where

\displaystyle\Psi(d)=\begin{cases}d-j\lambda&D_{j}\leq d\leq D_{j+1}-\lambda,\quad j\in[0,r-1]\\ D_{j}-j\lambda&D_{j}-\lambda\leq d\leq D_{j},\quad j\in[1,r-1]\\ D_{r}-\lambda r&D_{r}-\lambda\leq d,\end{cases}

(5)

is valid for $\mathcal{P}$ .

Here we omit the proofs as the validity of inequalities (2) and (4) and their facet-defining conditions for $\operatorname{conv}{(\mathcal{P})}$ directly follow from [21] when $z=1$ , while when $z=0$ , we must have $y_{j}=0$ for all $j\in N$ , both inequalities are trivially satisfied and facet-defining according to Proposition 1.

Example 1.

Let $d=(7,6,5,4)$ and $h=8$ , we list the facet-defining inequalities of inequality (2) and (4) in Table 1. For example, for $S=\{1,2,4\}$ , we have $\pi=3$ and $r=3$ . Then the lifting function $\Phi$ is given by

\displaystyle\Phi(d)=\begin{cases}0&0\leq d\leq 4\\ 3&7\leq d\leq 10\\ 6&13\leq d\leq 14\\ d-4&4\leq d\leq 7\\ d-7&10\leq d\leq 13\\ d-8&14\leq d\end{cases}

Hence, the coefficient of $y_{3}$ is $\Phi(d_{3})=\Phi(5)=5-4=1$ , which generates

\displaystyle x+3y_{1}+3y_{2}+y_{3}+3y_{4}\geq 4z.

Table 1: Continuous cover and continuous packing inequalities of Example 1

$x+7y_{1}+6y_{2}+5y_{3}+4y_{4}\geq 8z$
set	facet-defining inequality
$S=\{2,3,4\}$ , $L=\{1,2\}$	$x+y_{1}+y_{2}+y_{3}+y_{4}\geq 2z$
$S=\{1,3,4\}$ , $L=\{1,2\}$	$x+2y_{1}+y_{2}+2y_{3}+2y_{4}\geq 3z$
$S=\{1,2,4\}$ , $L=\{1,3\}$	$x+3y_{1}+3y_{2}+y_{3}+3y_{4}\geq 4z$
$S=\{1,2,3\}$ , $L=\{1,4\}$	$x+4y_{1}+4y_{2}+4y_{3}+y_{4}\geq 5z$
$S=\{1,2,4\}$ , $L=\{2,3\}$	$x+4y_{1}+3y_{2}+2y_{3}+3y_{4}\geq 5z$
$S=\{1,2,3\}$ , $L=\{2,4\}$	$x+5y_{1}+4y_{2}+4y_{3}+2y_{4}\geq 6z$
$S=\{1,2,3\}$ , $L=\{3,4\}$	$x+6y_{1}+5y_{2}+4y_{3}+3y_{4}\geq 7z$

Essentially, the continuous cover inequalities (2) and packing inequalities (4) are not sufficient to describe $\operatorname{conv}{(\mathcal{P})}$ , as the additional binary variable $z$ creates new extreme points. Next, we introduce a new class of valid inequalities for $\mathcal{P}$ that utilizes the concept of minimal influencing set.

2.2 Minimum influencing subset inequalities

We use the definition of minimal influencing set from [10], which we include here for the reader’s convenience.

Definition 1 ([10]).

Let $p_{i}\in[0,h_{i}-1]$ be an incentive payment to node $i\in V$ and $M\subseteq N(i)$ be a set of active neighbors of node $i\in V$ , such that $p_{i}+\sum_{j\in M}d_{ji}=h_{i}$ . We say $M$ is a minimal influencing subset for node $i\in V$ if and only if for a fixed incentive payment $\overline{p}_{i}$ , it satisfies $\overline{p}_{i}+\sum_{j\in M}d_{ji}=h_{i}$ and $\overline{p}_{i}+\sum_{j\in M\setminus\{k\}}d_{ji}<h_{i}$ for any $k\in M$ . In other words, a strict subset of $M$ with the same incentive payment is not sufficient to activate node $i$ .

Proposition 4.

Let $M\subseteq N$ be a minimum influencing subset with an incentive payment $p=h-\sum_{j\in M}d_{j}$ . The minimal influencing subset inequality

\displaystyle x+\sum_{j\in N\setminus M}\min\{d_{j},p\}y_{j}\geq pz

(6)

is valid for $\mathcal{P}$ .

Proof.

If $z=0$ then inequality (6) is trivially satisfied. If $y_{j}=0$ for all $j\in N\setminus M$ , either $x=0$ for $z=0$ or $x=p$ for $z=1$ . Assume that none of these cases hold, given a $p>0$ , rewrite the left term of the inequality in the following form:

		$\displaystyle x+\sum_{j\in N}d_{j}y_{j}$
	$\displaystyle=~{}$	$\displaystyle x+\sum_{j\in N\setminus M:d_{j}\leq p}d_{j}y_{j}+p\sum_{j\in N\setminus M:d_{j}>p}y_{j}+\sum_{j\in M}d_{j}y_{j}\geq h,$

which implies

\displaystyle x+\sum_{j\in N\setminus M:d_{j}\leq p}d_{j}y_{j}+p\sum_{j\in N\setminus M:d_{j}>p}y_{j}\geq h-\sum_{j\in M}d_{j}y_{j}\geq h-\sum_{j\in M}d_{j}=p.

∎

Proposition 5.

Inequality (6) is facet-defining for $\operatorname{conv}{(\mathcal{P})}$ if and only if $p>0$ . Moreover, for a given $i\in V$ and a set $N(i)$ , for each $M\subseteq N(i)$ such that $h_{i}-\sum_{j\in M}d_{ji}=p_{i}>0$ , the minimal influencing subset inequality

\displaystyle x_{i}+\sum_{j\in N(i)\setminus M}\min\{d_{ji},p_{i}\}y_{ji}\geq p_{i}z_{i}

(7)

is facet-defining for $\operatorname{conv}{(\cap_{i\in V}\mathcal{P}_{i})}$ .

Proof.

Recall that $p\in[0,h-1]$ by definition. If $p=0$ , then inequality (6) reduces to $x\geq 0$ ; therefore, $p>0$ is necessary. To show the sufficiency that inequality (6) is facet-defining for $\operatorname{conv}{(\mathcal{P})}$ , we exhibit $v+1$ linearly independent points $(x,\mathbf{y},z)$ on the face defined by inequality (6). Let $e_{j}\in\mathbb{B}^{v}$ be the unit vector corresponding to $y_{j}$ for $j\in N$ . Consider the two feasible points $(p,\sum_{j\in M}e_{j},1)$ and $(0,\sum_{j\in M}e_{j},0)$ , then, consider the $v-1$ feasible points $(0,\sum_{j\in M}(e_{j}+e_{k}),1)$ for $k\in L\setminus M$ . It is straightforward to verify that these $v+1$ points are linearly independent and satisfy inequality (6) at equality. To prove the second part of this proposition, let $\eta_{i}\in\mathbb{B}^{2n+m},\mu_{ij}\in\mathbb{B}^{2n+m}$ , and $\zeta_{i}\in\mathbb{B}^{2n+m}$ for $i\in V,j\in N(i)$ be the unit vectors corresponding to variables $x_{i},y_{ij}$ , and $z_{i}$ , respectively. The component of $\eta_{i}$ is 1 if it has the same position with $x_{i}$ in the feasible solution; all other components of $\eta_{i}$ are 0. Similar setting is made to $\mu_{ij}$ for $y_{ij}$ and $\zeta_{i}$ for $z_{i}$ , respectively. For $i\in V$ , consider the $n$ points $p_{i}\eta_{i}+\sum_{j\in M}\mu_{ji}+\zeta_{i}$ , also, consider the $n$ points $\sum_{j\in M}\mu_{ji}$ . For $i\in V$ and $k\in L\setminus M$ , consider the $m-1$ points $\sum_{j\in M}(\mu_{ji}+\mu_{ki})+\zeta_{i}$ . These $2n+m-1$ points are linearly independent and satisfy inequality (7) at equality, which completes the proof. ∎

Example 1 (Continued).

The facet-defining inequalities of (6) for Example 1 are listed in Table 2.

Table 2: Minimal influencing subset inequalities of Example 1

$x+7y_{1}+6y_{2}+5y_{3}+4y_{4}\geq 8z$
set	facet-defining inequality
$M=\{1\}$	$x+y_{2}+y_{3}+y_{4}\geq z$
$M=\{2\}$	$x+2y_{1}+2y_{3}+2y_{4}\geq 2z$
$M=\{3\}$	$x+3y_{1}+3y_{2}+3y_{4}\geq 3z$
$M=\{4\}$	$x+4y_{1}+4y_{2}+4y_{3}\geq 4z$

Although inequalities (2), (4) and (6) define a large number of facets for $\operatorname{conv}{(\mathcal{P})}$ , they are not sufficient to completely describe $\operatorname{conv}{(\mathcal{P})}$ in its original space of variables. Particularly, the following inequality is valid and facet-defining for Example 1 but can not be obtained through either inequalities (2), (4) or (6):

\displaystyle x+3y_{1}+2y_{2}+2y_{3}+2y_{4}\geq 4z.

2.3 Separation of minimal influencing subset inequalities

In this section, we give an exact polynomial time separation algorithm for finding the most violated minimal influencing subset inequality. From inequality (7), we observe that finding the most violated inequality for a given fractional solution $(\mathbf{x^{*}},\mathbf{y^{*}},\mathbf{z^{*}})\in\mathbb{R}_{+}^{2n+m}$ consists of choosing a set $M\subseteq N(i)$ such that $p_{i}z_{i}-\sum_{j\in N(i)\setminus M}\min\{d_{ji},p_{i}\}y_{ji}$ is maximized. Let $\hat{v}:=\max\{v_{i}:i\in V\}$ .

Proposition 6.

Given a fractional solution $(\mathbf{x^{*}},\mathbf{y^{*}},\mathbf{z^{*}})\in\mathbb{R}_{+}^{2n+m}$ from solving LCIM, there exists an $O(n\hat{v}\log\hat{v})$ time separation algorithm for inequality (7).

Proof.

A violated inequality (7) can be found if

\displaystyle p_{i}\left(z_{i}^{*}-\sum_{j\in N(i)\setminus M:d_{ji}>p_{i}}y_{ji}^{*}\right)-\sum_{j\in N(i)\setminus M:d_{ji}\leq p_{i}}d_{ji}y_{ji}^{*}>x_{i}^{*},

which implies that it suffices to consider $y_{ji}^{*}$ for some $j\in N(i)$ such that $z_{i}^{*}-\sum_{j\in N(i)}y_{ji}^{*}>0$ and $p_{i}>0$ . To do so, we sort $y_{ji}^{*}$ in non-decreasing order for $j\in N(i)$ with indices $j_{1},j_{2},\ldots,j_{v}$ such that $y_{j_{1}i}^{*}\leq y_{j_{2}i}^{*}\leq\ldots\leq y_{j_{v}i}^{*}$ . For $j_{1}\leq j_{k}\leq j_{v}$ , we sum up first $k$ elements, then we check if $z_{i}^{*}-\sum_{\ell=1}^{k}y_{j_{\ell}i}^{*}>0$ and $p_{i}^{\prime}=h_{i}-\sum_{\ell=k+1}^{v}d_{j_{\ell}i}>0$ , until $z_{i}^{*}-\sum_{\ell=1}^{k+1}y_{j_{\ell}i}^{*}<0$ . These $k$ elements constitute the subset $M$ and $N(i)\setminus M$ simultaneously and ensure $z_{i}^{*}-\sum_{j\in N(i)\setminus M}y_{ji}^{*}>0$ and $p_{i}>0$ in order to generate a violated cut. The set $M$ corresponds to the most violated cut can be determined by evaluating $\max\{0,p_{i}^{\prime}(z_{i}^{*}-\sum_{\ell=1}^{k}y_{j_{\ell}i}^{*}):k\in[1,v]\}$ . If $\max\{0,p_{i}^{\prime}(z_{i}^{*}-\sum_{\ell=1}^{k}y_{j_{\ell}i}^{*}):k\in[1,v]\}=0$ , then there are no violated cuts. The sorting process runs in $O(\hat{v}\log\hat{v})$ time and the evaluation takes $O(\hat{v})$ , since we have to check for every node $i\in V$ , overall the separation algorithm runs in $O(n\hat{v}\log\hat{v})$ time. ∎

2.4 Separation of continuous cover and continuous packing inequalities

Up to this point, we presented an exact polynomial time separation algorithm for inequalities (7). Next, we show that a violated continuous cover inequality for $\operatorname{conv}{(\cap_{i\in V}\mathcal{P}_{i})}$ can be identified by exploiting the result of Proposition 6. Then we can obtain a violated continuous packing inequality in polynomial time after finding a violated continuous cover inequality. We first formally establish the relationship between $S$ , $L$ and $M$ in the following lemma.

Lemma 1.

If $p=h-\sum_{j\in M}d_{j}>0$ and there exists $k\in N\setminus M$ such that $\sum_{j\in M\cup\{k\}}d_{j}>h$ and $\sum_{j\in M\cup\{k\}\setminus\{\ell\}}d_{j}<h$ for any $\ell\in M$ , then $p=\pi$ , $S=N\setminus M$ , $\sum_{j\in M\cup\{k\}}d_{j}-h=\lambda$ and $L=M\cup\{k\}$ .

Proof.

By rearranging the terms in the definition of $p$ ,

	$\displaystyle p$	$\displaystyle=h-\sum_{j\in M}d_{j}>0$
		$\displaystyle=h+\sum_{j\in N\setminus M}d_{j}-\sum_{j\in N}d_{j}>0$
		$\displaystyle=h+\sum_{j\in S}d_{j}-\sum_{j\in N}d_{j}>0.$

Hence, we can see that $S$ is equivalent to $N\setminus M$ and $p=\pi$ if $p>0$ . Next, if there exists an element $k\in N\setminus M$ such that $\sum_{j\in M\cup\{k\}}d_{j}>h$ , immediately we have $M\cup\{k\}=L$ by definition and $\sum_{j\in M\cup\{k\}}d_{j}-h=\lambda$ . Since $N\setminus M\cup\{M\cup\{k\}\}=N$ , we have $S\cup L=N$ , $S\cap L=\{k\}$ , and $L\setminus\{k\}=N\setminus S=M$ . ∎

Following Lemma 1, we present an efficient separation procedure to determine violated continuous cover and continuous packing inequalities by using the information of the set $M$ . Note that here we add an index $i$ to inequalities (2) similar to (7) for all $i\in V$ .

Proposition 7.

There exists a violated continuous cover inequality if a violated inequality (7) is identified.

Proof.

Recall that inequality (7) is violated if

\displaystyle p_{i}\left(z_{i}^{*}-\sum_{j\in N(i)\setminus M:d_{ji}>p_{i}}y_{ji}^{*}\right)-\sum_{j\in N(i)\setminus M:d_{ji}\leq p_{i}}d_{ji}y_{ji}^{*}>x_{i}^{*},

or equivalently by Lemma 1,

\displaystyle\pi_{i}z_{i}^{*}-\pi_{i}\sum_{j\in S:d_{ji}>\pi_{i}}y_{ji}^{*}-\sum_{j\in S:d_{ji}\leq\pi_{i}}d_{ji}y_{ji}^{*}>x_{i}^{*}.

Now, a continuous cover inequality for a fixed node $i\in V$ is violated if

\displaystyle\min\limits_{j\in S}\{\pi_{i},d_{ji}\}z_{i}^{*}+\sum_{j\in N(i)\setminus S}\Phi(d_{ji})(z_{i}^{*}-y_{ji}^{*})-\sum_{j\in S}\min\{\pi_{i},d_{ji}\}y_{ji}^{*}>x_{i}^{*}.

Suppose $d_{ji}>\pi_{i}$ for all $j\in S$ , then the left term of the continuous cover inequality can be further written as

\displaystyle\pi_{i}z_{i}^{*}+\sum_{j\in N(i)\setminus S}\Phi(d_{ji})(z_{i}^{*}-y_{ji}^{*})-\pi_{i}\sum_{j\in S:d_{ji}>\pi_{i}}y_{ji}^{*}-\sum_{j\in S:d_{ji}\leq\pi_{i}}d_{ji}y_{ji}^{*}.

Since $(z_{i}^{*}-y_{ji}^{*})\geq 0$ holds and the lifting function $\Phi$ is nonnegative, the rest of the terms already violate the current solution $(\mathbf{x^{*}},\mathbf{y^{*}},\mathbf{z^{*}})$ , we then obtain a violated continuous cover inequality with $\pi_{i}$ being the minimum among $\min\{\pi_{i},d_{ji}:j\in S\}$ . This suggests that, when a violated inequality (7) is found, it suffices to generate a violated continuous cover inequality concurrently. ∎

On the other hand, a violated continuous packing inequality can not be obtained directly from separating inequality (7). However, when a violated continuous cover inequality is found, we can check every element in $S$ to see if there exists an element $k$ such that $\sum_{j\in N\setminus S\cup\{k\}}d_{ji}>h_{i}$ . For every $k$ satisfying this condition, the packing set $L$ is then determined. A violated continuous packing inequality is found if

\displaystyle\sum_{j\in L}\max\{0,d_{ji}-\lambda_{i}\}(z_{i}^{*}-y_{ji}^{*})-\sum_{j\in N(i)\setminus L}\Psi(d_{ji})y_{ji}^{*}>x^{*}

holds. Furthermore, suppose $\hat{S}=\max\{|S|:S\subseteq N(i),i\in V\}$ , the process of checking elements in $S$ takes $O(\hat{S})$ time and the function $\Psi$ can be constructed in $O(\hat{v}\log\hat{v})$ time using binary search proposed in [26].

3 Valid inequalities for LCIM with cycles

In this section, we expand the study of $\operatorname{conv}{(\cap_{i\in V}\mathcal{P}_{i})}$ to incorporate the remaining constraints in LCIM on an arbitrary bidirectional graph that contains cycles. The polyhedron that describes the intersection of these constraints is

\displaystyle\mathcal{Q}=\left\{(x,y,z)\in\mathbb{R}_{+}^{n}\times\mathbb{B}^{n+m}:\eqref{LCIM1}-\eqref{LCIM4}\right\}.

To simplify the notation, we let $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ be the coefficients associated with variables $\mathbf{y}$ and $\mathbf{z}$ corresponding to continuous cover and continuous packing inequalities. In other words, we express them in the following form:

\displaystyle x_{i}+\sum_{j\in N(i)}\alpha_{ji}y_{ji}\geq\beta_{i}z_{i},

(8)

and it is facet-defining for $\operatorname{conv}{(\mathcal{Q})}$ . Due to the straightforward structure of a cycle, the minimum incentive of the influence propagation can be easily characterized. Raghavan and Zhang [29] give an $O(n)$ time algorithm for the weighted target set selection problem on a cycle. We also present a polynomial time dynamic programming to solve LCIM on a simple cycle. Then we give a class of exponential number of valid inequalities that forms acyclic influence propagation by exploiting inequality (8) as the base inequality. We also demonstrate that the separation for this class of valid inequalities can be done in polynomial time for arbitrary bidirectional graphs that contain cycles.

3.1 Dynamic programming recursion for LCIM on a simple cycle

Without loss of generality, we assume $|V(C)|=|V|=n$ in this section. Hence, for a smple cycle graph, we have $V=V(C)=n$ , $E=C$ and $|E|=2n$ with $v_{i}=2$ for all $i\in V$ . We still use $V(C)$ for the set of all nodes and $C$ for set of all bidirectional arcs to focus on the discussion of LCIM on a simple cycle for consistency. Observe that due to the cycle structure, the influence propagation occurs on an induced path of a cycle as it is one-way and consecutive after a particular node is activated by paying full incentive to it. Given the cardinality requirement $b\leq n-1$ , the cost of activating other nodes on this path is equivalent to the threshold of the inactivated node minus the influence weights exerted from the activated predecessor on the side. For $b=n$ , the cost of activating the last node is zero as it receives influence exertion from both its predecessor and the firstly activated node. Therefore, we only need to evaluate the cases for $b\leq n-1$ .

Refer to caption — Figure 1: An induced subgraph of a cycle is an one-way path

Given $b\leq n-1$ , for $i\in V(C)$ , construct a node set $\overrightarrow{V}_{ib}$ that contains node $i$ with forward arcs connecting nodes starting from $i$ with total number of nodes equals to $b$ . Let $i_{1},i_{2},\ldots,i_{b-1},i_{b}$ be the indices in $\overrightarrow{V}_{ib}$ . The corresponding set of forward arcs is then $\overrightarrow{C}_{ib}=\{(i_{j},i_{j+1}):j\in[1,b-1]\}$ as illustrated in Figure 1. We apply the similar logic for backward set of nodes $\overleftarrow{V}_{ib}$ and arcs $\overleftarrow{C}_{ib}$ . Let $F(i,b)$ denote the minimum cost of activating $b$ nodes on the cycle beginning with node $i$ . For $1\leq b\leq n-1$ , the minimum cost of activating $b$ nodes on a cycle is given by

\displaystyle\min\limits_{i\in V(C)}F(i,b)

(9)

where

\displaystyle F(i,b)=\min\left\{h_{i}+\sum_{(j,k)\in\overrightarrow{C}_{ib}}(h_{k}-d_{jk}),\quad h_{i}+\sum_{(j,k)\in\overleftarrow{C}_{ib}}(h_{k}-d_{jk})\right\}.

(10)

Proposition 8.

The dynamic programming recursion given by (9) and (10) solves LCIM on a simple cycle in $O(n)$ time.

Proof.

The recursion (10) evaluates the minimum cost of activating $b$ nodes on a path beginning with node $i$ on both directions. As we only need to compare the minimum of (10) for every node, we obtain the optimal objective function of LCIM on a simple cycle in $O(n)$ time. ∎

3.2 Valid inequalities for influence propagation over a cycle

Since every network consists of trees and cycles as substructures, observe that for every cycle in the network, at least one node is either paid with full incentive or the activation requires influence exertion from nodes outside the cycle. Fischetti et al. [10] first recognized this observation and proposed a generalized propagation constraints in a different space of variables. Here we propose an exponential class of valid inequalities that captures this observation as well as ensures the influence propagation is acyclic for $\operatorname{conv}{(\mathcal{Q})}$ .

Proposition 9.

Given an inequality (8) and a cycle with set of nodes $V(C)$ and set of arcs $C$ , for $U\subseteq V(C)$ , the $(U,C)$ inequality

\displaystyle\sum_{i\in U}\gamma_{i}\left(x_{i}+\sum_{j\in N(i)}\alpha_{ji}y_{ji}-\beta_{i}z_{i}\right)\geq\delta(U)\left(1-\sum_{(k,\ell)\in C:\ell\notin U}(z_{\ell}-y_{k\ell})\right)

(11)

is valid for $\mathcal{Q}$ , where

(i)

$\omega_{i}=h_{i}-\beta_{i}+\sum_{j\in N(i):j\notin V(C)}(\alpha_{ji}-d_{ji})$ ,
(ii)

$\delta(U)=\omega_{i}$ if $|U|=1$ ,
(iii)

$\delta(U)$ computes the least common multiple of $\omega_{i}$ for $i\in U$ if $|U|\geq 2$ ,
(iv)

$\gamma_{i}=\frac{\delta(U)}{\omega_{i}}$ .

Proof.

Due to constraint (1c), there exists $z_{i}=1$ for some $i\in V(C)$ . Let $H=\{(k,\ell)\in C:\ell\notin U\}$ . We partition $H$ into two disjoint sets $H_{0}$ and $H_{1}$ such that $H=H_{0}\cup H_{1}$ and $H_{0}\cap H_{1}=\varnothing$ , where $H_{0}=\{(k,\ell)\in H:k\notin U\}$ and $H_{0}=\{(k,\ell)\in H:k\in U\}$ . We distinguish two main cases:

Case 1.

We first consider $z_{i}=0$ for all $i\in U$ and $z_{i}=1$ for all $i\in V(C)\setminus U$ . In this case, the left hand side of the inequality is equal to 0, whereas in the right hand side, we have $k\in V(C)\setminus U$ and $\ell\in\in V(C)\setminus U$ , if there exists at least one $y_{k\ell}=1$ , we must have $z_{k}=z_{\ell}=1$ , which leads to $|V(C)\setminus U|>|H_{0}|$ . Consequently,

\displaystyle\delta(U)\left(1-\sum_{i\in V(C)\setminus U}z_{i}+\sum_{(k,\ell)\in H_{0}}y_{k\ell}\right)<0

holds, inequality (11) is thus valid.

Case 2.

Next we consider $z_{i}=1$ for all $i\in U$ and $z_{i}=0$ for all $i\in V(C)\setminus U$ . In this case, we must have $y_{k\ell}=0$ for all $(k,\ell)\in H$ as influence exertion towards nodes belong to $V(C)\setminus U$ is unnecessary. Then the inequality reduces to

\displaystyle\sum_{i\in U}\gamma_{i}\left(x_{i}+\sum_{j\in N(i)}\alpha_{ji}y_{ji}-\beta_{i}\right)\geq\delta(U).

Regardless of whether $y_{ji}=0$ for all $i\in N(i)$ or $y_{ji}=1$ for some $j\in N(i)$ , there exists at least one index $i\in U$ such that $x_{i}=h_{i}$ and $y_{ji}=0$ . In other words, at least one node is activated with full incentive payment in order to launch the propagation for nodes in $U$ . For other nodes that receive partial or zero incentives, their corresponding term in the left hand side is zero. To compute the possible range of the left hand side, we rearrange the terms and obtain the following

		$\displaystyle\sum_{i\in U}\gamma_{i}(h_{i}-\beta_{i})$
	$\displaystyle=$	$\displaystyle\sum_{i\in U}\frac{\delta(U)}{(h_{i}-\beta_{i})}(h_{i}-\beta_{i})$
	$\displaystyle=$	$\displaystyle\sum_{i\in U}\delta(U).$

Consequently, the left hand side is at most $|U|\delta(U)$ and at least $\delta(U)$ with one particular node with full incentive $x_{i}=h_{i}$ , which is valid. This completes the proof.

∎

Example 2.

Consider a graph $G$ illustrated in Figure 2. The number next to each arc is the influence weight $d_{ij}$ , while the number inside the brackets next to each node is the threshold $h_{i}$ .

For node 2, take $L=\{1,3,5\}$ with $\lambda_{2}=3+4+5-10=2$ , the corresponding continuous packing inequality (8) is

\displaystyle x_{2}+y_{12}+2y_{32}+3y_{52}\geq 6z_{2}.

Similarly, for node 3, take $L=\{1,2\}$ with $\lambda_{3}=6+3-7=2$ , the corresponding continuous packing inequality (8) is

\displaystyle x_{3}+y_{13}+4y_{23}\geq 5z_{3}.

There are two cycles $\{(1,2),(2,3),(3,1)\}$ and $\{(1,3),(3,2),(2,1)\}$ in Figure 2. For $U=\{2,3\}$ with cycle $C=\{(1,2),(2,3),(3,1)\}$ , we have $\omega_{2}=10-6+3-5=2$ , $\omega_{3}=7-5=2$ , $\delta(\{2,3\})=\texttt{lcm}(2,2)=2$ , then the $(U,C)$ inequality is

\displaystyle(x_{2}+y_{12}+2y_{32}+3y_{52}-6z_{2})+(x_{3}+y_{13}+4y_{23}-5z_{3})\geq 2(1-z_{1}+y_{31}).

Furthermore, consider the cycle $C=\{(1,3),(3,2),(2,1)\}$ from another direction, the $(U,C)$ inequality is

\displaystyle(x_{2}+y_{12}+2y_{32}+3y_{52}-6z_{2})+(x_{3}+y_{13}+4y_{23}-5z_{3})\geq 2(1-z_{1}+y_{21}).

Next, we give the condition in which inequality (11) is stronger than the generalized cycle elimination constraints. Without loss of generality, we assume that $\delta(U)=1$ for $U=\varnothing$ .

Proposition 10.

Inequality (11) with $U=\varnothing$ dominates the generalized cycle elimination constraints.

Proof.

Consider a particular cycle $C$ , the GCEC can be states as

\displaystyle\sum_{(i,j)\in C}(z_{j}-y_{ij})\geq z_{k},

where $k\in V(C)$ is an arbitrary choice of index among $V(C)$ . For a $(U,C)$ inequality (11) with $U=\varnothing$ on this cycle, we obtain

\displaystyle\sum_{(i,j)\in C}(z_{j}-y_{ij})\geq 1.

Clearly, the GCEC is weaker then the $(U,C)$ inequality unless $z_{k}^{*}=1$ . Furthermore, if we have $\sum_{(i,j)\in C}(z_{j}-y_{ij})<z_{k}$ , then $\sum_{(i,j)\in C}(z_{j}-y_{ij})<1$ must hold. This implies that there exist violated $(U,C)$ inequalities for every violated cycle $C$ identified. ∎

3.3 Separation of $(U,C)$ inequalities

Since the size of of inequalities (11) is exponential, we explore a separation scheme to find the most violated inequality corresponding to set $U$ in polynomial time. We again assume that $|V(C)|=n$ in this section.

Proposition 11.

Separation problem for inequality (11) can be solved in $O(n^{3}\log n)$ time.

Proof.

Inequality (11) is violated if

\displaystyle\delta(U)\left(1-\sum_{(k,\ell)\in C:\ell\notin U}(z_{\ell}^{*}-y_{k\ell}^{*})\right)-\sum_{i\in U}\gamma_{i}\left(x_{i}^{*}+\sum_{j\in N(i)}\alpha_{ji}y_{ji}^{*}-\beta_{i}z_{i}^{*}\right)>0.

(12)

For a given fractional point $(\mathbf{x^{*}},\mathbf{y^{*}},\mathbf{z^{*}})\in\mathcal{Q}$ , we determine a set $U\subseteq V(C)$ such that the left hand side of (12) is maximized. With the observation in Proposition 10, for every violated cycle detected, we construct a longest path problem on a directed acyclic network to solve the separation problem.

Consider a directed acyclic network $\mathcal{D}=(\mathcal{V},\mathcal{A})$ with a source vertex $0\in\mathcal{V}$ and a sink vertex $n+1\in\mathcal{V}$ . Define $\theta_{i}=x_{i}^{*}+\sum_{j\in N(i)}\alpha_{ji}y_{ji}^{*}-\beta_{i}z_{i}^{*}$ for all $i\in[1,n]$ . It is possible that there exists multiple inequality (8) for a fixed $i$ . We select the one with the minimum value of $\theta_{i}$ accordingly. Let index set $\{i^{\prime}:i\in[1,n]\}$ such that $\theta_{1^{\prime}}\leq\theta_{2^{\prime}}\leq\ldots\leq\theta_{n^{\prime}}$ . Node $i^{\prime}$ is sorted according the the value of $\theta_{i}$ and each node $i^{\prime}\in\mathcal{V}$ has a unique mapping to each node $i\in V(C)$ . The node set $\mathcal{V}$ is then $\{0,n+1\}\cup\{i^{\prime}:i\in[1,n]\}$ . The arc set is $\mathcal{A}=\{(0,n+1)\}\cup\{(0,1^{\prime})\}\cup\{(i^{\prime},(i+1)^{\prime}):i\in[1,n-1]\}\cup\{(i^{\prime},n+1):i\in[1,n]\}$ .

Next, we assign length on each arc in $\mathcal{A}$ . For arc $(0,n+1)$ , we let

\displaystyle f_{0,n+1}=\max\left\{\omega_{i}\left(1-\sum_{(k,\ell)\in C:\ell\neq i}(z_{\ell}^{*}-y_{k\ell}^{*})\right)-\theta_{i}:i\in V(C)\right\}.

Let $f_{0,1^{\prime}}=\theta_{1^{\prime}}$ . For arcs $\{(i^{\prime},(i+1)^{\prime}):i\in[1,n-1]\}$ , we set the length $f_{i^{\prime},(i+1)^{\prime}}=\theta_{(i+1)^{\prime}}$ . For arcs $\{(i^{\prime},n+1):i\in[1,n]\}$ , we set the length

	$\displaystyle f_{i^{\prime},n+1}=$
	$\displaystyle\delta(\{\omega_{j}\}_{j=1^{\prime}}^{i^{\prime}})\left(1-\sum_{(k,\ell)\in C:\ell\notin\{j\}_{j=1^{\prime}}^{i^{\prime}}}(z_{\ell}^{}-y_{k\ell}^{})\right)-\sum_{k=1^{\prime}}^{i^{\prime}}\left(\frac{\delta(\{\omega_{j}\}_{j=1^{\prime}}^{i^{\prime}})}{\omega_{k}}+1\right)\theta_{k}.$

This longest path problem depicted in Figure 3 can be solved by Dijkstra’s algorithm. There exists a violated inequality (11) if and only if the longest path is strictly positive and the nodes on this path determine the elements in set $U$ . The sorting process of $\theta_{i}$ takes $O(n\log n)$ time, the evaluation of $f_{0,n+1}$ takes $O(n)$ time, and the longest path on a directed acyclic graph takes $O(n)$ time as there are $n+2$ nodes and $2n+1$ arcs. Since we have to solve this problem for every violated cycle found by Dijkstra algorithm, the separation algorithm runs $O(n^{3}\log n)$ time overall. ∎

Example 3.

Consider solving LCIM with $b=3$ on a graph depicted in Figure 2. The initial linear programming relaxation solution without enumerating any cycle elimination constraints is

	$\displaystyle(x_{1}^{},x_{2}^{},x_{3}^{},x_{4}^{},x_{5}^{*})=(0,4.92,0.6,3,0),$
	$\displaystyle(z_{1}^{},z_{2}^{},z_{3}^{},z_{4}^{},z_{5}^{*})=(0.6,0.6,0.6,0.6,0.6),$
	$\displaystyle(y_{12}^{},y_{13}^{},y_{14}^{},y_{21}^{},y_{23}^{},y_{25}^{},y_{31}^{},y_{32}^{},y_{41}^{},y_{52}^{})=(0.36,0,0,0.24,0.6,0.6,0.6,0,0.6,0),$

and the objective function value is 8.52. The violated cycle is $\{(1,2),(2,3),(3,1)\}$ for this solution. Then we obtain $\theta_{1}=-0.72$ , $\theta_{2}=1.68$ and $\theta_{3}=0$ and sort them in non-decreasing order. Since $\theta_{1}<\theta_{3}<\theta_{2}$ , the set $\{1^{\prime},2^{\prime},3^{\prime}\}$ is corresponding to the original node set $\{1,3,2\}$ . This leads to a directed acyclic network illustrated in Figure 4 with new node set $\mathcal{V}=\{0,1^{\prime},2^{\prime},3^{\prime},4\}$ . The length of $f_{0,1^{\prime}},f_{1^{\prime},2^{\prime}}$ and $f_{2^{\prime},3^{\prime}}$ are $\theta_{1},\theta_{3}$ and $\theta_{2}$ , respectively. The lengths of the remaining arcs are

	$\displaystyle f_{0,4}=\max\{3(1-0.24)-(-0.72),2-1.68,2\}=3,$
	$\displaystyle f_{1^{\prime},4}=3(1-0.24)-(\frac{3}{3}+1)(-0.72)=3.72,$
	$\displaystyle f_{2^{\prime},4}=6(1-0.24)-(\frac{6}{3}+1)(-0.72)=6.72,$
	$\displaystyle f_{3^{\prime},4}=6-(\frac{6}{3}+1)(-0.72)-(\frac{6}{2}+1)(1.68)=1.44.$

The longest path is $0\rightarrow 1^{\prime}\rightarrow 2^{\prime}\rightarrow 4$ , which determines $U=\{1,3\}$ with maximum violation 6. We add the following violated inequality (11)

\displaystyle 2(x_{1}+2y_{21}+4y_{31}+6y_{41}-12z_{1})+3(x_{3}+y_{13}+4y_{23}-5z_{3})\geq 6(1-z_{2}+y_{12})

to cut off this fractional solution. We then obtain the new objective function value 10.2 and the solution sets

	$\displaystyle(x_{1}^{},x_{2}^{},x_{3}^{},x_{4}^{},x_{5}^{*})=(0,6,2.2,2,0),$
	$\displaystyle(z_{1}^{},z_{2}^{},z_{3}^{},z_{4}^{},z_{5}^{*})=(0.6,0.6,0.6,0.6,0.6),$
	$\displaystyle(y_{12}^{},y_{13}^{},y_{14}^{},y_{21}^{},y_{23}^{},y_{25}^{},y_{31}^{},y_{32}^{},y_{41}^{},y_{52}^{})=(0,0,0.2,0.6,0.6,0.6,0.6,0,0.4,0),$

which is very close to the true optimal objective function value 11 in this example.

4 LCIM under equal influence and 100% adoption

Since the optimal propagation subgraph of LCIM is acyclic, the solution of LCIM on a tree provides a valid lower bound for LCIM on a graph with cycles. Moreover, in practical applications, it is common to assume that both threshold and influence exertion are identical for every node, due to simplicity or lack of accurate estimation. For example, in the unanimous threshold model [5], $h_{i}=v_{i}$ for all $i\in V$ is assumed. This diffusion model is normally considered as the most influence resistant one, and it has applications in complex computer network security problems. In addition, the majority threshold model [30] assumes $h_{i}=\lceil\frac{v_{i}}{2}\rceil$ for all $i\in V$ . Both information diffusion models assume that $d_{ij}=1$ for all $(i,j)\in E$ .

In this special case of LCIM where equal influence is assumed for all $i\in V$ and 100% coverage is required on a tree, we have $d_{ij}=d_{i}$ for all $i\in V$ and GCEC can be discarded. The LCIM formulation corresponding to equal influence and 100% coverage on a tree graph is given by


$\displaystyle\text{(LCIM-TE) }\min\limits_{x,y}\quad$	$\displaystyle\sum_{i\in V}x_{i}$
s.t.	$\displaystyle x_{i}+d_{i}\sum_{j\in N(i)}y_{ji}\geq h_{i}\quad\forall i\in V$	(13a)
	$\displaystyle y_{ij}+y_{ji}=1\quad\forall(i,j)\in E:i<j$	(13b)
	$\displaystyle x\in\mathbb{R}_{+}^{n}$	(13c)
	$\displaystyle y\in\mathbb{B}^{m}.$	(13d)

Let $\mathcal{S}$ denote the set of feasible solutions to LCIM-TE on a tree graph and let $\mathcal{R}$ denote the set of feasible solutions to the linear programming relaxation of (13a) - (13d). Günneç et al. [12] prove that LCIM-TE is polynomial solvable on a tree graph. They propose a compact extended formulation with total unimodular constraints by considering three types of incoming influence for every node. However, this extended formulation cannot be applied to unequal influence weights directly, since there would be exponentially many distinct possible types of incoming influence for a node, depending on the influence weights received from its activated neighbors. We give the complete linear description of $\operatorname{conv}{(\mathcal{S})}$ in the original space of variables with additional $O(n)$ constraints and show they are a special case of the continuous cover and continuous packing inequalities by adjusting the influence weights.

Proposition 12.

Let $\sigma_{i}=\lceil\frac{h_{i}}{d_{i}}\rceil$ and $g_{i}=h_{i}-(\sigma_{i}-1)d_{i}$ for all $i\in V$ , the inequality

\displaystyle x_{i}+\min\{g_{i},d_{i}\}\sum_{j\in N(i)}y_{ji}\geq g_{i}\sigma_{i}

(14)

is facet-defining for $\operatorname{conv}{(\mathcal{S})}$ if and only if $g_{i}<d_{i}$ and $\sigma_{i}\geq 2$ . Furthermore, the complete linear description of $\operatorname{conv}{(\mathcal{S})}$ is given by

\displaystyle\operatorname{conv}{(\mathcal{S})}=\left\{(x,y)\in\mathcal{R}:x_{i}+\min\{g_{i},d_{i}\}\sum_{j\in N(i)}y_{ji}\geq g_{i}\sigma_{i}\quad\forall i\in V\right\}.

Proof.

If $g_{i}=d_{i}$ , then $h_{i}=\sigma_{i}d_{i}$ and inequality (14) coincides with $\eqref{LCIM-E1}$ . Similarly, if $\sigma_{i}=1$ , we also have $g_{i}=h_{i}=d_{i}$ and inequality (14) is reduced to $\eqref{LCIM-E1}$ . To prove the sufficiency, we demonstrate that inequality (14) is a special case of the continuous cover and continuous packing inequalities. Observe that $\sigma_{i}$ is the minimum number that exceeds $h_{i}$ if multiplied by $d_{i}$ , which implies that $h_{i}-(\sigma_{i}-1)d_{i}>0$ . Therefore, $\sigma_{i}$ is the cardinality of set $L$ corresponding to the continuous packing inequality with equal influence weights. We thus obtain $\lambda_{i}=d_{i}\sigma_{i}-h_{i}$ and $g_{i}=d_{i}-\lambda_{i}$ . Equivalently,

	$\displaystyle g_{i}$	$\displaystyle=h_{i}-(\sigma_{i}-1)d_{i}$
		$\displaystyle=h_{i}+\left[\|N(i)\|-\sigma_{i}+1-\|N(i)\|\right]d_{i}$
		$\displaystyle=\pi_{i},$

hence $|N(i)|-\sigma_{i}+1$ is the cardinality of set $S$ in the continuous cover inequality. Following the result of Lemma 1 for the interchangeable relationship between sets $S$ and $L$ , $g_{i}$ and $g_{i}\sigma_{i}$ coincide with the coefficients of the continuous cover and continuous packing inequalities, respectively.

For the second part of this Proposition, we assume $g_{i}<d_{i}$ and $g_{i}\sigma_{i}<h_{i}$ holds for all $i\in V$ without loss of generality. Observe that for $i\in V$ , the possible values of $x_{i}=\max\{0,h_{i}-(\sigma_{i}-w)d_{i}\}$ for $w\in[0,\sigma_{i}]$ , where $\sigma_{i}-w$ is an implicit upper bound of number of activated neighbors for node $i$ , namely, $\sum_{j\in N(i)}y_{ji}\leq\sigma_{i}-w$ . We prove that for any choice of $w\in[0,\sigma_{i}]$ , we must have integral $(\mathbf{x},\mathbf{y})$ in the following three cases:

Case 1.

Suppose $w=0$ and $x_{i}=0$ . Inequality (13a) is reduced to $\sum_{j\in N(i)}y_{ji}\geq\frac{h_{i}}{d_{i}}$ , which is dominated by inequality (14) with $\sum_{j\in N(i)}y_{ji}\geq\sigma_{i}$ . There exist at least $\sigma_{i}$ activated neighbors that exert influence toward node $i$ . Moreover, from the implicit upper bound $\sum_{j\in N(i)}y_{ji}\leq\sigma_{i}$ , we have $\sum_{j\in N(i)}y_{ji}=\sigma_{i}$ . Since $x_{i}=0$ we must have $y_{ij}=0$ and $y_{ji}=1$ such that $|\{j:j\in N(i)\}|=\sigma_{i}$ due to constraints (13b).

Case 2.

Suppose $w=\sigma_{i}$ and $x_{i}=h_{i}$ . Inequality (13a) becomes $\sum_{j\in N(i)}y_{ji}\geq 0$ , which dominates inequality (14) with $\sum_{j\in N(i)}y_{ji}\geq\sigma_{i}-\frac{h_{i}}{g_{i}}$ as the right hand side here is strictly negative. Similar to Case 1, we must have $y_{ji}=0$ and $y_{ij}=1$ for all $j\in N(i)$ due to the implicit upper bound $\sum_{j\in N(i)}y_{ji}\leq 0$ and constraints (13b).

Case 3.

Suppose $w\in[1,\sigma_{i}-1]$ . First, let $w=1$ , then $x_{i}=g_{i}=h_{i}-(\sigma_{i}-1)d_{i}$ . We have $\sum_{j\in N(i)}y_{ji}\geq\sigma_{i}-1$ in both inequality (13a) and inequality (14). Following Case 1 and Case 2, we have $\sum_{j\in N(i)}y_{ji}=\sigma_{i}-1$ with $y_{ji}=1$ and $y_{ij}=0$ for some $j\in N(i)$ such that $|\{j:j\in N(i)\}|=\sigma_{i}-1$ . Next, let $w=2$ and $\sum_{j\in N(i)}y_{ji}\geq\sigma_{i}-2$ holds in inequality (13a). While in inequality (14),

	$\displaystyle\sum_{j\in N(i)}y_{ji}$	$\displaystyle\geq\frac{g_{i}\sigma_{i}-h_{i}+d_{i}\sigma_{i}-2d_{i}}{g_{i}}$
		$\displaystyle\geq\sigma_{i}-1-\frac{d_{i}}{g_{i}}.$

Since we assume $g_{i}<d_{i}$ , inequality (13a) dominates inequality (14). By mathematical induction, for $w\in[1,\sigma_{i}-1]$ , we conclude that inequality (13a) $\sum_{j\in N(i)}y_{ji}\geq w$ always dominates inequality (14). Furthermore, we must have $\sum_{j\in N(i)}y_{ji}=w$ from the implicit upper bound and the value of $y_{ij}$ and $y_{ji}$ are either 0 or 1 following Case 1 and Case 2. We have now demonstrated that inequality (14) is facet-defining and $(\mathbf{x},\mathbf{y})$ are integral for any choice of $w\in[0,\sigma_{i}]$ , thus the proof is completed.

∎

We close this section by noting that the $(U,C)$ inequalities (11) and the separation algorithm in Section 3.3 can be directly applied to LCIM-TE on a graph with cycles.

Proposition 13.

The $(U,C)$ inequality for equal influence weights of a cycle is given by

\displaystyle\sum_{i\in U}\gamma_{i}\left(x_{i}+\alpha_{i}\sum_{j\in N(i)}y_{ji}-\beta_{i}\right)\geq\delta(U)\left(1-|V(C)|+|U|+\sum_{(k,\ell)\in C:\ell\neq i}y_{k\ell}\right)

(15)

for $\operatorname{conv}{(\mathcal{S})}$ .

Proof.

The result is deduced from Proposition 9 by fixing $z_{i}$ to 1 for all $i\in V$ and substituting the coefficients accordingly. The definition of $\gamma_{i}$ and $\delta(U)$ follows Proposition 9. Let $\omega_{i}=h_{i}-\beta_{i}+|\{j\in N(i):j\notin V(C)\}|(\alpha_{i}-d_{i})$ for all $i\in V(C)$ . The right hand side of the inequality is equivalent to fix $z_{i}=1$ for $i\notin U$ . Finally, let $\alpha_{i}=\min\{g_{i},d_{i}\}$ and $\beta_{i}=g_{i}\sigma_{i}$ as in inequality (14). ∎

5 Computational Experiments

In this section, we give a detailed description of the data generation and algorithm settings. We test the effectiveness of a delayed cut generation algorithm that incorporates the proposed valid inequalities in solving LCIM under different conditions. All the experiments were conducted on a single thread of a Windows 10 Enterprise server with Intel(R) Core i7-4770 CPU at 3.40 GHz x-64 based processor and 8GB of RAM using Python 3.8 and Gurobi 9.1.2 with default settings as the optimization solver. A 3600 seconds time limit was imposed for each experiment.

5.1 Data generation and algorithm settings

We follow the exact data generation scheme in [10], except for the fact that we generate bidirectional arcs between every two nodes. The small-world network topology for each instance is generated based on watts_strogatz_graph function in the NetworkX package of Python [31]. The instances have the following properties: size of node set $n\in\{50,75,100\}$ , average node degree $v\in\{4,8,12,16\}$ , rewiring probability $q\in\{0.1,0.3\}$ and we set penetration rate $a\in\{0.1,0.25,0.5,0.75,1\}$ . Influence weight $d_{ij}$ for all $(i,j)\in E$ are generated from discrete uniform distribution between 1 and 10. Let $\Delta_{i}=\sum_{j\in N(i)}d_{ji}$ and $\Upsilon_{i}$ be a random variable follows normal distribution $\mathcal{N}(0.7\Delta_{i},\Delta_{i}/v_{i})$ for all $i\in V$ . We set $h_{i}=\lceil\max\{1,\min\{\Upsilon_{i},\Delta_{i}\}\}\rceil$ . For each setting, we generate three instances and report the average.

The effectiveness of two delayed cut generation algorithms and one alternative reformulation are compared in our study:

1.

DEF: formulation LCIM given by (1a) - (1c),
2.

CB: formulation LCIM with cut-and-branch enhancement, and
3.

LN: layered-network formulation.

To implement the delayed cut generation, the GCEC (1d) is separated via lazy constraint callback only for integer solutions for DEF and CB. Grötschel et al. [32] give a shortest path algorithm for separating (1d) and we utilize the existing shortest path function dijkstra_path in the NetworkX package to perform such task. For the cut-and-branch enhancement in algorithm CB, we add the proposed inequalities via user-cut callback at the root node to tighten the linear programming relaxation of formulation LCIM.

We use the layered-network formulation proposed by [33] to replace (1d) in algorithm LN as this formulation gives a directed acyclic graph with the additional layer assignment variables $l_{i}$ for all $i\in V$ . We are interested in testing whether this cycle-free formulation is beneficial to solve LCIM compared with GCEC (1d). In addition, this formulation allows us to keep the mixed 0-1 knapsack substructure so the valid inequalities can be applied to it directly. In our preliminary experiments, we observe that this formulation does not produce better optimality gap for $a\in\{0.1,0.25,0.5,0.75\}$ by relaxing constraints (16a) compared with DEF and CB (average optimality gap $>90\%$ ), hence, we only report the computation for $a=1$ $(b=n)$ . The layered-network formulation used in algorithm LN is given by

\displaystyle\min\left\{\sum_{i\in V}x_{i}:(x,y,z)\text{ satisfies }\eqref{LCIM1},\eqref{LCIM3},\eqref{LN-1}-\eqref{LN-3}\right\},

where


	$\displaystyle y_{ij}+y_{ji}=1\quad\forall(i,j)\in E$		(16a)
	$\displaystyle y_{ji}-(n-1)y_{ij}\leq l_{j}-l_{i}\quad\forall(i,j)\in E$		(16b)
	$\displaystyle 1\leq l_{i}\leq n\quad\forall i\in V.$		(16c)

5.2 Analysis of results

Table 3: Computational performance comparing MIP nodes, cuts, time and unsolved instances on network with

n

=50. Column Time[Gap]* reports the average solution time (in seconds) of the instances that are solved to optimality, and, where applicable, the average of the optimality gap (%, in brackets) of the instances that are not solved to optimality when reaching time limit. Each asterisk sign indicates an unsolved instance.

$n$ =50			Nodes			Cuts			Time[Gap]*
$v-m$	$q$	$a$	DEF	CB	LN	DEF	CB	LN	DEF	CB	LN
4-200	0.1	0.1	18	12		12	12		0.30	0.31
		0.25	121	53		25	15		0.45	0.40
		0.5	492	254		52	21		0.84	0.65
		0.75	1562	1680		76	45		2.53	2.74
		1	921	636	1041	70	27	233	1.15	0.78	17.84
4-200	0.3	0.1	1	5		5	4		0.29	0.20
		0.25	1	1		13	3		0.33	0.27
		0.5	33	9		28	10		0.41	0.39
		0.75	129	99		45	18		0.61	0.56
		1	207	1	140	47	16	83	0.57	0.42	5.02
8-400	0.1	0.1	85	115		18	15		1.02	1.71
		0.25	429	392		14	14		2.41	2.85
		0.5	6077	8831		57	65		19.08	27.31
		0.75	457292	196845		189	169		1367.06[7.31]*	682.35
		1	838587	658888	69506	339	341	2491	[8.22]***	[4.70]***	[8.56]***
8-400	0.3	0.1	104	85		19	14		0.96	1.49
		0.25	375	497		17	18		2.50	3.81
		0.5	3009	3592		35	40		15.88	31.34
		0.75	323777	185520		148	170		2028.51[5.11]*	1486.88
		1	418483	404013	57946	297	265	2372	[10.47]***	[7.13]***	[12.08]***

Table 4: Computational performance comparing MIP nodes, cuts, time and unsolved instances on network with

n

=75. Column Time[Gap]* reports the average solution time (in seconds) of the instances that are solved to optimality, and, where applicable, the average of the optimality gap (%, in brackets) of the instances that are not solved to optimality when reaching time limit. Each asterisk sign indicates an unsolved instance.

$n$ =75			Nodes			Cuts			Time[Gap]*
$v-m$	$q$	$a$	DEF	CB	LN	DEF	CB	LN	DEF	CB	LN
4-300	0.1	0.1	91	17		11	8		0.57	0.31
		0.25	833	742		46	22		1.87	1.35
		0.5	1094	1543		82	48		2.11	3.17
		0.75	3378	1880		119	69		5.92	4.17
		1	1753	1384	1252	130	69	366	2.08	2.15	18.39
4-300	0.3	0.1	10	12		17	9		0.64	0.70
		0.25	309	76		34	13		0.85	0.71
		0.5	805	151		78	21		1.78	1.24
		0.75	1537	604		80	37		3.50	2.66
		1	976	965	1067	107	53	445	2.44	2.12	16.63
8-600	0.1	0.1	229	195		13	10		2.73	3.45
		0.25	3178	912		26	20		18.82	7.81
		0.5	216230	212500		192	78		227.75[4.77]*	857.06
		0.75	564387	348760		248	169		[6.15]***	508.18[3.36]**
		1	360594	379776	50255	368	324	2923	[9.50]***	[5.71]***	[9.64]***
8-600	0.3	0.1	214	127		13	5		2.10	2.78
		0.25	1078	1224		19	15		10.54	10.27
		0.5	9948	3776		68	75		129.29	113.47
		0.75	161508	223027		184	194		[5.00]***	3489.72[1.99]**
		1	191592	167297	41342	266	356	2701	[9.48]***	[6.38]***	[10.30]***
12-900	0.1	0.1	812	635		9	10		13.93	18.00
		0.25	1803	2376		18	13		53.28	62.91
		0.5	369964	229309		36	63		2803.78[2.11]*	[2.96]***
		0.75	18614	31354		111	144		[13.68]***	[10.28]***
		1	48188	33974	26375	228	230	3317	[16.49]***	[16.18]***	[17.85]***
12-900	0.3	0.1	1163	1275		10	2		15.08	30.70
		0.25	3757	5019		42	19		115.19	224.71
		0.5	27217	22714		35	48		1363.66[2.83]*	1827.81[9.39]*
		0.75	11042	12625		58	120		[24.75]***	[15.69]***
		1	20954	25899	23253	199	283	3071	[22.71]***	[17.21]***	[26.88]***

We summarize our computational results in TABLE 3, 4 and 5 under various settings of $(n,m,v,q,a)$ . We report the average number of branch-and-cut tree nodes explored in the column Nodes. The column Cuts shows the average number of Gurobi cuts added during the optimization process. In column Time[Gap]* we report the average solution time (in seconds) of the instances that are solved to optimality, and the average of the optimality gap of the instances that are not solved to optimality when reaching time limit (in brackets). Each asterisk sign indicates an unsolved instance and the gap is calculated by 100 $\times(\texttt{ub}-\texttt{lb})/\texttt{lb}$ where ub and lb are the best integer feasible solution obtained and best lower bound generated by the algorithm within time limit, respectively.

We observe that the major factor that contributes to the unsolved instances with positive optimality gap is the average node degree rather than the number of nodes. This observation can be justified by comparing the instances $(n,v,m)$ = (100,4,400) and $(n,v,m)$ = (50,8,400), as the former is easier to solve than the later. A similar observation is also established in [10] in their set covering formulation using the price-cut-and-branch algorithm. Despite the average number of nodes and cuts are the greatest in LN for $n$ =50, there exists no clear domination relationship in columns Nodes and Cuts between DEF, CB and LN for $n$ =75 or 100. Moreover, the average number of cuts added is not very large, which indicates that Gurobi cuts do not complement the optimization process. For the unsolved instances in Table 3 and 4, CB outperforms DEF and LN except for instances $(n,v,m,q,a)$ = (75,12,900,0.1,0.5) and (75,12,900,0.3,0.5). In Table 5 where $n$ =100, CB still outperforms DEF and LN for $v\in$ {4,8,12} except for one instance (100,12,1200,0.3,0.5), where the optimality gap difference is 0.92%. The common setting $a$ =0.5 shared in these exceptions suggests that the symmetry created by the cardinality constraint requires additional improvement. We also notice that LN produces better optimality gap than DEF for instances (100,8,800,0.1,1) and (100,12,1200,0.1,1). However, LN suffers from slow improvement of both upper and lower bound and results in larger ending gap in general in our experiments. We begin to see the degraded performance of algorithm CB for instances with $v$ =16. A large number of valid inequalities added to the root node could be a possible reason that decelerates the optimization process. Five out of ten results generated by CB under this category are no better than DEF. Nevertheless, the average optimality gap produced by CB for these five instances are 1.49% higher than DEF. To sum up, although the linear programming relaxation of the arc-based formulation is very weak with zero objective function value in most cases, our proposed valid inequalities significantly improve the strength of the lower bound and effectively reduce or close the optimality gap.

Table 5: Computational performance comparing MIP nodes, cuts, time and unsolved instances on network with

n

=100. Column Time[Gap]* reports the average solution time (in seconds) of the instances that are solved to optimality, and, where applicable, the average of the optimality gap (%, in brackets) of the instances that are not solved to optimality when reaching time limit. Each asterisk sign indicates an unsolved instance.

$n$ =100			Nodes			Cuts			Time[Gap]*
$v-m$	$q$	$a$	DEF	CB	LN	DEF	CB	LN	DEF	CB	LN
4-400	0.1	0.1	28	1		11	4		0.95	0.85
		0.25	148	118		50	12		1.24	1.09
		0.5	1554	998		80	33		3.77	2.49
		0.75	1705	834		128	61		6.58	3.62
		1	669	1600	670	120	63	318	3.09	2.67	13.01
4-400	0.3	0.1	15	11		15	3		0.53	0.36
		0.25	83	123		22	7		1.27	1.07
		0.5	365	391		92	26		2.40	1.60
		0.75	2045	697		117	53		8.55	2.92
		1	904	213	994	124	42	383	3.84	1.52	18.65
8-800	0.1	0.1	511	295		17	7		7.11	7.28
		0.25	16138	9611		16	59		96.09	88.48
		0.5	364834	278716		297	157		1499.18[4.20]**	1247.35[2.37]*
		0.75	175014	231825		282	306		[9.81]***	[7.10]***
		1	121123	154716	46686	374	439	2988	[13.84]***	[8.39]***	[13.02]***
8-800	0.3	0.1	434	345		16	12		5.74	7.27
		0.25	5682	4509		19	21		56.08	72.36
		0.5	65328	25567		91	129		735.63[1.79]*	685.49
		0.75	13119	37109		190	233		[14.13]***	[6.43]***
		1	40550	68921	35379	261	407	2849	[15.19]***	[9.39]***	[17.02]***
12-1200	0.1	0.1	1866	1249		6	4		44.77	83.37
		0.25	6425	6356		29	48		296.98	372.08
		0.5	40341	41264		41	52		[11.17]***	[9.17]***
		0.75	8292	7915		143	173		[19.69]***	[16.40]***
		1	15506	16755	20873	187	388	4447	[29.78]***	[18.11]***	[20.74]***
12-1200	0.3	0.1	3984	2870		0	9		39.06	145.17
		0.25	55111	33660		14	65		1187.31	2191.00
		0.5	15457	11751		44	51		[21.00]***	[21.92]***
		0.75	4426	5168		102	128		[30.63]***	[24.46]***
		1	8704	10036	20128	174	241	3688	[21.11]***	[19.05]***	[21.31]***
16-1600	0.1	0.1	4585	5403		14	11		193.29	474.52
		0.25	9590	11593		26	28		1244.07	2010.94
		0.5	5384	4268		9	14		[22.36]***	[22.42]***
		0.75	3360	3591		68	92		[32.71]***	[30.71]***
		1	8464	6772	11907	148	158	3443	[22.47]***	[23.43]***	[24.10]***
16-1600	0.3	0.1	11982	13063		5	20		286.70	852.37
		0.25	51785	23015		19	33		[19.46]***	[22.45]***
		0.5	5271	4520		17	18		[34.36]***	[35.58]***
		0.75	2636	2724		85	89		[42.24]***	[44.46]***
		1	4441	4485	11802	78	91	3408	[25.78]***	[24.78]***	[25.79]***

6 Conclusion

We study the least cost influence maximization problem in social networks where the influence propagation behavior among users is captured by the deterministic linear threshold model. A typical application of this problem is to obtain an estimation of the partial incentives given to early product adopters in viral marketing while achieving a desired coverage rate by the end of information spreading. We focus on the case where influence weights exerted from peers are heterogeneous and derive several classes of valid inequalities from the hidden mixed 0-1 knapsack substructure in the mixed-integer programming formulation. Despite the fact that the set of feasible solutions is hard to convexify due to the knapsack constraints and the linear programming relaxation being very weak, our computational experiments show that the delayed cut generation algorithm exploiting these inequalities can effectively reduce the optimality gap. For the case with equal influence weights and 100% adoption on a tree, we characterize the complete linear description of the convex hull in its natural space of incentive, arc propagation and activation variables. The convex hull of the LCIM with equal influence weights and arbitrary adoption on a tree is still an open question and requires further investigation. We observe that the bottleneck of computational improvements are mostly in the instances where 100% adoption is not required. A promising future research direction is to identify more explicit valid inequalities from the intersections of the cardinality constraint that controls the penetration rate with the rest of the formulation.

Acknowledgements

We are grateful to the two anonymous reviewers who provided constructive feedback that helped us improve the content of this paper. We also thank Demetrios Papazaharias in the early discussion of the separation problem for the minimum influencing subset inequalities. This material is based upon work supported by the Air Force Research Laboratory (AFRL) Mathematical Modeling and Optimization Institute and AFRL award FA8651-16-2-0009. Preliminary results of this study were presented at the 9th International Conference on Computational Data and Social Networks (CSoNet 2020, December 11–13, 2020, Dallas, TX) and published in the respective conference proceedings volume [20].

Data Availability Statement

The datasets analyzed during the current study are available from the corresponding author on reasonable request.

References

[1] Wei Chen, Laks VS Lakshmanan, and Carlos Castillo. Information and influence propagation in social networks. Synthesis Lectures on Data Management, 5(4):1–177, 2013.
[2] David Kempe, Jon Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. Theory of Computing, 11(4):105–147, 2015.
[3] Mark Granovetter. Threshold models of collective behavior. American journal of sociology, 83(6):1420–1443, 1978.
[4] David Kempe, Jon Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 137–146, 2003.
[5] Ning Chen. On the approximability of influence in social networks. SIAM Journal on Discrete Mathematics, 23(3):1400–1415, 2009.
[6] Suman Banerjee, Mamata Jenamani, and Dilip Kumar Pratihar. A survey on influence maximization in a social network. Knowledge and Information Systems, 62(9):3417–3455, 2020.
[7] Yuchen Li, Ju Fan, Yanhao Wang, and Kian-Lee Tan. Influence maximization on social graphs: A survey. IEEE Transactions on Knowledge and Data Engineering, 30(10):1852–1872, 2018.
[8] Sancheng Peng, Yongmei Zhou, Lihong Cao, Shui Yu, Jianwei Niu, and Weijia Jia. Influence analysis in social networks: A survey. Journal of Network and Computer Applications, 106:17–32, 2018.
[9] Mehdi Azaouzi, Wassim Mnasri, and Lotfi Ben Romdhane. New trends in influence maximization models. Computer Science Review, 40:100393, 2021.
[10] Matteo Fischetti, Michael Kahr, Markus Leitner, Michele Monaci, and Mario Ruthmair. Least cost influence propagation in (social) networks. Mathematical Programming, 170(1):293–325, 2018.
[11] Dilek Günneç, S Raghavan, and Rui Zhang. A branch-and-cut approach for the least cost influence problem on social networks. Networks, 76(1):84–105, 2020.
[12] Dilek Günneç, Subramanian Raghavan, and Rui Zhang. Least-cost influence maximization on social networks. INFORMS Journal on Computing, 32(2):289–302, 2020.
[13] Furkan Gursoy and Dilek Günneç. Influence maximization in social networks under deterministic linear threshold model. Knowledge-Based Systems, 161:111–123, 2018.
[14] Babak Moazzez and Hossein Soltani. Integer programming approach to static monopolies in graphs. Journal of Combinatorial Optimization, 35(4):1009–1041, 2018.
[15] Babak Moazzez and Hossein Soltani. Facets of the dynamic monopoly polytope: Linear ordering formulation. Discrete Optimization, 40:100641, 2021.
[16] Hossein Soltani and Babak Moazzez. A polyhedral study of dynamic monopolies. Annals of Operations Research, 279(1):71–87, 2019.
[17] Hao-Hsiang Wu and Simge Küçükyavuz. A two-stage stochastic programming approach for influence maximization in social networks. Computational Optimization and Applications, 69(3):563–595, 2018.
[18] Giacomo Nannicini, Giorgio Sartor, Emiliano Traversi, and Roberto Wolfler Calvo. An exact algorithm for robust influence maximization. Mathematical Programming, 183(1):419–453, 2020.
[19] Eyal Ackerman, Oren Ben-Zwi, and Guy Wolfovitz. Combinatorial model and bounds for target set selection. Theoretical Computer Science, 411(44-46):4017–4022, 2010.
[20] Cheng-Lung Chen, Eduardo L Pasiliao, and Vladimir Boginski. A cutting plane method for least cost influence maximization. In International Conference on Computational Data and Social Networks, pages 499–511. Springer, 2020.
[21] Hugues Marchand and Laurence A Wolsey. The 0-1 knapsack problem with a single continuous variable. Mathematical Programming, 85(1):15–33, 1999.
[22] Veronica Dal Sasso, Luigi De Giovanni, and Martine Labbé. Strengthened formulations and valid inequalities for single delay management in public transportation. Transportation Science, 53(5):1271–1286, 2019.
[23] Brian Keller and Güzin Bayraksan. Scheduling jobs sharing multiple resources under uncertainty: A stochastic programming approach. IIE Transactions, 42(1):16–30, 2009.
[24] Marko Loparic, Hugues Marchand, and Laurence A Wolsey. Dynamic knapsack sets and capacitated lot-sizing. Mathematical Programming, 95(1):53–69, 2003.
[25] Andrew J Miller, George L Nemhauser, and Martin WP Savelsbergh. On the capacitated lot-sizing and continuous 0–1 knapsack polyhedra. European Journal of Operational Research, 125(2):298–315, 2000.
[26] Amar K Narisetty, Jean-Philippe P Richard, and George L Nemhauser. Lifted tableaux inequalities for 0–1 mixed-integer programs: A computational study. INFORMS Journal on Computing, 23(3):416–424, 2011.
[27] J-PP Richard, Ismael R de Farias Jr, and George L Nemhauser. Lifted inequalities for 0-1 mixed integer programming: Basic theory and algorithms. Mathematical programming, 98(1):89–113, 2003.
[28] J-PP Richard, Ismael R de Farias Jr, and George L Nemhauser. Lifted inequalities for 0-1 mixed integer programming: Superlinear lifting. Mathematical programming, 98(1):115–143, 2003.
[29] S Raghavan and Rui Zhang. Weighted target set selection on trees and cycles. Networks, 77(4):587–609, 2021.
[30] Thomas W Valente. Social network thresholds in the diffusion of innovations. Social networks, 18(1):69–89, 1996.
[31] Aric A. Hagberg, Daniel A. Schult, and Pieter J. Swart. Exploring network structure, dynamics, and function using networkx. In Gaël Varoquaux, Travis Vaught, and Jarrod Millman, editors, Proceedings of the 7th Python in Science Conference, pages 11 – 15, Pasadena, CA USA, 2008.
[32] Martin Grötschel, Michael Jünger, and Gerhard Reinelt. On the acyclic subgraph polytope. Mathematical Programming, 33(1):28–42, 1985.
[33] Hasan Manzour, Simge Küçükyavuz, Hao-Hsiang Wu, and Ali Shojaie. Integer programming for learning directed acyclic graphs from continuous data. Informs Journal on Optimization, 3(1):46–73, 2021.

A Polyhedral Approach to Least Cost Influence Maximization in Social Networks

Abstract

1 Introduction

1.1 Notation and problem definition

1.2 Main contributions

1.3 Outline

2 Valid inequalities based on mixed 0-1 knapsack polyhedron

Proposition 1.

2.1 Continuous cover and continuous packing inequalities

Proposition 2.

Proposition 3.

Example 1.

2.2 Minimum influencing subset inequalities

Definition 1 ([10]).

Proposition 4.

Proof.

Proposition 5.

Proof.

Example 1 (Continued).

2.3 Separation of minimal influencing subset inequalities

Proposition 6.

Proof.

2.4 Separation of continuous cover and continuous packing inequalities

Lemma 1.

Proof.

Proposition 7.

Proof.

3 Valid inequalities for LCIM with cycles

3.1 Dynamic programming recursion for LCIM on a simple cycle

Proposition 8.

Proof.

3.2 Valid inequalities for influence propagation over a cycle

Proposition 9.

Proof.

Case 1.

Case 2.

Example 2.

Proposition 10.

Proof.

3.3 Separation of (U,C)(U,C) inequalities

Proposition 11.

Proof.

Example 3.

4 LCIM under equal influence and 100% adoption

Proposition 12.

Proof.

Case 1.

Case 2.

Case 3.

Proposition 13.

Proof.

5 Computational Experiments

5.1 Data generation and algorithm settings

5.2 Analysis of results

6 Conclusion

Acknowledgements

Data Availability Statement

References

3.3 Separation of $(U,C)$ inequalities