
Robust Sequence Networked Submodular Maximization

Qihao Shi, Bingyang Fu, Can Wang, Jiawei Chen, Sheng Zhou, Yan Feng, Chun Chen (Corresponding Author)
Abstract

In this paper, we study the Robust optimization for sequence Networked submodular maximization (RoseNets) problem, which interweaves robust optimization with sequence networked submodular maximization. The elements are connected by a directed acyclic graph, and the objective function is submodular not on the elements but on the edges of the graph. Under such a networked submodular scenario, the impact of removing an element from a sequence depends both on its position in the sequence and on its position in the network. This makes the existing robust algorithms inapplicable. In this paper, we take the first step to study the RoseNets problem. We design a robust greedy algorithm, which is robust against the removal of an arbitrary subset of the selected elements. The approximation ratio of the algorithm depends both on the number of removed elements and on the network topology. We further conduct experiments on real applications of recommendation and link prediction. The experimental results demonstrate the effectiveness of the proposed algorithm.

Introduction

Submodularity is an important property that models a diminishing return phenomenon, i.e., the marginal value of adding an element to a set decreases as the set expands. It has been extensively studied in the literature, mainly in connection with the maximization or minimization of set functions (Nemhauser and Wolsey 1978; Khuller, Moss, and Naor 1999). Mathematically, a set function $f:2^{V}\to\mathbb{R}$ is submodular if for any two sets $A\subseteq B\subseteq V$ and any element $v\in V\backslash B$, we have $f(A\cup v)-f(A)\geq f(B\cup v)-f(B)$. This property finds a wide range of applications in machine learning, combinatorial optimization, economics, and so on (Krause 2005; Lin and Bilmes 2011; Li et al. 2022; Shi et al. 2021; Kirchhoff and Bilmes 2014; Gabillon et al. 2013; Kempe, Kleinberg, and Tardos 2003; Wang et al. 2021).

Further equipped with monotonicity, i.e., $f(A)\leq f(B)$ for any $A\subseteq B\subseteq V$, a submodular set function can be maximized under a cardinality constraint by the classic greedy algorithm, which achieves an approximation ratio of $1-1/e$ (Nemhauser and Wolsey 1978), nearly the best possible. Since then, the study of submodular functions has been extended to a variety of scenarios, such as the non-monotone, adaptive, and continuous settings (Feige, Mirrokni, and Vondrak 2007; Golovin and Krause 2011; Das and Kempe 2011; Bach 2019; Shi et al. 2019).

The above works focus on set functions. In real applications, however, the order in which elements are added plays an important role and significantly affects the function value. Recently, submodularity has been generalized to sequence functions (Zhang et al. 2015; Tschiatschek, Singla, and Krause 2017; Streeter and Golovin 2008; Zhang et al. 2013). Considering sequences instead of sets causes an exponential increase in the size of the search space, while allowing for much more expressive models.

In this paper, we consider elements that are networked by a directed graph. The edges encode the additional value obtained when the connected elements are selected in a particular order. Such a setting has not been given a specific name before. To distinguish it from classic submodularity, we name it networked submodularity (Net-submodularity for short). More specifically, a Net-submodular function $f(\sigma)$ is a sequence function that is not submodular on the element set induced by $\sigma$ but is submodular on the edge set induced by $\sigma$. Net-submodularity was first considered in (Tschiatschek, Singla, and Krause 2017), which mainly focuses on the case where the underlying graph is a directed acyclic graph. General graphs and hypergraphs are considered in (Mitrovic et al. 2018).

Recently, robust versions of the submodular maximization problem have arisen (Orlin, Schulz, and Udwani 2018; Mitrovic et al. 2017; Bogunovic et al. 2017; Sallam et al. 2020) to meet the increasing demand for system stability. The robustness of a model mainly concerns its ability to handle malfunctions or adversarial attacks, i.e., the removal of a subset of elements from the selected set or sequence. Real-world examples of element removal include items that are sold out or discontinued in recommendation (Mitrovic et al. 2018), web failures or user logouts in link prediction (Mitrovic et al. 2019), and equipment malfunctions in sensor allocation or activation (Zhang et al. 2015). In this paper, we take one step further and study a new problem: robust sequence networked submodular maximization (RoseNets). We show an example in Figure 1 to illustrate the importance of the RoseNets problem.

Consider Figure 1. Suppose all edge weights in sequence A are 0.9, in sequence B are 0.5, and in sequence C are 0.4. Let the Net-submodular utility function $f$ of a sequence be the sum of the weights of the edge set induced by the sequence. Such a utility function is obviously monotone but not submodular: in sequence B, the utility of $\{B4\}$ is 0 while that of $\{B3,B4\}$ is 0.5, which violates submodularity. However, it is submodular on the edge set. The utilities of sequences A, B and C are 2.7 (largest), 2.5 and 2.4, respectively. We can easily check that if one node is removed from each sequence, the worst-case utilities after removal for A, B and C are 0.9, 1.0 (largest), and 0.8. If two nodes are removed from each sequence, the utilities of A, B and C become 0, 0, and 0.4 (largest). Under different numbers of removed nodes, the three sequences exhibit different robustness. An existing non-robust algorithm may select sequence A since it has the largest utility, yet sequences B and C are more robust against node removal.

Figure 1: Example of RoseNets
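To make this arithmetic easy to reproduce, here is a minimal Python sketch of the additive-weight utility and the worst-case removal used in the example. The graphs below (star, chain) are hypothetical stand-ins, since Figure 1's exact topology is not reproduced here; utility and worst_case_after_removal are illustrative helper names, and self-loops are omitted for brevity.

from itertools import combinations

def utility(seq, edges):
    """Sum the weights of edges (u, v) whose endpoints both appear in seq
    with u before v, i.e., the edge set induced by the sequence."""
    pos = {v: i for i, v in enumerate(seq)}
    return sum(w for (u, v), w in edges.items()
               if u in pos and v in pos and pos[u] < pos[v])

def worst_case_after_removal(seq, edges, tau):
    """Minimum residual utility over all removals of tau elements from seq."""
    return min(utility([v for v in seq if v not in set(z)], edges)
               for z in combinations(seq, tau))

# A "star" concentrates utility on one hub; a "chain" spreads it out.
star = {("A", "B"): 0.9, ("A", "C"): 0.9, ("A", "D"): 0.9}
chain = {("A", "B"): 0.5, ("B", "C"): 0.5, ("C", "D"): 0.5}
print(utility(list("ABCD"), star), worst_case_after_removal(list("ABCD"), star, 1))    # ~2.7, 0.0
print(utility(list("ABCD"), chain), worst_case_after_removal(list("ABCD"), chain, 1))  # 1.5, 0.5

The star has the larger utility, but removing its hub wipes the utility out, whereas the chain degrades gracefully, mirroring the contrast between sequences A and B/C above.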

Given a Net-submodular function and the corresponding network, the RoseNets problem aims to select a sequence of elements under a cardinality constraint, such that the value of the sequence function is maximized when a certain number of the selected elements may be removed. As far as sequence functions and Net-submodularity are concerned, the design and analysis of robust algorithms face novel technical difficulties. The impact of removing an element from a sequence depends both on its position in the sequence and on its position in the network, which makes the existing robust algorithms inapplicable here. It is unclear what conditions are sufficient for designing efficient robust algorithms with provable approximation ratios for the RoseNets problem. We aim to take a step towards answering this question in this paper. Our contributions are summarized as follows.

  1. To the best of our knowledge, this is the first work that considers the RoseNets problem. Combining robust optimization and sequence Net-submodular maximization requires subtle yet critical theoretical efforts.

  2. We design a robust greedy algorithm that is robust against the removal of an arbitrary subset of the selected sequence. The theoretical approximation ratio depends both on the number of removed elements and on the network topology.

  3. We conduct experiments on real applications of recommendation and link prediction. The experimental results demonstrate the effectiveness and robustness of the proposed algorithm against existing sequence submodular baselines. We hope that this work serves as an important first step towards the design and analysis of efficient algorithms for robust submodular optimization.

Related Works

Submodular maximization has been extensively studied in the literature. Efficient approximation algorithms have been developed for maximizing a submodular set function in various settings (Nemhauser and Wolsey 1978; Khuller, Moss, and Naor 1999; Calinescu et al. 2011; Chekuri, Vondrák, and Zenklusen 2014). To address robustness requirements, robust versions of submodular maximization have recently been studied extensively. These works aim at selecting a set of elements that is robust against the removal of a subset of elements. The first algorithm for the cardinality constrained robust submodular maximization problem is given in (Orlin, Schulz, and Udwani 2018), which achieves a constant-factor approximation ratio: the selected $k$-sized set is robust against the removal of any $\tau$ elements of the selected set, and the constant approximation ratio is valid as long as $\tau=O(\sqrt{k})$. An improvement is made in (Bogunovic et al. 2017), which provides an algorithm that guarantees the same constant approximation ratio but allows the removal of a larger number of elements (i.e., $\tau=O(k)$). Under a mild assumption, the algorithm proposed in (Mitrovic et al. 2017) allows the removal of an arbitrary number of elements. The restriction on $\tau$ is relaxed in (Tzoumas et al. 2017), where the derived approximation ratio is parameterized by $\tau$. This work is extended to a multi-stage setting in (Tzoumas, Jadbabaie, and Pappas 2018) and (Tzoumas, Jadbabaie, and Pappas 2020), where the decision at each stage takes into account the failures that happened in the previous stages. Other constraints combined with robust optimization include fairness, privacy, and so on (Mirzasoleiman, Karbasi, and Krause 2017; Kazemi, Zadimoghaddam, and Karbasi 2018).

The concept of sequence (or string) submodularity for sequence functions is a generalization of submodularity and has been introduced recently in several studies (Zhang et al. 2015; Streeter and Golovin 2008; Zhang et al. 2013). The above works all consider element-based robust submodular maximization. Networked submodularity is considered in (Tschiatschek, Singla, and Krause 2017; Mitrovic et al. 2018), where the sequential relationship among elements is encoded by a directed acyclic graph. Following the networked submodularity setting, the work in (Mitrovic et al. 2019) introduces the idea of adaptive sequence submodular maximization, which aims to utilize the feedback obtained in previous iterations to improve the current decision. In this paper, we follow the networked submodularity setting and study the RoseNets problem. It is unclear whether the above algorithms can be properly extended to our problem, as converting a set function to a sequence function and submodularity to networked submodularity could result in arbitrarily bad performance. Establishing approximation guarantees for the RoseNets problem requires a more sophisticated analysis, which calls for more in-depth theoretical efforts.

System Model and Problem Definition

In this paper, we follow the networked submodular sequence setting (Tschiatschek, Singla, and Krause 2017; Mitrovic et al. 2018). Let $V=\{v_{1},v_{2},...,v_{n}\}$ be the set of $n$ elements. A set of edges $E$ represents the additional utility obtained by picking certain elements in a certain order. More specifically, an edge $e_{ij}=(v_{i},v_{j})$ represents the additional utility of selecting $v_{j}$ after $v_{i}$ has already been chosen. Self-loops (i.e., edges that begin and end at the same element) represent the individual utility of selecting an element.

Given a directed graph $G=(V,E)$, a non-negative monotone submodular set function $h:2^{E}\rightarrow\mathbb{R}_{\geq 0}$, and a parameter $k$, the objective is to select a non-repeating sequence $\sigma$ of $k$ unique elements that maximizes the objective function:

f(\sigma)=h(E(\sigma)),

where $E(\sigma)$ contains all the edges $(v_{i},v_{j})\in E$ such that $v_{i}$ is selected before $v_{j}$ in $\sigma$.

We say $E(\sigma)$ is the set of edges induced by the sequence $\sigma$. It is important to note that the function $h$ is a submodular set function over the edges, not over the elements. Furthermore, the objective function $f$ is neither a set function, nor is it necessarily submodular on the elements. We call such a function $f(\sigma)$ a networked submodular function.

We define $f(\sigma-Z)$ to represent the residual value of the objective function after the removal of the elements in set $Z$. The robust sequence networked submodular maximization (RoseNets) problem is formally defined below.

Definition 1.

Given a directed graph $G=(V,E)$, a networked submodular function $f(\cdot)$ and a robustness parameter $\tau$, the RoseNets problem aims at finding a sequence $\sigma$ that is robust against the worst possible removal of $\tau$ nodes:

\[\max_{\sigma:|\sigma|\leq k}\ \min_{Z\subseteq\sigma,|Z|\leq\tau}f(\sigma-Z).\]

The robustness parameter $\tau$ represents the size of the subset $Z$ that is removed. After the removal, the objective value should remain as large as possible. For $\tau=0$, the problem reduces to the classic sequence submodular maximization problem (Mitrovic et al. 2018).
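As a reference point for this definition, the max-min objective can be solved by brute force on tiny instances. The sketch below reuses the illustrative helpers utility and worst_case_after_removal from the earlier example; rosenets_bruteforce is a hypothetical name, and the search is exponential in $k$ and $\tau$, so it is only a sanity check, not a practical algorithm.

from itertools import permutations

def rosenets_bruteforce(V, edges, k, tau):
    """Exhaustively search max over k-sequences of min over tau-removals."""
    best_seq, best_val = None, float("-inf")
    for seq in permutations(V, k):
        val = worst_case_after_removal(list(seq), edges, tau)
        if val > best_val:
            best_seq, best_val = list(seq), val
    return best_seq, best_val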

Robust Algorithm and Theoretical Results

Directly applying the Sequence Greedy algorithm (Mitrovic et al. 2018) to the RoseNets problem can return an arbitrarily bad solution. We can construct a very simple example as an illustration; see Figure 2. Let the edge weights of $(A,B),(B,C),(B,E),(B,F)$ be 0.9 and those of $(C,D),(C,G),(D,G)$ be 0.5. Let the Net-submodular utility function $f$ of the selected sequence be the sum of the weights of the edge set induced by the sequence. Suppose we are to select a sequence with 5 elements. The Sequence Greedy algorithm selects the sequence $\langle A,B,C,E,F\rangle$ (assuming elements are selected in alphabetical order when edge weights are equal; similar examples can easily be constructed when ties are broken at random), maximizing the utility at $0.9\cdot 4=3.6$. However, if $\tau=2$, i.e., two elements are removed, removing $B$ and any other element (the worst case) makes the utility become 0.

Figure 2: Example of Sequence Greedy

RoseNets Algorithm

We wish to design an algorithm that is robust against the removal of an arbitrary subset of $\tau$ selected elements. In this paper, we propose the RoseNets Algorithm, which approximately solves the RoseNets problem and is shown in Algorithm 1. Note that we consider the case $k\geq 3$.

The limitation of the Sequence Greedy algorithm is that the selected sequence is vulnerable: the overall utility might be concentrated in the first few elements. Algorithm 1 is motivated by this key observation and works in two steps. In Step 1 (the first while loop), we select a sequence $\sigma_{1}$ of $\tau$ elements from $V$ in a greedy manner, as in Sequence Greedy. In Step 2 (the second while loop), we select another sequence $\sigma_{2}$ of $k-\tau$ elements from $V\backslash\sigma_{1}$, again in a greedy manner. Note that when we select $\sigma_{2}$, we perform the greedy selection as if $\sigma_{1}$ did not exist at all. This ensures that the value of the final returned sequence $\sigma=\sigma_{1}\oplus\sigma_{2}$ is not concentrated in either $\sigma_{1}$ or $\sigma_{2}$. The complexity of Algorithm 1 is $O(k|E|)$ in terms of the number of function evaluations.

To show the differences and benefits of the RoseNets algorithm, we return to the example in Figure 2. When $k=5$ and $\tau=2$, the RoseNets algorithm selects the sequences $\sigma_{1}=\langle A,B\rangle$ and $\sigma_{2}=\langle C,D,G\rangle$. When selecting $\sigma_{2}$, the RoseNets algorithm does not consider element $B$ in $\sigma_{1}$; thus elements $E$ and $F$ are regarded as contributing nothing to the utility function. The RoseNets algorithm returns the sequence $\sigma=\langle A,B,C,D,G\rangle$. The worst case of removing $\tau=2$ elements is to remove $B$ and any one element in $\{C,D,G\}$, leaving a residual utility of 0.5. Recall that for the Sequence Greedy algorithm, the residual utility in the worst case is 0. This example shows the benefit of the RoseNets algorithm.
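The Figure 2 example can be checked numerically with the helpers sketched earlier (utility and worst_case_after_removal are illustrative names, not part of the paper's notation):

fig2 = {("A", "B"): 0.9, ("B", "C"): 0.9, ("B", "E"): 0.9, ("B", "F"): 0.9,
        ("C", "D"): 0.5, ("C", "G"): 0.5, ("D", "G"): 0.5}

greedy_seq = ["A", "B", "C", "E", "F"]  # Sequence Greedy output
robust_seq = ["A", "B", "C", "D", "G"]  # RoseNets output

print(utility(greedy_seq, fig2), worst_case_after_removal(greedy_seq, fig2, 2))  # ~3.6, 0.0
print(utility(robust_seq, fig2), worst_case_after_removal(robust_seq, fig2, 2))  # ~3.3, 0.5

RoseNets gives up 0.3 of raw utility (3.3 vs 3.6) in exchange for a strictly positive worst-case residual after two removals.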

Both the examples in Figure 1 and Figure 2 imply that a robust sequence should have a complex network structure: the utility should not be concentrated in a central element, but aggregated from all the edges among the elements in the sequence. Such a robust sequence can only be selected by multiple trials of greedy selection, with the trials ignoring each other. Otherwise, the central element (if one exists) together with its high-edge-weight neighbors is likely to be selected, as with node $B$ in Figure 2; if such a node is removed, the utility slumps. In this paper, we implement this intuitive strategy using two trials of greedy selection. Intuitively, invoking more trials of greedy selection may improve the approximation ratio. However, as we take a first step on the RoseNets problem and our theoretical analysis is already non-trivial, we leave the design of a multi-part selection algorithm and its approximation ratio analysis for future work.

$\sigma=\emptyset$, $\sigma_{1}=\emptyset$, $\sigma_{2}=\emptyset$;
while $|\sigma_{1}|<\tau$ do
    if $|\sigma_{1}|=\tau-1$ then
        $E^{\prime}=\{e_{ij}\mid(v_{j}\notin\sigma_{1})\land(v_{i}=v_{j}\vee v_{i}\in\sigma_{1})\}$;
        $e_{ij}=\arg\max_{e_{ij}\in E^{\prime}}h(e_{ij}\mid E(\sigma_{1}))$;
        $\sigma_{1}=\sigma_{1}\oplus v_{j}$;
    else
        $e_{ij}=\arg\max_{\{e_{ij}\mid v_{j}\notin\sigma_{1}\}}h(e_{ij}\mid E(\sigma_{1}))$;
        if $v_{j}=v_{i}$ or $v_{i}\in\sigma_{1}$ then
            $\sigma_{1}=\sigma_{1}\oplus v_{j}$;
        else
            $\sigma_{1}=\sigma_{1}\oplus v_{i}\oplus v_{j}$;
while $|\sigma_{2}|<k-\tau$ do
    if $|\sigma_{2}|=k-\tau-1$ then
        $E^{\prime}=\{e_{ij}\mid(v_{j}\notin\sigma_{1}\cup\sigma_{2})\land(v_{i}=v_{j}\vee v_{i}\in\sigma_{2})\}$;
        $e_{ij}=\arg\max_{e_{ij}\in E^{\prime}}h(e_{ij}\mid E(\sigma_{2}))$;
        $\sigma_{2}=\sigma_{2}\oplus v_{j}$;
    else
        $e_{ij}=\arg\max_{\{e_{ij}\mid v_{i}\notin\sigma_{1},\,v_{j}\notin\sigma_{1}\cup\sigma_{2}\}}h(e_{ij}\mid E(\sigma_{2}))$;
        if $v_{j}=v_{i}$ or $v_{i}\in\sigma_{2}$ then
            $\sigma_{2}=\sigma_{2}\oplus v_{j}$;
        else
            $\sigma_{2}=\sigma_{2}\oplus v_{i}\oplus v_{j}$;
$\sigma=\sigma_{1}\oplus\sigma_{2}$;
return $\sigma$;
Algorithm 1: RoseNets Algorithm
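For concreteness, the following is a compact Python sketch of the two-phase greedy in Algorithm 1. It assumes $h$ is supplied as a callable monotone submodular set function over edge sets, recomputes the induced edge set naively, and handles corner cases (ties, exhausted candidates) loosely; the function names are illustrative, and this is a readable sketch rather than a tuned implementation.

def induced_edges(seq, edges):
    """E(sigma): edges (u, v) with u before v in seq, plus self-loops of members."""
    pos = {x: i for i, x in enumerate(seq)}
    return {(u, v) for (u, v) in edges
            if u in pos and v in pos and (pos[u] < pos[v] or u == v)}

def sequence_greedy_phase(edges, h, budget, exclude=frozenset()):
    """One greedy phase: repeatedly take the edge of largest marginal gain
    h(e | E(sigma)), appending one or two new elements to sigma."""
    sigma = []
    while len(sigma) < budget:
        cur = induced_edges(sigma, edges)
        best, best_gain = None, 0.0
        for (u, v) in edges:
            if v in sigma or v in exclude or u in exclude:
                continue
            needs_two = (u != v) and (u not in sigma)
            if needs_two and len(sigma) == budget - 1:
                continue  # near the budget, only single-element additions fit
            gain = h(cur | {(u, v)}) - h(cur)
            if gain > best_gain:
                best, best_gain = (u, v), gain
        if best is None:
            break  # no edge with positive marginal gain remains
        u, v = best
        if u != v and u not in sigma:
            sigma.append(u)
        sigma.append(v)
    return sigma

def rosenets(edges, h, k, tau):
    """Algorithm 1: sigma1 (tau elements), then sigma2 (k - tau elements)
    chosen while ignoring sigma1 entirely; return their concatenation."""
    sigma1 = sequence_greedy_phase(edges, h, tau)
    sigma2 = sequence_greedy_phase(edges, h, k - tau, exclude=frozenset(sigma1))
    return sigma1 + sigma2

For the modular $h$ used in the running examples, one can pass, e.g., h = lambda S: sum(fig2.get(e, 0.0) for e in S); each phase evaluates $h$ at most $|E|$ times per added element, matching the $O(k|E|)$ complexity stated above.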

Theoretical Results

Let $\alpha=2d_{\text{in}}+1$, $\beta=1+d_{\text{in}}+d_{\text{out}}$, $\gamma=e^{\frac{k-3}{k-2}}$, and $\eta=e^{\frac{k-2\tau-1}{k-\tau-1}}$, where $d_{\text{in}}$ and $d_{\text{out}}$ are the maximum in-degree and out-degree of the network, respectively. For convenience, we denote by $f(v|\sigma)$ and $f(\sigma^{\prime}|\sigma)$ the marginal gain of appending $v$ and $\sigma^{\prime}$ to sequence $\sigma$, respectively. We denote by $\sigma^{*}(V,k,\tau)$ the optimal solution of the RoseNets problem with element set $V$, cardinality $k$ and robustness parameter $\tau$, and by $g_{\tau}(\sigma)$ the minimum value of $f(\sigma)$ after $\tau$ elements are removed from $\sigma$.

Theorem 1.

Consider $\tau=1$. Algorithm 1 achieves an approximation ratio of

\[\max\left\{\frac{1-e^{-(1-1/k)}}{\alpha\beta},\ \frac{\gamma^{\frac{1}{d_{\text{in}}}}-1}{\beta\gamma^{\frac{1}{d_{\text{in}}}}-1}\right\}.\]
Theorem 2.

Consider $1\leq\tau\leq k$. Algorithm 1 achieves an approximation ratio of

\[\max\left\{\frac{1-e^{-(1-1/k)}}{\alpha\beta},\ \frac{\tau\alpha\beta(\eta^{\frac{1}{d_{\text{in}}}}-1)}{\tau\alpha\eta^{\frac{1}{d_{\text{in}}}}-\beta(1-e^{-(1-1/k)})}\right\}.\]

In Theorem 1, it is hard to compare the two approximation ratios directly due to their complex mathematical expressions. We therefore consider specific network settings to show the respective advantages of the two terms.

First, it is easy to verify that both terms are monotonically increasing functions of $k$. When $k=3$, we have $\gamma=1$, so the second term vanishes and

\[\frac{1-e^{-(1-1/k)}}{\alpha\beta}-\frac{\gamma^{\frac{1}{d_{\text{in}}}}-1}{\beta\gamma^{\frac{1}{d_{\text{in}}}}-1}=\frac{1-e^{-2/3}}{\alpha\beta}>0.\]

Thus when $k$ is small, the first term is larger. When $k\to\infty$, the first term has a limit value of $(1-1/e)/\alpha\beta$, and the second term has a limit value of $\frac{e^{\frac{1}{d_{\text{in}}}}-1}{\beta e^{\frac{1}{d_{\text{in}}}}-1}$. When $d_{\text{in}}=1$, we have $\alpha=3$ and

\[(1-1/e)/\alpha\beta-\frac{e^{\frac{1}{d_{\text{in}}}}-1}{\beta e^{\frac{1}{d_{\text{in}}}}-1}=(1-1/e)/3\beta-\frac{e-1}{\beta e-1}=\frac{-1-2\beta e+\frac{1}{e}+2\beta}{3\beta(\beta e-1)}<0.\]

In this case, the second term is larger than the first term in Theorem 1. Thus we can conclude that under specific network structures, the second term becomes larger for large $k$.
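To make this comparison concrete, the two bounds of Theorem 1 can be evaluated numerically (a small sanity-check sketch; theorem1_bounds is an illustrative helper name, and the printed values are rounded):

import math

def theorem1_bounds(k, d_in, d_out):
    """Evaluate the two lower bounds of Theorem 1."""
    alpha, beta = 2 * d_in + 1, 1 + d_in + d_out
    gamma = math.exp((k - 3) / (k - 2))
    first = (1 - math.exp(-(1 - 1 / k))) / (alpha * beta)
    second = (gamma ** (1 / d_in) - 1) / (beta * gamma ** (1 / d_in) - 1)
    return first, second

print(theorem1_bounds(3, 1, 1))     # (~0.054, 0.0): first term wins for small k
print(theorem1_bounds(1000, 1, 1))  # (~0.070, ~0.240): second term wins for large k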

Similarly, for Theorem 2, when $k=3$ and $\tau=1$ we have $\eta=1$, so

\[\frac{1-e^{-(1-1/k)}}{\alpha\beta}-\frac{\tau\alpha\beta(\eta^{\frac{1}{d_{\text{in}}}}-1)}{\tau\alpha\eta^{\frac{1}{d_{\text{in}}}}-\beta(1-e^{-(1-1/k)})}=\frac{1-e^{-2/3}}{\alpha\beta}>0.\]

Thus when $k$ and $\tau$ are small, the first term is larger. When $k\to\infty$ while $\tau$ remains a constant, the first term has a limit value of $(1-1/e)/\alpha\beta$, and the second term has a limit value of $\frac{\tau\alpha\beta(e^{\frac{1}{d_{\text{in}}}}-1)}{\tau\alpha e^{\frac{1}{d_{\text{in}}}}-\beta(1-e^{-1})}$. When $d_{\text{in}}=1$ and $d_{\text{out}}<\frac{3\tau e^{2}}{e-1}-2$, we have $\alpha=3$, $\beta<\frac{3\tau e^{2}}{e-1}$ and

\[(1-1/e)/\alpha\beta-\frac{\tau\alpha\beta(e^{\frac{1}{d_{\text{in}}}}-1)}{\tau\alpha e^{\frac{1}{d_{\text{in}}}}-\beta(1-e^{-1})}=\frac{(1-\frac{1}{e})(3\tau e-\beta(1-\frac{1}{e}))-9\beta^{2}\tau(e-1)}{3\beta(3\tau e-\beta(1-\frac{1}{e}))}<0.\]

In this case, the second term is larger than the first term in Theorem 2. Thus we can again conclude that under specific network structures, the second term becomes larger for large $k$.

According to the above analysis, the values of $k$ and $\tau$, together with the network topology, significantly affect the approximation ratio. It would be an interesting future direction to explore how the approximation ratio changes as these parameters and the network topology change.

To prove the above two theorems, we need the following three auxiliary lemmas. Due to space limitations, we assume here that Lemmas 1, 2 and 3 hold and present the proof of Theorem 1 below. The proofs of Lemmas 1, 2 and 3 and of Theorem 2 are provided in the supplementary material.

Lemma 1.

There exists an element $v$ for sequences $\sigma_{1}$ and $\sigma_{2}$ satisfying $f(v|\sigma_{1})\geq\frac{1}{d_{\text{in}}|\sigma_{2}|}f(\sigma_{2}|\sigma_{1})$.

Lemma 2.

Consider $c\in(0,1]$ and $1\leq k^{\prime}\leq k$. Suppose that the selected sequence is $\sigma$ with $|\sigma|=k$ and that there exists a sequence $\sigma^{\prime}$ with $|\sigma^{\prime}|=k-k^{\prime}$ such that $\sigma^{\prime}\subseteq\sigma$ and $f(\sigma^{\prime})\geq cf(\sigma)$. Then we have $f(\sigma)\geq\frac{e^{\frac{k^{\prime}}{d_{\text{in}}k}}-1}{e^{\frac{k^{\prime}}{d_{\text{in}}k}}-c}f(\sigma^{*}(V,k,0))$.

Lemma 3.

Consider $1\leq\tau\leq k$. The following holds for any $Z\subseteq V$ with $|Z|\leq\tau$: $g_{\tau}(\sigma^{*}(V,k,\tau))\leq f(\sigma^{*}(V-Z,k-\tau,0))$.

Proof of Theorem 1

Given $\tau=1$, the selected sequence $\sigma_{1}$ has one element, and $|\sigma_{2}|=k-1$. Let $\sigma_{1}=\{v_{1}\}$. Then the final sequence is $\sigma=\{v_{1}\}\oplus\sigma_{2}$. Suppose the removed vertex from the sequence is $z$.

First, we show a lower bound on $f(\sigma_{2})$:

\[f(\sigma_{2})\geq\frac{1-e^{-(1-1/k)}}{\alpha}f(\sigma^{*}(V\backslash v_{1},k-1,0))\geq\frac{1-e^{-(1-1/k)}}{\alpha}g_{\tau}(\sigma^{*}(V,k,\tau))\tag{1}\]

The first inequality is due to the approximation ratio of the Sequence Greedy algorithm for Net-submodular maximization (Mitrovic et al. 2018). The second inequality is due to Lemma 3.

The removed element $z$ can be either $v_{1}$ or an element of $\sigma_{2}$. In the following, we consider these two cases.

Case 1. Let $z=v_{1}$. Then we have

\[f(\sigma-z)=f(\sigma_{2})\geq\frac{1-e^{-(1-1/k)}}{\alpha}g_{\tau}(\sigma^{*}(V,k,\tau))\tag{2}\]

Case 2. Let $z\in\sigma_{2}$. We further consider two sub-cases.

Case 2.1. Let $f(\sigma_{2})\leq f(\sigma_{2}-z)$.

In this case, the removal does not reduce the overall value of the remaining sequence $\sigma_{2}-\{z\}$. Then we have

\[f(\sigma-z)=f(v_{1}\oplus(\sigma_{2}-z))\geq f(\sigma_{2}-z)\geq f(\sigma_{2})\geq\frac{1-e^{-(1-1/k)}}{\alpha}g_{\tau}(\sigma^{*}(V,k,\tau))\tag{3}\]

Case 2.2. Let $f(\sigma_{2})>f(\sigma_{2}-z)$.

We define $q=\frac{f(\sigma_{2})-f(\sigma_{2}-z)}{(d_{\text{in}}+d_{\text{out}})f(\sigma_{2})}$ to represent the ratio of the loss from removing element $z$ from sequence $\sigma_{2}$ to the value of $\sigma_{2}$. Obviously, $q\in(0,\frac{1}{d_{\text{in}}+d_{\text{out}}}]$ since $f(\sigma_{2})>f(\sigma_{2}-z)$.

First, writing $\sigma_{2}=\sigma_{2}^{1}\oplus z\oplus\sigma_{2}^{2}$, where $\sigma_{2}^{1}$ and $\sigma_{2}^{2}$ are the parts of $\sigma_{2}$ before and after $z$, we have

\begin{align*}
(d_{\text{in}}+d_{\text{out}})qf(\sigma_{2})&=f(\sigma_{2})-f(\sigma_{2}-z)\\
&=f(\sigma_{2}^{1}\oplus z\oplus\sigma_{2}^{2})-f(\sigma_{2}^{1}\oplus\sigma_{2}^{2})\\
&=f(\sigma_{2}^{1})+f(z|\sigma_{2}^{1})+f(\sigma_{2}^{2}|(\sigma_{2}^{1}\oplus z))-f(\sigma_{2}^{1})-f(\sigma_{2}^{2}|\sigma_{2}^{1})\\
&=f(z|\sigma_{2}^{1})+f(\sigma_{2}^{2}|(\sigma_{2}^{1}\oplus z))-f(\sigma_{2}^{2}|\sigma_{2}^{1})\\
&\leq d_{\text{in}}h(e_{z}^{\text{in}})+d_{\text{out}}h(e_{z}^{\text{out}})\\
&\leq(d_{\text{in}}+d_{\text{out}})\max\{h(e_{z}^{\text{in}}),h(e_{z}^{\text{out}})\}\tag{4}
\end{align*}

where $h(e_{z}^{\text{in}})$ and $h(e_{z}^{\text{out}})$ denote the maximum utility over all incoming and over all outgoing edges of $z$, respectively. The first inequality is due to the fact that the marginal gain of vertex $z$ with respect to the prefix and the subsequent sequence is at most $d_{\text{in}}h(e_{z}^{\text{in}})$ and $d_{\text{out}}h(e_{z}^{\text{out}})$, respectively. The second inequality then follows directly.

Given Equation (4), we need to prove four inequalities to finally prove the theorem.

First, suppose the first vertex of $\sigma_{2}$ is $v_{2}$. By the monotonicity of the function $f(\cdot)$ and Equation (4), we have

\begin{align*}
f(\sigma-\{z\})&\geq f(v_{1}\oplus v_{2})\geq\max\{h(e_{z}^{\text{in}}),h(e_{z}^{\text{out}})\}\geq qf(\sigma_{2}),\ \text{and}\\
f(\sigma-\{z\})&=f(v_{1}\oplus(\sigma_{2}-z))\geq f(\sigma_{2}-z)\geq(1-(d_{\text{in}}+d_{\text{out}})q)f(\sigma_{2})\tag{5}
\end{align*}

Given Equation (5), we have Inequality 1 below.

Inequality 1:

\[f(\sigma-z)\geq\max\{q\cdot f(\sigma_{2}),\ (1-(d_{\text{in}}+d_{\text{out}})q)\cdot f(\sigma_{2})\}.\]

We know that $\max\{x,1-bx\}\geq\frac{1}{1+b}$ for $x\in(0,\frac{1}{b}]$ and $b>0$: as $x$ is monotonically increasing and $1-bx$ is monotonically decreasing, the lower envelope is maximized when $x=1-bx$. Thus we have Inequality 2 below.

Inequality 2: $\max\{q,1-(d_{\text{in}}+d_{\text{out}})q\}\geq 1/\beta$.

Note that the first two elements $v_{2},v_{3}$ of $\sigma_{2}$ satisfy

\[f(v_{2}\oplus v_{3})\geq\max\{h(e_{z}^{\text{in}}),h(e_{z}^{\text{out}})\}\geq qf(\sigma_{2})\]

Thus, by instantiating the parameters in Lemma 2 and Lemma 3, we have the following result, which implies Inequality 3:

\[f(\sigma_{2})\geq\frac{\gamma^{\frac{1}{d_{\text{in}}}}-1}{\gamma^{\frac{1}{d_{\text{in}}}}-q}f(\sigma^{*}(V\backslash v_{1},k-\tau,0))\ \Longrightarrow\ \textbf{Inequality 3: }f(\sigma_{2})\geq\frac{\gamma^{\frac{1}{d_{\text{in}}}}-1}{\gamma^{\frac{1}{d_{\text{in}}}}-q}g_{\tau}(\sigma^{*}(V,k,\tau))\]

Now define $\ell_{1}(q)=q\frac{\gamma^{\frac{1}{d_{\text{in}}}}-1}{\gamma^{\frac{1}{d_{\text{in}}}}-q}$ and $\ell_{2}(q)=(1-(d_{\text{in}}+d_{\text{out}})q)\frac{\gamma^{\frac{1}{d_{\text{in}}}}-1}{\gamma^{\frac{1}{d_{\text{in}}}}-q}$. It is easy to verify that for $k\geq 3$ and $q\in(0,\frac{1}{d_{\text{in}}+d_{\text{out}}}]$, $\ell_{1}(q)$ is monotonically increasing and $\ell_{2}(q)$ is monotonically decreasing. Note that when $q=\frac{1}{\beta}$, $\ell_{1}(q)=\ell_{2}(q)=\frac{\gamma^{\frac{1}{d_{\text{in}}}}-1}{\beta\gamma^{\frac{1}{d_{\text{in}}}}-1}$. We consider two cases for $q$: (1) when $q\in(0,\frac{1}{\beta}]$, we have $\max\{\ell_{1}(q),\ell_{2}(q)\}\geq\ell_{2}(\frac{1}{\beta})$ as $\ell_{2}(q)$ is monotonically decreasing; (2) when $q\in(\frac{1}{\beta},\frac{1}{d_{\text{in}}+d_{\text{out}}}]$, we have $\max\{\ell_{1}(q),\ell_{2}(q)\}\geq\ell_{1}(\frac{1}{\beta})$ as $\ell_{1}(q)$ is monotonically increasing. Thus we have Inequality 4 below.

Inequality 4:

\[\max\left\{q\frac{\gamma^{\frac{1}{d_{\text{in}}}}-1}{\gamma^{\frac{1}{d_{\text{in}}}}-q},\ (1-(d_{\text{in}}+d_{\text{out}})q)\frac{\gamma^{\frac{1}{d_{\text{in}}}}-1}{\gamma^{\frac{1}{d_{\text{in}}}}-q}\right\}\geq\frac{\gamma^{\frac{1}{d_{\text{in}}}}-1}{\beta\gamma^{\frac{1}{d_{\text{in}}}}-1}\]

Combining Inequality 1, Inequality 2 and Equation (1), we obtain the first lower bound in Theorem 1:

\begin{align*}
f(\sigma-z)&\geq\max\{q\cdot f(\sigma_{2}),\ (1-(d_{\text{in}}+d_{\text{out}})q)f(\sigma_{2})\}\\
&\geq\max\{q,\ 1-(d_{\text{in}}+d_{\text{out}})q\}\frac{1-e^{-(1-1/k)}}{\alpha}g_{\tau}(\sigma^{*}(V,k,\tau))\\
&\geq\frac{1-e^{-(1-1/k)}}{\alpha\beta}g_{\tau}(\sigma^{*}(V,k,\tau))
\end{align*}

Combining Inequality 1, Inequality 3 and Inequality 4, we obtain the second lower bound in Theorem 1:

\begin{align*}
f(\sigma-z)&\geq\max\{q\cdot f(\sigma_{2}),\ (1-(d_{\text{in}}+d_{\text{out}})q)\cdot f(\sigma_{2})\}\\
&\geq\max\left\{q\frac{\gamma^{\frac{1}{d_{\text{in}}}}-1}{\gamma^{\frac{1}{d_{\text{in}}}}-q},\ (1-(d_{\text{in}}+d_{\text{out}})q)\frac{\gamma^{\frac{1}{d_{\text{in}}}}-1}{\gamma^{\frac{1}{d_{\text{in}}}}-q}\right\}g_{\tau}(\sigma^{*}(V,k,\tau))\\
&\geq\frac{\gamma^{\frac{1}{d_{\text{in}}}}-1}{\beta\gamma^{\frac{1}{d_{\text{in}}}}-1}g_{\tau}(\sigma^{*}(V,k,\tau))
\end{align*}

This completes the proof.

Experiments

We compare the performance of our algorithm RoseNets to its non-robust version Sequence Greedy (Sequence for short) (Mitrovic et al. 2018), an existing submodular sequence baseline (OMegA) (Tschiatschek, Singla, and Krause 2017), and a naive baseline (Frequency), which outputs the most popular items the user has not yet reviewed.

To evaluate the performance of the algorithms, we use three evaluation metrics. The first is the Accuracy Score, which simply counts the number of accurately recommended items. While this is a sensible measure, it does not explicitly consider the order of the sequence. Therefore, we also consider the Sequence Score, a measure based on the Kendall-Tau distance (Kendall 1938) that counts the number of ordered pairs appearing in both the predicted sequence and the true sequence. These two metrics are also used in (Mitrovic et al. 2019). The third metric is the Utility Function Value of the selected sequence.
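One possible reading of the Sequence Score is sketched below (sequence_score is an illustrative name; it counts element pairs that are ordered the same way in both sequences, our interpretation of the Kendall-Tau-based definition above):

def sequence_score(pred, truth):
    """Count ordered pairs (a, b) such that a precedes b in both
    the predicted sequence and the true sequence."""
    pred_pos = {v: i for i, v in enumerate(pred)}
    truth_pos = {v: i for i, v in enumerate(truth)}
    common = [v for v in pred if v in truth_pos]
    return sum(1 for a in common for b in common
               if pred_pos[a] < pred_pos[b] and truth_pos[a] < truth_pos[b])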

We use a probabilistic coverage utility function as our Net-submodular function $f$. Mathematically,

\[f(\sigma)=h(E_{1})=\sum_{j\in V}\Big[1-\prod_{(i,j)\in E_{1}}(1-w_{ij})\Big],\]

where $E_{1}\subseteq E$ is the edge set induced by the sequence $\sigma$. To simulate the worst-case removal, we remove the first $\tau$ elements when evaluating the robustness of the algorithms.
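A sketch of this utility and of the prefix-removal evaluation follows (prob_coverage and residual_after_prefix_removal are illustrative names; targets with no induced incoming edge contribute 0 and are simply skipped):

def prob_coverage(seq, edges):
    """f(sigma) = sum over targets j of [1 - prod over induced edges (i, j) of (1 - w_ij)]."""
    pos = {v: i for i, v in enumerate(seq)}
    fail = {}  # per target j: product of (1 - w_ij) over induced incoming edges
    for (i, j), w in edges.items():
        if i in pos and j in pos and (pos[i] < pos[j] or i == j):
            fail[j] = fail.get(j, 1.0) * (1.0 - w)
    return sum(1.0 - p for p in fail.values())

def residual_after_prefix_removal(seq, edges, tau):
    """Utility of the sequence after its first tau elements are removed."""
    return prob_coverage(seq[tau:], edges)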

Amazon Product Recommendation

Using the Amazon Video Games review dataset (Ni, Li, and McAuley 2019), we conduct experiments on the task of recommending products to users. In particular, given the first 4 products a specific user has purchased, we want to predict the next $k$ products she will buy. We first build a graph $G=(V,E)$, where $V$ is the set of all products and $E$ is the set of edges between these products. The weight of each edge, $w_{ij}$, is defined as the conditional probability of purchasing product $j$ given that the user has previously purchased product $i$. We compute $w_{ij}$ as the fraction of users that purchased $j$ after having purchased $i$ among all users that purchased $i$. There are also self-loops with weight $w_{ii}$ representing the fraction of all users that purchased product $i$. We focus on products that have been purchased at least 50 times each, leaving a total of 9383 unique products, and we select users that have purchased at least 29 products, leaving 909 users. We conduct the recommendation task on these 909 users and report the average value of each evaluation metric.
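The edge-weight construction can be sketched as follows (assumptions: each user's purchase history is an ordered list of item ids, and each user is counted at most once per ordered pair; build_weights is an illustrative reading of the construction described above):

from collections import defaultdict

def build_weights(purchase_seqs):
    """w_ij = |users who bought j after i| / |users who bought i|;
    self-loop w_ii = |users who bought i| / |users|."""
    bought = defaultdict(int)  # i -> number of users who bought i
    after = defaultdict(int)   # (i, j) -> number of users who bought j after i
    for seq in purchase_seqs:
        pairs, seen = set(), set()
        for item in seq:
            pairs.update((prev, item) for prev in seen if prev != item)
            seen.add(item)
        for p in pairs:
            after[p] += 1
        for item in seen:
            bought[item] += 1
    w = {(i, j): c / bought[i] for (i, j), c in after.items()}
    for i, c in bought.items():
        w[(i, i)] = c / len(purchase_seqs)
    return w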

Figure 3 shows the performance of the compared algorithms on the accuracy score, sequence score and utility function value, respectively. In Figures 3(a), 3(b), 3(c) and 3(d), we find that after the removal of $\tau$ elements, RoseNets outperforms all the baselines. These results demonstrate that the RoseNets algorithm is effective and robust in real applications, since accuracy score and sequence score are common evaluation metrics in practice. In Figures 3(e) and 3(f), the only difference is that the OMegA algorithm performs better when $\tau$ is small or $k$ is large. The OMegA algorithm aims to find a globally optimal solution: it topologically re-sorts all candidates after each element selection. It can return a solution with a better utility function value when $k$ is large and $\tau$ is small, but it runs much slower than RoseNets or Sequence, and it shows poor performance on accuracy and sequence score. Thus the RoseNets algorithm is more effective and robust in real applications.

In addition, we also show the results of RoseNets and Sequence with $\tau=0$. We can see that across the experimental results for the three metrics, RoseNets outperforms Sequence, which is consistent with our expectation. On utility function value, Sequence($\tau=0$) is better than RoseNets($\tau=0$); however, on accuracy score and sequence score, RoseNets($\tau=0$) is very close to Sequence($\tau=0$) and sometimes performs better. The former result is due to the effectiveness of the greedy framework: the RoseNets algorithm invokes Sequence for two independent trials, which intuitively cannot achieve a utility comparable to a single-trial Sequence execution, due to the Net-submodularity. The latter result shows that directly applying greedy selection is not always superior, which stems from the intrinsic property of the greedy algorithm: although $1-1/e$ is nearly the best approximation ratio, some heuristic algorithms may achieve better performance in specific cases. If we invoked more trials of the Sequence algorithm, better robustness and a better approximation ratio might be achieved, but the utility value would become lower, because more independent greedy trials are more likely to trigger the diminishing-return phenomenon. In real applications, designing robust algorithms thus involves a trade-off between robustness and maximization of the utility value.

Figure 3: Recommendation Application. (a) $\tau=10$ and $k=[11,20]$; (b) $k=15$ and $\tau=[0,10]$; (c) $\tau=10$ and $k=[11,20]$; (d) $k=15$ and $\tau=[0,10]$; (e) $\tau=10$ and $k=[11,20]$; (f) $k=15$ and $\tau=[0,10]$.
Figure 4: Link Prediction Application. (a) $\tau=5$ and $k=[6,15]$; (b) $k=10$ and $\tau=[0,10]$; (c) $\tau=5$ and $k=[6,15]$; (d) $k=10$ and $\tau=[0,10]$; (e) $\tau=5$ and $k=[6,15]$; (f) $k=10$ and $\tau=[0,10]$.

Wikipedia Link Prediction

Using the Wikispeedia dataset (West, Pineau, and Precup 2009), we consider users who surf through Wikipedia towards some target article. Given a sequence of articles a user has previously visited, we want to guide her to the page she is trying to reach. Since different pages have different valid links, the order of visited pages is critical to this task. Formally, given the first 4 pages each user visited, we want to predict which page she is trying to reach by making a series of suggestions for which link to follow. In this case, we build the graph $G=(V,E)$, where $V$ is the set of all pages and $E$ is the set of existing links between pages. Similarly to the recommendation case, the weight $w_{ij}$ of an edge $(i,j)\in E$ is the probability of moving to page $j$ given that the user is currently at page $i$, i.e., the fraction of moves from $i$ to $j$ among all visits to $i$. Here we build no self-loops, as we assume one can only move via links and cannot jump to random pages. We condense the dataset to include only articles and edges that appear in a path, leaving 4170 unique pages and 55147 edges. We run the algorithms on paths of length at least 29, which leaves 271 paths. We conduct the link prediction task on these 271 paths and report the average value of each evaluation metric.

Figure 4 shows the performance of the compared algorithms on the accuracy score, sequence score and utility function value, respectively. We find that after the removal of $\tau$ elements, RoseNets outperforms all the baselines in all cases. These results demonstrate that the RoseNets algorithm is effective and robust in real applications. The OMegA algorithm no longer shows performance comparable to RoseNets. This is because (1) the path to be predicted is very long, and (2) the overlap between different paths is not as large as in the Amazon recommendation experiments, so the global algorithm OMegA cannot exploit its advantages. We can still observe in Figures 4(a), 4(c) and 4(e) that when $k$ becomes larger, the performance of OMegA increases faster. This in turn demonstrates that the RoseNets algorithm is more general, effective and robust, as it does not assume a specific application scenario.

Another difference in the link prediction application is that RoseNets($\tau=0$) outperforms Sequence($\tau=0$) in almost all cases on accuracy and sequence score. This again verifies that direct greedy selection may sometimes be far from the optimal solution. As discussed at the end of the recommendation case, an interesting future direction is to explore the trade-off between robustness and maximization of the utility value by invoking more independent greedy selection trials.

Conclusion

In this paper, we are the first to study the RoseNets problem, which combines robust optimization and sequence networked submodular maximization. We design a robust algorithm with an approximation ratio bounded by the number of removed elements and the network topology. Experiments on real applications of recommendation and link prediction demonstrate the effectiveness of the proposed algorithm. For future work, one direction is to develop robust algorithms that achieve higher approximation ratios; an intuitive improvement is to invoke multiple trials of independent greedy selection. Another direction is to consider robustness against the removal of edges, which is non-trivial since different removal operations change the network topology and thus affect the approximation ratio.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grant No: 62102357, U1866602, 62106221, 62102382), and the Starry Night Science Fund of Zhejiang University Shanghai Institute for Advanced Study (Grant No: SN-ZJU-SIAS-001). It is also partially supported by the Zhejiang Provincial Key Research and Development Program of China (2021C01164).

References

  • Bach (2019) Bach, F. 2019. Submodular functions: from discrete to continuous domains. Mathematical Programming, 175(1): 419–459.
  • Bogunovic et al. (2017) Bogunovic, I.; Mitrović, S.; Scarlett, J.; and Cevher, V. 2017. Robust submodular maximization: A non-uniform partitioning approach. In International Conference on Machine Learning, 508–516. PMLR.
  • Calinescu et al. (2011) Calinescu, G.; Chekuri, C.; Pal, M.; and Vondrák, J. 2011. Maximizing a monotone submodular function subject to a matroid constraint. SIAM Journal on Computing, 40(6): 1740–1766.
  • Chekuri, Vondrák, and Zenklusen (2014) Chekuri, C.; Vondrák, J.; and Zenklusen, R. 2014. Submodular function maximization via the multilinear relaxation and contention resolution schemes. SIAM Journal on Computing, 43(6): 1831–1879.
  • Das and Kempe (2011) Das, A.; and Kempe, D. 2011. Submodular meets spectral: greedy algorithms for subset selection, sparse approximation and dictionary selection. In Proceedings of the 28th International Conference on International Conference on Machine Learning, 1057–1064.
  • Feige, Mirrokni, and Vondrak (2007) Feige, U.; Mirrokni, V. S.; and Vondrak, J. 2007. Maximizing Non-Monotone Submodular Functions. In Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science, 461–471.
  • Gabillon et al. (2013) Gabillon, V.; Kveton, B.; Wen, Z.; Eriksson, B.; and Muthukrishnan, S. 2013. Adaptive submodular maximization in bandit setting. Advances in Neural Information Processing Systems, 26.
  • Golovin and Krause (2011) Golovin, D.; and Krause, A. 2011. Adaptive submodularity: Theory and applications in active learning and stochastic optimization. Journal of Artificial Intelligence Research, 42: 427–486.
  • Kazemi, Zadimoghaddam, and Karbasi (2018) Kazemi, E.; Zadimoghaddam, M.; and Karbasi, A. 2018. Scalable deletion-robust submodular maximization: Data summarization with privacy and fairness constraints. In International conference on machine learning, 2544–2553. PMLR.
  • Kempe, Kleinberg, and Tardos (2003) Kempe, D.; Kleinberg, J.; and Tardos, É. 2003. Maximizing the spread of influence through a social network. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, 137–146. ACM.
  • Kendall (1938) Kendall, M. G. 1938. A new measure of rank correlation. Biometrika, 30(1/2): 81–93.
  • Khuller, Moss, and Naor (1999) Khuller, S.; Moss, A.; and Naor, J. S. 1999. The budgeted maximum coverage problem. Information processing letters, 70(1): 39–45.
  • Kirchhoff and Bilmes (2014) Kirchhoff, K.; and Bilmes, J. 2014. Submodularity for data selection in machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 131–141.
  • Krause (2005) Krause, A. 2005. Near-optimal nonmyopic value of information in graphical models. In Proc. of the 21st Annual Conf. on Uncertainty in Artificial Intelligence (UAI 2005), 324–331.
  • Li et al. (2022) Li, M.; Wang, T.; Zhang, H.; Zhang, S.; Zhao, Z.; Zhang, W.; Miao, J.; Pu, S.; and Wu, F. 2022. HERO: HiErarchical spatio-tempoRal reasOning with Contrastive Action Correspondence for End-to-End Video Object Grounding. In ACM MM.
  • Lin and Bilmes (2011) Lin, H.; and Bilmes, J. 2011. A class of submodular functions for document summarization. In Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies, 510–520.
  • Mirzasoleiman, Karbasi, and Krause (2017) Mirzasoleiman, B.; Karbasi, A.; and Krause, A. 2017. Deletion-robust submodular maximization: Data summarization with “the right to be forgotten”. In International Conference on Machine Learning, 2449–2458. PMLR.
  • Mitrovic et al. (2018) Mitrovic, M.; Feldman, M.; Krause, A.; and Karbasi, A. 2018. Submodularity on hypergraphs: From sets to sequences. In International Conference on Artificial Intelligence and Statistics, 1177–1184. PMLR.
  • Mitrovic et al. (2019) Mitrovic, M.; Kazemi, E.; Feldman, M.; Krause, A.; and Karbasi, A. 2019. Adaptive sequence submodularity. Advances in Neural Information Processing Systems, 32.
  • Mitrovic et al. (2017) Mitrovic, S.; Bogunovic, I.; Norouzi-Fard, A.; Tarnawski, J. M.; and Cevher, V. 2017. Streaming robust submodular maximization: A partitioned thresholding approach. Advances in Neural Information Processing Systems, 30.
  • Nemhauser and Wolsey (1978) Nemhauser, G. L.; and Wolsey, L. A. 1978. Best algorithms for approximating the maximum of a submodular set function. Mathematics of operations research, 3(3): 177–188.
  • Ni, Li, and McAuley (2019) Ni, J.; Li, J.; and McAuley, J. 2019. Justifying recommendations using distantly-labeled reviews and fine-grained aspects. In Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP), 188–197.
  • Orlin, Schulz, and Udwani (2018) Orlin, J. B.; Schulz, A. S.; and Udwani, R. 2018. Robust monotone submodular function maximization. Mathematical Programming, 172(1): 505–537.
  • Sallam et al. (2020) Sallam, G.; Zheng, Z.; Wu, J.; and Ji, B. 2020. Robust sequence submodular maximization. Advances in Neural Information Processing Systems, 33: 15417–15426.
  • Shi et al. (2019) Shi, Q.; Wang, C.; Ye, D.; Chen, J.; Feng, Y.; and Chen, C. 2019. Adaptive influence blocking: Minimizing the negative spread by observation-based policies. In 2019 IEEE 35th international conference on data engineering (ICDE), 1502–1513. IEEE.
  • Shi et al. (2021) Shi, Q.; Wang, C.; Ye, D.; Chen, J.; Zhou, S.; Feng, Y.; Chen, C.; and Huang, Y. 2021. Profit maximization for competitive social advertising. Theoretical Computer Science, 868: 12–29.
  • Streeter and Golovin (2008) Streeter, M.; and Golovin, D. 2008. An online algorithm for maximizing submodular functions. Advances in Neural Information Processing Systems, 21.
  • Tschiatschek, Singla, and Krause (2017) Tschiatschek, S.; Singla, A.; and Krause, A. 2017. Selecting sequences of items via submodular maximization. In Thirty-First AAAI Conference on Artificial Intelligence.
  • Tzoumas et al. (2017) Tzoumas, V.; Gatsis, K.; Jadbabaie, A.; and Pappas, G. J. 2017. Resilient monotone submodular function maximization. In 2017 IEEE 56th Annual Conference on Decision and Control (CDC), 1362–1367. IEEE.
  • Tzoumas, Jadbabaie, and Pappas (2018) Tzoumas, V.; Jadbabaie, A.; and Pappas, G. J. 2018. Resilient monotone sequential maximization. In 2018 IEEE Conference on Decision and Control (CDC), 7261–7268. IEEE.
  • Tzoumas, Jadbabaie, and Pappas (2020) Tzoumas, V.; Jadbabaie, A.; and Pappas, G. J. 2020. Robust and adaptive sequential submodular optimization. IEEE Transactions on Automatic Control.
  • Wang et al. (2021) Wang, C.; Shi, Q.; Xian, W.; Feng, Y.; and Chen, C. 2021. Efficient diversified influence maximization with adaptive policies. Knowledge-Based Systems, 213: 106692.
  • West, Pineau, and Precup (2009) West, R.; Pineau, J.; and Precup, D. 2009. Wikispeedia: An online game for inferring semantic distances between concepts. In Twenty-First IJCAI.
  • Zhang et al. (2015) Zhang, Z.; Chong, E. K.; Pezeshki, A.; and Moran, W. 2015. String submodular functions with curvature constraints. IEEE Transactions on Automatic Control, 61(3): 601–616.
  • Zhang et al. (2013) Zhang, Z.; Wang, Z.; Chong, E. K.; Pezeshki, A.; and Moran, W. 2013. Near optimality of greedy strategies for string submodular functions with forward and backward curvature constraints. In 52nd IEEE Conference on Decision and Control, 5156–5161. IEEE.

Appendices: Omitted Proofs

Proof of Lemma 1

Lemma 1. There exists a vertex $v$ for sequences $\sigma_{1}$ and $\sigma_{2}$ satisfying $f(v|\sigma_{1})\geq\frac{1}{d_{\text{in}}|\sigma_{2}|}f(\sigma_{2}|\sigma_{1})$.

Proof.

Define $E_{1}=E(\sigma_{1})$ and $E_{2}=E(\sigma_{2}\oplus\sigma_{1})-E(\sigma_{1})$. We denote by $E^{[i,j]}$ the $i$th to $j$th elements of $E$.

\begin{align*}
f(\sigma_{2}|\sigma_{1})&=f(\sigma_{2}\oplus\sigma_{1})-f(\sigma_{1})=h(E(\sigma_{2}\oplus\sigma_{1}))-h(E(\sigma_{1}))\\
&=\sum^{|E_{2}|}_{j=1}\big[h(E_{1}\cup E_{2}^{[1,j]})-h(E_{1}\cup E_{2}^{[1,j-1]})\big]=\sum_{j=1}^{|E_{2}|}h(E_{2}^{[j,j]}|E_{1}\cup E_{2}^{[1,j-1]})
\end{align*}

There must exist an index $1\leq j^{\prime}\leq|E_{2}|$ such that

\[h(E_{2}^{[j^{\prime},j^{\prime}]}|E_{1}\cup E_{2}^{[1,j^{\prime}-1]})\geq\frac{h(E(\sigma_{2}\oplus\sigma_{1}))-h(E(\sigma_{1}))}{|E_{2}|}\]

Due to the submodularity of $h$, we have

\[h(E_{2}^{[j^{\prime},j^{\prime}]}|E_{1})\geq h(E_{2}^{[j^{\prime},j^{\prime}]}|E_{1}\cup E_{2}^{[1,j^{\prime}-1]})\geq\frac{1}{|E_{2}|}\big(h(E(\sigma_{2}\oplus\sigma_{1}))-h(E(\sigma_{1}))\big)\]

Thus there exist some edge $e$ and some vertex $v$ such that

\[h(e|E_{1})\geq\frac{1}{|E_{2}|}f(\sigma_{2}|\sigma_{1})\ \Longrightarrow\ f(v|\sigma_{1})\geq\frac{1}{d_{\text{in}}|\sigma_{2}|}f(\sigma_{2}|\sigma_{1})\]
∎

Proof of Lemma 2

Lemma 2. Consider $c\in(0,1]$ and $1\leq k^{\prime}\leq k$. Suppose that the sequence selected by the Sequence Greedy Algorithm is $\sigma$ with $|\sigma|=k$ and that there exists a sequence $\sigma^{\prime}$ with $|\sigma^{\prime}|=k-k^{\prime}$ such that $\sigma^{\prime}\subseteq\sigma$ and $f(\sigma^{\prime})\geq cf(\sigma)$. Then we have $f(\sigma)\geq\frac{e^{\frac{k^{\prime}}{d_{\text{in}}k}}-1}{e^{\frac{k^{\prime}}{d_{\text{in}}k}}-c}f(\sigma^{*}(V,k,0))$.

Proof.

Let $v_{2}^{i}$ be the $i$-th vertex of sequence $\sigma_{2}$, and let $\sigma_{2}^{i}$ be the sequence consisting of the first $i$ vertices of $\sigma_{2}$. We can safely assume that $|\sigma^{*}(V,k,0)|=k$.

\begin{align*}
f(\sigma_{1}\oplus\sigma_{2}^{i})-f(\sigma_{1}\oplus\sigma_{2}^{i-1})&=f(v_{2}^{i}|\sigma_{1}\oplus\sigma_{2}^{i-1})\geq h(e_{2}^{i}|E(\sigma_{1}\oplus\sigma_{2}^{i-1}))\\
&\geq\frac{1}{d_{\text{in}}k}\big(h(E(\sigma^{*}(V,k,0))\cup E(\sigma_{1}\oplus\sigma_{2}^{i-1}))-h(E(\sigma_{1}\oplus\sigma_{2}^{i-1}))\big)\\
&=\frac{1}{d_{\text{in}}k}f(\sigma^{*}(V,k,0)|\sigma_{1}\oplus\sigma_{2}^{i-1})\geq\frac{1}{d_{\text{in}}k}\big(f(\sigma^{*}(V,k,0))-f(\sigma_{1}\oplus\sigma_{2}^{i-1})\big)
\end{align*}

where $e_{2}^{i}$ is the edge whose selection by the algorithm makes $v_{2}^{i}$ added to the returned sequence. The above inequalities hold because the function $f(\cdot)$ is monotone, the Sequence Greedy algorithm selects the edge of largest marginal value each time, and appending $\sigma^{*}(V,k,0)$ to $\sigma_{1}\oplus\sigma_{2}^{i-1}$ adds at most $d_{\text{in}}k$ edges.

Rewriting the above inequality, we have

\[f(\sigma_{1}\oplus\sigma_{2}^{i})\geq\frac{1}{d_{\text{in}}k}f(\sigma^{*}(V,k,0))+\Big(1-\frac{1}{d_{\text{in}}k}\Big)f(\sigma_{1}\oplus\sigma_{2}^{i-1})\]

Unrolling this recursion for $i\in\{1,2,...,k^{\prime}\}$, we have

\begin{align*}
f(\sigma_{1}\oplus\sigma_{2}^{k^{\prime}})&\geq\sum_{j=0}^{k^{\prime}-1}\frac{1}{d_{\text{in}}k}\Big(1-\frac{1}{d_{\text{in}}k}\Big)^{j}f(\sigma^{*}(V,k,0))+\Big(1-\frac{1}{d_{\text{in}}k}\Big)^{k^{\prime}}f(\sigma_{1})\\
&=\Big(1-\Big(1-\frac{1}{d_{\text{in}}k}\Big)^{k^{\prime}}\Big)f(\sigma^{*}(V,k,0))+\Big(1-\frac{1}{d_{\text{in}}k}\Big)^{k^{\prime}}f(\sigma_{1})
\end{align*}

Thus we have

\begin{align*}
f(\sigma_{2}|\sigma_{1})=f(\sigma_{1}\oplus\sigma_{2}^{k^{\prime}})-f(\sigma_{1})&\geq\Big(1-\Big(1-\frac{1}{d_{\text{in}}k}\Big)^{k^{\prime}}\Big)\big(f(\sigma^{*}(V,k,0))-f(\sigma_{1})\big)\\
&\geq\big(1-1/e^{\frac{k^{\prime}}{d_{\text{in}}k}}\big)\big(f(\sigma^{*}(V,k,0))-f(\sigma_{1})\big)
\end{align*}

Suppose $f(\sigma)=\delta f(\sigma^{*}(V,k,0))$ for some $\delta\in(0,1]$. Then we have

\begin{align*}
\delta f(\sigma^{*}(V,k,0))=f(\sigma)&=f(\sigma_{1})+f(\sigma_{2}|\sigma_{1})\geq f(\sigma_{1})+\big(1-1/e^{\frac{k^{\prime}}{d_{\text{in}}k}}\big)\big(f(\sigma^{*}(V,k,0))-f(\sigma_{1})\big)\\
&=\big(1/e^{\frac{k^{\prime}}{d_{\text{in}}k}}\big)f(\sigma_{1})+\big(1-1/e^{\frac{k^{\prime}}{d_{\text{in}}k}}\big)f(\sigma^{*}(V,k,0))\\
&\geq\frac{c\delta}{e^{\frac{k^{\prime}}{d_{\text{in}}k}}}f(\sigma^{*}(V,k,0))+\Big(1-\frac{1}{e^{\frac{k^{\prime}}{d_{\text{in}}k}}}\Big)f(\sigma^{*}(V,k,0))
\end{align*}

Then we have

\[\delta\geq c\cdot\delta/e^{\frac{k^{\prime}}{d_{\text{in}}k}}+\big(1-1/e^{\frac{k^{\prime}}{d_{\text{in}}k}}\big)\ \Longrightarrow\ \delta\geq\frac{e^{\frac{k^{\prime}}{d_{\text{in}}k}}-1}{e^{\frac{k^{\prime}}{d_{\text{in}}k}}-c}\]
∎

Proof of Lemma 3

Lemma 3. Consider 1τk1\leq\tau\leq k. The following holds for any ZVZ\subseteq V with |Z|τ|Z|\leq\tau: gτ(σ(V,k,τ))f(σ(VZ,kτ,0))g_{\tau}(\sigma^{*}(V,k,\tau))\leq f(\sigma^{*}(V-Z,k-\tau,0)).

Proof.

Let Zτ(σ)Z_{\tau}(\sigma) be the set that minimizes gτ(σ)g_{\tau}(\sigma). We can safely assume |Zτ(σ)|=τ|Z_{\tau}(\sigma)|=\tau.

Let 𝒰=V(σ(V,k,τ))Z\mathcal{U}=V(\sigma^{*}(V,k,\tau))\cap Z and τ=|𝒰|\tau^{\prime}=|\mathcal{U}|. Obviously, 𝒰V(σ(V,k,τ))\mathcal{U}\subseteq V(\sigma^{*}(V,k,\tau)). Also, we have

𝒰Zττ(σ(V,k,τ)𝒰)V(σ(V,k,τ))and|𝒰Zττ(σ(V,k,τ)𝒰)|=τ.\mathcal{U}\cup Z_{\tau-\tau^{\prime}}(\sigma^{*}(V,k,\tau)-\mathcal{U})\subseteq V(\sigma^{*}(V,k,\tau))\quad\quad\text{and}\quad\quad|\mathcal{U}\cup Z_{\tau-\tau^{\prime}}(\sigma^{*}(V,k,\tau)-\mathcal{U})|=\tau.

Thus the set 𝒰Zττ(σ(V,k,τ)𝒰)\mathcal{U}\cup Z_{\tau-\tau^{\prime}}(\sigma^{*}(V,k,\tau)-\mathcal{U}) is a feasible solution to minimize function gτ(σ)g_{\tau}(\sigma) with σ\sigma^{*} and τ\tau. Thus we have

f(σ(V,k,τ)Zτ(σ(V,k,τ)))f(σ(V,k,τ)𝒰Zττ(σ(V,k,τ)𝒰))f(\sigma^{*}(V,k,\tau)-Z_{\tau}(\sigma^{*}(V,k,\tau)))\leq f(\sigma^{*}(V,k,\tau)-\mathcal{U}\cup Z_{\tau-\tau^{\prime}}(\sigma^{*}(V,k,\tau)-\mathcal{U}))

In addition, we have \sigma^{*}(V,k,\tau)-\mathcal{U}=\sigma^{*}(V,k,\tau)-Z. This implies that

Z_{\tau-\tau^{\prime}}(\sigma^{*}(V,k,\tau)-\mathcal{U})=Z_{\tau-\tau^{\prime}}(\sigma^{*}(V,k,\tau)-Z).

The elements in Z-\mathcal{U} are not in \sigma^{*}(V,k,\tau). Thus we have

f(\sigma^{*}(V,k,\tau)-\mathcal{U}\cup Z_{\tau-\tau^{\prime}}(\sigma^{*}(V,k,\tau)-\mathcal{U}))=f(\sigma^{*}(V,k,\tau)-Z\cup Z_{\tau-\tau^{\prime}}(\sigma^{*}(V,k,\tau)-Z))

Note that the sequence \sigma^{*}(V,k,\tau)-Z contains no element of Z and has k-\tau^{\prime} elements. It is therefore a feasible solution to the robust maximization problem defined by V-Z, k-\tau^{\prime} and \tau-\tau^{\prime}. Thus we have

g_{\tau-\tau^{\prime}}(\sigma^{*}(V,k,\tau)-Z)\leq g_{\tau-\tau^{\prime}}(\sigma^{*}(V-Z,k-\tau^{\prime},\tau-\tau^{\prime}))\leq f(\sigma^{*}(V-Z,k-\tau,0))

Also we have

\displaystyle g_{\tau-\tau^{\prime}}(\sigma^{*}(V,k,\tau)-Z)=f((\sigma^{*}(V,k,\tau)-Z)-Z_{\tau-\tau^{\prime}}(\sigma^{*}(V,k,\tau)-Z))
\displaystyle=f(\sigma^{*}(V,k,\tau)-Z\cup Z_{\tau-\tau^{\prime}}(\sigma^{*}(V,k,\tau)-Z))
\displaystyle=f(\sigma^{*}(V,k,\tau)-\mathcal{U}\cup Z_{\tau-\tau^{\prime}}(\sigma^{*}(V,k,\tau)-\mathcal{U}))
\displaystyle\geq f(\sigma^{*}(V,k,\tau)-Z_{\tau}(\sigma^{*}(V,k,\tau)))=g_{\tau}(\sigma^{*}(V,k,\tau))

Hence we have proved g_{\tau}(\sigma^{*}(V,k,\tau))\leq f(\sigma^{*}(V-Z,k-\tau,0)). ∎

Proof of Theorem 2

Theorem 2. Consider 1\leq\tau\leq k. Algorithm 1 achieves an approximation ratio of

\max\{\frac{1-e^{-(1-1/k)}}{\alpha\beta},\frac{\tau\alpha\beta(\eta^{\frac{1}{d_{\text{in}}}}-1)}{\tau\alpha\eta^{\frac{1}{d_{\text{in}}}}-\beta(1-e^{-(1-1/k)})}\}.
Proof.

After the execution of the RoseNets algorithm, the selected sequence is \sigma=\sigma_{1}\oplus\sigma_{2}, where |\sigma_{1}|=\tau and |\sigma_{2}|=k-\tau. Define Z_{\tau}=Z_{\tau}^{1}\cup Z_{\tau}^{2} as the removed vertex set that minimizes f(\sigma-Z_{\tau}), where Z_{\tau}^{1}=Z_{\tau}\cap\sigma_{1} and Z_{\tau}^{2}=Z_{\tau}\cap\sigma_{2}. First, we have a lower bound for f(\sigma_{2}):

f(\sigma_{2})\geq\frac{1-e^{-(1-1/k)}}{\alpha}f(\sigma^{*}(V\backslash\sigma_{1},k-\tau,0))\geq\frac{1-e^{-(1-1/k)}}{\alpha}g_{\tau}(\sigma^{*}(V,k,\tau))

The proof of Theorem 2 follows a similar line to that of Theorem 1. Specifically, we consider the following two cases:

Case 1. Z^{2}_{\tau}=\emptyset, i.e., all the removed vertices are in \sigma_{1}. Thus we have

f(\sigma-Z_{\tau}(\sigma))=f(\sigma_{2})\geq\frac{1-e^{-(1-1/k)}}{\alpha}g_{\tau}(\sigma^{*}(V,k,\tau))

Case 2. Z^{2}_{\tau}\neq\emptyset. In this case, we further distinguish two subcases.

Case 2.1. Suppose f(\sigma_{2})\leq f(\sigma_{2}-Z^{2}_{\tau}(\sigma)).

In this case, removing the elements of Z^{2}_{\tau}(\sigma) does not reduce the value of the remaining sequence. Then we have

\displaystyle f(\sigma-Z_{\tau}(\sigma))=f((\sigma_{1}-Z_{\tau}^{1}(\sigma))\oplus(\sigma_{2}-Z_{\tau}^{2}(\sigma)))
\displaystyle\geq f(\sigma_{2}-Z_{\tau}^{2}(\sigma))\geq f(\sigma_{2})\geq\frac{1-e^{-(1-1/k)}}{\alpha}g_{\tau}(\sigma^{*}(V,k,\tau))

Case 2.2. Suppose f(\sigma_{2})>f(\sigma_{2}-Z^{2}_{\tau}(\sigma)). Let \tau_{1}=|Z_{\tau}^{1}(\sigma)| and \tau_{2}=|Z_{\tau}^{2}(\sigma)|. Then we have \tau=\tau_{1}+\tau_{2} and k=|\sigma|=|\sigma_{1}|+|\sigma_{2}|\geq\tau+\tau_{2}.

Let q=\frac{f(\sigma_{2})-f(\sigma_{2}-Z_{\tau}^{2}(\sigma))}{(d_{\text{in}}+d_{\text{out}})f(\sigma_{2})} denote the loss of removing the elements of Z_{\tau}^{2}(\sigma) from \sigma_{2}, taken as a fraction of the value of \sigma_{2} and scaled by \frac{1}{d_{\text{in}}+d_{\text{out}}}. We have q\in(0,\frac{1}{d_{\text{in}}+d_{\text{out}}}] since f(\sigma_{2})>f(\sigma_{2}-Z_{\tau}^{2}(\sigma)).

Let the elements of Z^{2}_{\tau}(\sigma) be denoted by z_{1},z_{2},\dots,z_{\tau_{2}} according to their order in \sigma_{2}. Then we can rewrite \sigma_{2} as \sigma_{2}=\sigma_{2}^{1}\oplus z_{1}\oplus\sigma_{2}^{2}\oplus z_{2}\oplus\dots\oplus\sigma_{2}^{\tau_{2}+1}, where \sigma_{2}^{i} is the subsequence between z_{i-1} and z_{i} for i=1,2,\dots,\tau_{2}+1, with z_{0} and z_{\tau_{2}+1} taken to be empty sentinels. Note that \sigma_{2}^{i} may be empty for any i. Then we have

\displaystyle(d_{\text{in}}+d_{\text{out}})qf(\sigma_{2})=f(\sigma_{2})-f(\sigma_{2}-Z_{\tau}^{2}(\sigma))
\displaystyle=f(\sigma_{2}^{1})+f(z_{1}|\sigma_{2}^{1})+f(\sigma_{2}^{2}|\sigma_{2}^{1}\oplus z_{1})+\dots+f(\sigma_{2}^{\tau_{2}+1}|\sigma_{2}^{1}\oplus z_{1}\oplus\dots\oplus\sigma_{2}^{\tau_{2}})
\displaystyle\quad-f(\sigma_{2}^{1})-f(\sigma_{2}^{2}|\sigma_{2}^{1})-\dots-f(\sigma_{2}^{\tau_{2}+1}|\sigma_{2}^{1}\oplus\dots\oplus\sigma_{2}^{\tau_{2}})
\displaystyle=\sum_{i=1}^{\tau_{2}}\left(f(z_{i}|\sigma_{2}^{1}\oplus z_{1}\oplus\dots\oplus z_{i-1}\oplus\sigma_{2}^{i})+f(\sigma_{2}^{i+1}|\sigma_{2}^{i}\oplus z_{i})-f(\sigma_{2}^{i+1}|\sigma_{2}^{i})\right)
\displaystyle\leq\tau_{2}d_{\text{in}}h(e_{z}^{\text{in}})+\tau_{2}d_{\text{out}}h(e_{z}^{\text{out}})
\displaystyle\leq\tau_{2}(d_{\text{in}}+d_{\text{out}})\max\{h(e_{z}^{\text{in}}),h(e_{z}^{\text{out}})\}
\displaystyle\leq\tau_{2}(d_{\text{in}}+d_{\text{out}})f(Z_{\tau}^{2})

where d_{\text{in}}/d_{\text{out}} is the maximum in/out degree of the network, and h(e_{z}^{\text{in}})/h(e_{z}^{\text{out}}) is the maximum utility over all incoming/outgoing edges of the removed nodes \{z_{i}\}_{i=1,2,\dots,\tau_{2}}. The first inequality holds because the marginal gain of a vertex z with respect to its prefix and to the subsequent subsequence is at most d_{\text{in}}h(e_{z}^{\text{in}}) and d_{\text{out}}h(e_{z}^{\text{out}}), respectively; the second inequality then follows directly.
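To make the telescoping bound concrete, here is a toy Python sketch with a sequence function that is modular on the induced edges (summing a made-up edge utility h(e) over edges whose tail precedes the head). The graph, utilities, sequences, and removed set are all illustrative assumptions; the assertion checks the final inequality above.

```python
# A toy check of the loss bound on a modular-on-edges sequence function:
# f(seq) = sum of h(u,v) over edges (u,v) with u preceding v in seq.
h = {('a', 'b'): 0.5, ('a', 'c'): 0.3, ('b', 'c'): 0.4, ('c', 'd'): 0.6}

def f(seq):
    pos = {v: i for i, v in enumerate(seq)}
    return sum(u for (s, t), u in h.items()
               if s in pos and t in pos and pos[s] < pos[t])

sigma2 = ['a', 'b', 'c', 'd']
Z = {'b', 'c'}                       # removed elements, tau_2 = 2
residual = [v for v in sigma2 if v not in Z]

loss = f(sigma2) - f(residual)
d_in, d_out = 2, 2                   # max in/out degree of this toy graph
h_max = max(h.values())              # stands in for max{h(e_z^in), h(e_z^out)}
assert loss <= len(Z) * (d_in + d_out) * h_max
```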

Given the bound above, we establish four inequalities from which the theorem follows.

First, suppose the first vertex of \sigma_{2} is v_{2}. By the monotonicity of f(\cdot) and the above inequality, we have

\displaystyle f(\sigma-Z_{\tau}(\sigma))=f((\sigma_{1}-Z_{\tau}^{1}(\sigma))\oplus(\sigma_{2}-Z_{\tau}^{2}(\sigma)))\geq f(\sigma_{1}-Z_{\tau}^{1}(\sigma))=f(\sigma_{1}^{\tau_{2}})\geq\tau_{2}\max\{h(e_{z}^{\text{in}}),h(e_{z}^{\text{out}})\}\geq qf(\sigma_{2})
\displaystyle f(\sigma-Z_{\tau}(\sigma))=f((\sigma_{1}-Z_{\tau}^{1}(\sigma))\oplus(\sigma_{2}-Z_{\tau}^{2}(\sigma)))\geq f(\sigma_{2}-Z_{\tau}^{2}(\sigma))\geq(1-(d_{\text{in}}+d_{\text{out}})q)f(\sigma_{2})

Then we have Inequality 1 as below.

Inequality 1:

f(\sigma-Z_{\tau}(\sigma))\geq\max\{q\cdot f(\sigma_{2}),(1-(d_{\text{in}}+d_{\text{out}})q)\cdot f(\sigma_{2})\}.

By an analysis similar to that in the proof of Theorem 1, we have Inequality 2 below.

Inequality 2:

\max\{q,1-(d_{\text{in}}+d_{\text{out}})q\}\geq 1/\beta.
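For intuition, the two terms in Inequality 2 cross at q=\frac{1}{1+d_{\text{in}}+d_{\text{out}}}, where their maximum is smallest. Below is a quick numeric spot check, assuming \beta=1+d_{\text{in}}+d_{\text{out}} (consistent with this crossing point) and illustrative degree bounds:

```python
# Spot check of Inequality 2 on a grid of q values (illustrative degrees;
# beta = 1 + d_in + d_out is an assumption matching the crossing point
# q = 1/(1 + d_in + d_out) of the two terms).
d_in, d_out = 3, 3
beta = 1 + d_in + d_out
n = 10000
qs = [i / n for i in range(1, n // (d_in + d_out) + 1)]  # q in (0, 1/(d_in+d_out)]
assert all(max(q, 1 - (d_in + d_out) * q) >= 1 / beta - 1e-12 for q in qs)
```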

Let \sigma_{2}^{\tau_{2}} be the sequence consisting of the first \tau_{2} elements of \sigma_{2}. Then we have

f(\sigma_{2}^{\tau_{2}})\geq\frac{1-e^{-(1-1/k)}}{\alpha}f(\sigma^{*}(V\backslash\sigma_{1},\tau_{2},0))\geq\frac{1-e^{-(1-1/k)}}{\alpha}f(Z_{\tau}^{2}(\sigma))\geq q\frac{1-e^{-(1-1/k)}}{\tau_{2}\alpha}f(\sigma_{2})

Thus we have

\displaystyle f(\sigma_{2})\geq\frac{e^{\frac{k-\tau-\tau_{2}-1}{(k-\tau_{2}-1)d_{\text{in}}}}-1}{e^{\frac{k-\tau-\tau_{2}-1}{(k-\tau_{2}-1)d_{\text{in}}}}-q\frac{1-e^{-(1-1/k)}}{\tau_{2}\alpha}}f(\sigma^{*}(V\backslash\sigma_{1},k-\tau,0))
\displaystyle\Longrightarrow\text{ Inequality 3: }f(\sigma_{2})\geq\frac{e^{\frac{k-\tau-\tau_{2}-1}{(k-\tau_{2}-1)d_{\text{in}}}}-1}{e^{\frac{k-\tau-\tau_{2}-1}{(k-\tau_{2}-1)d_{\text{in}}}}-q\frac{1-e^{-(1-1/k)}}{\tau_{2}\alpha}}g_{\tau}(\sigma^{*}(V,k,\tau))

Now define \ell_{1}(q)=q\cdot\frac{e^{\frac{k-\tau-\tau_{2}-1}{(k-\tau_{2}-1)d_{\text{in}}}}-1}{e^{\frac{k-\tau-\tau_{2}-1}{(k-\tau_{2}-1)d_{\text{in}}}}-q\frac{1-e^{-(1-1/k)}}{\tau_{2}\alpha}} and \ell_{2}(q)=(1-(d_{\text{in}}+d_{\text{out}})q)\frac{e^{\frac{k-\tau-\tau_{2}-1}{(k-\tau_{2}-1)d_{\text{in}}}}-1}{e^{\frac{k-\tau-\tau_{2}-1}{(k-\tau_{2}-1)d_{\text{in}}}}-q\frac{1-e^{-(1-1/k)}}{\tau_{2}\alpha}}. It is easy to verify that for k\geq 3 and q\in(0,\frac{1}{d_{\text{in}}+d_{\text{out}}}], \ell_{1}(q) is monotonically increasing and \ell_{2}(q) is monotonically decreasing. Note that when q=\frac{1}{\beta}, \ell_{1}(q)=\ell_{2}(q)=\frac{\beta(e^{\frac{k-\tau-\tau_{2}-1}{(k-\tau_{2}-1)d_{\text{in}}}}-1)}{e^{\frac{k-\tau-\tau_{2}-1}{(k-\tau_{2}-1)d_{\text{in}}}}-\beta\frac{1-e^{-(1-1/k)}}{\tau_{2}\alpha}}. We consider two cases for q: (1) when q\in(0,\frac{1}{\beta}], we have \max\{\ell_{1}(q),\ell_{2}(q)\}\geq\ell_{2}(\frac{1}{\beta}) since \ell_{2}(q) is monotonically decreasing; (2) when q\in(\frac{1}{\beta},\frac{1}{d_{\text{in}}+d_{\text{out}}}], we have \max\{\ell_{1}(q),\ell_{2}(q)\}\geq\ell_{1}(\frac{1}{\beta}) since \ell_{1}(q) is monotonically increasing. Thus we have Inequality 4 below.

Inequality 4:

\max\{q\cdot\frac{e^{\frac{k-\tau-\tau_{2}-1}{(k-\tau_{2}-1)d_{\text{in}}}}-1}{e^{\frac{k-\tau-\tau_{2}-1}{(k-\tau_{2}-1)d_{\text{in}}}}-q\frac{1-e^{-(1-1/k)}}{\tau_{2}\alpha}},(1-(d_{\text{in}}+d_{\text{out}})q)\frac{e^{\frac{k-\tau-\tau_{2}-1}{(k-\tau_{2}-1)d_{\text{in}}}}-1}{e^{\frac{k-\tau-\tau_{2}-1}{(k-\tau_{2}-1)d_{\text{in}}}}-q\frac{1-e^{-(1-1/k)}}{\tau_{2}\alpha}}\}\geq\frac{\beta(e^{\frac{k-\tau-\tau_{2}-1}{(k-\tau_{2}-1)d_{\text{in}}}}-1)}{e^{\frac{k-\tau-\tau_{2}-1}{(k-\tau_{2}-1)d_{\text{in}}}}-\beta\frac{1-e^{-(1-1/k)}}{\tau_{2}\alpha}}
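The monotonicity claims behind Inequality 4 can be spot-checked numerically; the sketch below scans q over (0,\frac{1}{d_{\text{in}}+d_{\text{out}}}] for one illustrative parameter setting (the values are assumptions, not from the paper):

```python
import math

# Numeric spot check that l1 is increasing and l2 is decreasing in q
# on (0, 1/(d_in+d_out)] for one illustrative parameter setting.
k, tau, tau2, d_in, d_out, alpha = 10, 3, 2, 3, 3, 2.0
E = math.exp((k - tau - tau2 - 1) / ((k - tau2 - 1) * d_in))
c = (1 - math.exp(-(1 - 1 / k))) / (tau2 * alpha)

def l1(q):
    return q * (E - 1) / (E - q * c)

def l2(q):
    return (1 - (d_in + d_out) * q) * (E - 1) / (E - q * c)

q_max = 1 / (d_in + d_out)
qs = [i / 1000 * q_max for i in range(1, 1001)]
assert all(l1(a) < l1(b) for a, b in zip(qs, qs[1:]))  # l1 increasing
assert all(l2(a) > l2(b) for a, b in zip(qs, qs[1:]))  # l2 decreasing
```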

Combining Inequality 1, Inequality 2 and the lower bound of f(\sigma_{2}), we obtain the first lower bound in Theorem 2:

\displaystyle f(\sigma-Z_{\tau}(\sigma))\geq\max\{q\cdot f(\sigma_{2}),(1-(d_{\text{in}}+d_{\text{out}})q)\cdot f(\sigma_{2})\}
\displaystyle\geq\max\{q,1-(d_{\text{in}}+d_{\text{out}})q\}\cdot\frac{1-e^{-(1-1/k)}}{\alpha}g_{\tau}(\sigma^{*}(V,k,\tau))
\displaystyle\geq\frac{1-e^{-(1-1/k)}}{\alpha\beta}g_{\tau}(\sigma^{*}(V,k,\tau))

Combining Inequality 1, Inequality 3 and Inequality 4, we obtain the second lower bound in Theorem 2:

\displaystyle f(\sigma-Z_{\tau}(\sigma))\geq\max\{q\cdot f(\sigma_{2}),(1-(d_{\text{in}}+d_{\text{out}})q)\cdot f(\sigma_{2})\}
\displaystyle\geq\max\{q\cdot\frac{e^{\frac{k-\tau-\tau_{2}-1}{(k-\tau_{2}-1)d_{\text{in}}}}-1}{e^{\frac{k-\tau-\tau_{2}-1}{(k-\tau_{2}-1)d_{\text{in}}}}-q\frac{1-e^{-(1-1/k)}}{\tau_{2}\alpha}},(1-(d_{\text{in}}+d_{\text{out}})q)\frac{e^{\frac{k-\tau-\tau_{2}-1}{(k-\tau_{2}-1)d_{\text{in}}}}-1}{e^{\frac{k-\tau-\tau_{2}-1}{(k-\tau_{2}-1)d_{\text{in}}}}-q\frac{1-e^{-(1-1/k)}}{\tau_{2}\alpha}}\}g_{\tau}(\sigma^{*}(V,k,\tau))
\displaystyle\geq\frac{\beta(e^{\frac{k-\tau-\tau_{2}-1}{(k-\tau_{2}-1)d_{\text{in}}}}-1)}{e^{\frac{k-\tau-\tau_{2}-1}{(k-\tau_{2}-1)d_{\text{in}}}}-\beta\frac{1-e^{-(1-1/k)}}{\tau_{2}\alpha}}g_{\tau}(\sigma^{*}(V,k,\tau))

Since \tau_{2}\leq\tau, the theorem holds. ∎