Multi-Agent Maximization of a Monotone Submodular Function via Maximum Consensus

Navid Rezazadeh
Mechanical and Aerospace Eng. Dept.
University of California Irvine
Irvine, CA
[email protected]
Solmaz S. Kia
Mechanical and Aerospace Eng. Dept.
University of California Irvine
Irvine, CA
[email protected]
Abstract

Constrained submodular set function maximization problems often appear in multi-agent decision-making problems with a discrete feasible set. A prominent example is the problem of multi-agent mobile sensor placement over a discrete domain. However, submodular set function optimization problems are known to be NP-hard. In this paper, we consider a class of submodular optimization problems that consists of maximizing a monotone and submodular set function subject to a partitioned matroid constraint over a group of networked agents that communicate over a connected undirected graph. Our objective is to obtain a distributed suboptimal polynomial-time algorithm that enables each agent to obtain its respective policy via local interactions with its neighboring agents. Our solution is a fully distributed gradient-based algorithm that uses the multilinear extension of the submodular set function and a maximum consensus scheme. This algorithm yields a policy set whose worst-case utility value is within $1-1/\textup{e}-O(1/T)$ of the optimal value, where $T$ is the number of communication rounds. A numerical example demonstrates our results.

1 Introduction

In recent years, optimal multi-agent sensor placement/dispatch to cover areas for sampling/observation objectives has been of great interest in the robotics community for solving various coverage, exploration, and monitoring problems. Depending on the problem setting, the planning space can be continuous [1, 2, 3, 4] or discrete [5, 6, 7, 8]. In this paper, our interest is in planning in discrete space. Many optimal discrete policy-making problems for multi-agent sensor placement appear in the form of maximizing a utility for the team [9, 10, 11, 12, 13, 5]. The particular subset of these problems that we are interested in takes the form of the set function maximization problem described by

$\underset{\mathcal{R}\in\mathcal{I}\subset 2^{\mathcal{P}}}{\textup{max}}\;f(\mathcal{R}),$   (1)

where $\mathcal{I}\subset 2^{\mathcal{P}}$ is the admissible set of subsets of the ground policy set $\mathcal{P}$, which is the union of the local policy sets of the agents; see Fig. 1 for an example.

Problem (1) is known to be NP-hard [14]. However, when the objective function is a monotone increasing and submodular set function, it is well known that the sequential greedy algorithm [15] delivers a polynomial-time suboptimal solution with a guaranteed optimality bound. For large-scale submodular maximization problems, reducing the size of the problem through approximations [16] or using several processing units leads to a faster sequential greedy algorithm, at the expense of a weaker optimality bound [17, 18, 19, 20].

When the set from which the agents can choose their policy is the same for all agents and the agents are homogeneous, the feasible set $\mathcal{I}$ of (1) can be described by a uniform matroid constraint (see Section 3 for a formal definition). For problems with a uniform matroid constraint, the sequential greedy algorithm delivers a suboptimal solution whose utility value is no worse than $1-\frac{1}{\textup{e}}$ times the optimal utility value [14]. On the other hand, when the local policy sets of the agents are disjoint and/or the agents are heterogeneous, and each agent has to choose one policy from its local set, the feasible set $\mathcal{I}$ of (1) can be described by the so-called partitioned matroid constraint (see Section 3 for a formal definition). For problems with a partitioned matroid constraint, the sequential greedy algorithm delivers a suboptimal solution whose utility value is no worse than $\frac{1}{2}$ times the optimal utility value [15]. More recently, for problems with a partitioned matroid constraint, suboptimal solutions with a better optimality gap have been proposed in [21, 22, 23, 24] using the multilinear continuous relaxation of the submodular set function. The multilinear continuous relaxation extends a submodular set function defined on the vertices of the $n$-dimensional hypercube to the full hypercube and makes better optimality bounds for the maximization problem (1) attainable by allowing the use of continuous gradient-based optimization algorithms. That is, the relaxation transforms the original discrete problem into a continuous optimization problem with linear constraints. The optimality bound achieved by the algorithm proposed in [21] is $1-\frac{1}{\textup{e}}$ for a partitioned matroid constraint, which outperforms the sequential greedy algorithm, whose optimality bound is $1/2$. However, this solution requires a central authority to find the solution. In sensor placement problems where the agents are self-organizing autonomous mobile agents with communication and computation capabilities, it is desirable to solve the optimization problem (1) in a distributed way, without involving a central authority. A distributed multilinear-extension-based continuous algorithm that uses an average consensus scheme between the agents to solve (1) subject to a partitioned matroid constraint is proposed in [25]. However, that algorithm assumes access to the exact multilinear extension of the utility function, whose computation is exponentially hard.

In this paper, motivated by the improved optimality gap of multilinear-continuous-relaxation-based algorithms, we develop a distributed implementation of the algorithm of [21] over a connected undirected graph to obtain a suboptimal solution of (1) subject to a partitioned matroid constraint. To manage the computational cost of constructing the multilinear extension of the utility function, we use a sample-based evaluation of the multilinear extension and propose a gradient-based algorithm that uses a maximum consensus message-passing scheme over the communication graph. Our algorithm uses a fully distributed rounding technique to compute the final solution of each agent. Through rigorous analysis, we show that our proposed distributed algorithm achieves, with a known probability, a $1-\frac{1}{\textup{e}}-O(1/T)$ optimality bound, where $T$ is the number of times the agents communicate over the network. A numerical example demonstrates our results.

The organization of this paper is as follows. Section 2 introduces the notation used throughout the paper. Section 3 provides the necessary background on submodularity and submodular functions. Section 4 states the problem. We introduce our proposed algorithm in Section 5. A demonstration of our results is provided in Section 6. Appendix A gives the proofs of our main results in Section 5. Appendix B gives the auxiliary results that we use in establishing the proofs of our main results.

2 Notation

We denote vectors with bold small font. The $p^{\textup{th}}$ element of vector $\boldsymbol{\mathbf{x}}$ is denoted by $x_{p}$. We denote the inner product of two vectors $\boldsymbol{\mathbf{x}}$ and $\boldsymbol{\mathbf{y}}$ of appropriate dimensions by $\boldsymbol{\mathbf{x}}.\boldsymbol{\mathbf{y}}$. Furthermore, we use $\boldsymbol{\mathbf{1}}$ for the vector of ones of appropriate dimension. We denote sets with capital calligraphic font. For $\mathcal{P}=\{1,\cdots,n\}$ and a vector $\boldsymbol{\mathbf{x}}\in{\mathbb{R}}_{\geq 0}^{n}$ with $0\leq x_{p}\leq 1$, the set $\mathcal{R}_{\boldsymbol{\mathbf{x}}}$ is a random set in which each $p\in\mathcal{P}$ appears with probability $x_{p}$; hence, we call such a vector $\boldsymbol{\mathbf{x}}$ a membership probability vector. Furthermore, for $\mathcal{S}\subset\mathcal{P}$, $\boldsymbol{\mathbf{1}}_{\mathcal{S}}\in\{0,1\}^{n}$ is the vector whose $p^{\textup{th}}$ element is $1$ if $p\in\mathcal{S}$ and $0$ otherwise; we call $\boldsymbol{\mathbf{1}}_{\mathcal{S}}$ the membership indicator vector of set $\mathcal{S}$. $|x|$ is the absolute value of $x\in{\mathbb{R}}$. Overloading the notation, we also use $|\mathcal{S}|$ for the cardinality of a set $\mathcal{S}$. We denote a graph by $\mathcal{G}(\mathcal{A},\mathcal{E})$, where $\mathcal{A}$ is the node set and $\mathcal{E}\subset\mathcal{A}\times\mathcal{A}$ is the edge set. $\mathcal{G}$ is undirected if and only if $(i,j)\in\mathcal{E}$ implies that agents $i$ and $j$ can exchange information. An undirected graph is connected if there is a path from each node to every other node in the graph. We denote the set of neighboring nodes of node $i$ by $\mathcal{N}_{i}=\{j\in\mathcal{A}\,|\,(i,j)\in\mathcal{E}\}$, and we use $d(\mathcal{G})$ to denote the diameter of the graph.

Given a set $\mathcal{F}\subset\mathcal{X}\times{\mathbb{R}}$ and an element $(p,\alpha)\in\mathcal{X}\times{\mathbb{R}}$, we define the addition operator $\oplus$ as $\mathcal{F}\oplus\{(p,\alpha)\}=\{(u,\gamma)\in\mathcal{X}\times{\mathbb{R}}\,|\,(u,\gamma)\in\mathcal{F},u\not=p\}\cup\{(u,\gamma+\alpha)\in\mathcal{X}\times{\mathbb{R}}\,|\,(u,\gamma)\in\mathcal{F},u=p\}\cup\{(p,\alpha)\in\mathcal{X}\times{\mathbb{R}}\,|\,(p,\gamma)\not\in\mathcal{F}\}$. Given a collection of sets $\mathcal{F}_{j}\subset\mathcal{X}\times{\mathbb{R}}$, $j\in\mathcal{N}$, we define the max-operation over this collection as $\underset{j\in\mathcal{N}}{\textup{MAX}}\,\mathcal{F}_{j}=\{(u,\gamma)\in\mathcal{X}\times{\mathbb{R}}\,|\,(u,\gamma)\in\bar{\mathcal{F}}\text{ s.t. }\gamma=\underset{(u,\alpha)\in\bar{\mathcal{F}}}{\textup{max}}\,\alpha\}$, where $\bar{\mathcal{F}}=\bigcup_{j\in\mathcal{N}}\mathcal{F}_{j}$.
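To make these two operators concrete, the following is a minimal Python sketch (an illustration, not part of the paper), assuming each set $\mathcal{F}\subset\mathcal{X}\times{\mathbb{R}}$ is stored as a dictionary mapping a policy to its value; the policy names are hypothetical.

def oplus(F, p, alpha):
    # F (+) {(p, alpha)}: keep entries with u != p, add alpha to the entry of p,
    # or insert (p, alpha) if p is not yet present in F.
    G = dict(F)
    G[p] = G.get(p, 0.0) + alpha
    return G

def max_merge(F_collection):
    # MAX over a collection of sets: for every policy, keep the largest value
    # it attains across the collection.
    merged = {}
    for F in F_collection:
        for u, gamma in F.items():
            merged[u] = max(merged.get(u, gamma), gamma)
    return merged

# Example with hypothetical policies p1 and p2.
F1 = {"p1": 0.20, "p2": 0.10}
F2 = {"p2": 0.30}
print(oplus(F1, "p1", 0.05))   # {'p1': 0.25, 'p2': 0.1}
print(max_merge([F1, F2]))     # {'p1': 0.2, 'p2': 0.3}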

3 Submodular Functions

A set function $f:2^{\mathcal{P}}\to{\mathbb{R}}_{\geq 0}$ is monotone increasing if for sets $\mathcal{P}_{1}\subset\mathcal{P}_{2}\subset\mathcal{P}$,

$f(\mathcal{P}_{1})\leq f(\mathcal{P}_{2})$

holds. Furthermore, defining $\Delta_{f}(p|\mathcal{S})=f(\mathcal{S}\cup\{p\})-f(\mathcal{S})$, $\mathcal{S}\subset\mathcal{P}$, the set function $f$ is submodular if for $\mathcal{P}_{1}\subset\mathcal{P}_{2}\subset\mathcal{P}$ and $p\in\mathcal{P}\setminus\mathcal{P}_{2}$,

$\Delta_{f}(p|\mathcal{P}_{1})\geq\Delta_{f}(p|\mathcal{P}_{2})$   (2)

holds; this inequality captures the diminishing-returns property of submodular set functions. Furthermore, a set function is normalized if $f(\emptyset)=0$.

In what follows, without loss of generality, we assume that the ground set is $\mathcal{P}=\{1,\cdots,n\}$. Submodular functions assign values to all subsets of the finite set $\mathcal{P}$. Using the set membership indicator, we can equivalently regard them as functions on the Boolean hypercube, $f:\{0,1\}^{n}\to{\mathbb{R}}$. For a submodular function $f:2^{\mathcal{P}}\to{\mathbb{R}}_{\geq 0}$, its multilinear extension $F:[0,1]^{n}\to{\mathbb{R}}_{\geq 0}$ in the continuous space is

$F(\boldsymbol{\mathbf{x}})=\sum_{\mathcal{R}\subset\mathcal{P}}f(\mathcal{R})\prod_{p\in\mathcal{R}}x_{p}\prod_{p\not\in\mathcal{R}}(1-x_{p}),\qquad\boldsymbol{\mathbf{x}}\in[0,1]^{n}.$   (3)

The multilinear extension $F$ of the set function $f$ is the unique multilinear function agreeing with $f$ on the vertices of the hypercube. The multilinear extension has the following properties.

Lemma 3.1 (See [21])

If $f$ is non-decreasing, then $\frac{\partial F}{\partial x_{p}}\geq 0$ for all $p\in\mathcal{P}$, everywhere in $[0,1]^{n}$ (monotonicity of $F$). If $f$ is submodular, then $\frac{\partial^{2}F}{\partial x_{p}\partial x_{q}}\leq 0$ for all $p,q\in\mathcal{P}$, everywhere in $[0,1]^{n}$.

An alternative way to view $F$ is to observe that it is equivalent to

$F(\boldsymbol{\mathbf{x}})=\mathbb{E}[f(\mathcal{R}_{\boldsymbol{\mathbf{x}}})],$   (4)

where $\mathcal{R}_{\boldsymbol{\mathbf{x}}}\subset\mathcal{P}$ is a random set in which each element $p\in\mathcal{P}$ appears independently with probability $x_{p}$. Then, it can be shown that taking the derivatives of $F(\boldsymbol{\mathbf{x}})$ yields

$\frac{\partial F}{\partial x_{p}}(\boldsymbol{\mathbf{x}})=\mathbb{E}[f(\mathcal{R}_{\boldsymbol{\mathbf{x}}}\cup\{p\})-f(\mathcal{R}_{\boldsymbol{\mathbf{x}}}\setminus\{p\})],$   (5)

and

$\frac{\partial^{2}F}{\partial x_{p}\partial x_{q}}(\boldsymbol{\mathbf{x}})=\mathbb{E}[f(\mathcal{R}_{\boldsymbol{\mathbf{x}}}\cup\{p,q\})-f(\mathcal{R}_{\boldsymbol{\mathbf{x}}}\cup\{q\}\setminus\{p\})-f(\mathcal{R}_{\boldsymbol{\mathbf{x}}}\cup\{p\}\setminus\{q\})+f(\mathcal{R}_{\boldsymbol{\mathbf{x}}}\setminus\{p,q\})].$   (6)

Constructing $F(\boldsymbol{\mathbf{x}})$ in (3) requires knowledge of $f(\mathcal{R})$ for all $\mathcal{R}\subset\mathcal{P}$, which becomes computationally intractable as the size of the ground set $\mathcal{P}$ increases. However, with the stochastic interpretation (4) of the multilinear extension and its derivatives, if enough samples are drawn according to $\boldsymbol{\mathbf{x}}$, we can obtain an estimate of $F(\boldsymbol{\mathbf{x}})$ at a reasonable computational cost; a minimal sketch of such a sampling-based estimator is given below. We can use the Chernoff-Hoeffding inequality to quantify the quality of this estimate given the number of samples.
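As an illustration of this sampling-based estimation, the following is a minimal Monte Carlo sketch (the dictionary representation of $\boldsymbol{\mathbf{x}}$, the policy names, and the additive toy utility are illustrative assumptions, not part of the paper):

import random

def sample_set(x):
    # Draw a random set R_x: include each policy p independently with probability x[p].
    return {p for p, prob in x.items() if random.random() < prob}

def estimate_F(f, x, K=1000):
    # Monte Carlo estimate of the multilinear extension F(x) = E[f(R_x)], cf. (4).
    return sum(f(sample_set(x)) for _ in range(K)) / K

def estimate_dF(f, x, p, K=1000):
    # Monte Carlo estimate of dF/dx_p = E[f(R_x U {p}) - f(R_x \ {p})], cf. (5).
    total = 0.0
    for _ in range(K):
        R = sample_set(x)
        total += f(R | {p}) - f(R - {p})
    return total / K

# Additive toy utility (monotone and submodular): f(R) = sum of the weights in R.
weights = {"a": 3.0, "b": 2.0, "c": 1.0}
f = lambda R: sum(weights[p] for p in R)
x = {"a": 0.5, "b": 0.25, "c": 1.0}
print(estimate_F(f, x, K=5000))        # close to 0.5*3 + 0.25*2 + 1.0*1 = 3.0
print(estimate_dF(f, x, "a", K=5000))  # close to dF/dx_a = 3.0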

Theorem 3.1 (Chernoff-Hoeffding’s inequality [26])

Consider a set of $K$ independent random variables $X_{1},\dots,X_{K}$ with $a<X_{i}<b$. Letting $S_{K}=\sum_{i=1}^{K}X_{i}$, then

$\mathbb{P}[|S_{K}-\mathbb{E}[S_{K}]|>\delta]<2\,\textup{e}^{-\frac{2\delta^{2}}{K(b-a)^{2}}}.$
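To illustrate how this inequality is used to quantify the sampling error (a sketch of the standard argument, not a statement quoted from the paper), let $X_{k}=f(\mathcal{R}^{(k)}\cup\{p\})-f(\mathcal{R}^{(k)}\setminus\{p\})$, $k=1,\dots,K$, denote the per-sample marginal gains used to estimate $\frac{\partial F}{\partial x_{p}}$. Monotonicity and submodularity give $0\leq X_{k}\leq f(\{p\})$, so applying Theorem 3.1 to $S_{K}=\sum_{k=1}^{K}X_{k}$ with $\delta=K\epsilon$ yields

$\mathbb{P}\left[\left|\frac{1}{K}S_{K}-\frac{\partial F}{\partial x_{p}}\right|>\epsilon\right]<2\,\textup{e}^{-\frac{2K\epsilon^{2}}{f(\{p\})^{2}}},$

so that an accuracy level of, e.g., $\epsilon=f(\{p\})/(4T)$ fails with probability less than $2\,\textup{e}^{-\frac{K}{8T^{2}}}$, which is the form of the per-estimate failure probability that appears later in Theorem 5.1.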

Problems such as sensor placement with a limited number of sensors and many candidate placement locations can be formulated as a set value optimization (1) in which the feasible set $\mathcal{I}$ is a matroid. A matroid is defined as the pair $\mathcal{M}=\{\mathcal{P},\mathcal{I}\}$ with $\mathcal{I}\subset 2^{\mathcal{P}}$ such that

  • if $\mathcal{B}\in\mathcal{I}$ and $\mathcal{A}\subset\mathcal{B}$, then $\mathcal{A}\in\mathcal{I}$;

  • if $\mathcal{A},\mathcal{B}\in\mathcal{I}$ and $|\mathcal{B}|>|\mathcal{A}|$, then there exists $x\in\mathcal{B}\setminus\mathcal{A}$ such that $\mathcal{A}\cup\{x\}\in\mathcal{I}$.

A well-known matroid is the uniform matroid $\mathcal{M}=\{\mathcal{P},\mathcal{I}\}$ with $\mathcal{I}=\{\mathcal{R}\subset\mathcal{P}\,:\,|\mathcal{R}|\leq k\}$. A sensor placement scenario in which $k$ sensors are to be placed at locations selected out of the possible locations $\mathcal{P}$ can be described by a uniform matroid. On the other hand, when there is a set of agents $\mathcal{A}$, each with $k_{i}$ sensors, $i\in\mathcal{A}$, that it can place in a specific and non-overlapping part $\mathcal{P}_{i}$ of a discrete space, the constraint can be described by the partitioned matroid $\mathcal{M}=\{\mathcal{P},\mathcal{I}\}$, where $\mathcal{I}=\{\mathcal{R}\subset\mathcal{P}\,:\,|\mathcal{R}\cap\mathcal{P}_{i}|\leq k_{i}\}$ with $\bigcup_{i=1}^{N}\mathcal{P}_{i}=\mathcal{P}$ and $\mathcal{P}_{i}\cap\mathcal{P}_{j}=\emptyset$ for $i\neq j$; a minimal sketch of both feasibility checks is given below.
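As a concrete illustration, the following minimal sketch of the two feasibility checks uses hypothetical policy names and blocks:

def feasible_uniform(R, k):
    # Uniform matroid: at most k policies may be selected overall.
    return len(R) <= k

def feasible_partitioned(R, partition, k):
    # Partitioned matroid: at most k[i] policies from each agent's block P_i.
    return all(len(R & P_i) <= k[i] for i, P_i in partition.items())

# Two agents, one policy allowed per agent (k_i = 1).
partition = {1: {"p1", "p2"}, 2: {"p3", "p4"}}
k = {1: 1, 2: 1}
print(feasible_partitioned({"p1", "p3"}, partition, k))  # True
print(feasible_partitioned({"p1", "p2"}, partition, k))  # False: two picks from P_1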

The matroid polytope is the convex hull defined as

$P(\mathcal{M})=\{\boldsymbol{\mathbf{x}}\in{\mathbb{R}}_{\geq 0}^{|\mathcal{P}|}\,:\,\forall\mathcal{Q}\subset\mathcal{P},\;\sum_{p\in\mathcal{Q}}x_{p}\leq r_{\mathcal{M}}(\mathcal{Q})\},$

where $r_{\mathcal{M}}$ is the rank function of the matroid, defined as

$r_{\mathcal{M}}(\mathcal{Q})=\textup{max}\{|\mathcal{S}|\,:\,\mathcal{S}\subset\mathcal{Q},\,\mathcal{S}\in\mathcal{I}\}.$

For the partitioned matroid with $k_{i}=1$, which is the focus of this paper, the polytope is such that for $\boldsymbol{\mathbf{x}}=[\boldsymbol{\mathbf{x}}_{1}^{\top},\dots,\boldsymbol{\mathbf{x}}_{N}^{\top}]^{\top}$, with $\boldsymbol{\mathbf{x}}_{i}\in{\mathbb{R}}_{\geq 0}^{|\mathcal{P}_{i}|}$ the membership probability vector associated with the policies in $\mathcal{P}_{i}$, we have $\boldsymbol{\mathbf{1}}^{\top}\boldsymbol{\mathbf{x}}_{i}\leq 1$, i.e.,

$P(\mathcal{M})=\{\boldsymbol{\mathbf{x}}\in{\mathbb{R}}_{\geq 0}^{|\mathcal{P}|}\,\big|\,\sum\nolimits_{p=1}^{|\mathcal{P}_{i}|}x_{ip}\leq 1,\;\forall i\in\mathcal{A}\}.$   (7)
Figure 1: Let the policy set of each mobile sensor $i\in\mathcal{A}$ be $\mathcal{P}_{i}=\{(i,p)\,|\,p\in\mathcal{B}_{i}\}$, where $\mathcal{B}_{i}\subset\mathcal{B}$ is the set of allowable sensor placement points for agent $i\in\mathcal{A}$ out of all sensor placement points $\mathcal{B}$. Note that by definition, for any two agents $i,j\in\mathcal{A}$, $\mathcal{P}_{i}\cap\mathcal{P}_{j}=\emptyset$. The sensors are heterogeneous in the sense that the sizes of their sensing zones differ. The objective is to place the sensors at points in $\mathcal{B}$ such that the total number of observed points of interest is maximized. The utility function, the number of observed points, is known to be a monotone increasing submodular function of the agents' sensing zones [27]. This sensor placement problem can be formalized as the optimization problem (8). The agents communicate over a connected undirected graph, and their objective is to obtain their respective placement points by interacting only with their communicating neighbors.

4 Problem Statement

We consider a group of $N$ agents $\mathcal{A}=\{1,\dots,N\}$ with communication and computation capabilities interacting over a connected undirected graph $\mathcal{G}(\mathcal{A},\mathcal{E})$. These agents aim to solve, in a distributed manner, the optimization problem

$\underset{\mathcal{R}\in\mathcal{I}}{\textup{max}}\;f(\mathcal{R})\quad\textup{s.t.}$   (8a)
$\mathcal{I}=\big\{\mathcal{R}\subset\mathcal{P}\,\big|\,|\mathcal{R}\cap\mathcal{P}_{i}|\leq 1,~i\in\mathcal{A}\big\},$   (8b)

where the utility function $f:2^{\mathcal{P}}\to{\mathbb{R}}_{\geq 0}$ is a monotone increasing and submodular set function over the discrete policy space $\mathcal{P}=\bigcup_{i=1}^{N}\mathcal{P}_{i}$, with $\mathcal{P}_{i}$ the policy space of agent $i\in\mathcal{A}$, which is known only to agent $i$. In this paper, we work in the value oracle model, where the only access to the utility function is through a black box returning $f(\mathcal{R})$ for a given set $\mathcal{R}$; every agent can obtain the value of the utility function at any subset $\mathcal{R}\subset\mathcal{P}$. The constraint (8b) is the partitioned matroid $\mathcal{M}=\{\mathcal{P},\mathcal{I}\}$, which ensures that only one policy per agent is selected from each local policy set $\mathcal{P}_{i}$, $i\in\mathcal{A}$. An example application scenario of our problem of interest is shown in Fig. 1. Without loss of generality, to simplify notation, we assume that the policy space is $\mathcal{P}=\{1,\dots,n\}$ and is sorted agent-wise, with $1\in\mathcal{P}_{1}$ and $n\in\mathcal{P}_{N}$.

Finding the optimal solution $\mathcal{R}^{\star}\in\mathcal{I}$ of (8), even in a centralized manner, is NP-hard [28]; the computational time of finding the optimizer set increases exponentially with $N$ [29]. The well-known sequential greedy algorithm finds a suboptimal solution $\mathcal{R}_{\textup{SG}}$ for (8) with the optimality bound $f(\mathcal{R}_{\textup{SG}})\geq\frac{1}{2}f(\mathcal{R}^{\star})$, i.e., a $1/2$-approximation in the worst case [28]. More recently, by using the multilinear extension of the submodular utility function, Vondrák [21] developed a randomized centralized continuous greedy algorithm which achieves a $(1-1/\text{e})-O(1/T)$ approximation for the set value optimization (8) in the value oracle model; see Algorithm 1. (Pipage rounding is a polynomial-time algorithm which moves a fractional point $\boldsymbol{\mathbf{x}}$ on a hypercube to an integral point $\hat{\boldsymbol{\mathbf{x}}}$ on the same hypercube such that $F(\hat{\boldsymbol{\mathbf{x}}})\geq F(\boldsymbol{\mathbf{x}})$.) Our objective in this paper is to develop a distributed implementation of Algorithm 1 to solve (8) when the agents interact over a connected graph $\mathcal{G}$. Recall that in our problem setting, every agent $i\in\mathcal{A}$ can evaluate the utility function for a given $\mathcal{R}\subset\mathcal{P}$ but has access only to its own policy space $\mathcal{P}_{i}$.

Algorithm 1 Practical implementation of the continuous greedy process [21].
1: Init: $\boldsymbol{\mathbf{x}}\leftarrow\boldsymbol{\mathbf{0}}$, $t\leftarrow 1$
2: while $t\leq T$ do
3:     Draw $K$ samples of $\mathcal{R}$ from $\mathcal{P}$ according to $\boldsymbol{\mathbf{x}}$
4:     for $p\in\mathcal{P}$ do
5:         Estimate $w_{p}\sim\mathbb{E}[f(\mathcal{R}_{\boldsymbol{\mathbf{x}}}\cup\{p\})-f(\mathcal{R}_{\boldsymbol{\mathbf{x}}}\setminus\{p\})]$
6:     end for
7:     Solve for $\mathcal{R}^{\star}=\underset{\mathcal{R}\in\mathcal{I}}{\textup{argmax}}\,\sum_{p\in\mathcal{R}}w_{p}$
8:     Update the membership vector as $\boldsymbol{\mathbf{x}}\leftarrow\boldsymbol{\mathbf{x}}+\frac{1}{T}\boldsymbol{\mathbf{1}}_{\mathcal{R}^{\star}}$
9:     $t\leftarrow t+1$
10: end while
11: Use Pipage rounding to convert the fractional solution $\boldsymbol{\mathbf{x}}$ to an integral solution.

Note: $\boldsymbol{\mathbf{1}}_{\mathcal{R}^{\star}}$ is the $\boldsymbol{\mathbf{v}}$ in (10).
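The following is a minimal, self-contained Python sketch of the loop in Algorithm 1, specialized to a partitioned matroid with one policy per agent; the toy coverage utility and returning the fractional solution without Pipage rounding are illustrative simplifications, not part of [21].

import random

def sample_set(x):
    # Draw a random set according to the membership probability vector x.
    return {p for p, prob in x.items() if random.random() < prob}

def continuous_greedy(f, partition, T=20, K=200):
    # Sampling-based continuous greedy (steps 1-10 of Algorithm 1).
    ground = [p for block in partition for p in block]
    x = {p: 0.0 for p in ground}
    for _ in range(T):
        # Steps 3-6: estimate w_p ~ dF/dx_p(x) for every policy p by sampling.
        samples = [sample_set(x) for _ in range(K)]
        w = {p: sum(f(R | {p}) - f(R - {p}) for R in samples) / K for p in ground}
        # Steps 7-8: for a partitioned matroid with k_i = 1, the argmax over I
        # picks the best policy of each block; add 1/T to its membership.
        for block in partition:
            best = max(block, key=lambda p: w[p])
            x[best] += 1.0 / T
    return x  # fractional solution; Pipage rounding (step 11) would follow

# Toy coverage utility: number of points covered by the chosen placements.
cover = {"p1": {1, 2}, "p2": {2, 3}, "p3": {3, 4, 5}, "p4": {1}}
f = lambda R: len(set().union(*(cover[p] for p in R))) if R else 0
print(continuous_greedy(f, [["p1", "p2"], ["p3", "p4"]]))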

5 A polynomial-time distributed multi-agent randomized continuous greedy algorithm

Our proposed distributed multi-agent randomized continuous greedy algorithm to solve the set value optimization problem (8) over a connected graph $\mathcal{G}$ is given in Algorithm 2, whose convergence guarantee and suboptimality gap are given in Theorem 5.1 below. To provide insight into the construction of Algorithm 2, we first review the idea behind the central suboptimal solution via Algorithm 1, following [21]. We also provide some intermediate results that will be useful in establishing our results.

5.1 A short overview of the central continuous greedy process

As mentioned earlier, the continuous multilinear extension is a relaxation strategy that extends a submodular function $f(\mathcal{R})$, defined on the vertices of the $n$-dimensional hypercube $\{0,1\}^{n}$, to a continuous multilinear function $F$ defined on $[0,1]^{n}$. The two functions evaluate identically at any vector $\boldsymbol{\mathbf{x}}\in[0,1]^{n}$ that is the membership indicator vector of a set $\mathcal{R}\subset\mathcal{P}$. Then, by way of a process that runs continuously, depending only on local properties of $F$, we can produce a point $\boldsymbol{\mathbf{x}}\in P(\mathcal{M})$ to approximate the optimum $OPT=\underset{\boldsymbol{\mathbf{x}}\in P(\mathcal{M})}{\textup{max}}F(\boldsymbol{\mathbf{x}})$ (here recall (7)). The proposal in [21] is to move in the direction of a vector constrained by $P(\mathcal{M})$ that maximizes the local gain. To understand the logic behind Algorithm 1, let us review the conceptual steps of this continuous greedy process. [21] views the process as a particle starting at $\boldsymbol{\mathbf{x}}(0)=\boldsymbol{\mathbf{0}}$ and following the flow

$\frac{\text{d}\boldsymbol{\mathbf{x}}}{\text{d}t}=\boldsymbol{\mathbf{v}}(\boldsymbol{\mathbf{x}})~~\text{where}~~\boldsymbol{\mathbf{v}}(\boldsymbol{\mathbf{x}})=\underset{\boldsymbol{\mathbf{v}}\in P(\mathcal{M})}{\arg\max}\,(\boldsymbol{\mathbf{v}}.\nabla F(\boldsymbol{\mathbf{x}}))$   (9)

over the unit time interval $[0,1]$. We note that $\boldsymbol{\mathbf{x}}(t)$ for $t\in[0,1]$ is contained in $P(\mathcal{M})$, since it is a convex combination of vectors in $P(\mathcal{M})$. Now consider a point $\boldsymbol{\mathbf{x}}$ along the trajectory of the flow (9) and assume that $\boldsymbol{\mathbf{x}}^{\star}$ attains the true optimum, $OPT=F(\boldsymbol{\mathbf{x}}^{\star})$. Consider the direction $\boldsymbol{\mathbf{v}}^{\star}=\max\{\boldsymbol{\mathbf{x}}^{\star}-\boldsymbol{\mathbf{x}},\boldsymbol{\mathbf{0}}\}$, which is a nonnegative vector. Because $0\leq\boldsymbol{\mathbf{v}}^{\star}\leq\boldsymbol{\mathbf{x}}^{\star}\in P(\mathcal{M})$, we have $\boldsymbol{\mathbf{v}}^{\star}\in P(\mathcal{M})$. By virtue of Lemma 3.1, $F$ is monotone increasing; therefore, using $\max\{\boldsymbol{\mathbf{x}}^{\star}-\boldsymbol{\mathbf{x}},\boldsymbol{\mathbf{0}}\}=\max\{\boldsymbol{\mathbf{x}}^{\star},\boldsymbol{\mathbf{x}}\}-\boldsymbol{\mathbf{x}}$, we have $F(\boldsymbol{\mathbf{x}}+\boldsymbol{\mathbf{v}}^{\star})=F(\max\{\boldsymbol{\mathbf{x}}^{\star},\boldsymbol{\mathbf{x}}\})\geq F(\boldsymbol{\mathbf{x}}^{\star})$. However, $\boldsymbol{\mathbf{x}}+\boldsymbol{\mathbf{v}}^{\star}$ does not necessarily belong to $P(\mathcal{M})$. Therefore, let us consider $F(\boldsymbol{\mathbf{x}}+\zeta\boldsymbol{\mathbf{v}}^{\star})$ for $\zeta\geq 0$. It can be shown that $F(\boldsymbol{\mathbf{x}}+\zeta\boldsymbol{\mathbf{v}}^{\star})$ is concave in $\zeta$ and $\frac{\text{d}F}{\text{d}\zeta}$ is non-increasing. Thus, it can be established that $F(\boldsymbol{\mathbf{x}}+\boldsymbol{\mathbf{v}}^{\star})-F(\boldsymbol{\mathbf{x}})\leq\frac{\text{d}F}{\text{d}\zeta}|_{\zeta=0}=\boldsymbol{\mathbf{v}}^{\star}.\nabla F(\boldsymbol{\mathbf{x}})$. Since $\boldsymbol{\mathbf{v}}^{\star}\in P(\mathcal{M})$ and $\boldsymbol{\mathbf{v}}(\boldsymbol{\mathbf{x}})$ in (9) maximizes $\boldsymbol{\mathbf{v}}.\nabla F(\boldsymbol{\mathbf{x}})$ over $P(\mathcal{M})$, we can write $\boldsymbol{\mathbf{v}}(\boldsymbol{\mathbf{x}}).\nabla F(\boldsymbol{\mathbf{x}})\geq\boldsymbol{\mathbf{v}}^{\star}.\nabla F(\boldsymbol{\mathbf{x}})\geq F(\boldsymbol{\mathbf{x}}+\boldsymbol{\mathbf{v}}^{\star})-F(\boldsymbol{\mathbf{x}})\geq OPT-F(\boldsymbol{\mathbf{x}})$. We then note that $\frac{\text{d}F}{\text{d}t}=\boldsymbol{\mathbf{v}}(\boldsymbol{\mathbf{x}}).\nabla F(\boldsymbol{\mathbf{x}})\geq OPT-F(\boldsymbol{\mathbf{x}}(t))$. Therefore, given $\boldsymbol{\mathbf{x}}(0)=\boldsymbol{\mathbf{0}}$, using the Comparison Lemma [30], we can conclude that $F(\boldsymbol{\mathbf{x}}(t))\geq(1-\text{e}^{-t})OPT$, and thus $\boldsymbol{\mathbf{x}}(1)\in P(\mathcal{M})$ and $F(\boldsymbol{\mathbf{x}}(1))\geq(1-1/\text{e})OPT$. In the second stage of the algorithm, the fractional solution $\boldsymbol{\mathbf{x}}(1)$ is rounded to a point in $\{0,1\}^{n}$ by use of the Pipage rounding method; see [31] for more details. The exposition above is the conceptual design behind the continuous greedy process. Algorithm 1 is a practical implementation achieved by the numerical iterative process

$\boldsymbol{\mathbf{x}}(t+1)=\boldsymbol{\mathbf{x}}(t)+\frac{1}{T}\boldsymbol{\mathbf{v}}(t),$   (10)

and the use of sampling to compute $\nabla F(\boldsymbol{\mathbf{x}})$ and, consequently, $\boldsymbol{\mathbf{v}}(t)$; see [21] for more details. In what follows, we explain a practical distributed implementation of the continuous greedy process, which is realized as Algorithm 2 and is inspired by this central solution.

Algorithm 2 Discrete distributed implementation of the continuous greedy process.
1: Init: $\bar{\mathcal{P}}\leftarrow\emptyset$, $\mathcal{F}_{i}\leftarrow\emptyset$, $t\leftarrow 1$
2: while $t\leq T$ do
3:    for $i\in\mathcal{A}$ do
4:        Draw $K_{i}$ sample policy sets $\mathcal{R}$ such that $q\in\mathcal{R}$ with probability $\alpha$ for all $(q,\alpha)\in\mathcal{F}_{i}$
5:        for $p\in\mathcal{P}_{i}$ do
6:            Compute $w^{i}_{p}\sim\mathbb{E}[f(\mathcal{R}\cup\{p\})-f(\mathcal{R}\setminus\{p\})]$ using the sample policy sets of step 4
7:        end for
8:        Solve for $p^{\star}=\underset{p\in\mathcal{P}_{i}}{\textup{argmax}}\,w^{i}_{p}$
9:        $\mathcal{F}_{i}^{-}\leftarrow\mathcal{F}_{i}\oplus\{(p^{\star},\frac{1}{T})\}$
10:        Broadcast $\mathcal{F}_{i}^{-}$ to the neighbors $\mathcal{N}_{i}$
11:        $\mathcal{F}_{i}\leftarrow\underset{j\in\mathcal{N}_{i}\cup\{i\}}{\textup{MAX}}\,\mathcal{F}_{j}^{-}$
12:    end for
13:    $t\leftarrow t+1$
14: end while
15: for $i\in\mathcal{A}$ do
16:    Sample one policy $\bar{p}\in\mathcal{P}_{i}$ using $\mathcal{F}_{i}$
17:    $\bar{\mathcal{P}}\leftarrow\bar{\mathcal{P}}\cup\{\bar{p}\}$
18: end for

5.2 Design and analysis of the distributed continuous greedy process

We start our design and analysis of the distributed continuous greedy process by introducing our notation and the set of computations that the agents carry out locally, using their local information and interactions with their neighbors. The algorithm is performed over a finite time horizon of $T$ steps. Let $\mathcal{F}_{i}(t)\subset\mathcal{P}\times[0,1]$ be the information set of agent $i$ at time step $t$, initialized as $\mathcal{F}_{i}(0)=\emptyset$. For any pair $(p,\alpha)\in\mathcal{F}_{i}(t)$, the first element is the policy and the second element is the corresponding membership probability. We let $\boldsymbol{\mathbf{x}}_{i}(t)\in{\mathbb{R}}^{n}$ (recall $|\mathcal{P}|=n$) be the local copy of the membership probability vector of our suboptimal solution of (8) at agent $i$ at time step $t$, defined according to

$x_{ip}(t)=\begin{cases}\alpha,&(p,\alpha)\in\mathcal{F}_{i}(t),\\ 0&\text{otherwise},\end{cases}\qquad p\in\mathcal{P}.$   (11)

Recall that $\mathcal{P}=\{1,\dots,n\}$ is sorted agent-wise, with $1\in\mathcal{P}_{1}$ and $n\in\mathcal{P}_{N}$. Hence, $\boldsymbol{\mathbf{x}}_{i}(t)=[\hat{\boldsymbol{\mathbf{x}}}_{i1}^{\top}(t),\cdots,\boldsymbol{\mathbf{x}}_{ii}^{\top}(t),\cdots,\hat{\boldsymbol{\mathbf{x}}}_{iN}^{\top}(t)]^{\top}$, where $\boldsymbol{\mathbf{x}}_{ii}(t)\in{\mathbb{R}}_{\geq 0}^{|\mathcal{P}_{i}|}$ is the membership probability vector of agent $i$'s own policies at iteration $t$, while $\hat{\boldsymbol{\mathbf{x}}}_{ij}(t)\in{\mathbb{R}}_{\geq 0}^{|\mathcal{P}_{j}|}$ is agent $i$'s local estimate of the membership probability vector of agent $j$. At time step $t$, agent $i$ solves the optimization problem

$\widetilde{\boldsymbol{\mathbf{v}}}_{i}(t)=\underset{\boldsymbol{\mathbf{y}}\in P_{i}(\mathcal{M})}{\textup{argmax}}\;\boldsymbol{\mathbf{y}}.\widetilde{\nabla F}(\boldsymbol{\mathbf{x}}_{i}(t))$   (12)

with

$P_{i}(\mathcal{M})=\Big\{[\boldsymbol{\mathbf{y}}_{1}^{\top},\cdots,\boldsymbol{\mathbf{y}}_{N}^{\top}]^{\top}\in{\mathbb{R}}_{\geq 0}^{n}\,\Big|\,\boldsymbol{\mathbf{1}}^{\top}.\boldsymbol{\mathbf{y}}_{i}\leq 1,\;\boldsymbol{\mathbf{y}}_{j}=\boldsymbol{\mathbf{0}},~j\in\mathcal{A}\backslash\{i\}\Big\}.$   (13)

The term $\widetilde{\nabla F}(\boldsymbol{\mathbf{x}}_{i}(t))$ is the estimate of $\nabla F(\boldsymbol{\mathbf{x}}_{i}(t))$, computed by drawing $K_{i}$ samples of the set $\mathcal{R}_{\boldsymbol{\mathbf{x}}_{i}(t)}$ according to the membership probability vector $\boldsymbol{\mathbf{x}}_{i}(t)$; recall (5). Hence, $\frac{\partial F}{\partial x_{p}}$ is estimated by averaging $f(\mathcal{R}_{\boldsymbol{\mathbf{x}}_{i}(t)}\cup\{p\})-f(\mathcal{R}_{\boldsymbol{\mathbf{x}}_{i}(t)}\setminus\{p\})$ over the samples. We denote the $p$th element of $\widetilde{\nabla F}(\boldsymbol{\mathbf{x}}_{i}(t))$ by $w_{p}^{i}$ and represent it by

$w_{p}^{i}\sim\mathbb{E}[f(\mathcal{R}_{\boldsymbol{\mathbf{x}}_{i}(t)}\cup\{p\})-f(\mathcal{R}_{\boldsymbol{\mathbf{x}}_{i}(t)}\setminus\{p\})].$   (14)

We note that, given the definition of $P_{i}(\mathcal{M})$, to compute (12) we only need $w_{p}^{i}$ for $p\in\mathcal{P}_{i}$.

Remark 5.1 (Local computation of $w_{p}^{i}$, $p\in\mathcal{P}_{i}$, by agent $i$)

Given the definition (11), we note that $w_{p}^{i}$, an estimate of $\frac{\partial F}{\partial x_{p}}$, can be obtained by drawing $K_{i}$ sample policy sets $\mathcal{R}$ such that $q\in\mathcal{R}$ with probability $\alpha$ for all $(q,\alpha)\in\mathcal{F}_{i}$ and using

$w^{i}_{p}\sim\mathbb{E}[f(\mathcal{R}\cup\{p\})-f(\mathcal{R}\setminus\{p\})],\qquad p\in\mathcal{P}_{i}.$   (15)

Let each agent propagate its local variable according to

$\boldsymbol{\mathbf{x}}_{i}^{-}(t+1)=\boldsymbol{\mathbf{x}}_{i}(t)+\frac{1}{T}\widetilde{\boldsymbol{\mathbf{v}}}_{i}(t).$   (16)

One can make a conceptual connection between (16) and the practical implementation of (9) discussed earlier. Because this propagation is based only on the local information of agent $i$, next, each agent, by interacting with its neighbors, updates its propagated $\boldsymbol{\mathbf{x}}_{i}^{-}(t+1)$ by element-wise maximum seeking

$\boldsymbol{\mathbf{x}}_{i}(t+1)=\underset{j\in\mathcal{N}_{i}\cup\{i\}}{\textup{max}}\;\boldsymbol{\mathbf{x}}_{j}^{-}(t+1).$   (17)

Lemma 5.1 below shows that, as one expects, $\boldsymbol{\mathbf{x}}_{ii}(t+1)=\boldsymbol{\mathbf{x}}_{ii}^{-}(t+1)$, i.e., the component of $\boldsymbol{\mathbf{x}}_{i}$ corresponding to agent $i$ itself is the propagated value maintained at agent $i$, and not the estimated value of any of its neighbors.

Lemma 5.1

Assume that the agents follow the distributed Algorithm 2, and let $\bar{\boldsymbol{\mathbf{x}}}(t)=\underset{i\in\mathcal{A}}{\textup{max}}\,\boldsymbol{\mathbf{x}}_{i}(t)$, where $\boldsymbol{\mathbf{x}}_{i}$ is given in (11). Then, $\bar{\boldsymbol{\mathbf{x}}}(t)=[\boldsymbol{\mathbf{x}}_{11}^{\top}(t),\cdots,\boldsymbol{\mathbf{x}}_{NN}^{\top}(t)]^{\top}$ at any time step $t$.

The proof of Lemma 5.1 is given in Appendix A. We note that it follows from Lemma 5.1 that

$\boldsymbol{\mathbf{x}}_{jj}(t)=\boldsymbol{\mathbf{x}}_{jj}^{-}(t),\qquad j\in\mathcal{A}.$   (18)

In the distributed Algorithm 2, at each time step $t$, each agent has its own belief about the probabilities of the policies, which is not necessarily the same as the beliefs of the other agents. The following result bounds the difference between the beliefs of the agents.

Proposition 5.1

Let the agents follow the distributed Algorithm 2. Then, the vectorized membership probability $\boldsymbol{\mathbf{x}}_{i}(t)$ of each agent $i\in\mathcal{A}$, realized from $\mathcal{F}_{i}(t)$, satisfies

$0\leq\frac{1}{N}\boldsymbol{\mathbf{1}}.(\bar{\boldsymbol{\mathbf{x}}}(t)-\boldsymbol{\mathbf{x}}_{i}(t))\leq\frac{1}{T}d(\mathcal{G}),$   (19a)
$\bar{\boldsymbol{\mathbf{x}}}(t+1)-\bar{\boldsymbol{\mathbf{x}}}(t)=\frac{1}{T}\sum_{i\in\mathcal{A}}\widetilde{\boldsymbol{\mathbf{v}}}_{i}(t),$   (19b)
$\frac{1}{N}\boldsymbol{\mathbf{1}}.(\bar{\boldsymbol{\mathbf{x}}}(t+1)-\bar{\boldsymbol{\mathbf{x}}}(t))=\frac{1}{T}.$   (19c)

The proof of Proposition 5.1 is given in Appendix A. Because $f$ is normalized and monotone increasing, we have the guarantee that $w_{p}^{i}\geq 0$. Therefore, without loss of generality, we know that one realization of $\widetilde{\boldsymbol{\mathbf{v}}}_{i}(t)$ in (12) corresponds to $\boldsymbol{\mathbf{1}}_{\{p^{\star}\}}$, where

$p^{\star}=\underset{p\in\mathcal{P}_{i}}{\arg\max}\,w_{p}^{i}.$   (20)

Next, let each agent $i\in\mathcal{A}$ propagate its information set according to

$\mathcal{F}_{i}^{-}(t+1)=\mathcal{F}_{i}(t)\oplus\left\{(p^{\star},\tfrac{1}{T})\right\},$   (21)

and update it using a local interaction with its neighbors according to

$\mathcal{F}_{i}(t+1)=\underset{j\in\mathcal{N}_{i}\cup\{i\}}{\textup{MAX}}\,\mathcal{F}_{j}^{-}(t+1).$   (22)

By the definition of the $\oplus$ and MAX operators, we have the guarantee that if $(p,\alpha_{1})\in\mathcal{F}_{i}$, then there exists no $\alpha_{2}\neq\alpha_{1}$ such that $(p,\alpha_{2})\in\mathcal{F}_{i}$.

Lemma 5.2

For $\widetilde{\boldsymbol{\mathbf{v}}}_{i}(t)=\boldsymbol{\mathbf{1}}_{\{p^{\star}\}}$, the vectors $\boldsymbol{\mathbf{x}}^{-}_{i}(t+1)$ and $\boldsymbol{\mathbf{x}}_{i}(t+1)$ computed from, respectively, (16) and (17) are the same as $\boldsymbol{\mathbf{x}}^{-}_{i}(t+1)$ and $\boldsymbol{\mathbf{x}}_{i}(t+1)$ constructed from, respectively, $\mathcal{F}_{i}^{-}(t+1)$ and $\mathcal{F}_{i}(t+1)$ using (11).

  • Proof of Lemma 5.2

    The proof follows trivially from the definition of the operator \oplus and (11). \Box

Initialized with $\mathcal{F}_{i}(0)=\emptyset$, $i\in\mathcal{A}$, the updates (20), (21), and (22), where $w_{p}^{i}$ is computed via (15), constitute a distributed iterative process, formally stated as Algorithm 2, that runs for $T$ steps. At the end of these $T$ steps, as stated in Algorithm 2, each agent $i\in\mathcal{A}$ obtains its suboptimal policy $\bar{\mathcal{P}}_{i}$ by sampling one policy $\bar{p}\in\mathcal{P}_{i}$ with the probability given by $\boldsymbol{\mathbf{x}}_{ii}(T)$, where, for $p\in\mathcal{P}_{i}$,

$x_{ip}(T)=\begin{cases}\alpha,&(p,\alpha)\in\mathcal{F}_{i}(T),\\ 0&\text{otherwise}.\end{cases}$
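To make the agent-level computations of Algorithm 2 concrete, the following is a minimal simulation sketch (the three-agent line graph, the toy coverage utility, and the policy names are illustrative assumptions, not part of the paper):

import random

def sample_set(F):
    # Draw a policy set from an information set F (dict: policy -> probability).
    return {p for p, prob in F.items() if random.random() < prob}

def local_step(F_i, P_i, f, T, K):
    # Steps 4-9 of Algorithm 2 for one agent: estimate w_p over its own policies
    # P_i, pick the best policy p_star, and add mass 1/T to it (the (+) update).
    samples = [sample_set(F_i) for _ in range(K)]
    w = {p: sum(f(R | {p}) - f(R - {p}) for R in samples) / K for p in P_i}
    p_star = max(P_i, key=lambda p: w[p])
    G = dict(F_i)
    G[p_star] = G.get(p_star, 0.0) + 1.0 / T
    return G

def max_merge(F_collection):
    # Steps 10-11: element-wise MAX over the agent's own and its neighbors' sets.
    merged = {}
    for F in F_collection:
        for p, v in F.items():
            merged[p] = max(merged.get(p, v), v)
    return merged

# Three agents on a line graph 0 - 1 - 2, each owning two candidate placements.
cover = {"a1": {1, 2}, "a2": {2}, "b1": {3}, "b2": {3, 4}, "c1": {5}, "c2": {1, 5}}
f = lambda R: len(set().union(*(cover[p] for p in R))) if R else 0
P = [["a1", "a2"], ["b1", "b2"], ["c1", "c2"]]
nbrs = {0: [1], 1: [0, 2], 2: [1]}
T, K = 20, 100
F = [dict() for _ in range(3)]
for _ in range(T):
    F_minus = [local_step(F[i], P[i], f, T, K) for i in range(3)]
    F = [max_merge([F_minus[i]] + [F_minus[j] for j in nbrs[i]]) for i in range(3)]
# Steps 15-18: each agent samples one of its own policies with probability x_ii(T).
choice = []
for i in range(3):
    probs = [F[i].get(p, 0.0) for p in P[i]]
    choice.append(random.choices(P[i], weights=probs)[0])
print(choice)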

The following result, whose proof is available in Appendix A, gives the convergence guarantee and suboptimality gap of Algorithm 2.

Theorem 5.1 (Convergence guarantee and suboptimality gap of Algorithm 2)

Let $f:2^{\mathcal{P}}\to{\mathbb{R}}_{\geq 0}$ be a normalized, monotone increasing, and submodular set function, and let $\mathcal{S}^{\star}$ be the optimizer of problem (8). Then, the admissible policy set $\bar{\mathcal{P}}$, the output of the distributed Algorithm 2, satisfies

$\left(1-\frac{1}{\textup{e}}\right)\left(1-\left(2N^{2}d(\mathcal{G})+\frac{1}{2}N^{2}+N\right)\frac{1}{T}\right)f(\mathcal{S}^{\star})\leq\mathbb{E}[f(\bar{\mathcal{P}})],$

with probability $\left(\prod_{i\in\mathcal{A}}(1-2\textup{e}^{-\frac{1}{8T^{2}}K_{i}})^{|\mathcal{P}_{i}|}\right)^{T}$.

By simplifying the probability statement and dropping the higher-order terms, the optimality gap guarantee of Theorem 5.1 holds with probability at least $1-2T\,n\,\textup{e}^{-\frac{1}{8T^{2}}\underline{K}}$, where $\underline{K}=\min\{K_{1},\cdots,K_{N}\}$; note that $1-2T\,n\,\textup{e}^{-\frac{1}{8T^{2}}\underline{K}}\leq\left(\prod_{i\in\mathcal{A}}(1-2\textup{e}^{-\frac{1}{8T^{2}}K_{i}})^{|\mathcal{P}_{i}|}\right)^{T}$. Hence, the probability guarantee improves as $\underline{K}$, the minimum number of samples collected by the agents, increases relative to $T^{2}$.
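For a rough sense of the required sampling effort (a direct rearrangement of the simplified bound above, under the assumption that it should hold with a target probability $p$), the simplified guarantee holds with probability at least $p$ whenever

$1-2T\,n\,\textup{e}^{-\frac{\underline{K}}{8T^{2}}}\geq p\quad\Longleftrightarrow\quad\underline{K}\geq 8T^{2}\ln\frac{2Tn}{1-p},$

so the per-agent sample count should grow roughly like $T^{2}\log(Tn)$.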

Remark 5.2 (Extra communication for improved optimality gap)

Replacing the update step (17) with $\boldsymbol{\mathbf{x}}_{i}(t+1)=\boldsymbol{\mathbf{y}}_{i}(d(\mathcal{G}))$, where $\boldsymbol{\mathbf{y}}_{i}(0)=\boldsymbol{\mathbf{x}}_{i}^{-}(t+1)$ and

$\boldsymbol{\mathbf{y}}_{i}(m)=\underset{j\in\mathcal{N}_{i}\cup\{i\}}{\textup{max}}\,\boldsymbol{\mathbf{y}}_{j}(m-1),\qquad m\in\{1,\cdots,d(\mathcal{G})\},$

i.e., starting with $\boldsymbol{\mathbf{x}}_{i}^{-}(t+1)$ and recursively repeating the update step (17), using the output of the previous recursion, $d(\mathcal{G})$ times, each agent $i\in\mathcal{A}$ arrives at $\boldsymbol{\mathbf{x}}_{i}(t+1)=\bar{\boldsymbol{\mathbf{x}}}(t+1)$ (recall Lemma 5.1). Hence, for this revised implementation, following the proof of Theorem 5.1, we observe that (33) is replaced by $\left|\frac{\partial F}{\partial x_{p}}(\bar{\boldsymbol{\mathbf{x}}}(t))-\frac{\partial F}{\partial x_{p}}({\boldsymbol{\mathbf{x}}}_{i}(t))\right|=0$, which consequently leads to

$\left(1-\frac{1}{\textup{e}}\right)\left(1-\left(\frac{1}{2}N^{2}+N\right)\frac{1}{T}\right)f(\mathcal{S}^{\star})\leq\mathbb{E}[f(\bar{\mathcal{P}})],$   (23)

with probability $\left(\prod_{i\in\mathcal{A}}(1-2\textup{e}^{-\frac{1}{8T^{2}}K_{i}})^{|\mathcal{P}_{i}|}\right)^{T}$. This improved optimality gap is achieved at the cost of $(d(\mathcal{G})-1)T$ extra communications per agent. The optimality bound (23) is the same bound achieved by the centralized algorithm of [21]. To implement this revision, step 11 of Algorithm 2 (equivalent to (22)) should be replaced by $\mathcal{F}_{i}=\mathcal{H}_{i}(d(\mathcal{G}))$, where $\mathcal{H}_{i}(0)=\mathcal{F}_{i}^{-}$ and

$\mathcal{H}_{i}(m)=\underset{j\in\mathcal{N}_{i}\cup\{i\}}{\textup{MAX}}\,\mathcal{H}_{j}(m-1),\qquad m\in\{1,\cdots,d(\mathcal{G})\}.$   (24)
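A minimal sketch of this revised consensus step, reusing the max_merge helper from the earlier sketch (the diameter diam is assumed to be known to all agents):

def repeated_max_consensus(F_minus, nbrs, diam):
    # Remark 5.2: repeat the neighborhood MAX update d(G) times so that every
    # agent ends up holding the network-wide maximum value for every policy.
    F = list(F_minus)
    for _ in range(diam):
        F = [max_merge([F[i]] + [F[j] for j in nbrs[i]]) for i in range(len(F))]
    return F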
(a) Case 1   (b) Case 2   (c) Case 3   (d) Case 4   (e) Case 5   (f) Case 6   (g)   (h)
Figure 2: Plots (a)-(f) show the 6 different $\mathtt{SEQ}$s used in the sequential greedy algorithm. Plot (g) shows the outcome of using Algorithm 2, whereas plot (h) shows the outcome of the sequential greedy algorithm when the $\mathtt{SEQ}$ of Case 1 (plot (a)) is used.

6 Numerical Example

Consider the multi-agent sensor placement problem introduced in Fig. 1 for 5 agents with $\mathcal{B}_{i}=\mathcal{B}$, i.e., each agent is able to move to any of the sensor placement points. This sensor placement problem is cast as the optimization problem (8). The field is a 6 unit by 6 unit square and the feasible sensor locations are the 6 by 6 grid in the center square of the field; see Fig. 2. The points of interest, 900 in total, are spread around the map (small red dots in Fig. 2). The sensing zones of the agents $\mathcal{A}=\{a,b,c,d,e\}$ are circles with radii of, respectively, $\{0.5,0.6,0.7,0.8,1.5\}$. The agents communicate over a ring graph, as shown in Fig. 2. We first solve this problem using our proposed distributed Algorithm 2. The results of the simulation for different iteration and sampling numbers are shown in Table 1. The algorithm produces good results at a modest number of iterations and samples (e.g., see $T=20$ and $K=500$). Fig. 2(g) shows the resulting deployment using Algorithm 2. Next, we solve this problem using the sequential greedy algorithm [29] in a decentralized way by first choosing a route $\mathtt{SEQ}=1\to 2\to 3\to 4\to 5$ that visits all the agents, and then giving $\mathtt{SEQ}$ to the agents so they follow it to share their information in a sequential manner. Figures 2(a)-(f) give 6 possible $\mathtt{SEQ}$s, denoted by the semi-circular arrows inside the networks. The results of running the sequential greedy algorithm over the sequences in Fig. 2(a)-(f) are shown in Table 2.

T \ K | 10000 | 500 | 100 | 50  | 10  | 5   | 1
100   | 768   | 768 | 718 | 710 | 718 | 716 | 696
20    | 768   | 768 | 718 | 710 | 726 | 716 | 696
10    | 661   | 640 | 657 | 640 | 634 | 602 | 551
5     | 630   | 630 | 634 | 626 | 583 | 608 | 540
1     | 456   | 456 | 456 | 456 | 456 | 456 | 456
Table 1: The outcome of Algorithm 2 for different numbers of iterations $T$ (rows) and sample counts $K$ (columns).
Case    | 1   | 2   | 3   | 4   | 5   | 6
Utility | 634 | 704 | 699 | 640 | 767 | 760
Table 2: Outcome of the sequential greedy algorithm for the sequences of Cases 1-6.

What stands out about the sequential greedy algorithm is that the choice of sequence can affect its outcome significantly. We can attribute this inconsistency to the heterogeneity of the sensors' measurement zones. We can see that when sensor $e$ is given the last priority to make its choice, the sequential greedy algorithm performs poorly. This can be explained by agents with smaller sensing zones picking high-density areas but not being able to cover them fully; see Fig. 2(h), which depicts the outcome of the sequential greedy algorithm using the sequence of Case 1. A simpler example justifying this observation is shown in Fig. 3, with two disjoint clusters of points and two sensors; a toy instance in this spirit is sketched below. One may suggest sequencing the agents from the largest to the smallest sensing zone; however, this is not necessarily the best choice, as we can see in Table 2: the utility of Case 6 is less than that of Case 5 (the conjecture of sequencing the agents from strongest to weakest is not valid). Moreover, this ordering may lead to a very long $\mathtt{SEQ}$ over the communication graph. Interestingly, this inconsistency does not appear in the solutions of Algorithm 2, where the agents intrinsically overcome the importance of a sequence by deciding the positions of the sensors over a time horizon of communication and by exchanging their information sets.
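To illustrate this order dependence on a minimal toy instance in the spirit of Fig. 3 (the policy names, coverage sets, and utility below are hypothetical), a decentralized sequential greedy sketch could read:

def sequential_greedy(f, P_blocks, seq):
    # Each agent, in the order given by seq, adds the policy from its own block
    # with the largest marginal gain given the choices of the earlier agents.
    chosen = set()
    for i in seq:
        best = max(P_blocks[i], key=lambda p: f(chosen | {p}) - f(chosen))
        chosen.add(best)
    return chosen

# Two clusters of points; the "orange" sensor covers more than the "blue" one.
cover = {"blue_A": {1, 2}, "blue_B": {5}, "orange_A": {1, 2, 3, 4}, "orange_B": {5}}
f = lambda R: len(set().union(*(cover[p] for p in R))) if R else 0
P_blocks = {"blue": ["blue_A", "blue_B"], "orange": ["orange_A", "orange_B"]}
print(f(sequential_greedy(f, P_blocks, seq=["blue", "orange"])))   # 4 points covered
print(f(sequential_greedy(f, P_blocks, seq=["orange", "blue"])))   # 5 points covered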

Figure 3: or The sequential greedy algorithm, when the blue agent chooses first assigns both the blue and the orange agents to point A resulting in inferior performance compared to the case that the orange agent chooses first. In the later case, orange agent gets A and the blue agent gets B, which is indeed the optimal solution.

7 Conclusion

We proposed a distributed suboptimal algorithm to solve the problem of maximizing a monotone increasing submodular set function subject to a partitioned matroid constraint. Our problem of interest was motivated by optimal multi-agent sensor placement problems in discrete space. Our algorithm is a practical decentralization of a multilinear-extension-based algorithm and achieves a $(1-1/\textup{e}-O(1/T))$ optimality gap, which is an improvement over the $1/2$ optimality gap of the well-known sequential greedy algorithm. In our numerical example, we compared the outcome obtained by our proposed algorithm with that of a decentralized sequential greedy algorithm constructed by assigning a priority sequence to the agents. We showed that the outcome of the sequential greedy algorithm is inconsistent and depends on the sequence. In contrast, the outcome of our algorithm, owing to its iterative nature, tended to be consistent, which in some ways also explains its better optimality gap over the sequential greedy algorithm. Our future work is to study the robustness of our proposed algorithm to message dropout.

Acknowledgments

This work is supported by NSF award IIS-SAS-1724331.

References

  • [1] S. Martínez and F. Bullo, “Optimal sensor placement and motion coordination for target tracking,” Automatica, vol. 42, no. 4, pp. 661–668, 2006.
  • [2] N. Zhou, C. Cassandras, X. Yu, and S. Andersson, “Decentralized event-driven algorithms for multi-agent persistent monitoring tasks,” in IEEE Int. Conf. on Decision and Control, pp. 4064–4069, 2017.
  • [3] Y. Khazaeni and C. Cassandras, “Event-driven trajectory optimization for data harvesting in multi-agent systems,” IEEE Tran. on Control of Network Systems, 2017.
  • [4] N. Zhou, X. Yu, S. Andersson, and C. Cassandras, “Optimal event-driven multi-agent persistent monitoring of a finite set of data sources,” IEEE Tran. on Automatic Control, 2018.
  • [5] N. Rezazadeh and S. S. Kia, “A sub-modular receding horizon approach to persistent monitoring for a group of mobile agents over an urban area,” IFAC-PapersOnLine, vol. 52, no. 20, pp. 217–222, 2019.
  • [6] X. Duan, M. George, R. Patel, and F. Bullo, “Robotic surveillance based on the meeting time of random walks,” IEEE Tran. on Robotics and Automation, 2020.
  • [7] S. Welikala and C. Cassandras, “Event-driven receding horizon control for distributed estimation in network systems,” arXiv preprint arXiv:2009.11958, 2020.
  • [8] S. Alamdari, E. Fata, and L. Smith, “Persistent monitoring in discrete environments: Minimizing the maximum weighted latency between observations,” The International Journal of Robotics Research, vol. 33, no. 1, pp. 138–154, 2014.
  • [9] N. Mehr and R. Horowitz, “A submodular approach for optimal sensor placement in traffic networks,” in 2018 Annual American Control Conference (ACC), pp. 6353–6358, IEEE, 2018.
  • [10] H. A, M. Ghasemi, H. Vikalo, and U. Topcu, “Randomized greedy sensor selection: Leveraging weak submodularity,” IEEE Transactions on Automatic Control, 2020.
  • [11] V. Tzoumas, A. Jadbabaie, and G. J. Pappas, “Sensor placement for optimal kalman filtering: Fundamental limits, submodularity, and algorithms,” in American Control Conference, pp. 191–196, IEEE, 2016.
  • [12] M. Shamaiah, S. Banerjee, and H. Vikalo, “Greedy sensor selection: Leveraging submodularity,” in IEEE conference on decision and control, pp. 2572–2577, 2010.
  • [13] S. T. Jawaid and S. Smith, “Submodularity and greedy algorithms in sensor scheduling for linear dynamical systems,” Automatica, vol. 61, pp. 282–288, 2015.
  • [14] G. Nemhauser, L. Wolsey, and M. Fisher, “An analysis of approximations for maximizing submodular set functions—i,” Mathematical programming, vol. 14, no. 1, pp. 265–294, 1978.
  • [15] L. Fisher, G. Nemhauser, and L. Wolsey, “An analysis of approximations for maximizing submodular set functions—ii,” in Polyhedral combinatorics, pp. 73–87, Springer, 1978.
  • [16] K. Wei, R. Iyer, and J. Bilmes, “Fast multi-stage submodular maximization,” in International conference on machine learning, pp. 1494–1502, PMLR, 2014.
  • [17] B. Mirzasoleiman, A. Karbasi, R. Sarkar, and A. Krause, “Distributed submodular maximization: Identifying representative elements in massive data,” in Advances in Neural Information Processing Systems, pp. 2049–2057, 2013.
  • [18] B. Mirzasoleiman, M. Zadimoghaddam, and A. Karbasi, “Fast distributed submodular cover: Public-private data summarization,” in Advances in Neural Information Processing Systems, pp. 3594–3602, 2016.
  • [19] R. Kumar, B. Moseley, S. Vassilvitskii, and A. Vattani, “Fast greedy algorithms in mapreduce and streaming,” ACM Transactions on Parallel Computing, vol. 2, no. 3, pp. 1–22, 2015.
  • [20] P. S. Raut, O. Sadeghi, and M. Fazel, “Online dr-submodular maximization with stochastic cumulative constraints,” arXiv preprint arXiv:2005.14708, 2020.
  • [21] J. Vondrák, “Optimal approximation for the submodular welfare problem in the value oracle model,” in Proceedings of the fortieth annual ACM symposium on Theory of computing, pp. 67–74, 2008.
  • [22] A. A. Bian, B. Mirzasoleiman, J. Buhmann, and A. Krause, “Guaranteed non-convex optimization: Submodular maximization over continuous domains,” in Artificial Intelligence and Statistics, pp. 111–120, 2017.
  • [23] A. Mokhtari, H. Hassani, and A. Karbasi, “Stochastic conditional gradient methods: From convex minimization to submodular maximization,” Journal of Machine Learning Research, vol. 21, no. 105, pp. 1–49, 2020.
  • [24] O. Sadeghi and M. Fazel, “Online continuous dr-submodular maximization with long-term budget constraints,” in International Conference on Artificial Intelligence and Statistics, pp. 4410–4419, 2020.
  • [25] A. Robey, A. Adibi, B. Schlotfeldt, J. G. Pappas, and H. Hassani, “Optimal algorithms for submodular maximization with distributed constraints,” arXiv preprint arXiv:1909.13676, 2019.
  • [26] H. W, “Probability inequalities for sums of bounded random variables,” in The Collected Works of Wassily Hoeffding, pp. 409–426, Springer, 1994.
  • [27] C. Chekuri and A. Kumar, “Maximum coverage problem with group budget constraints and applications,” in Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pp. 72–83, Springer, 2004.
  • [28] M. Conforti and G. Cornuéjols, “Submodular set functions, matroids and the greedy algorithm: tight worst-case bounds and some generalizations of the rado-edmonds theorem,” Discrete applied mathematics, vol. 7, no. 3, pp. 251–274, 1984.
  • [29] N. Rezazadeh and S. S. Kia, “A sub-modular receding horizon solution for mobile multi-agent persistent monitoring,” arXiv preprint arXiv:1908.04425, 2019.
  • [30] H. K. Khalil and J. W. Grizzle, Nonlinear systems, vol. 3. Prentice hall Upper Saddle River, NJ, 2002.
  • [31] A. A. Ageev and M. I. Sviridenko, “Pipage rounding: A new method of constructing algorithms with proven performance guarantee,” Journal of Combinatorial Optimization, vol. 8, no. 3, pp. 307–328, 2004.

Appendix A: Proof of the Results in Section 5

  • Proof of Lemma 5.1

    Since $f$ is a monotone increasing and submodular set function, we have $f(\mathcal{R}_{\boldsymbol{\mathbf{x}}_{i}(t)}\cup\{p\})-f(\mathcal{R}_{\boldsymbol{\mathbf{x}}_{i}(t)}\setminus\{p\})\geq 0$, and hence $\widetilde{\nabla F}(\boldsymbol{\mathbf{x}}_{i}(t))$ has nonnegative entries for all $i\in\mathcal{A}$. As a result, the optimization (12) over the set (13) outputs a vector $\widetilde{\boldsymbol{\mathbf{v}}}_{i}(t)\in P_{i}(\mathcal{M})$ whose entries are greater than or equal to zero. Hence, according to the propagation and update rules (16) and (17), we can conclude that $\boldsymbol{\mathbf{x}}_{ii}(t)$ has non-decreasing elements, that only agent $i$ updates it, and that the other agents only copy this value as $\hat{\boldsymbol{\mathbf{x}}}_{ji}(t)$. Therefore, we can conclude that $\hat{x}_{jip}(t)\leq x_{iip}(t)$ for all $p\in\mathcal{P}_{i}$, which concludes the proof. $\Box$

  • Proof of Proposition 5.1

    ff is a monotone increasing and submodular set function therefor f(𝐱i(t){p})f(𝐱i(t){p})0f(\mathcal{R}_{\boldsymbol{\mathbf{x}}_{i}(t)}\cup\{p\})-f(\mathcal{R}_{\boldsymbol{\mathbf{x}}_{i}(t)}\setminus\{p\})\geq 0 and hence ~F(𝐱i(t))\widetilde{\nabla}F(\boldsymbol{\mathbf{x}}_{i}(t)) has positive entries i𝒜\forall i\in\mathcal{A}. Then, because 𝐯~j(t)Pj()\widetilde{\boldsymbol{\mathbf{v}}}_{j}(t)\in P_{j}(\mathcal{M}), it follows from (12) that 𝐯~j(t)\widetilde{\boldsymbol{\mathbf{v}}}_{j}(t) has non-negative entries, v~jp(t)0\widetilde{{v}}_{jp}(t)\geq 0, which satisfy p𝒫jv~jp(t)=1\sum_{p\in\mathcal{P}_{j}}\widetilde{{v}}_{jp}(t)=1. Therefore, it follows from (16) and Lemma 5.1 that

    𝟏.𝐱jj(t+1)=𝟏.𝐱jj(t)+1T,j𝒜.\displaystyle\boldsymbol{\mathbf{1}}.\boldsymbol{\mathbf{x}}_{jj}(t+1)=\boldsymbol{\mathbf{1}}.\boldsymbol{\mathbf{x}}_{jj}(t)+\frac{1}{T},\quad j\in\mathcal{A}. (25)

    Using (25), we can also write

    𝟏.𝐱jj(t)=𝟏.𝐱jj(td(𝒢))+1Td(𝒢),j𝒜.\displaystyle\boldsymbol{\mathbf{1}}.\boldsymbol{\mathbf{x}}_{jj}(t)=\boldsymbol{\mathbf{1}}.\boldsymbol{\mathbf{x}}_{jj}(t-d(\mathcal{G}))+\frac{1}{T}d(\mathcal{G}),\quad j\in\mathcal{A}. (26)

    Furthermore, it follows from Lemma (5.1) that for all p𝒫j\forall p\in\mathcal{P}_{j} and any i𝒜\{j}i\in\mathcal{A}\backslash\{j\} we can write

    xjjp(t)x^ijp(t).\displaystyle x_{jjp}(t)\geq\hat{x}_{ijp}(t). (27)

    Also, since every agent i𝒜\{j}i\in\mathcal{A}\backslash\{j\} can be reached from agent j𝒜j\in\mathcal{A} at most in d(𝒢)d(\mathcal{G}) hops, it follows from the propagation and update laws (16) and (17), for all p𝒫j\forall p\in\mathcal{P}_{j}, for any i𝒜\{j}i\in\mathcal{A}\backslash\{j\} that

    x^ijp(t)xjjp(td(𝒢)).\displaystyle\hat{x}_{ijp}(t)\geq x_{jjp}(t-d(\mathcal{G})). (28)

    Thus, for any j𝒜j\in\mathcal{A} and i𝒜\{j}i\in\mathcal{A}\backslash\{j\}, (27) and (28) result in

    𝟏.𝐱jj(t)𝟏.𝐱^ij(t)𝟏.𝐱jj(td(𝒢)).\displaystyle\boldsymbol{\mathbf{1}}.\boldsymbol{\mathbf{x}}_{jj}(t)\geq\boldsymbol{\mathbf{1}}.\hat{\boldsymbol{\mathbf{x}}}_{ij}(t)\geq\boldsymbol{\mathbf{1}}.\boldsymbol{\mathbf{x}}_{jj}(t-d(\mathcal{G})). (29)

    Next, we can use (26) and (29) to write

    𝟏.𝐱jj(t)𝟏.𝐱^ij(t)𝟏.𝐱jj(t)1Td(𝒢),\displaystyle\boldsymbol{\mathbf{1}}.\boldsymbol{\mathbf{x}}_{jj}(t)\geq\boldsymbol{\mathbf{1}}.\hat{\boldsymbol{\mathbf{x}}}_{ij}(t)\geq\boldsymbol{\mathbf{1}}.\boldsymbol{\mathbf{x}}_{jj}(t)-\frac{1}{T}d(\mathcal{G}), (30)

    for j𝒜j\in\mathcal{A} and i𝒜\{j}i\in\mathcal{A}\backslash\{j\}. Using (30) for any i𝒜i\in\mathcal{A} we can write

    j𝒜𝟏.𝐱jj(t)𝐱ii(t)+j𝒜\{i}𝟏.𝐱^ij(t)j𝒜𝟏.𝐱jj(t)1TNd(𝒢).\displaystyle\sum_{j\in\mathcal{A}}\boldsymbol{\mathbf{1}}.\boldsymbol{\mathbf{x}}_{jj}(t)\geq\boldsymbol{\mathbf{x}}_{ii}(t)+\sum_{j\in\mathcal{A}\backslash\{i\}}\boldsymbol{\mathbf{1}}.\hat{\boldsymbol{\mathbf{x}}}_{ij}(t)\geq\sum_{j\in\mathcal{A}}\boldsymbol{\mathbf{1}}.\boldsymbol{\mathbf{x}}_{jj}(t)-\frac{1}{T}Nd(\mathcal{G}). (31)

    Then, using Lemma 5.1, from (31) we can write

    \displaystyle\boldsymbol{\mathbf{1}}.\bar{\boldsymbol{\mathbf{x}}}(t)\geq\boldsymbol{\mathbf{1}}.\boldsymbol{\mathbf{x}}_{i}(t)\geq\boldsymbol{\mathbf{1}}.\bar{\boldsymbol{\mathbf{x}}}(t)-\frac{1}{T}Nd(\mathcal{G}),

    which ascertains (19a). Next, note that from Lemma 5.1 we have \boldsymbol{\mathbf{x}}_{jj}(t)=\boldsymbol{\mathbf{x}}_{jj}^{-}(t) for any j\in\mathcal{A}. Then, using (16) and invoking Lemma 5.1, we obtain (19b), which, given (25), also ascertains (19c). \Box
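For completeness, the two bookkeeping steps used in the proof above can be spelled out. Relation (26) follows by applying (25) repeatedly over the time steps t-d(\mathcal{G}),\dots,t-1, each of which adds exactly \frac{1}{T} to \boldsymbol{\mathbf{1}}.\boldsymbol{\mathbf{x}}_{jj}. Relation (31) follows from (30): its upper bound uses \boldsymbol{\mathbf{1}}.\hat{\boldsymbol{\mathbf{x}}}_{ij}(t)\leq\boldsymbol{\mathbf{1}}.\boldsymbol{\mathbf{x}}_{jj}(t), while its lower bound is

\displaystyle\boldsymbol{\mathbf{1}}.\boldsymbol{\mathbf{x}}_{ii}(t)+\sum_{j\in\mathcal{A}\backslash\{i\}}\boldsymbol{\mathbf{1}}.\hat{\boldsymbol{\mathbf{x}}}_{ij}(t)\geq\boldsymbol{\mathbf{1}}.\boldsymbol{\mathbf{x}}_{ii}(t)+\sum_{j\in\mathcal{A}\backslash\{i\}}\Big{(}\boldsymbol{\mathbf{1}}.\boldsymbol{\mathbf{x}}_{jj}(t)-\frac{1}{T}d(\mathcal{G})\Big{)}\geq\sum_{j\in\mathcal{A}}\boldsymbol{\mathbf{1}}.\boldsymbol{\mathbf{x}}_{jj}(t)-\frac{1}{T}Nd(\mathcal{G}),

since |\mathcal{A}\backslash\{i\}|=N-1\leq N.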

  • Proof of Theorem 5.1

    Knowing that \left|\frac{\partial^{2}F}{\partial x_{p}\partial x_{q}}\right|\leq f(\mathcal{S}^{\star}) from Lemma 7.2, and using (19c), it follows from Lemma 7.3 that

    \displaystyle F(\bar{\boldsymbol{\mathbf{x}}}(t+1))-F(\bar{\boldsymbol{\mathbf{x}}}(t))\geq\nabla F(\bar{\boldsymbol{\mathbf{x}}}(t)).(\bar{\boldsymbol{\mathbf{x}}}(t+1)-\bar{\boldsymbol{\mathbf{x}}}(t))-\frac{1}{2}N^{2}\frac{1}{T^{2}}f(\mathcal{S}^{\star}),

    which, given (19b), leads to

    \displaystyle F(\bar{\boldsymbol{\mathbf{x}}}(t+1))-F(\bar{\boldsymbol{\mathbf{x}}}(t))\geq\frac{1}{T}\sum_{i\in\mathcal{A}}\widetilde{\boldsymbol{\mathbf{v}}}_{i}(t).\nabla F(\bar{\boldsymbol{\mathbf{x}}}(t))-\frac{1}{2}N^{2}\frac{1}{T^{2}}f(\mathcal{S}^{\star}). (32)

    Next, we note that, by definition, \bar{\boldsymbol{\mathbf{x}}}(t)\geq\boldsymbol{\mathbf{x}}_{i}(t) for any i\in\mathcal{A}. Therefore, given (19a), by invoking Lemma 7.3, for any i\in\mathcal{A} we can write

    \displaystyle\left|\frac{\partial F}{\partial x_{p}}(\bar{\boldsymbol{\mathbf{x}}}(t))-\frac{\partial F}{\partial x_{p}}({\boldsymbol{\mathbf{x}}}_{i}(t))\right|\leq N\frac{1}{T}d(\mathcal{G})f(\mathcal{S}^{\star}), (33)

    for p\in\{1,\cdots,n\}. Recall that at each time step t, the realization of \widetilde{\boldsymbol{\mathbf{v}}}_{i}(t) in (12) that Algorithm 2 uses is

    \displaystyle\widetilde{\boldsymbol{\mathbf{v}}}_{i}(t)=\boldsymbol{\mathbf{1}}_{p^{\star}},\qquad p^{\star}\in\mathcal{P}_{i}~~\text{is given by (20)} (34)

    for every i\in\mathcal{A}. Thus, \boldsymbol{\mathbf{1}}.\widetilde{\boldsymbol{\mathbf{v}}}_{i}(t)=1, i\in\mathcal{A}. Consequently, since each \widetilde{\boldsymbol{\mathbf{v}}}_{i}(t) is a standard basis vector, applying the entry-wise bound (33) to its single nonzero entry and summing over the N agents, we can write

    \displaystyle\sum_{i\in\mathcal{A}}\widetilde{\boldsymbol{\mathbf{v}}}_{i}(t).\nabla F(\bar{\boldsymbol{\mathbf{x}}}(t))\geq\sum_{i\in\mathcal{A}}\widetilde{\boldsymbol{\mathbf{v}}}_{i}(t).\nabla F(\boldsymbol{\mathbf{x}}_{i}(t))-N^{2}\frac{1}{T}d(\mathcal{G})f(\mathcal{S}^{\star}). (35)

    Next, we let

    \bar{\boldsymbol{\mathbf{v}}}_{i}(t)=\underset{\boldsymbol{\mathbf{v}}\in P_{i}(\mathcal{M})}{\textup{argmax}}\,\boldsymbol{\mathbf{v}}.\nabla F(\bar{\boldsymbol{\mathbf{x}}}(t))

    and

    \hat{\boldsymbol{\mathbf{v}}}_{i}(t)=\underset{\boldsymbol{\mathbf{v}}\in P_{i}(\mathcal{M})}{\textup{argmax}}\,\boldsymbol{\mathbf{v}}.\nabla F(\boldsymbol{\mathbf{x}}_{i}(t)).

    Because f is monotone increasing, by virtue of Lemma 3.1 we have \frac{\partial F}{\partial x_{p}}\geq 0, and as such \bar{\boldsymbol{\mathbf{v}}}_{i}(t)=\boldsymbol{\mathbf{1}}_{\bar{p}} and \hat{\boldsymbol{\mathbf{v}}}_{i}(t)=\boldsymbol{\mathbf{1}}_{\hat{p}}, where \bar{p}=\underset{p\in\mathcal{P}_{i}}{\arg\max}\frac{\partial F(\bar{\boldsymbol{\mathbf{x}}}(t))}{\partial x_{p}} and \hat{p}=\underset{p\in\mathcal{P}_{i}}{\arg\max}\frac{\partial F(\boldsymbol{\mathbf{x}}_{i}(t))}{\partial x_{p}}. Therefore, using

    \hat{\boldsymbol{\mathbf{v}}}_{i}(t).\nabla F(\boldsymbol{\mathbf{x}}_{i}(t))\geq\bar{\boldsymbol{\mathbf{v}}}_{i}(t).\nabla F(\boldsymbol{\mathbf{x}}_{i}(t))

    and

    \hat{\boldsymbol{\mathbf{v}}}_{i}(t).\nabla F(\boldsymbol{\mathbf{x}}_{i}(t))\geq\tilde{\boldsymbol{\mathbf{v}}}_{i}(t).\nabla F(\boldsymbol{\mathbf{x}}_{i}(t)),

    i\in\mathcal{A}, and (33), we can also write

    \displaystyle\sum_{i\in\mathcal{A}}\hat{\boldsymbol{\mathbf{v}}}_{i}(t).\nabla F(\boldsymbol{\mathbf{x}}_{i}(t))\geq\sum_{i\in\mathcal{A}}\bar{\boldsymbol{\mathbf{v}}}_{i}(t).\nabla F(\boldsymbol{\mathbf{x}}_{i}(t))\geq\sum_{i\in\mathcal{A}}\bar{\boldsymbol{\mathbf{v}}}_{i}(t).\nabla F(\bar{\boldsymbol{\mathbf{x}}}(t))-N^{2}\frac{1}{T}d(\mathcal{G})f(\mathcal{S}^{\star}), (36a)
    \displaystyle\sum_{i\in\mathcal{A}}\hat{\boldsymbol{\mathbf{v}}}_{i}(t).\nabla F(\boldsymbol{\mathbf{x}}_{i}(t))\geq\sum_{i\in\mathcal{A}}\tilde{\boldsymbol{\mathbf{v}}}_{i}(t).\nabla F(\boldsymbol{\mathbf{x}}_{i}(t)). (36b)

On the other hand, by virtue of Lemma 7.4, the estimated partial derivatives \frac{\widetilde{\partial F}}{\partial x_{p}}(\boldsymbol{\mathbf{x}}_{j}(t)), p\in\mathcal{P}_{j}, that each agent j\in\mathcal{A} uses to solve optimization problem (20) (equivalently (12)) satisfy

\displaystyle\left|\frac{\widetilde{\partial F}}{\partial x_{p}}(\boldsymbol{\mathbf{x}}_{j}(t))-\frac{{\partial F}}{\partial x_{p}}(\boldsymbol{\mathbf{x}}_{j}(t))\right|\leq\frac{1}{2T}f(\mathcal{S}^{\star}) (37)

with probability at least 1-2\textup{e}^{-\frac{1}{8T^{2}}K_{j}}. Using (36b) and (37), together with the fact that the samples are drawn independently, we can write

\displaystyle\sum_{i\in\mathcal{A}}\widetilde{\boldsymbol{\mathbf{v}}}_{i}(t).\nabla F(\boldsymbol{\mathbf{x}}_{i}(t))\geq\sum_{i\in\mathcal{A}}\widetilde{\boldsymbol{\mathbf{v}}}_{i}(t).\widetilde{\nabla F}(\boldsymbol{\mathbf{x}}_{i}(t))-N\frac{1}{2T}f(\mathcal{S}^{\star}), (38a)
\displaystyle\sum_{i\in\mathcal{A}}\widetilde{\boldsymbol{\mathbf{v}}}_{i}(t).\widetilde{\nabla F}(\boldsymbol{\mathbf{x}}_{i}(t))\geq\sum_{i\in\mathcal{A}}\hat{\boldsymbol{\mathbf{v}}}_{i}(t).\widetilde{\nabla F}(\boldsymbol{\mathbf{x}}_{i}(t))\geq\sum_{i\in\mathcal{A}}\hat{\boldsymbol{\mathbf{v}}}_{i}(t).\nabla F(\boldsymbol{\mathbf{x}}_{i}(t))-N\frac{1}{2T}f(\mathcal{S}^{\star}), (38b)

with probability at least \prod_{i\in\mathcal{A}}(1-2\textup{e}^{-\frac{1}{8T^{2}}K_{i}})^{|\mathcal{P}_{i}|}.

From (35), (36a), (38a), and (38b), we can now write

\displaystyle\sum_{i\in\mathcal{A}}\widetilde{\boldsymbol{\mathbf{v}}}_{i}(t).\nabla F(\bar{\boldsymbol{\mathbf{x}}}(t))\geq\sum_{i\in\mathcal{A}}\bar{\boldsymbol{\mathbf{v}}}_{i}(t).\nabla F(\bar{\boldsymbol{\mathbf{x}}}(t))-(2Nd(\mathcal{G})+1)N\frac{1}{T}f(\mathcal{S}^{\star}), (39)

with probability at least 1-2\,\sum_{i\in\mathcal{A}}\textup{e}^{-\frac{1}{8T^{2}}K_{i}}.
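For clarity, the constant (2Nd(\mathcal{G})+1)\frac{N}{T} in (39) is obtained by chaining the bounds above and adding up their error terms:

\displaystyle\sum_{i\in\mathcal{A}}\widetilde{\boldsymbol{\mathbf{v}}}_{i}(t).\nabla F(\bar{\boldsymbol{\mathbf{x}}}(t))\overset{(35)}{\geq}\sum_{i\in\mathcal{A}}\widetilde{\boldsymbol{\mathbf{v}}}_{i}(t).\nabla F(\boldsymbol{\mathbf{x}}_{i}(t))-N^{2}\frac{d(\mathcal{G})}{T}f(\mathcal{S}^{\star})\overset{(38a),(38b)}{\geq}\sum_{i\in\mathcal{A}}\hat{\boldsymbol{\mathbf{v}}}_{i}(t).\nabla F(\boldsymbol{\mathbf{x}}_{i}(t))-\Big{(}N^{2}\frac{d(\mathcal{G})}{T}+\frac{N}{T}\Big{)}f(\mathcal{S}^{\star})\overset{(36a)}{\geq}\sum_{i\in\mathcal{A}}\bar{\boldsymbol{\mathbf{v}}}_{i}(t).\nabla F(\bar{\boldsymbol{\mathbf{x}}}(t))-\Big{(}2N^{2}\frac{d(\mathcal{G})}{T}+\frac{N}{T}\Big{)}f(\mathcal{S}^{\star}),

since 2N^{2}\frac{d(\mathcal{G})}{T}+\frac{N}{T}=(2Nd(\mathcal{G})+1)\frac{N}{T}.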

Next, let \boldsymbol{\mathbf{v}}_{i}^{\star} be the projection of \boldsymbol{\mathbf{1}}_{\mathcal{S}^{\star}} onto P_{i}(\mathcal{M}). Since the P_{i}(\mathcal{M}) are disjoint subspaces of P(\mathcal{M}) that together cover the whole space, we can write

\displaystyle\boldsymbol{\mathbf{1}}_{\mathcal{S}^{\star}}=\sum_{i\in\mathcal{A}}\boldsymbol{\mathbf{v}}_{i}^{\star}. (40)

Then, using (39) and (40), invoking Lemma 7.1, and using the fact that \bar{\boldsymbol{\mathbf{v}}}_{i}(t).\nabla F(\bar{\boldsymbol{\mathbf{x}}}(t))\geq\boldsymbol{\mathbf{v}}^{\star}_{i}.\nabla F(\bar{\boldsymbol{\mathbf{x}}}(t)), we obtain

\displaystyle\sum_{i\in\mathcal{A}}\widetilde{\boldsymbol{\mathbf{v}}}_{i}(t).\nabla F(\bar{\boldsymbol{\mathbf{x}}}(t))\geq\sum_{i\in\mathcal{A}}\boldsymbol{\mathbf{v}}^{\star}_{i}.\nabla F(\bar{\boldsymbol{\mathbf{x}}}(t))-(2Nd(\mathcal{G})+1)N\frac{1}{T}f(\mathcal{S}^{\star})=
\boldsymbol{\mathbf{1}}_{\mathcal{S}^{\star}}.\nabla F(\bar{\boldsymbol{\mathbf{x}}}(t))-(2Nd(\mathcal{G})+1)N\frac{1}{T}f(\mathcal{S}^{\star})\geq f(\mathcal{S}^{\star})-F(\bar{\boldsymbol{\mathbf{x}}}(t))-(2Nd(\mathcal{G})+1)\frac{N}{T}f(\mathcal{S}^{\star}), (41)

with probability at least \prod_{i\in\mathcal{A}}(1-2\textup{e}^{-\frac{1}{8T^{2}}K_{i}})^{|\mathcal{P}_{i}|}. Hence, using (32) and (41), we conclude that

\displaystyle F(\bar{\boldsymbol{\mathbf{x}}}(t+1))-F(\bar{\boldsymbol{\mathbf{x}}}(t))\geq\frac{1}{T}(f(\mathcal{S}^{\star})-F(\bar{\boldsymbol{\mathbf{x}}}(t)))-(2Nd(\mathcal{G})+\frac{1}{2}N+1)\frac{N}{T^{2}}f(\mathcal{S}^{\star}), (42)

with probability at least \prod_{i\in\mathcal{A}}(1-2\textup{e}^{-\frac{1}{8T^{2}}K_{i}})^{|\mathcal{P}_{i}|}.
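For clarity, the constant in (42) simply collects the two error terms inherited from (32) and (41):

\displaystyle\frac{1}{2}N^{2}\frac{1}{T^{2}}f(\mathcal{S}^{\star})+\frac{1}{T}(2Nd(\mathcal{G})+1)\frac{N}{T}f(\mathcal{S}^{\star})=\Big{(}2Nd(\mathcal{G})+\frac{1}{2}N+1\Big{)}\frac{N}{T^{2}}f(\mathcal{S}^{\star}).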

Next, let g(t)=f(\mathcal{S}^{\star})-F(\bar{\boldsymbol{\mathbf{x}}}(t)) and \beta=(2Nd(\mathcal{G})+\frac{1}{2}N+1)\frac{N}{T^{2}}f(\mathcal{S}^{\star}), and rewrite (42) as

\displaystyle(f(\mathcal{S}^{\star})-F(\bar{\boldsymbol{\mathbf{x}}}(t)))-(f(\mathcal{S}^{\star})-F(\bar{\boldsymbol{\mathbf{x}}}(t+1)))=
g(t)-g(t+1)\geq\frac{1}{T}(f(\mathcal{S}^{\star})-F(\bar{\boldsymbol{\mathbf{x}}}(t)))-\beta=\frac{1}{T}g(t)-\beta. (43)

Then, from inequality (43), we get

\displaystyle g(t+1)\leq(1-\frac{1}{T})g(t)+\beta (44)

with probability at least \prod_{i\in\mathcal{A}}(1-2\textup{e}^{-\frac{1}{8T^{2}}K_{i}})^{|\mathcal{P}_{i}|}. Applying inequality (44) recursively from t=0 to t=T yields

\displaystyle g(T)\leq(1-\frac{1}{T})^{T}g(0)+\beta\sum_{k=0}^{T-1}(1-\frac{1}{T})^{k}=(1-\frac{1}{T})^{T}g(0)+T\beta(1-(1-\frac{1}{T})^{T}) (45)

with probability at least \left(\prod_{i\in\mathcal{A}}(1-2\textup{e}^{-\frac{1}{8T^{2}}K_{i}})^{|\mathcal{P}_{i}|}\right)^{T}. Substituting g(T)=f(\mathcal{S}^{\star})-F(\bar{\boldsymbol{\mathbf{x}}}(T)) and g(0)=f(\mathcal{S}^{\star})-F(\boldsymbol{\mathbf{x}}(0))=f(\mathcal{S}^{\star}) back into (45), we then obtain

\displaystyle(1-(1-\frac{1}{T})^{T})(f(\mathcal{S}^{\star})-T\beta)=(1-(1-\frac{1}{T})^{T})(1-(2Nd(\mathcal{G})+\frac{1}{2}N+1)\frac{N}{T})f(\mathcal{S}^{\star})
\leq F(\bar{\boldsymbol{\mathbf{x}}}(T)), (46)

with probability at least \left(\prod_{i\in\mathcal{A}}(1-2\textup{e}^{-\frac{1}{8T^{2}}K_{i}})^{|\mathcal{P}_{i}|}\right)^{T}. By applying (1-\frac{1}{T})^{T}\leq\frac{1}{\textup{e}}, and hence 1-(1-\frac{1}{T})^{T}\geq 1-\frac{1}{\textup{e}}, we get

\displaystyle(1-\frac{1}{\textup{e}})(1-(2Nd(\mathcal{G})+\frac{1}{2}N+1)\frac{N}{T})f(\mathcal{S}^{\star})\leq F(\bar{\boldsymbol{\mathbf{x}}}(T)), (47)

with probability at least \left(\prod_{i\in\mathcal{A}}(1-2\textup{e}^{-\frac{1}{8T^{2}}K_{i}})^{|\mathcal{P}_{i}|}\right)^{T}.
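The last two steps rely on the elementary identities

\displaystyle\sum_{k=0}^{T-1}(1-\frac{1}{T})^{k}=\frac{1-(1-\frac{1}{T})^{T}}{1-(1-\frac{1}{T})}=T\,(1-(1-\frac{1}{T})^{T}),\qquad (1-\frac{1}{T})^{T}\leq\textup{e}^{-1}\ \text{for all}\ T\geq 1,

which were used in (45) and in passing from (46) to (47), respectively.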

Given (34), from the propagation and update rules (16) and (17) and Lemma 5.1, we can conclude that \boldsymbol{\mathbf{1}}.\boldsymbol{\mathbf{x}}_{ii}(T)=1. Furthermore, define \mathcal{R}_{\boldsymbol{\mathbf{x}}_{ii}(T)} to be the random set obtained by sampling each member of \mathcal{P}_{i} independently according to \boldsymbol{\mathbf{x}}_{ii}(T). Since \boldsymbol{\mathbf{1}}.\boldsymbol{\mathbf{x}}_{ii}(T)=1, we can also define \mathcal{T}_{\boldsymbol{\mathbf{x}}_{ii}(T)} to be the random set in which exactly one policy is sampled from \mathcal{P}_{i} according to \boldsymbol{\mathbf{x}}_{ii}(T). Then, using Lemma 7.5, we can write

\displaystyle F(\bar{\boldsymbol{\mathbf{x}}}(T))=\mathbb{E}[f(\mathcal{R}_{\bar{\boldsymbol{\mathbf{x}}}(T)})]=\mathbb{E}[f(\mathcal{R}_{\boldsymbol{\mathbf{x}}_{11}(T)}\cup\cdots\cup\mathcal{R}_{\boldsymbol{\mathbf{x}}_{NN}(T)})]
\leq\mathbb{E}[f(\mathcal{T}_{\boldsymbol{\mathbf{x}}_{11}(T)}\cup\mathcal{R}_{\boldsymbol{\mathbf{x}}_{22}(T)}\cup\cdots\cup\mathcal{R}_{\boldsymbol{\mathbf{x}}_{NN}(T)})]
\leq\cdots
\leq\mathbb{E}[f(\mathcal{T}_{\boldsymbol{\mathbf{x}}_{11}(T)}\cup\cdots\cup\mathcal{T}_{\boldsymbol{\mathbf{x}}_{NN}(T)})],

which, combined with (47), concludes the proof. \Box
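As a rough numerical illustration of the optimality gap in (47) (the numbers below are only an example and are not taken from the paper's simulations): with N=5 agents, graph diameter d(\mathcal{G})=2, and T=1000 iterations, the correction factor is 1-(2\cdot 5\cdot 2+\frac{5}{2}+1)\frac{5}{1000}=0.8825, so the guarantee reads F(\bar{\boldsymbol{\mathbf{x}}}(T))\geq(1-\frac{1}{\textup{e}})\cdot 0.8825\,f(\mathcal{S}^{\star})\approx 0.56\,f(\mathcal{S}^{\star}).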

Appendix B: Auxiliary Results

In what follows, we let \mathcal{S}^{\star} represent the optimizer of problem (8).

Lemma 7.1

Consider the set-valued optimization problem (8). Suppose f:\mathcal{P}\to{\mathbb{R}}_{\geq 0} is an increasing and submodular set function, and consider its multilinear extension F:{\mathbb{R}}_{\geq 0}^{n}\to{\mathbb{R}}_{\geq 0}. Then, for all \boldsymbol{\mathbf{x}}\in P(\mathcal{M}),

\displaystyle\boldsymbol{\mathbf{1}}_{\mathcal{S}^{\star}}.\nabla F(\boldsymbol{\mathbf{x}})\geq f(\mathcal{S}^{\star})-F(\boldsymbol{\mathbf{x}}).
  • Proof

    See [21] for the proof. \Box

Lemma 7.2

Consider the set-valued optimization problem (8). Suppose f:\mathcal{P}\to{\mathbb{R}}_{\geq 0} is an increasing and submodular set function, and consider its multilinear extension F:{\mathbb{R}}_{\geq 0}^{n}\to{\mathbb{R}}_{\geq 0}. Then

\displaystyle\left|\frac{\partial^{2}F}{\partial x_{p}\partial x_{q}}\right|\leq f(\mathcal{S}^{\star}),\qquad p,q\in\mathcal{P}.
  • Proof

    Since p\not\in\mathcal{R}_{\boldsymbol{\mathbf{x}}}\cup\{q\}\setminus\{p\}, it follows from the submodularity of f that

    \displaystyle 0\leq\Delta_{f}(p|\mathcal{R}_{\boldsymbol{\mathbf{x}}}\cup\{q\}\setminus\{p\})\leq f(\{p\}),
    \displaystyle 0\leq\Delta_{f}(p|\mathcal{R}_{\boldsymbol{\mathbf{x}}}\setminus\{p,q\})\leq f(\{p\}). (48)

    Knowing that

    \displaystyle\Delta_{f}(p|\mathcal{R}_{\boldsymbol{\mathbf{x}}}\cup\{q\}\setminus\{p\})=f(\mathcal{R}_{\boldsymbol{\mathbf{x}}}\cup\{p,q\})-f(\mathcal{R}_{\boldsymbol{\mathbf{x}}}\cup\{q\}\setminus\{p\}),

    and

    \displaystyle\Delta_{f}(p|\mathcal{R}_{\boldsymbol{\mathbf{x}}}\setminus\{p,q\})=f(\mathcal{R}_{\boldsymbol{\mathbf{x}}}\cup\{p\}\setminus\{q\})-f(\mathcal{R}_{\boldsymbol{\mathbf{x}}}\setminus\{p,q\}),

    then, using definition (3), the mixed second-order partial derivative can be written as

    \displaystyle\frac{\partial^{2}F}{\partial x_{p}\partial x_{q}}=\mathbb{E}[\Delta_{f}(p|\mathcal{R}_{\boldsymbol{\mathbf{x}}}\cup\{q\}\setminus\{p\})-\Delta_{f}(p|\mathcal{R}_{\boldsymbol{\mathbf{x}}}\setminus\{p,q\})]. (49)

    Putting (48) and (49) together results in

    \displaystyle\left|\frac{\partial^{2}F}{\partial x_{p}\partial x_{q}}\right|\leq f(\{p\}),

    which, together with f(\{p\})\leq f(\mathcal{S}^{\star}), concludes the proof. \Box

Lemma 7.3

Consider a twice differentiable function F(\boldsymbol{\mathbf{x}}):\mathbb{R}^{N}\to\mathbb{R} which satisfies \left|\frac{\partial^{2}F}{\partial x_{p}\partial x_{q}}\right|\leq\alpha for any p,q\in\{1,\cdots,N\}. Then, for any \boldsymbol{\mathbf{x}}_{1},\boldsymbol{\mathbf{x}}_{2}\in\mathbb{R}^{N} satisfying \boldsymbol{\mathbf{x}}_{2}\geq\boldsymbol{\mathbf{x}}_{1} and \boldsymbol{\mathbf{1}}.({\boldsymbol{\mathbf{x}}}_{2}-{\boldsymbol{\mathbf{x}}}_{1})\leq\beta, the following holds:

\displaystyle\left|\frac{\partial F}{\partial x_{p}}({\boldsymbol{\mathbf{x}}}_{1}+\epsilon({\boldsymbol{\mathbf{x}}}_{2}-{\boldsymbol{\mathbf{x}}}_{1}))-\frac{\partial F}{\partial x_{p}}({\boldsymbol{\mathbf{x}}}_{1})\right|\leq\epsilon\alpha\beta, (50a)
\displaystyle F({\boldsymbol{\mathbf{x}}}_{2})-F({\boldsymbol{\mathbf{x}}}_{1})\geq\nabla F({\boldsymbol{\mathbf{x}}}_{1}).({\boldsymbol{\mathbf{x}}}_{2}-{\boldsymbol{\mathbf{x}}}_{1})-\frac{1}{2}\alpha\beta^{2}, (50b)

for \epsilon\in[0,1].

  • Proof

    Let \boldsymbol{\mathbf{h}}_{p}=[\frac{\partial^{2}F}{\partial x_{p}\partial x_{1}},\cdots,\frac{\partial^{2}F}{\partial x_{p}\partial x_{N}}]^{\top}. Then, we can write

    \displaystyle\left|\frac{\partial F}{\partial x_{p}}({\boldsymbol{\mathbf{x}}}_{1}+\epsilon({\boldsymbol{\mathbf{x}}}_{2}-{\boldsymbol{\mathbf{x}}}_{1}))-\frac{\partial F}{\partial x_{p}}({\boldsymbol{\mathbf{x}}}_{1})\right|=\left|\int_{0}^{\epsilon}\boldsymbol{\mathbf{h}}_{p}({\boldsymbol{\mathbf{x}}}_{1}+\tau({\boldsymbol{\mathbf{x}}}_{2}-{\boldsymbol{\mathbf{x}}}_{1})).({\boldsymbol{\mathbf{x}}}_{2}-{\boldsymbol{\mathbf{x}}}_{1})\textup{d}\tau\right|
    \leq\int_{0}^{\epsilon}\alpha\boldsymbol{\mathbf{1}}.({\boldsymbol{\mathbf{x}}}_{2}-{\boldsymbol{\mathbf{x}}}_{1})\textup{d}\tau\leq\epsilon\alpha\beta, (51)

    Furthermore,

    \displaystyle F({\boldsymbol{\mathbf{x}}}_{2})-F({\boldsymbol{\mathbf{x}}}_{1})=\int_{0}^{1}\nabla F({\boldsymbol{\mathbf{x}}}_{1}+\epsilon({\boldsymbol{\mathbf{x}}}_{2}-{\boldsymbol{\mathbf{x}}}_{1})).({\boldsymbol{\mathbf{x}}}_{2}-{\boldsymbol{\mathbf{x}}}_{1})\textup{d}\epsilon
    \geq\int_{0}^{1}(\nabla F({\boldsymbol{\mathbf{x}}}_{1})-\epsilon\alpha\beta\boldsymbol{\mathbf{1}}).({\boldsymbol{\mathbf{x}}}_{2}-{\boldsymbol{\mathbf{x}}}_{1})\textup{d}\epsilon\geq\nabla F({\boldsymbol{\mathbf{x}}}_{1}).({\boldsymbol{\mathbf{x}}}_{2}-{\boldsymbol{\mathbf{x}}}_{1})-\frac{1}{2}\alpha\beta^{2},

    where the inequality in the second line follows from (51), which concludes the proof. \Box

Lemma 7.4

Consider the set-valued optimization problem (8). Suppose f:\mathcal{P}\to{\mathbb{R}}_{\geq 0} is an increasing and submodular set function, and consider its multilinear extension F:{\mathbb{R}}_{\geq 0}^{n}\to{\mathbb{R}}_{\geq 0}. Let \widetilde{\nabla F}(\boldsymbol{\mathbf{x}}) be the estimate of \nabla F(\boldsymbol{\mathbf{x}}) that is calculated by taking K\in\mathbb{Z}_{>0} samples of the set \mathcal{R}_{\boldsymbol{\mathbf{x}}} according to the membership probability vector \boldsymbol{\mathbf{x}}. Then

\displaystyle\left|\frac{\widetilde{\partial F}}{\partial x_{p}}(\boldsymbol{\mathbf{x}})-\frac{{\partial F}}{\partial x_{p}}(\boldsymbol{\mathbf{x}})\right|\geq\frac{1}{2T}f(\mathcal{S}^{\star}),\quad p\in\{1,\cdots,n\}, (52)

with probability at most 2\textup{e}^{-\frac{1}{8T^{2}}K}, for any T\in\mathbb{Z}_{>0}.

  • Proof

    Define the random variable

    \displaystyle X=\Big{(}(f(\mathcal{R}_{\boldsymbol{\mathbf{x}}}\cup\{p\})-f(\mathcal{R}_{\boldsymbol{\mathbf{x}}}\setminus\{p\}))-\frac{\partial F}{\partial x_{p}}(\boldsymbol{\mathbf{x}})\Big{)}\Big{/}f(\mathcal{S}^{\star}),

    and assume that K samples of \mathcal{R}_{\boldsymbol{\mathbf{x}}} are taken to construct the realizations \{X_{k}\}_{k=1}^{K} of X (an illustrative code sketch of this estimator is given after the proof). Since f is a submodular function, we have f(\mathcal{R}_{\boldsymbol{\mathbf{x}}}\cup\{p\})-f(\mathcal{R}_{\boldsymbol{\mathbf{x}}}\setminus\{p\})\leq f(\mathcal{S}^{\star}). Consequently, using equation (5), we conclude that \mathbb{E}[X]=0 and |X|\leq 1. Hence, using Theorem 3.1, we have

    \displaystyle\left|\sum_{k=1}^{K}X_{k}\right|\geq\frac{1}{2T}K,

    with probability at most 2\textup{e}^{-\frac{1}{8T^{2}}K}. Hence, the estimation error of \nabla F(\boldsymbol{\mathbf{x}}) satisfies

    \displaystyle\left|\frac{\widetilde{\partial F}}{\partial x_{p}}(\boldsymbol{\mathbf{x}})-\frac{{\partial F}}{\partial x_{p}}(\boldsymbol{\mathbf{x}})\right|\geq\frac{1}{2T}f(\mathcal{S}^{\star}),\quad p\in\{1,\cdots,n\},

    with probability at most 2\textup{e}^{-\frac{1}{8T^{2}}K}. \Box
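To make the sampling step behind Lemma 7.4 concrete, the following is a minimal Python sketch of the Monte-Carlo estimator of \frac{\partial F}{\partial x_{p}} (the sample average of the marginal differences). The function names, the toy coverage utility, and the probability values are illustrative assumptions and are not taken from the paper or its implementation.

    import random

    def estimate_partial(f, x, p, K, rng=random):
        # Monte-Carlo estimate of dF/dx_p for the multilinear extension of f.
        # f: monotone submodular set function (takes a Python set, returns a float)
        # x: dict mapping each ground-set element to its membership probability
        # p: element whose partial derivative is estimated
        # K: number of independent samples of the random set R_x
        total = 0.0
        for _ in range(K):
            # Sample R_x by including each element q independently with probability x[q].
            R = {q for q in x if rng.random() < x[q]}
            # Accumulate the marginal difference f(R U {p}) - f(R \ {p}).
            total += f(R | {p}) - f(R - {p})
        return total / K

    # Toy monotone submodular coverage utility (illustrative only).
    cover = {"a": {1, 2}, "b": {2, 3}, "c": {4}}
    f = lambda S: len(set().union(*(cover[s] for s in S))) if S else 0
    print(estimate_partial(f, {"a": 0.5, "b": 0.5, "c": 0.5}, "a", K=2000))  # approx. 1.5

Each estimated entry costs 2K evaluations of f, and Lemma 7.4 quantifies how large K must be for the resulting error to stay below \frac{1}{2T}f(\mathcal{S}^{\star}) with high probability.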

Lemma 7.5

Suppose f:\mathcal{P}\to{\mathbb{R}}_{\geq 0} is an increasing and submodular set function, and let \boldsymbol{\mathbf{x}} be a membership probability vector of a set \mathcal{Q}\subset\mathcal{P} with \boldsymbol{\mathbf{1}}.\boldsymbol{\mathbf{x}}=1. We define \mathcal{R}_{\boldsymbol{\mathbf{x}}} to be the set obtained by sampling each member of \mathcal{Q} independently according to the probability vector \boldsymbol{\mathbf{x}}, and \mathcal{T}_{\boldsymbol{\mathbf{x}}}=\{t\},\,t\in\mathcal{Q}, to be a single-member set whose element is chosen according to \boldsymbol{\mathbf{x}}. Then the following holds for any random set \mathcal{S}\subset\mathcal{P}\setminus\mathcal{Q}.

\displaystyle\mathbb{E}[f(\mathcal{R}_{\boldsymbol{\mathbf{x}}}\cup\mathcal{S})]\leq\mathbb{E}[f(\mathcal{T}_{\boldsymbol{\mathbf{x}}}\cup\mathcal{S})].
  • Proof

    Writing \mathcal{R}_{\boldsymbol{\mathbf{x}}}=\{r_{1},\cdots,r_{o}\}, we can write (the first inequality below uses the submodularity of f; see the remark following this proof)

    \displaystyle\mathbb{E}[f(\mathcal{R}_{\boldsymbol{\mathbf{x}}}\cup\mathcal{S})]=\mathbb{E}[f(\mathcal{S})+\sum_{l=1}^{o}\Delta_{f}(r_{l}|\mathcal{S}\cup\{r_{1},\cdots,r_{l-1}\})]
    \leq\mathbb{E}[f(\mathcal{S})+\sum_{r_{l}\in\mathcal{R}_{\boldsymbol{\mathbf{x}}}}\Delta_{f}(r_{l}|\mathcal{S})]=\mathbb{E}_{\mathcal{S}}[\mathbb{E}_{\mathcal{R}_{\boldsymbol{\mathbf{x}}}|\mathcal{S}}[f(\mathcal{S})+\sum_{r_{l}\in\mathcal{R}_{\boldsymbol{\mathbf{x}}}}\Delta_{f}(r_{l}|\mathcal{S})]]
    =\mathbb{E}_{\mathcal{S}}[f(\mathcal{S})+\mathbb{E}_{\mathcal{R}_{\boldsymbol{\mathbf{x}}}|\mathcal{S}}[\sum_{r_{l}\in\mathcal{R}_{\boldsymbol{\mathbf{x}}}}\Delta_{f}(r_{l}|\mathcal{S})]]=\mathbb{E}_{\mathcal{S}}[f(\mathcal{S})+\sum_{r_{l}\in\mathcal{Q}}x_{l}\Delta_{f}(r_{l}|\mathcal{S})]
    =\mathbb{E}_{\mathcal{S}}[f(\mathcal{S})+\mathbb{E}_{\mathcal{T}_{\boldsymbol{\mathbf{x}}}|\mathcal{S}}[\Delta_{f}(t|\mathcal{S})]]=\mathbb{E}_{\mathcal{S}}[\mathbb{E}_{\mathcal{T}_{\boldsymbol{\mathbf{x}}}|\mathcal{S}}[f(\mathcal{S})+\Delta_{f}(t|\mathcal{S})]]=\mathbb{E}[f(\mathcal{T}_{\boldsymbol{\mathbf{x}}}\cup\mathcal{S})]

    which concludes the proof. \Box
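The inequality used in the first step of the proof above is the diminishing-returns property of the submodular function f: since \mathcal{S}\subseteq\mathcal{S}\cup\{r_{1},\cdots,r_{l-1}\},

\displaystyle\Delta_{f}(r_{l}|\mathcal{S}\cup\{r_{1},\cdots,r_{l-1}\})\leq\Delta_{f}(r_{l}|\mathcal{S}),\qquad l\in\{1,\cdots,o\},

i.e., the marginal gain of adding r_{l} can only shrink as the base set grows.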