
Massively Parallel Maximum Coverage Revisited (The conference version of this manuscript is to appear in the 50th International Conference on Current Trends in Theory and Practice of Computer Science.)

Thai Bui ([email protected]) and Hoa T. Vu ([email protected]), San Diego State University. Supported by NSF Grant No. 2342527.
Abstract

We study the maximum set coverage problem in the massively parallel model. In this setting, m sets that are subsets of a universe of n elements are distributed among m machines. In each round, these machines can communicate with each other, subject to the memory constraint that no machine may use more than \tilde{O}(n) memory. The objective is to find the k sets whose coverage is maximized. We consider the regime where k=\Omega(m) (e.g., k=m/100), m=O(n), and each machine has \tilde{O}(n) memory. (The input size is O(mn), and each machine has enough memory to store only a constant number of sets.)

Maximum coverage is a special case of the submodular maximization problem subject to a cardinality constraint. This problem can be approximated to within a 1-1/e factor using the greedy algorithm, but this approach is not directly applicable to parallel and distributed models. When k=\Omega(m), to obtain a 1-1/e-\epsilon approximation, previous work either requires \tilde{O}(mn) memory per machine, which is not interesting compared to the trivial algorithm that sends the entire input to a single machine, or requires 2^{O(1/\epsilon)}n memory per machine, which is prohibitively expensive even for a moderately small value of \epsilon.

Our result is a randomized (1-1/e-\epsilon)-approximation algorithm that uses

O(1/\epsilon^{3}\cdot\log m\cdot(\log(1/\epsilon)+\log m))

rounds. Our algorithm involves solving a slightly transformed linear program of the maximum coverage problem using the multiplicative weights update method, classic techniques in parallel computing such as parallel prefix, and various combinatorial arguments.

1 Introduction

Maximum coverage is a classic NP-hard problem. In this problem, we have m sets S_{1},S_{2},\ldots,S_{m} that are subsets of a universe of n elements [n]=\{1,2,\ldots,n\}. The goal is to find k sets that cover the maximum number of elements. In the offline model, the greedy algorithm achieves a 1-1/e approximation and, assuming \textup{P}\neq\textup{NP}, this approximation is the best possible in polynomial time [9].
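For intuition, the following is a minimal sequential sketch of the offline greedy routine (the input format, with sets represented as Python sets, is illustrative):

```python
def greedy_max_coverage(sets, k):
    """Classic offline greedy: repeatedly pick the set with the largest
    marginal coverage; achieves a (1 - 1/e) approximation."""
    covered, chosen = set(), []
    for _ in range(k):
        # Index of the set adding the most new elements.
        best = max(range(len(sets)), key=lambda j: len(sets[j] - covered))
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered

# Example: pick k = 2 of 4 sets over the universe {0,...,5}.
sets = [{0, 1, 2}, {2, 3}, {3, 4, 5}, {0, 5}]
print(greedy_max_coverage(sets, 2))  # ([0, 2], {0, 1, 2, 3, 4, 5})
```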

However, the greedy algorithm for maximum coverage and the related set cover problem is not friendly to streaming, distributed, and massively parallel computing. A large body of work has been devoted to designing algorithms for these problems in big data computation models. An incomplete list of work includes [23, 13, 22, 17, 5, 12, 4, 7, 10, 6, 26, 24, 3, 14].

Some example applications of maximum coverage include facility and sensor placement [18], circuit layout and job scheduling [11], information retrieval [1], market design [16], data summarization [24], and social network analysis [14].

The MPC model.

We consider the massively parallel computation (MPC) model in which m sets S_{1},S_{2},\ldots,S_{m}\subseteq[n] are distributed among m machines. Each machine has \tilde{O}(n) memory and holds one set. In each round, each machine can communicate with others, with the constraint that no machine receives a total message of size more than \tilde{O}(n). Similar to previous work in the literature, we assume that m\leq n.

The MPC model, introduced by Karloff, Suri, and Vassilvitskii [15], is an abstraction of various modern computing paradigms such as MapReduce, Hadoop, and Spark.

Previous work.

This problem is a special case of submodular maximization subject to a cardinality constraint. The results of Liu and Vondrák [21], Barbosa et al. [8], and Kumar et al. [19] typically require that each machine has enough memory to store O(\sqrt{km}) items, which in our case are sets (and storing a set requires \tilde{O}(n) memory), with \sqrt{m/k} machines. When k=\Omega(m) (e.g., k=m/100), this means that a single machine may need \tilde{O}(mn) memory. This is not better than the trivial algorithm that sends the entire input to a single machine and solves the problem in one round.

Assadi and Khanna gave a randomized 1-1/e-\epsilon approximation algorithm in which each machine has \tilde{O}(m^{\delta/\epsilon}n) memory and the number of machines is m^{1-\delta/\epsilon}, for any \epsilon,\delta\in(0,1) (see Corollary 10 in the full version of [4]). Setting \delta=\Theta(1/\log m) gives a 1-1/e-\epsilon approximation in O(1/\epsilon\cdot\log m) rounds with O(m) machines, each of which uses \tilde{O}(2^{1/\epsilon}n) memory. While Assadi and Khanna's result is nontrivial in this regime, the dependence on \epsilon is exponential, and if n is large, then even a moderately small value of \epsilon=0.01 can lead to a prohibitively large memory requirement of \approx 2^{100}n. Their work, however, can handle the case where k=o(m).

Our result.

We present a relatively simple randomized algorithm that achieves a 1-1/e-\epsilon approximation in O(1/\epsilon^{3}\cdot\log m\cdot(\log(1/\epsilon)+\log m)) rounds with \tilde{O}(n) memory per machine, assuming k=\Omega(m). Our memory requirement does not depend on \epsilon, in contrast to the exponential dependence in Assadi and Khanna's result.

We note that assuming k=\Omega(m) does not make the problem any easier since there are still exponentially many solutions to consider. In practice, one can think of many applications where one can utilize a constant fraction of the available sets (e.g., 10\% or 20\%). We state our main result as a theorem below.

Theorem 1.

Assume k=\Omega(m) and that there are m machines, each of which has \tilde{O}(n) memory. There exists an algorithm that, with high probability, finds k sets that cover at least (1-1/e-\epsilon)\textup{OPT} elements in O(1/\epsilon^{3}\cdot\log m\cdot(\log(1/\epsilon)+\log m)) rounds.

If the maximum frequency f (the maximum number of sets that any element belongs to) is bounded, we can drop the assumption that k=\Omega(m) and parameterize the number of rounds based on f. In particular, we can obtain a 1-1/e-\epsilon approximation in O(f^{3}/\epsilon^{6}\cdot\log^{2}(\frac{kf}{\epsilon})) rounds.

Remark.

We could easily modify our algorithm so that each machine uses \tilde{O}(1/\epsilon\cdot n) memory and the number of rounds is O(1/\epsilon^{2}\cdot\log m\cdot(\log(1/\epsilon)+\log m)). At least one \log m factor is necessary based on the lower bound given by Corollary 9 of [4].

Randomization appears in two parts of our algorithm: the rounding step and the subsampling step that reduces the number of rounds from \log m\cdot\log n to \log m\cdot(\log(1/\epsilon)+\log m). If we only need to compute an approximation to the optimal coverage value such that the output is in the interval [(1-\epsilon)\textup{OPT},\textup{OPT}/(1-1/e-\epsilon)], then we have a deterministic algorithm that runs in O(1/\epsilon^{3}\cdot\log n\cdot\log m) rounds. The algorithm by Assadi and Khanna [4] combines the sample-and-prune framework with threshold greedy. This strategy requires sampling sets; it is unclear how to derandomize their algorithm, even just to compute an approximation to the optimal coverage value.

Our techniques and paper organization.

In Section 2.1, we transform the standard linear program for the maximum coverage problem into an equivalent packing linear program that can be solved "approximately" by the multiplicative weights update method. At a high level, the multiplicative weights update method gives us a fractional solution that is a 1-1/e-O(\epsilon) bi-criteria approximation, where (1+O(\epsilon))k "fractional" sets cover (1-1/e-O(\epsilon))\textup{OPT} "fractional" elements. We then show how to find k sets covering (1-1/e-O(\epsilon))\textup{OPT} elements from this fractional solution through a combinatorial argument and parallel prefix.

Section 2.2 outlines the details to solve the transformed linear program in the MPC model. While this part is an adaptation of the standard multiplicative weights, an implementation in the MPC model requires some additional details such as the number of bits to represent the weights. All missing proofs can be found in the appendix.

Preliminaries.

In this work, we always consider the case where each machine has \tilde{O}(n) memory and m\leq n. Without loss of generality, we may assume that non-central machine j stores the set S_{j}. For each element i\in[n], we use f_{i} to denote the number of sets that i is in; this is also referred to as the frequency of i. Each machine j starts with the characteristic vector v_{j}\in\{0,1\}^{n} of the set S_{j} that it holds. The frequency vector \textbf{f} is just the sum of the characteristic vectors of the sets; we can aggregate the vectors \{v_{j}\} in O(\log m) rounds using the standard binary tree aggregation algorithm and then broadcast \textbf{f} to all machines.
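The following is a minimal sketch of this aggregation, simulating the m machines in a single process (the list char_vectors is an illustrative stand-in for the vectors v_{j} held by the machines):

```python
import numpy as np

def aggregate_frequency(char_vectors):
    """Simulate binary-tree aggregation: in each round, machines are paired
    up and each pair merges its partial sums in one communication round,
    halving the number of active machines; O(log m) rounds in total."""
    vecs = [np.array(v) for v in char_vectors]
    rounds = 0
    while len(vecs) > 1:
        vecs = [vecs[i] + vecs[i + 1] if i + 1 < len(vecs) else vecs[i]
                for i in range(0, len(vecs), 2)]
        rounds += 1
    return vecs[0], rounds  # vecs[0][i] = f_i

# Example: m = 4 sets over n = 3 elements.
f, r = aggregate_frequency([[1, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 0]])
print(f, r)  # [3 2 2] 2
```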

Since the dependence on 1/\epsilon in this work is polynomial, an \alpha-O(\epsilon) approximation can easily be translated to an \alpha-\epsilon approximation by scaling \epsilon by a constant factor. We can also assume that 1/\epsilon<n/10; otherwise, we can simulate the greedy algorithm in O(1/\epsilon) rounds. For the sake of exposition, we will not attempt to optimize the constants in our algorithm and analysis. Finally, in this work we consider 1-1/\operatorname{poly}(m) a "high probability". We use [E] to denote the indicator variable of the event E, which is 1 if E happens and 0 otherwise.

2 Algorithm

2.1 The main algorithm

Linear programming (re)formulation.

We first recall the relaxed linear program (LP) for the maximum coverage problem, \Pi_{0}:

maximize \sum_{i\in[n]}x_{i}
subject to x_{i}\leq\sum_{S_{j}\ni i}y_{j} \quad \forall i\in[n]
\sum_{j\in[m]}y_{j}=k
x_{i},y_{j}\in[0,1] \quad \forall i\in[n],j\in[m].

We first reformulate this LP and then approximately solve the new LP using the multiplicative weights update method [2]. For each j\in[m], let z_{j}:=1-y_{j}. We have the following fact.

Fact 1.

For each i\in[n], x_{i}\leq\sum_{S_{j}\ni i}y_{j}\iff x_{i}+\sum_{S_{j}\ni i}z_{j}\leq\sum_{S_{j}\ni i}(y_{j}+z_{j})=f_{i}.

Note that if \textbf{y}\in[0,1]^{m} and \sum_{j}y_{j}=k, then \textbf{z}\in[0,1]^{m} and \sum_{j}z_{j}=m-k. Thus, it is not hard to see that the original LP is equivalent to the following LP, which we will refer to as \Pi_{1}.

maximize \sum_{i\in[n]}x_{i}
subject to \frac{x_{i}}{f_{i}}+\frac{1}{f_{i}}\cdot\sum_{S_{j}\ni i}z_{j}\leq 1 \quad \forall i\in[n]
\sum_{j\in[m]}z_{j}=m-k
x_{i},z_{j}\in[0,1] \quad \forall i\in[n],j\in[m].

In this section, we assume the existence of an MPC algorithm that approximately solves the linear program \Pi_{1} in O(1/\epsilon^{3}\cdot\log n\cdot\log m) rounds. The proof is deferred to Section 2.2.

Theorem 2.

There is an algorithm that finds \textbf{x}\in[0,1]^{n},\textbf{z}\in[0,1]^{m} such that

1. \sum_{i\in[n]}x_{i}\geq(1-\epsilon)\textup{OPT},

2. \sum_{j\in[m]}z_{j}=m-k, and

3. \frac{x_{i}}{f_{i}}+\frac{1}{f_{i}}\cdot\sum_{S_{j}\ni i}z_{j}\leq 1+\epsilon\ \ \forall i\in[n]

in O(\epsilon^{-3}\log n\cdot\log m) rounds.

Let \textbf{x} and \textbf{z} be the output given by Theorem 2. Then, let \textbf{x}^{\prime}=\textbf{x}/(1+\epsilon), \textbf{z}^{\prime}=\textbf{z}/(1+\epsilon), and \textbf{y}^{\prime}=\mathbf{1}-\textbf{z}^{\prime}. We have

\sum_{i=1}^{n}x^{\prime}_{i}=\frac{1}{1+\epsilon}\sum_{i=1}^{n}x_{i}\geq\frac{1-\epsilon}{1+\epsilon}\textup{OPT}>(1-4\epsilon)\textup{OPT}, (1)
\sum_{j=1}^{m}y^{\prime}_{j}=\sum_{j=1}^{m}\left(1-\frac{z_{j}}{1+\epsilon}\right)=m-\frac{m-k}{1+\epsilon}\leq m-(1-2\epsilon)(m-k)<k+2\epsilon m, (2)
x_{i}^{\prime}+\sum_{S_{j}\ni i}z_{j}^{\prime}\leq f_{i}\iff x_{i}^{\prime}\leq\sum_{S_{j}\ni i}y_{j}^{\prime}\ \ \forall i\in[n], by Fact 1. (3)

Thus, by setting \textbf{x}\leftarrow\textbf{x}/(1+\epsilon) and \textbf{y}\leftarrow\textbf{y}^{\prime}, we have an approximate solution \textbf{x}\in[0,1]^{n},\textbf{y}\in[0,1]^{m} to the LP \Pi_{0} such that

\sum_{i=1}^{n}x_{i}\geq(1-4\epsilon)\textup{OPT},\ \ \ \sum_{j=1}^{m}y_{j}\leq k+2\epsilon m, and
x_{i}\leq\sum_{S_{j}\ni i}y_{j}\ \ \forall i\in[n].

We can then apply standard randomized rounding to find a sub-collection of at most k+2\epsilon m sets that covers at least (1-4\epsilon)\textup{OPT} elements. For the sake of completeness, we provide the rounding algorithm in the MPC model in the following lemma. The proof can be found in Appendix A.

Lemma 1.

Suppose \textbf{x}\in[0,1]^{n} and \textbf{y}\in[0,1]^{m} satisfy:

1. \sum_{i\in[n]}x_{i}\geq L,

2. x_{i}\leq\sum_{S_{j}\ni i}y_{j} for all i\in[n],

3. \sum_{j\in[m]}y_{j}=k,

4. x_{i},y_{j}\in[0,1] for all i\in[n] and j\in[m].

Then there exists a rounding algorithm that finds a sub-collection of k sets that in expectation cover at least (1-1/e)L elements in O(1) rounds. To obtain a high probability guarantee, the algorithm requires O(1/\epsilon\cdot\log m) rounds to find k sets that cover at least (1-1/e-O(\epsilon))L elements.

Applying Lemma 1 to \textbf{x} and \textbf{y} with k+2\epsilon m in place of k, we obtain a sub-collection of at most k+2\epsilon m sets that covers at least (1-1/e-O(\epsilon))\textup{OPT} elements. Since we assume that k=\Omega(m), this means we have found k+O(\epsilon)k sets that cover at least (1-1/e-O(\epsilon))\textup{OPT} elements. The next lemma shows that we can find k sets among these that cover at least (1-1/e-O(\epsilon))\textup{OPT} elements. The proof is a combination of a counting argument and the well-known parallel prefix algorithm [20].

Compute |S_{1}|,|S_{2}\setminus S_{1}|,|S_{3}\setminus(S_{1}\cup S_{2})|,\ldots,|S_{k}\setminus(S_{1}\cup\ldots\cup S_{k-1})| in O(\log k) rounds.

Function PrefixCoverage(S_{1},S_{2},\ldots,S_{k}):
      // Compute |S_{1}|,|S_{1}\cup S_{2}|,|S_{1}\cup S_{2}\cup S_{3}|,\ldots,|S_{1}\cup S_{2}\cup\ldots\cup S_{k}|
      if k=1 then
            return |S_{1}|.
      else
            // Assume k is even.
            In one round, machine 2j-1 sends S_{2j-1} to machine 2j; then machine 2j computes Q_{j}=S_{2j-1}\cup S_{2j}.
            Run PrefixCoverage(Q_{1},Q_{2},\ldots,Q_{k/2}) on machines 2,4,6,\ldots,k.
            Machine j now has S_{1}\cup S_{2}\cup\ldots\cup S_{j} for j=2,4,6,\ldots,k.
            In one round, each machine j=3,5,\ldots,k-1 communicates with machine j-1, which has S_{1}\cup S_{2}\cup\ldots\cup S_{j-1}, and computes S_{1}\cup S_{2}\cup\ldots\cup S_{j}.
            // If k is odd, run the above algorithm on S_{1},S_{2},\ldots,S_{k-1} and then compute S_{1}\cup S_{2}\cup\ldots\cup S_{k} in one round.
      Each machine j communicates with machine j-1 to compute |S_{1}\cup S_{2}\cup\ldots\cup S_{j}|-|S_{1}\cup S_{2}\cup\ldots\cup S_{j-1}| in one round.

Algorithm 1: Parallel prefix coverage

We rely on the following result which is a simulation of the parallel prefix in our setting.

Lemma 2.

Suppose there are k sets and machine j holds the set S_{j}. Then Algorithm 1 computes |S_{1}|,|S_{2}\setminus S_{1}|,|S_{3}\setminus(S_{1}\cup S_{2})|,|S_{4}\setminus(S_{1}\cup S_{2}\cup S_{3})|,\ldots,|S_{k}\setminus(S_{1}\cup\ldots\cup S_{k-1})| in O(\log k) rounds.

Proof.

We first show how to compute (S_{1}),(S_{1}\cup S_{2}),(S_{1}\cup S_{2}\cup S_{3}),\ldots,(S_{1}\cup S_{2}\cup\ldots\cup S_{k}) in O(\log k) rounds, where machine j holds S_{1}\cup S_{2}\cup\ldots\cup S_{j} at the end. Once this is done, machine j can send S_{1}\cup S_{2}\cup\ldots\cup S_{j} to machine j+1, and machine j+1 can compute |S_{1}\cup S_{2}\cup\ldots\cup S_{j+1}|-|S_{1}\cup S_{2}\cup\ldots\cup S_{j}|=|S_{j+1}\setminus(S_{1}\cup S_{2}\cup\ldots\cup S_{j})|.

The algorithm operates recursively. In one round, machine 2j-1 sends S_{2j-1} to machine 2j; then machine 2j computes Q_{j}=S_{2j-1}\cup S_{2j}. Assuming k is even, the algorithm recursively computes (Q_{1}),(Q_{1}\cup Q_{2}),(Q_{1}\cup Q_{2}\cup Q_{3}),\ldots,(Q_{1}\cup Q_{2}\cup\ldots\cup Q_{k/2}) on machines 2,4,\ldots,k. After the recursion, each machine with even index 2j has the set S_{1}\cup S_{2}\cup\ldots\cup S_{2j}. Then, in one round, each machine with odd index 2j+1 communicates with machine 2j to learn S_{1}\cup S_{2}\cup\ldots\cup S_{2j+1}. If k is odd, we do the same on S_{1},S_{2},\ldots,S_{k-1} and then compute S_{1}\cup S_{2}\cup\ldots\cup S_{k} in one round.

There are O(\log k) recursion levels and therefore the total number of rounds is O(\log k). ∎
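A sequential sketch of this recursion follows; the round structure is simulated in one process, with each machine's set represented as a Python set (a sketch of Lemma 2, not the distributed implementation itself):

```python
def prefix_unions(sets):
    """Return [S1, S1∪S2, ..., S1∪...∪Sk], mirroring PrefixCoverage."""
    k = len(sets)
    if k == 1:
        return [sets[0]]
    # One round: machine 2j-1 sends its set to machine 2j (1-indexed).
    pairs = [sets[i] | sets[i + 1] for i in range(0, k - (k % 2), 2)]
    sub = prefix_unions(pairs)  # recurse on the even-indexed machines
    out = []
    for j in range(k - (k % 2)):
        if j % 2 == 1:                 # even-indexed machine: done
            out.append(sub[j // 2])
        else:                          # odd-indexed machine: one more round
            out.append(sets[0] if j == 0 else sub[j // 2 - 1] | sets[j])
    if k % 2 == 1:                     # odd k: fold in the last set
        out.append(out[-1] | sets[-1])
    return out

def marginal_coverages(sets):
    """The quantities |S1|, |S2 minus S1|, |S3 minus (S1∪S2)|, ..."""
    pref = prefix_unions(sets)
    return [len(pref[0])] + [len(pref[j]) - len(pref[j - 1])
                             for j in range(1, len(sets))]

print(marginal_coverages([{0, 1}, {1, 2}, {2, 3}, {0, 4}]))  # [2, 1, 1, 1]
```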

Lemma 3.

Let \mathcal{S}=\{S_{1},\dots,S_{r}\} be a collection of r=(1+\gamma)k sets whose union contains L elements, where \gamma\in[0,1). Then there exist k sets in \mathcal{S} whose union contains at least (1-\gamma)L elements. Furthermore, we can find these k sets in O(\log r) rounds.

Proof.

Consider the following quantities:

\phi_{1}=|S_{1}|
\phi_{2}=|S_{1}\cup S_{2}|-|S_{1}|
\phi_{3}=|S_{1}\cup S_{2}\cup S_{3}|-|S_{1}\cup S_{2}|
\ldots

Clearly, \sum_{j=1}^{r}\phi_{j}=L. We say S_{j} is responsible for element i if i\in S_{j}\setminus(\bigcup_{l<j}S_{l}). Every covered element thus has exactly one responsible set, and S_{j} is responsible for exactly \phi_{j} elements. Furthermore, if we remove some sets from \mathcal{S} and an element becomes uncovered, the set responsible for that element must have been removed. Thus, if we remove the \gamma k sets corresponding to the \gamma k smallest \phi_{j}, then at most \gamma L elements lose their responsible set, and hence the number of elements that become uncovered is at most \gamma L.

To find these sets, we apply Lemma 2 with r in place of k to learn \phi_{1},\phi_{2},\ldots,\phi_{r} in O(\log r)=O(\log k) rounds. We then remove the \gamma k sets corresponding to the \gamma k smallest \phi_{j} and output the remaining k sets. ∎
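A small sketch of this pruning step, reusing marginal_coverages from the sketch above (a direct transcription of the counting argument, not the distributed version):

```python
def prune_to_k(sets, k):
    """Given r = (1+gamma)k sets, keep the k sets with the largest marginal
    coverages phi_j; at most gamma*L covered elements are lost."""
    phi = marginal_coverages(sets)
    keep = sorted(range(len(sets)), key=lambda j: phi[j], reverse=True)[:k]
    return sorted(keep)  # indices of the k sets we output

# Example: r = 4 sets, k = 3; one set with the smallest phi_j is dropped.
print(prune_to_k([{0, 1}, {1, 2}, {2, 3}, {0, 4}], 3))  # [0, 1, 2]
```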

Putting it all together.

We spend O(1/\epsilon^{3}\cdot\log n\cdot\log m) rounds to approximately solve the linear program \Pi_{1}. From there, we can round the solution to find a sub-collection of k+O(\epsilon)k sets that cover at least (1-1/e-O(\epsilon))\textup{OPT} elements with high probability in O(1/\epsilon\cdot\log m) rounds. We then apply Lemma 3 to find k sets among these that cover at least (1-1/e-O(\epsilon))\textup{OPT} elements in O(\log k) rounds. The total number of rounds is therefore O(1/\epsilon^{3}\cdot\log n\cdot\log m).

Reducing the number of rounds to O(1/\epsilon^{3}\cdot\log m\cdot(\log m+\log(1/\epsilon))).

The described algorithm runs in O(1/\epsilon^{3}\cdot\log n\cdot\log m) rounds. Our main result in Theorem 1 states a stronger bound of O(1/\epsilon^{3}\cdot\log m\cdot(\log m+\log(1/\epsilon))) rounds. We achieve this by adopting the sub-sampling framework of McGregor and Vu [23].

Without loss of generality, we may assume that each element is covered by some set; if not, we can remove all elements that are not covered by any set using O(\log m) rounds. Specifically, let \textbf{v}_{j} be the characteristic vector of S_{j}. We can compute \textbf{v}=\sum_{j=1}^{m}\textbf{v}_{j} in O(\log m) rounds using the standard converge-cast binary tree algorithm. We can then remove the elements that are not covered by any set (the elements corresponding to 0 entries in \textbf{v}).

We now have m sets covering n elements. Since k=\Omega(m), we must have \textup{OPT}=\Omega(n). McGregor and Vu showed that if one samples each element of the universe [n] independently with probability p=\Theta\left(\frac{\log{m\choose k}}{\epsilon^{2}\textup{OPT}}\right), then with high probability, if we run a \beta approximation algorithm on the subsampled universe, the solution will correspond to a \beta-\epsilon approximation on the original universe. We have just argued that \textup{OPT}=\Omega(n), and therefore with high probability we sample O\left(\frac{\log{m\choose k}}{\epsilon^{2}}\right)=O(1/\epsilon^{2}\cdot m) elements, by appealing to the Chernoff bound and the fact that {m\choose k}\leq 2^{m}.

As a result, we may assume that n=O(1/\epsilon^{2}\cdot m), so \log n=O(\log m+\log(1/\epsilon)). This results in an O(1/\epsilon^{3}\cdot\log m\cdot(\log m+\log(1/\epsilon)))-round algorithm.
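A sketch of the subsampling step under these assumptions (the constant c is illustrative; in the analysis p=\Theta(\log{m\choose k}/(\epsilon^{2}\textup{OPT})) with \textup{OPT}=\Omega(n), which is \Theta(m/(\epsilon^{2}n)) up to constants):

```python
import random

def subsample_universe(sets, n, m, eps, c=4):
    """Sample each element independently with probability ~ c*m/(eps^2 * n),
    then restrict every set to the sampled universe."""
    p = min(1.0, c * m / (eps ** 2 * n))
    sample = {i for i in range(n) if random.random() < p}
    # Running a beta-approximation on the restricted instance yields,
    # w.h.p., a (beta - eps)-approximation on the original universe.
    return [s & sample for s in sets], sample
```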

Bounded frequency.

Assuming f=\max_{i}f_{i} is known, we can lift the assumption that k=\Omega(m) and parameterize our algorithm based on f instead. McGregor et al. [22] showed that the largest \lceil kf/\eta\rceil sets contain a solution that covers at least (1-\eta)\textup{OPT} elements. We can therefore assume that m=O(kf/\eta) by keeping only the largest \lceil kf/\eta\rceil sets, which can be identified in O(1) rounds.

We set \epsilon=\eta^{2}/f and proceed to obtain a solution that covers at least (1-\eta^{2}/f)(1-\eta)\textup{OPT}=(1-O(\eta))\textup{OPT} elements using at most k+O(\epsilon m)=k+O(\eta^{2}/f\cdot kf/\eta)=k+O(\eta k) sets, as in the discussion above. Appealing to Lemma 3, we can find k sets that cover at least (1-O(\eta))\textup{OPT} elements. The total number of rounds is O(f^{3}/\eta^{6}\cdot\log\frac{kf}{\eta}\cdot(\log\frac{1}{\eta}+\log\frac{kf}{\eta}))=O(f^{3}/\eta^{6}\cdot\log^{2}\frac{kf}{\eta}).

2.2 Approximate the LP’s solution via multiplicative weights

Fix an objective value L. Let P be the convex region defined by

P=\left\{(\textbf{x},\textbf{z})\in[0,1]^{n}\times[0,1]^{m}:\sum_{i\in[n]}x_{i}=L\text{ and }\sum_{j\in[m]}z_{j}=m-k\right\}.

Note that if (\textbf{x}_{1},\textbf{z}_{1}),(\textbf{x}_{2},\textbf{z}_{2}),\ldots,(\textbf{x}_{T},\textbf{z}_{T})\in P, then \left(\frac{1}{T}\sum_{t=1}^{T}\textbf{x}_{t},\frac{1}{T}\sum_{t=1}^{T}\textbf{z}_{t}\right)\in P. Consider the following problem \Psi_{1} that asks to either correctly declare that

\nexists(\textbf{x},\textbf{z})\in P:\frac{x_{i}}{f_{i}}+\frac{1}{f_{i}}\cdot\sum_{S_{j}\ni i}z_{j}\leq 1\ \ \ \forall i\in[n]

or to output a solution (\textbf{x},\textbf{z})\in P such that

\frac{x_{i}}{f_{i}}+\frac{1}{f_{i}}\cdot\sum_{S_{j}\ni i}z_{j}\leq 1+\epsilon\ \ \ \forall i\in[n].

Once we have such an algorithm, we can try the values L=\lfloor(1+\epsilon)^{0}\rfloor,\lfloor(1+\epsilon)^{1}\rfloor,\lfloor(1+\epsilon)^{2}\rfloor,\ldots,n and return the solution corresponding to the largest L that has a feasible solution. There are O(1/\epsilon\cdot\log n) such guesses. We know that the guess L with \textup{OPT}/(1+\epsilon)\leq L\leq\textup{OPT} must result in a feasible solution.

To avoid introducing a \log n factor in the number of rounds, we partition these O(1/\epsilon\cdot\log n) guesses into O(1/\epsilon) batches, where each batch consists of O(\log n) guesses. Algorithm copies that correspond to guesses in the same batch run in parallel; this only introduces a \log n factor in the memory used by each machine. By returning the solution corresponding to the largest feasible guess L, one obtains Theorem 2.
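A small sketch of this guessing schedule (the batching constant is illustrative):

```python
import math

def guess_batches(n, eps):
    """Geometric guesses L = floor((1+eps)^t) up to n, grouped into O(1/eps)
    batches of O(log n) guesses each; copies in a batch run in parallel."""
    guesses, t = [1], 1
    while True:
        L = math.floor((1 + eps) ** t)
        if L > n:
            break
        if L > guesses[-1]:      # skip duplicate floors
            guesses.append(L)
        t += 1
    b = max(1, math.ceil(math.log2(n)))
    return [guesses[i:i + b] for i in range(0, len(guesses), b)]

print(guess_batches(1000, 0.5))  # two batches of at most 10 guesses each
```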

Oracle implementation.

Let \textbf{w}\in\mathbb{R}^{n} be a weight vector with w_{i}\geq 0 for all i\in[n]. We first consider an easier feasibility problem \Psi_{2}. It asks to either correctly declare that

\nexists(\textbf{x},\textbf{z})\in P:\ \ \sum_{i=1}^{n}w_{i}\cdot\left(\frac{x_{i}}{f_{i}}+\sum_{S_{j}\ni i}\frac{z_{j}}{f_{i}}\right)\leq\sum_{i=1}^{n}w_{i}

or to output a solution (\textbf{x},\textbf{z})\in P such that

\sum_{i=1}^{n}w_{i}\cdot\left(\frac{x_{i}}{f_{i}}+\sum_{S_{j}\ni i}\frac{z_{j}}{f_{i}}\right)\leq\sum_{i=1}^{n}w_{i}+\frac{1}{n^{5}}. (4)

That is, if the input is feasible, then output a pair (\textbf{x},\textbf{z})\in P that approximately satisfies the constraint; otherwise, correctly conclude that the input is infeasible. In the multiplicative weights update framework, this is known as an approximate oracle.

Note that if there is a feasible solution to \Psi_{1}, then there is a feasible solution to \Psi_{2} since

\frac{x_{i}}{f_{i}}+\frac{1}{f_{i}}\cdot\sum_{S_{j}\ni i}z_{j}\leq 1\ \ \ \forall i\in[n]\implies\sum_{i=1}^{n}w_{i}\cdot\left(\frac{x_{i}}{f_{i}}+\sum_{S_{j}\ni i}\frac{z_{j}}{f_{i}}\right)\leq\sum_{i=1}^{n}w_{i}.

We can implement an oracle that solves the feasibility problem \Psi_{2} as follows. First, observe that

\sum_{i=1}^{n}\frac{w_{i}}{f_{i}}\cdot\left(x_{i}+\sum_{S_{j}\ni i}z_{j}\right)\leq\sum_{i=1}^{n}w_{i}
\iff\sum_{i=1}^{n}\frac{w_{i}}{f_{i}}\cdot x_{i}+\sum_{i=1}^{n}\frac{w_{i}}{f_{i}}\cdot\sum_{S_{j}\ni i}z_{j}\leq\sum_{i=1}^{n}w_{i}
\iff\sum_{i=1}^{n}\frac{w_{i}}{f_{i}}\cdot x_{i}+\sum_{j=1}^{m}z_{j}\cdot\sum_{i\in S_{j}}\frac{w_{i}}{f_{i}}\leq\sum_{i=1}^{n}w_{i}.

To ease the notation, define

p_{i}:=\frac{w_{i}}{f_{i}}\ \ \ \forall i\in[n],\ \text{ and }\ q_{j}:=\sum_{i\in S_{j}}\frac{w_{i}}{f_{i}}=\sum_{i\in S_{j}}p_{i}\ \ \ \forall j\in[m].

We therefore want to check whether there exists (\textbf{x},\textbf{z})\in P such that

LHS(\textbf{x},\textbf{z}):=\sum_{i=1}^{n}x_{i}p_{i}+\sum_{j=1}^{m}z_{j}q_{j}\leq\sum_{i=1}^{n}w_{i}.

We will minimize the left-hand side by minimizing each sum separately, which we can do exactly. However, there is a subtle issue: we need to bound the number of bits required to represent p_{i}=\frac{w_{i}}{f_{i}} and q_{j}=\sum_{i\in S_{j}}\frac{w_{i}}{f_{i}}=\sum_{i\in S_{j}}p_{i}, given the memory constraint. To do this, we truncate the value of each p_{i} after the (10\log_{2}n)-th bit following the decimal point. Note that this results in an underestimate of p_{i} by at most 1/n^{10}. In particular, let \hat{p}_{i} be the value of p_{i} truncated at the (10\log_{2}n)-th bit after the decimal point, and let \hat{q}_{j}=\sum_{i\in S_{j}}\hat{p}_{i}. For any (\textbf{x},\textbf{z})\in[0,1]^{n}\times[0,1]^{m}, we have

\widehat{LHS}(\textbf{x},\textbf{z}):=\sum_{i=1}^{n}\hat{p}_{i}x_{i}+\sum_{j=1}^{m}z_{j}\sum_{i\in S_{j}}\hat{p}_{i}\geq\left(\sum_{i=1}^{n}p_{i}x_{i}\right)-\frac{n}{n^{10}}+\left(\sum_{j=1}^{m}z_{j}\sum_{i\in S_{j}}p_{i}\right)-\frac{mn}{n^{10}}>LHS(\textbf{x},\textbf{z})-\frac{1}{n^{5}}.

The last inequality is based on the assumption that m\leq n. Therefore, LHS(\textbf{x},\textbf{z})-1/n^{5}\leq\widehat{LHS}(\textbf{x},\textbf{z})\leq LHS(\textbf{x},\textbf{z}). Since \sum_{i=1}^{n}x_{i}=L and \sum_{j=1}^{m}z_{j}=m-k, to minimize \widehat{LHS}(\textbf{x},\textbf{z}) over (\textbf{x},\textbf{z})\in P, we simply set

x_{i}=[\hat{p}_{i}\text{ is among the }L\text{ smallest values of }\{\hat{p}_{t}\}_{t=1}^{n}],
z_{j}=[\hat{q}_{j}\text{ is among the }m-k\text{ smallest values of }\{\hat{q}_{t}\}_{t=1}^{m}].

After setting \textbf{x},\textbf{z} as above, if \widehat{LHS}(\textbf{x},\textbf{z})>\sum_{i=1}^{n}w_{i}, which implies LHS(\textbf{x},\textbf{z})>\sum_{i=1}^{n}w_{i}, then it is safe to declare that the system is infeasible. Otherwise, we have found (\textbf{x},\textbf{z})\in P such that \widehat{LHS}(\textbf{x},\textbf{z})\leq\sum_{i=1}^{n}w_{i}, which implies LHS(\textbf{x},\textbf{z})\leq\sum_{i=1}^{n}w_{i}+1/n^{5}, as required by Equation (4).
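A sequential sketch of this oracle (floats stand in for the fixed-point encoding; sets[j] holds S_{j} as a set of element indices):

```python
import math

def oracle(sets, w, f, L, k):
    """Feasibility oracle for Psi_2: minimize sum_i x_i*p_i + sum_j z_j*q_j
    over P by taking the L smallest p-hat_i and the m-k smallest q-hat_j."""
    n, m = len(w), len(sets)
    scale = 1 << (10 * math.ceil(math.log2(n)))
    # p-hat_i: w_i / f_i truncated after 10*log2(n) fractional bits.
    p_hat = [math.floor(w[i] / f[i] * scale) / scale for i in range(n)]
    q_hat = [sum(p_hat[i] for i in sets[j]) for j in range(m)]
    x, z = [0.0] * n, [0.0] * m
    for i in sorted(range(n), key=lambda i: p_hat[i])[:L]:
        x[i] = 1.0
    for j in sorted(range(m), key=lambda j: q_hat[j])[:m - k]:
        z[j] = 1.0
    lhs = sum(x[i] * p_hat[i] for i in range(n)) \
        + sum(z[j] * q_hat[j] for j in range(m))
    if lhs > sum(w):
        return None        # correctly declare Infeasible
    return x, z            # LHS(x, z) <= sum(w) + 1/n^5
```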

Lemma 4.

Assume that all machines have the vector \textbf{w}. We can solve the feasibility problem \Psi_{2} in O(1) rounds, after which all machines either learn that the system is infeasible or obtain an approximate solution (\textbf{x},\textbf{z})\in P that satisfies Equation (4).

Proof.

We have argued above that the oracle either correctly declares that the system is infeasible or outputs (\textbf{x},\textbf{z})\in P such that LHS(\textbf{x},\textbf{z})\leq\sum_{i=1}^{n}w_{i}+1/n^{5}. Recall that each machine has the frequency vector \textbf{f}, and suppose for the time being that each machine also has the vector \textbf{w}. We can implement this oracle in O(1) rounds as follows. Each machine j computes \hat{q}_{j}=\sum_{i\in S_{j}}\hat{p}_{i}, since it has w_{i} and f_{i} to compute p_{i} for all i\in[n]. Then it sends \hat{q}_{j} to the central machine. Note that each \hat{p}_{i} can be represented using O(\log n) bits. Observe that \hat{q}_{j} is the sum of at most n different \hat{p}_{i}; the number of bits in the fractional part does not increase, while the number of bits in the integer part increases by at most O(1), as the sum is upper bounded by the crude bound 2n\times 2^{O(\log n)}=2^{O(\log n)}. Thus, the central machine receives at most \tilde{O}(m)=\tilde{O}(n) bits from all other machines.

The central machine then sets \textbf{x} and \textbf{z} as described above. This allows the central machine to correctly determine whether the system is infeasible or to find (\textbf{x},\textbf{z})\in P that satisfies Equation (4). Since the entries of \textbf{x} and \textbf{z} are binary, they can be sent to the non-central machines, with each machine receiving a message of n+m=O(n) bits. ∎

Solving the LP via multiplicative weights.

Once the existence of such an oracle is guaranteed, we can follow the multiplicative weights framework to approximately solve the LP. We will first explain how to implement the MWU algorithm in the MPC model. See Algorithm 2.

Input: Objective value L, \epsilon\leq 1/4

1. Initialize w_{i}^{(0)}=1 for all i\in[n].
2. for iteration t=1,2,\ldots,T=O(1/\epsilon^{2}\cdot\log n) do
3.       Run the oracle in Lemma 4 with \textbf{w}^{(t-1)} to check if there exists a feasible solution. If the answer is Infeasible, stop the algorithm. If the answer is Feasible, let \textbf{x}^{(t)} and \textbf{z}^{(t)} be the output of the oracle, which is now stored on all machines.
4.       Each machine j constructs Y_{j}=\{Y_{j1},Y_{j2},\ldots,Y_{jn}\} where Y_{ji}=z_{j}^{(t)}\cdot[i\in S_{j}].
5.       Compute W=\sum_{j\in[m]}Y_{j} in O(\log m) rounds using a converge-cast binary tree and send W to the central machine. Note that W_{i}=\sum_{S_{j}\ni i}z_{j}^{(t)}.
6.       For each i\in[n], the central machine computes E_{i}^{(t)}=f_{i}\cdot\textup{error}_{i}^{(t)}, where
         \textup{error}_{i}^{(t)}:=1-\frac{x_{i}^{(t)}}{f_{i}}-\frac{W_{i}}{f_{i}}=1-\frac{x_{i}^{(t)}}{f_{i}}-\sum_{S_{j}\ni i}\frac{z_{j}^{(t)}}{f_{i}}\ \ \ \forall i\in[n],
         and sends \sum_{d=1}^{t}E_{i}^{(d)} to all other machines.
7.       For each i\in[n], each machine computes
         w_{i}^{(t)}=2^{-\epsilon\cdot\sum_{d=1}^{t}E_{i}^{(d)}/f_{i}}=2^{-\epsilon\cdot\sum_{d=1}^{t}\textup{error}_{i}^{(d)}}=2^{-\epsilon\cdot\textup{error}_{i}^{(t)}}w_{i}^{(t-1)}.
8. After T iterations, output
   \textbf{x}=\frac{1}{T}\sum_{t=1}^{T}\textbf{x}^{(t)}\ \text{ and }\ \textbf{z}=\frac{1}{T}\sum_{t=1}^{T}\textbf{z}^{(t)}.

Algorithm 2: Multiplicative weights for solving the LP
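A sequential sketch of Algorithm 2, reusing the oracle sketch above (weights are kept as floats here; the MPC implementation instead communicates the O(\log n)-bit sums \sum_{d}E_{i}^{(d)}):

```python
import math

def solve_lp(sets, f, L, k, eps):
    """Multiplicative weights: T = O(log(n)/eps^2) oracle calls, re-weighting
    each element by 2^(-eps * error_i) and averaging the oracle's answers."""
    n, m = len(f), len(sets)
    T = max(1, round(math.log(n + 1) / eps ** 2))
    w = [1.0] * n
    xs, zs = [], []
    for _ in range(T):
        ans = oracle(sets, w, f, L, k)
        if ans is None:
            return None                       # the guess L is infeasible
        x, z = ans
        xs.append(x); zs.append(z)
        for i in range(n):
            cover = sum(z[j] for j in range(m) if i in sets[j])
            err = 1 - x[i] / f[i] - cover / f[i]
            w[i] *= 2.0 ** (-eps * err)       # multiplicative update
    avg = lambda vs: [sum(v[i] for v in vs) / T for i in range(len(vs[0]))]
    return avg(xs), avg(zs)
```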
Lemma 5.

Algorithm 2 can be implemented in O(1/\epsilon^{2}\cdot\log n\cdot\log m) rounds.

Proof.

Without loss of generality, we can round \epsilon down to the nearest power of 1/2. Then \epsilon can be represented using O(\log(1/\epsilon))=O(\log n) bits, since we assume 1/\epsilon<n.

Each machine maintains the current vectors \textbf{x}^{(t)}, \textbf{z}^{(t)}, and \textbf{w}^{(t)}, each of which has at most O(n) entries. Since the entries in \textbf{x}^{(t)} and \textbf{z}^{(t)} are binary, we can represent them using O(n) bits. Recall that, in the oracle described in Lemma 4, the central machine can broadcast \textbf{x}^{(t)} and \textbf{z}^{(t)} to all other machines in one round without violating the memory constraint.

It remains to show that the number of bits required to represent w_{i}^{(t)} is O(\log n) in all iterations. The central machine implicitly broadcasts w_{i}^{(t)} to all other machines in iteration t by sending \sum_{d=1}^{t}E_{i}^{(d)} to all other machines for each i\in[n].

Note that each w_{i}^{(t)} has the form

w_{i}^{(t)}=2^{-\epsilon\cdot\sum_{d=1}^{t}E_{i}^{(d)}/f_{i}}=2^{-\epsilon\cdot\sum_{d=1}^{t}\textup{error}_{i}^{(d)}}.

For any d, since m,f_{i}\leq n, we have

f_{i}-m-1\leq f_{i}\cdot\textup{error}_{i}^{(d)}\leq f_{i}\implies f_{i}\cdot\textup{error}_{i}^{(d)}\in[-2n,2n].

Since T=\Theta(1/\epsilon^{2}\cdot\log n)=o(n^{3}), we have

\sum_{d=1}^{t}E_{i}^{(d)}=\sum_{d=1}^{t}f_{i}\cdot\textup{error}_{i}^{(d)}\in[-2nt,2nt]\subset[-2n^{4},2n^{4}].

Thus, we can represent \sum_{d=1}^{t}E_{i}^{(d)} using O(\log n) bits. This implies that the central machine can implicitly broadcast w_{i}^{(t)} to all other machines in O(1) rounds. Putting it all together, each of the T=\Theta(1/\epsilon^{2}\cdot\log n) iterations consists of: 1) a call to the oracle, which takes O(1) rounds, 2) computing \textup{error}_{i}^{(t)} for each i\in[n] in steps 5 and 6, which takes O(\log m) rounds, and 3) the central machine broadcasting \textbf{w}^{(t)} in one round to all other machines before proceeding to the next iteration. ∎

The next lemma is an adaptation of the standard multiplicative weights algorithm.

Lemma 6.

The output of Algorithm 2 satisfies the following property. If there exists a feasible solution, then the output satisfies:

\frac{x_{i}}{f_{i}}+\sum_{S_{j}\ni i}\frac{z_{j}}{f_{i}}\leq 1+\epsilon\ \ \ \forall i\in[n],\ \text{ and }\ \sum_{i=1}^{n}x_{i}=L.

Otherwise, the algorithm correctly concludes that the system is infeasible.

Proof.

If the algorithm does not output Infeasible, then in each iteration t, \sum_{i=1}^{n}x_{i}^{(t)}=L, and therefore \sum_{i=1}^{n}x_{i}=L since \textbf{x}=\frac{1}{T}\sum_{t=1}^{T}\textbf{x}^{(t)}. Similarly, in each iteration t, \sum_{j=1}^{m}z_{j}^{(t)}=m-k, and thus \sum_{j=1}^{m}z_{j}=m-k since \textbf{z}=\frac{1}{T}\sum_{t=1}^{T}\textbf{z}^{(t)}. Hence, the output (\textbf{x},\textbf{z})\in P. Define the potential function

\Phi^{(t)}:=\sum_{i=1}^{n}w_{i}^{(t)}.

We will make use of the fact that \exp\left(-\eta x\right)\leq 1-\eta x+\eta^{2}x^{2} for |\eta x|\leq 1. Note that for all i and t, we always have

\textup{error}_{i}^{(t)}=1-\frac{x_{i}^{(t)}}{f_{i}}-\sum_{S_{j}\ni i}\frac{z_{j}^{(t)}}{f_{i}}\in[-1,1].

Indeed, \textup{error}_{i}^{(t)}\in[-1,1] because 0\leq x_{i}^{(t)}\leq 1\leq f_{i} and 0\leq\sum_{S_{j}\ni i}z_{j}^{(t)}\leq\sum_{S_{j}\ni i}1=f_{i}. Set \alpha=\epsilon\cdot\ln(2). As long as \epsilon\leq 1/4, we have |\alpha\cdot\textup{error}_{i}^{(t)}|<1 and therefore

w_{i}^{(t)}=\exp\left(-\alpha\cdot\textup{error}_{i}^{(t)}\right)\cdot w_{i}^{(t-1)}\leq\left(1-\alpha\cdot\textup{error}_{i}^{(t)}+\alpha^{2}\cdot\left(\textup{error}_{i}^{(t)}\right)^{2}\right)\cdot w_{i}^{(t-1)}.

Summing over i gives

\Phi^{(t)}\leq\sum_{i=1}^{n}\left(1-\alpha\cdot\textup{error}_{i}^{(t)}+\alpha^{2}\right)\cdot w_{i}^{(t-1)}=(1+\alpha^{2})\sum_{i=1}^{n}w_{i}^{(t-1)}-\alpha\sum_{i=1}^{n}\textup{error}_{i}^{(t)}\cdot w_{i}^{(t-1)}.

The first inequality follows from the fact that \left(\textup{error}_{i}^{(t)}\right)^{2}\in[0,1]. Note that

\sum_{i=1}^{n}\textup{error}_{i}^{(t)}w_{i}^{(t-1)}=\sum_{i=1}^{n}w_{i}^{(t-1)}\left(1-\frac{x_{i}^{(t)}}{f_{i}}-\sum_{S_{j}\ni i}\frac{z_{j}^{(t)}}{f_{i}}\right)=\sum_{i=1}^{n}w_{i}^{(t-1)}-\sum_{i=1}^{n}w_{i}^{(t-1)}\left(\frac{x_{i}^{(t)}}{f_{i}}+\sum_{S_{j}\ni i}\frac{z_{j}^{(t)}}{f_{i}}\right)\geq-\frac{1}{n^{5}}.

The last inequality follows from the fact that the oracle, called with weights \textbf{w}^{(t-1)}, is guaranteed to find \textbf{x}^{(t)} and \textbf{z}^{(t)} such that

\sum_{i=1}^{n}\left(\frac{x_{i}^{(t)}}{f_{i}}+\sum_{S_{j}\ni i}\frac{z_{j}^{(t)}}{f_{i}}\right)w_{i}^{(t-1)}\leq\sum_{i=1}^{n}w_{i}^{(t-1)}+\frac{1}{n^{5}}.

Thus, \Phi^{(t)}\leq(1+\alpha^{2})\sum_{i=1}^{n}w_{i}^{(t-1)}+\frac{\alpha}{n^{5}}=(1+\alpha^{2})\Phi^{(t-1)}+\frac{\alpha}{n^{5}}. By a simple induction,

\Phi^{(T)}\leq(1+\alpha^{2})\Phi^{(T-1)}+\frac{\alpha}{n^{5}}
\leq(1+\alpha^{2})^{2}\Phi^{(T-2)}+\frac{\alpha}{n^{5}}\left(1+\alpha^{2}\right)+\frac{\alpha}{n^{5}}
\leq\ldots
\leq(1+\alpha^{2})^{T}\Phi^{(0)}+\frac{\alpha}{n^{5}}\sum_{t=0}^{T-1}(1+\alpha^{2})^{t}
=(1+\alpha^{2})^{T}\Phi^{(0)}+\frac{\alpha}{n^{5}}\cdot\frac{(1+\alpha^{2})^{T}-1}{\alpha^{2}}
\leq(1+\alpha^{2})^{T}\Phi^{(0)}+\frac{1}{\alpha n^{5}}(1+\alpha^{2})^{T}.

Recall that \alpha=\epsilon\ln 2 and we assume 1/\epsilon<n/10. Thus, 1/(\alpha n^{5})<\epsilon^{4}/\ln 2<1. Furthermore, recall that \Phi^{(0)}=n. We have

w_{i}^{(T)}\leq(1+\alpha^{2})^{T}(n+1)<(1+\alpha^{2})^{T}\cdot 2n
\exp\left(-\alpha\sum_{t=1}^{T}\textup{error}_{i}^{(t)}\right)\leq(1+\alpha^{2})^{T}\cdot 2n
-\alpha\sum_{t=1}^{T}\textup{error}_{i}^{(t)}\leq\ln(2n)+T\ln(1+\alpha^{2}).

We use the fact that \ln(1+x)\leq x for x>-1 to get

\sum_{t=1}^{T}\textup{error}_{i}^{(t)}\geq-\frac{\ln(2n)}{\alpha}-T\frac{\ln(1+\alpha^{2})}{\alpha}
\sum_{t=1}^{T}\left(1-\frac{x_{i}^{(t)}}{f_{i}}-\sum_{S_{j}\ni i}\frac{z_{j}^{(t)}}{f_{i}}\right)\geq-\frac{\ln(2n)}{\alpha}-T\frac{\alpha^{2}}{\alpha}
\frac{1}{T}\sum_{t=1}^{T}\left(1-\frac{x_{i}^{(t)}}{f_{i}}-\sum_{S_{j}\ni i}\frac{z_{j}^{(t)}}{f_{i}}\right)\geq-\frac{\ln(2n)}{T\alpha}-\alpha
1-\frac{x_{i}}{f_{i}}-\sum_{S_{j}\ni i}\frac{z_{j}}{f_{i}}\geq-\frac{\ln(2n)}{T\alpha}-\alpha
\frac{x_{i}}{f_{i}}+\sum_{S_{j}\ni i}\frac{z_{j}}{f_{i}}\leq 1+\frac{\ln(2n)}{T\alpha}+\alpha
\frac{x_{i}}{f_{i}}+\sum_{S_{j}\ni i}\frac{z_{j}}{f_{i}}\leq 1+O(\epsilon).

The last inequality follows from choosing T=\Theta(1/\epsilon^{2}\cdot\log n) and the fact that \alpha=\epsilon\ln(2); furthermore, recall that the final solution is x_{i}=\frac{1}{T}\sum_{t=1}^{T}x_{i}^{(t)} and z_{j}=\frac{1}{T}\sum_{t=1}^{T}z_{j}^{(t)}. Thus, the output of the algorithm satisfies the desired properties. ∎

References

  • [1] Aris Anagnostopoulos, Luca Becchetti, Ilaria Bordino, Stefano Leonardi, Ida Mele, and Piotr Sankowski. Stochastic query covering for fast approximate document retrieval. ACM Trans. Inf. Syst., 33(3):11:1–11:35, 2015.
  • [2] Sanjeev Arora, Elad Hazan, and Satyen Kale. The multiplicative weights update method: a meta-algorithm and applications. Theory Comput., 8(1):121–164, 2012.
  • [3] Sepehr Assadi. Tight space-approximation tradeoff for the multi-pass streaming set cover problem. In PODS, pages 321–335. ACM, 2017.
  • [4] Sepehr Assadi and Sanjeev Khanna. Tight bounds on the round complexity of the distributed maximum coverage problem. In SODA, pages 2412–2431. SIAM, 2018.
  • [5] Sepehr Assadi, Sanjeev Khanna, and Yang Li. Tight bounds for single-pass streaming complexity of the set cover problem. SIAM J. Comput., 50(3), 2021.
  • [6] Philip Cervenjak, Junhao Gan, Seeun William Umboh, and Anthony Wirth. Maximum unique coverage on streams: Improved FPT approximation scheme and tighter space lower bound. In APPROX/RANDOM, volume 317 of LIPIcs, pages 25:1–25:23. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2024.
  • [7] Amit Chakrabarti, Andrew McGregor, and Anthony Wirth. Improved algorithms for maximum coverage in dynamic and random order streams. CoRR, abs/2403.14087, 2024.
  • [8] Rafael da Ponte Barbosa, Alina Ene, Huy L. Nguyen, and Justin Ward. A new framework for distributed submodular maximization. In FOCS, pages 645–654. IEEE Computer Society, 2016.
  • [9] Uriel Feige. A threshold of ln n for approximating set cover. J. ACM, 45(4):634–652, 1998.
  • [10] Sariel Har-Peled, Piotr Indyk, Sepideh Mahabadi, and Ali Vakilian. Towards tight bounds for the streaming set cover problem. In PODS, pages 371–383. ACM, 2016.
  • [11] Dorit S Hochbaum and Anu Pathria. Analysis of the greedy approach in problems of maximum k-coverage. Naval Research Logistics (NRL), 45(6):615–627, 1998.
  • [12] Piotr Indyk, Sepideh Mahabadi, Ronitt Rubinfeld, Jonathan R. Ullman, Ali Vakilian, and Anak Yodpinyanee. Fractional set cover in the streaming model. In APPROX-RANDOM, volume 81 of LIPIcs, pages 12:1–12:20. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2017.
  • [13] Piotr Indyk and Ali Vakilian. Tight trade-offs for the maximum k-coverage problem in the general streaming model. In PODS, pages 200–217. ACM, 2019.
  • [14] Stephen Jaud, Anthony Wirth, and Farhana Murtaza Choudhury. Maximum coverage in sublinear space, faster. In SEA, volume 265 of LIPIcs, pages 21:1–21:20. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2023.
  • [15] Howard J. Karloff, Siddharth Suri, and Sergei Vassilvitskii. A model of computation for mapreduce. In SODA, pages 938–948. SIAM, 2010.
  • [16] David Kempe, Jon M. Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. Theory Comput., 11:105–147, 2015.
  • [17] Sanjeev Khanna, Christian Konrad, and Cezar-Mihail Alexandru. Set cover in the one-pass edge-arrival streaming model. In PODS, pages 127–139. ACM, 2023.
  • [18] Andreas Krause and Carlos Guestrin. Near-optimal observation selection using submodular functions. In AAAI, pages 1650–1654. AAAI Press, 2007.
  • [19] Ravi Kumar, Benjamin Moseley, Sergei Vassilvitskii, and Andrea Vattani. Fast greedy algorithms in mapreduce and streaming. ACM Trans. Parallel Comput., 2(3):14:1–14:22, 2015.
  • [20] Richard E. Ladner and Michael J. Fischer. Parallel prefix computation. J. ACM, 27(4):831–838, 1980.
  • [21] Paul Liu and Jan Vondrák. Submodular optimization in the mapreduce model. In SOSA, volume 69 of OASIcs, pages 18:1–18:10. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.
  • [22] Andrew McGregor, David Tench, and Hoa T. Vu. Maximum coverage in the data stream model: Parameterized and generalized. In ICDT, volume 186 of LIPIcs, pages 12:1–12:20. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021.
  • [23] Andrew McGregor and Hoa T. Vu. Better streaming algorithms for the maximum coverage problem. Theory Comput. Syst., 63(7):1595–1619, 2019.
  • [24] Barna Saha and Lise Getoor. On maximum coverage in the streaming model & application to multi-topic blog-watch. In SDM, pages 697–708. SIAM, 2009.
  • [25] David Steurer. Max coverage problem, lecture notes. https://www.cs.cornell.edu/courses/cs4820/2014sp/notes/maxcoverage.pdf, 2014. Accessed: 2024-09-14.
  • [26] Rowan Warneke, Farhana Murtaza Choudhury, and Anthony Wirth. Maximum coverage in random-arrival streams. In ESA, volume 274 of LIPIcs, pages 102:1–102:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2023.

Appendix A Omitted Proofs

Proof of Lemma 1.

Interpret \{y_{1}/k,y_{2}/k,\ldots,y_{m}/k\} as probabilities for the sets S_{1},S_{2},\ldots,S_{m}. The central machine samples k sets independently according to this distribution. In expectation, we cover at least (1-1/e)L elements (see [25]). Hence, in expectation, the number of uncovered elements is at most L/e. By Markov's inequality, the probability that the number of uncovered elements is more than (1+\epsilon)L/e is at most 1/(1+\epsilon)\leq e^{-\epsilon/2}. So, if we repeat this \Theta(1/\epsilon\cdot\log m) times, the probability that the best solution covers fewer than (1-(1+\epsilon)/e)L=(1-1/e-O(\epsilon))L elements is at most

\exp\left(-\epsilon/2\cdot\Theta(1/\epsilon\cdot\log m)\right)\leq 1/\operatorname{poly}(m).

After the central machine forms \Theta(1/\epsilon\cdot\log m) solutions (each solution consisting of at most k sets) as described above, it broadcasts these solutions in 1/\epsilon batches to all other machines. Note that each batch contains O(\log m) solutions.

For each batch, each machine receives a message of size \tilde{O}(k). For each of the O(\log m) solutions in the batch, the machines can compute its coverage as follows. Let v_{j} be the characteristic vector of a chosen set S_{j}. We can aggregate the vectors v_{j} of the sets in the solution in O(\log m) rounds and count the number of non-zero entries using a binary broadcast tree. We do this in parallel for all solutions in the batch. Finally, we repeat this process for all 1/\epsilon batches. ∎
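A minimal sketch of this rounding step in one process (trials should be \Theta(\log(m)/\epsilon); the distributed coverage evaluation above is replaced here by a direct union):

```python
import random

def round_solution(sets, y, k, trials):
    """Sample k sets i.i.d. with probabilities y_j / k; keep the best of
    several independent trials."""
    m = len(sets)
    weights = [y[j] / k for j in range(m)]
    best, best_cov = None, -1
    for _ in range(trials):
        chosen = random.choices(range(m), weights=weights, k=k)
        cov = len(set().union(*(sets[j] for j in chosen)))
        if cov > best_cov:
            best, best_cov = chosen, cov
    return best, best_cov
```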