Massively Parallel Maximum Coverage Revisited ¹¹1The conference version of this manuscript is to appear in the 50th International Conference on Current Trends in Theory and Practice of Computer Science.

Thai Bui , Hoa T. Vu [email protected], San Diego State University. Supported by NSF Grant No. [email protected], San Diego State University. Supported by NSF Grant No. 2342527.

Abstract

We study the maximum set coverage problem in the massively parallel model. In this setting, $m$ sets that are subsets of a universe of $n$ elements are distributed among $m$ machines. In each round, these machines can communicate with each other, subject to the memory constraint that no machine may use more than $\tilde{O}\left(n\right)$ memory. The objective is to find the $k$ sets whose coverage is maximized. We consider the regime where $k=\Omega(m)$ (i.e., $k=m/100$ ), $m=O(n)$ , and each machine has $\tilde{O}\left(n\right)$ memory ²²2The input size is $O(mn)$ and each machine has the memory enough to store a constant number of sets..

Maximum coverage is a special case of the submodular maximization problem subject to a cardinality constraint. This problem can be approximated to within a $1-1/e$ factor using the greedy algorithm, but this approach is not directly applicable to parallel and distributed models. When $k=\Omega(m)$ , to obtain a $1-1/e-\epsilon$ approximation, previous work either requires $\tilde{O}\left(mn\right)$ memory per machine which is not interesting compared to the trivial algorithm that sends the entire input to a single machine, or requires $2^{O(1/\epsilon)}n$ memory per machine which is prohibitively expensive even for a moderately small value $\epsilon$ .

Our result is a randomized $(1-1/e-\epsilon)$ -approximation algorithm that uses

O(1/\epsilon^{3}\cdot\log m\cdot(\log(1/\epsilon)+\log m))

rounds. Our algorithm involves solving a slightly transformed linear program of the maximum coverage problem using the multiplicative weights update method, classic techniques in parallel computing such as parallel prefix, and various combinatorial arguments.

1 Introduction

Maximum coverage is a classic NP-Hard problem. In this problem, we have $m$ sets $S_{1},S_{2},\ldots,S_{m}$ that are subsets of a universe of $n$ elements $[n]=\{1,2,\ldots,n\}$ . The goal is to find $k$ sets that cover the maximum number of elements. In the offline model, the greedy algorithm achieves a $1-1/e$ approximation and assuming $\textup{P}\neq\textup{NP}$ , this approximation is the best possible in polynomial time [9].

However, the greedy algorithm for maximum coverage and the related set cover problem is not friendly to streaming, distributed, and massively parallel computing. A large body of work has been devoted to designing algorithms for these problems in these big data computation models. An incomplete list of work includes [23, 13, 22, 17, 5, 12, 4, 7, 10, 6, 26, 24, 3, 14].

Some example applications of maximum coverage includes facility and sensor placement [18], circuit layout and job scheduling [11], information retrieval [1], market design [16], data summarization [24], and social network analysis [14].

The MPC model.

We consider the massively parallel computation model (MPC) in which $m$ sets $S_{1},S_{2},\ldots,S_{m}\subseteq[n]$ are distributed among $m$ machines. Each machine has memory $\tilde{O}(n)$ and holds a set. In each round, each machine can communicate with others with the constraint that no machine receives a total message of size more than $\tilde{O}(n)$ . Similar to previous work in the literature, we assume that $m\leq n$ .

The MPC model, introduced by Karloff, Suri, and Vassilvitskii [15] is an abstraction of various modern computing paradigms such as MapReduce, Hadoop, and Spark.

Previous work.

This problem is a special case of submodular maximization subject to a cardinality constraint. The results of Liu and Vondrak [21], Barbosa et al. [8], Kumar et al. [19] typically require that each machine has enough memory to store $O(\sqrt{km})$ items which are sets in our case (and storing a set requires $\tilde{O}\left(n\right)$ memory) with $\sqrt{m/k}$ machines. When $k=\Omega(m)$ (e.g., $k=m/100$ ), this means that a single machine may need $\tilde{O}\left(mn\right)$ memory. This is not better than the trivial algorithm that sends the entire input to a single machine and solves the problem in 1 round.

Assadi and Khanna gave a randomized $1-1/e-\epsilon$ approximation algorithm in which each machine has $\tilde{O}\left(m^{\delta/\epsilon}n\right)$ memory and the number of machines is $m^{1-\delta/\epsilon}$ for any $\epsilon,\delta\in(0,1)$ (see Corollary 10 in the full paper of [4]). Setting $\delta=\Theta(1/\log m)$ gives us a $1-1/e-\epsilon$ approximation in $O(1/\epsilon\cdot\log m)$ rounds with $O(m)$ machines each of which uses $\tilde{O}\left(2^{1/\epsilon}n\right)$ memory. While Assadi and Khanna’s result is nontrivial in this regime, the dependence on $\epsilon$ is exponential and if $n$ is large, then even a moderately small value of $\epsilon=0.01$ can lead to a prohibitively large memory requirement $\approx 2^{100}n$ . Their work however can handle the case where $k=o(m)$ .

Our result.

We present a relatively simple randomized algorithm that achieves a $1-1/e-\epsilon$ approximation in $O(1/\epsilon^{3}\cdot\log m\cdot(\log(1/\epsilon)+\log m))$ rounds with $\tilde{O}\left(n\right)$ memory per machine assuming $k=\Omega(m)$ . Our space requirement does not depend on $\epsilon$ compared to the exponential dependence in Assadi and Khanna’s result.

We note that assuming $k=\Omega(m)$ does not make the problem any easier since there are still exponentially many solutions to consider. In practice, one can think of many applications where one can utilize a constant fraction of the available sets (e.g., $10\%$ or $20\%$ ). We state our main result as a theorem below.

Theorem 1.

Assume $k=\Omega(m)$ and there are $m$ machines each of which has $\tilde{O}\left(n\right)$ memory. There exists an algorithm that with high probability finds $k$ sets that cover at least $(1-1/e-\epsilon)\textup{OPT}$ elements in $O(1/\epsilon^{3}\cdot\log m\cdot(\log(1/\epsilon)+\log m))$ rounds.

If the maximum frequency $f$ (the maximum number of sets that any element belongs to) is bounded, we can drop the assumption that $k=\Omega(m)$ , and parameterize the number of rounds based on $f$ . In particular, we can obtain a $1-1/e-\epsilon$ approximation in $O(f^{3}/\epsilon^{6}\cdot\log^{2}(\frac{kf}{\epsilon}))$ rounds.

Remark.

We could easily modify our algorithm so that each machine uses $\tilde{O}\left(1/\epsilon\cdot n\right)$ memory and the number of rounds is $O(1/\epsilon^{2}\cdot\log m\cdot(\log(1/\epsilon)+\log m))$ . At least one $\log m$ factor is necessary based on the lower bound given by Corollary 9 of [4].

Randomization appears in two parts of our algorithms: the rounding step and the subsampling step to reduce the number of rounds from $\log m\cdot\log n$ to $\log m\cdot(\log(1/\epsilon)+\log m)$ . If we only need to compute an approximation to the optimal coverage value such that the output is in the interval $[(1-\epsilon)\textup{OPT},\textup{OPT}/(1-1/e-\epsilon)]$ , then we have a deterministic algorithm that runs in $O(1/\epsilon^{3}\cdot\log n\cdot\log m)$ rounds. The algorithm by Assadi and Khanna [4] combines the sample-and-prune framework with threshold greedy. This strategy requires sampling sets. It is unclear how to derandomize their algorithm even just to compute an approximation to the optimal coverage value.

Our techniques and paper organization.

In Section 2.1, we transform the standard linear program for the maximum coverage problem into an equivalent packing linear program that can be solved “approximately” by the multiplicative weights update method. At a high level, the multiplicative weights update method gives us a fractional solution that is a $1-1/e-O(\epsilon)$ bi-criteria approximation where $(1+O(\epsilon))k$ “fractional” sets cover $(1-1/e-O(\epsilon))\textup{OPT}$ “fractional” elements. We then show how to find $k$ sets covering $(1-1/e-O(\epsilon))\textup{OPT}$ elements from this fractional solution through a combinatorial argument and parallel prefix.

Section 2.2 outlines the details to solve the transformed linear program in the MPC model. While this part is an adaptation of the standard multiplicative weights, an implementation in the MPC model requires some additional details such as the number of bits to represent the weights. All missing proofs can be found in the appendix.

Preliminaries.

In this work, we will always consider the case where each machine has $\tilde{O}\left(n\right)$ memory and $m\leq n$ . Without loss of generality, we may assume the non-central machine $j$ stores the set $S_{j}$ . For each element $i\in[n]$ , we use $f_{i}$ to denote the number of sets that $i$ is in. This is also referred to as the frequency of $i$ . Assume each machine has $\tilde{O}\left(n\right)$ space. The vector f can be computed in $O(\log m)$ rounds and broadcasted to all machines. Each machine $j$ starts with the characteristic vector $v_{j}\in\{0,1\}^{n}$ of the set $S_{j}$ that it holds. The vector f is just the sum of the characteristic vectors of the sets. We can aggregate the vectors $\{v_{j}\}$ in $O(\log m)$ rounds using the standard binary tree aggregation algorithm.

Since in this work, the dependence on $1/\epsilon$ is polynomial, an $\alpha-O(\epsilon)$ approximation can easily be translated to an $\alpha-\epsilon$ approximation by scaling $\epsilon$ by a constant factor. We can also assume that $1/\epsilon<n/10$ ; otherwise, we can simulate the greedy algorithm in $O(1/\epsilon)$ rounds. For the sake of exposition, we will not attempt to optimize the constants in our algorithm and analysis. Finally, in this work we consider $1-1/\operatorname{poly}(m)$ as a “high probability”. We use $[E]$ to denote the indicator variable of the event $E$ that is 1 if $E$ happens and 0 otherwise.

2 Algorithm

2.1 The main algorithm

Linear programming (re)formulation.

We first recall the relaxed linear program (LP) for the maximum coverage problem $\Pi_{0}$ :

maximize	$\displaystyle\sum_{i\in[n]}x_{i}$
(s.t.)	$\displaystyle x_{i}\leq\sum_{S_{j}\ni i}y_{j}$	$\displaystyle\forall i\in[n]$
	$\displaystyle\sum_{j\in[m]}y_{j}=k$
	$\displaystyle x_{i},y_{j}\in[0,1]$	$\displaystyle\forall i\in[n],j\in[m].$

We first reformulate this LP and then approximately solve the new LP using the multiplicative weights update method [2]. For each $j\in[m]$ , let $z_{j}:=1-y_{j}$ . We have the following fact.

Fact 1.

For each $i\in[n]$ , $x_{i}\leq\displaystyle\sum_{S_{j}\ni i}y_{j}\iff x_{i}+\sum_{S_{j}\ni i}z_{j}\leq\sum_{S_{j}\ni i}(y_{j}+z_{j})=f_{i}.$

Note that if $\textbf{y}\in[0,1]^{m}$ and $\sum_{j}y_{j}=k$ , then $\textbf{z}\in[0,1]^{m}$ and $\sum_{j}z_{j}=m-k$ . Thus, it is not hard to see that the original LP is equivalent to the following LP which we will refer to as $\Pi_{1}$ .

maximize	$\displaystyle\sum_{i\in[n]}x_{i}$
(s.t.)	$\displaystyle\frac{x_{i}}{f_{i}}+\frac{1}{f_{i}}\cdot\sum_{S_{j}\ni i}z_{j}\leq 1$	$\displaystyle\forall i\in[n]$
	$\displaystyle\sum_{j\in[m]}z_{j}=m-k$
	$\displaystyle x_{i},z_{j}\in[0,1]$	$\displaystyle\forall i\in[n],j\in[m].$

In this section, we will assume the existence an MPC algorithm that approximately solves the linear program $\Pi_{1}$ in $O(1/\epsilon^{3}\cdot\log n\cdot\log m)$ rounds. The proof will be deferred to Section 2.2.

Theorem 2.

There is an algorithm that finds $\textbf{x}\in[0,1]^{n},\textbf{z}\in[0,1]^{m}$ such that

1.

$\sum_{i\in[n]}x_{i}\geq(1-\epsilon)\textup{OPT}$ ,
2.

$\sum_{j\in[m]}z_{j}=m-k$ , and
3.

$\frac{x_{i}}{f_{i}}+\frac{1}{f_{i}}\cdot\sum_{S_{j}\ni i}z_{j}\leq 1+\epsilon\ \ \,\forall i\in[n]$

in $O(\epsilon^{-3}\log n\cdot\log m)$ rounds.

Let x and z be the be the output given by Theorem 2. Then, let $\textbf{x}^{\prime}=\textbf{x}/(1+\epsilon)$ , $\textbf{z}^{\prime}=\textbf{z}/(1+\epsilon)$ , and $\textbf{y}^{\prime}=\mathbf{1}-\textbf{z}^{\prime}$ . We have

$\displaystyle\sum_{i=1}^{n}x^{\prime}_{i}$	$\displaystyle=\frac{1}{1+\epsilon}\sum_{i=1}^{n}x_{i}\geq\frac{1-\epsilon}{1+\epsilon}\textup{OPT}>(1-4\epsilon)\textup{OPT},$	(1)
$\displaystyle\sum_{j=1}^{m}y^{\prime}_{i}$	$\displaystyle=\sum_{j=1}^{m}\left(1-\frac{z_{j}}{1+\epsilon}\right)=m-\frac{m-k}{1+\epsilon}\leq m-(1-2\epsilon)(m-k)<k+2\epsilon m,$	(2)
$\displaystyle x_{i}^{\prime}+$	$\displaystyle\sum_{S_{j}\ni i}z_{j}^{\prime}\leq f_{i}\iff x_{i}^{\prime}\leq\sum_{S_{j}\ni i}y_{j}^{\prime},\ \ \forall i\in[n]\text{, by Fact \ref{fact:equiv-lp}}.$	(3)

Thus, by setting $\textbf{x}\leftarrow\textbf{x}/(1+\epsilon)$ , and $\textbf{y}\leftarrow\textbf{y}^{\prime}$ , we have an approximate solution $\textbf{x}\in[0,1]^{n},\textbf{y}\in[0,1]^{n}$ to the LP $\Pi_{0}$ such that

	$\displaystyle\sum_{i=1}^{n}x_{i}$	$\displaystyle\geq(1-4\epsilon)\textup{OPT},$	$\displaystyle\sum_{j=1}^{m}y_{j}\leq k+2\epsilon m,\text{ and }$
	$\displaystyle x_{i}$	$\displaystyle\leq\sum_{S_{j}\ni i}y_{j},\forall i\in[n].$

We can then apply the standard randomized rounding to find a sub-collection of at most $k+2\epsilon m$ sets that covers at least $(1-4\epsilon)\textup{OPT}$ elements. For the sake of completeness, we will provide the rounding algorithm in the MPC model in the following lemma. The proof can be found in Appendix A.

Lemma 1.

Suppose $\textbf{x}\in[0,1]^{n}$ and $\textbf{y}\in[0,1]^{m}$ satisfy:

1.

$\sum_{i\in[n]}x_{i}\geq L$ ,
2.

$x_{i}\leq\sum_{S_{j}\ni i}y_{j}$ for all $i\in[n]$ ,
3.

$\sum_{j\in[m]}y_{j}=k$ ,
4.

$x_{i},y_{j}\in[0,1]$ for all $i\in[n]$ and $j\in[m]$ .

Then there exists a rounding algorithm that finds a sub-collection of $k$ sets that in expectation cover at least $(1-1/e)L$ elements in $O(1)$ round. To obtain a high probability guarantee, the algorithm requires $O(1/\epsilon\cdot\log m)$ rounds to find $k$ sets that cover least $(1-1/e-O(\epsilon))L$ elements.

Applying Lemma 1 to x and y with $k+2\epsilon m$ in place of $k$ , we obtain a sub-collection of at most $k+2\epsilon m$ sets that covers at least $(1-1/e-O(\epsilon))\textup{OPT}$ elements. Since we assume that $k=\Omega(m)$ , that means we have found $k+O(\epsilon)k$ sets that cover at least $(1-1/e-O(\epsilon))\textup{OPT}$ elements. The next lemma shows that we can find $k$ sets among these that cover at least $(1-1/e-O(\epsilon))\textup{OPT}$ elements. The proof is a combination of a counting argument and the well-known parallel prefix algorithm [20].

1 Compute

|S_{1}|,|S_{2}\setminus S_{1}|,|S_{3}\setminus(S_{1}\cup S_{2})|,\ldots,|S_{k}\setminus(S_{1}\cup\ldots\cup S_{k-1})|

O(\log k)

rounds.

3Function PrefixCoverage( $S_{1},S_{2},\ldots,S_{k}$ ):

// Compute

|S_{1}|,|S_{2}\cup S_{1}|,|S_{3}\cup S_{2}\cup S_{1}|,\ldots,|S_{k}\cup S_{k-1}\cup\ldots\cup S_{1}|

4 if $k=1$ then

5 return

|S_{1}|

6 else

// Assume

k

is even.

7 In one round, machine

2j-1

sends

S_{2j-1}

to machine

2j

, then machine

2j

computes

Q_{j}=S_{2j-1}\cup S_{2j}

8 Run PrefixCoverage( $Q_{1},Q_{2},\ldots,Q_{k/2}$ ) on machines

2,4,6,\ldots,k

9 Machine

j

now has

S_{1}\cup S_{2}\cup S_{3}\cup\ldots\cup S_{j}

for

j=2,4,6,\ldots,k

10 In one round, machine

j=1,3,5,\ldots,k-1

communicates with machine

j-1

which has

S_{1}\cup S_{2}\cup\ldots S_{j-1}

and computes

S_{1}\cup S_{2}\cup\ldots\cup S_{j}

// If

k

is odd, run the above algorithm on

S_{1},S_{2},\ldots,S_{k-1}

and then compute

S_{1}\cup S_{2}\cup\ldots\cup S_{k}

in one round.

13 Each machine

j

communicates with machine

j-1

to compute

|S_{1}\cup S_{2}\cup\ldots\cup S_{j}|-|S_{1}\cup S_{2}\cup\ldots\cup S_{j-1}|

in one round.

Algorithm 1 Parallel prefix coverage

We rely on the following result which is a simulation of the parallel prefix in our setting.

Lemma 2.

Suppose there are $k$ sets and machine $j$ holds the set $S_{j}$ . Then Algorithm 1 computes $|S_{1}|,|S_{2}\setminus S_{1}|,|S_{3}\setminus(S_{1}\cup S_{2})|,|S_{4}\setminus(S_{1}\cup S_{2}\cup S_{3})|,\ldots,|S_{k}\cup S_{k-1}\cup\ldots\cup S_{1}|$ in $O(\log k)$ rounds.

Proof.

We first show how to compute $(S_{1}),(S_{1}\cup S_{2}),(S_{1}\cup S_{2}\cup S_{3}),\ldots,(S_{1}\cup S_{2}\cup\ldots\cup S_{k})$ in $O(\log k)$ rounds where machine $j$ holds $S_{1}\cup S_{2}\cup\ldots\cup S_{j}$ at the end. Once this is done, machine $j$ can send $S_{1}\cup S_{2}\cup\ldots\cup S_{j}$ to machine $j+1$ and machine $j+1$ can compute $|S_{1}\cup S_{2}\cup\ldots\cup S_{j+1}|-|S_{1}\cup S_{2}\cup\ldots\cup S_{j}|=|S_{j+1}\setminus(S_{1}\cup S_{2}\cup\ldots\cup S_{j})|$ .

The algorithm operates recursively. In one round, machine $2j-1$ sends $S_{2j-1}$ to machine $2j$ , then machine $2j$ computes $Q_{j}=S_{2j-1}\cup S_{2j}$ . Assuming $k$ is even, the algorithm recursively computes $(Q_{1}),(Q_{1}\cup Q_{2}),(Q_{1}\cup Q_{2}\cup Q_{3}),\ldots,(Q_{1}\cup Q_{2}\cup\ldots\cup Q_{k})$ on machines $2,4,\ldots,k$ . After recursion, machines with even indices $2j$ has the set $S_{1}\cup S_{2}\cup\ldots\cup S_{2j}$ . Then, in one round, machines with odd indices $2j+1$ communicate with machine $2j$ to learn about $S_{1}\cup S_{2}\cup\ldots\cup S_{2j+1}$ . If $k$ is odd, we just do the same on $S_{1},S_{2},\ldots,S_{k-1}$ and then compute $S_{1}\cup S_{2}\cup\ldots\cup S_{k}$ in one round.

There are $O(\log k)$ recursion levels and therefore, the total number of rounds is $O(\log k)$ . ∎

Lemma 3.

Let $\mathcal{S}=\{S_{1},\dots,S_{r}\}$ be a collection of $r=(1+\gamma)k$ sets whose union contains $L$ elements where $\gamma\in[0,1)$ , then there exist $k$ sets in $\mathcal{S}$ whose union contains at least $(1-\gamma)L$ elements. Furthermore, we can find these $k$ sets in $O(\log r)$ rounds.

Proof.

Consider the following quantities:

	$\displaystyle\phi_{1}$	$\displaystyle=\|S_{1}\|$
	$\displaystyle\phi_{2}$	$\displaystyle=\|S_{1}\cup S_{2}\|-\|S_{1}\|$
	$\displaystyle\phi_{3}$	$\displaystyle=\|S_{1}\cup S_{2}\cup S_{3}\|-\|S_{1}\cup S_{2}\|$
	$\displaystyle\ldots$

Clearly, $\sum_{j=1}^{r}\phi_{j}=L$ . We say $S_{j}$ is responsible for element $i$ if $i\in S_{j}\setminus(\bigcup_{l<j}S_{l})$ . This establishes a one-to-one correspondence between the sets ${S_{1},\dots,S_{r}}$ and the elements they cover. $S_{j}$ is responsible for exactly $\phi_{j}$ elements. Furthermore, if we remove some sets from $\mathcal{S}$ , and an element becomes uncovered, the set responsible for that element must have been removed. Thus, if we remove the $\gamma k$ sets corresponding to the $\gamma k$ smallest $\phi_{j}$ , then at most $\gamma L$ elements will not have a responsible set. Thus, the number of elements that become uncovered is at most $\gamma L$ .

To find these sets, we apply Lemma 2 with $r$ in place of $k$ and $O(\epsilon)$ in place of $\gamma$ to learn about $\phi_{1},\phi_{2},\ldots,\phi_{r}$ in $O(\log r)=O(\log k)$ rounds. We then remove the $\gamma k=O(\epsilon)k$ sets corresponding to the $\gamma k$ smallest $\phi_{j}$ and output the remaining $k$ sets. ∎

Putting it all together.

We spend $O(1/\epsilon^{3}\cdot\log n\cdot\log m)$ rounds to approximately solve the linear program $\Pi_{1}$ . From there, we can round the solution to find a sub-collection of $k+O(\epsilon)k$ sets that cover at least $(1-1/e-O(\epsilon))\textup{OPT}$ elements with high probability in $O(1/\epsilon\cdot\log m)$ rounds. We then apply Lemma 3 to find $k$ sets among these that cover at least $(1-1/e-O(\epsilon))\textup{OPT}$ elements in $O(\log k)$ rounds. The total number of rounds is therefore $O(1/\epsilon^{3}\cdot\log n\cdot\log m)$ .

Reducing the number of rounds to $O(1/\epsilon^{3}\cdot\log m\cdot(\log m+\log(1/\epsilon)))$ .

The described algorithm runs in $O(1/\epsilon^{3}\cdot\log n\cdot\log m)$ rounds. Our main result in Theorem 1 states a stronger bound $O(1/\epsilon^{3}\cdot\log m\cdot(\log m+\log(1/\epsilon)))$ rounds. We achieve this by adopting the sub-sampling framework of McGregor and Vu [23].

Without loss of generality, we may assume that each element is covered by some set. If not, we can remove all of the elements that are not covered by any set using $O(\log m)$ rounds. Specifically, let $\textbf{v}_{j}$ be the characteristic vector of $S_{j}$ . We can compute $\textbf{v}=\sum_{i=1}^{j}\textbf{v}_{j}$ in $O(\log m)$ rounds using the standard converge-cast binary tree algorithm. We can then remove the elements that are not covered by any set (elements corresponding to 0 entries in v).

We now have $m$ sets covering $n$ elements. Since $k=\Omega(m)$ , we must have that $\textup{OPT}=\Omega(n)$ . McGregor and Vu showed that if one samples each element in the universe $[n]$ independently with probability $p=\Theta\left(\frac{\log{m\choose k}}{\epsilon^{2}\textup{OPT}}\right)$ then with high probability, if we run a $\beta$ approximation algorithm on the subsampled universe, the solution will correpond to a $\beta-\epsilon$ approximation on the original universe. We have just argued that $\textup{OPT}=\Omega(n)$ and therefore with high probability, we sample $O\left(\frac{\log{m\choose k}}{\epsilon^{2}}\right)=O(1/\epsilon^{2}\cdot m)$ elements by appealing to Chernoff bound and the fact that ${m\choose k}\leq 2^{m}$ .

As a result, we may assume that $n=O(1/\epsilon^{2}\cdot m)$ . This results in an $O(1/\epsilon^{2}\cdot\log m\cdot(\log m+\log(1/\epsilon)))$ round algorithm.

Bounded frequency.

Assuming $f=\max_{i}f_{i}$ is known, we can lift the assumption that $k=\Omega(m)$ and parameterize our algorithm based on $f$ instead. McGregor et al. [22] showed that the largest $\lceil kf/\eta\rceil$ sets contain a solution that covers at least $(1-\eta)\textup{OPT}$ elements. We therefore can assume that $m=O(kf/\eta)$ by keeping only the largest $\lceil kf/\eta\rceil$ sets which can be identified in $O(1)$ rounds.

We set $\epsilon=\eta^{2}/f$ and proceed to obtain a solution that covers at least $(1-\eta^{2}/f)(1-\eta)\textup{OPT}=(1-O(\eta))\textup{OPT}$ elements using at most $k+O(\epsilon m)=k+O(\eta^{2}/f\cdot kf/\eta)=k+O(\eta k)$ sets as in the discussion above. Appealing to Lemma 3, we can find $k$ sets that cover at least $(1-O(\eta))\textup{OPT}$ elements. The total number of rounds is $O(f^{3}/\eta^{6}\cdot\log\frac{kf}{\eta}\cdot(\log\frac{1}{\eta}+\log\frac{kf}{\eta}))=O(f^{3}/\eta^{6}\cdot\log^{2}\frac{kf}{\eta})$ .

2.2 Approximate the LP’s solution via multiplicative weights

Fix an objective value $L$ . Let $P$ be a convex region defined by

P=\left\{(\textbf{x},\textbf{z})\in[0,1]^{n}\times[0,1]^{m}:\sum_{i\in[n]}x_{i}=L\text{ and }\sum_{j\in[m]}z_{j}=m-k\right\}.

Note that if $\left(\textbf{x}_{1},\textbf{z}_{1}),(\textbf{x}_{2},\textbf{z}_{2}\right),\ldots,\left(\textbf{x}_{T},\textbf{z}_{T}\right)\in P$ then $\left(\frac{1}{T}\sum_{t=1}^{T}\textbf{x}_{t},\frac{1}{T}\sum_{t=1}^{T}\textbf{z}_{t}\right)\in P$ . Consider the following problem $\Psi_{1}$ that asks to either correctly declare that

\nexists(\textbf{x},\textbf{z})\in P:\frac{x_{i}}{f_{i}}+\frac{1}{f_{i}}\cdot\sum_{S_{j}\ni i}z_{j}\leq 1,\ \ \ \forall i\in[n]

or to output a solution $(\textbf{x},\textbf{z})\in P$ such that

\frac{x_{i}}{f_{i}}+\frac{1}{f_{i}}\cdot\sum_{S_{j}\ni i}z_{j}\leq 1+\epsilon\ \ \ \forall i\in[n].

Once we have such an algorithm, we can try different values of $L=\lfloor(1+\epsilon)^{0}\rfloor,\lfloor(1+\epsilon)^{1}\rfloor,\lfloor(1+\epsilon)^{2}\rfloor,\ldots,n$ and return the solution corresponding to the largest $L$ that has a feasible solution. There are $O(1/\epsilon\cdot\log n)$ such guesses. We know that the guess $L$ where $\textup{OPT}/(1+\epsilon)\leq L\leq\textup{OPT}$ must result in a feasible solution.

To avoid introducing a $\log n$ factor in the number of rounds, we partition these $O(1/\epsilon\cdot\log n)$ guesses into batches of size $O(1/\epsilon)$ where each batch corresponds to $O(\log n)$ guesses. Algorithm copies that correspond to guesses in the same batch will run in parallel. This will only introduce a $\log n$ factor in terms of memory used by each machine. By returning the solution corresponding to the largest feasible guess $L$ , one attains Theorem 2.

Oracle implementation.

Given a weight vector $\textbf{w}\in\mathbb{R}^{n}$ in which $w_{i}\geq 0$ for all $i\in[n]$ . We first consider an easier feasibility problem $\Psi_{2}$ . It asks to either correctly declares that

\displaystyle\nexists(\textbf{x},\textbf{z})\in P:\ \ \sum_{i=1}^{n}w_{i}\cdot\left(\frac{x_{i}}{f_{i}}+\sum_{S_{j}\ni i}\frac{z_{i}}{f_{i}}\right)\leq\sum_{i=1}^{n}w_{i},\ \ \ \forall i\in[n]

or to outputs a solution $(\textbf{x},\textbf{z})\in P$ such that

\displaystyle\sum_{i=1}^{n}w_{i}\cdot\left(\frac{x_{i}}{f_{i}}+\sum_{S_{j}\ni i}\frac{z_{i}}{f_{i}}\right)\leq\sum_{i=1}^{n}w_{i}+\frac{1}{n^{5}},\ \ \ \forall i\in[n].

(4)

That is, if the input is feasible, then output the corresponding $(\textbf{x},\textbf{z})\in P$ that approximately satisfy the constraint. Otherwise, correctly conclude that the input is infeasible. In the multiplicative weights update framework, this is known as the approximate oracle.

Note that if there is a feasible solution to $\Psi_{1}$ , then there is a feasible solution to $\Psi_{2}$ since

\displaystyle\frac{x_{i}}{f_{i}}+\frac{1}{f_{i}}\cdot\sum_{S_{j}\ni i}z_{j}\leq 1\ \ \ \forall i\in[n]\implies\sum_{i=1}^{n}w_{i}\cdot\left(\frac{x_{i}}{f_{i}}+\sum_{S_{j}\ni i}\frac{z_{i}}{f_{i}}\right)\leq\sum_{i=1}^{n}w_{i}.

We can implement an oracle that solves the above feasibility problem $\Psi_{2}$ as follows. First, observe that

	$\displaystyle\sum_{i=1}^{n}\frac{w_{i}}{f_{i}}\cdot\left(x_{i}+\sum_{S_{j}\ni i}z_{j}\right)\leq\sum_{i=1}^{n}w_{i}$
	$\displaystyle\iff\sum_{i=1}^{n}\frac{w_{i}}{f_{i}}\cdot x_{i}+\sum_{i=1}^{n}\frac{w_{i}}{f_{i}}\cdot\sum_{S_{j}\ni i}z_{j}\leq\sum_{i=1}^{n}w_{i}$
	$\displaystyle\iff\sum_{i=1}^{n}\frac{w_{i}}{f_{i}}\cdot x_{i}+\sum_{j=1}^{m}z_{j}\cdot\sum_{i\in S_{j}}\frac{w_{i}}{f_{i}}\leq\sum_{i=1}^{n}w_{i}.$

To ease the notation, define

\displaystyle p_{i}:=\frac{w_{i}}{f_{i}},\ \ \ \forall i\in[n]\text{, and }q_{j}:=\sum_{i\in S_{j}}\frac{w_{i}}{f_{i}}=\sum_{i\in S_{j}}p_{i},\ \ \ \forall j\in[m].

We therefore want to check if there exists $(\textbf{x},\textbf{z})\in P$ such that

\displaystyle LHS(\textbf{x},\textbf{z}):=\sum_{i=1}^{n}x_{i}p_{i}+\sum_{j=1}^{m}z_{j}q_{j}\leq\sum_{i=1}^{n}w_{i}.

We will minimize the left hand side by minimizing each sum separately. We can indeed do this exactly. However, there is a subtle issue where we need to bound the number of bits required to represent $p_{i}=\frac{w_{i}}{f_{i}}$ and $q_{j}=\sum_{i\in S_{j}}\frac{w_{i}}{f_{i}}=\sum_{i\in S_{j}}p_{i}$ given the memory constraint. To do this, we truncate the value of each $p_{i}$ after the $(10\log_{2}n)$ -th bit following the decimal point. Note that this will result in an underestimate of $p_{i}$ by at most $1/n^{10}$ . In particular, let $\hat{p}_{i}$ be $p_{i}$ after truncating the value of $p_{i}$ at the $(10\log_{2}n)$ -th bit after the decimal point and $\hat{q}_{j}=\sum_{i\in S_{j}}\hat{p}_{i}$ . For any $(\textbf{x},\textbf{z})\in[0,1]^{n}\times[0,1]^{m}$ , we have

	$\displaystyle\widehat{LHS}(\textbf{x},\textbf{z})$	$\displaystyle:=\sum_{i=1}^{n}\hat{p}_{i}x_{i}+\sum_{j=1}^{m}z_{j}\sum_{i\in S_{j}}\hat{p}_{i}\geq\left(\sum_{i=1}^{n}p_{i}x_{i}\right)-\frac{n}{n^{10}}+\left(\sum_{j=1}^{m}z_{j}\sum_{i\in S_{j}}p_{i}\right)-\frac{mf_{i}}{n^{10}}$
		$\displaystyle>LHS(\textbf{x},\textbf{z})-\frac{1}{n^{5}}.$

The last inequality is based on the assumption that $m,f_{i}\leq n$ . Therefore, $LHS(\textbf{x},\textbf{z})-1/n^{5}\leq\widehat{LHS}(\textbf{x},\textbf{z})\leq LHS(\textbf{x},\textbf{z})$ . Note that since $\sum_{i=1}^{n}x_{i}=L$ and $\sum_{j=1}^{m}z_{j}=m-k$ , to minimize $\widehat{LHS}(\textbf{x},\textbf{z})$ over $(\textbf{x},\textbf{z})\in P$ , we simply set

	$\displaystyle x_{i}$	$\displaystyle=[\text{$\hat{p}_{i}$ is among the $L$ smallest values of $\{\hat{p}_{t}\}_{t=1}^{n}$}\}],$
	$\displaystyle z_{j}$	$\displaystyle=[\text{$\hat{q}_{j}$ is among the $m-k$ smallest values of $\{\hat{q}_{t}\}_{t=1}^{m}$}\}].$

After setting $\textbf{x},\textbf{z}$ as above, if $\widehat{LHS}(\textbf{x},\textbf{z})>\sum_{i=1}^{n}w_{i}\implies LHS(\textbf{x},\textbf{z})>\sum_{i=1}^{n}w_{i}$ , then it is safe to declare that the system is infeasible. Otherwise, we have found $(\textbf{x},\textbf{z})\in P$ such that $\widehat{LHS}(\textbf{x},\textbf{z})\leq\sum_{i=1}^{n}w_{i}\implies LHS(\textbf{x},\textbf{z})\leq\sum_{i=1}^{n}w_{i}+1/n^{5}$ as required by Equation (4).

Lemma 4.

Assume that all machines have the vector w. We can solve the feasibility problem $\Psi_{2}$ in $O(1)$ rounds, where all machines either learn that the system is infeasible or obtain an approximate solution $(\textbf{x},\textbf{z})\in P$ that satisfies Equation (4).

Proof.

We have argued above that the oracle we design either correctly declares that the system is infeasible or outputs $(\textbf{x},\textbf{z})\in P$ such that $LHS(\textbf{x},\textbf{z})\leq\sum_{i=1}^{n}w_{i}+1/n^{5}$ . Recall that each machine has the frequency vector f and suppose for the time being that each machine also has the vector w. We can implement this oracle in $O(1)$ rounds as follows. Each machine $j$ computes $\hat{q}_{j}=\sum_{i\in S_{j}}\hat{p}_{i}$ since it has $w_{i}$ and $f_{i}$ to compute $p_{i}$ for all $i\in[n]$ . Then it sends $\hat{q}_{j}$ to the central machine. Note that each $\hat{p}_{i}$ can be represented using $O(\log n)$ bits. Observe that $\hat{q}_{j}$ is the sum of at most $n$ different $\hat{p}_{i}$ . The number of bits in the fractional part does not increase while the number of bits in the integer part increases by at most $O(1)$ , as the sum is upper bounded by a crude bound $2n\times 2^{O(\log n)}=2^{O(\log n)}$ . Thus, the central machine receives at most $\tilde{O}\left(m\right)=\tilde{O}\left(n\right)$ bits from all other machines.

The central machine will then set x and z as described above. This allows the central machine to correctly determine whether the system is infeasible or to find $(\textbf{x},\textbf{z})\in P$ that satisfies Equation (4). Since the entries of x and z are binary, they can be sent to non-central machines, with each machine receiving a message of $n+m=O(n)$ bits. We summarize the above as the following lemma. ∎

Solving the LP via multiplicative weights.

Once the existence of such an oracle is guaranteed, we can follow the multiplicative weights framework to approximately solve the LP. We will first explain how to implement the MWU algorithm in the MPC model. See Algorithm 2.

Input: Objective value

L

\epsilon\leq 1/4

1 Initialize

w_{i}^{(0)}=1

for all

i\in[n]

2 for iteration $t=1,2,\ldots,T=O(1/\epsilon^{2}\cdot\log n)$ do

3 Run the oracle in Lemma 4 with

\textbf{w}^{(t-1)}

to check if there exists a feasible solution. If the answer is Infeasible, stop the algorithm. If the answer is Feasible, let

\textbf{x}^{(t)}

and

\textbf{z}^{(t)}

be the output of the oracle that are now stored in all machines .

4 Each machine

j

constructs

Y_{j}=\{Y_{j1},Y_{j2},\ldots,Y_{jn}\}

where

Y_{ji}={z_{j}^{(t)}}\cdot[\{i\in S_{j}\}]

6 Compute

W=\sum_{j\in[m]}Y_{j}

O(\log m)

rounds using the a converge-cast binary tree and send

W

to the central machine. Note that

W_{i}=\sum_{S_{j}\ni i}{z_{j}^{(t)}}

7 For each

i\in[n]

, the central machine computes

E_{i}^{(t)}=f_{i}\cdot\textup{error}_{i}^{(t)}

where

\textup{error}_{i}^{(t)}:=1-\frac{x_{i}^{(t)}}{f_{i}}-\frac{W_{i}}{f_{i}}=1-\frac{x_{i}^{(t)}}{f_{i}}-\sum_{S_{j}\ni i}\frac{z_{j}^{(t)}}{f_{i}}\ \ \ \forall i\in[n].

and sends

\sum_{d=1}^{t}E_{i}^{(d)}

to all other machines.

8 For each

i\in[n]

, each machine computes

w_{i}^{(t)}=2^{-\epsilon\cdot\sum_{d=1}^{t}E_{i}^{(d)}/f_{i}}=2^{-\epsilon\cdot\sum_{d=1}^{t}\textup{error}_{i}^{(d)}}=2^{-\epsilon\cdot\textup{error}_{i}^{(t)}}w_{i}^{(t-1)}.

10After

T

iterations, output

\textbf{x}=\frac{1}{T}\sum_{t=1}^{T}\textbf{x}^{(t)}\text{, and }\textbf{z}=\frac{1}{T}\sum_{t=1}^{T}\textbf{z}^{(t)}.

Algorithm 2 Multiplicative weights for solving the LP

Lemma 5.

Algorithm 2 can be implemented in $O(1/\epsilon^{2}\cdot\log n\cdot\log m)$ rounds.

Proof.

Without loss of generality, we can round $\epsilon$ down to the nearest power of $1/2$ . Then, $-\epsilon$ can be represented using $O(\log(1/\epsilon))=O(\log n)$ bits, since we assume $1/\epsilon<n$ .

Each machine maintains the current vectors $\textbf{x}^{(t)}$ , $\textbf{z}^{(t)}$ , and $\textbf{w}^{(t)}$ , each of which has at most $O(n)$ entries. Since the entries in $\textbf{x}^{(t)}$ and $\textbf{z}^{(t)}$ are binary, we can represent them using $O(n)$ bits. Recall that, in the oracle described in Lemma 4, the central machine can broadcast $\textbf{x}^{(t)}$ and $\textbf{z}^{(t)}$ to all other machines in one round without violating the memory constraint.

It remains to show that the number of bits required to represent $w_{i}^{(t)}$ is $O(\log n)$ in all iterations. The central machine then implicitly broadcasts $w_{i}^{(t)}$ to all other machines in iteration $t$ by sending $\sum_{d=1}^{t}E_{i}^{(d)}$ to all other machines for each $i\in[n]$ .

Note that each $w_{i}^{(t)}$ has the form

w_{i}^{(t)}=2^{-\epsilon\cdot\sum_{d=1}^{t}E_{i}^{(d)}/f_{i}}=2^{-\epsilon\cdot\sum_{d=1}^{t}\textup{error}_{i}^{(d)}}.

For any $d$ , since $m,f_{i}\leq n$ , we have

f_{i}-m-1\leq f_{i}\cdot\textup{error}_{i}^{(d)}\leq f_{i}\implies f_{i}\cdot\textup{error}_{i}^{(d)}\in[-2n,2n].

Since $T=\Theta(1/\epsilon^{2}\cdot\log n)=o(n^{3})$ , we have

\sum_{d=1}^{t}E_{i}^{(d)}=\sum_{d=1}^{t}f_{i}\cdot\textup{error}_{i}^{(d)}\in[-2nt,2nt]\subset[-2n^{4},2n^{4}].

Thus, we can represent $\sum_{d=1}^{t}E_{i}^{(d)}$ using $O(\log n)$ bits. This implies that the central machine can implicitly broadcast $w_{i}^{(t)}$ to all other machines in $O(1)$ rounds. Putting it all together, each of the $T=\Theta(1/\epsilon^{2}\cdot\log n)$ iterations consists of: 1) a call to the oracle, which takes $O(1)$ rounds, 2) computing $\textup{error}_{i}^{(t)}$ for each $i\in[n]$ in step 7, which takes $O(\log m)$ rounds, and 3) the central machine broadcasting $\textbf{w}^{(t)}$ in one round to all other machines before proceeding to the next iteration. ∎

The next lemma is an adaptation of the standard multiplicative weights algorithm.

Lemma 6.

The output of Algorithm 2 satisfies the following property. If there exists a feasible solution, then the output satisfies:

\displaystyle\frac{x_{i}}{f_{i}}+\sum_{S_{j}\ni i}\frac{z_{j}}{f_{i}}

\displaystyle\leq 1+\epsilon\ \ \ \forall i\in[n]\text{, and }\sum_{i=1}^{n}x_{i}=L.

Otherwise, the algorithm correctly concludes that the system is infeasible.

Proof.

If the algorithm does not output Infeasible, this implies that in each iteration $t$ , $\sum_{i=1}^{n}x_{i}^{(t)}=L$ , and therefore $\sum_{i=1}^{n}x_{i}=L$ since $\textbf{x}=\frac{1}{T}\sum_{t=1}^{T}\textbf{x}^{(t)}$ . Similarly, in each iteration $t$ , $\sum_{j=1}^{m}z_{j}^{(t)}=m-k$ , and thus $\sum_{j=1}^{m}z_{j}=m-k$ since $\textbf{z}=\frac{1}{T}\sum_{t=1}^{T}\textbf{z}^{(t)}$ . Hence, the output $(\textbf{x},\textbf{z})\in P$ . Define the potential function

\displaystyle\Phi^{(t)}:=\sum_{i=1}^{n}w_{i}^{(t)}.

We will make use of the fact $\text{exp}\left(-\eta x\right)\leq 1-\eta x+\eta^{2}x^{2}$ for $|\eta x|\leq 1$ . Note that for all $i$ and $t$ , we always have

\textup{error}_{i}^{(t)}=1-\frac{x_{i}^{(t)}}{f_{i}}-\sum_{S_{j}\ni i}\frac{z_{j}^{(t)}}{f_{i}}\in[-1,1].

Note that $\textup{error}_{i}^{(t)}\in[-1,1]$ because $0\leq x_{i}^{(t)}\leq 1\leq f_{i}$ and $0\leq\sum_{S_{j}\ni i}{z_{j}^{(t)}}\leq\sum_{S_{j}\ni i}1=f_{i}$ . Set $\alpha=\epsilon\cdot\ln(2)$ , as long as $\epsilon\leq 1/4$ , we have $|\alpha\cdot\textup{error}_{i}^{(t)}|<1$ and therefore

\displaystyle w_{i}^{(t)}

\displaystyle=\text{exp}\left(-\alpha\cdot\textup{error}_{i}^{(t)}\right)\cdot w_{i}^{(t-1)}\leq\left(1-\alpha\cdot\textup{error}_{i}^{(t)}+\alpha^{2}\cdot\left(\textup{error}_{i}^{(t)}\right)^{2}\right)\cdot w_{i}^{(t-1)}.

Summing over $i$ gives:

	$\displaystyle\Phi^{(t)}$	$\displaystyle\leq\sum_{i=1}^{n}\left(1-\alpha\cdot\textup{error}_{i}^{(t)}+\alpha^{2}\right)\cdot w_{i}^{(t-1)}$
		$\displaystyle=(1+\alpha^{2})\sum_{i=1}^{n}w_{i}^{(t-1)}-\alpha\sum_{i=1}^{n}\textup{error}_{i}^{(t)}\cdot w_{i}^{(t-1)}.$

The first inequality follows from the fact that $\left(\textup{error}_{i}^{(t)}\right)^{2}\in[0,1]$ . Note that

	$\displaystyle\sum_{i=1}^{n}\textup{error}_{i}^{(t)}w_{i}^{(t-1)}$	$\displaystyle=\sum_{i=1}^{n}w_{i}^{(t-1)}\left(1-\frac{x_{i}^{(t-1)}}{f_{i}}-\sum_{S_{j}\ni i}\frac{z_{j}^{(t-1)}}{f_{i}}\right)$
		$\displaystyle=\sum_{i=1}^{n}w_{i}^{(t-1)}-\sum_{i=1}^{n}w_{i}^{(t-1)}\left(\frac{x_{i}^{(t-1)}}{f_{i}}+\sum_{S_{j}\ni i}\frac{z_{j}^{(t-1)}}{f_{i}}\right)\geq-\frac{1}{n^{5}}.$

The last inequality follows from the fact that the oracle is guaranteed to find $\textbf{x}^{(t-1)}$ and $\textbf{z}^{(t-1)}$ such that

\displaystyle\sum_{i=1}^{n}\left(\frac{x_{i}^{(t-1)}}{f_{i}}+\sum_{S_{j}\ni i}\frac{z_{j}^{(t-1)}}{f_{i}}\right)w_{i}^{(t-1)}\leq\sum_{i=1}^{n}w_{i}^{(t-1)}+\frac{1}{n^{5}}.

Thus, $\Phi^{(t)}\leq(1+\alpha^{2})\sum_{i=1}^{n}w_{i}^{(t-1)}+\frac{\alpha}{n^{5}}=(1+\alpha^{2})\Phi^{t-1}+\frac{\alpha}{n^{5}}.$ By a simple induction,

	$\displaystyle\Phi^{(T)}$	$\displaystyle\leq(1+\alpha^{2})\Phi^{(T-1)}+\frac{\alpha}{n^{5}}$
		$\displaystyle\leq(1+\alpha^{2})^{2}\Phi^{(T-2)}+\frac{\alpha}{n^{5}}\left(1+\alpha^{2}\right)+\frac{\alpha}{n^{5}}$
		$\displaystyle\leq(1+\alpha^{2})^{3}\Phi^{(T-3)}+(1+\alpha^{2})^{2}\frac{\alpha}{n^{5}}+\frac{\alpha}{n^{5}}\left(1+\alpha^{2}\right)+\frac{\alpha}{n^{5}}$
		$\displaystyle\ldots$
		$\displaystyle\leq(1+\alpha^{2})^{T}\Phi^{(0)}+\frac{\alpha}{n^{5}}\sum_{t=0}^{T-1}(1+\alpha^{2})^{t}$
		$\displaystyle=(1+\alpha^{2})^{T}\Phi^{(0)}+\frac{\alpha}{n^{5}}\cdot\frac{(1+\alpha^{2})^{T}-1}{\alpha^{2}}$
		$\displaystyle\leq(1+\alpha^{2})^{T}\Phi^{(0)}+\frac{1}{\alpha n^{5}}{(1+\alpha^{2})^{T}}.$

Recall that $\alpha=\epsilon\ln 2$ and we assume $1/\epsilon<n/10$ . Thus, $1/(\alpha n^{5})<\epsilon^{4}/\ln 2<1$ . Furthermore, recall that $\Phi^{(0)}=n$ . We have,

	$\displaystyle w_{i}^{(T)}$	$\displaystyle\leq(1+\alpha^{2})^{T}(n+1)<(1+\alpha^{2})^{T}2n$
	$\displaystyle\text{exp}\left(-\alpha\sum_{t=1}^{T}\textup{error}_{i}^{(t)}\right)$	$\displaystyle\leq(1+\alpha^{2})^{T}(2n)$
	$\displaystyle-\alpha\sum_{t=1}^{T}\textup{error}_{i}^{(t)}$	$\displaystyle\leq\ln(2n)+T\ln(1+\alpha^{2}).$

We use the fact that $\ln(1+x)\leq x$ for $x\in\mathbb{R}$ to get

	$\displaystyle\sum_{t=1}^{T}\textup{error}_{i}^{(t)}$	$\displaystyle\geq-\frac{\ln(2n)}{\alpha}-T\frac{\ln(1+\alpha^{2})}{\alpha}$
	$\displaystyle\sum_{t=1}^{T}\left(1-\frac{x_{i}^{(t)}}{f_{i}}-\sum_{S_{j}\ni i}\frac{z_{j}^{(t)}}{f_{i}}\right)$	$\displaystyle\geq-\frac{\ln(2n)}{\alpha}-T\frac{\alpha^{2}}{\alpha}$
	$\displaystyle\frac{1}{T}\sum_{t=1}^{T}\left(1-\frac{x_{i}^{(t)}}{f_{i}}-\sum_{S_{j}\ni i}\frac{z_{j}^{(t)}}{f_{i}}\right)$	$\displaystyle\geq-\frac{\ln(2n)}{T\alpha}-\alpha$
	$\displaystyle 1-\frac{x_{i}}{f_{i}}-\sum_{S_{j}\ni i}\frac{z_{j}}{f_{i}}$	$\displaystyle\geq-\frac{\ln(2n)}{T\alpha}-\alpha$
	$\displaystyle\frac{x_{i}}{f_{i}}+\sum_{S_{j}\ni i}\frac{z_{j}}{f_{i}}$	$\displaystyle\leq 1+\frac{\ln(2n)}{T\alpha}+\alpha$
	$\displaystyle\frac{x_{i}}{f_{i}}+\sum_{S_{j}\ni i}\frac{z_{j}}{f_{i}}$	$\displaystyle\leq 1+O(\epsilon).$

The last inequality follows from choosing $T=\Theta(1/\epsilon^{2}\cdot\log n)$ and the fact that $\alpha=\epsilon\ln(2)$ ; furthermore, recall that the final solution $x_{i}=\frac{1}{T}\sum_{t=1}^{T}x_{i}^{(t)}$ and $z_{j}=\frac{1}{T}\sum_{t=1}^{T}z_{j}^{(t)}$ . Thus, the output of the algorithm satisfies the desired properties.

∎

References

[1] Aris Anagnostopoulos, Luca Becchetti, Ilaria Bordino, Stefano Leonardi, Ida Mele, and Piotr Sankowski. Stochastic query covering for fast approximate document retrieval. ACM Trans. Inf. Syst., 33(3):11:1–11:35, 2015.
[2] Sanjeev Arora, Elad Hazan, and Satyen Kale. The multiplicative weights update method: a meta-algorithm and applications. Theory Comput., 8(1):121–164, 2012.
[3] Sepehr Assadi. Tight space-approximation tradeoff for the multi-pass streaming set cover problem. In PODS, pages 321–335. ACM, 2017.
[4] Sepehr Assadi and Sanjeev Khanna. Tight bounds on the round complexity of the distributed maximum coverage problem. In SODA, pages 2412–2431. SIAM, 2018.
[5] Sepehr Assadi, Sanjeev Khanna, and Yang Li. Tight bounds for single-pass streaming complexity of the set cover problem. SIAM J. Comput., 50(3), 2021.
[6] Philip Cervenjak, Junhao Gan, Seeun William Umboh, and Anthony Wirth. Maximum unique coverage on streams: Improved FPT approximation scheme and tighter space lower bound. In APPROX/RANDOM, volume 317 of LIPIcs, pages 25:1–25:23. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2024.
[7] Amit Chakrabarti, Andrew McGregor, and Anthony Wirth. Improved algorithms for maximum coverage in dynamic and random order streams. CoRR, abs/2403.14087, 2024.
[8] Rafael da Ponte Barbosa, Alina Ene, Huy L. Nguyen, and Justin Ward. A new framework for distributed submodular maximization. In FOCS, pages 645–654. IEEE Computer Society, 2016.
[9] Uriel Feige. A threshold of ln n for approximating set cover. J. ACM, 45(4):634–652, 1998.
[10] Sariel Har-Peled, Piotr Indyk, Sepideh Mahabadi, and Ali Vakilian. Towards tight bounds for the streaming set cover problem. In PODS, pages 371–383. ACM, 2016.
[11] Dorit S Hochbaum and Anu Pathria. Analysis of the greedy approach in problems of maximum k-coverage. Naval Research Logistics (NRL), 45(6):615–627, 1998.
[12] Piotr Indyk, Sepideh Mahabadi, Ronitt Rubinfeld, Jonathan R. Ullman, Ali Vakilian, and Anak Yodpinyanee. Fractional set cover in the streaming model. In APPROX-RANDOM, volume 81 of LIPIcs, pages 12:1–12:20. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2017.
[13] Piotr Indyk and Ali Vakilian. Tight trade-offs for the maximum k-coverage problem in the general streaming model. In PODS, pages 200–217. ACM, 2019.
[14] Stephen Jaud, Anthony Wirth, and Farhana Murtaza Choudhury. Maximum coverage in sublinear space, faster. In SEA, volume 265 of LIPIcs, pages 21:1–21:20. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2023.
[15] Howard J. Karloff, Siddharth Suri, and Sergei Vassilvitskii. A model of computation for mapreduce. In SODA, pages 938–948. SIAM, 2010.
[16] David Kempe, Jon M. Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. Theory Comput., 11:105–147, 2015.
[17] Sanjeev Khanna, Christian Konrad, and Cezar-Mihail Alexandru. Set cover in the one-pass edge-arrival streaming model. In PODS, pages 127–139. ACM, 2023.
[18] Andreas Krause and Carlos Guestrin. Near-optimal observation selection using submodular functions. In AAAI, pages 1650–1654. AAAI Press, 2007.
[19] Ravi Kumar, Benjamin Moseley, Sergei Vassilvitskii, and Andrea Vattani. Fast greedy algorithms in mapreduce and streaming. ACM Trans. Parallel Comput., 2(3):14:1–14:22, 2015.
[20] Richard E. Ladner and Michael J. Fischer. Parallel prefix computation. J. ACM, 27(4):831–838, 1980.
[21] Paul Liu and Jan Vondrák. Submodular optimization in the mapreduce model. In SOSA, volume 69 of OASIcs, pages 18:1–18:10. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.
[22] Andrew McGregor, David Tench, and Hoa T. Vu. Maximum coverage in the data stream model: Parameterized and generalized. In ICDT, volume 186 of LIPIcs, pages 12:1–12:20. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021.
[23] Andrew McGregor and Hoa T. Vu. Better streaming algorithms for the maximum coverage problem. Theory Comput. Syst., 63(7):1595–1619, 2019.
[24] Barna Saha and Lise Getoor. On maximum coverage in the streaming model & application to multi-topic blog-watch. In SDM, pages 697–708. SIAM, 2009.
[25] David Steurer. Max coverage problem, lecture notes. https://www.cs.cornell.edu/courses/cs4820/2014sp/notes/maxcoverage.pdf, 2014. Accessed: 2024-09-14.
[26] Rowan Warneke, Farhana Murtaza Choudhury, and Anthony Wirth. Maximum coverage in random-arrival streams. In ESA, volume 274 of LIPIcs, pages 102:1–102:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2023.

Appendix A Omitted Proofs

Proof of Lemma 1.

Interpret $\{y_{1}/k,y_{2}/k,\ldots,y_{m}/k\}$ as the probabilities for the sets $S_{1},S_{2},\ldots,S_{m}$ . The central machine samples $k$ sets independently according to this distribution. In expectation, we cover at least $(1-1/e)L$ elements (see [25]). Hence, in expectation, the number of uncovered elements is at most $L/e$ . By Markov’s inequality, the probability that the number of uncovered elements is more than $(1+\epsilon)L/e$ is at most $1/(1+\epsilon)\leq e^{-\epsilon/2}$ . So, if we do this $\Theta(1/\epsilon\cdot\log m)$ times, the probability that the best solution covers fewer than $(1-1/e-1/(1+\epsilon))L$ elements is at most

\text{exp}\left(-\epsilon/2\cdot\Theta(1/\epsilon\cdot\log m)\right)\leq 1/\operatorname{poly}(m).

After the central machine forms $\Theta(1/\epsilon\cdot\log m)$ solutions (each solution consisting of at most $k$ sets) as described above, it broadcasts these solutions in batches of size $1/\epsilon$ to all other machines. Note that each batch contains $O(\log m)$ solutions.

For each batch, each machine receives a message of size $\tilde{O}\left(k\right)$ . For each of these $O(\log m)$ solutions in the batch, the machines can compute its coverage as follows. Let $v_{j}$ be the characteristic vector of the set $S_{j}$ that was chosen. We can aggregate the vectors $v_{j}$ that are in the solution in $O(\log m)$ rounds and count the number of non-zero entries using a binary broadcast tree. We do this in parallel for all solutions in the batch. Finally, we repeat this process for all $1/\epsilon$ batches. ∎

Massively Parallel Maximum Coverage Revisited 111The conference version of this manuscript is to appear in the 50th International Conference on Current Trends in Theory and Practice of Computer Science.

Abstract

1 Introduction

The MPC model.

Previous work.

Our result.

Theorem 1.

Remark.

Our techniques and paper organization.

Preliminaries.

2 Algorithm

2.1 The main algorithm

Linear programming (re)formulation.

Fact 1.

Theorem 2.

Lemma 1.

Lemma 2.

Proof.

Lemma 3.

Proof.

Putting it all together.

Reducing the number of rounds to O​(1/ϵ3⋅log⁡m⋅(log⁡m+log⁡(1/ϵ)))O(1/\epsilon^{3}\cdot\log m\cdot(\log m+\log(1/\epsilon))).

Bounded frequency.

2.2 Approximate the LP’s solution via multiplicative weights

Oracle implementation.

Lemma 4.

Proof.

Solving the LP via multiplicative weights.

Lemma 5.

Proof.

Lemma 6.

Proof.

References

Appendix A Omitted Proofs

Proof of Lemma 1.

Massively Parallel Maximum Coverage Revisited ¹¹1The conference version of this manuscript is to appear in the 50th International Conference on Current Trends in Theory and Practice of Computer Science.

Reducing the number of rounds to $O(1/\epsilon^{3}\cdot\log m\cdot(\log m+\log(1/\epsilon)))$ .