Massively Parallel Maximum Coverage Revisited¹

¹ The conference version of this manuscript is to appear in the 50th International Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM).
Abstract
We study the maximum set coverage problem in the massively parallel computation (MPC) model. In this setting, $m$ sets that are subsets of a universe of $n$ elements are distributed among machines. In each round, these machines can communicate with each other, subject to the memory constraint that no machine may use more than $\tilde{O}(n)$ memory. The objective is to find the $k$ sets whose coverage is maximized. We consider the regime where $k = \Omega(m)$ (i.e., $k/m = \Theta(1)$), $m = O(n)$, and each machine has $\tilde{O}(n)$ memory.²

² The input size is $O(mn)$ and each machine has enough memory to store a constant number of sets.
Maximum coverage is a special case of the submodular maximization problem subject to a cardinality constraint. This problem can be approximated to within a factor $1-1/e$ using the greedy algorithm, but this approach is not directly applicable to parallel and distributed models. When $k = \Omega(m)$, to obtain a $1-1/e-\varepsilon$ approximation, previous work either requires $\tilde{\Theta}(mn)$ memory per machine, which is not interesting compared to the trivial algorithm that sends the entire input to a single machine, or requires $2^{O(1/\varepsilon)}\, n$ memory per machine, which is prohibitively expensive even for a moderately small value of $\varepsilon$.
Our result is a randomized $(1-1/e-\varepsilon)$-approximation algorithm that uses $\tilde{O}(1/\varepsilon^3)$ rounds, where $\tilde{O}$ suppresses factors polylogarithmic in $m$ and $1/\varepsilon$. Our algorithm involves solving a slightly transformed linear program of the maximum coverage problem using the multiplicative weights update method, classic techniques in parallel computing such as parallel prefix, and various combinatorial arguments.
1 Introduction
Maximum coverage is a classic NP-hard problem. In this problem, we have $m$ sets $S_1, \ldots, S_m$ that are subsets of a universe $U$ of $n$ elements. The goal is to find $k$ sets that together cover the maximum number of elements. In the offline model, the greedy algorithm achieves a $1-1/e$ approximation, and assuming $\mathrm{P} \neq \mathrm{NP}$, this approximation is the best possible in polynomial time [9].
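For concreteness, the following is a minimal single-machine sketch of the greedy baseline (the function and the toy instance are our own illustration, not part of this paper's contribution):

```python
def greedy_max_coverage(sets_list, k):
    """Repeatedly pick the set covering the most still-uncovered elements;
    this achieves the classic 1 - 1/e approximation for maximum coverage."""
    covered, chosen = set(), []
    for _ in range(k):
        best = max(range(len(sets_list)), key=lambda i: len(sets_list[i] - covered))
        chosen.append(best)
        covered |= sets_list[best]
    return chosen, covered

# Toy instance: m = 4 sets over a universe of n = 6 elements, k = 2.
sets_list = [{0, 1, 2}, {2, 3}, {3, 4, 5}, {0, 5}]
print(greedy_max_coverage(sets_list, 2))  # ([0, 2], {0, 1, 2, 3, 4, 5})
```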
However, the greedy algorithm for maximum coverage and the related set cover problem is not friendly to streaming, distributed, and massively parallel computing. A large body of work has been devoted to designing algorithms for these problems in big data computation models. An incomplete list of works includes [23, 13, 22, 17, 5, 12, 4, 7, 10, 6, 26, 24, 3, 14].
Some example applications of maximum coverage include facility and sensor placement [18], circuit layout and job scheduling [11], information retrieval [1], market design [16], data summarization [24], and social network analysis [14].
The MPC model.
We consider the massively parallel computation (MPC) model in which $m$ sets are distributed among $m$ machines. Each machine has $\tilde{O}(n)$ memory and holds a set. In each round, each machine can communicate with the others, subject to the constraint that no machine receives messages of total size more than $\tilde{O}(n)$. Similar to previous work in the literature, we assume that $m = O(n)$.
The MPC model, introduced by Karloff, Suri, and Vassilvitskii [15], is an abstraction of various modern computing paradigms such as MapReduce, Hadoop, and Spark.
Previous work.
This problem is a special case of submodular maximization subject to a cardinality constraint. The results of Liu and Vondrák [21], Barbosa et al. [8], and Kumar et al. [19] typically require that each machine has enough memory to store $\Theta(k)$ items, which are sets in our case (and storing a set requires up to $\Theta(n)$ memory), with $\tilde{O}(m/k)$ machines. When $k = \Omega(m)$ (e.g., $k = m/10$), this means that a single machine may need $\Theta(mn)$ memory. This is not better than the trivial algorithm that sends the entire input to a single machine and solves the problem in 1 round.
Assadi and Khanna gave a randomized $(1-1/e-\varepsilon)$-approximation algorithm in which each machine has $2^{O(1/\delta)}\, n$ memory and the number of machines is $\tilde{O}(m)$, for any $\delta > 0$ (see Corollary 10 in the full paper of [4]). Setting $\delta = \Theta(\varepsilon)$ gives us a $1-1/e-\varepsilon$ approximation in $O(1/\varepsilon)$ rounds with machines each of which uses $2^{O(1/\varepsilon)}\, n$ memory. While Assadi and Khanna's result is nontrivial in this regime, the dependence on $1/\varepsilon$ is exponential, and if $n$ is large, then even a moderately small value of $\varepsilon$ can lead to a prohibitively large memory requirement $2^{O(1/\varepsilon)}\, n$. Their work however can handle the case where $k = o(m)$.
Our result.
We present a relatively simple randomized algorithm that achieves a $1-1/e-\varepsilon$ approximation in $\tilde{O}(1/\varepsilon^3)$ rounds with $\tilde{O}(n)$ memory per machine, assuming $k = \Omega(m)$. Our space requirement does not depend on $\varepsilon$ at all, in contrast to the exponential dependence on $1/\varepsilon$ in Assadi and Khanna's result.
We note that assuming $k = \Omega(m)$ does not make the problem any easier, since there are still exponentially many solutions to consider. In practice, one can think of many applications where one can utilize a constant fraction of the available sets (e.g., $k = m/2$ or $k = m/3$). We state our main result as a theorem below.
Theorem 1.
Assume $k = \Omega(m)$ and there are $m$ machines, each of which has $\tilde{O}(n)$ memory. There exists an algorithm that with high probability finds $k$ sets that cover at least $(1-1/e-\varepsilon)\,\mathrm{OPT}$ elements (where $\mathrm{OPT}$ denotes the optimal coverage) in $\tilde{O}(1/\varepsilon^3)$ rounds, where $\tilde{O}$ suppresses factors polylogarithmic in $m$ and $1/\varepsilon$.
If the maximum frequency $f$ (the maximum number of sets that any element belongs to) is bounded, we can drop the assumption that $k = \Omega(m)$, and parameterize the number of rounds based on $f$. In particular, we can obtain a $1-1/e-\varepsilon$ approximation in $\tilde{O}(1/\varepsilon^3)$ rounds, where the hidden factors are now polylogarithmic in $f$, $k$, and $1/\varepsilon$.
Remark.
We could easily modify our algorithm so that each machine uses $O(n)$ memory at the cost of a polylogarithmic-factor increase in the number of rounds. At least one $1/\varepsilon$ factor in the number of rounds is necessary based on the lower bound given by Corollary 9 of [4].
Randomization appears in two parts of our algorithm: the rounding step, and the subsampling step that reduces the polylogarithmic dependence of the round complexity on $n$ to a dependence on $m$ and $1/\varepsilon$. If we only need to compute an approximation to the optimal coverage value such that the output is in the interval $[(1-1/e-\varepsilon)\,\mathrm{OPT},\ \mathrm{OPT}]$, then we have a deterministic algorithm that runs in $\tilde{O}(1/\varepsilon^3)$ rounds, where the hidden factors are polylogarithmic in $n$. The algorithm by Assadi and Khanna [4] combines the sample-and-prune framework with threshold greedy. This strategy requires sampling sets, and it is unclear how to derandomize their algorithm even just to compute an approximation to the optimal coverage value.
Our techniques and paper organization.
In Section 2.1, we transform the standard linear program for the maximum coverage problem into an equivalent packing linear program that can be solved "approximately" by the multiplicative weights update method. At a high level, the multiplicative weights update method gives us a fractional solution that is a bi-criteria approximation, where $(1+O(\varepsilon))k$ "fractional" sets cover at least $(1-O(\varepsilon))\,\mathrm{OPT}$ "fractional" elements. We then show how to find $k$ sets covering at least $(1-1/e-O(\varepsilon))\,\mathrm{OPT}$ elements from this fractional solution through a combinatorial argument and parallel prefix.
Section 2.2 outlines the details of solving the transformed linear program in the MPC model. While this part is an adaptation of the standard multiplicative weights update method, an implementation in the MPC model requires some additional details, such as bounding the number of bits needed to represent the weights. All missing proofs can be found in the appendix.
Preliminaries.
In this work, we will always consider the case where each machine has $\tilde{O}(n)$ memory and $m = O(n)$. Without loss of generality, we may assume that non-central machine $i$ stores the set $S_i$. For each element $e$, we use $f_e$ to denote the number of sets that $e$ is in; this is also referred to as the frequency of $e$. The frequency vector $\mathbf{f} = (f_1, \ldots, f_n)$ can be computed in $O(\log m)$ rounds and broadcast to all machines: each machine starts with the characteristic vector of the set that it holds, $\mathbf{f}$ is just the sum of the characteristic vectors of the sets, and we can aggregate the vectors in $O(\log m)$ rounds using the standard binary tree aggregation algorithm.
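As an illustration, the following sketch simulates this aggregation in a single process (the pairing order and all names are our own; in the MPC model, each vector addition below corresponds to one machine-to-machine message):

```python
def tree_aggregate(vectors):
    """Binary-tree aggregation: in each round, pair up the machines that still
    hold partial sums and add the two vectors in every pair, so the number of
    partial sums halves each round and O(log m) rounds suffice."""
    sums = [list(v) for v in vectors]       # each machine starts with its own vector
    step, rounds = 1, 0
    while step < len(sums):
        for i in range(0, len(sums) - step, 2 * step):
            sums[i] = [a + b for a, b in zip(sums[i], sums[i + step])]
        step *= 2
        rounds += 1
    return sums[0], rounds                  # one machine ends with the total f

# Characteristic vectors of 3 sets over a universe of 4 elements.
f, r = tree_aggregate([[1, 1, 0, 0], [0, 1, 1, 0], [0, 1, 0, 1]])
print(f, r)  # [1, 3, 1, 1] after 2 rounds
```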
Since in this work the dependence on $1/\varepsilon$ is polynomial, a $1-1/e-O(\varepsilon)$ approximation can easily be translated to a $1-1/e-\varepsilon$ approximation by scaling $\varepsilon$ down by a constant factor. We can also assume that $k$ is at least a sufficiently large constant; otherwise, we can simulate the greedy algorithm in $O(k \log m) = O(\log m)$ rounds. For the sake of exposition, we will not attempt to optimize the constants in our algorithm and analysis. Finally, in this work we consider $1 - 1/\operatorname{poly}(m)$ as a "high probability". We use $\mathbb{1}[\mathcal{E}]$ to denote the indicator variable of the event $\mathcal{E}$ that is 1 if $\mathcal{E}$ happens and 0 otherwise.
2 Algorithm
2.1 The main algorithm
Linear programming (re)formulation.
We first recall the relaxed linear program (LP) for the maximum coverage problem:
$$\begin{aligned} \text{maximize} \quad & \sum_{e \in U} z_e \\ \text{(s.t.)} \quad & \sum_{S : e \in S} x_S \;\ge\; z_e \qquad \forall e \in U \\ & \sum_{S} x_S \;\le\; k \\ & 0 \le x_S \le 1 \ \ \forall S, \qquad 0 \le z_e \le 1 \ \ \forall e \in U \end{aligned}$$
We first reformulate this LP and then approximately solve the new LP using the multiplicative weights update method [2]. For each $e \in U$, let $f_e = |\{S : e \in S\}|$, as in the preliminaries. We have the following fact.
Fact 1.
For each $e \in U$, $\displaystyle \sum_{S : e \in S} x_S \;=\; f_e - \sum_{S : e \in S} (1 - x_S).$
Note that if $0 \le x_S \le 1$ for all $S$ and $0 \le z_e \le 1$ for all $e$, then by Fact 1 the constraint $\sum_{S : e \in S} x_S \ge z_e$ holds if and only if $z_e + \sum_{S : e \in S} (1 - x_S) \le f_e$. Thus, it is not hard to see that the original LP is equivalent to the following LP, which we will refer to as $\mathcal{P}$.
$$\begin{aligned} \text{maximize} \quad & \sum_{e \in U} z_e \\ \text{(s.t.)} \quad & z_e + \sum_{S : e \in S} (1 - x_S) \;\le\; f_e \qquad \forall e \in U \\ & \sum_{S} x_S \;\le\; k \\ & 0 \le x_S \le 1 \ \ \forall S, \qquad 0 \le z_e \le 1 \ \ \forall e \in U \end{aligned}$$
In this section, we will assume the existence of an MPC algorithm that approximately solves the linear program $\mathcal{P}$ in $O\!\left(\frac{\log n}{\varepsilon^3}\right)$ rounds. The proof will be deferred to Section 2.2.
Theorem 2.
There is an algorithm that finds $x \in [0,1]^m$ and $z \in [0,1]^n$ such that

1. $\sum_{e \in U} z_e \ge (1-\varepsilon)\,\mathrm{OPT}$,

2. $\sum_{S} x_S \le k$, and

3. $z_e + \sum_{S : e \in S} (1 - x_S) \le (1+\varepsilon)\, f_e$ for all $e \in U$

in $O\!\left(\frac{\log n}{\varepsilon^3}\right)$ rounds.
Let $x$ and $z$ be the output given by Theorem 2, and let $c$ be a constant such that $m \le ck$ (recall that $k = \Omega(m)$). Then, let $\hat{x}_S = \frac{x_S + \varepsilon}{1+\varepsilon}$ for each $S$, and $\hat{z}_e = \frac{z_e}{1+\varepsilon}$ for each $e$. We have
$$\sum_{e} \hat{z}_e \;=\; \frac{1}{1+\varepsilon} \sum_{e} z_e \;\ge\; \frac{1-\varepsilon}{1+\varepsilon}\,\mathrm{OPT} \;\ge\; (1-2\varepsilon)\,\mathrm{OPT}, \tag{1}$$

$$\sum_{S} \hat{x}_S \;=\; \frac{\sum_S x_S + \varepsilon m}{1+\varepsilon} \;\le\; \frac{k + \varepsilon c k}{1+\varepsilon} \;\le\; (1+c\varepsilon)\, k, \tag{2}$$

$$\sum_{S : e \in S} \hat{x}_S \;=\; \frac{\sum_{S : e \in S} x_S + \varepsilon f_e}{1+\varepsilon} \;\ge\; \frac{z_e}{1+\varepsilon} \;=\; \hat{z}_e \qquad \forall e \in U. \tag{3}$$
The inequality in (3) uses the third guarantee of Theorem 2, rearranged as $\sum_{S : e \in S} x_S \ge z_e - \varepsilon f_e$. Thus, by setting $\hat{x}$ and $\hat{z}$ as above, we have an approximate solution to the LP in which at most $(1+c\varepsilon)k$ "fractional" sets cover at least $(1-2\varepsilon)\,\mathrm{OPT}$ "fractional" elements.
We can then apply the standard randomized rounding to find a sub-collection of at most $(1+O(\varepsilon))k$ sets that covers at least $(1-1/e-O(\varepsilon))\,\mathrm{OPT}$ elements. For the sake of completeness, we will provide the rounding algorithm in the MPC model in the following lemma. The proof can be found in Appendix A.
Lemma 1.
Suppose $x \in [0,1]^m$ and $z \in [0,1]^n$ satisfy:

1. $\sum_{e \in U} z_e \ge (1 - O(\varepsilon))\,\mathrm{OPT}$,

2. $\sum_{S : e \in S} x_S \ge z_e$ for all $e \in U$,

3. $\sum_{S} x_S \le k$,

4. $0 \le x_S \le 1$ and $0 \le z_e \le 1$ for all $S$ and all $e \in U$.

Then there exists a rounding algorithm that finds a sub-collection of $(1+\varepsilon)k$ sets that in expectation cover at least $(1-1/e)\sum_e z_e$ elements in 1 round. To obtain a high probability guarantee, the algorithm requires $\tilde{O}(1/\varepsilon)$ rounds to find $(1+\varepsilon)k$ sets that cover at least $(1-1/e-O(\varepsilon))\sum_e z_e$ elements.
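The following single-process sketch mirrors this rounding step (the dummy index, trial count, and all names are our own; in the MPC implementation, the coverage of each trial is evaluated with the aggregation primitive from the preliminaries):

```python
import random

def round_fractional_solution(x, sets_list, k, eps, trials):
    """Randomized rounding: each trial draws (1+eps)*k sets i.i.d., picking set S
    with probability x[S]/k (a dummy index -1 absorbs the leftover mass), and we
    keep the best trial; independent repetitions boost the success probability."""
    population = list(range(len(x))) + [-1]
    weights = [xs / k for xs in x] + [max(0.0, 1.0 - sum(x) / k)]
    draws = int((1 + eps) * k)
    best_sets, best_cover = [], set()
    for _ in range(trials):
        draw = random.choices(population, weights=weights, k=draws)
        chosen = {i for i in draw if i != -1}
        covered = set().union(*(sets_list[i] for i in chosen))
        if len(covered) > len(best_cover):
            best_sets, best_cover = sorted(chosen), covered
    return best_sets, best_cover
```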
Applying Lemma 1 to $\hat{x}$ and $\hat{z}$ with $(1+c\varepsilon)k$ in place of $k$, we obtain a sub-collection of at most $(1+\varepsilon')k$ sets, for some $\varepsilon' = O(\varepsilon)$, that covers at least $(1-1/e-O(\varepsilon))\,\mathrm{OPT}$ elements. That means we have found $(1+\varepsilon')k$ sets that cover at least $(1-1/e-O(\varepsilon))\,\mathrm{OPT}$ elements. The next lemma shows that we can find $k$ sets among these that cover at least $(1-1/e-O(\varepsilon))\,\mathrm{OPT}$ elements. The proof is a combination of a counting argument and the well-known parallel prefix algorithm [20].
We rely on the following result, which is a simulation of parallel prefix in our setting.
Lemma 2.
Suppose there are $m$ sets $A_1, \ldots, A_m$ and machine $i$ holds the set $A_i$. Then Algorithm 1 computes $P_i = A_1 \cup A_2 \cup \cdots \cup A_i$ for all $i$ in $O(\log m)$ rounds.
Proof.
We first show how to compute $P_i$ for all even $i$ in $O(\log m)$ rounds, where machine $i$ holds $P_i$ at the end. Once this is done, machine $i$ can send $P_i$ to machine $i+1$, and machine $i+1$ can compute $P_{i+1} = P_i \cup A_{i+1}$.
The algorithm operates recursively. In one round, machine $2i-1$ sends $A_{2i-1}$ to machine $2i$, then machine $2i$ computes $B_i = A_{2i-1} \cup A_{2i}$. Assuming $m$ is even, the algorithm recursively computes the prefix unions of $B_1, \ldots, B_{m/2}$ on machines $2, 4, \ldots, m$. After the recursion, each machine with even index $2i$ holds the set $B_1 \cup \cdots \cup B_i = P_{2i}$. Then, in one round, each machine with odd index $2i+1$ communicates with machine $2i$ to learn about $P_{2i}$ and computes $P_{2i+1} = P_{2i} \cup A_{2i+1}$. If $m$ is odd, we just do the same on $A_1, \ldots, A_{m-1}$ and then compute $P_m = P_{m-1} \cup A_m$ in one round.
There are $O(\log m)$ recursion levels, each of which uses $O(1)$ rounds; therefore, the total number of rounds is $O(\log m)$. ∎
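A compact single-process simulation of the recursion just described (our own rendering, with 0-indexed machines; Algorithm 1 itself is the distributed version):

```python
def parallel_prefix_union(sets_list):
    """Recursive parallel prefix (cf. Lemma 2): one round forms pairwise unions,
    the recursion solves the half-size instance, and one more round fixes up the
    remaining positions, for O(log m) rounds overall.
    Returns P_i = A_1 ∪ ... ∪ A_i for every i."""
    m = len(sets_list)
    if m == 1:
        return [set(sets_list[0])]
    pairs = [sets_list[j] | sets_list[j + 1] for j in range(0, m - 1, 2)]
    if m % 2 == 1:
        pairs.append(set(sets_list[-1]))
    q = parallel_prefix_union(pairs)          # prefixes of the pairwise unions
    prefix = [set()] * m
    for j in range(len(pairs)):
        if 2 * j + 1 < m:
            prefix[2 * j + 1] = q[j]          # these positions equal a pair prefix
        prefix[2 * j] = (q[j - 1] if j else set()) | sets_list[2 * j]
    if m % 2 == 1:
        prefix[m - 1] = q[-1]
    return prefix

A = [{1, 2}, {2, 3}, {4}, {1, 5}]
print(parallel_prefix_union(A))  # [{1,2}, {1,2,3}, {1,2,3,4}, {1,2,3,4,5}]
```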
Lemma 3.
Let $\mathcal{T}$ be a collection of $(1+\varepsilon)k$ sets whose union contains $C$ elements, where $\varepsilon \in (0,1]$. Then there exist $k$ sets in $\mathcal{T}$ whose union contains at least $(1-\varepsilon)C$ elements. Furthermore, we can find these sets in $O(\log m)$ rounds.
Proof.
Consider the following quantities: for each $A_i \in \mathcal{T}$, let
$$c_i \;=\; \big| A_i \setminus (A_1 \cup \cdots \cup A_{i-1}) \big| ,$$
with $c_1 = |A_1|$.

Clearly, $\sum_i c_i = C$. We say $A_i$ is responsible for element $e$ if $i$ is the smallest index such that $e \in A_i$. This establishes a one-to-one correspondence between the covered elements and the sets responsible for them; $A_i$ is responsible for exactly $c_i$ elements. Furthermore, if we remove some sets from $\mathcal{T}$ and an element becomes uncovered, the set responsible for that element must have been removed. Thus, if we remove the $\varepsilon k$ sets corresponding to the smallest $c_i$, then at most $\frac{\varepsilon k}{(1+\varepsilon)k} \cdot C$ elements will not have a responsible set. Thus, the number of elements that become uncovered is at most $\frac{\varepsilon}{1+\varepsilon} C \le \varepsilon C$.
To find these sets, we apply Lemma 2 with $(1+\varepsilon)k$ in place of $m$ and the sets of $\mathcal{T}$ in place of $A_1, \ldots, A_m$ to learn the quantities $c_i$ in $O(\log m)$ rounds. We then remove the sets corresponding to the $\varepsilon k$ smallest $c_i$ and output the remaining $k$ sets. ∎
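Combining the two sketches above gives the pruning step (again our own illustration):

```python
def prune_to_k(sets_list, k):
    """Lemma 3 sketch: compute c_i = |A_i \\ (A_1 ∪ ... ∪ A_{i-1})|, the number
    of elements A_i is responsible for, via the parallel-prefix sketch, then
    keep the k sets with the largest c_i (dropping the smallest ones)."""
    prefixes = parallel_prefix_union(sets_list)
    c = [len(sets_list[0])] + [
        len(sets_list[i] - prefixes[i - 1]) for i in range(1, len(sets_list))
    ]
    return sorted(sorted(range(len(sets_list)), key=lambda i: -c[i])[:k])
```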
Putting it all together.
We spend $O\!\left(\frac{\log n}{\varepsilon^3}\right)$ rounds to approximately solve the linear program $\mathcal{P}$. From there, we can round the solution to find a sub-collection of at most $(1+O(\varepsilon))k$ sets that cover at least $(1-1/e-O(\varepsilon))\,\mathrm{OPT}$ elements with high probability in $\tilde{O}(1/\varepsilon)$ rounds. We then apply Lemma 3 to find $k$ sets among these that cover at least $(1-1/e-O(\varepsilon))\,\mathrm{OPT}$ elements in $O(\log m)$ rounds. The total number of rounds is therefore dominated by the $O\!\left(\frac{\log n}{\varepsilon^3}\right)$ rounds spent solving the LP.
Reducing the dependence on $n$ in the number of rounds.
The described algorithm runs in $O\!\left(\frac{\log n}{\varepsilon^3}\right)$ rounds. Our main result in Theorem 1 states a stronger bound of $\tilde{O}(1/\varepsilon^3)$ rounds with hidden factors polylogarithmic in $m$ and $1/\varepsilon$ only. We achieve this by adopting the sub-sampling framework of McGregor and Vu [23].
Without loss of generality, we may assume that each element is covered by some set. If not, we can remove all of the elements that are not covered by any set using $O(\log m)$ rounds. Specifically, let $v_i$ be the characteristic vector of $S_i$. We can compute $\mathbf{v} = \sum_i v_i$ in $O(\log m)$ rounds using the standard converge-cast binary tree algorithm. We can then remove the elements that are not covered by any set (the elements corresponding to 0 entries in $\mathbf{v}$).
We now have $m$ sets covering all $n$ elements. Since $k = \Omega(m)$, we must have that $\mathrm{OPT} \ge (k/m)\, n = \Omega(n)$. McGregor and Vu showed that if one samples each element in the universe independently with probability $p = \Theta\!\left(\frac{k \log m}{\varepsilon^2\, \mathrm{OPT}}\right)$, then with high probability, if we run a $(1-1/e-\varepsilon)$-approximation algorithm on the subsampled universe, the solution will correspond to a $(1-1/e-O(\varepsilon))$-approximation on the original universe. We have just argued that $\mathrm{OPT} = \Omega(n)$, and therefore with high probability we sample $O\!\left(\frac{k \log m}{\varepsilon^2}\right) = O\!\left(\frac{m \log m}{\varepsilon^2}\right)$ elements, by appealing to the Chernoff bound and the fact that $k \le m$.
As a result, we may assume that $n = O\!\left(\frac{m \log m}{\varepsilon^2}\right)$, so that $\log n = O(\log m + \log(1/\varepsilon))$. This results in a $\tilde{O}(1/\varepsilon^3)$-round algorithm.
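A sketch of the subsampling step (the sampling rate below is our paraphrase of the McGregor–Vu guarantee with untuned constants, and all names are ours):

```python
import math
import random

def subsample_universe(sets_list, n, k, m, eps, opt_lower_bound):
    """Keep each element of the universe independently with probability
    p ~ k*log(m) / (eps^2 * OPT) and restrict every set to the sample; a good
    solution on the sample stays good on the full universe w.h.p."""
    p = min(1.0, k * math.log(m + 1) / (eps ** 2 * opt_lower_bound))
    sampled = {e for e in range(n) if random.random() < p}
    return [s & sampled for s in sets_list], sampled
```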
Bounded frequency.
Assuming $f$ is known, we can lift the assumption that $k = \Omega(m)$ and parameterize our algorithm based on $f$ instead. McGregor et al. [22] showed that the $O(fk/\varepsilon)$ largest sets contain a solution that covers at least $(1-\varepsilon)\,\mathrm{OPT}$ elements. We therefore can assume that $m = O(fk/\varepsilon)$ by keeping only the $O(fk/\varepsilon)$ largest sets, which can be identified in $O(\log m)$ rounds.
We set $\varepsilon' = \Theta(\varepsilon)$ and proceed to obtain a solution that covers at least $(1-1/e-O(\varepsilon))\,\mathrm{OPT}$ elements using at most $(1+O(\varepsilon))k$ sets as in the discussion above. Appealing to Lemma 3, we can find $k$ sets that cover at least $(1-1/e-O(\varepsilon))\,\mathrm{OPT}$ elements. The total number of rounds is $\tilde{O}(1/\varepsilon^3)$, where the hidden factors are now polylogarithmic in $f$, $k$, and $1/\varepsilon$.
2.2 Approximate the LP’s solution via multiplicative weights
Fix an objective value $v$. Let $K$ be the convex region defined by
$$K \;=\; \Big\{ (x, z) \in [0,1]^m \times [0,1]^n \;:\; \sum_{S} x_S \le k \ \text{ and } \ \sum_{e \in U} z_e \ge v \Big\}.$$
Note that if $v \le \mathrm{OPT}$, then $K \neq \emptyset$. Consider the following problem that asks to either correctly declare that there is no $(x, z) \in K$ with
$$z_e + \sum_{S : e \in S} (1 - x_S) \;\le\; f_e \qquad \forall e \in U,$$
or to output a solution $(x, z) \in K$ such that
$$z_e + \sum_{S : e \in S} (1 - x_S) \;\le\; (1+\varepsilon)\, f_e \qquad \forall e \in U.$$
Once we have such an algorithm, we can try different values of $v$ and return the solution corresponding to the largest $v$ that has a feasible solution. It suffices to try the powers of $(1+\varepsilon)$ between $1$ and $n$, so there are $O\!\left(\frac{\log n}{\varepsilon}\right)$ such guesses. We know that the guess $v$ where $(1-\varepsilon)\,\mathrm{OPT} \le v \le \mathrm{OPT}$ must result in a feasible solution.
To avoid introducing a $O\!\left(\frac{\log n}{\varepsilon}\right)$ factor in the number of rounds, we partition these guesses into $O(1/\varepsilon)$ batches, where each batch corresponds to $O(\log n)$ guesses. Algorithm copies that correspond to guesses in the same batch will run in parallel. This will only introduce a $O(\log n)$ factor in terms of memory used by each machine. By returning the solution corresponding to the largest feasible guess $v$, one attains Theorem 2.
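A small sketch of the guessing schedule (the batch size is a parameter; this split is one concrete choice consistent with the discussion, not the paper's fixed constant):

```python
def guess_batches(n, eps, batch_size):
    """Enumerate the O(log n / eps) guesses v = (1+eps)^i up to n and group
    them into batches: copies within one batch run in parallel (a batch_size
    factor in memory), while the batches themselves run one after another."""
    guesses, v = [], 1.0
    while v <= n:
        guesses.append(v)
        v *= 1.0 + eps
    return [guesses[i:i + batch_size] for i in range(0, len(guesses), batch_size)]
```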
Oracle implementation.
Given a weight vector $w$ in which $w_e \ge 0$ for all $e \in U$ and $\sum_e w_e = 1$, we first consider an easier feasibility problem $\mathcal{F}(w)$. It asks to either correctly declare that there is no $(x, z) \in K$ such that
$$\sum_{e \in U} w_e \Big( z_e + \sum_{S : e \in S} (1 - x_S) \Big) \;\le\; \sum_{e \in U} w_e f_e ,$$
or to output a solution $(x, z) \in K$ such that
$$\sum_{e \in U} w_e \Big( z_e + \sum_{S : e \in S} (1 - x_S) \Big) \;\le\; \sum_{e \in U} w_e f_e + \varepsilon . \tag{4}$$
That is, if the input is feasible, then output a solution $(x, z) \in K$ that approximately satisfies the weighted constraint. Otherwise, correctly conclude that the input is infeasible. In the multiplicative weights update framework, this is known as the approximate oracle.
Note that if there is a feasible solution to the original system of constraints, then there is a feasible solution to $\mathcal{F}(w)$, since taking the $w$-weighted combination of the constraints gives
$$\sum_{e} w_e \Big( z_e + \sum_{S : e \in S} (1 - x_S) \Big) \;\le\; \sum_{e} w_e f_e .$$
We can implement an oracle that solves the above feasibility problem as follows. First, observe that
$$\sum_{e} w_e \Big( z_e + \sum_{S : e \in S} (1 - x_S) \Big) \;=\; \sum_{e} w_e z_e + \sum_{S} (1 - x_S) \sum_{e \in S} w_e .$$
To ease the notation, define
$$\alpha_S \;=\; \sum_{e \in S} w_e .$$
We therefore want to check if there exists $(x, z) \in K$ such that
$$\sum_{e} w_e z_e + \sum_{S} (1 - x_S)\, \alpha_S \;\le\; \sum_{e} w_e f_e .$$
We will minimize the left hand side by minimizing each sum separately. We can indeed do this exactly. However, there is a subtle issue where we need to bound the number of bits required to represent $w_e$ and $\alpha_S$ given the memory constraint. To do this, we truncate the value of each $w_e$ after the $b$-th bit following the decimal point, where $b = \lceil \log_2(mn/\varepsilon) \rceil$. Note that this will result in an underestimate of each $\alpha_S$ by at most $n \cdot 2^{-b}$. In particular, let $\tilde{w}_e$ be $w_e$ after truncating the value of $w_e$ at the $b$-th bit after the decimal point, and let $\tilde{\alpha}_S = \sum_{e \in S} \tilde{w}_e$. For any $S$, we have
$$\alpha_S - \tilde{\alpha}_S \;=\; \sum_{e \in S} (w_e - \tilde{w}_e) \;\le\; n \cdot 2^{-b} \;\le\; \frac{\varepsilon}{m} .$$
The last inequality is based on the choice of $b$. Therefore, $\sum_S (\alpha_S - \tilde{\alpha}_S) \le \varepsilon$, so working with the truncated weights changes the left-hand side by at most an additive $\varepsilon$. Note that since $\tilde{\alpha}_S \ge 0$ and $\sum_S x_S \le k$, to minimize over $x$, we simply set $x_S = 1$ for the $k$ sets with the largest $\tilde{\alpha}_S$ and $x_S = 0$ for the remaining sets. Similarly, since $\sum_e z_e \ge v$, to minimize over $z$, we set $z_e = 1$ for the $\lceil v \rceil$ elements with the smallest $\tilde{w}_e$ and $z_e = 0$ otherwise (with one fractional entry if $v$ is not integral).
After setting $x$ and $z$ as above, if
$$\sum_{e} \tilde{w}_e z_e + \sum_{S} (1 - x_S)\, \tilde{\alpha}_S \;>\; \sum_{e} \tilde{w}_e f_e + \varepsilon,$$
then it is safe to declare that the system is infeasible. Otherwise, we have found $(x, z) \in K$ such that the weighted constraint holds up to an additive $O(\varepsilon)$, as required by Equation (4) after rescaling $\varepsilon$ by a constant factor.
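The oracle in one self-contained sketch (the truncation width, the integral-$v$ simplification, and all names are our own choices):

```python
def oracle(w, sets_list, f, k, v, b=20):
    """Approximate oracle sketch: truncate the weights to b fractional bits,
    minimize the weighted constraint sum over K by treating x and z separately
    (x = 1 on the top-k alpha values, z = 1 on the v smallest weights), then
    compare the two sides of the weighted constraint."""
    scale = 1 << b
    wt = [int(we * scale) / scale for we in w]                 # truncate downward
    alpha = [sum(wt[e] for e in S) for S in sets_list]         # alpha_S
    top = set(sorted(range(len(sets_list)), key=lambda s: -alpha[s])[:k])
    x = [int(s in top) for s in range(len(sets_list))]
    small = set(sorted(range(len(w)), key=lambda e: wt[e])[:v])
    z = [int(e in small) for e in range(len(w))]
    lhs = sum(wt[e] * z[e] for e in range(len(w))) + \
          sum((1 - x[s]) * alpha[s] for s in range(len(sets_list)))
    rhs = sum(wt[e] * f[e] for e in range(len(w)))
    return (x, z) if lhs <= rhs else None                      # None = infeasible
```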
Lemma 4.
Assume that all machines have the vector $\mathbf{w}$. We can solve the feasibility problem $\mathcal{F}(w)$ in $O(1)$ rounds, where all machines either learn that the system is infeasible or obtain an approximate solution $(x, z)$ that satisfies Equation (4).
Proof.
We have argued above that the oracle we design either correctly declares that the system is infeasible or outputs $(x, z)$ such that Equation (4) holds. Recall that each machine has the frequency vector $\mathbf{f}$, and suppose for the time being that each machine also has the vector $\mathbf{w}$. We can implement this oracle in $O(1)$ rounds as follows. Each machine $i$ computes $\tilde{\alpha}_{S_i}$, since it has $S_i$ and $\mathbf{w}$, and sends $\tilde{\alpha}_{S_i}$ to the central machine. Note that each $\tilde{w}_e$ can be represented using $O(\log(mn/\varepsilon))$ bits. Observe that $\tilde{\alpha}_{S_i}$ is the sum of at most $n$ different $\tilde{w}_e$: the number of bits in the fractional part does not increase, while the number of bits in the integer part increases by at most $O(\log n)$, as the sum is upper bounded by the crude bound $n \cdot \max_e \tilde{w}_e \le n$. Thus, the central machine receives at most $O(m \log(mn/\varepsilon)) = \tilde{O}(n)$ bits from all other machines.
The central machine will then set $x$ and $z$ as described above. This allows the central machine to correctly determine whether the system is infeasible or to find $(x, z)$ that satisfies Equation (4). Since the entries of $x$ and $z$ are binary, they can be sent to the non-central machines, with each machine receiving a message of $O(m + n) = O(n)$ bits. ∎
Solving the LP via multiplicative weights.
Once the existence of such an oracle is guaranteed, we can follow the multiplicative weights update framework to approximately solve the LP. We will first explain how to implement the MWU algorithm in the MPC model. See Algorithm 2.
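For orientation, here is a single-process sketch of the loop that Algorithm 2 implements (our reconstruction: the learning rate, the normalization by $f_e$, and the function names are our choices, not the paper's pseudocode):

```python
def mwu_solve(sets_list, f, k, v, n, eps, T):
    """MWU loop sketch: keep one weight per element constraint, query the
    oracle with normalized weights, and multiplicatively boost the weights of
    constraints that the oracle's answer loads heavily; the averaged answers
    are approximately feasible if no oracle call reports infeasibility."""
    eta = eps / 4.0
    w = [1.0] * n
    members = [[s for s, S in enumerate(sets_list) if e in S] for e in range(n)]
    xs, zs = [], []
    for _ in range(T):
        u = [w[e] / f[e] for e in range(n)]            # normalize by f_e ...
        total = sum(u)
        res = oracle([ue / total for ue in u], sets_list, f, k, v)  # ... and to sum 1
        if res is None:
            return None                                # this guess v is infeasible
        x, z = res
        xs.append(x)
        zs.append(z)
        for e in range(n):
            load = (z[e] + sum(1 - x[s] for s in members[e])) / f[e]  # in [0, 2]
            w[e] *= 1.0 + eta * load
    average = lambda vecs: [sum(col) / T for col in zip(*vecs)]
    return average(xs), average(zs)
```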
Lemma 5.
Algorithm 2 can be implemented in $O(T) = O\!\left(\frac{\log n}{\varepsilon^2}\right)$ rounds, where $T$ is the number of MWU iterations.
Proof.
Without loss of generality, we can round the learning rate $\eta = \varepsilon/4$ down to the nearest power of $1/2$. Then, $\eta$ can be represented using $O(\log(1/\varepsilon))$ bits, since we assume $1/\varepsilon = O(\operatorname{poly}(n))$.
Each machine maintains the current vectors $w^{(t)}$, $x^{(t)}$, and $z^{(t)}$, each of which has at most $\max(m, n) = O(n)$ entries. Since the entries in $x^{(t)}$ and $z^{(t)}$ are binary, we can represent them using $O(n)$ bits. Recall that, in the oracle described in Lemma 4, the central machine can broadcast $x^{(t)}$ and $z^{(t)}$ to all other machines in one round without violating the memory constraint.
It remains to show that the number of bits required to represent each entry of $w^{(t)}$ is polylogarithmic in all iterations. The central machine implicitly broadcasts $w^{(t)}$ to all other machines by sending $(x^{(t)}, z^{(t)})$ to all other machines in each iteration $t$; each machine can then update its local copy of the weights.
Note that each $w_e^{(t)}$ has the form
$$w_e^{(t)} \;=\; \prod_{t' = 1}^{t-1} \Big( 1 + \eta\, m_e^{(t')} \Big), \qquad \text{where } m_e^{(t')} \;=\; \frac{1}{f_e} \Big( z_e^{(t')} + \sum_{S : e \in S} \big( 1 - x_S^{(t')} \big) \Big).$$
For any $t' < t$, since $z_e^{(t')} \le 1$ and at most $f_e$ sets contain $e$, we have
$$0 \;\le\; m_e^{(t')} \;\le\; 2.$$
Since $\eta \le 1$, we have
$$1 \;\le\; w_e^{(t)} \;\le\; (1 + 2\eta)^{T} \;=\; 2^{O(T)} .$$
Thus, we can represent $w_e^{(t)}$ (truncated as in the oracle) using $O(T + \log(mn/\varepsilon))$ bits. This implies that the central machine can implicitly broadcast $w^{(t)}$ to all other machines in $O(1)$ rounds per iteration. Putting it all together, each of the $T$ iterations consists of: 1) a call to the oracle, which takes $O(1)$ rounds, 2) computing $w_e^{(t+1)}$ for each $e$ in step 7, which is done locally and requires no communication, and 3) the central machine broadcasting $(x^{(t)}, z^{(t)})$ in one round to all other machines before proceeding to the next iteration. ∎
The next lemma is an adaptation of the standard multiplicative weights update analysis.
Lemma 6.
The output of Algorithm 2 satisfies the following property. If there exists a feasible solution (i.e., some $(x, z) \in K$ with $z_e + \sum_{S : e \in S} (1 - x_S) \le f_e$ for all $e$), then the output $(\bar{x}, \bar{z}) \in K$ satisfies:
$$\bar{z}_e + \sum_{S : e \in S} (1 - \bar{x}_S) \;\le\; (1 + O(\varepsilon))\, f_e \qquad \forall e \in U.$$
Otherwise, the algorithm correctly concludes that the system is infeasible.
Proof.
If the algorithm does not output Infeasible, this implies that in each iteration $t$, $\sum_S x_S^{(t)} \le k$, and therefore $\sum_S \bar{x}_S \le k$ since $\bar{x} = \frac{1}{T} \sum_{t=1}^{T} x^{(t)}$. Similarly, in each iteration $t$, $\sum_e z_e^{(t)} \ge v$, and thus $\sum_e \bar{z}_e \ge v$ since $\bar{z} = \frac{1}{T} \sum_{t=1}^{T} z^{(t)}$. Hence, the output $(\bar{x}, \bar{z}) \in K$. Define the potential function
$$\Phi^{(t)} \;=\; \sum_{e \in U} w_e^{(t)} .$$
We will make use of the fact that $1 + \eta a \ge (1 + 2\eta)^{a/2}$ for $a \in [0, 2]$ (the left-hand side is linear in $a$, the right-hand side is convex, and the two sides agree at $a = 0$ and $a = 2$). Note that for all $e$ and $t$, we always have
$$m_e^{(t)} \;=\; \frac{1}{f_e} \Big( z_e^{(t)} + \sum_{S : e \in S} \big( 1 - x_S^{(t)} \big) \Big) \;\in\; [0, 2]$$
because $z_e^{(t)} \le 1$ and $\sum_{S : e \in S} (1 - x_S^{(t)}) \le f_e$. Set $\eta = \varepsilon/4$. As long as the oracle does not declare infeasibility, we have
$$\Phi^{(t+1)} \;=\; \sum_e w_e^{(t)} \big( 1 + \eta\, m_e^{(t)} \big) \;=\; \Phi^{(t)} + \eta \sum_e w_e^{(t)} m_e^{(t)} \;\le\; \big( 1 + \eta (1 + \varepsilon) \big)\, \Phi^{(t)} .$$
The last inequality follows from the fact that the oracle is guaranteed to find $x^{(t)}$ and $z^{(t)}$ such that
$$\sum_e w_e^{(t)} m_e^{(t)} \;\le\; (1 + \varepsilon)\, \Phi^{(t)} ,$$
which is Equation (4) invoked with the weights $w_e^{(t)} / f_e$ normalized to sum to one. Thus, by a simple induction,
$$\Phi^{(T+1)} \;\le\; \Phi^{(1)} \big( 1 + \eta (1 + \varepsilon) \big)^{T} \;\le\; n\, e^{\eta (1+\varepsilon) T} .$$
Recall that $w_e^{(1)} = 1$ for all $e$, hence $\Phi^{(1)} = n$, and we assume $\varepsilon \le 1$. Furthermore, recall that $w_e^{(T+1)} = \prod_{t=1}^{T} (1 + \eta\, m_e^{(t)})$. We have, for every element $e$,
$$(1 + 2\eta)^{\frac{1}{2} \sum_t m_e^{(t)}} \;\le\; w_e^{(T+1)} \;\le\; \Phi^{(T+1)} \;\le\; n\, e^{\eta (1+\varepsilon) T} .$$
We use the fact that $\ln(1 + 2\eta) \ge 2\eta (1 - \eta)$ for $\eta \in (0, 1/2)$ to get
$$\frac{1}{T} \sum_{t} m_e^{(t)} \;\le\; \frac{1 + \varepsilon}{1 - \eta} + \frac{\ln n}{\eta (1 - \eta)\, T} \;\le\; 1 + O(\varepsilon) .$$
The last inequality follows from choosing $T = \Theta\!\left( \frac{\log n}{\varepsilon^2} \right)$ and the fact that $\eta = \varepsilon/4$; furthermore, recall that the final solution is $\bar{x} = \frac{1}{T} \sum_t x^{(t)}$ and $\bar{z} = \frac{1}{T} \sum_t z^{(t)}$, and that $m_e^{(t)}$ is linear in $(x^{(t)}, z^{(t)})$. Therefore
$$\bar{z}_e + \sum_{S : e \in S} (1 - \bar{x}_S) \;=\; f_e \cdot \frac{1}{T} \sum_t m_e^{(t)} \;\le\; (1 + O(\varepsilon))\, f_e \qquad \forall e \in U .$$
Thus, the output of the algorithm satisfies the desired properties.
∎
References
- [1] Aris Anagnostopoulos, Luca Becchetti, Ilaria Bordino, Stefano Leonardi, Ida Mele, and Piotr Sankowski. Stochastic query covering for fast approximate document retrieval. ACM Trans. Inf. Syst., 33(3):11:1–11:35, 2015.
- [2] Sanjeev Arora, Elad Hazan, and Satyen Kale. The multiplicative weights update method: a meta-algorithm and applications. Theory Comput., 8(1):121–164, 2012.
- [3] Sepehr Assadi. Tight space-approximation tradeoff for the multi-pass streaming set cover problem. In PODS, pages 321–335. ACM, 2017.
- [4] Sepehr Assadi and Sanjeev Khanna. Tight bounds on the round complexity of the distributed maximum coverage problem. In SODA, pages 2412–2431. SIAM, 2018.
- [5] Sepehr Assadi, Sanjeev Khanna, and Yang Li. Tight bounds for single-pass streaming complexity of the set cover problem. SIAM J. Comput., 50(3), 2021.
- [6] Philip Cervenjak, Junhao Gan, Seeun William Umboh, and Anthony Wirth. Maximum unique coverage on streams: Improved FPT approximation scheme and tighter space lower bound. In APPROX/RANDOM, volume 317 of LIPIcs, pages 25:1–25:23. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2024.
- [7] Amit Chakrabarti, Andrew McGregor, and Anthony Wirth. Improved algorithms for maximum coverage in dynamic and random order streams. CoRR, abs/2403.14087, 2024.
- [8] Rafael da Ponte Barbosa, Alina Ene, Huy L. Nguyen, and Justin Ward. A new framework for distributed submodular maximization. In FOCS, pages 645–654. IEEE Computer Society, 2016.
- [9] Uriel Feige. A threshold of ln n for approximating set cover. J. ACM, 45(4):634–652, 1998.
- [10] Sariel Har-Peled, Piotr Indyk, Sepideh Mahabadi, and Ali Vakilian. Towards tight bounds for the streaming set cover problem. In PODS, pages 371–383. ACM, 2016.
- [11] Dorit S Hochbaum and Anu Pathria. Analysis of the greedy approach in problems of maximum k-coverage. Naval Research Logistics (NRL), 45(6):615–627, 1998.
- [12] Piotr Indyk, Sepideh Mahabadi, Ronitt Rubinfeld, Jonathan R. Ullman, Ali Vakilian, and Anak Yodpinyanee. Fractional set cover in the streaming model. In APPROX-RANDOM, volume 81 of LIPIcs, pages 12:1–12:20. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2017.
- [13] Piotr Indyk and Ali Vakilian. Tight trade-offs for the maximum k-coverage problem in the general streaming model. In PODS, pages 200–217. ACM, 2019.
- [14] Stephen Jaud, Anthony Wirth, and Farhana Murtaza Choudhury. Maximum coverage in sublinear space, faster. In SEA, volume 265 of LIPIcs, pages 21:1–21:20. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2023.
- [15] Howard J. Karloff, Siddharth Suri, and Sergei Vassilvitskii. A model of computation for mapreduce. In SODA, pages 938–948. SIAM, 2010.
- [16] David Kempe, Jon M. Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. Theory Comput., 11:105–147, 2015.
- [17] Sanjeev Khanna, Christian Konrad, and Cezar-Mihail Alexandru. Set cover in the one-pass edge-arrival streaming model. In PODS, pages 127–139. ACM, 2023.
- [18] Andreas Krause and Carlos Guestrin. Near-optimal observation selection using submodular functions. In AAAI, pages 1650–1654. AAAI Press, 2007.
- [19] Ravi Kumar, Benjamin Moseley, Sergei Vassilvitskii, and Andrea Vattani. Fast greedy algorithms in mapreduce and streaming. ACM Trans. Parallel Comput., 2(3):14:1–14:22, 2015.
- [20] Richard E. Ladner and Michael J. Fischer. Parallel prefix computation. J. ACM, 27(4):831–838, 1980.
- [21] Paul Liu and Jan Vondrák. Submodular optimization in the mapreduce model. In SOSA, volume 69 of OASIcs, pages 18:1–18:10. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.
- [22] Andrew McGregor, David Tench, and Hoa T. Vu. Maximum coverage in the data stream model: Parameterized and generalized. In ICDT, volume 186 of LIPIcs, pages 12:1–12:20. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021.
- [23] Andrew McGregor and Hoa T. Vu. Better streaming algorithms for the maximum coverage problem. Theory Comput. Syst., 63(7):1595–1619, 2019.
- [24] Barna Saha and Lise Getoor. On maximum coverage in the streaming model & application to multi-topic blog-watch. In SDM, pages 697–708. SIAM, 2009.
- [25] David Steurer. Max coverage problem, lecture notes. https://www.cs.cornell.edu/courses/cs4820/2014sp/notes/maxcoverage.pdf, 2014. Accessed: 2024-09-14.
- [26] Rowan Warneke, Farhana Murtaza Choudhury, and Anthony Wirth. Maximum coverage in random-arrival streams. In ESA, volume 274 of LIPIcs, pages 102:1–102:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2023.
Appendix A Omitted Proofs
Proof of Lemma 1.
Interpret $x_S / k$ as the probabilities for the sets $S_1, \ldots, S_m$ (adding a dummy outcome so that the probabilities sum to 1). The central machine samples $(1+\varepsilon)k$ sets independently according to this distribution. In expectation, we cover at least $\big(1 - e^{-(1+\varepsilon)}\big) \sum_e z_e$ elements (see [25]). Hence, in expectation, the uncovered "fractional" mass, i.e., $\sum_e z_e$ minus the coverage, is at most $e^{-(1+\varepsilon)} \sum_e z_e$. By Markov's inequality, the probability that this quantity is more than $e^{-1} \sum_e z_e$ is at most $e^{-\varepsilon} \le 1 - \varepsilon/2$. So, if we do this $t = \Theta\!\left(\frac{\log n}{\varepsilon}\right)$ times, the probability that the best solution covers fewer than $(1 - 1/e) \sum_e z_e$ elements is at most
$$(1 - \varepsilon/2)^{t} \;\le\; \frac{1}{\operatorname{poly}(n)} .$$
After the central machine forms $t$ solutions (each solution consisting of at most $(1+\varepsilon)k$ sets) as described above, it broadcasts these solutions in batches to all other machines, where the batch size $b$ is chosen so that each batch fits in $\tilde{O}(n)$ bits. Note that each batch contains $b$ solutions.
For each batch, each machine receives a message of size $O(b \cdot k \log m) = \tilde{O}(n)$. For each of the $b$ solutions in the batch, the machines can compute its coverage as follows. Let $v_i$ be the characteristic vector of the set $S_i$ if $S_i$ was chosen and the zero vector otherwise. We can aggregate the vectors that are in the solution in $O(\log m)$ rounds and count the number of non-zero entries using a binary broadcast tree. We do this in parallel for all solutions in the batch. Finally, we repeat this process for all batches. ∎