
Scalable Distributed Algorithms for Size-Constrained Submodular Maximization in the MapReduce and Adaptive Complexity Models

Yixin Chen [email protected]
Department of Computer Science & Engineering
Texas A&M University
College Station, TX
   Tonmoy Dey [email protected]
Department of Computer Science
Florida State University
Tallahassee, FL, USA
   Alan Kuhnle [email protected]
Department of Computer Science & Engineering
Texas A&M University
College Station, TX
Abstract

Distributed maximization of a submodular function in the MapReduce (MR) model has received much attention, culminating in two frameworks that allow a centralized algorithm to be run in the MR setting without loss of approximation, as long as the centralized algorithm satisfies a certain consistency property – which had previously only been known to be satisfied by the standard greedy and continuous greedy algorithms. A separate line of work has studied parallelizability of submodular maximization in the adaptive complexity model, where each thread may have access to the entire ground set. For the size-constrained maximization of a monotone and submodular function, we show that several sublinearly adaptive (highly parallelizable) algorithms satisfy the consistency property required to work in the MR setting, which yields practical, parallelizable and distributed algorithms. Separately, we develop the first distributed algorithm with linear query complexity for this problem. Finally, we provide a method to increase the maximum cardinality constraint for MR algorithms at the cost of additional MR rounds.

1 Introduction

Submodular maximization has become an important problem in data mining and machine learning with real world applications ranging from video summarization (Mirzasoleiman et al., 2018) and mini-batch selection (Joseph et al., 2019) to more complex tasks such as active learning (Rangwani et al., 2021) and federated learning (Balakrishnan et al., 2022). In this work, we study the size-constrained maximization of a monotone, submodular function (SMCC), formally defined in Section 1.3. Because of the ubiquity of problems requiring the optimization of a submodular function, a vast literature on submodular optimization exists; we refer the reader to the surveys (Liu et al., 2020; Liu, 2020).

Reference | Ratio | MR Rounds | Adaptivity (Θ) | Queries (Ξ) | k_max
RG | (1/2)(1 - 1/e) | 2 | O(k) | O(nk + k^2 ℓ) | O(n/ℓ^2)
PAlg | 1 - 1/e - ε | O(1/ε^2) | O(k/ε^2) | O(nk/ε^2 + k^2 ℓ^2/ε^4) | O(nε^2/ℓ^2)
DDist | 1 - 1/e - ε | O(1/ε) | O(k/ε) | O(nk/ε + k^2 ℓ^2/ε^2) | O(nε/ℓ^2)
BCG | 1 - ε | r | O((rk/ε^{2/r}) log^2(1/ε^{1/r})) | O(nkr/ε^{1/r} + k^2 ℓ r m^2(ε^{1/r})/ε^{3/r}) | O(nε^{1/r}/ℓ^2)
PG | 0.545 - ε | 2 | O(k^2) | O(nk + k^3 ℓ) | O(n/ℓ^2)
TG | 1/2 | 2 | O(n log(k)) | O(n log(k)) | O(n/(ℓ^2 log(k)))
R-DASH (Alg. 4) | (1/2)(1 - 1/e - ε) | 2 | O(log(k) log(n)/ε^4) | O(n log(k)/ε^4) | O(n/ℓ^2)
G-DASH (Alg. 5) | 1 - 1/e - ε | O(1/ε) | O(log(k) log(n)/ε^5) | O(n log(k)/ε^5) | O(nε/ℓ^2)
T-DASH (Alg. 10) | 3/8 - ε | 2 | O(log(n)/ε^3) | O(n log(k)/ε^4) | O(nε/(ℓ^2 log(k)))
L-Dist (Alg. 7) | 1/8 | 2 | O(n/ℓ) | O(n) | O(n/(ℓ^2 log(n)))
Table 1: Theoretical comparison of our algorithms to MapReduce model algorithms: RG of Barbosa et al. (2015), PAlg of Barbosa et al. (2016), DDist of Kazemi et al. (2021), BCG of Epasto et al. (2017), PG of Mirrokni and Zadimoghaddam (2015), and TG of Liu and Vondrak (2019). The column k_max provides the maximum cardinality constraint that the algorithm can compute.

A foundational result of Nemhauser et al. (1978) shows that a simple greedy algorithm (Greedy, pseudocode in Appendix H) achieves the optimal approximation ratio for SMCC of 1 - 1/e ≈ 0.63 in the value query model, in which the submodular function f is available to the algorithm as an oracle that returns f(S) when queried with set S. However, because of the modern revolution in data (Mao et al., 2021; Ettinger et al., 2021), there is a need for more scalable algorithms. Specifically, the standard greedy algorithm is a centralized algorithm that needs access to the whole dataset, makes many adaptive rounds (defined below) of queries to the submodular function, and has quadratic total query complexity in the worst case.

Distributed Algorithms for SMCC. Massive data sets are often too large to fit on a single machine and are distributed over a cluster of many machines. In this context, there has been a line of work developing algorithms for SMCC in the MapReduce model, defined formally in Section 1.3 (see Table 1 and the Related Work section below). Of these, PAlg (Barbosa et al., 2016) is a general framework that allows a centralized algorithm Alg to be converted to the MapReduce distributed setting with nearly the same theoretical approximation ratio, as long as Alg satisfies a technical, randomized consistency property (RCP), defined in Section 1.3.

In addition, DistributedDistorted (Kazemi et al., 2019) is a general framework that can adapt any Alg satisfying the RCP, as we show in Section 4.2 (in Kazemi et al. (2019), an instantiated version of DistributedDistorted with a distorted greedy algorithm is analyzed). However, to the best of our knowledge, the only centralized algorithms known to satisfy this property are the Greedy of Nemhauser et al. (1978) and the continuous greedy algorithm of Calinescu et al. (2007).

Parallelizable Algorithms for SMCC. A separate line of work, initiated by Balkanski and Singer (2018), has taken an orthogonal approach to scaling up the standard greedy algorithm: parallelizable algorithms for SMCC, as measured by the adaptive complexity of the algorithm, formally defined in Section 1.3. In this model, each thread may have access to the entire ground set, and hence these algorithms do not apply in a distributed setting. Altogether, these algorithms have exponentially improved the adaptivity of standard greedy from O(n) to O(log(n)) while obtaining nearly the same 1 - 1/e approximation ratio (Fahrbach et al., 2019a; Balkanski et al., 2019; Ene and Nguyen, 2019; Chekuri and Quanrud, 2019). Recently, highly practical sublinearly adaptive algorithms have been developed without compromising theoretical guarantees (Breuer et al., 2020; Chen et al., 2021). However, none of these algorithms have been shown to satisfy the RCP.

Combining Parallelizability and Distributed Settings. In this work, we study parallelizable, distributed algorithms for SMCC; each node of a distributed cluster typically has many processors, so we seek to take advantage of this by parallelizing the algorithm within each machine in the cluster. In the existing MapReduce algorithms, the number of machines ℓ could simply be set to the total number of processors available, treating each processor as a separate machine in the cluster, to ensure that all available resources are used. However, this approach has a number of disadvantages: 1) A large number of machines severely restricts the size of the cardinality constraint k, since in all of the MapReduce algorithms a set of size kℓ must be stored on a single machine; therefore, we must have k ≤ O(Ψ/ℓ) = O(n/ℓ^2). For example, with n = 10^6, setting ℓ = 256 implies that k ≤ 15·C, for some constant C. However, if each machine has 8 processor cores, the number of machines ℓ can be set to 32 (with the algorithm parallelized on each machine), which still uses 256 processors while allowing k ≤ 976·C. As the number of cores and processors per physical machine continues to grow, large practical benefits can be obtained by a parallelized and distributed algorithm. 2) Modern CPU architectures have many cores that all share a common memory, and it is inefficient to treat communication between these cores as being as expensive as communication between separate machines in a cluster.
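As a quick sanity check of this arithmetic (illustrative only; the constant C hidden in the O(·) bound is taken to be 1), the following Python snippet reproduces the two k_max values quoted above.

    # Back-of-the-envelope check of the k <= O(n / l^2) bound discussed above.
    # The constant hidden in the O(.) is taken to be 1 (i.e., C = 1).
    n = 10**6

    def k_max(num_machines):
        # Largest k when a set of size k * num_machines must fit on one machine
        # holding Psi = n / num_machines elements.
        return n // num_machines**2

    print(k_max(256))  # 15  -- every processor core treated as its own machine
    print(k_max(32))   # 976 -- 32 machines, each parallelizing over its 8 cores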

In addition to the MapReduce and adaptive complexity models, the total query complexity of the algorithm to the submodular oracle is also relevant to the scalability of the algorithm. Frequently, evaluation of the submodular function is expensive, so the time spent answering oracle queries dominates the other parts of the computation. Therefore, the following questions are posed: Q1: Is it possible to design constant-factor approximation distributed algorithms with 1) constant number of MapReduce rounds; 2) sublinear adaptive complexity (highly-parallelizable); and 3) nearly linear total query complexity? Q2: Can we design practical distributed algorithms that also meet these three criteria?

1.1 Contributions

In overview, the contributions of the paper are: 1) an exponential improvement in adaptivity (from Ω(n) to O(log n)) for an algorithm with a constant number of MR rounds; 2) the first linear-time, constant-factor algorithm with constant MR rounds; 3) a way to increase the size-constraint limitation at the cost of additional MR rounds; and 4) an empirical evaluation on a cluster of 64 machines demonstrating that an order-of-magnitude improvement in runtime is obtained by combining the MR and adaptive complexity models.

To develop MR algorithms with sublinear adaptivity, we first modify and analyze the low-adaptive algorithm ThreshSeqMod from Chen et al. (2021) and show that our modification satisfies the RCP (Section 2). Next, ThreshSeqMod is used to create a low-adaptive greedy procedure LAG; LAG then satisfies the RCP, so it can be used within PAlg (Barbosa et al., 2016) to yield a 1 - 1/e - ε approximation in O(1/ε^2) MR rounds, with adaptivity O(log n). We show in Section 4.2 that DistributedDistorted (Kazemi et al., 2021) also works with any Alg satisfying the RCP, which yields an improvement in the number of MR rounds needed to achieve the same ratio. The resulting algorithm we term G-DASH (Section 4), which achieves ratio 1 - 1/e - ε with O(log^2 n) adaptivity and O(1/ε) MR rounds.

Although G-DASH achieves nearly the optimal ratio in a constant number of MR rounds and sublinear adaptivity, the number of MR rounds is high. To obtain truly practical algorithms, we develop constant-factor algorithms with 2 MR rounds: R-DASH and T-DASH, with O(log(k) log(n)) and O(log n) adaptivity, respectively, and with approximation ratios of (1/2)(1 - 1/e - ε) (≈ 0.316) and 3/8 - ε. R-DASH is our most practical algorithm and may be regarded as a parallelized version of RandGreeDI (Barbosa et al., 2015), the first MapReduce algorithm for SMCC and the state-of-the-art MR algorithm in terms of empirical performance. T-DASH is a novel algorithm that improves the theoretical properties of R-DASH at the cost of empirical performance (see Table 1 and Section 8). Notably, the adaptivity of T-DASH is O(log(n)), which is close to the best known adaptivity for a constant-factor algorithm, O(log(n/k)), achieved by Chen et al. (2021).

Although our MR algorithms are nearly linear time (within a polylogarithmic factor of linear), all of the existing MR algorithms are superlinear time, which raises the question of whether an algorithm exists with linear query complexity, a constant number of MR rounds, and a constant approximation factor. We answer this question affirmatively by adapting a linear-time algorithm of Kuhnle (2021); Chen et al. (2021) and showing that our adaptation satisfies the RCP (Section 3). Subsequently, we develop the first MR algorithm (L-Dist) with an overall linear query complexity.

Our next contribution is MED, a general plug-in framework for distributed algorithms, which increases the size of the maximum cardinality constraint at the cost of more MR rounds. As discussed above, the maximum k value of any prior MR algorithm for SMCC is O(n/ℓ^2), where ℓ is the number of machines; MED increases this to O(n/ℓ). We also show that under certain assumptions on the objective function (which are satisfied by all of the empirical applications evaluated in Section 8), MED can be run with k_max = n, which removes any cardinality restriction. When used in conjunction with a γ-approximation MR algorithm, MED provides a (1 - e^{-γ})-approximate solution.
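As a rough illustration (ignoring the ε terms), plugging in γ ≈ 0.316, the ratio of R-DASH, gives 1 - e^{-0.316} ≈ 0.27, while plugging in γ ≈ 1 - 1/e ≈ 0.632, the ratio of G-DASH, gives 1 - e^{-0.632} ≈ 0.47.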

Finally, an extensive empirical evaluation of our algorithms and the current state-of-the-art on a 64-machine cluster with 32 cores each, on data instances ranging up to 5 million nodes, shows that R-DASH is orders of magnitude faster than state-of-the-art MR algorithms and demonstrates an exponential improvement in scaling with larger k. Moreover, we show that MR algorithms slow down with an increasing number of machines past a certain point, even if enough memory is available, which further motivates distributing a parallelizable algorithm over a smaller number of machines. This observation also serves as motivation for the development of the MED+Alg framework. In our evaluation, we found that the MED+Alg framework delivers solutions significantly faster than Alg alone for large constraints, showcasing its superior performance and efficiency.

A previous version of this work was published in a conference (Dey et al., 2023). In that version, the RCP was claimed to be shown for the ThresholdSeq algorithm of Chen et al. (2021). Unfortunately, the RCP does not hold for that algorithm. In this version, we provide a modified version of the ThresholdSeq algorithm of Chen et al. (2021), for which we show the RCP. Additionally, in this version, we introduce the first linear-time MR algorithm, LinearTime-Distributed (L-Dist). Finally, in our empirical evaluation, we expand upon the conference version by conducting two additional experiments using a larger cluster comprising 64 machines. These additional experiments evaluate the performance of L-Dist and investigate the effectiveness of MED in handling increasing cardinality constraints, respectively.

1.2 Organization

The rest of this paper is organized as follows. In Section 2, we present the low-adaptive procedures and show they satisfy the RCP. In Section 3, we analyze an algorithm with linear query complexity and show it satisfies the RCP. In Section 4, we show how to use algorithms satisfying the RCP to obtain MR algorithms. In Section 5, we improve the theoretical properties of our 2-round MR algorithm. In Section 6, we detail the linear-time MR algorithm. In Section 7, we show how to increase the supported constraint size by adding additional MR rounds. In Section 8, we conduct our empirical evaluation.

1.3 Preliminaries

A submodular set function captures the diminishing-returns property: the gain of adding an element to a set decreases as the set grows. Formally, let 𝒩 be a finite set of size n. A non-negative set function f: 2^𝒩 → ℝ+ is submodular iff for all sets A ⊆ B ⊆ 𝒩 and x ∈ 𝒩 \ B, f(A ∪ {x}) - f(A) ≥ f(B ∪ {x}) - f(B). The submodular function is monotone if f(A) ≤ f(B) for all A ⊆ B. We use Δ(x | S) to denote the marginal gain of element x with respect to set S: Δ(x | S) = f(S ∪ {x}) - f(S). In this paper, we study the following optimization problem of submodular maximization under a cardinality constraint (SMCC):

O ← arg max_{S ⊆ 𝒩} f(S),  subject to |S| ≤ k.        (1)
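To make the problem concrete, the following self-contained Python sketch (illustrative only; the toy instance and function names are ours, not the paper's) implements a monotone submodular coverage objective and the standard greedy algorithm discussed above.

    # Toy illustration (not from the paper): a monotone submodular coverage
    # objective and the standard greedy algorithm.
    def make_coverage_oracle(sets_by_element):
        """f(S) = number of items covered by the union of the chosen elements."""
        def f(S):
            covered = set()
            for e in S:
                covered |= sets_by_element[e]
            return len(covered)
        return f

    def greedy(f, ground_set, k):
        """Standard greedy: k adaptive rounds, O(nk) queries, (1 - 1/e) ratio."""
        S = set()
        for _ in range(k):
            gains = {x: f(S | {x}) - f(S) for x in ground_set - S}
            if not gains:
                break
            best = max(gains, key=gains.get)
            if gains[best] <= 0:
                break
            S.add(best)
        return S

    sets_by_element = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "d": {1, 5, 6}}
    f = make_coverage_oracle(sets_by_element)
    print(greedy(f, set(sets_by_element), k=2))       # {'a', 'd'}: covers 5 items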

MapReduce Model. The MapReduce (MR) model is a formalization of distributed computation into MapReduce rounds of communication between machines. A dataset of size n is distributed over ℓ machines, each with memory to hold at most Ψ elements of the ground set. The total memory of the machines is constrained to be Ψ·ℓ = O(n). After each round of computation, a machine may send O(Ψ) amount of data to other machines. We assume ℓ ≤ n^{1-c} for some constant c ≥ 1/2.

Next, we formally define the RCP needed to use the two frameworks to convert a centralized algorithm to the MR setting, as discussed previously.

Property 1 (Randomized Consistency Property of Barbosa et al. (2016)).

Let q be a fixed sequence of random bits, and let Alg be a randomized algorithm with randomness determined by q. Suppose Alg(𝒩, q) returns a pair of sets (AlgSol(𝒩, q), AlgRel(𝒩, q)), where AlgSol(𝒩, q) is the feasible solution and AlgRel(𝒩, q) is a set providing additional information. Let A and B be two disjoint subsets of 𝒩, and suppose that for each b ∈ B, AlgRel(A ∪ {b}, q) = AlgRel(A, q). Also, suppose that Alg(A, q) terminates successfully. Then Alg(A ∪ B, q) terminates successfully and AlgSol(A ∪ B, q) = AlgSol(A, q).
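To make Property 1 concrete, here is a small illustrative check (a toy construction of ours, not part of the paper's algorithms): for the standard greedy algorithm with ties broken by a fixed priority order playing the role of q, we take both AlgSol and AlgRel to be the greedy solution and verify the consistency condition on a coverage instance.

    # Illustrative check of Property 1 (a toy construction, not from the paper):
    # standard greedy with ties broken by a fixed priority order q; AlgSol and
    # AlgRel are both taken to be the greedy solution.
    def greedy_pair(f, X, k, q):
        """Return (AlgSol, AlgRel) of deterministic greedy on X with tie-order q."""
        S = []
        for _ in range(k):
            cands = sorted((x for x in X if x not in S), key=q.index)
            if not cands:
                break
            best = max(cands, key=lambda x: f(S + [x]) - f(S))
            if f(S + [best]) - f(S) <= 0:
                break
            S.append(best)
        return set(S), set(S)

    cover = {"a": {1, 2}, "b": {2, 3}, "c": {4}, "d": {1}}
    f = lambda S: len(set().union(*(cover[x] for x in S))) if S else 0
    q = ["a", "b", "c", "d"]                  # fixed "randomness" / tie-breaking order
    A = {"a", "b"}
    sol_A, rel_A = greedy_pair(f, A, 2, q)
    # B: elements whose addition to A leaves AlgRel unchanged (hypothesis of Property 1).
    B = {x for x in cover if x not in A and greedy_pair(f, A | {x}, 2, q)[1] == rel_A}
    sol_AB, _ = greedy_pair(f, A | B, 2, q)
    assert sol_AB == sol_A                    # conclusion of Property 1: Sol(A U B) = Sol(A)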

Adaptive Complexity. The adaptive complexity of an algorithm is the minimum number of sequential rounds into which its queries to the submodular function can be arranged such that each round makes at most polynomially many queries and the queries in each round depend only on the results of queries in previous rounds.

1.4 Related Work

MapReduce Algorithms. Addressing the coverage maximization problem (a special case of SMCC), Chierichetti et al. (2010) proposed an approximation algorithm achieving a ratio of 1 - 1/e, with polylogarithmic MapReduce (MR) rounds. This was later enhanced by Blelloch et al. (2012), who reduced the number of rounds to log^2 n. Kumar et al. (2013) contributed further advancements with a 1 - 1/e algorithm, significantly reducing the number of MR rounds to logarithmic levels. Additionally, they introduced a 1/2 - ε approximation algorithm that operates within O(1/δ) MR rounds, albeit with a logarithmic increase in communication complexity.

For SMCC, Mirzasoleiman et al. (2013) introduced the two-round distributed greedy algorithm (GreeDI), demonstrating its efficacy through empirical studies across various machine learning applications. However, its worst-case approximation guarantee is 1/Θ(min{√k, ℓ}). Subsequent advancements led to the development of constant-factor algorithms by Barbosa et al. (2015), who notably introduced randomization into the distributed greedy algorithm GreeDI, resulting in the RandGreeDI algorithm. This algorithm achieves a (1/2)(1 - 1/e) approximation ratio with two MR rounds and O(nk) queries. Mirrokni and Zadimoghaddam (2015) also introduced an algorithm with two MR rounds, where the first round finds randomized composable core-sets and the second round applies a tie-breaking rule; this algorithm improves the approximation ratio from (1/2)(1 - 1/e) to 0.545 - ε. Later on, Barbosa et al. (2016) proposed an O(1/ε^2)-round algorithm with the optimal (1 - 1/e - ε) approximation ratio without data duplication. Then, Kazemi et al. (2021) improved its space and communication complexity by a factor of O(1/ε) with the same approximation ratio. Another two-round algorithm, proposed by Liu and Vondrak (2019), achieved a (1/2 - ε) approximation ratio but requires data duplication, with four times more elements distributed to each machine in the first round. As a result, distributed setups have a more rigid memory restriction when running this algorithm.

Parallelizable Algorithms. A separate line of work considers the parallelizability of the algorithm, as measured by its adaptive complexity. Balkanski and Singer (2018) introduced the first O(log(n))-adaptive algorithm, achieving a (1/3 - ε) approximation ratio with O(nk^2 log^3(n)) query complexity for SMCC. Then, Balkanski et al. (2019) enhanced the approximation ratio to 1 - 1/e - ε with O(nk^2 log^2(n)) query complexity while maintaining the same sublinear adaptivity. In the meantime, Ene and Nguyen (2019), Chekuri and Quanrud (2019), and Fahrbach et al. (2019a) achieved the same approximation ratio and adaptivity but with improved query complexities: O(n poly(log(n))) for Ene and Nguyen (2019), O(n log(n)) for Chekuri and Quanrud (2019), and O(n) for Fahrbach et al. (2019a). Recently, highly practical sublinearly adaptive algorithms have been developed without compromising on the approximation ratio. FAST, introduced by Breuer et al. (2020), runs in O(log(n) log^2(log(k))) adaptive rounds and O(n log(log(k))) queries, while LS+PGB, proposed by Chen et al. (2021), runs in O(log(n)) adaptive rounds with O(n) queries. However, none of these algorithms have been shown to satisfy the RCP.

Fast (Low Query Complexity) Algorithms. Since a query to the oracle for the submodular function is typically an expensive operation, the total query complexity of an algorithm is highly relevant to its scalability. The work of Badanidiyuru and Vondrak (2014) reduced the query complexity of the standard greedy algorithm from O(kn) to O(n log(n)), while nearly maintaining the ratio of 1 - 1/e. Subsequently, Mirzasoleiman et al. (2015) utilized random sampling on the ground set to achieve the same approximation ratio in expectation with optimal O(n) query complexity. Afterwards, Kuhnle (2021) obtained a deterministic algorithm that achieves ratio 1/4 in exactly n queries, and the first deterministic algorithm with O(n) query complexity and (1 - 1/e - ε) approximation ratio. Since our goal in this work is to develop practical algorithms, we develop parallelizable and distributed algorithms with nearly linear query complexity (i.e., within a polylogarithmic factor of linear). Further, we develop the first linear-time algorithm with a constant number of MR rounds, although it does not parallelize well.

Non-monotone Algorithms. If the submodular function is no longer assumed to be monotone, the SMCC problem becomes more difficult. Here, we highlight a few works in each of the categories of distributed, parallelizable, and query-efficient algorithms. Lee et al. (2009) provided the first constant-factor approximation algorithm, with an approximation ratio of (1/4 - ε) and a query complexity of O(nk^5 log(k)). Building upon that, Gupta et al. (2010) reduced the query complexity to O(nk) by replacing the local search procedure with a greedy approach, at the cost of a slightly worse approximation ratio. Subsequently, Buchbinder et al. (2014) incorporated randomness to achieve an expected 1/e approximation ratio with O(nk) query complexity. Furthermore, Buchbinder et al. (2017) improved this to O(n) query complexity while maintaining the same approximation ratio in expectation. For parallelizable algorithms in the adaptive complexity model, Ene and Nguyen (2020) achieved the current best (1/e - ε) approximation ratio with O(log(n)) adaptivity and O(nk^2 log^2(n)) queries to the continuous oracle (a continuous relaxation of the original value oracle). Later, Chen and Kuhnle (2024) achieved the current best sublinear adaptivity and nearly linear query complexity with a (1/6 - ε) approximation ratio. In terms of distributed algorithms in the MapReduce model, RandGreeDI, proposed by Barbosa et al. (2015), achieved a 0.1 approximation ratio within 2 MR rounds and O(nk) query complexity. Simultaneously, Mirrokni and Zadimoghaddam (2015) also presented a 2-round algorithm with a 0.14 approximation ratio and O(nk) query complexity. Afterwards, Barbosa et al. (2016) proposed a 2-round continuous algorithm with a 0.232(1 - 1/ℓ) approximation ratio.

2 Low-Adaptive Algorithms That Satisfy the RCP

In this section, we analyze two low-adaptive procedures, ThreshSeqMod (Alg. 1) and LAG (Alg. 2), which are variants of the low-adaptive procedures proposed in Chen et al. (2021). This analysis enables their use in the distributed, MapReduce setting. For convenience, we regard the randomness of the algorithms as being determined by a sequence of random bits q, which is an input to each algorithm.

Algorithm 1 Low-Adaptive Threshold Algorithm (ThreshSeqMod)
1: procedure ThreshSeqMod(f, X, k, δ, ε, τ, q)
2:     Input: evaluation oracle f: 2^𝒩 → ℝ+, subset X ⊆ 𝒩, constraint k, confidence δ, error ε, threshold τ, a finite set of sequences of random bits q ← {σ_1, ..., σ_{M+1}}
3:     Initialize S_0 ← ∅, R_0 ← ∅, V_0 ← X, M ← ⌈4(1 + 1/(βε)) log(n/δ)⌉, β ← ε(16 log(4/(1 - e^{-ε/2})))^{-1}
4:     for j ← 1 to M + 1 do
5:         Update V_j ← {x ∈ V_{j-1} : Δ(x | S_{j-1}) ≥ τ}
6:         if |V_j| = 0 or |S_{j-1}| = k then
7:             return S_{j-1}, R_{j-1}, success
8:         V_j ← random-permutation(V_j, σ_j)
9:         s ← min{k - |S_{j-1}|, |V_j|}
10:        Λ ← {1, 2, ..., min{s, ⌈1/ε⌉}} ∪ {⌊(1+ε)^u⌋ : 1 ≤ ⌊(1+ε)^u⌋ ≤ s, u ∈ ℕ} ∪ {s}
11:        B[λ] ← False, ∀λ ∈ Λ
12:        for λ ∈ Λ in parallel do
13:            T_λ ← {v_1, v_2, ..., v_λ}
14:            if Δ(T_λ | S_{j-1})/|T_λ| ≥ (1 - ε)τ then
15:                B[λ] ← True
16:        λ'_j ← min{λ ∈ Λ : B[λ] = False OR λ = s}
17:        R_j ← R_{j-1} ∪ T_{λ'_j}
18:        if λ'_j ≤ ⌈1/ε⌉ then λ*_j ← λ'_j - 1 else λ*_j ← λ'_j
19:        S_j ← S_{j-1} ∪ T_{λ*_j}
20:    return S_{M+1}, R_{M+1}, failure

Observe that the randomness of both ThreshSeqMod and LAG comes only from the random permutations of V_j on Line 8 of ThreshSeqMod, since LAG employs ThreshSeqMod as a subroutine. Consider an equivalent version of these algorithms in which the entire ground set 𝒩 is permuted randomly, and the permutation of V_j is extracted from it. That is, if σ is the permutation of 𝒩, the permutation of V is given by v < w iff σ(v) < σ(w), for v, w ∈ V. Then, the random vector q specifies a sequence of permutations of 𝒩, (σ_1, σ_2, ...), which completely determines the behavior of both algorithms.

2.1 Low-Adaptive Threshold (ThreshSeqMod) Algorithm

This section presents the analysis of the low-adaptive threshold algorithm ThreshSeqMod (Alg. 1), a variant of ThresholdSeq from Chen et al. (2021). In ThreshSeqMod, the randomness depends explicitly on a random vector q. This modification ensures the consistency property in the distributed setting. Besides the addition of the randomness q, ThreshSeqMod employs an alternative strategy of prefix selection within the for loop. Instead of identifying the failure point (where the average marginal gain is under the threshold) after the final success point (where the average marginal gain is above the threshold), as in Chen et al. (2021), ThreshSeqMod determines the first failure point. This adjustment not only addresses the consistency problem but also preserves the theoretical guarantees below.
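To complement the pseudocode, the following is a minimal sequential Python sketch of the control flow of Alg. 1 (ours, for illustration). It assumes q is a non-empty list of integer seeds, and it evaluates every prefix length in a plain loop rather than checking the sparser index set Λ in parallel, so it illustrates the selection rule but not the adaptivity bound.

    import math
    import random

    # Sequential sketch of ThreshSeqMod (Alg. 1), for illustration only.
    # q is assumed to be a non-empty list of integer seeds (one per outer
    # iteration, reused cyclically if short).  Prefix checks are done one by one
    # over every length instead of in parallel over the sparser index set Lambda.
    def thresh_seq_mod_sketch(f, X, k, delta, eps, tau, q):
        if not X:
            return [], [], "success"
        n = len(X)
        beta = eps / (16 * math.log(4 / (1 - math.exp(-eps / 2))))
        M = math.ceil(4 * (1 + 1 / (beta * eps)) * math.log(n / delta))
        S, R, V = [], [], list(X)
        for j in range(M + 1):
            V = [x for x in V if f(S + [x]) - f(S) >= tau]       # Line 5: filter
            if not V or len(S) == k:
                return S, R, "success"
            random.Random(q[j % len(q)]).shuffle(V)              # Line 8: permutation
            s = min(k - len(S), len(V))
            lam = s                                              # Line 16: first prefix whose
            for l in range(1, s + 1):                            # average gain fails, else s
                if (f(S + V[:l]) - f(S)) / l < (1 - eps) * tau:
                    lam = l
                    break
            R = R + V[:lam]                                      # Line 17
            lam_star = lam - 1 if lam <= math.ceil(1 / eps) else lam   # Line 18
            S = S + V[:lam_star]                                 # Line 19
        return S, R, "failure"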

Theorem 1.

Suppose ThreshSeqMod is run with input (f, X, k, δ, ε, τ, q). Then, the algorithm has adaptive complexity O(log(n/δ)/ε^3) and outputs S, R ⊆ 𝒩, where S is the solution set with |S| ≤ k and R provides additional information with |R| = O(k). The following properties hold: 1) The algorithm succeeds with probability at least 1 - δ/n. 2) There are O(n/ε^3) oracle queries in expectation. 3) It holds that f(S)/|S| ≥ (1 - 2ε)τ/(1 + ε). 4) If |S| < k, then Δ(x | S) < τ for all x ∈ 𝒩.

2.1.1 Analysis of Consistency

Lemma 1.

ThreshSeqMod satisfies the randomized consistency property (Property 1).

Proof.

Consider that the algorithm runs for all M + 1 iterations of the outer for loop; if the algorithm would have returned at iteration j, the values S_i and V_i, for i > j, keep their values from when the algorithm would have returned. The proof relies on the fact that every call to ThreshSeqMod(·, q) uses the same sequence of permutations of 𝒩: {σ_1, σ_2, ..., σ_{M+1}}. We refer to iterations of the outer for loop on Line 4 of Alg. 1 simply as iterations. Since the only randomness in Alg. 1 comes from the random permutation of 𝒩 at each iteration, the behavior of ThreshSeqMod(·, q) is determined by q, which satisfies the hypothesis of Property 1.

For the two sets returned by ThreshSeqMod(𝒩, q), S_{M+1} = ThreshSeqModSol(𝒩, q) represents the feasible solution, and R_{M+1} = ThreshSeqModRel(𝒩, q) is the set that provides additional information. We consider the runs of (1) ThreshSeqMod(A, q), (2) ThreshSeqMod(A ∪ {b}, q), and (3) ThreshSeqMod(A ∪ B, q) together. Variables of (1) are given the notation defined in the pseudocode; variables of (2) are given the superscript b; and variables of (3) are given the superscript ′. Suppose for each b ∈ B, R^b_{M+1} = R_{M+1}. The lemma holds if S'_{M+1} = S_{M+1} and ThreshSeqMod(A ∪ B, q) terminates successfully.

Let P(i) be the statement, for iteration i of the outer for loop on Line 4, that

  • (i) S_i = S'_i, and

  • (ii) for all b ∈ B, S_i = S^b_i and R_i = R^b_i, and

  • (iii) V_i = V'_i \ B, and

  • (iv) for all b ∈ B, V_i = V^b_i \ {b}.

If P(M+1) is true, and ThreshSeqMod(A, q) and ThreshSeqMod(A ∪ B, q) terminate at the same iteration (which implies that ThreshSeqMod(A ∪ B, q) also terminates successfully), then the lemma holds immediately. In the following, we prove these two statements by induction.

For i = 0, it is clear that S_0 = S'_0 = R_0 = R^b_0 = ∅, and V_0 = V'_0 \ B = V^b_0 \ {b} = 𝒩 \ B for all b ∈ B. Therefore, P(0) is true. Next, suppose that P(i-1) is true. We show that P(i) is also true, and that if ThreshSeqMod(A, q) terminates at iteration i, then ThreshSeqMod(A ∪ B, q) also terminates at iteration i.

Firstly, we show that (iii) and (iv) of P(i) hold. Since P(i-1) holds, S_{i-1} = S'_{i-1} = S^b_{i-1} for any b ∈ B by (i) and (ii) of P(i-1). So, (iii) and (iv) of P(i) clearly hold, since the sets S_{i-1}, S'_{i-1}, S^b_{i-1} involved in updating V_i, V'_i, V^b_i on Line 5 are equal (after B is removed).

Secondly, we prove that (i) and (ii) of P(i) hold, and that if ThreshSeqMod(A, q) terminates at iteration i, then ThreshSeqMod(A ∪ B, q) also terminates at iteration i.

If ThreshSeqMod(A, q) terminates at iteration i because |S_{i-1}| = k, then the other two runs of ThreshSeqMod also terminate at iteration i, since |S'_{i-1}| = |S^b_{i-1}| = |S_{i-1}| = k by (i) and (ii) of P(i-1). Moreover, (i) and (ii) of P(i) are true since S'_i, S^b_i, S_i, R^b_i, and R_i do not update.

If ThreshSeqMod(A, q) terminates at iteration i because V_i = ∅, then by (iii) and (iv) of P(i), it holds that V^b_i \ {b} = V'_i \ B = ∅ for all b ∈ B. Thus, V^b_i must be empty; otherwise, V^b_i = {b} and b would be added to R^b_i at iteration i, which contradicts the fact that b ∉ R_{M+1} = R^b_{M+1}. Therefore, b has been filtered out before iteration i or will be filtered out at iteration i in both ThreshSeqMod(A ∪ {b}, q) and ThreshSeqMod(A ∪ B, q), since the sets S^b_{i-1} and S'_{i-1} involved in updating V^b_i and V'_i are the same. Consequently, V'_i is also an empty set and ThreshSeqMod(A ∪ B, q) terminates at iteration i. Furthermore, (i) and (ii) of P(i) are true since S'_i, S^b_i, S_i, R^b_i, and R_i do not update.

Figure 1: This figure depicts the relationship between V_i, V^{b_1}_i and V'_i in the circumstance that |S_{i-1}| < k and V_i ≠ ∅.

Finally, consider the case that ThreshSeqMod(A, q) does not terminate at iteration i, where V_i ≠ ∅ and |S_{i-1}| < k. By (iii) and (iv), it holds that V_i = V'_i \ B = V^b_i \ {b}, for any b ∈ B. Let b_1 ∈ B be the first element of B appearing in the sequence V'_i, and let λ^# be the index of b_1 in the order of the permutation of V'_i. If such a b_1 does not exist (i.e., V_i = V'_i = V^b_i, for any b ∈ B), let λ^# = |V_i| + 1. Fig. 1 depicts the relationship between the three sequences V_i, V^{b_1}_i and V'_i. Let T_{λ'_i}, T^{b_1}_{λ'_i}, T'_{λ'_i} be the sets added to R_{i-1}, R^{b_1}_{i-1}, R'_{i-1} at iteration i, respectively. Since V^{b_1}_i and V'_i are subsets of the same random permutation of 𝒩, and V'_i \ B = V^{b_1}_i \ {b_1}, λ^# is also the index of b_1 in V^{b_1}_i. Also, it holds that

T_x = T^{b_1}_x = T'_x,  ∀x < λ^#.        (2)

Since b_1 ∉ R^{b_1}_{M+1}, it holds that b_1 ∉ T^{b_1}_{λ'_i}, which indicates that λ'_i < λ^#. By the definition of λ'_i on Line 16,

B^{b_1}[x] = True, ∀x < λ'_i,   and   B^{b_1}[λ'_i] = False.

It follows from Equation (2) and λ'_i < λ^# that

B[x] = B'[x] = True, ∀x < λ'_i,   and   B[λ'_i] = B'[λ'_i] = False.

Therefore, λ'_i is the same across the three runs, so T_{λ'_i} = T^b_{λ'_i} = T'_{λ'_i}, and further T_{λ*_i} = T^b_{λ*_i} = T'_{λ*_i}. So, Properties (i) and (ii) hold. ∎

2.1.2 Analysis of Guarantees

Altogether, there are three adjustments in ThreshSeqMod compared with ThresholdSeq in Chen et al. (2021). First, we make the randomness q explicit, which allows us to still treat each permutation on Line 8 as a random step. Therefore, the analysis of the theoretical guarantees is not affected by adopting the randomness q. Second, the prefix selection step is changed from identifying the failure point after the final success point to identifying the first failure point. Note that this change is necessary for the algorithm to satisfy the RCP (Property 1). However, with this change, fewer elements may be filtered out at the next iteration. Fortunately, we are still able to filter out a constant fraction of the candidate set. Third, the algorithm returns one more set that provides extra information. Since the additional set does not affect the functioning of the algorithm, it does not influence the theoretical guarantees. In the following, we provide the detailed analysis of the theoretical guarantees.

As demonstrated by Lemma 2 below, an increase in the number of selected elements results in a corresponding increase in the number of filtered elements. Consequently, there exists a point, say t, such that a given constant fraction of V_j can be filtered out if we add more than t elements. The proof of this lemma is quite similar to the proof of Lemma 12 in Chen et al. (2021) and can be found in Appendix C.

Lemma 2.

At an iteration j, let A_i = {x ∈ V_j : Δ(x | T_i ∪ S_{j-1}) < τ} after Line 5. It holds that |A_0| = 0, |A_{|V_j|}| = |V_j| and |A_i| ≤ |A_{i+1}|.

Although we use a smaller prefix based on Lines 16 and 18 compared to ThresholdSeq, it still holds that, with probability at least 1/2, a constant fraction of V_j will be filtered out at the beginning of iteration j+1, or the algorithm terminates with |S_j| = k; this is given as Lemma 3 below. Intuitively, if there are enough such iterations, ThreshSeqMod succeeds. Also, the calculation of the query complexity follows directly from Lemma 3. Since the analyses of success probability, adaptivity and query complexity simply follow the analysis in Chen et al. (2021), we provide these analyses in Appendix C. Next, we prove Lemma 3.

Lemma 3.

At any iteration j + 1, with probability at least 1/2, either |V_{j+1}| ≤ (1 - βε)|V_j| after Line 5, or the algorithm terminates with |S_j| = k on Line 7.

Proof.

By the definition of A_i in Lemma 2, A_{λ*_j} will be filtered out from V_j at the next iteration. Equivalently, A_{λ*_j} = V_j \ V_{j+1}. After Line 8 at iteration j+1, by Lemma 2, there exists a t such that t = min{i ∈ ℕ : |A_i| ≥ βε|V_j|}. We say an iteration j succeeds if λ*_j ≥ min{s, t}. In this case, it holds that either |V_j \ V_{j+1}| ≥ βε|V_j|, or |S_j| = k. In other words, iteration j succeeds if there does not exist λ ≤ min{s, t} such that B[λ] = False, by the selection of λ*_j on Line 18. Instead of directly calculating the success probability, we consider the failure probability. Define an element v_i to be bad if Δ(v_i | S_{j-1} ∪ T_{i-1}) < τ, and good otherwise. Consider the random permutation of V_j as a sequence of dependent Bernoulli trials, with success if the element is bad and failure otherwise. When i ≤ min{s, t}, the probability that v_i is bad is less than βε. If B[λ] = False, there are at least ελ bad elements in T_λ. Let {Y_i}_{i=1}^∞ be a sequence of independent and identically distributed Bernoulli trials, each with success probability βε. Next, we consider the following sequence of dependent Bernoulli trials {X_i}_{i=1}^∞, where X_i = 1_{v_i is bad} when i ≤ t, and X_i = Y_i otherwise. Define B'[λ] = False if there are at least ελ bad elements in {X_i}_{i=1}^λ. Then, for any λ ≤ min{s, t}, it holds that Pr(B[λ] = False) = Pr(B'[λ] = False). In the following, we bound the probability that there are more than an ε-fraction of 1s in {X_i}_{i=1}^λ for any λ ≤ s,

Pr(B'[λ] = False) ≤ Pr(Σ_{i=1}^λ X_i ≥ ελ)
                  ≤ Pr(Σ_{i=1}^λ Y_i ≥ ελ)                          (Lemma 9)
                  ≤ min{β, e^{-((1-β)^2/(1+β)) ελ}}                  (Markov's Inequality, Lemma 10)
                  ≤ min{β, e^{-ελ/2}}                                (β < 1/16)

Subsequently, the probability of failure at iteration jj is calculated below,

Pr(iteration j fails) = Pr(∃ λ ≤ min{s, t} s.t. B[λ] = False)
                      ≤ Pr(∃ λ ≤ s s.t. B'[λ] = False)
                      ≤ Σ_{λ ≤ s} Pr(B'[λ] = False)
                      ≤ Σ_{λ=1}^∞ min{β, e^{-ελ/2}}
                      ≤ aβ + Σ_{λ=a+1}^∞ e^{-ελ/2},  where a = ⌊1/(4β)⌋ = ⌊(2/ε) log(4/(1 - e^{-ε/2}))⌋
                      ≤ aβ + e^{-ε(a+1)/2}/(1 - e^{-ε/2}) ≤ 1/4 + 1/4 = 1/2

As for Properties (3) and (4) in Theorem 1, since the structure of the solution set is the same as that of the solution of ThresholdSeq, the analysis is similar to that in Chen et al. (2021). We provide these proofs in Appendix C.

2.2 Low-Adaptive Greedy (LAG) Algorithm

Algorithm 2 Low-Adaptive Greedy (LAG)
1: Input: Evaluation oracle f: 2^𝒩 → ℝ+, subset C, constraint k, fixed sequence of random bits (q_1, q_2, ...), accuracy parameter ε, constant α, value Γ such that Γ ≤ f(O) ≤ Γ/α
2: if Γ = ∅ or α = ∅ then
3:     Γ ← max_{x ∈ 𝒩} f(x)
4:     α ← 1/k
5: Initialize τ ← Γ/(αk), δ ← 1/(log_{1-ε}(α/3) + 1), S ← ∅, R ← ∅, i ← 0
6: for i ← 0 to log_{1-ε}(α/3) do
7:     τ ← τ(1 - ε)^i, g(·) ← f(S ∪ ·)
8:     T_1, T_2 ← ThreshSeqMod(g, C, k - |S|, δ, ε/3, τ, q_i)
9:     S ← S ∪ T_1, R ← R ∪ T_2
10:    if |S| = k then
11:        return S, R
12: return S, R

Another building block for our distributed algorithms is a simple, low-adaptive greedy algorithm LAG (Alg. 2). This algorithm is an instantiation of the ParallelGreedyBoost framework of Chen et al. (2021), and it relies heavily on the low-adaptive procedure ThreshSeqMod (Alg. 1).

Guarantees.  Following the analysis of Theorem 3 in Chen et al. (2021), LAG achieves ratio 1 - 1/e - ε in O(log(α^{-1}) log(n log(α^{-1})/ε)/ε^4) adaptive rounds and O(n log(α^{-1})/ε^4) total queries with probability at least 1 - 1/n. Setting Γ = max_{x∈𝒩} f({x}) and α = 1/k, LAG has an adaptive complexity of O(log n log k / ε^4) and a query complexity of O(n log k / ε^4).
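For illustration, here is a minimal sketch of LAG built on top of the thresh_seq_mod_sketch above (ours, not the paper's implementation); q is assumed to be a non-empty list of per-call seed lists, and the threshold schedule is written non-cumulatively as τ_i = Γ/(αk)·(1 - ε)^i.

    import math

    # Sketch of LAG (Alg. 2) built on thresh_seq_mod_sketch above; Gamma is the
    # maximum singleton value and alpha = 1/k, as in the default setting.  q is
    # assumed to be a non-empty list of per-call seed lists, one per threshold.
    def lag_sketch(f, C, k, eps, q):
        gamma = max(f([x]) for x in C)
        alpha = 1.0 / k
        tau0 = gamma / (alpha * k)                    # initial threshold (= gamma here)
        n_iters = int(math.log(alpha / 3) / math.log(1 - eps)) + 1
        delta = 1.0 / (n_iters + 1)
        S, R = [], []
        for i in range(n_iters):
            tau = tau0 * (1 - eps) ** i               # Line 7: geometrically decreasing threshold
            g = lambda T, base=list(S): f(base + T)   # Line 7: residual oracle g(.) = f(S U .)
            T1, T2, _ = thresh_seq_mod_sketch(
                g, [x for x in C if x not in S], k - len(S),
                delta, eps / 3, tau, q[i % len(q)])
            S, R = S + T1, R + T2                     # Line 9
            if len(S) == k:                           # Lines 10-11
                break
        return S, R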

Lemma 4.

LAG satisfies the randomized consistency property (Property 1).

Proof of Lemma 4.

Observe that the only randomness in LAG is from the calls to ThreshSeqMod. Since LAG(A, q) succeeds, every call to ThreshSeqMod must succeed as well. Moreover, since q is used to permute the underlying ground set 𝒩, changing the set argument A of LAG does not change the sequence received by each call to ThreshSeqMod. Order the calls to ThreshSeqMod: ThreshSeqMod(·, q_1), ThreshSeqMod(·, q_2), ..., ThreshSeqMod(·, q_m). Since LAGRel(A ∪ {b}, q) = LAGRel(A, q), it holds that b ∉ ThreshSeqModRel(A ∪ {b}, q_i), for each b ∈ B and each i. Because b ∉ ThreshSeqModRel(A ∪ {b}, q_i) is equivalent to ThreshSeqModRel(A ∪ {b}, q_i) = ThreshSeqModRel(A, q_i) (as shown in Claim 1 below), by application of Lemma 1, ThreshSeqMod(A ∪ B, q_i) terminates successfully and ThreshSeqModSol(A ∪ B, q_i) = ThreshSeqModSol(A, q_i) for each i. Therefore, LAGSol(A ∪ B, q) = LAGSol(A, q) and the former call terminates successfully. ∎

Next, we provide Claim 1 and its analysis below.

Claim 1.

Let q be a fixed sequence of random bits, A ⊆ 𝒩, and b ∈ 𝒩 \ A. Then, b ∉ ThreshSeqModRel(A ∪ {b}, q) if and only if ThreshSeqModRel(A ∪ {b}, q) = ThreshSeqModRel(A, q).

Proof.

It is obvious that if ThreshSeqModRel(A ∪ {b}, q) = ThreshSeqModRel(A, q), then b ∉ ThreshSeqModRel(A ∪ {b}, q). In the following, we prove the reverse direction.

Following the analysis of the RCP for ThreshSeqMod in Section 2.1.1, we consider the runs of (1) ThreshSeqMod(A, q) and (2) ThreshSeqMod(A ∪ {b}, q). Variables of (1) are given the notation defined in the pseudocode; variables of (2) are given the superscript b. We analyze the following statement P(i) for each iteration i of the outer for loop on Line 4 in Alg. 1.

  • (i) S_i = S^b_i and R_i = R^b_i, and

  • (ii) V_i = V^b_i \ {b}.

Since b ∉ ThreshSeqModRel(A ∪ {b}, q) = R^b_{M+1}, it holds that b ∉ R^b_i for each i. Thus, the analysis in Section 2.1.1 also holds in this case. By induction, P(i) is true for each i; in particular, P(M+1) is true, which indicates that ThreshSeqModRel(A ∪ {b}, q) = ThreshSeqModRel(A, q). ∎

3 Consistent Linear Time (Linear-TimeCardinality) Algorithm

In this section, we present the analysis of Linear-TimeCardinality (Alg. 3, LTC), a consistent linear-time algorithm that extends the highly adaptive linear-time algorithm (Alg. 3 in Chen et al. (2021)). Notably, to ensure consistency in a distributed setting and to bound the solution size, LTC incorporates the randomness q and initializes the solution set with the maximum singleton. The algorithm facilitates the creation of linear-time MapReduce algorithms, enhancing overall computational efficiency beyond the capabilities of current state-of-the-art superlinear algorithms.

Algorithm 3 Linear-TimeCardinality (LTC)
1: procedure LTC(f, 𝒩, k, q)
2:     Input: evaluation oracle f, constraint k, fixed sequence of random bits q ← (σ)
3:     𝒩 ← random-permutation(𝒩, σ)
4:     Initialize a ← first element in arg max_{x ∈ 𝒩} f(x), S ← {a}
5:     for x ∈ 𝒩 do
6:         if Δ(x | S) ≥ f(S)/k then S ← S ∪ {x}
7:     return S
Theorem 2.

Let (f, k) be an instance of SM. The algorithm Linear-TimeCardinality outputs S ⊆ 𝒩 such that the following properties hold: 1) There are O(n) oracle queries and O(n) adaptive rounds. 2) Let S' be the last k elements in S. It holds that f(S') ≥ (1/2) f(S) ≥ (1/4) f(O), where O is an optimal solution to the instance (f, k). 3) The size of S is limited to O(k log(n)).
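A minimal Python sketch of LTC (ours, for illustration), with an integer seed standing in for the random-bit sequence q:

    import random

    # Sketch of Linear-TimeCardinality (Alg. 3): one pass over a single fixed
    # random permutation of the ground set; an element is added when its marginal
    # gain is at least f(S)/k.  The integer seed stands in for q.
    def ltc_sketch(f, ground_set, k, seed):
        N = sorted(ground_set)                        # fix a base order, then permute
        random.Random(seed).shuffle(N)                # Line 3: permutation from q
        a = max(N, key=lambda x: f([x]))              # Line 4: first maximizing singleton
        S = [a]
        for x in N:                                   # Lines 5-6: single linear pass
            if x != a and f(S + [x]) - f(S) >= f(S) / k:
                S.append(x)
        return S                                      # |S| = O(k log n); its last k
                                                      # elements carry the 1/4 guarantee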

3.1 Analysis of Consistency

The highly adaptive linear-time algorithm (Alg. 3) outlined in Chen et al. (2021) commences with an empty set and adds an element iff Δ(x | S) ≥ f(S)/k. Without introducing any randomness, this algorithm can be deterministic only if the order of the ground set is fixed. Additionally, to limit the solution size, we initialize the solution set with the maximum singleton, a choice that also impacts the algorithm's consistency. However, by selecting the first element that maximizes the objective value, the algorithm maintains its consistency. In the following, we provide the analysis of the randomized consistency property of LTC.

Lemma 5.

LTC satisfies the randomized consistency property (Property 1).

Proof.

Consider the runs of (1) LTC(A, q), (2) LTC(A ∪ {b}, q), and (3) LTC(A ∪ B, q). With the same sequence of random bits q, after the random permutation, A, A ∪ {b} and A ∪ B are subsets of the same sequence. For any x ∈ 𝒩, let i_x be the index of x. Then, 𝒩_{i_x} denotes the elements before x (including x). Define S_{i_x} to be the intermediate solution of (1) after we consider element x. If x ∉ A, define S_{i_x} = S_{i_x-1}. Similarly, define S^b_{i_x} and S'_{i_x} to be the intermediate solutions of (2) and (3), respectively. If S_{i_x} = S^b_{i_x} = S'_{i_x}, for any x ∈ 𝒩 and b ∈ B, then LTC(A ∪ B, q) = LTC(A, q). Next, we prove that this statement holds.

For any b ∈ B, since LTC(A, q) = LTC(A ∪ {b}, q), it holds that either f(b) < max_{x∈A} f(x), or f(b) = max_{x∈A} f(x) and i_a < i_b, where a is the first element in arg max_{x∈A} f(x). Therefore, a is also the first element in arg max_{x∈A∪B} f(x). Furthermore, it holds that S_0 = S^b_0 = S'_0 = {a}.

Suppose that S_{i_x-1} = S^b_{i_x-1} = S'_{i_x-1}. If x ∈ A or x ∉ A ∪ B, naturally S_{i_x} = S^b_{i_x} = S'_{i_x}. If x ∈ B, then S_{i_x} = S_{i_x-1} immediately. Since LTC(A, q) = LTC(A ∪ {b}, q), it holds that Δ(x | S^b_{i_x-1}) < f(S^b_{i_x-1})/k and S^b_{i_x} = S^b_{i_x-1}. So, Δ(x | S'_{i_x-1}) < f(S'_{i_x-1})/k. Then, S'_{i_x} = S'_{i_x-1}. Thus, S_{i_x} = S^b_{i_x} = S'_{i_x}. ∎

3.2 Analysis of Guarantees

Query Complexity and Adaptivity. Alg. 3 makes queries on Lines 4 and 6: there are n oracle queries on Line 4 and one oracle query for each element processed on Line 6. Therefore, the query complexity and the adaptivity are both O(n).

Solution Size. Given that f is a monotone function, it holds that arg max_{S⊆𝒩} f(S) = 𝒩. Furthermore, since f({a}) = max_{x∈𝒩} f({x}), submodularity implies that f({a}) ≥ (1/n) Σ_{x∈𝒩} f({x}) ≥ (1/n) f(𝒩). Each element added to S increases the solution value by a factor of at least (1 + 1/k). Therefore,

f(𝒩) ≥ f(S) ≥ (1 + 1/k)^{|S|} f(a) ≥ (1 + 1/k)^{|S|} f(𝒩)/n
⟹ |S| ≤ log_{1+1/k}(n) ≤ (k + 1) log(n)        (log(x) ≥ 1 - 1/x)

Objective Value. For any x𝒩x\in\mathcal{N}, let SxS_{x} be the intermediate solution before we process xx. It holds that Δ(o|So)<f(So)/k\Delta\left(o\,|\,S_{o}\right)<f(S_{o})/k for any oOSo\in O\setminus S. Then,

f(O)f(S)(1)oOSΔ(o|S)(2)oOSΔ(o|So)<oOSf(So)/kf(S),f(O)-f(S)\overset{(1)}{\leq}\sum_{o\in O\setminus S}\Delta\left(o\,|\,S\right)\overset{(2)}{\leq}\sum_{o\in O\setminus S}\Delta\left(o\,|\,S_{o}\right)<\sum_{o\in O\setminus S}f(S_{o})/k\leq f(S),

where Inequality (1) follows from monotonicity and submodularity, and Inequality (2) follows from submodularity.

For any xSx\in S^{\prime}, it holds that Δ(x|Sx)f(Sx)/k\Delta\left(x\,|\,S_{x}\right)\geq f(S_{x})/k. Then,

f(S)f(SS)=xSΔ(x|Sx)xSf(Sx)/k(1)xSf(SS)/k=f(SS)\displaystyle f(S)-f(S\setminus S^{\prime})=\sum_{x\in S^{\prime}}\Delta\left(x\,|\,S_{x}\right)\geq\sum_{x\in S^{\prime}}f(S_{x})/k\overset{(1)}{\geq}\sum_{x\in S^{\prime}}f(S\setminus S^{\prime})/k=f(S\setminus S^{\prime})
\displaystyle\Rightarrow f(S)(2)f(S)f(SS)12f(S)\displaystyle\hskip 10.00002ptf(S^{\prime})\overset{(2)}{\geq}f(S)-f(S\setminus S^{\prime})\geq\frac{1}{2}f(S)

where Inequalities (1) and (2) follow from submodularity.

Algorithm 4 Randomized-DASH (R-DASH)
1:Input: Evaluation oracle f:2𝒩f:2^{\mathcal{N}}\to\mathbb{R}, constraint kk, error ε\varepsilon, available machines M{1,2,,}M\leftarrow\{1,2,...,\ell\}
2:for e𝒩e\in\mathcal{N}  do
3:     Assign ee to machine chosen uniformly at random
4:for iMi\in M do
5:     \triangleright On machine ii
6:     Let 𝒩i\mathcal{N}_{i} be the elements assigned to machine ii
7:     Si,RiLAG(f,𝒩i,k,ε)S_{i},R_{i}\leftarrow\textsc{LAG}(f,\mathcal{N}_{i},k,\varepsilon)
8:     Send Si,RiS_{i},R_{i} to primary machine
9:\triangleright On primary machine
10:Gather Ri=1RiR\leftarrow\bigcup_{i=1}^{\ell}R_{i}
11:TLAG(f,R,k,ε)T\leftarrow\textsc{LAG}(f,R,k,\varepsilon) on machine m1m_{1}
12:return Vargmax{f(T),f(S1)}V\leftarrow\operatorname*{arg\,max}{\{f(T),f(S_{1})}\}

4 Low-Adaptive Algorithms with Constant MR Rounds (Greedy-DASH and Randomized-DASH)

Once we have the randomized consistency property of LAG, we can almost immediately use it to obtain parallelizable MapReduce algorithms.

4.1 Randomized-DASH

R-DASH (Alg. 4) is a two-MR-round algorithm obtained by plugging LAG into the RandGreeDI algorithm of Mirzasoleiman et al. (2013); Barbosa et al. (2015). It runs in two MapReduce rounds and O(log(k)log(n))O(\log{}(k)\log{}(n)) adaptive rounds, and guarantees the ratio 12(11/eε)\frac{1}{2}(1-1/e-\varepsilon) (\simeq 0.316).

Description.  R-DASH first distributes the ground set uniformly at random across all machines MM. In its first MR round, it runs LAG on every machine to obtain Si,RiS_{i},R_{i} in O(log(k)log(|𝒩i|))O\left(\log(k)\log(|\mathcal{N}_{i}|)\right) adaptive rounds. The pairs from every machine are then sent to the primary machine, where LAG is run once more on the union of the relevant sets; the better of this solution and S1S_{1} guarantees the 12(11/eε)\frac{1}{2}(1-1/e-\varepsilon) approximation in O(log(k)log(|R|))O\left(\log(k)\log(|R|)\right) adaptive rounds, as stated in Corollary 1. We first prove Theorem 3 by plugging any randomized algorithm Alg that satisfies RCP (Property 1) into the framework of RandGreeDI; the proof is a minor modification of the proof of Barbosa et al. (2015) to incorporate randomized consistency. Corollary 1 for R-DASH then follows immediately.
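For illustration only, the two MR rounds can be simulated serially as in the following Python sketch (ours); the helper lag stands in for the LAG subroutine and is assumed to return a (solution, relevant set) pair as in Alg. 4.

import random

def r_dash(f, ground, k, eps, ell, lag, seed=0):
    rng = random.Random(seed)
    # MR round 1: partition the ground set uniformly at random over ell machines.
    parts = [[] for _ in range(ell)]
    for e in ground:
        parts[rng.randrange(ell)].append(e)
    results = [lag(f, part, k, eps) for part in parts]   # run in parallel on a real cluster
    S1 = results[0][0]
    # MR round 2: the primary machine runs LAG on the union of the relevant sets.
    R = [e for (_, R_i) in results for e in R_i]
    T, _ = lag(f, R, k, eps)
    # Return the better of T and the first machine's solution S1 (Line 12 of Alg. 4).
    return max([S1, T], key=lambda A: f(set(A)))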

Theorem 3.

Let (f,k)(f,k) be an instance of SM where k=O(ψ/)k=O\left(\psi/\ell\right), and let Alg be a randomized algorithm that satisfies the Randomized Consistency Property with approximation ratio α\alpha, query complexity Φ(n)\Phi(n), and adaptivity Ψ(n)\Psi(n). With LAG replaced by Alg, R-DASH returns a set VV within two MR rounds, 2Ψ(n/)2\Psi(n/\ell) adaptive rounds, (+1)Φ(n/)(\ell+1)\Phi(n/\ell) total queries, and O(n)O(n) communication complexity, such that

𝔼[f(V)]α2OPT.\mathbb{E}[f(V)]\geq\frac{\alpha}{2}\textsc{OPT}.
Proof.

Query Complexity and Adaptivity. R-DASH runs with two MR rounds. In the first MR round, Alg is invoked \ell times in parallel, each with a ground set size of n/n/\ell. So, during the first MR round, the number of queries is Φ(n/)\ell\Phi(n/\ell) and the number of adaptive rounds is Ψ(n/)\Psi(n/\ell). Then, the second MR round calls Alg once, handling at most n/n/\ell elements. Consequently, the total number of queries of R-DASH is (+1)Φ(n/)(\ell+1)\Phi(n/\ell), and the total number of adaptive rounds is 2Ψ(n/)2\Psi(n/\ell).

Approximation Ratio. Let R-DASH be run with input (f,k,ε,M)(f,k,\varepsilon,M). Since n1c\ell\leq n^{1-c}, 𝔼[|𝒩i|]=n/nc\mathbb{E}\left[|\mathcal{N}_{i}|\right]=n/\ell\geq n^{c}. An application of Chernoff's bound (Lemma 10) shows that the size |𝒩i||\mathcal{N}_{i}| is concentrated around this expectation.

Let 𝒩(1/)\mathcal{N}(1/\ell) denote the random distribution over subsets of 𝒩\mathcal{N} where each element is included independently with probability 1/1/\ell. For x𝒩x\in\mathcal{N}, let

px={PrX𝒩(1/),𝐪[AlgRel(X{x},𝐪)=AlgRel(X,𝐪)], if xO0, otherwise .p_{x}=\begin{cases}Pr_{X\sim\mathcal{N}(1/\ell),\mathbf{q}}\left[\textsc{AlgRel}(X\cup\{x\},\mathbf{q})=\textsc{AlgRel}(X,\mathbf{q})\right]\text{, if }x\in O\\ 0\hskip 130.0002pt\text{, otherwise }\end{cases}.

Consider S1=AlgSol(𝒩1,𝐪)S_{1}=\textsc{AlgSol}(\mathcal{N}_{1},\mathbf{q}) and R1=AlgRel(𝒩1,𝐪)R_{1}=\textsc{AlgRel}(\mathcal{N}_{1},\mathbf{q}) on machine m1m_{1}. Let O1={oO:AlgRel(𝒩1{o},𝐪)=AlgRel(𝒩1,𝐪)}O_{1}=\{o\in O:\textsc{AlgRel}(\mathcal{N}_{1}\cup\{o\},\mathbf{q})=\textsc{AlgRel}(\mathcal{N}_{1},\mathbf{q})\}. Since Alg follows the Randomized Consistency Property, AlgSol(𝒩1O1,𝐪)=AlgSol(𝒩1,𝐪)=S1\textsc{AlgSol}(\mathcal{N}_{1}\cup O_{1},\mathbf{q})=\textsc{AlgSol}(\mathcal{N}_{1},\mathbf{q})=S_{1}. Therefore, f(S1)αf(O1)f(S_{1})\geq\alpha f(O_{1}). Next, let O2=ORO_{2}=O\cap R, where R=i=1RiR=\bigcup_{i=1}^{\ell}R_{i}. It holds that f(T)αf(O2)f(T)\geq\alpha f(O_{2}). Let oOo\in O; oo is assigned to 𝒩c\mathcal{N}_{c} on some machine mcm_{c}. It holds that,

Pr[oO2]\displaystyle\Pr[o\in O_{2}] =Pr[oAlgRel(𝒩c)|o𝒩c]\displaystyle=\Pr[o\in\textsc{AlgRel}(\mathcal{N}_{c})|o\in\mathcal{N}_{c}]
=Pr[oAlgRel(𝒩c{o})]\displaystyle=\Pr[o\in\textsc{AlgRel}(\mathcal{N}_{c}\cup\{o\})]
=1po.\displaystyle=1-p_{o}.

Therefore,

𝔼[f(V)]\displaystyle\mathbb{E}\left[f(V)\right] 12(𝔼[f(S1)]+𝔼[f(T)])\displaystyle\geq\frac{1}{2}\left(\mathbb{E}[f(S_{1})]+\mathbb{E}[f(T)]\right)
α2(𝔼[f(O1)]+𝔼[f(O2)])\displaystyle\geq\frac{\alpha}{2}\left(\mathbb{E}[f(O_{1})]+\mathbb{E}[f(O_{2})]\right)
α2(F(𝐩)+F(𝟏O𝐩))\displaystyle\geq\frac{\alpha}{2}\left(F(\mathbf{p})+F(\mathbf{1}_{O}-\mathbf{p})\right)
α2F(𝟏O),\displaystyle\geq\frac{\alpha}{2}F(\mathbf{1}_{O}), (3)

where Inequality 3 follows from Lemma 8 and the convexity of FF. ∎

Corollary 1.

Let (f,k)(f,k) be an instance of SM where k=O(ψ/)k=O\left(\psi/\ell\right). R-DASH returns set VV with two MR rounds, O(1ε4log(k)log(n))O\left(\frac{1}{\varepsilon^{4}}\log(k)\log\left(n\right)\right) adaptive rounds, O(nlog(k)ε4)O\left(\frac{n\log(k)}{\varepsilon^{4}}\right) total queries, O(n)O(n) communication complexity, and probability at least 1n12c1-n^{1-2c} such that

𝔼[f(V)]11/eε2OPT.\mathbb{E}[f(V)]\geq\frac{1-1/e-\varepsilon}{2}\textsc{OPT}.

4.2 Greedy-DASH

Algorithm 5 Greedy-DASH(G-DASH)\textsc{Greedy-DASH}(\textsc{G-DASH})
1:Input: evaluation oracle f:2𝒩f:2^{\mathcal{N}}\to\mathbb{R}, constraint kk, error ε\varepsilon, available machines M{1,2,,}M\leftarrow\{1,2,...,\ell\}
2:S,C0S\leftarrow\emptyset,C_{0}\leftarrow\emptyset
3:for r1r\leftarrow 1 to 1ε\lceil\frac{1}{\varepsilon}\rceil  do
4:     Xr,iX_{r,i}\leftarrow Elements assigned to machine ii chosen uniformly at random in round rr
5:     𝒩r,iXr,iCr1\mathcal{N}_{r,i}\leftarrow X_{r,i}\cup C_{r-1}
6:     for iMi\in M in parallel  do
7:         Sr,i,Rr,iLAG(f,𝒩r,i,k,ε)S_{r,i},R_{r,i}\leftarrow\textsc{LAG}(f,\mathcal{N}_{r,i},k,\varepsilon)
8:         Send Sr,i,Rr,iS_{r,i},R_{r,i} to each machine
9:     Sargmax{f(S),f(Sr,1),,f(Sr,)}S\leftarrow\operatorname*{arg\,max}{\{f(S),f(S_{r,1}),\cdots,f(S_{r,\ell})}\}
10:     Cri=1Rr,iCr1C_{r}\leftarrow\bigcup_{i=1}^{\ell}R_{r,i}\cup C_{r-1}
11:return S

Next, we obtain a nearly optimal ratio by applying LAG and randomized consistency to the DistributedDistorted algorithm proposed by Kazemi et al. (2021). DistributedDistorted is a distributed algorithm for regularized submodular maximization that relies upon (a distorted version of) the standard greedy algorithm; for SMCC, the distorted greedy algorithm reduces to the standard greedy algorithm Greedy. In Theorem 4, we show that any algorithm satisfying randomized consistency can be used in place of Greedy. In particular, by introducing LAG into DistributedDistorted, G-DASH achieves the nearly optimal (11/eε1-1/e-\varepsilon) expected ratio in 1ε\lceil\frac{1}{\varepsilon}\rceil MapReduce rounds, O(log(n)log(k))O(\log(n)\log(k)) adaptive rounds, and O(nlog(k))O(n\log(k)) total queries.

The Framework of Kazemi et al. (2021).  DistributedDistorted runs for 1/ε\lceil 1/\varepsilon\rceil MR rounds, where each round rr works as follows. First, the ground set is distributed across the machines uniformly at random. Then, each machine ii runs Greedy (with the modular term set to zero) on the data 𝒩r,i\mathcal{N}_{r,i}, which combines the elements Xr,iX_{r,i} distributed at the start of the round with the elements Cr1C_{r-1} forwarded from the previous rounds, to obtain the solution Sr,iS_{r,i}. At the end, the algorithm returns the best solution found across all machines and all rounds. To improve the adaptive rounds of DistributedDistorted, we replace the standard greedy with LAG to obtain G-DASH.

Theorem 4.

Let (f,k)(f,k) be an instance of SM where k=O(εψ/)k=O\left(\varepsilon\psi/\ell\right), and let Alg be a randomized algorithm that satisfies the Randomized Consistency Property with approximation ratio α\alpha, query complexity Φ(n)\Phi(n), and adaptivity Ψ(n)\Psi(n). With LAG replaced by Alg, G-DASH returns a set SS within O(1/ε)O\left(1/\varepsilon\right) MR rounds, 1εΨ(n)\frac{1}{\varepsilon}\Psi\left(\frac{n}{\ell}\right) adaptive rounds, εΦ(n)\frac{\ell}{\varepsilon}\Phi\left(\frac{n}{\ell}\right) total queries, and O(nε)O\left(\frac{n}{\varepsilon}\right) communication complexity, such that,

𝔼[f(S)](αε)OPT.\mathbb{E}\left[f(S)\right]\geq(\alpha-\varepsilon)\textsc{OPT}.
Proof.

Query Complexity and Adaptivity. G-DASH operates with 1ε\lceil\frac{1}{\varepsilon}\rceil MR rounds, where each MR round calls Alg \ell times in parallel. As each call of Alg works on the ground set with a size of at most n\frac{n}{\ell}, the total number of queries for G-DASH is εΦ(n)\frac{\ell}{\varepsilon}\Phi\left(\frac{n}{\ell}\right), and the total number of adaptive rounds is 1εΨ(n)\frac{1}{\varepsilon}\Psi\left(\frac{n}{\ell}\right).

Approximation ratio. Let G-DASH be run with input (f,k,ε,M)(f,k,\varepsilon,M). As in the analysis of Theorem 3, the size of 𝒩i\mathcal{N}_{i} is concentrated. Let OO be the optimal solution. For any x𝒩x\in\mathcal{N}, define

pxr={PrX𝒩(1/),𝐪[xCr1 and xAlgRel(XCr1{x},𝐪)], if xO0, otherwise .p_{x}^{r}=\begin{cases}Pr_{X\sim\mathcal{N}(1/\ell),\mathbf{q}}\left[x\not\in C_{r-1}\text{ and }x\in\textsc{AlgRel}(X\cup C_{r-1}\cup\{x\},\mathbf{q})\right]\text{, if }x\in O\\ 0\text{, otherwise}\end{cases}.

Then we provide the following lemma.

Lemma 6.

For any xOx\in O and 1r1/ε1\leq r\leq 1/\varepsilon, Pr(xCr)=r=1rpxrPr\left(x\in C_{r}\right)=\sum_{r^{\prime}=1}^{r}p_{x}^{r^{\prime}}.

Proof.
Pr(xCr)\displaystyle Pr\left(x\in C_{r}\right) =r=1rPr(xCr\Cr1)\displaystyle=\sum_{r^{\prime}=1}^{r}Pr\left(x\in C_{r^{\prime}}\backslash C_{r^{\prime}-1}\right)
=r=1rPr(xi=1Rr,i\Cr1)\displaystyle=\sum_{r^{\prime}=1}^{r}Pr\left(x\in\cup_{i=1}^{\ell}R_{r^{\prime},i}\backslash C_{r^{\prime}-1}\right)
=r=1ri=11Pr(xRr,i\Cr1|x𝒩r,i)\displaystyle=\sum_{r^{\prime}=1}^{r}\sum_{i=1}^{\ell}\frac{1}{\ell}Pr\left(x\in R_{r^{\prime},i}\backslash C_{r^{\prime}-1}\,|\,x\in\mathcal{N}_{r^{\prime},i}\right)
=r=1ri=11Pr(xCr1 and xAlgRel(Xr,iCr1{x},𝐪))\displaystyle=\sum_{r^{\prime}=1}^{r}\sum_{i=1}^{\ell}\frac{1}{\ell}Pr\left(x\not\in C_{r^{\prime}-1}\text{ and }x\in\textsc{AlgRel}(X_{r^{\prime},i}\cup C_{r^{\prime}-1}\cup\{x\},\mathbf{q})\right)
=r=1rpxr\displaystyle=\sum_{r^{\prime}=1}^{r}p_{x}^{r^{\prime}} ∎

The rest of the proof bounds f(Sr,1)f(S_{r,1}) in the following two ways.

First, let Or,1={oO:oAlgRel(Xr,1Cr1{o},𝐪)}O_{r,1}=\{o\in O:o\not\in\textsc{AlgRel}\left(X_{r,1}\cup C_{r-1}\cup\{o\},\mathbf{q}\right)\}, and Or,2=(Cr1O)Or,1O_{r,2}=\left(C_{r-1}\cap O\right)\cup O_{r,1}. Since Alg follows the Randomized Consistency Property, it holds that AlgSol(Xr,1Cr1,𝐪)=AlgSol(Xr,1Cr1Or,1,𝐪)=Sr,1\textsc{AlgSol}(X_{r,1}\cup C_{r-1},\mathbf{q})=\textsc{AlgSol}(X_{r,1}\cup C_{r-1}\cup O_{r,1},\mathbf{q})=S_{r,1}, and f(Sr,1)αf(Or,2)f\left(S_{r,1}\right)\geq\alpha f\left(O_{r,2}\right). So, for any oOo\in O,

Pr(oOr,2)=Pr(oCr1 or oAlgRel(Xr,1Cr1{o},𝐪))=1por.\displaystyle Pr\left(o\in O_{r,2}\right)=Pr\left(o\in C_{r-1}\text{ or }o\not\in\textsc{AlgRel}\left(X_{r,1}\cup C_{r-1}\cup\{o\},\mathbf{q}\right)\right)=1-p_{o}^{r}.

Therefore,

𝔼[f(Sr,1)]α𝔼[f(Or,2)]αF(𝟏O𝐩r).\displaystyle\mathbb{E}\left[f\left(S_{r,1}\right)\right]\geq\alpha\mathbb{E}\left[f\left(O_{r,2}\right)\right]\geq\alpha F\left(\mathbf{1}_{O}-\mathbf{p}^{r}\right). (4)

Second, let Or,3=Cr1OO_{r,3}=C_{r-1}\cap O. Similarly, it holds that f(Sr,1)αf(Or,3)f\left(S_{r,1}\right)\geq\alpha f\left(O_{r,3}\right). For any oOo\in O, by Lemma 6, it holds that

Pr(oOr,3)=Pr(oCr1)=r=1r1por,Pr\left(o\in O_{r,3}\right)=Pr\left(o\in C_{r-1}\right)=\sum_{r^{\prime}=1}^{r-1}p_{o}^{r^{\prime}},

Therefore,

𝔼[f(Sr,1)]α𝔼[f(Or,3)]αF(r=1r1𝐩r).\displaystyle\mathbb{E}\left[f\left(S_{r,1}\right)\right]\geq\alpha\mathbb{E}\left[f\left(O_{r,3}\right)\right]\geq\alpha F\left(\sum_{r^{\prime}=1}^{r-1}\mathbf{p}^{r^{\prime}}\right). (5)

By Inequalities 4 and 5, we bound the approximation ratio of G-DASH by the following,

𝔼[f(S)]\displaystyle\mathbb{E}\left[f(S)\right] εr=11/ε𝔼[f(Sr,1)]\displaystyle\geq\varepsilon\sum_{r=1}^{1/\varepsilon}\mathbb{E}\left[f\left(S_{r,1}\right)\right]
εα(F(r=11/ε1𝐩r)+r=11/ε1F(𝟏O𝐩r))\displaystyle\geq\varepsilon\alpha\cdot\left(F\left(\sum_{r^{\prime}=1}^{1/\varepsilon-1}\mathbf{p}^{r^{\prime}}\right)+\sum_{r=1}^{1/\varepsilon-1}F\left(\mathbf{1}_{O}-\mathbf{p}^{r}\right)\right)
(a)α(1ε)F(𝟏O)\displaystyle\overset{(a)}{\geq}\alpha(1-\varepsilon)F\left(\mathbf{1}_{O}\right)
(αε)OPT,\displaystyle\geq(\alpha-\varepsilon)\textsc{OPT},

where Inequality (a) follows from Lemma 8 and the convexity of FF. ∎

Corollary 2.

Let (f,k)(f,k) be an instance of SM where k=O(εψ/)k=O\left(\varepsilon\psi/\ell\right). G-DASH returns set SS with O(1/ε)O\left(1/\varepsilon\right) MR rounds, O(1ε5log(k)log(n))O\left(\frac{1}{\varepsilon^{5}}\log(k)\log(n)\right) adaptive rounds, O(nlog(k)ε5)O\left(\frac{n\log(k)}{\varepsilon^{5}}\right) total queries, O(nε)O\left(\frac{n}{\varepsilon}\right) communication complexity, and probability at least 1ε1n12c1-\varepsilon^{-1}n^{1-2c} such that,

𝔼[f(S)](11/eε)OPT.\mathbb{E}\left[f(S)\right]\geq(1-1/e-\varepsilon)\textsc{OPT}.

5 Threshold-DASH: Two MR-Rounds Algorithm with Improved Theoretical Properties

In this section, we present Threshold-DASH (T-DASH), a novel (0.375ε(0.375-\varepsilon)-approximate, two-MR-round algorithm that achieves nearly optimal O(log(n))O(\log(n)) adaptivity with O(nlog(k))O(n\log(k)) total time complexity.

Description.  The T-DASH algorithm (Alg. 6) is a two-MR-round algorithm that runs ThreshSeqMod concurrently on every machine with a specified threshold value of αOPT/k\alpha\textsc{OPT}/k. The primary machine then extends its solution S1S_{1} by adding elements with ThreshSeqMod from the pool of relevant elements returned by the other machines. Notice that there is a small amount of data duplication, as elements of the ground set are not randomly partitioned in the same way as in the other algorithms. This version of the algorithm requires knowledge of the value OPT; in Appendix E we show how to remove this assumption. In Appendix I, we further discuss the similar two-MR-round algorithm of Liu and Vondrak (2019), which provides an improved 1/21/2-approximation but requires four times the data duplication of T-DASH.

Algorithm 6 Threshold-DASH Knowing OPT
1:Input: Evaluation oracle f:2𝒩f:2^{\mathcal{N}}\to\mathbb{R}, constraint kk, error ε\varepsilon, available machines M{1,2,,}M\leftarrow\{1,2,...,\ell\}, and OPT
2:Initialize δ1/(+1)\delta\leftarrow 1/(\ell+1), 𝐪\mathbf{q}\leftarrow a fixed sequence of random bits.
3:Set α38\alpha\leftarrow\frac{3}{8}, (𝐪1,,𝐪+1)𝐪(\mathbf{q}_{1},\ldots,\mathbf{q}_{\ell+1})\leftarrow\mathbf{q}
4:for e𝒩e\in\mathcal{N} do
5:     Assign ee to each machine independently with probability 1/1/\ell
6:for iMi\in M do
7:     \triangleright On machine ii
8:     Let 𝒩i\mathcal{N}_{i} be the elements assigned to machine ii
9:     Si,RiThreshSeqMod(f,𝒩i,k,δ,ε,(α+ε)OPT/k,𝐪i)S_{i},R_{i}\leftarrow\textsc{ThreshSeqMod}(f,\mathcal{N}_{i},k,\delta,\varepsilon,(\alpha+\varepsilon)\textsc{OPT}/k,\mathbf{q}_{i})
10:     if |Si|=k|S_{i}|=k then return TSiT^{\prime}\leftarrow S_{i}
11:     Send Si,RiS_{i},R_{i} to primary machine
12:\triangleright On primary machine
13:Ri=1RiR\leftarrow\bigcup_{i=1}^{\ell}R_{i}
14:Let g()f(S1)f(S1)g(\cdot)\leftarrow f(S_{1}\cup\cdot)-f(S_{1})
15:TThreshSeqMod(g,R,k|S1|,δ,ε,(α+ε)OPT/k,𝐪+1)T\leftarrow\textsc{ThreshSeqMod}(g,R,k-|S_{1}|,\delta,\varepsilon,(\alpha+\varepsilon)\textsc{OPT}/k,\mathbf{q}_{\ell+1})
16:TS1TT^{\prime}\leftarrow S_{1}\cup T
17:return TT^{\prime}
Theorem 5.

Let (f,k)(f,k) be an instance of SM where k=O(ψ)k=O\left(\frac{\psi}{\ell}\right). T-DASH knowing OPT returns set TT^{\prime} with two MR rounds, O(1ε3log(n))O\left(\frac{1}{\varepsilon^{3}}\log\left(n\right)\right) adaptive rounds, O(nε3)O\left(\frac{n}{\varepsilon^{3}}\right) total queries, O(n)O(n) communication complexity, and probability at least 1nc1-n^{-c} such that

𝔼[f(T)](38ε)OPT.\mathbb{E}\left[f(T^{\prime})\right]\geq\left(\frac{3}{8}-\varepsilon\right)\textsc{OPT}.
Proof.

Query Complexity and Adaptivity. T-DASH runs with two MR rounds, where the first MR round invokes ThreshSeqMod \ell times in parallel, and the second MR round invokes ThreshSeqMod once. By Theorem 1, each call of ThreshSeqMod with a ground set of size at most n\frac{n}{\ell} queries O(nε3)O\left(\frac{n}{\ell\varepsilon^{3}}\right) within O(log(n/)ε3)O\left(\frac{\log(n/\ell)}{\varepsilon^{3}}\right) adaptive rounds. The total number of queries is O(nε3)O\left(\frac{n}{\varepsilon^{3}}\right), and the total number of adaptive rounds is O(log(n)ε3)O\left(\frac{\log(n)}{\varepsilon^{3}}\right).

Approximation Ratio. In Algorithm 6, there are +1\ell+1 independent calls of ThreshSeqMod. With |𝒩i|nc|\mathcal{N}_{i}|\geq n^{c}, the success probability of each call of ThreshSeqMod is larger than 11nc(+1)1-\frac{1}{n^{c}(\ell+1)}. Thus, Algorithm 6 succeeds with probability larger than 1nc1-n^{-c}. For the remainder of the analysis, we condition on the event that all calls to ThreshSeqMod succeed.

In the case that |T|=k|T^{\prime}|=k, by Theorem 1 in Section 2.1 and τ=(38+ε)OPTk\tau=\left(\frac{3}{8}+\varepsilon\right)\frac{\textsc{OPT}}{k}, it holds that f(T)12ε1+ετk(38ε)OPT.f(T^{\prime})\geq\frac{1-2\varepsilon}{1+\varepsilon}\tau\cdot k\geq\left(\frac{3}{8}-\varepsilon\right)\textsc{OPT}. Otherwise, we consider the case that |T|<k|T^{\prime}|<k in the following. Let (TSMSol(𝒩,𝐪),TSMRel(𝒩,𝐪))(\textsc{TSMSol}(\mathcal{N},\mathbf{q}),\textsc{TSMRel}(\mathcal{N},\mathbf{q})) be the pair of sets returned by ThreshSeqMod(𝒩,𝐪)\textsc{ThreshSeqMod}(\mathcal{N},\mathbf{q}). For any x𝒩x\in\mathcal{N}, let

px={PrX𝒩(1/),𝐪[TSMRel(X{x},𝐪)=TSMRel(X,𝐪)], if xO0, otherwise .p_{x}=\begin{cases}Pr_{X\sim\mathcal{N}(1/\ell),\mathbf{q}}\left[\textsc{TSMRel}(X\cup\{x\},\mathbf{q})\right.\\ \hskip 60.00009pt\left.=\textsc{TSMRel}(X,\mathbf{q})\right]\hskip 10.00002pt\text{, if }x\in O\\ 0\hskip 130.0002pt\text{, otherwise }\end{cases}.

Let O1={oO:oTSMRel(N1{o},q)}O_{1}=\{o\in O:o\not\in\textsc{TSMRel}(N_{1}\cup\{o\},q)\}, and let O2=ROO_{2}=R\cap O. By the Randomized Consistency Property, it holds that S1=TSMSol(N1,q)=TSMSol(N1O1,q)S_{1}=\textsc{TSMSol}(N_{1},q)=\textsc{TSMSol}(N_{1}\cup O_{1},q). For any oO1o\in O_{1}, oo is not selected in S1S_{1}. Since |S1|<k|S_{1}|<k, by Property (4) in Theorem 1, Δ(o|T)<Δ(o|S1)<τ.\Delta\left(o\,|\,T^{\prime}\right)<\Delta\left(o\,|\,S_{1}\right)<\tau. Also, for any oO2\To\in O_{2}\backslash T, oo is not selected in TT. Similarly, Δ(o|T)<τ.\Delta\left(o\,|\,T^{\prime}\right)<\tau. Then, we can get,

f(O1O2)f(T)f(O1O2T)f(T)\displaystyle f(O_{1}\cup O_{2})-f(T^{\prime})\leq f(O_{1}\cup O_{2}\cup T^{\prime})-f(T^{\prime})
oO1O2\TΔ(o|T)kτ=(38+ε)OPT.\displaystyle\leq\sum_{o\in O_{1}\cup O_{2}\backslash T^{\prime}}\Delta\left(o\,|\,T^{\prime}\right)\leq k\cdot\tau=\left(\frac{3}{8}+\varepsilon\right)\textsc{OPT}. (6)

Next, we provide the following lemma to complete the rest of the proof.

Lemma 7.

For any oOo\in O, it holds that Pr(oO1O2)3/4Pr\left(o\in O_{1}\cup O_{2}\right)\geq 3/4.

Then, by Lemma 8 in Appendix A, 𝔼[f(O1O2)]3/4F(𝟏O)=3/4OPT.\mathbb{E}\left[f(O_{1}\cup O_{2})\right]\geq 3/4\cdot F(\mathbf{1}_{O})=3/4\cdot\textsc{OPT}. From this and Inequality 6, we can bound the approximation ratio of T-DASH knowing OPT as follows: 𝔼[f(T)](38ε)OPT.\mathbb{E}\left[f(T^{\prime})\right]\geq\left(\frac{3}{8}-\varepsilon\right)\textsc{OPT}. ∎

6 LinearTime-Distributed (L-Dist)

Algorithm 7 LinearTime-Distributed
1:Input: Evaluation oracle f:2𝒩f:2^{\mathcal{N}}\to\mathbb{R}, constraint kk, error ε\varepsilon, available machines M{1,2,,}M\leftarrow\{1,2,...,\ell\}
2:for e𝒩e\in\mathcal{N}  do
3:     Assign ee to machine chosen uniformly at random
4:for iMi\in M do
5:     \triangleright On machine ii
6:     Let 𝒩i\mathcal{N}_{i} be the elements assigned to machine ii
7:     SiLTC(f,𝒩i,k,𝐪)S_{i}\leftarrow\textsc{LTC}(f,\mathcal{N}_{i},k,\mathbf{q})
8:     Send SiS_{i} to primary machine
9:\triangleright On primary machine
10:Gather Si=1SiS\leftarrow\bigcup_{i=1}^{\ell}S_{i}
11:T1LTC(f,S,k,𝐪)T_{1}\leftarrow\textsc{LTC}(f,S,k,\mathbf{q}) on machine m1m_{1}
12:T1T_{1}^{\prime}\leftarrow last kk elements in T1T_{1}
13:T2ThresholdGreedy(f,T1,k,ε,f(T1),1/2)T_{2}\leftarrow\textsc{ThresholdGreedy}\left(f,T_{1},k,\varepsilon,f(T_{1}^{\prime}),1/2\right) \triangleright Post-Processing
14:S1S_{1}^{\prime}\leftarrow last kk elements added to S1S_{1}
15:return Vargmax{f(S1),f(T1),f(T2)}V\leftarrow\operatorname*{arg\,max}{\{f(S_{1}^{\prime}),f(T_{1}^{\prime}),f(T_{2})}\}

In this section, we introduce L-Dist, the first linear-time MapReduce algorithm with a constant approximation ratio. We derive L-Dist from R-DASH by incorporating LTC (Alg. 3), an algorithm with O(n)O(n) query complexity, into the distributed setting. L-Dist operates within two MapReduce rounds, demonstrating an adaptive complexity of O(n)O(\frac{n}{\ell}) and a query complexity of O(n)O(n); as stated in Theorem 6, it guarantees an approximation ratio of 18\frac{1}{8}. Additionally, we improve the objective value in practice by running ThresholdGreedy (Alg. 9 in Appendix D) on the set returned in the second MapReduce round.

Description: L-Dist first distributes the ground set uniformly at random across all machines MM. In the first MR round, it applies LTC on each machine to obtain SiS_{i} within O(n/)O\left(n/\ell\right) queries. The solutions from all machines are then sent to the primary machine, where LTC computes the solution T1T_{1}. By selecting the best among the last kk elements of these solutions, L-Dist ensures a 18\frac{1}{8}-approximation. Besides returning the last kk elements of T1T_{1}, which are a 1/21/2-approximation of T1T_{1}, we employ ThresholdGreedy, a (11/eε)(1-1/e-\varepsilon)-approximation algorithm, to boost the objective value. We provide the theoretical guarantees of L-Dist as Theorem 6; the proof involves a minor modification of the proof provided in Theorem 3.

Theorem 6.

Let (f,k)(f,k) be an instance of SM where k<ψlog(ψ)k<\frac{\psi}{\ell\log(\psi)}. L-Dist returns a set VV with two MR rounds, O(n)O\left(\frac{n}{\ell}\right) adaptive rounds, O(n)O\left(n\right) total queries, O(n)O(n) communication complexity such that

𝔼[f(V)]18OPT.\mathbb{E}[f(V)]\geq\frac{1}{8}\textsc{OPT}.

6.1 Analysis of Query Complexity and Adaptivity

L-Dist runs with two MR rounds. In the first MR round, LTC is invoked \ell times in parallel, each with O(n/)O\left(n/\ell\right) queries and O(n/)O\left(n/\ell\right) adaptive rounds by Theorem 2. So, during the first MR round, the number of queries is O(n)O\left(n\right) and the number of adaptive rounds is O(n/)O\left(n/\ell\right). Then, the second MR round calls LTC and ThresholdGreedy once each, handling at most n/n/\ell elements. By Theorems 2 and 8, the number of queries is O(n/)O\left(n/\ell\right) and the number of adaptive rounds is O(n/)O\left(n/\ell\right). Consequently, the total number of queries of L-Dist is O(n)O\left(n\right), and the total number of adaptive rounds is O(n/)O\left(n/\ell\right).

6.2 Analysis of Approximation Ratio

Let L-Dist be executed with input (f,k,ε,M)(f,k,\varepsilon,M). Since n1c\ell\leq n^{1-c}, it follows that the expected size of each subset |𝒩i||\mathcal{N}_{i}| satisfies 𝔼[|𝒩i|]=n/nc\mathbb{E}\left[|\mathcal{N}_{i}|\right]=n/\ell\geq n^{c}. To ensure that the size |𝒩i||\mathcal{N}_{i}| is concentrated, we apply Chernoff’s bound. Let 𝒩(1/)\mathcal{N}(1/\ell) denote the random distribution over subsets of 𝒩\mathcal{N} where each element is included independently with probability 1/1/\ell. Let 𝐩[0,1]n\mathbf{p}\in[0,1]^{n} be the following vector. For x𝒩x\in\mathcal{N}, let

px={PrX𝒩(1/),𝐪[LTC(X{x},𝐪)=LTC(X,𝐪)], if xO0, otherwise .p_{x}=\begin{cases}Pr_{X\sim\mathcal{N}(1/\ell),\mathbf{q}}\left[\textsc{LTC}(X\cup\{x\},\mathbf{q})=\textsc{LTC}(X,\mathbf{q})\right]\text{, if }x\in O\\ 0\hskip 130.0002pt\text{, otherwise }\end{cases}.

Consider S1=LTC(𝒩1,𝐪)S_{1}=\textsc{LTC}(\mathcal{N}_{1},\mathbf{q}) on machine m1m_{1}. Let O1={oO:LTC(𝒩1{o},𝐪)=LTC(𝒩1,𝐪)}O_{1}=\{o\in O:\textsc{LTC}(\mathcal{N}_{1}\cup\{o\},\mathbf{q})=\textsc{LTC}(\mathcal{N}_{1},\mathbf{q})\}. By Lemma 5, LTC(𝒩1O1,𝐪)=S1\textsc{LTC}(\mathcal{N}_{1}\cup O_{1},\mathbf{q})=S_{1}. Therefore, by Theorem 2, it holds that f(S1)f(O1)/4f(S_{1}^{\prime})\geq f(O_{1})/4. Next, let O2=OSO_{2}=O\cap S, where S=i=1SiS=\bigcup_{i=1}^{\ell}S_{i}. Similarly, by Theorem 2, it holds that f(T1)f(O2)/4f(T_{1}^{\prime})\geq f(O_{2})/4. Let oOo\in O; oo is assigned to 𝒩c\mathcal{N}_{c} on some machine mcm_{c}. It holds that,

Pr[oO2]\displaystyle\Pr[o\in O_{2}] =Pr[oLTC(𝒩c)|o𝒩c]\displaystyle=\Pr[o\in\textsc{LTC}(\mathcal{N}_{c})|o\in\mathcal{N}_{c}]
=Pr[oLTC(𝒩c{o})]\displaystyle=\Pr[o\in\textsc{LTC}(\mathcal{N}_{c}\cup\{o\})]
=1po.\displaystyle=1-p_{o}.

Therefore,

𝔼[f(V)]\displaystyle\mathbb{E}\left[f(V)\right] 12(𝔼[f(S1)]+𝔼[f(T1)])\displaystyle\geq\frac{1}{2}\left(\mathbb{E}[f(S_{1}^{\prime})]+\mathbb{E}[f(T_{1}^{\prime})]\right)
12(𝔼[f(O1)/4]+𝔼[f(O2)/4])\displaystyle\geq\frac{1}{2}\left(\mathbb{E}[f(O_{1})/4]+\mathbb{E}[f(O_{2})/4]\right)
18(F(𝐩)+F(𝟏O𝐩))\displaystyle\geq\frac{1}{8}\left(F(\mathbf{p})+F(\mathbf{1}_{O}-\mathbf{p})\right)
18F(𝟏O)=18OPT,\displaystyle\geq\frac{1}{8}F(\mathbf{1}_{O})=\frac{1}{8}\textsc{OPT}, (7)

where Inequality 7 follows from Lemma 8 and the convexity of FF.

6.3 Post-Processing

In this algorithm, we employ a simple post-processing procedure. Since T1T_{1}^{\prime} is a 1/21/2-approximation of T1T_{1} with size kk, T1T_{1}^{\prime} is also a 1/21/2-approximation of the instance (f,k)(f,k) restricted to the ground set T1T_{1}. By running any linear-time algorithm with a better approximation ratio on T1T_{1}, we can boost the objective value returned by the algorithm while keeping the same theoretical guarantees. By Theorem 8, ThresholdGreedy achieves a (11/eε)(1-1/e-\varepsilon)-approximation in linear time given a guess of the optimal value. Therefore, with input α=1/2\alpha=1/2 and Γ=f(T1)\Gamma=f(T_{1}^{\prime}), ThresholdGreedy can effectively enhance the objective value.
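For concreteness, the post-processing step can be sketched as follows (ours), assuming a threshold_greedy(f, ground, k, eps, gamma, alpha) helper that implements Alg. 9 and that T1 is the ordered list of elements selected by the second LTC call.

def post_process(f, T1, k, eps, threshold_greedy):
    T1_last_k = T1[-k:]                  # last k elements: 1/2-approximation on ground set T1
    gamma = f(set(T1_last_k))            # Gamma = f(T1'), so Gamma <= OPT_{T1} <= Gamma / (1/2)
    T2 = threshold_greedy(f, T1, k, eps, gamma, 0.5)
    # Return the better of the last-k solution and the boosted solution.
    return max([T1_last_k, T2], key=lambda A: f(set(A)))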

7 Towards Larger kk: A Memory-Efficient, Distributed Framework (MED)

In this section, we propose a general-purpose plug-in framework for distributed algorithms, MemoryEfficientDistributed (MED, Alg. 8). MED increases the largest possible constraint value from O(n/2)O(n/\ell^{2}) to O(n/)O(n/\ell) in the value oracle model. Under some additional assumptions, we remove all restrictions on the constraint value.

Algorithm 8 MED(f,k,M,Alg,𝐪)\textsc{MED}(f,k,M,\textsc{Alg},\mathbf{q})
1:Input: evaluation oracle f:2𝒩f:2^{\mathcal{N}}\to\mathbb{R}, constraint kk, available machines M{1,2,,}M\leftarrow\{1,2,...,\ell\}, MR algorithm Alg, random bits 𝐪\mathbf{q}
2:n|𝒩|n\leftarrow|\mathcal{N}|, Ψ\Psi\leftarrow Memory capacity (# elements) of primary machine
3:Choose kmax{k:kΨ}k^{\prime}\leftarrow\max\{k^{\prime}\in\mathbb{N}:k^{\prime}\leq\frac{\Psi}{\ell}\}
4:mk/km\leftarrow\lceil k/k^{\prime}\rceil, (𝐪1,,𝐪m)𝐪(\mathbf{q}_{1},\ldots,\mathbf{q}_{m})\leftarrow\mathbf{q}
5:for i1tomi\leftarrow 1\ to\ m  do
6:     Let g()f(Si)f(Si)g(\cdot)\leftarrow f(\cdot\cup S_{i})-f(S_{i})
7:     AiAlg(g,M,min{k,k|Si|},𝐪i)A_{i}\leftarrow\textsc{Alg}(g,M,\min\{k^{\prime},k-|S_{i}|\},\mathbf{q}_{i})
8:     Si+1SiAiS_{i+1}\leftarrow S_{i}\cup A_{i}
9:return SmS_{m}

As discussed in Section 1, the kk value is limited to a fraction of the machine memory for MapReduce algorithms: kO(n/2)k\leq O(n/\ell^{2}), since those algorithms need to merge a group of solutions and pass it to a single machine. MED works around this limitation as follows: MED can be thought of as a greedy algorithm that uses an approximate greedy selection through Alg. One machine manages the partial solutions {Si}\{S_{i}\}, built up over mm iterations. In this way, each call to Alg is within the constraint restriction of Alg, i.e. O(n/2)O(n/\ell^{2}), but a larger solution of up to size O(n/)O(n/\ell) can be constructed.
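A minimal sketch of this wrapper (ours) follows; alg stands for any MR subroutine Alg that accepts a marginal-gain oracle, a ground set, and a constraint, and k_prime is the per-call constraint permitted by machine memory.

import math

def med(f, ground, k, k_prime, alg):
    S = set()
    m = math.ceil(k / k_prime)
    for _ in range(m):
        fS = f(S)
        # Marginal-gain oracle g(X) = f(S ∪ X) - f(S) for the current partial solution S.
        g = lambda X, S=S, fS=fS: f(S | set(X)) - fS
        budget = min(k_prime, k - len(S))
        if budget == 0:
            break
        A_i = alg(g, [e for e in ground if e not in S], budget)
        S = S | set(A_i)
    return S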

The restriction on kk of MED of O(n/)O(n/\ell) comes from passing the data of the current solution to the next round. Intuitively, if we can send some auxiliary information about the function instead of the elements selected, the kk value can be unrestricted.

Assumption 1.

Let ff be a set function with ground set 𝒩\mathcal{N} of size nn. If for all S𝒩S\subseteq\mathcal{N}, there exists a bitvector 𝐯S\mathbf{v}_{S}, such that the function g(X)=f(SX)f(S)g(X)=f(S\cup X)-f(S) can be computed from XX and 𝐯S\mathbf{v}_{S}, then ff satisfies Assumption 1.

We show in Appendix F that all four applications evaluated in Section 8 satisfy Assumption 1. As an example, consider MaxCover, which can be expressed as f(S)=i𝒩fi(S)f(S)=\sum_{i\in\mathcal{N}}f_{i}(S), where fi(S)=𝟏{i is covered by S}f_{i}(S)=\mathbf{1}_{\{i\text{ is covered by }S\}}. Let gi(X)=fi(SX)fi(S)g_{i}(X)=f_{i}(S\cup X)-f_{i}(S), and 𝐯S=(f1(S),,fn(S))\mathbf{v}_{S}=(f_{1}(S),\ldots,f_{n}(S)). Then, fi(SX)=𝟏{i is covered by S}𝟏{i is covered by X}f_{i}(S\cup X)=\mathbf{1}_{\{i\text{ is covered by }S\}}\lor\mathbf{1}_{\{i\text{ is covered by }X\}}, where the first term is given by 𝐯S\mathbf{v}_{S} and the second term is calculated by XX. Therefore, since g(X)=i𝒩gi(X)g(X)=\sum_{i\in\mathcal{N}}g_{i}(X), g(X)g(X) can be computed from XX and 𝐯S\mathbf{v}_{S}.
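For concreteness, a minimal Python sketch of this construction (ours, with illustrative names) is given below; adjacency[u] is the set of universe elements covered by node u, and v_S is the coverage bitvector of the current solution S, from which g(X) is computed without access to S itself.

def maxcover_marginal_oracle(adjacency, v_S):
    # f(S) is recovered from the bitvector alone: the number of covered universe elements.
    f_S = sum(v_S)
    def g(X):
        covered_by_X = set()
        for u in X:
            covered_by_X |= adjacency[u]
        # f(S ∪ X) = sum_i ( v_S[i]  OR  1{ i is covered by X } )
        f_SX = sum(1 for i, bit in enumerate(v_S) if bit or i in covered_by_X)
        return f_SX - f_S            # g(X) = f(S ∪ X) - f(S)
    return g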

Theorem 7.

Let (f,kf,k) be an instance of SMCC distributed over \ell machines. For generic objectives, where the data of the current solution needs to be passed to the next round, suppose kmin{ψn1+1,ψ+1}k\leq\min\{\frac{\ell\psi-n}{\ell-1}+1,\psi-\ell+1\}; for special objectives, where only one piece of compressed data needs to be passed to the next round, suppose knk\leq n. Let SmS_{m} be the set returned by MED. Then 𝔼[f(Sm)](1eγ)OPT\mathbb{E}\left[f(S_{m})\right]\geq(1-e^{-\gamma})\textsc{OPT}, where γ\gamma is the expected approximation ratio of Alg.

Proof.

Let OO be an optimal solution and let O1,O2,,OmO_{1},O_{2},...,O_{m} be a partition of OO into mm pieces, each of size k\leq k^{\prime}. For each iteration ii, let f^i()=f(Si)f(Si)\hat{f}_{i}(\cdot)=f(\cdot\cup S_{i})-f(S_{i}) denote the marginal-gain function gg used in that iteration; each f^i\hat{f}_{i} is monotone and submodular since ff is.

𝔼[f(Si+1)f(Si)|Si]\displaystyle\mathbb{E}\left[f(S_{i+1})-f(S_{i})|S_{i}\right] =𝔼[f(SiAi)f(Si)|Si]\displaystyle=\mathbb{E}\left[f(S_{i}\cup A_{i})-f(S_{i})|S_{i}\right]
=𝔼[f^i(Ai)|Si]\displaystyle=\mathbb{E}\left[\hat{f}_{i}(A_{i})|S_{i}\right]
(a)1mj=1mγf^i(Oj)\displaystyle\overset{(a)}{\geq}\frac{1}{m}\sum_{j=1}^{m}\gamma\hat{f}_{i}(O_{j})
γmf^i(O)\displaystyle\geq\frac{\gamma}{m}\hat{f}_{i}(O)
=γm(f(SiO)f(Si))\displaystyle=\frac{\gamma}{m}(f(S_{i}\cup O)-f(S_{i}))
γm(OPTf(Si)),\displaystyle\geq\frac{\gamma}{m}(OPT-f(S_{i})),

where Inequality (a) follows from the submodularity of f^i\hat{f}_{i} and the fact that Alg is a γ\gamma-approximation (each OjO_{j} has size at most kk^{\prime} and is therefore feasible for the call to Alg). Taking the expectation over SiS_{i}, it holds that

OPT𝔼[f(Si+1)]\displaystyle OPT-\mathbb{E}\left[f(S_{i+1})\right] (1γm)[OPT𝔼[f(Si)]]\displaystyle\leq\left(1-\frac{\gamma}{m}\right)[OPT-\mathbb{E}\left[f(S_{i})\right]]
(1γm)i[OPTf()]\displaystyle\leq\left(1-\frac{\gamma}{m}\right)^{i}[OPT-f(\emptyset)]
=(1γm)iOPT\displaystyle=\left(1-\frac{\gamma}{m}\right)^{i}\cdot OPT
𝔼[f(Sm)]\displaystyle\therefore\mathbb{E}\left[f(S_{m})\right] [1(1γm)m]OPT\displaystyle\geq\left[1-(1-\frac{\gamma}{m})^{m}\right]\cdot OPT
(1eγ)OPT\displaystyle\geq(1-e^{-\gamma})\cdot OPT

To analyze the memory requirements of MED, consider the following. To compute g()f(Si)f(Si)g(\cdot)\leftarrow f(\cdot\cup S_{i})-f(S_{i}), the current solution SiS_{i} needs to be passed to each machine in MM for the next call of Alg. Suppose |Si|=kx|S_{i}|=k-x where 1xk1\leq x\leq k. The size of the data stored on any non-primary machine in cluster MM, as well as on the primary machine of MM, can be bounded as follows:

kx+(nk+x)/Ψkx+(k)/mΨk(Ψn)/(1)+1kΨ+1.\begin{aligned} &k-x+(n-k+x)/\ell\leq\Psi\\ &k-x+(k\ell)/m\leq\Psi\end{aligned}\,\Rightarrow\,\begin{aligned} &k\leq(\ell\Psi-n)/(\ell-1)+1\\ &k\leq\Psi-\ell+1\end{aligned}.

Therefore, if Ψ2n/\Psi\geq 2n/\ell, MED can run since n/\ell\leq n/\ell in our MapReduce model. Under the alternative assumption, it is clear that MED can run for all knk\leq n. ∎

8 Empirical Evaluation

This section presents an empirical comparison of R-DASH, T-DASH, G-DASH, and L-Dist with state-of-the-art distributed algorithms: RandGreeDI (RG) of Barbosa et al. (2015) and DistributedDistorted (DDist) of Kazemi et al. (2021).

Table 2: Environment Setup
Experiment Number of Machines (\ell)* Number of Threads Per Machine (Υ\varUpsilon) Dataset Size (n)
Experiment 1 (Fig. 2) 8 4 10K - 100K
Experiment 2 (Fig. 3) 64 32 1M - 5M
Experiment 3 (Fig. 4) 8 4 3M
Experiment 3 (Fig. 4) 8 4 100K
Experiment 4 (Fig. 5) 32 1 100K
  • *

    Prior MR (greedy) algorithms were assessed using Υ\ell\cdot\varUpsilon machines, utilizing each thread as an individual core. Thus, all algorithms were configured to use all available cores.

  • {\ddagger}

    We kept a variant of R-DASH evaluated on 8 machines (with 4 cores each) as a baseline. All other algorithms use 2 machines (with 4 cores each).

Environment. Our experiments utilized diverse computational setups. Experiments 1 and 3 employed an eight-machine cluster, each machine with four CPU cores, totaling 32 cores. Experiment 2 was conducted on a larger 64-machine cluster, each machine featuring 32 CPU cores. Experiment 4 operated on a cluster of 32 single-core machines. Notably, in Experiment 1, our algorithms used =8\ell=8 machines, while prior MapReduce (MR) algorithms employed =32\ell=32, fully utilizing the 32 cores. MPICH version 3.3a2 was installed on every machine, and we used the Python library mpi4py for implementing and parallelizing all algorithms with the Message Passing Interface (MPI). The algorithms were executed using the mpirun command, with runtime tracked by calling mpi4py.MPI.Wtime() at the algorithm's start and completion.

Datasets. Table 2 presents the dataset sizes for our experiments. In Experiments 1 and 3 (Fig. 4), we assess algorithm performance on small datasets, varying from nn = 10,000 to 100,000. These sizes enable evaluation of the more computationally intensive algorithms, such as DDist, T-DASH, and G-DASH. Experiments 2, 3 (Fig. 4), and 4 focus on larger datasets, ranging from nn = 50,000 to 5 million.

Applications. We evaluated image summarization (ImageSumm), influence maximization (InfMax), revenue maximization (RevMax) and maximum coverage (MaxCover). Details are provided in Appendix F.1.

Experiment Objectives. The primary objective of our experiment set is to comprehensively evaluate the practicality of the distributed algorithms across varying cluster and dataset sizes. The specific objectives of each experiment are outlined below:

  • Experiment 1 (results published in the Thirty-Seventh AAAI Conference on Artificial Intelligence, AAAI 2023): Baseline experiment aimed at assessing the performance of the algorithms using small datasets and a small cluster setup.

  • Experiment 2: Assess the performance of the algorithms on large datasets and cluster setup.

  • Experiment 3: Investigate the influence of the number of nodes in a cluster on algorithm performance.

  • Experiment 4: Examine the impact of increasing cardinality constraints on the performance of MED.

8.1 Experiment 1 - Comparative Analysis on Small Datasets

The results of Experiment 1 (Fig. 2) show that all algorithms provide similar solution values (with T-DASH being a little worse than the others). However, there is a large difference in runtime, with R-DASH the fastest by orders of magnitude. The availability of only 4 threads per machine severely limits the parallelization of T-DASH, resulting in longer runtime; access to log(k)1+ε\log{}_{1+\varepsilon}(k) threads per machine should result in faster runtime than R-DASH.

Figure 2: Performance comparison of distributed algorithms on ImageSumm, InfluenceMax, RevenueMax and MaxCover; RandGreeDI (RG) is run with Greedy as the algorithm Alg (Barbosa et al., 2015) to ensure the 12(11/e)\frac{1}{2}(1-1/e) ratio. All Greedy implementations used lazy greedy to improve the runtime. Timeout for each application: 6 hours per algorithm.

8.2 Experiment 2 - Performance Analysis of R-DASH, RandGreeDI and L-Dist on a Large Cluster

Figure 3: Empirical comparison of R-DASH and L-Dist. The plotted metrics are solution value and runtime.

This section presents a comparative analysis of R-DASH, L-Dist, and RandGreeDI on a large 64-node cluster, each node equipped with 32 cores. We assess two versions of L-Dist and RandGreeDI, one with =64\ell=64 and the other with =2048\ell=2048, with a predetermined time limit of 3 hours for each application. The plotted results depict the instances completed by each algorithm within this time constraint.

In terms of solution quality, as depicted in Figure 3, R-DASH and RandGreeDI produce statistically indistinguishable results, while L-Dist exhibits an average drop of 9% in solution quality across completed instances. Examining runtime (Figure 3), R-DASH consistently outperforms L-Dist and RandGreeDI by orders of magnitude. As discussed in Experiment 3, for both L-Dist and RandGreeDI, instances with fewer distributions (=64\ell=64) display faster runtimes compared to instances using a larger setup (=2048\ell=2048), attributed to kk values surpassing n2\frac{n}{\ell^{2}} (0.27\simeq 0.27 for InfMax). The subsequent experiment demonstrates how MED effectively addresses this challenge. Comparing the fastest variant of each algorithm, we observe that for the smallest kk values, RandGreeDI outperforms L-Dist. However, as kk increases, RandGreeDI's runtime grows linearly, while L-Dist's runtime remains unaffected due to its linear time complexity, enabling L-Dist to outperform RandGreeDI for larger constraints.

8.3 Experiment 3 - Scalability Assessment

Figure 4(a) illustrates a linear speedup for R-DASH as the number of machines \ell increases. Figure 4(b) highlights an intriguing observation: despite having sufficient available memory, increasing \ell can result in inferior performance when k>n2k>\frac{n}{\ell^{2}}. Specifically, as depicted in Figure 4(b), we initially witness the expected faster execution of RandGreeDI with =32\ell=32 compared to RandGreeDI with =8\ell=8. However, once k>n322k>\frac{n}{32^{2}}, the relative performance of RandGreeDI with =32\ell=32 rapidly deteriorates. This decline can be attributed to the degradation of RandGreeDI's performance beyond k=n2k=\frac{n}{\ell^{2}}.

When running RandGreeDI with a single thread on each machine, the total running time on \ell machines can be computed based on two components. First, the running time for one machine in the first MR round, which is proportional to (n/(k1)/2)k(n/\ell-(k-1)/2)k. Second, the running time for the primary machine in the second MR round (post-processing step), which is proportional to (k(k1)/2)k(k\ell-(k-1)/2)k. Consequently, the total running time is proportional to nk/+k2k(k1)nk/\ell+\ell k^{2}-k(k-1). Optimal performance is achieved when =n/k\ell=\sqrt{n/k}, which justifies the preference for parallelization within a machine to maintain a lower \ell rather than distributing the data across separate processors.
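Under the stated single-thread assumption, this two-component query model can be checked numerically; the short snippet below (ours) confirms that the total count, proportional to nk/ℓ + ℓk² − k(k−1), is minimized near ℓ = sqrt(n/k).

import math

def randgreedi_queries(n, k, ell):
    # Round 1 on each machine: (n/ell - (k-1)/2) * k queries (lazy greedy bound).
    # Round 2 on the primary machine: (k*ell - (k-1)/2) * k queries.
    return (n / ell - (k - 1) / 2) * k + (k * ell - (k - 1) / 2) * k

n, k = 1_000_000, 100
best_ell = min(range(1, 201), key=lambda ell: randgreedi_queries(n, k, ell))
print(best_ell, round(math.sqrt(n / k)))   # both values are 100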

Furthermore, in Experiment 4 (Section 8.4), we demonstrate that utilizing MED enables MR algorithms to produce solutions much faster with no compromise in solution value, particularly when solving for k>n2k>\frac{n}{\ell^{2}}. These results provide further support for the advantage of incorporating MED in achieving efficient and effective parallelization in MR algorithms.

Figure 4: (a) Scalability of R-DASH vs. \ell. (b) RandGreeDI with =8\ell=8 vs. =32\ell=32.

8.4 Experiment 4 - Performance Analysis of MED

Figure 5: Empirical comparison of MED+RG to RandGreeDI. The plotted metrics are solution value and runtime.

This section presents experimental results comparing the performance of MED+RG and the vanilla RandGreeDI algorithm. In terms of solution value, MED+RG consistently provides nearly identical solutions to the vanilla RandGreeDI algorithm across all instances for the three applications studied. Regarding runtime, the following observations are made: Initially, both algorithms exhibit similar execution times up to a threshold of k=n2k=\frac{n}{\ell^{2}}. However, beyond this threshold, the performance gap between MED+RG and RandGreeDI widens linearly. MED+RG achieves notable average speedup factors of 1.8, 2.2, and 2.3 over RandGreeDI for the respective applications. Moreover, beyond k=n2k=\frac{n}{\ell^{2}}, MED+RG outperforms RandGreeDI in terms of completing instances within a 12-hour timeout, completing 77% more instances of kk across all three applications. These findings highlight the promising performance of MED+RG, demonstrating comparable solution values while significantly improving runtime efficiency compared to the vanilla RandGreeDI algorithm.

The empirical findings from this experiment provide insights into the capabilities of the MED+Alg framework. Our results indicate that MED+Alg achieves solution quality that is practically indistinguishable from the vanilla Alg algorithm, even for significantly larger values of kk. Additionally, even with sufficient available memory to run Alg beyond k=n2k=\frac{n}{\ell^{2}}, MED+Alg demonstrates noteworthy computational efficiency, surpassing that of the Alg algorithm. This combination of comparable solution quality and improved computational efficiency positions MED+Alg as a highly promising framework. These findings have significant implications and underscore the potential of MED+Alg to address complex problems in a distributed setting with large-scale values of kk.

9 Discussion and Conclusion

Prior to this work, no MR algorithms for SMCC could parallelize within a machine; these algorithms require many sequential queries. Moreover, increasing the number of machines to the number of threads available in the cluster may actually harm the performance, as we showed empirically; intuitively, this is because the size of the set on the primary machine scales linearly with the number of machines \ell. In this paper, we have addressed this limitation by introducing a suite of algorithms that are both parallelizable and distributed. Specifically, we have presented R-DASH, T-DASH, and G-DASH, which are the first MR algorithms with sublinear adaptive complexity (highly parallelizable). Moreover, our algorithms have nearly linear query complexity over the entire computation. We also provide the first distributed algorithm with O(n)O(n) total query complexity, improving on the O(npolylog(n))O(n\text{polylog}(n)) of our other algorithms and the algorithm of Liu and Vondrak (2019).

When RandGreeDI was introduced by Mirzasoleiman et al. (2013), the empirical performance of the algorithm was emphasized, with theoretical guarantees unproven until the work of Barbosa et al. (2015). Since that time, RandGreeDI has remained the most practical algorithm for distributed SMCC. Our R-DASH algorithm may be regarded as a version of RandGreeDI that is 1) parallelized; 2) nearly linear time in total (RandGreeDI is quadratic); and 3) empirically orders of magnitude faster in parallel wall time in our evaluation.

R-DASH achieves approximation ratio of (11/e)/2(1-1/e)/2 in two MR rounds; the first round merely being to distribute the data. We provide G-DASH to close the gap to (11/e)(1-1/e) ratio in a constant number of rounds. However, MR rounds are expensive (as shown by our Experiment 1). The current best ratio achieved in two rounds is the 0.5450.545-approximation algorithm of Mirrokni and Zadimoghaddam (2015). Therefore, a natural question for future research is what is the best ratio achievable in two MR rounds. Moreover, non-monotone or partially monotone objective functions and more sophisticated constraint systems are directions for future research.

Acknowledgements

Yixin Chen and Tonmoy Dey contributed equally to this work. The work of Tonmoy Dey was partially supported by Florida State University; the work of Yixin Chen and Alan Kuhnle was partially supported by Texas A & M University. The authors have received no third-party funding in direct support of this work. The authors have no additional revenues from other sources related to this work.

Appendix A Lovász Extension of Submodular Function.

Given submodular function ff, the Lovász extension FF of ff is defined as follows: For 𝐳[0,1]|𝒩|\mathbf{z}\in[0,1]^{|\mathcal{N}|},

F(𝐳=(zi)i𝒩)=𝔼λ𝒰[0,1][f({i:ziλ})].F(\mathbf{z}=(z_{i})_{i\in\mathcal{N}})=\mathbb{E}_{\lambda\sim\mathcal{U}[0,1]}[f(\{i:z_{i}\geq\lambda\})].

The Lovász extension satisfies the following properties: (1) FF is convex; (2) F(c𝐳)cF(𝐳)F(c\mathbf{z})\geq cF(\mathbf{z}) for any c(0,1)c\in(0,1). Moreover, we will require the following simple lemma:

Lemma 8 (Barbosa et al. (2015)).

Let SS be a random set, and suppose 𝔼[𝟏S]=c𝐳\mathbb{E}[\mathbf{1}_{S}]=c\cdot\mathbf{z}, for c(0,1)c\in(0,1). Then 𝔼[f(S)]cF(𝐳)\mathbb{E}[f(S)]\geq c\cdot F(\mathbf{z}).
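As a quick numerical illustration of the definition above, F(z) can be estimated by sampling the uniform threshold λ; the sketch below (ours) assumes the oracle f is callable on Python sets and that z is a dictionary over the ground set.

import random

def lovasz_extension_estimate(f, z, num_samples=1000, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        lam = rng.random()
        # Level set {i : z_i >= lambda} for a uniformly drawn threshold lambda.
        level_set = {i for i, zi in z.items() if zi >= lam}
        total += f(level_set)
    return total / num_samples   # Monte Carlo estimate of F(z) = E_lambda[ f({i : z_i >= lambda}) ]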

Appendix B Probability Lemma and Concentration Bounds

Lemma 9.

(Chen et al. (2021)) Suppose there is a sequence of nn Bernoulli trials: X1,X2,,Xn,X_{1},X_{2},\ldots,X_{n}, where the success probability of XiX_{i} depends on the results of the preceding trials X1,,Xi1X_{1},\ldots,X_{i-1}. Suppose it holds that

Pr(Xi=1|X1=x1,X2=x2,,Xi1=xi1)η,Pr\left(X_{i}=1|X_{1}=x_{1},X_{2}=x_{2},\ldots,X_{i-1}=x_{i-1}\right)\geq\eta,

where η>0\eta>0 is a constant and x1,,xi1x_{1},\ldots,x_{i-1} are arbitrary.

Then, if Y1,,YnY_{1},\ldots,Y_{n} are independent Bernoulli trials, each with probability η\eta of success, then

Pr(i=1nXib)Pr(i=1nYib),Pr\left(\sum_{i=1}^{n}X_{i}\leq b\right)\leq Pr\left(\sum_{i=1}^{n}Y_{i}\leq b\right),

where bb is an arbitrary integer.

Moreover, let AA be the first occurrence of success in sequence XiX_{i}. Then,

𝔼[A]1/η.\mathbb{E}\left[A\right]\leq 1/\eta.
Lemma 10 (Chernoff bounds Mitzenmacher and Upfal (2017)).

Suppose X1X_{1}, … , XnX_{n} are independent binary random variables such that Pr(Xi=1)=piPr\left(X_{i}=1\right)=p_{i}. Let μ=i=1npi\mu=\sum_{i=1}^{n}p_{i}, and X=i=1nXiX=\sum_{i=1}^{n}X_{i}. Then for any δ0\delta\geq 0, we have

Pr(X(1+δ)μ)eδ2μ2+δ.\displaystyle Pr\left(X\geq(1+\delta)\mu\right)\leq e^{-\frac{\delta^{2}\mu}{2+\delta}}. (8)

Moreover, for any 0δ10\leq\delta\leq 1, we have

Pr(X(1δ)μ)eδ2μ2.\displaystyle Pr\left(X\leq(1-\delta)\mu\right)\leq e^{-\frac{\delta^{2}\mu}{2}}. (9)

Appendix C Omitted Proof of ThreshSeqMod

See 2

Proof.

After filtration on Line 5, it holds that, for any xVjx\in V_{j}, Δ(x|Sj1)τ\Delta\left(x\,|\,S_{j-1}\right)\geq\tau and Δ(x|VjSj1)=0\Delta\left(x\,|\,V_{j}\cup S_{j-1}\right)=0. Therefore,

|A0|=|{xVj:Δ(x|Sj1)<τ}|=0,\displaystyle|A_{0}|=|\{x\in V_{j}:\Delta\left(x\,|\,S_{j-1}\right)<\tau\}|=0,
|A|Vj||=|{xVj:Δ(x|VjSj1)<τ}|=|Vj|.\displaystyle|A_{|V_{j}|}|=|\{x\in V_{j}:\Delta\left(x\,|\,V_{j}\cup S_{j-1}\right)<\tau\}|=|V_{j}|.

For any xAix\in A_{i}, it holds that Δ(x|TiSj1)<τ\Delta\left(x\,|\,T_{i}\cup S_{j-1}\right)<\tau. Due to submodularity, Δ(x|Ti+1Sj1)Δ(x|TiSj1)<τ\Delta\left(x\,|\,T_{i+1}\cup S_{j-1}\right)\leq\Delta\left(x\,|\,T_{i}\cup S_{j-1}\right)<\tau. Therefore, AiA_{i} is a subset of Ai+1A_{i+1}, which indicates that |Ai||Ai+1||A_{i}|\leq|A_{i+1}|. ∎

Proof of Success Probability.

The algorithm successfully terminates if, at some point, |Vj|=0|V_{j}|=0 or |Sj|=k|S_{j}|=k. To analyze the success probability, we consider a variant of the algorithm in which it does not terminate once |Vj|=0|V_{j}|=0 or |Sj|=k|S_{j}|=k. In this case, the algorithm keeps running in the subsequent iterations with s=0s=0, selecting the empty set after the inner for loop. Thus, with probability 11, either |Vj+1|=0(1βε)|Vj|=0|V_{j+1}|=0\leq(1-\beta\varepsilon)|V_{j}|=0, or |Sj|=k|S_{j}|=k. Hence, Lemma 3 holds for all M+1M+1 iterations of the outer for loop.

If the algorithm fails, there should be no more than m=log1βε(1/n)m=\lceil\log_{1-\beta\varepsilon}(1/n)\rceil successful iterations where λjmin{s,t}\lambda_{j}^{*}\geq\min\{s,t\}. Otherwise, by Lemma 3, there exists an iteration jj such that |Sj|=k|S_{j}|=k, or |VM+1|<n(1βε)m1|V_{M+1}|<n\left(1-\beta\varepsilon\right)^{m}\leq 1. Let XX be the number of successful iterations. Then, XX is the sum of M+1M+1 dependent Bernoulli random variables, where each variable has a success probability of more than 1/21/2. Let YY be the sum of M+1M+1 independent Bernoulli random variables with success probability 1/21/2. By probability lemmata, the failure probability of Alg. 1 is calculated as follows,

Pr(Alg. 1 fails)\displaystyle Pr\left(\text{Alg.~{}\ref{alg:threshold} fails}\right) Pr(Xm)\displaystyle\leq Pr\left(X\leq m\right)
Pr(Ym)\displaystyle\leq Pr\left(Y\leq m\right) (Lemma 9)
Pr(Ylog(nδ)/(βε))\displaystyle\leq Pr\left(Y\leq\log\left(\frac{n}{\delta}\right)/(\beta\varepsilon)\right) (log(x)11x,x>0\log(x)\geq 1-\frac{1}{x},\forall x>0)
e12(112(1+βε))22(1+1/βε)log(nδ)\displaystyle\leq e^{-\frac{1}{2}\left(1-\frac{1}{2(1+\beta\varepsilon)}\right)^{2}\cdot 2(1+1/\beta\varepsilon)\log\left(\frac{n}{\delta}\right)} (Lemma 10)
=(δn)(2βε+1)24βε(βε+1)δn\displaystyle=\left(\frac{\delta}{n}\right)^{\frac{(2\beta\varepsilon+1)^{2}}{4\beta\varepsilon(\beta\varepsilon+1)}}\leq\frac{\delta}{n}\qed
Proof of Adaptivity and Query Complexity.

In Alg. 1, oracle queries are incurred on Lines 5 and 14, and they can be issued in parallel. Thus, there is a constant number of adaptive rounds within one iteration of the outer for loop. The adaptivity is O(M)=O(log(n/δ)/ε3)O\left(M\right)=O\left(\log(n/\delta)/\varepsilon^{3}\right).

Consider an iteration jj; there are no more than |Vj1|+1|V_{j-1}|+1 queries on Line 5 and log1+ε(|Vj|)+2\log_{1+\varepsilon}(|V_{j}|)+2 queries on Line 14. Let YiY_{i} be the ii-th successful iteration. Then, for any Yi1<jYiY_{i-1}<j\leq Y_{i}, it holds that |Vj|n(1βε)i1|V_{j}|\leq n(1-\beta\varepsilon)^{i-1}. By Lemma 9, for any i1i\geq 1, 𝔼[YiYi1]2\mathbb{E}\left[Y_{i}-Y_{i-1}\right]\leq 2. Then, the expected number of oracle queries is as follows,

𝔼[#queries]j=1M+1𝔼[|Vj1|+log1+ε(|Vj|)+3]\displaystyle\mathbb{E}\left[\#\text{queries}\right]\leq\sum_{j=1}^{M+1}\mathbb{E}\left[|V_{j-1}|+\log_{1+\varepsilon}(|V_{j}|)+3\right]
n+i:n(1βε)i11𝔼[YiYi1](n(1βε)i1+log1+ε(n(1βε)i1))+3(M+1)\displaystyle\leq n+\sum_{i:n(1-\beta\varepsilon)^{i-1}\geq 1}\mathbb{E}\left[Y_{i}-Y_{i-1}\right]\left(n(1-\beta\varepsilon)^{i-1}+\log_{1+\varepsilon}\left(n(1-\beta\varepsilon)^{i-1}\right)\right)+3(M+1)
n+2ni1(1βε)i1+2log1+ε(n)log1βε(1/n)+3log(n/δ)/ε3\displaystyle\leq n+2n\sum_{i\geq 1}(1-\beta\varepsilon)^{i-1}+2\log_{1+\varepsilon}(n)\cdot\log_{1-\beta\varepsilon}(1/n)+3\log(n/\delta)/\varepsilon^{3}
(1+2βε)n+2log1+ε(n)log1βε(1/n)+3log(n/δ)/ε3\displaystyle\leq\left(1+\frac{2}{\beta\varepsilon}\right)n+2\log_{1+\varepsilon}(n)\cdot\log_{1-\beta\varepsilon}(1/n)+3\log(n/\delta)/\varepsilon^{3}
O(n/ε3)\displaystyle\leq O(n/\varepsilon^{3})\qed
Proof of Property (3).

For each iteration jj, consider two cases of λj\lambda^{*}_{j} by the choice of λj\lambda^{*}_{j} on Line 18. Firstly, if λj1ε\lambda^{*}_{j}\leq\left\lceil\frac{1}{\varepsilon}\right\rceil, clearly,

Δ(Tλj|Sj1)(1ε)τλj.\Delta\left(T_{\lambda^{*}_{j}}\,|\,S_{j-1}\right)\geq(1-\varepsilon)\tau\lambda^{*}_{j}.

Secondly, if λj>1ε\lambda^{*}_{j}>\left\lceil\frac{1}{\varepsilon}\right\rceil, let λj=max{λΛ:λ<λj}\lambda_{j}=\max\left\{\lambda\in\Lambda:\lambda<\lambda_{j}^{*}\right\}. Then, B[λj]=TrueB[\lambda_{j}]=\textbf{True}, and

Δ(Tλj|Sj1)(1ε)τλj.\Delta\left(T_{\lambda_{j}}\,|\,S_{j-1}\right)\geq(1-\varepsilon)\tau\lambda_{j}.

Let λj=(1+ε)u\lambda_{j}=\left\lfloor(1+\varepsilon)^{u}\right\rfloor, and λj=(1+ε)u+1\lambda^{*}_{j}=\left\lfloor(1+\varepsilon)^{u+1}\right\rfloor. Then,

|Tj,λj||Tj,λj|=λjλj=(1+ε)u(1+ε)u+1(1+ε)u1(1+ε)u+111+εε,\frac{|T_{j,\lambda_{j}}|}{|T_{j,\lambda_{j}^{*}}|}=\frac{\lambda_{j}}{\lambda_{j}^{*}}=\frac{\left\lfloor(1+\varepsilon)^{u}\right\rfloor}{\left\lfloor(1+\varepsilon)^{u+1}\right\rfloor}\geq\frac{(1+\varepsilon)^{u}-1}{(1+\varepsilon)^{u+1}}\geq\frac{1}{1+\varepsilon}-\varepsilon,

where the last inequality follows from (1+ε)u+1λj>1ε1ε(1+\varepsilon)^{u+1}\geq\lambda_{j}^{*}>\left\lceil\frac{1}{\varepsilon}\right\rceil\geq\frac{1}{\varepsilon}. Therefore, by above two inequalities and monotonicity of ff,

Δ(Tj,λj|Sj1)Δ(Tj,λj|Sj1)(1ε)(11+εε)τλj(12ε)τλj/(1+ε).\Delta\left(T_{j,\lambda_{j}^{*}}\,|\,S_{j-1}\right)\geq\Delta\left(T_{j,\lambda_{j}}\,|\,S_{j-1}\right)\geq(1-\varepsilon)\left(\frac{1}{1+\varepsilon}-\varepsilon\right)\tau\lambda_{j}^{*}\geq(1-2\varepsilon)\tau\lambda_{j}^{*}/(1+\varepsilon).

The objective value of the returned solution can be bounded as follows,

f(S)=jΔ(Tλj|Sj1)j(12ε)τλj/(1+ε)=(12ε)τ|S|/(1+ε).\displaystyle f(S)=\sum_{j}\Delta\left(T_{\lambda_{j}^{*}}\,|\,S_{j-1}\right)\geq\sum_{j}(1-2\varepsilon)\tau\lambda_{j}^{*}/(1+\varepsilon)=(1-2\varepsilon)\tau|S|/(1+\varepsilon).\qed
Proof of Property (4).

If the output set SS satisfies |S|<k|S|<k, then any x𝒩x\in\mathcal{N} that is not selected is filtered out at some point. Let xx be discarded at iteration jxj_{x}. Then,

Δ(x|S)Δ(x|Sjx1)<τ.\Delta\left(x\,|\,S\right)\leq\Delta\left(x\,|\,S_{j_{x}-1}\right)<\tau.\qed

Appendix D ThresholdGreedy

Algorithm 9 ThresholdGreedy
1:Input: Evaluation oracle f:2𝒩f:2^{\mathcal{N}}\to\mathbb{R}, constraint kk, constant α\alpha, value Γ\Gamma s.t. Γf(O)Γ/α\Gamma\leq f(O)\leq\Gamma/\alpha, error ε\varepsilon
2:Initialize τΓ/(αk)\tau\leftarrow\Gamma/(\alpha k), SS\leftarrow\emptyset
3:while τεΓ/k\tau\geq\varepsilon\Gamma/k do
4:     for e𝒩e\in\mathcal{N} do
5:         if Δ(e|S)τ\Delta\left(e\,|\,S\right)\geq\tau then SS{e}S\leftarrow S\cup\{e\}
6:         if |S|=k|S|=k then return SS      
7:     ττ(1ε)\tau\leftarrow\tau(1-\varepsilon)
8:return SS

In this section, we introduce a variant of the near-linear time algorithm (Alg. 1) in Badanidiyuru and Vondrak (2014), ThresholdGreedy (Alg. 9), which requires two additional parameters: a constant α\alpha and a value Γ\Gamma. Under the assumption that Γf(O)Γ/α\Gamma\leq f(O)\leq\Gamma/\alpha, i.e., that Γ\Gamma is an α\alpha-approximation of the optimal value, ThresholdGreedy ensures a (11/eε)(1-1/e-\varepsilon)-approximation with O(nε1log((αε)1))O(n\varepsilon^{-1}\log((\alpha\varepsilon)^{-1})) queries.
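
As an illustration, the following is a minimal Python sketch of Alg. 9; the oracle f is assumed to accept a Python set, and the names threshold_greedy, ground_set, gamma, and eps are ours rather than part of the released code.

def threshold_greedy(f, ground_set, k, alpha, gamma, eps):
    # Sketch of ThresholdGreedy (Alg. 9); gamma satisfies gamma <= f(O) <= gamma / alpha.
    S = set()
    tau = gamma / (alpha * k)            # initial threshold Gamma / (alpha k)
    while tau >= eps * gamma / k:        # stop once the threshold is small enough
        for e in ground_set:
            if e in S:
                continue
            if f(S | {e}) - f(S) >= tau: # marginal gain Delta(e | S)
                S.add(e)
                if len(S) == k:
                    return S
        tau *= (1 - eps)                 # geometrically decrease the threshold
    return S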

Theorem 8.

Let (f,k)(f,k) be an instance of SM, and let OO be an optimal solution to (f,k)(f,k). With input α\alpha, Γ\Gamma and ε\varepsilon, where Γf(O)Γ/α\Gamma\leq f(O)\leq\Gamma/\alpha, ThresholdGreedy outputs a solution SS with O(nε1log((αε)1))O(n\varepsilon^{-1}\log((\alpha\varepsilon)^{-1})) queries such that f(S)(11/eε)f(O)f(S)\geq(1-1/e-\varepsilon)f(O).

Proof.

Query Complexity. The while loop in Alg. 9 has at most log1ε(αε)ε1log((αε)1)\log_{1-\varepsilon}(\alpha\varepsilon)\leq\varepsilon^{-1}\log((\alpha\varepsilon)^{-1}) iterations. Each iteration of the while loop makes O(n)O(n) queries. Therefore, the query complexity of ThresholdGreedy is O(nε1log((αε)1))O(n\varepsilon^{-1}\log((\alpha\varepsilon)^{-1})).

Approximation Ratio. If |S|<k|S|<k at termination, for any e𝒩Se\in\mathcal{N}\setminus S, it holds that Δ(e|S)<εΓ/kεf(O)/k\Delta\left(e\,|\,S\right)<\varepsilon\Gamma/k\leq\varepsilon f(O)/k. Then,

f(O)f(S)(1)oOSΔ(o|S)εf(O)\displaystyle f(O)-f(S)\overset{(1)}{\leq}\sum_{o\in O\setminus S}\Delta\left(o\,|\,S\right)\leq\varepsilon f(O)
\displaystyle\Rightarrow f(S)(1ε)f(O)\displaystyle\hskip 10.00002ptf(S)\geq(1-\varepsilon)f(O)

where Inequality (1) follows from monotonicity and submodularity.

If |S|=k|S|=k at termination, let SjS_{j} be the intermediate solution after iteration jj of the while loop, and let τj\tau_{j} be the corresponding threshold. Suppose that SjSj1S_{j}\setminus S_{j-1}\neq\emptyset. Then, each element added during iteration jj provides a marginal gain of at least τj\tau_{j},

f(Sj)f(Sj1)|SjSj1|τj.f(S_{j})-f(S_{j-1})\geq|S_{j}\setminus S_{j-1}|\tau_{j}.

Next, we consider iteration j1j-1. If j>1j>1, since the algorithm does not terminate during iteration j1j-1, each element in 𝒩\mathcal{N} is either added to the solution or omitted due to small marginal gain with respect to the current solution. Therefore, for any oOSj1o\in O\setminus S_{j-1}, it holds that Δ(o|Sj1)<τj1\Delta\left(o\,|\,S_{j-1}\right)<\tau_{j-1} by submodularity. Then,

f(O)f(Sj1)oOSj1Δ(o|Sj1)<kτj1=kτj/(1ε).f(O)-f(S_{j-1})\leq\sum_{o\in O\setminus S_{j-1}}\Delta\left(o\,|\,S_{j-1}\right)<k\tau_{j-1}=k\tau_{j}/(1-\varepsilon).

If j=1j=1, τj=Γ/(αk)\tau_{j}=\Gamma/(\alpha k) and Sj1=S_{j-1}=\emptyset.

f(O)f(Sj1)=f(O)Γ/α=kτj<kτj/(1ε).f(O)-f(S_{j-1})=f(O)\leq\Gamma/\alpha=k\tau_{j}<k\tau_{j}/(1-\varepsilon).

Therefore, for any iteration jj that has elements being added to the solution, it holds that

f(O)f(Sj)\displaystyle f(O)-f(S_{j}) (11εk|SjSj1|)(f(O)f(Sj1))\displaystyle\leq\left(1-\frac{1-\varepsilon}{k}|S_{j}\setminus S_{j-1}|\right)\left(f(O)-f(S_{j-1})\right)
f(O)f(S)\displaystyle\Rightarrow\hskip 10.00002ptf(O)-f(S) j(11εk|SjSj1|)f(O)\displaystyle\leq\prod_{j}\left(1-\frac{1-\varepsilon}{k}|S_{j}\setminus S_{j-1}|\right)f(O)
je1εk|SjSj1|f(O)\displaystyle\leq\prod_{j}e^{-\frac{1-\varepsilon}{k}|S_{j}\setminus S_{j-1}|}f(O) (x+1exx+1\leq e^{x})
=e1+εf(O)(1/e+ε)f(O)\displaystyle=e^{-1+\varepsilon}f(O)\leq(1/e+\varepsilon)f(O) (0<ε<10<\varepsilon<1)
f(S)(11/eε)f(O)\Rightarrow\hskip 30.00005ptf(S)\geq(1-1/e-\varepsilon)f(O)\hskip 30.00005pt\qed
Table 3: Small and Large Data

Application     Small Data (nn / Edges\ast)                Large Data (nn / Edges\ast)
ImageSumm       10,000 / 1.0×108\simeq 1.0\times 10^{8}     50,000 / 2.5×109\simeq 2.5\times 10^{9}
InfluenceMax    26,588 / 1.0×105\simeq 1.0\times 10^{5}     1,134,890 / 6.0×106\simeq 6.0\times 10^{6}
RevenueMax      17,432 / 1.8×105\simeq 1.8\times 10^{5}     3,072,441 / 2.3×108\simeq 2.3\times 10^{8}
MaxCover (BA)   100,000 / 5.0×105\simeq 5.0\times 10^{5}    1,000,000 / 1.0×109\simeq 1.0\times 10^{9}

Appendix E Threshold-DASH with no Knowledge of OPT

See Lemma 7.

Proof.

By the definition of pxp_{x} in Section 5, it holds that Pr(oO1)=poPr\left(o\in O_{1}\right)=p_{o}. Since oo is assigned to each machine independently with probability 1/1/\ell,

Pr(oO2)\displaystyle Pr\left(o\in O_{2}\right) =i=1Pr(oSi)\displaystyle=\sum_{i=1}^{\ell}Pr\left(o\in S_{i}\right)
=Pr(oS1|o𝒩1)Pr(o𝒩1)\displaystyle=\ell\cdot Pr\left(o\in S_{1}|o\in\mathcal{N}_{1}\right)\cdot Pr\left(o\in\mathcal{N}_{1}\right)
=Pr(oTSMRel(𝒩1)|o𝒩1)\displaystyle=Pr\left(o\in\textsc{TSMRel}(\mathcal{N}_{1})|o\in\mathcal{N}_{1}\right)
=Pr(oTSMRel(𝒩1{o}))\displaystyle=Pr\left(o\in\textsc{TSMRel}(\mathcal{N}_{1}\cup\{o\})\right)
=1po.\displaystyle=1-p_{o}.

Moreover, any two machines select elements independently. So,

Pr(oO2|oO1)\displaystyle Pr\left(o\in O_{2}|o\in O_{1}\right) =Pr(oO2|oTSMRel(𝒩1{o},q))\displaystyle=Pr\left(o\in O_{2}|o\not\in\textsc{TSMRel}(\mathcal{N}_{1}\cup\{o\},q)\right)
=1Pr(oO2|oTSMRel(𝒩1{o},q))\displaystyle=1-Pr\left(o\not\in O_{2}|o\not\in\textsc{TSMRel}(\mathcal{N}_{1}\cup\{o\},q)\right)
=1i=1Pr(oSi|oTSMRel(𝒩1{o},q))\displaystyle=1-\prod_{i=1}^{\ell}Pr\left(o\not\in S_{i}|o\not\in\textsc{TSMRel}(\mathcal{N}_{1}\cup\{o\},q)\right)
=(a)1i=2Pr(oSi)\displaystyle\overset{(a)}{=}1-\prod_{i=2}^{\ell}Pr\left(o\not\in S_{i}\right)
1Pr(oO2)\displaystyle\leq 1-Pr\left(o\not\in O_{2}\right)
=Pr(oO2)=1po,\displaystyle=Pr\left(o\in O_{2}\right)=1-p_{o},

where Equality (a) follows from Pr(oTSMRel(𝒩1)|oTSMRel(𝒩1{o}))=1Pr\left(o\not\in\textsc{TSMRel}(\mathcal{N}_{1})|o\not\in\textsc{TSMRel}(\mathcal{N}_{1}\cup\{o\})\right)=1.

Thus, we can bound the probability by the following,

Pr(oO1O2)=\displaystyle Pr\left(o\in O_{1}\cup O_{2}\right)= Pr(oO1)+Pr(oO2)\displaystyle Pr\left(o\in O_{1}\right)+Pr\left(o\in O_{2}\right)
Pr(oO1O2)\displaystyle-Pr\left(o\in O_{1}\cap O_{2}\right)
\displaystyle\geq po+1popo(1po)\displaystyle p_{o}+1-p_{o}-p_{o}(1-p_{o})
\displaystyle\geq 3/4,\displaystyle 3/4,

where the first inequality follows from Pr(oO1O2)=Pr(oO2|oO1)Pr(oO1)po(1po)Pr\left(o\in O_{1}\cap O_{2}\right)=Pr\left(o\in O_{2}|o\in O_{1}\right)Pr\left(o\in O_{1}\right)\leq p_{o}(1-p_{o}), and the second from po(1po)1/4p_{o}(1-p_{o})\leq 1/4 for any po[0,1]p_{o}\in[0,1].

Description.  The T-DASH algorithm described in Alg. 10 is a two-MR-round algorithm using the AFD approach that runs ThreshSeqMod concurrently on every machine for log1+ε(k)\log_{1+\varepsilon}(k) different guesses of the threshold τi,j\tau_{i,j} in the range [αΔik,αΔi][\frac{\alpha\Delta_{i}^{*}}{k},\alpha\Delta_{i}^{*}], where α\alpha is the approximation ratio of T-DASH and Δi\Delta_{i}^{*} is the maximum singleton value in 𝒩i\mathcal{N}_{i}. Every solution returned to the primary machine is placed into a bin based on its corresponding threshold guess τi,j\tau_{i,j}, such that Δk(1+ε)xτi,jΔk(1+ε)x+1\frac{\Delta^{*}}{k}(1+\varepsilon)^{x}\leq\tau_{i,j}\leq\frac{\Delta^{*}}{k}(1+\varepsilon)^{x+1}, where Δ\Delta^{*} is the maximum singleton value in 𝒩\mathcal{N}. Since there must exist a threshold τ\tau^{*} that is close enough to αOPT/k\alpha\textsc{OPT}/k, running ThreshSeqMod (Line 21) on every bin in the range [αΔk,αΔ][\frac{\alpha\Delta^{*}}{k},\alpha\Delta^{*}] and selecting the best solution guarantees the α\alpha-approximation of T-DASH in O(log(n))O(\log{}(n)) adaptive rounds.
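
To make the binning step concrete, the following sketch shows how a returned threshold guess could be mapped to its bin index xx on the primary machine; the helper bin_index is a hypothetical name and not part of the released code.

import math

def bin_index(tau_ij, delta_star, k, eps):
    # Returns x with (delta_star / k) * (1 + eps)^x <= tau_ij <= (delta_star / k) * (1 + eps)^(x + 1),
    # where delta_star is the maximum singleton value over the whole ground set.
    base = delta_star / k
    return int(math.floor(math.log(tau_ij / base, 1 + eps)))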

Algorithm 10 Threshold-DASH with no knowledge of OPT (T-DASH)
1:Input: Evaluation oracle f:2𝒩f:2^{\mathcal{N}}\to\mathbb{R}, constraint kk, error ε\varepsilon, available machines M{1,2,,}M\leftarrow\{1,2,...,\ell\}
2:Initialize δ1/(+1)\delta\leftarrow 1/(\ell+1), 𝐪\mathbf{q}\leftarrow a fixed sequence of random bits.
3:Set α38\alpha\leftarrow\frac{3}{8}, (𝐪i,j)i[+1],j[log1+ε(k)]𝐪(\mathbf{q}_{i,j})_{i\in[\ell+1],j\in[\log_{1+\varepsilon}(k)]}\leftarrow\mathbf{q}
4:for e𝒩e\in\mathcal{N} do
5:     Assign ee to each machine independently with probability 1/1/\ell
6:for iMi\in M do
7:     \triangleright On machine ii
8:     Let 𝒩i\mathcal{N}_{i} be the elements assigned to machine ii
9:     Set Δimax{f(e):e𝒩i}\Delta_{i}^{*}\leftarrow\max\{f(e):e\in\mathcal{N}_{i}\}
10:     for j0j\leftarrow 0 to log1+ε(k)\log_{1+\varepsilon}(k) in parallel do
11:         τi,j(α+ε)Δik(1+ε)j\tau_{i,j}\leftarrow\frac{(\alpha+\varepsilon)\Delta_{i}^{*}}{k}(1+\varepsilon)^{j}
12:         Si,j,Ri,jThreshSeqMod(f,𝒩i,k,δ,ε,τi,j,𝐪i,j)S_{i,j},R_{i,j}\leftarrow\textsc{ThreshSeqMod}(f,\mathcal{N}_{i},k,\delta,\varepsilon,\tau_{i,j},\mathbf{q}_{i,j})      
13:     Send Δi\Delta_{i}^{*} and all (τi,j,Si,j,Ri,j)(\tau_{i,j},S_{i,j},R_{i,j}) to primary machine
14:\triangleright On primary machine
15:Set Δmax{Δi:1i}\Delta^{*}\leftarrow\max\{\Delta_{i}^{*}:1\leq i\leq\ell\}
16:for x0x\leftarrow 0 to log1+ε(k)+1\lceil\log_{1+\varepsilon}(k)\rceil+1 in parallel do
17:     τx(α+ε)Δk(1+ε)x\tau_{x}\leftarrow\frac{(\alpha+\varepsilon)\Delta^{*}}{k}(1+\varepsilon)^{x}
18:     Let Rx{Ri,j:τxτi,jτx+1}R_{x}\leftarrow\{\bigcup R_{i,j}:\tau_{x}\leq\tau_{i,j}\leq\tau_{x+1}\}
19:     Let Ax{Sample a solution Si,j:τxτi,jτx+1}A_{x}\leftarrow\{\text{Sample a solution }S_{i,j}:\tau_{x}\leq\tau_{i,j}\leq\tau_{x+1}\}
20:     Let gx()f(Ax)f(Ax)g_{x}(\cdot)\leftarrow f(A_{x}\cup\cdot)-f(A_{x})
21:     TxThreshSeqMod(gx,Rx,k|Ax|,δ,ε,τx,𝐪+1,x)T_{x}\leftarrow\textsc{ThreshSeqMod}(g_{x},R_{x},k-|A_{x}|,\delta,\varepsilon,\tau_{x},\mathbf{q}_{\ell+1,x})
22:     TxAxTxT_{x}^{\prime}\leftarrow A_{x}\cup T_{x}
23:Targmax{f(Tx):0xlog1+ε(k)+1}T\leftarrow\operatorname*{arg\,max}\{f(T_{x}^{\prime}):0\leq x\leq\lceil\log_{1+\varepsilon}(k)\rceil+1\}
24:return TT
Theorem 9.

Let (f,k)(f,k) be an instance of SM where klog(k)<εψk\log(k)<\frac{\varepsilon\psi}{\ell}. T-DASH with no knowledge of OPT returns a set TT^{\prime} within two MR rounds, O(1ε3log(n))O\left(\frac{1}{\varepsilon^{3}}\log(n)\right) adaptive rounds, O(nlog(k)ε4)O\left(\frac{n\log(k)}{\varepsilon^{4}}\right) total queries, and O(n)O(n) communication complexity, such that, with probability at least 1nc1-n^{-c},

𝔼[f(T)](38ε)OPT.\mathbb{E}\left[f(T^{\prime})\right]\geq\left(\frac{3}{8}-\varepsilon\right)\textsc{OPT}.

Overview of Proof. Alg. 10 is inspired by Alg. 6 in Section 5, which is a version of the algorithm that knows the optimal solution value. With Δ=max{f(e):e𝒩}\Delta^{*}=\max\{f(e):e\in\mathcal{N}\}, there exists an x0x_{0} such that τx0αOPT(1+ε)/kτx0+1\tau_{x_{0}}\leq\alpha\textsc{OPT}(1+\varepsilon)/k\leq\tau_{x_{0}+1} with O(log(k)/ε)O\left(\log(k)/\varepsilon\right) guesses. Then, on each machine ii, we only consider the sets Si,jS_{i,j} and Ri,jR_{i,j} such that τx0τi,jτx0+1\tau_{x_{0}}\leq\tau_{i,j}\leq\tau_{x_{0}+1}. If such a τi,j\tau_{i,j} exists, (Si,j,Ri,j)(S_{i,j},R_{i,j}) plays the role of (Si,Ri)(S_{i},R_{i}) in Alg. 6. If no such τi,j\tau_{i,j} exists, then for any e𝒩ie\in\mathcal{N}_{i}, it holds that f(e)<αOPT/kf(e)<\alpha\textsc{OPT}/k, which means ThreshSeqMod(𝒩i)\textsc{ThreshSeqMod}(\mathcal{N}_{i}) with τ=αOPT/k\tau=\alpha\textsc{OPT}/k will return an empty set. Since each call of ThreshSeqMod with a different guess of τ\tau is executed in parallel, the adaptivity remains the same and the query complexity increases by a factor of log(k)/ε\log(k)/\varepsilon.

Proof.

First, for x=0x=0, τx=(α+ε)Δ/k(α+ε)OPT/k\tau_{x}=(\alpha+\varepsilon)\Delta^{*}/k\leq(\alpha+\varepsilon)\textsc{OPT}/k; and for x=log1+ε(k)+1x=\lceil\log_{1+\varepsilon}(k)\rceil+1,

τx(α+ε)(1+ε)Δ(α+ε)(1+ε)oOf(o)/k(α+ε)(1+ε)OPT/k.\tau_{x}\geq(\alpha+\varepsilon)(1+\varepsilon)\Delta^{*}\geq(\alpha+\varepsilon)(1+\varepsilon)\sum_{o\in O}f(o)/k\geq(\alpha+\varepsilon)(1+\varepsilon)\textsc{OPT}/k.

Therefore, there exists an x0x_{0} such that τx0(α+ε)(1+ε)OPT/kτx0+1\tau_{x_{0}}\leq(\alpha+\varepsilon)(1+\varepsilon)\textsc{OPT}/k\leq\tau_{x_{0}+1}. Since τx0+1=τx0(1+ε)\tau_{x_{0}+1}=\tau_{x_{0}}(1+\varepsilon), it holds that τx0(α+ε)OPT/k\tau_{x_{0}}\geq(\alpha+\varepsilon)\textsc{OPT}/k and τx0+1(α+ε)(1+ε)2OPT/k\tau_{x_{0}+1}\leq(\alpha+\varepsilon)(1+\varepsilon)^{2}\textsc{OPT}/k.

Then, we only consider Tx0T_{x_{0}}^{\prime}. If |Tx0|=k|T_{x_{0}}^{\prime}|=k, by Theorem 1, it holds that,

f(T)f(Tx0)12ε1+ετx0k(38ε)OPT.\displaystyle f(T)\geq f(T_{x_{0}}^{\prime})\geq\frac{1-2\varepsilon}{1+\varepsilon}\tau_{x_{0}}k\geq\left(\frac{3}{8}-\varepsilon\right)\textsc{OPT}.

Otherwise, in the case that |Tx0|<k|T_{x_{0}}^{\prime}|<k, let Ax0=Si0,j0A_{x_{0}}=S_{i_{0},j_{0}} and O1={oO:oTSMRel(𝒩i0{o},q)}O_{1}=\{o\in O:o\not\in\textsc{TSMRel}(\mathcal{N}_{i_{0}}\cup\{o\},q)\}. Also, let τi,(j0)\tau_{i,(j_{0})} be the threshold returned by machine ii such that τx0τi,(j0)τx0+1\tau_{x_{0}}\leq\tau_{i,(j_{0})}\leq\tau_{x_{0}+1}, and define

(Si,(j0),Ri,(j0))={(Si,(j0),Ri,(j0)), if machine i returned a τi,(j0)(,), otherwise .(S_{i,(j_{0})}^{\prime},R_{i,(j_{0})}^{\prime})=\begin{cases}(S_{i,(j_{0})},R_{i,(j_{0})})&\text{, if machine $i$ returned a }\tau_{i,(j_{0})}\\ (\emptyset,\emptyset)&\text{, otherwise }\end{cases}.

For any machine that does not return a τi,(j0)\tau_{i,(j_{0})}, we may treat it as if it ran ThreshSeqMod(𝒩i,τx0+1)\textsc{ThreshSeqMod}(\mathcal{N}_{i},\tau_{x_{0}+1}) and returned two empty sets; hence max{f(e):e𝒩i}<τx0\max\{f(e):e\in\mathcal{N}_{i}\}<\tau_{x_{0}} on such a machine. Let Rx0=iMRi,(j0)R_{x_{0}}^{\prime}=\cup_{i\in M}R_{i,(j_{0})}^{\prime} and O2=Rx0OO_{2}=R_{x_{0}}^{\prime}\cap O. Then, Lemma 7 still holds in this case. With ε2/3\varepsilon\leq 2/3, we can bound the approximation ratio as follows,

𝔼[f(T)]𝔼[f(Tx0)]\displaystyle\mathbb{E}\left[f(T)\right]\geq\mathbb{E}\left[f(T_{x_{0}}^{\prime})\right] 𝔼[f(O1O2)]kτx0+1\displaystyle\geq\mathbb{E}\left[f(O_{1}\cup O_{2})\right]-k\cdot\tau_{x_{0}+1}
(38ε)OPT.\displaystyle\geq\left(\frac{3}{8}-\varepsilon\right)\textsc{OPT}.

Appendix F Experiment Setup

F.1 Applications

Given a constraint kk, the objectives of the applications are defined as follows:

F.1.1 Max Cover

Maximize the number of nodes covered by choosing a set SS of maximum size kk; that is, maximize the number of nodes having at least one neighbour in the set SS. The application is run on synthetic random BA graphs of ground set sizes 100,000, 1,000,000, and 5,000,000, generated using the Barabási–Albert (BA) model, for the centralized and distributed experiments respectively.
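
A minimal sketch of this coverage objective, assuming the graph is stored as a networkx graph and counting a node as covered if it lies in SS or has a neighbor in SS (the function name is ours):

import networkx as nx

def max_cover_value(G, S):
    # Number of nodes covered by S: the nodes of S plus all of their neighbors.
    covered = set(S)
    for v in S:
        covered.update(G.neighbors(v))
    return len(covered)

# Example on a small Barabasi-Albert graph
G = nx.barabasi_albert_graph(n=1000, m=2, seed=0)
print(max_cover_value(G, {0, 1, 2}))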

F.1.2 Image Summarization on CIFAR-10 Data

Given a large collection of images, find a subset of maximum size kk which is representative of the entire collection. The objective used for the experiments is a monotone variant of the image summarization objective from Fahrbach et al. (2019b). For a ground set of NN images, it is defined as follows:

f(S)=iNmaxjSsi,j\displaystyle f(S)=\sum_{i\in N}\max_{j\in S}s_{i,j}

where si,js_{i,j} is the cosine similarity of the pixel values between image ii and image jj. The data for the image summarization experiments contains 10,000 and 50,000 CIFAR-10 Krizhevsky et al. (2009) color images respectively for the centralized and distributed experiments.
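
A small sketch of this objective, assuming sim is a precomputed NN-by-NN cosine-similarity matrix (e.g., loaded from images_10K_mat.csv); the function name is ours:

import numpy as np

def image_summarization_value(sim, S):
    # f(S) = sum_i max_{j in S} s_{i,j}; returns 0 for the empty set.
    S = list(S)
    if not S:
        return 0.0
    return float(np.sum(np.max(sim[:, S], axis=1)))

# Example with a random cosine-similarity matrix
rng = np.random.default_rng(0)
X = rng.random((100, 8))
norms = np.linalg.norm(X, axis=1)
sim = (X @ X.T) / (norms[:, None] * norms[None, :])
print(image_summarization_value(sim, {3, 17, 42}))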

F.1.3 Influence Maximization on a Social Network.

Maximize the aggregate influence to promote a topic by selecting a set SS of social network influencers of maximum size kk. The probability that a random user ii will be influenced by the set of influencers SS is given by:

fi(S)\displaystyle f_{i}(S) =1, for iS\displaystyle=1\quad\textrm{, for $i\in S$}
fi(S)\displaystyle f_{i}(S) =1(1p)|NS(i)|, for iS\displaystyle=1-(1-p)^{|N_{S}(i)|}\quad\textrm{, for $i\notin S$ }

where |NS(i)||N_{S}(i)| is the number of neighbors of node ii in SS. We use the Epinions data set consisting of 27,000 users from Rossi and Ahmed (2015) for the centralized data experiments and the YouTube online social network data of Yang and Leskovec (2012), consisting of more than 1 million users, for the distributed data experiments. The value of pp is set to 0.01.
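
A sketch of the resulting aggregate objective f(S)=ifi(S)f(S)=\sum_{i}f_{i}(S), with the social network stored as a networkx graph; the function name is ours, and summing the fif_{i} is our assumption about how the per-user probabilities are aggregated.

import networkx as nx

def influence_value(G, S, p=0.01):
    # f(S) = sum_i f_i(S) under the simplified influence model above.
    S = set(S)
    total = 0.0
    for i in G.nodes():
        if i in S:
            total += 1.0
        else:
            n_S = sum(1 for u in G.neighbors(i) if u in S)  # |N_S(i)|
            total += 1.0 - (1.0 - p) ** n_S
    return total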

F.1.4 Revenue Maximization on YouTube.

Maximize the revenue of a product by selecting a set of users SS of maximum size kk, whose network neighbors will be advertised a different product by the set of users SS. It is based on the objective function from Mirzasoleiman et al. (2016). For a given set of users XX and wi,jw_{i,j} the influence between users ii and jj, the objective function is defined by:

f(S)\displaystyle f(S) =iXV(jSwi,j)\displaystyle=\sum_{i\in X}V\left(\sum_{j\in S}w_{i,j}\right)
V(y)\displaystyle V(y) =yα\displaystyle=y^{\alpha}

where V(y)V(y), the expected revenue from a user, is a function of the sum of influences from neighbors who are in SS, and α:0<α<1\alpha:0<\alpha<1 is a rate-of-diminishing-returns parameter for increased coverage.

We use the YouTube data set from Mirzasoleiman et al. (2016), consisting of 18,000 users, for the centralized data experiments. For the distributed data experiments, we perform an empirical evaluation on the Orkut online social network data from Yang and Leskovec (2012), consisting of more than 3 million users. The value of α\alpha is set to 0.3.
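
A sketch of this objective with the influences stored in a dictionary w keyed by user pairs (i, j); the function name and passing XX as an explicit argument are our choices for this sketch.

def revenue_value(w, X, S, alpha=0.3):
    # f(S) = sum_{i in X} ( sum_{j in S} w[(i, j)] ) ** alpha
    total = 0.0
    for i in X:
        influence = sum(w.get((i, j), 0.0) for j in S)
        total += influence ** alpha
    return total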

Appendix G Replicating the Experiment Results

Our experiments can be replicated by following the steps below:

  • Install MPICH version 3.3a2 (DO NOT install OpenMPI, and ensure mpirun utilizes MPICH using the command mpirun --version (Ubuntu))

  • Install pandas, mpi4py, scipy, networkx

  • Set up an MPI cluster using the following tutorial:
    https://mpitutorial.com/tutorials/running-an-mpi-cluster-within-a-lan/

  • Create and update the host file ../nodesFileIPnew to store the IP addresses of all the connected MPI machines before running any experiments (the first machine being the primary machine)

    • NOTE: Please place nodesFileIPnew inside the MPI shared repository; "cloud/" in this case (at the same level as the code base directory). DO NOT place it inside the code base "DASH-Distributed_SMCC-python/" directory.

  • Clone the DASH-Distributed_SMCC-python repository inside the MPI shared repository (/cloud in the case using the given tutorial)

    • NOTE: Please clone the "DASH-Distributed_SMCC-python" repository and execute the following commands on a machine with sufficient memory (RAM) that is capable of generating the large datasets. This repository NEED NOT be the primary repository ("/cloud/DASH-Distributed_SMCC-python/") on the shared memory of the cluster that will be used for the experiments.

  • Additional Datasets for Experiment 1: Please download the Image Similarity Matrix file "images_10K_mat.csv" (https://drive.google.com/file/d/1s9PzUhV-C5dW8iL4tZPVjSRX4PBhrsiJ/view?usp=sharing) and place it in the data/data_exp1/ directory.

  • To generate the decentralized data for Experiments 2 and 3, please follow the steps below:

    • Execute bash GenerateDistributedData.bash nThreads nNodes

    • The previous command should generate nNodes directories in the loading_data/ directory (with names machine<nodeNo>Data)

    • Copy the data_exp2_split/ and data_exp3_split/ directories within each machine<i>Data directory to the corresponding machine MiM_{i} and place the directories outside /cloud (the directory created after setting up an MPI cluster using the given tutorial).

To run all experiments in the paper, please read the README.md file in the "DASH-Distributed_SMCC-python" repository (Code/Data Appendix) for detailed information.
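
For orientation only, the following is a minimal mpi4py sketch of the two-round pattern used by the distributed algorithms (randomly partition the ground set, run a local routine on each machine, and gather the candidate sets on the primary machine). It is a simplified illustration, not the released code; local_algorithm and combine are placeholders.

from mpi4py import MPI
import random

def local_algorithm(elements):
    # Placeholder for the per-machine routine (e.g., a threshold or greedy algorithm).
    return elements[:10]

def combine(candidate_sets):
    # Placeholder for the post-processing on the primary machine.
    return max(candidate_sets, key=len)

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

if rank == 0:
    ground_set = list(range(100000))
    random.seed(0)
    parts = [[] for _ in range(size)]
    for e in ground_set:                          # random partition of the ground set
        parts[random.randrange(size)].append(e)
else:
    parts = None

local_elements = comm.scatter(parts, root=0)      # round 1: distribute the partitions
local_solution = local_algorithm(local_elements)
gathered = comm.gather(local_solution, root=0)    # candidates sent to the primary machine

if rank == 0:
    final_solution = combine(gathered)            # round 2: post-process on the primary machine
    print(len(final_solution))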

Appendix H The Greedy Algorithm (Nemhauser and Wolsey (1978))

The standard Greedy algorithm starts with an empty set and then proceeds by adding elements to the set over kk iterations. In each iteration, the algorithm selects the element with the maximum marginal gain Δ(e|Ai1)\Delta(e|A_{i-1}), where Δ(e|A)=f(A{e})f(A)\Delta(e|A)=f(A\cup\{e\})-f(A). Algorithm 11 is a formal statement of the standard Greedy algorithm. The intermediate solution set AiA_{i} represents the solution after iteration ii. The (11/e1-1/e\approx 0.632)-approximation ratio of the Greedy algorithm is the best achievable for monotone objectives with polynomially many oracle queries (Nemhauser and Wolsey, 1978).

Algorithm 11 Greedy(f,𝒩,k)\textsc{Greedy}(f,\mathcal{N},k)
1:Input: evaluation oracle f:2𝒩f:2^{\mathcal{N}}\to\mathbb{R}, constraint kk
2:Let A0A_{0}\leftarrow\emptyset
3:for i=1i=1 to kk  do
4:     uiargmax{Δ(e|Ai1):e𝒩Ai1}u_{i}\leftarrow\operatorname*{arg\,max}\{\Delta\left(e\,|\,A_{i-1}\right):e\in\mathcal{N}\setminus A_{i-1}\}
5:     AiAi1{ui}A_{i}\leftarrow A_{i-1}\cup\{u_{i}\}
6:return AkA_{k}
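
A minimal Python sketch of Algorithm 11, where the oracle f is assumed to accept a Python set (the function name is ours):

def greedy(f, ground_set, k):
    # Standard greedy: repeatedly add the element with the largest marginal gain.
    A = set()
    for _ in range(k):
        best_elem, best_gain = None, float("-inf")
        for e in ground_set:
            if e in A:
                continue
            gain = f(A | {e}) - f(A)   # Delta(e | A)
            if gain > best_gain:
                best_elem, best_gain = e, gain
        if best_elem is None:
            break
        A.add(best_elem)
    return A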

Appendix I Discussion of the 1/21/2-approximate Algorithm in Liu and Vondrak (2019)

Algorithm 12 ThresholdGreedy(S,G,τ)\textsc{ThresholdGreedy}(S,G,\tau)
1:Input: An input set SS, a partial greedy solution GG with |G|k|G|\leq k, and a threshold τ\tau
2:GGG^{\prime}\leftarrow G
3:for eSe\in S do
4:     if Δ(e|G)τ\Delta\left(e\,|\,G^{\prime}\right)\geq\tau and |G|<k|G^{\prime}|<k then
5:         GG{e}G^{\prime}\leftarrow G^{\prime}\cup\{e\}      
6:return GG^{\prime}
Algorithm 13 ThresholdFilter(S,G,τ)\textsc{ThresholdFilter}(S,G,\tau)
1:Input: An input set SS, a partial greedy solution GG with |G|k|G|\leq k, and a threshold τ\tau
2:SSS^{\prime}\leftarrow S
3:for eSe\in S do
4:     if Δ(e|G)<τ\Delta\left(e\,|\,G\right)<\tau then SS{e}S^{\prime}\leftarrow S^{\prime}\setminus\{e\}
5:return SS^{\prime}
Algorithm 14 A simple 2-round 1/21/2 approximation, assuming OPT is known
1:round 1:
2:SS\leftarrow sample each e𝒩e\in\mathcal{N} with probability p=4k/np=4\sqrt{k/n}
3:send SS to each machine and the central machine CC
4:partition 𝒩\mathcal{N} randomly into sets V1,V2,,VmV_{1},V_{2},\ldots,V_{m}
5:send ViV_{i} to machine ii for each i[m]i\in[m]
6:for each machine MiM_{i} (in parallel) do
7:     τOPT2k\tau\leftarrow\frac{\textsc{OPT}}{2k}
8:     G0ThresholdGreedy(S,,τ)G_{0}\leftarrow\textsc{ThresholdGreedy}(S,\emptyset,\tau)
9:     if |G0|<k|G_{0}|<k then
10:         RiThresholdFilter(Vi,G0,τ)R_{i}\leftarrow\textsc{ThresholdFilter}(V_{i},G_{0},\tau)
11:     else
12:         Ri=R_{i}=\emptyset      
13:     send RiR_{i} to the central machine CC
14:
15:round 2 (only on CC):
16:compute G0G_{0} from SS as in the first round, and set GG0G\leftarrow G_{0}
17:for eiRie\in\cup_{i}R_{i} do
18:     if Δ(e|G)τ\Delta\left(e\,|\,G\right)\geq\tau and |G|<k|G|<k then GG{e}G\leftarrow G\cup\{e\}
19:return GG

The simple two-round distributed algorithm proposed by Liu and Vondrak (2019) achieves a deterministic 1/2ε1/2-\varepsilon approximation ratio. The pseudocode of this algorithm, assuming that OPT is known, is provided in Alg. 14. The version that guesses OPT runs O(log(k))O\left(\log(k)\right) copies of Alg. 14 on each machine, which results in O(k)O\left(k\right) adaptivity and O(nklog(k))O\left(nk\log(k)\right) query complexity.

Intuitively, the result of Alg. 14 is equivalent to a single run of ThresholdGreedy on the whole ground set. In the first round, it calls ThresholdGreedy on SS to get an intermediate solution G0G_{0}. After filtering out the elements of ViV_{i} with gains less than τ\tau, the remaining elements of ViV_{i} are sent to the primary machine. A careful analysis shows that, with high probability, the total number of elements returned in the first round is less than nk\sqrt{nk}.
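
The following single-process sketch simulates Alg. 14 to mirror this intuition: ThresholdGreedy is run on the sample, each partition is filtered against the resulting partial solution, and the threshold greedy is then continued over the returned elements. All names are ours, and the oracle f is assumed to accept a Python set.

import random

def threshold_add(f, elements, G, k, tau):
    # Adds elements with marginal gain at least tau to the partial solution G (as in Alg. 12).
    G = set(G)
    for e in elements:
        if e not in G and len(G) < k and f(G | {e}) - f(G) >= tau:
            G.add(e)
    return G

def two_round_half_approx(f, ground_set, k, opt, num_machines=4, seed=0):
    random.seed(seed)
    tau = opt / (2 * k)
    p = 4 * (k / len(ground_set)) ** 0.5
    S = [e for e in ground_set if random.random() < p]     # shared sample sent to every machine
    G0 = threshold_add(f, S, set(), k, tau)
    parts = [[] for _ in range(num_machines)]               # random partition of the ground set
    for e in ground_set:
        parts[random.randrange(num_machines)].append(e)
    R = []
    for V_i in parts:                                        # round 1: filter each partition against G0
        if len(G0) < k:
            R.extend(e for e in V_i if f(G0 | {e}) - f(G0) >= tau)
    return threshold_add(f, R, G0, k, tau)                   # round 2: continue on the primary machine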

Compared to the 2-round algorithms in Table 1, Alg. 14 is the only algorithm that requires data duplication: the random sample SS (of expected size 4nk4\sqrt{nk}) is sent to every machine in addition to its partition in the first round. As a result, distributed setups face a more rigid memory restriction when running this algorithm. For example, when the dataset exactly fits into the distributed setup, so that Ψ=n/\Psi=n/\ell, there is no space left on each machine to store the random subset SS, whereas the other 2-round algorithms can still run with kn/2k\leq n/\ell^{2}.

Additionally, since the approximation analysis of Alg. 14 does not take randomization into consideration, we may be able to substitute its ThresholdGreedy subroutine with any threshold algorithm (such as ThresholdSeq in Chen et al. (2021)) that has optimal logarithmic adaptivity and linear query complexity. However, since its analysis of the success probability treats SS as 3k3k blocks, where each block has random access to the whole ground set, it may not be easy to fit parallelizable threshold algorithms into this framework.

References

  • Badanidiyuru and Vondrak (2014) Ashwinkumar Badanidiyuru and Jan Vondrak. Fast algorithms for maximizing submodular functions. In Chandra Chekuri, editor, Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2014, Portland, Oregon, USA, January 5-7, 2014, pages 1497–1514. SIAM, 2014.
  • Balakrishnan et al. (2022) Ravikumar Balakrishnan, Tian Li, Tianyi Zhou, Nageen Himayat, Virginia Smith, and Jeff A. Bilmes. Diverse client selection for federated learning via submodular maximization. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022, 2022.
  • Balkanski and Singer (2018) Eric Balkanski and Yaron Singer. The adaptive complexity of maximizing a submodular function. In Ilias Diakonikolas, David Kempe, and Monika Henzinger, editors, Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018, Los Angeles, CA, USA, June 25-29, 2018, pages 1138–1151. ACM, 2018.
  • Balkanski et al. (2019) Eric Balkanski, Aviad Rubinstein, and Yaron Singer. An exponential speedup in parallel running time for submodular maximization without loss in approximation. In Timothy M. Chan, editor, Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, San Diego, California, USA, January 6-9, 2019, pages 283–302. SIAM, 2019.
  • Barbosa et al. (2015) Rafael da Ponte Barbosa, Alina Ene, Huy L. Nguyen, and Justin Ward. The power of randomization: Distributed submodular maximization on massive datasets. In Francis R. Bach and David M. Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015, volume 37 of JMLR Workshop and Conference Proceedings, pages 1236–1244. JMLR.org, 2015.
  • Barbosa et al. (2016) Rafael da Ponte Barbosa, Alina Ene, Huy L. Nguyen, and Justin Ward. A new framework for distributed submodular maximization. In Irit Dinur, editor, IEEE 57th Annual Symposium on Foundations of Computer Science, FOCS 2016, 9-11 October 2016, Hyatt Regency, New Brunswick, New Jersey, USA, pages 645–654. IEEE Computer Society, 2016.
  • Blelloch et al. (2012) Guy E. Blelloch, Harsha Vardhan Simhadri, and Kanat Tangwongsan. Parallel and I/O efficient set covering algorithms. In Guy E. Blelloch and Maurice Herlihy, editors, 24th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA ’12, Pittsburgh, PA, USA, June 25-27, 2012, pages 82–90. ACM, 2012.
  • Breuer et al. (2020) Adam Breuer, Eric Balkanski, and Yaron Singer. The FAST algorithm for submodular maximization. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, pages 1134–1143. PMLR, 2020.
  • Buchbinder et al. (2014) Niv Buchbinder, Moran Feldman, Joseph Naor, and Roy Schwartz. Submodular maximization with cardinality constraints. In Chandra Chekuri, editor, Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2014, Portland, Oregon, USA, January 5-7, 2014, pages 1433–1452. SIAM, 2014.
  • Buchbinder et al. (2017) Niv Buchbinder, Moran Feldman, and Roy Schwartz. Comparing apples and oranges: Query trade-off in submodular maximization. Math. Oper. Res., 42(2):308–329, 2017.
  • Calinescu et al. (2007) Gruia Calinescu, Chandra Chekuri, Martin Pal, and Jan Vondrak. Maximizing a submodular set function subject to a matroid constraint (extended abstract). In Matteo Fischetti and David P. Williamson, editors, Integer Programming and Combinatorial Optimization, 12th International IPCO Conference, Ithaca, NY, USA, June 25-27, 2007, Proceedings, volume 4513 of Lecture Notes in Computer Science, pages 182–196. Springer, 2007.
  • Chekuri and Quanrud (2019) Chandra Chekuri and Kent Quanrud. Submodular function maximization in parallel via the multilinear relaxation. In Timothy M. Chan, editor, Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, San Diego, California, USA, January 6-9, 2019, pages 303–322. SIAM, 2019.
  • Chen and Kuhnle (2024) Yixin Chen and Alan Kuhnle. Practical and parallelizable algorithms for non-monotone submodular maximization with size constraint. J. Artif. Intell. Res., 79:599–637, 2024.
  • Chen et al. (2021) Yixin Chen, Tonmoy Dey, and Alan Kuhnle. Best of both worlds: Practical and theoretically optimal submodular maximization in parallel. In Marc’Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan, editors, Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual, pages 25528–25539, 2021.
  • Chierichetti et al. (2010) Flavio Chierichetti, Ravi Kumar, and Andrew Tomkins. Max-cover in map-reduce. In Michael Rappa, Paul Jones, Juliana Freire, and Soumen Chakrabarti, editors, Proceedings of the 19th International Conference on World Wide Web, WWW 2010, Raleigh, North Carolina, USA, April 26-30, 2010, pages 231–240. ACM, 2010.
  • Dey et al. (2023) Tonmoy Dey, Yixin Chen, and Alan Kuhnle. DASH: A distributed and parallelizable algorithm for size-constrained submodular maximization. In Brian Williams, Yiling Chen, and Jennifer Neville, editors, Thirty-Seventh AAAI Conference on Artificial Intelligence, AAAI 2023, Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence, IAAI 2023, Thirteenth Symposium on Educational Advances in Artificial Intelligence, EAAI 2023, Washington, DC, USA, February 7-14, 2023, pages 3941–3948. AAAI Press, 2023.
  • Ene and Nguyen (2019) Alina Ene and Huy L. Nguyen. Submodular maximization with nearly-optimal approximation and adaptivity in nearly-linear time. In Timothy M. Chan, editor, Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, San Diego, California, USA, January 6-9, 2019, pages 274–282. SIAM, 2019.
  • Ene and Nguyen (2020) Alina Ene and Huy L. Nguyen. Parallel algorithm for non-monotone dr-submodular maximization. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, pages 2902–2911. PMLR, 2020.
  • Epasto et al. (2017) Alessandro Epasto, Vahab S. Mirrokni, and Morteza Zadimoghaddam. Bicriteria distributed submodular maximization in a few rounds. In Christian Scheideler and Mohammad Taghi Hajiaghayi, editors, Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2017, Washington DC, USA, July 24-26, 2017, pages 25–33. ACM, 2017.
  • Ettinger et al. (2021) Scott Ettinger, Shuyang Cheng, Benjamin Caine, Chenxi Liu, Hang Zhao, Sabeek Pradhan, Yuning Chai, Ben Sapp, Charles R. Qi, Yin Zhou, Zoey Yang, Aurelien Chouard, Pei Sun, Jiquan Ngiam, Vijay Vasudevan, Alexander McCauley, Jonathon Shlens, and Dragomir Anguelov. Large scale interactive motion forecasting for autonomous driving : The waymo open motion dataset. In 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021, pages 9690–9699. IEEE, 2021.
  • Fahrbach et al. (2019a) Matthew Fahrbach, Vahab S. Mirrokni, and Morteza Zadimoghaddam. Submodular maximization with nearly optimal approximation, adaptivity and query complexity. In Timothy M. Chan, editor, Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, San Diego, California, USA, January 6-9, 2019, pages 255–273. SIAM, 2019a.
  • Fahrbach et al. (2019b) Matthew Fahrbach, Vahab S. Mirrokni, and Morteza Zadimoghaddam. Non-monotone submodular maximization with nearly optimal adaptivity and query complexity. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, volume 97 of Proceedings of Machine Learning Research, pages 1833–1842. PMLR, 2019b.
  • Gupta et al. (2010) Anupam Gupta, Aaron Roth, Grant Schoenebeck, and Kunal Talwar. Constrained non-monotone submodular maximization: Offline and secretary algorithms. In Amin Saberi, editor, Internet and Network Economics - 6th International Workshop, WINE 2010, Stanford, CA, USA, December 13-17, 2010. Proceedings, volume 6484 of Lecture Notes in Computer Science, pages 246–257. Springer, 2010.
  • Joseph et al. (2019) K. J. Joseph, Vamshi Teja R, Krishnakant Singh, and Vineeth N. Balasubramanian. Submodular batch selection for training deep neural networks. In Sarit Kraus, editor, Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, August 10-16, 2019, pages 2677–2683. ijcai.org, 2019.
  • Kazemi et al. (2019) Ehsan Kazemi, Marko Mitrovic, Morteza Zadimoghaddam, Silvio Lattanzi, and Amin Karbasi. Submodular streaming in all its glory: Tight approximation, minimum memory and low adaptive complexity. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, volume 97 of Proceedings of Machine Learning Research, pages 3311–3320. PMLR, 2019.
  • Kazemi et al. (2021) Ehsan Kazemi, Shervin Minaee, Moran Feldman, and Amin Karbasi. Regularized submodular maximization at scale. In Marina Meila and Tong Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, volume 139 of Proceedings of Machine Learning Research, pages 5356–5366. PMLR, 2021.
  • Krizhevsky et al. (2009) Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. In University of Toronto, Canada. Citeseer, 2009.
  • Kuhnle (2021) Alan Kuhnle. Quick streaming algorithms for maximization of monotone submodular functions in linear time. In Arindam Banerjee and Kenji Fukumizu, editors, The 24th International Conference on Artificial Intelligence and Statistics, AISTATS 2021, April 13-15, 2021, Virtual Event, volume 130 of Proceedings of Machine Learning Research, pages 1360–1368. PMLR, 2021.
  • Kumar et al. (2013) Ravi Kumar, Benjamin Moseley, Sergei Vassilvitskii, and Andrea Vattani. Fast greedy algorithms in mapreduce and streaming. In Guy E. Blelloch and Berthold Vöcking, editors, 25th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA ’13, Montreal, QC, Canada - July 23 - 25, 2013, pages 1–10. ACM, 2013.
  • Lee et al. (2009) Jon Lee, Vahab S. Mirrokni, Viswanath Nagarajan, and Maxim Sviridenko. Non-monotone submodular maximization under matroid and knapsack constraints. In Michael Mitzenmacher, editor, Proceedings of the 41st Annual ACM Symposium on Theory of Computing, STOC 2009, Bethesda, MD, USA, May 31 - June 2, 2009, pages 323–332. ACM, 2009.
  • Liu and Vondrak (2019) Paul Liu and Jan Vondrak. Submodular optimization in the mapreduce model. In Jeremy T. Fineman and Michael Mitzenmacher, editors, 2nd Symposium on Simplicity in Algorithms, SOSA 2019, January 8-9, 2019, San Diego, CA, USA, volume 69 of OASIcs, pages 18:1–18:10. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.
  • Liu (2020) Siwen Liu. A review for submodular optimization on machine scheduling problems. In Ding-Zhu Du and Jie Wang, editors, Complexity and Approximation - In Memory of Ker-I Ko, volume 12000 of Lecture Notes in Computer Science, pages 252–267. Springer, 2020.
  • Liu et al. (2020) Yajing Liu, Edwin K. P. Chong, Ali Pezeshki, and Zhenliang Zhang. Submodular optimization problems and greedy strategies: A survey. Discret. Event Dyn. Syst., 30(3):381–412, 2020.
  • Mao et al. (2021) Jiageng Mao, Minzhe Niu, Chenhan Jiang, Hanxue Liang, Jingheng Chen, Xiaodan Liang, Yamin Li, Chaoqiang Ye, Wei Zhang, Zhenguo Li, Jie Yu, Chunjing Xu, and Hang Xu. One million scenes for autonomous driving: ONCE dataset. In Joaquin Vanschoren and Sai-Kit Yeung, editors, Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, NeurIPS Datasets and Benchmarks 2021, December 2021, virtual, 2021.
  • Mirrokni and Zadimoghaddam (2015) Vahab S. Mirrokni and Morteza Zadimoghaddam. Randomized composable core-sets for distributed submodular maximization. In Rocco A. Servedio and Ronitt Rubinfeld, editors, Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, STOC 2015, Portland, OR, USA, June 14-17, 2015, pages 153–162. ACM, 2015.
  • Mirzasoleiman et al. (2013) Baharan Mirzasoleiman, Amin Karbasi, Rik Sarkar, and Andreas Krause. Distributed submodular maximization: Identifying representative elements in massive data. In Christopher J. C. Burges, Léon Bottou, Zoubin Ghahramani, and Kilian Q. Weinberger, editors, Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States, pages 2049–2057, 2013.
  • Mirzasoleiman et al. (2015) Baharan Mirzasoleiman, Ashwinkumar Badanidiyuru, Amin Karbasi, Jan Vondrak, and Andreas Krause. Lazier than lazy greedy. In Blai Bonet and Sven Koenig, editors, Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, January 25-30, 2015, Austin, Texas, USA, pages 1812–1818. AAAI Press, 2015.
  • Mirzasoleiman et al. (2016) Baharan Mirzasoleiman, Ashwinkumar Badanidiyuru, and Amin Karbasi. Fast constrained submodular maximization: Personalized data summarization. In Maria-Florina Balcan and Kilian Q. Weinberger, editors, Proceedings of the 33nd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016, volume 48 of JMLR Workshop and Conference Proceedings, pages 1358–1367. JMLR.org, 2016.
  • Mirzasoleiman et al. (2018) Baharan Mirzasoleiman, Stefanie Jegelka, and Andreas Krause. Streaming non-monotone submodular maximization: Personalized video summarization on the fly. In Sheila A. McIlraith and Kilian Q. Weinberger, editors, Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, pages 1379–1386. AAAI Press, 2018.
  • Mitzenmacher and Upfal (2017) Michael Mitzenmacher and Eli Upfal. Probability and computing: Randomization and probabilistic techniques in algorithms and data analysis. Cambridge university press, 2017.
  • Nemhauser and Wolsey (1978) George L. Nemhauser and Laurence A. Wolsey. Best algorithms for approximating the maximum of a submodular set function. Math. Oper. Res., 3(3):177–188, 1978.
  • Nemhauser et al. (1978) George L. Nemhauser, Laurence A. Wolsey, and Marshall L. Fisher. An analysis of approximations for maximizing submodular set functions - I. Math. Program., 14(1):265–294, 1978.
  • Rangwani et al. (2021) Harsh Rangwani, Arihant Jain, Sumukh K. Aithal, and R. Venkatesh Babu. S3{}^{\mbox{3}}vaada: Submodular subset selection for virtual adversarial active domain adaptation. In 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021, pages 7496–7505. IEEE, 2021.
  • Rossi and Ahmed (2015) Ryan A. Rossi and Nesreen K. Ahmed. The network data repository with interactive graph analytics and visualization. In Blai Bonet and Sven Koenig, editors, Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, January 25-30, 2015, Austin, Texas, USA, pages 4292–4293. AAAI Press, 2015.
  • Yang and Leskovec (2012) Jaewon Yang and Jure Leskovec. Defining and evaluating network communities based on ground-truth. In Mohammed Javeed Zaki, Arno Siebes, Jeffrey Xu Yu, Bart Goethals, Geoffrey I. Webb, and Xindong Wu, editors, 12th IEEE International Conference on Data Mining, ICDM 2012, Brussels, Belgium, December 10-13, 2012, pages 745–754. IEEE Computer Society, 2012.