
Distributed Algorithms for Matching in Hypergraphs

Oussama Hanguir
[email protected]
Columbia University
   Clifford Stein
[email protected]
Columbia University
Research partially supported by NSF grants CCF-1714818 and CCF-1822809.
Abstract

We study the $d$-Uniform Hypergraph Matching ($d$-UHM) problem: given an $n$-vertex hypergraph $G$ where every hyperedge is of size $d$, find a maximum cardinality set of disjoint hyperedges. For $d\geq 3$, the problem of finding the maximum matching is $\mathcal{NP}$-complete, and was one of Karp's 21 $\mathcal{NP}$-complete problems. In this paper we are interested in the problem of finding matchings in hypergraphs in the massively parallel computation (MPC) model, a common abstraction of MapReduce-style computation. In this model, we present the first three parallel algorithms for $d$-Uniform Hypergraph Matching, and we analyse them in terms of resources such as memory usage, rounds of communication needed, and approximation ratio. The highlights include:

  • An $O(\log n)$-round $d$-approximation algorithm that uses $O(nd)$ space per machine.

  • A 3-round, $O(d^{2})$-approximation algorithm that uses $\tilde{O}(\sqrt{nm})$ space per machine.

  • A 3-round algorithm that computes a subgraph containing a $(d-1+\frac{1}{d})^{2}$-approximation, using $\tilde{O}(\sqrt{nm})$ space per machine for linear hypergraphs, and $\tilde{O}(n\sqrt{nm})$ in general.

For the third algorithm, we introduce the concept of HyperEdge Degree Constrained Subgraph (HEDCS), which can be of independent interest. We show that an HEDCS contains a fractional matching with total value at least $|M^{*}|/(d-1+\frac{1}{d})$, where $|M^{*}|$ is the size of the maximum matching in the hypergraph. Moreover, we investigate the experimental performance of these algorithms both on random input and real instances. Our results support the theoretical bounds and confirm the trade-offs between the quality of approximation and the speed of the algorithms.

1 Introduction

As massive graphs become more ubiquitous, the need for scalable parallel and distributed algorithms that solve graph problems grows as well. In recent years, we have seen progress in many graph problems (e.g. spanning trees, connectivity, shortest paths [4, 5]) and, most relevant to this work, matchings [27, 22]. A natural generalization of matchings in graphs is to matchings in hypergraphs. Hypergraph Matching is an important problem with many applications such as capital budgeting, crew scheduling, facility location, scheduling airline flights [53], forming a coalition structure in multi-agent systems [50] and determining the winners in combinatorial auctions [49] (see [56] for a partial survey). Although matching problems in graphs are one of the most well-studied problems in algorithms and optimization, the NP-hard problem of finding a maximum matching in a hypergraph is not as well understood.

In this work, we are interested in the problem of finding matchings in very large hypergraphs, large enough that we cannot solve the problem on one computer. We develop, analyze and experimentally evaluate three parallel algorithms for hypergraph matchings in the MPC model. Two of the algorithms are generalizations of parallel algorithms for matchings in graphs. The third algorithm develops new machinery which we call a hyper-edge degree constrained subgraph (HEDCS), generalizing the notion of an edge-degree constrained subgraph (EDCS). The EDCS has been recently used in parallel and dynamic algorithms for graph matching problems [16, 17, 6]. We will show a range of algorithm tradeoffs between approximation ratio, rounds, memory and computation, evaluated both as worst case bounds, and via computational experiments.

More formally, a hypergraph $G$ is a pair $G=(V,E)$ where $V$ is the set of vertices and $E$ is the set of hyperedges. A hyperedge $e\in E$ is a nonempty subset of the vertices. The cardinality of a hyperedge is the number of vertices it contains. When every hyperedge has the same cardinality $d$, the hypergraph is said to be $d$-uniform. A hypergraph is linear if the intersection of any two hyperedges has at most one vertex. A hypergraph matching is a subset of the hyperedges $M\subseteq E$ such that every vertex is covered at most once, i.e. the hyperedges are mutually disjoint. This notion generalizes matchings in graphs. The cardinality of a matching is the number of hyperedges it contains. A matching is called maximum if it has the largest cardinality of all possible matchings, and maximal if it is not contained in any other matching. In the $d$-Uniform Hypergraph Matching problem (also referred to as Set Packing or $d$-Set Packing), a $d$-uniform hypergraph is given and one needs to find the maximum cardinality matching.

We adopt the most restrictive MapReduce-like model of modern parallel computation among [40, 29, 11, 4], the Massively Parallel Computation (MPC) model of [11]. This model is widely used to solve different graph problems such as matching, vertex cover [44, 2, 7, 6, 27], independent set [27, 34], as well as many other algorithmic problems. In this model, we have $k$ machines (processors), each with space $s$. $N$ is the size of the input and our algorithms will satisfy $k\cdot s=\tilde{O}(N)$, which means that the total space in the system is only a polylogarithmic factor more than the input size. The computation proceeds in rounds. At the beginning of each round, the data (e.g. vertices and edges) is distributed across the machines. In each round, a machine performs local computation on its data (of size $s$), and then sends messages to other machines for the next round. Crucially, the total amount of communication sent or received by a machine is bounded by $s$, its space. For example, a machine can send one message of size $s$, or $s$ messages of size 1. It cannot, however, broadcast a size-$s$ message to every machine. Each machine treats the received messages as the input for the next round. Our model limits the number of machines and the memory per machine to be substantially sublinear in the size of the input. On the other hand, no restrictions are placed on the computational power of any individual machine. The main complexity measures are therefore the memory per machine and the number of rounds $R$ required to solve a problem, which we consider to be the "parallel time" of the algorithm. For the rest of the paper, $G(V,E)$ is a $d$-uniform hypergraph with $n$ vertices and $m$ hyperedges, and when the context is clear, we will simply refer to $G$ as a graph and to its hyperedges as edges. $MM(G)$ denotes the maximum matching in $G$, and $\mu(G)=|MM(G)|$ is the size of that matching. We define $d_{G}(v)$ to be the degree of a vertex $v$ (the number of hyperedges that $v$ belongs to) in $G$. When the context is clear, we will omit indexing by $G$ and simply denote it $d(v)$.
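For concreteness, the following is a minimal Python sketch of one MPC round; it is our own illustrative harness (names such as run_round and local_compute are not from the paper), not part of the model's definition. It simply enforces the constraints just described: the input, the messages sent, and the messages received by each machine are all bounded by $s$.

import random

def run_round(inputs, local_compute, s):
    """Simulate one MPC round. inputs[i] is the list of items on machine i;
    local_compute returns a list of (destination, item) pairs."""
    k = len(inputs)
    inboxes = [[] for _ in range(k)]
    for data in inputs:
        assert len(data) <= s                 # data fits in machine memory
        messages = local_compute(data)        # local computation is unrestricted
        assert len(messages) <= s             # total communication sent <= s
        for dest, item in messages:
            inboxes[dest % k].append(item)
    for inbox in inboxes:
        assert len(inbox) <= s                # total communication received <= s
    return inboxes                            # next round's inputs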

2 Our contribution and results.

We design and implement algorithms for the $d$-UHM in the MPC model. We will give three different algorithms, demonstrating different trade-offs between the model's parameters. Our algorithms are inspired by methods to find maximum matchings in graphs, but require developing significant new tools to address hypergraphs. We are not aware of previous algorithms for hypergraph matching in the MPC model. First we generalize the randomized coreset algorithm of [7], which finds a 3-round $O(1)$-approximation for matching in graphs. Our algorithm partitions the graph into random pieces across the machines, and then simply picks a maximum matching of each machine's subgraph. We show that this natural approach results in an $O(d^{2})$-approximation. While the algorithmic generalization is straightforward, the analysis requires several new ideas.

Theorem (Theorem 4.2 (restated)).

There exists an MPC algorithm that with high probability computes a $(3d(d-1)+3+\epsilon)$-approximation for the $d$-UHM problem in 3 MPC rounds on machines of memory $s=\tilde{O}(\sqrt{nm})$.

Our second result concerns the MPC model with per-machine memory $O(d\cdot n)$. We adapt the sampling technique and post-processing strategy of [44] to construct maximal matchings in hypergraphs, and are able to show that in $d$-uniform hypergraphs, this technique yields a maximal matching, and thus a $d$-approximation to the $d$-UHM problem, in $O(\log{n})$ rounds.

Theorem (Theorem 5.1 (restated)).

There exists an MPC algorithm that, given a $d$-uniform hypergraph $G(V,E)$, with high probability computes a maximal matching in $G$ in $O(\log{n})$ MPC rounds on machines of memory $s=\Theta(d\cdot n)$.

Our third result generalizes the edge degree constrained subgraphs (EDCS), originally introduced by Bernstein and Stein [16] for maintaining large matchings in dynamic graphs, and later used for several other problems including matching and vertex cover in the streaming and MPC models [6, 17]. We call these generalized subgraphs hyper-edge degree constrained subgraphs (HEDCS). We show that they exist for specific parameters, and that they contain a good approximation for the $d$-UHM problem. We prove that an HEDCS of a hypergraph $G$, with well-chosen parameters, contains a fractional matching with value at least $\frac{d}{d^{2}-d+1}\mu(G)$, and that the underlying fractional matching is special in the sense that for each hyperedge, it either assigns a value of $1$ or a value less than some chosen $\epsilon$. We call such a fractional matching an $\epsilon$-restricted fractional matching. For Theorem 6.15, we compute an HEDCS of the hypergraph in a distributed fashion. This procedure relies on the robustness properties that we prove for HEDCSs under sampling.

Theorem (Theorem 6.4 (restated informally)).

Let $G$ be a $d$-uniform hypergraph and $0<\epsilon<1$. There exists an HEDCS that contains an $\epsilon$-restricted fractional matching $M^{H}_{f}$ with total value at least $\mu(G)\left(\frac{d}{d^{2}-d+1}-\epsilon\right)$.

Theorem (Theorem 6.15 (restated)).

There exists an MPC algorithm that, given a $d$-uniform hypergraph $G(V,E)$ where $|V|=n$ and $|E|=m$, can construct an HEDCS of $G$ in 2 MPC rounds on machines of memory $s=\tilde{O}(n\sqrt{nm})$ in general and $s=\tilde{O}(\sqrt{nm})$ for linear hypergraphs.

Theorem (Corollary 6.16 (restated)).

There exists an MPC algorithm that with high probability achieves a $d(d-1+1/d)^{2}$-approximation to the $d$-Uniform Hypergraph Matching problem in 3 rounds.

Table 1 summarizes our results.

Approximation ratio | Rounds | Memory per machine | Computation per round
$3d(d-1)+3$ | 3 | $\tilde{O}(\sqrt{nm})$ | Exponential
$d$ | $O(\log{n})$ | $O(dn)$ | Polynomial
$d(d-1+1/d)^{2}$ | 3 | $\tilde{O}(n\sqrt{nm})$ in general, $\tilde{O}(\sqrt{nm})$ for linear hypergraphs | Polynomial

Table 1: Our parallel algorithms for the $d$-uniform hypergraph matching problem.

Experimental results. We implement our algorithms in a simulated MPC model environment and test them both on random and real-world instances. Our experimental results are consistent with the theoretical bounds on most instances, and show that there is a trade-off between the extent to which the algorithms use the power of parallelism and the quality of the approximations. This trade-off is illustrated by comparing the number of rounds and the performance of the algorithms on machines with the same memory size. See Section 7 for more details.

Our techniques. For Theorem 4.2 and Theorem 6.15, we use the concept of composable coresets, which has been employed in several distributed optimization models such as the streaming and MapReduce models [1, 47, 8, 9, 10, 37]. Roughly speaking, the main idea behind this technique is as follows: first partition the data into smaller parts. Then compute a representative solution, referred to as a coreset, from each part. Finally, obtain a solution by solving the optimization problem over the union of coresets for all parts. We use a randomized variant of composable coresets, first introduced in [46], where the above idea is applied on a random clustering of the data. This randomized variant has since been used for Graph Matching and Vertex Cover [7, 6], as well as Column Subset Selection [18]. Our algorithm for Theorem 4.2 is similar to previous works, but the analysis requires new techniques for handling hypergraphs. Theorem 5.1 is a relatively straightforward generalization of the corresponding result for matching in graphs.

The majority of our technical innovation is contained in Theorems 6.4 and 6.15. Our general approach is to construct, in parallel, an HEDCS that will contain a good approximation of the maximum matching in the original graph and that will fit on one machine. Then we can run an approximation algorithm on the resultant HEDCS to come up with a good approximation to the maximum matching in this HEDCS and hence in the original graph. In order to make this approach work, we need to generalize much of the known EDCS machinery [16, 17] to hypergraphs. This endeavor is quite involved, as almost all the proofs do not generalize easily and, as the results show, the resulting bounds are weaker than those for graphs. We first show that HEDCSs exist, and that they contain large fractional matchings. We then show that they are useful as a coreset, which amounts to showing that even though there can be many different HEDCSs of some fixed hypergraph $G(V,E)$, the degree distributions of every HEDCS (for the same parameters) are almost identical. In other words, the degree of any vertex $v$ is almost the same in every HEDCS of $G$. We also show that HEDCSs are robust under edge sampling, in the sense that edge sampling from an HEDCS yields another HEDCS. These properties allow us to use HEDCSs in our coresets and parallel algorithm in the rest of the paper.

3 Related Work

Hypergraph Matching. The problem of finding a maximum matching in $d$-uniform hypergraphs is NP-hard for any $d\geq 3$ [41, 48], and APX-hard for $d=3$ [39]. The most natural approach for an approximation algorithm for the hypergraph matching problem is the greedy algorithm: repeatedly add an edge that doesn't intersect any edges already in the matching. This solution is clearly within a factor of $d$ from optimal: of the edges removed in each iteration, the optimal solution can contain at most $d$ edges (at most one for each element of the chosen edge). It is also easy to construct examples showing that this analysis is tight for the greedy approach. All the best known approximation algorithms for the Hypergraph Matching Problem in $d$-uniform hypergraphs are based on local search methods [13, 15, 20, 30, 36]. The first such result, by Hurkens and Schrijver [36], gave a $(\frac{d}{2}+\epsilon)$-approximation algorithm. Halldorsson [30] presented a quasi-polynomial $(\frac{d+2}{3})$-approximation algorithm for the unweighted $d$-UHM. Sviridenko and Ward [54] established a polynomial-time $(\frac{d+2}{3})$-approximation algorithm that is the first polynomial-time improvement over the $(\frac{d}{2}+\epsilon)$ result from [36]. Cygan [21] and Furer and Yu [26] both provide a $(\frac{d+1}{3}+\epsilon)$ polynomial-time approximation algorithm, which is the best approximation guarantee known so far. On the other hand, Hazan, Safra and Schwartz [35] proved that it is hard to approximate the $d$-UHM problem within a factor of $\Omega(d/\log{d})$.
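As an illustration of the greedy $d$-approximation just described, here is a minimal Python sketch (the function name greedy_matching is ours, not from the literature cited above):

def greedy_matching(edges):
    """edges: iterable of d-tuples of vertices. Returns a maximal matching,
    which is a d-approximation to the maximum matching in a d-uniform
    hypergraph."""
    matched, matching = set(), []
    for e in edges:
        if all(v not in matched for v in e):   # e is disjoint from the matching
            matching.append(e)
            matched.update(e)
    return matching

# Example on a 3-uniform hypergraph:
# greedy_matching([(1, 2, 3), (3, 4, 5), (6, 7, 8)]) -> [(1, 2, 3), (6, 7, 8)]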

Matching in parallel computation models. The study of the graph maximum matching problem in parallel computation models can be traced back to PRAM algorithms of the 1980s [38, 3, 45]. Since then it has been studied in the LOCAL and MPC models, and a $(1+\epsilon)$-approximation can be achieved in $O(\log{\log{n}})$ rounds using a space of $O(n)$ [6, 22, 27]. The question of finding a maximal matching in a small number of rounds has also been considered in [28, 44]. Recently, Behnezhad et al. [12] presented an $O(\log{\log{\Delta}})$-round algorithm for maximal matching with $O(n)$ memory per machine. While we are not aware of previous work on hypergraph matching in the MPC model, finding a maximal hypergraph matching has been considered in the LOCAL model. Both Fischer et al. [24] and Harris [33] provide a deterministic distributed algorithm that computes an $O(d)$-approximation to the $d$-UHM.

Maximum Independent Set. The $d$-UHM is strongly related to different variants of the Independent Set problem as well as other combinatorial problems. The maximum independent set (MIS) problem on degree-bounded graphs can be mapped to $d$-UHM when the degree bound is $d$ [31, 14, 32, 55]. The $d$-UHM problem can also be studied under the more general problem of maximum independent set on $(d+1)$-claw-free graphs [13, 20]. (See [19] for connections between $d$-UHM and other combinatorial optimization problems.)

4 A 3-round $O(d^{2})$-approximation

In this section, we generalize the randomized composable coreset algorithm of Assadi and Khanna [7]. They used a maximum matching as a coreset and obtained an $O(1)$-approximation. We use a hypergraph maximum matching and obtain an $O(d^{2})$-approximation. We first define a $k$-partitioning and then present our greedy approach.

Definition 4.1 (Random $k$-partitioning).

Let $E$ be the edge-set of a hypergraph $G(V,E)$. We say that a collection of edge sets $E^{(1)},\ldots,E^{(k)}$ is a random $k$-partition of $E$ if the sets are constructed by assigning each edge $e\in E$ to some $E^{(i)}$ chosen uniformly at random. A random $k$-partition of $E$ naturally results in partitioning the graph $G$ into $k$ subgraphs $G^{(1)},\ldots,G^{(k)}$ where $G^{(i)}:=G(V,E^{(i)})$ for all $i\in[k]$.

Let $G(V,E)$ be any $d$-uniform hypergraph and $G^{(1)},\ldots,G^{(k)}$ be a random $k$-partitioning of $G$. We describe a simple greedy process for combining the maximum matchings of the $G^{(i)}$, and prove that this process results in an $O(d^{2})$-approximation of the maximum matching of $G$.

Construct a random $k$-partitioning of $G$ across the $k$ machines. Let $M^{(0)}:=\emptyset$.
for $i=1$ to $k$ do
  Set $MM(G^{(i)})$ to be an arbitrary hypergraph maximum matching of $G^{(i)}$.
  Let $M^{(i)}$ be a maximal matching obtained by adding to $M^{(i-1)}$ the edges of $MM(G^{(i)})$ that do not violate the matching property.
end for
return $M:=M^{(k)}$.
Algorithm 1: Greedy
Theorem 4.2.

Greedy computes, with high probability, a $\big(3d(d-1)+3+o(1)\big)$-approximation for the $d$-UHM problem in 3 MPC rounds on machines of memory $s=\tilde{O}(\sqrt{nm})$.

In the rest of the section, we present the proof of Theorem 4.2. Let $c=\frac{1}{3d(d-1)+3}$; we show that $|M^{(k)}|\geq c\cdot\mu(G)$ w.h.p., where $M^{(k)}$ is the output of Greedy. The randomness stems from the fact that the matchings $M^{(i)}$ (for $i\in\{1,\ldots,k\}$) constructed by Greedy are random variables depending on the random $k$-partitioning. We adapt the general approach in [7] to $d$-uniform hypergraphs, with $d\geq 3$. Suppose at the beginning of the $i$-th step of Greedy, the matching $M^{(i-1)}$ is of size $o(\mu(G)/d^{2})$. One can see that in this case, there is a matching of size $\Omega(\mu(G))$ in $G$ that is entirely incident on vertices of $G$ that are not matched by $M^{(i-1)}$. We can show that in fact $\Omega(\mu(G)/(d^{2}k))$ edges of this matching appear in $G^{(i)}$, even when we condition on the assignment of the edges in the first $(i-1)$ graphs. Next we argue that the existence of these edges forces any maximum matching of $G^{(i)}$ to match $\Omega(\mu(G)/(d^{2}k))$ edges in $G^{(i)}$ between the vertices that are not matched by $M^{(i-1)}$. These edges can always be added to the matching $M^{(i-1)}$ to form $M^{(i)}$. Therefore, while the maximal matching in Greedy is of size $o(\mu(G))$, we can increase its size by $\Omega(\mu(G)/(d^{2}k))$ edges in each of the first $k/3$ steps, hence obtaining a matching of size $\Omega(\mu(G)/d^{2})$ at the end. The following Lemma 4.3 formalizes this argument.

Lemma 4.3.

For any $i\in[k/3]$, if $|M^{(i-1)}|\leq c\cdot\mu(G)$, then, w.p. $1-O(1/n)$,

$$|M^{(i)}|\geq |M^{(i-1)}|+\frac{1-3d(d-1)c-o(1)}{k}\cdot\mu(G)\,.$$

Before proving the lemma, we define some notation. Let $M^{*}$ be an arbitrary maximum matching of $G$. For any $i\in[k]$, we define $M^{*<i}$ as the part of $M^{*}$ assigned to the first $i-1$ graphs in the random $k$-partitioning, i.e., the graphs $G^{(1)},\ldots, G^{(i-1)}$. We have the following concentration result:

Claim 4.1.

W.p. $1-O(1/n)$, for any $i\in[k]$:

$$|M^{*<i}|\leq\Big(\frac{i-1+o(i)}{k}\Big)\cdot\mu(G)$$
Proof of Claim 4.1.

Fix an $i\in[k]$; each edge in $M^{*}$ is assigned to one of $G^{(1)},\ldots,G^{(i-1)}$ w.p. $(i-1)/k$; hence, in expectation, the size of $M^{*<i}$ is $\frac{i-1}{k}\cdot\mu(G)$. For large $n$, the ratio $\frac{\mu(G)}{k}$ is large and the claim follows from a standard application of the Chernoff bound. ∎

Proof of Lemma 4.3.

Fix an $i\in[k/3]$ and the set of edges of $E^{(1)},\ldots, E^{(i-1)}$. This also fixes the matching $M^{(i-1)}$, while the sets of edges $E^{(i)},\ldots,E^{(k)}$ together with the matching $M^{(i)}$ are still random variables. We further condition on the event that after fixing the edges in $E^{(1)},\ldots,E^{(i-1)}$, $|M^{*<i}|\leq\frac{i-1+o(i)}{k}\cdot\mu(G)$, which happens w.p. $1-O(1/n)$ by Claim 4.1.

Let $V_{old}$ be the set of vertices incident on $M^{(i-1)}$ and $V_{new}$ be the remaining vertices. Let $E^{\geq i}$ be the set of edges in $E\setminus (E^{(1)}\cup\ldots\cup E^{(i-1)})$. We partition $E^{\geq i}$ into two parts: (i) $E_{old}$: the set of edges with at least one endpoint in $V_{old}$, and (ii) $E_{new}$: the set of edges incident entirely on $V_{new}$. Our goal is to show that w.h.p. any maximum matching of $G^{(i)}$ matches $\Omega(\mu(G)/k)$ vertices in $V_{new}$ to each other by using the edges in $E_{new}$. The lemma then follows easily from this.

The edges in the graph $G^{(i)}$ are chosen by independently assigning each edge in $E^{\geq i}$ to $G^{(i)}$ w.p. $1/(k-i+1)$. This independence makes it possible to treat the edges in $E_{old}$ and $E_{new}$ separately. We can fix the set of sampled edges of $G^{(i)}$ in $E_{old}$, denoted by $E^{i}_{old}$, without changing the distribution of edges in $G^{(i)}$ chosen from $E_{new}$. Let $\mu_{old}:=|MM(G(V,E^{i}_{old}))|$, i.e., the maximum number of edges that can be matched in $G^{(i)}$ using only the edges in $E^{i}_{old}$. In the following, we show that w.h.p. there exists a matching of size $\mu_{old}+\Omega(\mu(G)/k)$ in $G^{(i)}$. By the definition of $\mu_{old}$, this implies that any maximum matching of $G^{(i)}$ has to use at least $\Omega(\mu(G)/k)$ edges in $E_{new}$.

Let $M_{old}$ be an arbitrary maximum matching of size $\mu_{old}$ in $G(V,E^{i}_{old})$. Let $V_{new}(M_{old})$ be the set of vertices in $V_{new}$ that are incident on $M_{old}$. We show that there is a large matching in $G(V,E_{new})$ that avoids $V_{new}(M_{old})$.

Claim 4.2.

$|V_{new}(M_{old})|<c\cdot d(d-1)\cdot\mu(G)$.

Proof of Claim 4.2.

Since any edge in $M_{old}$ has at least one endpoint in $V_{old}$, we have $|M_{old}|\leq|V_{old}|$ and $|V_{new}(M_{old})|\leq(d-1)|M_{old}|\leq(d-1)|V_{old}|$. By the assertion of the lemma, $|M^{(i-1)}|<c\cdot\mu(G)$, and hence $|V_{new}(M_{old})|\leq(d-1)\cdot|V_{old}|\leq d(d-1)\cdot|M^{(i-1)}|<c\cdot d(d-1)\cdot\mu(G)$. ∎

Claim 4.3.

There exists a matching in $G(V,E_{new})$ of size $\Big(\frac{k-i+1-o(i)}{k}-2d(d-1)c\Big)\cdot\mu(G)$ that avoids the vertices of $V_{new}(M_{old})$.

Proof of Claim 4.3.

By the assumption that $|M^{*<i}|\leq\frac{i-1+o(i)}{k}\cdot\mu(G)$, there is a matching $M$ of size $\frac{k-i+1-o(i)}{k}\cdot\mu(G)$ in the graph $G(V,E^{\geq i})$. By removing from $M$ the edges that are incident on either $V_{old}$ or $V_{new}(M_{old})$, at most $2d(d-1)c\cdot\mu(G)$ edges are removed. The remaining matching is entirely contained in $E_{new}$ and also avoids $V_{new}(M_{old})$, hence proving the claim. ∎

We are now ready to finalize the proof of Lemma 4.3. Let $M_{new}$ be the matching guaranteed by Claim 4.3. Each edge in this matching is chosen in $G^{(i)}$ w.p. $1/(k-i+1)$ independently of the other edges. Hence, by the Chernoff bound, there is a matching of size

$$(1-o(1))\cdot\Big(\frac{1}{k}-\frac{o(i)}{k(k-i+1)}-\frac{2d(d-1)c}{k-i+1}\Big)\cdot\mu(G)\geq\frac{1-o(1)-3d(d-1)c}{k}\cdot\mu(G)$$

in the edges of $M_{new}$ that appear in $G^{(i)}$ (for $i\leq k/3$). This matching can be directly added to the matching $M_{old}$, implying the existence of a matching of size $\mu_{old}+\frac{1-o(1)-3d(d-1)c}{k}\cdot\mu(G)$ in $G^{(i)}$. As we argued before, this ensures that any maximum matching of $G^{(i)}$ contains at least $\frac{1-o(1)-3d(d-1)c}{k}\cdot\mu(G)$ edges in $E_{new}$. These edges can always be added to $M^{(i-1)}$ to form $M^{(i)}$, hence proving the lemma. ∎

Proof of Theorem 4.2.

Recall that $M:=M^{(k)}$ is the output matching of Greedy. For the first $k/3$ steps of Greedy, if at any step we get a matching of size at least $c\cdot\mu(G)$, then we are already done. Otherwise, at each step, by Lemma 4.3, w.p. $1-O(1/n)$, we increase the size of the maximal matching by $\frac{1-3d(d-1)c-o(1)}{k}\cdot\mu(G)$; consequently, by taking a union bound over the $k/3$ steps, w.p. $1-o(1)$, the size of the maximal matching is at least $\frac{1-3d(d-1)c-o(1)}{3}\cdot\mu(G)$. Since $c=1/(3d(d-1)+3)$, we have $\frac{1-3d(d-1)c}{3}=c$, and in either case, the matching computed by Greedy is of size at least $\mu(G)/(3d(d-1)+3)-o(\mu(G))$, which proves that Greedy is an $O(d^{2})$-approximation. All that is left is to prove that Greedy can be implemented in 3 rounds with a memory of $\tilde{O}(\sqrt{nm})$ per machine. Let $k=\sqrt{\frac{m}{n}}$ be the number of machines, each with a memory of $\tilde{O}(\sqrt{nm})$. We claim that Greedy can run in three rounds. In the first round, each machine randomly partitions the edges assigned to it across the $k$ machines. This results in a random $k$-partitioning of the graph across the machines. In the second round, each machine sends a maximum matching of its input to a designated central machine; as there are $k=\sqrt{\frac{m}{n}}$ machines and each machine sends a coreset of size $\tilde{O}(n)$, the input received by the central machine is of size $\tilde{O}(\sqrt{nm})$ and hence can be stored entirely on that machine. Finally, the central machine computes the answer by combining the matchings. ∎

We conclude this section by noting that computing a maximum matching on every machine is only required for the analysis, i.e., to show that there exists a large matching in the union of coresets. In practice, we can use an approximation algorithm to obtain a large matching from the coresets. In our experiments, we use a maximal matching instead of a maximum one.

5 An $O(\log{n})$-round $d$-approximation algorithm

In this section, we show how to compute a maximal matching in $O(\log{n})$ MPC rounds, by generalizing the algorithm in [44] to $d$-uniform hypergraphs. The algorithm provided by Lattanzi et al. [44] computes a maximal matching in graphs in $O(\log n)$ rounds if $s=\Theta(n)$, and in at most $\lfloor c/\epsilon\rfloor$ iterations when $s=\Theta(n^{1+\epsilon})$, where $0<\epsilon<c$ is a fixed constant. Harvey et al. [34] show a similar result.

The algorithm first samples $O(s)$ edges and finds a maximal matching $M_{1}$ on the resulting subgraph (we will specify a bound on the memory $s$ later). Given this matching, we can safely remove edges that are in conflict (i.e. those incident on nodes in $M_{1}$) from the original graph $G$. If the resulting filtered graph $H$ is small enough to fit onto a single machine, the algorithm augments $M_{1}$ with a matching found on $H$. Otherwise, we augment $M_{1}$ with the matching found by recursing on $H$. Note that since the size of the graph reduces from round to round, the effective sampling probability increases, resulting in a larger sample of the remaining graph.

Theorem 5.1.

Given a $d$-uniform hypergraph $G(V,E)$, Iterated-Sampling computes a maximal matching in $G$ with high probability in $O(\log{n})$ MPC rounds on machines of memory $s=\Theta(d\cdot n)$.

1. Set $M:=\emptyset$ and $\mathcal{S}:=E$.
2. Sample every edge $e\in\mathcal{S}$ uniformly with probability $p=\frac{s}{5|\mathcal{S}|d}$ to form $E^{\prime}$.
3. If $|E^{\prime}|>s$ the algorithm fails. Otherwise give the graph $G(V,E^{\prime})$ as input to a single machine and compute a maximal matching $M^{\prime}$ on it. Set $M:=M\cup M^{\prime}$.
4. Let $I$ be the unmatched vertices in $G$ and $G[I]$ the induced subgraph with edges $E[I]$. If $|E[I]|>s$, set $\mathcal{S}:=E[I]$ and return to step 2.
5. Compute a maximal matching $M^{\prime\prime}$ on $G[I]$ and output $M:=M\cup M^{\prime\prime}$.
Algorithm 2: Iterated-Sampling
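A minimal single-process Python sketch of Iterated-Sampling follows; the helper add_maximal is our own name, and the step comments refer to the numbered steps of the pseudocode above.

import random

def add_maximal(edges, matched, M):
    """Greedily extend matching M with edges disjoint from matched vertices."""
    for e in edges:
        if all(v not in matched for v in e):
            M.append(e)
            matched.update(e)

def iterated_sampling(edges, d, s):
    """Single-process sketch of Iterated-Sampling; returns None on failure."""
    M, matched = [], set()
    S = list(edges)
    while True:
        if not S:                               # nothing left to match
            return M
        p = min(1.0, s / (5 * len(S) * d))      # step 2: sampling probability
        E_prime = [e for e in S if random.random() < p]
        if len(E_prime) > s:                    # step 3: sample overflow
            return None
        add_maximal(E_prime, matched, M)        # maximal matching on the sample
        E_I = [e for e in S if all(v not in matched for v in e)]
        if len(E_I) <= s:                       # step 5: G[I] fits on a machine
            add_maximal(E_I, matched, M)
            return M
        S = E_I                                 # step 4: recurse on E[I]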

Next we present the proof of Theorem 5.1. We first show that after sampling edges, the size of the graph $G[I]$ induced by unmatched vertices decreases exponentially with high probability, and therefore the algorithm terminates in $O(\log{n})$ rounds. Next we argue that $M$ is indeed a maximal matching by showing that if, after terminating, there is an edge that is still unmatched, this yields a contradiction. This is formalized in the following lemmas.

Lemma 5.2.

Let $E^{\prime}\subset E$ be a set of edges chosen independently with probability $p$. Then with probability at least $1-e^{-n}$, for all $I\subset V$, either $|E[I]|<2n/p$ or $E[I]\cap E^{\prime}\neq\emptyset$.

Proof of Lemma 5.2.

Fix one such subgraph $G[I]=(I,E[I])$ with $|E[I]|\geq 2n/p$. The probability that none of the edges in $E[I]$ were chosen to be in $E^{\prime}$ is $(1-p)^{|E[I]|}\leq(1-p)^{2n/p}\leq e^{-2n}$. Since there are at most $2^{n}$ total possible induced subgraphs $G[I]$ (because each vertex is either matched or unmatched), the probability that there exists one that does not have an edge in $E^{\prime}$ is at most $2^{n}e^{-2n}\leq e^{-n}$. ∎

Lemma 5.3.

If $s\geq 20d\cdot n$ then Iterated-Sampling runs for at most $O(\log{n})$ iterations with high probability.

Proof of Lemma 5.3.

Fix an iteration $i$ of Iterated-Sampling and let $p$ be the sampling probability for this iteration. Let $E_{i}$ be the set of edges at the beginning of this iteration, and denote by $I$ the set of unmatched vertices after this iteration. From Lemma 5.2, if $|E[I]|\geq 2n/p$ then an edge of $E[I]$ will be sampled with high probability. Note that no edge in $E[I]$ is incident on any edge in $M^{\prime}$. Thus, if an edge from $E[I]$ were sampled, then Iterated-Sampling would have chosen this edge to be in the matching. This contradicts the fact that no vertex in $I$ is matched. Hence, $|E[I]|\leq 2n/p$ with high probability.

Now consider the first iteration of the algorithm, and let $G_{1}(V_{1},E_{1})$ be the induced graph on the unmatched nodes after the first step of the algorithm. The above argument implies that $|E_{1}|\leq\frac{10d\cdot n|E_{0}|}{s}\leq\frac{10d\cdot n|E|}{s}\leq\frac{|E|}{2}$. Similarly $|E_{2}|\leq\frac{10d\cdot n|E_{1}|}{s}\leq\frac{(10d\cdot n)^{2}|E|}{s^{2}}\leq\frac{|E|}{2^{2}}$. After $i$ iterations, $|E_{i}|\leq\frac{|E|}{2^{i}}$, and the algorithm will terminate after $O(\log{n})$ iterations. ∎

Lemma 5.4.

Iterated-Sampling finds a maximal matching of $G$ with high probability.

Proof of Lemma 5.4.

First consider the case that the algorithm does not fail. Suppose there is an edge $e=\{v_{1},\ldots,v_{d}\}\in E$ such that none of the $v_{i}$'s are matched in the final matching $M$ that the algorithm outputs. In the last iteration of the algorithm, since $e\in E$ and its endpoints are not matched, $e\in E[I]$. Since this is the last run of the algorithm, a maximal matching $M^{\prime\prime}$ of $G[I]$ is computed on one machine. Since $M^{\prime\prime}$ is maximal, at least one of the $v_{i}$'s must be matched in it. All of the edges of $M^{\prime\prime}$ get added to $M$ in the last step. This yields a contradiction.

Next, consider the case that the algorithm fails. This occurs when the set of edges $E^{\prime}$ has size larger than the memory in some iteration of the algorithm. Note that $\mathbb{E}[|E^{\prime}|]=|\mathcal{S}|\cdot p=s/(5d)\leq s/10$ in a given iteration. By the Chernoff bound it follows that $|E^{\prime}|\geq s$ with probability smaller than $2^{-s}\leq 2^{-20d\cdot n}$ (since $s\geq 20d\cdot n$). By Lemma 5.3 the algorithm completes in at most $O(\log{n})$ rounds, thus the total failure probability is bounded by $O(\log{n}\cdot 2^{-20d\cdot n})$ using the union bound. ∎

We are now ready to show that Iterated-Sampling can be implemented in $O(\log n)$ MPC rounds.

Proof of Theorem 5.1.

We show that Iterated-Sampling can be implemented in the MPC model on machines of memory $\Theta(dn)$ in $O(\log{n})$ MPC rounds. Combining this with Lemma 5.4 on the correctness of Iterated-Sampling, we immediately obtain Theorem 5.1 for the case $s=\Theta(dn)$. Every iteration of Iterated-Sampling can be implemented in $O(1)$ MPC rounds. Suppose the edges are initially distributed over all machines. We can sample every edge with probability $p$ (on each machine) and then send the sampled edges $E^{\prime}$ to the first machine. With high probability we will have $|E^{\prime}|=O(n)$, so we can fit the sampled edges on the first machine. Therefore, the sampling can be done in one MPC round. Computing the subgraph of $G$ induced by $I$ can be done in 2 rounds: one round to send the list of unmatched vertices $I$ from the first machine to all the other machines, and another round to compute the subgraph $G[I]$ and send it to a single machine if it fits, or start the sampling again. ∎

6 A 3-round $O(d^{3})$-approximation using HEDCSs

In graphs, edge degree constrained subgraphs (EDCS) [16, 17] have been used as a local condition for identifying large matchings, leading to good dynamic and parallel algorithms [16, 17, 6]. These papers showed that an EDCS exists, that it contains a large matching, that it is robust to sampling and composition, and that it can be used as a coreset.

In this section, we present a generalization of EDCS for hypergraphs. We prove that an HEDCS exists, that it contains a large matching, that it is robust to sampling and composition, and that it can be used as a coreset. The proofs and algorithms, however, require significant developments beyond the graph case. We first present definitions of a fractional matching and HEDCS.

Definition 6.1 (Fractional matching).

In a hypergraph $G(V,E)$, a fractional matching is a mapping $y:E\to[0,1]$ such that for every edge $e$ we have $0\leq y_{e}\leq 1$ and for every vertex $v$: $\sum_{e\ni v}y_{e}\leq 1$. The value of such a fractional matching is equal to $\sum_{e\in E}y_{e}$. For $\epsilon>0$, a fractional matching is an $\epsilon$-restricted fractional matching if for every edge $e$: $y_{e}=1$ or $y_{e}\in[0,\epsilon]$.

Definition 6.2.

For any hypergraph $G(V,E)$ and integers $\beta\geq\beta^{-}\geq 0$, a hyperedge degree constrained subgraph HEDCS$(H,\beta,\beta^{-})$ is a subgraph $H:=(V,E_{H})$ of $G$ satisfying:

  • (P1): For any hyperedge $e\in E_{H}$: $\sum_{v\in e}d_{H}(v)\leq\beta$.

  • (P2): For any hyperedge $e\not\in E_{H}$: $\sum_{v\in e}d_{H}(v)\geq\beta^{-}$.

We show via a constructive proof that a hypergraph contains an HEDCS whenever the parameters of this HEDCS satisfy the inequality $\beta-\beta^{-}\geq d-1$.
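To make the two properties concrete, the following minimal Python sketch (the name is_hedcs is ours) checks whether a candidate subgraph satisfies Definition 6.2; degrees are computed in the subgraph $H$ itself:

from collections import Counter

def is_hedcs(H, edges, beta, beta_minus):
    """H: set of hyperedges (tuples), a subset of `edges`.
    Returns True iff H satisfies (P1) and (P2) of Definition 6.2."""
    deg = Counter(v for e in H for v in e)      # d_H(v)
    for e in edges:
        degree_sum = sum(deg[v] for v in e)
        if e in H and degree_sum > beta:        # violates (P1)
            return False
        if e not in H and degree_sum < beta_minus:  # violates (P2)
            return False
    return True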

Lemma 6.3.

Any hypergraph $G$ contains an HEDCS$(G,\beta,\beta^{-})$ for any parameters with $\beta-\beta^{-}\geq d-1$.

Proof.

Consider the following simple procedure for creating an HEDCS $H$ of a given hypergraph $G$: start by initializing $H$ to be equal to $G$. While $H$ is not an HEDCS$(G,\beta,\beta^{-})$, find an edge $e$ which violates one of the properties of HEDCS and fix it. Fixing the edge $e$ means removing it from $H$ if it was violating Property (P1) and adding it to $H$ if it was violating Property (P2). The output of the above procedure is clearly an HEDCS of the graph $G$. However, a priori it is not clear that this procedure ever terminates, as fixing one edge $e$ can result in many edges violating the HEDCS properties, potentially undoing the previous changes. In the following, we use a potential function argument to show that this procedure always terminates after a finite number of steps, hence implying that an HEDCS$(G,\beta,\beta^{-})$ always exists.

We define the following potential function Φ\Phi:

$$\Phi:=\Big(\frac{2}{d}\beta-\frac{d-1}{d}\Big)\cdot\sum_{v\in V}d_{H}(v)-\sum_{e\in H}\sum_{u\in e}d_{H}(u)$$

We argue that in any step of the procedure above, the value of $\Phi$ increases by at least 1. Since the maximum value of $\Phi$ is at most $O(\frac{2}{d}n\cdot\beta^{2})$, this immediately implies that the procedure terminates in $O(\frac{2}{d}n\cdot\beta^{2})$ iterations.

Define $\Phi_{1}=(\frac{2}{d}\beta-\frac{d-1}{d})\cdot\sum_{v\in V}d_{H}(v)$ and $\Phi_{2}=\sum_{e\in H}\sum_{u\in e}d_{H}(u)$. Let $e$ be the edge we choose to fix at this step, $H_{b}$ be the subgraph before fixing the edge $e$, and $H_{a}$ be the resulting subgraph. Suppose first that the edge $e$ was violating Property (P1) of HEDCS. As the only change is in the degrees of the vertices $v\in e$, $\Phi_{1}$ decreases by $(2\beta-(d-1))$. On the other hand, $\sum_{v\in e}d_{H_{b}}(v)\geq\beta+1$ originally (as $e$ was violating Property (P1) of HEDCS), and hence after removing $e$, $\Phi_{2}$ increases by $\beta+1$. Additionally, for each edge $e_{u}$ incident upon $u\in e$ in $H_{a}$, after removing the edge $e$, $\sum_{v\in e_{u}}d_{H_{a}}(v)$ decreases by one. As there are at least $\sum_{u\in e}d_{H_{a}}(u)=\sum_{u\in e}d_{H_{b}}(u)-d\geq\beta-(d-1)$ choices for $e_{u}$, this means that in total, $\Phi_{2}$ increases by at least $2\beta+1-(d-1)$. As a result, in this case $\Phi$ increases by at least 1 after fixing the edge $e$.

Now suppose that the edge $e$ was violating Property (P2) of HEDCS instead. In this case, the degrees of the vertices $u\in e$ all increase by one, hence $\Phi_{1}$ increases by $2\beta-(d-1)$. Additionally, note that since the edge $e$ was violating Property (P2), we have $\sum_{v\in e}d_{H_{b}}(v)\leq\beta^{-}-1$, so the addition of the edge $e$ decreases $\Phi_{2}$ by at most $\sum_{v\in e}d_{H_{a}}(v)=\sum_{v\in e}d_{H_{b}}(v)+d\leq\beta^{-}-1+d$. Moreover, for each edge $e_{u}$ incident upon $u\in e$, after adding the edge $e$, $\sum_{v\in e_{u}}d_{H_{a}}(v)$ increases by one, and since there are at most $\sum_{v\in e}d_{H_{b}}(v)\leq\beta^{-}-1$ choices for $e_{u}$, $\Phi_{2}$ decreases in total by at most $2\beta^{-}-2+d$. The total variation in $\Phi$ is therefore at least $2\beta-(d-1)-(2\beta^{-}-2+d)=3+2(\beta-\beta^{-}-d)$. So if $\beta-\beta^{-}\geq d-1$, we have that $\Phi$ increases by at least 1 after fixing the edge $e$. ∎
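A minimal Python sketch of this fixing procedure follows (the name build_hedcs is ours); by the potential argument above, the scan terminates whenever $\beta-\beta^{-}\geq d-1$:

from collections import Counter

def build_hedcs(edges, beta, beta_minus):
    """Local-fixing construction from the proof of Lemma 6.3.
    edges: list of hashable d-tuples; requires beta - beta_minus >= d - 1."""
    H = set(edges)                              # initialize H = G
    deg = Counter(v for e in H for v in e)      # d_H(v) for the current H
    changed = True
    while changed:
        changed = False
        for e in edges:
            degree_sum = sum(deg[v] for v in e)
            if e in H and degree_sum > beta:            # fix a (P1) violation
                H.remove(e)
                for v in e:
                    deg[v] -= 1
                changed = True
            elif e not in H and degree_sum < beta_minus:  # fix a (P2) violation
                H.add(e)
                for v in e:
                    deg[v] += 1
                changed = True
    return H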

The main result in this section shows that an HEDCS of a graph contains a large $\epsilon$-restricted fractional matching that approximates the maximum matching to within a factor less than $d$.

Theorem 6.4.

Let $G$ be a $d$-uniform hypergraph and $0\leq\epsilon<1$. Let $H:=$ HEDCS$(G,\beta,\beta(1-\lambda))$ with $\lambda=\frac{\epsilon}{6}$ and $\beta\geq\frac{8d^{2}}{d-1}\cdot\lambda^{-3}$. Then $H$ contains an $\epsilon$-restricted fractional matching $M^{H}_{f}$ with total value at least $\mu(G)(\frac{d}{d^{2}-d+1}-\frac{\epsilon}{d-1})$.

In order to prove Theorem 6.4, we will need the following two lemmas. The first lemma is an algebraic result that will help us bound the contribution of vertices to the $\epsilon$-restricted fractional matching. The second lemma identifies additional structure on the HEDCS that we will use to construct the fractional matching of the theorem. For proofs of both lemmas, see Appendix A.2.

Lemma 6.5.

Let $\phi(x)=\min\{1,\frac{(d-1)x}{d(\beta-x)}\}$. If $a_{1},\ldots, a_{d}\geq 0$ and $a_{1}+\ldots+a_{d}\geq\beta(1-\lambda)$ for some $\lambda\geq 0$, then $\sum_{i=1}^{d}\phi(a_{i})\geq 1-5\lambda$.

Lemma 6.6.

Given any HEDCS$(G,\beta,\beta(1-\lambda))$ $H$, we can find two disjoint sets of vertices $X$ and $Y$ that satisfy the following properties:

  1. $|X|+|Y|=d\cdot\mu(G)$.

  2. There is a perfect matching in $Y$ using edges in $H$.

  3. Letting $\sigma=\frac{|Y|}{d}+\sum\limits_{x\in X}\phi(d_{H}(x))$, we have that $\sigma\geq\mu(G)(1-5\lambda)$.

  4. All edges in $H$ with vertices in $X$ have at least one other vertex in $Y$, and have vertices only in $X$ and $Y$.

We are now ready to prove Theorem 6.4.

Proof of Theorem 6.4.

Suppose we have two sets $X$ and $Y$ satisfying the properties of Lemma 6.6. We construct an $\epsilon$-restricted fractional matching $M^{H}_{f}$ using the edges in $H$ such that

$$val(M^{H}_{f})\geq\Big(\frac{d}{d^{2}-d+1}-\frac{\epsilon}{d-1}\Big)\mu(G),$$

where $val(M^{H}_{f})$ is the value of the fractional matching $M^{H}_{f}$. Now, by Property 2 of Lemma 6.6, $Y$ contains a perfect matching $M^{H}_{Y}$ using edges in $H$. Let $Y^{-}$ be a subset of $Y$ obtained by randomly sampling exactly a $1/d$ fraction of the edges of $M^{H}_{Y}$ and adding their endpoints to $Y^{-}$. Let $Y^{*}=Y\setminus Y^{-}$ and observe that $|Y^{-}|=|Y|/d$ and $|Y^{*}|=\frac{d-1}{d}|Y|$.

Let $H^{*}$ be the subgraph of $H$ induced by $X\cup Y^{*}$ (each edge in $H^{*}$ has vertices only in $X$ and $Y^{*}$). We define a fractional matching $M^{H^{*}}_{f}$ on the edges of $H^{*}$ in which all edges have value at most $\epsilon$. We will then let our final fractional matching $M^{H}_{f}$ be the fractional matching $M^{H^{*}}_{f}$ joined with the perfect matching in $H$ of $Y^{-}$ (so $M^{H}_{f}$ assigns value $1$ to the edges in this perfect matching). $M^{H}_{f}$ is, by definition, an $\epsilon$-restricted fractional matching.

We now give the details of the construction of $M^{H^{*}}_{f}$. Let $V^{*}=X\cup Y^{*}$ be the vertices of $H^{*}$, and let $E^{*}$ be its edges. For any vertex $v\in V^{*}$, define $d^{*}_{H}(v)$ to be the degree of $v$ in $H^{*}$. Recall that by Property 4 of Lemma 6.6, if $x\in X$ then all the edges of $H$ incident to $x$ go to $Y$ (but some might go to $Y^{-}$). Thus, for $x\in X$, we have $E[d^{*}_{H}(x)]\geq\frac{d_{H}(x)(d-1)}{d}$.

We now define $M^{H^{*}}_{f}$ as follows. For every $x\in X$, we arbitrarily order the edges of $H$ incident to $x$, and then we add a value of $\min\big\{\frac{\epsilon}{|X\cap e|},\frac{1}{|X\cap e|}\frac{1}{\beta-d_{H}(x)}\big\}$ to these edges one by one, stopping when either $val(x)$ (the sum of values assigned to edges incident to $x$) reaches 1 or there are no more edges in $H$ incident to $x$, whichever comes first. In the case that $val(x)$ reaches $1$, the last edge might receive a value less than $\min\big\{\frac{\epsilon}{|X\cap e|},\frac{1}{|X\cap e|}\frac{1}{\beta-d_{H}(x)}\big\}$, where $e$ is the last edge to be considered.

We now verify that $M^{H^{*}}_{f}$ is a valid fractional matching, i.e. that all vertices have value at most 1. This is clearly true of vertices $x\in X$ by construction. For a vertex $y\in Y^{*}$, it suffices to show that each edge incident to $y$ receives a value of at most $1/d_{H}(y)\leq 1/d^{*}_{H}(y)$. To see this, first note that the only edges to which $M^{H^{*}}_{f}$ assigns non-zero values have at least one endpoint in $X$ and another in $Y^{*}$. Any such edge $e$ receives value at most $\min\big\{\epsilon,\sum_{x\in X\cap e}\frac{1}{|X\cap e|}\frac{1}{\beta-d_{H}(x)}\big\}$, but since $e$ is in $M^{H^{*}}_{f}$ and so in $H$, we have by Property (P1) of an HEDCS that $d_{H}(y)\leq\beta-d_{H}(x)$ for every $x\in X\cap e$, and so $\sum_{x\in X\cap e}\frac{1}{|X\cap e|}\frac{1}{\beta-d_{H}(x)}\leq\frac{1}{|X\cap e|}\cdot\frac{|X\cap e|}{d_{H}(y)}=\frac{1}{d_{H}(y)}$. Since $y$ is incident to at most $d^{*}_{H}(y)$ such edges, its total value is at most 1.

By construction, for any $x\in X$, the value $val(x)$ of $x$ in $M_{f}^{H^{*}}$ satisfies:

$$val(x)=\min\left\{1,\sum_{e\ni x}\min\left\{\frac{\epsilon}{|X\cap e|},\frac{1}{|X\cap e|}\frac{1}{\beta-d_{H}(x)}\right\}\right\}\geq\min\left\{1,d_{H}^{*}(x)\cdot\min\left\{\frac{\epsilon}{d-1},\frac{1}{d-1}\frac{1}{\beta-d_{H}(x)}\right\}\right\}.$$

Furthermore, we can bound the value of the fractional matching $M^{H^{*}}_{f}$ as $val(M^{H^{*}}_{f})\geq\sum_{x\in X}val(x)$. For convenience, we use $val^{\prime}(x)=\min\left\{1,d_{H}^{*}(x)\cdot\min\left\{\epsilon,\frac{1}{\beta-d_{H}(x)}\right\}\right\}$, so that

$$val(x)\geq\frac{val^{\prime}(x)}{d-1}\,,\qquad(1)$$
$$val(M^{H^{*}}_{f})\geq\frac{1}{d-1}\sum_{x\in X}val^{\prime}(x)\,.\qquad(2)$$

Next we present a lemma, proved in Appendix A.2, that bounds $val^{\prime}(x)$ for each vertex.

Lemma 6.7.

For any $x\in X$, $E[val^{\prime}(x)]\geq(1-\lambda)\phi(d_{H}(x))$.

This last lemma, combined with (2), allows us to lower bound the value of $M_{f}^{H^{*}}$:

$$val(M_{f}^{H^{*}})\geq\frac{1}{d-1}\sum_{x\in X}val^{\prime}(x)\geq\frac{1-\lambda}{d-1}\sum_{x\in X}\phi(d_{H}(x)).$$

Note that we have constructed $M_{f}^{H}$ by taking the fractional values in $M_{f}^{H^{*}}$ and adding the perfect matching on the edges in $Y^{-}$. The latter matching has size $\frac{|Y^{-}|}{d}=\frac{|Y|}{d^{2}}$, and the value of $M_{f}^{H}$ is bounded by:

$$\begin{aligned}
val(M^{H}_{f}) &\geq \frac{1}{d-1}(1-\lambda)\sum_{x\in X}\phi(d_{H}(x))+\frac{|Y|}{d^{2}}\\
&= \frac{1}{d-1}\Big((1-\lambda)\sum_{x\in X}\phi(d_{H}(x))+\frac{|Y|}{d}\Big)-\frac{|Y|}{d^{2}(d-1)}\\
&\geq \frac{1}{d-1}(1-\lambda)(1-5\lambda)\mu(G)-\frac{|Y|}{d^{2}(d-1)}\\
&\geq \frac{1}{d-1}(1-6\lambda)\mu(G)-\frac{|Y|}{d^{2}(d-1)}\,.
\end{aligned}$$

To complete the proof, recall that $Y$ contains a perfect matching in $H$ of $|Y|/d$ edges, so if $\frac{|Y|}{d}\geq\frac{d}{d(d-1)+1}\mu(G)$ then there already exists a matching in $H$ of size at least $\frac{d}{d(d-1)+1}\mu(G)$, and the theorem is true. We can thus assume that $|Y|/d<\big(\frac{d}{d(d-1)+1}\big)\mu(G)$, in which case the previous equation yields:

$$\begin{aligned}
val(M^{H}_{f}) &\geq \frac{1}{d-1}(1-6\lambda)\mu(G)-\frac{|Y|}{d^{2}(d-1)}\\
&\geq \Big(\frac{1-6\lambda}{d-1}\Big)\mu(G)-\frac{\mu(G)}{(d-1)(d(d-1)+1)}\\
&= \Big(\frac{d}{d(d-1)+1}-\frac{6\lambda}{d-1}\Big)\mu(G)\,.
\end{aligned}$$

In both cases we get that

$$val(M^{H}_{f})\geq\Big(\frac{d}{d^{2}-d+1}-\frac{6\lambda}{d-1}\Big)\mu(G).$$
Since $\lambda=\epsilon/6$, this is exactly the claimed bound $\mu(G)\big(\frac{d}{d^{2}-d+1}-\frac{\epsilon}{d-1}\big)$. ∎

6.1 Sampling and constructing an HEDCS in the MPC model

Our results in this section are general and applicable to every computation model. We prove structural properties of HEDCSs that will help us construct them in the MPC model. We show that the degree distributions of every HEDCS (for the same parameters $\beta$ and $\lambda$) are almost identical. In other words, the degree of any vertex $v$ is almost the same in every HEDCS of the same hypergraph $G$. We also show that HEDCSs are robust under edge sampling, i.e. that edge sampling from an HEDCS yields another HEDCS, and that the degree distributions of any two HEDCSs for two different edge sampled subgraphs of $G$ are almost the same, no matter how the two HEDCSs are selected. In the following lemma, we argue that any two HEDCSs of a graph $G$ (for the same parameters $\beta$, $\beta^{-}$) are "somehow identical" in that their degree distributions are close to each other. In the rest of this section, we fix the parameters $\beta$, $\beta^{-}$ and two subgraphs $A$ and $B$ that are both HEDCS$(G,\beta,\beta^{-})$.

Lemma 6.8.

(Degree Distribution Lemma). Fix a $d$-uniform hypergraph $G(V,E)$ and parameters $\beta$, $\beta^{-}=(1-\lambda)\cdot\beta$ (for $\lambda$ small enough). For any two subgraphs $A$ and $B$ that are HEDCS$(G,\beta,\beta^{-})$, and any vertex $v\in V$, $|d_{A}(v)-d_{B}(v)|=O(\sqrt{n})\lambda^{1/2}\beta$.

Proof.

Suppose that we have $d_{A}(v)=k\lambda\beta$ for some $k$ and that $d_{B}(v)=0$. We will show that if $k=\Omega(\frac{\sqrt{n}}{\lambda^{1/2}})$, then this leads to a contradiction. Let $e$ be one of the $k\lambda\beta$ edges that are incident to $v$ in $A$. Since $e\not\in B$, we have $\sum_{u\neq v}d_{B}(u)\geq(1-\lambda)\beta$. Of these $(1-\lambda)\beta$ edges, at most $(1-k\lambda)\beta$ can be in $A$ in order to respect (P1), so at least $(k-1)\lambda\beta$ edges are in $B\setminus A$; thus we have now covered $k\lambda\beta+(k-1)\lambda\beta$ edges in $A$ and $B$ combined. Let's keep focusing on the edge $e$, and especially on one of its $(k-1)\lambda\beta$ incident edges in $B\setminus A$. Let $e_{1}$ be such an edge. Since $e_{1}\in B\setminus A$, we have $\sum_{v^{\prime}\in e_{1}}d_{A}(v^{\prime})\geq(1-\lambda)\beta$. The edges incident to $e_{1}$ in $A$ that we have covered so far are at most $(1-k\lambda)\beta$, therefore we still need at least $(k-1)\lambda\beta$ new edges in $A$ to respect (P2). Of these $(k-1)\lambda\beta$ edges, at most $\lambda\beta$ can be in $B$ (because $e_{1}$ already has $(1-\lambda)\beta$ covered edges incident to it in $B$). Therefore at least $(k-2)\lambda\beta$ are in $A\setminus B$. Thus, we have so far covered at least $k\lambda\beta+(k-1)\lambda\beta+(k-2)\lambda\beta$ edges. One can see that we can keep doing this until we cover at least $\frac{k(k+1)}{2}\lambda\beta$ edges in $A$ and $B$ combined. The number of edges in each of $A$ and $B$ cannot exceed $n\cdot\beta$ (each vertex has degree $\leq\beta$), therefore we get a contradiction if $\frac{k(k+1)}{2}\lambda\beta>2n\beta$, which holds if $k>\frac{2\sqrt{n}}{\lambda^{1/2}}$. Therefore

$$k\leq\frac{2\sqrt{n}}{\lambda^{1/2}}\,,\quad\text{and}\quad d_{A}(v)=k\lambda\beta\leq 2\sqrt{n}\lambda^{1/2}\beta.$$

The next corollary shows that if the hypergraph is linear (every two hyperedges intersect in at most one vertex), then the degree distribution is closer. The proof is in Appendix A.3.

Corollary 6.9.

For $d$-uniform linear hypergraphs, the degree distribution is tighter, and $|d_{A}(v)-d_{B}(v)|=O(\log{n})\lambda\beta$.

Next we prove two lemmas regarding the structure of different HEDCSs across sampled subgraphs. The first lemma shows that edge sampling an HEDCS results in another HEDCS for the sampled subgraph. The second lemma shows that the degree distributions of any two HEDCSs for two different edge sampled subgraphs of $G$ are almost the same.

Lemma 6.10.

Let $H$ be an HEDCS$(G,\beta_{H},\beta^{-}_{H})$ for parameters $\beta_{H}:=(1-\frac{\lambda}{\alpha})\cdot\frac{\beta}{p}$, $\beta_{H}^{-}:=\beta_{H}-(d-1)$ and $\beta\geq 15d(\alpha d)^{2}\cdot\lambda^{-2}\cdot\log{n}$, such that $p<1-\frac{2}{\alpha}$. Suppose $G_{p}:=G^{E}_{p}(V,E_{p})$ is an edge sampled subgraph of $G$ and $H_{p}:=H\cap G_{p}$; then, with high probability:

  1. For any vertex $v\in V$: $|d_{H_{p}}(v)-p\cdot d_{H}(v)|\leq\frac{\lambda}{\alpha d}\beta$.

  2. $H_{p}$ is an HEDCS of $G_{p}$ with parameters $(\beta,(1-\lambda)\cdot\beta)$.

Proof.

Let $\alpha^{\prime}:=\alpha d$. For any vertex $v\in V$, $E[d_{H_{p}}(v)]=p\cdot d_{H}(v)$ and $d_{H}(v)\leq\beta_{H}$ by Property (P1) of the HEDCS $H$. Moreover, since each edge incident upon $v$ in $H$ is sampled into $H_{p}$ independently, by the Chernoff bound:

$$P\Big(|d_{H_{p}}(v)-p\cdot d_{H}(v)|\geq\frac{\lambda}{\alpha^{\prime}}\beta\Big)\leq 2\cdot\exp\Big(-\frac{\lambda^{2}\beta}{3\alpha^{\prime 2}}\Big)\leq\frac{2}{n^{5d}}\,.$$

In the following, we condition on the event that
$$|d_{H_{p}}(v)-p\cdot d_{H}(v)|\leq\frac{\lambda}{\alpha^{\prime}}\beta\,.$$

This event happens with probability at least $1-\frac{2}{n^{5d-1}}$ by the above equation and a union bound over the $|V|=n$ vertices. This finalizes the proof of the first part of the claim. We are now ready to prove that $H_{p}$ is indeed an HEDCS$(G_{p},\beta,(1-\lambda)\cdot\beta)$, conditioned on this event. Consider any edge $e\in H_{p}$. Since $H_{p}\subset H$, $e\in H$ as well. Hence, we have

$$\sum_{v\in e}d_{H_{p}}(v)\leq p\cdot\beta_{H}+\frac{d\lambda}{\alpha^{\prime}}\beta=\Big(1-\frac{\lambda}{\alpha}+\frac{d\lambda}{\alpha^{\prime}}\Big)\beta=\beta\,,$$

because $\frac{\alpha}{\alpha^{\prime}}=\frac{1}{d}$, where the inequality is by Property (P1) of the HEDCS $H$ and the equality is by the choice of $\beta_{H}$. As a result, $H_{p}$ satisfies Property (P1) of HEDCS for parameter $\beta$. Now consider an edge $e\in G_{p}\setminus H_{p}$. Since $H_{p}=G_{p}\cap H$, $e\not\in H$ as well. Hence,

$$\sum_{v\in e}d_{H_{p}}(v)\geq p\cdot\beta_{H}^{-}-\frac{d\lambda}{\alpha^{\prime}}\beta=\Big(1-\frac{\lambda}{\alpha}-\frac{d\lambda}{\alpha^{\prime}}\Big)\beta-p\cdot(d-1)=\Big(1-\frac{2\lambda}{\alpha}\Big)\beta-p\cdot(d-1)>(1-\lambda)\cdot\beta\,.$$

Lemma 6.11.

(HEDCS in Edge Sampled Subgraph). Fix any hypergraph $G(V,E)$ and $p\in(0,1)$. Let $G_{1}$ and $G_{2}$ be two edge sampled subgraphs of $G$ with probability $p$ (chosen not necessarily independently). Let $H_{1}$ and $H_{2}$ be arbitrary HEDCSs of $G_{1}$ and $G_{2}$ with parameters $(\beta,(1-\lambda)\cdot\beta)$. Suppose $\beta\geq 15d(\alpha d)^{2}\cdot\lambda^{-2}\cdot\log{n}$; then, with probability $1-\frac{4}{n^{5d-1}}$, simultaneously for all $v\in V$: $|d_{H_{1}}(v)-d_{H_{2}}(v)|=O\big(n^{1/2}\big)\lambda^{1/2}\beta$.

Proof.

Let $H$ be an HEDCS$(G,\beta_{H},\beta_{H}^{-})$ for the parameters $\beta_{H}$ and $\beta_{H}^{-}$ as defined in the previous lemma. The existence of $H$ follows since $\beta_{H}-(d-1)\geq\beta_{H}^{-}$. Define $\hat{H}_{1}:=H\cap G_{1}$ and $\hat{H}_{2}:=H\cap G_{2}$. By Lemma 6.10, $\hat{H}_{1}$ (resp. $\hat{H}_{2}$) is an HEDCS of $G_{1}$ (resp. $G_{2}$) with parameters $(\beta,(1-\lambda)\beta)$ with probability $1-\frac{4}{n^{5d-1}}$. In the following, we condition on this event. By Lemma 6.8 (Degree Distribution Lemma), since both $H_{1}$ (resp. $H_{2}$) and $\hat{H}_{1}$ (resp. $\hat{H}_{2}$) are HEDCSs for $G_{1}$ (resp. $G_{2}$), the degrees of vertices in both of them should be "close" to each other. Moreover, since by Lemma 6.10 the degree of each vertex in $\hat{H}_{1}$ and $\hat{H}_{2}$ is close to $p$ times its degree in $H$, we can argue that the vertex degrees in $H_{1}$ and $H_{2}$ are close. Formally, for any $v\in V$, we have

|dH1(v)dH2(v)||d_{H_{1}}(v)-d_{H_{2}}(v)| \leq |dH1(v)dH^1(v)|+|dH^1(v)dH^2(v)|+|dH^2(v)dH2(v)||d_{H_{1}}(v)-d_{\hat{H}_{1}}(v)|+|d_{\hat{H}_{1}}(v)-d_{\hat{H}_{2}}(v)|+|d_{\hat{H}_{2}}(v)-d_{H_{2}}(v)|
\leq O(n1/2)λ1/2β+|dH^1(v)pdH(v)|+|dH^2(v)pdH(v)|O\big{(}n^{1/2}\big{)}\lambda^{1/2}\beta+|d_{\hat{H}_{1}}(v)-pd_{H}(v)|+|d_{\hat{H}_{2}}(v)-pd_{H}(v)|
\leq O(n1/2)λ1/2β+O(1)λβ.O\big{(}n^{1/2}\big{)}\lambda^{1/2}\beta+O(1)\cdot\lambda\cdot\beta\ .

Corollary 6.12.

If GG is linear, then |dH1(v)dH2(v)|=O(logn)λβ|d_{H_{1}}(v)-d_{H_{2}}(v)|=O\big{(}\log{n}\big{)}\lambda\beta.

We are now ready to present a parallel algorithm that uses the HEDCS subgraph. We first compute an HEDCS in parallel via edge sampling. Let G(1),,G(k)G^{(1)},\ldots,G^{(k)} be a random kk-partition of a graph GG. We show that if we compute an arbitrary HEDCS of each graph G(i)G^{(i)} (with no coordination across different graphs) and combine them together, we obtain an HEDCS for the original graph GG. We then store this HEDCS on one machine and compute a maximal matching on it. We present our algorithm for the full range of memory s=nΩ(1)s=n^{\Omega(1)}. Lemma 6.13 and Corollary 6.14 serve as the proof of Theorem 6.15.

1Define k:=mslognk:=\frac{m}{s\log{n}},  λ:=12nlogn\lambda:=\frac{1}{2n\log{n}} and β:=500d3n2log3n\beta:=500\cdot d^{3}\cdot n^{2}\cdot\log^{3}{n}.
2 G(1),,G(k)G^{(1)},\ldots,G^{(k)}:= random kk-partition of GG.
3 for i=1i=1 to kk, in parallel do
4       Compute C(i)=HEDCS(G(i),β,(1λ)β)C^{(i)}=HEDCS(G^{(i)},\beta,(1-\lambda)\cdot\beta) on machine ii.
5      
6 end for
7Define the multi-graph C(V,EC)C(V,E_{C}) with EC:=i=1kC(i)E_{C}:=\cup_{i=1}^{k}C^{(i)}.
8 Compute and output a maximal matching on CC.
9
Algorithm 3 HEDCS-Matching(G,s)G,s): a parallel algorithm to compute an O(d3)O(d^{3})-approximate matching on a dd-uniform hypergraph GG with mm edges on machines of memory O(s)O(s)
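For concreteness, here is a minimal sequential Python sketch of HEDCS-Matching; it is our own illustration rather than the paper's implementation, and all function names and data-structure choices are ours. The local HEDCS computation uses the generic fix-up procedure from the proof of Lemma 6.3: repeatedly drop edges that violate Property (P1) and add edges that violate Property (P2).

```python
import random
from collections import defaultdict

def build_hedcs(edges, beta, beta_minus):
    """Local fix-up procedure. Returns a subset H of `edges` with
    (P1) the summed H-degree over each edge of H at most beta, and
    (P2) the summed H-degree over each edge outside H at least beta_minus.
    Termination needs some slack between the parameters (the paper
    uses beta_minus <= beta - (d - 1))."""
    H, deg = set(), defaultdict(int)
    changed = True
    while changed:
        changed = False
        for e in list(H):                      # fix (P1) violations
            if sum(deg[v] for v in e) > beta:
                H.remove(e)
                for v in e:
                    deg[v] -= 1
                changed = True
        for e in edges:                        # fix (P2) violations
            if e not in H and sum(deg[v] for v in e) < beta_minus:
                H.add(e)
                for v in e:
                    deg[v] += 1
                changed = True
    return H

def greedy_maximal_matching(edges):
    """Greedily pick pairwise-disjoint hyperedges."""
    matched, matching = set(), []
    for e in edges:
        if all(v not in matched for v in e):
            matching.append(e)
            matched.update(e)
    return matching

def hedcs_matching(edges, k, beta, beta_minus):
    """HEDCS-Matching, simulated sequentially: random k-partition of the
    edge set, one local HEDCS per 'machine', union, maximal matching."""
    parts = [[] for _ in range(k)]
    for e in edges:                            # random k-partition of G
        parts[random.randrange(k)].append(e)
    C = set().union(*(build_hedcs(p, beta, beta_minus) for p in parts))
    return greedy_maximal_matching(sorted(C))
```

Here a hyperedge is a tuple of vertex identifiers, e.g. hedcs_matching([(0, 1, 2), (2, 3, 4), (4, 5, 0)], k=2, beta=6, beta_minus=4).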
Lemma 6.13.

Suppose kmk\leq\sqrt{m}. Then with high probability

  1.

    The subgraph CC is an HEDCS(G,βC,βC)(G,\beta_{C},\beta_{C}^{-}) for parameters: λC=O(n1/2)λ1/2 , βC=(1+dλC)kβ and βC=(1λdλC)kβ\lambda_{C}=O\big{(}n^{1/2}\big{)}\lambda^{1/2}\mbox{ , }\beta_{C}=(1+d\cdot\lambda_{C})\cdot k\cdot\beta\mbox{ and }\beta_{C}^{-}=(1-\lambda-d\cdot\lambda_{C})\cdot k\cdot\beta.

  2.

    The total number of edges in each subgraph G(i)G^{(i)} of GG is O~(s)\tilde{O}(s).

  3.

    If s=O~(nnm)s=\tilde{O}(n\sqrt{nm}), then the graph CC can fit in the memory of one machine.

Proof of Lemma 6.13.

  1.

    Recall that each graph G(i)G^{(i)} is an edge sampled subgraph of GG with sampling probability p=1kp=\frac{1}{k}. By Lemma 6.11 for graphs G(i)G^{(i)} and G(j)G^{(j)} (for ij[k])i\neq j\in[k]) and their HEDCSs C(i)C^{(i)} and C(j)C^{(j)}, with probability 14n5d11-\frac{4}{n^{5d-1}} , for all vertices vVv\in V :

    |dC(i)(v)dC(j)(v)|O(n1/2)λ1/2β.|d_{C^{(i)}}(v)-d_{C^{(j)}}(v)|\leq O\big{(}n^{1/2}\big{)}\lambda^{1/2}\beta\ .

    By taking a union bound on all (k2)nd\binom{k}{2}\leq n^{d} pairs of subgraphs G(i)G^{(i)} and G(j)G^{(j)} for ij[k]i\neq j\in[k], the above property holds for all i,j[k]i,j\in[k], with probability at least 14n4d11-\frac{4}{n^{4d-1}}. In the following, we condition on this event.

    We now prove that CC is indeed an HEDCS(G,βC,βC)HEDCS(G,\beta_{C},\beta_{C}^{-}). First, consider an edge eCe\in C and let j[k]j\in[k] be such that eC(j)e\in C^{(j)} as well. We have

    vedC(v)\displaystyle\sum\limits_{v\in e}d_{C}(v) =\displaystyle= vei=1kdC(i)(v)\displaystyle\sum\limits_{v\in e}\sum\limits_{i=1}^{k}d_{C^{(i)}}(v)
    \displaystyle\leq kvedC(j)(v)+dkλCβ\displaystyle k\cdot\sum\limits_{v\in e}d_{C^{(j)}}(v)+d\cdot k\cdot\lambda_{C}\cdot\beta
    \displaystyle\leq kβ+dkλCβ\displaystyle k\cdot\beta+d\cdot k\cdot\lambda_{C}\cdot\beta
    =\displaystyle= βC.\displaystyle\beta_{C}\ .

    Hence, CC satisfies Property (P1) of HEDCS for parameter βC\beta_{C}. Now consider an edge eGCe\in G\setminus C and let j[k]j\in[k] be such that eG(j)C(j)e\in G^{(j)}\setminus C^{(j)} (recall that each edge in GG is sent to exactly one graph G(j)G^{(j)} in the random kk-partition). We have,

    vedC(v)\displaystyle\sum\limits_{v\in e}d_{C}(v) =\displaystyle= vei=1kdC(i)(v)\displaystyle\sum\limits_{v\in e}\sum\limits_{i=1}^{k}d_{C^{(i)}}(v)
    \displaystyle\geq kvedC(j)(v)dkλCβ\displaystyle k\cdot\sum\limits_{v\in e}d_{C^{(j)}}(v)-d\cdot k\cdot\lambda_{C}\cdot\beta
    \displaystyle\geq k(1λ)βdkλCβ=βC.\displaystyle k\cdot(1-\lambda)\cdot\beta-d\cdot k\cdot\lambda_{C}\cdot\beta=\beta_{C}^{-}\ .

    Hence, CC also satisfies Property (P2) of HEDCS for parameter βC\beta_{C}^{-}.
  2.

    Let E(i)E^{(i)} be the edges of G(i)G^{(i)}. By the independent sampling of edges in an edge sampled subgraph, we have that E[|E(i)|]=mk=O~(s)E\big{[}|E^{(i)}|\big{]}=\tfrac{m}{k}=\tilde{O}(s). By the Chernoff bound, with probability 11kn201-\tfrac{1}{k\cdot n^{20}}, the size of E(i)E^{(i)} is O~(s)\tilde{O}(s). Taking a union bound over all kk subgraphs, with probability 11/n201-1/n^{20}, each graph G(i)G^{(i)} has O~(s)\tilde{O}(s) edges.

  3.

    The number of edges in CC is bounded by nβC=O(nkβ)=O~(n3ms)=O~(s)n\cdot\beta_{C}=O(n\cdot k\cdot\beta)=\tilde{O}(\frac{n^{3}m}{s})=\tilde{O}(s). ∎

Corollary 6.14.

If GG is linear, then by choosing λ:=12log2n\lambda:=\frac{1}{2\log^{2}{n}} and β:=500d3log4n\beta:=500\cdot d^{3}\cdot\log^{4}{n}\ in HEDCS-Matching we have:

  1.

    With high probability, the subgraph CC is an HEDCS(G,βC,βC)(G,\beta_{C},\beta_{C}^{-}) for parameters: λC=O(logn)λ , βC=(1+dλC)kβ and βC=(1λdλC)kβ\lambda_{C}=O(\log{n})\lambda\mbox{ , }\beta_{C}=(1+d\cdot\lambda_{C})\cdot k\cdot\beta\mbox{ and }\beta_{C}^{-}=(1-\lambda-d\cdot\lambda_{C})\cdot k\cdot\beta.

  2.

    If s=O~(nm)s=\tilde{O}(\sqrt{nm}), then CC can fit in the memory of one machine.

Proof of Corollary 6.14.

  1.

    Similarly to Lemma 6.13, and by using Corollary 6.12, we know that for graphs G(i)G^{(i)} and G(j)G^{(j)} (for ij[k]i\neq j\in[k]) and their HEDCSs C(i)C^{(i)} and C(j)C^{(j)}, with high probability, for all vertices vVv\in V:

    |dC(i)(v)dC(j)(v)|O(logn)λβ.|d_{C^{(i)}}(v)-d_{C^{(j)}}(v)|\leq O\big{(}\log{n}\big{)}\lambda\beta.

    By taking a union bound on all (k2)nd\binom{k}{2}\leq n^{d} pairs of subgraphs G(i)G^{(i)} and G(j)G^{(j)} for ij[k]i\neq j\in[k], the above property holds for all i,j[k]i,j\in[k], with probability at least 14n4d11-\frac{4}{n^{4d-1}}. In the following, we condition on this event. Showing that CC is indeed an HEDCS(G,βC,βC)HEDCS(G,\beta_{C},\beta_{C}^{-}) follows from the same analysis as in the proof of Lemma 6.13.

  2.

    The number of edges in CC is bounded by nβC=O(nkβ)=O~(nk)=O~(nms)=O~(s)n\cdot\beta_{C}=O(n\cdot k\cdot\beta)=\tilde{O}(n\cdot k)=\tilde{O}(\frac{nm}{s})=\tilde{O}(s). ∎

The previous lemmas allow us to formulate the following theorem.

Theorem 6.15.

HEDCS-Matching constructs an HEDCS of GG in 3 MPCMPC rounds on machines of memory s=O~(nnm)s=\tilde{O}(n\sqrt{nm}) in general, and s=O~(nm)s=\tilde{O}(\sqrt{nm}) for linear hypergraphs.

Corollary 6.16.

HEDCS-Matching achieves a d(d1+1/d)2d(d-1+1/d)^{2}-approximation to the dd-Uniform Hypergraph Matching in 3 rounds with high probability.

Proof of Corollary 6.16.

We show that with high probability, CC satisfies the assumptions of Theorem 6.4. From Lemma 6.13, we get that with high probability, the subgraph CC is an HEDCS(G,βC,βC)HEDCS(G,\beta_{C},\beta_{C}^{-}) for parameters: λC=O(n1/2)λ1/2 , βC=(1+dλC)kβ and βC=(1λdλC)kβ.\lambda_{C}=O\big{(}n^{1/2}\big{)}\lambda^{1/2}\mbox{ , }\beta_{C}=(1+d\cdot\lambda_{C})\cdot k\cdot\beta\mbox{ and }\beta_{C}^{-}=(1-\lambda-d\cdot\lambda_{C})\cdot k\cdot\beta. One can check that βC8d2d1λC3\beta_{C}\geq\frac{8d^{2}}{d-1}\cdot\lambda_{C}^{-3}. Therefore, by Theorem 6.4, CC contains a (d1+1d)(d-1+\frac{1}{d})-approximate ϵ\epsilon-restricted matching. Since the integrality gap of the dd-UHM is at most d1+1dd-1+\frac{1}{d} (see [19] for details), CC contains a (d1+1d)2(d-1+\frac{1}{d})^{2}-approximate matching. Taking a maximal matching in CC multiplies the approximation factor by at most dd. Therefore, any maximal matching in CC is a d(d1+1d)2d(d-1+\frac{1}{d})^{2}-approximation. ∎

7 Computational Experiments

To understand the relative performance of the proposed algorithms, we conduct a wide variety of experiments on both random and real-life data [51, 57, 43]. We implement the three algorithms Greedy, Iterated-Sampling and HEDCS-Matching in Python, relying on the module pygraph and its class pygraph.hypergraph to construct and perform operations on hypergraphs. We simulate the MPC model by computing a kk-partitioning and splitting it into kk different inputs; parallel computations on different parts of the kk-partitioning are handled through the multiprocessing library in Python. We compute the optimal matching through an Integer Program for small instances of random uniform hypergraphs (d10d\leq 10) as well as geometric hypergraphs. The experiments were conducted on a 2.6 GHz Intel Core i7 processor and 16 GB RAM workstation. The datasets differ in their number of vertices, hyperedges, vertex degrees and hyperedge cardinalities. In the following tables, nn and mm denote the number of vertices and the number of hyperedges respectively, and dd is the size of the hyperedges. For Table 3, the graphs might have different numbers of edges, and m¯\bar{m} denotes the average number of edges. kk is the number of machines used to distribute the hypergraph initially. We limit the number of edges that a machine can store to 2mk\frac{2m}{k}. In the columns Gr, IS and HEDCS, we report the average ratio between the size of the matching computed by Greedy, Iterated-Sampling and HEDCS-Matching respectively, and the size of a benchmark; this ratio is the percentage ALGBENCHMARK\frac{ALG}{BENCHMARK}, where ALGALG is the size of the matching output by the algorithm and BENCHMARKBENCHMARK is the size of the benchmark. The benchmarks are the size of the optimal solution when it is possible to compute it, or the size of a maximal matching computed via a sequential algorithm. #I\#I denotes the number of instances of random graphs that we generated for fixed nn, mm and dd. β\beta and β\beta^{-} are the parameters used to construct the HEDCS subgraphs in HEDCS-Matching; these subgraphs are constructed using the procedure in the proof of Lemma 6.3.
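The simulation pattern itself is simple; the sketch below (with illustrative names, simplified from our setup) shows how one round is simulated: the edge set is split into k random parts and a separate worker process runs the round's local computation on each part.

```python
import random
from multiprocessing import Pool

def random_k_partition(edges, k):
    """Assign every hyperedge to one of k 'machines' uniformly at random."""
    parts = [[] for _ in range(k)]
    for e in edges:
        parts[random.randrange(k)].append(e)
    return parts

def run_round(local_algorithm, edges, k):
    """Simulate one MPC round: distribute the edges, run the local
    computation on every part in parallel, collect the k outputs.
    `local_algorithm` must be a top-level (picklable) function."""
    parts = random_k_partition(edges, k)
    with Pool(processes=k) as pool:
        return pool.map(local_algorithm, parts)
```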

7.1 Experiments with random hypergraphs

We perform experiments on two classes of dd-uniform random hypergraphs. The first contains random uniform hypergraphs, and the second contains random geometric hypergraphs.

Random Uniform Hypergraphs. For fixed nn, mm and dd, each hyperedge is sampled independently and uniformly at random from the set of vertices. In Table 2, we use the size of a perfect matching nd\frac{n}{d} as a benchmark, because a perfect matching in random graphs exists with probability 1o(1)1-o(1) under some conditions on mm, nn and dd. If d(n,m)=mdnd(n,m)=\frac{m\cdot d}{n} is the expected degree of a random uniform hypergraph, Frieze and Janson [25] showed that d(n,m)n1/3\frac{d(n,m)}{n^{1/3}}\rightarrow\infty is a sufficient condition for the hypergraph to have a perfect matching with high probability. Kim [42] further weakened this condition to d(n,m)n1/(5+2/(d1))\frac{d(n,m)}{n^{1/(5+2/(d-1))}}\rightarrow\infty. We empirically verify, by solving the IP formulation, that for d=3,5d=3,5 and for small instances of d=10d=10, our random graphs contain a perfect matching. In Table 2, the benchmark is the size of a perfect matching, while in Table 5, it is the size of a greedy maximal matching. In terms of solution quality (Tables 2 and 5), HEDCS-Matching performs consistently better than Greedy, and Iterated-Sampling performs significantly better than the other two. None of the three algorithms is capable of finding perfect matchings for a significant number of the runs. When compared to the size of a maximal matching, Iterated-Sampling still performs best, followed by HEDCS-Matching. However, the ratios are smaller when compared to a maximal matching, which is explained by the deterioration of the quality of the greedy maximal matching as nn and dd grow. Dufossé et al. [23] confirm that the approximation quality of a greedy maximal matching on random graphs that contain a perfect matching degrades as a function of nn and dd. The performance of the algorithms decreases as dd grows, which is theoretically expected since their approximation ratios are proportional to dd. The number of rounds for Iterated-Sampling grows slowly with nn, which is consistent with the O(logn)O(\log{n}) bound. Recall that the number of rounds for the other two algorithms is constant and equal to 3.
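Generating these instances is straightforward; a sketch of the generator (ours; vertices are 0, ..., n-1 and every hyperedge is an independent, uniformly random d-subset):

```python
import random

def random_uniform_hypergraph(n, m, d, seed=None):
    """m hyperedges, each a uniformly random d-subset of {0, ..., n-1}."""
    rng = random.Random(seed)
    return [tuple(sorted(rng.sample(range(n), d))) for _ in range(m)]
```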

Geometric Random Hypergraphs. The second class we experiment on is random geometric hypergraphs. The vertices of a random geometric hypergraph (RGH) are sampled from the uniform distribution on the space [0,1)2[0,1)^{2}. A set of dd different vertices v1,,vdVv_{1},\ldots,v_{d}\in V forms a hyperedge if, and only if, the distance between any viv_{i} and vjv_{j} is less than a previously specified parameter r(0,1)r\in(0,1). The parameters rr and nn fully characterize an RGH. We fix d=3d=3 and generate different geometric hypergraphs by varying nn and rr. We compare the output of the algorithms to the optimal solution, which we compute through the IP formulation. Table 3 shows that the performance of the three algorithms is similar, with Iterated-Sampling outperforming Greedy and HEDCS-Matching as the size of the graphs grows. We also observe that random geometric hypergraphs do not contain perfect matchings, mainly because of the existence of outlier vertices that do not belong to any edge. The number of rounds of Iterated-Sampling still grows with nn, confirming the theoretical bound and the results on random uniform hypergraphs.
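A brute-force generator matching this definition is sketched below (ours); it enumerates all d-subsets, which is adequate for the instance sizes of Table 3:

```python
import itertools
import math
import random

def random_geometric_hypergraph(n, r, d=3, seed=None):
    """Vertices are uniform points in [0,1)^2; d vertices form a hyperedge
    iff all pairwise distances are below r. Runs in O(n^d) time."""
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    return [c for c in itertools.combinations(range(n), d)
            if all(math.dist(pts[i], pts[j]) < r
                   for i, j in itertools.combinations(c, 2))]
```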

nn mm dd kk #I Gr IS HEDCS β\beta β\beta^{-} Rounds IS
15 200 3 5 500 77.6% 86.6% 82.8% 5 3 3.8
30 400 5 78.9% 88.1% 80.3% 7 4 4.56
100 3200 10 81.7% 93.4% 83.1% 5 2 5.08
300 4000 10 78.8% 88.7% 80.3% 8 6 7.05
50 800 5 6 500 66.0% 76.2% 67.0% 16 11 4.89
100 2,800 10 68.0% 79.6% 69.8% 16 11 4.74
300 4,000 10 62.2% 75.1% 65.5% 10 5 6.62
500 8,000 16 63.3% 76.4% 65.6% 10 5 7.62
500 15,000 10 16 500 44.9% 58.3% 53.9% 20 10 6.69
1,000 50,000 20 47.3% 61.3% 50.5% 20 10 8.25
2,500 100,000 20 45.6% 59.9% 48.2% 20 10 8.11
5,000 200,000 20 45.0% 59.7% 47.8% 20 10 7.89
1,000 50,000 25 25 100 27.5% 34.9% 30.8% 75 50 8.10
2,500 100,000 25 26.9% 34.0% 27.0% 75 50 8.26
5,000 250,000 30 26.7% 33.8% 28.8% 75 50 8.23
10,000 500,000 30 26.6% 34.1% 28.2% 75 50 8.46
5,000 250,000 50 30 100 22.4% 30.9% 27.9% 100 50 10.22
10,000 500,000 30 22.2% 31.0% 26.5% 100 50 10.15
15,000 750,000 30 20.9% 30.8% 26.4% 100 50 10.26
25,000 1,000,000 30 20.9% 30.8% 26.4% 100 50 10.29

Table 2: Comparison on random instances with perfect matching benchmark, of size nd\frac{n}{d}.
nn rr m¯\bar{m} dd kk #I Gr IS HEDCS β\beta β\beta^{-} Rounds IS
100 0.2 930.1 ±\pm 323 3 5 100 88.3% 89.0% 89.6% 3 5 4.1
100 0.25 1329.5 ±\pm 445 10 88.0% 89.0% 89.5% 3 5 5.2
250 0.15 13222±\pm 3541 5 85.0% 88.6% 85.5% 4 7 8.1
300 0.15 14838 ±\pm 4813 10 85.5% 88.0% 85.2% 4 7 11.1
300 0.2 27281 ±\pm 8234 10 85.0% 89.0% 86.3% 4 7 13.5

Table 3: Comparison on random geometric hypergraphs with optimal matching benchmark.

7.2 Experiments with real data

PubMed and Cora Datasets. We employ two citation network datasets, namely the Cora and PubMed datasets [51, 57]. These datasets are represented by a graph, with vertices being publications and edges being citation links from one article to another. We construct the hypergraphs in two steps: 1) each article is a vertex; 2) each article is taken as a centroid and forms a hyperedge connecting the articles which have citation links to it (either citing it or being cited by it). The Cora hypergraph has an average edge size of 3.0±1.13.0\pm 1.1, while the average in the PubMed hypergraph is 4.3±5.74.3\pm 5.7. The number of edges in both is significantly smaller than the number of vertices, therefore we allow each machine to store only mk+14mk\frac{m}{k}+\frac{1}{4}\frac{m}{k} edges. We randomly split the edges on each machine, and because the subgraphs are small, we are able to compute the optimal matchings on each machine, as well as on the whole hypergraphs. We perform ten runs of each algorithm with different random kk-partitionings and take the maximum cardinality obtained. Table 4 shows that none of the algorithms is able to retrieve the optimal matching. This behaviour can be explained by the loss of information that the use of parallel machines implies. We see that Iterated-Sampling, as in previous experiments, outperforms the other algorithms due to its highly sequential design. HEDCS-Matching performs notably worse than the other algorithms, mainly because it fails to construct sufficiently large HEDCSs.
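The two-step construction amounts to taking, for every article, its closed citation neighborhood as one hyperedge. A minimal sketch, assuming the citation network is given as (citing, cited) pairs (the function name is ours):

```python
from collections import defaultdict

def cocitation_hypergraph(citations):
    """Build one hyperedge per article: the article itself (the centroid)
    together with every article it cites or is cited by."""
    nbrs = defaultdict(set)
    for citing, cited in citations:
        nbrs[citing].add(cited)
        nbrs[cited].add(citing)
    return [frozenset({v} | nbrs[v]) for v in nbrs]
```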

Social Network Communities. We include two larger real-world datasets, Orkut groups and LiveJournal, from the Koblenz Network Collection [43]. We use two hypergraphs that were constructed from these datasets by Shun [52]. Vertices represent individual users, and hyperedges represent communities in the network. Because membership in these communities does not require the same commitment as collaborating on academic research, these hypergraphs have different characteristics from the co-citation hypergraphs in terms of size, vertex degree and hyperedge cardinality. We use the size of a maximal matching as a benchmark. Table 4 shows that Iterated-Sampling still provides the best approximation. HEDCS-Matching performs worse than Greedy on LiveJournal, mainly because the ratio mn\frac{m}{n} is not large enough to construct an HEDCS with a large matching.

Name nn mm kk Gr IS HEDCS Rounds IS
Cora 2,708 1,579 2 75.0% 83.2% 63.9% 6
PubMed 19,717 7,963 3 72.0% 86.6% 62.4% 9
Orkut 2.32 ×106\times 10^{6} 1.53 ×107\times 10^{7} 10 55.6% 66.1% 58.1% 11
LiveJournal 3.20 ×106\times 10^{6} 7.49 ×106\times 10^{6} 3 44.0% 55.2% 43.3% 10
Table 4: Comparison on co-citation and social network hypergraphs.

7.3 Experimental conclusions

In the majority of our experiments, Iterated-Sampling provides the best approximation to the dd-UHM problem, which is consistent with its theoretical superiority. On random graphs, HEDCS-Matching performs consistently better than Greedy, even though Greedy has a better theoretical approximation ratio. We suspect this is because the O(d3)O(d^{3})-approximation bound on HEDCS-Matching is loose. We conjecture that rounding an ϵ\epsilon-restricted matching can be done efficiently, which would improve the approximation ratio. The performance of the three algorithms decreases as dd grows. The results on the number of rounds of Iterated-Sampling are also consistent with the theoretical bound. However, due to its sequential design, which centralizes the computation on one single machine while using the other machines simply to coordinate, Iterated-Sampling not only takes more rounds than the other two algorithms, but is also slower when we account for the absolute runtime as well as the runtime per round. We compared the runtimes of our three algorithms on a set of random uniform hypergraphs. Figures 1 and 2 show that the absolute and per-round runtimes of Iterated-Sampling grow considerably faster with the size of the hypergraphs. We can also see that HEDCS-Matching is slower than Greedy, since the former performs heavier computation on each machine. This confirms the trade-off between the extent to which the algorithms use the power of parallelism and the quality of the approximations.

8 Conclusion

We have presented the first algorithms for the dd-UHM problem in the MPC model. Our theoretical and experimental results highlight the trade-off between the approximation ratio, the necessary memory per machine and the number of rounds it takes to run the algorithm. We have also introduced the notion of HEDCS subgraphs, and have shown that an HEDCS contains a good approximation of the maximum matching and can be constructed in a few rounds in the MPC model. We believe better approximation algorithms should be possible, especially if we can give better rounding algorithms for an ϵ\epsilon-restricted fractional hypergraph matching. For future work, it would be interesting to explore whether we can achieve a better-than-dd approximation in the MPC model in a polylogarithmic number of rounds. Exploring algorithms relying on vertex sampling instead of edge sampling might be a good candidate. In addition, our analysis in this paper is specific to unweighted hypergraphs, and we would like to extend it to weighted hypergraphs.

Figure 1: Runtime of the three algorithms when d=3d=3 and m=20dnm=20\cdot d\cdot n
Figure 2: Total runtime per round (maximum over all machines in every round) of the three algorithms when d=3d=3 and m=20dnm=20\cdot d\cdot n

References

  • [1] S. Abbar, S. Amer-Yahia, P. Indyk, S. Mahabadi, and K. R. Varadarajan. Diverse near neighbor problem. In Proceedings of the twenty-ninth annual symposium on Computational geometry, pages 207–214. ACM, 2013.
  • [2] K. J. Ahn and S. Guha. Access to data and number of iterations: Dual primal algorithms for maximum matching under resource constraints. ACM Transactions on Parallel Computing (TOPC), 4(4):17, 2018.
  • [3] N. Alon, L. Babai, and A. Itai. A fast and simple randomized parallel algorithm for the maximal independent set problem. Journal of algorithms, 7(4):567–583, 1986.
  • [4] A. Andoni, A. Nikolov, K. Onak, and G. Yaroslavtsev. Parallel algorithms for geometric graph problems. In Proceedings of the forty-sixth annual ACM symposium on Theory of computing, pages 574–583. ACM, 2014.
  • [5] A. Andoni, Z. Song, C. Stein, Z. Wang, and P. Zhong. Parallel graph connectivity in log diameter rounds. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), pages 674–685. IEEE, 2018.
  • [6] S. Assadi, M. Bateni, A. Bernstein, V. Mirrokni, and C. Stein. Coresets meet edcs: algorithms for matching and vertex cover on massive graphs. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1616–1635. SIAM, 2019.
  • [7] S. Assadi and S. Khanna. Randomized composable coresets for matching and vertex cover. In Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, pages 3–12, 2017.
  • [8] A. Badanidiyuru, B. Mirzasoleiman, A. Karbasi, and A. Krause. Streaming submodular maximization: Massive data summarization on the fly. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 671–680. ACM, 2014.
  • [9] M.-F. F. Balcan, S. Ehrlich, and Y. Liang. Distributed kk-means and kk-median clustering on general topologies. In Advances in Neural Information Processing Systems, pages 1995–2003, 2013.
  • [10] M. Bateni, A. Bhashkara, S. Lattanzi, and V. Mirrokni. Mapping core-sets for balanced clustering. Advances in Neural Information Processing Systems, 26:5–8, 2014.
  • [11] P. Beame, P. Koutris, and D. Suciu. Communication steps for parallel query processing. In Proceedings of the 32nd ACM SIGMOD-SIGACT-SIGAI symposium on Principles of database systems, pages 273–284. ACM, 2013.
  • [12] S. Behnezhad, M. Hajiaghayi, and D. G. Harris. Exponentially faster massively parallel maximal matching. In 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS), pages 1637–1649. IEEE, 2019.
  • [13] P. Berman. A d/2 approximation for maximum weight independent set in d-claw free graphs. In Scandinavian Workshop on Algorithm Theory, pages 214–219. Springer, 2000.
  • [14] P. Berman and M. Fürer. Approximating maximum independent set in bounded degree graphs. In Soda, volume 94, pages 365–371, 1994.
  • [15] P. Berman and P. Krysta. Optimizing misdirection. In Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms, pages 192–201. Society for Industrial and Applied Mathematics, 2003.
  • [16] A. Bernstein and C. Stein. Fully dynamic matching in bipartite graphs. In International Colloquium on Automata, Languages, and Programming, pages 167–179. Springer, 2015.
  • [17] A. Bernstein and C. Stein. Faster fully dynamic matchings with small approximation ratios. In Proceedings of the twenty-seventh annual ACM-SIAM symposium on Discrete algorithms, pages 692–711. Society for Industrial and Applied Mathematics, 2016.
  • [18] A. Bhaskara, A. Rostamizadeh, J. Altschuler, M. Zadimoghaddam, T. Fu, and V. Mirrokni. Greedy column subset selection: New bounds and distributed algorithms. 2016.
  • [19] Y. H. Chan and L. C. Lau. On linear and semidefinite programming relaxations for hypergraph matching. Mathematical programming, 135(1-2):123–148, 2012.
  • [20] B. Chandra and M. M. Halldórsson. Greedy local improvement and weighted set packing approximation. Journal of Algorithms, 39(2):223–240, 2001.
  • [21] M. Cygan. Improved approximation for 3-dimensional matching via bounded pathwidth local search. In Foundations of Computer Science (FOCS), 2013 IEEE 54th Annual Symposium on, pages 509–518. IEEE, 2013.
  • [22] A. Czumaj, J. Łacki, A. Madry, S. Mitrovic, K. Onak, and P. Sankowski. Round compression for parallel matching algorithms. SIAM Journal on Computing, (0):STOC18–1, 2019.
  • [23] F. Dufossé, K. Kaya, I. Panagiotas, and B. Uçar. Effective heuristics for matchings in hypergraphs. In International Symposium on Experimental Algorithms, pages 248–264. Springer, 2019.
  • [24] M. Fischer, M. Ghaffari, and F. Kuhn. Deterministic distributed edge-coloring via hypergraph maximal matching. In Foundations of Computer Science (FOCS), 2017 IEEE 58th Annual Symposium on, pages 180–191. IEEE, 2017.
  • [25] A. Frieze and S. Janson. Perfect matchings in random s-uniform hypergraphs. Random Structures & Algorithms, 7(1):41–57, 1995.
  • [26] M. Furer and H. Yu. Approximate the k-set packing problem by local improvements. arXiv preprint arXiv:1307.2262, 2013.
  • [27] M. Ghaffari, T. Gouleakis, C. Konrad, S. Mitrović, and R. Rubinfeld. Improved massively parallel computation algorithms for mis, matching, and vertex cover. In Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing, pages 129–138, 2018.
  • [28] M. Ghaffari and J. Uitto. Sparsifying distributed algorithms with ramifications in massively parallel computation and centralized local computation. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1636–1653. SIAM, 2019.
  • [29] M. T. Goodrich, N. Sitchinava, and Q. Zhang. Sorting, searching, and simulation in the mapreduce framework. In International Symposium on Algorithms and Computation, pages 374–383. Springer, 2011.
  • [30] M. M. Halldórsson. Approximating discrete collections via local improvements. In SODA, volume 95, pages 160–169, 1995.
  • [31] M. M. Halldórsson. Approximations of weighted independent set and hereditary subset problems. In Graph Algorithms And Applications 2, pages 3–18. World Scientific, 2004.
  • [32] M. M. Halldórsson and J. Radhakrishnan. Greed is good: Approximating independent sets in sparse and bounded-degree graphs. Algorithmica, 18(1):145–163, 1997.
  • [33] D. G. Harris. Distributed approximation algorithms for maximum matching in graphs and hypergraphs. arXiv preprint arXiv:1807.07645, 2018.
  • [34] N. J. Harvey, C. Liaw, and P. Liu. Greedy and local ratio algorithms in the mapreduce model. In Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures, pages 43–52, 2018.
  • [35] E. Hazan, S. Safra, and O. Schwartz. On the complexity of approximating k-set packing. computational complexity, 15(1):20–39, 2006.
  • [36] C. A. J. Hurkens and A. Schrijver. On the size of systems of sets every t of which have an sdr, with an application to the worst-case ratio of heuristics for packing problems. SIAM Journal on Discrete Mathematics, 2(1):68–72, 1989.
  • [37] P. Indyk, S. Mahabadi, M. Mahdian, and V. S. Mirrokni. Composable core-sets for diversity and coverage maximization. In Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 100–108. ACM, 2014.
  • [38] A. Israeli and A. Itai. A fast and simple randomized parallel algorithm for maximal matching. Information Processing Letters, 22(2):77–80, 1986.
  • [39] V. Kann. Maximum bounded 3-dimensional matching is MAX SNP-complete. Inf. Process. Lett., 37(1):27–35, 1991.
  • [40] H. Karloff, S. Suri, and S. Vassilvitskii. A model of computation for mapreduce. In Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms, pages 938–948. SIAM, 2010.
  • [41] R. M. Karp. Reducibility among combinatorial problems. In Complexity of computer computations, pages 85–103. Springer, 1972.
  • [42] J. H. Kim. Perfect matchings in random uniform hypergraphs. Random Structures & Algorithms, 23(2):111–132, 2003.
  • [43] J. Kunegis. Konect: the koblenz network collection. In Proceedings of the 22nd International Conference on World Wide Web, pages 1343–1350, 2013.
  • [44] S. Lattanzi, B. Moseley, S. Suri, and S. Vassilvitskii. Filtering: a method for solving graph problems in mapreduce. In Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures, pages 85–94. ACM, 2011.
  • [45] M. Luby. A simple parallel algorithm for the maximal independent set problem. SIAM journal on computing, 15(4):1036–1053, 1986.
  • [46] V. Mirrokni and M. Zadimoghaddam. Randomized composable core-sets for distributed submodular maximization. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 153–162. ACM, 2015.
  • [47] B. Mirzasoleiman, A. Karbasi, R. Sarkar, and A. Krause. Distributed submodular maximization: Identifying representative elements in massive data. In Advances in Neural Information Processing Systems, pages 2049–2057, 2013.
  • [48] C. H. Papadimitriou. Computational complexity. John Wiley and Sons Ltd., 2003.
  • [49] T. Sandholm. Algorithm for optimal winner determination in combinatorial auctions. Artificial intelligence, 135(1-2):1–54, 2002.
  • [50] T. Sandholm, K. Larson, M. Andersson, O. Shehory, and F. Tohmé. Coalition structure generation with worst case guarantees. Artificial Intelligence, 111(1-2):209–238, 1999.
  • [51] P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Galligher, and T. Eliassi-Rad. Collective classification in network data. AI magazine, 29(3):93–93, 2008.
  • [52] J. Shun. Practical parallel hypergraph algorithms. In Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 232–249, 2020.
  • [53] S. S. Skiena. The algorithm design manual: Text, volume 1. Springer Science & Business Media, 1998.
  • [54] M. Sviridenko and J. Ward. Large neighborhood local search for the maximum set packing problem. In International Colloquium on Automata, Languages, and Programming, pages 792–803. Springer, 2013.
  • [55] L. Trevisan. Non-approximability results for optimization problems on bounded degree instances. In Proceedings of the thirty-third annual ACM symposium on Theory of computing, pages 453–461, 2001.
  • [56] R. Vemuganti. Applications of set covering, set packing and set partitioning models: A survey. In Handbook of combinatorial optimization, pages 573–746. Springer, 1998.
  • [57] J. Yang and J. Leskovec. Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems, 42(1):181–213, 2015.

Appendix A Omitted proofs

A.1 Proof of Lemma 6.7

See Lemma 6.7.

Proof.

We distinguish three cases:

  • dH(x)βdd_{H}(x)\leq\frac{\beta}{d}: in this case 1βdH(x)d(d1)β<ϵ\frac{1}{\beta-d_{H}(x)}\leq\frac{d}{(d-1)\beta}<\epsilon and dH(x)dH(x)βdH(x)d_{H^{*}}(x)\leq d_{H}(x)\leq\beta-d_{H}(x). This implies that val(x)=dH(x)βdH(x)val^{\prime}(x)=\frac{d_{H^{*}}(x)}{\beta-d_{H}(x)}, so that :

    E[val(x)]d1ddH(x)βdH(x)=ϕ(dH(x)).E[val^{\prime}(x)]\geq\frac{d-1}{d}\cdot\frac{d_{H}(x)}{\beta-d_{H}(x)}=\phi(d_{H}(x))\ .

    Now consider the case in which dH(x)>βdd_{H}(x)>\frac{\beta}{d}. Then

    E[dH(x)](d1)dH(x)d>(d1)d2β8λ3>>ϵ1,E[d_{H^{*}}(x)]\geq\frac{(d-1)d_{H}(x)}{d}>\frac{(d-1)}{d^{2}}\cdot\beta\geq 8\cdot\lambda^{-3}>>\epsilon^{-1}\ ,

    because β8d2d1λ3\beta\geq\frac{8d^{2}}{d-1}\cdot\lambda^{-3}, and by a Chernoff Bound :

    P[dH(x)<(1λ2)(d1)dH(x)d]exp(E[dH(x)](λ2)212)exp(λ1)λ/2.\begin{split}P\Big{[}d_{H^{*}}(x)<(1-\frac{\lambda}{2})\frac{(d-1)d_{H}(x)}{d}\Big{]}&\leq\exp{(-E[d_{H^{*}}(x)](\frac{\lambda}{2})^{2}\frac{1}{2})}\\ &\leq\exp{(-\lambda^{-1})}\\ &\leq\lambda/2\ .\end{split} (3)
  • Let us now consider the case dH(x)>βdd_{H}(x)>\frac{\beta}{d} and min{ϵ,1βdH(x)}=ϵ\min\Big{\{}\epsilon,\frac{1}{\beta-d_{H}(x)}\Big{\}}=\epsilon. With probability at least (1λ2)(1-\frac{\lambda}{2}), we have that:

    dH(x)(β1ϵ)(1λ2)d1d>>ϵ1.d_{H^{*}}(x)\geq(\beta-\frac{1}{\epsilon})(1-\frac{\lambda}{2})\frac{d-1}{d}>>\epsilon^{-1}.

    Thus with probability at least (1λ2)(1-\frac{\lambda}{2}), we have that dH(x)ϵ>1d_{H^{*}}(x)\epsilon>1 and:

    E[val(x)](1λ2)(1λ2)ϕ(dH(x)).E[val^{\prime}(x)]\geq(1-\frac{\lambda}{2})\geq(1-\frac{\lambda}{2})\phi(d_{H}(x)).
  • The only case that we need to check is dH(x)βdd_{H}(x)\geq\frac{\beta}{d} and min{ϵ,1βdH(x)}=1βdH(x)\min\Big{\{}\epsilon,\frac{1}{\beta-d_{H}(x)}\Big{\}}=\frac{1}{\beta-d_{H}(x)}, so that val(x)=min{1,dH(x)βdH(x)}val^{\prime}(x)=\min\Big{\{}1,\frac{d_{H^{*}}(x)}{\beta-d_{H}(x)}\Big{\}}. Again we have that with probability at least (1λ2):(1-\frac{\lambda}{2}):

    dH(x)βdH(x)d1ddH(x)βdH(x)(1λ2)(1λ2)ϕ(dH(x)).\frac{d_{H^{*}}(x)}{\beta-d_{H}(x)}\geq\frac{d-1}{d}\frac{d_{H}(x)}{\beta-d_{H}(x)}(1-\frac{\lambda}{2})\geq(1-\frac{\lambda}{2})\phi(d_{H}(x)).

In other words, with probability at least (1λ2)(1-\frac{\lambda}{2}), we have val(x)(1λ2)ϕ(dH(x))val^{\prime}(x)\geq(1-\frac{\lambda}{2})\phi(d_{H}(x)), so that E[val(x)](1λ2)2ϕ(dH(x))>(1λ)ϕ(dH(x))E[val^{\prime}(x)]\geq(1-\frac{\lambda}{2})^{2}\phi(d_{H}(x))>(1-\lambda)\phi(d_{H}(x)). We have thus shown that in all cases E[val(x)](1λ)ϕ(dH(x))E[val^{\prime}(x)]\geq(1-\lambda)\phi(d_{H}(x)). ∎

A.2 Proof of Lemma 6.5 and Lemma 6.6

See Lemma 6.5.

Proof.

We will provide a proof for d=3d=3 that is easy to generalize. We first show that if a+b+cβa+b+c\geq\beta, then ϕ(a)+ϕ(b)+ϕ(c)1\phi(a)+\phi(b)+\phi(c)\geq 1. The claim is true if ϕ(a)1\phi(a)\geq 1 or ϕ(b)1\phi(b)\geq 1 or ϕ(c)1\phi(c)\geq 1. Suppose that ϕ(a)<1\phi(a)<1 and ϕ(b)<1\phi(b)<1 and ϕ(c)<1\phi(c)<1. Then :

ϕ(a)+ϕ(b)+ϕ(c)=d1d(aβa+bβb+cβc)d1d(ab+c+ba+c+ca+b).\phi(a)+\phi(b)+\phi(c)=\frac{d-1}{d}\Big{(}\frac{a}{\beta-a}+\frac{b}{\beta-b}+\frac{c}{\beta-c}\Big{)}\geq\frac{d-1}{d}\Big{(}\frac{a}{b+c}+\frac{b}{a+c}+\frac{c}{a+b}\Big{)}\ .

By Nesbitt’s Inequality we know that

ab+c+ba+c+ca+bdd1=32,\frac{a}{b+c}+\frac{b}{a+c}+\frac{c}{a+b}\geq\frac{d}{d-1}=\frac{3}{2},

and therefore ϕ(a)+ϕ(b)+ϕ(c)1\phi(a)+\phi(b)+\phi(c)\geq 1.

By the general Nesbitt’s Inequality (see Appendix D), we know that for d>3d>3

i=1daijiajdd1.\sum\limits_{i=1}^{d}\frac{a_{i}}{\sum\limits_{j\neq i}a_{j}}\geq\frac{d}{d-1}\ .

So if i=1daiβ\sum\limits_{i=1}^{d}a_{i}\geq\beta, then i=1dϕ(ai)1\sum\limits_{i=1}^{d}\phi(a_{i})\geq 1. Now, let ϕ(x)=ddxϕ(x)\phi^{\prime}(x)=\frac{d}{dx}\phi(x). To complete the proof, it is sufficient to show that we always have ϕ(x)5β\phi^{\prime}(x)\leq\frac{5}{\beta}. To prove this inequality, note that if xd2d1βx\geq\frac{d}{2d-1}\beta then ϕ(x)=1\phi(x)=1 and thus ϕ(x)=0.\phi^{\prime}(x)=0. Now, if xd2d1βx\leq\frac{d}{2d-1}\beta then:

ϕ(x)=d1dddxxβx=d1dβ(βx)2,\phi^{\prime}(x)=\frac{d-1}{d}\frac{d}{dx}\frac{x}{\beta-x}=\frac{d-1}{d}\frac{\beta}{(\beta-x)^{2}}\ ,

which is increasing in xx and maximized at x=d2d1βx=\frac{d}{2d-1}\beta, in which case ϕ(x)=(2d1)2d(d1)1β5β\phi^{\prime}(x)=\frac{(2d-1)^{2}}{d(d-1)}\frac{1}{\beta}\leq\frac{5}{\beta}. Therefore, if i=1dai(1λ)β\sum\limits_{i=1}^{d}a_{i}\geq(1-\lambda)\beta, we can increase the aia_{i} by a total of at most λβ\lambda\beta so that they sum to at least β\beta; this increases i=1dϕ(ai)\sum\limits_{i=1}^{d}\phi(a_{i}) by at most 5βλβ=5λ\frac{5}{\beta}\cdot\lambda\beta=5\lambda, and the increased values have ϕ\phi-sum at least 1. In the end we get:

i=1dϕ(ai)15λ.\sum\limits_{i=1}^{d}\phi(a_{i})\geq 1-5\cdot\lambda\ .
∎

See Lemma 6.6.

Proof.

Let MGM^{G} be some maximum integral matching in GG. Some of the edges in MGM^{G} are in HH, while others are in GHG\setminus H. Let X0X_{0} contain all vertices incident to edges in MG(GH)M^{G}\cap(G\setminus H), and let Y0Y_{0} contain all vertices incident to edges in MGHM^{G}\cap H. We now show that X0X_{0} and Y0Y_{0} satisfy the first three properties of the lemma. Property 1 is satisfied because X0Y0X_{0}\cup Y_{0} consists of all matched vertices in MGM^{G}. Property 2 is satisfied by definition of Y0Y_{0}. To see that Property 3 is satisfied, note that the vertices of Y0Y_{0} each contribute exactly 1d\frac{1}{d}. Now, X0X_{0} consists of |X0|/d|X_{0}|/d disjoint edges in GHG\setminus H, and by Property (P2) of an HEDCS, for each such edge ee: xedH(x)β(1λ)\sum\limits_{x\in e}d_{H}(x)\geq\beta(1-\lambda), and by Lemma 6.5 we have xeϕ(dH(x))(15λ)\sum\limits_{x\in e}\phi(d_{H}(x))\geq(1-5\lambda), so each of these vertices contributes on average at least 15λd\frac{1-5\lambda}{d} to σ\sigma, as desired. Loosely speaking, ϕ(dH(x))\phi(d_{H}(x)) will end up corresponding to the profit gained by vertex xx in the fractional matching MfHM^{H}_{f}.

Consider Y0Y_{0} and X0X_{0} from above. These sets might not satisfy Property 4 (that all edges in HH with an endpoint in XX have at least one other endpoint in YY). We can, however, transform them into sets X1X_{1} and Y1Y_{1} such that the first three properties still hold and there are no hyperedges with endpoints in both X1X_{1} and V(X1Y1)V\setminus(X_{1}\cup Y_{1}); at this stage, there may still be edges in HH with all their endpoints in X1X_{1}. To construct X1X_{1} and Y1Y_{1}, we start with X=X0X=X_{0} and Y=Y0Y=Y_{0}, and present a transformation that terminates with X=X1X=X_{1} and Y=Y1Y=Y_{1}. Recall that X0X_{0} has a perfect matching using edges in GHG\setminus H. The set XX maintains this property throughout the transformation, and each vertex xXx\in X always has a unique mate ee^{\prime}. The construction does the following: as long as there exists an edge ee in HH containing xx and with endpoints only in XX and V(XY)V\setminus(X\cup Y), let ee^{\prime} be the mate of xx; we then remove the endpoints of ee^{\prime} from XX and add the endpoints of ee to YY. Property 1 is maintained because we have removed dd vertices from XX and added dd to YY. Property 2 is maintained because the vertices we added to YY were connected by an edge in HH. Property 3 is maintained because XX clearly still has a perfect matching in GHG\setminus H, and for the vertices {x1,x2,,xd}=e\{x_{1},x_{2},\ldots,x_{d}\}=e^{\prime}, the average contribution is still at least 15λd\frac{1-5\lambda}{d}, as above. We continue this process while there is an edge with endpoints in XX and V(XY)V\setminus(X\cup Y). The process terminates because each time we remove dd vertices from XX and add dd vertices to YY. We end up with two sets X1X_{1} and Y1Y_{1} such that the first three properties of the lemma are satisfied and there are no edges with endpoints in both X1X_{1} and V(X1Y1)V\setminus(X_{1}\cup Y_{1}). This means that any edge in HH incident to X1X_{1} is either incident to Y1Y_{1} as well or contains only vertices of X1X_{1}.

We now set X=X1X=X_{1} and Y=Y1Y=Y_{1} and show how to transform XX and YY into two sets that satisfy all four properties of the lemma. Recall that X1X_{1} still contains a perfect matching using edges in GHG\setminus H; denote this matching MXGM^{G}_{X}. Our final set, however, will not guarantee such a perfect matching. Let MXHM^{H}_{X} be a maximal matching in XX using edges in HH (using only edges not incident to YY, since edges incident to YY already satisfy Property 4). Consider the edge set EX=MXGMXHE^{*}_{X}=M^{G}_{X}\cup M^{H}_{X}.

Figure 3: Example with d=4d=4. In blue, the edges of MXGM^{G}_{X}; in yellow, the edges of MXHM^{H}_{X}.

We now perform the following simple transformation: we remove the endpoints of the edges in MXHM^{H}_{X} from XX and add them directly to YY. Property 1 is preserved because we delete from XX and add to YY the same number of vertices. Property 2 is preserved because the endpoints we add to YY are matched in MXHM^{H}_{X} and thus in HH. We address Property 3 below.

Let us check that Property 4 is preserved. Note that we took MXHM^{H}_{X} to be maximal among the edges of HH that contain only endpoints in XX, and that we moved all the vertices matched in MXHM^{H}_{X} to YY. Thus all vertices that remain in XX are free in MXHM^{H}_{X}, and so after the transformation there are no edges of HH with all endpoints in XX. There are also no edges in HH containing endpoints from both XX and V(XY)V\setminus(X\cup Y), because none existed when X=X1X=X_{1}.

Let us check that Property 3 is preserved. As before, this involves showing that after the transformation, the average contribution of a vertex in XYX\cup Y to σ\sigma is at least 15λd\frac{1-5\lambda}{d}. (Because every vertex in XX is incident to an edge in EXE^{*}_{X}, each vertex is accounted for in the transformation.) Now, all vertices that were in Y1Y_{1} remain in YY, so their average contribution remains 1/d1/d. We thus need to show that the average contribution to σ\sigma among the vertices of X1X_{1} remains at least 15λd\frac{1-5\lambda}{d} after the transformation.

Let n=|MXG|n=|M^{G}_{X}| and k=|MXH|k=|M^{H}_{X}|. Since MXGM^{G}_{X} is a perfect matching on X1X_{1}, we always have knk\leq n. If n=kn=k, then all vertices of X1X_{1} are transferred to YY, and clearly their average contribution to σ\sigma is 1d15λd\frac{1}{d}\geq\frac{1-5\lambda}{d}. Now consider the case kn1k\leq n-1. Let the edges of MXHM^{H}_{X} be {e1,ek}\{e_{1},\ldots e_{k}\} and those of MXGM^{G}_{X} be {e1,en}\{e^{\prime}_{1},\ldots e^{\prime}_{n}\}. Let XX^{\prime} be the set of vertices that remain in X1X_{1}. Because the edges of MXGM^{G}_{X} are not in HH, by the two properties of HEDCS:

1in,xeidH(x)\displaystyle\sum\limits_{1\leq i\leq n,\ x\in e^{\prime}_{i}}d_{H}(x) \displaystyle\geq nβ(1λ), and\displaystyle n\beta(1-\lambda)\ ,\mbox{ and }
1ik,xeidH(x)\displaystyle\sum\limits_{1\leq i\leq k,\ x\in e_{i}}d_{H}(x) \displaystyle\leq kβ.\displaystyle k\beta\ .

The sum of the degrees of vertices in XX^{\prime} can be written as the following difference:

xXdH(x)\displaystyle\sum\limits_{x\in X^{\prime}}d_{H}(x) =\displaystyle= 1in,xeidH(x)1ik,xeidH(x)\displaystyle\sum\limits_{1\leq i\leq n,\ x\in e^{\prime}_{i}}d_{H}(x)-\sum\limits_{1\leq i\leq k,\ x\in e_{i}}d_{H}(x)
\displaystyle\geq nβ(1λ)kβ\displaystyle n\beta(1-\lambda)-k\beta
=\displaystyle= (nk)βnβλ.\displaystyle(n-k)\cdot\beta-n\beta\lambda\ .

Now we prove that this lower bound implies that the contribution of the vertices of XX^{\prime} is on average at least 15λd\frac{1-5\lambda}{d}.

Claim A.1.

xXϕ(dH(x))(nk)5nλ\sum\limits_{x\in X^{\prime}}\phi(d_{H}(x))\geq(n-k)-5n\cdot\lambda.

By Claim A.1, the average contribution among the vertices that are considered is

(nk)5nλ+knd=15λd,\frac{(n-k)-5n\cdot\lambda+k}{nd}=\frac{1-5\lambda}{d}\ ,

which proves Property 3 and completes the proof of the lemma.∎

Proof of Claim A.1.

Let’s denote m:=nkm:=n-k. Recall that the number of vertices in XX^{\prime} is equal to mdmd and ϕ(x)=d1dxβx\phi(x)=\frac{d-1}{d}\frac{x}{\beta-x}. Here we will prove it for λ=0\lambda=0. This means that we will prove that

xXdH(x)mβxXϕ(dH(x))m.\sum\limits_{x\in X^{\prime}}d_{H}(x)\geq m\cdot\beta\Rightarrow\sum\limits_{x\in X^{\prime}}\phi(d_{H}(x))\geq m\ .

If m=nk=1m=n-k=1, then the result clearly holds by Lemma 6.5. Let’s suppose k<n1k<n-1 and thus m2m\geq 2. By Lemma 6.5, we know that:

md1mdxXdH(x)mβdH(x)1.\frac{md-1}{md}\sum\limits_{x\in X^{\prime}}\frac{d_{H}(x)}{m\cdot\beta-d_{H}(x)}\geq 1\ . (5)

We also know that :

mβaβa\displaystyle\frac{m\beta-a}{\beta-a} =\displaystyle= m+(m1)aβa, and\displaystyle m+\frac{(m-1)a}{\beta-a}\ ,\mbox{ and } (6)
aβa\displaystyle\frac{a}{\beta-a} =\displaystyle= mamβa+(m1)a2(mβa)(βa).\displaystyle m\cdot\frac{a}{m\beta-a}+\frac{(m-1)a^{2}}{(m\beta-a)(\beta-a)}\ . (7)

Combining (5) and (7), we get:

xXdH(x)βdH(x)m2dmd1+xX(m1)dH(x)2(mβdH(x))(βdH(x)).\sum\limits_{x\in X^{\prime}}\frac{d_{H}(x)}{\beta-d_{H}(x)}\geq\frac{m^{2}d}{md-1}+\sum\limits_{x\in X^{\prime}}\frac{(m-1)d_{H}(x)^{2}}{(m\beta-d_{H}(x))(\beta-d_{H}(x))}\ .

which leads to:

xXϕ(dH(x))d1dxXdH(x)βdH(x)mm(d1)md1+d1dxX(m1)dH(x)2(mβdH(x))(βdH(x))m+d1dxX(m1)dH(x)2(mβdH(x))(βdH(x))m1md1.\begin{split}\sum\limits_{x\in X^{\prime}}\phi(d_{H}(x))&\geq\frac{d-1}{d}\sum\limits_{x\in X^{\prime}}\frac{d_{H}(x)}{\beta-d_{H}(x)}\\ &\geq m\cdot\frac{m(d-1)}{md-1}+\frac{d-1}{d}\sum\limits_{x\in X^{\prime}}\frac{(m-1)d_{H}(x)^{2}}{(m\beta-d_{H}(x))(\beta-d_{H}(x))}\\ &\geq m+\frac{d-1}{d}\sum\limits_{x\in X^{\prime}}\frac{(m-1)d_{H}(x)^{2}}{(m\beta-d_{H}(x))(\beta-d_{H}(x))}-\frac{m-1}{md-1}\ .\end{split} (8)

By convexity of the function xx2(mβx)(βx)x\mapsto\frac{x^{2}}{(m\beta-x)(\beta-x)}:

xXdH(x)2(mβdH(x))(βdH(x))mβ(md1)(d1).\begin{split}\sum\limits_{x\in X^{\prime}}\frac{d_{H}(x)^{2}}{(m\beta-d_{H}(x))(\beta-d_{H}(x))}&\geq md\cdot\frac{\Big{(}\frac{\sum_{x\in X^{\prime}}d_{H}(x)}{md}\Big{)}^{2}}{\Big{(}m\beta-\frac{\sum_{x\in X^{\prime}}d_{H}(x)}{md}\Big{)}\Big{(}\beta-\frac{\sum_{x\in X^{\prime}}d_{H}(x)}{md}\Big{)}}\\ &\geq\frac{m\beta}{(md-1)(d-1)}\ .\end{split} (9)

where the last inequality is due to xXdH(x)mβ\sum\limits_{x\in X^{\prime}}d_{H}(x)\geq m\cdot\beta.

Therefore, when βd2\beta\geq\frac{d}{2}, the right-hand side of (8) becomes:

m+d1dxXdH(x)2(mβdH(x))(βdH(x))1md1\displaystyle m+\frac{d-1}{d}\sum\limits_{x\in X^{\prime}}\frac{d_{H}(x)^{2}}{(m\beta-d_{H}(x))(\beta-d_{H}(x))}-\frac{1}{md-1} \displaystyle\geq m+1md1(mβd1)\displaystyle m+\frac{1}{md-1}\left(\frac{m\beta}{d}-1\right)
\displaystyle\geq m.\displaystyle m.

A.3 Proof of Corollary 6.9

See Corollary 6.9.

Proof.

For linear hypergraphs, assume that dA(v)=kλβd_{A}(v)=k\lambda\beta with k=Ω(logn)k=\Omega(\log{n}) and dB(v)=0d_{B}(v)=0. The difference in the analysis is that, for every edge ee among the kλβk\lambda\beta edges incident to vv in AA, we can find a new set of at least (k(d+1))λβ(k-(d+1))\lambda\beta edges in BAB\setminus A. In fact, for such an edge ee, every one of the (1λ)β(1-\lambda)\beta edges that verify uvdB(u)(1λ)β\sum\limits_{u\neq v}d_{B}(u)\geq(1-\lambda)\beta intersects ee in exactly one vertex that is not vv. The same goes for the subset of at least (k1)λβ(k-1)\lambda\beta of these edges that lie in BAB\setminus A: they already intersect ee, and can have at most one intersection between them. At most d(d1)d(d-1) of these edges can be considered simultaneously for different edges e1,,ede_{1},\ldots,e_{d} among the kλβk\lambda\beta edges incident to vv. Therefore, for every edge ee, we can find a set of at least (k1)λβd(d1)(k(d+1))λβ(k-1)\lambda\beta-d(d-1)\geq(k-(d+1))\lambda\beta new edges in BAB\setminus A. This means that at this point we have already covered kλβ(k(d+1))λβk\lambda\beta\cdot(k-(d+1))\lambda\beta edges in both AA and BB. One can see that we can continue covering new edges just as in the previous lemma, so that at iteration ll, the number of covered edges is at least

k(k(d+1))(k2(d+1))(kl(d+1))(λβ)l,k(k-(d+1))(k-2(d+1))\ldots(k-l(d+1))(\lambda\beta)^{l},

for lk1d+1l\leq\frac{k-1}{d+1}. It is easy to see that for l=k1d+1l=\frac{k-1}{d+1}, we will have k(k(d+1))(k2(d+1))(kl(d+1))(λβ)l>2nβk(k-(d+1))(k-2(d+1))\ldots(k-l(d+1))(\lambda\beta)^{l}>2n\beta if k=Ω(logn)k=\Omega(\log{n}), which is a contradiction. ∎

Appendix B Figures and Tables

nn mm dd kk #I Gr IS HEDCS β\beta λ\lambda
15 200 3 5 500 79.1% 87.5% 82.3% 5 3
30 400 5 82.6% 91.3% 84.4% 7 4
100 3200 10 83.9% 96.2% 88.2% 5 2
300 4000 10 81.1% 92.0% 86.3% 4 2
50 800 5 6 500 76.0% 89.1% 78.5% 16 11
100 2,800 10 77.9% 92.1% 81.9% 16 11
300 4,000 10 77.8% 93.9% 87.1% 10 5
500 8,000 16 79.2% 94.3% 85.9% 10 5
500 15,000 10 16 500 71.8% 90.6% 79.2% 20 10
1,000 50,000 20 73.8% 92.3% 81.5% 20 10
2,500 100,000 20 72.2% 91.5% 80.7% 20 10
5,000 200,000 20 72.5% 90.7% 79.8% 20 10
1,000 50,000 25 20 100 68.2% 87.5% 75.6% 75 50
2,500 100,000 25 69.0% 87.9% 74.3% 75 50
5,000 250,000 30 67.8% 87.3% 75.1% 75 50
10,000 500,000 30 67.2% 86.9% 73.7% 75 50
5,000 250,000 50 30 100 67.4% 86.6% 74.0% 100 50
10,000 500,000 30 68.1% 87.1% 73.4% 100 50
15,000 750,000 30 66.9% 86.2% 73.2% 100 50
25,000 1,000,000 30 67.3% 86.0% 72.8% 100 50

Table 5: Comparison on random uniform instances with maximal matching benchmark.

Appendix C Chernoff Bound

Let X1,,XnX_{1},\ldots,X_{n} be independent random variables taking values in [0,1][0,1] and X:=i=1nXiX:=\sum\limits_{i=1}^{n}X_{i}. Then, for any δ(0,1)\delta\in(0,1),

Pr(|X𝔼[X]|δ𝔼[X])2exp(δ2𝔼[X]3).Pr\Big{(}|X-\mathbb{E}[X]|\geq\delta\mathbb{E}[X]\Big{)}\leq 2\cdot\exp{\Big{(}-\frac{\delta^{2}\mathbb{E}[X]}{3}\Big{)}}.

Appendix D Nesbitt’s Inequality

Nesbitt’s inequality states that for positive real numbers aa, bb and cc,

ab+c+ba+c+ca+b32,\frac{a}{b+c}+\frac{b}{a+c}+\frac{c}{a+b}\geq\frac{3}{2},

with equality when all the variables are equal. More generally, if a1,,ana_{1},\ldots,a_{n} are positive real numbers and s=i=1nais=\sum\limits_{i=1}^{n}a_{i}, then:

i=1naisainn1,\sum\limits_{i=1}^{n}\frac{a_{i}}{s-a_{i}}\geq\frac{n}{n-1},

with equality when all the aia_{i} are equal.
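For completeness, the general form follows in one line from the AM-HM inequality applied to the positive numbers sais-a_{i} (a standard argument, included here for the reader):

\sum\limits_{i=1}^{n}\frac{a_{i}}{s-a_{i}}=\sum\limits_{i=1}^{n}\frac{s}{s-a_{i}}-n\geq\frac{n^{2}s}{\sum_{i=1}^{n}(s-a_{i})}-n=\frac{n^{2}}{n-1}-n=\frac{n}{n-1}\ .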