
Learning Low Degree Hypergraphs

Eric Balkanski (Columbia University, [email protected])    Oussama Hanguir (Columbia University, [email protected])    Shatian Wang (Columbia University, [email protected])
Abstract

We study the problem of learning a hypergraph via edge-detecting queries. In this problem, a learner queries subsets of vertices of a hidden hypergraph and observes whether these subsets contain an edge or not. In general, learning a hypergraph with $m$ edges of maximum size $d$ requires $\Omega((2m/d)^{d/2})$ queries [7]. In this paper, we aim to identify families of hypergraphs that can be learned without suffering from a query complexity that grows exponentially in the size of the edges.

We show that hypermatchings and low-degree near-uniform hypergraphs with $n$ vertices are learnable with $\operatorname{poly}(n)$ queries. For learning hypermatchings (hypergraphs of maximum degree $1$), we give an $\mathcal{O}(\log^{3}n)$-round algorithm with $\mathcal{O}(n\log^{5}n)$ queries. We complement this upper bound by showing that there are no algorithms with $\operatorname{poly}(n)$ queries that learn hypermatchings in $o(\log\log n)$ adaptive rounds. For hypergraphs with maximum degree $\Delta$ and edge size ratio $\rho$, we give a non-adaptive algorithm with $\mathcal{O}((2n)^{\rho\Delta+1}\log^{2}n)$ queries. To the best of our knowledge, these are the first algorithms with $\operatorname{poly}(n,m)$ query complexity for learning non-trivial families of hypergraphs that have a super-constant number of edges of super-constant size. (Accepted for presentation at the Conference on Learning Theory (COLT) 2022.)

1 Introduction

Hypergraphs are a powerful tool in computational chemistry and molecular biology where they are used to represent groups of chemicals and molecules that cause a reaction. The chemicals and molecules that cause a reaction are however often unknown a priori, and learning such groups is a central problem of interest that has motivated a long line of work on hypergraph learning in the edge-detecting queries model, e.g., [18, 7, 6, 2, 3, 1]. In this model, a learner queries subsets of vertices and, for each queried subset, observes whether it contains an edge or not. When the vertices represent chemicals, the learner queries groups of chemicals and observes a reaction if a subgroup reacts. The goal is to learn the edges with a small number of queries, i.e., a small number of experiments.

An important application of learning chemical reaction networks is to learn effective drug combinations (drugs are vertices and effective drug combinations are hyperedges). In particular, there has recently been a lot of interest in finding drug combinations that reduce cancer cell viability. For example, as part of AstraZeneca’s recent DREAM Challenge, the effectiveness of a large number of drug combinations was tested against different cancer cell lines [17]. Despite this interest, as noted by Flobak et al. [14], “our knowledge about beneficial targeted drug combinations is still limited, partly due to the combinatorial complexity”, especially since the time required to culture, maintain and test cell lines is significant. This combinatorial complexity motivates the need for novel computational methods that efficiently explore drug combinations.

One main obstacle to discovering effective combinations that involve $d>2$ vertices is that $\Omega((2m/d)^{d/2})$ queries are required to learn hypergraphs with $m$ edges of maximum size $d$ [6]. Since there is no efficient algorithm for learning general hypergraphs with large maximum edge size, the main question is which families of hypergraphs can be efficiently learned.

Which families of hypergraphs can we learn with a number of queries
that does not grow exponentially in the size of the edges?

Our results.

We develop the first algorithms with $\operatorname{poly}(n,m)$ queries for learning non-trivial families of hypergraphs that have a super-constant number of edges $m$ of super-constant size. The first family of such hypergraphs that we consider are hypermatchings, i.e., hypergraphs such that every vertex is in at most one edge. Learning a hypermatching generalizes the problem of learning a matching studied in [4, 8]. In addition to their query complexity, we are also interested in the adaptive complexity of our algorithms, which is the number of adaptive rounds of queries required when the algorithm can perform multiple queries in parallel in each round. Our first main result is an algorithm with near-linear query complexity and poly-logarithmic adaptive complexity for learning hypermatchings.

Theorem.

There is an $\mathcal{O}(\log^{3}n)$-adaptive algorithm with $\mathcal{O}(n\log^{5}n)$ queries that, for any hypermatching $M$ over $n$ vertices, learns $M$ with high probability.

The adaptivity of the algorithm can be improved to $\mathcal{O}(\log n)$ at the expense of $\mathcal{O}(n^{3}\log^{3}n)$ queries. We complement our algorithm by showing that there are no $o(\log\log n)$-adaptive algorithms that learn hypermatchings using $\operatorname{poly}(n)$ queries.

Theorem.

There is no $o(\log\log n)$-adaptive algorithm with polynomial query complexity that learns an arbitrary hypermatching $M$ over $n$ vertices, even with small probability.

Moving beyond hypermatchings, we study hypergraphs whose vertices have bounded maximum degree $\Delta\geq 2$ ($\Delta=1$ for hypermatchings). For such hypergraphs, the previously mentioned $\Omega((2m/d)^{d/2})$ lower bound on the number of queries needed by any algorithm implicitly also implies that $\Omega((m/\rho)^{\rho})$ queries are required, even when $\Delta=2$, where $\rho\geq 1$ is the ratio between the maximum and minimum edge sizes [6]. Thus, an exponential dependence on this near-uniformity parameter $\rho$ is, in general, unavoidable. We give a non-adaptive algorithm whose query complexity depends on the maximum degree $\Delta$ and the near-uniformity parameter $\rho$ of the hypergraph $H$ we wish to learn.

Theorem.

There is a non-adaptive algorithm with $\mathcal{O}((2n)^{\rho\Delta+1}\log^{2}n)$ queries that, for any $\rho$-near-uniform hypergraph $H$ with maximum degree $\Delta\geq 2$, learns $H$ with high probability.

This query complexity is independent of the maximum size $d$ of the edges and is polynomial in the number of vertices $n$ when the maximum degree $\Delta$ and the edge size ratio $\rho$ are constant. Our learning algorithms rely on novel constructions of collections of queries that satisfy a simple property that we call unique-edge covering, which is a general approach for constructing learning algorithms with a query complexity that does not grow exponentially in the size of the edges. We believe that an interesting direction for future work is to identify, in addition to hypermatchings, other relevant families of hypergraphs that are learnable with $\operatorname{poly}(n,m)$ queries, even when both the maximum edge size $d$ and the edge size ratio $\rho$ are non-constant.

Technical overview.

Previous work on learning a hypergraph $H=(V,E)$ relies on constructing a collection of sets of vertices called an independent covering family [7] or a cover-free family [3, 15, 16, 2]. These families aim to identify non-edges, which are sets of vertices $T\subseteq V$ that do not contain an edge $e\in E$. More precisely, both of these families are collections $\mathcal{S}$ of sets of vertices such that every non-edge set $T$ of size $d$ is contained in at least one set $S\in\mathcal{S}$ that does not contain an edge. These families have a minimum size that is exponential in $d$, and these approaches require a number of queries that is at least the size of these families. In contrast to these existing approaches that focus on non-edges, we construct a collection $\mathcal{S}$ of sets of vertices such that every edge $e$ is contained in at least one set $S\in\mathcal{S}$ that does not contain any other edge of $H$. We call such a collection a unique-edge covering family. Since the number of edges in a hypergraph $H$ with maximum degree $\Delta$ is at most $\Delta n$, there exist unique-edge covering families whose sizes depend on the maximum degree $\Delta$ of $H$ instead of the maximum edge size $d$.

The algorithm for learning a hypermatching $M$ proceeds in phases. In each phase, we construct a unique-edge covering family of a subgraph of $M$ containing all edges whose size is in a specified range. This unique-edge covering family is constructed with i.i.d. sampled sets that contain each vertex, independently, with a probability depending on the edge size range. The edge size range is widened at the end of each phase. One main challenge with hypermatchings is to obtain a near-linear query complexity. We do so by developing a subroutine which, given a set $S$ that covers a unique edge of size at most $s$, finds this unique edge with $\mathcal{O}(s\log|S|)$ queries. For the lower bound for hypermatchings, there is a simple construction that shows that there are no non-adaptive algorithms for learning hypermatchings with $\operatorname{poly}(n)$ queries. The technical part of interest is extending this construction and its analysis to hold for $\Omega(\log\log n)$ rounds. Finally, for low-degree near-uniform hypergraphs $H$, we construct a unique-edge covering family in a similar manner to those for hypermatchings ($\Delta=1$). The main technical difficulty for hypergraphs with maximum degree $\Delta>1$ is in analyzing the number of samples required to construct a unique-edge covering family, which is significantly more involved than for hypermatchings due to overlapping edges.

Related work.

The problem of learning a hidden hypergraph with edge-detecting queries was first introduced by Torney [18] for complex group testing. This problem is also known as exact learning from membership queries of a monotone DNF with at most $m$ monomials, where each monomial contains at most $d$ variables [5, 6, 9].

Regarding hardness results, non-adaptive algorithms for learning hypergraphs with $m$ edges of size at most $d$ require $\Omega(N(m,d)\log n)$ queries, where $N(m,d)=\frac{m+d}{\log\binom{m+d}{d}}\binom{m+d}{d}$ [2, 16]. Angluin and Chen [6] show that $\Omega((2m/d)^{d/2})$ queries are required by any (adaptive) algorithm for learning general hypergraphs, and they later extended this lower bound in [7] to $\Omega((2m/(\delta+2))^{1+\delta/2})$ for learning near-uniform hypergraphs with maximum edge size difference $\delta=d-\min_{e\in H}|e|$.

Regarding adaptive algorithms, Angluin and Chen [7] give an algorithm that learns a hypergraph with $\mathcal{O}(2^{\mathcal{O}((1+\frac{\delta}{2})d)}\cdot m^{1+\frac{\delta}{2}}\cdot \operatorname{poly}(\log n))$ queries in $\mathcal{O}((1+\delta)\min\{2^{d}(\log m+d)^{2},(\log m+d)^{3}\})$ rounds with high probability. When $m<d$, Abasi et al. [2] give a randomized algorithm that achieves a query complexity of $\mathcal{O}(dm\log n+(d/m)^{m-1+o(1)})$ and show an almost matching $\Omega(dm\log n+(d/m)^{m-1})$ lower bound. When $m\geq d$, they present a randomized algorithm that asks $(cm)^{d/2+0.75}+dm\log n$ queries for some constant $c$, almost matching the $\Omega((2m/d)^{d/2})$ lower bound of [6]. Chodoriwsky and Moura [12] develop an adaptive algorithm with $\mathcal{O}(m^{d}\log n)$ queries. Adaptive algorithms with $\mathcal{O}(md(\log_{2}{n}+(md)^{d}))$ queries are constructed in [13]. Regarding non-adaptive algorithms, Chin et al. [11] and Abasi et al. [3] give deterministic non-adaptive algorithms with an almost optimal query complexity $N(m,d)^{1+o(1)}\log n$. The problem of learning a hypergraph when a fraction of the queries are incorrect is studied in Chen et al. [10] and Abasi [1]. We note that, to the best of our knowledge, all existing algorithms require a number of queries that is exponential in $d$, or exponential in $m$ when $m<d$. We develop the first algorithms using $\operatorname{poly}(n)$ queries for learning non-trivial families of hypergraphs that have arbitrarily large edge sizes and a super-constant number of edges $m$.

Finally, learning a matching has been studied in the context of graphs ($d=2$) [4, 8]. In particular, Alon et al. [4] provide a non-adaptive algorithm with $\mathcal{O}(n\log n)$ queries. To the best of our knowledge, there is no previous work on learning hypermatchings, which generalize matchings to hypergraphs of maximum edge size $d>2$. Our lower bound shows that, in contrast to algorithms for learning matchings, algorithms for learning hypermatchings require multiple adaptive rounds. Similar to the algorithm in [4] for learning matchings, our algorithm for learning hypermatchings has a near-linear query complexity.

2 Preliminaries

A hypergraph $H=(V,E)$ consists of a set of $n$ vertices $V$ and a set of $m$ edges $E\subseteq 2^{V}$. We abuse notation and write $e\in H$ for edges $e\in E$. For the problem of learning a hypergraph $H$, the learner is given $V$ and an edge-detecting oracle $Q_{H}$. Given a query $S\subseteq V$, the oracle answers $Q_{H}(S)=1$ if $S$ contains an edge, i.e., there exists $e\in H$ such that $e\subseteq S$; otherwise, $Q_{H}(S)=0$. The goal is to learn the edge set $E$ using a small number of queries. When queries can be evaluated in parallel, we are interested in algorithms with low adaptivity. The adaptivity of an algorithm is measured by the number of sequential rounds of queries it makes, where, in each round, queries are evaluated in parallel. An algorithm is non-adaptive if its adaptivity is $1$.
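For concreteness, the edge-detecting oracle can be viewed as the following minimal Python sketch; the class and method names are ours, purely for illustration.

class EdgeDetectingOracle:
    """Edge-detecting oracle Q_H for a hypergraph H = (V, E)."""

    def __init__(self, edges):
        # Each edge is stored as a frozenset of vertices.
        self.edges = [frozenset(e) for e in edges]

    def query(self, S):
        """Return 1 if the queried vertex set S contains some edge of H, else 0."""
        S = set(S)
        return int(any(e <= S for e in self.edges))

# Example: a hypermatching with edges {1,2} and {3,4,5}.
oracle = EdgeDetectingOracle([{1, 2}, {3, 4, 5}])
assert oracle.query({1, 2, 6}) == 1   # contains the edge {1,2}
assert oracle.query({1, 3, 4}) == 0   # contains no edge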

The degree of a vertex $v$ is the number of edges $e\in H$ such that $v\in e$. The maximum degree $\Delta$ of $H$ is the maximum degree of a vertex $v\in V$. When $\Delta=1$, every vertex is in at most one edge and $H$ is called a hypermatching, which we also denote by $M$. The rank $d$ of $H$ is the maximum size of an edge $e\in H$, i.e., $d:=\max_{e\in H}|e|$. The edge size ratio of a hypergraph $H$ is $d/\min_{e\in H}|e|$. A hypergraph is $\rho$-near-uniform if $d/\min_{e\in H}|e|\leq\rho$ and uniform if $d/\min_{e\in H}|e|=1$.

3 Learning Hypermatchings

In this section, we study the problem of learning hypermatchings, i.e., hypergraphs of maximum degree $\Delta=1$. In Section 3.1, we present an algorithm that, with high probability, learns a hypermatching with $\mathcal{O}(n\log^{5}n)$ queries in $\mathcal{O}(\log^{3}n)$ rounds. The number of rounds can be improved to $\mathcal{O}(\log n)$ at the expense of an $\mathcal{O}(n^{3}\log^{3}n)$ query complexity. In Section 3.2, we show that there is no $o(\log\log n)$-round algorithm that learns hypermatchings with $\operatorname{poly}(n)$ queries. Some proofs are deferred to Appendices A.1 and A.2.

3.1 Learning algorithm for hypermatchings

A central concept we introduce for our learning algorithms is the definition of a unique-edge covering family of a hypergraph $H$, which is a collection of sets such that every edge $e\in H$ is contained in at least one set that does not contain any other edge.

Definition 1.

A collection $\mathcal{S}\subseteq 2^{V}$ is a unique-edge covering family of $H$ if, for every $e\in H$, there exists $S\in\mathcal{S}$ such that $e$ is the unique edge contained in $S$, i.e., $e\subseteq S$ and $e^{\prime}\not\subseteq S$ for all $e^{\prime}\in H$, $e^{\prime}\neq e$.
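As a sanity check on Definition 1, the following small Python helper (ours, for illustration only) tests whether a given collection of vertex sets is a unique-edge covering family of a hypergraph with a known edge set.

def is_unique_edge_covering(collection, edges):
    """Return True iff every edge is the unique edge contained in some set of the collection."""
    edges = [frozenset(e) for e in edges]
    sets = [set(S) for S in collection]
    for e in edges:
        # Look for a set S with e contained in S and no other edge contained in S.
        covers_uniquely = any(
            e <= S and all(not (f <= S) for f in edges if f != e)
            for S in sets
        )
        if not covers_uniquely:
            return False
    return True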

Efficiently searching for the unique edge in a set.

We first show that, given a unique-edge covering family $\mathcal{S}$ of a hypermatching $M$, we can learn $M$. We observe that a set of vertices $S\subseteq V$ contains a unique edge if and only if $Q_{M}(S)=1$ and $Q_{M}(S\setminus\{v\})=0$ for some $v\in S$, and that, if $S$ contains a unique edge, this edge is $e=\{v\in S:Q_{M}(S\setminus\{v\})=0\}$. This observation immediately leads to the following simple algorithm, called FindEdgeParallel.

Algorithm 1 FindEdgeParallel($S$), returns $e$ if $S$ contains a unique edge $e$
0: set $S\subseteq V$ of vertices
1: if $Q_{M}(S)=1$ and $\exists v\in S$ s.t. $Q_{M}(S\setminus\{v\})=0$ then return $\{v\in S:Q_{M}(S\setminus\{v\})=0\}$
2: return None
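A minimal Python sketch of FindEdgeParallel, assuming an oracle object with a query method as in the sketch of Section 2 (this interface is our assumption, not part of the paper):

def find_edge_parallel(oracle, S):
    """Return the unique edge contained in S, or None (Algorithm 1).
    Uses at most |S| + 1 queries, which are independent and could be issued in one round."""
    S = set(S)
    if oracle.query(S) == 0:
        return None
    # S contains a unique edge iff removing some single vertex destroys all edges in S;
    # in that case the edge is exactly the set of such vertices.
    edge = {v for v in S if oracle.query(S - {v}) == 0}
    return edge if edge else None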

The main lemma for FindEdgeParallel follows immediately from the above observation.

Lemma 1.

For any hypermatching $M$ and set $S\subseteq V$, FindEdgeParallel is a non-adaptive algorithm with $|S|+1$ queries that, if $S$ contains a unique edge $e$, returns $e$, and returns None otherwise.

Proof.

It is clear that FindEdgeParallel($S$) makes at most $|S|+1$ queries. If $S$ does not contain an edge, then $Q_{M}(S)=0$ and the algorithm returns None. If $S$ contains at least two edges, then these edges are disjoint because $M$ is a matching, so removing a single vertex from $S$ leaves at least one edge intact. Therefore, for every $v\in S$, $Q_{M}(S\setminus\{v\})=1$, and the algorithm returns None. The only remaining case is when $S$ contains a unique edge $e$. In this case, $Q_{M}(S)=1$ and $e=\{v\in S:Q_{M}(S\setminus\{v\})=0\}$, so FindEdgeParallel($S$) returns $e$. ∎

In order to obtain a near-linear query complexity for learning a hypermatching, the query complexity of FindEdgeParallel needs to be improved. Next, we describe an $\mathcal{O}(\log|S|)$-round algorithm, called FindEdgeAdaptive, which finds the unique edge $e$ in a set $S$ with $\mathcal{O}(s\log|S|)$ queries, assuming $|e|\leq s$. The algorithm recursively partitions $S$ into two arbitrary sets $S_{1}$ and $S_{2}$ of equal size. If $Q_{M}(S_{1})=Q_{M}(S_{2})=1$, then $S$ contains at least two edges. If $Q_{M}(S_{1})=1$ and $Q_{M}(S_{2})=0$ (or, symmetrically, $Q_{M}(S_{1})=0$ and $Q_{M}(S_{2})=1$), then, assuming $S$ contains a single edge, this edge is contained in $S_{1}$ and we recurse on $S_{1}$. The most interesting case is when $Q_{M}(S_{1})=Q_{M}(S_{2})=0$. Assuming $S$ contains a single edge $e$, this implies that both $S_{1}$ and $S_{2}$ contain vertices of $e$, and we recurse on both $S_{1}$ and $S_{2}$. When recursing on $S_{1}$, the set $S_{2}$ needs to be included in future queries to find the vertices in $e\cap S_{1}$ (and vice versa when recursing on $S_{2}$). The algorithm thus also maintains an additional argument $T$ for the recursive calls. This argument is initially the empty set; in the case where $Q_{M}(S_{1})=Q_{M}(S_{2})=0$, the set $S_{2}$ is added to $T$ when recursing on $S_{1}$ and vice versa, and $T$ is included in all queries in the recursive calls.

Algorithm 2 FindEdgeAdaptive($S,s$), returns $e$ if $S$ contains a unique edge $e$ and $|e|\leq s$
0: set $S\subseteq V$, edge size $s$
1: if $Q_{M}(S)=0$ then return None
2: $\mathcal{S}\leftarrow\{(S,\emptyset)\}$, $\hat{e}\leftarrow\{\}$
3: while $|\mathcal{S}|>0$ do
4:   if $|\mathcal{S}|>s$ then return None
5:   $\mathcal{S}^{\prime}\leftarrow\{\}$
6:   for each $(S^{\prime},T)\in\mathcal{S}$ do (in parallel)
7:     $S_{1},S_{2}\leftarrow$ partition $S^{\prime}$ into two sets of size $|S^{\prime}|/2$
8:     if $|S^{\prime}|=1$ then add $v\in S^{\prime}$ to $\hat{e}$
9:     else if $Q_{M}(S_{1}\cup T)=1$ and $Q_{M}(S_{2}\cup T)=1$ then return None
10:    else if $Q_{M}(S_{1}\cup T)=1$ and $Q_{M}(S_{2}\cup T)=0$ then add $(S_{1},T)$ to $\mathcal{S}^{\prime}$
11:    else if $Q_{M}(S_{1}\cup T)=0$ and $Q_{M}(S_{2}\cup T)=1$ then add $(S_{2},T)$ to $\mathcal{S}^{\prime}$
12:    else if $Q_{M}(S_{1}\cup T)=0$ and $Q_{M}(S_{2}\cup T)=0$ then add $(S_{1},S_{2}\cup T),(S_{2},S_{1}\cup T)$ to $\mathcal{S}^{\prime}$
13:   $\mathcal{S}\leftarrow\mathcal{S}^{\prime}$
14: if $|\hat{e}|>s$, or $Q_{M}(\hat{e})=0$, or $Q_{M}(\hat{e}\setminus\{v\})=1$ for some $v\in\hat{e}$ then return None
15: return $\hat{e}$
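The following Python sketch mirrors the pseudocode of FindEdgeAdaptive, again under the illustrative oracle interface sketched in Section 2; the partition into two halves is arbitrary, as in Step 7.

def find_edge_adaptive(oracle, S, s):
    """Sketch of Algorithm 2: find the unique edge of size at most s contained in S."""
    S = list(S)
    if oracle.query(S) == 0:
        return None
    active = [(S, set())]                      # pairs (S', T) as in the pseudocode
    e_hat = set()
    while active:
        if len(active) > s:                    # S holds several edges or an edge larger than s
            return None
        next_active = []
        for S_prime, T in active:              # one round of parallel queries per while-iteration
            if len(S_prime) == 1:
                e_hat.add(S_prime[0])
                continue
            half = len(S_prime) // 2
            S1, S2 = S_prime[:half], S_prime[half:]
            q1, q2 = oracle.query(set(S1) | T), oracle.query(set(S2) | T)
            if q1 == 1 and q2 == 1:            # two distinct edges detected
                return None
            if q1 == 1:
                next_active.append((S1, T))
            elif q2 == 1:
                next_active.append((S2, T))
            else:                              # the edge straddles S1 and S2: recurse on both
                next_active.append((S1, set(S2) | T))
                next_active.append((S2, set(S1) | T))
        active = next_active
    if (len(e_hat) > s or oracle.query(e_hat) == 0
            or any(oracle.query(e_hat - {v}) == 1 for v in e_hat)):
        return None
    return e_hat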

Let $\mathcal{S}^{i}=\{(S^{i}_{1},T^{i}_{1}),\ldots,(S^{i}_{\ell},T^{i}_{\ell})\}$ be the collection $\mathcal{S}$ at the beginning of the $i$th iteration of the while loop of FindEdgeAdaptive($S,s$). In the case where $S$ contains a single edge $e$ and $|e|\leq s$, the algorithm maintains two invariants (formally stated in Lemma 2; proof in Appendix A.1). The first is that, for all vertices $v\in e$ and all iterations $i$, either $v\in\hat{e}$ or there exists a unique $j$ such that $v$ is contained in the set $S^{i}_{j}$. This invariant ensures that every vertex $v\in e$ will eventually be added to $\hat{e}$. The second is that every set $S^{i}_{j}$ contains at least one vertex $v\in e$. This invariant explains why, in the base case $|S^{i}_{j}|=1$, we add the single vertex in $S^{i}_{j}$ to the solution $\hat{e}$. Additionally, since, by construction, $S^{i}_{1},\ldots,S^{i}_{\ell}$ are disjoint, the second invariant also implies that when $|\mathcal{S}^{i}|>s$, either the set $S$ contains more than one edge or it contains a single edge of size greater than $s$, in which case the algorithm returns None. This stopping condition ensures that at most $2s$ queries are performed in each iteration of the while loop.

Lemma 2.

Assume there is a unique edge $e\in M$ such that $e\subseteq S$ and that $|e|\leq s$. Then, for every vertex $v\in e$ and at every iteration $i$ of the while loop of FindEdgeAdaptive($S,s$), we have that (1) either $v\in\hat{e}$ or $v\in S^{i}_{j}$ for a unique $j\in[\ell]$, and (2) $e\cap S^{i}_{j}\neq\emptyset$ for all $j\in[\ell]$.

We show that if FindEdgeAdaptive($S,s$) returns $\hat{e}$, then $\hat{e}$ is an edge, and that if there is a unique edge $e\subseteq S$ with $|e|\leq s$, then the algorithm returns $\hat{e}=e$.

Lemma 3.

For any $S\subseteq V$ and $s\in[n]$, if there is a unique edge $e\in M$ such that $e\subseteq S$ and this edge satisfies $|e|\leq s$, then FindEdgeAdaptive($S,s$) returns the edge $e$. Otherwise, it returns either None or an edge $e\in M$. FindEdgeAdaptive($S,s$) uses at most $2s\log_{2}|S|+(s+1)$ queries in at most $\log_{2}|S|$ rounds.

Proof.

First, assume that there is a unique edge $e\in M$ such that $e\subseteq S$ and that $|e|\leq s$. Consider $v\in S$. If $v\in e$, then, by Lemma 2, at every iteration $i$ of the while loop, either $v\in\hat{e}$ or $v\in S^{i}_{j}$ for some $j\in[\ell]$. Since $|S^{i}_{j}|=|S^{i-1}_{j^{\prime}}|/2$ for all $i,j,j^{\prime}$, at iteration $i^{\star}=\log_{2}n$, either $v\in\hat{e}$ or $v\in S^{i^{\star}}_{j}$ for some $j$ with $|S^{i^{\star}}_{j}|=1$, in which case $v$ is then added to $\hat{e}$ by the algorithm. Next, if $v\not\in e$, then $v$ is never added to $\hat{e}$: a vertex is only added when it is the single element of some $S^{i}_{j}$, and since $e\cap S^{i}_{j}\neq\emptyset$ for all $i,j$ by Lemma 2, such a singleton must be a vertex of $e$. Thus, FindEdgeAdaptive($S,s$) returns the edge $e$.

Assume now that $S$ does not contain any edge. In this case, every query is answered by zero, and FindEdgeAdaptive($S,s$) returns None. Finally, assume that $S$ contains at least two edges. If FindEdgeAdaptive($S,s$) does not return None, then Step 14 ensures that the returned set $\hat{e}$ has size at most $s$. Step 14 also ensures that $\hat{e}$ contains at least one edge. If $\hat{e}$ strictly contains an edge, then there is a vertex $v\in\hat{e}$ such that $Q_{M}(\hat{e}\setminus\{v\})=1$, in which case FindEdgeAdaptive($S,s$) would have returned None. Therefore, $\hat{e}$ is an edge of $M$ of size at most $s$.

We now show that FindEdgeAdaptive($S,s$) runs in $\log_{2}|S|$ rounds and makes at most $2s\log_{2}|S|+(s+1)$ queries. At every iteration of the while loop, FindEdgeAdaptive($S,s$) makes at most $2s$ queries (recall that if $|\mathcal{S}|>s$ then the algorithm returns None). We now show that the number of iterations of the while loop is at most $\log_{2}|S|$. At every iteration of the while loop, for every pair $(S^{\prime},T)$ in $\mathcal{S}$, the size of $S^{\prime}$ is divided by $2$. After $\log_{2}|S|$ iterations, either $\mathcal{S}$ is empty or every pair $(S^{\prime},T)$ in $\mathcal{S}$ has $|S^{\prime}|=1$, which guarantees that no sets are added to $\mathcal{S}^{\prime}$. Therefore, the number of iterations of the while loop is at most $\log_{2}|S|$. This also shows that the adaptive complexity of FindEdgeAdaptive($S,s$) is at most $\log_{2}|S|$. After the while loop, we make at most $s+1$ queries, and the total number of queries is at most $2s\log_{2}|S|+s+1$. ∎

Constructing a unique-edge covering family.

Next, we give an algorithm called FindDisjointEdges that, for any hypermatching $M$, constructs a unique-edge covering family $\mathcal{S}$ of the $\alpha$-near-uniform hypermatching $M_{s,\alpha,V^{\prime}}$, defined as the subgraph of $M$ over the edges of size between $s/\alpha$ and $s$ and over the vertices in $V^{\prime}$ that are not in an edge of size less than $s/\alpha$. The unique-edge covering family $\mathcal{S}$ is constructed with i.i.d. samples that contain each vertex in $V^{\prime}$, independently, with probability $n^{-\alpha/s}$. The algorithm then calls a FindEdge subroutine, which is either FindEdgeParallel or FindEdgeAdaptive, on the samples that contain at least one edge.

Algorithm 3 FindDisjointEdges($s,\alpha,V^{\prime}$, FindEdge), returns edges of size between $s/\alpha$ and $s$
0: edge size $s$, parameter $\alpha\geq 1$, set of vertices $V^{\prime}\subseteq V$, subroutine FindEdge
1: $\hat{M}\leftarrow\emptyset$
2: for $i=1,\ldots,n^{\alpha}\log^{2}{n}$ do (in parallel)
3:   $S_{i}\leftarrow$ set of independently sampled vertices from $V^{\prime}$, each with probability $n^{-\alpha/s}$
4:   if $Q_{M}(S_{i})=1$ then
5:     $e\leftarrow$ FindEdge($S_{i},s$)
6:     if $e$ is not None then add $e$ to $\hat{M}$
7: return $\hat{M}$
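A Python sketch of FindDisjointEdges under the same illustrative interface; here $n$ is the total number of vertices of the hypergraph, which controls both the number of samples and the sampling probability (and is passed explicitly).

import math
import random

def find_disjoint_edges(oracle, s, alpha, V_prime, find_edge, n):
    """Sketch of Algorithm 3: learn the edges of size in [s/alpha, s] inside V_prime."""
    learned = set()
    num_samples = int(n ** alpha * math.log(n) ** 2)
    p = n ** (-alpha / s)                        # per-vertex inclusion probability
    for _ in range(num_samples):                 # samples are independent; queries parallelizable
        sample = {v for v in V_prime if random.random() < p}
        if oracle.query(sample) == 1:
            e = find_edge(oracle, sample, s)     # FindEdgeParallel or FindEdgeAdaptive
            if e is not None:
                learned.add(frozenset(e))
    return learned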

We show that $n^{\alpha}\log^{2}n$ such samples suffice to construct a unique-edge covering family $\mathcal{S}$ of the $\alpha$-near-uniform hypermatching $M_{s,\alpha,V^{\prime}}$.

Lemma 4.

Assume $s/\alpha\geq 2$, $\alpha\geq 1$, that the FindEdge subroutine uses at most $q$ queries in at most $r$ rounds, and let $\mathcal{S}=\{S_{i}\}_{i}$ be the $n^{\alpha}\log^{2}{n}$ samples. Then FindDisjointEdges is an $(r+1)$-adaptive algorithm with $n^{\alpha}\log^{2}n+2\alpha qn^{\alpha}\log^{2}n/s$ queries such that, with probability $1-\mathcal{O}(n^{-\log n})$, $\mathcal{S}$ is a unique-edge covering family of the hypermatching $M_{s,\alpha,V^{\prime}}=\{e\in M:e\subseteq V^{\prime},\ s/\alpha\leq|e|\leq s\}$ and thus $\hat{M}=M_{s,\alpha,V^{\prime}}$.

Proof.

We know that a sample $S_{i}$ contains at least one edge if and only if $Q_{M}(S_{i})=1$. We also know from Lemmas 1 and 3 that if $S_{i}$ contains a unique edge $e$ of size at most $s$, then the subroutine FindEdge($S_{i},s$) returns $e$. Therefore, to learn all edges of size in $[s/\alpha,s]$, we only need to make sure that each such edge appears in at least one sample that does not contain any other edge, in other words, that $\mathcal{S}$ is a unique-edge covering family for $M_{s,\alpha,V^{\prime}}=\{e\in M:e\subseteq V^{\prime},\ s/\alpha\leq|e|\leq s\}$. Below, we show that this is the case with probability $1-O(n^{-\log n})=1-e^{-\Omega(\log^{2}n)}$.

We first show that, with high probability, each edge $e$ of size $|e|\in[s/\alpha,s]$ is contained in at least $\log^{2}n/2$ of the samples $S_{i}$. We use $X_{e}$ to denote the number of samples containing $e$; then we have $\mathbb{E}(X_{e})=n^{\alpha}\log^{2}n\cdot n^{-\frac{\alpha|e|}{s}}\geq n^{\alpha}\log^{2}n\cdot n^{-\alpha}=\log^{2}n$. By a Chernoff bound, $P(X_{e}\leq\log^{2}n/2)\leq n^{-\log n/8}$. As there are at most $\alpha n/s$ edges of size between $s/\alpha$ and $s$, by a union bound, $P(\exists e\in M\text{ s.t. }|e|\in[s/\alpha,s],\ X_{e}\leq\log^{2}n/2)\leq\frac{\alpha n}{s}n^{-\log n/8}\leq n^{-(\log n/8-1)}$, and we get $P(\forall e\in M\text{ s.t. }|e|\in[s/\alpha,s],\ X_{e}\geq\log^{2}n/2)\geq 1-n^{-(\log n/8-1)}$. From now on, we condition on the event that all edges $e$ whose size is between $s/\alpha$ and $s$ are included in at least $\log^{2}n/2$ samples. We show below that, given $e\subseteq S_{i}$ for a sample $S_{i}$, the probability that $S_{i}$ contains another edge $e^{\prime}\neq e$ is upper bounded by $\alpha/s$. Recall that $M$ is the hidden matching we want to learn. We abuse notation and write $e^{\prime}\in M\cap S_{i}$ to denote that an edge $e^{\prime}$ is both in the matching $M$ and included in the sample $S_{i}$. We have

$$\mathbb{P}(\exists e^{\prime}\in M\cap S_{i},\ e^{\prime}\neq e\ |\ e\subseteq S_{i})\ \leq\ \sum_{e^{\prime}\in M,\ e^{\prime}\neq e}\mathbb{P}(e^{\prime}\subseteq S_{i}\ |\ e\subseteq S_{i})\ =\ \sum_{e^{\prime}\in M,\ e^{\prime}\neq e}\mathbb{P}(e^{\prime}\subseteq S_{i})\ \leq\ \frac{\alpha n}{s}\cdot\left(n^{-\frac{\alpha}{s}}\right)^{\frac{s}{\alpha}}\ =\ \frac{\alpha}{s},$$

where the first inequality uses a union bound, the equality is due to the fact that $M$ is a matching and thus $e\cap e^{\prime}=\emptyset$ and that vertices are sampled independently, and the second inequality follows because the total number of remaining edges is upper bounded by $\alpha n/s$ and each edge is in $S_{i}$ with probability at most $(n^{-\frac{\alpha}{s}})^{\frac{s}{\alpha}}$. As each edge $e$ of size between $s/\alpha$ and $s$ is contained in at least $\log^{2}n/2$ samples, we have

$$\mathbb{P}(\forall S_{i}\text{ s.t. }e\subseteq S_{i},\ \exists\ e^{\prime}\in M\cap S_{i},\ e^{\prime}\neq e)\leq\left(\frac{s}{\alpha}\right)^{-\log^{2}n/2}\leq n^{-\log n/4},$$

where the last inequality holds since $s/\alpha\geq 2$. By another union bound over the edges of size between $s/\alpha$ and $s$, $\mathbb{P}(\exists e\text{ s.t. }|e|\in[s/\alpha,s],\ \forall S_{i}\text{ s.t. }e\subseteq S_{i},\ \exists e^{\prime}\in M\cap S_{i},\ e^{\prime}\neq e)\leq n^{-(\log n/2-1)}$. We can thus conclude that, with probability at least $1-O(n^{-\log n})=1-e^{-\Omega(\log^{2}n)}$, for every $e\in M$ of size between $s/\alpha$ and $s$, there exists at least one sample $S_{i}$ that contains $e$ but no other remaining edge. By Lemmas 1 and 3, $e$ is returned by FindEdge($S_{i},s$), and $e$ is added to the matching $\hat{M}$ returned by FindDisjointEdges($s,\alpha,V^{\prime}$, FindEdge).

Next, we show that FindDisjointEdges is an $(r+1)$-adaptive algorithm that requires $n^{\alpha}\log^{2}n+2\alpha qn^{\alpha}\log^{2}n/s$ queries. We first observe that FindDisjointEdges makes only parallel calls to FindEdge (after verifying that $Q_{M}(S_{i})=1$). Therefore, since FindEdge runs in at most $r$ rounds, FindDisjointEdges runs in at most $r+1$ rounds. To bound the number of queries, we first argue, using a Chernoff bound, that with probability $1-e^{-\Omega(\log^{2}n)}$, every edge $e$ of size at least $s/\alpha$ is in at most $2n^{\alpha-1}\log^{2}n$ samples $S_{i}$. Therefore, there are at most $\frac{2\alpha n^{\alpha}}{s}\log^{2}n$ samples $S_{i}$ such that $Q_{M}(S_{i})=1$. For each of these samples, we call FindEdge($S_{i},s$), which makes at most $q$ queries by assumption. Therefore, with probability $1-e^{-\Omega(\log^{2}n)}$, FindDisjointEdges makes $\mathcal{O}(n^{\alpha}\log^{2}n+\alpha q\frac{2n^{\alpha}}{s}\log^{2}n)$ queries. ∎

The main algorithm for hypermatchings.

The main algorithm first finds the edges of size $1$ by brute force with FindSingletons. It then iteratively learns the edges of size between $s/\alpha$ and $s$ by calling FindDisjointEdges($s,\alpha,V^{\prime}$, FindEdge), where $V^{\prime}$ is the set of vertices that are not in edges learned in previous iterations. At the end of each iteration, the algorithm increases $s$. We obtain the main theorem for learning hypermatchings, the proof of which is given in Appendix A.1.

Algorithm 4 FindMatching($\alpha$, FindEdge), learns a hypermatching
0: parameter $\alpha\geq 1$, subroutine FindEdge
1: $\hat{M}\leftarrow$ FindSingletons, $s\leftarrow 2\alpha$
2: while $s<n$ do
3:   $V^{\prime}\leftarrow V\setminus\{v:v\in e\text{ for some }e\in\hat{M}\}$
4:   $\hat{M}\leftarrow\hat{M}\cup$ FindDisjointEdges($s,\alpha,V^{\prime}$, FindEdge)
5:   $s\leftarrow\lfloor\alpha s\rfloor+1$
6: return $\hat{M}$
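A Python sketch of FindMatching, with a brute-force FindSingletons; it builds on the helper sketches above (the oracle interface and subroutine signatures are ours, for illustration).

import math

def find_singletons(oracle, V):
    """Brute force for edges of size 1: one query per vertex."""
    return {frozenset({v}) for v in V if oracle.query({v}) == 1}

def find_matching(oracle, V, alpha, find_edge):
    """Sketch of Algorithm 4: learn a hypermatching over the vertex set V."""
    n = len(V)
    matching = find_singletons(oracle, V)
    s = 2 * alpha
    while s < n:
        covered = {v for e in matching for v in e}
        V_prime = set(V) - covered               # drop vertices of already learned edges
        matching |= find_disjoint_edges(oracle, s, alpha, V_prime, find_edge, n)
        s = math.floor(alpha * s) + 1
    return matching

# Example usage with the sketches above:
#   oracle = EdgeDetectingOracle([{0, 1, 2}, {3, 4}, {5, 6, 7, 8}])
#   find_matching(oracle, range(12), alpha=2,
#                 find_edge=lambda o, S, s: find_edge_parallel(o, S))
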
Theorem 5.

Let $M$ be a hypermatching. Then FindMatching learns $M$ with high probability, either in $\mathcal{O}(\log^{3}n)$ rounds using $\mathcal{O}(n\log^{5}n)$ queries, with $\alpha=1/(1-1/(2\log n))$ and FindEdgeAdaptive, or in $\mathcal{O}(\log n)$ rounds using $\mathcal{O}(n^{3}\log^{3}n)$ queries, with $\alpha=2$ and FindEdgeParallel.

3.2 Hardness of learning hypermatchings

In this section, we show that the adaptive complexity of learning a hypermatching with $\operatorname{poly}(n)$ queries is $\Omega(\log\log n)$. This result is in sharp contrast to matchings (where edges are of size $d=2$), for which there exists a non-adaptive learning algorithm with $\mathcal{O}(n\log n)$ queries [4].

Warm-up: hardness for non-adaptive learning.

As a warm-up, we present a simple construction of a family of hypermatchings for which there is no non-adaptive learning algorithm with $\operatorname{poly}(n)$ queries. Each hypermatching $M_{P}$ in this family depends on a partition $P=(P_{1},P_{2},P_{3})$ of the vertex set $V$ into three parts $P_{1},P_{2},P_{3}$ where $|P_{1}|=n-(\sqrt{n}+1)$, $|P_{2}|=1$, and $|P_{3}|=\sqrt{n}$. $M_{P}$ contains $(n-(\sqrt{n}+1))/2$ edges of size $2$ which form a perfect matching over $P_{1}$, and one edge of size $\sqrt{n}$ which contains all the vertices in $P_{3}$. The only vertex in $P_{2}$ is not contained in any edge of $M_{P}$. The main idea of the construction is that, after one round of $\operatorname{poly}(n)$ queries, vertices in $P_{2}$ and $P_{3}$ are indistinguishable to the learning algorithm. However, the algorithm needs to distinguish the vertices in $P_{3}$ from the vertex in $P_{2}$ to learn the edge $P_{3}$.
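For illustration, the following small Python sketch (ours, not from the paper) samples a hypermatching $M_{P}$ from this hard family; for simplicity it assumes that $\sqrt{n}$ is an integer and that $|P_{1}|$ is even.

import math
import random

def sample_hard_hypermatching(n):
    """Sample M_P for a uniformly random partition (P1, P2, P3) as in the warm-up construction."""
    r = math.isqrt(n)                       # assumed to be exactly sqrt(n)
    vertices = list(range(n))
    random.shuffle(vertices)
    P3 = vertices[:r]                       # one edge of size sqrt(n)
    P2 = vertices[r:r + 1]                  # a single isolated vertex
    P1 = vertices[r + 1:]                   # perfect matching of size-2 edges (|P1| assumed even)
    edges = [frozenset(P3)]
    edges += [frozenset(P1[i:i + 2]) for i in range(0, len(P1) - 1, 2)]
    return edges, (P1, P2, P3)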

Theorem 6.

There is no non-adaptive algorithm with $\operatorname{poly}(n)$ query complexity that can learn every non-uniform hypermatching with probability $\omega(1/\sqrt{n})$.

Proof.

Let $\mathcal{P}_{3}$ be the set of all possible partitions $(P_{1},P_{2},P_{3})$ such that $|P_{1}|=n-(\sqrt{n}+1)$, $|P_{2}|=1$, and $|P_{3}|=\sqrt{n}$. We consider a uniformly random partition $P$ from $\mathcal{P}_{3}$, the matching $M_{P}$, and a non-adaptive algorithm $\mathcal{A}$ that asks a collection $\mathcal{S}$ of $\operatorname{poly}(n)$ non-adaptive queries. The main lemma (Lemma 2) shows that, with high probability, for all queries $S\in\mathcal{S}$, the answer $Q_{M_{P}}(S)$ is independent of which $\sqrt{n}$ vertices in $P_{2}\cup P_{3}$ constitute the edge $P_{3}$. The analysis of this lemma consists of two cases. If the query $S$ is small ($|S|<(n+\sqrt{n}+1)/2$), then we show that $S$ contains fewer than $\sqrt{n}$ vertices from $P_{2}\cup P_{3}$, w.h.p. over the randomization of $P$, by a Chernoff bound. Therefore, $S$ does not contain $P_{3}$ and $Q_{M_{P}}(S)$ is independent of the partition of $P_{2}\cup P_{3}$ into $P_{2}$ and $P_{3}$. If $S$ is large ($|S|\geq(n+\sqrt{n}+1)/2$), then $S$ contains an edge of size two from $P_{1}$. Thus, $Q_{M_{P}}(S)=1$ and $Q_{M_{P}}(S)$ is independent of the partition of $P_{2}\cup P_{3}$ into $P_{2}$ and $P_{3}$.

In Lemma 3, we argue that if all queries $Q_{M_{P}}(S)$ of $\mathcal{A}$ are independent of the partition of $P_{2}\cup P_{3}$ into $P_{2}$ and $P_{3}$, then, with high probability, the matching returned by the algorithm is not equal to $M_{P}$. By the probabilistic method, there is a matching $M_{P}$ with $P\in\mathcal{P}_{3}$ that $\mathcal{A}$ does not successfully learn with probability $1-\mathcal{O}(1/\sqrt{n})$. ∎

Hardness of learning in $o(\log\log n)$ rounds.

The main technical challenge is to generalize the construction and the analysis of the hard family of hypergraphs for non-adaptive learning algorithms to a construction and an analysis that hold over $\Omega(\log\log n)$ rounds. In this construction, each hypermatching $M_{P}$ depends on a partition $P=(P_{0},P_{1},\ldots,P_{R})$ of the vertex set $V$ into $R+1$ parts. For each $i\in\{0,\ldots,R\}$, $M_{P}$ contains $|P_{i}|/d_{i}$ edges of size $d_{i}$ which form a perfect matching over $P_{i}$. We set the sizes such that $d_{0}=3$, $d_{i+1}=3\log^{2}n\cdot d_{i}^{2}$, and $|P_{i}|=3\log^{2}n\cdot d_{i}^{2}$.

The main idea of the construction is that, after $i$ rounds of queries, vertices in $\cup_{j=i+1}^{R}P_{j}$ are indistinguishable to any algorithm. However, the algorithm needs to distinguish vertices in $P_{j}$ from vertices in $P_{j^{\prime}}$, for all $j\neq j^{\prime}$, to learn $M_{P}$. Informally, since the edges in $P_{j^{\prime}}$ are larger than the edges in $P_{j}$ for $j^{\prime}>j$, an algorithm can learn only one part $P_{j}$ in each round of queries, namely the one with edges of minimum size among all parts that have not yet been learned in previous rounds.
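As a rough sanity check (ours, not part of the formal analysis), these choices only leave room for $\Theta(\log\log n)$ parts: writing $c=3\log^{2}n$ and unrolling the recursion gives $d_{i}=c^{2^{i}-1}\cdot 3^{2^{i}}$, so the edge sizes grow doubly exponentially in $i$. Since $|P_{i}|=3\log^{2}n\cdot d_{i}^{2}=d_{i+1}$, fitting the parts into the $n$ vertices requires in particular $d_{R+1}\leq n$, hence $2^{R+1}=O(\log n/\log\log n)$ and $R=O(\log\log n)$; conversely, on the order of $\log\log n$ parts can indeed be packed, which is what the $\Omega(\log\log n)$-round lower bound exploits.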

Theorem 7.

There is no $(\log\log n-3)$-adaptive algorithm with $\operatorname{poly}(n)$ query complexity that can learn every non-uniform hypermatching with probability $e^{-o(\log^{2}n)}$.

Proof sketch (full proof in Appendix A.2).

Let $\mathcal{P}_{R}$ be the set of all partitions $(P_{0},\ldots,P_{R})$ such that $|P_{i}|=3\log^{2}n\cdot d_{i}^{2}$, let $P$ be a uniformly random partition from $\mathcal{P}_{R}$, and let $\mathcal{A}$ be an algorithm with $\operatorname{poly}(n)$ queries. In Lemma 8, we show that if vertices in $P_{i},P_{i+1},\ldots,P_{R}$ are indistinguishable to $\mathcal{A}$ at the beginning of round $i$ of queries, then vertices in $P_{i+1},\ldots,P_{R}$ are indistinguishable at the end of round $i$.

The proof of Lemma 8 consists of two parts. First, Lemma 7 shows that if $S$ is small ($|S|\leq(1-1/d_{i})\sum_{j\geq i}|P_{j}|$) and is independent of the partition of $\cup_{j=i}^{R}P_{j}$ into $P_{i},\ldots,P_{R}$, then, w.h.p., for every $j\geq i+1$, $S$ does not contain any edge contained in $P_{j}$. Second, Lemma 6 shows that if a query $S$ is large ($|S|\geq(1-1/d_{i})\sum_{j\geq i}|P_{j}|$) and is independent of the partition of $\cup_{j=i}^{R}P_{j}$ into $P_{i},\ldots,P_{R}$, then, w.h.p., $S$ contains at least one edge from the matching on $P_{i}$, which implies $Q_{M_{P}}(S)=1$. Proving Lemma 6 is quite involved: because the edges of a matching are not independent from each other, a Chernoff bound analysis is not enough to show that the probability that a query does not contain any edge from $P_{i}$ is small. We therefore provide an alternative method to bound this probability (Lemma 5). In the end, in both cases, we show that $Q_{M_{P}}(S)$ is independent of the partition of $\cup_{j=i+1}^{R}P_{j}$ into $P_{i+1},\ldots,P_{R}$. Finally, we conclude the proof of Theorem 7 similarly to that of Theorem 6. ∎

4 Learning Low-Degree Near-Uniform Hypergraphs

In this section, we give an algorithm for learning low-degree near-uniform hypergraphs. The algorithm is non-adaptive and has $\mathcal{O}((2n)^{\rho\Delta+1}\log^{2}n)$ query complexity for learning a hypergraph $H=(V,E)$ of maximum degree $\Delta$ and edge size ratio $\rho$. It generalizes FindDisjointEdges and constructs a unique-edge covering family, but its analysis is completely different, and significantly more challenging, due to overlapping edges. Full proofs are deferred to Appendix B.

Near-uniformity is necessary.

The $\mathcal{O}(n\log^{5}n)$ query complexity from the previous section for $\Delta=1$ holds for any hypermatching $M$, even those with edge size ratio $\rho=\Omega(n)$. In contrast, for general $\Delta$, we obtain an $\mathcal{O}(n^{\rho\Delta+1}\log^{2}n)$ query complexity. Angluin and Chen [7] show that $\Omega((2m/d)^{d/2})$ queries are required, even for fully adaptive algorithms, to learn hypergraphs of maximum edge size $d$. Their hardness result holds for a family of hypergraphs of maximum degree $\Delta=2$ and edges of size $2$ or $d$, i.e., with $\rho=d/2$. Thus, it implies that $\Omega((m/\rho)^{\rho})$ queries are required to learn hypergraphs even when $\Delta=2$, i.e., an exponential dependence on $\rho$ is required.

The algorithm.

FindLowDegreeEdges constructs a unique-edge covering family of the hypergraph $H$ similarly to FindDisjointEdges, which requires a larger number of samples when $\Delta>1$. In addition, steps 5-6 are needed to ensure that intersections of edges are not falsely identified as edges since, when $S_{i}$ contains two or more edges that all overlap in $S^{\prime}_{i}\subseteq S_{i}$, we have that $S^{\prime}_{i}=\{v\in S_{i}:Q_{H}(S_{i}\setminus\{v\})=0\}$.

Algorithm 5 FindLowDegreeEdges, learns a $\rho$-near-uniform hypergraph with max degree $\Delta$
0: edge size ratio $\rho$, maximum edge size $d$, maximum degree $\Delta\geq 2$
1: $\hat{H}\leftarrow\emptyset$
2: for $i=1,\ldots,(2n)^{\rho\Delta}\log^{2}n$ do (in parallel)
3:   $S_{i}\leftarrow$ set of independently sampled vertices, each with probability $p=(2n)^{-\frac{\rho}{d}}$
4:   if $Q_{H}(S_{i})=1$ and $\exists v$ s.t. $Q_{H}(S_{i}\setminus\{v\})=0$ then add $\{v\in S_{i}:Q_{H}(S_{i}\setminus\{v\})=0\}$ to $\hat{H}$
5: for $e\in\hat{H}$ do (in parallel)
6:   if $\exists\ e^{\prime}\in\hat{H}$ s.t. $e\subset e^{\prime}$ then remove $e$ from $\hat{H}$
7: return $\hat{H}$
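A non-adaptive Python sketch of FindLowDegreeEdges under the same illustrative oracle interface; note that the number of samples is polynomial in $n$ only when $\rho\Delta$ is constant.

import math
import random

def find_low_degree_edges(oracle, V, rho, d, Delta):
    """Sketch of Algorithm 5: learn a rho-near-uniform hypergraph of maximum degree Delta."""
    n = len(V)
    p = (2 * n) ** (-rho / d)                          # per-vertex sampling probability
    num_samples = int((2 * n) ** (rho * Delta) * math.log(n) ** 2)
    candidates = set()
    for _ in range(num_samples):                       # all queries can be issued in one round
        S = {v for v in V if random.random() < p}
        if oracle.query(S) == 1:
            e = frozenset(v for v in S if oracle.query(S - {v}) == 0)
            if e:
                candidates.add(e)
    # Steps 5-6: discard candidates that are strict subsets of another candidate,
    # since these may be intersections of overlapping edges rather than edges.
    return {e for e in candidates if not any(e < f for f in candidates)}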

Recall that for hypermatchings, FindDisjointEdges is used as a subroutine in an adaptive procedure that learns edges of any size. For hypergraphs, however, we cannot call FindLowDegreeEdges iteratively and remove vertices in learned edges after each call because a large edge could overlap with a small edge.

The analysis.

The main technical challenge in this section is to analyze the sample complexity required to obtain, w.h.p., a unique-edge covering family. The main lemma bounds the probability that, conditioned on a sample $S$ containing an edge $e$, this sample contains another edge $e^{\prime}$.

Lemma 1.

If a sample set $S$ is constructed by independently sampling each vertex w.p. $p=(2n)^{-\frac{\rho}{d}}$ from a hypergraph $H$ with maximum degree $\Delta\geq 2$, maximum edge size $d$, edge size ratio $\rho$, and $n\geq 100$ vertices, then, for any edge $e\in H$, $\mathbb{P}(\exists e^{\prime}\subseteq S,\ e^{\prime}\in H,\ e^{\prime}\neq e\ |\ e\subseteq S)\leq 1-\left(\frac{\log n}{2n}\right)^{(\Delta-1)\rho}$.

Proof sketch (full proof in Appendix B.1). Given an edge $e\subseteq S$, the proof of Lemma 1 relies on analyzing the following two mathematical programs, whose optimal values are used to bound $\mathbb{P}(\exists e^{\prime}\subseteq S,\ e^{\prime}\in H,\ e^{\prime}\neq e\ |\ e\subseteq S)$.

$$\min_{a_{ij}}\ \prod_{j=d/\rho}^{d}\prod_{i=0}^{j-1}(1-p^{j-i})^{a_{ij}}\quad\text{s.t.}\quad\sum_{j=d/\rho}^{d}\sum_{i=0}^{j-1}i\cdot a_{ij}\leq(\Delta-1)d,\quad\sum_{j=d/\rho}^{d}\sum_{i=0}^{j-1}j\cdot a_{ij}\leq\Delta n,\quad a_{ij}\geq 0,$$

$$\max_{a_{ij}}\ \sum_{j=d/\rho}^{d}\sum_{i=0}^{j-1}a_{ij}\cdot p^{j-i}\quad\text{s.t.}\quad\sum_{j=d/\rho}^{d}\sum_{i=0}^{j-1}i\cdot a_{ij}\leq(\Delta-1)d,\quad\sum_{j=d/\rho}^{d}\sum_{i=0}^{j-1}j\cdot a_{ij}\leq\Delta n,\quad a_{ij}\geq 0.$$

The variables $a_{ij}$ denote the number of edges of size $j$ intersecting $e$ in $i$ vertices, and we assume $|e|=d$. The two programs have identical constraints. The first constraint is that the sum, over edges $e^{\prime}\neq e$, of the size of the intersection of $e$ and $e^{\prime}$ has to be at most $(\Delta-1)d$, since $e$ has size $d$ and each vertex has degree at most $\Delta$. The second constraint is that the sum, over edges $e^{\prime}\neq e$, of the size of $e^{\prime}$ has to be at most $\Delta n$, since there are $n$ vertices of degree at most $\Delta$.

For the objectives, we note that $p^{j-i}$ is the probability that a fixed collection of $j-i$ vertices is in a sample. Thus, $1-p^{j-i}$ is the probability that an edge $e^{\prime}$ of size $j$ intersecting $e$ in $i$ vertices is not contained in a sample $S$, conditioned on $e\subseteq S$. If the events $e_{1}\subseteq S\ |\ e\subseteq S$ and $e_{2}\subseteq S\ |\ e\subseteq S$ were independent for all $e_{1},e_{2}\neq e$ (which is not the case), the probability that there is no other edge in $S$, conditioned on $e\subseteq S$, would be equal to the objective of the left program. The objective of the right program is an upper bound on $\mathbb{P}(\exists e^{\prime}\subseteq S,\ e^{\prime}\in H,\ e^{\prime}\neq e\ |\ e\subseteq S)$ by a union bound over all edges $e^{\prime}\neq e$. We denote the optimal values of the left and right programs by $f^{\bullet}(\Delta,p,d,\rho)$ and $LP^{\bullet}(\Delta,p,d,\rho)$, respectively. The main steps of the proof are the following.

1. In Lemma 9, we show that the unfavorable event is more likely to happen under the edge independence assumption:
$$\mathbb{P}(\exists e^{\prime}\subseteq S,\ e^{\prime}\in H,\ e^{\prime}\neq e\ |\ e\subseteq S)\leq 1-f^{\bullet}(\Delta,p,d,\rho)\qquad(1)$$
by using induction and Bayes' rule to formalize the intuition that the events $\{e^{\prime}\not\subseteq S\}$ for $e^{\prime}\neq e$ are positively correlated because edges can share common vertices.

2. In Lemma 12, we show that, for $\Delta\geq 2$,
$$f^{\bullet}(\Delta,p,d,\rho)\geq f^{\bullet}(2,p,d/\rho,1)^{\rho(\Delta-1)}\qquad(2)$$
by deriving a closed-form optimal solution of $f^{\bullet}(\Delta,p,d,\rho)$, which maximizes $a_{ij}$ for $i\in\{0,d/\rho-1\}$, $j=d/\rho$ and sets the remaining $a_{ij}=0$.

3. In Lemma 13, we show that
$$f^{\bullet}(2,p,d/\rho,1)\geq 1-LP^{\bullet}(2,p,d/\rho,1).\qquad(3)$$

4. In Lemma 15, we show that
$$1-LP^{\bullet}(2,p,d/\rho,1)\geq\frac{\log n}{2n}\qquad(4)$$
by deriving a closed-form solution to $LP^{\bullet}(2,p,d/\rho,1)$ and showing that it can be upper bounded by $\frac{d/\rho}{d/\rho-1}n^{-\rho/d}+\frac{1}{d/\rho}$, and that $\frac{d/\rho}{d/\rho-1}n^{-\rho/d}+\frac{1}{d/\rho}\leq 1-\frac{\log n}{2n}$ for $n\geq 100$.

5. Combining Eqs. (1), (2), (3), and (4), we get the desired bound. $\blacksquare$

We are now ready to present the main result for this section (see Appendix B.2 for the full proof).

Theorem 8.

For any $\rho$-near-uniform hypergraph $H$ with maximum degree $\Delta\geq 2$, maximum edge size $d$, and number of vertices $n\geq 100$, FindLowDegreeEdges correctly learns $H$ with probability $1-o(1)$. Furthermore, it is non-adaptive and makes at most $\mathcal{O}((2n)^{\rho\Delta+1}\log^{2}n)$ queries.

We note that if the exact parameters $\rho$, $d$, and $\Delta$ are unknown and only upper bounds $\bar{\rho}\geq\rho$, $\bar{d}\geq d$, $\bar{\Delta}\geq\Delta$ are available, then, provided $\bar{\rho}$ is large enough that $\bar{d}/\bar{\rho}\leq\min_{e\in H}|e|$, FindLowDegreeEdges with inputs $\bar{\rho},\bar{d},\bar{\Delta}$ also learns $H$ with probability $1-o(1)$.

Acknowledgements

We thank Meghan Pantalia and Matthew Ulgherait for helpful discussions on finding drug combinations that reduce cancer cell viability.

References

  • Abasi [2018] Hasan Abasi. Error-tolerant non-adaptive learning of a hidden hypergraph. In 43rd International Symposium on Mathematical Foundations of Computer Science (MFCS 2018). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018.
  • Abasi et al. [2014] Hasan Abasi, Nader H Bshouty, and Hanna Mazzawi. On exact learning monotone DNF from membership queries. In International Conference on Algorithmic Learning Theory, pages 111–124. Springer, 2014.
  • Abasi et al. [2018] Hasan Abasi, Nader H. Bshouty, and Hanna Mazzawi. Non-adaptive learning of a hidden hypergraph. Theoretical Computer Science, 716:15 – 27, 2018. ISSN 0304-3975. doi: https://doi.org/10.1016/j.tcs.2017.11.019. URL http://www.sciencedirect.com/science/article/pii/S0304397517308496. Special Issue on ALT 2015.
  • Alon et al. [2004] Noga Alon, Richard Beigel, Simon Kasif, Steven Rudich, and Benny Sudakov. Learning a hidden matching. SIAM Journal on Computing, 33(2):487–501, 2004.
  • Angluin [1988] Dana Angluin. Queries and concept learning. Machine learning, 2(4):319–342, 1988.
  • Angluin and Chen [2004] Dana Angluin and Jiang Chen. Learning a hidden graph using O(log n) queries per edge. In International Conference on Computational Learning Theory, pages 210–223. Springer, 2004.
  • Angluin and Chen [2006] Dana Angluin and Jiang Chen. Learning a hidden hypergraph. Journal of Machine Learning Research, 7:2215–2236, December 2006. ISSN 1532-4435.
  • Beigel et al. [2001] Richard Beigel, Noga Alon, Simon Kasif, Mehmet Serkan Apaydin, and Lance Fortnow. An optimal procedure for gap closing in whole genome shotgun sequencing. In Proceedings of the fifth annual international conference on Computational biology, pages 22–30, 2001.
  • Bshouty [2018] Nader H Bshouty. Exact learning from an honest teacher that answers membership queries. Theoretical Computer Science, 733:4–43, 2018.
  • Chen et al. [2008] Hong-Bin Chen, Hung-Lin Fu, and Frank K Hwang. An upper bound of the number of tests in pooling designs for the error-tolerant complex model. Optimization Letters, 2(3):425–431, 2008.
  • Chin et al. [2013] Francis YL Chin, Henry CM Leung, and Siu-Ming Yiu. Non-adaptive complex group testing with multiple positive sets. Theoretical Computer Science, 505:11–18, 2013.
  • Chodoriwsky and Moura [2015] Jacob Chodoriwsky and Lucia Moura. An adaptive algorithm for group testing for complexes. Theoretical Computer Science, 592:1 – 8, 2015. ISSN 0304-3975. doi: https://doi.org/10.1016/j.tcs.2015.05.005. URL http://www.sciencedirect.com/science/article/pii/S0304397515003965.
  • D’yachkov et al. [2016] A. G. D’yachkov, I. V. Vorobyev, N. A. Polyanskii, and V. Y. Shchukin. On multistage learning a hidden hypergraph. In 2016 IEEE International Symposium on Information Theory (ISIT), pages 1178–1182, 2016.
  • Flobak et al. [2019] Åsmund Flobak, Barbara Niederdorfer, Vu To Nakstad, Liv Thommesen, Geir Klinkenberg, and Astrid Lægreid. A high-throughput drug combination screen of targeted small molecule inhibitors in cancer cell lines. Scientific data, 6(1):1–10, 2019.
  • Gao et al. [2006] Hong Gao, Frank K Hwang, My T Thai, Weili Wu, and Taieb Znati. Construction of d(H)-disjunct matrix for group testing in hypergraphs. Journal of Combinatorial Optimization, 12(3):297–301, 2006.
  • Hwang and Du [2006] Frank Kwang-ming Hwang and Ding-zhu Du. Pooling designs and nonadaptive group testing: important tools for DNA sequencing, volume 18. World Scientific, 2006.
  • Menden et al. [2019] Michael P Menden, Dennis Wang, Mike J Mason, Bence Szalai, Krishna C Bulusu, Yuanfang Guan, Thomas Yu, Jaewoo Kang, Minji Jeon, Russ Wolfinger, et al. Community assessment to advance computational prediction of cancer drug combinations in a pharmacogenomic screen. Nature communications, 10(1):1–17, 2019.
  • Torney [1999] David C. Torney. Sets pooling designs. Annals of Combinatorics, 3(1):95–101, 1999. doi: 10.1007/BF01609879. URL https://doi.org/10.1007/BF01609879.

Appendix A Missing Analysis for Hypermatchings (Section 3)

A.1 Missing analysis for algorithm for learning hypermatchings (Section 3.1)

Lemma 1 (restated). For any hypermatching $M$ and set $S\subseteq V$, FindEdgeParallel is a non-adaptive algorithm with $|S|+1$ queries that, if $S$ contains a unique edge $e$, returns $e$, and returns None otherwise.

Proof.

It is clear that FindEdgeParallel($S$) makes at most $|S|+1$ queries. If $S$ does not contain an edge, then $Q_{M}(S)=0$ and the algorithm returns None. If $S$ contains at least two edges, then these edges are disjoint because $M$ is a matching, so removing a single vertex from $S$ leaves at least one edge intact. Therefore, for every $v\in S$, $Q_{M}(S\setminus\{v\})=1$, and the algorithm returns None. The only remaining case is when $S$ contains a unique edge $e$. In this case, $Q_{M}(S)=1$ and $e=\{v\in S:Q_{M}(S\setminus\{v\})=0\}$, so FindEdgeParallel($S$) returns $e$. ∎

Let $\mathcal{S}^{i}=\{(S^{i}_{1},T^{i}_{1}),\ldots,(S^{i}_{\ell},T^{i}_{\ell})\}$ be $\mathcal{S}$ at the beginning of the $i$th iteration of the while loop of FindEdgeAdaptive($S,s$). To prove Lemma 3, we need the following three loop invariants of FindEdgeAdaptive.

Lemma 2 (restated). Assume there is a unique edge $e\in M$ such that $e\subseteq S$ and that $|e|\leq s$. Then, for every vertex $v\in e$ and at every iteration $i$ of the while loop of FindEdgeAdaptive($S,s$), we have that (1) either $v\in\hat{e}$ or $v\in S^{i}_{j}$ for a unique $j\in[\ell]$, and (2) $e\cap S^{i}_{j}\neq\emptyset$ for all $j\in[\ell]$.

Proof.

To simplify the proof, we actually show the following three loop invariants, where the second is an intermediary step to show the third invariant:

1. either $v\in\hat{e}$ or $v\in S^{i}_{j}$ for some $j\in[\ell]$;

2. $Q_{M}(S^{i}_{j}\cup T^{i}_{j})=1$ for all $j\in[\ell]$;

3. $e\cap S^{i}_{j}\neq\emptyset$ for all $j\in[\ell]$.

We prove the three invariants by induction on $i$.

Base case: at the beginning of the first iteration, $\mathcal{S}^{1}=\{(S,\emptyset)\}$ and $\hat{e}=\emptyset$, and we have the following: 1) for every $v\in e$, we have $v\in S$; 2) $Q_{M}(S\cup\emptyset)=1$; 3) $e\cap S=e\neq\emptyset$.

Assume that the statement of the lemma holds at the beginning of iteration $i\geq 1$. We show that it holds at the beginning of iteration $i+1$. Let $\mathcal{S}^{i+1}=\{(S^{i+1}_{1},T^{i+1}_{1}),\ldots,(S^{i+1}_{k},T^{i+1}_{k})\}$ be $\mathcal{S}$ at the beginning of the $(i+1)$th iteration of the while loop of FindEdgeAdaptive($S,s$).

1. Consider a vertex $v\in e$. If at the beginning of the $i$th iteration we had $v\in\hat{e}$, then $v\in\hat{e}$ at the beginning of the $(i+1)$th iteration as well. Suppose now that $v\in S^{i}_{j}$ for some $j\in[\ell]$ at the start of iteration $i$. If $|S^{i}_{j}|=1$, then $v$ is added to $\hat{e}$ at Step 8. Assume $|S^{i}_{j}|>1$, and let $T=T^{i}_{j}$. Then $S^{i}_{j}$ is partitioned into two sets $S_{1}$ and $S_{2}$. If $Q_{M}(S_{1}\cup T)=1$ and $Q_{M}(S_{2}\cup T)=1$, then this contradicts the fact that $S$ contains a unique edge: without loss of generality, we can assume that $v\in S_{1}$, so $S_{2}\cup T$ cannot include $e$, and the only way $Q_{M}(S_{2}\cup T)=1$ is if $S$ contains another edge. If $Q_{M}(S_{1}\cup T)=1$ and $Q_{M}(S_{2}\cup T)=0$, then we know that $v\in S_{1}$ and $(S_{1},T)$ is added to $\mathcal{S}^{i+1}$. If $Q_{M}(S_{1}\cup T)=0$ and $Q_{M}(S_{2}\cup T)=1$, then we know that $v\in S_{2}$ and $(S_{2},T)$ is added to $\mathcal{S}^{i+1}$. If $Q_{M}(S_{1}\cup T)=0$ and $Q_{M}(S_{2}\cup T)=0$, then both $(S_{1},S_{2}\cup T)$ and $(S_{2},S_{1}\cup T)$ are added to $\mathcal{S}^{i+1}$, and we have either $v\in S_{1}$ or $v\in S_{2}$.

  2. 2.

    From steps 9 to 11 in FindEdgeAdaptive(S,s), a pair (S^{i+1}_j, T^{i+1}_j) is only added to 𝒮^{i+1} when Q_M(S^{i+1}_j\cup T^{i+1}_j)=1. In step 12, we have Q_M(S_1\cup T)=0 and Q_M(S_2\cup T)=0, and both (S_1, S_2\cup T) and (S_2, S_1\cup T) are added to 𝒮^{i+1}. But we know from the induction assumption that at the beginning of iteration i we had (S_1\cup S_2, T)\in 𝒮^i, and therefore, by invariant 2, Q_M(S_1\cup S_2\cup T)=1. This ensures that, in this case too, (S^{i+1}_j, T^{i+1}_j) is only added to 𝒮^{i+1} when Q_M(S^{i+1}_j\cup T^{i+1}_j)=1.

  3. 3.

    We know from the induction hypothesis that e\cap S^i_j\neq\emptyset for all j\in[\ell]. For a fixed j\in[\ell], S^i_j contains a vertex v from e. In the while loop, S^i_j is partitioned into S_1 and S_2, such that v is either in S_1 or in S_2. If Q_M(S_1\cup T^i_j)=1, then v\in S_1, (S_1, T^i_j) is added to 𝒮^{i+1} and S_1\cap e\neq\emptyset. Similarly, if Q_M(S_2\cup T^i_j)=1, then v\in S_2, (S_2, T^i_j) is added to 𝒮^{i+1} and S_2\cap e\neq\emptyset. Assume now that Q_M(S_1\cup T^i_j)=Q_M(S_2\cup T^i_j)=0; then (S_1, S_2\cup T^i_j) and (S_2, S_1\cup T^i_j) are added to 𝒮^{i+1}, and we need to show that S_1\cap e\neq\emptyset and S_2\cap e\neq\emptyset. We know from invariant 2 that Q_M(S_1\cup S_2\cup T^i_j)=1. Therefore Q_M(S_1\cup T^i_j)=Q_M(S_2\cup T^i_j)=0 implies that there is a vertex of e both in S_1 and in S_2, hence S_1\cap e\neq\emptyset and S_2\cap e\neq\emptyset. This concludes this part and shows that whenever (S^{i+1}_j, T^{i+1}_j) is added to 𝒮^{i+1}, we have S^{i+1}_j\cap e\neq\emptyset. This case analysis is also illustrated by the sketch following the proof.
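The case analysis above corresponds to the splitting step of FindEdgeAdaptive. Below is a minimal Python sketch of that step; the names find_edge_adaptive and query_M are illustrative, and the size threshold s and the parallelization of queries within a round, which the actual algorithm uses, are omitted for brevity.

    def find_edge_adaptive(S, query_M):
        # Each pair (S_j, T_j) maintains the invariants proved above:
        # Q_M(S_j union T_j) = 1 and S_j intersects the unique edge e contained in S.
        e_hat = set()
        pairs = [(set(S), set())]
        while pairs:
            new_pairs = []
            for Sj, Tj in pairs:
                if len(Sj) == 1:
                    e_hat |= Sj                  # a singleton S_j must be a vertex of e
                    continue
                vs = sorted(Sj)
                S1, S2 = set(vs[:len(vs) // 2]), set(vs[len(vs) // 2:])
                q1, q2 = query_M(S1 | Tj), query_M(S2 | Tj)
                if q1 and q2:
                    return None                  # S contains more than one edge
                if q1:
                    new_pairs.append((S1, Tj))   # the edge survives in S1
                elif q2:
                    new_pairs.append((S2, Tj))   # the edge survives in S2
                else:
                    new_pairs.append((S1, S2 | Tj))  # e is split between S1 and S2,
                    new_pairs.append((S2, S1 | Tj))  # so both halves keep a vertex of e
            pairs = new_pairs
        return e_hat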

See 4

Proof.

We know that a sample S_i contains at least one edge if and only if Q_M(S_i)=1. We also know from Lemma 1 and Lemma 3 that if S_i contains a unique edge e of size at most s, then the subroutine FindEdge(S_i,s) returns e. Therefore, to learn all edges of size in [s/α, s], we only need to make sure that each such edge appears in at least one sample that does not contain any other edge; in other words, that 𝒮 is a unique-edge covering family for M_{s,α,V'}=\{e\in M:\ e\subseteq V',\ s/α\leq|e|\leq s\}. Below, we show that this is the case with probability 1-O(n^{-\log n})=1-e^{-\Omega(\log^2 n)}.

We first show that with high probability, each edge ee of size |e|[s/α,s]|e|\in\left[s/\alpha,s\right] is contained in at least (log2n)/2(\log^{2}n)/2 sample SiS_{i}’s. We use XeX_{e} to denote the number of samples containing ee, then we have

𝔼(Xe)=nαlog2nnα|e|slog2n.\mathbb{E}(X_{e})=n^{\alpha}\log^{2}n\cdot n^{-\frac{\alpha|e|}{s}}\geq\log^{2}n.

By Chernoff bound, we have

P(Xelog2n/2)P(Xe𝔼(Xe)/2)e𝔼(Xe)8nlogn8.P(X_{e}\leq\log^{2}n/2)\leq P(X_{e}\leq\mathbb{E}(X_{e})/2)\leq e^{-\frac{\mathbb{E}(X_{e})}{8}}\leq n^{-\frac{\log n}{8}}.

As there are at most αn/s\alpha n/s edges of size between s/αs/\alpha and ss, by a union bound, we have

P(eM s.t. |e|[s/α,s],Xe(log2n)/2)αnsnlogn/8n(logn/81).P(\exists e\in M\text{ s.t. }|e|\in\left[s/\alpha,s\right],X_{e}\leq(\log^{2}n)/2)\leq\frac{\alpha n}{s}n^{-\log n/8}\leq n^{-(\log n/8-1)}.

Subsequently,

P(eM s.t. |e|[s/α,s],Xe(log2n)/2)1n(logn/81).P(\forall e\in M\text{ s.t. }|e|\in\left[s/\alpha,s\right],X_{e}\geq(\log^{2}n)/2)\geq 1-n^{-(\log n/8-1)}.

From now on we condition on the event that all edges e whose size is between s/α and s are included in at least (\log^2 n)/2 samples. We show below that, given e\subseteq S_i for a sample S_i, the probability that S_i contains another remaining edge is upper bounded by α/s. Recall that M is the hidden matching we want to learn. We abuse notation and use e'\in M\cap S_i to denote that an edge e' is both in the matching M and included in the sample S_i. We have

(eMSi,ee|eSi)\displaystyle\mathbb{P}(\exists e^{\prime}\in M\cap S_{i},e^{\prime}\neq e\ |\ e\subseteq S_{i})\leq eM,ee(eSi|eSi)\displaystyle\sum_{e^{\prime}\in M,e^{\prime}\neq e}\mathbb{P}(e^{\prime}\subseteq S_{i}\ |\ e\subseteq S_{i}) (5)
=\displaystyle= eM,ee(eSi)\displaystyle\sum_{e^{\prime}\in M,e^{\prime}\neq e}\mathbb{P}(e^{\prime}\subseteq S_{i}) (6)
\displaystyle\leq αns(nαs)sα\displaystyle\frac{\alpha n}{s}\cdot(n^{-\frac{\alpha}{s}})^{\frac{s}{\alpha}} (7)
=\displaystyle= αnsnαsαs=αs.\displaystyle\frac{\alpha n}{s}\cdot n^{-{\frac{\alpha s}{\alpha s}}}=\frac{\alpha}{s}.

where Eq.(5) uses union bound. Eq.(6) is due to the fact that MM is a matching and thus ee=e\cap e^{\prime}=\emptyset and vertices are sampled independently. Eq.(7) follows because the total number of remaining edges is upper bounded by αn/s\alpha n/s and each edge is in SiS_{i} with probability at most (nαs)sα(n^{-\frac{\alpha}{s}})^{\frac{s}{\alpha}}.

As each edge ee with size between s/αs/\alpha and ss is contained in at least (log2n)/2(\log^{2}n)/2 samples, we have that

(Si s.t. eSi,eMSi,ee)(sα)log2n/2nlogn/4,\displaystyle\mathbb{P}(\forall S_{i}\text{ s.t. }e\subseteq S_{i},\exists\ e^{\prime}\in M\cap S_{i},e^{\prime}\neq e)\leq(\frac{s}{\alpha})^{-\log^{2}n/2}\leq n^{-\log n/4},

where the last inequality follows because s/α2s/\alpha\geq 2.

By another union bound on all edges of size between s/αs/\alpha and ss (at most nn of them), we have that

\mathbb{P}(\exists e\text{ s.t. }|e|\in\left[s/\alpha,s\right],\ \forall S_{i}\text{ s.t. }e\subseteq S_{i},\ \exists e^{\prime}\in M\cap S_{i},e^{\prime}\neq e)\leq n^{-(\log n/4-1)}.

We can thus conclude that with probability at least 1O(nlogn)=1eΩ(log2n)1-O(n^{-\log n})=1-e^{-\Omega(\log^{2}n)}, for all eMe\in M with size between s/αs/\alpha and ss, there exists at least one sample SiS_{i} that contains ee but no other remaining edges. By Lemma 1 and Lemma 3, ee is returned by FindEdge(Si,s)\textsc{FindEdge}(S_{i},s) and ee is added to the matching M^\hat{M} that is returned by FindDisjointEdges(s,α,V)\textsc{FindDisjointEdges}(s,\alpha,V^{\prime}).

Next, we show that FindDisjointEdges is an (r+1)-adaptive algorithm with n^{\alpha}\log^2 n + 2\alpha q n^{\alpha}\log^2 n/s queries. We first observe that FindDisjointEdges makes only parallel calls to FindEdge (after verifying that Q_M(S_i)=1). Therefore, since FindEdge runs in at most r rounds, we get that FindDisjointEdges runs in at most r+1 rounds. To prove a bound on the number of queries, we first argue using a Chernoff bound that with probability 1-e^{-\Omega(\log^2 n)}, every edge e of size at least s/α is in at most 2n^{\alpha-1}\log^2 n samples S_i. For a fixed edge e we have:

P(Xe>2nα1log2n)\displaystyle P(X_{e}>2n^{\alpha-1}\log^{2}n) P(Xe>2𝔼(Xe))\displaystyle\leq P(X_{e}>2\cdot\mathbb{E}(X_{e}))
e𝔼(Xe)3\displaystyle\leq e^{\frac{-\mathbb{E}(X_{e})}{3}}
elog2n3=eΩ(log2n).\displaystyle\leq e^{\frac{-\log^{2}n}{3}}=e^{-\Omega(\log^{2}n)}.

A union bound over the at most n edges yields that every edge of size at least s/α is in at most 2n^{\alpha-1}\log^2 n samples with probability 1-e^{-\Omega(\log^2 n)}. Therefore, there are at most \frac{2\alpha n}{s}n^{\alpha-1}\log^2 n samples S_i such that Q_M(S_i)=1. For every one of these samples, we call FindEdge(S_i,s), which makes q queries by assumption. Therefore, with high probability 1-e^{-\Omega(\log^2 n)}, FindDisjointEdges makes 𝒪(n^{\alpha}\log^2 n + \alpha q\frac{2n^{\alpha}}{s}\log^2 n) queries. ∎
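The sampling step analysed in this proof can be sketched as follows in Python. This is an illustration of the analysis rather than the paper's pseudocode; the names find_disjoint_edges, query_M and find_edge are assumed interfaces, and the constants follow the analysis above.

    import math
    import random

    def find_disjoint_edges(V, s, alpha, query_M, find_edge):
        # Draw n^alpha * log^2 n vertex samples, keeping each vertex independently
        # with probability n^(-alpha/s), then search each positive sample for a unique edge.
        V = list(V)
        n = len(V)
        num_samples = math.ceil(n ** alpha * math.log(n) ** 2)
        p = n ** (-alpha / s)
        found = []
        for _ in range(num_samples):
            S = {v for v in V if random.random() < p}
            if query_M(S) == 1:                      # S contains at least one edge
                e = find_edge(S, s)                  # succeeds when S contains a unique edge of size <= s
                if e is not None and s / alpha <= len(e) <= s and e not in found:
                    found.append(e)
        return found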

The following claim is a technical result we need to prove Theorem 5.

Claim 1.

For 0x<10\leq x<1 we have 11xex\frac{1}{1-x}\geq e^{x}.

Proof.

Consider the function f(x)=11xexf(x)=\frac{1}{1-x}-e^{x} for x<1x<1. The derivative of ff is

f(x)=1(1x)2ex=11x(11x(1x)ex).f^{\prime}(x)=\frac{1}{(1-x)^{2}}-e^{x}=\frac{1}{1-x}\cdot\left(\frac{1}{1-x}-(1-x)e^{x}\right).

Because 1xex1-x\leq e^{-x}, we have (1x)ex1(1-x)e^{x}\leq 1. Therefore,

f(x)11x(11x1)0,f^{\prime}(x)\geq\frac{1}{1-x}\cdot\left(\frac{1}{1-x}-1\right)\geq 0,

where the last inequality follows from the assumption that 0\leq x<1. Since f(0)=0 and f'(x)\geq 0 on [0,1), we get f(x)\geq 0, i.e., \frac{1}{1-x}\geq e^{x} for all 0\leq x<1. ∎
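As a quick numerical sanity check of Claim 1 (not part of the proof), one can verify the inequality on a grid:

    import math
    # Check 1/(1 - x) >= e^x on a grid of x in [0, 1).
    assert all(1 / (1 - x) >= math.exp(x) for x in (i / 1000 for i in range(1000)))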

See 5

Proof.

FindSingletons learns, with probability 1, all the edges of size s=1. For edges of size at least 2, we know from Lemma 4 that each call of FindDisjointEdges(s,α,V',FindEdge) fails to find all edges in V' of size between s/α and s with probability at most e^{-\Omega(\log^2 n)}. As there are at most n different edge sizes, FindDisjointEdges is called at most n times (we show below that it is actually called at most \log n/(\log α) times). Thus, the probability that at least one of the calls fails is upper bounded by

n\cdot e^{-\Omega(\log^{2}n)}=e^{\log n-\Omega(\log^{2}n)}=e^{-\Omega(\log^{2}n)}.

We can thus conclude that the probability that all calls are successful is at least 1eΩ(log2n)1-e^{-\Omega(\log^{2}n)}.

Next, we show that FindMatching makes 𝒪(lognlogα)\mathcal{O}(\frac{\log n}{\log\alpha}) calls to FindDisjointEdges.

We use s_t to denote the value of s after the t-th call of FindDisjointEdges, for t=1,2,\ldots,T, where T is the total number of calls made by FindMatching, and we set s_0=2α. We show that the algorithm needs at most O(\frac{\log n}{\log α}) calls to go over all the possible edge sizes of the matching. In step 5 of FindMatching, we update s_t as follows:

stαst1+1s_{t}\leftarrow\lfloor\alpha s_{t-1}\rfloor+1

Therefore

st\displaystyle s_{t} αst1,\displaystyle\geq\alpha s_{t-1}, (8)

and

s_{t}\geq\alpha^{t}s_{0}. (9)

Since the maximum ss we input to FindDisjointEdges is nn, we have that

nsT2(α)T+1n\geq s_{T}\geq 2(\alpha)^{T+1}

and subsequently

Tlognlog2logα.T\leq\frac{\log n-\log 2}{\log\alpha}.

Therefore, FindMatching makes 𝒪(\frac{\log n}{\log α}) calls to FindDisjointEdges. From Lemma 4, we know that with high probability 1-e^{-\Omega(\log^2 n)}, FindDisjointEdges(s,α,V',FindEdge) runs in r+1 rounds and makes 𝒪(n^{\alpha}\log^2 n + 2\alpha q n^{\alpha}\log^2 n/s) queries when FindEdge(S,s) runs in r rounds and makes q queries. From Lemma 1 and Lemma 3 we know that for the subroutines FindEdgeAdaptive and FindEdgeParallel we have r=𝒪(\log n), q=𝒪(s\log n) and r=1, q=𝒪(n) respectively. Therefore FindDisjointEdges runs in 𝒪(\log n) rounds and makes 𝒪(n^{\alpha}\log^2 n + 2n^{\alpha}\log^3 n) queries with FindEdgeAdaptive, and runs in 2 rounds with 𝒪(n^{\alpha}\log^2 n + 2n^{\alpha+1}\log^2 n) queries with FindEdgeParallel. We conclude that with high probability,

  • FindMatching runs in 𝒪(log2nlogα)\mathcal{O}(\frac{\log^{2}n}{\log\alpha}) rounds and makes 𝒪(nαlog3nlogα+2αnαlog4nlogα)\mathcal{O}(n^{\alpha}\frac{\log^{3}n}{\log\alpha}+2\alpha n^{\alpha}\frac{\log^{4}n}{\log\alpha}) queries with FindEdgeAdaptive,

  • FindMatching runs in 𝒪(lognlogα)\mathcal{O}(\frac{\log n}{\log\alpha}) rounds and makes 𝒪(nαlog3nlogα+2nα+1log3nlogα)\mathcal{O}(n^{\alpha}\frac{\log^{3}n}{\log\alpha}+2n^{\alpha+1}\frac{\log^{3}n}{\log\alpha}) queries with FindEdgeParallel.

Now we turn our attention to particular values of α\alpha.

  • Suppose now that α=1/(112logn)\alpha=1/(1-\frac{1}{2\log n}) and that we use FindEdgeAdaptive. We get then α=1+o(1)\alpha=1+o(1). Furthermore, by Claim 1, we have

    logα\displaystyle\log\alpha =log1112logn\displaystyle=\log\frac{1}{1-\frac{1}{2\log n}}
    12logn.\displaystyle\geq\frac{1}{2\log n}.

    Therefore we get that 1logα=O(logn)\frac{1}{\log\alpha}=O(\log n). We also observe that for n3n\geq 3

    α\displaystyle\alpha =1+12logn112logn\displaystyle=1+\frac{\frac{1}{2\log n}}{1-\frac{1}{2\log n}}
    1+1logn.\displaystyle\leq 1+\frac{1}{\log n}.

    Therefore nαnn1logn=𝒪(n)n^{\alpha}\leq n\cdot n^{\frac{1}{\log n}}=\mathcal{O}(n). We finally conclude that when α=1/(112logn)\alpha=1/(1-\frac{1}{2\log n}) and we use the subroutine FindEdgeAdaptive, FindMatching runs in 𝒪(log2n/logα)=𝒪(log3n)\mathcal{O}(\log^{2}n/\log\alpha)=\mathcal{O}(\log^{3}n) adaptive rounds and makes 𝒪(nlog4n+nlog5n)=𝒪(nlog5n)\mathcal{O}(n\log^{4}n+n\log^{5}n)=\mathcal{O}(n\log^{5}n) queries in total.

  • With α=2\alpha=2 and FindEdgeParallel, we have r=1r=1 and we get that FindMatching learns any hypermatching w.h.p. in O(logn)O(\log n) rounds and with at most 𝒪(n3log3n)\mathcal{O}(n^{3}\log^{3}n) queries.
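The outer loop analysed in this proof can be sketched as follows in Python, reusing the find_disjoint_edges sketch given after the proof of Lemma 4. The initial threshold and the treatment of FindSingletons (here replaced by querying each vertex individually) are illustrative assumptions rather than the paper's exact pseudocode.

    import math

    def find_matching(V, alpha, query_M, find_edge, find_disjoint_edges):
        # Sketch of the outer loop of FindMatching.
        M_hat = [{v} for v in V if query_M({v})]          # edges of size 1
        remaining = set(V) - {v for e in M_hat for v in e}
        s = 2                                             # initial edge-size threshold (illustrative choice)
        while remaining and s <= len(V):
            edges = find_disjoint_edges(remaining, s, alpha, query_M, find_edge)
            M_hat.extend(edges)
            remaining -= {v for e in edges for v in e}
            s = math.floor(alpha * s) + 1                 # step 5: geometric growth of the threshold
        return M_hat

For instance, calling this with α=2 and the find_edge_parallel sketch as the FindEdge subroutine corresponds to the O(\log n)-round variant discussed in the second bullet above.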

A.2 Missing analysis for hardness of learning hypermatchings (Section 3.2)

A.2.1 Lower bound for non-adaptive algorithms

To argue that learning hypermatchings non-adaptively requires an exponential number of queries, we fix the number of vertices n and construct a family of matchings M_P that depend on a partition P of the vertex set [n] into 3 parts P_1,P_2,P_3 such that

  • |P1|=n(n+1)|P_{1}|=n-(\sqrt{n}+1).

  • P1P_{1} contains a perfect matching M1M_{1} with edges of size 2.

  • |P_2|=1.

  • P3P_{3} is an edge of the matching s.t. |P3|=n|P_{3}|=\sqrt{n}.

  • MP=M1{P3}M_{P}=M_{1}\cup\{P_{3}\}.

We denote the set of all such partitions by 𝒫_3. For a partition P\in 𝒫_3, multiple matchings are possible, depending on the perfect matching M_1. We use M_P to denote a random matching among all the possible matchings satisfying the properties above. The main idea of the construction is that, after one round of queries, elements in P_2 and P_3 are indistinguishable to any algorithm that makes polynomially many queries. However, a learning algorithm needs to distinguish the elements in P_3 from the element in P_2 to learn the edge P_3.
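For illustration, the following Python snippet samples a partition from 𝒫_3 together with a matching M_P as described above; it assumes for simplicity that √n is an integer and that n-√n-1 is even, so that the sizes work out exactly.

    import math
    import random

    def sample_hard_instance(n):
        # Sample (P1, P2, P3) with |P1| = n - sqrt(n) - 1, |P2| = 1, |P3| = sqrt(n),
        # together with the matching M_P = M1 union {P3}.
        r = math.isqrt(n)
        V = list(range(n))
        random.shuffle(V)
        P1, P2, P3 = V[: n - r - 1], V[n - r - 1: n - r], V[n - r:]
        M1 = [{P1[2 * i], P1[2 * i + 1]} for i in range(len(P1) // 2)]  # perfect matching of size-2 edges on P1
        return (P1, P2, P3), M1 + [set(P3)]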

Before presenting the proof, we formalize what it means for a set of elements to be indistinguishable. Since the next definition is also used in the next subsection, we consider an arbitrary family 𝒫_R of partitions P=(P_0,\ldots,P_R). Given a partition (P_0,\ldots,P_R) and an index r, we consider the union P_r\cup P_{r+1}\cup\cdots\cup P_R of the parts P_i with i\geq r. Informally, we say that queries Q_{M_P}(S) are independent of the partition of P_r\cup\cdots\cup P_R if the values Q_{M_P}(S) of these queries contain no information about which elements are in P_r, P_{r+1},\ldots, or P_R.

Definition 2.

Given a family of partitions 𝒫_R and a partition P=(P_0,\ldots,P_R)\in 𝒫_R, let P' be a partition chosen uniformly at random from \{(P_0',\ldots,P_R')\in 𝒫_R\ :\ P_0'=P_0,\ldots,P_{r-1}'=P_{r-1}\}, and let M_{P'} be a matching on P' such that M_P and M_{P'} coincide on P_0\cup\cdots\cup P_{r-1}. A query Q_{M_P}(S) is independent of the partition of P_r\cup\cdots\cup P_R if Q_{M_P}(S)=Q_{M_{P'}}(S) with high probability over P'.

For example, in the partition (P1,P2,P3)(P_{1},P_{2},P_{3}) above, any query QMP(S)Q_{M_{P}}(S) such that SS contains an edge from M1M_{1} is independent of the partition of P2P3P_{2}\cup P_{3} since this query will always answer 1.

Analysis of the construction. We consider a non-adaptive algorithm 𝒜, a uniformly random partition P=(P_1,P_2,P_3) in 𝒫_3, and a matching M_P. We argue that, with high probability over both the randomization of P and the decisions of 𝒜, the matching returned by 𝒜 is not equal to M_P. The analysis consists of two main parts.

The first part argues that, with high probability, for all queries SS made by a non-adaptive algorithm, QMP(S)Q_{M_{P}}(S) is independent of the partition of P2P3P_{2}\cup P_{3}.

Lemma 2.

Let P be a uniformly random partition from 𝒫_3. For any collection 𝒮 of poly(n) non-adaptive queries, with high probability 1-e^{-\Omega(\sqrt{n})}, for all S\in 𝒮, Q_{M_P}(S) is independent of the partition of P_2\cup P_3.

Proof.

Consider any set S𝒮S\in\mathcal{S} which is independent of the randomization of PP. There are two cases depending on the size of SS.

  1. 1.

    |S|\leq\frac{n+\sqrt{n}+1}{2}. In this case, if S contains fewer than \sqrt{n} vertices from P_2\cup P_3, then for any other partition P' such that P_1'=P_1, we have Q_{M_P}(S)=Q_{M_{P'}}(S). In fact, both Q_{M_{P'}}(S) and Q_{M_P}(S) equal 1 if and only if S contains an edge from M_1. Next we show that, with high probability, S contains fewer than \sqrt{n} vertices from P_2\cup P_3. Since P_2\cup P_3 is a uniformly random set of size \sqrt{n}+1, by the Chernoff bound, |S\cap(P_2\cup P_3)|<\sqrt{n} with probability 1-e^{-\Omega(\sqrt{n})}.

    We therefore get that Q_{M_P}(S)=Q_{M_{P'}}(S) for any partition P'=(P_1,P_2',P_3') and any query S with |S|\leq\frac{n+\sqrt{n}+1}{2}, with probability 1-e^{-\Omega(\sqrt{n})}.

  2. 2.

    |S|>n+n+12|S|>\frac{n+\sqrt{n}+1}{2}. In this case, the set SS contains at least one matched edge from P1P_{1}. In fact, the number of edges in M1M_{1} plus the size of P2P3P_{2}\cup P_{3} is

    |P1|2+|P2|+|P3|=n(n+1)2+n+1=n+n+12<|S|.\frac{|P_{1}|}{2}+|P_{2}|+|P_{3}|=\frac{n-(\sqrt{n}+1)}{2}+\sqrt{n}+1=\frac{n+\sqrt{n}+1}{2}<|S|.

    S must therefore contain at least one edge from M_1, and we get that Q_{M_P}(S)=Q_{M_{P'}}(S) for any partition P'=(P_1,P_2',P_3') and any query S with |S|>\frac{n+\sqrt{n}+1}{2}, with probability 1.

By combining the two cases |S|\leq\frac{n+\sqrt{n}+1}{2} and |S|>\frac{n+\sqrt{n}+1}{2}, we get that with probability 1-e^{-\Omega(\sqrt{n})}, Q_{M_P}(S) is independent of the partition of P_2\cup P_3, and by a union bound, this holds for any collection of poly(n) sets S. ∎

The second part of the analysis argues that if all queries QMP(S)Q_{M_{P}}(S) of an algorithm are independent of the partition of P2P3P_{2}\cup P_{3}, then, with high probability, the matching returned by this algorithm is not equal to MPM_{P}.

Lemma 3.

Let PP be a uniformly random partition in 𝒫3\mathcal{P}_{3} and MPM_{P} a random matching over PP. Consider an algorithm 𝒜\mathcal{A} such that all the queries QMP(S)Q_{M_{P}}(S) made by 𝒜\mathcal{A} are independent of the partition of P2P3P_{2}\cup P_{3}. Then, the (possibly randomized) matching MM returned by 𝒜\mathcal{A} is, with probability 1O(1/n)1-O(1/\sqrt{n}), not equal to MPM_{P}.

Proof.

Consider an algorithm 𝒜\mathcal{A} such that all queries QMP(S)Q_{M_{P}}(S) of 𝒜\mathcal{A} are independent of the partition of P2P3P_{2}\cup P_{3}. Thus, the matching MM returned by 𝒜\mathcal{A} is conditionally independent of the randomization of the partition PP given P1P_{1}.

For the algorithm 𝒜\mathcal{A} to return MPM_{P}, it needs to learn the edge P3P_{3}. We distinguish two cases.

  • 𝒜\mathcal{A} does not return any edge that is included in P2P3P_{2}\cup P_{3}. In this case, with probability 1, MM is not equal to MPM_{P}.

  • 𝒜 returns an edge that is included in P_2\cup P_3. We know from the previous lemma that with probability 1-e^{-\Omega(\sqrt{n})}, all queries are independent of the partition of P_2\cup P_3. Therefore, the edge returned by 𝒜 is also independent of the partition of P_2\cup P_3. The probability that this edge is equal to P_3 is at most 1/(\sqrt{n}+1). Therefore, with probability (1-e^{-\Omega(\sqrt{n})})(1-1/(\sqrt{n}+1))=1-O(1/\sqrt{n}), the returned matching M is not equal to M_P.

By combining Lemma 2 and Lemma 3, we get the hardness result for non-adaptive algorithms.

See 6

Proof.

Consider a uniformly random partition P\in 𝒫_3, a matching M_P, and an algorithm 𝒜 which queries M_P. By Lemma 2, after one round of queries, with probability 1-e^{-\Omega(\sqrt{n})} over both the randomization of P and of the algorithm, all the queries Q_{M_P}(S) made by 𝒜 are independent of the partition of P_2\cup P_3. By Lemma 3, this implies that, with probability 1-O(1/\sqrt{n}), the matching returned by 𝒜 is not equal to M_P. By the probabilistic method, this implies that there exists a partition P\in 𝒫_3 for which, with probability 1-O(1/\sqrt{n}), 𝒜 does not return M_P after one round of queries. ∎

A.2.2 Lower bound for o(loglogn)o(\log\log n) adaptive algorithm

Our idea is to construct a partition P_0,\ldots,P_i,\ldots,P_R such that every part P_i is covered by a perfect matching with edges of size d_i, and therefore contains |P_i|/d_i edges. We want to choose the sizes |P_i| and d_i (with d_i increasing) such that with probability 1-e^{-poly(n)}, after i rounds of any adaptive algorithm that asks a polynomial number of queries, the partition of P_i\cup P_{i+1}\cup\cdots\cup P_R is indistinguishable. A worked instance of these parameters is given right after the list below.

Our construction

  • d0=3d_{0}=3.

  • di+1=3log2ndi2d_{i+1}=3\log^{2}n\cdot d_{i}^{2}.

  • |Pi|=3log2ndi2|P_{i}|=3\log^{2}n\cdot d_{i}^{2}.

  • |Pi+1|=3log2ndi+12=3log2n(3log2ndi2)2=3log2n|Pi|2|P_{i+1}|=3\log^{2}n\cdot d_{i+1}^{2}=3\log^{2}n\cdot(3\log^{2}n\cdot d_{i}^{2})^{2}=3\log^{2}n\cdot|P_{i}|^{2}.
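For concreteness, starting from d_0=3 the recursion gives

|P_0| = 3\log^2 n\cdot d_0^2 = 27\log^2 n,\qquad d_1 = 3\log^2 n\cdot d_0^2 = 27\log^2 n,\qquad |P_1| = 3\log^2 n\cdot d_1^2 = 2187\log^6 n,

so that d_{i+1}=|P_i| at every level, and the part sizes grow roughly by squaring (up to the 3\log^2 n factor) from one level to the next.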

Notation: Let 𝒫_R be the set of partitions P=(P_0,\ldots,P_R) satisfying the conditions above. For a fixed i, let k_i=\frac{|P_i|}{d_i}, and let e_1^i,\ldots,e_j^i,\ldots,e_{k_i}^i be the random matching on P_i. Finally, let n_i=\sum_{l\geq i}|P_l|.

Claim 2.

The construction has R+1=Θ(loglogn)R+1=\Theta(\log\log n) partitions.

Proof.

We first show that the construction has R+1loglognR+1\geq\log\log n partitions.

By induction we have that

|Pi|=(3log2n)2i1|P0|2i|P_{i}|=(3\log^{2}n)^{2^{i}-1}|P_{0}|^{2^{i}}

Therefore

n \leq \sum_{i=0}^{R}|P_i|
 \leq (R+1)|P_R|
 \leq (R+1)(3\log^{2}n)^{2^{R}-1}\,(3\log^{2}n\, d_{0}^{2})^{2^{R}}
 = (R+1)\, 9^{2^{R}}(3\log^{2}n)^{2^{R+1}-1}
 \leq 9^{2^{R+1}}(3\log^{2}n)^{2^{R+1}}

This implies that

2R+1log(27log2n)logn,2^{R+1}\log(27\log^{2}n)\geq\log n,

and

R+1loglognlog2log(log(27log2n))log2loglogn.R+1\geq\frac{\log\log n}{\log 2}-\frac{\log\left(\log(27\log^{2}n)\right)}{\log 2}\geq\log\log n.

Next we show that R=O(\log\log n). We know that |P_R|\leq n, which implies that (3\log^{2}n)^{2^{R}-1}\leq n and

2^{R}-1\leq\frac{\log n}{\log(3\log^{2}n)}.

Therefore,

R=O\left(\log\frac{\log n}{\log\log n}\right)=O(\log\log n).

Suppose that at the beginning of round i, we have learned P_0\cup\ldots\cup P_{i-1} but the partition of P_i\cup P_{i+1}\cup\cdots\cup P_R is indistinguishable. Consider a query S of the i-th round. We show that if the query S is big, then with high probability S contains an edge from P_i, and if S is small, then S does not contain any edge from the higher parts P_{i+1},P_{i+2},\ldots,P_R. In both cases, querying S gives an answer that is independent of the partition of P_{i+1},P_{i+2},\ldots,P_R with high probability over all the polynomially many queries. Therefore, after the i-th round, the partition of P_{i+1}\cup P_{i+2}\cup\cdots\cup P_R remains indistinguishable.

Outline of the proof. Claim 3 is a technical result we need to bound the binomial coefficients in our proofs. Lemma 4 gives the expected size of the intersection of a query with a partition and presents a concentration result on this intersection when the queries are large. Lemma 5 shows that, for the purpose of bounding the probability that no edge of a partition is included in S, we can drop the constraint that the edges are disjoint and think of them as being sampled independently and uniformly (with replacement) from the partition. Lemma 6 shows that if |S|\geq(1-\frac{1}{d_i})n_i, then with probability 1-e^{-\Omega(\log^2 n)}, S contains at least one edge from the matching on P_i. If |S|\leq(1-\frac{1}{d_i})n_i, then Lemma 7 shows that with probability 1-e^{-\Omega(\log^2 n)}, for every l\geq i+1, S does not contain any edge from the matching on P_l. Combining the two lemmas, Lemma 8 shows that if P is a uniformly random partition in 𝒫_R and all the queries made by an algorithm 𝒜 in the previous i rounds are independent of the partition of P_i,\ldots,P_R, then any collection of poly(n) non-adaptive queries at round i+1 is independent of the partition of P_{i+1},\ldots,P_R with probability 1-e^{-\Omega(\log^2 n)}.

Claim 3.

If k<nk<\sqrt{n}, then

nk4(k!)(nk)nkk!\frac{n^{k}}{4(k!)}\leq{n\choose k}\leq\frac{n^{k}}{k!}
Proof.

We have that

{n\choose k} =\frac{n(n-1)\cdots(n-(k-1))}{k!}
=nkk!(11n)(1k1n)\displaystyle=\frac{n^{k}}{k!}(1-\frac{1}{n})\ldots(1-\frac{k-1}{n})

The upper bound (nk)nkk!{n\choose k}\leq\frac{n^{k}}{k!} follows immediately. To show the lower bound, we observe that

(nk)\displaystyle{n\choose k} =nkk!(11n)(1k1n)\displaystyle=\frac{n^{k}}{k!}(1-\frac{1}{n})\ldots(1-\frac{k-1}{n})
nkk!(1k1n)k1\displaystyle\geq\frac{n^{k}}{k!}(1-\frac{k-1}{n})^{k-1}

Now consider the function fn(x,y)=(1xn)yf_{n}(x,y)=(1-\frac{x}{n})^{y} for y1y\geq 1 and x[1,n]x\in[1,n]. For x[1,n]x\in[1,n] and fixed y1y\geq 1, the function fnf_{n} decreases as xx increases. Similarly, for xx fixed, fnf_{n} decreases as yy increases. Therefore

(1k1n)k1(11n)n.(1-\frac{k-1}{n})^{k-1}\geq(1-\frac{1}{\sqrt{n}})^{\sqrt{n}}.

Furthermore, we know that for x2x\geq 2, (11/x)x1/4(1-1/x)^{x}\geq 1/4. Therefore

(nk)\displaystyle{n\choose k} nkk!(1k1n)k1\displaystyle\geq\frac{n^{k}}{k!}(1-\frac{k-1}{n})^{k-1}
nkk!(11n)n\displaystyle\geq\frac{n^{k}}{k!}(1-\frac{1}{\sqrt{n}})^{\sqrt{n}}
nk4(k!).\displaystyle\geq\frac{n^{k}}{4(k!)}.
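As a quick numerical sanity check of Claim 3 (not part of the proof):

    from math import comb, factorial
    # Spot check the bound for a few pairs (n, k) with k < sqrt(n).
    for n, k in [(100, 5), (400, 10), (2500, 20)]:
        assert n ** k / (4 * factorial(k)) <= comb(n, k) <= n ** k / factorial(k)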

Lemma 4.

Let S be an arbitrary subset of P_i\cup P_{i+1}\cup\cdots\cup P_R. After i rounds, for l\geq i, the expected size of the intersection |S\cap P_l| is

𝔼[|SPl|]=sl=|S||Pl|ni,\mathbb{E}[|S\cap P_{l}|]=s_{l}=\frac{|S||P_{l}|}{n_{i}},

where the expectation is over the partitions of P_i\cup P_{i+1}\cup\cdots\cup P_R into P_i,P_{i+1},\ldots,P_R. Moreover, for any polynomial collection of queries 𝒮 such that |S|\geq(1-\frac{1}{d_i})n_i for all S\in 𝒮, with probability 1-e^{-\Omega(\log^2 n)}, for all l\geq i and for all S\in 𝒮, we have |S\cap P_l|\geq s_l(1-\frac{1}{d_i}).

Proof of Lemma 4.

Every node in SS will be in PlP_{l} with probability |Pl|/ni|P_{l}|/n_{i}, therefore by linearity of expectation

𝔼[|SPl|]=sl=|S||Pl|ni\mathbb{E}[|S\cap P_{l}|]=s_{l}=\frac{|S||P_{l}|}{n_{i}}

Suppose now that |S|(11di)ni|S|\geq(1-\frac{1}{d_{i}})n_{i}. By Chernoff bound, we have

(|SPl|sl(11di))\displaystyle\mathbb{P}\big{(}|S\cap P_{l}|\leq s_{l}(1-\frac{1}{d_{i}})\big{)} esl2di2\displaystyle\leq e^{-\frac{s_{l}}{2d_{i}^{2}}}
=e|S|ni|Pl|2di2\displaystyle=e^{-\frac{|S|}{n_{i}}\frac{|P_{l}|}{2d_{i}^{2}}}
e(11/di)|Pl|2di2\displaystyle\leq e^{-(1-1/d_{i})\frac{|P_{l}|}{2d_{i}^{2}}}
e|Pl|3di2\displaystyle\leq e^{-\frac{|P_{l}|}{3d_{i}^{2}}}
elog2n\displaystyle\leq e^{-\log^{2}n}
1nlogn\displaystyle\leq\frac{1}{n^{\log n}}

In the fourth inequality, we used 1-1/d_i\geq 2/3, which holds since d_i\geq 3. By a union bound, the probability that there exists a partition P_l with l\geq i such that |S\cap P_l|\leq s_l(1-\frac{1}{d_i}) is O(\log\log n/n^{\log n})=e^{-\Omega(\log^2 n)}. Another application of the union bound over the polynomially many queries shows that with probability at least 1-e^{-\Omega(\log^2 n)}, for all queries of size at least (1-1/d_i)n_i and for all partitions P_l with l\geq i, we have |S\cap P_l|\geq s_l(1-\frac{1}{d_i}). ∎

Lemma 5.

Fix an iteration ii, and a query SS. Let e1,,ekie^{\prime}_{1},\ldots,e^{\prime}_{k_{i}} be edges of size did_{i}, independently and uniformly sampled from PiP_{i}. Let e1i,,ekiie^{i}_{1},\ldots,e^{i}_{k_{i}} be a random matching on PiP_{i}. We have

(j{1,,ki},ejiS)(j{1,,ki},ejS)=(1(|SPi|di)(|Pi|di))ki\mathbb{P}(\forall j\in\{1,\ldots,k_{i}\},\ e_{j}^{i}\not\subseteq S)\leq\mathbb{P}(\forall j\in\{1,\ldots,k_{i}\},\ e^{\prime}_{j}\not\subseteq S)=(1-\frac{\binom{|S\cap P_{i}|}{d_{i}}}{\binom{|P_{i}|}{d_{i}}})^{k_{i}}
Proof.

Since the partition index i is fixed, we drop the index i and use e_j to denote e_j^i, k to denote k_i, and d to denote d_i. We also let n=|P_i| and assume without loss of generality that S\subseteq P_i.

The intuition is that, since the edges eje_{j} must be disjoint but the edges eje^{\prime}_{j} do not necessarily have to, then the edges eje_{j} will cover a “bigger fraction” of SS than eje^{\prime}_{j}.

We prove this by induction on the number of edges. When considering only one edge e1e_{1}, we know that

\mathbb{P}(e_{1}\not\subseteq S)=\mathbb{P}(e^{\prime}_{1}\not\subseteq S).

Now assume that the result is true for jj edges, i.e. (e1S,,ejS)(e1S)j\mathbb{P}(e_{1}\not\subseteq S,\ldots,e_{j}\not\subseteq S)\leq\mathbb{P}(e^{\prime}_{1}\not\subseteq S)^{j}. We want to show that it holds for j+1j+1 edges, i.e.,

(e1S,,ejS,ej+1S)(e1S)j+1\mathbb{P}(e_{1}\not\subseteq S,\ldots,e_{j}\not\subseteq S,e_{j+1}\not\subseteq S)\leq\mathbb{P}(e^{\prime}_{1}\not\subseteq S)^{j+1}

By the inductive hypothesis, it is sufficient to show that

P(ej+1S|lj,elS)(ej+1S)P(e_{j+1}\not\subseteq S|\ \forall\ l\leq j,\ e_{l}\not\subseteq S)\leq\mathbb{P}(e_{j+1}\not\subseteq S)

This is equivalent to showing that P(ej+1S|e1S,,ejS)(ej+1S)P(e_{j+1}\subseteq S|\ e_{1}\not\subseteq S,\ldots,e_{j}\not\subseteq S)\geq\mathbb{P}(e_{j+1}\subseteq S). Furthermore, we have

(ej+1S)\displaystyle\mathbb{P}(e_{j+1}\subseteq S) =(ej+1S|lj,elS)(lj,elS)\displaystyle=\mathbb{P}(e_{j+1}\subseteq S|\ \forall\ l\leq j,\ e_{l}\not\subseteq S)\mathbb{P}(\forall\ l\leq j,\ e_{l}\not\subseteq S)
+(ej+1S|lj,elS)(lj,elS)\displaystyle\ \ \ +\mathbb{P}(e_{j+1}\subseteq S|\ \exists\ l\leq j,\ e_{l}\subseteq S)\mathbb{P}(\exists\ l\leq j,\ e_{l}\subseteq S)

Therefore, P(ej+1S|e1S,,ejS)(ej+1S)P(e_{j+1}\not\subseteq S|\ e_{1}\not\subseteq S,\ldots,e_{j}\not\subseteq S)\leq\mathbb{P}(e_{j+1}\not\subseteq S) becomes equivalent to

P(ej+1S|lj,elS)(lj,elS)(ej+1S|lj,elS)(lj,elS)P(e_{j+1}\subseteq S|\ \forall\ l\leq j,\ e_{l}\not\subseteq S)\mathbb{P}(\exists\ l\leq j,\ e_{l}\subseteq S)\geq\mathbb{P}(e_{j+1}\subseteq S|\ \exists\ l\leq j,\ e_{l}\subseteq S)\mathbb{P}(\exists\ l\leq j,\ e_{l}\subseteq S) (10)

Since \mathbb{P}(\exists\ l\leq j,\ e_{l}\subseteq S)>0, inequality (10) is equivalent to

P(ej+1S|lj,elS)(ej+1S|lj,elS)P(e_{j+1}\subseteq S|\ \forall\ l\leq j,\ e_{l}\not\subseteq S)\geq\mathbb{P}(e_{j+1}\subseteq S|\ \exists\ l\leq j,\ e_{l}\subseteq S) (11)

To prove equation (11), we observe that

(ej+1S)=pP(ej+1S|lj,elS)+(1p)(ej+1S|lj,elS),\mathbb{P}(e_{j+1}\subseteq S)=pP(e_{j+1}\subseteq S|\ \forall\ l\leq j,\ e_{l}\not\subseteq S)+(1-p)\mathbb{P}(e_{j+1}\subseteq S|\ \exists\ l\leq j,\ e_{l}\subseteq S),

where p=(lj,elS).p=\mathbb{P}(\forall\ l\leq j,\ e_{l}\not\subseteq S). Therefore, if we show that (ej+1S)(ej+1S|lj,elS)\mathbb{P}(e_{j+1}\subseteq S)\geq\mathbb{P}(e_{j+1}\subseteq S|\ \exists\ l\leq j,\ e_{l}\subseteq S), then we must have P(ej+1S|lj,elS)(ej+1S)(ej+1S|lj,elS)P(e_{j+1}\subseteq S|\ \forall\ l\leq j,\ e_{l}\not\subseteq S)\geq\mathbb{P}(e_{j+1}\subseteq S)\geq\mathbb{P}(e_{j+1}\subseteq S|\ \exists\ l\leq j,\ e_{l}\subseteq S). To see that (ej+1S)(ej+1S|lj,elS)\mathbb{P}(e_{j+1}\subseteq S)\geq\mathbb{P}(e_{j+1}\subseteq S|\ \exists\ l\leq j,\ e_{l}\subseteq S), we first observe that

(ej+1S|lj,elS)\displaystyle\mathbb{P}(e_{j+1}\subseteq S|\ \exists\ l\leq j,\ e_{l}\subseteq S) =(ej+1S|e1S)\displaystyle=\mathbb{P}(e_{j+1}\subseteq S|\ e_{1}\subseteq S)
=(|S|dd)(ndd),\displaystyle=\frac{\binom{|S|-d}{d}}{\binom{n-d}{d}},

while

(ej+1S)=(|S|d)(nd).\mathbb{P}(e_{j+1}\subseteq S)=\frac{\binom{|S|}{d}}{\binom{n}{d}}.

Therefore,

\frac{\mathbb{P}(e_{j+1}\subseteq S)}{\mathbb{P}(e_{j+1}\subseteq S\ |\ e_{1}\subseteq S)} =\frac{\binom{|S|}{d}}{\binom{n}{d}}\cdot\frac{\binom{n-d}{d}}{\binom{|S|-d}{d}}
=\frac{|S|!}{(|S|-d)!}\cdot\frac{(|S|-2d)!}{(|S|-d)!}\cdot\frac{(n-d)!}{n!}\cdot\frac{(n-d)!}{(n-2d)!}
=\frac{\dfrac{|S|(|S|-1)\cdots(|S|-d+1)}{(|S|-d)(|S|-d-1)\cdots(|S|-2d+1)}}{\dfrac{n(n-1)\cdots(n-d+1)}{(n-d)(n-d-1)\cdots(n-2d+1)}}

It is easy to verify that

|S|d+i|S|2d+ind+in2d+i\frac{|S|-d+i}{|S|-2d+i}\geq\frac{n-d+i}{n-2d+i}

for every i{1,,d}i\in\{1,\ldots,d\}, because |S|n|S|\leq n. Therefore we get that (ej+1S)(ej+1S|e1S)\mathbb{P}(e_{j+1}\subseteq S)\geq\mathbb{P}(e_{j+1}\subseteq S|\ e_{1}\subseteq S), and hence (ej+1S)(ej+1S|lj,elS)\mathbb{P}(e_{j+1}\subseteq S)\geq\mathbb{P}(e_{j+1}\subseteq S|\ \exists\ l\leq j,\ e_{l}\subseteq S). This proves (11) and concludes the induction proof. ∎

Lemma 6.

Consider a query S made by an algorithm at round i such that all the queries from previous rounds are independent of the partition of P_i,\ldots,P_R. If |S|\geq(1-\frac{1}{d_i})n_i, then with probability 1-e^{-\Omega(\log^2 n)}, S contains at least one edge from the matching on P_i.

Proof.
(j{1,,ki},ejiS)\displaystyle\mathbb{P}(\exists j\in\{1,\ldots,k_{i}\},\ e_{j}^{i}\subseteq S) =1(j{1,,ki},ejiS)\displaystyle=1-\mathbb{P}(\forall j\in\{1,\ldots,k_{i}\},\ e_{j}^{i}\not\subseteq S) (12)
1(1(|SPi|di)(|Pi|di))ki\displaystyle\geq 1-(1-\frac{\binom{|S\cap P_{i}|}{d_{i}}}{\binom{|P_{i}|}{d_{i}}})^{k_{i}} (13)

where the inequality comes from Lemma 5. Since |S|(11di)ni|S|\geq(1-\frac{1}{d_{i}})n_{i}, we have with high probability that |SPi|(11di)2|Pi|di2|S\cap P_{i}|\geq(1-\frac{1}{d_{i}})^{2}|P_{i}|\geq d_{i}^{2}. Conditioning on this event, and since we already have di|Pi|d_{i}\leq\sqrt{|P_{i}|}, we get by Claim 3

(|SPi|di)(|Pi|di)\displaystyle\frac{\binom{|S\cap P_{i}|}{d_{i}}}{\binom{|P_{i}|}{d_{i}}} |SPi|di4(di)!(di)!|Pi|di\displaystyle\geq\frac{|S\cap P_{i}|^{d_{i}}}{4(d_{i})!}\frac{(d_{i})!}{|P_{i}|^{d_{i}}}
14(11di)2di\displaystyle\geq\frac{1}{4}(1-\frac{1}{d_{i}})^{2d_{i}}
14e1/di11/di2di\displaystyle\geq\frac{1}{4}e^{-\frac{1/d_{i}}{1-1/d_{i}}2d_{i}}
14e211/di\displaystyle\geq\frac{1}{4}e^{-\frac{2}{1-1/d_{i}}}
14e4\displaystyle\geq\frac{1}{4}e^{-4}

Since ki=|Pi|di=3log2ndik_{i}=\frac{|P_{i}|}{d_{i}}=3\log^{2}n\cdot d_{i}, we get that

(|SPi|di)(|Pi|di)14e413di=log2nki\frac{\binom{|S\cap P_{i}|}{d_{i}}}{\binom{|P_{i}|}{d_{i}}}\geq\frac{1}{4}e^{-4}\geq\frac{1}{3d_{i}}=\frac{\log^{2}n}{k_{i}}

The probability (13) becomes

(j{1,,ki},ejiS)\displaystyle\mathbb{P}(\exists j\in\{1,\ldots,k_{i}\},\ e_{j}^{i}\subseteq S) 1(1(|SPi|di)(|Pi|di))ki\displaystyle\geq 1-(1-\frac{\binom{|S\cap P_{i}|}{d_{i}}}{\binom{|P_{i}|}{d_{i}}})^{k_{i}} (14)
1(1log2nki)ki\displaystyle\geq 1-(1-\frac{\log^{2}n}{k_{i}})^{k_{i}} (15)
11nlogn\displaystyle\geq 1-\frac{1}{n^{\log n}} (16)
=1eΩ(log2n)\displaystyle=1-e^{-\Omega(\log^{2}n)} (17)

Lemma 7.

Consider a query S made by an algorithm at round i such that all the queries from previous rounds are independent of the partition of P_i,\ldots,P_R. If |S|\leq(1-\frac{1}{d_i})n_i, then with probability 1-e^{-\Omega(\log^2 n)}, for every l\geq i+1, S does not contain any edge from the matching on P_l.

Proof.

Fix a partition P_l with l\geq i+1, and let e_1,\ldots,e_j,\ldots,e_{k_l} be the random matching on P_l. For j=1,\ldots,k_l, we have

𝔼[|ejS|]=|S||ej|ni(11di)dl=dldldi.\mathbb{E}\big{[}|e_{j}\cap S|\big{]}=\frac{|S||e_{j}|}{n_{i}}\leq(1-\frac{1}{d_{i}})d_{l}=d_{l}-\frac{d_{l}}{d_{i}}.

Going forward, we present an upper bound on (ejS)\mathbb{P}(e_{j}\subset S). We can assume without loss of generality that |S|=(11di)ni|S|=(1-\frac{1}{d_{i}})n_{i}. In fact, for any j=1,,klj=1,\ldots,k_{l}, if SSS\subseteq S^{\prime} then we have (ejS)(ejS)\mathbb{P}(e_{j}\subseteq S)\leq\mathbb{P}(e_{j}\subseteq S^{\prime}).

The probability that ejSe_{j}\subset S can be expressed as

(ejS)\displaystyle\mathbb{P}(e_{j}\subset S) =(|ejS|dl)\displaystyle=\mathbb{P}(|e_{j}\cap S|\geq d_{l})
(|ejS|(1+δ)E[|ejS|]),\displaystyle\leq\mathbb{P}\Big{(}|e_{j}\cap S|\geq(1+\delta)E\big{[}|e_{j}\cap S|\big{]}\Big{)},

where δ=1di1\delta=\frac{1}{d_{i}-1}. By Chernoff bound we get

(ejS)\displaystyle\mathbb{P}(e_{j}\subset S) 2e13δ2E[|ejS|]\displaystyle\leq 2e^{-\frac{1}{3}\delta^{2}E\big{[}|e_{j}\cap S|\big{]}}
=2e131(di1)2(11di)dl\displaystyle=2e^{-\frac{1}{3}\frac{1}{(d_{i}-1)^{2}}(1-\frac{1}{d_{i}})d_{l}}
=2e131(di1)didl\displaystyle=2e^{-\frac{1}{3}\frac{1}{(d_{i}-1)d_{i}}d_{l}}
2elog2n\displaystyle\leq 2e^{-\log^{2}n}
2nlogn\displaystyle\leq\frac{2}{n^{\log n}}

where the second to last inequality is due to dldi+1=3log2ndi2d_{l}\geq d_{i+1}=3\log^{2}n\cdot d_{i}^{2}. Therefore by a union bound on the edges of the matching in PlP_{l} we get that

(j=1,kl,ejS)\displaystyle\mathbb{P}(\exists\ j=1,\ldots k_{l},\ \ e_{j}\subset S) 2klnlogn\displaystyle\leq\frac{2k_{l}}{n^{\log n}}
1nlogn1\displaystyle\leq\frac{1}{n^{\log n-1}}

Finally, another union bound over all the partitions P_l with l\geq i+1 yields that the probability that there exists a partition P_l such that an edge of P_l is included in S is at most O(\log\log n/n^{\log n-1})=e^{-\Omega(\log^2 n)}. ∎

Lemma 8.

If vertices in Pi,Pi+1,,PRP_{i},P_{i+1},\ldots,P_{R} are indistinguishable to 𝒜\mathcal{A} at the beginning of round ii of queries, then vertices in Pi+1,,PRP_{i+1},\ldots,P_{R} are indistinguishable at the end of round ii with probability 1eΩ(log2n)1-e^{-\Omega(\log^{2}n)}.

Proof.

Consider a query S made by an algorithm at round i such that all the queries from previous rounds are independent of the partition of P_i,\ldots,P_R. Lemma 7 shows that if S is small, i.e., |S|\leq(1-1/d_i)\sum_{j\geq i}|P_j|, and is independent of the partition of \cup_{j=i}^{R}P_j into P_i,\ldots,P_R, then with probability 1-e^{-\Omega(\log^2 n)}, for every j\geq i+1, S does not contain any edge contained in P_j. On the other hand, Lemma 6 shows that if a query S is large, i.e., |S|\geq(1-1/d_i)\sum_{j\geq i}|P_j|, and is independent of the partition of \cup_{j=i}^{R}P_j into P_i,\ldots,P_R, then with probability 1-e^{-\Omega(\log^2 n)}, S contains at least one edge from the matching on P_i, which implies Q_{M_P}(S)=1. In both cases, Q_{M_P}(S) is independent of the partition of \cup_{j=i+1}^{R}P_j into P_{i+1},\ldots,P_R with probability 1-e^{-\Omega(\log^2 n)}. By a union bound, this holds for the poly(n) queries made at round i. ∎

See 7

Proof.

Consider a uniformly random partition P=(P_0,\ldots,P_i,\ldots,P_R), a matching M_P, and an algorithm 𝒜 which queries M_P in \log\log n-3 rounds. By Lemma 8, after i rounds of queries, with probability 1-e^{-\Omega(\log^2 n)} over both the randomization of P and of the algorithm, all the queries Q_{M_P}(S) made by 𝒜 are independent of the partition of P_i\cup\ldots\cup P_R. Therefore, and since R\geq\log\log n-1 by Claim 2, we get that after \log\log n-3 rounds of queries, with probability 1-e^{-\Omega(\log^2 n)} over both the randomization of P and the algorithm 𝒜, all the queries Q_{M_P}(S) made by 𝒜 are independent of the partition of P_{R-1}\cup P_R. We now distinguish two cases:

  • 𝒜 does not return any edge that is included in P_{R-1}\cup P_R. In this case, the returned matching misses the edges of M_P contained in P_{R-1}\cup P_R and is therefore, with probability 1, not equal to M_P.

  • 𝒜 returns a set of edges that are included in P_{R-1}\cup P_R. In this case, we know that with probability 1-e^{-\Omega(\log^2 n)} all the queries Q_{M_P}(S) made by 𝒜 are independent of the partition of P_{R-1}\cup P_R into P_{R-1} and P_R. The edges that are returned by 𝒜 and included in P_{R-1}\cup P_R are therefore also independent of the partition of P_{R-1}\cup P_R into P_{R-1} and P_R with probability 1-e^{-\Omega(\log^2 n)}. To fully learn M_P, 𝒜 needs to distinguish the points in P_{R-1} from the points in P_R, but there are \binom{|P_R|+|P_{R-1}|}{|P_{R-1}|} ways of partitioning P_{R-1}\cup P_R into P_{R-1} and P_R. Therefore the probability that 𝒜 correctly learns M_P is at most

    (1eΩ(log2n))1(|PR|+|PR1||PR1|)(1-e^{-\Omega(\log^{2}n)})\frac{1}{\binom{|P_{R}|+|P_{R-1}|}{|P_{R-1}|}}

    In the rest of the proof, we show that 1/(|PR|+|PR1||PR1|)=eΩ(log2n)1/\binom{|P_{R}|+|P_{R-1}|}{|P_{R-1}|}=e^{-\Omega(\log^{2}n)}. This implies that the probability that 𝒜\mathcal{A} learns MPM_{P} correctly is less than (1eΩ(log2n))eΩ(log2n)=eΩ(log2n)(1-e^{-\Omega(\log^{2}n)})e^{-\Omega(\log^{2}n)}=e^{-\Omega(\log^{2}n)}.

    We know that ni=0R|Pi|(R+1)|PR|n\leq\sum_{i=0}^{R}|P_{i}|\leq(R+1)|P_{R}|. Therefore |PR|n/(R+1)=Ω(n/loglogn)|P_{R}|\geq n/(R+1)=\Omega(n/\log\log n). By Claim 3, we get that

    1(|PR|+|PR1||PR1|)\displaystyle\frac{1}{\binom{|P_{R}|+|P_{R-1}|}{|P_{R-1}|}} 4(|PR1|)!(|PR|+|PR1|)|PR1|\displaystyle\leq\frac{4(|P_{R-1}|)!}{(|P_{R}|+|P_{R-1}|)^{|P_{R-1}|}}
    (|PR1||PR|+|PR1|)|PR1|\displaystyle\leq\left(\frac{|P_{R-1}|}{|P_{R}|+|P_{R-1}|}\right)^{|P_{R-1}|}
    (13log2n|PR1|)|PR1|\displaystyle\leq\left(\frac{1}{3\log^{2}n|P_{R-1}|}\right)^{|P_{R-1}|}
    (13log2n|PR1|)3log2n\displaystyle\leq\left(\frac{1}{3\log^{2}n|P_{R-1}|}\right)^{3\log^{2}n}
    =eΩ(log2n)\displaystyle=e^{-\Omega(\log^{2}n)}

Therefore, with probability 1eΩ(log2n)1-e^{-\Omega(\log^{2}n)}, the matching returned by 𝒜\mathcal{A} is not equal to MPM_{P}. ∎

Appendix B Missing Analysis for Low Degree Near Uniform Hypergraphs (Section 4)

B.1 Proof of Lemma 1

Lemma 9.

Assume that the hypergraph H(V,E)H(V,E) has a maximum degree of Δ\Delta and edge size between d/ρd/\rho and dd. Let SS be a vertex-sample with probability pp, and assume that SS contains an edge ee. We have

(e s.t eS,eH{e}|eS)f(Δ,p,d,ρ)\mathbb{P}(\not\exists\ e^{\prime}\mbox{ s.t }e^{\prime}\subseteq S,e^{\prime}\in H\setminus\{e\}\ |\ e\subseteq S)\geq f^{\bullet}(\Delta,p,d,\rho)
Proof.

The intuition is that the term f(Δ,p,d,ρ)f^{\bullet}(\Delta,p,d,\rho) treats the events that the edges are not contained in SS as independent events.

Consider the following intermediate optimization problem

fk(Δ,p,d,ρ):=\displaystyle f^{\bullet}_{k}(\Delta,p,d,\rho):=\quad\quad minaij\displaystyle\min\limits_{a_{ij}} j=d/ρdi=0j1(1pji)aij\displaystyle\prod\limits_{j=d/\rho}^{d}\prod\limits_{i=0}^{j-1}(1-p^{j-i})^{a_{ij}}
s.t. j=d/ρdi=0j1iaijmin{(Δ1)d,kd}\displaystyle\sum_{j=d/\rho}^{d}\sum_{i=0}^{j-1}i\cdot a_{ij}\leq\min\{(\Delta-1)d,kd\}
j=d/ρdi=0j1jaijk\displaystyle\sum_{j=d/\rho}^{d}\sum_{i=0}^{j-1}j\cdot a_{ij}\leq k
aij0.\displaystyle a_{ij}\geq 0.

We want to show that for every 1\leq k\leq\Delta n and for any k edges e_1,\ldots,e_k, all distinct from e, we have

(e1,,ekS|eS)fk(Δ,p,d,ρ).\mathbb{P}(e_{1},\ldots,e_{k}\not\subseteq S\ |\ e\subseteq S)\geq f_{k}^{\bullet}(\Delta,p,d,\rho).

For k=1, the result clearly holds. Suppose, by induction, that for any k edges e_1,\ldots,e_k, all distinct from e, we have

(e1,,ekS|eS)fk(Δ,p,d,ρ).\mathbb{P}(e_{1},\ldots,e_{k}\not\subseteq S\ |\ e\subseteq S)\geq f_{k}^{\bullet}(\Delta,p,d,\rho).

If we consider k+1k+1 edges, then

\mathbb{P}(e_{1},\ldots,e_{k},e_{k+1}\not\subseteq S\ |\ e\subseteq S) =\mathbb{P}(e_{1},\ldots,e_{k}\not\subseteq S\ |\ e\subseteq S)\cdot\mathbb{P}(e_{k+1}\not\subseteq S\ |\ e\subseteq S,\ e_{1},\ldots,e_{k}\not\subseteq S)

By the induction hypothesis we know that P(e1,,ekS|eS)fk(Δ,p,d,ρ)P(e_{1},\ldots,e_{k}\not\subseteq S\ |\ e\subseteq S)\geq f_{k}^{\bullet}(\Delta,p,d,\rho). Therefore, if we can show that

(ek+1S|eS,e1,,ekS)(ek+1S|eS)\mathbb{P}(e_{k+1}\not\subseteq S\ |\ e\subseteq S,\ e_{1},\ldots,e_{k}\not\subseteq S)\geq\mathbb{P}(e_{k+1}\not\subseteq S\ |\ e\subseteq S) (18)

then we will have

(e1,,ek,ek+1S|eS)fk(Δ,p,d,ρ)(ek+1S|eS)fk+1(Δ,p,d,ρ)\mathbb{P}(e_{1},\ldots,e_{k},e_{k+1}\not\subseteq S\ |\ e\subseteq S)\geq f_{k}^{\bullet}(\Delta,p,d,\rho)\mathbb{P}(e_{k+1}\not\subseteq S\ |\ e\subseteq S)\geq f_{k+1}^{\bullet}(\Delta,p,d,\rho)

To see that (18) holds, we omit conditioning on eSe\subset S to ease notation. By Bayes rule,

(ek+1S|eS,e1,,ekS)\displaystyle\mathbb{P}(e_{k+1}\not\subseteq S\ |\ e\subseteq S,\ e_{1},\ldots,e_{k}\not\subseteq S) =(ek+1S|[1,k]eek+1S,e1,,ekS)\displaystyle=\mathbb{P}(e_{k+1}\not\subseteq S\ |\ \exists\ \ell\in[1,k]\ e_{\ell}\cap e_{k+1}\setminus S\neq\emptyset,\ e_{1},\ldots,e_{k}\not\subseteq S)
×([1,k]eek+1S|e1,,ekS)\displaystyle\ \ \ \ \times\mathbb{P}(\exists\ \ell\in[1,k]\ e_{\ell}\cap e_{k+1}\setminus S\neq\emptyset\ |\ e_{1},\ldots,e_{k}\not\subseteq S)
+(ek+1S|[1,k]eek+1S=,e1,,ekS)\displaystyle\ \ \ \ +\mathbb{P}(e_{k+1}\not\subseteq S\ |\ \forall\ \ell\in[1,k]\ e_{\ell}\cap e_{k+1}\setminus S=\emptyset,\ e_{1},\ldots,e_{k}\not\subseteq S)
\ \ \ \ \times\mathbb{P}(\forall\ \ell\in[1,k]\ e_{\ell}\cap e_{k+1}\setminus S=\emptyset\ |\ e_{1},\ldots,e_{k}\not\subseteq S)

When eek+1Se_{\ell}\cap e_{k+1}\setminus S\neq\emptyset for some value \ell, then we know with probability one that ek+1Se_{k+1}\not\subseteq S. Therefore

(ek+1S|[1,k]eek+1S,e1,,ekS)=1P(ek+1S).\mathbb{P}(e_{k+1}\not\subseteq S\ |\ \exists\ \ell\in[1,k]\ e_{\ell}\cap e_{k+1}\setminus S\neq\emptyset,\ e_{1},\ldots,e_{k}\not\subseteq S)=1\geq P(e_{k+1}\not\subseteq S).

Furthermore,

(ek+1S|[1,k]eek+1S=,e1,,ekS)=(ek+1S)=1pd,\mathbb{P}(e_{k+1}\not\subseteq S\ |\ \forall\ \ell\in[1,k]\ e_{\ell}\cap e_{k+1}\setminus S=\emptyset,\ e_{1},\ldots,e_{k}\not\subseteq S)=\mathbb{P}(e_{k+1}\not\subseteq S)=1-p^{d},

therefore

\mathbb{P}(e_{k+1}\not\subseteq S\ |\ e\subseteq S,\ e_{1},\ldots,e_{k}\not\subseteq S) \geq\mathbb{P}(e_{k+1}\not\subseteq S)\cdot\mathbb{P}(\exists\ \ell\in[1,k]\ e_{\ell}\cap e_{k+1}\setminus S\neq\emptyset\ |\ e_{1},\ldots,e_{k}\not\subseteq S)
\ \ \ \ +\mathbb{P}(e_{k+1}\not\subseteq S)\cdot\mathbb{P}(\forall\ \ell\in[1,k]\ e_{\ell}\cap e_{k+1}\setminus S=\emptyset\ |\ e_{1},\ldots,e_{k}\not\subseteq S)
=\mathbb{P}(e_{k+1}\not\subseteq S),

since both conditional probabilities of the event e_{k+1}\not\subseteq S above are at least \mathbb{P}(e_{k+1}\not\subseteq S) and the probabilities of the two conditioning events sum to one. This proves (18) and concludes the induction. ∎

Lemma 10.

For p(0,1)p\in(0,1), any optimal solution to LP(Δ,p,d,ρ)LP^{\bullet}(\Delta,p,d,\rho) and f(Δ,p,d,ρ)f^{\bullet}(\Delta,p,d,\rho) is such that aij=0a_{ij}=0 for all jd/ρj\neq d/\rho.

Proof.

For ease of notation, we use d^{-} instead of d/\rho in the proof. Suppose to the contrary that there is an optimal solution a such that a_{ij}>0 for some j>d^{-}. Then consider the alternative solution a' such that a'_{id^{-}}=a_{id^{-}}+\epsilon, a'_{ij}=a_{ij}-\epsilon, and a'_{sk}=a_{sk} for all (s,k) with s\neq i or k\notin\{d^{-},j\}. For \epsilon>0 small enough, the constraints clearly still hold.

The net increase in the objective value of LP^{\bullet}(\Delta,p,d,\rho) is \epsilon\cdot(p^{d^{-}-i}-p^{j-i})>0 because d^{-}<j and p\in(0,1). The new objective value of f^{\bullet}(\Delta,p,d,\rho) is \left(\frac{1-p^{d^{-}-i}}{1-p^{j-i}}\right)^{\epsilon}<1 times the old objective value (thus smaller than the old objective value), again because d^{-}<j and p\in(0,1). Therefore, a is not an optimal solution to either LP^{\bullet}(\Delta,p,d,\rho) or f^{\bullet}(\Delta,p,d,\rho). ∎

From Lemma 10, we can obtain the following two equivalent representations of LP^{\bullet}(\Delta,p,d,\rho) and f^{\bullet}(\Delta,p,d,\rho): letting a_i denote the number of edges of size d/\rho that intersect the focal edge in i vertices, we have

LP(Δ,p,d,ρ):=\displaystyle LP^{\bullet}(\Delta,p,d,\rho):=\quad\quad maxai\displaystyle\max\limits_{a_{i}} i=0d/ρ1aipd/ρi\displaystyle\sum_{i=0}^{d/\rho-1}a_{i}\cdot p^{d/\rho-i}
s.t. i=0d/ρ1iai(Δ1)d\displaystyle\sum_{i=0}^{d/\rho-1}i\cdot a_{i}\leq(\Delta-1)d
i=0d/ρ1aiΔnd/ρ\displaystyle\sum_{i=0}^{d/\rho-1}a_{i}\leq\frac{\Delta n}{d/\rho}
ai0.\displaystyle a_{i}\geq 0.
f^{\bullet}(\Delta,p,d,\rho):=\quad\quad\min\limits_{a_{i}}\ \prod\limits_{i=0}^{d/\rho-1}(1-p^{d/\rho-i})^{a_{i}}
s.t. i=0d/ρ1iai(Δ1)d\displaystyle\sum_{i=0}^{d/\rho-1}i\cdot a_{i}\leq(\Delta-1)d
i=0d/ρ1aiΔnd/ρ\displaystyle\sum_{i=0}^{d/\rho-1}a_{i}\leq\frac{\Delta n}{d/\rho}
ai0.\displaystyle a_{i}\geq 0.
Claim 4.

The function

f(x)=(1pd1pdx)1xf(x)=\left(\frac{1-p^{d}}{1-p^{d-x}}\right)^{\frac{1}{x}}

is increasing in xx for x[1,d1]x\in[1,d-1].

Proof.

Taking the derivative of ff, we get

f(x)=(px(pd1)pdpx)1x(pdpx)x2(pdxlogp(pdpx)log(px(pd1)pdpx)).f^{\prime}(x)=\frac{\left(\frac{p^{x}(p^{d}-1)}{p^{d}-p^{x}}\right)^{\frac{1}{x}}}{(p^{d}-p^{x})x^{2}}\cdot\left(p^{d}x\log p-(p^{d}-p^{x})\log\left(\frac{p^{x}(p^{d}-1)}{p^{d}-p^{x}}\right)\right). (19)

The first term on the RHS of Eq.(19) is non-positive: pd<pxp^{d}<p^{x} and pd1<0p^{d}-1<0 because p(0,1)p\in(0,1) and x<dx<d. We now show that the second term is non-positive as well. Denote the second term by g(x)g(x), i.e.,

g(x)=pdxlogp(pdpx)log(px(pd1)pdpx).g(x)=p^{d}x\log p-(p^{d}-p^{x})\log\left(\frac{p^{x}(p^{d}-1)}{p^{d}-p^{x}}\right).

Our plan is to first show that g(x)g(x) is decreasing in xx, and therefore it is maximized at x=1x=1. We then show that g(1)0g(1)\leq 0, and thus conclude that g(x)0g(x)\leq 0 for all x[1,d1]x\in[1,d-1].

Taking derivative of gg, we get

g(x)=pxlogplog(px(pd1)pdpx)=pxlogplog(1pd1pdx).g^{\prime}(x)=p^{x}\log p\log\left(\frac{p^{x}(p^{d}-1)}{p^{d}-p^{x}}\right)=p^{x}\log p\log\left(\frac{1-p^{d}}{1-p^{d-x}}\right).

Because 1\leq d-x\leq d-1 and p\in(0,1), we have 1-p^{d}\geq 1-p^{d-x}>0. Thus, (1-p^{d})/(1-p^{d-x})\geq 1. Furthermore, \log p<0 because p\in(0,1). We can thus conclude that g'(x)\leq 0 for all x\in[1,d-1], or equivalently, that g(x) is decreasing on [1,d-1].

We now show g(1)0g(1)\leq 0.

g(1)\displaystyle g(1) =pdlogp(pdp)log(p(pd1)pdp)\displaystyle=p^{d}\log p-(p^{d}-p)\log\left(\frac{p(p^{d}-1)}{p^{d}-p}\right)
=pd(logplog(p(pd1)pdp))+plog(p(pd1)pdp)\displaystyle=p^{d}\left(\log p-\log\left(\frac{p(p^{d}-1)}{p^{d}-p}\right)\right)+p\log\left(\frac{p(p^{d}-1)}{p^{d}-p}\right)
=pdlogp(pdp)log(pd1pd11)\displaystyle=p^{d}\log p-(p^{d}-p)\log\left(\frac{p^{d}-1}{p^{d-1}-1}\right)
=p\left(p^{d-1}\log p-(p^{d-1}-1)\log\left(\frac{1-p^{d}}{1-p^{d-1}}\right)\right)
=p(pd1logp+(1pd1)log(1+pd1pd1pd1))\displaystyle=p\left(p^{d-1}\log p+(1-p^{d-1})\log\left(1+\frac{p^{d-1}-p^{d}}{1-p^{d-1}}\right)\right)
p(pd1logp+(1pd1)pd1pd1pd1)\displaystyle\leq p\left(p^{d-1}\log p+(1-p^{d-1})\frac{p^{d-1}-p^{d}}{1-p^{d-1}}\right)
=p(pd1logp+pd1pd)\displaystyle=p\left(p^{d-1}\log p+p^{d-1}-p^{d}\right)
=pd(logp+1p)0,\displaystyle=p^{d}\left(\log p+1-p\right)\leq 0,

where the last inequality follows from the fact that 1+tet1+t\leq e^{t} for all tt: we can set t=logpt=\log p. ∎
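As a quick numerical sanity check of Claim 4 (not part of the proof), one can evaluate f on a grid:

    # f(x) = ((1 - p**d) / (1 - p**(d - x)))**(1/x) should be non-decreasing on [1, d-1] for p in (0, 1).
    def f(x, p, d):
        return ((1 - p ** d) / (1 - p ** (d - x))) ** (1.0 / x)

    for p, d in [(0.3, 10), (0.7, 25)]:
        xs = [1 + i * (d - 2) / 200 for i in range(201)]
        vals = [f(x, p, d) for x in xs]
        assert all(a <= b + 1e-12 for a, b in zip(vals, vals[1:]))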

Lemma 11.

For p(0,1)p\in(0,1), an optimal solution to the math program f(Δ,p,d,ρ)f^{\bullet}(\Delta,p,d,\rho) is

ad/ρ1=min{(Δ1)dd/ρ1,Δnd/ρ},a0=Δnd/ρad/ρ1,ai=0i{1,,d/ρ2}.a^{*}_{d/\rho-1}=\min\left\{(\Delta-1)\frac{d}{d/\rho-1},\frac{\Delta n}{d/\rho}\right\},\;a^{*}_{0}=\frac{\Delta n}{d/\rho}-a^{*}_{d/\rho-1},\;a^{*}_{i}=0\;\forall i\in\{1,\ldots,d/\rho-2\}.
Proof.

For ease of notation, we use dd^{-} instead of d/ρd/\rho in the proof. Consider another solution aa such that ad1<min{(Δ1)d/(d1),Δn/d}a_{d^{-}-1}<\min\{(\Delta-1)d/(d^{-}-1),\Delta n/d^{-}\}. If for all i{0,,d2}i\in\{0,\ldots,d^{-}-2\}, ai=0a_{i}=0, then aa is not optimal: neither constraint is tight; thus one could always increase some ai,i<d1a_{i},i<d^{-}-1 to decrease the objective value. Therefore, without loss of generality, we can assume that there exists an index i<d1i<d^{-}-1, ai>0a_{i}>0. Let ii be the biggest such index. For ϵ>0\epsilon>0, consider the following alternative solution:

ad1=ad1+ϵ,ai={aid1iϵi>0aiϵi=0,a^{\prime}_{d^{-}-1}=a_{d^{-}-1}+\epsilon,\;a^{\prime}_{i}=\begin{cases}a_{i}-\frac{{d^{-}-1}}{i}\epsilon&i>0\\ a_{i}-\epsilon&i=0\end{cases},
a0={a0+(d1i1)ϵi>0a0ϵi=0,ak=akk{0,i,d1}a^{\prime}_{0}=\begin{cases}a_{0}+\left(\frac{{d^{-}-1}}{i}-1\right)\epsilon&i>0\\ a_{0}-\epsilon&i=0\end{cases},\;a^{\prime}_{k}=a_{k}\;\forall k\notin\{0,i,{d^{-}-1}\}

It can be easily checked that aa^{\prime} is still feasible for small enough ϵ\epsilon. Let obj(a)obj(a) (resp. obj(a)obj(a^{\prime})) be the objective function value with respect to solution aa (resp. aa^{\prime}). When i>0i>0, we have

obj(a)obj(a)\displaystyle\frac{obj(a^{\prime})}{obj(a)} =((1p)(1pd)d1i1(1pdi)d1i)ϵ\displaystyle=\left(\frac{(1-p)\cdot(1-p^{d^{-}})^{\frac{d^{-}-1}{i}-1}}{(1-p^{d^{-}-i})^{\frac{d^{-}-1}{i}}}\right)^{\epsilon}
=(1p1pd(1pd1pdi)d1i)ϵ\displaystyle=\left(\frac{1-p}{1-p^{d^{-}}}\left(\frac{1-p^{d^{-}}}{1-p^{d^{-}-i}}\right)^{\frac{d^{-}-1}{i}}\right)^{\epsilon}
=((1p1pd)1d1(1pd1pdi)1i)(d1)ϵ\displaystyle=\left(\left(\frac{1-p}{1-p^{d^{-}}}\right)^{\frac{1}{d^{-}-1}}\left(\frac{1-p^{d^{-}}}{1-p^{d^{-}-i}}\right)^{\frac{1}{i}}\right)^{(d^{-}-1)\epsilon}
=((1pd1pdi)1i(1pd1p)1d1)(d1)ϵ1,\displaystyle=\left(\frac{\left(\frac{1-p^{d^{-}}}{1-p^{d^{-}-i}}\right)^{\frac{1}{i}}}{\left(\frac{1-p^{d^{-}}}{1-p}\right)^{\frac{1}{d^{-}-1}}}\right)^{(d^{-}-1)\epsilon}\leq 1,

where the last inequality follows from Claim 4. We have thus shown that obj(a)obj(a)obj(a^{\prime})\leq obj(a) if i>0i>0.

We now show that obj(a)obj(a)obj(a^{\prime})\leq obj(a) if i=0i=0. When i=0i=0, we have

obj(a)obj(a)\displaystyle\frac{obj(a^{\prime})}{obj(a)} =(1p1pd)ϵ1,\displaystyle=\left(\frac{1-p}{1-p^{d^{-}}}\right)^{\epsilon}\leq 1,

because 0<pd<p0<p^{d^{-}}<p. ∎
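For illustration, the closed-form value of f^{\bullet} at the optimal solution given by Lemma 11 can be evaluated numerically as follows; the function name f_bullet is illustrative, and n enters only through the constraint \sum_i a_i\leq\Delta n/(d/\rho).

    def f_bullet(Delta, p, d, rho, n):
        # Objective value of f at the optimal solution of Lemma 11 (assumes d/rho >= 2).
        dm = d / rho
        a_top = min((Delta - 1) * d / (dm - 1), Delta * n / dm)   # a*_{d/rho - 1}
        a_zero = Delta * n / dm - a_top                            # a*_0
        return (1 - p) ** a_top * (1 - p ** dm) ** a_zero

For example, the bound used in the proof of Lemma 12 corresponds to f_bullet(2, p, d/rho, 1, n).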

Lemma 12.

For Δ2\Delta\geq 2, we have

f(Δ,p,d,ρ)f(2,p,d/ρ,1)(Δ1)ρf^{\bullet}(\Delta,p,d,\rho)\geq f^{\bullet}(2,p,d/\rho,1)^{(\Delta-1)\rho}
Proof.

For ease of notation, we use dd^{-} instead of d/ρd/\rho in the proof. From Lemma 11, we have that

f(Δ,p,d,ρ)\displaystyle f^{\bullet}(\Delta,p,d,\rho) =(1p)min{(Δ1)dd1,Δnd}(1pd)Δndmin{(Δ1)dd1,Δnd}\displaystyle=(1-p)^{\min\left\{(\Delta-1)\frac{d}{d^{-}-1},\frac{\Delta n}{d^{-}}\right\}}(1-p^{d^{-}})^{\frac{\Delta n}{d^{-}}-\min\left\{(\Delta-1)\frac{d}{d^{-}-1},\frac{\Delta n}{d^{-}}\right\}}
(1p)(Δ1)dd1(1pd)Δnd(Δ1)dd1.\displaystyle\geq(1-p)^{(\Delta-1)\frac{d}{d^{-}-1}}(1-p^{d^{-}})^{\frac{\Delta n}{d^{-}}-(\Delta-1)\frac{d}{d^{-}-1}}. (20)

Specifically when Δ=2\Delta=2, we have

f(2,p,d,1)\displaystyle f^{\bullet}(2,p,d^{-},1) =(1p)min{dd1,2nd}(1pd)2ndmin{dd1,2nd}\displaystyle=(1-p)^{\min\left\{\frac{d^{-}}{d^{-}-1},\frac{2n}{d^{-}}\right\}}(1-p^{d^{-}})^{\frac{2n}{d^{-}}-\min\left\{\frac{d^{-}}{d^{-}-1},\frac{2n}{d^{-}}\right\}}
=(1p)dd1(1pd)2nddd1,\displaystyle=(1-p)^{\frac{d^{-}}{d^{-}-1}}(1-p^{d^{-}})^{\frac{2n}{d^{-}}-\frac{d^{-}}{d^{-}-1}},

where the second equality follows because for 2dn2\leq d^{-}\leq n, we have dd12nd\frac{d^{-}}{d^{-}-1}\leq\frac{2n}{d^{-}}.

We therefore have

f(2,p,d,1)(Δ1)dd=(1p)dd1(Δ1)dd(1pd)(2nddd1)(Δ1)dd.f^{\bullet}(2,p,d^{-},1)^{(\Delta-1)\frac{d}{d^{-}}}=(1-p)^{\frac{d^{-}}{d^{-}-1}(\Delta-1)\frac{d}{d^{-}}}(1-p^{d^{-}})^{\left(\frac{2n}{d^{-}}-\frac{d^{-}}{d^{-}-1}\right)(\Delta-1)\frac{d}{d^{-}}}. (21)

From Eqs. (20) and (21), to show the desired inequality it suffices to show

  1.
    (1p)(Δ1)dd1(1p)dd1(Δ1)dd.(1-p)^{(\Delta-1)\frac{d}{d^{-}-1}}\geq(1-p)^{\frac{d^{-}}{d^{-}-1}(\Delta-1)\frac{d}{d^{-}}}.
  2.
    (1pd)Δnd(Δ1)dd1(1pd)(2nddd1)(Δ1)dd.(1-p^{d^{-}})^{\frac{\Delta n}{d^{-}}-(\Delta-1)\frac{d}{d^{-}-1}}\geq(1-p^{d^{-}})^{(\frac{2n}{d^{-}}-\frac{d^{-}}{d^{-}-1})(\Delta-1)\frac{d}{d^{-}}}.

We first show

(1p)(Δ1)dd1(1p)dd1(Δ1)dd.(1-p)^{(\Delta-1)\frac{d}{d^{-}-1}}\geq(1-p)^{\frac{d^{-}}{d^{-}-1}(\Delta-1)\frac{d}{d^{-}}}.

Since 1p(0,1)1-p\in(0,1), we equivalently want to show

(Δ1)dd1dd1(Δ1)dd,(\Delta-1)\frac{d}{d^{-}-1}\leq\frac{d^{-}}{d^{-}-1}(\Delta-1)\frac{d}{d^{-}},

which in fact holds with equality, as can be seen by canceling common terms.

We then show

(1pd)Δnd(Δ1)dd1(1pd)(2nddd1)(Δ1)dd.(1-p^{d^{-}})^{\frac{\Delta n}{d^{-}}-(\Delta-1)\frac{d}{d^{-}-1}}\geq(1-p^{d^{-}})^{(\frac{2n}{d^{-}}-\frac{d^{-}}{d^{-}-1})(\Delta-1)\frac{d}{d^{-}}}.

Again, because 1pd(0,1)1-p^{d^{-}}\in(0,1), we equivalently need to show

Δnd(Δ1)dd1(2nddd1)(Δ1)dd.\frac{\Delta n}{d^{-}}-(\Delta-1)\frac{d}{d^{-}-1}\leq(\frac{2n}{d^{-}}-\frac{d^{-}}{d^{-}-1})(\Delta-1)\frac{d}{d^{-}}.

Equivalently

Δnd((2nddd1)dd+dd1)(Δ1)=2nddd(Δ1).\frac{\Delta n}{d^{-}}\leq\left((\frac{2n}{d^{-}}-\frac{d^{-}}{d^{-}-1})\frac{d}{d^{-}}+\frac{d}{d^{-}-1}\right)(\Delta-1)=\frac{2n}{d^{-}}\frac{d}{d^{-}}(\Delta-1).

Dividing both sides by nd(Δ1)\frac{n}{d^{-}}(\Delta-1), we get equivalently

ΔΔ12dd,\frac{\Delta}{\Delta-1}\leq 2\frac{d}{d^{-}},

which holds for Δ2\Delta\geq 2 because Δ/(Δ1)2\Delta/(\Delta-1)\leq 2 and d/d=ρ1d/d^{-}=\rho\geq 1. ∎
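As a sanity check of Lemma 12, the closed forms from Lemma 11 used above can also be compared numerically. The following Python snippet does so on an illustrative grid; the function name and grid are ours, not the paper's notation.

def f_closed(delta, p, d, rho, n):
    # Closed form of the optimal value given by Lemma 11 (with d_minus = d / rho).
    dm = d / rho
    a_top = min((delta - 1) * d / (dm - 1), delta * n / dm)
    return (1 - p) ** a_top * (1 - p ** dm) ** (delta * n / dm - a_top)

ok = all(
    f_closed(delta, p, rho * dm, rho, n)
    >= f_closed(2, p, dm, 1, n) ** ((delta - 1) * rho) - 1e-12
    for delta in (2, 3, 5)
    for p in (0.1, 0.5, 0.9)
    for dm in (2, 3, 6)
    for rho in (1, 1.5, 2)
    for n in (20, 100)
)
print(ok)  # True on this grid, consistent with Lemma 12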

Lemma 13.

Let p>0p>0. Then we have

f(2,p,d,ρ)1LP(2,p,d,ρ).f^{\bullet}(2,p,d,\rho)\geq 1-LP^{\bullet}(2,p,d,\rho).
Proof.

For ease of notation, we use dd^{-} instead of d/ρd/\rho in the proof. Let a=(aij)j[d,d],i[0,j1]a=(a_{ij})_{j\in[d^{-},d],\ i\in[0,j-1]} be a feasible solution to LP(2,p,d,ρ)LP^{\bullet}(2,p,d,\rho). It is sufficient to show that

j=ddi=0j1(1pji)aij1j=ddi=0j1aijpji\prod\limits_{j=d^{-}}^{d}\prod\limits_{i=0}^{j-1}(1-p^{j-i})^{a_{ij}}\geq 1-\sum_{j=d^{-}}^{d}\sum_{i=0}^{j-1}a_{ij}\cdot p^{j-i}

We show, more generally, that for x0,,xk[0,1]x_{0},\ldots,x_{k}\in[0,1] and a0,,ak0a_{0},\ldots,a_{k}\geq 0

i=0k(1xi)ai1i=0kaixi,\prod\limits_{i=0}^{k}(1-x_{i})^{a_{i}}\geq 1-\sum_{i=0}^{k}a_{i}\cdot x_{i},

If 1i=0kaixi01-\sum_{i=0}^{k}a_{i}\cdot x_{i}\leq 0, the inequality is trivially true. So we assume that 1i=0kaixi01-\sum_{i=0}^{k}a_{i}\cdot x_{i}\geq 0, which implies 1aixi01-a_{i}\cdot x_{i}\geq 0 for i{0,,k}i\in\{0,\ldots,k\}. We first show that for a positive integer aa and x[0,1]x\in[0,1] we have

(1x)a1ax(1-x)^{a}\geq 1-ax

To see this, we consider the function g(x)=(1x)a(1ax)g(x)=(1-x)^{a}-(1-ax) over [0,1][0,1]. The derivative of gg is g(x)=aa(1x)a10g^{\prime}(x)=a-a(1-x)^{a-1}\geq 0 for x[0,1]x\in[0,1]. gg is therefore increasing and since g(0)=0g(0)=0, we get that g(x)0g(x)\geq 0 for x[0,1]x\in[0,1]. Therefore, for i{0,,k}i\in\{0,\ldots,k\}

(1xi)ai1aixi0(1-x_{i})^{a_{i}}\geq 1-a_{i}\cdot x_{i}\geq 0

By taking the product we get that

i=0k(1xi)aii=0k(1aixi)\prod\limits_{i=0}^{k}(1-x_{i})^{a_{i}}\geq\prod\limits_{i=0}^{k}(1-a_{i}\cdot x_{i}) (22)

All is left is to show that

i=0k(1aixi)1i=0kaixi\prod\limits_{i=0}^{k}(1-a_{i}\cdot x_{i})\geq 1-\sum_{i=0}^{k}a_{i}\cdot x_{i} (23)

We show (23) by induction on the number of factors. It holds for \ell=0. Suppose (23) holds for some \ell<k, that is,

\prod\limits_{i=0}^{\ell}(1-a_{i}\cdot x_{i})\geq 1-\sum_{i=0}^{\ell}a_{i}\cdot x_{i},

and by multiplying both sides by (1a+1x+1)(1-a_{\ell+1}\cdot x_{\ell+1}), which is nonnegative, we get

i=0+1(1aixi)\displaystyle\prod\limits_{i=0}^{\ell+1}(1-a_{i}\cdot x_{i}) (1a+1x+1)(1i=0aixi)\displaystyle\geq(1-a_{\ell+1}\cdot x_{\ell+1})(1-\sum_{i=0}^{\ell}a_{i}\cdot x_{i})
\displaystyle=1-\sum_{i=0}^{\ell+1}a_{i}\cdot x_{i}+(a_{\ell+1}\cdot x_{\ell+1})\left(\sum_{i=0}^{\ell}a_{i}\cdot x_{i}\right)
1i=0+1aixi\displaystyle\geq 1-\sum_{i=0}^{\ell+1}a_{i}\cdot x_{i}

This concludes the proof of (23). From (22) and (23), we get

i=0k(1xi)ai1i=0kaixi.\prod\limits_{i=0}^{k}(1-x_{i})^{a_{i}}\geq 1-\sum_{i=0}^{k}a_{i}\cdot x_{i}.

Finally, applying this inequality to the values x_{ij}=p^{j-i} and exponents a_{ij} for j\in\{d^{-},\ldots,d\} and i\in\{0,\ldots,j-1\}, we get that

\prod\limits_{j=d^{-}}^{d}\prod\limits_{i=0}^{j-1}(1-p^{j-i})^{a_{ij}}\geq 1-\sum_{j=d^{-}}^{d}\sum_{i=0}^{j-1}a_{ij}\cdot p^{j-i},

which proves the lemma. ∎
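The elementary product inequality above is also easy to check numerically. The snippet below samples x_i in [0,1] and nonnegative integer exponents a_i (matching the integrality used in the per-factor step of the proof) and verifies the bound; the sampling ranges are illustrative only.

import random

random.seed(0)
ok = True
for _ in range(10000):
    k = random.randint(1, 6)
    xs = [random.random() for _ in range(k)]            # x_i in [0, 1]
    exps = [random.randint(0, 4) for _ in range(k)]     # nonnegative integer a_i
    lhs = 1.0
    for x, a in zip(xs, exps):
        lhs *= (1 - x) ** a
    rhs = 1 - sum(a * x for a, x in zip(exps, xs))
    ok = ok and lhs >= rhs - 1e-12
print(ok)  # True: prod_i (1 - x_i)^{a_i} >= 1 - sum_i a_i * x_i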

Claim 5.

For n100n\geq 100 and 2dn2\leq d\leq n, we have

dd1n1/d+1dnn1n1/n+1n1logn2n\frac{d}{d-1}n^{-1/d}+\frac{1}{d}\leq\frac{n}{n-1}n^{-1/n}+\frac{1}{n}\leq 1-\frac{\log n}{2n}
Proof.

We prove the claim in three steps.

  1.

    dd1n1/d+1dnn1n1/n+1n\frac{d}{d-1}n^{-1/d}+\frac{1}{d}\leq\frac{n}{n-1}n^{-1/n}+\frac{1}{n} for n42n\geq 42.

    To see this, we analyze the derivative of

    q(w)=ww1n1w+1w,w[2,).q(w)=\frac{w}{w-1}\cdot n^{-\frac{1}{w}}+\frac{1}{w},\;\;w\in[2,\infty).

    We get

    q(w)\displaystyle q^{\prime}(w) =n1/w(n1/w+2wn1/ww2(1+n1/w)+(w1)wlogn)w2(w1)2\displaystyle=\frac{n^{-1/w}\cdot\left(-n^{1/w}+2wn^{1/w}-w^{2}(1+n^{1/w})+(w-1)w\log n\right)}{w^{2}(w-1)^{2}}
    =n1/w(n1/w(w1)2w2+w(w1)logn)w2(w1)2\displaystyle=\frac{n^{-1/w}\cdot\left(-n^{1/w}\cdot(w-1)^{2}-w^{2}+w(w-1)\log n\right)}{w^{2}(w-1)^{2}}
    n1/w(e(w1)2w2+w(w1)logn)w2(w1)2.\displaystyle\geq\frac{n^{-1/w}\cdot\left(-e\cdot(w-1)^{2}-w^{2}+w(w-1)\log n\right)}{w^{2}(w-1)^{2}}.

    When logne+1\log n\geq e+1, we have w(w1)lognw(w1)(e+1)w2+e(w1)2w(w-1)\log n\geq w(w-1)(e+1)\geq w^{2}+e(w-1)^{2}, and therefore q(w)0q^{\prime}(w)\geq 0 when n42ee+1n\geq 42\geq e^{e+1}.

  2.

    n1/n134lognnn^{-1/n}\leq 1-\frac{3}{4}\frac{\log n}{n} for n10n\geq 10

    To see this, consider the function g(x)=134logxxx1/xg(x)=1-\frac{3}{4}\frac{\log x}{x}-x^{-1/x} over the interval [e,)[e,\infty). We have

    g(x)=1logxx2(x1/x34).g^{\prime}(x)=\frac{1-\log x}{x^{2}}(x^{-1/x}-\frac{3}{4}).

    Since x1/xx^{-1/x} is increasing, 101/10>3/410^{-1/10}>3/4, and 1logx<01-\log x<0 for x10x\geq 10, we have that g(x)<0g^{\prime}(x)<0 for x10x\geq 10. Therefore, gg is decreasing over [10,)[10,\infty). Since limxg(x)=0\lim\limits_{x\rightarrow\infty}g(x)=0, this implies that g(n)0g(n)\geq 0 for n10n\geq 10.

  3.

    nn1(134lognn)+1n1logn2n\frac{n}{n-1}(1-\frac{3}{4}\frac{\log n}{n})+\frac{1}{n}\leq 1-\frac{\log n}{2n} for n100n\geq 100.

    To see this, we use the following sequence of equivalences

    nn1(134lognn)+1n1logn2n\displaystyle\frac{n}{n-1}(1-\frac{3}{4}\frac{\log n}{n})+\frac{1}{n}\leq 1-\frac{\log n}{2n} (1+1n1)(134lognn)+1n1logn2n\displaystyle\Leftrightarrow(1+\frac{1}{n-1})(1-\frac{3}{4}\frac{\log n}{n})+\frac{1}{n}\leq 1-\frac{\log n}{2n}
    \displaystyle\Leftrightarrow\frac{1}{n-1}+\frac{1}{n}-\frac{3}{4}\frac{\log n}{n(n-1)}\leq\frac{\log n}{4n}
    4n+4(n1)3logn(n1)logn\displaystyle\Leftrightarrow 4n+4(n-1)-3\log n\leq(n-1)\log n

    The last inequality is true for n100n\geq 100. ∎
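A numerical spot check of Claim 5, assuming log denotes the natural logarithm as in the rest of the argument; the grid of values of n below is illustrative, not exhaustive.

import math

def lhs(d, n):
    return d / (d - 1) * n ** (-1 / d) + 1 / d

def mid(n):
    return n / (n - 1) * n ** (-1 / n) + 1 / n

ok = True
for n in (100, 200, 500, 1000, 10000):
    ok = ok and mid(n) <= 1 - math.log(n) / (2 * n)
    for d in range(2, n + 1):
        ok = ok and lhs(d, n) <= mid(n) + 1e-12
print(ok)  # True on this grid, consistent with Claim 5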

Lemma 14.

The optimal solution of LP(2,p,d/ρ,1)LP^{\bullet}(2,p,d/\rho,1) is given by

ad/ρ1=min{d/ρd/ρ1,2nd/ρ},a0=2nd/ρad/ρ1,ai=0i{1,2,,d/ρ2}.a_{d/\rho-1}=\min\left\{\frac{d/\rho}{d/\rho-1},\frac{2n}{d/\rho}\right\},\;a_{0}=\frac{2n}{d/\rho}-a_{d/\rho-1},\;a_{i}=0\;\;\forall i\in\{1,2,\ldots,d/\rho-2\}. (24)
Proof.

For ease of notation, we use dd^{-} instead of d/ρd/\rho in the proof. Suppose to the contrary that there exists an optimal solution aa such that ad1<min{d/(d1),2n/d}a_{d^{-}-1}<\min\{d^{-}/(d^{-}-1),2n/d^{-}\}. There must exist i<d1i<d^{-}-1 such that ai>0a_{i}>0: neither constraint is tight if for all i<d1i<d^{-}-1, ai=0a_{i}=0; thus one could always increase some ai,i<d1a_{i},i<d^{-}-1 to increase the objective value. Let ii be the biggest index below d1d^{-}-1 such that ai>0a_{i}>0. For ϵ>0\epsilon>0, consider the following alternative solution:

ad1=ad1+ϵ,ai={aid1iϵi>0aiϵi=0,a0={a0+(d1i1)ϵi>0a0ϵi=0,a^{\prime}_{d^{-}-1}=a_{d^{-}-1}+\epsilon,\;a^{\prime}_{i}=\begin{cases}a_{i}-\frac{{d^{-}-1}}{i}\epsilon&i>0\\ a_{i}-\epsilon&i=0\end{cases},\;a^{\prime}_{0}=\begin{cases}a_{0}+\left(\frac{{d^{-}-1}}{i}-1\right)\epsilon&i>0\\ a_{0}-\epsilon&i=0\end{cases},
ak=akk{0,i,d1}a^{\prime}_{k}=a_{k}\;\;\forall k\notin\{0,i,{d^{-}-1}\}

It can be easily checked that aa^{\prime} is still feasible for small enough ϵ\epsilon. When i>0i>0, the net increase in objective value from aa to aa^{\prime} is

\displaystyle\epsilon\cdot\left(p^{d^{-}-(d^{-}-1)}+\left(\frac{d^{-}-1}{i}-1\right)\cdot p^{d^{-}}-\frac{d^{-}-1}{i}\cdot p^{d^{-}-i}\right). (25)

Now consider the following function

f(x)=1x(pdxpd),x[1,d1].f(x)=\frac{1}{x}(p^{d^{-}-x}-p^{d^{-}}),\;\;x\in[1,{d^{-}-1}].

We claim that f(x)f(x) is strictly increasing when p(0,1)p\in(0,1): taking the derivative of f(x)f(x), we get

f(x)=pdx(px(1+xlogp))x2=pdx(exlogp(1+xlogp))x2>0,f^{\prime}(x)=\frac{p^{d^{-}-x}\cdot(p^{x}-(1+x\log p))}{x^{2}}=\frac{p^{d^{-}-x}\cdot(e^{x\log p}-(1+x\log p))}{x^{2}}>0,

where the inequality follows from the fact that et>1+te^{t}>1+t for all t0t\neq 0, applied with t=xlogp0t=x\log p\neq 0 since p(0,1)p\in(0,1).

Therefore, we have that f(d1)>f(i)f({d^{-}-1})>f(i), which translates to

1d1(pd(d1)pd)>1i(pdipd).\frac{1}{{d^{-}-1}}(p^{d^{-}-{(d^{-}-1)}}-p^{d^{-}})>\frac{1}{i}(p^{d^{-}-i}-p^{d^{-}}). (26)

By rearranging Eq.(26), we can conclude that the net increase given in (25) is strictly positive, thus contradicting the assumption that aa is an optimal solution.

Now consider the case where i=0i=0. The net increase in objective value from aa to aa^{\prime} is

ϵ(pd(d1)pd),\displaystyle\epsilon\cdot\left(p^{d^{-}-{(d^{-}-1)}}-p^{d^{-}}\right), (27)

which is clearly strictly positive if p(0,1)p\in(0,1), contradicting the assumption that aa is an optimal solution. ∎
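The monotonicity of f(x) used above can be checked numerically on an illustrative grid; the names below are ours.

def f(x, p, d_minus):
    # f(x) = (p^{d^- - x} - p^{d^-}) / x, as defined above.
    return (p ** (d_minus - x) - p ** d_minus) / x

increasing = all(
    f(i + 1, p, dm) > f(i, p, dm)
    for p in (0.1, 0.5, 0.9)
    for dm in range(3, 10)
    for i in range(1, dm - 1)
)
print(increasing)  # True: f(d^- - 1) > f(i) for every i in {1, ..., d^- - 2}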

Lemma 15.

Assume d2d\geq 2. When p=(2n)ρdp=(2n)^{-\frac{\rho}{d}} and n100n\geq 100, we have

LP(2,p,d/ρ,1)1logn2n.LP^{\bullet}(2,p,d/\rho,1)\leq 1-\frac{\log n}{2n}.
Proof.

From Lemma 14, we know that

LP(2,p,d/ρ,1)d/ρd/ρ1p+2nd/ρpd/ρ.LP^{\bullet}(2,p,d/\rho,1)\leq\frac{d/\rho}{d/\rho-1}\cdot p+\frac{2n}{d/\rho}\cdot p^{d/\rho}.

Therefore, when p=(2n)ρdp=(2n)^{-\frac{\rho}{d}},

LP(2,p,d/ρ,1)\displaystyle LP^{\bullet}(2,p,d/\rho,1) d/ρd/ρ1(2n)1d/ρ+2nd/ρ(2n)1\displaystyle\leq\frac{d/\rho}{d/\rho-1}\cdot(2n)^{-\frac{1}{d/\rho}}+\frac{2n}{d/\rho}\cdot(2n)^{-1}
d/ρd/ρ1n1d/ρ+1d/ρ,\displaystyle\leq\frac{d/\rho}{d/\rho-1}\cdot n^{-\frac{1}{d/\rho}}+\frac{1}{d/\rho}, (28)

where the last inequality follows because (2n)1d/ρn1d/ρ(2n)^{-\frac{1}{d/\rho}}\leq n^{-\frac{1}{d/\rho}}.

From Claim 5, we have that for d/ρ2d/\rho\geq 2 and n100n\geq 100, (28) is upper bounded by 1logn2n1-\frac{\log n}{2n}. ∎
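For concreteness, the upper bound in the chain above can be evaluated numerically for a few illustrative choices of n, d, and ρ, with p=(2n)^{-\rho/d} as in the lemma and log the natural logarithm.

import math

def lp_upper_bound(n, d, rho):
    # (d^-/(d^- - 1)) * p + (2n/d^-) * p^{d^-}, with p = (2n)^(-rho/d) and d^- = d/rho.
    dm = d / rho
    p = (2 * n) ** (-rho / d)
    return dm / (dm - 1) * p + (2 * n / dm) * p ** dm

for n, d, rho in ((100, 4, 1), (100, 6, 1.5), (1000, 10, 2)):
    print(n, d, rho, lp_upper_bound(n, d, rho) <= 1 - math.log(n) / (2 * n))
# Each line prints True, consistent with Lemma 15.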

B.2 Proof of Theorem 8

See Theorem 8.

Proof.

To show that Algorithm 5 returns the hypergraph with high probability, we will show the following:

  1.

    With high probability, for every edge, there exists at least one sample that contains that edge and no other edges.

  2.

    If a sample contains more than one edge, we will learn at most the intersection of the edges, and this intersection will be discarded in the last for-loop of the algorithm.

The second point above is easy to see: if a sample SiS_{i} contains more than one edge, then QH(Si{v})=0Q_{H}(S_{i}\setminus\{v\})=0 if and only if vv is in the intersection of all edges contained in SiS_{i}. If we can successfully learn any edge in SiS_{i} from another sample SjS_{j} that contains only that edge, then the intersection learned through sample SiS_{i} will be discarded as it is a proper subset of the edge learned using SjS_{j}.
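To make this recovery step concrete, the following is a minimal Python sketch (illustrative names only, not the paper's pseudocode for Algorithm 5) of how a candidate edge set is extracted from a single sample, assuming an edge-detecting oracle query(S) that returns True exactly when S contains an edge of H.

def candidate_from_sample(sample, query):
    """Return the set of vertices of `sample` whose removal destroys every edge
    contained in `sample`, or None if `sample` contains no edge at all."""
    if not query(sample):
        return None
    return frozenset(v for v in sample if not query(sample - {v}))

# Toy illustration with two edges; `query` answers edge-detecting queries.
edges = [frozenset({1, 2, 3}), frozenset({3, 4, 5})]

def query(S):
    return any(e <= S for e in edges)

print(candidate_from_sample(frozenset({1, 2, 3, 6}), query))     # the single edge {1, 2, 3}
print(candidate_from_sample(frozenset({1, 2, 3, 4, 5}), query))  # only the intersection {3}

Each call issues at most |sample| + 1 edge-detecting queries, which matches the O(n) queries per sample counted in the query complexity analysis below.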

For the rest of the proof, we focus on showing the first point. To reiterate, we want to show that every edge of HH appears in at least one sample by itself. Below, we show that this is the case with probability 1o(1)1-o(1).

We first show that with high probability, each edge ee of size d[d/ρ,d]d^{\prime}\in[d/\rho,d] is contained in at least (2n)(Δ1)ρ(log2n)/2(2n)^{(\Delta-1)\rho}(\log^{2}n)/2 samples SiS_{i}: letting XeX_{e} denote the number of samples containing ee, we have

𝔼(Xe)\displaystyle\mathbb{E}(X_{e}) =(2n)Δρlog2npd\displaystyle=(2n)^{\Delta\rho}\log^{2}n\cdot p^{d^{\prime}}
(2n)Δρlog2npd\displaystyle\geq(2n)^{\Delta\rho}\log^{2}n\cdot p^{d}
=(2n)Δρlog2n(2n)ρ\displaystyle=(2n)^{\Delta\rho}\log^{2}n\cdot(2n)^{-\rho}
=(2n)(Δ1)ρlog2n\displaystyle=(2n)^{(\Delta-1)\rho}\log^{2}n

By Chernoff bound, we have

(Xe(2n)(Δ1)ρ(log2n)/2)e(2n)(Δ1)ρ(log2n)/8.\mathbb{P}(X_{e}\leq(2n)^{(\Delta-1)\rho}(\log^{2}n)/2)\leq e^{-(2n)^{(\Delta-1)\rho}(\log^{2}n)/8}.

As there are at most Δnn2\Delta n\leq n^{2} edges of size d[d/ρ,d]d^{\prime}\in[d/\rho,d], by a union bound, we have

P(eE s.t. |e|[d/ρ,d],Xe(2n)(Δ1)ρ(log2n)/2)\displaystyle P(\exists e\in E\text{ s.t. }|e|\in[d/\rho,d],X_{e}\leq(2n)^{(\Delta-1)\rho}(\log^{2}n)/2)
Δne(2n)(Δ1)ρ(log2n)/8=elogΔ+logn(2n)(Δ1)ρ(log2n)/8\displaystyle\leq\Delta ne^{-(2n)^{(\Delta-1)\rho}(\log^{2}n)/8}=e^{\log\Delta+\log n-(2n)^{(\Delta-1)\rho}(\log^{2}n)/8}
elogn((2n)(Δ1)ρ(logn)/82)\displaystyle\leq e^{-\log n((2n)^{(\Delta-1)\rho}(\log n)/8-2)}
elogn((logn)/82)=n(logn)/8+2=o(1).\displaystyle\leq e^{-\log n((\log n)/8-2)}=n^{-(\log n)/8+2}=o(1).

Consequently,

P(eE s.t. |e|[d/ρ,d],Xe(2n)(Δ1)ρ(log2n)/2)1o(1).P(\forall e\in E\text{ s.t. }|e|\in[d/\rho,d],X_{e}\geq(2n)^{(\Delta-1)\rho}(\log^{2}n)/2)\geq 1-o(1).

From now on we condition on the event that all edges ee whose size is between d/ρd/\rho and dd are included in at least (2n)(Δ1)ρ(log2n)/2(2n)^{(\Delta-1)\rho}(\log^{2}n)/2 samples.

From Lemma 1, we have that for all n100n\geq 100,

(eE{e},eSi|eSi)1(logn2n)(Δ1)ρ.\displaystyle\mathbb{P}(\exists e^{\prime}\in E\setminus\{e\},\ e^{\prime}\subseteq S_{i}\ |\ e\subseteq S_{i})\leq 1-(\frac{\log n}{2n})^{(\Delta-1)\rho}.

As each edge ee with size between d/ρd/\rho and dd is contained in at least (2n)(Δ1)ρ(log2n)/2(2n)^{(\Delta-1)\rho}(\log^{2}n)/2 samples, we have that

(Si s.t. eSi,eE{e} s.t. eSi)\displaystyle\mathbb{P}(\forall S_{i}\text{ s.t. }e\in S_{i},\exists e^{\prime}\in E\setminus\{e\}\mbox{ s.t. }e^{\prime}\subseteq S_{i}) (1(logn2n)(Δ1)ρ)(2n)(Δ1)ρ(log2n)/2\displaystyle\leq\left(1-\left(\frac{\log n}{2n}\right)^{(\Delta-1)\rho}\right)^{(2n)^{(\Delta-1)\rho}(\log^{2}n)/2}
e(logn2n)(Δ1)ρ(2n)(Δ1)ρ(log2n)/2\displaystyle\leq e^{-\left(\frac{\log n}{2n}\right)^{(\Delta-1)\rho}(2n)^{(\Delta-1)\rho}(\log^{2}n)/2}
\displaystyle=e^{-(\log^{(\Delta-1)\rho+2}n)/2}=n^{-(\log^{(\Delta-1)\rho+1}n)/2}.

By another union bound on all edges of size between d/ρd/\rho and dd (at most Δn\Delta n of them), we have that

\displaystyle\mathbb{P}(\exists e\text{ s.t. }|e|\in[d/\rho,d],\forall S_{i}\text{ s.t. }e\in S_{i},\exists e^{\prime}\in E\setminus\{e\},\ e^{\prime}\subseteq S_{i})\leq n^{-(\log^{(\Delta-1)\rho+1}n)/2+2}=o(1).

We can thus conclude that with probability at least 1o(1)1-o(1), for all eEe\in E with size between d/ρd/\rho and dd, there exists at least one sample SiS_{i} that contains ee but no other remaining edges.

We now analyze the query complexity of FindLowDegreeEdges. It is clear that FindLowDegreeEdges is non-adaptive because all the queries can be made in parallel. Regarding query complexity, FindLowDegreeEdges constructs (2n)ρΔlog2n(2n)^{\rho\Delta}\log^{2}n samples, and for each one of these samples, makes at most 𝒪(n)\mathcal{O}(n) queries. Therefore, FindLowDegreeEdges makes at most

𝒪(n(2n)ρΔlog2n)=𝒪((2n)ρΔ+1log2n)\mathcal{O}(n(2n)^{\rho\Delta}\log^{2}n)=\mathcal{O}((2n)^{\rho\Delta+1}\log^{2}n)

queries in total. ∎

Appendix C Sequential and Parallel Runtime

The runtime of our proposed algorithms is not much worse than their query complexity. When running sequentially, FindEdgeAdaptive requires an additional O(n)O(n) time to perform each set partition, FindDisjointEdges requires an additional O(n)O(n) time to construct a set of independently sampled vertices, and FindLowDegreeEdges requires an additional O((Δn)2d)O((\Delta n)^{2}d) time to execute the last for-loop that consolidates the candidate edge sets. We summarize both the sequential and parallel runtimes for each algorithm in Table 1 below, which we will include in the next version of the paper. We assume that each query can be made in O(1)O(1) time, and for the parallel runtime analysis, we assume access to polynomially many parallel processors. Some of our algorithms use a subroutine that can be either FindEdgeParallel or FindEdgeAdaptive; we use PRL and ADA to refer to these subroutines, respectively.

Algorithm | Query Complexity | Sequential Runtime | Parallel Runtime
PRL: FindEdgeParallel(S) | O(|S|) | O(|S|) | O(1)
ADA: FindEdgeAdaptive(S,s) | O(s\log(|S|)) | O(s|S|\log(|S|)) | O(\log|S|)
FindDisjointEdges(PRL) | O(n^{\alpha}\log^{2}n+n^{2+\alpha}\log^{2}n) | O(n^{\alpha+1}\log^{2}n) | O(1)
FindDisjointEdges(ADA) | O(n^{\alpha}\log^{2}n+2n^{\alpha}\log^{3}n) | O(n^{\alpha+2}\log^{3}n) | O(\log n)
FindMatching(PRL) | O(n^{3}\log^{3}n) | O(n^{3}\log^{3}n) | O(\log^{2}n)
FindMatching(ADA) | O(n\log^{5}n) | *O(n^{\alpha+2}\log^{6}n) | O(\log^{4}n)
FindLowDegreeEdges | O((2n)^{\rho\Delta+1}\log^{2}n) | O((2n)^{\rho\Delta+1}\log^{2}n+(\Delta n)^{2}d) | O(\log n)
Table 1: Algorithm runtimes (with \alpha=1/(1-1/(2\log n)) in the cell marked *)