
Learning Low Degree Hypergraphs

Eric Balkanski (Columbia University, [email protected])    Oussama Hanguir (Columbia University, [email protected])    Shatian Wang (Columbia University, [email protected])
Abstract

We study the problem of learning a hypergraph via edge-detecting queries. In this problem, a learner queries subsets of vertices of a hidden hypergraph and observes whether these subsets contain an edge or not. In general, learning a hypergraph with $m$ edges of maximum size $d$ requires $\Omega((2m/d)^{d/2})$ queries [7]. In this paper, we aim to identify families of hypergraphs that can be learned without suffering from a query complexity that grows exponentially in the size of the edges.

We show that hypermatchings and low-degree near-uniform hypergraphs with $n$ vertices are learnable with $\operatorname{poly}(n)$ queries. For learning hypermatchings (hypergraphs of maximum degree $1$), we give an $\mathcal{O}(\log^{3}n)$-round algorithm with $\mathcal{O}(n\log^{5}n)$ queries. We complement this upper bound by showing that there are no algorithms with $\operatorname{poly}(n)$ queries that learn hypermatchings in $o(\log\log n)$ adaptive rounds. For hypergraphs with maximum degree $\Delta$ and edge size ratio $\rho$, we give a non-adaptive algorithm with $\mathcal{O}((2n)^{\rho\Delta+1}\log^{2}n)$ queries. To the best of our knowledge, these are the first algorithms with $\operatorname{poly}(n,m)$ query complexity for learning non-trivial families of hypergraphs that have a super-constant number of edges of super-constant size. (Accepted for presentation at the Conference on Learning Theory (COLT) 2022.)

1 Introduction

Hypergraphs are a powerful tool in computational chemistry and molecular biology where they are used to represent groups of chemicals and molecules that cause a reaction. The chemicals and molecules that cause a reaction are however often unknown a priori, and learning such groups is a central problem of interest that has motivated a long line of work on hypergraph learning in the edge-detecting queries model, e.g., [18, 7, 6, 2, 3, 1]. In this model, a learner queries subsets of vertices and, for each queried subset, observes whether it contains an edge or not. When the vertices represent chemicals, the learner queries groups of chemicals and observes a reaction if a subgroup reacts. The goal is to learn the edges with a small number of queries, i.e., a small number of experiments.

An important application of learning chemical reaction networks is to learn effective drug combinations (drugs are vertices and effective drug combinations are hyperedges). In particular, there has recently been a lot of interest in finding drug combinations that reduce cancer cell viability. For example, as part of AstraZeneca’s recent DREAM Challenge, the effectiveness of a large number of drug combinations was tested against different cancer cell lines [17]. Despite this interest, as noted by Flobak et al. [14], “our knowledge about beneficial targeted drug combinations is still limited, partly due to the combinatorial complexity”, especially since the time required to culture, maintain and test cell lines is significant. This combinatorial complexity motivates the need for novel computational methods that efficiently explore drug combinations.

One main obstacle to discovering effective combinations that involve $d>2$ vertices is that $\Omega((2m/d)^{d/2})$ queries are required to learn hypergraphs with $m$ edges of maximum size $d$ [6]. Since there is no efficient algorithm for learning general hypergraphs with large maximum edge size, the main question is which families of hypergraphs can be efficiently learned.

Which families of hypergraphs can we learn with a number of queries
that does not grow exponentially in the size of the edges?

Our results.

We develop the first algorithms with $\operatorname{poly}(n,m)$ queries for learning non-trivial families of hypergraphs that have a super-constant number of edges $m$ of super-constant size. The first family of such hypergraphs that we consider are hypermatchings, i.e., hypergraphs such that every vertex is in at most one edge. Learning a hypermatching generalizes the problem of learning a matching studied in [4, 8]. In addition to their query complexity, we are also interested in the adaptive complexity of our algorithms, which is the number of adaptive rounds of queries required when the algorithm can perform multiple queries in parallel in each round. Our first main result is an algorithm with near-linear query complexity and poly-logarithmic adaptive complexity for learning hypermatchings.

Theorem.

There is an $\mathcal{O}(\log^{3}n)$-adaptive algorithm with $\mathcal{O}(n\log^{5}n)$ queries that, for any hypermatching $M$ over $n$ vertices, learns $M$ with high probability.

The adaptivity of the algorithm can be improved to $\mathcal{O}(\log n)$ at the expense of $\mathcal{O}(n^{3}\log^{3}n)$ queries. We complement our algorithm by showing that there are no $o(\log\log n)$-adaptive algorithms that learn hypermatchings using $\operatorname{poly}(n)$ queries.

Theorem.

There is no $o(\log\log n)$-adaptive algorithm with polynomial query complexity that learns an arbitrary hypermatching $M$ over $n$ vertices, even with small probability.

Moving beyond hypermatchings, we study hypergraphs whose vertices have bounded maximum degree $\Delta\geq 2$ ($\Delta=1$ for hypermatchings). For such hypergraphs, the previously mentioned $\Omega((2m/d)^{d/2})$ lower bound on the number of queries needed by any algorithm implicitly also implies that $\Omega((m/\rho)^{\rho})$ queries are required, even when $\Delta=2$, where $\rho\geq 1$ is the ratio between the maximum and minimum edge sizes [6]. Thus, an exponential dependence on this near-uniformity parameter $\rho$ is, in general, unavoidable. We give a non-adaptive algorithm whose query complexity depends on the maximum degree $\Delta$ and the near-uniformity parameter $\rho$ of the hypergraph $H$ we wish to learn.

Theorem.

There is a non-adaptive algorithm with $\mathcal{O}((2n)^{\rho\Delta+1}\log^{2}n)$ queries that, for any $\rho$-near-uniform hypergraph $H$ with maximum degree $\Delta\geq 2$, learns $H$ with high probability.

This query complexity is independent of the maximum size $d$ of the edges and is polynomial in the number of vertices $n$ when the maximum degree $\Delta$ and the edge size ratio $\rho$ are constant. Our learning algorithms rely on novel constructions of collections of queries that satisfy a simple property that we call unique-edge covering, which is a general approach for constructing learning algorithms with a query complexity that does not grow exponentially in the size of the edges. We believe that an interesting direction for future work is to identify, in addition to hypermatchings, other relevant families of hypergraphs that are learnable with $\operatorname{poly}(n,m)$ queries, even when both the maximum edge size $d$ and the edge size ratio $\rho$ are non-constant.

Technical overview.

Previous work on learning a hypergraph $H=(V,E)$ relies on constructing a collection of sets of vertices called an independent covering family [7] or a cover-free family [3, 15, 16, 2]. These families aim to identify non-edges, which are sets of vertices $T\subseteq V$ that do not contain an edge $e\in E$. More precisely, both of these families are collections $\mathcal{S}$ of sets of vertices such that every non-edge set $T$ of size $d$ is contained in at least one set $S\in\mathcal{S}$ that does not contain an edge. These families have a minimum size that is exponential in $d$, and these approaches require a number of queries that is at least the size of these families. In contrast to these existing approaches that focus on non-edges, we construct a collection $\mathcal{S}$ of sets of vertices such that every edge $e$ is contained in at least one set $S\in\mathcal{S}$ that does not contain any other edge of $H$. We call such a collection a unique-edge covering family. Since the number of edges in a hypergraph $H$ with maximum degree $\Delta$ is at most $\Delta n$, there exist unique-edge covering families whose sizes depend on the maximum degree $\Delta$ of $H$ instead of the maximum edge size $d$.

The algorithm for learning a hypermatching $M$ proceeds in phases. In each phase, we construct a unique-edge covering family of a subgraph of $M$ containing all edges whose size is in a specified range. This unique-edge covering family is constructed with i.i.d. sampled sets that contain each vertex, independently, with a probability depending on the edge size range. The edge size range is widened at the end of each phase. One main challenge with hypermatchings is to obtain a near-linear query complexity. We do so by developing a subroutine which, given a set $S$ that covers a unique edge of size at most $s$, finds this unique edge with $\mathcal{O}(s\log|S|)$ queries. For the lower bound for hypermatchings, there is a simple construction that shows that there are no non-adaptive algorithms for learning hypermatchings with $\operatorname{poly}(n)$ queries. The technical part of interest is extending this construction and its analysis to hold for $\Omega(\log\log n)$ rounds. Finally, for low-degree near-uniform hypergraphs $H$, we construct a unique-edge covering family in a similar manner to those for hypermatchings ($\Delta=1$). The main technical difficulty for hypergraphs with maximum degree $\Delta>1$ is in analyzing the number of samples required to construct a unique-edge covering family, which is significantly more involved than for hypermatchings due to overlapping edges.

Related work.

The problem of learning a hidden hypergraph with edge-detecting queries was first introduced by Torney [18] for complex group testing. This problem is also known as exact learning from membership queries of a monotone DNF with at most $m$ monomials, where each monomial contains at most $d$ variables [5, 6, 9].

Regarding hardness results, non-adaptive algorithms for learning hypergraphs with $m$ edges of size at most $d$ require $\Omega(N(m,d)\log n)$ queries, where $N(m,d)=\frac{m+d}{\log\binom{m+d}{d}}\binom{m+d}{d}$ [2, 16]. Angluin and Chen [6] show that $\Omega((2m/d)^{d/2})$ queries are required by any (adaptive) algorithm for learning general hypergraphs, and they later extended this lower bound in [7] to $\Omega((2m/(\delta+2))^{1+\delta/2})$ for learning near-uniform hypergraphs with maximum edge size difference $\delta=d-\min_{e\in H}|e|$.

Regarding adaptive algorithms, Angluin and Chen [7] give an algorithm that learns a hypergraph with $\mathcal{O}(2^{\mathcal{O}((1+\frac{\delta}{2})d)}\cdot m^{1+\frac{\delta}{2}}\cdot \operatorname{poly}(\log n))$ queries in $\mathcal{O}((1+\delta)\min\{2^{d}(\log m+d)^{2},(\log m+d)^{3}\})$ rounds with high probability. When $m<d$, Abasi et al. [2] give a randomized algorithm that achieves a query complexity of $\mathcal{O}(dm\log n+(d/m)^{m-1+o(1)})$ and show an almost matching $\Omega(dm\log n+(d/m)^{m-1})$ lower bound. When $m\geq d$, they present a randomized algorithm that asks $(cm)^{d/2+0.75}+dm\log n$ queries for some constant $c$, almost matching the $\Omega((2m/d)^{d/2})$ lower bound of [6]. Chodoriwsky and Moura [12] develop an adaptive algorithm with $\mathcal{O}(m^{d}\log n)$ queries. Adaptive algorithms with $\mathcal{O}(md(\log_{2}{n}+(md)^{d}))$ queries are constructed in [13]. Regarding non-adaptive algorithms, Chin et al. [11] and Abasi et al. [3] give deterministic non-adaptive algorithms with an almost optimal query complexity $N(m,d)^{1+o(1)}\log n$. The problem of learning a hypergraph when a fraction of the queries are incorrect is studied in Chen et al. [10] and Abasi [1]. We note that, to the best of our knowledge, all existing algorithms require a number of queries that is exponential in $d$, or exponential in $m$ when $m<d$. We develop the first algorithms using $\operatorname{poly}(n)$ queries for learning non-trivial families of hypergraphs that have arbitrarily large edge sizes and a super-constant number of edges $m$.

Finally, learning a matching has been studied in the context of graphs ($d=2$) [4, 8]. In particular, Alon et al. [4] provide a non-adaptive algorithm with $\mathcal{O}(n\log n)$ queries. To the best of our knowledge, there is no previous work on learning hypermatchings, which generalize matchings to hypergraphs of maximum edge size $d>2$. Our lower bound shows that, in contrast to algorithms for learning matchings, algorithms for learning hypermatchings require multiple adaptive rounds. Similar to the algorithm in [4] for learning matchings, our algorithm for learning hypermatchings has a near-linear query complexity.

2 Preliminaries

A hypergraph $H=(V,E)$ consists of a set of $n$ vertices $V$ and a set of $m$ edges $E\subseteq 2^{V}$. We abuse notation and write $e\in H$ for edges $e\in E$. For the problem of learning a hypergraph $H$, the learner is given $V$ and an edge-detecting oracle $Q_{H}$. Given a query $S\subseteq V$, the oracle answers $Q_{H}(S)=1$ if $S$ contains an edge, i.e., there exists $e\in H$ such that $e\subseteq S$; otherwise, $Q_{H}(S)=0$. The goal is to learn the edge set $E$ using a small number of queries. When queries can be evaluated in parallel, we are interested in algorithms with low adaptivity. The adaptivity of an algorithm is measured by the number of sequential rounds of queries it makes, where, in each round, queries are evaluated in parallel. An algorithm is non-adaptive if its adaptivity is $1$.
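For concreteness, the edge-detecting oracle can be viewed as the following minimal Python sketch; the class and method names are ours, purely for illustration.

class EdgeDetectingOracle:
    """Edge-detecting oracle Q_H for a hypergraph H = (V, E)."""

    def __init__(self, edges):
        # Each edge is stored as a frozenset of vertices.
        self.edges = [frozenset(e) for e in edges]

    def query(self, S):
        """Return 1 if the queried vertex set S contains some edge of H, else 0."""
        S = set(S)
        return int(any(e <= S for e in self.edges))

# Example: a hypermatching with edges {1,2} and {3,4,5}.
oracle = EdgeDetectingOracle([{1, 2}, {3, 4, 5}])
assert oracle.query({1, 2, 6}) == 1   # contains the edge {1,2}
assert oracle.query({1, 3, 4}) == 0   # contains no edge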

The degree of a vertex $v$ is the number of edges $e\in H$ such that $v\in e$. The maximum degree $\Delta$ of $H$ is the maximum degree of a vertex $v\in V$. When $\Delta=1$, every vertex is in at most one edge and $H$ is called a hypermatching, which we also denote by $M$. The rank $d$ of $H$ is the maximum size of an edge $e\in H$, i.e., $d:=\max_{e\in H}|e|$. The edge size ratio of a hypergraph $H$ is $d/\min_{e\in H}|e|$. A hypergraph is $\rho$-near-uniform if $d/\min_{e\in H}|e|\leq\rho$ and uniform if $d/\min_{e\in H}|e|=1$.

3 Learning Hypermatchings

In this section, we study the problem of learning hypermatchings, i.e., hypergraphs of maximum degree $\Delta=1$. In Section 3.1, we present an algorithm that, with high probability, learns a hypermatching with $\mathcal{O}(n\log^{5}n)$ queries in $\mathcal{O}(\log^{3}n)$ rounds. The number of rounds can be improved to $\mathcal{O}(\log n)$ at the expense of an $\mathcal{O}(n^{3}\log^{3}n)$ query complexity. In Section 3.2, we show that there is no $o(\log\log n)$-round algorithm that learns hypermatchings with $\operatorname{poly}(n)$ queries. Some proofs are deferred to Appendices A.1 and A.2.

3.1 Learning algorithm for hypermatchings

A central concept we introduce for our learning algorithms is the definition of a unique-edge covering family of a hypergraph $H$, which is a collection of sets such that every edge $e\in H$ is contained in at least one set that does not contain any other edge.

Definition 1.

A collection $\mathcal{S}\subseteq 2^{V}$ is a unique-edge covering family of $H$ if, for every $e\in H$, there exists $S\in\mathcal{S}$ such that $e$ is the unique edge contained in $S$, i.e., $e\subseteq S$ and $e^{\prime}\not\subseteq S$ for all $e^{\prime}\in H$, $e^{\prime}\neq e$.
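As a sanity check on Definition 1, the following small Python helper (ours, for illustration only) tests whether a given collection of vertex sets is a unique-edge covering family of a hypergraph with a known edge set.

def is_unique_edge_covering(collection, edges):
    """Return True iff every edge is the unique edge contained in some set of the collection."""
    edges = [frozenset(e) for e in edges]
    sets = [set(S) for S in collection]
    for e in edges:
        # Look for a set S with e contained in S and no other edge contained in S.
        covers_uniquely = any(
            e <= S and all(not (f <= S) for f in edges if f != e)
            for S in sets
        )
        if not covers_uniquely:
            return False
    return True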

Efficiently searching for the unique edge in a set.

We first show that, given a unique-edge covering family $\mathcal{S}$ of a hypermatching $M$, we can learn $M$. We observe that a set of vertices $S\subseteq V$ contains a unique edge if and only if $Q_{M}(S)=1$ and $Q_{M}(S\setminus\{v\})=0$ for some $v\in S$, and that, if $S$ contains a unique edge, this edge is $e=\{v\in S:Q_{M}(S\setminus\{v\})=0\}$. This observation immediately leads to the following simple algorithm, called FindEdgeParallel.

Algorithm 1 FindEdgeParallel($S$), returns $e$ if $S$ contains a unique edge $e$
0: set $S\subseteq V$ of vertices
1: if $Q_{M}(S)=1$ and $\exists v\in S$ s.t. $Q_{M}(S\setminus\{v\})=0$ then return $\{v\in S:Q_{M}(S\setminus\{v\})=0\}$
2: return None
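A minimal Python sketch of FindEdgeParallel, assuming an oracle object with a query method as in the sketch of Section 2 (this interface is our assumption, not part of the paper):

def find_edge_parallel(oracle, S):
    """Return the unique edge contained in S, or None (Algorithm 1).
    Uses at most |S| + 1 queries, which are independent and could be issued in one round."""
    S = set(S)
    if oracle.query(S) == 0:
        return None
    # S contains a unique edge iff removing some single vertex destroys all edges in S;
    # in that case the edge is exactly the set of such vertices.
    edge = {v for v in S if oracle.query(S - {v}) == 0}
    return edge if edge else None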

The main lemma for FindEdgeParallel follows immediately from the above observation.

Lemma 1.

For any hypermatching $M$ and set $S\subseteq V$, FindEdgeParallel is a non-adaptive algorithm with $|S|+1$ queries that, if $S$ contains a unique edge $e$, returns $e$, and returns None otherwise.

Proof.

It is clear that FindEdgeParallel($S$) makes at most $|S|+1$ queries. If $S$ does not contain an edge, then $Q_{M}(S)=0$ and the algorithm returns None. If $S$ contains at least two edges, then these edges are disjoint because $M$ is a matching, so removing a single vertex from $S$ leaves at least one edge intact. Therefore, for every $v\in S$, $Q_{M}(S\setminus\{v\})=1$, and the algorithm returns None. The only remaining case is when $S$ contains a unique edge $e$. In this case, $Q_{M}(S)=1$ and $e=\{v\in S:Q_{M}(S\setminus\{v\})=0\}$, so FindEdgeParallel($S$) returns $e$. ∎

In order to obtain a near-linear query complexity for learning a hypermatching, the query complexity of FindEdgeParallel needs to be improved. Next, we describe an $\mathcal{O}(\log|S|)$-round algorithm, called FindEdgeAdaptive, which finds the unique edge $e$ in a set $S$ with $\mathcal{O}(s\log|S|)$ queries, assuming $|e|\leq s$. The algorithm recursively partitions $S$ into two arbitrary sets $S_{1}$ and $S_{2}$ of equal size. If $Q_{M}(S_{1})=Q_{M}(S_{2})=1$, then $S$ contains at least two edges. If $Q_{M}(S_{1})=1$ and $Q_{M}(S_{2})=0$ (or, symmetrically, $Q_{M}(S_{1})=0$ and $Q_{M}(S_{2})=1$), then, assuming $S$ contains a single edge, this edge is contained in $S_{1}$ and we recurse on $S_{1}$. The most interesting case is when $Q_{M}(S_{1})=Q_{M}(S_{2})=0$. Assuming $S$ contains a single edge $e$, this implies that both $S_{1}$ and $S_{2}$ contain vertices of $e$, and we recurse on both $S_{1}$ and $S_{2}$. When recursing on $S_{1}$, the set $S_{2}$ needs to be included in future queries to find the vertices in $e\cap S_{1}$ (and vice versa when recursing on $S_{2}$). The algorithm thus also maintains an additional argument $T$ for the recursive calls. This argument is initially the empty set; in the case where $Q_{M}(S_{1})=Q_{M}(S_{2})=0$, the set $S_{2}$ is added to $T$ when recursing on $S_{1}$ and vice versa, and $T$ is included in all queries in the recursive calls.

Algorithm 2 FindEdgeAdaptive($S,s$), returns $e$ if $S$ contains a unique edge $e$ and $|e|\leq s$
0: set $S\subseteq V$, edge size $s$
1: if $Q_{M}(S)=0$ then return None
2: $\mathcal{S}\leftarrow\{(S,\emptyset)\}$, $\hat{e}\leftarrow\{\}$
3: while $|\mathcal{S}|>0$ do
4:   if $|\mathcal{S}|>s$ then return None
5:   $\mathcal{S}^{\prime}\leftarrow\{\}$
6:   for each $(S^{\prime},T)\in\mathcal{S}$ do (in parallel)
7:     $S_{1},S_{2}\leftarrow$ partition $S^{\prime}$ into two sets of size $|S^{\prime}|/2$
8:     if $|S^{\prime}|=1$ then add $v\in S^{\prime}$ to $\hat{e}$
9:     else if $Q_{M}(S_{1}\cup T)=1$ and $Q_{M}(S_{2}\cup T)=1$ then return None
10:    else if $Q_{M}(S_{1}\cup T)=1$ and $Q_{M}(S_{2}\cup T)=0$ then add $(S_{1},T)$ to $\mathcal{S}^{\prime}$
11:    else if $Q_{M}(S_{1}\cup T)=0$ and $Q_{M}(S_{2}\cup T)=1$ then add $(S_{2},T)$ to $\mathcal{S}^{\prime}$
12:    else if $Q_{M}(S_{1}\cup T)=0$ and $Q_{M}(S_{2}\cup T)=0$ then add $(S_{1},S_{2}\cup T),(S_{2},S_{1}\cup T)$ to $\mathcal{S}^{\prime}$
13:   $\mathcal{S}\leftarrow\mathcal{S}^{\prime}$
14: if $|\hat{e}|>s$, or $Q_{M}(\hat{e})=0$, or $Q_{M}(\hat{e}\setminus\{v\})=1$ for some $v\in\hat{e}$ then return None
15: return $\hat{e}$
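The following Python sketch mirrors the pseudocode of FindEdgeAdaptive, again under the illustrative oracle interface sketched in Section 2; the partition into two halves is arbitrary, as in Step 7.

def find_edge_adaptive(oracle, S, s):
    """Sketch of Algorithm 2: find the unique edge of size at most s contained in S."""
    S = list(S)
    if oracle.query(S) == 0:
        return None
    active = [(S, set())]                      # pairs (S', T) as in the pseudocode
    e_hat = set()
    while active:
        if len(active) > s:                    # S holds several edges or an edge larger than s
            return None
        next_active = []
        for S_prime, T in active:              # one round of parallel queries per while-iteration
            if len(S_prime) == 1:
                e_hat.add(S_prime[0])
                continue
            half = len(S_prime) // 2
            S1, S2 = S_prime[:half], S_prime[half:]
            q1, q2 = oracle.query(set(S1) | T), oracle.query(set(S2) | T)
            if q1 == 1 and q2 == 1:            # two distinct edges detected
                return None
            if q1 == 1:
                next_active.append((S1, T))
            elif q2 == 1:
                next_active.append((S2, T))
            else:                              # the edge straddles S1 and S2: recurse on both
                next_active.append((S1, set(S2) | T))
                next_active.append((S2, set(S1) | T))
        active = next_active
    if (len(e_hat) > s or oracle.query(e_hat) == 0
            or any(oracle.query(e_hat - {v}) == 1 for v in e_hat)):
        return None
    return e_hat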

Let $\mathcal{S}^{i}=\{(S^{i}_{1},T^{i}_{1}),\ldots,(S^{i}_{\ell},T^{i}_{\ell})\}$ be the collection $\mathcal{S}$ at the beginning of the $i$th iteration of the while loop of FindEdgeAdaptive($S,s$). In the case where $S$ contains a single edge $e$ and $|e|\leq s$, the algorithm maintains two invariants (formally stated in Lemma 2; proof in Appendix A.1). The first is that, for all vertices $v\in e$ and all iterations $i$, either $v\in\hat{e}$ or there exists a unique $j$ such that $v$ is contained in the set $S^{i}_{j}$. This invariant ensures that every vertex $v\in e$ will eventually be added to $\hat{e}$. The second is that every set $S^{i}_{j}$ contains at least one vertex $v\in e$. This invariant explains why, in the base case $|S^{i}_{j}|=1$, we add the single vertex in $S^{i}_{j}$ to the solution $\hat{e}$. Additionally, since, by construction, $S^{i}_{1},\ldots,S^{i}_{\ell}$ are disjoint, the second invariant also implies that when $|\mathcal{S}^{i}|>s$, either the set $S$ contains more than one edge or it contains a single edge of size greater than $s$, in which case the algorithm returns None. This stopping condition ensures that at most $2s$ queries are performed in each iteration of the while loop.

Lemma 2.

Assume there is a unique edge $e\in M$ such that $e\subseteq S$ and that $|e|\leq s$. Then, for every vertex $v\in e$ and at every iteration $i$ of the while loop of FindEdgeAdaptive($S,s$), we have that (1) either $v\in\hat{e}$ or $v\in S^{i}_{j}$ for a unique $j\in[\ell]$, and (2) $e\cap S^{i}_{j}\neq\emptyset$ for all $j\in[\ell]$.

We show that if FindEdgeAdaptive($S,s$) returns $\hat{e}$, then $\hat{e}$ is an edge, and that if there is a unique edge $e\subseteq S$ with $|e|\leq s$, then the algorithm returns $\hat{e}=e$.

Lemma 3.

For any $S\subseteq V$ and $s\in[n]$, if there is a unique edge $e\in M$ such that $e\subseteq S$ and this edge satisfies $|e|\leq s$, then FindEdgeAdaptive($S,s$) returns the edge $e$. Otherwise, it returns either None or an edge $e\in M$. FindEdgeAdaptive($S,s$) uses at most $2s\log_{2}|S|+(s+1)$ queries in at most $\log_{2}|S|$ rounds.

Proof.

First, assume that there is a unique edge $e\in M$ such that $e\subseteq S$ and that $|e|\leq s$. Consider $v\in S$. If $v\in e$, then, by Lemma 2, at every iteration $i$ of the while loop, either $v\in\hat{e}$ or $v\in S^{i}_{j}$ for some $j\in[\ell]$. Since $|S^{i}_{j}|=|S^{i-1}_{j^{\prime}}|/2$ for all $i,j,j^{\prime}$, at iteration $i^{\star}=\log_{2}n$, either $v\in\hat{e}$ or $v\in S^{i^{\star}}_{j}$ for some $j$ with $|S^{i^{\star}}_{j}|=1$, in which case $v$ is then added to $\hat{e}$ by the algorithm. Next, if $v\not\in e$, then $v$ is never added to $\hat{e}$: a vertex is only added when it is the single element of some $S^{i}_{j}$, and since $e\cap S^{i}_{j}\neq\emptyset$ for all $i,j$ by Lemma 2, such a singleton must be a vertex of $e$. Thus, FindEdgeAdaptive($S,s$) returns the edge $e$.

Assume now that $S$ does not contain any edge. In this case, every query is answered by zero, and FindEdgeAdaptive($S,s$) returns None. Finally, assume that $S$ contains at least two edges. If FindEdgeAdaptive($S,s$) does not return None, then Step 14 ensures that the returned set $\hat{e}$ has size at most $s$. Step 14 also ensures that $\hat{e}$ contains at least one edge. If $\hat{e}$ strictly contains an edge, then there is a vertex $v\in\hat{e}$ such that $Q_{M}(\hat{e}\setminus\{v\})=1$, in which case FindEdgeAdaptive($S,s$) would have returned None. Therefore, $\hat{e}$ is an edge of $M$ of size at most $s$.

We now show that FindEdgeAdaptive($S,s$) runs in $\log_{2}|S|$ rounds and makes at most $2s\log_{2}|S|+(s+1)$ queries. At every iteration of the while loop, FindEdgeAdaptive($S,s$) makes at most $2s$ queries (recall that if $|\mathcal{S}|>s$ then the algorithm returns None). We now show that the number of iterations of the while loop is at most $\log_{2}|S|$. At every iteration of the while loop, for every pair $(S^{\prime},T)$ in $\mathcal{S}$, the size of $S^{\prime}$ is divided by $2$. After $\log_{2}|S|$ iterations, either $\mathcal{S}$ is empty or every pair $(S^{\prime},T)$ in $\mathcal{S}$ has $|S^{\prime}|=1$, which guarantees that no sets are added to $\mathcal{S}^{\prime}$. Therefore, the number of iterations of the while loop is at most $\log_{2}|S|$. This also shows that the adaptive complexity of FindEdgeAdaptive($S,s$) is at most $\log_{2}|S|$. After the while loop, we make at most $s+1$ queries, and the total number of queries is at most $2s\log_{2}|S|+s+1$. ∎

Constructing a unique-edge covering family.

Next, we give an algorithm called FindDisjointEdges that, for any hypermatching $M$, constructs a unique-edge covering family $\mathcal{S}$ of the $\alpha$-near-uniform hypermatching $M_{s,\alpha,V^{\prime}}$, defined as the subgraph of $M$ over the edges of size between $s/\alpha$ and $s$ and over the vertices in $V^{\prime}$ that are not in an edge of size less than $s/\alpha$. The unique-edge covering family $\mathcal{S}$ is constructed with i.i.d. samples that contain each vertex in $V^{\prime}$, independently, with probability $n^{-\alpha/s}$. The algorithm then calls a FindEdge subroutine, which is either FindEdgeParallel or FindEdgeAdaptive, on the samples that contain at least one edge.

Algorithm 3 FindDisjointEdges($s,\alpha,V^{\prime}$, FindEdge), returns edges of size between $s/\alpha$ and $s$
0: edge size $s$, parameter $\alpha\geq 1$, set of vertices $V^{\prime}\subseteq V$, subroutine FindEdge
1: $\hat{M}\leftarrow\emptyset$
2: for $i=1,\ldots,n^{\alpha}\log^{2}{n}$ do (in parallel)
3:   $S_{i}\leftarrow$ set of independently sampled vertices from $V^{\prime}$, each with probability $n^{-\alpha/s}$
4:   if $Q_{M}(S_{i})=1$ then
5:     $e\leftarrow$ FindEdge($S_{i},s$)
6:     if $e$ is not None then add $e$ to $\hat{M}$
7: return $\hat{M}$
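A Python sketch of FindDisjointEdges under the same illustrative interface; here $n$ is the total number of vertices of the hypergraph, which controls both the number of samples and the sampling probability (and is passed explicitly).

import math
import random

def find_disjoint_edges(oracle, s, alpha, V_prime, find_edge, n):
    """Sketch of Algorithm 3: learn the edges of size in [s/alpha, s] inside V_prime."""
    learned = set()
    num_samples = int(n ** alpha * math.log(n) ** 2)
    p = n ** (-alpha / s)                        # per-vertex inclusion probability
    for _ in range(num_samples):                 # samples are independent; queries parallelizable
        sample = {v for v in V_prime if random.random() < p}
        if oracle.query(sample) == 1:
            e = find_edge(oracle, sample, s)     # FindEdgeParallel or FindEdgeAdaptive
            if e is not None:
                learned.add(frozenset(e))
    return learned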

We show that $n^{\alpha}\log^{2}n$ such samples suffice to construct a unique-edge covering family $\mathcal{S}$ of the $\alpha$-near-uniform hypermatching $M_{s,\alpha,V^{\prime}}$.

Lemma 4.

Assume $s/\alpha\geq 2$, $\alpha\geq 1$, that the FindEdge subroutine uses at most $q$ queries in at most $r$ rounds, and let $\mathcal{S}=\{S_{i}\}_{i}$ be the $n^{\alpha}\log^{2}{n}$ samples. Then FindDisjointEdges is an $(r+1)$-adaptive algorithm with $n^{\alpha}\log^{2}n+2\alpha qn^{\alpha}\log^{2}n/s$ queries such that, with probability $1-\mathcal{O}(n^{-\log n})$, $\mathcal{S}$ is a unique-edge covering family of the hypermatching $M_{s,\alpha,V^{\prime}}=\{e\in M:e\subseteq V^{\prime},\ s/\alpha\leq|e|\leq s\}$ and thus $\hat{M}=M_{s,\alpha,V^{\prime}}$.

Proof.

We know that a sample $S_{i}$ contains at least one edge if and only if $Q_{M}(S_{i})=1$. We also know from Lemmas 1 and 3 that if $S_{i}$ contains a unique edge $e$ of size at most $s$, then the subroutine FindEdge($S_{i},s$) returns $e$. Therefore, to learn all edges of size in $[s/\alpha,s]$, we only need to make sure that each such edge appears in at least one sample that does not contain any other edge, in other words, that $\mathcal{S}$ is a unique-edge covering family for $M_{s,\alpha,V^{\prime}}=\{e\in M:e\subseteq V^{\prime},\ s/\alpha\leq|e|\leq s\}$. Below, we show that this is the case with probability $1-O(n^{-\log n})=1-e^{-\Omega(\log^{2}n)}$.

We first show that, with high probability, each edge $e$ of size $|e|\in[s/\alpha,s]$ is contained in at least $\log^{2}n/2$ of the samples $S_{i}$. We use $X_{e}$ to denote the number of samples containing $e$; then we have $\mathbb{E}(X_{e})=n^{\alpha}\log^{2}n\cdot n^{-\frac{\alpha|e|}{s}}\geq n^{\alpha}\log^{2}n\cdot n^{-\alpha}=\log^{2}n$. By a Chernoff bound, $P(X_{e}\leq\log^{2}n/2)\leq n^{-\log n/8}$. As there are at most $\alpha n/s$ edges of size between $s/\alpha$ and $s$, by a union bound, $P(\exists e\in M\text{ s.t. }|e|\in[s/\alpha,s],\ X_{e}\leq\log^{2}n/2)\leq\frac{\alpha n}{s}n^{-\log n/8}\leq n^{-(\log n/8-1)}$, and we get $P(\forall e\in M\text{ s.t. }|e|\in[s/\alpha,s],\ X_{e}\geq\log^{2}n/2)\geq 1-n^{-(\log n/8-1)}$. From now on, we condition on the event that all edges $e$ whose size is between $s/\alpha$ and $s$ are included in at least $\log^{2}n/2$ samples. We show below that, given $e\subseteq S_{i}$ for a sample $S_{i}$, the probability that $S_{i}$ contains another edge $e^{\prime}\neq e$ is upper bounded by $\alpha/s$. Recall that $M$ is the hidden matching we want to learn. We abuse notation and write $e^{\prime}\in M\cap S_{i}$ to denote that an edge $e^{\prime}$ is both in the matching $M$ and included in the sample $S_{i}$. We have

$$\mathbb{P}(\exists e^{\prime}\in M\cap S_{i},\ e^{\prime}\neq e\ |\ e\subseteq S_{i})\ \leq\ \sum_{e^{\prime}\in M,\ e^{\prime}\neq e}\mathbb{P}(e^{\prime}\subseteq S_{i}\ |\ e\subseteq S_{i})\ =\ \sum_{e^{\prime}\in M,\ e^{\prime}\neq e}\mathbb{P}(e^{\prime}\subseteq S_{i})\ \leq\ \frac{\alpha n}{s}\cdot\left(n^{-\frac{\alpha}{s}}\right)^{\frac{s}{\alpha}}\ =\ \frac{\alpha}{s},$$

where the first inequality uses a union bound, the equality is due to the fact that $M$ is a matching and thus $e\cap e^{\prime}=\emptyset$ and that vertices are sampled independently, and the second inequality follows because the total number of remaining edges is upper bounded by $\alpha n/s$ and each edge is in $S_{i}$ with probability at most $(n^{-\frac{\alpha}{s}})^{\frac{s}{\alpha}}$. As each edge $e$ of size between $s/\alpha$ and $s$ is contained in at least $\log^{2}n/2$ samples, we have

$$\mathbb{P}(\forall S_{i}\text{ s.t. }e\subseteq S_{i},\ \exists\ e^{\prime}\in M\cap S_{i},\ e^{\prime}\neq e)\leq\left(\frac{s}{\alpha}\right)^{-\log^{2}n/2}\leq n^{-\log n/4},$$

where the last inequality holds since $s/\alpha\geq 2$. By another union bound over the edges of size between $s/\alpha$ and $s$, $\mathbb{P}(\exists e\text{ s.t. }|e|\in[s/\alpha,s],\ \forall S_{i}\text{ s.t. }e\subseteq S_{i},\ \exists e^{\prime}\in M\cap S_{i},\ e^{\prime}\neq e)\leq n^{-(\log n/2-1)}$. We can thus conclude that, with probability at least $1-O(n^{-\log n})=1-e^{-\Omega(\log^{2}n)}$, for every $e\in M$ of size between $s/\alpha$ and $s$, there exists at least one sample $S_{i}$ that contains $e$ but no other remaining edge. By Lemmas 1 and 3, $e$ is returned by FindEdge($S_{i},s$), and $e$ is added to the matching $\hat{M}$ returned by FindDisjointEdges($s,\alpha,V^{\prime}$, FindEdge).

Next, we show that FindDisjointEdges is an $(r+1)$-adaptive algorithm that requires $n^{\alpha}\log^{2}n+2\alpha qn^{\alpha}\log^{2}n/s$ queries. We first observe that FindDisjointEdges makes only parallel calls to FindEdge (after verifying that $Q_{M}(S_{i})=1$). Therefore, since FindEdge runs in at most $r$ rounds, FindDisjointEdges runs in at most $r+1$ rounds. To bound the number of queries, we first argue, using a Chernoff bound, that with probability $1-e^{-\Omega(\log^{2}n)}$, every edge $e$ of size at least $s/\alpha$ is in at most $2n^{\alpha-1}\log^{2}n$ samples $S_{i}$. Therefore, there are at most $\frac{2\alpha n^{\alpha}}{s}\log^{2}n$ samples $S_{i}$ such that $Q_{M}(S_{i})=1$. For each of these samples, we call FindEdge($S_{i},s$), which makes at most $q$ queries by assumption. Therefore, with probability $1-e^{-\Omega(\log^{2}n)}$, FindDisjointEdges makes $\mathcal{O}(n^{\alpha}\log^{2}n+\alpha q\frac{2n^{\alpha}}{s}\log^{2}n)$ queries. ∎

The main algorithm for hypermatchings.

The main algorithm first finds the edges of size $1$ by brute force with FindSingletons. It then iteratively learns the edges of size between $s/\alpha$ and $s$ by calling FindDisjointEdges($s,\alpha,V^{\prime}$, FindEdge), where $V^{\prime}$ is the set of vertices that are not in edges learned in previous iterations. At the end of each iteration, the algorithm increases $s$. We obtain the main theorem for learning hypermatchings, the proof of which is given in Appendix A.1.

Algorithm 4 FindMatching($\alpha$, FindEdge), learns a hypermatching
0: parameter $\alpha\geq 1$, subroutine FindEdge
1: $\hat{M}\leftarrow$ FindSingletons, $s\leftarrow 2\alpha$
2: while $s<n$ do
3:   $V^{\prime}\leftarrow V\setminus\{v:v\in e\text{ for some }e\in\hat{M}\}$
4:   $\hat{M}\leftarrow\hat{M}\cup$ FindDisjointEdges($s,\alpha,V^{\prime}$, FindEdge)
5:   $s\leftarrow\lfloor\alpha s\rfloor+1$
6: return $\hat{M}$
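A Python sketch of FindMatching, with a brute-force FindSingletons; it builds on the helper sketches above (the oracle interface and subroutine signatures are ours, for illustration).

import math

def find_singletons(oracle, V):
    """Brute force for edges of size 1: one query per vertex."""
    return {frozenset({v}) for v in V if oracle.query({v}) == 1}

def find_matching(oracle, V, alpha, find_edge):
    """Sketch of Algorithm 4: learn a hypermatching over the vertex set V."""
    n = len(V)
    matching = find_singletons(oracle, V)
    s = 2 * alpha
    while s < n:
        covered = {v for e in matching for v in e}
        V_prime = set(V) - covered               # drop vertices of already learned edges
        matching |= find_disjoint_edges(oracle, s, alpha, V_prime, find_edge, n)
        s = math.floor(alpha * s) + 1
    return matching

# Example usage with the sketches above:
#   oracle = EdgeDetectingOracle([{0, 1, 2}, {3, 4}, {5, 6, 7, 8}])
#   find_matching(oracle, range(12), alpha=2,
#                 find_edge=lambda o, S, s: find_edge_parallel(o, S))
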
Theorem 5.

Let $M$ be a hypermatching. Then FindMatching learns $M$ with high probability, either in $\mathcal{O}(\log^{3}n)$ rounds using $\mathcal{O}(n\log^{5}n)$ queries, with $\alpha=1/(1-1/(2\log n))$ and FindEdgeAdaptive, or in $\mathcal{O}(\log n)$ rounds using $\mathcal{O}(n^{3}\log^{3}n)$ queries, with $\alpha=2$ and FindEdgeParallel.

3.2 Hardness of learning hypermatchings

In this section, we show that the adaptive complexity of learning a hypermatching with $\operatorname{poly}(n)$ queries is $\Omega(\log\log n)$. This result is in sharp contrast to matchings (where edges are of size $d=2$), for which there exists a non-adaptive learning algorithm with $\mathcal{O}(n\log n)$ queries [4].

Warm-up: hardness for non-adaptive learning.

As a warm-up, we present a simple construction of a family of hypermatchings for which there is no non-adaptive learning algorithm with $\operatorname{poly}(n)$ queries. Each hypermatching $M_{P}$ in this family depends on a partition $P=(P_{1},P_{2},P_{3})$ of the vertex set $V$ into three parts $P_{1},P_{2},P_{3}$ where $|P_{1}|=n-(\sqrt{n}+1)$, $|P_{2}|=1$, and $|P_{3}|=\sqrt{n}$. $M_{P}$ contains $(n-(\sqrt{n}+1))/2$ edges of size $2$ which form a perfect matching over $P_{1}$, and one edge of size $\sqrt{n}$ which contains all the vertices in $P_{3}$. The only vertex in $P_{2}$ is not contained in any edge of $M_{P}$. The main idea of the construction is that, after one round of $\operatorname{poly}(n)$ queries, vertices in $P_{2}$ and $P_{3}$ are indistinguishable to the learning algorithm. However, the algorithm needs to distinguish the vertices in $P_{3}$ from the vertex in $P_{2}$ to learn the edge $P_{3}$.
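For illustration, the following small Python sketch (ours, not from the paper) samples a hypermatching $M_{P}$ from this hard family; for simplicity it assumes that $\sqrt{n}$ is an integer and that $|P_{1}|$ is even.

import math
import random

def sample_hard_hypermatching(n):
    """Sample M_P for a uniformly random partition (P1, P2, P3) as in the warm-up construction."""
    r = math.isqrt(n)                       # assumed to be exactly sqrt(n)
    vertices = list(range(n))
    random.shuffle(vertices)
    P3 = vertices[:r]                       # one edge of size sqrt(n)
    P2 = vertices[r:r + 1]                  # a single isolated vertex
    P1 = vertices[r + 1:]                   # perfect matching of size-2 edges (|P1| assumed even)
    edges = [frozenset(P3)]
    edges += [frozenset(P1[i:i + 2]) for i in range(0, len(P1) - 1, 2)]
    return edges, (P1, P2, P3)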

Theorem 6.

There is no non-adaptive algorithm with $\operatorname{poly}(n)$ query complexity that can learn every non-uniform hypermatching with probability $\omega(1/\sqrt{n})$.

Proof.

Let $\mathcal{P}_{3}$ be the set of all possible partitions $(P_{1},P_{2},P_{3})$ such that $|P_{1}|=n-(\sqrt{n}+1)$, $|P_{2}|=1$, and $|P_{3}|=\sqrt{n}$. We consider a uniformly random partition $P$ from $\mathcal{P}_{3}$, the matching $M_{P}$, and a non-adaptive algorithm $\mathcal{A}$ that asks a collection $\mathcal{S}$ of $\operatorname{poly}(n)$ non-adaptive queries. The main lemma (Lemma 2) shows that, with high probability, for all queries $S\in\mathcal{S}$, the answer $Q_{M_{P}}(S)$ is independent of which $\sqrt{n}$ vertices in $P_{2}\cup P_{3}$ constitute the edge $P_{3}$. The analysis of this lemma consists of two cases. If the query $S$ is small ($|S|<(n+\sqrt{n}+1)/2$), then we show that $S$ contains fewer than $\sqrt{n}$ vertices from $P_{2}\cup P_{3}$, w.h.p. over the randomization of $P$, by a Chernoff bound. Therefore, $S$ does not contain $P_{3}$ and $Q_{M_{P}}(S)$ is independent of the partition of $P_{2}\cup P_{3}$ into $P_{2}$ and $P_{3}$. If $S$ is large ($|S|\geq(n+\sqrt{n}+1)/2$), then $S$ contains an edge of size two from $P_{1}$. Thus, $Q_{M_{P}}(S)=1$ and $Q_{M_{P}}(S)$ is independent of the partition of $P_{2}\cup P_{3}$ into $P_{2}$ and $P_{3}$.

In Lemma 3, we argue that if all queries $Q_{M_{P}}(S)$ of $\mathcal{A}$ are independent of the partition of $P_{2}\cup P_{3}$ into $P_{2}$ and $P_{3}$, then, with high probability, the matching returned by the algorithm is not equal to $M_{P}$. By the probabilistic method, there is a matching $M_{P}$ with $P\in\mathcal{P}_{3}$ that $\mathcal{A}$ does not successfully learn with probability $1-\mathcal{O}(1/\sqrt{n})$. ∎

Hardness of learning in $o(\log\log n)$ rounds.

The main technical challenge is to generalize the construction and the analysis of the hard family of hypergraphs for non-adaptive learning algorithms to a construction and an analysis that hold over $\Omega(\log\log n)$ rounds. In this construction, each hypermatching $M_{P}$ depends on a partition $P=(P_{0},P_{1},\ldots,P_{R})$ of the vertex set $V$ into $R+1$ parts. For each $i\in\{0,\ldots,R\}$, $M_{P}$ contains $|P_{i}|/d_{i}$ edges of size $d_{i}$ which form a perfect matching over $P_{i}$. We set the sizes such that $d_{0}=3$, $d_{i+1}=3\log^{2}n\cdot d_{i}^{2}$, and $|P_{i}|=3\log^{2}n\cdot d_{i}^{2}$.

The main idea of the construction is that, after $i$ rounds of queries, vertices in $\cup_{j=i+1}^{R}P_{j}$ are indistinguishable to any algorithm. However, the algorithm needs to distinguish vertices in $P_{j}$ from vertices in $P_{j^{\prime}}$, for all $j\neq j^{\prime}$, to learn $M_{P}$. Informally, since the edges in $P_{j^{\prime}}$ are larger than the edges in $P_{j}$ for $j^{\prime}>j$, an algorithm can learn only one part $P_{j}$ in each round of queries, namely the one with edges of minimum size among all parts that have not yet been learned in previous rounds.
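As a rough sanity check (ours, not part of the formal analysis), these choices only leave room for $\Theta(\log\log n)$ parts: writing $c=3\log^{2}n$ and unrolling the recursion gives $d_{i}=c^{2^{i}-1}\cdot 3^{2^{i}}$, so the edge sizes grow doubly exponentially in $i$. Since $|P_{i}|=3\log^{2}n\cdot d_{i}^{2}=d_{i+1}$, fitting the parts into the $n$ vertices requires in particular $d_{R+1}\leq n$, hence $2^{R+1}=O(\log n/\log\log n)$ and $R=O(\log\log n)$; conversely, on the order of $\log\log n$ parts can indeed be packed, which is what the $\Omega(\log\log n)$-round lower bound exploits.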

Theorem 7.

There is no $(\log\log n-3)$-adaptive algorithm with $\operatorname{poly}(n)$ query complexity that can learn every non-uniform hypermatching with probability $e^{-o(\log^{2}n)}$.

Proof sketch (full proof in Appendix A.2).

Let $\mathcal{P}_{R}$ be the set of all partitions $(P_{0},\ldots,P_{R})$ such that $|P_{i}|=3\log^{2}n\cdot d_{i}^{2}$, let $P$ be a uniformly random partition from $\mathcal{P}_{R}$, and let $\mathcal{A}$ be an algorithm with $\operatorname{poly}(n)$ queries. In Lemma 8, we show that if vertices in $P_{i},P_{i+1},\ldots,P_{R}$ are indistinguishable to $\mathcal{A}$ at the beginning of round $i$ of queries, then vertices in $P_{i+1},\ldots,P_{R}$ are indistinguishable at the end of round $i$.

The proof of Lemma 8 consists of two parts. First, Lemma 7 shows that if $S$ is small ($|S|\leq(1-1/d_{i})\sum_{j\geq i}|P_{j}|$) and is independent of the partition of $\cup_{j=i}^{R}P_{j}$ into $P_{i},\ldots,P_{R}$, then, w.h.p., for every $j\geq i+1$, $S$ does not contain any edge contained in $P_{j}$. Second, Lemma 6 shows that if a query $S$ is large ($|S|\geq(1-1/d_{i})\sum_{j\geq i}|P_{j}|$) and is independent of the partition of $\cup_{j=i}^{R}P_{j}$ into $P_{i},\ldots,P_{R}$, then, w.h.p., $S$ contains at least one edge from the matching on $P_{i}$, which implies $Q_{M_{P}}(S)=1$. Proving Lemma 6 is quite involved: because the edges of a matching are not independent from each other, a Chernoff bound analysis is not enough to show that the probability that a query does not contain any edge from $P_{i}$ is small. We therefore provide an alternative method to bound this probability (Lemma 5). In the end, in both cases, we show that $Q_{M_{P}}(S)$ is independent of the partition of $\cup_{j=i+1}^{R}P_{j}$ into $P_{i+1},\ldots,P_{R}$. Finally, we conclude the proof of Theorem 7 similarly to that of Theorem 6. ∎

4 Learning Low-Degree Near-Uniform Hypergraphs

In this section, we give an algorithm for learning low-degree near-uniform hypergraphs. The algorithm is non-adaptive and has $\mathcal{O}((2n)^{\rho\Delta+1}\log^{2}n)$ query complexity for learning a hypergraph $H=(V,E)$ of maximum degree $\Delta$ and edge size ratio $\rho$. It generalizes FindDisjointEdges and constructs a unique-edge covering family, but its analysis is completely different, and significantly more challenging, due to overlapping edges. Full proofs are deferred to Appendix B.

Near-uniformity is necessary.

The $\mathcal{O}(n\log^{5}n)$ query complexity from the previous section for $\Delta=1$ holds for any hypermatching $M$, even those with edge size ratio $\rho=\Omega(n)$. In contrast, for general $\Delta$, we obtain an $\mathcal{O}(n^{\rho\Delta+1}\log^{2}n)$ query complexity. Angluin and Chen [7] show that $\Omega((2m/d)^{d/2})$ queries are required, even for fully adaptive algorithms, to learn hypergraphs of maximum edge size $d$. Their hardness result holds for a family of hypergraphs of maximum degree $\Delta=2$ and edges of size $2$ or $d$, i.e., with $\rho=d/2$. Thus, it implies that $\Omega((m/\rho)^{\rho})$ queries are required to learn hypergraphs even when $\Delta=2$, i.e., an exponential dependence on $\rho$ is required.

The algorithm.

FindLowDegreeEdges constructs a unique-edge covering family of the hypergraph $H$ similarly to FindDisjointEdges, which requires a larger number of samples when $\Delta>1$. In addition, steps 5-6 are needed to ensure that intersections of edges are not falsely identified as edges since, when $S_{i}$ contains two or more edges that all overlap in $S^{\prime}_{i}\subseteq S_{i}$, we have that $S^{\prime}_{i}=\{v\in S_{i}:Q_{H}(S_{i}\setminus\{v\})=0\}$.

Algorithm 5 FindLowDegreeEdges, learns a $\rho$-near-uniform hypergraph with max degree $\Delta$
0: edge size ratio $\rho$, maximum edge size $d$, maximum degree $\Delta\geq 2$
1: $\hat{H}\leftarrow\emptyset$
2: for $i=1,\ldots,(2n)^{\rho\Delta}\log^{2}n$ do (in parallel)
3:   $S_{i}\leftarrow$ set of independently sampled vertices, each with probability $p=(2n)^{-\frac{\rho}{d}}$
4:   if $Q_{H}(S_{i})=1$ and $\exists v$ s.t. $Q_{H}(S_{i}\setminus\{v\})=0$ then add $\{v\in S_{i}:Q_{H}(S_{i}\setminus\{v\})=0\}$ to $\hat{H}$
5: for $e\in\hat{H}$ do (in parallel)
6:   if $\exists\ e^{\prime}\in\hat{H}$ s.t. $e\subset e^{\prime}$ then remove $e$ from $\hat{H}$
7: return $\hat{H}$
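A non-adaptive Python sketch of FindLowDegreeEdges under the same illustrative oracle interface; note that the number of samples is polynomial in $n$ only when $\rho\Delta$ is constant.

import math
import random

def find_low_degree_edges(oracle, V, rho, d, Delta):
    """Sketch of Algorithm 5: learn a rho-near-uniform hypergraph of maximum degree Delta."""
    n = len(V)
    p = (2 * n) ** (-rho / d)                          # per-vertex sampling probability
    num_samples = int((2 * n) ** (rho * Delta) * math.log(n) ** 2)
    candidates = set()
    for _ in range(num_samples):                       # all queries can be issued in one round
        S = {v for v in V if random.random() < p}
        if oracle.query(S) == 1:
            e = frozenset(v for v in S if oracle.query(S - {v}) == 0)
            if e:
                candidates.add(e)
    # Steps 5-6: discard candidates that are strict subsets of another candidate,
    # since these may be intersections of overlapping edges rather than edges.
    return {e for e in candidates if not any(e < f for f in candidates)}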

Recall that for hypermatchings, FindDisjointEdges is used as a subroutine in an adaptive procedure that learns edges of any size. For hypergraphs, however, we cannot call FindLowDegreeEdges iteratively and remove vertices in learned edges after each call because a large edge could overlap with a small edge.

The analysis.

The main technical challenge in this section is to analyze the sample complexity required to obtain, w.h.p., a unique-edge covering family. The main lemma bounds the probability that, conditioned on a sample $S$ containing an edge $e$, this sample contains another edge $e^{\prime}$.

Lemma 1.

If a sample set $S$ is constructed by independently sampling each vertex w.p. $p=(2n)^{-\frac{\rho}{d}}$ from a hypergraph $H$ with maximum degree $\Delta\geq 2$, maximum edge size $d$, edge size ratio $\rho$, and $n\geq 100$ vertices, then, for any edge $e\in H$, $\mathbb{P}(\exists e^{\prime}\subseteq S,\ e^{\prime}\in H,\ e^{\prime}\neq e\ |\ e\subseteq S)\leq 1-\left(\frac{\log n}{2n}\right)^{(\Delta-1)\rho}$.

Proof sketch (full proof in Appendix B.1). Given an edge $e\subseteq S$, the proof of Lemma 1 relies on analyzing the following two mathematical programs, whose optimal values are used to bound $\mathbb{P}(\exists e^{\prime}\subseteq S,\ e^{\prime}\in H,\ e^{\prime}\neq e\ |\ e\subseteq S)$.

$$\min_{a_{ij}}\ \prod_{j=d/\rho}^{d}\prod_{i=0}^{j-1}(1-p^{j-i})^{a_{ij}}\quad\text{s.t.}\quad\sum_{j=d/\rho}^{d}\sum_{i=0}^{j-1}i\cdot a_{ij}\leq(\Delta-1)d,\quad\sum_{j=d/\rho}^{d}\sum_{i=0}^{j-1}j\cdot a_{ij}\leq\Delta n,\quad a_{ij}\geq 0,$$

$$\max_{a_{ij}}\ \sum_{j=d/\rho}^{d}\sum_{i=0}^{j-1}a_{ij}\cdot p^{j-i}\quad\text{s.t.}\quad\sum_{j=d/\rho}^{d}\sum_{i=0}^{j-1}i\cdot a_{ij}\leq(\Delta-1)d,\quad\sum_{j=d/\rho}^{d}\sum_{i=0}^{j-1}j\cdot a_{ij}\leq\Delta n,\quad a_{ij}\geq 0.$$

The variables $a_{ij}$ denote the number of edges of size $j$ intersecting $e$ in $i$ vertices, and we assume $|e|=d$. The two programs have identical constraints. The first constraint is that the sum, over edges $e^{\prime}\neq e$, of the size of the intersection of $e$ and $e^{\prime}$ has to be at most $(\Delta-1)d$, since $e$ has size $d$ and each vertex has degree at most $\Delta$. The second constraint is that the sum, over edges $e^{\prime}\neq e$, of the size of $e^{\prime}$ has to be at most $\Delta n$, since there are $n$ vertices of degree at most $\Delta$.

For the objectives, we note that $p^{j-i}$ is the probability that a fixed collection of $j-i$ vertices is in a sample. Thus, $1-p^{j-i}$ is the probability that an edge $e^{\prime}$ of size $j$ intersecting $e$ in $i$ vertices is not contained in a sample $S$, conditioned on $e\subseteq S$. If the events $e_{1}\subseteq S\ |\ e\subseteq S$ and $e_{2}\subseteq S\ |\ e\subseteq S$ were independent for all $e_{1},e_{2}\neq e$ (which is not the case), the probability that there is no other edge in $S$, conditioned on $e\subseteq S$, would be equal to the objective of the left program. The objective of the right program is an upper bound on $\mathbb{P}(\exists e^{\prime}\subseteq S,\ e^{\prime}\in H,\ e^{\prime}\neq e\ |\ e\subseteq S)$ by a union bound over all edges $e^{\prime}\neq e$. We denote the optimal values of the left and right programs by $f^{\bullet}(\Delta,p,d,\rho)$ and $LP^{\bullet}(\Delta,p,d,\rho)$, respectively. The main steps of the proof are the following.

1. In Lemma 9, we show that the unfavorable event is more likely to happen under the edge independence assumption:
$$\mathbb{P}(\exists e^{\prime}\subseteq S,\ e^{\prime}\in H,\ e^{\prime}\neq e\ |\ e\subseteq S)\leq 1-f^{\bullet}(\Delta,p,d,\rho)\qquad(1)$$
by using induction and Bayes' rule to formalize the intuition that the events $\{e^{\prime}\not\subseteq S\}$ for $e^{\prime}\neq e$ are positively correlated because edges can share common vertices.

2. In Lemma 12, we show that, for $\Delta\geq 2$,
$$f^{\bullet}(\Delta,p,d,\rho)\geq f^{\bullet}(2,p,d/\rho,1)^{\rho(\Delta-1)}\qquad(2)$$
by deriving a closed-form optimal solution of $f^{\bullet}(\Delta,p,d,\rho)$, which maximizes $a_{ij}$ for $i\in\{0,d/\rho-1\}$, $j=d/\rho$ and sets the remaining $a_{ij}=0$.

3. In Lemma 13, we show that
$$f^{\bullet}(2,p,d/\rho,1)\geq 1-LP^{\bullet}(2,p,d/\rho,1).\qquad(3)$$

4. In Lemma 15, we show that
$$1-LP^{\bullet}(2,p,d/\rho,1)\geq\frac{\log n}{2n}\qquad(4)$$
by deriving a closed-form solution to $LP^{\bullet}(2,p,d/\rho,1)$ and showing that it can be upper bounded by $\frac{d/\rho}{d/\rho-1}n^{-\rho/d}+\frac{1}{d/\rho}$, and that $\frac{d/\rho}{d/\rho-1}n^{-\rho/d}+\frac{1}{d/\rho}\leq 1-\frac{\log n}{2n}$ for $n\geq 100$.

5. Combining Eqs. (1), (2), (3), and (4), we get the desired bound. $\blacksquare$

We are now ready to present the main result for this section (see Appendix B.2 for the full proof).

Theorem 8.

For any $\rho$-near-uniform hypergraph $H$ with maximum degree $\Delta\geq 2$, maximum edge size $d$, and number of vertices $n\geq 100$, FindLowDegreeEdges correctly learns $H$ with probability $1-o(1)$. Furthermore, it is non-adaptive and makes at most $\mathcal{O}((2n)^{\rho\Delta+1}\log^{2}n)$ queries.

We note that if the exact parameters $\rho$, $d$, and $\Delta$ are unknown and only upper bounds $\bar{\rho}\geq\rho$, $\bar{d}\geq d$, $\bar{\Delta}\geq\Delta$ are available, then, provided $\bar{\rho}$ is large enough that $\bar{d}/\bar{\rho}\leq\min_{e\in H}|e|$, FindLowDegreeEdges with inputs $\bar{\rho},\bar{d},\bar{\Delta}$ also learns $H$ with probability $1-o(1)$.

Acknowledgements

We thank Meghan Pantalia and Matthew Ulgherait for helpful discussions on finding drug combinations that reduce cancer cell viability.

References

  • Abasi [2018] Hasan Abasi. Error-tolerant non-adaptive learning of a hidden hypergraph. In 43rd International Symposium on Mathematical Foundations of Computer Science (MFCS 2018). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018.
  • Abasi et al. [2014] Hasan Abasi, Nader H Bshouty, and Hanna Mazzawi. On exact learning monotone DNF from membership queries. In International Conference on Algorithmic Learning Theory, pages 111–124. Springer, 2014.
  • Abasi et al. [2018] Hasan Abasi, Nader H. Bshouty, and Hanna Mazzawi. Non-adaptive learning of a hidden hypergraph. Theoretical Computer Science, 716:15 – 27, 2018. ISSN 0304-3975. doi: https://doi.org/10.1016/j.tcs.2017.11.019. URL http://www.sciencedirect.com/science/article/pii/S0304397517308496. Special Issue on ALT 2015.
  • Alon et al. [2004] Noga Alon, Richard Beigel, Simon Kasif, Steven Rudich, and Benny Sudakov. Learning a hidden matching. SIAM Journal on Computing, 33(2):487–501, 2004.
  • Angluin [1988] Dana Angluin. Queries and concept learning. Machine learning, 2(4):319–342, 1988.
  • Angluin and Chen [2004] Dana Angluin and Jiang Chen. Learning a hidden graph using O(log n) queries per edge. In International Conference on Computational Learning Theory, pages 210–223. Springer, 2004.
  • Angluin and Chen [2006] Dana Angluin and Jiang Chen. Learning a hidden hypergraph. Journal of Machine Learning Research, 7:2215–2236, December 2006. ISSN 1532-4435.
  • Beigel et al. [2001] Richard Beigel, Noga Alon, Simon Kasif, Mehmet Serkan Apaydin, and Lance Fortnow. An optimal procedure for gap closing in whole genome shotgun sequencing. In Proceedings of the fifth annual international conference on Computational biology, pages 22–30, 2001.
  • Bshouty [2018] Nader H Bshouty. Exact learning from an honest teacher that answers membership queries. Theoretical Computer Science, 733:4–43, 2018.
  • Chen et al. [2008] Hong-Bin Chen, Hung-Lin Fu, and Frank K Hwang. An upper bound of the number of tests in pooling designs for the error-tolerant complex model. Optimization Letters, 2(3):425–431, 2008.
  • Chin et al. [2013] Francis YL Chin, Henry CM Leung, and Siu-Ming Yiu. Non-adaptive complex group testing with multiple positive sets. Theoretical Computer Science, 505:11–18, 2013.
  • Chodoriwsky and Moura [2015] Jacob Chodoriwsky and Lucia Moura. An adaptive algorithm for group testing for complexes. Theoretical Computer Science, 592:1 – 8, 2015. ISSN 0304-3975. doi: https://doi.org/10.1016/j.tcs.2015.05.005. URL http://www.sciencedirect.com/science/article/pii/S0304397515003965.
  • D’yachkov et al. [2016] A. G. D’yachkov, I. V. Vorobyev, N. A. Polyanskii, and V. Y. Shchukin. On multistage learning a hidden hypergraph. In 2016 IEEE International Symposium on Information Theory (ISIT), pages 1178–1182, 2016.
  • Flobak et al. [2019] Åsmund Flobak, Barbara Niederdorfer, Vu To Nakstad, Liv Thommesen, Geir Klinkenberg, and Astrid Lægreid. A high-throughput drug combination screen of targeted small molecule inhibitors in cancer cell lines. Scientific data, 6(1):1–10, 2019.
  • Gao et al. [2006] Hong Gao, Frank K Hwang, My T Thai, Weili Wu, and Taieb Znati. Construction of d(H)-disjunct matrix for group testing in hypergraphs. Journal of Combinatorial Optimization, 12(3):297–301, 2006.
  • Hwang and Du [2006] Frank Kwang-ming Hwang and Ding-zhu Du. Pooling designs and nonadaptive group testing: important tools for DNA sequencing, volume 18. World Scientific, 2006.
  • Menden et al. [2019] Michael P Menden, Dennis Wang, Mike J Mason, Bence Szalai, Krishna C Bulusu, Yuanfang Guan, Thomas Yu, Jaewoo Kang, Minji Jeon, Russ Wolfinger, et al. Community assessment to advance computational prediction of cancer drug combinations in a pharmacogenomic screen. Nature communications, 10(1):1–17, 2019.
  • Torney [1999] David C. Torney. Sets pooling designs. Annals of Combinatorics, 3(1):95–101, 1999. doi: 10.1007/BF01609879. URL https://doi.org/10.1007/BF01609879.

Appendix A Missing Analysis for Hypermatchings (Section 3)

A.1 Missing analysis for algorithm for learning hypermatchings (Section 3.1)

Lemma 1 (restated). For any hypermatching $M$ and set $S\subseteq V$, FindEdgeParallel is a non-adaptive algorithm with $|S|+1$ queries that, if $S$ contains a unique edge $e$, returns $e$, and returns None otherwise.

Proof.

It is clear that FindEdgeParallel($S$) makes at most $|S|+1$ queries. If $S$ does not contain an edge, then $Q_{M}(S)=0$ and the algorithm returns None. If $S$ contains at least two edges, then these edges are disjoint because $M$ is a matching, so removing a single vertex from $S$ leaves at least one edge intact. Therefore, for every $v\in S$, $Q_{M}(S\setminus\{v\})=1$, and the algorithm returns None. The only remaining case is when $S$ contains a unique edge $e$. In this case, $Q_{M}(S)=1$ and $e=\{v\in S:Q_{M}(S\setminus\{v\})=0\}$, so FindEdgeParallel($S$) returns $e$. ∎

Let $\mathcal{S}^{i}=\{(S^{i}_{1},T^{i}_{1}),\ldots,(S^{i}_{\ell},T^{i}_{\ell})\}$ be $\mathcal{S}$ at the beginning of the $i$th iteration of the while loop of FindEdgeAdaptive($S,s$). To prove Lemma 3, we need the following three loop invariants of FindEdgeAdaptive.

Lemma 2 (restated). Assume there is a unique edge $e\in M$ such that $e\subseteq S$ and that $|e|\leq s$. Then, for every vertex $v\in e$ and at every iteration $i$ of the while loop of FindEdgeAdaptive($S,s$), we have that (1) either $v\in\hat{e}$ or $v\in S^{i}_{j}$ for a unique $j\in[\ell]$, and (2) $e\cap S^{i}_{j}\neq\emptyset$ for all $j\in[\ell]$.

Proof.

To simplify the proof, we actually show the following three loop invariants, where the second is an intermediary step to show the third invariant:

1. either $v\in\hat{e}$ or $v\in S^{i}_{j}$ for some $j\in[\ell]$;

2. $Q_{M}(S^{i}_{j}\cup T^{i}_{j})=1$ for all $j\in[\ell]$;

3. $e\cap S^{i}_{j}\neq\emptyset$ for all $j\in[\ell]$.

We prove the three invariants by induction on $i$.

Base case: at the beginning of the first iteration, $\mathcal{S}^{1}=\{(S,\emptyset)\}$ and $\hat{e}=\emptyset$, and we have the following: 1) for every $v\in e$, we have $v\in S$; 2) $Q_{M}(S\cup\emptyset)=1$; 3) $e\cap S=e\neq\emptyset$.

Assume that the statement of the lemma holds at the beginning of iteration $i\geq 1$. We show that it holds at the beginning of iteration $i+1$. Let $\mathcal{S}^{i+1}=\{(S^{i+1}_{1},T^{i+1}_{1}),\ldots,(S^{i+1}_{k},T^{i+1}_{k})\}$ be $\mathcal{S}$ at the beginning of the $(i+1)$th iteration of the while loop of FindEdgeAdaptive($S,s$).

1. Consider a vertex $v\in e$. If at the beginning of the $i$th iteration we had $v\in\hat{e}$, then $v\in\hat{e}$ at the beginning of the $(i+1)$th iteration as well. Suppose now that $v\in S^{i}_{j}$ for some $j\in[\ell]$ at the start of iteration $i$. If $|S^{i}_{j}|=1$, then $v$ is added to $\hat{e}$ at Step 8. Assume $|S^{i}_{j}|>1$, and let $T=T^{i}_{j}$. Then $S^{i}_{j}$ is partitioned into two sets $S_{1}$ and $S_{2}$. If $Q_{M}(S_{1}\cup T)=1$ and $Q_{M}(S_{2}\cup T)=1$, then this contradicts the fact that $S$ contains a unique edge: without loss of generality, we can assume that $v\in S_{1}$, so $S_{2}\cup T$ cannot include $e$, and the only way $Q_{M}(S_{2}\cup T)=1$ is if $S$ contains another edge. If $Q_{M}(S_{1}\cup T)=1$ and $Q_{M}(S_{2}\cup T)=0$, then we know that $v\in S_{1}$ and $(S_{1},T)$ is added to $\mathcal{S}^{i+1}$. If $Q_{M}(S_{1}\cup T)=0$ and $Q_{M}(S_{2}\cup T)=1$, then we know that $v\in S_{2}$ and $(S_{2},T)$ is added to $\mathcal{S}^{i+1}$. If $Q_{M}(S_{1}\cup T)=0$ and $Q_{M}(S_{2}\cup T)=0$, then both $(S_{1},S_{2}\cup T)$ and $(S_{2},S_{1}\cup T)$ are added to $\mathcal{S}^{i+1}$, and we have either $v\in S_{1}$ or $v\in S_{2}$.

  2. 2.

    From steps 9 to 11 in FindEdgeAdaptive(S,s), a pair (S^{i+1}_j, T^{i+1}_j) is only added to 𝒮^{i+1} when Q_M(S^{i+1}_j\cup T^{i+1}_j)=1. In step 12, we have Q_M(S_1\cup T)=0 and Q_M(S_2\cup T)=0, and both (S_1, S_2\cup T) and (S_2, S_1\cup T) are added to 𝒮^{i+1}. But we know from the induction assumption that at the beginning of iteration i we had (S_1\cup S_2, T)\in 𝒮^i, and therefore, by invariant 2, Q_M(S_1\cup S_2\cup T)=1. This ensures that, in this case too, (S^{i+1}_j, T^{i+1}_j) is only added to 𝒮^{i+1} when Q_M(S^{i+1}_j\cup T^{i+1}_j)=1.

  3. 3.

    We know from the induction hypothesis that e\cap S^i_j\neq\emptyset for all j\in[\ell]. For a fixed j\in[\ell], S^i_j contains a vertex v from e. In the while loop, S^i_j is partitioned into S_1 and S_2, such that v is either in S_1 or in S_2. If Q_M(S_1\cup T^i_j)=1, then v\in S_1, (S_1, T^i_j) is added to 𝒮^{i+1} and S_1\cap e\neq\emptyset. Similarly, if Q_M(S_2\cup T^i_j)=1, then v\in S_2, (S_2, T^i_j) is added to 𝒮^{i+1} and S_2\cap e\neq\emptyset. Assume now that Q_M(S_1\cup T^i_j)=Q_M(S_2\cup T^i_j)=0; then (S_1, S_2\cup T^i_j) and (S_2, S_1\cup T^i_j) are added to 𝒮^{i+1}, and we need to show that S_1\cap e\neq\emptyset and S_2\cap e\neq\emptyset. We know from invariant 2 that Q_M(S_1\cup S_2\cup T^i_j)=1. Therefore Q_M(S_1\cup T^i_j)=Q_M(S_2\cup T^i_j)=0 implies that there is a vertex of e both in S_1 and in S_2, hence S_1\cap e\neq\emptyset and S_2\cap e\neq\emptyset. This concludes this part and shows that whenever (S^{i+1}_j, T^{i+1}_j) is added to 𝒮^{i+1}, we have S^{i+1}_j\cap e\neq\emptyset. This case analysis is also illustrated by the sketch following the proof.
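The case analysis above corresponds to the splitting step of FindEdgeAdaptive. Below is a minimal Python sketch of that step; the names find_edge_adaptive and query_M are illustrative, and the size threshold s and the parallelization of queries within a round, which the actual algorithm uses, are omitted for brevity.

    def find_edge_adaptive(S, query_M):
        # Each pair (S_j, T_j) maintains the invariants proved above:
        # Q_M(S_j union T_j) = 1 and S_j intersects the unique edge e contained in S.
        e_hat = set()
        pairs = [(set(S), set())]
        while pairs:
            new_pairs = []
            for Sj, Tj in pairs:
                if len(Sj) == 1:
                    e_hat |= Sj                  # a singleton S_j must be a vertex of e
                    continue
                vs = sorted(Sj)
                S1, S2 = set(vs[:len(vs) // 2]), set(vs[len(vs) // 2:])
                q1, q2 = query_M(S1 | Tj), query_M(S2 | Tj)
                if q1 and q2:
                    return None                  # S contains more than one edge
                if q1:
                    new_pairs.append((S1, Tj))   # the edge survives in S1
                elif q2:
                    new_pairs.append((S2, Tj))   # the edge survives in S2
                else:
                    new_pairs.append((S1, S2 | Tj))  # e is split between S1 and S2,
                    new_pairs.append((S2, S1 | Tj))  # so both halves keep a vertex of e
            pairs = new_pairs
        return e_hat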

See 4

Proof.

We know that a sample S_i contains at least one edge if and only if Q_M(S_i)=1. We also know from Lemma 1 and Lemma 3 that if S_i contains a unique edge e of size at most s, then the subroutine FindEdge(S_i,s) returns e. Therefore, to learn all edges of size in [s/α, s], we only need to make sure that each such edge appears in at least one sample that does not contain any other edge; in other words, that 𝒮 is a unique-edge covering family for M_{s,α,V'}=\{e\in M:\ e\subseteq V',\ s/α\leq|e|\leq s\}. Below, we show that this is the case with probability 1-O(n^{-\log n})=1-e^{-\Omega(\log^2 n)}.

We first show that with high probability, each edge ee of size |e|[s/α,s]|e|\in\left[s/\alpha,s\right] is contained in at least (log2n)/2(\log^{2}n)/2 sample SiS_{i}’s. We use XeX_{e} to denote the number of samples containing ee, then we have

𝔼(Xe)=nαlog2nnα|e|slog2n.\mathbb{E}(X_{e})=n^{\alpha}\log^{2}n\cdot n^{-\frac{\alpha|e|}{s}}\geq\log^{2}n.

By Chernoff bound, we have

P(Xelog2n/2)P(Xe𝔼(Xe)/2)e𝔼(Xe)8nlogn8.P(X_{e}\leq\log^{2}n/2)\leq P(X_{e}\leq\mathbb{E}(X_{e})/2)\leq e^{-\frac{\mathbb{E}(X_{e})}{8}}\leq n^{-\frac{\log n}{8}}.

As there are at most αn/s\alpha n/s edges of size between s/αs/\alpha and ss, by a union bound, we have

P(eM s.t. |e|[s/α,s],Xe(log2n)/2)αnsnlogn/8n(logn/81).P(\exists e\in M\text{ s.t. }|e|\in\left[s/\alpha,s\right],X_{e}\leq(\log^{2}n)/2)\leq\frac{\alpha n}{s}n^{-\log n/8}\leq n^{-(\log n/8-1)}.

Subsequently,

P(eM s.t. |e|[s/α,s],Xe(log2n)/2)1n(logn/81).P(\forall e\in M\text{ s.t. }|e|\in\left[s/\alpha,s\right],X_{e}\geq(\log^{2}n)/2)\geq 1-n^{-(\log n/8-1)}.

From now on we condition on the event that all edges e whose size is between s/α and s are included in at least (\log^2 n)/2 samples. We show below that, given e\subseteq S_i for a sample S_i, the probability that S_i contains another remaining edge is upper bounded by α/s. Recall that M is the hidden matching we want to learn. We abuse notation and use e'\in M\cap S_i to denote that an edge e' is both in the matching M and included in the sample S_i. We have

(eMSi,ee|eSi)\displaystyle\mathbb{P}(\exists e^{\prime}\in M\cap S_{i},e^{\prime}\neq e\ |\ e\subseteq S_{i})\leq eM,ee(eSi|eSi)\displaystyle\sum_{e^{\prime}\in M,e^{\prime}\neq e}\mathbb{P}(e^{\prime}\subseteq S_{i}\ |\ e\subseteq S_{i}) (5)
=\displaystyle= eM,ee(eSi)\displaystyle\sum_{e^{\prime}\in M,e^{\prime}\neq e}\mathbb{P}(e^{\prime}\subseteq S_{i}) (6)
\displaystyle\leq αns(nαs)sα\displaystyle\frac{\alpha n}{s}\cdot(n^{-\frac{\alpha}{s}})^{\frac{s}{\alpha}} (7)
=\displaystyle= αnsnαsαs=αs.\displaystyle\frac{\alpha n}{s}\cdot n^{-{\frac{\alpha s}{\alpha s}}}=\frac{\alpha}{s}.

where Eq.(5) uses union bound. Eq.(6) is due to the fact that MM is a matching and thus ee=e\cap e^{\prime}=\emptyset and vertices are sampled independently. Eq.(7) follows because the total number of remaining edges is upper bounded by αn/s\alpha n/s and each edge is in SiS_{i} with probability at most (nαs)sα(n^{-\frac{\alpha}{s}})^{\frac{s}{\alpha}}.

As each edge ee with size between s/αs/\alpha and ss is contained in at least (log2n)/2(\log^{2}n)/2 samples, we have that

(Si s.t. eSi,eMSi,ee)(sα)log2n/2nlogn/4,\displaystyle\mathbb{P}(\forall S_{i}\text{ s.t. }e\subseteq S_{i},\exists\ e^{\prime}\in M\cap S_{i},e^{\prime}\neq e)\leq(\frac{s}{\alpha})^{-\log^{2}n/2}\leq n^{-\log n/4},

where the last inequality follows because s/α2s/\alpha\geq 2.

By another union bound on all edges of size between s/αs/\alpha and ss (at most nn of them), we have that

\mathbb{P}(\exists e\text{ s.t. }|e|\in\left[s/\alpha,s\right],\ \forall S_{i}\text{ s.t. }e\subseteq S_{i},\ \exists e^{\prime}\in M\cap S_{i},e^{\prime}\neq e)\leq n^{-(\log n/4-1)}.

We can thus conclude that with probability at least 1O(nlogn)=1eΩ(log2n)1-O(n^{-\log n})=1-e^{-\Omega(\log^{2}n)}, for all eMe\in M with size between s/αs/\alpha and ss, there exists at least one sample SiS_{i} that contains ee but no other remaining edges. By Lemma 1 and Lemma 3, ee is returned by FindEdge(Si,s)\textsc{FindEdge}(S_{i},s) and ee is added to the matching M^\hat{M} that is returned by FindDisjointEdges(s,α,V)\textsc{FindDisjointEdges}(s,\alpha,V^{\prime}).

Next, we show that FindDisjointEdges is an (r+1)-adaptive algorithm with n^{\alpha}\log^2 n + 2\alpha q n^{\alpha}\log^2 n/s queries. We first observe that FindDisjointEdges makes only parallel calls to FindEdge (after verifying that Q_M(S_i)=1). Therefore, since FindEdge runs in at most r rounds, we get that FindDisjointEdges runs in at most r+1 rounds. To prove a bound on the number of queries, we first argue using a Chernoff bound that with probability 1-e^{-\Omega(\log^2 n)}, every edge e of size at least s/α is in at most 2n^{\alpha-1}\log^2 n samples S_i. For a fixed edge e we have:

P(Xe>2nα1log2n)\displaystyle P(X_{e}>2n^{\alpha-1}\log^{2}n) P(Xe>2𝔼(Xe))\displaystyle\leq P(X_{e}>2\cdot\mathbb{E}(X_{e}))
e𝔼(Xe)3\displaystyle\leq e^{\frac{-\mathbb{E}(X_{e})}{3}}
elog2n3=eΩ(log2n).\displaystyle\leq e^{\frac{-\log^{2}n}{3}}=e^{-\Omega(\log^{2}n)}.

A union bound over the at most n edges yields that every edge of size at least s/α is in at most 2n^{\alpha-1}\log^2 n samples with probability 1-e^{-\Omega(\log^2 n)}. Therefore, there are at most \frac{2\alpha n}{s}n^{\alpha-1}\log^2 n samples S_i such that Q_M(S_i)=1. For every one of these samples, we call FindEdge(S_i,s), which makes q queries by assumption. Therefore, with high probability 1-e^{-\Omega(\log^2 n)}, FindDisjointEdges makes 𝒪(n^{\alpha}\log^2 n + \alpha q\frac{2n^{\alpha}}{s}\log^2 n) queries. ∎
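The sampling step analysed in this proof can be sketched as follows in Python. This is an illustration of the analysis rather than the paper's pseudocode; the names find_disjoint_edges, query_M and find_edge are assumed interfaces, and the constants follow the analysis above.

    import math
    import random

    def find_disjoint_edges(V, s, alpha, query_M, find_edge):
        # Draw n^alpha * log^2 n vertex samples, keeping each vertex independently
        # with probability n^(-alpha/s), then search each positive sample for a unique edge.
        V = list(V)
        n = len(V)
        num_samples = math.ceil(n ** alpha * math.log(n) ** 2)
        p = n ** (-alpha / s)
        found = []
        for _ in range(num_samples):
            S = {v for v in V if random.random() < p}
            if query_M(S) == 1:                      # S contains at least one edge
                e = find_edge(S, s)                  # succeeds when S contains a unique edge of size <= s
                if e is not None and s / alpha <= len(e) <= s and e not in found:
                    found.append(e)
        return found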

The following claim is a technical result we need to prove Theorem 5.

Claim 1.

For 0x<10\leq x<1 we have 11xex\frac{1}{1-x}\geq e^{x}.

Proof.

Consider the function f(x)=11xexf(x)=\frac{1}{1-x}-e^{x} for x<1x<1. The derivative of ff is

f(x)=1(1x)2ex=11x(11x(1x)ex).f^{\prime}(x)=\frac{1}{(1-x)^{2}}-e^{x}=\frac{1}{1-x}\cdot\left(\frac{1}{1-x}-(1-x)e^{x}\right).

Because 1xex1-x\leq e^{-x}, we have (1x)ex1(1-x)e^{x}\leq 1. Therefore,

f(x)11x(11x1)0,f^{\prime}(x)\geq\frac{1}{1-x}\cdot\left(\frac{1}{1-x}-1\right)\geq 0,

where the last inequality follows from the assumption that 0\leq x<1. Since f(0)=0 and f'(x)\geq 0 on [0,1), we get f(x)\geq 0, i.e., \frac{1}{1-x}\geq e^{x} for all 0\leq x<1. ∎
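As a quick numerical sanity check of Claim 1 (not part of the proof), one can verify the inequality on a grid:

    import math
    # Check 1/(1 - x) >= e^x on a grid of x in [0, 1).
    assert all(1 / (1 - x) >= math.exp(x) for x in (i / 1000 for i in range(1000)))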

See 5

Proof.

FindSingletons learns, with probability 1, all the edges of size s=1. For edges of size at least 2, we know from Lemma 4 that each call of FindDisjointEdges(s,α,V',FindEdge) fails to find all edges in V' of size between s/α and s with probability at most e^{-\Omega(\log^2 n)}. As there are at most n different edge sizes, FindDisjointEdges is called at most n times (we show below that it is actually called at most \log n/(\log α) times). Thus, the probability that at least one of the calls fails is upper bounded by

n\cdot e^{-\Omega(\log^{2}n)}=e^{\log n-\Omega(\log^{2}n)}=e^{-\Omega(\log^{2}n)}.

We can thus conclude that the probability that all calls are successful is at least 1eΩ(log2n)1-e^{-\Omega(\log^{2}n)}.

Next, we show that FindMatching makes 𝒪(lognlogα)\mathcal{O}(\frac{\log n}{\log\alpha}) calls to FindDisjointEdges.

We use s_t to denote the value of s after the t-th call of FindDisjointEdges, for t=1,2,\ldots,T, where T is the total number of calls made by FindMatching, and we set s_0=2α. We show that the algorithm needs at most O(\frac{\log n}{\log α}) calls to go over all the possible edge sizes of the matching. In step 5 of FindMatching, we update s_t as follows:

stαst1+1s_{t}\leftarrow\lfloor\alpha s_{t-1}\rfloor+1

Therefore

st\displaystyle s_{t} αst1,\displaystyle\geq\alpha s_{t-1}, (8)

and

s_{t}\geq\alpha^{t}s_{0}. (9)

Since the maximum ss we input to FindDisjointEdges is nn, we have that

nsT2(α)T+1n\geq s_{T}\geq 2(\alpha)^{T+1}

and subsequently

Tlognlog2logα.T\leq\frac{\log n-\log 2}{\log\alpha}.

Therefore, FindMatching makes 𝒪(\frac{\log n}{\log α}) calls to FindDisjointEdges. From Lemma 4, we know that with high probability 1-e^{-\Omega(\log^2 n)}, FindDisjointEdges(s,α,V',FindEdge) runs in r+1 rounds and makes 𝒪(n^{\alpha}\log^2 n + 2\alpha q n^{\alpha}\log^2 n/s) queries when FindEdge(S,s) runs in r rounds and makes q queries. From Lemma 1 and Lemma 3 we know that for the subroutines FindEdgeAdaptive and FindEdgeParallel we have r=𝒪(\log n), q=𝒪(s\log n) and r=1, q=𝒪(n) respectively. Therefore FindDisjointEdges runs in 𝒪(\log n) rounds and makes 𝒪(n^{\alpha}\log^2 n + 2n^{\alpha}\log^3 n) queries with FindEdgeAdaptive, and runs in 2 rounds with 𝒪(n^{\alpha}\log^2 n + 2n^{\alpha+1}\log^2 n) queries with FindEdgeParallel. We conclude that with high probability,

  • FindMatching runs in 𝒪(log2nlogα)\mathcal{O}(\frac{\log^{2}n}{\log\alpha}) rounds and makes 𝒪(nαlog3nlogα+2αnαlog4nlogα)\mathcal{O}(n^{\alpha}\frac{\log^{3}n}{\log\alpha}+2\alpha n^{\alpha}\frac{\log^{4}n}{\log\alpha}) queries with FindEdgeAdaptive,

  • FindMatching runs in 𝒪(lognlogα)\mathcal{O}(\frac{\log n}{\log\alpha}) rounds and makes 𝒪(nαlog3nlogα+2nα+1log3nlogα)\mathcal{O}(n^{\alpha}\frac{\log^{3}n}{\log\alpha}+2n^{\alpha+1}\frac{\log^{3}n}{\log\alpha}) queries with FindEdgeParallel.

Now we turn our attention to particular values of α\alpha.

  • Suppose now that α=1/(112logn)\alpha=1/(1-\frac{1}{2\log n}) and that we use FindEdgeAdaptive. We get then α=1+o(1)\alpha=1+o(1). Furthermore, by Claim 1, we have

    logα\displaystyle\log\alpha =log1112logn\displaystyle=\log\frac{1}{1-\frac{1}{2\log n}}
    12logn.\displaystyle\geq\frac{1}{2\log n}.

    Therefore we get that 1logα=O(logn)\frac{1}{\log\alpha}=O(\log n). We also observe that for n3n\geq 3

    α\displaystyle\alpha =1+12logn112logn\displaystyle=1+\frac{\frac{1}{2\log n}}{1-\frac{1}{2\log n}}
    1+1logn.\displaystyle\leq 1+\frac{1}{\log n}.

    Therefore nαnn1logn=𝒪(n)n^{\alpha}\leq n\cdot n^{\frac{1}{\log n}}=\mathcal{O}(n). We finally conclude that when α=1/(112logn)\alpha=1/(1-\frac{1}{2\log n}) and we use the subroutine FindEdgeAdaptive, FindMatching runs in 𝒪(log2n/logα)=𝒪(log3n)\mathcal{O}(\log^{2}n/\log\alpha)=\mathcal{O}(\log^{3}n) adaptive rounds and makes 𝒪(nlog4n+nlog5n)=𝒪(nlog5n)\mathcal{O}(n\log^{4}n+n\log^{5}n)=\mathcal{O}(n\log^{5}n) queries in total.

  • With α=2\alpha=2 and FindEdgeParallel, we have r=1r=1 and we get that FindMatching learns any hypermatching w.h.p. in O(logn)O(\log n) rounds and with at most 𝒪(n3log3n)\mathcal{O}(n^{3}\log^{3}n) queries.
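The outer loop analysed in this proof can be sketched as follows in Python, reusing the find_disjoint_edges sketch given after the proof of Lemma 4. The initial threshold and the treatment of FindSingletons (here replaced by querying each vertex individually) are illustrative assumptions rather than the paper's exact pseudocode.

    import math

    def find_matching(V, alpha, query_M, find_edge, find_disjoint_edges):
        # Sketch of the outer loop of FindMatching.
        M_hat = [{v} for v in V if query_M({v})]          # edges of size 1
        remaining = set(V) - {v for e in M_hat for v in e}
        s = 2                                             # initial edge-size threshold (illustrative choice)
        while remaining and s <= len(V):
            edges = find_disjoint_edges(remaining, s, alpha, query_M, find_edge)
            M_hat.extend(edges)
            remaining -= {v for e in edges for v in e}
            s = math.floor(alpha * s) + 1                 # step 5: geometric growth of the threshold
        return M_hat

For instance, calling this with α=2 and the find_edge_parallel sketch as the FindEdge subroutine corresponds to the O(\log n)-round variant discussed in the second bullet above.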

A.2 Missing analysis for hardness of learning hypermatchings (Section 3.2)

A.2.1 Lower bound for non-adaptive algorithms

To argue that learning hypermatchings non-adaptively requires an exponential number of queries, we fix the number of vertices n and construct a family of matchings M_P that depend on a partition P of the vertex set [n] into 3 parts P_1,P_2,P_3 such that

  • |P1|=n(n+1)|P_{1}|=n-(\sqrt{n}+1).

  • P1P_{1} contains a perfect matching M1M_{1} with edges of size 2.

  • |P_2|=1.

  • P3P_{3} is an edge of the matching s.t. |P3|=n|P_{3}|=\sqrt{n}.

  • MP=M1{P3}M_{P}=M_{1}\cup\{P_{3}\}.

We denote the set of all such partitions by 𝒫_3. For a partition P\in 𝒫_3, multiple matchings are possible, depending on the perfect matching M_1. We use M_P to denote a random matching among all the possible matchings satisfying the properties above. The main idea of the construction is that, after one round of queries, elements in P_2 and P_3 are indistinguishable to any algorithm that makes polynomially many queries. However, a learning algorithm needs to distinguish the elements in P_3 from the element in P_2 to learn the edge P_3.
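For illustration, the following Python snippet samples a partition from 𝒫_3 together with a matching M_P as described above; it assumes for simplicity that √n is an integer and that n-√n-1 is even, so that the sizes work out exactly.

    import math
    import random

    def sample_hard_instance(n):
        # Sample (P1, P2, P3) with |P1| = n - sqrt(n) - 1, |P2| = 1, |P3| = sqrt(n),
        # together with the matching M_P = M1 union {P3}.
        r = math.isqrt(n)
        V = list(range(n))
        random.shuffle(V)
        P1, P2, P3 = V[: n - r - 1], V[n - r - 1: n - r], V[n - r:]
        M1 = [{P1[2 * i], P1[2 * i + 1]} for i in range(len(P1) // 2)]  # perfect matching of size-2 edges on P1
        return (P1, P2, P3), M1 + [set(P3)]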

Before presenting the proof, we formalize what it means for a set of elements to be indistinguishable. Since the next definition is also used in the next subsection, we consider an arbitrary family 𝒫_R of partitions P=(P_0,\ldots,P_R). Given a partition (P_0,\ldots,P_R) and an index r, we consider the union P_r\cup P_{r+1}\cup\cdots\cup P_R of the parts P_i with i\geq r. Informally, we say that queries Q_{M_P}(S) are independent of the partition of P_r\cup\cdots\cup P_R if the values Q_{M_P}(S) of these queries contain no information about which elements are in P_r, P_{r+1},\ldots, or P_R.

Definition 2.

Given a family of partitions 𝒫_R and a partition P=(P_0,\ldots,P_R)\in 𝒫_R, let P' be a partition chosen uniformly at random from \{(P_0',\ldots,P_R')\in 𝒫_R\ :\ P_0'=P_0,\ldots,P_{r-1}'=P_{r-1}\}, and let M_{P'} be a matching on P' such that M_P and M_{P'} coincide on P_0\cup\cdots\cup P_{r-1}. A query Q_{M_P}(S) is independent of the partition of P_r\cup\cdots\cup P_R if Q_{M_P}(S)=Q_{M_{P'}}(S) with high probability over P'.

For example, in the partition (P1,P2,P3)(P_{1},P_{2},P_{3}) above, any query QMP(S)Q_{M_{P}}(S) such that SS contains an edge from M1M_{1} is independent of the partition of P2P3P_{2}\cup P_{3} since this query will always answer 1.

Analysis of the construction. We consider a non-adaptive algorithm 𝒜, a uniformly random partition P=(P_1,P_2,P_3) in 𝒫_3, and a matching M_P. We argue that, with high probability over both the randomization of P and the decisions of 𝒜, the matching returned by 𝒜 is not equal to M_P. The analysis consists of two main parts.

The first part argues that, with high probability, for all queries SS made by a non-adaptive algorithm, QMP(S)Q_{M_{P}}(S) is independent of the partition of P2P3P_{2}\cup P_{3}.

Lemma 2.

Let P be a uniformly random partition from 𝒫_3. For any collection 𝒮 of poly(n) non-adaptive queries, with high probability 1-e^{-\Omega(\sqrt{n})}, for all S\in 𝒮, Q_{M_P}(S) is independent of the partition of P_2\cup P_3.

Proof.

Consider any set S𝒮S\in\mathcal{S} which is independent of the randomization of PP. There are two cases depending on the size of SS.

  1. 1.

    |S|\leq\frac{n+\sqrt{n}+1}{2}. In this case, if S contains fewer than \sqrt{n} vertices from P_2\cup P_3, then for any other partition P' such that P_1'=P_1, we have Q_{M_P}(S)=Q_{M_{P'}}(S). In fact, both Q_{M_{P'}}(S) and Q_{M_P}(S) equal 1 if and only if S contains an edge from M_1. Next we show that, with high probability, S contains fewer than \sqrt{n} vertices from P_2\cup P_3. Since P_2\cup P_3 is a uniformly random set of size \sqrt{n}+1, by the Chernoff bound, |S\cap(P_2\cup P_3)|<\sqrt{n} with probability 1-e^{-\Omega(\sqrt{n})}.

    We therefore get that Q_{M_P}(S)=Q_{M_{P'}}(S) for any partition P'=(P_1,P_2',P_3') and any query S with |S|\leq\frac{n+\sqrt{n}+1}{2}, with probability 1-e^{-\Omega(\sqrt{n})}.

  2. 2.

    |S|>n+n+12|S|>\frac{n+\sqrt{n}+1}{2}. In this case, the set SS contains at least one matched edge from P1P_{1}. In fact, the number of edges in M1M_{1} plus the size of P2P3P_{2}\cup P_{3} is

    |P1|2+|P2|+|P3|=n(n+1)2+n+1=n+n+12<|S|.\frac{|P_{1}|}{2}+|P_{2}|+|P_{3}|=\frac{n-(\sqrt{n}+1)}{2}+\sqrt{n}+1=\frac{n+\sqrt{n}+1}{2}<|S|.

    S must therefore contain at least one edge from M_1, and we get that Q_{M_P}(S)=Q_{M_{P'}}(S) for any partition P'=(P_1,P_2',P_3') and any query S with |S|>\frac{n+\sqrt{n}+1}{2}, with probability 1.

By combining the two cases |S|\leq\frac{n+\sqrt{n}+1}{2} and |S|>\frac{n+\sqrt{n}+1}{2}, we get that with probability 1-e^{-\Omega(\sqrt{n})}, Q_{M_P}(S) is independent of the partition of P_2\cup P_3, and by a union bound, this holds for any collection of poly(n) sets S. ∎

The second part of the analysis argues that if all queries QMP(S)Q_{M_{P}}(S) of an algorithm are independent of the partition of P2P3P_{2}\cup P_{3}, then, with high probability, the matching returned by this algorithm is not equal to MPM_{P}.

Lemma 3.

Let PP be a uniformly random partition in 𝒫3\mathcal{P}_{3} and MPM_{P} a random matching over PP. Consider an algorithm 𝒜\mathcal{A} such that all the queries QMP(S)Q_{M_{P}}(S) made by 𝒜\mathcal{A} are independent of the partition of P2P3P_{2}\cup P_{3}. Then, the (possibly randomized) matching MM returned by 𝒜\mathcal{A} is, with probability 1O(1/n)1-O(1/\sqrt{n}), not equal to MPM_{P}.

Proof.

Consider an algorithm 𝒜\mathcal{A} such that all queries QMP(S)Q_{M_{P}}(S) of 𝒜\mathcal{A} are independent of the partition of P2P3P_{2}\cup P_{3}. Thus, the matching MM returned by 𝒜\mathcal{A} is conditionally independent of the randomization of the partition PP given P1P_{1}.

For the algorithm 𝒜\mathcal{A} to return MPM_{P}, it needs to learn the edge P3P_{3}. We distinguish two cases.

  • 𝒜\mathcal{A} does not return any edge that is included in P2P3P_{2}\cup P_{3}. In this case, with probability 1, MM is not equal to MPM_{P}.

  • 𝒜 returns an edge that is included in P_2\cup P_3. We know from the previous lemma that with probability 1-e^{-\Omega(\sqrt{n})}, all queries are independent of the partition of P_2\cup P_3. Therefore, the edge returned by 𝒜 is also independent of the partition of P_2\cup P_3. The probability that this edge is equal to P_3 is at most 1/(\sqrt{n}+1). Therefore, with probability (1-e^{-\Omega(\sqrt{n})})(1-1/(\sqrt{n}+1))=1-O(1/\sqrt{n}), the returned matching M is not equal to M_P.

By combining Lemma 2 and Lemma 3, we get the hardness result for non-adaptive algorithms.

See 6

Proof.

Consider a uniformly random partition P\in 𝒫_3, a matching M_P, and an algorithm 𝒜 which queries M_P. By Lemma 2, after one round of queries, with probability 1-e^{-\Omega(\sqrt{n})} over both the randomization of P and of the algorithm, all the queries Q_{M_P}(S) made by 𝒜 are independent of the partition of P_2\cup P_3. By Lemma 3, this implies that, with probability 1-O(1/\sqrt{n}), the matching returned by 𝒜 is not equal to M_P. By the probabilistic method, this implies that there exists a partition P\in 𝒫_3 for which, with probability 1-O(1/\sqrt{n}), 𝒜 does not return M_P after one round of queries. ∎

A.2.2 Lower bound for o(loglogn)o(\log\log n) adaptive algorithm

Our idea is to construct a partition P_0,\ldots,P_i,\ldots,P_R such that every part P_i is covered by a perfect matching with edges of size d_i, and therefore contains |P_i|/d_i edges. We want to choose the sizes |P_i| and d_i (with d_i increasing) such that with probability 1-e^{-poly(n)}, after i rounds of any adaptive algorithm that asks a polynomial number of queries, the partition of P_i\cup P_{i+1}\cup\cdots\cup P_R is indistinguishable. A worked instance of these parameters is given right after the list below.

Our construction

  • d0=3d_{0}=3.

  • di+1=3log2ndi2d_{i+1}=3\log^{2}n\cdot d_{i}^{2}.

  • |Pi|=3log2ndi2|P_{i}|=3\log^{2}n\cdot d_{i}^{2}.

  • |Pi+1|=3log2ndi+12=3log2n(3log2ndi2)2=3log2n|Pi|2|P_{i+1}|=3\log^{2}n\cdot d_{i+1}^{2}=3\log^{2}n\cdot(3\log^{2}n\cdot d_{i}^{2})^{2}=3\log^{2}n\cdot|P_{i}|^{2}.
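For concreteness, starting from d_0=3 the recursion gives

|P_0| = 3\log^2 n\cdot d_0^2 = 27\log^2 n,\qquad d_1 = 3\log^2 n\cdot d_0^2 = 27\log^2 n,\qquad |P_1| = 3\log^2 n\cdot d_1^2 = 2187\log^6 n,

so that d_{i+1}=|P_i| at every level, and the part sizes grow roughly by squaring (up to the 3\log^2 n factor) from one level to the next.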

Notation: Let 𝒫_R be the set of partitions P=(P_0,\ldots,P_R) satisfying the conditions above. For a fixed i, let k_i=\frac{|P_i|}{d_i}, and let e_1^i,\ldots,e_j^i,\ldots,e_{k_i}^i be the random matching on P_i. Finally, let n_i=\sum_{l\geq i}|P_l|.

Claim 2.

The construction has R+1=Θ(loglogn)R+1=\Theta(\log\log n) partitions.

Proof.

We first show that the construction has R+1loglognR+1\geq\log\log n partitions.

By induction we have that

|Pi|=(3log2n)2i1|P0|2i|P_{i}|=(3\log^{2}n)^{2^{i}-1}|P_{0}|^{2^{i}}

Therefore

n \leq \sum_{i=0}^{R}|P_i|
 \leq (R+1)|P_R|
 \leq (R+1)(3\log^{2}n)^{2^{R}-1}\,(3\log^{2}n\, d_{0}^{2})^{2^{R}}
 = (R+1)\, 9^{2^{R}}(3\log^{2}n)^{2^{R+1}-1}
 \leq 9^{2^{R+1}}(3\log^{2}n)^{2^{R+1}}

This implies that

2R+1log(27log2n)logn,2^{R+1}\log(27\log^{2}n)\geq\log n,

and

R+1loglognlog2log(log(27log2n))log2loglogn.R+1\geq\frac{\log\log n}{\log 2}-\frac{\log\left(\log(27\log^{2}n)\right)}{\log 2}\geq\log\log n.

Next we show that R=O(\log\log n). We know that |P_R|\leq n, which implies that (3\log^{2}n)^{2^{R}-1}\leq n and

2^{R}-1\leq\frac{\log n}{\log(3\log^{2}n)}.

Therefore,

R=O\left(\log\frac{\log n}{\log\log n}\right)=O(\log\log n).

Suppose that at the beginning of round i, we have learned P_0\cup\ldots\cup P_{i-1} but the partition of P_i\cup P_{i+1}\cup\cdots\cup P_R is indistinguishable. Consider a query S of the i-th round. We show that if the query S is big, then with high probability S contains an edge from P_i, and if S is small, then S does not contain any edge from the higher parts P_{i+1},P_{i+2},\ldots,P_R. In both cases, querying S gives an answer that is independent of the partition of P_{i+1},P_{i+2},\ldots,P_R with high probability over all the polynomially many queries. Therefore, after the i-th round, the partition of P_{i+1}\cup P_{i+2}\cup\cdots\cup P_R remains indistinguishable.

Outline of the proof. Claim 3 is a technical result we need to bound the binomial coefficients in our proofs. Lemma 4 gives the expected size of the intersection of a query with a partition and presents a concentration result on this intersection when the queries are large. Lemma 5 shows that, for the purpose of bounding the probability that no edge of a partition is included in S, we can drop the constraint that the edges are disjoint and think of them as being sampled independently and uniformly (with replacement) from the partition. Lemma 6 shows that if |S|\geq(1-\frac{1}{d_i})n_i, then with probability 1-e^{-\Omega(\log^2 n)}, S contains at least one edge from the matching on P_i. If |S|\leq(1-\frac{1}{d_i})n_i, then Lemma 7 shows that with probability 1-e^{-\Omega(\log^2 n)}, for every l\geq i+1, S does not contain any edge from the matching on P_l. Combining the two lemmas, Lemma 8 shows that if P is a uniformly random partition in 𝒫_R and all the queries made by an algorithm 𝒜 in the previous i rounds are independent of the partition of P_i,\ldots,P_R, then any collection of poly(n) non-adaptive queries at round i+1 is independent of the partition of P_{i+1},\ldots,P_R with probability 1-e^{-\Omega(\log^2 n)}.

Claim 3.

If k<nk<\sqrt{n}, then

nk4(k!)(nk)nkk!\frac{n^{k}}{4(k!)}\leq{n\choose k}\leq\frac{n^{k}}{k!}
Proof.

We have that

{n\choose k} =\frac{n(n-1)\cdots(n-(k-1))}{k!}
=nkk!(11n)(1k1n)\displaystyle=\frac{n^{k}}{k!}(1-\frac{1}{n})\ldots(1-\frac{k-1}{n})

The upper bound (nk)nkk!{n\choose k}\leq\frac{n^{k}}{k!} follows immediately. To show the lower bound, we observe that

(nk)\displaystyle{n\choose k} =nkk!(11n)(1k1n)\displaystyle=\frac{n^{k}}{k!}(1-\frac{1}{n})\ldots(1-\frac{k-1}{n})
nkk!(1k1n)k1\displaystyle\geq\frac{n^{k}}{k!}(1-\frac{k-1}{n})^{k-1}

Now consider the function fn(x,y)=(1xn)yf_{n}(x,y)=(1-\frac{x}{n})^{y} for y1y\geq 1 and x[1,n]x\in[1,n]. For x[1,n]x\in[1,n] and fixed y1y\geq 1, the function fnf_{n} decreases as xx increases. Similarly, for xx fixed, fnf_{n} decreases as yy increases. Therefore

(1k1n)k1(11n)n.(1-\frac{k-1}{n})^{k-1}\geq(1-\frac{1}{\sqrt{n}})^{\sqrt{n}}.

Furthermore, we know that for x2x\geq 2, (11/x)x1/4(1-1/x)^{x}\geq 1/4. Therefore

(nk)\displaystyle{n\choose k} nkk!(1k1n)k1\displaystyle\geq\frac{n^{k}}{k!}(1-\frac{k-1}{n})^{k-1}
nkk!(11n)n\displaystyle\geq\frac{n^{k}}{k!}(1-\frac{1}{\sqrt{n}})^{\sqrt{n}}
nk4(k!).\displaystyle\geq\frac{n^{k}}{4(k!)}.
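As a quick numerical sanity check of Claim 3 (not part of the proof):

    from math import comb, factorial
    # Spot check the bound for a few pairs (n, k) with k < sqrt(n).
    for n, k in [(100, 5), (400, 10), (2500, 20)]:
        assert n ** k / (4 * factorial(k)) <= comb(n, k) <= n ** k / factorial(k)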

Lemma 4.

Let S be an arbitrary subset of P_i\cup P_{i+1}\cup\cdots\cup P_R. After i rounds, for l\geq i, the expected size of the intersection |S\cap P_l| is

𝔼[|SPl|]=sl=|S||Pl|ni,\mathbb{E}[|S\cap P_{l}|]=s_{l}=\frac{|S||P_{l}|}{n_{i}},

where the expectation is over the partitions of P_i\cup P_{i+1}\cup\cdots\cup P_R into P_i,P_{i+1},\ldots,P_R. Moreover, for any polynomial collection of queries 𝒮 such that |S|\geq(1-\frac{1}{d_i})n_i for all S\in 𝒮, with probability 1-e^{-\Omega(\log^2 n)}, for all l\geq i and for all S\in 𝒮, we have |S\cap P_l|\geq s_l(1-\frac{1}{d_i}).

Proof of Lemma 4.

Every node in SS will be in PlP_{l} with probability |Pl|/ni|P_{l}|/n_{i}, therefore by linearity of expectation

𝔼[|SPl|]=sl=|S||Pl|ni\mathbb{E}[|S\cap P_{l}|]=s_{l}=\frac{|S||P_{l}|}{n_{i}}

Suppose now that |S|(11di)ni|S|\geq(1-\frac{1}{d_{i}})n_{i}. By Chernoff bound, we have

(|SPl|sl(11di))\displaystyle\mathbb{P}\big{(}|S\cap P_{l}|\leq s_{l}(1-\frac{1}{d_{i}})\big{)} esl2di2\displaystyle\leq e^{-\frac{s_{l}}{2d_{i}^{2}}}
=e|S|ni|Pl|2di2\displaystyle=e^{-\frac{|S|}{n_{i}}\frac{|P_{l}|}{2d_{i}^{2}}}
e(11/di)|Pl|2di2\displaystyle\leq e^{-(1-1/d_{i})\frac{|P_{l}|}{2d_{i}^{2}}}
e|Pl|3di2\displaystyle\leq e^{-\frac{|P_{l}|}{3d_{i}^{2}}}
elog2n\displaystyle\leq e^{-\log^{2}n}
1nlogn\displaystyle\leq\frac{1}{n^{\log n}}

In the fourth inequality, we used 1-1/d_i\geq 2/3, which holds since d_i\geq 3. By a union bound, the probability that there exists a partition P_l with l\geq i such that |S\cap P_l|\leq s_l(1-\frac{1}{d_i}) is O(\log\log n/n^{\log n})=e^{-\Omega(\log^2 n)}. Another application of the union bound over the polynomially many queries shows that with probability at least 1-e^{-\Omega(\log^2 n)}, for all queries of size at least (1-1/d_i)n_i and for all partitions P_l with l\geq i, we have |S\cap P_l|\geq s_l(1-\frac{1}{d_i}). ∎

Lemma 5.

Fix an iteration ii, and a query SS. Let e1,,ekie^{\prime}_{1},\ldots,e^{\prime}_{k_{i}} be edges of size did_{i}, independently and uniformly sampled from PiP_{i}. Let e1i,,ekiie^{i}_{1},\ldots,e^{i}_{k_{i}} be a random matching on PiP_{i}. We have

(j{1,,ki},ejiS)(j{1,,ki},ejS)=(1(|SPi|di)(|Pi|di))ki\mathbb{P}(\forall j\in\{1,\ldots,k_{i}\},\ e_{j}^{i}\not\subseteq S)\leq\mathbb{P}(\forall j\in\{1,\ldots,k_{i}\},\ e^{\prime}_{j}\not\subseteq S)=(1-\frac{\binom{|S\cap P_{i}|}{d_{i}}}{\binom{|P_{i}|}{d_{i}}})^{k_{i}}
Proof.

Since the partition index i is fixed, we drop the index i and use e_j to denote e_j^i, k to denote k_i, and d to denote d_i. We also let n=|P_i| and assume without loss of generality that S\subseteq P_i.

The intuition is that, since the edges eje_{j} must be disjoint but the edges eje^{\prime}_{j} do not necessarily have to, then the edges eje_{j} will cover a “bigger fraction” of SS than eje^{\prime}_{j}.

We prove this by induction on the number of edges. When considering only one edge e1e_{1}, we know that

\mathbb{P}(e_{1}\not\subseteq S)=\mathbb{P}(e^{\prime}_{1}\not\subseteq S).

Now assume that the result is true for jj edges, i.e. (e1S,,ejS)(e1S)j\mathbb{P}(e_{1}\not\subseteq S,\ldots,e_{j}\not\subseteq S)\leq\mathbb{P}(e^{\prime}_{1}\not\subseteq S)^{j}. We want to show that it holds for j+1j+1 edges, i.e.,

(e1S,,ejS,ej+1S)(e1S)j+1\mathbb{P}(e_{1}\not\subseteq S,\ldots,e_{j}\not\subseteq S,e_{j+1}\not\subseteq S)\leq\mathbb{P}(e^{\prime}_{1}\not\subseteq S)^{j+1}

By the inductive hypothesis, it is sufficient to show that

P(ej+1S|lj,elS)(ej+1S)P(e_{j+1}\not\subseteq S|\ \forall\ l\leq j,\ e_{l}\not\subseteq S)\leq\mathbb{P}(e_{j+1}\not\subseteq S)

This is equivalent to showing that P(ej+1S|e1S,,ejS)(ej+1S)P(e_{j+1}\subseteq S|\ e_{1}\not\subseteq S,\ldots,e_{j}\not\subseteq S)\geq\mathbb{P}(e_{j+1}\subseteq S). Furthermore, we have

(ej+1S)\displaystyle\mathbb{P}(e_{j+1}\subseteq S) =(ej+1S|lj,elS)(lj,elS)\displaystyle=\mathbb{P}(e_{j+1}\subseteq S|\ \forall\ l\leq j,\ e_{l}\not\subseteq S)\mathbb{P}(\forall\ l\leq j,\ e_{l}\not\subseteq S)
+(ej+1S|lj,elS)(lj,elS)\displaystyle\ \ \ +\mathbb{P}(e_{j+1}\subseteq S|\ \exists\ l\leq j,\ e_{l}\subseteq S)\mathbb{P}(\exists\ l\leq j,\ e_{l}\subseteq S)

Therefore, P(ej+1S|e1S,,ejS)(ej+1S)P(e_{j+1}\not\subseteq S|\ e_{1}\not\subseteq S,\ldots,e_{j}\not\subseteq S)\leq\mathbb{P}(e_{j+1}\not\subseteq S) becomes equivalent to

P(ej+1S|lj,elS)(lj,elS)(ej+1S|lj,elS)(lj,elS)P(e_{j+1}\subseteq S|\ \forall\ l\leq j,\ e_{l}\not\subseteq S)\mathbb{P}(\exists\ l\leq j,\ e_{l}\subseteq S)\geq\mathbb{P}(e_{j+1}\subseteq S|\ \exists\ l\leq j,\ e_{l}\subseteq S)\mathbb{P}(\exists\ l\leq j,\ e_{l}\subseteq S) (10)

Since \mathbb{P}(\exists\ l\leq j,\ e_{l}\subseteq S)>0, inequality (10) is equivalent to

P(ej+1S|lj,elS)(ej+1S|lj,elS)P(e_{j+1}\subseteq S|\ \forall\ l\leq j,\ e_{l}\not\subseteq S)\geq\mathbb{P}(e_{j+1}\subseteq S|\ \exists\ l\leq j,\ e_{l}\subseteq S) (11)

To prove equation (11), we observe that

(ej+1S)=pP(ej+1S|lj,elS)+(1p)(ej+1S|lj,elS),\mathbb{P}(e_{j+1}\subseteq S)=pP(e_{j+1}\subseteq S|\ \forall\ l\leq j,\ e_{l}\not\subseteq S)+(1-p)\mathbb{P}(e_{j+1}\subseteq S|\ \exists\ l\leq j,\ e_{l}\subseteq S),

where p=(lj,elS).p=\mathbb{P}(\forall\ l\leq j,\ e_{l}\not\subseteq S). Therefore, if we show that (ej+1S)(ej+1S|lj,elS)\mathbb{P}(e_{j+1}\subseteq S)\geq\mathbb{P}(e_{j+1}\subseteq S|\ \exists\ l\leq j,\ e_{l}\subseteq S), then we must have P(ej+1S|lj,elS)(ej+1S)(ej+1S|lj,elS)P(e_{j+1}\subseteq S|\ \forall\ l\leq j,\ e_{l}\not\subseteq S)\geq\mathbb{P}(e_{j+1}\subseteq S)\geq\mathbb{P}(e_{j+1}\subseteq S|\ \exists\ l\leq j,\ e_{l}\subseteq S). To see that (ej+1S)(ej+1S|lj,elS)\mathbb{P}(e_{j+1}\subseteq S)\geq\mathbb{P}(e_{j+1}\subseteq S|\ \exists\ l\leq j,\ e_{l}\subseteq S), we first observe that

(ej+1S|lj,elS)\displaystyle\mathbb{P}(e_{j+1}\subseteq S|\ \exists\ l\leq j,\ e_{l}\subseteq S) =(ej+1S|e1S)\displaystyle=\mathbb{P}(e_{j+1}\subseteq S|\ e_{1}\subseteq S)
=(|S|dd)(ndd),\displaystyle=\frac{\binom{|S|-d}{d}}{\binom{n-d}{d}},

while

(ej+1S)=(|S|d)(nd).\mathbb{P}(e_{j+1}\subseteq S)=\frac{\binom{|S|}{d}}{\binom{n}{d}}.

Therefore,

\frac{\mathbb{P}(e_{j+1}\subseteq S)}{\mathbb{P}(e_{j+1}\subseteq S\ |\ e_{1}\subseteq S)} =\frac{\binom{|S|}{d}}{\binom{n}{d}}\cdot\frac{\binom{n-d}{d}}{\binom{|S|-d}{d}}
=\frac{|S|!}{(|S|-d)!}\cdot\frac{(|S|-2d)!}{(|S|-d)!}\cdot\frac{(n-d)!}{n!}\cdot\frac{(n-d)!}{(n-2d)!}
=\frac{\dfrac{|S|(|S|-1)\cdots(|S|-d+1)}{(|S|-d)(|S|-d-1)\cdots(|S|-2d+1)}}{\dfrac{n(n-1)\cdots(n-d+1)}{(n-d)(n-d-1)\cdots(n-2d+1)}}

It is easy to verify that

|S|d+i|S|2d+ind+in2d+i\frac{|S|-d+i}{|S|-2d+i}\geq\frac{n-d+i}{n-2d+i}

for every i{1,,d}i\in\{1,\ldots,d\}, because |S|n|S|\leq n. Therefore we get that (ej+1S)(ej+1S|e1S)\mathbb{P}(e_{j+1}\subseteq S)\geq\mathbb{P}(e_{j+1}\subseteq S|\ e_{1}\subseteq S), and hence (ej+1S)(ej+1S|lj,elS)\mathbb{P}(e_{j+1}\subseteq S)\geq\mathbb{P}(e_{j+1}\subseteq S|\ \exists\ l\leq j,\ e_{l}\subseteq S). This proves (11) and concludes the induction proof. ∎

Lemma 6.

Consider a query S made by an algorithm at round i such that all the queries from previous rounds are independent of the partition of P_i,\ldots,P_R. If |S|\geq(1-\frac{1}{d_i})n_i, then with probability 1-e^{-\Omega(\log^2 n)}, S contains at least one edge from the matching on P_i.

Proof.
(j{1,,ki},ejiS)\displaystyle\mathbb{P}(\exists j\in\{1,\ldots,k_{i}\},\ e_{j}^{i}\subseteq S) =1(j{1,,ki},ejiS)\displaystyle=1-\mathbb{P}(\forall j\in\{1,\ldots,k_{i}\},\ e_{j}^{i}\not\subseteq S) (12)
1(1(|SPi|di)(|Pi|di))ki\displaystyle\geq 1-(1-\frac{\binom{|S\cap P_{i}|}{d_{i}}}{\binom{|P_{i}|}{d_{i}}})^{k_{i}} (13)

where the inequality comes from Lemma 5. Since |S|(11di)ni|S|\geq(1-\frac{1}{d_{i}})n_{i}, we have with high probability that |SPi|(11di)2|Pi|di2|S\cap P_{i}|\geq(1-\frac{1}{d_{i}})^{2}|P_{i}|\geq d_{i}^{2}. Conditioning on this event, and since we already have di|Pi|d_{i}\leq\sqrt{|P_{i}|}, we get by Claim 3

(|SPi|di)(|Pi|di)\displaystyle\frac{\binom{|S\cap P_{i}|}{d_{i}}}{\binom{|P_{i}|}{d_{i}}} |SPi|di4(di)!(di)!|Pi|di\displaystyle\geq\frac{|S\cap P_{i}|^{d_{i}}}{4(d_{i})!}\frac{(d_{i})!}{|P_{i}|^{d_{i}}}
14(11di)2di\displaystyle\geq\frac{1}{4}(1-\frac{1}{d_{i}})^{2d_{i}}
14e1/di11/di2di\displaystyle\geq\frac{1}{4}e^{-\frac{1/d_{i}}{1-1/d_{i}}2d_{i}}
14e211/di\displaystyle\geq\frac{1}{4}e^{-\frac{2}{1-1/d_{i}}}
14e4\displaystyle\geq\frac{1}{4}e^{-4}

Since ki=|Pi|di=3log2ndik_{i}=\frac{|P_{i}|}{d_{i}}=3\log^{2}n\cdot d_{i}, we get that

(|SPi|di)(|Pi|di)14e413di=log2nki\frac{\binom{|S\cap P_{i}|}{d_{i}}}{\binom{|P_{i}|}{d_{i}}}\geq\frac{1}{4}e^{-4}\geq\frac{1}{3d_{i}}=\frac{\log^{2}n}{k_{i}}

The probability (13) becomes

(j{1,,ki},ejiS)\displaystyle\mathbb{P}(\exists j\in\{1,\ldots,k_{i}\},\ e_{j}^{i}\subseteq S) 1(1(|SPi|di)(|Pi|di))ki\displaystyle\geq 1-(1-\frac{\binom{|S\cap P_{i}|}{d_{i}}}{\binom{|P_{i}|}{d_{i}}})^{k_{i}} (14)
1(1log2nki)ki\displaystyle\geq 1-(1-\frac{\log^{2}n}{k_{i}})^{k_{i}} (15)
11nlogn\displaystyle\geq 1-\frac{1}{n^{\log n}} (16)
=1eΩ(log2n)\displaystyle=1-e^{-\Omega(\log^{2}n)} (17)

Lemma 7.

Consider a query S made by an algorithm at round i such that all the queries from previous rounds are independent of the partition of P_i,\ldots,P_R. If |S|\leq(1-\frac{1}{d_i})n_i, then with probability 1-e^{-\Omega(\log^2 n)}, for every l\geq i+1, S does not contain any edge from the matching on P_l.

Proof.

Fix a partition P_l with l\geq i+1, and let e_1,\ldots,e_j,\ldots,e_{k_l} be the random matching on P_l. For j=1,\ldots,k_l, we have

𝔼[|ejS|]=|S||ej|ni(11di)dl=dldldi.\mathbb{E}\big{[}|e_{j}\cap S|\big{]}=\frac{|S||e_{j}|}{n_{i}}\leq(1-\frac{1}{d_{i}})d_{l}=d_{l}-\frac{d_{l}}{d_{i}}.

Going forward, we present an upper bound on (ejS)\mathbb{P}(e_{j}\subset S). We can assume without loss of generality that |S|=(11di)ni|S|=(1-\frac{1}{d_{i}})n_{i}. In fact, for any j=1,,klj=1,\ldots,k_{l}, if SSS\subseteq S^{\prime} then we have (ejS)(ejS)\mathbb{P}(e_{j}\subseteq S)\leq\mathbb{P}(e_{j}\subseteq S^{\prime}).

The probability that ejSe_{j}\subset S can be expressed as

(ejS)\displaystyle\mathbb{P}(e_{j}\subset S) =(|ejS|dl)\displaystyle=\mathbb{P}(|e_{j}\cap S|\geq d_{l})
(|ejS|(1+δ)E[|ejS|]),\displaystyle\leq\mathbb{P}\Big{(}|e_{j}\cap S|\geq(1+\delta)E\big{[}|e_{j}\cap S|\big{]}\Big{)},

where δ=1di1\delta=\frac{1}{d_{i}-1}. By Chernoff bound we get

(ejS)\displaystyle\mathbb{P}(e_{j}\subset S) 2e13δ2E[|ejS|]\displaystyle\leq 2e^{-\frac{1}{3}\delta^{2}E\big{[}|e_{j}\cap S|\big{]}}
=2e131(di1)2(11di)dl\displaystyle=2e^{-\frac{1}{3}\frac{1}{(d_{i}-1)^{2}}(1-\frac{1}{d_{i}})d_{l}}
=2e131(di1)didl\displaystyle=2e^{-\frac{1}{3}\frac{1}{(d_{i}-1)d_{i}}d_{l}}
2elog2n\displaystyle\leq 2e^{-\log^{2}n}
2nlogn\displaystyle\leq\frac{2}{n^{\log n}}

where the second to last inequality is due to dldi+1=3log2ndi2d_{l}\geq d_{i+1}=3\log^{2}n\cdot d_{i}^{2}. Therefore by a union bound on the edges of the matching in PlP_{l} we get that

(j=1,kl,ejS)\displaystyle\mathbb{P}(\exists\ j=1,\ldots k_{l},\ \ e_{j}\subset S) 2klnlogn\displaystyle\leq\frac{2k_{l}}{n^{\log n}}
1nlogn1\displaystyle\leq\frac{1}{n^{\log n-1}}

Finally, another union bound over all the partitions P_l with l\geq i+1 yields that the probability that there exists a partition P_l such that an edge of P_l is included in S is at most O(\log\log n/n^{\log n-1})=e^{-\Omega(\log^2 n)}. ∎

Lemma 8.

If vertices in Pi,Pi+1,,PRP_{i},P_{i+1},\ldots,P_{R} are indistinguishable to 𝒜\mathcal{A} at the beginning of round ii of queries, then vertices in Pi+1,,PRP_{i+1},\ldots,P_{R} are indistinguishable at the end of round ii with probability 1eΩ(log2n)1-e^{-\Omega(\log^{2}n)}.

Proof.

Consider a query S made by an algorithm at round i such that all the queries from previous rounds are independent of the partition of P_i,\ldots,P_R. Lemma 7 shows that if S is small, i.e., |S|\leq(1-1/d_i)\sum_{j\geq i}|P_j|, and is independent of the partition of \cup_{j=i}^{R}P_j into P_i,\ldots,P_R, then with probability 1-e^{-\Omega(\log^2 n)}, for every j\geq i+1, S does not contain any edge contained in P_j. On the other hand, Lemma 6 shows that if a query S is large, i.e., |S|\geq(1-1/d_i)\sum_{j\geq i}|P_j|, and is independent of the partition of \cup_{j=i}^{R}P_j into P_i,\ldots,P_R, then with probability 1-e^{-\Omega(\log^2 n)}, S contains at least one edge from the matching on P_i, which implies Q_{M_P}(S)=1. In both cases, Q_{M_P}(S) is independent of the partition of \cup_{j=i+1}^{R}P_j into P_{i+1},\ldots,P_R with probability 1-e^{-\Omega(\log^2 n)}. By a union bound, this holds for the poly(n) queries made at round i. ∎

See 7

Proof.

Consider a uniformly random partition P=(P_0,\ldots,P_i,\ldots,P_R), a matching M_P, and an algorithm 𝒜 which queries M_P in \log\log n-3 rounds. By Lemma 8, after i rounds of queries, with probability 1-e^{-\Omega(\log^2 n)} over both the randomization of P and of the algorithm, all the queries Q_{M_P}(S) made by 𝒜 are independent of the partition of P_i\cup\ldots\cup P_R. Therefore, and since R\geq\log\log n-1 by Claim 2, we get that after \log\log n-3 rounds of queries, with probability 1-e^{-\Omega(\log^2 n)} over both the randomization of P and the algorithm 𝒜, all the queries Q_{M_P}(S) made by 𝒜 are independent of the partition of P_{R-1}\cup P_R. We now distinguish two cases:

  • 𝒜 does not return any edge that is included in P_{R-1}\cup P_R. In this case, the returned matching misses the edges of M_P contained in P_{R-1}\cup P_R and is therefore, with probability 1, not equal to M_P.

  • 𝒜 returns a set of edges that are included in P_{R-1}\cup P_R. In this case, we know that with probability 1-e^{-\Omega(\log^2 n)} all the queries Q_{M_P}(S) made by 𝒜 are independent of the partition of P_{R-1}\cup P_R into P_{R-1} and P_R. The edges that are returned by 𝒜 and included in P_{R-1}\cup P_R are therefore also independent of the partition of P_{R-1}\cup P_R into P_{R-1} and P_R with probability 1-e^{-\Omega(\log^2 n)}. To fully learn M_P, 𝒜 needs to distinguish the points in P_{R-1} from the points in P_R, but there are \binom{|P_R|+|P_{R-1}|}{|P_{R-1}|} ways of partitioning P_{R-1}\cup P_R into P_{R-1} and P_R. Therefore the probability that 𝒜 correctly learns M_P is at most

    (1eΩ(log2n))1(|PR|+|PR1||PR1|)(1-e^{-\Omega(\log^{2}n)})\frac{1}{\binom{|P_{R}|+|P_{R-1}|}{|P_{R-1}|}}

    In the rest of the proof, we show that 1/(|PR|+|PR1||PR1|)=eΩ(log2n)1/\binom{|P_{R}|+|P_{R-1}|}{|P_{R-1}|}=e^{-\Omega(\log^{2}n)}. This implies that the probability that 𝒜\mathcal{A} learns MPM_{P} correctly is less than (1eΩ(log2n))eΩ(log2n)=eΩ(log2n)(1-e^{-\Omega(\log^{2}n)})e^{-\Omega(\log^{2}n)}=e^{-\Omega(\log^{2}n)}.

    We know that ni=0R|Pi|(R+1)|PR|n\leq\sum_{i=0}^{R}|P_{i}|\leq(R+1)|P_{R}|. Therefore |PR|n/(R+1)=Ω(n/loglogn)|P_{R}|\geq n/(R+1)=\Omega(n/\log\log n). By Claim 3, we get that

    1(|PR|+|PR1||PR1|)\displaystyle\frac{1}{\binom{|P_{R}|+|P_{R-1}|}{|P_{R-1}|}} 4(|PR1|)!(|PR|+|PR1|)|PR1|\displaystyle\leq\frac{4(|P_{R-1}|)!}{(|P_{R}|+|P_{R-1}|)^{|P_{R-1}|}}
    (|PR1||PR|+|PR1|)|PR1|\displaystyle\leq\left(\frac{|P_{R-1}|}{|P_{R}|+|P_{R-1}|}\right)^{|P_{R-1}|}
    (13log2n|PR1|)|PR1|\displaystyle\leq\left(\frac{1}{3\log^{2}n|P_{R-1}|}\right)^{|P_{R-1}|}
    (13log2n|PR1|)3log2n\displaystyle\leq\left(\frac{1}{3\log^{2}n|P_{R-1}|}\right)^{3\log^{2}n}
    =eΩ(log2n)\displaystyle=e^{-\Omega(\log^{2}n)}

Therefore, with probability 1eΩ(log2n)1-e^{-\Omega(\log^{2}n)}, the matching returned by 𝒜\mathcal{A} is not equal to MPM_{P}. ∎

Appendix B Missing Analysis for Low Degree Near Uniform Hypergraphs (Section 4)

B.1 Proof of Lemma 1

Lemma 9.

Assume that the hypergraph H(V,E)H(V,E) has a maximum degree of Δ\Delta and edge size between d/ρd/\rho and dd. Let SS be a vertex-sample with probability pp, and assume that SS contains an edge ee. We have

(e s.t eS,eH{e}|eS)f(Δ,p,d,ρ)\mathbb{P}(\not\exists\ e^{\prime}\mbox{ s.t }e^{\prime}\subseteq S,e^{\prime}\in H\setminus\{e\}\ |\ e\subseteq S)\geq f^{\bullet}(\Delta,p,d,\rho)
Proof.

The intuition is that the term f(Δ,p,d,ρ)f^{\bullet}(\Delta,p,d,\rho) treats the events that the edges are not contained in SS as independent events.

Consider the following intermediate optimization problem

fk(Δ,p,d,ρ):=\displaystyle f^{\bullet}_{k}(\Delta,p,d,\rho):=\quad\quad minaij\displaystyle\min\limits_{a_{ij}} j=d/ρdi=0j1(1pji)aij\displaystyle\prod\limits_{j=d/\rho}^{d}\prod\limits_{i=0}^{j-1}(1-p^{j-i})^{a_{ij}}
s.t. j=d/ρdi=0j1iaijmin{(Δ1)d,kd}\displaystyle\sum_{j=d/\rho}^{d}\sum_{i=0}^{j-1}i\cdot a_{ij}\leq\min\{(\Delta-1)d,kd\}
j=d/ρdi=0j1jaijk\displaystyle\sum_{j=d/\rho}^{d}\sum_{i=0}^{j-1}j\cdot a_{ij}\leq k
aij0.\displaystyle a_{ij}\geq 0.

We want to show that for every 1\leq k\leq\Delta n and for any k edges e_1,\ldots,e_k, all distinct from e, we have

(e1,,ekS|eS)fk(Δ,p,d,ρ).\mathbb{P}(e_{1},\ldots,e_{k}\not\subseteq S\ |\ e\subseteq S)\geq f_{k}^{\bullet}(\Delta,p,d,\rho).

For k=1, the result clearly holds. Suppose, by induction, that for any k edges e_1,\ldots,e_k, all distinct from e, we have

(e1,,ekS|eS)fk(Δ,p,d,ρ).\mathbb{P}(e_{1},\ldots,e_{k}\not\subseteq S\ |\ e\subseteq S)\geq f_{k}^{\bullet}(\Delta,p,d,\rho).

If we consider k+1k+1 edges, then

\mathbb{P}(e_{1},\ldots,e_{k},e_{k+1}\not\subseteq S\ |\ e\subseteq S) =\mathbb{P}(e_{1},\ldots,e_{k}\not\subseteq S\ |\ e\subseteq S)\cdot\mathbb{P}(e_{k+1}\not\subseteq S\ |\ e\subseteq S,\ e_{1},\ldots,e_{k}\not\subseteq S)

By the induction hypothesis we know that P(e1,,ekS|eS)fk(Δ,p,d,ρ)P(e_{1},\ldots,e_{k}\not\subseteq S\ |\ e\subseteq S)\geq f_{k}^{\bullet}(\Delta,p,d,\rho). Therefore, if we can show that

(ek+1S|eS,e1,,ekS)(ek+1S|eS)\mathbb{P}(e_{k+1}\not\subseteq S\ |\ e\subseteq S,\ e_{1},\ldots,e_{k}\not\subseteq S)\geq\mathbb{P}(e_{k+1}\not\subseteq S\ |\ e\subseteq S) (18)

then we will have

(e1,,ek,ek+1S|eS)fk(Δ,p,d,ρ)(ek+1S|eS)fk+1(Δ,p,d,ρ)\mathbb{P}(e_{1},\ldots,e_{k},e_{k+1}\not\subseteq S\ |\ e\subseteq S)\geq f_{k}^{\bullet}(\Delta,p,d,\rho)\mathbb{P}(e_{k+1}\not\subseteq S\ |\ e\subseteq S)\geq f_{k+1}^{\bullet}(\Delta,p,d,\rho)

To see that (18) holds, we omit conditioning on eSe\subset S to ease notation. By Bayes rule,

(ek+1S|eS,e1,,ekS)\displaystyle\mathbb{P}(e_{k+1}\not\subseteq S\ |\ e\subseteq S,\ e_{1},\ldots,e_{k}\not\subseteq S) =(ek+1S|[1,k]eek+1S,e1,,ekS)\displaystyle=\mathbb{P}(e_{k+1}\not\subseteq S\ |\ \exists\ \ell\in[1,k]\ e_{\ell}\cap e_{k+1}\setminus S\neq\emptyset,\ e_{1},\ldots,e_{k}\not\subseteq S)
×([1,k]eek+1S|e1,,ekS)\displaystyle\ \ \ \ \times\mathbb{P}(\exists\ \ell\in[1,k]\ e_{\ell}\cap e_{k+1}\setminus S\neq\emptyset\ |\ e_{1},\ldots,e_{k}\not\subseteq S)
+(ek+1S|[1,k]eek+1S=,e1,,ekS)\displaystyle\ \ \ \ +\mathbb{P}(e_{k+1}\not\subseteq S\ |\ \forall\ \ell\in[1,k]\ e_{\ell}\cap e_{k+1}\setminus S=\emptyset,\ e_{1},\ldots,e_{k}\not\subseteq S)
\ \ \ \ \times\mathbb{P}(\forall\ \ell\in[1,k]\ e_{\ell}\cap e_{k+1}\setminus S=\emptyset\ |\ e_{1},\ldots,e_{k}\not\subseteq S)

When eek+1Se_{\ell}\cap e_{k+1}\setminus S\neq\emptyset for some value \ell, then we know with probability one that ek+1Se_{k+1}\not\subseteq S. Therefore

(ek+1S|[1,k]eek+1S,e1,,ekS)=1P(ek+1S).\mathbb{P}(e_{k+1}\not\subseteq S\ |\ \exists\ \ell\in[1,k]\ e_{\ell}\cap e_{k+1}\setminus S\neq\emptyset,\ e_{1},\ldots,e_{k}\not\subseteq S)=1\geq P(e_{k+1}\not\subseteq S).

Furthermore,

(ek+1S|[1,k]eek+1S=,e1,,ekS)=(ek+1S)=1pd,\mathbb{P}(e_{k+1}\not\subseteq S\ |\ \forall\ \ell\in[1,k]\ e_{\ell}\cap e_{k+1}\setminus S=\emptyset,\ e_{1},\ldots,e_{k}\not\subseteq S)=\mathbb{P}(e_{k+1}\not\subseteq S)=1-p^{d},

therefore

\mathbb{P}(e_{k+1}\not\subseteq S\ |\ e\subseteq S,\ e_{1},\ldots,e_{k}\not\subseteq S) \geq\mathbb{P}(e_{k+1}\not\subseteq S)\cdot\mathbb{P}(\exists\ \ell\in[1,k]\ e_{\ell}\cap e_{k+1}\setminus S\neq\emptyset\ |\ e_{1},\ldots,e_{k}\not\subseteq S)
\ \ \ \ +\mathbb{P}(e_{k+1}\not\subseteq S)\cdot\mathbb{P}(\forall\ \ell\in[1,k]\ e_{\ell}\cap e_{k+1}\setminus S=\emptyset\ |\ e_{1},\ldots,e_{k}\not\subseteq S)
=\mathbb{P}(e_{k+1}\not\subseteq S),

since both conditional probabilities of the event e_{k+1}\not\subseteq S above are at least \mathbb{P}(e_{k+1}\not\subseteq S) and the probabilities of the two conditioning events sum to one. This proves (18) and concludes the induction. ∎

Lemma 10.

For p(0,1)p\in(0,1), any optimal solution to LP(Δ,p,d,ρ)LP^{\bullet}(\Delta,p,d,\rho) and f(Δ,p,d,ρ)f^{\bullet}(\Delta,p,d,\rho) is such that aij=0a_{ij}=0 for all jd/ρj\neq d/\rho.

Proof.

For ease of notation, we use d^{-} instead of d/\rho in the proof. Suppose to the contrary that there is an optimal solution a such that a_{ij}>0 for some j>d^{-}. Then consider the alternative solution a' such that a'_{id^{-}}=a_{id^{-}}+\epsilon, a'_{ij}=a_{ij}-\epsilon, and a'_{sk}=a_{sk} for all (s,k) with s\neq i or k\notin\{d^{-},j\}. For \epsilon>0 small enough, the constraints clearly still hold.

The net increase in the objective value of LP^{\bullet}(\Delta,p,d,\rho) is \epsilon\cdot(p^{d^{-}-i}-p^{j-i})>0 because d^{-}<j and p\in(0,1). The new objective value of f^{\bullet}(\Delta,p,d,\rho) is \left(\frac{1-p^{d^{-}-i}}{1-p^{j-i}}\right)^{\epsilon}<1 times the old objective value (thus smaller than the old objective value), again because d^{-}<j and p\in(0,1). Therefore, a is not an optimal solution to either LP^{\bullet}(\Delta,p,d,\rho) or f^{\bullet}(\Delta,p,d,\rho). ∎

From Lemma 10, we can obtain the following two equivalent representations of LP^{\bullet}(\Delta,p,d,\rho) and f^{\bullet}(\Delta,p,d,\rho): letting a_i denote the number of edges of size d/\rho that intersect the focal edge in i vertices, we have

LP(Δ,p,d,ρ):=\displaystyle LP^{\bullet}(\Delta,p,d,\rho):=\quad\quad maxai\displaystyle\max\limits_{a_{i}} i=0d/ρ1aipd/ρi\displaystyle\sum_{i=0}^{d/\rho-1}a_{i}\cdot p^{d/\rho-i}
s.t. i=0d/ρ1iai(Δ1)d\displaystyle\sum_{i=0}^{d/\rho-1}i\cdot a_{i}\leq(\Delta-1)d
i=0d/ρ1aiΔnd/ρ\displaystyle\sum_{i=0}^{d/\rho-1}a_{i}\leq\frac{\Delta n}{d/\rho}
ai0.\displaystyle a_{i}\geq 0.
f^{\bullet}(\Delta,p,d,\rho):=\quad\quad\min\limits_{a_{i}}\ \prod\limits_{i=0}^{d/\rho-1}(1-p^{d/\rho-i})^{a_{i}}
s.t. i=0d/ρ1iai(Δ1)d\displaystyle\sum_{i=0}^{d/\rho-1}i\cdot a_{i}\leq(\Delta-1)d
i=0d/ρ1aiΔnd/ρ\displaystyle\sum_{i=0}^{d/\rho-1}a_{i}\leq\frac{\Delta n}{d/\rho}
ai0.\displaystyle a_{i}\geq 0.
Claim 4.

The function

f(x)=(1pd1pdx)1xf(x)=\left(\frac{1-p^{d}}{1-p^{d-x}}\right)^{\frac{1}{x}}

is increasing in xx for x[1,d1]x\in[1,d-1].

Proof.

Taking the derivative of ff, we get

f(x)=(px(pd1)pdpx)1x(pdpx)x2(pdxlogp(pdpx)log(px(pd1)pdpx)).f^{\prime}(x)=\frac{\left(\frac{p^{x}(p^{d}-1)}{p^{d}-p^{x}}\right)^{\frac{1}{x}}}{(p^{d}-p^{x})x^{2}}\cdot\left(p^{d}x\log p-(p^{d}-p^{x})\log\left(\frac{p^{x}(p^{d}-1)}{p^{d}-p^{x}}\right)\right). (19)

The first term on the RHS of Eq.(19) is non-positive: pd<pxp^{d}<p^{x} and pd1<0p^{d}-1<0 because p(0,1)p\in(0,1) and x<dx<d. We now show that the second term is non-positive as well. Denote the second term by g(x)g(x), i.e.,

g(x)=pdxlogp(pdpx)log(px(pd1)pdpx).g(x)=p^{d}x\log p-(p^{d}-p^{x})\log\left(\frac{p^{x}(p^{d}-1)}{p^{d}-p^{x}}\right).

Our plan is to first show that g(x)g(x) is decreasing in xx, and therefore it is maximized at x=1x=1. We then show that g(1)0g(1)\leq 0, and thus conclude that g(x)0g(x)\leq 0 for all x[1,d1]x\in[1,d-1].

Taking derivative of gg, we get

g(x)=pxlogplog(px(pd1)pdpx)=pxlogplog(1pd1pdx).g^{\prime}(x)=p^{x}\log p\log\left(\frac{p^{x}(p^{d}-1)}{p^{d}-p^{x}}\right)=p^{x}\log p\log\left(\frac{1-p^{d}}{1-p^{d-x}}\right).

Because 1\leq d-x\leq d-1 and p\in(0,1), we have 1-p^{d}\geq 1-p^{d-x}>0. Thus, (1-p^{d})/(1-p^{d-x})\geq 1. Furthermore, \log p<0 because p\in(0,1). We can thus conclude that g'(x)\leq 0 for all x\in[1,d-1], or equivalently, that g(x) is decreasing on [1,d-1].

We now show g(1)0g(1)\leq 0.

g(1)\displaystyle g(1) =pdlogp(pdp)log(p(pd1)pdp)\displaystyle=p^{d}\log p-(p^{d}-p)\log\left(\frac{p(p^{d}-1)}{p^{d}-p}\right)
=pd(logplog(p(pd1)pdp))+plog(p(pd1)pdp)\displaystyle=p^{d}\left(\log p-\log\left(\frac{p(p^{d}-1)}{p^{d}-p}\right)\right)+p\log\left(\frac{p(p^{d}-1)}{p^{d}-p}\right)
=pdlogp(pdp)log(pd1pd11)\displaystyle=p^{d}\log p-(p^{d}-p)\log\left(\frac{p^{d}-1}{p^{d-1}-1}\right)
=p\left(p^{d-1}\log p-(p^{d-1}-1)\log\left(\frac{1-p^{d}}{1-p^{d-1}}\right)\right)
=p(pd1logp+(1pd1)log(1+pd1pd1pd1))\displaystyle=p\left(p^{d-1}\log p+(1-p^{d-1})\log\left(1+\frac{p^{d-1}-p^{d}}{1-p^{d-1}}\right)\right)
p(pd1logp+(1pd1)pd1pd1pd1)\displaystyle\leq p\left(p^{d-1}\log p+(1-p^{d-1})\frac{p^{d-1}-p^{d}}{1-p^{d-1}}\right)
=p(pd1logp+pd1pd)\displaystyle=p\left(p^{d-1}\log p+p^{d-1}-p^{d}\right)
=pd(logp+1p)0,\displaystyle=p^{d}\left(\log p+1-p\right)\leq 0,

where the last inequality follows from the fact that 1+tet1+t\leq e^{t} for all tt: we can set t=logpt=\log p. ∎
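As a quick numerical sanity check of Claim 4 (not part of the proof), one can evaluate f on a grid:

    # f(x) = ((1 - p**d) / (1 - p**(d - x)))**(1/x) should be non-decreasing on [1, d-1] for p in (0, 1).
    def f(x, p, d):
        return ((1 - p ** d) / (1 - p ** (d - x))) ** (1.0 / x)

    for p, d in [(0.3, 10), (0.7, 25)]:
        xs = [1 + i * (d - 2) / 200 for i in range(201)]
        vals = [f(x, p, d) for x in xs]
        assert all(a <= b + 1e-12 for a, b in zip(vals, vals[1:]))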

Lemma 11.

For p(0,1)p\in(0,1), an optimal solution to the math program f(Δ,p,d,ρ)f^{\bullet}(\Delta,p,d,\rho) is

ad/ρ1=min{(Δ1)dd/ρ1,Δnd/ρ},a0=Δnd/ρad/ρ1,ai=0i{1,,d/ρ2}.a^{*}_{d/\rho-1}=\min\left\{(\Delta-1)\frac{d}{d/\rho-1},\frac{\Delta n}{d/\rho}\right\},\;a^{*}_{0}=\frac{\Delta n}{d/\rho}-a^{*}_{d/\rho-1},\;a^{*}_{i}=0\;\forall i\in\{1,\ldots,d/\rho-2\}.
Proof.

For ease of notation, we use dd^{-} instead of d/ρd/\rho in the proof. Consider another solution aa such that ad1<min{(Δ1)d/(d1),Δn/d}a_{d^{-}-1}<\min\{(\Delta-1)d/(d^{-}-1),\Delta n/d^{-}\}. If for all i{0,,d2}i\in\{0,\ldots,d^{-}-2\}, ai=0a_{i}=0, then aa is not optimal: neither constraint is tight; thus one could always increase some ai,i<d1a_{i},i<d^{-}-1 to decrease the objective value. Therefore, without loss of generality, we can assume that there exists an index i<d1i<d^{-}-1, ai>0a_{i}>0. Let ii be the biggest such index. For ϵ>0\epsilon>0, consider the following alternative solution:

ad1=ad1+ϵ,ai={aid1iϵi>0aiϵi=0,a^{\prime}_{d^{-}-1}=a_{d^{-}-1}+\epsilon,\;a^{\prime}_{i}=\begin{cases}a_{i}-\frac{{d^{-}-1}}{i}\epsilon&i>0\\ a_{i}-\epsilon&i=0\end{cases},
a0={a0+(d1i1)ϵi>0a0ϵi=0,ak=akk{0,i,d1}a^{\prime}_{0}=\begin{cases}a_{0}+\left(\frac{{d^{-}-1}}{i}-1\right)\epsilon&i>0\\ a_{0}-\epsilon&i=0\end{cases},\;a^{\prime}_{k}=a_{k}\;\forall k\notin\{0,i,{d^{-}-1}\}

It can be easily checked that aa^{\prime} is still feasible for small enough ϵ\epsilon. Let obj(a)obj(a) (resp. obj(a)obj(a^{\prime})) be the objective function value with respect to solution aa (resp. aa^{\prime}). When i>0i>0, we have

obj(a)obj(a)\displaystyle\frac{obj(a^{\prime})}{obj(a)} =((1p)(1pd)d1i1(1pdi)d1i)ϵ\displaystyle=\left(\frac{(1-p)\cdot(1-p^{d^{-}})^{\frac{d^{-}-1}{i}-1}}{(1-p^{d^{-}-i})^{\frac{d^{-}-1}{i}}}\right)^{\epsilon}
=(1p1pd(1pd1pdi)d1i)ϵ\displaystyle=\left(\frac{1-p}{1-p^{d^{-}}}\left(\frac{1-p^{d^{-}}}{1-p^{d^{-}-i}}\right)^{\frac{d^{-}-1}{i}}\right)^{\epsilon}
=((1p1pd)1d1(1pd1pdi)1i)(d1)ϵ\displaystyle=\left(\left(\frac{1-p}{1-p^{d^{-}}}\right)^{\frac{1}{d^{-}-1}}\left(\frac{1-p^{d^{-}}}{1-p^{d^{-}-i}}\right)^{\frac{1}{i}}\right)^{(d^{-}-1)\epsilon}
=((1pd1pdi)1i(1pd1p)1d1)(d1)ϵ1,\displaystyle=\left(\frac{\left(\frac{1-p^{d^{-}}}{1-p^{d^{-}-i}}\right)^{\frac{1}{i}}}{\left(\frac{1-p^{d^{-}}}{1-p}\right)^{\frac{1}{d^{-}-1}}}\right)^{(d^{-}-1)\epsilon}\leq 1,

where the last inequality follows from Claim 4. We have thus shown that obj(a)obj(a)obj(a^{\prime})\leq obj(a) if i>0i>0.

We now show that obj(a)obj(a)obj(a^{\prime})\leq obj(a) if i=0i=0. When i=0i=0, we have

obj(a)obj(a)\displaystyle\frac{obj(a^{\prime})}{obj(a)} =(1p1pd)ϵ1,\displaystyle=\left(\frac{1-p}{1-p^{d^{-}}}\right)^{\epsilon}\leq 1,

because 0<pd<p0<p^{d^{-}}<p. ∎
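For illustration, the closed-form value of f^{\bullet} at the optimal solution given by Lemma 11 can be evaluated numerically as follows; the function name f_bullet is illustrative, and n enters only through the constraint \sum_i a_i\leq\Delta n/(d/\rho).

    def f_bullet(Delta, p, d, rho, n):
        # Objective value of f at the optimal solution of Lemma 11 (assumes d/rho >= 2).
        dm = d / rho
        a_top = min((Delta - 1) * d / (dm - 1), Delta * n / dm)   # a*_{d/rho - 1}
        a_zero = Delta * n / dm - a_top                            # a*_0
        return (1 - p) ** a_top * (1 - p ** dm) ** a_zero

For example, the bound used in the proof of Lemma 12 corresponds to f_bullet(2, p, d/rho, 1, n).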

Lemma 12.

For Δ2\Delta\geq 2, we have

f(Δ,p,d,ρ)f(2,p,d/ρ,1)(Δ1)ρf^{\bullet}(\Delta,p,d,\rho)\geq f^{\bullet}(2,p,d/\rho,1)^{(\Delta-1)\rho}
Proof.

For ease of notation, we use dd^{-} instead of d/ρd/\rho in the proof. From Lemma 11, we have that

f(Δ,p,d,ρ)\displaystyle f^{\bullet}(\Delta,p,d,\rho) =(1p)min{(Δ1)dd1,Δnd}(1pd)Δndmin{(Δ1)dd1,Δnd}\displaystyle=(1-p)^{\min\left\{(\Delta-1)\frac{d}{d^{-}-1},\frac{\Delta n}{d^{-}}\right\}}(1-p^{d^{-}})^{\frac{\Delta n}{d^{-}}-\min\left\{(\Delta-1)\frac{d}{d^{-}-1},\frac{\Delta n}{d^{-}}\right\}}
(1p)(Δ1)dd1(1pd)Δnd(Δ1)dd1.\displaystyle\geq(1-p)^{(\Delta-1)\frac{d}{d^{-}-1}}(1-p^{d^{-}})^{\frac{\Delta n}{d^{-}}-(\Delta-1)\frac{d}{d^{-}-1}}. (20)

Specifically when Δ=2\Delta=2, we have

f(2,p,d,1)\displaystyle f^{\bullet}(2,p,d^{-},1) =(1p)min{dd1,2nd}(1pd)2ndmin{dd1,2nd}\displaystyle=(1-p)^{\min\left\{\frac{d^{-}}{d^{-}-1},\frac{2n}{d^{-}}\right\}}(1-p^{d^{-}})^{\frac{2n}{d^{-}}-\min\left\{\frac{d^{-}}{d^{-}-1},\frac{2n}{d^{-}}\right\}}
=(1p)dd1(1pd)2nddd1,\displaystyle=(1-p)^{\frac{d^{-}}{d^{-}-1}}(1-p^{d^{-}})^{\frac{2n}{d^{-}}-\frac{d^{-}}{d^{-}-1}},

where the second equality follows because for 2dn2\leq d^{-}\leq n, we have dd12nd\frac{d^{-}}{d^{-}-1}\leq\frac{2n}{d^{-}}.

We therefore have

f(2,p,d,1)(Δ1)dd=(1p)dd1(Δ1)dd(1pd)(2nddd1)(Δ1)dd.f^{\bullet}(2,p,d^{-},1)^{(\Delta-1)\frac{d}{d^{-}}}=(1-p)^{\frac{d^{-}}{d^{-}-1}(\Delta-1)\frac{d}{d^{-}}}(1-p^{d^{-}})^{\left(\frac{2n}{d^{-}}-\frac{d^{-}}{d^{-}-1}\right)(\Delta-1)\frac{d}{d^{-}}}. (21)

From Eqs. (20) and (21), to show the desired inequality it suffices to show

  1.
    (1p)(Δ1)dd1(1p)dd1(Δ1)dd.(1-p)^{(\Delta-1)\frac{d}{d^{-}-1}}\geq(1-p)^{\frac{d^{-}}{d^{-}-1}(\Delta-1)\frac{d}{d^{-}}}.
  2.
    (1pd)Δnd(Δ1)dd1(1pd)(2nddd1)(Δ1)dd.(1-p^{d^{-}})^{\frac{\Delta n}{d^{-}}-(\Delta-1)\frac{d}{d^{-}-1}}\geq(1-p^{d^{-}})^{(\frac{2n}{d^{-}}-\frac{d^{-}}{d^{-}-1})(\Delta-1)\frac{d}{d^{-}}}.

We first show

(1p)(Δ1)dd1(1p)dd1(Δ1)dd.(1-p)^{(\Delta-1)\frac{d}{d^{-}-1}}\geq(1-p)^{\frac{d^{-}}{d^{-}-1}(\Delta-1)\frac{d}{d^{-}}}.

Since 1p(0,1)1-p\in(0,1), we equivalently want to show

(Δ1)dd1dd1(Δ1)dd,(\Delta-1)\frac{d}{d^{-}-1}\leq\frac{d^{-}}{d^{-}-1}(\Delta-1)\frac{d}{d^{-}},

which in fact holds with equality, as can be seen by canceling common terms.

We then show

(1pd)Δnd(Δ1)dd1(1pd)(2nddd1)(Δ1)dd.(1-p^{d^{-}})^{\frac{\Delta n}{d^{-}}-(\Delta-1)\frac{d}{d^{-}-1}}\geq(1-p^{d^{-}})^{(\frac{2n}{d^{-}}-\frac{d^{-}}{d^{-}-1})(\Delta-1)\frac{d}{d^{-}}}.

Again, because 1pd(0,1)1-p^{d^{-}}\in(0,1), we equivalently need to show

Δnd(Δ1)dd1(2nddd1)(Δ1)dd.\frac{\Delta n}{d^{-}}-(\Delta-1)\frac{d}{d^{-}-1}\leq(\frac{2n}{d^{-}}-\frac{d^{-}}{d^{-}-1})(\Delta-1)\frac{d}{d^{-}}.

Equivalently

Δnd((2nddd1)dd+dd1)(Δ1)=2nddd(Δ1).\frac{\Delta n}{d^{-}}\leq\left((\frac{2n}{d^{-}}-\frac{d^{-}}{d^{-}-1})\frac{d}{d^{-}}+\frac{d}{d^{-}-1}\right)(\Delta-1)=\frac{2n}{d^{-}}\frac{d}{d^{-}}(\Delta-1).

Dividing both sides by nd(Δ1)\frac{n}{d^{-}}(\Delta-1), we get equivalently

ΔΔ12dd,\frac{\Delta}{\Delta-1}\leq 2\frac{d}{d^{-}},

which holds for Δ2\Delta\geq 2 because Δ/(Δ1)2\Delta/(\Delta-1)\leq 2 and d/d=ρ1d/d^{-}=\rho\geq 1. ∎
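As a sanity check of Lemma 12, the closed forms from Lemma 11 used above can also be compared numerically. The following Python snippet does so on an illustrative grid; the function name and grid are ours, not the paper's notation.

def f_closed(delta, p, d, rho, n):
    # Closed form of the optimal value given by Lemma 11 (with d_minus = d / rho).
    dm = d / rho
    a_top = min((delta - 1) * d / (dm - 1), delta * n / dm)
    return (1 - p) ** a_top * (1 - p ** dm) ** (delta * n / dm - a_top)

ok = all(
    f_closed(delta, p, rho * dm, rho, n)
    >= f_closed(2, p, dm, 1, n) ** ((delta - 1) * rho) - 1e-12
    for delta in (2, 3, 5)
    for p in (0.1, 0.5, 0.9)
    for dm in (2, 3, 6)
    for rho in (1, 1.5, 2)
    for n in (20, 100)
)
print(ok)  # True on this grid, consistent with Lemma 12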

Lemma 13.

Let p>0p>0. Then we have

f(2,p,d,ρ)1LP(2,p,d,ρ).f^{\bullet}(2,p,d,\rho)\geq 1-LP^{\bullet}(2,p,d,\rho).
Proof.

For ease of notation, we use dd^{-} instead of d/ρd/\rho in the proof. Let a=(aij)j[d,d],i[0,j1]a=(a_{ij})_{j\in[d^{-},d],\ i\in[0,j-1]} be a feasible solution to LP(2,p,d,ρ)LP^{\bullet}(2,p,d,\rho). It is sufficient to show that

j=ddi=0j1(1pji)aij1j=ddi=0j1aijpji\prod\limits_{j=d^{-}}^{d}\prod\limits_{i=0}^{j-1}(1-p^{j-i})^{a_{ij}}\geq 1-\sum_{j=d^{-}}^{d}\sum_{i=0}^{j-1}a_{ij}\cdot p^{j-i}

We show, more generally, that for x0,,xk[0,1]x_{0},\ldots,x_{k}\in[0,1] and a0,,ak0a_{0},\ldots,a_{k}\geq 0

i=0k(1xi)ai1i=0kaixi,\prod\limits_{i=0}^{k}(1-x_{i})^{a_{i}}\geq 1-\sum_{i=0}^{k}a_{i}\cdot x_{i},

If 1i=0kaixi01-\sum_{i=0}^{k}a_{i}\cdot x_{i}\leq 0, the inequality is trivially true. So we assume that 1i=0kaixi01-\sum_{i=0}^{k}a_{i}\cdot x_{i}\geq 0, which implies 1aixi01-a_{i}\cdot x_{i}\geq 0 for i{0,,k}i\in\{0,\ldots,k\}. We first show that for a positive integer aa and x[0,1]x\in[0,1] we have

(1x)a1ax(1-x)^{a}\geq 1-ax

To see this, we consider the function g(x)=(1x)a(1ax)g(x)=(1-x)^{a}-(1-ax) over [0,1][0,1]. The derivative of gg is g(x)=aa(1x)a10g^{\prime}(x)=a-a(1-x)^{a-1}\geq 0 for x[0,1]x\in[0,1]. gg is therefore increasing and since g(0)=0g(0)=0, we get that g(x)0g(x)\geq 0 for x[0,1]x\in[0,1]. Therefore, for i{0,,k}i\in\{0,\ldots,k\}

(1xi)ai1aixi0(1-x_{i})^{a_{i}}\geq 1-a_{i}\cdot x_{i}\geq 0

By taking the product we get that

i=0k(1xi)aii=0k(1aixi)\prod\limits_{i=0}^{k}(1-x_{i})^{a_{i}}\geq\prod\limits_{i=0}^{k}(1-a_{i}\cdot x_{i}) (22)

All is left is to show that

i=0k(1aixi)1i=0kaixi\prod\limits_{i=0}^{k}(1-a_{i}\cdot x_{i})\geq 1-\sum_{i=0}^{k}a_{i}\cdot x_{i} (23)

We show (23) by induction on the number of factors. It holds for \ell=0. Suppose (23) holds for some \ell<k, that is,

\prod\limits_{i=0}^{\ell}(1-a_{i}\cdot x_{i})\geq 1-\sum_{i=0}^{\ell}a_{i}\cdot x_{i},

and by multiplying both sides by (1a+1x+1)(1-a_{\ell+1}\cdot x_{\ell+1}), which is nonnegative, we get

i=0+1(1aixi)\displaystyle\prod\limits_{i=0}^{\ell+1}(1-a_{i}\cdot x_{i}) (1a+1x+1)(1i=0aixi)\displaystyle\geq(1-a_{\ell+1}\cdot x_{\ell+1})(1-\sum_{i=0}^{\ell}a_{i}\cdot x_{i})
\displaystyle=1-\sum_{i=0}^{\ell+1}a_{i}\cdot x_{i}+(a_{\ell+1}\cdot x_{\ell+1})\left(\sum_{i=0}^{\ell}a_{i}\cdot x_{i}\right)
1i=0+1aixi\displaystyle\geq 1-\sum_{i=0}^{\ell+1}a_{i}\cdot x_{i}

This concludes the proof of (23). From (22) and (23), we get

i=0k(1xi)ai1i=0kaixi.\prod\limits_{i=0}^{k}(1-x_{i})^{a_{i}}\geq 1-\sum_{i=0}^{k}a_{i}\cdot x_{i}.

Finally, applying this inequality to the values x_{ij}=p^{j-i} and exponents a_{ij} for j\in\{d^{-},\ldots,d\} and i\in\{0,\ldots,j-1\}, we get that

\prod\limits_{j=d^{-}}^{d}\prod\limits_{i=0}^{j-1}(1-p^{j-i})^{a_{ij}}\geq 1-\sum_{j=d^{-}}^{d}\sum_{i=0}^{j-1}a_{ij}\cdot p^{j-i},

which proves the lemma. ∎
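The elementary product inequality above is also easy to check numerically. The snippet below samples x_i in [0,1] and nonnegative integer exponents a_i (matching the integrality used in the per-factor step of the proof) and verifies the bound; the sampling ranges are illustrative only.

import random

random.seed(0)
ok = True
for _ in range(10000):
    k = random.randint(1, 6)
    xs = [random.random() for _ in range(k)]            # x_i in [0, 1]
    exps = [random.randint(0, 4) for _ in range(k)]     # nonnegative integer a_i
    lhs = 1.0
    for x, a in zip(xs, exps):
        lhs *= (1 - x) ** a
    rhs = 1 - sum(a * x for a, x in zip(exps, xs))
    ok = ok and lhs >= rhs - 1e-12
print(ok)  # True: prod_i (1 - x_i)^{a_i} >= 1 - sum_i a_i * x_i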

Claim 5.

For n100n\geq 100 and 2dn2\leq d\leq n, we have

dd1n1/d+1dnn1n1/n+1n1logn2n\frac{d}{d-1}n^{-1/d}+\frac{1}{d}\leq\frac{n}{n-1}n^{-1/n}+\frac{1}{n}\leq 1-\frac{\log n}{2n}
Proof.

We prove the claim in three steps.

  1.

    dd1n1/d+1dnn1n1/n+1n\frac{d}{d-1}n^{-1/d}+\frac{1}{d}\leq\frac{n}{n-1}n^{-1/n}+\frac{1}{n} for n42n\geq 42.

    To see this, we analyze the derivative of

    q(w)=ww1n1w+1w,w[2,).q(w)=\frac{w}{w-1}\cdot n^{-\frac{1}{w}}+\frac{1}{w},\;\;w\in[2,\infty).

    We get

    q(w)\displaystyle q^{\prime}(w) =n1/w(n1/w+2wn1/ww2(1+n1/w)+(w1)wlogn)w2(w1)2\displaystyle=\frac{n^{-1/w}\cdot\left(-n^{1/w}+2wn^{1/w}-w^{2}(1+n^{1/w})+(w-1)w\log n\right)}{w^{2}(w-1)^{2}}
    =n1/w(n1/w(w1)2w2+w(w1)logn)w2(w1)2\displaystyle=\frac{n^{-1/w}\cdot\left(-n^{1/w}\cdot(w-1)^{2}-w^{2}+w(w-1)\log n\right)}{w^{2}(w-1)^{2}}
    n1/w(e(w1)2w2+w(w1)logn)w2(w1)2.\displaystyle\geq\frac{n^{-1/w}\cdot\left(-e\cdot(w-1)^{2}-w^{2}+w(w-1)\log n\right)}{w^{2}(w-1)^{2}}.

    When logne+1\log n\geq e+1, we have w(w1)lognw(w1)(e+1)w2+e(w1)2w(w-1)\log n\geq w(w-1)(e+1)\geq w^{2}+e(w-1)^{2}, and therefore q(w)0q^{\prime}(w)\geq 0 when n42ee+1n\geq 42\geq e^{e+1}.

  2.

    n1/n134lognnn^{-1/n}\leq 1-\frac{3}{4}\frac{\log n}{n} for n10n\geq 10

    To see this, consider the function g(x)=134logxxx1/xg(x)=1-\frac{3}{4}\frac{\log x}{x}-x^{-1/x} over the interval [e,)[e,\infty). We have

    g(x)=1logxx2(x1/x34).g^{\prime}(x)=\frac{1-\log x}{x^{2}}(x^{-1/x}-\frac{3}{4}).

    Since x1/xx^{-1/x} is increasing, 101/10>3/410^{-1/10}>3/4, and 1logx<01-\log x<0 for x10x\geq 10, we have that g(x)<0g^{\prime}(x)<0 for x10x\geq 10. Therefore, gg is decreasing over [10,)[10,\infty). Since limxg(x)=0\lim\limits_{x\rightarrow\infty}g(x)=0, this implies that g(n)0g(n)\geq 0 for n10n\geq 10.

  3.

    nn1(134lognn)+1n1logn2n\frac{n}{n-1}(1-\frac{3}{4}\frac{\log n}{n})+\frac{1}{n}\leq 1-\frac{\log n}{2n} for n100n\geq 100.

    To see this, we use the following sequence of equivalences

    nn1(134lognn)+1n1logn2n\displaystyle\frac{n}{n-1}(1-\frac{3}{4}\frac{\log n}{n})+\frac{1}{n}\leq 1-\frac{\log n}{2n} (1+1n1)(134lognn)+1n1logn2n\displaystyle\Leftrightarrow(1+\frac{1}{n-1})(1-\frac{3}{4}\frac{\log n}{n})+\frac{1}{n}\leq 1-\frac{\log n}{2n}
    \displaystyle\Leftrightarrow\frac{1}{n-1}+\frac{1}{n}-\frac{3}{4}\frac{\log n}{n(n-1)}\leq\frac{\log n}{4n}
    4n+4(n1)3logn(n1)logn\displaystyle\Leftrightarrow 4n+4(n-1)-3\log n\leq(n-1)\log n

    The last inequality is true for n100n\geq 100. ∎
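A numerical spot check of Claim 5, assuming log denotes the natural logarithm as in the rest of the argument; the grid of values of n below is illustrative, not exhaustive.

import math

def lhs(d, n):
    return d / (d - 1) * n ** (-1 / d) + 1 / d

def mid(n):
    return n / (n - 1) * n ** (-1 / n) + 1 / n

ok = True
for n in (100, 200, 500, 1000, 10000):
    ok = ok and mid(n) <= 1 - math.log(n) / (2 * n)
    for d in range(2, n + 1):
        ok = ok and lhs(d, n) <= mid(n) + 1e-12
print(ok)  # True on this grid, consistent with Claim 5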

Lemma 14.

The optimal solution of LP(2,p,d/ρ,1)LP^{\bullet}(2,p,d/\rho,1) is given by

ad/ρ1=min{d/ρd/ρ1,2nd/ρ},a0=2nd/ρad/ρ1,ai=0i{1,2,,d/ρ2}.a_{d/\rho-1}=\min\left\{\frac{d/\rho}{d/\rho-1},\frac{2n}{d/\rho}\right\},\;a_{0}=\frac{2n}{d/\rho}-a_{d/\rho-1},\;a_{i}=0\;\;\forall i\in\{1,2,\ldots,d/\rho-2\}. (24)
Proof.

For ease of notation, we use dd^{-} instead of d/ρd/\rho in the proof. Suppose to the contrary that there exists an optimal solution aa such that ad1<min{d/(d1),2n/d}a_{d^{-}-1}<\min\{d^{-}/(d^{-}-1),2n/d^{-}\}. There must exist i<d1i<d^{-}-1 such that ai>0a_{i}>0: neither constraint is tight if for all i<d1i<d^{-}-1, ai=0a_{i}=0; thus one could always increase some ai,i<d1a_{i},i<d^{-}-1 to increase the objective value. Let ii be the biggest index below d1d^{-}-1 such that ai>0a_{i}>0. For ϵ>0\epsilon>0, consider the following alternative solution:

ad1=ad1+ϵ,ai={aid1iϵi>0aiϵi=0,a0={a0+(d1i1)ϵi>0a0ϵi=0,a^{\prime}_{d^{-}-1}=a_{d^{-}-1}+\epsilon,\;a^{\prime}_{i}=\begin{cases}a_{i}-\frac{{d^{-}-1}}{i}\epsilon&i>0\\ a_{i}-\epsilon&i=0\end{cases},\;a^{\prime}_{0}=\begin{cases}a_{0}+\left(\frac{{d^{-}-1}}{i}-1\right)\epsilon&i>0\\ a_{0}-\epsilon&i=0\end{cases},
ak=akk{0,i,d1}a^{\prime}_{k}=a_{k}\;\;\forall k\notin\{0,i,{d^{-}-1}\}

It can be easily checked that aa^{\prime} is still feasible for small enough ϵ\epsilon. When i>0i>0, the net increase in objective value from aa to aa^{\prime} is

\displaystyle\epsilon\cdot\left(p^{d^{-}-(d^{-}-1)}+\left(\frac{d^{-}-1}{i}-1\right)\cdot p^{d^{-}}-\frac{d^{-}-1}{i}\cdot p^{d^{-}-i}\right). (25)

Now consider the following function

f(x)=1x(pdxpd),x[1,d1].f(x)=\frac{1}{x}(p^{d^{-}-x}-p^{d^{-}}),\;\;x\in[1,{d^{-}-1}].

We claim that f(x)f(x) is strictly increasing when p(0,1)p\in(0,1): taking the derivative of f(x)f(x), we get

f(x)=pdx(px(1+xlogp))x2=pdx(exlogp(1+xlogp))x2>0,f^{\prime}(x)=\frac{p^{d^{-}-x}\cdot(p^{x}-(1+x\log p))}{x^{2}}=\frac{p^{d^{-}-x}\cdot(e^{x\log p}-(1+x\log p))}{x^{2}}>0,

where the inequality follows from the fact that et>1+te^{t}>1+t for all t0t\neq 0, applied with t=xlogp0t=x\log p\neq 0 since p(0,1)p\in(0,1).

Therefore, we have that f(d1)>f(i)f({d^{-}-1})>f(i), which translates to

1d1(pd(d1)pd)>1i(pdipd).\frac{1}{{d^{-}-1}}(p^{d^{-}-{(d^{-}-1)}}-p^{d^{-}})>\frac{1}{i}(p^{d^{-}-i}-p^{d^{-}}). (26)

By rearranging Eq.(26), we can conclude that the net increase given in (25) is strictly positive, thus contradicting the assumption that aa is an optimal solution.

Now consider the case where i=0i=0. The net increase in objective value from aa to aa^{\prime} is

ϵ(pd(d1)pd),\displaystyle\epsilon\cdot\left(p^{d^{-}-{(d^{-}-1)}}-p^{d^{-}}\right), (27)

which is clearly strictly positive if p(0,1)p\in(0,1), contradicting the assumption that aa is an optimal solution. ∎
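The monotonicity of f(x) used above can be checked numerically on an illustrative grid; the names below are ours.

def f(x, p, d_minus):
    # f(x) = (p^{d^- - x} - p^{d^-}) / x, as defined above.
    return (p ** (d_minus - x) - p ** d_minus) / x

increasing = all(
    f(i + 1, p, dm) > f(i, p, dm)
    for p in (0.1, 0.5, 0.9)
    for dm in range(3, 10)
    for i in range(1, dm - 1)
)
print(increasing)  # True: f(d^- - 1) > f(i) for every i in {1, ..., d^- - 2}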

Lemma 15.

Assume d2d\geq 2. When p=(2n)ρdp=(2n)^{-\frac{\rho}{d}} and n100n\geq 100, we have

LP(2,p,d/ρ,1)1logn2n.LP^{\bullet}(2,p,d/\rho,1)\leq 1-\frac{\log n}{2n}.
Proof.

From Lemma 14, we know that

LP(2,p,d/ρ,1)d/ρd/ρ1p+2nd/ρpd/ρ.LP^{\bullet}(2,p,d/\rho,1)\leq\frac{d/\rho}{d/\rho-1}\cdot p+\frac{2n}{d/\rho}\cdot p^{d/\rho}.

Therefore, when p=(2n)ρdp=(2n)^{-\frac{\rho}{d}},

LP(2,p,d/ρ,1)\displaystyle LP^{\bullet}(2,p,d/\rho,1) d/ρd/ρ1(2n)1d/ρ+2nd/ρ(2n)1\displaystyle\leq\frac{d/\rho}{d/\rho-1}\cdot(2n)^{-\frac{1}{d/\rho}}+\frac{2n}{d/\rho}\cdot(2n)^{-1}
d/ρd/ρ1n1d/ρ+1d/ρ,\displaystyle\leq\frac{d/\rho}{d/\rho-1}\cdot n^{-\frac{1}{d/\rho}}+\frac{1}{d/\rho}, (28)

where the last inequality follows because (2n)1d/ρn1d/ρ(2n)^{-\frac{1}{d/\rho}}\leq n^{-\frac{1}{d/\rho}}.

From Claim 5, we have that for d/ρ2d/\rho\geq 2 and n100n\geq 100, (28) is upper bounded by 1logn2n1-\frac{\log n}{2n}. ∎
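For concreteness, the upper bound in the chain above can be evaluated numerically for a few illustrative choices of n, d, and ρ, with p=(2n)^{-\rho/d} as in the lemma and log the natural logarithm.

import math

def lp_upper_bound(n, d, rho):
    # (d^-/(d^- - 1)) * p + (2n/d^-) * p^{d^-}, with p = (2n)^(-rho/d) and d^- = d/rho.
    dm = d / rho
    p = (2 * n) ** (-rho / d)
    return dm / (dm - 1) * p + (2 * n / dm) * p ** dm

for n, d, rho in ((100, 4, 1), (100, 6, 1.5), (1000, 10, 2)):
    print(n, d, rho, lp_upper_bound(n, d, rho) <= 1 - math.log(n) / (2 * n))
# Each line prints True, consistent with Lemma 15.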

B.2 Proof of Theorem 8

See Theorem 8.

Proof.

To show that Algorithm 5 returns the hypergraph with high probability, we will show the following:

  1.

    With high probability, for every edge, there exists at least one sample that contains that edge and no other edges.

  2.

    If a sample contains more than one edge, we will learn at most the intersection of the edges, and this intersection will be discarded in the last for-loop of the algorithm.

The second point above is easy to see: if a sample SiS_{i} contains more than one edge, then QH(Si{v})=0Q_{H}(S_{i}\setminus\{v\})=0 if and only if vv is in the intersection of all edges contained in SiS_{i}. If we can successfully learn any edge in SiS_{i} from another sample SjS_{j} that contains only that edge, then the intersection learned through sample SiS_{i} will be discarded as it is a proper subset of the edge learned using SjS_{j}.
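To make this recovery step concrete, the following is a minimal Python sketch (illustrative names only, not the paper's pseudocode for Algorithm 5) of how a candidate edge set is extracted from a single sample, assuming an edge-detecting oracle query(S) that returns True exactly when S contains an edge of H.

def candidate_from_sample(sample, query):
    """Return the set of vertices of `sample` whose removal destroys every edge
    contained in `sample`, or None if `sample` contains no edge at all."""
    if not query(sample):
        return None
    return frozenset(v for v in sample if not query(sample - {v}))

# Toy illustration with two edges; `query` answers edge-detecting queries.
edges = [frozenset({1, 2, 3}), frozenset({3, 4, 5})]

def query(S):
    return any(e <= S for e in edges)

print(candidate_from_sample(frozenset({1, 2, 3, 6}), query))     # the single edge {1, 2, 3}
print(candidate_from_sample(frozenset({1, 2, 3, 4, 5}), query))  # only the intersection {3}

Each call issues at most |sample| + 1 edge-detecting queries, which matches the O(n) queries per sample counted in the query complexity analysis below.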

For the rest of the proof, we focus on showing the first point. To reiterate, we want to show that every edge of HH appears in at least one sample by itself. Below, we show that this is the case with probability 1o(1)1-o(1).

We first show that with high probability, each edge ee of size d[d/ρ,d]d^{\prime}\in[d/\rho,d] is contained in at least (2n)(Δ1)ρ(log2n)/2(2n)^{(\Delta-1)\rho}(\log^{2}n)/2 samples SiS_{i}: letting XeX_{e} denote the number of samples containing ee, we have

𝔼(Xe)\displaystyle\mathbb{E}(X_{e}) =(2n)Δρlog2npd\displaystyle=(2n)^{\Delta\rho}\log^{2}n\cdot p^{d^{\prime}}
(2n)Δρlog2npd\displaystyle\geq(2n)^{\Delta\rho}\log^{2}n\cdot p^{d}
=(2n)Δρlog2n(2n)ρ\displaystyle=(2n)^{\Delta\rho}\log^{2}n\cdot(2n)^{-\rho}
=(2n)(Δ1)ρlog2n\displaystyle=(2n)^{(\Delta-1)\rho}\log^{2}n

By Chernoff bound, we have

(Xe(2n)(Δ1)ρ(log2n)/2)e(2n)(Δ1)ρ(log2n)/8.\mathbb{P}(X_{e}\leq(2n)^{(\Delta-1)\rho}(\log^{2}n)/2)\leq e^{-(2n)^{(\Delta-1)\rho}(\log^{2}n)/8}.

As there are at most Δnn2\Delta n\leq n^{2} edges of size d[d/ρ,d]d^{\prime}\in[d/\rho,d], by a union bound, we have

P(eE s.t. |e|[d/ρ,d],Xe(2n)(Δ1)ρ(log2n)/2)\displaystyle P(\exists e\in E\text{ s.t. }|e|\in[d/\rho,d],X_{e}\leq(2n)^{(\Delta-1)\rho}(\log^{2}n)/2)
Δne(2n)(Δ1)ρ(log2n)/8=elogΔ+logn(2n)(Δ1)ρ(log2n)/8\displaystyle\leq\Delta ne^{-(2n)^{(\Delta-1)\rho}(\log^{2}n)/8}=e^{\log\Delta+\log n-(2n)^{(\Delta-1)\rho}(\log^{2}n)/8}
elogn((2n)(Δ1)ρ(logn)/82)\displaystyle\leq e^{-\log n((2n)^{(\Delta-1)\rho}(\log n)/8-2)}
elogn((logn)/82)=n(logn)/8+2=o(1).\displaystyle\leq e^{-\log n((\log n)/8-2)}=n^{-(\log n)/8+2}=o(1).

Consequently,

P(eE s.t. |e|[d/ρ,d],Xe(2n)(Δ1)ρ(log2n)/2)1o(1).P(\forall e\in E\text{ s.t. }|e|\in[d/\rho,d],X_{e}\geq(2n)^{(\Delta-1)\rho}(\log^{2}n)/2)\geq 1-o(1).

From now on we condition on the event that all edges ee whose size is between d/ρd/\rho and dd are included in at least (2n)(Δ1)ρ(log2n)/2(2n)^{(\Delta-1)\rho}(\log^{2}n)/2 samples.

From Lemma 1, we have that for all n100n\geq 100,

(eE{e},eSi|eSi)1(logn2n)(Δ1)ρ.\displaystyle\mathbb{P}(\exists e^{\prime}\in E\setminus\{e\},\ e^{\prime}\subseteq S_{i}\ |\ e\subseteq S_{i})\leq 1-(\frac{\log n}{2n})^{(\Delta-1)\rho}.

As each edge ee with size between d/ρd/\rho and dd is contained in at least (2n)(Δ1)ρ(log2n)/2(2n)^{(\Delta-1)\rho}(\log^{2}n)/2 samples, we have that

(Si s.t. eSi,eE{e} s.t. eSi)\displaystyle\mathbb{P}(\forall S_{i}\text{ s.t. }e\in S_{i},\exists e^{\prime}\in E\setminus\{e\}\mbox{ s.t. }e^{\prime}\subseteq S_{i}) (1(logn2n)(Δ1)ρ)(2n)(Δ1)ρ(log2n)/2\displaystyle\leq\left(1-\left(\frac{\log n}{2n}\right)^{(\Delta-1)\rho}\right)^{(2n)^{(\Delta-1)\rho}(\log^{2}n)/2}
e(logn2n)(Δ1)ρ(2n)(Δ1)ρ(log2n)/2\displaystyle\leq e^{-\left(\frac{\log n}{2n}\right)^{(\Delta-1)\rho}(2n)^{(\Delta-1)\rho}(\log^{2}n)/2}
\displaystyle=e^{-(\log^{(\Delta-1)\rho+2}n)/2}=n^{-(\log^{(\Delta-1)\rho+1}n)/2}.

By another union bound on all edges of size between d/ρd/\rho and dd (at most Δn\Delta n of them), we have that

\displaystyle\mathbb{P}(\exists e\text{ s.t. }|e|\in[d/\rho,d],\forall S_{i}\text{ s.t. }e\in S_{i},\exists e^{\prime}\in E\setminus\{e\},\ e^{\prime}\subseteq S_{i})\leq n^{-(\log^{(\Delta-1)\rho+1}n)/2+2}=o(1).

We can thus conclude that with probability at least 1o(1)1-o(1), for all eEe\in E with size between d/ρd/\rho and dd, there exists at least one sample SiS_{i} that contains ee but no other remaining edges.

We now analyze the query complexity of FindLowDegreeEdges. It is clear that FindLowDegreeEdges is non-adaptive because all the queries can be made in parallel. Regarding query complexity, FindLowDegreeEdges constructs (2n)ρΔlog2n(2n)^{\rho\Delta}\log^{2}n samples, and for each one of these samples, makes at most 𝒪(n)\mathcal{O}(n) queries. Therefore, FindLowDegreeEdges makes at most

𝒪(n(2n)ρΔlog2n)=𝒪((2n)ρΔ+1log2n)\mathcal{O}(n(2n)^{\rho\Delta}\log^{2}n)=\mathcal{O}((2n)^{\rho\Delta+1}\log^{2}n)

queries in total. ∎

Appendix C Sequential and Parallel Runtime

The runtime of our proposed algorithms is not much worse than their query complexity. When running sequentially, FindEdgeAdaptive requires an additional O(n)O(n) time to perform each set partition, FindDisjointEdges requires an additional O(n)O(n) time to construct a set of independently sampled vertices, and FindLowDegreeEdges requires an additional O((Δn)2d)O((\Delta n)^{2}d) time to execute the last for-loop that consolidates the candidate edge sets. We summarize both the sequential and parallel runtimes for each algorithm in Table 1 below, which we will include in the next version of the paper. We assume that each query can be made in O(1)O(1) time, and for the parallel runtime analysis, we assume access to polynomially many parallel processors. Some of our algorithms use a subroutine that can be either FindEdgeParallel or FindEdgeAdaptive; we use PRL and ADA to refer to these subroutines, respectively.

Algorithm | Query Complexity | Sequential Runtime | Parallel Runtime
PRL: FindEdgeParallel(S) | O(|S|) | O(|S|) | O(1)
ADA: FindEdgeAdaptive(S,s) | O(s\log(|S|)) | O(s|S|\log(|S|)) | O(\log|S|)
FindDisjointEdges(PRL) | O(n^{\alpha}\log^{2}n+n^{2+\alpha}\log^{2}n) | O(n^{\alpha+1}\log^{2}n) | O(1)
FindDisjointEdges(ADA) | O(n^{\alpha}\log^{2}n+2n^{\alpha}\log^{3}n) | O(n^{\alpha+2}\log^{3}n) | O(\log n)
FindMatching(PRL) | O(n^{3}\log^{3}n) | O(n^{3}\log^{3}n) | O(\log^{2}n)
FindMatching(ADA) | O(n\log^{5}n) | *O(n^{\alpha+2}\log^{6}n) | O(\log^{4}n)
FindLowDegreeEdges | O((2n)^{\rho\Delta+1}\log^{2}n) | O((2n)^{\rho\Delta+1}\log^{2}n+(\Delta n)^{2}d) | O(\log n)
Table 1: Algorithm runtimes (with \alpha=1/(1-1/(2\log n)) in the cell marked *)