This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Testing Graph Properties with the Container Method

Eric Blais University of Waterloo Cameron Seth University of Waterloo
Abstract

We establish nearly optimal sample complexity bounds for testing the ρ\rho-clique property in the dense graph model. Specifically, we show that it is possible to distinguish graphs on nn vertices that have a ρn\rho n-clique from graphs for which at least ϵn2\epsilon n^{2} edges must be added to form a ρn\rho n-clique by sampling and inspecting a random subgraph on only O~(ρ3/ϵ2)\tilde{O}(\rho^{3}/\epsilon^{2}) vertices.

We also establish new sample complexity bounds for ϵ\epsilon-testing kk-colorability. In this case, we show that a sampled subgraph on O~(k/ϵ)\tilde{O}(k/\epsilon) vertices suffices to distinguish kk-colorable graphs from those for which any kk-coloring of the vertices causes at least ϵn2\epsilon n^{2} edges to be monochromatic.

The new bounds for testing the ρ\rho-clique and kk-colorability properties are both obtained via new extensions of the graph container method. This method has been an effective tool for tackling various problems in graph theory and combinatorics. Our results demonstrate that it is also a powerful tool for the analysis of property testing algorithms.

1 Introduction

Is it possible to test if a graph contains a large clique or if it is kk-colorable while examining only a small fraction of that graph? These problems are two of the foundational examples in graph property testing that were originally considered in the groundbreaking work of Goldreich, Goldwasser, and Ron [GGR98]. The problems can be specified formally in the dense graph property testing framework defined as follows.

A (simple undirected) graph GG on nn vertices is ϵ\epsilon-far from a graph property Π\Pi if we need to add or remove at least ϵn2\epsilon n^{2} edges from GG to obtain a graph that does have the property Π\Pi. A canonical ϵ\epsilon-tester for Π\Pi with sample cost ss is a bounded-error111By bounded error we mean there exist absolute constants δ1>δ2\delta_{1}>\delta_{2} such that if GG has property Π\Pi then the algorithm accepts with probability at least δ1,\delta_{1}, and if GG is ϵ\epsilon-far from Π\Pi then the algorithm accepts with probability at most δ2.\delta_{2}. randomized algorithm that samples a set SS of ss vertices of some unknown graph GG, examines the induced subgraph G[S]G[S], and based only on this local view of the graph can distinguish between the case where GG has property Π\Pi and the case where it is ϵ\epsilon-far from Π\Pi. The sample complexity of Π\Pi, denoted 𝒮Π(n,ϵ)\mathcal{S}_{\Pi}(n,\epsilon), is the minimum value ss for which there is a canonical ϵ\epsilon-tester for Π\Pi with sample cost ss.

1.1 Testing Cliques

The ρ\rho-Clique property is the set of all graphs on nn vertices that contain a clique on ρn\rho n vertices. The problem of testing the ρ\rho-Clique property can be considered in both the large clique regime, where ρ\rho is a constant, and in the small clique regime, where ρ:=ρ(n)\rho:=\rho(n) is a function of the number of vertices of the graph. In the small clique regime, it is important to note that the problem is non-trivial only when ϵ<ρ2\epsilon<\rho^{2} is also a function of nn, since we can always add at most (ρn2)<ρ2n2\binom{\rho n}{2}<\rho^{2}n^{2} edges to any graph to form a ρn\rho n-clique.

What is the sample complexity 𝒮ρ-Clique(n,ϵ)\mathcal{S}_{\rho\textsc{-Clique}}(n,\epsilon) for ϵ\epsilon-testing the ρ\rho-Clique property? Goldreich, Goldwasser, and Ron [GGR98] were the first to consider this question and showed that 𝒮ρ-Clique(n,ϵ)=O~(ρ/ϵ4)\mathcal{S}_{\rho\textsc{-Clique}}(n,\epsilon)=\tilde{O}(\rho/\epsilon^{4}). This result has remarkable qualitative implications in both the large clique and the small clique regimes. In the large clique regime, it shows that ϵ\epsilon-testing ρ\rho-cliques requires only inspection of a constant-sized subgraph when ϵ>0\epsilon>0 is an absolute constant. And in the small clique regime, it shows that the sample complexity for ϵ\epsilon-testing ρ\rho-clique property is sublinear in nn for all ρ=ω(n1/7)\rho=\omega(n^{-1/7}) when ϵ=Ω(ρ2)\epsilon=\Omega(\rho^{2})—i.e., when the tester must distinguish graphs with ρn\rho n-cliques from those whose ρn\rho n-subgraphs all have constant density bounded away from 11.

Improved bounds on the sample complexity for testing cliques were obtained by Feige, Langberg, and Schechtman [FLS04]. With a new direct analysis of the canonical tester for testing cliques, they showed that 𝒮ρ-Clique(n,ϵ)=O~(ρ4/ϵ3)\mathcal{S}_{\rho\textsc{-Clique}}(n,\epsilon)=\tilde{O}(\rho^{4}/\epsilon^{3}). And by considering the restricted problem of distinguishing graphs that consist of a single ρn\rho n-clique from those that consist of a single (ρϵρ)n(\rho-\frac{\epsilon}{\rho})n-clique, they showed that 𝒮ρ-Clique(n,ϵ)=Ω~(ρ3/ϵ2)\mathcal{S}_{\rho\textsc{-Clique}}(n,\epsilon)=\tilde{\Omega}(\rho^{3}/\epsilon^{2}).

Our first main result is a new upper bound on the sample complexity of testing the ρ\rho-Clique property that is nearly optimal, since it matches the lower bound of Feige, Langberg, and Schechtman up to polylogarithmic factors.

Theorem 1.

The sample complexity of the ρ\rho-Clique property is 𝒮ρ-Clique(n,ϵ)=O~(ρ3ϵ2)\mathcal{S}_{\rho\textsc{-Clique}}(n,\epsilon)=\tilde{O}(\frac{\rho^{3}}{\epsilon^{2}}).222Here and throughout the article, we use O~()\tilde{O}(\cdot) and Ω~()\tilde{\Omega}(\cdot) notation to hide terms that are polylogarithmic in the argument. See Sections 3 and 4 for the precise formulation of the main theorems.

The qualitative implications of Theorem 1 are most striking in the small clique regime. In this regime, the result has a more natural representation when we reformulate it using notation from the Densest kk-Subgraph (DkkS) problem. (For more details on the DkkS problem, see [Man17] and the references therein.)

Theorem 1’ (Alternative formulation).

For every nn, k:=k(n)<nk:=k(n)<n, and δ:=δ(n)>0\delta:=\delta(n)>0, there is a bounded-error randomized algorithm that distinguishes (i) graphs that contain a kk-clique from (ii) graphs whose kk-subgraphs all contain at most (1δ)k2(1-\delta)k^{2} edges by inspecting the induced subgraph of the input graph GG on a random set SS of only |S|=O(nδ2kln3(nδ2k))|S|=O(\frac{n}{\delta^{2}k}\ln^{3}(\frac{n}{\delta^{2}k})) vertices.

This result implies that for every constant δ>0\delta>0 and every k=ω(ln3n)k=\omega(\ln^{3}n), we can distinguish graphs with a kk-clique from graphs whose kk-subgraphs have density at most 1δ1-\delta by examining only a sublinear portion of the graph. The previous bounds in [GGR98] and [FLS04] showed that sublinear sample complexity is achievable for constant δ\delta when k=ω~(n6/7)k=\tilde{\omega}(n^{6/7}) and k=ω~(n1/2)k=\tilde{\omega}(n^{1/2}), respectively.

This result can also be viewed as a generalization of results of Rácz and Schiffer [RS19] and of Huleihel, Mazumdar, and Pal [HMP21] regarding the sample complexity of the Planted kk-Clique Detection problem. The latter result shows that sampling an induced subgraph on O(nkδlnnk)O(\frac{n}{k\delta}\ln{\frac{n}{k}}) vertices suffices to distinguish (i) random graphs with edge probability (1δ)(1-\delta) from (ii) random graphs with edge probability (1δ)(1-\delta) containing a planted kk-clique. The former result shows a similar bound for the restricted case of δ=1/2.\delta=1/2. When δ>0\delta>0 is a constant, the bound in Theorem 1’ shows that the same sample complexity (up to logarithmic factors) suffices to solve the more general perfect completeness decision version of the Densest kk-Subgraph problem as defined in the theorem statement.

Moreover, Theorem 1’ also gives a nearly optimal upper bound on the query complexity of algorithms that solve the Densest kk-Subgraph problem in the regime where δ>0\delta>0 is constant. In this setting, the algorithm is free to query any vertex pair (instead of selecting a set SS of vertices and querying all the pairs of vertices in SS to determine G[S]G[S] exactly), and is free to adaptively select which pairs of vertices to query based on the results of its earlier queries. The bound in the theorem implies that there is a bounded-error algorithm that distinguishes graphs with kk-cliques from those whose subgraphs have at most (1δ)k2(1-\delta)k^{2} edges and has query complexity at most O(n2δ4k2ln6(nδ2k))O(\frac{n^{2}}{\delta^{4}k^{2}}\ln^{6}(\frac{n}{\delta^{2}k})). However, as shown in [RS19, HMP21], at least Ω(n2δ2k2ln2nk)\Omega(\frac{n^{2}}{\delta^{2}k^{2}}\ln^{2}{\frac{n}{k}}) queries are required to solve the easier kk-Planted Clique problem, even for adaptive algorithms.

1.2 Testing Colorability

The kk-Colorable property is the set of all graphs on nn vertices that are kk-colorable. The case where k=2k=2 corresponds to bipartiteness. For testing bipartiteness, nearly optimal bounds 𝒮2Colorable(n,ϵ)=Θ~(1/ϵ)\mathcal{S}_{2-\textsc{Colorable}}(n,\epsilon)=\tilde{\Theta}(1/\epsilon) [GGR98, AK02] are known. We focus on the case where k3k\geq 3. We again consider two different regimes: the low chromatic number regime where kk is a constant, and the polychromatic regime where k=k(n)k=k(n) is a function of nn. In the polychromatic regime, we note that ϵ\epsilon-testing the kk-Colorable property is non-trivial only when ϵ<1/k\epsilon<1/k (and so also depends on nn) since we can always eliminate at most k(n/k2)n2/kk\binom{n/k}{2}\leq n^{2}/k edges from a graph to make it kk-colorable.

The study of the sample complexity of the kk-Colorable property has a long history that predates the definition of property testing itself. Rödl and Duke [RD85], building on prior work of Bollobás, Erdős, Simonovits, and Szemerédi [BESS78], showed that 𝒮k-Colorable(n,ϵ)\mathcal{S}_{k\textsc{-Colorable}}(n,\epsilon) is a constant independent of nn when both kk and ϵ\epsilon are constant. However, this result relies on the regularity lemma and so the bounds obtained on the sample complexity grow as a tower of height polynomial in 1/ϵ1/\epsilon. Notably, this means that these results give only very limited results in the polychromatic regime since they give a sublinear bound on the sample complexity of k(n)k(n)-colorability only for some k(n)k(n) that are iterated logarithm functions of nn.

Goldreich, Goldwasser, and Ron [GGR98] obtained much better bounds for kk-Colorable. They showed that 𝒮k-Colorable(n,ϵ)=O~(k2/ϵ3)\mathcal{S}_{k\textsc{-Colorable}}(n,\epsilon)=\tilde{O}(k^{2}/\epsilon^{3}) for all k3k\geq 3. This result provides a significant improvement in the testability of colorability in the polychromatic regime: it shows that kk-colorability is testable with a sublinear sample complexity when ϵ=δ/k\epsilon=\delta/k for a constant δ>0\delta>0 whenever k=o(n1/5)k=o(n^{1/5}).

Alon and Krivelevich [AK02] further improved the bounds for testing kk-colorability. They showed that 𝒮k-Colorable(n,ϵ)=O~(k/ϵ2)\mathcal{S}_{k\textsc{-Colorable}}(n,\epsilon)=\tilde{O}(k/\epsilon^{2}) for all k3k\geq 3. This result shows that kk-colorability can be testable with a sublinear sample complexity for all k=o(n1/3)k=o(n^{1/3}). Sohler [Soh12] obtained another improvement in the low chromatic number regime, showing that 𝒮k-Colorable(n,ϵ)=O~(k6/ϵ)\mathcal{S}_{k\textsc{-Colorable}}(n,\epsilon)=\tilde{O}(k^{6}/\epsilon) and, in particular, when kk is constant then the sample complexity for testing kk-colorability is Θ~(1/ϵ)\tilde{\Theta}(1/\epsilon). But in the polychromatic setting, the bound of Alon and Krivelevich remains stronger whenever k=ω(1)k=\omega(1) and ϵ=ω(1/k5)\epsilon=\omega(1/k^{5}).

Our next main result unifies and improves on both of the incomparable results of Alon and Krivelevich [AK02] and Sohler [Soh12].

Theorem 2.

The kk-Colorable property has sample complexity 𝒮k-Colorable(n,ϵ)=O~(kϵ)\mathcal{S}_{k\textsc{-Colorable}}(n,\epsilon)=\tilde{O}(\frac{k}{\epsilon}).

Theorem 2 implies that sublinear sample complexity suffices to distinguish kk-colorable graphs from graphs whose kk-colorings all create at least δ/k\delta/k monochromatic edges for all values of k=o(n)k=o(\sqrt{n}) when δ>0\delta>0 is constant.

We do not know whether the bound in Theorem 2 is optimal. The best lower bound for testing kk-colorability in the polychromatic regime is Ω~(1/ϵ)\tilde{\Omega}(1/\epsilon) [AK02]. Since ϵ\epsilon-testing kk-colorability is non-trivial only when ϵ<1/k\epsilon<1/k, this means that there is a quadratic gap between our new upper bound and this lower bound for testing kk-colorability for large values of kk.

1.3 The Graph Container Method

Theorems 1 and 2 are both established by analyzing the natural tester that checks whether the sampled induced subgraph G[S]G[S] has the property that we are testing or not. In the case of large cliques, this tester accepts with probability at least 1/2,333See the Remark at the end of Section 3.3 for a discussion related to this choice of acceptance probability. and in the case of kk-colorability, the tester always accepts when the graph has the property. The challenge in the analysis, as is usually the case in property testing, is in showing that the tester rejects all graphs that are far from having the property with sufficiently large probability. We address this challenge with the use of the graph container method.

We first note that the graph container method is stated in terms of independent sets. In the case of testing ρ\rho-cliques, our application of the graph container method applies more naturally to testing ρ\rho-independent sets, which is equivalent by considering the complement of a graph.

Very briefly, the graph container method builds upon the observation that though a graph may contain a large number of independent sets, for every graph GG we can find a much smaller collection of sets of vertices that we call containers such that (i) every independent set is a subset of at least one of the containers, (ii) the containers are small, and (iii) for each container CC, the induced subgraph G[C]G[C] is sparse. The graph container method was originally introduced by Kleitman and Winston [KW82] to bound the number of square-free graphs. The method has since been used to great effect, notably by Sapozhenko (see [Sap05] and the references therein), and was later extended to the hypergraph container method [BMS15, ST15] that has seen even more wide use throughout combinatorics. The container method has also been recently applied to algorithmic settings for the problem of approximately counting independent sets in bipartite graphs [JPP23] and the problem of obtaining faster exact algorithms for almost-regular graphs [Zam23].

Our use of the graph container method builds on Kleitman and Winston’s original approach to it. The key component of the method as they introduced it is a simple greedy algorithm for identifying a small set of vertices of an independent set that form its fingerprint. The fingerprint of an independent set II in turn uniquely defines the container that contains II. Our main technical lemmas provide tight bounds on the size of the containers as a function of the size of the corresponding fingerprints. We describe the algorithm and present the technical lemmas in Section 2.

The tight bounds on the size of containers can then be used to analyze the soundness of testers for ρ\rho-independent sets in the following way. Let SS be the set of ss vertices sampled by the tester. Any independent set SS has a (very small) fingerprint that defines a container with bounded size; we can use this bounded size to show that the probability that more than ρs\rho s vertices (including the fingerprint vertices) in SS are drawn from that container is extremely small. So small, in fact, that we can apply a union bound over all potential fingerprints in SS to bound the overall probability that the sample includes a large independent set. For all the details of the proof, see Section 3.

For testing kk-colorability, the overall approach is the same, but a different argument is required to show that for any kk independent sets in the sample, we can identify kk fingerprints whose corresponding containers have a small union. See Section 4 for the details.

1.4 Discussion

Theorems 1 and 2 suggest a number of intriguing open problems. We discuss three of them briefly.

Property testing with the container method.

Our results demonstrate that the graph container method provides a useful tool to study the problems of testing cliques and colorability. In subsequent work, we show that the graph container method can be used to study all 0–1 graph partition properties, as defined by Nakar and Ron [NR18]. It would be interesting to see if the method is useful for testing other classes of graph properties as well. For example, the class of 0–1 graph partition properties does not include the property of containing one, or multiple, large cliques, and so we suspect there is a larger class of graph partition properties that the container method can be used on.

We would be remiss if we didn’t also discuss the hypergraph container method. In the subsequent work mentioned above, we have also been able to show that the hypergraph container method can also be used to provide tight analysis of testers for various partition properties of hypergraphs. However, one of the most exciting aspects of the hypergraph container method in combinatorics is that it has led to a number of results to problems in combinatorics that at first glance do not appear to relate directly to hypergraphs. Could the hypergraph container method similarly yield a new analysis technique for obtaining new property testing results in other domains as well?

Query complexity.

What exactly is the relation between the query complexity of (adaptive or non-adaptive) testers that are free to query any potential edge in a graph and the query complexity of the canonical tester that samples a set of vertices and queries all potential edges within the corresponding induced subgraph? This question has been studied extensively in the dense graph setting (see, e.g., [AFKS00, GT03, BT04, BL10, GR10, GR11, GW21] and discussions in [Gol17, BY22]) but remains open for many natural properties of graphs.

As discussed above, Theorem 1 gives a partial answer to this question for the case of testing large cliques, since it shows that the number of edge queries performed by the canonical ϵ\epsilon-tester for the ρ\rho-Clique property is (up to logarithmic factors) identical to the query complexity of the best adaptive testing algorithm for the same problem when ϵ=Ω(ρ2)\epsilon=\Omega(\rho^{2}). But can adaptivity improve query complexity when ϵ\epsilon is smaller? In particular, is it possible to 1n\frac{1}{\sqrt{n}}-test the 12\frac{1}{2}-Clique property with o(n2)o(n^{2}) edge queries?

For kk-colorability, the analogous question has already been posed even in the setting where k=2k=2: Bogdanov and Li [BL10] conjectured that it is possible to 1n\frac{1}{n}-test 22-colorability (that is, bipartiteness) with o(n2)o(n^{2}) queries. Theorem 2 does not shed light on this conjecture (or its analogues for larger values of kk) but it is possible that Lemma 6 or a similar variant could be helpful in this conjecture and other related ones.

Time complexity.

Theorem 1’ only bounds the sample complexity of the Densest kk-Subgraph problem, but the proof of this bound also immediately implies fairly strong results on the time complexity of the problem. Namely, it shows that distinguishing graphs with kk-cliques from those whose kk-subgraphs all have density at most (1δ)(1-\delta) for some constant δ>0\delta>0 can be done by sampling s=O(nδ2kln3(nδ2k))s=O(\frac{n}{\delta^{2}k}\ln^{3}(\frac{n}{\delta^{2}k})) vertices and checking whether the induced subgraph on those vertices contains a clique of size (k/n)s=O(δ2ln3(nδ2k))(k/n)s=O(\delta^{-2}\ln^{3}(\frac{n}{\delta^{2}k})). Even by using exhaustive search for the latter task, the resulting time complexity is quasipolynomial in nn—or, more precisely, it is nO(ln3n)n^{O(\ln^{3}n)}. Stronger time complexity bounds already exist for this problem [FS97], but it would be interesting to see if the graph container method could be used to directly improve the time complexity bounds for various property testing problems and related approximation problems.

2 The Container Method

The notions of fingerprints and containers of independent sets in graphs are defined according to the procedure defined in Algorithm 1, which is a slight variant of the greedy algorithm that was (implicitly) originally introduced by Kleitman and Winston [KW82]. The algorithm works as follows. The first fingerprint of an independent set II is simply the vertex vv contained in II with highest degree in G.G. Since II is an independent set, any neighbour of vv must not be contained in II and, further, by the fact that vv has the highest degree in GG of all vertices in I,I, any vertex with higher degree in GG cannot be in I.I. The associated container is constructed by removing all vertices from these two collections. The procedure is repeated until every vertex from II has been selected to the fingerprint.

Input: A graph G=(V,E)G=(V,E) and an independent set IVI\subseteq V
1 Initialize F0F_{0}\leftarrow\emptyset and C0VC_{0}\leftarrow V;
2
3for t=1,2,,|I|t=1,2,\dots,|I| do
4       vtv_{t}\leftarrow the vertex in IFt1I\setminus F_{t-1} with largest degree in G[Ct1]G[C_{t-1}];
5      
      // Add vtv_{t} to the fingerprint
6       FtFt1{vt}F_{t}\leftarrow F_{t-1}\cup\{v_{t}\};
7      
      // Remove all the neighbours of vtv_{t} from the container
8       CtCt1{wCt1:(v,w)E}C_{t}\leftarrow C_{t-1}\setminus\big{\{}w\in C_{t-1}:(v,w)\in E\big{\}};
9      
      // And remove all vertices with higher degree than vtv_{t} in G[Ct1]G[C_{t-1}]
10       CtCt{wCt1:degG[Ct1](w)>degG[Ct1](vt)}C_{t}\leftarrow C_{t}\setminus\big{\{}w\in C_{t-1}:\deg_{G[C_{t-1}]}(w)>\deg_{G[C_{t-1}]}(v_{t})\big{\}};
11      
12Return F1,,F|I|F_{1},\ldots,F_{|I|} and C1,,C|I|C_{1},\ldots,C_{|I|};
Algorithm 1 Fingerprint & Container Generator

The greedy algorithm for generating fingerprints and containers was originally defined with an appropriate stopping condition so that each independent set generates a single fingerprint and container. For our proofs, however, it will be more convenient to work directly with the sequence of fingerprints F1,F2,,F|I|F_{1},F_{2},\ldots,F_{|I|} and their corresponding containers C1,C2,,C|I|C_{1},C_{2},\ldots,C_{|I|} generated by the algorithm for the independent set II. Further, it is also convenient to extend the definition with Ct=Ft=IC_{t}=F_{t}=I for all t>|I|.t>|I|. We refer to FtF_{t} and CtC_{t} as the tt-th fingerprint and tt-th container of II, respectively.

When we consider the fingerprints or containers of multiple independent sets, we write Ft(I)F_{t}(I) and Ct(I)C_{t}(I) to denote the tt-th fingerprint and container of the independent set II. (When the current context specifies a single independent set, we will continue to write only FtF_{t} and CtC_{t} to keep the notation lighter.)

2.1 Basic Properties of Containers

By construction, the sequence of fingerprints and containers of an independent set satisfy a number of useful properties. For instance, we have

F1F2F|I|=IC|I|C|I|1C2C1F_{1}\subseteq F_{2}\subseteq\cdots\subseteq F_{|I|}=I\subseteq C_{|I|}\subseteq C_{|I|-1}\subseteq\cdots\subseteq C_{2}\subseteq C_{1} (1)

so that for each t=1,2,,|I|t=1,2,\ldots,|I|, FtICtF_{t}\subseteq I\subseteq C_{t}.

The construction of the algorithm also guarantees that the tt-th container of an independent set is the same as the tt-th container of its tt-th fingerprint.

Proposition 3.

For any graph G=(V,E)G=(V,E), any independent set II in GG, and any t,t, the fingerprint Ft(I)F_{t}(I) and container Ct(I)C_{t}(I) of II satisfy

Ct(Ft(I))=Ct(I).C_{t}\big{(}F_{t}(I)\big{)}=C_{t}(I).
Proof.

If t>|I|t>|I| then Ft(I)=I=Ct(I)F_{t}(I)=I=C_{t}(I) and the equality holds. Now consider the case that t|I|.t\leq|I|. If v1,,vtv_{1},\ldots,v_{t} are the vertices selected in the first tt rounds of the greedy algorithm on input II, then the algorithm selects the same vertices and forms the same container CtC_{t} when provided with input Ft(I)F_{t}(I) instead of II. ∎

Another basic property of the containers of any independent set is that we can bound their maximum degree in the following way.

Proposition 4.

For any graph G=(V,E)G=(V,E) on |V|=n|V|=n vertices, any independent set II in GG, and any t1,t\geq 1, the tt-th container of II has maximum degree bounded by Δ(G[Ct])n/t\Delta(G[C_{t}])\leq n/t.

Proof.

If t>|I|,t>|I|, then Ct=I,C_{t}=I, and so the maximum degree is 0.0. Now, consider any t|I|t\leq|I|. Since the containers satisfy the containment V=C0C1CtV=C_{0}\supseteq C_{1}\supseteq\dots\supseteq C_{t},

n|C0||Ct|=i=1t(|Ci1||Ci|)=i=1t|Ci1Ci|n\geq|C_{0}|-|C_{t}|=\sum_{i=1}^{t}\big{(}|C_{i-1}|-|C_{i}|\big{)}=\sum_{i=1}^{t}|C_{i-1}\setminus C_{i}|

and there must be an index j[t]j\in[t] for which |Cj1Cj|n/t|C_{j-1}\setminus C_{j}|\leq n/t. This means that the vertex vjv_{j} chosen in the jjth round of the greedy algorithm must have degree at most n/tn/t in the induced subgraph G[Cj1]G[C_{j-1}]. All vertices that had higher degree in the same graph were excluded from CjC_{j}, so all vertices have degree at most n/tn/t in G[Cj]G[C_{j}]. Since CtCjC_{t}\subseteq C_{j}, the maximum degree of G[Ct]G[C_{t}] is bounded above by n/tn/t as well. ∎

2.2 Graph Container Lemmas

In order to use containers in our proofs, we need stronger structural results about containers when the graph is ϵ\epsilon-far from the properties of interest.

In the case of testing the ρ\rho-Clique property, we first change our focus so that we instead look at graphs that are ϵ\epsilon-far from having the ρ\rho-IndepSet property. We show that for each such graph GG, every independent set in GG is a subset of a container of size strictly less than ρn\rho n. In fact, our result shows more: there is a container for II of size (ρα)n(\rho-\alpha)n where α\alpha is at least as large as ϵ/ρ\epsilon/\rho times the length of the corresponding fingerprint (up to log factors).

Lemma 5 (Graph Container Lemma I).

Let G=(V,E)G=(V,E) be a graph on nn vertices that is ϵ\epsilon-far from the ρ\rho-IndepSet property. For any independent set II in GG there exists an index t8ρ2ϵln(2ρϵ)t\leq\frac{8\rho^{2}}{\epsilon}\ln(\frac{2\rho}{\epsilon}) such that the size of the tt-th container of II is bounded by

|Ct|(ρtϵ8ρln(2ρϵ))n|C_{t}|\leq\left(\rho-t\cdot\frac{\epsilon}{8\rho\ln(\frac{2\rho}{\epsilon})}\right)n (2)

and G[Ct]G[C_{t}] contains at most ϵn2\epsilon n^{2} edges.

Graph container lemmas like the one above are often used to bound the size of a collection of containers that cover all independent sets. Indeed, Lemma 5 also immediately yields such a bound, showing that for every graph GG that is ϵ\epsilon-far from the ρ\rho-IndepSet property, there is a collection of nO~(ρ2/ϵ)n^{\tilde{O}(\rho^{2}/\epsilon)} containers such that every independent set of GG is a subset of at least one container. However, for our intended application in the proof of Theorem 1, the size of the collection of containers is not relevant; what we need is the bound on the size of the containers in (2) and, in particular, the trade-off in that bound between the size of the container and its fingerprint.

For testing the kk-Colorable property, we need a different type of Graph Container lemma which bounds the size of the union of kk containers for different independent sets in a graph.

Lemma 6 (Graph Container Lemma II).

Let GG be ϵ\epsilon-far from kk-colorable, and let I1,,IkI_{1},\dots,I_{k} be independent sets in G.G. Then, there exists t4ϵt\leq\frac{4}{\epsilon} such that

|i=1kCt(Ii)|(1tϵ4ln(1ϵ))n.\left|\cup_{i=1}^{k}C_{t}(I_{i})\right|\leq\left(1-t\cdot\frac{\epsilon}{4\ln(\frac{1}{\epsilon})}\right)n. (3)

Once again, the upper bound on the size of the union of containers is what we need for the proof of Theorem 2, and this result is possible only because the bound in (3) decreases linearly with the size of the corresponding fingerprints.

We present the proof of Lemma 5 in Section 3 and the proof of Lemma 6 in Section 4.

3 Testing Cliques and Independent Sets

We complete the proof of Theorem 1 in this section. The proof we describe applies to both the ρ\rho-Clique and the ρ\rho-IndepSet properties (by considering the complement of the tested graph) but the latter fits in more naturally with the graph container method, so in the rest of the section we discuss the result entirely in terms of independent sets.

3.1 Container Shrinking Lemma

The main tool in the proof of the Graph Container Lemma for testing independent sets, Lemma 5, is a shrinking lemma that, informally speaking, says that many vertices are excluded in each round of Algorithm 1 whenever the containers contain a large subset CC for which the induced subgraph G[C]G[C] is sparse.

Lemma 7 (Container Shrinking Lemma).

Let G=(V,E)G=(V,E) be a graph on nn vertices that is ϵ\epsilon-far from the ρ\rho-IndepSet property, let II be an independent set in GG, and let t<|I|t<|I| be an index for which the tt-th container CtC_{t} of II has cardinality |Ct|ρn|C_{t}|\geq\rho n. For any α>0\alpha>0, if there exists a subset CCt+1C\subseteq C_{t+1} of (ρα)n(\rho-\alpha)n vertices where G[C]G[C] contains at most ϵ4n2\frac{\epsilon}{4}n^{2} edges, then

|Ct+1C|(1ϵ4ρα)|CtC|.\left|C_{t+1}\setminus C\right|\leq\left(1-\frac{\epsilon}{4\rho\alpha}\right)\left|C_{t}\setminus C\right|.
Proof.

If CC contains a vertex vv with degree at least ϵ4ρα|CtC|\frac{\epsilon}{4\rho\alpha}|C_{t}\setminus C| in the graph G[Ct|G[C_{t}| then at least this number of vertices are eliminated since the vertex vt+1v_{t+1} selected by the greedy algorithm has degree at least as large as that of vv, otherwise vv would not have been included in Ct+1C_{t+1} (and so therefore would not be in CC).

Consider now the setting where all the vertices in CC have degree at most ϵ4ρα|CtC|\frac{\epsilon}{4\rho\alpha}|C_{t}\setminus C|. We claim that G[CtC]G[C_{t}\setminus C] contains at least ϵ4ρα|CtC|2\frac{\epsilon}{4\rho\alpha}|C_{t}\setminus C|^{2} edges. This claim suffices to complete the proof of the lemma, since it implies that at least ϵ4ρα|CtC|\frac{\epsilon}{4\rho\alpha}|C_{t}\setminus C| vertices in CtCC_{t}\setminus C have degree at least ϵ4ρα|CtC|\frac{\epsilon}{4\rho\alpha}|C_{t}\setminus C| in G[CtC]G[C_{t}\setminus C], and thus that at least this many vertices from CtCC_{t}\setminus C are excluded from Ct+1C_{t+1} by Algorithm 1.

We now complete the proof of the claim that G[CtC]G[C_{t}\setminus C] contains at least ϵ4ρα|CtC|2\frac{\epsilon}{4\rho\alpha}|C_{t}\setminus C|^{2} edges.

Let RR be chosen uniformly at random from all subsets of CtCC_{t}\setminus C of size αn\alpha n. Since GG is ϵ\epsilon-far from having independent sets of size ρn\rho n, the induced subgraph G[RC]G[R\cup C] contains at least ϵn2\epsilon n^{2} edges. By the lemma hypothesis, G[C]G[C] contains at most ϵ4n2\frac{\epsilon}{4}n^{2} edges. Finally, each edge connecting a vertex in CC to one in CtCC_{t}\setminus C is included in G[RC]G[R\cup C] with probability |R|/|CtC||R|/|C_{t}\setminus C|. So by the maximum degree of the vertices in CC in the graph G[Ct]G[C_{t}], the expected number of edges between CC and RR is at most ϵ4α|R|n=ϵ4n2\frac{\epsilon}{4\alpha}|R|n=\frac{\epsilon}{4}n^{2}. Therefore, the expected number of edges in G[R]G[R] is at least

ϵ2n2=ϵ2α2|R|2ϵα2(|R|2).\frac{\epsilon}{2}n^{2}=\frac{\epsilon}{2\alpha^{2}}|R|^{2}\geq\frac{\epsilon}{\alpha^{2}}\binom{|R|}{2}.

In other words, the random graph RR chosen uniformly at random from the subgraphs of CtCC_{t}\setminus C of cardinality αn\alpha n has expected density at least ϵα2\frac{\epsilon}{\alpha^{2}}. This implies that CtCC_{t}\setminus C also has density at least ϵα2\frac{\epsilon}{\alpha^{2}}. So it contains at least ϵα2(|CtC|2)ϵρα(|CtC|2)ϵ4ρα|CtC|2\frac{\epsilon}{\alpha^{2}}\binom{|C_{t}\setminus C|}{2}\geq\frac{\epsilon}{\rho\alpha}\binom{|C_{t}\setminus C|}{2}\geq\frac{\epsilon}{4\rho\alpha}|C_{t}\setminus C|^{2} edges, as we wanted to show. ∎

3.2 Proof of Graph Container Lemma I

We are now ready to complete the proof of the Graph Container Lemma for testing independent sets, restated below.

See 5

Proof.

Let CC denote the first container in the sequence C1,C2,C_{1},C_{2},\ldots that contains at most ϵ4n2\frac{\epsilon}{4}n^{2} edges. The existence of CC is guaranteed because C|I|+1C_{|I|+1} is an independent set and has no edges. Define α\alpha such that |C|=(ρα)n|C|=(\rho-\alpha)n. The value of α\alpha is bounded by ϵ2ραρ\frac{\epsilon}{2\rho}\leq\alpha\leq\rho, where the lower bound on α\alpha is due to the fact that GG is ϵ\epsilon-far from the ρ\rho-IndepSet property.

Define tt^{*} to be the largest index for which |Ct|ρn|C_{t^{*}}|\geq\rho n. First observe that t|I|t^{*}\leq|I| because otherwise CtC_{t^{*}} is an independent set, which contradicts the fact that GG is ϵ\epsilon-far from the ρ\rho-IndepSet property. Hence, by Lemma 7 applied to all values of t=0,1,2,,t1t=0,1,2,\ldots,t^{*}-1,

|CtC|(1ϵ4ρα)tn|C_{t^{*}}\setminus C|\leq\left(1-\frac{\epsilon}{4\rho\alpha}\right)^{t^{*}}n

and since |CtC|αn|C_{t^{*}}\setminus C|\geq\alpha n, we conclude that t4ραϵln(1/α)4ραϵln(2ρ/ϵ)t^{*}\leq\frac{4\rho\alpha}{\epsilon}\ln(1/\alpha)\leq\frac{4\rho\alpha}{\epsilon}\ln(2\rho/\epsilon).

Consider now values t>tt>t^{*}. When G[Ct]G[C_{t}] contains more than ϵ4n2\frac{\epsilon}{4}n^{2} edges, then the vertices in CtC_{t} have average degree at least ϵ4ρn\frac{\epsilon}{4\rho}n in this graph. This means that at least ϵ4ρn\frac{\epsilon}{4\rho}n vertices in CtC_{t} have degree at least ϵ4ρn\frac{\epsilon}{4\rho}n in G[Ct]G[C_{t}], so at least ϵ4ρn\frac{\epsilon}{4\rho}n vertices are removed in the tt-th iteration of Algorithm 1. Since |Ct+1C|αn|C_{t^{*}+1}\setminus C|\leq\alpha n, there can be at most 4ραϵ\frac{4\rho\alpha}{\epsilon} such iterations before we reach CC. Therefore, the container CC satisfies the conclusion of the lemma. ∎

3.3 Proof of Theorem 1

We are now ready to use Lemma 5 to complete the proof of Theorem 1. The proof of the theorem also makes use of the following form of Chernoff’s bound for hypergeometric distributions.

Lemma 8 (Chernoff’s Bound).

Let XX be drawn from the hypergeometric distribution H(N,K,n)H(N,K,n) where nn elements are drawn without replacement from a population of size NN where KK elements are marked and XX represents the number of marked elements that were drawn. Then for any ϑE[X]\vartheta\geq E[X],

Pr[Xϑ]exp((ϑE[X])2ϑ+E[X]).\Pr\big{[}X\geq\vartheta\big{]}\leq\exp\left(-\frac{(\vartheta-E[X])^{2}}{\vartheta+E[X]}\right).
Proof.

A standard multiplicative form of Chernoff’s bound states that for all δ>0\delta>0, the random variable XX satisfies Pr[X(1+δ)E[X]]exp(δ2E[X]2+δ)\Pr\big{[}X\geq(1+\delta)E[X]\big{]}\leq\exp\left(-\frac{\delta^{2}E[X]}{2+\delta}\right). (This standard bound is usually stated for sums of independent variables, but the same bound also applies to hypergeometric random variables as well. See, e.g., [Mul18]). Using the identity ϑ=(1+ϑE[X]E[X])E[X]\vartheta=\left(1+\frac{\vartheta-E[X]}{E[X]}\right)E[X], we can apply this inequality with the parameter δ=ϑE[X]E[X]\delta=\frac{\vartheta-E[X]}{E[X]} and simplify. ∎

We are now ready to prove Theorem 1, restated below in its precise form (with the polylogarithmic term) and for the ρ\rho-IndepSet property.

Theorem 1 (Precise formulation).

The sample complexity of the ρ\rho-IndepSet property is

𝒮ρ-IndepSet(n,ϵ)=O(ρ3ϵ2ln3(1ϵ)).\mathcal{S}_{\rho\textsc{-IndepSet}}(n,\epsilon)=O\left(\frac{\rho^{3}}{\epsilon^{2}}\ln^{3}\left(\frac{1}{\epsilon}\right)\right).
Proof.

Let SS be a random set of s=cρ3ϵ2ln3(1ϵ)s=c\cdot\frac{\rho^{3}}{\epsilon^{2}}\ln^{3}\left(\frac{1}{\epsilon}\right) vertices drawn uniformly at random from VV without replacement, where cc is a large enough constant. Note that for clarity of presentation, in the rest of the proof we ignore all integer rounding issues as they do not affect the asymptotics of the final result.

If GG contains a ρn\rho n independent set, then SS contains at least ρs\rho s vertices from this independent set with probability at least 12\frac{1}{2}, since the number of such vertices follows a hypergeometric distribution, and the median of this distribution is at least ρs\rho s [Neu70, KB80].

In the remainder of the proof we upper bound the probability that SS contains a ρs\rho s-independent set when GG is ϵ\epsilon-far from containing a ρn\rho n-independent set. For the rest of the argument, let us call a container CtC_{t} small when its cardinality is bounded by |Ct|(ρtϵ8ρln(2ρ/ϵ))n|C_{t}|\leq\left(\rho-t\cdot\frac{\epsilon}{8\rho\ln(2\rho/\epsilon)}\right)n, the expression (2) in the conclusion of Lemma 5.

Let us denote the vertices of the sampled set SS as u1,u2,,usu_{1},u_{2},\ldots,u_{s}. First observe that, by Lemma 5, for every independent set II of size ρs\rho s in SS, there is a t8ρ2ln(2ρϵ)ϵt\leq\frac{8\rho^{2}\ln(\frac{2\rho}{\epsilon})}{\epsilon} and a fingerprint FtIF_{t}\subseteq I of size tt that defines a small container Ct(Ft)C_{t}(F_{t}) such that ICt(Ft).I\subseteq C_{t}(F_{t}). Hence, the probability that SS contains an independent set of size at least ρs\rho s is at most the probability that there exists some t8ρ2ln(2ρϵ)ϵt\leq\frac{8\rho^{2}\ln(\frac{2\rho}{\epsilon})}{\epsilon} such that SS contains a set of tt vertices that form the fingerprint FtF_{t} of a small container Ct,C_{t}, and SS contains at least ρst\rho s-t other vertices from Ct.C_{t}. We now upper bound this probability.

For any tt distinct indices i1,i2,,it[s]i_{1},i_{2},\ldots,i_{t}\in[s], consider the event where the vertices ui1,,uitu_{i_{1}},\ldots,u_{i_{t}} form the fingerprint FtF_{t} of a small container CtC_{t}. Let XX denote the number of vertices among the other sts-t sampled vertices that belong to CtC_{t}. By the bound on the size of small containers, the expected value of XX is

E[X](ρtϵ8ρln(2ρϵ))(st)<ρstϵs8ρln(2ρϵ)ρsttϵs16ρln(2ρϵ),\operatorname{\mathrm{E}}[X]\leq\left(\rho-\frac{t\epsilon}{8\rho\ln(\frac{2\rho}{\epsilon})}\right)(s-t)<\rho s-\frac{t\epsilon s}{8\rho\ln(\frac{2\rho}{\epsilon})}\leq\rho s-t-\frac{t\epsilon s}{16\rho\ln(\frac{2\rho}{\epsilon})},

where the last inequality uses the fact that s=cρ3ϵ2ln3(1ϵ)s=c\cdot\frac{\rho^{3}}{\epsilon^{2}}\ln^{3}(\frac{1}{\epsilon}) for a large enough constant cc and that the problem is non-trivial only when ϵ<ρ2.\epsilon<\rho^{2}.

By the Chernoff bound in Lemma 8, the probability that we draw at least ρst\rho s-t vertices from CtC_{t} in the final sts-t vertices drawn to form SS is

Pr[Xρst]exp((ρstE[X])2ρst+E[X])exp(t2ϵ2s512ρ3ln2(2ρϵ)).\Pr[X\geq\rho s-t]\leq\exp\left(\frac{-(\rho s-t-E[X])^{2}}{\rho s-t+E[X]}\right)\leq\exp\left(-\frac{t^{2}\epsilon^{2}s}{512\rho^{3}\ln^{2}(\frac{2\rho}{\epsilon})}\right).

Therefore, by applying a union bound over all values t8ρ2ln(2ρϵ)ϵt\leq\frac{8\rho^{2}\ln(\frac{2\rho}{\epsilon})}{\epsilon} and all possible choices of tt indices from [s][s], the probability that any set of at most 8ρ2ln(2ρϵ)ϵ\frac{8\rho^{2}\ln(\frac{2\rho}{\epsilon})}{\epsilon} vertices in SS form the fingerprint of a small container from which we sample at least ρs\rho s vertices in SS is at most

t(st)exp(t2ϵ2s512ρ3ln2(2ρϵ))texp(tlnst2ϵ2s512ρ3ln2(2ρϵ))<13\sum_{t}\binom{s}{t}\exp\left(-\frac{t^{2}\epsilon^{2}s}{512\rho^{3}\ln^{2}(\frac{2\rho}{\epsilon})}\right)\leq\sum_{t}\exp\left(t\ln s-\frac{t^{2}\epsilon^{2}s}{512\rho^{3}\ln^{2}(\frac{2\rho}{\epsilon})}\right)<\frac{1}{3}

where the first inequality uses the upper bound (st)st,\binom{s}{t}\leq s^{t}, and the last inequality again uses the fact that s=cρ3ϵ2ln3(1ϵ)s=c\cdot\frac{\rho^{3}}{\epsilon^{2}}\ln^{3}(\frac{1}{\epsilon}) for a large enough constant cc. Hence, the probability that SS contains an independent set of size at least ρs\rho s is less than 1/3.1/3.

Remark.

The completeness guarantee in the proof of Theorem 1 states that the algorithm correctly accepts graphs that have a ρn\rho n-independent set with probability at least 12\frac{1}{2}, instead of the usual 23\frac{2}{3} bound typically used in property testing. Since the soundness guarantee is that the algorithm accepts graphs that are ϵ\epsilon-far from the ρ\rho-IndepSet property with probability at most 13\frac{1}{3}, these guarantees still enable us to apply standard error reduction techniques to obtain algorithms that err with probability at most γ\gamma with a ln(1/γ)\ln(1/\gamma) multiplicative factor. In particular, it shows that there is also a standard tester with completeness guarantee 23\frac{2}{3} with the same asymptotic sample complexity.

Alternatively, it is also possible to obtain the 23\frac{2}{3} completeness guarantee directly by changing the algorithm slightly to check if the induced subgraph has the (ρτ)(\rho-\tau)-IndepSet property for some appropriate gap parameter τ:=τ(ρ,ϵ)\tau:=\tau(\rho,\epsilon).

4 Testing k-Colorability

In this section we prove Theorem 2. We start by proving a graph container lemma in Section 4.1, and complete the proof in Section 4.2.

4.1 Proof of Graph Container Lemma II

In this section we prove the main lemma that balances the trade off between the size of the union and the sum of the fingerprints sizes of a set of containers, restated from Section 2.2.

See 6

Proof.

Assume on the contrary that for all t4/ϵt\leq 4/\epsilon,

|i=1kCt(Ii)|>(1tϵ4ln(1/ϵ))n.\left|\cup_{i=1}^{k}C_{t}(I_{i})\right|>\left(1-t\cdot\frac{\epsilon}{4\ln(1/\epsilon)}\right)n. (4)

We will obtain a contradiction by constructing a kk-partition V1,,VkV_{1},\dots,V_{k} of the vertices which shows that GG is not 3ϵ4\frac{3\epsilon}{4}-far from kk-colorable.

Initialize the sets V1=C4/ϵ(I1),,Vk=C4/ϵ(Ik).V_{1}=C_{4/\epsilon}(I_{1}),\dots,V_{k}=C_{4/\epsilon}(I_{k}). Remove vertices from V1,,VkV_{1},\dots,V_{k} until every vertex appears in one set so that V1,,VkV_{1},\dots,V_{k} forms a partition of i=1kC4/ϵ(Ii).\cup_{i=1}^{k}C_{4/\epsilon}(I_{i}). By Proposition 4, each vertex has degree at most ϵn/4\epsilon n/4 in it’s set, and so the sum of edges contained in the sets V1,,VkV_{1},\dots,V_{k} is at most ϵn24.\frac{\epsilon n^{2}}{4}.

In the remainder of the proof, we allocate the remaining vertices in Vi[k]Vi=Vi[k]C4/ϵ(Ii)V\setminus\cup_{i\in[k]}V_{i}=V\setminus\cup_{i\in[k]}C_{4/\epsilon}(I_{i}) to one of the parts and argue that the number of additional edges is small.

For every t=1,,4ϵ,t=1,\dots,\frac{4}{\epsilon}, let AtA_{t} be the set of vertices that are contained in at least one container at the (t1)(t-1)-th step, but not contained in any container at step tt step of the container procedure. In other words, let

At={vV:vi[k]Ct1(Ii) and vi[k]Ct(Ii).}A_{t}=\{v\in V:v\in\cup_{i\in[k]}C_{t-1}(I_{i})\textrm{ and }v\notin\cup_{i\in[k]}C_{t}(I_{i}).\}

Observe that A1,,A4/ϵA_{1},\dots,A_{4/\epsilon} partitions the vertices in Vi[k]Vi,V\setminus\cup_{i\in[k]}V_{i}, and so to complete the partition of GG we add the vertices of A1,,A4/ϵA_{1},\dots,A_{4/\epsilon} to V1,,VkV_{1},\dots,V_{k} in the following way: for each t=1,,4ϵt=1,\dots,\frac{4}{\epsilon} and vAt,v\in A_{t}, select an ii such that vCt1(Ii),v\in C_{t-1}(I_{i}), and add vv to set Vi.V_{i}.

In order to count the new edges, consider adding the vertices from A1,,A4/ϵA_{1},\dots,A_{4/\epsilon} in reverse order (starting at vertices in A4/ϵA_{4/\epsilon} and going to A1A_{1}). In this way, when vAtv\in A_{t} is added to Vi,V_{i}, it holds that ViCt1(Ii).V_{i}\subseteq C_{t-1}(I_{i}). Hence, by Proposition 4, each vAtv\in A_{t} contributes at most nt1\frac{n}{t-1} new edges to ViV_{i} (or nn edges if t=1t=1) because vv is contained in Ct1(Ii).C_{t-1}(I_{i}).

Then, the sum of the edges in each of the sets V1,,VkV_{1},\dots,V_{k} can be upper bounded by

ϵn24+|A1|n+t=24/ϵ|At|n(t1).\frac{\epsilon n^{2}}{4}+|A_{1}|\cdot n+\sum_{t=2}^{4/\epsilon}|A_{t}|\frac{n}{(t-1)}. (5)

For any t,t, =1tAt=Vi[k]Ct(Ii),\cup_{\ell=1}^{t}A_{t}=V\setminus\cup_{i\in[k]}C_{t}(I_{i}), and so (4) implies that =1t|A|tϵn4ln(1/ϵ).\sum_{\ell=1}^{t}|A_{\ell}|\leq\frac{t\epsilon n}{4\ln(1/\epsilon)}. In the sum (5), the contribution from |At||A_{t}| goes down as tt increases, and so the sum is maximized when |At|=ϵn4ln(1/ϵ)|A_{t}|=\frac{\epsilon n}{4\ln(1/\epsilon)} for all t.t. Hence, the sum of the edges in each of the sets V1,,VkV_{1},\dots,V_{k} can be upper bounded by

ϵn24+ϵn24ln(1/ϵ)+ϵn24ln(1/ϵ)t=24/ϵ1t1<ϵn2,\frac{\epsilon n^{2}}{4}+\frac{\epsilon n^{2}}{4\ln(1/\epsilon)}+\frac{\epsilon n^{2}}{4\ln(1/\epsilon)}\sum_{t=2}^{4/\epsilon}\frac{1}{t-1}<\epsilon n^{2},

where the last inequality uses the upper bound H(m)ln(m)+1H(m)\leq\ln(m)+1 on the harmonic series and the bound ϵ<e2\epsilon<e^{-2} that we can apply without loss of generality. This is a contradiction with the fact that GG is ϵ\epsilon-far from kk-colorable. ∎

4.2 Proof of Theorem 2

We can now complete the proof of Theorem 2, restated below.

Theorem 2 (Precise formulation).

The sample complexity of the kk-Colorable property is

𝒮k-Colorable(n,ϵ)=O(kϵln2(1ϵ)).\mathcal{S}_{k\textsc{-Colorable}}(n,\epsilon)=O\left(\frac{k}{\epsilon}\ln^{2}\left(\frac{1}{\epsilon}\right)\right).
Proof.

Let SS be a random set of s=ckϵln2(1ϵ)s=c\frac{k}{\epsilon}\ln^{2}\left(\frac{1}{\epsilon}\right) vertices drawn uniformly at random from VV without replacement, where cc is a large enough constant. The tester accepts a graph GG if and only if G[S]G[S] is kk-colorable. When GG is kk-colorable, then G[S]G[S] is also kk-colorable so the tester always accepts.

In the remainder of the proof, we upper bound the probability that G[S]G[S] is kk-colorable when GG is ϵ\epsilon-far from kk-colorable. In the following, let us say that kk containers Ct(I1),,Ct(Ik)C_{t}(I_{1}),\ldots,C_{t}(I_{k}) have a small union when |i=1kCt(Ii)|(1tϵ4ln(1/ϵ))n\left|\cup_{i=1}^{k}C_{t}(I_{i})\right|\leq\left(1-\frac{t\epsilon}{4\ln(1/\epsilon)}\right)n.

Let us denote the vertices of SS as u1,u2,,usu_{1},u_{2},\ldots,u_{s}. For any t4ϵt\leq\frac{4}{\epsilon}, any values t1,,tktt_{1},\ldots,t_{k}\leq t, and any set of t1+t2++tkt_{1}+t_{2}+\cdots+t_{k} distinct indices {ij,1,ij,2,,ij,tj}j=1,,k\{i_{j,1},i_{j,2},\ldots,i_{j,t_{j}}\}_{j=1,\ldots,k} in [s][s], let us consider the event where the fingerprints Ft(j)={ij,1,,ij,tj}F_{t}^{(j)}=\{i_{j,1},\ldots,i_{j,t_{j}}\} define a collection of containers Ct(Ft(1)),,Ct(Ft(k))C_{t}(F_{t}^{(1)}),\ldots,C_{t}(F_{t}^{(k)}) with a small union. When this event occurs, the probability that the remaining s(t1++tk)stks-(t_{1}+\cdots+t_{k})\geq s-tk vertices in SS are all drawn from j=1kCt(Ft(j))\cup_{j=1}^{k}C_{t}(F_{t}^{(j)}) is at most (1tϵ4ln(1/ϵ))stk\left(1-\frac{t\epsilon}{4\ln(1/\epsilon)}\right)^{s-tk}.

For any tt, there are at most ((st))kstk(\binom{s}{\leq t})^{k}\leq s^{tk} ways to choose kk sets of at most tt distinct indices as above. So by applying a union bound argument over all t4ϵt\leq\frac{4}{\epsilon} and over all possible choices of indices, the probability that any set of at most 4kϵ\frac{4k}{\epsilon} vertices in SS form the kk fingerprints of a family of containers with a small union from which we sample all the remaining vertices is at most

t=14/ϵstk(1tϵ4ln(1/ϵ))stkt=14/ϵexp(tkln(s)tϵs8ln(1/ϵ))<1/3\sum_{t=1}^{4/\epsilon}s^{tk}\left(1-\frac{t\epsilon}{4\ln(1/\epsilon)}\right)^{s-tk}\leq\sum_{t=1}^{4/\epsilon}\exp\left(tk\ln(s)-\frac{t\epsilon s}{8\ln(1/\epsilon)}\right)<1/3

where the first inequality uses the fact that s>2tks>2tk and the second is obtained by using the fact that s=ckϵln2(1ϵ)s=c\frac{k}{\epsilon}\ln^{2}\left(\frac{1}{\epsilon}\right) for a large enough constant cc (and that the problem is non-trivial only when ϵ<1/k\epsilon<1/k and ln(kϵ)ln1ϵ2=2ln1ϵ\ln(\frac{k}{\epsilon})\leq\ln\frac{1}{\epsilon^{2}}=2\ln\frac{1}{\epsilon} in this regime).

By Lemma 6, for every collection of kk independent sets I1,,IkI_{1},\ldots,I_{k} in SS, there is a value t4ϵt\leq\frac{4}{\epsilon} for which the fingerprints Ft(I1),,Ft(Ik)F_{t}(I_{1}),\ldots,F_{t}(I_{k}) define containers Ct(I1),,Ct(Ik)C_{t}(I_{1}),\ldots,C_{t}(I_{k}) with a small union. And by Proposition 3, each of these containers satisfy Ct(Ij)=Ct(Ft(Ij))C_{t}(I_{j})=C_{t}(F_{t}(I_{j})). Therefore, the argument above implies that the probability that the union of any kk independent sets contains all of SS, or equivalently that SS is kk-colorable, is at most 1/31/3. ∎

References

  • [AFKS00] Noga Alon, Eldar Fischer, Michael Krivelevich, and Mario Szegedy. Efficient testing of large graphs. Combinatorica, 20(4):451–476, 2000.
  • [AK02] Noga Alon and Michael Krivelevich. Testing k-colorability. SIAM Journal on Discrete Mathematics, 15(2):211–227, 2002.
  • [BESS78] Béla Bollobás, Paul Erdős, Miklós Simonovits, and Endre Szemerédi. Extremal graphs without large forbidden subgraphs. Annals of Discrete Mathematics, 3:29–41, 1978.
  • [BL10] Andrej Bogdanov and Fan Li. A better tester for bipartiteness? arXiv preprint 1011.0531, November 2010.
  • [BMS15] József Balogh, Robert Morris, and Wojciech Samotij. Independent sets in hypergraphs. Journal of the American Mathematical Society, 28(3):669–709, 2015.
  • [BT04] Andrej Bogdanov and Luca Trevisan. Lower bounds for testing bipartiteness in dense graphs. In Proceedings. of the 19th IEEE Annual Conference on Computational Complexity, pages 75–81, 2004.
  • [BY22] Arnab Bhattacharyya and Yuichi Yoshida. Property Testing: Problems and Techniques. Springer Nature, 2022.
  • [FLS04] Uriel Feige, Michael Langberg, and Gideon Schechtman. Graphs with tiny vector chromatic numbers and huge chromatic numbers. SIAM Journal on Computing, 33(6):1338–1368, 2004.
  • [FS97] Uriel Feige and Michael Seltser. On the densest k-subgraph problem. Technical Report No. CS97-16, Weizmann Institute of Science, 1997.
  • [GGR98] Oded Goldreich, Shari Goldwasser, and Dana Ron. Property testing and its connection to learning and approximation. Journal of the ACM, 45(4):653–750, 1998.
  • [Gol17] Oded Goldreich. Introduction to property testing. Cambridge University Press, 2017.
  • [GR10] Mira Gonen and Dana Ron. On the benefits of adaptivity in property testing of dense graphs. Algorithmica, 58(4):811–830, 2010.
  • [GR11] Oded Goldreich and Dana Ron. Algorithmic aspects of property testing in the dense graphs model. SIAM Journal on Computing, 40(2):376–445, 2011.
  • [GT03] Oded Goldreich and Luca Trevisan. Three theorems regarding testing graph properties. Random Structures & Algorithms, 23(1):23–57, 2003.
  • [GW21] Oded Goldreich and Avi Wigderson. Non-adaptive vs adaptive queries in the dense graph testing model. In Proceedings of the 62nd IEEE Annual Symposium on Foundations of Computer Science, pages 269–275, 2021.
  • [HMP21] Wasim Huleihel, Arya Mazumdar, and Soumyabrata Pal. Random subgraph detection using queries. arXiv preprint arXiv:2110.00744, 2021.
  • [JPP23] Matthew Jenssen, Will Perkins, and Aditya Potukuchi. Approximately counting independent sets in bipartite graphs via graph containers. Random Structures & Algorithms, 2023.
  • [KB80] Rob Kaas and Jan M Buhrman. Mean, median and mode in binomial distributions. Statistica Neerlandica, 34(1):13–18, 1980.
  • [KW82] Daniel J. Kleitman and Kenneth J. Winston. On the number of graphs without 4-cycles. Discrete Mathematics, 41(2):167–172, 1982.
  • [Man17] Pasin Manurangsi. Almost-polynomial ratio ETH-hardness of approximating densest k-subgraph. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 954–961, June 2017.
  • [Mul18] Wolfgang Mulzer. Five proofs of Chernoff’s bound with applications. Bulletin of EATCS, 1(124), 2018.
  • [Neu70] Peter Neumann. Über den median einiger dikreter verteilungen und eine damit zusammenhängende monotone konvergenz. Wissenschaftliche Zeitschrift der Technischen Universität Dresden, 19:29–33, 1970.
  • [NR18] Yonatan Nakar and Dana Ron. On the testability of graph partition properties. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2018), page 13, 2018.
  • [RD85] Vojtěch Rödl and Richard A. Duke. On graphs with small subgraphs of large chromatic number. Graphs and Combinatorics, 1(1):91–96, 1985.
  • [RS19] Miklós Z. Rácz and Benjamin Schiffer. Finding a planted clique by adaptive probing. arXiv preprint 1903.12050, 2019.
  • [Sap05] Alexander Sapozhenko. Systems of containers and enumeration problems. In Stochastic Algorithms: Foundations and Applications (SAGA 2005), pages 1–13. Springer, 2005.
  • [Soh12] Christian Sohler. Almost optimal canonical property testers for satisfiability. In Proceedings of the 53rd Annual IEEE Symposium on Foundations of Computer Science, pages 541–550, 2012.
  • [ST15] David Saxton and Andrew Thomason. Hypergraph containers. Inventiones mathematicae, 201(3):925–992, 2015.
  • [Zam23] Or Zamir. Algorithmic applications of hypergraph and partition containers. In Proceedings of the 55th Annual ACM Symposium on Theory of Computing, 2023.