This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Testing Triangle Freeness in the General Model in Graphs with Arboricity O(n)O(\sqrt{n})

Reut Levi Email: [email protected].
Abstract

We study the problem of testing triangle freeness in the general graph model. This problem was first studied in the general graph model by Alon et al. (SIAM J. Discret. Math. 2008) who provided both lower bounds and upper bounds that depend on the number of vertices and the average degree of the graph. Their bounds are tight only when dmax=O(d)d_{\rm max}=O(d) and d¯n\bar{d}\leq\sqrt{n} or when d¯=Θ(1)\bar{d}=\Theta(1), where dmaxd_{\rm max} denotes the maximum degree and d¯\bar{d} denotes the average degree of the graph. In this paper we provide bounds that depend on the arboricity of the graph and the average degree. As in Alon et al., the parameters of our tester is the number of vertices, nn, the number of edges, mm, and the proximity parameter ϵ\epsilon (the arboricity of the graph is not a parameter of the algorithm). The query complexity of our tester is O~(Γ/d¯+Γ)poly(1/ϵ)\tilde{O}(\Gamma/\bar{d}+\Gamma)\cdot{\rm poly}(1/\epsilon) on expectation, where Γ\Gamma denotes the arboricity of the input graph (we use O~()\tilde{O}(\cdot) to suppress O(loglogn)O(\log\log n) factors). We show that for graphs with arboricity O(n)O(\sqrt{n}) this upper bound is tight in the following sense. For any Γ[s]\Gamma\in[s] where s=Θ(n)s=\Theta(\sqrt{n}) there exists a family of graphs with arboricity Γ\Gamma and average degree d¯\bar{d} such that Ω(Γ/d¯+Γ)\Omega(\Gamma/\bar{d}+\Gamma) queries are required for testing triangle freeness on this family of graphs. Moreover, this lower bound holds for any such Γ\Gamma and for a large range of feasible average degrees 111For a graph, GG, whose arboricity is Γ\Gamma, the number of edges is at most nΓn\cdot\Gamma and at least Γ2\Gamma^{2}. Thus, the average degree of GG is at least Γ2/n\Gamma^{2}/n and at most Γ\Gamma..

1 Introduction

Testing triangle-freeness is one of the most basic decision problems on graphs. The existence of triangles in a graph is often a crucial property for various applications. In the realm of property testing, decision problems are relaxed so that a tester for a property 𝒫\mathcal{P} is only required to distinguish between graphs that have the property 𝒫\mathcal{P} from graphs which are “far” according to some predetermined distance measure, from having the property 𝒫\mathcal{P}, which in our case are graphs which are far from being triangle free.

Testing triangle freeness is known to be possible with query complexity which only depends on the proximity parameter, ϵ\epsilon, in graphs which are either dense or sparse. More specifically, Alon, Fischer, Krivelevich and Szegedy [2] showed that in the dense-graphs model [8] it is possible to test triangle-freeness with query complexity which is independent of the size of the graph but has a tower-type dependence in 1/ϵ1/\epsilon. In the other extreme, Goldreich and Ron [9] observed that in the bounded-degree model [9] it is possible to test triangle-freeness with query complexity O(1/ϵ)O(1/\epsilon) given that the maximum degree of the input graph is constant.

Alon, Kaufman, Krivelevich, Ron [3] were the first to study this problem in the general-graphs model [12, 11]. This model is more stringent in the sense that we do not assume anything on the density of the graph and the distance is measured with respect to the actual number of edges in the graph (instead of the maximum possible number of edges). They provided several upper bounds which apply for almost the entire range of average degrees. They also provided lower bounds that show that their upper bounds are at most quadratic in the optimal bounds. Shortly after, Rast [13] and Gugelmann [10] improved their upper bounds and lower bounds, respectively, for some ranges of the parameters.

Although there is a fairly significant gap between the known upper bounds and lower bounds for the vast range of parameters, there has been no progress on this question since then. In this paper we provide an upper bound and several lower bounds which are tight for a large range of parameters. Surprisingly, our bounds depend on the arboricity of the graph although it is not a parameter of our algorithm.

1.1 Results

We provide an upper bound whose running time complexity is O~(Γ/d¯+Γ)poly(1/ϵ)\tilde{O}(\Gamma/\bar{d}+\Gamma)\cdot{\rm poly}(1/\epsilon) on expectation. Therefore, for mnm\leq n our upper bound is O~(Γ/d¯)\tilde{O}(\Gamma/\bar{d}) and when m>nm>n our upper bound is O~(Γ)\tilde{O}(\Gamma) (ignoring polynomial dependencies in 1/ϵ1/\epsilon).

We provide three lower bounds, each suitable for a different range of parameters.

  1. 1.

    For any Γ\Gamma and any feasible m1m\geq 1, we provide a lower bound of Ω((Γn)/m)=Ω(Γ/d¯)\Omega((\Gamma n)/m)=\Omega(\Gamma/\bar{d}) queries. Therefore our upper bound is tight when mnm\leq n (up to polynomial dependencies in 1/ϵ1/\epsilon and O(loglogn)O(\log\log n) factors).

  2. 2.

    For any Γ\Gamma and any feasible mΓ3m\geq\Gamma^{3} we provide a lower bound of Ω(Γ)\Omega(\Gamma) queries. Therefore, our upper bound is also essentially tight as long as mΓ3m\geq\Gamma^{3} (notice that since mnΓm\leq n\cdot\Gamma, it is implied that this lower bound applies only for graphs in which Γ=O(n1/3)\Gamma=O(n^{1/3})). Since we may assume that mnm\geq n (otherwise we already have essentially tight lower bound), one implication of this lower bound is that our upper bound is tight in the strong sense for graphs with arboricity O(n1/3)O(n^{1/3}) (namely it is tight for any feasible mm) as it is always the case that mΓ3m\geq\Gamma^{3} for Γ=O(n1/3)\Gamma=O(n^{1/3}).

  3. 3.

    For any Γ(n/2)1/2\Gamma\leq(n/2)^{1/2} and any feasible nmΓ3n\leq m\leq\Gamma^{3} we provide a lower bound of Ω(m1/3)\Omega(m^{1/3}) queries. Since it is always the case that mΓ2m\geq\Gamma^{2}, a lower bound of Ω(Γ2/3)\Omega(\Gamma^{2/3}) queries is also implied.

To summarize, for graphs of arboricity Γ=O(n1/2)\Gamma=O(n^{1/2}) we obtain that our upper bound is tight for a large range of average degrees. Additionally, for the range of average degrees in which we do not provide tight bounds, our upper bound is essentially O(Γ)O(\Gamma) while our lower bound is Ω(Γ2/3)\Omega(\Gamma^{2/3}) in the worst case.

1.2 Related Work

1.2.1 Property testing of triangle freeness

As mentioned above, testing triangle freeness, in the context of property testing, was first studied by Alon et al. [2] in the dense graphs model. They showed that triangle freeness can be tested in time which is independent of the size of the graph. However, their upper bound has tower-type dependence in 1/ϵ1/\epsilon. Alon [1] showed that the query complexity of this problem in the dense-graphs model is indeed super-polynomial in 1/ϵ1/\epsilon.

In the bounded degree model Goldreich and Ron [9] observed that it is possible to test triangle freeness with query complexity O(1/ϵ)O(1/\epsilon) in graphs of maximum degree bounded by some constant.

The problem of testing triangle freeness in the general graph model was first studied by Alon, Kaufman, Krivelevich, Ron [3]. The query complexity of their algorithms dependent on nn and d¯\bar{d}, the number of vertices in the graph and the average degree, respectively. They provided sublinear upper bounds for almost the entire range of parameters. Moreover, their upper bounds are at most quadratic in their lower bounds. Specifically, their upper bound, which is combined from several upper bounds is O~(min{(nd¯)1/2/ϵ3/2,(n4/3/d¯2/3)/ϵ2})\tilde{O}(\min\{(n\bar{d})^{1/2}/\epsilon^{3/2},(n^{4/3}/\bar{d}^{2/3})/\epsilon^{2}\}). Their lower bound, which is also combined from several lower bounds, is Ω(max{(n/d¯)1/2,min{d¯,n/d¯},min{d¯1/2,n2/3/d¯1/3}no(1)})\Omega(\max\{(n/\bar{d})^{1/2},\min\{\bar{d},n/\bar{d}\},\min\{\bar{d}^{1/2},n^{2/3}/\bar{d}^{1/3}\}\cdot n^{-o(1)}\}).

Rast [13] improved the upper bound of [3] for graphs with average degree in the range [c1n1/5,c2n1/2][c_{1}n^{1/5},c_{2}n^{1/2}] where c1c_{1} and c2c_{2} are some constants. The upper bound in [13] is O(max{(nd¯)4/9,n2/3/d¯1/3})O(\max\{(n\bar{d})^{4/9},n^{2/3}/\bar{d}^{1/3}\}).

Gugelmann [10] provided a lower bound which improves the lower bound in [3] for graphs with average degree in the range [c1n2/5,c2n4/5][c_{1}n^{2/5},c_{2}n^{4/5}] where c1c_{1} and c2c_{2} are some constants. The lower bound in [10] is Ω(min{(nd¯)1/3,n/d¯})\Omega(\min\{(n\bar{d})^{1/3},n/\bar{d}\}).

1.2.2 Sublinear algorithms that receive the arboricity of the graph as a parameter

Eden, Ron and Rosenbaum [5] designed an algorithm that given nn, the number of edges of the input graph and an upper bound on the arboricity of the input graph, Γ\Gamma, the algorithm makes O(Γ/d¯+log3n/ϵ)O(\Gamma/\bar{d}+\log^{3}n/\epsilon) queries on expectation and samples an edge of the graph almost uniformly. More specifically, each edge in the graph is sampled with probability in the range [(1ϵ)m,(1+ϵ)m][(1-\epsilon)m,(1+\epsilon)m].

Eden, Ron and Seshadhri [6] estimate the degree distribution moments of an undirected graph. In particular, for estimating the average degree of a graph, their algorithm has query complexity of O~(Γ/d¯)\tilde{O}(\Gamma/\bar{d}). As they show in their paper, if Γ\Gamma is not given as an input to the algorithm then estimating the average degree is not possible in general with this query complexity.

In another paper, Eden, Ron and Seshadhri [7] give a (1±ϵ)(1\pm\epsilon)-approximation for the number of kk-cliques in a graph given a bound on the arboricity of the graph Γ\Gamma. In particular for triangles they provide an upper bound with expected running time, in terms of nn, Γ\Gamma and the number of triangles in the graph, n3n_{3}, of min{nΓ2/n3,n/n31/3+(mΓ)/n3}poly(logn,1/ϵ)\min\{n\Gamma^{2}/n_{3},n/n_{3}^{1/3}+(m\Gamma)/n_{3}\}\cdot{\rm poly}(\log n,1/\epsilon).

1.2.3 Testing graphs for bounded arboricity

Eden, Levi and Ron [4] provided an algorithm for testing whether a graph has bounded arboricity. Specifically, they provide a tolerant tester that distinguished graphs that are ϵ\epsilon-close to having arboricity Γ\Gamma from which are cϵc\cdot\epsilon-far from having arboricity 3Γ3\Gamma, where cc is an absolute constant. The query complexity and the running time of their algorithm is in terms of nn, mm and Γ\Gamma is O~(n/m+nΓ/m)\tilde{O}(n/\sqrt{m}+n\Gamma/m) and is quasi-polynomial in 1/ϵ1/\epsilon.

1.3 Comparison between our upper bound and upper bounds in previous work

As mentioned before, Alon et al. [3] provide tight bounds only in two cases. The first case is when dmax=O(d¯)d_{\rm max}=O(\bar{d}) and d¯n\bar{d}\leq\sqrt{n}, where dmaxd_{\rm max} denotes the maximum degree and d¯\bar{d} denotes the average degree of the graph. In this case, it follows that Γ=Θ(dmax)=Θ(d¯)\Gamma=\Theta(d_{\rm max})=\Theta(\bar{d}) an so our upper bounds essentially match. Additionally we note that a bound on the arboricty of the graph does not imply a bound on the maximum degree of the graph. In fact, the maximum degree could be Θ(n)\Theta(n) while the arboricity is Θ(1)\Theta(1) (as it is the case in the star graph). Consequently, the tightness of our upper bound is not restricted for graphs which have bounded maximum degree.

The second case is when d¯=Θ(1)\bar{d}=\Theta(1), for these graphs the running time complexity of their algorithm is Θ~(n1/2)\tilde{\Theta}(n^{1/2}). For this case, the running time complexity of our upper bound is O~(Γ)\tilde{O}(\Gamma). We note that in graphs in which d¯=Θ(1)\bar{d}=\Theta(1), Γ\Gamma could range between Θ(1)\Theta(1) and Θ(n1/2)\Theta(n^{1/2}). Therefore when d¯=Θ(1)\bar{d}=\Theta(1) the complexity of our upper bound is not worse than the complexity of the upper bound in [3] but could be much better, depending on Γ\Gamma.

For average degree in the range between Ω(1)\Omega(1) and O(n2/5)O(n^{2/5}) and in the range between Ω(n1/2)\Omega(n^{1/2}) and O(n2/3)O(n^{2/3}) the upper bound of O(m1/2)O(m^{1/2}) queries of Alon et al. achieves the best running time, in terms of nn and mm. For these ranges, the running time of our algorithm is O~(Γ)\tilde{O}(\Gamma). Since m1/2Γm^{1/2}\geq\Gamma, we obtain that for this ranges as well the performances of our algorithm are at least as good (up to O(loglogn)O(\log\log n) and poly(1/ϵ){\rm poly}(1/\epsilon) factors) but could be significantly better.

For average degree in the range between Ω(n2/5)\Omega(n^{2/5}) and O(n1/2)O(n^{1/2}) the upper bound of O(max{(nd¯)4/9,n2/3/d¯1/3})O(\max\{(n\bar{d})^{4/9},n^{2/3}/\bar{d}^{1/3}\}) queries of Rast [13] achieves the best running time. In this range our upper bound is always better than the upper bound of [13] for graphs of arboricity O(n12/21)O(n^{12/21}).

1.4 High-level of Our Algorithm

It is well known that a graph which is ϵ\epsilon-far from being triangle free has Ω(ϵm)\Omega(\epsilon m) edge-disjoint triangles (see Claim 1). Therefore if we were able to sample edges uniformly from the graph then after sampling O(1/ϵ)O(1/\epsilon) edges we would sample an edge {u,v}\{u,v\} which belongs to a triangle. Thus, if we revealed the entire neighborhood of uu and the entire neighborhood of vv then we would find a triangle in the graph. Our algorithm is based on this simple approach. There are only two problems that need to be addressed. The first problem is that sampling edges uniformly in a graph in which the degrees have high variability is too costly. The second problem, which also stems from the variability of the degrees in the graph, is that revealing the entire neighborhood of a vertex can be too costly, depending on its degree.

This is where the arboricity of the graph comes into play. For a graph of arboricity Γ\Gamma, as was shown in [4], the fraction of edges in the subgraph induced on heavy vertices, that is, vertices with degree greater than cΓ/ϵc\Gamma/\epsilon where cc is some absolute constant, is at most ϵ/2\epsilon/2. Therefore, if Γ\Gamma was given to us as a parameter then we could, in some sense, ignore the subgraph induced on vertices of degree greater than Θ(Γ/ϵ)\Theta(\Gamma/\epsilon) since a graph which is ϵ\epsilon-far from being triangle free still have Ω(ϵm)\Omega(\epsilon m) violating edges (and Ω(ϵm)\Omega(\epsilon m) edge-disjoint triangles) even after we remove this subgraph entirely. Ignoring this subgraph allows us on one hand to sample edges almost uniformly from the resulting graph while making only Ω(Γ/d¯)\Omega(\Gamma/\bar{d}) queries, and also guarantees that there are Ω(ϵm)\Omega(\epsilon m) violating edges for which both endpoints are not heavy. This solves the two problems we had with taking the simple approach.

However a bound on the arboricty of the graph is not given to the algorithm as a parameter. Since approximating the arboricity of a graph up to a constant factor is not possible in sublinear time (to see this consider a graph with a hidden clique), we estimate a different parameter which we informally refer to as the effective arboricity of the graph. We show that this parameter suffices for our needs. In fact, this parameter could be much smaller than Γ\Gamma, in which case the complexity of our algorithm is better than O(Γ/d¯+Γ)O(\Gamma/\bar{d}+\Gamma). We reduce the problem of approximating the effective arboricity of the graph to the problem of estimating the number of edges in the graph in which we remove the subgraph induced on heavy vertices, where heavy vertices are defined with respect to increasing thresholds. We stop increasing our threshold once the estimation of the number of edges is sufficiently large. As we prove, with high constant probability, our approximation to the effective arboricity is bounded by O(Γ)O(\Gamma) which leads to a tester with running time O(Γ/d¯+Γ)O(\Gamma/\bar{d}+\Gamma), as claimed.

1.5 Lower Bounds

Our first lower bound of Ω(Γ/d¯)\Omega(\Gamma/\bar{d}) for graphs in which d¯1\bar{d}\leq 1 is based on a simple hitting argument. Specifically, construct a graph which is 1/31/3-far from being triangle free in which Ω(Γ/d¯+Γ)=Ω(Γ/d¯)\Omega(\Gamma/\bar{d}+\Gamma)=\Omega(\Gamma/\bar{d}) queries are required in order to sample a vertex which is not isolated with probability that is at least 1/31/3.

Our other two lower bounds are simple adaptations of the lower bound of Ω(min{d¯,n/d¯})\Omega(\min\{\bar{d},n/\bar{d}\}) queries presented in Alon et al. [3].

2 Preliminaries

Let G=(V,E)G=(V,E) be an undirected graph and let d¯=2m/n\bar{d}=2m/n denote the average degree of GG where n=|V|n=|V| and m=|E|m=|E|. For each vertex vVv\in V, let deg(v){\rm deg}(v) denote the number of neighbors of vv. For a subset of vertices SVS\subseteq V we denote by G([S])G([S]) the subgraph induced on SS. For a directed graph DD we denote by d¯out(D)\bar{d}_{out}(D) the average out-degree of DD.

A graph GG is triangle free if for every three vertices, u,v,wu,v,w in GG at least one pair in {{u,v},{v,w},{w,u}}\{\{u,v\},\{v,w\},\{w,u\}\} is not an edge of GG. A graph GG is ϵ\epsilon-far from being triangle free if more than ϵm\epsilon m edges need to be removed in order to make GG triangle free.

In the general graph model the tester accesses the graph via the following oracle queries.

  1. 1.

    Degree queries: on query vv the oracle returns deg(v){\rm deg}(v).

  2. 2.

    Neighbor queries: on query (v,i)(v,i) where i[deg(v)]i\in[{\rm deg}(v)], the oracle returns the ii-th neighbor of vv.

  3. 3.

    Vertex-pair queries: on query {v,u}\{v,u\} the oracle returns whether there is an edge between uu and vv.

An algorithm is a tester for the property of triangle freeness if given a proximity parameter ϵ\epsilon and access to an input graph GG, it accepts GG with probability at least 2/32/3 if GG is triangle free and rejects GG with probability at least 2/32/3 if GG is ϵ\epsilon-far from being triangle free. If the tester always accepts graphs which are triangle free we say it has one-side error. Otherwise we say it has two-sided error.

Claim 1.

A graph G=(V,E)G=(V,E) which is ϵ\epsilon-far from being triangle free has at least ϵm/3\epsilon m/3 edge-disjoint triangles.

Proof.

Consider a procedure that given a graph GG, as long as there is triangle, tt in GG it deletes all the edges of tt and proceeds in this manner until there are no triangles in the graph. The number of edges which are deleted by this process is at least ϵm\epsilon m. Therefore the number of edge disjoint triangles that are deleted is at least ϵm/3\epsilon m/3. The claim follows. ∎

The arboricity of an undirected graph GG, denoted by Γ(G)\Gamma(G), is the minimum number of forests into which its edges can be partitioned. Equivalently it is the minimum number of spanning forests needed to cover all the edges of the graph.

3 The Algorithm

3.1 First Step: computing the threshold for defining heavy vertices

As described above, for an input graph GG, the number of edges in the subgraph induced on the heavy vertices w.r.t. the threshold 4Γ(G)/ϵ4\Gamma(G)/\epsilon is at most (ϵ/2)|E(G)|(\epsilon/2)|E(G)| (see Claim 4). Therefore, when testing triangle freeness, we may, roughly speaking, ignore this subgraph with the hope of obtaining better complexity. Since Γ(G)\Gamma(G) is not given to the algorithm as a parameter, we compute, in Algorithm 1, a different parameter of the graph, denoted by Γ\Gamma^{*}, which is, roughly speaking, an approximation of the effective arboricity of the graph. In order to specify the guarantees on Γ\Gamma^{*} we shall need a couple of definitions.

Definition 2.

For a graph G=(V,E)G=(V,E) and a threshold tt we define the set of heavy vertices with respect to tt as Ht(G)={vV:d(v)>t}H_{t}(G)=\{v\in V:d(v)>t\} and the set of light vertices with respect to tt as Lt(G)=VHt(G)L_{t}(G)=V\setminus H_{t}(G).

When GG is clear from the context we may simply use HtH_{t} and LtL_{t}. Using the definition of Ht(G)H_{t}(G), we next define the graph H(G,t)H(G,t) which is defined w.r.t. GG and a threshold tt.

Definition 3 (The undirected graph H(G,t)H(G,t)).

For a graph G=(V,E)G=(V,E) and a threshold tt, the graph H(G,t)H(G,t) is an undirected graph defined as follows. The set of vertices of H(G,t)H(G,t) is VV and the set of edges of H(G,t)H(G,t) is E(G){{u,v}:uHt(G) and vHt(G)}E(G)\setminus\{\{u,v\}:u\in H_{t}(G)\text{ and }v\in H_{t}(G)\}. Namely, H(G,t)H(G,t) is the graph GG after removing the edges for which both endpoints are heavy with respect to tt.

Claim 4.

For a graph GG, ΓΓ(G)\Gamma^{\prime}\geq\Gamma(G) and η(0,1]\eta\in(0,1] it holds that |E(H(G,Γ/η))|(12η)|E(G)||E(H(G,\Gamma^{\prime}/\eta))|\geq(1-2\eta)|E(G)|.

Proof.

We shall prove the claim about Γ=Γ\Gamma^{\prime}=\Gamma. The general claim will follow from the fact that |E(H(G,x)||E(H(G,x)| is monotonically non-decreasing in xx. Let G([Ht])G([H_{t}]) denote the sub-graph induced on Ht(G)H_{t}(G) where t=Γ/ηt=\Gamma/\eta and let kk denote the number of edges of this graph. Our goal is to show that k<2ηmk<2\eta m where m=|E(G)|m=|E(G)|. The sum of the degrees of vertices in Ht(G)H_{t}(G) is greater than t|Ht(G)|t\cdot|H_{t}(G)|, therefore m>t|Ht(G)|/2m>t\cdot|H_{t}(G)|/2. On the other hand, since the arboricity of G([Ht])G([H_{t}]) is also bounded by Γ\Gamma it follows that k|Ht(G)|Γk\leq|H_{t}(G)|\cdot\Gamma. Therefore k<2m/tΓ=2ηmk<2m/t\cdot\Gamma=2\eta m, as desired. ∎

The guarantees on Γ\Gamma^{*}, which is the return value of Algorithm 1 (that will be described next), are as described in the following claim.

Claim 5.

With probability at least 5/65/6, Γ\Gamma^{*} returned by Algorithm 1 is such that:

  1. 1.

    |E(H(G,t))|(1(ϵ/6))m|E(H(G,t))|\geq(1-(\epsilon/6))m where t=Γ/(48ϵ)t=\Gamma^{*}/(48\epsilon),

  2. 2.

    Γ2Γ(G)\Gamma^{*}\leq 2\Gamma(G).

Algorithm 1 proceeds in iterations where in each iteration it multiplies Γ\Gamma^{*} by a factor of 22, where initially Γ\Gamma^{*} is set to 11. It stops when the estimated number of edges in E(H(G,t))E(H(G,t)), for tt which is Θ(Γ/ϵ)\Theta(\Gamma^{*}/\epsilon), is at least |E(G)|(1Θ(ϵ))|E(G)|(1-\Theta(\epsilon)). In order to estimate the number of edges in E(H(G,t))E(H(G,t)), Algorithm 1 calls Algorithm 2.

In turn, Algorithm 2 uses the directed graph D(G,t)D(G,t), which we defined momentarily, that is constructed from GG and can be accessed by making a constant number of queries to GG.

Definition 6 (The directed graph D(G,t)D(G,t)).

The graph D(G,t)D(G,t) is a directed version of the graph H(G,t)H(G,t) in which we orient the edges as follows. For every edge {u,v}\{u,v\} of H(G,t)H(G,t), we orient the edge from uu to vv if: (a) uLtu\in L_{t} and vHtv\in H_{t} or (b) both uLtu\in L_{t} and vLtv\in L_{t} and id(u)<id(v)id(u)<id(v). Otherwise, we orient it from vv to uu.

Input: Access to a graph GG and parameters nn, mm and ϵ(0,1]\epsilon\in(0,1]
Output: Γ\Gamma^{*} as described in Lemma 5
1 Set Γ1=1\Gamma_{1}=1
2 for i=1i=1 to logn\log n do
3       Run Algorithm 2 on GG with parameters nn, mm, ϵ/24\epsilon/24, tit_{i} and δ=Θ(loglogn)\delta=\Theta(\log\log n) where ti=defΓi/(24ϵ)t_{i}\stackrel{{\scriptstyle\rm def}}{{=}}\Gamma_{i}/(24\epsilon) . Let ZiZ_{i} denote the returned value.
4       If Zi(1ϵ/12)mZ_{i}\leq(1-\epsilon/12)m then set Γi+1=2Γi\Gamma_{i+1}=2\Gamma_{i}, otherwise, return Γi\Gamma_{i}
5      
Algorithm 1 Compute Γ\Gamma^{*}
Input: Access to an undirected graph GG and parameters parameters nn, mm, ϵ\epsilon, tt and δ\delta, where n=|V(G)|n=|V(G)| and m=|E(G)|m=|E(G)|
Output: Estimation to the number of edges of H(G,t)H(G,t)
1 Sample r=Θ(δ1ϵ2t/d¯)r=\Theta(\delta^{-1}\epsilon^{-2}t/\bar{d}), where d¯=m/n\bar{d}=m/n, vertices v1,,vrv_{1},\ldots,v_{r}, uniformly at random from V(G)V(G).
2 For each i[r]i\in[r]:
3
  1. 1.

    Sample a random neighbor of viv_{i}, uu. If the edge between uu and viv_{i} is oriented from viv_{i} to uu in DD then set Yi=1Y_{i}=1, otherwise set Yi=0Y_{i}=0

  2. 2.

    Set Xi=(dout(v)+din(v))YitX_{i}=\frac{(d_{\rm out}(v)+d_{\rm in}(v))\cdot Y_{i}}{t}

Return X=t|V(G)|ri[r]XiX=\frac{t\cdot|V(G)|}{r}\cdot\sum_{i\in[r]}X_{i}
Algorithm 2 Estimate the number of edges of H(G,t)H(G,t)

The following claim specifies the guarantees of Algorithm 2.

Claim 7.

Given a query access to a graph GG and parameters nn, mm, ϵ\epsilon, tt and δ\delta where n=|V(G)|n=|V(G)| and m=|E(G)|m=|E(G)|, Algorithm 2 outputs XX such that w.p. at least 121/δ1-2^{1/\delta}, (1ϵ)mX(1+ϵ)m(1-\epsilon)m^{\prime}\leq X\leq(1+\epsilon)m^{\prime} if m(12ϵ)mm^{\prime}\geq(1-2\epsilon)m, and X<(1ϵ)mX<(1-\epsilon)m, otherwise, where m=|E(H(G,t))|m^{\prime}=|E(H(G,t))|.

Proof.

First observe that YiY_{i} is an indicator variable to the event that the edge selected in the ii-th iteration of Algorithm 2, {vi,u}\{v_{i},u\} is an out-edge of viv_{i} in D(G,t)D(G,t). Since V(G)=V(D(G,t))V(G)=V(D(G,t)) we obtain the following.

E(Yi)=1|V(D(G,t))|ΣvV(D(G,t))dout(v)dout(v)+din(v).{\rm E}(Y_{i})=\frac{1}{|V(D(G,t))|}\cdot\Sigma_{v\in V(D(G,t))}\frac{d_{\rm out}(v)}{d_{\rm out}(v)+d_{\rm in}(v)}\;. (1)

Similarly,

E(Xi)=1|V(D(G,t))|ΣvV(D(G,t))dout(v)+din(v)tdout(v)dout(v)+din(v)=1|V(D(G,t))|ΣvV(D(G,t))dout(v)t=d¯out(D(G,t))t.\begin{split}{\rm E}(X_{i})=&~{}\frac{1}{|V(D(G,t))|}\cdot\Sigma_{v\in V(D(G,t))}\frac{d_{\rm out}(v)+d_{\rm in}(v)}{t}\cdot\frac{d_{\rm out}(v)}{d_{\rm out}(v)+d_{\rm in}(v)}\\ =&~{}\frac{1}{|V(D(G,t))|}\cdot\Sigma_{v\in V(D(G,t))}\frac{d_{\rm out}(v)}{t}=\frac{\bar{d}_{\rm out}(D(G,t))}{t}\;.\end{split} (2)

Observe that if viHt(G)v_{i}\in H_{t}(G) then dout(v)=0d_{\rm out}(v)=0 and so Yi=Xi=0Y_{i}=X_{i}=0. On the other hand, if viLt(G)v_{i}\in L_{t}(G) then dout(v)+din(v)td_{\rm out}(v)+d_{\rm in}(v)\leq t. Therefore, in both cases Xi[0,1]X_{i}\in[0,1]. Thus, for rr which is Θ(1/(δϵ2E(X1)))\Theta(1/(\delta\epsilon^{2}E(X_{1}))) it follows by Multiplicative Chernoff’s bound (see Theorem 3 in the appendix) that with probability at least 121/δ1-2^{1/\delta},

(1ϵ)E(X1)i[r]Xi/r(1+ϵ)E(X1).(1-\epsilon)E(X_{1})\leq\sum_{i\in[r]}{X_{i}}/r\leq(1+\epsilon)E(X_{1})\;.

And so

t|V(G)|(1ϵ)E(X1)Xt|V(G)|(1+ϵ)E(X1).t\cdot|V(G)|\cdot(1-\epsilon)E(X_{1})\leq X\leq t\cdot|V(G)|\cdot(1+\epsilon)E(X_{1})\;.

Since t|V(G)|E(X1)=|E(H(G,t))|t\cdot|V(G)|\cdot E(X_{1})=|E(H(G,t))| we obtain that

(1ϵ)|E(H(G,t))|X(1+ϵ)|E(H(G,t))|.(1-\epsilon)\cdot|E(H(G,t))|\leq X\leq(1+\epsilon)\cdot|E(H(G,t))|\;.

Hence, if |E(H(G,t))|(12ϵ)m|E(H(G,t))|\geq(1-2\epsilon)m then d¯out(D(G,t))=Θ(d¯(G))\bar{d}_{\rm out}(D(G,t))=\Theta(\bar{d}(G)) and so E(X1)=Θ(d¯(G)/t){\rm E}(X_{1})=\Theta(\bar{d}(G)/t), implying that r=Θ(1/(δϵ2E(X1)))r=\Theta(1/(\delta\epsilon^{2}E(X_{1}))), as desired. On the other hand, if |E(H(G,t))|<(12ϵ)m|E(H(G,t))|<(1-2\epsilon)m, then it is not hard to see that the claim follows by a straightforward coupling argument. More specifically, first assume that |E(H(G,t))|=(12ϵ)m|E(H(G,t))|=(1-2\epsilon)m and so by the above, with probability at least 121/δ1-2^{1/\delta},

X(1+ϵ)|E(H(G,t))|=(1+ϵ)(12ϵ)m<(1ϵ)mX\leq(1+\epsilon)|E(H(G,t))|=(1+\epsilon)(1-2\epsilon)m<(1-\epsilon)m

. Therefore, it follows by a coupling argument that with probability at least 121/δ1-2^{1/\delta}, X<(1ϵ)mX<(1-\epsilon)m also in the case that |E(H(G,t))|<(12ϵ)m|E(H(G,t))|<(1-2\epsilon)m. ∎

Claim 8.

For an input graph GG and parameters nn, mm, ϵ\epsilon, tt and δ\delta, where n=|V(G)|n=|V(G)| and m=|E(G)|m=|E(G)|, the time complexity and query complexity of Algorithm 2 is O(δ1ϵ2t/d¯(G))O(\delta^{-1}\epsilon^{-2}t/\bar{d}(G)).

Proof.

The claim follows from the fact that in order to implement Steps 2.1 and 2.2 of Algorithm 2 the algorithm makes a constant number of queries to GG. Specifically, for each viv_{i} the algorithm either performs a single degree query (in case viHt(G)v_{i}\in H_{t}(G) then Yi=Xi=0)Y_{i}=X_{i}=0) or a single adjacency-list query and 22 degree queries in case viLt(G)v_{i}\in L_{t}(G) (the orientation of the edge {vi,u}\{v_{i},u\} can be determined by the degrees and ids of viv_{i} and uu). The implementation of Step 2.2 does not require additional queries as dout(vi)+din(vi)=dG(vi)d_{\rm out}(v_{i})+d_{\rm in}(v_{i})=d_{G}(v_{i}). ∎

We are now ready to prove Claim 5.

Proof of Claim 5.

For the sake of analysis assume that Algorithm 1 performs all logn\log n iterations of the for-loop. Let EiE_{i} denote the event that ZiZ_{i} is as claimed in Claim 7. By Claim 7, for a fixed ii, the probability that EiE_{i} occurs is at least 1/(6logn)1/(6\log n) for an appropriate setting of δ\delta. Therefore, by the union bound the probability that EiE_{i} occurs for all i[logn]i\in[\log n] is at least 5/65/6. From this point on we condition on the event that indeed EiE_{i} occurs for all i[logn]i\in[\log n].

Let mi=|E(H(G,ti))|m_{i}=|E(H(G,t_{i}))| for every i[logn]i\in[\log n]. Let jj denote the iteration in which Algorithm 1 returns a value. By Step 1 of Algorithm 1, Zj>(1ϵ/12)mZ_{j}>(1-\epsilon/12)m. By Claim 7, it follows that mj(1ϵ/6)mm_{j}\geq(1-\epsilon/6)m, as desired (to see this note that if mj<(1ϵ/6)mm_{j}<(1-\epsilon/6)m then By Claim 7, Zj<(1ϵ/12)mZ_{j}<(1-\epsilon/12)m).

To prove the claim about Γ\Gamma^{*} we consider the minimum j1j^{\prime}\geq 1, for which 2j1Γ2^{j^{\prime}-1}\geq\Gamma. If j<jj<j^{\prime} then clearly Γ<Γ\Gamma^{*}<\Gamma, as desired. Otherwise, we claim that j=jj=j^{\prime} (namely, that Γ=2j1\Gamma^{*}=2^{j^{\prime}-1}) which implies that Γ2Γ\Gamma^{*}\leq 2\Gamma, as desired. To see this, first note that by Claim 4, for any ΓΓ\Gamma^{\prime}\geq\Gamma and η(0,1]\eta\in(0,1], |E(H(G,Γ/η))|(12η)m|E(H(G,\Gamma^{\prime}/\eta))|\geq(1-2\eta)m. Therefore mj(1ϵ/24)mm_{j}^{\prime}\geq(1-\epsilon/24)m. Therefore, by Claim 7, Zj(1ϵ/24)mj(1ϵ/24)2m>(1ϵ/12)mZ_{j^{\prime}}\geq(1-\epsilon/24)m_{j}^{\prime}\geq(1-\epsilon/24)^{2}m>(1-\epsilon/12)m, which implies that the algorithms stops at the jj^{\prime}-th iteration. ∎

3.2 Second Step: sampling edges almost uniformly given a threshold for heavy vertices

Given a threshold tt, Algorithm 3 samples an edge from H(G,t)H(G,t) almost uniformly as described in the next claim.

Claim 9.

Algorithm 3 samples an edge from H(G,t)H(G,t) such that for each edge, ee, of H(G,t)H(G,t), the probability to sample ee is in [c1m,c2m]\left[\frac{c_{1}}{m^{\prime}},\frac{c_{2}}{m^{\prime}}\right], where c1c_{1} and c2c_{2} are absolute constants and m=def|E(H(G,t))|m^{\prime}\stackrel{{\scriptstyle\rm def}}{{=}}|E(H(G,t))|. If tt is such that m|E(G)|/2m^{\prime}\geq|E(G)|/2 then the expected running time of the algorithm is O(t/d¯(G))O(t/\bar{d}(G)).

Proof.

Consider a single iteration of the while loop of Algorithm 3. For an edge ee in H(G,t)H(G,t) let p(e)p(e) denote the probability that ee is returned in this iteration of Algorithm 3. If ee is an edge such that both endpoints are in Lt(G)L_{t}(G), then p(e)=2n1tp(e)=\frac{2}{n}\cdot\frac{1}{t} . If ee is an edge such that one endpoint in Lt(G)L_{t}(G) and the other endpoint is in Ht(G)H_{t}(G), then p(e)=1n1tp(e)=\frac{1}{n}\cdot\frac{1}{t} . Therefore for any two edges e1e_{1} and e2e_{2} in H(G,t)H(G,t) the probability that e1e_{1} is picked by the algorithm is at most twice the probability that e2e_{2} is picked by the algorithm. Since Algorithm 3 only returns edges in H(G,t)H(G,t), the claim about the probability to sample an edge follows.

For a fixed iteration of the while loop, the probability that the algorithm returns one of the edges of E(H(G,t))E(H(G,t)) is at least m1n1tm^{\prime}\cdot\frac{1}{n}\cdot\frac{1}{t} which is at least d¯(G)/(2t)\bar{d}(G)/(2t) in the case that m|E(G)|/2m^{\prime}\geq|E(G)|/2. Therefore in this case the expected number of iterations of the while loop is at most (2t/d¯(G))(2t/\bar{d}(G)). Hence the claim about the expected running time follows. ∎

Input: Access to a graph GG and a parameter tt.
1 while  do
2       Pick u.a.r. a vertex vv from V(G)V(G)
3       Pick u.a.r. j[t]j\in[t]
4       If vLt(G)v\in L_{t}(G) and vv has a jj-th neighbor, uu, then return {v,u}\{v,u\}.
5      
Algorithm 3 Sample an edge from H(G,t)H(G,t) almost uniformly
Remark 1.

We remark that Algorithm 3 is stated as a Las Vegas algorithm. Moreover, if m<|E(G)|/2m<|E(G)|/2 then we can not obtain from Claim 9 any bound on the expected running time of the algorithm. However, we note that since tt and d¯(G)\bar{d}(G) are known then we can set a timeout for the algorithm (specifically ct/d¯(G)ct/\bar{d}(G) for some constant cc) and incorporate the event that we were forced to stop the algorithm in the failure probability of the tester.

3.3 Putting things together - the algorithm for testing triangle freeness

Using Algorithms 2 and 3 we are now ready to describe our tester (Algorithm 4).

Input: Access to a graph GG and parameters nn, mm, and ϵ\epsilon.
1 Execute Algorithm 2 with parameters nn, mm and ϵ\epsilon and let Γ\Gamma^{*} denote the returned value
2 Let t=Γ/ϵt=\Gamma^{*}/\epsilon
3 for i=1i=1 to s=Θ(ϵ1)s=\Theta(\epsilon^{-1}) do
4       Execute Algorithm 3 on GG with parameter tt and let {u,v}\{u,v\} denote the edge returned by the algorithm.
5       If both uLt(G)u\in L_{t}(G) and vLt(G)v\in L_{t}(G) then return REJECT if N(u)N(v)N(u)\cap N(v)\neq\emptyset
6      
7Return ACCEPT
Algorithm 4 Testing Triangle-Freeness

Since the tester has one-sided error (it rejects only if it finds a witness for violation, i.e., a triangle) its correctness follows from the following claim.

Claim 10.

If GG is ϵ\epsilon-far from being triangle free then Test Triangle Freeness finds a triangle with probability at least 2/32/3. The expected running time of the algorithm is O~(Γ/d¯(G)+Γ)poly(ϵ1)\tilde{O}\left(\Gamma/\bar{d}(G)+\Gamma\right)\cdot{\rm poly}(\epsilon^{-1}).

Proof.

Let G=(V,E)G=(V,E) be an input graph which is ϵ\epsilon-far from being triangle free. There are at least ϵm/3\epsilon m/3 edge-disjoint triangles in GG (see Claim 1). Let E1E_{1} denote the event that for Γ\Gamma^{*} that is return by Algorithm 1 it holds that |E(H(G,t))|(1(ϵ/6))m|E(H(G,t))|\geq(1-(\epsilon/6))m where t=Γ/ϵt=\Gamma^{*}/\epsilon. By Claim 5 E1E_{1} occurs with probability at least 5/65/6. Given that E1E_{1} occurred, it follows that there are at least (ϵ/3)m(ϵ/6)m=(ϵ/6)m(\epsilon/3)m-(\epsilon/6)m=(\epsilon/6)m edge-disjoint triangles in H(G,t)H(G,t). Let {t1,,tk}\{t_{1},\ldots,t_{k}\} be an arbitrary subset of these edge-disjoint triangles where k=def(ϵ/6)mk\stackrel{{\scriptstyle\rm def}}{{=}}(\epsilon/6)m. By the definition of H(G,t)H(G,t) it holds that for every edge {u,v}E(H(G,t))\{u,v\}\in E(H(G,t)) either uLt(G)u\in L_{t}(G) or vLt(G)v\in L_{t}(G). Thus, for every i[k]i\in[k] the triangle tit_{i} includes an edge {xi,yi}\{x_{i},y_{i}\} such that both xix_{i} and yiy_{i} are in Lt(G)L_{t}(G). Therefore, there are at least kk edges in H(G,t)H(G,t) such that if Algorithm 3 returns one of these edges in Step 4 of Algorithm 4 then Algorithm 4 rejects. For every i[s]i\in[s], let E2,iE_{2,i} denote the event that Algorithm 3 returns one of these edges in the ii-th iteration of Algorithm 4.

By Claim 9, given that E1E_{1} occurred, the probability that E2,iE_{2,i} occurs (given that the algorithm did not return REJECT before the ii-th iteration) is at least cϵc\epsilon for some absolute constant cc. Therefore the probability that both E1E_{1} and E2,iE_{2,i} occur for some i[s]i\in[s] is at least 2/32/3 for an appropriate setting of ss. Thus the algorithm rejects with probability at least 2/32/3 as desired. ∎

4 Lower Bounds

4.1 Lower bound of Ω((Γn)/m)\Omega((\Gamma n)/m) for any Γ\Gamma and any feasible m1m\geq 1

Theorem 1.

For any Γn1\Gamma\leq n-1 and any m1m\geq 1 which is feasible w.r.t. Γ\Gamma, any algorithm for testing triangle-freeness must perform Ω((Γn)/m)\Omega((\Gamma n)/m) queries where nn, mm and Γ\Gamma denote the number of vertices, the number of edges and the arboricity of the input graph, receptively. This lower bound holds even if the algorithm is allowed two-sided error.

Proof.

We consider the following graph GG over nn vertices, mm edges and of arboricity Γ\Gamma. V1V_{1} and V2V_{2} are subsets of 2m/(3Γ)2m/(3\Gamma) vertices each and V3V_{3} is a subset of Γ/2\Gamma/2 vertices. V1V_{1}, V2V_{2} and V3V_{3} are pairwise disjoint. The edges of the graph are as follows. The sub-graph induced on V1V_{1} and V3V_{3} is a complete bipartite graph with V1V_{1} on one side and V3V_{3} on the other side. Similarly the subgraph induced on V2V_{2} and V3V_{3} is also a complete bipartite graph. Between V1V_{1} and V2V_{2} we take Γ/2\Gamma/2 edge disjoint prefect matchings. Consequently the degree of every node in V1V_{1} and V2V_{2}, as well as the arboricity of the graph, is exactly Γ\Gamma. The number of edge disjoint triangles in the graph is at least m/3m/3. To see this consider the following correspondence between an edge {u,v}\{u,v\} such that uV1u\in V_{1} and vV2v\in V_{2} and a triangle in the graph. Let i[Γ/2]i\in[\Gamma/2] denote the matching for which {u,v}\{u,v\} belongs to, then triangle that corresponds to {u,v}\{u,v\} is {u,v,wi}\{u,v,w_{i}\}.

Therefore the graph is (1/3)(1/3)-far from being triangle free (if t1,,tm/3t_{1},\ldots,t_{m/3} are edge disjoint triangles of GG then we need to delete at least one edge per triangle in order to make GG triangle free). The number of queries we need to make to hit either V1V_{1} or V2V_{2} is Ω((Γn)/m)\Omega((\Gamma n)/m). The number of queries we need to make to hit V3V_{3} is Ω(n/Γ)\Omega(n/\Gamma). Since m=Ω(Γ2)m=\Omega(\Gamma^{2}), we obtain a lower bound of Ω((Γn)/m)\Omega((\Gamma n)/m) queries in order to hit a vertex from V1V2V3V_{1}\cup V_{2}\cup V_{3}. Therefore unless the tester makes Ω((Γn)/m)\Omega((\Gamma n)/m) queries, it can not distinguish between GG and the empty graph. The theorem follows. ∎

4.2 Lower bound of Ω(Γ)\Omega(\Gamma) for any Γ\Gamma and any feasible mΓ3m\geq\Gamma^{3}

We adapt the following lower bound of Alon et al. [3].

Theorem 2.

Any algorithm for testing triangle-freeness must perform Ω(min{d¯,n/d¯})\Omega(\min\{\bar{d},n/\bar{d}\}) queries. This lower bound holds even if the algorithm is allowed two-sided error and even for dmax=O(d¯)d_{\rm max}=O(\bar{d}).

Using Theorem 2, we shall prove the following claims.

Claim 11.

For any Γ\Gamma and any feasible mm w.r.t. Γ\Gamma such that mΓ3m\geq\Gamma^{3}, any algorithm for testing triangle-freeness must perform Ω(Γ)\Omega(\Gamma) queries, where mm and Γ\Gamma denote the number of edges and the arboricity of the input graph, respectively. This lower bound holds even if the algorithm is allowed two-sided error.

Proof.

Assume towards contradiction that there exists an algorithm 𝒜\mathcal{A} for testing triangle-freeness that is allowed two-sided error and performs o(Γ)o(\Gamma) queries even for input graphs for which mΓ3m\geq\Gamma^{3}, where mm and Γ\Gamma denote the number of edges and the arboricity of the input graph of 𝒜\mathcal{A}, respectively. We will show that there exists an algorithm \mathcal{B} for testing triangle-freeness (with two-sided error) for graphs in which M/N=ΓM/N=\Gamma, N=m/ΓN=m/\Gamma and the maximum degree is Γ\Gamma, whose query complexity is o(Γ)o(\Gamma), where MM and NN denote the number of edges and the number of vertices of the input graph of \mathcal{B}, respectively. This will contradict the lower bound in Theorem 2 as min{M/N,N2/M}=min{Γ,m/Γ2}=Γ\min\{M/N,N^{2}/M\}=\min\{\Gamma,m/\Gamma^{2}\}=\Gamma, where the last inequality follows from the fact that mΓ3m\geq\Gamma^{3}.

Let mm, nn and Γ\Gamma be such that mm is feasible w.r.t. Γ\Gamma and mΓ3m\geq\Gamma^{3}. Therefore Γ3mΓn\Gamma^{3}\leq m\leq\Gamma n. Let GG be a graph over NN vertices and MM edges for which M/N=ΓM/N=\Gamma, N=m/ΓN=m/\Gamma and the maximum degree is Γ\Gamma. Given such an input graph GG, the algorithm \mathcal{B} simulates 𝒜\mathcal{A} on another graph, GG^{\prime}, that will be described momentarily, and returns the output of 𝒜\mathcal{A} on GG^{\prime}. The graph GG^{\prime} is constructed from GG and has the following properties:

  1. 1.

    GG^{\prime} has arboricity Γ\Gamma.

  2. 2.

    Any query on GG^{\prime} can be answered by performing at most a single query to GG

  3. 3.

    If GG is triangle free then GG^{\prime} is triangle free as well.

  4. 4.

    If GG is ϵ\epsilon-far from being triangle free then GG^{\prime} is ϵ\epsilon-far from being triangle free as well.

GG^{\prime} is simply the graph GG with nNn-N isolated vertices (observe that nNn\geq N since mΓnm\leq\Gamma n). Since M=mM=m, Item 4 follows. Items 1 and 3 follow from construction and the bound on the maximum degree of GG. Item 2 follows from the fact that any query to the graph GG^{\prime} is can be answered either by performing a single query the graph GG^{\prime} or to without performing any query to GG (in case the algorithm queries the subgraph induced on the additional nNn-N isolated vertices of GG^{\prime}).

The completeness and soundness of algorithm \mathcal{B} follows from the correctness of algorithm 𝒜\mathcal{A} and Items 3 and 4, respectively. The claim about the query complexity of algorithm \mathcal{B} follows from Item 2 and the assumption on the query complexity of algorithm 𝒜\mathcal{A}. This completes the proof of the claim. ∎

4.3 Lower bound of Ω(m1/3)\Omega(m^{1/3}) for any Γ(n/2)1/2\Gamma\leq(n/2)^{1/2} and any feasible nmΓ3n\leq m\leq\Gamma^{3}

Claim 12.

For any Γ(n/2)1/2\Gamma\leq(n/2)^{1/2} and any feasible mm w.r.t. Γ\Gamma such that nmΓ3n\leq m\leq\Gamma^{3}, any algorithm for testing triangle-freeness must perform Ω(m1/3)\Omega(m^{1/3}) queries, where mm, nn and Γ\Gamma denote the number of edges, number of vertices and the arboricity of the input graph, respectively. This lower bound holds even if the algorithm is allowed two-sided error.

Proof.

The proof of this claim follows the same lines as the proof of Claim 11. Assume towards contradiction that there exists an algorithm 𝒜\mathcal{A} for testing triangle-freeness that is allowed two-sided error and performs o(m1/3)o(m^{1/3}) queries even for input graphs for which nmΓ3n\leq m\leq\Gamma^{3}, where mm, nn and Γ\Gamma denote the number of edges, number of vertices and the arboricity of the input graph, respectively. We will show that there exists an algorithm \mathcal{B} for testing triangle-freeness (with two-sided error) for graphs in which M=N3/2=Θ(m)M=N^{3/2}=\Theta(m) and the maximum degree is M/NM/N, whose query complexity is o(M/N)o(M/N), where MM and NN denote the number of edges and the number of vertices of the input graph of \mathcal{B}, respectively. This will contradict the lower bound in Theorem 2 as min{M/N,N2/M}=M/N\min\{M/N,N^{2}/M\}=M/N, where the last inequality follows from the fact that M=N3/2M=N^{3/2}.

Let mm, nn and Γ\Gamma be such that mm is feasible w.r.t. Γ\Gamma and nmΓ3n\leq m\leq\Gamma^{3}. Let GG be a graph over NN vertices and MM edges for which M=N3/2=m/2M=N^{3/2}=m/2 and for which the maximum degree is M/NM/N. Given such an input graph GG, the algorithm \mathcal{B} simulates 𝒜\mathcal{A} on another graph, GG^{\prime}, that will be described momentarily, and returns the output of 𝒜\mathcal{A} on GG^{\prime}. The graph GG^{\prime} is constructed from GG and has the following properties:

  1. 1.

    GG^{\prime} has arboricity Γ\Gamma.

  2. 2.

    Any query on GG^{\prime} can be answered by performing at most a single query to GG

  3. 3.

    If GG is triangle free then GG^{\prime} is triangle free as well.

  4. 4.

    If GG is ϵ\epsilon-far from being triangle free then GG^{\prime} is at least ϵ/2\epsilon/2-far from being triangle free as well.

GG^{\prime} is composed of the graph GG, a complete bipartite graph AA over 2Γ2\Gamma vertices and a graph II of nN2Γn-N-2\Gamma isolated vertices. Observe that nN2Γn-N\geq 2\Gamma since Γ2n/2\Gamma^{2}\leq n/2 and N=M2/3Γ2/22/3n/2N=M^{2/3}\leq\Gamma^{2}/2^{2/3}\leq n/2. The graph AA has Γ\Gamma vertices on each side, A1A_{1} and A2A_{2} and Γ2\Gamma^{2} edges. Item 4 follows from the fact that M=mΓ2mn/2m/2M=m-\Gamma^{2}\geq m-n/2\geq m/2. Observe that maximum degree of GG is at most Γ\Gamma since M/Nm1/3ΓM/N\leq m^{1/3}\leq\Gamma. Therefore, Items 1 and 3 follow from the fact that the arboricity of AA is Γ\Gamma and the bound on the maximum degree of GG. Item 2 follows from the fact that any query to the graph GG^{\prime} is can be answered either by performing a single query the graph GG^{\prime} or to without performing any query to GG (in case the algorithm queries the subgraphs AA or II).

The completeness and soundness of algorithm \mathcal{B} follows from the correctness of algorithm 𝒜\mathcal{A} and Items 3 and 4, respectively. The claim about the query complexity of algorithm \mathcal{B} follows from Item 2 and the assumption on the query complexity of algorithm 𝒜\mathcal{A}. This completes the proof of the claim. ∎

References

  • [1] Noga Alon. Testing subgraphs in large graphs. Random Struct. Algorithms, 21(3-4):359–370, 2002.
  • [2] Noga Alon, Eldar Fischer, Michael Krivelevich, and Mario Szegedy. Efficient testing of large graphs. Combinatorica, 20(4):451–476, 2000.
  • [3] Noga Alon, Tali Kaufman, Michael Krivelevich, and Dana Ron. Testing triangle-freeness in general graphs. SIAM J. Discret. Math., 22(2):786–819, 2008.
  • [4] Talya Eden, Reut Levi, and Dana Ron. Testing bounded arboricity. ACM Trans. Algorithms, 16(2):18:1–18:22, 2020.
  • [5] Talya Eden, Dana Ron, and Will Rosenbaum. The arboricity captures the complexity of sampling edges. In Christel Baier, Ioannis Chatzigiannakis, Paola Flocchini, and Stefano Leonardi, editors, 46th International Colloquium on Automata, Languages, and Programming, ICALP 2019, July 9-12, 2019, Patras, Greece, volume 132 of LIPIcs, pages 52:1–52:14. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.
  • [6] Talya Eden, Dana Ron, and C. Seshadhri. Sublinear time estimation of degree distribution moments: The arboricity connection. SIAM J. Discret. Math., 33(4):2267–2285, 2019.
  • [7] Talya Eden, Dana Ron, and C. Seshadhri. Faster sublinear approximation of the number of k-cliques in low-arboricity graphs. In Shuchi Chawla, editor, Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, Salt Lake City, UT, USA, January 5-8, 2020, pages 1467–1478. SIAM, 2020.
  • [8] Oded Goldreich, Shafi Goldwasser, and Dana Ron. Property testing and its connection to learning and approximation. Journal of the ACM (JACM), 45(4):653–750, 1998.
  • [9] Oded Goldreich and Dana Ron. Property testing in bounded degree graphs. Algorithmica, 32(2):302–343, 2002.
  • [10] L. Gugelmann. Testing triangle-freeness in general graphs: Lower bounds. Bachelor thesis, Dept. of Mathematics, ETH, Zurich, 2006.
  • [11] Tali Kaufman, Michael Krivelevich, and Dana Ron. Tight bounds for testing bipartiteness in general graphs. SIAM Journal on Computing, 33(6):1441–1483, 2004.
  • [12] Michal Parnas and Dana Ron. Testing the diameter of graphs. Random Struct. Algorithms, 20(2):165–183, 2002.
  • [13] T. Rast. Testing triangle-freeness in general graphs: Upper bounds. Bachelor thesis, Dept. of Mathematics, ETH, Zurich, 2006.

Appendix A Appendix

Theorem 3 (Multiplicative Chernoff’s Bound).

Let X1,,XnX_{1},\ldots,X_{n} be identical independent random variables ranging in [0,1][0,1], and let p=E[X1]p={\rm E}[X_{1}]. Then, for every γ(0,2]\gamma\in(0,2], it holds that

Pr[|1ni[n]Xip|>γp]<2eγ2pn/4.\mathrm{Pr}\left[\left|\frac{1}{n}\cdot\sum_{i\in[n]}X_{i}-p\right|>\gamma\cdot p\right]<2\cdot e^{-\gamma^{2}pn/4}\;. (3)