
Optimal Deterministic Group Testing Algorithms
to Estimate the Number of Defectives

Nader H. Bshouty
Dept. of Computer Science
Technion, Haifa, 32000
 
Catherine A. Haddad-Zaknoon
Dept. of Computer Science
Technion, Haifa, 32000
Abstract

We study the problem of estimating the number of defective items $d$ within a pile of $n$ elements up to a multiplicative factor of $\Delta>1$, using deterministic group testing algorithms. We derive lower and upper bounds on the number of tests required in both the adaptive and the non-adaptive deterministic settings, given an upper bound $D$ on the number of defectives. For the adaptive deterministic setting, our results show that any algorithm for estimating the number of defectives up to a multiplicative factor of $\Delta$ must make at least $\Omega\left((D/\Delta^{2})\log(n/D)\right)$ tests. This extends the same lower bound achieved in [1] for non-adaptive algorithms. Moreover, we give a polynomial time adaptive algorithm that shows that our bound is tight up to a small additive term.

For non-adaptive algorithms, an upper bound of $O\left((D/\Delta^{2})(\log(n/D)+\log\Delta)\right)$ is achieved by means of a non-constructive proof. This improves the upper bound $O\left(((\log D)/(\log\Delta))D\log n\right)$ from [1] and matches the lower bound up to a small additive term.

In addition, we study polynomial time constructive algorithms. We use existing polynomial time constructible expander regular bipartite graphs, extractors and condensers to construct two polynomial time algorithms. The first algorithm makes $O((D^{1+o(1)}/\Delta^{2})\cdot\log n)$ tests, and the second makes $(D/\Delta^{2})\cdot{\rm quasipoly}(\log n)$ tests. This is the first explicit construction with an almost optimal test complexity.

1 Introduction

The problem of group testing is the problem of identifying, or in some cases examining the properties of, a small number of items known as defective items within a pile of elements using group tests. Let $X$ be a set of $n$ items, and let $I\subseteq X$ be the set of defective items. A group test is a subset $Q\subseteq X$ of items. The result of the test $Q$ with respect to $I$ is defined by $Q(I):=1$ if $Q\cap I\neq\emptyset$ and $Q(I):=0$ otherwise. While the defective set $I$ is unknown to the algorithm, in many cases we might be interested in finding the size of the defective set $|I|$, or at least an estimation of that value, with a minimum number of tests.
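As a concrete illustration of these definitions (our own sketch, not part of the original paper), a group test and its answer can be written in a few lines of Python:

# A group test Q answers 1 iff it contains at least one defective item.
def test_result(Q, I):
    """Answer Q(I) of the test Q with respect to the defective set I."""
    return 1 if set(Q) & set(I) else 0

# Example: items 1..8 with defective set I = {3, 7}.
I = {3, 7}
print(test_result({1, 2, 4}, I))  # prints 0: the test contains no defective item
print(test_result({2, 3, 5}, I))  # prints 1: the test contains the defective item 3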

Group testing was originally proposed as a potential solution for economising mass blood testing during WWII [11]. Since then, the group testing approach has been applied in a wide range of practical applications including DNA library screening [20], product testing for quality control [22], file searching in storage systems [17], sequential screening of experimental variables [18], efficient contention algorithms for MAC [17, 26], data compression [16], and computations in the data stream model [8]. Recently, during the COVID-19 pandemic outbreak, a number of studies adopted the group testing paradigm not only to accelerate mass testing, but also to dramatically reduce the number of kits required for testing due to severe shortages in the testing kit supply [27, 14, 19].

While up-front knowledge of the value of $d$, or at least an upper bound on it, is required by many of the algorithms aimed at identifying the defective items, estimating or finding the number of defectives is an interesting problem on its own. Defectives estimation via group testing has been applied widely in biological and medical applications [7, 23, 24, 25, 13]. In [24], for example, group testing algorithms are used to estimate the proportion of aster-yellow virus transmitters over the organisms in a natural population of leafhoppers. Similarly, in [25], the authors estimate the infection rate of the yellow-fever virus in a mosquito population using group testing methods. On the other hand, in [13], group-testing-based estimation of rare disease prevalence is employed not only for its effectiveness but also because it naturally preserves the individual anonymity of the subjects.

Algorithms dedicated to this task might operate in stages or rounds. In each round, the tests are defined in advance and executed in a single parallel step. Tests in a given round may depend on the test results of the preceding rounds. A single-round algorithm is called a non-adaptive algorithm, while a multi-round algorithm is called an adaptive algorithm.

In recent years, there has been an increasing interest in the problem of estimating the number of defective items via group testing [3, 2, 7, 5, 9, 10, 12, 21]. The goal in some of these papers is to find an estimate $\hat{d}$ that satisfies $(1-\epsilon)d\leq\hat{d}\leq(1+\epsilon)d$ for some $\epsilon<1$. For randomized adaptive algorithms we have the following results. Falahatgar et al. [12] give a randomised adaptive algorithm that estimates $d$ using $2\log\log d+O((1/\epsilon^{2})\log(1/\delta))$ queries in expectation, where $\delta$ is the failure probability of the algorithm. Bshouty et al. [3] improved this result and gave an algorithm that uses $(1-\delta)\log\log d+O((1/\epsilon^{2})\log(1/\delta))$ queries in expectation. Moreover, they proved a lower bound of $(1-\delta)\log\log d+\Omega((1/\epsilon)\log(1/\delta))$ queries.

For randomized non-adaptive algorithms with a constant-factor estimation, Damaschke and Sheikh Muhammad give in [10] a randomized non-adaptive algorithm that makes $O((\log(1/\delta))\log n)$ tests, and in [2] Bshouty gives the lower bound $\Omega(\log n/\log\log n)$.

In this paper, we are interested in deterministic adaptive and non-adaptive algorithms that estimate the size $d$ of the defective items set up to a multiplicative factor of $\Delta>1$. Formally, let $|I|:=d$ and let $D\geq d$. We say that a deterministic algorithm ${\cal A}$ estimates $d$ up to a multiplicative factor of $\Delta$ if, given $D$ as an input to the algorithm, it evaluates an estimation $\hat{d}$ such that $d/\Delta\leq\hat{d}\leq d\Delta$. Bshouty et al. show in [3] that, if no upper bound $D$ is given to the algorithm, then any deterministic adaptive algorithm (and therefore also any non-adaptive algorithm) for this problem must make at least $\Omega(n)$ tests. This is equivalent to testing all the items. This justifies the fact that any non-trivial efficient algorithm must have some upper bound $D$ for $d$.

Agarwal et al. [1] consider this problem. They first give a lower bound of $\Omega\left((D/\Delta^{2})\log(n/D)\right)$ queries for any non-adaptive deterministic algorithm. Moreover, using a non-constructive proof, they give an upper bound of $O\left(((\log D)/(\log\Delta))D\log n\right)$ queries.

We further investigate this problem. We give new lower and upper bounds on the number of tests required by both adaptive and non-adaptive deterministic algorithms. For the adaptive deterministic setting, our results show that any algorithm for estimating the number of defectives up to a multiplicative factor of $\Delta$ must make at least $\Omega\left((D/\Delta^{2})\log(n/D)\right)$ tests. This extends the same lower bound achieved in [1] for non-adaptive algorithms. Furthermore, we give a polynomial time adaptive algorithm that shows that our bound is tight up to a small additive term.

For non-adaptive algorithms, we achieve an upper bound of $O\left((D/\Delta^{2})(\log(n/D)+\log\Delta)\right)$ by means of a non-constructive proof. This improves the upper bound $O\left(((\log D)/(\log\Delta))D\log n\right)$ from [1], and matches the lower bound up to a small additive term.

We then study polynomial time constructive algorithms. For this task, we use existing polynomial time constructible expander regular bipartite graphs, extractors and condensers to construct two polynomial time algorithms. The first algorithm makes $O((D^{1+o(1)}/\Delta^{2})\cdot\log n)$ tests, and the second makes $(D/\Delta^{2})\cdot{\rm quasipoly}(\log n)$ tests. To the best of our knowledge, this is the first explicit construction with an almost optimal test complexity. Our results are summarised in Table 1.

Bounds   | Adaptive/Non-Adapt. | Result | Explicit/Non-Expl. | Ref.
Lower B. | Non-Adapt. | $\frac{D}{\Delta^{2}}\log\frac{n}{D}$ | - | [1]
Lower B. | Adaptive | $\frac{D}{\Delta^{2}}\log\frac{n}{D}$ | - | Ours
Upper B. | Adaptive | $\frac{D}{\Delta^{2}}\left(\log\frac{n}{D}+\log\Delta\right)$ | Explicit | Ours
Upper B. | Non-Adapt. | $\frac{\log D}{\log\Delta}D\log n$ | Non-Expl. | [1]
Upper B. | Non-Adapt. | $\frac{D}{\Delta^{2}}\left(\log\frac{n}{D}+\log\Delta\right)$ | Non-Expl. | Ours
Upper B. | Non-Adapt. | $\frac{D^{1+o(1)}}{\Delta^{2}}\log n$ | Explicit* | Ours
Upper B. | Non-Adapt. | $\frac{D}{\Delta^{2}}\cdot{\rm Quasipoly}(\log n)$ | Explicit | Ours

* This result holds for $\Delta>C$ for some constant $C$. See Section 6.2.

Table 1: Upper and lower bounds on the number of tests required for estimating defectives in deterministic group testing.

2 Definitions and Preliminary Results

In this section, we give some notation and definitions that will be used throughout the paper.

Let $X=[n]:=\{1,\ldots,n\}$ be a set of items. Let $I\subseteq X$ be a set of defective items, and let $d$ denote its size, i.e., $d=|I|$. In the group testing setting, a test is a subset $Q\subseteq X$ of items. The answer to a test $Q$ with respect to the defective items set $I$ is denoted by $Q(I)$, where $Q(I):=1$ if $Q\cap I\neq\emptyset$ and $0$ otherwise. We denote by ${\cal O}_{I}$ an oracle that, for a test $Q$, returns $Q(I)$.

Let ${\cal A}$ be an algorithm with access to ${\cal O}_{I}$, and let $d=|I|$. We say that the algorithm ${\cal A}$ estimates $d$ up to a multiplicative factor of $\Delta$ if ${\cal A}$ gets as input an upper bound $D\geq d$ and a parameter $\Delta>1$, and outputs $\hat{d}$ such that $d/\Delta\leq\hat{d}\leq d\Delta$. We say that ${\cal A}$ is an adaptive algorithm if its queries depend on the results of previous queries, and non-adaptive if its queries are independent of previous ones and can therefore be executed in a single parallel step. We may assume that $D\geq\Delta^{2}$; otherwise, the algorithm trivially outputs $\hat{d}=D/\Delta$. We note here that $\Delta\geq 1+\Omega(1)$, that is, it is greater than a constant that is greater than $1$, and it may depend on $n$ and/or $D$ (for example, $\Delta=\log\log n+\log D$). This is implicit in [1] and is also assumed in this paper. It is also interesting to investigate this problem when $\Delta=1+o(1)$, where $o(\cdot)$ is with respect to $D$ and/or $n$.

We will use the following

Lemma 1.

Chernoff's Bound. Let $X_{1},\ldots,X_{m}$ be independent random variables taking values in $\{0,1\}$. Let $X=\sum_{i=1}^{m}X_{i}$ denote their sum and $\mu={\bf E}[X]$ denote its expected value. Then

$${\bf Pr}[X>(1+\lambda)\mu]\leq\left(\frac{e^{\lambda}}{(1+\lambda)^{(1+\lambda)}}\right)^{\mu}\leq e^{-\frac{\lambda^{2}\mu}{2+\lambda}}\leq\begin{cases}e^{-\frac{\lambda^{2}\mu}{3}}&\mbox{if }0<\lambda\leq 1\\ e^{-\frac{\lambda\mu}{3}}&\mbox{if }\lambda>1\end{cases}.\qquad(1)$$

In particular,

$${\bf Pr}[X>\Lambda]\leq\left(\frac{e\mu}{\Lambda}\right)^{\Lambda}.\qquad(2)$$

For $0\leq\lambda\leq 1$ we have

$${\bf Pr}[X<(1-\lambda)\mu]\leq\left(\frac{e^{-\lambda}}{(1-\lambda)^{(1-\lambda)}}\right)^{\mu}\leq e^{-\frac{\lambda^{2}\mu}{2}}.\qquad(3)$$

Moreover, we will often use the inequality

$$\left(\frac{n}{k}\right)^{k}\leq{n\choose k}\leq\sum_{i=0}^{k}{n\choose i}\leq\left(\frac{en}{k}\right)^{k}.\qquad(4)$$
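As a quick numerical sanity check of inequality (4) (an illustration we add here, not part of the original text), the following Python snippet verifies the chain of bounds for a few values of $n$ and $k$:

from math import comb, e

# Check (n/k)^k <= C(n,k) <= sum_{i=0}^{k} C(n,i) <= (e*n/k)^k for sample values.
for n, k in [(100, 5), (1000, 20), (10**6, 50)]:
    lower = (n / k) ** k
    binom = comb(n, k)
    partial_sum = sum(comb(n, i) for i in range(k + 1))
    upper = (e * n / k) ** k
    assert lower <= binom <= partial_sum <= upper
    print(f"n={n}, k={k}: inequality (4) holds")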

3 Upper Bound for Non-Adaptive Deterministic Algorithms

In this section, we give the upper bound for deterministic non-adaptive algorithms that estimate $d$ up to a multiplicative factor of $\Delta$. We prove:

Theorem 2.

Let $D$ be some upper bound on the number of defective items $d$, and let $\Delta>1$. Then, there is a deterministic non-adaptive algorithm that makes

$$O\left(\frac{D}{\Delta^{2}}\left(\log\frac{n}{D}+\log\Delta\right)\right)$$

tests and outputs $\hat{d}$ such that $\frac{d}{\Delta}\leq\hat{d}\leq d\Delta$.

To prove the Theorem we need the following:

Lemma 3.

Let $\Delta>1$ and $\ell\geq 2\Delta^{2}$. There is a non-adaptive deterministic algorithm that makes

$$t=O\left(\frac{\ell}{\Delta^{2}}\left(\log\frac{n}{\ell}+\log\Delta\right)\right)$$

tests such that:

1. If the number of defectives $d$ is less than $\ell/\Delta^{2}$, it outputs $0$.

2. If it is greater than $\ell/\Delta$, it outputs $1$.

Proof.

We choose a constant $c$ such that $(1-\Delta^{2}/(c\ell))^{\ell/\Delta^{2}}=1/e$. Note that

$$\left(1-\frac{\Delta^{2}}{2\ell}\right)^{\ell/\Delta^{2}}\geq 1-\frac{\Delta^{2}\ell}{2\ell\Delta^{2}}=\frac{1}{2}>\frac{1}{e}$$

and

$$\left(1-\frac{2\Delta^{2}}{\ell}\right)^{\ell/\Delta^{2}}=\left(\left(1-\frac{2\Delta^{2}}{\ell}\right)^{\frac{\ell}{2\Delta^{2}}}\right)^{2}\leq\frac{1}{e^{2}}<\frac{1}{e}.$$

Therefore, such a $c$ exists and we have $1/2\leq c\leq 2$.

Consider a test $Q\subseteq[n]$ chosen at random, where each item $i\in[n]$ is chosen to be in $Q$ with probability $\Delta^{2}/(c\ell)$. Let $I$ be the set of defective items with $|I|=d$, and let $Q(I)$ be the result of the test $Q$ with respect to the set $I$. Then,

$${\bf Pr}[Q(I)=0]=\left(1-\frac{\Delta^{2}}{c\ell}\right)^{d}.\qquad(5)$$

If $d\leq\ell/\Delta^{2}$, then

$${\bf Pr}[Q(I)=0]\geq\left(1-\frac{\Delta^{2}}{c\ell}\right)^{\ell/\Delta^{2}}=e^{-1},\qquad(6)$$

if $d=2\ell/\Delta^{2}$, then

$${\bf Pr}[Q(I)=0]=\left(1-\frac{\Delta^{2}}{c\ell}\right)^{2\ell/\Delta^{2}}=e^{-2},\qquad(7)$$

and if $d=\ell/\Delta$, we get

$${\bf Pr}[Q(I)=0]=\left(\left(1-\frac{\Delta^{2}}{c\ell}\right)^{\frac{\ell}{\Delta^{2}}}\right)^{\Delta}=e^{-\Delta}.\qquad(8)$$

Let $Q_{1},Q_{2},\ldots,Q_{t}$ be a sequence of $t$ i.i.d. tests such that

$$t=\frac{c^{\prime}\ell}{(\Delta-1)^{2}}\ln\frac{c^{\prime\prime}\Delta^{2}n}{\ell},$$

where $c^{\prime}=54e^{2}$ and $c^{\prime\prime}=4e$. Let

$$\eta=e^{-1}\left(\frac{1}{2}+\frac{1}{2\Delta}\right).$$

Consider the following two events:

1. $A$: There is a set of defectives $I$ of size $|I|\leq\ell/\Delta^{2}$ such that the number of tests with answer $0$ is less than $\eta t$.

2. $B$: There is a set of defectives $J$ of size $|J|>\ell/\Delta$ such that the number of tests with answer $0$ is at least $\eta t$.

Notice that, to prove the lemma, it is enough to prove that ${\bf Pr}[A\vee B]<1$. We will show that ${\bf Pr}[A],{\bf Pr}[B]<1/2$.

Let $X_{1},\ldots,X_{t}$ be random variables such that $X_{i}=1$ if and only if $Q_{i}(I)=0$. Let $X$ be the number of tests that yield the result $0$. Therefore, $X=\sum_{i=1}^{t}X_{i}$, and define $\mu:={\bf E}[X]$.

If $|I|=d\leq\ell/\Delta^{2}$, then $\mu=t\cdot{\bf E}[X_{i}]=t\cdot{\bf Pr}[X_{i}=1]$. By (6) we have

$$\mu={\bf E}[X]\geq t\cdot e^{-1}.\qquad(9)$$

By (3) in Lemma 1, for $\lambda=1/2-1/(2\Delta)$ we have

$${\bf Pr}[X\leq\eta t]={\bf Pr}[X\leq(1-\lambda)te^{-1}]\leq{\bf Pr}[X\leq(1-\lambda)\mu]\leq e^{-\frac{\lambda^{2}\mu}{2}}\leq e^{-\frac{(1-\Delta^{-1})^{2}t}{8e}}.$$

Using this result, inequality (4) and the union bound, we can conclude that

$${\bf Pr}[A]\leq\left(\sum_{i=0}^{\ell/\Delta^{2}}{n\choose i}\right)e^{-\frac{(1-\Delta^{-1})^{2}t}{8e}}\leq\left(\frac{e\Delta^{2}n}{\ell}\right)^{\frac{\ell}{\Delta^{2}}}e^{-\frac{(1-\Delta^{-1})^{2}t}{8e}}=\left(\frac{e\Delta^{2}n}{\ell}\right)^{\frac{\ell}{\Delta^{2}}}e^{-\frac{c^{\prime}\ell}{8e\Delta^{2}}\ln\frac{c^{\prime\prime}\Delta^{2}n}{\ell}}=\left(\frac{e\Delta^{2}n}{\ell}\right)^{\frac{\ell}{\Delta^{2}}}\left(\frac{c^{\prime\prime}\Delta^{2}n}{\ell}\right)^{-\frac{c^{\prime}\ell}{8e\Delta^{2}}}<\frac{1}{2}.$$

On the other hand, for the event $B$, we have two cases.

Case I. $1<\Delta\leq 2$.

If there is a set of defectives $J$ of size $|J|>\ell/\Delta$ such that more than $\eta t$ of the tests yield the answer $0$, then there is a set of defectives $J^{\prime}$ of size $|J^{\prime}|=\ell/\Delta$ such that more than $\eta t$ of the test answers are $0$. Denote by $B^{\prime}$ the latter event. Then, by (8) we have $\mu={\bf E}[X]=e^{-\Delta}t$, and for $\lambda=(e^{\Delta-1}-1)/2\geq(\Delta-1)/2$ and $\eta^{\prime}=(e^{-1}+e^{-\Delta})/2\leq\eta$ we get

$${\bf Pr}[B]\leq{\bf Pr}[B^{\prime}]\leq{n\choose\ell/\Delta}{\bf Pr}\left[X\geq\eta t\right]\leq{n\choose\ell/\Delta}{\bf Pr}\left[X\geq\eta^{\prime}t\right]={n\choose\ell/\Delta}{\bf Pr}\left[X\geq(1+\lambda)\mu\right]\leq\left(\frac{e\Delta n}{\ell}\right)^{\frac{\ell}{\Delta}}{\bf Pr}\left[X\geq(1+\lambda)\mu\right].$$

If $1<\Delta\leq 2$ then $0\leq\lambda\leq 1$, and by (1) in Lemma 1, we have

$$\left(\frac{e\Delta n}{\ell}\right)^{\frac{\ell}{\Delta}}{\bf Pr}\left[X\geq(1+\lambda)\mu\right]\leq\left(\frac{e\Delta n}{\ell}\right)^{\frac{\ell}{\Delta}}e^{-\lambda^{2}\mu/3}\leq\left(\frac{e\Delta n}{\ell}\right)^{\frac{\ell}{\Delta}}e^{-(\Delta-1)^{2}\mu/12}=\left(\frac{e\Delta n}{\ell}\right)^{\frac{\ell}{\Delta}}e^{-(\Delta-1)^{2}e^{-\Delta}t/12}=\left(\frac{e\Delta n}{\ell}\right)^{\frac{\ell}{\Delta}}\left(\frac{c^{\prime\prime}\Delta^{2}n}{\ell}\right)^{-c^{\prime}\ell e^{-\Delta}/12}\leq\left(\frac{2en}{\ell}\right)^{\ell}\left(\frac{c^{\prime\prime}n}{\ell}\right)^{-(c^{\prime}e^{-2}/12)\ell}<\frac{1}{2},$$

where the second inequality uses $\lambda\geq(\Delta-1)/2$ and the last line uses $1<\Delta\leq 2$.

Case II. $\Delta>2$.

In this case we have $\ell/\Delta>2\ell/\Delta^{2}$. Therefore, if there is a set of defectives $J$ of size $|J|>\ell/\Delta$ such that more than $\eta t$ of the tests yield the answer $0$, then there is a set of defectives $J^{\prime}$ of size $|J^{\prime}|=2\ell/\Delta^{2}$ such that more than $\eta t$ of the test answers are $0$. Denote by $B^{\prime\prime}$ the latter event. By (7), $\mu={\bf E}[X]=e^{-2}t$. Let $\lambda=1/3-1/(3\Delta)<1$. Then $\eta t>(1+\lambda)\mu$. By (1) in Lemma 1, we have

$${\bf Pr}[X\geq\eta t]\leq{\bf Pr}[X\geq(1+\lambda)\mu]\leq e^{-\frac{\lambda^{2}\mu}{3}}\leq e^{-\frac{(1-\Delta^{-1})^{2}t}{27e^{2}}}.$$

Then

$${\bf Pr}[B]\leq{\bf Pr}[B^{\prime\prime}]\leq{n\choose 2\ell/\Delta^{2}}e^{-\frac{(1-\Delta^{-1})^{2}t}{27e^{2}}}\leq\left(\frac{e\Delta^{2}n}{2\ell}\right)^{\frac{2\ell}{\Delta^{2}}}e^{-\frac{(1-\Delta^{-1})^{2}t}{27e^{2}}}=\left(\frac{e\Delta^{2}n}{2\ell}\right)^{\frac{2\ell}{\Delta^{2}}}e^{-\frac{c^{\prime}\ell}{27e^{2}\Delta^{2}}\ln\frac{c^{\prime\prime}\Delta^{2}n}{\ell}}=\left(\frac{e\Delta^{2}n}{2\ell}\right)^{\frac{2\ell}{\Delta^{2}}}\left(\frac{c^{\prime\prime}\Delta^{2}n}{\ell}\right)^{-\frac{c^{\prime}\ell}{27e^{2}\Delta^{2}}}<\frac{1}{2}.$$

∎
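The argument above is probabilistic and non-constructive: it only shows that a suitable fixed family of tests exists. For intuition, the following Python sketch (ours, with the constant $c$ simply set to $1$) samples one random test family of the kind used in the proof and applies the threshold rule with $\eta=e^{-1}(1/2+1/(2\Delta))$; the constants $c^{\prime}=54e^{2}$ and $c^{\prime\prime}=4e$ are the conservative ones from the proof, so the resulting number of tests is large.

import math
import random

def sample_tests(n, ell, Delta, seed=0):
    """Sample a random test family as in the proof of Lemma 3 (illustrative only)."""
    rng = random.Random(seed)
    p = min(1.0, Delta**2 / ell)                 # inclusion probability, c taken as 1
    c1, c2 = 54 * math.e**2, 4 * math.e          # the constants c' and c'' of the proof
    t = math.ceil(c1 * ell / (Delta - 1)**2 * math.log(c2 * Delta**2 * n / ell))
    return [{i for i in range(1, n + 1) if rng.random() < p} for _ in range(t)]

def threshold_answer(tests, I, Delta):
    """Output 0 ('few defectives') iff at least eta*t of the tests answer 0."""
    eta = math.exp(-1) * (0.5 + 0.5 / Delta)
    zero_answers = sum(1 for Q in tests if not (Q & I))
    return 0 if zero_answers >= eta * len(tests) else 1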

We are now ready to prove Theorem 2.

Let ${\cal A}(\ell,\Delta)$ be the algorithm from Lemma 3. Then, ${\cal A}(\ell,\Delta)$ makes at most

$$\frac{c\ell}{\Delta^{2}}\log\frac{\Delta n}{\ell}\qquad(10)$$

queries for some constant $c$, and

1. If ${\cal A}(\ell,\Delta)=1$, then $d\geq\frac{\ell}{\Delta^{2}}$.

2. If ${\cal A}(\ell,\Delta)=0$, then $d\leq\frac{\ell}{\Delta}$.

Consider the algorithm ${\cal T}(n,D,\Delta)$ that runs ${\cal A}(D/\Delta^{i},\Delta)$ for all $i=0,\ldots,\lceil\log D/\log\Delta\rceil$. Let $r$ be the minimum integer such that ${\cal A}(D/\Delta^{r},\Delta)=1$. Algorithm ${\cal T}(n,D,\Delta)$ then outputs $\hat{d}=D/\Delta^{r+1}$. See algorithm ${\cal T}$ in Figure 1.

${\cal T}(n,D,\Delta)$
1) $r\leftarrow 0$.
2) For each $i=0,1,\ldots,\lceil\log D/\log\Delta\rceil$ do:
   2.1) $R\leftarrow{\cal A}(D/\Delta^{i},\Delta)$.
   2.2) If ($R=1$) then
          $r\leftarrow i$
          $\hat{d}\leftarrow D/\Delta^{r+1}$
          Output ($\hat{d}$).

Figure 1: Algorithm ${\cal T}$
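The following short Python sketch (ours) mirrors Figure 1: it queries a tester with the guarantee of Lemma 3 on the geometric sequence of thresholds $D,D/\Delta,D/\Delta^{2},\ldots$ and outputs $D/\Delta^{r+1}$ for the first level $r$ that answers $1$. The argument run_A is a placeholder for any such tester (e.g., a tester of the kind sketched after the proof of Lemma 3).

import math

def algorithm_T(run_A, D, Delta):
    """Sketch of algorithm T (Figure 1). run_A(ell, Delta) is assumed to return 1
    only when d >= ell/Delta^2 and to return 0 only when d <= ell/Delta (Lemma 3)."""
    q = math.ceil(math.log(D) / math.log(Delta))
    for i in range(q + 1):
        if run_A(D / Delta**i, Delta) == 1:
            return D / Delta**(i + 1)   # first level that answers 1
    return 0  # reached only when there are no defective items at all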

We now prove:

Lemma 4.

Algorithm ${\cal T}(n,D,\Delta)$ is a deterministic non-adaptive algorithm that makes

$$O\left(\frac{D}{\Delta^{2}}\log\left(\frac{\Delta n}{D}\right)\right)$$

tests and outputs $\hat{d}$ that satisfies

$$\frac{d}{\Delta}\leq\hat{d}\leq\Delta d.$$
Proof.

For $i=0$, if ${\cal A}(D/\Delta^{i},\Delta)=1$ then $d\geq D/\Delta^{2}$. Then $\hat{d}=D/\Delta\leq\Delta d$, and since $D\geq d$ we also have $\hat{d}=D/\Delta\geq d/\Delta$.

For $i>0$, if ${\cal A}(D/\Delta^{i-1},\Delta)=0$ and ${\cal A}(D/\Delta^{i},\Delta)=1$, then $d\leq D/\Delta^{i}$ and $d\geq D/\Delta^{i+2}$. Then $\hat{d}=D/\Delta^{i+1}\leq\Delta d$ and $\hat{d}\geq d/\Delta$.

Let $q=\lceil\log D/\log\Delta\rceil$, and let $t$ denote the number of queries performed by algorithm ${\cal T}(n,D,\Delta)$. By (10), the number of tests is at most

$$\sum_{i=0}^{q}\frac{cD}{\Delta^{i}\Delta^{2}}\log\frac{n\Delta^{i+1}}{D}\leq\frac{cD}{\Delta^{2}}\sum_{i=0}^{\infty}\frac{1}{\Delta^{i}}\log\frac{n\Delta^{i+1}}{D}=\frac{cD}{\Delta^{2}}\left(\left(\log\frac{n}{D}\right)\sum_{i=0}^{\infty}\frac{1}{\Delta^{i}}+(\log\Delta)\sum_{i=0}^{\infty}\frac{i+1}{\Delta^{i}}\right)\leq\frac{cD}{\Delta^{2}}\left(\frac{\Delta}{\Delta-1}\log\frac{n}{D}+\frac{\Delta^{2}}{(\Delta-1)^{2}}\log\Delta\right).$$

For the case when $\Delta=1+\Theta(1)$ we get

$$t=O\left(D\log\frac{n}{D}\right),$$

and for the case when $\Delta=\omega(1)$ we get

$$t=O\left(\frac{D}{\Delta^{2}}\left(\log\frac{n}{D}+\log\Delta\right)\right).$$

∎
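For completeness, the two series evaluations used in the last step of the proof above are the standard geometric-series identities with $x=1/\Delta<1$:

$$\sum_{i=0}^{\infty}\frac{1}{\Delta^{i}}=\frac{1}{1-1/\Delta}=\frac{\Delta}{\Delta-1},\qquad\sum_{i=0}^{\infty}\frac{i+1}{\Delta^{i}}=\frac{1}{(1-1/\Delta)^{2}}=\frac{\Delta^{2}}{(\Delta-1)^{2}}.$$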

4 Lower Bound for Adaptive Deterministic Algorithm

In this section, we prove the following lower bound.

Theorem 5.

Any deterministic adaptive group testing algorithm that, given $D>d$, outputs $\hat{d}$ satisfying $d/\Delta\leq\hat{d}\leq\Delta d$ must make at least

$$\Omega\left(\frac{D}{\Delta^{2}}\log\frac{n}{D}\right)$$

queries.

For the proof, we use the following from [3].

Lemma 6.

Let $A$ be a deterministic adaptive algorithm that, for a defective set $I\subseteq[n]$, makes the tests $T^{I}_{1},T^{I}_{2},\ldots,T^{I}_{w(I)}$, and let $s(I)$ be the sequence of answers to these tests. If $M=|\{s(I)\,|\,I\subseteq[n]\}|$, then the test complexity of $A$ is $\max_{I}w(I)\geq\log M$.

The following lemma will assist us in proving Theorem 5.

Lemma 7.

Any deterministic adaptive algorithm such that, if the number of defectives $d$ is less than or equal to $d_{1}$ it outputs $0$, and if it is greater than $d_{2}$ it outputs $1$, must make

$$\Omega\left(d_{1}\log\frac{n}{d_{2}}\right)$$

tests.

In particular, when $d_{1}=\ell/\Delta^{2}$ and $d_{2}=\ell/\Delta$ we get

$$\Omega\left(\frac{\ell}{\Delta^{2}}\left(\log\frac{n}{\ell}+\log\Delta\right)\right)$$

tests.

Proof.

Let $A$ be such an algorithm. Let $s(I)$ be the sequence of answers to the tests of $A$ when the set of defective items is $I$. Consider a set $I$ of size $d_{1}$ and let ${\cal J}=\{J\subseteq[n]:|J|=d_{1},\ s(J)=s(I)\}$. Let $I^{\prime}=\cup_{J\in{\cal J}}J$. We claim that $s(I^{\prime})=s(I)$. Suppose, for the contrary, that $s(I^{\prime})\neq s(I)$. Then, since $I\subseteq I^{\prime}$, there is a test $Q\subseteq[n]$ asked by $A$ that gives answer $0$ for $I$ and $1$ for $I^{\prime}$. Since $I^{\prime}\cap Q\neq\emptyset$, there is a subset $J^{\prime}\in{\cal J}$ such that $J^{\prime}\cap Q\neq\emptyset$, and therefore $Q$ gives answer $1$ for $J^{\prime}$. Then $s(J^{\prime})\neq s(I)$ and we get a contradiction.

Since $s(I^{\prime})=s(I)$ and algorithm $A$ outputs $0$ for $I$, it also outputs $0$ for $I^{\prime}$. Therefore, $|I^{\prime}|\leq d_{2}$, and thus $|{\cal J}|\leq N:={d_{2}\choose d_{1}}$. That is, for every possible sequence of answers $s^{\prime}$ of the algorithm $A$, there are at most $N$ sets of size $d_{1}$ that get the same sequence of answers. Since there are $L:={n\choose d_{1}}$ such sets, the number of different sequences of answers that $A$ might have must be at least $L/N$. By Lemma 6, the number of tests that the algorithm makes is at least

$$\log\frac{{n\choose d_{1}}}{{d_{2}\choose d_{1}}}\geq\log\left(\frac{n}{ed_{2}}\right)^{d_{1}}=\Omega\left(d_{1}\log\frac{n}{d_{2}}\right).$$

∎
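For completeness, the first inequality in the display above follows from inequality (4):

$${n\choose d_{1}}\geq\left(\frac{n}{d_{1}}\right)^{d_{1}},\qquad{d_{2}\choose d_{1}}\leq\left(\frac{ed_{2}}{d_{1}}\right)^{d_{1}},\qquad\mbox{and therefore}\qquad\frac{{n\choose d_{1}}}{{d_{2}\choose d_{1}}}\geq\left(\frac{n}{ed_{2}}\right)^{d_{1}}.$$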

The conclusions established by Lemma 7 show that the upper bound from Lemma 3 is tight. Moreover, using these results, we provide the following proof of Theorem 5.

Proof.

Let $d_{1}=D/\Delta^{2}-1$ and $d_{2}=D$. For sets of size less than or equal to $d_{1}$ the algorithm returns $d_{1}/\Delta\leq\hat{d}\leq\Delta d_{1}$, and for sets of size $d_{2}$ the algorithm returns $d_{2}/\Delta\leq\hat{d}\leq\Delta d_{2}$. Since $\Delta d_{1}<d_{2}/\Delta$, the two intervals are disjoint. So, the algorithm can distinguish between sets of size at most $d_{1}$ and sets of size $d_{2}$. By Lemma 7, the algorithm must make at least

$$\Omega\left(\frac{D}{\Delta^{2}}\log\frac{n}{D}\right)$$

tests. ∎

5 Polynomial Time Adaptive Algorithm

In this section, we prove:

Theorem 8.

Let $D$ be some upper bound on the number of defective items $d$, and let $\Delta>1$. Then, there is a linear time deterministic adaptive algorithm that makes

$$O\left(\frac{D}{\Delta^{2}}\left(\log\frac{n}{D}+\log\Delta\right)\right)$$

tests and outputs $\hat{d}$ such that $\frac{d}{\Delta}\leq\hat{d}\leq d\Delta$.

We first describe the algorithm. The algorithm gets as input the set of items $X=[n]$ and splits it into two equally-sized disjoint sets $Q_{1}$ and $Q_{2}$. The algorithm asks the queries defined by $Q_{1}$ and $Q_{2}$ and proceeds with the splitting process on the sets that yielded positive answers only. We call these sets defective sets. As long as the algorithm has at most $D/\Delta^{2}$ distinct defective sets, it continues to split and test. Two cases can happen. Either the number of defective sets exceeds $D/\Delta^{2}$, in which case the algorithm outputs $\hat{d}=|Q|\cdot\Delta$, where $Q$ is the current collection of defective sets, or the number of defective sets never exceeds $D/\Delta^{2}$, in which case the algorithm finds all the defective items and returns their exact number. The algorithm is given in Figure 2. The algorithm invokes the procedure ${\bf Split}(X)$ that, on input $X=\{a_{1},a_{2},\ldots,a_{n}\}$, returns the set $W:=\{X_{1},X_{2}\}$ where $X_{1}=\{a_{1},a_{2},\ldots,a_{\lfloor n/2\rfloor}\}$ and $X_{2}=\{a_{\lfloor n/2\rfloor+1},\ldots,a_{n}\}$ if $|X|\geq 2$, and $W:=\{X\}$ otherwise.

Adaptive-dEstimate $({\cal O}_{I},X,\Delta,D)$
1) $Q\leftarrow\{X\}$, $S\leftarrow\emptyset$.
2) While ($|Q|\leq D/\Delta^{2}$) do:
   2.1) For each $Q_{i}\in Q$:
          $\{Q_{i}^{(1)},Q_{i}^{(2)}\}\leftarrow{\bf Split}(Q_{i})$
          If ($Q_{i}^{(1)}(I)=1$) then $S\leftarrow S\cup\{Q_{i}^{(1)}\}$
          If ($Q_{i}^{(2)}(I)=1$) then $S\leftarrow S\cup\{Q_{i}^{(2)}\}$
   2.2) If $\forall S_{i}\in S,\ |S_{i}|=1$ then
          $\hat{d}\leftarrow|S|$
          Output ($\hat{d}$)
        Else $Q\leftarrow S$, $S\leftarrow\emptyset$.
3) $\hat{d}\leftarrow|Q|\cdot\Delta$.
4) Output ($\hat{d}$).

Figure 2: Algorithm Adaptive-dEstimate to estimate the number of defective items.
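As an informal companion to Figure 2 (our own sketch; the oracle interface and the names are ours), the splitting strategy can be written in Python as follows:

def adaptive_d_estimate(I, n, Delta, D):
    """Sketch of Adaptive-dEstimate: repeatedly split the defective sets and test
    both halves, stopping once more than D/Delta^2 defective sets are found."""
    def positive(Q):               # the oracle O_I: does Q contain a defective item?
        return bool(set(Q) & I)

    Q = [list(range(1, n + 1))]    # start from the whole set of items
    while len(Q) <= D / Delta**2:
        S = []
        for Qi in Q:
            half = max(1, len(Qi) // 2)
            for part in (Qi[:half], Qi[half:]):
                if part and positive(part):
                    S.append(part)
        if all(len(Si) == 1 for Si in S):
            return len(S)          # every defective item is isolated: exact count
        Q = S
    return len(Q) * Delta          # more than D/Delta^2 defective sets were found
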
Lemma 9.

Algorithm Adaptive-dEstimate is a deterministic adaptive algorithm that makes

$$2\frac{D}{\Delta^{2}}\log\frac{n\Delta^{2}}{D}=O\left(\frac{D}{\Delta^{2}}\left(\log\frac{n}{D}+\log\Delta\right)\right)$$

tests and outputs an estimation $\hat{d}$ such that

$$\frac{d}{\Delta}\leq\hat{d}\leq d\Delta.$$
Proof.

If $d\leq\frac{D}{\Delta^{2}}$, then the splitting process in step 2 of the algorithm proceeds until each defective item belongs to a distinct set. Eventually, the condition in step 2.2 is met and the algorithm outputs the exact value of $d$. If $d>D/\Delta^{2}$, then the splitting process stops when the number of defective sets $|Q|$ exceeds $D/\Delta^{2}$. The algorithm halts and outputs $\hat{d}=|Q|\Delta$. Obviously, $|Q|\leq d$. Therefore, $\hat{d}=|Q|\Delta\leq d\Delta$. Moreover, $|Q|>D/\Delta^{2}\geq d/\Delta^{2}$, which implies that $\hat{d}\geq d/\Delta$.

The number of iterations cannot exceed $\log n$. In the first $\log(D/\Delta^{2})$ iterations, in the worst case scenario, the algorithm splits each of its current sets $Q_{i}$ into two sets $Q_{i}^{(1)}$ and $Q_{i}^{(2)}$ such that $Q_{i}^{(1)}(I)=Q_{i}^{(2)}(I)=1$. Therefore, the number of tests that the algorithm asks over the first $\log(D/\Delta^{2})$ iterations is at most

$$\sum_{i=1}^{\log(D/\Delta^{2})}2^{i}\leq 2\frac{D}{\Delta^{2}}.$$

Since $|Q|\leq D/\Delta^{2}$, in the remaining $\log n-\log(D/\Delta^{2})$ iterations the algorithm makes at most $2D/\Delta^{2}$ tests per iteration. So, the total number of tests is at most

$$2\frac{D}{\Delta^{2}}\left(\log n-\log\frac{D}{\Delta^{2}}\right)+2\frac{D}{\Delta^{2}}=O\left(\frac{D}{\Delta^{2}}\log\frac{n\Delta^{2}}{D}\right).$$

∎

6 Polynomial Time Non-Adaptive Algorithm

In this section, we show how to use expanders, condensers and extractors to construct deterministic non-adaptive algorithms for estimating the number of defectives. We prove:

Theorem 10.

Let $D$ be some upper bound on the number of defective items $d$, and let $\Delta>1$. Then, there is a polynomial time deterministic non-adaptive algorithm that makes

$$\min\left(D^{o(1)},2^{\log^{3}(\log n)}\right)\cdot\frac{D}{\Delta^{2}}\log n$$

tests and outputs $\hat{d}$ such that $\frac{d}{\Delta}\leq\hat{d}\leq d\Delta$.

6.1 Algorithms Using Expanders

Let $G=G(L,R,E)$ be a bipartite graph with left vertices $L=[n]$, right vertices $R=[m]$ and edges $E\subseteq L\times R$; that is, for each edge $(i,j)\in E$, the endpoint $i$ is in $L$ and $j$ is in $R$. For a vertex $v\in L$, define $\Gamma(v)$ to be the set of neighbours of $v$ in $G$, i.e., $\Gamma(v):=\{u\in R\,|\,(v,u)\in E\}$. For a subset $S\subseteq L$, we define $\Gamma(S)$ to be the set of neighbours of $S$, namely $\Gamma(S):=\cup_{v\in S}\Gamma(v)$. For a vertex $v\in L$, the degree of $v$ is defined as $\deg(v):=|\Gamma(v)|$. We say that a bipartite graph $G=G(L,R,E)$ is a $(k,a)$-expander $\delta$-regular bipartite graph if the degree of every vertex in $L$ is $\delta$ and, for every left-subset $S\subseteq L$ of size at most $k$, we have $|\Gamma(S)|\geq a|S|$.

Lemma 11.

Let $X=[n]$ be a set of items and let $I\subseteq[n]$ be the set of defective items, where $|I|=d$ is unknown to the algorithm. Let $G=G(L,R,E)$ be a $(k,a)$-expander $\delta$-regular bipartite graph with $|L|=n$ and $|R|=m$. Then, there is a deterministic non-adaptive algorithm $A$ that, for $n$ items, makes $m$ tests and satisfies:

1. If $|I|<ak/\delta$, then $A$ outputs $0$.

2. If $|I|\geq k$, then $A$ outputs $1$.

Proof.

For every $j\in R$, we define the test $T^{(j)}=\{i\,|\,(i,j)\in E\}$. The number of tests is $|R|=m$. If $|I|\geq k$, then $|\Gamma(I)|\geq ak$. Therefore, at least $ak$ tests give the positive answer $1$. If $|I|<ak/\delta$ then, since the degree of every vertex in $L$ is $\delta$, we have $|\Gamma(I)|\leq\delta|I|<ak$. This shows that, in this case, at most $ak-1$ tests give the answer $1$. Hence, we can distinguish between the two cases. ∎
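To make the reduction concrete, here is a small Python sketch (ours) of the tester described in the proof of Lemma 11: given the neighbourhoods of a $(k,a)$-expander $\delta$-regular bipartite graph (e.g., the construction of Lemma 13, which is assumed to be supplied by the caller), it builds one test per right vertex and outputs $1$ if and only if at least $ak$ tests are positive.

def expander_tests(neighbours, m):
    """neighbours[v] is the set Gamma(v) within R = {0, ..., m-1} for each left vertex v.
    Test j contains every item v such that j is in Gamma(v)."""
    tests = [set() for _ in range(m)]
    for v, Gamma_v in neighbours.items():
        for j in Gamma_v:
            tests[j].add(v)
    return tests

def expander_answer(tests, I, a, k):
    """Output 1 iff at least a*k tests intersect the defective set I
    (which guarantees |I| >= a*k/delta); otherwise output 0 (then |I| < k)."""
    positives = sum(1 for Q in tests if Q & I)
    return 1 if positives >= a * k else 0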

Following the same proof as Lemma 4, with algorithm ${\cal T}$ in Figure 1, we have:

Lemma 12.

Let $A(\ell,\Delta)$ be a deterministic non-adaptive algorithm that, for $n$ items, makes $m(\ell,\Delta)$ tests and satisfies:

1. If $|I|<\ell/\Delta^{2}$, then $A$ outputs $0$.

2. If $|I|\geq\ell/\Delta$, then $A$ outputs $1$.

Then, there is a deterministic non-adaptive algorithm ${\cal T}$ such that, given $D>d$, for $n$ items it makes

$$\sum_{i=0}^{\lceil\log D/\log\Delta\rceil}m\left(\frac{D}{\Delta^{i}},\Delta\right)$$

tests and outputs $\hat{d}$ that satisfies $d/\Delta\leq\hat{d}\leq\Delta d$.

The parameters of the explicit construction of a $(k,a)$-expander $\delta$-regular bipartite graph from [4] are summarised in the following lemma.

Lemma 13.

For any $k>0$ and $0<\epsilon<1$, there is an explicit construction of a $(k,a)$-expander $\delta$-regular bipartite graph with

$$m=O(k\delta/\epsilon),\qquad\delta=2^{O(\log^{3}(\log n/\epsilon))},\qquad a=(1-\epsilon)\delta.$$

We now prove:

Lemma 14.

There is a polynomial time deterministic non-adaptive algorithm that makes

$$\frac{D}{\Delta^{2}}\cdot 2^{O(\log^{3}(\log n))}=\frac{D}{\Delta^{2}}\cdot\mbox{quasipoly}(\log n)$$

tests and outputs $\hat{d}$ that satisfies

$$\frac{d}{\Delta}\leq\hat{d}\leq\Delta d.$$
Proof.

We use the expander from Lemma 13. Recall that $\Delta=1+\Omega(1)$. Let $r=\min(\Delta,2)$, $\epsilon=1-1/r$ and $k=r\ell/\Delta^{2}$. Then $a=\delta/r=2^{O(\log^{3}\log n)}$ and $m=m(\ell,\Delta)=(\ell/\Delta^{2})\cdot 2^{O(\log^{3}\log n)}$. By Lemma 11, there is a deterministic non-adaptive algorithm $A$ such that, for $n$ items, it makes $m(\ell,\Delta)$ tests and

1. If $|I|<ak/\delta=\ell/\Delta^{2}$, then $A$ outputs $0$.

2. If $|I|\geq k=r\ell/\Delta^{2}$, then $A$ outputs $1$.

Algorithm $A$ trivially satisfies the first condition required by Lemma 12. Consider item 2. If $\Delta\leq 2$ then $r=\Delta$, and if $|I|\geq k=\ell/\Delta$ then $A$ outputs $1$. If $\Delta>2$ then $r=2$, and if $|I|\geq k=2\ell/\Delta^{2}$ then $A$ outputs $1$; since $2\ell/\Delta^{2}<\ell/\Delta$, it follows that if $|I|\geq\ell/\Delta$ then $A$ outputs $1$.

Now, by Lemma 12, there is a deterministic non-adaptive algorithm ${\cal T}$ such that, given $D>d$, for $n$ items it makes

$$\sum_{i=0}^{\lceil\log D/\log\Delta\rceil}m\left(\frac{D}{\Delta^{i}},\Delta\right)=\frac{D}{\Delta^{2}}\cdot 2^{O(\log^{3}(\log n))}$$

tests and outputs $\hat{d}$ that satisfies $d/\Delta\leq\hat{d}\leq\Delta d$. ∎

6.2 Algorithms Using Extractors and Condensers

Extractors are functions that convert weak random sources into almost-perfect random sources. We use these objects to construct a non-adaptive algorithm for estimating $d$. We start with some definitions.

Definition 15.

Let $X$ be a random variable over a finite set $S$. We say that $X$ has min-entropy at least $k$ if ${\bf Pr}[X=x]\leq 2^{-k}$ for all $x\in S$.

Definition 16.

Let $X$ and $Y$ be random variables over a finite set $S$. We say that $X$ and $Y$ are $\epsilon$-close if $\max_{P\subseteq S}|{\bf Pr}[X\in P]-{\bf Pr}[Y\in P]|\leq\epsilon$.

We denote by $U_{\ell}$ the uniform distribution on $\{0,1\}^{\ell}$. The notations ${\bf Pr}_{x\in B}$ and ${\bf E}_{x\in B}$ indicate that the probability and the expectation are taken over $x$ chosen uniformly at random from $B$.

Definition 17.

A function $F:\{0,1\}^{\hat{n}}\times\{0,1\}^{\hat{t}}\to\{0,1\}^{\hat{m}}$ is a $k\to_{\epsilon}k^{\prime}$ condenser if, for every $X$ with min-entropy at least $k$ and $Y$ uniformly distributed on $\{0,1\}^{\hat{t}}$, the distribution of $(Y,F(X,Y))$ is $\epsilon$-close to a distribution $(U_{\hat{t}},Z)$ with min-entropy $\hat{t}+k^{\prime}$. A condenser is called a $(k,\epsilon)$-lossless condenser if $k^{\prime}=k$. A condenser is called a $(k,\epsilon)$-extractor if $\hat{m}=k^{\prime}$.

Let $\hat{N}=\{0,1\}^{\hat{n}}$, $\hat{T}=\{0,1\}^{\hat{t}}$ and $\hat{M}=\{0,1\}^{\hat{m}}$, and let $F:\hat{N}\times\hat{T}\to\hat{M}$ be a $k\to_{\epsilon}k^{\prime}$ condenser. Consider the $2^{\hat{t}}\times 2^{\hat{n}}$ matrix ${\cal M}$ induced by $F$; that is, for $r\in\hat{T}$ and $s\in\hat{N}$, the entry ${\cal M}_{r,s}$ is equal to $F(s,r)$. For $s\in\hat{N}$, let ${\cal M}^{(s)}$ be the $s$th column of ${\cal M}$. Then, ${\cal M}^{(s)}_{r}={\cal M}_{r,s}=F(s,r)$.

Definition 18.

Let $\Sigma$ be a finite set. An $n$-mixture over $\Sigma$ is an $n$-tuple ${\cal S}:=(S_{1},\cdots,S_{n})$ such that $S_{i}\subseteq\Sigma$ for all $i\in[n]$.

Using these definitions and notations, we restate a result proved by Cheraghchi [6, Theorem 9] in the following lemma.

Lemma 19.

Let $F:\{0,1\}^{\hat{n}}\times\{0,1\}^{\hat{t}}\to\{0,1\}^{\hat{m}}$ be a $k\to_{\epsilon}k^{\prime}$ condenser, and let ${\cal M}$ be the matrix induced by $F$. Then, for any $2^{\hat{t}}$-mixture ${\cal S}=(S_{1},\cdots,S_{2^{\hat{t}}})$ over $\hat{M}:=\{0,1\}^{\hat{m}}$, the number of columns $s$ of ${\cal M}$ that satisfy

$${\bf Pr}_{r\in\hat{T}}\left[{\cal M}^{(s)}_{r}\in S_{r}\right]>\frac{{\bf E}_{r\in\hat{T}}[|S_{r}|]}{2^{k^{\prime}}}+\epsilon$$

is less than $2^{k}$.

Equipped with Lemma 19, we prove:

Lemma 20.

If there is a $k\to_{\epsilon}k^{\prime}$ condenser $F:\{0,1\}^{\hat{n}}\times\{0,1\}^{\hat{t}}\to\{0,1\}^{\hat{m}}$, then there is a deterministic non-adaptive algorithm ${\cal A}$ for $n=2^{\hat{n}}$ items that makes $m=2^{\hat{t}+\hat{m}}$ tests and satisfies the following.

1. If the number of defectives is less than $(1-\epsilon)2^{k^{\prime}}$, then ${\cal A}$ outputs $0$.

2. If the number of defectives is greater than or equal to $2^{k}+1$, then ${\cal A}$ outputs $1$.

Proof.

Consider the matrix ${\cal M}$ induced by the condenser $F$ as explained above. We define the test matrix ${\cal T}$ from ${\cal M}$ as follows. For $x\in\{0,1\}^{\hat{m}}$, define $e(x)\in\{0,1\}^{2^{\hat{m}}}$ such that $e(x)_{y}=1$ if and only if $x=y$, where the bits of $e(x)$ are indexed by the elements $y\in\{0,1\}^{\hat{m}}$. Each row $r$ of the matrix ${\cal M}$ is replaced by $2^{\hat{m}}$ rows (in ${\cal T}$), where each entry ${\cal M}_{r,s}\in\{0,1\}^{\hat{m}}$ is replaced by the column vector $e({\cal M}_{r,s})^{T}\in\{0,1\}^{2^{\hat{m}}}$. The rows of the matrix ${\cal T}$ are indexed by $\hat{T}\times\hat{M}$. Let ${\cal T}^{(i)}$ denote the $i$th column of ${\cal T}$. For $r\in\hat{T}$ and $j\in\hat{M}$, the row $(r,j)$ of the matrix ${\cal T}$ is denoted by ${\cal T}_{(r,j)}$, and the $i$th entry of the row ${\cal T}_{(r,j)}$ is denoted by ${\cal T}_{(r,j),i}$; thus ${\cal T}_{(r,j),i}={\cal T}^{(i)}_{(r,j)}=1$ if and only if ${\cal M}_{r,i}=j$. The size of the test matrix ${\cal T}$ is $m\times n$.

Let the defective elements be $s_{i_{1}},\ldots,s_{i_{\ell}}$, and let $y\in\{0,1\}^{m}$ denote the vector of test results. Then $y={\cal T}^{(s_{i_{1}})}\vee\cdots\vee{\cal T}^{(s_{i_{\ell}})}$. Let ${\cal S}=(S_{r})_{r\in\hat{T}}$ be a $2^{\hat{t}}$-mixture over $\{0,1\}^{\hat{m}}$ where, for all $r\in\hat{T}$, $S_{r}=\{j\in\{0,1\}^{\hat{m}}\,|\,y_{(r,j)}={\cal T}^{(s_{i_{1}})}_{(r,j)}\vee\cdots\vee{\cal T}^{(s_{i_{\ell}})}_{(r,j)}=1\}$. It is easy to see that:

1. $|S_{r}|\leq\ell$. This is because, by the definition of $S_{r}$, $j\in S_{r}$ if and only if $y_{(r,j)}=1$. The entry $y_{(r,j)}$ gets the value $1$ if at least one of the entries ${\cal T}_{(r,j)}^{(s_{i_{1}})},\cdots,{\cal T}_{(r,j)}^{(s_{i_{\ell}})}$ is $1$. Each of the columns ${\cal T}^{(s_{i_{1}})},\cdots,{\cal T}^{(s_{i_{\ell}})}$ has exactly one entry equal to $1$ among the $2^{\hat{m}}$ rows indexed by $r$. Hence, each defective item can cause at most one element to be inserted into $S_{r}$.

2. For any $s_{i_{j}}\in\{s_{i_{1}},\ldots,s_{i_{\ell}}\}$, we have ${\bf Pr}_{r\in\hat{T}}[{\cal M}^{(s_{i_{j}})}_{r}\in S_{r}]=1$.

3. Given the matrix ${\cal M}$, its test matrix ${\cal T}$ and the observed result $y$, the probability ${\bf Pr}_{r\in\hat{T}}[{\cal M}^{(s)}_{r}\in S_{r}]$ can be easily computed for any column $s$.

If the number of defectives is less than $(1-\epsilon)2^{k^{\prime}}$ then, by Lemma 19, all columns, except for at most $2^{k}$ of them, satisfy

$${\bf Pr}_{r\in\hat{T}}\left[{\cal M}^{(s)}_{r}\in S_{r}\right]\leq\frac{{\bf E}_{r\in\hat{T}}[|S_{r}|]}{2^{k^{\prime}}}+\epsilon<\frac{{\bf E}_{r\in\hat{T}}[(1-\epsilon)2^{k^{\prime}}]}{2^{k^{\prime}}}+\epsilon=1.$$

So, at most $2^{k}$ columns satisfy ${\bf Pr}_{r\in\hat{T}}[{\cal M}^{(s)}_{r}\in S_{r}]=1$. If the number of defectives is greater than or equal to $2^{k}+1$ then, for the columns of the defectives, we have ${\bf Pr}_{r\in\hat{T}}[{\cal M}^{(s)}_{r}\in S_{r}]=1$; so more than $2^{k}$ columns satisfy this equality. Hence, by counting the columns $s$ with ${\bf Pr}_{r\in\hat{T}}[{\cal M}^{(s)}_{r}\in S_{r}]=1$, the algorithm ${\cal A}$ can distinguish between the two cases. ∎
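The following Python sketch (ours, with a generic callable standing in for the condenser $F$) illustrates the test matrix and the decision rule of the proof: test $(r,j)$ contains item $s$ iff $F(s,r)=j$, and the algorithm outputs $1$ iff more than $2^{k}$ items $s$ satisfy $F(s,r)\in S_{r}$ for every seed $r$. Enumerating all items is fine here, since there are only $n=2^{\hat{n}}$ of them.

from itertools import product

def condenser_estimator(F, n_hat, t_hat, I, k):
    """Illustrative tester from the proof of Lemma 20.

    F(s, r) maps an item s in {0,1}^n_hat and a seed r in {0,1}^t_hat to an
    element of {0,1}^m_hat (all encoded as bit tuples); I is the defective set."""
    seeds = list(product((0, 1), repeat=t_hat))
    items = list(product((0, 1), repeat=n_hat))
    # Test answers, grouped by seed: S_r = { j : some defective s has F(s, r) = j }.
    S = {r: {F(s, r) for s in I} for r in seeds}
    # An item s is 'consistent' iff F(s, r) lands in S_r for every seed r.
    consistent = sum(1 for s in items if all(F(s, r) in S[r] for r in seeds))
    return 1 if consistent > 2**k else 0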

The following lemma summarises a state-of-the-art result due to Guruswami et al. [15] on the explicit construction of extractors.

Lemma 21.

For all positive integers $\hat{n},k$ such that $\hat{n}\geq k$, and all $\epsilon>0$, there is an explicit $(k,\epsilon)$-extractor $F:\{0,1\}^{\hat{n}}\times\{0,1\}^{\hat{t}}\to\{0,1\}^{\hat{m}}$ with $\hat{t}=\log\hat{n}+O(\log k\log(k/\epsilon))$ and $\hat{m}=k^{\prime}=k-2\log(1/\epsilon)-c$ for some constant $c$.

We now prove:

Lemma 22.

There is a constant $C$ such that, for every $\Delta>C$, there is a polynomial time deterministic non-adaptive algorithm that estimates the number of defective items in a set of $n$ items up to a multiplicative factor of $\Delta$ and asks

$$O\left(\frac{D^{1+o(1)}}{\Delta^{2}}\log n\right)$$

queries.

Proof.

We use the notation of Lemma 21. Let $C=27\cdot 2^{c-2}$. We choose $\epsilon=2/3$ and $k^{\prime}$ such that $(1-\epsilon)2^{k^{\prime}}=\ell/\Delta^{2}$. Then

$$2^{k}=2^{k^{\prime}+2\log(1/\epsilon)+c}=27\cdot 2^{c-2}\frac{\ell}{\Delta^{2}}<\frac{\ell}{\Delta}.$$

By Lemma 20, there is a deterministic non-adaptive algorithm ${\cal A}$ for $n=2^{\hat{n}}$ items that makes

$$m=2^{\hat{t}+\hat{m}}=\hat{n}\,2^{O(\log k\log(k/\epsilon))}\frac{\ell}{(1-\epsilon)\Delta^{2}}=2^{O(\log^{2}\log(\ell/\Delta))}\frac{\ell}{\Delta^{2}}\log n$$

tests and satisfies the following:

1. If the number of defectives is less than $(1-\epsilon)2^{k^{\prime}}=\ell/\Delta^{2}$, then ${\cal A}$ outputs $0$.

2. If the number of defectives is greater than or equal to $2^{k}+1$, then ${\cal A}$ outputs $1$; since $2^{k}<\ell/\Delta$, in particular, if the number of defectives is greater than or equal to $\ell/\Delta$ then ${\cal A}$ outputs $1$.

By Lemma 12, the result follows. ∎

A similar work by Capalbo et al. [4] gives an explicit construction of a lossless condenser, which is summarised in the following lemma:

Lemma 23.

For all positive integers $\hat{n},k$ and all $\epsilon>0$, there is an explicit lossless condenser $F:\{0,1\}^{\hat{n}}\times\{0,1\}^{\hat{t}}\to\{0,1\}^{\hat{m}}$ with $\hat{t}=O(\log^{3}(\hat{n}/\epsilon))$ and $\hat{m}=k+\log(1/\epsilon)+O(1)$.

The construction from Lemma 23 yields a result that is similar to the one established in Lemma 22.


References

  • [1] Abhishek Agarwal, Larkin Flodin, and Arya Mazumdar. Estimation of sparsity via simple measurements. In 2017 IEEE International Symposium on Information Theory (ISIT), pages 456–460. IEEE, 2017.
  • [2] Nader H. Bshouty. Lower bound for non-adaptive estimation of the number of defective items. In 30th International Symposium on Algorithms and Computation, ISAAC 2019, December 8-11, 2019, Shanghai University of Finance and Economics, Shanghai, China, pages 2:1–2:9, 2019.
  • [3] Nader H. Bshouty, Vivian E. Bshouty-Hurani, George Haddad, Thomas Hashem, Fadi Khoury, and Omar Sharafy. Adaptive group testing algorithms to estimate the number of defectives. CoRR, abs/1712.00615, 2017.
  • [4] Michael Capalbo, Omer Reingold, Salil Vadhan, and Avi Wigderson. Randomness conductors and constant-degree expansion beyond the degree / 2 barrier. 01 2002.
  • [5] Yongxi Cheng and Yin-Feng Xu. An efficient fpras type group testing procedure to approximate the number of defectives. Journal of Combinatorial Optimization, 27:302–314, 2014.
  • [6] Mahdi Cheraghchi. Noise-resilient group testing: Limitations and constructions. CoRR, abs/0811.2609, 2008.
  • [7] Chen CL and Swallow WH. Using group testing to estimate a proportion, and to test the binomial model. Biometrics, 46(4):1035–1046, 1990.
  • [8] Graham Cormode and S. Muthukrishnan. What’s hot and what’s not: Tracking most frequent items dynamically. ACM Trans. Database Syst., 30(1):249–278, March 2005.
  • [9] Peter Damaschke and Azam Sheikh Muhammad. Bounds for nonadaptive group tests to estimate the amount of defectives. In Weili Wu and Ovidiu Daescu, editors, Combinatorial Optimization and Applications, pages 117–130, Berlin, Heidelberg, 2010. Springer Berlin Heidelberg.
  • [10] Peter Damaschke and Azam Sheikh Muhammad. Competitive group testing and learning hidden vertex covers with minimum adaptivity. In Mirosław Kutyłowski, Witold Charatonik, and Maciej Gebala, editors, Fundamentals of Computation Theory, pages 84–95, Berlin, Heidelberg, 2009. Springer Berlin Heidelberg.
  • [11] Robert Dorfman. The detection of defective members of large populations. The Annals of Mathematical Statistics, 14(4):436–440, 1943.
  • [12] Moein Falahatgar, Ashkan Jafarpour, Alon Orlitsky, Venkatadheeraj Pichapati, and Ananda Suresh. Estimating the number of defectives with group testing. pages 1376–1380, 07 2016.
  • [13] Joseph L. Gastwirth and Patricia A. Hammick. Estimation of the prevalence of a rare disease, preserving the anonymity of the subjects by group testing: application to estimating the prevalence of aids antibodies in blood donors. Journal of Statistical Planning and Inference, 22(1):15 – 27, 1989.
  • [14] Christian Gollier and Olivier Gossner. Group testing against covid-19. Covid Economics, pages 32–42, 04 2020.
  • [15] V. Guruswami, C. Umans, and S. Vadhan. Unbalanced expanders and randomness extractors from parvaresh-vardy codes. In Twenty-Second Annual IEEE Conference on Computational Complexity (CCC’07), pages 96–108, 2007.
  • [16] E. S. Hong and R. E. Ladner. Group testing for image compression. IEEE Transactions on Image Processing, 11(8):901–911, 2002.
  • [17] W. Kautz and R. Singleton. Nonrandom binary superimposed codes. IEEE Transactions on Information Theory, 10(4):363–377, 1964.
  • [18] Chou Li. A sequential method for screening experimental variables. Journal of The American Statistical Association - J AMER STATIST ASSN, 57:455–477, 06 1962.
  • [19] Cassidy Mentus, Martin Romeo, and Christian DiPaola. Analysis and applications of adaptive group testing methods for covid-19. medRxiv, 2020.
  • [20] Hung Ngo and Ding-Zhu Du. A survey on combinatorial group testing algorithms with applications to dna library screening. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 55, 12 2000.
  • [21] Dana Ron and Gilad Tsur. The power of an example: Hidden set size approximation using group queries and conditional sampling. CoRR, abs/1404.5568, 2014.
  • [22] M. Sobel and P. A. Groll. Group testing to eliminate efficiently all defectives in a binomial sample. The Bell System Technical Journal, 38(5):1179–1252, 1959.
  • [23] William H Swallow. Group testing for estimating infection rates and probabilities of disease transmission. Phytopathology (USA), 1985.
  • [24] Keith H Thompson. Estimation of the proportion of vectors in a natural population of insects. Biometrics, 18(4):568–578, 1962.
  • [25] Stephen D. Walter, Stephen W. Hilderth, and Barry J. Beaty. Estimation of infection Rates in Populations of Organisms Using Pools of Variable Size. American Journal of Epidemiology, 112(1):124–128, 07 1980.
  • [26] J. Wolf. Born again group testing: Multiaccess communications. IEEE Transactions on Information Theory, 31(2):185–191, 1985.
  • [27] Idan Yelin, Noga Aharony, Einat Shaer-Tamar, Amir Argoetti, Esther Messer, Dina Berenbaum, Einat Shafran, Areen Kuzli, Nagam Gandali, Tamar Hashimshony, Yael Mandel-Gutfreund, Michael Halberthal, Yuval Geffen, Moran Szwarcwort-Cohen, and Roy Kishony. Evaluation of covid-19 rt-qpcr test in multi-sample pools. medRxiv, 2020.