Optimal Deterministic Group Testing Algorithms
to Estimate the Number of Defectives

Nader H. Bshouty
Dept. of Computer Science
Technion, Haifa, 32000

Catherine A. Haddad-Zaknoon
Dept. of Computer Science
Technion, Haifa, 32000

Abstract

We study the problem of estimating the number of defective items $d$ within a pile of $n$ elements up to a multiplicative factor of $\Delta>1$, using deterministic group testing algorithms. We give lower and upper bounds on the number of tests required in both the adaptive and the non-adaptive deterministic settings, given an upper bound $D$ on the number of defectives. For the adaptive deterministic setting, our results show that any algorithm for estimating the number of defectives up to a multiplicative factor of $\Delta$ must make at least $\Omega\left((D/\Delta^{2})\log(n/D)\right)$ tests. This extends the same lower bound achieved in [1] for non-adaptive algorithms. Moreover, we give a polynomial time adaptive algorithm that shows that our bound is tight up to a small additive term.

For non-adaptive algorithms, an upper bound of $O((D/\Delta^{2})(\log(n/D)+\log\Delta))$ is achieved by means of a non-constructive proof. This improves the upper bound $O(((\log D)/(\log\Delta))D\log n)$ from [1] and matches the lower bound up to a small additive term.

In addition, we study polynomial time constructive algorithms. We use existing polynomial time constructible expander regular bipartite graphs, extractors and condensers to construct two polynomial time algorithms. The first algorithm makes $O((D^{1+o(1)}/\Delta^{2})\cdot\log n)$ tests, and the second makes $(D/\Delta^{2})\cdot{\rm quasipoly}(\log n)$ tests. This is the first explicit construction with an almost optimal test complexity.

1 Introduction

The problem of group testing is the problem of identifying or, in some cases, examining the properties of a small number of items known as defective items within a pile of elements using group tests. Let $X$ be a set of $n$ items, and let $I\subseteq X$ be the set of defective items. A group test is a subset $Q\subseteq X$ of items. The result of the test $Q$ with respect to $I$ is defined by $Q(I):=1$ if $Q\cap I\neq\emptyset$ and $Q(I):=0$ otherwise. While the defective set $I$ is unknown to the algorithm, in many cases we might be interested in finding the size of the defective set $|I|$, or at least an estimate of that value, with a minimum number of tests.

Group testing was originally proposed as a potential solution for economising mass blood testing during WWII [11]. Since then, the group testing approach has been applied in a wide range of practical applications, including DNA library screening [20], quality control in product testing [22], file searching in storage systems [17], sequential screening of experimental variables [18], efficient contention algorithms for MAC [17, 26], data compression [16], and computations in the data stream model [8]. Recently, during the COVID-19 pandemic outbreak, a number of researchers adopted the group testing paradigm not only to accelerate the mass testing process, but also to dramatically reduce the number of kits required for testing due to severe shortages in the supply of testing kits [27, 14, 19].

While up-front knowledge of the value of $d$, or at least an upper bound on it, is required by many of the algorithms aimed at identifying the defective items, estimating or finding the number of defectives is an interesting problem in its own right. Defectives estimation via group testing has been widely applied in biological and medical applications [7, 23, 24, 25, 13]. In [24], for example, group testing algorithms are used to estimate the proportion of aster-yellow virus transmitters in a natural population of leafhoppers. Similarly, in [25], the authors estimate the infection rate of the yellow-fever virus in a mosquito population using group testing methods. In [13], group-testing-based estimation of the prevalence of rare diseases is employed not only for its effectiveness but also because it naturally preserves the anonymity of the subjects.

Algorithms dedicated to this task might operate in stages or rounds. In each round, the tests are defined in advance and performed in a single parallel step. Tests in a given round might depend on the test results of the preceding rounds. A single-round algorithm is called a non-adaptive algorithm, while a multi-round algorithm is called an adaptive algorithm.

In recent years, there has been increasing interest in the problem of estimating the number of defective items via group testing [3, 2, 7, 5, 9, 10, 12, 21]. The target in some of these papers is to find an estimate $\hat{d}$ within a relative error of $\epsilon<1$, that is, $(1-\epsilon)d\leq\hat{d}\leq(1+\epsilon)d$. For randomized adaptive algorithms we have the following results. Falahatgar et al. [12] give a randomised adaptive algorithm that estimates $d$ using $2\log\log d+O((1/\epsilon^{2})\log(1/\delta))$ queries in expectation, where $\delta$ is the failure probability of the algorithm. Bshouty et al. [3] modified this result and gave an algorithm that uses $(1-\delta)\log\log d+O((1/\epsilon^{2})\log(1/\delta))$ queries in expectation. Moreover, they proved a lower bound of $(1-\delta)\log\log d+\Omega((1/\epsilon)\log(1/\delta))$ queries.

For randomized non-adaptive algorithms with constant estimation, Damaschke and Sheikh Muhammad give in [10] a randomized non-adaptive algorithm that makes $O((\log(1/\delta))\log n)$ tests and in [2], Bshouty gives the lower bound $\Omega(\log n/\log\log n)$ .

In this paper, we are interested in deterministic adaptive and non-adaptive algorithms that estimate the size $d$ of the defective set up to a multiplicative factor of $\Delta>1$. Formally, let $|I|:=d$ and let $D\geq d$. We say that a deterministic algorithm ${\cal A}$ estimates $d$ up to a multiplicative factor of $\Delta$ if, given $D$ as an input, it computes an estimate $\hat{d}$ such that $d/\Delta\leq\hat{d}\leq d\Delta$. Bshouty et al. show in [3] that, if no upper bound $D$ is given to the algorithm, then any deterministic adaptive algorithm (and therefore also any deterministic non-adaptive algorithm) for this problem must make at least $\Omega(n)$ tests. This is equivalent to testing all the items, which justifies the assumption that any non-trivial efficient algorithm must be given some upper bound $D$ on $d$.

Agarwal et al. [1] consider this problem. They first give a lower bound of $\Omega((D/\Delta^{2})\log({n}/{D}))$ queries for any non-adaptive deterministic algorithm. Moreover, using a non-constructive proof, they give an upper bound of $O\left((({\log D})/({\log\Delta}))D\log n\right)$ queries.

We further investigate this problem. We give new lower and upper bounds on the number of tests required by both adaptive and non-adaptive deterministic algorithms. For the adaptive deterministic setting, our results show that any algorithm for estimating the number of defectives up to a multiplicative factor of $\Delta$ must make at least $\Omega\left((D/\Delta^{2})\log(n/D)\right)$ tests. This extends the same lower bound achieved in [1] for non-adaptive algorithms. Furthermore, we give a polynomial time adaptive algorithm that shows that our bound is tight up to a small additive term.

For non-adaptive algorithms, we achieve an upper bound of $O((D/\Delta^{2})(\log(n/D)+\log\Delta))$ by means of a non-constructive proof. This improves the upper bound $O(((\log D)/(\log\Delta))D\log n)$ from [1], and matches the lower bound up to a small additive term.

We then study polynomial time constructive algorithms. For this task, we use existing polynomial time constructible expander regular bipartite graphs, extractors and condensers to construct two polynomial time algorithms. The first algorithm makes $O((D^{1+o(1)}/\Delta^{2})\cdot\log n)$ tests, and the second makes $(D/\Delta^{2})\cdot{\rm quasipoly}(\log n)$ tests. To the best of our knowledge, this is the first explicit construction with an almost optimal test complexity. Our results are summarised in Table 1.

Bounds	Adaptive/Non-Adapt.	Result	Explicit/Non-Expl.	Ref.
Lower B.	Non-Adapt.	$\frac{D}{\Delta^{2}}\log\frac{n}{D}$	-	[1]
Lower B.	Adaptive	$\frac{D}{\Delta^{2}}\log\frac{n}{D}$	-	Ours
Upper B.	Adaptive	$\frac{D}{\Delta^{2}}\left(\log\frac{n}{D}+\log\Delta\right)$	Explicit	Ours
Upper B.	Non-Adapt.	$\frac{\log D}{\log\Delta}D\log n$	Non-Expl.	[1]
Upper B.	Non-Adapt.	$\frac{D}{\Delta^{2}}\left(\log\frac{n}{D}+\log\Delta\right)$	Non-Expl.	Ours
Upper B.	Non-Adapt.	$\frac{D^{1+o(1)}}{\Delta^{2}}\log n$	Explicit¹	Ours
Upper B.	Non-Adapt.	$\frac{D}{\Delta^{2}}\cdot{\rm quasipoly}(\log n)$	Explicit	Ours

¹ This result is true for $\Delta>C$ for some constant $C$. See Section 6.2.

Table 1: Upper and lower bounds on the number of tests required
for estimating defectives in deterministic group testing.

2 Definitions and Preliminary Results

In this section, we give some notation and definitions that will be used in this paper.

Let $X=[n]:=\{1,\cdots,n\}$ be a set of items. Let $I\subseteq X$ be a set of defective items, and let $d$ denote its size, i.e. $d=|I|$ . In the group testing settings, a test is a subset $Q\subseteq X$ of items. An answer to a test $Q$ with respect to the defective items set $I$ , is denoted by $Q(I)$ , such that $Q(I):=1$ if $Q\cap I\neq\emptyset$ and $0$ otherwise. We denote by ${\cal O}_{I}$ an oracle that for a test $Q$ returns $Q(I)$ .

Let ${\cal A}$ be an algorithm with access to ${\cal O}_{I}$, and let $d=|I|$. We say that the algorithm ${\cal A}$ estimates $d$ up to a multiplicative factor of $\Delta$ if ${\cal A}$ gets as input an upper bound $D\geq d$ and a parameter $\Delta>1$, and outputs $\hat{d}$ such that $d/\Delta\leq\hat{d}\leq d\Delta$. We say that ${\cal A}$ is an adaptive algorithm if its queries depend on the results of previous queries, and non-adaptive if its queries are independent of previous ones and can therefore be executed in a single parallel step. We may assume that $D\geq\Delta^{2}$; otherwise, the algorithm trivially outputs $\hat{d}=D/\Delta$. We note here that $\Delta\geq 1+\Omega(1)$, that is, $\Delta$ is greater than a constant that is greater than $1$, and it may depend on $n$ and/or $D$ (for example, $\Delta=\log\log n+\log D$). This is implicit in [1] and is also assumed in this paper. It is also interesting to investigate this problem when $\Delta=1+o(1)$, where $o(\cdot)$ (small $o$) is with respect to $D$ and/or $n$.
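As a small illustration of these definitions, the following Python sketch models the oracle ${\cal O}_{I}$ and the estimation guarantee. The names `make_oracle` and `is_valid_estimate` are ours and do not appear in the paper; items are represented as integers purely for convenience.

```python
from typing import Callable, Iterable, Set


def make_oracle(defectives: Set[int]) -> Callable[[Iterable[int]], int]:
    """Return a function playing the role of O_I: it maps a test Q to Q(I)."""
    def oracle(test: Iterable[int]) -> int:
        # Q(I) = 1 iff the test contains at least one defective item.
        return 1 if any(item in defectives for item in test) else 0
    return oracle


def is_valid_estimate(d: int, d_hat: float, delta: float) -> bool:
    """Check the guarantee d / Delta <= d_hat <= d * Delta."""
    return d / delta <= d_hat <= d * delta


# Hypothetical defective set I = {3, 7}.
oracle = make_oracle({3, 7})
answers = [oracle({1, 2}), oracle({2, 3}), oracle(range(5, 10))]
```

Here `answers` records the results of three tests against the same hidden set, and `is_valid_estimate` only encodes the acceptance condition; it is not part of any estimation algorithm.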

We will use the following

Lemma 1.

Chernoff’s Bound. Let $X_{1},\ldots,X_{m}$ be independent random variables taking values in $\{0,1\}$. Let $X=\sum_{i=1}^{m}X_{i}$ denote their sum and let $\mu={\bf E}[X]$ denote its expected value. Then

\displaystyle{\bf Pr}[X>(1+\lambda)\mu]\leq\left(\frac{e^{\lambda}}{(1+\lambda)^{(1+\lambda)}}\right)^{\mu}\leq e^{-\frac{\lambda^{2}\mu}{2+\lambda}}\leq\begin{cases}e^{-\frac{\lambda^{2}\mu}{3}}&\mbox{if\ }0<\lambda\leq 1\\ e^{-\frac{\lambda\mu}{3}}&\mbox{if\ }\lambda>1\end{cases}.

(1)

In particular,

\displaystyle{\bf Pr}[X>\Lambda]\leq\left(\frac{e\mu}{\Lambda}\right)^{\Lambda}.

(2)

For $0\leq\lambda\leq 1$ we have

\displaystyle{\bf Pr}[X<(1-\lambda)\mu]\leq\left(\frac{e^{-\lambda}}{(1-\lambda)^{(1-\lambda)}}\right)^{\mu}\leq e^{-\frac{\lambda^{2}\mu}{2}}.

(3)
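To get a feeling for the chain of inequalities in (1), one can compare it numerically with an exact binomial tail. The sketch below does this for identically distributed Bernoulli variables, which is a special case of the lemma; the parameter values and the helper names `binomial_upper_tail` and `chernoff_chain` are illustrative assumptions of ours.

```python
import math


def binomial_upper_tail(m: int, p: float, threshold: float) -> float:
    """Exact Pr[X > threshold] for X ~ Binomial(m, p)."""
    start = math.floor(threshold) + 1
    return sum(
        math.comb(m, j) * p**j * (1 - p) ** (m - j) for j in range(start, m + 1)
    )


def chernoff_chain(mu: float, lam: float):
    """The three successive upper bounds appearing in (1)."""
    first = (math.exp(lam) / (1 + lam) ** (1 + lam)) ** mu
    second = math.exp(-(lam**2) * mu / (2 + lam))
    if lam <= 1:
        third = math.exp(-(lam**2) * mu / 3)
    else:
        third = math.exp(-lam * mu / 3)
    return first, second, third


m, p, lam = 100, 0.3, 0.5
mu = m * p
exact = binomial_upper_tail(m, p, (1 + lam) * mu)
first, second, third = chernoff_chain(mu, lam)
```

For these values the exact tail is considerably smaller than the first bound, which is expected: the bounds are used in the proofs below only for their exponential decay in $\mu$.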

Moreover, we will often use the inequality

\displaystyle\left(\frac{n}{k}\right)^{k}\leq{n\choose k}\leq\sum_{i=0}^{k}{n\choose i}\leq\left(\frac{en}{k}\right)^{k}.

(4)
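Inequality (4) is easy to check numerically for concrete values with $1\leq k\leq n$. The following sketch evaluates its four terms; the function name `binomial_sandwich` and the sample values are ours.

```python
import math


def binomial_sandwich(n: int, k: int):
    """Return the four quantities of inequality (4), in order, for 1 <= k <= n."""
    lower = (n / k) ** k
    middle = math.comb(n, k)
    partial_sum = sum(math.comb(n, i) for i in range(k + 1))
    upper = (math.e * n / k) ** k
    return lower, middle, partial_sum, upper


values = binomial_sandwich(100, 5)
```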

3 Upper Bound for Non-Adaptive Deterministic Algorithms

In this section, we give an upper bound for deterministic non-adaptive algorithms that estimate $d$ up to a multiplicative factor of $\Delta$. We prove:

Theorem 2.

Let $D$ be some upper bound on the number of defective items $d$ and $\Delta>1$ . Then, there is a deterministic non-adaptive algorithm that makes

O\left(\frac{D}{\Delta^{2}}\left(\log\frac{n}{D}+\log\Delta\right)\right)

tests and outputs $\hat{d}$ such that $\frac{d}{\Delta}\leq\hat{d}\leq d\Delta.$

To prove the Theorem we need the following:

Lemma 3.

Let $\Delta>1$ and $\ell\geq 2\Delta^{2}$ . There is a non-adaptive deterministic algorithm that makes

t=O\left(\frac{\ell}{\Delta^{2}}\left(\log\frac{n}{\ell}+\log\Delta\right)\right)

tests such that,

1. If the number of defectives $d$ is less than $\ell/\Delta^{2}$, it outputs $0$.
2. If it is greater than $\ell/\Delta$, it outputs $1$.

Proof.

We choose a constant $c$ such that $(1-\Delta^{2}/(c\ell))^{\ell/\Delta^{2}}=1/e$ . Note that

\left(1-\frac{\Delta^{2}}{2\ell}\right)^{\ell/\Delta^{2}}\geq 1-\frac{\Delta^{2}\ell}{2\ell\Delta^{2}}=\frac{1}{2}>\frac{1}{e}

and

\left(1-\frac{2\Delta^{2}}{\ell}\right)^{\ell/\Delta^{2}}=\left(\left(1-\frac{2\Delta^{2}}{\ell}\right)^{\frac{\ell}{2\Delta^{2}}}\right)^{2}\leq\frac{1}{e^{2}}<\frac{1}{e}.

Therefore, such $c$ exists and we have $1/2\leq c\leq 2$ .

Consider a test $Q\subseteq[n]$ chosen at random where each item $i\in[n]$ is chosen to be in $Q$ with probability $\Delta^{2}/(c\ell)$ . Let $I$ be the set of defective items such that $|I|=d$ , and let $Q(I)$ be the result of the test $Q$ with respect to the set $I$ . Then,

{\bf Pr}[Q(I)=0]=\left(1-\frac{\Delta^{2}}{c\ell}\right)^{d}.

(5)

If $d\leq\ell/\Delta^{2}$ ,

{\bf Pr}[Q(I)=0]\geq\left(1-\frac{\Delta^{2}}{c\ell}\right)^{\ell/\Delta^{2}}=e^{-1},

(6)

if $d=2\ell/\Delta^{2}$ ,

{\bf Pr}[Q(I)=0]=\left(1-\frac{\Delta^{2}}{c\ell}\right)^{2\ell/\Delta^{2}}=e^{-2},

(7)

and if $d=\ell/\Delta$ , we get:

{\bf Pr}[Q(I)=0]=\left(\left(1-\frac{\Delta^{2}}{c\ell}\right)^{\frac{\ell}{\Delta^{2}}}\right)^{\Delta}=e^{-\Delta}.

(8)
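The constant $c$ and the probabilities in (6)-(8) can be evaluated explicitly. Writing $m=\ell/\Delta^{2}$, the defining equation $(1-1/(cm))^{m}=1/e$ gives $c=1/(m(1-e^{-1/m}))$; this closed form is our own rearrangement and is not stated in the text. The sketch below uses it with illustrative values of $\ell$ and $\Delta$.

```python
import math


def inclusion_constant(ell: float, delta: float) -> float:
    """Solve (1 - Delta^2/(c*ell))^(ell/Delta^2) = 1/e for c (our closed form)."""
    m = ell / delta**2
    return 1.0 / (m * -math.expm1(-1.0 / m))


def prob_zero(d: float, ell: float, delta: float) -> float:
    """Pr[Q(I) = 0] from (5) when |I| = d and each item joins Q independently."""
    p = delta**2 / (inclusion_constant(ell, delta) * ell)
    return (1.0 - p) ** d


ell, delta = 64.0, 4.0
c = inclusion_constant(ell, delta)
at_small = prob_zero(ell / delta**2, ell, delta)      # case (6) with equality
at_double = prob_zero(2 * ell / delta**2, ell, delta)  # case (7)
at_large = prob_zero(ell / delta, ell, delta)          # case (8)
```

With these parameters the three probabilities should agree with $e^{-1}$, $e^{-2}$ and $e^{-\Delta}$ up to floating-point error, and $c$ lies between $1/2$ and $2$ as claimed.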

Let $Q_{1},Q_{2},\ldots,Q_{t}$ be a sequence of $t$ i.i.d. tests such that

t=\frac{c^{\prime}\ell}{(\Delta-1)^{2}}\ln\frac{c^{\prime\prime}\Delta^{2}n}{\ell}

where $c^{\prime}=54e^{2}$ and $c^{\prime\prime}=4e$ .

Let

\eta=e^{-1}\left(\frac{1}{2}+\frac{1}{2\Delta}\right).

Consider the following two events:

1. $A$ : There is a set of defectives $I$ of size $|I|\leq\ell/\Delta^{2}$ such that the number of tests with $0$ answer is less than $\eta t$.
2. $B$ : There is a set of defectives $J$ of size $|J|>\ell/\Delta$ such that the number of tests with $0$ answer is at least $\eta t$.

Notice that, to prove the lemma it is enough to prove that ${\bf Pr}[A\vee B]<1$ . We will show that ${\bf Pr}[A],{\bf Pr}[B]<1/2$ .

Let $X_{1},\ldots,X_{t}$ be random variables such that $X_{i}=1$ if and only if $Q_{i}(I)=0$. Let $X$ be the number of tests that yield the result $0$. Then $X=\sum_{i=1}^{t}X_{i}$, and we define $\mu:={\bf E}[X]$.

If $|I|=d\leq\ell/\Delta^{2}$ , then $\mu=t\cdot{\bf E}[X_{i}]=t\cdot{\bf Pr}[X_{i}=1]$ . By (6) we have

\mu={\bf E}[X]\geq t\cdot e^{-1}.

(9)

By (3) in Lemma 1, for $\lambda=1/2-1/(2\Delta)$ we have

{\bf Pr}[X\leq\eta t]={\bf Pr}[X\leq(1-\lambda)te^{-1}]\leq{\bf Pr}[X\leq(1-\lambda)\mu]\leq e^{-\frac{\lambda^{2}\mu}{2}}\leq e^{-\frac{(1-\Delta^{-1})^{2}t}{8e}}.

Using this result, inequality (4) and the union bound, we can conclude that

\begin{aligned}
{\bf Pr}[A] &\leq \left(\sum_{i=0}^{\ell/\Delta^{2}}{n\choose i}\right)e^{-\frac{(1-\Delta^{-1})^{2}t}{8e}}\leq\left(\frac{e\Delta^{2}n}{\ell}\right)^{\frac{\ell}{\Delta^{2}}}e^{-\frac{(1-\Delta^{-1})^{2}t}{8e}} \\
&= \left(\frac{e\Delta^{2}n}{\ell}\right)^{\frac{\ell}{\Delta^{2}}}e^{-\frac{c^{\prime}\ell}{8e\Delta^{2}}\ln\frac{c^{\prime\prime}\Delta^{2}n}{\ell}}=\left(\frac{e\Delta^{2}n}{\ell}\right)^{\frac{\ell}{\Delta^{2}}}\left(\frac{c^{\prime\prime}\Delta^{2}n}{\ell}\right)^{-\frac{c^{\prime}\ell}{8e\Delta^{2}}}<\frac{1}{2}.
\end{aligned}

On the other hand, for the event $B$ , we have two cases.

Case I. $1<\Delta\leq 2$ .

If there is a set of defectives $J$ of size $|J|>\ell/\Delta$ such that at least $\eta t$ of the tests yield the answer $0$, then, since removing items from $J$ cannot turn a $0$ answer into a $1$, there is a set of defectives $J^{\prime}$ of size $|J^{\prime}|=\ell/\Delta$ such that at least $\eta t$ of the test answers are $0$. Denote by $B^{\prime}$ the latter event. Then, by (8) we have $\mu={\bf E}[X]=e^{-\Delta}t$, and for $\lambda=(e^{\Delta-1}-1)/2\geq(\Delta-1)/2$ and $\eta^{\prime}=(e^{-1}+e^{-\Delta})/2\leq\eta$ we get

\begin{aligned}
{\bf Pr}[B]\leq{\bf Pr}[B^{\prime}] &\leq {n\choose\ell/\Delta}{\bf Pr}\left[X\geq\eta t\right]\leq{n\choose\ell/\Delta}{\bf Pr}\left[X\geq\eta^{\prime}t\right] \\
&= {n\choose\ell/\Delta}{\bf Pr}\left[X\geq\left(1+\lambda\right)\mu\right] \\
&\leq \left(\frac{e\Delta n}{\ell}\right)^{\frac{\ell}{\Delta}}{\bf Pr}\left[X\geq\left(1+\lambda\right)\mu\right].
\end{aligned}

If $1<\Delta\leq 2$ then $0\leq\lambda\leq 1$, and by (1) in Lemma 1 we have

\begin{aligned}
\left(\frac{e\Delta n}{\ell}\right)^{\frac{\ell}{\Delta}}{\bf Pr}\left[X\geq\left(1+\lambda\right)\mu\right] &\leq \left(\frac{e\Delta n}{\ell}\right)^{\frac{\ell}{\Delta}}e^{-\lambda^{2}\mu/3} \\
&\leq \left(\frac{e\Delta n}{\ell}\right)^{\frac{\ell}{\Delta}}e^{-(\Delta-1)^{2}\mu/12}\ \ \ \mbox{since}\ \lambda\geq(\Delta-1)/2 \\
&= \left(\frac{e\Delta n}{\ell}\right)^{\frac{\ell}{\Delta}}e^{-(\Delta-1)^{2}e^{-\Delta}t/12} \\
&= \left(\frac{e\Delta n}{\ell}\right)^{\frac{\ell}{\Delta}}\left(\frac{c^{\prime\prime}\Delta^{2}n}{\ell}\right)^{-c^{\prime}\ell e^{-\Delta}/12} \\
&\leq \left(\frac{2en}{\ell}\right)^{\ell}\left(\frac{c^{\prime\prime}n}{\ell}\right)^{-(c^{\prime}e^{-2}/12)\ell}<\frac{1}{2}\ \ \ \mbox{since}\ 1<\Delta\leq 2.
\end{aligned}

Case II. $\Delta>2$ .

In this case we have $\ell/\Delta>2\ell/\Delta^{2}$. Therefore, if there is a set of defectives $J$ of size $|J|>\ell/\Delta$ such that at least $\eta t$ of the tests yield the answer $0$, then there is a set of defectives $J^{\prime}$ of size $|J^{\prime}|=2\ell/\Delta^{2}$ such that at least $\eta t$ of the test answers are $0$. Denote by $B^{\prime\prime}$ the latter event. By (7), $\mu={\bf E}[X]=e^{-2}t$. Let $\lambda=1/3-1/(3\Delta)<1$. Then $\eta t>(1+\lambda)\mu$. By (1) in Lemma 1, we have

{\bf Pr}[X\geq\eta t]\leq{\bf Pr}[X\geq(1+\lambda)\mu]\leq e^{-\frac{\lambda^{2}\mu}{3}}\leq e^{-\frac{(1-\Delta^{-1})^{2}t}{27e^{2}}}.

Then

\begin{aligned}
{\bf Pr}[B] &\leq {\bf Pr}[B^{\prime\prime}]\leq{n\choose 2\ell/\Delta^{2}}e^{-\frac{(1-\Delta^{-1})^{2}t}{27e^{2}}}\leq\left(\frac{e\Delta^{2}n}{2\ell}\right)^{\frac{2\ell}{\Delta^{2}}}e^{-\frac{(1-\Delta^{-1})^{2}t}{27e^{2}}} \\
&= \left(\frac{e\Delta^{2}n}{2\ell}\right)^{\frac{2\ell}{\Delta^{2}}}e^{-\frac{c^{\prime}\ell}{27e^{2}\Delta^{2}}\ln\frac{c^{\prime\prime}\Delta^{2}n}{\ell}}=\left(\frac{e\Delta^{2}n}{2\ell}\right)^{\frac{2\ell}{\Delta^{2}}}\left(\frac{c^{\prime\prime}\Delta^{2}n}{\ell}\right)^{-\frac{c^{\prime}\ell}{27e^{2}\Delta^{2}}}<\frac{1}{2}.
\end{aligned}

∎
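The decision rule behind Lemma 3 (output $0$ exactly when at least $\eta t$ tests answer $0$) can be tried out on a single random draw of tests. The sketch below is only a simulation under assumptions of ours: it uses a fixed seed, a much smaller $t$ than the value prescribed in the proof, and two hand-picked defective sets, so it illustrates the rule rather than certifying a test family for all sets. The inclusion probability is computed as $1-e^{-1/m}$ with $m=\ell/\Delta^{2}$, which is the value of $\Delta^{2}/(c\ell)$ implied by the choice of $c$.

```python
import math
import random


def draw_tests(n: int, ell: float, delta: float, t: int, rng: random.Random):
    """Draw t random tests once; every item joins a test with probability p."""
    m = ell / delta**2
    p = 1.0 - math.exp(-1.0 / m)  # equals Delta^2 / (c * ell) for the proof's c
    return [{i for i in range(n) if rng.random() < p} for _ in range(t)]


def decide(tests, defectives: set, delta: float) -> int:
    """Output 0 iff at least eta * t of the tests answer 0."""
    eta = math.exp(-1.0) * (0.5 + 0.5 / delta)
    zeros = sum(1 for test in tests if not (test & defectives))
    return 0 if zeros >= eta * len(tests) else 1


rng = random.Random(0)
tests = draw_tests(n=200, ell=64.0, delta=4.0, t=400, rng=rng)
few = decide(tests, {0, 1, 2}, 4.0)         # d = 3 < ell / Delta^2 = 4
many = decide(tests, set(range(20)), 4.0)   # d = 20 > ell / Delta = 16
```

The same list `tests` is reused for both defective sets, mirroring the non-adaptive setting in which the tests are fixed before any answer is seen.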

We are now ready to prove Theorem 2.

Let ${\cal A}(\ell,\Delta)$ be the algorithm from Lemma 3. Then, ${\cal A}(\ell,\Delta)$ makes at most

\frac{c\ell}{\Delta^{2}}\log\frac{\Delta n}{\ell}

(10)

queries for some constant $c$ , and

1. If ${\cal A}(\ell,\Delta)=1$, then $d\geq\frac{\ell}{\Delta^{2}}$.
2. If ${\cal A}(\ell,\Delta)=0$, then $d\leq\frac{\ell}{\Delta}.$

Consider the algorithm ${\cal T}(n,D,\Delta)$ that runs ${\cal A}(D/\Delta^{i},\Delta)$ for all $i=0,\ldots,\lceil\log D/\log\Delta\rceil$ . Let $r$ be the minimum integer such that ${\cal A}(D/\Delta^{r},\Delta)=1$ . Algorithm ${\cal T}(n,D,\Delta)$ then outputs $\hat{d}=D/\Delta^{r+1}$ . See algorithm ${\cal T}$ in Figure 1.

${\cal T}(n,D,\Delta)$
1) $r\leftarrow 0.$
2) For each $i=0,1,\ldots,\lceil\log D/\log\Delta\rceil$ do:
   2.1) $R\leftarrow{\cal A}(D/\Delta^{i},\Delta)$
   2.2) If $(R=1)$ then
          $r\leftarrow i$
          $\hat{d}\leftarrow D/\Delta^{r+1}$
          Output($\hat{d}$).

Figure 1: Algorithm ${\cal T}$
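The reduction from the threshold algorithm ${\cal A}$ to the estimator ${\cal T}$ can be sketched in a few lines of Python. The function `ideal_A` below is a hypothetical stand-in for ${\cal A}$ that is told $d$ and answers with a threshold lying between $\ell/\Delta^{2}$ and $\ell/\Delta$; it exists only to exercise the reduction and is not how ${\cal A}$ is obtained in the paper. Returning $0$ when no call answers $1$ is also our convention for the case $d=0$.

```python
import math
from typing import Callable


def estimate_T(D: float, delta: float, A: Callable[[float, float], int]) -> float:
    """Run A(D / Delta^i, Delta) for all i and output D / Delta^(r+1),
    where r is the smallest index whose answer is 1."""
    q = math.ceil(math.log(D) / math.log(delta))
    # Non-adaptive: every call is issued before any answer is inspected.
    answers = [A(D / delta**i, delta) for i in range(q + 1)]
    for r, answer in enumerate(answers):
        if answer == 1:
            return D / delta ** (r + 1)
    return 0.0  # no call answered 1 (our convention for d = 0)


def ideal_A(d: int) -> Callable[[float, float], int]:
    """Hypothetical threshold subroutine: 0 below ell/Delta^2, 1 above ell/Delta."""
    def A(ell: float, delta: float) -> int:
        return 1 if d > ell / delta**1.5 else 0
    return A


d_hat = estimate_T(100, 2.0, ideal_A(7))
```

Any subroutine satisfying the two implications stated before Figure 1 could be substituted for `ideal_A`; the argument of Lemma 4 uses only those implications.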

We now prove:

Lemma 4.

Algorithm ${\cal T}(n,D,\Delta)$ is a deterministic non-adaptive algorithm that makes

O\left(\frac{D}{\Delta^{2}}\log\left(\frac{\Delta n}{D}\right)\right)

tests and outputs $\hat{d}$ that satisfies

\frac{d}{\Delta}\leq\hat{d}\leq\Delta d.

Proof.

For $i=0$, if ${\cal A}(D/\Delta^{i},\Delta)=1$ then $d\geq D/\Delta^{2}$. Then $\hat{d}=D/\Delta\leq\Delta d$ and since $D\geq d$ we also have $\hat{d}=D/\Delta\geq d/\Delta$.

For $i>0$, if ${\cal A}(D/\Delta^{i-1},\Delta)=0$ and ${\cal A}(D/\Delta^{i},\Delta)=1$ then $d\leq D/\Delta^{i}$ and $d\geq D/\Delta^{i+2}$. Then $\hat{d}=D/\Delta^{i+1}\leq\Delta d$ and $\hat{d}\geq d/\Delta$.

Let $q=\lceil\log D/\log\Delta\rceil$ . Let $t$ denote the number of queries performed by algorithm ${\cal T}(n,D,\Delta)$ . By (10), the number of tests is at most

\begin{aligned}
\sum_{i=0}^{q}\frac{cD}{\Delta^{i}\Delta^{2}}\log\frac{n\Delta^{i+1}}{D} &\leq \frac{cD}{\Delta^{2}}\sum_{i=0}^{\infty}\frac{1}{\Delta^{i}}\log\frac{n\Delta^{i+1}}{D} \\
&= \frac{cD}{\Delta^{2}}\left(\left(\log\frac{n}{D}\right)\sum_{i=0}^{\infty}\frac{1}{\Delta^{i}}+(\log\Delta)\sum_{i=0}^{\infty}\frac{i+1}{\Delta^{i}}\right) \\
&\leq \frac{cD}{\Delta^{2}}\left(\frac{\Delta}{\Delta-1}\log\frac{n}{D}+\frac{\Delta^{2}}{(\Delta-1)^{2}}\log\Delta\right).
\end{aligned}

For the case when $\Delta=1+\Theta(1)$ we get

t=O\left(D\log\frac{n}{D}\right)

and for the case when $\Delta=\omega(1)$ we get

t=O\left(\frac{D}{\Delta^{2}}\left(\log\frac{n}{D}+\log\Delta\right)\right).

∎
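The geometric-series estimate used in the proof of Lemma 4 can be checked numerically. The sketch below compares the finite sum with its closed-form bound; the constant `c = 1.0`, the base-2 logarithm and the sample parameters are arbitrary choices of ours.

```python
import math


def total_tests_bound(n: float, D: float, delta: float, c: float = 1.0):
    """Return (finite sum over i = 0..q, closed-form upper bound) from the proof."""
    q = math.ceil(math.log(D) / math.log(delta))
    series = sum(
        (c * D / delta ** (i + 2)) * math.log2(n * delta ** (i + 1) / D)
        for i in range(q + 1)
    )
    closed = (c * D / delta**2) * (
        (delta / (delta - 1)) * math.log2(n / D)
        + (delta / (delta - 1)) ** 2 * math.log2(delta)
    )
    return series, closed


series, closed = total_tests_bound(n=10**6, D=1000, delta=4.0)
```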

4 Lower Bound for Adaptive Deterministic Algorithm

In this section, we prove the following lower bound.

Theorem 5.

Any deterministic adaptive group testing algorithm that given $D>d$ , outputs $\hat{d}$ that satisfies $d/\Delta\leq\hat{d}\leq\Delta d$ must make at least

\Omega\left(\frac{D}{\Delta^{2}}\log\frac{n}{D}\right)

queries.

For the proof, we use the following from [3].

Lemma 6.

Let $A$ be a deterministic adaptive algorithm that for a defective set $I\subseteq[n]$ makes the tests $T^{I}_{1},T^{I}_{2},\ldots,T^{I}_{w(I)}$ and let $s(I)$ be the sequence of answers to these tests. If $M=|\{s(I)|I\subseteq[n]\}|$ then the test complexity of $A$ is $\max_{I}w(I)\geq\log M$.

The following lemma helps us prove Theorem 5.

Lemma 7.

Any deterministic adaptive algorithm such that, if the number of defectives $d$ is less than or equal to $d_{1}$ it outputs $0$ and if it is greater than $d_{2}$ it outputs $1$, must make

\Omega\left(d_{1}\log\frac{n}{d_{2}}\right)

tests.

In particular, when $d_{1}=\ell/\Delta^{2}$ and $d_{2}=\ell/\Delta$ we get

\Omega\left(\frac{\ell}{\Delta^{2}}\left(\log\frac{n}{\ell}+\log\Delta\right)\right)

tests.

Proof.

Let $A$ be such an algorithm. Let $s(I)$ be the sequence of answers to the tests of $A$ when the set of defective items is $I$. Consider a set $I$ of size $d_{1}$ and let ${\cal J}=\{J\subseteq[n]:|J|=d_{1},s(J)=s(I)\}$. Let $I^{\prime}=\cup_{J\in{\cal J}}J$. We claim that $s(I^{\prime})=s(I)$. Suppose, to the contrary, that $s(I^{\prime})\not=s(I)$. Then, since $I\subseteq I^{\prime}$, there is a test $Q\subseteq[n]$ that is asked by $A$ that gives answer $0$ to $I$ and $1$ to $I^{\prime}$. Since $I^{\prime}\cap Q\not=\emptyset$, there is a subset $J^{\prime}\in{\cal J}$ such that $J^{\prime}\cap Q\not=\emptyset$ and therefore $Q$ gives answer $1$ to $J^{\prime}$. Then $s(J^{\prime})\not=s(I)$ and we get a contradiction.

Since $s(I^{\prime})=s(I)$ and algorithm $A$ outputs $0$ to $I$, it also outputs $0$ to $I^{\prime}$. Therefore, $|I^{\prime}|\leq d_{2}$, and since every set in ${\cal J}$ is a subset of $I^{\prime}$ of size $d_{1}$, we get $|{\cal J}|\leq N:={d_{2}\choose d_{1}}$. That is, for every possible sequence of answers $s^{\prime}$ of the algorithm $A$, there are at most $N$ sets of size $d_{1}$ that get the same sequence of answers. Since there are $L:={n\choose d_{1}}$ such sets, the number of different sequences of answers that $A$ might have must be at least $L/N$. By Lemma 6, the number of tests that the algorithm makes is at least

\log\frac{{n\choose d_{1}}}{{d_{2}\choose d_{1}}}\geq\log\left(\frac{n}{ed_{2}}\right)^{d_{1}}=\Omega\left(d_{1}\log\frac{n}{d_{2}}\right).

∎
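The counting bound in the proof of Lemma 7 can be evaluated for concrete parameters. The sketch below computes $\log\left({n\choose d_{1}}/{d_{2}\choose d_{1}}\right)$ exactly and compares it with the simplified expression $d_{1}\log(n/(ed_{2}))$; the base-2 logarithm and the sample values are our own choices.

```python
import math


def counting_lower_bound(n: int, d1: int, d2: int) -> float:
    """log2 of C(n, d1) / C(d2, d1), the number of answer sequences forced by the proof."""
    return math.log2(math.comb(n, d1)) - math.log2(math.comb(d2, d1))


def simplified_lower_bound(n: int, d1: int, d2: int) -> float:
    """The weaker closed form d1 * log2(n / (e * d2))."""
    return d1 * math.log2(n / (math.e * d2))


exact_bound = counting_lower_bound(1000, 5, 20)
simple_bound = simplified_lower_bound(1000, 5, 20)
```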

The conclusions established by Lemma 7 show that the upper bound from Lemma 3 is tight. Moreover, using these results, we provide the following proof for Theorem 5.

Proof.

Let $d_{1}=D/\Delta^{2}-1$ and $d_{2}=D$. For sets of size less than or equal to $d_{1}$ the algorithm returns $d_{1}/\Delta\leq\hat{d}\leq\Delta d_{1}$ and for sets of size equal to $d_{2}$ the algorithm returns $d_{2}/\Delta<\hat{d}\leq\Delta d_{2}$. Since $\Delta d_{1}<d_{2}/\Delta$, the above intervals are disjoint. So, the algorithm can distinguish between sets of size less than or equal to $d_{1}$ and sets of size greater than $d_{2}$. By Lemma 7 the algorithm must make at least

\Omega\left(\frac{D}{\Delta^{2}}\log\frac{n}{D}\right)

tests. ∎

5 Polynomial Time Adaptive Algorithm

In this section, we prove:

Theorem 8.

Let $D$ be some upper bound on the number of defective items $d$ and $\Delta>1$ . Then, there is a linear time deterministic adaptive algorithm that makes

O\left(\frac{D}{\Delta^{2}}\left(\log\frac{n}{D}+\log\Delta\right)\right)

tests and outputs $\hat{d}$ such that $\frac{d}{\Delta}\leq\hat{d}\leq d\Delta.$

We first describe the algorithm. The algorithm gets as an input the set of items $X=[n]$ and splits it into two equally-sized disjoint sets $Q_{1}$ and $Q_{2}$. The algorithm asks the queries defined by $Q_{1}$ and $Q_{2}$ and continues the splitting process only on the sets that yielded positive answers. We call these sets defective sets. As long as the algorithm gets fewer than $D/\Delta^{2}$ distinct defective sets, it continues to split and test. Two cases can happen. Either it gets $D/\Delta^{2}$ defective sets and then the algorithm outputs $\hat{d}=D/\Delta$, or the number of defective sets is always less than $D/\Delta^{2}$ and then the algorithm finds all the defective items and returns their exact number. The algorithm is given in Figure 2. The algorithm invokes the procedure ${\bf Split}(X)$ which, on input $X=\{a_{1},a_{2},\ldots,a_{n}\}$, returns the set $W$ where $W:=\{X_{1},X_{2}\}$ such that $X_{i}\subseteq X$, $X_{1}=\{a_{1},a_{2},\ldots,a_{\left\lfloor{n/2}\right\rfloor}\}$, $X_{2}=\{a_{\left\lfloor{n/2}\right\rfloor+1},\ldots,a_{n}\}$ if $|X|\geq 2$, and $W:=\{X\}$ otherwise.

Adaptive-dEstimate$({\cal O}_{I},X,\Delta,D)$
1) $Q\leftarrow\{X\},\ S\leftarrow\emptyset$
2) While $(|Q|\leq D/\Delta^{2})$ do:
   2.1) For each $Q_{i}\in Q$:
          $\left\{Q_{i}^{(1)},Q_{i}^{(2)}\right\}\leftarrow{\bf Split}(Q_{i})$
          If $(Q_{i}^{(1)}(I)=1)$ then $S\leftarrow S\cup\{Q_{i}^{(1)}\}$
          If $(Q_{i}^{(2)}(I)=1)$ then $S\leftarrow S\cup\{Q_{i}^{(2)}\}$
   2.2) If $\forall S_{i}\in S,\ |S_{i}|=1$ then
          $\hat{d}\leftarrow|S|$
          Output($\hat{d}$)
        Else $Q\leftarrow S,\ S\leftarrow\emptyset.$
3) $\hat{d}\leftarrow|Q|\cdot\Delta$.
4) Output($\hat{d}$)

Figure 2: Algorithm Adaptive-dEstimate to estimate the number of defective items.
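For concreteness, a minimal Python sketch of Adaptive-dEstimate as given in Figure 2 follows. The representation of sets as lists of integers, the returned test counter and the helper `run` with its default parameters are assumptions of ours, added only to make the procedure executable.

```python
from typing import Callable, List, Sequence, Tuple


def split(items: Sequence[int]) -> List[List[int]]:
    """Split(X): two halves if |X| >= 2, otherwise the set itself."""
    if len(items) < 2:
        return [list(items)]
    half = len(items) // 2
    return [list(items[:half]), list(items[half:])]


def adaptive_d_estimate(
    oracle: Callable[[Sequence[int]], int],
    items: Sequence[int],
    delta: float,
    D: float,
) -> Tuple[float, int]:
    """Return (estimate, number of tests asked), following Figure 2."""
    Q = [list(items)]
    tests = 0
    while len(Q) <= D / delta**2:
        S = []
        for Qi in Q:
            for part in split(Qi):
                tests += 1
                if oracle(part) == 1:
                    S.append(part)  # keep only the defective sets
        if all(len(part) == 1 for part in S):
            return float(len(S)), tests  # every defective item is isolated
        Q = S
    return len(Q) * delta, tests


def run(defectives: set, n: int = 64, delta: float = 2.0, D: float = 16):
    """Hypothetical driver: build the oracle O_I and run the algorithm."""
    oracle = lambda test: 1 if any(x in defectives for x in test) else 0
    return adaptive_d_estimate(oracle, list(range(n)), delta, D)


exact_case = run({3, 17})            # d = 2 <= D / Delta^2, so d is found exactly
estimate_case = run(set(range(10)))  # d = 10 > D / Delta^2, so an estimate is returned
```

In this sketch an empty defective set yields the output $0$, because the condition in step 2.2 holds vacuously when $S=\emptyset$.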

Lemma 9.

Algorithm Adaptive-dEstimate is a deterministic adaptive algorithm that makes

2\frac{D}{\Delta^{2}}\log{\frac{n\Delta^{2}}{D}}=O\left(\frac{D}{\Delta^{2}}\left(\log{\frac{n}{D}}+\log{\Delta}\right)\right)

tests and outputs an estimation $\hat{d}$ such that:

\frac{d}{\Delta}\leq\hat{d}\leq d\Delta.

Proof.

If $d\leq\frac{D}{\Delta^{2}}$ , then the splitting process in step 2 of the algorithm proceeds until each defective item belongs to a distinct set. Eventually, the condition in step 2.2 is met and the algorithm outputs the exact value of $d$ . If $d>{D/\Delta^{2}}$ , then the splitting process stops when the number of defective sets $|Q|$ exceeds $D/\Delta^{2}$ . The algorithm halts and outputs $\hat{d}=|Q|\Delta$ . Obviously, $|Q|\leq d$ . Therefore, $\hat{d}=|Q|\Delta\leq d\Delta.$ Moreover, $|Q|>D/\Delta^{2}\geq d/\Delta^{2}$ which implies that $\hat{d}\geq d/\Delta.$

The number of iterations cannot exceed $\log n$. In the first $\log(D/\Delta^{2})$ iterations, in the worst case scenario, the algorithm splits each current set $Q_{i}$ on each iteration into two sets $Q_{i}^{(1)}$ and $Q_{i}^{(2)}$ such that $Q_{i}^{(1)}(I)=Q_{i}^{(2)}(I)=1$. Therefore, the number of tests that the algorithm asks over all the first $\log({D}/{\Delta^{2}})$ iterations is at most

\sum_{i=1}^{\log({D}/{\Delta^{2}})}2^{i}\leq 2\frac{D}{\Delta^{2}}.

Since $|Q|\leq{D}/{\Delta^{2}}$ , in the other $\log n-\log({D}/{\Delta^{2}})$ iterations, the algorithm makes at most $2D/{\Delta^{2}}$ tests each iteration. So, the total number of tests is at most

2\frac{D}{\Delta^{2}}\left(\log n-\log\frac{D}{\Delta^{2}}\right)+2\frac{D}{\Delta^{2}}=O\left(\frac{D}{\Delta^{2}}\log\frac{n\Delta^{2}}{D}\right).

∎

6 Polynomial Time Non-Adaptive Algorithm

In this section, we show how to use expanders, condensers and extractors to construct deterministic non-adaptive algorithms for defectives number estimation. We prove:

Theorem 10.

Let $D$ be some upper bound on the number of defective items $d$ and $\Delta>1$ . Then, there is a polynomial time deterministic non-adaptive algorithm that makes

\min\left(D^{o(1)},2^{\log^{3}(\log n)}\right)\cdot\frac{D}{\Delta^{2}}\log n

tests and outputs $\hat{d}$ such that $\frac{d}{\Delta}\leq\hat{d}\leq d\Delta.$

6.1 Algorithms Using Expanders

Let $G$ be a bipartite graph $G=G(L,R,E)$ with left vertices $L=[n]$, right vertices $R=[m]$ and edges $E\subseteq L\times R$. For each edge $(i,j)\in E$, the endpoints satisfy $i\in L$ and $j\in R$. For a vertex $v\in L$, define $\Gamma(v)$ to be the set of neighbours of $v$ in $G$, i.e., $\Gamma(v):=\{u\in R|(v,u)\in E\}$. For a subset $S\subseteq L$, we define $\Gamma(S)$ to be the set of neighbours of $S$, meaning $\Gamma(S):=\cup_{v\in S}\Gamma(v)$. For a vertex $v\in L$, the degree of $v$ is defined as $deg(v):=|\Gamma(v)|$. We say that a bipartite graph $G=G(L,R,E)$ is a $(k,a)$-expander $\delta$-regular bipartite graph if the degree of every vertex in $L$ is $\delta$, and for every left-subset $S\subseteq L$ of size at most $k$, we have $|\Gamma(S)|\geq a|S|$.

Lemma 11.

Let $X=[n]$ be a set of items and let $I\subseteq[n]$ be the set of defective items, where $|I|=d$ is unknown to the algorithm. Let $G=G(L,R,E)$ be a $(k,a)$-expander $\delta$-regular bipartite graph with $|L|=n$ and $|R|=m$. Then, there is a deterministic non-adaptive algorithm $A$ such that, for $n$ items, it makes $m$ tests and satisfies:

1. If $|I|<ak/\delta$, then $A$ outputs $0$.
2. If $|I|\geq k$, then $A$ outputs $1$.

Proof.

For every $j\in R$, we define the test $T^{(j)}=\{i|(i,j)\in E\}$. The number of tests is $|R|=m$. If $|I|\geq k$, then $I$ contains a subset $I^{\prime}$ of size exactly $k$, and by the expansion property $|\Gamma(I)|\geq|\Gamma(I^{\prime})|\geq ak$. Therefore, at least $ak$ tests give the positive answer $1$. If $|I|<ak/\delta$, then, since the degree of every vertex in $L$ is $\delta$, we have $|\Gamma(I)|\leq\delta|I|<ak$, so fewer than $ak$ tests give the answer $1$. Hence, the algorithm that outputs $1$ if and only if at least $ak$ tests are positive distinguishes between the two cases. ∎
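The construction in the proof of Lemma 11 can be illustrated on a toy graph. In the sketch below, the 6-cycle-like graph, its parameters $(k,a,\delta)=(2,1.5,2)$ and the brute-force expansion check are our own small example, chosen so that every defective set can be enumerated; it is unrelated to the explicit expanders used later in this section.

```python
from itertools import combinations


def neighbours(graph: dict, subset) -> set:
    """Gamma(S): union of the right-neighbourhoods of the left vertices in S."""
    result = set()
    for v in subset:
        result |= graph[v]
    return result


def is_expander(graph: dict, k: int, a: float) -> bool:
    """Brute-force check that |Gamma(S)| >= a|S| for every S with 1 <= |S| <= k."""
    left = list(graph)
    return all(
        len(neighbours(graph, S)) >= a * len(S)
        for size in range(1, k + 1)
        for S in combinations(left, size)
    )


def expander_decision(graph: dict, m: int, defectives, k: int, a: float) -> int:
    """Test T^(j) is positive iff some defective item is adjacent to j;
    output 1 iff at least a*k tests are positive."""
    positive = sum(
        1 for j in range(m) if any(j in graph[i] for i in defectives)
    )
    return 1 if positive >= a * k else 0


m = 6
graph = {i: {i, (i + 1) % m} for i in range(m)}  # left vertex i -> right vertices
k, a, degree = 2, 1.5, 2
```

With these parameters $ak/\delta=1.5$, so the decision must be $0$ for at most one defective item and $1$ for two or more.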

Following the same proof of Lemma 4 with algorithm ${\cal T}$ in Figure 1, we have:

Lemma 12.

Let $A(\ell,\Delta)$ be a deterministic non-adaptive algorithm such that, for $n$ items, it makes $m(\ell,\Delta)$ tests and satisfies:

1. If $|I|<\ell/\Delta^{2}$, then $A$ outputs $0$.
2. If $|I|\geq\ell/\Delta$, then $A$ outputs $1$.

Then, there is a deterministic non-adaptive algorithm ${\cal T}$ such that, given $D>d$ , for $n$ items it makes

\sum_{i=0}^{\lceil\log D/\log\Delta\rceil}m\left(\frac{D}{\Delta^{i}},\Delta\right)

tests and outputs $\hat{d}$ that satisfies $d/\Delta\leq\hat{d}\leq\Delta d$ .

The parameters of the explicit construction of a $(k,a)$ -expander $\delta$ -regular bipartite graph from [4] are summarised in the following lemma.

Lemma 13.

For any $k>0$ and $0<\epsilon<1$ , there is an explicit construction of a $(k,a)$ -expander $\delta$ -regular bipartite graph with

$$m=O(k\delta/\epsilon),\ \ \delta=2^{O(\log^{3}(\log n/\epsilon))},\ \ a=(1-\epsilon)\delta.$$

We now prove:

Lemma 14.

There is a polynomial time deterministic non-adaptive algorithm that makes

$$\frac{D}{\Delta^{2}}\cdot 2^{O(\log^{3}(\log n))}=\frac{D}{\Delta^{2}}\cdot\mbox{{\rm quasipoly}}(\log n)$$

tests and outputs $\hat{d}$ that satisfies

$$\frac{d}{\Delta}\leq\hat{d}\leq\Delta d.$$

Proof.

We use the expander in Lemma 13. Recall that $\Delta=1+\Omega(1)$ . Let $r=\min(\Delta,2)$ , $\epsilon=1-1/r$ and $k=r\ell/\Delta^{2}$ . Then $a=\delta/r=2^{O(\log^{3}\log n)}$ and $m=m(\ell,\Delta)=(\ell/\Delta^{2})2^{O(\log^{3}\log n)}$ . By Lemma 11, there is a deterministic non-adaptive algorithm $A$ such that for $n$ items, it makes $m(\ell,\Delta)$ tests and

1. If $|I|<ak/\delta=\ell/\Delta^{2}$, then $A$ outputs $0$.

2. If $|I|\geq k=r\ell/\Delta^{2}$, then $A$ outputs $1$.

Algorithm $A$ trivially satisfies the first condition required by Lemma 12. Consider item 2. If $\Delta\leq 2$, then $r=\Delta$ and $k=\ell/\Delta$, so $A$ outputs $1$ whenever $|I|\geq\ell/\Delta$. If $\Delta>2$, then $r=2$ and $A$ outputs $1$ whenever $|I|\geq k=2\ell/\Delta^{2}$. Since $2\ell/\Delta^{2}<\ell/\Delta$ in this case, $A$ outputs $1$ whenever $|I|\geq\ell/\Delta$.

Now by Lemma 12, there is a deterministic non-adaptive algorithm ${\cal T}$ such that, given $D>d$ , for $n$ items, it makes

$$\sum_{i=0}^{\lceil\log D/\log\Delta\rceil}m\left(\frac{D}{\Delta^{i}},\Delta\right)=\frac{D}{\Delta^{2}}\cdot 2^{O(\log^{3}(\log n))}$$

tests and outputs $\hat{d}$ that satisfies $d/\Delta\leq\hat{d}\leq\Delta d$ .

∎
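As a sanity check on the parameter choice in this proof, the following Python sketch evaluates $r$, $\epsilon$, $k$ and $a=(1-\epsilon)\delta$ with exact rational arithmetic and confirms that $ak/\delta=\ell/\Delta^{2}$ and $k\leq\ell/\Delta$. The function name `lemma14_parameters` and the sample values of $\ell$, $\Delta$ and $\delta$ are illustrative assumptions, and integrality of $k$ is ignored.

```python
from fractions import Fraction

def lemma14_parameters(ell, Delta, delta):
    """Return (r, eps, k, a) as chosen in the proof, with a = (1 - eps) * delta."""
    r = min(Delta, 2)
    eps = 1 - Fraction(1) / r
    k = r * ell / Delta**2
    a = (1 - eps) * delta
    return r, eps, k, a

# Illustrative values only: one case with Delta <= 2 and one with Delta > 2.
for Delta in (Fraction(3, 2), Fraction(4)):
    ell, delta = Fraction(90), Fraction(12)
    r, eps, k, a = lemma14_parameters(ell, Delta, delta)
    assert a * k / delta == ell / Delta**2  # lower threshold of Lemma 11
    assert k <= ell / Delta                 # upper threshold of Lemma 11
```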

6.2 Algorithms Using Extractors and Condensers

Extractors are functions that convert weak random sources into almost-perfect random sources. We use these objects to construct a non-adaptive algorithm for estimating $d$ . We start with some definitions.

Definition 15.

Let $X$ be a random variable over a finite set $S$. We say that $X$ has min-entropy at least $k$ if ${\bf Pr}[X=x]\leq 2^{-k}$ for all $x\in S$.
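Equivalently, the largest $k$ for which $X$ has min-entropy at least $k$ is $-\log_{2}\max_{x}{\bf Pr}[X=x]$. A minimal Python sketch, in which the function name `min_entropy` and the dictionary representation of a distribution are assumptions made for illustration:

```python
import math

def min_entropy(dist):
    """Min-entropy of a distribution given as {outcome: probability}."""
    return -math.log2(max(dist.values()))

uniform8 = {x: 1 / 8 for x in range(8)}          # uniform on 8 outcomes
skewed = {"a": 0.5, "b": 0.25, "c": 0.25}        # one heavy outcome
```

The uniform distribution on $8$ outcomes has min-entropy $3$, while the skewed example has min-entropy $1$, since its most likely outcome has probability $1/2$.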

Definition 16.

Let $X$ and $Y$ be random variables over a finite set $S$. We say that $X$ and $Y$ are $\epsilon$-close if $\max_{P\subseteq S}|{\bf Pr}[X\in P]-{\bf Pr}[Y\in P]|\leq\epsilon.$
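The maximum in this definition equals half the $\ell_{1}$ distance between the two probability vectors, which is a standard fact. The Python sketch below computes both quantities on a small example; the function names and the dictionary representation are illustrative assumptions, and the subset enumeration is exponential in $|S|$.

```python
from itertools import combinations

def statistical_distance(p, q):
    """Half the L1 distance between two distributions {outcome: probability}."""
    support = set(p) | set(q)
    return sum(abs(p.get(x, 0) - q.get(x, 0)) for x in support) / 2

def max_gap_over_subsets(p, q):
    """max over P of |Pr[X in P] - Pr[Y in P]|, by brute-force enumeration."""
    support = sorted(set(p) | set(q))
    best = 0.0
    for size in range(len(support) + 1):
        for P in combinations(support, size):
            gap = abs(sum(p.get(x, 0) for x in P) - sum(q.get(x, 0) for x in P))
            best = max(best, gap)
    return best

p = {0: 0.5, 1: 0.5}
q = {0: 0.75, 1: 0.25}
```

Here `p` and `q` are $\epsilon$-close exactly when $\epsilon\geq 1/4$.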

We denote by $U_{\ell}$ the uniform distribution on $\{0,1\}^{\ell}$. The notations ${\bf Pr}_{x\in B}$ and ${\bf E}_{x\in B}$ indicate that the probability and the expectation are taken over $x$ chosen uniformly at random from $B$.

Definition 17.

A function $F:\{0,1\}^{\hat{n}}\times\{0,1\}^{\hat{t}}\to\{0,1\}^{\hat{m}}$ is a $k\to_{\epsilon}k^{\prime}$ condenser if for every $X$ with min-entropy at least $k$ and $Y$ uniformly distributed on $\{0,1\}^{\hat{t}}$, the distribution of $(Y,F(X,Y))$ is $\epsilon$-close to a distribution $(U_{\hat{t}},Z)$ with min-entropy $\hat{t}+k^{\prime}$. A condenser is called a $(k,\epsilon)$-lossless condenser if $k^{\prime}=k$. A condenser is called a $(k,\epsilon)$-extractor if $\hat{m}=k^{\prime}$.

Let $\hat{N}=\{0,1\}^{\hat{n}},\hat{T}=\{0,1\}^{\hat{t}}$ and $\hat{M}=\{0,1\}^{\hat{m}}$ , and let $F:\hat{N}\times\hat{T}\to\hat{M}$ be a $k\to_{\epsilon}k^{\prime}$ condenser. Consider the $2^{\hat{t}}\times 2^{\hat{n}}$ matrix ${\cal M}$ induced by $F$ . That is, for $r\in\hat{T}$ and $s\in\hat{N}$ , the entry ${\cal M}_{r,s}$ is equal to $F(s,r)$ . For $s\in\hat{N}$ , let ${\cal M}^{(s)}$ be the $s$ th column of ${\cal M}$ . Then, ${\cal M}^{(s)}_{r}={\cal M}_{r,s}=F(s,r)$ .

Definition 18.

Let $\Sigma$ be a finite set. An $n$-mixture over $\Sigma$ is an $n$-tuple ${\cal S}:=(S_{1},\cdots,S_{n})$ such that for all $i\in[n]$, $S_{i}\subseteq\Sigma$.

Using these definitions and notations, we restate the result proved by Cheraghchi [6] (Theorem 9) in the following lemma.

Lemma 19.

Let $F:\{0,1\}^{\hat{n}}\times\{0,1\}^{\hat{t}}\to\{0,1\}^{\hat{m}}$ be a $k\to_{\epsilon}k^{\prime}$ condenser. Let ${\cal M}$ be the matrix induced by $F$. Then, for any $2^{\hat{t}}$-mixture ${\cal S}=(S_{1},\cdots,S_{2^{\hat{t}}})$ over $\hat{M}:=\{0,1\}^{\hat{m}}$, the number of columns $s$ in ${\cal M}$ that satisfy

$$\underset{{r\in\hat{T}}}{{\bf Pr}}[{\cal M}^{(s)}_{r}\in S_{r}]>\frac{{\bf E}_{r\in\hat{T}}[|S_{r}|]}{2^{k^{\prime}}}+\epsilon$$

is less than $2^{k}$.

Equipped with Lemma 19, we prove:

Lemma 20.

If there is a $k\to_{\epsilon}k^{\prime}$ condenser $F:\{0,1\}^{\hat{n}}\times\{0,1\}^{\hat{t}}\to\{0,1\}^{\hat{m}}$ then, there is a deterministic non-adaptive algorithm ${\cal A}$ for $n=2^{\hat{n}}$ items that makes $m=2^{\hat{t}+\hat{m}}$ tests and satisfies the following.

1. If the number of defectives is less than $(1-\epsilon)2^{k^{\prime}}$, then ${\cal A}$ outputs $0$.

2. If the number of defectives is greater than or equal to $2^{k}+1$, then ${\cal A}$ outputs $1$.

Proof.

Consider the matrix ${\cal M}$ induced by the condenser $F$ as explained above. We define the test matrix ${\cal T}$ from ${\cal M}$ as follows. Let $x\in\{0,1\}^{\hat{m}}$. Define $e(x)\in\{0,1\}^{2^{\hat{m}}}$ such that $e(x)_{y}=1$ if and only if $x=y$, where the bits of $e(x)$ are indexed by the elements $y\in\{0,1\}^{\hat{m}}$. Each row $r$ of the matrix ${\cal M}$ is replaced by $2^{\hat{m}}$ rows in ${\cal T}$, such that each entry ${\cal M}_{r,s}\in\{0,1\}^{\hat{m}}$ is replaced by the column vector $e({\cal M}_{r,s})^{T}\in\{0,1\}^{2^{\hat{m}}}$. The rows of the matrix ${\cal T}$ are indexed by $\hat{T}\times\hat{M}$. Let ${\cal T}^{(i)}$ denote the $i$th column of ${\cal T}$. For $r\in\hat{T}$ and $j\in\hat{M}$, the row $(r,j)$ of ${\cal T}$ is denoted by ${\cal T}_{(r,j)}$, and its $i$th entry is denoted by ${\cal T}_{(r,j),i}$. Thus ${\cal T}_{(r,j),i}={\cal T}^{(i)}_{(r,j)}=1$ if and only if ${\cal M}_{r,i}=j$. The size of the test matrix ${\cal T}$ is $m\times n$.

Let the defective elements be $s_{i_{1}},\ldots,s_{i_{\ell}}$ and let $y\in\{0,1\}^{m}$ be the vector of test results. Then, $y$ is equal to ${\cal T}^{(s_{i_{1}})}\vee\cdots\vee{\cal T}^{(s_{i_{\ell}})}$. Let ${\cal S}=(S_{r})_{r\in\hat{T}}$ be the $2^{\hat{t}}$-mixture over $\{0,1\}^{{\hat{m}}}$ where, for all $r\in\hat{T}$, $S_{r}=\{j\in\{0,1\}^{{\hat{m}}}\mid y_{(r,j)}={\cal T}^{(s_{i_{1}})}_{(r,j)}\vee\cdots\vee{\cal T}^{(s_{i_{\ell}})}_{(r,j)}=1\}$. It is easy to see that:

1. $|S_{r}|\leq\ell$. This is because, by the definition of $S_{r}$, $j\in S_{r}$ if and only if $y_{(r,j)}=1$. The entry $y_{(r,j)}$ gets the value $1$ if and only if at least one of the entries ${\cal T}_{(r,j)}^{(s_{i_{1}})},\cdots,{\cal T}_{(r,j)}^{(s_{i_{\ell}})}$ is $1$. Each of the columns ${\cal T}^{(s_{i_{1}})},\cdots,{\cal T}^{(s_{i_{\ell}})}$ has exactly one entry equal to $1$ among the $2^{\hat{m}}$ rows indexed by $r$. Hence, each of these columns can cause at most one element to be inserted into $S_{r}$.

2. For any $s_{i_{j}}\in\{s_{i_{1}},\ldots,s_{i_{\ell}}\}$, we have ${\bf Pr}_{r\in\hat{T}}[{\cal M}^{(s_{i_{j}})}_{r}\in S_{r}]=1$, because for every $r$ the test $(r,{\cal M}_{r,s_{i_{j}}})$ contains $s_{i_{j}}$ and is therefore positive.

3. Given the matrix ${\cal M}$, its test matrix ${\cal T}$ and the observed result $y$, for any column $s$ the probability ${\bf Pr}_{r\in\hat{T}}[{\cal M}^{(s)}_{r}\in S_{r}]$ can be easily computed.

If the number of defectives $\ell$ is less than $(1-\epsilon)2^{k^{\prime}}$, then $|S_{r}|\leq\ell<(1-\epsilon)2^{k^{\prime}}$ for every $r$ and, by Lemma 19, all columns, except for at most $2^{k}$ columns, satisfy

$$\underset{{r\in\hat{T}}}{{\bf Pr}}[{\cal M}^{(s)}_{r}\in S_{r}]\leq\frac{{\bf E}_{r\in\hat{T}}[|S_{r}|]}{2^{k^{\prime}}}+\epsilon<\frac{{\bf E}_{r\in\hat{T}}[(1-\epsilon)2^{k^{\prime}}]}{2^{k^{\prime}}}+\epsilon=1.$$

So at most $2^{k}$ columns satisfy ${\bf Pr}_{r\in\hat{T}}[{\cal M}^{(s)}_{r}\in S_{r}]=1$. If the number of defectives is greater than or equal to $2^{k}+1$, then for the columns of the defectives we have ${\bf Pr}_{r\in\hat{T}}[{\cal M}^{(s)}_{r}\in S_{r}]=1$. So for more than $2^{k}$ columns we have ${\bf Pr}_{r\in\hat{T}}[{\cal M}^{(s)}_{r}\in S_{r}]=1$. Hence, the algorithm ${\cal A}$ that outputs $1$ if more than $2^{k}$ columns $s$ satisfy ${\bf Pr}_{r\in\hat{T}}[{\cal M}^{(s)}_{r}\in S_{r}]=1$, and $0$ otherwise, satisfies both conditions.

∎
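The construction in this proof can be sketched in Python as follows. This is an illustration under stated assumptions: `toy_F` is a hypothetical placeholder map and not a verified condenser, strings in $\{0,1\}^{\hat{n}}$, $\{0,1\}^{\hat{t}}$ and $\{0,1\}^{\hat{m}}$ are encoded as integers, and all function names are introduced here. The sketch therefore demonstrates only the structural facts used in the proof ($|S_{r}|\leq\ell$ and full agreement for defective columns), not the guarantee of Lemma 19.

```python
def toy_F(s, r, m_hat):
    """Hypothetical placeholder for F(s, r); NOT a verified condenser."""
    return (s * (2 * r + 1) + r) % (2 ** m_hat)

def induced_matrix(n_hat, t_hat, m_hat):
    """Matrix M with M[r][s] = F(s, r)."""
    return [[toy_F(s, r, m_hat) for s in range(2 ** n_hat)]
            for r in range(2 ** t_hat)]

def outcomes(M, m_hat, defectives):
    """y[(r, j)] = 1 iff some defective s has M[r][s] == j."""
    return {(r, j): int(any(row[s] == j for s in defectives))
            for r, row in enumerate(M) for j in range(2 ** m_hat)}

def mixture(y, t_hat, m_hat):
    """S_r = {j : y[(r, j)] = 1} for every seed r."""
    return [{j for j in range(2 ** m_hat) if y[(r, j)] == 1}
            for r in range(2 ** t_hat)]

def agreement(M, S, s):
    """Fraction of seeds r with M[r][s] in S_r."""
    return sum(M[r][s] in S[r] for r in range(len(M))) / len(M)

def decide(M, t_hat, m_hat, defectives, k):
    """Output 1 iff more than 2^k columns agree with S on every seed."""
    S = mixture(outcomes(M, m_hat, defectives), t_hat, m_hat)
    full = sum(1 for s in range(len(M[0])) if agreement(M, S, s) == 1)
    return 1 if full > 2 ** k else 0
```

With $\hat{n}=4$, $\hat{t}=2$ and $\hat{m}=3$, the sketch uses $2^{\hat{t}+\hat{m}}=32$ tests for $16$ items.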

The following lemma summarises the state-of-the-art result due to Guruswami et al. [15] on the explicit construction of extractors.

Lemma 21.

For all positive integers $\hat{n},k$ such that $\hat{n}\geq k$, and all $\epsilon>0$, there is an explicit $(k,\epsilon)$-extractor $F:\{0,1\}^{\hat{n}}\times\{0,1\}^{\hat{t}}\to\{0,1\}^{\hat{m}}$ with $\hat{t}=\log{\hat{n}}+O(\log{k}\log{(k/\epsilon)})$ and $\hat{m}=k^{\prime}=k-2\log(1/\epsilon)-c$ for some constant $c$.

We now prove:

Lemma 22.

There is a constant $C$ such that for every $\Delta>C$, there is a polynomial time deterministic non-adaptive algorithm that estimates the number of defective items in a set of $n$ items up to a multiplicative factor of $\Delta$ and asks

$$O\left(\frac{D^{1+o(1)}}{\Delta^{2}}\log{n}\right)$$

queries.

Proof.

We use the notations from Lemma 21. Let $C=27\cdot 2^{c-2}$ . We choose $\epsilon=2/3$ and $k^{\prime}$ such that $(1-\epsilon)2^{k^{\prime}}=\ell/\Delta^{2}$ . Then

$$2^{k}=2^{k^{\prime}+2\log(1/\epsilon)+c}=27\cdot 2^{c-2}\frac{\ell}{\Delta^{2}}<\frac{\ell}{\Delta},$$

where the inequality holds because $\Delta>C$.

By Lemma 20, there is a deterministic non-adaptive algorithm ${\cal A}$ for $n=2^{\hat{n}}$ items that makes

$$m=2^{\hat{t}+\hat{m}}=\hat{n}2^{O(\log k\log(k/\epsilon))}\frac{\ell}{(1-\epsilon)\Delta^{2}}=2^{O(\log^{2}\log(\ell/\Delta))}\frac{\ell}{\Delta^{2}}\log n$$

tests and satisfies the following:

1. If the number of defectives is less than $(1-\epsilon)2^{k^{\prime}}={\ell}/{\Delta^{2}}$, then ${\cal A}$ outputs $0$.

2. If the number of defectives is greater than or equal to $2^{k}+1$, then ${\cal A}$ outputs $1$. Since $2^{k}<\ell/\Delta$, in particular, if the number of defectives is greater than or equal to $\ell/\Delta$, then ${\cal A}$ outputs $1$.

By Lemma 12, the result follows. ∎
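The arithmetic behind the choice $\epsilon=2/3$ and $C=27\cdot 2^{c-2}$ can be checked with exact rationals. In the sketch below, the function name `lemma22_thresholds` and the sample values are illustrative assumptions; in particular, the constant $c$ of Lemma 21 is not specified in the text, so $c=2$ (giving $C=27$) is a hypothetical integer value, and integrality of $k$ and $k^{\prime}$ is ignored.

```python
from fractions import Fraction

def lemma22_thresholds(ell, Delta, c):
    """Return (2^{k'}, 2^k) for eps = 2/3 and (1 - eps) 2^{k'} = ell / Delta^2."""
    eps = Fraction(2, 3)
    two_k_prime = Fraction(ell) / ((1 - eps) * Fraction(Delta) ** 2)
    two_k = two_k_prime * (1 / eps) ** 2 * 2 ** c  # 2^{k' + 2 log(1/eps) + c}
    return two_k_prime, two_k

c = 2                          # hypothetical value of the constant c
C = Fraction(27 * 2 ** c, 4)   # C = 27 * 2^{c-2}
ell = 1000
for Delta in (30, 20):         # one value above C = 27 and one below
    _, two_k = lemma22_thresholds(ell, Delta, c)
    assert two_k == C * Fraction(ell, Delta ** 2)
    assert (two_k < Fraction(ell, Delta)) == (Delta > C)
```

The second assertion illustrates why the lemma requires $\Delta>C$: for $\Delta\leq C$ the upper threshold $2^{k}$ is no longer below $\ell/\Delta$.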

Similar work by Capalbo et al. [4], which gives an explicit construction of a lossless condenser, is summarised in the following lemma:

Lemma 23.

For all positive integers $\hat{n},k$ and all $\epsilon>0$ , there is an explicit lossless condenser $F:\{0,1\}^{\hat{n}}\times\{0,1\}^{\hat{t}}\to\{0,1\}^{\hat{m}}$ with $\hat{t}=O(\log^{3}(\hat{n}/\epsilon))$ and $\hat{m}=k+\log(1/\epsilon)+O(1)$ .

The construction from Lemma 23 yields a result that is similar to the one established in Lemma 22.

References

[1] Abhishek Agarwal, Larkin Flodin, and Arya Mazumdar. Estimation of sparsity via simple measurements. In 2017 IEEE International Symposium on Information Theory (ISIT), pages 456–460. IEEE, 2017.
[2] Nader H. Bshouty. Lower bound for non-adaptive estimation of the number of defective items. In 30th International Symposium on Algorithms and Computation, ISAAC 2019, December 8-11, 2019, Shanghai University of Finance and Economics, Shanghai, China, pages 2:1–2:9, 2019.
[3] Nader H. Bshouty, Vivian E. Bshouty-Hurani, George Haddad, Thomas Hashem, Fadi Khoury, and Omar Sharafy. Adaptive group testing algorithms to estimate the number of defectives. CoRR, abs/1712.00615, 2017.
[4] Michael Capalbo, Omer Reingold, Salil Vadhan, and Avi Wigderson. Randomness conductors and constant-degree expansion beyond the degree/2 barrier. 01 2002.
[5] Yongxi Cheng and Yin-Feng Xu. An efficient FPRAS type group testing procedure to approximate the number of defectives. Journal of Combinatorial Optimization, 27:302–314, 2014.
[6] Mahdi Cheraghchi. Noise-resilient group testing: Limitations and constructions. CoRR, abs/0811.2609, 2008.
[7] C. L. Chen and W. H. Swallow. Using group testing to estimate a proportion, and to test the binomial model. Biometrics, 46(4):1035–1046, 1990.
[8] Graham Cormode and S. Muthukrishnan. What’s hot and what’s not: Tracking most frequent items dynamically. ACM Trans. Database Syst., 30(1):249–278, March 2005.
[9] Peter Damaschke and Azam Sheikh Muhammad. Bounds for nonadaptive group tests to estimate the amount of defectives. In Weili Wu and Ovidiu Daescu, editors, Combinatorial Optimization and Applications, pages 117–130, Berlin, Heidelberg, 2010. Springer Berlin Heidelberg.
[10] Peter Damaschke and Azam Sheikh Muhammad. Competitive group testing and learning hidden vertex covers with minimum adaptivity. In Mirosław Kutyłowski, Witold Charatonik, and Maciej Gebala, editors, Fundamentals of Computation Theory, pages 84–95, Berlin, Heidelberg, 2009. Springer Berlin Heidelberg.
[11] Robert Dorfman. The detection of defective members of large populations. The Annals of Mathematical Statistics, 14(4):436–440, 1943.
[12] Moein Falahatgar, Ashkan Jafarpour, Alon Orlitsky, Venkatadheeraj Pichapati, and Ananda Suresh. Estimating the number of defectives with group testing. pages 1376–1380, 07 2016.
[13] Joseph L. Gastwirth and Patricia A. Hammick. Estimation of the prevalence of a rare disease, preserving the anonymity of the subjects by group testing: application to estimating the prevalence of AIDS antibodies in blood donors. Journal of Statistical Planning and Inference, 22(1):15–27, 1989.
[14] Christian Gollier and Olivier Gossner. Group testing against COVID-19. Covid Economics, pages 32–42, 04 2020.
[15] V. Guruswami, C. Umans, and S. Vadhan. Unbalanced expanders and randomness extractors from Parvaresh-Vardy codes. In Twenty-Second Annual IEEE Conference on Computational Complexity (CCC’07), pages 96–108, 2007.
[16] E. S. Hong and R. E. Ladner. Group testing for image compression. IEEE Transactions on Image Processing, 11(8):901–911, 2002.
[17] W. Kautz and R. Singleton. Nonrandom binary superimposed codes. IEEE Transactions on Information Theory, 10(4):363–377, 1964.
[18] Chou Li. A sequential method for screening experimental variables. Journal of the American Statistical Association, 57:455–477, 06 1962.
[19] Cassidy Mentus, Martin Romeo, and Christian DiPaola. Analysis and applications of adaptive group testing methods for COVID-19. medRxiv, 2020.
[20] Hung Ngo and Ding-Zhu Du. A survey on combinatorial group testing algorithms with applications to DNA library screening. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 55, 12 2000.
[21] Dana Ron and Gilad Tsur. The power of an example: Hidden set size approximation using group queries and conditional sampling. CoRR, abs/1404.5568, 2014.
[22] M. Sobel and P. A. Groll. Group testing to eliminate efficiently all defectives in a binomial sample. The Bell System Technical Journal, 38(5):1179–1252, 1959.
[23] William H Swallow. Group testing for estimating infection rates and probabilities of disease transmission. Phytopathology (USA), 1985.
[24] Keith H Thompson. Estimation of the proportion of vectors in a natural population of insects. Biometrics, 18(4):568–578, 1962.
[25] Stephen D. Walter, Stephen W. Hilderth, and Barry J. Beaty. Estimation of infection rates in populations of organisms using pools of variable size. American Journal of Epidemiology, 112(1):124–128, 07 1980.
[26] J. Wolf. Born again group testing: Multiaccess communications. IEEE Transactions on Information Theory, 31(2):185–191, 1985.
[27] Idan Yelin, Noga Aharony, Einat Shaer-Tamar, Amir Argoetti, Esther Messer, Dina Berenbaum, Einat Shafran, Areen Kuzli, Nagam Gandali, Tamar Hashimshony, Yael Mandel-Gutfreund, Michael Halberthal, Yuval Geffen, Moran Szwarcwort-Cohen, and Roy Kishony. Evaluation of COVID-19 RT-qPCR test in multi-sample pools. medRxiv, 2020.
