Optimal Deterministic Group Testing Algorithms
to Estimate the Number of Defectives
Abstract
We study the problem of estimating the number of defective items within a pile of elements up to a multiplicative factor of , using deterministic group testing algorithms. We bring lower and upper bounds on the number of tests required in both the adaptive and the non-adaptive deterministic settings given an upper bound on the defectives number. For the adaptive deterministic settings, our results show that, any algorithm for estimating the defectives number up to a multiplicative factor of must make at least tests. This extends the same lower bound achieved in [1] for non-adaptive algorithms. Moreover, we give a polynomial time adaptive algorithm that shows that our bound is tight up to a small additive term.
For non-adaptive algorithms, an upper bound of is achieved by means of non-constructive proof. This improves the lower bound from [1] and matches the lower bound up to a small additive term.
In addition, we study polynomial time constructive algorithms. We use existing polynomial time constructible expander regular bipartite graphs, extractors and condensers to construct two polynomial time algorithms. The first algorithm makes tests, and the second makes tests. This is the first explicit construction with an almost optimal test complexity.
1 Introduction
The problem of group testing is the problem of identifying or, in some cases, examining the properties of a small amount of items known as defective items within a pile of elements using group tests. Let be a set of items, and let be the set of defective items. A group test is a subset of items. The result of the test with respect to is defined by if and otherwise. While the defective set is unknown to the algorithm, in many cases we might be interested in finding the size of the defective set , or at least an estimation of that value with a minimum number of tests.
Group testing was originally proposed as a potential solution for economising mass blood testing during WWII [11]. Since then, group testing approach has been diversely applied in a wide area of practical applications including DNA library screening [20], product testing quality control[22], file searching in storage systems [17], sequential screening of experimental variables [18], efficient contention algorithms for MAC [17, 26], data compression [16], and computations in data stream model [8]. Recently, during the COVID-19 pandemic outbreak, a number of researches adopted the group testing paradigm not only to accelerate mass testing process, but also to dramatically reduce the number of kits required for testing due to severe shortages in the testing kits supply [27, 14, 19].
While an up-front knowledge of the value of or at least an upper bound on it is required in many of the algorithms aimed at identifying the defective items, estimating or finding the number of defectives is an interesting problem on its own as well. Defectives estimation via group testing has been applied vastly in biological and medical applications [7, 23, 24, 25, 13]. In [24], for example, group testing algorithms are used to estimate aster-yellow virus transmitters proportion over the organisms in a natural population of leafhoppers. Similarly, in [25], the authors estimate the infection rate of the yellow-fever virus in mosquito population using group testing methods. On the other hand, in [13], group-testing-based estimation of rare diseases prevalence is employed not only for its effectiveness but also because it naturally preserves individual anonymity of the subjects.
Algorithms dedicated for this task might operate in stages or rounds. In each round, the tests are defined in advance and tested in a single parallel step. Tests on some round might depend on the test results of the preceding rounds. A single round algorithm is called non-adaptive algorithm, while a multi-round algorithm is called adaptive algorithm.
In recent years, there has been an increasing interest in the problem of estimating the number of defective items via group testing[3, 2, 7, 5, 9, 10, 12, 21]. The target in some of these papers is to find an estimation within an additive factor of such that . For randomized adaptive algorithms we have the following results. Falhatgar et.al. [12] give a randomised adaptive algorithm that estimates using queries in expectation where is the failure probability of the algorithm. Bshouty et. al. [3] modified this result and gave an algorithm that uses expected number of queries. Moreover, they proved a lower bound of queries.
For randomized non-adaptive algorithms with constant estimation, Damaschke and Sheikh Muhammad give in [10] a randomized non-adaptive algorithm that makes tests and in [2], Bshouty gives the lower bound .
In this paper, we are interested in deterministic adaptive and non-adaptive algorithms that estimate the defective items set size up to a multiplicative factor of . Formally, let and let . We say that a deterministic algorithm estimates up to a multiplicative factor of if, given as an input to the algorithm, it evaluates an estimation such that . Bshouty et al. show in [3] that, if no upper bound is given to the algorithm, then any deterministic adaptive algorithm (and therefore also non-deterministic algorithm) for this problem must make at least tests. This is equivalent to testing all the items. This justifies the fact that any non-trivial efficient algorithm must have some upper bound for .
Agarwal et.al. [1] consider this problem. They first give the lower bound of queries for any non-adaptive deterministic algorithm. Moreover, using a non-constructive proof, they give an upper bound of queries.
We further investigate this problem. We bring new lower and upper bounds on the number of tests required both in adaptive and non-adaptive deterministic algorithms. For the adaptive deterministic settings, our results show that, any algorithm for estimating the defectives number up to a multiplicative factor of must make at least tests. This extends the same lower bound achieved in [1] for non-adaptive algorithms. Furthermore, we give a polynomial time adaptive algorithm that shows that our bound is tight up to a small additive term.
For non-adaptive algorithms, we achieve an upper bound of by means of non-constructive proof. This improves the lower bound from [1], and matches the lower bound up to a small additive term.
We then study polynomial time constructive algorithms. For this task, we use existing polynomial time constructible expander regular bipartite graphs, extractors and condensers to construct two polynomial time algorithms. The first algorithm makes tests, and the second makes tests. To the best of our knowledge, this is the first explicit construction with an almost optimal test complexity. Our results are summarised in Table 1.
Bounds | Adaptive/ | Result | Explicit/ | Ref. |
---|---|---|---|---|
Non-Adapt. | Non-Expl. | |||
Lower B. | Non-Adapt. | - | [1] | |
Lower B. | Adaptive | - | Ours | |
Upper B. | Adaptive | Explicit | Ours | |
Upper B. | Non-Adapt. | Non-Expl. | [1] | |
Upper B. | Non-Adapt. | Non-Expl. | Ours | |
Upper B. | Non-Adapt. | Explicit111This result is true for for some constant . See section 6.2. | Ours | |
Upper B. | Non-Adapt. | Explicit | Ours |
for estimating defectives in deterministic group testing.
2 Definitions and Preliminary Results
In this section, we give some notations and definition that will be used in this paper.
Let be a set of items. Let be a set of defective items, and let denote its size, i.e. . In the group testing settings, a test is a subset of items. An answer to a test with respect to the defective items set , is denoted by , such that if and otherwise. We denote by an oracle that for a test returns .
Let be an algorithm with an access to , and let . We say that the algorithm estimates up to a multiplicative factor of , if gets as an input an upper bound and a parameter , and outputs such that . We say that is an adaptive algorithm, if its queries depend on the result of previous queries, and non-adaptive if its queries are independent of previous ones and therefore, can be executed in a single parallel step. We may assume that , otherwise, the algorithm trivially outputs . We note here that , that is, it is greater than a constant that is greater than and it may depend222For example on and/or . This is implicit in [1] and is also constrained in this paper. It is also interesting to investigate this problem when where (small ) is with respect to and/or .
We will use the following
Lemma 1.
Chernoff’s Bound. Let be independent random variables taking values in . Let denotes their sum and denotes the sum’s expected value. Then
(1) |
In particular,
(2) |
For we have
(3) |
Moreover, we will often use the inequality
(4) |
3 Upper Bound for Non-Adaptive Deterministic Algorithms
In this section, we give the upper bound for deterministic non-adaptive algorithm that estimates up to a multiplicative factor of . We prove:
Theorem 2.
Let be some upper bound on the number of defective items and . Then, there is a deterministic non-adaptive algorithm that makes
tests and outputs such that
To prove the Theorem we need the following:
Lemma 3.
Let and . There is a non-adaptive deterministic algorithm that makes
tests such that,
-
1.
If the number of defectives is less than , it outputs .
-
2.
If it is greater than , it outputs .
Proof.
We choose a constant such that . Note that
and
Therefore, such exists and we have .
Consider a test chosen at random where each item is chosen to be in with probability . Let be the set of defective items such that , and let be the result of the test with respect to the set . Then,
(5) |
If ,
(6) |
if ,
(7) |
and if , we get:
(8) |
Let be a sequence of i.i.d tests such that
where and .
Let
Consider the following two events:
-
1.
: There is a set of defectives of size such that the number of tests with answer is less than .
-
2.
: There is a set of defectives of size such that the number of tests with answer is at least .
Notice that, to prove the lemma it is enough to prove that . We will show that .
Let be random variables such that if and only if . Let be the number of tests that yield the result . Therefore, and define .
If , then . By (6) we have
(9) |
By (3) in Lemma 1, for we have
Using this result, equations (4) and the union bound, we can conclude that
On the other hand, for the event , we have two cases.
Case I. .
If there is a set of defectives of size such that more than of the tests yield the answer , then there is a set of defectives of size such that more than of the tests answers are . Denote by the latter event. Then, by (8) we have and for , we get
If then and then by (1) in Lemma 1, we have
Case II. .
In this case we have . Therefore, if there is a set of defectives of size such that more than of the tests yield the answer , then there is a set of defectives of size such that more than of the tests answers are . Denote by the latter event. By (7), . Let . Then . By (1) in Lemma 1, we have
Then
∎
We are now ready to prove Theorem 2.
Let be the algorithm from Lemma 3. Then, makes at most
(10) |
queries for some constant , and
-
1.
If , then .
-
2.
If , then
Consider the algorithm that runs for all . Let be the minimum integer such that . Algorithm then outputs . See algorithm in Figure 1.
1) 2) For each do: 2.1) 2.2) If then Output ().
We now prove:
Lemma 4.
Algorithm is deterministic non-adaptive that makes
tests and outputs that satisfies
Proof.
For , if then . Then and since we also have .
For , if and then and . Then and .
Let . Let denote the number of queries performed by algorithm . By (10), the number of tests is at most
For the case when we get
and for the case when we get
∎
4 Lower Bound for Adaptive Deterministic Algorithm
In this section, we prove the following lower bound.
Theorem 5.
Any deterministic adaptive group testing algorithm that given , outputs that satisfies must make at least
queries.
For the proof, we use the following from [3].
Lemma 6.
Let be a deterministic adaptive algorithm that for a defective sets makes the tests and let be the sequence of answers to these tests. If then the test complexity of is .
The following Lemma assists us to prove the result declared by Theorem 5.
Lemma 7.
Any deterministic adaptive algorithm such that, if the number of defectives is less than or equal it outputs and if it is greater than it outputs , must make
tests.
In particular, when and we get
tests.
Proof.
Let be such algorithm. Let be the sequence of answers to the tests of when the set of defective items is . Consider a set of size and let . Let . We claim that . Suppose for the contrary, . Then, since , there is a test that is asked by that gives answer to and to . Since , there is a subset such that and therefore gives answer to . Then and we get a contradiction.
Since and algorithm outputs to , it also outputs to . Therefore, . Therefore . That is, for every possible sequence of answers of the algorithm , there is at most sets of size that get the same sequence of answers. Since there are such sets, the number of different sequences of answers that might have must be at least . By Lemma 6, the number of tests that the algorithm makes is at least
∎
The conclusions established by Lemma 7 show that the upper bound from Lemma 3 is tight. Moreover, using these results, we provide the following proof for Theorem 5.
Proof.
Let and . For sets of size less than or equal the algorithm returns and for sets of equal to the algorithm returns . Since , the above intervals are disjoint. So, the algorithm can distinguish between sets of size less that or equal to and sets of size greater than . By Lemma 7 the algorithm must make at least
tests. ∎
5 Polynomial Time Adaptive Algorithm
In this section, we prove:
Theorem 8.
Let be some upper bound on the number of defective items and . Then, there is a linear time deterministic adaptive algorithm that makes
tests and outputs such that
We first describe the algorithm. The algorithm gets as an input the set of items and splits it into two equally-sized disjoint sets and . The algorithm asks the queries defined by and and proceeds in the splitting process on the sets that yielded positive answers only. We call these sets defective sets. As long as the algorithm gets less than distinct defective sets, it continues to split and test. Two cases can happen. Either it gets defective sets and then the algorithm outputs , or the number of the defective sets is always less than and then, the algorithm finds all the defective items and returns their exact number. The algorithm is given in Figure 2. The algorithm invokes the procedure that on an input , it returns the set where such that , , if , and otherwise.
Adaptive-dEstimate 1) 2) While do: 2.1) For each If then If then 2.2) If Output () Else 3) . 4) Output ()
Lemma 9.
Algorithm Adaptive-dEstimate is a deterministic adaptive algorithm that makes
tests and outputs an estimation such that:
Proof.
If , then the splitting process in step 2 of the algorithm proceeds until each defective item belongs to a distinct set. Eventually, the condition in step 2.2 is met and the algorithm outputs the exact value of . If , then the splitting process stops when the number of defective sets exceeds . The algorithm halts and outputs . Obviously, . Therefore, Moreover, which implies that
The number of iterations cannot exceed iterations. In the first iterations, in the worst case scenario, the algorithm splits its current set on each iteration into two sets and such that . Therefore, the number of tests that the algorithm asks over all the first iterations is at most
Since , in the other iterations, the algorithm makes at most tests each iteration. So, the total number of tests is at most
. ∎
6 Polynomial Time Non-Adaptive Algorithm
In this section, we show how to use expanders, condensers and extractors to construct deterministic non-adaptive algorithms for defectives number estimation. We prove:
Theorem 10.
Let be some upper bound on the number of defective items and . Then, there is a polynomial time deterministic non-adaptive algorithm that makes
tests and outputs such that
6.1 Algorithms Using Expanders
Let be a bipartite graph with left vertices , right vertices and edges . For each edge , it holds that the endpoint and . For a vertex , define to be the set of the neighbours of in i.e. . For a subset , we define to be the set of neighbours of , meaning . For a vertex , the degree of is defined as . We say that a bipartite graph is a -expander -regular bipartite graph if, the degree of every vertex in is , and for every left-subset of size at most , we have .
Lemma 11.
Let be a set of items and is the set of defective items such that is unknown to the algorithm. Let be a -expander -regular bipartite graph with and . Then, there is a deterministic non-adaptive algorithm , such that for items, it makes tests and satisfies:
-
1.
If , then outputs .
-
2.
If , then outputs .
Proof.
For every , we define the test . The number of tests is . If , then . Therefore, at least tests will give positive answer . If , then, since the degree of every vertex in is , we have . This shows that, for this case, at most tests give the answer . Hence, we can distinguish between the two cases. ∎
Lemma 12.
Let be a deterministic non-adaptive algorithm such that, for items, it makes tests and satisfies:
-
1.
If , then outputs .
-
2.
If , then outputs .
Then, there is a deterministic non-adaptive algorithm such that, given , for items it makes
tests and outputs that satisfies .
The parameters of the explicit construction of a -expander -regular bipartite graph from [4] are summarised in the following lemma.
Lemma 13.
For any and , there is an explicit construction of a -expander -regular bipartite graph with
We now prove:
Lemma 14.
There is a polynomial time deterministic non-adaptive algorithm that makes
tests and outputs that satisfies
Proof.
We use the expander in Lemma 13. Recall that . Let , and . Then and . By Lemma 11, there is a deterministic non-adaptive algorithm such that for items, it makes tests and
-
1.
If then outputs .
-
2.
If then outputs .
Algorithm trivially satisfies the first condition required by Lemma 12. Consider item 2. If then and then if then outputs . If then and then if then outputs . Since , if then outputs .
Now by Lemma 12, there is a deterministic non-adaptive algorithm such that, given , for items, it makes
tests and outputs that satisfies .
∎
6.2 Algorithms Using Extractors and Condensers
Extractors are functions that convert weak random sources into almost-perfect random sources. We use these objects to construct a non-adaptive algorithm for estimating . We start with some definitions.
Definition 15.
Let be a random variable over a finite set . We say that has min-entropy at least if for all .
Definition 16.
Let and be random variables over a finite set . We say that and are if
We denote by the uniform distribution on . The notations or stand for the fact that the probability and the expectation are taken when is chosen randomly uniformly from .
Definition 17.
A function is a condenser if for every with min-entropy at least and uniformly distributed on , the distribution of is -close to a distribution with min-entropy . A condenser is called -lossless condenser if . A condenser is called -extractor if .
Let and , and let be a condenser. Consider the matrix induced by . That is, for and , the entry is equal to . For , let be the th column of . Then, .
Definition 18.
Let be a finite set. An -mixture over is an tuple such that for all , .
Using these definitions and notations, we restate the result proved by Cheraghchi [6] (Theorem 9) in the following lemma.
Lemma 19.
Let be a condenser. Let be the matrix induced by . Then, for any mixture over , the number of columns in that satisfies
is less than .
Equipped with Lemma 19, we prove:
Lemma 20.
If there is a condenser then, there is a deterministic non-adaptive algorithm for items that makes tests and satisfies the following.
-
1.
If the number of defectives is less than then outputs .
-
2.
If the number of defectives is greater than or equal then outputs .
Proof.
Consider the matrix induced by the condenser as explained above. We define the test matrix from as follows. Let . Define such that if and only if , where the bits in are indexed by the elements of . Each row in the matrix is replaced by rows (in ) such that in each entry is replaced by the column vector . The rows of the matrix are indexed by . Let denote the th column of . Therefore, for and , the row in the matrix is denoted by . Moreover, the th entry of the row is denoted by and if and only if . The size of the test matrix is .
Let the defective elements be and let indicate the tests result. Then, is equal to . Let be a mixture over where for all , . It is easy to see that:
-
1.
. This is because, by the definition of , if and only if . The entry gets the value if at least one of the entries is . Any row in has exactly one entry that is equal to in all the rows indexed by . Hence, each row can cause one item to be inserted to .
-
2.
For any , we have
-
3.
Given the matrix , its test matrix and the observed result , for any column the probability can be easily computed.
If the number of defectives is less than then, by Lemma 19, all columns, except for at most columns, satisfy
So for less than columns we have . If the number of defectives is greater than or equal , then for the columns of the defectives we have . So for more than columns we have .
∎
The following Lemma summarises the state of the art result due to Guruswami et. al. [15] on explicit construction of expanders.
Lemma 21.
For all positive integers such that , and all , there is an explicit extractor with and for some constant .
We now prove:
Lemma 22.
There is a constant such that for every , there is a polynomial deterministic non-adaptive algorithm that estimates the number of defective items in a set of items up to a multiplicative factor of and asks
queries.
Proof.
We use the notations from Lemma 21. Let . We choose and such that . Then
By Lemma 20, there is a deterministic non-adaptive algorithm for items that makes
tests that satisfies the following:
-
1.
If the number of defectives is less than then outputs .
-
2.
If the number of defectives is greater than or equal then outputs and, since , in particular, if the number of defectives is greater than or equal then outputs .
By Lemma 12, the result follows. ∎
A similar work by Capalbo et.al. [4] gives an explicit constrction of a lossless condenser is summarised in the following lemma:
Lemma 23.
For all positive integers and all , there is an explicit lossless condenser with and .
References
- [1] Abhishek Agarwal, Larkin Flodin, and Arya Mazumdar. Estimation of sparsity via simple measurements. In 2017 IEEE International Symposium on Information Theory (ISIT), pages 456–460. IEEE, 2017.
- [2] Nader H. Bshouty. Lower bound for non-adaptive estimation of the number of defective items. In 30th International Symposium on Algorithms and Computation, ISAAC 2019, December 8-11, 2019, Shanghai University of Finance and Economics, Shanghai, China, pages 2:1–2:9, 2019.
- [3] Nader H. Bshouty, Vivian E. Bshouty-Hurani, George Haddad, Thomas Hashem, Fadi Khoury, and Omar Sharafy. Adaptive group testing algorithms to estimate the number of defectives. CoRR, abs/1712.00615, 2017.
- [4] Michael Capalbo, Omer Reingold, Salil Vadhan, and Avi Wigderson. Randomness conductors and constant-degree expansion beyond the degree / 2 barrier. 01 2002.
- [5] Yongxi Cheng and Yin-Feng Xu. An efficient fpras type group testing procedure to approximate the number of defectives. Journal of Combinatorial Optimization, 27:302–314, 2014.
- [6] Mahdi Cheraghchi. Noise-resilient group testing: Limitations and constructions. CoRR, abs/0811.2609, 2008.
- [7] Chen CL and Swallow WH. Using group testing to estimate a proportion, and to test the binomial model. Biometrics, 46(4):1035–1046, 1990.
- [8] Graham Cormode and S. Muthukrishnan. What’s hot and what’s not: Tracking most frequent items dynamically. ACM Trans. Database Syst., 30(1):249–278, March 2005.
- [9] Peter Damaschke and Azam Sheikh Muhammad. Bounds for nonadaptive group tests to estimate the amount of defectives. In Weili Wu and Ovidiu Daescu, editors, Combinatorial Optimization and Applications, pages 117–130, Berlin, Heidelberg, 2010. Springer Berlin Heidelberg.
- [10] Peter Damaschke and Azam Sheikh Muhammad. Competitive group testing and learning hidden vertex covers with minimum adaptivity. In Mirosław Kutyłowski, Witold Charatonik, and Maciej Gebala, editors, Fundamentals of Computation Theory, pages 84–95, Berlin, Heidelberg, 2009. Springer Berlin Heidelberg.
- [11] Robert Dorfman. The detection of defective members of large populations. The Annals of Mathematical Statistics, 14(4):436–440, 1943.
- [12] Moein Falahatgar, Ashkan Jafarpour, Alon Orlitsky, Venkatadheeraj Pichapati, and Ananda Suresh. Estimating the number of defectives with group testing. pages 1376–1380, 07 2016.
- [13] Joseph L. Gastwirth and Patricia A. Hammick. Estimation of the prevalence of a rare disease, preserving the anonymity of the subjects by group testing: application to estimating the prevalence of aids antibodies in blood donors. Journal of Statistical Planning and Inference, 22(1):15 – 27, 1989.
- [14] Christian Gollier and Olivier Gossner. Group testing against covid-19. Covid Economics, pages 32–42, 04 2020.
- [15] V. Guruswami, C. Umans, and S. Vadhan. Unbalanced expanders and randomness extractors from parvaresh-vardy codes. In Twenty-Second Annual IEEE Conference on Computational Complexity (CCC’07), pages 96–108, 2007.
- [16] E. S. Hong and R. E. Ladner. Group testing for image compression. IEEE Transactions on Image Processing, 11(8):901–911, 2002.
- [17] W. Kautz and R. Singleton. Nonrandom binary superimposed codes. IEEE Transactions on Information Theory, 10(4):363–377, 1964.
- [18] Chou Li. A sequential method for screening experimental variables. Journal of The American Statistical Association - J AMER STATIST ASSN, 57:455–477, 06 1962.
- [19] Cassidy Mentus, Martin Romeo, and Christian DiPaola. Analysis and applications of adaptive group testing methods for covid-19. medRxiv, 2020.
- [20] Hung Ngo and Ding-Zhu Du. A survey on combinatorial group testing algorithms with applications to dna library screening. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 55, 12 2000.
- [21] Dana Ron and Gilad Tsur. The power of an example: Hidden set size approximation using group queries and conditional sampling. CoRR, abs/1404.5568, 2014.
- [22] M. Sobel and P. A. Groll. Group testing to eliminate efficiently all defectives in a binomial sample. The Bell System Technical Journal, 38(5):1179–1252, 1959.
- [23] William H Swallow. Group testing for estimating infection rates and probabilities of disease transmission. Phytopathology (USA), 1985.
- [24] Keith H Thompson. Estimation of the proportion of vectors in a natural population of insects. Biometrics, 18(4):568–578, 1962.
- [25] Stephen D. Walter, Stephen W. Hilderth, and Barry J. Beaty. Estimation of infection Rates in Populations of Organisms Using Pools of Variable Size. American Journal of Epidemiology, 112(1):124–128, 07 1980.
- [26] J. Wolf. Born again group testing: Multiaccess communications. IEEE Transactions on Information Theory, 31(2):185–191, 1985.
- [27] Idan Yelin, Noga Aharony, Einat Shaer-Tamar, Amir Argoetti, Esther Messer, Dina Berenbaum, Einat Shafran, Areen Kuzli, Nagam Gandali, Tamar Hashimshony, Yael Mandel-Gutfreund, Michael Halberthal, Yuval Geffen, Moran Szwarcwort-Cohen, and Roy Kishony. Evaluation of covid-19 rt-qpcr test in multi-sample pools. medRxiv, 2020.