On the All-Or-Nothing Behavior
of Bernoulli Group Testing
Abstract
In this paper, we study the problem of non-adaptive group testing, in which one seeks to identify which items are defective given a set of suitably-designed tests whose outcomes indicate whether or not at least one defective item was included in the test. The most widespread recovery criterion seeks to exactly recover the entire defective set, and relaxed criteria such as approximate recovery and list decoding have also been considered. In this paper, we study the fundamental limits of group testing under the significantly relaxed weak recovery criterion, which only seeks to identify a small fraction (e.g., ) of the defective items. Given the near-optimality of i.i.d. Bernoulli testing for exact recovery in sufficiently sparse scaling regimes, it is natural to ask whether this design additionally succeeds with much fewer tests under weak recovery. Our main negative result shows that this is not the case, and in fact, under i.i.d. Bernoulli random testing in the sufficiently sparse regime, an all-or-nothing phenomenon occurs: When the number of tests is slightly below a threshold, weak recovery is impossible, whereas when the number of tests is slightly above the same threshold, high-probability exact recovery is possible. In establishing this result, we additionally prove similar negative results under Bernoulli designs for the weak detection problem (distinguishing between the group testing model vs. completely random outcomes) and the problem of identifying a single item that is definitely defective. On the positive side, we show that all three relaxed recovery criteria can be attained using considerably fewer tests under suitably-chosen non-Bernoulli designs. Thus, our results collectively indicate that when too few tests are available, naively applying i.i.d. Bernoulli testing can lead to catastrophic failure, whereas “cutting one’s losses” and adopting a more carefully-chosen design can still succeed in attaining these less stringent criteria.
Index Terms:
Group testing, hypothesis testing, approximate recovery, phase transitions.
I Introduction
The group testing problem has recently regained significant attention following new applications and connections with compressive sensing; see [1] for a recent survey. Briefly, the idea of group testing is to identify a small subset of defective items within a larger subset of items, based on a number of tests whose binary outcomes indicate whether or not at least one defective item was included in the test.
The standard recovery goal in group testing is to exactly identify the entire defective set. In combinatorial group testing [2, 3] a single test design is required to succeed for all defective sets up to a certain size, whereas in probabilistic group testing [4, 1] only high-probability recovery is required with respect to a random defective set (and/or a random test design). Various relaxed recovery criteria have also appeared, including list decoding recovery [5, 6, 7, 8, 9, 10, 11] and approximate recovery criteria that allow a small number of false positives and/or false negatives in the reconstruction [12, 11, 13].
In this paper, focusing on probabilistic group testing, our goal is to better understand the fundamental limits of what can be achieved in the group testing problem under significantly weaker recovery criteria. In particular, instead of asking when it is possible to recover most of the defectives, we seek to understand when it is possible just to recover a small fraction. For general non-adaptive designs, we show that this goal can be attained with far fewer tests, and identify an exact threshold on the number required. On the other hand, for the widely-adopted i.i.d. (independent and identically distributed) Bernoulli test matrix design, we identify scenarios under which an all-or-nothing phenomenon occurs: When the number of tests is slightly above a certain threshold, high-probability exact recovery is possible, whereas slightly below the same threshold, essentially nothing can be learned from the tests. Thus, while Bernoulli designs can be near-optimal under standard recovery criteria, they are also prone to complete failure when there are too few tests, and accordingly can be highly suboptimal under relaxed recovery criteria. Along the way, we additionally provide analogous results for the problems of weak detection and identifying a definite defective, which are formally defined in Section I-C.
I-A Problem Setup
We consider a population of $n$ items indexed as $\{1,\dots,n\}$, and we let $k$ denote the number of defective items. The set of defective items is denoted by $S$, and is assumed to be uniform over the $\binom{n}{k}$ possibilities.
A group testing procedure performs a sequence of tests, with $X_i \in \{0,1\}^n$ indicating which items are in the $i$-th test. The resulting outcomes are $Y_i = \bigvee_{j \in S} X_{ij}$ for $i = 1,\dots,t$, where $t$ is the number of tests. That is, the test outcome is $1$ if there is any defective item in the test, and $0$ otherwise. The tests can be represented by a matrix $\mathbf{X} \in \{0,1\}^{t \times n}$, whose $i$-th row is $X_i$. Similarly, the outcomes can be represented by a vector $\mathbf{Y} \in \{0,1\}^t$, whose $i$-th entry is $Y_i$.
In general, group testing procedures may be adaptive (i.e., $X_i$ may be chosen as a function of the previous outcomes) or non-adaptive (i.e., all $X_i$ must be selected prior to observing any outcomes). We focus on the non-adaptive setting, which is often preferable in practice due to permitting highly parallelized tests. In particular, except where stated otherwise, we consider the widely-adopted i.i.d. Bernoulli random test design [1, Sec. 2.1], in which every item is independently placed in each test with probability $p = \frac{\nu}{k}$ for some $\nu > 0$, and we choose $\nu$ to be such that
$$\Bigl(1 - \frac{\nu}{k}\Bigr)^{k} = \frac{1}{2}. \qquad (1)$$
This choice ensures that the probability of a positive test is exactly $\frac{1}{2}$, which maximizes the entropy of each test outcome. More importantly, this choice of $p$ leads to a provably optimal number of tests in broad scaling regimes, as we survey in Section I-B. A simple asymptotic analysis gives $\nu \to \ln 2$ as $k \to \infty$, which behaves similarly to the choice $\nu = \ln 2$, but the exact choice described by (1) will be more convenient to work with.
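To make this setup concrete, the following sketch simulates a Bernoulli design with illustrative parameters (the closed form for $\nu$ is simply our algebraic rearrangement of (1), namely $\nu = k(1 - 2^{-1/k})$; all names and sizes here are ours, not the paper's):

```python
import math
import random

def bernoulli_nu(k):
    # Exact solution of (1 - nu/k)^k = 1/2, i.e. nu = k * (1 - 2^(-1/k)).
    return k * (1.0 - 2.0 ** (-1.0 / k))

def simulate_outcomes(n, defective, p, num_tests, rng):
    """Bernoulli design: each item joins each test independently w.p. p;
    a test is positive iff it contains at least one defective item."""
    outcomes = []
    for _ in range(num_tests):
        pool = {j for j in range(n) if rng.random() < p}
        outcomes.append(1 if pool & defective else 0)
    return outcomes

k = 50
nu = bernoulli_nu(k)
print(round(nu, 4), round(math.log(2), 4))  # nu is already close to ln 2

rng = random.Random(0)
n = 2000
defective = set(rng.sample(range(n), k))
y = simulate_outcomes(n, defective, nu / k, 1000, rng)
print(round(sum(y) / len(y), 2))  # empirical positive-test rate, close to 0.5
```

The per-test positive probability is exactly $\frac{1}{2}$ by construction, which the empirical rate reflects.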
I-B Related Work
There have recently been numerous developments on theory and algorithms for probabilistic group testing [14, 15, 16, 17, 13, 18, 19, 20] (see [1] for a survey); here we focus only on those most relevant to the present paper.
The most relevant works to us are those attaining upper and/or lower bounds on the number of tests of the form $k \log_2 \frac{n}{k}$. The most straightforward way that this quantity arises is that with $\binom{n}{k}$ possible defective sets and $2^t$ possible sequences of outcomes, we require $2^t \ge \binom{n}{k}$ for each defective set to produce different outcomes. In the sublinear regime $k = \Theta(n^{\theta})$ with $\theta \in (0,1)$, this simplifies to $t \ge k \log_2 \frac{n}{k}(1 + o(1))$. Building on this intuition, Fano's inequality was used in [4, 21] to show that $t \ge (1-\delta) k \log_2 \frac{n}{k}(1 + o(1))$ is required to attain an error probability of at most $\delta$, and a refined bound was established in [22].
More recently, various results showed that $t = O\bigl(k \log_2 \frac{n}{k}\bigr)$ tests are sufficient for certain recovery guarantees under broad scaling regimes on $k$ as a function of $n$. In [13], high-probability exact recovery was shown to be possible under Bernoulli random testing with $t = k \log_2 \frac{n}{k}(1 + o(1))$ in sufficiently sparse regimes, and in addition, this result was extended to all sublinear $k$ when the exact recovery criterion is replaced by the following approximate recovery criterion (see also [11]): Output a set $\hat{S}$ of cardinality $k$ such that
$$|\hat{S} \cap S| \ge (1 - \alpha) k \qquad (2)$$
for some $\alpha \in (0,1)$. The above-mentioned result holds for arbitrarily small $\alpha$, as long as it is bounded away from zero as $n \to \infty$.
On the other hand, the lower bounds for approximate recovery in [13, 11] only state that in order to attain (2) for fixed $\alpha$, it is necessary that $t \ge (1-\alpha) k \log_2 \frac{n}{k}(1 - o(1))$. This suggests that as $\alpha$ increases, the constant factor in the number of tests could be reduced. We will show that the preceding lower bound can in fact be matched with an upper bound using a suitably-chosen non-Bernoulli design, whereas Bernoulli designs can fail even for $\alpha$ close to one when the number of tests is slightly smaller than $k \log_2 \frac{n}{k}$.
While our discussion thus far focuses on Bernoulli designs, in the case of exact recovery, improved bounds have been shown for a different random test design based on near-constant tests-per-item [23, 19, 20], in particular permitting for all (improving on ). However, upon moving to approximate recovery with parameter , both designs attain the threshold in the limit as [11, 24], suggesting that there is less to be gained via the near-constant tests-per-item design under relaxed recovery criteria. Nevertheless, extending our results to this design may be an interesting direction for future work.
Our work is inspired by recent studies of the all-or-nothing behavior of sparse linear regression under i.i.d. Gaussian measurements; see [25] for a study of the maximum-likelihood estimator, and [26, 27] for general estimators. While group testing can be viewed as a non-linear Boolean counterpart to sparse linear regression [28, 29], and our work will adopt the same high-level approach as [26], the details will be very different.
I-C Overview of the Paper
As hinted above, in this paper, our main goal is to investigate the question of when the following mild recovery requirement is possible:
-
•
(Weak recovery) Can we find a set $\hat{S}$ of size $k$ such that $|\hat{S} \cap S| \ge \epsilon k$ for some small constant $\epsilon > 0$, with some non-zero constant probability?
The study of this goal essentially asks whether we can learn even a small amount of information from the test outcomes. As a result, any hardness result (lower bound on the number of tests) under this criterion serves as a much stronger claim compared to a hardness result for exact recovery.
While our main focus is on weak recovery, we will additionally address the following two recovery goals that are also much milder than exact recovery:
-
•
(Weak detection) Can we perform a hypothesis test on $(\mathbf{X}, \mathbf{Y})$ to distinguish between the above group testing model and the “null model” in which $\mathbf{Y}$ is independent of $\mathbf{X}$?
-
•
(Identify a definite defective) Can we identify just a single defective item, i.e., output a single index $\hat{j}$ with certainty that $\hat{j} \in S$? (Here we also allow “detected errors”, in which the decoder declares that it is uncertain.)
We show in Section II that these goals can be achieved with very few tests using suitably-chosen non-Bernoulli test designs. On the other hand, for Bernoulli designs (studied in Section III), we will see that in sufficiently sparse regimes, these criteria all require essentially the same number of tests as exact recovery.
The main reason that we consider weak detection is that it serves as a useful stepping stone to establishing our negative result for weak recovery under Bernoulli designs. Identifying a definite defective is also not our central focus, but its negative result comes almost for free via our analysis.
Before proceeding, we briefly pause to discuss our emphasis on Bernoulli designs, despite our results demonstrating that they can be inferior to alternative designs under the relaxed recovery criteria. The justification for doing so is that Bernoulli designs (and other related unstructured random designs) are widespread and extensively studied in the literature [1], and thus serve as a standard “go-to” design. As a result, it is essential to not only identify the cases that they succeed, but also understand their limitations.
II Positive Results for Bernoulli and non-Bernoulli Designs
To set the stage for our negative results on Bernoulli designs, we start by providing several positive (i.e., achievability) results for weak recovery, weak detection, and identifying a definite defective, focusing primarily on non-Bernoulli designs.
II-A Asymptotically Optimal Approximate Recovery
As discussed in Section I-B, the lower bounds for approximate recovery in [13, 11] state that in order to attain
$$|\hat{S} \cap S| \ge (1 - \alpha) k \qquad (3)$$
for fixed $\alpha \in (0,1)$, it is necessary that $t \ge (1-\alpha) k \log_2 \frac{n}{k}(1 - o(1))$. The following result shows that one can in fact attain a matching upper bound for general test designs.
Theorem 1.
(Positive Result for Approximate Recovery) Consider the probabilistic group testing problem, and for fixed $\alpha \in (0,1)$, suppose that $k = \Theta(n^{\theta})$ for some $\theta \in (0,1)$. Then, when $t = (1-\alpha) k \log_2 \frac{n}{k}(1 + \eta_n)$ with $\eta_n \to 0$ sufficiently slowly as $n \to \infty$, there exists a non-adaptive test design and decoder that outputs an estimate $\hat{S}$ of $S$ satisfying the following as $n \to \infty$:
$$\mathbb{P}\bigl[\, |\hat{S} \cap S| \ge (1-\alpha) k \,\bigr] \to 1. \qquad (4)$$
Moreover, it can be assumed that $\hat{S}$ has cardinality exactly $k$.
By letting $\alpha$ be arbitrarily close to one, this result establishes the following corollary.
Corollary 1.
(Positive Result for Weak Recovery) Under the preceding setup, when $t = \Omega\bigl(k \log_2 \frac{n}{k}\bigr)$ (with any implied constant), there exists a non-adaptive test design and decoder that outputs an estimate $\hat{S}$ of cardinality $k$ satisfying $|\hat{S} \cap S| \ge \epsilon k$ with probability approaching one, for some constant $\epsilon > 0$.
To prove Theorem 1, we will use the previous best-known positive result on approximate recovery as a stepping stone. This is stated in the following lemma, whose main statement comes from [13, 11], with the second part regarding approximately-known $k$ instead coming from [30, App. B].
Lemma 1.
(Existing Positive Results for Approximate Recovery [13, 11, 30]) Consider the probabilistic group testing problem with Bernoulli random testing using the choice of $p$ in (1), and suppose that $k = \Theta(n^{\theta})$ for some $\theta \in (0,1)$. Then, for any fixed $\alpha \in (0,1)$, when $t = k \log_2 \frac{n}{k}(1 + o(1))$ as $n \to \infty$, there exists a decoder that outputs an estimate $\hat{S}$ of $S$ satisfying the following as $n \to \infty$:
$$\mathbb{P}\bigl[\, |\hat{S} \cap S| \ge (1-\alpha) k \,\bigr] \to 1. \qquad (5)$$
Furthermore, this result remains true even when the decoder does not know the exact value of $k$, but instead only knows some quantity $\bar{k}$ satisfying $\bar{k} = k(1+o(1))$, and the test-inclusion probability $\frac{\nu}{k}$ is replaced by $\frac{\nu}{\bar{k}}$.
Theorem 1 reduces the number of tests in Lemma 1 by a multiplicative factor of $1-\alpha$, and provides an asymptotically optimal result (including the constant factor). In addition, this result demonstrates that weak recovery is possible whenever $t = \Omega\bigl(k \log_2 \frac{n}{k}\bigr)$ with an arbitrarily small implied constant. In contrast, we will show in Theorem 6 below that Bernoulli testing requires the implied constant to be at least one, and hence, the two differ by an arbitrarily large constant factor.
Proof of Theorem 1.
The idea of the proof is straightforward: We ignore slightly less than an $\alpha$ fraction of the items, and use Bernoulli testing to achieve approximate recovery on the items that were not ignored.
More formally, fix $\beta \in (0, \alpha)$, and consider discarding $\beta n$ items chosen uniformly at random, leaving $(1-\beta) n$ items remaining. By Hoeffding's inequality for the Hypergeometric distribution [31] and the assumption $k = \Theta(n^{\theta})$, we have with probability $1 - o(1)$ that the number $k'$ of remaining defectives satisfies
$$k' = (1-\beta) k \,(1 + o(1)). \qquad (6)$$
We apply the second part of Lemma 1 on this reduced problem, with $\bar{k} = (1-\beta)k$ and an approximate recovery parameter $\alpha'$ to be selected shortly. While the number of defectives in the reduced problem is random, we see from (6) that it is known up to a multiplicative factor of $1 + o(1)$, as required in Lemma 1. Then, the number of tests required in Lemma 1 satisfies the following:
(7)
(8)
(9)
for arbitrarily small , where we used for . In addition, in accordance with the lemma, the returned set of size contains at least
(10) |
defective items, with probability approaching one.
It suffices to let the final estimate equal ; alternatively, if an estimate of size is sought, one can add arbitrary items to to form . In either case, taking arbitrarily close to and arbitrarily close to , (10) ensures that contains at least defective items, which implies due to the fact that . In addition, since is arbitrarily small, the number of tests in (9) simplifies to for arbitrarily small .
∎
We note that while the preceding analysis still uses Bernoulli testing as a subroutine via Lemma 1, the full test matrix is not i.i.d. Bernoulli, as a fraction of its columns are set to zero. Hence, we still consider this to be a non-Bernoulli test design.
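As a concrete illustration of this censored construction, the toy sketch below (our own parameter choices and function names, not the paper's) zeroes out a random 30% of the columns and observes that, by the hypergeometric concentration invoked above, roughly the expected fraction of defectives survives:

```python
import random

def censored_bernoulli_design(n, p, num_tests, discard_frac, rng):
    """Non-Bernoulli design from the proof sketch: a random discard_frac
    fraction of columns is identically zero (those items are never tested);
    all remaining entries are i.i.d. Bernoulli(p)."""
    discarded = set(rng.sample(range(n), int(discard_frac * n)))
    X = [[0 if j in discarded else int(rng.random() < p) for j in range(n)]
         for _ in range(num_tests)]
    return X, discarded

rng = random.Random(1)
n, k = 2000, 40
defective = set(rng.sample(range(n), k))
X, discarded = censored_bernoulli_design(n, 0.017, 100, 0.3, rng)
# Hypergeometric concentration: about (1 - 0.3) * 40 = 28 defectives survive,
# and only those can ever be recovered by the subsequent decoding.
surviving = defective - discarded
print(len(surviving))
```

Since the discarded columns are identically zero, the overall matrix is visibly not i.i.d. Bernoulli, matching the remark above.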
Discussion. Since the proof of Theorem 1 is based on ignoring a fraction of the items, it amounts to a technique that immediately gives up on exact recovery, or “cuts its losses” from an early stage. This raises the interesting question as to whether such an approach is actually necessary to obtain the bound in Theorem 1.
To appreciate this distinction, note that Hwang's adaptive generalized binary splitting algorithm [32] works by repeatedly identifying a single defective using roughly $\log_2 \frac{n}{k}$ tests, and then removing it from further consideration. Hence, exact recovery is guaranteed with $t = k \log_2 \frac{n}{k}(1 + o(1))$, but even if the procedure is stopped after $(1-\alpha) k \log_2 \frac{n}{k}$ tests, one will have already identified roughly $(1-\alpha)k$ defective items. In this sense, Hwang's adaptive algorithm is universally optimal with respect to the approximate recovery parameter $\alpha$ (note that the lower bound stated following (3) also holds for adaptive algorithms), and the algorithm degrades gracefully as the number of tests decreases below the exact recovery threshold.
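The halving step underlying this kind of adaptive algorithm can be sketched as follows. This is a simplified variant using about $\log_2 n$ adaptive tests per defective, rather than the sharper per-defective accounting of [32]; the function names are ours:

```python
import math
import random

def find_one_defective(items, defective):
    """Adaptive binary splitting: repeatedly test one half of the candidate
    pool, keeping a half that is guaranteed to contain a defective."""
    pool = list(items)
    tests = 0
    while len(pool) > 1:
        half = pool[: len(pool) // 2]
        tests += 1
        if any(j in defective for j in half):   # one pooled test on `half`
            pool = half
        else:
            pool = pool[len(pool) // 2:]        # negative: a defective is in the rest
    return pool[0], tests

rng = random.Random(2)
n, k = 1024, 8
defective = set(rng.sample(range(n), k))
found, tests = find_one_defective(range(n), defective)
print(found in defective, tests)  # -> True 10
```

Each located defective can then be removed and the loop restarted, so stopping early still yields a proportional number of confirmed defectives.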
In contrast, the non-adaptive designs that we have considered do not enjoy such universality. Under a Bernoulli design with the choice of $p$ in (1), we can achieve approximate recovery with arbitrarily small $\alpha$ when $t = k \log_2 \frac{n}{k}(1 + o(1))$ (Lemma 1), or even exact recovery in sufficiently sparse regimes [13], but we are prone to complete failure for smaller $t$, at least in sufficiently sparse regimes (see Theorem 6 below). Alternatively, ignoring roughly an $\alpha$ fraction of the items leads to $\alpha$-approximate recovery when $t = (1-\alpha) k \log_2 \frac{n}{k}(1 + o(1))$, but one retains this guarantee and no better regardless of how much $t$ is increased. Hence, it remains an interesting question for future work as to whether there exists a gracefully degrading (and ideally universally optimal) test matrix design in the non-adaptive setting.
II-B A Trivial Strategy for Weak Detection
We first describe the weak detection problem in more detail. The goal is to distinguish between two joint distributions on the pair $(\mathbf{X}, \mathbf{Y})$ for some specified distribution $P_{\mathbf{X}}$:
-
•
Under distribution $P$, the $\mathbf{X}$-marginal is $P_{\mathbf{X}}$, and the joint distribution is deduced by deterministically computing $\mathbf{Y}$ from $\mathbf{X}$ and the random defective set $S$ via the group testing model.
-
•
Under distribution $Q$, the $\mathbf{X}$ and $\mathbf{Y}$ marginals match those of $P$, but $\mathbf{X}$ and $\mathbf{Y}$ are independent, i.e., $Q_{\mathbf{X}\mathbf{Y}} = P_{\mathbf{X}} \times P_{\mathbf{Y}}$.
This is a binary hypothesis testing problem. The distribution $Q$ corresponds to “completely uninformative outcomes”, so intuitively, if we cannot reliably distinguish between $P$ and $Q$, then we can view the group tests (under the true distribution $P$) as being highly uninformative.
As hinted above, for Bernoulli designs, the main reason that we consider weak detection is as a stepping stone to providing a negative result for weak recovery (see Section III). It turns out that if we consider weak detection as a standalone problem with general designs, then a trivial strategy succeeds with very few tests. Formally, we have the following.
Proposition 1.
(Trivial Strategy for Weak Detection) For any number of tests $t$, there exists some distribution $P_{\mathbf{X}}$ such that given $(\mathbf{X}, \mathbf{Y})$, the group testing joint distribution $P$ can be distinguished from $Q$ with zero error probability under $P$ and error probability $2^{-t}$ under $Q$.
Proof.
Consider letting each test independently contain all items with probability $\frac{1}{2}$ and no items with probability $\frac{1}{2}$. Hence, each outcome is independent and equiprobable on $\{0,1\}$. Consider a detection algorithm that declares $P$ whenever the ones in $\mathbf{Y}$ exactly match the rows of ones in $\mathbf{X}$, and declares $Q$ otherwise. Then, under $P$, success is guaranteed due to the group testing model being noiseless. Under $Q$, since the test outcomes are uniformly random, the probability of producing the correct outcomes is $2^{-t}$, proving the proposition. ∎
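This trivial strategy is short enough to simulate directly (parameters and names below are illustrative):

```python
import random

def trivial_design(num_tests, rng):
    # Each test contains either every item (coin = 1) or no items (coin = 0).
    return [rng.randint(0, 1) for _ in range(num_tests)]

def detect(x, y):
    """Declare the group testing model P iff the outcomes coincide with the
    all-items indicator; under the null model Q this occurs w.p. 2^-t."""
    return "P" if x == y else "Q"

rng = random.Random(3)
t = 20
x = trivial_design(t, rng)
y_model = list(x)  # noiseless model with k >= 1: a test is positive iff nonempty
y_null = [rng.randint(0, 1) for _ in range(t)]  # outcomes independent of x
print(detect(x, y_model))  # -> P
print(detect(x, y_null))   # Q, except with probability 2^-20
```

With even a constant number of tests, the error under the null model is already an arbitrarily small constant.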
This result suggests that weak detection is of limited interest as a standalone recovery criterion in general. Nevertheless, this finding arguably strengthens our subsequent negative results for Bernoulli designs (Section III), in the sense of showing that weak detection is not attained despite being an “extremely easy” goal.
II-C Weak Detection with Bernoulli Designs
Our negative results for Bernoulli testing in Section III will demonstrate failure in sufficiently sparse regimes when the number of tests is slightly below $k \log_2 \frac{n}{k}$. On the other hand, a fairly simple detection strategy can be used to attain the following positive result under Bernoulli testing in sufficiently dense regimes, even with considerably fewer tests.
Theorem 2.
(Positive Result for Weak Detection with Bernoulli Designs) Consider the probabilistic group testing problem with Bernoulli random testing using the choice of in (1), and suppose that for some . Then, under the condition that and for some (arbitrarily small) fixed constant , there exists a binary hypothesis testing scheme that succeeds in distinguishing from with probability approaching one as .
Proof.
The idea is to perform weak detection based on the number of columns of $\mathbf{X}$ that are covered by the test outcomes (i.e., the test outcome is one whenever the column entry is one). The number of such columns is characterized by a binomial distribution under both the group testing model and the null model, and the two distributions are shown to be statistically distinguishable under the conditions given. The details are given in Appendix A. ∎
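The covering statistic is easy to simulate. The sketch below uses our own illustrative parameters, and for convenience centers the count by its exact null expectation (the number of columns times the probability that an independent column avoids every negative test); this centering is a device of this sketch rather than the statistic analyzed in the appendix:

```python
import random

def covered_excess(n, k, m, p, null_model, rng):
    """Number of columns of X covered by y, minus the null-model expectation
    n * (1 - p)^(#negative tests).  Non-defective columns are independent of y,
    so the excess is ~0 under the null Q and ~k/2 under the group testing model."""
    defective = range(k)  # w.l.o.g. the first k items are defective
    X = [[int(rng.random() < p) for _ in range(n)] for _ in range(m)]
    if null_model:
        y = [rng.randint(0, 1) for _ in range(m)]  # independent of X, P(positive) = 1/2
    else:
        y = [int(any(row[j] for j in defective)) for row in X]
    neg = [i for i in range(m) if y[i] == 0]
    covered = sum(all(X[i][j] == 0 for i in neg) for j in range(n))
    return covered - n * (1 - p) ** len(neg)

rng = random.Random(4)
n, k, m = 1000, 70, 140
p = 1 - 2 ** (-1 / k)   # the choice (1): (1 - p)^k = 1/2
trials = 14
mean_P = sum(covered_excess(n, k, m, p, False, rng) for _ in range(trials)) / trials
mean_Q = sum(covered_excess(n, k, m, p, True, rng) for _ in range(trials)) / trials
print(round(mean_P), round(mean_Q))  # roughly k/2 = 35 versus roughly 0
```

The separation arises because the $k$ defective columns are always covered under the group testing model, while every column behaves like an independent one under the null.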
II-D A Simple Strategy for Identifying a Definite Defective
We first describe the problem of identifying a definite defective in more detail. We suppose that the decoder either outputs a single index $\hat{j}$ believed to be defective, or declares “I don’t know” by outputting $\hat{j} = 0$. In the former scenario, we insist that $\hat{j}$ must be defective (i.e., $\hat{j} \in S$) with probability one, meaning that the only errors allowed are detected errors corresponding to $\hat{j} = 0$. This setup is partly motivated by the definite defectives algorithm for recovering the defective set [17, 23], as well as the notion of zero undetected error capacity in information theory [33].
The following result shows that under a suitably-chosen non-Bernoulli test design, a single definitely-defective item can be reliably identified using only $O\bigl(\log_2 \frac{n}{k}\bigr)$ tests, representing a reduction by a factor of $\Theta(k)$ compared to the usual $k \log_2 \frac{n}{k}$ scaling.
Theorem 3.
(Positive Result for Identifying a Definite Defective) Consider the probabilistic group testing problem, and suppose that $k = \Theta(n^{\theta})$ for some $\theta \in (0,1)$. Then, for any positive integer number of repetitions of the procedure below, there exists a non-adaptive test design and decoder that outputs an estimate $\hat{j}$ of a definite defective (with $\hat{j} = 0$ representing “I don’t know”) satisfying
$$\mathbb{P}[\hat{j} \in S] \to 1 \qquad (11)$$
as $n \to \infty$ and the number of repetitions grows large.
Proof.
We first consider a single trial of the following procedure. Let $\mathcal{I}$ be a uniformly random set of $\frac{n}{k}$ items. (We assume for simplicity that $\frac{n}{k}$ is an integer; the general case follows similarly by rounding.) By the assumptions of the theorem, it is straightforward to show that
$$\mathbb{P}\bigl[\, |\mathcal{I} \cap S| = 1 \,\bigr] = \Omega(1). \qquad (12)$$
Indeed, the analogous claim is standard when each item is included in $\mathcal{I}$ with probability $\frac{1}{k}$ [1, Sec. 2.3], and (12) can then be understood by approximating the Hypergeometric distribution by the binomial distribution [34].
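As a quick numeric sanity check of (12) with illustrative sizes (the subset size $n/k$ below is this sketch's reading of the construction), the probability of capturing exactly one defective is close to $1/e$:

```python
from math import comb, exp

def hypergeom_pmf(n, k, s, j):
    """P(exactly j defectives) when a uniformly random size-s subset is drawn
    from n items of which k are defective."""
    return comb(k, j) * comb(n - k, s - j) / comb(n, s)

# Subset of size n/k: the captured-defective count is approximately Poisson(1),
# so the chance of exactly one defective is close to 1/e = 0.368.
n, k = 10_000, 100
p_one = hypergeom_pmf(n, k, n // k, 1)
print(round(p_one, 3), round(exp(-1), 3))
```

This constant probability per trial is what makes a constant number of repetitions sufficient later in the proof.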
We proceed by describing a procedure from the SAFFRON algorithm of [12] that is guaranteed to identify the single defective item in the chosen subset whenever it contains exactly one defective, while also being able to identify with certainty whether it contains zero defectives or more than one. This procedure uses $2L$ tests, where $L = \lceil \log_2 s \rceil$ and $s$ denotes the subset size.
Number the items in the subset from $1$ to $s$ in a fixed manner (e.g., maintaining the order that they take as items in $\{1,\dots,n\}$). For $j = 1,\dots,s$, let $\mathbf{b}_j$ be a binary vector of length $2L$ and weight $L$ constructed as follows: The first $L$ entries are the number $j$ written in binary, and the last $L$ entries are the same, but with the $0$s and $1$s swapped. We then construct $2L$ tests, where test $i$ contains exactly the items $j$ for which the $i$-th entry of $\mathbf{b}_j$ equals $1$.
We now consider the following cases:
-
•
If the subset contains no defective items, then all tests will be negative. When this is observed, we set $\hat{j} = 0$.
-
•
If the subset contains exactly one defective, then exactly half of the tests will be positive. When this is observed, we set $\hat{j}$ to be the item whose value is spelled out (in binary) by the first half of the test results.
-
•
If the subset contains two or more defectives, then more than half of the tests will be positive. When this is observed, we set $\hat{j} = 0$.
The first and third cases ensure that we never erroneously set $\hat{j} \neq 0$. In addition, we correctly identify a defective item in the second case, which occurs with probability $\Omega(1)$ due to (12). This proves the theorem for a single trial.
To drive the error probability down further, we simply repeat the preceding process multiple times, drawing the random subset independently each time. By doing so, we only fail if none of the subsets contain exactly one defective item, an event whose probability decays exponentially in the number of repetitions. ∎
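The SAFFRON-style encoding and the three decoding cases above can be sketched as follows (we use 0-indexed items and a `None` output for “I don’t know”, which are our conventions rather than the paper's):

```python
import math

def encode_tests(s):
    """2L tests over items 0..s-1 with L = ceil(log2 s): test 2b contains the
    items whose b-th binary digit is 1, and test 2b+1 those whose digit is 0,
    so every item lies in exactly one test of each pair."""
    L = max(1, math.ceil(math.log2(s)))
    tests = []
    for b in range(L):
        tests.append([j for j in range(s) if (j >> b) & 1])
        tests.append([j for j in range(s) if not (j >> b) & 1])
    return tests, L

def decode(outcomes, L):
    """None means "I don't know" (zero defectives, or two or more); otherwise
    the positive first-pair-member tests spell out the defective's index."""
    positives = sum(outcomes)
    if positives == 0 or positives > L:   # no defective, or >= 2 defectives
        return None
    return sum(1 << b for b in range(L) if outcomes[2 * b])

def run(tests, defective):
    # Noiseless outcomes: a test is positive iff it contains a defective.
    return [int(bool(set(t) & defective)) for t in tests]

tests, L = encode_tests(64)
print(decode(run(tests, {17}), L))    # -> 17
print(decode(run(tests, {3, 5}), L))  # -> None (two defectives: detected)
print(decode(run(tests, set()), L))   # -> None (no defective: detected)
```

With one defective, exactly one test of each pair is positive (so exactly half overall); with two or more, some pair has both tests positive, which is why an overcount is a certain indicator of failure.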
We briefly mention that the scaling in Theorem 3 cannot be improved further. To see this, suppose by contradiction that a definite defective could be found with constant (non-zero) probability using $o\bigl(\log \frac{n}{k}\bigr)$ tests. By repeating this procedure with uniformly random shuffling of the items, ignoring any “I don’t know” outcomes and removing any defectives identified, an adaptive group testing algorithm could recover the full defective set using $o\bigl(k \log \frac{n}{k}\bigr)$ tests on average. However, this would violate the standard lower bound $t \ge k \log_2 \frac{n}{k}(1 - o(1))$ [1, Sec 1.4], which holds even in the adaptive setting.
III All-or-Nothing Behavior for Bernoulli Designs
To establish an all-or-nothing threshold for weak recovery under Bernoulli designs, it is useful to first study the weak detection problem as a stepping stone.
III-A Weak Detection
Recall from Section II-C that the goal of weak detection is to distinguish $P$ (defined according to the group testing model) from $Q = P_{\mathbf{X}} \times P_{\mathbf{Y}}$, with $P_{\mathbf{X}}$ being the i.i.d. Bernoulli distribution using the choice of $p$ in (1).
For concreteness, we consider the Bayesian setting, where the observed pair $(\mathbf{X}, \mathbf{Y})$ is drawn from $P$ or $Q$ with probability $\frac{1}{2}$ each. The resulting error probability of the hypothesis test is denoted by $P_{\mathrm{err}}$. Trivially, choosing the hypothesis via a random guess gives $P_{\mathrm{err}} = \frac{1}{2}$. It is a standard result in binary hypothesis testing that if the total variation distance satisfies $\mathrm{TV}(P, Q) \to 0$ as $n \to \infty$, then one cannot do better than random guessing asymptotically, i.e., it is impossible to do better than $P_{\mathrm{err}} = \frac{1}{2} - o(1)$ (e.g., see [35, Sec. 2.3.1]).
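The standard fact invoked here is the identity $P_{\mathrm{err}} = \frac{1}{2}(1 - \mathrm{TV}(P,Q))$ for the optimal uniform-prior test, so vanishing total variation forces the error to $\frac{1}{2}$. A self-contained numeric check on a toy pair of distributions (the biased-coin model below is purely illustrative, not the group testing distributions):

```python
from itertools import product

def bayes_error(P, Q):
    """Minimum error probability of a uniform-prior test between P and Q on a
    finite space: (1/2) * (1 - TV(P, Q)) = (1/2) * sum_x min(P(x), Q(x))."""
    tv = 0.5 * sum(abs(P[x] - Q[x]) for x in P)
    return 0.5 * (1.0 - tv)

# Toy stand-in: t independent bits, biased under P, fair under Q.
t = 4
P = {x: (0.55 ** sum(x)) * (0.45 ** (t - sum(x))) for x in product((0, 1), repeat=t)}
Q = {x: 0.5 ** t for x in product((0, 1), repeat=t)}

err = bayes_error(P, Q)
err_map = 0.5 * sum(min(P[x], Q[x]) for x in P)  # error of the MAP (likelihood-ratio) rule
print(round(err, 4), round(err_map, 4))  # the two formulas agree; both below 0.5
```

The agreement of the two computations reflects that the likelihood-ratio rule achieves the total-variation bound exactly.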
Let $D(P \| Q)$ denote the KL divergence between $P$ and $Q$. By Pinsker's inequality, $D(P \| Q) \to 0$ implies $\mathrm{TV}(P, Q) \to 0$, and in addition, $D(P \| Q) \le \log\bigl(1 + \chi^2(P \| Q)\bigr)$ [36], where we define the $\chi^2$-divergence
$$\chi^2(P \| Q) = \mathbb{E}_{Q}\biggl[\Bigl(\frac{dP}{dQ}\Bigr)^{2}\biggr] - 1. \qquad (13)$$
Hence, to prove a hardness result for distinguishing $P$ from $Q$, it suffices to show that $\chi^2(P \| Q) \to 0$ as $n \to \infty$. The following theorem gives conditions under which this is the case.
Theorem 4.
(Negative Result for Weak Detection) Consider the probabilistic group testing problem with Bernoulli random testing using the choice of $p$ in (1), and suppose that $k = \Theta(n^{\theta})$ for some $\theta \in (0,1)$ satisfying a mild sparsity condition (see the discussion below). Then, when $t \le k \log_2 \frac{n}{k}(1 - \eta)$ for fixed $\eta > 0$ as $n \to \infty$, we have
$$\chi^2(P \| Q) \to 0 \qquad (14)$$
as $n \to \infty$. Hence, the smallest possible error probability for the binary hypothesis test between $P$ and $Q$ behaves as $\frac{1}{2} - o(1)$.
Proof.
The condition of the theorem holds when $k$ grows sufficiently slowly with respect to $n$, e.g., when $k = \Theta(n^{\theta})$ for sufficiently small $\theta$. On the other hand, it remains open as to whether a similar hardness result can be proved when $k$ grows faster. To address this question, we note the following:
-
•
Theorem 2 above shows that $P$ and $Q$ can be reliably distinguished under its stated conditions on $k$ and the number of tests. Thus, $k \log_2 \frac{n}{k}$ no longer serves as an all-or-nothing threshold in this denser regime. This provides an interesting point of contrast with the analogous sparse linear regression problem [26], where the analogous hardness result to Theorem 4 holds for a much wider range of sparsity levels. In addition, [26, App. C] provides a positive result showing that this threshold is tight.
-
•
Theorem 5 below shows that the above $\chi^2$-divergence does not approach zero in certain regimes beyond those covered by Theorem 4. Note that the $\chi^2$-divergence approaching zero is sufficient, but not necessary, for establishing the hardness of distinguishing $P$ from $Q$. Hence, this result does not establish such hardness, but it does show that any proof establishing hardness must move beyond the approach of bounding the $\chi^2$-divergence.
In the sparse linear regression problem, a similar limitation regarding the divergence is overcome by conditioning out certain “catastrophic” low-probability events [26] that blow up the divergence. Unfortunately, it appears to be difficult to identify an analogous event in the group testing problem.
Formally, the second of these is stated as follows.
Theorem 5.
(A Condition for Non-Vanishing $\chi^2$-Divergence) Consider the probabilistic group testing problem with Bernoulli random testing using the choice of $p$ in (1), and suppose that $k = \Theta(n^{\theta})$ for some $\theta \in (0,1)$. Then, under suitable conditions on $k$ and $t$ (complementary to those of Theorem 4), we have
$$\chi^2(P \| Q) \ge c \qquad (15)$$
for some constant $c > 0$.
Proof.
See Appendix B. ∎
III-B Weak Recovery
We are now ready to present our main negative result concerning the weak recovery criterion under Bernoulli testing, establishing an all-or-nothing threshold at $k \log_2 \frac{n}{k}$.
Theorem 6.
(Negative Result for Weak Recovery) Consider the probabilistic group testing problem with Bernoulli random testing using the choice of $p$ in (1), and suppose that $k = \Theta(n^{\theta})$ with $\theta$ satisfying the sparsity condition of Theorem 4. Then, when $t \le k \log_2 \frac{n}{k}(1 - \eta_n)$ with $\eta_n \to 0$ sufficiently slowly as $n \to \infty$, for any fixed $\epsilon > 0$, any decoder that outputs some estimate $\hat{S}$ of $S$ with cardinality $k$ must yield the following as $n \to \infty$:
$$\mathbb{P}\bigl[\, |\hat{S} \cap S| \ge \epsilon k \,\bigr] \to 0. \qquad (16)$$
Proof.
The proof is outlined as follows. Following an idea used for sparse linear regression in [26], we study the mutual information between $S$ and the outcome of an additional test drawn independently of the first $t$ tests. Combining some manipulations of information terms with the weak detection result of Theorem 4, we show that this mutual information vanishes when the number of tests is below the stated threshold. On the other hand, we show that if weak recovery were possible, we would be able to predict the additional test's outcome better than random guessing, meaning that the mutual information would be bounded away from zero. Combining these observations, we deduce that weak recovery must be impossible. The details are given in Appendix B. ∎
Hence, when $k$ is sufficiently sparse, the quantity $k \log_2 \frac{n}{k}$ serves as an exact threshold between complete success and complete failure. We note that in previous works, phase transitions were proved in [13, 19, 20] regarding the error probability of recovering the exact defective set, whereas Theorem 6 gives the much stronger statement that one cannot even identify a small fraction of the defective set.
III-C Identifying a Definite Defective
The following negative result for identifying a definite defective (see Section II-D) follows in a straightforward manner from our negative result for weak detection (Theorem 4).
Theorem 7.
(Negative Result for Identifying a Definite Defective) Consider the probabilistic group testing problem with Bernoulli random testing using the choice of $p$ in (1), and suppose that $k = \Theta(n^{\theta})$ with $\theta$ satisfying the sparsity condition of Theorem 4. Then, when $t \le k \log_2 \frac{n}{k}(1 - \eta)$ for fixed $\eta > 0$ as $n \to \infty$, for any decoder that outputs an estimate $\hat{j}$ of a definite defective (with 0 representing “I don’t know”), we have
$$\mathbb{P}[\hat{j} = 0] \to 1 \qquad (17)$$
as $n \to \infty$.
Proof.
The idea is to note that since a nonzero output must be correct with probability one, the decoder must output $\hat{j} = 0$ whenever there exists some set of cardinality $k$ that is disjoint from $S$ and consistent with the test outcomes. Using de Caen's bound [37], we show that this holds with probability at least $1 - o(1)$, and the theorem follows by combining this with the weak detection hardness established via Theorem 4. The details are given in Appendix B. ∎
III-D Discussion: Bernoulli vs. General Designs
We wrap up this section and Section II by highlighting that for all three of the recovery criteria considered, Bernoulli designs can be highly suboptimal compared to general designs:
-
•
Weak recovery is possible (for general test designs) when $t = \Omega\bigl(k \log_2 \frac{n}{k}\bigr)$ with an arbitrarily small implied constant, whereas Bernoulli testing requires $t \ge k \log_2 \frac{n}{k}(1 - o(1))$ in sufficiently sparse regimes;
-
•
A trivial strategy exists for weak detection with only $O(1)$ tests (for constant error probability), whereas Bernoulli testing requires $t \ge k \log_2 \frac{n}{k}(1 - o(1))$ in sufficiently sparse regimes;
-
•
Identifying a definite defective is possible with $O\bigl(\log_2 \frac{n}{k}\bigr)$ tests, whereas Bernoulli testing requires $t \ge k \log_2 \frac{n}{k}(1 - o(1))$ in sufficiently sparse regimes.
Since Bernoulli designs are a prototypical example of an unstructured random design, these results indicate that despite their strong theoretical guarantees (e.g., as established in [13]), significant care should be taken in adopting them when too few tests are available, or when relaxed recovery criteria are considered to be acceptable. Essentially, in such cases, it is much better to “cut one’s losses” early on via a hand-crafted design that targets the less stringent recovery criterion under consideration.
IV Conclusion
We have established the fundamental limits of noiseless non-adaptive group testing under the weak recovery criterion, and more generally approximate recovery, and shown Bernoulli designs to be highly suboptimal (in sufficiently sparse regimes) in the sense of exhibiting an all-or-nothing threshold at $k \log_2 \frac{n}{k}$. In addition, we gave similar negative results for Bernoulli testing under the criteria of weak detection and identifying a definite defective, while establishing that non-Bernoulli designs can attain these goals with very few tests.
Possible directions for future work include (i) closing the remaining gaps in between the positive and negative results in the regime with ; (ii) handling Bernoulli designs with more general choices of ; (iii) providing analogous results for the near-constant tests-per-item design [23, 19]; (iv) finding a design that attains exact or near-exact recovery with tests while degrading gracefully below this threshold; and (v) developing analogous results under random noise models [38, 14, 18].
Appendix A Proof of Theorem 2 (Positive Result for Weak Detection with Bernoulli Designs)
By the assumption , we have . Hence, it suffices to prove the theorem for , since we can incorporate the term into .
In the following, we use the terminology that the -th column of is covered by if the support of is a subset of the support of (i.e., whenever the -th entry of is , the outcome is also ). We consider distinguishing models and by counting the number of columns of that are covered by .
Fix a constant , and consider the “typical” set containing all the sequences such that the number of positive tests is between and . By the law of large numbers, we have as . Given any sequence , when an independent random column is generated, the (conditional) probability of it being covered satisfies
(18)
For , recalling the choice of in (1) (which yields ), we have
(19)
and similarly
(20)
Hence, we have
(21)
Then, the distribution of the number of covered columns under the two hypotheses is given as follows:
• Under : , where the addition of accounts for the fact that the defective items’ columns are always covered, by the definition of the group testing model.
• Under : .
We consider the following procedure for distinguishing these two hypotheses:
(22)
Then, given , the error probability with a uniform prior satisfies
(23)
For the first term in (23), observe that by the Berry-Esseen Theorem [39] (see Corollary 2 in Appendix C), we have
(24)
(25)
where denotes the standard Gaussian upper tail probability function, and as also shown in Appendix C, the relevant moments are
(26)
(27)
The ratio appearing in (25) can be simplified as follows:
(28)
(29)
(30)
(31)
Similarly, again using the Berry-Esseen theorem, and writing in place of for brevity, we have
(32)
(33)
(34)
and since , we have from (31) that .
We know from (21) that , and combining this with , we see from (31) and (34) that as long as and . The condition follows as an immediate consequence of (21) (with and ). In addition, again using (21), we find that the condition is satisfied if
(35)
Letting , we find that this condition holds if , which simplifies to . Substituting , and recalling that is constant and is arbitrarily small, we find that the preceding condition holds as long as
(36)
for arbitrarily small . This completes the proof of Theorem 2.
Appendix B Proofs of Negative Results for Bernoulli Designs
B-A Preliminary Calculations
Since the group testing model and the null model have the same distribution, and the null model assigns probability to each sequence (due to the choice of in (1)), we have
(37)
where here and subsequently, the summation over is implicitly over all subsets of of cardinality . Since the observation model defining is deterministic, is simply if is consistent with (according to ), and otherwise. Letting be the corresponding indicator function, we rewrite (37) as
(38)
and take the square to obtain
(39)
Taking the average over and using the middle form in (13), we obtain
(40)
The average here is the probability that a randomly generated (independent of each other) is consistent with both and . By the symmetry of with respect to re-labeling items, we can assume without loss of generality that equals the set
(41)
and average over alone; by splitting into with entries in (non-overlapping with ) and entries in (overlapping with ), we obtain
(42)
where implicitly satisfies in the expectation.
Now observe that is the probability (with respect to ) that each of the tests satisfies at least one of the following:
• The test outcome is negative, and all items from are excluded;
• The test outcome is positive, and at least one item from is included;
• The test outcome is positive, and no items from are included, but at least one item from each of and are included.
For a single test, we characterize the probabilities of these three events under as follows (recalling (1)):
• The first event has probability ;
• The union of the second and third events above can be reformulated as the event that the test outcome is positive and none of the following events occur: (i) all items from are excluded; (ii) all items from are excluded, but at least one from is included; (iii) all items from are excluded, but at least one from is included. Using this formulation, the union of the second and third events above has probability
Summing these two probabilities together gives an overall probability of associated with a single test. Since the tests are independent, taking the intersection of the corresponding events gives
(43)
and substitution into (42) yields
(44)
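The three-event case analysis above can be verified by exhaustive enumeration on a toy instance. In the following sketch, the item universe and the two defective sets are hypothetical choices made only so that all three events are non-trivial:

```python
from itertools import product

def check_case_analysis():
    """Verify, by brute force on a toy instance, that a test is consistent
    with both defective sets S and S' exactly when one of the three events
    in the case analysis occurs. S, Sp are hypothetical illustrative sets."""
    S, Sp = {0, 1, 2}, {2, 3, 4}        # 5 items, overlap {2}
    for bits in product([0, 1], repeat=5):
        inc = {i for i, b in enumerate(bits) if b}   # items in the test
        y = int(bool(inc & S))                       # outcome generated from S
        consistent = (y == int(bool(inc & Sp)))      # also consistent with S'?
        e1 = (y == 0) and not (inc & (S | Sp))       # negative, all excluded
        e2 = (y == 1) and bool(inc & (S & Sp))       # positive via the overlap
        e3 = ((y == 1) and not (inc & (S & Sp))      # positive, overlap excluded,
              and bool(inc & (S - Sp))               # but both set differences
              and bool(inc & (Sp - S)))              # are hit
        if consistent != (e1 or e2 or e3):
            return False
    return True
```

The enumeration confirms that the three events exactly partition the consistency event, which is what justifies summing their probabilities for a single test.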
B-B Proof of Theorem 4 (Impossibility of Weak Detection for )
Recall the setup of weak detection described in Section II-B, and that we consider i.i.d. Bernoulli designs with chosen via (1). We first prove the following lemma, which provides an upper bound on the -divergence.
Lemma 2.
Assume that for some . Then, it holds that
(45)
B-C Proof of Theorem 5 (A Condition for Non-Vanishing -Divergence)
Here we prove that for some when under the assumptions and . (We are grateful to an anonymous reviewer for helping us to significantly simplify this proof.)
As mentioned in Section III-A, implies that , which in turn implies that weak detection is impossible for any algorithm. However, Theorem 2 shows that weak detection is possible when for arbitrarily small . Note that for any . Thus, it must be the case that whenever , since otherwise would be a contradiction. In the following, we therefore only consider .
We need to prove that for some when under the assumptions and . Following (46)–(49) with the inequality in (47) reversed, we have
(63)
Further lower bounding the right-hand side by taking only the terms , we obtain
(64)
(65)
(66)
(67)
(68)
where (65) uses , and (67) follows since . To bound the ratio of binomial coefficients in (68), observe that
(69)
(70)
(71)
(72)
where (72) follows since for all . Hence, since we are considering , it follows that . Combining this with (68) and the assumption , we obtain
(73)
for some constant , completing the proof of Theorem 5.
B-D Proof of Theorem 6 (Negative Result for Weak Recovery)
As mentioned in Section III-A, implies that . Consider , along with an additional pair drawn from the same joint distribution as a single test in , independently from . Following the steps of [26] for sparse linear regression, we consider the following conditional mutual information term:
(74)
(75)
(76)
where is now defined according to and contains tests instead of (one extra for , ). Under , we have almost surely, and combining this with , it follows that
(77)
Moreover, by the chain rule for mutual information, we have
(78)
(79)
where the two terms are attained as follows by expanding the conditional mutual information as a difference of conditional entropies:
• For the first term, write , and note that since and entropy is upper bounded by the logarithm of the number of outcomes;
• For the second term, write , and note that we have since is deterministic given .
(80)
since for by Theorem 4 (the replacement of by only amounts to a negligible multiplicative change in ). Since the entropy functional is continuous, and the entropy of a binary random variable is at most with equality if and only if the random variable is equiprobable on its two values, we deduce from (80) that the following holds: With probability with respect to , the conditional distribution of places probability on each of and .
To complete the proof of Theorem 6, we show that the preceding claim precludes the possibility of weak recovery, i.e., (16) holds. Suppose, by way of contradiction to (16), that it were possible to use to attain with probability at least , for some . In the following, we assume the extreme case ; the case of strict inequality follows similarly. Consider a procedure that uses this to construct an estimator that takes the test vector as input and returns an estimate of as follows: set if the test includes any item from , and otherwise. There are two scenarios in which the estimate is incorrect:
• The test may include no items from (and hence ), but an item from (and hence ). By the choice of in (1), the probability (with respect to ) of this occurring is .
• The test may include no items from (and hence ), but an item from (and hence ). By the same argument as above, the probability of this occurring is .
Hence, when , the estimator produces with probability . As a result, for any fixed , the success probability behaves as . This contradicts the conditional distribution of stated following (80) (which only permits a probability of correctness), completing the proof by contradiction and establishing (16).
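The behavior of this single-test predictor can be checked by Monte Carlo. The sketch below uses hypothetical finite parameters (p = ln(2)/k, and, for simplicity, a recovered set G equal to exactly half of the defective set with no false positives); in this simplified setting the analytical success probability is 1 − (1−p)^(k/2)(1 − (1−p)^(k/2)) ≈ 0.79, strictly between 1/2 and 1 as the argument requires:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters: k defectives, p = ln(2)/k, recovered set G = first
# half of the defective set (no false positives, for simplicity).
k, trials = 100, 50_000
p = np.log(2) / k
hits = rng.random((trials, k)) < p     # which defectives each fresh test hits
y = hits.any(axis=1)                   # true outcome of each fresh test
y_hat = hits[:, : k // 2].any(axis=1)  # prediction: does the test hit G?
rate = (y == y_hat).mean()
# Analytically: P(correct) = 1 - (1-p)^(k/2) * (1 - (1-p)^(k/2)) ≈ 0.79,
# bounded away from both 1/2 and 1.
```

A success probability bounded away from 1/2 is exactly what the near-equiprobable conditional distribution derived from (80) rules out, which is the source of the contradiction.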
B-E Proof of Theorem 7 (Negative Result for Identifying a Definite Defective)
Consider any algorithm that, with probability one, only outputs when is the index of a defective item. If is the true defective set, then it is easy to see that an error occurs (i.e., ) if some disjoint from is still consistent with . Denoting this event by , it follows that
(81)
(82)
where (82) follows from de Caen’s lower bound on the probability of a union [37]. However, for and both disjoint from (but possibly overlapping with each other), is exactly the same quantity as appearing in (43). In particular, corresponds to the case that . Substituting the expression in (43) gives when , and substitution into (82) gives
(83)
(84)
where (84) follows by equating with (44). From Theorem 4, we know that under the conditions of Theorem 7, and we conclude that . Since this holds regardless of which is conditioned on, we obtain as desired.
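de Caen's inequality [37] used in (82) can be stated generically in terms of the pairwise intersection probabilities; a minimal sketch:

```python
def de_caen_lower_bound(M):
    """de Caen's lower bound on P(A_1 ∪ ... ∪ A_m):
        P(union) >= sum_i P(A_i)^2 / sum_j P(A_i ∩ A_j),
    where M[i][j] = P(A_i ∩ A_j), so the diagonal holds the P(A_i)."""
    return sum(row[i] ** 2 / sum(row)
               for i, row in enumerate(M) if row[i] > 0)
```

For pairwise disjoint events the bound is tight (e.g., two disjoint events of probability 0.3 each give exactly 0.6), and it degrades gracefully as the events overlap, which is what makes it useful for the union over consistent sets above.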
Appendix C Berry-Esseen Theorem
Our analysis makes use of the following Berry-Esseen theorem, a non-asymptotic form of the central limit theorem.
Theorem 8.
(Berry-Esseen Theorem [39, Theorem 2]) For , let be independent random variables with
Denote and . Then, for any , we have
(85)
where .
More precisely, we use the following simple corollary.
Corollary 2.
Let . Then, for any , the following holds:
(86)
where
(87)
(88)
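As a numerical sanity check of the Gaussian approximation for binomial sums (the setting of Corollary 2), the following sketch computes the worst-case deviation at integer points between the Binomial(N, q) CDF and its Gaussian approximation; the parameter values are illustrative only:

```python
import math

def binom_cdf(N, q):
    """CDF of Binomial(N, q) evaluated at each integer 0..N."""
    out, acc = [], 0.0
    for j in range(N + 1):
        lp = (math.lgamma(N + 1) - math.lgamma(j + 1) - math.lgamma(N - j + 1)
              + j * math.log(q) + (N - j) * math.log(1 - q))
        acc += math.exp(lp)
        out.append(min(acc, 1.0))
    return out

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gaussian_gap(N, q):
    """Largest deviation (over integer points) between the Binomial(N, q) CDF
    and the matching Gaussian CDF; Berry-Esseen predicts O(1/sqrt(N)) decay."""
    mu, sigma = N * q, math.sqrt(N * q * (1 - q))
    cdf = binom_cdf(N, q)
    return max(abs(cdf[j] - normal_cdf((j - mu) / sigma)) for j in range(N + 1))
```

For example, the gap for N = 400 is roughly half that for N = 100 (at q = 1/2), consistent with the O(1/√N) rate in (85).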
References
- [1] M. Aldridge, O. Johnson, and J. Scarlett, “Group testing: An information theory perspective,” Found. Trend. Comms. Inf. Theory, vol. 15, no. 3–4, pp. 196–392, 2019.
- [2] W. Kautz and R. Singleton, “Nonrandom binary superimposed codes,” IEEE Trans. Inf. Theory, vol. 10, no. 4, pp. 363–377, 1964.
- [3] D. Du and F. K. Hwang, Combinatorial group testing and its applications. World Scientific, 2000, vol. 12.
- [4] M. Malyutov, “The separating property of random matrices,” Math. Notes Acad. Sci. USSR, vol. 23, no. 1, pp. 84–91, 1978.
- [5] A. de Bonis, L. Gasieniec, and U. Vaccaro, “Optimal two-stage algorithms for group testing problems,” SIAM Journal on Computing, vol. 34, no. 5, pp. 1253–1270, 2005.
- [6] A. G. D’yachkov and V. V. Rykov, “A survey of superimposed code theory,” Prob. Contr. Inf., vol. 12, no. 4, pp. 1–13, 1983.
- [7] A. Rashad, “Random coding bounds on the rate for list-decoding superimposed codes,” Prob. Ctrl. Inf. Theory, vol. 19, no. 2, pp. 141–149, 1990.
- [8] M. Cheraghchi, “Noise-resilient group testing: Limitations and constructions,” in Int. Symp. Found. Comp. Theory, 2009, pp. 62–73.
- [9] P. Indyk, H. Q. Ngo, and A. Rudra, “Efficiently decodable non-adaptive group testing,” in ACM-SIAM Symp. Disc. Alg. (SODA), 2010.
- [10] H. Q. Ngo, E. Porat, and A. Rudra, “Efficiently decodable error-correcting list disjunct matrices and applications,” in Int. Colloq. Automata, Lang., and Prog., 2011.
- [11] J. Scarlett and V. Cevher, “How little does non-exact recovery help in group testing?” in IEEE Int. Conf. Acoust. Sp. Sig. Proc. (ICASSP), 2017.
- [12] K. Lee, K. Chandrasekher, R. Pedarsani, and K. Ramchandran, “SAFFRON: A fast, efficient, and robust framework for group testing based on sparse-graph codes,” IEEE Trans. Sig. Proc., vol. 67, no. 17, pp. 4649–4664, Sept. 2019.
- [13] J. Scarlett and V. Cevher, “Phase transitions in group testing,” in Proc. ACM-SIAM Symp. Disc. Alg. (SODA), 2016.
- [14] C. L. Chan, P. H. Che, S. Jaggi, and V. Saligrama, “Non-adaptive probabilistic group testing with noisy measurements: Near-optimal bounds with efficient algorithms,” in Allerton Conf. Comm., Ctrl., Comp., Sep. 2011, pp. 1832–1839.
- [15] M. Cheraghchi, A. Hormati, A. Karbasi, and M. Vetterli, “Group testing with probabilistic tests: Theory, design and application,” IEEE Trans. Inf. Theory, vol. 57, no. 10, pp. 7057–7067, Oct 2011.
- [16] M. B. Malyutov, “Search for sparse active inputs: A review,” in Inf. Theory, Comb. and Search Theory, 2013, pp. 609–647.
- [17] M. Aldridge, L. Baldassini, and O. Johnson, “Group testing algorithms: Bounds and simulations,” IEEE Trans. Inf. Theory, vol. 60, no. 6, pp. 3671–3687, June 2014.
- [18] J. Scarlett and V. Cevher, “Near-optimal noisy group testing via separate decoding of items,” IEEE Trans. Sel. Topics Sig. Proc., vol. 2, no. 4, pp. 625–638, 2018.
- [19] A. Coja-Oghlan, O. Gebhard, M. Hahn-Klimroth, and P. Loick, “Information-theoretic and algorithmic thresholds for group testing,” in Int. Colloq. Aut., Lang. and Prog. (ICALP), 2019.
- [20] ——, “Optimal group testing,” in Conf. Learn. Theory (COLT), 2020.
- [21] C. L. Chan, S. Jaggi, V. Saligrama, and S. Agnihotri, “Non-adaptive group testing: Explicit bounds and novel algorithms,” IEEE Trans. Inf. Theory, vol. 60, no. 5, pp. 3019–3035, May 2014.
- [22] L. Baldassini, O. Johnson, and M. Aldridge, “The capacity of adaptive group testing,” in IEEE Int. Symp. Inf. Theory, July 2013, pp. 2676–2680.
- [23] O. Johnson, M. Aldridge, and J. Scarlett, “Performance of group testing algorithms with near-constant tests-per-item,” IEEE Trans. on Inform. Th., vol. 65, no. 2, pp. 707–723, 2019.
- [24] M. Hahn-Klimroth and P. Loick, “Optimal adaptive group testing,” 2019, https://arxiv.org/abs/1911.06647.
- [25] D. Gamarnik and I. Zadik, “High-dimensional regression with binary coefficients: Estimating squared error and a phase transition,” in Conf. Learn. Theory (COLT), 2017, pp. 948–953.
- [26] G. Reeves, J. Xu, and I. Zadik, “The all-or-nothing phenomenon in sparse linear regression,” in Conf. Learn. Theory (COLT), 2019.
- [27] ——, “All-or-nothing phenomena: From single-letter to high dimensions,” 2019, https://arxiv.org/abs/1912.13027.
- [28] A. Gilbert, M. Iwen, and M. Strauss, “Group testing and sparse signal recovery,” in Asilomar Conf. Sig., Sys. and Comp., Oct. 2008, pp. 1059–1063.
- [29] G. Atia and V. Saligrama, “Boolean compressed sensing and noisy group testing,” IEEE Trans. Inf. Theory, vol. 58, no. 3, pp. 1880–1901, March 2012.
- [30] J. Scarlett, “Noisy adaptive group testing: Bounds and algorithms,” IEEE Trans. Inf. Theory, vol. 65, no. 6, pp. 3646–3661, June 2019.
- [31] W. Hoeffding, “Probability inequalities for sums of bounded random variables,” J. Amer. Stat. Assoc., vol. 58, no. 301, pp. 13–30, 1963.
- [32] F. Hwang, “A method for detecting all defective members in a population by group testing,” J. Amer. Stats. Assoc., vol. 67, no. 339, pp. 605–608, 1972.
- [33] G. Forney, “Exponential error bounds for erasure, list, and decision feedback schemes,” IEEE Trans. Inf. Theory, vol. 14, no. 2, pp. 206–220, 1968.
- [34] S. Y. T. Soon, “Binomial approximation for dependent indicators,” Statistica Sinica, vol. 6, no. 3, pp. 703–714, 1996.
- [35] J. Duchi, “Lecture notes for statistics 311/electrical engineering 377,” http://stanford.edu/class/stats311/.
- [36] I. Sason and S. Verdú, “f-divergence inequalities,” IEEE Trans. Inf. Theory, vol. 62, no. 11, pp. 5973–6006, Nov. 2016.
- [37] D. de Caen, “A lower bound on the probability of a union,” Discrete Mathematics, vol. 169, no. 1, pp. 217–220, 1997.
- [38] M. B. Malyutov and P. S. Mateev, “Screening designs for non-symmetric response function,” Mat. Zametki, vol. 29, pp. 109–127, 1980.
- [39] W. Feller, An Introduction to Probability Theory and Its Applications, 2nd ed. John Wiley and Sons, 1971.