
On the All-Or-Nothing Behavior
of Bernoulli Group Testing

Lan V. Truong, Matthew Aldridge, and Jonathan Scarlett L. V. Truong is with the Department of Engineering, University of Cambridge. M. Aldridge is with the School of Mathematics, University of Leeds. J. Scarlett is with the Department of Computer Science, Department of Mathematics, and Institute of Data Science, National University of Singapore (NUS). e-mail: [email protected]; [email protected]; [email protected] This work was supported by the Singapore National Research Foundation (NRF) under grant number R-252-000-A74-281. Copyright © 2017 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected]
Abstract

In this paper, we study the problem of non-adaptive group testing, in which one seeks to identify which items are defective given a set of suitably-designed tests whose outcomes indicate whether or not at least one defective item was included in the test. The most widespread recovery criterion seeks to exactly recover the entire defective set, and relaxed criteria such as approximate recovery and list decoding have also been considered. In this paper, we study the fundamental limits of group testing under the significantly relaxed weak recovery criterion, which only seeks to identify a small fraction (e.g., $0.01$) of the defective items. Given the near-optimality of i.i.d. Bernoulli testing for exact recovery in sufficiently sparse scaling regimes, it is natural to ask whether this design additionally succeeds with far fewer tests under weak recovery. Our main negative result shows that this is not the case, and in fact, under i.i.d. Bernoulli random testing in the sufficiently sparse regime, an all-or-nothing phenomenon occurs: When the number of tests is slightly below a threshold, weak recovery is impossible, whereas when the number of tests is slightly above the same threshold, high-probability exact recovery is possible. In establishing this result, we additionally prove similar negative results under Bernoulli designs for the weak detection problem (distinguishing between the group testing model vs. completely random outcomes) and the problem of identifying a single item that is definitely defective. On the positive side, we show that all three relaxed recovery criteria can be attained using considerably fewer tests under suitably-chosen non-Bernoulli designs. Thus, our results collectively indicate that when too few tests are available, naively applying i.i.d. Bernoulli testing can lead to catastrophic failure, whereas “cutting one’s losses” and adopting a more carefully-chosen design can still succeed in attaining these less stringent criteria.

Index Terms:
Group testing, hypothesis testing, approximate recovery, phase transitions.

I Introduction

The group testing problem has recently regained significant attention following new applications and connections with compressive sensing; see [1] for a recent survey. Briefly, the idea of group testing is to identify a small subset of defective items within a larger set of items, based on a number of tests whose binary outcomes indicate whether or not at least one defective item was included in the test.

The standard recovery goal in group testing is to exactly identify the entire defective set. In combinatorial group testing [2, 3] a single test design is required to succeed for all defective sets up to a certain size, whereas in probabilistic group testing [4, 1] only high-probability recovery is required with respect to a random defective set (and/or a random test design). Various relaxed recovery criteria have also appeared, including list decoding recovery [5, 6, 7, 8, 9, 10, 11] and approximate recovery criteria that allow a small number of false positives and/or false negatives in the reconstruction [12, 11, 13].

In this paper, focusing on probabilistic group testing, our goal is to better understand the fundamental limits of what can be achieved in the group testing problem under significantly weaker recovery criteria. In particular, instead of asking when it is possible to recover most of the defectives, we seek to understand when it is possible just to recover a small fraction. For general non-adaptive designs, we show that this goal can be attained with far fewer tests, and identify an exact threshold on the number required. On the other hand, for the widely-adopted i.i.d. (independent and identically distributed) Bernoulli test matrix design, we identify scenarios under which an all-or-nothing phenomenon occurs: When the number of tests is slightly above a certain threshold, high-probability exact recovery is possible, whereas slightly below the same threshold, essentially nothing can be learned from the tests. Thus, while Bernoulli designs can be near-optimal under standard recovery criteria, they are also prone to complete failure when there are too few tests, and accordingly can be highly suboptimal under relaxed recovery criteria. Along the way, we additionally provide analogous results for the problems of weak detection and identifying a definite defective, which are formally defined in Section I-C.

I-A Problem Setup

We consider a population of $p$ items indexed as $\{1,\dotsc,p\}$, and we let $k$ denote the number of defective items. The set of defective items is denoted by $S$, and is assumed to be uniform over the $\binom{p}{k}$ possibilities.

A group testing procedure performs a sequence of $n$ tests, with $X^{(i)}\in\{0,1\}^{p}$ indicating which items are included in the $i$-th test. The resulting outcomes are $Y^{(i)}=\bigvee_{j\in S}X_{j}^{(i)}$ for $i=1,\dotsc,n$. That is, the test outcome is $1$ if there is any defective item in the test, and $0$ otherwise. The tests can be represented by a matrix $\mathbf{X}\in\{0,1\}^{n\times p}$, whose $i$-th row is $X^{(i)}$. Similarly, the outcomes can be represented by a vector $\mathbf{Y}\in\{0,1\}^{n}$, whose $i$-th entry is $Y^{(i)}$.

In general, group testing procedures may be adaptive (i.e., $X^{(i)}$ may be chosen as a function of the previous outcomes) or non-adaptive (i.e., all $X^{(i)}$ must be selected prior to observing any outcomes). We focus on the non-adaptive setting, which is often preferable in practice since it permits the tests to be performed in parallel. In particular, except where stated otherwise, we consider the widely-adopted i.i.d. Bernoulli random test design [1, Sec. 2.1], in which every item is independently placed in each test with probability $\frac{\nu}{k}$ for some $\nu>0$, and we choose $\nu$ to be such that

$\Big(1-\frac{\nu}{k}\Big)^{k}=\frac{1}{2}$.  (1)

This choice ensures that the probability of a positive test is exactly $\frac{1}{2}$, which maximizes the entropy of each test outcome. More importantly, this choice of $\nu$ leads to a provably optimal number of tests in broad scaling regimes, as we survey in Section I-B. A simple asymptotic analysis gives $\nu=(\log 2)(1+o(1))$ as $k\to\infty$, which behaves similarly to the choice $\nu=\log 2$, but the exact choice described by (1) will be more convenient to work with.
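To make the choice (1) concrete, the following minimal Python sketch (ours, for illustration only; not part of the formal development) solves (1) in closed form as $\nu=k(1-2^{-1/k})$, and confirms both that a test is negative with probability exactly $\frac{1}{2}$ and that $\nu\to\log 2$ as $k$ grows:

    import math

    def nu_for(k):
        # Solve (1 - nu/k)^k = 1/2 in closed form: nu = k * (1 - 2^(-1/k)).
        return k * (1.0 - 2.0 ** (-1.0 / k))

    for k in [10, 100, 1000, 10000]:
        nu = nu_for(k)
        p_negative = (1.0 - nu / k) ** k  # probability of a negative test: exactly 1/2
        print(k, nu, p_negative, math.log(2))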

I-B Related Work

There have recently been numerous developments on theory and algorithms for probabilistic group testing [14, 15, 16, 17, 13, 18, 19, 20] (see [1] for a survey); here we focus only on those most relevant to the present paper.

The most relevant works to us are those attaining upper and/or lower bounds on the number of tests of the form $\big(k\log_{2}\frac{p}{k}\big)(1+o(1))$. The most straightforward way that this quantity arises is that with $\binom{p}{k}$ possible defective sets and $2^{n}$ possible sequences of outcomes, we require $n\geq\log_{2}\binom{p}{k}$ for each defective set to produce different outcomes. In the sublinear regime $k=o(p)$, this simplifies to $n\geq\big(k\log_{2}\frac{p}{k}\big)(1+o(1))$. Building on this intuition, Fano’s inequality was used in [4, 21] to show that $n\geq(1-\delta)\log_{2}\binom{p}{k}$ is required to attain an error probability of at most $\delta$, and a refined bound $n\geq\log_{2}\big(\binom{p}{k}(1-\delta)\big)$ was established in [22].

More recently, various results showed that $\big(k\log_{2}\frac{p}{k}\big)(1+o(1))$ tests are sufficient for certain recovery guarantees under broad scaling regimes on $k$ as a function of $p$. In [13], high-probability exact recovery was shown to be possible under Bernoulli random testing when $k=O(p^{1/3})$ and $n=\big(k\log_{2}\frac{p}{k}\big)(1+o(1))$, and in addition, this result was extended to all $k=o(p)$ when the exact recovery criterion is replaced by the following approximate recovery criterion (see also [11]): Output a set $\widehat{S}\subseteq\{1,\dotsc,p\}$ of cardinality $k$ such that

$\max\big\{|\widehat{S}\setminus S|,|S\setminus\widehat{S}|\big\}\leq\alpha k$  (2)

for some $\alpha\in(0,1)$. The above-mentioned result holds for arbitrarily small $\alpha>0$, as long as it is bounded away from zero as $p\to\infty$.

On the other hand, the lower bounds for approximate recovery in [13, 11] only state that in order to attain (2) for fixed $\alpha\in(0,1)$, it is necessary that $n\geq(1-\alpha)\big(k\log_{2}\frac{p}{k}\big)(1+o(1))$. This suggests that as $\alpha$ increases, the constant factor in the number of tests could be reduced. We will show that the preceding lower bound can in fact be matched with an upper bound using a suitably-chosen non-Bernoulli design, whereas Bernoulli designs can fail even for $\alpha$ close to one when $n$ is slightly smaller than $k\log_{2}\frac{p}{k}$.

While our discussion thus far focuses on Bernoulli designs, in the case of exact recovery, improved bounds have been shown for a different random test design based on near-constant tests-per-item [23, 19, 20], in particular permitting $n=\big(k\log_{2}\frac{p}{k}\big)(1+o(1))$ for all $k=O(p^{0.409})$ (improving on $O(p^{1/3})$). However, upon moving to approximate recovery with parameter $\alpha$, both designs attain the threshold $n=\big(k\log_{2}\frac{p}{k}\big)(1+o(1))$ in the limit as $\alpha\to 0$ [11, 24], suggesting that there is less to be gained via the near-constant tests-per-item design under relaxed recovery criteria. Nevertheless, extending our results to this design may be an interesting direction for future work.

Our work is inspired by recent studies of the all-or-nothing behavior of sparse linear regression under i.i.d. Gaussian measurements; see [25] for a study of the maximum-likelihood estimator, and [26, 27] for general estimators. While group testing can be viewed as a non-linear Boolean counterpart to sparse linear regression [28, 29], and our work will adopt the same high-level approach as [26], the details will be very different.

I-C Overview of the Paper

As hinted above, in this paper, our main goal is to investigate the question of when the following mild recovery requirement is possible:

  • (Weak recovery) Can we find a set $\hat{S}$ of size $k$ such that $|S\cap\hat{S}|\geq\delta k$ for small $\delta>0$ with some non-zero constant probability?

The study of this goal essentially asks whether we can learn even a small amount of information from the test outcomes. As a result, any hardness result (lower bound on the number of tests) under this criterion serves as a much stronger claim compared to a hardness result for exact recovery.

While our main focus is on weak recovery, we will additionally address the following two recovery goals that are also much milder than exact recovery:

  • (Weak detection) Can we perform a hypothesis test on $(\mathbf{X},\mathbf{Y})$ to distinguish between the above group testing model and the “null model” in which $\mathbf{Y}$ is independent of $\mathbf{X}$?

  • (Identify a definite defective) Can we identify just a single defective item, i.e., output a single index $\mathcal{I}\in\{1,\dotsc,p\}$ with certainty that $\mathcal{I}\in S$? (Here we also allow “detected errors”, in which the decoder declares that it is uncertain.)

We show in Section II that these goals can be achieved with very few tests using suitably-chosen non-Bernoulli test designs. On the other hand, for Bernoulli designs (studied in Section III), we will see that in sufficiently sparse regimes, these criteria all require essentially the same number of tests as exact recovery.

The main reason that we consider weak detection is that it serves as a useful stepping stone to establishing our negative result for weak recovery under Bernoulli designs. Identifying a definite defective is also not our central focus, but its negative result comes almost for free via our analysis.

Before proceeding, we briefly pause to discuss our emphasis on Bernoulli designs, despite our results demonstrating that they can be inferior to alternative designs under the relaxed recovery criteria. The justification for doing so is that Bernoulli designs (and other related unstructured random designs) are widespread and extensively studied in the literature [1], and thus serve as a standard “go-to” design. As a result, it is essential to not only identify the cases in which they succeed, but also to understand their limitations.

II Positive Results for Bernoulli and non-Bernoulli Designs

To set the stage for our negative results on Bernoulli designs, we start by providing several positive (i.e., achievability) results for weak recovery, weak detection, and identifying a definite defective, focusing primarily on non-Bernoulli designs.

II-A Asymptotically Optimal Approximate Recovery

As discussed in Section I-B, the lower bounds for approximate recovery in [13, 11] state that in order to attain

$\max\big\{|\hat{S}\setminus S|,|S\setminus\hat{S}|\big\}\leq\alpha k$  (3)

for fixed $\alpha\in(0,1)$, it is necessary that $n\geq(1-\alpha)\big(k\log_{2}\frac{p}{k}\big)(1-o(1))$. The following result shows that one can in fact attain a matching upper bound for general test designs.

Theorem 1.

(Positive Result for Approximate Recovery) Consider the probabilistic group testing problem, and for fixed $\alpha\in(0,1)$, suppose that $n\geq(1+\eta)(1-\alpha)\log_{2}\binom{p}{k}$ for some $\eta>0$. Then, when $k\to\infty$ with $k=o(p)$ as $p\to\infty$, there exists a non-adaptive test design and decoder that outputs an estimate $\hat{S}$ of $S$ satisfying the following as $p\to\infty$:

$\mathbb{P}\big[\max\big\{|\hat{S}\setminus S|,|S\setminus\hat{S}|\big\}\leq\alpha k\big]\to 1$.  (4)

Moreover, it can be assumed that $\hat{S}$ has cardinality exactly $k$.

By letting $\alpha$ be arbitrarily close to one, this result establishes the following corollary.

Corollary 1.

(Positive Result for Weak Recovery) Under the preceding setup, when $k\to\infty$ with $k=o(p)$ as $p\to\infty$, as long as $n=\Omega\big(k\log\frac{p}{k}\big)$ (with any implied constant), there exists a non-adaptive test design and decoder that outputs an estimate $\hat{S}$ of cardinality $k$ satisfying $|\hat{S}\cap S|=\Omega(k)$ with probability approaching one.

To prove Theorem 1, we will use the previous best-known positive result on approximate recovery as a stepping stone. This is stated in the following lemma, whose main statement comes from [13, 11], with the second part regarding approximately-known $k$ instead coming from [30, App. B].

Lemma 1.

(Existing Positive Results for Approximate Recovery [13, 11, 30]) Consider the probabilistic group testing problem with Bernoulli random testing using the choice of $\nu$ in (1), and suppose that $n\geq(1+\eta)\log_{2}\binom{p}{k}$ for some $\eta>0$. Then, for any fixed $\alpha\in(0,1)$, when $k=o(p)$ as $p\to\infty$, there exists a decoder that outputs an estimate $\hat{S}$ of $S$ satisfying the following as $p\to\infty$:

$\mathbb{P}\big[\max\big\{|\hat{S}\setminus S|,|S\setminus\hat{S}|\big\}\leq\alpha k\big]\to 1$.  (5)

Furthermore, this result remains true even when the decoder does not know the exact value of $k$ but instead only knows some quantity $\bar{k}$ satisfying $\bar{k}=k(1+o(1))$, and the $\nu/k$ test-inclusion probability is replaced by $\nu/\bar{k}$.

Theorem 1 reduces the number of tests in Lemma 1 by a multiplicative factor of $1-\alpha$, and provides an asymptotically optimal result (including the constant). In addition, this result demonstrates that weak recovery is possible whenever $n=\Omega\big(k\log_{2}\frac{p}{k}\big)$ with an arbitrarily small implied constant. In contrast, we will show in Theorem 6 below that Bernoulli testing requires the implied constant to be one, and hence, the two differ by an arbitrarily large constant factor.

Proof of Theorem 1.

The idea of the proof is straightforward: We ignore slightly less than a fraction $\alpha$ of the items, and use Bernoulli testing to achieve approximate recovery on the items that were not ignored.

More formally, fix $\alpha^{\prime}\in(0,\alpha)$, and consider discarding $\alpha^{\prime}p$ items chosen uniformly at random, leaving $p^{\prime}=(1-\alpha^{\prime})p$ items remaining. By Hoeffding’s inequality for the Hypergeometric distribution [31] and the assumption $k\to\infty$, we have with probability $1-o(1)$ that the number of remaining defectives $k^{\prime}$ satisfies

$k^{\prime}=(1-\alpha^{\prime})k\cdot(1+o(1))$.  (6)

We apply the second part of Lemma 1 on this reduced problem, with $\bar{k}=(1-\alpha^{\prime})k$ and an approximate recovery parameter $\alpha^{\prime\prime}$ to be selected shortly. While the number of defectives $k^{\prime}$ in the reduced problem is random, we see from (6) that it is known up to a multiplicative factor of $1+o(1)$, as required in Lemma 1. Then, the number of tests $n^{\prime}$ required in Lemma 1 satisfies the following:

$n^{\prime}=(1+\eta^{\prime})\log_{2}\binom{p^{\prime}}{k^{\prime}}$  (7)
$\quad=(1+\eta^{\prime})\log_{2}\binom{(1-\alpha^{\prime})p}{(1-\alpha^{\prime})k(1+o(1))}$  (8)
$\quad=\big(1+\eta^{\prime}+o(1)\big)(1-\alpha^{\prime})\log_{2}\binom{p}{k}$  (9)

for arbitrarily small $\eta^{\prime}>0$, where we used $\log\binom{p}{k}=\big(k\log\frac{p}{k}\big)(1+o(1))$ for $k=o(p)$. In addition, in accordance with the lemma, the returned set $\hat{S}^{\prime}$ of size $k^{\prime}$ contains at least

$(1-\alpha^{\prime\prime})k^{\prime}=(1-\alpha^{\prime\prime})(1-\alpha^{\prime})k\cdot(1+o(1))$  (10)

defective items, with probability approaching one.

It suffices to let the final estimate $\hat{S}$ equal $\hat{S}^{\prime}$; alternatively, if an estimate of size $k$ is sought, one can add $k-k^{\prime}$ arbitrary items to $\hat{S}^{\prime}$ to form $\hat{S}$. In either case, taking $\alpha^{\prime}$ arbitrarily close to $\alpha$ and $\alpha^{\prime\prime}$ arbitrarily close to $0$, (10) ensures that $\hat{S}$ contains at least $(1-\alpha)k$ defective items, which implies $\max\big\{|\hat{S}\setminus S|,|S\setminus\hat{S}|\big\}\leq\alpha k$ due to the fact that $|S|=k$. In addition, since $\eta^{\prime}>0$ is arbitrarily small, the number of tests in (9) simplifies to $(1+\eta)(1-\alpha)\log_{2}\binom{p}{k}$ for arbitrarily small $\eta>0$.

We note that while the preceding analysis still uses Bernoulli testing as a subroutine via Lemma 1, the full $n\times p$ test matrix is not i.i.d. Bernoulli, as a fraction $\alpha^{\prime}$ of its columns are set to zero. Hence, we still consider this to be a non-Bernoulli test design.
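For concreteness, the design used in this proof can be sketched in a few lines of Python (our illustration, with arbitrary example parameters): zero out a uniformly random fraction $\alpha^{\prime}$ of the columns, and fill the rest i.i.d. Bernoulli$(\nu/\bar{k})$ with $\bar{k}=(1-\alpha^{\prime})k$.

    import numpy as np

    def truncated_bernoulli_design(n, p, k, alpha_prime, rng):
        # Discard a fraction alpha_prime of items (all-zero columns) and use an
        # i.i.d. Bernoulli(nu/k_bar) design on the remaining items.
        k_bar = (1 - alpha_prime) * k
        nu = k_bar * (1 - 2 ** (-1 / k_bar))  # choice (1) with k_bar in place of k
        X = (rng.random((n, p)) < nu / k_bar).astype(np.uint8)
        discarded = rng.choice(p, size=int(alpha_prime * p), replace=False)
        X[:, discarded] = 0
        return X, discarded

    rng = np.random.default_rng(0)
    X, discarded = truncated_bernoulli_design(n=500, p=10000, k=100,
                                              alpha_prime=0.4, rng=rng)
    print(X.shape, len(discarded))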

Discussion. Since the proof of Theorem 1 is based on ignoring a fraction of the items, it amounts to a technique that immediately gives up on exact recovery, or “cuts its losses” from an early stage. This raises the interesting question as to whether such an approach is actually necessary to obtain the bound in Theorem 1.

To appreciate this distinction, note that Hwang’s adaptive generalized binary splitting algorithm [32] works by repeatedly identifying a single defective using at most $\big(\log_{2}\frac{p}{k}\big)(1+o(1))$ tests, and then removing it from further consideration. Hence, exact recovery is guaranteed with $n=\big(k\log_{2}\frac{p}{k}\big)(1+o(1))$, but even if the procedure is stopped after $(1-\alpha)\big(k\log_{2}\frac{p}{k}\big)(1+o(1))$ tests, one will have already identified $(1-\alpha)k$ defective items. In this sense, Hwang’s adaptive algorithm is universally optimal with respect to the approximate recovery parameter $\alpha$ (note that the lower bound stated following (3) also holds for adaptive algorithms), and the algorithm degrades gracefully as the number of tests decreases below the exact recovery threshold. A simplified sketch of the single-defective search is given below.
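The following Python sketch (our own simplified rendition, not Hwang’s exact procedure) illustrates the key subroutine: given a group that is known to contain a defective, adaptive binary splitting locates one using roughly $\log_2$ of the group size tests; Hwang’s algorithm attains the $\log_2\frac{p}{k}$ cost per defective by applying this to groups of size roughly $\frac{p}{k}$.

    import random

    def test(group, defectives):
        # One group test: positive iff the group contains a defective item.
        return any(item in defectives for item in group)

    def find_one_defective(group, defectives):
        # Binary splitting; assumes the group contains at least one defective.
        while len(group) > 1:
            half = group[:len(group) // 2]
            group = half if test(half, defectives) else group[len(group) // 2:]
        return group[0]

    p, k = 1024, 8
    defectives = set(random.sample(range(p), k))
    found = find_one_defective(list(range(p)), defectives)
    print(found, found in defectives)  # always returns a true defective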

In contrast, the non-adaptive designs that we have considered do not enjoy such universality. Under a Bernoulli design with $k=o(p)$, we can achieve approximate recovery with arbitrarily small $\alpha$ when $n=\big(k\log_{2}\frac{p}{k}\big)(1+o(1))$ (Lemma 1), or even exact recovery if $k=O(p^{1/3})$ [13], but we are prone to complete failure for smaller $n$, at least in sufficiently sparse regimes (see Theorem 6 below). Alternatively, ignoring roughly a fraction $\alpha$ of the items leads to $\alpha$-approximate recovery when $n=(1-\alpha)\big(k\log_{2}\frac{p}{k}\big)(1+o(1))$, but one retains this guarantee and no better regardless of how much $n$ is increased. Hence, it remains an interesting question for future work as to whether there exists a gracefully degrading (and ideally universally optimal) test matrix design in the non-adaptive setting.

II-B A Trivial Strategy for Weak Detection

We first describe the weak detection problem in more detail. The goal is to distinguish between two joint distributions on the pair $(\mathbf{X},\mathbf{Y})$ for some specified distribution $P(\mathbf{X})$:

  • Under distribution $P$, the $\mathbf{X}$-marginal is $P(\mathbf{X})$, and the joint distribution $P(\mathbf{X},\mathbf{Y})$ is deduced by deterministically computing $\mathbf{Y}$ from $\mathbf{X}$ via the group testing model.

  • Under distribution $Q$, the $\mathbf{X}$ and $\mathbf{Y}$ marginals match those of $P$, but $\mathbf{X}$ and $\mathbf{Y}$ are independent, i.e., $Q(\mathbf{X},\mathbf{Y})=P(\mathbf{X})P(\mathbf{Y})$.

This is a binary hypothesis testing problem. The distribution $Q$ corresponds to “completely uninformative outcomes”, so intuitively, if we cannot reliably distinguish between $P$ and $Q$, then we can view the group tests (under the distribution $P$) as being highly uninformative.

As hinted above, for Bernoulli designs, the main reason that we consider weak detection is as a stepping stone to providing a negative result for weak recovery (see Section III). It turns out that if we consider weak detection as a standalone problem with general designs, then a trivial strategy succeeds with very few tests. Formally, we have the following.

Proposition 1.

(Trivial Strategy for Weak Detection) For any number of tests $n$, there exists some distribution $P(\mathbf{X})$ such that given $(\mathbf{X},\mathbf{Y})$, the group testing joint distribution $P(\mathbf{X},\mathbf{Y})$ can be distinguished from $Q(\mathbf{X},\mathbf{Y}):=P(\mathbf{X})P(\mathbf{Y})$ with zero error probability under $P$ and error probability $2^{-n}$ under $Q$.

Proof.

Consider letting each test independently contain all items with probability $\frac{1}{2}$ and no items with probability $\frac{1}{2}$. Hence, each outcome is independent and equiprobable on $\{0,1\}$. Consider a detection algorithm that declares $P$ whenever the ones in $\mathbf{Y}$ exactly match the all-ones rows of $\mathbf{X}$, and declares $Q$ otherwise. Then, under $P$, success is guaranteed due to the group testing model being noiseless. Under $Q$, since the test outcomes are uniformly random, the probability of producing the correct outcomes is $2^{-n}$, proving the proposition. ∎
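A quick simulation (ours, with arbitrary example parameters) illustrates the two error probabilities in Proposition 1:

    import numpy as np

    rng = np.random.default_rng(1)
    n, trials = 10, 50000
    declared_P_under_P = declared_P_under_Q = 0
    for _ in range(trials):
        # Each test contains all items (prob 1/2) or no items (prob 1/2); under P,
        # an all-items test is positive (since k >= 1) and an empty test is negative.
        all_in = (rng.random(n) < 0.5).astype(int)
        y_P = all_in.copy()
        y_Q = (rng.random(n) < 0.5).astype(int)  # under Q: independent of the tests
        declared_P_under_P += int(np.array_equal(y_P, all_in))
        declared_P_under_Q += int(np.array_equal(y_Q, all_in))

    print(declared_P_under_P / trials)  # 1.0, i.e., zero error under P
    print(declared_P_under_Q / trials)  # close to 2**-n, the error under Q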

This result suggests that weak detection is of limited interest as a standalone recovery criterion in general. Nevertheless, this finding arguably strengthens our subsequent negative results for Bernoulli designs (Section III), in the sense of showing that weak detection is not attained despite being an “extremely easy” goal.

II-C Weak Detection with Bernoulli Designs

Our negative results for Bernoulli testing in Section III will demonstrate failure in sufficiently sparse regimes (e.g., $k={\rm poly}(\log p)$) when the number of tests is slightly below $\log_{2}\binom{p}{k}$. On the other hand, a fairly simple detection strategy can be used to attain the following positive result under Bernoulli testing when $k\gg p^{1/3}$ and $k=o(p)$.

Theorem 2.

(Positive Result for Weak Detection with Bernoulli Designs) Consider the probabilistic group testing problem with Bernoulli random testing using the choice of $\nu$ in (1), and suppose that $n=(1-\eta)\log_{2}\binom{p}{k}$ for some $\eta\in(0,1)$. Then, under the condition that $k=o(p)$ and $k=\Omega\big(p^{\frac{1+\eta}{3+\eta}+\delta}\big)$ for some (arbitrarily small) fixed constant $\delta\in(0,1)$, there exists a binary hypothesis testing scheme that succeeds in distinguishing $P(\mathbf{X},\mathbf{Y})$ from $Q(\mathbf{X},\mathbf{Y}):=P(\mathbf{X})P(\mathbf{Y})$ with probability approaching one as $p\to\infty$.

Proof.

The idea is to perform weak detection based on the number of columns of $\mathbf{X}$ that are covered by the test outcomes (i.e., the test outcome is one whenever the column entry is one). The number of such columns is characterized by a binomial distribution under both the group testing model and the null model, and the distributions are shown to be statistically distinguishable under the conditions given. The details are given in Appendix A. ∎

II-D A Simple Strategy for Identifying a Definite Defective

We first describe the problem of identifying a definite defective in more detail. We suppose that the decoder either outputs a single index $\mathcal{I}\in\{1,\dotsc,p\}$ believed to be defective, or declares “I don’t know” by outputting $\mathcal{I}=0$. In the former scenario, we insist that $\mathcal{I}$ must be defective (i.e., $\mathcal{I}\in S$) with probability one, meaning that the only errors allowed are detected errors corresponding to $\mathcal{I}=0$. This setup is partly motivated by the definite defectives algorithm for recovering the defective set [17, 23], as well as the notion of zero undetected error capacity in information theory [33].

The following result shows that under a suitably-chosen non-Bernoulli test design, a single definitely-defective item can be reliably identified using only $O\big(\log\frac{p}{k}\big)$ tests, representing a reduction by a factor of $k$ compared to the usual $O\big(k\log\frac{p}{k}\big)$ scaling.

Theorem 3.

(Positive Result for Identifying a Definite Defective) Consider the probabilistic group testing problem, and suppose that $n\geq(1+\eta)2c\log_{2}\frac{p}{k}$ for some $\eta>0$ and some positive integer $c$. Then there exists a non-adaptive test design and decoder that outputs an estimate $\mathcal{I}\in\{0,1,\dots,p\}$ of a definite defective (with $0$ representing “I don’t know”) satisfying

$\mathbb{P}[\mathcal{I}>0\,\cap\,\mathcal{I}\not\in S]=0 \quad\text{and}\quad \mathbb{P}[\mathcal{I}=0]\leq(1-e^{-1}+o(1))^{c}$  (11)

as $k\to\infty$ and $p\to\infty$.

Proof.

We first consider the case $c=1$. Let $\mathcal{A}$ be a uniformly random set of $\frac{p}{k}$ items (we assume for simplicity that $\frac{p}{k}$ is an integer; the general case follows similarly by rounding). By the assumption that $k\to\infty$ and $k=o(p)$, it is straightforward to show that

$\mathbb{P}[|\mathcal{A}\cap S|=1]=e^{-1}+o(1)$.  (12)

Indeed, the analogous claim is standard when each item is included in $\mathcal{A}$ with probability $\frac{1}{k}$ [1, Sec. 2.3], and (12) can then be understood by approximating the Hypergeometric distribution by the binomial distribution [34].

We proceed by describing a procedure from the SAFFRON algorithm of [12] that is guaranteed to identify the single defective item in $\mathcal{A}$ whenever $|\mathcal{A}\cap S|=1$, while also being able to identify with certainty whether $|\mathcal{A}\cap S|=1$ or $|\mathcal{A}\cap S|\neq 1$. This procedure uses $2v$ tests, where $v=\big\lceil\log_{2}\frac{p}{k}\big\rceil=(1+o(1))\log_{2}\frac{p}{k}$.

Number the items in $\mathcal{A}$ from $0$ to $\frac{p}{k}-1$ in a fixed manner (e.g., maintaining the order that they take as items in $\{1,\dotsc,p\}$). For $i\in\big\{0,1,\dots,\frac{p}{k}-1\big\}$, let $\mathbf{b}_{i}\in\{0,1\}^{2v}$ be a binary vector of length $2v$ and weight $v$ constructed as follows: The first $v$ entries are the number $i$ written in binary, and the last $v$ entries are the same, but with the 0s and 1s swapped. We then construct $2v$ tests, where test $j$ contains exactly the items corresponding to $i\in\mathcal{A}$ for which the $j$-th entry of $\mathbf{b}_{i}$ equals $1$.

We now consider the following cases:

  • If $\mathcal{A}$ contains no defective items, then all $2v$ tests will be negative. When this is observed, we set $\mathcal{I}=0$.

  • If $\mathcal{A}$ contains exactly one defective, then exactly $v$ of the tests will be positive. When this is observed, we set $\mathcal{I}$ to be the item whose value $i\in\mathcal{A}$ is spelled out (in binary) by the first $v$ test results.

  • If $\mathcal{A}$ contains two or more defectives, then more than $v$ of the tests will be positive. When this is observed, we set $\mathcal{I}=0$.

The first and third cases ensure that we never erroneously set $\mathcal{I}\neq 0$. In addition, we correctly identify a defective item in the second case, which occurs with probability $e^{-1}+o(1)$ due to (12). This proves the theorem for $c=1$.

To handle $c>1$, we simply repeat the preceding process $c$ times, drawing $\mathcal{A}$ independently each time. By doing so, we only fail if none of the sets $\mathcal{A}$ contain exactly one defective item, which occurs with probability $(1-e^{-1}+o(1))^{c}$. ∎
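The encoding and decoding used in this proof (for a single draw of $\mathcal{A}$, i.e., $c=1$) are simple enough to state in a short Python sketch (ours, with arbitrary example parameters):

    import numpy as np

    rng = np.random.default_rng(2)
    p, k = 4096, 64
    m = p // k                    # |A| = p/k items in the tested subset
    v = int(np.ceil(np.log2(m)))  # 2v tests in total

    A = rng.choice(p, size=m, replace=False)       # uniformly random subset
    S = set(rng.choice(p, size=k, replace=False))  # defective set

    # b_i: the number i in binary (v bits), followed by the complemented bits.
    bits = np.array([[(i >> (v - 1 - j)) & 1 for j in range(v)] for i in range(m)])
    B = np.hstack([bits, 1 - bits])  # shape (m, 2v); every row has weight v

    # Test j is positive iff some defective member of A is included in test j.
    is_defective = np.array([a in S for a in A])
    outcomes = (B[is_defective].sum(axis=0) > 0).astype(int)

    if outcomes.sum() == 0:
        print("I = 0 (A contains no defectives)")
    elif outcomes.sum() == v:
        i = int("".join(str(b) for b in outcomes[:v]), 2)
        print("definite defective:", A[i], A[i] in S)
    else:
        print("I = 0 (A contains two or more defectives)")

Here the expected value of $|\mathcal{A}\cap S|$ is $\frac{(p/k)\cdot k}{p}=1$, and by (12) the second branch is taken with probability roughly $e^{-1}$.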

We briefly mention that the $O\big(\log\frac{p}{k}\big)$ scaling in Theorem 3 cannot be improved further. To see this, suppose by contradiction that a definite defective could be found with constant (non-zero) probability using $o\big(\log\frac{p}{k}\big)$ tests. By repeating this procedure with uniformly random shuffling of the items, ignoring any $\mathcal{I}=0$ outcomes and removing any defectives identified, an adaptive group testing algorithm could recover the full defective set using $o\big(k\log\frac{p}{k}\big)$ tests on average. However, this would violate the standard $\Omega\big(k\log\frac{p}{k}\big)$ lower bound [1, Sec 1.4], which holds even in the adaptive setting.

III All-or-Nothing Behavior for Bernoulli Designs

To establish an all-or-nothing threshold for weak recovery under Bernoulli designs, it is useful to first study the weak detection problem as a stepping stone.

III-A Weak Detection

Recall from Section II-B that the goal of weak detection is to distinguish $P(\mathbf{X},\mathbf{Y})$ (defined according to the group testing model) from $Q(\mathbf{X},\mathbf{Y}):=P(\mathbf{X})P(\mathbf{Y})$, with $P(\mathbf{X})$ being the i.i.d. Bernoulli distribution using the choice of $\nu$ in (1), and $P(\mathbf{Y})=2^{-n}$.

For concreteness, we consider the Bayesian setting, where the observed pair $(\mathbf{X},\mathbf{Y})$ is drawn from $P$ or $Q$ with probability $\frac{1}{2}$ each. The resulting error probability of the hypothesis test is denoted by $\mathsf{P}_{\mathrm{e}}$. Trivially, choosing the hypothesis via a random guess gives $\mathsf{P}_{\mathrm{e}}=\frac{1}{2}$. It is a standard result in binary hypothesis testing that if $d_{\rm TV}(P,Q)\to 0$ as $p\to\infty$, then one cannot do better than random guessing asymptotically, i.e., it is impossible to do better than $\mathsf{P}_{\mathrm{e}}=\frac{1}{2}+o(1)$ (e.g., see [35, Sec. 2.3.1]).

Let $D(P\|Q)$ denote the KL divergence between $P$ and $Q$. By Pinsker’s inequality, $D(P\|Q)\to 0$ implies $d_{\rm TV}(P,Q)\to 0$, and in addition, $D(P\|Q)\leq\chi^{2}(P\|Q)$ [36], where we define the $\chi^{2}$ divergence

$\chi^{2}(P\|Q)=\mathbf{E}_{Q}\bigg[\bigg(\frac{P(\mathbf{X},\mathbf{Y})}{Q(\mathbf{X},\mathbf{Y})}-1\bigg)^{2}\bigg]=\mathbf{E}_{Q}\bigg[\bigg(\frac{P(\mathbf{X},\mathbf{Y})}{Q(\mathbf{X},\mathbf{Y})}\bigg)^{2}\bigg]-1=\mathbf{E}_{P}\bigg[\frac{P(\mathbf{X},\mathbf{Y})}{Q(\mathbf{X},\mathbf{Y})}\bigg]-1$.  (13)

Hence, to prove a hardness result for distinguishing $P$ from $Q$, it suffices to show that $\chi^{2}(P\|Q)\to 0$ as $p\to\infty$. The following theorem gives conditions under which this is the case.

Theorem 4.

(Negative Result for Weak Detection) Consider the probabilistic group testing problem with Bernoulli random testing using the choice of $\nu$ in (1), and suppose that $n\leq(1-\eta)\log_{2}\binom{p}{k}$ for some $\eta\in(0,1)$. Then, when $k=o(p^{\frac{\eta}{1+\eta}})$ as $p\to\infty$, we have

$\chi^{2}(P\|Q)=\frac{1}{\binom{p}{k}}\sum_{l=0}^{k}\binom{k}{l}\binom{p-k}{l}2^{n(1-\frac{l}{k})}-1\to 0$  (14)

as $p\to\infty$. Hence, the smallest possible error probability for the binary hypothesis test between $P(\mathbf{X},\mathbf{Y})$ and $Q(\mathbf{X},\mathbf{Y})=P(\mathbf{X})P(\mathbf{Y})$ behaves as $\mathsf{P}_{\mathrm{e}}=\frac{1}{2}+o(1)$.

Proof.

The proof is based on substituting the distributions $P$ and $Q$ into (13) and performing asymptotic simplifications. The details are given in Appendix B. ∎
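The closed-form expression (14) (derived as (44) in Appendix B) is easy to evaluate numerically in log-space; the following sketch (ours, with illustrative parameter choices) shows the divergence essentially vanishing in a sufficiently sparse regime and blowing up in a denser one:

    import math

    def log_binom(a, b):
        return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)

    def chi2(p, k, eta):
        # Evaluate (14) with n = (1 - eta) * log2(binom(p, k)), summing in log-space.
        n = (1 - eta) * log_binom(p, k) / math.log(2)
        log_terms = [log_binom(k, l) + log_binom(p - k, l)
                     + n * (1 - l / k) * math.log(2) - log_binom(p, k)
                     for l in range(k + 1)]
        m = max(log_terms)
        return math.exp(m) * sum(math.exp(t - m) for t in log_terms) - 1

    # With eta = 0.5 the theorem requires k = o(p^(1/3)); here p^(1/3) is about 464.
    print(chi2(p=10**8, k=20, eta=0.5))     # sparse: chi^2 is close to zero
    print(chi2(p=10**8, k=10**4, eta=0.5))  # denser: chi^2 is enormous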

The condition $k=o(p^{\frac{\eta}{1+\eta}})$ holds when $k$ grows sufficiently slowly with respect to $p$, e.g., it holds for arbitrarily small $\eta$ when $k={\rm poly}(\log p)$. On the other hand, it remains open as to whether a similar hardness result can be proved when $k$ grows faster than $\Theta(p^{\frac{\eta}{1+\eta}})$. To address this question, we note the following:

  • Theorem 2 above shows that $P$ and $Q$ can be reliably distinguished when $n\geq(1-\eta)\log_{2}\binom{p}{k}$ and $k=p^{\frac{1+\eta}{3+\eta}+\Omega(1)}$, which essentially reduces to $k\gg p^{1/3}$ for small $\eta$. Thus, $\log_{2}\binom{p}{k}$ no longer serves as an all-or-nothing threshold in this denser regime. This provides an interesting point of contrast with the analogous sparse linear regression problem [26], where the hardness result analogous to Theorem 4 holds for all $k=O(\sqrt{p})$. In addition, [26, App. C] provides a positive result showing that this threshold is tight.

  • Theorem 5 below shows that the above $\chi^{2}$-divergence does not approach zero when $n\geq(1-\eta)\log_{2}\binom{p}{k}$ and $k=\Omega\big(p^{\frac{\eta}{1+\eta}}\big)$. Note that the $\chi^{2}$-divergence approaching zero is sufficient, but not necessary, for establishing the hardness of distinguishing $P$ from $Q$. Hence, this result does not establish such hardness, but it does show that any proof establishing hardness must move beyond the approach of bounding $\chi^{2}(P\|Q)$.

    In the sparse linear regression problem, a similar limitation regarding the $\chi^{2}$ divergence is overcome by conditioning out certain “catastrophic” low-probability events [26] that blow up the divergence. Unfortunately, it appears to be difficult to identify an analogous event in the group testing problem.

Formally, the second of these is stated as follows.

Theorem 5.

(A Condition for Non-Vanishing $\chi^{2}$-Divergence) Consider the probabilistic group testing problem with Bernoulli random testing using the choice of $\nu$ in (1), and suppose that $n=(1-\eta)\log_{2}\binom{p}{k}$ for some $\eta\in(0,1)$. Then, when $k=\Omega(p^{\frac{\eta}{1+\eta}})$ as $p\to\infty$, we have

$\liminf_{p\to\infty}\chi^{2}(P\|Q)\geq c$  (15)

for some constant $c>0$.

Proof.

See Appendix B. ∎

III-B Weak Recovery

We are now ready to present our main negative result concerning the weak recovery criterion under Bernoulli testing, which establishes an all-or-nothing threshold at $n\sim k\log_{2}\frac{p}{k}$.

Theorem 6.

(Negative Result for Weak Recovery) Consider the probabilistic group testing problem with Bernoulli random testing using the choice of $\nu$ in (1), and suppose that $n\leq(1-\eta)\log_{2}\binom{p}{k}$ for some $\eta\in(0,1)$. Then, when $k\to\infty$ with $k=o(p^{\frac{\eta}{1+\eta}})$ as $p\to\infty$, for any fixed $\alpha\in(0,1)$, any decoder that outputs some estimate $\widehat{S}$ of $S$ must yield the following as $p\to\infty$:

$\mathbb{P}\big[\max\big\{|\widehat{S}\setminus S|,|S\setminus\widehat{S}|\big\}\leq\alpha k\big]\to 0$.  (16)
Proof.

The proof is outlined as follows. Following an idea used for sparse linear regression in [26], we study the mutual information $I(S;\mathbf{Y},Y^{\prime}|\mathbf{X},X^{\prime})$, where $(X^{\prime},Y^{\prime})$ is an additional test independent from $(\mathbf{X},\mathbf{Y})$. Combining some manipulations of information terms with the weak detection result of Theorem 4, we show that $H(Y^{\prime}|\mathbf{X},X^{\prime},\mathbf{Y})\to\log 2$ when $n\leq(1-\eta)\log_{2}\binom{p}{k}$. On the other hand, we show that if weak recovery were possible, we would be able to predict $Y^{\prime}$ given $(\mathbf{X},X^{\prime},\mathbf{Y})$ better than random guessing, meaning that $H(Y^{\prime}|\mathbf{X},X^{\prime},\mathbf{Y})$ would be bounded away from $\log 2$. Combining these observations, we deduce that weak recovery must be impossible. The details are given in Appendix B. ∎

Hence, when $k$ is sufficiently sparse so that $k=o(p^{\frac{\eta}{1+\eta}})$ holds for any $\eta>0$ (e.g., $k={\rm poly}(\log p)$), the threshold $n^{*}=k\log_{2}\frac{p}{k}$ serves as an exact threshold between complete success and complete failure. We note that in previous works, phase transitions were proved in [13, 19, 20] regarding the error probability of recovering the exact defective set, whereas Theorem 6 gives the much stronger statement that one cannot even identify a small fraction of the defective set.

III-C Identifying a Definite Defective

The following negative result for identifying a definite defective (see Section II-D) follows in a straightforward manner from our negative result for weak detection (Theorem 4).

Theorem 7.

(Negative Result for Identifying a Definite Defective) Consider the probabilistic group testing problem with Bernoulli random testing using the choice of $\nu$ in (1), and suppose that $n\leq(1-\eta)\log_{2}\binom{p}{k}$ for some $\eta\in(0,1)$. Then, when $k=o(p^{\frac{\eta}{1+\eta}})$ as $p\to\infty$, for any decoder that outputs an estimate $\mathcal{I}\in\{0,1,\dotsc,p\}$ of a definite defective (with $0$ representing “I don’t know”), we have

$\mathbb{P}[\mathcal{I}>0\cap\mathcal{I}\notin S]=0\implies\mathbb{P}[\mathcal{I}=0]\to 1$  (17)

as pp\to\infty.

Proof.

The idea is to note that since $\mathbb{P}[\mathcal{I}>0\cap\mathcal{I}\notin S]=0$, the decoder must output $\mathcal{I}=0$ whenever there exists some $S^{\prime}$ of cardinality $k$ that is disjoint from $S$ and consistent with the test outcomes. Using de Caen’s bound [37], we show that this holds with probability at least $\frac{1}{1+\chi^{2}(P\|Q)}$, and the theorem follows since $\chi^{2}(P\|Q)\to 0$ due to Theorem 4. The details are given in Appendix B. ∎

III-D Discussion: Bernoulli vs. General Designs

We wrap up this section and Section II by highlighting that for all three of the recovery criteria considered, Bernoulli designs can be highly suboptimal compared to general designs:

  • Weak recovery is possible (for general test designs) when $n=\Omega\big(k\log\frac{p}{k}\big)$ with an arbitrarily small implied constant, whereas Bernoulli testing requires $n=(1+o(1))k\log_{2}\frac{p}{k}$ in sufficiently sparse regimes;

  • A trivial strategy exists for weak detection with only $O(1)$ tests (for constant error probability), whereas Bernoulli testing requires $n=(1+o(1))k\log_{2}\frac{p}{k}$ in sufficiently sparse regimes;

  • Identifying a definite defective is possible with $O\big(\log\frac{p}{k}\big)$ tests, whereas Bernoulli testing requires $n=(1+o(1))k\log_{2}\frac{p}{k}$ in sufficiently sparse regimes.

Since Bernoulli designs are a prototypical example of an unstructured random design, these results indicate that despite their strong theoretical guarantees (e.g., as established in [13]), significant care should be taken in adopting them when too few tests are available, or when relaxed recovery criteria are considered to be acceptable. Essentially, in such cases, it is much better to “cut one’s losses” early on via a hand-crafted design that targets the less stringent recovery criterion under consideration.

IV Conclusion

We have established the fundamental limits of noiseless non-adaptive group testing under the weak recovery criterion, and more generally approximate recovery, and shown Bernoulli designs to be highly suboptimal (in sufficiently sparse regimes) in the sense of exhibiting an all-or-nothing threshold at $n\sim k\log_{2}\frac{p}{k}$. In addition, we gave similar negative results for Bernoulli testing under the criteria of weak detection and identifying a definite defective, while establishing that non-Bernoulli designs can attain these goals with very few tests.

Possible directions for future work include (i) closing the remaining gaps between the positive and negative results in the regime $k=\Theta(p^{\theta})$ with $\theta\in\big(0,\frac{1}{3}\big)$; (ii) handling Bernoulli$\big(\frac{\nu}{k}\big)$ designs with more general choices of $\nu$; (iii) providing analogous results for the near-constant tests-per-item design [23, 19]; (iv) finding a design that attains exact or near-exact recovery with $n=(1+o(1))\big(k\log_{2}\frac{p}{k}\big)$ tests while degrading gracefully below this threshold; and (v) developing analogous results under random noise models [38, 14, 18].

Appendix A Proof of Theorem 2 (Positive Result for Weak Detection with Bernoulli Designs)

By the assumption $k=o(p)$, we have $\log_{2}\binom{p}{k}=\big(k\log_{2}\frac{p}{k}\big)(1+o(1))$. Hence, it suffices to prove the theorem for $n=(1-\eta)k\log_{2}\frac{p}{k}$, since we can incorporate the $1+o(1)$ term into $\delta$.

In the following, we use the terminology that the $j$-th column $\mathbf{X}_{j}$ of $\mathbf{X}$ is covered by $\mathbf{Y}$ if the support of $\mathbf{X}_{j}$ is a subset of the support of $\mathbf{Y}$ (i.e., whenever the $i$-th entry of $\mathbf{X}_{j}$ is $1$, the outcome $y_{i}$ is also $1$). We consider distinguishing models $P$ and $Q$ by counting the number of columns of $\mathbf{X}$ that are covered by $\mathbf{Y}$.

Fix a constant $\zeta\in(0,1)$, and consider the “typical” set $\mathcal{T}$ containing all the sequences $\mathbf{y}\in\{0,1\}^{n}$ such that the number of positive tests is between $\frac{n(1-\zeta)}{2}$ and $\frac{(1+\zeta)n}{2}$. By the law of large numbers, we have $\mathbb{P}[\mathbf{Y}\notin\mathcal{T}]\to 0$ as $p\to\infty$. Given any sequence $\mathbf{y}\in\mathcal{T}$, when an independent random column $\mathbf{X}_{j}$ is generated, the (conditional) probability $q_{0}$ of it being covered satisfies

$\Big(1-\frac{\nu}{k}\Big)^{\frac{(1+\zeta)n}{2}}\leq q_{0}\leq\Big(1-\frac{\nu}{k}\Big)^{\frac{(1-\zeta)n}{2}}$.  (18)

For $n=(1-\eta)k\log_{2}\frac{p}{k}$, recalling the choice of $\nu$ in (1) (which yields $P(\mathbf{Y})=2^{-n}$), we have

$\Big(1-\frac{\nu}{k}\Big)^{\frac{(1-\zeta)n}{2}}=\Big(\frac{1}{2}\Big)^{\frac{(1-\zeta)n}{2k}}=\Big(\frac{k}{p}\Big)^{\frac{(1-\zeta)(1-\eta)}{2}}$,  (19)

and similarly

$\Big(1-\frac{\nu}{k}\Big)^{\frac{(1+\zeta)n}{2}}=\Big(\frac{k}{p}\Big)^{\frac{(1+\zeta)(1-\eta)}{2}}$.  (20)

Hence, we have

$\Big(\frac{k}{p}\Big)^{\frac{(1+\zeta)(1-\eta)}{2}}\leq q_{0}\leq\Big(\frac{k}{p}\Big)^{\frac{(1-\zeta)(1-\eta)}{2}}$.  (21)

Then, the distribution of the number $\tilde{N}(\mathbf{X},\mathbf{y})$ of covered columns under the two hypotheses is given as follows:

  • Under $P$: $\tilde{N}(\mathbf{X},\mathbf{y})\sim k+\mbox{Binomial}(p-k,q_{0})$, where the addition of $k$ is due to the fact that the defective items’ columns are almost surely covered due to the definition of the group testing model.

  • Under $Q$: $\tilde{N}(\mathbf{X},\mathbf{y})\sim\mbox{Binomial}(p,q_{0})$.

We consider the following procedure for distinguishing these two hypotheses:

$\tilde{N}(\mathbf{X},\mathbf{Y})\ \mathop{\gtreqless}_{Q}^{P}\ pq_{0}+\frac{k}{2}$.  (22)

Then, given $\mathbf{y}$, the error probability $\mathsf{P}_{\mathrm{e}}(\mathbf{y})$ with a uniform prior satisfies

$\mathsf{P}_{\mathrm{e}}(\mathbf{y})=\frac{1}{2}\mathbb{P}_{Q}\Big[\tilde{N}(\mathbf{X},\mathbf{y})>pq_{0}+\frac{k}{2}\Big]+\frac{1}{2}\mathbb{P}_{P}\Big[\tilde{N}(\mathbf{X},\mathbf{y})<pq_{0}+\frac{k}{2}\Big]$.  (23)

For the first term in (23), observe that by the Berry-Esseen Theorem [39] (see Corollary 2 in Appendix C), we have

$\mathbb{P}_{Q}\Big[\tilde{N}(\mathbf{X},\mathbf{y})\geq pq_{0}+\frac{k}{2}\Big]=\mathbb{P}_{Q}\bigg[\frac{\tilde{N}(\mathbf{X},\mathbf{y})-pq_{0}}{\sqrt{p(1-q_{0})q_{0}}}\geq\frac{k}{2\sqrt{p(1-q_{0})q_{0}}}\bigg]$  (24)
$\quad\leq\mathcal{Q}\bigg(\frac{k}{2\sqrt{p(1-q_{0})q_{0}}}\bigg)+\frac{6\rho}{\sigma^{3}\sqrt{p}}$,  (25)

where $\mathcal{Q}(t)=\frac{1}{\sqrt{2\pi}}\int_{t}^{\infty}e^{-u^{2}/2}\,{\rm d}u$ denotes the standard Gaussian upper tail probability function, and as also shown in Appendix C, the relevant moments are

$\rho=(1-q_{0})^{3}q_{0}+q_{0}^{3}(1-q_{0})$,  (26)
$\sigma=\sqrt{(1-q_{0})q_{0}}$.  (27)

The ratio appearing in (25) can be simplified as follows:

$\frac{6\rho}{\sigma^{3}\sqrt{p}}=\frac{6(1-q_{0})^{3}q_{0}+6q_{0}^{3}(1-q_{0})}{\big(\sqrt{(1-q_{0})q_{0}}\big)^{3}\sqrt{p}}$  (28)
$\quad=\frac{6(1-q_{0})^{2}q_{0}+6q_{0}^{3}}{q_{0}^{3/2}\sqrt{(1-q_{0})p}}$  (29)
$\quad\leq\frac{12q_{0}}{q_{0}^{3/2}\sqrt{(1-q_{0})p}}$  (30)
$\quad=\frac{12}{\sqrt{pq_{0}(1-q_{0})}}$.  (31)

Similarly, again using the Berry-Esseen theorem, and writing $\tilde{N}$ in place of $\tilde{N}(\mathbf{X},\mathbf{y})$ for brevity, we have

$\mathbb{P}_{P}\Big[\tilde{N}\leq pq_{0}+\frac{k}{2}\Big]=\mathbb{P}_{P}\Big[\tilde{N}-k-(p-k)q_{0}\leq pq_{0}+\frac{k}{2}-k-(p-k)q_{0}\Big]$  (32)
$\quad=\mathbb{P}_{P}\bigg[\frac{\tilde{N}-k-(p-k)q_{0}}{\sqrt{(p-k)(1-q_{0})q_{0}}}\leq\frac{q_{0}k-\frac{k}{2}}{\sqrt{(p-k)(1-q_{0})q_{0}}}\bigg]$  (33)
$\quad\leq 1-\mathcal{Q}\bigg(\frac{q_{0}k-\frac{k}{2}}{\sqrt{(p-k)(1-q_{0})q_{0}}}\bigg)+\frac{6\rho}{\sigma^{3}\sqrt{p-k}}$,  (34)

and since $k=o(p)$, we have from (31) that $\frac{6\rho}{\sigma^{3}\sqrt{p-k}}\leq\frac{12}{\sqrt{pq_{0}(1-q_{0})}}(1+o(1))$.

We know from (21) that $q_{0}\to 0$, and combining this with $k=o(p)$, we see from (31) and (34) that $\mathsf{P}_{\mathrm{e}}\to 0$ as long as $pq_{0}\to\infty$ and $k=\omega(\sqrt{pq_{0}})$. The condition $pq_{0}\to\infty$ follows as an immediate consequence of (21) (with $k=o(p)$ and $\delta<1$). In addition, again using (21), we find that the condition $k=\omega\big(\sqrt{pq_{0}}\big)$ is satisfied if

$k=\omega\bigg(\sqrt{p}\Big(\frac{k}{p}\Big)^{\frac{(1-\eta)(1-\zeta)}{4}}\bigg)$.  (35)

Letting $a=(1-\eta)(1-\zeta)$, we find that this condition holds if $k^{1-\frac{a}{4}}=\omega\big(p^{\frac{1}{2}-\frac{a}{4}}\big)$, which simplifies to $k=\omega\big(p^{\frac{2-a}{4-a}}\big)$. Substituting $a=(1-\eta)(1-\zeta)$, and recalling that $\eta$ is constant and $\zeta$ is arbitrarily small, we find that the preceding condition holds as long as

$k=\Omega\big(p^{\frac{1+\eta}{3+\eta}+\delta}\big)$  (36)

for arbitrarily small $\delta\in(0,1)$. This completes the proof of Theorem 2.
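The covered-columns test above is also straightforward to simulate; the following sketch (ours, with illustrative parameters in the dense regime covered by Theorem 2) typically declares the correct hypothesis, though as a randomized test it can of course err:

    import numpy as np

    rng = np.random.default_rng(3)
    p, k, eta = 20000, 800, 0.3
    nu_over_k = 1 - 2 ** (-1 / k)          # choice (1): nu/k = 1 - 2^(-1/k)
    n = int((1 - eta) * k * np.log2(p / k))

    def covered_columns(X, y):
        # Columns of X whose support lies within the support of y.
        return int((X[y == 0].sum(axis=0) == 0).sum())

    for hypothesis in ["P", "Q"]:
        X = rng.random((n, p), dtype=np.float32) < nu_over_k
        S = rng.choice(p, size=k, replace=False)
        if hypothesis == "P":
            y = (X[:, S].sum(axis=1) > 0).astype(np.uint8)  # group testing model
        else:
            y = (rng.random(n) < 0.5).astype(np.uint8)      # null model
        q0 = (1 - nu_over_k) ** int(y.sum())  # coverage probability given y's weight
        N = covered_columns(X, y)
        print(hypothesis, "->", "declare P" if N > p * q0 + k / 2 else "declare Q")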

Appendix B Proofs of Negative Results for Bernoulli Designs

B-A Preliminary Calculations

Since the group testing model $P$ and the null model $Q$ have the same $\mathbf{X}$ distribution, and the null model assigns probability $2^{-n}$ to each $\mathbf{Y}$ sequence (due to the choice of $\nu$ in (1)), we have

$\frac{P(\mathbf{X},\mathbf{Y})}{Q(\mathbf{X},\mathbf{Y})}=2^{n}P(\mathbf{Y}|\mathbf{X})=2^{n}\sum_{S}\frac{1}{\binom{p}{k}}P(\mathbf{Y}|\mathbf{X},S)$,  (37)

where here and subsequently, the summation over $S$ is implicitly over all $\binom{p}{k}$ subsets of $\{1,\dotsc,p\}$ of cardinality $k$. Since the observation model defining $P$ is deterministic, $P(\mathbf{Y}|\mathbf{X},S)$ is simply $1$ if $\mathbf{Y}$ is consistent with $(\mathbf{X},S)$ (according to $Y^{(i)}=\bigvee_{j\in S}X_{j}^{(i)}$), and $0$ otherwise. Letting $\mathcal{I}_{S}(\mathbf{X},\mathbf{Y})$ be the corresponding indicator function, we rewrite (37) as

$\frac{P(\mathbf{X},\mathbf{Y})}{Q(\mathbf{X},\mathbf{Y})}=\frac{2^{n}}{\binom{p}{k}}\sum_{S}\mathcal{I}_{S}(\mathbf{X},\mathbf{Y})$,  (38)

and take the square to obtain

$\bigg(\frac{P(\mathbf{X},\mathbf{Y})}{Q(\mathbf{X},\mathbf{Y})}\bigg)^{2}=\frac{4^{n}}{\binom{p}{k}^{2}}\sum_{S,S^{\prime}}\mathcal{I}_{S}(\mathbf{X},\mathbf{Y})\mathcal{I}_{S^{\prime}}(\mathbf{X},\mathbf{Y})$.  (39)

Taking the average over $(\mathbf{X},\mathbf{Y})\sim Q$ and using the middle form in (13), we obtain

$\chi^{2}(P\|Q)=\frac{4^{n}}{\binom{p}{k}^{2}}\sum_{S,S^{\prime}}\mathbb{E}_{Q}\big[\mathcal{I}_{S}(\mathbf{X},\mathbf{Y})\mathcal{I}_{S^{\prime}}(\mathbf{X},\mathbf{Y})\big]-1$.  (40)

The average here is the probability that a randomly generated $(\mathbf{X},\mathbf{Y})$ (independent of each other) is consistent with both $S$ and $S^{\prime}$. By the symmetry of $(\mathbf{X},\mathbf{Y})$ with respect to re-labeling items, we can assume without loss of generality that $S$ equals the set

$S_{0}=\{1,2,\dotsc,k\}$,  (41)

and average over $S^{\prime}$ alone; by splitting into $S^{\prime}$ with $\ell$ entries in $\{k+1,\dotsc,p\}$ (non-overlapping with $S_{0}$) and $k-\ell$ entries in $\{1,\dotsc,k\}$ (overlapping with $S_{0}$), we obtain

$\chi^{2}(P\|Q)=\frac{4^{n}}{\binom{p}{k}}\sum_{\ell=0}^{k}\binom{k}{\ell}\binom{p-k}{\ell}\mathbb{E}_{Q}\big[\mathcal{I}_{S_{0}}(\mathbf{X},\mathbf{Y})\mathcal{I}_{S^{\prime}}(\mathbf{X},\mathbf{Y})\big]-1$,  (42)

where $S^{\prime}$ implicitly satisfies $|S_{0}\cap S^{\prime}|=k-\ell$ in the expectation.

Now observe that $\mathbb{E}_{Q}\big[\mathcal{I}_{S_{0}}(\mathbf{X},\mathbf{Y})\mathcal{I}_{S^{\prime}}(\mathbf{X},\mathbf{Y})\big]$ is the probability (with respect to $Q$) that every one of the $n$ tests satisfies any one of the following:

  • The test outcome is negative, and all $k+\ell$ items from $S_{0}\cup S^{\prime}$ are excluded;

  • The test outcome is positive, and at least one item from $S_{0}\cap S^{\prime}$ is included;

  • The test outcome is positive, and no items from $S_{0}\cap S^{\prime}$ are included, but at least one item from each of $S_{0}\setminus S^{\prime}$ and $S^{\prime}\setminus S_{0}$ are included.

For a single test, we characterize the probabilities of these three events under $Q$ as follows (recalling (1)):

  • The first event has probability $\frac{1}{2}\big(1-\frac{\nu}{k}\big)^{k+\ell}=\frac{1}{2}\cdot\big(\frac{1}{2}\big)^{\frac{k+\ell}{k}}=\big(\frac{1}{2}\big)^{2+\frac{\ell}{k}}$;

  • The union of the second and third events above can be reformulated as the event that the test outcome is positive and none of the following events occur: (i) All items from $S_{0}\cup S^{\prime}$ are excluded; (ii) All items from $S_{0}$ are excluded, but at least one from $S^{\prime}\setminus S_{0}$ is included; (iii) All items from $S^{\prime}$ are excluded, but at least one from $S_{0}\setminus S^{\prime}$ is included. Using this formulation, the union of the second and third events above has probability $\frac{1}{2}\big[1-\big(1-\frac{\nu}{k}\big)^{k+\ell}-2\cdot\frac{1}{2}\cdot\big(1-\big(1-\frac{\nu}{k}\big)^{\ell}\big)\big]=\frac{1}{2}\big[1-\big(\frac{1}{2}\big)^{1+\frac{\ell}{k}}-\big(1-\big(\frac{1}{2}\big)^{\frac{\ell}{k}}\big)\big]=\big(\frac{1}{2}\big)^{1+\frac{\ell}{k}}-\big(\frac{1}{2}\big)^{2+\frac{\ell}{k}}=\big(\frac{1}{2}\big)^{2+\frac{\ell}{k}}$.

Summing these two probabilities together gives an overall probability of $2\cdot\big(\frac{1}{2}\big)^{2+\frac{\ell}{k}}=\big(\frac{1}{2}\big)^{1+\frac{\ell}{k}}$ associated with a single test. Since the tests are independent, taking the intersection of the corresponding $n$ events gives

$\mathbb{E}_{Q}\big[\mathcal{I}_{S_{0}}(\mathbf{X},\mathbf{Y})\mathcal{I}_{S^{\prime}}(\mathbf{X},\mathbf{Y})\big]=2^{-n(1+\frac{\ell}{k})}$,  (43)

and substitution into (42) yields

$\chi^{2}(P\|Q)=\frac{4^{n}}{\binom{p}{k}}\sum_{\ell=0}^{k}\binom{k}{\ell}\binom{p-k}{\ell}2^{-n(1+\frac{\ell}{k})}-1$.  (44)
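For tiny parameters, (44) can be checked directly against the definition (13); the following sketch (ours) compares the closed form with a Monte Carlo estimate that averages over $\mathbf{X}\sim P(\mathbf{X})$ and sums exactly over all $2^{n}$ outcome vectors:

    import itertools
    import math
    import random

    p, k, n = 10, 2, 4
    nu_over_k = 1 - 2 ** (-1 / k)
    subsets = list(itertools.combinations(range(p), k))

    def closed_form():
        # Expression (44).
        total = sum(math.comb(k, l) * math.comb(p - k, l) * 2 ** (-n * (1 + l / k))
                    for l in range(k + 1))
        return 4 ** n / math.comb(p, k) * total - 1

    def monte_carlo(trials=20000):
        # chi^2 = E_X[ 2^n * sum_y P(y|X)^2 ] - 1, with P(y|X) computed by
        # counting the defective sets S consistent with (X, y), as in (37).
        acc = 0.0
        for _ in range(trials):
            X = [[random.random() < nu_over_k for _ in range(p)] for _ in range(n)]
            counts = {}
            for s in subsets:
                y = tuple(any(X[i][j] for j in s) for i in range(n))
                counts[y] = counts.get(y, 0) + 1
            acc += 2 ** n * sum((c / len(subsets)) ** 2 for c in counts.values())
        return acc / trials - 1

    print(closed_form(), monte_carlo())  # the two values should roughly agree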

B-B Proof of Theorem 4 (Impossibility of Weak Detection for $n\leq(1-\eta)\log_{2}\binom{p}{k}$)

Recall the setup of weak detection described in Section II-B, and that we consider i.i.d. Bernoulli designs with $\nu$ chosen via (1). We first prove the following lemma, which provides an upper bound on the $\chi^{2}$-divergence.

Lemma 2.

Assume that $n\leq(1-\eta)\log_{2}\binom{p}{k}$ for some $\eta\in(0,1)$. Then, it holds that

$\chi^{2}(P\|Q)\leq\exp\bigg[e^{1-\eta}k\Big(\frac{k}{p}\Big)^{\eta}\frac{p}{p-k+1}\bigg]-1$.  (45)
Proof.

Using the assumption $n\leq(1-\eta)\log_{2}\binom{p}{k}$, we bound (44) as follows:

$\chi^{2}(P\|Q)=\frac{1}{\binom{p}{k}}\sum_{l=0}^{k}\binom{k}{l}\binom{p-k}{l}2^{n(1-\frac{l}{k})}-1$  (46)
$\quad\leq\frac{1}{\binom{p}{k}}\sum_{l=0}^{k}\binom{k}{l}\binom{p-k}{l}2^{(1-\frac{l}{k})(1-\eta)\log_{2}\binom{p}{k}}-1$  (47)
$\quad=\frac{1}{\binom{p}{k}}\sum_{l=0}^{k}\binom{k}{l}\binom{p-k}{l}\binom{p}{k}^{(1-\eta)(1-\frac{l}{k})}-1$  (48)
$\quad=\binom{p}{k}^{-\eta}\sum_{l=0}^{k}\binom{k}{l}\binom{p-k}{l}\binom{p}{k}^{-(1-\eta)\frac{l}{k}}-1$.  (49)

In addition, for all $l<k$, we have

\frac{{p-k\choose l}}{{p\choose k}}\leq\frac{{p\choose l}}{{p\choose k}}   (50)
=\frac{k!\,(p-k)!}{l!\,(p-l)!}   (51)
=\frac{k(k-1)\cdots(l+1)}{(p-l)(p-l-1)\cdots(p-k+1)}   (52)
\leq\bigg(\frac{k}{p-k+1}\bigg)^{k-l}.   (53)

Since (53) also holds for $l=k$, it follows that

\frac{{p-k\choose l}}{{p\choose k}}\leq\bigg(\frac{k}{p-k+1}\bigg)^{k-l}   (54)

for all $0\leq l\leq k$.

From (49) and (54), we obtain

\chi^{2}(P\|Q)
\leq{p\choose k}^{1-\eta}\sum_{l=0}^{k}{k\choose l}\bigg(\frac{k}{p-k+1}\bigg)^{k-l}{p\choose k}^{-(1-\eta)\frac{l}{k}}-1   (55)
={p\choose k}^{1-\eta}\bigg(\frac{k}{p-k+1}\bigg)^{k}\sum_{l=0}^{k}{k\choose l}\bigg(\frac{k}{p-k+1}\bigg)^{-l}{p\choose k}^{-(1-\eta)\frac{l}{k}}-1   (56)
={p\choose k}^{1-\eta}\bigg(\frac{k}{p-k+1}\bigg)^{k}\bigg[1+\frac{p-k+1}{k}{p\choose k}^{-\frac{1-\eta}{k}}\bigg]^{k}-1   (57)
=\bigg[{p\choose k}^{\frac{1-\eta}{k}}\bigg(\frac{k}{p-k+1}\bigg)+1\bigg]^{k}-1   (58)
\leq\bigg[\bigg(\frac{pe}{k}\bigg)^{1-\eta}\bigg(\frac{k}{p-k+1}\bigg)+1\bigg]^{k}-1   (59)
\leq\exp\bigg[k\bigg(\frac{pe}{k}\bigg)^{1-\eta}\bigg(\frac{k}{p-k+1}\bigg)\bigg]-1   (60)
=\exp\bigg[e^{1-\eta}k\bigg(\frac{k}{p}\bigg)^{\eta}\frac{p}{p-k+1}\bigg]-1,   (61)

where (57) follows from the fact that $\sum_{j=0}^{k}{k\choose j}x^{j}=(1+x)^{k}$, (59) follows from ${p\choose k}\leq\big(\frac{pe}{k}\big)^{k}$, and (60) uses $1+x\leq e^{x}$. This proves (45). ∎
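As an illustrative aside (with arbitrarily chosen parameters for which $k=o(p^{\frac{\eta}{1+\eta}})$ plausibly holds), the bound (45) can be compared numerically against the exact expression (44); a minimal sketch:

```python
# Sketch comparing the exact chi^2 from (44) with the Lemma 2 bound (45)
# at n = (1 - eta) * log2(C(p,k)); parameters are illustrative assumptions.
import math

def chi_squared_exact(p, k, n):
    total = sum(math.comb(k, l) * math.comb(p - k, l) * 2.0 ** (n * (1 - l / k))
                for l in range(k + 1))
    return total / math.comb(p, k) - 1.0

def lemma2_bound(p, k, eta):
    # Right-hand side of (45)
    return math.exp(math.e ** (1 - eta) * k * (k / p) ** eta * p / (p - k + 1)) - 1.0

p, k, eta = 10**6, 3, 0.5                    # here k is well below p^{1/3}
n = (1 - eta) * math.log2(math.comb(p, k))
print("exact (44):", chi_squared_exact(p, k, n))   # lies below the bound
print("bound (45):", lemma2_bound(p, k, eta))
```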

To prove Theorem 4, it suffices to show that the right-hand side of (45) tends to zero as $p\to\infty$. To see this, observe that the condition $k=o(p^{\frac{\eta}{1+\eta}})$ can equivalently be written as $k\big(\frac{k}{p}\big)^{\eta}=o(1)$, and this condition implies that

0\leq\chi^{2}(P\|Q)\leq\exp\bigg[e^{1-\eta}k\bigg(\frac{k}{p}\bigg)^{\eta}\frac{p}{p-k+1}\bigg]-1\to 0   (62)

as $p\to\infty$. This means that $\chi^{2}(P\|Q)\to 0$ when $k=o(p^{\frac{\eta}{1+\eta}})$, which proves Theorem 4.
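To illustrate the role of the condition $k=o(p^{\frac{\eta}{1+\eta}})$, one can watch the exponent in (45) shrink as $p$ grows with $k\approx p^{\alpha}$ for $\alpha<\frac{\eta}{1+\eta}$; the values of $\eta$ and $\alpha$ below are arbitrary choices for demonstration.

```python
# The bound (45) tends to 0 when k ~ p^alpha with alpha < eta/(1+eta);
# eta and alpha are arbitrary illustrative choices.
import math

eta, alpha = 0.5, 0.25                       # alpha below eta/(1+eta) = 1/3
for p in (10**4, 10**6, 10**8, 10**10):
    k = max(1, round(p ** alpha))
    bound = math.exp(math.e ** (1 - eta) * k * (k / p) ** eta * p / (p - k + 1)) - 1
    print(f"p = {p:>11}, k = {k:>4}, bound (45) = {bound:.3e}")
```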

B-C Proof of Theorem 5 (A Condition for Non-Vanishing $\chi^{2}$-Divergence)

Here we prove that $\liminf_{p\to\infty}\chi^{2}(P\|Q)\geq c$ for some $c>0$ when $n\geq(1-\eta)\log_{2}{p\choose k}$, under the assumptions $k=o(p)$ and $k=\Omega(p^{\frac{\eta}{1+\eta}})$. (We are grateful to an anonymous reviewer for helping us to significantly simplify this proof.)

As mentioned in Section III-A, $\chi^{2}(P\|Q)\to 0$ implies that $d_{\rm TV}(P,Q)\to 0$, which in turn implies that weak detection is impossible for any algorithm. However, Theorem 2 shows that weak detection is possible when $k=\Omega(p^{\frac{1+\eta}{3+\eta}+\delta})$ for arbitrarily small $\delta\in(0,1)$. Note that $\frac{1+\eta}{3+\eta}<\frac{1}{2}$ for any $\eta\in[0,1)$. Thus, it must be the case that $\liminf_{p\to\infty}\chi^{2}(P\|Q)\geq c$ whenever $k=\Omega(\sqrt{p})$, since otherwise $\chi^{2}(P\|Q)=o(1)$ would give a contradiction. In the following, we therefore only consider $k=o(\sqrt{p})$.

It remains to prove that $\liminf_{p\to\infty}\chi^{2}(P\|Q)\geq c$ for some $c>0$ when $n\geq(1-\eta)\log_{2}{p\choose k}$, now under the assumptions $k=o(\sqrt{p})$ and $k=\Omega(p^{\frac{\eta}{1+\eta}})$. Following (46)–(49) with the inequality in (47) reversed, we have

\chi^{2}(P\|Q)\geq{p\choose k}^{-\eta}\sum_{l=0}^{k}{k\choose l}{p-k\choose l}{p\choose k}^{-(1-\eta)\frac{l}{k}}-1.   (63)

Further lower bounding the right-hand side by taking only the terms $l\in\{k-1,k\}$, we obtain

\chi^{2}(P\|Q)
\geq-1+\frac{{p-k\choose k}}{{p\choose k}}+k\,\frac{{p-k\choose k-1}}{{p\choose k}}{p\choose k}^{\frac{1-\eta}{k}}   (64)
=-1+\frac{{p-k\choose k}}{{p\choose k}}+\frac{k^{2}}{p-2k+1}\,\frac{{p-k\choose k}}{{p\choose k}}{p\choose k}^{\frac{1-\eta}{k}}   (65)
=-1+\frac{{p-k\choose k}}{{p\choose k}}\bigg[1+\frac{k^{2}}{p-2k+1}{p\choose k}^{\frac{1-\eta}{k}}\bigg]   (66)
\geq-1+\frac{{p-k\choose k}}{{p\choose k}}\bigg[1+\frac{k^{2}}{p-2k+1}\bigg(\frac{p}{k}\bigg)^{1-\eta}\bigg]   (67)
=-1+\frac{{p-k\choose k}}{{p\choose k}}\bigg[1+\frac{k^{1+\eta}}{p^{\eta}}\bigg(\frac{p}{p-2k+1}\bigg)\bigg],   (68)

where (65) uses ${p-k\choose k-1}={p-k\choose k}\frac{k}{p-2k+1}$, and (67) follows since ${p\choose k}\geq\big(\frac{p}{k}\big)^{k}$. To bound the ratio of binomial coefficients in (68), observe that

1\geq\frac{{p-k\choose k}}{{p\choose k}}   (69)
=\frac{(p-k)(p-k-1)\cdots(p-2k+1)}{p(p-1)\cdots(p-k+1)}   (70)
>\bigg(1-\frac{2k}{p}\bigg)^{k}   (71)
\geq 1-\frac{2k^{2}}{p},   (72)

where (72) follows since $(1+x)^{k}\geq 1+kx$ for all $x>-1$. Hence, since we are considering $k=o(\sqrt{p})$, it follows that $\frac{{p-k\choose k}}{{p\choose k}}=1+o(1)$. Combining this with (68) and the assumption $k=\Omega(p^{\frac{\eta}{1+\eta}})$, we obtain

\liminf_{p\to\infty}\chi^{2}(P\|Q)\geq c   (73)

for some constant $c>0$, completing the proof of Theorem 5.
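As a numerical illustration of this argument (with arbitrary parameter choices), the two-term lower bound (68) indeed stays bounded away from zero when $k$ is on the order of $p^{\frac{\eta}{1+\eta}}$:

```python
# Evaluating the lower bound (68) with k ~ p^{eta/(1+eta)}; illustrative only.
import math

eta = 0.5                                    # arbitrary choice
for p in (10**4, 10**6, 10**8):
    k = max(2, round(p ** (eta / (1 + eta))))
    ratio = math.comb(p - k, k) / math.comb(p, k)   # the (1 + o(1)) factor from (69)-(72)
    lower = -1 + ratio * (1 + (k ** (1 + eta) / p ** eta) * p / (p - 2 * k + 1))
    print(f"p = {p:>9}, k = {k:>4}, lower bound on chi^2 = {lower:.3f}")
```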

B-D Proof of Theorem 6 (Negative Result for Weak Recovery)

As mentioned in Section III-A, $\chi^{2}(P\|Q)\to 0$ implies that $D(P\|Q)\to 0$. Consider $(\mathbf{X},\mathbf{Y})\sim P$, along with an additional pair $(X^{\prime},Y^{\prime})\in\{0,1\}^{p}\times\{0,1\}$ drawn from the same joint distribution as a single test in $(\mathbf{X},\mathbf{Y})$, independently from $(\mathbf{X},\mathbf{Y})$. Following the steps of [26] for sparse linear regression, we consider the following conditional mutual information term:

I(S;\mathbf{Y},Y^{\prime}|\mathbf{X},X^{\prime})
=\mathbb{E}_{P}\bigg[\log\frac{P(\mathbf{Y},Y^{\prime}|\mathbf{X},X^{\prime},S)}{P(\mathbf{Y},Y^{\prime}|\mathbf{X},X^{\prime})}\bigg]   (74)
=\mathbb{E}_{P}\bigg[\log\frac{P(\mathbf{Y},Y^{\prime}|\mathbf{X},X^{\prime},S)}{Q(\mathbf{Y},Y^{\prime})}\bigg]-\mathbb{E}_{P}\bigg[\log\frac{P(\mathbf{Y},Y^{\prime}|\mathbf{X},X^{\prime})}{Q(\mathbf{Y},Y^{\prime})}\bigg]   (75)
=\mathbb{E}_{P}\bigg[\log\frac{P(\mathbf{Y},Y^{\prime}|\mathbf{X},X^{\prime},S)}{Q(\mathbf{Y},Y^{\prime})}\bigg]-D(P\|Q),   (76)

where $D(P\|Q)$ is now defined according to $P$ and $Q$ containing $n+1$ tests instead of $n$ (one extra test for $(X^{\prime},Y^{\prime})$). Under $P$, we have $P(\mathbf{Y},Y^{\prime}|\mathbf{X},X^{\prime},S)=1$ almost surely, and combining this with $Q(\mathbf{Y},Y^{\prime})=2^{-(n+1)}$, it follows that

I(S;\mathbf{Y},Y^{\prime}|\mathbf{X},X^{\prime})=(n+1)\log 2-D(P\|Q).   (77)

Moreover, by the chain rule for mutual information, we have

I(S;\mathbf{Y},Y^{\prime}|\mathbf{X},X^{\prime})=I(S;\mathbf{Y}|\mathbf{X},X^{\prime})+I(S;Y^{\prime}|\mathbf{X},X^{\prime},\mathbf{Y})   (78)
\leq n\log 2+H(Y^{\prime}|\mathbf{X},X^{\prime},\mathbf{Y}),   (79)

where the two terms are handled as follows by expanding the conditional mutual information as a difference of conditional entropies:

  • For the first term, write $I(S;\mathbf{Y}|\mathbf{X},X^{\prime})=H(\mathbf{Y}|\mathbf{X},X^{\prime})-H(\mathbf{Y}|\mathbf{X},X^{\prime},S)\leq H(\mathbf{Y}|\mathbf{X},X^{\prime})$, and note that $H(\mathbf{Y}|\mathbf{X},X^{\prime})\leq n\log 2$, since $\mathbf{Y}\in\{0,1\}^{n}$ and entropy is upper bounded by the logarithm of the number of outcomes;

  • For the second term, write $I(S;Y^{\prime}|\mathbf{X},X^{\prime},\mathbf{Y})=H(Y^{\prime}|\mathbf{X},X^{\prime},\mathbf{Y})-H(Y^{\prime}|\mathbf{X},X^{\prime},\mathbf{Y},S)$, and note that $H(Y^{\prime}|\mathbf{X},X^{\prime},\mathbf{Y},S)=0$, since $Y^{\prime}$ is deterministic given $(X^{\prime},S)$.

Combining (77) and (79) gives

H(Y^{\prime}|\mathbf{X},X^{\prime},\mathbf{Y})\geq\log 2-D(P\|Q)=\log 2-o(1),   (80)

since $D(P\|Q)\to 0$ for $n\leq(1-\eta)\log_{2}{p\choose k}$ by Theorem 4 (the replacement of $n$ by $n+1$ only amounts to a negligible multiplicative $1+o(1)$ change in $\eta$). Since the entropy functional is continuous, and the entropy of a binary random variable is at most $\log 2$ with equality if and only if the random variable is equiprobable on its two values, we deduce from (80) that the following holds: With probability $1-o(1)$ with respect to $(\mathbf{X},X^{\prime},\mathbf{Y})$, the conditional distribution of $Y^{\prime}$ places probability $\frac{1}{2}+o(1)$ on each of $Y^{\prime}=0$ and $Y^{\prime}=1$.

To complete the proof of Theorem 6, we show that the preceding claim precludes the possibility of weak recovery, i.e., that (16) holds. Suppose, by way of contradiction to (16), that it were possible to use $(\mathbf{X},\mathbf{Y})$ to attain $|S\cap\hat{S}|\geq\delta k$ with probability at least $\delta$, for some $\delta>0$. In the following, we assume the extreme case $|S\cap\hat{S}|=\delta k$; the case of strict inequality follows similarly. Consider a procedure that uses this $\hat{S}$ to construct an estimator that takes the test vector $X^{\prime}$ as input and returns an estimate $\hat{Y}^{\prime}$ of $Y^{\prime}$ as follows: Set $\hat{Y}^{\prime}=1$ if the test includes any item from $\hat{S}$, and $\hat{Y}^{\prime}=0$ otherwise. There are two scenarios in which the estimate is incorrect:

  • The test may include no items from $S$ (and hence $Y^{\prime}=0$), but at least one item from $\hat{S}\setminus S$ (and hence $\hat{Y}^{\prime}=1$). By the choice of $\nu$ in (1), the probability (with respect to $P$) of this occurring is $\frac{1}{2}\big(1-\big(1-\frac{\nu}{k}\big)^{(1-\delta)k}\big)=\frac{1}{2}\big(1-\big(\frac{1}{2}\big)^{1-\delta}\big)$.

  • The test may include no items from $\hat{S}$ (and hence $\hat{Y}^{\prime}=0$), but at least one item from $S\setminus\hat{S}$ (and hence $Y^{\prime}=1$). By the same argument as above, the probability of this occurring is $\frac{1}{2}\big(1-\big(\frac{1}{2}\big)^{1-\delta}\big)$.

Hence, when $|S\cap\hat{S}|=\delta k$, the estimator produces $\hat{Y}^{\prime}=Y^{\prime}$ with probability $\big(\frac{1}{2}\big)^{1-\delta}$. As a result, for any fixed $\delta>0$, the success probability behaves as $\frac{1}{2}+\Omega(1)$. This contradicts the conditional distribution of $Y^{\prime}$ stated following (80) (which only permits a $\frac{1}{2}+o(1)$ probability of correctness), and thus completes the proof by contradiction, establishing (16).
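The success probability $\big(\frac{1}{2}\big)^{1-\delta}$ of this single-test predictor is easy to corroborate by simulation; below is a minimal Monte Carlo sketch with arbitrary parameters, recalling that the choice (1) makes $\big(1-\frac{\nu}{k}\big)^{k}=\frac{1}{2}$.

```python
# Monte Carlo check that the predictor matches Y' w.p. (1/2)^{1-delta};
# parameters are arbitrary illustrative choices.
import random

k, delta, trials = 20, 0.25, 200_000
q = 1 - 2 ** (-1 / k)                 # per-item inclusion prob.; (1-q)^k = 1/2

S = set(range(k))                                        # defective set
m = int(delta * k)
S_hat = set(range(m)) | set(range(k, 2 * k - m))         # |S & S_hat| = delta*k

correct = 0
for _ in range(trials):
    # only items in S and S_hat affect the outcome, so sample just those indices
    test = {i for i in range(2 * k) if random.random() < q}
    y_true = int(bool(test & S))          # Y': positive iff a defective is included
    y_pred = int(bool(test & S_hat))      # predictor based on the estimate S_hat
    correct += (y_true == y_pred)

print("empirical :", correct / trials)
print("predicted :", 0.5 ** (1 - delta))  # approx 0.595 for delta = 0.25
```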

B-E Proof of Theorem 7 (Negative Result for Identifying a Definite Defective)

Consider any algorithm that, with probability one, only outputs $\mathcal{I}\neq 0$ when $\mathcal{I}$ is the index of a defective item. If $S$ is the true defective set, then it is easy to see that an error occurs (i.e., $\mathcal{I}=0$) if some $S^{\prime}$ disjoint from $S$ is still consistent with $(\mathbf{X},\mathbf{Y})$. Denoting this event by $\mathcal{E}_{S^{\prime}}$, it follows that

\mathbb{P}[\mathcal{I}=0\,|\,S]\geq\mathbb{P}\bigg[\bigcup_{S^{\prime}\,:\,S\cap S^{\prime}=\emptyset}\mathcal{E}_{S^{\prime}}\bigg]   (81)
\geq\sum_{S^{\prime}\,:\,S\cap S^{\prime}=\emptyset}\frac{\mathbb{P}[\mathcal{E}_{S^{\prime}}]^{2}}{\sum_{S^{\natural}\,:\,S\cap S^{\natural}=\emptyset}\mathbb{P}[\mathcal{E}_{S^{\prime}}\cap\mathcal{E}_{S^{\natural}}]},   (82)

where (82) follows from de Caen's lower bound on the probability of a union [37]. However, for $S^{\prime}$ and $S^{\natural}$ both disjoint from $S$ (but possibly overlapping with each other), $\mathbb{P}[\mathcal{E}_{S^{\prime}}\cap\mathcal{E}_{S^{\natural}}]$ is exactly the same quantity as $\mathbb{E}_{Q}\big[\mathcal{I}_{S^{\prime}}(\mathbf{X},\mathbf{Y})\,\mathcal{I}_{S^{\natural}}(\mathbf{X},\mathbf{Y})\big]$ appearing in (43). In particular, $\mathbb{P}[\mathcal{E}_{S^{\prime}}]$ corresponds to the case that $S^{\prime}=S^{\natural}$. Substituting the expression in (43) gives $\mathbb{P}[\mathcal{E}_{S^{\prime}}\cap\mathcal{E}_{S^{\natural}}]=2^{-n(1+\frac{\ell}{k})}$ when $|S^{\prime}\cap S^{\natural}|=k-\ell$, and substitution into (82) gives

\mathbb{P}[\mathcal{I}=0\,|\,S]\geq{p\choose k}\frac{4^{-n}}{\sum_{\ell=0}^{k}{k\choose\ell}{p-k\choose\ell}2^{-n(1+\frac{\ell}{k})}}   (83)
=\frac{1}{1+\chi^{2}(P\|Q)},   (84)

where (84) follows by equating with (44). From Theorem 4, we know that $\chi^{2}(P\|Q)\to 0$ under the conditions of Theorem 7, and we conclude that $\mathbb{P}[\mathcal{I}=0\,|\,S]\to 1$. Since this holds regardless of which $S$ is conditioned on, we obtain $\mathbb{P}[\mathcal{I}=0]\to 1$, as desired.
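Numerically, the lower bound (84) approaches one quickly in the regime of Theorem 7; a rough sketch with arbitrary parameters:

```python
# Lower bound (84): P[I = 0 | S] >= 1/(1 + chi^2(P||Q)); illustrative only.
import math

def chi_squared(p, k, n):
    total = sum(math.comb(k, l) * math.comb(p - k, l) * 2.0 ** (n * (1 - l / k))
                for l in range(k + 1))
    return total / math.comb(p, k) - 1.0

eta = 0.5
p, k = 10**8, 3                              # k well below p^{eta/(1+eta)}
n = (1 - eta) * math.log2(math.comb(p, k))
print("P[I = 0 | S] >=", 1 / (1 + chi_squared(p, k, n)))   # close to 1
```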

Appendix C Berry-Esseen Theorem

Our analysis makes use of the following Berry-Esseen theorem, a non-asymptotic form of the central limit theorem.

Theorem 8.

(Berry-Esseen Theorem [39, Theorem 2]) For $j=1,\ldots,p$, let $X_{j}$ be independent random variables with

\mu_{j}=\mathbb{E}[X_{j}],\quad\sigma_{j}^{2}=\mathsf{Var}[X_{j}],\quad\text{and}\quad\rho_{j}=\mathbb{E}[|X_{j}-\mu_{j}|^{3}].

Denote $V=\sum_{j=1}^{p}\sigma_{j}^{2}$ and $T=\sum_{j=1}^{p}\rho_{j}$. Then, for any $\lambda\in\mathbb{R}$, we have

\bigg|\mathbb{P}\bigg[\frac{\sum_{j=1}^{p}(X_{j}-\mu_{j})}{\sqrt{V}}\geq\lambda\bigg]-\mathcal{Q}(\lambda)\bigg|\leq\frac{6T}{V^{3/2}},   (85)

where $\mathcal{Q}(t)=\int_{t}^{\infty}\frac{1}{\sqrt{2\pi}}e^{-\frac{u^{2}}{2}}\,{\rm d}u$.

More precisely, we use the following simple corollary.

Corollary 2.

Let $Z\sim\mathrm{Binomial}(p,q_{0})$. Then, for any $\lambda\in\mathbb{R}$, the following holds:

\bigg|\mathbb{P}\bigg[\frac{Z-pq_{0}}{\sigma\sqrt{p}}\geq\lambda\bigg]-\mathcal{Q}(\lambda)\bigg|\leq\frac{6\rho}{\sigma^{3}\sqrt{p}},   (86)

where

\rho=(1-q_{0})^{3}q_{0}+q_{0}^{3}(1-q_{0}),   (87)
\sigma=\sqrt{(1-q_{0})q_{0}}.   (88)
Proof.

Since $Z\sim\mathrm{Binomial}(p,q_{0})$, we can write $Z=\sum_{j=1}^{p}Z_{j}$, where the $Z_{j}$ are i.i.d. with distribution $\mathrm{Bernoulli}(q_{0})$. We shift to a zero-mean summation by writing $Z-pq_{0}=\sum_{j=1}^{p}(Z_{j}-q_{0})$, and observe that

\rho_{1}:=\mathbb{E}[|Z_{1}-q_{0}|^{3}]=(1-q_{0})^{3}q_{0}+|0-q_{0}|^{3}(1-q_{0})=\rho,   (89)

and

\sigma_{1}^{2}=\mathbb{E}[(Z_{1}-q_{0})^{2}]=(1-q_{0})^{2}q_{0}+(0-q_{0})^{2}(1-q_{0})=\sigma^{2}   (90)

for $\rho$ and $\sigma$ defined in (87)–(88). Hence, (86) follows directly from Theorem 8 with $T=p\rho_{1}=p\rho$ and $V=p\sigma_{1}^{2}=p\sigma^{2}$. ∎
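The corollary is straightforward to check numerically; the following sketch (assuming scipy is available, with arbitrary parameters) compares the standardized binomial tail against $\mathcal{Q}(\lambda)$ and the bound in (86).

```python
# Numerical check of (86); scipy is an assumed dependency, parameters arbitrary.
import math
from scipy.stats import binom, norm

p, q0, lam = 10_000, 0.3, 1.5
sigma = math.sqrt(q0 * (1 - q0))                       # as in (88)
rho = (1 - q0) ** 3 * q0 + q0 ** 3 * (1 - q0)          # as in (87)

# P[(Z - p*q0)/(sigma*sqrt(p)) >= lam] for Z ~ Binomial(p, q0)
threshold = p * q0 + lam * sigma * math.sqrt(p)
tail = binom.sf(math.ceil(threshold) - 1, p, q0)       # P[Z >= threshold]
gap = abs(tail - norm.sf(lam))                         # norm.sf(lam) = Q(lam)
print(f"|deviation| = {gap:.2e}")
print(f"bound (86)  = {6 * rho / (sigma ** 3 * math.sqrt(p)):.2e}")
```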

References

  • [1] M. Aldridge, O. Johnson, and J. Scarlett, “Group testing: An information theory perspective,” Found. Trend. Comms. Inf. Theory, vol. 15, no. 3–4, pp. 196–392, 2019.
  • [2] W. Kautz and R. Singleton, “Nonrandom binary superimposed codes,” IEEE Trans. Inf. Theory, vol. 10, no. 4, pp. 363–377, 1964.
  • [3] D. Du and F. K. Hwang, Combinatorial group testing and its applications.   World Scientific, 2000, vol. 12.
  • [4] M. Malyutov, “The separating property of random matrices,” Math. Notes Acad. Sci. USSR, vol. 23, no. 1, pp. 84–91, 1978.
  • [5] A. de Bonis, L. Gasieniec, and U. Vaccaro, “Optimal two-stage algorithms for group testing problems,” SIAM Journal on Computing, vol. 34, no. 5, pp. 1253–1270, 2005.
  • [6] A. G. D’yachkov and V. V. Rykov, “A survey of superimposed code theory,” Prob. Contr. Inf., vol. 12, no. 4, pp. 1–13, 1983.
  • [7] A. Rashad, “Random coding bounds on the rate for list-decoding superimposed codes,” Prob. Ctrl. Inf. Theory, vol. 19, no. 2, pp. 141–149, 1990.
  • [8] M. Cheraghchi, “Noise-resilient group testing: Limitations and constructions,” in Int. Symp. Found. Comp. Theory, 2009, pp. 62–73.
  • [9] P. Indyk, H. Q. Ngo, and A. Rudra, “Efficiently decodable non-adaptive group testing,” in ACM-SIAM Symp. Disc. Alg. (SODA), 2010.
  • [10] H. Q. Ngo, E. Porat, and A. Rudra, “Efficiently decodable error-correcting list disjunct matrices and applications,” in Int. Colloq. Automata, Lang., and Prog., 2011.
  • [11] J. Scarlett and V. Cevher, “How little does non-exact recovery help in group testing?” in IEEE Int. Conf. Acoust. Sp. Sig. Proc. (ICASSP), 2017.
  • [12] K. Lee, K. Chandrasekher, R. Pedarsani, and K. Ramchandran, “SAFFRON: A fast, efficient, and robust framework for group testing based on sparse-graph codes,” IEEE Trans. Sig. Proc., vol. 67, no. 17, pp. 4649–4664, Sept. 2019.
  • [13] J. Scarlett and V. Cevher, “Phase transitions in group testing,” in Proc. ACM-SIAM Symp. Disc. Alg. (SODA), 2016.
  • [14] C. L. Chan, P. H. Che, S. Jaggi, and V. Saligrama, “Non-adaptive probabilistic group testing with noisy measurements: Near-optimal bounds with efficient algorithms,” in Allerton Conf. Comm., Ctrl., Comp., Sep. 2011, pp. 1832–1839.
  • [15] M. Cheraghchi, A. Hormati, A. Karbasi, and M. Vetterli, “Group testing with probabilistic tests: Theory, design and application,” IEEE Trans. Inf. Theory, vol. 57, no. 10, pp. 7057–7067, Oct 2011.
  • [16] M. B. Malyutov, “Search for sparse active inputs: A review,” in Inf. Theory, Comb. and Search Theory, 2013, pp. 609–647.
  • [17] M. Aldridge, L. Baldassini, and O. Johnson, “Group testing algorithms: Bounds and simulations,” IEEE Trans. Inf. Theory, vol. 60, no. 6, pp. 3671–3687, June 2014.
  • [18] J. Scarlett and V. Cevher, “Near-optimal noisy group testing via separate decoding of items,” IEEE Trans. Sel. Topics Sig. Proc., vol. 2, no. 4, pp. 625–638, 2018.
  • [19] A. Coja-Oghlan, O. Gebhard, M. Hahn-Klimroth, and P. Loick, “Information-theoretic and algorithmic thresholds for group testing,” in Int. Colloq. Aut., Lang. and Prog. (ICALP), 2019.
  • [20] ——, “Optimal group testing,” in Conf. Learn. Theory (COLT), 2020.
  • [21] C. L. Chan, S. Jaggi, V. Saligrama, and S. Agnihotri, “Non-adaptive group testing: Explicit bounds and novel algorithms,” IEEE Trans. Inf. Theory, vol. 60, no. 5, pp. 3019–3035, May 2014.
  • [22] L. Baldassini, O. Johnson, and M. Aldridge, “The capacity of adaptive group testing,” in IEEE Int. Symp. Inf. Theory, July 2013, pp. 2676–2680.
  • [23] O. Johnson, M. Aldridge, and J. Scarlett, “Performance of group testing algorithms with near-constant tests-per-item,” IEEE Trans. Inf. Theory, vol. 65, no. 2, pp. 707–723, 2019.
  • [24] M. Hahn-Klimroth and P. Loick, “Optimal adaptive group testing,” 2019, https://arxiv.org/abs/1911.06647.
  • [25] D. Gamarnik and I. Zadik, “High dimensional regression with binary coefficients. Estimating squared error and a phase transition,” in Conf. Learn. Theory (COLT), 2017, pp. 948–953.
  • [26] G. Reeves, J. Xu, and I. Zadik, “The all-or-nothing phenomenon in sparse linear regression,” in Conf. Learn. Theory (COLT), 2019.
  • [27] ——, “All-or-nothing phenomena: From single-letter to high dimensions,” 2019, https://arxiv.org/abs/1912.13027.
  • [28] A. Gilbert, M. Iwen, and M. Strauss, “Group testing and sparse signal recovery,” in Asilomar Conf. Sig., Sys. and Comp., Oct. 2008, pp. 1059–1063.
  • [29] G. Atia and V. Saligrama, “Boolean compressed sensing and noisy group testing,” IEEE Trans. Inf. Theory, vol. 58, no. 3, pp. 1880–1901, March 2012.
  • [30] J. Scarlett, “Noisy adaptive group testing: Bounds and algorithms,” IEEE Trans. Inf. Theory, vol. 65, no. 6, pp. 3646–3661, June 2019.
  • [31] W. Hoeffding, “Probability inequalities for sums of bounded random variables,” J. Amer. Stat. Assoc., vol. 58, no. 301, pp. 13–30, 1963.
  • [32] F. Hwang, “A method for detecting all defective members in a population by group testing,” J. Amer. Stat. Assoc., vol. 67, no. 339, pp. 605–608, 1972.
  • [33] G. Forney, “Exponential error bounds for erasure, list, and decision feedback schemes,” IEEE Trans. Inf. Theory, vol. 14, no. 2, pp. 206–220, 1968.
  • [34] S. Y. T. Soon, “Binomial approximation for dependent indicators,” Statistica Sinica, vol. 6, no. 3, pp. 703–714, 1996.
  • [35] J. Duchi, “Lecture notes for statistics 311/electrical engineering 377,” http://stanford.edu/class/stats311/.
  • [36] I. Sason and S. Verdú, “$f$-divergence inequalities,” IEEE Trans. Inf. Theory, vol. 62, no. 11, pp. 5973–6006, Nov. 2016.
  • [37] D. de Caen, “A lower bound on the probability of a union,” Discrete Mathematics, vol. 169, no. 1, pp. 217–220, 1997.
  • [38] M. B. Malyutov and P. S. Mateev, “Screening designs for non-symmetric response function,” Mat. Zametki, vol. 29, pp. 109–127, 1980.
  • [39] W. Feller, An Introduction to Probability Theory and Its Applications, 2nd ed.   John Wiley and Sons, 1971.