
Noisy Computing of the Threshold Function

Ziao Wang, Nadim Ghaddar, Banghua Zhu, and Lele Wang. Ziao Wang is with the Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, BC V6T 1Z4, Canada (email: [email protected]). Nadim Ghaddar is with the Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON M5S 3G8, Canada (email: [email protected]). Banghua Zhu is with the Department of Electrical Engineering and Computer Sciences, University of California Berkeley, Berkeley, CA 94720, USA (email: [email protected]). Lele Wang is with the Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, BC V6T 1Z4, Canada (email: [email protected]).
Abstract

Let $\mathsf{TH}_k$ denote the $k$-out-of-$n$ threshold function: given $n$ input Boolean variables, the output is $1$ if and only if at least $k$ of the inputs are $1$. We consider the problem of computing the $\mathsf{TH}_k$ function using noisy readings of the Boolean variables, where each reading is incorrect with some fixed and known probability $p\in(0,1/2)$. As our main result, we show that it is sufficient to use $(1+o(1))\frac{n\log\frac{m}{\delta}}{D_{\mathsf{KL}}(p\|1-p)}$ queries in expectation to compute the $\mathsf{TH}_k$ function with a vanishing error probability $\delta=o(1)$, where $m\triangleq\min\{k,n-k+1\}$ and $D_{\mathsf{KL}}(p\|1-p)$ denotes the Kullback-Leibler divergence between the $\mathsf{Bern}(p)$ and $\mathsf{Bern}(1-p)$ distributions. Conversely, we show that any algorithm achieving an error probability of $\delta=o(1)$ necessitates at least $(1-o(1))\frac{(n-m)\log\frac{m}{\delta}}{D_{\mathsf{KL}}(p\|1-p)}$ queries in expectation. The upper and lower bounds are tight when $m=o(n)$, and are within a multiplicative factor of $\frac{n}{n-m}$ when $m=\Theta(n)$. In particular, when $k=n/2$, the $\mathsf{TH}_k$ function corresponds to the $\mathsf{MAJORITY}$ function, in which case the upper and lower bounds are tight up to a multiplicative factor of two. Compared to previous work, our result tightens the dependence on $p$ in both the upper and lower bounds.

I Introduction

Coping with noise in computing is an important problem to consider in large systems. With applications in fault tolerance (Hastad et al., 1987; Pease et al., 1980; Pippenger et al., 1991), active ranking (Shah and Wainwright, 2018; Agarwal et al., 2017; Falahatgar et al., 2017; Heckel et al., 2019; Wang et al., 2024; Gu and Xu, 2023), and noisy searching (Berlekamp, 1964; Horstein, 1963; Burnashev and Zigangirov, 1974; Pelc, 1989; Karp and Kleinberg, 2007), among many others, the goal is to devise algorithms that are robust enough to detect and correct the errors that can happen during the computation. More concretely, the problem can be defined as follows: suppose an agent is interested in computing a function $f$ of $n$ variables with an error probability at most $\delta$, as quickly as possible. To this end, the agent can ask binary questions (referred to hereafter as queries) about the variables at hand. The binary responses to the queries are observed by the agent through a binary symmetric channel (BSC) with crossover probability $p$. The agent can adaptively design subsequent queries based on the responses to previous queries. The goal is to characterize the relation between $n$, $\delta$, $p$, and the query complexity, which is defined as the minimal number of queries needed by the agent to meet the intended goal of computing the function $f$.

This paper considers the computation of the threshold-$k$ function. For $n$ Boolean variables $\mathbf{x}=(x_1,\ldots,x_n)\in\{0,1\}^n$, the threshold-$k$ function $\mathsf{TH}_k(\cdot)$ computes whether the number of 1's in $\mathbf{x}$ is at least $k$ or not, i.e.,

$$\mathsf{TH}_k(\mathbf{x})\triangleq\begin{cases}1&\text{ if }\sum_{i=1}^{n}x_i\geq k;\\ 0&\text{ if }\sum_{i=1}^{n}x_i<k.\end{cases}$$

The noisy queries correspond to noisy readings of the bits: at each time step, the agent queries one of the bits, and with probability $p$ the wrong value of the bit is returned. It is assumed that the constant $p\in(0,1/2)$ is known to the agent. Our goal is to characterize the optimal query complexity for computing the $\mathsf{TH}_k$ function with error probability at most $\delta$. Throughout the paper, we assume the parameters $k$ and $\delta$ are functions that scale with $n$, while $p$ is an absolute constant independent of all other parameters. We also assume that $\delta$ is always strictly positive.

This model for noisy computation of the $\mathsf{TH}_k$ function has been considered in Feige et al. (1994), where upper and lower bounds on the number of queries needed to compute the function are derived in terms of the number of variables $n$, the threshold $k$, the noise probability $p$, and the desired error probability $\delta$. The upper and lower bounds in Feige et al. (1994) are within a constant factor of each other, hence providing the optimal order for the minimum number of queries; however, an exact tight characterization of the optimal number of queries remains open.

In this paper, we tighten this gap and provide new upper and lower bounds for the computation of the $\mathsf{TH}_k$ function, which simultaneously improve the existing upper and lower bounds.

The main result of this paper can be stated as follows: for any $1\leq k\leq n$, there exists an algorithm that computes the $\mathsf{TH}_k$ function with an error probability at most $\delta=o(1)$, and the algorithm uses at most

$$(1+o(1))\frac{n\log\frac{m}{\delta}}{D_{\mathsf{KL}}(p\|1-p)}$$

queries in expectation. Here we define $m\triangleq\min\{k,n-k+1\}$ and denote the Kullback-Leibler divergence between the $\mathsf{Bern}(p)$ and $\mathsf{Bern}(1-p)$ distributions by $D_{\mathsf{KL}}(p\|1-p)$. Conversely, we prove that to achieve an error probability of $\delta=o(1)$, any algorithm must make at least

$$(1-o(1))\frac{(n-m)\log\frac{m}{\delta}}{D_{\mathsf{KL}}(p\|1-p)}$$

queries in expectation. When $m=o(n)$, the ratio between these upper and lower bounds is $1+o(1)$, and hence we provide an asymptotically tight characterization of the optimal number of queries. For general $m$, these bounds are tight within a multiplicative factor of $2$.
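As a quick numerical illustration of these bounds, the following Python snippet (a sketch of ours, not part of the paper; the function names are ours) evaluates the leading-order upper and lower bounds for given $n$, $k$, $\delta$, and $p$.

```python
import math

def kl_bern(p, q):
    """KL divergence D_KL(Bern(p) || Bern(q)) in nats."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def threshold_query_bounds(n, k, delta, p):
    """Leading-order upper/lower bounds on the expected number of noisy
    queries for computing TH_k with error probability delta
    (the 1 +/- o(1) factors are omitted)."""
    m = min(k, n - k + 1)
    d = kl_bern(p, 1 - p)
    upper = n * math.log(m / delta) / d
    lower = (n - m) * math.log(m / delta) / d
    return lower, upper

# Example: MAJORITY on n = 100 bits with p = 0.1 and delta = 0.01.
print(threshold_query_bounds(100, 50, 1e-2, 0.1))
```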

Figure 1: Average number of queries used by the proposed noisy threshold algorithm for computing the $\mathsf{MAJORITY}$ function (i.e., $k=n/2$), in comparison to the algorithm proposed in Feige et al. (1994), for $n=100$ and $\delta=10^{-2}$. The theoretical lower bound corresponds to the plot of $\frac{(n-m)\log(m/\delta)}{D_{\mathrm{KL}}(p\|1-p)}$, where $m=\min\{k,n-k+1\}$.

To provide a more quantitative comparison with the bounds presented in Feige et al. (1994), consider the example where $k=n^{1/3}$, $\delta=n^{-1/4}$, and $p=\frac{1}{3}$. According to our results, the optimal number of queries to achieve error probability $\delta$ is approximately $2.5247\cdot n\log n$. In contrast, the constants in front of $n\log n$ in the lower and upper bounds given in Feige et al. (1994) are roughly $0.0506$ and $433.7518$, respectively. (The constant in the upper bound of Feige et al. (1994) is not explicitly given; we refined the analysis and computed the constant in this example using the algorithm proposed there.) As an additional example, Figure 1 shows a simulation of our proposed algorithm computing the $\mathsf{MAJORITY}$ function (that is, the threshold function with $k=n/2$) compared to the algorithm proposed in Feige et al. (1994), when $n=100$ and $\delta=10^{-2}$. Our algorithm clearly uses fewer queries on average to achieve the desired error probability. Moreover, our algorithm shows a scaling similar to the theoretical lower bound derived in this paper. The detailed setup of the simulation is given in Appendix C.

I-A Related Work

The noisy computation of the $\mathsf{TH}_k$ function has primarily been investigated with a focus on the special case of $k=1$, i.e., the $\mathsf{OR}$ function. Early investigations into the noisy computation of the $\mathsf{OR}$ function can be traced back to studies in circuit theory with noisy gates (Dobrushin and Ortyukov, 1977a, 1977b; Von Neumann, 1956; Pippenger et al., 1991; Gács and Gál, 1994) and noisy decision trees (Evans and Pippenger, 1998; Reischuk and Schmeltz, 1991). For the more general case of arbitrary $k$, the noisy computation of the $\mathsf{TH}_k$ function was initially explored in the seminal work of Feige et al. (1994) on noisy computing. Subsequent investigations extended this study to the context of noisy broadcast networks (Kushilevitz and Mansour, 2005). In the noise model we consider, the best-known upper and lower bounds on the number of queries required to achieve a desired error probability $\delta$ are due to Feige et al. (1994). Their work characterizes the optimal order of the number of queries as $\Theta(n\log(k/\delta))$. In this work, we improve the known bounds and provide an asymptotically tight characterization of the required number of queries.

Recently, there have been several notable efforts to precisely characterize the optimal number of queries, with tight constant dependencies, for various noisy computing tasks. Wang et al. (2024) consider the problem of noisy sorting with pairwise comparisons, and provide upper and lower bounds on the required number of queries that differ by a multiplicative factor of $2$. This gap was later closed by Gu and Xu (2023). In a recent paper, Zhu et al. (2024) establish tight bounds for two tasks: computing the $\mathsf{OR}$ function of a binary sequence, and computing the $\mathsf{MAX}$ function of a real sequence using pairwise comparisons.

I-B Notation

In this paper, we adhere to the following notation conventions. Sets and events are represented using calligraphic letters, random variables using uppercase letters, and specific realizations of random variables using lowercase letters. For example, $\mathcal{X}$ and $\mathcal{Y}$ are two sets, $X$ and $Y$ are random variables, and $x$ and $y$ are their respective realizations. Vectors are denoted using boldface letters, while scalars are denoted using regular (non-bold) letters. The probability of an event $\mathcal{A}$ is expressed as $\mathsf{P}(\mathcal{A})$, and the expectation of a real-valued random variable $X$ is denoted as $\mathsf{E}[X]$. For a finite set $\mathcal{X}$, its cardinality is denoted by $|\mathcal{X}|$. Unless otherwise specified, we use $\log$ to indicate the natural logarithm. We use $[n]$ to denote the set of integers $\{1,\ldots,n\}$. We follow the standard Landau notation: $f(n)=O(g(n))$ if $\lim_{n\to\infty}\frac{|f(n)|}{g(n)}<\infty$; $f(n)=\Omega(g(n))$ if $\lim_{n\to\infty}\frac{f(n)}{g(n)}>0$; $f(n)=\Theta(g(n))$ if $f(n)=O(g(n))$ and $f(n)=\Omega(g(n))$; $f(n)=o(g(n))$ if $\lim_{n\to\infty}\frac{f(n)}{g(n)}=0$; $f(n)=\omega(g(n))$ if $\lim_{n\to\infty}\frac{|f(n)|}{|g(n)|}=\infty$; and $f(n)\sim g(n)$ if $\lim_{n\to\infty}\frac{f(n)}{g(n)}=1$. Unless otherwise specified, limits in all instances of Landau notation are taken with respect to $n$.

II Main Results

The main result of this paper comprises two theorems, through which we provide an asymptotically tight characterization of the minimum expected number of queries required to compute the $\mathsf{TH}_k$ function with worst-case error probability $\delta$.

Theorem 1 (Converse).

Suppose $k\leq n/2$ and $\delta=o(1)$. Consider any variable-length algorithm for computing $\mathsf{TH}_k(\mathbf{x})$ that makes $M$ noisy queries. If $M$ satisfies $\mathsf{E}[M\,|\,\mathbf{x}]\leq(1-o(1))\frac{(n-k)\log\frac{k}{\delta}}{D_{\mathrm{KL}}(p\|1-p)}$ for every input instance $\mathbf{x}$, then the worst-case error probability of the algorithm is at least $\delta$.

The main technical contribution of this paper lies in the proof of Theorem 1. In this proof, we consider an enhanced yet more statistically tractable version of any adaptive algorithm for computing the $\mathsf{TH}_k$ function, and leverage probabilistic methods to lower bound its error probability. We illustrate the main idea of the proof in Section III, while deferring the detailed proof to Appendix A.

Theorem 2 (Achievability).

Suppose $k\leq n/2$ and $\delta=o(1)$. There exists a variable-length algorithm that computes the $\mathsf{TH}_k$ function with worst-case error probability at most $\delta$. Moreover, the number of queries $M$ made by the algorithm satisfies $\mathsf{E}[M\,|\,\mathbf{x}]\leq(1+o(1))\frac{n\log\frac{k}{\delta}}{D_{\mathrm{KL}}(p\|1-p)}$ for every input instance $\mathbf{x}$.

We describe the proposed algorithm, and sketch the proof of Theorem 2 in Section IV. The detailed proof is presented in Appendix B.

The upper and lower bounds established in Theorems 1 and 2 are restricted to the case $k\leq n/2$. The following corollary extends the results to any $k\in[n]$.

Corollary 1.

Let $m=\min\{k,n-k+1\}$. For any $k\in[n]$ and $\delta=o(1)$, it is sufficient to use $(1+o(1))\frac{n\log\frac{m}{\delta}}{D_{\mathsf{KL}}(p\|1-p)}$ queries in expectation to compute the $\mathsf{TH}_k$ function with worst-case error probability $\delta$, while at least $(1-o(1))\frac{(n-m)\log\frac{m}{\delta}}{D_{\mathsf{KL}}(p\|1-p)}$ queries in expectation are necessary.

Corollary 1 follows readily from the fact that $\mathsf{TH}_k(\mathbf{x})=1-\mathsf{TH}_{n-k+1}(\mathbf{1}-\mathbf{x})$ for any binary sequence $\mathbf{x}$, which implies that computing the $\mathsf{TH}_k$ function is equivalent to computing the $\mathsf{TH}_{n-k+1}$ function.

Remark 1 (Gap between the upper and lower bounds).

The upper and lower bounds presented in Corollary 1 are within a multiplicative factor of $2+o(1)$. To see this, notice that the ratio between the upper and lower bounds in Corollary 1 is $(1+o(1))\frac{n}{n-m}$. Because $m=\min\{k,n-k+1\}\leq(n+1)/2$, this ratio attains its maximum value of $2+o(1)$ when $k=(n+1)/2$. The problem of closing this gap remains open.

Corollary 2 (Tight bounds for $\mathsf{OR}$ and $\mathsf{AND}$ functions).

It is both sufficient and necessary to use $(1\pm o(1))\frac{n\log\frac{1}{\delta}}{D_{\mathsf{KL}}(p\|1-p)}$ noisy queries in expectation to compute the $\mathsf{OR}$ and $\mathsf{AND}$ functions of $n$ Boolean variables with worst-case error probability $\delta$.

Corollary 2 follows from the fact that the $\mathsf{OR}$ and $\mathsf{AND}$ functions can be viewed as special cases of the $\mathsf{TH}_k$ function with $k=1$ and $k=n$, respectively. It recovers the tight bounds for the $\mathsf{OR}$ and $\mathsf{AND}$ functions found by Zhu et al. (2024).

Corollary 3 (Fixed-length algorithms).

Suppose $\delta=o(1)$ and $\delta\geq n^{-1+\epsilon}$ for some arbitrarily small positive constant $\epsilon$. Then for any $k\leq n/2$, there exists a fixed-length algorithm that computes the $\mathsf{TH}_k$ function with error probability at most $\delta$. Moreover, the algorithm uses at most $(1+o(1))\frac{n\log\frac{k}{\delta}}{D_{\mathsf{KL}}(p\|1-p)}$ queries.

We prove Corollary 3 in Appendix B by constructing a fixed-length version of the proposed algorithm.

III Main Idea in the Proof of the Lower Bound

In this section, we present the main idea in the proof of Theorem 1. We illustrate the proof idea in two distinct regimes, $\log k\leq\frac{\log(1/\delta)}{\log\log(1/\delta)}$ and $\log k>\frac{\log(1/\delta)}{\log\log(1/\delta)}$, as the techniques applied in these regimes differ significantly.

III-A Regime $\log k\leq\frac{\log(1/\delta)}{\log\log(1/\delta)}$

In this regime, our lower bound relies on Le Cam's two-point method. In this method, we view the noisy computing problem as a binary hypothesis testing problem. We design an input instance $\mathbf{x}^{(0)}$ with $\mathsf{TH}_k(\mathbf{x}^{(0)})=0$, and $n-k$ input instances $\mathbf{x}^{(1)},\ldots,\mathbf{x}^{(n-k)}$ with $\mathsf{TH}_k(\mathbf{x}^{(i)})=1$ for each $i\in[n-k]$. Using Le Cam's two-point lemma (Yu, 1997), we show that when the query complexity of the algorithm is below a certain threshold, the algorithm cannot distinguish $\mathbf{x}^{(0)}$ from $\mathbf{x}^{(i)}$ for some $i\in[n-k]$, leading to an estimation error for the $\mathsf{TH}_k$ function. Specifically, this method yields a lower bound of $(1-o(1))\frac{(n-k)\log\frac{1}{\delta}}{D_{\mathrm{KL}}(p\|1-p)}$ on the required number of queries, which further implies the desired bound $(1-o(1))\frac{(n-k)\log\frac{k}{\delta}}{D_{\mathrm{KL}}(p\|1-p)}$ by the fact that $\log(k/\delta)=(1+o(1))\log\frac{1}{\delta}$ in this regime. We defer the detailed proof in this regime to Appendix A.
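For the reader's convenience, the two-point bound can be stated in the following standard form (our phrasing, not a verbatim statement from the paper): for any estimator $\widehat{\mathsf{TH}}_k$ and any $i\in[n-k]$,

$$\max\Big\{\mathsf{P}_{\mathbf{x}^{(0)}}\big(\widehat{\mathsf{TH}}_k=1\big),\ \mathsf{P}_{\mathbf{x}^{(i)}}\big(\widehat{\mathsf{TH}}_k=0\big)\Big\}\;\geq\;\frac{1}{2}\Big(1-\mathsf{TV}\big(P_{\mathbf{x}^{(0)}},P_{\mathbf{x}^{(i)}}\big)\Big),$$

where $P_{\mathbf{x}}$ denotes the joint distribution of the query responses under input $\mathbf{x}$ and $\mathsf{TV}$ is the total variation distance. The total variation term is then typically controlled through the per-query KL divergence (e.g., via the Bretagnolle-Huber inequality), which is where the $D_{\mathrm{KL}}(p\|1-p)$ factor arises.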

Notably, in the opposite regime $\log k>\frac{\log(1/\delta)}{\log\log(1/\delta)}$, the equality $\log(k/\delta)=(1+o(1))\log\frac{1}{\delta}$ does not necessarily hold, and hence the bound provided by Le Cam's method does not yield the desired lower bound. For example, when $\delta=\frac{1}{\log n}$ and $k=n/2$, the lower bound provided by Le Cam's method is $(1-o(1))\frac{\frac{n}{2}\log\log n}{D_{\mathrm{KL}}(p\|1-p)}$, which does not even match the asymptotic order of the desired lower bound $(1-o(1))\frac{\frac{n}{2}\log\frac{n\log n}{2}}{D_{\mathrm{KL}}(p\|1-p)}$.

III-B Regime $\log k>\frac{\log(1/\delta)}{\log\log(1/\delta)}$

In this regime with a larger value of $k$, the bound derived from Le Cam's method falls short of capturing the dependence of the optimal number of queries on the parameter $k$. Therefore, we adopt an alternative approach, where we directly analyze the query-response statistics of an arbitrary algorithm and bound its error probability. The major difficulty in establishing a converse result for an arbitrary algorithm is that the algorithm may have an arbitrarily correlated query strategy. To tackle this challenge, we introduce a two-phase enhanced version of any given variable-length algorithm. The first phase of the enhanced algorithm is called the non-adaptive phase: it queries each bit a fixed number of times. The second phase is called the adaptive phase: it adaptively chooses certain bits and noiselessly reveals their values. This two-phase design makes the response statistics more tractable and analyzable. We show that the worst-case error probability of the enhanced algorithm is less than or equal to that of the original algorithm. Therefore, it suffices to show that the enhanced algorithm has error probability at least $\delta$. We sketch the proof in the rest of this section.

To prove the desired bound $(1-o(1))\frac{(n-k)\log\frac{k}{\delta}}{D_{\mathrm{KL}}(p\|1-p)}$, we define a vanishing quantity $\epsilon=o(1)$ and prove that any algorithm making at most $\frac{(1-\epsilon)(n-k-\epsilon n)\log\frac{k}{\delta}}{D_{\mathrm{KL}}(p-\epsilon\|1-p)}$ queries in expectation has error probability at least $\delta$.

Enhanced algorithm. Consider an arbitrary algorithm $\mathcal{A}$ that makes at most $\frac{(1-\epsilon)(n-k-\epsilon n)\log\frac{k}{\delta}}{D_{\mathrm{KL}}(p-\epsilon\|1-p)}$ queries in expectation. We introduce an enhanced version of $\mathcal{A}$, denoted $\mathcal{A}'$, and show that the error probability of $\mathcal{A}'$ is always less than or equal to the error probability of $\mathcal{A}$. We then lower bound the error probability of $\mathcal{A}'$ to complete the proof.

The enhanced algorithm $\mathcal{A}'$ consists of a two-phase procedure:

  • Non-adaptive phase: query each of the $n$ bits $\alpha\triangleq\frac{(1-\epsilon)\log\frac{k}{\delta}}{D_{\mathrm{KL}}(p-\epsilon\|1-p)}$ times.

  • Adaptive phase: adaptively choose $N'$ bits and noiselessly reveal their values, where the number of revealed bits $N'$ satisfies $\mathsf{E}[N'\,|\,\mathbf{x}]=n-k-\epsilon n$ for every $\mathbf{x}$.

The purpose of the enhanced algorithm is to transform an intractable adaptive algorithm into a tractable one with analyzable statistics. The idea of algorithm enhancement was first introduced by Feige et al. (1994), where the authors define an enhancement for any fixed-length algorithm. In the above definition of $\mathcal{A}'$, we extend this notion of algorithm enhancement to accommodate any variable-length algorithm, and analyze the error probability of this enhanced variable-length algorithm. Consequently, the lower bound established in our work applies to the expected query complexity of any variable-length algorithm, while the lower bound of Feige et al. (1994) applies only to the deterministic query complexity of any fixed-length algorithm.

It is intuitive that Algorithm $\mathcal{A}'$ performs no worse than $\mathcal{A}$. To see this, note that the expected number of bits queried more than $\alpha=\frac{(1-\epsilon)\log\frac{k}{\delta}}{D_{\mathrm{KL}}(p-\epsilon\|1-p)}$ times by Algorithm $\mathcal{A}$ is at most $n-k-\epsilon n$, because $\mathcal{A}$ makes at most $\frac{(1-\epsilon)(n-k-\epsilon n)\log\frac{k}{\delta}}{D_{\mathrm{KL}}(p-\epsilon\|1-p)}$ queries in expectation. In contrast, Algorithm $\mathcal{A}'$ makes $\alpha$ queries to each bit, and adaptively chooses $n-k-\epsilon n$ bits in expectation whose values it acquires noiselessly. As a result, Algorithm $\mathcal{A}'$ obtains at least as much information on every bit as Algorithm $\mathcal{A}$, and hence has a lower error probability. We formally state this result in the following lemma, and defer its proof to Appendix A.

Lemma 1.

The worst-case error probability of $\mathcal{A}'$ is at most the worst-case error probability of $\mathcal{A}$.

Now, it suffices to show that the worst-case error probability of $\mathcal{A}'$ is at least $\delta$. To bound the error probability of Algorithm $\mathcal{A}'$, we analyze the query responses from the non-adaptive phase using a balls-and-bins model. We view each of the $n$ bits as a ball with an underlying weight. A bit of value one corresponds to a heavy ball, and a bit of value zero corresponds to a light ball. In the non-adaptive phase, each bit $x_i$ is queried $\alpha$ times. Let $I_i$ denote the number of ones among these $\alpha$ responses. If $I_i=j$, we put ball $i$ into bin $j\in\{0,1,\ldots,\alpha\}$. This creates $\alpha+1$ bins $\mathcal{S}_j\triangleq\{i\in[n]\colon I_i=j\}$ for $j\in\{0,1,\ldots,\alpha\}$.

The statistics of the responses obtained in the non-adaptive phase can be characterized by the sequence of $\alpha+1$ sets $\mathcal{S}\triangleq(\mathcal{S}_0,\ldots,\mathcal{S}_\alpha)$. For $j\in\{0,\ldots,\alpha\}$, we define $\mathcal{H}_j\triangleq\{i\in[n]\colon x_i=1,I_i=j\}$ and $\mathcal{L}_j\triangleq\{i\in[n]\colon x_i=0,I_i=j\}$, i.e., $\mathcal{H}_j$ (resp. $\mathcal{L}_j$) represents the set of heavy (resp. light) balls in the $j$th bin. Let $H_j\triangleq|\mathcal{H}_j|$ and $L_j\triangleq|\mathcal{L}_j|$. Let $\mathbf{H}_0^\alpha\triangleq(H_0,\ldots,H_\alpha)$ and $\mathbf{L}_0^\alpha\triangleq(L_0,\ldots,L_\alpha)$.
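For intuition, the following short Python sketch (ours, not part of the proof) simulates the non-adaptive phase and the resulting bin occupancies; each $I_i$ is $\mathrm{Binomial}(\alpha,1-p)$ for a heavy ball and $\mathrm{Binomial}(\alpha,p)$ for a light ball.

```python
import random

def simulate_bins(x, alpha, p, rng=random.Random(0)):
    """Simulate the non-adaptive phase: query each bit alpha times through a
    BSC(p) and bin ball i according to the number of 1-responses I_i."""
    H = [0] * (alpha + 1)  # heavy-ball counts per bin (H_0, ..., H_alpha)
    L = [0] * (alpha + 1)  # light-ball counts per bin (L_0, ..., L_alpha)
    for xi in x:
        # Each response equals xi with probability 1-p and 1-xi with probability p.
        I_i = sum((rng.random() < p) ^ xi for _ in range(alpha))
        (H if xi == 1 else L)[I_i] += 1
    return H, L

# Example: 99 heavy balls among n = 1000, alpha = 10 queries per bit, p = 0.2.
x = [1] * 99 + [0] * 901
print(simulate_bins(x, alpha=10, p=0.2))
```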

In the adaptive phase of Algorithm $\mathcal{A}'$, the weights of $N'$ balls are noiselessly revealed. Using $\mathcal{S}$ and the weights of the balls revealed in the adaptive phase, Algorithm $\mathcal{A}'$ needs to estimate whether at least $k$ of the $n$ balls are heavy.

Restricting input sequence weights. To facilitate the analysis of Algorithm $\mathcal{A}'$, we restrict the input instance space. We assume that the input sequence $\mathbf{x}$ satisfies either $w(\mathbf{x})=k-1$ or $w(\mathbf{x})=k$, i.e., there are either $k-1$ or $k$ heavy balls. The worst-case error over the restricted inputs is a lower bound on the original worst-case error:

$$\max_{\mathbf{x}\in\{0,1\}^{n}}\mathsf{P}\big(\widehat{\mathsf{TH}}_{k}\neq\mathsf{TH}_{k}(\mathbf{x})\,\big|\,\mathbf{x}\big)\geq\max_{\substack{\mathbf{x}\in\{0,1\}^{n}\colon\\ w(\mathbf{x})=k\text{ or }w(\mathbf{x})=k-1}}\mathsf{P}\big(\widehat{\mathsf{TH}}_{k}\neq\mathsf{TH}_{k}(\mathbf{x})\,\big|\,\mathbf{x}\big),$$

where $\widehat{\mathsf{TH}}_k$ denotes the output estimate of Algorithm $\mathcal{A}'$. Later we show that the error on the restricted input space is at least $\delta$.

Genie-aided information. We assume that a genie provides Algorithm $\mathcal{A}'$ with additional information after the non-adaptive phase to assist the estimation. Specifically, the genie reveals two sequences of sets $\bar{\mathcal{H}}_0^\alpha=(\bar{\mathcal{H}}_0,\ldots,\bar{\mathcal{H}}_\alpha)$ and $\bar{\mathcal{L}}_0^\alpha=(\bar{\mathcal{L}}_0,\ldots,\bar{\mathcal{L}}_\alpha)$, which are defined under the two hypotheses $w(\mathbf{x})=k-1$ and $w(\mathbf{x})=k$ as follows.

  • When $w(\mathbf{x})=k-1$, there are in total $k-1$ heavy balls in the bins, and the genie reveals the indices of all of them, i.e., $(\bar{\mathcal{H}}_0,\ldots,\bar{\mathcal{H}}_\alpha)=(\mathcal{H}_0,\ldots,\mathcal{H}_\alpha)$ and $(\bar{\mathcal{L}}_0,\ldots,\bar{\mathcal{L}}_\alpha)=(\mathcal{L}_0,\ldots,\mathcal{L}_\alpha)$.

  • When $w(\mathbf{x})=k$, the genie decides to reveal either the indices of $k-1$ heavy balls or the indices of all $k$ heavy balls through a random process. Let $\mathcal{T}\triangleq\{j\in\{0,1,\ldots,\alpha\}\colon\alpha(p-\epsilon)\leq j\leq\alpha(p+\epsilon)\}$ be the set of typical bin indices for light balls. The genie picks a random bin $J\in\mathcal{T}$ from the distribution $\mathsf{P}(J=j)=\frac{L_j}{\sum_{i\in\mathcal{T}}L_i}$. If the chosen bin $J$ contains no heavy balls, i.e., $H_J=0$, then the genie reveals the indices of all $k$ heavy balls by setting $(\bar{\mathcal{H}}_0,\ldots,\bar{\mathcal{H}}_\alpha)=(\mathcal{H}_0,\ldots,\mathcal{H}_\alpha)$ and $(\bar{\mathcal{L}}_0,\ldots,\bar{\mathcal{L}}_\alpha)=(\mathcal{L}_0,\ldots,\mathcal{L}_\alpha)$. On the other hand, if $H_J\geq 1$, then the genie chooses a heavy ball $i$ in bin $J$ uniformly at random to be hidden, and reveals the indices of all other heavy balls. This is done by setting $(\bar{\mathcal{H}}_0,\ldots,\bar{\mathcal{H}}_\alpha)=(\mathcal{H}_0,\ldots,\mathcal{H}_{J-1},\mathcal{H}_J\setminus\{i\},\mathcal{H}_{J+1},\ldots,\mathcal{H}_\alpha)$ and $(\bar{\mathcal{L}}_0,\ldots,\bar{\mathcal{L}}_\alpha)=(\mathcal{L}_0,\ldots,\mathcal{L}_{J-1},\mathcal{L}_J\cup\{i\},\mathcal{L}_{J+1},\ldots,\mathcal{L}_\alpha)$.

It is not hard to see that, with the additional noiseless information revealed by the genie, the performance of Algorithm $\mathcal{A}'$ does not deteriorate. Moreover, because in all cases the sets $\bar{\mathcal{H}}_0^\alpha$ contain only heavy balls, and the sets $\{\bar{\mathcal{L}}_j\colon j\notin\mathcal{T}\}$ contain only light balls, we assume without loss of generality that the adaptive phase of $\mathcal{A}'$ only reveals the weights of the balls in $\{\bar{\mathcal{L}}_j\colon j\in\mathcal{T}\}$. In the remaining proof, we analyze the error probability of $\mathcal{A}'$ in the presence of the genie-aided information.

Why can we establish a tighter lower bound? Before sketching the proof of our lower bound with the genie-aided information, we first clarify the key distinctions between our analysis and that in Feige et al. (1994). We also highlight the main reason why our approach provides a tighter lower bound on the optimal query complexity.

In the proof by Feige et al. (1994), the error bound for $\mathcal{A}'$ is also established through a genie-aided analysis, where the genie picks the bin $J$ in which to hide a heavy ball from all possible bins $\{0,\ldots,\alpha\}$ instead of choosing it from the set $\mathcal{T}$. It is then shown that when $\alpha$ is small enough, every bin in $\{0,\ldots,\alpha\}$ contains at least one heavy ball with high probability, ensuring that the genie hides a heavy ball in the chosen bin. Furthermore, it is shown that the noiseless queries in the adaptive phase fail to find the hidden heavy ball with constant probability, resulting in the scenario where the algorithm observes the $k-1$ heavy balls revealed by the genie but cannot tell whether there is a $k$th hidden heavy ball. This leads to an overall error probability of at least $\delta$ for the algorithm.

In our analysis, we observe that having a heavy ball in every bin with high probability is not a necessary condition for establishing the desired error bound for Algorithm $\mathcal{A}'$. In our proof, the genie instead picks the bin $J$ from $\mathcal{T}$ in which to hide a heavy ball, where $\mathcal{T}$ is a typical set of bins for the light balls. We then focus on analyzing the statistical behavior of the bins in $\mathcal{T}$. We show that for each $i\in\mathcal{T}$, the probability that bin $i$ contains at least one heavy ball is $\omega(\delta)$. Moreover, conditioned on the event that a heavy ball is hidden by the genie, we further show that the information revealed by the genie and the noiseless queries in the adaptive phase are not informative enough for the algorithm to distinguish whether there is a $k$th heavy ball that is hidden. This leads to the desired error bound. Notably, requiring each bin $i\in\mathcal{T}$ to contain at least one heavy ball with probability $\omega(\delta)$ imposes a much less stringent condition on the choice of $\alpha$ compared to requiring all bins to contain at least one heavy ball with high probability. Consequently, the choice of $\alpha=\frac{(1-\epsilon)\log\frac{k}{\delta}}{D_{\mathrm{KL}}(p-\epsilon\|1-p)}$ in our analysis is much larger than the choice of $\alpha$ in Feige et al. (1994). Therefore, we can show that the error lower bound $\delta$ applies to algorithms with higher query complexity, which leads to a tighter lower bound on the optimal query complexity.

Error analysis for the optimal estimator. Towards a contradiction, we assume that the worst-case error is at most $\delta$, i.e.,

$$\max\big\{\mathsf{P}(\widehat{\mathsf{TH}}_k=1\,|\,w(\mathbf{x})=k-1),\,\mathsf{P}(\widehat{\mathsf{TH}}_k=0\,|\,w(\mathbf{x})=k)\big\}\leq\delta. \tag{1}$$

This implies

$$\mathsf{P}(\widehat{\mathsf{TH}}_k=0\,|\,w(\mathbf{x})=k-1)\geq 1-\delta. \tag{2}$$

However, in the rest of the proof, we show (2) implies

$$\mathsf{P}(\widehat{\mathsf{TH}}_k=0\,|\,w(\mathbf{x})=k)>\delta, \tag{3}$$

which contradicts (1).

Sufficient statistic for estimation. We identify a sufficient statistic for the optimal estimator $\widehat{\mathsf{TH}}_k$. When Algorithm $\mathcal{A}'$ decides its output, it has access to $\bar{\mathcal{H}}_0^\alpha$, $\bar{\mathcal{L}}_0^\alpha$, and the weights of the balls revealed in the adaptive phase. By the symmetry of the balls in the same bin, it suffices for $\mathcal{A}'$ to consider the cardinalities of the sets $\bar{\mathcal{H}}_0^\alpha$ and $\bar{\mathcal{L}}_0^\alpha$, denoted by $\bar{\mathbf{H}}_0^\alpha\triangleq(|\bar{\mathcal{H}}_0|,\ldots,|\bar{\mathcal{H}}_\alpha|)$ and $\bar{\mathbf{L}}_0^\alpha\triangleq(|\bar{\mathcal{L}}_0|,\ldots,|\bar{\mathcal{L}}_\alpha|)$. Moreover, notice that the unrevealed balls include at most one heavy ball, so the adaptive phase of the algorithm finds either one heavy ball or none. By the symmetry of the balls in the same bin, it suffices for $\mathcal{A}'$ to consider whether the adaptive phase finds any heavy ball. Let $\mathcal{E}_1$ denote the event that the adaptive phase does not find any heavy ball. Thus, the optimal estimator $\widehat{\mathsf{TH}}_k$ of $\mathcal{A}'$ can be written as a function $\widehat{\mathsf{TH}}_k(\bar{\mathbf{H}}_0^\alpha,\bar{\mathbf{L}}_0^\alpha,\mathbbm{1}_{\{\mathcal{E}_1\}})$.

Next, we argue that certain realizations of the sufficient statistic yield the correct value of the true threshold $\mathsf{TH}_k(\mathbf{x})$ (and thus no error for the optimal estimator). Notice that if $\sum_{i=0}^{\alpha}\bar{H}_i=k$, then we know that $w(\mathbf{x})=k$, because $\bar{\mathcal{H}}_0^\alpha$ contains only indices of heavy balls. Moreover, if $\mathbbm{1}_{\{\mathcal{E}_1\}}=0$, we know that a heavy ball is found in the adaptive phase, and it follows that $w(\mathbf{x})=k$. Therefore, we assume without loss of generality that the optimal estimator satisfies $\widehat{\mathsf{TH}}_k(\bar{\mathbf{H}}_0^\alpha,\bar{\mathbf{L}}_0^\alpha,\mathbbm{1}_{\{\mathcal{E}_1\}})=1$ whenever $\sum_{i=0}^{\alpha}\bar{H}_i=k$ or $\mathbbm{1}_{\{\mathcal{E}_1\}}=0$.

Now the only ambiguous case that might lead to an error is when $\sum_{i=0}^{\alpha}\bar{H}_i=k-1$ and $\mathbbm{1}_{\{\mathcal{E}_1\}}=1$. This corresponds to the case where the genie reveals the indices of $k-1$ heavy balls, and the adaptive phase does not find any heavy ball. Among the remaining realizations of the sufficient statistic, we further divide them into two sets according to whether the estimate $\widehat{\mathsf{TH}}_k(\mathbf{h}_0^\alpha,\mathbf{l}_0^\alpha,1)$ is $0$ or $1$. We define $\mathcal{N}$ as the collection of realizations $(\mathbf{h}_0^\alpha,\mathbf{l}_0^\alpha)$ such that $\widehat{\mathsf{TH}}_k(\mathbf{h}_0^\alpha,\mathbf{l}_0^\alpha,1)=0$. In other words, we have

$$\mathsf{P}(\widehat{\mathsf{TH}}_k=0)=\sum_{(\mathbf{h}_0^\alpha,\mathbf{l}_0^\alpha)\in\mathcal{N}}\mathsf{P}\big(\{(\bar{\mathbf{H}}_0^\alpha,\bar{\mathbf{L}}_0^\alpha)=(\mathbf{h}_0^\alpha,\mathbf{l}_0^\alpha)\}\cap\mathcal{E}_1\big). \tag{4}$$

Ideally, if we can prove that the sufficient statistic $(\bar{\mathbf{H}}_0^\alpha,\bar{\mathbf{L}}_0^\alpha,\mathbbm{1}_{\{\mathcal{E}_1\}})$ is similarly distributed under the two hypotheses $w(\mathbf{x})=k-1$ and $w(\mathbf{x})=k$, then we can lower bound the ratio $\frac{\mathsf{P}(\widehat{\mathsf{TH}}_k=0\,|\,w(\mathbf{x})=k)}{\mathsf{P}(\widehat{\mathsf{TH}}_k=0\,|\,w(\mathbf{x})=k-1)}$, and hence show that (2) implies (3). More specifically, if we can show that

$$\frac{\mathsf{P}\big(\{(\bar{\mathbf{H}}_0^\alpha,\bar{\mathbf{L}}_0^\alpha)=(\mathbf{h}_0^\alpha,\mathbf{l}_0^\alpha)\}\cap\mathcal{E}_1\,\big|\,w(\mathbf{x})=k\big)}{\mathsf{P}\big(\{(\bar{\mathbf{H}}_0^\alpha,\bar{\mathbf{L}}_0^\alpha)=(\mathbf{h}_0^\alpha,\mathbf{l}_0^\alpha)\}\cap\mathcal{E}_1\,\big|\,w(\mathbf{x})=k-1\big)}=\omega(\delta) \tag{5}$$

for any $(\mathbf{h}_0^\alpha,\mathbf{l}_0^\alpha)\in\mathcal{N}$, then we would have

$$\mathsf{P}(\widehat{\mathsf{TH}}_k=0\,|\,w(\mathbf{x})=k)=\omega(\delta)\,\mathsf{P}(\widehat{\mathsf{TH}}_k=0\,|\,w(\mathbf{x})=k-1)>\delta,$$

which would complete the proof. However, it turns out that the desired property (5) does not hold for all realizations $(\mathbf{h}_0^\alpha,\mathbf{l}_0^\alpha)\in\mathcal{N}$. Instead, we slightly adjust this approach, and identify a family $\mathcal{C}$ of realizations $(\mathbf{h}_0^\alpha,\mathbf{l}_0^\alpha)$ satisfying

  1. Typicality: $\sum_{(\mathbf{h}_0^\alpha,\mathbf{l}_0^\alpha)\in\mathcal{C}}\mathsf{P}\big((\bar{\mathbf{H}}_0^\alpha,\bar{\mathbf{L}}_0^\alpha)=(\mathbf{h}_0^\alpha,\mathbf{l}_0^\alpha)\,\big|\,w(\mathbf{x})=k-1\big)\geq 1-o(1)$.

  2. Bounded ratio property (for simplicity, the property presented here is a simplified version of what we actually prove; it holds only in the special case when $\mathcal{A}'$ makes a fixed number of queries in the adaptive phase, and we refer the reader to Propositions 2 and 4 for the properties in the general case): $\frac{\mathsf{P}\big(\{(\bar{\mathbf{H}}_0^\alpha,\bar{\mathbf{L}}_0^\alpha)=(\mathbf{h}_0^\alpha,\mathbf{l}_0^\alpha)\}\cap\mathcal{E}_1\,\big|\,w(\mathbf{x})=k\big)}{\mathsf{P}\big(\{(\bar{\mathbf{H}}_0^\alpha,\bar{\mathbf{L}}_0^\alpha)=(\mathbf{h}_0^\alpha,\mathbf{l}_0^\alpha)\}\cap\mathcal{E}_1\,\big|\,w(\mathbf{x})=k-1\big)}=\omega(\delta)$ for all $(\mathbf{h}_0^\alpha,\mathbf{l}_0^\alpha)\in\mathcal{C}$.

With these two properties of the family $\mathcal{C}$, we have

$$\begin{aligned}
\mathsf{P}(\widehat{\mathsf{TH}}_k=0\,|\,w(\mathbf{x})=k) &\geq \sum_{(\mathbf{h}_0^\alpha,\mathbf{l}_0^\alpha)\in\mathcal{N}\cap\mathcal{C}}\mathsf{P}\big(\{(\bar{\mathbf{H}}_0^\alpha,\bar{\mathbf{L}}_0^\alpha)=(\mathbf{h}_0^\alpha,\mathbf{l}_0^\alpha)\}\cap\mathcal{E}_1\,\big|\,w(\mathbf{x})=k\big)\\
&\geq \omega(\delta)\sum_{(\mathbf{h}_0^\alpha,\mathbf{l}_0^\alpha)\in\mathcal{N}\cap\mathcal{C}}\mathsf{P}\big(\{(\bar{\mathbf{H}}_0^\alpha,\bar{\mathbf{L}}_0^\alpha)=(\mathbf{h}_0^\alpha,\mathbf{l}_0^\alpha)\}\cap\mathcal{E}_1\,\big|\,w(\mathbf{x})=k-1\big)\\
&\geq \omega\big(\delta(1-\delta-o(1))\big)=\omega(\delta),
\end{aligned}$$

which completes the proof. In the detailed proof presented in Appendix A, we further separate the regime $\log k>\frac{\log(1/\delta)}{\log\log(1/\delta)}$ into two sub-regimes, $\frac{\log(1/\delta)}{\log\log(1/\delta)}<\log k\leq\log(1/\delta)\log\log(1/\delta)$ and $\log k>\log(1/\delta)\log\log(1/\delta)$. We present the detailed construction of the family $\mathcal{C}$ in these two sub-regimes in Appendix A.

IV Main Idea in the Proof of the Upper Bound

In this section, we propose a noisy computing algorithm for the $\mathsf{TH}_k$ function, and illustrate the main idea in the proof of Theorem 2. We separately consider the two regimes $k>\frac{n}{\log n}$ and $k\leq\frac{n}{\log n}$.

IV-A Regime $k>\frac{n}{\log n}$

To estimate a function of a binary sequence $\mathbf{x}$, one natural idea is to first estimate each bit in $\mathbf{x}$ and then compute the function on the estimates. In this regime, the proposed algorithm follows exactly this idea.

The proposed algorithm estimates the bits in the sequence by employing an existing algorithm, CheckBit, as a subroutine. CheckBit (Algorithm 1 in Gu and Xu (2023)) takes as input a single bit and a desired error probability $\delta$, and returns an estimate of the bit through repeated queries. A detailed description of the CheckBit algorithm is provided in Algorithm 3 in Appendix B. Lemma 13 in Gu and Xu (2023) shows that the algorithm makes at most $\frac{1}{1-2p}\left\lceil\frac{\log\frac{1-\delta}{\delta}}{\log\frac{1-p}{p}}\right\rceil$ queries in expectation and returns the correct value of the input bit with probability at least $1-\delta$.
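For concreteness, here is a short Python sketch in the spirit of such a sequential bit-estimation subroutine (our illustration, not the exact CheckBit pseudocode from Gu and Xu (2023)): it keeps querying the bit until the gap between the counts of 1-responses and 0-responses reaches $\left\lceil\log\frac{1-\delta}{\delta}/\log\frac{1-p}{p}\right\rceil$, which corresponds to a likelihood ratio of $\frac{1-\delta}{\delta}$ in favor of one value.

```python
import math
import random

def check_bit(query, delta, p):
    """Estimate a single bit from noisy readings.

    `query()` returns the bit flipped with probability p. We stop once the
    difference between 1-responses and 0-responses reaches a threshold that
    drives the posterior error probability below delta (a CheckBit-style
    sequential test; sketch only)."""
    threshold = math.ceil(math.log((1 - delta) / delta) / math.log((1 - p) / p))
    ones = zeros = 0
    while abs(ones - zeros) < threshold:
        if query():
            ones += 1
        else:
            zeros += 1
    return 1 if ones > zeros else 0

# Example: estimate a bit of true value 1 with p = 0.2 and delta = 1e-3.
rng = random.Random(1)
noisy_query = lambda: 1 ^ (rng.random() < 0.2)  # true bit is 1
print(check_bit(noisy_query, delta=1e-3, p=0.2))
```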

In the proposed algorithm, we call the CheckBit function to estimate $x_i$ for each $i\in[n]$ with the error probability set to $\delta/n$, and obtain an estimate $\hat{x}_i$. The algorithm then outputs $\widehat{\mathsf{TH}}_k(\mathbf{x})=\mathbbm{1}_{\{\sum_{i=1}^{n}\hat{x}_i\geq k\}}$ as the estimate. The pseudo-code of this algorithm is given in Algorithm 1.

Data: Bit sequence $\mathbf{x}=(x_1,\ldots,x_n)$, parameter $k\leq\frac{n}{2}$, error probability $\delta$, noise probability $p$.
Result: Estimate of $\mathsf{TH}_k(\mathbf{x})$.
1 $w\leftarrow 0$
2 for $i\in[n]$ do
3   $\hat{x}_i\leftarrow$ CheckBit($x_i$, $\delta/n$, $p$)
4   $w\leftarrow w+\hat{x}_i$
5 end for
6 return $\mathbbm{1}_{\{w\geq k\}}$
Algorithm 1 Proposed NoisyThreshold algorithm for $k\geq\frac{n}{\log n}$
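A minimal runnable Python version of Algorithm 1, assuming a `check_bit` subroutine like the sketch above and a `query(i)` interface that returns a fresh noisy reading of bit $i$ (both names are ours):

```python
def noisy_threshold_large_k(query, n, k, delta, p):
    """Algorithm 1 sketch: estimate each bit with CheckBit at error level
    delta/n, then threshold the number of estimated ones at k."""
    w = 0
    for i in range(n):
        w += check_bit(lambda i=i: query(i), delta / n, p)
    return int(w >= k)
```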

By the union bound, we have $\mathsf{P}(\exists i\in[n]\colon\hat{x}_i\neq x_i)\leq\delta$. Consequently, we get $\mathsf{P}(\widehat{\mathsf{TH}}_k(\mathbf{x})\neq\mathsf{TH}_k(\mathbf{x}))\leq\delta$, i.e., the error probability of Algorithm 1 is at most $\delta$. The expected number of queries used by these $n$ calls of the CheckBit function is at most

$$\frac{n}{1-2p}\left\lceil\frac{\log\frac{1-\delta/n}{\delta/n}}{\log\frac{1-p}{p}}\right\rceil\leq\frac{n}{1-2p}\left(\frac{\log\frac{n}{\delta}}{\log\frac{1-p}{p}}+1\right)=(1+o(1))\frac{n\log\frac{n}{\delta}}{D_{\mathrm{KL}}(p\|1-p)}.$$

In the regime $k>\frac{n}{\log n}$, it follows that $\log\frac{n}{\delta}=(1+o(1))\log\frac{k}{\delta}$. Therefore, Algorithm 1 uses at most $(1+o(1))\frac{n\log\frac{k}{\delta}}{D_{\mathrm{KL}}(p\|1-p)}$ queries in expectation. This proves Theorem 2 in the regime $k>\frac{n}{\log n}$.

Notably, in the opposite regime $k\leq\frac{n}{\log n}$, the equality $\log\frac{n}{\delta}=(1+o(1))\log\frac{k}{\delta}$ does not necessarily hold, and hence the desired upper bound in Theorem 2 cannot be obtained through Algorithm 1. For example, suppose $k=\log n$ and $\delta=1/\log n$. The upper bound provided by Algorithm 1 is $(1+o(1))\frac{n\log n}{D_{\mathsf{KL}}(p\|1-p)}$, while the desired upper bound in Theorem 2 is $(1+o(1))\frac{2n\log\log n}{D_{\mathsf{KL}}(p\|1-p)}$.

IV-B Regime $k\leq\frac{n}{\log n}$

In this regime, we propose the NoisyThreshold algorithm, given in Algorithm 2, which computes the $\mathsf{TH}_k$ function with the desired number of queries and error probability. In addition to CheckBit, Algorithm 2 also uses the existing MaxHeapThreshold algorithm as a subroutine.

Data: Bit sequence $\mathbf{x}=(x_1,\ldots,x_n)$, parameter $k\leq\frac{n}{2}$, error probability $\delta$, noise probability $p$.
Result: Estimate of $\mathsf{TH}_k(\mathbf{x})$.
1 $\mathcal{S}\leftarrow\emptyset$
2 for $i\in[n]$ do
3   if CheckBit($x_i$, $\delta/k$, $p$) $=1$ then
4     Append $x_i$ to $\mathcal{S}$
5   end if
6 end for
7 if $|\mathcal{S}|\leq k-1$ then
8   return 0
9 else if $|\mathcal{S}|\geq k+\max(n\delta+n\sqrt{\delta},\frac{n}{\log n})$ then
10   return 1
11 else
12   return MaxHeapThreshold($\mathcal{S}$, $k$, $\delta$, $p$)
13 end if
Algorithm 2 Proposed NoisyThreshold algorithm for $k\leq\frac{n}{\log n}$

MaxHeapThreshold is the algorithm proposed by Feige et al. (1994) for estimating the $\mathsf{TH}_k$ function. Its main idea is to use heapsort to find the $k$th largest element in $\mathbf{x}$, and then repeatedly query this element to decide $\mathsf{TH}_k(\mathbf{x})$. To cope with the observation noise, each query in heapsort is repeated a certain number of times. Detailed descriptions of the MaxHeapThreshold algorithm and its subroutines are given in Algorithms 4, 5, and 6 in Appendix B. Theorem 3.3 in Feige et al. (1994) shows that the function uses at most $O(n\log\frac{k}{\delta})$ queries. Although the query complexity of MaxHeapThreshold achieves the correct asymptotic order, there remains a constant multiplicative gap between its query complexity and the optimal value, as demonstrated in the numerical examples provided in Section I.

The main idea in Algorithm 2 is to first reduce the problem size to order $o(n)$ through a filtering process over the bits in $\mathbf{x}$, and then apply MaxHeapThreshold to the reduced problem. In the first step, we call the CheckBit function on each bit to estimate its value with tolerated error probability $\delta/k$. The bits estimated to be $1$ are added to the set $\mathcal{S}$. This step can be viewed as a filtering process that provides a set of bits believed to be $1$ with high confidence. Notice that with an error probability of $\delta/k$ for each bit estimation, we cannot guarantee that all bits are correctly estimated with probability at least $1-\delta$ as in the case of Algorithm 1. Instead, we show that with probability at least $1-2\delta$, $|\mathcal{S}|$ is below a certain threshold $\tau$ if $\mathsf{TH}_k(\mathbf{x})=0$, and $\mathcal{S}$ contains at least $k$ bits of true value $1$ if $\mathsf{TH}_k(\mathbf{x})=1$. From here, we can output $\widehat{\mathsf{TH}}_k(\mathbf{x})=1$ if $|\mathcal{S}|\geq\tau$ (Line 10 of Algorithm 2), and output $\widehat{\mathsf{TH}}_k(\mathbf{x})=0$ if $|\mathcal{S}|\leq k-1$ (Line 8 of Algorithm 2). When neither of these two cases holds, the problem of estimating $\mathsf{TH}_k(\mathbf{x})$ reduces to deciding whether $\mathcal{S}$ contains at least $k$ bits of value $1$. For this reduced problem, we perform the second step by calling the MaxHeapThreshold function on the set $\mathcal{S}$ (Line 12 of Algorithm 2) to decide the final output.
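The following Python sketch mirrors the structure of Algorithm 2; `check_bit` is the sketch given earlier, and `max_heap_threshold` is assumed to be an available implementation of the Feige et al. (1994) subroutine (both names are ours, and the sketch is illustrative rather than the paper's exact pseudocode).

```python
import math

def noisy_threshold_small_k(query, n, k, delta, p, max_heap_threshold):
    """Algorithm 2 sketch: filter the bits with CheckBit at error level
    delta/k, decide immediately when |S| is clearly small or large, and
    otherwise fall back to MaxHeapThreshold on the reduced set S."""
    S = [i for i in range(n) if check_bit(lambda i=i: query(i), delta / k, p) == 1]
    tau = k + max(n * delta + n * math.sqrt(delta), n / math.log(n))
    if len(S) <= k - 1:
        return 0
    if len(S) >= tau:
        return 1
    return max_heap_threshold(S, k, delta, p)
```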

By applying the union bound over the two steps, the overall error probability of Algorithm 2 is at most $\delta'$ when the input parameter $\delta$ is set to $\delta'/3$. Notably, $\tau$ is chosen to be $o(n)$, so in the event that the MaxHeapThreshold function is executed, its input size is $o(n)$. Although running MaxHeapThreshold induces a suboptimality of a constant factor, running it on an input of size $o(n)$ does not affect the dominating term in the query complexity of Algorithm 2. Therefore, the dominating term of the query complexity depends solely on the query complexity of the first step, which is upper bounded by $(1+o(1))\frac{n\log\frac{k}{\delta'}}{D_{\mathrm{KL}}(p\|1-p)}$. This proves Theorem 2 in the regime $k\leq\frac{n}{\log n}$. The detailed proof for this regime is presented in Appendix B.

V Conclusion and Discussion

In this work, we establish new upper and lower bounds on the optimal query complexity for noisy computing of the $\mathsf{TH}_k$ function. Our bounds are asymptotically tight in the case $k=o(n)$, while having a multiplicative gap of at most $2$ in general. Our new bounds strictly improve the best known bounds of Feige et al. (1994). In the following, we discuss several extensions of our study.

Closing the gap.

We believe the gap between our bounds stems from the lower bound. To establish the lower bound, we assume the existence of a genie that reveals auxiliary information to the algorithm, with the amount of information scaling with $k$. We conjecture that when $k$ is large, the additional information provided by the genie fundamentally alters the optimal query complexity, leading to a looser lower bound. How to tighten our lower bound remains an open problem.

Unknown $p$.

This paper assumes that the parameter $p$ is known. However, our bounds can be easily extended to the case of unknown $p$. Note that the noisy computing problem becomes strictly harder when $p$ is unknown, so our lower bound automatically holds in the unknown-$p$ setting. To extend our upper bound, the proposed algorithms need to be modified so that they do not rely on knowledge of $p$. The key idea is to incorporate an initial estimation step for $p$ before proceeding with the subsequent steps using the estimated value. More specifically, we can first query $x_1$ a total of $\theta=\frac{n\log\frac{1}{\delta}}{\log n}$ times, and compute a biased estimate of $p$ as $\hat{p}=\frac{\min(r,\theta-r)}{\theta(1-\frac{1}{\log n})}$, where $r$ denotes the number of $1$'s observed in these $\theta$ queries. The original steps in Algorithms 1 and 2 are then executed with input $\hat{p}$. By incorporating this estimation step, it can be shown that the error probabilities of Algorithms 1 and 2 both increase by at most $o(\delta)$, and the expected query complexities both increase by a factor of at most $1+o(1)$. This extends our upper bound to the unknown-$p$ setting.
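As an illustration of this estimation step, here is a minimal Python sketch (ours) of the biased estimator described above; `query_x1` is an assumed interface that returns a fresh noisy reading of $x_1$.

```python
import math

def estimate_p(query_x1, n, delta):
    """Estimate the noise level p from repeated queries of x_1, following the
    biased estimator described above (sketch of the suggested extension)."""
    theta = max(1, round(n * math.log(1 / delta) / math.log(n)))
    r = sum(query_x1() for _ in range(theta))  # number of 1-responses
    return min(r, theta - r) / (theta * (1 - 1 / math.log(n)))
```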

Varying $p$.

In our study, we always assume that the noise parameter $p$ is a constant bounded away from both $0$ and $\frac{1}{2}$. Another interesting extension of our study is to let $p$ also be a function of $n$, and to consider the limiting behavior of the optimal complexity when $p\rightarrow 0$ or $p\rightarrow\frac{1}{2}$.

References

  • Agarwal et al., (2017) Agarwal, A., Agarwal, S., Assadi, S., and Khanna, S. (2017). Learning with limited rounds of adaptivity: Coin tossing, multi-armed bandits, and ranking from pairwise comparisons. In Proceedings of the 2017 Conference on Learning Theory, volume 65 of Proceedings of Machine Learning Research, pages 39–75. PMLR.
  • Auer et al., (1995) Auer, P., Cesa-Bianchi, N., Freund, Y., and Schapire, R. E. (1995). Gambling in a rigged casino: The adversarial multi-armed bandit problem. In Proceedings of IEEE 36th annual foundations of computer science, pages 322–331. IEEE.
  • Berlekamp, (1964) Berlekamp, E. R. (1964). Block coding with noiseless feedback. Ph.D. thesis, MIT, Cambridge, MA, USA.
  • Bretagnolle and Huber, (1979) Bretagnolle, J. and Huber, C. (1979). Estimation des densités: risque minimax. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 47(2):119–137.
  • Burnashev and Zigangirov, (1974) Burnashev, M. V. and Zigangirov, K. (1974). An interval estimation problem for controlled observations. Problemy Peredachi Informatsii, 10(3):51–61.
  • Dobrushin and Ortyukov, (1977a) Dobrushin, R. L. and Ortyukov, S. (1977a). Lower bound for the redundancy of self-correcting arrangements of unreliable functional elements. Problemy Peredachi Informatsii, 13(1):82–89.
  • Dobrushin and Ortyukov, (1977b) Dobrushin, R. L. and Ortyukov, S. (1977b). Upper bound on the redundancy of self-correcting arrangements of unreliable functional elements. Problemy Peredachi Informatsii, 13(3):56–76.
  • Evans and Pippenger, (1998) Evans, W. and Pippenger, N. (1998). Average-case lower bounds for noisy boolean decision trees. SIAM Journal on Computing, 28(2):433–446.
  • Falahatgar et al., (2017) Falahatgar, M., Orlitsky, A., Pichapati, V., and Suresh, A. T. (2017). Maximum selection and ranking under noisy comparisons. In Precup, D. and Teh, Y. W., editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1088–1096. PMLR.
  • Feige et al., (1994) Feige, U., Raghavan, P., Peleg, D., and Upfal, E. (1994). Computing with noisy information. SIAM Journal on Computing, 23(5):1001–1018.
  • Gács and Gál, (1994) Gács, P. and Gál, A. (1994). Lower bounds for the complexity of reliable boolean circuits with noisy gates. IEEE Trans. Inf. Theory, 40(2):579–583.
  • Gu and Xu, (2023) Gu, Y. and Xu, Y. (2023). Optimal bounds for noisy sorting. In Proceedings of the 55th Annual ACM Symposium on Theory of Computing, STOC 2023, page 1502–1515, New York, NY, USA. Association for Computing Machinery.
  • Hastad et al., (1987) Hastad, J., Leighton, T., and Newman, M. (1987). Reconfiguring a hypercube in the presence of faults. In Proceedings of the Nineteenth Annual ACM Symposium on Theory of Computing, STOC ’87, page 274–284, New York, NY, USA.
  • Heckel et al., (2019) Heckel, R., Shah, N. B., Ramchandran, K., and Wainwright, M. J. (2019). Active ranking from pairwise comparisons and when parametric assumptions do not help. Ann. Stat., 47(6):3099 – 3126.
  • Hoeffding, (1963) Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30.
  • Horstein, (1963) Horstein, M. (1963). Sequential transmission using noiseless feedback. IEEE Trans. Inf. Theory, 9(3):136–143.
  • Karp and Kleinberg, (2007) Karp, R. M. and Kleinberg, R. (2007). Noisy binary search and its applications. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’07, page 881–890, USA. Society for Industrial and Applied Mathematics.
  • Kushilevitz and Mansour, (2005) Kushilevitz, E. and Mansour, Y. (2005). Computation in noisy radio networks. SIAM Journal on Discrete Mathematics, 19(1):96–108.
  • Mitzenmacher and Upfal, (2017) Mitzenmacher, M. and Upfal, E. (2017). Probability and computing: Randomization and probabilistic techniques in algorithms and data analysis. Cambridge university press.
  • Pease et al., (1980) Pease, M., Shostak, R., and Lamport, L. (1980). Reaching agreement in the presence of faults. J. ACM, 27(2):228–234.
  • Pelc, (1989) Pelc, A. (1989). Searching with known error probability. Theoretical Computer Science, 63(2):185–202.
  • Pippenger et al., (1991) Pippenger, N., Stamoulis, G. D., and Tsitsiklis, J. N. (1991). On a lower bound for the redundancy of reliable networks with noisy gates. IEEE Transactions on Information Theory, 37(3):639–643.
  • Reischuk and Schmeltz, (1991) Reischuk, R. and Schmeltz, B. (1991). Reliable computation with noisy circuits and decision trees-a general n log n lower bound. In [1991] Proceedings 32nd Annual Symposium of Foundations of Computer Science, pages 602–611. IEEE Computer Society.
  • Shah and Wainwright, (2018) Shah, N. B. and Wainwright, M. J. (2018). Simple, robust and optimal ranking from pairwise comparisons. J. Mach. Learn. Res., 18(199):1–38.
  • Von Neumann, (1956) Von Neumann, J. (1956). Probabilistic logics and the synthesis of reliable organisms from unreliable components. Automata studies, 34:43–98.
  • Wang et al., (2024) Wang, Z., Ghaddar, N., Zhu, B., and Wang, L. (2024). Noisy sorting capacity. IEEE Transactions on Information Theory, 70(9):6121–6138.
  • Yu, (1997) Yu, B. (1997). Assouad, fano, and le cam. In Festschrift for Lucien Le Cam, pages 423–435. Springer.
  • Zhu et al., (2024) Zhu, B., Wang, Z., Ghaddar, N., Jiao, J., and Wang, L. (2024). Noisy computing of the or and max functions. IEEE Journal on Selected Areas in Information Theory, 5:302–313.

Appendix A Detailed Proof of Theorem 1

In this section, we prove Theorem 1 in three separate regimes log(1/δ)loglog(1/δ)<logklog(1/δ)loglog(1/δ)\frac{\log(1/\delta)}{\log\log(1/\delta)}<\log k\leq\log(1/\delta)\log\log(1/\delta), logk>log(1/δ)loglog(1/δ)\log k>\log(1/\delta)\log\log(1/\delta) and logklog(1/δ)loglog(1/δ)\log k\leq\frac{\log(1/\delta)}{\log\log(1/\delta)}.

A-A Regime 1: log(1/δ)loglog(1/δ)<logklog(1/δ)loglog(1/δ)\frac{\log(1/\delta)}{\log\log(1/\delta)}<\log k\leq\log(1/\delta)\log\log(1/\delta)

Let ϵ=(logk)1/4{\epsilon}=(\log k)^{-1/4}. Because we assume that logk>log(1/δ)loglog(1/δ)=ω(1)\log k>\frac{\log(1/\delta)}{\log\log(1/\delta)}=\omega(1), we have ϵ=o(1){\epsilon}=o(1). Since the KL divergence is a Lipschitz continuous function, it follows that (1ϵ)(nkϵn)logkδDKL(pϵ1p)=(1o(1))(nk)logkδDKL(p1p)\frac{(1-\epsilon)(n-k-\epsilon n)\log\frac{k}{\delta}}{{D_{\mathrm{KL}}}(p-\epsilon\|1-p)}=(1-o(1))\frac{(n-k)\log\frac{k}{\delta}}{{D_{\mathrm{KL}}(p\|1-p)}}, and it suffices to show that for any algorithm that makes at most (1ϵ)(nkϵn)logkδDKL(pϵ1p)\frac{(1-\epsilon)(n-k-\epsilon n)\log\frac{k}{\delta}}{{D_{\mathrm{KL}}}(p-\epsilon\|1-p)} queries in expectation, the worst-case error probability is at least δ\delta.
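
As a purely illustrative aside (not part of the proof), the continuity fact used above can be checked numerically: the binary KL divergence varies smoothly in its first argument, so DKL(p−ϵ∥1−p) approaches DKL(p∥1−p) as ϵ→0. The values of p and ϵ in the snippet below are arbitrary illustrative choices.

```python
# Illustrative numerical check (not part of the proof): the binary KL divergence
# D_KL(q || 1-p) is continuous in q on compact subsets of (0,1), so
# D_KL(p - eps || 1-p) approaches D_KL(p || 1-p) as eps -> 0.
from math import log

def d_kl_bern(q: float, r: float) -> float:
    """KL divergence D(Bern(q) || Bern(r)) in nats."""
    return q * log(q / r) + (1 - q) * log((1 - q) / (1 - r))

p = 0.1  # illustrative crossover probability
for eps in (0.05, 0.01, 0.001):
    print(eps, d_kl_bern(p - eps, 1 - p), d_kl_bern(p, 1 - p))
```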

Let 𝒜\mathcal{A} be an arbitrary algorithm that uses at most (1ϵ)(nkϵn)logkδDKL(pϵ1p)\frac{(1-\epsilon)(n-k-\epsilon n)\log\frac{k}{\delta}}{{D_{\mathrm{KL}}}(p-\epsilon\|1-p)} queries in expectation. Recall the enhanced algorithm 𝒜\mathcal{A}^{\prime} described in Section III. By Lemma 1, it suffices to show that the worst case error probability of algorithm 𝒜\mathcal{A}^{\prime} is at least δ\delta. To accomplish this, we prove

𝖯(𝖳𝖧^k=0|w(𝐱)=k)>δ\operatorname{\mathsf{P}}(\widehat{\mathsf{TH}}_{k}=0\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k)>\delta (6)

under the assumption that

𝖯(𝖳𝖧^k=0|w(𝐱)=k1)1δ.\operatorname{\mathsf{P}}(\widehat{\mathsf{TH}}_{k}=0\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)\geq 1-\delta. (7)

Recall the notations under the balls-and-bins model we defined in Section III. In addition, we define Pj1(αj)(1p)jpαjP_{j}^{1}\triangleq\binom{\alpha}{j}(1-p)^{j}p^{\alpha-j} as the probability that a heavy ball is put into bin jj, and Pj0(αj)pj(1p)αjP_{j}^{0}\triangleq\binom{\alpha}{j}p^{j}(1-p)^{\alpha-j} as the probability that a light ball is put into bin jj. Recall that we define 𝒩\mathcal{N} as the collection of realizations (𝐡0α,𝐥0α)({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha}) such that 𝖳𝖧^k(𝐡0α,𝐥0α,1)=0\widehat{\mathsf{TH}}_{k}({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha},1)=0, and we have

𝖯(𝖳𝖧^k=0)=(𝐡0α,𝐥0α)𝒩𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}1).\displaystyle\operatorname{\mathsf{P}}(\widehat{\mathsf{TH}}_{k}=0)=\sum_{({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{N}}\operatorname{\mathsf{P}}(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\cap{\mathcal{E}}_{1}). (8)
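
As a hedged aside (not used in the argument), the bin probabilities defined above are simply binomial pmfs: under the natural reading consistent with the formulas for Pj1 and Pj0, a ball's bin index counts the number of its α noisy readings equal to 1, so Pj1 is the Binomial(α,1−p) pmf at j and Pj0 is the Binomial(α,p) pmf at j. The parameter values below are illustrative.

```python
# Sketch of the bin probabilities P_j^1 and P_j^0 defined above: they are the
# Binomial(alpha, 1-p) and Binomial(alpha, p) pmfs, respectively, and hence each
# sums to one over the bins j = 0, ..., alpha.  alpha and p are illustrative.
from math import comb

def p1(j: int, alpha: int, p: float) -> float:
    return comb(alpha, j) * (1 - p) ** j * p ** (alpha - j)

def p0(j: int, alpha: int, p: float) -> float:
    return comb(alpha, j) * p ** j * (1 - p) ** (alpha - j)

alpha, p = 20, 0.1
assert abs(sum(p1(j, alpha, p) for j in range(alpha + 1)) - 1.0) < 1e-9
assert abs(sum(p0(j, alpha, p) for j in range(alpha + 1)) - 1.0) < 1e-9
```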

In the following, we construct the family 𝒞\mathcal{C} of realizations such that the typicality and the bounded ratio property introduced in Section III are satisfied. Let 𝒞\mathcal{C} be the collection of all (2α+2)(2\alpha+2)-tuples of natural numbers (𝐡0α,𝐥0α)({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha}) satisfying

  1. 1.

    i=0αhi=k1\sum_{i=0}^{\alpha}h_{i}=k-1, i=0αli=nk+1\sum_{i=0}^{\alpha}l_{i}=n-k+1;

  2. 2.

    i𝒯li(nk+1)(12exp(logk6pDKL(pϵ1p)))(1lognn)\sum_{i\in{\mathcal{T}}}l_{i}\geq(n-k+1)\left(1-2\exp\left(-\frac{\sqrt{\log k}}{6p{D_{\mathrm{KL}}}(p-{\epsilon}\|1-p)}\right)\right)\left(1-\sqrt{\frac{\log n}{n}}\right);

  3. 3.

    (1Δj0)(nk+1)Pj0lj(1+Δj0)(nk+1)Pj0(1-\Delta_{j}^{0})(n-k+1)P_{j}^{0}\leq l_{j}\leq(1+\Delta_{j}^{0})(n-k+1)P_{j}^{0} for any j𝒯j\in{\mathcal{T}}, where Δj0logn(nk+1)Pj0\Delta_{j}^{0}\triangleq\sqrt{\frac{\log n}{(n-k+1)P_{j}^{0}}};

  4. 4.

    hi=0h_{i}=0 for any i𝒯i\in{\mathcal{T}}.

Here we provide a brief interpretation of the last three conditions in the definition of 𝒞\mathcal{C}. Firstly, as we have mentioned, 𝒯{\mathcal{T}} represents the typical range of bins for the light balls. Therefore, condition 2 essentially states that only a vanishing fraction of the light balls will fall outside the typical bins. Secondly, as we will see in the later part of the proof, Pj0=Ω(n1+Θ(1))P_{j}^{0}=\Omega(n^{-1+\Theta(1)}) for each j𝒯j\in{\mathcal{T}}. This implies that Δj0=o(1)\Delta_{j}^{0}=o(1), and condition 3 requires that the number of light balls in each typical bin has a small deviation from its mean. Finally, we will later show that 𝖤[Hj]=o(1)\operatorname{\mathsf{E}}[H_{j}]=o(1) for each j𝒯j\in{\mathcal{T}}, and condition 4 essentially requires the concentration of the number of heavy balls in these bins. From these, we can see that 𝒞\mathcal{C} is a collection of typical realizations for (𝐇0α,𝐋0α)({\bf H}_{0}^{\alpha},{\bf L}_{0}^{\alpha}). In the following two propositions, we state the typicality and the bounded ratio property for 𝒞\mathcal{C}.

Proposition 1.

There exists a positive constant γ\gamma such that

(𝐡0α,𝐥0α)𝒞𝖯({(𝐇0α,𝐋0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)1O(logkloglog(1/δ)kγloglog(1/δ))O(n1/6).\sum_{({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{C}}\operatorname{\mathsf{P}}(\{({\bf H}_{0}^{\alpha},{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)\geq 1-O\left(\frac{\log k\log\log(1/\delta)}{k^{\frac{\gamma}{\log\log(1/\delta)}}}\right)-O(n^{-1/6}). (9)
Proposition 2.

Let n(nk+1)(12exp(logk6pDKL(pϵ1p)))(1lognn)(2ϵα+1).n^{*}\triangleq(n-k+1)\left(1-2\exp\left(-\frac{\sqrt{\log k}}{6p{D_{\mathrm{KL}}}(p-{\epsilon}\|1-p)}\right)\right)\left(1-\sqrt{\frac{\log n}{n}}\right)-(2{\epsilon}\alpha+1). For any (𝐡0α,𝐥0α)𝒞({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{C}, we have

𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}1|w(𝐱)=k)𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)\displaystyle\frac{\operatorname{\mathsf{P}}\left(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\cap{\mathcal{E}}_{1}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k\right)}{\operatorname{\mathsf{P}}\left(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1\right)} (10)
(1+o(1))δn(kδ)ϵ/2(n𝖤[N|w(𝐱)=k1,(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)]).\displaystyle\geq\tfrac{(1+o(1))\delta}{n}\left(\tfrac{k}{\delta}\right)^{{\epsilon}/2}\left(n^{*}-\operatorname{\mathsf{E}}[N^{\prime}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w({\mathbf{x}})=k-1,(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})]\right). (11)

With Propositions 1 and 2, we are now ready to prove inequality (6). By Proposition 2, we have

𝖯(𝖳𝖧^k=0|w(𝐱)=k)\displaystyle\operatorname{\mathsf{P}}(\widehat{\mathsf{TH}}_{k}=0\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k)
(𝐡0α,𝐥0α)𝒩𝒞𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}1|w(𝐱)=k)\displaystyle\geq\sum_{({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{N}\cap\mathcal{C}}\operatorname{\mathsf{P}}(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\cap{\mathcal{E}}_{1}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k)
(a)(𝐡0α,𝐥0α)𝒩𝒞𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)\displaystyle\stackrel{{\scriptstyle\mathchoice{\hbox to0.0pt{\hss$\displaystyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\textstyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\scriptstyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\scriptscriptstyle{(a)}$\hss}}}}{{\geq}}\sum_{({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{N}\cap\mathcal{C}}\operatorname{\mathsf{P}}(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)
(1+o(1))δn(kδ)ϵ/2(n𝖤[N|w(𝐱)=k1,(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)])\displaystyle\hskip 10.00002pt\cdot\tfrac{(1+o(1))\delta}{n}\left(\tfrac{k}{\delta}\right)^{{\epsilon}/2}\left(n^{*}-\operatorname{\mathsf{E}}[N^{\prime}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w({\mathbf{x}})=k-1,(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})]\right)
=(1+o(1))δn(kδ)ϵ/2((𝐡0α,𝐥0α)𝒩𝒞n𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)\displaystyle=\tfrac{(1+o(1))\delta}{n}\left(\tfrac{k}{\delta}\right)^{{\epsilon}/2}\Bigg{(}\sum_{({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{N}\cap\mathcal{C}}n^{*}\operatorname{\mathsf{P}}(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)
𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)𝖤[N|w(𝐱)=k1,(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)]).\displaystyle\hskip 10.00002pt-\operatorname{\mathsf{P}}(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)\operatorname{\mathsf{E}}[N^{\prime}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w({\mathbf{x}})=k-1,(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})]\Bigg{)}. (12)

Now, we bound the summation in (12). Since we assume that 𝖯(𝖳𝖧^k=0|w(𝐱)=k1)1δ\operatorname{\mathsf{P}}(\widehat{\mathsf{TH}}_{k}=0\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)\geq 1-\delta, we have

1δ𝖯(𝖳𝖧^k=0|w(𝐱)=k1)=(𝐡0α,𝐥0α)𝒩𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1),1-\delta\leq\operatorname{\mathsf{P}}(\widehat{\mathsf{TH}}_{k}=0\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)=\sum_{({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{N}}\operatorname{\mathsf{P}}(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1), (13)

By Proposition 1, we have

(𝐡0α,𝐥0α)𝒩𝒞𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)1δO(logkloglog(1/δ)kγloglog(1/δ))O(n1/6).\sum_{({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{N}\cap\mathcal{C}}\operatorname{\mathsf{P}}(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)\geq 1-\delta-O\left(\frac{\log k\log\log(1/\delta)}{k^{\frac{\gamma}{\log\log(1/\delta)}}}\right)-O(n^{-1/6}).

By the definition of nn^{*} in Proposition 2, it follows that

(𝐡0α,𝐥0α)𝒩𝒞n𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)\displaystyle\sum_{({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{N}\cap\mathcal{C}}n^{*}\operatorname{\mathsf{P}}(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)
(nk)(1δO(logkloglog(1/δ)kγloglog(1/δ))O(n1/6)2exp(logk6pDKL(pϵ1p))\displaystyle\geq(n-k)\bigg{(}1-\delta-O\left(\frac{\log k\log\log(1/\delta)}{k^{\frac{\gamma}{\log\log(1/\delta)}}}\right)-O(n^{-1/6})-2\exp\left(-\frac{\sqrt{\log k}}{6p{D_{\mathrm{KL}}}(p-{\epsilon}\|1-p)}\right)
O(lognn)O(ϵαn)).\displaystyle\;\;\;-O\left(\sqrt{\tfrac{\log n}{n}}\right)-O\left(\frac{{\epsilon}\alpha}{n}\right)\bigg{)}. (14)

By the law of total expectation and the fact that NN^{\prime} is always non-negative, we have

(𝐡0α,𝐥0α)𝒩𝒞𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)𝖤[N|w(𝐱)=k1,(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)]\displaystyle\sum_{({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{N}\cap\mathcal{C}}\operatorname{\mathsf{P}}(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)\operatorname{\mathsf{E}}[N^{\prime}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w({\mathbf{x}})=k-1,(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})]
𝖤[N|w(𝐱)=k1]=nkϵn.\displaystyle\leq\operatorname{\mathsf{E}}[N^{\prime}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w({\mathbf{x}})=k-1]=n-k-{\epsilon}n. (15)

Because ϵ=(logk)1/4{\epsilon}=(\log k)^{-1/4}, we have

δ+logkloglog(1/δ)kγloglog(1/δ)+n1/6+2exp(logk6pDKL(pϵ1p))+lognn+ϵαn=o(ϵ).\delta+\frac{\log k\log\log(1/\delta)}{k^{\frac{\gamma}{\log\log(1/\delta)}}}+n^{-1/6}+2\exp\left(-\frac{\sqrt{\log k}}{6p{D_{\mathrm{KL}}}(p-{\epsilon}\|1-p)}\right)+\sqrt{\tfrac{\log n}{n}}+\frac{{\epsilon}\alpha}{n}=o(\epsilon).

Plugging (14) and (15) into (12) yields

𝖯(𝖳𝖧^k=0|w(𝐱)=k)(1o(1))ϵδ(kδ)ϵ/2.\operatorname{\mathsf{P}}(\widehat{\mathsf{TH}}_{k}=0\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k)\geq(1-o(1)){\epsilon}\delta\left(\tfrac{k}{\delta}\right)^{{\epsilon}/2}.

Finally, ϵ=(logk)1/4{\epsilon}=(\log k)^{-1/4} implies that ϵ(kδ)ϵ/2=ω(1){\epsilon}\left(\tfrac{k}{\delta}\right)^{{\epsilon}/2}=\omega(1). This shows that

𝖯(𝖳𝖧^k=0|w(𝐱)=k)ω(δ).\operatorname{\mathsf{P}}(\widehat{\mathsf{TH}}_{k}=0\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k)\geq\omega(\delta).

To finish the proof in this regime, we are left to prove Propositions 1 and 2 and Lemma 1.

A-A1 Proof of Propositions

Proof of Proposition 1.

We define error events

2{j𝒯Lj(nk+1)(12exp(logk6pDKL(pϵ1p)))(1lognn)},{\mathcal{E}}_{2}\triangleq\left\{\sum_{j\in{\mathcal{T}}}L_{j}\leq(n-k+1)\left(1-2\exp\left(-\frac{\sqrt{\log k}}{6p{D_{\mathrm{KL}}}(p-{\epsilon}\|1-p)}\right)\right)\left(1-\sqrt{\frac{\log n}{n}}\right)\right\},
3{j𝒯:Lj[(1Δj0)(nk+1)Pj0,(1+Δj0)(nk+1)Pj0]},{\mathcal{E}}_{3}\triangleq\{\exists j\in{\mathcal{T}}\mathchar 58\relax L_{j}\notin[(1-\Delta_{j}^{0})(n-k+1)P_{j}^{0},(1+\Delta_{j}^{0})(n-k+1)P_{j}^{0}]\},

and

4{j𝒯:Hj1}.{\mathcal{E}}_{4}\triangleq\{\exists j\in{\mathcal{T}}\mathchar 58\relax H_{j}\geq 1\}.

Notice that events 2,3{\mathcal{E}}_{2},{\mathcal{E}}_{3} and 4{\mathcal{E}}_{4} correspond to the second, third and fourth conditions in the definition of 𝒞\mathcal{C}. The first condition is always satisfied because we always have i=0αHi=k1\sum_{i=0}^{\alpha}H_{i}=k-1 and i=0αLi=nk+1\sum_{i=0}^{\alpha}L_{i}=n-k+1 given w(𝐱)=k1w(\mathbf{x})=k-1. It follows that

(𝐡0α,𝐥0α)𝒞𝖯({(𝐇0α,𝐋0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)\displaystyle\sum_{({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{C}}\operatorname{\mathsf{P}}(\{({\bf H}_{0}^{\alpha},{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)
=1𝖯(234|w(𝐱)=k1)\displaystyle=1-\operatorname{\mathsf{P}}({\mathcal{E}}_{2}\cup{\mathcal{E}}_{3}\cup{\mathcal{E}}_{4}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)
1𝖯(2|w(𝐱)=k1)𝖯(3|w(𝐱)=k1)𝖯(4|w(𝐱)=k1),\displaystyle\geq 1-\operatorname{\mathsf{P}}({\mathcal{E}}_{2}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)-\operatorname{\mathsf{P}}({\mathcal{E}}_{3}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)-\operatorname{\mathsf{P}}({\mathcal{E}}_{4}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1),

where the last inequality follows by the union bound. We bound the probabilities of these three error events in the following lemmas, whose proofs are deferred to a later part of this section.

Lemma 2.

For any 𝐱\mathbf{x} with w(𝐱)=k1w(\mathbf{x})=k-1, we have 𝖯(2|𝐱)n1/5.\operatorname{\mathsf{P}}({\mathcal{E}}_{2}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x})\leq n^{-1/5}.

Lemma 3.

For any 𝐱\mathbf{x} with w(𝐱)=k1w(\mathbf{x})=k-1, we have 𝖯(3|𝐱)n1/6\operatorname{\mathsf{P}}({\mathcal{E}}_{3}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x})\leq n^{-1/6}.

Lemma 4.

For any 𝐱\mathbf{x} with w(𝐱)=k1w(\mathbf{x})=k-1, we have 𝖯(4|𝐱)=O(logkloglog(1/δ)kγloglog(1/δ))\operatorname{\mathsf{P}}({\mathcal{E}}_{4}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x})=O\left(\frac{\log k\log\log(1/\delta)}{k^{\frac{\gamma}{\log\log(1/\delta)}}}\right), where γ\gamma is a positive constant.

By Lemmas 2, 3 and 4, we have

(𝐡0α,𝐥0α)𝒞𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)\displaystyle\sum_{({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{C}}\operatorname{\mathsf{P}}(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1) 1n1/5n1/6O(logkloglog(1/δ)kγloglog(1/δ))\displaystyle\geq 1-n^{-1/5}-n^{-1/6}-O\left(\frac{\log k\log\log(1/\delta)}{k^{\frac{\gamma}{\log\log(1/\delta)}}}\right)
=1O(n1/6)O(logkloglog(1/δ)kγloglog(1/δ)),\displaystyle=1-O(n^{-1/6})-O\left(\frac{\log k\log\log(1/\delta)}{k^{\frac{\gamma}{\log\log(1/\delta)}}}\right),

which completes the proof of Proposition 1. ∎
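
As a rough numerical illustration of the typicality captured by Proposition 1 (a sanity check only, under our own parameter choices), one can sample the bin counts given w(𝐱)=k−1 using the multinomial distributions recalled in the proof of Proposition 2 below, with 𝒯 taken, as in the proof of Lemma 2 below, to be the bins j with |j−αp|<ϵα, and observe that almost all light balls land in the typical bins while essentially no heavy ball does.

```python
# Rough Monte Carlo sanity check of the typicality in Proposition 1 (illustrative
# only).  Under w(x) = k-1, a heavy ball's bin follows Binomial(alpha, 1-p) and a
# light ball's bin follows Binomial(alpha, p); T collects the bins j with
# |j - alpha*p| < eps*alpha.  All numerical values are illustrative choices.
import random
from math import comb

def binom_pmf(alpha, q):
    return [comb(alpha, j) * q ** j * (1 - q) ** (alpha - j) for j in range(alpha + 1)]

n, k, alpha, p, eps = 10_000, 100, 60, 0.1, 0.1
P1, P0 = binom_pmf(alpha, 1 - p), binom_pmf(alpha, p)
T = [j for j in range(alpha + 1) if abs(j - alpha * p) < eps * alpha]

rng = random.Random(0)
H = [0] * (alpha + 1)
L = [0] * (alpha + 1)
for _ in range(k - 1):                     # heavy balls (bits equal to 1)
    H[rng.choices(range(alpha + 1), weights=P1)[0]] += 1
for _ in range(n - k + 1):                 # light balls (bits equal to 0)
    L[rng.choices(range(alpha + 1), weights=P0)[0]] += 1

print("fraction of light balls in typical bins:", sum(L[j] for j in T) / (n - k + 1))
print("heavy balls in typical bins:", sum(H[j] for j in T))
```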

Proof of Proposition 2.

To begin with, we notice that on any input 𝐱\mathbf{x} with w(𝐱)=k1w(\mathbf{x})=k-1, the genie-aided information is exactly 𝐇¯0α=𝐇0α\bar{\bf H}_{0}^{\alpha}={\bf H}_{0}^{\alpha} and 𝐋¯0α=𝐋0α\bar{\bf L}_{0}^{\alpha}={\bf L}_{0}^{\alpha}. Moreover, conditioned on w(𝐱)=k1w(\mathbf{x})=k-1, we know that 𝐇0α{\bf H}_{0}^{\alpha} represents the number of heavy balls in each bin, and 𝐋0α{\bf L}_{0}^{\alpha} represents the number of light balls in each bin. Therefore, given w(𝐱)=k1w(\mathbf{x})=k-1, 𝐇0α{\bf H}_{0}^{\alpha} and 𝐋0α{\bf L}_{0}^{\alpha} are two independent random vectors with multinomial conditional distributions 𝐇0α|{w(𝐱)=k1}M(k1;P01,,Pα1){\bf H}_{0}^{\alpha}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\{w(\mathbf{x})=k-1\}\sim\mathrm{M}(k-1;P_{0}^{1},\ldots,P_{\alpha}^{1}) and 𝐋0α|{w(𝐱)=k1}M(nk+1;P00,,Pα0){\bf L}_{0}^{\alpha}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\{w(\mathbf{x})=k-1\}\sim\mathrm{M}(n-k+1;P_{0}^{0},\ldots,P_{\alpha}^{0}) respectively. Here, we say a random vector 𝐙=(Z1,,Zm)\mathbf{Z}=(Z_{1},\ldots,Z_{m}) follows multinomial distribution M(N;p1,,pm)\mathrm{M}(N;p_{1},\ldots,p_{m}) if its pmf is given by 𝖯(𝐙=𝐳)=N!i=1mpizii=1mzi!.\operatorname{\mathsf{P}}(\mathbf{Z}=\mathbf{z})=\frac{N!\prod_{i=1}^{m}p_{i}^{z_{i}}}{\prod_{i=1}^{m}z_{i}!}. Therefore, the denominator in (10) can be written as

𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)\displaystyle\operatorname{\mathsf{P}}\left(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1\right)
=(k1)!i=0α(Pi1)hii=0αhi!(nk+1)!i=0α(Pi0)lii=0αli!.\displaystyle=\frac{(k-1)!\prod_{i=0}^{\alpha}(P_{i}^{1})^{h_{i}}}{\prod_{i=0}^{\alpha}h_{i}!}\frac{(n-k+1)!\prod_{i=0}^{\alpha}(P_{i}^{0})^{l_{i}}}{\prod_{i=0}^{\alpha}l_{i}!}. (16)
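
The following short snippet is a direct, purely illustrative transcription of (16): the conditional probability of a realization (𝐡0α,𝐥0α) given w(𝐱)=k−1 factors into two multinomial pmfs, one for the k−1 heavy balls and one for the n−k+1 light balls. The function names are ours.

```python
# Illustrative transcription of (16): product of two multinomial pmfs for the
# heavy-ball and light-ball bin counts, conditioned on w(x) = k-1.
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """pmf of M(sum(counts); probs) evaluated at the vector `counts`."""
    return (factorial(sum(counts)) * prod(q ** c for c, q in zip(counts, probs))
            / prod(factorial(c) for c in counts))

def joint_pmf_16(h, l, P1, P0):
    """Right-hand side of (16) for bin counts h (heavy balls) and l (light balls)."""
    return multinomial_pmf(h, P1) * multinomial_pmf(l, P0)
```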

Now, we move on to consider the numerator in (10). Recall that when w(𝐱)=kw(\mathbf{x})=k, the genie picks a bin with random index J𝒯J\in{\mathcal{T}}. If there exists a heavy ball in bin JJ, the genie randomly chooses a heavy ball from there and keeps it unrevealed. The numerator in (10) can be written as

𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}1|w(𝐱)=k)\displaystyle\operatorname{\mathsf{P}}\left(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\cap{\mathcal{E}}_{1}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k\right)
=j𝒯𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}{J=j}1|w(𝐱)=k)\displaystyle=\sum_{j\in{\mathcal{T}}}\operatorname{\mathsf{P}}\left(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\cap\{J=j\}\cap{\mathcal{E}}_{1}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k\right)
=j𝒯𝖯({(𝐇0α,𝐋0α)=(𝐡0α+𝐞j,𝐥0α𝐞j)}{J=j}1|w(𝐱)=k),\displaystyle=\sum_{j\in{\mathcal{T}}}\operatorname{\mathsf{P}}\left(\{({\bf H}_{0}^{\alpha},{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha}+{\bf e}_{j},{\bf l}_{0}^{\alpha}-{\bf e}_{j})\}\cap\{J=j\}\cap{\mathcal{E}}_{1}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k\right), (17)

where for each j{0,,α}j\in\{0,\ldots,\alpha\}, 𝐞j{\bf e}_{j} is the length-(α+1)(\alpha+1) indicator vector for the (j+1)(j+1)-th coordinate, i.e., the (j+1)(j+1)-th coordinate has value one, while all other coordinates have value zero. By the chain rule, we have

𝖯({(𝐇0α,𝐋0α)=(𝐡0α+𝐞j,𝐥0α𝐞j)}{J=j}1|w(𝐱)=k)\displaystyle\operatorname{\mathsf{P}}\left(\{({\bf H}_{0}^{\alpha},{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha}+{\bf e}_{j},{\bf l}_{0}^{\alpha}-{\bf e}_{j})\}\cap\{J=j\}\cap{\mathcal{E}}_{1}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k\right)
=𝖯({𝐇0α=𝐡0α+𝐞j}{𝐋0α=𝐥0α𝐞j}|w(𝐱)=k)\displaystyle=\operatorname{\mathsf{P}}(\{{\bf H}_{0}^{\alpha}={\bf h}_{0}^{\alpha}+{\bf e}_{j}\}\cap\{{\bf L}_{0}^{\alpha}={\bf l}_{0}^{\alpha}-{\bf e}_{j}\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k)
𝖯({J=j}|𝐇0α=𝐡0α+𝐞j,𝐋0α=𝐥0α𝐞j,w(𝐱)=k)\displaystyle\;\;\cdot\operatorname{\mathsf{P}}(\{J=j\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}{\bf H}_{0}^{\alpha}={\bf h}_{0}^{\alpha}+{\bf e}_{j},{\bf L}_{0}^{\alpha}={\bf l}_{0}^{\alpha}-{\bf e}_{j},w(\mathbf{x})=k)
𝖯(1|J=j,𝐇0α=𝐡0α+𝐞j,𝐋0α=𝐥0α𝐞j,w(𝐱)=k).\displaystyle\;\;\cdot\operatorname{\mathsf{P}}({\mathcal{E}}_{1}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}J=j,{\bf H}_{0}^{\alpha}={\bf h}_{0}^{\alpha}+{\bf e}_{j},{\bf L}_{0}^{\alpha}={\bf l}_{0}^{\alpha}-{\bf e}_{j},w(\mathbf{x})=k). (18)

Now, we bound the three probabilities in (18) separately. Firstly, given w(𝐱)=kw(\mathbf{x})=k, 𝐇0α{\bf H}_{0}^{\alpha} and 𝐋0α{\bf L}_{0}^{\alpha} are two independent random vectors with conditional distributions 𝐇0α|{w(𝐱)=k}M(k;P01,Pα1){\bf H}_{0}^{\alpha}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\{w(\mathbf{x})=k\}\sim\mathrm{M}(k;P_{0}^{1}\ldots,P_{\alpha}^{1}) and 𝐋0α|{w(𝐱)=k}M(nk;P00,Pα0){\bf L}_{0}^{\alpha}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\{w(\mathbf{x})=k\}\sim\mathrm{M}(n-k;P_{0}^{0}\ldots,P_{\alpha}^{0}) respectively. This suggests that

𝖯({𝐇0α=𝐡0α+𝐞j}{𝐋0α=𝐥0α𝐞j}|w(𝐱)=k)=k!Pj1i=0α(Pi1)hi(hj+1)i=0αhi!(nk)!lji=0α(Pi0)liPj0i=0αli!.\displaystyle\operatorname{\mathsf{P}}(\{{\bf H}_{0}^{\alpha}={\bf h}_{0}^{\alpha}+{\bf e}_{j}\}\cap\{{\bf L}_{0}^{\alpha}={\bf l}_{0}^{\alpha}-{\bf e}_{j}\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k)=\frac{k!P_{j}^{1}\prod_{i=0}^{\alpha}(P_{i}^{1})^{h_{i}}}{(h_{j}+1)\prod_{i=0}^{\alpha}h_{i}!}\frac{(n-k)!l_{j}\prod_{i=0}^{\alpha}(P_{i}^{0})^{l_{i}}}{P_{j}^{0}\prod_{i=0}^{\alpha}l_{i}!}. (19)

Secondly,

𝖯({J=j}|𝐇0α=𝐡0α+𝐞j,𝐋0α=𝐥0α𝐞j,w(𝐱)=k)=lj11+j𝒯lj\displaystyle\operatorname{\mathsf{P}}(\{J=j\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}{\bf H}_{0}^{\alpha}={\bf h}_{0}^{\alpha}+{\bf e}_{j},{\bf L}_{0}^{\alpha}={\bf l}_{0}^{\alpha}-{\bf e}_{j},w(\mathbf{x})=k)=\frac{l_{j}-1}{-1+\sum_{j\in{\mathcal{T}}}l_{j}} (20)

by the random process of choosing JJ. Finally, by the symmetry among the balls in the same bin, it suffices for the adaptive phase of the algorithm to choose the number of balls to be revealed in each bin. Recall that we assume without loss of generality that the adaptive phase only reveals balls in sets {¯j:j𝒯}\{\bar{\mathcal{L}}_{j}\mathchar 58\relax j\in\mathcal{T}\}. For j𝒯j\in\mathcal{T}, let NjN^{\prime}_{j} denote the number of balls revealed in ¯j\bar{\mathcal{L}}_{j}. Note that NjN^{\prime}_{j}’s are random variables satisfying j𝒯Nj=N\sum_{j\in\mathcal{T}}N^{\prime}_{j}=N^{\prime} by the construction of 𝒜\mathcal{A}^{\prime}. Notice that JJ and w(𝐱)w({\mathbf{x}}) are both variables hidden from 𝒜\mathcal{A}^{\prime}. Moreover, the adaptive phase can only lead to two possible outcomes: a heavy ball is found or no heavy ball is found. Therefore, we can assume without loss of generality that the distribution of NjN^{\prime}_{j}’s only depends on the observations ¯0α\bar{\mathcal{H}}_{0}^{\alpha} and ¯0α\bar{\mathcal{L}}_{0}^{\alpha}. Because the probability of finding the heavy ball in ¯j\bar{\mathcal{L}}_{j} by revealing ii balls is ilj\frac{i}{l_{j}}, it follows that

𝖯(1|J=j,𝐇0α=𝐡0α+𝐞j,𝐋0α=𝐥0α𝐞j,w(𝐱)=k)\displaystyle\operatorname{\mathsf{P}}({\mathcal{E}}_{1}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}J=j,{\bf H}_{0}^{\alpha}={\bf h}_{0}^{\alpha}+{\bf e}_{j},{\bf L}_{0}^{\alpha}={\bf l}_{0}^{\alpha}-{\bf e}_{j},w(\mathbf{x})=k)
=i=0lj𝖯(Nj=i|J=j,𝐇0α=𝐡0α+𝐞j,𝐋0α=𝐥0α𝐞j,w(𝐱)=k)(1ilj)\displaystyle=\sum_{i=0}^{l_{j}}\operatorname{\mathsf{P}}(N^{\prime}_{j}=i\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}J=j,{\bf H}_{0}^{\alpha}={\bf h}_{0}^{\alpha}+{\bf e}_{j},{\bf L}_{0}^{\alpha}={\bf l}_{0}^{\alpha}-{\bf e}_{j},w(\mathbf{x})=k)\left(1-\frac{i}{l_{j}}\right)
1𝖤[Nj|J=j,𝐇0α=𝐡0α+𝐞j,𝐋0α=𝐥0α𝐞j,w(𝐱)=k]lj1\displaystyle\geq 1-\frac{\operatorname{\mathsf{E}}[N^{\prime}_{j}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}J=j,{\bf H}_{0}^{\alpha}={\bf h}_{0}^{\alpha}+{\bf e}_{j},{\bf L}_{0}^{\alpha}={\bf l}_{0}^{\alpha}-{\bf e}_{j},w(\mathbf{x})=k]}{l_{j}-1}
=1𝖤[Nj|J=j,𝐇¯0α=𝐡0α,𝐋¯0α=𝐥0α,w(𝐱)=k]lj1\displaystyle=1-\frac{\operatorname{\mathsf{E}}[N^{\prime}_{j}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}J=j,\bar{\bf H}_{0}^{\alpha}={\bf h}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha}={\bf l}_{0}^{\alpha},w(\mathbf{x})=k]}{l_{j}-1}
=1𝖤[Nj|𝐇¯0α=𝐡0α,𝐋¯0α=𝐥0α,w(𝐱)=k1]lj1,\displaystyle=1-\frac{\operatorname{\mathsf{E}}[N^{\prime}_{j}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\bar{\bf H}_{0}^{\alpha}={\bf h}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha}={\bf l}_{0}^{\alpha},w(\mathbf{x})=k-1]}{l_{j}-1}, (21)

where the penultimate equality follows because {J=j,𝐇0α=𝐡0α+𝐞j,𝐋0α=𝐥0α𝐞j}={J=j,𝐇¯0α=𝐡0α,𝐋¯0α=𝐥0α}\{J=j,{\bf H}_{0}^{\alpha}={\bf h}_{0}^{\alpha}+{\bf e}_{j},{\bf L}_{0}^{\alpha}={\bf l}_{0}^{\alpha}-{\bf e}_{j}\}=\{J=j,\bar{\bf H}_{0}^{\alpha}={\bf h}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha}={\bf l}_{0}^{\alpha}\}, and the last equality follows because NjN^{\prime}_{j} is conditionally independent of w(𝐱)w({\mathbf{x}}) and JJ given 𝐇¯0α\bar{\bf H}_{0}^{\alpha} and 𝐋¯0α\bar{\bf L}_{0}^{\alpha}. Plugging (19), (20) and (21) into (18) yields

𝖯({(𝐇0α,𝐋0α)=(𝐡0α+𝐞j,𝐥0α𝐞j)}{J=j}1|w(𝐱)=k)\displaystyle\operatorname{\mathsf{P}}\left(\{({\bf H}_{0}^{\alpha},{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha}+{\bf e}_{j},{\bf l}_{0}^{\alpha}-{\bf e}_{j})\}\cap\{J=j\}\cap{\mathcal{E}}_{1}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k\right)
k!Pj1i=0α(Pi1)hi(hj+1)i=0αhi!(nk)!lji=0α(Pi0)liPj0i=0αli!lj1𝖤[Nj|𝐇¯0α=𝐡0α,𝐋¯0α=𝐥0α,w(𝐱)=k1]1+j𝒯lj.\displaystyle\geq\frac{k!P_{j}^{1}\prod_{i=0}^{\alpha}(P_{i}^{1})^{h_{i}}}{(h_{j}+1)\prod_{i=0}^{\alpha}h_{i}!}\cdot\frac{(n-k)!l_{j}\prod_{i=0}^{\alpha}(P_{i}^{0})^{l_{i}}}{P_{j}^{0}\prod_{i=0}^{\alpha}l_{i}!}\cdot\frac{l_{j}-1-\operatorname{\mathsf{E}}[N^{\prime}_{j}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\bar{\bf H}_{0}^{\alpha}={\bf h}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha}={\bf l}_{0}^{\alpha},w(\mathbf{x})=k-1]}{-1+\sum_{j\in{\mathcal{T}}}l_{j}}. (22)

Then, we plug (22) into (17) and divide by (16) to get

𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}1|w(𝐱)=k)𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)\displaystyle\frac{\operatorname{\mathsf{P}}\left(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\cap{\mathcal{E}}_{1}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k\right)}{\operatorname{\mathsf{P}}\left(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1\right)}
≥j𝒯kPj1hj+1lj(nk+1)Pj0lj1𝖤[Nj|𝐇¯0α=𝐡0α,𝐋¯0α=𝐥0α,w(𝐱)=k1]1+i𝒯li.\displaystyle\geq\sum_{j\in{\mathcal{T}}}\frac{kP_{j}^{1}}{h_{j}+1}\cdot\frac{l_{j}}{(n-k+1)P_{j}^{0}}\cdot\frac{l_{j}-1-\operatorname{\mathsf{E}}[N^{\prime}_{j}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\bar{\bf H}_{0}^{\alpha}={\bf h}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha}={\bf l}_{0}^{\alpha},w(\mathbf{x})=k-1]}{-1+\sum_{i\in{\mathcal{T}}}l_{i}}. (23)

To complete the proof, we need a technical lemma that bounds the probabilities Pj1P_{j}^{1} and Pj0P_{j}^{0}.

Lemma 5.

For each j𝒯j\in{\mathcal{T}}, we have Pj0exp(o(logk))P_{j}^{0}\geq\exp(-o(\log k)), and δk(kδ)ϵ/2Pj1k1γloglog(1/δ)\frac{\delta}{k}\left(\frac{k}{\delta}\right)^{{\epsilon}/2}\leq P_{j}^{1}\leq k^{-1-\frac{\gamma}{\log\log(1/\delta)}} for some positive constant γ\gamma.

Because (𝐡0α,𝐥0α)𝒞({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{C}, we have hj=0h_{j}=0 and (1Δj0)(nk+1)Pj0li(1+Δj0)(nk+1)Pj0(1-\Delta_{j}^{0})(n-k+1)P_{j}^{0}\leq l_{i}\leq(1+\Delta_{j}^{0})(n-k+1)P_{j}^{0} for each j𝒯j\in{\mathcal{T}}. Recall that we defined Δj0=logn(nk+1)Pj0\Delta_{j}^{0}=\sqrt{\frac{\log n}{(n-k+1)P_{j}^{0}}}. It follows by Lemma 5 that Δj0=o(1)\Delta_{j}^{0}=o(1) for each j𝒯j\in{\mathcal{T}}, which further implies that lj=(1+o(1))(nk+1)Pj0l_{j}=(1+o(1))(n-k+1)P_{j}^{0}. We then have

j𝒯kPj1hj+1lj(nk+1)Pj0lj1𝖤[Nj|𝐇¯0α=𝐡0α,𝐋¯0α=𝐥0α,w(𝐱)=k1]1+i𝒯li\displaystyle\sum_{j\in{\mathcal{T}}}\frac{kP_{j}^{1}}{h_{j}+1}\cdot\frac{l_{j}}{(n-k+1)P_{j}^{0}}\cdot\frac{l_{j}-1-\operatorname{\mathsf{E}}[N^{\prime}_{j}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\bar{\bf H}_{0}^{\alpha}={\bf h}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha}={\bf l}_{0}^{\alpha},w(\mathbf{x})=k-1]}{-1+\sum_{i\in{\mathcal{T}}}l_{i}}
δ(kδ)ϵ/2(1+o(1))j𝒯lj1𝖤[Nj|𝐇¯0α=𝐡0α,𝐋¯0α=𝐥0α,w(𝐱)=k1]1+i𝒯li\displaystyle\geq\delta\left(\frac{k}{\delta}\right)^{{\epsilon}/2}(1+o(1))\sum_{j\in{\mathcal{T}}}\frac{l_{j}-1-\operatorname{\mathsf{E}}[N^{\prime}_{j}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\bar{\bf H}_{0}^{\alpha}={\bf h}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha}={\bf l}_{0}^{\alpha},w(\mathbf{x})=k-1]}{-1+\sum_{i\in{\mathcal{T}}}l_{i}}
(a)δn(kδ)ϵ/2(1+o(1))((2ϵα+1)𝖤[N|𝐇¯0α=𝐡0α,𝐋¯0α=𝐥0α,w(𝐱)=k1]+j𝒯lj)\displaystyle\stackrel{{\scriptstyle\mathchoice{\hbox to0.0pt{\hss$\displaystyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\textstyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\scriptstyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\scriptscriptstyle{(a)}$\hss}}}}{{\geq}}\frac{\delta}{n}\left(\frac{k}{\delta}\right)^{{\epsilon}/2}(1+o(1))\left(-(2{\epsilon}\alpha+1)-\operatorname{\mathsf{E}}[N^{\prime}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\bar{\bf H}_{0}^{\alpha}={\bf h}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha}={\bf l}_{0}^{\alpha},w(\mathbf{x})=k-1]+\sum_{j\in{\mathcal{T}}}l_{j}\right)
(b)(1+o(1))δn(kδ)ϵ/2(n𝖤[N|w(𝐱)=k1,(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)]),\displaystyle\stackrel{{\scriptstyle\mathchoice{\hbox to0.0pt{\hss$\displaystyle{(b)}$\hss}}{\hbox to0.0pt{\hss$\textstyle{(b)}$\hss}}{\hbox to0.0pt{\hss$\scriptstyle{(b)}$\hss}}{\hbox to0.0pt{\hss$\scriptscriptstyle{(b)}$\hss}}}}{{\geq}}\frac{(1+o(1))\delta}{n}\left(\frac{k}{\delta}\right)^{{\epsilon}/2}\left(n^{*}-\operatorname{\mathsf{E}}[N^{\prime}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w({\mathbf{x}})=k-1,(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})]\right),

where (a) follows because 1+i𝒯lin-1+\sum_{i\in{\mathcal{T}}}l_{i}\leq n and (b) follows because

i𝒯li(nk+1)(12exp(logk6pDKL(pϵ1p)))(1lognn)\sum_{i\in{\mathcal{T}}}l_{i}\geq(n-k+1)\left(1-2\exp\left(-\frac{\sqrt{\log k}}{6p{D_{\mathrm{KL}}}(p-{\epsilon}\|1-p)}\right)\right)\left(1-\sqrt{\frac{\log n}{n}}\right)

on the assumption that (𝐡0α,𝐥0α)𝒞({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{C}. ∎

A-A2 Proof of Lemmas

Proof of Lemma 1.

The key idea in this proof is to show that the original Algorithm 𝒜\mathcal{A} can be simulated by executing Algorithm 𝒜\mathcal{A}^{\prime}.

To show this simulation, we assume that the non-adaptive phase of 𝒜\mathcal{A}^{\prime} is completed, and simulate the execution of 𝒜\mathcal{A} in the adaptive phase of 𝒜\mathcal{A}^{\prime}. When Algorithm 𝒜\mathcal{A} makes a query on bit xjx_{j} for the iith time, where iαi\leq\alpha, Algorithm 𝒜\mathcal{A}^{\prime} returns the response to the iith query on xjx_{j} from its non-adaptive phase to Algorithm 𝒜\mathcal{A}. When Algorithm 𝒜\mathcal{A} makes a query on bit xjx_{j} for the α+1\alpha+1st time, Algorithm 𝒜\mathcal{A}^{\prime} noiselessly reads the bit xjx_{j}, and returns response xj𝖡𝖾𝗋𝗇(p)x_{j}\oplus\mathsf{Bern}(p) to 𝒜\mathcal{A}. When Algorithm 𝒜\mathcal{A} makes a query on bit xjx_{j} for the iith time, where iα+2i\geq\alpha+2, because Algorithm 𝒜\mathcal{A}^{\prime} already has the value of xjx_{j}, it returns response xj𝖡𝖾𝗋𝗇(p)x_{j}\oplus\mathsf{Bern}(p) to Algorithm 𝒜\mathcal{A}. Let NN denote the number of bits that are queried by 𝒜\mathcal{A} for at least α\alpha times. Recall that MM denotes the total number of queries made by 𝒜\mathcal{A}. Because we assume 𝖤[M|𝐱](1ϵ)(nkϵn)logkδDKL(pϵ1p)\operatorname{\mathsf{E}}[M\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x}]\leq\frac{(1-\epsilon)(n-k-\epsilon n)\log\frac{k}{\delta}}{{D_{\mathrm{KL}}}(p-\epsilon\|1-p)}, we have

(1ϵ)(nkϵn)logkδDKL(pϵ1p)𝖤[M|𝐱]i=0n𝖯(N=i|𝐱)iα=α𝖤[N|𝐱],\frac{(1-\epsilon)(n-k-\epsilon n)\log\frac{k}{\delta}}{{D_{\mathrm{KL}}}(p-\epsilon\|1-p)}\geq\operatorname{\mathsf{E}}[M\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x}]\geq\sum_{i=0}^{n}\operatorname{\mathsf{P}}(N=i\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x})\cdot i\alpha=\alpha\operatorname{\mathsf{E}}[N\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x}],

for any input instance 𝐱{\mathbf{x}}, and it follows that 𝖤[N|𝐱]nkϵn\operatorname{\mathsf{E}}[N\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x}]\leq n-k-{\epsilon}n. Thus, this simulation process uses at most nkϵnn-k-{\epsilon}n noiseless bit reads in expectation in the adaptive phase of Algorithm 𝒜\mathcal{A}^{\prime}. Finally, it is not hard to see that the output distribution of this simulation is the same as the output distribution of Algorithm 𝒜\mathcal{A}. ∎
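
A minimal sketch (our own code, not the paper's pseudocode) of the simulation just described: 𝒜′ answers the first α queries to a bit from the responses stored during its non-adaptive phase, performs one noiseless read at the (α+1)-st query, and afterwards answers with synthetic noise xj⊕Bern(p), so that 𝒜 observes the same response distribution as in the original noisy-query model.

```python
# Minimal sketch of the simulation argument in the proof of Lemma 1 (illustrative).
import random

class EnhancedSimulator:
    def __init__(self, x, alpha, p, seed=0):
        self.x, self.alpha, self.p = x, alpha, p
        self.rng = random.Random(seed)
        # non-adaptive phase: store alpha noisy readings of every bit up front
        self.stored = [[xj ^ (self.rng.random() < p) for _ in range(alpha)] for xj in x]
        self.num_queries = [0] * len(x)
        self.noiseless_reads = 0          # bits read noiselessly in the adaptive phase

    def query(self, j):
        """Answer one query of A on bit j."""
        i = self.num_queries[j]
        self.num_queries[j] += 1
        if i < self.alpha:                # reuse a response from the non-adaptive phase
            return self.stored[j][i]
        if i == self.alpha:               # (alpha+1)-st query: read x_j noiselessly
            self.noiseless_reads += 1
        return self.x[j] ^ (self.rng.random() < self.p)   # synthetic noisy answer
```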

Proof of Lemma 5.

By definition, we have Pj1=(αj)(1p)jpαjP_{j}^{1}=\binom{\alpha}{j}(1-p)^{j}p^{\alpha-j}. Notice that minj𝒯Pj1=Pα(pϵ)1\min_{j\in{\mathcal{T}}}P_{j}^{1}=P_{\alpha(p-{\epsilon})}^{1}. By Stirling’s approximation, we have

Pα(pϵ)1\displaystyle P_{\alpha(p-{\epsilon})}^{1} =(α(pϵ)α)(1p)α(pϵ)pα(1p+ϵ)\displaystyle=\binom{\alpha}{(p-{\epsilon})\alpha}(1-p)^{\alpha(p-{\epsilon})}p^{\alpha(1-p+{\epsilon})}
exp(α(DKL(pϵ1p)+O(logαα)))\displaystyle\geq\exp\left(-\alpha\left({D_{\mathrm{KL}}}(p-{\epsilon}\|1-p)+O\left(\frac{\log\alpha}{\alpha}\right)\right)\right)
=exp((1ϵ)logkδDKL(pϵ1p)(DKL(pϵ1p)+O(logαα)))\displaystyle=\exp\left(-\frac{(1-{\epsilon})\log\frac{k}{\delta}}{{D_{\mathrm{KL}}}(p-{\epsilon}\|1-p)}\left({D_{\mathrm{KL}}}(p-{\epsilon}\|1-p)+O\left(\frac{\log\alpha}{\alpha}\right)\right)\right)
=(δk)(1ϵ)(1+O((logα)/α))\displaystyle=\left(\frac{\delta}{k}\right)^{(1-{\epsilon})(1+O((\log\alpha)/\alpha))}
δk(kδ)ϵ/2\displaystyle\geq\frac{\delta}{k}\left(\frac{k}{\delta}\right)^{{\epsilon}/2}

To see the upper bound Pj1k1γloglog(1/δ)P_{j}^{1}\leq k^{-1-\frac{\gamma}{\log\log(1/\delta)}}, we observe that maxj𝒯Pj1=Pα(p+ϵ)1\max_{j\in{\mathcal{T}}}P_{j}^{1}=P_{\alpha(p+{\epsilon})}^{1}. By Stirling’s approximation, we have

Pα(p+ϵ)1\displaystyle P_{\alpha(p+{\epsilon})}^{1} =(α(p+ϵ)α)(1p)α(p+ϵ)pα(1pϵ)\displaystyle=\binom{\alpha}{(p+{\epsilon})\alpha}(1-p)^{\alpha(p+{\epsilon})}p^{\alpha(1-p-{\epsilon})}
exp(αDKL(p+ϵ1p))\displaystyle\leq\exp(-\alpha{D_{\mathrm{KL}}}(p+{\epsilon}\|1-p))
=exp((1ϵ)logkδDKL(pϵ1p)DKL(p+ϵ1p))\displaystyle=\exp\left(-\frac{(1-{\epsilon})\log\frac{k}{\delta}}{{D_{\mathrm{KL}}}(p-{\epsilon}\|1-p)}{D_{\mathrm{KL}}}(p+{\epsilon}\|1-p)\right)
=(a)exp((1ϵ)(1O(ϵ))logkδ)\displaystyle\stackrel{{\scriptstyle\mathchoice{\hbox to0.0pt{\hss$\displaystyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\textstyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\scriptstyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\scriptscriptstyle{(a)}$\hss}}}}{{=}}\exp\left(-(1-{\epsilon})(1-O({\epsilon}))\log\frac{k}{\delta}\right)
k1γloglog(1/δ),\displaystyle\leq k^{-1-\frac{\gamma}{\log\log(1/\delta)}},

where (a) follows because DKL(p+ϵ1p)DKL(pϵ1p)=1O(ϵ)\frac{{D_{\mathrm{KL}}}(p+{\epsilon}\|1-p)}{{D_{\mathrm{KL}}}(p-{\epsilon}\|1-p)}=1-O({\epsilon}), and the existence of positive constant γ\gamma is guaranteed by the fact that log1δlogkloglog(1/δ)\log\frac{1}{\delta}\geq\frac{\log k}{\log\log(1/\delta)} and ϵ=o(1loglog(1/δ)){\epsilon}=o(\frac{1}{\log\log(1/\delta)}).

We have defined Pj0=(αj)pj(1p)αjP_{j}^{0}=\binom{\alpha}{j}p^{j}(1-p)^{\alpha-j}. We can see that minj𝒯Pj0=min(Pα(pϵ)0,Pα(p+ϵ)0)\min_{j\in{\mathcal{T}}}P_{j}^{0}=\min(P_{\alpha(p-{\epsilon})}^{0},P_{\alpha(p+{\epsilon})}^{0}). Therefore, it suffices to show that Pα(pϵ)0exp(o(logk))P_{\alpha(p-{\epsilon})}^{0}\geq\exp(-o(\log k)) and Pα(p+ϵ)0exp(o(logk))P_{\alpha(p+{\epsilon})}^{0}\geq\exp(-o(\log k)). By Stirling’s approximation, we have

Pα(pϵ)0\displaystyle P_{\alpha(p-{\epsilon})}^{0} =(αα(pϵ))pα(pϵ)(1p)α(1p+ϵ)\displaystyle=\binom{\alpha}{\alpha(p-{\epsilon})}p^{\alpha(p-{\epsilon})}(1-p)^{\alpha(1-p+{\epsilon})}
exp(α(DKL(pϵp)+O(logαα)))\displaystyle\geq\exp\left(-\alpha\left({D_{\mathrm{KL}}}(p-{\epsilon}\|p)+O\left(\frac{\log\alpha}{\alpha}\right)\right)\right)
=exp((1ϵ)logkδDKL(pϵ1p)(DKL(pϵp)+O(logαα)))\displaystyle=\exp\left(-\frac{(1-{\epsilon})\log\frac{k}{\delta}}{{D_{\mathrm{KL}}}(p-{\epsilon}\|1-p)}\left({D_{\mathrm{KL}}}(p-{\epsilon}\|p)+O\left(\frac{\log\alpha}{\alpha}\right)\right)\right)
exp(o(logk)),\displaystyle\geq\exp(-o(\log k)),

where the last inequality follows because DKL(pϵp)=O(ϵ2){D_{\mathrm{KL}}}(p-{\epsilon}\|p)=O(\epsilon^{2}) and logkδlogk+loglog(1/δ)logk\log\frac{k}{\delta}\leq\log k+\log\log(1/\delta)\log k. It can be similarly shown that Pα(p+ϵ)0exp(o(logk))P_{\alpha(p+{\epsilon})}^{0}\geq\exp(-o(\log k)). ∎
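
For concreteness, and only as a numerical aside, the Stirling-type estimate used repeatedly in this proof can be checked directly: the Binomial(α,1−p) pmf at j=α(p−ϵ) matches exp(−αDKL(p−ϵ∥1−p)) up to a factor polynomial in α. The parameter values below are arbitrary illustrative choices.

```python
# Numerical illustration of the Stirling-type estimate in the proof of Lemma 5:
# the Binomial(alpha, 1-p) pmf at j = alpha*(p - eps) equals
# exp(-alpha * D_KL(p - eps || 1-p)) up to a factor polynomial in alpha.
from math import comb, exp, log

def d_kl_bern(q, r):
    return q * log(q / r) + (1 - q) * log((1 - q) / (1 - r))

alpha, p, eps = 200, 0.1, 0.02   # illustrative values
j = round(alpha * (p - eps))
pmf = comb(alpha, j) * (1 - p) ** j * p ** (alpha - j)
approx = exp(-alpha * d_kl_bern(j / alpha, 1 - p))
print(pmf, approx, pmf / approx)  # ratio decays only polynomially in alpha
```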

Proof of Lemma 2.

Let W0Binom(α,p)W_{0}\sim{\mathrm{Binom}}(\alpha,p). For any 𝐱\mathbf{x} with w(𝐱)=k1w(\mathbf{x})=k-1, we have j𝒯Lj|𝐱Binom(nk+1,𝖯(W0𝒯))\sum_{j\in{\mathcal{T}}}L_{j}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x}\sim{\mathrm{Binom}}(n-k+1,\operatorname{\mathsf{P}}(W_{0}\in{\mathcal{T}})). To bound 𝖯(2|𝐱)\operatorname{\mathsf{P}}({\mathcal{E}}_{2}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x}), we first bound 𝖯(W0𝒯)\operatorname{\mathsf{P}}(W_{0}\in{\mathcal{T}}). We have

𝖯(W0𝒯)\displaystyle\operatorname{\mathsf{P}}(W_{0}\notin{\mathcal{T}}) =𝖯(|W0αp|αϵ)\displaystyle=\operatorname{\mathsf{P}}(\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}W_{0}-\alpha p\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\geq\alpha{\epsilon})
(a)2exp(ϵ2α3p)\displaystyle\stackrel{{\scriptstyle\mathchoice{\hbox to0.0pt{\hss$\displaystyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\textstyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\scriptstyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\scriptscriptstyle{(a)}$\hss}}}}{{\leq}}2\exp\left(-\frac{{\epsilon}^{2}\alpha}{3p}\right)
(b)2exp((1ϵ)logkδ3pDKL(pϵ1p)logk)\displaystyle\stackrel{{\scriptstyle\mathchoice{\hbox to0.0pt{\hss$\displaystyle{(b)}$\hss}}{\hbox to0.0pt{\hss$\textstyle{(b)}$\hss}}{\hbox to0.0pt{\hss$\scriptstyle{(b)}$\hss}}{\hbox to0.0pt{\hss$\scriptscriptstyle{(b)}$\hss}}}}{{\leq}}2\exp\left(-\frac{(1-{\epsilon})\log\frac{k}{\delta}}{3p{D_{\mathrm{KL}}}(p-{\epsilon}\|1-p)\sqrt{\log k}}\right)
(c)2exp(logk6pDKL(pϵ1p)),\displaystyle\stackrel{{\scriptstyle\mathchoice{\hbox to0.0pt{\hss$\displaystyle{(c)}$\hss}}{\hbox to0.0pt{\hss$\textstyle{(c)}$\hss}}{\hbox to0.0pt{\hss$\scriptstyle{(c)}$\hss}}{\hbox to0.0pt{\hss$\scriptscriptstyle{(c)}$\hss}}}}{{\leq}}2\exp\left(-\frac{\sqrt{\log k}}{6p{D_{\mathrm{KL}}}(p-{\epsilon}\|1-p)}\right),

where (a) follows by the Chernoff bound (see, for example, Corollary 4.6 in Mitzenmacher and Upfal, (2017)), (b) follows because ϵ=(logk)1/4{\epsilon}=(\log k)^{-1/4}, and (c) follows because (1ϵ)logkδ12logk(1-{\epsilon})\log\frac{k}{\delta}\geq\frac{1}{2}\log k. For simplicity, we denote c=16pDKL(pϵ1p)c=\frac{1}{6p{D_{\mathrm{KL}}}(p-{\epsilon}\|1-p)}. It follows that

𝖯(2|𝐱)\displaystyle\operatorname{\mathsf{P}}({\mathcal{E}}_{2}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x})
𝖯(Binom(nk+1,12exp(clogk))\displaystyle\leq\operatorname{\mathsf{P}}\Bigg{(}\mathrm{Binom}\left(n-k+1,1-2\exp\left(-c\sqrt{\log k}\right)\right)\leq
(nk+1)(12exp(clogk))(1lognn))\displaystyle\hskip 20.00003pt(n-k+1)\left(1-2\exp\left(-c\sqrt{\log k}\right)\right)\left(1-\sqrt{\frac{\log n}{n}}\right)\Bigg{)}
(d)exp((nk+1)(12exp(clogk))logn2n)\displaystyle\stackrel{{\scriptstyle\mathchoice{\hbox to0.0pt{\hss$\displaystyle{(d)}$\hss}}{\hbox to0.0pt{\hss$\textstyle{(d)}$\hss}}{\hbox to0.0pt{\hss$\scriptstyle{(d)}$\hss}}{\hbox to0.0pt{\hss$\scriptscriptstyle{(d)}$\hss}}}}{{\leq}}\exp\left(-\frac{(n-k+1)\left(1-2\exp(-c\sqrt{\log k})\right)\log n}{2n}\right)
n1/5,\displaystyle\leq n^{-1/5},

where (d) follows by the Chernoff bound. ∎
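
The Chernoff bound invoked in step (a) can also be illustrated numerically (again, not part of the proof); the exact binomial tail is compared against 2exp(−ϵ²α/(3p)) for arbitrary illustrative parameters satisfying ϵ≤p.

```python
# Illustrative check of the Chernoff bound used in step (a): for W0 ~ Binom(alpha, p),
# P(|W0 - alpha*p| >= alpha*eps) <= 2*exp(-eps^2*alpha/(3p)) when eps <= p.
from math import comb, exp

alpha, p, eps = 200, 0.1, 0.05   # illustrative values

def binom_pmf(j):
    return comb(alpha, j) * p ** j * (1 - p) ** (alpha - j)

tail = sum(binom_pmf(j) for j in range(alpha + 1) if abs(j - alpha * p) >= alpha * eps)
bound = 2 * exp(-eps ** 2 * alpha / (3 * p))
print(tail, bound)
assert tail <= bound
```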

Proof of Lemma 3.

For any 𝐱\mathbf{x} with w(𝐱)=k1w(\mathbf{x})=k-1, we have Lj|𝐱Binom(nk+1,Pj0)L_{j}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x}\sim{\mathrm{Binom}}(n-k+1,P_{j}^{0}) for each j𝒯j\in{\mathcal{T}}. For any j𝒯j\in{\mathcal{T}}, by Chernoff bound we have

𝖯(|Lj(nk+1)Pj0|Δj0(nk+1)Pj0|𝐱)\displaystyle\operatorname{\mathsf{P}}(\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}L_{j}-(n-k+1)P_{j}^{0}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\geq\Delta_{j}^{0}(n-k+1)P_{j}^{0}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x})
2exp((Δj0)2(nk+1)Pj0/3)\displaystyle\leq 2\exp(-(\Delta_{j}^{0})^{2}(n-k+1)P_{j}^{0}/3)
=2n1/3,\displaystyle=2n^{-1/3},

where the last equality follows because Δj0=logn(nk+1)Pj0\Delta_{j}^{0}=\sqrt{\frac{\log n}{(n-k+1)P_{j}^{0}}}. By union bound, we have

𝖯(3|𝐱)(2α+1)2n1/3n1/6.\operatorname{\mathsf{P}}({\mathcal{E}}_{3}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x})\leq(2\alpha+1)2n^{-1/3}\leq n^{-1/6}. ∎

Proof of Lemma 4.

Let 𝐱\mathbf{x} be an input instance with w(𝐱)=k1w(\mathbf{x})=k-1. For each j𝒯j\in{\mathcal{T}}, we have Hj|𝐱Binom(k1,Pj1)H_{j}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x}\sim{\mathrm{Binom}}(k-1,P_{j}^{1}). From Lemma 5, we know that Pj1k1γloglog(1/δ)P_{j}^{1}\leq k^{-1-\frac{\gamma}{\log\log(1/\delta)}}, where γ\gamma is a positive constant. Therefore, we have 𝖤[Hj]kγloglog(1/δ)\operatorname{\mathsf{E}}[H_{j}]\leq k^{-\frac{\gamma}{\log\log(1/\delta)}} for each j𝒯j\in{\mathcal{T}}. By Markov’s inequality, we know that 𝖯(Hj1)kγloglog(1/δ)\operatorname{\mathsf{P}}(H_{j}\geq 1)\leq k^{-\frac{\gamma}{\log\log(1/\delta)}} for any j𝒯j\in{\mathcal{T}}. Applying a union bound over all j𝒯j\in{\mathcal{T}} yields

𝖯(4|𝐱)O(αkγloglog(1/δ))=O(logkloglog(1/δ)kγloglog(1/δ)).\operatorname{\mathsf{P}}({\mathcal{E}}_{4}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x})\leq O\left(\frac{\alpha}{k^{\frac{\gamma}{\log\log(1/\delta)}}}\right)=O\left(\frac{\log k\log\log(1/\delta)}{k^{\frac{\gamma}{\log\log(1/\delta)}}}\right). ∎

A-B Regime 2: logk>log(1/δ)loglog(1/δ)\log k>\log(1/\delta)\log\log(1/\delta)

In this section, we prove Theorem 1 in the regime logk>log(1/δ)loglog(1/δ)\log k>\log(1/\delta)\log\log(1/\delta). The proof in this regime is similar to the proof in the previous regime. The major difference is that 1δ\frac{1}{\delta} is relatively small compared to kk in this regime, so the statistics exhibit slightly different concentration behavior.

Let ϵ=(log1δ)1/4{\epsilon}=(\log\frac{1}{\delta})^{-1/4} in this regime. Because δ=o(1)\delta=o(1) and logk=ω(log(1/δ))\log k=\omega(\log(1/\delta)), we know that (1ϵ)(nkϵn)logkDKL(pϵ1p)=(1o(1))(nk)log(k/δ)DKL(p1p)\frac{(1-\epsilon)(n-k-{\epsilon}n)\log k}{{D_{\mathrm{KL}}}(p-\epsilon\|1-p)}=(1-o(1))\frac{(n-k)\log(k/\delta)}{{D_{\mathrm{KL}}(p\|1-p)}}. Therefore, it suffices to show that for any algorithm that makes at most (1ϵ)(nkϵn)logkDKL(pϵ1p)\frac{(1-\epsilon)(n-k-{\epsilon}n)\log k}{{D_{\mathrm{KL}}}(p-\epsilon\|1-p)} queries in expectation, the worst-case error probability is at least δ\delta.

Let 𝒜\mathcal{A} be an arbitrary algorithm that uses at most (1ϵ)(nkϵn)logkDKL(pϵ1p)\frac{(1-\epsilon)(n-k-{\epsilon}n)\log k}{{D_{\mathrm{KL}}}(p-\epsilon\|1-p)} queries in expectation. We slightly modify the procedure of the enhanced algorithm 𝒜\mathcal{A}^{\prime} defined in Section III as follows (a schematic sketch of the two phases is given after the list):

  • \bullet

    Non-adaptive phase: query each of the nn bits α(1ϵ)logkDKL(pϵ1p)\alpha\triangleq\frac{(1-\epsilon)\log k}{{D_{\mathrm{KL}}}(p-\epsilon\|1-p)} times.

  • \bullet

    Adaptive phase: Adaptively choose NN^{\prime} bits and noiselessly reveal their value, where the number of revealed bits NN^{\prime} satisfies 𝖤[N|𝐱]=nkϵn\operatorname{\mathsf{E}}[N^{\prime}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x}]=n-k-{\epsilon}n for any 𝐱\mathbf{x}.
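
The following Python sketch (ours, not the paper's pseudocode) renders the two phases above schematically; a bit's bin index is taken to be the number of its α noisy readings equal to 1, consistent with the bin probabilities used throughout.

```python
# Schematic sketch (illustrative only) of the two phases of A' described above.
import random

def nonadaptive_phase(x, alpha, p, seed=0):
    """Query every bit alpha times through a BSC(p) and return each bit's bin index,
    i.e., the number of its alpha noisy readings that came back equal to 1."""
    rng = random.Random(seed)
    return [sum(xj ^ (rng.random() < p) for _ in range(alpha)) for xj in x]

def adaptive_reveal(x, j):
    """Noiselessly reveal bit j in the adaptive phase (one unit of the N' budget)."""
    return x[j]
```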

Following an argument analogous to that of Lemma 1, we know that the worst-case error probability of 𝒜{\mathcal{A}}^{\prime} is upper bounded by that of 𝒜{\mathcal{A}}. Therefore, it suffices to show that the worst-case error probability of 𝒜{\mathcal{A}}^{\prime} is at least δ\delta. In the analysis of 𝒜\mathcal{A}^{\prime}, we assume that the input instance satisfies either w(𝐱)=k1w(\mathbf{x})=k-1 or w(𝐱)=kw(\mathbf{x})=k, and that there exists genie-aided information in the form of two sequences of sets ¯0α=(¯0,,¯α)\bar{\mathcal{H}}_{0}^{\alpha}=(\bar{\mathcal{H}}_{0},\ldots,\bar{\mathcal{H}}_{\alpha}) and ¯0α=(¯0,,¯α)\bar{\mathcal{L}}_{0}^{\alpha}=(\bar{\mathcal{L}}_{0},\ldots,\bar{\mathcal{L}}_{\alpha}) defined in the same way as in the previous regime. Recall that 𝖳𝖧^k(𝐇0α,𝐋0α,𝟙{1})\widehat{\mathsf{TH}}_{k}({\bf H}_{0}^{\alpha},{\bf L}_{0}^{\alpha},\mathbbm{1}_{\{{\mathcal{E}}_{1}\}}) denotes the output estimator of 𝒜\mathcal{A}^{\prime}, where 1{\mathcal{E}}_{1} denotes the event that the adaptive phase does not find any heavy ball. The set 𝒩\mathcal{N} is defined as the collection of all (2α+2)(2\alpha+2)-tuples (𝐡0α,𝐥0α)({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha}) such that 𝖳𝖧^k(𝐡0α,𝐥0α,1)=0\widehat{\mathsf{TH}}_{k}({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha},1)=0.

To prove that 𝒜\mathcal{A}^{\prime} has worst-case error probability at least δ\delta, we prove

𝖯(𝖳𝖧^k=0|w(𝐱)=k)>δ\operatorname{\mathsf{P}}(\widehat{\mathsf{TH}}_{k}=0\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k)>\delta

under the assumption that

𝖯(𝖳𝖧^k=0|w(𝐱)=k1)1δ.\operatorname{\mathsf{P}}(\widehat{\mathsf{TH}}_{k}=0\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)\geq 1-\delta.

In the following, we construct the family 𝒟\mathcal{D} of realizations such that the typicality and the bounded ratio property introduced in Section III are satisfied. Let 𝒟\mathcal{D} denote the collection of (2α+2)(2\alpha+2)-tuples of natural numbers (𝐡0α,𝐥0α)({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha}) such that

  1. 1.

    i=0αhi=k1\sum_{i=0}^{\alpha}h_{i}=k-1, i=0αli=nk+1\sum_{i=0}^{\alpha}l_{i}=n-k+1;

  2. 2.

    j𝒯lj(nk+1)(12exp(log1δ))(1lognn)\sum_{j\in{\mathcal{T}}}l_{j}\geq(n-k+1)\left(1-2\exp\left(-\sqrt{\log\frac{1}{\delta}}\right)\right)\left(1-\sqrt{\frac{\log n}{n}}\right);

  3. 3.

    (1Δj0)(nk+1)Pj0lj(1+Δj0)(nk+1)Pj0(1-\Delta_{j}^{0})(n-k+1)P_{j}^{0}\leq l_{j}\leq(1+\Delta_{j}^{0})(n-k+1)P_{j}^{0} for any j𝒯j\in{\mathcal{T}}, where Δj0logn(nk+1)Pj0\Delta_{j}^{0}\triangleq\sqrt{\frac{\log n}{(n-k+1)P_{j}^{0}}};

  4. 4.

    (1Δj1)(k1)Pj1hj(1+Δj1)(k1)Pj1(1-\Delta_{j}^{1})(k-1)P_{j}^{1}\leq h_{j}\leq(1+\Delta_{j}^{1})(k-1)P_{j}^{1} for any j𝒯j\in{\mathcal{T}}, where Δj1logn(k1)Pj1\Delta_{j}^{1}\triangleq\sqrt{\frac{\log n}{(k-1)P_{j}^{1}}}.

Notice that the major difference between the definition of 𝒟\mathcal{D} and the definition of 𝒞\mathcal{C} in the previous regime lies in the last condition. The reason for this distinction is that in this regime, we have 𝖤[Hj]=ω(1)\operatorname{\mathsf{E}}[H_{j}]=\omega(1) instead of 𝖤[Hj]=o(1)\operatorname{\mathsf{E}}[H_{j}]=o(1) for each j𝒯j\in{\mathcal{T}}. Therefore, the typical realizations of HjH_{j} for j𝒯j\in{\mathcal{T}} concentrate around their means rather than taking the value zero. In the following two propositions, we state the typicality and the bounded ratio property of 𝒟\mathcal{D}.

Proposition 3.

The set 𝒟\mathcal{D} satisfies

(𝐡0α,𝐥0α)𝒟𝖯({(𝐇0α,𝐋0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)1k1/6O(n1/6).\sum_{({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{D}}\operatorname{\mathsf{P}}(\{({\bf H}_{0}^{\alpha},{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)\geq 1-k^{-1/6}-O(n^{-1/6}). (24)
Proposition 4.

Let

n(nk+1)(12exp(log1δ))(1lognn)(2ϵα+1).n^{*}\triangleq(n-k+1)\left(1-2\exp\left(-\sqrt{\log\frac{1}{\delta}}\right)\right)\left(1-\sqrt{\frac{\log n}{n}}\right)-(2{\epsilon}\alpha+1).

For any (𝐡0α,𝐥0α)𝒟({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{D}, we have

𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}1|w(𝐱)=k)𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)\displaystyle\frac{\operatorname{\mathsf{P}}\left(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\cap{\mathcal{E}}_{1}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k\right)}{\operatorname{\mathsf{P}}\left(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1\right)} (25)
1+o(1)n(n𝖤[N|w(𝐱)=k1,(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)]).\displaystyle\geq\frac{1+o(1)}{n}\left(n^{*}-\operatorname{\mathsf{E}}[N^{\prime}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w({\mathbf{x}})=k-1,(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})]\right). (26)

We are now ready to prove 𝖯(𝖳𝖧^k=0|w(𝐱)=k)>δ\operatorname{\mathsf{P}}(\widehat{\mathsf{TH}}_{k}=0\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k)>\delta based on Propositions 3 and 4. We have

𝖯(𝖳𝖧^k=0|w(𝐱)=k)\displaystyle\operatorname{\mathsf{P}}(\widehat{\mathsf{TH}}_{k}=0\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k)
=(𝐡0α,𝐥0α)𝒩𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}1|w(𝐱)=k)\displaystyle=\sum_{({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{N}}\operatorname{\mathsf{P}}(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\cap{\mathcal{E}}_{1}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k)
(𝐡0α,𝐥0α)𝒩𝒟𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}1|w(𝐱)=k)\displaystyle\geq\sum_{({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{N}\cap\mathcal{D}}\operatorname{\mathsf{P}}(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\cap{\mathcal{E}}_{1}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k)
(a)(𝐡0α,𝐥0α)𝒩𝒟𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)\displaystyle\stackrel{{\scriptstyle\mathchoice{\hbox to0.0pt{\hss$\displaystyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\textstyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\scriptstyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\scriptscriptstyle{(a)}$\hss}}}}{{\geq}}\sum_{({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{N}\cap\mathcal{D}}\operatorname{\mathsf{P}}(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)
1+o(1)n(n𝖤[N|w(𝐱)=k1,(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)])\displaystyle\hskip 10.00002pt\cdot\frac{1+o(1)}{n}\left(n^{*}-\operatorname{\mathsf{E}}[N^{\prime}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w({\mathbf{x}})=k-1,(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})]\right)
=1+o(1)n((𝐡0α,𝐥0α)𝒩𝒟n𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)\displaystyle=\frac{1+o(1)}{n}\Bigg{(}\sum_{({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{N}\cap\mathcal{D}}n^{*}\operatorname{\mathsf{P}}(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)
𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)𝖤[N|w(𝐱)=k1,(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)]),\displaystyle\hskip 10.00002pt-\operatorname{\mathsf{P}}(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)\operatorname{\mathsf{E}}[N^{\prime}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w({\mathbf{x}})=k-1,(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})]\Bigg{)}, (27)

where (a) follows by Proposition 4. By Proposition 3 and the definition of nn^{*}, we have

(𝐡0α,𝐥0α)𝒩𝒟n𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)\displaystyle\sum_{({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{N}\cap\mathcal{D}}n^{*}\operatorname{\mathsf{P}}(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)
n(1δk1/6O(n1/6))\displaystyle\geq n^{*}(1-\delta-k^{-1/6}-O(n^{-1/6}))
=(nk)(1δO(k1/6)O(exp(log1δ))O(lognn)O(ϵαn)).\displaystyle=(n-k)\left(1-\delta-O(k^{-1/6})-O\left(\exp\left(-\sqrt{\log\tfrac{1}{\delta}}\right)\right)-O\left(\sqrt{\tfrac{\log n}{n}}\right)-O\left(\frac{{\epsilon}\alpha}{n}\right)\right). (28)

By the law of total expectation and the fact that NN^{\prime} is always non-negative, we have

(𝐡0α,𝐥0α)𝒩𝒟𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)𝖤[N|w(𝐱)=k1,(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)]\displaystyle\sum_{({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{N}\cap\mathcal{D}}\operatorname{\mathsf{P}}(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)\operatorname{\mathsf{E}}[N^{\prime}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w({\mathbf{x}})=k-1,(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})]
𝖤[N|w(𝐱)=k1]nkϵn.\displaystyle\leq\operatorname{\mathsf{E}}[N^{\prime}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w({\mathbf{x}})=k-1]\leq n-k-{\epsilon}n. (29)

Now, we focus on the difference between (28) and (29). Recall that we defined ϵ=(log1δ)1/4{\epsilon}=(\log\frac{1}{\delta})^{-1/4}. It follows that

(nk)(1δO(k1/6)O(exp(log1δ))O(lognn)O(ϵαn))(nkϵn)\displaystyle(n-k)\left(1-\delta-O(k^{-1/6})-O\left(\exp\left(-\sqrt{\log\tfrac{1}{\delta}}\right)\right)-O\left(\sqrt{\tfrac{\log n}{n}}\right)-O\left(\frac{{\epsilon}\alpha}{n}\right)\right)-(n-k-{\epsilon}n)
(1o(1))ϵn.\displaystyle\geq(1-o(1))\epsilon n.

Therefore, (27) implies that

𝖯(𝖳𝖧^k=0|w(𝐱)=k)(1o(1))ϵ.\operatorname{\mathsf{P}}(\widehat{\mathsf{TH}}_{k}=0\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k)\geq(1-o(1))\epsilon.

Notice that ϵδ(log1δ)1/4δ=ω(1)\frac{{\epsilon}}{\delta}\geq\frac{(\log\tfrac{1}{\delta})^{-1/4}}{\delta}=\omega(1). This shows that

𝖯(𝖳𝖧^k=0|w(𝐱)=k)ω(δ).\operatorname{\mathsf{P}}(\widehat{\mathsf{TH}}_{k}=0\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k)\geq\omega(\delta).

Now it suffices to prove Propositions 3 and 4 to complete the proof in this regime.

A-B1 Proof of Propositions

Proof of Proposition 3.

Define error events

5{j𝒯Lj(nk+1)(12exp(log1δ))(1lognn)},{\mathcal{E}}_{5}\triangleq\left\{\sum_{j\in{\mathcal{T}}}L_{j}\leq(n-k+1)\left(1-2\exp\left(-\sqrt{\log\frac{1}{\delta}}\right)\right)\left(1-\sqrt{\frac{\log n}{n}}\right)\right\},
6{j𝒯:Lj[(1Δj0)(nk+1)Pj0,(1+Δj0)(nk+1)Pj0]},{\mathcal{E}}_{6}\triangleq\{\exists j\in{\mathcal{T}}\mathchar 58\relax L_{j}\notin[(1-\Delta_{j}^{0})(n-k+1)P_{j}^{0},(1+\Delta_{j}^{0})(n-k+1)P_{j}^{0}]\},

and

7{j𝒯:Hj[(1Δj1)(k1)Pj1,(1+Δj1)(k1)Pj1]}.{\mathcal{E}}_{7}\triangleq\{\exists j\in{\mathcal{T}}\mathchar 58\relax H_{j}\notin[(1-\Delta_{j}^{1})(k-1)P_{j}^{1},(1+\Delta_{j}^{1})(k-1)P_{j}^{1}]\}.

Notice that events 5,6{\mathcal{E}}_{5},{\mathcal{E}}_{6} and 7{\mathcal{E}}_{7} correspond to the second, third and fourth conditions in the definition of 𝒟\mathcal{D}. It follows that

(𝐡0α,𝐥0α)𝒟𝖯({(𝐇0α,𝐋0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)\displaystyle\sum_{({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{D}}\operatorname{\mathsf{P}}(\{({\bf H}_{0}^{\alpha},{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)
=1𝖯(567|w(𝐱)=k1)\displaystyle=1-\operatorname{\mathsf{P}}({\mathcal{E}}_{5}\cup{\mathcal{E}}_{6}\cup{\mathcal{E}}_{7}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)
1𝖯(5|w(𝐱)=k1)𝖯(6|w(𝐱)=k1)𝖯(7|w(𝐱)=k1),\displaystyle\geq 1-\operatorname{\mathsf{P}}({\mathcal{E}}_{5}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)-\operatorname{\mathsf{P}}({\mathcal{E}}_{6}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)-\operatorname{\mathsf{P}}({\mathcal{E}}_{7}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1),

where the last inequality follows from the union bound. Proposition 3 then follows readily from the following three lemmas.

Lemma 6.

For any 𝐱\mathbf{x} with w(𝐱)=k1w(\mathbf{x})=k-1, we have 𝖯(5|𝐱)n1/5.\operatorname{\mathsf{P}}({\mathcal{E}}_{5}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x})\leq n^{-1/5}.

Lemma 7.

For any 𝐱\mathbf{x} with w(𝐱)=k1w(\mathbf{x})=k-1, we have 𝖯(6|𝐱)n1/6\operatorname{\mathsf{P}}({\mathcal{E}}_{6}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x})\leq n^{-1/6}.

Lemma 8.

For any 𝐱\mathbf{x} with w(𝐱)=k1w(\mathbf{x})=k-1, we have 𝖯(7|𝐱)k1/6\operatorname{\mathsf{P}}({\mathcal{E}}_{7}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x})\leq k^{-1/6}.

By Lemmas 6, 7 and 8, we have

(𝐡0α,𝐥0α)𝒟𝖯({(𝐇0α,𝐋0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)\displaystyle\sum_{({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{D}}\operatorname{\mathsf{P}}(\{({\bf H}_{0}^{\alpha},{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)
1O(n1/6)k1/6,\displaystyle\geq 1-O(n^{-1/6})-k^{-1/6},

which completes the proof. ∎

Proof of Proposition 4.

Following the same derivation as for equation (23) in the proof of Proposition 2, we have

𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}1|w(𝐱)=k)𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)\displaystyle\frac{\operatorname{\mathsf{P}}\left(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\cap{\mathcal{E}}_{1}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k\right)}{\operatorname{\mathsf{P}}\left(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1\right)}
=j𝒯kPj1hj+1lj(nk+1)Pj0lj1𝖤[Nj|𝐇¯0α=𝐡0α,𝐋¯0α=𝐥0α,w(𝐱)=k1]1+i𝒯li.\displaystyle=\sum_{j\in{\mathcal{T}}}\frac{kP_{j}^{1}}{h_{j}+1}\cdot\frac{l_{j}}{(n-k+1)P_{j}^{0}}\cdot\frac{l_{j}-1-\operatorname{\mathsf{E}}[N^{\prime}_{j}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\bar{\bf H}_{0}^{\alpha}={\bf h}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha}={\bf l}_{0}^{\alpha},w(\mathbf{x})=k-1]}{-1+\sum_{i\in{\mathcal{T}}}l_{i}}. (30)

To proceed, we need a technical lemma that bounds Pj0P_{j}^{0} and Pj1P_{j}^{1} for each j𝒯j\in{\mathcal{T}}.

Lemma 9.

For each j𝒯j\in{\mathcal{T}}, we have Pj1exp(logk+logk2(log1δ)1/4)P_{j}^{1}\geq\exp\left(-\log k+\frac{\log k}{2(\log\tfrac{1}{\delta})^{1/4}}\right) and Pj0exp(o(logk))P_{j}^{0}\geq\exp(-o(\log k)).

Because ({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{C}, we have (1-\Delta_{j}^{1})(k-1)P_{j}^{1}\leq h_{j}\leq(1+\Delta_{j}^{1})(k-1)P_{j}^{1} and (1-\Delta_{j}^{0})(n-k+1)P_{j}^{0}\leq l_{j}\leq(1+\Delta_{j}^{0})(n-k+1)P_{j}^{0} for each j\in{\mathcal{T}}. Recall that we defined \Delta_{j}^{1}\triangleq\sqrt{\frac{\log k}{(k-1)P_{j}^{1}}} and \Delta_{j}^{0}\triangleq\sqrt{\frac{\log n}{(n-k+1)P_{j}^{0}}}. By Lemma 9, we have \Delta_{j}^{0}=o(1) and \Delta_{j}^{1}=o(1), and it follows that

kPj1hj+1=1+o(1) and lj(nk+1)Pj0=1+o(1).\frac{kP_{j}^{1}}{h_{j}+1}=1+o(1)\;\text{ and }\;\frac{l_{j}}{(n-k+1)P_{j}^{0}}=1+o(1).

Therefore, we have

𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}1|w(𝐱)=k)𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)\displaystyle\frac{\operatorname{\mathsf{P}}\left(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\cap{\mathcal{E}}_{1}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k\right)}{\operatorname{\mathsf{P}}\left(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1\right)}
=(1+o(1))j𝒯lj1𝖤[Nj|𝐇¯0α=𝐡0α,𝐋¯0α=𝐥0α,w(𝐱)=k1]1+i𝒯li\displaystyle=(1+o(1))\sum_{j\in{\mathcal{T}}}\frac{l_{j}-1-\operatorname{\mathsf{E}}[N^{\prime}_{j}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\bar{\bf H}_{0}^{\alpha}={\bf h}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha}={\bf l}_{0}^{\alpha},w(\mathbf{x})=k-1]}{-1+\sum_{i\in{\mathcal{T}}}l_{i}}
(1+o(1))𝖤[N|𝐇¯0α=𝐡0α,𝐋¯0α=𝐥0α,w(𝐱)=k1](2ϵα+1)+j𝒯lj1+i𝒯li.\displaystyle\geq(1+o(1))\frac{-\operatorname{\mathsf{E}}[N^{\prime}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\bar{\bf H}_{0}^{\alpha}={\bf h}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha}={\bf l}_{0}^{\alpha},w(\mathbf{x})=k-1]-(2{\epsilon}\alpha+1)+\sum_{j\in{\mathcal{T}}}l_{j}}{-1+\sum_{i\in{\mathcal{T}}}l_{i}}.

Because (𝐡0α,𝐥0α)𝒟({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{D}, we have

i𝒯li(nk+1)(12exp(log1δ))(1lognn).\sum_{i\in{\mathcal{T}}}l_{i}\geq(n-k+1)\left(1-2\exp\left(-\sqrt{\log\frac{1}{\delta}}\right)\right)\left(1-\sqrt{\frac{\log n}{n}}\right).

Finally, by the definition of nn^{*} in the proposition statement, we have

𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}1|w(𝐱)=k)𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)\displaystyle\frac{\operatorname{\mathsf{P}}\left(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\cap{\mathcal{E}}_{1}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k\right)}{\operatorname{\mathsf{P}}\left(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1\right)}
1+o(1)n(n𝖤[N|w(𝐱)=k1,(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)]),\displaystyle\geq\frac{1+o(1)}{n}\left(n^{*}-\operatorname{\mathsf{E}}[N^{\prime}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w({\mathbf{x}})=k-1,(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})]\right),

which completes the proof. ∎

A-B2 Proof of Lemmas

Proof of Lemma 9.

By definition, we have Pj1=(αj)(1p)jpαjP_{j}^{1}=\binom{\alpha}{j}(1-p)^{j}p^{\alpha-j}. Notice that minj𝒯Pj1=Pα(pϵ)1\min_{j\in{\mathcal{T}}}P_{j}^{1}=P_{\alpha(p-{\epsilon})}^{1}. Therefore, it suffices to show the claim for j=α(pϵ)j={\alpha(p-{\epsilon})}. By the pmf of binomial distribution and Stirling’s approximation, we have

Pα(pϵ)1\displaystyle P_{\alpha(p-{\epsilon})}^{1} =(α(pϵ)α)(1p)α(pϵ)pα(1p+ϵ)\displaystyle=\binom{\alpha}{(p-{\epsilon})\alpha}(1-p)^{\alpha(p-{\epsilon})}p^{\alpha(1-p+{\epsilon})}
exp(α(DKL(pϵ1p)+O(logαα)))\displaystyle\geq\exp\left(-\alpha\left({D_{\mathrm{KL}}}(p-{\epsilon}\|1-p)+O\left(\frac{\log\alpha}{\alpha}\right)\right)\right)
=exp((1ϵ)logkDKL(pϵ1p)(DKL(pϵ1p)+O(logαα)))\displaystyle=\exp\left(-\frac{(1-{\epsilon})\log k}{{D_{\mathrm{KL}}}(p-{\epsilon}\|1-p)}\left({D_{\mathrm{KL}}}(p-{\epsilon}\|1-p)+O\left(\frac{\log\alpha}{\alpha}\right)\right)\right)
(a)exp((11(log1δ)1/4)(1+O(loglogklogk))logk)\displaystyle\stackrel{{\scriptstyle\mathchoice{\hbox to0.0pt{\hss$\displaystyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\textstyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\scriptstyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\scriptscriptstyle{(a)}$\hss}}}}{{\geq}}\exp\left(-\left(1-\frac{1}{(\log\tfrac{1}{\delta})^{1/4}}\right)\left(1+O\left(\frac{\log\log k}{\log k}\right)\right)\log k\right)
(b)exp(logk+logk2(log1δ)1/4),\displaystyle\stackrel{{\scriptstyle\mathchoice{\hbox to0.0pt{\hss$\displaystyle{(b)}$\hss}}{\hbox to0.0pt{\hss$\textstyle{(b)}$\hss}}{\hbox to0.0pt{\hss$\scriptstyle{(b)}$\hss}}{\hbox to0.0pt{\hss$\scriptscriptstyle{(b)}$\hss}}}}{{\geq}}\exp\left(-\log k+\frac{\log k}{2(\log\tfrac{1}{\delta})^{1/4}}\right),

where (a) follows because ϵ=(log1δ)1/4{\epsilon}=(\log\tfrac{1}{\delta})^{-1/4} and α=Θ(logk)\alpha=\Theta(\log k), and (b) follows because 1(log1δ)1/4=ω(loglogklogk)\frac{1}{(\log\tfrac{1}{\delta})^{1/4}}=\omega\left(\frac{\log\log k}{\log k}\right).
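For completeness, the first inequality in the display above (and the corresponding step in the bound on P_{\alpha(p-{\epsilon})}^{0} below) uses the following standard consequence of Stirling's approximation: for any q\in(0,1) with q\alpha an integer and any r\in(0,1),

\binom{\alpha}{q\alpha}r^{q\alpha}(1-r)^{(1-q)\alpha}=\exp\left(-\alpha\left(D_{\mathrm{KL}}(q\|r)+O\left(\frac{\log\alpha}{\alpha}\right)\right)\right),

since \binom{\alpha}{q\alpha}=\exp(\alpha H(q)+O(\log\alpha)) with H(q)\triangleq-q\log q-(1-q)\log(1-q), and H(q)+q\log r+(1-q)\log(1-r)=-D_{\mathrm{KL}}(q\|r).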

We have Pj0=(αj)pj(1p)αjP_{j}^{0}=\binom{\alpha}{j}p^{j}(1-p)^{\alpha-j}. We can see that minj𝒯Pj0=min(Pα(pϵ)0,Pα(p+ϵ)0)\min_{j\in{\mathcal{T}}}P_{j}^{0}=\min(P_{\alpha(p-{\epsilon})}^{0},P_{\alpha(p+{\epsilon})}^{0}). Therefore, it suffices to show that Pα(pϵ)0exp(o(logk))P_{\alpha(p-{\epsilon})}^{0}\geq\exp(-o(\log k)) and Pα(p+ϵ)0exp(o(logk))P_{\alpha(p+{\epsilon})}^{0}\geq\exp(-o(\log k)). By Stirling’s approximation, we have

Pα(pϵ)0\displaystyle P_{\alpha(p-{\epsilon})}^{0} =(αα(pϵ))pα(pϵ)(1p)α(1p+ϵ)\displaystyle=\binom{\alpha}{\alpha(p-{\epsilon})}p^{\alpha(p-{\epsilon})}(1-p)^{\alpha(1-p+{\epsilon})}
exp(α(DKL(pϵp)+O(logαα)))\displaystyle\geq\exp\left(-\alpha\left({D_{\mathrm{KL}}}(p-{\epsilon}\|p)+O\left(\frac{\log\alpha}{\alpha}\right)\right)\right)
=exp((1ϵ)logkDKL(pϵ1p)(DKL(pϵp)+O(logαα)))\displaystyle=\exp\left(-\frac{(1-{\epsilon})\log k}{{D_{\mathrm{KL}}}(p-{\epsilon}\|1-p)}\left({D_{\mathrm{KL}}}(p-{\epsilon}\|p)+O\left(\frac{\log\alpha}{\alpha}\right)\right)\right)
exp(o(logk)),\displaystyle\geq\exp(-o(\log k)),

where the last inequality follows because DKL(pϵp)=o(1){D_{\mathrm{KL}}}(p-{\epsilon}\|p)=o(1). It can be similarly shown that Pα(p+ϵ)0exp(o(logk))P_{\alpha(p+{\epsilon})}^{0}\geq\exp(-o(\log k)). ∎

Proof of Lemma 6.

For any 𝐱\mathbf{x} with w(𝐱)=k1w(\mathbf{x})=k-1, we know that j𝒯Lj|𝐱Binom(nk+1,𝖯(W0𝒯))\sum_{j\in{\mathcal{T}}}L_{j}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x}\sim{\mathrm{Binom}}(n-k+1,\operatorname{\mathsf{P}}(W_{0}\in{\mathcal{T}})), where W0Binom(α,p)W_{0}\sim{\mathrm{Binom}}(\alpha,p). To bound 𝖯(5|𝐱)\operatorname{\mathsf{P}}({\mathcal{E}}_{5}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x}), we first bound 𝖯(W0𝒯)\operatorname{\mathsf{P}}(W_{0}\in{\mathcal{T}}). We have

𝖯(W0𝒯)\displaystyle\operatorname{\mathsf{P}}(W_{0}\notin{\mathcal{T}}) =𝖯(|W0αp|αϵ)\displaystyle=\operatorname{\mathsf{P}}(\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}W_{0}-\alpha p\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\geq\alpha{\epsilon})
(a)2exp(ϵ2α3p)\displaystyle\stackrel{{\scriptstyle\mathchoice{\hbox to0.0pt{\hss$\displaystyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\textstyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\scriptstyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\scriptscriptstyle{(a)}$\hss}}}}{{\leq}}2\exp\left(-\frac{{\epsilon}^{2}\alpha}{3p}\right)
(b)2exp((1ϵ)logk3pDKL(pϵ1p)log(1/δ))\displaystyle\stackrel{{\scriptstyle\mathchoice{\hbox to0.0pt{\hss$\displaystyle{(b)}$\hss}}{\hbox to0.0pt{\hss$\textstyle{(b)}$\hss}}{\hbox to0.0pt{\hss$\scriptstyle{(b)}$\hss}}{\hbox to0.0pt{\hss$\scriptscriptstyle{(b)}$\hss}}}}{{\leq}}2\exp\left(-\frac{(1-{\epsilon})\log k}{3p{D_{\mathrm{KL}}}(p-{\epsilon}\|1-p)\sqrt{\log(1/\delta)}}\right)
(c)2exp(log1δ),\displaystyle\stackrel{{\scriptstyle\mathchoice{\hbox to0.0pt{\hss$\displaystyle{(c)}$\hss}}{\hbox to0.0pt{\hss$\textstyle{(c)}$\hss}}{\hbox to0.0pt{\hss$\scriptstyle{(c)}$\hss}}{\hbox to0.0pt{\hss$\scriptscriptstyle{(c)}$\hss}}}}{{\leq}}2\exp\left(-\sqrt{\log\frac{1}{\delta}}\right),

where (a) follows by the Chernoff bound, (b) follows because ϵ=(log(1/δ))1/4{\epsilon}=(\log(1/\delta))^{-1/4}, and (c) follows because logk=ω(log1δ)\log k=\omega(\log\frac{1}{\delta}).
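The Chernoff bound invoked in step (a) (and again in the proofs of Lemmas 7 and 8) is the standard multiplicative form: for X\sim{\mathrm{Binom}}(m,q) with mean \mu=mq and any 0<\eta\leq 1,

\operatorname{\mathsf{P}}(|X-\mu|\geq\eta\mu)\leq 2\exp\left(-\frac{\eta^{2}\mu}{3}\right),

applied here with X=W_{0}, \mu=\alpha p, and \eta={\epsilon}/p.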

Therefore, we have

𝖯(5|𝐱)\displaystyle\operatorname{\mathsf{P}}({\mathcal{E}}_{5}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x})
𝖯(Binom(nk+1,12exp(log1δ))\displaystyle\leq\operatorname{\mathsf{P}}\Bigg{(}\mathrm{Binom}\left(n-k+1,1-2\exp\left(-\sqrt{\log\tfrac{1}{\delta}}\right)\right)
(nk+1)(12exp(log1δ))(1lognn))\displaystyle\hskip 20.00003pt\leq(n-k+1)\left(1-2\exp\left(-\sqrt{\log\tfrac{1}{\delta}}\right)\right)\left(1-\sqrt{\tfrac{\log n}{n}}\right)\Bigg{)}
(d)exp((nk+1)(12exp(log1δ))logn2n)\displaystyle\stackrel{{\scriptstyle\mathchoice{\hbox to0.0pt{\hss$\displaystyle{(d)}$\hss}}{\hbox to0.0pt{\hss$\textstyle{(d)}$\hss}}{\hbox to0.0pt{\hss$\scriptstyle{(d)}$\hss}}{\hbox to0.0pt{\hss$\scriptscriptstyle{(d)}$\hss}}}}{{\leq}}\exp\left(-\frac{(n-k+1)\left(1-2\exp\left(-\sqrt{\log\frac{1}{\delta}}\right)\right)\log n}{2n}\right)
n1/5,\displaystyle\leq n^{-1/5},

where (d) follows by the Chernoff bound. ∎

Proof of Lemma 7.

For any 𝐱\mathbf{x} with w(𝐱)=k1w(\mathbf{x})=k-1, we have Lj|𝐱Binom(nk+1,Pj0)L_{j}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x}\sim{\mathrm{Binom}}(n-k+1,P_{j}^{0}) for each j𝒯j\in{\mathcal{T}}. For any j𝒯j\in{\mathcal{T}}, by Chernoff bound we have

𝖯(|Lj(nk+1)Pj0|Δj0(nk+1)Pj0|𝐱)\displaystyle\operatorname{\mathsf{P}}(\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}L_{j}-(n-k+1)P_{j}^{0}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\geq\Delta_{j}^{0}(n-k+1)P_{j}^{0}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x})
2exp((Δj0)2(nk+1)Pj0/3)\displaystyle\leq 2\exp(-(\Delta_{j}^{0})^{2}(n-k+1)P_{j}^{0}/3)
=2n1/3,\displaystyle=2n^{-1/3},

where the last equality follows because Δj0=logn(nk+1)Pj0\Delta_{j}^{0}=\sqrt{\frac{\log n}{(n-k+1)P_{j}^{0}}}. By union bound, we have

\operatorname{\mathsf{P}}({\mathcal{E}}_{6}\,|\,\mathbf{x})\leq(2\alpha+1)\cdot 2n^{-1/3}\leq n^{-1/6}. ∎

Proof of Lemma 8.

For any 𝐱\mathbf{x} with w(𝐱)=k1w(\mathbf{x})=k-1, we have Hj|𝐱Binom(k1,Pj1)H_{j}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x}\sim{\mathrm{Binom}}(k-1,P_{j}^{1}) for each j𝒯j\in{\mathcal{T}}. For any j𝒯j\in{\mathcal{T}}, by Chernoff bound we have

\displaystyle\operatorname{\mathsf{P}}(|H_{j}-(k-1)P_{j}^{1}|\geq\Delta_{j}^{1}(k-1)P_{j}^{1}\,|\,\mathbf{x})
2exp((Δj1)2(k1)Pj1/3)\displaystyle\leq 2\exp(-(\Delta_{j}^{1})^{2}(k-1)P_{j}^{1}/3)
=2k1/3,\displaystyle=2k^{-1/3},

where the last equality follows because Δj1=logk(k1)Pj1\Delta_{j}^{1}=\sqrt{\frac{\log k}{(k-1)P_{j}^{1}}}. By union bound, we have

\operatorname{\mathsf{P}}({\mathcal{E}}_{7}\,|\,\mathbf{x})\leq(2\alpha+1)\cdot 2k^{-1/3}\leq k^{-1/6}. ∎

A-C Regime 3: logklog(1/δ)loglog(1/δ)\log k\leq\frac{\log(1/\delta)}{\log\log(1/\delta)}

In this regime, we prove the converse using Le Cam's two-point method (see, e.g., Yu, (1997)). The proof proceeds by establishing the following proposition.

Proposition 5.

Consider any variable-length algorithm for computing \mathsf{TH}_{k}(\mathbf{x}) whose number of queries M satisfies

𝖤[M|𝐱](nk+1)log14δDKL(p1p)\operatorname{\mathsf{E}}[M\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}{\mathbf{x}}]\leq\frac{(n-k+1)\log\tfrac{1}{4\delta}}{{D_{\mathrm{KL}}(p\|1-p)}}

for any input instance 𝐱{\mathbf{x}}. Then the worst-case error probability of the algorithm is at least δ\delta.

Notice that under the assumptions \delta=o(1) and \log k\leq\frac{\log(1/\delta)}{\log\log(1/\delta)}, we have

(nk+1)log14δDKL(p1p)=(1o(1))(nk)logkδDKL(p1p).\frac{(n-k+1)\log\tfrac{1}{4\delta}}{{D_{\mathrm{KL}}(p\|1-p)}}=(1-o(1))\frac{(n-k)\log\frac{k}{\delta}}{{D_{\mathrm{KL}}(p\|1-p)}}.

Hence Proposition 5 directly implies Theorem 1 in this regime. It is worth noting that Proposition 5 does not imply the same bounds in the other two regimes, namely when log(1/δ)loglog(1/δ)<logklog(1/δ)loglog(1/δ)\frac{\log(1/\delta)}{\log\log(1/\delta)}<\log k\leq\log(1/\delta)\log\log(1/\delta) and when logk>log(1/δ)loglog(1/δ)\log k>\log(1/\delta)\log\log(1/\delta).

The proof of Proposition 5 is based on Le Cam's two-point lemma, stated below.

Lemma 10 (Le Cam's Two-Point Lemma (Yu,, 1997)).

For any θ0,θ1Θ\theta_{0},\theta_{1}\in\Theta, suppose that the following separation condition holds for some loss function L(θ,a):Θ×𝒜L(\theta,a)\mathchar 58\relax\Theta\times\mathcal{A}\rightarrow\mathbb{R}:

a𝒜,L(θ0,a)+L(θ1,a)Δ>0.\displaystyle\forall{a\in\mathcal{A}},L(\theta_{0},a)+L(\theta_{1},a)\geq\Delta>0.

Then we have

inffsupθ𝔼θ[L(θ,f(X))]Δ2(1𝖳𝖵(θ0,θ1)),\displaystyle\inf_{f}\sup_{\theta}\mathbb{E}_{\theta}[L(\theta,f(X))]\geq\frac{\Delta}{2}\left(1-\mathsf{TV}(\mathbb{P}_{\theta_{0}},\mathbb{P}_{\theta_{1}})\right),

where \mathbb{P}_{\theta_{0}} and \mathbb{P}_{\theta_{1}} denote the probability distributions of the statistic X under the hypotheses \theta_{0} and \theta_{1}, respectively.

Proof of Proposition 5.

To apply Lemma 10, we first define nk+2n-k+2 input instances for the 𝖳𝖧k\mathsf{TH}_{k} problem. Let 𝐱(0)\mathbf{x}^{(0)} denote the binary sequence such that

xi(0)={1, for 1ik1,0, otherwise.x_{i}^{(0)}=\begin{cases}1,&\text{ for }1\leq i\leq k-1,\\ 0,&\text{ otherwise}.\end{cases}

For each kjnk\leq j\leq n, let 𝐱(j)\mathbf{x}^{(j)} denote the binary sequence such that

xi(j)={1, for 1ik1 or i=j,0, otherwise.x_{i}^{(j)}=\begin{cases}1,&\text{ for }1\leq i\leq k-1\text{ or }i=j,\\ 0,&\text{ otherwise}.\end{cases}

Notice that 𝖳𝖧k(𝐱(0))=0\mathsf{TH}_{k}(\mathbf{x}^{(0)})=0 and 𝖳𝖧k(𝐱(j))=1\mathsf{TH}_{k}(\mathbf{x}^{(j)})=1 for each kjnk\leq j\leq n. Therefore, for any estimator 𝖳𝖧^k\widehat{\mathsf{TH}}_{k}, we have

𝟙(𝖳𝖧^k𝖳𝖧k(𝐱(0)))+𝟙(𝖳𝖧^k𝖳𝖧k(𝐱(j)))1.\displaystyle\mathbbm{1}(\widehat{\mathsf{TH}}_{k}\neq\mathsf{TH}_{k}(\mathbf{x}^{(0)}))+\mathbbm{1}(\widehat{\mathsf{TH}}_{k}\neq\mathsf{TH}_{k}(\mathbf{x}^{(j)}))\geq 1.

By Lemma 10, applied with the zero-one loss L(\mathbf{x},a)=\mathbbm{1}(a\neq\mathsf{TH}_{k}(\mathbf{x})) and \Delta=1, we have

\inf_{\widehat{\mathsf{TH}}_{k}}\sup_{\mathbf{x}\in\{0,1\}^{n}}\operatorname{\mathsf{P}}(\widehat{\mathsf{TH}}_{k}\neq\mathsf{TH}_{k}(\mathbf{x}))\geq\frac{1}{2}(1-\mathsf{TV}(\mathbb{P}_{\mathbf{x}^{(0)}},\mathbb{P}_{\mathbf{x}^{(j)}})),

for any k\leq j\leq n, where \mathbb{P}_{\mathbf{x}^{(0)}} and \mathbb{P}_{\mathbf{x}^{(j)}} denote the joint distributions of all query-observation pairs when the ground truth is \mathbf{x}^{(0)} and \mathbf{x}^{(j)}, respectively. Taking the maximum over k\leq j\leq n on the right-hand side, we get

\inf_{\widehat{\mathsf{TH}}_{k}}\sup_{\mathbf{x}\in\{0,1\}^{n}}\operatorname{\mathsf{P}}(\widehat{\mathsf{TH}}_{k}\neq\mathsf{TH}_{k}(\mathbf{x}))
supkjn12(1𝖳𝖵(𝐱(0),𝐱(j)))\displaystyle\geq\sup_{k\leq j\leq n}\frac{1}{2}(1-\mathsf{TV}(\mathbb{P}_{\mathbf{x}^{(0)}},\mathbb{P}_{\mathbf{x}^{(j)}}))
supkjn14exp(DKL(𝐱(0),𝐱(j))),\displaystyle\geq\sup_{k\leq j\leq n}\frac{1}{4}\exp(-{D_{\mathrm{KL}}}(\mathbb{P}_{\mathbf{x}^{(0)}},\mathbb{P}_{\mathbf{x}^{(j)}})),

where the last inequality follows by the Bretagnolle–Huber inequality stated in Lemma 11. Now, we need to bound the KL divergence DKL(𝐱(0),𝐱(j)){D_{\mathrm{KL}}}(\mathbb{P}_{\mathbf{x}^{(0)}},\mathbb{P}_{\mathbf{x}^{(j)}}). By the Divergence Decomposition Lemma stated in Lemma 12, we have

DKL(𝐱(0),𝐱(j))=𝖤[Mj|𝐱(0)]DKL(p1p),{D_{\mathrm{KL}}}(\mathbb{P}_{\mathbf{x}^{(0)}},\mathbb{P}_{\mathbf{x}^{(j)}})=\operatorname{\mathsf{E}}[M_{j}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}{\mathbf{x}}^{(0)}]{D_{\mathrm{KL}}(p\|1-p)},

where M_{j} denotes the number of queries made on the bit x_{j}. Because \sum_{j=k}^{n}\operatorname{\mathsf{E}}[M_{j}\,|\,{\mathbf{x}}^{(0)}]\leq\operatorname{\mathsf{E}}[M\,|\,{\mathbf{x}}^{(0)}]\leq\frac{(n-k+1)\log\tfrac{1}{4\delta}}{D_{\mathrm{KL}}(p\|1-p)}, there exists some k\leq j^{*}\leq n such that \operatorname{\mathsf{E}}[M_{j^{*}}\,|\,{\mathbf{x}}^{(0)}]\leq\frac{\log\tfrac{1}{4\delta}}{D_{\mathrm{KL}}(p\|1-p)}. Finally, noting that \exp\left(-\log\tfrac{1}{4\delta}\right)=4\delta, we have

\inf_{\widehat{\mathsf{TH}}_{k}}\sup_{\mathbf{x}\in\{0,1\}^{n}}\operatorname{\mathsf{P}}(\widehat{\mathsf{TH}}_{k}\neq\mathsf{TH}_{k}(\mathbf{x}))
supkjn14exp(𝖤[Mj|𝐱(0)]DKL(p1p))\displaystyle\geq\sup_{k\leq j\leq n}\frac{1}{4}\exp(-\operatorname{\mathsf{E}}[M_{j}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}{\mathbf{x}}^{(0)}]{D_{\mathrm{KL}}(p\|1-p)})
14exp(𝖤[Mj|𝐱(0)]DKL(p1p))\displaystyle\geq\frac{1}{4}\exp(-\operatorname{\mathsf{E}}[M_{j^{*}}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}{\mathbf{x}}^{(0)}]{D_{\mathrm{KL}}(p\|1-p)})
δ,\displaystyle\geq\delta,

which completes the proof. ∎

Lemma 11 (Bretagnolle–Huber inequality (Bretagnolle and Huber,, 1979)).

For any two distributions \mathbb{P}_{1},\mathbb{P}_{2}, we have

𝖳𝖵(1,2)112exp(D𝖪𝖫(1,2)).\displaystyle\mathsf{TV}(\mathbb{P}_{1},\mathbb{P}_{2})\leq 1-\frac{1}{2}\exp(-D_{\mathsf{KL}}(\mathbb{P}_{1},\mathbb{P}_{2})).
Lemma 12 (Divergence Decomposition (Auer et al.,, 1995)).

Let T_{i} be the random variable denoting the number of times experiment i\in[K] is performed under some policy \pi. Then, for any two distributions \mathbb{P}^{\pi},\mathbb{Q}^{\pi} under policy \pi,

D𝖪𝖫(π,π)=i[K]𝔼π[Ti]D𝖪𝖫(iπ,iπ).\displaystyle D_{\mathsf{KL}}(\mathbb{P}^{\pi},\mathbb{Q}^{\pi})=\sum_{i\in[K]}\mathbb{E}_{\mathbb{P}^{\pi}}[T_{i}]D_{\mathsf{KL}}(\mathbb{P}_{i}^{\pi},\mathbb{Q}_{i}^{\pi}).

Appendix B Detailed Proof of Theorem 2

In this appendix, we give a complete proof of Theorem 2. We present the proof in two separate regimes knlognk\leq\frac{n}{\log n} and k>nlognk>\frac{n}{\log n}.

B-A Regime 1: knlognk\leq\frac{n}{\log n}

To prove Theorem 2 in this regime, we establish the following proposition.

Proposition 6.

Suppose knlognk\leq\frac{n}{\log n} and kn/2k\leq n/2. There exists a variable-length algorithm, namely Algorithm 2, that computes the 𝖳𝖧k\mathsf{TH}_{k} function with worst-case error probability at most 3δ3\delta. Moreover, the number of queries MM made by the algorithm satisfies

𝖤[M|𝐱]nlogkδDKL(p1p)+n12p+C(k+max(nδ+nδ,nlogn))logkδ\operatorname{\mathsf{E}}[M\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}{\mathbf{x}}]\leq\frac{n\log\frac{k}{\delta}}{{D_{\mathrm{KL}}(p\|1-p)}}+\frac{n}{1-2p}+C\left(k+\max\left(n\delta+n\sqrt{\delta},\frac{n}{\log n}\right)\right)\log\frac{k}{\delta}

for any input instance 𝐱{\mathbf{x}}.

To prove Theorem 2 in this regime, we can set δ\delta in Proposition 6 to δ/3\delta^{\prime}/3. Then we know that Algorithm 2 has worst-case error probability at most δ\delta^{\prime}, and its expected query complexity satisfies

𝖤[M|𝐱]nlog3kδDKL(p1p)+n12p+C(k+max(nδ/3+nδ/3,nlogn))log3kδ.\operatorname{\mathsf{E}}[M\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}{\mathbf{x}}]\leq\frac{n\log\frac{3k}{\delta^{\prime}}}{{D_{\mathrm{KL}}(p\|1-p)}}+\frac{n}{1-2p}+C\left(k+\max\left(n\delta^{\prime}/3+n\sqrt{\delta^{\prime}/3},\frac{n}{\log n}\right)\right)\log\frac{3k}{\delta^{\prime}}.

By the assumptions that knlognk\leq\frac{n}{\log n} and δ=o(1)\delta^{\prime}=o(1), it follows that

𝖤[M|𝐱]=(1+o(1))nlogkδDKL(p1p),\operatorname{\mathsf{E}}[M\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}{\mathbf{x}}]=(1+o(1))\frac{n\log\frac{k}{\delta^{\prime}}}{{D_{\mathrm{KL}}(p\|1-p)}},

which proves Theorem 2 in this regime.

In the rest of this section, we prove Proposition 6 by analyzing the proposed NoisyThreshold algorithm (Algorithm 2).

Recall that Algorithm 2 uses the CheckBit and MaxHeapThreshold functions as subroutines. The CheckBit algorithm takes as input a single bit and returns an estimate of the bit through repeated queries. The CheckBit algorithm is given in Algorithm 3, where the parameter α\alpha in the iith iteration corresponds to the probability that the input bit is one given the first ii noisy observations. The following lemma gives an upper bound on the expected number of queries used by the CheckBit algorithm, which can be seen as a recast of Lemma 13 in Gu and Xu, (2023).

Data: Input bit xx, tolerated error probability δ\delta, crossover probability pp.
Result: Estimate of xx.
1 α1/2\alpha\leftarrow 1/2
2
3while α(δ,1δ)\alpha\in(\delta,1-\delta) do
4       Query the input bit xx
5      
6      if 11 is observed then
7             α(1p)α(1p)α+p(1α)\alpha\leftarrow\frac{(1-p)\alpha}{(1-p)\alpha+p(1-\alpha)}
8            
9      else
10             αpαpα+(1p)(1α)\alpha\leftarrow\frac{p\alpha}{p\alpha+(1-p)(1-\alpha)}
11            
12       end if
13      
14 end while
15if α1δ\alpha\geq 1-\delta then
16       return 11
17      
18else
19       return 0
20      
21 end if
Algorithm 3 CheckBit algorithm for finding the value of an input bit
Lemma 13 (Lemma 13 in Gu and Xu, (2023)).

There exists a randomized algorithm, namely, the CheckBit algorithm, that finds the value of any input bit xx with error probability at most δ\delta. The algorithm makes at most 112plog1δδlog1pp\frac{1}{1-2p}\left\lceil\frac{\log\frac{1-\delta}{\delta}}{\log\frac{1-p}{p}}\right\rceil queries in expectation.
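For concreteness, the posterior-update loop of Algorithm 3 can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation; the function and variable names, as well as the noisy-reading interface, are ours.

import random

def check_bit(noisy_read, delta, p):
    # Sketch of Algorithm 3 (CheckBit). noisy_read() returns one noisy
    # reading of the hidden bit (incorrect with probability p); delta is
    # the tolerated error probability.
    alpha = 0.5  # posterior probability that the input bit equals 1
    while delta < alpha < 1 - delta:
        if noisy_read() == 1:
            alpha = (1 - p) * alpha / ((1 - p) * alpha + p * (1 - alpha))
        else:
            alpha = p * alpha / (p * alpha + (1 - p) * (1 - alpha))
    return 1 if alpha >= 1 - delta else 0

# usage: read a bit x = 1 through a BSC with crossover probability p = 0.1
x, p = 1, 0.1
estimate = check_bit(lambda: x if random.random() > p else 1 - x, delta=1e-3, p=p)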

The MaxHeapThreshold algorithm computes the \mathsf{TH}_{k} function for a length-n input bit sequence with a desired error probability \delta. This algorithm was previously proposed in Feige et al., (1994), and is given here for completeness in Algorithm 6. The MaxHeapThreshold algorithm uses the following three subroutines.

  • \bullet

    NoisyCompareUsingReadings (Algorithm 4): which compares two bits using C_{1}(p)\log\frac{1}{\delta} noisy readings of each bit, where \delta is the tolerated error probability of the comparison.

  • \bullet

    ConstructMaxHeap  (Algorithm 5): which returns a max-heap representation 𝐳{\bf z} of 𝐱{\bf x} such that zi,jz_{i,j} is the jj-th entry in the ii-th level of a max-heap. The algorithm is similar to the tournament algorithm for computing the Max function (Zhu et al.,, 2024, Algorithm 5).

  • \bullet

    NoisyExtractMax: which is similar to the common ExtractMax() operation in a max-heap structure, except that every comparison is performed using C2(p)C_{2}(p) noisy readings. Since this is a simple modification to a standard max-heap operation, we don’t show this algorithm here.

Remark 2.

Note that C1(p)C_{1}(p) and C2(p)C_{2}(p) are constants that depend only on pp. For the sake of the analysis of our proposed algorithm, the exact value of these constants does not matter.

The following lemma gives an upper bound on the expected number of queries made by the MaxHeapThreshold algorithm.

Lemma 14 (Theorem 3.3 in Feige et al., (1994)).

For any positive integer k<nk<n and positive real number δ<1\delta<1, the MaxHeapThreshold algorithm computes the 𝖳𝖧k\mathsf{TH}_{k} function with error probability at most δ\delta and using at most CnlogmδCn\log\frac{m}{\delta} queries, where m=min{k,nk+1}m=\min\{k,n-k+1\} and CC is a constant depending only on pp.

Data: Elements x1x_{1} and x2x_{2}, tolerated error probability δ\delta, crossover probability pp.
Result: Estimate of max{x1,x2}\max\{x_{1},x_{2}\}.
1 Make C_{1}(p)\log\tfrac{1}{\delta} noisy readings of x_{1}. Let \hat{x}_{1} be the estimate given by the majority of the readings.
2
3 Make C_{1}(p)\log\tfrac{1}{\delta} noisy readings of x_{2}. Let \hat{x}_{2} be the estimate given by the majority of the readings.
4
5return max{x^1,x^2}\max\{\hat{x}_{1},\hat{x}_{2}\}
Algorithm 4 NoisyCompareUsingReadings for comparing two elements using noisy readings
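A minimal Python sketch of this comparison primitive is given below. The constant C1 is a placeholder for C_{1}(p), whose exact value is not specified (see Remark 2), and the reading interface is an assumption made for illustration.

import math
import random

def noisy_compare_using_readings(read1, read2, delta, p, C1=10.0):
    # Sketch of Algorithm 4: majority-vote each bit from repeated noisy
    # readings, then return the maximum of the two estimates.
    # p enters only through the constant C1(p) in the paper; here C1 is a
    # fixed placeholder value.
    num = max(1, math.ceil(C1 * math.log(1 / delta)))
    x1_hat = int(sum(read1() for _ in range(num)) > num / 2)
    x2_hat = int(sum(read2() for _ in range(num)) > num / 2)
    return max(x1_hat, x2_hat)

# usage: compare bits x1 = 0 and x2 = 1 read through a BSC(0.1)
noisy = lambda b: (lambda: b if random.random() > 0.1 else 1 - b)
result = noisy_compare_using_readings(noisy(0), noisy(1), delta=1e-2, p=0.1)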
Data: Sequence 𝐱=(x1,,xn){\bf x}=(x_{1},\ldots,x_{n}), tolerated error probability δ\delta, crossover probability pp.
Result: Returns a max-heap structure of 𝐱{\bf x}.
1 Set 𝐲𝐱{\bf y}\leftarrow{\bf x}, 𝐳0,1:n𝐲{\bf z}_{0,1\mathchar 58\relax n}\leftarrow{\bf y}, rnr\leftarrow n
2
3for i=1:log2(n)i=1\mathchar 58\relax\lceil\log_{2}(n)\rceil do
4       δ~iδ2(2i1)\tilde{\delta}_{i}\leftarrow\delta^{2(2i-1)}
5      
6      for j=1:r/2j=1\mathchar 58\relax\lfloor r/2\rfloor do
7             zi,jNoisyCompareUsingReadingsz_{i,j}\leftarrow\textsc{NoisyCompareUsingReadings}(y2j1y_{2j-1}, y2jy_{2j}, δ~i\tilde{\delta}_{i}, pp)
8            
9       end for
10      if rr is even then
11             𝐲(zi,1,,zi,r/2){\bf y}\leftarrow(z_{i,1},\ldots,z_{i,r/2})
12            
13      else
14             𝐲(zi,1,,zi,r/2,yr){\bf y}\leftarrow(z_{i,1},\ldots,z_{i,\lfloor r/2\rfloor},y_{r})
15            
16            zi,r/2yrz_{i,\lceil r/2\rceil}\leftarrow y_{r}
17            
18       end if
19      rr/2r\leftarrow\lceil r/2\rceil
20      
21 end for
22return 𝐳{\bf z}
Algorithm 5 ConstructMaxHeap algorithm
Data: Bit sequence 𝐱=(x1,,xn){\bf x}=(x_{1},\ldots,x_{n}), parameter kn2k\leq\frac{n}{2}, error probability δ\delta, noise probability pp.
Result: Estimate of 𝖳𝖧k(𝐱)\mathsf{TH}_{k}({\bf x}).
1 yxy\leftarrow x
2
3if knk\geq\sqrt{n} then
4       Apply the sorting algorithm from Feige et al., (1994) and return the k-th largest element. (The sorting algorithm of Feige et al., (1994) considers the setting of noisy pairwise comparisons. Since we consider noisy readings of the bits here, each comparison in the sorting algorithm of Feige et al., (1994) should be done by a call to NoisyCompareUsingReadings (Algorithm 4).)
5      
6else
7       𝐲ConstructMaxHeap(𝐱,p,δ2k){\bf y}\leftarrow\textsc{ConstructMaxHeap}({\bf x},\,p,\,\frac{\delta}{2k})
8      
9      for j=1,,kj=1,\ldots,k do
10             zjNoisyExtractMax(𝐲,p,δ2klogn)z_{j}\leftarrow\textsc{NoisyExtractMax}({\bf y},\,p,\,\frac{\delta}{2k\log n})
11            
12       end for
13      return zkz_{k}
14      
15 end if
Algorithm 6 MaxHeapThreshold algorithm for computing the 𝖳𝖧k\mathsf{TH}_{k} function

By symmetry, we only need to consider the case of kn/2k\leq n/2. In the following, we analyze the error probability and the number of queries of Algorithm 2.

Error Analysis: The following lemma bounds the worst-case error probability of Algorithm 2.

Lemma 15 (Error probability of NoisyThreshold).

The worst-case error probability of the NoisyThreshold algorithm is at most 3δ3\delta.

Proof.

First, suppose that 𝖳𝖧k(𝐱)=1\mathsf{TH}_{k}({\bf x})=1. Without loss of generality, assume that x1==xk=1x_{1}=\cdots=x_{k}=1. Let 1={𝖳𝖧^k=0}{\mathcal{E}}_{1}=\{\widehat{\mathsf{TH}}_{k}=0\} denote the error event in this case, and let 𝒜={i[k]:xi𝒮}{\mathcal{A}}=\{\exists\,i\in[k]\mathchar 58\relax x_{i}\notin{\mathcal{S}}\}. By the union bound, we have

𝖯(1)𝖯(𝒜)+𝖯(1|𝒜c).\operatorname{\mathsf{P}}({\mathcal{E}}_{1})\leq\operatorname{\mathsf{P}}({\mathcal{A}})+\operatorname{\mathsf{P}}({\mathcal{E}}_{1}\mathchoice{\,|\,}{\mspace{2.0mu}|\mspace{2.0mu}}{|}{|}{\mathcal{A}}^{c}).

For the first term, by Lemma 13, we have that 𝖯(𝒜)i=1k𝖯(xi𝒮)δ\operatorname{\mathsf{P}}({\mathcal{A}})\leq\sum_{i=1}^{k}\operatorname{\mathsf{P}}(x_{i}\notin{\mathcal{S}})\leq\delta. For the second term, notice that |𝒮|k\lvert{\mathcal{S}}\rvert\geq k on the event 𝒜c{\mathcal{A}}^{c}. Therefore, the only possibility for the algorithm to return 0 is that the function call of MaxHeapThreshold on Line 10 incorrectly returns 0. By Lemma 14, we have that 𝖯(1|𝒜c)δ\operatorname{\mathsf{P}}({\mathcal{E}}_{1}\mathchoice{\,|\,}{\mspace{2.0mu}|\mspace{2.0mu}}{|}{|}{\mathcal{A}}^{c})\leq\delta. This leads to the upper bound 𝖯(1)2δ\operatorname{\mathsf{P}}({\mathcal{E}}_{1})\leq 2\delta. Next, suppose that 𝖳𝖧k(𝐱)=0\mathsf{TH}_{k}({\bf x})=0. Let 0={𝖳𝖧^k=1}{\mathcal{E}}_{0}=\{\widehat{\mathsf{TH}}_{k}=1\} denote the error event in this case, and define the auxiliary event ={|𝒮|k+max(nδ+nδ,nlogn)}{\mathcal{B}}=\{\lvert{\mathcal{S}}\rvert\geq k+\max(n\delta+n\sqrt{\delta},\frac{n}{\log n})\}. As before, we have

𝖯(0)𝖯()+𝖯(0|c).\operatorname{\mathsf{P}}(\mathcal{E}_{0})\leq\operatorname{\mathsf{P}}({\mathcal{B}})+\operatorname{\mathsf{P}}({\mathcal{E}}_{0}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}{\mathcal{B}}^{c}).

By Lemma 14, we know that 𝖯(0|c)δ\operatorname{\mathsf{P}}({\mathcal{E}}_{0}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}{\mathcal{B}}^{c})\leq\delta. Moreover, we claim that 𝖯()2δ\operatorname{\mathsf{P}}({\mathcal{B}})\leq 2\delta. It follows that 𝖯(0)3δ\operatorname{\mathsf{P}}({\mathcal{E}}_{0})\leq 3\delta, and completes the proof.

It remains to show the claim. Notice that, by Lemma 13,

|𝒮|Binom(w,1γ)+Binom(nw,γ),\lvert{\mathcal{S}}\rvert\,\sim\,\mathrm{Binom}(w,1-\gamma)+\mathrm{Binom}(n-w,\gamma),

where wk1w\leq k-1 and γδ/k\gamma\leq\delta/k. Hence, |𝒮|\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathcal{S}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|} is stochastically dominated by the random variable k+Binom(n,δ)k+\mathrm{Binom}(n,\delta). We separately consider two different cases to show the claim.

Case 1: δ2n\delta\geq\frac{2}{n}. In this case, we have

𝖯(|𝒮|k+nδ+nδ)\displaystyle\operatorname{\mathsf{P}}\left(\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathcal{S}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\geq k+n\delta+n\sqrt{\delta}\right) 𝖯(Binom(n,δ)nδ+nδ)\displaystyle\leq\operatorname{\mathsf{P}}\left(\mathrm{Binom}(n,\delta)\geq n\delta+n\sqrt{\delta}\right)
(a)exp(n2+1/δ),\displaystyle\stackrel{{\scriptstyle\mathchoice{\hbox to0.0pt{\hss$\displaystyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\textstyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\scriptstyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\scriptscriptstyle{(a)}$\hss}}}}{{\leq}}\exp\left(-\frac{n}{2+1/\sqrt{\delta}}\right),

where (a)(a) follows by the Chernoff bound (see, for example, Theorem 4.4 in Mitzenmacher and Upfal, (2017)). To show that 𝖯()2δ\operatorname{\mathsf{P}}({\mathcal{B}})\leq 2\delta, it suffices to show that n2+1/δlog1δ\frac{n}{2+1/\sqrt{\delta}}\geq\log\frac{1}{\delta}. This follows by the fact that (2+1/δ)log1δ(2+1/\sqrt{\delta})\log\frac{1}{\delta} is maximized at δ=2/n\delta=2/n with value (2+1/2/n)ln(n/2)(2+1/\sqrt{2/n})\ln(n/2), and that n>(2+1/2/n)ln(n/2)n>(2+1/\sqrt{2/n})\ln(n/2) for any n1n\geq 1.

Case 2: δ<2n\delta<\frac{2}{n}. Here we assume without loss of generality that n3n\geq 3, because nlogn>n\frac{n}{\log n}>n when n2n\leq 2, and it follows that 𝖯()=0\operatorname{\mathsf{P}}({\mathcal{B}})=0. We have

𝖯(|𝒮|k+nlogn)\displaystyle\operatorname{\mathsf{P}}\left(\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathcal{S}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\geq k+\frac{n}{\log n}\right) 𝖯(Binom(n,δ)nlogn)\displaystyle\leq\operatorname{\mathsf{P}}\left(\mathrm{Binom}(n,\delta)\geq\frac{n}{\log n}\right)
\displaystyle\stackrel{(b)}{=}\exp\left(-nD_{\mathsf{KL}}\left(\frac{1}{\log n}\,\Big\|\,\delta\right)\right)
=exp(n(1lognlog1δlogn+(11logn)log11/logn1δ)),\displaystyle=\exp\left(-n\left(\frac{1}{\log n}\log\frac{1}{\delta\log n}+\left(1-\frac{1}{\log n}\right)\log\frac{1-1/\log n}{1-\delta}\right)\right),

where (b)(b) follows by the Chernoff–Hoeffding theorem (see, for example, Theorem 1 in Hoeffding, (1963)). To prove 𝖯()2δ\operatorname{\mathsf{P}}({\mathcal{B}})\leq 2\delta, it suffices to show that

n(1lognlog1δlogn+(11logn)log11/logn1δ)log12δ0.n\left(\frac{1}{\log n}\log\frac{1}{\delta\log n}+\left(1-\frac{1}{\log n}\right)\log\frac{1-1/\log n}{1-\delta}\right)-\log\frac{1}{2\delta}\geq 0.

Firstly, notice that n(1lognlog1δlogn+(11logn)log11/logn1δ)log12δn\left(\frac{1}{\log n}\log\frac{1}{\delta\log n}+\left(1-\frac{1}{\log n}\right)\log\frac{1-1/\log n}{1-\delta}\right)-\log\frac{1}{2\delta} monotonically decreases as δ\delta increases from 0 to 2/n2/n. The proof is then completed by the fact that

n(1lognlogn2logn+(11logn)log11/logn12/n)logn40n\left(\frac{1}{\log n}\log\frac{n}{2\log n}+\left(1-\frac{1}{\log n}\right)\log\frac{1-1/\log n}{1-2/n}\right)-\log\frac{n}{4}\geq 0

for any n3n\geq 3. ∎

Query Analysis: Let MM denote the number of queries made by the NoisyThreshold algorithm. The following lemma gives an upper bound on the expectation of MM, and hence completes the proof of Proposition 6.

Lemma 16.

For any input instance 𝐱{\mathbf{x}}, the expected number of queries made by the NoisyThreshold algorithm satisfies that

𝖤[M|𝐱]nlogkδDKL(p1p)+n12p+C(k+max(nδ+nδ,nlogn))logkδ,\operatorname{\mathsf{E}}[M\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}{\mathbf{x}}]\leq\frac{n\log\frac{k}{\delta}}{{D_{\mathrm{KL}}(p\|1-p)}}+\frac{n}{1-2p}+C\left(k+\max\left(n\delta+n\sqrt{\delta},\frac{n}{\log n}\right)\right)\log\frac{k}{\delta},

where CC is a constant depending only on pp.

Proof.

To bound 𝖤[M|𝐱]\operatorname{\mathsf{E}}[M\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}{\mathbf{x}}], we only need to bound the expected number of queries made by the nn calls of the CheckBit function on Line 3, and the single call of the MaxHeapThreshold function on Line 10. By Lemma 13, we know that the expected number of queries spent at Line 3 is at most

n12plog1δkδklog1pp\displaystyle\frac{n}{1-2p}\left\lceil\frac{\log\frac{1-\frac{\delta}{k}}{\frac{\delta}{k}}}{\log\frac{1-p}{p}}\right\rceil n12p(logkδlog1pp+1)\displaystyle\leq\frac{n}{1-2p}\left(\frac{\log\frac{k}{\delta}}{\log\frac{1-p}{p}}+1\right)
\displaystyle=\frac{n\log\frac{k}{\delta}}{D_{\mathrm{KL}}(p\|1-p)}+\frac{n}{1-2p},

where the equality follows from the identity D_{\mathrm{KL}}(p\|1-p)=(1-2p)\log\frac{1-p}{p}.

By Lemma 14, we know that the expected number of queries spent at Line 12 is at most

C𝖤[|𝒮|]logkδC(k+max(nδ+nδ,nlogn))logkδ,C\operatorname{\mathsf{E}}\left[\lvert{\mathcal{S}}\rvert\right]\log\frac{k}{\delta}\leq C\left(k+\max\left(n\delta+n\sqrt{\delta},\frac{n}{\log n}\right)\right)\log\frac{k}{\delta},

where the inequality holds because |𝒮|k+max(nδ+nδ,nlogn)\lvert{\mathcal{S}}\rvert\leq k+\max\left(n\delta+n\sqrt{\delta},\frac{n}{\log n}\right) on the event that Line 12 is executed. The lemma follows by the linearity of expectation. ∎

B-B Regime 2: k>nlognk>\frac{n}{\log n}

To prove Theorem 2 in this regime, we establish the following proposition.

Proposition 7.

Suppose knlognk\geq\frac{n}{\log n} and kn/2k\leq n/2. There exists a variable-length algorithm, namely Algorithm 1, that computes the 𝖳𝖧k\mathsf{TH}_{k} function with worst-case error probability at most δ\delta. Moreover, the number of queries MM made by the algorithm satisfies

𝖤[M|𝐱]nlognδDKL(p1p)+n12p.\operatorname{\mathsf{E}}[M\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}{\mathbf{x}}]\leq\frac{n\log\frac{n}{\delta}}{{D_{\mathrm{KL}}(p\|1-p)}}+\frac{n}{1-2p}.

for any input instance 𝐱{\mathbf{x}}.

To see that Proposition 7 implies Theorem 2 in this regime, notice that under assumptions knlognk\geq\frac{n}{\log n} and δ=o(1)\delta=o(1), the expected query complexity of Algorithm 1 satisfies

𝖤[M|𝐱]nlognδDKL(p1p)+n12p=(1+o(1))nlogkδDKL(p1p).\operatorname{\mathsf{E}}[M\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}{\mathbf{x}}]\leq\frac{n\log\frac{n}{\delta}}{{D_{\mathrm{KL}}(p\|1-p)}}+\frac{n}{1-2p}=(1+o(1))\frac{n\log\frac{k}{\delta}}{{D_{\mathrm{KL}}(p\|1-p)}}.

Now it suffices to prove Proposition 7.

To prove the proposition, we first analyze the error probability of Algorithm 1. By Lemma 13, we know that 𝖯(x^ixi)δn\operatorname{\mathsf{P}}(\hat{x}_{i}\neq x_{i})\leq\frac{\delta}{n} for each ii, where x^i\hat{x}_{i} is the estimate for xix_{i} in Algorithm 1. By the union bound, we have 𝖯(i[n]:x^ixi)δ\operatorname{\mathsf{P}}(\exists i\in[n]\mathchar 58\relax\hat{x}_{i}\neq x_{i})\leq\delta, and it follows that 𝖯(𝟙{i=1nx^ik}𝖳𝖧k(𝐱))δ\operatorname{\mathsf{P}}(\mathbbm{1}_{\{\sum_{i=1}^{n}\hat{x}_{i}\geq k\}}\neq\mathsf{TH}_{k}({\mathbf{x}}))\leq\delta. Therefore, Algorithm 1 has error probability at most δ\delta.

Regarding the expected number of queries of Algorithm 1, notice that the algorithm calls the CheckBit for nn times. By Lemma 13, the total expected number of queries is at most

n12plog1δnδnlog1ppn12plognδlog1pp+n12p=nlognδD𝖪𝖫(p1p)+n12p,\frac{n}{1-2p}\left\lceil\frac{\log\frac{1-\frac{\delta}{n}}{\frac{\delta}{n}}}{\log\frac{1-p}{p}}\right\rceil\leq\frac{n}{1-2p}\frac{\log\frac{n}{\delta}}{\log\frac{1-p}{p}}+\frac{n}{1-2p}=\frac{n\log\frac{n}{\delta}}{D_{\mathsf{KL}}(p\|1-p)}+\frac{n}{1-2p},

which completes the proof.
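For illustration, the estimate-every-bit strategy of Algorithm 1 analyzed above can be sketched in Python as follows; it reuses the check_bit sketch given after Lemma 13, and the names are ours.

def noisy_threshold_all_bits(reads, k, delta, p):
    # Sketch of Algorithm 1: estimate each bit via CheckBit with tolerated
    # error delta/n, then compare the number of estimated ones against k.
    n = len(reads)
    estimates = [check_bit(read, delta / n, p) for read in reads]
    return 1 if sum(estimates) >= k else 0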

B-C Fixed-length Algorithms

In this section, we describe a fixed-length version for each of the two proposed algorithms, and bound their number of queries and worst-case error probability.

B-C1 Fixed-length Version of Algorithm 2

Notice that in Algorithm 2, all the queries are spent on the n executions of the CheckBit function on Line 3 and the call of the MaxHeapThreshold function on Line 12. Moreover, Lemma 14 establishes a deterministic upper bound on the number of queries of the MaxHeapThreshold function. Thus, the randomness in the number of queries of the proposed algorithm comes only from the CheckBit function. The main idea in the construction of the fixed-length algorithm is to convert the CheckBit algorithm into one whose number of queries has a bounded second moment. More specifically, we define the SafeCheckBit algorithm by the following two-step procedure:

  1. 1.

    Run the CheckBit algorithm until it halts, or it is about to make the (ηlogη)(\eta\log\eta)-th query, where η112plog1δδlog1pp\eta\triangleq\frac{1}{1-2p}\left\lceil\frac{\log\frac{1-\delta}{\delta}}{\log\frac{1-p}{p}}\right\rceil.

  2. 2.

    If the CheckBit algorithm halts, return its output; otherwise, restart the CheckBit algorithm.

This construction of a safe algorithm was first proposed in Gu and Xu, (2023). It is shown in Corollary 19 of Gu and Xu, (2023) that the error probability of the SafeCheckBit algorithm is at most (1+o(1))\delta. Moreover, its number of queries M_{s} satisfies

𝖤[Ms]η11logη and 𝖵𝖺𝗋(Ms)=O(log31δ).\operatorname{\mathsf{E}}[M_{s}]\leq\frac{\eta}{1-\frac{1}{\log\eta}}\;\text{ and }\;\mathsf{Var}(M_{s})=O\left(\log^{3}\frac{1}{\delta}\right). (31)
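As an illustration, a capped-and-restart version of the CheckBit sketch (our rendering of the SafeCheckBit construction; the cap handling and names are ours) could look as follows.

import math

def safe_check_bit(noisy_read, delta, p):
    # Sketch of SafeCheckBit: run the CheckBit loop, but restart the
    # attempt whenever it is about to exceed eta * log(eta) queries.
    eta = math.ceil(math.log((1 - delta) / delta) / math.log((1 - p) / p)) / (1 - 2 * p)
    cap = eta * math.log(eta)  # query budget per attempt
    while True:
        alpha, used = 0.5, 0
        while delta < alpha < 1 - delta and used < cap:
            used += 1
            if noisy_read() == 1:
                alpha = (1 - p) * alpha / ((1 - p) * alpha + p * (1 - alpha))
            else:
                alpha = p * alpha / (p * alpha + (1 - p) * (1 - alpha))
        if not (delta < alpha < 1 - delta):  # CheckBit halted within the budget
            return 1 if alpha >= 1 - delta else 0
        # budget exhausted: restart the attempt from scratch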

In the proposed fixed-length algorithm, we replace the call of CheckBit(x_{i},\frac{\delta}{k},p) on Line 3 of Algorithm 2 by SafeCheckBit(x_{i},\frac{\delta}{k},p). Moreover, we add a termination rule to the for loop spanning Lines 2 to 6: the algorithm terminates and declares failure if it is about to use the

(nη11logη+nlogkδ)-th\left(\frac{n\eta^{\prime}}{1-\frac{1}{\log\eta^{\prime}}}+n\sqrt{\log\frac{k}{\delta}}\right)\text{-th}

query, where η112plogkδδlog1pp\eta^{\prime}\triangleq\frac{1}{1-2p}\left\lceil\frac{\log\frac{k-\delta}{\delta}}{\log\frac{1-p}{p}}\right\rceil.

With the termination rule, it readily follows that the number of queries made by the algorithm is always at most (1+o(1))nlogkδDKL(p1p)(1+o(1))\frac{n\log\frac{k}{\delta}}{{D_{\mathrm{KL}}(p\|1-p)}}. Let XX denote the total number of queries made by the nn executions of the SafeCheckBit algorithm. Because the number of queries made by each execution of the SafeCheckBit algorithm is independent of one another, we have

𝖤[X]nη11logη and 𝖵𝖺𝗋(X)=O(nlog3kδ).\operatorname{\mathsf{E}}[X]\leq\frac{n\eta^{\prime}}{1-\frac{1}{\log\eta^{\prime}}}\;\text{ and }\;\mathsf{Var}(X)=O\left(n\log^{3}\frac{k}{\delta}\right).

By Chebyshev’s inequality, we have

𝖯(Xnη11logη+nlogkδ)O(nlog3kδn2logkδ)=O(n1+o(1))=o(δ),\operatorname{\mathsf{P}}\left(X\geq\frac{n\eta^{\prime}}{1-\frac{1}{\log\eta^{\prime}}}+n\sqrt{\log\frac{k}{\delta}}\right)\leq O\left(\frac{n\log^{3}\frac{k}{\delta}}{n^{2}\log\frac{k}{\delta}}\right)=O(n^{-1+o(1)})=o(\delta),

where the last two equalities hold true under the assumption that δn1+ϵ\delta\geq n^{-1+{\epsilon}} for some positive constant ϵ{\epsilon}. Hence, by the union bound and Lemma 15, the error probability of the proposed fixed-length algorithm is at most (3+o(1))δ(3+o(1))\delta. This proves Corollary 3 for the regime knlognk\leq\frac{n}{\log n}.

B-C2 Fixed-length Version of Algorithm 1

To construct the fixed-length version of Algorithm 1, we similarly replace the CheckBit function by the SafeCheckBit function and add a termination rule based on the number of queries spent. More specifically, we replace the call of the function CheckBit(x_{i},\delta/n) on Line 3 of Algorithm 1 by SafeCheckBit(x_{i},\delta/n). Furthermore, we add the following termination rule to the for loop: the algorithm terminates and declares failure if it is about to use the

(nη′′11logη′′+nlognδ)-th\left(\frac{n\eta^{\prime\prime}}{1-\frac{1}{\log\eta^{\prime\prime}}}+n\sqrt{\log\frac{n}{\delta}}\right)\text{-th}

query, where η′′112plognδδlog1pp\eta^{\prime\prime}\triangleq\frac{1}{1-2p}\left\lceil\frac{\log\frac{n-\delta}{\delta}}{\log\frac{1-p}{p}}\right\rceil.

By the termination condition, we know that the algorithm deterministically uses at most (1+o(1))nlognδDKL(p1p)=(1+o(1))nlogkδDKL(p1p)(1+o(1))\frac{n\log\frac{n}{\delta}}{{D_{\mathrm{KL}}(p\|1-p)}}=(1+o(1))\frac{n\log\frac{k}{\delta}}{{D_{\mathrm{KL}}(p\|1-p)}} queries, where the equality follows because k>nlognk>\frac{n}{\log n}. By an analogous analysis as in the previous section, we know that the error probability of the algorithm is at most (1+o(1))δ(1+o(1))\delta given that δn1+ϵ\delta\geq n^{-1+\epsilon}. This completes the proof for the regime k>nlognk>\frac{n}{\log n}.

Appendix C Implementation Details

In this part, we give some details about the simulation results shown in Figure 1. In this simulation, our algorithm for computing the 𝖬𝖠𝖩𝖮𝖱𝖨𝖳𝖸\mathsf{MAJORITY} function (i.e., the threshold function with k=n/2k=n/2) is compared to the existing algorithm proposed in Feige et al., (1994). A binary sequence of length n=100n=100 chosen uniformly at random is inputted to the algorithms, and the desired error probability in the algorithms is set to δ=102\delta=10^{-2}. For a sequence generated uniformly at random, the law of large numbers implies that the fraction of ones in the sequence is approximately 1/21/2 with high probability. This scenario is the most likely to induce errors for an algorithm computing the 𝖬𝖠𝖩𝖮𝖱𝖨𝖳𝖸\mathsf{MAJORITY} function, making it critical for assessing the worst-case performance of the algorithm. For k=n/2k=n/2, the existing work by Feige et al., (1994) proposes a noisy sorting algorithm that sorts the input binary sequence, and returns the kk-th largest element in the sorted sequence as an estimate of the threshold function. The reader is referred to Section 3 in Feige et al., (1994) for more details about the existing algorithm. For the simulation of our proposed algorithm, we use Algorithm 1 (since k=n/2k=n/2). At each value of pp ranging from 0.01 to 0.25, we run 10000 realizations of input sequences and compute the average number of queries used by the two algorithms.
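A possible Python sketch of this simulation set-up is given below. It follows the structure described above (uniformly random length-100 inputs, \delta=10^{-2}, and p swept from 0.01 to 0.25) for the proposed Algorithm 1; the routine names are ours, and the comparison with the sorting-based algorithm of Feige et al., (1994) is omitted.

import random

def check_bit_with_count(x, delta, p):
    # CheckBit (Algorithm 3) on a bit x observed through a BSC(p).
    # Returns (estimate, number of queries used).
    alpha, queries = 0.5, 0
    while delta < alpha < 1 - delta:
        r = x if random.random() > p else 1 - x  # one noisy reading
        queries += 1
        if r == 1:
            alpha = (1 - p) * alpha / ((1 - p) * alpha + p * (1 - alpha))
        else:
            alpha = p * alpha / (p * alpha + (1 - p) * (1 - alpha))
    return (1 if alpha >= 1 - delta else 0), queries

def average_queries(n=100, delta=1e-2, p=0.1, trials=10000):
    # Average number of queries used by Algorithm 1 over random inputs.
    total = 0
    for _ in range(trials):
        x = [random.randint(0, 1) for _ in range(n)]
        total += sum(check_bit_with_count(xi, delta / n, p)[1] for xi in x)
    return total / trials

# sweep p from 0.01 to 0.25, as in Figure 1
curve = {round(0.01 * i, 2): average_queries(p=0.01 * i) for i in range(1, 26)}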