
Noisy Computing of the Threshold Function

Ziao Wang, Nadim Ghaddar, Banghua Zhu, and Lele Wang. Ziao Wang is with the Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, BC V6T 1Z4, Canada (email: [email protected]). Nadim Ghaddar is with the Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON M5S 3G8, Canada (email: [email protected]). Banghua Zhu is with the Department of Electrical Engineering and Computer Sciences, University of California Berkeley, Berkeley, CA 94720, USA (email: [email protected]). Lele Wang is with the Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, BC V6T 1Z4, Canada (email: [email protected]).
Abstract

Let $\mathsf{TH}_k$ denote the $k$-out-of-$n$ threshold function: given $n$ input Boolean variables, the output is $1$ if and only if at least $k$ of the inputs are $1$. We consider the problem of computing the $\mathsf{TH}_k$ function using noisy readings of the Boolean variables, where each reading is incorrect with some fixed and known probability $p\in(0,1/2)$. As our main result, we show that it is sufficient to use $(1+o(1))\frac{n\log\frac{m}{\delta}}{D_{\mathsf{KL}}(p\|1-p)}$ queries in expectation to compute the $\mathsf{TH}_k$ function with a vanishing error probability $\delta=o(1)$, where $m\triangleq\min\{k,n-k+1\}$ and $D_{\mathsf{KL}}(p\|1-p)$ denotes the Kullback-Leibler divergence between the $\mathsf{Bern}(p)$ and $\mathsf{Bern}(1-p)$ distributions. Conversely, we show that any algorithm achieving an error probability of $\delta=o(1)$ necessitates at least $(1-o(1))\frac{(n-m)\log\frac{m}{\delta}}{D_{\mathsf{KL}}(p\|1-p)}$ queries in expectation. The upper and lower bounds are tight when $m=o(n)$, and are within a multiplicative factor of $\frac{n}{n-m}$ when $m=\Theta(n)$. In particular, when $k=n/2$, the $\mathsf{TH}_k$ function corresponds to the $\mathsf{MAJORITY}$ function, in which case the upper and lower bounds are tight up to a multiplicative factor of two. Compared to previous work, our result tightens the dependence on $p$ in both the upper and lower bounds.

I Introduction

Coping with noise in computing is an important problem to consider in large systems. With applications in fault tolerance (Hastad et al., 1987; Pease et al., 1980; Pippenger et al., 1991), active ranking (Shah and Wainwright, 2018; Agarwal et al., 2017; Falahatgar et al., 2017; Heckel et al., 2019; Wang et al., 2024; Gu and Xu, 2023), and noisy searching (Berlekamp, 1964; Horstein, 1963; Burnashev and Zigangirov, 1974; Pelc, 1989; Karp and Kleinberg, 2007), among many others, the goal is to devise algorithms that are robust enough to detect and correct the errors that can happen during the computation. More concretely, the problem can be defined as follows: suppose an agent is interested in computing a function $f$ of $n$ variables with an error probability at most $\delta$, as quickly as possible. To this end, the agent can ask binary questions (referred to hereafter as queries) about the variables at hand. The binary responses to the queries are observed by the agent through a binary symmetric channel (BSC) with crossover probability $p$. The agent can adaptively design subsequent queries based on the responses to previous queries. The goal is to characterize the relation between $n$, $\delta$, $p$, and the query complexity, which is defined as the minimal number of queries needed by the agent to meet the intended goal of computing the function $f$.

This paper considers the computation of the threshold-$k$ function. For $n$ Boolean variables $\mathbf{x}=(x_1,\ldots,x_n)\in\{0,1\}^n$, the threshold-$k$ function $\mathsf{TH}_k(\cdot)$ computes whether the number of 1's in $\mathbf{x}$ is at least $k$ or not, i.e.,

$$\mathsf{TH}_k(\mathbf{x})\triangleq\begin{cases}1&\text{ if }\sum_{i=1}^{n}x_i\geq k;\\ 0&\text{ if }\sum_{i=1}^{n}x_i<k.\end{cases}$$

The noisy queries correspond to noisy readings of the bits: at each time step, the agent queries one of the bits, and with probability $p$ the wrong value of the bit is returned. It is assumed that the constant $p\in(0,1/2)$ is known to the agent. Our goal is to characterize the optimal query complexity for computing the $\mathsf{TH}_k$ function with error probability at most $\delta$. Throughout the paper, we assume the parameters $k$ and $\delta$ are functions that scale with $n$, while $p$ is an absolute constant independent of all other parameters. We also assume that $\delta$ is always strictly positive.

This model for noisy computation of the $\mathsf{TH}_k$ function has been considered in Feige et al. (1994), where upper and lower bounds on the number of queries needed to compute the function are derived in terms of the number of variables $n$, the threshold $k$, the noise probability $p$, and the desired error probability $\delta$. The upper and lower bounds in Feige et al. (1994) are within a constant factor of each other, hence providing the optimal order for the minimum number of queries; however, an exact tight characterization of the optimal number of queries remains open.

In this paper, we tighten this gap and provide new upper and lower bounds for the computation of the $\mathsf{TH}_k$ function, which simultaneously improve the existing upper and lower bounds.

The main result of this paper can be stated as follows: for any $1\leq k\leq n$, there exists an algorithm that computes the $\mathsf{TH}_k$ function with an error probability at most $\delta=o(1)$, and the algorithm uses at most

$$(1+o(1))\frac{n\log\frac{m}{\delta}}{D_{\mathsf{KL}}(p\|1-p)}$$

queries in expectation. Here we define $m\triangleq\min\{k,n-k+1\}$ and denote the Kullback-Leibler divergence between the $\mathsf{Bern}(p)$ and $\mathsf{Bern}(1-p)$ distributions by $D_{\mathsf{KL}}(p\|1-p)$. Conversely, we prove that to achieve an error probability of $\delta=o(1)$, any algorithm must make at least

$$(1-o(1))\frac{(n-m)\log\frac{m}{\delta}}{D_{\mathsf{KL}}(p\|1-p)}$$

queries in expectation. When $m=o(n)$, the ratio between these upper and lower bounds is $1+o(1)$, and hence we provide an asymptotically tight characterization of the optimal number of queries. For general $m$, these bounds are tight within a multiplicative factor of $2$.
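As a quick numerical illustration of these bounds, the following Python snippet (a sketch of ours, not part of the paper; the function names are ours) evaluates the leading-order upper and lower bounds for given $n$, $k$, $\delta$, and $p$.

```python
import math

def kl_bern(p, q):
    """KL divergence D_KL(Bern(p) || Bern(q)) in nats."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def threshold_query_bounds(n, k, delta, p):
    """Leading-order upper/lower bounds on the expected number of noisy
    queries for computing TH_k with error probability delta
    (the 1 +/- o(1) factors are omitted)."""
    m = min(k, n - k + 1)
    d = kl_bern(p, 1 - p)
    upper = n * math.log(m / delta) / d
    lower = (n - m) * math.log(m / delta) / d
    return lower, upper

# Example: MAJORITY on n = 100 bits with p = 0.1 and delta = 0.01.
print(threshold_query_bounds(100, 50, 1e-2, 0.1))
```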

Figure 1: Average number of queries used by the proposed noisy threshold algorithm for computing the $\mathsf{MAJORITY}$ function (i.e., $k=n/2$), in comparison to the algorithm proposed in Feige et al. (1994), for $n=100$ and $\delta=10^{-2}$. The theoretical lower bound corresponds to the plot of $\frac{(n-m)\log(m/\delta)}{D_{\mathrm{KL}}(p\|1-p)}$, where $m=\min\{k,n-k+1\}$.

To provide a more quantitative comparison with the bounds presented in Feige et al. (1994), consider the example where $k=n^{1/3}$, $\delta=n^{-1/4}$, and $p=\frac{1}{3}$. According to our results, the optimal number of queries to achieve error probability $\delta$ is approximately $2.5247\cdot n\log n$. In contrast, the constants in front of $n\log n$ in the lower and upper bounds given in Feige et al. (1994) are roughly $0.0506$ and $433.7518$, respectively. (The constant in the upper bound of Feige et al. (1994) is not explicitly given; we refined the analysis and computed the constant in this example using the algorithm proposed there.) As an additional example, Figure 1 shows a simulation of our proposed algorithm computing the $\mathsf{MAJORITY}$ function (that is, the threshold function with $k=n/2$) compared to the algorithm proposed in Feige et al. (1994), when $n=100$ and $\delta=10^{-2}$. Our algorithm clearly uses fewer queries on average to achieve the desired error probability. Moreover, our algorithm shows a scaling similar to the theoretical lower bound derived in this paper. The detailed setup of the simulation is given in Appendix C.

I-A Related Work

The noisy computation of the $\mathsf{TH}_k$ function has primarily been investigated with a focus on the special case of $k=1$, i.e., the $\mathsf{OR}$ function. Early investigations into the noisy computation of the $\mathsf{OR}$ function can be traced back to studies in circuit theory with noisy gates (Dobrushin and Ortyukov, 1977a, 1977b; Von Neumann, 1956; Pippenger et al., 1991; Gács and Gál, 1994) and noisy decision trees (Evans and Pippenger, 1998; Reischuk and Schmeltz, 1991). For the more general case of arbitrary $k$, the noisy computation of the $\mathsf{TH}_k$ function was initially explored in the seminal work of Feige et al. (1994) on noisy computing. Subsequent investigations extended this study to the context of noisy broadcast networks (Kushilevitz and Mansour, 2005). In the noise model we consider, the best-known upper and lower bounds on the number of queries required to achieve a desired error probability $\delta$ are due to Feige et al. (1994). Their work characterizes the optimal order of the number of queries as $\Theta(n\log(k/\delta))$. In this work, we improve the known bounds and provide an asymptotically tight characterization of the required number of queries.

Recently, there have been several notable efforts to precisely characterize the optimal number of queries, with tight constant dependencies, for various noisy computing tasks. Wang et al. (2024) consider the problem of noisy sorting with pairwise comparisons, and provide upper and lower bounds on the required number of queries that differ by a multiplicative factor of $2$. This gap was later closed by Gu and Xu (2023). In a recent paper, Zhu et al. (2024) establish tight bounds for two tasks: computing the $\mathsf{OR}$ function of a binary sequence, and computing the $\mathsf{MAX}$ function of a real sequence using pairwise comparisons.

I-B Notation

In this paper, we adhere to the following notation conventions. Sets and events are represented using calligraphic letters, random variables using uppercase letters, and specific realizations of random variables using lowercase letters. For example, $\mathcal{X}$ and $\mathcal{Y}$ are two sets, $X$ and $Y$ are random variables, and $x$ and $y$ are their respective realizations. Vectors are denoted using boldface letters, while scalars are denoted using regular (non-bold) letters. The probability of an event $\mathcal{A}$ is expressed as $\mathsf{P}(\mathcal{A})$, and the expectation of a real-valued random variable $X$ is denoted as $\mathsf{E}[X]$. For a finite set $\mathcal{X}$, its cardinality is denoted by $|\mathcal{X}|$. Unless otherwise specified, we use $\log$ to indicate the natural logarithm. We use $[n]$ to denote the set of integers $\{1,\ldots,n\}$. We follow the standard Landau notation: $f(n)=O(g(n))$ if $\lim_{n\to\infty}\frac{|f(n)|}{g(n)}<\infty$; $f(n)=\Omega(g(n))$ if $\lim_{n\to\infty}\frac{f(n)}{g(n)}>0$; $f(n)=\Theta(g(n))$ if $f(n)=O(g(n))$ and $f(n)=\Omega(g(n))$; $f(n)=o(g(n))$ if $\lim_{n\to\infty}\frac{f(n)}{g(n)}=0$; $f(n)=\omega(g(n))$ if $\lim_{n\to\infty}\frac{|f(n)|}{|g(n)|}=\infty$; and $f(n)\sim g(n)$ if $\lim_{n\to\infty}\frac{f(n)}{g(n)}=1$. Unless otherwise specified, limits in all instances of Landau notation are taken with respect to $n$.

II Main Results

The main result of this paper comprises two theorems, through which we provide an asymptotically tight characterization of the minimum expected number of queries required to compute the $\mathsf{TH}_k$ function with worst-case error probability $\delta$.

Theorem 1 (Converse).

Suppose $k\leq n/2$ and $\delta=o(1)$. Consider any variable-length algorithm for computing $\mathsf{TH}_k(\mathbf{x})$ that makes $M$ noisy queries. If $M$ satisfies $\mathsf{E}[M\,|\,\mathbf{x}]\leq(1-o(1))\frac{(n-k)\log\frac{k}{\delta}}{D_{\mathrm{KL}}(p\|1-p)}$ for every input instance $\mathbf{x}$, then the worst-case error probability of the algorithm is at least $\delta$.

The main technical contribution of this paper lies in the proof of Theorem 1. In this proof, we consider an enhanced yet more statistically tractable version of any adaptive algorithm for computing the $\mathsf{TH}_k$ function, and leverage probabilistic methods to lower bound its error probability. We illustrate the main idea of the proof in Section III, while deferring the detailed proof to Appendix A.

Theorem 2 (Achievability).

Suppose $k\leq n/2$ and $\delta=o(1)$. There exists a variable-length algorithm that computes the $\mathsf{TH}_k$ function with worst-case error probability at most $\delta$. Moreover, the number of queries $M$ made by the algorithm satisfies $\mathsf{E}[M\,|\,\mathbf{x}]\leq(1+o(1))\frac{n\log\frac{k}{\delta}}{D_{\mathrm{KL}}(p\|1-p)}$ for every input instance $\mathbf{x}$.

We describe the proposed algorithm, and sketch the proof of Theorem 2 in Section IV. The detailed proof is presented in Appendix B.

The upper and lower bounds established in Theorems 1 and 2 are restricted to the case $k\leq n/2$. The following corollary extends the results to any $k\in[n]$.

Corollary 1.

Let $m=\min\{k,n-k+1\}$. For any $k\in[n]$ and $\delta=o(1)$, it is sufficient to use $(1+o(1))\frac{n\log\frac{m}{\delta}}{D_{\mathsf{KL}}(p\|1-p)}$ queries in expectation to compute the $\mathsf{TH}_k$ function with worst-case error probability $\delta$, while at least $(1-o(1))\frac{(n-m)\log\frac{m}{\delta}}{D_{\mathsf{KL}}(p\|1-p)}$ queries in expectation are necessary.

Corollary 1 follows readily from the fact that $\mathsf{TH}_k(\mathbf{x})=1-\mathsf{TH}_{n-k+1}(\mathbf{1}-\mathbf{x})$ for any binary sequence $\mathbf{x}$, which implies that computing the $\mathsf{TH}_k$ function is equivalent to computing the $\mathsf{TH}_{n-k+1}$ function.

Remark 1 (Gap between the upper and lower bounds).

The upper and lower bounds presented in Corollary 1 are within a multiplicative factor of $2+o(1)$. To see this, notice that the ratio between the upper and lower bounds in Corollary 1 is $(1+o(1))\frac{n}{n-m}$. Because $m=\min\{k,n-k+1\}\leq(n+1)/2$, this ratio attains its maximum value of $2+o(1)$ when $k=(n+1)/2$. The problem of closing this gap remains open.

Corollary 2 (Tight bounds for $\mathsf{OR}$ and $\mathsf{AND}$ functions).

It is both sufficient and necessary to use $(1\pm o(1))\frac{n\log\frac{1}{\delta}}{D_{\mathsf{KL}}(p\|1-p)}$ noisy queries in expectation to compute the $\mathsf{OR}$ and $\mathsf{AND}$ functions of $n$ Boolean variables with worst-case error probability $\delta$.

Corollary 2 follows from the fact that the $\mathsf{OR}$ and $\mathsf{AND}$ functions can be viewed as special cases of the $\mathsf{TH}_k$ function with $k=1$ and $k=n$, respectively. It recovers the tight bounds for the $\mathsf{OR}$ and $\mathsf{AND}$ functions found by Zhu et al. (2024).

Corollary 3 (Fixed-length algorithms).

Suppose $\delta=o(1)$ and $\delta\geq n^{-1+\epsilon}$ for some arbitrarily small positive constant $\epsilon$. Then for any $k\leq n/2$, there exists a fixed-length algorithm that computes the $\mathsf{TH}_k$ function with error probability at most $\delta$. Moreover, the algorithm uses at most $(1+o(1))\frac{n\log\frac{k}{\delta}}{D_{\mathsf{KL}}(p\|1-p)}$ queries.

We prove Corollary 3 in Appendix B by constructing a fixed-length version of the proposed algorithm.

III Main Idea in the Proof of the Lower Bound

In this section, we present the main idea in the proof of Theorem 1. We illustrate the proof idea in two distinct regimes, $\log k\leq\frac{\log(1/\delta)}{\log\log(1/\delta)}$ and $\log k>\frac{\log(1/\delta)}{\log\log(1/\delta)}$, as the techniques applied in these regimes differ significantly.

III-A Regime $\log k\leq\frac{\log(1/\delta)}{\log\log(1/\delta)}$

In this regime, our lower bound relies on Le Cam's two-point method. In this method, we view the noisy computing problem as a binary hypothesis testing problem. We design an input instance $\mathbf{x}^{(0)}$ with $\mathsf{TH}_k(\mathbf{x}^{(0)})=0$, and $n-k$ input instances $\mathbf{x}^{(1)},\ldots,\mathbf{x}^{(n-k)}$ with $\mathsf{TH}_k(\mathbf{x}^{(i)})=1$ for each $i\in[n-k]$. Using Le Cam's two-point lemma (Yu, 1997), we show that when the query complexity of the algorithm is below a certain threshold, the algorithm cannot distinguish $\mathbf{x}^{(0)}$ from $\mathbf{x}^{(i)}$ for some $i\in[n-k]$, leading to an estimation error for the $\mathsf{TH}_k$ function. Specifically, this method yields a lower bound of $(1-o(1))\frac{(n-k)\log\frac{1}{\delta}}{D_{\mathrm{KL}}(p\|1-p)}$ on the required number of queries, which further implies the desired bound $(1-o(1))\frac{(n-k)\log\frac{k}{\delta}}{D_{\mathrm{KL}}(p\|1-p)}$ by the fact that $\log(k/\delta)=(1+o(1))\log\frac{1}{\delta}$ in this regime. We defer the detailed proof in this regime to Appendix A.
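For the reader's convenience, the two-point bound can be stated in the following standard form (our phrasing, not a verbatim statement from the paper): for any estimator $\widehat{\mathsf{TH}}_k$ and any $i\in[n-k]$,

$$\max\Big\{\mathsf{P}_{\mathbf{x}^{(0)}}\big(\widehat{\mathsf{TH}}_k=1\big),\ \mathsf{P}_{\mathbf{x}^{(i)}}\big(\widehat{\mathsf{TH}}_k=0\big)\Big\}\;\geq\;\frac{1}{2}\Big(1-\mathsf{TV}\big(P_{\mathbf{x}^{(0)}},P_{\mathbf{x}^{(i)}}\big)\Big),$$

where $P_{\mathbf{x}}$ denotes the joint distribution of the query responses under input $\mathbf{x}$ and $\mathsf{TV}$ is the total variation distance. The total variation term is then typically controlled through the per-query KL divergence (e.g., via the Bretagnolle-Huber inequality), which is where the $D_{\mathrm{KL}}(p\|1-p)$ factor arises.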

Notably, in the opposite regime $\log k>\frac{\log(1/\delta)}{\log\log(1/\delta)}$, the equality $\log(k/\delta)=(1+o(1))\log\frac{1}{\delta}$ does not necessarily hold, and hence the bound provided by Le Cam's method does not yield the desired lower bound. For example, when $\delta=\frac{1}{\log n}$ and $k=n/2$, the lower bound provided by Le Cam's method is $(1-o(1))\frac{\frac{n}{2}\log\log n}{D_{\mathrm{KL}}(p\|1-p)}$, which does not even match the asymptotic order of the desired lower bound $(1-o(1))\frac{\frac{n}{2}\log\frac{n\log n}{2}}{D_{\mathrm{KL}}(p\|1-p)}$.

III-B Regime $\log k>\frac{\log(1/\delta)}{\log\log(1/\delta)}$

In this regime with a larger value of $k$, the bound derived from Le Cam's method falls short of capturing the dependence of the optimal number of queries on the parameter $k$. Therefore, we adopt an alternative approach, where we directly analyze the query-response statistics of an arbitrary algorithm and bound its error probability. The major difficulty in establishing a converse result for an arbitrary algorithm is that the algorithm may have an arbitrarily correlated query strategy. To tackle this challenge, we introduce a two-phase enhanced version of any given variable-length algorithm. The first phase of the enhanced algorithm is called the non-adaptive phase: it queries each bit a fixed number of times. The second phase is called the adaptive phase: it adaptively chooses certain bits and noiselessly reveals their values. This two-phase design makes the response statistics more tractable and analyzable. We show that the worst-case error probability of the enhanced algorithm is less than or equal to that of the original algorithm. Therefore, it suffices to show that the enhanced algorithm has error probability at least $\delta$. We sketch the proof in the rest of this section.

To prove the desired bound $(1-o(1))\frac{(n-k)\log\frac{k}{\delta}}{D_{\mathrm{KL}}(p\|1-p)}$, we define a vanishing quantity $\epsilon=o(1)$ and prove that any algorithm making at most $\frac{(1-\epsilon)(n-k-\epsilon n)\log\frac{k}{\delta}}{D_{\mathrm{KL}}(p-\epsilon\|1-p)}$ queries in expectation has error probability at least $\delta$.

Enhanced algorithm. Consider an arbitrary algorithm $\mathcal{A}$ that makes at most $\frac{(1-\epsilon)(n-k-\epsilon n)\log\frac{k}{\delta}}{D_{\mathrm{KL}}(p-\epsilon\|1-p)}$ queries in expectation. We introduce an enhanced version of $\mathcal{A}$, denoted $\mathcal{A}'$, and show that the error probability of $\mathcal{A}'$ is always less than or equal to the error probability of $\mathcal{A}$. We then lower bound the error probability of $\mathcal{A}'$ to complete the proof.

The enhanced algorithm $\mathcal{A}'$ consists of a two-phase procedure:

  • Non-adaptive phase: query each of the $n$ bits $\alpha\triangleq\frac{(1-\epsilon)\log\frac{k}{\delta}}{D_{\mathrm{KL}}(p-\epsilon\|1-p)}$ times.

  • Adaptive phase: adaptively choose $N'$ bits and noiselessly reveal their values, where the number of revealed bits $N'$ satisfies $\mathsf{E}[N'\,|\,\mathbf{x}]=n-k-\epsilon n$ for every $\mathbf{x}$.

The purpose of the enhanced algorithm is to transform an intractable adaptive algorithm into a tractable one with analyzable statistics. The idea of algorithm enhancement was first introduced by Feige et al. (1994), where the authors define an enhancement for any fixed-length algorithm. In the above definition of $\mathcal{A}'$, we extend this notion of algorithm enhancement to accommodate any variable-length algorithm, and analyze the error probability of this enhanced variable-length algorithm. Consequently, the lower bound established in our work applies to the expected query complexity of any variable-length algorithm, while the lower bound of Feige et al. (1994) applies only to the deterministic query complexity of any fixed-length algorithm.

It is intuitive that Algorithm $\mathcal{A}'$ performs no worse than $\mathcal{A}$. To see this, note that the expected number of bits queried more than $\alpha=\frac{(1-\epsilon)\log\frac{k}{\delta}}{D_{\mathrm{KL}}(p-\epsilon\|1-p)}$ times by Algorithm $\mathcal{A}$ is at most $n-k-\epsilon n$, because $\mathcal{A}$ makes at most $\frac{(1-\epsilon)(n-k-\epsilon n)\log\frac{k}{\delta}}{D_{\mathrm{KL}}(p-\epsilon\|1-p)}$ queries in expectation. In contrast, Algorithm $\mathcal{A}'$ makes $\alpha$ queries to each bit, and adaptively chooses $n-k-\epsilon n$ bits in expectation whose values it acquires noiselessly. As a result, Algorithm $\mathcal{A}'$ obtains at least as much information on every bit as Algorithm $\mathcal{A}$, and hence has a lower error probability. We formally state this result in the following lemma, and defer its proof to Appendix A.

Lemma 1.

The worst-case error probability of $\mathcal{A}'$ is at most the worst-case error probability of $\mathcal{A}$.

Now, it suffices to show that the worst-case error probability of $\mathcal{A}'$ is at least $\delta$. To bound the error probability of Algorithm $\mathcal{A}'$, we analyze the query responses from the non-adaptive phase using a balls-and-bins model. We view each of the $n$ bits as a ball with an underlying weight. A bit of value one corresponds to a heavy ball, and a bit of value zero corresponds to a light ball. In the non-adaptive phase, each bit $x_i$ is queried $\alpha$ times. Let $I_i$ denote the number of ones among these $\alpha$ responses. If $I_i=j$, we put ball $i$ into bin $j\in\{0,1,\ldots,\alpha\}$. This creates $\alpha+1$ bins $\mathcal{S}_j\triangleq\{i\in[n]\colon I_i=j\}$ for $j\in\{0,1,\ldots,\alpha\}$.

The statistics of the responses obtained in the non-adaptive phase can be characterized by the sequence of $\alpha+1$ sets $\mathcal{S}\triangleq(\mathcal{S}_0,\ldots,\mathcal{S}_\alpha)$. For $j\in\{0,\ldots,\alpha\}$, we define $\mathcal{H}_j\triangleq\{i\in[n]\colon x_i=1,I_i=j\}$ and $\mathcal{L}_j\triangleq\{i\in[n]\colon x_i=0,I_i=j\}$, i.e., $\mathcal{H}_j$ (resp. $\mathcal{L}_j$) represents the set of heavy (resp. light) balls in the $j$th bin. Let $H_j\triangleq|\mathcal{H}_j|$ and $L_j\triangleq|\mathcal{L}_j|$. Let $\mathbf{H}_0^\alpha\triangleq(H_0,\ldots,H_\alpha)$ and $\mathbf{L}_0^\alpha\triangleq(L_0,\ldots,L_\alpha)$.
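For intuition, the following short Python sketch (ours, not part of the proof) simulates the non-adaptive phase and the resulting bin occupancies; each $I_i$ is $\mathrm{Binomial}(\alpha,1-p)$ for a heavy ball and $\mathrm{Binomial}(\alpha,p)$ for a light ball.

```python
import random

def simulate_bins(x, alpha, p, rng=random.Random(0)):
    """Simulate the non-adaptive phase: query each bit alpha times through a
    BSC(p) and bin ball i according to the number of 1-responses I_i."""
    H = [0] * (alpha + 1)  # heavy-ball counts per bin (H_0, ..., H_alpha)
    L = [0] * (alpha + 1)  # light-ball counts per bin (L_0, ..., L_alpha)
    for xi in x:
        # Each response equals xi with probability 1-p and 1-xi with probability p.
        I_i = sum((rng.random() < p) ^ xi for _ in range(alpha))
        (H if xi == 1 else L)[I_i] += 1
    return H, L

# Example: 99 heavy balls among n = 1000, alpha = 10 queries per bit, p = 0.2.
x = [1] * 99 + [0] * 901
print(simulate_bins(x, alpha=10, p=0.2))
```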

In the adaptive phase of Algorithm $\mathcal{A}'$, the weights of $N'$ balls are noiselessly revealed. Using $\mathcal{S}$ and the weights of the balls revealed in the adaptive phase, Algorithm $\mathcal{A}'$ needs to estimate whether at least $k$ of the $n$ balls are heavy.

Restricting input sequence weights. To facilitate the analysis of Algorithm $\mathcal{A}'$, we restrict the input instance space. We assume that the input sequence $\mathbf{x}$ satisfies either $w(\mathbf{x})=k-1$ or $w(\mathbf{x})=k$, i.e., there are either $k-1$ or $k$ heavy balls. The worst-case error over the restricted inputs is a lower bound on the original worst-case error:

$$\max_{\mathbf{x}\in\{0,1\}^{n}}\mathsf{P}\big(\widehat{\mathsf{TH}}_{k}\neq\mathsf{TH}_{k}(\mathbf{x})\,\big|\,\mathbf{x}\big)\geq\max_{\substack{\mathbf{x}\in\{0,1\}^{n}\colon\\ w(\mathbf{x})=k\text{ or }w(\mathbf{x})=k-1}}\mathsf{P}\big(\widehat{\mathsf{TH}}_{k}\neq\mathsf{TH}_{k}(\mathbf{x})\,\big|\,\mathbf{x}\big),$$

where $\widehat{\mathsf{TH}}_k$ denotes the output estimate of Algorithm $\mathcal{A}'$. Later we show that the error on the restricted input space is at least $\delta$.

Genie-aided information. We assume that a genie provides Algorithm $\mathcal{A}'$ with additional information after the non-adaptive phase to assist the estimation. Specifically, the genie reveals two sequences of sets $\bar{\mathcal{H}}_0^\alpha=(\bar{\mathcal{H}}_0,\ldots,\bar{\mathcal{H}}_\alpha)$ and $\bar{\mathcal{L}}_0^\alpha=(\bar{\mathcal{L}}_0,\ldots,\bar{\mathcal{L}}_\alpha)$, which are defined under the two hypotheses $w(\mathbf{x})=k-1$ and $w(\mathbf{x})=k$ as follows.

  • When $w(\mathbf{x})=k-1$, there are in total $k-1$ heavy balls in the bins, and the genie reveals the indices of all of them, i.e., $(\bar{\mathcal{H}}_0,\ldots,\bar{\mathcal{H}}_\alpha)=(\mathcal{H}_0,\ldots,\mathcal{H}_\alpha)$ and $(\bar{\mathcal{L}}_0,\ldots,\bar{\mathcal{L}}_\alpha)=(\mathcal{L}_0,\ldots,\mathcal{L}_\alpha)$.

  • When $w(\mathbf{x})=k$, the genie decides to reveal either the indices of $k-1$ heavy balls or the indices of all $k$ heavy balls through a random process. Let $\mathcal{T}\triangleq\{j\in\{0,1,\ldots,\alpha\}\colon\alpha(p-\epsilon)\leq j\leq\alpha(p+\epsilon)\}$ be the set of typical bin indices for light balls. The genie picks a random bin $J\in\mathcal{T}$ from the distribution $\mathsf{P}(J=j)=\frac{L_j}{\sum_{i\in\mathcal{T}}L_i}$. If the chosen bin $J$ contains no heavy balls, i.e., $H_J=0$, then the genie reveals the indices of all $k$ heavy balls by setting $(\bar{\mathcal{H}}_0,\ldots,\bar{\mathcal{H}}_\alpha)=(\mathcal{H}_0,\ldots,\mathcal{H}_\alpha)$ and $(\bar{\mathcal{L}}_0,\ldots,\bar{\mathcal{L}}_\alpha)=(\mathcal{L}_0,\ldots,\mathcal{L}_\alpha)$. On the other hand, if $H_J\geq 1$, then the genie chooses a heavy ball $i$ in bin $J$ uniformly at random to be hidden, and reveals the indices of all other heavy balls. This is done by setting $(\bar{\mathcal{H}}_0,\ldots,\bar{\mathcal{H}}_\alpha)=(\mathcal{H}_0,\ldots,\mathcal{H}_{J-1},\mathcal{H}_J\setminus\{i\},\mathcal{H}_{J+1},\ldots,\mathcal{H}_\alpha)$ and $(\bar{\mathcal{L}}_0,\ldots,\bar{\mathcal{L}}_\alpha)=(\mathcal{L}_0,\ldots,\mathcal{L}_{J-1},\mathcal{L}_J\cup\{i\},\mathcal{L}_{J+1},\ldots,\mathcal{L}_\alpha)$.

It is not hard to see that, with the additional noiseless information revealed by the genie, the performance of Algorithm $\mathcal{A}'$ does not deteriorate. Moreover, because in all cases the sets $\bar{\mathcal{H}}_0^\alpha$ contain only heavy balls, and the sets $\{\bar{\mathcal{L}}_j\colon j\notin\mathcal{T}\}$ contain only light balls, we assume without loss of generality that the adaptive phase of $\mathcal{A}'$ only reveals the weights of the balls in $\{\bar{\mathcal{L}}_j\colon j\in\mathcal{T}\}$. In the remaining proof, we analyze the error probability of $\mathcal{A}'$ in the presence of the genie-aided information.

Why can we establish a tighter lower bound? Before sketching the proof of our lower bound with the genie-aided information, we first clarify the key distinctions between our analysis and that in Feige et al. (1994). We also highlight the main reason why our approach provides a tighter lower bound on the optimal query complexity.

In the proof by Feige et al. (1994), the error bound for $\mathcal{A}'$ is also established through a genie-aided analysis, where the genie picks the bin $J$ in which to hide a heavy ball from all possible bins $\{0,\ldots,\alpha\}$ instead of choosing it from the set $\mathcal{T}$. It is then shown that when $\alpha$ is small enough, every bin in $\{0,\ldots,\alpha\}$ contains at least one heavy ball with high probability, ensuring that the genie hides a heavy ball in the chosen bin. Furthermore, it is shown that the noiseless queries in the adaptive phase fail to find the hidden heavy ball with constant probability, resulting in the scenario where the algorithm observes the $k-1$ heavy balls revealed by the genie but cannot tell whether there is a $k$th hidden heavy ball. This leads to an overall error probability of at least $\delta$ for the algorithm.

In our analysis, we observe that having a heavy ball in every bin with high probability is not a necessary condition for establishing the desired error bound for Algorithm $\mathcal{A}'$. In our proof, the genie instead picks the bin $J$ from $\mathcal{T}$ in which to hide a heavy ball, where $\mathcal{T}$ is a typical set of bins for the light balls. We then focus on analyzing the statistical behavior of the bins in $\mathcal{T}$. We show that for each $i\in\mathcal{T}$, the probability that bin $i$ contains at least one heavy ball is $\omega(\delta)$. Moreover, conditioned on the event that a heavy ball is hidden by the genie, we further show that the information revealed by the genie and the noiseless queries in the adaptive phase are not informative enough for the algorithm to distinguish whether there is a $k$th heavy ball that is hidden. This leads to the desired error bound. Notably, requiring each bin $i\in\mathcal{T}$ to contain at least one heavy ball with probability $\omega(\delta)$ imposes a much less stringent condition on the choice of $\alpha$ compared to requiring all bins to contain at least one heavy ball with high probability. Consequently, the choice of $\alpha=\frac{(1-\epsilon)\log\frac{k}{\delta}}{D_{\mathrm{KL}}(p-\epsilon\|1-p)}$ in our analysis is much larger than the choice of $\alpha$ in Feige et al. (1994). Therefore, we can show that the error lower bound $\delta$ applies to algorithms with higher query complexity, which leads to a tighter lower bound on the optimal query complexity.

Error analysis for the optimal estimator. Towards a contradiction, we assume that the worst-case error is at most $\delta$, i.e.,

$$\max\big\{\mathsf{P}(\widehat{\mathsf{TH}}_k=1\,|\,w(\mathbf{x})=k-1),\,\mathsf{P}(\widehat{\mathsf{TH}}_k=0\,|\,w(\mathbf{x})=k)\big\}\leq\delta. \tag{1}$$

This implies

$$\mathsf{P}(\widehat{\mathsf{TH}}_k=0\,|\,w(\mathbf{x})=k-1)\geq 1-\delta. \tag{2}$$

However, in the rest of the proof, we show (2) implies

$$\mathsf{P}(\widehat{\mathsf{TH}}_k=0\,|\,w(\mathbf{x})=k)>\delta, \tag{3}$$

which contradicts (1).

Sufficient statistic for estimation. We identify a sufficient statistic for the optimal estimator $\widehat{\mathsf{TH}}_k$. When Algorithm $\mathcal{A}'$ decides its output, it has access to $\bar{\mathcal{H}}_0^\alpha$, $\bar{\mathcal{L}}_0^\alpha$, and the weights of the balls revealed in the adaptive phase. By the symmetry of the balls in the same bin, it suffices for $\mathcal{A}'$ to consider the cardinalities of the sets $\bar{\mathcal{H}}_0^\alpha$ and $\bar{\mathcal{L}}_0^\alpha$, denoted by $\bar{\mathbf{H}}_0^\alpha\triangleq(|\bar{\mathcal{H}}_0|,\ldots,|\bar{\mathcal{H}}_\alpha|)$ and $\bar{\mathbf{L}}_0^\alpha\triangleq(|\bar{\mathcal{L}}_0|,\ldots,|\bar{\mathcal{L}}_\alpha|)$. Moreover, notice that the unrevealed balls include at most one heavy ball, so the adaptive phase of the algorithm finds either one heavy ball or none. By the symmetry of the balls in the same bin, it suffices for $\mathcal{A}'$ to consider whether the adaptive phase finds any heavy ball. Let $\mathcal{E}_1$ denote the event that the adaptive phase does not find any heavy ball. Thus, the optimal estimator $\widehat{\mathsf{TH}}_k$ of $\mathcal{A}'$ can be written as a function $\widehat{\mathsf{TH}}_k(\bar{\mathbf{H}}_0^\alpha,\bar{\mathbf{L}}_0^\alpha,\mathbbm{1}_{\{\mathcal{E}_1\}})$.

Next, we argue that certain realizations of the sufficient statistic yield the correct value of the true threshold $\mathsf{TH}_k(\mathbf{x})$ (and thus no error for the optimal estimator). Notice that if $\sum_{i=0}^{\alpha}\bar{H}_i=k$, then we know that $w(\mathbf{x})=k$, because $\bar{\mathcal{H}}_0^\alpha$ contains only indices of heavy balls. Moreover, if $\mathbbm{1}_{\{\mathcal{E}_1\}}=0$, we know that a heavy ball is found in the adaptive phase, and it follows that $w(\mathbf{x})=k$. Therefore, we assume without loss of generality that the optimal estimator satisfies $\widehat{\mathsf{TH}}_k(\bar{\mathbf{H}}_0^\alpha,\bar{\mathbf{L}}_0^\alpha,\mathbbm{1}_{\{\mathcal{E}_1\}})=1$ whenever $\sum_{i=0}^{\alpha}\bar{H}_i=k$ or $\mathbbm{1}_{\{\mathcal{E}_1\}}=0$.

Now the only ambiguous case that might lead to an error is when $\sum_{i=0}^{\alpha}\bar{H}_i=k-1$ and $\mathbbm{1}_{\{\mathcal{E}_1\}}=1$. This corresponds to the case where the genie reveals the indices of $k-1$ heavy balls, and the adaptive phase does not find any heavy ball. Among the remaining realizations of the sufficient statistic, we further divide them into two sets according to whether the estimate $\widehat{\mathsf{TH}}_k(\mathbf{h}_0^\alpha,\mathbf{l}_0^\alpha,1)$ is $0$ or $1$. We define $\mathcal{N}$ as the collection of realizations $(\mathbf{h}_0^\alpha,\mathbf{l}_0^\alpha)$ such that $\widehat{\mathsf{TH}}_k(\mathbf{h}_0^\alpha,\mathbf{l}_0^\alpha,1)=0$. In other words, we have

$$\mathsf{P}(\widehat{\mathsf{TH}}_k=0)=\sum_{(\mathbf{h}_0^\alpha,\mathbf{l}_0^\alpha)\in\mathcal{N}}\mathsf{P}\big(\{(\bar{\mathbf{H}}_0^\alpha,\bar{\mathbf{L}}_0^\alpha)=(\mathbf{h}_0^\alpha,\mathbf{l}_0^\alpha)\}\cap\mathcal{E}_1\big). \tag{4}$$

Ideally, if we can prove that the sufficient statistic $(\bar{\mathbf{H}}_0^\alpha,\bar{\mathbf{L}}_0^\alpha,\mathbbm{1}_{\{\mathcal{E}_1\}})$ is similarly distributed under the two hypotheses $w(\mathbf{x})=k-1$ and $w(\mathbf{x})=k$, then we can lower bound the ratio $\frac{\mathsf{P}(\widehat{\mathsf{TH}}_k=0\,|\,w(\mathbf{x})=k)}{\mathsf{P}(\widehat{\mathsf{TH}}_k=0\,|\,w(\mathbf{x})=k-1)}$, and hence show that (2) implies (3). More specifically, if we can show that

$$\frac{\mathsf{P}\big(\{(\bar{\mathbf{H}}_0^\alpha,\bar{\mathbf{L}}_0^\alpha)=(\mathbf{h}_0^\alpha,\mathbf{l}_0^\alpha)\}\cap\mathcal{E}_1\,\big|\,w(\mathbf{x})=k\big)}{\mathsf{P}\big(\{(\bar{\mathbf{H}}_0^\alpha,\bar{\mathbf{L}}_0^\alpha)=(\mathbf{h}_0^\alpha,\mathbf{l}_0^\alpha)\}\cap\mathcal{E}_1\,\big|\,w(\mathbf{x})=k-1\big)}=\omega(\delta) \tag{5}$$

for any $(\mathbf{h}_0^\alpha,\mathbf{l}_0^\alpha)\in\mathcal{N}$, then we would have

$$\mathsf{P}(\widehat{\mathsf{TH}}_k=0\,|\,w(\mathbf{x})=k)=\omega(\delta)\,\mathsf{P}(\widehat{\mathsf{TH}}_k=0\,|\,w(\mathbf{x})=k-1)>\delta,$$

which would complete the proof. However, it turns out that the desired property (5) does not hold for all realizations $(\mathbf{h}_0^\alpha,\mathbf{l}_0^\alpha)\in\mathcal{N}$. Instead, we slightly adjust this approach, and identify a family $\mathcal{C}$ of realizations $(\mathbf{h}_0^\alpha,\mathbf{l}_0^\alpha)$ satisfying

  1. Typicality: $\sum_{(\mathbf{h}_0^\alpha,\mathbf{l}_0^\alpha)\in\mathcal{C}}\mathsf{P}\big((\bar{\mathbf{H}}_0^\alpha,\bar{\mathbf{L}}_0^\alpha)=(\mathbf{h}_0^\alpha,\mathbf{l}_0^\alpha)\,\big|\,w(\mathbf{x})=k-1\big)\geq 1-o(1)$.

  2. Bounded ratio property (for simplicity, the property presented here is a simplified version of what we actually prove; it holds only in the special case when $\mathcal{A}'$ makes a fixed number of queries in the adaptive phase, and we refer the reader to Propositions 2 and 4 for the properties in the general case): $\frac{\mathsf{P}\big(\{(\bar{\mathbf{H}}_0^\alpha,\bar{\mathbf{L}}_0^\alpha)=(\mathbf{h}_0^\alpha,\mathbf{l}_0^\alpha)\}\cap\mathcal{E}_1\,\big|\,w(\mathbf{x})=k\big)}{\mathsf{P}\big(\{(\bar{\mathbf{H}}_0^\alpha,\bar{\mathbf{L}}_0^\alpha)=(\mathbf{h}_0^\alpha,\mathbf{l}_0^\alpha)\}\cap\mathcal{E}_1\,\big|\,w(\mathbf{x})=k-1\big)}=\omega(\delta)$ for all $(\mathbf{h}_0^\alpha,\mathbf{l}_0^\alpha)\in\mathcal{C}$.

With these two properties of the family $\mathcal{C}$, we have

$$\begin{aligned}
\mathsf{P}(\widehat{\mathsf{TH}}_k=0\,|\,w(\mathbf{x})=k) &\geq \sum_{(\mathbf{h}_0^\alpha,\mathbf{l}_0^\alpha)\in\mathcal{N}\cap\mathcal{C}}\mathsf{P}\big(\{(\bar{\mathbf{H}}_0^\alpha,\bar{\mathbf{L}}_0^\alpha)=(\mathbf{h}_0^\alpha,\mathbf{l}_0^\alpha)\}\cap\mathcal{E}_1\,\big|\,w(\mathbf{x})=k\big)\\
&\geq \omega(\delta)\sum_{(\mathbf{h}_0^\alpha,\mathbf{l}_0^\alpha)\in\mathcal{N}\cap\mathcal{C}}\mathsf{P}\big(\{(\bar{\mathbf{H}}_0^\alpha,\bar{\mathbf{L}}_0^\alpha)=(\mathbf{h}_0^\alpha,\mathbf{l}_0^\alpha)\}\cap\mathcal{E}_1\,\big|\,w(\mathbf{x})=k-1\big)\\
&\geq \omega\big(\delta(1-\delta-o(1))\big)=\omega(\delta),
\end{aligned}$$

which completes the proof. In the detailed proof presented in Appendix A, we further separate the regime $\log k>\frac{\log(1/\delta)}{\log\log(1/\delta)}$ into two sub-regimes, $\frac{\log(1/\delta)}{\log\log(1/\delta)}<\log k\leq\log(1/\delta)\log\log(1/\delta)$ and $\log k>\log(1/\delta)\log\log(1/\delta)$. We present the detailed construction of the family $\mathcal{C}$ in these two sub-regimes in Appendix A.

IV Main Idea in the Proof of the Upper Bound

In this section, we propose a noisy computing algorithm for the $\mathsf{TH}_k$ function, and illustrate the main idea in the proof of Theorem 2. We separately consider the two regimes $k>\frac{n}{\log n}$ and $k\leq\frac{n}{\log n}$.

IV-A Regime $k>\frac{n}{\log n}$

To estimate a function of a binary sequence $\mathbf{x}$, one natural idea is to first estimate each bit in $\mathbf{x}$ and then compute the function on the estimates. In this regime, the proposed algorithm follows exactly this idea.

The proposed algorithm estimates the bits in the sequence by employing an existing algorithm, CheckBit, as a subroutine. CheckBit (Algorithm 1 in Gu and Xu (2023)) takes as input a single bit and a desired error probability $\delta$, and returns an estimate of the bit through repeated queries. A detailed description of the CheckBit algorithm is provided in Algorithm 3 in Appendix B. Lemma 13 in Gu and Xu (2023) shows that the algorithm makes at most $\frac{1}{1-2p}\left\lceil\frac{\log\frac{1-\delta}{\delta}}{\log\frac{1-p}{p}}\right\rceil$ queries in expectation and returns the correct value of the input bit with probability at least $1-\delta$.
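For concreteness, here is a short Python sketch in the spirit of such a sequential bit-estimation subroutine (our illustration, not the exact CheckBit pseudocode from Gu and Xu (2023)): it keeps querying the bit until the gap between the counts of 1-responses and 0-responses reaches $\left\lceil\log\frac{1-\delta}{\delta}/\log\frac{1-p}{p}\right\rceil$, which corresponds to a likelihood ratio of $\frac{1-\delta}{\delta}$ in favor of one value.

```python
import math
import random

def check_bit(query, delta, p):
    """Estimate a single bit from noisy readings.

    `query()` returns the bit flipped with probability p. We stop once the
    difference between 1-responses and 0-responses reaches a threshold that
    drives the posterior error probability below delta (a CheckBit-style
    sequential test; sketch only)."""
    threshold = math.ceil(math.log((1 - delta) / delta) / math.log((1 - p) / p))
    ones = zeros = 0
    while abs(ones - zeros) < threshold:
        if query():
            ones += 1
        else:
            zeros += 1
    return 1 if ones > zeros else 0

# Example: estimate a bit of true value 1 with p = 0.2 and delta = 1e-3.
rng = random.Random(1)
noisy_query = lambda: 1 ^ (rng.random() < 0.2)  # true bit is 1
print(check_bit(noisy_query, delta=1e-3, p=0.2))
```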

In the proposed algorithm, we call the CheckBit function to estimate $x_i$ for each $i\in[n]$ with the error probability set to $\delta/n$, and obtain an estimate $\hat{x}_i$. The algorithm then outputs $\widehat{\mathsf{TH}}_k(\mathbf{x})=\mathbbm{1}_{\{\sum_{i=1}^{n}\hat{x}_i\geq k\}}$ as the estimate. The pseudo-code of this algorithm is given in Algorithm 1.

Data: Bit sequence $\mathbf{x}=(x_1,\ldots,x_n)$, parameter $k\leq\frac{n}{2}$, error probability $\delta$, noise probability $p$.
Result: Estimate of $\mathsf{TH}_k(\mathbf{x})$.
1 $w\leftarrow 0$
2 for $i\in[n]$ do
3   $\hat{x}_i\leftarrow$ CheckBit($x_i$, $\delta/n$, $p$)
4   $w\leftarrow w+\hat{x}_i$
5 end for
6 return $\mathbbm{1}_{\{w\geq k\}}$
Algorithm 1 Proposed NoisyThreshold algorithm for $k\geq\frac{n}{\log n}$
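A minimal runnable Python version of Algorithm 1, assuming a `check_bit` subroutine like the sketch above and a `query(i)` interface that returns a fresh noisy reading of bit $i$ (both names are ours):

```python
def noisy_threshold_large_k(query, n, k, delta, p):
    """Algorithm 1 sketch: estimate each bit with CheckBit at error level
    delta/n, then threshold the number of estimated ones at k."""
    w = 0
    for i in range(n):
        w += check_bit(lambda i=i: query(i), delta / n, p)
    return int(w >= k)
```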

By the union bound, we have $\mathsf{P}(\exists i\in[n]\colon\hat{x}_i\neq x_i)\leq\delta$. Consequently, we get $\mathsf{P}(\widehat{\mathsf{TH}}_k(\mathbf{x})\neq\mathsf{TH}_k(\mathbf{x}))\leq\delta$, i.e., the error probability of Algorithm 1 is at most $\delta$. The expected number of queries used by these $n$ calls of the CheckBit function is at most

$$\frac{n}{1-2p}\left\lceil\frac{\log\frac{1-\delta/n}{\delta/n}}{\log\frac{1-p}{p}}\right\rceil\leq\frac{n}{1-2p}\left(\frac{\log\frac{n}{\delta}}{\log\frac{1-p}{p}}+1\right)=(1+o(1))\frac{n\log\frac{n}{\delta}}{D_{\mathrm{KL}}(p\|1-p)}.$$

In the regime $k>\frac{n}{\log n}$, it follows that $\log\frac{n}{\delta}=(1+o(1))\log\frac{k}{\delta}$. Therefore, Algorithm 1 uses at most $(1+o(1))\frac{n\log\frac{k}{\delta}}{D_{\mathrm{KL}}(p\|1-p)}$ queries in expectation. This proves Theorem 2 in the regime $k>\frac{n}{\log n}$.

Notably, in the opposite regime $k\leq\frac{n}{\log n}$, the equality $\log\frac{n}{\delta}=(1+o(1))\log\frac{k}{\delta}$ does not necessarily hold, and hence the desired upper bound in Theorem 2 cannot be obtained through Algorithm 1. For example, suppose $k=\log n$ and $\delta=1/\log n$. The upper bound provided by Algorithm 1 is $(1+o(1))\frac{n\log n}{D_{\mathsf{KL}}(p\|1-p)}$, while the desired upper bound in Theorem 2 is $(1+o(1))\frac{2n\log\log n}{D_{\mathsf{KL}}(p\|1-p)}$.

IV-B Regime $k\leq\frac{n}{\log n}$

In this regime, we propose the NoisyThreshold algorithm, given in Algorithm 2, which computes the $\mathsf{TH}_k$ function with the desired number of queries and error probability. In addition to CheckBit, Algorithm 2 also uses the existing MaxHeapThreshold algorithm as a subroutine.

Data: Bit sequence $\mathbf{x}=(x_1,\ldots,x_n)$, parameter $k\leq\frac{n}{2}$, error probability $\delta$, noise probability $p$.
Result: Estimate of $\mathsf{TH}_k(\mathbf{x})$.
1 $\mathcal{S}\leftarrow\emptyset$
2 for $i\in[n]$ do
3   if CheckBit($x_i$, $\delta/k$, $p$) $=1$ then
4     Append $x_i$ to $\mathcal{S}$
5   end if
6 end for
7 if $|\mathcal{S}|\leq k-1$ then
8   return 0
9 else if $|\mathcal{S}|\geq k+\max(n\delta+n\sqrt{\delta},\frac{n}{\log n})$ then
10   return 1
11 else
12   return MaxHeapThreshold($\mathcal{S}$, $k$, $\delta$, $p$)
13 end if
Algorithm 2 Proposed NoisyThreshold algorithm for $k\leq\frac{n}{\log n}$

MaxHeapThreshold is the algorithm proposed by Feige et al. (1994) for estimating the $\mathsf{TH}_k$ function. Its main idea is to use heapsort to find the $k$th largest element in $\mathbf{x}$, and then repeatedly query this element to decide $\mathsf{TH}_k(\mathbf{x})$. To cope with the observation noise, each query in heapsort is repeated a certain number of times. Detailed descriptions of the MaxHeapThreshold algorithm and its subroutines are given in Algorithms 4, 5, and 6 in Appendix B. Theorem 3.3 in Feige et al. (1994) shows that the function uses at most $O(n\log\frac{k}{\delta})$ queries. Although the query complexity of MaxHeapThreshold achieves the correct asymptotic order, there remains a constant multiplicative gap between its query complexity and the optimal value, as demonstrated in the numerical examples provided in Section I.

The main idea in Algorithm 2 is to first reduce the problem size to order $o(n)$ through a filtering process over the bits in $\mathbf{x}$, and then apply MaxHeapThreshold to the reduced problem. In the first step, we call the CheckBit function on each bit to estimate its value with tolerated error probability $\delta/k$. The bits estimated to be $1$ are added to the set $\mathcal{S}$. This step can be viewed as a filtering process that provides a set of bits believed to be $1$ with high confidence. Notice that with an error probability of $\delta/k$ for each bit estimation, we cannot guarantee that all bits are correctly estimated with probability at least $1-\delta$ as in the case of Algorithm 1. Instead, we show that with probability at least $1-2\delta$, $|\mathcal{S}|$ is below a certain threshold $\tau$ if $\mathsf{TH}_k(\mathbf{x})=0$, and $\mathcal{S}$ contains at least $k$ bits of true value $1$ if $\mathsf{TH}_k(\mathbf{x})=1$. From here, we can output $\widehat{\mathsf{TH}}_k(\mathbf{x})=1$ if $|\mathcal{S}|\geq\tau$ (Line 10 of Algorithm 2), and output $\widehat{\mathsf{TH}}_k(\mathbf{x})=0$ if $|\mathcal{S}|\leq k-1$ (Line 8 of Algorithm 2). When neither of these two cases holds, the problem of estimating $\mathsf{TH}_k(\mathbf{x})$ reduces to deciding whether $\mathcal{S}$ contains at least $k$ bits of value $1$. For this reduced problem, we perform the second step by calling the MaxHeapThreshold function on the set $\mathcal{S}$ (Line 12 of Algorithm 2) to decide the final output.
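The following Python sketch mirrors the structure of Algorithm 2; `check_bit` is the sketch given earlier, and `max_heap_threshold` is assumed to be an available implementation of the Feige et al. (1994) subroutine (both names are ours, and the sketch is illustrative rather than the paper's exact pseudocode).

```python
import math

def noisy_threshold_small_k(query, n, k, delta, p, max_heap_threshold):
    """Algorithm 2 sketch: filter the bits with CheckBit at error level
    delta/k, decide immediately when |S| is clearly small or large, and
    otherwise fall back to MaxHeapThreshold on the reduced set S."""
    S = [i for i in range(n) if check_bit(lambda i=i: query(i), delta / k, p) == 1]
    tau = k + max(n * delta + n * math.sqrt(delta), n / math.log(n))
    if len(S) <= k - 1:
        return 0
    if len(S) >= tau:
        return 1
    return max_heap_threshold(S, k, delta, p)
```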

By applying the union bound over the two steps, the overall error probability of Algorithm 2 is at most $\delta'$ when the input parameter $\delta$ is set to $\delta'/3$. Notably, $\tau$ is chosen to be $o(n)$, so in the event that the MaxHeapThreshold function is executed, its input size is $o(n)$. Although running MaxHeapThreshold induces a suboptimality of a constant factor, running it on an input of size $o(n)$ does not affect the dominating term in the query complexity of Algorithm 2. Therefore, the dominating term of the query complexity depends solely on the query complexity of the first step, which is upper bounded by $(1+o(1))\frac{n\log\frac{k}{\delta'}}{D_{\mathrm{KL}}(p\|1-p)}$. This proves Theorem 2 in the regime $k\leq\frac{n}{\log n}$. The detailed proof for this regime is presented in Appendix B.

V Conclusion and Discussion

In this work, we establish new upper and lower bounds on the optimal query complexity for noisy computing of the $\mathsf{TH}_k$ function. Our bounds are asymptotically tight in the case $k=o(n)$, while having a multiplicative gap of at most $2$ in general. Our new bounds strictly improve the best known bounds of Feige et al. (1994). In the following, we discuss several extensions of our study.

Closing the gap.

We believe the gap between our bounds stems from the lower bound. To establish the lower bound, we assume the existence of a genie that reveals auxiliary information to the algorithm, with the amount of information scaling with $k$. We conjecture that when $k$ is large, the additional information provided by the genie fundamentally alters the optimal query complexity, leading to a looser lower bound. How to tighten our lower bound remains an open problem.

Unknown $p$.

This paper assumes that the parameter $p$ is known. However, our bounds can be easily extended to the case of unknown $p$. Note that the noisy computing problem becomes strictly harder when $p$ is unknown, so our lower bound automatically holds in the unknown-$p$ setting. To extend our upper bound, the proposed algorithms need to be modified so that they do not rely on knowledge of $p$. The key idea is to incorporate an initial estimation step for $p$ before proceeding with the subsequent steps using the estimated value. More specifically, we can first query $x_1$ a total of $\theta=\frac{n\log\frac{1}{\delta}}{\log n}$ times, and compute a biased estimate of $p$ as $\hat{p}=\frac{\min(r,\theta-r)}{\theta(1-\frac{1}{\log n})}$, where $r$ denotes the number of $1$'s observed in these $\theta$ queries. The original steps in Algorithms 1 and 2 are then executed with input $\hat{p}$. By incorporating this estimation step, it can be shown that the error probabilities of Algorithms 1 and 2 both increase by at most $o(\delta)$, and the expected query complexities both increase by a factor of at most $1+o(1)$. This extends our upper bound to the unknown-$p$ setting.
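As an illustration of this estimation step, here is a minimal Python sketch (ours) of the biased estimator described above; `query_x1` is an assumed interface that returns a fresh noisy reading of $x_1$.

```python
import math

def estimate_p(query_x1, n, delta):
    """Estimate the noise level p from repeated queries of x_1, following the
    biased estimator described above (sketch of the suggested extension)."""
    theta = max(1, round(n * math.log(1 / delta) / math.log(n)))
    r = sum(query_x1() for _ in range(theta))  # number of 1-responses
    return min(r, theta - r) / (theta * (1 - 1 / math.log(n)))
```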

Varying $p$.

In our study, we always assume that the noise parameter $p$ is a constant bounded away from both $0$ and $\frac{1}{2}$. Another interesting extension of our study is to let $p$ also be a function of $n$, and to consider the limiting behavior of the optimal complexity when $p\rightarrow 0$ or $p\rightarrow\frac{1}{2}$.

References

  • Agarwal et al., (2017) Agarwal, A., Agarwal, S., Assadi, S., and Khanna, S. (2017). Learning with limited rounds of adaptivity: Coin tossing, multi-armed bandits, and ranking from pairwise comparisons. In Proceedings of the 2017 Conference on Learning Theory, volume 65 of Proceedings of Machine Learning Research, pages 39–75. PMLR.
  • Auer et al., (1995) Auer, P., Cesa-Bianchi, N., Freund, Y., and Schapire, R. E. (1995). Gambling in a rigged casino: The adversarial multi-armed bandit problem. In Proceedings of IEEE 36th annual foundations of computer science, pages 322–331. IEEE.
  • Berlekamp, (1964) Berlekamp, E. R. (1964). Block coding with noiseless feedback. Ph.D. thesis, MIT, Cambridge, MA, USA.
  • Bretagnolle and Huber, (1979) Bretagnolle, J. and Huber, C. (1979). Estimation des densités: risque minimax. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 47(2):119–137.
  • Burnashev and Zigangirov, (1974) Burnashev, M. V. and Zigangirov, K. (1974). An interval estimation problem for controlled observations. Problemy Peredachi Informatsii, 10(3):51–61.
  • Dobrushin and Ortyukov, (1977a) Dobrushin, R. L. and Ortyukov, S. (1977a). Lower bound for the redundancy of self-correcting arrangements of unreliable functional elements. Problemy Peredachi Informatsii, 13(1):82–89.
  • Dobrushin and Ortyukov, (1977b) Dobrushin, R. L. and Ortyukov, S. (1977b). Upper bound on the redundancy of self-correcting arrangements of unreliable functional elements. Problemy Peredachi Informatsii, 13(3):56–76.
  • Evans and Pippenger, (1998) Evans, W. and Pippenger, N. (1998). Average-case lower bounds for noisy boolean decision trees. SIAM Journal on Computing, 28(2):433–446.
  • Falahatgar et al., (2017) Falahatgar, M., Orlitsky, A., Pichapati, V., and Suresh, A. T. (2017). Maximum selection and ranking under noisy comparisons. In Precup, D. and Teh, Y. W., editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1088–1096. PMLR.
  • Feige et al., (1994) Feige, U., Raghavan, P., Peleg, D., and Upfal, E. (1994). Computing with noisy information. SIAM Journal on Computing, 23(5):1001–1018.
  • Gács and Gál, (1994) Gács, P. and Gál, A. (1994). Lower bounds for the complexity of reliable boolean circuits with noisy gates. IEEE Trans. Inf. Theory, 40(2):579–583.
  • Gu and Xu, (2023) Gu, Y. and Xu, Y. (2023). Optimal bounds for noisy sorting. In Proceedings of the 55th Annual ACM Symposium on Theory of Computing, STOC 2023, page 1502–1515, New York, NY, USA. Association for Computing Machinery.
  • Hastad et al., (1987) Hastad, J., Leighton, T., and Newman, M. (1987). Reconfiguring a hypercube in the presence of faults. In Proceedings of the Nineteenth Annual ACM Symposium on Theory of Computing, STOC ’87, page 274–284, New York, NY, USA.
  • Heckel et al., (2019) Heckel, R., Shah, N. B., Ramchandran, K., and Wainwright, M. J. (2019). Active ranking from pairwise comparisons and when parametric assumptions do not help. Ann. Stat., 47(6):3099 – 3126.
  • Hoeffding, (1963) Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30.
  • Horstein, (1963) Horstein, M. (1963). Sequential transmission using noiseless feedback. IEEE Trans. Inf. Theory, 9(3):136–143.
  • Karp and Kleinberg, (2007) Karp, R. M. and Kleinberg, R. (2007). Noisy binary search and its applications. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’07, page 881–890, USA. Society for Industrial and Applied Mathematics.
  • Kushilevitz and Mansour, (2005) Kushilevitz, E. and Mansour, Y. (2005). Computation in noisy radio networks. SIAM Journal on Discrete Mathematics, 19(1):96–108.
  • Mitzenmacher and Upfal, (2017) Mitzenmacher, M. and Upfal, E. (2017). Probability and computing: Randomization and probabilistic techniques in algorithms and data analysis. Cambridge university press.
  • Pease et al., (1980) Pease, M., Shostak, R., and Lamport, L. (1980). Reaching agreement in the presence of faults. J. ACM, 27(2):228–234.
  • Pelc, (1989) Pelc, A. (1989). Searching with known error probability. Theoretical Computer Science, 63(2):185–202.
  • Pippenger et al., (1991) Pippenger, N., Stamoulis, G. D., and Tsitsiklis, J. N. (1991). On a lower bound for the redundancy of reliable networks with noisy gates. IEEE Transactions on Information Theory, 37(3):639–643.
  • Reischuk and Schmeltz, (1991) Reischuk, R. and Schmeltz, B. (1991). Reliable computation with noisy circuits and decision trees-a general n log n lower bound. In [1991] Proceedings 32nd Annual Symposium of Foundations of Computer Science, pages 602–611. IEEE Computer Society.
  • Shah and Wainwright, (2018) Shah, N. B. and Wainwright, M. J. (2018). Simple, robust and optimal ranking from pairwise comparisons. J. Mach. Learn. Res., 18(199):1–38.
  • Von Neumann, (1956) Von Neumann, J. (1956). Probabilistic logics and the synthesis of reliable organisms from unreliable components. Automata studies, 34:43–98.
  • Wang et al., (2024) Wang, Z., Ghaddar, N., Zhu, B., and Wang, L. (2024). Noisy sorting capacity. IEEE Transactions on Information Theory, 70(9):6121–6138.
  • Yu, (1997) Yu, B. (1997). Assouad, fano, and le cam. In Festschrift for Lucien Le Cam, pages 423–435. Springer.
  • Zhu et al., (2024) Zhu, B., Wang, Z., Ghaddar, N., Jiao, J., and Wang, L. (2024). Noisy computing of the or and max functions. IEEE Journal on Selected Areas in Information Theory, 5:302–313.

Appendix A Detailed Proof of Theorem 1

In this section, we prove Theorem 1 in three separate regimes log(1/δ)loglog(1/δ)<logklog(1/δ)loglog(1/δ)\frac{\log(1/\delta)}{\log\log(1/\delta)}<\log k\leq\log(1/\delta)\log\log(1/\delta), logk>log(1/δ)loglog(1/δ)\log k>\log(1/\delta)\log\log(1/\delta) and logklog(1/δ)loglog(1/δ)\log k\leq\frac{\log(1/\delta)}{\log\log(1/\delta)}.

A-A Regime 1: log(1/δ)loglog(1/δ)<logklog(1/δ)loglog(1/δ)\frac{\log(1/\delta)}{\log\log(1/\delta)}<\log k\leq\log(1/\delta)\log\log(1/\delta)

Let ϵ=(logk)1/4{\epsilon}=(\log k)^{-1/4}. Because we assume that logk>log(1/δ)loglog(1/δ)=ω(1)\log k>\frac{\log(1/\delta)}{\log\log(1/\delta)}=\omega(1), we have ϵ=o(1){\epsilon}=o(1). Since the KL divergence is a Lipschitz continuous function, it follows that (1ϵ)(nkϵn)logkδDKL(pϵ1p)=(1o(1))(nk)logkδDKL(p1p)\frac{(1-\epsilon)(n-k-\epsilon n)\log\frac{k}{\delta}}{{D_{\mathrm{KL}}}(p-\epsilon\|1-p)}=(1-o(1))\frac{(n-k)\log\frac{k}{\delta}}{{D_{\mathrm{KL}}(p\|1-p)}}, and it suffices to show that for any algorithm that makes at most (1ϵ)(nkϵn)logkδDKL(pϵ1p)\frac{(1-\epsilon)(n-k-\epsilon n)\log\frac{k}{\delta}}{{D_{\mathrm{KL}}}(p-\epsilon\|1-p)} queries in expectation, the worst-case error probability is at least δ\delta.
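
As a purely illustrative aside (not part of the proof), the continuity fact used above can be checked numerically: the binary KL divergence varies smoothly in its first argument, so DKL(p−ϵ∥1−p) approaches DKL(p∥1−p) as ϵ→0. The values of p and ϵ in the snippet below are arbitrary illustrative choices.

```python
# Illustrative numerical check (not part of the proof): the binary KL divergence
# D_KL(q || 1-p) is continuous in q on compact subsets of (0,1), so
# D_KL(p - eps || 1-p) approaches D_KL(p || 1-p) as eps -> 0.
from math import log

def d_kl_bern(q: float, r: float) -> float:
    """KL divergence D(Bern(q) || Bern(r)) in nats."""
    return q * log(q / r) + (1 - q) * log((1 - q) / (1 - r))

p = 0.1  # illustrative crossover probability
for eps in (0.05, 0.01, 0.001):
    print(eps, d_kl_bern(p - eps, 1 - p), d_kl_bern(p, 1 - p))
```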

Let 𝒜\mathcal{A} be an arbitrary algorithm that uses at most (1ϵ)(nkϵn)logkδDKL(pϵ1p)\frac{(1-\epsilon)(n-k-\epsilon n)\log\frac{k}{\delta}}{{D_{\mathrm{KL}}}(p-\epsilon\|1-p)} queries in expectation. Recall the enhanced algorithm 𝒜\mathcal{A}^{\prime} described in Section III. By Lemma 1, it suffices to show that the worst case error probability of algorithm 𝒜\mathcal{A}^{\prime} is at least δ\delta. To accomplish this, we prove

𝖯(𝖳𝖧^k=0|w(𝐱)=k)>δ\operatorname{\mathsf{P}}(\widehat{\mathsf{TH}}_{k}=0\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k)>\delta (6)

under the assumption that

𝖯(𝖳𝖧^k=0|w(𝐱)=k1)1δ.\operatorname{\mathsf{P}}(\widehat{\mathsf{TH}}_{k}=0\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)\geq 1-\delta. (7)

Recall the notations under the balls-and-bins model we defined in Section III. In addition, we define Pj1(αj)(1p)jpαjP_{j}^{1}\triangleq\binom{\alpha}{j}(1-p)^{j}p^{\alpha-j} as the probability that a heavy ball is put into bin jj, and Pj0(αj)pj(1p)αjP_{j}^{0}\triangleq\binom{\alpha}{j}p^{j}(1-p)^{\alpha-j} as the probability that a light ball is put into bin jj. Recall that we define 𝒩\mathcal{N} as the collection of realizations (𝐡0α,𝐥0α)({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha}) such that 𝖳𝖧^k(𝐡0α,𝐥0α,1)=0\widehat{\mathsf{TH}}_{k}({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha},1)=0, and we have

𝖯(𝖳𝖧^k=0)=(𝐡0α,𝐥0α)𝒩𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}1).\displaystyle\operatorname{\mathsf{P}}(\widehat{\mathsf{TH}}_{k}=0)=\sum_{({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{N}}\operatorname{\mathsf{P}}(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\cap{\mathcal{E}}_{1}). (8)
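
As a hedged aside (not used in the argument), the bin probabilities defined above are simply binomial pmfs: under the natural reading consistent with the formulas for Pj1 and Pj0, a ball's bin index counts the number of its α noisy readings equal to 1, so Pj1 is the Binomial(α,1−p) pmf at j and Pj0 is the Binomial(α,p) pmf at j. The parameter values below are illustrative.

```python
# Sketch of the bin probabilities P_j^1 and P_j^0 defined above: they are the
# Binomial(alpha, 1-p) and Binomial(alpha, p) pmfs, respectively, and hence each
# sums to one over the bins j = 0, ..., alpha.  alpha and p are illustrative.
from math import comb

def p1(j: int, alpha: int, p: float) -> float:
    return comb(alpha, j) * (1 - p) ** j * p ** (alpha - j)

def p0(j: int, alpha: int, p: float) -> float:
    return comb(alpha, j) * p ** j * (1 - p) ** (alpha - j)

alpha, p = 20, 0.1
assert abs(sum(p1(j, alpha, p) for j in range(alpha + 1)) - 1.0) < 1e-9
assert abs(sum(p0(j, alpha, p) for j in range(alpha + 1)) - 1.0) < 1e-9
```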

In the following, we construct the family 𝒞\mathcal{C} of realizations such that the typicality and the bounded ratio property introduced in Section III are satisfied. Let 𝒞\mathcal{C} be the collection of all (2α+2)(2\alpha+2)-tuples of natural numbers (𝐡0α,𝐥0α)({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha}) satisfying

  1. 1.

    i=0αhi=k1\sum_{i=0}^{\alpha}h_{i}=k-1, i=0αli=nk+1\sum_{i=0}^{\alpha}l_{i}=n-k+1;

  2. 2.

    i𝒯li(nk+1)(12exp(logk6pDKL(pϵ1p)))(1lognn)\sum_{i\in{\mathcal{T}}}l_{i}\geq(n-k+1)\left(1-2\exp\left(-\frac{\sqrt{\log k}}{6p{D_{\mathrm{KL}}}(p-{\epsilon}\|1-p)}\right)\right)\left(1-\sqrt{\frac{\log n}{n}}\right);

  3. 3.

    (1Δj0)(nk+1)Pj0lj(1+Δj0)(nk+1)Pj0(1-\Delta_{j}^{0})(n-k+1)P_{j}^{0}\leq l_{j}\leq(1+\Delta_{j}^{0})(n-k+1)P_{j}^{0} for any j𝒯j\in{\mathcal{T}}, where Δj0logn(nk+1)Pj0\Delta_{j}^{0}\triangleq\sqrt{\frac{\log n}{(n-k+1)P_{j}^{0}}};

  4. 4.

    hi=0h_{i}=0 for any i𝒯i\in{\mathcal{T}}.

Here we provide a brief interpretation of the last three conditions in the definition of 𝒞\mathcal{C}. Firstly, as we have mentioned, 𝒯{\mathcal{T}} represents the typical range of bins for the light balls. Therefore, condition 2 essentially states that only a vanishing fraction of the light balls will fall outside the typical bins. Secondly, as we will see in the later part of the proof, Pj0=Ω(n1+Θ(1))P_{j}^{0}=\Omega(n^{-1+\Theta(1)}) for each j𝒯j\in{\mathcal{T}}. This implies that Δj0=o(1)\Delta_{j}^{0}=o(1), and condition 3 requires that the number of light balls in each typical bin has a small deviation from its mean. Finally, we will later show that 𝖤[Hj]=o(1)\operatorname{\mathsf{E}}[H_{j}]=o(1) for each j𝒯j\in{\mathcal{T}}, and condition 4 essentially requires the concentration of the number of heavy balls in these bins. From these, we can see that 𝒞\mathcal{C} is a collection of typical realizations for (𝐇0α,𝐋0α)({\bf H}_{0}^{\alpha},{\bf L}_{0}^{\alpha}). In the following two propositions, we state the typicality and the bounded ratio property for 𝒞\mathcal{C}.

Proposition 1.

There exists a positive constant γ\gamma such that

(𝐡0α,𝐥0α)𝒞𝖯({(𝐇0α,𝐋0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)1O(logkloglog(1/δ)kγloglog(1/δ))O(n1/6).\sum_{({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{C}}\operatorname{\mathsf{P}}(\{({\bf H}_{0}^{\alpha},{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)\geq 1-O\left(\frac{\log k\log\log(1/\delta)}{k^{\frac{\gamma}{\log\log(1/\delta)}}}\right)-O(n^{-1/6}). (9)
Proposition 2.

Let n(nk+1)(12exp(logk6pDKL(pϵ1p)))(1lognn)(2ϵα+1).n^{*}\triangleq(n-k+1)\left(1-2\exp\left(-\frac{\sqrt{\log k}}{6p{D_{\mathrm{KL}}}(p-{\epsilon}\|1-p)}\right)\right)\left(1-\sqrt{\frac{\log n}{n}}\right)-(2{\epsilon}\alpha+1). For any (𝐡0α,𝐥0α)𝒞({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{C}, we have

𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}1|w(𝐱)=k)𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)\displaystyle\frac{\operatorname{\mathsf{P}}\left(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\cap{\mathcal{E}}_{1}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k\right)}{\operatorname{\mathsf{P}}\left(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1\right)} (10)
(1+o(1))δn(kδ)ϵ/2(n𝖤[N|w(𝐱)=k1,(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)]).\displaystyle\geq\tfrac{(1+o(1))\delta}{n}\left(\tfrac{k}{\delta}\right)^{{\epsilon}/2}\left(n^{*}-\operatorname{\mathsf{E}}[N^{\prime}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w({\mathbf{x}})=k-1,(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})]\right). (11)

With Propositions 1 and 2, we are now ready to prove inequality (6). By Proposition 2, we have

𝖯(𝖳𝖧^k=0|w(𝐱)=k)\displaystyle\operatorname{\mathsf{P}}(\widehat{\mathsf{TH}}_{k}=0\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k)
(𝐡0α,𝐥0α)𝒩𝒞𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}1|w(𝐱)=k)\displaystyle\geq\sum_{({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{N}\cap\mathcal{C}}\operatorname{\mathsf{P}}(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\cap{\mathcal{E}}_{1}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k)
(a)(𝐡0α,𝐥0α)𝒩𝒞𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)\displaystyle\stackrel{{\scriptstyle\mathchoice{\hbox to0.0pt{\hss$\displaystyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\textstyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\scriptstyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\scriptscriptstyle{(a)}$\hss}}}}{{\geq}}\sum_{({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{N}\cap\mathcal{C}}\operatorname{\mathsf{P}}(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)
(1+o(1))δn(kδ)ϵ/2(n𝖤[N|w(𝐱)=k1,(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)])\displaystyle\hskip 10.00002pt\cdot\tfrac{(1+o(1))\delta}{n}\left(\tfrac{k}{\delta}\right)^{{\epsilon}/2}\left(n^{*}-\operatorname{\mathsf{E}}[N^{\prime}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w({\mathbf{x}})=k-1,(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})]\right)
=(1+o(1))δn(kδ)ϵ/2((𝐡0α,𝐥0α)𝒩𝒞n𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)\displaystyle=\tfrac{(1+o(1))\delta}{n}\left(\tfrac{k}{\delta}\right)^{{\epsilon}/2}\Bigg{(}\sum_{({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{N}\cap\mathcal{C}}n^{*}\operatorname{\mathsf{P}}(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)
𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)𝖤[N|w(𝐱)=k1,(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)]).\displaystyle\hskip 10.00002pt-\operatorname{\mathsf{P}}(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)\operatorname{\mathsf{E}}[N^{\prime}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w({\mathbf{x}})=k-1,(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})]\Bigg{)}. (12)

Now, we bound the summation in (12). Since we assume that 𝖯(𝖳𝖧^k=0|w(𝐱)=k1)1δ\operatorname{\mathsf{P}}(\widehat{\mathsf{TH}}_{k}=0\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)\geq 1-\delta, we have

1δ𝖯(𝖳𝖧^k=0|w(𝐱)=k1)=(𝐡0α,𝐥0α)𝒩𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1),1-\delta\leq\operatorname{\mathsf{P}}(\widehat{\mathsf{TH}}_{k}=0\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)=\sum_{({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{N}}\operatorname{\mathsf{P}}(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1), (13)

By Proposition 1, we have

(𝐡0α,𝐥0α)𝒩𝒞𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)1δO(logkloglog(1/δ)kγloglog(1/δ))O(n1/6).\sum_{({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{N}\cap\mathcal{C}}\operatorname{\mathsf{P}}(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)\geq 1-\delta-O\left(\frac{\log k\log\log(1/\delta)}{k^{\frac{\gamma}{\log\log(1/\delta)}}}\right)-O(n^{-1/6}).

By the definition of nn^{*} in Proposition 2, it follows that

(𝐡0α,𝐥0α)𝒩𝒞n𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)\displaystyle\sum_{({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{N}\cap\mathcal{C}}n^{*}\operatorname{\mathsf{P}}(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)
(nk)(1δO(logkloglog(1/δ)kγloglog(1/δ))O(n1/6)2exp(logk6pDKL(pϵ1p))\displaystyle\geq(n-k)\bigg{(}1-\delta-O\left(\frac{\log k\log\log(1/\delta)}{k^{\frac{\gamma}{\log\log(1/\delta)}}}\right)-O(n^{-1/6})-2\exp\left(-\frac{\sqrt{\log k}}{6p{D_{\mathrm{KL}}}(p-{\epsilon}\|1-p)}\right)
O(lognn)O(ϵαn)).\displaystyle\;\;\;-O\left(\sqrt{\tfrac{\log n}{n}}\right)-O\left(\frac{{\epsilon}\alpha}{n}\right)\bigg{)}. (14)

By the law of total expectation and the fact that NN^{\prime} is always non-negative, we have

(𝐡0α,𝐥0α)𝒩𝒞𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)𝖤[N|w(𝐱)=k1,(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)]\displaystyle\sum_{({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{N}\cap\mathcal{C}}\operatorname{\mathsf{P}}(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)\operatorname{\mathsf{E}}[N^{\prime}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w({\mathbf{x}})=k-1,(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})]
𝖤[N|w(𝐱)=k1]=nkϵn.\displaystyle\leq\operatorname{\mathsf{E}}[N^{\prime}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w({\mathbf{x}})=k-1]=n-k-{\epsilon}n. (15)

Because ϵ=(logk)1/4{\epsilon}=(\log k)^{-1/4}, we have

δ+logkloglog(1/δ)kγloglog(1/δ)+n1/6+2exp(logk6pDKL(pϵ1p))+lognn+ϵαn=o(ϵ).\delta+\frac{\log k\log\log(1/\delta)}{k^{\frac{\gamma}{\log\log(1/\delta)}}}+n^{-1/6}+2\exp\left(-\frac{\sqrt{\log k}}{6p{D_{\mathrm{KL}}}(p-{\epsilon}\|1-p)}\right)+\sqrt{\tfrac{\log n}{n}}+\frac{{\epsilon}\alpha}{n}=o(\epsilon).

Plugging (14) and (15) into (12) yields

𝖯(𝖳𝖧^k=0|w(𝐱)=k)(1o(1))ϵδ(kδ)ϵ/2.\operatorname{\mathsf{P}}(\widehat{\mathsf{TH}}_{k}=0\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k)\geq(1-o(1)){\epsilon}\delta\left(\tfrac{k}{\delta}\right)^{{\epsilon}/2}.

Finally, ϵ=(logk)1/4{\epsilon}=(\log k)^{-1/4} implies that ϵ(kδ)ϵ/2=ω(1){\epsilon}\left(\tfrac{k}{\delta}\right)^{{\epsilon}/2}=\omega(1). This shows that

𝖯(𝖳𝖧^k=0|w(𝐱)=k)ω(δ).\operatorname{\mathsf{P}}(\widehat{\mathsf{TH}}_{k}=0\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k)\geq\omega(\delta).

To finish the proof in this regime, we are left to prove Propositions 1 and 2 and Lemma 1.

A-A1 Proof of Propositions

Proof of Proposition 1.

We define error events

2{j𝒯Lj(nk+1)(12exp(logk6pDKL(pϵ1p)))(1lognn)},{\mathcal{E}}_{2}\triangleq\left\{\sum_{j\in{\mathcal{T}}}L_{j}\leq(n-k+1)\left(1-2\exp\left(-\frac{\sqrt{\log k}}{6p{D_{\mathrm{KL}}}(p-{\epsilon}\|1-p)}\right)\right)\left(1-\sqrt{\frac{\log n}{n}}\right)\right\},
3{j𝒯:Lj[(1Δj0)(nk+1)Pj0,(1+Δj0)(nk+1)Pj0]},{\mathcal{E}}_{3}\triangleq\{\exists j\in{\mathcal{T}}\mathchar 58\relax L_{j}\notin[(1-\Delta_{j}^{0})(n-k+1)P_{j}^{0},(1+\Delta_{j}^{0})(n-k+1)P_{j}^{0}]\},

and

4{j𝒯:Hj1}.{\mathcal{E}}_{4}\triangleq\{\exists j\in{\mathcal{T}}\mathchar 58\relax H_{j}\geq 1\}.

Notice that events 2,3{\mathcal{E}}_{2},{\mathcal{E}}_{3} and 4{\mathcal{E}}_{4} correspond to the second, third and fourth conditions in the definition of 𝒞\mathcal{C}. The first condition is always satisfied because we always have i=0αHi=k1\sum_{i=0}^{\alpha}H_{i}=k-1 and i=0αLi=nk+1\sum_{i=0}^{\alpha}L_{i}=n-k+1 given w(𝐱)=k1w(\mathbf{x})=k-1. It follows that

(𝐡0α,𝐥0α)𝒞𝖯({(𝐇0α,𝐋0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)\displaystyle\sum_{({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{C}}\operatorname{\mathsf{P}}(\{({\bf H}_{0}^{\alpha},{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)
=1𝖯(234|w(𝐱)=k1)\displaystyle=1-\operatorname{\mathsf{P}}({\mathcal{E}}_{2}\cup{\mathcal{E}}_{3}\cup{\mathcal{E}}_{4}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)
1𝖯(2|w(𝐱)=k1)𝖯(3|w(𝐱)=k1)𝖯(4|w(𝐱)=k1),\displaystyle\geq 1-\operatorname{\mathsf{P}}({\mathcal{E}}_{2}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)-\operatorname{\mathsf{P}}({\mathcal{E}}_{3}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)-\operatorname{\mathsf{P}}({\mathcal{E}}_{4}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1),

where the last inequality follows by the union bound. We bound the probabilities of these three error events in the following lemmas, whose proofs are deferred to a later part of this section.

Lemma 2.

For any 𝐱\mathbf{x} with w(𝐱)=k1w(\mathbf{x})=k-1, we have 𝖯(2|𝐱)n1/5.\operatorname{\mathsf{P}}({\mathcal{E}}_{2}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x})\leq n^{-1/5}.

Lemma 3.

For any 𝐱\mathbf{x} with w(𝐱)=k1w(\mathbf{x})=k-1, we have 𝖯(3|𝐱)n1/6\operatorname{\mathsf{P}}({\mathcal{E}}_{3}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x})\leq n^{-1/6}.

Lemma 4.

For any 𝐱\mathbf{x} with w(𝐱)=k1w(\mathbf{x})=k-1, we have 𝖯(4|𝐱)=O(logkloglog(1/δ)kγloglog(1/δ))\operatorname{\mathsf{P}}({\mathcal{E}}_{4}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x})=O\left(\frac{\log k\log\log(1/\delta)}{k^{\frac{\gamma}{\log\log(1/\delta)}}}\right), where γ\gamma is a positive constant.

By Lemmas 2, 3 and 4, we have

(𝐡0α,𝐥0α)𝒞𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)\displaystyle\sum_{({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{C}}\operatorname{\mathsf{P}}(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1) 1n1/5n1/6O(logkloglog(1/δ)kγloglog(1/δ))\displaystyle\geq 1-n^{-1/5}-n^{-1/6}-O\left(\frac{\log k\log\log(1/\delta)}{k^{\frac{\gamma}{\log\log(1/\delta)}}}\right)
=1O(n1/6)O(logkloglog(1/δ)kγloglog(1/δ)),\displaystyle=1-O(n^{-1/6})-O\left(\frac{\log k\log\log(1/\delta)}{k^{\frac{\gamma}{\log\log(1/\delta)}}}\right),

which completes the proof of Proposition 1. ∎
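
As a rough numerical illustration of the typicality captured by Proposition 1 (a sanity check only, under our own parameter choices), one can sample the bin counts given w(𝐱)=k−1 using the multinomial distributions recalled in the proof of Proposition 2 below, with 𝒯 taken, as in the proof of Lemma 2 below, to be the bins j with |j−αp|<ϵα, and observe that almost all light balls land in the typical bins while essentially no heavy ball does.

```python
# Rough Monte Carlo sanity check of the typicality in Proposition 1 (illustrative
# only).  Under w(x) = k-1, a heavy ball's bin follows Binomial(alpha, 1-p) and a
# light ball's bin follows Binomial(alpha, p); T collects the bins j with
# |j - alpha*p| < eps*alpha.  All numerical values are illustrative choices.
import random
from math import comb

def binom_pmf(alpha, q):
    return [comb(alpha, j) * q ** j * (1 - q) ** (alpha - j) for j in range(alpha + 1)]

n, k, alpha, p, eps = 10_000, 100, 60, 0.1, 0.1
P1, P0 = binom_pmf(alpha, 1 - p), binom_pmf(alpha, p)
T = [j for j in range(alpha + 1) if abs(j - alpha * p) < eps * alpha]

rng = random.Random(0)
H = [0] * (alpha + 1)
L = [0] * (alpha + 1)
for _ in range(k - 1):                     # heavy balls (bits equal to 1)
    H[rng.choices(range(alpha + 1), weights=P1)[0]] += 1
for _ in range(n - k + 1):                 # light balls (bits equal to 0)
    L[rng.choices(range(alpha + 1), weights=P0)[0]] += 1

print("fraction of light balls in typical bins:", sum(L[j] for j in T) / (n - k + 1))
print("heavy balls in typical bins:", sum(H[j] for j in T))
```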

Proof of Proposition 2.

To begin with, we notice that on any input 𝐱\mathbf{x} with w(𝐱)=k1w(\mathbf{x})=k-1, the genie-aided information is exactly 𝐇¯0α=𝐇0α\bar{\bf H}_{0}^{\alpha}={\bf H}_{0}^{\alpha} and 𝐋¯0α=𝐋0α\bar{\bf L}_{0}^{\alpha}={\bf L}_{0}^{\alpha}. Moreover, conditioned on w(𝐱)=k1w(\mathbf{x})=k-1, we know that 𝐇0α{\bf H}_{0}^{\alpha} represents the number of heavy balls in each bin, and 𝐋0α{\bf L}_{0}^{\alpha} represents the number of light balls in each bin. Therefore, given w(𝐱)=k1w(\mathbf{x})=k-1, 𝐇0α{\bf H}_{0}^{\alpha} and 𝐋0α{\bf L}_{0}^{\alpha} are two independent random vectors with multinomial conditional distributions 𝐇0α|{w(𝐱)=k1}M(k1;P01,,Pα1){\bf H}_{0}^{\alpha}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\{w(\mathbf{x})=k-1\}\sim\mathrm{M}(k-1;P_{0}^{1},\ldots,P_{\alpha}^{1}) and 𝐋0α|{w(𝐱)=k1}M(nk+1;P00,,Pα0){\bf L}_{0}^{\alpha}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\{w(\mathbf{x})=k-1\}\sim\mathrm{M}(n-k+1;P_{0}^{0},\ldots,P_{\alpha}^{0}) respectively. Here, we say a random vector 𝐙=(Z1,,Zm)\mathbf{Z}=(Z_{1},\ldots,Z_{m}) follows multinomial distribution M(N;p1,,pm)\mathrm{M}(N;p_{1},\ldots,p_{m}) if its pmf is given by 𝖯(𝐙=𝐳)=N!i=1mpizii=1mzi!.\operatorname{\mathsf{P}}(\mathbf{Z}=\mathbf{z})=\frac{N!\prod_{i=1}^{m}p_{i}^{z_{i}}}{\prod_{i=1}^{m}z_{i}!}. Therefore, the denominator in (10) can be written as

𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)\displaystyle\operatorname{\mathsf{P}}\left(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1\right)
=(k1)!i=0α(Pi1)hii=0αhi!(nk+1)!i=0α(Pi0)lii=0αli!.\displaystyle=\frac{(k-1)!\prod_{i=0}^{\alpha}(P_{i}^{1})^{h_{i}}}{\prod_{i=0}^{\alpha}h_{i}!}\frac{(n-k+1)!\prod_{i=0}^{\alpha}(P_{i}^{0})^{l_{i}}}{\prod_{i=0}^{\alpha}l_{i}!}. (16)
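
The following short snippet is a direct, purely illustrative transcription of (16): the conditional probability of a realization (𝐡0α,𝐥0α) given w(𝐱)=k−1 factors into two multinomial pmfs, one for the k−1 heavy balls and one for the n−k+1 light balls. The function names are ours.

```python
# Illustrative transcription of (16): product of two multinomial pmfs for the
# heavy-ball and light-ball bin counts, conditioned on w(x) = k-1.
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """pmf of M(sum(counts); probs) evaluated at the vector `counts`."""
    return (factorial(sum(counts)) * prod(q ** c for c, q in zip(counts, probs))
            / prod(factorial(c) for c in counts))

def joint_pmf_16(h, l, P1, P0):
    """Right-hand side of (16) for bin counts h (heavy balls) and l (light balls)."""
    return multinomial_pmf(h, P1) * multinomial_pmf(l, P0)
```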

Now, we move on to consider the numerator in (10). Recall that when w(𝐱)=kw(\mathbf{x})=k, the genie picks a bin with random index J𝒯J\in{\mathcal{T}}. If there exists a heavy ball in bin JJ, the genie randomly chooses a heavy ball from there and keeps it unrevealed. The numerator in (10) can be written as

𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}1|w(𝐱)=k)\displaystyle\operatorname{\mathsf{P}}\left(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\cap{\mathcal{E}}_{1}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k\right)
=j𝒯𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}{J=j}1|w(𝐱)=k)\displaystyle=\sum_{j\in{\mathcal{T}}}\operatorname{\mathsf{P}}\left(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\cap\{J=j\}\cap{\mathcal{E}}_{1}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k\right)
=j𝒯𝖯({(𝐇0α,𝐋0α)=(𝐡0α+𝐞j,𝐥0α𝐞j)}{J=j}1|w(𝐱)=k),\displaystyle=\sum_{j\in{\mathcal{T}}}\operatorname{\mathsf{P}}\left(\{({\bf H}_{0}^{\alpha},{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha}+{\bf e}_{j},{\bf l}_{0}^{\alpha}-{\bf e}_{j})\}\cap\{J=j\}\cap{\mathcal{E}}_{1}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k\right), (17)

where for each j{0,,α}j\in\{0,\ldots,\alpha\}, 𝐞j{\bf e}_{j} is the length-(α+1)(\alpha+1) indicator vector for the (j+1)(j+1)-th coordinate, i.e., the (j+1)(j+1)-th coordinate has value one, while all other coordinates have value zero. By the chain rule, we have

𝖯({(𝐇0α,𝐋0α)=(𝐡0α+𝐞j,𝐥0α𝐞j)}{J=j}1|w(𝐱)=k)\displaystyle\operatorname{\mathsf{P}}\left(\{({\bf H}_{0}^{\alpha},{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha}+{\bf e}_{j},{\bf l}_{0}^{\alpha}-{\bf e}_{j})\}\cap\{J=j\}\cap{\mathcal{E}}_{1}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k\right)
=𝖯({𝐇0α=𝐡0α+𝐞j}{𝐋0α=𝐥0α𝐞j}|w(𝐱)=k)\displaystyle=\operatorname{\mathsf{P}}(\{{\bf H}_{0}^{\alpha}={\bf h}_{0}^{\alpha}+{\bf e}_{j}\}\cap\{{\bf L}_{0}^{\alpha}={\bf l}_{0}^{\alpha}-{\bf e}_{j}\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k)
𝖯({J=j}|𝐇0α=𝐡0α+𝐞j,𝐋0α=𝐥0α𝐞j,w(𝐱)=k)\displaystyle\;\;\cdot\operatorname{\mathsf{P}}(\{J=j\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}{\bf H}_{0}^{\alpha}={\bf h}_{0}^{\alpha}+{\bf e}_{j},{\bf L}_{0}^{\alpha}={\bf l}_{0}^{\alpha}-{\bf e}_{j},w(\mathbf{x})=k)
𝖯(1|J=j,𝐇0α=𝐡0α+𝐞j,𝐋0α=𝐥0α𝐞j,w(𝐱)=k).\displaystyle\;\;\cdot\operatorname{\mathsf{P}}({\mathcal{E}}_{1}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}J=j,{\bf H}_{0}^{\alpha}={\bf h}_{0}^{\alpha}+{\bf e}_{j},{\bf L}_{0}^{\alpha}={\bf l}_{0}^{\alpha}-{\bf e}_{j},w(\mathbf{x})=k). (18)

Now, we bound the three probabilities in (18) separately. Firstly, given w(𝐱)=kw(\mathbf{x})=k, 𝐇0α{\bf H}_{0}^{\alpha} and 𝐋0α{\bf L}_{0}^{\alpha} are two independent random vectors with conditional distributions 𝐇0α|{w(𝐱)=k}M(k;P01,Pα1){\bf H}_{0}^{\alpha}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\{w(\mathbf{x})=k\}\sim\mathrm{M}(k;P_{0}^{1}\ldots,P_{\alpha}^{1}) and 𝐋0α|{w(𝐱)=k}M(nk;P00,Pα0){\bf L}_{0}^{\alpha}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\{w(\mathbf{x})=k\}\sim\mathrm{M}(n-k;P_{0}^{0}\ldots,P_{\alpha}^{0}) respectively. This suggests that

𝖯({𝐇0α=𝐡0α+𝐞j}{𝐋0α=𝐥0α𝐞j}|w(𝐱)=k)=k!Pj1i=0α(Pi1)hi(hj+1)i=0αhi!(nk)!lji=0α(Pi0)liPj0i=0αli!.\displaystyle\operatorname{\mathsf{P}}(\{{\bf H}_{0}^{\alpha}={\bf h}_{0}^{\alpha}+{\bf e}_{j}\}\cap\{{\bf L}_{0}^{\alpha}={\bf l}_{0}^{\alpha}-{\bf e}_{j}\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k)=\frac{k!P_{j}^{1}\prod_{i=0}^{\alpha}(P_{i}^{1})^{h_{i}}}{(h_{j}+1)\prod_{i=0}^{\alpha}h_{i}!}\frac{(n-k)!l_{j}\prod_{i=0}^{\alpha}(P_{i}^{0})^{l_{i}}}{P_{j}^{0}\prod_{i=0}^{\alpha}l_{i}!}. (19)

Secondly,

𝖯({J=j}|𝐇0α=𝐡0α+𝐞j,𝐋0α=𝐥0α𝐞j,w(𝐱)=k)=lj11+j𝒯lj\displaystyle\operatorname{\mathsf{P}}(\{J=j\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}{\bf H}_{0}^{\alpha}={\bf h}_{0}^{\alpha}+{\bf e}_{j},{\bf L}_{0}^{\alpha}={\bf l}_{0}^{\alpha}-{\bf e}_{j},w(\mathbf{x})=k)=\frac{l_{j}-1}{-1+\sum_{j\in{\mathcal{T}}}l_{j}} (20)

by the random process of choosing JJ. Finally, by the symmetry among the balls in the same bin, it suffices for the adaptive phase of the algorithm to choose the number of balls to be revealed in each bin. Recall that we assume without loss of generality that the adaptive phase only reveals balls in sets {¯j:j𝒯}\{\bar{\mathcal{L}}_{j}\mathchar 58\relax j\in\mathcal{T}\}. For j𝒯j\in\mathcal{T}, let NjN^{\prime}_{j} denote the number of balls revealed in ¯j\bar{\mathcal{L}}_{j}. Note that NjN^{\prime}_{j}’s are random variables satisfying j𝒯Nj=N\sum_{j\in\mathcal{T}}N^{\prime}_{j}=N^{\prime} by the construction of 𝒜\mathcal{A}^{\prime}. Notice that JJ and w(𝐱)w({\mathbf{x}}) are both variables hidden from 𝒜\mathcal{A}^{\prime}. Moreover, the adaptive phase can only lead to two possible outcomes: a heavy ball is found or no heavy ball is found. Therefore, we can assume without loss of generality that the distribution of NjN^{\prime}_{j}’s only depends on the observations ¯0α\bar{\mathcal{H}}_{0}^{\alpha} and ¯0α\bar{\mathcal{L}}_{0}^{\alpha}. Because the probability of finding the heavy ball in ¯j\bar{\mathcal{L}}_{j} by revealing ii balls is ilj\frac{i}{l_{j}}, it follows that

𝖯(1|J=j,𝐇0α=𝐡0α+𝐞j,𝐋0α=𝐥0α𝐞j,w(𝐱)=k)\displaystyle\operatorname{\mathsf{P}}({\mathcal{E}}_{1}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}J=j,{\bf H}_{0}^{\alpha}={\bf h}_{0}^{\alpha}+{\bf e}_{j},{\bf L}_{0}^{\alpha}={\bf l}_{0}^{\alpha}-{\bf e}_{j},w(\mathbf{x})=k)
=i=0lj𝖯(Nj=i|J=j,𝐇0α=𝐡0α+𝐞j,𝐋0α=𝐥0α𝐞j,w(𝐱)=k)(1ilj)\displaystyle=\sum_{i=0}^{l_{j}}\operatorname{\mathsf{P}}(N^{\prime}_{j}=i\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}J=j,{\bf H}_{0}^{\alpha}={\bf h}_{0}^{\alpha}+{\bf e}_{j},{\bf L}_{0}^{\alpha}={\bf l}_{0}^{\alpha}-{\bf e}_{j},w(\mathbf{x})=k)\left(1-\frac{i}{l_{j}}\right)
1𝖤[Nj|J=j,𝐇0α=𝐡0α+𝐞j,𝐋0α=𝐥0α𝐞j,w(𝐱)=k]lj1\displaystyle\geq 1-\frac{\operatorname{\mathsf{E}}[N^{\prime}_{j}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}J=j,{\bf H}_{0}^{\alpha}={\bf h}_{0}^{\alpha}+{\bf e}_{j},{\bf L}_{0}^{\alpha}={\bf l}_{0}^{\alpha}-{\bf e}_{j},w(\mathbf{x})=k]}{l_{j}-1}
=1𝖤[Nj|J=j,𝐇¯0α=𝐡0α,𝐋¯0α=𝐥0α,w(𝐱)=k]lj1\displaystyle=1-\frac{\operatorname{\mathsf{E}}[N^{\prime}_{j}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}J=j,\bar{\bf H}_{0}^{\alpha}={\bf h}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha}={\bf l}_{0}^{\alpha},w(\mathbf{x})=k]}{l_{j}-1}
=1𝖤[Nj|𝐇¯0α=𝐡0α,𝐋¯0α=𝐥0α,w(𝐱)=k1]lj1,\displaystyle=1-\frac{\operatorname{\mathsf{E}}[N^{\prime}_{j}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\bar{\bf H}_{0}^{\alpha}={\bf h}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha}={\bf l}_{0}^{\alpha},w(\mathbf{x})=k-1]}{l_{j}-1}, (21)

where the penultimate equality follows because {J=j,𝐇0α=𝐡0α+𝐞j,𝐋0α=𝐥0α𝐞j}={J=j,𝐇¯0α=𝐡0α,𝐋¯0α=𝐥0α}\{J=j,{\bf H}_{0}^{\alpha}={\bf h}_{0}^{\alpha}+{\bf e}_{j},{\bf L}_{0}^{\alpha}={\bf l}_{0}^{\alpha}-{\bf e}_{j}\}=\{J=j,\bar{\bf H}_{0}^{\alpha}={\bf h}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha}={\bf l}_{0}^{\alpha}\}, and the last equality follows because NjN^{\prime}_{j} is conditionally independent of w(𝐱)w({\mathbf{x}}) and JJ given 𝐇¯0α\bar{\bf H}_{0}^{\alpha} and 𝐋¯0α\bar{\bf L}_{0}^{\alpha}. Plugging (19), (20) and (21) into (18) yields

𝖯({(𝐇0α,𝐋0α)=(𝐡0α+𝐞j,𝐥0α𝐞j)}{J=j}1|w(𝐱)=k)\displaystyle\operatorname{\mathsf{P}}\left(\{({\bf H}_{0}^{\alpha},{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha}+{\bf e}_{j},{\bf l}_{0}^{\alpha}-{\bf e}_{j})\}\cap\{J=j\}\cap{\mathcal{E}}_{1}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k\right)
k!Pj1i=0α(Pi1)hi(hj+1)i=0αhi!(nk)!lji=0α(Pi0)liPj0i=0αli!lj1𝖤[Nj|𝐇¯0α=𝐡0α,𝐋¯0α=𝐥0α,w(𝐱)=k1]1+j𝒯lj.\displaystyle\geq\frac{k!P_{j}^{1}\prod_{i=0}^{\alpha}(P_{i}^{1})^{h_{i}}}{(h_{j}+1)\prod_{i=0}^{\alpha}h_{i}!}\cdot\frac{(n-k)!l_{j}\prod_{i=0}^{\alpha}(P_{i}^{0})^{l_{i}}}{P_{j}^{0}\prod_{i=0}^{\alpha}l_{i}!}\cdot\frac{l_{j}-1-\operatorname{\mathsf{E}}[N^{\prime}_{j}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\bar{\bf H}_{0}^{\alpha}={\bf h}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha}={\bf l}_{0}^{\alpha},w(\mathbf{x})=k-1]}{-1+\sum_{j\in{\mathcal{T}}}l_{j}}. (22)

Then, we plug (22) into (17) and divide by (16) to get

𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}1|w(𝐱)=k)𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)\displaystyle\frac{\operatorname{\mathsf{P}}\left(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\cap{\mathcal{E}}_{1}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k\right)}{\operatorname{\mathsf{P}}\left(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1\right)}
≥j𝒯kPj1hj+1lj(nk+1)Pj0lj1𝖤[Nj|𝐇¯0α=𝐡0α,𝐋¯0α=𝐥0α,w(𝐱)=k1]1+i𝒯li.\displaystyle\geq\sum_{j\in{\mathcal{T}}}\frac{kP_{j}^{1}}{h_{j}+1}\cdot\frac{l_{j}}{(n-k+1)P_{j}^{0}}\cdot\frac{l_{j}-1-\operatorname{\mathsf{E}}[N^{\prime}_{j}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\bar{\bf H}_{0}^{\alpha}={\bf h}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha}={\bf l}_{0}^{\alpha},w(\mathbf{x})=k-1]}{-1+\sum_{i\in{\mathcal{T}}}l_{i}}. (23)

To complete the proof, we need a technical lemma that bounds the probabilities Pj1P_{j}^{1} and Pj0P_{j}^{0}.

Lemma 5.

For each j𝒯j\in{\mathcal{T}}, we have Pj0exp(o(logk))P_{j}^{0}\geq\exp(-o(\log k)), and δk(kδ)ϵ/2Pj1k1γloglog(1/δ)\frac{\delta}{k}\left(\frac{k}{\delta}\right)^{{\epsilon}/2}\leq P_{j}^{1}\leq k^{-1-\frac{\gamma}{\log\log(1/\delta)}} for some positive constant γ\gamma.

Because (𝐡0α,𝐥0α)𝒞({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{C}, we have hj=0h_{j}=0 and (1Δj0)(nk+1)Pj0li(1+Δj0)(nk+1)Pj0(1-\Delta_{j}^{0})(n-k+1)P_{j}^{0}\leq l_{i}\leq(1+\Delta_{j}^{0})(n-k+1)P_{j}^{0} for each j𝒯j\in{\mathcal{T}}. Recall that we defined Δj0=logn(nk+1)Pj0\Delta_{j}^{0}=\sqrt{\frac{\log n}{(n-k+1)P_{j}^{0}}}. It follows by Lemma 5 that Δj0=o(1)\Delta_{j}^{0}=o(1) for each j𝒯j\in{\mathcal{T}}, which further implies that lj=(1+o(1))(nk+1)Pj0l_{j}=(1+o(1))(n-k+1)P_{j}^{0}. We then have

j𝒯kPj1hj+1lj(nk+1)Pj0lj1𝖤[Nj|𝐇¯0α=𝐡0α,𝐋¯0α=𝐥0α,w(𝐱)=k1]1+i𝒯li\displaystyle\sum_{j\in{\mathcal{T}}}\frac{kP_{j}^{1}}{h_{j}+1}\cdot\frac{l_{j}}{(n-k+1)P_{j}^{0}}\cdot\frac{l_{j}-1-\operatorname{\mathsf{E}}[N^{\prime}_{j}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\bar{\bf H}_{0}^{\alpha}={\bf h}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha}={\bf l}_{0}^{\alpha},w(\mathbf{x})=k-1]}{-1+\sum_{i\in{\mathcal{T}}}l_{i}}
δ(kδ)ϵ/2(1+o(1))j𝒯lj1𝖤[Nj|𝐇¯0α=𝐡0α,𝐋¯0α=𝐥0α,w(𝐱)=k1]1+i𝒯li\displaystyle\geq\delta\left(\frac{k}{\delta}\right)^{{\epsilon}/2}(1+o(1))\sum_{j\in{\mathcal{T}}}\frac{l_{j}-1-\operatorname{\mathsf{E}}[N^{\prime}_{j}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\bar{\bf H}_{0}^{\alpha}={\bf h}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha}={\bf l}_{0}^{\alpha},w(\mathbf{x})=k-1]}{-1+\sum_{i\in{\mathcal{T}}}l_{i}}
(a)δn(kδ)ϵ/2(1+o(1))((2ϵα+1)𝖤[N|𝐇¯0α=𝐡0α,𝐋¯0α=𝐥0α,w(𝐱)=k1]+j𝒯lj)\displaystyle\stackrel{{\scriptstyle\mathchoice{\hbox to0.0pt{\hss$\displaystyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\textstyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\scriptstyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\scriptscriptstyle{(a)}$\hss}}}}{{\geq}}\frac{\delta}{n}\left(\frac{k}{\delta}\right)^{{\epsilon}/2}(1+o(1))\left(-(2{\epsilon}\alpha+1)-\operatorname{\mathsf{E}}[N^{\prime}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\bar{\bf H}_{0}^{\alpha}={\bf h}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha}={\bf l}_{0}^{\alpha},w(\mathbf{x})=k-1]+\sum_{j\in{\mathcal{T}}}l_{j}\right)
(b)(1+o(1))δn(kδ)ϵ/2(n𝖤[N|w(𝐱)=k1,(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)]),\displaystyle\stackrel{{\scriptstyle\mathchoice{\hbox to0.0pt{\hss$\displaystyle{(b)}$\hss}}{\hbox to0.0pt{\hss$\textstyle{(b)}$\hss}}{\hbox to0.0pt{\hss$\scriptstyle{(b)}$\hss}}{\hbox to0.0pt{\hss$\scriptscriptstyle{(b)}$\hss}}}}{{\geq}}\frac{(1+o(1))\delta}{n}\left(\frac{k}{\delta}\right)^{{\epsilon}/2}\left(n^{*}-\operatorname{\mathsf{E}}[N^{\prime}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w({\mathbf{x}})=k-1,(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})]\right),

where (a) follows because 1+i𝒯lin-1+\sum_{i\in{\mathcal{T}}}l_{i}\leq n and (b) follows because

i𝒯li(nk+1)(12exp(logk6pDKL(pϵ1p)))(1lognn)\sum_{i\in{\mathcal{T}}}l_{i}\geq(n-k+1)\left(1-2\exp\left(-\frac{\sqrt{\log k}}{6p{D_{\mathrm{KL}}}(p-{\epsilon}\|1-p)}\right)\right)\left(1-\sqrt{\frac{\log n}{n}}\right)

on the assumption that (𝐡0α,𝐥0α)𝒞({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{C}. ∎

A-A2 Proof of Lemmas

Proof of Lemma 1.

The key idea in this proof is to show that the original Algorithm 𝒜\mathcal{A} can be simulated by executing Algorithm 𝒜\mathcal{A}^{\prime}.

To show this simulation, we assume that the non-adaptive phase of 𝒜\mathcal{A}^{\prime} is completed, and simulate the execution of 𝒜\mathcal{A} in the adaptive phase of 𝒜\mathcal{A}^{\prime}. When Algorithm 𝒜\mathcal{A} makes a query on bit xjx_{j} for the iith time, where iαi\leq\alpha, Algorithm 𝒜\mathcal{A}^{\prime} returns the response to the iith query on xjx_{j} from its non-adaptive phase to Algorithm 𝒜\mathcal{A}. When Algorithm 𝒜\mathcal{A} makes a query on bit xjx_{j} for the α+1\alpha+1st time, Algorithm 𝒜\mathcal{A}^{\prime} noiselessly reads the bit xjx_{j}, and returns response xj𝖡𝖾𝗋𝗇(p)x_{j}\oplus\mathsf{Bern}(p) to 𝒜\mathcal{A}. When Algorithm 𝒜\mathcal{A} makes a query on bit xjx_{j} for the iith time, where iα+2i\geq\alpha+2, because Algorithm 𝒜\mathcal{A}^{\prime} already has the value of xjx_{j}, it returns response xj𝖡𝖾𝗋𝗇(p)x_{j}\oplus\mathsf{Bern}(p) to Algorithm 𝒜\mathcal{A}. Let NN denote the number of bits that are queried by 𝒜\mathcal{A} for at least α\alpha times. Recall that MM denotes the total number of queries made by 𝒜\mathcal{A}. Because we assume 𝖤[M|𝐱](1ϵ)(nkϵn)logkδDKL(pϵ1p)\operatorname{\mathsf{E}}[M\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x}]\leq\frac{(1-\epsilon)(n-k-\epsilon n)\log\frac{k}{\delta}}{{D_{\mathrm{KL}}}(p-\epsilon\|1-p)}, we have

(1ϵ)(nkϵn)logkδDKL(pϵ1p)𝖤[M|𝐱]i=0n𝖯(N=i|𝐱)iα=α𝖤[N|𝐱],\frac{(1-\epsilon)(n-k-\epsilon n)\log\frac{k}{\delta}}{{D_{\mathrm{KL}}}(p-\epsilon\|1-p)}\geq\operatorname{\mathsf{E}}[M\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x}]\geq\sum_{i=0}^{n}\operatorname{\mathsf{P}}(N=i\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x})\cdot i\alpha=\alpha\operatorname{\mathsf{E}}[N\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x}],

for any input instance 𝐱{\mathbf{x}}, and it follows that 𝖤[N|𝐱]nkϵn\operatorname{\mathsf{E}}[N\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x}]\leq n-k-{\epsilon}n. Thus, this simulation process uses at most nkϵnn-k-{\epsilon}n noiseless bit reads in expectation in the adaptive phase of Algorithm 𝒜\mathcal{A}^{\prime}. Finally, it is not hard to see that the output distribution of this simulation is the same as the output distribution of Algorithm 𝒜\mathcal{A}. ∎
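
A minimal sketch (our own code, not the paper's pseudocode) of the simulation just described: 𝒜′ answers the first α queries to a bit from the responses stored during its non-adaptive phase, performs one noiseless read at the (α+1)-st query, and afterwards answers with synthetic noise xj⊕Bern(p), so that 𝒜 observes the same response distribution as in the original noisy-query model.

```python
# Minimal sketch of the simulation argument in the proof of Lemma 1 (illustrative).
import random

class EnhancedSimulator:
    def __init__(self, x, alpha, p, seed=0):
        self.x, self.alpha, self.p = x, alpha, p
        self.rng = random.Random(seed)
        # non-adaptive phase: store alpha noisy readings of every bit up front
        self.stored = [[xj ^ (self.rng.random() < p) for _ in range(alpha)] for xj in x]
        self.num_queries = [0] * len(x)
        self.noiseless_reads = 0          # bits read noiselessly in the adaptive phase

    def query(self, j):
        """Answer one query of A on bit j."""
        i = self.num_queries[j]
        self.num_queries[j] += 1
        if i < self.alpha:                # reuse a response from the non-adaptive phase
            return self.stored[j][i]
        if i == self.alpha:               # (alpha+1)-st query: read x_j noiselessly
            self.noiseless_reads += 1
        return self.x[j] ^ (self.rng.random() < self.p)   # synthetic noisy answer
```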

Proof of Lemma 5.

By definition, we have Pj1=(αj)(1p)jpαjP_{j}^{1}=\binom{\alpha}{j}(1-p)^{j}p^{\alpha-j}. Notice that minj𝒯Pj1=Pα(pϵ)1\min_{j\in{\mathcal{T}}}P_{j}^{1}=P_{\alpha(p-{\epsilon})}^{1}. By Stirling’s approximation, we have

Pα(pϵ)1\displaystyle P_{\alpha(p-{\epsilon})}^{1} =(α(pϵ)α)(1p)α(pϵ)pα(1p+ϵ)\displaystyle=\binom{\alpha}{(p-{\epsilon})\alpha}(1-p)^{\alpha(p-{\epsilon})}p^{\alpha(1-p+{\epsilon})}
exp(α(DKL(pϵ1p)+O(logαα)))\displaystyle\geq\exp\left(-\alpha\left({D_{\mathrm{KL}}}(p-{\epsilon}\|1-p)+O\left(\frac{\log\alpha}{\alpha}\right)\right)\right)
=exp((1ϵ)logkδDKL(pϵ1p)(DKL(pϵ1p)+O(logαα)))\displaystyle=\exp\left(-\frac{(1-{\epsilon})\log\frac{k}{\delta}}{{D_{\mathrm{KL}}}(p-{\epsilon}\|1-p)}\left({D_{\mathrm{KL}}}(p-{\epsilon}\|1-p)+O\left(\frac{\log\alpha}{\alpha}\right)\right)\right)
=(δk)(1ϵ)(1+O((logα)/α))\displaystyle=\left(\frac{\delta}{k}\right)^{(1-{\epsilon})(1+O((\log\alpha)/\alpha))}
δk(kδ)ϵ/2\displaystyle\geq\frac{\delta}{k}\left(\frac{k}{\delta}\right)^{{\epsilon}/2}

To see the upper bound Pj1k1γloglog(1/δ)P_{j}^{1}\leq k^{-1-\frac{\gamma}{\log\log(1/\delta)}}, we observe that maxj𝒯Pj1=Pα(p+ϵ)1\max_{j\in{\mathcal{T}}}P_{j}^{1}=P_{\alpha(p+{\epsilon})}^{1}. By Stirling’s approximation, we have

Pα(p+ϵ)1\displaystyle P_{\alpha(p+{\epsilon})}^{1} =(α(p+ϵ)α)(1p)α(p+ϵ)pα(1pϵ)\displaystyle=\binom{\alpha}{(p+{\epsilon})\alpha}(1-p)^{\alpha(p+{\epsilon})}p^{\alpha(1-p-{\epsilon})}
exp(αDKL(p+ϵ1p))\displaystyle\leq\exp(-\alpha{D_{\mathrm{KL}}}(p+{\epsilon}\|1-p))
=exp((1ϵ)logkδDKL(pϵ1p)DKL(p+ϵ1p))\displaystyle=\exp\left(-\frac{(1-{\epsilon})\log\frac{k}{\delta}}{{D_{\mathrm{KL}}}(p-{\epsilon}\|1-p)}{D_{\mathrm{KL}}}(p+{\epsilon}\|1-p)\right)
=(a)exp((1ϵ)(1O(ϵ))logkδ)\displaystyle\stackrel{{\scriptstyle\mathchoice{\hbox to0.0pt{\hss$\displaystyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\textstyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\scriptstyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\scriptscriptstyle{(a)}$\hss}}}}{{=}}\exp\left(-(1-{\epsilon})(1-O({\epsilon}))\log\frac{k}{\delta}\right)
k1γloglog(1/δ),\displaystyle\leq k^{-1-\frac{\gamma}{\log\log(1/\delta)}},

where (a) follows because DKL(p+ϵ1p)DKL(pϵ1p)=1O(ϵ)\frac{{D_{\mathrm{KL}}}(p+{\epsilon}\|1-p)}{{D_{\mathrm{KL}}}(p-{\epsilon}\|1-p)}=1-O({\epsilon}), and the existence of positive constant γ\gamma is guaranteed by the fact that log1δlogkloglog(1/δ)\log\frac{1}{\delta}\geq\frac{\log k}{\log\log(1/\delta)} and ϵ=o(1loglog(1/δ)){\epsilon}=o(\frac{1}{\log\log(1/\delta)}).

We have defined Pj0=(αj)pj(1p)αjP_{j}^{0}=\binom{\alpha}{j}p^{j}(1-p)^{\alpha-j}. We can see that minj𝒯Pj0=min(Pα(pϵ)0,Pα(p+ϵ)0)\min_{j\in{\mathcal{T}}}P_{j}^{0}=\min(P_{\alpha(p-{\epsilon})}^{0},P_{\alpha(p+{\epsilon})}^{0}). Therefore, it suffices to show that Pα(pϵ)0exp(o(logk))P_{\alpha(p-{\epsilon})}^{0}\geq\exp(-o(\log k)) and Pα(p+ϵ)0exp(o(logk))P_{\alpha(p+{\epsilon})}^{0}\geq\exp(-o(\log k)). By Stirling’s approximation, we have

Pα(pϵ)0\displaystyle P_{\alpha(p-{\epsilon})}^{0} =(αα(pϵ))pα(pϵ)(1p)α(1p+ϵ)\displaystyle=\binom{\alpha}{\alpha(p-{\epsilon})}p^{\alpha(p-{\epsilon})}(1-p)^{\alpha(1-p+{\epsilon})}
exp(α(DKL(pϵp)+O(logαα)))\displaystyle\geq\exp\left(-\alpha\left({D_{\mathrm{KL}}}(p-{\epsilon}\|p)+O\left(\frac{\log\alpha}{\alpha}\right)\right)\right)
=exp((1ϵ)logkδDKL(pϵ1p)(DKL(pϵp)+O(logαα)))\displaystyle=\exp\left(-\frac{(1-{\epsilon})\log\frac{k}{\delta}}{{D_{\mathrm{KL}}}(p-{\epsilon}\|1-p)}\left({D_{\mathrm{KL}}}(p-{\epsilon}\|p)+O\left(\frac{\log\alpha}{\alpha}\right)\right)\right)
exp(o(logk)),\displaystyle\geq\exp(-o(\log k)),

where the last inequality follows because DKL(pϵp)=O(ϵ2){D_{\mathrm{KL}}}(p-{\epsilon}\|p)=O(\epsilon^{2}) and logkδlogk+loglog(1/δ)logk\log\frac{k}{\delta}\leq\log k+\log\log(1/\delta)\log k. It can be similarly shown that Pα(p+ϵ)0exp(o(logk))P_{\alpha(p+{\epsilon})}^{0}\geq\exp(-o(\log k)). ∎
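
For concreteness, and only as a numerical aside, the Stirling-type estimate used repeatedly in this proof can be checked directly: the Binomial(α,1−p) pmf at j=α(p−ϵ) matches exp(−αDKL(p−ϵ∥1−p)) up to a factor polynomial in α. The parameter values below are arbitrary illustrative choices.

```python
# Numerical illustration of the Stirling-type estimate in the proof of Lemma 5:
# the Binomial(alpha, 1-p) pmf at j = alpha*(p - eps) equals
# exp(-alpha * D_KL(p - eps || 1-p)) up to a factor polynomial in alpha.
from math import comb, exp, log

def d_kl_bern(q, r):
    return q * log(q / r) + (1 - q) * log((1 - q) / (1 - r))

alpha, p, eps = 200, 0.1, 0.02   # illustrative values
j = round(alpha * (p - eps))
pmf = comb(alpha, j) * (1 - p) ** j * p ** (alpha - j)
approx = exp(-alpha * d_kl_bern(j / alpha, 1 - p))
print(pmf, approx, pmf / approx)  # ratio decays only polynomially in alpha
```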

Proof of Lemma 2.

Let W0Binom(α,p)W_{0}\sim{\mathrm{Binom}}(\alpha,p). For any 𝐱\mathbf{x} with w(𝐱)=k1w(\mathbf{x})=k-1, we have j𝒯Lj|𝐱Binom(nk+1,𝖯(W0𝒯))\sum_{j\in{\mathcal{T}}}L_{j}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x}\sim{\mathrm{Binom}}(n-k+1,\operatorname{\mathsf{P}}(W_{0}\in{\mathcal{T}})). To bound 𝖯(2|𝐱)\operatorname{\mathsf{P}}({\mathcal{E}}_{2}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x}), we first bound 𝖯(W0𝒯)\operatorname{\mathsf{P}}(W_{0}\in{\mathcal{T}}). We have

𝖯(W0𝒯)\displaystyle\operatorname{\mathsf{P}}(W_{0}\notin{\mathcal{T}}) =𝖯(|W0αp|αϵ)\displaystyle=\operatorname{\mathsf{P}}(\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}W_{0}-\alpha p\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\geq\alpha{\epsilon})
(a)2exp(ϵ2α3p)\displaystyle\stackrel{{\scriptstyle\mathchoice{\hbox to0.0pt{\hss$\displaystyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\textstyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\scriptstyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\scriptscriptstyle{(a)}$\hss}}}}{{\leq}}2\exp\left(-\frac{{\epsilon}^{2}\alpha}{3p}\right)
(b)2exp((1ϵ)logkδ3pDKL(pϵ1p)logk)\displaystyle\stackrel{{\scriptstyle\mathchoice{\hbox to0.0pt{\hss$\displaystyle{(b)}$\hss}}{\hbox to0.0pt{\hss$\textstyle{(b)}$\hss}}{\hbox to0.0pt{\hss$\scriptstyle{(b)}$\hss}}{\hbox to0.0pt{\hss$\scriptscriptstyle{(b)}$\hss}}}}{{\leq}}2\exp\left(-\frac{(1-{\epsilon})\log\frac{k}{\delta}}{3p{D_{\mathrm{KL}}}(p-{\epsilon}\|1-p)\sqrt{\log k}}\right)
(c)2exp(logk6pDKL(pϵ1p)),\displaystyle\stackrel{{\scriptstyle\mathchoice{\hbox to0.0pt{\hss$\displaystyle{(c)}$\hss}}{\hbox to0.0pt{\hss$\textstyle{(c)}$\hss}}{\hbox to0.0pt{\hss$\scriptstyle{(c)}$\hss}}{\hbox to0.0pt{\hss$\scriptscriptstyle{(c)}$\hss}}}}{{\leq}}2\exp\left(-\frac{\sqrt{\log k}}{6p{D_{\mathrm{KL}}}(p-{\epsilon}\|1-p)}\right),

where (a) follows by the Chernoff bound (see, for example, Corollary 4.6 in Mitzenmacher and Upfal, (2017)), (b) follows because ϵ=(logk)1/4{\epsilon}=(\log k)^{-1/4}, and (c) follows because (1ϵ)logkδ12logk(1-{\epsilon})\log\frac{k}{\delta}\geq\frac{1}{2}\log k. For simplicity, we denote c=16pDKL(pϵ1p)c=\frac{1}{6p{D_{\mathrm{KL}}}(p-{\epsilon}\|1-p)}. It follows that

𝖯(2|𝐱)\displaystyle\operatorname{\mathsf{P}}({\mathcal{E}}_{2}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x})
𝖯(Binom(nk+1,12exp(clogk))\displaystyle\leq\operatorname{\mathsf{P}}\Bigg{(}\mathrm{Binom}\left(n-k+1,1-2\exp\left(-c\sqrt{\log k}\right)\right)\leq
(nk+1)(12exp(clogk))(1lognn))\displaystyle\hskip 20.00003pt(n-k+1)\left(1-2\exp\left(-c\sqrt{\log k}\right)\right)\left(1-\sqrt{\frac{\log n}{n}}\right)\Bigg{)}
(d)exp((nk+1)(12exp(clogk))logn2n)\displaystyle\stackrel{{\scriptstyle\mathchoice{\hbox to0.0pt{\hss$\displaystyle{(d)}$\hss}}{\hbox to0.0pt{\hss$\textstyle{(d)}$\hss}}{\hbox to0.0pt{\hss$\scriptstyle{(d)}$\hss}}{\hbox to0.0pt{\hss$\scriptscriptstyle{(d)}$\hss}}}}{{\leq}}\exp\left(-\frac{(n-k+1)\left(1-2\exp(-c\sqrt{\log k})\right)\log n}{2n}\right)
n1/5,\displaystyle\leq n^{-1/5},

where (d) follows by the Chernoff bound. ∎
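
The Chernoff bound invoked in step (a) can also be illustrated numerically (again, not part of the proof); the exact binomial tail is compared against 2exp(−ϵ²α/(3p)) for arbitrary illustrative parameters satisfying ϵ≤p.

```python
# Illustrative check of the Chernoff bound used in step (a): for W0 ~ Binom(alpha, p),
# P(|W0 - alpha*p| >= alpha*eps) <= 2*exp(-eps^2*alpha/(3p)) when eps <= p.
from math import comb, exp

alpha, p, eps = 200, 0.1, 0.05   # illustrative values

def binom_pmf(j):
    return comb(alpha, j) * p ** j * (1 - p) ** (alpha - j)

tail = sum(binom_pmf(j) for j in range(alpha + 1) if abs(j - alpha * p) >= alpha * eps)
bound = 2 * exp(-eps ** 2 * alpha / (3 * p))
print(tail, bound)
assert tail <= bound
```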

Proof of Lemma 3.

For any 𝐱\mathbf{x} with w(𝐱)=k1w(\mathbf{x})=k-1, we have Lj|𝐱Binom(nk+1,Pj0)L_{j}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x}\sim{\mathrm{Binom}}(n-k+1,P_{j}^{0}) for each j𝒯j\in{\mathcal{T}}. For any j𝒯j\in{\mathcal{T}}, by Chernoff bound we have

𝖯(|Lj(nk+1)Pj0|Δj0(nk+1)Pj0|𝐱)\displaystyle\operatorname{\mathsf{P}}(\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}L_{j}-(n-k+1)P_{j}^{0}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\geq\Delta_{j}^{0}(n-k+1)P_{j}^{0}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x})
2exp((Δj0)2(nk+1)Pj0/3)\displaystyle\leq 2\exp(-(\Delta_{j}^{0})^{2}(n-k+1)P_{j}^{0}/3)
=2n1/3,\displaystyle=2n^{-1/3},

where the last equality follows because Δj0=logn(nk+1)Pj0\Delta_{j}^{0}=\sqrt{\frac{\log n}{(n-k+1)P_{j}^{0}}}. By union bound, we have

𝖯(3|𝐱)(2α+1)2n1/3n1/6.\operatorname{\mathsf{P}}({\mathcal{E}}_{3}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x})\leq(2\alpha+1)2n^{-1/3}\leq n^{-1/6}. ∎

Proof of Lemma 4.

Let 𝐱\mathbf{x} be an input instance with w(𝐱)=k1w(\mathbf{x})=k-1. For each j𝒯j\in{\mathcal{T}}, we have Hj|𝐱Binom(k1,Pj1)H_{j}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x}\sim{\mathrm{Binom}}(k-1,P_{j}^{1}). From Lemma 5, we know that Pj1k1γloglog(1/δ)P_{j}^{1}\leq k^{-1-\frac{\gamma}{\log\log(1/\delta)}}, where γ\gamma is a positive constant. Therefore, we have 𝖤[Hj]kγloglog(1/δ)\operatorname{\mathsf{E}}[H_{j}]\leq k^{-\frac{\gamma}{\log\log(1/\delta)}} for each j𝒯j\in{\mathcal{T}}. By Markov’s inequality, we know that 𝖯(Hj1)kγloglog(1/δ)\operatorname{\mathsf{P}}(H_{j}\geq 1)\leq k^{-\frac{\gamma}{\log\log(1/\delta)}} for any j𝒯j\in{\mathcal{T}}. Applying a union bound over all j𝒯j\in{\mathcal{T}} yields

𝖯(4|𝐱)O(αkγloglog(1/δ))=O(logkloglog(1/δ)kγloglog(1/δ)).\operatorname{\mathsf{P}}({\mathcal{E}}_{4}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x})\leq O\left(\frac{\alpha}{k^{\frac{\gamma}{\log\log(1/\delta)}}}\right)=O\left(\frac{\log k\log\log(1/\delta)}{k^{\frac{\gamma}{\log\log(1/\delta)}}}\right). ∎

A-B Regime 2: logk>log(1/δ)loglog(1/δ)\log k>\log(1/\delta)\log\log(1/\delta)

In this section, we prove Theorem 1 in the regime logk>log(1/δ)loglog(1/δ)\log k>\log(1/\delta)\log\log(1/\delta). The proof in this regime is similar to the proof in the previous regime. The major difference is that 1δ\frac{1}{\delta} is relatively small compared to kk in this regime, so the statistics exhibit slightly different concentration behavior.

Let ϵ=(log1δ)1/4{\epsilon}=(\log\frac{1}{\delta})^{-1/4} in this regime. Because δ=o(1)\delta=o(1) and logk=ω(log(1/δ))\log k=\omega(\log(1/\delta)), we know that (1ϵ)(nkϵn)logkDKL(pϵ1p)=(1o(1))(nk)log(k/δ)DKL(p1p)\frac{(1-\epsilon)(n-k-{\epsilon}n)\log k}{{D_{\mathrm{KL}}}(p-\epsilon\|1-p)}=(1-o(1))\frac{(n-k)\log(k/\delta)}{{D_{\mathrm{KL}}(p\|1-p)}}. Therefore, it suffices to show that for any algorithm that makes at most (1ϵ)(nkϵn)logkDKL(pϵ1p)\frac{(1-\epsilon)(n-k-{\epsilon}n)\log k}{{D_{\mathrm{KL}}}(p-\epsilon\|1-p)} queries in expectation, the worst-case error probability is at least δ\delta.

Let 𝒜\mathcal{A} be an arbitrary algorithm that uses at most (1ϵ)(nkϵn)logkDKL(pϵ1p)\frac{(1-\epsilon)(n-k-{\epsilon}n)\log k}{{D_{\mathrm{KL}}}(p-\epsilon\|1-p)} queries in expectation. We slightly modify the procedure of the enhanced algorithm 𝒜\mathcal{A}^{\prime} defined in Section III as follows (a schematic sketch of the two phases is given after the list):

  • \bullet

    Non-adaptive phase: query each of the nn bits α(1ϵ)logkDKL(pϵ1p)\alpha\triangleq\frac{(1-\epsilon)\log k}{{D_{\mathrm{KL}}}(p-\epsilon\|1-p)} times.

  • \bullet

    Adaptive phase: Adaptively choose NN^{\prime} bits and noiselessly reveal their value, where the number of revealed bits NN^{\prime} satisfies 𝖤[N|𝐱]=nkϵn\operatorname{\mathsf{E}}[N^{\prime}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x}]=n-k-{\epsilon}n for any 𝐱\mathbf{x}.
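
The following Python sketch (ours, not the paper's pseudocode) renders the two phases above schematically; a bit's bin index is taken to be the number of its α noisy readings equal to 1, consistent with the bin probabilities used throughout.

```python
# Schematic sketch (illustrative only) of the two phases of A' described above.
import random

def nonadaptive_phase(x, alpha, p, seed=0):
    """Query every bit alpha times through a BSC(p) and return each bit's bin index,
    i.e., the number of its alpha noisy readings that came back equal to 1."""
    rng = random.Random(seed)
    return [sum(xj ^ (rng.random() < p) for _ in range(alpha)) for xj in x]

def adaptive_reveal(x, j):
    """Noiselessly reveal bit j in the adaptive phase (one unit of the N' budget)."""
    return x[j]
```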

Following an argument analogous to that of Lemma 1, we know that the worst-case error probability of 𝒜{\mathcal{A}}^{\prime} is upper bounded by that of 𝒜{\mathcal{A}}. Therefore, it suffices to show that the worst-case error probability of 𝒜{\mathcal{A}}^{\prime} is at least δ\delta. In the analysis of 𝒜\mathcal{A}^{\prime}, we assume that the input instance satisfies either w(𝐱)=k1w(\mathbf{x})=k-1 or w(𝐱)=kw(\mathbf{x})=k, and that there exists genie-aided information in the form of two sequences of sets ¯0α=(¯0,,¯α)\bar{\mathcal{H}}_{0}^{\alpha}=(\bar{\mathcal{H}}_{0},\ldots,\bar{\mathcal{H}}_{\alpha}) and ¯0α=(¯0,,¯α)\bar{\mathcal{L}}_{0}^{\alpha}=(\bar{\mathcal{L}}_{0},\ldots,\bar{\mathcal{L}}_{\alpha}) defined in the same way as in the previous regime. Recall that 𝖳𝖧^k(𝐇0α,𝐋0α,𝟙{1})\widehat{\mathsf{TH}}_{k}({\bf H}_{0}^{\alpha},{\bf L}_{0}^{\alpha},\mathbbm{1}_{\{{\mathcal{E}}_{1}\}}) denotes the output estimator of 𝒜\mathcal{A}^{\prime}, where 1{\mathcal{E}}_{1} denotes the event that the adaptive phase does not find any heavy ball. The set 𝒩\mathcal{N} is defined as the collection of all (2α+2)(2\alpha+2)-tuples (𝐡0α,𝐥0α)({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha}) such that 𝖳𝖧^k(𝐡0α,𝐥0α,1)=0\widehat{\mathsf{TH}}_{k}({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha},1)=0.

To prove that 𝒜\mathcal{A}^{\prime} has worst-case error probability at least δ\delta, we prove

𝖯(𝖳𝖧^k=0|w(𝐱)=k)>δ\operatorname{\mathsf{P}}(\widehat{\mathsf{TH}}_{k}=0\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k)>\delta

under the assumption that

𝖯(𝖳𝖧^k=0|w(𝐱)=k1)1δ.\operatorname{\mathsf{P}}(\widehat{\mathsf{TH}}_{k}=0\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)\geq 1-\delta.

In the following, we construct the family 𝒟\mathcal{D} of realizations such that the typicality and the bounded ratio property introduced in Section III are satisfied. Let 𝒟\mathcal{D} denote the collection of (2α+2)(2\alpha+2)-tuples of natural numbers (𝐡0α,𝐥0α)({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha}) such that

  1. 1.

    i=0αhi=k1\sum_{i=0}^{\alpha}h_{i}=k-1, i=0αli=nk+1\sum_{i=0}^{\alpha}l_{i}=n-k+1;

  2. 2.

    j𝒯lj(nk+1)(12exp(log1δ))(1lognn)\sum_{j\in{\mathcal{T}}}l_{j}\geq(n-k+1)\left(1-2\exp\left(-\sqrt{\log\frac{1}{\delta}}\right)\right)\left(1-\sqrt{\frac{\log n}{n}}\right);

  3. 3.

    (1Δj0)(nk+1)Pj0lj(1+Δj0)(nk+1)Pj0(1-\Delta_{j}^{0})(n-k+1)P_{j}^{0}\leq l_{j}\leq(1+\Delta_{j}^{0})(n-k+1)P_{j}^{0} for any j𝒯j\in{\mathcal{T}}, where Δj0logn(nk+1)Pj0\Delta_{j}^{0}\triangleq\sqrt{\frac{\log n}{(n-k+1)P_{j}^{0}}};

  4. 4.

    (1Δj1)(k1)Pj1hj(1+Δj1)(k1)Pj1(1-\Delta_{j}^{1})(k-1)P_{j}^{1}\leq h_{j}\leq(1+\Delta_{j}^{1})(k-1)P_{j}^{1} for any j𝒯j\in{\mathcal{T}}, where Δj1logn(k1)Pj1\Delta_{j}^{1}\triangleq\sqrt{\frac{\log n}{(k-1)P_{j}^{1}}}.

Notice that the major difference between the definition of 𝒟\mathcal{D} and the definition of 𝒞\mathcal{C} in the previous regime lies in the last condition. The reason for this distinction is that in this regime, we have 𝖤[Hj]=ω(1)\operatorname{\mathsf{E}}[H_{j}]=\omega(1) instead of 𝖤[Hj]=o(1)\operatorname{\mathsf{E}}[H_{j}]=o(1) for each j𝒯j\in{\mathcal{T}}. Therefore, the typical realizations of HjH_{j} for j𝒯j\in{\mathcal{T}} concentrate around their means rather than taking the value zero. In the following two propositions, we state the typicality and the bounded ratio property of 𝒟\mathcal{D}.

Proposition 3.

The set 𝒟\mathcal{D} satisfies

(𝐡0α,𝐥0α)𝒟𝖯({(𝐇0α,𝐋0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)1k1/6O(n1/6).\sum_{({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{D}}\operatorname{\mathsf{P}}(\{({\bf H}_{0}^{\alpha},{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)\geq 1-k^{-1/6}-O(n^{-1/6}). (24)
Proposition 4.

Let

n(nk+1)(12exp(log1δ))(1lognn)(2ϵα+1).n^{*}\triangleq(n-k+1)\left(1-2\exp\left(-\sqrt{\log\frac{1}{\delta}}\right)\right)\left(1-\sqrt{\frac{\log n}{n}}\right)-(2{\epsilon}\alpha+1).

For any (𝐡0α,𝐥0α)𝒟({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{D}, we have

𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}1|w(𝐱)=k)𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)\displaystyle\frac{\operatorname{\mathsf{P}}\left(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\cap{\mathcal{E}}_{1}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k\right)}{\operatorname{\mathsf{P}}\left(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1\right)} (25)
1+o(1)n(n𝖤[N|w(𝐱)=k1,(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)]).\displaystyle\geq\frac{1+o(1)}{n}\left(n^{*}-\operatorname{\mathsf{E}}[N^{\prime}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w({\mathbf{x}})=k-1,(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})]\right). (26)

We are now ready to prove 𝖯(𝖳𝖧^k=0|w(𝐱)=k)>δ\operatorname{\mathsf{P}}(\widehat{\mathsf{TH}}_{k}=0\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k)>\delta based on Propositions 3 and 4. We have

𝖯(𝖳𝖧^k=0|w(𝐱)=k)\displaystyle\operatorname{\mathsf{P}}(\widehat{\mathsf{TH}}_{k}=0\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k)
=(𝐡0α,𝐥0α)𝒩𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}1|w(𝐱)=k)\displaystyle=\sum_{({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{N}}\operatorname{\mathsf{P}}(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\cap{\mathcal{E}}_{1}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k)
(𝐡0α,𝐥0α)𝒩𝒟𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}1|w(𝐱)=k)\displaystyle\geq\sum_{({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{N}\cap\mathcal{D}}\operatorname{\mathsf{P}}(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\cap{\mathcal{E}}_{1}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k)
(a)(𝐡0α,𝐥0α)𝒩𝒟𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)\displaystyle\stackrel{{\scriptstyle\mathchoice{\hbox to0.0pt{\hss$\displaystyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\textstyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\scriptstyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\scriptscriptstyle{(a)}$\hss}}}}{{\geq}}\sum_{({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{N}\cap\mathcal{D}}\operatorname{\mathsf{P}}(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)
1+o(1)n(n𝖤[N|w(𝐱)=k1,(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)])\displaystyle\hskip 10.00002pt\cdot\frac{1+o(1)}{n}\left(n^{*}-\operatorname{\mathsf{E}}[N^{\prime}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w({\mathbf{x}})=k-1,(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})]\right)
=1+o(1)n((𝐡0α,𝐥0α)𝒩𝒟n𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)\displaystyle=\frac{1+o(1)}{n}\Bigg{(}\sum_{({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{N}\cap\mathcal{D}}n^{*}\operatorname{\mathsf{P}}(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)
𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)𝖤[N|w(𝐱)=k1,(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)]),\displaystyle\hskip 10.00002pt-\operatorname{\mathsf{P}}(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)\operatorname{\mathsf{E}}[N^{\prime}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w({\mathbf{x}})=k-1,(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})]\Bigg{)}, (27)

where (a) follows by Proposition 4. By Proposition 3 and the definition of nn^{*}, we have

(𝐡0α,𝐥0α)𝒩𝒟n𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)\displaystyle\sum_{({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{N}\cap\mathcal{D}}n^{*}\operatorname{\mathsf{P}}(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)
n(1δk1/6O(n1/6))\displaystyle\geq n^{*}(1-\delta-k^{-1/6}-O(n^{-1/6}))
=(nk)(1δO(k1/6)O(exp(log1δ))O(lognn)O(ϵαn)).\displaystyle=(n-k)\left(1-\delta-O(k^{-1/6})-O\left(\exp\left(-\sqrt{\log\tfrac{1}{\delta}}\right)\right)-O\left(\sqrt{\tfrac{\log n}{n}}\right)-O\left(\frac{{\epsilon}\alpha}{n}\right)\right). (28)

By the law of total expectation and the fact that NN^{\prime} is always non-negative, we have

(𝐡0α,𝐥0α)𝒩𝒟𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)𝖤[N|w(𝐱)=k1,(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)]\displaystyle\sum_{({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{N}\cap\mathcal{D}}\operatorname{\mathsf{P}}(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)\operatorname{\mathsf{E}}[N^{\prime}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w({\mathbf{x}})=k-1,(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})]
𝖤[N|w(𝐱)=k1]nkϵn.\displaystyle\leq\operatorname{\mathsf{E}}[N^{\prime}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w({\mathbf{x}})=k-1]\leq n-k-{\epsilon}n. (29)

Now, we focus on the difference between (28) and (29). Recall that we defined ϵ=(log1δ)1/4{\epsilon}=(\log\frac{1}{\delta})^{-1/4}. It follows that

(nk)(1δO(k1/6)O(exp(log1δ))O(lognn)O(ϵαn))(nkϵn)\displaystyle(n-k)\left(1-\delta-O(k^{-1/6})-O\left(\exp\left(-\sqrt{\log\tfrac{1}{\delta}}\right)\right)-O\left(\sqrt{\tfrac{\log n}{n}}\right)-O\left(\frac{{\epsilon}\alpha}{n}\right)\right)-(n-k-{\epsilon}n)
(1o(1))ϵn.\displaystyle\geq(1-o(1))\epsilon n.

Therefore, (27) implies that

𝖯(𝖳𝖧^k=0|w(𝐱)=k)(1o(1))ϵ.\operatorname{\mathsf{P}}(\widehat{\mathsf{TH}}_{k}=0\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k)\geq(1-o(1))\epsilon.

Notice that ϵδ(log1δ)1/4δ=ω(1)\frac{{\epsilon}}{\delta}\geq\frac{(\log\tfrac{1}{\delta})^{-1/4}}{\delta}=\omega(1). This shows that

𝖯(𝖳𝖧^k=0|w(𝐱)=k)ω(δ).\operatorname{\mathsf{P}}(\widehat{\mathsf{TH}}_{k}=0\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k)\geq\omega(\delta).

Now it suffices to prove Propositions 3 and 4 to complete the proof in this regime.

A-B1 Proof of Propositions

Proof of Proposition 3.

Define error events

5{j𝒯Lj(nk+1)(12exp(log1δ))(1lognn)},{\mathcal{E}}_{5}\triangleq\left\{\sum_{j\in{\mathcal{T}}}L_{j}\leq(n-k+1)\left(1-2\exp\left(-\sqrt{\log\frac{1}{\delta}}\right)\right)\left(1-\sqrt{\frac{\log n}{n}}\right)\right\},
6{j𝒯:Lj[(1Δj0)(nk+1)Pj0,(1+Δj0)(nk+1)Pj0]},{\mathcal{E}}_{6}\triangleq\{\exists j\in{\mathcal{T}}\mathchar 58\relax L_{j}\notin[(1-\Delta_{j}^{0})(n-k+1)P_{j}^{0},(1+\Delta_{j}^{0})(n-k+1)P_{j}^{0}]\},

and

7{j𝒯:Hj[(1Δj1)(k1)Pj1,(1+Δj1)(k1)Pj1]}.{\mathcal{E}}_{7}\triangleq\{\exists j\in{\mathcal{T}}\mathchar 58\relax H_{j}\notin[(1-\Delta_{j}^{1})(k-1)P_{j}^{1},(1+\Delta_{j}^{1})(k-1)P_{j}^{1}]\}.

Notice that events 5,6{\mathcal{E}}_{5},{\mathcal{E}}_{6} and 7{\mathcal{E}}_{7} correspond to the second, third and fourth conditions in the definition of 𝒟\mathcal{D}. It follows that

(𝐡0α,𝐥0α)𝒟𝖯({(𝐇0α,𝐋0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)\displaystyle\sum_{({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{D}}\operatorname{\mathsf{P}}(\{({\bf H}_{0}^{\alpha},{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)
=1𝖯(567|w(𝐱)=k1)\displaystyle=1-\operatorname{\mathsf{P}}({\mathcal{E}}_{5}\cup{\mathcal{E}}_{6}\cup{\mathcal{E}}_{7}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)
1𝖯(5|w(𝐱)=k1)𝖯(6|w(𝐱)=k1)𝖯(7|w(𝐱)=k1),\displaystyle\geq 1-\operatorname{\mathsf{P}}({\mathcal{E}}_{5}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)-\operatorname{\mathsf{P}}({\mathcal{E}}_{6}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)-\operatorname{\mathsf{P}}({\mathcal{E}}_{7}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1),

where the last inequality follows from the union bound. Proposition 3 then follows readily from the following three lemmas.

Lemma 6.

For any 𝐱\mathbf{x} with w(𝐱)=k1w(\mathbf{x})=k-1, we have 𝖯(5|𝐱)n1/5.\operatorname{\mathsf{P}}({\mathcal{E}}_{5}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x})\leq n^{-1/5}.

Lemma 7.

For any 𝐱\mathbf{x} with w(𝐱)=k1w(\mathbf{x})=k-1, we have 𝖯(6|𝐱)n1/6\operatorname{\mathsf{P}}({\mathcal{E}}_{6}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x})\leq n^{-1/6}.

Lemma 8.

For any 𝐱\mathbf{x} with w(𝐱)=k1w(\mathbf{x})=k-1, we have 𝖯(7|𝐱)k1/6\operatorname{\mathsf{P}}({\mathcal{E}}_{7}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x})\leq k^{-1/6}.

By Lemmas 6, 7 and 8, we have

(𝐡0α,𝐥0α)𝒟𝖯({(𝐇0α,𝐋0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)\displaystyle\sum_{({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{D}}\operatorname{\mathsf{P}}(\{({\bf H}_{0}^{\alpha},{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1)
1O(n1/6)k1/6,\displaystyle\geq 1-O(n^{-1/6})-k^{-1/6},

which completes the proof. ∎

Proof of Proposition 4.

Following the same derivation as for equation (23) in the proof of Proposition 2, we have

𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}1|w(𝐱)=k)𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)\displaystyle\frac{\operatorname{\mathsf{P}}\left(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\cap{\mathcal{E}}_{1}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k\right)}{\operatorname{\mathsf{P}}\left(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1\right)}
=j𝒯kPj1hj+1lj(nk+1)Pj0lj1𝖤[Nj|𝐇¯0α=𝐡0α,𝐋¯0α=𝐥0α,w(𝐱)=k1]1+i𝒯li.\displaystyle=\sum_{j\in{\mathcal{T}}}\frac{kP_{j}^{1}}{h_{j}+1}\cdot\frac{l_{j}}{(n-k+1)P_{j}^{0}}\cdot\frac{l_{j}-1-\operatorname{\mathsf{E}}[N^{\prime}_{j}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\bar{\bf H}_{0}^{\alpha}={\bf h}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha}={\bf l}_{0}^{\alpha},w(\mathbf{x})=k-1]}{-1+\sum_{i\in{\mathcal{T}}}l_{i}}. (30)

To proceed, we need a technical lemma that bounds Pj0P_{j}^{0} and Pj1P_{j}^{1} for each j𝒯j\in{\mathcal{T}}.

Lemma 9.

For each j𝒯j\in{\mathcal{T}}, we have Pj1exp(logk+logk2(log1δ)1/4)P_{j}^{1}\geq\exp\left(-\log k+\frac{\log k}{2(\log\tfrac{1}{\delta})^{1/4}}\right) and Pj0exp(o(logk))P_{j}^{0}\geq\exp(-o(\log k)).

Because ({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{C}, we have (1-\Delta_{j}^{1})(k-1)P_{j}^{1}\leq h_{j}\leq(1+\Delta_{j}^{1})(k-1)P_{j}^{1} and (1-\Delta_{j}^{0})(n-k+1)P_{j}^{0}\leq l_{j}\leq(1+\Delta_{j}^{0})(n-k+1)P_{j}^{0} for each j\in{\mathcal{T}}. Recall that we defined \Delta_{j}^{1}\triangleq\sqrt{\frac{\log k}{(k-1)P_{j}^{1}}} and \Delta_{j}^{0}\triangleq\sqrt{\frac{\log n}{(n-k+1)P_{j}^{0}}}. By Lemma 9, we have \Delta_{j}^{0}=o(1) and \Delta_{j}^{1}=o(1), and it follows that

kPj1hj+1=1+o(1) and lj(nk+1)Pj0=1+o(1).\frac{kP_{j}^{1}}{h_{j}+1}=1+o(1)\;\text{ and }\;\frac{l_{j}}{(n-k+1)P_{j}^{0}}=1+o(1).

Therefore, we have

𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}1|w(𝐱)=k)𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)\displaystyle\frac{\operatorname{\mathsf{P}}\left(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\cap{\mathcal{E}}_{1}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k\right)}{\operatorname{\mathsf{P}}\left(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1\right)}
=(1+o(1))j𝒯lj1𝖤[Nj|𝐇¯0α=𝐡0α,𝐋¯0α=𝐥0α,w(𝐱)=k1]1+i𝒯li\displaystyle=(1+o(1))\sum_{j\in{\mathcal{T}}}\frac{l_{j}-1-\operatorname{\mathsf{E}}[N^{\prime}_{j}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\bar{\bf H}_{0}^{\alpha}={\bf h}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha}={\bf l}_{0}^{\alpha},w(\mathbf{x})=k-1]}{-1+\sum_{i\in{\mathcal{T}}}l_{i}}
(1+o(1))𝖤[N|𝐇¯0α=𝐡0α,𝐋¯0α=𝐥0α,w(𝐱)=k1](2ϵα+1)+j𝒯lj1+i𝒯li.\displaystyle\geq(1+o(1))\frac{-\operatorname{\mathsf{E}}[N^{\prime}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\bar{\bf H}_{0}^{\alpha}={\bf h}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha}={\bf l}_{0}^{\alpha},w(\mathbf{x})=k-1]-(2{\epsilon}\alpha+1)+\sum_{j\in{\mathcal{T}}}l_{j}}{-1+\sum_{i\in{\mathcal{T}}}l_{i}}.

Because (𝐡0α,𝐥0α)𝒟({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\in\mathcal{D}, we have

i𝒯li(nk+1)(12exp(log1δ))(1lognn).\sum_{i\in{\mathcal{T}}}l_{i}\geq(n-k+1)\left(1-2\exp\left(-\sqrt{\log\frac{1}{\delta}}\right)\right)\left(1-\sqrt{\frac{\log n}{n}}\right).

Finally, by the definition of nn^{*} in the proposition statement, we have

𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}1|w(𝐱)=k)𝖯({(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)}|w(𝐱)=k1)\displaystyle\frac{\operatorname{\mathsf{P}}\left(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\cap{\mathcal{E}}_{1}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k\right)}{\operatorname{\mathsf{P}}\left(\{(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})\}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w(\mathbf{x})=k-1\right)}
1+o(1)n(n𝖤[N|w(𝐱)=k1,(𝐇¯0α,𝐋¯0α)=(𝐡0α,𝐥0α)]),\displaystyle\geq\frac{1+o(1)}{n}\left(n^{*}-\operatorname{\mathsf{E}}[N^{\prime}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}w({\mathbf{x}})=k-1,(\bar{\bf H}_{0}^{\alpha},\bar{\bf L}_{0}^{\alpha})=({\bf h}_{0}^{\alpha},{\bf l}_{0}^{\alpha})]\right),

which completes the proof. ∎

A-B2 Proof of Lemmas

Proof of Lemma 9.

By definition, we have Pj1=(αj)(1p)jpαjP_{j}^{1}=\binom{\alpha}{j}(1-p)^{j}p^{\alpha-j}. Notice that minj𝒯Pj1=Pα(pϵ)1\min_{j\in{\mathcal{T}}}P_{j}^{1}=P_{\alpha(p-{\epsilon})}^{1}. Therefore, it suffices to show the claim for j=α(pϵ)j={\alpha(p-{\epsilon})}. By the pmf of binomial distribution and Stirling’s approximation, we have

Pα(pϵ)1\displaystyle P_{\alpha(p-{\epsilon})}^{1} =(α(pϵ)α)(1p)α(pϵ)pα(1p+ϵ)\displaystyle=\binom{\alpha}{(p-{\epsilon})\alpha}(1-p)^{\alpha(p-{\epsilon})}p^{\alpha(1-p+{\epsilon})}
exp(α(DKL(pϵ1p)+O(logαα)))\displaystyle\geq\exp\left(-\alpha\left({D_{\mathrm{KL}}}(p-{\epsilon}\|1-p)+O\left(\frac{\log\alpha}{\alpha}\right)\right)\right)
=exp((1ϵ)logkDKL(pϵ1p)(DKL(pϵ1p)+O(logαα)))\displaystyle=\exp\left(-\frac{(1-{\epsilon})\log k}{{D_{\mathrm{KL}}}(p-{\epsilon}\|1-p)}\left({D_{\mathrm{KL}}}(p-{\epsilon}\|1-p)+O\left(\frac{\log\alpha}{\alpha}\right)\right)\right)
(a)exp((11(log1δ)1/4)(1+O(loglogklogk))logk)\displaystyle\stackrel{{\scriptstyle\mathchoice{\hbox to0.0pt{\hss$\displaystyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\textstyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\scriptstyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\scriptscriptstyle{(a)}$\hss}}}}{{\geq}}\exp\left(-\left(1-\frac{1}{(\log\tfrac{1}{\delta})^{1/4}}\right)\left(1+O\left(\frac{\log\log k}{\log k}\right)\right)\log k\right)
(b)exp(logk+logk2(log1δ)1/4),\displaystyle\stackrel{{\scriptstyle\mathchoice{\hbox to0.0pt{\hss$\displaystyle{(b)}$\hss}}{\hbox to0.0pt{\hss$\textstyle{(b)}$\hss}}{\hbox to0.0pt{\hss$\scriptstyle{(b)}$\hss}}{\hbox to0.0pt{\hss$\scriptscriptstyle{(b)}$\hss}}}}{{\geq}}\exp\left(-\log k+\frac{\log k}{2(\log\tfrac{1}{\delta})^{1/4}}\right),

where (a) follows because ϵ=(log1δ)1/4{\epsilon}=(\log\tfrac{1}{\delta})^{-1/4} and α=Θ(logk)\alpha=\Theta(\log k), and (b) follows because 1(log1δ)1/4=ω(loglogklogk)\frac{1}{(\log\tfrac{1}{\delta})^{1/4}}=\omega\left(\frac{\log\log k}{\log k}\right).
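For completeness, the first inequality in the display above (and the corresponding step in the bound on P_{\alpha(p-{\epsilon})}^{0} below) uses the following standard consequence of Stirling's approximation: for any q\in(0,1) with q\alpha an integer and any r\in(0,1),

\binom{\alpha}{q\alpha}r^{q\alpha}(1-r)^{(1-q)\alpha}=\exp\left(-\alpha\left(D_{\mathrm{KL}}(q\|r)+O\left(\frac{\log\alpha}{\alpha}\right)\right)\right),

since \binom{\alpha}{q\alpha}=\exp(\alpha H(q)+O(\log\alpha)) with H(q)\triangleq-q\log q-(1-q)\log(1-q), and H(q)+q\log r+(1-q)\log(1-r)=-D_{\mathrm{KL}}(q\|r).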

We have Pj0=(αj)pj(1p)αjP_{j}^{0}=\binom{\alpha}{j}p^{j}(1-p)^{\alpha-j}. We can see that minj𝒯Pj0=min(Pα(pϵ)0,Pα(p+ϵ)0)\min_{j\in{\mathcal{T}}}P_{j}^{0}=\min(P_{\alpha(p-{\epsilon})}^{0},P_{\alpha(p+{\epsilon})}^{0}). Therefore, it suffices to show that Pα(pϵ)0exp(o(logk))P_{\alpha(p-{\epsilon})}^{0}\geq\exp(-o(\log k)) and Pα(p+ϵ)0exp(o(logk))P_{\alpha(p+{\epsilon})}^{0}\geq\exp(-o(\log k)). By Stirling’s approximation, we have

Pα(pϵ)0\displaystyle P_{\alpha(p-{\epsilon})}^{0} =(αα(pϵ))pα(pϵ)(1p)α(1p+ϵ)\displaystyle=\binom{\alpha}{\alpha(p-{\epsilon})}p^{\alpha(p-{\epsilon})}(1-p)^{\alpha(1-p+{\epsilon})}
exp(α(DKL(pϵp)+O(logαα)))\displaystyle\geq\exp\left(-\alpha\left({D_{\mathrm{KL}}}(p-{\epsilon}\|p)+O\left(\frac{\log\alpha}{\alpha}\right)\right)\right)
=exp((1ϵ)logkDKL(pϵ1p)(DKL(pϵp)+O(logαα)))\displaystyle=\exp\left(-\frac{(1-{\epsilon})\log k}{{D_{\mathrm{KL}}}(p-{\epsilon}\|1-p)}\left({D_{\mathrm{KL}}}(p-{\epsilon}\|p)+O\left(\frac{\log\alpha}{\alpha}\right)\right)\right)
exp(o(logk)),\displaystyle\geq\exp(-o(\log k)),

where the last inequality follows because DKL(pϵp)=o(1){D_{\mathrm{KL}}}(p-{\epsilon}\|p)=o(1). It can be similarly shown that Pα(p+ϵ)0exp(o(logk))P_{\alpha(p+{\epsilon})}^{0}\geq\exp(-o(\log k)). ∎

Proof of Lemma 6.

For any 𝐱\mathbf{x} with w(𝐱)=k1w(\mathbf{x})=k-1, we know that j𝒯Lj|𝐱Binom(nk+1,𝖯(W0𝒯))\sum_{j\in{\mathcal{T}}}L_{j}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x}\sim{\mathrm{Binom}}(n-k+1,\operatorname{\mathsf{P}}(W_{0}\in{\mathcal{T}})), where W0Binom(α,p)W_{0}\sim{\mathrm{Binom}}(\alpha,p). To bound 𝖯(5|𝐱)\operatorname{\mathsf{P}}({\mathcal{E}}_{5}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x}), we first bound 𝖯(W0𝒯)\operatorname{\mathsf{P}}(W_{0}\in{\mathcal{T}}). We have

𝖯(W0𝒯)\displaystyle\operatorname{\mathsf{P}}(W_{0}\notin{\mathcal{T}}) =𝖯(|W0αp|αϵ)\displaystyle=\operatorname{\mathsf{P}}(\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}W_{0}-\alpha p\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\geq\alpha{\epsilon})
(a)2exp(ϵ2α3p)\displaystyle\stackrel{{\scriptstyle\mathchoice{\hbox to0.0pt{\hss$\displaystyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\textstyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\scriptstyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\scriptscriptstyle{(a)}$\hss}}}}{{\leq}}2\exp\left(-\frac{{\epsilon}^{2}\alpha}{3p}\right)
(b)2exp((1ϵ)logk3pDKL(pϵ1p)log(1/δ))\displaystyle\stackrel{{\scriptstyle\mathchoice{\hbox to0.0pt{\hss$\displaystyle{(b)}$\hss}}{\hbox to0.0pt{\hss$\textstyle{(b)}$\hss}}{\hbox to0.0pt{\hss$\scriptstyle{(b)}$\hss}}{\hbox to0.0pt{\hss$\scriptscriptstyle{(b)}$\hss}}}}{{\leq}}2\exp\left(-\frac{(1-{\epsilon})\log k}{3p{D_{\mathrm{KL}}}(p-{\epsilon}\|1-p)\sqrt{\log(1/\delta)}}\right)
(c)2exp(log1δ),\displaystyle\stackrel{{\scriptstyle\mathchoice{\hbox to0.0pt{\hss$\displaystyle{(c)}$\hss}}{\hbox to0.0pt{\hss$\textstyle{(c)}$\hss}}{\hbox to0.0pt{\hss$\scriptstyle{(c)}$\hss}}{\hbox to0.0pt{\hss$\scriptscriptstyle{(c)}$\hss}}}}{{\leq}}2\exp\left(-\sqrt{\log\frac{1}{\delta}}\right),

where (a) follows by the Chernoff bound, (b) follows because ϵ=(log(1/δ))1/4{\epsilon}=(\log(1/\delta))^{-1/4}, and (c) follows because logk=ω(log1δ)\log k=\omega(\log\frac{1}{\delta}).
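The Chernoff bound invoked in step (a) (and again in the proofs of Lemmas 7 and 8) is the standard multiplicative form: for X\sim{\mathrm{Binom}}(m,q) with mean \mu=mq and any 0<\eta\leq 1,

\operatorname{\mathsf{P}}(|X-\mu|\geq\eta\mu)\leq 2\exp\left(-\frac{\eta^{2}\mu}{3}\right),

applied here with X=W_{0}, \mu=\alpha p, and \eta={\epsilon}/p.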

Therefore, we have

𝖯(5|𝐱)\displaystyle\operatorname{\mathsf{P}}({\mathcal{E}}_{5}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x})
𝖯(Binom(nk+1,12exp(log1δ))\displaystyle\leq\operatorname{\mathsf{P}}\Bigg{(}\mathrm{Binom}\left(n-k+1,1-2\exp\left(-\sqrt{\log\tfrac{1}{\delta}}\right)\right)
(nk+1)(12exp(log1δ))(1lognn))\displaystyle\hskip 20.00003pt\leq(n-k+1)\left(1-2\exp\left(-\sqrt{\log\tfrac{1}{\delta}}\right)\right)\left(1-\sqrt{\tfrac{\log n}{n}}\right)\Bigg{)}
(d)exp((nk+1)(12exp(log1δ))logn2n)\displaystyle\stackrel{{\scriptstyle\mathchoice{\hbox to0.0pt{\hss$\displaystyle{(d)}$\hss}}{\hbox to0.0pt{\hss$\textstyle{(d)}$\hss}}{\hbox to0.0pt{\hss$\scriptstyle{(d)}$\hss}}{\hbox to0.0pt{\hss$\scriptscriptstyle{(d)}$\hss}}}}{{\leq}}\exp\left(-\frac{(n-k+1)\left(1-2\exp\left(-\sqrt{\log\frac{1}{\delta}}\right)\right)\log n}{2n}\right)
n1/5,\displaystyle\leq n^{-1/5},

where (d) follows by the Chernoff bound. ∎

Proof of Lemma 7.

For any 𝐱\mathbf{x} with w(𝐱)=k1w(\mathbf{x})=k-1, we have Lj|𝐱Binom(nk+1,Pj0)L_{j}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x}\sim{\mathrm{Binom}}(n-k+1,P_{j}^{0}) for each j𝒯j\in{\mathcal{T}}. For any j𝒯j\in{\mathcal{T}}, by Chernoff bound we have

𝖯(|Lj(nk+1)Pj0|Δj0(nk+1)Pj0|𝐱)\displaystyle\operatorname{\mathsf{P}}(\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}L_{j}-(n-k+1)P_{j}^{0}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\geq\Delta_{j}^{0}(n-k+1)P_{j}^{0}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x})
2exp((Δj0)2(nk+1)Pj0/3)\displaystyle\leq 2\exp(-(\Delta_{j}^{0})^{2}(n-k+1)P_{j}^{0}/3)
=2n1/3,\displaystyle=2n^{-1/3},

where the last equality follows because Δj0=logn(nk+1)Pj0\Delta_{j}^{0}=\sqrt{\frac{\log n}{(n-k+1)P_{j}^{0}}}. By union bound, we have

\operatorname{\mathsf{P}}({\mathcal{E}}_{6}\,|\,\mathbf{x})\leq(2\alpha+1)\cdot 2n^{-1/3}\leq n^{-1/6}. ∎

Proof of Lemma 8.

For any 𝐱\mathbf{x} with w(𝐱)=k1w(\mathbf{x})=k-1, we have Hj|𝐱Binom(k1,Pj1)H_{j}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathbf{x}\sim{\mathrm{Binom}}(k-1,P_{j}^{1}) for each j𝒯j\in{\mathcal{T}}. For any j𝒯j\in{\mathcal{T}}, by Chernoff bound we have

\displaystyle\operatorname{\mathsf{P}}(|H_{j}-(k-1)P_{j}^{1}|\geq\Delta_{j}^{1}(k-1)P_{j}^{1}\,|\,\mathbf{x})
2exp((Δj1)2(k1)Pj1/3)\displaystyle\leq 2\exp(-(\Delta_{j}^{1})^{2}(k-1)P_{j}^{1}/3)
=2k1/3,\displaystyle=2k^{-1/3},

where the last equality follows because Δj1=logk(k1)Pj1\Delta_{j}^{1}=\sqrt{\frac{\log k}{(k-1)P_{j}^{1}}}. By union bound, we have

\operatorname{\mathsf{P}}({\mathcal{E}}_{7}\,|\,\mathbf{x})\leq(2\alpha+1)\cdot 2k^{-1/3}\leq k^{-1/6}. ∎

A-C Regime 3: logklog(1/δ)loglog(1/δ)\log k\leq\frac{\log(1/\delta)}{\log\log(1/\delta)}

In this regime, we prove the converse using Le Cam's two-point method (see, e.g., Yu, (1997)). The proof proceeds by establishing the following proposition.

Proposition 5.

Consider any variable-length algorithm for computing \mathsf{TH}_{k}(\mathbf{x}) whose number of queries M satisfies

𝖤[M|𝐱](nk+1)log14δDKL(p1p)\operatorname{\mathsf{E}}[M\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}{\mathbf{x}}]\leq\frac{(n-k+1)\log\tfrac{1}{4\delta}}{{D_{\mathrm{KL}}(p\|1-p)}}

for any input instance 𝐱{\mathbf{x}}. Then the worst-case error probability of the algorithm is at least δ\delta.

Notice that under the assumptions \delta=o(1) and \log k\leq\frac{\log(1/\delta)}{\log\log(1/\delta)}, we have

(nk+1)log14δDKL(p1p)=(1o(1))(nk)logkδDKL(p1p).\frac{(n-k+1)\log\tfrac{1}{4\delta}}{{D_{\mathrm{KL}}(p\|1-p)}}=(1-o(1))\frac{(n-k)\log\frac{k}{\delta}}{{D_{\mathrm{KL}}(p\|1-p)}}.

Hence Proposition 5 directly implies Theorem 1 in this regime. It is worth noting that Proposition 5 does not imply the same bounds in the other two regimes, namely when log(1/δ)loglog(1/δ)<logklog(1/δ)loglog(1/δ)\frac{\log(1/\delta)}{\log\log(1/\delta)}<\log k\leq\log(1/\delta)\log\log(1/\delta) and when logk>log(1/δ)loglog(1/δ)\log k>\log(1/\delta)\log\log(1/\delta).

The proof of Proposition 5 is based on Le Cam's two-point lemma, stated below.

Lemma 10 (Le Cam's Two-Point Lemma (Yu,, 1997)).

For any θ0,θ1Θ\theta_{0},\theta_{1}\in\Theta, suppose that the following separation condition holds for some loss function L(θ,a):Θ×𝒜L(\theta,a)\mathchar 58\relax\Theta\times\mathcal{A}\rightarrow\mathbb{R}:

a𝒜,L(θ0,a)+L(θ1,a)Δ>0.\displaystyle\forall{a\in\mathcal{A}},L(\theta_{0},a)+L(\theta_{1},a)\geq\Delta>0.

Then we have

inffsupθ𝔼θ[L(θ,f(X))]Δ2(1𝖳𝖵(θ0,θ1)),\displaystyle\inf_{f}\sup_{\theta}\mathbb{E}_{\theta}[L(\theta,f(X))]\geq\frac{\Delta}{2}\left(1-\mathsf{TV}(\mathbb{P}_{\theta_{0}},\mathbb{P}_{\theta_{1}})\right),

where \mathbb{P}_{\theta_{0}} and \mathbb{P}_{\theta_{1}} denote the probability distributions of the statistic X under the hypotheses \theta_{0} and \theta_{1}, respectively.

Proof of Proposition 5.

To apply Lemma 10, we first define nk+2n-k+2 input instances for the 𝖳𝖧k\mathsf{TH}_{k} problem. Let 𝐱(0)\mathbf{x}^{(0)} denote the binary sequence such that

xi(0)={1, for 1ik1,0, otherwise.x_{i}^{(0)}=\begin{cases}1,&\text{ for }1\leq i\leq k-1,\\ 0,&\text{ otherwise}.\end{cases}

For each kjnk\leq j\leq n, let 𝐱(j)\mathbf{x}^{(j)} denote the binary sequence such that

xi(j)={1, for 1ik1 or i=j,0, otherwise.x_{i}^{(j)}=\begin{cases}1,&\text{ for }1\leq i\leq k-1\text{ or }i=j,\\ 0,&\text{ otherwise}.\end{cases}

Notice that 𝖳𝖧k(𝐱(0))=0\mathsf{TH}_{k}(\mathbf{x}^{(0)})=0 and 𝖳𝖧k(𝐱(j))=1\mathsf{TH}_{k}(\mathbf{x}^{(j)})=1 for each kjnk\leq j\leq n. Therefore, for any estimator 𝖳𝖧^k\widehat{\mathsf{TH}}_{k}, we have

𝟙(𝖳𝖧^k𝖳𝖧k(𝐱(0)))+𝟙(𝖳𝖧^k𝖳𝖧k(𝐱(j)))1.\displaystyle\mathbbm{1}(\widehat{\mathsf{TH}}_{k}\neq\mathsf{TH}_{k}(\mathbf{x}^{(0)}))+\mathbbm{1}(\widehat{\mathsf{TH}}_{k}\neq\mathsf{TH}_{k}(\mathbf{x}^{(j)}))\geq 1.

By Lemma 10, applied with the zero-one loss L(\mathbf{x},a)=\mathbbm{1}(a\neq\mathsf{TH}_{k}(\mathbf{x})) and \Delta=1, we have

\inf_{\widehat{\mathsf{TH}}_{k}}\sup_{\mathbf{x}\in\{0,1\}^{n}}\operatorname{\mathsf{P}}(\widehat{\mathsf{TH}}_{k}\neq\mathsf{TH}_{k}(\mathbf{x}))\geq\frac{1}{2}(1-\mathsf{TV}(\mathbb{P}_{\mathbf{x}^{(0)}},\mathbb{P}_{\mathbf{x}^{(j)}})),

for any k\leq j\leq n, where \mathbb{P}_{\mathbf{x}^{(0)}} and \mathbb{P}_{\mathbf{x}^{(j)}} denote the joint distributions of all query-observation pairs when the ground truth is \mathbf{x}^{(0)} and \mathbf{x}^{(j)}, respectively. Taking the maximum over k\leq j\leq n on the right-hand side, we get

\inf_{\widehat{\mathsf{TH}}_{k}}\sup_{\mathbf{x}\in\{0,1\}^{n}}\operatorname{\mathsf{P}}(\widehat{\mathsf{TH}}_{k}\neq\mathsf{TH}_{k}(\mathbf{x}))
supkjn12(1𝖳𝖵(𝐱(0),𝐱(j)))\displaystyle\geq\sup_{k\leq j\leq n}\frac{1}{2}(1-\mathsf{TV}(\mathbb{P}_{\mathbf{x}^{(0)}},\mathbb{P}_{\mathbf{x}^{(j)}}))
supkjn14exp(DKL(𝐱(0),𝐱(j))),\displaystyle\geq\sup_{k\leq j\leq n}\frac{1}{4}\exp(-{D_{\mathrm{KL}}}(\mathbb{P}_{\mathbf{x}^{(0)}},\mathbb{P}_{\mathbf{x}^{(j)}})),

where the last inequality follows by the Bretagnolle–Huber inequality stated in Lemma 11. Now, we need to bound the KL divergence DKL(𝐱(0),𝐱(j)){D_{\mathrm{KL}}}(\mathbb{P}_{\mathbf{x}^{(0)}},\mathbb{P}_{\mathbf{x}^{(j)}}). By the Divergence Decomposition Lemma stated in Lemma 12, we have

DKL(𝐱(0),𝐱(j))=𝖤[Mj|𝐱(0)]DKL(p1p),{D_{\mathrm{KL}}}(\mathbb{P}_{\mathbf{x}^{(0)}},\mathbb{P}_{\mathbf{x}^{(j)}})=\operatorname{\mathsf{E}}[M_{j}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}{\mathbf{x}}^{(0)}]{D_{\mathrm{KL}}(p\|1-p)},

where M_{j} denotes the number of queries made on the bit x_{j}. Because \sum_{j=k}^{n}\operatorname{\mathsf{E}}[M_{j}\,|\,{\mathbf{x}}^{(0)}]\leq\operatorname{\mathsf{E}}[M\,|\,{\mathbf{x}}^{(0)}]\leq\frac{(n-k+1)\log\tfrac{1}{4\delta}}{D_{\mathrm{KL}}(p\|1-p)}, there exists some k\leq j^{*}\leq n such that \operatorname{\mathsf{E}}[M_{j^{*}}\,|\,{\mathbf{x}}^{(0)}]\leq\frac{\log\tfrac{1}{4\delta}}{D_{\mathrm{KL}}(p\|1-p)}. Finally, noting that \exp\left(-\log\tfrac{1}{4\delta}\right)=4\delta, we have

\inf_{\widehat{\mathsf{TH}}_{k}}\sup_{\mathbf{x}\in\{0,1\}^{n}}\operatorname{\mathsf{P}}(\widehat{\mathsf{TH}}_{k}\neq\mathsf{TH}_{k}(\mathbf{x}))
supkjn14exp(𝖤[Mj|𝐱(0)]DKL(p1p))\displaystyle\geq\sup_{k\leq j\leq n}\frac{1}{4}\exp(-\operatorname{\mathsf{E}}[M_{j}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}{\mathbf{x}}^{(0)}]{D_{\mathrm{KL}}(p\|1-p)})
14exp(𝖤[Mj|𝐱(0)]DKL(p1p))\displaystyle\geq\frac{1}{4}\exp(-\operatorname{\mathsf{E}}[M_{j^{*}}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}{\mathbf{x}}^{(0)}]{D_{\mathrm{KL}}(p\|1-p)})
δ,\displaystyle\geq\delta,

which completes the proof. ∎

Lemma 11 (Bretagnolle–Huber inequality (Bretagnolle and Huber,, 1979)).

For any two distributions \mathbb{P}_{1},\mathbb{P}_{2}, we have

𝖳𝖵(1,2)112exp(D𝖪𝖫(1,2)).\displaystyle\mathsf{TV}(\mathbb{P}_{1},\mathbb{P}_{2})\leq 1-\frac{1}{2}\exp(-D_{\mathsf{KL}}(\mathbb{P}_{1},\mathbb{P}_{2})).
Lemma 12 (Divergence Decomposition (Auer et al.,, 1995)).

Let T_{i} be the random variable denoting the number of times experiment i\in[K] is performed under some policy \pi. Then, for any two distributions \mathbb{P}^{\pi},\mathbb{Q}^{\pi} under policy \pi,

D𝖪𝖫(π,π)=i[K]𝔼π[Ti]D𝖪𝖫(iπ,iπ).\displaystyle D_{\mathsf{KL}}(\mathbb{P}^{\pi},\mathbb{Q}^{\pi})=\sum_{i\in[K]}\mathbb{E}_{\mathbb{P}^{\pi}}[T_{i}]D_{\mathsf{KL}}(\mathbb{P}_{i}^{\pi},\mathbb{Q}_{i}^{\pi}).

Appendix B Detailed Proof of Theorem 2

In this appendix, we give a complete proof of Theorem 2. We present the proof in two separate regimes knlognk\leq\frac{n}{\log n} and k>nlognk>\frac{n}{\log n}.

B-A Regime 1: knlognk\leq\frac{n}{\log n}

To prove Theorem 2 in this regime, we establish the following proposition.

Proposition 6.

Suppose knlognk\leq\frac{n}{\log n} and kn/2k\leq n/2. There exists a variable-length algorithm, namely Algorithm 2, that computes the 𝖳𝖧k\mathsf{TH}_{k} function with worst-case error probability at most 3δ3\delta. Moreover, the number of queries MM made by the algorithm satisfies

𝖤[M|𝐱]nlogkδDKL(p1p)+n12p+C(k+max(nδ+nδ,nlogn))logkδ\operatorname{\mathsf{E}}[M\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}{\mathbf{x}}]\leq\frac{n\log\frac{k}{\delta}}{{D_{\mathrm{KL}}(p\|1-p)}}+\frac{n}{1-2p}+C\left(k+\max\left(n\delta+n\sqrt{\delta},\frac{n}{\log n}\right)\right)\log\frac{k}{\delta}

for any input instance 𝐱{\mathbf{x}}.

To prove Theorem 2 in this regime, we can set δ\delta in Proposition 6 to δ/3\delta^{\prime}/3. Then we know that Algorithm 2 has worst-case error probability at most δ\delta^{\prime}, and its expected query complexity satisfies

𝖤[M|𝐱]nlog3kδDKL(p1p)+n12p+C(k+max(nδ/3+nδ/3,nlogn))log3kδ.\operatorname{\mathsf{E}}[M\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}{\mathbf{x}}]\leq\frac{n\log\frac{3k}{\delta^{\prime}}}{{D_{\mathrm{KL}}(p\|1-p)}}+\frac{n}{1-2p}+C\left(k+\max\left(n\delta^{\prime}/3+n\sqrt{\delta^{\prime}/3},\frac{n}{\log n}\right)\right)\log\frac{3k}{\delta^{\prime}}.

By the assumptions that knlognk\leq\frac{n}{\log n} and δ=o(1)\delta^{\prime}=o(1), it follows that

𝖤[M|𝐱]=(1+o(1))nlogkδDKL(p1p),\operatorname{\mathsf{E}}[M\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}{\mathbf{x}}]=(1+o(1))\frac{n\log\frac{k}{\delta^{\prime}}}{{D_{\mathrm{KL}}(p\|1-p)}},

which proves Theorem 2 in this regime.

In the rest of this section, we prove Proposition 6 by analyzing the proposed NoisyThreshold algorithm (Algorithm 2).

Recall that Algorithm 2 uses the CheckBit and MaxHeapThreshold functions as subroutines. The CheckBit algorithm takes as input a single bit and returns an estimate of the bit through repeated queries. The CheckBit algorithm is given in Algorithm 3, where the parameter α\alpha in the iith iteration corresponds to the probability that the input bit is one given the first ii noisy observations. The following lemma gives an upper bound on the expected number of queries used by the CheckBit algorithm, which can be seen as a recast of Lemma 13 in Gu and Xu, (2023).

Data: Input bit xx, tolerated error probability δ\delta, crossover probability pp.
Result: Estimate of xx.
1 α1/2\alpha\leftarrow 1/2
2
3while α(δ,1δ)\alpha\in(\delta,1-\delta) do
4       Query the input bit xx
5      
6      if 11 is observed then
7             α(1p)α(1p)α+p(1α)\alpha\leftarrow\frac{(1-p)\alpha}{(1-p)\alpha+p(1-\alpha)}
8            
9      else
10             αpαpα+(1p)(1α)\alpha\leftarrow\frac{p\alpha}{p\alpha+(1-p)(1-\alpha)}
11            
12       end if
13      
14 end while
15if α1δ\alpha\geq 1-\delta then
16       return 11
17      
18else
19       return 0
20      
21 end if
Algorithm 3 CheckBit algorithm for finding the value of an input bit
Lemma 13 (Lemma 13 in Gu and Xu, (2023)).

There exists a randomized algorithm, namely, the CheckBit algorithm, that finds the value of any input bit xx with error probability at most δ\delta. The algorithm makes at most 112plog1δδlog1pp\frac{1}{1-2p}\left\lceil\frac{\log\frac{1-\delta}{\delta}}{\log\frac{1-p}{p}}\right\rceil queries in expectation.
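For concreteness, the posterior-update loop of Algorithm 3 can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation; the function and variable names, as well as the noisy-reading interface, are ours.

import random

def check_bit(noisy_read, delta, p):
    # Sketch of Algorithm 3 (CheckBit). noisy_read() returns one noisy
    # reading of the hidden bit (incorrect with probability p); delta is
    # the tolerated error probability.
    alpha = 0.5  # posterior probability that the input bit equals 1
    while delta < alpha < 1 - delta:
        if noisy_read() == 1:
            alpha = (1 - p) * alpha / ((1 - p) * alpha + p * (1 - alpha))
        else:
            alpha = p * alpha / (p * alpha + (1 - p) * (1 - alpha))
    return 1 if alpha >= 1 - delta else 0

# usage: read a bit x = 1 through a BSC with crossover probability p = 0.1
x, p = 1, 0.1
estimate = check_bit(lambda: x if random.random() > p else 1 - x, delta=1e-3, p=p)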

The MaxHeapThreshold algorithm computes the \mathsf{TH}_{k} function for a length-n input bit sequence with a desired error probability \delta. This algorithm was previously proposed in Feige et al., (1994), and is given here for completeness in Algorithm 6. The MaxHeapThreshold algorithm uses the following three subroutines.

  • \bullet

    NoisyCompareUsingReadings (Algorithm 4): which compares two bits using C_{1}(p)\log\frac{1}{\delta} noisy readings of each bit, where \delta is the tolerated error probability of the comparison.

  • \bullet

    ConstructMaxHeap  (Algorithm 5): which returns a max-heap representation 𝐳{\bf z} of 𝐱{\bf x} such that zi,jz_{i,j} is the jj-th entry in the ii-th level of a max-heap. The algorithm is similar to the tournament algorithm for computing the Max function (Zhu et al.,, 2024, Algorithm 5).

  • \bullet

    NoisyExtractMax: which is similar to the common ExtractMax() operation in a max-heap structure, except that every comparison is performed using C2(p)C_{2}(p) noisy readings. Since this is a simple modification to a standard max-heap operation, we don’t show this algorithm here.

Remark 2.

Note that C1(p)C_{1}(p) and C2(p)C_{2}(p) are constants that depend only on pp. For the sake of the analysis of our proposed algorithm, the exact value of these constants does not matter.

The following lemma gives an upper bound on the expected number of queries made by the MaxHeapThreshold algorithm.

Lemma 14 (Theorem 3.3 in Feige et al., (1994)).

For any positive integer k<nk<n and positive real number δ<1\delta<1, the MaxHeapThreshold algorithm computes the 𝖳𝖧k\mathsf{TH}_{k} function with error probability at most δ\delta and using at most CnlogmδCn\log\frac{m}{\delta} queries, where m=min{k,nk+1}m=\min\{k,n-k+1\} and CC is a constant depending only on pp.

Data: Elements x1x_{1} and x2x_{2}, tolerated error probability δ\delta, crossover probability pp.
Result: Estimate of max{x1,x2}\max\{x_{1},x_{2}\}.
1 Make C_{1}(p)\log\tfrac{1}{\delta} noisy readings of x_{1}. Let \hat{x}_{1} be the estimate given by the majority of the readings.
2
3 Make C_{1}(p)\log\tfrac{1}{\delta} noisy readings of x_{2}. Let \hat{x}_{2} be the estimate given by the majority of the readings.
4
5return max{x^1,x^2}\max\{\hat{x}_{1},\hat{x}_{2}\}
Algorithm 4 NoisyCompareUsingReadings for comparing two elements using noisy readings
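A minimal Python sketch of this comparison primitive is given below. The constant C1 is a placeholder for C_{1}(p), whose exact value is not specified (see Remark 2), and the reading interface is an assumption made for illustration.

import math
import random

def noisy_compare_using_readings(read1, read2, delta, p, C1=10.0):
    # Sketch of Algorithm 4: majority-vote each bit from repeated noisy
    # readings, then return the maximum of the two estimates.
    # p enters only through the constant C1(p) in the paper; here C1 is a
    # fixed placeholder value.
    num = max(1, math.ceil(C1 * math.log(1 / delta)))
    x1_hat = int(sum(read1() for _ in range(num)) > num / 2)
    x2_hat = int(sum(read2() for _ in range(num)) > num / 2)
    return max(x1_hat, x2_hat)

# usage: compare bits x1 = 0 and x2 = 1 read through a BSC(0.1)
noisy = lambda b: (lambda: b if random.random() > 0.1 else 1 - b)
result = noisy_compare_using_readings(noisy(0), noisy(1), delta=1e-2, p=0.1)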
Data: Sequence 𝐱=(x1,,xn){\bf x}=(x_{1},\ldots,x_{n}), tolerated error probability δ\delta, crossover probability pp.
Result: Returns a max-heap structure of 𝐱{\bf x}.
1 Set 𝐲𝐱{\bf y}\leftarrow{\bf x}, 𝐳0,1:n𝐲{\bf z}_{0,1\mathchar 58\relax n}\leftarrow{\bf y}, rnr\leftarrow n
2
3for i=1:log2(n)i=1\mathchar 58\relax\lceil\log_{2}(n)\rceil do
4       δ~iδ2(2i1)\tilde{\delta}_{i}\leftarrow\delta^{2(2i-1)}
5      
6      for j=1:r/2j=1\mathchar 58\relax\lfloor r/2\rfloor do
7             zi,jNoisyCompareUsingReadingsz_{i,j}\leftarrow\textsc{NoisyCompareUsingReadings}(y2j1y_{2j-1}, y2jy_{2j}, δ~i\tilde{\delta}_{i}, pp)
8            
9       end for
10      if rr is even then
11             𝐲(zi,1,,zi,r/2){\bf y}\leftarrow(z_{i,1},\ldots,z_{i,r/2})
12            
13      else
14             𝐲(zi,1,,zi,r/2,yr){\bf y}\leftarrow(z_{i,1},\ldots,z_{i,\lfloor r/2\rfloor},y_{r})
15            
16            zi,r/2yrz_{i,\lceil r/2\rceil}\leftarrow y_{r}
17            
18       end if
19      rr/2r\leftarrow\lceil r/2\rceil
20      
21 end for
22return 𝐳{\bf z}
Algorithm 5 ConstructMaxHeap algorithm
Data: Bit sequence 𝐱=(x1,,xn){\bf x}=(x_{1},\ldots,x_{n}), parameter kn2k\leq\frac{n}{2}, error probability δ\delta, noise probability pp.
Result: Estimate of 𝖳𝖧k(𝐱)\mathsf{TH}_{k}({\bf x}).
1 yxy\leftarrow x
2
3if knk\geq\sqrt{n} then
4       Apply the sorting algorithm from Feige et al., (1994) and return the k-th largest element. (The sorting algorithm of Feige et al., (1994) considers the setting of noisy pairwise comparisons. Since we consider noisy readings of the bits here, each comparison in the sorting algorithm of Feige et al., (1994) should be done by a call to NoisyCompareUsingReadings (Algorithm 4).)
5      
6else
7       𝐲ConstructMaxHeap(𝐱,p,δ2k){\bf y}\leftarrow\textsc{ConstructMaxHeap}({\bf x},\,p,\,\frac{\delta}{2k})
8      
9      for j=1,,kj=1,\ldots,k do
10             zjNoisyExtractMax(𝐲,p,δ2klogn)z_{j}\leftarrow\textsc{NoisyExtractMax}({\bf y},\,p,\,\frac{\delta}{2k\log n})
11            
12       end for
13      return zkz_{k}
14      
15 end if
Algorithm 6 MaxHeapThreshold algorithm for computing the 𝖳𝖧k\mathsf{TH}_{k} function

By symmetry, we only need to consider the case of kn/2k\leq n/2. In the following, we analyze the error probability and the number of queries of Algorithm 2.

Error Analysis: The following lemma bounds the worst-case error probability of Algorithm 2.

Lemma 15 (Error probability of NoisyThreshold).

The worst-case error probability of the NoisyThreshold algorithm is at most 3δ3\delta.

Proof.

First, suppose that 𝖳𝖧k(𝐱)=1\mathsf{TH}_{k}({\bf x})=1. Without loss of generality, assume that x1==xk=1x_{1}=\cdots=x_{k}=1. Let 1={𝖳𝖧^k=0}{\mathcal{E}}_{1}=\{\widehat{\mathsf{TH}}_{k}=0\} denote the error event in this case, and let 𝒜={i[k]:xi𝒮}{\mathcal{A}}=\{\exists\,i\in[k]\mathchar 58\relax x_{i}\notin{\mathcal{S}}\}. By the union bound, we have

𝖯(1)𝖯(𝒜)+𝖯(1|𝒜c).\operatorname{\mathsf{P}}({\mathcal{E}}_{1})\leq\operatorname{\mathsf{P}}({\mathcal{A}})+\operatorname{\mathsf{P}}({\mathcal{E}}_{1}\mathchoice{\,|\,}{\mspace{2.0mu}|\mspace{2.0mu}}{|}{|}{\mathcal{A}}^{c}).

For the first term, by Lemma 13, we have that 𝖯(𝒜)i=1k𝖯(xi𝒮)δ\operatorname{\mathsf{P}}({\mathcal{A}})\leq\sum_{i=1}^{k}\operatorname{\mathsf{P}}(x_{i}\notin{\mathcal{S}})\leq\delta. For the second term, notice that |𝒮|k\lvert{\mathcal{S}}\rvert\geq k on the event 𝒜c{\mathcal{A}}^{c}. Therefore, the only possibility for the algorithm to return 0 is that the function call of MaxHeapThreshold on Line 10 incorrectly returns 0. By Lemma 14, we have that 𝖯(1|𝒜c)δ\operatorname{\mathsf{P}}({\mathcal{E}}_{1}\mathchoice{\,|\,}{\mspace{2.0mu}|\mspace{2.0mu}}{|}{|}{\mathcal{A}}^{c})\leq\delta. This leads to the upper bound 𝖯(1)2δ\operatorname{\mathsf{P}}({\mathcal{E}}_{1})\leq 2\delta. Next, suppose that 𝖳𝖧k(𝐱)=0\mathsf{TH}_{k}({\bf x})=0. Let 0={𝖳𝖧^k=1}{\mathcal{E}}_{0}=\{\widehat{\mathsf{TH}}_{k}=1\} denote the error event in this case, and define the auxiliary event ={|𝒮|k+max(nδ+nδ,nlogn)}{\mathcal{B}}=\{\lvert{\mathcal{S}}\rvert\geq k+\max(n\delta+n\sqrt{\delta},\frac{n}{\log n})\}. As before, we have

𝖯(0)𝖯()+𝖯(0|c).\operatorname{\mathsf{P}}(\mathcal{E}_{0})\leq\operatorname{\mathsf{P}}({\mathcal{B}})+\operatorname{\mathsf{P}}({\mathcal{E}}_{0}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}{\mathcal{B}}^{c}).

By Lemma 14, we know that 𝖯(0|c)δ\operatorname{\mathsf{P}}({\mathcal{E}}_{0}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}{\mathcal{B}}^{c})\leq\delta. Moreover, we claim that 𝖯()2δ\operatorname{\mathsf{P}}({\mathcal{B}})\leq 2\delta. It follows that 𝖯(0)3δ\operatorname{\mathsf{P}}({\mathcal{E}}_{0})\leq 3\delta, and completes the proof.

It remains to show the claim. Notice that, by Lemma 13,

|𝒮|Binom(w,1γ)+Binom(nw,γ),\lvert{\mathcal{S}}\rvert\,\sim\,\mathrm{Binom}(w,1-\gamma)+\mathrm{Binom}(n-w,\gamma),

where wk1w\leq k-1 and γδ/k\gamma\leq\delta/k. Hence, |𝒮|\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathcal{S}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|} is stochastically dominated by the random variable k+Binom(n,δ)k+\mathrm{Binom}(n,\delta). We separately consider two different cases to show the claim.

Case 1: δ2n\delta\geq\frac{2}{n}. In this case, we have

𝖯(|𝒮|k+nδ+nδ)\displaystyle\operatorname{\mathsf{P}}\left(\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathcal{S}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\geq k+n\delta+n\sqrt{\delta}\right) 𝖯(Binom(n,δ)nδ+nδ)\displaystyle\leq\operatorname{\mathsf{P}}\left(\mathrm{Binom}(n,\delta)\geq n\delta+n\sqrt{\delta}\right)
(a)exp(n2+1/δ),\displaystyle\stackrel{{\scriptstyle\mathchoice{\hbox to0.0pt{\hss$\displaystyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\textstyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\scriptstyle{(a)}$\hss}}{\hbox to0.0pt{\hss$\scriptscriptstyle{(a)}$\hss}}}}{{\leq}}\exp\left(-\frac{n}{2+1/\sqrt{\delta}}\right),

where (a)(a) follows by the Chernoff bound (see, for example, Theorem 4.4 in Mitzenmacher and Upfal, (2017)). To show that 𝖯()2δ\operatorname{\mathsf{P}}({\mathcal{B}})\leq 2\delta, it suffices to show that n2+1/δlog1δ\frac{n}{2+1/\sqrt{\delta}}\geq\log\frac{1}{\delta}. This follows by the fact that (2+1/δ)log1δ(2+1/\sqrt{\delta})\log\frac{1}{\delta} is maximized at δ=2/n\delta=2/n with value (2+1/2/n)ln(n/2)(2+1/\sqrt{2/n})\ln(n/2), and that n>(2+1/2/n)ln(n/2)n>(2+1/\sqrt{2/n})\ln(n/2) for any n1n\geq 1.

Case 2: δ<2n\delta<\frac{2}{n}. Here we assume without loss of generality that n3n\geq 3, because nlogn>n\frac{n}{\log n}>n when n2n\leq 2, and it follows that 𝖯()=0\operatorname{\mathsf{P}}({\mathcal{B}})=0. We have

𝖯(|𝒮|k+nlogn)\displaystyle\operatorname{\mathsf{P}}\left(\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\mathcal{S}\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}\geq k+\frac{n}{\log n}\right) 𝖯(Binom(n,δ)nlogn)\displaystyle\leq\operatorname{\mathsf{P}}\left(\mathrm{Binom}(n,\delta)\geq\frac{n}{\log n}\right)
\displaystyle\stackrel{(b)}{=}\exp\left(-nD_{\mathsf{KL}}\left(\frac{1}{\log n}\,\Big\|\,\delta\right)\right)
=exp(n(1lognlog1δlogn+(11logn)log11/logn1δ)),\displaystyle=\exp\left(-n\left(\frac{1}{\log n}\log\frac{1}{\delta\log n}+\left(1-\frac{1}{\log n}\right)\log\frac{1-1/\log n}{1-\delta}\right)\right),

where (b)(b) follows by the Chernoff–Hoeffding theorem (see, for example, Theorem 1 in Hoeffding, (1963)). To prove 𝖯()2δ\operatorname{\mathsf{P}}({\mathcal{B}})\leq 2\delta, it suffices to show that

n(1lognlog1δlogn+(11logn)log11/logn1δ)log12δ0.n\left(\frac{1}{\log n}\log\frac{1}{\delta\log n}+\left(1-\frac{1}{\log n}\right)\log\frac{1-1/\log n}{1-\delta}\right)-\log\frac{1}{2\delta}\geq 0.

Firstly, notice that n(1lognlog1δlogn+(11logn)log11/logn1δ)log12δn\left(\frac{1}{\log n}\log\frac{1}{\delta\log n}+\left(1-\frac{1}{\log n}\right)\log\frac{1-1/\log n}{1-\delta}\right)-\log\frac{1}{2\delta} monotonically decreases as δ\delta increases from 0 to 2/n2/n. The proof is then completed by the fact that

n(1lognlogn2logn+(11logn)log11/logn12/n)logn40n\left(\frac{1}{\log n}\log\frac{n}{2\log n}+\left(1-\frac{1}{\log n}\right)\log\frac{1-1/\log n}{1-2/n}\right)-\log\frac{n}{4}\geq 0

for any n3n\geq 3. ∎

Query Analysis: Let MM denote the number of queries made by the NoisyThreshold algorithm. The following lemma gives an upper bound on the expectation of MM, and hence completes the proof of Proposition 6.

Lemma 16.

For any input instance 𝐱{\mathbf{x}}, the expected number of queries made by the NoisyThreshold algorithm satisfies that

𝖤[M|𝐱]nlogkδDKL(p1p)+n12p+C(k+max(nδ+nδ,nlogn))logkδ,\operatorname{\mathsf{E}}[M\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}{\mathbf{x}}]\leq\frac{n\log\frac{k}{\delta}}{{D_{\mathrm{KL}}(p\|1-p)}}+\frac{n}{1-2p}+C\left(k+\max\left(n\delta+n\sqrt{\delta},\frac{n}{\log n}\right)\right)\log\frac{k}{\delta},

where CC is a constant depending only on pp.

Proof.

To bound 𝖤[M|𝐱]\operatorname{\mathsf{E}}[M\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}{\mathbf{x}}], we only need to bound the expected number of queries made by the nn calls of the CheckBit function on Line 3, and the single call of the MaxHeapThreshold function on Line 10. By Lemma 13, we know that the expected number of queries spent at Line 3 is at most

n12plog1δkδklog1pp\displaystyle\frac{n}{1-2p}\left\lceil\frac{\log\frac{1-\frac{\delta}{k}}{\frac{\delta}{k}}}{\log\frac{1-p}{p}}\right\rceil n12p(logkδlog1pp+1)\displaystyle\leq\frac{n}{1-2p}\left(\frac{\log\frac{k}{\delta}}{\log\frac{1-p}{p}}+1\right)
\displaystyle=\frac{n\log\frac{k}{\delta}}{D_{\mathrm{KL}}(p\|1-p)}+\frac{n}{1-2p},

where the equality follows from the identity D_{\mathrm{KL}}(p\|1-p)=(1-2p)\log\frac{1-p}{p}.

By Lemma 14, we know that the expected number of queries spent at Line 12 is at most

C𝖤[|𝒮|]logkδC(k+max(nδ+nδ,nlogn))logkδ,C\operatorname{\mathsf{E}}\left[\lvert{\mathcal{S}}\rvert\right]\log\frac{k}{\delta}\leq C\left(k+\max\left(n\delta+n\sqrt{\delta},\frac{n}{\log n}\right)\right)\log\frac{k}{\delta},

where the inequality holds because |𝒮|k+max(nδ+nδ,nlogn)\lvert{\mathcal{S}}\rvert\leq k+\max\left(n\delta+n\sqrt{\delta},\frac{n}{\log n}\right) on the event that Line 12 is executed. The lemma follows by the linearity of expectation. ∎

B-B Regime 2: k>nlognk>\frac{n}{\log n}

To prove Theorem 2 in this regime, we establish the following proposition.

Proposition 7.

Suppose knlognk\geq\frac{n}{\log n} and kn/2k\leq n/2. There exists a variable-length algorithm, namely Algorithm 1, that computes the 𝖳𝖧k\mathsf{TH}_{k} function with worst-case error probability at most δ\delta. Moreover, the number of queries MM made by the algorithm satisfies

𝖤[M|𝐱]nlognδDKL(p1p)+n12p.\operatorname{\mathsf{E}}[M\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}{\mathbf{x}}]\leq\frac{n\log\frac{n}{\delta}}{{D_{\mathrm{KL}}(p\|1-p)}}+\frac{n}{1-2p}.

for any input instance 𝐱{\mathbf{x}}.

To see that Proposition 7 implies Theorem 2 in this regime, notice that under assumptions knlognk\geq\frac{n}{\log n} and δ=o(1)\delta=o(1), the expected query complexity of Algorithm 1 satisfies

𝖤[M|𝐱]nlognδDKL(p1p)+n12p=(1+o(1))nlogkδDKL(p1p).\operatorname{\mathsf{E}}[M\mathchoice{\mspace{1.0mu}|\mspace{1.0mu}}{|}{|}{|}{\mathbf{x}}]\leq\frac{n\log\frac{n}{\delta}}{{D_{\mathrm{KL}}(p\|1-p)}}+\frac{n}{1-2p}=(1+o(1))\frac{n\log\frac{k}{\delta}}{{D_{\mathrm{KL}}(p\|1-p)}}.

Now it suffices to prove Proposition 7.

To prove the proposition, we first analyze the error probability of Algorithm 1. By Lemma 13, we know that 𝖯(x^ixi)δn\operatorname{\mathsf{P}}(\hat{x}_{i}\neq x_{i})\leq\frac{\delta}{n} for each ii, where x^i\hat{x}_{i} is the estimate for xix_{i} in Algorithm 1. By the union bound, we have 𝖯(i[n]:x^ixi)δ\operatorname{\mathsf{P}}(\exists i\in[n]\mathchar 58\relax\hat{x}_{i}\neq x_{i})\leq\delta, and it follows that 𝖯(𝟙{i=1nx^ik}𝖳𝖧k(𝐱))δ\operatorname{\mathsf{P}}(\mathbbm{1}_{\{\sum_{i=1}^{n}\hat{x}_{i}\geq k\}}\neq\mathsf{TH}_{k}({\mathbf{x}}))\leq\delta. Therefore, Algorithm 1 has error probability at most δ\delta.

Regarding the expected number of queries of Algorithm 1, notice that the algorithm calls the CheckBit for nn times. By Lemma 13, the total expected number of queries is at most

n12plog1δnδnlog1ppn12plognδlog1pp+n12p=nlognδD𝖪𝖫(p1p)+n12p,\frac{n}{1-2p}\left\lceil\frac{\log\frac{1-\frac{\delta}{n}}{\frac{\delta}{n}}}{\log\frac{1-p}{p}}\right\rceil\leq\frac{n}{1-2p}\frac{\log\frac{n}{\delta}}{\log\frac{1-p}{p}}+\frac{n}{1-2p}=\frac{n\log\frac{n}{\delta}}{D_{\mathsf{KL}}(p\|1-p)}+\frac{n}{1-2p},

which completes the proof.
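For illustration, the estimate-every-bit strategy of Algorithm 1 analyzed above can be sketched in Python as follows; it reuses the check_bit sketch given after Lemma 13, and the names are ours.

def noisy_threshold_all_bits(reads, k, delta, p):
    # Sketch of Algorithm 1: estimate each bit via CheckBit with tolerated
    # error delta/n, then compare the number of estimated ones against k.
    n = len(reads)
    estimates = [check_bit(read, delta / n, p) for read in reads]
    return 1 if sum(estimates) >= k else 0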

B-C Fixed-length Algorithms

In this section, we describe a fixed-length version for each of the two proposed algorithms, and bound their number of queries and worst-case error probability.

B-C1 Fixed-length Version of Algorithm 2

Notice that in Algorithm 2, all the queries are spent on the n executions of the CheckBit function on Line 3 and the call of the MaxHeapThreshold function on Line 12. Moreover, Lemma 14 establishes a deterministic upper bound on the number of queries of the MaxHeapThreshold function. Thus, the randomness in the number of queries of the proposed algorithm comes only from the CheckBit function. The main idea in the construction of the fixed-length algorithm is to convert the CheckBit algorithm into one whose number of queries has a bounded second moment. More specifically, we define the SafeCheckBit algorithm by the following two-step procedure:

  1. 1.

    Run the CheckBit algorithm until it halts, or it is about to make the (ηlogη)(\eta\log\eta)-th query, where η112plog1δδlog1pp\eta\triangleq\frac{1}{1-2p}\left\lceil\frac{\log\frac{1-\delta}{\delta}}{\log\frac{1-p}{p}}\right\rceil.

  2. 2.

    If the CheckBit algorithm halts, return its output; otherwise, restart the CheckBit algorithm.

This construction of a safe algorithm was first proposed in Gu and Xu, (2023). It is shown in Corollary 19 of Gu and Xu, (2023) that the error probability of the SafeCheckBit algorithm is at most (1+o(1))\delta. Moreover, its number of queries M_{s} satisfies

𝖤[Ms]η11logη and 𝖵𝖺𝗋(Ms)=O(log31δ).\operatorname{\mathsf{E}}[M_{s}]\leq\frac{\eta}{1-\frac{1}{\log\eta}}\;\text{ and }\;\mathsf{Var}(M_{s})=O\left(\log^{3}\frac{1}{\delta}\right). (31)
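As an illustration, a capped-and-restart version of the CheckBit sketch (our rendering of the SafeCheckBit construction; the cap handling and names are ours) could look as follows.

import math

def safe_check_bit(noisy_read, delta, p):
    # Sketch of SafeCheckBit: run the CheckBit loop, but restart the
    # attempt whenever it is about to exceed eta * log(eta) queries.
    eta = math.ceil(math.log((1 - delta) / delta) / math.log((1 - p) / p)) / (1 - 2 * p)
    cap = eta * math.log(eta)  # query budget per attempt
    while True:
        alpha, used = 0.5, 0
        while delta < alpha < 1 - delta and used < cap:
            used += 1
            if noisy_read() == 1:
                alpha = (1 - p) * alpha / ((1 - p) * alpha + p * (1 - alpha))
            else:
                alpha = p * alpha / (p * alpha + (1 - p) * (1 - alpha))
        if not (delta < alpha < 1 - delta):  # CheckBit halted within the budget
            return 1 if alpha >= 1 - delta else 0
        # budget exhausted: restart the attempt from scratch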

In the proposed fixed-length algorithm, we replace the call of CheckBit(x_{i},\frac{\delta}{k},p) on Line 3 of Algorithm 2 by SafeCheckBit(x_{i},\frac{\delta}{k},p). Moreover, we add a termination rule to the for loop spanning Lines 2 to 6: the algorithm terminates and declares failure if it is about to use the

(nη11logη+nlogkδ)-th\left(\frac{n\eta^{\prime}}{1-\frac{1}{\log\eta^{\prime}}}+n\sqrt{\log\frac{k}{\delta}}\right)\text{-th}

query, where η112plogkδδlog1pp\eta^{\prime}\triangleq\frac{1}{1-2p}\left\lceil\frac{\log\frac{k-\delta}{\delta}}{\log\frac{1-p}{p}}\right\rceil.

With the termination rule, it readily follows that the number of queries made by the algorithm is always at most (1+o(1))nlogkδDKL(p1p)(1+o(1))\frac{n\log\frac{k}{\delta}}{{D_{\mathrm{KL}}(p\|1-p)}}. Let XX denote the total number of queries made by the nn executions of the SafeCheckBit algorithm. Because the number of queries made by each execution of the SafeCheckBit algorithm is independent of one another, we have

𝖤[X]nη11logη and 𝖵𝖺𝗋(X)=O(nlog3kδ).\operatorname{\mathsf{E}}[X]\leq\frac{n\eta^{\prime}}{1-\frac{1}{\log\eta^{\prime}}}\;\text{ and }\;\mathsf{Var}(X)=O\left(n\log^{3}\frac{k}{\delta}\right).

By Chebyshev’s inequality, we have

𝖯(Xnη11logη+nlogkδ)O(nlog3kδn2logkδ)=O(n1+o(1))=o(δ),\operatorname{\mathsf{P}}\left(X\geq\frac{n\eta^{\prime}}{1-\frac{1}{\log\eta^{\prime}}}+n\sqrt{\log\frac{k}{\delta}}\right)\leq O\left(\frac{n\log^{3}\frac{k}{\delta}}{n^{2}\log\frac{k}{\delta}}\right)=O(n^{-1+o(1)})=o(\delta),

where the last two equalities hold true under the assumption that δn1+ϵ\delta\geq n^{-1+{\epsilon}} for some positive constant ϵ{\epsilon}. Hence, by the union bound and Lemma 15, the error probability of the proposed fixed-length algorithm is at most (3+o(1))δ(3+o(1))\delta. This proves Corollary 3 for the regime knlognk\leq\frac{n}{\log n}.

B-C2 Fixed-length Version of Algorithm 1

To construct the fixed-length version of Algorithm 1, we similarly replace the CheckBit function by the SafeCheckBit function and add a termination rule based on the number of queries spent. More specifically, we replace the call of the function CheckBit(x_{i},\delta/n) on Line 3 of Algorithm 1 by SafeCheckBit(x_{i},\delta/n). Furthermore, we add the following termination rule to the for loop: the algorithm terminates and declares failure if it is about to use the

(nη′′11logη′′+nlognδ)-th\left(\frac{n\eta^{\prime\prime}}{1-\frac{1}{\log\eta^{\prime\prime}}}+n\sqrt{\log\frac{n}{\delta}}\right)\text{-th}

query, where η′′112plognδδlog1pp\eta^{\prime\prime}\triangleq\frac{1}{1-2p}\left\lceil\frac{\log\frac{n-\delta}{\delta}}{\log\frac{1-p}{p}}\right\rceil.

By the termination condition, we know that the algorithm deterministically uses at most (1+o(1))nlognδDKL(p1p)=(1+o(1))nlogkδDKL(p1p)(1+o(1))\frac{n\log\frac{n}{\delta}}{{D_{\mathrm{KL}}(p\|1-p)}}=(1+o(1))\frac{n\log\frac{k}{\delta}}{{D_{\mathrm{KL}}(p\|1-p)}} queries, where the equality follows because k>nlognk>\frac{n}{\log n}. By an analogous analysis as in the previous section, we know that the error probability of the algorithm is at most (1+o(1))δ(1+o(1))\delta given that δn1+ϵ\delta\geq n^{-1+\epsilon}. This completes the proof for the regime k>nlognk>\frac{n}{\log n}.

Appendix C Implementation Details

In this part, we give some details about the simulation results shown in Figure 1. In this simulation, our algorithm for computing the 𝖬𝖠𝖩𝖮𝖱𝖨𝖳𝖸\mathsf{MAJORITY} function (i.e., the threshold function with k=n/2k=n/2) is compared to the existing algorithm proposed in Feige et al., (1994). A binary sequence of length n=100n=100 chosen uniformly at random is inputted to the algorithms, and the desired error probability in the algorithms is set to δ=102\delta=10^{-2}. For a sequence generated uniformly at random, the law of large numbers implies that the fraction of ones in the sequence is approximately 1/21/2 with high probability. This scenario is the most likely to induce errors for an algorithm computing the 𝖬𝖠𝖩𝖮𝖱𝖨𝖳𝖸\mathsf{MAJORITY} function, making it critical for assessing the worst-case performance of the algorithm. For k=n/2k=n/2, the existing work by Feige et al., (1994) proposes a noisy sorting algorithm that sorts the input binary sequence, and returns the kk-th largest element in the sorted sequence as an estimate of the threshold function. The reader is referred to Section 3 in Feige et al., (1994) for more details about the existing algorithm. For the simulation of our proposed algorithm, we use Algorithm 1 (since k=n/2k=n/2). At each value of pp ranging from 0.01 to 0.25, we run 10000 realizations of input sequences and compute the average number of queries used by the two algorithms.
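A possible Python sketch of this simulation set-up is given below. It follows the structure described above (uniformly random length-100 inputs, \delta=10^{-2}, and p swept from 0.01 to 0.25) for the proposed Algorithm 1; the routine names are ours, and the comparison with the sorting-based algorithm of Feige et al., (1994) is omitted.

import random

def check_bit_with_count(x, delta, p):
    # CheckBit (Algorithm 3) on a bit x observed through a BSC(p).
    # Returns (estimate, number of queries used).
    alpha, queries = 0.5, 0
    while delta < alpha < 1 - delta:
        r = x if random.random() > p else 1 - x  # one noisy reading
        queries += 1
        if r == 1:
            alpha = (1 - p) * alpha / ((1 - p) * alpha + p * (1 - alpha))
        else:
            alpha = p * alpha / (p * alpha + (1 - p) * (1 - alpha))
    return (1 if alpha >= 1 - delta else 0), queries

def average_queries(n=100, delta=1e-2, p=0.1, trials=10000):
    # Average number of queries used by Algorithm 1 over random inputs.
    total = 0
    for _ in range(trials):
        x = [random.randint(0, 1) for _ in range(n)]
        total += sum(check_bit_with_count(xi, delta / n, p)[1] for xi in x)
    return total / trials

# sweep p from 0.01 to 0.25, as in Figure 1
curve = {round(0.01 * i, 2): average_queries(p=0.01 * i) for i in range(1, 26)}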