

On the Optimal Bounds for Noisy Computing

Banghua Zhu,    Ziao Wang    Nadim Ghaddar    Jiantao Jiao    Lele Wang Banghua Zhu is with the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA 94720, USA (email: [email protected]).Ziao Wang is with the Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, BC V6T1Z4, Canada (email: [email protected]).Nadim Ghaddar is with the Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA 92093, USA, (email: [email protected]).Jiantao Jiao is with the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA 94720, USA (email: [email protected]).Lele Wang is with the Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, BC V6T1Z4, Canada (email: [email protected]).A shorter version of this work is accepted at the 2023 IEEE International Symposium on Information Theory.

Abstract

We revisit the problem of computing with noisy information considered in Feige et al. (1994), which includes computing the 𝖮𝖱\mathsf{OR} function from noisy queries, and computing the 𝖬𝖠𝖷,𝖲𝖤𝖠𝖱𝖢𝖧\mathsf{MAX},\mathsf{SEARCH}, and 𝖲𝖮𝖱𝖳\mathsf{SORT} functions from noisy pairwise comparisons. For KK given elements, the goal is to correctly recover the desired function with probability at least 1δ1-\delta when the outcome of each query is flipped with probability pp. We consider both the adaptive sampling setting where each query can be adaptively designed based on past outcomes, and the non-adaptive sampling setting where the query cannot depend on past outcomes. The prior work provides tight bounds on the worst-case query complexity in terms of the dependence on KK. However, the upper and lower bounds do not match in terms of the dependence on δ\delta and pp. We improve the lower bounds for all the four functions under both adaptive and non-adaptive query models. Most of our lower bounds match the upper bounds up to constant factors when either pp or δ\delta is bounded away from 0, while the ratio between the best prior upper and lower bounds goes to infinity when p0p\rightarrow 0 or p1/2p\rightarrow 1/2. On the other hand, we also provide matching upper and lower bounds for the number of queries in expectation, improving both the upper and lower bounds for the variable-length query model.

I Introduction

The problem of computing with noisy information has been studied extensively since the seminal work (Feige et al., 1994), which considers four problems:

  • Computing the 𝖮𝖱\mathsf{OR} function of KK bits from noisy observations of the bits;

  • Finding the largest (or top-NN) element among KK real-valued elements from noisy pairwise comparisons;

  • Searching the rank of a new element in an ordered list of KK elements from noisy pairwise comparisons;

  • Sorting KK elements from noisy pairwise comparisons.

Feige et al. (1994) is based on a simple noise model where each observation goes through a binary symmetric channel 𝖡𝖲𝖢(p)\mathsf{BSC}(p), i.e. for each observation, with probability pp we see its flipped outcome, and with probability 1p1-p we see its true value. They provide upper and lower bounds for the query complexity in terms of the total number of elements KK, the noise probability pp, and the desired confidence level δ\delta when adaptive querying is allowed. They establish the optimal query complexity in terms of dependence with respect to KK. However, the exact sample/query complexity with respect to all parameters K,δK,\delta, and pp is still not fully understood. In this paper, we revisit the problem of computing under noisy observations in both the adaptive sampling and non-adaptive sampling settings. We aim to close the gap in the existing bounds and illustrate the difference in query complexity between adaptive sampling and non-adaptive sampling.

Taking the problem of computing the 𝖮𝖱\mathsf{OR} function of KK bits as an example, assume that there are KK bits (X1,,XK){0,1}K(X_{1},\cdots,X_{K})\in\{0,1\}^{K}. The 𝖮𝖱\mathsf{OR} function is defined as

𝖮𝖱(X1,,XK)={1, if k[K],Xk=10, otherwise.\displaystyle\mathsf{OR}(X_{1},\cdots,X_{K})=\begin{cases}1,&\text{ if }\exists k\in[K],X_{k}=1\\ 0,&\text{ otherwise.}\end{cases} (2)

The question is simple when we can query each bit noiselessly. In this case, KK queries are both sufficient and necessary since it suffices to query each bit once, and thus there is no benefit in adaptive querying compared to non-adaptive querying. When the observation of each query goes through a binary symmetric channel 𝖡𝖲𝖢(p)\mathsf{BSC}(p), we ask two questions:

  • How many queries (samples) do we need in the worst case to recover the true 𝖮𝖱\mathsf{OR} function value of any given sequences X1,,XKX_{1},\cdots,X_{K} with probability at least 1δ1-\delta?

  • Can adaptive sampling do better than non-adaptive sampling when noise is present?

Problem Fixed Length, Adaptive Sampling
Upper Bound Lower Bound
𝖮𝖱\mathsf{OR} 𝒪(Klog(1/δ)1H(p))\mathcal{O}(\frac{K\log(1/\delta)}{1-H(p)}) (Feige et al., 1994) Ω(K1H(p)+Klog(1/δ)D𝖪𝖫(p1p))\Omega(\frac{K}{1-H(p)}+\frac{K\log(1/\delta)}{D_{\mathsf{KL}}(p\|1-p)}) (Thm II.1)
𝖬𝖠𝖷\mathsf{MAX} 𝒪(Klog(1/δ)1H(p))\mathcal{O}(\frac{K\log(1/\delta)}{1-H(p)}) (Feige et al., 1994) Ω(K1H(p)+Klog(1/δ)D𝖪𝖫(p1p))\Omega(\frac{K}{1-H(p)}+\frac{K\log(1/\delta)}{D_{\mathsf{KL}}(p\|1-p)}) (Thm III.1)
𝖲𝖤𝖠𝖱𝖢𝖧\mathsf{SEARCH} 𝒪(log(K/δ)1H(p))\mathcal{O}(\frac{\log(K/\delta)}{1-H(p)}) (Feige et al., 1994) Ω(log(K)1H(p)+log(1/δ)D𝖪𝖫(p1p))\Omega(\frac{\log(K)}{1-H(p)}+\frac{\log(1/\delta)}{D_{\mathsf{KL}}(p\|1-p)}) (Thm IV.1)
𝖲𝖮𝖱𝖳\mathsf{SORT} 𝒪(Klog(K/δ)1H(p))\mathcal{O}(\frac{K\log(K/\delta)}{1-H(p)}) (Feige et al., 1994) Ω(Klog(K)1H(p)+Klog(K/δ)D𝖪𝖫(p1p))\Omega(\frac{K\log(K)}{1-H(p)}+\frac{K\log(K/\delta)}{D_{\mathsf{KL}}(p\|1-p)}) (Thm V.1)
Problem Fixed Length, Non-adaptive Sampling
Upper Bound (Appendix H) Lower Bound
𝖮𝖱\mathsf{OR} 𝒪(Klog(K/δ)1H(p))\mathcal{O}(\frac{K\log(K/\delta)}{1-H(p)}) Ω(max(K,Klog(K)p1H(p),Klog(K)log((1p)/p)))\Omega(\max(K,\frac{K\log(K)p}{1-H(p)},\frac{K\log(K)}{\log((1-p)/p)})) (Thm II.3)
𝖬𝖠𝖷\mathsf{MAX} 𝒪(K2log(K/δ)1H(p))\mathcal{O}(\frac{K^{2}\log(K/\delta)}{1-H(p)}) Ω(K21H(p)+K2log(1/δ)D𝖪𝖫(p1p))\Omega(\frac{K^{2}}{1-H(p)}+\frac{K^{2}\log(1/\delta)}{D_{\mathsf{KL}}(p\|1-p)}) (Thm III.2)
𝖲𝖤𝖠𝖱𝖢𝖧\mathsf{SEARCH} 𝒪(Klog(1/δ)1H(p))\mathcal{O}(\frac{K\log(1/\delta)}{1-H(p)}) Ω(K1H(p)+Klog(1/δ)D𝖪𝖫(p1p))\Omega(\frac{K}{1-H(p)}+\frac{K\log(1/\delta)}{D_{\mathsf{KL}}(p\|1-p)}) (Thm IV.3)
𝖲𝖮𝖱𝖳\mathsf{SORT} 𝒪(K2log(K/δ)1H(p))\mathcal{O}(\frac{K^{2}\log(K/\delta)}{1-H(p)}) Ω(K2+K2log(K)D𝖪𝖫(p1p)))\Omega(K^{2}+\frac{K^{2}\log(K)}{D_{\mathsf{KL}}(p\|1-p))}) (Thm V.3)
Problem Variable Length, Adaptive Sampling
Matching Bound (Thm VI.1)
𝖮𝖱\mathsf{OR} Θ(K1H(p)+Klog(1/δ)D𝖪𝖫(p1p))\Theta(\frac{K}{1-H(p)}+\frac{K\log(1/\delta)}{D_{\mathsf{KL}}(p\|1-p)})
𝖬𝖠𝖷\mathsf{MAX} Θ(K1H(p)+Klog(1/δ)D𝖪𝖫(p1p))\Theta(\frac{K}{1-H(p)}+\frac{K\log(1/\delta)}{D_{\mathsf{KL}}(p\|1-p)})
𝖲𝖤𝖠𝖱𝖢𝖧\mathsf{SEARCH} Θ(log(K)1H(p)+log(1/δ)D𝖪𝖫(p1p))\Theta(\frac{\log(K)}{1-H(p)}+\frac{\log(1/\delta)}{D_{\mathsf{KL}}(p\|1-p)})
𝖲𝖮𝖱𝖳\mathsf{SORT} Θ(Klog(K)1H(p)+Klog(K/δ)D𝖪𝖫(p1p))\Theta(\frac{K\log(K)}{1-H(p)}+\frac{K\log(K/\delta)}{D_{\mathsf{KL}}(p\|1-p)})
TABLE I: Summary of query complexity bounds of 𝖮𝖱\mathsf{OR}, 𝖬𝖠𝖷\mathsf{MAX}, 𝖲𝖤𝖠𝖱𝖢𝖧\mathsf{SEARCH} and 𝖲𝖮𝖱𝖳\mathsf{SORT}. Here, we assume δ<0.49\delta<0.49.

In Feige et al. (1994), an adaptive tournament algorithm is proposed that achieves worst-case query complexity 𝒪(Klog(1/δ)/(1H(p)))\mathcal{O}(K\log(1/\delta)/(1-H(p))), and a corresponding lower bound Ω(Klog(1/δ)/log((1p)/p))\Omega(K\log(1/\delta)/\log((1-p)/p)) for adaptive sampling is provided. A simple calculation tells us that their ratio log((1p)/p)/(1H(p))\log((1-p)/p)/(1-H(p)) goes to infinity as p0p\rightarrow 0 or p1/2p\rightarrow 1/2, which indicates that there is still a gap between the upper and the lower bounds when the noise probability pp is near the point 0 or 1/21/2. This calls for tighter upper or lower bounds in these cases. In this paper, we improve the lower bound to Ω(K/(1H(p))+Klog(1/δ)/D𝖪𝖫(p1p))\Omega(K/(1-H(p))+K\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p)), which matches the existing upper bound up to a constant factor when either pp or δ\delta is bounded away from 0.

One may wonder how many samples are needed when each query is not allowed to depend on previous outcomes. We provide a lower bound Ω(max(K,Klog(K)p/(1H(p)),Klog(K)/log((1p)/p)))\Omega(\max(K,K\log(K)p/(1-H(p)),K\log(K)/\log((1-p)/p))) for this non-adaptive setting. On the other hand, a repetition-based upper bound 𝒪(Klog(K/δ)/(1H(p)))\mathcal{O}(K\log(K/\delta)/(1-H(p))) matches the lower bound up to a constant factor when both pp and δ\delta are bounded away from 0.
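To make the repetition-based strategy concrete, the following non-adaptive sketch (in Python; the helper names and the constant in the repetition count are ours and purely illustrative) queries each bit a fixed number of times through the 𝖡𝖲𝖢(p), decodes each bit by majority vote, and returns the 𝖮𝖱 of the estimates. The query schedule is fixed up front, so it never depends on past outcomes.

    import math
    import random

    def bsc(bit, p):
        # One noisy observation of a bit through a binary symmetric channel BSC(p).
        return bit ^ (random.random() < p)

    def noisy_or_nonadaptive(bits, p, delta):
        # Non-adaptive repetition scheme (assumes 0 < p < 1/2): each bit is queried
        # r times, decoded by majority vote, and the OR of the estimates is returned.
        # r follows the O(log(K/delta)/(1 - H(p))) per-bit budget discussed above;
        # the constant 4 is illustrative, not optimized.
        K = len(bits)
        h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)   # binary entropy H(p)
        r = max(1, math.ceil(4 * math.log2(K / delta) / (1 - h)))
        estimates = []
        for b in bits:
            ones = sum(bsc(b, p) for _ in range(r))
            estimates.append(1 if 2 * ones > r else 0)        # majority vote per bit
        return max(estimates)                                  # OR of the estimates

    # Example: K = 64 bits containing a single 1, noise level p = 0.2.
    random.seed(0)
    bits = [0] * 64
    bits[17] = 1
    print(noisy_or_nonadaptive(bits, p=0.2, delta=0.05))       # 1 with high probability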

Similarly, we ask the same questions for computing 𝖬𝖠𝖷\mathsf{MAX}, 𝖲𝖤𝖠𝖱𝖢𝖧\mathsf{SEARCH}, and 𝖲𝖮𝖱𝖳\mathsf{SORT}. We defer the definitions and discussions of the problems to Section I-B. Here we summarize the best upper and lower bounds for the problems considered in this paper in Table I, either from previous art or from this paper.

Our lower bounds take the form of f(K)/(1H(p))+g(K,δ)/D𝖪𝖫(p1p)f(K)/(1-H(p))+g(K,\delta)/D_{\mathsf{KL}}(p\|1-p). Here the first term does not depend on δ\delta for δ<0.49\delta<0.49, representing the number of queries one must pay regardless of the target error probability δ\delta. The second term grows logarithmically with 1/δ1/\delta, representing the price to pay for smaller target error probability. Here we always have D𝖪𝖫(p1p)1H(p)D_{\mathsf{KL}}(p\|1-p)\gtrsim 1-H(p) for p(0,1)p\in(0,1), with D𝖪𝖫(p1p)1H(p)D_{\mathsf{KL}}(p\|1-p)\asymp 1-H(p) when pp is bounded away from 0. Technically, the first term is usually from Fano’s inequality, which gives better dependence on pp but worse dependence on δ\delta. The second term is from a 𝖪𝖫\mathsf{KL}-divergence based lower bound, which gives better dependence on δ\delta but worse dependence on pp.
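These relations are easy to check numerically. The short Python sketch below evaluates 1−H(p), D_KL(p‖1−p), and log((1−p)/p) on a grid of noise levels; it illustrates that the ratio log((1−p)/p)/(1−H(p)) from the earlier comparison blows up both as p→0 and as p→1/2, while D_KL(p‖1−p)/(1−H(p)) stays of constant order once p is bounded away from 0.

    import math

    def h2(p):
        # Binary entropy H(p) in bits.
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    def dkl(p):
        # D_KL(p || 1-p) in bits, using D_KL(p || 1-p) = (1 - 2p) log((1-p)/p).
        return (1 - 2 * p) * math.log2((1 - p) / p)

    for p in [1e-4, 1e-2, 0.1, 0.3, 0.45, 0.499]:
        a = 1 - h2(p)                 # Fano-type denominator
        b = dkl(p)                    # KL-type denominator
        c = math.log2((1 - p) / p)    # denominator in the older lower bounds
        print(f"p={p:7.4f}  1-H(p)={a:10.3e}  D_KL={b:10.3e}  "
              f"D_KL/(1-H)={b / a:8.2f}  log((1-p)/p)/(1-H)={c / a:8.2f}")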

We also extend our bounds from the fixed-length query model to the variable-length query model (these settings are known as fixed budget and fixed confidence, respectively, in the bandit literature (Kaufmann et al., 2016)). In the fixed-length setting considered above, the number of queries is deterministic and fixed in advance, and we ask for the worst-case number of queries required to recover the true value with probability at least 1δ1-\delta. In the variable-length setting, the number of queries can be random and dependent on the past outcomes, and we ask for the expected number of queries required to recover the true value with probability at least 1δ1-\delta. We discuss the results for variable length in Section VI, where we give matching upper and lower bounds with respect to all parameters for computing all four functions, improving over both existing upper and lower bounds and closing the gap.

I-A Related Work

The problems of noisy computation have been studied extensively before and after (Feige et al., 1994). However, most of the existing work focuses on tightening the dependence on KK instead of pp and δ\delta, or on extending the results to a more general framework where the noise follows a generalized model that includes the 𝖡𝖲𝖢\mathsf{BSC} channel as a special case. Although worst-case upper and lower bounds are provided for the generalized model, most of the lower bounds are based on instances where the noise does not follow a 𝖡𝖲𝖢\mathsf{BSC} model. Thus the lower bounds do not apply to our case.

Noisy binary searching

The noisy searching problem was first introduced by Rényi (Rényi, 1961) and Ulam (Ulam, 1976) and further developed by Berlekamp (Berlekamp, 1964) and Horstein (Horstein, 1963) in the context of coding for channels with feedback. The noisy searching algorithm in (Burnashev and Zigangirov, 1974) by Burnashev and Zigangirov can be seen as a specialization of Horstein’s coding scheme, whereas the algorithms in (Feige et al., 1994, Pelc, 1989, Karp and Kleinberg, 2007) can be seen as an adaptation of the binary search algorithm to the noisy setting.

The first tight lower bound for variable-length adaptive sampling in noisy searching is given by (Burnashev, 1976). The recent concurrent work (Gu and Xu, 2023) improves the constant factors when pp is a constant bounded away from 0 and 1/21/2. Our lower bounds are based on a different proof using Le Cam’s method; they do not require pp to be bounded away from 0 or 1/21/2, but are worse in terms of constant factors. (Gu and Xu, 2023) also provides a matching upper bound that is tight even in the constant factor in the variable-length setting. We provide more discussion in Section VI. Making the upper and lower bounds match in the fixed-length query model remains an important open problem.

In terms of the bounds for non-adaptive sampling, the gap between 𝒪(log(K))\mathcal{O}(\log(K)) for adaptive sampling and 𝒪(K)\mathcal{O}(K) for non-adaptive sampling can already be seen in the noiseless case p=0p=0 (Rényi, 1961). Here we provide an improved bound for the noisy case that has explicit dependence on pp and δ\delta.

Noisy sorting and max selection

The noisy sorting and max (or top-NN) selection problems have usually been studied together (e.g., (Feige et al., 1994)) and later have been extended to a more general setting known as active ranking (Mohajer et al., 2017, Falahatgar et al., 2017, Shah and Wainwright, 2018, Heckel et al., 2019, Agarwal et al., 2017), where the noise pijp_{ij} for the comparison of a pair of elements ii and jj is usually unknown and differs across pairs. Other related but different settings for noisy sorting in the literature include the case when some distance metric for permutations is to be minimized (rather than the probability of error) (Ailon et al., 2008, Braverman and Mossel, 2009, Ailon, 2011, Negahban et al., 2012, Wauthier et al., 2013, Rajkumar and Agarwal, 2014, Shah et al., 2016, Shah and Wainwright, 2018, Mao et al., 2018), the case when the noise for each pairwise comparison is not i.i.d. and is determined by some noise model such as the Bradley–Terry–Luce model (Bradley and Terry, 1952) (Ajtai et al., 2009, Negahban et al., 2012, Rajkumar and Agarwal, 2014, Chen and Suh, 2015, Chen et al., 2017, Ren et al., 2018), and the case when the ordering itself is restricted to some subset of all permutations (Jamieson and Nowak, 2011, Ailon et al., 2011).

For noisy sorting, the best upper and lower bounds have been provided in (Feige et al., 1994, Wang et al., 2023), which give an upper bound 𝒪(Klog(K/δ)/(1H(p)))\mathcal{O}(K\log(K/\delta)/(1-H(p))) and a lower bound Ω(Klog(K)/(1H(p))+log(1/δ)/D𝖪𝖫(p1p))\Omega(K\log(K)/(1-H(p))+\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p)). We tighten the lower bound to Ω(Klog(K)/(1H(p))+Klog(K/δ)/D𝖪𝖫(p1p))\Omega(K\log(K)/(1-H(p))+K\log(K/\delta)/D_{\mathsf{KL}}(p\|1-p)). On the other hand, (Gu and Xu, 2023) shows that the query complexity is (1+o(1))(Klog(K)/(1H(p))+Klog(K)/D𝖪𝖫(p1p))(1+o(1))(K\log(K)/(1-H(p))+K\log(K)/D_{\mathsf{KL}}(p\|1-p)), which does not scale with δ\delta. Our lower bound for the fixed-length setting improves the dependence on δ\delta. We also make the bounds match up to constant factors in all parameters in the variable-length setting. However, making the δ\delta dependence tight in the upper bound for the fixed-length setting remains an open problem. For the non-adaptive sampling scenario, we provide upper and lower bounds that match in terms of the dependence on KK, but are still loose when both pp and δ\delta go to 0 simultaneously.

For max selection, the best known lower bound for fixed-length adaptive sampling is Ω(Klog(1/δ)/log((1p)/p))\Omega(K\log(1/\delta)/\log((1-p)/p)) (Feige et al., 1994). Our result makes the bound tight when either pp or δ\delta is bounded away from 0, and provides matching bounds for the variable-length setting. On the other hand, (Mohajer et al., 2017, Shah and Wainwright, 2018) discuss the gap between adaptive sampling and non-adaptive sampling. However, the Ω(K3log(K))\Omega(K^{3}\log(K)) lower bound for non-adaptive sampling in (Shah and Wainwright, 2018) does not apply to our case since it is based on a generalized model where the noise probabilities differ across pairs. In our case, 𝒪(K2log(K))\mathcal{O}(K^{2}\log(K)) is a natural upper bound. However, it is unclear whether our lower bound Ω(K2)\Omega(K^{2}) can be improved to Ω(K2log(K))\Omega(K^{2}\log(K)).

𝖮𝖱\mathsf{OR} and best arm identification

The noisy computation of 𝖮𝖱\mathsf{OR} was first studied in the literature on circuits with noisy gates (Dobrushin and Ortyukov, 1977a, b, Von Neumann, 1956, Pippenger et al., 1991, Gács and Gál, 1994) and noisy decision trees (Feige et al., 1994, Evans and Pippenger, 1998, Reischuk and Schmeltz, 1991). Different from the other three problems we consider, computing 𝖮𝖱\mathsf{OR} does not rely on pairwise comparisons, but instead directly queries the values of the bits. This is also related to the rich literature on best arm identification, which queries real-valued arms and aims to identify the arm with the largest value (reward) (Bubeck et al., 2009, Audibert et al., 2010, Garivier and Kaufmann, 2016, Jamieson and Nowak, 2014, Gabillon et al., 2012, Kaufmann et al., 2016). Indeed, any best-arm identification algorithm can be converted to an 𝖮𝖱\mathsf{OR} computation by first finding the maximum and then querying the binary value of the maximum. This recovers the best existing upper bound 𝒪(Klog(1/δ)/(1H(p)))\mathcal{O}(K\log(1/\delta)/(1-H(p))) for computing 𝖮𝖱\mathsf{OR} in the adaptive sampling scenario (Audibert et al., 2010). However, the lower bound for best arm identification does not apply to our case, since 𝖮𝖱\mathsf{OR} has a binary output, while the best arm identification problem requires identifying the arm index. Our lower bound for fixed-length adaptive sampling improves over the best known lower bound Ω(Klog(1/δ)/log((1p)/p))\Omega(K\log(1/\delta)/\log((1-p)/p)) (Feige et al., 1994). We also provide matching bounds for the variable-length setting.

For non-adaptive sampling, (Reischuk and Schmeltz, 1991, Gács and Gál, 1994) provide a lower bound Ω(Klog(K)/log((1p)/p))\Omega(K\log(K)/\log((1-p)/p)). Our lower bound Ω(Klog(K)p/(1H(p)))\Omega(K\log(K)p/(1-H(p))) is tighter than the current lower bound when p1/2p\rightarrow 1/2, but looser when p0p\rightarrow 0. Thus the tightest bound is a maximum of KK and the two lower bounds.

I-B Problem Definition and Preliminaries

The 𝖮𝖱\mathsf{OR} function is defined in Equation (2). We define the rest of the problems here. Different from 𝖮𝖱\mathsf{OR}, the 𝖬𝖠𝖷\mathsf{MAX}, 𝖲𝖤𝖠𝖱𝖢𝖧\mathsf{SEARCH} and 𝖲𝖮𝖱𝖳\mathsf{SORT} problems are all based on noisy pairwise comparisons. Concretely, assume we have KK distinct real-valued items X1,,XKX_{1},\cdots,X_{K}. Instead of querying the exact value of each element, we can only query a pair of elements and observe their noisy comparison. For any queried pair (i,j)(i,j), we will observe a sample from 𝖡𝖾𝗋𝗇(1p)\mathsf{Bern}(1-p) if Xi>XjX_{i}>X_{j}, and a sample from 𝖡𝖾𝗋𝗇(p)\mathsf{Bern}(p) if Xi<XjX_{i}<X_{j}.

We have 𝖬𝖠𝖷(X1,,XK)=argmaxi[K]Xi\mathsf{MAX}(X_{1},\cdots,X_{K})=\operatorname*{arg\,max}_{i\in[K]}X_{i}, 𝖲𝖮𝖱𝖳(X1,,XK)=σ\mathsf{SORT}(X_{1},\cdots,X_{K})=\sigma, where σ:[K][K]\sigma:[K]\mapsto[K] is the permutation function such that Xσ(1)<Xσ(2)<<Xσ(K)X_{\sigma(1)}<X_{\sigma(2)}<\cdots<X_{\sigma(K)}. And 𝖲𝖤𝖠𝖱𝖢𝖧(X;X1,,XK)=i\mathsf{SEARCH}(X;X_{1},\cdots,X_{K})=i, where ii satisfies that Xi<X<Xi+1X_{i}<X<X_{i+1} with X0=X_{0}=-\infty and XK+1=+X_{K+1}=+\infty. In the 𝖲𝖤𝖠𝖱𝖢𝖧\mathsf{SEARCH} problem, we assume that the ordering of X1,,XKX_{1},\cdots,X_{K} is given, and we are interested in the position of a new element XX. Thus, in each round we compare the given XX with one of the elements XiX_{i}.

We are interested in the probability of exact recovery of the functions. We consider both adaptive sampling and non-adaptive sampling. In adaptive sampling, the sampling distribution at each round can be dependent on the past queries and observations. In non-adaptive sampling, the sampling distribution in each round has to be determined at the beginning and cannot change with the ongoing queries or observations. Throughout the paper, we assume that the desired error probability δ\delta satisfies δ<0.49\delta<0.49. We use the terms “querying” and “sampling” interchangeably.
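For concreteness, the two query models can be simulated as follows (a minimal Python sketch; the class names are ours, not from the original text). A bit oracle answers value queries for the 𝖮𝖱 problem, and a comparison oracle answers pairwise queries for 𝖬𝖠𝖷, 𝖲𝖤𝖠𝖱𝖢𝖧, and 𝖲𝖮𝖱𝖳; both flip each answer independently with probability p and keep a count of the queries issued.

    import random

    class NoisyBitOracle:
        # Noisy value queries for the OR problem: query(k) returns X_k through a BSC(p).
        def __init__(self, bits, p):
            self.bits, self.p, self.queries = list(bits), p, 0

        def query(self, k):
            self.queries += 1
            return self.bits[k] ^ (random.random() < self.p)

    class NoisyComparisonOracle:
        # Noisy pairwise comparisons for MAX / SEARCH / SORT:
        # compare(i, j) returns 1{X_i > X_j} through a BSC(p).
        def __init__(self, values, p):
            self.values, self.p, self.queries = list(values), p, 0

        def compare(self, i, j):
            self.queries += 1
            truth = int(self.values[i] > self.values[j])
            return truth ^ (random.random() < self.p)

An adaptive algorithm may choose each query based on the answers received so far, while a non-adaptive algorithm must commit to its entire sequence of queried bits (or pairs) before seeing any answer; the oracles above serve both settings.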

II Computing the 𝖮𝖱\mathsf{OR} function

In this section, we provide the lower bounds for the query complexity of computing the 𝖮𝖱\mathsf{OR} function under both adaptive and non-adaptive sampling. The upper bound for adaptive sampling is from (Feige et al., 1994), and the upper bound for non-adaptive sampling is given in Appendix H. Let θ{0,1}K\theta\in\{0,1\}^{K} be the KK-bit sequence representing the true values. Let 𝖮𝖱(θ)\mathsf{OR}(\theta) be the result of the 𝖮𝖱\mathsf{OR} function applied to the KK-bit noiseless sequence. We also let μ^\hat{\mu} be any algorithm that queries any noisy bit in TT rounds, and outputs a (possibly random) decision from {0,1}\{0,1\}.

II-A Adaptive Sampling

We have the following minimax lower bound.

Theorem II.1.

In the adaptive setting, we have

infμ^supθ{0,1}K(μ^𝖮𝖱(θ))14exp(TD𝖪𝖫(p1p)K).\displaystyle\inf_{\hat{\mu}}\sup_{\theta\in\{0,1\}^{K}}\mathbb{P}(\hat{\mu}\neq\mathsf{OR}(\theta))\geq\frac{1}{4}\cdot\exp\left(-\frac{T\cdot D_{\mathsf{KL}}(p\|1-p)}{K}\right).

Thus, the number of queries required to recover the true value with probability at least 1δ1-\delta is lower bounded by Ω(K/(1H(p))+Klog(1/δ)/D𝖪𝖫(p1p))\Omega({K}/{(1-H(p))}+K\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p)).

We provide the proof here, which is based on Le Cam’s two point method (see e.g. (LeCam, 1973, Yu, 1997)).

Proof of Theorem II.1.

Our lower bound proof is mainly based on Le Cam’s two point method, which is also re-stated in Lemma A.1 in Appendix A for reader’s convenience. Let θ0\theta_{0} be the length-KK all-zero sequence, and let θj{0,1}K\theta_{j}\in\{0,1\}^{K} be such that θjj=1\theta_{jj}=1 and θji=0\theta_{ji}=0 for iji\neq j. Here θji\theta_{ji} refers to the ii-th element in the binary vector θj\theta_{j}. We can first verify that for any μ^\hat{\mu}, one has

𝟙(μ^𝖮𝖱(θ0))+𝟙(μ^𝖮𝖱(θj))1.\displaystyle\mathds{1}(\hat{\mu}\neq\mathsf{OR}(\theta_{0}))+\mathds{1}(\hat{\mu}\neq\mathsf{OR}(\theta_{j}))\geq 1.

By applying Le Cam’s two point lemma on θ0\theta_{0} and θj\theta_{j}, we know that

infμ^supθ{0,1}K(μ^𝖮𝖱(θ))\displaystyle\inf_{\hat{\mu}}\sup_{\theta\in\{0,1\}^{K}}\mathbb{P}(\hat{\mu}\neq\mathsf{OR}(\theta)) 12(1𝖳𝖵(θ0,θj)).\displaystyle\geq\frac{1}{2}(1-\mathsf{TV}(\mathbb{P}_{\theta_{0}},\mathbb{P}_{\theta_{j}})).

Here θj\mathbb{P}_{\theta_{j}} is the joint distribution of query-observation pairs in TT rounds when the ground truth is θj\theta_{j}. By taking maximum over jj on the right-hand side, we have

infμ^supθ{0,1}K(μ^𝖮𝖱(θ))\displaystyle\inf_{\hat{\mu}}\sup_{\theta\in\{0,1\}^{K}}\mathbb{P}(\hat{\mu}\neq\mathsf{OR}(\theta))
sup1jK12(1𝖳𝖵(θ0,θj))\displaystyle\geq\sup_{1\leq j\leq K}\frac{1}{2}(1-\mathsf{TV}(\mathbb{P}_{\theta_{0}},\mathbb{P}_{\theta_{j}}))
sup1jK14exp(D𝖪𝖫(θ0,θj)).\displaystyle\geq\sup_{1\leq j\leq K}\frac{1}{4}\exp(-D_{\mathsf{KL}}(\mathbb{P}_{\theta_{0}},\mathbb{P}_{\theta_{j}})).

Here the last inequality is due to Bretagnolle–Huber inequality (Bretagnolle and Huber, 1979) (Lemma A.4 in Appendix A). Now we aim at computing D𝖪𝖫(θ0,θj)D_{\mathsf{KL}}(\mathbb{P}_{\theta_{0}},\mathbb{P}_{\theta_{j}}). Let TjT_{j} be the random variable that denotes the number of times the jj-th element is queried among all TT rounds. From divergence decomposition lemma (Auer et al., 1995) (Lemma A.3 in Appendix A), we have

D𝖪𝖫(θ0,θj)=𝔼θ0[Tj]D𝖪𝖫(p1p).\displaystyle D_{\mathsf{KL}}(\mathbb{P}_{\theta_{0}},\mathbb{P}_{\theta_{j}})=\mathbb{E}_{\theta_{0}}[T_{j}]\cdot D_{\mathsf{KL}}(p\|1-p).

Here 𝔼θ0[Tj]\mathbb{E}_{\theta_{0}}[T_{j}] denotes the expected number of times the jj-th element is queried when the ground truth is θ0\theta_{0}. Thus we have

infμ^supθ{0,1}K(μ^𝖮𝖱(θ))\displaystyle\inf_{\hat{\mu}}\sup_{\theta\in\{0,1\}^{K}}\mathbb{P}(\hat{\mu}\neq\mathsf{OR}(\theta))
sup1jK14exp(𝔼θ0[Tj]D𝖪𝖫(p1p)).\displaystyle\geq\sup_{1\leq j\leq K}\frac{1}{4}\exp(-\mathbb{E}_{\theta_{0}}[T_{j}]\cdot D_{\mathsf{KL}}(p\|1-p)).

Now since j𝔼θ0[Tj]=T\sum_{j}\mathbb{E}_{\theta_{0}}[T_{j}]=T, there must exist some jj such that 𝔼θ0[Tj]T/K\mathbb{E}_{\theta_{0}}[T_{j}]\leq T/K. This gives

infμ^supθ{0,1}K(μ^𝖮𝖱(θ))\displaystyle\inf_{\hat{\mu}}\sup_{\theta\in\{0,1\}^{K}}\mathbb{P}(\hat{\mu}\neq\mathsf{OR}(\theta)) 14exp(TD𝖪𝖫(p1p)K).\displaystyle\geq\frac{1}{4}\cdot\exp\left(-\frac{T\cdot D_{\mathsf{KL}}(p\|1-p)}{K}\right).
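Setting the right-hand side to be at most δ and solving for T makes the second term of the claimed query complexity explicit:

\frac{1}{4}\exp\left(-\frac{T\cdot D_{\mathsf{KL}}(p\|1-p)}{K}\right)\leq\delta\quad\Longleftrightarrow\quad T\geq\frac{K\log\frac{1}{4\delta}}{D_{\mathsf{KL}}(p\|1-p)}.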

On the other hand, KK is naturally a lower bound for query complexity since one has to query each element at least once. Thus we arrive at a lower bound of Ω(K+Klog(1/δ)/D𝖪𝖫(p1p))\Omega(K+K\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p)). Note that this is equivalent to Ω(K/(1H(p))+Klog(1/δ)/D𝖪𝖫(p1p))\Omega(K/(1-H(p))+K\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p)) up to a constant factor when δ<0.49\delta<0.49. The reason is that when pp is bounded away from 0, (1H(p))/D𝖪𝖫(p1p)(1-H(p))/D_{\mathsf{KL}}(p\|1-p) is of constant order, and when pp is close to 0, 1H(p)1-H(p) is within a constant factor of 11. ∎

Remark II.2.

Compared with the existing tightest bound Ω(Klog(1/δ)/log((1p)/p))\Omega(K\log(1/\delta)/\log((1-p)/p)) in Feige et al. (1994), the rate is greatly improved as p0p\rightarrow 0 or p1/2p\rightarrow 1/2.

On the other hand, the best known upper bound, from Feige et al. (1994), is 𝒪(Klog(1/δ)1H(p))\mathcal{O}(\frac{K\log(1/\delta)}{1-H(p)}). We include its algorithm and analysis in Appendix I. Theorem II.1 shows that when δ\delta is bounded away from 0, one needs at least CK/(1H(p))C\cdot K/(1-H(p)) samples, matching the upper bound. Similarly, when pp is bounded away from 0, the term D𝖪𝖫(p1p)D_{\mathsf{KL}}(p\|1-p) is also within a constant factor of 1H(p)1-H(p), and thus the upper and lower bounds match. The only regime where the upper and lower bounds do not match is when both pp and δ\delta go to 0. We conjecture that a better upper bound is needed in this case.

II-B Non-adaptive Sampling

In the case of non-adaptive sampling, we show that 𝒪(K)\mathcal{O}(K) queries are not enough; instead, Ω(Klog(K))\Omega(K\log(K)) queries are needed.

Theorem II.3.

In the non-adaptive sampling setting, where the sampling procedure is restricted to taking independent samples from a sequence of distributions p1,,pTp_{1},\cdots,p_{T}, we have

infμ^supθ{0,1}K(μ^𝖮𝖱(θ))\displaystyle\inf_{\hat{\mu}}\sup_{\theta\in\{0,1\}^{K}}\mathbb{P}(\hat{\mu}\neq\mathsf{OR}(\theta))
12(112K((1+(12p)2K(1p)p)T1)).\displaystyle\quad\geq\tfrac{1}{2}\cdot\Big{(}1-\sqrt{\tfrac{1}{2K}\Big{(}\left(1+\tfrac{(1-2p)^{2}}{K(1-p)p}\right)^{T}-1\Big{)}}\Big{)}.

This shows that the query complexity is at least Ω(max(K,Klog(K)p/(1H(p))))\Omega(\max(K,K\log(K)p/(1-H(p)))).

We provide the proof below. Different from the case of adaptive sampling, for non-adaptive sampling we target a rate of Ω(Klog(K))\Omega(K\log(K)), and a standard application of Le Cam’s two point method is not sufficient to give the extra logarithmic factor. Thus we provide a new proof based on a point-versus-mixture extension of Le Cam’s method.

Proof of Theorem  II.3.

Consider instance 0, in which all elements are 0. For instance 11, we define it as a mixture given by a distribution qq over the KK instances 1k1_{k}, k[K]k\in[K], which puts probability pkp_{k} on instance 1k1_{k}. We will determine the value of pkp_{k} later. Here instance 1k1_{k} refers to the case where the kk-th element is 11 and all other elements are 0. Now from the point-versus-mixture extension of Le Cam’s method (Lemma A.2), we have

infμ^supθ{0,1}K(μ^𝖮𝖱(θ))\displaystyle\inf_{\hat{\mu}}\sup_{\theta\in\{0,1\}^{K}}\mathbb{P}(\hat{\mu}\neq\mathsf{OR}(\theta))
12(1𝖳𝖵(1,0))\displaystyle\geq\frac{1}{2}(1-\mathsf{TV}(\mathbb{P}_{1},\mathbb{P}_{0}))
=12(1𝖳𝖵(𝔼jq[1j],0))\displaystyle=\frac{1}{2}(1-\mathsf{TV}(\mathbb{E}_{j\sim q}[\mathbb{P}_{1_{j}}],\mathbb{P}_{0}))
12(112χ2(𝔼jq[1j],0)).\displaystyle\geq\frac{1}{2}\left(1-\sqrt{\frac{1}{2}\chi^{2}(\mathbb{E}_{j\sim q}[\mathbb{P}_{1_{j}}],\mathbb{P}_{0})}\right).

Here the last inequality is based on the inequality 𝖳𝖵12χ2\mathsf{TV}\leq\sqrt{\frac{1}{2}\chi^{2}}. Let πmt\pi_{m}^{t} be the probability of querying the mm-th element in round tt. We can further calculate the χ2\chi^{2} divergence as

χ2(𝔼[1j()],0())\displaystyle\chi^{2}(\mathbb{E}[\mathbb{P}_{1_{j}}(\cdot)],\mathbb{P}_{0}(\cdot))
=(a)𝔼j,jq[x1j(x)1j(x)0(x)]1\displaystyle\stackrel{{\scriptstyle(a)}}{{=}}\mathbb{E}_{j,j^{\prime}\sim q}\left[\sum_{x}\frac{\mathbb{P}_{1_{j}}(x)\mathbb{P}_{1_{j^{\prime}}}(x)}{\mathbb{P}_{0}(x)}\right]-1
=(b)𝔼j,j[t=1T(m=1K(πmt)2(1p)𝟙(jm)+𝟙(jm)p𝟙(j=m)+𝟙(j=m)πmt(1p)\displaystyle\stackrel{{\scriptstyle(b)}}{{=}}\mathbb{E}_{j,j^{\prime}}\Bigg{[}\prod_{t=1}^{T}\Bigg{(}\sum_{m=1}^{K}\frac{(\pi^{t}_{m})^{2}(1-p)^{\operatorname*{\mathbbm{1}}(j\neq m)+\operatorname*{\mathbbm{1}}(j^{\prime}\neq m)}p^{\operatorname*{\mathbbm{1}}(j=m)+\operatorname*{\mathbbm{1}}(j^{\prime}=m)}}{\pi_{m}^{t}(1-p)}
+(πmt)2(1p)𝟙(j=m)+𝟙(j=m)p𝟙(jm)+𝟙(jm)πmtp)]1\displaystyle\hskip 20.00003pt+\frac{(\pi^{t}_{m})^{2}(1-p)^{\operatorname*{\mathbbm{1}}(j=m)+\operatorname*{\mathbbm{1}}(j^{\prime}=m)}p^{\operatorname*{\mathbbm{1}}(j\neq m)+\operatorname*{\mathbbm{1}}(j^{\prime}\neq m)}}{\pi_{m}^{t}p}\Bigg{)}\Bigg{]}-1
=𝔼j,j[t=1T(m=1Kπmt((1p)𝟙(jm)+𝟙(jm)p𝟙(j=m)+𝟙(j=m)1p\displaystyle=\mathbb{E}_{j,j^{\prime}}\Bigg{[}\prod_{t=1}^{T}\Bigg{(}\sum_{m=1}^{K}\pi^{t}_{m}\Bigg{(}\frac{(1-p)^{\operatorname*{\mathbbm{1}}(j\neq m)+\operatorname*{\mathbbm{1}}(j^{\prime}\neq m)}p^{\operatorname*{\mathbbm{1}}(j=m)+\operatorname*{\mathbbm{1}}(j^{\prime}=m)}}{1-p}
+(1p)𝟙(j=m)+𝟙(j=m)p𝟙(jm)+𝟙(jm)p))]1\displaystyle\hskip 20.00003pt+\frac{(1-p)^{\operatorname*{\mathbbm{1}}(j=m)+\operatorname*{\mathbbm{1}}(j^{\prime}=m)}p^{\operatorname*{\mathbbm{1}}(j\neq m)+\operatorname*{\mathbbm{1}}(j^{\prime}\neq m)}}{p}\Bigg{)}\Bigg{)}\Bigg{]}-1
=𝔼j,j[t=1T(m=1Kπmt(1+(12p)2𝟙(j=j=m)(1p)p))]1\displaystyle=\mathbb{E}_{j,j^{\prime}}\left[\prod_{t=1}^{T}\left(\sum_{m=1}^{K}\pi_{m}^{t}\left(1+\frac{(1-2p)^{2}\cdot\operatorname*{\mathbbm{1}}(j=j^{\prime}=m)}{(1-p)p}\right)\right)\right]-1
=1j=1Kpj2+j=1Kpj2t=1T(1+πjt(12p)2(1p)p)1\displaystyle=1-\sum_{j=1}^{K}p_{j}^{2}+\sum_{j=1}^{K}p_{j}^{2}\prod_{t=1}^{T}\left(1+\frac{\pi_{j}^{t}(1-2p)^{2}}{(1-p)p}\right)-1
=j=1Kpj2+j=1Kpj2t=1T(1+πjt(12p)2(1p)p),\displaystyle=-\sum_{j=1}^{K}p_{j}^{2}+\sum_{j=1}^{K}p_{j}^{2}\prod_{t=1}^{T}\left(1+\frac{\pi_{j}^{t}(1-2p)^{2}}{(1-p)p}\right),

where (a)(a) follows from Lemma A.5, and (b)(b) follows from the tensorization property of χ2\chi^{2} for tensor products, a direct result of Lemma A.5. Now denote Tj=t=1TπjtT_{j}=\sum_{t=1}^{T}\pi_{j}^{t}. By Jensen’s inequality, we know that t=1T(1+πjt(12p)2(1p)p)(1+Tj(12p)2T(1p)p)T\prod_{t=1}^{T}(1+\frac{\pi_{j}^{t}(1-2p)^{2}}{(1-p)p})\leq(1+\frac{T_{j}(1-2p)^{2}}{T(1-p)p})^{T}. Thus we have

χ2(0(),𝔼[1j()])\displaystyle\chi^{2}(\mathbb{P}_{0}(\cdot),\mathbb{E}[\mathbb{P}_{1_{j}}(\cdot)])
j=1Kpj2+j=1Kpj2(1+Tj(12p)2T(1p)p)T.\displaystyle\leq-\sum_{j=1}^{K}p_{j}^{2}+\sum_{j=1}^{K}p_{j}^{2}\left(1+\frac{T_{j}(1-2p)^{2}}{T(1-p)p}\right)^{T}.

Now we take pj=((1+Tj(12p)2T(1p)p)T1)1/2/(j((1+Tj(12p)2T(1p)p)T1)1/2)p_{j}=((1+\frac{T_{j}(1-2p)^{2}}{T(1-p)p})^{T}-1)^{-1/2}/(\sum_{j}((1+\frac{T_{j}(1-2p)^{2}}{T(1-p)p})^{T}-1)^{-1/2}). We have

χ2(0(),𝔼[1j()])\displaystyle\chi^{2}(\mathbb{P}_{0}(\cdot),\mathbb{E}[\mathbb{P}_{1_{j}}(\cdot)])
K(j=1K((1+Tj(12p)2T(1p)p)T1)1/2)2.\displaystyle\leq\frac{K}{(\sum_{j=1}^{K}((1+\frac{T_{j}(1-2p)^{2}}{T(1-p)p})^{T}-1)^{-1/2})^{2}}.

Since jTj=T\sum_{j}T_{j}=T, by Jensen’s inequality, we have j=1K((1+Tj(12p)2T(1p)p)T1)1/2K((1+(12p)2K(1p)p)T1)1/2\sum_{j=1}^{K}((1+\frac{T_{j}(1-2p)^{2}}{T(1-p)p})^{T}-1)^{-1/2}\geq K((1+\frac{(1-2p)^{2}}{K(1-p)p})^{T}-1)^{-1/2}. Thus

χ2(0(),𝔼[1j()])1K((1+(12p)2K(1p)p)T1).\displaystyle\chi^{2}(\mathbb{P}_{0}(\cdot),\mathbb{E}[\mathbb{P}_{1_{j}}(\cdot)])\leq\frac{1}{K}\cdot\left(\left(1+\frac{(1-2p)^{2}}{K(1-p)p}\right)^{T}-1\right).

This gives the desired result. Now solving the inequality

δ12(112K((1+(12p)2K(1p)p)T1)),\displaystyle\delta\geq\frac{1}{2}\cdot\left(1-\sqrt{\frac{1}{2K}\left(\left(1+\frac{(1-2p)^{2}}{K(1-p)p}\right)^{T}-1\right)}\right),

we arrive at Tlog(1+2K(12δ)2)/log(1+(12p)2K(1p)p)Klog(K)p/(12p)2T\geq\log(1+2K(1-2\delta)^{2})/\log(1+\frac{(1-2p)^{2}}{K(1-p)p})\gtrsim K\log(K)p/(1-2p)^{2}. ∎

Remark II.4.

Compared with the bound Ω(Klog(K)log((1p)/p))\Omega(\frac{K\log(K)}{\log((1-p)/p)}) in (Reischuk and Schmeltz, 1991, Gács and Gál, 1994), our lower bound is tighter when p1/2p\rightarrow 1/2, but looser when p0p\rightarrow 0. Thus the tightest lower bound is the maximum of KK and the two lower bounds. The repetition-based upper bound 𝒪(Klog(K/δ)1H(p))\mathcal{O}(\frac{K\log(K/\delta)}{1-H(p)}) can be derived by a union-bound argument; compared with this upper bound, the lower bound is tight with respect to all parameters when both pp and δ\delta are bounded away from 0.

III Computing the 𝖬𝖠𝖷\mathsf{MAX} function

Let θ[0,1]K\theta\in[0,1]^{K} be a sequence of length KK representing the true values of each element, and let 𝖬𝖠𝖷(θ)\mathsf{MAX}(\theta) be the index of the maximum value in the sequence. We also let μ^\hat{\mu} be any algorithm that (possibly randomly) queries any noisy comparison between two elements in TT rounds, and outputs a (possibly random) decision in [K][K].

III-A Adaptive Sampling

We have the following minimax lower bound for the adaptive setting. The proof is deferred to Appendix B.

Theorem III.1.

In the adaptive setting, we have

infμ^supθ[0,1]K(μ^𝖬𝖠𝖷(θ))12exp(TD𝖪𝖫(p1p)K).\displaystyle\inf_{\hat{\mu}}\sup_{\theta\in[0,1]^{K}}\mathbb{P}(\hat{\mu}\neq\mathsf{MAX}(\theta))\geq\tfrac{1}{2}\cdot\exp\left(-\tfrac{T\cdot D_{\mathsf{KL}}(p\|1-p)}{K}\right).

Thus, the number of queries required to recover the true value with probability at least 1δ1-\delta is lower bounded by Ω(K/(1H(p))+Klog(1/δ)/D𝖪𝖫(p1p))\Omega({K}/{(1-H(p))}+K\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p)).

The comparison with the existing lower bound (Feige et al., 1994) for computing 𝖬𝖠𝖷\mathsf{MAX} is similar to that for 𝖮𝖱\mathsf{OR} in Remark II.2, and is thus omitted here.

III-B Non-adaptive Sampling

In the case of non-adaptive sampling, we show that 𝒪(K)\mathcal{O}(K) samples are not enough. Instead, one needs at least Ω(K2)\Omega(K^{2}) samples. We have the following result. The proof is deferred to Appendix C.

Theorem III.2.

In the non-adaptive setting, where the sampling procedure is restricted to taking independent samples from a sequence of distributions p1,,pTp_{1},\cdots,p_{T}, we have

infμ^supθ[0,1]K(μ^𝖬𝖠𝖷(θ))14exp(TD𝖪𝖫(p1p)K2).\displaystyle\inf_{\hat{\mu}}\sup_{\theta\in[0,1]^{K}}\mathbb{P}(\hat{\mu}\neq\mathsf{MAX}(\theta))\geq\tfrac{1}{4}\cdot\exp\left(-\tfrac{T\cdot D_{\mathsf{KL}}(p\|1-p)}{K^{2}}\right).

Thus, the number of queries required to recover the true value with probability at least 1δ1-\delta is lower bounded by Ω(K2/(1H(p))+K2log(1/δ)/D𝖪𝖫(p1p))\Omega(K^{2}/(1-H(p))+K^{2}\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p)).

Remark III.3.

Compared with the repetition-based upper bound 𝒪(K2log(K))\mathcal{O}(K^{2}\log(K)), the lower bound has a log(K)\log(K) gap. The tight dependence on KK remains open.
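As a point of reference for the remark above, here is a minimal non-adaptive sketch (in Python; the function name and the constant in the repetition count are ours and purely illustrative) of the repetition-based strategy: every pair is compared a fixed number of times up front, each comparison is decoded by majority vote, and the element with the most wins is returned.

    import math
    import random

    def noisy_max_nonadaptive(values, p, delta):
        # Non-adaptive repetition scheme (assumes 0 < p < 1/2): compare every pair
        # r times, decode each pair by majority vote, and return the index with the
        # most wins.  r follows an O(log(K/delta)/(1 - H(p))) per-pair budget, so the
        # total number of queries is O(K^2 log(K/delta)/(1 - H(p))).
        K = len(values)
        h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)   # binary entropy H(p)
        r = max(1, math.ceil(4 * math.log2(K / delta) / (1 - h)))
        wins = [0] * K
        for i in range(K):
            for j in range(i + 1, K):
                truth = int(values[i] > values[j])
                votes = sum(truth ^ (random.random() < p) for _ in range(r))
                if 2 * votes > r:      # majority says X_i > X_j
                    wins[i] += 1
                else:
                    wins[j] += 1
        return max(range(K), key=lambda k: wins[k])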

IV Computing the 𝖲𝖤𝖠𝖱𝖢𝖧\mathsf{SEARCH} function

IV-A Adaptive Sampling

Recall that for any sorted sequence X1,,XKX_{1},\cdots,X_{K}, the 𝖲𝖤𝖠𝖱𝖢𝖧\mathsf{SEARCH} function for XX is defined as 𝖲𝖤𝖠𝖱𝖢𝖧(X;X1,,XK)=i\mathsf{SEARCH}(X;X_{1},\cdots,X_{K})=i, where ii satisfies Xi<X<Xi+1X_{i}<X<X_{i+1}. We start with the adaptive setting. The proof is deferred to Appendix D.

Theorem IV.1.

In the adaptive setting, we have

infμ^supX(μ^𝖲𝖤𝖠𝖱𝖢𝖧(X))\displaystyle\inf_{\hat{\mu}}\sup_{X}\mathbb{P}(\hat{\mu}\neq\mathsf{SEARCH}(X))
\displaystyle\geq 14max(exp(TD𝖪𝖫(p1p)),1T(1H(p))log(K)).\displaystyle\tfrac{1}{4}\cdot\max\Big{(}\exp\left(-{T\cdot D_{\mathsf{KL}}(p\|1-p)}\right),1-\tfrac{T\cdot(1-H(p))}{\log(K)}\Big{)}.

Thus, the number of queries required to recover the true value with probability at least 1δ1-\delta is lower bounded by Ω(log(K)/(1H(p))+log(1/δ)/D𝖪𝖫(p1p))\Omega(\log(K)/(1-H(p))+\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p)).
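To see how the two terms arise, note that any algorithm achieving error probability at most δ must make both branches of the maximum at most 4δ; solving each of the resulting inequalities for T gives

T\geq\frac{\log\frac{1}{4\delta}}{D_{\mathsf{KL}}(p\|1-p)}\qquad\text{and}\qquad T\geq\frac{(1-4\delta)\log(K)}{1-H(p)},

which correspond to the two terms of the stated bound when δ is a sufficiently small constant.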

Remark IV.2.

The same lower bound is also proven in two different ways in the concurrent works (Wang et al., 2023, Gu and Xu, 2023). A matching upper bound that is tight with respect to all parameters when pp is bounded away from 0 and 1/21/2 is provided in Gu and Xu (2023).

IV-B Non-adaptive Sampling

For non-adaptive sampling, we show Ω(K)\Omega(K) queries are needed. The proof is deferred to Appendix E.

Theorem IV.3.

In the non-adaptive setting where the sampling procedure is restricted to taking independent samples from a sequence of distributions p1,,pTp_{1},\cdots,p_{T}, we have

infμ^supX(μ^𝖲𝖤𝖠𝖱𝖢𝖧(X))14exp(TD𝖪𝖫(p1p)K).\displaystyle\inf_{\hat{\mu}}\sup_{X}\mathbb{P}(\hat{\mu}\neq\mathsf{SEARCH}(X))\geq\tfrac{1}{4}\cdot\exp\left(-\tfrac{T\cdot D_{\mathsf{KL}}(p\|1-p)}{K}\right).

Thus, the number of queries required to recover the true value with probability at least 1δ1-\delta is lower bounded by Ω(K/(1H(p))+Klog(1/δ)/D𝖪𝖫(p1p))\Omega(K/(1-H(p))+K\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p)).

Remark IV.4.

We show in Appendix H the upper bound 𝒪(Klog(1/δ)/(1H(p)))\mathcal{O}(K\log(1/\delta)/(1-H(p))) for computing 𝖲𝖤𝖠𝖱𝖢𝖧\mathsf{SEARCH}. Our lower bound matches it in all parameters when either pp or δ\delta is bounded away from 0.

V Computing the 𝖲𝖮𝖱𝖳\mathsf{SORT} function

V-A Adaptive Sampling

Let θ[0,1]K\theta\in[0,1]^{K} be any sequence of KK elements with distinct values in [0,1][0,1], and let 𝖲𝖮𝖱𝖳(θ)\mathsf{SORT}(\theta) be the result of the 𝖲𝖮𝖱𝖳\mathsf{SORT} function applied to the noiseless sequence. We also let μ^\hat{\mu} be any algorithm that queries any noisy comparison between two elements in TT rounds, and outputs a (possibly random) decision. We have the following result for computing the 𝖲𝖮𝖱𝖳\mathsf{SORT} function. The proof is deferred to Appendix F.

Theorem V.1.

In the adaptive setting, we have

infμ^supθ[0,1]K(μ^𝖲𝖮𝖱𝖳(θ))\displaystyle\inf_{\hat{\mu}}\sup_{\theta\in[0,1]^{K}}\mathbb{P}(\hat{\mu}\neq\mathsf{SORT}(\theta))
\displaystyle\geq 14max(exp(TD𝖪𝖫(p1p)K),1T(1H(p))Klog(K)).\displaystyle\frac{1}{4}\max\left(\exp\left(-\frac{T\cdot D_{\mathsf{KL}}(p\|1-p)}{K}\right),1-\frac{T\cdot(1-H(p))}{K\log(K)}\right).

Thus, the number of queries required to recover the true value with probability at least 1δ1-\delta is lower bounded by Ω(Klog(K)/(1H(p))+Klog(K/δ)/D𝖪𝖫(p1p))\Omega(K\log(K)/(1-H(p))+K\log(K/\delta)/D_{\mathsf{KL}}(p\|1-p)).

Remark V.2.

Compared with the upper bound 𝒪(Klog(K/δ)/(1H(p)))\mathcal{O}(K\log(K/\delta)/(1-H(p))), the lower bound is tight in all parameters when either pp or δ\delta is bounded away from 0.

V-B Non-adaptive Sampling

Here we provide the following minimax lower bound for non-adaptive sampling. The proof is deferred to Appendix G.

Theorem V.3.

In the non-adaptive setting where the sampling procedure is restricted to taking independent samples from a sequence of distributions p1,,pTp_{1},\cdots,p_{T},

infμ^supθ[0,1]K(μ^𝖲𝖮𝖱𝖳(θ))12(1TD𝖪𝖫(p1p)K2log(K)).\displaystyle\inf_{\hat{\mu}}\sup_{\theta\in[0,1]^{K}}\mathbb{P}(\hat{\mu}\neq\mathsf{SORT}(\theta))\geq\frac{1}{2}\cdot\left(1-\frac{T\cdot D_{\mathsf{KL}}(p\|1-p)}{K^{2}\log(K)}\right).

Thus, the number of queries required to recover the true value with probability at least 1δ1-\delta is lower bounded by Ω(max(K2,K2log(K)/D𝖪𝖫(p1p)))\Omega(\max(K^{2},K^{2}\log(K)/D_{\mathsf{KL}}(p\|1-p))).

Remark V.4.

Compared with the repetition-based upper bound 𝒪(K2log(K/δ)/(1H(p)))\mathcal{O}(K^{2}\log(K/\delta)/(1-H(p))), the lower bound is tight in all parameters when both pp and δ\delta are bounded away from 0.

VI Matching Bounds for Variable Length

In this section, we provide matching upper and lower bounds for the variable-length setting. All the bounds here are the same as the lower bounds for the fixed-length setting, and the proofs of the lower bounds can be directly adapted from the fixed-length results.

Theorem VI.1.

In the adaptive setting, the number of queries in expectation to achieve at most δ\delta error probability is

  1. Θ(K1H(p)+Klog(1/δ)D𝖪𝖫(p1p))\Theta(\frac{K}{1-H(p)}+\frac{K\log(1/\delta)}{D_{\mathsf{KL}}(p\|1-p)}) for computing 𝖮𝖱\mathsf{OR};

  2. Θ(K1H(p)+Klog(1/δ)D𝖪𝖫(p1p))\Theta(\frac{K}{1-H(p)}+\frac{K\log(1/\delta)}{D_{\mathsf{KL}}(p\|1-p)}) for computing 𝖬𝖠𝖷\mathsf{MAX};

  3. Θ(log(K)1H(p)+log(1/δ)D𝖪𝖫(p1p))\Theta(\frac{\log(K)}{1-H(p)}+\frac{\log(1/\delta)}{D_{\mathsf{KL}}(p\|1-p)}) for computing 𝖲𝖤𝖠𝖱𝖢𝖧\mathsf{SEARCH};

  4. Θ(Klog(K)1H(p)+Klog(K/δ)D𝖪𝖫(p1p))\Theta(\frac{K\log(K)}{1-H(p)}+\frac{K\log(K/\delta)}{D_{\mathsf{KL}}(p\|1-p)}) for computing 𝖲𝖮𝖱𝖳\mathsf{SORT}.

The proof is deferred to Appendix J. The matching upper bound for 𝖲𝖤𝖠𝖱𝖢𝖧\mathsf{SEARCH} is given in (Gu and Xu, 2023) in the regime where pp is a constant bounded away from 0 and 1/21/2, where they make the bound tight even in the constant factor. Our results for the other three functions improve both existing upper and lower bounds, and provide tight query complexity in all parameters in the variable-length setting. (Wang et al., 2023) initiates the study of constant-wise matching bounds for 𝖲𝖮𝖱𝖳\mathsf{SORT} when δ=Ω(K1)\delta=\Omega(K^{-1}), which are achieved by (Gu and Xu, 2023). Our bounds are tight up to constant factors for arbitrarily small δ\delta.

Algorithm 1 Variable-length tournament for computing 𝖮𝖱\mathsf{OR} with noise
1:Input: Target confidence level δ\delta.
2:Set 𝒳=(X1,X2,,XK)\mathcal{X}=(X_{1},X_{2},\cdots,X_{K}) as the list of all bits with unknown value.
3:for iteration i=1:log2(K)i=1:\lceil\log_{2}(K)\rceil do
4:     for iteration j=1:|𝒳|/2j=1:\lceil|\mathcal{X}|/2\rceil do
5:         Set a=1/2a=1/2, δ~i=δ2(2i1)\tilde{\delta}_{i}=\delta^{2(2i-1)}.
6:         while a(δ~i,1δ~i)a\in(\tilde{\delta}_{i},1-\tilde{\delta}_{i}) do
7:              Query the (2j1)(2j-1)-th element once.
8:              If observe 11, update a=(1p)a(1p)a+p(1a)a=\frac{(1-p)a}{(1-p)a+p(1-a)}. Otherwise update a=papa+(1p)(1a)a=\frac{pa}{pa+(1-p)(1-a)}.          
9:         If aδ~ia\leq\tilde{\delta}_{i}, remove the (2j1)(2j-1)-th element, otherwise remove the 2j2j-th element.      
10:     Break when 𝒳\mathcal{X} only has one element left.
11:Set a=1/2a=1/2.
12:while a(δ,1δ)a\in(\delta,1-\delta) do
13:     Query the only element left in 𝒳\mathcal{X} once.
14:     If observe 11, update a=(1p)a(1p)a+p(1a)a=\frac{(1-p)a}{(1-p)a+p(1-a)}. Otherwise update a=papa+(1p)(1a)a=\frac{pa}{pa+(1-p)(1-a)}.
15:If aδa\leq\delta, return 0, otherwise return 1.

We provide the upper bound algorithm for computing 𝖮𝖱\mathsf{OR} in Algorithm 1. For the upper bound, one major difference is that to compare two elements with error probability at most δ\delta, one needs 𝒪(log(1/δ)/D𝖪𝖫(p1p)+1/(12p))\mathcal{O}(\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p)+1/(1-2p)) queries in expectation, which can be achieved by repeatedly comparing the two elements until the posterior distribution reaches the desired confidence level (see e.g. Lemma 13 of Gu and Xu (2023)). In contrast, the best known bound for fixed length is 𝒪(log(1/δ)/(1H(p)))\mathcal{O}(\log(1/\delta)/(1-H(p))) (Feige et al., 1994). This makes it simpler to achieve the tight rate for variable length.

In Algorithm 1, we adapt the noisy comparison oracle in Gu and Xu (2023) to a noisy query oracle on each element, and combine it with the original fixed-length algorithm in Feige et al. (1994). In each round, the algorithm eliminates half of the elements in the current set by querying the elements with odd indices. If the (2j1)(2j-1)-th element is determined to be 11, the 2j2j-th element will be removed without being queried. If the (2j1)(2j-1)-th element is determined to be 0, it will be removed from the list. Thus after 𝒪(log(K))\mathcal{O}(\log(K)) rounds, we have only one element left in the set, and it suffices to query this element to determine the output of the 𝖮𝖱\mathsf{OR} function.
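The sketch below (in Python) is our rendering of Algorithm 1. The helper posterior_test implements the variable-length primitive of lines 6–8 and 12–14: keep querying one bit and updating the posterior probability a until a exits the interval (δ̃, 1−δ̃). The tournament then keeps one element per pair in each round. The pairing bookkeeping and the function names are ours; the per-round confidence schedule is reproduced as printed in Algorithm 1.

    import random

    def bsc(bit, p):
        # One noisy query of a bit through a binary symmetric channel BSC(p).
        return bit ^ (random.random() < p)

    def posterior_test(query_fn, p, conf):
        # Query one element repeatedly, tracking the posterior probability a that its
        # value is 1 (starting from the uniform prior a = 1/2), until a leaves
        # (conf, 1 - conf).  Returns the decided value (0 or 1).
        a = 0.5
        while conf < a < 1 - conf:
            if query_fn():   # observed a 1
                a = (1 - p) * a / ((1 - p) * a + p * (1 - a))
            else:            # observed a 0
                a = p * a / (p * a + (1 - p) * (1 - a))
        return 0 if a <= conf else 1

    def noisy_or_variable_length(bits, p, delta):
        # Variable-length tournament for OR with noise (sketch of Algorithm 1).
        alive = list(range(len(bits)))
        i = 0
        while len(alive) > 1:
            i += 1
            conf_i = delta ** (2 * (2 * i - 1))     # per-round confidence, as printed
            survivors = []
            for j in range(0, len(alive), 2):       # pair up the current elements
                left = alive[j]
                decided = posterior_test(lambda k=left: bsc(bits[k], p), p, conf_i)
                if decided == 1:
                    survivors.append(left)          # keep the 1, drop its partner
                elif j + 1 < len(alive):
                    survivors.append(alive[j + 1])  # drop the 0, keep its partner
            alive = survivors
        # Final check of the single remaining element at confidence delta.
        k = alive[0]
        return posterior_test(lambda: bsc(bits[k], p), p, delta)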

VII Conclusions and Future Work

For four noisy computing tasks — the 𝖮𝖱\mathsf{OR} function from noisy queries, and the 𝖬𝖠𝖷,𝖲𝖤𝖠𝖱𝖢𝖧\mathsf{MAX},\mathsf{SEARCH}, and 𝖲𝖮𝖱𝖳\mathsf{SORT} functions from noisy pairwise comparisons — we tighten the lower bounds for fixed-length noisy computing and provide matching bounds for variable-length noisy computing. Making the bounds match exactly in the fixed-length setting remains an important open problem.

Acknowledgements

The authors would like to thank Yanjun Han for insightful discussions on the information-theoretic lower bounds. This work was supported in part by the NSERC Discovery Grant No. RGPIN-2019-05448, the NSERC Collaborative Research and Development Grant CRDPJ 54367619, and NSF Grants IIS-1901252 and CCF-1909499.

References

  • Agarwal et al. (2017) A. Agarwal, S. Agarwal, S. Assadi, and S. Khanna. Learning with limited rounds of adaptivity: Coin tossing, multi-armed bandits, and ranking from pairwise comparisons. In Proceedings of the 2017 Conference on Learning Theory, volume 65 of Proceedings of Machine Learning Research, pages 39–75. PMLR, 07–10 Jul 2017.
  • Ailon (2011) N. Ailon. Active learning ranking from pairwise preferences with almost optimal query complexity. In J. Shawe-Taylor, R. Zemel, P. Bartlett, F. Pereira, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 24. Curran Associates, Inc., 2011.
  • Ailon et al. (2008) N. Ailon, M. Charikar, and A. Newman. Aggregating inconsistent information: Ranking and clustering. J. ACM, 55(5), nov 2008. ISSN 0004-5411. doi: 10.1145/1411509.1411513.
  • Ailon et al. (2011) N. Ailon, R. Begleiter, and E. Ezra. A new active learning scheme with applications to learning to rank from pairwise preferences. 2011.
  • Ajtai et al. (2009) M. Ajtai, V. Feldman, A. Hassidim, and J. Nelson. Sorting and selection with imprecise comparisons. In ACM Trans. Algorithms, volume 12, pages 37–48, 07 2009. doi: 10.1007/978-3-642-02927-1˙5.
  • Audibert et al. (2010) J.-Y. Audibert, S. Bubeck, and R. Munos. Best arm identification in multi-armed bandits. In COLT, pages 41–53, 2010.
  • Auer et al. (1995) P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. Gambling in a rigged casino: The adversarial multi-armed bandit problem. In Proceedings of IEEE 36th annual foundations of computer science, pages 322–331. IEEE, 1995.
  • Berlekamp (1964) E. R. Berlekamp. Block coding with noiseless feedback. Ph.D. thesis, MIT, Cambridge, MA, USA, 1964.
  • Bradley and Terry (1952) R. A. Bradley and M. E. Terry. Rank analysis of incomplete block designs: I. the method of paired comparisons. Biometrika, 39(3/4):324–345, 1952. ISSN 00063444.
  • Braverman and Mossel (2009) M. Braverman and E. Mossel. Sorting from noisy information. 2009. URL https://arxiv.org/abs/0910.1191.
  • Bretagnolle and Huber (1979) J. Bretagnolle and C. Huber. Estimation des densités: risque minimax. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 47(2):119–137, 1979. doi: 10.1007/BF00535278.
  • Bubeck et al. (2009) S. Bubeck, R. Munos, and G. Stoltz. Pure exploration in multi-armed bandits problems. In International conference on Algorithmic learning theory, pages 23–37. Springer, 2009.
  • Burnashev (1976) M. V. Burnashev. Data transmission over a discrete channel with feedback. random transmission time. Problemy Peredachi Informatsii, 12(4):10–30, 1976.
  • Burnashev and Zigangirov (1974) M. V. Burnashev and K. Zigangirov. An interval estimation problem for controlled observations. Problemy Peredachi Informatsii, 10(3):51–61, 1974.
  • Chen et al. (2017) X. Chen, Y. Chen, and X. Li. Asymptotically optimal sequential design for rank aggregation. Math. Oper. Res., 10 2017. doi: 10.1287/moor.2021.1209.
  • Chen and Suh (2015) Y. Chen and C. Suh. Spectral MLE: Top-K rank aggregation from pairwise comparisons. In F. Bach and D. Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pages 371–380, Lille, France, 07–09 Jul 2015. PMLR.
  • Dobrushin and Ortyukov (1977a) R. L. Dobrushin and S. Ortyukov. Lower bound for the redundancy of self-correcting arrangements of unreliable functional elements. Problemy Peredachi Informatsii, 13(1):82–89, 1977a.
  • Dobrushin and Ortyukov (1977b) R. L. Dobrushin and S. Ortyukov. Upper bound on the redundancy of self-correcting arrangements of unreliable functional elements. Problemy Peredachi Informatsii, 13(3):56–76, 1977b.
  • Evans and Pippenger (1998) W. Evans and N. Pippenger. Average-case lower bounds for noisy boolean decision trees. SIAM Journal on Computing, 28(2):433–446, 1998.
  • Falahatgar et al. (2017) M. Falahatgar, A. Orlitsky, V. Pichapati, and A. T. Suresh. Maximum selection and ranking under noisy comparisons. In D. Precup and Y. W. Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1088–1096. PMLR, 06–11 Aug 2017.
  • Feige et al. (1994) U. Feige, P. Raghavan, D. Peleg, and E. Upfal. Computing with noisy information. SIAM Journal on Computing, 23(5):1001–1018, 1994.
  • Gabillon et al. (2012) V. Gabillon, M. Ghavamzadeh, and A. Lazaric. Best arm identification: A unified approach to fixed budget and fixed confidence. Advances in Neural Information Processing Systems, 25, 2012.
  • Gács and Gál (1994) P. Gács and A. Gál. Lower bounds for the complexity of reliable boolean circuits with noisy gates. IEEE Transactions on Information Theory, 40(2):579–583, 1994.
  • Garivier and Kaufmann (2016) A. Garivier and E. Kaufmann. Optimal best arm identification with fixed confidence. In Conference on Learning Theory, 2016.
  • Gu and Xu (2023) Y. Gu and Y. Xu. Optimal bounds for noisy sorting, 2023.
  • Heckel et al. (2019) R. Heckel, N. B. Shah, K. Ramchandran, and M. J. Wainwright. Active ranking from pairwise comparisons and when parametric assumptions do not help. Ann. Stat., 47(6):3099 – 3126, 2019.
  • Horstein (1963) M. Horstein. Sequential transmission using noiseless feedback. IEEE Trans. Inf. Theory, 9(3):136–143, 1963. doi: 10.1109/TIT.1963.1057832.
  • Ingster et al. (2003) Y. Ingster, J. I. Ingster, and I. Suslina. Nonparametric goodness-of-fit testing under Gaussian models, volume 169. Springer Science & Business Media, 2003.
  • Jamieson and Nowak (2014) K. Jamieson and R. Nowak. Best-arm identification algorithms for multi-armed bandits in the fixed confidence setting. In 48th Annual Conference on Information Sciences and Systems, pages 1–6, 2014.
  • Jamieson and Nowak (2011) K. G. Jamieson and R. D. Nowak. Active ranking using pairwise comparisons. In Proceedings of the 24th International Conference on Neural Information Processing Systems, NIPS’11, page 2240–2248, Red Hook, NY, USA, 2011. Curran Associates Inc. ISBN 9781618395993.
  • Karp and Kleinberg (2007) R. M. Karp and R. Kleinberg. Noisy binary search and its applications. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’07, page 881–890, USA, 2007. Society for Industrial and Applied Mathematics. ISBN 9780898716245.
  • Kaufmann et al. (2016) E. Kaufmann, O. Cappé, and A. Garivier. On the complexity of best arm identification in multi-armed bandit models. Journal of Machine Learning Research, 17:1–42, 2016.
  • LeCam (1973) L. LeCam. Convergence of estimates under dimensionality restrictions. The Annals of Statistics, pages 38–53, 1973.
  • Mao et al. (2018) C. Mao, J. Weed, and P. Rigollet. Minimax rates and efficient algorithms for noisy sorting. In Proceedings of Algorithmic Learning Theory, volume 83 of Proceedings of Machine Learning Research, pages 821–847. PMLR, 07–09 Apr 2018.
  • Mitzenmacher and Upfal (2017) M. Mitzenmacher and E. Upfal. Probability and computing: Randomization and probabilistic techniques in algorithms and data analysis. Cambridge university press, 2017.
  • Mohajer et al. (2017) S. Mohajer, C. Suh, and A. Elmahdy. Active learning for top-kk rank aggregation from noisy comparisons. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, ICML’17, page 2488–2497. JMLR.org, 2017.
  • Negahban et al. (2012) S. Negahban, S. Oh, and D. Shah. Iterative ranking from pair-wise comparisons. In Advances in Neural Information Processing Systems, volume 25, 2012.
  • Pelc (1989) A. Pelc. Searching with known error probability. Theoretical Computer Science, 63(2):185–202, 1989.
  • Pippenger et al. (1991) N. Pippenger, G. D. Stamoulis, and J. N. Tsitsiklis. On a lower bound for the redundancy of reliable networks with noisy gates. IEEE Transactions on Information Theory, 37(3):639–643, 1991.
  • Rajkumar and Agarwal (2014) A. Rajkumar and S. Agarwal. A statistical convergence perspective of algorithms for rank aggregation from pairwise data. In Proceedings of the 31st International Conference on International Conference on Machine Learning - Volume 32, page I–118–I–126, 2014.
  • Reischuk and Schmeltz (1991) R. Reischuk and B. Schmeltz. Reliable computation with noisy circuits and decision trees-a general n log n lower bound. In [1991] Proceedings 32nd Annual Symposium of Foundations of Computer Science, pages 602–611. IEEE Computer Society, 1991.
  • Ren et al. (2018) W. Ren, J. Liu, and N. B. Shroff. Pac ranking from pairwise and listwise queries: Lower bounds and upper bounds. 2018.
  • Rényi (1961) A. Rényi. On a problem of information theory. MTA Mat. Kut. Int. Kozl. B, 6(MR143666):505–516, 1961.
  • Shah and Wainwright (2018) N. B. Shah and M. J. Wainwright. Simple, robust and optimal ranking from pairwise comparisons. J. Mach. Learn. Res., 18(199):1–38, 2018.
  • Shah et al. (2016) N. B. Shah, S. Balakrishnan, A. Guntuboyina, and M. J. Wainwright. Stochastically transitive models for pairwise comparisons: Statistical and computational issues. In Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48, ICML’16, page 11–20. JMLR.org, 2016.
  • Tsybakov (2004) A. B. Tsybakov. Introduction to nonparametric estimation. Springer, 9(10), 2004.
  • Ulam (1976) S. Ulam. Adventures of a mathematician. Charles Scribner’s Sons, New York, NY, USA, 1976.
  • Von Neumann (1956) J. Von Neumann. Probabilistic logics and the synthesis of reliable organisms from unreliable components. Automata studies, 34:43–98, 1956.
  • Wang et al. (2022) Z. Wang, N. Ghaddar, and L. Wang. Noisy sorting capacity. In 2022 IEEE International Symposium on Information Theory (ISIT), pages 2541–2546. IEEE, 2022.
  • Wang et al. (2023) Z. Wang, N. Ghaddar, B. Zhu, and L. Wang. Noisy sorting capacity, 2023. URL https://arxiv.org/abs/2202.01446.
  • Wauthier et al. (2013) F. Wauthier, M. Jordan, and N. Jojic. Efficient ranking from pairwise comparisons. In Proceedings of the 30th International Conference on Machine Learning, volume 28 of Proceedings of Machine Learning Research, pages 109–117, Atlanta, Georgia, USA, 17–19 Jun 2013.
  • Yu (1997) B. Yu. Assouad, fano, and le cam. In Festschrift for Lucien Le Cam, pages 423–435. Springer, 1997.

Appendix A Useful Lemmas

Here, we introduce several important lemmas.

Lemma A.1 (Le Cam’s Two Point Lemma, see e.g. (Yu, 1997)).

For any θ0,θ1Θ\theta_{0},\theta_{1}\in\Theta, suppose that the following separation condition holds for some loss function L(θ,a):Θ×𝒜L(\theta,a):\Theta\times\mathcal{A}\rightarrow\mathbb{R}:

a𝒜,L(θ0,a)+L(θ1,a)Δ>0.\displaystyle\forall{a\in\mathcal{A}},L(\theta_{0},a)+L(\theta_{1},a)\geq\Delta>0.

Then we have

inffsupθ𝔼θ[L(θ,f(X))]Δ2(1𝖳𝖵(θ0,θ1)).\displaystyle\inf_{f}\sup_{\theta}\mathbb{E}_{\theta}[L(\theta,f(X))]\geq\frac{\Delta}{2}\left(1-\mathsf{TV}(\mathbb{P}_{\theta_{0}},\mathbb{P}_{\theta_{1}})\right).
Lemma A.2 (Point vs Mixture, see e.g. (Ingster et al., 2003)).

For any θ0Θ\theta_{0}\in\Theta, Θ1Θ\Theta_{1}\subset\Theta, suppose that the following separation condition holds for some loss function L(θ,a):Θ×𝒜L(\theta,a):\Theta\times\mathcal{A}\rightarrow\mathbb{R}:

θ1Θ1,a𝒜,L(θ0,a)+L(θ1,a)Δ>0.\displaystyle\forall\theta_{1}\in\Theta_{1},{a\in\mathcal{A}},L(\theta_{0},a)+L(\theta_{1},a)\geq\Delta>0.

Then for any probability measure μ\mu supported on Θ1\Theta_{1}, we have

inffsupθ𝔼θ[L(θ,f(X))]Δ2(1𝖳𝖵(θ0,𝔼μ(dθ)[θ1])).\displaystyle\inf_{f}\sup_{\theta}\mathbb{E}_{\theta}[L(\theta,f(X))]\geq\frac{\Delta}{2}\left(1-\mathsf{TV}(\mathbb{P}_{\theta_{0}},\mathbb{E}_{\mu(d\theta)}[\mathbb{P}_{\theta_{1}}])\right).
Lemma A.3 (Divergence Decomposition (Auer et al., 1995)).

Let TiT_{i} be the random variable denoting the number of times experiment i[K]i\in[K] is performed under some policy π\pi. Then, for the two distributions π,π\mathbb{P}^{\pi},\mathbb{Q}^{\pi} of the observations under policy π\pi,

D𝖪𝖫(π,π)=i[K]𝔼π[Ti]D𝖪𝖫(iπ,iπ).\displaystyle D_{\mathsf{KL}}(\mathbb{P}^{\pi},\mathbb{Q}^{\pi})=\sum_{i\in[K]}\mathbb{E}_{\mathbb{P}^{\pi}}[T_{i}]D_{\mathsf{KL}}(\mathbb{P}_{i}^{\pi},\mathbb{Q}_{i}^{\pi}).
Lemma A.4 (Bretagnolle–Huber inequality (Bretagnolle and Huber, 1979); see also Lemma 2.6 in Tsybakov (2004)).

For any two distributions 1,2\mathbb{P}_{1},\mathbb{P}_{2}, one has

𝖳𝖵(1,2)112exp(D𝖪𝖫(1,2)).\displaystyle\mathsf{TV}(\mathbb{P}_{1},\mathbb{P}_{2})\leq 1-\frac{1}{2}\exp(-D_{\mathsf{KL}}(\mathbb{P}_{1},\mathbb{P}_{2})).
Lemma A.5 (Chi-square divergence for a mixture of distributions, see e.g. (Ingster et al., 2003)).

Let (Pθ)θΘ(P_{\theta})_{\theta\in\Theta} be a family of distributions parametrized by θ\theta, and QQ be any fixed distribution. Then for any probability measure μ\mu supported on Θ\Theta, we have

χ2(𝔼θμ[Pθ],Q)=𝔼θμ,θμ[xPθ(x)Pθ(x)Q(x)]1.\displaystyle\chi^{2}(\mathbb{E}_{\theta\sim\mu}[P_{\theta}],Q)=\mathbb{E}_{\theta\sim\mu,\theta^{\prime}\sim\mu}\left[\sum_{x}\frac{P_{\theta}(x)P_{\theta^{\prime}}(x)}{Q(x)}\right]-1.

Here θ,θ\theta,\theta^{\prime} in the expectation are independent.

Lemma A.6 (Chernoff’s bound for Binomial random variables, (Mitzenmacher and Upfal, 2017, Exercise 4.7)).

If X𝖡𝗂𝗇(n,λn)X\sim\mathsf{Bin}(n,\frac{\lambda}{n}), then the upper-tail bound (3) holds for any η>0\eta>0, and the lower-tail bound (4) holds for any η(0,1)\eta\in(0,1):

(X(1+η)λ)\displaystyle\mathbb{P}(X\geq(1+\eta)\lambda) (eη(1+η)(1+η))λ\displaystyle\leq\left(\frac{e^{\eta}}{(1+\eta)^{(1+\eta)}}\right)^{\lambda} (3)
(X(1η)λ)\displaystyle\mathbb{P}(X\leq(1-\eta)\lambda) (eη(1η)(1η))λeη2λ/2.\displaystyle\leq\left(\frac{e^{-\eta}}{(1-\eta)^{(1-\eta)}}\right)^{\lambda}\leq e^{-\eta^{2}\lambda/2}. (4)
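The upper-tail bound (3) can also be sanity-checked by simulation. The following short script is an added illustration (not part of the original analysis); the parameter values n, λ, and η are arbitrary choices.

```python
import math
import random

# Monte Carlo sanity check of the upper-tail bound (3) in Lemma A.6.
# The parameters n, lam, eta below are arbitrary illustrative choices.
n, lam, eta = 200, 20.0, 0.5          # X ~ Bin(n, lam/n), threshold (1 + eta) * lam
trials = 100_000
p_success = lam / n

exceed = sum(
    sum(random.random() < p_success for _ in range(n)) >= (1 + eta) * lam
    for _ in range(trials)
)

empirical = exceed / trials
bound = (math.exp(eta) / (1 + eta) ** (1 + eta)) ** lam
print(f"empirical tail probability: {empirical:.4f}")
print(f"Chernoff upper bound:       {bound:.4f}")   # the bound should dominate
```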

Appendix B Proof for Theorem III.1

Proof.

We first select an arbitrary sequence 0<X1<X2<<XK<10<X_{1}<X_{2}<\cdots<X_{K}<1. Let θ0=(X1,X2,X3,,XK)\theta_{0}=(X_{1},X_{2},X_{3},\cdots,X_{K}) be the original sequence, whose largest value is in the KK-th position. Thus 𝖬𝖠𝖷(θ0)=K\mathsf{MAX}(\theta_{0})=K. Now for any i[K1]i\in[K-1], we design θi=(X1,,Xi1,XK,Xi,Xi+1,,XK1)\theta_{i}=(X_{1},\cdots,X_{i-1},X_{K},X_{i},X_{i+1},\cdots,X_{K-1}), i.e., we take the KK-th element of θ0\theta_{0} and insert it between Xi1X_{i-1} and XiX_{i}. Let Ti,jT_{i,j} be the random variable that represents the number of comparisons between the ii-th item and the jj-th item in the TT rounds. We know that 𝖬𝖠𝖷(θi)=i\mathsf{MAX}(\theta_{i})=i for all i[K1]i\in[K-1]. Following an argument similar to the proof of Theorem II.1, we know that

infμ^supθ[0,1]K(μ^𝖬𝖠𝖷(θ))\displaystyle\inf_{\hat{\mu}}\sup_{\theta\in[0,1]^{K}}\mathbb{P}(\hat{\mu}\neq\mathsf{MAX}(\theta))
sup1jK112(1𝖳𝖵(θ0,θj))\displaystyle\geq\sup_{1\leq j\leq K-1}\frac{1}{2}(1-\mathsf{TV}(\mathbb{P}_{\theta_{0}},\mathbb{P}_{\theta_{j}}))
sup1jK114exp(D𝖪𝖫(θ0,θj))\displaystyle\geq\sup_{1\leq j\leq K-1}\frac{1}{4}\exp(-D_{\mathsf{KL}}(\mathbb{P}_{\theta_{0}},\mathbb{P}_{\theta_{j}}))
sup1jK114exp(l=j+1K𝔼θ0[Tj,l]D𝖪𝖫(p1p)).\displaystyle\geq\sup_{1\leq j\leq K-1}\frac{1}{4}\exp(-\sum_{l=j+1}^{K}\mathbb{E}_{\theta_{0}}[T_{j,l}]\cdot D_{\mathsf{KL}}(p\|1-p)).

Here the last inequality uses Lemma A.3 together with the fact that only the comparisons between the jj-th position and the positions l>jl>j have different outcome distributions under θ0\theta_{0} and θj\theta_{j}. Now since i,j[K],i<j𝔼θ0[Ti,j]=T\sum_{i,j\in[K],i<j}\mathbb{E}_{\theta_{0}}[T_{i,j}]=T, there must exist some jj such that l=j+1K𝔼θ0[Tj,l]T/K\sum_{l=j+1}^{K}\mathbb{E}_{\theta_{0}}[T_{j,l}]\leq T/K. This gives

infμ^supθ[0,1]K(μ^𝖬𝖠𝖷(θ))\displaystyle\inf_{\hat{\mu}}\sup_{\theta\in[0,1]^{K}}\mathbb{P}(\hat{\mu}\neq\mathsf{MAX}(\theta)) 14exp(TD𝖪𝖫(p1p)K).\displaystyle\geq\frac{1}{4}\cdot\exp\left(-\frac{T\cdot D_{\mathsf{KL}}(p\|1-p)}{K}\right).

On the other hand, Ω(K)\Omega(K) is a trivial lower bound on the query complexity, since every element must take part in at least one comparison; otherwise it cannot be ruled out as the maximum. Thus we arrive at a lower bound of Ω(max(K,Klog(1/δ)/D𝖪𝖫(p1p)))\Omega(\max(K,K\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p))). Note that this is equivalent to Ω(K/(1H(p))+Klog(1/δ)/D𝖪𝖫(p1p))\Omega(K/(1-H(p))+K\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p)) up to a constant factor when δ<0.49\delta<0.49: when pp is bounded away from 0, the ratio D𝖪𝖫(p1p)/(1H(p))D_{\mathsf{KL}}(p\|1-p)/(1-H(p)) is bounded above and below by universal constants, and when pp is close to 0, 1H(p)1-H(p) is within a constant factor of 11. ∎

Appendix C Proof for Theorem III.2

Proof.

Consider an arbitrary sequence 0<X1<X2<<XK<10<X_{1}<X_{2}<\cdots<X_{K}<1. Since i,j[K],i<j𝔼θ0[Ti,j]=T\sum_{i,j\in[K],i<j}\mathbb{E}_{\theta_{0}}[T_{i,j}]=T, there must exist some pair (i,j)(i,j) such that 𝔼θ0[Ti,j]2T/(K(K1))4T/K2\mathbb{E}_{\theta_{0}}[T_{i,j}]\leq 2T/(K(K-1))\leq 4T/K^{2}, where the last inequality holds for K2K\geq 2. Now we construct

θ0=(X1,X2,X3,,XK,,XK1,,XK2),\theta_{0}=(X_{1},X_{2},X_{3},\cdots,X_{K},\cdots,X_{K-1},\cdots,X_{K-2}),

where XKX_{K} lies in the ii-th position and XK1X_{K-1} lies in the jj-th position, and

θ1=(X1,X2,X3,,XK1,,XK,,XK2),\theta_{1}=(X_{1},X_{2},X_{3},\cdots,X_{K-1},\cdots,X_{K},\cdots,X_{K-2}),

where XKX_{K} lies in the jj-th position and XK1X_{K-1} lies in the ii-th position. Thus 𝖬𝖠𝖷(θ0)=i\mathsf{MAX}(\theta_{0})=i, 𝖬𝖠𝖷(θ1)=j\mathsf{MAX}(\theta_{1})=j. From Le Cam’s two point lemma, we know that

infμ^supθ[0,1]K(μ^𝖬𝖠𝖷(θ))\displaystyle\inf_{\hat{\mu}}\sup_{\theta\in[0,1]^{K}}\mathbb{P}(\hat{\mu}\neq\mathsf{MAX}(\theta))
12(1𝖳𝖵(θ0,θ1))\displaystyle\geq\frac{1}{2}(1-\mathsf{TV}(\mathbb{P}_{\theta_{0}},\mathbb{P}_{\theta_{1}}))
14exp(𝔼θ0[Ti,j]D𝖪𝖫(p1p))\displaystyle\geq\frac{1}{4}\exp(-\mathbb{E}_{\theta_{0}}[T_{i,j}]\cdot D_{\mathsf{KL}}(p\|1-p))
14exp(4TD𝖪𝖫(p1p)K2).\displaystyle\geq\frac{1}{4}\cdot\exp\left(-\frac{4T\cdot D_{\mathsf{KL}}(p\|1-p)}{K^{2}}\right).

On the other hand, Ω(K2)\Omega(K^{2}) is a trivial lower bound on the query complexity in the non-adaptive setting, since every pair of elements must be compared at least once: if a pair is never compared, the two instances in which that pair holds the two largest values (in either order) induce the same distribution of observations. Thus we arrive at a lower bound of Ω(max(K2,K2log(1/δ)/D𝖪𝖫(p1p)))\Omega(\max(K^{2},K^{2}\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p))). Note that this is equivalent to Ω(K2/(1H(p))+K2log(1/δ)/D𝖪𝖫(p1p))\Omega(K^{2}/(1-H(p))+K^{2}\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p)) up to a constant factor when δ<0.49\delta<0.49. ∎

Appendix D Proof for Theorem IV.1

Proof.

We begin with the first half, i.e.

infμ^supX(μ^𝖲𝖤𝖠𝖱𝖢𝖧(X))14exp(TD𝖪𝖫(p1p)).\displaystyle\inf_{\hat{\mu}}\sup_{X}\mathbb{P}(\hat{\mu}\neq\mathsf{SEARCH}(X))\geq\frac{1}{4}\cdot\exp\left(-{T\cdot D_{\mathsf{KL}}(p\|1-p)}\right).

To see this, simply consider the case of K=1K=1, and we need to determine whether X<X1X<X_{1} or X>X1X>X_{1} from their pairwise comparisons. We consider two instances X(0),X(1)X^{(0)},X^{(1)}, where X(0)<X1<X(1)X^{(0)}<X_{1}<X^{(1)}. From Le Cam’s lemma, we have

infμ^supX(μ^𝖲𝖤𝖠𝖱𝖢𝖧(X))\displaystyle\inf_{\hat{\mu}}\sup_{X}\mathbb{P}(\hat{\mu}\neq\mathsf{SEARCH}(X)) 12(1𝖳𝖵(X(0),X(1)))\displaystyle\geq\frac{1}{2}(1-\mathsf{TV}(\mathbb{P}_{X^{(0)}},\mathbb{P}_{X^{(1)}}))
14exp(TD𝖪𝖫(p1p)).\displaystyle\geq\frac{1}{4}\exp(-T\cdot D_{\mathsf{KL}}(p\|1-p)).

Next, we aim to prove the second half, namely

infμ^supX(μ^𝖲𝖤𝖠𝖱𝖢𝖧(X))\displaystyle\inf_{\hat{\mu}}\sup_{X}\mathbb{P}(\hat{\mu}\neq\mathsf{SEARCH}(X))
\displaystyle\geq 12(1T(1H(p))+log(2)log(K)).\displaystyle\frac{1}{2}\cdot\left(1-\frac{T\cdot(1-H(p))+\log(2)}{\log(K)}\right).

Now we design KK instances X(0),,X(K1)X^{(0)},\cdots,X^{(K-1)}. We let X(0)<X1X^{(0)}<X_{1}, and X(l)(Xl,Xl+1)X^{(l)}\in(X_{l},X_{l+1}) for l[K1]l\in[K-1]. Since the worst-case error is at least the average error over these KK instances, we have

infμ^supX(μ^𝖲𝖤𝖠𝖱𝖢𝖧(X))\displaystyle\inf_{\hat{\mu}}\sup_{X}\mathbb{P}(\hat{\mu}\neq\mathsf{SEARCH}(X))
infμ^supX{X(l)}l=0K1(μ^𝖲𝖤𝖠𝖱𝖢𝖧(X))\displaystyle\geq\inf_{\hat{\mu}}\sup_{X\in\{X^{(l)}\}_{l=0}^{K-1}}\mathbb{P}(\hat{\mu}\neq\mathsf{SEARCH}(X))
infΨ12Kl=0K1X(l)(Ψl).\displaystyle\geq\inf_{\Psi}\frac{1}{2K}\sum_{l=0}^{K-1}\mathbb{P}_{X^{(l)}}(\Psi\neq l).

Now by Fano’s inequality, we have

infΨ12Kl=0K1X(l)(Ψl)\displaystyle\inf_{\Psi}\frac{1}{2K}\sum_{l=0}^{K-1}\mathbb{P}_{X^{(l)}}(\Psi\neq l)
12(1I(V;Y)+log(2)log(K)).\displaystyle\geq\frac{1}{2}\cdot\left(1-\frac{I(V;Y)+\log(2)}{\log(K)}\right).

Here V𝖴𝗇𝗂𝖿({0,1,,K1})V\sim\mathsf{Unif}(\{0,1,\cdots,K-1\}) and YY satisfies Y|V=l=X(l)\mathbb{P}_{Y|V=l}=\mathbb{P}_{X^{(l)}}. Following the same argument as in the proof of Theorem 3 in Wang et al. (2022), we know that I(V;Y)T(1H(p))I(V;Y)\leq T\cdot(1-H(p)). Thus overall we have

infμ^supX(μ^𝖲𝖤𝖠𝖱𝖢𝖧(X))12(1T(1H(p))+log(2)log(K)).\displaystyle\inf_{\hat{\mu}}\sup_{X}\mathbb{P}(\hat{\mu}\neq\mathsf{SEARCH}(X))\geq\frac{1}{2}\cdot\left(1-\frac{T\cdot(1-H(p))+\log(2)}{\log(K)}\right).

Thus the number of queries required to recover the true value with probability at least 1δ1-\delta is lower bounded by Ω(log(K)/(1H(p))+log(1/δ)/D𝖪𝖫(p1p))\Omega(\log(K)/(1-H(p))+\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p)). ∎

Appendix E Proof for Theorem IV.3

Proof.

Let TiT_{i} be the random variable that denotes the number of times XX is compared with XiX_{i}. Since i[K]𝔼[Ti]=T\sum_{i\in[K]}\mathbb{E}[T_{i}]=T, there must exist some ii such that 𝔼[Ti]T/K\mathbb{E}[T_{i}]\leq T/K. Now we construct the first instance X(0)(Xi1,Xi)X^{(0)}\in(X_{i-1},X_{i}) and the second instance X(1)(Xi,Xi+1)X^{(1)}\in(X_{i},X_{i+1}), with the convention X0=0X_{0}=0 and XK+1=1X_{K+1}=1. From Le Cam’s two point lemma, we know that

infμ^supX(μ^𝖲𝖤𝖠𝖱𝖢𝖧(X))\displaystyle\inf_{\hat{\mu}}\sup_{X}\mathbb{P}(\hat{\mu}\neq\mathsf{SEARCH}(X))
12(1𝖳𝖵(X(0),X(1)))\displaystyle\geq\frac{1}{2}(1-\mathsf{TV}(\mathbb{P}_{X^{(0)}},\mathbb{P}_{X^{(1)}}))
14exp(𝔼[Ti]D𝖪𝖫(p1p))\displaystyle\geq\frac{1}{4}\exp(-\mathbb{E}[T_{i}]\cdot D_{\mathsf{KL}}(p\|1-p))
14exp(TD𝖪𝖫(p1p)K).\displaystyle\geq\frac{1}{4}\cdot\exp\left(-\frac{T\cdot D_{\mathsf{KL}}(p\|1-p)}{K}\right).

On the other hand, Ω(K)\Omega(K) is a trivial lower bound on the query complexity in the non-adaptive setting, since XX must be compared with each XiX_{i} at least once; otherwise, the two instances lying on either side of an unqueried XiX_{i} induce the same distribution of observations. Thus we arrive at a lower bound of Ω(max(K,Klog(1/δ)/D𝖪𝖫(p1p)))\Omega(\max(K,K\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p))). ∎

Appendix F Proof for Theorem V.1

Proof.

In Wang et al. (2023), it is proven via Fano’s inequality that

infμ^supθ[0,1]K(μ^𝖲𝖮𝖱𝖳(θ))12(1T(1H(p))Klog(K)).\displaystyle\inf_{\hat{\mu}}\sup_{\theta\in[0,1]^{K}}\mathbb{P}(\hat{\mu}\neq\mathsf{SORT}(\theta))\geq\frac{1}{2}\cdot\left(1-\frac{T\cdot(1-H(p))}{K\log(K)}\right).

So it suffices to prove that

infμ^supθ[0,1]K(μ^𝖲𝖮𝖱𝖳(θ))14exp(TD𝖪𝖫(p1p)K).\displaystyle\inf_{\hat{\mu}}\sup_{\theta\in[0,1]^{K}}\mathbb{P}(\hat{\mu}\neq\mathsf{SORT}(\theta))\geq\frac{1}{4}\cdot\exp\left(-\frac{T\cdot D_{\mathsf{KL}}(p\|1-p)}{K}\right).

To see this, consider an arbitrary sequence 0<X1<X2<<XK<10<X_{1}<X_{2}<\cdots<X_{K}<1. Now we design KK instances θ0,,θK1\theta_{0},\cdots,\theta_{K-1}. We let θ0=(X1,,XK)\theta_{0}=(X_{1},\cdots,X_{K}), and θl\theta_{l} be the instance that swaps the element XlX_{l} with Xl+1X_{l+1}, where l[K1]l\in[K-1]. From Le Cam’s two point lemma, we have

infμ^supθ[0,1]K(μ^𝖲𝖮𝖱𝖳(θ))\displaystyle\inf_{\hat{\mu}}\sup_{\theta\in[0,1]^{K}}\mathbb{P}(\hat{\mu}\neq\mathsf{SORT}(\theta))
sup1jK112(1𝖳𝖵(θ0,θj))\displaystyle\geq\sup_{1\leq j\leq K-1}\frac{1}{2}(1-\mathsf{TV}(\mathbb{P}_{\theta_{0}},\mathbb{P}_{\theta_{j}}))
sup1jK114exp(D𝖪𝖫(θ0,θj))\displaystyle\geq\sup_{1\leq j\leq K-1}\frac{1}{4}\exp(-D_{\mathsf{KL}}(\mathbb{P}_{\theta_{0}},\mathbb{P}_{\theta_{j}}))
sup1jK114exp(𝔼θ0[Tj,j+1]D𝖪𝖫(p1p)).\displaystyle\geq\sup_{1\leq j\leq K-1}\frac{1}{4}\exp(-\mathbb{E}_{\theta_{0}}[T_{j,j+1}]\cdot D_{\mathsf{KL}}(p\|1-p)).

Let Ti,jT_{i,j} be the random variable that represents the number of comparisons between the ii-th item and the jj-th item in the TT rounds. Now since 1jK1𝔼θ0[Tj,j+1]T\sum_{1\leq j\leq K-1}\mathbb{E}_{\theta_{0}}[T_{j,j+1}]\leq T, there must exist some jj such that 𝔼θ0[Tj,j+1]T/K\mathbb{E}_{\theta_{0}}[T_{j,j+1}]\leq T/K. This gives

infμ^supθ[0,1]K(μ^𝖲𝖮𝖱𝖳(θ))\displaystyle\inf_{\hat{\mu}}\sup_{\theta\in[0,1]^{K}}\mathbb{P}(\hat{\mu}\neq\mathsf{SORT}(\theta)) 14exp(TD𝖪𝖫(p1p)K).\displaystyle\geq\frac{1}{4}\cdot\exp\left(-\frac{T\cdot D_{\mathsf{KL}}(p\|1-p)}{K}\right).

Altogether, this shows a lower bound of Ω(Klog(K)/(1H(p))+Klog(1/δ)/D𝖪𝖫(p1p))\Omega(K\log(K)/(1-H(p))+K\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p)) on the query complexity. Note that this is equivalent to Ω(Klog(K)/(1H(p))+Klog(K/δ)/D𝖪𝖫(p1p))\Omega(K\log(K)/(1-H(p))+K\log(K/\delta)/D_{\mathsf{KL}}(p\|1-p)) up to a constant factor, since 1/(1H(p))1/D𝖪𝖫(p1p)1/(1-H(p))\gtrsim 1/D_{\mathsf{KL}}(p\|1-p). ∎

Appendix G Proof for Theorem V.3

Proof.

We first select an arbitrary sequence 0<X1<X2<<XK<10<X_{1}<X_{2}<\cdots<X_{K}<1. For i[K!]i\in[K!], let σi:[K][K]\sigma_{i}:[K]\mapsto[K] be the ii-th permutation of [K][K], where σi(k)\sigma_{i}(k) denotes the index placed at the kk-th position under permutation σi\sigma_{i}. Now we consider the summation i[K!],j[K1]𝔼[Tσi(j),σi(j+1)]\sum_{i\in[K!],j\in[K-1]}\mathbb{E}[T_{\sigma_{i}(j),\sigma_{i}(j+1)}]. For each pair of items (a,b)(a,b) with a<ba<b, the term 𝔼[Ta,b]\mathbb{E}[T_{a,b}] is counted 2(K1)!2(K-1)! times in this summation, since exactly 2(K1)!2(K-1)! of the K!K! permutations place aa and bb in adjacent positions. Furthermore, we know that a<b𝔼[Ta,b]=T\sum_{a<b}\mathbb{E}[T_{a,b}]=T. Thus we have

i[K!],j[K1]𝔼[Tσi(j),σi(j+1)]=2T(K1)!\displaystyle\sum_{i\in[K!],j\in[K-1]}\mathbb{E}[T_{\sigma_{i}(j),\sigma_{i}(j+1)}]=2T(K-1)!

Hence there must exist some ii such that

j[K1]𝔼[Tσi(j),σi(j+1)]2T(K1)!K!=2TK.\displaystyle\sum_{j\in[K-1]}\mathbb{E}[T_{\sigma_{i}(j),\sigma_{i}(j+1)}]\leq\frac{2T(K-1)!}{K!}=\frac{2T}{K}.
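The counting argument above can be verified by brute force for small K; the snippet below is an added illustration (with K = 5 chosen arbitrarily), confirming that every unordered pair of items is adjacent in exactly 2(K-1)! of the K! permutations.

```python
import itertools
import math

# Brute-force check (small K) that each unordered pair of items is adjacent
# in exactly 2*(K-1)! of the K! permutations, as used in the argument above.
K = 5
adjacency_counts = {}
for perm in itertools.permutations(range(K)):
    for j in range(K - 1):
        pair = frozenset((perm[j], perm[j + 1]))
        adjacency_counts[pair] = adjacency_counts.get(pair, 0) + 1

print(set(adjacency_counts.values()))   # {48}
print(2 * math.factorial(K - 1))        # 48
```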

Now we design KK instances θ0,,θK1\theta_{0},\cdots,\theta_{K-1}. We let θ0=(Xσi(1),,Xσi(K))\theta_{0}=(X_{\sigma_{i}(1)},\cdots,X_{\sigma_{i}(K)}), and θl\theta_{l} be the instance that swaps the element Xσi(l)X_{\sigma_{i}(l)} with Xσi(l+1)X_{\sigma_{i}(l+1)} in θ0\theta_{0}, where l[K1]l\in[K-1]. Since the worst-case error is at least the average error over these KK instances, we have

infμ^supθ[0,1]K(μ^𝖲𝖮𝖱𝖳(θ))\displaystyle\inf_{\hat{\mu}}\sup_{\theta\in[0,1]^{K}}\mathbb{P}(\hat{\mu}\neq\mathsf{SORT}(\theta))
infμ^supθ{θl}l[K](μ^𝖲𝖮𝖱𝖳(θ))\displaystyle\geq\inf_{\hat{\mu}}\sup_{\theta\in\{\theta_{l}\}_{l\in[K]}}\mathbb{P}(\hat{\mu}\neq\mathsf{SORT}(\theta))
infΨ12Kl[K]θl(Ψl).\displaystyle\geq\inf_{\Psi}\frac{1}{2K}\sum_{l\in[K]}\mathbb{P}_{\theta_{l}}(\Psi\neq l).

Now by Fano’s inequality, we have

infΨ12Kl[K]θl(Ψl)\displaystyle\inf_{\Psi}\frac{1}{2K}\sum_{l\in[K]}\mathbb{P}_{\theta_{l}}(\Psi\neq l)
12(1I(V;X)+log(2)log(K))\displaystyle\geq\frac{1}{2}\cdot\left(1-\frac{I(V;X)+\log(2)}{\log(K)}\right)
=12(1l[K]D𝖪𝖫(l,¯)/K+log(2)log(K)).\displaystyle=\frac{1}{2}\cdot\left(1-\frac{\sum_{l\in[K]}D_{\mathsf{KL}}(\mathbb{P}^{l},\bar{\mathbb{P}})/K+\log(2)}{\log(K)}\right).

Here ¯=1Kl[K]l\bar{\mathbb{P}}=\frac{1}{K}\sum_{l\in[K]}\mathbb{P}^{l}. We can compute

D𝖪𝖫(l,¯)\displaystyle D_{\mathsf{KL}}(\mathbb{P}^{l},\bar{\mathbb{P}})
𝔼[Tσi(l),σi(l+1)]D𝖪𝖫(1p(K1)p+(1p)K)\displaystyle\leq\mathbb{E}[T_{\sigma_{i}(l),\sigma_{i}(l+1)}]D_{\mathsf{KL}}\left(1-p\|\frac{(K-1)p+(1-p)}{K}\right)
+ml𝔼[Tσi(m),σi(m+1)]D𝖪𝖫(p(K1)p+(1p)K)\displaystyle+\sum_{m\neq l}\mathbb{E}[T_{\sigma_{i}(m),\sigma_{i}(m+1)}]D_{\mathsf{KL}}\left(p\|\frac{(K-1)p+(1-p)}{K}\right)
K1K𝔼[Tσi(l),σi(l+1)]D𝖪𝖫(p1p)\displaystyle\leq\frac{K-1}{K}\cdot\mathbb{E}[T_{\sigma_{i}(l),\sigma_{i}(l+1)}]D_{\mathsf{KL}}\left(p\|1-p\right)
+1Kml𝔼[Tσi(m),σi(m+1)]D𝖪𝖫(p1p).\displaystyle+\frac{1}{K}\sum_{m\neq l}\mathbb{E}[T_{\sigma_{i}(m),\sigma_{i}(m+1)}]D_{\mathsf{KL}}\left(p\|1-p\right).

Now summing over all ll, we have

1Kl[K]D𝖪𝖫(l,¯)\displaystyle\frac{1}{K}\sum_{l\in[K]}D_{\mathsf{KL}}(\mathbb{P}^{l},\bar{\mathbb{P}}) 2D𝖪𝖫(p1p)Kl[K]𝔼[Tσi(l),σi(l+1)]\displaystyle\leq\frac{2D_{\mathsf{KL}}\left(p\|1-p\right)}{K}\cdot\sum_{l\in[K]}\mathbb{E}[T_{\sigma_{i}(l),\sigma_{i}(l+1)}]
4TD𝖪𝖫(p1p)K2.\displaystyle\leq\frac{4TD_{\mathsf{KL}}\left(p\|1-p\right)}{K^{2}}.

This gives

infμ^supθ[0,1]K(μ^𝖲𝖮𝖱𝖳(θ))\displaystyle\inf_{\hat{\mu}}\sup_{\theta\in[0,1]^{K}}\mathbb{P}(\hat{\mu}\neq\mathsf{SORT}(\theta)) 12(14TD𝖪𝖫(p1p)/K2+log(2)log(K)).\displaystyle\geq\frac{1}{2}\cdot\left(1-\frac{4T\cdot D_{\mathsf{KL}}(p\|1-p)/K^{2}+\log(2)}{\log(K)}\right).

This shows that to output the correct answer with probability at least 2/32/3, one needs at least CK2log(K)/D𝖪𝖫(p1p)C\cdot K^{2}\log(K)/D_{\mathsf{KL}}(p\|1-p) queries for some universal constant CC. ∎

Appendix H Upper Bounds for Non-adaptive Sampling

In this section, we present a theorem on the upper bounds for non-adaptive learning in the worst-case query model.

Theorem H.1.

One can design non-adaptive algorithms such that the worst-case query complexity is

  1. 𝒪(Klog(K/δ)1H(p))\mathcal{O}(\frac{K\log(K/\delta)}{1-H(p)}) for computing 𝖮𝖱\mathsf{OR};

  2. 𝒪(K2log(K/δ)1H(p))\mathcal{O}(\frac{K^{2}\log(K/\delta)}{1-H(p)}) for computing 𝖬𝖠𝖷\mathsf{MAX};

  3. 𝒪(Klog(1/δ)1H(p))\mathcal{O}(\frac{K\log(1/\delta)}{1-H(p)}) for computing 𝖲𝖤𝖠𝖱𝖢𝖧\mathsf{SEARCH};

  4. 𝒪(K2log(K/δ)1H(p))\mathcal{O}(\frac{K^{2}\log(K/\delta)}{1-H(p)}) for computing 𝖲𝖮𝖱𝖳\mathsf{SORT}.

The results for 𝖮𝖱\mathsf{OR}, 𝖬𝖠𝖷\mathsf{MAX} and 𝖲𝖮𝖱𝖳\mathsf{SORT} are based on the simple algorithm that queries each element (for 𝖮𝖱\mathsf{OR}) or each pair of elements (for 𝖬𝖠𝖷\mathsf{MAX} and 𝖲𝖮𝖱𝖳\mathsf{SORT}) an equal number of times, and the analysis is a direct union bound argument. Here we only present the algorithm and analysis for 𝖲𝖤𝖠𝖱𝖢𝖧\mathsf{SEARCH}.

Proof.

Assume that the target XX lies between XlX_{l} and Xl+1X_{l+1}. Consider the non-adaptive learning algorithm which compares XX with each element XiX_{i} for Ti=T/K=4log(1/δ)/(1H(p))T_{i}=\lfloor T/K\rfloor=4\log(1/\delta)/(1-H(p)) times.

Let NiN_{i} be the number of observations of 11 among the TiT_{i} queries involving element XiX_{i}. Consider the following estimator:

l^=argmaxl[K]i=1lNi+i=l+1K(TiNi).\displaystyle\hat{l}=\operatorname*{arg\,max}_{l\in[K]}\sum_{i=1}^{l}N_{i}+\sum_{i=l+1}^{K}(T_{i}-N_{i}).

We show that with high probability, l^=l\hat{l}=l. We have

(l^l)\displaystyle\mathbb{P}(\hat{l}\neq l) jl(l^=j)\displaystyle\leq\sum_{j\neq l}\mathbb{P}(\hat{l}=j)
jl(i=1lNi+i=l+1K(TiNi)(i=1jNi+i=j+1K(TiNi))<0).\displaystyle\leq\sum_{j\neq l}\mathbb{P}\Big{(}\sum_{i=1}^{l}N_{i}+\sum_{i=l+1}^{K}(T_{i}-N_{i})-(\sum_{i=1}^{j}N_{i}+\sum_{i=j+1}^{K}(T_{i}-N_{i}))<0\Big{)}.

Now we bound the above probability for each jj. Without loss of generality, we assume that j>lj>l. The above probability can be written as

(i=1lNi+i=l+1K(TiNi)<i=1jNi+i=j+1K(TiNi))\displaystyle\mathbb{P}\Big{(}\sum_{i=1}^{l}N_{i}+\sum_{i=l+1}^{K}(T_{i}-N_{i})<\sum_{i=1}^{j}N_{i}+\sum_{i=j+1}^{K}(T_{i}-N_{i})\Big{)}
=\displaystyle= (i=l+1j(Ti2Ni)<0)\displaystyle\mathbb{P}\Big{(}\sum_{i=l+1}^{j}(T_{i}-2N_{i})<0\Big{)}
=\displaystyle= (i=l+1jNi>12(jl)T/K)\displaystyle\mathbb{P}\Big{(}\sum_{i=l+1}^{j}N_{i}>\frac{1}{2}(j-l)\lfloor T/K\rfloor\Big{)}

Note that i=l+1jNi𝖡𝗂𝗇((jl)T/K,p)\sum_{i=l+1}^{j}N_{i}\sim\mathsf{Bin}((j-l)\lfloor T/K\rfloor,p). Let n=(jl)T/Kn=(j-l)\lfloor T/K\rfloor. From Lemma A.6 we have

(i=l+1jNin/2)\displaystyle\mathbb{P}(\sum_{i=l+1}^{j}N_{i}\geq n/2) (e12p2p(1/2p)(1/2p))np=(2pexp(12p))n/2\displaystyle\leq\left(\frac{e^{\frac{1-2p}{2p}}}{(1/2p)^{(1/2p)}}\right)^{np}=(2p\cdot\exp(1-2p))^{n/2}
<exp(log(1/δ)2(jl)(12p+log(2p))(12p)2)<δjl.\displaystyle<\exp\left(\log(1/\delta)\cdot{\frac{2(j-l)(1-2p+\log(2p))}{(1-2p)^{2}}}\right)<\delta^{j-l}.

Now by summing over the probability for different jsj^{\prime}s, we get

(l^l)jlδ|jl|<2δ1δ.\displaystyle\mathbb{P}(\hat{l}\neq l)\leq\sum_{j\neq l}\delta^{|j-l|}<\frac{2\delta}{1-\delta}.

Rescaling δ\delta finishes the proof. ∎
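For concreteness, the following is a minimal simulation sketch of the non-adaptive search procedure analyzed above. It is an added illustration (the function name, parameters, and example values are ours, not from the paper): it assumes a noisy oracle answering whether X exceeds each threshold, with every answer flipped independently with probability p, and it maximizes the score over all K+1 candidate positions.

```python
import random

def noisy_search(thresholds, x, p, reps):
    """Non-adaptive noisy search sketch: compare x with every threshold `reps`
    times, then return the split point l maximizing
    sum_{i <= l} N_i + sum_{i > l} (reps - N_i)."""
    K = len(thresholds)
    N = []                                # N[i] = observed 1's for "x > thresholds[i]"
    for t in thresholds:
        truth = 1 if x > t else 0
        flips = sum(random.random() < p for _ in range(reps))
        N.append(reps - flips if truth == 1 else flips)

    def score(l):                         # l = claimed number of thresholds below x
        return sum(N[:l]) + sum(reps - N[i] for i in range(l, K))

    return max(range(K + 1), key=score)

# x = 0.6 lies between the 3rd and 4th thresholds, so the answer is l = 3.
random.seed(0)
print(noisy_search([0.1, 0.3, 0.5, 0.7, 0.9], x=0.6, p=0.2, reps=30))  # typically 3
```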

Appendix I Upper Bounds for Adaptive Sampling

Here we present the tournament algorithm for computing 𝖮𝖱\mathsf{OR} introduced in (Feige et al., 1994). A similar algorithm can also be applied to compute 𝖬𝖠𝖷\mathsf{MAX}. The main difference is that in 𝖬𝖠𝖷\mathsf{MAX}, we directly compare two elements 4(2i1)log(1/δ)(1H(p))\lceil\frac{4(2i-1)\log(1/\delta)}{(1-H(p))}\rceil times instead of comparing their numbers of 11^{\prime}s.

Algorithm 2 Tournament for computing 𝖮𝖱\mathsf{OR} with noise
1:Input: Target confidence level δ\delta.
2:Set 𝒳=(X1,X2,,XK)\mathcal{X}=(X_{1},X_{2},\cdots,X_{K}) as the list of all bits with unknown value.
3:for iteration i=1:log2(K)i=1:\lceil\log_{2}(K)\rceil do
4:     Query each element in 𝒳\mathcal{X} 4(2i1)log(1/δ)(1H(p))\lceil\frac{4(2i-1)\log(1/\delta)}{(1-H(p))}\rceil times.
5:     for iteration j=1:|𝒳|/2j=1:\lceil|\mathcal{X}|/2\rceil do
6:         Compare the numbers of 11’s observed for the (2j1)(2j-1)-th element and the (2j)(2j)-th element, and remove the element with the smaller number of 11’s from the list 𝒳\mathcal{X}. Ties are broken arbitrarily. (If the (2j)(2j)-th element does not exist, we do not remove the (2j1)(2j-1)-th element.)
7:     Break when 𝒳\mathcal{X} only has one element left.
8:Query the only remaining element in 𝒳\mathcal{X} 6log(1/δ)(1H(p))\lceil\frac{6\log(1/\delta)}{(1-H(p))}\rceil times. Return 11 if more than half of the observations are 11’s, and 0 otherwise.
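For illustration, a minimal simulation sketch of Algorithm 2 is given below. This is our own restatement in code (not from Feige et al. (1994)); noisy_read models a single query whose answer is flipped with probability p, ties are broken toward the first element of each pair, and p is assumed to lie in (0, 1/2).

```python
import math
import random

def noisy_read(bit, p):
    """One noisy query: return `bit` flipped with probability p."""
    return bit ^ (random.random() < p)

def tournament_or(bits, p, delta):
    """Simulation sketch of Algorithm 2 (tournament for OR with noisy queries)."""
    c = 1.0 + p * math.log2(p) + (1 - p) * math.log2(1 - p)   # 1 - H(p), needs 0 < p < 1/2
    survivors = list(bits)
    for i in range(1, math.ceil(math.log2(len(bits))) + 1):
        reps = math.ceil(4 * (2 * i - 1) * math.log(1 / delta) / c)
        counts = [sum(noisy_read(b, p) for _ in range(reps)) for b in survivors]
        nxt = []
        for j in range(0, len(survivors), 2):
            if j + 1 == len(survivors):        # unpaired element is kept
                nxt.append(survivors[j])
            elif counts[j] >= counts[j + 1]:   # keep the element with more observed 1's
                nxt.append(survivors[j])       # (ties broken toward the first element)
            else:
                nxt.append(survivors[j + 1])
        survivors = nxt
        if len(survivors) == 1:
            break
    reps = math.ceil(6 * math.log(1 / delta) / c)
    ones = sum(noisy_read(survivors[0], p) for _ in range(reps))
    return int(ones > reps / 2)

random.seed(1)
print(tournament_or([0, 0, 1, 0, 0, 0, 0, 0], p=0.2, delta=0.05))  # typically prints 1
```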

The following theorem is due to Feige et al. (1994). We include its proof here for completeness.

Theorem I.1.

Algorithm 2 finishes within CKlog(1/δ)/(1H(p))C\cdot K\log(1/\delta)/(1-H(p)) queries, and outputs the correct value of 𝖮𝖱(X1,,XK)\mathsf{OR}(X_{1},\cdots,X_{K}) with probability at least 12δ1-2\delta when δ<1/2\delta<1/2.

Proof.

First, we compute the total number of queries of the algorithm. Without loss of generality, we may assume that KK can be written as 2m2^{m} for some integer mm. If not, we may add no more than KK extra dummy 0’s to the original list 𝒳\mathcal{X} to make sure K=2mK=2^{m}. In each outer iteration ii, the size of 𝒳\mathcal{X} is halved, so after log2(K)\lceil\log_{2}(K)\rceil rounds the set 𝒳\mathcal{X} only contains one element. In round ii, Algorithm 2 queries each remaining element 4(2i1)log(1/δ)(1H(p))\lceil\frac{4(2i-1)\log(1/\delta)}{(1-H(p))}\rceil times; by fact (5) below, this agrees with 4(2i1)log(1/δ)(12p)2\lceil\frac{4(2i-1)\log(1/\delta)}{(1-2p)^{2}}\rceil up to a factor of 22 (and similarly for the final verification step), so it suffices to bound the total count computed with the latter quantity, which is

4log(1/δ)(12p)2+i=1log2(K)4(2i1)log(1/δ)(12p)2K2i1\displaystyle\left\lceil\frac{4\log(1/\delta)}{(1-2p)^{2}}\right\rceil+\sum_{i=1}^{\lceil\log_{2}(K)\rceil}\left\lceil\frac{4(2i-1)\log(1/\delta)}{(1-2p)^{2}}\right\rceil\cdot\frac{K}{2^{i-1}}
\displaystyle\leq 4log(1/δ)(12p)2+i=1log2(K)(4(2i1)log(1/δ)(12p)2+1)K2i1\displaystyle\left\lceil\frac{4\log(1/\delta)}{(1-2p)^{2}}\right\rceil+\sum_{i=1}^{\lceil\log_{2}(K)\rceil}\left(\frac{4(2i-1)\log(1/\delta)}{(1-2p)^{2}}+1\right)\cdot\frac{K}{2^{i-1}}
\displaystyle\leq 4log(1/δ)(12p)2+2K+Klog(1/δ)(12p)2i=1log2(K)4(2i1)2i1\displaystyle\left\lceil\frac{4\log(1/\delta)}{(1-2p)^{2}}\right\rceil+2K+\frac{K\log(1/\delta)}{(1-2p)^{2}}\sum_{i=1}^{\lceil\log_{2}(K)\rceil}\frac{4(2i-1)}{2^{i-1}}
\displaystyle\leq 4log(1/δ)(12p)2+2K+28Klog(1/δ)(12p)2\displaystyle\left\lceil\frac{4\log(1/\delta)}{(1-2p)^{2}}\right\rceil+2K+\frac{28K\log(1/\delta)}{(1-2p)^{2}}
\displaystyle\leq CKlog(1/δ)1H(p)\displaystyle\frac{CK\log(1/\delta)}{1-H(p)}

Here in the last inequality we use the following fact, which can be verified numerically:

p[0,1],(1/2p)21H(p)[1/4,1/2].\displaystyle\forall p\in[0,1],\frac{(1/2-p)^{2}}{1-H(p)}\in[1/4,1/2]. (5)
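A minimal numerical check of this fact (and of the identical fact (6) in Appendix J) is given below; this is an added illustration, H denotes the binary entropy in bits, and the grid resolution is arbitrary.

```python
import math

def ratio(p):
    """(1/2 - p)^2 / (1 - H(p)), with H the binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.25                        # (1/4) / 1 at the endpoints
    h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return (0.5 - p) ** 2 / (1 - h)

# Evaluate on a fine grid, excluding p = 1/2 where numerator and denominator
# both vanish (the limiting ratio there is ln(2)/2, about 0.347).
grid = [i / 10000 for i in range(10001) if i != 5000]
values = [ratio(p) for p in grid]
print(min(values), max(values))            # both lie in [1/4, 1/2]
```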

Now we show that the failure probability of the algorithm is at most 2δ2\delta. Consider the first case where all the elements are 0. Then no matter which element is left in 𝒳\mathcal{X}, the probability that the algorithm fails is the probability that a Binomial random variable XB(n,p)X\sim B(n,p) takes a value larger than or equal to n/2n/2, where n=4log(1/δ)(12p)2n=\left\lceil\frac{4\log(1/\delta)}{(1-2p)^{2}}\right\rceil.

Taking λ=np\lambda=np, η=12p2p\eta=\frac{1-2p}{2p} in Equation (3) of Lemma A.6, we know that

(Xn/2)\displaystyle\mathbb{P}(X\geq n/2) (e12p2p(1/2p)(1/2p))np\displaystyle\leq\left(\frac{e^{\frac{1-2p}{2p}}}{(1/2p)^{(1/2p)}}\right)^{np}
=(2pexp(12p))n/2\displaystyle=(2p\cdot\exp(1-2p))^{n/2}
<exp(log(1/δ)2(12p+log(2p))(12p)2)\displaystyle<\exp\left(\log(1/\delta)\cdot{\frac{2(1-2p+\log(2p))}{(1-2p)^{2}}}\right)
<δ.\displaystyle<\delta.

Here the last inequality uses the fact that (12p+log(2p))(12p)2<1/2{\frac{(1-2p+\log(2p))}{(1-2p)^{2}}}<-1/2 for all p[0,1/2)p\in[0,1/2). This shows that the final failure probability is bounded by δ\delta when the true elements are all 0.

Consider the second case where there exists at least one 11 among the original elements X1,,XKX_{1},\cdots,X_{K}. Without loss of generality, we assume that X1=1X_{1}=1. Let 𝒳i\mathcal{X}^{i} be the remaining list of elements at the beginning of the ii-th iteration. We let i\mathcal{E}_{i} be the event that the first element in 𝒳i\mathcal{X}^{i} is 11 while the first element in 𝒳i+1\mathcal{X}^{i+1} is 0. This event only happens when the second element in 𝒳i\mathcal{X}^{i} is 0 and gets more 11’s in the noisy queries than the first element. Let 𝒜\mathcal{A} denote the event that the only remaining element in the last round is a 11. Then we have

(𝒜)\displaystyle\mathbb{P}(\mathcal{A}) 1(i=1log2(K)i)\displaystyle\geq 1-\mathbb{P}\left(\bigcup_{i=1}^{\lceil\log_{2}(K)\rceil}\mathcal{E}_{i}\right)
1i=1log2(K)(i)\displaystyle\geq 1-\sum_{i=1}^{\lceil\log_{2}(K)\rceil}\mathbb{P}(\mathcal{E}_{i})
1i=1log2(K)(YiXi0),\displaystyle\geq 1-\sum_{i=1}^{\lceil\log_{2}(K)\rceil}\mathbb{P}(Y_{i}-X_{i}\geq 0),

where XiB(ni,1p)X_{i}\sim B(n_{i},1-p) and YiB(ni,p)Y_{i}\sim B(n_{i},p), with ni=4(2i1)log(1/δ)(12p)2n_{i}=\lceil\frac{4(2i-1)\log(1/\delta)}{(1-2p)^{2}}\rceil. Let Zi=YiXi+niZ_{i}=Y_{i}-X_{i}+n_{i}. Then the random variable ZiB(2ni,p)Z_{i}\sim B(2n_{i},p). Thus we have

(𝒜)\displaystyle\mathbb{P}(\mathcal{A}) 1i=1log2(K)(Zini)\displaystyle\geq 1-\sum_{i=1}^{\lceil\log_{2}(K)\rceil}\mathbb{P}(Z_{i}\geq n_{i})
>1i=1log2(K)δ2(2i1)\displaystyle>1-\sum_{i=1}^{\lceil\log_{2}(K)\rceil}\delta^{2(2i-1)}
>1δ21δ2.\displaystyle>1-\frac{\delta^{2}}{1-\delta^{2}}.

Following the same Binomial tail argument as in the first case, we can upper bound the error probability of the final verification step under event 𝒜\mathcal{A} by δ\delta. Thus the total failure probability is upper bounded by δ+δ2/(1δ2)<2δ\delta+\delta^{2}/(1-\delta^{2})<2\delta when δ<1/2\delta<1/2. ∎

Appendix J Proof of Theorem VI.1

J-A Lower Bounds

First, note that all our lower bounds for the fixed-length setting can be adapted to the variable-length setting by replacing TT with 𝔼[T]\mathbb{E}[T]. The bound on the mutual information in Fano’s inequality in Appendix D can be proven using the same argument as Lemma 27 in Gu and Xu (2023), and the divergence decomposition lemma (Lemma A.3) still holds for variable-length queries due to Lemma 15 in Kaufmann et al. (2016).

J-B Upper Bounds

Now it suffices to prove the upper bounds. For 𝖮𝖱\mathsf{OR} and 𝖬𝖠𝖷\mathsf{MAX}, we already know from Theorem I.1 that 𝒪(Klog(1/δ)/(1H(p)))\mathcal{O}(K\log(1/\delta)/(1-H(p))) is an upper bound in the fixed-length setting, where comparing two elements with error probability at most δ\delta requires Clog(1/δ)/(1H(p))\lceil{C\log(1/\delta)}/{(1-H(p))}\rceil samples for some constant CC. In the variable-length setting, comparing two elements with error probability at most δ\delta only requires log(1/δ)/D𝖪𝖫(p1p)+1/(12p)\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p)+1/(1-2p) queries in expectation via the variable-length comparison algorithm in Lemma 13 of Gu and Xu (2023). Thus in Algorithm 1, we replace the repetition-based comparisons of Algorithm 2 with this variable-length comparison algorithm. This gives the query complexity 𝒪(K/(1H(p))+Klog(1/δ)/D𝖪𝖫(p1p))\mathcal{O}(K/(1-H(p))+K\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p)). A similar algorithm can also be applied to compute 𝖬𝖠𝖷\mathsf{MAX}. The main difference is that for 𝖬𝖠𝖷\mathsf{MAX}, we directly compare two elements instead of counting the number of 11^{\prime}s observed for each element.
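For intuition, a variable-length comparison of this flavor can be implemented as a sequential probability ratio test. The sketch below is an illustration only (it is not the specific procedure of Lemma 13 in Gu and Xu (2023)): it keeps querying until the log-likelihood ratio between the two hypotheses crosses a confidence threshold, which takes roughly log(1/δ)/D_KL(p‖1−p) queries in expectation.

```python
import math
import random

def sequential_compare(truth, p, delta):
    """Sequential probability ratio test distinguishing 'answers are 1 w.p. 1-p'
    from 'answers are 1 w.p. p'. Returns (decision, number of queries used)."""
    step = math.log((1 - p) / p)              # LLR increment per answer
    threshold = math.log((1 - delta) / delta)
    llr, queries = 0.0, 0
    while abs(llr) < threshold:
        answer = truth ^ (random.random() < p)   # noisy observation of `truth`
        llr += step if answer else -step
        queries += 1
    return (1 if llr > 0 else 0), queries

random.seed(3)
runs = [sequential_compare(1, p=0.3, delta=0.01) for _ in range(1000)]
avg_queries = sum(q for _, q in runs) / len(runs)
errors = sum(d != 1 for d, _ in runs)
print(avg_queries, errors)   # roughly log(1/delta)/D_KL(p||1-p) queries, about 1% errors
```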

Theorem J.1.

The expected number of total queries made by Algorithm 1 is upper bounded by C(K1H(p)+Klog(1/δ)D𝖪𝖫(p1p))C\cdot\left(\frac{K}{1-H(p)}+\frac{K\log(1/\delta)}{D_{\mathsf{KL}}(p\|1-p)}\right). Furthermore, the algorithm outputs the correct value of 𝖮𝖱(X1,,XK)\mathsf{OR}(X_{1},\cdots,X_{K}) with probability at least 12δ1-2\delta when δ<1/2\delta<1/2.

Proof.

First, we compute the total number of queries of the algorithm. Without loss of generality, we may assume that KK can be written as 2m2^{m} for some integer mm. If not, we may add no more than KK extra dummy 0’s to the original list 𝒳\mathcal{X} to make sure K=2mK=2^{m}. In each outer iteration ii, the size of 𝒳\mathcal{X} is halved, so after log2(K)\lceil\log_{2}(K)\rceil iterations the set 𝒳\mathcal{X} only contains one element. From Lemma 27 in Gu and Xu (2023), the expected number of queries made for each element at round ii is 𝒪(log(1/δ~i)/D𝖪𝖫(p1p)+1/(12p))\mathcal{O}(\log(1/\tilde{\delta}_{i})/D_{\mathsf{KL}}(p\|1-p)+1/(1-2p)). Thus the expected total number of queries is

log(1/δ)D𝖪𝖫(p1p)+112p+Ci=1log2(K)(log(1/δ~i)D𝖪𝖫(p1p)+112p)K2i1\displaystyle\frac{\log(1/\delta)}{D_{\mathsf{KL}}(p\|1-p)}+\frac{1}{1-2p}+C\cdot\sum_{i=1}^{\lceil\log_{2}(K)\rceil}\left(\frac{\log(1/\tilde{\delta}_{i})}{D_{\mathsf{KL}}(p\|1-p)}+\frac{1}{1-2p}\right)\cdot\frac{K}{2^{i-1}}
\displaystyle\leq Ci=1log2(K)((4i2)log(1/δ)D𝖪𝖫(p1p)+112p)K2i1\displaystyle C\cdot\sum_{i=1}^{\lceil\log_{2}(K)\rceil}\left(\frac{(4i-2)\log(1/\delta)}{D_{\mathsf{KL}}(p\|1-p)}+\frac{1}{1-2p}\right)\cdot\frac{K}{2^{i-1}}
\displaystyle\leq C(K12p+Klog(1/δ)D𝖪𝖫(p1p)i=1log2(K)4(2i1)2i1)\displaystyle C\cdot\left(\frac{K}{1-2p}+\frac{K\log(1/\delta)}{D_{\mathsf{KL}}(p\|1-p)}\cdot\sum_{i=1}^{\lceil\log_{2}(K)\rceil}\frac{4(2i-1)}{2^{i-1}}\right)
\displaystyle\leq C(K12p+Klog(1/δ)D𝖪𝖫(p1p))\displaystyle C\cdot\left(\frac{K}{1-2p}+\frac{K\log(1/\delta)}{D_{\mathsf{KL}}(p\|1-p)}\right)
\displaystyle\leq C(K1H(p)+Klog(1/δ)D𝖪𝖫(p1p)).\displaystyle C\cdot\left(\frac{K}{1-H(p)}+\frac{K\log(1/\delta)}{D_{\mathsf{KL}}(p\|1-p)}\right).

Here in the last inequality we use the following fact, which can be verified numerically:

p[0,1],(1/2p)21H(p)[1/4,1/2].\displaystyle\forall p\in[0,1],\frac{(1/2-p)^{2}}{1-H(p)}\in[1/4,1/2]. (6)

Now we show that the failure probability of the algorithm is at most 2δ2\delta. Consider the first case where all the elements are 0. Then no matter which element is left in 𝒳\mathcal{X}, the probability that the algorithm fails is the probability that the last while loop gives the wrong output. From Lemma 27 in Gu and Xu (2023), we know that this probability is less than δ\delta.

Consider the second case where there exists at least one 11 among the original elements X1,,XKX_{1},\cdots,X_{K}. Without loss of generality, we assume that X1=1X_{1}=1. Let 𝒳i\mathcal{X}^{i} be the remaining list of elements at the beginning of the ii-th iteration. We let i\mathcal{E}_{i} be the event that the first element in 𝒳i\mathcal{X}^{i} is 11 while the first element in 𝒳i+1\mathcal{X}^{i+1} is 0. This event only happens when the while loop ends with a<δ~ia<\tilde{\delta}_{i} for the first element at the ii-th round. Let 𝒜\mathcal{A} denote the event that the only remaining element in the last round is a 11. Then we have

(𝒜)\displaystyle\mathbb{P}(\mathcal{A}) 1(i=1log2(K)i)\displaystyle\geq 1-\mathbb{P}\left(\bigcup_{i=1}^{\lceil\log_{2}(K)\rceil}\mathcal{E}_{i}\right)
1i=1log2(K)(i)\displaystyle\geq 1-\sum_{i=1}^{\lceil\log_{2}(K)\rceil}\mathbb{P}(\mathcal{E}_{i})
>1i=1log2(K)δ2(2i1)\displaystyle>1-\sum_{i=1}^{\lceil\log_{2}(K)\rceil}\delta^{2(2i-1)}
>1δ21δ2.\displaystyle>1-\frac{\delta^{2}}{1-\delta^{2}}.

As in the first case, the error probability of the final step under event 𝒜\mathcal{A} is upper bounded by δ\delta. Thus the total failure probability is upper bounded by δ+δ2/(1δ2)<2δ\delta+\delta^{2}/(1-\delta^{2})<2\delta when δ<1/2\delta<1/2. ∎

The upper bound for 𝖲𝖤𝖠𝖱𝖢𝖧\mathsf{SEARCH} is due to Gu and Xu (2023), and is thus omitted here. The upper bound for 𝖲𝖮𝖱𝖳\mathsf{SORT} is based on that for 𝖲𝖤𝖠𝖱𝖢𝖧\mathsf{SEARCH}: we can design an insertion-based sorting algorithm that adds elements sequentially to an initially empty sorted set via noisy searching, as in Algorithm 1 in (Wang et al., 2023). Each insertion requires 𝒪(log(K)/(1H(p))+log(1/δ)/D𝖪𝖫(p1p))\mathcal{O}(\log(K)/(1-H(p))+\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p)) queries in expectation, so overall one needs 𝒪(k=1K(log(k)/(1H(p))+log(1/δ)/D𝖪𝖫(p1p)))\mathcal{O}(\sum_{k=1}^{K}(\log(k)/(1-H(p))+\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p))) =𝒪(Klog(K)/(1H(p))+Klog(1/δ)/D𝖪𝖫(p1p))=\mathcal{O}(K\log(K)/(1-H(p))+K\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p)) queries to achieve error probability at most KδK\delta by a union bound over the KK insertions. Rescaling δ\delta gives the final rate.
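To illustrate the insertion-based reduction from SORT to SEARCH described above, the following simplified sketch inserts elements one at a time into a sorted list, locating each position by binary search over noisy comparisons. It is our illustration only: it uses repetition-based majority votes in place of the variable-length noisy search of Gu and Xu (2023), so it does not attain the optimal query complexity discussed above.

```python
import random

def noisy_less(a, b, p):
    """Noisy comparison oracle: returns (a < b), flipped with probability p."""
    return (a < b) ^ (random.random() < p)

def majority_less(a, b, p, reps):
    """Repeat a noisy comparison `reps` times and take a majority vote."""
    votes = sum(noisy_less(a, b, p) for _ in range(reps))
    return votes > reps / 2

def noisy_insertion_sort(items, p, reps):
    """Insert items one at a time, finding each position by binary search with
    majority-vote comparisons; a simplified stand-in for the noisy-search-based
    insertion discussed in the text."""
    sorted_list = []
    for x in items:
        lo, hi = 0, len(sorted_list)
        while lo < hi:
            mid = (lo + hi) // 2
            if majority_less(x, sorted_list[mid], p, reps):
                hi = mid
            else:
                lo = mid + 1
        sorted_list.insert(lo, x)
    return sorted_list

random.seed(2)
print(noisy_insertion_sort([5, 3, 8, 1, 9, 2], p=0.1, reps=15))
# typically prints [1, 2, 3, 5, 8, 9]
```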