

On the Optimal Bounds for Noisy Computing

Banghua Zhu,    Ziao Wang    Nadim Ghaddar    Jiantao Jiao    Lele Wang Banghua Zhu is with the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA 94720, USA (email: [email protected]).Ziao Wang is with the Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, BC V6T1Z4, Canada (email: [email protected]).Nadim Ghaddar is with the Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA 92093, USA, (email: [email protected]).Jiantao Jiao is with the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA 94720, USA (email: [email protected]).Lele Wang is with the Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, BC V6T1Z4, Canada (email: [email protected]).A shorter version of this work is accepted at the 2023 IEEE International Symposium on Information Theory.

Abstract

We revisit the problem of computing with noisy information considered in Feige et al. (1994), which includes computing the 𝖮𝖱\mathsf{OR} function from noisy queries, and computing the 𝖬𝖠𝖷,𝖲𝖤𝖠𝖱𝖢𝖧\mathsf{MAX},\mathsf{SEARCH}, and 𝖲𝖮𝖱𝖳\mathsf{SORT} functions from noisy pairwise comparisons. For KK given elements, the goal is to correctly recover the desired function with probability at least 1δ1-\delta when the outcome of each query is flipped with probability pp. We consider both the adaptive sampling setting where each query can be adaptively designed based on past outcomes, and the non-adaptive sampling setting where the query cannot depend on past outcomes. The prior work provides tight bounds on the worst-case query complexity in terms of the dependence on KK. However, the upper and lower bounds do not match in terms of the dependence on δ\delta and pp. We improve the lower bounds for all the four functions under both adaptive and non-adaptive query models. Most of our lower bounds match the upper bounds up to constant factors when either pp or δ\delta is bounded away from 0, while the ratio between the best prior upper and lower bounds goes to infinity when p0p\rightarrow 0 or p1/2p\rightarrow 1/2. On the other hand, we also provide matching upper and lower bounds for the number of queries in expectation, improving both the upper and lower bounds for the variable-length query model.

I Introduction

The problem of computing with noisy information has been studied extensively since the seminal work (Feige et al., 1994), which considers four problems:

  • Computing the 𝖮𝖱\mathsf{OR} function of KK bits from noisy observations of the bits;

  • Finding the largest (or top-NN) element among KK real-valued elements from noisy pairwise comparisons;

  • Searching the rank of a new element in an ordered list of KK elements from noisy pairwise comparisons;

  • Sorting KK elements from noisy pairwise comparisons.

Feige et al. (1994) is based on a simple noise model where each observation goes through a binary symmetric channel 𝖡𝖲𝖢(p)\mathsf{BSC}(p), i.e. for each observation, with probability pp we see its flipped outcome, and with probability 1p1-p we see its true value. They provide upper and lower bounds for the query complexity in terms of the total number of elements KK, the noise probability pp, and the desired confidence level δ\delta when adaptive querying is allowed. They establish the optimal query complexity in terms of dependence with respect to KK. However, the exact sample/query complexity with respect to all parameters K,δK,\delta, and pp is still not fully understood. In this paper, we revisit the problem of computing under noisy observations in both the adaptive sampling and non-adaptive sampling settings. We aim to close the gap in the existing bounds and illustrate the difference in query complexity between adaptive sampling and non-adaptive sampling.

Taking the problem of computing the 𝖮𝖱\mathsf{OR} function of KK bits as an example, assume that there are KK bits (X1,,XK){0,1}K(X_{1},\cdots,X_{K})\in\{0,1\}^{K}. The 𝖮𝖱\mathsf{OR} function is defined as

𝖮𝖱(X1,,XK)={1, if k[K],Xk=10, otherwise.\displaystyle\mathsf{OR}(X_{1},\cdots,X_{K})=\begin{cases}1,&\text{ if }\exists k\in[K],X_{k}=1\\ 0,&\text{ otherwise.}\end{cases} (2)

The question is simple when we can query each bit noiselessly. In this case, KK queries are both sufficient and necessary since it suffices to query each bit once, and thus there is no benefit in adaptive querying compared to non-adaptive querying. When the observation of each query goes through a binary symmetric channel 𝖡𝖲𝖢(p)\mathsf{BSC}(p), we ask two questions:

  • How many queries (samples) do we need in the worst case to recover the true 𝖮𝖱\mathsf{OR} function value of any given sequences X1,,XKX_{1},\cdots,X_{K} with probability at least 1δ1-\delta?

  • Can adaptive sampling do better than non-adaptive sampling when noise is present?

Problem Fixed Length, Adaptive Sampling
Upper Bound Lower Bound
𝖮𝖱\mathsf{OR} 𝒪(Klog(1/δ)1H(p))\mathcal{O}(\frac{K\log(1/\delta)}{1-H(p)}) (Feige et al., 1994) Ω(K1H(p)+Klog(1/δ)D𝖪𝖫(p1p))\Omega(\frac{K}{1-H(p)}+\frac{K\log(1/\delta)}{D_{\mathsf{KL}}(p\|1-p)}) (Thm II.1)
𝖬𝖠𝖷\mathsf{MAX} 𝒪(Klog(1/δ)1H(p))\mathcal{O}(\frac{K\log(1/\delta)}{1-H(p)}) (Feige et al., 1994) Ω(K1H(p)+Klog(1/δ)D𝖪𝖫(p1p))\Omega(\frac{K}{1-H(p)}+\frac{K\log(1/\delta)}{D_{\mathsf{KL}}(p\|1-p)}) (Thm III.1)
𝖲𝖤𝖠𝖱𝖢𝖧\mathsf{SEARCH} 𝒪(log(K/δ)1H(p))\mathcal{O}(\frac{\log(K/\delta)}{1-H(p)}) (Feige et al., 1994) Ω(log(K)1H(p)+log(1/δ)D𝖪𝖫(p1p))\Omega(\frac{\log(K)}{1-H(p)}+\frac{\log(1/\delta)}{D_{\mathsf{KL}}(p\|1-p)}) (Thm IV.1)
𝖲𝖮𝖱𝖳\mathsf{SORT} 𝒪(Klog(K/δ)1H(p))\mathcal{O}(\frac{K\log(K/\delta)}{1-H(p)}) (Feige et al., 1994) Ω(Klog(K)1H(p)+Klog(K/δ)D𝖪𝖫(p1p))\Omega(\frac{K\log(K)}{1-H(p)}+\frac{K\log(K/\delta)}{D_{\mathsf{KL}}(p\|1-p)}) (Thm V.1)
Problem Fixed Length, Non-adaptive Sampling
Upper Bound (Appendix H) Lower Bound
𝖮𝖱\mathsf{OR} 𝒪(Klog(K/δ)1H(p))\mathcal{O}(\frac{K\log(K/\delta)}{1-H(p)}) Ω(max(K,Klog(K)p1H(p),Klog(K)log((1p)/p)))\Omega(\max(K,\frac{K\log(K)p}{1-H(p)},\frac{K\log(K)}{\log((1-p)/p)})) (Thm II.3)
𝖬𝖠𝖷\mathsf{MAX} 𝒪(K2log(K/δ)1H(p))\mathcal{O}(\frac{K^{2}\log(K/\delta)}{1-H(p)}) Ω(K21H(p)+K2log(1/δ)D𝖪𝖫(p1p))\Omega(\frac{K^{2}}{1-H(p)}+\frac{K^{2}\log(1/\delta)}{D_{\mathsf{KL}}(p\|1-p)}) (Thm III.2)
𝖲𝖤𝖠𝖱𝖢𝖧\mathsf{SEARCH} 𝒪(Klog(1/δ)1H(p))\mathcal{O}(\frac{K\log(1/\delta)}{1-H(p)}) Ω(K1H(p)+Klog(1/δ)D𝖪𝖫(p1p))\Omega(\frac{K}{1-H(p)}+\frac{K\log(1/\delta)}{D_{\mathsf{KL}}(p\|1-p)}) (Thm IV.3)
𝖲𝖮𝖱𝖳\mathsf{SORT} 𝒪(K2log(K/δ)1H(p))\mathcal{O}(\frac{K^{2}\log(K/\delta)}{1-H(p)}) Ω(K2+K2log(K)D𝖪𝖫(p1p)))\Omega(K^{2}+\frac{K^{2}\log(K)}{D_{\mathsf{KL}}(p\|1-p))}) (Thm V.3)
Problem Variable Length, Adaptive Sampling
Matching Bound (Thm VI.1)
𝖮𝖱\mathsf{OR} Θ(K1H(p)+Klog(1/δ)D𝖪𝖫(p1p))\Theta(\frac{K}{1-H(p)}+\frac{K\log(1/\delta)}{D_{\mathsf{KL}}(p\|1-p)})
𝖬𝖠𝖷\mathsf{MAX} Θ(K1H(p)+Klog(1/δ)D𝖪𝖫(p1p))\Theta(\frac{K}{1-H(p)}+\frac{K\log(1/\delta)}{D_{\mathsf{KL}}(p\|1-p)})
𝖲𝖤𝖠𝖱𝖢𝖧\mathsf{SEARCH} Θ(log(K)1H(p)+log(1/δ)D𝖪𝖫(p1p))\Theta(\frac{\log(K)}{1-H(p)}+\frac{\log(1/\delta)}{D_{\mathsf{KL}}(p\|1-p)})
𝖲𝖮𝖱𝖳\mathsf{SORT} Θ(Klog(K)1H(p)+Klog(K/δ)D𝖪𝖫(p1p))\Theta(\frac{K\log(K)}{1-H(p)}+\frac{K\log(K/\delta)}{D_{\mathsf{KL}}(p\|1-p)})
TABLE I: Summary of query complexity bounds of 𝖮𝖱\mathsf{OR}, 𝖬𝖠𝖷\mathsf{MAX}, 𝖲𝖤𝖠𝖱𝖢𝖧\mathsf{SEARCH} and 𝖲𝖮𝖱𝖳\mathsf{SORT}. Here, we assume δ<0.49\delta<0.49.

In Feige et al. (1994), an adaptive tournament algorithm is proposed that achieves worst-case query complexity 𝒪(Klog(1/δ)/(1H(p)))\mathcal{O}(K\log(1/\delta)/(1-H(p))), and a corresponding lower bound Ω(Klog(1/δ)/log((1p)/p))\Omega(K\log(1/\delta)/\log((1-p)/p)) for adaptive sampling is provided. A simple calculation tells us that their ratio log((1p)/p)/(1H(p))\log((1-p)/p)/(1-H(p)) goes to infinity as p0p\rightarrow 0 or p1/2p\rightarrow 1/2, which indicates that there is still a gap between the upper and the lower bounds when the noise probability pp is near the point 0 or 1/21/2. This calls for tighter upper or lower bounds in these cases. In this paper, we improve the lower bound to Ω(K/(1H(p))+Klog(1/δ)/D𝖪𝖫(p1p))\Omega(K/(1-H(p))+K\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p)), which matches the existing upper bound up to a constant factor when either pp or δ\delta is bounded away from 0.

One may wonder how many samples are needed when each query is not allowed to depend on previous outcomes. We provide a lower bound Ω(max(K,Klog(K)p/(1H(p)),Klog(K)/log((1p)/p)))\Omega(\max(K,K\log(K)p/(1-H(p)),K\log(K)/\log((1-p)/p))) for this non-adaptive setting. On the other hand, a repetition-based upper bound 𝒪(Klog(K/δ)/(1H(p)))\mathcal{O}(K\log(K/\delta)/(1-H(p))) matches the lower bound up to a constant factor when both pp and δ\delta are bounded away from 0.
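To make the repetition-based strategy concrete, the following non-adaptive sketch (in Python; the helper names and the constant in the repetition count are ours and purely illustrative) queries each bit a fixed number of times through the 𝖡𝖲𝖢(p), decodes each bit by majority vote, and returns the 𝖮𝖱 of the estimates. The query schedule is fixed up front, so it never depends on past outcomes.

    import math
    import random

    def bsc(bit, p):
        # One noisy observation of a bit through a binary symmetric channel BSC(p).
        return bit ^ (random.random() < p)

    def noisy_or_nonadaptive(bits, p, delta):
        # Non-adaptive repetition scheme (assumes 0 < p < 1/2): each bit is queried
        # r times, decoded by majority vote, and the OR of the estimates is returned.
        # r follows the O(log(K/delta)/(1 - H(p))) per-bit budget discussed above;
        # the constant 4 is illustrative, not optimized.
        K = len(bits)
        h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)   # binary entropy H(p)
        r = max(1, math.ceil(4 * math.log2(K / delta) / (1 - h)))
        estimates = []
        for b in bits:
            ones = sum(bsc(b, p) for _ in range(r))
            estimates.append(1 if 2 * ones > r else 0)        # majority vote per bit
        return max(estimates)                                  # OR of the estimates

    # Example: K = 64 bits containing a single 1, noise level p = 0.2.
    random.seed(0)
    bits = [0] * 64
    bits[17] = 1
    print(noisy_or_nonadaptive(bits, p=0.2, delta=0.05))       # 1 with high probability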

Similarly, we ask the same questions for computing 𝖬𝖠𝖷\mathsf{MAX}, 𝖲𝖤𝖠𝖱𝖢𝖧\mathsf{SEARCH}, and 𝖲𝖮𝖱𝖳\mathsf{SORT}. We defer the definitions and discussions of the problems to Section I-B. Here we summarize the best upper and lower bounds for the problems considered in this paper in Table I, either from previous art or from this paper.

Our lower bounds take the form of f(K)/(1H(p))+g(K,δ)/D𝖪𝖫(p1p)f(K)/(1-H(p))+g(K,\delta)/D_{\mathsf{KL}}(p\|1-p). Here the first term does not depend on δ\delta for δ<0.49\delta<0.49, representing the number of queries one must pay regardless of the target error probability δ\delta. The second term grows logarithmically with 1/δ1/\delta, representing the price to pay for smaller target error probability. Here we always have D𝖪𝖫(p1p)1H(p)D_{\mathsf{KL}}(p\|1-p)\gtrsim 1-H(p) for p(0,1)p\in(0,1), with D𝖪𝖫(p1p)1H(p)D_{\mathsf{KL}}(p\|1-p)\asymp 1-H(p) when pp is bounded away from 0. Technically, the first term is usually from Fano’s inequality, which gives better dependence on pp but worse dependence on δ\delta. The second term is from a 𝖪𝖫\mathsf{KL}-divergence based lower bound, which gives better dependence on δ\delta but worse dependence on pp.
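These relations are easy to check numerically. The short Python sketch below evaluates 1−H(p), D_KL(p‖1−p), and log((1−p)/p) on a grid of noise levels; it illustrates that the ratio log((1−p)/p)/(1−H(p)) from the earlier comparison blows up both as p→0 and as p→1/2, while D_KL(p‖1−p)/(1−H(p)) stays of constant order once p is bounded away from 0.

    import math

    def h2(p):
        # Binary entropy H(p) in bits.
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    def dkl(p):
        # D_KL(p || 1-p) in bits, using D_KL(p || 1-p) = (1 - 2p) log((1-p)/p).
        return (1 - 2 * p) * math.log2((1 - p) / p)

    for p in [1e-4, 1e-2, 0.1, 0.3, 0.45, 0.499]:
        a = 1 - h2(p)                 # Fano-type denominator
        b = dkl(p)                    # KL-type denominator
        c = math.log2((1 - p) / p)    # denominator in the older lower bounds
        print(f"p={p:7.4f}  1-H(p)={a:10.3e}  D_KL={b:10.3e}  "
              f"D_KL/(1-H)={b / a:8.2f}  log((1-p)/p)/(1-H)={c / a:8.2f}")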

We also extend our bounds from the fixed-length query model to the variable-length query model (these settings are known as fixed budget and fixed confidence, respectively, in the bandit literature (Kaufmann et al., 2016)). In the fixed-length setting considered above, the number of queries is deterministic and fixed in advance, and we ask for the worst-case number of queries required to recover the true value with probability at least 1δ1-\delta. In the variable-length setting, the number of queries can be random and dependent on the past outcomes, and we ask for the expected number of queries required to recover the true value with probability at least 1δ1-\delta. We discuss the results for variable length in Section VI, where we give matching upper and lower bounds with respect to all parameters for computing all four functions, improving over both existing upper and lower bounds and closing the gap.

I-A Related Work

The problems of noisy computation have been studied extensively before and after (Feige et al., 1994). However, most of the existing work focuses on tightening the dependence on KK instead of pp and δ\delta, or on extending the results to a more general framework where the noise follows a generalized model that includes the 𝖡𝖲𝖢\mathsf{BSC} channel as a special case. Although worst-case upper and lower bounds are provided for the generalized model, most of the lower bounds are based on instances where the noise does not follow a 𝖡𝖲𝖢\mathsf{BSC} model. Thus the lower bounds do not apply to our case.

Noisy binary searching

The noisy searching problem was first introduced by Rényi (Rényi, 1961) and Ulam (Ulam, 1976) and further developed by Berlekamp (Berlekamp, 1964) and Horstein (Horstein, 1963) in the context of coding for channels with feedback. The noisy searching algorithm in (Burnashev and Zigangirov, 1974) by Burnashev and Zigangirov can be seen as a specialization of Horstein’s coding scheme, whereas the algorithms in (Feige et al., 1994, Pelc, 1989, Karp and Kleinberg, 2007) can be seen as an adaptation of the binary search algorithm to the noisy setting.

The first tight lower bound for variable-length adaptive sampling in noisy searching is given by (Burnashev, 1976). The recent concurrent work (Gu and Xu, 2023) improves the constant factors when pp is a constant bounded away from 0 and 1/21/2. Our lower bounds are based on a different proof using Le Cam’s method; they do not require pp to be bounded away from 0 or 1/21/2, but are worse in terms of constant factors. (Gu and Xu, 2023) also provides a matching upper bound that is tight even in the constant factor in the variable-length setting. We provide more discussion in Section VI. Making the upper and lower bounds match in the fixed-length query model remains an important open problem.

In terms of the bounds for non-adaptive sampling, the gap between 𝒪(log(K))\mathcal{O}(\log(K)) for adaptive sampling and 𝒪(K)\mathcal{O}(K) for non-adaptive sampling can already be seen in the noiseless case p=0p=0 (Rényi, 1961). Here we provide an improved bound for the noisy case that has explicit dependence on pp and δ\delta.

Noisy sorting and max selection

The noisy sorting and max (or top-NN) selection problems have usually been studied together (e.g., (Feige et al., 1994)) and later have been extended to a more general setting known as active ranking (Mohajer et al., 2017, Falahatgar et al., 2017, Shah and Wainwright, 2018, Heckel et al., 2019, Agarwal et al., 2017), where the noise pijp_{ij} for the comparison of a pair of elements ii and jj is usually unknown and differs across pairs. Other related but different settings for noisy sorting in the literature include the case when some distance metric for permutations is to be minimized (rather than the probability of error) (Ailon et al., 2008, Braverman and Mossel, 2009, Ailon, 2011, Negahban et al., 2012, Wauthier et al., 2013, Rajkumar and Agarwal, 2014, Shah et al., 2016, Shah and Wainwright, 2018, Mao et al., 2018), the case when the noise for each pairwise comparison is not i.i.d. and is determined by some noise model such as the Bradley–Terry–Luce model (Bradley and Terry, 1952) (Ajtai et al., 2009, Negahban et al., 2012, Rajkumar and Agarwal, 2014, Chen and Suh, 2015, Chen et al., 2017, Ren et al., 2018), and the case when the ordering itself is restricted to some subset of all permutations (Jamieson and Nowak, 2011, Ailon et al., 2011).

For noisy sorting, the best upper and lower bounds have been provided in (Feige et al., 1994, Wang et al., 2023), which give an upper bound 𝒪(Klog(K/δ)/(1H(p)))\mathcal{O}(K\log(K/\delta)/(1-H(p))) and a lower bound Ω(Klog(K)/(1H(p))+log(1/δ)/D𝖪𝖫(p1p))\Omega(K\log(K)/(1-H(p))+\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p)). We tighten the lower bound to Ω(Klog(K)/(1H(p))+Klog(K/δ)/D𝖪𝖫(p1p))\Omega(K\log(K)/(1-H(p))+K\log(K/\delta)/D_{\mathsf{KL}}(p\|1-p)). On the other hand, (Gu and Xu, 2023) shows that the query complexity is (1+o(1))(Klog(K)/(1H(p))+Klog(K)/D𝖪𝖫(p1p))(1+o(1))(K\log(K)/(1-H(p))+K\log(K)/D_{\mathsf{KL}}(p\|1-p)), which does not scale with δ\delta. Our lower bound for the fixed-length setting improves the dependence on δ\delta. We also make the bounds match up to constant factors in all parameters in the variable-length setting. However, making the δ\delta dependence tight in the upper bound for the fixed-length setting remains an open problem. For the non-adaptive sampling scenario, we provide upper and lower bounds that match in terms of the dependence on KK, but are still loose when both pp and δ\delta go to 0 simultaneously.

For max selection, the best known lower bound for fixed-length adaptive sampling is Ω(Klog(1/δ)/log((1p)/p))\Omega(K\log(1/\delta)/\log((1-p)/p)) (Feige et al., 1994). Our result makes the bound tight when either pp or δ\delta is bounded away from 0, and provides matching bounds for the variable-length setting. On the other hand, (Mohajer et al., 2017, Shah and Wainwright, 2018) discuss the gap between adaptive sampling and non-adaptive sampling. However, the Ω(K3log(K))\Omega(K^{3}\log(K)) lower bound for non-adaptive sampling in (Shah and Wainwright, 2018) does not apply to our case since it is based on a generalized model where the noise probabilities differ across pairs. In our case, 𝒪(K2log(K))\mathcal{O}(K^{2}\log(K)) is a natural upper bound. However, it is unclear whether our lower bound Ω(K2)\Omega(K^{2}) can be improved to Ω(K2log(K))\Omega(K^{2}\log(K)).

𝖮𝖱\mathsf{OR} and best arm identification

The noisy computation of 𝖮𝖱\mathsf{OR} was first studied in the literature on circuits with noisy gates (Dobrushin and Ortyukov, 1977a, b, Von Neumann, 1956, Pippenger et al., 1991, Gács and Gál, 1994) and noisy decision trees (Feige et al., 1994, Evans and Pippenger, 1998, Reischuk and Schmeltz, 1991). Different from the other three problems we consider, computing 𝖮𝖱\mathsf{OR} does not rely on pairwise comparisons, but instead directly queries the values of the bits. This is also related to the rich literature on best arm identification, which queries real-valued arms and aims to identify the arm with the largest value (reward) (Bubeck et al., 2009, Audibert et al., 2010, Garivier and Kaufmann, 2016, Jamieson and Nowak, 2014, Gabillon et al., 2012, Kaufmann et al., 2016). Indeed, any best-arm identification algorithm can be converted to an 𝖮𝖱\mathsf{OR} computation by first finding the maximum and then querying the binary value of the maximum. This recovers the best existing upper bound 𝒪(Klog(1/δ)/(1H(p)))\mathcal{O}(K\log(1/\delta)/(1-H(p))) for computing 𝖮𝖱\mathsf{OR} in the adaptive sampling scenario (Audibert et al., 2010). However, the lower bound for best arm identification does not apply to our case, since 𝖮𝖱\mathsf{OR} has a binary output, while the best arm identification problem requires identifying the arm index. Our lower bound for fixed-length adaptive sampling improves over the best known lower bound Ω(Klog(1/δ)/log((1p)/p))\Omega(K\log(1/\delta)/\log((1-p)/p)) (Feige et al., 1994). We also provide matching bounds for the variable-length setting.

For non-adaptive sampling, (Reischuk and Schmeltz, 1991, Gács and Gál, 1994) provide a lower bound Ω(Klog(K)/log((1p)/p))\Omega(K\log(K)/\log((1-p)/p)). Our lower bound Ω(Klog(K)p/(1H(p)))\Omega(K\log(K)p/(1-H(p))) is tighter than the current lower bound when p1/2p\rightarrow 1/2, but looser when p0p\rightarrow 0. Thus the tightest bound is a maximum of KK and the two lower bounds.

I-B Problem Definition and Preliminaries

The 𝖮𝖱\mathsf{OR} function is defined in Equation (2). We define the rest of the problems here. Different from 𝖮𝖱\mathsf{OR}, the 𝖬𝖠𝖷\mathsf{MAX}, 𝖲𝖤𝖠𝖱𝖢𝖧\mathsf{SEARCH} and 𝖲𝖮𝖱𝖳\mathsf{SORT} problems are all based on noisy pairwise comparisons. Concretely, assume we have KK distinct real-valued items X1,,XKX_{1},\cdots,X_{K}. Instead of querying the exact value of each element, we can only query a pair of elements and observe their noisy comparison. For any queried pair (i,j)(i,j), we will observe a sample from 𝖡𝖾𝗋𝗇(1p)\mathsf{Bern}(1-p) if Xi>XjX_{i}>X_{j}, and a sample from 𝖡𝖾𝗋𝗇(p)\mathsf{Bern}(p) if Xi<XjX_{i}<X_{j}.

We have 𝖬𝖠𝖷(X1,,XK)=argmaxi[K]Xi\mathsf{MAX}(X_{1},\cdots,X_{K})=\operatorname*{arg\,max}_{i\in[K]}X_{i}, 𝖲𝖮𝖱𝖳(X1,,XK)=σ\mathsf{SORT}(X_{1},\cdots,X_{K})=\sigma, where σ:[K][K]\sigma:[K]\mapsto[K] is the permutation function such that Xσ(1)<Xσ(2)<<Xσ(K)X_{\sigma(1)}<X_{\sigma(2)}<\cdots<X_{\sigma(K)}. And 𝖲𝖤𝖠𝖱𝖢𝖧(X;X1,,XK)=i\mathsf{SEARCH}(X;X_{1},\cdots,X_{K})=i, where ii satisfies that Xi<X<Xi+1X_{i}<X<X_{i+1} with X0=X_{0}=-\infty and XK+1=+X_{K+1}=+\infty. In the 𝖲𝖤𝖠𝖱𝖢𝖧\mathsf{SEARCH} problem, we assume that the ordering of X1,,XKX_{1},\cdots,X_{K} is given, and we are interested in the position of a new element XX. Thus, in each round we compare the given XX with one of the elements XiX_{i}.

We are interested in the probability of exact recovery of the functions. We consider both adaptive sampling and non-adaptive sampling. In adaptive sampling, the sampling distribution at each round can be dependent on the past queries and observations. In non-adaptive sampling, the sampling distribution in each round has to be determined at the beginning and cannot change with the ongoing queries or observations. Throughout the paper, we assume that the desired error probability δ\delta satisfies δ<0.49\delta<0.49. We use the terms “querying” and “sampling” interchangeably.
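For concreteness, the two query models can be simulated as follows (a minimal Python sketch; the class names are ours, not from the original text). A bit oracle answers value queries for the 𝖮𝖱 problem, and a comparison oracle answers pairwise queries for 𝖬𝖠𝖷, 𝖲𝖤𝖠𝖱𝖢𝖧, and 𝖲𝖮𝖱𝖳; both flip each answer independently with probability p and keep a count of the queries issued.

    import random

    class NoisyBitOracle:
        # Noisy value queries for the OR problem: query(k) returns X_k through a BSC(p).
        def __init__(self, bits, p):
            self.bits, self.p, self.queries = list(bits), p, 0

        def query(self, k):
            self.queries += 1
            return self.bits[k] ^ (random.random() < self.p)

    class NoisyComparisonOracle:
        # Noisy pairwise comparisons for MAX / SEARCH / SORT:
        # compare(i, j) returns 1{X_i > X_j} through a BSC(p).
        def __init__(self, values, p):
            self.values, self.p, self.queries = list(values), p, 0

        def compare(self, i, j):
            self.queries += 1
            truth = int(self.values[i] > self.values[j])
            return truth ^ (random.random() < self.p)

An adaptive algorithm may choose each query based on the answers received so far, while a non-adaptive algorithm must commit to its entire sequence of queried bits (or pairs) before seeing any answer; the oracles above serve both settings.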

II Computing the 𝖮𝖱\mathsf{OR} function

In this section, we provide the lower bounds for the query complexity of computing the 𝖮𝖱\mathsf{OR} function under both adaptive and non-adaptive sampling. The upper bound for adaptive sampling is from (Feige et al., 1994), and the upper bound for non-adaptive sampling is given in Appendix H. Let θ{0,1}K\theta\in\{0,1\}^{K} be the KK-bit sequence representing the true values. Let 𝖮𝖱(θ)\mathsf{OR}(\theta) be the result of the 𝖮𝖱\mathsf{OR} function applied to the KK-bit noiseless sequence. We also let μ^\hat{\mu} be any algorithm that queries any noisy bit in TT rounds, and outputs a (possibly random) decision from {0,1}\{0,1\}.

II-A Adaptive Sampling

We have the following minimax lower bound.

Theorem II.1.

In the adaptive setting, we have

infμ^supθ{0,1}K(μ^𝖮𝖱(θ))14exp(TD𝖪𝖫(p1p)K).\displaystyle\inf_{\hat{\mu}}\sup_{\theta\in\{0,1\}^{K}}\mathbb{P}(\hat{\mu}\neq\mathsf{OR}(\theta))\geq\frac{1}{4}\cdot\exp\left(-\frac{T\cdot D_{\mathsf{KL}}(p\|1-p)}{K}\right).

Thus, the number of queries required to recover the true value with probability at least 1δ1-\delta is lower bounded by Ω(K/(1H(p))+Klog(1/δ)/D𝖪𝖫(p1p))\Omega({K}/{(1-H(p))}+K\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p)).

We provide the proof here, which is based on Le Cam’s two point method (see e.g. (LeCam, 1973, Yu, 1997)).

Proof of Theorem II.1.

Our lower bound proof is mainly based on Le Cam’s two point method, which is also re-stated in Lemma A.1 in Appendix A for reader’s convenience. Let θ0\theta_{0} be the length-KK all-zero sequence, and let θj{0,1}K\theta_{j}\in\{0,1\}^{K} be such that θjj=1\theta_{jj}=1 and θji=0\theta_{ji}=0 for iji\neq j. Here θji\theta_{ji} refers to the ii-th element in the binary vector θj\theta_{j}. We can first verify that for any μ^\hat{\mu}, one has

𝟙(μ^𝖮𝖱(θ0))+𝟙(μ^𝖮𝖱(θj))1.\displaystyle\mathds{1}(\hat{\mu}\neq\mathsf{OR}(\theta_{0}))+\mathds{1}(\hat{\mu}\neq\mathsf{OR}(\theta_{j}))\geq 1.

By applying Le Cam’s two point lemma on θ0\theta_{0} and θj\theta_{j}, we know that

infμ^supθ{0,1}K(μ^𝖮𝖱(θ))\displaystyle\inf_{\hat{\mu}}\sup_{\theta\in\{0,1\}^{K}}\mathbb{P}(\hat{\mu}\neq\mathsf{OR}(\theta)) 12(1𝖳𝖵(θ0,θj)).\displaystyle\geq\frac{1}{2}(1-\mathsf{TV}(\mathbb{P}_{\theta_{0}},\mathbb{P}_{\theta_{j}})).

Here θj\mathbb{P}_{\theta_{j}} is the joint distribution of query-observation pairs in TT rounds when the ground truth is θj\theta_{j}. By taking maximum over jj on the right-hand side, we have

infμ^supθ{0,1}K(μ^𝖮𝖱(θ))\displaystyle\inf_{\hat{\mu}}\sup_{\theta\in\{0,1\}^{K}}\mathbb{P}(\hat{\mu}\neq\mathsf{OR}(\theta))
sup1jK12(1𝖳𝖵(θ0,θj))\displaystyle\geq\sup_{1\leq j\leq K}\frac{1}{2}(1-\mathsf{TV}(\mathbb{P}_{\theta_{0}},\mathbb{P}_{\theta_{j}}))
sup1jK14exp(D𝖪𝖫(θ0,θj)).\displaystyle\geq\sup_{1\leq j\leq K}\frac{1}{4}\exp(-D_{\mathsf{KL}}(\mathbb{P}_{\theta_{0}},\mathbb{P}_{\theta_{j}})).

Here the last inequality is due to Bretagnolle–Huber inequality (Bretagnolle and Huber, 1979) (Lemma A.4 in Appendix A). Now we aim at computing D𝖪𝖫(θ0,θj)D_{\mathsf{KL}}(\mathbb{P}_{\theta_{0}},\mathbb{P}_{\theta_{j}}). Let TjT_{j} be the random variable that denotes the number of times the jj-th element is queried among all TT rounds. From divergence decomposition lemma (Auer et al., 1995) (Lemma A.3 in Appendix A), we have

D𝖪𝖫(θ0,θj)=𝔼θ0[Tj]D𝖪𝖫(p1p).\displaystyle D_{\mathsf{KL}}(\mathbb{P}_{\theta_{0}},\mathbb{P}_{\theta_{j}})=\mathbb{E}_{\theta_{0}}[T_{j}]\cdot D_{\mathsf{KL}}(p\|1-p).

Here 𝔼θ0[Tj]\mathbb{E}_{\theta_{0}}[T_{j}] denotes the expected number of times the jj-th element is queried when the ground truth is θ0\theta_{0}. Thus we have

infμ^supθ{0,1}K(μ^𝖮𝖱(θ))\displaystyle\inf_{\hat{\mu}}\sup_{\theta\in\{0,1\}^{K}}\mathbb{P}(\hat{\mu}\neq\mathsf{OR}(\theta))
sup1jK14exp(𝔼θ0[Tj]D𝖪𝖫(p1p)).\displaystyle\geq\sup_{1\leq j\leq K}\frac{1}{4}\exp(-\mathbb{E}_{\theta_{0}}[T_{j}]\cdot D_{\mathsf{KL}}(p\|1-p)).

Now since j𝔼θ0[Tj]=T\sum_{j}\mathbb{E}_{\theta_{0}}[T_{j}]=T, there must exist some jj such that 𝔼θ0[Tj]T/K\mathbb{E}_{\theta_{0}}[T_{j}]\leq T/K. This gives

infμ^supθ{0,1}K(μ^𝖮𝖱(θ))\displaystyle\inf_{\hat{\mu}}\sup_{\theta\in\{0,1\}^{K}}\mathbb{P}(\hat{\mu}\neq\mathsf{OR}(\theta)) 14exp(TD𝖪𝖫(p1p)K).\displaystyle\geq\frac{1}{4}\cdot\exp\left(-\frac{T\cdot D_{\mathsf{KL}}(p\|1-p)}{K}\right).
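Setting the right-hand side to be at most δ and solving for T makes the second term of the claimed query complexity explicit:

\frac{1}{4}\exp\left(-\frac{T\cdot D_{\mathsf{KL}}(p\|1-p)}{K}\right)\leq\delta\quad\Longleftrightarrow\quad T\geq\frac{K\log\frac{1}{4\delta}}{D_{\mathsf{KL}}(p\|1-p)}.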

On the other hand, KK is naturally a lower bound for query complexity since one has to query each element at least once. Thus we arrive at a lower bound of Ω(K+Klog(1/δ)/D𝖪𝖫(p1p))\Omega(K+K\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p)). Note that this is equivalent to Ω(K/(1H(p))+Klog(1/δ)/D𝖪𝖫(p1p))\Omega(K/(1-H(p))+K\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p)) up to a constant factor when δ<0.49\delta<0.49. The reason is that when pp is bounded away from 0, (1H(p))/D𝖪𝖫(p1p)(1-H(p))/D_{\mathsf{KL}}(p\|1-p) is of constant order, and when pp is close to 0, 1H(p)1-H(p) is within a constant factor of 11. ∎

Remark II.2.

Compared with the existing tightest bound Ω(Klog(1/δ)/log((1p)/p))\Omega(K\log(1/\delta)/\log((1-p)/p)) in Feige et al. (1994), the rate is greatly improved as p0p\rightarrow 0 or p1/2p\rightarrow 1/2.

On the other hand, the best known upper bound, from Feige et al. (1994), is 𝒪(Klog(1/δ)1H(p))\mathcal{O}(\frac{K\log(1/\delta)}{1-H(p)}). We include its algorithm and analysis in Appendix I. Theorem II.1 shows that when δ\delta is bounded away from 0, one needs at least CK/(1H(p))C\cdot K/(1-H(p)) samples, matching the upper bound. Similarly, when pp is bounded away from 0, the term D𝖪𝖫(p1p)D_{\mathsf{KL}}(p\|1-p) is also within a constant factor of 1H(p)1-H(p), and thus the upper and lower bounds match. The only regime where the upper and lower bounds do not match is when both pp and δ\delta go to 0. We conjecture that a better upper bound is needed in this case.

II-B Non-adaptive Sampling

In the case of non-adaptive sampling, we show that 𝒪(K)\mathcal{O}(K) queries are not enough; instead, Ω(Klog(K))\Omega(K\log(K)) queries are needed.

Theorem II.3.

In the non-adaptive sampling setting, where the sampling procedure is restricted to taking independent samples from a sequence of distributions p1,,pTp_{1},\cdots,p_{T}, we have

infμ^supθ{0,1}K(μ^𝖮𝖱(θ))\displaystyle\inf_{\hat{\mu}}\sup_{\theta\in\{0,1\}^{K}}\mathbb{P}(\hat{\mu}\neq\mathsf{OR}(\theta))
12(112K((1+(12p)2K(1p)p)T1)).\displaystyle\quad\geq\tfrac{1}{2}\cdot\Big{(}1-\sqrt{\tfrac{1}{2K}\Big{(}\left(1+\tfrac{(1-2p)^{2}}{K(1-p)p}\right)^{T}-1\Big{)}}\Big{)}.

This shows that the query complexity is at least Ω(max(K,Klog(K)p/(1H(p))))\Omega(\max(K,K\log(K)p/(1-H(p)))).

We provide the proof below. Different from the case of adaptive sampling, for non-adaptive sampling we target a rate of Ω(Klog(K))\Omega(K\log(K)), and a standard application of Le Cam’s two point method is not sufficient to give the extra logarithmic factor. Thus we provide a new proof based on a point-versus-mixture extension of Le Cam’s method.

Proof of Theorem  II.3.

Consider instance 0, in which all elements are 0. For instance 11, we define it as a mixture given by a distribution qq over the KK instances 1k1_{k}, k[K]k\in[K], which puts probability pkp_{k} on instance 1k1_{k}. We will determine the value of pkp_{k} later. Here instance 1k1_{k} refers to the case where the kk-th element is 11 and all other elements are 0. Now from the point-versus-mixture extension of Le Cam’s method (Lemma A.2), we have

infμ^supθ{0,1}K(μ^𝖮𝖱(θ))\displaystyle\inf_{\hat{\mu}}\sup_{\theta\in\{0,1\}^{K}}\mathbb{P}(\hat{\mu}\neq\mathsf{OR}(\theta))
12(1𝖳𝖵(1,0))\displaystyle\geq\frac{1}{2}(1-\mathsf{TV}(\mathbb{P}_{1},\mathbb{P}_{0}))
=12(1𝖳𝖵(𝔼jq[1j],0))\displaystyle=\frac{1}{2}(1-\mathsf{TV}(\mathbb{E}_{j\sim q}[\mathbb{P}_{1_{j}}],\mathbb{P}_{0}))
12(112χ2(𝔼jq[1j],0)).\displaystyle\geq\frac{1}{2}\left(1-\sqrt{\frac{1}{2}\chi^{2}(\mathbb{E}_{j\sim q}[\mathbb{P}_{1_{j}}],\mathbb{P}_{0})}\right).

Here the last inequality is based on the inequality 𝖳𝖵12χ2\mathsf{TV}\leq\sqrt{\frac{1}{2}\chi^{2}}. Let πmt\pi_{m}^{t} be the probability of querying the mm-th element in round tt. We can further calculate the χ2\chi^{2} divergence as

χ2(𝔼[1j()],0())\displaystyle\chi^{2}(\mathbb{E}[\mathbb{P}_{1_{j}}(\cdot)],\mathbb{P}_{0}(\cdot))
=(a)𝔼j,jq[x1j(x)1j(x)0(x)]1\displaystyle\stackrel{{\scriptstyle(a)}}{{=}}\mathbb{E}_{j,j^{\prime}\sim q}\left[\sum_{x}\frac{\mathbb{P}_{1_{j}}(x)\mathbb{P}_{1_{j^{\prime}}}(x)}{\mathbb{P}_{0}(x)}\right]-1
=(b)𝔼j,j[t=1T(m=1K(πmt)2(1p)𝟙(jm)+𝟙(jm)p𝟙(j=m)+𝟙(j=m)πmt(1p)\displaystyle\stackrel{{\scriptstyle(b)}}{{=}}\mathbb{E}_{j,j^{\prime}}\Bigg{[}\prod_{t=1}^{T}\Bigg{(}\sum_{m=1}^{K}\frac{(\pi^{t}_{m})^{2}(1-p)^{\operatorname*{\mathbbm{1}}(j\neq m)+\operatorname*{\mathbbm{1}}(j^{\prime}\neq m)}p^{\operatorname*{\mathbbm{1}}(j=m)+\operatorname*{\mathbbm{1}}(j^{\prime}=m)}}{\pi_{m}^{t}(1-p)}
+(πmt)2(1p)𝟙(j=m)+𝟙(j=m)p𝟙(jm)+𝟙(jm)πmtp)]1\displaystyle\hskip 20.00003pt+\frac{(\pi^{t}_{m})^{2}(1-p)^{\operatorname*{\mathbbm{1}}(j=m)+\operatorname*{\mathbbm{1}}(j^{\prime}=m)}p^{\operatorname*{\mathbbm{1}}(j\neq m)+\operatorname*{\mathbbm{1}}(j^{\prime}\neq m)}}{\pi_{m}^{t}p}\Bigg{)}\Bigg{]}-1
=𝔼j,j[t=1T(m=1Kπmt((1p)𝟙(jm)+𝟙(jm)p𝟙(j=m)+𝟙(j=m)1p\displaystyle=\mathbb{E}_{j,j^{\prime}}\Bigg{[}\prod_{t=1}^{T}\Bigg{(}\sum_{m=1}^{K}\pi^{t}_{m}\Bigg{(}\frac{(1-p)^{\operatorname*{\mathbbm{1}}(j\neq m)+\operatorname*{\mathbbm{1}}(j^{\prime}\neq m)}p^{\operatorname*{\mathbbm{1}}(j=m)+\operatorname*{\mathbbm{1}}(j^{\prime}=m)}}{1-p}
+(1p)𝟙(j=m)+𝟙(j=m)p𝟙(jm)+𝟙(jm)p))]1\displaystyle\hskip 20.00003pt+\frac{(1-p)^{\operatorname*{\mathbbm{1}}(j=m)+\operatorname*{\mathbbm{1}}(j^{\prime}=m)}p^{\operatorname*{\mathbbm{1}}(j\neq m)+\operatorname*{\mathbbm{1}}(j^{\prime}\neq m)}}{p}\Bigg{)}\Bigg{)}\Bigg{]}-1
=𝔼j,j[t=1T(m=1Kπmt(1+(12p)2𝟙(j=j=m)(1p)p))]1\displaystyle=\mathbb{E}_{j,j^{\prime}}\left[\prod_{t=1}^{T}\left(\sum_{m=1}^{K}\pi_{m}^{t}\left(1+\frac{(1-2p)^{2}\cdot\operatorname*{\mathbbm{1}}(j=j^{\prime}=m)}{(1-p)p}\right)\right)\right]-1
=1j=1Kpj2+j=1Kpj2t=1T(1+πjt(12p)2(1p)p)1\displaystyle=1-\sum_{j=1}^{K}p_{j}^{2}+\sum_{j=1}^{K}p_{j}^{2}\prod_{t=1}^{T}\left(1+\frac{\pi_{j}^{t}(1-2p)^{2}}{(1-p)p}\right)-1
=j=1Kpj2+j=1Kpj2t=1T(1+πjt(12p)2(1p)p),\displaystyle=-\sum_{j=1}^{K}p_{j}^{2}+\sum_{j=1}^{K}p_{j}^{2}\prod_{t=1}^{T}\left(1+\frac{\pi_{j}^{t}(1-2p)^{2}}{(1-p)p}\right),

where (a)(a) follows from Lemma A.5, and (b)(b) follows from the tensorization property of χ2\chi^{2} for tensor products, a direct result of Lemma A.5. Now denote Tj=t=1TπjtT_{j}=\sum_{t=1}^{T}\pi_{j}^{t}. By Jensen’s inequality, we know that t=1T(1+πjt(12p)2(1p)p)(1+Tj(12p)2T(1p)p)T\prod_{t=1}^{T}(1+\frac{\pi_{j}^{t}(1-2p)^{2}}{(1-p)p})\leq(1+\frac{T_{j}(1-2p)^{2}}{T(1-p)p})^{T}. Thus we have

χ2(0(),𝔼[1j()])\displaystyle\chi^{2}(\mathbb{P}_{0}(\cdot),\mathbb{E}[\mathbb{P}_{1_{j}}(\cdot)])
j=1Kpj2+j=1Kpj2(1+Tj(12p)2T(1p)p)T.\displaystyle\leq-\sum_{j=1}^{K}p_{j}^{2}+\sum_{j=1}^{K}p_{j}^{2}\left(1+\frac{T_{j}(1-2p)^{2}}{T(1-p)p}\right)^{T}.

Now we take pj=((1+Tj(12p)2T(1p)p)T1)1/2/(j((1+Tj(12p)2T(1p)p)T1)1/2)p_{j}=((1+\frac{T_{j}(1-2p)^{2}}{T(1-p)p})^{T}-1)^{-1/2}/(\sum_{j}((1+\frac{T_{j}(1-2p)^{2}}{T(1-p)p})^{T}-1)^{-1/2}). We have

χ2(0(),𝔼[1j()])\displaystyle\chi^{2}(\mathbb{P}_{0}(\cdot),\mathbb{E}[\mathbb{P}_{1_{j}}(\cdot)])
K(j=1K((1+Tj(12p)2T(1p)p)T1)1/2)2.\displaystyle\leq\frac{K}{(\sum_{j=1}^{K}((1+\frac{T_{j}(1-2p)^{2}}{T(1-p)p})^{T}-1)^{-1/2})^{2}}.

Since jTj=T\sum_{j}T_{j}=T, by Jensen’s inequality, we have j=1K((1+Tj(12p)2T(1p)p)T1)1/2K((1+(12p)2K(1p)p)T1)1/2\sum_{j=1}^{K}((1+\frac{T_{j}(1-2p)^{2}}{T(1-p)p})^{T}-1)^{-1/2}\geq K((1+\frac{(1-2p)^{2}}{K(1-p)p})^{T}-1)^{-1/2}. Thus

χ2(0(),𝔼[1j()])1K((1+(12p)2K(1p)p)T1).\displaystyle\chi^{2}(\mathbb{P}_{0}(\cdot),\mathbb{E}[\mathbb{P}_{1_{j}}(\cdot)])\leq\frac{1}{K}\cdot\left(\left(1+\frac{(1-2p)^{2}}{K(1-p)p}\right)^{T}-1\right).

This gives the desired result. Now solving the inequality

δ12(112K((1+(12p)2K(1p)p)T1)),\displaystyle\delta\geq\frac{1}{2}\cdot\left(1-\sqrt{\frac{1}{2K}\left(\left(1+\frac{(1-2p)^{2}}{K(1-p)p}\right)^{T}-1\right)}\right),

we arrive at Tlog(1+2K(12δ)2)/log(1+(12p)2K(1p)p)Klog(K)p/(12p)2T\geq\log(1+2K(1-2\delta)^{2})/\log(1+\frac{(1-2p)^{2}}{K(1-p)p})\gtrsim K\log(K)p/(1-2p)^{2}. ∎

Remark II.4.

Compared with the bound Ω(Klog(K)log((1p)/p))\Omega(\frac{K\log(K)}{\log((1-p)/p)}) in (Reischuk and Schmeltz, 1991, Gács and Gál, 1994), our lower bound is tighter when p1/2p\rightarrow 1/2, but looser when p0p\rightarrow 0. Thus the tightest lower bound is the maximum of KK and the two lower bounds. The repetition-based upper bound 𝒪(Klog(K/δ)1H(p))\mathcal{O}(\frac{K\log(K/\delta)}{1-H(p)}) can be derived by a union-bound argument; compared with this upper bound, the lower bound is tight with respect to all parameters when both pp and δ\delta are bounded away from 0.

III Computing the 𝖬𝖠𝖷\mathsf{MAX} function

Let θ[0,1]K\theta\in[0,1]^{K} be a sequence of length KK representing the true values of each element, and let 𝖬𝖠𝖷(θ)\mathsf{MAX}(\theta) be the index of the maximum value in the sequence. We also let μ^\hat{\mu} be any algorithm that (possibly randomly) queries any noisy comparison between two elements in TT rounds, and outputs a (possibly random) decision in [K][K].

III-A Adaptive Sampling

We have the following minimax lower bound for the adaptive setting. The proof is deferred to Appendix B.

Theorem III.1.

In the adaptive setting, we have

infμ^supθ[0,1]K(μ^𝖬𝖠𝖷(θ))12exp(TD𝖪𝖫(p1p)K).\displaystyle\inf_{\hat{\mu}}\sup_{\theta\in[0,1]^{K}}\mathbb{P}(\hat{\mu}\neq\mathsf{MAX}(\theta))\geq\tfrac{1}{2}\cdot\exp\left(-\tfrac{T\cdot D_{\mathsf{KL}}(p\|1-p)}{K}\right).

Thus, the number of queries required to recover the true value with probability at least 1δ1-\delta is lower bounded by Ω(K/(1H(p))+Klog(1/δ)/D𝖪𝖫(p1p))\Omega({K}/{(1-H(p))}+K\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p)).

The comparison with the existing lower bound (Feige et al., 1994) for computing 𝖬𝖠𝖷\mathsf{MAX} is similar to that for 𝖮𝖱\mathsf{OR} in Remark II.2, and is thus omitted here.

III-B Non-adaptive Sampling

In the case of non-adaptive sampling, we show that 𝒪(K)\mathcal{O}(K) samples are not enough. Instead, one needs at least Ω(K2)\Omega(K^{2}) samples. We have the following result. The proof is deferred to Appendix C.

Theorem III.2.

In the non-adaptive setting, where the sampling procedure is restricted to taking independent samples from a sequence of distributions p1,,pTp_{1},\cdots,p_{T}, we have

infμ^supθ[0,1]K(μ^𝖬𝖠𝖷(θ))14exp(TD𝖪𝖫(p1p)K2).\displaystyle\inf_{\hat{\mu}}\sup_{\theta\in[0,1]^{K}}\mathbb{P}(\hat{\mu}\neq\mathsf{MAX}(\theta))\geq\tfrac{1}{4}\cdot\exp\left(-\tfrac{T\cdot D_{\mathsf{KL}}(p\|1-p)}{K^{2}}\right).

Thus, the number of queries required to recover the true value with probability at least 1δ1-\delta is lower bounded by Ω(K2/(1H(p))+K2log(1/δ)/D𝖪𝖫(p1p))\Omega(K^{2}/(1-H(p))+K^{2}\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p)).

Remark III.3.

Compared with the repetition-based upper bound 𝒪(K2log(K))\mathcal{O}(K^{2}\log(K)), the lower bound has a log(K)\log(K) gap. The tight dependence on KK remains open.
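As a point of reference for the remark above, here is a minimal non-adaptive sketch (in Python; the function name and the constant in the repetition count are ours and purely illustrative) of the repetition-based strategy: every pair is compared a fixed number of times up front, each comparison is decoded by majority vote, and the element with the most wins is returned.

    import math
    import random

    def noisy_max_nonadaptive(values, p, delta):
        # Non-adaptive repetition scheme (assumes 0 < p < 1/2): compare every pair
        # r times, decode each pair by majority vote, and return the index with the
        # most wins.  r follows an O(log(K/delta)/(1 - H(p))) per-pair budget, so the
        # total number of queries is O(K^2 log(K/delta)/(1 - H(p))).
        K = len(values)
        h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)   # binary entropy H(p)
        r = max(1, math.ceil(4 * math.log2(K / delta) / (1 - h)))
        wins = [0] * K
        for i in range(K):
            for j in range(i + 1, K):
                truth = int(values[i] > values[j])
                votes = sum(truth ^ (random.random() < p) for _ in range(r))
                if 2 * votes > r:      # majority says X_i > X_j
                    wins[i] += 1
                else:
                    wins[j] += 1
        return max(range(K), key=lambda k: wins[k])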

IV Computing the 𝖲𝖤𝖠𝖱𝖢𝖧\mathsf{SEARCH} function

IV-A Adaptive Sampling

Recall that for any sorted sequence X1,,XKX_{1},\cdots,X_{K}, the 𝖲𝖤𝖠𝖱𝖢𝖧\mathsf{SEARCH} function for XX is defined as 𝖲𝖤𝖠𝖱𝖢𝖧(X;X1,,XK)=i\mathsf{SEARCH}(X;X_{1},\cdots,X_{K})=i, where ii satisfies Xi<X<Xi+1X_{i}<X<X_{i+1}. We start with the adaptive setting. The proof is deferred to Appendix D.

Theorem IV.1.

In the adaptive setting, we have

infμ^supX(μ^𝖲𝖤𝖠𝖱𝖢𝖧(X))\displaystyle\inf_{\hat{\mu}}\sup_{X}\mathbb{P}(\hat{\mu}\neq\mathsf{SEARCH}(X))
\displaystyle\geq 14max(exp(TD𝖪𝖫(p1p)),1T(1H(p))log(K)).\displaystyle\tfrac{1}{4}\cdot\max\Big{(}\exp\left(-{T\cdot D_{\mathsf{KL}}(p\|1-p)}\right),1-\tfrac{T\cdot(1-H(p))}{\log(K)}\Big{)}.

Thus, the number of queries required to recover the true value with probability at least 1δ1-\delta is lower bounded by Ω(log(K)/(1H(p))+log(1/δ)/D𝖪𝖫(p1p))\Omega(\log(K)/(1-H(p))+\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p)).
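To see how the two terms arise, note that any algorithm achieving error probability at most δ must make both branches of the maximum at most 4δ; solving each of the resulting inequalities for T gives

T\geq\frac{\log\frac{1}{4\delta}}{D_{\mathsf{KL}}(p\|1-p)}\qquad\text{and}\qquad T\geq\frac{(1-4\delta)\log(K)}{1-H(p)},

which correspond to the two terms of the stated bound when δ is a sufficiently small constant.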

Remark IV.2.

The same lower bound is also proven in two different ways in the concurrent works (Wang et al., 2023, Gu and Xu, 2023). A matching upper bound that is tight with respect to all parameters when pp is bounded away from 0 and 1/21/2 is provided in Gu and Xu (2023).

IV-B Non-adaptive Sampling

For non-adaptive sampling, we show Ω(K)\Omega(K) queries are needed. The proof is deferred to Appendix E.

Theorem IV.3.

In the non-adaptive setting where the sampling procedure is restricted to taking independent samples from a sequence of distributions p1,,pTp_{1},\cdots,p_{T}, we have

infμ^supX(μ^𝖲𝖤𝖠𝖱𝖢𝖧(X))14exp(TD𝖪𝖫(p1p)K).\displaystyle\inf_{\hat{\mu}}\sup_{X}\mathbb{P}(\hat{\mu}\neq\mathsf{SEARCH}(X))\geq\tfrac{1}{4}\cdot\exp\left(-\tfrac{T\cdot D_{\mathsf{KL}}(p\|1-p)}{K}\right).

Thus, the number of queries required to recover the true value with probability at least 1δ1-\delta is lower bounded by Ω(K/(1H(p))+Klog(1/δ)/D𝖪𝖫(p1p))\Omega(K/(1-H(p))+K\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p)).

Remark IV.4.

We show in Appendix H the upper bound 𝒪(Klog(1/δ)/(1H(p)))\mathcal{O}(K\log(1/\delta)/(1-H(p))) for computing 𝖲𝖤𝖠𝖱𝖢𝖧\mathsf{SEARCH}. Our lower bound matches it in all parameters when either pp or δ\delta is bounded away from 0.

V Computing the 𝖲𝖮𝖱𝖳\mathsf{SORT} function

V-A Adaptive Sampling

Let θ[0,1]K\theta\in[0,1]^{K} be any sequence of KK elements with distinct values in [0,1][0,1], and let 𝖲𝖮𝖱𝖳(θ)\mathsf{SORT}(\theta) be the result of the 𝖲𝖮𝖱𝖳\mathsf{SORT} function applied to the noiseless sequence. We also let μ^\hat{\mu} be any algorithm that queries any noisy comparison between two elements in TT rounds, and outputs a (possibly random) decision. We have the following result for computing the 𝖲𝖮𝖱𝖳\mathsf{SORT} function. The proof is deferred to Appendix F.

Theorem V.1.

In the adaptive setting, we have

infμ^supθ[0,1]K(μ^𝖲𝖮𝖱𝖳(θ))\displaystyle\inf_{\hat{\mu}}\sup_{\theta\in[0,1]^{K}}\mathbb{P}(\hat{\mu}\neq\mathsf{SORT}(\theta))
\displaystyle\geq 14max(exp(TD𝖪𝖫(p1p)K),1T(1H(p))Klog(K)).\displaystyle\frac{1}{4}\max\left(\exp\left(-\frac{T\cdot D_{\mathsf{KL}}(p\|1-p)}{K}\right),1-\frac{T\cdot(1-H(p))}{K\log(K)}\right).

Thus, the number of queries required to recover the true value with probability at least 1δ1-\delta is lower bounded by Ω(Klog(K)/(1H(p))+Klog(K/δ)/D𝖪𝖫(p1p))\Omega(K\log(K)/(1-H(p))+K\log(K/\delta)/D_{\mathsf{KL}}(p\|1-p)).

Remark V.2.

Compared with the upper bound 𝒪(Klog(K/δ)/(1H(p)))\mathcal{O}(K\log(K/\delta)/(1-H(p))), the lower bound is tight in all parameters when either pp or δ\delta is bounded away from 0.

V-B Non-adaptive Sampling

Here we provide the following minimax lower bound for non-adaptive sampling. The proof is deferred to Appendix G.

Theorem V.3.

In the non-adaptive setting where the sampling procedure is restricted to taking independent samples from a sequence of distributions p1,,pTp_{1},\cdots,p_{T},

infμ^supθ[0,1]K(μ^𝖲𝖮𝖱𝖳(θ))12(1TD𝖪𝖫(p1p)K2log(K)).\displaystyle\inf_{\hat{\mu}}\sup_{\theta\in[0,1]^{K}}\mathbb{P}(\hat{\mu}\neq\mathsf{SORT}(\theta))\geq\frac{1}{2}\cdot\left(1-\frac{T\cdot D_{\mathsf{KL}}(p\|1-p)}{K^{2}\log(K)}\right).

Thus, the number of queries required to recover the true value with probability at least 1δ1-\delta is lower bounded by Ω(max(K2,K2log(K)/D𝖪𝖫(p1p)))\Omega(\max(K^{2},K^{2}\log(K)/D_{\mathsf{KL}}(p\|1-p))).

Remark V.4.

Compared with the repetition-based upper bound 𝒪(K2log(K/δ)/(1H(p)))\mathcal{O}(K^{2}\log(K/\delta)/(1-H(p))), the lower bound is tight in all parameters when both pp and δ\delta are bounded away from 0.

VI Matching Bounds for Variable Length

In this section, we provide matching upper and lower bounds for the variable-length setting. All the bounds here are the same as the lower bounds for the fixed-length setting, and the proofs of the lower bounds can be directly adapted from the fixed-length results.

Theorem VI.1.

In the adaptive setting, the number of queries in expectation to achieve at most δ\delta error probability is

  1. Θ(K1H(p)+Klog(1/δ)D𝖪𝖫(p1p))\Theta(\frac{K}{1-H(p)}+\frac{K\log(1/\delta)}{D_{\mathsf{KL}}(p\|1-p)}) for computing 𝖮𝖱\mathsf{OR};

  2. Θ(K1H(p)+Klog(1/δ)D𝖪𝖫(p1p))\Theta(\frac{K}{1-H(p)}+\frac{K\log(1/\delta)}{D_{\mathsf{KL}}(p\|1-p)}) for computing 𝖬𝖠𝖷\mathsf{MAX};

  3. Θ(log(K)1H(p)+log(1/δ)D𝖪𝖫(p1p))\Theta(\frac{\log(K)}{1-H(p)}+\frac{\log(1/\delta)}{D_{\mathsf{KL}}(p\|1-p)}) for computing 𝖲𝖤𝖠𝖱𝖢𝖧\mathsf{SEARCH};

  4. Θ(Klog(K)1H(p)+Klog(K/δ)D𝖪𝖫(p1p))\Theta(\frac{K\log(K)}{1-H(p)}+\frac{K\log(K/\delta)}{D_{\mathsf{KL}}(p\|1-p)}) for computing 𝖲𝖮𝖱𝖳\mathsf{SORT}.

The proof is deferred to Appendix J. The matching upper bound for 𝖲𝖤𝖠𝖱𝖢𝖧\mathsf{SEARCH} is given in (Gu and Xu, 2023) in the regime where pp is a constant bounded away from 0 and 1/21/2, where they make the bound tight even in the constant factor. Our results for the other three functions improve both existing upper and lower bounds, and provide tight query complexity in all parameters in the variable-length setting. (Wang et al., 2023) initiates the study of constant-wise matching bounds for 𝖲𝖮𝖱𝖳\mathsf{SORT} when δ=Ω(K1)\delta=\Omega(K^{-1}), which are achieved by (Gu and Xu, 2023). Our bounds are tight up to constant factors for arbitrarily small δ\delta.

Algorithm 1 Variable-length tournament for computing 𝖮𝖱\mathsf{OR} with noise
1:Input: Target confidence level δ\delta.
2:Set 𝒳=(X1,X2,,XK)\mathcal{X}=(X_{1},X_{2},\cdots,X_{K}) as the list of all bits with unknown value.
3:for iteration i=1:log2(K)i=1:\lceil\log_{2}(K)\rceil do
4:     for iteration j=1:|𝒳|/2j=1:\lceil|\mathcal{X}|/2\rceil do
5:         Set a=1/2a=1/2, δ~i=δ2(2i1)\tilde{\delta}_{i}=\delta^{2(2i-1)}.
6:         while a(δ~i,1δ~i)a\in(\tilde{\delta}_{i},1-\tilde{\delta}_{i}) do
7:              Query the (2j1)(2j-1)-th element once.
8:              If observe 11, update a=(1p)a(1p)a+p(1a)a=\frac{(1-p)a}{(1-p)a+p(1-a)}. Otherwise update a=papa+(1p)(1a)a=\frac{pa}{pa+(1-p)(1-a)}.          
9:         If aδ~ia\leq\tilde{\delta}_{i}, remove the (2j1)(2j-1)-th element, otherwise remove the 2j2j-th element.      
10:     Break when 𝒳\mathcal{X} only has one element left.
11:Set a=1/2a=1/2.
12:while a(δ,1δ)a\in(\delta,1-\delta) do
13:     Query the only element left in 𝒳\mathcal{X} once.
14:     If observe 11, update a=(1p)a(1p)a+p(1a)a=\frac{(1-p)a}{(1-p)a+p(1-a)}. Otherwise update a=papa+(1p)(1a)a=\frac{pa}{pa+(1-p)(1-a)}.
15:If aδa\leq\delta, return 0, otherwise return 1.

We provide the upper bound algorithm for computing 𝖮𝖱\mathsf{OR} in Algorithm 1. For the upper bound, one major difference is that to compare two elements with error probability at most δ\delta, one needs 𝒪(log(1/δ)/D𝖪𝖫(p1p)+1/(12p))\mathcal{O}(\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p)+1/(1-2p)) queries in expectation, which can be achieved by repeatedly comparing the two elements until the posterior distribution reaches the desired confidence level (see e.g. Lemma 13 of Gu and Xu (2023)). In contrast, the best known bound for fixed length is 𝒪(log(1/δ)/(1H(p)))\mathcal{O}(\log(1/\delta)/(1-H(p))) (Feige et al., 1994). This makes it simpler to achieve the tight rate for variable length.

In Algorithm 1, we adapt the noisy comparison oracle in Gu and Xu (2023) to a noisy query oracle on each element, and combine it with the original fixed-length algorithm in Feige et al. (1994). In each round, the algorithm eliminates half of the elements in the current set by querying the elements with odd indices. If the (2j1)(2j-1)-th element is determined to be 11, the 2j2j-th element will be removed without being queried. If the (2j1)(2j-1)-th element is determined to be 0, it will be removed from the list. Thus after 𝒪(log(K))\mathcal{O}(\log(K)) rounds, we have only one element left in the set, and it suffices to query this element to determine the output of the 𝖮𝖱\mathsf{OR} function.
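The sketch below (in Python) is our rendering of Algorithm 1. The helper posterior_test implements the variable-length primitive of lines 6–8 and 12–14: keep querying one bit and updating the posterior probability a until a exits the interval (δ̃, 1−δ̃). The tournament then keeps one element per pair in each round. The pairing bookkeeping and the function names are ours; the per-round confidence schedule is reproduced as printed in Algorithm 1.

    import random

    def bsc(bit, p):
        # One noisy query of a bit through a binary symmetric channel BSC(p).
        return bit ^ (random.random() < p)

    def posterior_test(query_fn, p, conf):
        # Query one element repeatedly, tracking the posterior probability a that its
        # value is 1 (starting from the uniform prior a = 1/2), until a leaves
        # (conf, 1 - conf).  Returns the decided value (0 or 1).
        a = 0.5
        while conf < a < 1 - conf:
            if query_fn():   # observed a 1
                a = (1 - p) * a / ((1 - p) * a + p * (1 - a))
            else:            # observed a 0
                a = p * a / (p * a + (1 - p) * (1 - a))
        return 0 if a <= conf else 1

    def noisy_or_variable_length(bits, p, delta):
        # Variable-length tournament for OR with noise (sketch of Algorithm 1).
        alive = list(range(len(bits)))
        i = 0
        while len(alive) > 1:
            i += 1
            conf_i = delta ** (2 * (2 * i - 1))     # per-round confidence, as printed
            survivors = []
            for j in range(0, len(alive), 2):       # pair up the current elements
                left = alive[j]
                decided = posterior_test(lambda k=left: bsc(bits[k], p), p, conf_i)
                if decided == 1:
                    survivors.append(left)          # keep the 1, drop its partner
                elif j + 1 < len(alive):
                    survivors.append(alive[j + 1])  # drop the 0, keep its partner
            alive = survivors
        # Final check of the single remaining element at confidence delta.
        k = alive[0]
        return posterior_test(lambda: bsc(bits[k], p), p, delta)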

VII Conclusions and Future Work

For four noisy computing tasks — the 𝖮𝖱\mathsf{OR} function from noisy queries, and the 𝖬𝖠𝖷,𝖲𝖤𝖠𝖱𝖢𝖧\mathsf{MAX},\mathsf{SEARCH}, and 𝖲𝖮𝖱𝖳\mathsf{SORT} functions from noisy pairwise comparisons — we tighten the lower bounds for fixed-length noisy computing and provide matching bounds for variable-length noisy computing. Making the bounds match exactly in the fixed-length setting remains an important open problem.

Acknowledgements

The authors would like to thank Yanjun Han for insightful discussions on the information-theoretic lower bounds. This work was supported in part by the NSERC Discovery Grant No. RGPIN-2019-05448, the NSERC Collaborative Research and Development Grant CRDPJ 54367619, and NSF Grants IIS-1901252 and CCF-1909499.

References

  • Agarwal et al. (2017) A. Agarwal, S. Agarwal, S. Assadi, and S. Khanna. Learning with limited rounds of adaptivity: Coin tossing, multi-armed bandits, and ranking from pairwise comparisons. In Proceedings of the 2017 Conference on Learning Theory, volume 65 of Proceedings of Machine Learning Research, pages 39–75. PMLR, 07–10 Jul 2017.
  • Ailon (2011) N. Ailon. Active learning ranking from pairwise preferences with almost optimal query complexity. In J. Shawe-Taylor, R. Zemel, P. Bartlett, F. Pereira, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 24. Curran Associates, Inc., 2011.
  • Ailon et al. (2008) N. Ailon, M. Charikar, and A. Newman. Aggregating inconsistent information: Ranking and clustering. J. ACM, 55(5), nov 2008. ISSN 0004-5411. doi: 10.1145/1411509.1411513.
  • Ailon et al. (2011) N. Ailon, R. Begleiter, and E. Ezra. A new active learning scheme with applications to learning to rank from pairwise preferences. 2011.
  • Ajtai et al. (2009) M. Ajtai, V. Feldman, A. Hassidim, and J. Nelson. Sorting and selection with imprecise comparisons. In ACM Trans. Algorithms, volume 12, pages 37–48, 07 2009. doi: 10.1007/978-3-642-02927-1˙5.
  • Audibert et al. (2010) J.-Y. Audibert, S. Bubeck, and R. Munos. Best arm identification in multi-armed bandits. In COLT, pages 41–53, 2010.
  • Auer et al. (1995) P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. Gambling in a rigged casino: The adversarial multi-armed bandit problem. In Proceedings of IEEE 36th annual foundations of computer science, pages 322–331. IEEE, 1995.
  • Berlekamp (1964) E. R. Berlekamp. Block coding with noiseless feedback. Ph.D. thesis, MIT, Cambridge, MA, USA, 1964.
  • Bradley and Terry (1952) R. A. Bradley and M. E. Terry. Rank analysis of incomplete block designs: I. the method of paired comparisons. Biometrika, 39(3/4):324–345, 1952. ISSN 00063444.
  • Braverman and Mossel (2009) M. Braverman and E. Mossel. Sorting from noisy information. 2009. URL https://arxiv.org/abs/0910.1191.
  • Bretagnolle and Huber (1979) J. Bretagnolle and C. Huber. Estimation des densités: risque minimax. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 47(2):119–137, 1979. doi: 10.1007/BF00535278.
  • Bubeck et al. (2009) S. Bubeck, R. Munos, and G. Stoltz. Pure exploration in multi-armed bandits problems. In International conference on Algorithmic learning theory, pages 23–37. Springer, 2009.
  • Burnashev (1976) M. V. Burnashev. Data transmission over a discrete channel with feedback. random transmission time. Problemy Peredachi Informatsii, 12(4):10–30, 1976.
  • Burnashev and Zigangirov (1974) M. V. Burnashev and K. Zigangirov. An interval estimation problem for controlled observations. Problemy Peredachi Informatsii, 10(3):51–61, 1974.
  • Chen et al. (2017) X. Chen, Y. Chen, and X. Li. Asymptotically optimal sequential design for rank aggregation. Math. Oper. Res., 10 2017. doi: 10.1287/moor.2021.1209.
  • Chen and Suh (2015) Y. Chen and C. Suh. Spectral MLE: Top-K rank aggregation from pairwise comparisons. In F. Bach and D. Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pages 371–380, Lille, France, 07–09 Jul 2015. PMLR.
  • Dobrushin and Ortyukov (1977a) R. L. Dobrushin and S. Ortyukov. Lower bound for the redundancy of self-correcting arrangements of unreliable functional elements. Problemy Peredachi Informatsii, 13(1):82–89, 1977a.
  • Dobrushin and Ortyukov (1977b) R. L. Dobrushin and S. Ortyukov. Upper bound on the redundancy of self-correcting arrangements of unreliable functional elements. Problemy Peredachi Informatsii, 13(3):56–76, 1977b.
  • Evans and Pippenger (1998) W. Evans and N. Pippenger. Average-case lower bounds for noisy boolean decision trees. SIAM Journal on Computing, 28(2):433–446, 1998.
  • Falahatgar et al. (2017) M. Falahatgar, A. Orlitsky, V. Pichapati, and A. T. Suresh. Maximum selection and ranking under noisy comparisons. In D. Precup and Y. W. Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1088–1096. PMLR, 06–11 Aug 2017.
  • Feige et al. (1994) U. Feige, P. Raghavan, D. Peleg, and E. Upfal. Computing with noisy information. SIAM Journal on Computing, 23(5):1001–1018, 1994.
  • Gabillon et al. (2012) V. Gabillon, M. Ghavamzadeh, and A. Lazaric. Best arm identification: A unified approach to fixed budget and fixed confidence. Advances in Neural Information Processing Systems, 25, 2012.
  • Gács and Gál (1994) P. Gács and A. Gál. Lower bounds for the complexity of reliable boolean circuits with noisy gates. IEEE Transactions on Information Theory, 40(2):579–583, 1994.
  • Garivier and Kaufmann (2016) A. Garivier and E. Kaufmann. Optimal best arm identification with fixed confidence. In Conference on Learning Theory, 2016.
  • Gu and Xu (2023) Y. Gu and Y. Xu. Optimal bounds for noisy sorting, 2023.
  • Heckel et al. (2019) R. Heckel, N. B. Shah, K. Ramchandran, and M. J. Wainwright. Active ranking from pairwise comparisons and when parametric assumptions do not help. Ann. Stat., 47(6):3099 – 3126, 2019.
  • Horstein (1963) M. Horstein. Sequential transmission using noiseless feedback. IEEE Trans. Inf. Theory, 9(3):136–143, 1963. doi: 10.1109/TIT.1963.1057832.
  • Ingster et al. (2003) Y. Ingster, J. I. Ingster, and I. Suslina. Nonparametric goodness-of-fit testing under Gaussian models, volume 169. Springer Science & Business Media, 2003.
  • Jamieson and Nowak (2014) K. Jamieson and R. Nowak. Best-arm identification algorithms for multi-armed bandits in the fixed confidence setting. In 48th Annual Conference on Information Sciences and Systems, pages 1–6, 2014.
  • Jamieson and Nowak (2011) K. G. Jamieson and R. D. Nowak. Active ranking using pairwise comparisons. In Proceedings of the 24th International Conference on Neural Information Processing Systems, NIPS’11, page 2240–2248, Red Hook, NY, USA, 2011. Curran Associates Inc. ISBN 9781618395993.
  • Karp and Kleinberg (2007) R. M. Karp and R. Kleinberg. Noisy binary search and its applications. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’07, page 881–890, USA, 2007. Society for Industrial and Applied Mathematics. ISBN 9780898716245.
  • Kaufmann et al. (2016) E. Kaufmann, O. Cappé, and A. Garivier. On the complexity of best arm identification in multi-armed bandit models. Journal of Machine Learning Research, 17:1–42, 2016.
  • LeCam (1973) L. LeCam. Convergence of estimates under dimensionality restrictions. The Annals of Statistics, pages 38–53, 1973.
  • Mao et al. (2018) C. Mao, J. Weed, and P. Rigollet. Minimax rates and efficient algorithms for noisy sorting. In Proceedings of Algorithmic Learning Theory, volume 83 of Proceedings of Machine Learning Research, pages 821–847. PMLR, 07–09 Apr 2018.
  • Mitzenmacher and Upfal (2017) M. Mitzenmacher and E. Upfal. Probability and computing: Randomization and probabilistic techniques in algorithms and data analysis. Cambridge university press, 2017.
  • Mohajer et al. (2017) S. Mohajer, C. Suh, and A. Elmahdy. Active learning for top-kk rank aggregation from noisy comparisons. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, ICML’17, page 2488–2497. JMLR.org, 2017.
  • Negahban et al. (2012) S. Negahban, S. Oh, and D. Shah. Iterative ranking from pair-wise comparisons. In Advances in Neural Information Processing Systems, volume 25, 2012.
  • Pelc (1989) A. Pelc. Searching with known error probability. Theoretical Computer Science, 63(2):185–202, 1989.
  • Pippenger et al. (1991) N. Pippenger, G. D. Stamoulis, and J. N. Tsitsiklis. On a lower bound for the redundancy of reliable networks with noisy gates. IEEE Transactions on Information Theory, 37(3):639–643, 1991.
  • Rajkumar and Agarwal (2014) A. Rajkumar and S. Agarwal. A statistical convergence perspective of algorithms for rank aggregation from pairwise data. In Proceedings of the 31st International Conference on International Conference on Machine Learning - Volume 32, page I–118–I–126, 2014.
  • Reischuk and Schmeltz (1991) R. Reischuk and B. Schmeltz. Reliable computation with noisy circuits and decision trees-a general n log n lower bound. In [1991] Proceedings 32nd Annual Symposium of Foundations of Computer Science, pages 602–611. IEEE Computer Society, 1991.
  • Ren et al. (2018) W. Ren, J. Liu, and N. B. Shroff. Pac ranking from pairwise and listwise queries: Lower bounds and upper bounds. 2018.
  • Rényi (1961) A. Rényi. On a problem of information theory. MTA Mat. Kut. Int. Kozl. B, 6(MR143666):505–516, 1961.
  • Shah and Wainwright (2018) N. B. Shah and M. J. Wainwright. Simple, robust and optimal ranking from pairwise comparisons. J. Mach. Learn. Res., 18(199):1–38, 2018.
  • Shah et al. (2016) N. B. Shah, S. Balakrishnan, A. Guntuboyina, and M. J. Wainwright. Stochastically transitive models for pairwise comparisons: Statistical and computational issues. In Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48, ICML’16, page 11–20. JMLR.org, 2016.
  • Tsybakov (2004) A. B. Tsybakov. Introduction to nonparametric estimation. Springer, 9(10), 2004.
  • Ulam (1976) S. Ulam. Adventures of a mathematician. Charles Scribner’s Sons, New York, NY, USA, 1976.
  • Von Neumann (1956) J. Von Neumann. Probabilistic logics and the synthesis of reliable organisms from unreliable components. Automata studies, 34:43–98, 1956.
  • Wang et al. (2022) Z. Wang, N. Ghaddar, and L. Wang. Noisy sorting capacity. In 2022 IEEE International Symposium on Information Theory (ISIT), pages 2541–2546. IEEE, 2022.
  • Wang et al. (2023) Z. Wang, N. Ghaddar, B. Zhu, and L. Wang. Noisy sorting capacity, 2023. URL https://arxiv.org/abs/2202.01446.
  • Wauthier et al. (2013) F. Wauthier, M. Jordan, and N. Jojic. Efficient ranking from pairwise comparisons. In Proceedings of the 30th International Conference on Machine Learning, volume 28 of Proceedings of Machine Learning Research, pages 109–117, Atlanta, Georgia, USA, 17–19 Jun 2013.
  • Yu (1997) B. Yu. Assouad, fano, and le cam. In Festschrift for Lucien Le Cam, pages 423–435. Springer, 1997.

Appendix A Useful Lemmas

Here, we introduce several important lemmas.

Lemma A.1 (Le Cam’s Two Point Lemma, see e.g. (Yu, 1997)).

For any θ0,θ1Θ\theta_{0},\theta_{1}\in\Theta, suppose that the following separation condition holds for some loss function L(θ,a):Θ×𝒜L(\theta,a):\Theta\times\mathcal{A}\rightarrow\mathbb{R}:

a𝒜,L(θ0,a)+L(θ1,a)Δ>0.\displaystyle\forall{a\in\mathcal{A}},L(\theta_{0},a)+L(\theta_{1},a)\geq\Delta>0.

Then we have

inffsupθ𝔼θ[L(θ,f(X))]Δ2(1𝖳𝖵(θ0,θ1)).\displaystyle\inf_{f}\sup_{\theta}\mathbb{E}_{\theta}[L(\theta,f(X))]\geq\frac{\Delta}{2}\left(1-\mathsf{TV}(\mathbb{P}_{\theta_{0}},\mathbb{P}_{\theta_{1}})\right).
Lemma A.2 (Point vs Mixture, see e.g. (Ingster et al., 2003)).

For any θ0Θ\theta_{0}\in\Theta, Θ1Θ\Theta_{1}\subset\Theta, suppose that the following separation condition holds for some loss function L(θ,a):Θ×𝒜L(\theta,a):\Theta\times\mathcal{A}\rightarrow\mathbb{R}:

θ1Θ1,a𝒜,L(θ0,a)+L(θ1,a)Δ>0.\displaystyle\forall\theta_{1}\in\Theta_{1},{a\in\mathcal{A}},L(\theta_{0},a)+L(\theta_{1},a)\geq\Delta>0.

Then for any probability measure μ\mu supported on Θ1\Theta_{1}, we have

inffsupθ𝔼θ[L(θ,f(X))]Δ2(1𝖳𝖵(θ0,𝔼μ(dθ)[θ1])).\displaystyle\inf_{f}\sup_{\theta}\mathbb{E}_{\theta}[L(\theta,f(X))]\geq\frac{\Delta}{2}\left(1-\mathsf{TV}(\mathbb{P}_{\theta_{0}},\mathbb{E}_{\mu(d\theta)}[\mathbb{P}_{\theta_{1}}])\right).
Lemma A.3 (Divergence Decomposition (Auer et al., 1995)).

Let TiT_{i} be the random variable denoting the number of times experiment i[K]i\in[K] is performed under some policy π\pi. Then, for the two distributions π,π\mathbb{P}^{\pi},\mathbb{Q}^{\pi} of the observations under policy π\pi,

D𝖪𝖫(π,π)=i[K]𝔼π[Ti]D𝖪𝖫(iπ,iπ).\displaystyle D_{\mathsf{KL}}(\mathbb{P}^{\pi},\mathbb{Q}^{\pi})=\sum_{i\in[K]}\mathbb{E}_{\mathbb{P}^{\pi}}[T_{i}]D_{\mathsf{KL}}(\mathbb{P}_{i}^{\pi},\mathbb{Q}_{i}^{\pi}).
Lemma A.4 (Bretagnolle–Huber inequality (Bretagnolle and Huber, 1979); see also Lemma 2.6 in Tsybakov (2004)).

For any two distributions 1,2\mathbb{P}_{1},\mathbb{P}_{2}, one has

𝖳𝖵(1,2)112exp(D𝖪𝖫(1,2)).\displaystyle\mathsf{TV}(\mathbb{P}_{1},\mathbb{P}_{2})\leq 1-\frac{1}{2}\exp(-D_{\mathsf{KL}}(\mathbb{P}_{1},\mathbb{P}_{2})).
Lemma A.5 (Chi-square divergence for a mixture of distributions, see e.g. (Ingster et al., 2003)).

Let (Pθ)θΘ(P_{\theta})_{\theta\in\Theta} be a family of distributions parametrized by θ\theta, and QQ be any fixed distribution. Then for any probability measure μ\mu supported on Θ\Theta, we have

χ2(𝔼θμ[Pθ],Q)=𝔼θμ,θμ[xPθ(x)Pθ(x)Q(x)]1.\displaystyle\chi^{2}(\mathbb{E}_{\theta\sim\mu}[P_{\theta}],Q)=\mathbb{E}_{\theta\sim\mu,\theta^{\prime}\sim\mu}\left[\sum_{x}\frac{P_{\theta}(x)P_{\theta^{\prime}}(x)}{Q(x)}\right]-1.

Here θ,θ\theta,\theta^{\prime} in the expectation are independent.

Lemma A.6 (Chernoff’s bound for Binomial random variables, (Mitzenmacher and Upfal, 2017, Exercise 4.7)).

If X𝖡𝗂𝗇(n,λn)X\sim\mathsf{Bin}(n,\frac{\lambda}{n}), then the upper-tail bound (3) holds for any η>0\eta>0, and the lower-tail bound (4) holds for any η(0,1)\eta\in(0,1):

(X(1+η)λ)\displaystyle\mathbb{P}(X\geq(1+\eta)\lambda) (eη(1+η)(1+η))λ\displaystyle\leq\left(\frac{e^{\eta}}{(1+\eta)^{(1+\eta)}}\right)^{\lambda} (3)
(X(1η)λ)\displaystyle\mathbb{P}(X\leq(1-\eta)\lambda) (eη(1η)(1η))λeη2λ/2.\displaystyle\leq\left(\frac{e^{-\eta}}{(1-\eta)^{(1-\eta)}}\right)^{\lambda}\leq e^{-\eta^{2}\lambda/2}. (4)
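The upper-tail bound (3) can also be sanity-checked by simulation. The following short script is an added illustration (not part of the original analysis); the parameter values n, λ, and η are arbitrary choices.

```python
import math
import random

# Monte Carlo sanity check of the upper-tail bound (3) in Lemma A.6.
# The parameters n, lam, eta below are arbitrary illustrative choices.
n, lam, eta = 200, 20.0, 0.5          # X ~ Bin(n, lam/n), threshold (1 + eta) * lam
trials = 100_000
p_success = lam / n

exceed = sum(
    sum(random.random() < p_success for _ in range(n)) >= (1 + eta) * lam
    for _ in range(trials)
)

empirical = exceed / trials
bound = (math.exp(eta) / (1 + eta) ** (1 + eta)) ** lam
print(f"empirical tail probability: {empirical:.4f}")
print(f"Chernoff upper bound:       {bound:.4f}")   # the bound should dominate
```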

Appendix B Proof for Theorem III.1

Proof.

We first select an arbitrary sequence 0<X1<X2<<XK<10<X_{1}<X_{2}<\cdots<X_{K}<1. Let θ0=(X1,X2,X3,,XK)\theta_{0}=(X_{1},X_{2},X_{3},\cdots,X_{K}) be the original sequence, whose largest value is in the KK-th position. Thus 𝖬𝖠𝖷(θ0)=K\mathsf{MAX}(\theta_{0})=K. Now for any i[K1]i\in[K-1], we design θi=(X1,,Xi1,XK,Xi,Xi+1,,XK1)\theta_{i}=(X_{1},\cdots,X_{i-1},X_{K},X_{i},X_{i+1},\cdots,X_{K-1}), i.e., we take the KK-th element of θ0\theta_{0} and insert it between Xi1X_{i-1} and XiX_{i}. Let Ti,jT_{i,j} be the random variable that represents the number of comparisons between the ii-th item and the jj-th item in the TT rounds. We know that 𝖬𝖠𝖷(θi)=i\mathsf{MAX}(\theta_{i})=i for all i[K1]i\in[K-1]. Following an argument similar to the proof of Theorem II.1, we know that

infμ^supθ[0,1]K(μ^𝖬𝖠𝖷(θ))\displaystyle\inf_{\hat{\mu}}\sup_{\theta\in[0,1]^{K}}\mathbb{P}(\hat{\mu}\neq\mathsf{MAX}(\theta))
sup1jK112(1𝖳𝖵(θ0,θj))\displaystyle\geq\sup_{1\leq j\leq K-1}\frac{1}{2}(1-\mathsf{TV}(\mathbb{P}_{\theta_{0}},\mathbb{P}_{\theta_{j}}))
sup1jK114exp(D𝖪𝖫(θ0,θj))\displaystyle\geq\sup_{1\leq j\leq K-1}\frac{1}{4}\exp(-D_{\mathsf{KL}}(\mathbb{P}_{\theta_{0}},\mathbb{P}_{\theta_{j}}))
sup1jK114exp(l=j+1K𝔼θ0[Tj,l]D𝖪𝖫(p1p)).\displaystyle\geq\sup_{1\leq j\leq K-1}\frac{1}{4}\exp(-\sum_{l=j+1}^{K}\mathbb{E}_{\theta_{0}}[T_{j,l}]\cdot D_{\mathsf{KL}}(p\|1-p)).

Here the last inequality uses Lemma A.3 together with the fact that only the comparisons between the jj-th position and the positions l>jl>j have different outcome distributions under θ0\theta_{0} and θj\theta_{j}. Now since i,j[K],i<j𝔼θ0[Ti,j]=T\sum_{i,j\in[K],i<j}\mathbb{E}_{\theta_{0}}[T_{i,j}]=T, there must exist some jj such that l=j+1K𝔼θ0[Tj,l]T/K\sum_{l=j+1}^{K}\mathbb{E}_{\theta_{0}}[T_{j,l}]\leq T/K. This gives

infμ^supθ[0,1]K(μ^𝖬𝖠𝖷(θ))\displaystyle\inf_{\hat{\mu}}\sup_{\theta\in[0,1]^{K}}\mathbb{P}(\hat{\mu}\neq\mathsf{MAX}(\theta)) 14exp(TD𝖪𝖫(p1p)K).\displaystyle\geq\frac{1}{4}\cdot\exp\left(-\frac{T\cdot D_{\mathsf{KL}}(p\|1-p)}{K}\right).

On the other hand, Ω(K)\Omega(K) is a trivial lower bound on the query complexity, since every element must take part in at least one comparison; otherwise it cannot be ruled out as the maximum. Thus we arrive at a lower bound of Ω(max(K,Klog(1/δ)/D𝖪𝖫(p1p)))\Omega(\max(K,K\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p))). Note that this is equivalent to Ω(K/(1H(p))+Klog(1/δ)/D𝖪𝖫(p1p))\Omega(K/(1-H(p))+K\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p)) up to a constant factor when δ<0.49\delta<0.49: when pp is bounded away from 0, the ratio D𝖪𝖫(p1p)/(1H(p))D_{\mathsf{KL}}(p\|1-p)/(1-H(p)) is bounded above and below by universal constants, and when pp is close to 0, 1H(p)1-H(p) is within a constant factor of 11. ∎

Appendix C Proof for Theorem III.2

Proof.

Consider an arbitrary sequence 0<X1<X2<<XK<10<X_{1}<X_{2}<\cdots<X_{K}<1. Since i,j[K],i<j𝔼θ0[Ti,j]=T\sum_{i,j\in[K],i<j}\mathbb{E}_{\theta_{0}}[T_{i,j}]=T, there must exist some pair (i,j)(i,j) such that 𝔼θ0[Ti,j]2T/(K(K1))4T/K2\mathbb{E}_{\theta_{0}}[T_{i,j}]\leq 2T/(K(K-1))\leq 4T/K^{2}, where the last inequality holds for K2K\geq 2. Now we construct

θ0=(X1,X2,X3,,XK,,XK1,,XK2),\theta_{0}=(X_{1},X_{2},X_{3},\cdots,X_{K},\cdots,X_{K-1},\cdots,X_{K-2}),

where XKX_{K} lies in the ii-th position and XK1X_{K-1} lies in the jj-th position, and

θ1=(X1,X2,X3,,XK1,,XK,,XK2),\theta_{1}=(X_{1},X_{2},X_{3},\cdots,X_{K-1},\cdots,X_{K},\cdots,X_{K-2}),

where XKX_{K} lies in the jj-th position and XK1X_{K-1} lies in the ii-th position. Thus 𝖬𝖠𝖷(θ0)=i\mathsf{MAX}(\theta_{0})=i, 𝖬𝖠𝖷(θ1)=j\mathsf{MAX}(\theta_{1})=j. From Le Cam’s two point lemma, we know that

infμ^supθ[0,1]K(μ^𝖬𝖠𝖷(θ))\displaystyle\inf_{\hat{\mu}}\sup_{\theta\in[0,1]^{K}}\mathbb{P}(\hat{\mu}\neq\mathsf{MAX}(\theta))
12(1𝖳𝖵(θ0,θ1))\displaystyle\geq\frac{1}{2}(1-\mathsf{TV}(\mathbb{P}_{\theta_{0}},\mathbb{P}_{\theta_{1}}))
14exp(𝔼θ0[Ti,j]D𝖪𝖫(p1p))\displaystyle\geq\frac{1}{4}\exp(-\mathbb{E}_{\theta_{0}}[T_{i,j}]\cdot D_{\mathsf{KL}}(p\|1-p))
14exp(4TD𝖪𝖫(p1p)K2).\displaystyle\geq\frac{1}{4}\cdot\exp\left(-\frac{4T\cdot D_{\mathsf{KL}}(p\|1-p)}{K^{2}}\right).

On the other hand, Ω(K2)\Omega(K^{2}) is a trivial lower bound on the query complexity in the non-adaptive setting, since every pair of elements must be compared at least once: if a pair is never compared, the two instances in which that pair holds the two largest values (in either order) induce the same distribution of observations. Thus we arrive at a lower bound of Ω(max(K2,K2log(1/δ)/D𝖪𝖫(p1p)))\Omega(\max(K^{2},K^{2}\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p))). Note that this is equivalent to Ω(K2/(1H(p))+K2log(1/δ)/D𝖪𝖫(p1p))\Omega(K^{2}/(1-H(p))+K^{2}\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p)) up to a constant factor when δ<0.49\delta<0.49. ∎

Appendix D Proof for Theorem IV.1

Proof.

We begin with the first half, i.e.

infμ^supX(μ^𝖲𝖤𝖠𝖱𝖢𝖧(X))14exp(TD𝖪𝖫(p1p)).\displaystyle\inf_{\hat{\mu}}\sup_{X}\mathbb{P}(\hat{\mu}\neq\mathsf{SEARCH}(X))\geq\frac{1}{4}\cdot\exp\left(-{T\cdot D_{\mathsf{KL}}(p\|1-p)}\right).

To see this, simply consider the case of K=1K=1, and we need to determine whether X<X1X<X_{1} or X>X1X>X_{1} from their pairwise comparisons. We consider two instances X(0),X(1)X^{(0)},X^{(1)}, where X(0)<X1<X(1)X^{(0)}<X_{1}<X^{(1)}. From Le Cam’s lemma, we have

infμ^supX(μ^𝖲𝖤𝖠𝖱𝖢𝖧(X))\displaystyle\inf_{\hat{\mu}}\sup_{X}\mathbb{P}(\hat{\mu}\neq\mathsf{SEARCH}(X)) 12(1𝖳𝖵(X(0),X(1)))\displaystyle\geq\frac{1}{2}(1-\mathsf{TV}(\mathbb{P}_{X^{(0)}},\mathbb{P}_{X^{(1)}}))
14exp(TD𝖪𝖫(p1p)).\displaystyle\geq\frac{1}{4}\exp(-T\cdot D_{\mathsf{KL}}(p\|1-p)).

Next, we aim to prove the second half, namely

infμ^supX(μ^𝖲𝖤𝖠𝖱𝖢𝖧(X))\displaystyle\inf_{\hat{\mu}}\sup_{X}\mathbb{P}(\hat{\mu}\neq\mathsf{SEARCH}(X))
\displaystyle\geq 12(1T(1H(p))+log(2)log(K)).\displaystyle\frac{1}{2}\cdot\left(1-\frac{T\cdot(1-H(p))+\log(2)}{\log(K)}\right).

Now we design KK instances X(0),,X(K1)X^{(0)},\cdots,X^{(K-1)}. We let X(0)<X1X^{(0)}<X_{1}, and X(l)(Xl,Xl+1)X^{(l)}\in(X_{l},X_{l+1}) for l[K1]l\in[K-1]. Since the worst-case error is at least the average error over these KK instances, we have

infμ^supX(μ^𝖲𝖤𝖠𝖱𝖢𝖧(X))\displaystyle\inf_{\hat{\mu}}\sup_{X}\mathbb{P}(\hat{\mu}\neq\mathsf{SEARCH}(X))
infμ^supX{X(l)}l=0K1(μ^𝖲𝖤𝖠𝖱𝖢𝖧(X))\displaystyle\geq\inf_{\hat{\mu}}\sup_{X\in\{X^{(l)}\}_{l=0}^{K-1}}\mathbb{P}(\hat{\mu}\neq\mathsf{SEARCH}(X))
infΨ12Kl=0K1X(l)(Ψl).\displaystyle\geq\inf_{\Psi}\frac{1}{2K}\sum_{l=0}^{K-1}\mathbb{P}_{X^{(l)}}(\Psi\neq l).

Now by Fano’s inequality, we have

infΨ12Kl=0K1X(l)(Ψl)\displaystyle\inf_{\Psi}\frac{1}{2K}\sum_{l=0}^{K-1}\mathbb{P}_{X^{(l)}}(\Psi\neq l)
12(1I(V;Y)+log(2)log(K)).\displaystyle\geq\frac{1}{2}\cdot\left(1-\frac{I(V;Y)+\log(2)}{\log(K)}\right).

Here V𝖴𝗇𝗂𝖿({0,1,,K1})V\sim\mathsf{Unif}(\{0,1,\cdots,K-1\}) and YY satisfies Y|V=l=X(l)\mathbb{P}_{Y|V=l}=\mathbb{P}_{X^{(l)}}. Following the same argument as in the proof of Theorem 3 in Wang et al. (2022), we know that I(V;Y)T(1H(p))I(V;Y)\leq T\cdot(1-H(p)). Thus overall we have

infμ^supX(μ^𝖲𝖤𝖠𝖱𝖢𝖧(X))12(1T(1H(p))+log(2)log(K)).\displaystyle\inf_{\hat{\mu}}\sup_{X}\mathbb{P}(\hat{\mu}\neq\mathsf{SEARCH}(X))\geq\frac{1}{2}\cdot\left(1-\frac{T\cdot(1-H(p))+\log(2)}{\log(K)}\right).

Thus the number of queries required to recover the true value with probability at least 1δ1-\delta is lower bounded by Ω(log(K)/(1H(p))+log(1/δ)/D𝖪𝖫(p1p))\Omega(\log(K)/(1-H(p))+\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p)). ∎

Appendix E Proof for Theorem IV.3

Proof.

Let TiT_{i} be the random variable that denotes the number of times XX is compared with XiX_{i}. Since i[K]𝔼[Ti]=T\sum_{i\in[K]}\mathbb{E}[T_{i}]=T, there must exist some ii such that 𝔼[Ti]T/K\mathbb{E}[T_{i}]\leq T/K. Now we construct the first instance X(0)(Xi1,Xi)X^{(0)}\in(X_{i-1},X_{i}) and the second instance X(1)(Xi,Xi+1)X^{(1)}\in(X_{i},X_{i+1}), with the convention X0=0X_{0}=0 and XK+1=1X_{K+1}=1. From Le Cam’s two point lemma, we know that

infμ^supX(μ^𝖲𝖤𝖠𝖱𝖢𝖧(X))\displaystyle\inf_{\hat{\mu}}\sup_{X}\mathbb{P}(\hat{\mu}\neq\mathsf{SEARCH}(X))
12(1𝖳𝖵(X(0),X(1)))\displaystyle\geq\frac{1}{2}(1-\mathsf{TV}(\mathbb{P}_{X^{(0)}},\mathbb{P}_{X^{(1)}}))
14exp(𝔼[Ti]D𝖪𝖫(p1p))\displaystyle\geq\frac{1}{4}\exp(-\mathbb{E}[T_{i}]\cdot D_{\mathsf{KL}}(p\|1-p))
14exp(TD𝖪𝖫(p1p)K).\displaystyle\geq\frac{1}{4}\cdot\exp\left(-\frac{T\cdot D_{\mathsf{KL}}(p\|1-p)}{K}\right).

On the other hand, Ω(K)\Omega(K) is a trivial lower bound on the query complexity in the non-adaptive setting, since XX must be compared with each XiX_{i} at least once; otherwise, the two instances lying on either side of an unqueried XiX_{i} induce the same distribution of observations. Thus we arrive at a lower bound of Ω(max(K,Klog(1/δ)/D𝖪𝖫(p1p)))\Omega(\max(K,K\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p))). ∎

Appendix F Proof for Theorem V.1

Proof.

In Wang et al. (2023), it is proven via Fano’s inequality that

infμ^supθ[0,1]K(μ^𝖲𝖮𝖱𝖳(θ))12(1T(1H(p))Klog(K)).\displaystyle\inf_{\hat{\mu}}\sup_{\theta\in[0,1]^{K}}\mathbb{P}(\hat{\mu}\neq\mathsf{SORT}(\theta))\geq\frac{1}{2}\cdot\left(1-\frac{T\cdot(1-H(p))}{K\log(K)}\right).

So it suffices to prove that

infμ^supθ[0,1]K(μ^𝖲𝖮𝖱𝖳(θ))14exp(TD𝖪𝖫(p1p)K).\displaystyle\inf_{\hat{\mu}}\sup_{\theta\in[0,1]^{K}}\mathbb{P}(\hat{\mu}\neq\mathsf{SORT}(\theta))\geq\frac{1}{4}\cdot\exp\left(-\frac{T\cdot D_{\mathsf{KL}}(p\|1-p)}{K}\right).

To see this, consider an arbitrary sequence 0<X1<X2<<XK<10<X_{1}<X_{2}<\cdots<X_{K}<1. Now we design KK instances θ0,,θK1\theta_{0},\cdots,\theta_{K-1}. We let θ0=(X1,,XK)\theta_{0}=(X_{1},\cdots,X_{K}), and θl\theta_{l} be the instance that swaps the element XlX_{l} with Xl+1X_{l+1}, where l[K1]l\in[K-1]. From Le Cam’s two point lemma, we have

infμ^supθ[0,1]K(μ^𝖲𝖮𝖱𝖳(θ))\displaystyle\inf_{\hat{\mu}}\sup_{\theta\in[0,1]^{K}}\mathbb{P}(\hat{\mu}\neq\mathsf{SORT}(\theta))
sup1jK112(1𝖳𝖵(θ0,θj))\displaystyle\geq\sup_{1\leq j\leq K-1}\frac{1}{2}(1-\mathsf{TV}(\mathbb{P}_{\theta_{0}},\mathbb{P}_{\theta_{j}}))
sup1jK114exp(D𝖪𝖫(θ0,θj))\displaystyle\geq\sup_{1\leq j\leq K-1}\frac{1}{4}\exp(-D_{\mathsf{KL}}(\mathbb{P}_{\theta_{0}},\mathbb{P}_{\theta_{j}}))
sup1jK114exp(𝔼θ0[Tj,j+1]D𝖪𝖫(p1p)).\displaystyle\geq\sup_{1\leq j\leq K-1}\frac{1}{4}\exp(-\mathbb{E}_{\theta_{0}}[T_{j,j+1}]\cdot D_{\mathsf{KL}}(p\|1-p)).

Let Ti,jT_{i,j} be the random variable that represents the number of comparisons between the ii-th item and the jj-th item in the TT rounds. Now since 1jK1𝔼θ0[Tj,j+1]T\sum_{1\leq j\leq K-1}\mathbb{E}_{\theta_{0}}[T_{j,j+1}]\leq T, there must exist some jj such that 𝔼θ0[Tj,j+1]T/K\mathbb{E}_{\theta_{0}}[T_{j,j+1}]\leq T/K. This gives

infμ^supθ[0,1]K(μ^𝖲𝖮𝖱𝖳(θ))\displaystyle\inf_{\hat{\mu}}\sup_{\theta\in[0,1]^{K}}\mathbb{P}(\hat{\mu}\neq\mathsf{SORT}(\theta)) 14exp(TD𝖪𝖫(p1p)K).\displaystyle\geq\frac{1}{4}\cdot\exp\left(-\frac{T\cdot D_{\mathsf{KL}}(p\|1-p)}{K}\right).

Altogether, this shows a lower bound of Ω(Klog(K)/(1H(p))+Klog(1/δ)/D𝖪𝖫(p1p))\Omega(K\log(K)/(1-H(p))+K\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p)) on the query complexity. Note that this is equivalent to Ω(Klog(K)/(1H(p))+Klog(K/δ)/D𝖪𝖫(p1p))\Omega(K\log(K)/(1-H(p))+K\log(K/\delta)/D_{\mathsf{KL}}(p\|1-p)) up to a constant factor, since 1/(1H(p))1/D𝖪𝖫(p1p)1/(1-H(p))\gtrsim 1/D_{\mathsf{KL}}(p\|1-p). ∎

Appendix G Proof for Theorem V.3

Proof.

We first select an arbitrary sequence 0<X1<X2<<XK<10<X_{1}<X_{2}<\cdots<X_{K}<1. For i[K!]i\in[K!], let σi:[K][K]\sigma_{i}:[K]\mapsto[K] be the ii-th permutation of [K][K], where σi(k)\sigma_{i}(k) denotes the index placed at the kk-th position under permutation σi\sigma_{i}. Now we consider the summation i[K!],j[K1]𝔼[Tσi(j),σi(j+1)]\sum_{i\in[K!],j\in[K-1]}\mathbb{E}[T_{\sigma_{i}(j),\sigma_{i}(j+1)}]. For each pair of items (a,b)(a,b) with a<ba<b, the term 𝔼[Ta,b]\mathbb{E}[T_{a,b}] is counted 2(K1)!2(K-1)! times in this summation, since exactly 2(K1)!2(K-1)! of the K!K! permutations place aa and bb in adjacent positions. Furthermore, we know that a<b𝔼[Ta,b]=T\sum_{a<b}\mathbb{E}[T_{a,b}]=T. Thus we have

i[K!],j[K1]𝔼[Tσi(j),σi(j+1)]=2T(K1)!\displaystyle\sum_{i\in[K!],j\in[K-1]}\mathbb{E}[T_{\sigma_{i}(j),\sigma_{i}(j+1)}]=2T(K-1)!

Hence there must exist some ii such that

j[K1]𝔼[Tσi(j),σi(j+1)]2T(K1)!K!=2TK.\displaystyle\sum_{j\in[K-1]}\mathbb{E}[T_{\sigma_{i}(j),\sigma_{i}(j+1)}]\leq\frac{2T(K-1)!}{K!}=\frac{2T}{K}.
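The counting argument above can be verified by brute force for small K; the snippet below is an added illustration (with K = 5 chosen arbitrarily), confirming that every unordered pair of items is adjacent in exactly 2(K-1)! of the K! permutations.

```python
import itertools
import math

# Brute-force check (small K) that each unordered pair of items is adjacent
# in exactly 2*(K-1)! of the K! permutations, as used in the argument above.
K = 5
adjacency_counts = {}
for perm in itertools.permutations(range(K)):
    for j in range(K - 1):
        pair = frozenset((perm[j], perm[j + 1]))
        adjacency_counts[pair] = adjacency_counts.get(pair, 0) + 1

print(set(adjacency_counts.values()))   # {48}
print(2 * math.factorial(K - 1))        # 48
```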

Now we design KK instances θ0,,θK1\theta_{0},\cdots,\theta_{K-1}. We let θ0=(Xσi(1),,Xσi(K))\theta_{0}=(X_{\sigma_{i}(1)},\cdots,X_{\sigma_{i}(K)}), and θl\theta_{l} be the instance that swaps the element Xσi(l)X_{\sigma_{i}(l)} with Xσi(l+1)X_{\sigma_{i}(l+1)} in θ0\theta_{0}, where l[K1]l\in[K-1]. Since the worst-case error is at least the average error over these KK instances, we have

infμ^supθ[0,1]K(μ^𝖲𝖮𝖱𝖳(θ))\displaystyle\inf_{\hat{\mu}}\sup_{\theta\in[0,1]^{K}}\mathbb{P}(\hat{\mu}\neq\mathsf{SORT}(\theta))
infμ^supθ{θl}l[K](μ^𝖲𝖮𝖱𝖳(θ))\displaystyle\geq\inf_{\hat{\mu}}\sup_{\theta\in\{\theta_{l}\}_{l\in[K]}}\mathbb{P}(\hat{\mu}\neq\mathsf{SORT}(\theta))
infΨ12Kl[K]θl(Ψl).\displaystyle\geq\inf_{\Psi}\frac{1}{2K}\sum_{l\in[K]}\mathbb{P}_{\theta_{l}}(\Psi\neq l).

Now by Fano’s inequality, we have

infΨ12Kl[K]θl(Ψl)\displaystyle\inf_{\Psi}\frac{1}{2K}\sum_{l\in[K]}\mathbb{P}_{\theta_{l}}(\Psi\neq l)
12(1I(V;X)+log(2)log(K))\displaystyle\geq\frac{1}{2}\cdot\left(1-\frac{I(V;X)+\log(2)}{\log(K)}\right)
=12(1l[K]D𝖪𝖫(l,¯)/K+log(2)log(K)).\displaystyle=\frac{1}{2}\cdot\left(1-\frac{\sum_{l\in[K]}D_{\mathsf{KL}}(\mathbb{P}^{l},\bar{\mathbb{P}})/K+\log(2)}{\log(K)}\right).

Here ¯=1Kl[K]l\bar{\mathbb{P}}=\frac{1}{K}\sum_{l\in[K]}\mathbb{P}^{l}. We can compute

D𝖪𝖫(l,¯)\displaystyle D_{\mathsf{KL}}(\mathbb{P}^{l},\bar{\mathbb{P}})
𝔼[Tσi(l),σi(l+1)]D𝖪𝖫(1p(K1)p+(1p)K)\displaystyle\leq\mathbb{E}[T_{\sigma_{i}(l),\sigma_{i}(l+1)}]D_{\mathsf{KL}}\left(1-p\|\frac{(K-1)p+(1-p)}{K}\right)
+ml𝔼[Tσi(m),σi(m+1)]D𝖪𝖫(p(K1)p+(1p)K)\displaystyle+\sum_{m\neq l}\mathbb{E}[T_{\sigma_{i}(m),\sigma_{i}(m+1)}]D_{\mathsf{KL}}\left(p\|\frac{(K-1)p+(1-p)}{K}\right)
K1K𝔼[Tσi(l),σi(l+1)]D𝖪𝖫(p1p)\displaystyle\leq\frac{K-1}{K}\cdot\mathbb{E}[T_{\sigma_{i}(l),\sigma_{i}(l+1)}]D_{\mathsf{KL}}\left(p\|1-p\right)
+1Kml𝔼[Tσi(m),σi(m+1)]D𝖪𝖫(p1p).\displaystyle+\frac{1}{K}\sum_{m\neq l}\mathbb{E}[T_{\sigma_{i}(m),\sigma_{i}(m+1)}]D_{\mathsf{KL}}\left(p\|1-p\right).

Now summing over all ll, we have

1Kl[K]D𝖪𝖫(l,¯)\displaystyle\frac{1}{K}\sum_{l\in[K]}D_{\mathsf{KL}}(\mathbb{P}^{l},\bar{\mathbb{P}}) 2D𝖪𝖫(p1p)Kl[K]𝔼[Tσi(l),σi(l+1)]\displaystyle\leq\frac{2D_{\mathsf{KL}}\left(p\|1-p\right)}{K}\cdot\sum_{l\in[K]}\mathbb{E}[T_{\sigma_{i}(l),\sigma_{i}(l+1)}]
4TD𝖪𝖫(p1p)K2.\displaystyle\leq\frac{4TD_{\mathsf{KL}}\left(p\|1-p\right)}{K^{2}}.

This gives

infμ^supθ[0,1]K(μ^𝖲𝖮𝖱𝖳(θ))\displaystyle\inf_{\hat{\mu}}\sup_{\theta\in[0,1]^{K}}\mathbb{P}(\hat{\mu}\neq\mathsf{SORT}(\theta)) 12(14TD𝖪𝖫(p1p)/K2+log(2)log(K)).\displaystyle\geq\frac{1}{2}\cdot\left(1-\frac{4T\cdot D_{\mathsf{KL}}(p\|1-p)/K^{2}+\log(2)}{\log(K)}\right).

This shows that to output the correct answer with probability at least 2/32/3, one needs at least CK2log(K)/D𝖪𝖫(p1p)C\cdot K^{2}\log(K)/D_{\mathsf{KL}}(p\|1-p) queries for some universal constant CC. ∎

Appendix H Upper Bounds for Non-adaptive Sampling

In this section, we present a theorem on the upper bounds for non-adaptive learning in the worst-case query model.

Theorem H.1.

One can design non-adaptive algorithms such that the worst-case query complexity is

  1. 𝒪(Klog(K/δ)1H(p))\mathcal{O}(\frac{K\log(K/\delta)}{1-H(p)}) for computing 𝖮𝖱\mathsf{OR};

  2. 𝒪(K2log(K/δ)1H(p))\mathcal{O}(\frac{K^{2}\log(K/\delta)}{1-H(p)}) for computing 𝖬𝖠𝖷\mathsf{MAX};

  3. 𝒪(Klog(1/δ)1H(p))\mathcal{O}(\frac{K\log(1/\delta)}{1-H(p)}) for computing 𝖲𝖤𝖠𝖱𝖢𝖧\mathsf{SEARCH};

  4. 𝒪(K2log(K/δ)1H(p))\mathcal{O}(\frac{K^{2}\log(K/\delta)}{1-H(p)}) for computing 𝖲𝖮𝖱𝖳\mathsf{SORT}.

The results for 𝖮𝖱\mathsf{OR}, 𝖬𝖠𝖷\mathsf{MAX} and 𝖲𝖮𝖱𝖳\mathsf{SORT} are based on the simple algorithm that queries each element (for 𝖮𝖱\mathsf{OR}) or each pair of elements (for 𝖬𝖠𝖷\mathsf{MAX} and 𝖲𝖮𝖱𝖳\mathsf{SORT}) an equal number of times, and the analysis is a direct union bound argument. Here we only present the algorithm and analysis for 𝖲𝖤𝖠𝖱𝖢𝖧\mathsf{SEARCH}.

Proof.

Assume that the target XX lies between XlX_{l} and Xl+1X_{l+1}. Consider the non-adaptive learning algorithm which compares XX with each element XiX_{i} for Ti=T/K=4log(1/δ)/(1H(p))T_{i}=\lfloor T/K\rfloor=4\log(1/\delta)/(1-H(p)) times.

Let NiN_{i} be the number of observations of 11 among the TiT_{i} queries involving element XiX_{i}. Consider the following estimator:

l^=argmaxl[K]i=1lNi+i=l+1K(TiNi).\displaystyle\hat{l}=\operatorname*{arg\,max}_{l\in[K]}\sum_{i=1}^{l}N_{i}+\sum_{i=l+1}^{K}(T_{i}-N_{i}).

We show that with high probability, l^=l\hat{l}=l. We have

(l^l)\displaystyle\mathbb{P}(\hat{l}\neq l) jl(l^=j)\displaystyle\leq\sum_{j\neq l}\mathbb{P}(\hat{l}=j)
jl(i=1lNi+i=l+1K(TiNi)(i=1jNi+i=j+1K(TiNi))<0).\displaystyle\leq\sum_{j\neq l}\mathbb{P}\Big{(}\sum_{i=1}^{l}N_{i}+\sum_{i=l+1}^{K}(T_{i}-N_{i})-(\sum_{i=1}^{j}N_{i}+\sum_{i=j+1}^{K}(T_{i}-N_{i}))<0\Big{)}.

Now we bound the above probability for each jj. Without loss of generality, we assume that j>lj>l. The above probability can be written as

(i=1lNi+i=l+1K(TiNi)<i=1jNi+i=j+1K(TiNi))\displaystyle\mathbb{P}\Big{(}\sum_{i=1}^{l}N_{i}+\sum_{i=l+1}^{K}(T_{i}-N_{i})<\sum_{i=1}^{j}N_{i}+\sum_{i=j+1}^{K}(T_{i}-N_{i})\Big{)}
=\displaystyle= (i=l+1j(Ti2Ni)<0)\displaystyle\mathbb{P}\Big{(}\sum_{i=l+1}^{j}(T_{i}-2N_{i})<0\Big{)}
=\displaystyle= (i=l+1jNi>12(jl)T/K)\displaystyle\mathbb{P}\Big{(}\sum_{i=l+1}^{j}N_{i}>\frac{1}{2}(j-l)\lfloor T/K\rfloor\Big{)}

Note that i=l+1jNi𝖡𝗂𝗇((jl)T/K,p)\sum_{i=l+1}^{j}N_{i}\sim\mathsf{Bin}((j-l)\lfloor T/K\rfloor,p). Let n=(jl)T/Kn=(j-l)\lfloor T/K\rfloor. From Lemma A.6 we have

(i=l+1jNin/2)\displaystyle\mathbb{P}(\sum_{i=l+1}^{j}N_{i}\geq n/2) (e12p2p(1/2p)(1/2p))np=(2pexp(12p))n/2\displaystyle\leq\left(\frac{e^{\frac{1-2p}{2p}}}{(1/2p)^{(1/2p)}}\right)^{np}=(2p\cdot\exp(1-2p))^{n/2}
<exp(log(1/δ)2(jl)(12p+log(2p))(12p)2)<δjl.\displaystyle<\exp\left(\log(1/\delta)\cdot{\frac{2(j-l)(1-2p+\log(2p))}{(1-2p)^{2}}}\right)<\delta^{j-l}.

Now by summing over the probability for different jsj^{\prime}s, we get

(l^l)jlδ|jl|<2δ1δ.\displaystyle\mathbb{P}(\hat{l}\neq l)\leq\sum_{j\neq l}\delta^{|j-l|}<\frac{2\delta}{1-\delta}.

Rescaling δ\delta finishes the proof. ∎
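For concreteness, the following is a minimal simulation sketch of the non-adaptive search procedure analyzed above. It is an added illustration (the function name, parameters, and example values are ours, not from the paper): it assumes a noisy oracle answering whether X exceeds each threshold, with every answer flipped independently with probability p, and it maximizes the score over all K+1 candidate positions.

```python
import random

def noisy_search(thresholds, x, p, reps):
    """Non-adaptive noisy search sketch: compare x with every threshold `reps`
    times, then return the split point l maximizing
    sum_{i <= l} N_i + sum_{i > l} (reps - N_i)."""
    K = len(thresholds)
    N = []                                # N[i] = observed 1's for "x > thresholds[i]"
    for t in thresholds:
        truth = 1 if x > t else 0
        flips = sum(random.random() < p for _ in range(reps))
        N.append(reps - flips if truth == 1 else flips)

    def score(l):                         # l = claimed number of thresholds below x
        return sum(N[:l]) + sum(reps - N[i] for i in range(l, K))

    return max(range(K + 1), key=score)

# x = 0.6 lies between the 3rd and 4th thresholds, so the answer is l = 3.
random.seed(0)
print(noisy_search([0.1, 0.3, 0.5, 0.7, 0.9], x=0.6, p=0.2, reps=30))  # typically 3
```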

Appendix I Upper Bounds for Adaptive Sampling

Here we present the tournament algorithm for computing 𝖮𝖱\mathsf{OR} introduced in (Feige et al., 1994). A similar algorithm can also be applied to compute 𝖬𝖠𝖷\mathsf{MAX}. The main difference is that in 𝖬𝖠𝖷\mathsf{MAX}, we directly compare two elements 4(2i1)log(1/δ)(1H(p))\lceil\frac{4(2i-1)\log(1/\delta)}{(1-H(p))}\rceil times instead of comparing their numbers of 11^{\prime}s.

Algorithm 2 Tournament for computing 𝖮𝖱\mathsf{OR} with noise
1:Input: Target confidence level δ\delta.
2:Set 𝒳=(X1,X2,,XK)\mathcal{X}=(X_{1},X_{2},\cdots,X_{K}) as the list of all bits with unknown value.
3:for iteration i=1:log2(K)i=1:\lceil\log_{2}(K)\rceil do
4:     Query each element in 𝒳\mathcal{X} 4(2i1)log(1/δ)(1H(p))\lceil\frac{4(2i-1)\log(1/\delta)}{(1-H(p))}\rceil times.
5:     for iteration j=1:|𝒳|/2j=1:\lceil|\mathcal{X}|/2\rceil do
6:         Compare the numbers of 11’s observed for the (2j1)(2j-1)-th element and the (2j)(2j)-th element, and remove the element with the smaller number of 11’s from the list 𝒳\mathcal{X}. Ties are broken arbitrarily. (If the (2j)(2j)-th element does not exist, we do not remove the (2j1)(2j-1)-th element.)
7:     Break when 𝒳\mathcal{X} only has one element left.
8:Query the only remaining element in 𝒳\mathcal{X} 6log(1/δ)(1H(p))\lceil\frac{6\log(1/\delta)}{(1-H(p))}\rceil times. Return 11 if more than half of the observations are 11’s, and 0 otherwise.
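For illustration, a minimal simulation sketch of Algorithm 2 is given below. This is our own restatement in code (not from Feige et al. (1994)); noisy_read models a single query whose answer is flipped with probability p, ties are broken toward the first element of each pair, and p is assumed to lie in (0, 1/2).

```python
import math
import random

def noisy_read(bit, p):
    """One noisy query: return `bit` flipped with probability p."""
    return bit ^ (random.random() < p)

def tournament_or(bits, p, delta):
    """Simulation sketch of Algorithm 2 (tournament for OR with noisy queries)."""
    c = 1.0 + p * math.log2(p) + (1 - p) * math.log2(1 - p)   # 1 - H(p), needs 0 < p < 1/2
    survivors = list(bits)
    for i in range(1, math.ceil(math.log2(len(bits))) + 1):
        reps = math.ceil(4 * (2 * i - 1) * math.log(1 / delta) / c)
        counts = [sum(noisy_read(b, p) for _ in range(reps)) for b in survivors]
        nxt = []
        for j in range(0, len(survivors), 2):
            if j + 1 == len(survivors):        # unpaired element is kept
                nxt.append(survivors[j])
            elif counts[j] >= counts[j + 1]:   # keep the element with more observed 1's
                nxt.append(survivors[j])       # (ties broken toward the first element)
            else:
                nxt.append(survivors[j + 1])
        survivors = nxt
        if len(survivors) == 1:
            break
    reps = math.ceil(6 * math.log(1 / delta) / c)
    ones = sum(noisy_read(survivors[0], p) for _ in range(reps))
    return int(ones > reps / 2)

random.seed(1)
print(tournament_or([0, 0, 1, 0, 0, 0, 0, 0], p=0.2, delta=0.05))  # typically prints 1
```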

The following theorem is due to Feige et al. (1994). We include its proof here for completeness.

Theorem I.1.

Algorithm 2 finishes within CKlog(1/δ)/(1H(p))C\cdot K\log(1/\delta)/(1-H(p)) queries, and outputs the correct value of 𝖮𝖱(X1,,XK)\mathsf{OR}(X_{1},\cdots,X_{K}) with probability at least 12δ1-2\delta when δ<1/2\delta<1/2.

Proof.

First, we compute the total number of queries of the algorithm. Without loss of generality, we may assume that KK can be written as 2m2^{m} for some integer mm. If not, we may add no more than KK extra dummy 0’s to the original list 𝒳\mathcal{X} to make sure K=2mK=2^{m}. In each outer iteration ii, the size of 𝒳\mathcal{X} is halved, so after log2(K)\lceil\log_{2}(K)\rceil rounds the set 𝒳\mathcal{X} only contains one element. In round ii, Algorithm 2 queries each remaining element 4(2i1)log(1/δ)(1H(p))\lceil\frac{4(2i-1)\log(1/\delta)}{(1-H(p))}\rceil times; by fact (5) below, this agrees with 4(2i1)log(1/δ)(12p)2\lceil\frac{4(2i-1)\log(1/\delta)}{(1-2p)^{2}}\rceil up to a factor of 22 (and similarly for the final verification step), so it suffices to bound the total count computed with the latter quantity, which is

4log(1/δ)(12p)2+i=1log2(K)4(2i1)log(1/δ)(12p)2K2i1\displaystyle\left\lceil\frac{4\log(1/\delta)}{(1-2p)^{2}}\right\rceil+\sum_{i=1}^{\lceil\log_{2}(K)\rceil}\left\lceil\frac{4(2i-1)\log(1/\delta)}{(1-2p)^{2}}\right\rceil\cdot\frac{K}{2^{i-1}}
\displaystyle\leq 4log(1/δ)(12p)2+i=1log2(K)(4(2i1)log(1/δ)(12p)2+1)K2i1\displaystyle\left\lceil\frac{4\log(1/\delta)}{(1-2p)^{2}}\right\rceil+\sum_{i=1}^{\lceil\log_{2}(K)\rceil}\left(\frac{4(2i-1)\log(1/\delta)}{(1-2p)^{2}}+1\right)\cdot\frac{K}{2^{i-1}}
\displaystyle\leq 4log(1/δ)(12p)2+2K+Klog(1/δ)(12p)2i=1log2(K)4(2i1)2i1\displaystyle\left\lceil\frac{4\log(1/\delta)}{(1-2p)^{2}}\right\rceil+2K+\frac{K\log(1/\delta)}{(1-2p)^{2}}\sum_{i=1}^{\lceil\log_{2}(K)\rceil}\frac{4(2i-1)}{2^{i-1}}
\displaystyle\leq 4log(1/δ)(12p)2+2K+28Klog(1/δ)(12p)2\displaystyle\left\lceil\frac{4\log(1/\delta)}{(1-2p)^{2}}\right\rceil+2K+\frac{28K\log(1/\delta)}{(1-2p)^{2}}
\displaystyle\leq CKlog(1/δ)1H(p)\displaystyle\frac{CK\log(1/\delta)}{1-H(p)}

Here in the last inequality we use the following fact, which can be verified numerically:

p[0,1],(1/2p)21H(p)[1/4,1/2].\displaystyle\forall p\in[0,1],\frac{(1/2-p)^{2}}{1-H(p)}\in[1/4,1/2]. (5)
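A minimal numerical check of this fact (and of the identical fact (6) in Appendix J) is given below; this is an added illustration, H denotes the binary entropy in bits, and the grid resolution is arbitrary.

```python
import math

def ratio(p):
    """(1/2 - p)^2 / (1 - H(p)), with H the binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.25                        # (1/4) / 1 at the endpoints
    h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return (0.5 - p) ** 2 / (1 - h)

# Evaluate on a fine grid, excluding p = 1/2 where numerator and denominator
# both vanish (the limiting ratio there is ln(2)/2, about 0.347).
grid = [i / 10000 for i in range(10001) if i != 5000]
values = [ratio(p) for p in grid]
print(min(values), max(values))            # both lie in [1/4, 1/2]
```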

Now we show that the failure probability of the algorithm is at most 2δ2\delta. Consider the first case where all the elements are 0. Then no matter which element is left in 𝒳\mathcal{X}, the probability that the algorithm fails is the probability that a Binomial random variable XB(n,p)X\sim B(n,p) takes a value larger than or equal to n/2n/2, where n=4log(1/δ)(12p)2n=\left\lceil\frac{4\log(1/\delta)}{(1-2p)^{2}}\right\rceil.

Taking λ=np\lambda=np, η=12p2p\eta=\frac{1-2p}{2p} in Equation (3) of Lemma A.6, we know that

(Xn/2)\displaystyle\mathbb{P}(X\geq n/2) (e12p2p(1/2p)(1/2p))np\displaystyle\leq\left(\frac{e^{\frac{1-2p}{2p}}}{(1/2p)^{(1/2p)}}\right)^{np}
=(2pexp(12p))n/2\displaystyle=(2p\cdot\exp(1-2p))^{n/2}
<exp(log(1/δ)2(12p+log(2p))(12p)2)\displaystyle<\exp\left(\log(1/\delta)\cdot{\frac{2(1-2p+\log(2p))}{(1-2p)^{2}}}\right)
<δ.\displaystyle<\delta.

Here the last inequality uses the fact that (12p+log(2p))(12p)2<1/2{\frac{(1-2p+\log(2p))}{(1-2p)^{2}}}<-1/2 for all p[0,1/2)p\in[0,1/2). This shows that the final failure probability is bounded by δ\delta when the true elements are all 0.

Consider the second case where there exists at least one 11 among the original elements X1,,XKX_{1},\cdots,X_{K}. Without loss of generality, we assume that X1=1X_{1}=1. Let 𝒳i\mathcal{X}^{i} be the remaining list of elements at the beginning of the ii-th iteration. We let i\mathcal{E}_{i} be the event that the first element in 𝒳i\mathcal{X}^{i} is 11 while the first element in 𝒳i+1\mathcal{X}^{i+1} is 0. This event only happens when the second element in 𝒳i\mathcal{X}^{i} is 0 and gets more 11’s in the noisy queries than the first element. Let 𝒜\mathcal{A} denote the event that the only remaining element in the last round is a 11. Then we have

(𝒜)\displaystyle\mathbb{P}(\mathcal{A}) 1(i=1log2(K)i)\displaystyle\geq 1-\mathbb{P}\left(\bigcup_{i=1}^{\lceil\log_{2}(K)\rceil}\mathcal{E}_{i}\right)
1i=1log2(K)(i)\displaystyle\geq 1-\sum_{i=1}^{\lceil\log_{2}(K)\rceil}\mathbb{P}(\mathcal{E}_{i})
1i=1log2(K)(YiXi0),\displaystyle\geq 1-\sum_{i=1}^{\lceil\log_{2}(K)\rceil}\mathbb{P}(Y_{i}-X_{i}\geq 0),

where XiB(ni,1p)X_{i}\sim B(n_{i},1-p) and YiB(ni,p)Y_{i}\sim B(n_{i},p), with ni=4(2i1)log(1/δ)(12p)2n_{i}=\lceil\frac{4(2i-1)\log(1/\delta)}{(1-2p)^{2}}\rceil. Let Zi=YiXi+niZ_{i}=Y_{i}-X_{i}+n_{i}. Then the random variable ZiB(2ni,p)Z_{i}\sim B(2n_{i},p). Thus we have

(𝒜)\displaystyle\mathbb{P}(\mathcal{A}) 1i=1log2(K)(Zini)\displaystyle\geq 1-\sum_{i=1}^{\lceil\log_{2}(K)\rceil}\mathbb{P}(Z_{i}\geq n_{i})
>1i=1log2(K)δ2(2i1)\displaystyle>1-\sum_{i=1}^{\lceil\log_{2}(K)\rceil}\delta^{2(2i-1)}
>1δ21δ2.\displaystyle>1-\frac{\delta^{2}}{1-\delta^{2}}.

Following the same Binomial tail argument as in the first case, we can upper bound the error probability of the final verification step under event 𝒜\mathcal{A} by δ\delta. Thus the total failure probability is upper bounded by δ+δ2/(1δ2)<2δ\delta+\delta^{2}/(1-\delta^{2})<2\delta when δ<1/2\delta<1/2. ∎

Appendix J Proof of Theorem VI.1

J-A Lower Bounds

First, note that all our lower bounds for the fixed-length setting can be adapted to the variable-length setting by replacing TT with 𝔼[T]\mathbb{E}[T]. The bound on the mutual information in Fano’s inequality in Appendix D can be proven using the same argument as Lemma 27 in Gu and Xu (2023), and the divergence decomposition lemma (Lemma A.3) still holds for variable-length queries due to Lemma 15 in Kaufmann et al. (2016).

J-B Upper Bounds

Now it suffices to prove the upper bounds. For 𝖮𝖱\mathsf{OR} and 𝖬𝖠𝖷\mathsf{MAX}, we already know from Theorem I.1 that 𝒪(Klog(1/δ)/(1H(p)))\mathcal{O}(K\log(1/\delta)/(1-H(p))) is an upper bound in the fixed-length setting, where comparing two elements with error probability at most δ\delta requires Clog(1/δ)/(1H(p))\lceil{C\log(1/\delta)}/{(1-H(p))}\rceil samples for some constant CC. In the variable-length setting, comparing two elements with error probability at most δ\delta only requires log(1/δ)/D𝖪𝖫(p1p)+1/(12p)\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p)+1/(1-2p) queries in expectation via the variable-length comparison algorithm in Lemma 13 of Gu and Xu (2023). Thus in Algorithm 1, we replace the repetition-based comparisons of Algorithm 2 with this variable-length comparison algorithm. This gives the query complexity 𝒪(K/(1H(p))+Klog(1/δ)/D𝖪𝖫(p1p))\mathcal{O}(K/(1-H(p))+K\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p)). A similar algorithm can also be applied to compute 𝖬𝖠𝖷\mathsf{MAX}. The main difference is that for 𝖬𝖠𝖷\mathsf{MAX}, we directly compare two elements instead of counting the number of 11^{\prime}s observed for each element.
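For intuition, a variable-length comparison of this flavor can be implemented as a sequential probability ratio test. The sketch below is an illustration only (it is not the specific procedure of Lemma 13 in Gu and Xu (2023)): it keeps querying until the log-likelihood ratio between the two hypotheses crosses a confidence threshold, which takes roughly log(1/δ)/D_KL(p‖1−p) queries in expectation.

```python
import math
import random

def sequential_compare(truth, p, delta):
    """Sequential probability ratio test distinguishing 'answers are 1 w.p. 1-p'
    from 'answers are 1 w.p. p'. Returns (decision, number of queries used)."""
    step = math.log((1 - p) / p)              # LLR increment per answer
    threshold = math.log((1 - delta) / delta)
    llr, queries = 0.0, 0
    while abs(llr) < threshold:
        answer = truth ^ (random.random() < p)   # noisy observation of `truth`
        llr += step if answer else -step
        queries += 1
    return (1 if llr > 0 else 0), queries

random.seed(3)
runs = [sequential_compare(1, p=0.3, delta=0.01) for _ in range(1000)]
avg_queries = sum(q for _, q in runs) / len(runs)
errors = sum(d != 1 for d, _ in runs)
print(avg_queries, errors)   # roughly log(1/delta)/D_KL(p||1-p) queries, about 1% errors
```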

Theorem J.1.

The expected number of total queries made by Algorithm 1 is upper bounded by C(K1H(p)+Klog(1/δ)D𝖪𝖫(p1p))C\cdot\left(\frac{K}{1-H(p)}+\frac{K\log(1/\delta)}{D_{\mathsf{KL}}(p\|1-p)}\right). Furthermore, the algorithm outputs the correct value of 𝖮𝖱(X1,,XK)\mathsf{OR}(X_{1},\cdots,X_{K}) with probability at least 12δ1-2\delta when δ<1/2\delta<1/2.

Proof.

First, we compute the total number of queries of the algorithm. Without loss of generality, we may assume that KK can be written as 2m2^{m} for some integer mm. If not, we may add no more than KK extra dummy 0’s to the original list 𝒳\mathcal{X} to make sure K=2mK=2^{m}. In each outer iteration ii, the size of 𝒳\mathcal{X} is halved, so after log2(K)\lceil\log_{2}(K)\rceil iterations the set 𝒳\mathcal{X} only contains one element. From Lemma 27 in Gu and Xu (2023), the expected number of queries made for each element at round ii is 𝒪(log(1/δ~i)/D𝖪𝖫(p1p)+1/(12p))\mathcal{O}(\log(1/\tilde{\delta}_{i})/D_{\mathsf{KL}}(p\|1-p)+1/(1-2p)). Thus the expected total number of queries is

log(1/δ)D𝖪𝖫(p1p)+112p+Ci=1log2(K)(log(1/δ~i)D𝖪𝖫(p1p)+112p)K2i1\displaystyle\frac{\log(1/\delta)}{D_{\mathsf{KL}}(p\|1-p)}+\frac{1}{1-2p}+C\cdot\sum_{i=1}^{\lceil\log_{2}(K)\rceil}\left(\frac{\log(1/\tilde{\delta}_{i})}{D_{\mathsf{KL}}(p\|1-p)}+\frac{1}{1-2p}\right)\cdot\frac{K}{2^{i-1}}
\displaystyle\leq Ci=1log2(K)((4i2)log(1/δ)D𝖪𝖫(p1p)+112p)K2i1\displaystyle C\cdot\sum_{i=1}^{\lceil\log_{2}(K)\rceil}\left(\frac{(4i-2)\log(1/\delta)}{D_{\mathsf{KL}}(p\|1-p)}+\frac{1}{1-2p}\right)\cdot\frac{K}{2^{i-1}}
\displaystyle\leq C(K12p+Klog(1/δ)D𝖪𝖫(p1p)i=1log2(K)4(2i1)2i1)\displaystyle C\cdot\left(\frac{K}{1-2p}+\frac{K\log(1/\delta)}{D_{\mathsf{KL}}(p\|1-p)}\cdot\sum_{i=1}^{\lceil\log_{2}(K)\rceil}\frac{4(2i-1)}{2^{i-1}}\right)
\displaystyle\leq C(K12p+Klog(1/δ)D𝖪𝖫(p1p))\displaystyle C\cdot\left(\frac{K}{1-2p}+\frac{K\log(1/\delta)}{D_{\mathsf{KL}}(p\|1-p)}\right)
\displaystyle\leq C(K1H(p)+Klog(1/δ)D𝖪𝖫(p1p)).\displaystyle C\cdot\left(\frac{K}{1-H(p)}+\frac{K\log(1/\delta)}{D_{\mathsf{KL}}(p\|1-p)}\right).

Here in the last inequality we use the following fact, which can be verified numerically:

p[0,1],(1/2p)21H(p)[1/4,1/2].\displaystyle\forall p\in[0,1],\frac{(1/2-p)^{2}}{1-H(p)}\in[1/4,1/2]. (6)

Now we show that the failure probability of the algorithm is at most 2δ2\delta. Consider the first case where all the elements are 0. Then no matter which element is left in 𝒳\mathcal{X}, the probability that the algorithm fails is the probability that the last while loop gives the wrong output. From Lemma 27 in Gu and Xu (2023), we know that this probability is less than δ\delta.

Consider the second case where there exists at least one 11 among the original elements X1,,XKX_{1},\cdots,X_{K}. Without loss of generality, we assume that X1=1X_{1}=1. Let 𝒳i\mathcal{X}^{i} be the remaining list of elements at the beginning of the ii-th iteration. We let i\mathcal{E}_{i} be the event that the first element in 𝒳i\mathcal{X}^{i} is 11 while the first element in 𝒳i+1\mathcal{X}^{i+1} is 0. This event only happens when the while loop ends with a<δ~ia<\tilde{\delta}_{i} for the first element at the ii-th round. Let 𝒜\mathcal{A} denote the event that the only remaining element in the last round is a 11. Then we have

(𝒜)\displaystyle\mathbb{P}(\mathcal{A}) 1(i=1log2(K)i)\displaystyle\geq 1-\mathbb{P}\left(\bigcup_{i=1}^{\lceil\log_{2}(K)\rceil}\mathcal{E}_{i}\right)
1i=1log2(K)(i)\displaystyle\geq 1-\sum_{i=1}^{\lceil\log_{2}(K)\rceil}\mathbb{P}(\mathcal{E}_{i})
>1i=1log2(K)δ2(2i1)\displaystyle>1-\sum_{i=1}^{\lceil\log_{2}(K)\rceil}\delta^{2(2i-1)}
>1δ21δ2.\displaystyle>1-\frac{\delta^{2}}{1-\delta^{2}}.

As in the first case, the error probability of the final step under event 𝒜\mathcal{A} is upper bounded by δ\delta. Thus the total failure probability is upper bounded by δ+δ2/(1δ2)<2δ\delta+\delta^{2}/(1-\delta^{2})<2\delta when δ<1/2\delta<1/2. ∎

The upper bound for 𝖲𝖤𝖠𝖱𝖢𝖧\mathsf{SEARCH} is due to Gu and Xu (2023), and is thus omitted here. The upper bound for 𝖲𝖮𝖱𝖳\mathsf{SORT} is based on that for 𝖲𝖤𝖠𝖱𝖢𝖧\mathsf{SEARCH}: we can design an insertion-based sorting algorithm that adds elements sequentially to an initially empty sorted set via noisy searching, as in Algorithm 1 in (Wang et al., 2023). Each insertion requires 𝒪(log(K)/(1H(p))+log(1/δ)/D𝖪𝖫(p1p))\mathcal{O}(\log(K)/(1-H(p))+\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p)) queries in expectation, so overall one needs 𝒪(k=1K(log(k)/(1H(p))+log(1/δ)/D𝖪𝖫(p1p)))\mathcal{O}(\sum_{k=1}^{K}(\log(k)/(1-H(p))+\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p))) =𝒪(Klog(K)/(1H(p))+Klog(1/δ)/D𝖪𝖫(p1p))=\mathcal{O}(K\log(K)/(1-H(p))+K\log(1/\delta)/D_{\mathsf{KL}}(p\|1-p)) queries to achieve error probability at most KδK\delta by a union bound over the KK insertions. Rescaling δ\delta gives the final rate.
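To illustrate the insertion-based reduction from SORT to SEARCH described above, the following simplified sketch inserts elements one at a time into a sorted list, locating each position by binary search over noisy comparisons. It is our illustration only: it uses repetition-based majority votes in place of the variable-length noisy search of Gu and Xu (2023), so it does not attain the optimal query complexity discussed above.

```python
import random

def noisy_less(a, b, p):
    """Noisy comparison oracle: returns (a < b), flipped with probability p."""
    return (a < b) ^ (random.random() < p)

def majority_less(a, b, p, reps):
    """Repeat a noisy comparison `reps` times and take a majority vote."""
    votes = sum(noisy_less(a, b, p) for _ in range(reps))
    return votes > reps / 2

def noisy_insertion_sort(items, p, reps):
    """Insert items one at a time, finding each position by binary search with
    majority-vote comparisons; a simplified stand-in for the noisy-search-based
    insertion discussed in the text."""
    sorted_list = []
    for x in items:
        lo, hi = 0, len(sorted_list)
        while lo < hi:
            mid = (lo + hi) // 2
            if majority_less(x, sorted_list[mid], p, reps):
                hi = mid
            else:
                lo = mid + 1
        sorted_list.insert(lo, x)
    return sorted_list

random.seed(2)
print(noisy_insertion_sort([5, 3, 8, 1, 9, 2], p=0.1, reps=15))
# typically prints [1, 2, 3, 5, 8, 9]
```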