
Optimal Bounds for Noisy Sorting

Yuzhou Gu
MIT
[email protected]
   Yinzhan Xu
MIT
[email protected]
Abstract

Sorting is a fundamental problem in computer science. In the classical setting, it is well-known that $(1\pm o(1))n\log_{2}n$ comparisons are both necessary and sufficient to sort a list of $n$ elements. In this paper, we study the Noisy Sorting problem, where each comparison result is flipped independently with probability $p$ for some fixed $p\in(0,\frac{1}{2})$. As our main result, we show that

$$(1\pm o(1))\left(\frac{1}{I(p)}+\frac{1}{(1-2p)\log_{2}\frac{1-p}{p}}\right)n\log_{2}n$$

noisy comparisons are both necessary and sufficient to sort $n$ elements with error probability $o(1)$, where $I(p)=1+p\log_{2}p+(1-p)\log_{2}(1-p)$ is the capacity of the binary symmetric channel (BSC) with crossover probability $p$. This simultaneously improves the previous best lower and upper bounds (Wang, Ghaddar and Wang, ISIT 2022) for this problem.

For the related Noisy Binary Search problem, we show that

$$(1\pm o(1))\left((1-\delta)\frac{\log_{2}n}{I(p)}+\frac{2\log_{2}\frac{1}{\delta}}{(1-2p)\log_{2}\frac{1-p}{p}}\right)$$

noisy comparisons are both necessary and sufficient to find the predecessor of an element among $n$ sorted elements with error probability $\delta$. This extends the previous bounds of Burnashev and Zigangirov (1974), which are only tight for $\delta=1/n^{o(1)}$.

1 Introduction

Sorting is one of the most fundamental problems in computer science. Among the wide variety of sorting algorithms, an important class is comparison-based sorting. It is well-known that sorting a list of $n$ numbers requires $(1-o(1))n\log n$ comparisons (see e.g., [CLRS22]), even if the input list is a random permutation (all logarithms in this paper are of base $2$). A good number of algorithms require only $(1+o(1))n\log n$ comparisons, such as binary insertion sort, merge sort (see e.g., [Knu98]), and merge insertion sort [FJ59].

Sorting is used in many real-world applications with large data sets, so it is important to account for faults that arise unavoidably in large systems. For comparison-based sorting, this means that the result of a comparison could be incorrect. We consider one well-studied model for such faults, studied in, e.g., [Hor63, Gal78, Pel89, BM86, FRPU94, BMW16]. In this model, the result of each query (or noisy comparison in the context of sorting) is flipped independently with probability $p$ for some fixed $p\in(0,\frac{1}{2})$, and repeated queries are allowed. We call this the noisy model. In this model, we define $\mathsf{NoisySorting}(n)$ as the task of sorting $n$ elements using noisy comparisons. As this model inherently has errors and no algorithm can always produce correct outputs, we consider algorithms with $o(1)$ error probability.

If we repeat each noisy comparison $\Theta(\log n)$ times, the majority of the returned results is the correct comparison result with high probability. Therefore, we can modify any classical comparison-based sorting algorithm with $O(n\log n)$ comparisons, by replacing each comparison with $\Theta(\log n)$ noisy comparisons, to get an $O(n\log^{2}n)$-comparison algorithm for Noisy Sorting that succeeds with probability $1-o(1)$.
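To make this reduction concrete, here is a minimal Python sketch; the helper names (noisy_compare, majority_less_than) and the repetition constant are our own illustrative choices, not from the paper.

import functools, math, random

def noisy_compare(x, y, p):
    # Simulated noisy comparison: returns 1{x < y}, flipped with probability p.
    return (1 if x < y else 0) ^ (random.random() < p)

def majority_less_than(x, y, p, n):
    # Repeat the noisy comparison Theta(log n) times and take a majority vote.
    k = 30 * max(1, int(math.log2(n)))  # the constant 30 is an arbitrary illustrative choice
    votes = sum(noisy_compare(x, y, p) for _ in range(k))
    return 2 * votes > k

def sort_with_majority(a, p):
    # Plug the boosted comparator into any classical O(n log n) comparison sort:
    # O(n log n) boosted comparisons, i.e. O(n log^2 n) noisy comparisons in total.
    n = len(a)
    cmp = lambda x, y: -1 if majority_less_than(x, y, p, n) else 1
    return sorted(a, key=functools.cmp_to_key(cmp))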

In fact, better algorithms are known. Feige, Raghavan, Peleg, and Upfal [FRPU94] gave an algorithm for $\mathsf{NoisySorting}(n)$ with only $O(n\log n)$ noisy comparisons. Recently, Wang, Ghaddar, and Wang [WGW22] analyzed the constant factor hidden in [FRPU94]'s algorithm (the actual constant is quite complicated; see [WGW22] for details), and gave an improved upper bound of $(1+o(1))\frac{2}{\log\left(\frac{1}{\frac{1}{2}+\sqrt{p(1-p)}}\right)}n\log n$ noisy comparisons. They also showed that $(1-o(1))\frac{1}{I(p)}n\log n$ comparisons are necessary for any algorithm with $o(1)$ error probability, where $I(p)=1-h(p)$ is the capacity of the BSC with crossover probability $p$ (which is also the amount of information each noisy comparison gives) and $h(p)=-p\log p-(1-p)\log(1-p)$ is the binary entropy function.

Despite this progress, there is still a big gap between the upper and lower bounds. For instance, when $p=0.1$, the constant in front of $n\log n$ for the lower bound is roughly $1.88322$, whereas the constant for the upper bound is roughly $6.21257$, more than three times as large. It is a fascinating question to narrow this gap, or even close it as in the case of classical sorting.

As the main result of our paper, we show that it is indeed possible to close this gap by giving both an improved upper bound and an improved lower bound:

Theorem 1 (Noisy Sorting Upper Bound).

There exists a deterministic algorithm for $\mathsf{NoisySorting}(n)$ with error probability $o(1)$ which always uses at most

$$(1+o(1))\left(\frac{1}{I(p)}+\frac{1}{(1-2p)\log\frac{1-p}{p}}\right)n\log n$$

noisy comparisons.

Theorem 2 (Noisy Sorting Lower Bound).

Any (deterministic or randomized) algorithm for $\mathsf{NoisySorting}(n)$ with error probability $o(1)$ must make at least

$$(1-o(1))\left(\frac{1}{I(p)}+\frac{1}{(1-2p)\log\frac{1-p}{p}}\right)n\log n$$

noisy comparisons in expectation, even if the input is a uniformly random permutation.

To compare with previous works more quantitatively, our constant is roughly $2.27755$ for $p=0.1$. When $p$ approaches $0$, i.e., when the noisy comparisons are almost errorless, our bounds approach $(1\pm o(1))n\log n$, matching the bounds of classical sorting.
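These constants are easy to check numerically; a small Python sketch (logarithms base $2$, as everywhere in the paper):

import math

def capacity(p):
    # I(p) = 1 - h(p), the capacity of BSC_p.
    return 1 + p * math.log2(p) + (1 - p) * math.log2(1 - p)

def sorting_constant(p):
    # The constant in Theorems 1 and 2.
    return 1 / capacity(p) + 1 / ((1 - 2 * p) * math.log2((1 - p) / p))

p = 0.1
print(round(1 / capacity(p), 5))      # 1.88322, the lower-bound constant of [WGW22]
print(round(sorting_constant(p), 5))  # 2.27755, the constant in this paper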

Noisy Binary Search.

We also study the closely related Noisy Binary Search problem. In $\mathsf{NoisyBinarySearch}(n)$, we are given a sorted list of $n$ elements and an element $x$, and the goal is to find the predecessor of $x$ in the sorted list. Noisy Binary Search is well-studied [Hor63, BZ74, FRPU94, BH08, Pel89, DŁU21] in the noisy model we consider, as well as in some other models [AD91, BK93, DGW92, Mut94, RMK+80].

Both previous algorithms [FRPU94, WGW22] for $\mathsf{NoisySorting}(n)$ use binary insertion sort at the high level. More specifically, the algorithms keep a sorted array of the first $i$ elements for $i$ from $1$ to $n$. When the $i$-th element needs to be inserted for $i>1$, the algorithms use some implementation of $\mathsf{NoisyBinarySearch}(i-1)$ to find the correct position to insert it, with error probability $o(1/n)$ (it also suffices to have average error probability $o(1/n)$). A simple union bound then implies that the error probability of the sorting algorithm is $o(1)$. The difference between the two algorithms is that [FRPU94] uses an algorithm for Noisy Binary Search based on a random walk, while [WGW22] builds on Burnashev and Zigangirov's algorithm [BZ74].

In fact, a variable-length version of Burnashev and Zigangirov's algorithm [BZ74] achieves optimal query complexity for the Noisy Binary Search problem (the original paper is in Russian; see [WGZW23, Theorem 5] for an English version). They proved that $\mathsf{NoisyBinarySearch}(n)$ with error probability $\delta$ can be solved using at most $\frac{\log n+\log(1/\delta)+\log((1-p)/p)}{I(p)}$ noisy comparisons in expectation (we remark that their algorithm is randomized and assumes a uniform prior). An information-theoretic lower bound (e.g., [BH08, Theorem B.1]) shows that $(1-\delta-o(1))\frac{\log n}{I(p)}$ noisy comparisons are necessary. These bounds essentially match when $1/n^{o(1)}\leq\delta\leq o(1)$. However, for the application to Noisy Sorting, we must have $\delta=o(1/n)$ on average. In this case, the lower and upper bounds do not match.

Dereniowski, Łukasiewicz, and Uznański [DŁU21] designed alternative algorithms for Noisy Binary Search in various settings. They also considered a more general problem called Graph Search.

We remark that Ben-Or and Hassidim [BH08], likely unaware of [BZ74], claim an algorithm for solving $\mathsf{NoisyBinarySearch}(n)$ with error probability $\delta$ using at most $(1-\delta-o(1))\frac{\log n+O(\log(1/\delta))}{I(p)}$ noisy comparisons in expectation. However, as pointed out by [DŁU21], there are potential issues in [BH08]'s proof.

Using a technique similar to our lower bound for Noisy Sorting, we are also able to improve the lower bound for Noisy Binary Search for $\delta=1/n^{\Omega(1)}$.

Theorem 3 (Noisy Binary Search Lower Bound).

Any (deterministic or randomized) algorithm for $\mathsf{NoisyBinarySearch}(n)$ with error probability $\leq\delta$ must make at least

$$(1-o(1))\left((1-\delta)\frac{\log n}{I(p)}+\frac{2\log\frac{1}{\delta}}{(1-2p)\log\frac{1-p}{p}}\right)$$

noisy comparisons in expectation, even if the position of the element to search for is uniformly random.

This lower bound indicates that any algorithm for $\mathsf{NoisySorting}(n)$ purely based on binary insertion sort, which requires $n-1$ executions of Noisy Binary Search with average error probability $o(1/n)$, needs at least $(1-o(1))\left(\frac{1}{I(p)}+\frac{2}{(1-2p)\log\frac{1-p}{p}}\right)n\log n$ noisy comparisons in expectation to achieve error probability $o(1)$. In light of Theorem 1, we see that any algorithm for Noisy Sorting purely based on binary insertion sort cannot be optimal.

We also show an algorithm for $\mathsf{NoisyBinarySearch}(n)$ that matches the lower bound in Theorem 3:

Theorem 4 (Noisy Binary Search Upper Bound).

There exists a randomized algorithm for $\mathsf{NoisyBinarySearch}(n)$ with error probability $\leq\delta$ that makes at most

$$(1+o(1))\left((1-\delta)\frac{\log n}{I(p)}+\frac{2\log\frac{1}{\delta}}{(1-2p)\log\frac{1-p}{p}}\right)$$

noisy comparisons in expectation.

Concurrent Work.

Concurrent work [WGZW23] also improves over [WGW22] in many aspects. Our results are stronger than their main results. Our algorithm in Theorem 1 uses fewer noisy comparisons in expectation than their algorithms [WGZW23, Theorems 1 and 2]. Our Theorem 2 is strictly stronger than [WGZW23, Theorem 3]. Our lower bound for insertion-based sorting algorithms (see the discussion after Theorem 3) is strictly stronger than [WGZW23, Theorem 4].

Other Related Works.

Noisy Sorting has also been considered in a slightly different noisy model, where every pair of elements may be compared at most once [BM08, KPSW11, GLLP17, GLLP20, GLLP19]. Sorting is harder in this model than in the model we consider, as it is information-theoretically impossible to correctly sort all the elements with probability $1-o(1)$ for $p>0$ [GLLP17]. For $p<1/16$, Geissmann, Leucci, Liu, and Penna [GLLP19] achieved an $O(n\log n)$ time algorithm that guarantees $O(\log n)$ maximum dislocation and $O(n)$ total dislocation with high probability, matching the lower bound given in [GLLP17].

Acknowledgements.

We thank Przemysław Uznański for telling us about the issue of [BH08].

1.1 Technical Overview

Before we give the overview of our techniques, let us first give some intuition about the constant $\frac{1}{I(p)}+\frac{1}{(1-2p)\log\frac{1-p}{p}}$ in our bound. $I(p)$ is essentially the amount of information in bits each noisy query can give, and the ordering of $n$ elements requires $\log(n!)\approx n\log n$ bits to represent. Therefore, $\frac{n\log n}{I(p)}$ queries are intuitively necessary. On the other hand, $\frac{\log n}{(1-2p)\log\frac{1-p}{p}}$ is roughly the number of noisy comparisons required to compare two elements with error probability $\leq\frac{1}{n}$. Therefore, even for the simpler task of checking whether a list of $n$ elements is sorted, roughly $\frac{n\log n}{(1-2p)\log\frac{1-p}{p}}$ queries seem necessary to compare all the adjacent elements with overall error probability $o(1)$. Hence the constants $\frac{1}{I(p)}$ and $\frac{1}{(1-2p)\log\frac{1-p}{p}}$ are natural.

However, the above discussion only intuitively suggests a $\max\left\{\frac{1}{I(p)},\frac{1}{(1-2p)\log\frac{1-p}{p}}\right\}$ lower bound on the constant, and a priori it is unclear how to strengthen it to their sum. To resolve this issue, we design an easier version of Noisy Sorting as an intermediate problem. In this version, suppose the elements are split into contiguous groups of size $\log n$ in the sorted order (and the groups are unknown to the algorithm). Any time the algorithm compares two elements from the same group, the query result tells the algorithm that these two elements are in the same group, without noise; otherwise, the query result flips the result of the comparison with probability $p$, as usual. The algorithm is also not required to sort elements inside the same group correctly. We can show that in this version, each query still only gives roughly $I(p)$ bits of information, and the total number of bits the output space can represent is $\log\frac{n!}{((\log n)!)^{n/\log n}}\approx n\log n$. Therefore, this simpler problem still requires approximately $\frac{n\log n}{I(p)}$ queries. By designing a reduction from this problem to Noisy Sorting, we can show that any algorithm for Noisy Sorting requires roughly $\frac{n\log n}{I(p)}$ queries comparing elements not from the same group. At the same time, inside each group, the algorithm needs to perform $\log n-1$ comparisons with error probability $\leq\frac{1}{n}$ each, even just to check that all the groups are sorted. This part requires roughly $\frac{n\log n}{(1-2p)\log\frac{1-p}{p}}$ queries.
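For instance, for groups of size $\log n$, the entropy count above follows from a standard Stirling estimate:

$$\log\frac{n!}{((\log n)!)^{n/\log n}}=\log n!-\frac{n}{\log n}\log\bigl((\log n)!\bigr)\geq n\log n-O(n)-\frac{n}{\log n}\cdot\log n\cdot\log\log n=(1-o(1))\,n\log n,$$

so the output space of the intermediate problem still carries $(1-o(1))n\log n$ bits of entropy.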

Our algorithm that matches this lower bound can be viewed as a remote variant of quicksort, with a large number of pivots and one level of recursion. First, we sample a random subset of elements $S$ of size $\frac{n}{\log n}$, which can be viewed as pivots. Sorting $S$ only contributes to lower-order terms in the query complexity. These pivots separate the list into multiple sublists. Then, we use Noisy Binary Search to find the correct sublist for each remaining element, with error probability $\frac{1}{\log n}$. This step contributes the $\frac{n\log n}{I(p)}$ part of the query complexity. Then we design an algorithm that correctly sorts $m$ elements with probability $\geq 1-o(\frac{1}{n})$ using only $O(m\log m)+(1+o(1))\frac{m\log n}{(1-2p)\log\frac{1-p}{p}}$ queries. This algorithm is then used to sort all the sublists. Summing over all the sublists, the first $O(m\log m)$ term becomes $O(n\log\log n)$, as we expect each sublist to have size $\operatorname{polylog}(n)$. The second term sums up to $(1+o(1))\frac{n\log n}{(1-2p)\log\frac{1-p}{p}}$. The total error probability is $o(\frac{1}{n})\cdot n=o(1)$. Finally, we have to correct the elements that are sent to the wrong sublists due to the error probability of the Noisy Binary Search algorithm, but we only expect to see at most $\frac{n}{\log n}$ of them, so the number of queries for handling them ends up being a lower-order term.

2 Preliminaries

In this section we introduce several basic notations, definitions, and lemmas used in the proof.

2.1 Basic Notions

Notation 5.

We use the following notations throughout this paper.

  • For $k\in\mathbb{Z}_{\geq 0}$, let $[k]:=\{1,\ldots,k\}$.

  • For two random variables $X,Y$, let $I(X;Y)$ denote the mutual information between $X$ and $Y$.

  • For a distribution $P=(p_{1},\ldots,p_{k})$, define its entropy function as $h(P):=-\sum_{i\in[k]}p_{i}\log p_{i}$. In particular, for $p\in[0,1]$, define $h(p):=h(p,1-p)$. Define $I(p):=1-h(p)$.

  • For $p\in[0,1]$, let $\operatorname{Bern}(p)$ denote the Bernoulli distribution, such that for $X\sim\operatorname{Bern}(p)$, we have $\mathbb{P}[X=0]=1-p$ and $\mathbb{P}[X=1]=p$.

  • For $p\in[0,1]$, let $\operatorname{BSC}_{p}$ be the binary symmetric channel with crossover probability $p$, i.e., on input $x\in\{0,1\}$, the output $\operatorname{BSC}_{p}(x)$ follows the distribution $\operatorname{Bern}(p+(1-2p)x)$.

  • For two random variables $X,Y$, define $X\land Y:=\min\{X,Y\}$.

  • Let $x$ be a variable and $f(x)$ and $g(x)$ be two functions which are defined for all large enough values of $x$. We say $f(x)=o_{x}(g(x))$ if $\lim_{x\to\infty}f(x)/g(x)=0$. In particular, if $x=n$, we omit the subscript and write $f(n)=o(g(n))$.

We also assume some basic knowledge of information theory, such as the chain rule for entropy and mutual information and Fano's inequality. See e.g. [PW22] for an overview.

In Noisy Sorting and Noisy Binary Search, the algorithm is allowed to make noisy comparison queries.

Definition 6 (Noisy Comparison Query).

Fix $p\in(0,\frac{1}{2})$. Let $x,y$ be two comparable elements (i.e., either $x\leq y$ or $y\leq x$). We define the noisy comparison query $\textsc{NoisyComparison}(x,y)$ as returning $\operatorname{BSC}_{p}(\mathbbm{1}\{x<y\})$, where the randomness is independent for every query.

We will omit the crossover probability $p$ throughout the paper when it is clear from the context.

With these definitions, we are ready to define the problems of study.

Definition 7 (Noisy Sorting Problem).

Let $\mathsf{NoisySorting}(n)$ be the following problem: Given $n$ comparable elements $(a_{i})_{i\in[n]}$, an algorithm is allowed to make the query $\textsc{NoisyComparison}(a_{i},a_{j})$ for $i,j\in[n]$. The goal is to output a permutation $(b_{i})_{i\in[n]}$ of $(a_{i})_{i\in[n]}$ such that $b_{i}\leq b_{i+1}$ for all $i\in[n-1]$.

Definition 8 (Noisy Binary Search Problem).

Let $\mathsf{NoisyBinarySearch}(n)$ be the following problem: Given $n$ elements $(a_{i})_{i\in[n]}$ satisfying $a_{i}\leq a_{i+1}$ for all $i\in[n-1]$ and an element $x$, an algorithm is allowed to make the query $\textsc{NoisyComparison}(s,t)$, where $\{s,t\}=\{a_{i},x\}$ for some $i\in[n]$. The goal is to output $\max(\{0\}\cup\{i\in[n]:a_{i}<x\})$ (i.e., the index of the largest element in $(a_{i})_{i\in[n]}$ less than $x$).

Remark 9.

By our definition of the Noisy Sorting Problem, we can WLOG assume that all elements $(a_{i})_{i\in[n]}$ are distinct. In fact, suppose we have an algorithm $\mathcal{A}$ which works for the distinct-elements case. Then we can modify it such that if it is about to call $\textsc{NoisyComparison}(a_{i},a_{j})$, it instead calls $\textsc{NoisyComparison}(a_{\max\{i,j\}},a_{\min\{i,j\}})$ (but flips the returned bit if $i<j$). Call the modified algorithm $\mathcal{A}^{\prime}$. This way, from the view of the algorithm, the elements are ordered by $(a_{i},i)$. It is easy to see that $\mathcal{A}^{\prime}$ has the same distribution of the number of queries, and its error probability is no larger than the error probability of $\mathcal{A}$. Thus, we get an algorithm which works for the case where the elements are not necessarily distinct.

Similarly, by our definition of the Noisy Binary Search Problem, we can WLOG assume that all elements $(a_{i})_{i\in[n]}$ and $x$ are distinct.

2.2 Known and Folklore Algorithms

Theorem 10 ([DŁU21, Corollary 1.6]).

There exists a randomized algorithm for $\mathsf{NoisyBinarySearch}(n)$ with error probability at most $\delta$ using

$$(1+o(1))\left(\frac{\log n+O(\log(1/\delta))}{I(p)}\right)$$

noisy comparisons in expectation, and using $O(\log n)$ random bits always.

Remark 11.

[DŁU21] stated their problem as finding an element in a sorted array, but their algorithm can also find the predecessor. The only place their algorithm uses randomness is to shift the initial array randomly, which requires $O(\log n)$ random bits (as we can WLOG assume $n$ is a power of two in $\mathsf{NoisyBinarySearch}(n)$, it always takes $O(\log n)$ random bits to sample a random shift, instead of $O(\log n)$ random bits in expectation).

We could also use the noisy binary search algorithm in [BZ74] (see also [WGZW23, Theorem 5]), which achieves a slightly better bound on the number of noisy comparisons. However, the number of random bits used in Burnashev–Zigangirov's algorithm is $O(\log n)$ in expectation, not always. This would make our control of the number of random bits used slightly more complicated.

Corollary 12.

There exists a randomized algorithm for $\mathsf{NoisySorting}(n)$ with failure probability $\delta$ using $O(n\log(n/\delta))$ noisy comparisons in expectation, and using $O(n\log n)$ random bits always.

Proof.

The algorithm keeps a sorted array of the first $i$ elements for $i$ from $1$ to $n$. Each time a new element needs to be inserted, the algorithm uses Theorem 10 to find its correct position, with error probability $\delta/n$. By the union bound, the overall error probability is bounded by $(\delta/n)\cdot n=\delta$, the expected number of queries is $n\cdot(1+o(1))\left(\frac{\log n+\log(n/\delta)}{I(p)}\right)=O\left(\frac{n\log(n/\delta)}{I(p)}\right)$, and the number of random bits used is $O(n\log n)$. ∎

We show how to use repeated queries to boost the correctness probability of pairwise comparisons. This is a folklore result, but we present a full proof here for completeness.

Lemma 13.

There exists a deterministic algorithm which compares two (unequal) elements using noisy comparisons with error probability $\leq\delta$ using

$$\frac{1}{1-2p}\left\lceil\frac{\log\frac{1-\delta}{\delta}}{\log\frac{1-p}{p}}\right\rceil=(1+o_{1/\delta}(1))\frac{\log(1/\delta)}{(1-2p)\log((1-p)/p)}$$

noisy queries in expectation.

Proof.

Consider Algorithm 1, which maintains the posterior probability that $x<y$.

Algorithm 1
1: procedure LessThan($x,y,\delta$)
2:     $a\leftarrow\frac{1}{2}$
3:     while true do
4:         if $\textsc{NoisyComparison}(x,y)=1$ then
5:             $a\leftarrow\frac{(1-p)a}{(1-p)a+p(1-a)}$
6:         else
7:             $a\leftarrow\frac{pa}{pa+(1-p)(1-a)}$
8:         if $a\geq 1-\delta$ then return true $\triangleright$ $x<y$
9:         if $a\leq\delta$ then return false $\triangleright$ $x>y$

Because of the return condition, if the prior distribution of $\mathbbm{1}\{x<y\}$ is $\operatorname{Bern}(\frac{1}{2})$, then the probability of error is $\leq\delta$. However, by symmetry, the probability of error of the algorithm does not depend on the ground truth, so the probability of error is $\leq\delta$ regardless of the ground truth.

In the following, let us analyze the number of queries used.

Let $a_{t}$ be the value of $a$ after the $t$-th query. If the ground truth is $x<y$, then

$$\log\frac{a_{t+1}}{1-a_{t+1}}=\log\frac{a_{t}}{1-a_{t}}+\begin{cases}+\log\frac{1-p}{p}&\text{w.p. }1-p,\\ -\log\frac{1-p}{p}&\text{w.p. }p.\end{cases}$$

Thus,

$$\mathbb{E}\left[\log\frac{a_{t+1}}{1-a_{t+1}}\right]=\log\frac{a_{t}}{1-a_{t}}+(1-2p)\log\frac{1-p}{p},$$

where the expectation is over the randomness of the $(t+1)$-th query.

Define two sequences $(X_{t})_{t\geq 0}$, $(Y_{t})_{t\geq 0}$ by $X_{t}=\log\frac{a_{t}}{1-a_{t}}$ and $Y_{t}=X_{t}-t(1-2p)\log\frac{1-p}{p}$. Let $\tau$ be the stopping time $\tau:=\min\{t:a_{t}\leq\delta\text{ or }a_{t}\geq 1-\delta\}$.

By the above discussion, $(Y_{t})_{t\geq 0}$ is a martingale. Using the Optional Stopping Theorem, we have

$$\mathbb{E}[Y_{\tau\land n}]=\mathbb{E}[Y_{0}]=0$$

for every $n\geq 0$, so

$$\mathbb{E}[X_{\tau\land n}]=\mathbb{E}[\tau\land n]\,(1-2p)\log\frac{1-p}{p}.$$

By the bounded convergence theorem, $\mathbb{E}[X_{\tau\land n}]$ goes to $\mathbb{E}[X_{\tau}]$ as $n\to\infty$. By the monotone convergence theorem, $\mathbb{E}[\tau\land n]$ goes to $\mathbb{E}[\tau]$ as $n\to\infty$. Therefore,

$$\mathbb{E}[X_{\tau}]=\mathbb{E}[\tau]\,(1-2p)\log\frac{1-p}{p},$$

which leads to

$$\mathbb{E}[\tau]=\frac{\mathbb{E}[X_{\tau}]}{(1-2p)\log\frac{1-p}{p}}\leq\frac{1}{1-2p}\left\lceil\frac{\log\frac{1-\delta}{\delta}}{\log\frac{1-p}{p}}\right\rceil,$$

where we used the fact that $X_{t}$ is always an integer multiple of $\log\frac{1-p}{p}$ and hence $X_{\tau}\leq\left\lceil\frac{\log\frac{1-\delta}{\delta}}{\log\frac{1-p}{p}}\right\rceil\cdot\log\frac{1-p}{p}$.

Thus, the algorithm stops within $\frac{1}{1-2p}\left\lceil\frac{\log\frac{1-\delta}{\delta}}{\log\frac{1-p}{p}}\right\rceil$ queries in expectation.

Similarly, if the ground truth is $x>y$, the algorithm stops within the same expected number of queries. This finishes the proof. ∎

For simplicity, we use $f_{p}(\delta)$ to denote $\frac{1}{1-2p}\left\lceil\frac{\log\frac{1-\delta}{\delta}}{\log\frac{1-p}{p}}\right\rceil$ for $0<p<\frac{1}{2}$ and $\delta>0$, which, by Lemma 13, upper bounds the expected number of comparisons needed by Algorithm 1 to compare two elements with error probability $\leq\delta$. When clear from the context, we drop the subscript $p$.
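A minimal Python sketch of Algorithm 1; the noisy comparison oracle of Definition 6 is simulated here for illustration.

import random

def noisy_comparison(x, y, p):
    # Simulated oracle from Definition 6: BSC_p applied to 1{x < y}.
    return (1 if x < y else 0) ^ (random.random() < p)

def less_than(x, y, p, delta):
    # Algorithm 1: maintain the posterior probability a that x < y (uniform prior 1/2)
    # and stop once a leaves the interval (delta, 1 - delta).
    a = 0.5
    while True:
        if noisy_comparison(x, y, p) == 1:
            a = (1 - p) * a / ((1 - p) * a + p * (1 - a))
        else:
            a = p * a / (p * a + (1 - p) * (1 - a))
        if a >= 1 - delta:
            return True   # conclude x < y
        if a <= delta:
            return False  # conclude x > y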

2.3 Pairwise Independent Hashing

During our algorithm, we sometimes run into the following situation: we need to run a certain algorithm $n$ times, where each instance uses $m=n^{c}$ random bits for some $0<c<1$. We can only afford $o(n\log n)$ fresh random bits in total (for derandomization purposes), so the instances need to share randomness in some way. On the other hand, we want the randomness of any two instances to be independent of each other, in order for concentration bounds to hold. Therefore we need the following standard result on pairwise independent hashing.

Lemma 14 ([CW77]).

There exists a pairwise independent hash family of size $2^{k}-1$ using $k$ fresh random bits.

By using $m$ fully independent copies of pairwise independent hash families, we can achieve pairwise independence between instances using $O(m\log n)$ fresh random bits.
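As an illustration of the parameters in Lemma 14, here is a minimal Python sketch of one standard pairwise independent construction (inner products with a random seed over GF(2)); we use it only as an example of how $k$ fresh bits can drive $2^{k}-1$ pairwise independent output bits, not necessarily as the exact construction of [CW77].

import random

def fresh_seed(k):
    # k fresh random bits.
    return [random.randrange(2) for _ in range(k)]

def hash_bit(seed, x):
    # Output bit for index x in {1, ..., 2^k - 1}: the inner product <seed, x> over GF(2).
    # For distinct nonzero indices x, y the pair of output bits is uniform on {0,1}^2,
    # so the 2^k - 1 output bits are pairwise independent.
    b = 0
    for i, s in enumerate(seed):
        b ^= s & ((x >> i) & 1)
    return b

# To give each of up to 2^k - 1 algorithm instances m pairwise independent random bits,
# draw m independent seeds; instance x reads hash_bit(seed_j, x) for j = 1, ..., m.
# This uses k * m = O(m log n) fresh random bits in total when 2^k - 1 >= n.
seeds = [fresh_seed(20) for _ in range(8)]
bits_for_instance_5 = [hash_bit(s, 5) for s in seeds]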

3 Noisy Sorting Algorithm

In this section, we will present our algorithm for noisy sorting and prove Theorem 1. We will first present the following randomized version of our algorithm, and then convert it to a deterministic one in Section 3.1.

Theorem 15.

There exists a randomized algorithm for $\mathsf{NoisySorting}(n)$ with error probability $o(1)$ which always uses at most

$$(1+o(1))\left(\frac{1}{I(p)}+\frac{1}{(1-2p)\log\frac{1-p}{p}}\right)n\log n$$

noisy comparisons, and uses $O(n)$ random bits in expectation.

We start with a subroutine that sorts a list with a small number of inversion pairs.

Lemma 16.

Fix any parameter $\sigma\in(0,1)$, which may depend on $n$. Given a list of $n$ elements with $t$ inversion pairs, there exists a deterministic algorithm using noisy comparisons that sorts these $n$ elements with error probability $\leq(n-1+t)\sigma$ using $(n-1+t)f(\sigma)+(n-1+t)\sigma n^{2}f(\sigma)$ noisy queries in expectation. The algorithm does not have to know $t$ in advance.

Proof.

Consider Algorithm 2, which is essentially insertion sort.

Algorithm 2
1: procedure SortInversion($n,(a_{i})_{i\in[n]},p,\sigma$)
2:     for $i=1\to n$ do
3:         for $j=i\to 2$ do
4:             if LessThan($a_{j},a_{j-1},\sigma$) then
5:                 Swap $a_{j}$ and $a_{j-1}$
6:             else
7:                 break
8:     return $(a_{i})_{i\in[n]}$

By the union bound, with probability $\geq 1-(n-1+t)\sigma$, the first $(n-1+t)$ calls to LessThan all return correctly. Conditioned on this happening, the algorithm acts exactly like a normal insertion sort, which halts after the first $(n-1+t)$ calls to LessThan; this takes $(n-1+t)f(\sigma)$ noisy comparisons in expectation.

With probability $\leq(n-1+t)\sigma$, the algorithm does not run correctly, but the algorithm still halts within at most $\binom{n}{2}f(\sigma)\leq n^{2}f(\sigma)$ queries in expectation. ∎
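A minimal Python sketch of Algorithm 2; less_than is assumed to be a boosted comparator with error probability sigma, for example the routine sketched after Lemma 13.

def sort_inversion(a, less_than):
    # Algorithm 2: insertion sort driven by a (noisy) boosted comparator less_than(x, y).
    # With n - 1 + t correct comparator answers it behaves exactly like insertion sort
    # on a list with t inversion pairs.
    a = list(a)
    for i in range(1, len(a)):
        j = i
        while j >= 1 and less_than(a[j], a[j - 1]):
            a[j], a[j - 1] = a[j - 1], a[j]
            j -= 1
    return a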

Next, we use Lemma 16 to construct the following subroutine. It is ultimately used to sort some small sets of elements in the main algorithm.

Lemma 17.

Fix any parameter $\delta\in(0,1)$, which may depend on $n$. There is a randomized algorithm for $\mathsf{NoisySorting}(n)$ with error probability at most $\delta$ using $O(n\log n)+nf(\delta/n)+n^{2}\delta f(\delta/n)$ noisy comparisons in expectation, and using $O(n\log n)$ random bits always.

Proof.

We first use Corollary 12 with error probability $P_{e}\leq n^{-2}$, and then use Lemma 16 with error probability $\sigma=\delta/n$.

After the first step, the expected number of inversion pairs is

$$\mathbb{E}[t]\leq P_{e}n^{2}\leq 1,$$

so the second step uses at most $nf(\delta/n)+n^{2}\delta f(\delta/n)$ queries in expectation.

The overall error probability is at most $\mathbb{E}[(n-1+t)\sigma]\leq n\sigma=\delta$. ∎

The following lemma allows us to construct algorithms with guaranteed tail behavior for the number of queries. This will be helpful for concentration bounds.

Lemma 18.

Suppose we have an algorithm $\mathcal{A}$ (solving a certain task) that has error probability $\leq\delta$ and number of queries $\tau$ with $\mathbb{E}[\tau]\leq m$, and always uses $\leq r$ random bits. Then we can construct an algorithm $\mathcal{B}$ solving the same task satisfying the following properties:

  • $\mathcal{B}$ has error probability $(1+o_{m}(1))\delta$;

  • Let $\rho$ be the number of queries algorithm $\mathcal{B}$ uses. Then

    $$\mathbb{E}[\rho]\leq(1+o_{m}(1))m,\qquad\mathbb{E}[\rho^{2}]=O(m^{3}).$$

  • Let $r^{\prime}$ be the number of random bits algorithm $\mathcal{B}$ uses. Then

    $$\mathbb{E}[r^{\prime}]\leq(1+o_{m}(1))r,\qquad\mathbb{E}[(r^{\prime})^{2}]=O(r^{2}).$$
Proof.

Let $k$ be a parameter to be chosen later. Consider the following algorithm $\mathcal{B}$:

  1. Run algorithm $\mathcal{A}$ until it halts, or it is about to use the $(k+1)$-th query.

  2. If $\mathcal{A}$ halts, then we return the return value of $\mathcal{A}$; otherwise, we restart the whole algorithm.

Let us compute the probability of a restart. By Markov's inequality,

$$\mathbb{P}[\text{restart}]=\mathbb{P}[\tau>k]\leq\frac{m}{k}.$$

Algorithm $\mathcal{B}$'s error probability $P_{e}(\mathcal{B})$ is

$$P_{e}(\mathcal{B})=\mathbb{P}[\text{$\mathcal{A}$ errs}\mid\tau\leq k]\leq\frac{\mathbb{P}[\text{$\mathcal{A}$ errs}]}{\mathbb{P}[\tau\leq k]}\leq\frac{\delta}{1-\frac{m}{k}}.$$

The expected number of queries $\rho$ used by Algorithm $\mathcal{B}$ satisfies

$$\mathbb{E}[\rho]=\mathbb{E}[\tau\land k]+\mathbb{P}[\text{restart}]\,\mathbb{E}[\rho]\leq m+\frac{m}{k}\,\mathbb{E}[\rho].$$

Solving this, we get

$$\mathbb{E}[\rho]\leq\frac{m}{1-\frac{m}{k}}.$$

The second moment satisfies

$$\begin{aligned}\mathbb{E}[\rho^{2}]&\leq\mathbb{E}[(\tau\land k)^{2}]+\mathbb{P}[\text{restart}]\,\mathbb{E}[(k+\rho)^{2}]\\&\leq k^{2}+\frac{m}{k}\left(\mathbb{E}[\rho^{2}]+2k\,\mathbb{E}[\rho]+k^{2}\right)\\&\leq k^{2}+mk+\frac{2m^{2}}{1-\frac{m}{k}}+\frac{m}{k}\,\mathbb{E}[\rho^{2}].\end{aligned}$$

Solving this, we get

$$\mathbb{E}[\rho^{2}]\leq\left(1-\frac{m}{k}\right)^{-1}\left(k^{2}+mk+\frac{2m^{2}}{1-\frac{m}{k}}\right).$$

Similar to $\rho$, we have

$$\mathbb{E}[r^{\prime}]\leq r+\mathbb{P}[\text{restart}]\,\mathbb{E}[r^{\prime}]\leq r+\frac{m}{k}\,\mathbb{E}[r^{\prime}].$$

Solving this, we get

$$\mathbb{E}[r^{\prime}]\leq\frac{r}{1-\frac{m}{k}}.$$

Also,

$$\begin{aligned}\mathbb{E}[(r^{\prime})^{2}]&\leq r^{2}+\mathbb{P}[\text{restart}]\,\mathbb{E}[(r+r^{\prime})^{2}]\\&\leq r^{2}+\frac{m}{k}\left(r^{2}+2r\,\mathbb{E}[r^{\prime}]+\mathbb{E}[(r^{\prime})^{2}]\right)\\&\leq r^{2}+\frac{m}{k}\,r^{2}+\frac{m}{k}\cdot\frac{2r^{2}}{1-\frac{m}{k}}+\frac{m}{k}\,\mathbb{E}[(r^{\prime})^{2}].\end{aligned}$$

Solving this, we get

$$\mathbb{E}[(r^{\prime})^{2}]\leq\left(1-\frac{m}{k}\right)^{-1}\left(r^{2}+\frac{m}{k}\,r^{2}+\frac{m}{k}\cdot\frac{2r^{2}}{1-\frac{m}{k}}\right).$$

Choosing $k=m\log m$, we have $1/(1-m/k)=1+o_{m}(1)$, so we get

$$P_{e}(\mathcal{B})\leq(1+o_{m}(1))\delta,\quad\mathbb{E}[\rho]\leq(1+o_{m}(1))m,\quad\mathbb{E}[\rho^{2}]\leq O(m^{2}\log^{2}m)=O(m^{3}),\quad\mathbb{E}[r^{\prime}]\leq(1+o_{m}(1))r,\quad\mathbb{E}[(r^{\prime})^{2}]=O(r^{2}).$$

∎
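A minimal Python sketch of the construction in this proof; the interface run_once(budget), which runs one attempt of $\mathcal{A}$ with a query budget and returns its output or None on overflow, is our own assumption for illustration.

import math

def make_safe(run_once, m):
    # Lemma 18 wrapper: cap every attempt at k = m log m queries; restart on overflow.
    k = max(2, int(m * math.log2(max(m, 2))))

    def safe_run():
        while True:
            result = run_once(k)     # one attempt of A with at most k queries
            if result is not None:
                return result        # A halted within the budget
            # otherwise restart the whole algorithm with fresh randomness

    return safe_run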

Using Lemma 18, we are able to construct “safe” (i.e., with guaranteed tail behavior) versions of algorithms introduced before.

Corollary 19 (Safe Algorithms).

  1. Given $\delta\in(0,1)$, there exists a randomized algorithm SafeBinarySearch for $\mathsf{NoisyBinarySearch}(n)$, with error probability $\leq\delta$, with the number of queries $\tau$ satisfying

    $$\mathbb{E}[\tau]=(1+o(1))\left(\frac{\log n+O(\log\frac{1}{\delta})}{I(p)}\right),\qquad\operatorname{Var}[\tau]=O\left(\log^{3}n+\log^{3}\frac{1}{\delta}\right),$$

    and with the number of used random bits $r$ satisfying

    $$\mathbb{E}[r]=O(\log n),\qquad\operatorname{Var}[r]=O(\log^{2}n).$$

  2. Given $\delta\in(0,1)$, there exists a deterministic algorithm SafeLessThan which compares two elements with error probability $\leq\delta$, with the number of queries $\tau$ satisfying

    $$\mathbb{E}[\tau]=(1+o_{1/\delta}(1))\frac{\log\frac{1}{\delta}}{(1-2p)\log\frac{1-p}{p}},\qquad\operatorname{Var}[\tau]=O\left(\log^{3}\frac{1}{\delta}\right).$$

  3. Given $\delta=O(\frac{1}{n^{3}})$, there exists a randomized algorithm SafeWeakSort for $\mathsf{NoisySorting}(n)$, with error probability $\leq\delta$, with the number of queries $\tau$ satisfying

    $$\mathbb{E}[\tau]=O(n\log n)+(1+o(1))\frac{n\log\frac{1}{\delta}}{(1-2p)\log\frac{1-p}{p}},\qquad\operatorname{Var}[\tau]=O\left(n^{3}\log^{3}\frac{1}{\delta}\right),$$

    and with the number of used random bits $r$ satisfying

    $$\mathbb{E}[r]=O(n\log n),\qquad\operatorname{Var}[r]=O(n^{2}\log^{2}n).$$
Proof.

Apply Lemma 18 to Theorem 10, Lemma 13, and Lemma 17 respectively. ∎

We introduce our last subroutine below, which is used to sort a subset of $\Theta(\frac{n}{\log n})$ elements in the main algorithm.

Lemma 20.

There exists a randomized algorithm SafeSimpleSort for $\mathsf{NoisySorting}(n)$ with error probability $o(1)$ that always uses $O(n\log n)$ queries and $O(n\log n)$ random bits.

Proof.

Consider the following algorithm $\mathcal{A}$:

  • Keep an array of the first $i$ elements for $i$ from $1$ to $n$. Each time a new element needs to be inserted, the algorithm uses SafeBinarySearch to find its correct position, with error probability $\frac{1}{n\log n}$.

  • Output the resulting array.

We have $n-1$ calls to SafeBinarySearch. Let $E_{i}$ be the event that the $i$-th call to SafeBinarySearch returns the correct value, and let $E$ be the event that all $E_{i}$ happen. By the union bound, the probability of error $\mathbb{P}[\neg E]$ is at most $(n-1)\cdot\frac{1}{n\log n}=o(1)$.

Let $\tau_{i}$ be the number of queries used in the $i$-th call to SafeBinarySearch. By Corollary 19 Part 1, we have

$$\mathbb{E}[\tau_{i}\mid E]=\mathbb{E}\left[\tau_{i}\,\middle|\,\bigwedge_{1\leq j\leq i}E_{j}\right]\leq\frac{\mathbb{E}\left[\tau_{i}\mid\bigwedge_{1\leq j<i}E_{j}\right]}{\mathbb{P}\left[E_{i}\mid\bigwedge_{1\leq j<i}E_{j}\right]}=O(\log n),$$

and

$$\operatorname{Var}[\tau_{i}\mid E]=\operatorname{Var}\left[\tau_{i}\,\middle|\,\bigwedge_{1\leq j\leq i}E_{j}\right]\leq\mathbb{E}\left[\tau_{i}^{2}\,\middle|\,\bigwedge_{1\leq j\leq i}E_{j}\right]\leq\frac{\mathbb{E}\left[\tau_{i}^{2}\mid\bigwedge_{1\leq j<i}E_{j}\right]}{\mathbb{P}\left[E_{i}\mid\bigwedge_{1\leq j<i}E_{j}\right]}\leq O\left(\operatorname{Var}\left[\tau_{i}\,\middle|\,\bigwedge_{1\leq j<i}E_{j}\right]+\mathbb{E}\left[\tau_{i}\,\middle|\,\bigwedge_{1\leq j<i}E_{j}\right]^{2}\right)=O(\log^{3}n).$$

Conditioned on $E$, the $\tau_{i}$'s are independent because they use disjoint sources of randomness. Thus, by Chebyshev's inequality,

$$\mathbb{P}\left[\sum_{i\in[n-1]}\tau_{i}\geq\sum_{i\in[n-1]}\mathbb{E}[\tau_{i}]+n^{2/3}\,\middle|\,E\right]\leq O\left(\frac{n\log^{3}n}{(n^{2/3})^{2}}\right)=o(1).$$

Let us define $m=\sum_{i\in[n-1]}\mathbb{E}[\tau_{i}\mid E]+n^{2/3}=O(n\log n)$.

If the number of random bits used in the $i$-th call to SafeBinarySearch is $r_{i}$, then we can similarly show

$$\mathbb{P}\left[\sum_{i\in[n-1]}r_{i}\geq\sum_{i\in[n-1]}\mathbb{E}[r_{i}]+n^{2/3}\,\middle|\,E\right]=o(1).$$

Let $R=\sum_{i\in[n-1]}\mathbb{E}[r_{i}\mid E]+n^{2/3}=O(n\log n)$.

Consider the following algorithm (which is our SafeSimpleSort):

  • Run $\mathcal{A}$ until it finishes, or it is about to make the $(m+1)$-th query, or it is about to use the $(R+1)$-th random bit.

  • If $\mathcal{A}$ finishes, then return the output of $\mathcal{A}$; otherwise, output an arbitrary permutation.

Then SafeSimpleSort always makes at most $m=O(n\log n)$ queries, always uses at most $R=O(n\log n)$ random bits, and has error probability $\mathbb{P}[\mathcal{A}\text{ errs}]+o(1)=o(1)$. ∎

Finally, we give our main algorithm for noisy sorting. See Algorithm 3 for its description.

We first analyze its error probability.

Algorithm 3
1: procedure NoisySort($A=(x_{1},\ldots,x_{n}),p$)
2:     $S\leftarrow\{-\infty,\infty\}$.
3:     for $a\in A$ do
4:         Add $a$ to $S$ with probability $\frac{1}{\log n}$. $\triangleright$ Fully independent random bits.
5:     Sort $S$ using SafeSimpleSort. $\triangleright$ Fully independent random bits.
6:     for $a\in A\setminus S$ do
7:         Use SafeBinarySearch with $\delta=\frac{1}{\log n}$ to search for the predecessor of $a$ in $S$. $\triangleright$ We use $n^{2/3}$ pairwise independent hash families to feed the first $n^{2/3}$ random bits of each SafeBinarySearch call, and the algorithm errs if some call needs more than $n^{2/3}$ random bits.
8:         Denote the returned answer by $\hat{l}_{a}$.
9:     $X\leftarrow\emptyset$.
10:     $\mathcal{A}\leftarrow S\setminus\{-\infty,\infty\}$.
11:     for $l\in S\setminus\{\infty\}$ do
12:         $r\leftarrow$ next element in $S$.
13:         $B_{l}\leftarrow\{a\in A\setminus S:\hat{l}_{a}=l\}$.
14:         if $|B_{l}|>6\log^{2}n$ then
15:             $X\leftarrow X\cup B_{l}$
16:         else
17:             Sort $B_{l}$ using SafeWeakSort with error probability $\frac{1}{n\log n}$. $\triangleright$ We use $n^{2/3}$ pairwise independent hash families to feed the first $n^{2/3}$ random bits of each SafeWeakSort call, and the algorithm errs if some call needs more than $n^{2/3}$ random bits.
18:             while $|B_{l}|>0$ do
19:                 $x\leftarrow$ first element in $B_{l}$.
20:                 if SafeLessThan($x$, $l$, $\frac{1}{n\log n}$) then
21:                     $B_{l}\leftarrow B_{l}\setminus\{x\}$.
22:                     $X\leftarrow X\cup\{x\}$.
23:                 else
24:                     break
25:             while $|B_{l}|>0$ do
26:                 $x\leftarrow$ last element in $B_{l}$.
27:                 if SafeLessThan($r$, $x$, $\frac{1}{n\log n}$) then
28:                     $B_{l}\leftarrow B_{l}\setminus\{x\}$.
29:                     $X\leftarrow X\cup\{x\}$.
30:                 else
31:                     break
32:             Add $B_{l}$ to $\mathcal{A}$ between $l$ and $r$.
33:     for $x\in X$ do
34:         Insert $x$ into $\mathcal{A}$ using SafeBinarySearch with $\delta=\frac{1}{n\log n}$. $\triangleright$ We use $n^{2/3}$ pairwise independent hash families to feed the first $n^{2/3}$ random bits of each SafeBinarySearch call, and the algorithm errs if some call needs more than $n^{2/3}$ random bits.
35:     return $\mathcal{A}$.
Lemma 21.

Algorithm 3 has error probability $o(1)$.

Proof.

Let $E_{0}$ be the event that the algorithm does not err because of a lack of random bits at Line 7, Line 17, or Line 34. Let $E_{1}$ be the event that SafeSimpleSort successfully sorts $S$ at Line 5; $E_{2}$ be the event that all calls of SafeWeakSort at Line 17 are correct; $E_{3}$ be the event that all calls of SafeLessThan at Lines 20 and 27 are correct; $E_{4}$ be the event that all calls of SafeBinarySearch at Line 34 are correct.

We show that if $E_{0},E_{1},E_{2},E_{3},E_{4}$ all happen, then Algorithm 3 is correct. First of all, under $E_{1}$, $S$ is correctly sorted at Line 5. Secondly, under $E_{2}$ and $E_{0}$, for each $l$, $B_{l}$ is correctly sorted at Line 17. Under $E_{3}$ and $E_{0}$, at Line 32, all elements remaining in $B_{l}$ are indeed greater than $l$ and smaller than $r$, so adding $B_{l}$ to $\mathcal{A}$ between $l$ and $r$ keeps all elements in $\mathcal{A}$ sorted. Therefore, before the for loop at Line 33, all elements in $\mathcal{A}$ are correctly sorted, and clearly these elements are exactly $A\setminus X$. Finally, under $E_{4}$ and $E_{0}$, the insertions made at Line 34 are all correct, so the final result is correct.

For each call of SafeBinarySearch at Line 7, the probability that it requires more than $n^{2/3}$ random bits is $O(\frac{\operatorname{polylog}n}{n^{4/3}})$ by Corollary 19 Part 1 and Chebyshev's inequality. By the union bound, with probability $1-o(1)$, none of the calls of SafeBinarySearch at Line 7 causes the algorithm to err. We can similarly bound the probability that the algorithm errs at Line 17 or Line 34. Overall, $\Pr[\neg E_{0}]\leq o(1)$. Clearly, $\Pr[\neg E_{1}]\leq o(1)$, $\Pr[\neg E_{2}]\leq\frac{1}{n\log n}\cdot(|S|-1)\leq\frac{1}{\log n}$, $\Pr[\neg E_{3}]\leq\frac{1}{n\log n}\cdot(2n)=\frac{2}{\log n}$, and $\Pr[\neg E_{4}]\leq\frac{1}{n\log n}\cdot n=\frac{1}{\log n}$. Thus, by the union bound, the overall success probability of Algorithm 3 is at least $1-\frac{4}{\log n}-o(1)=1-o(1)$. ∎

Let a bucket be the set of elements lying between two adjacent elements of $S$ in the sorted order. Then we have the following simple lemma.

Lemma 22.

With probability $\geq 1-\frac{1}{n}$, all buckets have size at most $3\log^{2}n$.

Proof.

For any contiguous segment of length $L$ in the sorted order of $A$, the probability that none of these elements is in $S$ is $(1-1/\log n)^{L}\leq\exp(-L/\log n)$. By the union bound, the probability that there exist $L$ contiguous elements not in $S$ is at most $n\exp(-L/\log n)$. Taking $L=2\log_{e}n\cdot\log n\leq 3\log^{2}n$, we see that with probability $\geq 1-\frac{1}{n}$, all contiguous segments of length $L$ have at least one element in $S$, which implies that all buckets have size $\leq 3\log^{2}n$. ∎

Lemma 23.

With probability $1-o(1)$, Algorithm 3 uses at most $(1+o(1))\left(\frac{1}{I(p)}+\frac{1}{(1-2p)\log\frac{1-p}{p}}\right)n\log n$ queries.

Proof.

Because erring only makes the algorithm exit earlier, in the analysis we ignore the effect of erring caused by insufficient random bits. We define the following events:

  • Let $E_{1}$ be the event that $|S|\leq\frac{n}{\log n}+n^{2/3}=(1+o(1))\frac{n}{\log n}$. By the Chernoff bound, $\mathbb{P}[E_{1}]=1-o(1)$.

  • Let $E_{2}$ be the event that all buckets have size at most $3\log^{2}n$. By Lemma 22, $\mathbb{P}[E_{2}]=1-o(1)$.

  • Let $E_{3}$ be the event that at most $\frac{n}{\log n}+n^{2/3}=(1+o(1))\frac{n}{\log n}$ elements $a$ have the wrong predecessor $\hat{l}_{a}$ at Line 7. By Chebyshev's inequality, $\mathbb{P}[E_{3}]=1-o(1)$.

  • Let $E_{4}$ be the event that all calls to SafeLessThan on Lines 20 and 27 return the correct values. Because there are at most $2n$ calls, by the union bound, $\mathbb{P}[E_{4}]=1-o(1)$.

  • Let $E_{5}$ be the event that $|X|\leq O(\frac{n}{\log n})$. If $E_{2}$, $E_{3}$, and $E_{4}$ all happen, then $E_{5}$ happens, for the following reasons. First, at Line 15, $B_{l}$ has size greater than $6\log^{2}n$, while the sizes of all buckets are at most $3\log^{2}n$ conditioned on $E_{2}$. Thus, at least half of the elements in $B_{l}$ have the wrong predecessor $\hat{l}_{a}$ at Line 7. Thus, the number of elements added to $X$ at Line 15 is at most twice the number of elements $a$ with the wrong predecessor $\hat{l}_{a}$, which is bounded by $O(\frac{n}{\log n})$ conditioned on $E_{3}$. Also, if an element $x$ is added to $X$ at Line 22 or 29 and $E_{4}$ happens, $x$ must also have the wrong predecessor. Thus, overall, $|X|\leq O(\frac{n}{\log n})$ if $E_{2}$, $E_{3}$, and $E_{4}$ all happen.

We make queries on the following lines: Lines 5, 7, 17, 20, 27, and 34. Let us consider them separately.

Line 5.

By Lemma 20, conditioned on $E_{1}$, with probability $1-o(1)$, Line 5 uses $O(n)$ queries.

Line 7.

We have at most $n$ calls to SafeBinarySearch, whose numbers of queries are independent. By Corollary 19 Part 1,

$$\mathbb{E}[\text{number of queries}]=(1+o(1))\frac{n\log n}{I(p)}.$$

By Corollary 19 Part 1 and Chebyshev's inequality,

$$\mathbb{P}\left[\text{number of queries}>\mathbb{E}[\text{number of queries}]+n^{2/3}\right]=o(1).$$

Line 17.

We have a number of calls to SafeWeakSort, where every input length is at most $6\log^{2}n$ and the total input length is at most $n$. By Corollary 19 Part 3,

$$\mathbb{E}[\text{number of queries}]=(1+o(1))\frac{n\log n}{(1-2p)\log\frac{1-p}{p}}.$$

By Corollary 19 Part 3 and Chebyshev's inequality,

$$\mathbb{P}\left[\text{number of queries}>\mathbb{E}[\text{number of queries}]+n^{2/3}\right]=o(1).$$

Lines 20, 27.

Conditioned on $E_{1}$, $E_{3}$, $E_{4}$, the number of calls to SafeLessThan is

$$O(|S|)+O(\text{number of elements in the wrong bucket})=O\left(\frac{n}{\log n}\right).$$

By Corollary 19 Part 2,

$$\mathbb{E}[\text{number of queries}]=O(n).$$

By Corollary 19 Part 2 and Chebyshev's inequality,

$$\mathbb{P}\left[\text{number of queries}>\mathbb{E}[\text{number of queries}]+n^{2/3}\right]=o(1).$$

Line 34.

Conditioned on $E_{5}$, the number of calls to SafeBinarySearch is $O(\frac{n}{\log n})$. By Corollary 19 Part 1,

$$\mathbb{E}[\text{number of queries}]=O(n).$$

By Corollary 19 Part 1 and Chebyshev's inequality,

$$\mathbb{P}\left[\text{number of queries}>\mathbb{E}[\text{number of queries}]+n^{2/3}\right]=o(1).$$

Summarizing the above, excluding events with $o(1)$ probability in total, the total number of queries made is

$$O(n)+(1+o(1))\frac{n\log n}{I(p)}+(1+o(1))\frac{n\log n}{(1-2p)\log\frac{1-p}{p}}+O(n)+O(n)+O(n^{2/3})=(1+o(1))\left(\frac{n\log n}{I(p)}+\frac{n\log n}{(1-2p)\log\frac{1-p}{p}}\right).$$

∎

Finally, we analyze the expected number of random bits used by the algorithm.

Lemma 24.

Algorithm 3 uses $O(n)$ random bits in expectation.

Proof.

We consider all the places Algorithm 3 uses randomness.

Line 4.

Any biased random bit can be simulated using $O(1)$ unbiased random bits in expectation [KY76], so Line 4 uses $O(n)$ random bits in expectation.

Line 5.

By the Chernoff bound, $|S|\leq 100n/\log n$ with probability at least $1-n^{-2}$. In this case, SafeSimpleSort uses $O(|S|\log|S|)=O(n)$ random bits. If $|S|>100n/\log n$, SafeSimpleSort uses $O(n\log n)$ random bits, which contributes at most $O(n\log n\cdot n^{-2})$ to the overall expectation. Thus, Line 5 uses $O(n)$ random bits in expectation.

Line 7, Line 17, Line 34.

For each of the first $n^{2/3}$ output random bits, we use Lemma 14 to construct a pairwise independent hash family of size $n$ using $O(\log n)$ fresh random bits. The total number of fresh random bits needed is $O(n^{2/3}\log n)$. ∎

Now we are ready to prove Theorem 15.

Proof of Theorem 15.

Consider the following algorithm, which we call SafeNoisySort:

  • Run NoisySort. Stop immediately if we are about to use more than $m$ queries, for some $m$ to be chosen later.

  • If NoisySort completed successfully, then output the sorted array; otherwise, output an arbitrary permutation.

By Lemma 23, we can choose $m=(1+o(1))\left(\frac{1}{I(p)}+\frac{1}{(1-2p)\log\frac{1-p}{p}}\right)n\log n$ so that with probability $1-o(1)$, the NoisySort call completes successfully. By Lemma 21, with probability $1-o(1)$, the NoisySort call is correct. By the union bound, SafeNoisySort has error probability $o(1)$ and always takes at most $m$ queries. By Lemma 24, the number of random bits used is $O(n)$ in expectation. ∎

3.1 Trading Queries for Random Bits

In this section, we show how to use the randomized algorithm to construct a deterministic one.

Recall Theorem 1, which we now prove.

Proof.

In our model, it is possible to generate unbiased random bits using noisy comparisons. Take two arbitrary elements $x,y$, and call $\textsc{NoisyComparison}(x,y)$ twice. If the two returned values are different, we use the first returned value as our random bit; otherwise, we repeat. Clearly, this procedure generates an unbiased random bit. The probability that this procedure successfully returns a random bit after each pair of calls to NoisyComparison is $2p(1-p)$, so the expected number of noisy comparisons needed to generate each random bit is $O(\frac{1}{2p(1-p)})=O(1)$. Note that this procedure is deterministic.

Algorithm 3 uses $O(n)$ random bits in expectation, so we can use the above procedure to generate the random bits, and the expected number of queries needed for this is $O(n)$. We can halt the algorithm if the number of queries used this way exceeds $O(n\log\log n)$, which only incurs an additional $o(1)$ error probability. ∎
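A minimal Python sketch of this extraction procedure (the noisy comparison oracle is simulated as in Definition 6):

import random

def noisy_comparison(x, y, p):
    # Simulated oracle from Definition 6: BSC_p applied to 1{x < y}.
    return (1 if x < y else 0) ^ (random.random() < p)

def random_bit_from_noise(x, y, p):
    # Call NoisyComparison(x, y) twice; if the answers differ, output the first one,
    # otherwise repeat. Each round succeeds with probability 2p(1-p), so the expected
    # number of noisy comparisons per extracted bit is O(1); the output bit is unbiased.
    while True:
        b1 = noisy_comparison(x, y, p)
        b2 = noisy_comparison(x, y, p)
        if b1 != b2:
            return b1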

4 Noisy Sorting Lower Bound

In this section, we show the lower bound for noisy sorting, Theorem 2.

The following problem serves as an intermediate step towards the lower bound.

Definition 25 (Group Sorting Problem).

For integers $k\mid n$, we define the problem $\mathsf{GroupSorting}(n,k)$ as follows: Given a list $L$ of $n$ elements, divided into $n/k$ groups $A_{1},\ldots,A_{n/k}$ of $k$ elements each, satisfying the property that for all $1\leq i<j\leq n/k$, $x\in A_{i}$, $y\in A_{j}$, we have $x<y$. An algorithm needs to output the ordered list of sets $A_{1},\ldots,A_{n/k}$ by asking queries of the following type, $\textsc{GroupQuery}(x,y)$:

  • If the two elements are in the same group, then the query result is $*$.

  • Otherwise, suppose $x\in A_{i}$, $y\in A_{j}$ with $i\neq j$. Then the query result is $\textsc{NoisyComparison}(i,j)$.

The following lemma relates GroupSorting and NoisySorting.

Lemma 26.

Fix any algorithm $\mathcal{A}$ for $\mathsf{NoisySorting}(n)$ with error probability $o(1)$. Let $k\mid n$. Let $L$ be the input list of $\mathsf{NoisySorting}(n)$. Suppose we partition $L$ into $n/k$ groups $A_{1},\ldots,A_{n/k}$ where for all $1\leq i<j\leq n/k$, for all $x\in A_{i}$, $y\in A_{j}$, we have $x<y$. Let $U^{\neq}$ denote the number of queries $\mathcal{A}$ makes to elements in different groups.

Then there exists an algorithm for $\mathsf{GroupSorting}(n,k)$ with error probability $o(1)$ which makes at most $\mathbb{E}[U^{\neq}]+(n-n/k)$ queries in expectation.

Proof.

Given $\mathcal{A}$, we design an algorithm $\mathcal{A}^{\prime}$ for $\mathsf{GroupSorting}(n,k)$ as follows.

Given input $L$, Algorithm $\mathcal{A}^{\prime}$ picks an arbitrary strict total order $<_{T}$ on all the elements. $\mathcal{A}^{\prime}$ also maintains which elements are known to be in the same group, as implied by queries that return $*$. Then, it simulates Algorithm $\mathcal{A}$ on the same input $L$ as follows:

  • When $\mathcal{A}$ attempts to make a comparison between $x$ and $y$:

    • If $x,y$ are not known to be in the same group, call $\textsc{GroupQuery}(x,y)$. If the query returns $*$, which means $x$ and $y$ are in the same group, return $\operatorname{BSC}_{p}(\mathbbm{1}\{x<_{T}y\})$ to $\mathcal{A}$. Otherwise, return the result of $\textsc{GroupQuery}(x,y)$ to $\mathcal{A}$.

    • If $x,y$ are known to be in the same group, return $\operatorname{BSC}_{p}(\mathbbm{1}\{x<_{T}y\})$ to $\mathcal{A}$.

  • When $\mathcal{A}$ outputs a sequence $x_{1},\ldots,x_{n}$: Let $A_{i}=\{x_{(i-1)k+1},\ldots,x_{ik}\}$ for $i\in[n/k]$ and output $A_{1},\ldots,A_{n/k}$.

Let us analyze the error probability and number of queries made by algorithm $\mathcal{A}^{\prime}$.

Error probability.

Algorithm $\mathcal{A}^{\prime}$ simulates a $\mathsf{NoisySorting}(n)$ instance on $L$ with the following total order:

  • If $x,y$ are in the same group, then $x<y$ iff $x<_{T}y$;

  • If $x\in A_{i}$, $y\in A_{j}$ are in different groups, then $x<y$ iff $i<j$.

Therefore, with error probability $o(1)$, the sequence $x_{1},\ldots,x_{n}$ that $\mathcal{A}$ outputs is sorted with respect to the above total order, which gives the valid groups.

Number of queries.

Algorithm $\mathcal{A}^{\prime}$ makes one query whenever $\mathcal{A}$ makes a query on elements in different groups. Algorithm $\mathcal{A}^{\prime}$ makes queries between elements in the same group only when they are not yet known to be in the same group. Imagine an initially empty graph on the elements of $L$ where, for each $\textsc{GroupQuery}(x,y)$ that returns $*$, we add an edge between $x$ and $y$. For every such query, the number of connected components in this graph decreases by $1$, and at the end the graph has at least $n/k$ connected components. Thus, the total number of queries made by $\mathcal{A}^{\prime}$ between elements in the same group is at most $n-n/k$.

Overall, the total number of queries made by Algorithm $\mathcal{A}^{\prime}$ is at most $\mathbb{E}[U^{\neq}]+(n-n/k)$ in expectation. ∎

Next we prove a lower bound for the Group Sorting problem. By Lemma 26, this implies a lower bound on the number of queries made between elements in different groups by any Noisy Sorting algorithm.

Lemma 27.

Let knk\mid n with k=no(1)k=n^{o(1)}. Any (deterministic or randomized) algorithm for 𝖦𝗋𝗈𝗎𝗉𝖲𝗈𝗋𝗍𝗂𝗇𝗀(n,k)\mathsf{GroupSorting}(n,k) with error probability o(1)o(1) makes at least (1o(1))nlognI(p)(1-o(1))\frac{n\log n}{I(p)} queries in expectation, even if the input is a uniformly random permutation.

Proof.

Fix an algorithm 𝒜\mathcal{A} for 𝖦𝗋𝗈𝗎𝗉𝖲𝗈𝗋𝗍𝗂𝗇𝗀(n,k)\mathsf{GroupSorting}(n,k). WLOG assume that 𝒜\mathcal{A} makes queries between elements in the same group only when they are not known to be in the same group. Therefore, the total number of queries between elements in the same group is at most nn/kn-n/k.

Let X=(A1,,An/k)X=(A_{1},\ldots,A_{n/k}) be the true partition. Let mm be the (random) number of queries made. Let QmQ^{m} be the queries made. Let YmY^{m} be the returned values of the queries. Let X^\hat{X} be the most probable XX given all query results. Note that X^\hat{X} is a function of (Qm,Ym)(Q^{m},Y^{m}).

When k=no(1)k=n^{o(1)}, we have H(X)=logn!nklogk!=(1o(1))nlognH(X)=\log n!-\frac{n}{k}\log k!=(1-o(1))n\log n.

By Fano’s inequality, H(XX^)1+o(nlogn)H(X\mid\hat{X})\leq 1+o(n\log n), so

I(X;Qm,Ym)I(X;X^)=H(X)H(XX^)(1o(1))nlogn.\displaystyle I(X;Q^{m},Y^{m})\geq I(X;\hat{X})=H(X)-H(X\mid\hat{X})\geq(1-o(1))n\log n. (1)

On the other hand,

I(X;Qm,Ym)\displaystyle I(X;Q^{m},Y^{m}) =i1[mi]I(X;Qi,YiQi1,Yi1,mi)\displaystyle=\sum_{i\geq 1}\mathbb{P}[m\geq i]I(X;Q_{i},Y_{i}\mid Q^{i-1},Y^{i-1},m\geq i) (chain rule of mutual information)
=i1[mi]I(X;YiQi,Qi1,Yi1,mi)\displaystyle=\sum_{i\geq 1}\mathbb{P}[m\geq i]I(X;Y_{i}\mid Q_{i},Q^{i-1},Y^{i-1},m\geq i) (QiQ_{i} and XX are independent conditioned on (Qi1,Yi1)(Q^{i-1},Y^{i-1}))
i1[mi]I(X,Qi,Qi1,Yi1;Yimi)\displaystyle\leq\sum_{i\geq 1}\mathbb{P}[m\geq i]I(X,Q_{i},Q^{i-1},Y^{i-1};Y_{i}\mid m\geq i) (chain rule)
=i1[mi]I(X,Qi;Yimi)\displaystyle=\sum_{i\geq 1}\mathbb{P}[m\geq i]I(X,Q_{i};Y_{i}\mid m\geq i) ((Qi1,Yi1)(Q^{i-1},Y^{i-1}) and YiY_{i} are independent conditioned on (X,Qi)(X,Q_{i}))
=i1[mi](H(Yimi)H(YiX,Qi,mi))\displaystyle=\sum_{i\geq 1}\mathbb{P}[m\geq i]\left(H(Y_{i}\mid m\geq i)-H(Y_{i}\mid X,Q_{i},m\geq i)\right) (2)

Let qi=[Yi=mi]q_{i}=\mathbb{P}[Y_{i}=*\mid m\geq i]. Because the algorithm makes at most nn/kn-n/k queries between elements in the same group, we have

i1[mi]qinn/k.\displaystyle\sum_{i\geq 1}\mathbb{P}[m\geq i]q_{i}\leq n-n/k.

We have

H(Yimi)h(qi,1qi2,1qi2)\displaystyle H(Y_{i}\mid m\geq i)\leq h\left(q_{i},\frac{1-q_{i}}{2},\frac{1-q_{i}}{2}\right)

and

H(YiX,Qi,mi)=0qi+h(p)(1qi).\displaystyle H(Y_{i}\mid X,Q_{i},m\geq i)=0\cdot q_{i}+h(p)(1-q_{i}).

Note that H(Yimi)log3H(Y_{i}\mid m\geq i)\leq\log 3 trivially, so by Equation (2), I(X;Qm,Ym)log3i1[mi]=log3𝔼[m]I(X;Q^{m},Y^{m})\leq\log 3\cdot\sum_{i\geq 1}\mathbb{P}[m\geq i]=\log 3\cdot\mathbb{E}[m]. Combining with Equation (1), we have 𝔼[m]=Ω(nlogn)\mathbb{E}[m]=\Omega(n\log n).

Note that g(x):=h(x,1x2,1x2)g(x):=h(x,\frac{1-x}{2},\frac{1-x}{2}) is concave in xx, so

i1[mi]H(Yimi)\displaystyle\sum_{i\geq 1}\mathbb{P}[m\geq i]H(Y_{i}\mid m\geq i) (i1[mi])g(i1[mi]qii1[mi])\displaystyle\leq\left(\sum_{i\geq 1}\mathbb{P}[m\geq i]\right)g\left(\frac{\sum_{i\geq 1}\mathbb{P}[m\geq i]q_{i}}{\sum_{i\geq 1}\mathbb{P}[m\geq i]}\right) (Jensen’s inequality)
𝔼[m]g(min{13,nn/k𝔼[m]})\displaystyle\leq\mathbb{E}[m]\cdot g\left(\min\left\{\frac{1}{3},\frac{n-n/k}{\mathbb{E}[m]}\right\}\right) (gg is increasing on [0,13][0,\frac{1}{3}] and is maximized at 13\frac{1}{3})
=(1+o(1))𝔼[m].\displaystyle=(1+o(1))\mathbb{E}[m]. (n=o(𝔼[m])n=o(\mathbb{E}[m]))
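To make the last step explicit, expanding the ternary entropy (logarithms in base 2) gives

g(x)=h(x,\frac{1-x}{2},\frac{1-x}{2})=-x\log x-(1-x)\log\frac{1-x}{2}=h(x)+(1-x),

where h()h(\cdot) with a single argument denotes the binary entropy function. In particular, gg is increasing on [0,\frac{1}{3}] with maximum value g(\frac{1}{3})=\log 3, and g(x)=1+o(1)g(x)=1+o(1) whenever x=o(1)x=o(1), which is what the last two steps use.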

On the other hand,

i1[mi]H(YiX,Qi,mi)\displaystyle\sum_{i\geq 1}\mathbb{P}[m\geq i]H(Y_{i}\mid X,Q_{i},m\geq i) =i1[mi]h(p)(1qi)\displaystyle=\sum_{i\geq 1}\mathbb{P}[m\geq i]h(p)(1-q_{i})
=h(p)(i1[mi]i1[mi]qi)\displaystyle=h(p)\left(\sum_{i\geq 1}\mathbb{P}[m\geq i]-\sum_{i\geq 1}\mathbb{P}[m\geq i]q_{i}\right)
h(p)(𝔼[m]n)\displaystyle\geq h(p)(\mathbb{E}[m]-n)
=(1o(1))h(p)𝔼[m].\displaystyle=(1-o(1))h(p)\mathbb{E}[m].

Plugging the above two inequalities in Equation (2), we get

I(X;Qm,Ym)\displaystyle I(X;Q^{m},Y^{m}) (1+o(1))𝔼[m](1o(1))h(p)𝔼[m]\displaystyle\leq(1+o(1))\mathbb{E}[m]-(1-o(1))h(p)\mathbb{E}[m]
(1+o(1))I(p)𝔼[m].\displaystyle\leq(1+o(1))I(p)\mathbb{E}[m].

Combining with Equation (1), we get 𝔼[m](1o(1))nlognI(p)\mathbb{E}[m]\geq(1-o(1))\frac{n\log n}{I(p)}. ∎

Before we give a lower bound on the number of queries made between elements in the same group, we first show the following technical lemma, which will be useful later.

Lemma 28.

Let (Xn)n0(X_{n})_{n\geq 0} be a Markov process defined as

X0=0,Xn+1=Xn+{+1w.p.1p,1w.p.p.\displaystyle X_{0}=0,\quad X_{n+1}=X_{n}+\left\{\begin{array}[]{ll}+1&\text{w.p.}~{}1-p,\\ -1&\text{w.p.}~{}p.\end{array}\right.

Let m>0m>0 be a real number and let τ\tau be a random variable supported on 0\mathbb{Z}_{\geq 0}. Suppose there exists δ>0\delta>0 such that

𝔼[τ](1δ)m/(12p).\displaystyle\mathbb{E}[\tau]\leq(1-\delta)m/(1-2p).

Then

[Xτm]δ/2om(1).\displaystyle\mathbb{P}[X_{\tau}\leq m]\geq\delta/2-o_{m}(1).
Proof.

Let E1E_{1} be the event that τ(1δ/2)m/(12p)\tau\leq(1-\delta/2)m/(1-2p). Let E2E_{2} be the event that XtmX_{t}\leq m for all t(1δ/2)m/(12p)t\leq(1-\delta/2)m/(1-2p). By Markov’s inequality, [E1]δ/2\mathbb{P}[E_{1}]\geq\delta/2. By concentration inequalities (for every such tt we have 𝔼[Xt]=(12p)t(1δ/2)m\mathbb{E}[X_{t}]=(1-2p)t\leq(1-\delta/2)m, so Hoeffding’s inequality together with a union bound over tt applies), [E2]1om(1)\mathbb{P}[E_{2}]\geq 1-o_{m}(1). Since E1E2E_{1}\land E_{2} implies XτmX_{\tau}\leq m, a union bound gives

[Xτm][E1E2]δ/2om(1).\displaystyle\mathbb{P}[X_{\tau}\leq m]\geq\mathbb{P}[E_{1}\land E_{2}]\geq\delta/2-o_{m}(1). ∎
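As a quick empirical illustration of Lemma 28 (not used anywhere in the proofs), the following Python snippet samples the biased walk at a random time τ whose mean is below the threshold, for illustrative parameter values of p, m, and δ, and estimates ℙ[X_τ ≤ m]:

import random

def walk_position(steps, p):
    # Position of the walk after `steps` steps: +1 w.p. 1-p, -1 w.p. p.
    return sum(1 if random.random() >= p else -1 for _ in range(steps))

def estimate_lemma28(p=0.1, m=200, delta=0.3, trials=2000):
    # Draw tau uniformly so that E[tau] is roughly (1 - delta) * m / (1 - 2p).
    mean_tau = (1 - delta) * m / (1 - 2 * p)
    hits = sum(walk_position(random.randint(0, int(2 * mean_tau)), p) <= m
               for _ in range(trials))
    return hits / trials   # Lemma 28 guarantees a value of at least about delta / 2

print(estimate_lemma28())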

Lemma 29.

Fix any (deterministic or randomized) algorithm for 𝖭𝗈𝗂𝗌𝗒𝖲𝗈𝗋𝗍𝗂𝗇𝗀(n)\mathsf{NoisySorting}(n) with error probability o(1)o(1). Let knk\mid n. Let LL be the input list of 𝖭𝗈𝗂𝗌𝗒𝖲𝗈𝗋𝗍𝗂𝗇𝗀(n)\mathsf{NoisySorting}(n). Suppose we partition LL into n/kn/k groups A1,,An/kA_{1},\ldots,A_{n/k} where for all 1i<jn/k1\leq i<j\leq n/k, for all xAix\in A_{i}, yAjy\in A_{j}, we have x<yx<y. Let U=U^{=} denote the number of queries made to elements in the same group.

Then 𝔼[U=](1o(1))(nn/k)log(nn/k)(12p)log1pp\mathbb{E}[U^{=}]\geq(1-o(1))\frac{(n-n/k)\log(n-n/k)}{(1-2p)\log\frac{1-p}{p}}, even if the input is a uniformly random permutation.

Proof.

Suppose the true order is σ=(x1,,xn)\sigma=(x_{1},\ldots,x_{n}). Let S={i∈[n−1]:k∤i}S=\{i\in[n-1]:k\nmid i\}, the set of indices ii such that xix_{i} and xi+1x_{i+1} lie in the same group. Note that |S|=nn/k|S|=n-n/k. For iSi\in S, define WiW_{i} to be the number of queries returning xi<xi+1x_{i}<x_{i+1}, minus the number of queries returning xi>xi+1x_{i}>x_{i+1}. Note that WiW_{i} is a random variable depending on the transcript. Define W=iSWiW=\sum_{i\in S}W_{i}. Then 𝔼[W]≤(12p)𝔼[U=]\mathbb{E}[W]\leq(1-2p)\mathbb{E}[U^{=}] (queries between non-adjacent elements of the same group do not contribute to WW).

Let us consider the posterior probabilities qq of the permutations given a fixed transcript. That is, define

qτ:=[σ=τQm,Ym].q_{\tau}:=\mathbb{P}[\sigma=\tau\mid Q^{m},Y^{m}].

By Bayes’ rule, qτq_{\tau} equals [Qm,Ymσ=τ][σ=τ][Qm,Ym]\frac{\mathbb{P}[Q^{m},Y^{m}\mid\sigma=\tau]\mathbb{P}[\sigma=\tau]}{\mathbb{P}[Q^{m},Y^{m}]}. By symmetry, [σ=τ]\mathbb{P}[\sigma=\tau] is the same for all τ\tau and [Qm,Ym]\mathbb{P}[Q^{m},Y^{m}] is constant as we fixed the transcript, so qτq_{\tau} is proportional to [Qm,Ymσ=τ]\mathbb{P}[Q^{m},Y^{m}\mid\sigma=\tau]. If τ=(y1,,yn)\tau=(y_{1},\ldots,y_{n}), then [Qm,Ymσ=τ]\mathbb{P}[Q^{m},Y^{m}\mid\sigma=\tau] equals

1i<jn(1p)(#queries returning yi<yj)1i<jnp(#queries returning yi>yj).\prod_{1\leq i<j\leq n}(1-p)^{(\text{\#queries returning }y_{i}<y_{j})}\cdot\prod_{1\leq i<j\leq n}p^{(\text{\#queries returning }y_{i}>y_{j})}. (3)

For iSi\in S, consider the permutation τi:=(x1,,xi1,xi+1,xi,xi+2,,xn)\tau_{i}:=(x_{1},\ldots,x_{i-1},x_{i+1},x_{i},x_{i+2},\ldots,x_{n}). Then by comparing qτiq_{\tau_{i}} and qσq_{\sigma} using (3), we have

qσqτi\displaystyle\frac{q_{\sigma}}{q_{\tau_{i}}} =(1p)(#queries returning xi<xi+1)p(#queries returning xi>xi+1)(1p)(#queries returning xi>xi+1)p(#queries returning xi<xi+1)\displaystyle=\frac{(1-p)^{(\text{\#queries returning }x_{i}<x_{i+1})}p^{(\text{\#queries returning }x_{i}>x_{i+1})}}{(1-p)^{(\text{\#queries returning }x_{i}>x_{i+1})}p^{(\text{\#queries returning }x_{i}<x_{i+1})}}
=(1pp)(#queries returning xi<xi+1)(1pp)(#queries returning xi>xi+1)\displaystyle=\left(\frac{1-p}{p}\right)^{(\text{\#queries returning }x_{i}<x_{i+1})}\left(\frac{1-p}{p}\right)^{-(\text{\#queries returning }x_{i}>x_{i+1})}
=(1pp)Wi.\displaystyle=\left(\frac{1-p}{p}\right)^{W_{i}}.

Thus,

qτi=qσ(1pp)Wi.\displaystyle q_{\tau_{i}}=q_{\sigma}\cdot\left(\frac{1-p}{p}\right)^{-W_{i}}.

Summing over SS, we have

iSqτi\displaystyle\sum_{i\in S}q_{\tau_{i}} =qσiS(1pp)Wi\displaystyle=q_{\sigma}\sum_{i\in S}\left(\frac{1-p}{p}\right)^{-W_{i}}
qσ|S|(1pp)W/|S|.\displaystyle\geq q_{\sigma}|S|\left(\frac{1-p}{p}\right)^{-W/|S|}.

Because the error probability is o(1)o(1), for any ϵ>0\epsilon>0, with probability 1o(1)1-o(1), we have qσ1ϵq_{\sigma}\geq 1-\epsilon, and hence iSqτi1qσϵ\sum_{i\in S}q_{\tau_{i}}\leq 1-q_{\sigma}\leq\epsilon. Combining with the previous display, we get

|S|(1pp)W/|S|ϵ/(1ϵ)\displaystyle|S|\left(\frac{1-p}{p}\right)^{-W/|S|}\leq\epsilon/(1-\epsilon)

with probability 1o(1)1-o(1).

On the other hand, for any δ>0\delta>0, if

𝔼[U=](1δ)|S|log|S|(12p)log1pp,\displaystyle\mathbb{E}[U^{=}]\leq(1-\delta)\frac{|S|\log|S|}{(1-2p)\log\frac{1-p}{p}},

then by regarding U=U^{=} as τ\tau and WW as XτX_{\tau} in Lemma 28, with probability Ω(δ)o|S|(1)\Omega(\delta)-o_{|S|}(1), we have

W|S|log|S|log1pp\displaystyle W\leq\frac{|S|\log|S|}{\log\frac{1-p}{p}}

and then

|S|(1pp)W/|S|1,\displaystyle|S|\left(\frac{1-p}{p}\right)^{-W/|S|}\geq 1,

which is a contradiction.

Thus,

𝔼[U=](1o(1))|S|log|S|(12p)log1pp,\displaystyle\mathbb{E}[U^{=}]\geq(1-o(1))\frac{|S|\log|S|}{(1-2p)\log\frac{1-p}{p}},

as desired. ∎

Now we are ready to prove Theorem 2.

Proof of Theorem 2.

Fix an algorithm for 𝖭𝗈𝗂𝗌𝗒𝖲𝗈𝗋𝗍𝗂𝗇𝗀(n)\mathsf{NoisySorting}(n) with error probability o(1)o(1).

Take k=lognk=\log n. Let n=kn/kn^{\prime}=k\lfloor n/k\rfloor. Clearly, an algorithm for 𝖭𝗈𝗂𝗌𝗒𝖲𝗈𝗋𝗍𝗂𝗇𝗀(n)\mathsf{NoisySorting}(n) can be used to solve 𝖭𝗈𝗂𝗌𝗒𝖲𝗈𝗋𝗍𝗂𝗇𝗀(n)\mathsf{NoisySorting}(n^{\prime}) with at most the same expected number of queries. Let UU^{\neq}, U=U^{=} be as defined in Lemma 26 and Lemma 29 for nn^{\prime} and kk.

By Lemma 26 and Lemma 27, we have

𝔼[U](1o(1))nlognI(p)(nn/k)=(1o(1))nlognI(p).\mathbb{E}[U^{\neq}]\geq(1-o(1))\frac{n^{\prime}\log n^{\prime}}{I(p)}-(n^{\prime}-n^{\prime}/k)=(1-o(1))\frac{n^{\prime}\log n^{\prime}}{I(p)}.

By Lemma 29, we have

𝔼[U=](1o(1))(nn/k)log(nn/k)(12p)log1pp=(1o(1))nlogn(12p)log1pp.\mathbb{E}[U^{=}]\geq(1-o(1))\frac{(n^{\prime}-n^{\prime}/k)\log(n^{\prime}-n^{\prime}/k)}{(1-2p)\log\frac{1-p}{p}}=(1-o(1))\frac{n^{\prime}\log n^{\prime}}{(1-2p)\log\frac{1-p}{p}}.

Therefore, the expected number of queries made by the algorithm is at least

𝔼[U]+𝔼[U=]\displaystyle\mathbb{E}[U^{\neq}]+\mathbb{E}[U^{=}] (1o(1))(1I(p)+1(12p)log1pp)nlogn\displaystyle\geq(1-o(1))\left(\frac{1}{I(p)}+\frac{1}{(1-2p)\log\frac{1-p}{p}}\right)n^{\prime}\log n^{\prime}
=(1o(1))(1I(p)+1(12p)log1pp)nlogn.\displaystyle=(1-o(1))\left(\frac{1}{I(p)}+\frac{1}{(1-2p)\log\frac{1-p}{p}}\right)n\log n. ∎
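For a concrete sense of the constant (an illustrative numerical evaluation, not part of the proof): at p=0.1 we have I(p) ≈ 0.531 and (1−2p)log((1−p)/p) = 0.8 log 9 ≈ 2.54, so Theorem 2 gives a lower bound of roughly (1/0.531 + 1/2.54) n log n ≈ 2.28 n log n noisy comparisons, of which about 1.88 n log n comes from the information-theoretic term alone.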

5 Noisy Binary Search

In this section, we prove our lower and upper bounds for Noisy Binary Search.

5.1 Lower Bound

In this section, we prove our lower bound for Noisy Binary Search (Theorem 3), using techniques similar to those in our lower bound for Noisy Sorting.

We define the following intermediate problem:

Definition 30 (𝖮𝗋𝖺𝖼𝗅𝖾𝖭𝗈𝗂𝗌𝗒𝖡𝗂𝗇𝖺𝗋𝗒𝖲𝖾𝖺𝗋𝖼𝗁(n)\mathsf{OracleNoisyBinarySearch}(n)).

Given a sorted list LL of nn elements and an element xx that is unequal to any element in the list, a solver needs to output the predecessor of xx in LL by asking the following type of query OracleQuery(x,y)\textsc{OracleQuery}(x,y) for yLy\in L:

  • If yy is the predecessor of xx, output symbol PP;

  • If yy is the successor of xx, output symbol SS;

  • Otherwise, output NoisyComparison(x,y)\textsc{NoisyComparison}(x,y).
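For concreteness, a minimal Python sketch of the OracleQuery oracle is given below; the list representation, the 1-based indexing of the predecessor, and the crossover probability p are assumptions made for illustration.

import random

def make_oracle_query(L, x, p):
    # L is the sorted input list and x the element to search for; pred is the
    # (1-based) index of the predecessor of x, with pred = 0 if x is below all of L.
    pred = sum(1 for y in L if y < x)
    def oracle_query(y):
        j = L.index(y) + 1                 # 1-based index of y in L
        if j == pred:
            return "P"                     # y is the predecessor of x
        if j == pred + 1:
            return "S"                     # y is the successor of x
        truth = x < y                      # true result of NoisyComparison(x, y)
        return truth if random.random() >= p else not truth
    return oracle_query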

Lemma 31.

Fix an algorithm 𝒜\mathcal{A} for 𝖭𝗈𝗂𝗌𝗒𝖡𝗂𝗇𝖺𝗋𝗒𝖲𝖾𝖺𝗋𝖼𝗁(n)\mathsf{NoisyBinarySearch}(n) with error probability δ\delta. Let LL be the input list and xx be the input element of 𝖭𝗈𝗂𝗌𝗒𝖡𝗂𝗇𝖺𝗋𝗒𝖲𝖾𝖺𝗋𝖼𝗁(n)\mathsf{NoisyBinarySearch}(n). Let UU^{\not\approx} denote the number of noisy comparisons 𝒜\mathcal{A} makes between xx and elements that are not the predecessor or successor of xx.

Then there exists an algorithm for 𝖮𝗋𝖺𝖼𝗅𝖾𝖭𝗈𝗂𝗌𝗒𝖡𝗂𝗇𝖺𝗋𝗒𝖲𝖾𝖺𝗋𝖼𝗁(n)\mathsf{OracleNoisyBinarySearch}(n) with error probability at most δ\delta which makes at most 𝔼[U]+1\mathbb{E}[U^{\not\approx}]+1 queries in expectation.

Proof.

The algorithm for 𝖮𝗋𝖺𝖼𝗅𝖾𝖭𝗈𝗂𝗌𝗒𝖡𝗂𝗇𝖺𝗋𝗒𝖲𝖾𝖺𝗋𝖼𝗁(n)\mathsf{OracleNoisyBinarySearch}(n) simulates 𝒜\mathcal{A} as follows:

  • Every time 𝒜\mathcal{A} attempts to make a NoisyComparison(x,y)\textsc{NoisyComparison}(x,y) between some xx and some element yy in LL, the algorithm makes OracleQuery(x,y)\textsc{OracleQuery}(x,y) instead. If OracleQuery(x,y)\textsc{OracleQuery}(x,y) returns PP or SS, we have found the predecessor of xx, so we can return the predecessor and finish the execution; otherwise, we pass the result of OracleQuery(x,y)\textsc{OracleQuery}(x,y) (which will be NoisyComparison(x,y)\textsc{NoisyComparison}(x,y)) to 𝒜\mathcal{A}.

  • If 𝒜\mathcal{A} returns a predecessor of xx, then we also return the same predecessor, finishing the execution.

The error probability bound and number of queries easily follow. ∎

Lemma 32.

Any (deterministic or randomized) algorithm for 𝖮𝗋𝖺𝖼𝗅𝖾𝖭𝗈𝗂𝗌𝗒𝖡𝗂𝗇𝖺𝗋𝗒𝖲𝖾𝖺𝗋𝖼𝗁(n)\mathsf{OracleNoisyBinarySearch}(n) with error probability δ\leq\delta makes at least (1δo(1))lognI(p)(1-\delta-o(1))\frac{\log n}{I(p)} queries in expectation, even if the position of the element to search for is uniformly random.

Proof.

Fix an algorithm 𝒜\mathcal{A} for 𝖮𝗋𝖺𝖼𝗅𝖾𝖭𝗈𝗂𝗌𝗒𝖡𝗂𝗇𝖺𝗋𝗒𝖲𝖾𝖺𝗋𝖼𝗁(n)\mathsf{OracleNoisyBinarySearch}(n). WLOG, we can assume that 𝒜\mathcal{A} exits as soon as it sees a query that returns PP or SS, i.e., it sees at most one such query result. Also, we assume δ1Ω(1)\delta\leq 1-\Omega(1), as otherwise the lower bound is trivially 0.

Let X{0,,n}X\in\{0,\ldots,n\} be the index of the true predecessor of xx. Let mm be the (random) number of queries made. Let QmQ^{m} be the queries made. Let YmY^{m} be the returned values of the queries. Let X^\hat{X} be the most probable XX given all query results. Note that X^\hat{X} is a function of (Qm,Ym)(Q^{m},Y^{m}).

Clearly, H(X)=log(n+1)H(X)=\log(n+1). By Fano’s inequality, H(XX^)1+δlog(n)H(X\mid\hat{X})\leq 1+\delta\log(n). Thus,

I(X;Qm,Ym)I(X;X^)=H(X)H(XX^)(1δo(1))logn.\displaystyle I(X;Q^{m},Y^{m})\geq I(X;\hat{X})=H(X)-H(X\mid\hat{X})\geq(1-\delta-o(1))\log n.

On the other hand, we also have

I(X;Qm,Ym)i1[mi](H(Yimi)H(YiX,Qi,mi))I(X;Q^{m},Y^{m})\leq\sum_{i\geq 1}\mathbb{P}[m\geq i]\left(H(Y_{i}\mid m\geq i)-H(Y_{i}\mid X,Q_{i},m\geq i)\right)

using the same chain of inequalities as in the proof of Lemma 27.

Let qi=[Yi=Pmi]q_{i}=\mathbb{P}[Y_{i}=P\mid m\geq i] and ri=[Yi=Smi]r_{i}=\mathbb{P}[Y_{i}=S\mid m\geq i]. Because 𝒜\mathcal{A} exits as soon as it sees PP or SS we have

i1[mi](qi+ri)1.\displaystyle\sum_{i\geq 1}\mathbb{P}[m\geq i](q_{i}+r_{i})\leq 1.

Then

H(Yimi)h(qi,ri,1qiri2,1qiri2)h(qi+ri2,qi+ri2,1qiri2,1qiri2)\displaystyle H(Y_{i}\mid m\geq i)\leq h\left(q_{i},r_{i},\frac{1-q_{i}-r_{i}}{2},\frac{1-q_{i}-r_{i}}{2}\right)\leq h\left(\frac{q_{i}+r_{i}}{2},\frac{q_{i}+r_{i}}{2},\frac{1-q_{i}-r_{i}}{2},\frac{1-q_{i}-r_{i}}{2}\right)

and

H(YiX,Qi,mi)=0(qi+ri)+h(p)(1qiri).\displaystyle H(Y_{i}\mid X,Q_{i},m\geq i)=0\cdot(q_{i}+r_{i})+h(p)(1-q_{i}-r_{i}).

Note that H(Yimi)2H(Y_{i}\mid m\geq i)\leq 2 trivially, so I(X;Qm,Ym)2i1[mi]=2𝔼[m]I(X;Q^{m},Y^{m})\leq 2\sum_{i\geq 1}\mathbb{P}[m\geq i]=2\mathbb{E}[m]. Combining with I(X;Qm,Ym)(1δo(1))lognI(X;Q^{m},Y^{m})\geq(1-\delta-o(1))\log n, we have 𝔼[m]=Ω(logn)\mathbb{E}[m]=\Omega(\log n) (as we can assume δ1Ω(1)\delta\leq 1-\Omega(1)).

Note that g(x):=h(x2,x2,1x2,1x2)g(x):=h(\frac{x}{2},\frac{x}{2},\frac{1-x}{2},\frac{1-x}{2}) is concave in xx, so

i1[mi]H(Yimi)\displaystyle\sum_{i\geq 1}\mathbb{P}[m\geq i]H(Y_{i}\mid m\geq i) (i1[mi])g(i1[mi](qi+ri)/2i1[mi])\displaystyle\leq\left(\sum_{i\geq 1}\mathbb{P}[m\geq i]\right)g\left(\frac{\sum_{i\geq 1}\mathbb{P}[m\geq i](q_{i}+r_{i})/2}{\sum_{i\geq 1}\mathbb{P}[m\geq i]}\right) (Jensen’s inequality)
𝔼[m]g(min{12,1/2𝔼[m]})\displaystyle\leq\mathbb{E}[m]\cdot g\left(\min\left\{\frac{1}{2},\frac{1/2}{\mathbb{E}[m]}\right\}\right) (gg is increasing on [0,12][0,\frac{1}{2}] and is maximized at 12\frac{1}{2})
=(1+o(1))𝔼[m].\displaystyle=(1+o(1))\cdot\mathbb{E}[m]. (1/2𝔼[m]=o(1)\frac{1/2}{\mathbb{E}[m]}=o(1))

On the other hand,

i1[mi]H(YiX,Qi,mi)\displaystyle\sum_{i\geq 1}\mathbb{P}[m\geq i]H(Y_{i}\mid X,Q_{i},m\geq i) =i1[mi]h(p)(1qiri)\displaystyle=\sum_{i\geq 1}\mathbb{P}[m\geq i]h(p)(1-q_{i}-r_{i})
=h(p)(i1[mi]i1[mi](qi+ri))\displaystyle=h(p)\left(\sum_{i\geq 1}\mathbb{P}[m\geq i]-\sum_{i\geq 1}\mathbb{P}[m\geq i](q_{i}+r_{i})\right)
h(p)(𝔼[m]1)\displaystyle\geq h(p)(\mathbb{E}[m]-1)
=(1o(1))h(p)𝔼[m].\displaystyle=(1-o(1))h(p)\mathbb{E}[m].

Therefore,

I(X;Qm,Ym)\displaystyle I(X;Q^{m},Y^{m}) (1+o(1))𝔼[m](1o(1))h(p)𝔼[m]\displaystyle\leq(1+o(1))\mathbb{E}[m]-(1-o(1))h(p)\mathbb{E}[m]
(1+o(1))I(p)𝔼[m].\displaystyle\leq(1+o(1))I(p)\mathbb{E}[m].

Combining with I(X;Qm,Ym)(1δo(1))lognI(X;Q^{m},Y^{m})\geq(1-\delta-o(1))\log n, we get 𝔼[m](1δo(1))lognI(p)\mathbb{E}[m]\geq(1-\delta-o(1))\frac{\log n}{I(p)}. ∎

Lemma 33.

Fix any (deterministic or randomized) algorithm for 𝖭𝗈𝗂𝗌𝗒𝖡𝗂𝗇𝖺𝗋𝗒𝖲𝖾𝖺𝗋𝖼𝗁(n)\mathsf{NoisyBinarySearch}(n) with error probability δ1/logn\delta\leq 1/\log n. Let LL be the input list and xx be the input element of 𝖭𝗈𝗂𝗌𝗒𝖡𝗂𝗇𝖺𝗋𝗒𝖲𝖾𝖺𝗋𝖼𝗁(n)\mathsf{NoisyBinarySearch}(n). Let UU^{\approx} denote the number of noisy comparisons made between xx and its predecessor and successor. Then 𝔼[U](2o(1))log1δ(12p)log1pp\mathbb{E}[U^{\approx}]\geq(2-o(1))\frac{\log\frac{1}{\delta}}{(1-2p)\log\frac{1-p}{p}}, even if the position of the element to search for is uniformly random.

Proof.

Let the input list contain elements y1,,yny_{1},\ldots,y_{n} in this order. Let kk be the index of the predecessor of xx. For i∈{k,k+1}i\in\{k,k+1\}, define WiW_{i} to be the number of queries between xx and yiy_{i} that return the true comparison result, minus the number that return the flipped result (so queries returning x>ykx>y_{k} and queries returning x<yk+1x<y_{k+1} count as +1+1). Note that WiW_{i} is a random variable depending on the transcript. We will first show that 𝔼[Uk{0,n}](2o(1))log1δ(12p)log1pp\mathbb{E}[U^{\approx}\mid k\not\in\{0,n\}]\geq(2-o(1))\frac{\log\frac{1}{\delta}}{(1-2p)\log\frac{1-p}{p}}. Notice that the error probability of the algorithm conditioned on k{0,n}k\not\in\{0,n\} is at most δPr[k{0,n}]1.1δ\frac{\delta}{\Pr[k\not\in\{0,n\}]}\leq 1.1\delta for sufficiently large nn.

Define W=Wk+Wk+1W=W_{k}+W_{k+1}. Then 𝔼[W]=(12p)𝔼[U]\mathbb{E}[W]=(1-2p)\mathbb{E}[U^{\approx}]. Let us consider the posterior distribution qq of kk given a fixed transcript. That is,

qi:=[k=iQm,Ym].q_{i}:=\mathbb{P}[k=i\mid Q^{m},Y^{m}].

Then if k{0,n}k\not\in\{0,n\}, similarly to the proof of Lemma 29, we have

qk1=qk(1pp)Wk and qk+1=qk(1pp)Wk+1q_{k-1}=q_{k}\cdot\left(\frac{1-p}{p}\right)^{-W_{k}}\text{\quad and\quad}q_{k+1}=q_{k}\cdot\left(\frac{1-p}{p}\right)^{-W_{k+1}}

Thus, by Jensen’s inequality,

qk1+qk+1qk((1pp)Wk+(1pp)Wk+1)2qk(1pp)W/2.\displaystyle q_{k-1}+q_{k+1}\geq q_{k}\left(\left(\frac{1-p}{p}\right)^{-W_{k}}+\left(\frac{1-p}{p}\right)^{-W_{k+1}}\right)\geq 2q_{k}\left(\frac{1-p}{p}\right)^{-W/2}.

Suppose for the sake of contradiction that

𝔼[Uk{0,n}](2σ)log1δ(12p)log1pp\displaystyle\mathbb{E}[U^{\approx}\mid k\not\in\{0,n\}]\leq(2-\sigma)\frac{\log\frac{1}{\delta}}{(1-2p)\log\frac{1-p}{p}}

for some constant σ>0\sigma>0. Then by Lemma 28 (taking UU^{\approx} as τ\tau, WW as XτX_{\tau}, mm as (2σ/4)log1δlog1pp\frac{(2-\sigma/4)\log\frac{1}{\delta}}{\log\frac{1-p}{p}}, δ\delta as 12σ2σ/4σ/41-\frac{2-\sigma}{2-\sigma/4}\geq\sigma/4), with probability σ/8o1/δ(1)\geq\sigma/8-o_{1/\delta}(1), we have

W(2σ/4)log1δlog1pp,\displaystyle W\leq\frac{(2-\sigma/4)\log\frac{1}{\delta}}{\log\frac{1-p}{p}},

and then

2(1pp)W/22δ1σ/8.\displaystyle 2\left(\frac{1-p}{p}\right)^{-W/2}\geq 2\delta^{1-\sigma/8}.

Because the error probability is 1.1δ\leq 1.1\delta, with probability at least 11.1δ10δ/σ1σ91-\frac{1.1\delta}{10\delta/\sigma}\geq 1-\frac{\sigma}{9}, we have 1qk10δ/σ1-q_{k}\leq 10\delta/\sigma, which leads to qk1+qk+110δ/σq_{k-1}+q_{k+1}\leq 10\delta/\sigma. Therefore,

2(1pp)W/2qk1+qk+1qk10δ/σ110δ/σ<20δ/σ.2\left(\frac{1-p}{p}\right)^{-W/2}\leq\frac{q_{k-1}+q_{k+1}}{q_{k}}\leq\frac{10\delta/\sigma}{1-10\delta/\sigma}<20\delta/\sigma.

(In the last step we use δ1/logn\delta\leq 1/\log n and thus 110δ/σ>121-10\delta/\sigma>\frac{1}{2} for sufficiently large nn.) This is a contradiction because 2δ1σ/820δ/σ2\delta^{1-\sigma/8}\geq 20\delta/\sigma and σ8+(1σ9)o1/δ(1)>1\frac{\sigma}{8}+\left(1-\frac{\sigma}{9}\right)-o_{1/\delta}(1)>1 when δ1/logn\delta\leq 1/\log n and nn is large enough.

Thus,

𝔼[Uk{0,n}](2o(1))log1δ(12p)log1pp.\displaystyle\mathbb{E}[U^{\approx}\mid k\not\in\{0,n\}]\geq(2-o(1))\frac{\log\frac{1}{\delta}}{(1-2p)\log\frac{1-p}{p}}.

Consequently,

𝔼[U]𝔼[Uk{0,n}]Pr[k{0,n}](2o(1))log1δ(12p)log1pp.\displaystyle\mathbb{E}[U^{\approx}]\geq\mathbb{E}[U^{\approx}\mid k\not\in\{0,n\}]\cdot\Pr[k\not\in\{0,n\}]\geq(2-o(1))\frac{\log\frac{1}{\delta}}{(1-2p)\log\frac{1-p}{p}}. ∎

Now we are ready to prove Theorem 3.

Proof of Theorem 3.

If δ1/logn\delta\geq 1/\log n, then by Lemmas 31 and 32, the number of queries UU made by any algorithm satisfies

𝔼[U]\displaystyle\mathbb{E}[U] 𝔼[U]\displaystyle\geq\mathbb{E}[U^{\not\approx}]
(1δo(1))lognI(p)\displaystyle\geq(1-\delta-o(1))\frac{\log n}{I(p)}
(1o(1))((1δ)lognI(p)+2log1δ(12p)log1pp).\displaystyle\geq(1-o(1))\left((1-\delta)\frac{\log n}{I(p)}+\frac{2\log\frac{1}{\delta}}{(1-2p)\log\frac{1-p}{p}}\right).

If δ1/logn\delta\leq 1/\log n, then by combining Lemmas 31, 32, and 33, for any algorithm, the number of queries UU satisfies

𝔼[U]\displaystyle\mathbb{E}[U] 𝔼[U]+𝔼[U]\displaystyle\geq\mathbb{E}[U^{\approx}]+\mathbb{E}[U^{\not\approx}]
(1δo(1))lognI(p)+(2o(1))log1δ(12p)log1pp\displaystyle\geq(1-\delta-o(1))\frac{\log n}{I(p)}+(2-o(1))\frac{\log\frac{1}{\delta}}{(1-2p)\log\frac{1-p}{p}}
(1o(1))((1δ)lognI(p)+2log1δ(12p)log1pp).\displaystyle\geq(1-o(1))\left((1-\delta)\frac{\log n}{I(p)}+\frac{2\log\frac{1}{\delta}}{(1-2p)\log\frac{1-p}{p}}\right). ∎

5.2 Upper Bound

In this section, we prove our upper bound for Noisy Binary Search (Theorem 4).

Proof.

First, if δ>1logn\delta>\frac{1}{\log n}, we output an arbitrary answer and halt with probability δ1logn\delta-\frac{1}{\log n}; with the remaining probability, we run the algorithm of Theorem 10 with error probability 1logn\frac{1}{\log n}. By the union bound, the overall error probability is at most δ\delta. The expected number of noisy comparisons is

(1δ+1logn)(1+o(1))(logn+O(loglogn)I(p))=(1+o(1))(1δ)lognI(p)\left(1-\delta+\frac{1}{\log n}\right)\cdot(1+o(1))\left(\frac{\log n+O(\log\log n)}{I(p)}\right)=(1+o(1))(1-\delta)\frac{\log n}{I(p)}

as required. In the following, we assume δ1logn\delta\leq\frac{1}{\log n}. We also assume logn4\log n\geq 4 for simplicity, as we could pad dummy elements.

Let xx be the element for which we need to find the predecessor. First, we use Theorem 10 with error probability 1logn\frac{1}{\log n} to find a candidate predecessor, which takes (1+o(1))(logn+O(loglogn)I(p))=(1+o(1))lognI(p)(1+o(1))\left(\frac{\log n+O(\log\log n)}{I(p)}\right)=(1+o(1))\cdot\frac{\log n}{I(p)} noisy comparisons in expectation. Then we use Lemma 13 to compare xx with this candidate predecessor ll and with the next element rr after this candidate predecessor, with error probability δ/4\delta/4. This takes (2+o4/δ(1))log(4/δ)(12p)log((1p)/p)=(2+o(1))log(1/δ)(12p)log((1p)/p)(2+o_{4/\delta}(1))\cdot\frac{\log(4/\delta)}{(1-2p)\log((1-p)/p)}=(2+o(1))\cdot\frac{\log(1/\delta)}{(1-2p)\log((1-p)/p)} noisy comparisons in expectation. If the comparison results are l<xl<x and x<rx<r, we return ll as the predecessor of xx; otherwise, we restart. See Algorithm 4.

Algorithm 4
1:procedure NoisyBinarySearch(n,A,x,δn,A,x,\delta)
2:     while true do
3:         Use Theorem 10 with error probability 1logn\frac{1}{\log n} to search the predecessor ll of xx in A{}A\cup\{-\infty\}.
4:         rr\leftarrow next element of ll in A{}A\cup\{\infty\}.
5:         if LessThan(l,x,δ/4l,x,\delta/4) and LessThan(x,r,δ/4x,r,\delta/4) then return ll.
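A Python rendering of Algorithm 4 is sketched below. The helpers noisy_search (the search procedure of Theorem 10) and less_than (the repeated-comparison test LessThan of Lemma 13) are assumed interfaces named only for illustration; sentinels play the role of −∞ and ∞.

import math

def noisy_binary_search(A, x, delta, noisy_search, less_than):
    # noisy_search(padded, x, err) is assumed to return the index of a candidate
    # predecessor of x in `padded` with error probability err; less_than(a, b, err)
    # is assumed to decide a < b with error probability err.
    err = 1.0 / math.log2(len(A))
    padded = [-math.inf] + list(A) + [math.inf]    # A with -infinity and +infinity appended
    while True:
        i = noisy_search(padded, x, err)           # candidate predecessor l = padded[i]
        l, r = padded[i], padded[i + 1]            # r = next element after l
        if less_than(l, x, delta / 4) and less_than(x, r, delta / 4):
            return l                               # verified l < x < r: l is the predecessor
        # otherwise restart the loop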

Let [restart]\mathbb{P}[\text{restart}] be the probability that we restart a new iteration of the while loop. Clearly, if the call to Theorem 10 and both calls to LessThan are correct, then we will not restart. Thus, [restart]1logn+δ22logn\mathbb{P}[\text{restart}]\leq\frac{1}{\log n}+\frac{\delta}{2}\leq\frac{2}{\log n}. Note that as logn4\log n\geq 4, [restart]12\mathbb{P}[\text{restart}]\leq\frac{1}{2}.

Let PeP_{e} be the probability that this algorithm returns the wrong predecessor. In each iteration of the while loop, the algorithm returns the wrong answer only if at least one of the two calls to LessThan is incorrect. Therefore, Peδ2+[restart]PeP_{e}\leq\frac{\delta}{2}+\mathbb{P}[\text{restart}]P_{e}, so Peδ/21[restart]δP_{e}\leq\frac{\delta/2}{1-\mathbb{P}[\text{restart}]}\leq\delta.

Let 𝔼[Q]\mathbb{E}[Q] be the expected number of queries that this algorithm makes. Since the call to Theorem 10 uses (1+o(1))(logn+O(loglogn)I(p))(1+o(1))lognI(p)(1+o(1))\left(\frac{\log n+O(\log\log n)}{I(p)}\right)\leq(1+o(1))\frac{\log n}{I(p)} noisy comparisons in expectation, and each call to LessThan uses (1+o1/δ(1))log(4/δ)(12p)log1pp(1+o(1))log(1/δ)(12p)log1pp(1+o_{1/\delta}(1))\frac{\log(4/\delta)}{(1-2p)\log\frac{1-p}{p}}\leq(1+o(1))\frac{\log(1/\delta)}{(1-2p)\log\frac{1-p}{p}} noisy comparisons in expectation (as δ1logn\delta\leq\frac{1}{\log n}), we get that

𝔼[Q](1+o(1))(lognI(p)+2log(1/δ)(12p)log1pp)+𝔼[Q][restart].\mathbb{E}[Q]\leq(1+o(1))\left(\frac{\log n}{I(p)}+\frac{2\log(1/\delta)}{(1-2p)\log\frac{1-p}{p}}\right)+\mathbb{E}[Q]\cdot\mathbb{P}[\text{restart}].

Therefore,

𝔼[Q]\displaystyle\mathbb{E}[Q] (1+o(1))112logn(lognI(p)+2log(1/δ)(12p)log1pp)\displaystyle\leq(1+o(1))\cdot\frac{1}{1-\frac{2}{\log n}}\cdot\left(\frac{\log n}{I(p)}+\frac{2\log(1/\delta)}{(1-2p)\log\frac{1-p}{p}}\right)
(1+o(1))(lognI(p)+2log(1/δ)(12p)log1pp),\displaystyle\leq(1+o(1))\left(\frac{\log n}{I(p)}+\frac{2\log(1/\delta)}{(1-2p)\log\frac{1-p}{p}}\right),

as desired. ∎

6 Discussions

In this section we discuss a few possible extensions of our results.

Varying pp.

In our main results for Noisy Sorting, we have assumed that pp remains constant as nn\to\infty. This assumption is needed in several places in our proofs, as discussed below, and we leave it as an open problem to drop it.

For p12p\to\frac{1}{2} as nn\to\infty, Theorem 1 still holds. For the proof, one needs to suitably modify Corollary 12, Lemma 17, Lemma 18, Corollary 19, Lemma 20 to handle p12p\to\frac{1}{2} correctly. For example, in our current statement of Corollary 19 we ignore constants related to pp in the expressions for Var[τ]\operatorname{\mathrm{Var}}[\tau]. By choosing mm in Lemma 18 carefully, we can let Var[τ]=(𝔼[τ])2polylog(n)\operatorname{\mathrm{Var}}[\tau]=(\mathbb{E}[\tau])^{2}\operatorname{polylog}(n) even for growing pp, which suffices for the concentration results. For the lower bound (Theorem 2), Lemma 27 may become problematic. It seems possible to modify the proof of Lemma 27 to handle the case I(p)=no(1)I(p)=n^{-o(1)}. However, for I(p)=nΩ(1)I(p)=n^{-\Omega(1)} a stronger method is needed.

For p0p\to 0 as nn\to\infty, the lower bounds hold without any change. For the algorithms, our derandomization method would result in too many queries when p=O(1/logn)p=O(1/\log n).

We similarly leave the task of dropping the assumption that pp remains a constant in our Noisy Binary Search results as an open problem.

Non-uniform pp and semi-random adversary.

In our main results we have assumed that every noisy comparison is flipped with the same probability pp. One possible extension is that in each noisy comparison, the result is flipped with some probability at most pp, where this probability can be adversarially chosen for each query. (The algorithm knows pp but does not know the flip probability for individual comparisons.) This model is sometimes called the semi-random adversary model in the literature (e.g., [MWR18]). It is an interesting open problem to generalize our results to this model.

Dependency on δ\delta.

In our bounds for Noisy Sorting, we simply used o(1)o(1) as our error probability. We leave it as an open problem to generalize our bounds to the case where the error probability δ\delta can also be specified.

BMS observation.

One natural extension is to replace BSC\operatorname{\mathrm{BSC}} observation noise with a Binary Memoryless Symmetric channel (BMS). That is, for every noisy comparison, the probability of error is chosen (independently) from a distribution WW on [0,12][0,\frac{1}{2}], and this probability is returned together with the comparison result. In this case our guess is that

(1±o(1))(1𝔼pW[I(p)]+1𝔼pW[(12p)log1pp])nlogn\displaystyle(1\pm o(1))\left(\frac{1}{\mathbb{E}_{p\sim W}[I(p)]}+\frac{1}{\mathbb{E}_{p\sim W}[(1-2p)\log\frac{1-p}{p}]}\right)n\log n

noisy comparisons should be necessary and sufficient. When WW is supported on [ϵ,1/2ϵ][\epsilon,1/2-\epsilon] for some constant ϵ>0\epsilon>0 independent of nn, we expect it to be easier to adapt our proofs to BMS, whereas in the general case, issues described in the Varying pp discussion above can occur.

References

  • [AD91] Javed A. Aslam and Aditi Dhagat. Searching in the presence of linearly bounded errors. In Proceedings of the 23rd Annual ACM Symposium on Theory of Computing (STOC), pages 486–493, 1991.
  • [BH08] Michael Ben-Or and Avinatan Hassidim. The bayesian learner is optimal for noisy binary search (and pretty good for quantum as well). In Proceedings of the 2008 49th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 221–230, 2008.
  • [BK93] Ryan S. Borgstrom and S. Rao Kosaraju. Comparison-based search in the presence of errors. In Proceedings of the 25th Annual ACM Symposium on Theory of Computing (STOC), pages 130–136, 1993.
  • [BM86] Donald A. Berry and Roy F. Mensch. Discrete search with directional information. Oper. Res., 34(3):470–477, 1986.
  • [BM08] Mark Braverman and Elchanan Mossel. Noisy sorting without resampling. In Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 268–276, 2008.
  • [BMW16] Mark Braverman, Jieming Mao, and S. Matthew Weinberg. Parallel algorithms for select and partition with noisy comparisons. In Proceedings of the 48th Annual ACM Symposium on Theory of Computing (STOC), pages 851–862, 2016.
  • [BZ74] M. V. Burnashev and K. Sh. Zigangirov. An interval estimation problem for controlled observations. Probl. Peredachi Inf., 10(3):51–61, 1974.
  • [CLRS22] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to algorithms. MIT press, 2022.
  • [CW77] J. Lawrence Carter and Mark N. Wegman. Universal classes of hash functions. In Proceedings of the 9th Annual ACM Symposium on Theory of Computing (STOC), pages 106–112, 1977.
  • [DGW92] Aditi Dhagat, Peter Gács, and Peter Winkler. On playing “twenty questions” with a liar. In Proceedings of the 3rd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 16–22, 1992.
  • [DŁU21] Dariusz Dereniowski, Aleksander Łukasiewicz, and Przemysław Uznański. Noisy searching: simple, fast and correct. arXiv preprint arXiv:2107.05753, 2021.
  • [FJ59] Lester R. Ford Jr. and Selmer M. Johnson. A tournament problem. Am. Math. Mon., 66(5):387–389, 1959.
  • [FRPU94] Uriel Feige, Prabhakar Raghavan, David Peleg, and Eli Upfal. Computing with noisy information. SIAM J. Comput., 23(5):1001–1018, 1994.
  • [Gal78] Shmuel Gal. A stochastic search game. SIAM. J. Appl. Math., 34(1):205–210, 1978.
  • [GLLP17] Barbara Geissmann, Stefano Leucci, Chih-Hung Liu, and Paolo Penna. Sorting with recurrent comparison errors. In Proceedings of the 28th International Symposium on Algorithms and Computation (ISAAC), pages 38:1–38:12, 2017.
  • [GLLP19] Barbara Geissmann, Stefano Leucci, Chih-Hung Liu, and Paolo Penna. Optimal sorting with persistent comparison errors. In Proceedings of the 27th Annual European Symposium on Algorithms (ESA), pages 1–14, 2019.
  • [GLLP20] Barbara Geissmann, Stefano Leucci, Chih-Hung Liu, and Paolo Penna. Optimal dislocation with persistent errors in subquadratic time. Theory Comput. Syst., 64(3):508–521, 2020.
  • [Hor63] Michael Horstein. Sequential transmission using noiseless feedback. IEEE Trans. Inf. Theory, 9(3):136–143, 1963.
  • [Knu98] Donald E. Knuth. The art of computer programming, volume 3. Addison-Wesley Professional, 1998.
  • [KPSW11] Rolf Klein, Rainer Penninger, Christian Sohler, and David P. Woodruff. Tolerant algorithms. In Proceedings of the 19th Annual European Symposium on Algorithms (ESA), pages 736–747, 2011.
  • [KY76] Donald E. Knuth and Andrew C. Yao. The complexity of nonuniform random number generation. In Algorithms and Complexity: New Directions and Recent Results, 1976.
  • [Mut94] S. Muthukrishnan. On optimal strategies for searching in presence of errors. In Proceedings of the 5th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 680–689, 1994.
  • [MWR18] Cheng Mao, Jonathan Weed, and Philippe Rigollet. Minimax rates and efficient algorithms for noisy sorting. In Algorithmic Learning Theory, pages 821–847. PMLR, 2018.
  • [Pel89] Andrzej Pelc. Searching with known error probability. Theor. Comput. Sci., 63(2):185–202, 1989.
  • [PW22] Yury Polyanskiy and Yihong Wu. Information Theory: From Coding to Learning. Cambridge University Press, 2022. Draft version available at https://people.lids.mit.edu/yp/homepage/data/itbook-export.pdf.
  • [RMK+80] Ronald L. Rivest, Albert R. Meyer, Daniel J. Kleitman, Karl Winklmann, and Joel Spencer. Coping with errors in binary search procedures. J. Comput. Syst. Sci., 20(3):396–404, 1980.
  • [WGW22] Ziao Wang, Nadim Ghaddar, and Lele Wang. Noisy sorting capacity. In Proceedings of the 2022 IEEE International Symposium on Information Theory (ISIT), pages 2541–2546, 2022.
  • [WGZW23] Ziao Wang, Nadim Ghaddar, Banghua Zhu, and Lele Wang. Noisy sorting capacity. arXiv preprint arXiv:2202.01446v2, 2023.