On the Optimal Bounds for Noisy Computing
Abstract
We revisit the problem of computing with noisy information considered in Feige et al. (1994), which includes computing the OR function from noisy queries, and computing the MAX, SEARCH, and SORT functions from noisy pairwise comparisons. For n given elements, the goal is to correctly recover the desired function with probability at least 1 − δ when the outcome of each query is flipped with probability p. We consider both the adaptive sampling setting, where each query can be adaptively designed based on past outcomes, and the non-adaptive sampling setting, where queries cannot depend on past outcomes. The prior work provides tight bounds on the worst-case query complexity in terms of the dependence on n. However, the upper and lower bounds do not match in terms of the dependence on δ and p. We improve the lower bounds for all four functions under both adaptive and non-adaptive query models. Most of our lower bounds match the upper bounds up to constant factors when either p or δ is bounded away from 0, while the ratio between the best prior upper and lower bounds goes to infinity when p → 0 or δ → 0. On the other hand, we also provide matching upper and lower bounds for the number of queries in expectation, improving both the upper and lower bounds for the variable-length query model.
I Introduction
The problem of computing with noisy information has been studied extensively since the seminal work (Feige et al., 1994), which considers four problems:
• Computing the OR function of n bits from noisy observations of the bits;
• Finding the largest (or top-k) element among n real-valued elements from noisy pairwise comparisons;
• Searching for the rank of a new element in an ordered list of n elements from noisy pairwise comparisons;
• Sorting n elements from noisy pairwise comparisons.
Feige et al. (1994) is based on a simple noise model where each observation goes through a binary symmetric channel BSC(p), i.e., for each observation, with probability p we see its flipped outcome, and with probability 1 − p we see its true value. They provide upper and lower bounds for the query complexity in terms of the total number of elements n, the noise probability p, and the desired confidence level δ when adaptive querying is allowed. They establish the optimal query complexity in terms of the dependence on n. However, the exact sample/query complexity with respect to all parameters n, p, and δ is still not fully understood. In this paper, we revisit the problem of computing with noisy observations in both the adaptive sampling and non-adaptive sampling settings. We aim to close the gap in the existing bounds and to illustrate the difference in query complexity between adaptive and non-adaptive sampling.
Taking the problem of computing the OR function of n bits as an example, assume that there are n bits, each equal to 0 or 1. The OR function is defined as
OR(x_1, …, x_n) = 1 if x_i = 1 for some i, and 0 otherwise.   (2)
The question is simple when we can query each bit noiselessly: n queries are both sufficient and necessary, since it suffices to query each bit once, and there is no benefit to adaptive querying over non-adaptive querying. When the observation of each query passes through a binary symmetric channel BSC(p), we ask two questions (for reference, a simple repetition baseline is sketched right after this list):
• How many queries (samples) do we need in the worst case to recover the true function value of any given sequence with probability at least 1 − δ?
• Can adaptive sampling do better than non-adaptive sampling when noise is present?
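For intuition on the second question, the following is a minimal sketch, not taken from Feige et al. (1994), of the natural non-adaptive baseline: query every bit the same number of times through the BSC(p) and take a per-bit majority vote. The function names and the repetition count m are illustrative choices of ours.

```python
import random

def noisy_query(bit: int, p: float) -> int:
    """Observe one bit through a binary symmetric channel BSC(p)."""
    return bit ^ (random.random() < p)

def or_by_repetition(bits, p: float, m: int) -> int:
    """Non-adaptive baseline: query each bit m times, majority-vote, then OR.

    A Hoeffding bound shows a single majority vote errs with probability at most
    exp(-m * (1 - 2 * p) ** 2 / 2), so m on the order of log(n / delta) / (1 - 2p)^2
    makes the union bound over all n bits at most delta.  This is only the crude
    baseline; adaptive querying needs far fewer queries.
    """
    decoded = []
    for b in bits:
        ones = sum(noisy_query(b, p) for _ in range(m))
        decoded.append(1 if 2 * ones > m else 0)  # per-bit majority vote
    return int(any(decoded))  # OR of the de-noised bits
```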
TABLE I: Summary of the best known upper and lower bounds for the four problems. For fixed-length adaptive sampling, the upper bounds are from (Feige et al., 1994) and the lower bounds are given in Theorems II.1, III.1, IV.1, and V.1 for OR, MAX, SEARCH, and SORT, respectively. For fixed-length non-adaptive sampling, the upper bounds are given in Appendix H and the lower bounds in Theorems II.3, III.2, IV.3, and V.3. For variable-length adaptive sampling, matching bounds are given in Theorem VI.1.
In Feige et al. (1994), an adaptive tournament algorithm is proposed for computing the OR function, together with a corresponding lower bound for adaptive sampling. A simple calculation tells us that the ratio between the two goes to infinity as p → 0 or δ → 0, which indicates that there is still a gap between the upper and lower bounds in these regimes. This calls for tighter upper or lower bounds in these cases. In this paper, we improve the lower bound so that it matches the existing upper bound up to a constant factor when either p or δ is bounded away from 0.
One may wonder how many samples are needed when each query is not allowed to depend on previous outcomes. We provide a lower bound for this non-adaptive setting. On the other hand, a repetition-based upper bound matches the lower bound up to a constant factor when both p and δ are bounded away from 0.
Similarly, we ask the same questions for computing MAX, SEARCH, and SORT. We defer the definitions and discussion of these problems to Section I-B. Table I summarizes the best upper and lower bounds for the problems considered in this paper, either from prior art or from this paper.
Our lower bounds take the form of a maximum of two terms. The first term does not depend on δ, representing the number of queries one must pay regardless of the target error probability. The second term grows logarithmically with 1/δ, representing the price to pay for a smaller target error probability. The relative order of the two terms depends on p and δ. Technically, the first term usually comes from Fano’s inequality, which gives better dependence on n but worse dependence on δ; the second term comes from a KL-divergence-based lower bound, which gives better dependence on δ but worse dependence on n.
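Schematically, and purely as an illustration with placeholder factors T_1 and T_2 (the problem-specific expressions are the ones referenced in Table I), these lower bounds have the shape
\[
  T^{*}(n, p, \delta) \;\gtrsim\; \max\Bigl\{\, T_1(n, p),\; T_2(n, p)\,\log\tfrac{1}{\delta} \,\Bigr\}
  \;\asymp\; T_1(n, p) + T_2(n, p)\,\log\tfrac{1}{\delta},
\]
where the first term comes from the Fano-type argument and the second from the divergence-based argument.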
We also extend our bounds from the fixed-length query model to the variable-length query model (a.k.a. fixed budget and fixed confidence in the bandit literature (Kaufmann et al., 2016)). In the fixed-length setting considered above, we ask for the deterministic number of queries required in the worst case to recover the true value with probability at least 1 − δ. In the variable-length setting, the number of queries can be random and depend on the past outcomes, and we ask for the expected number of queries needed to recover the true value with probability at least 1 − δ. We discuss the results for variable length in Section VI, where we give matching upper and lower bounds with respect to all parameters for computing all four functions, improving over both existing upper and lower bounds and closing the gap.
I-A Related Work
The problems of noisy computation have been studied extensively before and after (Feige et al., 1994). However, most of the existing research focuses on tightening the dependence on n instead of p and δ, or on extending the results to a more general framework where the noise follows a generalized model that includes the BSC(p) channel as a special case. Although worst-case upper and lower bounds are provided for the generalized model, most of these lower bounds are based on instances where the noise does not follow a BSC(p) model, and thus they do not apply to our case.
Noisy binary searching
The noisy searching problem was first introduced by Rényi (Rényi, 1961) and Ulam (Ulam, 1976) and further developed by Berlekamp (Berlekamp, 1964) and Horstein (Horstein, 1963) in the context of coding for channels with feedback. The noisy searching algorithm in (Burnashev and Zigangirov, 1974) by Burnashev and Zigangirov can be seen as a specialization of Horstein’s coding scheme, whereas the algorithms in (Feige et al., 1994, Pelc, 1989, Karp and Kleinberg, 2007) can be seen as an adaptation of the binary search algorithm to the noisy setting.
The first tight lower bound for variable-length adaptive sampling in noisy searching is given by (Burnashev, 1976). The recent concurrent work (Gu and Xu, 2023) improves the constant dependence when p is a constant bounded away from 0 and 1/2. Our lower bounds are based on a different proof using Le Cam’s method; the results do not require p to be bounded away from these values, but are worse in terms of the constant. (Gu and Xu, 2023) also provides a matching upper bound that is tight even in the constant in the variable-length setting. We provide more discussion in Section VI. Making the upper and lower bounds match in the fixed-length query model remains an important open problem.
In terms of the bounds for non-adaptive sampling, the gap between the logarithmic complexity of adaptive sampling and the linear complexity of non-adaptive sampling can already be seen in the noiseless case p = 0 (Rényi, 1961). Here we provide an improved bound for the noisy case that has explicit dependence on p and δ.
Noisy sorting and max selection
The noisy sorting and max (or top-k) selection problems have usually been studied together (e.g., (Feige et al., 1994)) and were later extended to a more general setting known as active ranking (Mohajer et al., 2017, Falahatgar et al., 2017, Shah and Wainwright, 2018, Heckel et al., 2019, Agarwal et al., 2017), where the noise in the comparison of a pair of elements is usually unknown and differs across pairs. Other related but different settings for noisy sorting in the literature include the case where some distance metric over permutations is to be minimized (rather than the probability of error) (Ailon et al., 2008, Braverman and Mossel, 2009, Ailon, 2011, Negahban et al., 2012, Wauthier et al., 2013, Rajkumar and Agarwal, 2014, Shah et al., 2016, Shah and Wainwright, 2018, Mao et al., 2018), the case where the noise for each pairwise comparison is not i.i.d. and is determined by some noise model (e.g., the Bradley–Terry–Luce model (Bradley and Terry, 1952)) (Ajtai et al., 2009, Negahban et al., 2012, Rajkumar and Agarwal, 2014, Chen and Suh, 2015, Chen et al., 2017, Ren et al., 2018), and the case where the ordering itself is restricted to some subset of all permutations (Jamieson and Nowak, 2011, Ailon et al., 2011).
For noisy sorting, the best upper and lower bounds have been provided in (Feige et al., 1994, Wang et al., 2023), which give an upper bound and lower bound . We tighten the lower bound to be . On the other hand, (Gu and Xu, 2023) shows that the query complexity is , which does not scale with . Our lower bound for fixed-length improves the dependence on . We also make the bounds match up to constant factors for all parameters in the variable-length setting. However, making the dependence tight in upper bound for fixed-length remains an open problem. In terms of the non-adaptive sampling scenario, we provide an upper and lower bound that matches in terms of the dependence on , but is still loose when both and go to simultaneously.
For max selection, the best known lower bound for fixed-length adaptive sampling is (Feige et al., 1994), while our result makes it tight when either or goes to and provides matching bounds for variable-length setting. On the other hand, (Mohajer et al., 2017, Shah and Wainwright, 2018) discuss the gap between adaptive sampling and non-adaptive sampling. However, the lower bound for non-adaptive sampling in (Shah and Wainwright, 2018) does not apply to our case since it is based on a generalized model where the noise probability is different. In our case, is a natural upper bound. However, it is unclear whether our lower bound can be improved to .
OR and best arm identification
The noisy computation of the OR function was first studied in the literature on circuits with noisy gates (Dobrushin and Ortyukov, 1977a, b, Von Neumann, 1956, Pippenger et al., 1991, Gács and Gál, 1994) and noisy decision trees (Feige et al., 1994, Evans and Pippenger, 1998, Reischuk and Schmeltz, 1991). Different from the other three problems we consider, computing OR does not rely on pairwise comparisons, but instead directly queries the values of the bits. This is also related to the rich literature on best arm identification, which queries real-valued arms and aims to identify the arm with the largest value (reward) (Bubeck et al., 2009, Audibert et al., 2010, Garivier and Kaufmann, 2016, Jamieson and Nowak, 2014, Gabillon et al., 2012, Kaufmann et al., 2016). Indeed, any best-arm identification algorithm can be converted to an OR computation by first finding the maximum and then querying the binary value of the maximum. This recovers the best existing upper bound for computing OR under the adaptive sampling scenario (Audibert et al., 2010). However, the lower bound for best arm identification does not apply to our case, since OR has a binary output, while the best arm identification problem requires the arm index. Our lower bound for fixed-length adaptive sampling improves over the best known lower bound (Feige et al., 1994), and we also provide matching bounds for the variable-length setting.
I-B Problem Definition and Preliminaries
The OR function is defined in Equation (2). We define the rest of the problems here. Different from OR, the MAX, SEARCH, and SORT problems are all based on noisy pairwise comparisons. Concretely, assume we have n distinct real-valued items. Instead of querying the exact value of each element, we can only query a pair of elements and observe their noisy comparison. For any queried pair, the comparison outcome is correct with probability 1 − p and flipped with probability p, independently across queries.
We define MAX as the index of the largest element, and SORT as the permutation that arranges the elements in order. SEARCH is defined for a new element with respect to the sorted list: its value is the index k such that the new element lies between the k-th and the (k+1)-th smallest elements (with the convention that the 0-th element is −∞ and the (n+1)-th element is +∞). In the SEARCH problem, we assume that the ordering of the n elements is given, and we are interested in the position of the new element; thus, in each round we compare the given new element with any one of the n elements.
We are interested in the probability of exact recovery of the functions. We consider both adaptive sampling and non-adaptive sampling. In adaptive sampling, the sampling distribution at each round can be dependent on the past queries and observations. In non-adaptive sampling, the sampling distribution in each round has to be determined at the beginning and cannot change with the ongoing queries or observations. Throughout the paper, we assume that the desired error probability satisfies . We use the terms “querying” and “sampling” interchangeably.
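To make the two query models concrete, the following is a minimal sketch, with names of our own choosing, of the oracles used throughout: a noisy bit query for OR and a noisy pairwise comparison for MAX, SEARCH, and SORT, both passed through a BSC(p).

```python
import random

class NoisyOracle:
    """Simulates the BSC(p) observation model used throughout the paper."""

    def __init__(self, values, p: float):
        self.values = values      # hidden bits (OR) or hidden real values (comparisons)
        self.p = p                # crossover probability of the channel
        self.num_queries = 0      # total number of queries issued so far

    def query_bit(self, i: int) -> int:
        """Noisy observation of the i-th bit (used for the OR problem)."""
        self.num_queries += 1
        return self.values[i] ^ (random.random() < self.p)

    def compare(self, i: int, j: int) -> bool:
        """Noisy observation of whether the i-th value exceeds the j-th value."""
        self.num_queries += 1
        return (self.values[i] > self.values[j]) ^ (random.random() < self.p)
```

An adaptive algorithm may choose its next call after seeing earlier outputs, whereas a non-adaptive algorithm must commit to its entire (possibly randomized) sequence of calls in advance; the query complexity is the number of calls needed to succeed with probability at least 1 − δ.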
II Computing the OR function
In this section, we provide lower bounds for the query complexity of computing the OR function under both adaptive and non-adaptive sampling. The upper bound for adaptive sampling is from (Feige et al., 1994), and the upper bound for non-adaptive sampling is deferred to Appendix H. Let the n-bit sequence denote the true values of the bits, and let the target be the result of the OR function applied to this noiseless sequence. We also consider any algorithm that queries noisy bits over the allotted rounds and outputs a (possibly random) decision in {0, 1}.
II-A Adaptive Sampling
We have the following minimax lower bound.
Theorem II.1.
In the adaptive setting, we have
Thus, the number of queries required to recover the true value with probability at least is lower bounded by .
We provide the proof here, which is based on Le Cam’s two point method (see e.g. (LeCam, 1973, Yu, 1997)).
Proof of Theorem II.1.
Our lower bound proof is mainly based on Le Cam’s two-point method, which is re-stated as Lemma A.1 in Appendix A for the reader’s convenience. Let the first instance be the length-n all-zero sequence, and for each i let the i-th alternative instance be the sequence whose i-th element is 1 and whose remaining elements are 0. We can first verify that for any i, one has
By applying Le Cam’s two-point lemma to the all-zero instance and the i-th alternative, we know that
Here each distribution is the joint distribution of the query-observation pairs over all rounds when the corresponding instance is the ground truth. By taking the maximum over i on the right-hand side, we have
Here the last inequality is due to the Bretagnolle–Huber inequality (Bretagnolle and Huber, 1979) (Lemma A.4 in Appendix A). We now compute the KL divergence. Let the random variable count the number of times the i-th element is queried among all rounds. From the divergence decomposition lemma (Auer et al., 1995) (Lemma A.3 in Appendix A), we have
Here denotes the expected number of times the -th element is queried when the ground truth is . Thus we have
Since the expected numbers of queries to the n elements sum to at most the total number of queries, there must exist some element that is queried, in expectation, at most a 1/n fraction of the total. This gives
On the other hand, is naturally a lower bound for query complexity since one has to query each element at least once. Thus we arrive at a lower bound of . Note that this is equivalent to up to a constant factor when . The reason is that when is bounded away from , is always some constant. When is close to , is within constant factor of . ∎
Remark II.2.
Compared with the existing tightest bound in Feige et al. (1994), the rate is greatly improved as p → 0 or δ → 0.
On the other hand, the best known upper bound is from Feige et al. (1994); we include its algorithm and analysis in Appendix I. Theorem II.1 shows that when either p or δ is bounded away from 0, the lower bound matches this upper bound up to constant factors. The only regime where the upper and lower bounds do not match is when both p and δ go to 0. We conjecture that a better upper bound is needed in this case.
II-B Non-adaptive Sampling
In the case of non-adaptive sampling, we show that queries are not enough. And one needs queries.
Theorem II.3.
In the non-adaptive sampling setting, where the sampling procedure is restricted to taking independent samples from a sequence of distributions , we have
This shows that the query complexity is at least .
We provide the proof below. Different from the case of adaptive sampling, for non-adaptive sampling we target a rate with an extra logarithmic factor, and a standard application of Le Cam’s two-point method is not sufficient to give this factor. Thus we provide a new proof based on a point-versus-mixture extension of Le Cam’s method.
Proof of Theorem II.3.
Consider the first instance, which has all 0s as its elements. The second instance is defined as a mixture over the single-one instances: it puts some probability weight on the instance whose i-th element is 1 and whose remaining elements are 0, and we will determine the weights later. Now from the point-versus-mixture form of Le Cam’s lemma (Lemma A.2), we have
Here the last inequality is based on the inequality . Let be the probability of querying the -th element in round . We can further calculate the divergence as
where follows from Lemma A.5, and follows from the tensorization property of for tensor products, a direct result of Lemma A.5. Now denote . By Jensen’s inequality, we know that . Thus we have
Now we take . We have
Since , by Jensen’s inequality, we have . Thus
This gives the desired result. Now solving the inequality
we arrive at . ∎
Remark II.4.
Compared with the Ω(n log n) bounds in (Reischuk and Schmeltz, 1991, Gács and Gál, 1994), our lower bound is tighter in some regimes of p and δ and looser in others, so the tightest lower bound is the maximum of the two. The corresponding repetition-based upper bound can be derived by a union-bound argument. Compared with this upper bound, the lower bound is tight with respect to all parameters when both p and δ are bounded away from 0.
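For completeness, one crude way to obtain a repetition-based upper bound of this type, using Hoeffding’s inequality rather than the sharper tail bounds of Appendix H, is the following calculation:
\[
  \Pr\bigl[\text{the majority vote over } m \text{ noisy copies of a bit is wrong}\bigr] \;\le\; e^{-m(1-2p)^{2}/2},
\]
so taking m = ⌈2 log(n/δ)/(1 − 2p)²⌉ repetitions per bit and a union bound over the n bits gives error probability at most δ with nm = O(n log(n/δ)/(1 − 2p)²) non-adaptive queries in total.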
III Computing the MAX function
Let a sequence of length n represent the true values of the elements, and let the target be the index of the maximum value in the sequence. We also consider any algorithm that (possibly randomly) queries noisy comparisons between pairs of elements over the allotted rounds and outputs a (possibly random) decision in {1, …, n}.
III-A Adaptive Sampling
We have the following minimax lower bound for the adaptive setting. The proof is deferred to Appendix B.
Theorem III.1.
In the adaptive setting, we have
Thus, the number of queries required to recover the true value with probability at least is lower bounded by .
III-B Non-adaptive Sampling
In the case of non-adaptive sampling, we show that samples are not enough. Instead, one needs at least samples. We have the following result. The proof is deferred to Appendix C.
Theorem III.2.
In the non-adaptive setting, where the sampling procedure is restricted to taking independent samples from a sequence of distributions , we have
Thus the queries required to recover the true value with probability at least is lower bounded by .
Remark III.3.
Compared with the repetition-based upper bound , the lower bound has a gap. The tight dependence on remains open.
IV Computing the SEARCH function
IV-A Adaptive Sampling
Recall that for any sorted sequence, the SEARCH function for a new element is defined as the index k such that the new element lies between the k-th and the (k+1)-th smallest elements. We start with the adaptive setting. The proof is deferred to Appendix D.
Theorem IV.1.
In the adaptive setting, we have
Thus the queries required to recover the true value with probability at least is lower bounded by .
IV-B Non-adaptive Sampling
For non-adaptive sampling, we show queries are needed. The proof is deferred to Appendix E.
Theorem IV.3.
In the non-adaptive setting where the sampling procedure is restricted to taking independent samples from a sequence of distributions , we have
Thus, the queries required to recover the true value with probability at least is lower bounded by .
Remark IV.4.
We show in Appendix H the upper bound for computing SEARCH. Our lower bound matches it in all parameters when either p or δ is bounded away from 0.
V Computing the SORT function
V-A Adaptive Sampling
Let the input be any sequence of n elements with distinct values, and let the target be the result of the SORT function applied to the noiseless sequence. We also consider any algorithm that queries noisy comparisons between pairs of elements over the allotted rounds and outputs a (possibly random) decision. We have the following result for computing the SORT function. The proof is deferred to Appendix F.
Theorem V.1.
In the adaptive setting, we have
Thus, the queries required to recover the true value with probability at least is lower bounded by .
Remark V.2.
Compared with the upper bound, the lower bound is tight in all parameters when either p or δ is bounded away from 0.
V-B Non-adaptive Sampling
Here we provide the following minimax lower bound for non-adaptive learning. The proof is deferred to Appendix G.
Theorem V.3.
In the non-adaptive setting where the sampling procedure is restricted to taking independent samples from a sequence of distributions ,
Thus, the queries required to recover the true value with probability at least is lower bounded by .
Remark V.4.
Compared with the repetition-based upper bound, the lower bound is tight in all parameters when p and δ are bounded away from 0.
VI Matching Bounds for Variable Length
In this section, we provide matching upper and lower bounds for the variable-length setting. All the bounds here coincide with our lower bounds for the fixed-length setting, and the proofs can be directly adapted from the fixed-length results.
Theorem VI.1.
In the adaptive setting, we determine, up to constant factors, the number of queries in expectation needed to achieve error probability at most δ for computing each of the following:
1. the OR function;
2. the MAX function;
3. the SEARCH function;
4. the SORT function.
The proof is deferred to Appendix J. The matching upper bound for SORT is given in (Gu and Xu, 2023) in the regime where p is a constant bounded away from 0 and 1/2, where they make the bound tight even in the constant factor. Our results for the other three functions improve both the existing upper and lower bounds, and provide tight query complexity in all parameters in the variable-length setting. (Wang et al., 2023) initiates the study of constant-wise matching bounds for SORT, which are achieved by (Gu and Xu, 2023). Our bounds are tight up to constant factors over the full parameter range.
We provide the upper bound algorithm for computing OR in Algorithm 1. For the upper bound, one major difference is that, to compare two elements with error probability at most a prescribed level, a certain number of queries in expectation suffices, which can be achieved by repeatedly comparing the two elements until the posterior distribution reaches the desired confidence level (see, e.g., Lemma 13 of Gu and Xu (2023)). The best known bound for the fixed-length setting is the one from (Feige et al., 1994). This makes it simpler to achieve the tight rate for variable length.
In Algorithm 1, we adapt the noisy comparison oracle in Gu and Xu (2023) to a noisy query oracle on each element, and combine it with the original fixed-length algorithm in Feige et al. (1994). In each round, the algorithm eliminates half of the elements in the current set by querying the elements with odd indices. If an odd-indexed element is determined to be 1, the element paired with it is removed without being queried; if the odd-indexed element is determined to be 0, it is itself removed from the list. Thus after logarithmically many rounds, only one element is left in the set, and it suffices to query this element to determine the output of the OR function.
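The following is a minimal sketch of this halving scheme under illustrative assumptions of ours: the per-element subroutine keeps querying a bit until the difference between observed 1s and 0s crosses the threshold needed for the desired confidence, in the spirit of Lemma 13 of Gu and Xu (2023), and the per-round confidence split is our own choice, made for simplicity of the error analysis rather than to optimize the expected number of queries.

```python
import math
import random

def noisy_bit(bit: int, p: float) -> int:
    """One observation of a bit through BSC(p)."""
    return bit ^ (random.random() < p)

def estimate_bit(bit: int, p: float, delta: float) -> int:
    """Variable-length estimate of a bit: query until a count threshold is hit.

    Each observation shifts the log-likelihood ratio by log((1 - p) / p), so a
    difference of ceil(log(1 / delta) / log((1 - p) / p)) between observed 1s
    and 0s gives error probability at most delta (assumes 0 < p < 1/2).
    """
    threshold = math.ceil(math.log(1 / delta) / math.log((1 - p) / p))
    diff = 0
    while abs(diff) < threshold:
        diff += 1 if noisy_bit(bit, p) == 1 else -1
    return 1 if diff > 0 else 0

def noisy_or_variable_length(bits, p: float, delta: float) -> int:
    """Halving scheme: test odd-indexed elements, keep one survivor per pair.

    The confidence split delta / 2**(r + 1) in round r plus delta / 2 for the
    final test keeps the overall error at most delta; this schedule is only
    illustrative and does not optimize the expected number of queries.
    """
    survivors = list(bits)
    round_idx = 1
    while len(survivors) > 1:
        delta_r = delta / 2 ** (round_idx + 1)
        nxt = []
        for k in range(0, len(survivors), 2):
            pair = survivors[k:k + 2]
            if estimate_bit(pair[0], p, delta_r) == 1:
                nxt.append(pair[0])    # looks like a 1: keep it, drop its partner
            elif len(pair) == 2:
                nxt.append(pair[1])    # looks like a 0: keep the unqueried partner
        survivors = nxt
        round_idx += 1
    return estimate_bit(survivors[0], p, delta / 2)
```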
VII Conclusions and Future Work
For four noisy computing tasks, namely the OR function from noisy queries and the MAX, SEARCH, and SORT functions from noisy pairwise comparisons, we tighten the lower bounds for fixed-length noisy computing and provide matching bounds for variable-length noisy computing. Making the bounds match exactly in the fixed-length setting remains an important open problem.
Acknowledgements
The authors would like to thank Yanjun Han for insightful discussions on the information-theoretic lower bounds. This work was supported in part by the NSERC Discovery Grant No. RGPIN-2019-05448, the NSERC Collaborative Research and Development Grant CRDPJ 54367619, NSF Grants IIS-1901252 and CCF-1909499.
References
- Agarwal et al. (2017) A. Agarwal, S. Agarwal, S. Assadi, and S. Khanna. Learning with limited rounds of adaptivity: Coin tossing, multi-armed bandits, and ranking from pairwise comparisons. In Proceedings of the 2017 Conference on Learning Theory, volume 65 of Proceedings of Machine Learning Research, pages 39–75. PMLR, 07–10 Jul 2017.
- Ailon (2011) N. Ailon. Active learning ranking from pairwise preferences with almost optimal query complexity. In J. Shawe-Taylor, R. Zemel, P. Bartlett, F. Pereira, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 24. Curran Associates, Inc., 2011.
- Ailon et al. (2008) N. Ailon, M. Charikar, and A. Newman. Aggregating inconsistent information: Ranking and clustering. J. ACM, 55(5), nov 2008. ISSN 0004-5411. doi: 10.1145/1411509.1411513.
- Ailon et al. (2011) N. Ailon, R. Begleiter, and E. Ezra. A new active learning scheme with applications to learning to rank from pairwise preferences. 2011.
- Ajtai et al. (2009) M. Ajtai, V. Feldman, A. Hassidim, and J. Nelson. Sorting and selection with imprecise comparisons. In ACM Trans. Algorithms, volume 12, pages 37–48, 07 2009. doi: 10.1007/978-3-642-02927-1_5.
- Audibert et al. (2010) J.-Y. Audibert, S. Bubeck, and R. Munos. Best arm identification in multi-armed bandits. In COLT, pages 41–53, 2010.
- Auer et al. (1995) P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. Gambling in a rigged casino: The adversarial multi-armed bandit problem. In Proceedings of IEEE 36th annual foundations of computer science, pages 322–331. IEEE, 1995.
- Berlekamp (1964) E. R. Berlekamp. Block coding with noiseless feedback. Ph.D. thesis, MIT, Cambridge, MA, USA, 1964.
- Bradley and Terry (1952) R. A. Bradley and M. E. Terry. Rank analysis of incomplete block designs: I. the method of paired comparisons. Biometrika, 39(3/4):324–345, 1952. ISSN 00063444.
- Braverman and Mossel (2009) M. Braverman and E. Mossel. Sorting from noisy information. 2009. URL https://arxiv.org/abs/0910.1191.
- Bretagnolle and Huber (1979) J. Bretagnolle and C. Huber. Estimation des densités: risque minimax. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 47(2):119–137, 1979. doi: 10.1007/BF00535278.
- Bubeck et al. (2009) S. Bubeck, R. Munos, and G. Stoltz. Pure exploration in multi-armed bandits problems. In International conference on Algorithmic learning theory, pages 23–37. Springer, 2009.
- Burnashev (1976) M. V. Burnashev. Data transmission over a discrete channel with feedback. random transmission time. Problemy Peredachi Informatsii, 12(4):10–30, 1976.
- Burnashev and Zigangirov (1974) M. V. Burnashev and K. Zigangirov. An interval estimation problem for controlled observations. Problemy Peredachi Informatsii, 10(3):51–61, 1974.
- Chen et al. (2017) X. Chen, Y. Chen, and X. Li. Asymptotically optimal sequential design for rank aggregation. Math. Oper. Res., 10 2017. doi: 10.1287/moor.2021.1209.
- Chen and Suh (2015) Y. Chen and C. Suh. Spectral MLE: Top-K rank aggregation from pairwise comparisons. In F. Bach and D. Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pages 371–380, Lille, France, 07–09 Jul 2015. PMLR.
- Dobrushin and Ortyukov (1977a) R. L. Dobrushin and S. Ortyukov. Lower bound for the redundancy of self-correcting arrangements of unreliable functional elements. Problemy Peredachi Informatsii, 13(1):82–89, 1977a.
- Dobrushin and Ortyukov (1977b) R. L. Dobrushin and S. Ortyukov. Upper bound on the redundancy of self-correcting arrangements of unreliable functional elements. Problemy Peredachi Informatsii, 13(3):56–76, 1977b.
- Evans and Pippenger (1998) W. Evans and N. Pippenger. Average-case lower bounds for noisy boolean decision trees. SIAM Journal on Computing, 28(2):433–446, 1998.
- Falahatgar et al. (2017) M. Falahatgar, A. Orlitsky, V. Pichapati, and A. T. Suresh. Maximum selection and ranking under noisy comparisons. In D. Precup and Y. W. Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1088–1096. PMLR, 06–11 Aug 2017.
- Feige et al. (1994) U. Feige, P. Raghavan, D. Peleg, and E. Upfal. Computing with noisy information. SIAM Journal on Computing, 23(5):1001–1018, 1994.
- Gabillon et al. (2012) V. Gabillon, M. Ghavamzadeh, and A. Lazaric. Best arm identification: A unified approach to fixed budget and fixed confidence. Advances in Neural Information Processing Systems, 25, 2012.
- Gács and Gál (1994) P. Gács and A. Gál. Lower bounds for the complexity of reliable boolean circuits with noisy gates. IEEE Transactions on Information Theory, 40(2):579–583, 1994.
- Garivier and Kaufmann (2016) A. Garivier and E. Kaufmann. Optimal best arm identification with fixed confidence. In Conference on Learning Theory, 2016.
- Gu and Xu (2023) Y. Gu and Y. Xu. Optimal bounds for noisy sorting, 2023.
- Heckel et al. (2019) R. Heckel, N. B. Shah, K. Ramchandran, and M. J. Wainwright. Active ranking from pairwise comparisons and when parametric assumptions do not help. Ann. Stat., 47(6):3099 – 3126, 2019.
- Horstein (1963) M. Horstein. Sequential transmission using noiseless feedback. IEEE Trans. Inf. Theory, 9(3):136–143, 1963. doi: 10.1109/TIT.1963.1057832.
- Ingster et al. (2003) Y. Ingster, J. I. Ingster, and I. Suslina. Nonparametric goodness-of-fit testing under Gaussian models, volume 169. Springer Science & Business Media, 2003.
- Jamieson and Nowak (2014) K. Jamieson and R. Nowak. Best-arm identification algorithms for multi-armed bandits in the fixed confidence setting. In 48th Annual Conference on Information Sciences and Systems, pages 1–6, 2014.
- Jamieson and Nowak (2011) K. G. Jamieson and R. D. Nowak. Active ranking using pairwise comparisons. In Proceedings of the 24th International Conference on Neural Information Processing Systems, NIPS’11, page 2240–2248, Red Hook, NY, USA, 2011. Curran Associates Inc. ISBN 9781618395993.
- Karp and Kleinberg (2007) R. M. Karp and R. Kleinberg. Noisy binary search and its applications. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’07, page 881–890, USA, 2007. Society for Industrial and Applied Mathematics. ISBN 9780898716245.
- Kaufmann et al. (2016) E. Kaufmann, O. Cappé, and A. Garivier. On the complexity of best arm identification in multi-armed bandit models. Journal of Machine Learning Research, 17:1–42, 2016.
- LeCam (1973) L. LeCam. Convergence of estimates under dimensionality restrictions. The Annals of Statistics, pages 38–53, 1973.
- Mao et al. (2018) C. Mao, J. Weed, and P. Rigollet. Minimax rates and efficient algorithms for noisy sorting. In Proceedings of Algorithmic Learning Theory, volume 83 of Proceedings of Machine Learning Research, pages 821–847. PMLR, 07–09 Apr 2018.
- Mitzenmacher and Upfal (2017) M. Mitzenmacher and E. Upfal. Probability and computing: Randomization and probabilistic techniques in algorithms and data analysis. Cambridge university press, 2017.
- Mohajer et al. (2017) S. Mohajer, C. Suh, and A. Elmahdy. Active learning for top-k rank aggregation from noisy comparisons. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, ICML’17, page 2488–2497. JMLR.org, 2017.
- Negahban et al. (2012) S. Negahban, S. Oh, and D. Shah. Iterative ranking from pair-wise comparisons. In Advances in Neural Information Processing Systems, volume 25, 2012.
- Pelc (1989) A. Pelc. Searching with known error probability. Theoretical Computer Science, 63(2):185–202, 1989.
- Pippenger et al. (1991) N. Pippenger, G. D. Stamoulis, and J. N. Tsitsiklis. On a lower bound for the redundancy of reliable networks with noisy gates. IEEE Transactions on Information Theory, 37(3):639–643, 1991.
- Rajkumar and Agarwal (2014) A. Rajkumar and S. Agarwal. A statistical convergence perspective of algorithms for rank aggregation from pairwise data. In Proceedings of the 31st International Conference on International Conference on Machine Learning - Volume 32, page I–118–I–126, 2014.
- Reischuk and Schmeltz (1991) R. Reischuk and B. Schmeltz. Reliable computation with noisy circuits and decision trees-a general n log n lower bound. In [1991] Proceedings 32nd Annual Symposium of Foundations of Computer Science, pages 602–611. IEEE Computer Society, 1991.
- Ren et al. (2018) W. Ren, J. Liu, and N. B. Shroff. PAC ranking from pairwise and listwise queries: Lower bounds and upper bounds. 2018.
- Rényi (1961) A. Rényi. On a problem of information theory. MTA Mat. Kut. Int. Kozl. B, 6(MR143666):505–516, 1961.
- Shah and Wainwright (2018) N. B. Shah and M. J. Wainwright. Simple, robust and optimal ranking from pairwise comparisons. J. Mach. Learn. Res., 18(199):1–38, 2018.
- Shah et al. (2016) N. B. Shah, S. Balakrishnan, A. Guntuboyina, and M. J. Wainwright. Stochastically transitive models for pairwise comparisons: Statistical and computational issues. In Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48, ICML’16, page 11–20. JMLR.org, 2016.
- Tsybakov (2004) A. B. Tsybakov. Introduction to nonparametric estimation. Springer, 9(10), 2004.
- Ulam (1976) S. Ulam. Adventures of a mathematician. Charles Scribner’s Sons, New York, NY, USA, 1976.
- Von Neumann (1956) J. Von Neumann. Probabilistic logics and the synthesis of reliable organisms from unreliable components. Automata studies, 34:43–98, 1956.
- Wang et al. (2022) Z. Wang, N. Ghaddar, and L. Wang. Noisy sorting capacity. In 2022 IEEE International Symposium on Information Theory (ISIT), pages 2541–2546. IEEE, 2022.
- Wang et al. (2023) Z. Wang, N. Ghaddar, B. Zhu, and L. Wang. Noisy sorting capacity, 2023. URL https://arxiv.org/abs/2202.01446.
- Wauthier et al. (2013) F. Wauthier, M. Jordan, and N. Jojic. Efficient ranking from pairwise comparisons. In Proceedings of the 30th International Conference on Machine Learning, volume 28 of Proceedings of Machine Learning Research, pages 109–117, Atlanta, Georgia, USA, 17–19 Jun 2013.
- Yu (1997) B. Yu. Assouad, fano, and le cam. In Festschrift for Lucien Le Cam, pages 423–435. Springer, 1997.
Appendix A Useful Lemmas
Here, we introduce several important lemmas.
Lemma A.1 (Le Cam’s Two Point Lemma, see e.g. (Yu, 1997)).
For any , suppose that the following separation condition holds for some loss function :
Then we have
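For concreteness, one standard formulation of the lemma, in generic notation of our own, is: if the loss satisfies the separation condition L(θ_0, a) + L(θ_1, a) ≥ Δ for every action a, then
\[
  \inf_{\hat{\theta}} \max_{i \in \{0, 1\}} \mathbb{E}_{P_i}\bigl[L(\theta_i, \hat{\theta})\bigr]
  \;\ge\; \frac{\Delta}{2}\,\bigl(1 - \mathrm{TV}(P_0, P_1)\bigr),
\]
where TV denotes the total variation distance between the two data distributions P_0 and P_1.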
Lemma A.2 (Point vs Mixture, see e.g. (Ingster et al., 2003)).
For any , , suppose that the following separation condition holds for some loss function :
Then for any probability measure supported on , we have
Lemma A.3 (Divergence Decomposition (Auer et al., 1995)).
Let be the random variable denoting the number of times experiment is performed under some policy , then for two distributions under policy ,
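For concreteness, one standard statement of this decomposition, in generic notation of our own, is
\[
  D_{\mathrm{KL}}\bigl(\mathbb{P}_{\nu} \,\|\, \mathbb{P}_{\nu'}\bigr)
  \;=\; \sum_{i=1}^{K} \mathbb{E}_{\nu}\bigl[T_i\bigr]\, D_{\mathrm{KL}}\bigl(\nu_i \,\|\, \nu'_i\bigr),
\]
where P_ν and P_ν′ are the joint laws of all query-observation pairs under the two environments, T_i is the number of times experiment i is performed, and ν_i and ν′_i are the corresponding observation distributions.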
Lemma A.4 (An upper bound from the Bretagnolle–Huber inequality ((Bretagnolle and Huber, 1979); see also Lemma 2.6 in Tsybakov (2004))).
For any distribution , one has
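A commonly used form of this inequality, stated here for concreteness, is: for any two probability distributions P and Q on the same space and any event A,
\[
  P(A) + Q(A^{c}) \;\ge\; \tfrac{1}{2}\exp\bigl(-D_{\mathrm{KL}}(P \,\|\, Q)\bigr),
\]
or equivalently TV(P, Q) ≤ 1 − (1/2) exp(−D_KL(P‖Q)).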
Lemma A.5 (Chi-square divergence for a mixture of distributions; see, e.g., (Ingster et al., 2003)).
Let a family of distributions parametrized by some parameter and an arbitrary fixed reference distribution be given. Then for any probability measure supported on the parameter space, we have
Here in the expectation are independent.
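For concreteness, one standard statement of this identity, written in generic notation of our own, is
\[
  \chi^2\Bigl(\mathbb{E}_{\theta \sim \mu}\bigl[P_\theta\bigr] \,\Big\|\, Q\Bigr)
  \;=\; \mathbb{E}_{\theta, \theta' \sim \mu}\biggl[\int \frac{\mathrm{d}P_\theta\, \mathrm{d}P_{\theta'}}{\mathrm{d}Q}\biggr] - 1,
\]
where P_θ ranges over the family, Q is the fixed reference distribution, μ is the mixing measure, and θ and θ′ are independent draws from μ.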
Lemma A.6.
(Chernoff’s bound for binomial random variables (Mitzenmacher and Upfal, 2017, Exercise 4.7)) If a random variable is binomially distributed, then for any deviation parameter we have the upper and lower tail bounds (3) and (4).
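Standard multiplicative Chernoff bounds of this kind, stated here for concreteness (the constants in the cited exercise may differ), read: for X ∼ Bin(n, q) with mean μ = nq and any 0 < ε < 1,
\begin{align*}
  \Pr\bigl[X \ge (1+\varepsilon)\mu\bigr] &\le \exp\bigl(-\mu\varepsilon^{2}/3\bigr),\\
  \Pr\bigl[X \le (1-\varepsilon)\mu\bigr] &\le \exp\bigl(-\mu\varepsilon^{2}/2\bigr).
\end{align*}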
Appendix B Proof for Theorem III.1
Proof.
We first select an arbitrary sequence . Let be original sequence which has its largest value in the -th element. Thus . Now for any , we design , i.e., we move the -th element in and insert it between and . Let be the random variable that represents the number of comparisons between the -th item and -th item in the rounds. We know that for all . Following a similar proof as Theorem II.1, we know that
Now since , there must exist some such that . This gives
On the other hand, is naturally a lower bound for the query complexity since one has to query each element at least once. Thus we arrive at a lower bound of . Note that this is equivalent to up to a constant factor when . The reason is that when is bounded away from , is always some constant. When is close to , is within constant factor of . ∎
Appendix C Proof for Theorem III.2
Proof.
Consider an arbitrary sequence . Since , there must exist some pair such that . Now we construct
where lies in the -th position and lies in the -th position, and
where lies in the -th position and lies in the -th position. Thus , . From Le Cam’s two point lemma, we know that
On the other hand, is naturally a lower bound for the query complexity since one has to query each element at least once. Thus we arrive at a lower bound of . Note that this is equivalent to up to a constant factor when . ∎
Appendix D Proof for Theorem IV.1
Proof.
We begin with the first half, i.e.
To see this, simply consider the case of , and we need to determine whether or from their pairwise comparisons. We consider two instances , where . From Le Cam’s lemma, we have
Next, we aim to prove the second half, namely
Now we design instances . We let , and . From Le Cam’s lemma, we have
Now by Fano’s inequality, we have
Here and satisfies . Following the same argument as the proof of Theorem 3 in Wang et al. (2022), we know that . Thus overall we have
Thus the queries required to recover the true value with probability at least is lower bounded by . ∎
Appendix E Proof for Theorem IV.3
Proof.
Let be the random variable that denotes the number of times is compared with . Since , there must exist some such that . Now we construct the first instance , and the second instance . From Le Cam’s two point lemma, we know that
On the other hand, is naturally a lower bound for query complexity since one has to query each element at least once. Thus we arrive at a lower bound of . ∎
Appendix F Proof for Theorem V.1
Proof.
In (Wang et al., 2023), it is proven via Fano’s inequality that
So it suffices to prove that
To see this, consider an arbitrary sequence . Now we design instances . We let , and be the instance that switches the element with , where . From Le Cam’s two point lemma, we have
Let be the random variable that represents the number of comparisons between the -th item and -th item in the rounds. Now since , there must exist some such that . This gives
Altogether, this shows a lower bound on the query complexity . Note that this is equivalent to since . ∎
Appendix G Proof for Theorem V.3
Proof.
We first select an arbitrary sequence . Let be the -th permutation of the sequence, where . Here refers to the -th largest element under permutation . Now we consider a summation . For each pair , is counted times in the summation. Furthermore, we know that . Thus we have
We know that there must exist some such that
Now we design instances . We let , and be the instance that switches the element with based on , where . From Le Cam’s two point lemma, we have
Now by Fano’s inequality, we have
Here . We can compute
Now summing over all , we have
This gives
This shows that to output the correct answer with probability at least , one needs at least queries for some universal constant .
∎
Appendix H Upper Bounds for Non-adaptive Sampling
In this section, we present a theorem on the upper bounds for non-adaptive learning in the worst-case query model.
Theorem H.1.
For each of the following functions, one can design a non-adaptive algorithm whose worst-case query complexity matches the corresponding entry of Table I:
1. the OR function;
2. the MAX function;
3. the SEARCH function;
4. the SORT function.
The results for OR, MAX, and SORT are based on the simple algorithm of querying all possible elements (or pairs) an equal number of times, and the analysis is a direct union-bound argument. Here we only present the algorithm and analysis for SEARCH.
Proof.
Assume that the target lies between two consecutive elements of the sorted list. Consider the non-adaptive learning algorithm which compares the target with each element an equal number of times.
Let be the number of observations of among queries for element . Consider the following algorithm:
We show that with high probability, . We have
Now we bound the above probability for each . Without loss of generality, we assume that . The above probability can be written as
Note that . Let . From Lemma A.6 we have
Now by summing over the probability for different , we get
Rescaling finishes the proof. ∎
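The following is a minimal sketch of the non-adaptive procedure analyzed above, under illustrative assumptions of ours: the target is compared with every element of the sorted list the same number of times, and the output is the number of elements that lose a majority of their comparisons to the target (the estimator rule and the function names are our own).

```python
import random

def noisy_greater(target: float, x: float, p: float) -> bool:
    """Noisy observation of the event {target > x} through BSC(p)."""
    return (target > x) ^ (random.random() < p)

def search_nonadaptive(target: float, sorted_list, p: float, m: int) -> int:
    """Non-adaptive SEARCH sketch: compare the target with every element m times.

    Returns the estimated insertion position, i.e. the number of list elements
    judged smaller than the target by a per-element majority vote.  With
    m = O(log(n / delta) / (1 - 2 * p) ** 2), all n majority votes are
    simultaneously correct with probability at least 1 - delta, in which case
    the returned position is exact.
    """
    wins = [
        sum(noisy_greater(target, x, p) for _ in range(m))  # votes for "target > x"
        for x in sorted_list
    ]
    return sum(1 for w in wins if 2 * w > m)
```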
Appendix I Upper Bounds for Adaptive Sampling
Here we present the tournament algorithm for computing OR introduced in (Feige et al., 1994). A similar algorithm can also be applied to compute MAX. The main difference is that for MAX, we directly compare the two elements multiple times instead of comparing their numbers of observed 1s.
The following theorem is due to (Feige et al., 1994). We include it here for completeness.
Theorem I.1.
Algorithm 2 finishes within queries, and outputs the correct value of with probability at least when .
Proof.
First, we compute the total number of queries of the algorithm. Without loss of generality, we may assume that n is a power of 2; if not, we may add no more than n extra dummy 0’s to the original list. In each outer iteration, the size of the surviving set is halved, so after logarithmically many rounds the set contains only one element. In each round, every remaining element receives a prescribed number of queries, and the total number of queries we make is
Here in the last inequality we use the fact below, which can be verified numerically:
(5) |
Now we show that the failure probability of the algorithm is at most . Consider the first case where all the elements are . Then no matter which element is left in , the probability that the algorithm fails is the probability that a Binomial random variable has value larger or equal to with .
Taking , in Equation (3) of Lemma A.6, we know that
Here the last inequality uses the fact that for all . This shows that the final failure probability is bounded by when the true elements are all .
Consider the second case where there exists at least a in the original elements . Without loss of generality, we assume that . Let be the remaining list of elements at the beginning of -th iteration. We let be the event that the first element in is while the first element in is . This event only happens when the second element in is and gets more ’s in the noisy queries than the first element. Let denote the event that the only left element is in the last round, we have
where and , with . Let , we know that the random variable . Thus we have
Following the same argument on Binomial distribution, we can upper bound the error probability under event with . Thus the total failure probability is upper bounded by when .
∎
Appendix J Proof of Theorem VI.1
J-A Lower Bounds
First, note that all our lower bounds for fixed-length can be adapted to variable-length by replacing the fixed number of queries with its expectation. The bound on the mutual information in Fano’s inequality in Appendix D can be proven using the same argument as Lemma 27 in Gu and Xu (2023), and the divergence decomposition lemma (Lemma A.3) still holds for variable length due to Lemma 15 in Kaufmann et al. (2016).
J-B Upper Bounds
Now it suffices to prove the upper bounds. For OR and MAX, Theorem I.1 already gives an upper bound for the fixed-length setting in which each comparison of two elements with a prescribed error probability is implemented by repetition. For the variable-length setting, comparing two elements with error probability at most a given level requires fewer queries in expectation, via the variable-length comparison algorithm in Lemma 13 of Gu and Xu (2023). Thus, in Algorithm 1 we replace the repetition-based comparisons of Algorithm 2 with this variable-length comparison algorithm, which gives the stated expected query complexity. A similar algorithm can also be applied to compute MAX; the main difference is that for MAX we directly compare two elements instead of counting the number of 1s of the first element.
Theorem J.1.
The expected number of total queries made by Algorithm 1 is upper bounded by . Furthermore, the algorithm outputs the correct value of with probability at least when .
Proof.
First, we compute the total number of queries of the algorithm. Without loss of generality, we may assume that n is a power of 2; if not, we may add no more than n extra dummy 0’s to the original list. In each outer iteration, the size of the surviving set is halved, so after logarithmically many iterations the set contains only one element. From Lemma 27 in Gu and Xu (2023), the expected number of queries made in each round is bounded, and thus the total expected number of queries is
Here in last inequality we use the fact below, which can be verified numerically:
(6) |
Now we show that the failure probability of the algorithm is at most the target level. Consider the first case where all the elements are 0. Then no matter which element is left, the probability that the algorithm fails is the probability that the last while loop gives the wrong output. From Lemma 27 in Gu and Xu (2023), we know that this probability is at most the allotted level.
Consider the second case where there exists at least a in the original elements . Without loss of generality, we assume that . Let be the remaining list of elements at the beginning of -th iteration. We let be the event that the first element in is while the first element in is . This event only happens when the while loop ends with for the first element at the -th round. Let denote the event that the only left element is in the last round, we have
We can upper bound the error probability under event with . Thus the total failure probability is upper bounded by when .
∎
The upper bound for SEARCH is due to Gu and Xu (2023), and is thus omitted here. The upper bound for SORT is based on that for SEARCH: we can design an insertion-based sorting algorithm by adding elements sequentially to an initially empty sorted set via noisy searching, as in Algorithm 1 of (Wang et al., 2023). Each insertion step requires the corresponding searching query complexity, so overall the stated number of queries suffices to achieve error probability at most δ. Rescaling gives the final rate.