Generalized Sorting with Predictions

Shanghai University of Finance and Economics, China and Huawei TCS Lab; … University; Shanghai Jiao Tong University; Peking University
Abstract
The generalized sorting problem, also known as sorting with forbidden comparisons, was first introduced by Huang et al. [5] together with a randomized algorithm which requires $\widetilde{O}(n^{1.5})$ probes. We study this problem with additional predictions for all pairs of allowed comparisons as input. We propose a randomized algorithm which uses $O(n\log n+w)$ probes with high probability and a deterministic algorithm which uses $O(nw)$ probes, where $n$ is the number of elements and $w$ is the number of mistakes made by the prediction.
Keywords: Algorithm, Generalized Sorting, Prediction

1 Introduction
1.1 Generalized sorting
Sorting is arguably the most basic computational task and is widely used as a component of many other algorithms. Pair-wise comparison is the core of most sorting algorithms. In the standard model, it is assumed that we can compare any pair of elements we want and that all comparisons have the same cost. However, this may not be the case in many applications. There might be constraints which forbid us to compare some pairs of elements, or the costs of comparing different pairs of elements may differ. The non-uniform cost sorting model is studied in [3, 4, 6]; a special case called the matching nuts and bolts problem is studied in [1, 7].
In this paper, we study the model introduced by Huang, Kannan and Khanna [5], which is known as the generalized sorting problem or sorting with forbidden pairs. In this model, only a subset of the comparisons is allowed, and each allowed comparison has the same cost. We can view the elements as the vertices of an undirected graph $G=(V,E)$; each edge represents a pair of elements which may be compared. A comparison of two elements $u$ and $v$ reveals the direction of the edge $\{u,v\}$, i.e. whether $u<v$ or $v<u$. It is guaranteed that the hidden directed graph $\vec{G}_A$ is acyclic and contains a Hamiltonian path, which represents the total order of all elements. Our goal is to adaptively probe edges in $E$ to find this Hamiltonian path.
For the generalized sorting problem, the performance of an algorithm is measured by the number of edges it probes. In the standard sorting problem, where $G$ is a complete graph, the minimum number of probes required is $\Theta(n\log n)$. For general graphs, one may need more probes. Huang et al. [5] proved an upper bound of $\widetilde{O}(n^{1.5})$ on the number of probes by giving a randomized algorithm. When the graph is dense and the number of edges is as large as $\binom{n}{2}-q$, Banerjee and Richards [2] propose a deterministic algorithm which makes $O((q+n)\log n)$ probes, together with a randomized algorithm which makes $\widetilde{O}(\sqrt{q}\,n+n)$ probes with high probability. The general problem remains largely open.
1.2 Algorithms with predictions
Recently, there has been an interesting line of research on algorithm design with predictions [9], motivated by the observation that, by making use of predictions provided by machine learning, one may be able to design more effective algorithms; examples include caching [8, 11] and online algorithms via ML predictions [10]. Normally, the better the prediction, the better the performance. In this framework, we aim for algorithms which have near-optimal performance when the predictions are good, and which are no worse than the prediction-less case when the predictions have large errors. These two targets in algorithm design with predictions are called consistency and robustness respectively.
Take the classic binary search algorithm as an example: it can find the position of an existing element in a sorted list of length $n$ in $O(\log n)$ comparisons, starting by querying the median of the list. However, if a machine learning algorithm can roughly estimate the position of the given element, it may not always be a good idea to start from the middle. Based on this idea, one can design an algorithm with query complexity $O(\log \eta)$, where $\eta$ is the distance between the true position of the element and the estimated one, which measures the accuracy of the prediction. This algorithm can be much better than $O(\log n)$ when $\eta$ is much smaller than $n$, and no worse than prediction-less binary search even if the estimate is terribly wrong, because $\eta$ is at most $n$.
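As a concrete illustration, here is a small Python sketch of such a prediction-guided binary search (our own rendering, not taken from the cited works): it brackets the target by doubling steps outward from the predicted index, then finishes with a standard binary search inside the bracket.

```python
def predicted_binary_search(arr, target, guess):
    """Locate `target` in the sorted list `arr`, starting from the predicted
    index `guess`. Uses O(log eta) comparisons, where eta is the distance
    between `guess` and the true position. Assumes `target` is present."""
    n = len(arr)
    probes = [0]

    def leq(i):                      # one comparison: is arr[i] <= target?
        probes[0] += 1
        return arr[i] <= target

    guess = max(0, min(n - 1, guess))
    step = 1
    if leq(guess):                   # target lies at or to the right of guess
        lo, hi = guess, guess + 1
        while hi < n and leq(hi):    # exponential search to the right
            step *= 2
            lo, hi = hi, hi + step
        hi = min(hi, n - 1)
    else:                            # target lies strictly to the left
        lo, hi = guess - 1, guess
        while lo >= 0 and not leq(lo):   # exponential search to the left
            step *= 2
            hi, lo = lo, lo - step
        lo = max(lo, 0)
    while lo < hi:                   # standard binary search inside bracket
        mid = (lo + hi + 1) // 2
        if leq(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo, probes[0]             # index of target, comparisons used
```

With a near-perfect prediction the bracket is tiny (a handful of comparisons), while a wildly wrong prediction degrades gracefully to roughly twice the cost of plain binary search.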
1.3 Our results
In this paper, we initiate the study of generalized sorting with predictions. The model is very natural: besides the undirected graph $G=(V,E)$, we are also given an orientation $\vec{G}_P$ of $G$ as input, which is a prediction of the hidden directions of the edges. The number of mis-predicted edges is denoted by $w$. With the help of predictions, we hope to improve the $\widetilde{O}(n^{1.5})$ bound when $w$ is small.
In Section 3, we propose a randomized algorithm for the generalized sorting with predictions problem and prove that it makes at most $O(n\log n+w)$ probes with high probability. The description of the algorithm is simple, while the analysis is quite subtle and involved.
Theorem 1.1 (An $O(n\log n+w)$ randomized algorithm).
There is a polynomial time randomized algorithm which solves the generalized sorting with predictions problem using $O(n\log n+w)$ probes with high probability.
In Section 4, we also propose a deterministic algorithm using $O(nw)$ probes, in order to show that when $w$ is as small as a constant, the generalized sorting with predictions problem can be solved using only a linear number of probes.
Theorem 1.2 (An $O(nw)$ deterministic algorithm).
There is a polynomial time deterministic algorithm which solves the generalized sorting with predictions problem using $O(nw)$ probes.
Note that in the query complexity model, if we have two algorithms $A_1$ and $A_2$ which use $q_1$ and $q_2$ queries respectively, we can simply merge them into an algorithm $A$ which simulates $A_1$ and $A_2$ and makes $A_1$'s queries and $A_2$'s queries alternately. Then $A$ uses only $O(\min(q_1,q_2))$ queries. Therefore, by combining our algorithms with the one in [5], both consistency and robustness can be achieved.
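The merging trick can be made concrete with coroutines. The sketch below is our own illustration (the generator-based encoding and the toy oracle answering index queries are assumptions of this sketch): it alternates the queries of two algorithms and stops as soon as either finishes, so its query count is at most about twice the better of the two.

```python
def interleave(gen_a, gen_b, oracle):
    """Run two query-making generators in lockstep. Each generator yields
    queries, receives oracle answers via send(), and returns its result.
    Returns (answer, total_queries) of whichever algorithm finishes first."""
    gens = [gen_a, gen_b]
    pending = [next(g) for g in gens]     # first query of each algorithm
    queries = len(gens)
    while True:
        for i, g in enumerate(gens):
            try:
                pending[i] = g.send(oracle(pending[i]))
                queries += 1
            except StopIteration as stop:
                return stop.value, queries

def linear_scan(target):
    i = 0
    while True:
        if (yield i) == target:           # query arr[i]
            return i
        i += 1

def binary_search(n, target):
    lo, hi = 0, n - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        val = yield mid                   # query arr[mid]
        if val == target:
            return mid
        lo, hi = (mid + 1, hi) if val < target else (lo, mid - 1)
```

For example, with `arr = list(range(100))` and target 3, the linear scan alone needs four queries, and the combined run finishes within about twice that, regardless of how long binary search would have taken.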
2 Preliminaries
The input of the generalized sorting with predictions problem is an undirected graph $G=(V,E)$ together with an orientation $\vec{G}_P$ of $G$. There is another orientation $\vec{G}_A$ of $G$ which represents the underlying total order and is unknown to us. The problem is formally stated as follows:
Definition 2.1 (generalized sorting with predictions).
An instance of the generalized sorting with predictions problem can be represented as $(G, \vec{G}_A, \vec{G}_P)$, where
• $G=(V,E)$ is an undirected graph.
• $\vec{G}_A=(V,\vec{E}_A)$ and $\vec{G}_P=(V,\vec{E}_P)$ are two orientations of $G$, i.e. for each $\{u,v\}\in E$, exactly one of $(u,v)\in\vec{E}_A$ and $(v,u)\in\vec{E}_A$ holds, and exactly one of $(u,v)\in\vec{E}_P$ and $(v,u)\in\vec{E}_P$ holds.
• $\vec{G}_A$ is the directed graph which represents the underlying total order; it is guaranteed that $\vec{G}_A$ is acyclic and contains a directed Hamiltonian path.
• $\vec{G}_P$ is the predicted directed graph; there are no further guarantees about $\vec{G}_P$.
An edge whose direction differs between $\vec{G}_A$ and $\vec{G}_P$ is called a mis-predicted edge. An in-neighbor of $v$ in $\vec{G}_P$ which is not an in-neighbor of $v$ in $\vec{G}_A$ is called a wrong in-neighbor of $v$.
We use $n$ to denote the number of vertices and $w$ to denote the number of mis-predicted edges. The input and the required output of the problem are stated as follows:
• Input: $(G, \vec{G}_P)$.
• Output: a permutation $(v_1,\dots,v_n)$ of $V$ s.t. $(v_i,v_{i+1})\in\vec{E}_A$ for all $1\le i<n$, which represents the directed Hamiltonian path in $\vec{G}_A$.
Note that $\vec{G}_A$ is not given as input, but is fixed at the very beginning and does not change while the algorithm is executing.
An algorithm can adaptively probe an edge $\{u,v\}\in E$ and learn its direction in $\vec{G}_A$. The performance of an algorithm is measured by the number of edges it probes, which may be expressed in terms of $n$ and $w$.
Recall that $\vec{G}_A$ is acyclic, so any subset $F\subseteq\vec{E}_A$ naturally defines a partial order on $V$. Sometimes we focus on a vertex set $U\subseteq V$ and only consider the partial order in the induced subgraph $\vec{G}_A[U]$:
Definition 2.2 (partial orders $\prec_F$ and $\prec_U$).
• A subset $F\subseteq\vec{E}_A$ defines a partial order on $V$, referred to as $\prec_F$: for $u,v\in V$, $u\prec_F v$ iff there is a directed path from $u$ to $v$ consisting only of edges in $F$.
• A vertex set $U\subseteq V$ defines a partial order on the induced subgraph $\vec{G}_A[U]$, referred to as $\prec_U$: for $u,v\in U$, $u\prec_U v$ iff there is a directed path from $u$ to $v$ which consists only of edges in $\vec{E}_A$ and passes only through vertices in $U$.
Definition 2.3 (vertex sets $P_v$, $A_v$, $C_v$).
Denote by $P_v$ the set of all in-neighbors of $v$ in the prediction graph, i.e. $P_v=\{u:(u,v)\in\vec{E}_P\}$.
Denote by $A_v$ the set of real in-neighbors of $v$ among $P_v$, i.e. $A_v=\{u\in P_v:(u,v)\in\vec{E}_A\}$.
Denote by $C_v$ (with respect to a specific moment) the set of in-neighbors of $v$ which are not known to be wrong at that moment, i.e. the corresponding edges are either correct or unprobed: $C_v=P_v\setminus\{u:(v,u)\in F\}$, where $F$ (also with respect to a specific moment) is the set of probed directed edges up to that moment.
Note that by definition it always holds that $A_v\subseteq C_v\subseteq P_v$. $P_v$ and $A_v$ are fixed, while $C_v$ may change over time. Initially there are no probed edges, so $C_v=P_v$. As the algorithm proceeds, some mis-predicted edges incident to $v$ are found, the corresponding wrong in-neighbors leave $C_v$, and $C_v$ finally shrinks to $A_v$.
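In code, these sets are straightforward to maintain. The following sketch uses our own encoding (directed edges as pairs `(u, x)` meaning $u\to x$) and is purely illustrative:

```python
def in_neighbor_sets(v, pred, actual, probed):
    """pred/actual: sets of directed edges (u, x) meaning u -> x;
    probed: the directed edges F revealed so far (a subset of actual)."""
    P = {u for (u, x) in pred if x == v}            # predicted in-neighbors
    A = {u for u in P if (u, v) in actual}          # real in-neighbors among P
    C = {u for u in P if (v, u) not in probed}      # not yet refuted
    return P, A, C

# toy instance: true order 0 < 1 < 2, prediction flips the edge {0, 2}
pred = {(0, 1), (1, 2), (2, 0)}
actual = {(0, 1), (1, 2), (0, 2)}
P, A, C = in_neighbor_sets(0, pred, actual, probed={(0, 2)})
# 2 is a wrong in-neighbor of 0; probing {0, 2} removed it from C
```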
3 An algorithm using $O(n\log n+w)$ probes
3.1 Description
The algorithm maintains a set $S$ of vertices satisfying that the directions of all edges between vertices of $S$ are known to us (either probed or deduced from other probed edges); in particular, the directions of all edges in the induced subgraph $G[S]$ are known. When $S=V$, the directions of all edges are known and we can easily find the desired Hamiltonian path.
We initialize $S$ as $\emptyset$, then iteratively add ‘ideal vertices’ to $S$, which are defined as follows:
Definition 3.1 (ideal vertex).
A vertex $v\in V\setminus S$ is called an ideal vertex if both of the following conditions are satisfied:
1. $C_v\subseteq S$.
2. The partial order $\prec_S$ restricted to $C_v$ is a total order.
Before adding a vertex $v$ to $S$, we need to determine the directions of the edges between $v$ and $C_v$ (those between $v$ and $P_v\setminus C_v$ are already known to be mis-predicted, and the predicted out-edges from $v$ into $S$ were probed when their heads entered $S$). For an ideal vertex $v$, this can be done using a straightforward strategy: repeatedly probe the edge $(u,v)$, where $u$ is the largest vertex in $C_v$ with respect to $\prec_S$. If the direction of this edge is as predicted, i.e. $(u,v)\in\vec{E}_A$, we can conclude by transitivity that the directions of all edges between $C_v$ and $v$ are as predicted; we end this phase and add $v$ to $S$. Otherwise $(u,v)$ is a mis-predicted edge, $u$ is removed from $C_v$, and we move on to probe the edge between the new largest vertex in $C_v$ and $v$, and so on.
If there is an ideal vertex, we are in an ideal case: by probing only one edge, we either learn the directions of all edges between $v$ and $S$ and add $v$ into $S$, or find a mis-predicted edge. Notice that each vertex is added to $S$ only once, and the wrong probes are charged to the $O(w)$ term of the complexity. Therefore we can add all vertices to $S$ using only $O(n+w)$ probes, assuming there is always an ideal vertex at each step.
However, this assumption does not always hold, due to the existence of mis-predicted edges: there may be a moment when there is no ideal vertex. We therefore relax the conditions to define a new type of vertex, which always exists:
Definition 3.2 (active vertex).
A vertex $v\in V\setminus S$ is called an active vertex if both of the following conditions are satisfied:
1. $A_v\subseteq S$.
2. The partial order $\prec_S$ restricted to $A_v$ is a total order.
Lemma 3.3.
There is at least one active vertex in $V\setminus S$ if $S\neq V$.
Proof 3.4.
Suppose the Hamiltonian path in $\vec{G}_A$ is $v_1\to v_2\to\cdots\to v_n$. Let $i$ be the smallest index s.t. $v_i\notin S$; then $v_i$ satisfies
1. $A_{v_i}\subseteq\{v_1,\dots,v_{i-1}\}\subseteq S$.
2. $\prec_S$ restricted to $A_{v_i}$ is a total order, since $v_1,\dots,v_{i-1}$ are totally ordered through the path edges, which all lie inside $S$.
Therefore $v_i$ is active at the moment.
An ideal vertex is always active since $A_v\subseteq C_v$ holds, but there may be some wrong in-neighbors not identified yet (the vertices in $C_v\setminus A_v$) which prevent an active vertex from being ideal. By cleverly identifying the wrong in-neighbors and removing them from $C_v$, an active vertex becomes an ideal one and we can use the above strategy again.
As $\vec{G}_A$ is invisible to us, we know neither which vertices are active, nor which in-neighbors of a vertex are wrong, so we turn to focus on the in-neighbors of $v$ which prevent it from being ideal:
1. $u\in C_v$ s.t. $u\notin S$.
2. $u_1,u_2\in C_v$ s.t. $u_1$ and $u_2$ are incomparable under $\prec_S$.
Now consider an active vertex $v$. If such a $u$ as in case 1 exists, then $(u,v)$ must be a mis-predicted edge, since $A_v\subseteq S$. If such a pair $u_1,u_2$ as in case 2 exists, at least one of $(u_1,v)$ and $(u_2,v)$ is mis-predicted, since $\prec_S$ is total on $A_v$. By repeatedly probing $(u,v)$, or both $(u_1,v)$ and $(u_2,v)$, we can keep removing wrong in-neighbors from $C_v$ and finally make $v$ ideal.
For an inactive vertex $v$, the direction of $(u,v)$, or of both $(u_1,v)$ and $(u_2,v)$, may be correct, but in that case we learn that $v$ is not active and hence not the vertex we are looking for. The vertex $u$ or the pair of vertices $(u_1,u_2)$ is then called a certificate for $v$, which proves that $v$ is currently not active.
Definition 3.5 (certificate).
For a vertex $v\notin S$,
1. A type-1 certificate is a vertex $u\in A_v$ (the edge $(u,v)$ has been probed and confirmed) s.t. $u\notin S$.
2. A type-2 certificate is a pair of different vertices $u_1,u_2\in A_v$ s.t. $u_1$ and $u_2$ are incomparable under $\prec_S$.
Once a certificate for $v$ is found, we turn to check the activeness of other vertices and do not need to probe any other incoming edges of $v$ until the next vertex is added to $S$. For a fixed set $S$, both the activeness of vertices and the validity of certificates are determined and do not change when new probes are made. Only when $S$ is extended and the current certificate of $v$ is no longer valid do we need to look for a new certificate for $v$. A type-1 certificate $u$ becomes invalid when $u$ is added into $S$, while a type-2 certificate $(u_1,u_2)$ becomes invalid when $u_1$ and $u_2$ become comparable as $S$ expands.
In the worst case, one may need to update the certificates again and again and thus probe too many edges. By checking the validity of certificates in a random order, the worst case is avoided with high probability. We prove in the next subsection that our algorithm uses only $O(n\log n+w)$ probes with high probability, where the $O(n\log n)$ term comes from the probes used in re-searching for valid certificates.
Our algorithm works by repeatedly choosing a vertex $v\notin S$ which does not have a valid certificate.
1. If $v$ is an ideal vertex, we use the strategy mentioned above to determine the directions of the edges between $v$ and $S$, then add $v$ to $S$.
2. Otherwise there must be a vertex $u\in C_v$ s.t. $u\notin S$, or a pair of vertices $u_1,u_2\in C_v$ which are incomparable under $\prec_S$. We randomly choose such a vertex or such a pair of vertices and probe the edge(s) between $v$ and them. Then either at least one mis-predicted edge is found, or a valid certificate for $v$ is found.
Since there is always an active vertex $v^*$, after finding enough mis-predicted edges and removing the corresponding wrong in-neighbors from $C_{v^*}$, $v^*$ must become an ideal vertex; that is how the algorithm makes progress.
This completes the description of the algorithm.
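To make the control flow concrete, here is a simplified runnable Python sketch of the loop described above. It is our own reconstruction, not the paper's pseudo code: the instance encoding (`pred_edges` lists predicted directions $(u,v)$ meaning $u\to v$, and `actual_dir` plays the probe oracle) and all helper names are assumptions of this sketch.

```python
import random
from itertools import combinations

def generalized_sort(n, pred_edges, actual_dir, seed=0):
    """Vertices are 0..n-1. actual_dir(u, v) is the probe oracle: True iff
    the hidden direction is u -> v. Returns (hamiltonian_path, probes)."""
    rng = random.Random(seed)
    P = {v: set() for v in range(n)}             # predicted in-neighbors
    for (u, v) in pred_edges:
        P[v].add(u)
    C = {v: set(P[v]) for v in range(n)}         # not-yet-refuted in-neighbors
    det = {}                                     # frozenset{u,v} -> real (tail, head)
    S, cert, probes = set(), {}, 0

    def probe(u, v):                             # reveal the real direction
        nonlocal probes
        e = frozenset((u, v))
        if e not in det:
            probes += 1
            det[e] = (u, v) if actual_dir(u, v) else (v, u)
        return det[e]

    def less(a, b):                              # a precedes b inside S
        seen, todo = {a}, [a]
        while todo:
            x = todo.pop()
            if x == b:
                return True
            for (t, h) in det.values():
                if t == x and h in S and h not in seen:
                    seen.add(h)
                    todo.append(h)
        return False

    def comparable(a, b):
        return less(a, b) or less(b, a)

    def cert_valid(v):
        if v not in cert:
            return False
        kind, c = cert[v]
        return (c not in S) if kind == 1 else (not comparable(*c))

    while len(S) < n:
        v = next(x for x in range(n) if x not in S and not cert_valid(x))
        t1 = [u for u in C[v] if u not in S]               # outside-S witnesses
        t2 = [p for p in combinations(sorted(C[v] - set(t1)), 2)
              if not comparable(*p)]                       # incomparable pairs
        if not t1 and not t2:                              # v is ideal
            while C[v]:
                u = max(C[v], key=lambda x: sum(less(y, x) for y in C[v]))
                if probe(u, v) == (u, v):                  # prediction confirmed
                    for x in C[v]:                         # rest by transitivity
                        det.setdefault(frozenset((x, v)), (x, v))
                    break
                C[v].discard(u)                            # mis-prediction found
            S.add(v)
        else:                                              # probe a random witness
            pick = rng.choice(t1 + t2)
            group = list(pick) if isinstance(pick, tuple) else [pick]
            results = [probe(u, v) == (u, v) for u in group]
            if all(results):                               # valid certificate
                cert[v] = (1, pick) if len(group) == 1 else (2, pick)
            else:
                for u, ok in zip(group, results):
                    if not ok:
                        C[v].discard(u)
    order = sorted(range(n), key=lambda v: sum(less(u, v) for u in range(n)))
    return order, probes
```

On a small instance whose prediction flips a couple of edges, the sketch recovers the hidden order while probing only part of the edge set; this is a plausibility check of the control flow, not a substitute for the analysis below.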
3.2 Analysis
We first prove that the algorithm can always proceed, and always terminates.
Lemma 3.6.
The algorithm can always proceed, and will terminate in finitely many steps.
Proof 3.7.
We know from Lemma 3.3 that there is always at least one active vertex outside $S$ if $S\neq V$. An active vertex $v$ can have no valid certificate: a type-1 certificate $u\in A_v\setminus S$ would contradict $A_v\subseteq S$, and a type-2 certificate would contradict $\prec_S$ being total on $A_v$. Hence, whenever the algorithm looks for a vertex without a valid certificate, there is always at least one vertex which meets the requirement.
In each execution of the ‘while’ loop, exactly one of the following happens:
1. We find a certificate for a vertex $v$ which did not previously have a valid one.
2. We find a mis-predicted edge.
3. We add a vertex into $S$.
When case 1 happens, we find a new certificate for $v$. Only when this certificate becomes invalid do we need to look for a new one for $v$. An invalid certificate never becomes valid again, since $S$ only grows and comparabilities under $\prec_S$ are never destroyed. Therefore this case happens finitely many times.
Case 2 happens finitely many times because once we find a mis-predicted edge, we remove a vertex from some $C_v$, whose size is initially finite and never increases.
Case 3 also happens finitely many times because each vertex is added into $S$ only once.
Now we proceed to analyze the total number of probes made by the algorithm. First we introduce some important lemmas:
Lemma 3.8.
All vertices are added into $S$ in a fixed order, regardless of the randomness.
Proof 3.9.
Recall that an ideal vertex is always active, so we only ever add active vertices to $S$. Whether a vertex is active or not depends only on the current $S$. Therefore, for every $i$, when $|S|=i-1$, the $i$-th vertex added into $S$ is always the one with the smallest index among all active vertices in $V\setminus S$, regardless of the randomness.
Corollary 3.10.
Let $T_{u_1,u_2}$ denote the event that the vertex pair $(u_1,u_2)$ becomes comparable under $\prec_S$, i.e. $u_1\prec_S u_2$ or $u_2\prec_S u_1$. As $S$ expands, all such events happen in a fixed order regardless of the randomness (breaking ties in an arbitrarily fixed manner).
Remark. Lemma 3.8 and Corollary 3.10 use the fact that we determine the order among vertices in $S$ not according to all edges probed so far, but only according to the edges in the induced subgraph $\vec{G}_A[S]$. It might seem more efficient to use all the information instead of this restricted portion, but that would create subtle correlations which we do not know how to analyze. The above fixed-order properties are crucial in the proof of our main theorem.
Lemma 3.11.
For a uniformly random permutation $(p_1,\dots,p_n)$ of $[n]$, the number of indices $i$ s.t. $p_i<\min\{p_1,\dots,p_{i-1}\}$ does not exceed $6(\ln n+1)$ w.p. at least $1-n^{-4}$.
Proof 3.12.
A random permutation can be built in the following way:
• For each $i\in[n]$, pick $x_i$ uniformly and independently from $[0,1]$.
• Take the unique permutation $(p_1,\dots,p_n)$ s.t. $x_i$ is the $p_i$-th smallest element in $\{x_1,\dots,x_n\}$ (ties occur with probability $0$).
It is easy to see that a random permutation built in this way is uniformly distributed.
Let $X_1,\dots,X_n$ be 0-1 random variables indicating whether $p_i<\min\{p_1,\dots,p_{i-1}\}$ (with $X_1=1$), and let $X=\sum_{i=1}^n X_i$ be the number mentioned in the lemma. Since $X_i=1$ iff $x_i$ is the smallest among $x_1,\dots,x_i$, we have $\Pr[X_i=1]=1/i$; moreover, the $X_i$ are mutually independent, as the rank of $x_i$ within its own prefix is uniform and independent across $i$.
Note that $\mu=\mathbb{E}[X]=\sum_{i=1}^n\frac{1}{i}=H_n\le\ln n+1$, as $H_n=\ln n+\gamma+o(1)$ where $\gamma\approx 0.577$ is the Euler constant.
Plugging into a Chernoff bound of the form $\Pr[X\ge R]\le 2^{-R}$ for $R\ge 6\mu$, with $R=6(\ln n+1)$ we obtain $\Pr[X\ge 6(\ln n+1)]\le 2^{-6\ln n}\le n^{-4}$.
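A quick simulation (our own, purely illustrative) confirms the behavior the lemma formalizes: the count of such ‘record’ indices concentrates around $H_n\approx\ln n$, far below the $6(\ln n+1)$ bound.

```python
import random

def prefix_minima(perm):
    """Count indices i with perm[i] < min(perm[:i])."""
    count, cur = 0, float("inf")
    for x in perm:
        if x < cur:
            count += 1
            cur = x
    return count

rng = random.Random(1)
n, trials = 1000, 300
avg = sum(prefix_minima(rng.sample(range(n), n)) for _ in range(trials)) / trials
# For n = 1000, the expectation H_n is about 7.5
```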
Theorem 1.1 (restated). There is a polynomial time randomized algorithm which solves the generalized sorting with predictions problem using $O(n\log n+w)$ probes with high probability.
Proof 3.13.
The correctness of the algorithm directly follows from Lemma 3.6. We only need to bound the number of probes it uses.
Following the proof of Lemma 3.6, we know that in each execution of the ‘while’ loop exactly one of the three cases happens:
• We find a certificate for a vertex $v$ which did not previously have a valid one.
• We find a mis-predicted edge.
• We add a vertex into $S$.
The algorithm makes no more than two probes in each loop, so it suffices to bound the number of occurrences of each case.
Cases 2 and 3 happen at most $w+n$ times in total, because the number of mis-predicted edges is at most $w$ and each vertex is added into $S$ exactly once.
We now focus on case 1. Once a valid certificate for $v$ is found, we make no further probes for $v$ until its current certificate becomes invalid. According to Lemma 3.8, all vertices are added into $S$ in a fixed order, which means all possible type-1 certificates for $v$ (the set of which is initially $A_v$) become invalid in a fixed order.
Each time, we find a uniformly random type-1 certificate for $v$ among its currently valid ones. In the analysis this can be equivalently viewed as follows: for each $v$, a random permutation $(u_1,\dots,u_k)$ of $A_v$ is chosen and fixed at first, and all type-1 certificates used in the process are identified according to this permutation: when a new valid type-1 certificate is needed, let it be $u_j$, where $j$ is the smallest index s.t. $u_j$ is currently valid as a type-1 certificate for $v$.
Let $r_j$ denote the position in this random permutation of the $j$-th certificate to become invalid. Then $(r_1,\dots,r_k)$ is itself a uniformly random permutation of $[k]$, and the total number of valid type-1 certificates found for $v$ is at most the number of indices $j$ s.t. $r_j<\min\{r_{j+1},\dots,r_k\}$, which by symmetry has the same distribution as the quantity in Lemma 3.11.
From Lemma 3.11, w.p. at least $1-n^{-4}$ this number does not exceed $6(\ln n+1)=O(\log n)$. Taking a union bound over all $v$, w.p. at least $1-n^{-3}$ no vertex uses more than $O(\log n)$ type-1 certificates, hence the total number of valid type-1 certificates the algorithm finds does not exceed $O(n\log n)$.
The above analysis is exactly the same for type-2 certificates, since according to Corollary 3.10 the possible type-2 certificates for $v$ also become invalid in a fixed order. The only difference is that the number of possible type-2 certificates for each $v$ may be up to $O(n^2)$, while it is at most $n$ for type-1 certificates; applying Lemma 3.11 to sequences of this length still gives a bound of $O(\log n)$ per vertex. Again using Lemma 3.11 and taking a union bound, w.p. at least $1-n^{-3}$ the total number of valid type-2 certificates the algorithm finds does not exceed $O(n\log n)$.
Combining all cases, we conclude that w.p. at least $1-O(n^{-3})$ the algorithm uses no more than $O(n\log n+w)$ probes in total.
4 An algorithm using $O(nw)$ probes
Here we briefly introduce a deterministic algorithm using $O(nw)$ probes, to show that the generalized sorting with predictions problem can be solved with only a linear number of probes when $w$ is as small as a constant.
The basic idea is to find a mis-predicted edge within $O(n)$ probes and correct it in the predicted graph. We use $\vec{G}_C=(V,\vec{E}_C)$ to denote the predicted graph after correction: $\vec{G}_C$ agrees with $\vec{G}_A$ on all probed edges and with $\vec{G}_P$ on all unprobed ones, i.e. $\vec{E}_C=F\cup\{(u,v)\in\vec{E}_P:(v,u)\notin F\}$, where $F$ is the set of directed edges probed so far.
If there is a directed cycle in $\vec{G}_C$, there must be at least one mis-predicted edge on the cycle, since the actual $\vec{G}_A$ is acyclic. We simply probe all edges on a simple directed cycle, update $\vec{G}_C$, and loop again.
If $\vec{G}_C$ is acyclic, consider running a topological sort on it. If the directions of all edges in $\vec{G}_C$ are correct, there should be exactly one vertex of in-degree $0$ at each step. If not, i.e. there are two vertices with in-degree $0$ at the same time, this can only happen when some mis-predicted edges are either adjacent to these two vertices or between consecutive vertices of the partial order produced by the topological sort so far. We probe all such edges and loop again.
This completes the description of the algorithm.
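As with the randomized algorithm, a runnable sketch may be helpful. The following Python rendering is our own simplified reconstruction (the instance encoding and helper names are assumptions of this sketch, not the paper's pseudo code):

```python
from collections import defaultdict

def generalized_sort_det(n, pred_edges, actual_dir):
    """Deterministic sketch. Vertices are 0..n-1; pred_edges lists predicted
    directions (u, v); actual_dir(u, v) is the probe oracle.
    Returns (hamiltonian_path, number_of_probes)."""
    cur = {frozenset(e): e for e in pred_edges}   # current best-guess direction
    probed, probes = set(), 0

    def probe(u, v):                              # reveal the real direction
        nonlocal probes
        e = frozenset((u, v))
        if e not in probed:
            probed.add(e)
            probes += 1
            cur[e] = (u, v) if actual_dir(u, v) else (v, u)
        return cur[e]

    def digraph():
        adj, indeg = defaultdict(list), defaultdict(int)
        for (t, h) in cur.values():
            adj[t].append(h)
            indeg[h] += 1
        return adj, indeg

    def find_cycle():                             # vertices of a directed cycle
        adj, _ = digraph()
        color, stack = {}, []
        def dfs(x):
            color[x] = 1
            stack.append(x)
            for y in adj[x]:
                if color.get(y, 0) == 0:
                    c = dfs(y)
                    if c:
                        return c
                elif color[y] == 1:               # back edge closes a cycle
                    return stack[stack.index(y):]
            color[x] = 2
            stack.pop()
            return None
        for s in range(n):
            if color.get(s, 0) == 0:
                c = dfs(s)
                if c:
                    return c
        return None

    while True:
        cyc = find_cycle()
        if cyc:                                   # a cycle must hide a mistake
            for a, b in zip(cyc, cyc[1:] + cyc[:1]):
                probe(a, b)
            continue
        adj, indeg = digraph()
        order, clash = [], None
        avail = [x for x in range(n) if indeg[x] == 0]
        while avail:
            if len(avail) > 1:                    # two in-degree-0 vertices
                clash = avail[:2]
                break
            x = avail.pop()
            order.append(x)
            for y in adj[x]:
                indeg[y] -= 1
                if indeg[y] == 0:
                    avail.append(y)
        if clash is None:                         # unique order: verify it
            if all(probe(a, b) == (a, b) for a, b in zip(order, order[1:])):
                return order, probes
        else:                                     # probe around the two sources
            for e in [e for e in cur if clash[0] in e or clash[1] in e]:
                probe(*cur[e])
            for a, b in zip(order, order[1:]):
                probe(*cur[frozenset((a, b))])
```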
Theorem 1.2 (restated). There is a polynomial time deterministic algorithm which solves the generalized sorting with predictions problem using $O(nw)$ probes.
Proof 4.1.
In each execution of the ‘while’ loop, exactly one of the following three cases happens, and we analyze them separately:
1. There is a simple directed cycle in $\vec{G}_C$. Since the actual $\vec{G}_A$ is acyclic, at least one edge on the cycle is mis-predicted, so by probing all edges on it we find at least one mis-predicted edge.
2. $\vec{G}_C$ is acyclic and the topological sort terminates normally, i.e. there is exactly one in-degree-0 vertex at every step. Note that consecutive vertices of the resulting order $x_1,\dots,x_n$ must be adjacent in $G$: when $x_i$ is removed, the in-degree of $x_{i+1}$ drops to $0$, so $(x_i,x_{i+1})\in\vec{E}_C$. By probing all edges between adjacent vertices in the topological order, we either find a mis-predicted edge or confirm that the resulting path is the desired Hamiltonian path.
3. $\vec{G}_C$ is acyclic and at some step of the topological sort two different vertices $u$ and $v$ have in-degree $0$ simultaneously.
In this case, a mis-predicted edge must be found among the probed edges. We prove it by contradiction: assume all edges probed in this round (the edges incident to $u$ or $v$ and the edges between consecutive popped vertices) are correctly predicted in $\vec{G}_C$. Let $x_1,\dots,x_t$ be the vertices popped before the clash; since the edges $(x_i,x_{i+1})$ are correct, $x_1\prec x_2\prec\cdots\prec x_t$ holds by transitivity. Because every edge incident to $u$ is correct in $\vec{G}_C$ and $u$ has in-degree $0$ in the remaining graph, every real in-neighbor of $u$ lies in $\{x_1,\dots,x_t\}$; the same holds for $v$. W.l.o.g. assume $u\prec v$ in the hidden total order. The predecessor of $v$ on the Hamiltonian path is a real in-neighbor of $v$, hence equals some $x_j$; since $u\prec v$ and $u$ is not popped, $u\preceq x_j$ and in fact $u\prec x_j$. If $u$ had no predecessor at all, it would have in-degree $0$ from the very first step, forcing the clash at step $1$, i.e. $t=0$; but then $x_j$ cannot exist, a contradiction. So the predecessor of $u$ on the Hamiltonian path equals some $x_l$, and $x_l\prec u\prec x_j$ gives $l<j\le t$. Every real in-neighbor of $u$ precedes $u$ and is popped, hence lies in $\{x_1,\dots,x_l\}$; therefore, already after $x_l$ was popped, $u$ had in-degree $0$ in $\vec{G}_C$. Thus at step $l+1\le t$ both $u$ and $x_{l+1}$ had in-degree $0$, which contradicts the fact that the topological sort saw a unique in-degree-0 vertex at every step before the clash.
Therefore, in each loop we either find at least one mis-predicted edge in $\vec{G}_C$ or find the correct Hamiltonian path. The number of probes made in each loop is obviously $O(n)$, so the total number of probes does not exceed $O(n(w+1))=O(nw+n)$.
References
- [1] Noga Alon, Manuel Blum, Amos Fiat, Sampath Kannan, Moni Naor, and Rafail Ostrovsky. Matching nuts and bolts. In SODA, pages 690–696, 1994.
- [2] I. Banerjee and D. Richards. Sorting under forbidden comparisons. In SWAT, 2016.
- [3] Moses Charikar, Ronald Fagin, Venkatesan Guruswami, Jon Kleinberg, Prabhakar Raghavan, and Amit Sahai. Query strategies for priced information. Journal of Computer and System Sciences, 64(4):785–819, 2002.
- [4] Anupam Gupta and Amit Kumar. Sorting and selection with structured costs. In Proceedings 42nd IEEE Symposium on Foundations of Computer Science, pages 416–425. IEEE, 2001.
- [5] Zhiyi Huang, Sampath Kannan, and Sanjeev Khanna. Algorithms for the generalized sorting problem. In 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science, pages 738–747. IEEE, 2011.
- [6] Sampath Kannan and Sanjeev Khanna. Selection with monotone comparison costs. In Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms, page 10. SIAM, 2003.
- [7] János Komlós, Yuan Ma, and Endre Szemerédi. Matching nuts and bolts in $O(n\log n)$ time. SIAM Journal on Discrete Mathematics, 11(3):347–372, 1998.
- [8] Thodoris Lykouris and Sergei Vassilvitskii. Competitive caching with machine learned advice. arXiv preprint arXiv:1802.05399, 2018.
- [9] Michael Mitzenmacher and Sergei Vassilvitskii. Algorithms with predictions. arXiv preprint arXiv:2006.09123, 2020.
- [10] Manish Purohit, Zoya Svitkina, and Ravi Kumar. Improving online algorithms via ml predictions. In Advances in Neural Information Processing Systems, pages 9661–9670, 2018.
- [11] Dhruv Rohatgi. Near-optimal bounds for online caching with machine learned advice. In Proceedings of the Thirty-First Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1834–1845. SIAM, 2020.