
Affiliations: Shanghai University of Finance and Economics, China, and Huawei TCS Lab; Shanghai Jiao Tong University; Peking University.


Generalized Sorting with Predictions

Pinyan Lu    Xuandi Ren    Enze Sun    Yubo Zhang
Abstract

The generalized sorting problem, also known as sorting with forbidden comparisons, was first introduced by Huang et al. [5] together with a randomized algorithm that requires $\tilde{O}(n^{3/2})$ probes. We study this problem with additional predictions for all pairs of allowed comparisons given as input. We propose a randomized algorithm which uses $O(n\log n+w)$ probes with high probability and a deterministic algorithm which uses $O(nw)$ probes, where $w$ is the number of mistakes made by the prediction.

keywords:
Algorithm, Generalized Sorting, Prediction

1 Introduction

1.1 Generalized sorting

Sorting is arguably the most basic computational task and is widely used as a component of many other algorithms. Pairwise comparison is at the core of most sorting algorithms. In the standard model, it is assumed that any pair of elements can be compared, and that all comparisons have the same cost. However, this may not be the case in many applications. There might be constraints which forbid us from comparing certain pairs of elements, or the costs of comparing different pairs may differ. The non-uniform cost sorting model is studied in [3, 4, 6]. A special case called the matching nuts and bolts problem is studied in [1, 7].

In this paper, we study the model introduced by Huang, Kannan and Khanna [5], known as the generalized sorting problem or sorting with forbidden pairs. In this model, only a subset of the comparisons is allowed, and each allowed comparison has the same cost. We view the elements as the vertices of a graph; each undirected edge represents a pair of elements that may be compared. Comparing two elements $a,b$ reveals the direction of the edge $(a,b)$. It is guaranteed that the hidden directed graph is acyclic and contains a Hamiltonian path, which represents the total order of all elements. Our goal is to adaptively probe edges in $E$ to find this Hamiltonian path.

For the generalized sorting problem, the performance of an algorithm is measured by the number of edges it probes. In the standard sorting problem, where $G$ is a complete graph, the minimum number of probes required is $\Theta(n\log n)$. For general graphs, one may need more probes. Huang et al. [5] proved an upper bound of $\tilde{O}(n^{1.5})$ on the number of probes by giving a randomized algorithm. When the graph is dense and the number of edges is as large as $\binom{n}{2}-q$, [2] proposes a deterministic algorithm which makes $O((n+q)\log n)$ probes, together with a randomized algorithm which makes $\tilde{O}\left(n^{2}/\sqrt{q+n}+n\sqrt{q}\right)$ probes with high probability. Much of the generalized sorting problem remains open.

1.2 Algorithms with predictions

Recently, there has been an interesting line of research on algorithm design with predictions [9], motivated by the observation that predictions provided by machine learning may enable more effective algorithms. Normally, the better the prediction, the better the performance. In this framework, we aim for algorithms which have near-optimal performance when the predictions are good, and which are no worse than the prediction-less case when the predictions have large errors. These two targets are called consistency and robustness, respectively.

Take the classic binary search algorithm as an example: it can find the position of an element in a sorted list using $O(\log n)$ comparisons, starting by querying the median of the list. However, if a machine learning algorithm can roughly estimate the position of the given element, starting from the middle may not be the best choice. Based on this idea, one can design an algorithm with query complexity $O(\log w)$, where $w$ is the distance between the true position of the element and the estimated one, which measures the accuracy of the prediction. This algorithm can be much better than $O(\log n)$ when $w$ is much smaller than $n$, and it is no worse than prediction-less binary search even if the estimate is terribly wrong, because $w$ is at most $n$.
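As an illustration, here is a minimal sketch of such a prediction-guided search (the naming and structure are ours, not taken from the cited work): doubling steps widen a window around the predicted index until it must contain the target, then an ordinary binary search finishes inside it.

```python
import bisect

def predicted_search(arr, target, hint):
    """Search a sorted list starting from a predicted index `hint`.

    Doubling ("galloping") steps widen the window [lo, hi] around the
    hint until it must contain `target`; this takes O(log w) probes when
    the true position is w away from the hint.  A standard binary search
    then finishes inside the window.  Returns the index, or -1 if absent.
    """
    n = len(arr)
    if n == 0:
        return -1
    lo = hi = max(0, min(hint, n - 1))
    step = 1
    while lo > 0 and arr[lo] > target:        # gallop left
        lo = max(0, lo - step)
        step *= 2
    step = 1
    while hi < n - 1 and arr[hi] < target:    # gallop right
        hi = min(n - 1, hi + step)
        step *= 2
    i = bisect.bisect_left(arr, target, lo, hi + 1)
    return i if i <= hi and arr[i] == target else -1
```

With a perfect hint this costs $O(1)$ comparisons; with a useless hint the window grows to cover the whole list, matching ordinary binary search up to constant factors.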

Algorithms with predictions have been studied for caching [8, 11], ski rental and online scheduling [10], and other problems. See the nice survey by Mitzenmacher and Vassilvitskii [9].

1.3 Our results

In this paper, we initiate the study of generalized sorting with predictions. The model is natural: besides the undirected graph $G=(V,E)$, we are also given an orientation of $G$ as input, which serves as a prediction of the hidden directions of the edges. The number of mis-predicted edges is denoted by $w$. With the help of predictions, we hope to improve the bound of $\tilde{O}(n^{1.5})$ when $w$ is small.

In Section 3, we propose a randomized algorithm for the generalized sorting problem and prove that it probes at most $O(n\log n+w)$ edges with high probability. The description of the algorithm is simple, while the analysis is quite subtle and involved.

Theorem 1.1 (An $O(n\log n+w)$ randomized algorithm).

There is a polynomial-time randomized algorithm which solves the generalized sorting problem in $O(n\log n+w)$ probes with high probability.

In Section 4, we also propose a deterministic algorithm using $O(nw)$ probes, showing that when $w$ is as small as a constant, the generalized sorting with predictions problem can be solved using only linearly many probes.

Theorem 1.2 (An $O(nw)$ deterministic algorithm).

There is a polynomial-time deterministic algorithm which solves the generalized sorting problem in $O(nw)$ probes.

Note that in the query complexity model, if we have two algorithms $\mathcal{A}$ and $\mathcal{B}$ which use $O(f(n))$ and $O(g(n))$ queries respectively, we can simply merge them into an algorithm $\mathcal{C}$ which simulates $\mathcal{A}$ and $\mathcal{B}$, making $\mathcal{A}$'s queries and $\mathcal{B}$'s queries alternately and stopping as soon as either finishes. Then $\mathcal{C}$ uses only $O(\min(f(n),g(n)))$ queries. Therefore, by combining our algorithms with the one in [5], both consistency and robustness can be achieved.
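The merging argument can be sketched as follows, modeling each algorithm as a generator that yields once per query (the harness and names are ours, purely for illustration):

```python
def run_alternately(algo_a, algo_b):
    """Simulate two query-based algorithms side by side, letting them make
    their queries alternately, and stop as soon as either one finishes.
    Each algorithm is a generator that yields once per query and returns
    its answer; the combined query count is at most 2*min(q_A, q_B) + 1."""
    queries = 0
    while True:
        for g in (algo_a, algo_b):
            try:
                next(g)               # this algorithm makes one query
                queries += 1
            except StopIteration as done:
                return done.value, queries

def dummy_algorithm(num_queries, answer):
    """Stand-in that makes `num_queries` queries, then returns `answer`."""
    for _ in range(num_queries):
        yield
    return answer
```

For example, interleaving a 3-query and a 10-query stand-in stops after the faster one finishes, having spent 6 queries in total.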

2 Preliminaries

The input of the generalized sorting with predictions problem is an undirected graph $G=(V,E)$ together with an orientation $\vec{P}$ of $E$. There is another orientation $\vec{E}$ of $E$ which represents the underlying total order and is unknown to us. The problem is formally stated as follows:

Definition 2.1 (generalized sorting with prediction).

An instance of the generalized sorting with prediction problem can be represented as $(V,E,\vec{E},\vec{P})$, where

  • $G=(V,E)$ is an undirected graph.

  • $\vec{P},\vec{E}$ are two orientations of $E$, i.e. $\forall(u,v)\in E$, exactly one of $(u,v)\in\vec{P}$ and $(v,u)\in\vec{P}$ holds, and exactly one of $(u,v)\in\vec{E}$ and $(v,u)\in\vec{E}$ holds.

  • $\vec{G}=(V,\vec{E})$ is the directed graph which represents the underlying total order; it is guaranteed that $\vec{G}$ is acyclic and contains a directed Hamiltonian path.

  • $\vec{G}_{P}=(V,\vec{P})$ is the predicted directed graph; there are no further guarantees about $\vec{G}_{P}$.

An edge $(u,v)$ whose direction differs between $\vec{P}$ and $\vec{E}$ is called a mis-predicted edge. An in-neighbor of $u$ in $\vec{G}_{P}$ which is not an in-neighbor of $u$ in $\vec{G}$ is called a wrong in-neighbor.

We use $n=|V|$ to denote the number of vertices and $w=|\vec{P}\setminus\vec{E}|$ to denote the number of mis-predicted edges. The input and the required output of the problem are stated as follows:

  • Input: $(V,E,\vec{P})$

  • Output: $(v_{1},v_{2},\dots,v_{n})$ s.t. $\forall 1\leq i<n,(v_{i},v_{i+1})\in\vec{E}$, which represents the directed Hamiltonian path in $\vec{G}$.

Note that $\vec{E}$ is not given as input, but is fixed at the very beginning and does not change while the algorithm executes.

An algorithm can adaptively probe an edge to learn its direction in $\vec{E}$. The performance of an algorithm is measured by the number of edges it probes, which may be expressed in terms of $n$ and $w$.
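For concreteness, the probe model can be sketched as a small oracle class (a sketch with hypothetical names; a directed edge $(u,v)$ means $u<v$ in the hidden order):

```python
class SortingInstance:
    """A probe oracle for generalized sorting with predictions."""

    def __init__(self, hidden_edges, predicted_edges):
        self.E = set(hidden_edges)     # hidden orientation, not readable directly
        self.P = set(predicted_edges)  # the prediction given as input
        self.Q = set()                 # directed edges revealed by probes
        self.probes = 0                # the quantity an algorithm is charged for

    def probe(self, u, v):
        """Reveal the true direction of the undirected edge {u, v}."""
        self.probes += 1
        e = (u, v) if (u, v) in self.E else (v, u)
        self.Q.add(e)
        return e

    def T(self, u):
        """Predicted in-neighbors of u not yet refuted by a probe."""
        return {v for (v, x) in self.P if x == u and (u, v) not in self.Q}

    def mispredictions(self):
        """The error parameter w = |P \\ E| (not visible to an algorithm)."""
        return len(self.P - self.E)
```

For the order $1<2<3$ with the edge between 2 and 3 predicted backwards, `mispredictions()` is 1, and probing that edge shrinks the unrefuted in-neighbor set of 2 from {1, 3} to {1}.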

Recall that $\vec{G}=(V,\vec{E})$ is acyclic, so any subset $\vec{E}^{\prime}\subseteq\vec{E}$ naturally defines a partial order on $V$. Sometimes we focus on a vertex set $V^{\prime}$ and only consider the partial order in the induced subgraph $\vec{G}[V^{\prime}]$:

Definition 2.2 (partial orders $<_{\vec{E}^{\prime}}$ and $<_{V^{\prime}}$).
  • $\vec{E}^{\prime}\subseteq\vec{E}$ defines a partial order of $V$ on the graph $(V,\vec{E}^{\prime})$, referred to as $<_{\vec{E}^{\prime}}$. For $a,b\in V$, $a<_{\vec{E}^{\prime}}b$ iff there is a directed path from $a$ to $b$ consisting only of edges in $\vec{E}^{\prime}$.

  • $V^{\prime}\subseteq V$ defines a partial order of $V^{\prime}$ on the induced subgraph $\vec{G}[V^{\prime}]$, referred to as $<_{V^{\prime}}$. For $a,b\in V^{\prime}$, $a<_{V^{\prime}}b$ iff there is a directed path from $a$ to $b$ which consists only of edges in $\vec{E}$ and passes only through vertices in $V^{\prime}$.
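Both reachability-based orders can be checked with a simple DFS sketch (our helper, assuming directed edges are given as ordered pairs):

```python
from collections import defaultdict

def less_than(a, b, directed_edges, restrict_to=None):
    """Reachability test behind both orders: a <_{E'} b iff a directed path
    runs from a to b using only `directed_edges`.  Passing a vertex set as
    `restrict_to` additionally confines the path to that set, giving the
    induced-subgraph order <_{V'}."""
    adj = defaultdict(list)
    for u, v in directed_edges:
        if restrict_to is None or (u in restrict_to and v in restrict_to):
            adj[u].append(v)
    stack, seen = [a], {a}
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y == b:        # found a (nonempty) directed path a -> ... -> b
                return True
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return False
```

Restricting the vertex set can break comparability: a path through an excluded intermediate vertex no longer counts.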

Definition 2.3 (vertex sets $\mathcal{N}_{in}(\vec{G}_{P},u),S_{u},T_{u}$).

Denote by $\mathcal{N}_{in}(\vec{G}_{P},u)$ the set of all in-neighbors of $u$ in the prediction graph, i.e. $\mathcal{N}_{in}(\vec{G}_{P},u)=\{v\mid(v,u)\in\vec{P}\}$.

Denote by $S_{u}$ the set of real in-neighbors of $u$ among $\mathcal{N}_{in}(\vec{G}_{P},u)$, i.e. $S_{u}=\{v\mid(v,u)\in\vec{P}\cap\vec{E}\}$.

Denote by $T_{u}$ (with respect to a specific moment) the set of in-neighbors of $u$ which are not known to be wrong at that moment, i.e. the corresponding edges are either correct or unprobed: $T_{u}=\{v\mid(v,u)\in\vec{P}\land(u,v)\notin\vec{Q}\}$, where $\vec{Q}$ (also with respect to a specific moment) is the set of probed directed edges up to that moment.

Note that by definition $S_{u}\subseteq T_{u}\subseteq\mathcal{N}_{in}(\vec{G}_{P},u)$ always holds. $S_{u}$ and $\mathcal{N}_{in}(\vec{G}_{P},u)$ are fixed, while $T_{u}$ may change over time. Initially there are no probed edges, so $T_{u}=\mathcal{N}_{in}(\vec{G}_{P},u)$. As the algorithm proceeds, some mis-predicted edges between $\mathcal{N}_{in}(\vec{G}_{P},u)$ and $u$ are found, the corresponding wrong in-neighbors leave $T_{u}$, and $T_{u}$ eventually shrinks to $S_{u}$.

3 An algorithm using $O(n\log n+w)$ probes

3.1 Description

The algorithm maintains a set of vertices $A$ such that $\forall u\in A$, the directions of all edges between $\mathcal{N}_{in}(\vec{G}_{P},u)$ and $u$ are known to us (either probed or deducible from other probed edges). Notice that the directions of all edges in the induced subgraph $G[A]$ must then be known. When $A=V$, the directions of all edges are known and we can easily find the desired Hamiltonian path.

We initialize $A$ as $\emptyset$, then iteratively add ‘ideal vertices’ to $A$, which are defined as follows:

Definition 3.1 (ideal vertex).

A vertex $u\in V$ is called an ideal vertex if both of the following conditions are satisfied:

  1. $T_{u}\subseteq A$.

  2. The partial order $<_{A}$ restricted to $T_{u}$ is a total order.

Before adding a vertex $u$ to $A$, we need to determine the directions of the edges between $T_{u}$ and $u$ (those between $\mathcal{N}_{in}(\vec{G}_{P},u)\setminus T_{u}$ and $u$ are already known to be mis-predicted). For an ideal vertex $u$, this can be done with a straightforward strategy: repeatedly probe the edge $(t,u)$, where $t$ is the largest vertex in $T_{u}$ with respect to $<_{A}$. If the direction of this edge is correct, i.e. $t<_{\vec{E}}u$, we can conclude by transitivity that the directions of all edges between $T_{u}$ and $u$ are correct; we end this phase and add $u$ to $A$. Otherwise $(t,u)$ is a mis-predicted edge, $t$ is removed from $T_{u}$, and we move on to probe the edge between the new largest vertex in $T_{u}$ and $u$, and so on.
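The strategy for an ideal vertex can be sketched as follows (hypothetical interface: `probe(v, u)` is assumed to return True iff the predicted direction $v\to u$ is correct, and `T_u_sorted` lists $T_u$ in increasing order w.r.t. $<_A$):

```python
def resolve_ideal_vertex(u, T_u_sorted, probe):
    """Probing strategy for an ideal vertex u.  One correct probe confirms
    all remaining predicted edges by transitivity; each failed probe
    exposes one mis-predicted edge and discards its endpoint."""
    T = list(T_u_sorted)
    while T:
        t = T[-1]              # largest remaining vertex in T_u
        if probe(t, u):        # t < u confirmed => every vertex in T precedes u
            return T           # the verified in-neighbor set S_u
        T.pop()                # (t, u) was mis-predicted; remove t from T_u
    return []
```

Each probe either finishes the vertex or is chargeable to a misprediction, which is where the $O(n+w)$ count in the ideal case comes from.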

If there is an ideal vertex, we are in an ideal situation: by probing only one edge, we either learn the directions of all edges between $T_{u}$ and $u$ and add $u$ to $A$, or find a mis-predicted edge. Notice that each vertex is added to $A$ once, and the wrong probes are charged to the $w$ term of the complexity. Therefore we can add all vertices to $A$ in only $O(n+w)$ probes, assuming there is always an ideal vertex at each step.

However, this assumption does not always hold due to the existence of mis-predicted edges, i.e. there may be a moment when there is no ideal vertex. We therefore relax the conditions and define a new type of vertex, which always exists:

Definition 3.2 (active vertex).

A vertex $u\in V$ is called an active vertex if both of the following conditions are satisfied:

  1. $S_{u}\subseteq A$.

  2. The partial order $<_{A}$ restricted to $S_{u}$ is a total order.

Lemma 3.3.

There is at least one active vertex in $V\setminus A$ if $A\neq V$.

Proof 3.4.

Suppose the Hamiltonian path in $\vec{G}$ is $(v_{1},\dots,v_{n})$. Let $k$ be the smallest index s.t. $v_{k}\notin A$; then $v_{k}$ satisfies

  1. $S_{v_{k}}\subseteq\{v_{1},\dots,v_{k-1}\}\subseteq A$.

  2. $<_{A}$ restricted to $S_{v_{k}}$ is a total order.

Therefore $v_{k}$ is active at the moment.

An ideal vertex is always active since $S_{u}\subseteq T_{u}$ holds, but there may be some wrong in-neighbors not yet identified (the vertices in $T_{u}\setminus S_{u}$) preventing an active vertex from being ideal. By cleverly identifying the wrong in-neighbors and removing them from $T_{u}$, an active vertex $u$ becomes an ideal one and we can use the above strategy again.

As $S_{u}$ is invisible to us, we know neither which vertices are active, nor which in-neighbors of a vertex are wrong, so we instead focus on the in-neighbors of $u$ which prevent it from being ideal:

  1. $v\in T_{u}$ s.t. $v\notin A$.

  2. $v_{1},v_{2}\in T_{u}$ s.t. $(v_{1},v_{2}\in A)\land(v_{1}\not<_{A}v_{2})\land(v_{2}\not<_{A}v_{1})$.

Now consider an active vertex $u$. If such a $v$ as in case 1 exists, then $(v,u)$ must be a mis-predicted edge. If such $v_{1},v_{2}$ as in case 2 exist, at least one of $(v_{1},u),(v_{2},u)$ is mis-predicted. By probing $(v,u)$, or both $(v_{1},u)$ and $(v_{2},u)$, repeatedly, we keep removing wrong in-neighbors from $T_{u}$ and eventually make $u$ ideal.

For an inactive vertex $u$, the direction of $(v,u)$, or of both $(v_{1},u)$ and $(v_{2},u)$, may be correct, but that tells us $u$ is not active and hence not the vertex we are looking for. In this case the vertex $v$ or the pair of vertices $(v_{1},v_{2})$ is called a certificate for $u$, which proves that $u$ is currently not active.

Definition 3.5 (certificate).

For a vertex $u\in V$,

  1. A type-1 certificate is a vertex $v\in S_{u}$ s.t. $v\notin A$.

  2. A type-2 certificate is a pair of distinct vertices $v_{1},v_{2}\in S_{u}$ s.t. $(v_{1},v_{2}\in A)\land(v_{1}\not<_{A}v_{2})\land(v_{2}\not<_{A}v_{1})$.

Once a certificate for $u$ is found, we turn to check the activeness of other vertices and do not probe any other incoming edges of $u$ until the next vertex is added to $A$. For a fixed set $A$, both the activeness of vertices and the validity of certificates are determined and do not change when new probes are made. Only when $A$ is extended and the current certificate of $u$ is no longer valid do we need to look for a new certificate for $u$. A type-1 certificate $v$ becomes invalid when $v$ is added to $A$, while a type-2 certificate $(v_{1},v_{2})$ becomes invalid when $(v_{1}<_{A}v_{2})\lor(v_{2}<_{A}v_{1})$ happens as $A$ expands.

In the worst case, one may need to update the certificates again and again and thus probe too many edges. By checking the validity of certificates in a random order, the worst case is avoided with high probability. We prove in the next subsection that our algorithm uses only $O(n\log n+w)$ probes with high probability, where the $n\log n$ term comes from the probes used in re-searching for valid certificates.

Our algorithm works by repeatedly choosing a vertex $u$ which does not have a valid certificate:

  1. If it is an ideal vertex, we use the strategy mentioned above to determine the directions of the edges between $T_{u}$ and $u$, then add $u$ to $A$.

  2. Otherwise there must be a vertex $v\in T_{u}$ s.t. $v\notin A$, or a pair of vertices $v_{1},v_{2}\in T_{u}$ s.t. $(v_{1},v_{2}\in A)\land(v_{1}\not<_{A}v_{2})\land(v_{2}\not<_{A}v_{1})$. We randomly choose such a vertex $v$ or such a pair $(v_{1},v_{2})$ and probe the edge(s) between $u$ and them. Then either at least one mis-predicted edge is found, or a valid certificate for $u$ is found.

Since there is always an active vertex $u$, after finding some mis-predicted edges and removing the corresponding wrong in-neighbors from $T_{u}$, $u$ must become an ideal vertex; that is how the algorithm makes progress.

Here is the pseudocode of the algorithm:

Algorithm 1 A randomized algorithm using $O(n\log n+w)$ probes
1: set $A:=\emptyset$
2: while $A\neq V$ do
3:     pick $u\in V\setminus A$ s.t. $u$ does not have a valid certificate (if there are multiple such vertices, pick the one with the smallest index)
4:     if $T_{u}\not\subseteq A$ then
5:         randomly pick $v\in T_{u}\setminus A$
6:         probe $(v,u)$
7:     else if $\exists$ distinct $v_{1},v_{2}\in T_{u}$ s.t. $(v_{1}\not<_{A}v_{2})\land(v_{2}\not<_{A}v_{1})$ then
8:         randomly select such a pair $v_{1},v_{2}$
9:         probe $(v_{1},u),(v_{2},u)$
10:     else
11:         let $t$ be the largest vertex in $T_{u}$ w.r.t. $<_{A}$
12:         probe $(t,u)$
13:         if the direction of $(t,u)$ is correct, i.e. $t<_{\vec{E}}u$ then
14:             add $u$ to $A$

3.2 Analysis

We first prove that the algorithm can always proceed, and always terminates.

Lemma 3.6.

The algorithm can always proceed, and will terminate in finitely many steps.

Proof 3.7.

We know from Lemma 3.3 that there is always at least one active vertex outside $A$ if $A\neq V$. Since active vertices have no valid certificates, in each execution of line 3 there is always at least one vertex which meets our requirements.

In each execution of the ‘while’ loop, exactly one of the following happens:

  1. We find a certificate for $u$, which did not have a valid one previously.

  2. We find a mis-predicted edge.

  3. We add a vertex $u$ to $A$.

When case 1 happens, we find a new certificate for $u$. Only when this certificate becomes invalid do we need to look for a new one for $u$. An invalid certificate never becomes valid again since $A$ only grows. Therefore this case happens finitely many times.

Case 2 happens finitely many times because each time we find a mis-predicted edge, we remove a vertex from $T_{u}$, whose size is finite initially and non-negative at all times.

Case 3 also happens finitely many times because each vertex is added to $A$ only once.

Now we proceed to analyze the total number of probes made by the algorithm. First we introduce some important lemmas:

Lemma 3.8.

All vertices are added to $A$ in a fixed order, regardless of the randomness.

Proof 3.9.

Recall that an ideal vertex is always active, so we only add active vertices to $A$. Whether a vertex $u$ is active or not depends only on the current $A$. Therefore $\forall 1\leq i\leq n$, when $|A|=i-1$, the $i$-th vertex added to $A$ is always the one with the smallest index among all active vertices in $V\setminus A$, regardless of the randomness.

Corollary 3.10.

Let $C_{(a,b)}$ ($a,b\in V$) denote the event that the vertex pair $(a,b)$ becomes comparable in $<_{A}$, i.e. $(a,b\in A)\land((a<_{A}b)\lor(b<_{A}a))$. As $A$ expands, all $\binom{n}{2}$ such events happen in a fixed order regardless of the randomness (breaking ties in an arbitrarily fixed manner).

Remark. Lemma 3.8 and Corollary 3.10 use the fact that we determine the order among vertices in $T_{u}$ not according to all edges probed so far, but only according to the edges in the induced subgraph $G[A]$. It might seem more efficient to use all the information instead of this restricted portion, but doing so would create subtle correlations which we do not know how to analyze. The above fixed-order properties are crucial in the proof of our main theorem.

Lemma 3.11.

For a random permutation $\{Y_{1},\dots,Y_{n}\}$ of $\{1,\dots,n\}$, the number of elements $Y_{i}$ s.t. $Y_{i}=\max_{j\leq i}Y_{j}$ does not exceed $6\ln n+6$ w.p. at least $1-\frac{1}{2n^{2}}$.

Proof 3.12.

A random permutation can be built in the following way:

  • $\forall 1\leq i\leq n$, randomly and independently pick $Z_{i}$ from $\{1,\dots,i\}$.

  • take the unique permutation $\{Y_{1},\dots,Y_{n}\}$ s.t. $\forall i$, $Y_{i}$ is the $Z_{i}$-th smallest element in $\{Y_{1},\dots,Y_{i}\}$.

It is easy to see that a random permutation built this way is uniformly distributed.

Let $\{X_{1},\dots,X_{n}\}$ be a sequence of 0-1 random variables indicating whether $Z_{i}=i$. The number mentioned in the lemma is exactly $X=\sum_{i=1}^{n}X_{i}$, since $Z_{i}=i\Leftrightarrow Y_{i}=\max_{j\leq i}Y_{j}$. We have

$\Pr[X_{i}=1]=1-\Pr[X_{i}=0]=\frac{1}{i}$

Note that $\mu=\mathbb{E}[X]=H_{n}=\sum_{i=1}^{n}\frac{1}{i}\approx\ln n+\gamma$ as $n\rightarrow\infty$, where $\gamma$ is the Euler–Mascheroni constant, and

$\ln n+\gamma<H_{n}\leq\ln n+1,\quad\forall n\geq 1.$

Plugging $\epsilon=5$ into the Chernoff bound $\Pr[X>(1+\epsilon)\mu]\leq\exp\left(-\frac{\mu\epsilon^{2}}{2+\epsilon}\right)$, we have

$\Pr[X>6(\ln n+1)]\leq\exp\left(-\frac{25}{7}(\ln n+\gamma)\right)\leq\frac{1}{2n^{2}}.$
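As a quick sanity check of Lemma 3.11, one can count such prefix maxima (the "records" of a permutation) over random permutations; with the lemma's parameters the $6\ln n+6$ threshold is never reached in simulation (a sketch with our own names):

```python
import math
import random

def prefix_maxima(perm):
    """Count positions i with perm[i] = max(perm[:i+1]) (the 'records')."""
    best, count = float("-inf"), 0
    for y in perm:
        if y > best:
            best, count = y, count + 1
    return count

# Empirical check: over random permutations the record count concentrates
# near H_n ~ ln n, far below the lemma's 6 ln n + 6 threshold.
n, trials = 1000, 200
rng = random.Random(0)
exceed = 0
for _ in range(trials):
    perm = list(range(n))
    rng.shuffle(perm)
    if prefix_maxima(perm) > 6 * math.log(n) + 6:
        exceed += 1
print(exceed)  # -> 0: no trial exceeds the bound
```

For $n=1000$ the expectation $H_n\approx 7.5$ sits far below the threshold of roughly $47.4$, which is why the bound holds with such large margin.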

Theorem 1.1 (restated). There is a polynomial-time randomized algorithm which solves the generalized sorting problem in $O(n\log n+w)$ probes with high probability.

Proof 3.13.

The correctness of the algorithm follows directly from Lemma 3.6. We only need to bound the number of probes it uses.

Following the proof of Lemma 3.6, we know that in each execution of the ‘while’ loop, exactly one of three cases happens:

  • We find a certificate for $u$, which did not have a valid one previously.

  • We find a mis-predicted edge.

  • We add a vertex $u$ to $A$.

The algorithm makes at most two probes in each loop, so it suffices to bound the number of occurrences of each case.

Cases 2 and 3 happen at most $n+w$ times in total, because the number of mis-predicted edges is at most $w$ and each vertex is added to $A$ exactly once.

We now focus on case 1. Once a valid certificate for $u$ is found, we make no further probes for $u$ until its current certificate becomes invalid. According to Lemma 3.8, all vertices are added to $A$ in a fixed order, which means all possible type-1 certificates for $u$ (the set of which is initially $S_{u}$) become invalid in a fixed order.

Each time, we find a uniformly random type-1 certificate for $u$ among its currently valid ones. In the analysis, this can be equivalently viewed as follows: for each $u$, a random permutation $\{P_{1}^{(u)},\dots,P_{|S_{u}|}^{(u)}\}$ of $S_{u}$ is chosen and fixed at the start, and all type-1 certificates used in the process are identified according to this permutation: when a new valid type-1 certificate is found, let it be $P_{i}^{(u)}$, where $i$ is the smallest index s.t. $P_{i}^{(u)}$ is currently valid as a type-1 certificate for $u$.

Let $\{Y_{1}^{(u)},\dots,Y_{|S_{u}|}^{(u)}\}$ represent the fixed order in which $\{P_{1}^{(u)},\dots,P_{|S_{u}|}^{(u)}\}$ become invalid, i.e. $P_{i}^{(u)}$ is the $Y_{i}^{(u)}$-th earliest to become invalid. $\{Y_{1}^{(u)},\dots,Y_{|S_{u}|}^{(u)}\}$ is a uniformly random permutation, just like $\{P_{1}^{(u)},\dots,P_{|S_{u}|}^{(u)}\}$, and the total number of valid type-1 certificates found for $u$ equals the number of $Y_{i}^{(u)}$ s.t. $Y_{i}^{(u)}=\max_{j\leq i}Y_{j}^{(u)}$.

From Lemma 3.11 we know that w.p. at least $1-\frac{1}{2n^{2}}$, this number does not exceed $6\ln n+6$. Taking a union bound over all $u$, w.p. at least $1-\frac{1}{2n}$ no vertex uses more than $6\ln n+6$ type-1 certificates, hence the total number of valid type-1 certificates the algorithm finds does not exceed $6n\ln n+6n$.

The analysis for type-2 certificates is exactly the same, since according to Corollary 3.10 the possible type-2 certificates for $u$ also become invalid in a fixed order. The only difference is that the number of possible type-2 certificates for each $u$ may be up to $n^{2}$, while it is at most $n$ for type-1 certificates. Again using Lemma 3.11 and a union bound, w.p. at least $1-\frac{1}{2n}$ the total number of valid type-2 certificates the algorithm finds does not exceed $12n\ln n+6n$.

Combining all cases, we conclude that w.p. at least $1-\frac{1}{n}$ the algorithm uses no more than $O(n\log n+w)$ probes in total.

4 An algorithm using $O(nw)$ probes

Here we briefly introduce a deterministic algorithm using $O(nw)$ probes, showing that the generalized sorting with predictions problem can be solved with only linearly many probes when $w$ is as small as a constant.

The basic idea is to find a mis-predicted edge in $O(n)$ probes and correct it in the predicted graph. We use $\vec{G}_{C}=(V,\vec{P}_{C})$ to denote the predicted graph after correction: $\vec{P}_{C}=\{(v,u)\in\vec{P}\mid(u,v)\notin\vec{Q}\}\cup\vec{Q}$, where $\vec{Q}\subseteq\vec{E}$ is the set of directed edges probed so far.

If there is a directed cycle in $\vec{G}_{C}$, there must be at least one mis-predicted edge on the cycle, since the actual $\vec{G}$ is acyclic. We simply probe all edges on a simple directed cycle, update $\vec{G}_{C}$, and loop again.

If $\vec{G}_{C}$ is acyclic, consider running topological sort on it. If the directions of all edges in $\vec{G}_{C}$ are correct, at each step there should be exactly one vertex with in-degree 0. If not, i.e. there are two vertices $v_{1},v_{2}$ with in-degree 0 at the same time, this can only happen when there are mis-predicted edges either adjacent to $v_{1},v_{2}$ or on the path produced by the topological sort so far. We probe all such edges and loop again.

The pseudocode of the algorithm is stated as follows:

Algorithm 2 A deterministic algorithm using $O(nw)$ probes
1: while there is no Hamiltonian path consisting only of probed edges do
2:     if there is a simple directed cycle in $\vec{G}_{C}=(V,\vec{P}_{C})$ then
3:         probe all the edges on the cycle
4:     else
5:         run topological sort on $\vec{G}_{C}$ and stop when $\exists v_{1},v_{2}$ both with in-degree 0
6:         let $(a_{1},\dots,a_{k})$ be the (partial) topological order
7:         if $k=n$ (i.e. no such $v_{1},v_{2}$ found) then
8:             $\forall 1\leq i<k$, probe the edge $(a_{i},a_{i+1})$
9:         else
10:             $\forall 1\leq i<k$, probe the edge $(a_{i},a_{i+1})$
11:             probe all edges adjacent to $v_{1},v_{2}$
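The acyclic branch (lines 5–7) amounts to a Kahn-style topological sort that halts the moment two vertices are simultaneously ready; a sketch with our own names, not the paper's code:

```python
from collections import defaultdict

def topo_step(vertices, directed_edges):
    """One pass of the acyclic case of Algorithm 2: topological sort on the
    corrected prediction graph, stopped the moment two vertices have
    in-degree 0 at once.  Returns (partial order a_1..a_k, offending pair),
    or (full order, None) when the sort terminates normally."""
    indeg = {v: 0 for v in vertices}
    adj = defaultdict(list)
    for u, v in directed_edges:
        adj[u].append(v)
        indeg[v] += 1
    order = []
    ready = sorted(v for v in vertices if indeg[v] == 0)
    while ready:
        if len(ready) >= 2:      # two candidates: a misprediction hides nearby
            return order, (ready[0], ready[1])
        v = ready.pop()
        order.append(v)
        for w in adj[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                ready.append(w)
        ready.sort()
    return order, None
```

When the returned pair is None, the caller probes the consecutive edges of the order (line 8); otherwise it probes them plus all edges adjacent to the pair (lines 10–11).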

Theorem 1.2 (restated). There is a polynomial-time deterministic algorithm which solves the generalized sorting problem in $O(nw)$ probes.

Proof 4.1.

In each execution of the ‘while’ loop, exactly one of the following three cases happens, and we analyze them separately:

  1. There is a simple directed cycle in $\vec{G}_{C}$. Since the actual $\vec{G}$ is acyclic, at least one edge on the cycle in $\vec{G}_{C}$ is mis-predicted. By probing all edges on it we find at least one mis-predicted edge.

  2. $\vec{G}_{C}$ is acyclic and $k=n$ in line 7, i.e. the topological sort terminates normally. By probing all edges between adjacent vertices in the topological order, we either find a mis-predicted edge or learn that the resulting path is exactly the desired Hamiltonian path.

  3. $\vec{G}_{C}$ is acyclic and there are two distinct vertices $v_{1},v_{2}$ with in-degree 0 during the topological sort.

     In this case, a mis-predicted edge must be found in line 10 or line 11. We prove this by contradiction: assume all edges probed in lines 10 and 11 are correct. If $k>0$, then $(a_{k},v_{1})$ and $(a_{k},v_{2})$ must both lie in $\vec{G}_{C}$, since the topological sort stops just after handling $a_{k}$; so $\forall 1\leq i\leq k$, $(a_{i}<_{\vec{E}}v_{1})\land(a_{i}<_{\vec{E}}v_{2})$ holds by transitivity. W.l.o.g. assume $v_{1}<_{\vec{E}}v_{2}$. Consider the directed path from $v_{1}$ to $v_{2}$ in the actual graph $\vec{G}$; let it be $(b_{1},\dots,b_{l})$, where $b_{1}=v_{1}$ and $b_{l}=v_{2}$. Note that $a_{1}<_{\vec{E}}\dots<_{\vec{E}}a_{k}<_{\vec{E}}b_{1}=v_{1}<_{\vec{E}}\dots<_{\vec{E}}b_{l}=v_{2}$. Therefore the edge $(b_{l-1},b_{l})\in\vec{G}_{C}$ and $b_{l-1}\notin\{a_{1},\dots,a_{k}\}$, which contradicts the fact that the in-degree of $v_{2}$ is 0 at that moment.

Therefore, in each loop we either find at least one mis-predicted edge in $\vec{G}_{C}$ or find the correct Hamiltonian path. Clearly the number of probes made in each loop is $O(n)$, so the total number of probes does not exceed $O(nw)$.

References

  • [1] Noga Alon, Manuel Blum, Amos Fiat, Sampath Kannan, Moni Naor, and Rafail Ostrovsky. Matching nuts and bolts. In SODA, pages 690–696, 1994.
  • [2] Indranil Banerjee and Dana Richards. Sorting under forbidden comparisons. In SWAT, 2016.
  • [3] Moses Charikar, Ronald Fagin, Venkatesan Guruswami, Jon Kleinberg, Prabhakar Raghavan, and Amit Sahai. Query strategies for priced information. Journal of Computer and System Sciences, 64(4):785–819, 2002.
  • [4] Anupam Gupta and Amit Kumar. Sorting and selection with structured costs. In Proceedings 42nd IEEE Symposium on Foundations of Computer Science, pages 416–425. IEEE, 2001.
  • [5] Zhiyi Huang, Sampath Kannan, and Sanjeev Khanna. Algorithms for the generalized sorting problem. In 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science, pages 738–747. IEEE, 2011.
  • [6] Sampath Kannan and Sanjeev Khanna. Selection with monotone comparison costs. In Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms, page 10. SIAM, 2003.
  • [7] János Komlós, Yuan Ma, and Endre Szemerédi. Matching nuts and bolts in $O(n\log n)$ time. SIAM Journal on Discrete Mathematics, 11(3):347–372, 1998.
  • [8] Thodoris Lykouris and Sergei Vassilvitskii. Competitive caching with machine learned advice. arXiv preprint arXiv:1802.05399, 2018.
  • [9] Michael Mitzenmacher and Sergei Vassilvitskii. Algorithms with predictions. arXiv preprint arXiv:2006.09123, 2020.
  • [10] Manish Purohit, Zoya Svitkina, and Ravi Kumar. Improving online algorithms via ml predictions. In Advances in Neural Information Processing Systems, pages 9661–9670, 2018.
  • [11] Dhruv Rohatgi. Near-optimal bounds for online caching with machine learned advice. In Proceedings of the Thirty-First Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1834–1845. SIAM, 2020.