Vertex Ordering Problems in Directed Graph Streams
We consider directed graph algorithms in a streaming setting, focusing on problems concerning orderings of the vertices. This includes such fundamental problems as topological sorting and acyclicity testing. We also study the related problems of finding a minimum feedback arc set (edges whose removal yields an acyclic graph), and finding a sink vertex. We are interested in both adversarially-ordered and randomly-ordered streams. For arbitrary input graphs with edges ordered adversarially, we show that most of these problems have high space complexity, precluding sublinear-space solutions. Some lower bounds also apply when the stream is randomly ordered: e.g., in our most technical result we show that testing acyclicity in the -pass random-order model requires roughly space. For other problems, random ordering can make a dramatic difference: e.g., it is possible to find a sink in an acyclic tournament in the one-pass random-order model using polylog space whereas under adversarial ordering roughly space is necessary and sufficient given passes. We also design sublinear algorithms for the feedback arc set problem in tournament graphs; for random graphs; and for randomly ordered streams. In some cases, we give lower bounds establishing that our algorithms are essentially space-optimal. Together, our results complement the much maturer body of work on algorithms for undirected graph streams.
1 Introduction
While there has been a large body of work on undirected graphs in the data stream model [20], the complexity of processing directed graphs (digraphs) in this model is relatively unexplored. The handful of exceptions include multipass algorithms emulating random walks in directed graphs [22, 15], establishing prohibitive space lower bounds on finding sinks [13] and answering reachability queries [11], and ruling out semi-streaming constant-pass algorithms for directed reachability [12]. This is rather unfortunate given that many of the massive graphs often mentioned in the context of motivating work on graph streaming are directed, e.g., hyperlinks, citations, and Twitter “follows” all correspond to directed edges.
In this paper we consider the complexity of a variety of fundamental problems related to vertex ordering in directed graphs. For example, one basic problem that motivated111The problem was explicitly raised in an open problems session at the Shonan Workshop “Processing Big Data Streams” (June 5-8, 2017) and generated considerable discussion. much of this work is as follows: given a stream consisting of edges of an acyclic graph in an arbitrary order, how much memory is required to return a topological ordering of the graph? In the offline setting, this can be computed in time using Kahn’s algorithm [16] or via depth-first trees [23] but nothing was known in the data stream setting.
We also consider the related minimum feedback arc set problem, i.e., estimating the minimum number of edges (arcs) that need to be removed such that the resulting graph is acyclic. This problem is NP-hard and the best known approximation factor is for arbitrary graphs [10], although a PTAS is known in the case of tournaments [18]. Again, nothing was known in the data stream model. In contrast, the analogous problem for undirected graphs is well understood in the data stream model. The number of edges required to make an undirected graph acyclic is where is the number of connected components. The number of connected components can be computed in space by constructing a spanning forest [11, 2].
Previous Work. Some versions of the problems we study in this work have been considered previously in the query complexity model. For example, Huang et al. [14] consider the “generalized sorting problem” where is an acyclic graph with a unique topological order. The algorithm is presented with an undirected version of this graph and may query any edge to reveal its direction. The goal is to learn the topological ordering with the minimum number of queries. Huang et al. [14] and Angelov et al. [5] also studied the average case complexity of various problems where the input graph is chosen from some known distribution. Ailon [3] studied the equivalent problem for feedback arc set in tournaments. Note that all these query complexity results are adaptive and do not immediately give rise to small-space data stream algorithms.
Perhaps the relative lack of progress on streaming algorithms for directed graph problems stems from their being considered “implicitly hard” in the literature, a point made in the recent work of Khan and Mehta [19]. Indeed, that work and the also-recent work of Elkin [9] provide the first nontrivial streaming algorithms for computing a depth-first search tree and a shortest-paths tree (respectively) in semi-streaming space, using passes. Notably, fairly non-trivial work was needed to barely beat the trivial bound of passes.
Some of our work here applies and extends the work of Guruswami and Onak [12], who gave the first super-linear (in ) space lower bounds in the streaming model for decision problems on graphs. In particular, they showed that solving reachability in -vertex digraphs using passes requires space. Via simple reductions, they then showed similar lower bounds for deciding whether a given (undirected) graph has a short – path or a perfect matching.
1.1 Results
Problem | Passes | Input Order | Space Bound | Notes |
---|---|---|---|---|
acyc | ||||
acyc | ||||
mult. approx. fas-size | ||||
mult. approx. fas-size | ||||
topo-sort | ||||
topo-sort | ||||
mult. approx. fas | ||||
mult. approx. fas | ||||
stconn-dag | random | error probability | ||
acyc | random | error probability | ||
mult. approx. fas-size | random | error probability | ||
topo-sort | random | error probability | ||
mult. approx. fas | random | error probability | ||
-approx. fas-t | exp. time post-processing | |||
-approx. fas-t | ||||
acyc-t | ||||
acyc-t | ||||
sink-find-t | ||||
sink-find-t | ||||
sink-find-t | random | |||
topo-sort | random | random DAG + planted path | ||
topo-sort | random DAG + planted path | |||
-apx. rank-aggr | exp. time post-processing |
Arbitrary Graphs. To set the stage, in Section 2 we present a number of negative results for the case when the input digraph can be arbitrary. In particular, we show that there is no one-pass sublinear-space algorithm for such fundamental digraph problems as testing whether an input digraph is acyclic, topologically sorting it if it is, or finding its feedback arc set if it is not. These results set the stage for our later focus on specific families of graphs, where we can do much more, algorithmically.
For our lower bounds, we consider both arbitrary and random stream orderings. In Section 2.1, we concentrate on the arbitrary ordering and show that checking whether the graph is acyclic, finding a topological ordering of a directed acyclic graph (DAG), or any multiplicative approximation of feedback arc set requires space in one pass. The lower bound extends to when the number of passes is . In Section 2.2, we show that essentially the same bound holds even when the stream is randomly ordered. This strengthening is one of our more technically involved results and it is based on generalizing a fundamental result by Guruswami and Onak [12] on – connectivity in the multi-pass data stream model.
As a by-product of our generalization, we also obtain the first random-order super-linear (in ) lower bounds for the undirected graph problems of deciding (i) whether there exists a short – path (ii) whether there exists a perfect matching.
Tournaments. A tournament is a digraph that has exactly one directed edge between each pair of distinct vertices. If we assume that the input graph is a tournament, it is trivial to find a topological ordering, given that one exists, by considering the in-degrees of the vertices. Furthermore, it is known that ordering the vertices by in-degree yields a 5-approximation to feedback arc set [8].
In Section 3, we present an algorithm which computes a -approximation to feedback arc set in one pass using space222Throughout the paper, .. However, in the post-processing step, it estimates the number of back edges for every permutation of vertices in the graph, thus resulting in exponential post-processing time. Despite its “brute force” feel, our algorithm is essentially optimal, both in its space usage (unconditionally) and its post-processing time (in a sense we shall make precise later). We address these issues in Section 3.4. On the other hand, in Section 3.2, we show that with additional passes it is possible to compute a 3-approximation to feedback arc set while using only polynomial time and space.
Lastly, in Section 4, we consider the problem of finding a sink in a tournament which is guaranteed to be acyclic. Obviously, this problem can be solved in a single pass using space by maintaining an “is-sink” flag for each vertex. Our results show that for arbitrary order streams this is tight. We prove that finding a sink in passes requires space. We also provide an -space sink-finding algorithm that uses passes, for any . In contrast, we show that if the stream is randomly ordered, then using space and a single pass is sufficient. This is a significant separation between the arbitrary-order and random-order data stream models.
Random Graphs. In Section 5, we consider a natural family of random acyclic graphs (see Definition 1.2 below) and present two algorithms for finding a topological ordering of vertices. We show that, for this family, space is sufficient to find the best ordering given passes. Alternatively, space is sufficient given only a single pass, on the assumption that the edges in the stream are randomly ordered.
Rank Aggregation. In Section 6, we consider the problem of rank aggregation (formally defined in the next section), which is closely related to the feedback arc set problem. We present a one-pass, space algorithm that returns -approximation to the rank aggregation problem. The algorithm is very similar to our -approximation of feedback arc set in tournaments and has the same drawback of using exponential post-processing time.
A summary of these results is given in Table 1.
1.2 Models and Preliminaries
Vertex Ordering Problems in Digraphs. An ordering of an -vertex digraph is a list consisting of its vertices. We shall view each ordering as a function , with being the position of in the list. To each ordering , there corresponds a set of back edges . We say that is a topological ordering if ; such exists iff is acyclic. We define is an ordering of , i.e., the size of a minimum feedback arc set for .
We now define the many interrelated digraph problems studied in this work. In each of these problems, the input is a digraph , presented as a stream of its edges. The ordering of the edges is adversarial unless specified otherwise.
- acyc:
-
Decide whether or not is acyclic.
- topo-sort:
-
Under the promise that is acyclic, output a topological ordering of its vertices.
- stconn-dag:
-
Under the promise that is acyclic, decide whether it has an -to- path, these being two prespecified vertices.
- sink-find:
-
Under the promise that is acyclic, output a sink of .
- fas-size (-approximation):
-
Output an integer .
- fas (-approximation):
-
Output an ordering such that .
- fas-t:
-
Solve fas under the promise that is a tournament. In a similar vein, we define the promise problems acyc-t, topo-sort-t, sink-find-t, fas-size-t.
For randomized solutions to these problems we shall require that the error probability be at most .
We remark that the most common definition of the minimum feedback arc set problem in the literature on optimization is to identify a small set of edges whose removal makes the graph acyclic, so fas-size is closer in spirit to this problem than fas. As we shall see, our algorithms will apply to both variants of the problem. On the other hand, lower bounds sometimes require different proofs for the two variants. Since iff is acyclic, we have the following basic observation.
Observation 1.1.
Producing a multiplicative approximation for any of fas, fas-t, fas-size, and fas-size-t entails solving (respectively) topo-sort, topo-sort-t, acyc, and acyc-t.
For an ordering of a vertex set , define . Define to be the unique acyclic tournament on consistent with .
As mentioned above, we will also consider vertex ordering problems on random graphs from a natural distribution. This distribution, which we shall call a “planted path distribution,” was considered by Huang et al. [14] for average case analysis in their work on generalized sorting.
Definition 1.2 (Planted Path Distribution).
Let be the distribution on digraphs on defined as follows. Pick a permutation of uniformly at random. Retain each edge in with probability if , and with probability , independently, otherwise.
Rank Aggregation. The feedback arc set problem in tournaments is closely related to the problem of rank aggregation (rank-aggr). Given total orderings of objects we want to find an ordering that best describes the “preferences” expressed in the input. Formally, we want to find an ordering that minimizes , where the distance between two orderings is the number of pairs of objects ranked differently by them. That is,
where the notation denotes a -valued indicator for the condition .
In the streaming model, the input to rank-aggr can be given either as a concatenation of orderings, leading to a stream of length , or as a sequence of triples conveying that , leading to a stream of length . Since we want the length of the stream to be polynomial in , we assume .
Lower Bounds through Communication Complexity. Space lower bounds for data streaming algorithms are most often proven via reductions from standard problems in communication complexity. We recall two such problems, each involving two players, Alice and Bob. In the problem, Alice holds a vector and Bob holds an index : the goal is for Alice to send Bob a message allowing him to output . In the problem, Alice holds and Bob holds : the goal is for them to communicate interactively, following which they must decide whether and are disjoint, when considered as subsets of , i.e., they must output . In the special case , it is promised that the cardinalities . In each case, the communication protocol may be randomized, erring with probability at most . We shall use the following well-known lower bounds.
Fact 1.3 (See, e.g., [1, 21]).
For error probability , the one-way randomized complexity and the general randomized complexity .
Other Notation and Terminology. We call an edge critical if it lies on a directed Hamiltonian path of length in a directed acyclic graph. We say an event holds with high probability (w.h.p.) if the probability is at least . Given a graph with a unique total ordering, we say a vertex has rank if it occurs in the th position in this total ordering.
2 General Digraphs and the Hardness of some Basic Problems
In this section, our focus is bad news. In particular, we show that there is no one-pass sublinear-space algorithm for the rather basic problem of testing whether an input digraph is acyclic, nor for topologically sorting it if it is. These results set the stage for our later focus on tournament graphs, where we can do much more, algorithmically.
2.1 Arbitrary Order Lower Bounds
To begin, note that the complexity of topo-sort-t is easily understood: maintaining in-degrees of all vertices and then sorting by in-degree provides a one-pass -space solution. However, the problem becomes maximally hard without the promise of a tournament.
Theorem 2.1.
Solving topo-sort in one pass requires space.
Proof.
We reduce from , where for a positive integer . Using a canonical bijection from to , we rewrite Alice’s input vector as a matrix and Bob’s input index as . Our reduction creates a graph on vertices: the vertex set , where each . These vertices are labeled, with being the th vertex in (and similarly for ).
Based on their inputs, Alice and Bob create streams of edges by listing the following sets:
The combined stream defines the graph , where .
We claim that is acyclic. In the digraph , every vertex is either a source or a sink. So the only vertices that could lie on a cycle in are , and . Either or , so there is in fact no cycle even among these four vertices.
Let be a topological ordering of . If , then we must, in particular, have , else we must have . Thus, by simulating a one-pass algorithm on Alice’s stream followed by Bob’s stream, consulting the ordering produced by and outputting iff , the players can solve . It follows that the space used by must be at least . ∎
For our next two results, we use reductions from stconn-dag. It is a simple exercise to show that a one-pass streaming algorithm for stconn-dag requires space. Guruswami and Onak [12] showed that a -pass algorithm requires space.333Although their paper states the lower bound for - connectivity in general digraphs, their proof in fact shows the stronger result that the bound holds even when restricted to DAGs.
Proposition 2.2.
Solving acyc requires space in one pass and space in passes.
Proof.
Given a DAG and specific vertices , let be obtained by adding edge to . Then is acyclic iff has no -to- path. By the discussion above, the lower bounds on acyc follow. ∎
Corollary 2.3.
Solving topo-sort in passes requires space.
Proof.
Given a -pass -space algorithm for topo-sort, we can obtain a -pass -space algorithm for acyc as follows. Run algorithm , store the ordering it outputs, and in another pass, check if the ordering induces any back-edge. If it does, we output NO, and otherwise, we output YES. In case of any runtime error, we return NO. For correctness, observe that if the input graph is acyclic, then returns a valid topological ordering w.h.p.. Hence, the final pass doesn’t detect any back-edge, and we correctly output YES. In case is not acyclic, the promise that the input graph for topo-sort would be a DAG is violated, and hence, either raises an error or returns some ordering that must induce a back-edge since doesn’t have a topological ordering. Thus, we correctly return NO in this case. Finally, Proposition 2.2 implies that , i.e., . ∎
Corollary 2.4.
A multiplicative approximation algorithm for either fas-size or fas requires space in one pass. In passes, such approximations require space and space respectively.
Proof.
This is immediate from 1.1, Theorem 2.1, Proposition 2.2, and Corollary 2.3. ∎
2.2 Random Order Lower Bounds
We consider the stconn-dag, acyc, and fas problems in a uniformly randomly ordered digraph stream. Recall that for adversarially ordered streams, these problems require about space in passes. The hardness ultimately stems from a similar lower bound for the shortpath-dag problem. In this latter problem, the input is an -vertex DAG with two designated vertices and , such that either (a) there exists a path of length at most from to or (b) is unreachable from . The goal is to determine which of these is the case.
Our goal in this section is to show that the same lower bound continues to hold under random ordering, provided we insist on a sufficiently small error probability, about . We prove this for shortpath-dag. As this is a special case of stconn-dag, a lower bound for shortpath-dag carries over to stconn-dag. Further, by the reductions in Proposition 2.2 and Corollaries 2.4 and 2.3, the lower bounds also carry over to acyc, topo-sort, and fas. We also show a barrier result arguing that this restriction to low error is necessary: for the shortpath-dag problem, if an error probability of at least is allowed, then space is achievable in passes.
Our proof uses the machinery of the Guruswami–Onak lower bound for shortpath-dag under an adversarial stream ordering [12]. As in their work, we derive our space lower bound from a communication lower bound for set chasing intersection (henceforth, sci). However, unlike them, we need to prove a “robust” lower bound for sci, in the sense of Chakrabarti, Cormode, and McGregor [7], as explained below. To define sci, we first set up a special family of multilayer pointer jumping problems, described next.
Picture a layered digraph with layers of vertices, each layer having vertices, laid out in a rectangular grid with each column being one layer. From left to right, the layers are numbered . Layer is called the mid-layer. The only possible edges of are from layer to layer , or from layer to layer , for (i.e., edges travel from the left and right ends of the rectangular grid towards the mid-layer). We designate the first vertex in layer as and the first vertex in layer as .
Each vertex not in the mid-layer has exactly outgoing edges, numbered st through th, possibly with repetition (i.e., is a multigraph). Think of these edges as pointers. An input to one of our communication problems (to be defined soon) specifies the destinations of these pointers. Thus, an input consists of tokens, where each token is an integer in specifying which of the possibilities a certain pointer takes. The pointers emanating from layer of vertices constitute the th layer of pointers. Our communication games will involve players named . We say that is the natural owner of the portion of the input specifying the th layer of pointers.
In the problem, the goal is to determine whether or not there exists a mid-layer vertex reachable from as well as . Consider the communication game where each pointer is known to its natural owner and the players must communicate in rounds, where in each round they broadcast messages in the fixed order . Guruswami and Onak showed that this problem requires total communication in the parameter regime . This almost immediately implies a similar lower bound for shortpath-dag—simply reverse the directions of the pointers in positive-numbered layers—which then translates into a data streaming lower bound along standard lines.
The key twist in our version of the sci problem is that each pointer is allocated to one of the players uniformly at random: thus, most pointers are not allocated to their natural owners. The players have to determine the output to sci communicating exactly in the same pattern as before, up to a small error probability taken over the protocol’s internal randomness as well as the random allocation. This setup potentially makes the problem easier because there is a good chance that the players will be able to “jump two pointers” within a single round. Our main technical result is to show that a lower bound of the form holds despite this. In the terminology of Chakrabarti et al. [7], who lower-bounded a number of communication problems under such random-allocation setups, this is a robust communication lower bound.
Theorem 2.5.
Suppose that and that protocol solves with error when the input is randomly allocated amongst the players, as described above. Then, communicates bits.
To prove this result, we consider a problem we call , defined next (Guruswami and Onak called this problem ). Consider an input to and fix an . If we retain only the th pointer emanating from each vertex, for each layer , the th layer of pointers defines a function . Let (respectively, ) denote the index of the unique mid-layered vertex reached from (respectively, ) by following the retained pointers. Formally,
Define a function to be -thin if every element in its range has at most distinct pre-images. The instance is said to meet at if and is said to be -thin at if each function is -thin. The desired output of mpj-meet is
for an appropriate universal constant . The corresponding communication game allocates each pointer to its natural owner and asks them to determine the output using the same communication pattern as for sci. Here is the key result about this problem.
Lemma 2.6 (Lemma 7 of Guruswami–Onak [12]).
The -round constant-error communication complexity . ∎
Using this, we prove our main technical result.
Proof of Theorem 2.5.
Based on the -error protocol for , we design a protocol for as follows. Let be an instance of mpj-meet allocated to players as described above. The players first check whether, for some , fails to be -thin at , for : this check can be performed in the first round of communication with each player communicating a single bit. If the check passes, the protocol ends with output . From now on, we assume that is indeed -thin at each .
Using public randomness, the players randomly renumber the vertices in each layer of , creating an instance of sci.444This step is exactly as in Guruswami-Onak [12]. Formally, each function is replaced by a corresponding function of form (for ), for random permutations . To keep things concise, we omit the full details here. The players then choose , a random allocation of pointers as in the sci problem. They would like to simulate on , as allocated by , but of course they can’t do so without additional communication. Instead, using further public randomness, for each pointer that allocates to someone besides its natural owner, the players reset that pointer to a uniformly random (and independent) value in . We refer to such a pointer as damaged. Since there are players, each pointer is damaged with probability . Let denote the resulting random instance of sci. The players then simulate on as allocated by .
It remains to analyze the correctness properties of . Suppose that is a -instance of mpj-meet. Then there exists such that meets at . By considering the unique maximal paths out of and following only the th pointers at each vertex, we see that is also a -instance of sci. Since the vertex renumbering preserves connectivity, is also a -instance of sci. With probability , none of the pointers on these renumbered paths is damaged; when this event occurs, is also a -instance of sci. Therefore, outputs with probability at least .
Next, suppose that is a -instance of mpj-meet. It could be that is a -instance of sci. However, as Guruswami and Onak show,555See the final paragraph of the proof of Lemma 11 in [12]. the random vertex renumbering ensures that . For the rest of the argument, assume that . In order to have , there must exist a mid-layer vertex such that
(1) |
for some choice of pointer numbers . We consider three cases.
-
•
Case 1: None of the pointers in the above list is damaged. In this case, eq. 1 cannot hold, because .
-
•
Case 2: The layer- pointer in the above list is damaged. Condition on a particular realization of pointers in negative-numbered layers and let denote the mid-layered vertex reached from by following pointers numbered , as in eq. 1. The probability that the damaged pointer at layer points to is . Since this holds for each conditioning, the probability that is also .
-
•
Case 3: The layer- pointer is damaged, but pointers in layers are not, where . Again, condition on a particular realization of pointers in negative-numbered layers and let be as above. Since the functions in eq. 1 are all -thin, the number of vertices in layer that can reach using only undamaged pointers is at most . The probability that the damaged pointer at layer points to one of these vertices is at most .
Combining the cases, the probability that eq. 1 holds for a particular choice of pointer numbers is at most . Taking a union bound over the choices, the overall probability , for the parameter regime and . Therefore, outputs with probability at most .
Thus far, we have a protocol that outputs with probability when and with probability when , where and . Recall that , so is bounded away from . Let be a protocol where we first toss an unbiased coin: if it lands heads, we output with probability and with probability ; if it lands tails, we simulate . Then is a protocol for mpj-meet with error probability . By Lemma 2.6, this protocol must communicate bits and so must . ∎
By a standard reduction from random-allocation communication protocols to random-order streaming algorithms, we obtain the following lower bound: the main result of this section.
Theorem 2.7.
For each constant , a -pass algorithm that solves shortpath-dag on -vertex digraphs whose edges presented in a uniform random order, erring with probability at most must use bits of space.
Consequently, similar lower bounds hold for the problems stconn-dag, acyc, topo-sort, fas, and fas-size. ∎
This paper is focused on directed graph problems. However, it is worth noting that a by-product of our generalization of the Guruswami–Onak bound to randomly ordered streams is that we also obtain the first random-order super-linear (in ) lower bounds for two important undirected graph problems.
Corollary 2.8.
For each constant , space is required to solve either of the following problems in passes, erring with probability at most , over a randomly ordered edge stream of an -vertex undirected graph :
-
•
decide whether contains a perfect matching;
-
•
decide whether the distance between prespecified vertices and is at most .
A Barrier Result. Notably, Theorem 2.7 applies only to algorithms with a rather small error probability. This is inherent: allowing just a slightly larger error probability renders the problem solvable in semi-streaming space. This is shown in the result below, which should be read as a barrier result rather than a compelling algorithm.
Proposition 2.9.
Given a randomly ordered edge stream of a digraph , the shortpath-dag problem on can be solved using space and passes, with error probability at most .
Proof.
Recall that we’re trying to decide whether or not has a path of length at most from to . The high-level idea is that thanks to the random ordering, a “Bellman–Ford” style algorithm that grows a forward path out of and a backward path out of is very likely to make more than one step of progress during some pass.
To be precise, we maintain arrays and , each indexed by . Initialize the arrays to , except that . During each pass, we use the following logic.
for each edge in the stream: | |
if : output true and halt | |
If we complete passes without any output, then we output false.
If has no short enough path from to , this algorithm will always output false. So let’s consider the other case, when there is a – path of length at most . Let vertex be the midpoint of , breaking ties arbitrarily if needed. The subpaths and have lengths and , respectively, with and . Notice that if our algorithm is allowed to run for (resp. ) passes, then (resp. ) will settle to its correct value. If both of them settle, then the algorithm correctly outputs true. So, the only nontrivial case is when .
Let be the event that the random ordering of the edges in the stream places the edges of in the exact reverse order of . Let be the event that the random ordering places the edges of in the exact same order as . If does not occur, then for some two consecutive edges on , the stream puts before . Therefore, once settles to its correct value, the following pass will settle not just , but also ; therefore, after passes, is settled. Similarly, if does not occur, then after passes, is settled. As noted above, if both of them settle, the algorithm correctly outputs true.
Thus, the error probability , as required. ∎
3 Feedback Arc Set in Tournaments
3.1 Accurate, One Pass, but Slow Algorithm for FAS-T
We shall now design an algorithm for fas-t (that also solves fas-size-t) based on linear sketches for -norm estimation. Recall that the -norm of a vector is . A -dimensional -sketch with accuracy parameter and error parameter is a distribution over matrices, together with an estimation procedure such that
Such a sketch is “stream friendly” if there is an efficient procedure to generate a given column of and further, is efficient. Obviously, a stream friendly sketch leads to a space and time efficient algorithm for estimating given a stream of entrywise updates to . We shall use the following specialization of a result of Kane et al. [17].
Fact 3.1 (Kane et al. [17]).
There is a stream friendly -dimensional -sketch with accuracy and error that can handle many -updates to , with each update taking time, with , and with entries of the sketched vector fitting in bits.
Theorem 3.2.
There is a one-pass algorithm for fas-t that uses space and returns a -approximation with probability at least , but requires exponential post-processing time.
Proof.
Identify the vertex set of the input graph with and put . We index vectors in as , where . Define a vector based on and vectors for each permutation using indicator variables as follows.
A key observation is that the -entry of is nonzero iff the edge between and is a back edge of according to the ordering . Thus, .
Our algorithm processes the graph stream by maintaining an -sketch with accuracy and error . By 3.1, this takes space and time per edge.
In post-processing, the algorithm considers all permutations and, for each of them, computes . It thereby recovers an estimate for and finally outputs the ordering that minimizes this estimate. By a union bound, the probability that every estimate is -accurate is at least . When this happens, the output ordering provides a -approximation to fas-t by our key observation above. ∎
Despite its “brute force” feel, the above algorithm is essentially optimal, both in its space usage (unconditionally) and its post-processing time (in a sense we shall make precise later). We address these issues in Section 3.4.
3.2 Multiple Passes: FAS-T in Polynomial Time
For a more time-efficient streaming algorithm, we design one based on the KwikSort algorithm of Ailon et al. [4]. This (non-streaming) algorithm operates as follows on a tournament .
-
•
Choose a random ordering of the vertices: .
-
•
Vertex partitions into two sub-problems and . At this point we know the exact place of in the ordering.
-
•
Vertex further partitions one of the these sub-problems. Proceeding in this manner, after are considered, there are sub-problems.
-
•
Continue until all vertices are ordered.
When is being used to divide a sub-problem we refer to it as a pivot.
Emulating KwikSort in the Data Stream Model. We will emulate KwikSort in passes over the data stream. In each pass, we will consider the action of multiple pivots. Partition into groups , where , consists of the next vertices in the sequence, and contains vertices coming after . Here is a sufficiently large constant. At the end of pass , we want to emulate the effect of pivots in on the sub-problems resulting from considering pivots in through . In order to do that, in pass for each vertex we store all edges between and vertices in the same sub-problem as , where the sub-problems are defined at the end of pass .
The following combinatorial lemma plays a key role in analyzing this algorithm’s space usage.
Lemma 3.3 (Mediocrity Lemma).
In an -vertex tournament, if we pick a vertex uniformly at random, then . Similarly, . In particular, with probability at least , has in/out-degree between and .666 The Mediocrity Lemma is tight: consider sets of vertices where and . Edges on do not form any directed cycles. Subgraphs induced by and are balanced, i.e., the in-degree equals the out-degree of every vertex (where degrees here are considered within the subgraph). All other edges are directed from to , from to , or from to . Then vertices with in/out-degrees between and are exactly the vertices in , and a random vertex belongs to this set with probability .
Proof.
Let be a set of vertices of in-degree at least . Let . On the one hand, . On the other hand, the edges that contribute to the in-degrees of vertices in have both endpoints in or one endpoint in and one in . The number of such edges is
Therefore, . This implies .
Thus, the number of vertices with in-degree at least (and out-degree at most ) is . By symmetry, the number of vertices with out-degree at least (and in-degree at most ) is also less than . Thus, the probability a random vertex has in/out-degree between and is . ∎
Space Analysis. Let be the maximum size of a sub-problem after pass . The number of edges collected in pass is then at most . By Lemma 3.4 (below), this is at most . Once the post-processing of pass is done, the edges collected in that pass can be discarded.
Lemma 3.4.
With high probability, for all .
Proof.
Let denote the size of the sub-problem that contains , after the th pass. We shall prove that, for each , . The lemma will then follow by a union bound.
Take a particular vertex . If, before the th pass, we already have , there is nothing to prove. So assume that . Call a pivot “good” if it reduces the size of the sub-problem containing by a factor of at least . A random pivot falls in the same sub-problem as with probability at least ; when this happens, by the Mediocrity Lemma, the probability that the pivot is good is at least . Overall, the probability that the pivot is good is at least .
In the th pass, we use pivots. If at least of them are good, we definitely have . Thus, by a Chernoff bound, for a sufficiently large , we have
Theorem 3.5.
There exists a polynomial time -pass data stream algorithm using space that returns a -approximation (in expectation) for fas-t.
3.3 A Space Lower Bound
Both our one-pass algorithm and the -pass instantiation of our multi-pass algorithm use at least space. For fas-size-t, where the desired output is a just a number, it is reasonable to ask whether -space solutions exist. We now prove that they do not.
Proposition 3.6.
Solving acyc-t is possible in one pass and space. Meanwhile, any -pass solution requires space.
Proof.
For the upper bound, we maintain the in-degrees of all vertices in the input graph . Since is a tournament, the set of in-degrees is exactly iff the input graph is acyclic.
For the lower bound, we reduce from . Alice and Bob construct a tournament on vertices, where the vertices are labeled . Alice, based on her input , adds edges for each . For each remaining pair with , she adds the edge . Let be the sorted order of the elements in Bob’s set . For each , Bob defines the alias and then adds the edges
Finally, he adds the edges . This completes the construction of .
We claim that the tournament is acyclic iff . The “only if” part is direct from construction, since if and intersect at some index , we have the directed cycle . For the “if” part, let be the ordering and let , as defined in Section 1.2. We show how to modify into a topological ordering of , proving that is acyclic. Observe that, by construction, the tournament can be obtained from by flipping only the edges for each . Each time we perform such an edge flip, we modify the topological ordering of by swapping the associated vertices of the edge. The resultant ordering would still be topological as the vertices were consecutive in the ordering before the flip. Thus, after performing these swaps, we get a topological ordering of . Now, consider some . Since , and so, succeeds in this ordering, just as in , since we never touched these two vertices while performing the swaps. Thus, for each such , we can now insert between and in the ordering and obtain a topological ordering of . This proves the claim.
Thus, given a -pass solution to acyc-t using bits of space, we obtain a protocol for that communicates at most bits. By 1.3, , i.e., . ∎
Theorem 3.7.
A -pass multiplicative approximation for fas-size-t requires space.
3.4 An Oracle Lower Bound
Let us now consider the nature of the post-processing performed by our one-pass fas-t algorithm in Section 3.1. During its streaming pass, that algorithm builds an oracle based on that, when queried on an ordering , returns a fairly accurate estimate of . It proceeds to query this oracle times to find a good ordering. This raises the question: is there a more efficient way to exploit the oracle that the algorithm has built? A similar question was asked in Bateni et al. [6] in the context of using sketches for the maximum coverage problem.
Were the oracle exact—i.e., on input it returned exactly—then two queries to the oracle would determine which of and was an edge in . It follows that queries to such an exact oracle suffice to solve fas-t and fas-size-t. However, what we actually have is an -oracle, defined as one that, on query , returns such that . We shall show that an -oracle cannot be exploited efficiently: a randomized algorithm will, with high probability, need exponentially many queries to such an oracle to solve either fas-t or fas-size-t.
To prove this formally, we consider two distributions on -vertex tournaments, defined next.
Definition 3.8.
Let be distributions on tournaments on produced as follows. To produce a sample from , pick a permutation of uniformly at random; output . To produce a sample from , for each with , independently at random, include edge with probability ; otherwise include edge .
Let be an ordering of . By linearity of expectation, if is sampled from either or ,
In fact, we can say much more.
Lemma 3.9.
There is a constant such that, for all , sufficiently large , a fixed ordering on , and random drawn from either or ,
Proof.
When , the random variable has binomial distribution , so the claimed bound is immediate.
Let . Partition the edges of the tournament into perfect matchings . For each , let be the number of back edges of involving , i.e.,
Notice that , whence
for a certain constant . By a union bound, the probability that all of the s are between these bounds is at least , for suitable . When this latter event happens, we also have . ∎
We define a -query algorithm for a problem to be one that access an input digraph solely through queries to an -oracle and, after at most such queries, outputs its answer to . We require this answer to be correct with probability at least .
Now consider the particular oracle , describing an -vertex tournament , that behaves as follows when queried on an ordering .
-
•
If , then return .
-
•
Otherwise, return .
Clearly, is an -oracle. The intuition in the next two proofs is that this oracle makes life difficult by seldom providing useful information.
Proposition 3.10.
Every -query algorithm for topo-sort-t makes queries.
Proof.
WLOG, consider a -query algorithm, , that makes exactly queries, the last of which is its output. Using Yao’s minimax principle, fix ’s random coins, obtaining a deterministic -query algorithm that succeeds with probability on a random tournament . Let be the sequence of queries that makes when the answer it receives from the oracle to each of is .
Suppose that the oracle supplied to is . Let be the event that ’s query sequence is and it receives the response to each of these queries. For a particular ,
for a suitable constant , by Lemma 3.9. Thus, by a union bound, .
When occurs, must output , but itself implies that , so errs. Thus, the success probability of is at most . Since this probability must be at least , we need . ∎
Proposition 3.11.
Every -query algorithm for acyc-t makes queries.
Proof.
We proceed similarly to Proposition 3.10, except that we require the deterministic -query algorithm to succeed with probability at least on a random . We view as being chosen in two stages: first, we pick uniformly at random, then we pick .
Define and as before. So . When occurs, must output some fixed answer, either “yes” or “no.” We consider these cases separately.
Suppose that outputs “no,” declaring that is not acyclic. Then errs whenever yes and occurs. The probability of this is at least , but it must be at most , requiring .
Suppose that outputs “yes” instead. Then it errs when no, is cyclic, and occurs. Since
we have errs, requiring . ∎
Theorem 3.12.
A -query algorithm that gives a multiplicative approximation for either fas-t or fas-size-t must make queries.
4 Sink Finding in Tournaments
A classical offline algorithm for topo-sort is to repeatedly find a sink in the input graph (which must exist in a DAG), prepend to a growing list, and recurse on . Thus, sink-find itself is a fundamental digraph problem. Obviously, sink-find can be solved in a single pass using space by maintaining an “is-sink” flag for each vertex. Our results below show that for arbitrary order streams this is tight, even for tournament graphs.
In fact, we say much more. In passes, on the one hand, the space bound can be improved to roughly . On the other hand, any -pass algorithm requires about space. While these bounds don’t quite match, they reveal the correct asymptotics for the number of passes required to achieve polylogarithmic space usage: namely, .
In contrast, we show that if the stream is randomly ordered, then using space and a single pass is sufficient. This is a significant separation between the adversarial and random order data streams.
4.1 Arbitrary Order Sink Finding
Theorem 4.1 (Multi-pass algorithm).
For all with , there is a -pass algorithm for sink-find-t that uses space and has failure probability at most .
Proof.
Let the input digraph be . For a set , let denote the vertex in that has maximum in-degree. This can also be seen as the maximum vertex within according to the total ordering defined by the edge directions.
Our algorithm proceeds as follows.
-
•
Initialization: Set . Let be a set of vertices chosen randomly from .
-
•
For to :
-
–
During pass : Find by computing the in-degree of each vertex in .
-
–
During pass : Let be a set of vertices chosen randomly from .
-
–
-
•
During pass : Find by computing the in-degree of each vertex in .
For the sake of analysis, consider the quantity . Note that, for each ,
Thus, by the union bound, with probability at least . Note that implies that is a sink. ∎
We turn to establishing a multi-pass lower bound. Our starting point for this is the tree pointer jumping problem , which is a communication game involving players. To set up the problem, consider a complete ordered -level -ary tree ; we consider its root to be at level , the children of to be at level , and so on. We denote the -th child of by , the -th child of by , and so on. Thus, each leaf of is of the form for some integers .
An instance of is given by a function such that for each leaf . The desired one-bit output is
(2) |
For each , Player receives the input values for each vertex at level . The players then communicate using at most rounds, where a single round consists of one message from each player, speaking in the order Player , …, Player . All messages are broadcast publicly (equivalently, written on a shared blackboard) and may depend on public random coins. The cost of a round is the total number of bits communicated in that round and the cost of a protocol is the maximum, over all rounds, of the cost of a round. The randomized complexity is the minimum cost of a -round -error protocol for .
Combining the lower bound approach of Chakrabarti et al. [7] with the improved round elimination analysis of Yehudayoff [24], we obtain the following lower bound on the randomized communication complexity of the problem.
Theorem 4.2.
. ∎
Based on this, we prove the following lower bound.
Theorem 4.3 (Multi-pass lower bound).
Any streaming algorithm that solves sink-find-t in passes must use space.
Proof.
We reduce from , where . We continue using the notations defined above. At a high level, we encode an instance of tpj in the directions of edges in a tournament digraph , where can be viewed as two copies of the set of leaves of . Formally,
We assign each pair of distinct vertices to a level in as follows. Suppose that and . We assign to level , where is the smallest index such that . Given an instance of , the players jointly create an instance of sink-find-t as follows. For each from to , in that order, Player assigns directions for all pairs of vertices at level , obtaining a set of directed edges, and then appends to a stream. The combined stream defines the tournament . It remains to define each set precisely.
The set encodes the bits at the leaves of as follows.
(3) |
Notice that if we ignore edge directions, is a perfect matching on .
Now consider an arbitrary level . Corresponding to each vertex at level of , we define the permutation thus:
(4) |
Using this, we define so as to encode the pointers at level as follows.
(5) |
It should be clear that the digraph is a tournament. We argue that it is acyclic. Suppose, to the contrary, that has a cycle . Let be the smallest-numbered level of an edge on . Then there exist such that every vertex on is of the form . Let be the vertices on whose outgoing edges belong to level . For each , let . Let . According to eq. 5,
a contradiction.
It follows that has a unique sink. Let be this sink. In particular, for each level , all edges in involving must be directed towards . According to eq. 5, we must have , i.e., . By section 4.1, this gives . Next, by eq. 2, this gives . Instantiating this observation for , we have
i.e., .
At this point have been determined, leaving only two possibilities for . We now use the fact that the sole edge in involving must be directed towards . According to eq. 3, . Invoking eq. 2 again, .
Thus, the players can read off the desired output from the identity of the unique sink of the constructed digraph . Notice that . It follows that a -pass streaming algorithm for sink-find-t that uses bits of space solves in rounds at a communication cost of . By Theorem 4.2, we have . ∎
4.2 Random Order Sink Finding
In this section we show that it is possible to find the sink of an acyclic tournament in one pass over a randomly order stream while using only space. The algorithm we consider is as follows:
-
•
Initialization: Let be a random set of nodes.
-
•
For to :
-
–
Ingest the next elements of the stream: For each , collect the set of edges consisting of all outgoing edges; throw away if it exceeds size
-
–
Pick any , such that and let be the endpoints (other than ) of the edges in
-
–
-
•
Ingest the next elements: find the set of vertices such that there exists an edge for some
-
•
Ingest the remaining elements: Output any vertex in with no outgoing edges.
Theorem 4.4.
There is a single pass algorithm for sink-find-t that uses space and has failure probability at most under the assumption that the data stream is randomly ordered.
Proof.
We refer to the elements used in the iteration as the th segment of the stream. For a node , let be the number of outgoing edges from amongst the th segment. The following claim follows from the Chernoff bound:
Claim 4.5.
With high probability, for all with then
With high probability, for all with , then
If follows from the claim that if after processing the th segment of the stream there exists a such that then with high probability . We next need to argue that there exists such a .
Claim 4.6.
With high probability, for every node with , there exists an edge in the th segment such that .
Proof.
There are at least such edges. The probability that none of them exists in the th segment is at most . ∎
The above two claims allow us to argue by induction that we will have an element with after the th segment. At the end of the th segment we have identified at least vertices where every rank is at most . With probability at least one of these vertices includes an edge to the sink amongst the segment and hence the sink is in with high probability. There may be other vertices in but the following claim shows that we will identify any false positives while processing the final elements of the stream.
Claim 4.7.
With probability at least , there exists at least once outgoing edge from every node except the sink amongst the last elements of the stream
Proof of Claim.
The probability no outgoing edge from the an element of rank appears in the suffix of the stream is at most . Hence, by the union bound the probability that there exists an element of rank without an outgoing edge is at most ∎
This concludes the proof of Theorem 4.4. ∎
5 Topological Ordering in Random Graphs
We present results for computing a topological ordering of (see Definition 1.2). We first present an -pass algorithm using space. We then present a one-pass algorithm that uses space and requires the assumption that the stream is in random order.
5.1 Arbitrary Order Algorithm
In this section, we present two different algorithms. The first is appropriate when is large whereas the second is appropriate when is small. Combining these algorithms and considering the worst case value of yields the algorithm using space.
Algorithm for large q. The basic approach is to emulate QuickSort. We claim that we can find the relationship between any vertex among vertices and a predetermined vertex using three passes and space. Assuming this claim, we can sort in passes and space: we recursively partition the vertices and suppose at the end of a phase we have sub-problems of sizes . Any sub-problem with at least vertices is then sub-divided by picking random pivots (with replacement) within the sub-problems using the aforementioned three pass algorithm. There are at most such sub-problems. Hence, the total space required partition all the sub-problems in this way is at most
Note that the size of every sub-problem decreases by a factor at least at each step with high probability and hence after iterations, all sub-problems have at most vertices. Furthermore, each vertex degree is in each sub-problem. Hence, the entire remaining instance can be stored using space.
It remains to prove our three-pass claim. For this, we define the following families of sets:
Using two passes and space we can identify and using space. Let be the set of vertices not contained in . The following lemma (which can be proved via Chernoff bounds) establishes that includes most of the vertices of the graph with high probability.
Lemma 5.1.
With high probability, .
In a third pass, we store every edge between vertices in and also compute and . Computing and requires only space. There is an edge between each pair of vertices in with probability and hence, the expected number of edges between vertices in is at most . By an application of the Chernoff Bound, this bound also holds w.h.p. Note that , and the edges within suffice to determine whether or for all . To see this first suppose and that is the critical edge on the directed path from to . Either and therefore we deduce ; or ; or and and we therefore store the edge .
This establishes the following lemma.
Lemma 5.2.
There is a -pass, -space algorithm for topo-sort on a random input graph . ∎
Algorithm for small q. We use only two passes. In the first pass, we compute the in-degree of every vertex. In the second, we store all edges between vertices where the in-degrees differ by at most where is a sufficiently large constant.
Lemma 5.3.
There is a two-pass, -space algorithm for topo-sort on a random input graph .
Proof.
We show that, with high probability, the above algorithm collects all critical edges and furthermore only collects edges in total. Let be the element of rank . Note that has distribution . Let . By an application of the Chernoff Bound,
Hence, w.h.p., for all vertices . Therefore, if is critical, then
This ensures that the algorithm collects all critical edges. For the space bound, we first observe that for an arbitrary pair of vertices and , if then
Hence, we only store an edge between vertex and vertices whose rank differs by at most . Since edges between such vertices are present with probability , the expected number of edges stored incident to is and is by an application of the Chernoff bounds. Across all vertices this means the number of edges stored is as claimed. ∎
Theorem 5.4.
There is an -pass algorithm for topo-sort on a random input that uses space. For the worst-case over , this is . ∎
5.2 Random Order Algorithm
The transitive reduction of a DAG is the minimal subgraph such that, for all , if has a -to- path, then so does . So if has a Hamiltonian path, is this path.
The one-pass algorithm assuming a random ordering of the edges is simply to maintain as is streamed in, as follows. Let be initially empty. For each edge in the stream, we add to and then remove all edges where there is a -to- path among the stored edges.
Theorem 5.5.
There is a one-pass -space algorithm for topo-sort on inputs presented in random order. In the worst case this space bound is .
Proof.
Consider the length- prefix of the stream where the edges of are presented in random order. It will be convenient to write . We will argue that the number of edges in the transitive reduction of this prefix is with high probability; note the bound follows trivially because the transitive reduction has at most edges. The result then follows by taking the maximum over all prefixes.
We say an edge of is short if the difference between the ranks is where is some sufficiently large constant. An edge that is not short is defined to be long. Let be the number of short edges in and let be the total number of edges in . Note that and . By the Chernoff bound, and with high probability. Furthermore, the number of short edges in the prefix is expected to be and, with high probability, is at most
Now consider how many long edges are in the transitive reduction of the prefix. For any long edge , let denote the event that are both in the prefix. Note that the variables are negatively correlated and that
Hence, if then
and so, by the Chernoff bound, with high probability and if this is the case, even if is in the prefix, it will not be in the transitive reduction of the prefix. Hence, by the union bound, with high probability no long edges exist in the transitive closure of the prefix. ∎
6 Rank Aggregation
Recall the rank-aggr problem and the distance between permutations, defined in Section 1.2. To recap, the distance between two orderings is the number of pairs of objects which are ranked differently by them, i.e.,
Note that rank-aggr is equivalent to finding the median of a set of points under this distance function, which can be shown to be metric. It follows that picking a random ordering from the input orderings provides a -approximation for rank-aggr.
A different approach is to reduce rank-aggr to the weighted feedback arc set problem on a tournament. This idea leads to a -approximation via -norm estimation in a way similar to the algorithm in Section 3.1. Define a vector of length indexed by pairs of vertices where
i.e., the number of input orderings that have . Then for any ordering define a vector , where for each pair of vertices ,
It is easy to see that .
As in Section 3.1, our algorithm maintains an -sketch with accuracy and error . By 3.1, this requires at most space. In post-processing, the algorithm considers all permutations and, for each of them, computes . It thereby recovers an estimate for and finally outputs the ordering that minimizes this estimate.
The analysis of this algorithm is essentially the same as in Theorem 3.2. Overall, we obtain the following result.
Theorem 6.1.
There is a one-pass algorithm for rank aggregation that uses space, returns a -approximation with probability at least , but requires exponential post-processing time. ∎
Acknowledgements
We thank Riko Jacob for a helpful discussion about the sink finding problem. We thank the anonymous SODA 2020 reviewers for several helpful comments that improved the presentation of the paper.
References
- [1] F. Ablayev. Lower bounds for one-way probabilistic communication complexity and their application to space complexity. Theoretical Computer Science, 175(2):139–159, 1996.
- [2] K. J. Ahn, S. Guha, and A. McGregor. Analyzing graph structure via linear measurements. In Proc. 23rd Annual ACM-SIAM Symposium on Discrete Algorithms, pages 459–467, 2012.
- [3] N. Ailon. Active learning ranking from pairwise preferences with almost optimal query complexity. In J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 24, pages 810–818. Curran Associates, Inc., 2011.
- [4] N. Ailon, M. Charikar, and A. Newman. Aggregating inconsistent information: Ranking and clustering. J. ACM, 55(5):23:1–23:27, 2008.
- [5] S. Angelov, K. Kunal, and A. McGregor. Sorting and selection with random costs. In LATIN 2008: Theoretical Informatics, 8th Latin American Symposium, Búzios, Brazil, April 7-11, 2008, Proceedings, pages 48–59, 2008.
- [6] M. Bateni, H. Esfandiari, and V. S. Mirrokni. Almost optimal streaming algorithms for coverage problems. In Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2017, Washington DC, USA, July 24-26, 2017, pages 13–23, 2017.
- [7] A. Chakrabarti, G. Cormode, and A. McGregor. Robust lower bounds for communication and stream computation. Theor. Comput., 12(1):1–35, 2016. Preliminary version in Proc. 40th Annual ACM Symposium on the Theory of Computing, pages 641–649, 2008.
- [8] D. Coppersmith, L. Fleischer, and A. Rudra. Ordering by weighted number of wins gives a good ranking for weighted tournaments. ACM Trans. Algorithms, 6(3):55:1–55:13, 2010.
- [9] M. Elkin. Distributed exact shortest paths in sublinear time. In Proc. 49th Annual ACM Symposium on the Theory of Computing, pages 757–770, 2017.
- [10] G. Even, J. (Seffi) Naor, B. Schieber, and M. Sudan. Approximating minimum feedback sets and multicuts in directed graphs. Algorithmica, 20(2):151–174, Feb 1998.
- [11] J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, and J. Zhang. On graph problems in a semi-streaming model. Theoretical Computer Science, 348(2-3):207–216, 2005.
- [12] V. Guruswami and K. Onak. Superlinear lower bounds for multipass graph processing. Algorithmica, 76(3):654–683, Nov. 2016.
- [13] M. R. Henzinger, P. Raghavan, and S. Rajagopalan. Computing on data streams. External memory algorithms, pages 107–118, 1999.
- [14] Z. Huang, S. Kannan, and S. Khanna. Algorithms for the generalized sorting problem. In IEEE 52nd Annual Symposium on Foundations of Computer Science, FOCS 2011, Palm Springs, CA, USA, October 22-25, 2011, pages 738–747, 2011.
- [15] C. Jin. Simulating random walks on graphs in the streaming model. In 10th Innovations in Theoretical Computer Science Conference, ITCS 2019, January 10-12, 2019, San Diego, California, USA, pages 46:1–46:15, 2019.
- [16] A. B. Kahn. Topological sorting of large networks. Commun. ACM, 5(11):558–562, Nov. 1962.
- [17] D. M. Kane, J. Nelson, and D. P. Woodruff. On the exact space complexity of sketching and streaming small norms. In Proc. 21st Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1161–1178, 2010.
- [18] C. Kenyon-Mathieu and W. Schudy. How to rank with few errors: A PTAS for weighted feedback arc set on tournaments. Electronic Colloquium on Computational Complexity (ECCC), 13(144), 2006.
- [19] S. Khan and S. K. Mehta. Depth first search in the semi-streaming model. In Proc. 36th International Colloquium on Automata, Languages and Programming, pages 42:1–42:16, 2019.
- [20] A. McGregor. Graph stream algorithms: a survey. SIGMOD Record, 43(1):9–20, 2014.
- [21] A. Razborov. On the distributional complexity of disjointness. Theor. Comput. Sci., 106(2):385–390, 1992. Preliminary version in Proc. 17th International Colloquium on Automata, Languages and Programming, pages 249–253, 1990.
- [22] A. D. Sarma, S. Gollapudi, and R. Panigrahy. Estimating pagerank on graph streams. J. ACM, 58(3):13, 2011.
- [23] R. E. Tarjan. Edge-disjoint spanning trees and depth-first search. Acta Informatica, 6(2):171–185, Jun 1976.
- [24] A. Yehudayoff. Pointer chasing via triangular discrimination. Technical Report TR16-151, ECCC, 2016.