Simulating Time With Square-Root Space

(Footnote 1: To appear in STOC 2025.)
Abstract
We show that for all functions $t(n) \geq n$, every multitape Turing machine running in time $t$ can be simulated in space only $O(\sqrt{t \log t})$. This is a substantial improvement over Hopcroft, Paul, and Valiant's simulation of time $t$ in space $O(t/\log t)$ from 50 years ago [FOCS 1975, JACM 1977]. Among other results, our simulation implies that bounded fan-in circuits of size $s$ can be evaluated on any input in only $\sqrt{s} \cdot \mathrm{poly}(\log s)$ space, and that there are explicit problems solvable in $O(n)$ space which require $n^{2-\varepsilon}$ time on a multitape Turing machine for all $\varepsilon > 0$, thereby making a little progress on the $\mathsf{P}$ versus $\mathsf{PSPACE}$ problem.
Our simulation reduces the problem of simulating time-bounded multitape Turing machines to a series of implicitly-defined Tree Evaluation instances with nice parameters, leveraging the remarkable space-efficient algorithm for Tree Evaluation recently found by Cook and Mertz [STOC 2024].
1 Introduction
One of the most fundamental questions in theoretical computer science is: how does time relate to space, in computation? For instance, can every problem solvable in polynomial space ($\mathsf{PSPACE}$) also be solved in polynomial time ($\mathsf{P}$)? Although it is widely believed that $\mathsf{P} \neq \mathsf{PSPACE}$, there has been scant progress on separating time and space over the decades [SHL65, HU68, SM73, HPV75, PR81, HLMW86]. In particular, for general models of computation (beyond one-tape models), only time lower bounds of the form $n \cdot \mathrm{poly}(\log n)$ have been reported for linear-space problems, whereas $\mathsf{P} \neq \mathsf{PSPACE}$ requires showing that there is a linear-space problem that cannot be solved in $n^k$ time for every constant $k \geq 1$.
In this paper, we make another step towards separating $\mathsf{P}$ from $\mathsf{PSPACE}$, by showing that there are problems solvable in $O(n)$ space that cannot be solved in $n^c$ time for some constant $c > 1$. This is the first generic polynomial separation of time and space in a robust computational model (namely, the multitape Turing machine). The separation is accomplished by exhibiting a surprisingly space-efficient simulation of generic time-bounded algorithms. More formally, let $\mathsf{TIME}[t(n)]$ be the class of decision problems decided by $O(t(n))$-time multitape Turing machines, and let $\mathsf{SPACE}[s(n)]$ be the class decided by $O(s(n))$-space multitape Turing machines.
Theorem 1.1.
For every function $t(n) \geq n$, $\mathsf{TIME}[t(n)] \subseteq \mathsf{SPACE}[\sqrt{t(n) \log t(n)}]$.
We find Theorem 1.1 to be very surprising. It appears to have been a common belief for decades that time $t$ cannot be simulated in space $t^{1-\varepsilon}$, for any constant $\varepsilon > 0$. For example, assuming that time $t$ cannot be simulated in $t^{1-\varepsilon}$ space, Sipser gave a poly-time derandomization of $\mathsf{RP}$ [Sip88] (Footnote 2: Sipser also assumed explicit constructions of certain bipartite expander graphs, which were later constructed [SSZ98].) and Nisan-Wigderson gave a subexponential-time derandomization of $\mathsf{BPP}$ ([NW94, Section 3.6]). (Footnote 3: Note that the derandomization of Nisan and Wigderson can also be based on assuming a weaker space lower bound hypothesis for exponential-time computations, which remains open and almost certainly cannot be refuted using the ideas of this paper.)
Given the power of multitape Turing machines, Theorem 1.1 has many interesting consequences. Along with the ability to evaluate generic straight-line programs of length $t$ (over a bounded domain) in only $\sqrt{t} \cdot \mathrm{poly}(\log t)$ space (see Section 4.1), Theorem 1.1 immediately implies by diagonalization that there are $O(n)$-space problems that cannot be solved in $n^{2-\varepsilon}$ time for all $\varepsilon > 0$: a "truly-polynomial" time lower bound.
Corollary 1.2.
For space constructible $s(n) \geq n$, and all $\varepsilon > 0$, $\mathsf{SPACE}[s(n)] \not\subseteq \mathsf{TIME}[s(n)^{2-\varepsilon}]$.
Similarly, if Theorem 1.1 could be extended to show that for all $\varepsilon > 0$, every time-$t$ Turing machine can be simulated in $O(t^{\varepsilon})$ space, then $\mathsf{P} \neq \mathsf{PSPACE}$ would follow (see the arguments in Section 4). We discuss this possibility at the end of the paper, in Section 5. It follows from Corollary 1.2 that any complete problem for linear space requires nearly-quadratic time to be solved. For example:
Corollary 1.3.
The language $L = \{\langle M, x, 1^k \rangle \mid M \text{ accepts } x \text{ using at most } k \text{ cells of space}\}$ requires $n^{2-\varepsilon}$ time to solve on a multitape Turing machine, for every $\varepsilon > 0$.
Another consequence is that there are context-sensitive languages which need essentially $n^2$ time ($n^{2-\varepsilon}$ for every $\varepsilon > 0$) to be recognized by multitape Turing machines. This follows from the fact that $\mathsf{NSPACE}[n]$ (nondeterministic linear space) is equivalent to the class of context-sensitive languages [Kur64].
Since multitape Turing machines can evaluate any circuit of size $s$ (and fan-in two) on any given input of length $n \leq s$ in only $s \cdot \mathrm{poly}(\log s)$ time [Pip77], it follows that arbitrary subquadratic-size circuits can be simulated by subexponential-size branching programs. (Footnote 4: Unfortunately, the reference [Pip77] does not seem to be available online. See Section 4.1 for an alternative sketch of the argument.)
Corollary 1.4.
There is a universal $k \geq 1$ such that for all $s \geq n$, every bounded fan-in circuit of size $s$ and $n$ inputs has a branching program of size at most $2^{k \sqrt{s} \log^k s}$.
It is an interesting question whether the simulation of Theorem 1.1 can be extended beyond multitape Turing machines, to more powerful computational models. We can generalize Theorem 1.1 to $d$-dimensional multitape Turing machines, as follows:
Theorem 1.5.
Every decision problem solvable by a $t(n)$-time $d$-dimensional multitape Turing machine can be decided in $O\big((t \log t)^{1 - 1/(d+1)}\big)$ space (on a typical one-dimensional multitape Turing machine).
The space bound of Theorem 1.5 matches the best known space bounds for simulating time-$t$ machines with one $d$-dimensional tape [Lou81, LL90]. See Section 2.1 for more references on prior simulations.
Remark 1.6 (Extension to Oblivious Random-Access Models).
At the present time, we do not know how to extend Theorem 1.1 to arbitrary random-access models of computation. The main issue is that the indegree of the resulting computation graph (defined in Section 3) is so high that the computation graph cannot be stored in memory. However, we remark that if the pattern of reads and writes of a given random-access machine model is oblivious, in the sense that given a step-count $i$ specified in $O(\log t)$ bits, the names of the registers being accessed in step $i$ can be computed in $O(\log t)$ space, then Theorem 1.1 does apply, with only a $\mathrm{poly}(\log t)$ extra space factor. This is because such machines can be simulated by circuits of $t \cdot \mathrm{poly}(\log t)$ size, which can in turn be simulated efficiently by multitape Turing machines in $t \cdot \mathrm{poly}(\log t)$ time (see Section 4.1).
1.1 Intuition
The key idea behind Theorem 1.1 is to reduce the problem of simulating arbitrary time-bounded computations to particular instances of the Tree Evaluation problem, defined by S. Cook, McKenzie, Wehr, Braverman, and Santhanam [CMW+12]. In this problem, one is given a complete $d$-ary tree of height $h$, where each leaf of the tree is labeled with a $b$-bit string, and each inner node of the tree is labeled with a function from $d \cdot b$ bits to $b$ bits. (Footnote 5: Our notation is slightly different from the usual setup, where there is a parameter $k$ and each node takes a value in $\{1, \ldots, k\}$, so that $b = \lceil \log_2 k \rceil$. Here, our notation more closely follows Goldreich's exposition [Gol24].) (Each function is presented as a table of $2^{d \cdot b}$ values, each of which are $b$ bits.) Each inner node computes its value by evaluating its function on the values of its children. The task of Tree Evaluation is to determine the value of the root of the tree. To make this a decision problem, we will decide whether the first bit of the root's value equals $1$ or not.
The obvious depth-first algorithm for Tree Evaluation uses $O(d \cdot b \cdot h)$ space, to store intermediate results at each level of the tree. The authors of [CMW+12] conjectured that Tree Evaluation is not in $\mathsf{LOGSPACE}$, which would separate $\mathsf{LOGSPACE}$ from $\mathsf{P}$ (in fact, stronger separations would hold). Recent algorithmic work has led to significant skepticism that this conjecture is true. In a line of work, J. Cook and Mertz have found surprisingly space-efficient methods for Tree Evaluation [CM20, CM21, CM22, CM24], culminating in a marvelous algorithm using space only $O(d \cdot b + h \log(d \cdot b))$ ([CM24, Theorem 7], see Appendix A).
Utilizing the old notion of "block-respecting" Turing machines [HPV75], we show how to reduce time-$t$ computations to (implicitly defined) Tree Evaluation instances of height $\Theta(t/b)$, bit-length $O(b)$, and fan-in $d$, where $b$ is a parameter from $\log t$ to $t$ that we can choose, and $d$ is a fixed constant depending on the machine. In particular, one can generate arbitrary bits of our Tree Evaluation instance in a space-efficient way as needed.
Very roughly speaking, each level of the tree will correspond to a "time block" of $b$ steps, and the value of each node of the tree will be a "block of tape" of length $O(b)$ (a sequence of $O(b)$ contiguous cells) from some particular tape of a $p$-tape Turing machine; the value of the root will contain the contents of the final tape block of the computation, including the relevant accept/reject state. The evaluation function at each node will simulate the Turing machine for $b$ steps, using the contents of tape blocks of length $O(b)$ from previous time blocks; this can be computed in $b \cdot \mathrm{poly}(\log t)$ time and $O(b)$ space, given the relevant contents of $O(b)$-length tape blocks from all $p$ tapes and the state from the previous time block. (The leaves of the tree roughly correspond to strings of blank symbols, or a block of symbols from the input $x$.) We end up with a Tree Evaluation instance of height $O(t/b)$ and fan-in $O(p)$, where $b$ steps of the Turing machine are processed at each node, and where each node is labeled with a string of about $b$ bits. (There are several technical details to worry over, such as the problem of knowing which tape blocks to use at each time step, but this is the high-level idea.)
Observe that, under the above parameter settings, the obvious depth-first Tree Evaluation procedure would yield an algorithm running in $O((t/b) \cdot b) = O(t)$ space. Applying the Cook-Mertz procedure, and setting $b$ to optimize the space usage, we find $b = \Theta(\sqrt{t \log t})$ and obtain the $O(\sqrt{t \log t})$ space bound of Theorem 1.1.
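To make the parameter choice concrete, here is the balancing calculation behind Theorem 1.1 (a sketch, with constant factors suppressed). Plugging height $h = \Theta(t/b)$, fan-in $d = O(1)$, and bit-length $O(b)$ into the Cook-Mertz bound $O(d \cdot b + h \log(d \cdot b))$ gives a total space usage of
\[
\mathrm{space}(b) \;=\; O\!\Big(b + \frac{t}{b} \cdot \log b\Big).
\]
The two terms are balanced (using $\log b = \Theta(\log t)$ when $b = t^{\Omega(1)}$) by taking
\[
b \;=\; \frac{t}{b} \cdot \log t
\quad\Longrightarrow\quad
b \;=\; \Theta\big(\sqrt{t \log t}\big),
\qquad
\mathrm{space} \;=\; O\big(\sqrt{t \log t}\big).
\]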
Organization.
Section 2 discusses preliminaries (which we do not recommend skipping, but it is short). Section 3 begins by proving a short "warm-up" simulation (Theorem 3.1) which already achieves an $O(\sqrt{t} \cdot \log t)$ space bound and gives many of the key ideas. Section 3.2 proves Theorem 1.1, discusses the possibility of improvement, and proves Theorem 1.5. Section 4 discusses various corollaries mentioned earlier, and Section 5 concludes with a number of new directions to consider.
2 Preliminaries
We assume the reader is familiar with basic notions in time and space bounded complexity [Gol08, AB09]. There are a few pieces not covered in the usual complexity textbooks which are important for this work.
Robustness of Space Complexity.
It is important to recall that the class $\mathsf{SPACE}[s(n)]$ of problems is robust under changes in the definition of the underlying machine model. For essentially any model (with sequential access versus random access, or tapes versus registers) that measures space usage by the total number of bits needed to represent the storage, $\mathsf{SPACE}[s(n)]$ is the same complexity class. This is part of what is known as the "invariance thesis" [SvEB88, vEB90]. As a consequence, we do not have to worry much about machine models when we're trying to design an $O(s(n))$-space algorithm and we're ignoring time complexity. This allows us to be more lax in our exposition.
Block-Respecting Turing Machines.
In our main simulation (Theorem 1.1) we will utilize a construction to simplify the analysis of multitape Turing machines. The construction was originally used by Hopcroft, Paul, and Valiant [HPV75] in their $O(t/\log t)$-space simulation of time $t$, and it has been used in many subsequent works to "decompose" time-bounded computations (such as [PPST83, KLV03, KLRS12, LW13, MW17]). A time-$t(n)$ block-respecting multitape Turing machine with blocks of length $b(n)$ has the property that on all inputs of length $n$, the computation is partitioned into $O(t(n)/b(n))$ time blocks of length $b(n)$, while each tape is partitioned into $O(t(n)/b(n))$ contiguous tape blocks of length $b(n)$. The key property is that for every time block, each tape head is in exactly one tape block during that time block, so that tape heads can switch between tape blocks only at the end of a time block. In particular, every tape head only crosses from one tape block to another, on time steps that are integer multiples of $b(n)$. A key lemma of [HPV75] is that every multitape Turing machine can be made block-respecting with low overhead.
Lemma 2.1 ([HPV75]).
For every time-constructible $b(n)$ such that $\log t(n) \leq b(n) \leq t(n)$ and every $t(n)$-time $p$-tape Turing machine $M$, there is an equivalent $O(t(n))$-time block-respecting $(p+1)$-tape Turing machine $M'$ with blocks of length $b(n)$.
The Tree Evaluation Problem.
As mentioned earlier, we will reduce time-$t$ computations to the Tree Evaluation problem. Our formulation of Tree Evaluation will be relaxed from the original notion: we allow a tree of height at most $h$, where each inner node $v$ of the tree has $d_v \leq d$ children for some integer $d_v$ depending on $v$, and is labeled with a function from $d_v \cdot b$ bits to $b$ bits. As before, each leaf is labeled with a $b$-bit string, and we wish to determine the first bit of the root's value. Prior work on Tree Evaluation assumed a complete $d$-ary tree where all root-to-leaf paths have length equal to $h$; in our setting, we allow some paths to have length less than $h$, and some nodes to have fewer than $d$ children. However, the algorithm of Cook and Mertz works just as well in our relaxed formulation:
Theorem 2.2 ([CM24], Theorem 7).
Tree Evaluation on trees of bit-length $b$, maximum height $h$, and fan-in at most $d$, can be computed in $O(d \cdot b + h \log(d \cdot b))$ space.
In Appendix A, we give an overview of how the Cook-Mertz algorithm works, and describe why it extends to our case.
2.1 More Related Work
As mentioned earlier, Hopcroft, Paul, and Valiant showed that time-$t$ multitape Turing machines can be simulated in $O(t/\log t)$ space [HPV75]. This simulation was later extended beyond the multitape model, yielding a more space-efficient simulation for essentially all common models of computation used in algorithms and complexity [PR81, HLMW86]. Paterson and Valiant [PV76] showed that circuits of size $s$ can be simulated by depth-$O(s/\log s)$ circuits, implying a space-efficient simulation of circuits. Similarly, Dymond and Tompa [DT85] showed that time $t$ can be simulated in alternating time $O(t/\log t)$.
For decades, a square-root space simulation of the form in Theorem 1.1 has been known for one-tape Turing machines: Hopcroft and Ullman [HU68] showed that time-$t$ Turing machines with one read-write tape can be simulated in space $O(\sqrt{t})$. Other improvements on this simulation (e.g., improving the time of the simulation, improving the model(s) slightly) include [Pat72, IM83, LL90].
Paul, Pippenger, Szemerédi, and Trotter [PPST83] proved the separation $\mathsf{NTIME}[n] \neq \mathsf{TIME}[n]$ for multitape Turing machines. Unfortunately, their proof only works for (one-dimensional) multitape Turing machines, and it is infamously still open to prove the separation in more general models. We prove Theorem 1.5 (an extension to the $d$-dimensional case) to illustrate that our approach via Tree Evaluation is more broadly applicable.
3 A More Space-Efficient Simulation of Time
In this section, we prove Theorem 1.1, showing that any multitape Turing machine $M$ running in time $t(n) \geq n$ can be simulated in space $O(\sqrt{t(n) \log t(n)})$. We will proceed by reducing the computation of $M$ on an input $x$ (of length $n$) to an instance of Tree Evaluation with particular parameters. (In fact, in the full proof of Theorem 1.1, we will reduce computing $M$ on $x$ to a series of Tree Evaluation instances.)
One initial remark is in order. First, note that we did not specify in the statement of Theorem 1.1 that the time function $t(n)$ is time constructible, in that there is a Turing machine that can print the value of $t(n)$ on $1^n$ in $O(t(n))$ steps. This is because in our space-bounded simulation, we can simply try increasing values of the time bound $t = n, n+1, n+2, \ldots$, one at a time, and not worry about constructibility issues. (This trick is used in other space-bounded simulations, such as [HPV75].)
Assume $M$ has $p$ tapes which are infinite in one direction, and all tape heads are initially at the leftmost cell. We also assume that the input $x$ is received on tape $1$.
3.1 A Warm-up Result
Before describing the full $O(\sqrt{t \log t})$-space simulation, first we present a slightly worse algorithm which uses $O(\sqrt{t} \cdot \log t)$ space and requires $t(n) \geq n^2$, but is simpler to describe. This bound is already extremely surprising (to the author), and the simulation gives most of the key intuition as well.
Theorem 3.1.
For every function $t(n) \geq n^2$, $\mathsf{TIME}[t(n)] \subseteq \mathsf{SPACE}[\sqrt{t(n)} \cdot \log t(n)]$.
The proof of Theorem 3.1 will utilize the well-known oblivious two-tape simulation of multitape Turing machines. A multitape Turing machine $M$ is oblivious if for every $n$, and every input $x$ of length $n$, the tape head movements of $M$ on $x$ depend only on $n$, and not on $x$ itself.
Theorem 3.2 ([HS66, PF79, FLvMV05]).
For every time-$t(n)$ multitape Turing machine $M$, there is an equivalent time-$T(n)$ two-tape Turing machine $M'$ which is oblivious, with $T(n) = O(t(n) \log t(n))$. Furthermore, given $n$ and $i$ specified in $O(\log t(n))$ bits, the two head positions of $M'$ on a length-$n$ input at time step $i$ can be computed in $\mathrm{poly}(\log t(n))$ time.
Let $M'$ be the machine obtained from Theorem 3.2. The first idea behind our simulation is to conceptually partition the computation of $M'$ on a length-$n$ input $x$ into time and tape "blocks" of length $b$, for a parameter $b$ to be set later. In particular, the two tapes of $M'$ are split into tape blocks of $b$ contiguous cells, and the $T(n) = O(t \log t)$ steps of $M'$ on $x$ are split into $B = \lceil T(n)/b \rceil$ contiguous time blocks of length up to $b$. Observe that, for any block of $b$ steps on $M'$, and any given tape, there are at most two tape blocks of that tape that may have been accessed during the time block (moreover, they are adjacent tape blocks). We will construct a Tree Evaluation instance where the functions at each node of the tree evaluate a single time block, taking as input some relevant tape blocks from previous time blocks. In fact, the same time block may be recomputed many times over the entire Tree Evaluation instance, in order to evaluate the last time block at the root of the tree.
A Computation Graph.
To this end, we define a computation graph $G_{M',x}$ on $B$ nodes, a directed acyclic graph whose edges model the information flow from earlier time blocks in the computation to later time blocks: the reads and writes of $M'$ and the head movements across blocks of tape. Our notion is very similar to the computation graphs defined in [HPV75, PR80]. Eventually, the computation being performed on the $B$-node graph will be viewed as a Tree Evaluation instance of height at most $B$.
Our graph $G_{M',x}$ has a node $i$ for each time block $i \in \{1, \ldots, B\}$, and the edges will indicate which previous time block contents need to be read in order to compute the content of the tape blocks accessed during time block $i$. In particular, we say that all tape blocks on the two tapes are active during time block $1$, and for $i \geq 2$, a tape block is active during time block $i$ if the tape head visits some cell of the block during time block $i$. We put an edge $(i', i)$ in $G_{M',x}$ with $i' < i$ if and only if:

• either $i' = i - 1$, or

• when $M'$ is run on input $x$, there is some tape block active during time block $i'$ that is not active again until time block $i$. That is, for some tape head $h$, it reads the same tape block in both time blocks $i'$ and $i$, but does not read that tape block during any time blocks strictly between $i'$ and $i$. (Alternatively, if some tape block is being accessed for the first time in time block $i$, we have an edge $(1, i)$ to reflect the fact that all tape blocks are defined to be active in time block $1$.)
Observe that each node $i$ has indegree at most $5$: one edge from $i - 1$, and at most four other edges for (at most) four active tape blocks during time block $i$ (two active tape blocks for each of the two tapes).
The key insight behind the computation graph notion is that the information needed to simulate $M'$ during a time block $i$ only requires knowing the information computed in previous time blocks $i'$, where $(i', i)$ is an edge in $G_{M',x}$. In particular, the state and head positions of $M'$ at the start of time block $i$ may be obtained from the state and head positions at the end of time block $i - 1$, so we have the edge $(i-1, i)$, and we have an edge $(i', i)$ for each of those blocks of tape accessed during time block $i'$ which are not accessed again until time block $i$.
Due to the obliviousness of $M'$ (Theorem 3.2), we have the following claim:
Claim 3.3.
Given the indices of any two nodes $i', i$ in $G_{M',x}$, we can determine if there is an edge from $i'$ to $i$ in $O(\log t)$ additional space.
The claim follows because determining whether $(i', i)$ is an edge just requires us to keep track of the locations of the two tape heads, from time block $i'$ to the later time block $i$. By Theorem 3.2, we can calculate the two tape head locations at any point in time using only $\mathrm{poly}(\log t)$ time, so we only have to use $O(\log t)$ space to keep track of whether a tape block accessed during time block $i'$ is only accessed later at time block $i$.
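The following is a minimal code sketch of this edge test. It assumes a procedure head_position(tape, step) supplied by the oblivious simulation of Theorem 3.2; all identifiers are illustrative, not from the paper. Beyond that oracle, it streams over time steps and stores only a constant number of $O(\log t)$-bit counters and block indices, mirroring Claim 3.3.

```python
def blocks_touched(head_position, tape, i, b):
    """Tape-block indices visited by `tape`'s head during time block i
    (steps (i-1)*b .. i*b - 1); at most two adjacent blocks."""
    return {head_position(tape, step) // b
            for step in range((i - 1) * b, i * b)}

def is_edge(head_position, i_prime, i, b):
    """Does the computation graph G_{M',x} contain the edge (i', i)?"""
    if i_prime == i - 1:
        return True
    for tape in (0, 1):  # the oblivious machine M' has two tapes
        for blk in blocks_touched(head_position, tape, i, b):
            # all tape blocks count as active in time block 1
            seen_then = (i_prime == 1 or
                         blk in blocks_touched(head_position, tape, i_prime, b))
            seen_between = any(
                blk in blocks_touched(head_position, tape, mid, b)
                for mid in range(i_prime + 1, i))
            if seen_then and not seen_between:
                return True
    return False
```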
The Functions at the Nodes.
Now we define what is being computed at each node of $G_{M',x}$. The content of time block $i$, denoted by $\mathrm{content}(i)$, is defined as follows:

• $\mathrm{content}(1)$ is the initial configuration of $M'$ running on the input $x$ of length $n$, encoded in $O(n)$ bits. (Recalling we have $t(n) \geq n^2$, we will eventually set $b \geq n$. Thus the initial configuration of $M'$ on $x$ can "fit" in one tape block.)

• For $i \geq 2$, $\mathrm{content}(i)$ encodes information about the status of $M'$ on $x$ at the end of time block $i$: the state of $M'$, the two tape head positions, and a list of the contents of those tape blocks accessed during time block $i$. As there are at most four such tape blocks which may have been accessed during time block $i$, $\mathrm{content}(i)$ can be encoded in $O(b)$ bits.
Note that for every fixed $i$, if we are given $\mathrm{content}(i')$ for all edges $(i', i)$ in $G_{M',x}$, then we can compute $\mathrm{content}(i)$ in $b \cdot \mathrm{poly}(\log t)$ time and $O(b)$ space, by simply simulating $M'$ for $b$ steps on the tape blocks of the relevant $\mathrm{content}(i')$ values.
Our goal is to determine $\mathrm{content}(B)$, the content of the final time block, which will contain either an accept or reject state and determine the answer.
A Tree Evaluation Instance.
To do this, we construct a Tree Evaluation instance in which the root node computes $\mathrm{content}(B)$, its children compute $\mathrm{content}(i)$ for all $i$ such that $(i, B)$ is an edge in $G_{M',x}$, and so on; the leaves of our Tree Evaluation instance will compute $\mathrm{content}(1)$, the initial configuration of $M'$ on $x$. This transformation is analogous to how depth-$d$ Boolean circuits of fan-in $f$ can be modeled by formulas of depth $d$ and size at most $O(f^d)$, by tracing over all possible paths from the output gate of the circuit to an input gate of the circuit; see for example [Juk12, Chapter 6].
More precisely, we define a tree $\mathcal{T}$ of height at most $B$ and fan-in at most 5, with a root node $r$ that will evaluate to $\mathrm{content}(B)$. Each node of $\mathcal{T}$ is labeled by a distinct path from some node $i$ in $G_{M',x}$ to the node $B$ of $G_{M',x}$. Inductively, we define the labels as follows:

• the root node of $\mathcal{T}$ is labeled by the empty string (the empty path from $B$ to itself), and

• for every node $v$ in $\mathcal{T}$ labeled by a distinct path $P$ from $i$ to $B$, and for every node $i'$ with an edge to $i$ in $G_{M',x}$, the node $v$ has a child $v'$ in $\mathcal{T}$ labeled by the path which takes the edge $(i', i)$ then the path $P$ to $B$.
Observe that paths of length $\ell$ from a node $i$ to node $B$ can be encoded in $O(\ell)$ bits, since the indegree of each node is at most $5$. Furthermore, given such a path $P$ encoded in this way, observe we can determine the node $i$ at the start of $P$ in $O(\log t)$ space, by making repeated calls to the edge relation (as in Claim 3.3).
The desired value to be computed at node $v$ (labeled by a path from $i$ to $B$) is precisely $\mathrm{content}(i)$. For $i = 1$, this value is just the initial configuration of $M'$ on $x$, which can be produced immediately in $O(b)$ time and space. For $i \geq 2$, $\mathrm{content}(i)$ can be computed in $b \cdot \mathrm{poly}(\log t)$ time and $O(b)$ space given the values of the children of $v$. Since $G_{M',x}$ has at most $B$ total nodes, the height of $\mathcal{T}$ is at most $B$. By induction, the value of the root of $\mathcal{T}$ is precisely $\mathrm{content}(B)$.
While the tree $\mathcal{T}$ has as many as $5^B$ nodes, observe that for any given node $v$ of $\mathcal{T}$ (labeled by a path to $B$), we can easily compute the labels of the children of $v$ in $O(\log t)$ space, using Claim 3.3 to compute the edges of the graph $G_{M',x}$. Thus, it takes only $O(\log t)$ additional space to determine the children of any given node of $\mathcal{T}$, and this space can be immediately reused once these children are determined. So without loss of generality, we may assume we have random access to the tree $\mathcal{T}$, its leaves, and its functions at each node.
Finally, we call the Cook-Mertz Tree Evaluation algorithm (Theorem 2.2) in its most general form on $\mathcal{T}$. Recall that the space bound of this algorithm, for trees of height at most $h$, fan-in at most $d$, computing $b'$-bit values at each node, is
\[
O\big(d \cdot b' + h \cdot \log(d \cdot b')\big).
\]
For us, $h \leq B \leq O((t \log t)/b)$, $d \leq 5$, and $b' = O(b)$. We only use $O(b)$ additional time and space for each function call to compute some $\mathrm{content}(i)$, and we can reuse this space for every separate call. Therefore our space bound is optimized up to constant factors, by setting $b$ such that
\[
b = \frac{t \log t}{b} \cdot \log b,
\]
so $b = \Theta(\sqrt{t} \cdot \log t)$ suffices. This completes the proof of Theorem 3.1.
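For concreteness, here is a quick check of that setting (a sketch; constants suppressed). With $b = \sqrt{t} \cdot \log t$,
\[
\frac{t \log t}{b} \cdot \log b
\;=\; \frac{t \log t}{\sqrt{t} \, \log t} \cdot \Theta(\log t)
\;=\; \Theta\big(\sqrt{t} \, \log t\big)
\;=\; \Theta(b),
\]
so the two terms indeed balance, and the total space is $O(b) = O(\sqrt{t} \log t)$.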
3.2 The Main Result
Now we describe how to improve the bound of the space simulation to $O(\sqrt{t \log t})$, by avoiding the $O(\log t)$ time blowup of the oblivious simulation. This will establish Theorem 1.1. The main difficulty is that without the oblivious guarantee of Theorem 3.2, we do not know how to determine the edges of the computation graph in an efficient way. To remedy this, we will use more space: we will enumerate over possible computation graphs $G$, and introduce a method for checking that $G = G_{M',x}$ in the functions of our Tree Evaluation instance. Our graph enumeration will be performed in a particularly space-efficient way, so that if Tree Evaluation turns out to be in logspace, the simulation of this paper will yield an $O(\sqrt{t})$ space bound.
As before, we start with a multitape machine $M$ which runs in time $t(n) \geq n$, and let $x$ be an input of length $n$.
First, we make $M$ block-respecting as in Lemma 2.1, with block-length $b$ on inputs of length $n$, for a parameter $b$ (with $\log t \leq b \leq t$) to be set later. The new multitape machine $M'$ has a constant number $p$ of tapes, runs in time $O(t)$, and has $T = O(t/b)$ time and tape blocks. (Footnote 6: Strictly speaking, we do not have to make $M'$ block-respecting, but it does make aspects of the presentation a little cleaner: we do not have to reason about "active" tape blocks as we did in the warm-up (Theorem 3.1). Foreshadowing a bit, we will not use a block-respecting notion in the later extension to $d$-dimensional Turing machines (Theorem 1.5) and will again use "active" tape blocks.)
The Computation Graph.
We start by refining the computation graph notion from Theorem 3.1. Here, our computation graph $G_{M',x}$ is similar but not identical to that of [HPV75] and the warm-up Theorem 3.1. Because we will allow for space bounds which are smaller than the input length $n$, we have to make several modifications to $G_{M',x}$ to fit the specifications of Tree Evaluation. We define the set of nodes in $G_{M',x}$ to be
\[
\{(i, j) \mid i \in [T],\; j \in [p]\} \;\cup\; \{(0, j, k) \mid j \in [p],\; k \in [T]\}.
\]
Intuitively, each $(i, j)$ will correspond to the content of the relevant block of tape $j$ after time block $i$, while each $(0, j, k)$ will be a source node in $G_{M',x}$ corresponding to the content of the $k$-th block of tape $j$ when it is accessed for the first time, i.e., the initial configuration of the $k$-th block of tape $j$. (Footnote 7: A little explanation may be in order. In the warm-up Theorem 3.1, we assumed $t(n) \geq n^2$, so that the entire initial configuration of the machine on $x$ could fit in a length-$b$ block. This made the leaf nodes of our Tree Evaluation instance particularly easy to describe. For smaller $t(n)$, we may have $b \ll n$, so the input may not fit in one tape block. To accommodate this possibility and obtain a clean reduction to Tree Evaluation in the end, we define multiple source nodes in $G_{M',x}$ to account for different $b$-length blocks of input $x$ in the initial configuration of $M'$ on $x$, along with $b$-length blocks of all-blank tape when these blocks are accessed for the first time.) (We imagine that on each tape, the tape blocks are indexed starting from the leftmost block, with up to $T$ tape blocks for each tape.) We think of all nodes $(0, j, k)$ as associated with "time block $0$".
Each node $(i, j)$ is labeled with an integer $m_{i,j} \in \{-1, 0, 1\}$, indicating the tape head movement at the end of time block $i$:
\[
m_{i,j} =
\begin{cases}
-1 & \text{if head } j \text{ moves to the adjacent tape block on its left at the end of time block } i,\\
\phantom{-}0 & \text{if head } j \text{ stays in the same tape block,}\\
\phantom{-}1 & \text{if head } j \text{ moves to the adjacent tape block on its right.}
\end{cases}
\]
Next, we describe the edges of $G_{M',x}$; there are two types. For each $(i', j')$ and $(i, j)$ with $1 \leq i' < i$, the edge $((i', j'), (i, j))$ is put in $G_{M',x}$ if either:

• $i' = i - 1$, or

• while $M'$ is running during time block $i$, the tape block accessed by head $j'$ during time block $i'$ is not accessed again until time block $i$. That is, tape head $j'$ reads the same tape block in both time blocks $i'$ and $i$, but head $j'$ does not read that tape block during any of the time blocks $i' + 1, \ldots, i - 1$.
For $(0, j', k)$ and $(i, j)$, the edge $((0, j', k), (i, j))$ is put in $G_{M',x}$ if, while $M'$ is running during time block $i$, the tape head $j'$ accesses its $k$-th tape block for the first time in the computation. (For example, note that $((0, j, 1), (1, j))$, for all $j \in [p]$, is an edge.)
Observe that the indegree of each node $(i, j)$ is at most $2p$: for all $j' \in [p]$, there is an edge from $(i-1, j')$ to $(i, j)$ (and from $(0, j', 1)$ to $(1, j)$ in the case $i = 1$), and there is an edge from a node of tape $j'$ (either of the form $(i', j')$ or $(0, j', k)$) to $(i, j)$, indicating which block of tape $j'$ is needed to compute the block of tape $j$ during time block $i$. (Note that some of these edges might be double-counted, so the indegree is at most $2p$.)
A Succinct Graph Encoding.
The obvious way to store the computation graph $G_{M',x}$ uses $O(T \cdot p \cdot \log T)$ bits. To save space, we will be more careful, and use additional observations on the structure of such computation graphs. (These observations are similar but not identical to those used in the separation of $\mathsf{NTIME}[n]$ and $\mathsf{TIME}[n]$, of Paul-Pippenger-Szemerédi-Trotter [PPST83].)
Recall that each node $(i, j)$ is labeled by a head movement $m_{i,j} \in \{-1, 0, 1\}$ indicating whether the tape block index of head $j$ is decremented by $1$, incremented by $1$, or stays the same, at the end of time block $i$. Our key observation is that the numbers $m_{i,j}$ alone already tell us the entire structure of the graph $G_{M',x}$. This immediately implies that every possible guess of $G_{M',x}$ can be encoded in only $O(T \cdot p)$ bits: we only need a constant number of bits for each of the $T \cdot p$ nodes $(i, j)$.
In particular, for an index $i \in [T]$ and tape $j \in [p]$, we define $\mathrm{block}(i, j)$ to be the index of the tape block of tape $j$ being accessed at the start of time block $i$. For all $j \in [p]$, we have $\mathrm{block}(1, j) = 1$, and for $i \geq 2$, by definition of the $m_{i,j}$ we have
\[
\mathrm{block}(i, j) = \mathrm{block}(i-1, j) + m_{i-1, j}. \tag{1}
\]
Equation (1) is true because each $m_{i-1,j}$ tells us how the index of the tape block of tape $j$ changes at the end of time block $i-1$, which is the same as the tape block index at the start of time block $i$.
For $i' \geq 1$, observe that there is an edge from $(i', j')$ to $(i, j)$ in $G_{M',x}$ if and only if either:

• $i' = i - 1$, or

• $\mathrm{block}(i', j') = \mathrm{block}(i, j')$ and for all $i'' \in \{i'+1, \ldots, i-1\}$, $\mathrm{block}(i'', j') \neq \mathrm{block}(i, j')$. (The tape block accessed by head $j'$ in time block $i'$ is not accessed again until time block $i$.)
Furthermore, there is an edge from $(0, j', k)$ to $(i, j)$ if and only if $\mathrm{block}(i, j') = k$ and for all $i' < i$, $\mathrm{block}(i', j') \neq k$. (The tape block accessed by head $j'$ in time block $i$ was never accessed before, and its index is equal to $k$.)
We will use a few claims about the block function and the computation graph.
Claim 3.4.
Given $i \in [T]$, $j \in [p]$, and the claimed sequence $\{m_{i,j}\}$, we can compute $\mathrm{block}(i, j)$ in $O(\log t)$ additional space.
Proof.
Given any $(i, j)$, the value $\mathrm{block}(i, j)$ can be computed in logspace using (1), by maintaining a counter and streaming over the sequence $\{m_{i,j}\}$. ∎
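A minimal code sketch of this streaming computation (illustrative, not from the paper), where the guessed labels are given as a dictionary m mapping $(i, j)$ to $\{-1, 0, 1\}$:

```python
def block(i, j, m):
    """Tape-block index of head j at the start of time block i, via
    Equation (1): block(i, j) = block(i-1, j) + m[(i-1, j)].
    One left-to-right pass, one O(log t)-bit counter."""
    idx = 1                        # block(1, j) = 1 for every tape j
    for i_prime in range(1, i):
        idx += m[(i_prime, j)]     # head j's movement at end of block i'
    return idx
```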
Claim 3.5.
Given the indices of a pair of nodes $u, v$ in $G_{M',x}$, and the claimed sequence $\{m_{i,j}\}$, we can determine if $(u, v)$ is an edge in the encoded graph $G$ in $O(\log t)$ additional space.
Proof.
We can easily check if $i' = i - 1$ in logspace. By Claim 3.4, we can compute $\mathrm{block}(i, j)$ for any $(i, j)$ in logspace as well. Therefore the three possible conditions for an edge in the computation graph can each be checked in logspace. ∎
To summarize, we can encode $G_{M',x}$ in $O(T \cdot p) \leq O(t/b)$ bits, and given such an encoding, we can determine any desired edge of the graph in logspace.
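Continuing the sketch above, the edge test of Claim 3.5 can be phrased as follows (again illustrative; it uses the block() function from the previous sketch, with non-source nodes written as pairs $(i, j)$ and source nodes as triples $(0, j, k)$):

```python
def is_edge(u, v, m):
    """Does the graph G encoded by the labels m contain the edge (u, v)?"""
    i, j = v                                  # target node (i, j)
    if len(u) == 3:                           # source node (0, j', k)
        _, jp, k = u
        return (block(i, jp, m) == k and
                all(block(i2, jp, m) != k for i2 in range(1, i)))
    ip, jp = u                                # node (i', j')
    if ip == i - 1:                           # consecutive time blocks
        return True
    tgt = block(i, jp, m)                     # block of tape j' used in block i
    return (ip < i - 1 and block(ip, jp, m) == tgt and
            all(block(i2, jp, m) != tgt for i2 in range(ip + 1, i)))
```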
Values of Nodes.
Similarly to our warm-up Theorem 3.1, we define contents for each of the nodes of $G_{M',x}$. For all $j \in [p]$ and $k \in [T]$, we define $\mathrm{content}(0, j, k)$ for the source nodes as follows:

• If $j \neq 1$, then we are reading a tape that does not contain the input. In this case, $\mathrm{content}(0, j, k)$ is defined to be all-blank tape content: $b$ blanks, with the tape head at the leftmost cell of the block, and head position equal to $(k-1) \cdot b + 1$. (Note that, assuming we start numbering tape cells at $1$, cell $(k-1) \cdot b + 1$ is the leftmost cell of the $k$-th block of tape.)

• If $j = 1$, then we may need to read portions of the input $x$ of length $n$. If $k > \lceil n/b \rceil$, then the $k$-th tape block of tape $1$ does not contain any symbol of $x$, and $\mathrm{content}(0, 1, k)$ is defined to be all-blank tape content as above. Otherwise, if $k \leq \lceil n/b \rceil$, then $\mathrm{content}(0, 1, k)$ is defined to be the relevant $b$-length substring of $x$ (possibly padded with blanks at the end) with the tape head at the leftmost cell of the block, head position equal to $(k-1) \cdot b + 1$, and the initial state of $M'$ included if $k = 1$.
Note that the source nodes of our graph $G_{M',x}$ will eventually be the leaf nodes of the Tree Evaluation instance; in the above, we are also defining the values of those leaves.
For $i \in [T]$ and $j \in [p]$, we define $\mathrm{content}(i, j)$ of node $(i, j)$ similarly as in Theorem 3.1: it is a string encoding the pair $(i, j)$, the state of $M'$ and the head position of tape $j$ at the end of time block $i$, and the content of the tape block read by head $j$ during time block $i$. With a reasonable encoding, $\mathrm{content}(i, j)$ can be represented in $O(b + \log t)$ bits, and our computation graph has been set up (just as in Theorem 3.1) so that given the strings $\mathrm{content}(u)$ for all nodes $u$ with edges to $(i, j)$, the string $\mathrm{content}(i, j)$ can be computed in $b \cdot \mathrm{poly}(\log t)$ time and $O(b)$ space.
Computation Graph Enumeration.
In what follows, we enumerate over all possible $O(T \cdot p)$-bit choices of the encoding of the computation graph $G$ on the nodes described above. For each $G$, we construct a Tree Evaluation instance $\mathcal{T}_G$ based on $G$, which will attempt to simulate $M'$ on $x$ assuming $G = G_{M',x}$. We will set up $\mathcal{T}_G$ so that the following conditions hold:

• If $G \neq G_{M',x}$, this means that at the end of some time block $i$, some tape head $j$ of $M'$ on $x$ moves inconsistently with the guessed label $m_{i,j}$ in $G$. In this case, evaluating $\mathcal{T}_G$ will result in a special FAIL value at the root.

• If $G = G_{M',x}$, then $\mathcal{T}_G$ will evaluate to $\mathrm{content}(T, 1)$ at the root, which will include either an accept or reject state at the end of the computation of $M'$ on $x$. Thus we will be able to conclude decisively whether $M'$ accepts or rejects after this call to Tree Evaluation.
Finally, if we exhaust all computation graphs $G$ and always receive FAIL from all Tree Evaluation calls, we then increase our guess of $t(n)$ by $1$ (starting from $t(n) = n$), and restart the entire process. (Recall that we do not assume $t(n)$ is constructible.) Eventually, we will choose an appropriate $t(n)$, in which case the above enumeration of graphs will result in either acceptance or rejection.
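The outer loop of the simulation can be sketched as follows (illustrative code: tree_evaluate stands in for the Cook-Mertz procedure of Theorem 2.2 run on the implicitly-defined instance $\mathcal{T}_G$, and machine is assumed to expose the number of tapes of $M'$; neither name comes from the paper):

```python
from itertools import product
from math import ceil, isqrt, log2

def simulate(machine, x, tree_evaluate):
    t = max(2, len(x))                       # guesses for t(n): n, n+1, ...
    while True:
        b = max(1, isqrt(int(t * log2(t))))  # block length ~ sqrt(t log t)
        T = ceil(t / b)                      # number of time blocks
        p = machine.num_tapes
        # a candidate graph G is just the labels m: O(T * p) bits total
        for labels in product((-1, 0, 1), repeat=T * p):
            m = {(i, j): labels[(i - 1) * p + (j - 1)]
                 for i in range(1, T + 1) for j in range(1, p + 1)}
            root = tree_evaluate(machine, x, m, b)   # value of node (T, 1)
            if root != "FAIL":
                return root          # contains the accept/reject state
        t += 1       # every G failed: the guess for t(n) was too small
```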
The Functions At The Nodes.
We now define the functions at the nodes of our Tree Evaluation instance $\mathcal{T}_G$. These functions will allow us to detect when the current graph $G$ we are considering has an error. We have to be a little careful here, as the Cook-Mertz algorithm is only guaranteed to produce the value of the root of a given tree, and not the values of any intermediate nodes along the way. (Indeed, the values of other nodes may become quite "scrambled" from evaluating a low-degree extension of the functions involved.)
For each tape $j \in [p]$ and time block $i \in [T]$, we define a time block function $F_{i,j}$ which attempts to simulate $M'$ over time block $i$, and to output $\mathrm{content}(i, j)$, the content of the relevant block of tape $j$ at the end of time block $i$.
First, we choose an encoding of strings so that there is also a special FAIL string of length $O(b)$, which is distinct from all valid content strings.
• The input to $F_{i,j}$ consists of up to $2p$ strings of length $O(b)$. Some strings may be $O(b)$-bit FAIL strings. In the case where $G = G_{M',x}$, $p$ of the input strings have the form $\mathrm{content}(i-1, j')$ (or $\mathrm{content}(0, j', 1)$ if $i = 1$) for all $j' \in [p]$. These content strings contain the index of the node in $G$ they correspond to, the state of $M'$ at the start of time block $i$, the content of the relevant tape blocks at the end of time block $i-1$, and the head positions of all $p$ tapes at the start of time block $i$, where each head position is encoded in $O(\log t)$ bits. When some of the tape blocks accessed in time block $i$ were not accessed in the previous time block $i-1$, there are also (up to) $p$ other strings which contain tape block content for each of the $p$ tapes at the start of time block $i$.

• The output of $F_{i,j}$ is defined as follows. First of all, if some input string is a FAIL string, then $F_{i,j}$ immediately outputs FAIL as well. In this way, a single FAIL detection at any node will propagate to the root value of $\mathcal{T}_G$.

Assuming no FAIL strings have been given as input, $F_{i,j}$ attempts to simulate $M'$ for time block $i$, using the given content strings. While simulating, $F_{i,j}$ checks that for all $j' \in [p]$, the tape head $j'$ moves consistently with the integer $m_{i,j'}$. In particular, if $m_{i,j'} = -1$ then it checks tape head $j'$ moves to its left-adjacent tape block at the end of time block $i$, if $m_{i,j'} = 1$ then it checks $j'$ moves to its right-adjacent tape block, and if $m_{i,j'} = 0$ then it checks $j'$ remains in the same tape block. If all heads move consistently with the integers $m_{i,j'}$, then the output of $F_{i,j}$ is set to be $\mathrm{content}(i, j)$. Otherwise, the output of $F_{i,j}$ is FAIL.
Observe that $F_{i,j}$ can be computed in $b \cdot \mathrm{poly}(\log t)$ time and $O(b)$ space, by simply simulating $M'$ for $b$ steps, starting from the contents for each of the $p$ tapes, and the state and head information given. Furthermore, by setting up the encoding appropriately, we may think of each $F_{i,j}$ as a function from an ordered collection of $2p$ separate $O(b)$-bit strings, to a single $O(b)$-bit string. These will be the functions at the nodes of our Tree Evaluation instance.
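In code, one evaluation of $F_{i,j}$ might look as follows (a sketch under stated assumptions: step is a single-step simulator for $M'$ restricted to the supplied blocks, and decode/encode are the assumed content-string encodings; none of these names come from the paper):

```python
FAIL = "FAIL"

def F(i, j, inputs, m, b, p, step, decode, encode):
    """Evaluate time block i and return content(i, j), or FAIL."""
    if FAIL in inputs:
        return FAIL                       # propagate any failure upward
    state, heads, blocks = decode(inputs) # state, p head positions, blocks
    start = [h // b for h in heads]       # tape-block index of each head
    for _ in range(b):                    # simulate the b steps of block i
        state, heads, blocks = step(state, heads, blocks)
    for jp in range(1, p + 1):            # verify the guessed labels
        if heads[jp - 1] // b - start[jp - 1] != m[(i, jp)]:
            return FAIL                   # head jp moved inconsistently
    return encode(i, j, state, heads[j - 1], blocks[j - 1])
```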
Construct a Tree Evaluation Instance.
Observe that the depth of every $G$ is at most $T$: for every edge $(u, v)$ in $G$, the time block of $u$ is always smaller than the time block of $v$, and there are $T$ time blocks. For each possible computation graph $G$, we will construct an equivalent and implicitly-defined Tree Evaluation instance $\mathcal{T}_G$ of height at most $T$, such that the edge relation of the tree can be determined in small space. The idea is analogous to that described in the warm-up (Theorem 3.1); for completeness, we will be a little more formal than Theorem 3.1.
Recall that each candidate $G$ is defined so that each non-source node has indegree at most $2p$. Let $S$ be the set of all sequences of the form $(a_1, \ldots, a_\ell)$ for $\ell \leq T$, where each $a_i \in [2p]$, and let $\varepsilon$ denote the empty sequence (also in $S$). The nodes of $\mathcal{T}_G$ will be indexed by sequences in $S$. For all $s \in S$, each node $v_s$ has at most $d_s \leq 2p$ children for some integer $d_s$.
Each node $v_s$ of $\mathcal{T}_G$ is directly associated with a path from a node to the node $(T, 1)$ in the guessed graph $G$, as follows.
• The node $v_\varepsilon$ corresponds to the node $(T, 1)$ in $G$ (tape $1$, in the last time block), and we associate the function $F_{T,1}$ with this node.

• Inductively assume the node $v_s$ corresponds to some node $(i, j)$ in $G$. We associate the function $F_{i,j}$ with this node.

For every $a \in \{1, \ldots, p\}$, the node $v_{sa}$ corresponds to a node $(i', a)$ (or $(0, a, k)$) in $G$ with an edge to $(i, j)$, for some $i' < i$ (or some $k \in [T]$). This corresponds to the case where the tape block of tape $a$ accessed in time block $i'$ is accessed later in time block $i$ (or when head $a$ is reading tape block $k$ for the first time in time block $i$).

For every $a \in \{p+1, \ldots, 2p\}$, let $j' = a - p$, so $j' \in [p]$. The node $v_{sa}$ corresponds to the node $(i-1, j')$ with an edge to $(i, j)$ when $i \geq 2$, and the node $(0, j', 1)$ with an edge to $(1, j)$ when $i = 1$. This corresponds to the case where the state and head position of tape $j'$ at the end of time block $i-1$ is passed to the start of time block $i$, and to the case where the initial state and head position is passed to the start of time block $1$.
If there is an edge from $(0, j', k)$ to $(i, j)$ for some $k$, then the tape head $j'$ has never previously visited the tape block that is used to compute time block $i$. In that case, we set the corresponding child $v_{sa}$ to be a leaf node in the tree. The value of that leaf node is set to be $\mathrm{content}(0, j', k)$.
Finishing up.
We call Tree Evaluation on $\mathcal{T}_G$. If the current guessed $G$ is not equal to $G_{M',x}$, then some guessed integer $m_{i,j}$ is an incorrect value (recall that $G$ is specified entirely by the integers $m_{i,j}$). This incorrect head movement will be detected by the function $F_{i,j}$, which will then output FAIL. This FAIL value will propagate to the root value of $\mathcal{T}_G$ by construction. When $\mathcal{T}_G$ evaluates to FAIL, we move to the next possible $G$, encoded in $O(T \cdot p)$ bits.
Assuming the current graph $G$ is correct, i.e., $G = G_{M',x}$, then the value of the root of $\mathcal{T}_G$ is $\mathrm{content}(T, 1)$, the output of $F_{T,1}$ on the final time block $T$. This value contains the correct accept/reject state of $M'$ on $x$. This follows from the construction of the functions $F_{i,j}$, and can be proved formally by an induction on the nodes of $G$ in topological order. Therefore if our Tree Evaluation call returns some content with an accept/reject state, we can immediately return the decision.
Space Complexity.
Observe that, by Claims 3.4 and 3.5, any bit of the Tree Evaluation instance $\mathcal{T}_G$ defined above can be computed in $O(b)$ additional space, given the computation graph $G$ encoded in $O(t/b)$ bits. In particular, given the index of a node $v_s$ of the tree as defined above (a sequence $s$ encoding a path from a node to $(T, 1)$ in $G$), we can determine the corresponding node of $G$ and its children in $O(\log t)$ additional space.
Therefore, we can call the Cook-Mertz algorithm (Theorem 2.2) on this implicitly-defined instance of Tree Evaluation, using $O(d \cdot b' + h \cdot \log(d \cdot b'))$ space plus $O(t/b)$ space to store the guessed graph $G$, where

• $b' = O(b)$ is the bit-length of the output of our functions $F_{i,j}$,

• $d = 2p$, the number of children of each inner node, and

• $h = T = O(t/b)$, the maximum height of the tree.
At each node, each call to one of the functions $F_{i,j}$ requires only $O(b)$ additional space. Therefore the overall space bound is
\[
O\Big(\frac{t}{b} + p \cdot b + \frac{t}{b} \cdot \log(p \cdot b)\Big) \;\leq\; O\Big(b + \frac{t}{b} \cdot \log t\Big).
\]
Setting $b = \sqrt{t \log t}$ obtains the bound $O(\sqrt{t \log t})$. This completes the proof of Theorem 1.1.
Given that we have reduced to Tree Evaluation, it may be instructive to think about what is happening in the final algorithm, conceptually. At a high level, for our instances of Tree Evaluation, the Cook-Mertz procedure on $\mathcal{T}_G$ uses $O((t/b) \cdot \log b)$ space to specify the current node of $\mathcal{T}_G$ being examined (given by a path from that node to the node $(T, 1)$) as well as an element from a field of size $\mathrm{poly}(b)$ for each node along that path. The algorithm also reuses $O(b)$ space at each node, in order to compute low-degree extensions of the function $F_{i,j}$ by interpolation over carefully chosen elements of the field. For our setting of $b$, we are using equal amounts of space to store a path in the graph labeled with field elements, and to compute a low-degree extension of the "block evaluation" function at each node (with aggressive reuse of the latter space, at every level of the tree).
3.3 On Possibly Removing the Square-Root-Log Factor
We observe that, if Tree Evaluation turns out to be in $\mathsf{LOGSPACE}$, then we would obtain a simulation that runs in $O(\sqrt{t})$ space when the number of children is upper bounded by a constant (as it is in our reduction). This is due to our succinct encoding of the computation graph, which only needs $O(t/b)$ bits to be stored. Setting $b = \sqrt{t}$, this would remove the pesky $\sqrt{\log t}$ factor from our space bound above:
Corollary 3.6.
If Tree Evaluation is in $\mathsf{LOGSPACE}$, then $\mathsf{TIME}[t(n)] \subseteq \mathsf{SPACE}[\sqrt{t(n)}]$.
There may be a path to removing the $\sqrt{\log t}$ factor that does not require solving Tree Evaluation in logspace. The simulations of time $t$ building on Hopcroft-Paul-Valiant [HPV75, PV76, PR81, DT85, HLMW86], which save a $\Theta(\log t)$ factor in space and alternating time, utilize strategies which seem orthogonal to our approach via Tree Evaluation. Roughly speaking, there are two kinds of strategies: a pebbling strategy on computation graphs which minimizes the total number of pebbles needed [HPV75, PR81], and a recursive composition strategy which cuts the graph into halves and performs one of two recursive approaches based on the cardinality of the cut [PV76, DT85, HLMW86]. (Footnote 8: There is even a third strategy, based on an "overlap" argument [AL81].) Neither of these seems comparable to the Cook-Mertz algorithm. However, so far we have been unable to merge the Tree Evaluation approach and the other strategies. The following hypothesis seems plausible:
Hypothesis 3.7.
For “reasonable” functions $t(n)$ and $b(n)$, time-$t$ multitape computations can be space-efficiently reduced to Tree Evaluation instances with constant degree $d$, height $O(t/(b \log t))$, and functions from $d \cdot b$ bits to $b$ bits at each inner node.
Results such as [PV76, DT85], which show that $\mathsf{TIME}[t] \subseteq \mathsf{ATIME}[t/\log t]$ and that circuits of size $t$ can be simulated in depth $O(t/\log t)$, prove that the hypothesis is actually true for $b = O(1)$. If the hypothesis is also true for $b = \Theta(\sqrt{t})$, then we could conclude $\mathsf{TIME}[t] \subseteq \mathsf{SPACE}[\sqrt{t}]$, by applying Cook and Mertz (Theorem 2.2).
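To see how the parameters would combine (a sketch; constants suppressed, under the height parameter stated in the hypothesis), plug $d = O(1)$, height $h = O(t/(b \log t))$, and $b = \Theta(\sqrt{t})$ into the bound of Theorem 2.2:
\[
O\big(d \cdot b + h \cdot \log(d \cdot b)\big)
\;=\; O\Big(\sqrt{t} \;+\; \frac{t}{\sqrt{t} \, \log t} \cdot \log \sqrt{t}\Big)
\;=\; O\big(\sqrt{t}\big).
\]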
3.4 Extension to Higher Dimensional Tapes
The simulation of Theorem 1.1 extends to Turing machines with higher-dimensional tapes. The proof is very similar in spirit to Theorem 1.1, but a few crucial changes are needed to carry out the generalization.
Reminder of Theorem 1.5. Every decision problem solvable by a $t(n)$-time $d$-dimensional multitape Turing machine can be decided in $O\big((t \log t)^{1 - 1/(d+1)}\big)$ space.
Proof.
(Sketch) As there is no $d$-dimensional version of block-respecting Turing machines, we have to take a different approach. Similarly to Theorem 3.1 and Paul-Reischuk [PR80], we upper-bound the number of tape blocks that may be relevant to any given time block, and use that information to construct a series of Tree Evaluation instances.
Suppose our time-$t$ Turing machine $M$ has $p$ tapes which are $d$-dimensional, and we wish to simulate $M$ on an input $x$ of length $n$. We assume $x$ is written on the first tape in a single direction starting from the cell indexed by $\vec{0}$. (For concreteness, for $i \in \{0, 1, \ldots, n-1\}$ we may assume the $i$-th bit of $x$ is written in the cell $(i, 0, \ldots, 0)$.)
For a parameter $b$, we partition the time $t$ into time blocks of length $b$, so there are $T = O(t/b)$ total time blocks. Besides time blocks $1, \ldots, T$, we also define a time block $0$ (similarly to Theorem 3.1 and Theorem 1.1). Each $d$-dimensional tape is partitioned into tape blocks of $b^d$ contiguous cells: in particular, each tape block is a $d$-dimensional cube with side-length $b$ in each direction. Observe that each tape block has up to $3^d - 1$ adjacent tape blocks in $d$ dimensions. We define a tape block to be active in time block $i$ if some cells of the tape block are accessed during time block $i$, and all tape blocks are defined to be active in time block $0$. Since each time block is only $b$ steps and each tape block has side-length $b$ in each direction, observe that for each $i$ and each tape $j$, there are at most $3^d$ blocks of tape $j$ that may be active during time block $i$. Therefore across all tapes, the total number of active blocks is at most $3^d \cdot p$. Note that each active tape block of tape $j$ during time block $i$ can be indexed by some vector in $\{-1, 0, 1\}^d$ indicating its position relative to the tape block in which the head of tape $j$ started the time block.
Similar to the proofs of Theorem 3.1 and Theorem 1.1, we define a computation graph $G$. The set of nodes will be the set $\{(i, j) \mid i \in [T],\; j \in [p]\}$ unioned with a subset of $\{(0, j, \vec{k}) \mid j \in [p],\; \vec{k} \in \mathbb{Z}^d\}$, where $\vec{k}$ ranges over the indices of tape blocks accessed at some point in the computation.
Each node will have a content value as before. The nodes $(0, j, \vec{k})$ will store a portion of the input, or the all-blank content, depending on the vector $\vec{k}$. The vector $\vec{k}$ gives the index of a block of tape $j$. (In the one-dimensional case, the tape blocks were simply indexed by integers $k \in [T]$.) The nodes $(i, j)$ will store information at the end of time block $i$: the state of $M$, the head position of tape $j$, and a list of the contents of those blocks of tape $j$ that are active during time block $i$. Note that $\mathrm{content}(i, j)$ can be encoded in $O(3^d \cdot b^d)$ bits.
As before, we label each node to indicate the head movements between time blocks, but our labels are more complicated than before. We use a $d$-dimensional vector $\vec{m}_{i,j}$ to indicate the difference between the index of the tape block that tape head $j$ is in at the start of time block $i$, and the index of the tape block that head $j$ is in at the end of time block $i$. We have $\vec{m}_{i,j} \in \{-1, 0, 1\}^d$, since every time block takes $b$ steps and the side-length of a tape block is $b$ cells. We also label each node $(i, j)$ with a list $L_{i,j}$ of up to $3^d$ other vectors in $\{-1, 0, 1\}^d$, describing the indices of all other blocks of tape $j$ that are active in time block $i$. Observe that the vectors $\vec{m}_{i,j}$ and the lists $L_{i,j}$ over all nodes can be encoded in $O(T \cdot p \cdot d \cdot 3^d) \leq O(t/b)$ bits. Moreover, given the vectors $\vec{m}_{i,j}$ and the lists $L_{i,j}$, we can reconstruct any $\mathrm{block}(i, j)$ and any active-block index in only logspace by maintaining $O(d)$ counters.
As before, we put an edge from $(i', j')$ to $(i, j)$ if either

• $i' = i - 1$, or

• $i' < i - 1$ and there is some tape block active on tape $j'$ in both time blocks $i'$ and $i$ that is not active for all time blocks $i' + 1$ through $i - 1$.
We put an edge from $(0, j', \vec{k})$ to $(i, j)$ if during time block $i$, the tape head $j'$ accesses the tape block indexed by $\vec{k}$ for the first time in the computation.
We observe that the indegree of each $(i, j)$ is at most $3^d \cdot p + p$: there are at most $3^d$ active tape blocks for each of the $p$ tapes, each of which may require information from a different previous node, and there are also $p$ edges from the previous time block providing the state and head positions at the start of time block $i$.
Generalizing Claims 3.4 and 3.5, we claim that the edges of $G$ can be determined in logspace, given the vectors $\vec{m}_{i,j}$ and the lists $L_{i,j}$ encoded in $O(t/b)$ bits. We enumerate over all possible computation graphs $G$, using this encoding. Note that the encoding also allows us to determine which source nodes $(0, j, \vec{k})$ appear in $G$, by tracking the tape head movements as claimed by the vectors $\vec{m}_{i,j}$ and the lists $L_{i,j}$, and noting when a tape head enters a new block that has not been accessed before.
For each node $(i, j)$, the evaluation function $F_{i,j}$ is defined similarly as in the proofs of Theorem 3.1 and Theorem 1.1. Given all the necessary information at the start of a time block: the state $q$, the contents of the (at most) $3^d \cdot p$ active blocks, and each of the $p$ head positions encoded in $O(d \cdot \log t)$ bits, the function $F_{i,j}$ computes $\mathrm{content}(i, j)$: it simulates $M$ for $b$ steps on the active blocks, then outputs updated contents of all active blocks on tape $j$, in $c \cdot 3^d \cdot b^d$ bits for some constant $c \geq 1$, including the head position of tape $j$ at the end of the time block and the state reached.
As before, $F_{i,j}$ checks that the claimed active tape blocks in $L_{i,j}$ and the claimed vector $\vec{m}_{i,j}$ are consistent with the simulation of time block $i$; if they are not, then $F_{i,j}$ outputs FAIL, and we design the $F_{i,j}$ as before so that a single FAIL value is propagated to the root of the instance. We can think of each $F_{i,j}$ as a mapping from $O(3^d \cdot p \cdot b^d)$ bits to $O(3^d \cdot b^d)$ bits, where we allow a special encoding to "pad" the input and output when the number of active blocks on some tape is less than $3^d$.
Finally, for each guessed graph $G$ we define a Tree Evaluation instance $\mathcal{T}_G$ analogously as in the proof of Theorem 1.1. The root of the tree corresponds to the node $(T, 1)$, where we wish to check if $\mathrm{content}(T, 1)$ contains the accept or reject state. The children of a given node in the tree are determined directly by the predecessors of the corresponding node in $G$. The leaves correspond to the nodes $(0, j, \vec{k})$ in $G$ such that $\mathrm{content}(0, j, \vec{k})$ contains either an all-blank $d$-dimensional cube of $b^d$ cells, or a $d$-dimensional cube of $b^d$ cells which includes up to $b$ symbols of the input $x$ and is otherwise all-blank.
As before, if a call to Tree Evaluation for some ever returns an answer other than FAIL, we return the appropriate accept/reject decision. If all calls FAIL, we increment the guess for the running time and try again.
Our Tree Evaluation instance has height at most $T = O(t/b)$, where each inner node has at most $3^d \cdot p + p$ children, working on blocks of bitlength $O(3^d \cdot b^d)$. Applying the Cook-Mertz algorithm for Tree Evaluation, recalling that $d$ and $p$ are both constants, and including the $O(t/b)$ bits we use to encode a candidate graph $G$, we obtain a simulation running in space
\[
O\Big(\frac{t}{b} + b^d + \frac{t}{b} \cdot \log(b^d)\Big) \;\leq\; O\Big(b^d + \frac{t}{b} \cdot \log t\Big),
\]
since $\log(b^d) \leq d \cdot \log t \leq O(\log t)$ and $3^d \cdot p + p$ is constant.
Setting $b = (t \log t)^{1/(d+1)}$, the resulting space bound is $O\big((t \log t)^{1 - 1/(d+1)}\big)$. ∎
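For concreteness, the final balancing step in the proof solves (constants suppressed):
\[
b^d \;=\; \frac{t}{b} \cdot \log t
\quad\Longrightarrow\quad
b^{d+1} \;=\; t \log t
\quad\Longrightarrow\quad
b \;=\; (t \log t)^{\frac{1}{d+1}},
\]
so that both terms equal $(t \log t)^{d/(d+1)} = (t \log t)^{1 - 1/(d+1)}$.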
We remark that putting Tree Evaluation in $\mathsf{LOGSPACE}$ would also directly improve the above simulation as well, to use $O(t^{1 - 1/(d+1)})$ space.
4 Some Consequences
First, we describe some simple lower bounds that follow by diagonalization. We will use the following form of the space hierarchy theorem:
Theorem 4.1 ([SHL65]).
For space constructible $s(n) \geq \log n$ such that $s'(n) = o(s(n))$, we have $\mathsf{SPACE}[s'(n)] \subsetneq \mathsf{SPACE}[s(n)]$.
Reminder of Corollary 1.2. For space constructible $s(n) \geq n$ and all $\varepsilon > 0$, $\mathsf{SPACE}[s(n)] \not\subseteq \mathsf{TIME}[s(n)^{2-\varepsilon}]$.
Proof.
Assume to the contrary that $\mathsf{SPACE}[s(n)] \subseteq \mathsf{TIME}[s(n)^{2-\varepsilon}]$ for some $\varepsilon > 0$ and some space constructible $s(n) \geq n$. By Theorem 1.1, we have
\[
\mathsf{SPACE}[s(n)] \;\subseteq\; \mathsf{TIME}[s(n)^{2-\varepsilon}] \;\subseteq\; \mathsf{SPACE}\Big[\sqrt{s(n)^{2-\varepsilon} \cdot \log\big(s(n)^{2-\varepsilon}\big)}\Big] \;\subseteq\; \mathsf{SPACE}[f(n)]
\]
for a function $f(n) \leq O\big(s(n)^{1-\varepsilon/2} \cdot \sqrt{\log s(n)}\big)$ which is $o(s(n))$. This contradicts Theorem 4.1. ∎
Reminder of Corollary 1.3. The language $L = \{\langle M, x, 1^k \rangle \mid M \text{ accepts } x \text{ using at most } k \text{ cells of space}\}$ requires $n^{2-\varepsilon}$ time to solve on a multitape Turing machine, for every $\varepsilon > 0$.
Proof.
Suppose $L$ can be solved in $n^{2-\varepsilon}$ time for some $\varepsilon > 0$. We show every language in $\mathsf{SPACE}[n]$ could then be solved in time $O(n^{2-\varepsilon})$, contradicting Corollary 1.2. Let $L'$ be decided by a Turing machine $M$ using $cn$ space for a constant $c \geq 1$. Given an input $x$ to $L'$, we call our $n^{2-\varepsilon}$ time algorithm for $L$ on the input $\langle M, x, 1^{c|x|} \rangle$ of length $O(|x|)$. This takes $O(|x|^{2-\varepsilon})$ time, a contradiction. ∎
Observe the same proof also shows a time lower bound of the form $n^2/\log^c n$, for a constant $c > 1$.
4.1 Subexponential Size Branching Programs for Circuits
Here, we observe that Theorem 1.1 implies that subquadratic size circuits can be simulated with subexponential size branching programs:
Reminder of Corollary 1.4. There is a universal $k \geq 1$ such that for all $s \geq n$, every bounded fan-in circuit of size $s$ and $n$ inputs has a branching program of size at most $2^{k \sqrt{s} \log^k s}$.
Recall the Circuit Evaluation problem: given the description of a Boolean circuit $C$ of fan-in two with one output, and given an input $x$, does $C$ on $x$ evaluate to $1$? We assume $C$ is encoded in topological order: the gates are numbered $1, \ldots, s$, and for all $i$, the $i$-th gate in the encoding only takes inputs from gates appearing earlier in the encoding. For simplicity, we further assume each gate $g$ provides a correct list of all future gates that will consume the output of gate $g$. Note that when $s$ is at least the number of input bits, we can still afford an encoding of size-$s$ circuits in $O(s \log s)$ bits, carrying this extra information. An old result commonly credited to Pippenger is that Circuit Evaluation on topologically-ordered circuits can be efficiently solved on multitape Turing machines:
Theorem 4.2 (Pippenger [Pip77]).
Circuit Evaluation on circuits of size $s$ can be solved in $s \cdot \mathrm{poly}(\log s)$ time on a multitape Turing machine.
The reference [Pip77] is difficult to find, however the PhD thesis of Swamy ([Swa78, Chapter 2, Theorem 2.1]) gives an exposition of an $O(s \log^2 s)$ time bound. (Footnote 9: Available at http://static.cs.brown.edu/research/pubs/theses/phd/1978/swamy.pdf. Swamy describes his construction as simulating oblivious RAMs / straight-line programs; it readily applies to circuits.) The high-level idea is to solve a more general problem: given the description of a circuit $C$ and input $x$, we wish to insert the output values of each gate of $C$ directly into the description of $C$. To solve this problem on circuits of size $s$, we first recursively evaluate the circuit on the first $s/2$ gates (in topological order), which sets values to all those gates. We pass over those values, and collect the outputs of all gates among the first $s/2$ that will be used as inputs to the remaining gates. We sort these outputs by gate index, then recursively evaluate the remaining $s/2$ gates on $x$ plus the list of outputs. This leads to a runtime recurrence of $T(s) \leq 2 \cdot T(s/2) + O(s \log s)$, which solves to $T(s) \leq O(s \log^2 s)$ (using the fact that sorting can be done in $O(s \log s)$ time on a multitape Turing machine).
Combining Theorem 4.2 and Theorem 1.1, we directly conclude that Circuit Evaluation can be solved in $\sqrt{s} \cdot \mathrm{poly}(\log s)$ space. Now, given any circuit $C$ of size $s$, we hardcode its description into the input of a multitape Turing machine using $\sqrt{s} \cdot \mathrm{poly}(\log s)$ space for Circuit Evaluation. Applying the standard translation of space-$s(n)$ algorithms into branching programs of size $2^{O(s(n))}$ (see for instance the survey [Raz91]), Corollary 1.4 follows.
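Schematically, the chain of simulations is (a sketch, with polylog factors left loose):
\[
\text{circuit of size } s
\;\xrightarrow{\ \text{Thm.\ 4.2}\ }\;
\text{time } s \cdot \mathrm{poly}(\log s)
\;\xrightarrow{\ \text{Thm.\ 1.1}\ }\;
\text{space } \sqrt{s} \cdot \mathrm{poly}(\log s)
\;\xrightarrow{\ [\text{Raz91}]\ }\;
\text{branching program size } 2^{O(\sqrt{s} \cdot \mathrm{poly}(\log s))}.
\]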
5 Discussion
We have shown that multitape Turing machines and circuits have surprisingly space-efficient evaluation algorithms, via a reduction to the Tree Evaluation problem. Two reflective remarks come to mind.
First, we find it very interesting that Tree Evaluation, which was originally proposed and studied in the hopes of getting a handle on $\mathsf{LOGSPACE}$ versus $\mathsf{P}$, may turn out to be more useful for making progress on $\mathsf{P}$ versus $\mathsf{PSPACE}$. In any case, it is clear that Tree Evaluation is a central problem in complexity theory.
Second, we find it fortunate that the main reduction of this paper (from time- multitape Turing machine computations to Tree Evaluation) was found after the Cook-Mertz procedure was discovered. Had our reduction been found first, the community (including the author) would have likely declared the following theorem (a cheeky alternative way of presenting the main reduction of Theorem 1.1) as a “barrier” to further progress on Tree Evaluation, and possibly discouraged work on the subject:
“Theorem.” Unless the 50-year-old [HPV75] simulation can be improved, Tree Evaluation instances of constant arity, height $h$, and $b$-bit values cannot be solved in $o(h \cdot b / \log b)$ space. (Footnote 10: Here, we think of $o(g)$ as shorthand for $O(g/f)$ for some unbounded function $f$.)
Theorem 1.1 and its relatives open up an entirely new set of questions that did not seem possible to ask before. Here are a few tantalizing ones.
Can the simulation of Theorem 1.1 be improved to show $\mathsf{TIME}[t] \subseteq \mathsf{SPACE}[\sqrt{t}]$? We discussed some prospects for a yes-answer in Section 3.3 (e.g., showing Tree Evaluation is in $\mathsf{LOGSPACE}$). An interesting secondary question is whether the longstanding $O(\sqrt{t})$-space simulation of one-tape time-$t$ Turing machines [HU68, Pat72] can be improved to $O(t^{1/2 - \varepsilon})$ space, for some $\varepsilon > 0$.
Is there an $\varepsilon > 0$ such that $\mathsf{TIME}[t] \subseteq \mathsf{ATIME}[t^{1-\varepsilon}]$? Is there a better speedup of time $t$, using alternating Turing machines? Recall the best-known simulation is $\mathsf{TIME}[t] \subseteq \mathsf{ATIME}[t/\log t]$ [DT85]. A yes-answer to this question would imply (for example) a super-linear time lower bound on solving quantified Boolean formulas (QBF), which is still open [Wil08, LW13]. Note that if Theorem 1.1 could be improved to $\mathsf{TIME}[t] \subseteq \mathsf{SPACE}[t^{1/2 - \varepsilon}]$ for some $\varepsilon > 0$, then we would have $\mathsf{TIME}[t] \subseteq \mathsf{ATIME}[t^{1 - 2\varepsilon}]$, since space $s$ can be simulated in alternating time $O(s^2)$.
Can time-$t$ random-access Turing machines be simulated in space $O(t^{1-\varepsilon})$, for some $\varepsilon > 0$? As mentioned in Remark 1.6, the answer is yes for "oblivious" models in which data access patterns can be calculated in advance. In both Theorem 1.1 and Theorem 1.5, we exploit the locality of low-dimensional tape storage: without that property, the indegrees of nodes in the computation graph would be rather high, and (as far as we can tell) the resulting simulation would not improve the known space bounds for simulating time $t$ in random-access models [PR81, HLMW86]. Similarly to the previous question, if $\mathsf{TIME}[t] \subseteq \mathsf{SPACE}[t^{1/2 - \varepsilon}]$ for some $\varepsilon > 0$, then the answer to the question would be yes, since for natural random-access models (e.g., log-cost RAMs), time $t$ can be simulated by $O(t^2)$-time multitape Turing machines [PF79]. A similar statement holds for improving the $d$-dimensional simulation of Theorem 1.5 [Lou83]. On the other hand, if the answer to the question is no, then we would separate linear time for multitape Turing machines and linear time for random-access models, another longstanding open question (see for example [GS02]).
Is a time-space tradeoff possible? For example, is $\mathsf{TIME}[t]$ contained in $\mathsf{TIME\text{-}SPACE}[\mathrm{poly}(t), t^{1-\varepsilon}]$, for some $\varepsilon > 0$? The Cook-Mertz procedure needs to compute a low-degree extension of the function at each node, which is time-consuming: for time blocks of $b$ steps, it takes $2^{\Theta(b)}$ time to compute the low-degree extension at a given point. If low-degree extensions of time-$t$ computations could be computed more time-efficiently, then such a time-space tradeoff may be possible. Note that if the multilinear extension of a given CNF on $n$ variables and $m$ clauses could be evaluated over a large field in $2^{(1-\varepsilon)n} \cdot \mathrm{poly}(m)$ time for some $\varepsilon > 0$, then the Strong Exponential Time Hypothesis [IP01, CIP09] would be false: the multilinear extension is the identically-zero polynomial if and only if the CNF is unsatisfiable, so one could use DeMillo–Lipton–Schwartz–Zippel [DL78] to test for unsatisfiability. However, this observation does not apparently rule out the possibility of evaluating low-degree but non-multilinear extensions efficiently.
Can the simulation be applied recursively, to show that $\mathsf{TIME}[t] \subseteq \mathsf{SPACE}[t^{\varepsilon}]$ for all $\varepsilon > 0$? Note that a yes-answer would imply $\mathsf{P} \neq \mathsf{PSPACE}$. At first glance, a recursive extension of Theorem 1.1 seems natural: we decompose a time-$t$ computation into $t/b$ blocks, each of which are time-$b$ computations. (This property was successfully exploited recursively in [LW13], for example, to show some weak lower bounds on QBF.) However, we cannot directly apply recursion to time blocks, because in the Cook-Mertz procedure, the task at each function evaluation is not just to simulate the machine for $b$ steps, but to compute a low-degree extension of the block function (as noted in the previous question). If low-degree extensions of time-$b$ computations can be computed in small space, then there is a possibility of recursion. However, given the discussion on the previous question, there is reason to think this will be difficult.
Is there a barrier to further progress? Perhaps lower bounds for the Node-Named Jumping Automata on Graphs (NNJAG) model (a popular model for restricted space lower bounds) [Poo93, Poo00, EPA99] could demonstrate a barrier. Many complex algorithms such as Reingold’s [Rei08] can be simulated in the NNJAG model [LZPC05]; does the Cook-Mertz algorithm have this property as well? We believe the answer is probably no: NNJAG is fundamentally a node-pebbling model, and the Cook-Mertz procedure definitely breaks space lower bounds for pebbling DAGs and trees [LT82, CMW+12]. Pebbling lower bounds were a major bottleneck to improving [HPV75].
While Theorem 1.1 and Theorem 3.1 are non-relativizing (see for example [HCC+93]), the simulations do permit a restricted form of relativization, as do all prior simulations of time in smaller space. Define $\mathsf{TIME}^{A}[t]$ to be the class of problems solvable in $O(t)$ time with queries to the oracle $A$, where all oracle queries are restricted to have length at most $O(\sqrt{t \log t})$. Similarly define $\mathsf{SPACE}^{A}[s]$, where all oracle queries have length at most $O(s)$. We observe the following extension of Theorem 3.1:
Theorem 5.1.
For every oracle $A$ and every time constructible $t(n) \geq n$, $\mathsf{TIME}^{A}[t]$ is contained in $\mathsf{SPACE}^{A}[\sqrt{t \log t}]$.
The theorem follows because each oracle query can be arranged to be written in a single tape block of length $O(\sqrt{t \log t})$, and the Cook-Mertz procedure treats the evaluation functions as black boxes. Thus each evaluation function can simply call the oracle as needed. (Tretkoff [Tre86] made a similar observation for the simulation of time $t$ in $\Sigma_4\mathsf{TIME}[t/\log^{*} t]$, of Paul-Pippenger-Szemerédi-Trotter [PPST83]. Such a length-restricted oracle model was also studied in [CW06].) Is there a barrier to improving such (restricted) relativizing results to obtain $\mathsf{P} \neq \mathsf{PSPACE}$? This seems related to the fact that Cai and Watanabe [CW06] were unable to prove an analogous collapse with “random access to advice” (length-restricted access to oracles).
Acknowledgments.
I am grateful to Rahul Ilango, Egor Lifar, Priya Malhotra, Danil Sibgatullin, and Virginia Vassilevska Williams for valuable discussions on Tree Evaluation and the Cook-Mertz procedure. I am also grateful to Shyan Akmal, Paul Beame, Lijie Chen, Ce Jin, Dylan McKay, and the anonymous STOC reviewers for helpful comments on an earlier draft of this paper.
References
- [AB09] Sanjeev Arora and Boaz Barak. Computational Complexity - A Modern Approach. Cambridge University Press, 2009. URL: http://www.cambridge.org/catalogue/catalogue.asp?isbn=9780521424264.
- [AL81] Leonard M. Adleman and Michael C. Loui. Space-bounded simulation of multitape Turing machines. Math. Syst. Theory, 14:215–222, 1981. URL: https://doi.org/10.1007/BF01752397.
- [CIP09] Chris Calabro, Russell Impagliazzo, and Ramamohan Paturi. The complexity of satisfiability of small depth circuits. In Parameterized and Exact Computation, 4th International Workshop, IWPEC, volume 5917 of Lecture Notes in Computer Science, pages 75–85. Springer, 2009. URL: https://doi.org/10.1007/978-3-642-11269-0_6.
- [CM20] James Cook and Ian Mertz. Catalytic approaches to the tree evaluation problem. In Proceedings of STOC, pages 752–760. ACM, 2020. URL: https://doi.org/10.1145/3357713.3384316.
- [CM21] James Cook and Ian Mertz. Encodings and the tree evaluation problem. Electron. Colloquium Comput. Complex., TR21-054, 2021. URL: https://eccc.weizmann.ac.il/report/2021/054.
- [CM22] James Cook and Ian Mertz. Trading time and space in catalytic branching programs. In 37th Computational Complexity Conference, CCC, volume 234 of LIPIcs, pages 8:1–8:21. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2022. URL: https://doi.org/10.4230/LIPIcs.CCC.2022.8.
- [CM24] James Cook and Ian Mertz. Tree evaluation is in space O(log n log log n). In Proceedings of the 56th Annual ACM Symposium on Theory of Computing (STOC), pages 1268–1278. ACM, 2024. URL: https://doi.org/10.1145/3618260.3649664.
- [CMW+12] Stephen A. Cook, Pierre McKenzie, Dustin Wehr, Mark Braverman, and Rahul Santhanam. Pebbles and branching programs for tree evaluation. ACM Trans. Comput. Theory, 3(2):4:1–4:43, 2012. URL: https://doi.org/10.1145/2077336.2077337.
- [CW06] Jin-yi Cai and Osamu Watanabe. Random access to advice strings and collapsing results. Algorithmica, 46(1):43–57, 2006. URL: https://doi.org/10.1007/s00453-006-0078-8.
- [DL78] Richard A. DeMillo and Richard J. Lipton. A probabilistic remark on algebraic program testing. Inf. Process. Lett., 7(4):193–195, 1978. URL: https://doi.org/10.1016/0020-0190(78)90067-4.
- [DT85] Patrick W. Dymond and Martin Tompa. Speedups of deterministic machines by synchronous parallel machines. J. Comput. Syst. Sci., 30(2):149–161, 1985. URL: https://doi.org/10.1016/0022-0000(85)90011-X.
- [EPA99] Jeff Edmonds, Chung Keung Poon, and Dimitris Achlioptas. Tight lower bounds for st-connectivity on the NNJAG model. SIAM J. Comput., 28(6):2257–2284, 1999.
- [FLvMV05] Lance Fortnow, Richard J. Lipton, Dieter van Melkebeek, and Anastasios Viglas. Time-space lower bounds for satisfiability. J. ACM, 52(6):835–865, 2005. URL: https://doi.org/10.1145/1101821.1101822.
- [Gol08] Oded Goldreich. Computational complexity - a conceptual perspective. Cambridge University Press, 2008. URL: https://doi.org/10.1017/CBO9780511804106.
- [Gol24] Oded Goldreich. On the Cook-Mertz Tree Evaluation procedure. Electron. Colloquium Comput. Complex., TR24-109, 2024. URL: https://eccc.weizmann.ac.il/report/2024/109.
- [GS02] Etienne Grandjean and Thomas Schwentick. Machine-independent characterizations and complete problems for deterministic linear time. SIAM J. Comput., 32(1):196–230, 2002. doi:10.1137/S0097539799360240.
- [HCC+93] Juris Hartmanis, Richard Chang, Suresh Chari, Desh Ranjan, and Pankaj Rohatgi. Relativization: a revisionistic retrospective. In Current Trends in Theoretical Computer Science - Essays and Tutorials, volume 40 of World Scientific Series in Computer Science, pages 537–548. World Scientific, 1993. URL: https://doi.org/10.1142/9789812794499_0040.
- [HLMW86] Joseph Y. Halpern, Michael C. Loui, Albert R. Meyer, and Daniel Weise. On time versus space III. Math. Syst. Theory, 19(1):13–28, 1986. URL: https://doi.org/10.1007/BF01704903.
- [HPV75] John E. Hopcroft, Wolfgang J. Paul, and Leslie G. Valiant. On time versus space. J. ACM, 24(2):332–337, 1977. Conference version in FOCS’75. URL: https://doi.org/10.1145/322003.322015.
- [HS66] F. C. Hennie and Richard Edwin Stearns. Two-tape simulation of multitape Turing machines. J. ACM, 13(4):533–546, 1966. URL: https://doi.org/10.1145/321356.321362.
- [HU68] John E. Hopcroft and Jeffrey D. Ullman. Relations between time and tape complexities. J. ACM, 15(3):414–427, 1968. URL: https://doi.org/10.1145/321466.321474.
- [IM83] Oscar H. Ibarra and Shlomo Moran. Some time-space tradeoff results concerning single-tape and offline TM’s. SIAM J. Comput., 12(2):388–394, 1983. URL: https://doi.org/10.1137/0212025.
- [IP01] Russell Impagliazzo and Ramamohan Paturi. On the complexity of k-SAT. J. Comput. Syst. Sci., 62(2):367–375, 2001. URL: https://doi.org/10.1006/jcss.2000.1727.
- [Juk12] Stasys Jukna. Boolean Function Complexity - Advances and Frontiers, volume 27 of Algorithms and combinatorics. Springer, 2012. URL: https://doi.org/10.1007/978-3-642-24508-4.
- [KLRS12] Subrahmanyam Kalyanasundaram, Richard J. Lipton, Kenneth W. Regan, and Farbod Shokrieh. Improved simulation of nondeterministic Turing machines. Theor. Comput. Sci., 417:66–73, 2012. URL: https://doi.org/10.1016/j.tcs.2011.05.018.
- [KLV03] George Karakostas, Richard J. Lipton, and Anastasios Viglas. On the complexity of intersecting finite state automata and NL versus NP. Theor. Comput. Sci., 302(1-3):257–274, 2003. URL: https://doi.org/10.1016/S0304-3975(02)00830-7.
- [Kur64] S.-Y. Kuroda. Classes of languages and linear-bounded automata. Information and Control, 7(2):207–223, 1964.
- [LL90] Maciej Liskiewicz and Krzysztof Lorys. Fast simulations of time-bounded one-tape Turing machines by space-bounded ones. SIAM J. Comput., 19(3):511–521, 1990. URL: https://doi.org/10.1137/0219034.
- [Lou81] Michael C. Loui. A space bound for one-tape multidimensional Turing machines. Theor. Comput. Sci., 15:311–320, 1981. URL: https://doi.org/10.1016/0304-3975(81)90084-0.
- [Lou83] Michael C. Loui. Optimal dynamic embedding of trees into arrays. SIAM J. Comput., 12(3):463–472, 1983. doi:10.1137/0212030.
- [LT82] Thomas Lengauer and Robert Endre Tarjan. Asymptotically tight bounds on time-space trade-offs in a pebble game. J. ACM, 29(4):1087–1130, 1982. doi:10.1145/322344.322354.
- [LW13] Richard J. Lipton and Ryan Williams. Amplifying circuit lower bounds against polynomial time, with applications. Comput. Complex., 22(2):311–343, 2013. URL: https://doi.org/10.1007/s00037-013-0069-5.
- [LZPC05] Pinyan Lu, Jialin Zhang, Chung Keung Poon, and Jin-yi Cai. Simulating undirected st-connectivity algorithms on uniform JAGs and NNJAGs. In Proceedings of 16th International Symposium on Algorithms and Computation (ISAAC), volume 3827 of Lecture Notes in Computer Science, pages 767–776. Springer, 2005. URL: https://doi.org/10.1007/11602613_77.
- [MW17] Cody D. Murray and R. Ryan Williams. Easiness amplification and uniform circuit lower bounds. In 32nd Computational Complexity Conference (CCC), volume 79 of LIPIcs, pages 8:1–8:21. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2017. URL: https://doi.org/10.4230/LIPIcs.CCC.2017.8.
- [NW94] Noam Nisan and Avi Wigderson. Hardness vs randomness. J. Comput. Syst. Sci., 49(2):149–167, 1994. URL: https://doi.org/10.1016/S0022-0000(05)80043-1.
- [Pat72] Mike Paterson. Tape bounds for time-bounded Turing machines. J. Comput. Syst. Sci., 6(2):116–124, 1972. URL: https://doi.org/10.1016/S0022-0000(72)80017-5.
- [PF79] Nicholas Pippenger and Michael J. Fischer. Relations among complexity measures. J. ACM, 26(2):361–381, 1979. URL: https://doi.org/10.1145/322123.322138.
- [Pip77] Nicholas Pippenger. Fast simulation of combinational logic networks by machines without random-access storage. In Proceedings of the Fifteenth Annual Allerton Conference on Communication, Control and Computing, pages 25–33, 1977.
- [Poo93] Chung Keung Poon. Space bounds for graph connectivity problems on node-named jags and node-ordered jags. In 34th Annual Symposium on Foundations of Computer Science (FOCS), pages 218–227. IEEE Computer Society, 1993. URL: https://doi.org/10.1109/SFCS.1993.366865.
- [Poo00] Chung Keung Poon. A space lower bound for st-connectivity on node-named jags. Theor. Comput. Sci., 237(1-2):327–345, 2000. URL: https://doi.org/10.1016/S0304-3975(00)00019-0.
- [PPST83] Wolfgang J. Paul, Nicholas Pippenger, Endre Szemerédi, and William T. Trotter. On determinism versus non-determinism and related problems (preliminary version). In 24th Annual Symposium on Foundations of Computer Science (FOCS), pages 429–438. IEEE Computer Society, 1983.
- [PR80] Wolfgang J. Paul and Rüdiger Reischuk. On alternation II. A graph theoretic approach to determinism versus nondeterminism. Acta Informatica, 14:391–403, 1980. URL: https://doi.org/10.1007/BF00286494.
- [PR81] Wolfgang J. Paul and Rüdiger Reischuk. On time versus space II. J. Comput. Syst. Sci., 22(3):312–327, 1981. URL: https://doi.org/10.1016/0022-0000(81)90035-0.
- [PV76] Mike Paterson and Leslie G. Valiant. Circuit size is nonlinear in depth. Theor. Comput. Sci., 2(3):397–400, 1976. URL: https://doi.org/10.1016/0304-3975(76)90090-6.
- [Raz91] Alexander A. Razborov. Lower bounds for deterministic and nondeterministic branching programs. In Lothar Budach, editor, 8th International Symposium on Fundamentals of Computation Theory (FCT), volume 529 of Lecture Notes in Computer Science, pages 47–60. Springer, 1991. URL: https://doi.org/10.1007/3-540-54458-5_49.
- [Rei08] Omer Reingold. Undirected connectivity in log-space. J. ACM, 55(4):17:1–17:24, 2008. doi:10.1145/1391289.1391291.
- [SHL65] Richard Edwin Stearns, Juris Hartmanis, and Philip M. Lewis II. Hierarchies of memory limited computations. In 6th Annual Symposium on Switching Circuit Theory and Logical Design, pages 179–190. IEEE Computer Society, 1965. URL: https://doi.org/10.1109/FOCS.1965.11.
- [Sip88] Michael Sipser. Expanders, randomness, or time versus space. J. Comput. Syst. Sci., 36(3):379–383, 1988. URL: https://doi.org/10.1016/0022-0000(88)90035-9.
- [SM73] Larry J. Stockmeyer and Albert R. Meyer. Word problems requiring exponential time: Preliminary report. In Proceedings of STOC, pages 1–9. ACM, 1973. URL: https://doi.org/10.1145/800125.804029.
- [SSZ98] Michael E. Saks, Aravind Srinivasan, and Shiyu Zhou. Explicit or-dispersers with polylogarithmic degree. J. ACM, 45(1):123–154, 1998. URL: https://doi.org/10.1145/273865.273915.
- [SvEB88] Cees F. Slot and Peter van Emde Boas. The problem of space invariance for sequential machines. Inf. Comput., 77(2):93–122, 1988. URL: https://doi.org/10.1016/0890-5401(88)90052-1.
- [Swa78] Sowmitri Swamy. On Space-Time Tradeoffs. PhD thesis, Brown University, USA, 1978. URL: https://cs.brown.edu/research/pubs/theses/phd/1978/swamy.pdf.
- [Tre86] Carol Tretkoff. Bounded oracles and complexity classes inside linear space. In Alan L. Selman, editor, Proceedings of Structure in Complexity Theory, volume 223 of Lecture Notes in Computer Science, pages 347–361. Springer, 1986. URL: https://doi.org/10.1007/3-540-16486-3_110.
- [vEB90] Peter van Emde Boas. Machine models and simulation. In Jan van Leeuwen, editor, Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity, pages 1–66. Elsevier and MIT Press, 1990.
- [Wil08] Ryan Williams. Non-linear time lower bound for (succinct) quantified boolean formulas. Electron. Colloquium Comput. Complex., TR08-076, 2008. URL: https://eccc.weizmann.ac.il/eccc-reports/2008/TR08-076/index.html.
Appendix A Appendix: An Overview of The Cook-Mertz Procedure
The goal of this appendix is to describe the Cook-Mertz procedure for Tree Evaluation, and how it extends to arbitrary $d$-ary trees of maximum height $h$, and not just complete $d$-ary trees of height $h$ with the full set of $d^h$ possible nodes. In particular, any tree in which every inner node has at most $d$ children and depth at most $h$ can be evaluated using their algorithm.
Reminder of Theorem 2.2. [CM24, Theorem 7] Tree Evaluation on trees of bit-length $b$, maximum height $h$, and fan-in at most $d$, can be computed in $O(d \cdot b + h \log(d \cdot b))$ space.
Our description below heavily draws from Goldreich’s exposition of the Cook-Mertz procedure [Gol24]. Let us stress that we make absolutely no claims of originality in the exposition below; we are only including the following in order to make our paper more self-contained.
First, let us describe how to handle inner nodes in the tree which have fewer than $d$ children. Letting $d_u$ denote the number of children of node $u$, for all $i \in \{1,\ldots,b\}$ we let $f_{u,i}$ be the function which returns the $i$-th bit of the function computed at node $u$. For every node $u$ with $0 < d_u < d$, we add $d - d_u$ extra children to $u$ which are simply leaves with value $0^b$, so that $u$ has exactly $d$ children. The resulting new functions at node $u$ are defined to simply ignore all bits of input with index larger than $d_u \cdot b$. Now all inner nodes of the tree have exactly $d$ children, and all values of inner nodes remain the same as before. For our intended application to an implicit Tree Evaluation instance, we note that the reduction of this paragraph can also be implemented in a space-efficient way: we only need to be able to check the number of children of the current node $u$, which we can easily do in our applications (e.g., Theorem 3.1 and Theorem 1.5). In particular, in our reduction the memory stores the entire computation graph which defines the tree, so we know the number of children of every tree node.
Let $\mathbb{F}$ be a field of characteristic two such that $|\mathbb{F}| > d \cdot b + 1$. For a node $u$ and index $i \in \{1,\ldots,b\}$, let $\tilde{f}_{u,i}$ be the multilinear extension (over $\mathbb{F}$) of the function $f_{u,i}$ which returns the $i$-th bit of the function computed at node $u$. In particular, $\tilde{f}_{u,i}$ is a polynomial of degree at most $d \cdot b$ on $d \cdot b$ variables. Letting $x_j \in \mathbb{F}^{b}$ denote a block of $b$ variables for $j \in \{1,\ldots,d\}$,

$$\tilde{f}_{u,i}(x_1,\ldots,x_d) \;=\; \sum_{(a_1,\ldots,a_d) \in (\{0,1\}^{b})^{d}} f_{u,i}(a_1,\ldots,a_d) \cdot \prod_{j=1}^{d} \delta_{a_j}(x_j), \tag{2}$$

where $\delta_{a}$ is the unique multilinear polynomial of degree $b$ such that $\delta_{a}(a) = 1$ and $\delta_{a}(a') = 0$ for all $a' \in \{0,1\}^{b}$ such that $a' \neq a$.
Let $\{1,\ldots,d\}^{\leq h}$ be the set of all strings of length at most $h$ over the alphabet $\{1,\ldots,d\}$ (including the empty string). Observe that for every node $u$ with label $z \in \{1,\ldots,d\}^{\leq h}$, its children have labels $z1, \ldots, zd$. Thus by definition, the value of $u$ is

$$v_u \;=\; f_u(v_{z1},\ldots,v_{zd}), \tag{3}$$
$$\text{i.e.,}\quad v_u[i] \;=\; \tilde{f}_{u,i}(v_{z1},\ldots,v_{zd}) \;\text{ for all } i \in \{1,\ldots,b\}, \tag{4}$$

since each multilinear extension $\tilde{f}_{u,i}$ agrees with $f_{u,i}$ on Boolean inputs.
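For concreteness, the following Python sketch evaluates a multilinear extension in the style of equation (2) over the characteristic-two field GF($2^8$); the field modulus and the example Boolean function are our own illustrative choices. Note that the sum streams over the Boolean points $a$, so the working space is essentially one counter plus a few field elements, mirroring the space accounting done below.

```python
# A small-space evaluation of a multilinear extension in the style of
# equation (2), over the characteristic-two field GF(2^8). The modulus
# 0x11B (the AES polynomial) and the example function are illustrative
# choices of ours.
from itertools import product

def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return p

def mle(f, n, x):
    """Evaluate the multilinear extension of f: {0,1}^n -> {0,1} at x in F^n."""
    total = 0
    for a in product((0, 1), repeat=n):    # one n-bit counter
        if f(a):
            term = 1
            for aj, xj in zip(a, x):
                # delta_{a_j}(x_j) is x_j if a_j = 1, and 1 + x_j if a_j = 0
                term = gf_mul(term, xj if aj else xj ^ 1)
            total ^= term                  # field addition is XOR
    return total

f = lambda a: a[0] & (a[1] ^ a[2])         # a small Boolean function on 3 bits
for a in product((0, 1), repeat=3):        # the extension agrees with f on {0,1}^3
    assert mle(f, 3, a) == f(a)
print("value at a non-Boolean point:", mle(f, 3, (0x1D, 0x53, 0x2A)))
```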
The Tree Evaluation procedure has two types of storage:
- One storage type is “local”, holding the label of the current node ($O(h \log d)$ bits), as well as a recursion stack of height at most $h$ with $O(\log(d b))$ bits stored at each level of recursion, which is $O(h \log(d b))$ bits in total.
- The other type of storage is “catalytic” or “global” storage ([Gol24]). This space is used to store $d + 1$ separate $O(b)$-bit blocks. In the final version of the algorithm that we describe, each $O(b)$-bit block corresponds to storing $O(b/\log |\mathbb{F}|)$ elements of $\mathbb{F}$. These blocks are repeatedly reused at every node of the tree, to evaluate the functions at each node.
In the following, for simplicity, we will describe an $O((d \cdot b + h) \log(d b))$ space bound using multilinear extensions; with minor modifications, Cook-Mertz [CM24] (see also Goldreich [Gol24]) show how relaxing to merely low-degree extensions allows us to use only $O(d \cdot b + h \log(d b))$ space. At the end, we will discuss how to achieve this space bound.
At any point in time, we will denote the entire storage content of the algorithm by $(z, R_1, \ldots, R_d, S)$, where
- $z \in \{1,\ldots,d\}^{\leq h}$ is the label of a node in the tree, denoting the path from the root to that node (the empty string $\varepsilon$ denotes the root node, $j$ denotes the $j$-th child of the root, etc.),
- each $R_j$ is a collection of registers $R_{j,1}, \ldots, R_{j,b}$ holding elements of $\mathbb{F}$, and
- $S$ is a collection of registers $S_1, \ldots, S_b$, also holding elements of $\mathbb{F}$.
Initially, we set the storage content to be $(\varepsilon, \vec{0}, \ldots, \vec{0})$, i.e., all-zeroes, starting at the root node.
For a node label $z$, let $v_z \in \{0,1\}^{b}$ be the value of the node labeled $z$; we think of $v_z$ as an element of $\mathbb{F}^{b}$. Our goal is to compute $v_{\varepsilon}$, the value of the root of the tree. We will give a recursive procedure Add$_z$ which takes storage content $(z, R_1, \ldots, R_d, S)$ and returns the content $(z, R_1, \ldots, R_d, S + v_z)$. That is, Add$_z$ adds the value $v_z$ to the last register, over the field $\mathbb{F}$. Observe that, if Add works correctly, then calling Add$_{\varepsilon}$ on the initial all-zero storage will put the value of the root node into the last register.
Now we describe Add. Let $m$ be odd with $m > d \cdot b$ and $m$ dividing $|\mathbb{F}| - 1$ (such an $m$ exists, as $|\mathbb{F}| - 1 > d \cdot b$ is odd), and let $\omega$ be a primitive $m$-th root of unity in $\mathbb{F}$.
Add$_z$: Given the storage content $(z, R_1, \ldots, R_d, S)$,

If $z$ is a leaf, look up its value $v_z$ in the input, and return $(z, R_1, \ldots, R_d, S + v_z)$.

For $i = 1, \ldots, m$,

  For $j = 1, \ldots, d$,

    Rotate registers, so that $R_j$ shifts to the last position (where $S$ was).

    Multiply $R_j$ by $\omega^i$, and call Add$_{zj}$ on $(zj, \ldots, \omega^i \cdot R_j)$,

    which returns $(zj, \ldots, \omega^i \cdot R_j + v_{zj})$.

    (here, $zj$ is just the concatenation of the string $z$ with the symbol $j$)

    Rotate the registers back, and update $zj$ back to $z$.

  (at this point, the storage has the form $(z,\; \omega^i x_1 + v_{z1},\; \ldots,\; \omega^i x_d + v_{zd},\; S)$, where $x_j$ denotes the content of $R_j$ at the start of the loop)

  For all $\ell = 1, \ldots, b$,

    Compute $\tilde{f}_{z,\ell}(R_1, \ldots, R_d)$ using extra space.

    Update $S_\ell$ to $S_\ell + \tilde{f}_{z,\ell}(R_1, \ldots, R_d)$.

    Erase the space used to compute $\tilde{f}_{z,\ell}$.

  For $j = 1, \ldots, d$,

    Rotate registers, so that $R_j$ shifts to the last position.

    Call Add$_{zj}$ on $(zj, \ldots, \omega^i \cdot x_j + v_{zj})$,

    which returns $(zj, \ldots, \omega^i \cdot x_j + v_{zj} + v_{zj}) = (zj, \ldots, \omega^i \cdot x_j)$.

    (note: here we use the fact that $\mathbb{F}$ is characteristic two)

    Divide $R_j$ by $\omega^i$, so that $R_j = x_j$ again.

    Rotate the registers back, and update $zj$ back to $z$.

(claim: now the storage has the form $(z, x_1, \ldots, x_d, S + v_z)$)

(note that $v_z[\ell] = f_{z,\ell}(v_{z1}, \ldots, v_{zd}) = \tilde{f}_{z,\ell}(v_{z1}, \ldots, v_{zd})$, by (3) and (4))

Return $(z, R_1, \ldots, R_d, S + v_z)$.
Recall that $\mathbb{F}$ has characteristic two, so that adding $v_{zj}$ twice to a register has a net contribution of zero. The key to the correctness of Add (and the claim in the pseudocode) is the following polynomial interpolation formula (proved in [CM24]): for every primitive $m$-th root of unity $\omega$ over $\mathbb{F}$ with $m$ odd, for every $\bar{x}, \bar{v} \in \mathbb{F}^{n}$, and for every polynomial $q$ of degree less than $m$,

$$q(\bar{v}) \;=\; \sum_{i=1}^{m} q(\bar{v} + \omega^{i} \cdot \bar{x}). \tag{5}$$

(Note that our formula is slightly simpler than Cook-Mertz [CM24] and Goldreich [Gol24], because we assume $\mathbb{F}$ is a field of characteristic two.) Equation (5) ensures that the content of $S$ is indeed updated to $S + v_z$, which equals $S + f_z(v_{z1}, \ldots, v_{zd})$ by equations (3) and (4).
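Equation (5) is also easy to check numerically. The following Python sketch verifies it in GF($2^8$) for a random univariate polynomial of degree less than $m$; the AES field modulus 0x11B and generator 0x03 are standard but illustrative choices of ours.

```python
# Numerical check of the interpolation formula (5) in GF(2^8).
import random

def gf_mul(a, b):
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return p

def gf_pow(a, e):
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

m = 15                               # odd, and divides |F*| = 255
omega = gf_pow(0x03, 255 // m)       # a primitive m-th root of unity
q = [random.randrange(256) for _ in range(m)]   # random q with deg(q) < m

def eval_q(y):                       # Horner evaluation of q at y
    acc = 0
    for c in reversed(q):
        acc = gf_mul(acc, y) ^ c
    return acc

v, x = random.randrange(256), random.randrange(256)
total = 0
for i in range(1, m + 1):
    total ^= eval_q(v ^ gf_mul(gf_pow(omega, i), x))   # q(v + w^i * x)
assert total == eval_q(v)            # equation (5)
print("equation (5) verified for a random degree-14 polynomial")
```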
Let us briefly compare Add to the “obvious” algorithm for Tree Evaluation. The “obvious” algorithm allocates fresh new space for each recursive call to store the values of the children, traversing the tree in a depth-first manner. Each level of the stack holds $O(d \cdot b)$ bits, and this approach takes $O(h \cdot d \cdot b)$ space overall. In contrast, the algorithm Add adds the values of the children to the existing content of the registers, and uses polynomial interpolation to add the correct value of the node to the last register.
More precisely, while Add$_z$ has no initial control over the content of the registers $R_1, \ldots, R_d$, equation (5) allows Add to perform a type of “worst-case to arbitrary-case” reduction: in order to add the value of the function $f_z$ on the specific inputs $v_{z1}, \ldots, v_{zd}$ to the register $S$, given that the initial register content is some arbitrary $x_1, \ldots, x_d$, it suffices to perform a running sum from $i = 1$ to $m$, where we multiply the content $x_j$ by $\omega^i$, add $v_{zj}$ into the current content, then evaluate the functions $\tilde{f}_{z,\ell}$ on that content, storing the result of the running sum into $S$, which (after summing from $i = 1$ to $m$) results in adding the value $v_z$ into $S$. That is, starting from arbitrary values in the registers which are beyond our control, we can compute $f_z$ on any desired inputs.
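To make the register discipline concrete, here is a self-contained Python sketch of Add running on a complete binary tree of height two whose inner nodes compute NAND (whose multilinear extension in characteristic two is $1 + xy$). The tree, the field GF($2^8$), and the register layout are our illustrative choices, and we index the $d+1$ global registers directly rather than physically rotating them.

```python
# Toy run of the Add procedure: d = 2, height 2, over GF(2^8). All
# parameter choices are illustrative, not taken from [CM24] itself.

def gf_mul(a, b):
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return p

def gf_pow(a, e):
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

d, m = 2, 3                          # fan-in 2; odd m > degree 2, m | 255
OMEGA = gf_pow(0x03, 255 // m)       # a primitive m-th root of unity

leaves = {'11': 1, '12': 0, '21': 1, '22': 1}   # leaf bits
children = lambda z: [z + '1', z + '2']
is_leaf = lambda z: len(z) == 2
mle_nand = lambda x, y: 1 ^ gf_mul(x, y)        # extension of NAND, degree 2

REG = [0x5A, 0xC3, 0x7E]             # d+1 global registers, arbitrary junk

def add(z, out):
    """Add the value of the node labeled z into REG[out]; restore the rest."""
    if is_leaf(z):
        REG[out] ^= leaves[z]        # field addition is XOR
        return
    ch = children(z)
    idx = [(out + 1 + j) % (d + 1) for j in range(d)]   # the other d registers
    for i in range(1, m + 1):
        wi, wi_inv = gf_pow(OMEGA, i), gf_pow(OMEGA, m - i)
        for j in range(d):
            REG[idx[j]] = gf_mul(REG[idx[j]], wi)       # R_j <- w^i * x_j
            add(ch[j], idx[j])                          # R_j <- w^i * x_j + v_zj
        REG[out] ^= mle_nand(REG[idx[0]], REG[idx[1]])  # S <- S + f~(R_1, R_2)
        for j in range(d):
            add(ch[j], idx[j])                          # cancels v_zj (char two)
            REG[idx[j]] = gf_mul(REG[idx[j]], wi_inv)   # R_j <- x_j again
    # by equation (5), REG[out] has gained exactly f~(v_z1, v_z2) = v_z

before = list(REG)
add('', 0)
assert REG[1:] == before[1:]         # catalytic registers exactly restored
assert REG[0] ^ before[0] == 1       # NAND(NAND(1,0), NAND(1,1)) = 1
print("root value recovered; junk registers restored")
```

The two assertions check exactly the two properties used above: the junk registers are restored bit-for-bit, and the output register gains the root value.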
The overall space usage of the above procedure is

$$O\big(h \cdot \log(d b) \;+\; d \cdot b \cdot \log |\mathbb{F}|\big).$$

The “local” storage is used to store the current values of the loop indices $i \in \{1,\ldots,m\}$, $j \in \{1,\ldots,d\}$, $\ell \in \{1,\ldots,b\}$, and $O(1)$ bits to store a program counter (storing which of the two sets of recursive calls we're at), at each level of the tree. This takes $O(\log(d b))$ bits for each level of recursion, and $O(h \log(d b))$ space overall. The node label $z$ is also stored, which takes $O(h \log d)$ bits.
The “global” storage holds the content $(R_1, \ldots, R_d, S)$. Each $R_j$ and $S$ consists of $b$ elements of $\mathbb{F}$, which take $O(\log |\mathbb{F}|) = O(\log(d b))$ bits each. To compute the multilinear extension $\tilde{f}_{z,\ell}$ on $(R_1, \ldots, R_d)$, we follow equation (2): we use $d \cdot b$ bits of space to sum over all $(a_1, \ldots, a_d) \in (\{0,1\}^{b})^{d}$, and we use $O(\log(d b))$ extra space to evaluate the expression $\prod_{j=1}^{d} \delta_{a_j}(R_j)$ and multiply it with the value $f_{z,\ell}(a_1, \ldots, a_d)$.
Finally, we describe how to improve the space usage to $O(d \cdot b + h \log(d b))$. The idea is to use “low-degree extensions” of the $f_{u,i}$, rather than multilinear extensions, and to group the $b$-bit output of each node function into approximately $b / \log_2 |\mathbb{F}|$ blocks, instead of $b$ blocks. In more detail, we modify the polynomials over $\mathbb{F}$ so that they have about $d \cdot b / \log_2 |\mathbb{F}|$ variables instead of $d \cdot b$ variables, and each variable ranges over a subset $H \subseteq \mathbb{F}$ with $|H| = 2^{k}$ for a parameter $k$, instead of over $\{0,1\}$. To accommodate this, we will need to adjust the order $m$ of the root of unity slightly.
Let $|\mathbb{F}| = 2^{2k}$, for an even integer $2k$ to be determined later (recall $\mathbb{F}$ has characteristic two). Let $H \subseteq \mathbb{F}$ be a set of cardinality $2^{k}$, which is put in one-to-one correspondence with the elements of $\{0,1\}^{k}$, and let $\ell = \lceil b/k \rceil$. We can then view the function $f_u$ as instead having domain $H^{d \ell}$, and co-domain $H^{\ell}$. That is, we can think of the output of $f_u$ as the concatenation of $\ell$ subfunctions $f_{u,1}, \ldots, f_{u,\ell}$, with each $f_{u,i} \colon H^{d \ell} \to H$. For each $i$, the multilinear polynomial of equation (2) can then be replaced by another polynomial in $d \ell$ variables over $\mathbb{F}$:
$$\tilde{f}_{u,i}(x_1, \ldots, x_{d\ell}) \;=\; \sum_{(a_1,\ldots,a_{d\ell}) \in H^{d\ell}} f_{u,i}(a_1, \ldots, a_{d\ell}) \cdot \prod_{j=1}^{d\ell} \delta_{a_j}(x_j), \tag{6}$$

where again $\delta_{a_j}$ is a polynomial over $\mathbb{F}$ such that $\delta_{a_j}(a_j) = 1$, and $\delta_{a_j}$ vanishes on all $a \in H$ such that $a \neq a_j$. Such a polynomial can be constructed with degree $|H| - 1 = 2^{k} - 1$, so $\tilde{f}_{u,i}$ has degree at most $d \ell \cdot (2^{k} - 1)$. As long as this quantity is less than $|\mathbb{F}| - 1$, we can pick an $m$-th root of unity with $m > d \ell \cdot (2^{k} - 1)$ and $m$ dividing $|\mathbb{F}| - 1$, and we may apply the polynomial interpolation formula of equation (5). WLOG, we may assume $d$ and $b$ are powers of two (otherwise, we can round them up to the closest such powers of two). Setting $2^{k} = d \cdot b$, we have

$$d \ell \cdot (2^{k} - 1) \;\leq\; d \cdot \lceil b/k \rceil \cdot 2^{k} \;<\; 2^{2k} - 1 \;=\; |\mathbb{F}| - 1$$

for sufficiently large $d \cdot b$.
As a result, each of the $d + 1$ registers can now be represented with $\ell = \lceil b/k \rceil$ elements of $\mathbb{F}$, rather than $b$ elements of $\mathbb{F}$ as before. Since $\log_2 |\mathbb{F}| = 2k$, each such register can now be represented with $O(b)$ bits instead, and the new polynomials of (6) can still be evaluated in $O(d \cdot b)$ space.
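To illustrate the packing step, the sketch below reuses GF($2^8$), now viewed as $|\mathbb{F}| = 2^{2k}$ with $k = 4$: it identifies $k$-bit chunks of a register with a size-$2^k$ subset $H \subseteq \mathbb{F}$ and constructs the degree-$(2^k - 1)$ Lagrange polynomial $\delta_s$ used in equation (6). All concrete choices here are ours, for illustration.

```python
# Packing k-bit blocks into field elements, and the Lagrange delta over H.

def gf_mul(a, b):
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return p

def gf_inv(a):                        # a^254 = a^(-1) in GF(2^8)
    r, e = 1, 254
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

K = 4
H = list(range(1 << K))               # 2^k field elements, one per k-bit block

def pack(bits):
    """Split a b-bit register into ceil(b/k) elements of H."""
    return [int(bits[i:i + K], 2) for i in range(0, len(bits), K)]

def delta(s, x):
    """Lagrange delta over H: degree |H| - 1, equal to 1 at s, 0 on H \\ {s}."""
    num = den = 1
    for t in H:
        if t != s:
            num = gf_mul(num, x ^ t)  # (x - t); subtraction is XOR
            den = gf_mul(den, s ^ t)
    return gf_mul(num, gf_inv(den))

assert pack("10110011") == [0b1011, 0b0011]   # 8 bits -> 2 field elements
assert delta(5, 5) == 1 and all(delta(5, t) == 0 for t in H if t != 5)
print("packed register uses", len(pack("10110011")), "elements of F")
```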