
Simulating Time With Square-Root Space (to appear in STOC 2025)

Ryan Williams
MIT
Work supported in part by NSF CCF-2127597 and NSF CCF-2420092.
Abstract

We show that for all functions $t(n) \geq n$, every multitape Turing machine running in time $t$ can be simulated in space only $O(\sqrt{t\log t})$. This is a substantial improvement over Hopcroft, Paul, and Valiant's simulation of time $t$ in $O(t/\log t)$ space from 50 years ago [FOCS 1975, JACM 1977]. Among other results, our simulation implies that bounded fan-in circuits of size $s$ can be evaluated on any input in only $\sqrt{s} \cdot \mathrm{poly}(\log s)$ space, and that there are explicit problems solvable in $O(n)$ space which require $n^{2-\varepsilon}$ time on a multitape Turing machine for all $\varepsilon > 0$, thereby making a little progress on the ${\sf P}$ versus ${\sf PSPACE}$ problem.

Our simulation reduces the problem of simulating time-bounded multitape Turing machines to a series of implicitly-defined Tree Evaluation instances with nice parameters, leveraging the remarkable space-efficient algorithm for Tree Evaluation recently found by Cook and Mertz [STOC 2024].

1 Introduction

One of the most fundamental questions in theoretical computer science is: how does time relate to space, in computation? For instance, can every problem solvable in polynomial space (${\sf PSPACE}$) also be solved in polynomial time (${\sf P}$)? Although it is widely believed that ${\sf P} \neq {\sf PSPACE}$, there has been scant progress on separating time and space over the decades [SHL65, HU68, SM73, HPV75, PR81, HLMW86]. In particular, for general models of computation (beyond one-tape models), only time lower bounds of the form $\Omega(n\log n)$ have been reported for linear-space problems, whereas ${\sf P} \neq {\sf PSPACE}$ requires showing that there is a linear-space problem that cannot be solved in $O(n^k)$ time for every constant $k \geq 1$.

In this paper, we make another step towards separating ${\sf P}$ from ${\sf PSPACE}$, by showing that there are problems solvable in $O(n)$ space that cannot be solved in $n^2/\log^c n$ time for some constant $c > 0$. This is the first generic polynomial separation of time and space in a robust computational model (namely, the multitape Turing machine). The separation is accomplished by exhibiting a surprisingly space-efficient simulation of generic time-bounded algorithms. More formally, let ${\sf TIME}[t(n)]$ be the class of decision problems decided by $O(t(n))$-time multitape Turing machines, and let ${\sf SPACE}[s(n)]$ be the class decided by $O(s(n))$-space multitape Turing machines.

Theorem 1.1.

For every function $t(n) \geq n$, ${\sf TIME}[t(n)] \subseteq {\sf SPACE}[\sqrt{t(n)\log t(n)}]$.

We find Theorem 1.1 to be very surprising. It appears to have been a common belief for decades that time $t$ cannot be simulated in $t^{1-\varepsilon}$ space, for any constant $\varepsilon > 0$. For example, assuming that $t$ time cannot be simulated in $t^{1-\varepsilon}$ space, Sipser gave a polynomial-time derandomization of ${\sf RP}$ [Sip88] (Sipser also assumed explicit constructions of certain bipartite expander graphs, which were later constructed [SSZ98]), and Nisan and Wigderson gave a subexponential-time derandomization of ${\sf BPP}$ [NW94, Section 3.6]. (Note that the derandomization of Nisan and Wigderson can also be based on the weaker hypothesis ${\sf TIME}[t] \not\subset \Sigma_2{\sf TIME}[t^{1-\varepsilon}]$, which remains open and almost certainly cannot be refuted using the ideas of this paper.)

Given the power of multitape Turing machines, Theorem 1.1 has many interesting consequences. Along with the ability to evaluate generic straight-line programs of length $n$ (over a bounded domain) in only $\tilde{O}(\sqrt{n})$ space (see Section 4.1), Theorem 1.1 immediately implies by diagonalization that there are $s(n)$-space problems that cannot be solved in $s(n)^{2-\varepsilon}$ time for all $\varepsilon > 0$: a "truly-polynomial" time lower bound.

Corollary 1.2.

For space-constructible $s(n) \geq n$ and all $\varepsilon > 0$, ${\sf SPACE}[s(n)] \not\subset {\sf TIME}[s(n)^{2-\varepsilon}]$.

Similarly, if Theorem 1.1 could be extended to show for all $\varepsilon > 0$ that every time-$t(n)$ Turing machine can be simulated in $O(t(n)^{\varepsilon})$ space, then ${\sf P} \neq {\sf PSPACE}$ would follow (see the arguments in Section 4). We discuss this possibility at the end of the paper, in Section 5. It follows from Corollary 1.2 that any complete problem for linear space requires essentially quadratic time to be solved. For example:

Corollary 1.3.

The language $L = \{\langle M, x, 1^k\rangle \mid |M| \leq k \text{ and } M(x) \text{ halts in space } k\}$ requires $n^{2-\varepsilon}$ time to solve on a multitape Turing machine, for every $\varepsilon > 0$.

Another consequence is that there are context-sensitive languages which need essentially $n^2$ time to be recognized by multitape Turing machines. This follows from the fact that ${\sf NSPACE}[n]$ (nondeterministic linear space) is equivalent to the class of context-sensitive languages [Kur64].

Since multitape Turing machines can evaluate any circuit of size $s$ (and fan-in two) on any given input of length $n \leq s$ in only $s \cdot \mathrm{poly}(\log s)$ time [Pip77], it follows that arbitrary subquadratic-size circuits can be simulated by subexponential-size branching programs. (Unfortunately, the reference [Pip77] does not seem to be available online; see Section 4.1 for an alternative sketch of the argument.)

Corollary 1.4.

There is a universal $k \geq 1$ such that for all $s \geq n$, every bounded fan-in circuit of size $s$ and $n$ inputs has a branching program of size at most $2^{k\sqrt{s}\log^k s}$.

It is an interesting question whether the simulation of Theorem 1.1 can be extended beyond multitape Turing machines, to more powerful computational models. We can generalize Theorem 1.1 to $d$-dimensional multitape Turing machines, as follows:

Theorem 1.5.

Every decision problem solvable by a $t(n)$-time $d$-dimensional multitape Turing machine can be decided in $O((t(n)\log t(n))^{1-1/(d+1)})$ space (on a typical one-dimensional multitape Turing machine).

The space bound of Theorem 1.5 matches the best known space bounds for simulating time-$t$ machines with one $d$-dimensional tape [Lou81, LL90]. See Section 2.1 for more references on prior simulations.

Remark 1.6 (Extension to Oblivious Random-Access Models).

At the present time, we do not know how to extend Theorem 1.1 to arbitrary random-access models of computation. The main issue is that the indegree of the resulting computation graph (defined in Section 3) is so high that the computation graph cannot be stored in $O(t^{1-\varepsilon})$ memory. However, we remark that if the pattern of reads and writes of a given random-access machine model is oblivious, in the sense that given a step-count $i = 1, \ldots, t$ specified in $O(\log t)$ bits, the names of the registers being accessed in step $i$ can be computed in $O(\sqrt{t})$ space, then Theorem 1.1 does apply, with only a $\mathrm{poly}(\log t)$ extra space factor. This is because such machines can be simulated by circuits of $t \cdot \mathrm{poly}(\log t)$ size, which can in turn be simulated efficiently by multitape Turing machines in $t \cdot \mathrm{poly}(\log t)$ time (see Section 4.1).

1.1 Intuition

The key idea behind Theorem 1.1 is to reduce the problem of simulating arbitrary time-bounded computations to particular instances of the Tree Evaluation problem, defined by S. Cook, McKenzie, Wehr, Braverman, and Santhanam [CMW+12]. In this problem, one is given a complete $d$-ary tree of height $h$, where each leaf of the tree is labeled with a $b$-bit string, and each inner node of the tree is labeled with a function from $d \cdot b$ bits to $b$ bits. (Our notation is slightly different from the usual setup, where there is a parameter $k = 2^b$. Here, our notation more closely follows Goldreich's exposition [Gol24].) Each function is presented as a table of $2^{d \cdot b}$ values, each of which is $b$ bits. Each inner node computes its value by evaluating its function on the values of its $d$ children. The task of Tree Evaluation is to determine the value of the root of the tree. To make this a decision problem, we will decide whether the first bit of the root's value equals $1$ or not.

The obvious depth-first algorithm for Tree Evaluation uses $O(d \cdot b \cdot h)$ space, to store intermediate results at each level of the tree. The authors of [CMW+12] conjectured that Tree Evaluation is not in ${\sf NL}$, which would separate ${\sf NL}$ from ${\sf P}$ (in fact, stronger separations would hold). Recent algorithmic work has led to significant skepticism that this conjecture is true. In a line of work, J. Cook and Mertz have found surprisingly space-efficient methods for Tree Evaluation [CM20, CM21, CM22, CM24], culminating in a marvelous algorithm using space only $O(d \cdot b + h\log(d \cdot b))$ ([CM24, Theorem 7]; see Appendix A).
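To make the space usage concrete, here is a minimal Python sketch of the depth-first procedure; the dictionary-based tree representation and the toy XOR instance are our own illustration, not part of [CMW+12].

```python
def eval_tree(node, leaf_value, func, children):
    """Depth-first Tree Evaluation. The recursion keeps the values of up
    to d children at each of the h levels on the stack, hence
    Theta(d * b * h) bits of space for b-bit values."""
    kids = children(node)
    if not kids:                       # leaf: its b-bit value is given
        return leaf_value[node]
    vals = [eval_tree(c, leaf_value, func, children) for c in kids]
    return func[node](vals)            # inner node: apply its function

# Toy instance: complete binary tree (d = 2) of height 2 with b = 1,
# where every inner node computes the XOR of its two children.
leaf_value = {"ll": 1, "lr": 0, "rl": 1, "rr": 1}
func = {n: (lambda vals: vals[0] ^ vals[1]) for n in ("root", "l", "r")}
children = lambda n: {"root": ["l", "r"], "l": ["ll", "lr"],
                      "r": ["rl", "rr"]}.get(n, [])

assert eval_tree("root", leaf_value, func, children) == 1  # (1^0)^(1^1) = 1
```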

Utilizing the old notion of "block-respecting" Turing machines [HPV75], we show how to reduce time-$t$ computations to (implicitly defined) Tree Evaluation instances of height $h = \Theta(t/b)$, bit-length $b$, and fan-in $d$, where $b$ is a parameter from $1$ to $t$ that we can choose, and $d$ is a fixed constant depending on the machine. In particular, one can generate arbitrary bits of our Tree Evaluation instance in a space-efficient way, as needed.

Very roughly speaking, each level of the tree will correspond to a "time block" of $b$ steps, and the value of each node $v$ of the tree will be a "block of tape" of length $\Theta(b)$ (a sequence of $\Theta(b)$ contiguous cells) from some particular tape of a $k$-tape Turing machine; the value of the root will contain the contents of the final tape block of the computation, including the relevant accept/reject state. The evaluation function $F_v$ at node $v$ will simulate the Turing machine for $\Theta(b)$ steps, using the contents of $k$ tape blocks of length $\Theta(b)$ from previous time blocks; this $F_v$ can be computed in $O(b)$ time and space, given the relevant contents of $O(b)$-length tape blocks from all $k$ tapes and the state from the previous time block. (The leaves of the tree roughly correspond to strings of $\Theta(b)$ blank symbols, or a block of $\Theta(b)$ symbols from the input $x$.) We end up with a Tree Evaluation instance of height $h = O(t/b)$ and fan-in $d = O(1)$, where $\Theta(b)$ steps of the Turing machine are processed at each node, and where each node is labeled with a string of $\Theta(b)$ bits. (There are several technical details to worry over, such as the problem of knowing which tape blocks to use at each time step, but this is the high-level idea.)

Observe that, under the above parameter settings, the obvious depth-first Tree Evaluation procedure would yield an algorithm running in $\Theta(h \cdot b) = \Theta(t)$ space. Applying the Cook-Mertz procedure, and setting $d \cdot b = h\log(d \cdot b) = \Theta(t \cdot \log(d \cdot b)/b)$ to optimize the space usage, we find $b = \Theta(\sqrt{t\log t})$ and obtain the $O(\sqrt{t\log t})$ space bound of Theorem 1.1.
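To spell out the balancing step behind these parameters (a routine calculation, recorded here for convenience):

```latex
% With constant fan-in d and height h = \Theta(t/b), the Cook--Mertz bound
%   O(d \cdot b + h \log(d \cdot b)) = O(b + (t/b)\log b)
% is minimized up to constants when the two terms balance:
\[
  b \;=\; \Theta\!\Big(\frac{t}{b}\log b\Big)
  \quad\Longleftrightarrow\quad
  b^{2} \;=\; \Theta(t \log b),
\]
% which is satisfied by b = \Theta(\sqrt{t \log t}): then b^2 = \Theta(t\log t)
% and \log b = \Theta(\log t), giving total space O(\sqrt{t \log t}).
```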

Organization.

Section 2 discusses preliminaries (which we do not recommend skipping, but it is short). Section 3 begins by proving a short "warm-up" simulation (Theorem 3.1) which already achieves $O(\sqrt{t}\log t)$ space and gives many of the key ideas. Section 3.2 proves Theorem 1.1, discusses the possibility of improvement, and proves Theorem 1.5. Section 4 discusses various corollaries mentioned earlier, and Section 5 concludes with a number of new directions to consider.

2 Preliminaries

We assume the reader is familiar with basic notions in time and space bounded complexity [Gol08, AB09]. There are a few pieces not covered in the usual complexity textbooks which are important for this work.

Robustness of Space Complexity.

It is important to recall that the class of problems ${\sf SPACE}[s(n)]$ is robust under changes in the definition of the underlying machine model. For essentially any model (with sequential access versus random access, or tapes versus registers) that measures space usage by the total number of bits needed to represent the storage, ${\sf SPACE}[s(n)]$ is the same complexity class. This is part of what is known as the "invariance thesis" [SvEB88, vEB90]. As a consequence, when we are designing a space-$s(n)$ algorithm and ignoring time complexity, we do not have to worry much about the machine model. This allows us to be more lax in our exposition.

Block-Respecting Turing Machines.

In our main simulation (Theorem 1.1) we will utilize a construction that simplifies the analysis of multitape Turing machines. The construction was originally used by Hopcroft, Paul, and Valiant [HPV75] in their $O(t/\log t)$-space simulation of time $t$, and it has been used in many subsequent works to "decompose" time-bounded computations (such as [PPST83, KLV03, KLRS12, LW13, MW17]). A time-$t(n)$ block-respecting multitape Turing machine with blocks of length $b(n)$ has the property that on all inputs of length $n$, the computation is partitioned into $O(t(n)/b(n))$ time blocks of length $b(n)$, while each tape is partitioned into $O(t(n)/b(n))$ contiguous tape blocks of length $b(n)$. The key property is that during each time block, each tape head remains in exactly one tape block; tape heads can cross from one tape block to another only at the ends of time blocks, i.e., on time steps that are integer multiples of $b(n)$. A key lemma of [HPV75] is that every multitape Turing machine can be made block-respecting with low overhead.

Lemma 2.1 ([HPV75]).

For every time-constructible $b(n), t(n)$ such that $\log t(n) \leq b(n) \leq t(n)$, and every $t(n)$-time $\ell$-tape Turing machine $M$, there is an equivalent $O(t(n))$-time block-respecting $(\ell+1)$-tape Turing machine $M'$ with blocks of length $b(n)$.

The Tree Evaluation Problem.

As mentioned earlier, we will reduce time-$t$ computations to the Tree Evaluation problem. Our formulation of Tree Evaluation will be relaxed from the original notion: we allow a tree of height at most $h$, where each inner node $v$ of the tree has $k_v \leq d$ children, for some integer $k_v \geq 2$ depending on $v$, and $v$ is labeled with a function from $k_v \cdot b$ bits to $b$ bits. As before, each leaf is labeled with a $b$-bit string, and we wish to determine the first bit of the root's value. Prior work on Tree Evaluation assumed a complete $d$-ary tree where all root-to-leaf paths have length equal to $h$; in our setting, we allow some paths to have length less than $h$, and some nodes to have fewer than $d$ children. However, the algorithm of Cook and Mertz works just as well in our relaxed formulation:

Theorem 2.2 ([CM24], Theorem 7).

Tree Evaluation on trees of bit-length $b$, maximum height $h$, and fan-in at most $d$ can be computed in $O(d \cdot b + h\log(d \cdot b))$ space.

In Appendix A, we give an overview of how the Cook-Mertz algorithm works, and describe why it extends to our case.

2.1 More Related Work

As mentioned earlier, Hopcroft, Paul, and Valiant showed that time-$t$ multitape Turing machines can be simulated in $O(t/\log t)$ space [HPV75]. This simulation was later extended beyond the multitape model, yielding a more space-efficient simulation for essentially all common models of computation used in algorithms and complexity [PR81, HLMW86]. Paterson and Valiant [PV76] showed that circuits of size $s$ can be simulated by depth-$O(s/\log s)$ circuits, implying a space-efficient simulation of circuits. Similarly, Dymond and Tompa [DT85] showed that time $t$ can be simulated in alternating time $O(t/\log t)$.

For decades, a square-root-space simulation of the form in Theorem 1.1 has been known for one-tape Turing machines: Hopcroft and Ullman [HU68] showed that time-$t$ Turing machines with one read-write tape can be simulated in space $O(\sqrt{t})$. Other improvements on this simulation (e.g., improving the time of the simulation, or slightly generalizing the model) include [Pat72, IM83, LL90].

Paul, Pippenger, Szemerédi, and Trotter [PPST83] proved the separation ${\sf NTIME}[n] \neq {\sf TIME}[n]$ for multitape Turing machines. Unfortunately, their proof only works for (one-dimensional) multitape Turing machines, and it is infamously still open to prove ${\sf NTIME}[n] \neq {\sf TIME}[n]$ in more general models. We prove Theorem 1.5 (an extension to the $d$-dimensional case) to illustrate that our approach via Tree Evaluation is more broadly applicable.

3 A More Space-Efficient Simulation of Time

In this section, we prove Theorem 1.1, showing that any multitape Turing machine $M$ running in time $t$ can be simulated in space $O(\sqrt{t\log t})$. We will proceed by reducing the computation of $M$ on an input $x$ (of length $n$) to an instance of Tree Evaluation with particular parameters. (In fact, in the full proof of Theorem 1.1, we will reduce computing $M$ on $x$ to a series of Tree Evaluation instances.)

One initial remark is in order: we did not specify in the statement of Theorem 1.1 that the time function $t(n)$ is time constructible, in the sense that there is a Turing machine that can print the value of $t(n)$ on $1^n$ in $O(t(n))$ steps. This is because in our space-bounded simulation, we can simply try increasing values $t(n) = n, n+1, n+2, \ldots$, one at a time, and not worry about constructibility issues. (This trick is used in other space-bounded simulations, such as [HPV75].)

Assume $M$ has $\ell$ tapes which are infinite in one direction, and that all tape heads are initially at the leftmost cell. We also assume that the input $x$ is received on tape $1$.

3.1 A Warm-up Result

Before describing the full $O(\sqrt{t\log t})$-space simulation, we first present a slightly worse algorithm which uses $O(\sqrt{t}\log t)$ space and requires $t(n) \geq n^2$, but is simpler to describe. This bound is already extremely surprising (to the author), and the simulation provides most of the key intuition as well.

Theorem 3.1.

For every function $t(n) \geq n^2$, ${\sf TIME}[t(n)] \subseteq {\sf SPACE}[\sqrt{t(n)}\log t(n)]$.

The proof of Theorem 3.1 will utilize the well-known oblivious two-tape simulation of multitape Turing machines. A multitape Turing machine $M$ is oblivious if for every $n$ and every input $x$ of length $n$, the tape head movements of $M$ on $x$ depend only on $n$, and not on $x$ itself.

Theorem 3.2 ([HS66, PF79, FLvMV05]).

For every time-$t(n)$ multitape Turing machine $M$, there is an equivalent time-$T(n)$ two-tape Turing machine $M'$ which is oblivious, with $T(n) \leq O(t(n)\log t(n))$. Furthermore, given $n$ and $i \in [T(n)]$ specified in $O(\log t(n))$ bits, the two head positions of $M'$ on a length-$n$ input at time step $i$ can be computed in $\mathrm{poly}(\log t(n))$ time.

Let $M'$ be the machine obtained from Theorem 3.2. The first idea behind our simulation is to conceptually partition the computation of $M'$ on a length-$n$ input $x$ into time and tape "blocks" of length $b(n)$, for a parameter $b(n) \geq \log t(n)$ to be set later. In particular, the two tapes of $M'$ are split into tape blocks of $b(n)$ contiguous cells, and the $T(n)$ steps of $M'$ on $x$ are split into $B := B(n) = O(T(n)/b(n))$ contiguous time blocks of length up to $b(n)$. Observe that, for any block of $b(n)$ steps of $M'$ and any given tape $h \in \{1,2\}$, there are at most two tape blocks of tape $h$ that may have been accessed during the time block (moreover, they are adjacent tape blocks on tape $h$). We will construct a Tree Evaluation instance where the function at each node of the tree evaluates a single time block, taking as input some relevant tape blocks from previous time blocks. In fact, the same time block may be recomputed many times over the entire Tree Evaluation instance, in order to evaluate the last time block at the root of the tree.

A Computation Graph.

To this end, we define a computation graph $G_{M',x}$ on $B+1 = O(T(n)/b(n))$ nodes: a directed acyclic graph whose edges model the information flow from earlier time blocks in the computation to later time blocks, through the reads and writes of $M'$ and the head movements across blocks of tape. Our notion is very similar to the computation graphs defined in [HPV75, PR80]. Eventually, the computation being performed on the $(B+1)$-node graph $G_{M',x}$ will be viewed as a Tree Evaluation instance of height $B+1$.

Our graph $G_{M',x}$ has a node $i$ for each time block $i \in \{0,1,\ldots,B\}$, and the edges will indicate which previous time block contents need to be read in order to compute the content of the tape blocks accessed during time block $i$. In particular, we say that all tape blocks on the two tapes are active during time block $0$, and for $i > 0$, a tape block is active during time block $i$ if the tape head visits some cell of the block during time block $i$. We put an edge $(i,j)$ in $G_{M',x}$, with $i < j$, if and only if:

  • either $j = i+1$, or

  • when $M'$ is run on input $x$, there is some tape block active during time block $i$ that is not active again until time block $j$. That is, for some tape head $h$, it reads the same tape block $C$ in both time blocks $i$ and $j$, but $h$ does not read tape block $C$ during any of the time blocks $i+1, \ldots, j-1$. (Alternatively, if some tape block is being accessed for the first time in time block $i$, we have an edge $(0,i)$ to reflect the fact that all tape blocks are defined to be active in time block $0$.)

Observe that each node $i$ has indegree at most $5$: one edge from $i-1$, and at most four other edges for the (at most) four active tape blocks during time block $i$ (two active tape blocks for each of the two tapes).

The key insight behind the computation graph notion is that simulating $M'$ during a time block $j$ only requires knowing the information computed in those previous time blocks $i$ such that $(i,j)$ is an edge in $G_{M',x}$. In particular, the state and head positions of $M'$ at the start of time block $j$ may be obtained from the state and head positions at the end of time block $j-1$, so we have the edge $(j-1,j)$, and we have an edge $(i,j)$ for each of those blocks of tape accessed during a time block $i$ which are not accessed again until time block $j$.

Due to the obliviousness of $M'$ (Theorem 3.2), we have the following claim:

Claim 3.3.

Given the indices of any two nodes $i, j$ in $G_{M',x}$, we can determine if there is an edge from $i$ to $j$ in $\mathrm{poly}(\log t(n))$ additional space.

The claim follows because determining whether $(i,j)$ is an edge just requires us to keep track of the locations of the two tape heads, from time block $i$ to the later time block $j$. By Theorem 3.2, we can calculate the two tape head locations at any point in time using only $\mathrm{poly}(\log t(n))$ time, so we only have to use $\mathrm{poly}(\log t(n))$ space to keep track of whether a tape block accessed during time block $i$ is not accessed again until time block $j$.
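The following Python sketch makes the procedure behind Claim 3.3 concrete. Here `head_position(h, s)` is an assumed oracle standing in for the head-location computation of Theorem 3.2, time block $i$ is taken to consist of steps $ib, \ldots, (i+1)b - 1$, and the special role of time block $0$ is omitted for brevity.

```python
def is_edge(i, j, b, head_position, tapes=(0, 1)):
    """Test whether (i, j), with i < j, is an edge of G_{M',x}.

    head_position(h, s): assumed oracle giving the cell index of tape
    head h at step s (computable in poly(log t) time by Theorem 3.2).
    Only O(1) many counters are live at once, since a time block of b
    steps touches at most two (adjacent) tape blocks per tape.
    """
    if j == i + 1:
        return True
    for h in tapes:
        blocks_i = {head_position(h, s) // b for s in range(i * b, (i + 1) * b)}
        blocks_j = {head_position(h, s) // b for s in range(j * b, (j + 1) * b)}
        for blk in blocks_i & blocks_j:
            # blk is read in both time blocks; is it untouched in between?
            if all(head_position(h, s) // b != blk
                   for s in range((i + 1) * b, j * b)):
                return True
    return False
```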

The Functions at the Nodes.

Now we define what is being computed at each node of $G_{M',x}$. The content of time block $i$, denoted by $\mathrm{content}(i)$, is defined as follows:

  • $\mathrm{content}(0)$ is the initial configuration of $M'$ running on the input $x$ of length $n$, encoded in $n + O(1)$ bits. (Recalling that $t(n) \geq n^2$, we will eventually set $b(n) \geq n$. Thus the initial configuration of $M'$ on $x$ can "fit" in one tape block.)

  • For $i \in \{1,\ldots,B\}$, $\mathrm{content}(i)$ encodes information about the status of $M'$ on $x$ at the end of time block $i$: the state of $M'$, the two tape head positions, and a list of the contents of those tape blocks accessed during time block $i$. As at most four such tape blocks may have been accessed during time block $i$, $\mathrm{content}(i)$ can be encoded in $O(b(n) + \log t(n)) \leq O(b(n))$ bits.

Note that for every fixed $j \in [B]$, if we are given $\mathrm{content}(i)$ for all edges $(i,j)$ in $G_{M',x}$, then we can compute $\mathrm{content}(j)$ in $O(b(n))$ time and space, by simply simulating $M'$ for $b(n)$ steps on the tape blocks of the relevant $\mathrm{content}(i)$ values.

Our goal is to determine $\mathrm{content}(B)$, the content of the final time block, which will contain either an accept or reject state and thereby determine the answer.

A Tree Evaluation Instance.

To do this, we construct a Tree Evaluation instance in which the root node computes $\mathrm{content}(B)$, its children compute $\mathrm{content}(i)$ for all $i$ such that $(i,B)$ is an edge in $G_{M',x}$, and so on; the leaves of our Tree Evaluation instance will compute $\mathrm{content}(0)$, the initial configuration of $M'$ on $x$. This transformation is analogous to how depth-$d$ Boolean circuits of fan-in $F$ can be modeled by formulas of depth $d$ and size at most $O(F^d)$, by tracing over all possible paths from the output gate of the circuit to an input gate of the circuit; see for example [Juk12, Chapter 6].

More precisely, we define a tree $R_{M',x}$ of height at most $B+1$ and fan-in at most $5$, with a root node that will evaluate to $\mathrm{content}(B)$. Each node $v$ of $R_{M',x}$ is labeled by a distinct path from some node $j$ in $G_{M',x}$ to the node $B$ of $G_{M',x}$. Inductively, we define the labels as follows:

  • the root node of $R_{M',x}$ is labeled by the empty string $\varepsilon$ (the empty path from $B$ to itself), and

  • for every node $v$ in $R_{M',x}$ labeled by a distinct path $P$ from $j$ to $B$, and for every node $i$ with an edge to $j$ in $G_{M',x}$, the node $v$ has a child $w$ in $R_{M',x}$ labeled by the path which takes the edge $(i,j)$ and then the path $P$ to $B$.

Observe that a path $P$ of length $\ell$ from a node $j$ to the node $B$ can be encoded in $O(\ell)$ bits, since the indegree of each node is at most $5$. Furthermore, given such a path $P$ encoded in this way, we can determine the node $j$ at the start of $P$ in $\mathrm{poly}(\log t)$ space, by making repeated calls to the edge relation (as in Claim 3.3).

The desired value to be computed at node $v$ (labeled by a path from $j$ to $B$) is precisely $\mathrm{content}(j)$. For $j = 0$, this value is just the initial configuration of $M'$ on $x$, which can be produced immediately in $O(n)$ time and space. For $j > 0$, $\mathrm{content}(j)$ can be computed in $O(b(n))$ time and space given the values of the children of $v$. Since $G_{M',x}$ has at most $B+1$ total nodes, the height of $R_{M',x}$ is at most $B+1$. By induction, the value of the root of $R_{M',x}$ is precisely $\mathrm{content}(B)$.

While the tree $R_{M',x}$ has $2^{\Theta(B)} \leq 2^{\Theta(T(n)/b(n))}$ nodes, observe that for any given node $v$ of $R_{M',x}$ (labeled by a path to $B$), we can easily compute the labels of the children of $v$ in $\mathrm{poly}(\log t(n))$ space, using Claim 3.3 to compute the edges of the graph $G_{M',x}$. Thus, it takes only $\mathrm{poly}(\log t(n))$ additional space to determine the children of any given node of $R_{M',x}$, and this space can be immediately reused once these children are determined. So without loss of generality, we may assume we have random access to the tree $R_{M',x}$, its leaves, and the functions at its nodes.

Finally, we call the Cook-Mertz Tree Evaluation algorithm (Theorem 2.2) in its most general form on $R_{M',x}$. Recall that the space bound of this algorithm, for trees of height at most $h$ and fan-in at most $d$, computing $b$-bit values at each node, is

\[ O(d \cdot b + h\log(d \cdot b)). \]

For us, $d = 5$, $b = b(n)$, and $h = O(T(n)/b(n)) \leq O(t(n)\log t(n))/b(n)$. We only use $O(b(n))$ additional time and space for each function call to compute some $\mathrm{content}(j)$, and we can reuse this space for every separate call. Therefore our space bound is optimized up to constant factors by setting $b(n)$ such that

\[ b(n)^2 = \Theta(t(n)\log t(n) \cdot \log b(n)), \]

so $b(n) = \sqrt{t(n)} \cdot \log t(n)$ suffices. This completes the proof of Theorem 3.1.

3.2 The Main Result

Now we describe how to improve the bound of the space simulation by avoiding the $t\log t$ blowup of the oblivious simulation. This will establish Theorem 1.1. The main difficulty is that, without the obliviousness guarantee of Theorem 3.2, we do not know how to determine the edges of the computation graph $G_{M',x}$ in an efficient way. To remedy this, we will use more space: we will enumerate over possible computation graphs $G'$, and introduce a method for checking that $G' = G_{M',x}$ within the functions of our Tree Evaluation instance. Our graph enumeration will be performed in a particularly space-efficient way, so that if Tree Evaluation turns out to be in logspace, the simulation of this paper will yield an $O(\sqrt{t(n)})$ space bound.

As before, we start with a multitape machine $M$ which runs in time $t(n)$, and let $x$ be an input of length $n$.

First, we make $M$ block-respecting as in Lemma 2.1, with block length $b = b(n)$ on inputs of length $n$, for a parameter $b$ to be set later. The new multitape machine $M'$ has $p := \ell + 1$ tapes, runs in time $O(t(n))$, and has $B := O(t(n)/b(n))$ time and tape blocks. (Strictly speaking, we do not have to make $M$ block-respecting, but doing so makes aspects of the presentation a little cleaner: we do not have to reason about "active" tape blocks as we did in the warm-up (Theorem 3.1). Foreshadowing a bit, we will not use the block-respecting notion in the later extension to $d$-dimensional Turing machines (Theorem 1.5), where we again use "active" tape blocks.)

The Computation Graph.

We start by refining the computation graph notion from Theorem 3.1. Here, our computation graph $G_{M',x}$ is similar but not identical to that of [HPV75] and the warm-up Theorem 3.1. Because we will allow for space bounds which are smaller than the input length $n = |x|$, we have to make several modifications to $G_{M',x}$ to fit the specifications of Tree Evaluation. We define the set of nodes in $G_{M',x}$ to be

\[ S = \{(h,i),\,(h,0,i) \mid h \in [p],\ i \in [B]\}. \]

Intuitively, each $(h,i) \in [p] \times [B]$ will correspond to the content of the relevant block of tape $h$ after time block $i$, while each $(h,0,i)$ will be a source node in $G_{M',x}$ corresponding to the content of the $i$-th block of tape $h$ when it is accessed for the first time, i.e., the initial configuration of the $i$-th block of tape $h$. (A little explanation may be in order. In the warm-up Theorem 3.1, we assumed $t(n) \geq n^2$, so that the entire initial configuration of the machine on $x$ could fit in a length-$b(n)$ block. This made the leaf nodes of our Tree Evaluation instance particularly easy to describe. For $t(n) \ll n^2$, we may have $|x| = n \gg b(n)$, so the input $x$ may not fit in one tape block. To accommodate this possibility and obtain a clean reduction to Tree Evaluation in the end, we define multiple source nodes in $G_{M',x}$ to account for the different $b(n)$-length blocks of the input $x$ in the initial configuration of $M'$ on $x$, along with $b(n)$-length blocks of all-blank tape when these blocks are accessed for the first time.) We imagine that on each tape, the tape blocks are indexed $1, 2, \ldots$ starting from the leftmost block, with up to $B$ tape blocks for each tape. We think of all $(h,0,i)$ nodes as associated with "time block $0$".

Each node $(h,i) \in [p] \times [B]$ is labeled with an integer $m_{(h,i)} \in \{-1,0,1\}$, indicating the movement of tape head $h$ at the end of time block $i$:

\[ m_{(h,i)} = \begin{cases} \phantom{-}1 & \text{if head $h$ moves one tape block to the right of the current tape block,}\\ -1 & \text{if $h$ moves one tape block to the left, and}\\ \phantom{-}0 & \text{if $h$ stays in the same tape block for both time blocks $i$ and $i+1$.} \end{cases} \]

Next, we describe the edges of $G_{M',x}$; there are two types. For each $h, h' \in [p]$ and $i, j \in [B]$ with $i < j$, the edge $((h',i),(h,j))$ is put in $G_{M',x}$ if either:

  • $j = i+1$, or

  • while $M'$ is running during time block $j$, the tape block accessed by $h'$ during time block $i$ is not accessed again until time block $j$. That is, tape head $h'$ reads the same tape block $T$ in both time blocks $i$ and $j$, but head $h'$ does not read tape block $T$ during any of the time blocks $i+1, \ldots, j-1$.

For $h, h' \in [p]$ and $i, j \in [B]$, the edge $((h',0,i),(h,j))$ is put in $G_{M',x}$ if, while $M'$ is running during time block $j$, the tape head $h'$ accesses its $i$-th tape block for the first time in the computation. (For example, note that for all $h, h'$, $((h',0,1),(h,1))$ is an edge.)

Observe that the indegree of each node $(h,j) \in [p] \times [B]$ is at most $2p$: for all $h' \in [p]$, there is an edge from $(h',j-1)$ to $(h,j)$ for $j > 1$ (and from $(h',0,1)$ to $(h,1)$ for $j = 1$), and there is an edge from a node labeled by $h'$ (either of the form $(h',i_{h',j})$ or $(h',0,i_{h',j})$) to $(h,j)$, indicating which block of tape $h'$ is needed to compute the block of tape $h$ during time block $j$. (Note that some of these edges might be double-counted, so the indegree is at most $2p$.)

A Succinct Graph Encoding.

The obvious way to store the computation graph $G_{M',x}$ uses $O(B\log B)$ bits. To save space, we will be more careful, and use additional observations on the structure of such computation graphs. (These observations are similar but not identical to those used in the separation of ${\sf NTIME}[n]$ and ${\sf TIME}[n]$ by Paul, Pippenger, Szemerédi, and Trotter [PPST83].)

Recall that each node $(h,i)$ is labeled by a head movement $m_{(h,i)} \in \{-1,1,0\}$, indicating whether the index of the tape block of head $h$ is decremented by $1$, incremented by $1$, or stays the same at the end of time block $i$. Our key observation is that the numbers $m_{(h,i)}$ alone already determine the entire structure of the graph $G_{M',x}$. This immediately implies that every possible guess of $G_{M',x}$ can be encoded in only $O(B)$ bits: we only need a constant number of bits for each of the $O(B)$ nodes.

In particular, for an index $i \in [B]$, we define $\mathrm{block}(h,i) \in [B]$ to be the index of the tape block of tape $h$ being accessed at the start of time block $i$. For all $h \in [p]$, we have $\mathrm{block}(h,1) = 1$, and for $i > 1$, by the definition of the $m_{(h,j)}$ we have

\[ \mathrm{block}(h,i) = 1 + \sum_{j=1}^{i-1} m_{(h,j)}. \tag{1} \]

Equation (1) holds because each $m_{(h,j)}$ tells us how the index of the tape block of tape $h$ changes at the end of time block $j$, which is the same as the tape block index at the start of time block $j+1$.

For $i < j$, observe that there is an edge from $(h',i)$ to $(h,j)$ in $G_{M',x}$ if and only if either:

  • $j = i+1$, or

  • $\mathrm{block}(h',i) = \mathrm{block}(h',j)$ and, for all $k \in \{i+1,\ldots,j-1\}$, $\mathrm{block}(h',k) \neq \mathrm{block}(h',i)$. (The tape block accessed by $h'$ in time block $i$ is not accessed again until time block $j$.)

Furthermore, there is an edge from $(h',0,i)$ to $(h,j)$ in $G_{M',x}$ if and only if $i = \mathrm{block}(h',j)$ and, for all $k \in \{1,\ldots,j-1\}$, $\mathrm{block}(h',j) \neq \mathrm{block}(h',k)$. (The tape block accessed by $h'$ in time block $j$ was never accessed before, and its index is equal to $i$.)

We will use a few claims about the block function and the computation graph.

Claim 3.4.

Given $(h',i') \in [p] \times [B]$ and the claimed sequence $\{m_{(h,i)}\}$, we can compute $\mathrm{block}(h',i')$ in $O(\log t(n))$ additional space.

Proof.

Given any $h', i'$, the value $\mathrm{block}(h',i')$ can be computed in logspace using (1), by maintaining a counter and streaming over the sequence $\{m_{(h,i)}\}$. ∎

Claim 3.5.

Given the indices of a pair of nodes $u, v$ in $G_{M',x}$, and the claimed sequence $\{m_{(h,i)}\}$, we can determine if $(u,v)$ is an edge in the encoded graph in $O(\log t(n))$ additional space.

Proof.

We can easily check if $j = i+1$ in logspace. By Claim 3.4, we can compute $\mathrm{block}(h',j)$ for any $h', j$ in logspace as well. Therefore the three possible conditions for an edge in the computation graph $G_{M',x}$ can each be checked in logspace. ∎
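In code, Claims 3.4 and 3.5 amount to the following sketch, where the Python dict `m`, keyed by (tape, time block), stands in for the $O(B)$-bit encoding $\{m_{(h,i)}\}$:

```python
def block(h, i, m):
    """Claim 3.4: index of the tape block of tape h at the start of time
    block i, via Equation (1). A streaming sum: one counter suffices."""
    return 1 + sum(m[(h, j)] for j in range(1, i))

def is_edge(u, v, m):
    """Claim 3.5: is (u, v) an edge of the graph encoded by m?
    Here v = (h, j) is a non-source node, and u is either (h', i)
    or a source node (h', 0, i)."""
    h, j = v
    if len(u) == 3:                          # source node (h', 0, i)
        hp, _, i = u
        return (block(hp, j, m) == i and
                all(block(hp, k, m) != i for k in range(1, j)))
    hp, i = u
    if j == i + 1:
        return True
    return (i < j and block(hp, i, m) == block(hp, j, m) and
            all(block(hp, k, m) != block(hp, i, m)
                for k in range(i + 1, j)))
```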

To summarize, we can encode $G_{M',x}$ in $O(B)$ bits, and given such an encoding, we can determine any desired edge $(u,v)$ of the graph in logspace.

Values of Nodes.

Similarly to our warm-up Theorem 3.1, we define contents for each of the nodes of $G_{M',x}$. For all $h \in [p]$ and $i \in [B]$, we define $\mathrm{content}(h,0,i)$ for the source nodes $(h,0,i)$ as follows:

  • If $h > 1$, then we are reading a tape $h$ that does not contain the input. In this case, $\mathrm{content}(h,0,i)$ is defined to be all-blank tape content: $b(n)$ blanks, with the tape head at the leftmost cell of the block, and head position equal to $(i-1) \cdot b(n)$. (Note that, assuming we start numbering tape cells at $0$, $(i-1) \cdot b(n)$ is the leftmost cell of the $i$-th block of tape.)

  • If $h = 1$, then we may need to read portions of the input $x$ of length $n$. If $i > \lceil n/b(n) \rceil$, then the $i$-th tape block of tape $1$ does not contain any symbol of $x$, and $\mathrm{content}(1,0,i)$ is defined to be all-blank tape content as above. Otherwise, if $i \leq \lceil n/b(n) \rceil$, then $\mathrm{content}(1,0,i)$ is defined to be the relevant $b(n)$-length substring of $x$ (possibly padded with blanks at the end), with the tape head at the leftmost cell of the block, head position equal to $(i-1) \cdot b(n)$, and the initial state of $M'$ included if $i = 1$.

Note that the source nodes of our $G_{M',x}$ will eventually be the leaf nodes of the Tree Evaluation instance; in the above, we are also defining the values of those leaves.

For $h \in [p]$ and $i \in [B]$, we define $\mathrm{content}(h,i)$ of node $(h,i)$ similarly as in Theorem 3.1: it is a string encoding the pair $(h,i)$, the state of $M'$ and the head position of tape $h$ at the end of time block $i$, and the content of the tape block read by head $h$ at the end of time block $i$. With a reasonable encoding, $\mathrm{content}(h,i)$ can be represented in $O(b(n) + \log t(n)) \leq O(b(n))$ bits, and our computation graph $G_{M',x}$ has been set up (just as in Theorem 3.1) so that given the strings $\mathrm{content}(u)$ for all nodes $u$ with edges to $(h,j)$, the string $\mathrm{content}(h,j)$ can be computed in $O(b(n))$ time and space.

Computation Graph Enumeration.

In what follows, we enumerate over all possible $O(t(n)/b(n))$-bit choices $G'$ of the encoding of the computation graph $G_{M',x}$ on $O(B) \leq O(t(n)/b(n))$ nodes. For each $G'$, we construct a Tree Evaluation instance $R_{G'}$ based on $G'$, which will attempt to simulate $M'$ on $x$ assuming $G' = G_{M',x}$. We will set up $R_{G'}$ so that the following conditions hold:

  • If $G' \neq G_{M',x}$, this means that at the end of some time block $i$, some tape head $h$ of $M'$ on $x$ moves inconsistently with the guessed label $m_{(h,i)}$ in $G'$. In this case, evaluating $R_{G'}$ will result in a special FAIL value at the root.

  • If $G' = G_{M',x}$, then $R_{G'}$ will evaluate to $\mathrm{content}(1,B)$ at the root, which will include either an accept or reject state at the end of the computation of $M'$ on $x$. Thus we will be able to conclude decisively whether $M$ accepts or rejects $x$ after this call to Tree Evaluation.

Finally, if we exhaust all computation graphs $G'$ and always receive FAIL from all Tree Evaluation calls, we then increase our guess of $t(n)$ by $1$ (starting from $t(n) = n$) and restart the entire process. (Recall that we do not assume $t(n)$ is constructible.) Eventually, we will choose an appropriate $t(n)$, in which case the above enumeration of graphs $G'$ will result in either acceptance or rejection.

The Functions At The Nodes.

We now define the functions at the nodes of our Tree Evaluation instance $R_{G'}$. These functions will allow us to detect when the current graph $G'$ we are considering has an error. We have to be a little careful here, as the Cook-Mertz algorithm is only guaranteed to produce the value of the root of a given tree, and not the values of any intermediate nodes along the way. (Indeed, the values of other nodes may become quite "scrambled" from evaluating a low-degree extension of the functions involved.)

For each tape $h \in [p]$ and time block $i \in [B]$, we define a time block function $F_{h,i}$ which attempts to simulate $M'$ over time block $i$, and to output $\mathrm{content}(h,i)$, the content of the relevant block of tape $h$ at the end of time block $i$.

First, we choose an encoding of $\mathrm{content}(h,i)$ strings so that there is also a special FAIL string of length $O(b(n))$, which is distinct from all valid content strings.

  • The input to $F_{h,i}$ consists of up to $2p$ strings of length $O(b(n))$. Some strings may be $O(b(n))$-bit FAIL strings. In the case where $G' = G_{M',x}$, $p$ of the input strings have the form $\mathrm{content}(h',i-1)$ (or $\mathrm{content}(h',0,1)$ if $i = 1$) for all $h' \in [p]$. These content strings contain the index of the node in $G'$ they correspond to, the state $q$ of $M'$ at the start of time block $i$, the content of the relevant tape blocks at the end of time block $i-1$, and the head positions of all $p$ tapes at the start of time block $i$, where each head position is encoded in $O(\log b(n))$ bits. When some of the tape blocks accessed in time block $i$ were not accessed in the previous time block, there are also (up to) $p$ other strings $\mathrm{content}(u_1), \ldots, \mathrm{content}(u_p)$ which contain tape block contents $c_1, \ldots, c_p$ for each of the $p$ tapes at the start of time block $i$.

  • The output of $F_{h,i}$ is defined as follows. First of all, if some input string is a FAIL string, then $F_{h,i}$ immediately outputs FAIL as well. In this way, a single FAIL detection at any node will propagate to the root value of $R_{G'}$.

    Assuming no FAIL strings have been given as input, $F_{h,i}$ attempts to simulate $M'$ for time block $i$, using the given content strings. While simulating, $F_{h,i}$ checks that for all $h' \in [p]$, the tape head $h'$ moves consistently with the integer $m_{(h',i)} \in \{0,1,-1\}$. In particular, if $m_{(h',i)} = -1$ then it checks that tape head $h'$ moves to its left-adjacent tape block at the end of time block $i$; if $m_{(h',i)} = 1$ then it checks that $h'$ moves to its right-adjacent tape block; and if $m_{(h',i)} = 0$ then it checks that $h'$ remains in the same tape block. If all heads $h'$ move consistently with the integers $m_{(h',i)}$, then the output of $F_{h,i}$ is set to be $\mathrm{content}(h,i)$. Otherwise, the output of $F_{h,i}$ is FAIL.

Observe that $F_{h,i}$ can be computed in $O(b(n))$ time and space, by simply simulating $M'$ for $b(n)$ steps, starting from the contents $c_1, \ldots, c_p$ for each of the tapes, and the state and head information given. Furthermore, by setting $b'(n) = \Theta(b(n))$ appropriately, we may think of each $F_{h,i}$ as a function from an ordered collection of $2p$ separate $b'(n)$-bit strings to a single $b'(n)$-bit string. These will be the functions at the nodes of our Tree Evaluation instance.
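A Python sketch of these node functions is below. The helper `simulate_block`, which runs $M'$ for one time block on the decoded contents and reports where each head ended up, is an assumed subroutine (it corresponds to the $O(b(n))$-time-and-space simulation described above); the point of the sketch is the FAIL propagation and the consistency check against the guessed labels. Tapes are 0-indexed here.

```python
FAIL = "FAIL"

def make_F(h, i, m, simulate_block):
    """Node function F_{h,i} of the Tree Evaluation instance R_{G'}.

    simulate_block(inputs): assumed to run M' for time block i on the
    decoded content strings, returning (contents, moves) where
    contents[h'] is content(h', i) and moves[h'] in {-1, 0, 1} is the
    observed tape-block movement of head h' at the end of the block.
    """
    def F(inputs):
        # A single FAIL among the inputs propagates upward unchanged.
        if any(s == FAIL for s in inputs):
            return FAIL
        contents, moves = simulate_block(inputs)
        # The guess G' is wrong iff some head's actual movement disagrees
        # with its guessed label m_{(h', i)}.
        if any(moves[hp] != m[(hp, i)] for hp in range(len(moves))):
            return FAIL
        return contents[h]               # content(h, i)
    return F
```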

Construct a Tree Evaluation Instance.

Observe that the depth of every $G'$ is at most $B+1$: for every edge $(u,v)$ in $G'$, the time block of $u$ is always smaller than the time block of $v$, and there are $B+1$ time blocks. For each possible computation graph $G'$, we will construct an equivalent and implicitly-defined Tree Evaluation instance $R_{G'}$ of height at most $B+1 \leq O(t(n)/b(n))$, such that the edge relation of the tree can be determined in small space. The idea is analogous to that described in the warm-up (Theorem 3.1); for completeness, we will be a little more formal here than in Theorem 3.1.

Recall that each candidate $G'$ is defined so that each non-source node $(h,i)$ has indegree at most $2p$. Let $V$ be the set of all $O((2p)^{B+1})$ sequences of the form $h_1 \cdots h_\ell$, for all $\ell = 0,\ldots,B$, where each $h_i \in [2p]$, and let $\varepsilon$ denote the empty string (also in $V$). The nodes of $R_{G'}$ will be indexed by sequences in $V$. For all $\ell = 0,\ldots,B-1$, each node $h_1 \cdots h_\ell \in V$ has at most $2p$ children $h_1 \cdots h_\ell h_{\ell+1}$, for $h_{\ell+1} \in [2p]$.

Each node of $R_{G'}$ is directly associated with a path from a node $v$ to the node $(1,B)$ in the guessed graph $G'$, as follows.

  • The node $\varepsilon \in V$ corresponds to the node $(1,B)$ in $G'$ (tape $1$, in the last time block), and we associate the function $F_{1,B}$ with this node.

  • Inductively assume the node $h_1 \cdots h_\ell \in V$ corresponds to some node $(h,j)$ in $G'$. We associate the function $F_{h,j}$ with this node.

    For every $h' \in \{1,\ldots,p\}$, the node $h_1 \cdots h_\ell h' \in V$ corresponds to a node $(h',i)$ (or $(h',0,i)$) in $G'$ with an edge to $(h,j)$, for some $i$. This corresponds to the case where the tape block of tape $h'$ accessed in time block $i$ is accessed later in time block $j$ (or where $h'$ is reading tape block $i$ for the first time in time block $j$).

    For every $h' \in \{p+1,\ldots,2p\}$, let $h'' = h' - p$, so $h'' \in [p]$. The node $h_1 \cdots h_\ell h' \in V$ corresponds to the node $(h'',j-1)$ with an edge to $(h,j)$ when $j > 1$, and to the node $(h'',0,1)$ with an edge to $(h,j)$ when $j = 1$. This corresponds to the case where the state and head position of tape $h''$ at the end of time block $j-1$ are passed to the start of time block $j$, and to the case where the initial state and head position are passed to the start of time block $1$.

    If there is an edge from $(h',0,i)$ to $(h,j)$ for some $i$, then the tape head $h'$ has never previously visited the tape block $i$ that is used to compute time block $j$. In that case, we set $h_1 \cdots h_\ell h'$ to be a leaf node in the tree, whose value is set to be $\mathrm{content}(h',0,i)$. (This correspondence is illustrated in the code sketch below.)
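The correspondence above can be phrased as a short recursive walk. Here is a Python sketch, reusing `block` from the sketch of Claim 3.4 (with `p` the number of tapes and `B` the number of time blocks):

```python
def child_node(hp, j, m, p):
    """Node of G' that the hp-th child (hp in 1..2p) of a tree node
    corresponding to (h, j) stands for; it depends only on hp and j."""
    if hp > p:                   # state/head info from the previous block
        return (hp - p, j - 1) if j > 1 else (hp - p, 0, 1)
    target = block(hp, j, m)     # tape block of tape hp used in block j
    for i in range(j - 1, 0, -1):
        if block(hp, i, m) == target:
            return (hp, i)       # most recent time block accessing it
    return (hp, 0, target)       # never accessed before: a source node

def node_of_path(path, B, m, p):
    """Map a node of R_{G'}, given by its path label [h_1, ..., h_l],
    to the corresponding node of G'; the empty label maps to (1, B)."""
    node = (1, B)
    for hp in path:
        if len(node) == 3:       # source nodes are leaves of R_{G'}
            raise ValueError("path descends below a leaf")
        _, j = node
        node = child_node(hp, j, m, p)
    return node
```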

Finishing up.

We call Tree Evaluation on $R_{G'}$. If the current guessed $G'$ is not equal to $G_{M',x}$, then some guessed integer $m_{(h,i)} \in \{-1,1,0\}$ has an incorrect value (recall that $G'$ is specified entirely by the integers $\{m_{(h,i)}\}$). This incorrect head movement will be detected by the function $F_{h,i}$, which will then output FAIL. This FAIL value will propagate to the root value of $R_{G'}$, by construction. When $R_{G'}$ evaluates to FAIL, we move to the next possible $G'$, encoded in $O(B)$ bits.

Assuming the current graph $G'$ is correct, i.e., $G' = G_{M',x}$, the value of the root of $R_{G'}$ is $\mathrm{content}(1,B)$, the output of $F_{1,B}$ on the final time block $B$. This value contains the correct accept/reject state of $M'$ on $x$. This follows from the construction of the functions $F_{h,i}$, and can be proved formally by an induction on the nodes of $G'$ in topological order. Therefore, if our Tree Evaluation call returns some content with an accept/reject state, we can immediately return the decision.

Space Complexity.

Observe that, by Claims 3.4 and 3.5, any bit of the Tree Evaluation instance $R_{G'}$ defined above can be computed in $O(B)$ space, given the computation graph $G'$ encoded in $O(B)$ space. In particular, given the index of a node of $R_{G'}$ as defined above (a path from a node $v$ to $(1,B)$ in $G'$), we can determine the corresponding node of $G'$ and its children in $O(B)$ space.

Therefore, we can call the Cook-Mertz algorithm (Theorem 2.2) on this implicitly-defined instance of Tree Evaluation, using $O(B)$ space plus $O(d' \cdot b' + h' \cdot \log(d' \cdot b'))$ space, where

  • $b' = \Theta(b(n))$ is the bit-length of the outputs of our functions $F_{h,i}$,

  • $d' = 2p = \Theta(1)$ is the maximum number of children of each inner node, and

  • $h' = B = \Theta(t(n)/b(n))$ is the maximum height of the tree.

At each node, each call to one of the functions $F_{h,i}$ requires only $O(b(n))$ space. Therefore the overall space bound is

\[ O\left(b(n) + \frac{t(n)}{b(n)} \cdot \log b(n)\right). \]

Setting $b(n) = \sqrt{t(n)\log t(n)}$ obtains the bound $O(\sqrt{t(n)\log t(n)})$. This completes the proof of Theorem 1.1.
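For the record, the arithmetic behind this last step:

```latex
% With b(n) = \sqrt{t(n)\log t(n)}:
\[
  \frac{t(n)}{b(n)} \cdot \log b(n)
  \;=\; \sqrt{\frac{t(n)}{\log t(n)}} \cdot \Theta(\log t(n))
  \;=\; \Theta\!\big(\sqrt{t(n)\log t(n)}\big),
\]
% so both terms of O(b(n) + (t(n)/b(n))\log b(n)) are \Theta(\sqrt{t(n)\log t(n)}),
% and the O(B) = O(t(n)/b(n)) bits used to store G' are also within this bound.
```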

Given that we have reduced to Tree Evaluation, it may be instructive to think about what is happening in the final algorithm, conceptually. At a high level, for our instances of Tree Evaluation, the Cook-Mertz procedure on $R_{G'}$ uses $O((t(n)/b(n)) \cdot \log b(n))$ space to specify the current node $v$ of $G'$ being examined (given by a path from that node $v$ to the node $(1,B)$), as well as an element from a field of size $\mathrm{poly}(b(n))$ for each node along that path. The algorithm also reuses $O(b(n))$ space at each node, in order to compute low-degree extensions of the function $F_{h,i}$ by interpolation over carefully chosen elements of the field. For our setting of $b(n)$, we are using equal amounts of space to store a path in the graph $G'$ labeled with field elements, and to compute a low-degree extension of the "block evaluation" function $F_{h,i}$ at each node (with aggressive reuse of the latter space, at every level of the tree).

3.3 On Possibly Removing the Square-Root-Log Factor

We observe that, if Tree Evaluation turns out to be in ${\sf LOGSPACE} = {\sf SPACE}[\log n]$, then we would obtain a simulation that runs in $O(b' + t(n)/b')$ space when the number of children is upper bounded by a constant. This is due to our succinct encoding of the computation graph, which only needs $O(b')$ space to be stored. Setting $b' = \sqrt{t(n)}$, this would remove the pesky $O(\sqrt{\log t})$ factor from our space bound above:

Corollary 3.6.

If Tree Evaluation is in ${\sf LOGSPACE}$, then ${\sf TIME}[t(n)] \subseteq {\sf SPACE}[\sqrt{t(n)}]$.

There may be a path to removing the $\sqrt{\log t}$ factor that does not require solving Tree Evaluation in logspace. The simulations of time $t$ building on Hopcroft-Paul-Valiant [HPV75, PV76, PR81, DT85, HLMW86], which save a $\log t$ factor in space and alternating time, utilize strategies which seem orthogonal to our approach via Tree Evaluation. Roughly speaking, there are two kinds of strategies: a pebbling strategy on computation graphs which minimizes the total number of pebbles needed [HPV75, PR81], and a recursive composition strategy which cuts the graph into halves and performs one of two recursive approaches based on the cardinality of the cut [PV76, DT85, HLMW86]. (There is even a third strategy, based on an "overlap" argument [AL81].) Neither of the first two strategies seems comparable to the Cook-Mertz algorithm, and so far we have been unable to merge the Tree Evaluation approach with the others. The following hypothesis seems plausible:

Hypothesis 3.7.

For "reasonable" $b(n)$ and $t(n)$, time-$t(n)$ multitape computations can be space-efficiently reduced to Tree Evaluation instances with constant degree $d$, height $O(t(n)/b(n))/\log(t(n)/b(n))$, and functions from $O(b)$ bits to $b$ bits at each inner node.

Results such as [PV76, DT85], which show that ${\sf TIME}[t] \subseteq {\sf ATIME}[t/\log t]$ and that circuits of size $t$ can be simulated in depth $O(t/\log t)$, prove that the hypothesis is actually true for $b = \Theta(1)$. If the hypothesis is also true for $b = \sqrt{t}$, then we could conclude ${\sf TIME}[t] \subseteq {\sf SPACE}[\sqrt{t}]$, by applying Cook and Mertz (Theorem 2.2).

3.4 Extension to Higher Dimensional Tapes

The simulation of Theorem 1.1 extends to Turing machines with higher-dimensional tapes. The proof is very similar in spirit to Theorem 1.1, but a few crucial changes are needed to carry out the generalization.

Reminder of Theorem 1.5. Every decision problem solvable by a $t(n)$-time $d$-dimensional multitape Turing machine can be decided in $O((t\log t)^{1-1/(d+1)})$ space.

Proof.

(Sketch) As there is no $d$-dimensional version of block-respecting Turing machines, we have to take a different approach. Similarly to Theorem 3.1 and Paul-Reischuk [PR80], we upper-bound the number of tape blocks that may be relevant to any given time block, and use that information to construct a series of Tree Evaluation instances.

Suppose our time-$t(n)$ Turing machine $M$ has $p$ tapes which are $d$-dimensional, and we wish to simulate $M$ on an input $x$ of length $n$. We assume $x$ is written on the first tape in a single direction starting from the cell indexed by $(0,\ldots,0) \in \mathbb{N}^d$. (For concreteness, for $i = 1,\ldots,n$ we may assume the $i$-th bit of $x$ is written in the cell $(0,\ldots,0,i-1) \in \mathbb{N}^d$.)

For a parameter cc, we partition the time t(n)t(n) into time blocks of length cc, so there are B=t(n)/cB=\lceil t(n)/c\rceil total time blocks. Besides time blocks 1,,B1,\ldots,B, we also define a time block 0 (similarly to Theorem 3.1 and Theorem 1.1). Each dd-dimensional tape is partitioned into tape blocks of b=cdb=c^{d} contiguous cells: in particular, each tape block is a dd-dimensional cube with side-length cc in each direction. Observe that each tape block has up to 3d13^{d}-1 adjacent tape blocks in dd dimensions. We define a tape block TT to be active in time block i1i\geq 1 if some cells of TT are accessed during time block ii, and all tape blocks are defined to be active in time block 0. Since each time block lasts only cc steps and each tape block has side-length cc in each direction, observe that for each i[B]i\in[B] and each tape h[p]h\in[p], there are at most 2d2^{d} blocks of tape hh that may be active during time block ii. Therefore across all pp tapes, the total number of active blocks is at most 2dp2^{d}\cdot p. Note that each active tape block of tape hh during time block ii can be indexed by some vector in {1,0,1}d\{-1,0,1\}^{d} indicating its position relative to the tape block in which the head of tape hh started the time block.
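As a quick illustration of the 2d2^{d} bound (a randomized sanity check of our own, not part of the proof: within cc steps each coordinate changes by at most cc, so the head meets at most two block indices per dimension):

import random

def blocks_touched(d, c, trials=2000):
    # simulate one time block: a c-step head walk on a d-dimensional tape,
    # partitioned into cubic tape blocks of side-length c
    worst = 0
    for _ in range(trials):
        pos = [random.randrange(c, 10 * c) for _ in range(d)]  # random starting cell
        seen = {tuple(p // c for p in pos)}                    # block index of a cell
        for _ in range(c):
            axis = random.randrange(d)
            pos[axis] += random.choice((-1, 1))                # one head move
            seen.add(tuple(p // c for p in pos))
        worst = max(worst, len(seen))
    return worst

for d in (1, 2, 3):
    assert blocks_touched(d, c=8) <= 2 ** d                    # at most 2^d active blocks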

Similar to the proofs of Theorem 3.1 and Theorem 1.1, we define a computation graph GM,xG_{M,x}. The set of nodes will be the set S={(h,i)h[p],i[B]}S=\{(h,i)\mid h\in[p],i\in[B]\} unioned with a subset T{(h,0,v)h[p],vd}T\subseteq\{(h,0,v)\mid h\in[p],v\in{\mathbb{N}}^{d}\}, where |T|pB|T|\leq p\cdot B.

Each node will have a content value as before. The content(h,0,v)\text{content}(h,0,v) nodes will store a portion of the input, or the all-blank content, depending on the vector vdv\in{\mathbb{N}}^{d}. The vector vv gives the index of a block of tape hh. (In the one-dimensional case, the tape blocks were simply indexed by {\mathbb{N}}.) The content(h,i)\text{content}(h,i) nodes will store information at the end of time block ii: the state of MM, the head position of tape hh, and a list of the contents of those blocks of tape hh that are active during time block ii. Note that content(h,i)\text{content}(h,i) can be encoded in O(2db)O(2^{d}\cdot b) bits.

As before, we label each node (h,i)[p]×[B](h,i)\in[p]\times[B] to indicate the head movements between time blocks, but our labels are more complicated than before. We use a dd-dimensional vector m(h,i){1,0,1}dm_{(h,i)}\in\{-1,0,1\}^{d} to indicate the difference between the index of the tape block u(h,i)du_{(h,i)}\in{\mathbb{N}}^{d} that tape head hh is in at the start of time block ii, and the index of the tape block v(h,i)dv_{(h,i)}\in{\mathbb{N}}^{d} that head hh is in at the end of time block ii. We have v(h,i)u(h,i){1,0,1}dv_{(h,i)}-u_{(h,i)}\in\{-1,0,1\}^{d}, since every time block takes cc steps and the side-length of a tape block is cc cells. We also label each node with a list L(h,i)L_{(h,i)} of up to 2d2^{d} other vectors in {1,0,1}d\{-1,0,1\}^{d}, describing the indices of all other blocks of tape hh that are active in time block ii. Observe that the vectors m(h,i)m_{(h,i)} and the lists L(h,i)L_{(h,i)} over all nodes can be encoded in O(d2dB)O(B)O(d\cdot 2^{d}\cdot B)\leq O(B) bits. Moreover, given the vectors m(h,i)m_{(h,i)} and the lists L(h,i)L_{(h,i)}, we can reconstruct any v(h,i)v_{(h,i)} and u(h,i)u_{(h,i)} in only logspace by maintaining counters.
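To illustrate the last point, the reconstruction is just a prefix sum of the movement labels (a sketch of our own, with a hypothetical dictionary layout for the labels):

def head_block_at_start(m, h, i, d):
    # m[(h, t)] is the movement vector m_{(h,t)} in {-1,0,1}^d; we assume every
    # head starts time block 1 in the all-zeros tape block. Only d counters are
    # maintained, of O(log t) bits each, mirroring the logspace claim above.
    u = [0] * d
    for t in range(1, i):
        for j in range(d):
            u[j] += m[(h, t)][j]
    return tuple(u)  # u_{(h,i)}; the end-of-block index v_{(h,i)} adds m[(h,i)]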

As before, we put an edge from (h,i)(h,i) to (h,j)(h^{\prime},j) if either

  • j=i+1j=i+1, or

  • i<ji<j and there is some tape block active on tape hh in both time blocks ii and jj that is not active in any of the time blocks i+1i+1 through j1j-1.

We put an edge from (h,0,v)(h,0,v) to (h,j)(h^{\prime},j) if during time block jj, the tape head hh accesses the tape block indexed by vv for the first time in the computation.

We observe that the indegree of each (h,i)(h,i) is at most (2d+1)p(2^{d}+1)\cdot p: there are at most 2d2^{d} active tape blocks for each of the pp tapes, each of which may require information from a different previous node, and there are also pp edges from the previous time block providing the state and head positions at the start of time block ii.

Generalizing 3.4 and 3.5, we claim that the edges of GM,xG_{M,x} can be determined in logspace, given the vectors m(h,i)m_{(h,i)} and the lists L(h,i)L_{(h,i)} encoded in O(B)O(B) bits. We enumerate over all possible computation graphs GG^{\prime}, using this encoding. Note that the encoding also allows us to determine which source nodes (h,0,v)(h,0,v) appear in GG^{\prime}, by tracking the tape head movements as claimed by the vectors m(h,i)m_{(h,i)} and the lists L(h,i)L_{(h,i)}, and noting when a tape head hh enters a new block that has not been accessed before.
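In code form, the edge rules can be phrased as follows (a brute-force sketch of our own, given explicit activity sets; the actual argument determines the same edges in logspace from the O(B)O(B)-bit encoding):

def graph_edges(active, p, B):
    # active[(h, i)]: set of block indices of tape h active in time block i >= 1
    # (edges from the time-0 source nodes (h,0,v) are omitted in this sketch)
    E = set()
    for h in range(p):
        for i in range(1, B + 1):
            if i < B:
                for hp in range(p):
                    E.add(((h, i), (hp, i + 1)))      # consecutive time blocks
            for j in range(i + 2, B + 1):
                between = set()
                for k in range(i + 1, j):
                    between |= active[(h, k)]
                # a block of tape h active in both i and j, but in no block between
                if (active[(h, i)] & active[(h, j)]) - between:
                    for hp in range(p):
                        E.add(((h, i), (hp, j)))
    return E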

For each node (h,i)(h,i), the evaluation function Fh,iF_{h,i} is defined similarly as in the proofs of Theorem 3.1 and Theorem 1.1. Given all the necessary information at the start of a time block (the state qq, the contents of the at most 2dp2^{d}\cdot p active blocks, and each of the pp head positions encoded in O(dlogt)O(d\log t) bits), the function Fh,iF_{h,i} computes content(h,i)\text{content}(h,i): it simulates MM for cc steps on the active blocks, then outputs the updated contents of all O(2d)O(2^{d}) active blocks on tape hh in bb^{\prime} bits, where b=Θ(cd)b^{\prime}=\Theta(c^{d}) and bO(2dcd+2ddlogt)b^{\prime}\leq O(2^{d}\cdot c^{d}+2^{d}\cdot d\log t), including the head position of tape hh at the end of the time block and the state qq^{\prime} reached.

As before, Fh,iF_{h,i} checks that the claimed active tape blocks in L(h,i)L_{(h,i)} and the claimed vector m(h,i)m_{(h,i)} are consistent with the simulation of time block ii; if they are not, then Fh,iF_{h,i} outputs FAIL and we design the Fh,iF_{h,i} as before so that a single FAIL value is propagated to the root of RGR_{G^{\prime}}. We can think of each Fh,iF_{h,i} as a mapping from (2d+1)pb(2^{d}+1)\cdot p\cdot b^{\prime} bits to bb^{\prime} bits, where we allow a special encoding to “pad” the input and output when the number of active blocks on some tape is less than 2d2^{d}.

Finally, for each guessed graph GG^{\prime} we define a Tree Evaluation RGR_{G^{\prime}} instance analogously as in the proof of Theorem 1.1. The root of the tree corresponds to the node (1,B)(1,B), where we wish to check if content(1,B)\text{content}(1,B) contains the accept or reject state. The children of a given node in the tree RGR_{G^{\prime}} are determined directly by the predecessors of the corresponding node in GM,xG_{M,x}. The leaves correspond to the nodes (h,0,v)(h,0,v) in GM,xG_{M,x} such that content(h,0,v)\text{content}(h,0,v) contains either an all-blank dd-dimensional cube of cdc^{d} cells, or a dd-dimensional cube of cdc^{d} cells which includes up to cc symbols of the input xx and is otherwise all-blank.

As before, if a call to Tree Evaluation for some RGR_{G^{\prime}} ever returns an answer other than FAIL, we return the appropriate accept/reject decision. If all calls FAIL, we increment the guess for the running time t(n)t(n) and try again.

Our Tree Evaluation instance RGR_{G^{\prime}} has height at most B=t(n)/cB=\lceil t(n)/c\rceil, where each inner node has at most (2d+1)p(2^{d}+1)\cdot p children, working on blocks of bitlength 2dbO(cd+dlogt)2^{d}\cdot b^{\prime}\leq O(c^{d}+d\log t). Applying the Cook-Mertz algorithm for Tree Evaluation, recalling that dd and pp are both constants, and including the O(B)O(B) bits we use to encode a candidate graph GG^{\prime}, we obtain a simulation running in space

O(cd+dlogt+t(n)clog(cd))O(cd+t(n)clogt(n)),O\left(c^{d}+d\log t+\frac{t(n)}{c}\cdot\log(c^{d})\right)\leq O\left(c^{d}+\frac{t(n)}{c}\cdot\log t(n)\right),

since ct(n)c\leq t(n) and dd is constant.

Setting c=(t(n)logt(n))1/(d+1)c=(t(n)\log t(n))^{1/(d+1)}, the resulting space bound is O((t(n)logt(n))11/(d+1))O((t(n)\log t(n))^{1-1/(d+1)}). ∎
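For the record, this choice of cc exactly balances the two terms of the bound:

c^{d} = (t\log t)^{d/(d+1)} \qquad\text{and}\qquad \frac{t}{c}\cdot\log t \;=\; (t\log t)^{1-\frac{1}{d+1}} \;=\; (t\log t)^{d/(d+1)}.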

We remark that putting Tree Evaluation in 𝖫𝖮𝖦𝖲𝖯𝖠𝖢𝖤{\sf LOGSPACE} would also directly improve the above simulation, to use only O(t(n)11/(d+1))O(t(n)^{1-1/(d+1)}) space.

4 Some Consequences

First, we describe some simple lower bounds that follow by diagonalization. We will use the following form of the space hierarchy theorem:

Theorem 4.1 ([SHL65]).

For space constructible s(n),s(n)ns^{\prime}(n),s(n)\geq n such that s(n)=o(s(n))s^{\prime}(n)=o(s(n)), we have 𝖲𝖯𝖠𝖢𝖤[s(n)]𝖲𝖯𝖠𝖢𝖤[s(n)]{\sf SPACE}[s(n)]\not\subset{\sf SPACE}[s^{\prime}(n)].

Reminder of Corollary 1.2. For space constructible s(n)ns(n)\geq n and all ε>0\varepsilon>0, 𝖲𝖯𝖠𝖢𝖤[s(n)]𝖳𝖨𝖬𝖤[s(n)2ε]{\sf SPACE}[s(n)]\not\subset{\sf TIME}[s(n)^{2-\varepsilon}].

Proof.

Assume to the contrary that 𝖲𝖯𝖠𝖢𝖤[s(n)]𝖳𝖨𝖬𝖤[s(n)2ε]{\sf SPACE}[s(n)]\subseteq{\sf TIME}[s(n)^{2-\varepsilon}] for some ε>0\varepsilon>0 and some space constructible s(n)s(n). By Theorem 1.1, we have

𝖲𝖯𝖠𝖢𝖤[s(n)]𝖳𝖨𝖬𝖤[s(n)2ε]𝖲𝖯𝖠𝖢𝖤[(s(n)2εlogs(n))1/2]𝖲𝖯𝖠𝖢𝖤[s(n)]{\sf SPACE}[s(n)]\subseteq{\sf TIME}[s(n)^{2-\varepsilon}]\subseteq{\sf SPACE}[(s(n)^{2-\varepsilon}\log s(n))^{1/2}]\subseteq{\sf SPACE}[s^{\prime}(n)]

for a function s(n)s^{\prime}(n) which is o(s(n))o(s(n)). This contradicts Theorem 4.1. ∎

Reminder of Corollary 1.3. The language L={M,x,1k|M|k and M(x) halts in space k}L=\{\langle M,x,1^{k}\rangle\mid|M|\leq k\text{ and }M(x)\text{ halts in space }k\} requires n2εn^{2-\varepsilon} time to solve on a multitape Turing machine, for every ε>0\varepsilon>0.

Proof.

Suppose LL can be solved in n2εn^{2-\varepsilon} time for some ε>0\varepsilon>0. We show that every language in 𝖲𝖯𝖠𝖢𝖤[n]{\sf SPACE}[n] could then be solved in time O(n2ε)O(n^{2-\varepsilon}), contradicting Corollary 1.2. Let L𝖲𝖯𝖠𝖢𝖤[n]L^{\prime}\in{\sf SPACE}[n] be decided by a Turing machine MM using space cncn for a constant c1c\geq 1. Given an input xx to MM, we call our n2εn^{2-\varepsilon}-time algorithm for LL on the input M,x,1c|x|\langle M,x,1^{c|x|}\rangle of length n=O(|x|)n=O(|x|). This takes O(|x|2ε)O(|x|^{2-\varepsilon}) time, a contradiction. ∎

Observe that the same proof also shows a time lower bound of the form n2/logcnn^{2}/\log^{c}n, for some constant c>0c>0.

4.1 Subexponential Size Branching Programs for Circuits

Here, we observe that Theorem 1.1 implies that subquadratic size circuits can be simulated with subexponential size branching programs:

Reminder of Corollary 1.4. There is a universal k1k\geq 1 such that for all sns\geq n, every bounded fan-in circuit of size ss and nn inputs has a branching program of size at most 2kslogks2^{k\sqrt{s}\log^{k}s}.

Recall the Circuit Evaluation problem: given the description of a Boolean circuit CC of fan-in two with one output, and given an input xx, does CC on xx evaluate to 11? We assume CC is encoded in topological order: the gates are numbered 1,,s1,\ldots,s, and for all i[s]i\in[s], the ii-th gate in the encoding only takes inputs from gates appearing earlier in the encoding. For simplicity, we further assume each gate ii provides a correct list of all future gates j1,,jk>ij_{1},\ldots,j_{k}>i that will consume the output of gate ii. Note that when s(n)s(n) is at least the number of input bits, we can still afford an encoding of size-s(n)s(n) circuits in O(s(n)logs(n))O(s(n)\log s(n)) bits, carrying this extra information. An old result commonly credited to Pippenger is that Circuit Evaluation on topologically-ordered circuits can be efficiently solved on multitape Turing machines:

Theorem 4.2 (Pippenger [Pip77]).

Circuit Evaluation 𝖳𝖨𝖬𝖤[npoly(logn)]\in{\sf TIME}[n\cdot\text{poly}(\log n)].

The reference [Pip77] is difficult to find; however, the PhD thesis of Swamy ([Swa78, Chapter 2, Theorem 2.1]) gives an exposition of an O(n(logn)3)O(n\cdot(\log n)^{3}) time bound.999Available at http://static.cs.brown.edu/research/pubs/theses/phd/1978/swamy.pdf. Swamy describes his construction as simulating oblivious RAMs / straight-line programs; it readily applies to circuits. The high-level idea is to solve a more general problem: given the description of a circuit CC and input xx, we wish to insert the output values of each gate of C(x)C(x) directly into the description of CC. To solve this problem on CC of size ss, we first recursively evaluate the circuit on the first s/2\lfloor s/2\rfloor gates (in topological order), which sets values to all those gates. We pass over those values, and collect the outputs of all gates among the first s/2\lfloor s/2\rfloor that will be used as inputs to the remaining s/2\lceil s/2\rceil gates. We sort these outputs by gate index, then recursively evaluate the remaining s/2\lceil s/2\rceil gates on xx plus the list of outputs. This leads to a runtime recurrence of T(n)2T(n/2)+O(n(logn)2)T(n)\leq 2\cdot T(n/2)+O(n\cdot(\log n)^{2}) (using the fact that n=Θ(slogs)n=\Theta(s\log s)).
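For intuition, the following toy sketch (our own in-memory encoding, not the tape-based one; the gate layout and names are hypothetical) shows the halve-and-recurse structure. The O(n(logn)3)O(n\cdot(\log n)^{3}) multitape bound additionally requires the sorted collection passes described above, which a random-access dictionary hides.

def evaluate(circuit, x):
    # circuit[i] = (op, a, b), op in {'IN','NOT','AND','OR'}, gates in topological order
    def solve(lo, hi, avail):
        # avail: values of all gates with index < lo
        if hi - lo == 1:
            op, a, b = circuit[lo]
            if op == 'IN':
                v = x[a]
            elif op == 'NOT':
                v = 1 - avail[a]
            elif op == 'AND':
                v = avail[a] & avail[b]
            else:
                v = avail[a] | avail[b]
            return {lo: v}
        mid = (lo + hi) // 2
        left = solve(lo, mid, avail)
        merged = {**avail, **left}   # "collect the first half's outputs for the second half"
        right = solve(mid, hi, merged)
        return {**left, **right}

    s = len(circuit)
    return solve(0, s, {})[s - 1]

# example: (x0 AND x1) OR (NOT x0)
circuit = [('IN', 0, None), ('IN', 1, None),
           ('AND', 0, 1), ('NOT', 0, None), ('OR', 2, 3)]
assert evaluate(circuit, [1, 1]) == 1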

Combining Theorem 4.2 and Theorem 1.1, we directly conclude that Circuit Evaluation can be solved in npoly(logn)\sqrt{n}\cdot\text{poly}(\log n) space. Now, given any circuit CC of size ss, we hardcode its description into the input of a multitape Turing machine that solves Circuit Evaluation in s(n)=spoly(logs)s^{\prime}(n)=\sqrt{s}\cdot\text{poly}(\log s) space. Applying the standard translation of s(n)s^{\prime}(n)-space algorithms into branching programs of 2O(s(n))2^{O(s^{\prime}(n))} size (see for instance the survey [Raz91]), Corollary 1.4 follows.

5 Discussion

We have shown that multitape Turing machines and circuits have surprisingly space-efficient evaluation algorithms, via a reduction to the Tree Evaluation problem. Two reflective remarks come to mind.

First, we find it very interesting that Tree Evaluation, which was originally proposed and studied in the hopes of getting a handle on 𝖫𝖮𝖦𝖲𝖯𝖠𝖢𝖤𝖯{\sf LOGSPACE}\neq{\sf P}, may turn out to be more useful for making progress on 𝖯𝖯𝖲𝖯𝖠𝖢𝖤{\sf P}\neq{\sf PSPACE}. In any case, it is clear that Tree Evaluation is a central problem in complexity theory.

Second, we find it fortunate that the main reduction of this paper (from time-tt multitape Turing machine computations to Tree Evaluation) was found after the Cook-Mertz procedure was discovered. Had our reduction been found first, the community (including the author) would have likely declared the following theorem (a cheeky alternative way of presenting the main reduction of Theorem 1.1) as a “barrier” to further progress on Tree Evaluation, and possibly discouraged work on the subject:

Theorem.” Unless the 50-year-old [HPV75] simulation 𝖳𝖨𝖬𝖤[t]𝖲𝖯𝖠𝖢𝖤[t/logt]{\sf TIME}[t]\subseteq{\sf SPACE}[t/\log t] can be improved, Tree Evaluation instances of constant arity, height hh, and bb-bit values cannot be solved in o(hb/log(hb))o(h\cdot b/\log(h\cdot b)) space.101010Here, we think of o(hb/log(hb))o(h\cdot b/\log(h\cdot b)) as shorthand for O(hb/(f(hb)log(hb)))O(h\cdot b/(f(h\cdot b)\cdot\log(h\cdot b))) for some unbounded function ff.

Theorem 1.1 and its relatives open up an entirely new set of questions that did not seem possible to ask before. Here are a few tantalizing ones.

Can the simulation of Theorem 1.1 be improved to show 𝖳𝖨𝖬𝖤[t]𝖲𝖯𝖠𝖢𝖤[t]{\sf TIME}[t]\subseteq{\sf SPACE}[\sqrt{t}]? We discussed some prospects for a yes-answer in Section 3.3 (e.g., showing Tree Evaluation is in 𝖫𝖮𝖦𝖲𝖯𝖠𝖢𝖤{\sf LOGSPACE}). An interesting secondary question is whether the longstanding O(t)O(\sqrt{t})-space simulation of one-tape time-tt Turing machines [HU68, Pat72] can be improved to O(t1/2ε)O(t^{1/2-\varepsilon}) space, for some ε>0\varepsilon>0.

Is there an ε>0\varepsilon>0 such that 𝖳𝖨𝖬𝖤[t]𝖠𝖳𝖨𝖬𝖤[t1ε]{\sf TIME}[t]\subseteq\mathsf{ATIME}[t^{1-\varepsilon}]? Is there a better speedup of time tt, using alternating Turing machines? Recall the best-known simulation is 𝖳𝖨𝖬𝖤[t]𝖠𝖳𝖨𝖬𝖤[t/logt]{\sf TIME}[t]\subseteq\mathsf{ATIME}[t/\log t] [DT85]. A yes-answer to this question would imply (for example) a super-linear time lower bound on solving quantified Boolean formulas (QBF), which is still open [Wil08, LW13]. Note that if Theorem 1.1 could be improved to 𝖳𝖨𝖬𝖤[t]𝖲𝖯𝖠𝖢𝖤[t1/2ε]{\sf TIME}[t]\subseteq{\sf SPACE}[t^{1/2-\varepsilon}] for some ε>0\varepsilon>0, then we would have 𝖳𝖨𝖬𝖤[t]𝖠𝖳𝖨𝖬𝖤[t12ε]{\sf TIME}[t]\subseteq\mathsf{ATIME}[t^{1-2\varepsilon}].

Can time-tt random-access Turing machines be simulated in space O(t1ε)O(t^{1-\varepsilon}), for some ε>0\varepsilon>0? As mentioned in Remark 1.6, the answer is yes for “oblivious” models in which data access patterns can be calculated in advance. In both Theorem 1.1 and Theorem 1.5, we exploit the locality of low-dimensional tape storage: without that property, the indegrees of nodes in the computation graph would be rather high, and (as far as we can tell) the resulting simulation would not improve the known O(t/logt)O(t/\log t) space bounds for simulating tt time in random-access models [PR81, HLMW86]. Similarly to the previous question, if 𝖳𝖨𝖬𝖤[t]𝖲𝖯𝖠𝖢𝖤[t1/2ε]{\sf TIME}[t]\subseteq{\sf SPACE}[t^{1/2-\varepsilon}] for some ε>0\varepsilon>0, then the answer to the question would be yes, since for natural random-access models (e.g., log-cost RAMs), time tt can be simulated by O(t2)O(t^{2})-time multitape Turing machines [PF79]. A similar statement holds for improving the dd-dimensional simulation of Theorem 1.5 [Lou83]. On the other hand, if the answer to the question is no, then we would separate linear time for multitape Turing machines and linear time for random-access models, another longstanding open question (see for example [GS02]).

Is a time-space tradeoff possible? For example, is 𝖳𝖨𝖬𝖤[t]𝖳𝖨𝖬𝖤𝖲𝖯𝖠𝖢𝖤[2O~(tε),O~(t1ε)]{\sf TIME}[t]\subseteq{\sf TIMESPACE}[2^{\tilde{O}(t^{\varepsilon})},\tilde{O}(t^{1-\varepsilon})], for all ε>0\varepsilon>0? The Cook-Mertz procedure needs to compute a low-degree extension of the function at each node, which is time-consuming: for time blocks of bb steps, it takes 2Θ(b)2^{\Theta(b)} time to compute the low-degree extension at a given point. If low-degree extensions of time-tt computations could be computed more time-efficiently, then such a time-space tradeoff may be possible. Note that if the multilinear extension of a given CNF on nn variables and mm clauses could be evaluated over a large field in 1.999n2o(m)1.999^{n}\cdot 2^{o(m)} time, then the Strong Exponential Time Hypothesis [IP01, CIP09] would be false: the multilinear extension is the identically-zero polynomial if and only if the CNF is unsatisfiable, so one could use DeMillo–Lipton–Schwartz–Zippel [DL78] to test for unsatisfiability. However, this observation does not apparently rule out the possibility of evaluating low-degree but non-multilinear extensions efficiently.
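To make the last remark concrete, here is a brute-force check (our own toy demo, exponential in nn and thus usable only for tiny instances) that the multilinear extension of a CNF's truth table is the zero polynomial iff the CNF is unsatisfiable, so that a single random evaluation over a large prime field detects satisfiability with high probability:

import itertools, random

P = (1 << 61) - 1  # a large prime, so random roots of a degree-n polynomial are rare

def cnf_value(clauses, a):
    # clause: list of signed ints; +i means x_i, -i means NOT x_i (1-indexed)
    return int(all(any((a[abs(l) - 1] == 1) == (l > 0) for l in c) for c in clauses))

def mle_eval(clauses, n, x):
    # sum over {0,1}^n of chi_a(x) * f(a), chi_a(x) = prod_i (x_i a_i + (1-x_i)(1-a_i))
    total = 0
    for a in itertools.product((0, 1), repeat=n):
        chi = 1
        for xi, ai in zip(x, a):
            chi = chi * (xi if ai else (1 - xi)) % P
        total = (total + chi * cnf_value(clauses, a)) % P
    return total

n = 3
sat = [[1, 2], [-1, 3]]            # satisfiable
unsat = [[1], [-1]]                # unsatisfiable
x = [random.randrange(P) for _ in range(n)]
assert mle_eval(unsat, n, x) == 0
assert mle_eval(sat, n, x) != 0    # fails only with probability at most n/P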

Can the simulation be applied recursively, to show that 𝖳𝖨𝖬𝖤[t]𝖲𝖯𝖠𝖢𝖤[tε]{\sf TIME}[t]\subseteq{\sf SPACE}[t^{\varepsilon}] for all ε>0\varepsilon>0? Note that a yes-answer would imply 𝖯𝖯𝖲𝖯𝖠𝖢𝖤{\sf P}\neq{\sf PSPACE}. At first glance, a recursive extension of Theorem 1.1 seems natural: we decompose a time-tt computation into O(t/b)O(t/b) blocks, each of which is a time-bb computation. (This property was successfully exploited recursively in [LW13], for example, to show some weak lower bounds on QBF.) However, we cannot directly apply recursion to time blocks, because in the Cook-Mertz procedure, the task at each function evaluation is not just to simulate for bb steps, but to compute a low-degree extension of the function (as noted in the previous question). If low-degree extensions of time-tt computations can be computed in small space, then there is a possibility of recursion. However, given the discussion on the previous question, there is reason to think this will be difficult.

Is there a barrier to further progress? Perhaps lower bounds for the Node-Named Jumping Automata on Graphs (NNJAG) model (a popular model for restricted space lower bounds) [Poo93, Poo00, EPA99] could demonstrate a barrier. Many complex algorithms such as Reingold’s [Rei08] can be simulated in the NNJAG model [LZPC05]; does the Cook-Mertz algorithm have this property as well? We believe the answer is probably no: NNJAG is fundamentally a node-pebbling model, and the Cook-Mertz procedure definitely breaks space lower bounds for pebbling DAGs and trees [LT82, CMW+12]. Pebbling lower bounds were a major bottleneck to improving [HPV75].

While Theorem 1.1 and Theorem 3.1 are non-relativizing (see for example [HCC+93]), the simulations do permit a restricted form of relativization, as do all prior simulations of time in smaller space. Define 𝖳𝖨𝖬𝖤-𝖫𝖤𝖭𝖦𝖳𝖧A[t(n),(n)]{\sf TIME\text{-}LENGTH}^{A}[t(n),\ell(n)] to be the class of problems solvable in O(t(n))O(t(n)) time with queries to the oracle AA, where all oracle queries are restricted to have length at most (n)\ell(n). Similarly define 𝖲𝖯𝖠𝖢𝖤-𝖫𝖤𝖭𝖦𝖳𝖧A[s(n),(n)]{\sf SPACE\text{-}LENGTH}^{A}[s(n),\ell(n)]. We observe the following extension of Theorem 3.1:

Theorem 5.1.

For every oracle AA and t(n)nt(n)\geq n, 𝖳𝖨𝖬𝖤-𝖫𝖤𝖭𝖦𝖳𝖧A[t(n),t(n)logt(n)]{\sf TIME\text{-}LENGTH}^{A}[t(n),\sqrt{t(n)}\log t(n)] is contained in 𝖲𝖯𝖠𝖢𝖤-𝖫𝖤𝖭𝖦𝖳𝖧A[t(n)logt(n),t(n)logt(n)]{\sf SPACE\text{-}LENGTH}^{A}[\sqrt{t(n)}\log t(n),\sqrt{t(n)}\log t(n)].

The theorem follows because each oracle query can be arranged to be written in a single tape block of length O(t(n)logt(n))O(\sqrt{t(n)}\log t(n)), and the Cook-Mertz procedure treats the evaluation functions as black boxes. Thus each evaluation function Fh,iF_{h,i} can simply call the oracle AA as needed. (Tretkoff [Tre86] made a similar observation for the simulation of 𝖳𝖨𝖬𝖤[t]{\sf TIME}[t] in Σ2𝖳𝖨𝖬𝖤[o(t)]\Sigma_{2}{\sf TIME}[o(t)], of Paul-Pippenger-Szemerédi-Trotter [PPST83]. Such a length-restricted oracle model was also studied in [CW06].) Is there a barrier to improving such (restricted) relativizing results to obtain 𝖯𝖯𝖲𝖯𝖠𝖢𝖤{\sf P}\neq{\sf PSPACE}? This seems related to the fact that Cai and Watanabe [CW06] were unable to collapse 𝖯𝖲𝖯𝖠𝖢𝖤{\sf PSPACE} to 𝖯{\sf P} with “random access to advice” (length-restricted access to oracles).

Acknowledgments.

I am grateful to Rahul Ilango, Egor Lifar, Priya Malhotra, Danil Sibgatullin, and Virginia Vassilevska Williams for valuable discussions on Tree Evaluation and the Cook-Mertz procedure. I am also grateful to Shyan Akmal, Paul Beame, Lijie Chen, Ce Jin, Dylan McKay, and the anonymous STOC reviewers for helpful comments on an earlier draft of this paper.

References

  • [AB09] Sanjeev Arora and Boaz Barak. Computational Complexity - A Modern Approach. Cambridge University Press, 2009. URL: http://www.cambridge.org/catalogue/catalogue.asp?isbn=9780521424264.
  • [AL81] Leonard M. Adleman and Michael C. Loui. Space-bounded simulation of multitape Turing machines. Math. Syst. Theory, 14:215–222, 1981. URL: https://doi.org/10.1007/BF01752397.
  • [CIP09] Chris Calabro, Russell Impagliazzo, and Ramamohan Paturi. The complexity of satisfiability of small depth circuits. In Parameterized and Exact Computation, 4th International Workshop, IWPEC, volume 5917 of Lecture Notes in Computer Science, pages 75–85. Springer, 2009. URL: https://doi.org/10.1007/978-3-642-11269-0_6.
  • [CM20] James Cook and Ian Mertz. Catalytic approaches to the tree evaluation problem. In Proceedings of STOC, pages 752–760. ACM, 2020. URL: https://doi.org/10.1145/3357713.3384316.
  • [CM21] James Cook and Ian Mertz. Encodings and the tree evaluation problem. Electron. Colloquium Comput. Complex., TR21-054, 2021. URL: https://eccc.weizmann.ac.il/report/2021/054.
  • [CM22] James Cook and Ian Mertz. Trading time and space in catalytic branching programs. In 37th Computational Complexity Conference, CCC, volume 234 of LIPIcs, pages 8:1–8:21. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2022. URL: https://doi.org/10.4230/LIPIcs.CCC.2022.8.
  • [CM24] James Cook and Ian Mertz. Tree evaluation is in space O(log n · log log n). In Proceedings of the 56th Annual ACM Symposium on Theory of Computing (STOC), pages 1268–1278. ACM, 2024. URL: https://doi.org/10.1145/3618260.3649664.
  • [CMW+12] Stephen A. Cook, Pierre McKenzie, Dustin Wehr, Mark Braverman, and Rahul Santhanam. Pebbles and branching programs for tree evaluation. ACM Trans. Comput. Theory, 3(2):4:1–4:43, 2012. URL: https://doi.org/10.1145/2077336.2077337.
  • [CW06] Jin-yi Cai and Osamu Watanabe. Random access to advice strings and collapsing results. Algorithmica, 46(1):43–57, 2006. URL: https://doi.org/10.1007/s00453-006-0078-8.
  • [DL78] Richard A. DeMillo and Richard J. Lipton. A probabilistic remark on algebraic program testing. Inf. Process. Lett., 7(4):193–195, 1978. URL: https://doi.org/10.1016/0020-0190(78)90067-4.
  • [DT85] Patrick W. Dymond and Martin Tompa. Speedups of deterministic machines by synchronous parallel machines. J. Comput. Syst. Sci., 30(2):149–161, 1985. URL: https://doi.org/10.1016/0022-0000(85)90011-X.
  • [EPA99] Jeff Edmonds, Chung Keung Poon, and Dimitris Achlioptas. Tight lower bounds for st-connectivity on the NNJAG model. SIAM J. Comput., 28(6):2257–2284, 1999.
  • [FLvMV05] Lance Fortnow, Richard J. Lipton, Dieter van Melkebeek, and Anastasios Viglas. Time-space lower bounds for satisfiability. J. ACM, 52(6):835–865, 2005. URL: https://doi.org/10.1145/1101821.1101822.
  • [Gol08] Oded Goldreich. Computational complexity - a conceptual perspective. Cambridge University Press, 2008. URL: https://doi.org/10.1017/CBO9780511804106.
  • [Gol24] Oded Goldreich. On the Cook-Mertz Tree Evaluation procedure. Electron. Colloquium Comput. Complex., TR24-109, 2024. URL: https://eccc.weizmann.ac.il/report/2024/109.
  • [GS02] Etienne Grandjean and Thomas Schwentick. Machine-independent characterizations and complete problems for deterministic linear time. SIAM J. Comput., 32(1):196–230, 2002. doi:10.1137/S0097539799360240.
  • [HCC+93] Juris Hartmanis, Richard Chang, Suresh Chari, Desh Ranjan, and Pankaj Rohatgi. Relativization: a revisionistic retrospective. In Current Trends in Theoretical Computer Science - Essays and Tutorials, volume 40 of World Scientific Series in Computer Science, pages 537–548. World Scientific, 1993. URL: https://doi.org/10.1142/9789812794499_0040.
  • [HLMW86] Joseph Y. Halpern, Michael C. Loui, Albert R. Meyer, and Daniel Weise. On time versus space III. Math. Syst. Theory, 19(1):13–28, 1986. URL: https://doi.org/10.1007/BF01704903.
  • [HPV75] John E. Hopcroft, Wolfgang J. Paul, and Leslie G. Valiant. On time versus space. J. ACM, 24(2):332–337, 1977. Conference version in FOCS’75. URL: https://doi.org/10.1145/322003.322015.
  • [HS66] F. C. Hennie and Richard Edwin Stearns. Two-tape simulation of multitape Turing machines. J. ACM, 13(4):533–546, 1966. URL: https://doi.org/10.1145/321356.321362.
  • [HU68] John E. Hopcroft and Jeffrey D. Ullman. Relations between time and tape complexities. J. ACM, 15(3):414–427, 1968. URL: https://doi.org/10.1145/321466.321474.
  • [IM83] Oscar H. Ibarra and Shlomo Moran. Some time-space tradeoff results concerning single-tape and offline TM’s. SIAM J. Comput., 12(2):388–394, 1983. URL: https://doi.org/10.1137/0212025.
  • [IP01] Russell Impagliazzo and Ramamohan Paturi. On the complexity of k-SAT. J. Comput. Syst. Sci., 62(2):367–375, 2001. URL: https://doi.org/10.1006/jcss.2000.1727.
  • [Juk12] Stasys Jukna. Boolean Function Complexity - Advances and Frontiers, volume 27 of Algorithms and combinatorics. Springer, 2012. URL: https://doi.org/10.1007/978-3-642-24508-4.
  • [KLRS12] Subrahmanyam Kalyanasundaram, Richard J. Lipton, Kenneth W. Regan, and Farbod Shokrieh. Improved simulation of nondeterministic Turing machines. Theor. Comput. Sci., 417:66–73, 2012. URL: https://doi.org/10.1016/j.tcs.2011.05.018.
  • [KLV03] George Karakostas, Richard J. Lipton, and Anastasios Viglas. On the complexity of intersecting finite state automata and NL versus NP. Theor. Comput. Sci., 302(1-3):257–274, 2003. URL: https://doi.org/10.1016/S0304-3975(02)00830-7.
  • [Kur64] S.-Y. Kuroda. Classes of languages and linear-bounded automata. Information and control, 7(2):207–223, 1964.
  • [LL90] Maciej Liskiewicz and Krzysztof Lorys. Fast simulations of time-bounded one-tape Turing machines by space-bounded ones. SIAM J. Comput., 19(3):511–521, 1990. URL: https://doi.org/10.1137/0219034.
  • [Lou81] Michael C. Loui. A space bound for one-tape multidimensional Turing machines. Theor. Comput. Sci., 15:311–320, 1981. URL: https://doi.org/10.1016/0304-3975(81)90084-0.
  • [Lou83] Michael C. Loui. Optimal dynamic embedding of trees into arrays. SIAM J. Comput., 12(3):463–472, 1983. doi:10.1137/0212030.
  • [LT82] Thomas Lengauer and Robert Endre Tarjan. Asymptotically tight bounds on time-space trade-offs in a pebble game. J. ACM, 29(4):1087–1130, 1982. doi:10.1145/322344.322354.
  • [LW13] Richard J. Lipton and Ryan Williams. Amplifying circuit lower bounds against polynomial time, with applications. Comput. Complex., 22(2):311–343, 2013. URL: https://doi.org/10.1007/s00037-013-0069-5.
  • [LZPC05] Pinyan Lu, Jialin Zhang, Chung Keung Poon, and Jin-yi Cai. Simulating undirected st-connectivity algorithms on uniform JAGs and NNJAGs. In Proceedings of 16th International Symposium on Algorithms and Computation (ISAAC), volume 3827 of Lecture Notes in Computer Science, pages 767–776. Springer, 2005. URL: https://doi.org/10.1007/11602613_77.
  • [MW17] Cody D. Murray and R. Ryan Williams. Easiness amplification and uniform circuit lower bounds. In 32nd Computational Complexity Conference (CCC), volume 79 of LIPIcs, pages 8:1–8:21. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2017. URL: https://doi.org/10.4230/LIPIcs.CCC.2017.8.
  • [NW94] Noam Nisan and Avi Wigderson. Hardness vs randomness. J. Comput. Syst. Sci., 49(2):149–167, 1994. URL: https://doi.org/10.1016/S0022-0000(05)80043-1.
  • [Pat72] Mike Paterson. Tape bounds for time-bounded Turing machines. J. Comput. Syst. Sci., 6(2):116–124, 1972. URL: https://doi.org/10.1016/S0022-0000(72)80017-5.
  • [PF79] Nicholas Pippenger and Michael J. Fischer. Relations among complexity measures. J. ACM, 26(2):361–381, 1979. URL: https://doi.org/10.1145/322123.322138.
  • [Pip77] Nicholas Pippenger. Fast simulation of combinational logic networks by machines without random-access storage. In Proceedings of the Fifteenth Annual Allerton Conference on Communication, Control and Computing, pages 25–33, 1977.
  • [Poo93] Chung Keung Poon. Space bounds for graph connectivity problems on node-named JAGs and node-ordered JAGs. In 34th Annual Symposium on Foundations of Computer Science (FOCS), pages 218–227. IEEE Computer Society, 1993. URL: https://doi.org/10.1109/SFCS.1993.366865.
  • [Poo00] Chung Keung Poon. A space lower bound for st-connectivity on node-named JAGs. Theor. Comput. Sci., 237(1-2):327–345, 2000. URL: https://doi.org/10.1016/S0304-3975(00)00019-0.
  • [PPST83] Wolfgang J. Paul, Nicholas Pippenger, Endre Szemerédi, and William T. Trotter. On determinism versus non-determinism and related problems (preliminary version). In 24th Annual Symposium on Foundations of Computer Science (FOCS), pages 429–438. IEEE Computer Society, 1983.
  • [PR80] Wolfgang J. Paul and Rüdiger Reischuk. On alternation II. A graph theoretic approach to determinism versus nondeterminism. Acta Informatica, 14:391–403, 1980. URL: https://doi.org/10.1007/BF00286494.
  • [PR81] Wolfgang J. Paul and Rüdiger Reischuk. On time versus space II. J. Comput. Syst. Sci., 22(3):312–327, 1981. URL: https://doi.org/10.1016/0022-0000(81)90035-0.
  • [PV76] Mike Paterson and Leslie G. Valiant. Circuit size is nonlinear in depth. Theor. Comput. Sci., 2(3):397–400, 1976. URL: https://doi.org/10.1016/0304-3975(76)90090-6.
  • [Raz91] Alexander A. Razborov. Lower bounds for deterministic and nondeterministic branching programs. In Lothar Budach, editor, 8th International Symposium on Fundamentals of Computation Theory (FCT), volume 529 of Lecture Notes in Computer Science, pages 47–60. Springer, 1991. URL: https://doi.org/10.1007/3-540-54458-5_49.
  • [Rei08] Omer Reingold. Undirected connectivity in log-space. J. ACM, 55(4):17:1–17:24, 2008. doi:10.1145/1391289.1391291.
  • [SHL65] Richard Edwin Stearns, Juris Hartmanis, and Philip M. Lewis II. Hierarchies of memory limited computations. In 6th Annual Symposium on Switching Circuit Theory and Logical Design, pages 179–190. IEEE Computer Society, 1965. URL: https://doi.org/10.1109/FOCS.1965.11.
  • [Sip88] Michael Sipser. Expanders, randomness, or time versus space. J. Comput. Syst. Sci., 36(3):379–383, 1988. URL: https://doi.org/10.1016/0022-0000(88)90035-9.
  • [SM73] Larry J. Stockmeyer and Albert R. Meyer. Word problems requiring exponential time: Preliminary report. In Proceedings of STOC, pages 1–9. ACM, 1973. URL: https://doi.org/10.1145/800125.804029.
  • [SSZ98] Michael E. Saks, Aravind Srinivasan, and Shiyu Zhou. Explicit or-dispersers with polylogarithmic degree. J. ACM, 45(1):123–154, 1998. URL: https://doi.org/10.1145/273865.273915.
  • [SvEB88] Cees F. Slot and Peter van Emde Boas. The problem of space invariance for sequential machines. Inf. Comput., 77(2):93–122, 1988. URL: https://doi.org/10.1016/0890-5401(88)90052-1.
  • [Swa78] Sowmitri Swamy. On Space-Time Tradeoffs. PhD thesis, Brown University, USA, 1978. URL: https://cs.brown.edu/research/pubs/theses/phd/1978/swamy.pdf.
  • [Tre86] Carol Tretkoff. Bounded oracles and complexity classes inside linear space. In Alan L. Selman, editor, Proceedings of Structure in Complexity Theory, volume 223 of Lecture Notes in Computer Science, pages 347–361. Springer, 1986. URL: https://doi.org/10.1007/3-540-16486-3_110.
  • [vEB90] Peter van Emde Boas. Machine models and simulation. In Jan van Leeuwen, editor, Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity, pages 1–66. Elsevier and MIT Press, 1990.
  • [Wil08] Ryan Williams. Non-linear time lower bound for (succinct) quantified boolean formulas. Electron. Colloquium Comput. Complex., TR08-076, 2008. URL: https://eccc.weizmann.ac.il/eccc-reports/2008/TR08-076/index.html.

Appendix A Appendix: An Overview of The Cook-Mertz Procedure

The goal of this appendix is to describe the Cook-Mertz procedure for Tree Evaluation, and how it extends to arbitrary dd-ary trees of maximum height hh, and not just complete dd-ary trees of height hh, with the full set of (dh1)/(d1)(d^{h}-1)/(d-1) possible nodes. In particular, any tree with every inner node having at most dd children and depth at most hh can be evaluated using their algorithm.

Reminder of Theorem 2.2. [CM24, Theorem 7] Tree Evaluation on trees of bit-length bb, maximum height hh, and fan-in at most dd, can be computed in O(db+hlog(db))O(d\cdot b+h\log(d\cdot b)) space.

Our description below heavily draws from Goldreich’s exposition of the Cook-Mertz procedure [Gol24]. Let us stress that we make absolutely no claims of originality in the exposition below; we are only including the following in order to make our paper more self-contained.

First, let us describe how to handle inner nodes in the tree which have fewer than dd children. Letting deg(u)\deg(u) denote the number of children of uu, for all j[b]j\in[b] we let fu,j:{0,1}deg(u)b{0,1}f_{u,j}:\{0,1\}^{\deg(u)\cdot b}\rightarrow\{0,1\} be the function which returns the jj-th bit of the function fuf_{u} computed at node uu. For every node uu with deg(u)<d\deg(u)<d, we add extra children to uu which are simply leaves with value 0b0^{b}, so that uu has exactly dd children. The resulting new functions fu,j:{0,1}db{0,1}f_{u,j}:\{0,1\}^{d\cdot b}\rightarrow\{0,1\} at node uu are defined to simply ignore all bits of input with index larger than deg(u)b\deg(u)\cdot b. Now all inner nodes of the tree have exactly dd children, and all values of inner nodes remain the same as before. For our intended application to an implicit Tree Evaluation instance, we note that the reduction of this paragraph can also be implemented in a space-efficient way: we only need to be able to check the number of children of the current node uu, which we can easily do in our applications (e.g., Theorem 3.1 and Theorem 1.5). In particular, in our reduction the memory stores the entire computation graph which defines the tree, so we know the number of children of every tree node.
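A tiny sketch of this padding step (the function and variable names here are hypothetical, not from [CM24]):

def pad_to_arity(f_u, deg_u, d, b):
    # wrap f_u (which reads deg_u * b bits) so it formally reads d * b bits
    def f_padded(bits):
        assert len(bits) == d * b
        return f_u(bits[: deg_u * b])    # bits of the padded 0^b children are ignored
    return f_padded

# example: a 1-child node padded to arity d = 2, with b = 3
f = pad_to_arity(lambda bits: bits[::-1], deg_u=1, d=2, b=3)
assert f([1, 1, 0, 0, 0, 0]) == [0, 1, 1]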

Let 𝔽{\mathbb{F}} be a field of characteristic two such that |𝔽|db2|{\mathbb{F}}|\geq db^{2}. For a node uu and index j[b]j\in[b], let f~u,j\widetilde{f}_{u,j} be the multilinear extension (over 𝔽{\mathbb{F}}) of the function fu,j:{0,1}db{0,1}f_{u,j}:\{0,1\}^{d\cdot b}\rightarrow\{0,1\} which returns the jj-th bit of the function fuf_{u} computed at node uu. In particular, f~u,j\widetilde{f}_{u,j} is a polynomial of degree dbd\cdot b on dbd\cdot b variables. Letting xi\vec{x_{i}} denote a block of bb variables for i=1,,di=1,\ldots,d,

f~u,j(x1,,xd)=a1,,ad{0,1}bχa1,,ad(x1,,xd)fu,j(a1,,ad),\displaystyle\widetilde{f}_{u,j}(\vec{x_{1}},\ldots,\vec{x_{d}})=\sum_{a_{1},\ldots,a_{d}\in\{0,1\}^{b}}\chi_{a_{1},\ldots,a_{d}}(\vec{x_{1}},\ldots,\vec{x_{d}})\cdot f_{u,j}(a_{1},\ldots,a_{d}), (2)

where χa1,,ad(x1,,xd)\chi_{a_{1},\ldots,a_{d}}(\vec{x_{1}},\ldots,\vec{x_{d}}) is the unique multilinear polynomial of degree dbd\cdot b such that

χa1,,ad(a1,,ad)=1,\chi_{a_{1},\ldots,a_{d}}(a_{1},\ldots,a_{d})=1,

and χa1,,ad(a1,,ad)=0\chi_{a_{1},\ldots,a_{d}}(a^{\prime}_{1},\ldots,a^{\prime}_{d})=0 for all (a1,,ad)({0,1}b)d(a^{\prime}_{1},\ldots,a^{\prime}_{d})\in(\{0,1\}^{b})^{d} such that (a1,,ad)(a1,,ad)(a^{\prime}_{1},\ldots,a^{\prime}_{d})\neq(a_{1},\ldots,a_{d}).
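Concretely, one standard choice of this Lagrange-type polynomial (spelled out here for completeness; we write x_i^{(j)} and a_i^{(j)} for the jj-th coordinates of \vec{x_i} and a_i) is

\chi_{a_{1},\ldots,a_{d}}(\vec{x_{1}},\ldots,\vec{x_{d}}) \;=\; \prod_{i=1}^{d}\prod_{j=1}^{b}\left(x_{i}^{(j)}a_{i}^{(j)}+(1-x_{i}^{(j)})(1-a_{i}^{(j)})\right),

which is multilinear of degree dbd\cdot b, equals 1 at (a_1,\ldots,a_d), and vanishes at every other Boolean point; over characteristic two, each factor simplifies to x_i^{(j)}+a_i^{(j)}+1.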

Let [d][d]^{\star} be the set of all strings over the alphabet {1,,d}\{1,\ldots,d\} (including the empty string). Observe that for every node with label u[d]u\in[d]^{\star}, its children have labels u1,,udu1,\ldots,ud. Thus by definition, the value of uu is

vu\displaystyle v_{u} =fu(vu1,,vud)=(fu,1(vu1,,vud),,fu,b(vu1,,vud))\displaystyle=f_{u}(v_{u1},\ldots,v_{ud})=(f_{u,1}(v_{u1},\ldots,v_{ud}),\ldots,f_{u,b}(v_{u1},\ldots,v_{ud})) (3)
=(f~u,1(vu1,,vud),,f~u,b(vu1,,vud)).\displaystyle=(\widetilde{f}_{u,1}(v_{u1},\ldots,v_{ud}),\ldots,\widetilde{f}_{u,b}(v_{u1},\ldots,v_{ud})). (4)

The Tree Evaluation procedure has two types of storage:

  • One storage type is “local”, holding the index of the current node (O(hlogd)O(h\log d) bits), as well as a recursion stack of height at most hh with O(log(db))O(\log(d\cdot b)) bits stored at each level of recursion, which is O(hlog(db))O(h\log(d\cdot b)) bits in total.

  • The other type of storage is “catalytic” or “global” storage ([Gol24]). This O(db)O(d\cdot b) space is used to store d+1d+1 separate bb-bit blocks. In the final version of the algorithm that we describe, each bb-bit block corresponds to storing O(b/log|𝔽|)O(b/\log|{\mathbb{F}}|) elements of 𝔽{\mathbb{F}}. These blocks are repeatedly reused at every node of the tree, to evaluate the functions at each node.

    In the following, for simplicity, we will describe an O(dblog(db))O(d\cdot b\log(d\cdot b)) space bound using multilinear extensions; with minor modifications, Cook-Mertz [CM24] (see also Goldreich [Gol24]) show how relaxing to merely low-degree extensions allows us to use only O(db)O(d\cdot b) space. At the end, we will discuss how to achieve this space bound.

At any point in time, we will denote the entire storage content of the algorithm by (u,x^1,,x^d,y^)(u,\hat{x}_{1},\ldots,\hat{x}_{d},\hat{y}), where

  • u[d]u\in[d]^{\star}, |u|h|u|\leq h, is the label of a node in the tree, denoting the path from root to that node. (ε\varepsilon denotes the root node, i[d]i\in[d] denotes the ii-th child of the root, etc.),

  • each x^i\hat{x}_{i} is a collection of bb registers x^i(1),,x^i(b)\hat{x}^{(1)}_{i},\ldots,\hat{x}^{(b)}_{i} holding bb elements of 𝔽{\mathbb{F}}, and

  • y^\hat{y} is a collection of bb registers y^(1),,y^(b)\hat{y}^{(1)},\ldots,\hat{y}^{(b)}, also holding bb elements of 𝔽{\mathbb{F}}.

Initially, we set the storage content to be

(ε,0,,0),(\varepsilon,\vec{0},\ldots,\vec{0}),

i.e., all-zeroes, starting at the root node.

For a node label uu, let vu{0,1}bv_{u}\in\{0,1\}^{b} be the value of uu; we think of vuv_{u} as an element of 𝔽b{\mathbb{F}}^{b}. Our goal is to compute vεv_{\varepsilon}, the value of the root of the tree. We will give a recursive procedure Add which takes storage content (u,x^1,,x^d,y^)(u,\hat{x}_{1},\ldots,\hat{x}_{d},\hat{y}) and returns the content (u,x^1,,x^d,y^+vu)(u,\hat{x}_{1},\ldots,\hat{x}_{d},\hat{y}+v_{u}). That is, Add adds the value vuv_{u} to the last register, over the field 𝔽{\mathbb{F}}. Observe that, if Add works correctly, then calling Add on (ε,0,,0)(\varepsilon,\vec{0},\ldots,\vec{0}) will put the value of the root node into the last register.

Now we describe Add. Let mm be such that |𝔽|=m+1db|{\mathbb{F}}|=m+1\geq d\cdot b, and let ω\omega be an mm-th root of unity in 𝔽{\mathbb{F}}.

Add: Given the storage content (u,x^1,,x^d,y^)(u,\hat{x}_{1},\ldots,\hat{x}_{d},\hat{y}),
If uu is a leaf, look up the value vuv_{u} in the input, and return (u,x^1,,x^d,y^+vu)(u,\hat{x}_{1},\ldots,\hat{x}_{d},\hat{y}+v_{u}).
For i=1,,mi=1,\ldots,m,
   For r=1,,dr=1,\ldots,d,
     Rotate registers, so (x^r,,x^d,y^,x^1,,x^r1)(\hat{x}_{r},\ldots,\hat{x}_{d},\hat{y},\hat{x}^{\prime}_{1},\ldots,\hat{x}^{\prime}_{r-1}) shifts to (x^r+1,,x^d,y^,x^1,,x^r1,x^r)(\hat{x}_{r+1},\ldots,\hat{x}_{d},\hat{y},\hat{x}^{\prime}_{1},\ldots,\hat{x}^{\prime}_{r-1},\hat{x}_{r}).
     Multiply x^r\hat{x}_{r} by ωi\omega^{i}, and call Add on (ur,x^r+1,,x^d,y^,x^1,,x^r1,ωix^r)(ur,\hat{x}_{r+1},\ldots,\hat{x}_{d},\hat{y},\hat{x}^{\prime}_{1},\ldots,\hat{x}^{\prime}_{r-1},\omega^{i}\cdot\hat{x}_{r}),
       which returns (ur,x^r+1,,x^d,y^,x^1,,x^r1,x^r)(ur,\hat{x}_{r+1},\ldots,\hat{x}_{d},\hat{y},\hat{x}^{\prime}_{1},\ldots,\hat{x}^{\prime}_{r-1},\hat{x}^{\prime}_{r}), where x^r=ωix^r+vur\hat{x}^{\prime}_{r}=\omega^{i}\cdot\hat{x}_{r}+v_{ur}.
       (here, urur is just the concatenation of the string uu with the symbol rr)
   (at this point, the storage has the form: (ud,y^,ωix^1+vu1,,ωix^d+vud)(ud,\hat{y},\omega^{i}\cdot\hat{x}_{1}+v_{u1},\ldots,\omega^{i}\cdot\hat{x}_{d}+v_{ud}))
   Update udud back to uu.
   For all j=1,,bj=1,\ldots,b,
     Compute f~u,j(ωix^1+vu1,,ωix^d+vud)\widetilde{f}_{u,j}(\omega^{i}\cdot\hat{x}_{1}+v_{u1},\ldots,\omega^{i}\cdot\hat{x}_{d}+v_{ud}) using O(b)O(b) extra space.
     Update y^(j)=y^(j)+f~u,j(ωix^1+vu1,,ωix^d+vud)\hat{y}^{(j)}=\hat{y}^{(j)}+\widetilde{f}_{u,j}(\omega^{i}\cdot\hat{x}_{1}+v_{u1},\ldots,\omega^{i}\cdot\hat{x}_{d}+v_{ud}).
     Erase the O(b)O(b) space used to compute f~u,j\widetilde{f}_{u,j}.
   For r=d,,1r=d,\ldots,1,
     Call Add on (ur,x^r+1,,x^d,y^,ωix^1+vu1,,ωix^r+vur)(ur,\hat{x}_{r+1},\ldots,\hat{x}_{d},\hat{y},\omega^{i}\cdot\hat{x}_{1}+v_{u1},\ldots,\omega^{i}\cdot\hat{x}_{r}+v_{ur}),
       which returns (ur,x^r+1,,x^d,y^,ωix^1+vu1,,ωix^r1+vu(r1),x^r′′)(ur,\hat{x}_{r+1},\ldots,\hat{x}_{d},\hat{y},\omega^{i}\cdot\hat{x}_{1}+v_{u1},\ldots,\omega^{i}\cdot\hat{x}_{r-1}+v_{u(r-1)},\hat{x}^{\prime\prime}_{r}), where x^r′′=ωix^r\hat{x}^{\prime\prime}_{r}=\omega^{i}\cdot\hat{x}_{r}.
       (note: here we use the fact that 𝔽{\mathbb{F}} is characteristic two)
     Divide x^r′′\hat{x}^{\prime\prime}_{r} by ωi\omega^{i}, so that x^r′′=x^r\hat{x}^{\prime\prime}_{r}=\hat{x}_{r}.
     Rotate registers, so (x^r+1,,x^d,y^,ωix^1+vu1,,ωix^r1+vu(r1),x^r)(\hat{x}_{r+1},\ldots,\hat{x}_{d},\hat{y},\omega^{i}\cdot\hat{x}_{1}+v_{u1},\ldots,\omega^{i}\cdot\hat{x}_{r-1}+v_{u(r-1)},\hat{x}_{r}) shifts to
        (x^r,x^r+1,,x^d,y^,ωix^1+vu1,,ωix^r1+vu(r1))(\hat{x}_{r},\hat{x}_{r+1},\ldots,\hat{x}_{d},\hat{y},\omega^{i}\cdot\hat{x}_{1}+v_{u1},\ldots,\omega^{i}\cdot\hat{x}_{r-1}+v_{u(r-1)}).
   Update u1u1 back to uu.
(claim: now the storage has the form (u,x^1,,x^d,y^+f~u(vu1,,vud))(u,\hat{x}_{1},\ldots,\hat{x}_{d},\hat{y}+\widetilde{f}_{u}(v_{u1},\ldots,v_{ud})))
(note that vu=f~u(vu1,,vud)v_{u}=\widetilde{f}_{u}(v_{u1},\ldots,v_{ud}), by (3) and (4))
Return (u,x^1,,x^d,y^+vu)(u,\hat{x}_{1},\ldots,\hat{x}_{d},\hat{y}+v_{u}).

Recall that 𝔽{\mathbb{F}} has characteristic two, so that adding vurv_{ur} twice to the register x^r\hat{x}_{r} has a net contribution of zero. The key to the correctness of Add (and the claim in the pseudocode) is the following polynomial interpolation formula (proved in [CM24]): for every mm-th root of unity ω\omega over 𝔽{\mathbb{F}}, for every x^1,,x^d,vu1,,vud𝔽b\hat{x}_{1},\ldots,\hat{x}_{d},v_{u1},\ldots,v_{ud}\in{\mathbb{F}}^{b}, and for every polynomial PP of degree less than mm,

i=1mP(ωix^1+vu1,,ωix^d+vud)=P(vu1,,vud).\displaystyle\sum_{i=1}^{m}P(\omega^{i}\cdot\hat{x}_{1}+v_{u1},\ldots,\omega^{i}\cdot\hat{x}_{d}+v_{ud})=P(v_{u1},\ldots,v_{ud}). (5)

(Note that our formula is slightly simpler than Cook-Mertz [CM24] and Goldreich [Gol24], because we assume 𝔽{\mathbb{F}} is a field of characteristic two.) Equation (5) ensures that the content of y^\hat{y} is indeed updated to be y^+fu~(vu1,,vud)\hat{y}+\widetilde{f_{u}}(v_{u1},\ldots,v_{ud}), which equals y^+vu\hat{y}+v_{u} by equations (3) and (4).
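For a minimal numeric check of equation (5) (a demo of our own, not from [CM24]: we take GF(2^4) with the primitive polynomial x^4+x+1, so m=15 and ω=x is an mm-th root of unity; for simplicity PP is univariate, i.e., a single register holding one field element):

import random

IRRED = 0b10011  # x^4 + x + 1, a primitive polynomial over GF(2)

def gf_mul(a, b):
    # carry-less multiplication in GF(16), reducing modulo IRRED
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0b10000:
            a ^= IRRED
        b >>= 1
    return r

def gf_pow(a, e):
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

def poly_eval(coeffs, z):
    # Horner's rule; addition in characteristic two is XOR
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, z) ^ c
    return acc

m, omega = 15, 0b0010                                 # omega = x generates GF(16)*
coeffs = [random.randrange(16) for _ in range(m)]     # a random P with deg(P) < m
x_hat, v = random.randrange(16), random.randrange(16)
total = 0
for i in range(1, m + 1):
    total ^= poly_eval(coeffs, gf_mul(gf_pow(omega, i), x_hat) ^ v)
assert total == poly_eval(coeffs, v)                  # equation (5)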

Let us briefly compare Add to the “obvious” algorithm for Tree Evaluation. The “obvious” algorithm allocates fresh new space for each recursive call to store the values of the children, traversing the tree in a depth-first manner. Each level of the stack holds O(db)O(d\cdot b) bits, and this approach takes Θ(hdb)\Theta(h\cdot d\cdot b) space overall. In contrast, the algorithm Add adds the values of the dd children to the existing O(dblog(db))O(d\cdot b\log(d\cdot b))-bit content of the dd registers, and uses polynomial interpolation to add the correct value of the node to the last register.

More precisely, while Add has no initial control over the content of x^1,,x^d\hat{x}_{1},\ldots,\hat{x}_{d}, equation (5) allows Add to perform a type of “worst-case to arbitrary-case” reduction: in order to add the value of the function fuf_{u} on specific vu1,,vudv_{u1},\ldots,v_{ud} to the register y^\hat{y}, given that the initial space content is some arbitrary x^1,,x^d,y^\hat{x}_{1},\ldots,\hat{x}_{d},\hat{y}, it suffices to perform a running sum from i=1i=1 to mm: we multiply the x^1,,x^d\hat{x}_{1},\ldots,\hat{x}_{d} content by ωi\omega^{i}, add vu1,,vudv_{u1},\ldots,v_{ud} into the current space content, evaluate the functions f~u,j\tilde{f}_{u,j} on that content, and accumulate the results into y^\hat{y}; after summing over i=1,,mi=1,\ldots,m, this adds the value vuv_{u} into y^\hat{y}. That is, starting from arbitrary values in the registers which are beyond our control, we can compute fuf_{u} on any desired input registers.

The overall space usage of the above procedure is

O(hlog(db)+dblog(db)).O(h\cdot\log(d\cdot b)+d\cdot b\log(d\cdot b)).

The “local” storage is used to store the current values i[m]i\in[m] and r[d]r\in[d], plus O(1)O(1) bits for a program counter (recording which of the two sets of recursive calls we are in), at each level of the tree. This takes O(log(db))O(\log(d\cdot b)) bits for each level of recursion, and O(hlog(db))O(h\cdot\log(d\cdot b)) space overall. The node index u[d]u\in[d]^{\star} is also stored, which takes O(hlogd)O(h\cdot\log d) bits.

The “global” storage holds the content (x^1,,x^d,y^)(\hat{x}_{1},\ldots,\hat{x}_{d},\hat{y}). Each x^i\hat{x}_{i} and y^\hat{y} consists of bb elements of 𝔽{\mathbb{F}}, which take O(blog(db))O(b\log(d\cdot b)) bits each. To compute the multilinear extension f~u,j\widetilde{f}_{u,j} on (ωix^1+vu1,,ωix^d+vud)(\omega^{i}\cdot\hat{x}_{1}+v_{u1},\ldots,\omega^{i}\cdot\hat{x}_{d}+v_{ud}), we follow equation (2): we use O(b)O(b) bits of space to sum over all a1,,ad{0,1}ba_{1},\ldots,a_{d}\in\{0,1\}^{b}, and we use O(log|𝔽|)O(log(db))O(\log|{\mathbb{F}}|)\leq O(\log(d\cdot b)) extra space to evaluate the expression χa1,,ad(ωix^1+vu1,,ωix^d+vud)\chi_{a_{1},\ldots,a_{d}}(\omega^{i}\cdot\hat{x}_{1}+v_{u1},\ldots,\omega^{i}\cdot\hat{x}_{d}+v_{ud}) and multiply it with the value fu,j(a1,,ad){0,1}f_{u,j}(a_{1},\ldots,a_{d})\in\{0,1\}.

Finally, we describe how to improve the space usage to O(hlog(db)+db)O(h\log(d\cdot b)+d\cdot b). The idea is to use “low-degree extensions” of fuf_{u}, rather than multilinear extensions, and to group the bb-bit output of fuf_{u} into approximately b/log|𝔽|\lceil b/\log|{\mathbb{F}}|\rceil blocks, instead of bb blocks. In more detail, we modify the polynomials f~u,j\widetilde{f}_{u,j} over 𝔽{\mathbb{F}} so that they have O(d(b/log|𝔽|))O(d\cdot(b/\log|{\mathbb{F}}|)) variables instead of O(db)O(d\cdot b) variables, and the jj ranges over a set {1,,cb/log|𝔽|}\{1,\ldots,\lceil cb/\log|{\mathbb{F}}|\rceil\} for a constant c>0c>0, instead of [b]={1,,b}[b]=\{1,\ldots,b\}. To accommodate this, we will need to adjust the order of 𝔽{\mathbb{F}} slightly.

Let |𝔽|=2q|{\mathbb{F}}|=2^{q}, for an even integer qq to be determined later (recall 𝔽{\mathbb{F}} is characteristic two). Let S𝔽S\subseteq{\mathbb{F}} be a set of cardinality 2q/22^{q/2}, which is put in one-to-one correspondence with the elements of {0,1}q/2\{0,1\}^{q/2}, and let t=b/log|S|t=\lceil b/\log|S|\rceil. We can then view the function fu:{0,1}db{0,1}bf_{u}:\{0,1\}^{d\cdot b}\rightarrow\{0,1\}^{b} as instead having domain (St)d(S^{t})^{d}, and co-domain StS^{t}. That is, we can think of the output of fuf_{u} as the concatenation of tt subfunctions (fu,1,,fu,t)(f_{u,1},\ldots,f_{u,t}), with each fu,j:(St)dSf_{u,j}:(S^{t})^{d}\rightarrow S. For each j=1,,tj=1,\ldots,t, the multilinear polynomial f~u,j\widetilde{f}_{u,j} of equation (2) can then be replaced by another polynomial in dtd\cdot t variables over 𝔽{\mathbb{F}}:

f~u,j(x1,,xd)=a1,,adStχa1,,ad(x1,,xd)fu,j(a1,,ad),\displaystyle\widetilde{f}_{u,j}(\vec{x_{1}},\ldots,\vec{x_{d}})=\sum_{a_{1},\ldots,a_{d}\in S^{t}}\chi_{a_{1},\ldots,a_{d}}(\vec{x_{1}},\ldots,\vec{x_{d}})\cdot f_{u,j}(a_{1},\ldots,a_{d}), (6)

where again χa1,,ad(x1,,xd)\chi_{a_{1},\ldots,a_{d}}(\vec{x_{1}},\ldots,\vec{x_{d}}) is a polynomial over 𝔽{\mathbb{F}} such that χa1,,ad(a1,,ad)=1\chi_{a_{1},\ldots,a_{d}}(a_{1},\ldots,a_{d})=1, and χa1,,ad\chi_{a_{1},\ldots,a_{d}} vanishes on all a1,,adSta^{\prime}_{1},\ldots,a^{\prime}_{d}\in S^{t} such that (a1,,ad)(a1,,ad)(a^{\prime}_{1},\ldots,a^{\prime}_{d})\neq(a_{1},\ldots,a_{d}). Such a polynomial χa1,,ad\chi_{a_{1},\ldots,a_{d}} can be constructed with degree d(|S|1)t=d(|𝔽|1)td\cdot(|S|-1)\cdot t=d\cdot(\sqrt{|{\mathbb{F}}|}-1)\cdot t. As long as this quantity is less than |𝔽||{\mathbb{F}}|, we can pick an mm-th root of unity ω\omega with m=|𝔽|1m=|{\mathbb{F}}|-1 and we may apply the polynomial interpolation formula of equation (5). WLOG, we may assume dd and bb are even powers of two (otherwise, we can round them up to the closest such powers of two). Setting 2q=d2b22^{q}=d^{2}b^{2}, we have

d(|S|1)td(db1)b<d2b2=|𝔽|.d\cdot(|S|-1)\cdot t\leq d\cdot(d\cdot b-1)\cdot b<d^{2}b^{2}=|{\mathbb{F}}|.

As a result, each of the registers x^1,,x^d,y^\hat{x}_{1},\ldots,\hat{x}_{d},\hat{y} can now be represented with t=b/log|S|t=\lceil b/\log|S|\rceil elements of 𝔽{\mathbb{F}}, rather than bb elements of 𝔽{\mathbb{F}} as before. Since log|S|=Θ(log|𝔽|)\log|S|=\Theta(\log|{\mathbb{F}}|), each such register can now be represented with O(b)O(b) bits, so the d+1d+1 registers take O(db)O(d\cdot b) bits in total, and the new polynomials of (6) can still be evaluated in O(b)O(b) space.