Logical Methods in Computer Science (16311). Submitted Apr. 18, 2014; published Aug. 20, 2020.

On the Expressive Power of
Higher-Order Pushdown Systems

Paweł Parys, University of Warsaw, ul. Banacha 2, 02-097 Warszawa, Poland

Abstract.

We show that deterministic collapsible pushdown automata of second order can recognize a language that is not recognizable by any deterministic higher-order pushdown automaton (without collapse) of any order. This implies that there exists a tree generated by a second order collapsible pushdown system (equivalently, by a recursion scheme of second order) that is not generated by any deterministic higher-order pushdown system (without collapse) of any order (equivalently, by any safe recursion scheme of any order). As a side effect, we present a pumping lemma for deterministic higher-order pushdown automata, which may be useful in other applications.

Key words and phrases:

Higher-order pushdown systems, collapse, higher-order recursion schemes

1991 Mathematics Subject Classification:

F.1.1. Models of computation—Relations between models

Work supported by the National Science Center (decision DEC-2012/07/D/ST6/02443). The author holds a post-doctoral position supported by the Warsaw Center of Mathematics and Computer Science.

This is a full version of our conference paper [ho-new].

1. Introduction

Already in the 1970s, Maslov [Mas74, Mas76] generalized the concept of pushdown automata to higher-order pushdown automata ( $n$ -PDA) by allowing the stack to contain other stacks rather than just atomic elements. In the last decade, these automata have attracted renewed interest. They are now studied not only as acceptors of string languages, but also as generators of graphs and trees. A natural question was whether the class of trees generated by $n$ -PDA coincides with the class of trees generated by order- $n$ recursion schemes. Knapik, Niwiński, and Urzyczyn [easy-trees] showed a related but different result: this class coincides with the class of trees generated by safe order- $n$ recursion schemes (safety is a syntactic restriction on the recursion scheme). Caucal [Caucal02] gave another characterization: trees of order $n+1$ are obtained from trees of order $n$ by an MSO-interpretation of a graph, followed by unfolding.

Driven by the question whether safety implies a semantic restriction on recursion schemes, Hague, Murawski, Ong, and Serre [collapsible] extended the model of $n$ -PDA to order- $n$ collapsible pushdown automata ( $n$ -CPDA) by introducing a new stack operation called collapse, and proved that the class of trees generated by $n$ -CPDA coincides with the class of trees generated by order- $n$ recursion schemes (earlier, Knapik, Niwiński, Urzyczyn, and Walukiewicz [panic] introduced panic automata, a model equivalent to $2$ -CPDA). Let us mention that these trees have decidable MSO theories [ong-lics], and that higher-order recursion schemes are closely connected with the verification of some real-life higher-order programs [Kobayashi09].

Nevertheless, it remained open whether these two hierarchies of trees are in fact the same. This problem was stated by Knapik et al. [easy-trees] and repeated in other papers concerning higher-order pushdown automata [panic, AehligMO05, ong-lics, collapsible]. A partial answer was given in our previous paper [parys-panic]: there is a tree generated by a $2$ -CPDA that is not generated by any $2$ -PDA. We prove the following stronger property.

Theorem 1.

There is a tree generated by a $2$ -CPDA (equivalently, by a recursion scheme of order $2$ ) that is not generated by any $n$ -PDA, for any $n$ (equivalently, by any safe recursion scheme of any order).

This confirms that the correspondence between higher-order recursion schemes and higher-order pushdown automata is not perfect. The tree used in Theorem 1 (after some adaptations) comes from Knapik et al. [easy-trees], and since then it has been conjectured to be a good separating example.

In this paper we work with PDA that recognize words instead of generating trees. While PDA used to recognize word languages may in general be nondeterministic, trees generated by PDA closely correspond to word languages recognized by deterministic PDA. Technically, we prove the following theorem, from which Theorem 1 follows (Section 3 explains how these theorems are related).

Theorem 2.

There is a language recognized by a deterministic $2$ -CPDA that is not recognized by any deterministic $n$ -PDA, for any $n$ .

As a side effect, in Section LABEL:sec:pumping we present a pumping lemma for higher-order pushdown automata. Although its formulation is not very natural, we believe it may be useful in other applications. The lemma is similar to the pumping lemma from another paper of ours [parys-pumping]; see Section LABEL:sec:pumping for some comments. Earlier, several pumping lemmas related to the second order of the pushdown hierarchy were proposed [hayashi-pumping, gilman-pumping, kartzow-pumping].

This paper is an extended version of our conference paper [ho-new]. The proof of Theorem 1 follows the same lines, but differs essentially in the details. The part about types (Section LABEL:sec:types) was slightly simplified, at the cost of complicating other parts (which was necessary since Theorem LABEL:thm:types is now proven in a weaker form than in the conference paper).

1.1. Related Work

One may ask a similar question for word languages instead of trees: is there a language recognized by a CPDA that is not recognized by any (nondeterministic) PDA? This is an independent problem. The answer is known only for order $2$ , and it is negative: in $2$ -CPDA the collapse operation can be simulated by nondeterminism, hence $2$ -PDA and $2$ -CPDA recognize the same languages [AehligMO05]. It is also an open question whether all word languages recognized by CPDA are context-sensitive.

We have shown [collapse-data] that the collapse operation increases the expressive power of deterministic higher-order pushdown automata with data. In this model each letter of the input word is equipped with a data value from an infinite set; these data values can be stored on the stack and compared with other data values. In such a setting the proof is easier than in the data-free case considered in this paper.

One can also consider configuration graphs of $n$ -PDA and $n$ -CPDA, and their $\varepsilon$ -closures. We know [collapsible] that there is a $2$ -CPDA whose configuration graph has an undecidable MSO theory; hence this graph is neither the configuration graph of an $n$ -PDA nor the $\varepsilon$ -closure of one, as all such graphs have decidable MSO theories.

Engelfriet [Engelfriet91] showed that the hierarchies of word languages and of trees generated by PDA are strict (that is, for each $n$ there is a language recognized by an $n$ -PDA that is not recognized by any $(n-1)$ -PDA, and similarly for trees). As observed by Heußner and Kartzow [HeussnerKartzow], his proof works equally well for the corresponding hierarchies for CPDA, once we know that the reachability problem for $n$ -CPDA is $(n-1)$ -EXPTIME-complete (which follows from Kobayashi and Ong [emptiness-n-1-exptime]).

2. Preliminaries

For natural numbers $a$ , $b$ , where $b\geq a-1$ , by $[a,b]$ we denote the set $\{a,\dots,b\}$ (which is empty if $b=a-1$ ).

In the whole paper, the letter $n$ is used exclusively for the order of pushdown automata, which is usually assumed to be fixed and known implicitly.

We now define stacks of order $k$ ( $k$ -stacks for short). Traditionally, a $0$ -stack is just a single symbol, and a $k$ -stack for $k\geq 1$ is a (possibly empty) sequence of nonempty $(k-1)$ -stacks. However, when a $k$ -stack is a part of an $r$ -stack for $k<r$ , it is convenient to know where this $k$ -stack is located in the $r$ -stack. For this reason, we equip every element of a stack with its position, written as a vector of natural numbers. Thus, for a fixed alphabet $\Gamma$ (of stack symbols), a stack of order $0$ is a pair $(\gamma,x)$ , where $\gamma\in\Gamma$ and $x=(x_{n},x_{n-1},\dots,x_{1})$ is a vector of $n$ positive integers, called a position. Then, for $k\in[1,n]$ we define $k$ -stacks by induction: a $k$ -stack is a list $[s_{1},s_{2},\dots,s_{m}]$ of nonempty $(k-1)$ -stacks (where, by convention, all $0$ -stacks are nonempty) for which there exist numbers $x_{n},x_{n-1},\dots,x_{k+1}$ such that, for $i\in[1,m]$ , all positions in $s_{i}$ are of the form $(x_{n},x_{n-1},\dots,x_{k+1},i,y_{k-1},y_{k-2},\dots,y_{1})$ . By $\Gamma^{k}_{*}$ and $\Gamma^{k}_{+}$ we denote the set of order- $k$ stacks and the set of nonempty order- $k$ stacks, respectively, where $k\in[0,n]$ . The top of a stack is on the right.

For example, when we have a $3$ -stack $s$ , and $n=5$ , then the second $0$ -stack of the third $1$ -stack (counting from the bottom) of the bottommost $2$ -stack of $s$ is of the form $(\gamma,(x_{5},x_{4},1,3,2))$ , where $x_{5}$ and $x_{4}$ say where $s$ is located in an imaginary $5$ -stack; the numbers $x_{5}$ and $x_{4}$ are the same throughout $s$ .

For a $k$ -stack $s^{k}$ , where $k\in[0,n-1]$ , let $\mathsf{p}_{+1}(s^{k})$ be the $k$ -stack obtained from $s^{k}$ by increasing the $(n-k)$ -th coordinate (that is, the coordinate $x_{k+1}$ ) of all its positions by $1$ . For example, with $n=2$ , $\mathsf{p}_{+1}((\gamma,(2,3)))=(\gamma,(2,4))$ and $\mathsf{p}_{+1}([(\gamma,(2,1)),(\gamma,(2,2))])=[(\gamma,(3,1)),(\gamma,(3,2))]$ .
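As a quick illustration, here is a minimal Python sketch of $\mathsf{p}_{+1}$ . It assumes a representation of our own choosing, not part of the formal development: a $0$ -stack is a pair `(symbol, position)` with the position stored as a tuple `(x_n, ..., x_1)`, and a $k$ -stack is a nested list of depth $k$ ; the name `p_plus1` is hypothetical. It reproduces the two examples above.

```python
from typing import Any


def p_plus1(stack: Any, k: int, n: int) -> Any:
    """p_{+1} on a k-stack, 0 <= k <= n-1: add 1 to the (n-k)-th coordinate,
    i.e. to x_{k+1}, which sits at 0-based index n-k-1 of the position tuple."""
    if not 0 <= k <= n - 1:
        raise ValueError("p_{+1} is defined only for k in [0, n-1]")
    idx = n - k - 1

    def bump(s: Any, level: int) -> Any:
        if level == 0:                      # a 0-stack (symbol, position)
            symbol, pos = s
            return (symbol, pos[:idx] + (pos[idx] + 1,) + pos[idx + 1:])
        return [bump(t, level - 1) for t in s]

    return bump(stack, k)


# The examples from the text (n = 2):
print(p_plus1(("g", (2, 3)), 0, 2))                    # ('g', (2, 4))
print(p_plus1([("g", (2, 1)), ("g", (2, 2))], 1, 2))   # [('g', (3, 1)), ('g', (3, 2))]
```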

Let us emphasize that when for two $k$ -stacks $s^{k}$ , $t^{k}$ we write $s^{k}=t^{k}$ , we mean that not only their contents are equal, but also positions contained in their $0$ -stacks are equal; thus, when $s^{k}$ and $t^{k}$ come from the same $n$ -stack, this actually means that $s^{k}$ and $t^{k}$ refer to the same $k$ -stack.

While comparing two stacks, we sometimes need to ignore the positions contained in their $0$ -stacks, and compare only their contents. For a $k$ -stack $s^{k}$ , let the positionless stack $\mathsf{pos}{\downarrow}(s^{k})$ be the list of lists of … of lists of stack symbols obtained from $s^{k}$ by removing positions from all $0$ -stacks. We say that two $k$ -stacks $s^{k},t^{k}$ are positionless-equal, denoted $s^{k}\cong t^{k}$ , when $\mathsf{pos}{\downarrow}(s^{k})=\mathsf{pos}{\downarrow}(t^{k})$ . When $s^{n}_{-}$ is a positionless $n$ -stack, there is a unique $n$ -stack $s^{n}$ such that $s^{n}_{-}=\mathsf{pos}{\downarrow}(s^{n})$ ; we write $\mathsf{pos}_{+}(s^{n}_{-})$ for $s^{n}$ .
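The uniqueness of $\mathsf{pos}_{+}$ can be seen concretely: by the definition with $k=n$ (where no outer coordinates exist), the position of a $0$ -stack inside an $n$ -stack is just the sequence of $1$ -based indices leading to it. The following Python sketch uses the same hypothetical encoding as before (pairs `(symbol, position)` inside nested lists); the function names are ours.

```python
from typing import Any


def pos_down(stack: Any, k: int) -> Any:
    """pos-down: remove the positions from all 0-stacks of a k-stack."""
    if k == 0:
        symbol, _position = stack
        return symbol
    return [pos_down(t, k - 1) for t in stack]


def pos_plus(stack: Any, n: int) -> Any:
    """pos_+: the unique n-stack whose positionless version is `stack`.
    Each position (x_n, ..., x_1) is the path of 1-based indices to the 0-stack."""
    def go(s: Any, level: int, path: tuple) -> Any:
        if level == 0:
            return (s, path)
        return [go(t, level - 1, path + (i,)) for i, t in enumerate(s, start=1)]
    return go(stack, n, ())


def positionless_equal(s: Any, t: Any, k: int) -> bool:
    """s and t (two k-stacks) have equal contents, positions ignored."""
    return pos_down(s, k) == pos_down(t, k)


s = pos_plus([["a"], ["a"]], 2)
print(s)                                                 # [[('a', (1, 1))], [('a', (2, 1))]]
print(positionless_equal(s[0], s[1], 1), s[0] == s[1])   # True False
```

The two $1$ -stacks in the example have the same contents but different positions, so they are positionless-equal without being equal.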

The size of a $k$ -stack $s^{k}$ , denoted $|s^{k}|$ , is the number of $(k-1)$ -stacks it contains. When $s^{k}=[s_{1},s_{2},\dots,s_{m}]\in\Gamma^{k}_{*}$ , and $s^{k-1}\in\Gamma^{k-1}_{+}$ , and $[s_{1},s_{2},\dots,s_{m},s^{k-1}]$ is a valid $k$ -stack, we denote this $k$ -stack by $s^{k}:s^{k-1}$ . The operator “ $:$ ” is assumed to be right associative (e.g., $s^{2}:s^{1}:s^{0}=s^{2}:(s^{1}:s^{0})$ ). When $0\leq k\leq r$ , and $s^{r}=t^{r}:t^{r-1}:\dots:t^{k}\in\Gamma^{r}_{+}$ , by $\mathsf{top}^{k}(s^{r})$ we denote the topmost $k$ -stack of $s^{r}$ , that is, $t^{k}$ . We use the name positionless topmost $k$ -stack for $\mathsf{pos}{\downarrow}(\mathsf{top}^{k}(\cdot))$ .

When $\Gamma$ is fixed, the stack operations of order $k\geq 1$ are $\mathsf{pop}^{k}$ and $\mathsf{push}^{k}_{\gamma}$ for each $\gamma\in\Gamma$ . We can apply them to a nonempty $r$ -stack for $r\geq k$ , which gives the following:

•

$\mathsf{pop}^{k}(s^{r}:s^{r-1}:\dots:s^{k}:s^{k-1})=s^{r}:s^{r-1}:\dots:s^{k}$ , that is, we remove the topmost $(k-1)$ -stack; it is defined only when the topmost $k$ -stack contains at least two $(k-1)$ -stacks;
•

$\mathsf{push}^{k}_{\gamma}(s^{r}:s^{r-1}:\dots:s^{0})=s^{r}:s^{r-1}:\dots:s^{k+1}:(s^{k}:s^{k-1}:\dots:s^{0}):\mathsf{p}_{+1}(s^{k-1}:s^{k-2}:\dots:s^{1}:(\gamma,x))$ for $s^{0}=(\gamma^{\prime},x)$ , that is, we duplicate the topmost $(k-1)$ -stack, and then we replace the topmost stack symbol by $\gamma$ , adjusting all positions appropriately.¹

¹ In the classical definition the topmost symbol can be changed only when $k=1$ (for $k\geq 2$ it is required that $\gamma=\gamma^{\prime}$ ). We make this (unimportant) extension to have a uniform definition of $\mathsf{push}^{k}$ for all $k$ .
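To see how positions evolve under these operations, here is a small Python sketch of $\mathsf{top}^{k}$ , $\mathsf{pop}^{k}$ and $\mathsf{push}^{k}_{\gamma}$ on a nonempty $n$ -stack. It is an illustration under our own assumptions (an $n$ -stack is a nested list of depth $n$ whose leaves are pairs `(symbol, position)`; all function names are hypothetical), and it follows the formulas above literally: the copy of the topmost $(k-1)$ -stack gets $\gamma$ as its topmost symbol and is shifted by $\mathsf{p}_{+1}$ .

```python
import copy
from typing import Any


def top(stack: Any, n: int, k: int) -> Any:
    """top^k of an n-stack given as nested lists of depth n."""
    for _ in range(n - k):
        stack = stack[-1]
    return stack


def p_plus1(stack: Any, k: int, n: int) -> Any:
    """Add 1 to coordinate x_{k+1} (tuple index n-k-1) of every position in a k-stack."""
    idx = n - k - 1

    def bump(s: Any, level: int) -> Any:
        if level == 0:
            symbol, pos = s
            return (symbol, pos[:idx] + (pos[idx] + 1,) + pos[idx + 1:])
        return [bump(t, level - 1) for t in s]

    return bump(stack, k)


def pop(stack: Any, n: int, k: int) -> Any:
    """pop^k: remove the topmost (k-1)-stack; undefined if it is the only one."""
    new = copy.deepcopy(stack)
    t = top(new, n, k)
    if len(t) < 2:
        raise ValueError(f"pop^{k} is undefined on this stack")
    t.pop()
    return new


def push(stack: Any, n: int, k: int, gamma: str) -> Any:
    """push^k_gamma: duplicate the topmost (k-1)-stack, replace the copy's topmost
    symbol by gamma, and shift the copy's positions with p_{+1}."""
    new = copy.deepcopy(stack)
    t = top(new, n, k)
    dup = copy.deepcopy(t[-1])
    if k == 1:
        dup = (gamma, dup[1])            # the copied 0-stack itself
    else:
        row = top(dup, k - 1, 1)         # topmost 1-stack of the copy
        row[-1] = (gamma, row[-1][1])
    t.append(p_plus1(dup, k - 1, n))
    return new


s = push([[("a", (1, 1))]], 2, 1, "b")
s = push(s, 2, 2, "c")
print(s)   # [[('a', (1, 1)), ('b', (1, 2))], [('a', (2, 1)), ('c', (2, 2))]]
```

Raising an exception for an undefined $\mathsf{pop}^{k}$ is just one convenient way to model the fact that such a configuration has no successor.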

A deterministic word-recognizing pushdown automaton of order $n$ ( $n$ -DPDA for short) is a tuple $(A,\Gamma,\gamma_{I},Q,q_{I},F,\delta)$ where $A$ is an input alphabet, $\Gamma$ is a stack alphabet, $\gamma_{I}\in\Gamma$ is an initial stack symbol, $Q$ is a set of states, $q_{I}\in Q$ is an initial state, $F\subseteq Q$ is a set of accepting states, and $\delta$ is a transition function that maps every element of $Q\times\Gamma$ into one of the following objects:

•

$\mathsf{read}(\vec{q})$ , where $\vec{q}:A\to Q$ is an injective function, or
•

$(q,op)$ , where $q\in Q$ and $op$ is a stack operation of order at most $n$ .

A configuration of ${\mathcal{A}}$ consists of a state and of a nonempty $n$ -stack, that is, it is an element of $Q\times\Gamma_{+}^{n}$ . The initial configuration consists of the initial state $q_{I}$ and of the $n$ -stack containing only one $0$ -stack, enclosing the initial stack symbol $\gamma_{I}$ . We use the notation $\pi_{i}((p_{1},\dots,p_{k}))=p_{i}$ ; in particular for a configuration $c$ , $\pi_{1}(c)$ denotes its state, and $\pi_{2}(c)$ its stack. Additionally, for a set $X$ of tuples we define $\pi_{i}(X)$ to be $\{\pi_{i}(p)\colon\,p\in X\}$ . In order to shorten the notation, for a configuration $c$ we sometimes write $\mathsf{top}^{k}(c)$ or $\mathsf{pop}^{k}(c)$ for $\mathsf{top}^{k}(\pi_{2}(c))$ or $\mathsf{pop}^{k}(\pi_{2}(c))$ , respectively.

We use a shorthand $\delta(c)$ for a configuration $c$ to denote $\delta(\pi_{1}(c),\mathsf{pos}{\downarrow}(\mathsf{top}^{0}(c)))$ . A configuration $d$ is a successor of a configuration $c$ , if

•

$\delta(c)=\mathsf{read}(\vec{q})$ , and $d=(\vec{q}(a),\pi_{2}(c))$ for some $a\in A$ , or
•

$\delta(c)=(q,op)$ , and $d=(q,op(\pi_{2}(c)))$ .

Notice that a configuration $c$ has

•

$|A|$ successors, if the transition is $\mathsf{read}(\vec{q})$ ;
•

no successors, if the operation is $\mathsf{pop}^{k}$ but there is only one $(k-1)$ -stack on the topmost $k$ -stack;
•

one successor, otherwise.
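The successor relation can be prototyped directly. In the Python sketch below (an encoding of our own, not taken from the paper), stacks are positionless nested lists, which suffices here because $\delta(c)$ only inspects $\mathsf{pos}{\downarrow}(\mathsf{top}^{0}(c))$ ; a transition is either `("read", {letter: state})` or `("op", state, op)` with `op` equal to `("pop", k)` or `("push", k, gamma)`. The small example exhibits the three cases listed above.

```python
import copy
from typing import Any, Dict, List, Sequence, Tuple

Config = Tuple[str, Any]  # (state, positionless n-stack as nested lists)


def topmost(stack: Any, depth: int) -> Any:
    """Follow the last element `depth` times (depth n-k yields the topmost k-stack)."""
    for _ in range(depth):
        stack = stack[-1]
    return stack


def apply_op(stack: Any, n: int, op: tuple) -> Any:
    """Apply ("pop", k) or ("push", k, gamma); return None if pop^k is undefined."""
    new = copy.deepcopy(stack)
    kind, k = op[0], op[1]
    t = topmost(new, n - k)                # the topmost k-stack
    if kind == "pop":
        if len(t) < 2:
            return None                    # only one (k-1)-stack left: no successor
        t.pop()
    elif k == 1:
        t.append(op[2])                    # duplicate the top symbol, replaced by gamma
    else:
        dup = copy.deepcopy(t[-1])         # copy of the topmost (k-1)-stack
        topmost(dup, k - 2)[-1] = op[2]    # replace its topmost symbol by gamma
        t.append(dup)
    return new


def successors(c: Config, delta: Dict, n: int, alphabet: Sequence[str]) -> List[Config]:
    state, stack = c
    tr = delta[(state, topmost(stack, n))]            # depends on the top symbol only
    if tr[0] == "read":
        return [(tr[1][a], stack) for a in alphabet]  # |A| successors, same stack
    _, q, op = tr
    new = apply_op(stack, n, op)
    return [] if new is None else [(q, new)]


delta = {("r", "x"): ("read", {"a": "p", "b": "q"}),
         ("p", "x"): ("op", "r", ("push", 2, "x")),
         ("q", "x"): ("op", "r", ("pop", 1))}
print(len(successors(("r", [["x"]]), delta, 2, "ab")))   # 2
print(successors(("p", [["x"]]), delta, 2, "ab"))        # [('r', [['x'], ['x']])]
print(successors(("q", [["x"]]), delta, 2, "ab"))        # []
```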

Next, we define a run of ${\mathcal{A}}$ . For $0\leq i\leq m$ , let $c_{i}$ be a configuration. A run $R$ from $c_{0}$ to $c_{m}$ is a sequence $c_{0},c_{1},\dots,c_{m}$ such that, for each $i\in[1,m]$ , $c_{i}$ is a successor of $c_{i-1}$ . We set $R(i)=c_{i}$ and call $\lvert R\rvert=m$ the length of $R$ . The subrun $R{\restriction}_{i,j}$ is $c_{i},c_{i+1},\dots,c_{j}$ . For runs $R,S$ with $R(\lvert R\rvert)=S(0)$ , we write $R\circ S$ for the composition of $R$ and $S$ that is defined as expected. Sometimes we also consider infinite runs, such that the sequence $c_{0},c_{1},c_{2},\dots$ is infinite. However, unless stated explicitly, a run is finite.

The word read by a run is a word over the input alphabet $A$ . For a run from a configuration $c$ to its successor $d$ , it is the empty word if the transition between them is of the form $(q,op)$ . If the transition is $\mathsf{read}(\vec{q})$ , this is the one-letter word consisting of the letter $a$ for which $\pi_{1}(d)=\vec{q}(a)$ (this letter is determined uniquely, as $\vec{q}$ is injective). For a longer run $R$ this is defined as the concatenation of the words read by the subruns $R{\restriction}_{i-1,i}$ for $i\in[1,|R|]$ . A run is accepting if it ends in a configuration whose state is accepting. A word $w$ is accepted by ${\mathcal{A}}$ if it is read by some accepting run starting in the initial configuration. The language recognized by ${\mathcal{A}}$ is the set of words accepted by ${\mathcal{A}}$ .
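Since the automaton is deterministic, the run reading a given word is unique up to the point where it stops. Hence it suffices to follow this run and check whether an accepting state occurs after the last letter has been read and before the next $\mathsf{read}$ transition. The Python sketch below does this for the special case $n=1$ , an assumption made only to keep the code short (stacks are positionless lists of symbols). The transition encoding and the example automaton for $\{a^{k}b^{k}\colon k\geq 0\}$ are hypothetical illustrations; the example passes through an accepting state by a $\mathsf{push}^{1}$ immediately undone by a $\mathsf{pop}^{1}$ .

```python
from itertools import product


def accepts(delta, accepting, q_init, gamma_init, word, max_steps=10_000):
    """Order-1 sketch: is `word` read by some accepting run from the initial configuration?"""
    state, stack, rest = q_init, [gamma_init], list(word)
    for _ in range(max_steps):
        if not rest and state in accepting:
            return True                     # an accepting run may stop here
        tr = delta[(state, stack[-1])]
        if tr[0] == "read":
            if not rest:
                return False                # any longer run would read an extra letter
            state = tr[1][rest.pop(0)]
            continue
        _, state, op = tr
        if op[0] == "pop":
            if len(stack) < 2:
                return False                # no successor: the run is stuck
            stack = stack[:-1]
        else:                               # ("push", 1, gamma): push gamma
            stack = stack + [op[2]]
    raise RuntimeError("step bound exceeded (possibly an infinite loop of stack operations)")


def read(qa, qb):
    """read(q) with q(a) = qa and q(b) = qb; injectivity needs qa != qb."""
    return ("read", {"a": qa, "b": qb})


# Hypothetical 1-DPDA for {a^k b^k : k >= 0}: B marks the bottom, A counts,
# Y/N are pushed and popped at once to pass through a (non-)accepting state.
delta = {}
for g in "ABYN":
    delta[("p", g)] = read("pa", "pb")                 # phase 1: reading a's
    delta[("pa", g)] = ("op", "p", ("push", 1, "A"))
    delta[("r", g)] = read("d1", "rb")                 # phase 2: reading b's
    delta[("d1", g)] = read("d1", "d2")                # dead states
    delta[("d2", g)] = read("d1", "d2")
delta.update({
    ("s", "B"): ("op", "y0", ("push", 1, "Y")),        # accept the empty word
    ("y0", "Y"): ("op", "p", ("pop", 1)),
    ("pb", "A"): ("op", "c", ("pop", 1)),
    ("pb", "B"): read("d1", "d2"),                     # a 'b' with no 'a' before it
    ("rb", "A"): ("op", "c", ("pop", 1)),
    ("rb", "B"): read("d1", "d2"),
    ("c", "B"): ("op", "yes", ("push", 1, "Y")),       # counter empty: accepting pass
    ("c", "A"): ("op", "no", ("push", 1, "N")),
    ("yes", "Y"): ("op", "r", ("pop", 1)),
    ("no", "N"): ("op", "r", ("pop", 1)),
})
ACCEPTING = {"y0", "yes"}

print(all(accepts(delta, ACCEPTING, "s", "B", "".join(t))
          == ("".join(t) == "a" * (L // 2) + "b" * (L // 2))
          for L in range(9) for t in product("ab", repeat=L)))   # True
```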

2.1. Collapsible $2$ -DPDA

In Section 4 we also use deterministic collapsible pushdown automata of order $2$ ( $2$ -DCPDA for short). Such automata are defined like $2$ -DPDA, with the following differences. A $0$ -stack now contains three parts: a symbol from $\Gamma$ , a position, and a natural number, but still only the symbol (together with the state) is used to determine which transition is performed from a configuration. The $\mathsf{push}^{1}_{\gamma}$ operation sets the number in the newly created topmost $0$ -stack to the current size of the $2$ -stack (while $\mathsf{push}^{2}_{\gamma}$ does not modify these numbers). We have a new stack operation $\mathsf{collapse}$ . Its result $\mathsf{collapse}(s)$ is obtained from $s$ by removing its topmost $1$ -stacks, so that only $k-1$ of them are left, where $k$ is the number stored in $\mathsf{top}^{0}(s)$ (intuitively, we remove all $1$ -stacks on which the topmost $0$ -stack is present).
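A small sketch of the three order- $2$ operations with these numbers, in Python, with positions omitted so that a $0$ -stack is a pair `(symbol, number)`. The function names and the value $0$ used for the number of the initial $0$ -stack are hypothetical, since the text leaves that number unspecified.

```python
from typing import List, Tuple

Stack2 = List[List[Tuple[str, int]]]  # 1-stacks of (symbol, number) pairs


def push1(s: Stack2, gamma: str) -> Stack2:
    """push^1_gamma: the new topmost 0-stack gets the current size of the 2-stack."""
    return s[:-1] + [s[-1] + [(gamma, len(s))]]


def push2(s: Stack2, gamma: str) -> Stack2:
    """push^2_gamma: copy the topmost 1-stack (numbers unchanged), replace its top symbol."""
    top = list(s[-1])
    top[-1] = (gamma, top[-1][1])
    return s + [top]


def collapse(s: Stack2) -> Stack2:
    """Keep only the k-1 bottommost 1-stacks, k being the number in the topmost 0-stack."""
    k = s[-1][-1][1]
    if k < 1:
        raise ValueError("collapse is not meaningful for this 0-stack")
    return s[: k - 1]


s = push2([[("X", 0)]], "X")   # two 1-stacks
s = push1(s, "a")              # 'a' records the size 2
s = push2(s, "a")              # a copy of 'a' in a third 1-stack
print(s)            # [[('X', 0)], [('X', 0), ('a', 2)], [('X', 0), ('a', 2)]]
print(collapse(s))  # [[('X', 0)]]
```

Collapsing from either copy of `a` removes every $1$ -stack that contains one, as the intuitive description says.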

3. Relation between Word Languages and Trees

In this section we describe how word languages recognized by DPDA are related to trees generated by PDA. Before seeing how Theorem 2 implies Theorem 1, we need to define how $n$ -PDA are used to generate trees. We consider ranked, potentially infinite trees. Besides the input alphabet $A$ , we have a function $\mathit{rank}\colon A\to\mathbb{N}$ ; a tree node labelled by some $a\in A$ always has $\mathit{rank}(a)$ children.

Automata used to generate trees are defined like DPDA or DCPDA (in particular they are deterministic as well), with the difference that they do not have the set of accepting states, and that instead of the $\mathsf{read}(\vec{q})$ transitions, there are $\mathsf{branch}(a,q_{1},q_{2},\dots,q_{\mathit{rank}(a)})$ transitions, for $a\in A$ , and for pairwise distinct states $q_{1},q_{2},\dots,q_{\mathit{rank}(a)}\in Q$ . If the transition from $c$ is $\delta(c)=\mathsf{branch}(a,q_{1},q_{2},\dots,q_{\mathit{rank}(a)})$ , in a successor $d$ of $c$ we have $\pi_{2}(d)=\pi_{2}(c)$ and $\pi_{1}(d)=q_{i}$ for some $i\in[1,\mathit{rank}(a)]$ (in particular $c$ has no successors if $\mathit{rank}(a)=0$ ).

Let $T({\mathcal{A}})$ be the set of all configurations $c$ of ${\mathcal{A}}$ reachable from the initial one, such that a $\mathsf{branch}$ transition should be performed from $c$ . If there is a configuration of ${\mathcal{A}}$ reachable from the initial one from which there is no run to a configuration from $T({\mathcal{A}})$ , then by definition ${\mathcal{A}}$ does not generate any tree. Otherwise, the nodes of the tree generated by ${\mathcal{A}}$ are the runs from the initial configuration to a configuration from $T({\mathcal{A}})$ . A node $R$ is labelled by the letter $a\in A$ such that $\delta(R(|R|))=\mathsf{branch}(a,q_{1},q_{2},\dots,q_{\mathit{rank}(a)})$ . A node $S$ is its $i$ -th child ( $1\leq i\leq\mathit{rank}(a)$ ) if $S$ is the composition of $R$ and a run $S^{\prime}$ that uses a $\mathsf{branch}$ transition only in its first transition, and for which $\pi_{1}(S^{\prime}(1))=q_{i}$ . Notice that the graph obtained this way is indeed an $A$ -labelled ranked tree.

We now see how Theorem 1 follows from Theorem 2. Let $L\subseteq A^{*}$ be the language recognized by a $2$ -DCPDA ${\mathcal{A}}$ that is not recognized by any $n$ -DPDA, for any $n$ ( $L$ exists by Theorem 2). First, we transform ${\mathcal{A}}$ into a $2$ -DCPDA ${\mathcal{B}}$ , recognizing $L$ as well, such that each configuration of ${\mathcal{B}}$ reachable from the initial one has a successor. Observe that the only reason why ${\mathcal{A}}$ may have configurations with no successors is that it tries to empty a stack using a $\mathsf{pop}$ operation. To avoid such situations, ${\mathcal{B}}$ keeps a bottom-of-stack marker $\bot$ at the bottom of each $1$ -stack, and at the bottom of the $2$ -stack (a $1$ -stack containing only the $\bot$ marker). Thus, ${\mathcal{B}}$ starts with the $\bot$ marker as the initial stack symbol, and performs $\mathsf{push}^{2}_{\bot}$ and $\mathsf{push}^{1}_{\gamma_{I}}$ , placing the original initial stack symbol $\gamma_{I}$ . Then, whenever ${\mathcal{A}}$ blocks because it tries to empty a stack, in ${\mathcal{B}}$ the bottom-of-stack marker is uncovered; in such a situation ${\mathcal{B}}$ enters a loop that visits no accepting state. There is also a technical detail: a $\mathsf{pop}$ operation that would block ${\mathcal{A}}$ may, in ${\mathcal{B}}$ , lead to an accepting state. To overcome this problem, every $\mathsf{pop}$ operation leading to an accepting state should first lead to an auxiliary, non-accepting state, from which (if the bottom-of-stack marker is not seen) the accepting state is reached.

Next, we create a tree-generating $2$ -CPDA ${\mathcal{C}}$ , which generates a tree over the alphabet $B=\{{X},{Y},{Z}\}$ , where $\mathit{rank}({X})=|A|$ and $\mathit{rank}({Y})=\mathit{rank}({Z})=1$ . It is obtained from ${\mathcal{B}}$ in two steps. First, we replace each transition $\mathsf{read}(\vec{q})$ of ${\mathcal{B}}$ by the transition $\mathsf{branch}({X},\vec{q}(a_{1}),\vec{q}(a_{2}),\dots,\vec{q}(a_{|A|}))$ , where $A=\{a_{1},\dots,a_{|A|}\}$ . Then, in each transition we replace the resulting state $q$ by a fresh auxiliary state $\overline{q}$ , and from $\overline{q}$ (for any topmost stack symbol) we perform transition $\mathsf{branch}({Y},q)$ if $q$ was accepting, or transition $\mathsf{branch}({Z},q)$ if $q$ was not accepting (this way, after each step of the original automaton, we perform a transition $\mathsf{branch}({Y},\cdot)$ or $\mathsf{branch}({Z},\cdot)$ ). Notice that from each configuration of ${\mathcal{C}}$ reachable from the initial one, there exists a run to a configuration from $T({\mathcal{C}})$ , as required by the definition of a tree-generating CPDA. Let $t_{\mathcal{C}}$ be the tree generated by ${\mathcal{C}}$ .
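The two steps of the construction of ${\mathcal{C}}$ from ${\mathcal{B}}$ are purely syntactic rewritings of the transition table. A Python sketch, assuming a hypothetical encoding: transitions are `("read", {letter: state})` or `("op", state, op)`, branch transitions are `("branch", label, [states])`, and the fresh state $\overline{q}$ is encoded as the tuple `("bar", q)`.

```python
from typing import Dict, Iterable, Sequence, Set


def bar(q):
    """Hypothetical encoding of the fresh auxiliary state overline(q)."""
    return ("bar", q)


def to_tree_generator(delta: Dict, states: Iterable, symbols: Iterable,
                      accepting: Set, alphabet: Sequence) -> Dict:
    symbols = list(symbols)
    new = {}
    for (q, g), tr in delta.items():
        if tr[0] == "read":
            # read(q) becomes branch(X, q(a_1), ..., q(a_|A|)), redirected to bar states
            new[(q, g)] = ("branch", "X", [bar(tr[1][a]) for a in alphabet])
        else:
            _, target, op = tr
            new[(q, g)] = ("op", bar(target), op)
    for q in states:
        # from bar(q), whatever the top symbol, announce whether q is accepting
        label = "Y" if q in accepting else "Z"
        for g in symbols:
            new[(bar(q), g)] = ("branch", label, [q])
    return new


delta_B = {("q0", "g"): ("read", {"a": "q1", "b": "q2"}),
           ("q1", "g"): ("op", "q0", ("push", 1, "g"))}
delta_C = to_tree_generator(delta_B, ["q0", "q1", "q2"], ["g"], {"q1"}, ["a", "b"])
print(delta_C[("q0", "g")])        # ('branch', 'X', [('bar', 'q1'), ('bar', 'q2')])
print(delta_C[(bar("q1"), "g")])   # ('branch', 'Y', ['q1'])
```

Because $\vec{q}$ is injective, the redirected targets of each $\mathsf{branch}({X},\dots)$ transition remain pairwise distinct, as the definition of tree-generating automata requires.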

Finally, suppose that $t_{\mathcal{C}}$ can also be generated by some $n$ -PDA ${\mathcal{D}}$ (without collapse). From ${\mathcal{D}}$ we create a word-recognizing $n$ -DPDA ${\mathcal{E}}$ . We replace each transition of the form $\mathsf{branch}({X},q_{1},q_{2},\dots,q_{|A|})$ of ${\mathcal{D}}$ by the transition $\mathsf{read}(\vec{q})$ , where $\vec{q}(a_{i})=q_{i}$ . We replace each transition $\mathsf{branch}({Y},q)$ of ${\mathcal{D}}$ by the transition $(p,\mathsf{push}^{1}_{\gamma})$ for a fresh accepting state $p$ and some stack symbol $\gamma$ ; from $(p,\gamma)$ we perform the transition $(q,\mathsf{pop}^{1})$ (thus, we replace $\mathsf{branch}({Y},q)$ by a pass through an accepting state). We proceed in the same way for a $\mathsf{branch}({Z},q)$ transition, except that the fresh state $p$ is not accepting.
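The reverse translation from ${\mathcal{D}}$ to ${\mathcal{E}}$ can be sketched in the same hypothetical encoding. Here the fresh state $p$ is encoded as `("pass", label, q)`, one per label and target, which is enough because its only transition depends on the target; `marker` plays the role of the stack symbol $\gamma$ and is an assumption of this sketch.

```python
from typing import Dict, Sequence, Set, Tuple


def to_word_recognizer(delta_tree: Dict, alphabet: Sequence,
                       marker: str = "MARK") -> Tuple[Dict, Set]:
    """Turn the transition table of a tree generator D into that of E."""
    delta, accepting = {}, set()
    for (q, g), tr in delta_tree.items():
        if tr[0] == "branch" and tr[1] == "X":
            delta[(q, g)] = ("read", dict(zip(alphabet, tr[2])))   # a_i -> q_i
        elif tr[0] == "branch":                # branch(Y, q') or branch(Z, q')
            label, (target,) = tr[1], tr[2]
            p = ("pass", label, target)        # fresh state, accepting iff label is Y
            delta[(q, g)] = ("op", p, ("push", 1, marker))
            delta[(p, marker)] = ("op", target, ("pop", 1))
            if label == "Y":
                accepting.add(p)
        else:
            delta[(q, g)] = tr                 # stack operations are kept as they are
    return delta, accepting


delta_D = {("q0", "g"): ("branch", "X", ["q1", "q2"]),
           ("q1", "g"): ("branch", "Y", ["q0"]),
           ("q2", "g"): ("branch", "Z", ["q0"])}
delta_E, acc = to_word_recognizer(delta_D, ["a", "b"])
print(delta_E[("q0", "g")])   # ('read', {'a': 'q1', 'b': 'q2'})
print(acc)                    # {('pass', 'Y', 'q0')}
```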

Notice that ${\mathcal{E}}$ recognizes $L$ ; this contradicts our assumptions about $L$ , so $t_{\mathcal{C}}$ is not generated by any $n$ -PDA. Indeed, take any word $w\in L$ . We have an accepting run of ${\mathcal{B}}$ that reads $w$ and starts in the initial configuration. This run corresponds to a run of ${\mathcal{C}}$ , that is, to a path $p$ in $t_{\mathcal{C}}$ from the root to a ${Y}$ -labelled node. Letters of $w$ tell us which child the path $p$ chooses in ${X}$ -labelled nodes: if the $i$ -th letter of $w$ is $a_{j}$ , then from the $i$ -th ${X}$ -labelled node of $p$ the path continues to the $j$ -th child. This path $p$ corresponds also to a run of ${\mathcal{D}}$ , and hence to a run of ${\mathcal{E}}$ . This run starts in the initial configuration, ends in an accepting state, and reads $w$ ; thus, ${\mathcal{E}}$ accepts $w$ . Similarly, each word accepted by ${\mathcal{E}}$ is also accepted by ${\mathcal{B}}$ .

We also recall that a tree is generated by a recursion scheme of order $2$ if and only if it is generated by a $2$ -CPDA [collapsible], and that a tree is generated by a safe recursion scheme of order $n$ if and only if it is generated by an $n$ -PDA [easy-trees]; this implies the “equivalently” parts of Theorem 1.

4. The Separating Language

In this section we define a language $U$ that can be recognized by a $2$ -DCPDA, but not by any $n$ -DPDA, for any $n$ . It is a language over the alphabet $A=\{[,],\star,\sharp\}$ . For a word $w\in\{[,],\star\}^{*}$ we define $\mathit{stars}(w)$ . Whenever in some prefix of $w$ there are more closing brackets than opening brackets, $\mathit{stars}(w)=0$ . Also when in the whole $w$ we have the same number of opening and closing brackets, $\mathit{stars}(w)=0$ . Otherwise, let $\mathit{stars}(w)$ be the number of stars in $w$ before the last opening bracket that is not closed. Let $U$ be the set of words $w\sharp^{\mathit{stars}(w)+1}$ , for any $w\in\{[,],\star\}^{*}$ (i.e., these are words $w$ consisting of brackets and stars, followed by $\mathit{stars}(w)+1$ sharp symbols).
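The definition of $\mathit{stars}(w)$ translates directly into a single left-to-right scan with a stack of counters. The Python sketch below is our own illustration: the ASCII characters `*` and `#` stand for $\star$ and $\sharp$ , and the function names are hypothetical.

```python
def stars(w: str) -> int:
    """stars(w) for w over '[', ']', '*' ('*' stands for the star letter)."""
    unclosed = []   # for each currently unclosed '[', the number of stars read before it
    seen = 0
    for c in w:
        if c == "[":
            unclosed.append(seen)
        elif c == "]":
            if not unclosed:
                return 0    # some prefix has more closing than opening brackets
            unclosed.pop()
        elif c == "*":
            seen += 1
        else:
            raise ValueError(f"unexpected letter {c!r}")
    return unclosed[-1] if unclosed else 0   # balanced brackets also give 0


def in_U(word: str) -> bool:
    """Membership in U, with '#' standing for the sharp letter."""
    w = word.rstrip("#")
    if "#" in w:
        return False    # a sharp occurs before the final block of sharps
    return len(word) - len(w) == stars(w) + 1


print(stars("*[*]*["), in_U("*[*]*[####"), in_U("*[*]*[###"))   # 3 True False
```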

It is known that languages similar to $U$ can be recognized by a $2$ -DCPDA (cf., e.g., Aehlig, de Miranda, and Ong [AehligMO05]), but for completeness we briefly show it below. The $2$ -DCPDA uses three stack symbols: $X$ (used to mark the bottom of $1$ -stacks), $Y$ (used to count brackets), $Z$ (used to mark the bottommost $1$ -stack). The initial symbol is $X$ . The automaton first pushes $Z$ , makes a copy of the $1$ -stack (i.e., it performs $\mathsf{push}^{2}_{Z}$ ), and pops $Z$ (hence the first $1$ -stack is marked with $Z$ , unlike any other $1$ -stack used later). Then, for an opening bracket we push $Y$ , for a closing bracket we pop $Y$ , and for a star we perform $\mathsf{push}^{2}_{\gamma}$ (where $\gamma$ is the topmost stack symbol). Hence for each star we have a $1$ -stack and on the last $1$ -stack we have as many $Y$ symbols as the number of currently open brackets. If for a closing bracket the topmost symbol is $X$ , it means that in the word read so far we have more closing brackets than opening brackets; in this case we should accept suffixes of the form $\{[,],\star\}^{*}\sharp$ , which is easy.

Finally, the $\sharp$ symbol is read. If the topmost symbol is $X$ , we have read as many opening brackets as closing brackets, hence we should accept one $\sharp$ symbol. Otherwise, the topmost $Y$ symbol corresponds to the last opening bracket that is not closed. We execute the $\mathsf{collapse}$ operation. It removes the $1$ -stack in which this $Y$ was pushed, together with all $1$ -stacks above it; what remains is the first $1$ -stack together with exactly as many further $1$ -stacks as there were stars before this bracket. Thus, the number of $1$ -stacks other than the first one is precisely equal to $\mathit{stars}(w)$ . Now, after each further $\sharp$ symbol we perform $\mathsf{pop}^{2}$ ; when the first $1$ -stack (marked with $Z$ ) becomes the topmost one, we have read exactly $\mathit{stars}(w)+1$ symbols $\sharp$ , and we accept.
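The construction can be checked mechanically on short words. The Python sketch below simulates the stack strategy described above (positions omitted; a $0$ -stack is a pair `(symbol, number)`; `*` and `#` stand for $\star$ and $\sharp$ ; all names are hypothetical) and compares the number of $\sharp$ symbols after which it accepts with $\mathit{stars}(w)+1$ , computed from the definition.

```python
from itertools import product


def stars(w: str) -> int:
    """Reference implementation of stars(w)."""
    unclosed, seen = [], 0      # stars read before each currently unclosed '['
    for c in w:
        if c == "[":
            unclosed.append(seen)
        elif c == "]":
            if not unclosed:
                return 0        # a prefix with more ']' than '['
            unclosed.pop()
        else:
            seen += 1
    return unclosed[-1] if unclosed else 0


def expected_sharps(w: str) -> int:
    """Follow the 2-DCPDA strategy on w; return after how many sharps it accepts."""
    stack = [[("X", 0)]]        # the number of the initial X is never used

    def push1(g):               # number := current size of the 2-stack
        stack[-1].append((g, len(stack)))

    def push2(g):               # copy the topmost 1-stack, numbers unchanged
        copy_ = list(stack[-1])
        copy_[-1] = (g, copy_[-1][1])
        stack.append(copy_)

    push1("Z")                  # mark the first 1-stack with Z
    push2("Z")
    stack[-1].pop()
    for c in w:
        top = stack[-1][-1][0]
        if c == "[":
            push1("Y")
        elif c == "]":
            if top == "X":
                return 1        # more ']' than '[': stars(w) = 0
            stack[-1].pop()
        else:
            push2(top)
    if stack[-1][-1][0] == "X":
        return 1                # balanced brackets: stars(w) = 0
    k = stack[-1][-1][1]
    del stack[k - 1:]           # collapse, triggered by the first sharp
    sharps = 1
    while stack[-1][-1][0] != "Z":
        stack.pop()             # pop^2 after each further sharp
        sharps += 1
    return sharps


def check(max_len: int) -> bool:
    return all(expected_sharps(w) == stars(w) + 1
               for L in range(max_len + 1)
               for w in ("".join(t) for t in product("[]*", repeat=L)))


print(check(7))  # True
```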

In the remaining part of the paper we prove that no $n$ -DPDA can recognize $U$ ; in particular, no automaton appearing in the following sections uses collapse.

5. Overview of the Proof

Before starting the actual proof, in this section we present its general structure on an intuitive level. Let us first see why $U$ cannot be recognized by any $1$ -DPDA ${\mathcal{A}}$ . Consider the input word

$$w_{1}=[\star^{n_{1}}[\star^{n_{2}}\dots[\star^{n_{N}}[\star^{m_{N+1}}]\star^{m_{N}}]\dots\star^{m_{1}}]\star^{m_{0}}[$$

(where each bracket is matched, except the last opening bracket). Notice that $\mathit{stars}(w_{1})$ equals the sum of all $n_{i}$ and $m_{i}$ , so ${\mathcal{A}}$ , after reading $w_{1}$ , has to store all these numbers in its stack. Thus, it first stores the number $n_{1}$ on the stack (by repeating some stack symbol $n_{1}$ times), then it can mark that there was an opening bracket, then it stores $n_{2}$ , and so on (see Figure 1); none of these numbers can be removed later.

Figure 1. The stack of a $1$ -DPDA after reading the word $w_{1}$ .

Now consider the prefix $w_{1,i}$ of $w_{1}$ that ends just after the $i$ -th closing bracket. Since ${\mathcal{A}}$ is deterministic, the stack at the end of $w_{1,i}$ looks similar: it is just shorter, but it certainly ends to the right of the vertical line, which denotes the stack size after the last opening bracket. We see that $\mathit{stars}(w_{1,i})=n_{1}+\dots+n_{N-i}$ . Thus, when ${\mathcal{A}}$ sees a $\sharp$ after $w_{1,i}$ , it has to remove (ignore) the numbers above $n_{N-i}$ , and sum the rest. In particular, it crosses the vertical line in some state $q_{i}$ . For each $i$ , at the moment of crossing this line the stack is the same (everything to the right of the line has been removed); only the state $q_{i}$ can differ. Since a different behavior is expected for each $i$ , the states $q_{i}$ must be pairwise different. This is a contradiction when $N$ is greater than the number of states.
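The two counting facts used above, namely that $\mathit{stars}(w_{1})$ is the sum of all $n_{i}$ and $m_{i}$ and that $\mathit{stars}(w_{1,i})=n_{1}+\dots+n_{N-i}$ , can be checked on concrete numbers. In this Python sketch `*` stands for $\star$ , and the helper names and the sample values of $n_{i}$ , $m_{i}$ are arbitrary choices of ours.

```python
def stars(w: str) -> int:
    """stars(w); the ASCII '*' stands for the star letter."""
    unclosed, seen = [], 0
    for c in w:
        if c == "[":
            unclosed.append(seen)
        elif c == "]":
            if not unclosed:
                return 0
            unclosed.pop()
        else:
            seen += 1
    return unclosed[-1] if unclosed else 0


def build_w1(n, m):
    """w_1 for n = [n_1, ..., n_N] and m = [m_0, ..., m_{N+1}]."""
    N = len(n)
    if len(m) != N + 2:
        raise ValueError("expected the N+2 numbers m_0, ..., m_{N+1}")
    w = "".join("[" + "*" * ni for ni in n) + "[" + "*" * m[N + 1]
    for j in range(N, -1, -1):          # ] *^{m_N} ] ... *^{m_1} ] *^{m_0}
        w += "]" + "*" * m[j]
    return w + "["


def prefix_after_closing(w, i):
    """w_{1,i}: the prefix of w ending just after its i-th closing bracket."""
    cut = [p for p, c in enumerate(w) if c == "]"][i - 1]
    return w[: cut + 1]


n, m = [2, 3, 1], [1, 4, 2, 5, 3]
w1 = build_w1(n, m)
print(stars(w1) == sum(n) + sum(m))                                        # True
print([stars(prefix_after_closing(w1, i)) for i in range(1, len(n) + 1)])  # [5, 2, 0]
```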

It follows that ${\mathcal{A}}$ has to be of order at least $2$ , and that while reading $w_{1}$ it has to perform, at some moment, a push of order $2$ after which some of the numbers $n_{i}$ or $m_{i}$ are not remembered in the topmost $1$ -stack (for example, in order to recognize $w_{1}$ , after each $]$ we can copy the topmost $1$ -stack, and remove a fragment of its copy, so that the matching opening bracket is on the top). But now we can consider the word

$$w_{2}=w_{1}\star^{n^{\prime}_{1}}w_{1}\star^{n^{\prime}_{2}}\dots w_{1}\star^{n^{\prime}_{N}}w_{1}\star^{m^{\prime}_{N+1}}]\star^{m^{\prime}_{N}}]\dots\star^{m^{\prime}_{1}}]\star^{m^{\prime}_{0}}[\,,$$

where the numbers $n_{i},m_{i}$ in each copy of $w_{1}$ are independent (so in fact each $w_{1}$ is a different word). Notice that each $w_{1}$ ends with an unmatched opening bracket; these brackets are matched by the closing brackets at the end of $w_{2}$ . We can now almost repeat the previous reasoning. First, $\mathit{stars}(w_{2})$ equals the sum of all numbers, so they all have to be kept on the stack. Then, we draw a line after reading the last $w_{1}$ (that is, separating the $1$ -stacks created before that moment from those created later). By the order- $1$ argument, some number from each $w_{1}$ is not present in the topmost $1$ -stack after reading this $w_{1}$ , so it cannot be present above the line. Next, for each $i$ we try to end the word already after the $i$ -th closing bracket (among those at the end of $w_{2}$ , not those inside the words $w_{1}$ ). When a $\sharp$ follows each of these prefixes, the automaton has to go below the line and behave differently (include a different subset of those values which are not present above the line), so it has to cross the line in different states. This is again a contradiction when $N$ is greater than the number of states. By induction we can continue in this way: nesting the words $w_{n}$ again, we can show that the same problem arises for every order of the DPDA.
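The same kind of check works for $w_{2}$ . The sketch below (hypothetical helpers; `*` stands for $\star$ ; the sample copies of $w_{1}$ and the numbers $n^{\prime}_{i}$ , $m^{\prime}_{i}$ are arbitrary) verifies that $\mathit{stars}(w_{2})$ counts every star. It also checks that the prefix ending after the $i$ -th final closing bracket counts exactly the stars of the first $N+1-i$ copies of $w_{1}$ plus $n^{\prime}_{1}+\dots+n^{\prime}_{N-i}$ , so different prefixes depend on different subsets of the numbers.

```python
def stars(w: str) -> int:
    """stars(w); the ASCII '*' stands for the star letter."""
    unclosed, seen = [], 0
    for c in w:
        if c == "[":
            unclosed.append(seen)
        elif c == "]":
            if not unclosed:
                return 0
            unclosed.pop()
        else:
            seen += 1
    return unclosed[-1] if unclosed else 0


def build_w1(n, m):
    """w_1 for n = [n_1, ..., n_N] and m = [m_0, ..., m_{N+1}]."""
    N = len(n)
    w = "".join("[" + "*" * ni for ni in n) + "[" + "*" * m[N + 1]
    for j in range(N, -1, -1):
        w += "]" + "*" * m[j]
    return w + "["


def w2_and_prefixes(w1s, n_prime, m_prime):
    """w_2 built from N+1 words w_1, plus the prefixes ending after each final ']'."""
    N = len(n_prime)
    if len(w1s) != N + 1 or len(m_prime) != N + 2:
        raise ValueError("expected N+1 words w_1 and N+2 numbers m'_j")
    head = "".join(w1s[j] + "*" * n_prime[j] for j in range(N)) + w1s[N] + "*" * m_prime[N + 1]
    tail = "".join("]" + "*" * m_prime[j] for j in range(N, -1, -1))
    w2 = head + tail + "["
    cuts = [len(head) + p + 1 for p, c in enumerate(tail) if c == "]"]
    return w2, [w2[:cut] for cut in cuts]


w1s = [build_w1([1, 2], [1, 0, 2, 1]), build_w1([0, 3], [2, 1, 1, 1]), build_w1([2, 0], [0, 0, 1, 3])]
n_prime, m_prime = [4, 1], [1, 2, 3, 2]
w2, prefixes = w2_and_prefixes(w1s, n_prime, m_prime)
N = len(n_prime)
print(stars(w2) == w2.count("*"))                                    # True
print([stars(prefixes[i - 1]) == sum(w.count("*") for w in w1s[: N + 1 - i])
       + sum(n_prime[: N - i]) for i in range(1, N + 1)])            # [True, True]
```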

Although the above idea of the proof looks simple, formalizing it is not straightforward. We have to deal with the following issues:

(1)

Above we have argued why a $1$ -DPDA cannot deal correctly with the word $w_{1}$ . But in fact we should consider any $n$ -DPDA, and prove that it cannot store all numbers from $w_{1}$ inside one $1$ -stack. Then a problem arises: when crossing “the line”, the stack no longer has only one possible form. Indeed, the topmost $1$ -stack has one fixed form, but we can cross the line in a copy of this $1$ -stack, with anything below this $1$ -stack. We can even cross the line multiple times, in several copies of the $1$ -stack. Thus, it is no longer true that the number of states bounds the number of ways in which we can visit a substack. The ways of visiting a substack are described by types of stacks and by types of sequences of configurations, defined in Section LABEL:sec:types. The key point is that there are finitely many types for a fixed DPDA.
(2)

Where exactly is a number stored in a stack? And where exactly should “the line” be placed? This is not clear-cut, since a DPDA may delay some stack operations by keeping information in its state, and it may temporarily create redundant structures on the stack, which are removed later in the run. To deal with this issue, in Section LABEL:sec:milestone we define milestone configurations. Intuitively, these are configurations in which no additional garbage is present on the stack.
(3)

Finally, why would it be wrong if, while reading the $\sharp$ symbols, the automaton did not visit the place where a number contributing to $\mathit{stars}(\cdot)$ is stored? Maybe, accidentally, this number is equal to some other amount stored in the stack. Or maybe it was propagated to some other region of the stack by some involved manipulations. To overcome this difficulty, in Section LABEL:sec:pumping we prove a pumping lemma. It allows us to change any of the numbers in the input word without altering the whole stack too much. If some number (included in $\mathit{stars}(\cdot)$ ) is changed, the DPDA has to enter the part of the stack changed by the pumping lemma; otherwise it would incorrectly accept after the same number of $\sharp$ symbols for two words with different values of $\mathit{stars}(\cdot)$ .

6. The History Function and Special Runs

We begin this section by defining the history function. Then we define two classes of runs that are particularly interesting for us, namely $k$ -upper runs and $k$ -returns.

For any run $R$ and any $k$ -stack $s^{k}$ of $R(|R|)$ , where $k\in[0,n]$ , we define a $k$ -stack $\mathsf{hist}(R,s^{k})$ . Intuitively, $\mathsf{hist}(R,s^{k})$ is the (unique) $k$ -stack of $R(0)$ that evolved into the $k$ -stack $s^{k}$ of $R(|R|)$ . Formally, we define $\mathsf{hist}(R,s^{k})$ by induction on the length of $R$ , starting with the case of $k=0$ . When $|R|=0$ , we take $\mathsf{hist}(R,s^{0})=s^{0}$ . Consider now a longer run $R=S\circ T$ with $|T|=1$ . We take $\mathsf{hist}(R,s^{0})=\mathsf{hist}(S,s^{0})$ if the last transition of $R$ is a $\mathsf{read}$ transition or performs a $\mathsf{pop}$ operation, and also if it performs $\mathsf{push}^{r}_{\gamma}$ and $s^{0}$ is not in the topmost $(r-1)$ -stack of $R(|R|)$ . If the last transition of $R$ performs $\mathsf{push}^{r}_{\gamma}$ and $s^{0}$ is in the topmost $(r-1)$ -stack of $R(|R|)$ , then $\mathsf{hist}(R,s^{0})=\mathsf{hist}(S,t^{0})$ , where $t^{0}$ is equal to $s^{0}$ with the $(n-r+1)$ -th coordinate of its position decreased by $1$ (i.e., $t^{0}$ is the $0$ -stack of $T(0)$ from which $s^{0}$ was obtained as a copy). Notice that, for technical convenience, $\mathsf{hist}$ works in this way also for the topmost $0$ -stack, even though the content of the topmost $0$ -stack changes during the $\mathsf{push}^{r}_{\gamma}$ operation. For $k>0$ , we define $\mathsf{hist}(R,s^{k})$ to be the $k$ -stack of $R(0)$ containing $\mathsf{hist}(R,s^{0})$ for all $0$ -stacks $s^{0}$ in $s^{k}$ (observe that when $s^{0}$ , $t^{0}$ are two $0$ -stacks in $s^{k}$ , the $0$ -stacks $\mathsf{hist}(R,s^{0})$ and $\mathsf{hist}(R,t^{0})$ are in the same $k$ -stack).
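The inductive definition of $\mathsf{hist}$ on $0$-stacks can be traced mechanically. The Python sketch below is our own illustration under simplifying assumptions, not a construction from the paper: an $n$-stack is modelled as a nested list of depth $n$ whose innermost elements are stack symbols; a $0$-stack is identified by its position, a tuple of $n$ one-based coordinates listed from the outermost order inwards (so $(2,2)$ is the second symbol of the second $1$-stack, matching the positions used in the example of this section); operations are hypothetical tuples `("pop", r)` and `("push", r, gamma)`; states, $\mathsf{read}$ transitions and collapse are not modelled. All function names (`apply_op`, `hist_pos`, `symbol_at`, ...) are hypothetical.

```python
from copy import deepcopy

# Hypothetical model: an n-stack is a nested list of depth n with stack symbols
# innermost; a 0-stack is identified by its position, a tuple of n 1-based
# coordinates listed from the outermost order inwards.

def topmost(stack, n, k):
    """Reference to the topmost k-stack of an n-stack."""
    for _ in range(n - k):
        stack = stack[-1]
    return stack

def top_pos(stack, n):
    """Position of the topmost 0-stack of an n-stack."""
    pos = []
    for _ in range(n):
        pos.append(len(stack))
        stack = stack[-1]
    return tuple(pos)

def apply_op(stack, n, op):
    """Apply ("pop", r) or ("push", r, gamma) to a copy of `stack`."""
    new = deepcopy(stack)
    target = topmost(new, n, op[1])              # topmost r-stack
    if op[0] == "pop":
        target.pop()                             # drop its topmost (r-1)-stack
    else:
        target.append(deepcopy(target[-1]))      # copy its topmost (r-1)-stack
        topmost(new, n, 1)[-1] = op[2]           # a push may rewrite the top symbol
    return new

def hist_pos(initial, n, ops, pos):
    """Position in `initial` of hist(R, s^0), where R is the run from `initial`
    performing `ops` and s^0 is the 0-stack at `pos` in its last configuration."""
    configs = [initial]
    for op in ops:
        configs.append(apply_op(configs[-1], n, op))
    # Peel off the last transition repeatedly, as in the induction on |R|.
    for op, after in zip(reversed(ops), reversed(configs[1:])):
        if op[0] == "push":
            c = n - op[1] + 1                    # the (n-r+1)-th coordinate
            if pos[:c] == top_pos(after, n)[:c]: # s^0 lies in the fresh copy
                pos = pos[:c - 1] + (pos[c - 1] - 1,) + pos[c:]
        # pop transitions leave the positions of the remaining 0-stacks unchanged
    return pos

def symbol_at(stack, pos):
    """Symbol of the 0-stack at `pos`."""
    for i in pos:
        stack = stack[i - 1]
    return stack

R0 = [["a", "b"], ["c", "d"]]
ops = [("push", 2, "e"), ("pop", 1), ("pop", 2), ("pop", 1), ("push", 1, "d"), ("pop", 1)]
p = hist_pos(R0, 2, ops[:5], (2, 2))
print(p, symbol_at(R0, p))  # (2, 1) c
```

Run on the first five operations of the example run of this section, the sketch returns position $(2,1)$ with symbol $c$, in line with $\mathsf{hist}(R{\restriction}_{0,5},(d,(2,2)))=(c,(2,1))$. Walking the run backwards mirrors the induction on $|R|$: only a $\mathsf{push}^{r}$ whose fresh copy contains the tracked $0$-stack moves it, by decrementing the $(n-r+1)$-th coordinate.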

Notice that whenever $R=S\circ T$ , we have $\mathsf{hist}(S,\mathsf{hist}(T,s^{k}))=\mathsf{hist}(R,s^{k})$ . We use this property, which we call compositionality of histories, extensively in the sequel.

For $k\in[0,n]$ , we say that a run $R$ is $k$ -upper if $\mathsf{hist}(R,\mathsf{top}^{k}(R(|R|)))=\mathsf{top}^{k}(R(0))$ ; let $\mathsf{up}^{k}$ be the set of all such runs. Intuitively, a run $R$ is $k$ -upper when the topmost $k$ -stack of $R(|R|)$ is a copy of the topmost $k$ -stack of $R(0)$ , possibly with some changes made to it. Notice that $\mathsf{up}^{n}$ contains all runs, $\mathsf{up}^{k}\subseteq\mathsf{up}^{l}$ for $k\leq l$ , and for a run $R\circ S$ with $S\in\mathsf{up}^{k}$ we have $R\in\mathsf{up}^{k}\iff R\circ S\in\mathsf{up}^{k}$ (the last property follows from compositionality of histories).

For $k\in[1,n]$ , a run $R$ is a $k$ -return if

•

$\mathsf{hist}(R,\mathsf{top}^{k-1}(R(|R|)))=\mathsf{top}^{k-1}(\mathsf{pop}^{k}(R(0)))$ , and
•

$R{\restriction}_{i,|R|}\not\in\mathsf{up}^{k-1}$ for all $i\in[0,|R|-1]$ .

Let $\mathsf{ret}^{k}$ be the set of $k$ -returns. Observe that $\mathsf{ret}^{k}\subseteq\mathsf{up}^{k}$ . Intuitively, $R$ is a $k$ -return when the topmost $k$ -stack of $R(|R|)$ is obtained from the topmost $k$ -stack of $R(0)$ by removing its topmost $(k-1)$ -stack (not merely in the sense of contents: we require that it was really obtained this way).
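The definitions of $k$-upper runs and $k$-returns can likewise be checked mechanically on small runs. The following Python sketch is our own, under assumptions not taken from the paper: stacks are nested lists with symbols innermost, a $0$-stack is identified by its position (a tuple of one-based coordinates listed from the outermost order inwards), operations are hypothetical tuples `("pop", r)` and `("push", r, gamma)`, and states and collapse are omitted. A $k$-stack is identified by the first $n-k$ coordinates of any of its $0$-stacks, which is sound because all $0$-stacks of a $k$-stack have their histories in a single $k$-stack. The names `is_upper` and `is_return` are hypothetical tests of $R{\restriction}_{i,j}\in\mathsf{up}^{k}$ and $R{\restriction}_{i,j}\in\mathsf{ret}^{k}$; the latter locates the topmost $(k-1)$-stack of $\mathsf{pop}^{k}(R(i))$ by decrementing the $(n-k+1)$-th coordinate of the top position.

```python
from copy import deepcopy

# Hypothetical model: an n-stack is a nested list of depth n with stack symbols
# innermost; a 0-stack is identified by its position, a tuple of 1-based
# coordinates listed from the outermost order inwards.

def top_pos(stack, n):
    """Position of the topmost 0-stack of an n-stack."""
    pos = []
    for _ in range(n):
        pos.append(len(stack))
        stack = stack[-1]
    return tuple(pos)

def topmost(stack, n, k):
    """Reference to the topmost k-stack of an n-stack."""
    for _ in range(n - k):
        stack = stack[-1]
    return stack

def apply_op(stack, n, op):
    """Apply ("pop", r) or ("push", r, gamma) to a copy of `stack`."""
    new = deepcopy(stack)
    target = topmost(new, n, op[1])              # topmost r-stack
    if op[0] == "pop":
        target.pop()                             # drop its topmost (r-1)-stack
    else:
        target.append(deepcopy(target[-1]))      # copy its topmost (r-1)-stack
        topmost(new, n, 1)[-1] = op[2]           # a push may rewrite the top symbol
    return new

def run_configs(initial, n, ops):
    configs = [initial]
    for op in ops:
        configs.append(apply_op(configs[-1], n, op))
    return configs

def hist_pos(configs, ops, n, i, j, pos):
    """Position in configs[i] of the history, along R|_{i,j}, of the 0-stack at pos in configs[j]."""
    for t in range(j, i, -1):                    # ops[t-1] leads from configs[t-1] to configs[t]
        op = ops[t - 1]
        if op[0] == "push":
            c = n - op[1] + 1                    # coordinate indexing (r-1)-stacks in an r-stack
            if pos[:c] == top_pos(configs[t], n)[:c]:
                pos = pos[:c - 1] + (pos[c - 1] - 1,) + pos[c:]
    return pos

def is_upper(configs, ops, n, i, j, k):
    """R|_{i,j} is k-upper: the history of top^k(R(j)) is top^k(R(i))."""
    h = hist_pos(configs, ops, n, i, j, top_pos(configs[j], n))
    return h[:n - k] == top_pos(configs[i], n)[:n - k]

def is_return(configs, ops, n, i, j, k):
    """R|_{i,j} is a k-return, for k >= 1."""
    h = hist_pos(configs, ops, n, i, j, top_pos(configs[j], n))[:n - k + 1]
    t = top_pos(configs[i], n)[:n - k + 1]
    below = t[:-1] + (t[-1] - 1,)                # top^{k-1} of pop^k(R(i))
    return h == below and not any(
        is_upper(configs, ops, n, m, j, k - 1) for m in range(i, j))

R0 = [["a", "b"], ["c", "d"]]
ops = [("push", 2, "e"), ("pop", 1), ("pop", 2), ("pop", 1), ("push", 1, "d"), ("pop", 1)]
cs = run_configs(R0, 2, ops)

def indices(pred, j, k):
    """All i such that R|_{i,j} satisfies `pred` for order k."""
    return [i for i in range(j + 1) if pred(cs, ops, 2, i, j, k)]

for j in range(len(cs)):
    print(j, indices(is_upper, j, 0), indices(is_upper, j, 1),
          indices(is_return, j, 1), indices(is_return, j, 2))
```

Each printed row lists, for a configuration index $j$ of the example run of this section, the indices $i$ for which $R{\restriction}_{i,j}$ is $0$-upper, $1$-upper, a $1$-return and a $2$-return; on that run these rows agree with Table 1.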

Example.

Consider a $2$ -DPDA, and its run $R$ of length $6$ in which $\mathsf{pos}{\downarrow}(\pi_{2}(R(0)))=[[a,b],[c,d]]$ , and in which the operations between consecutive configurations are

\displaystyle\mathsf{push}^{2}_{e}\,,\ \mathsf{pop}^{1}\,,\ \mathsf{pop}^{2}\,,\ \mathsf{pop}^{1}\,,\ \mathsf{push}^{1}_{d}\,,\ \mathsf{pop}^{1}\,.

Recall that, by our definition, a $\mathsf{push}$ of any order can change the topmost stack symbol. The stack contents of the configurations in the run, together with the subruns that are $k$ -upper runs and $k$ -returns, are presented in Table 1.

Table 1. Stack contents of the example run, and subruns being $k$ -upper runs and $k$ -returns

\displaystyle\begin{array}{c|l|l|l|l|l}
j & \mathsf{pos}{\downarrow}(\pi_{2}(R(j))) & i\colon\,R{\restriction}_{i,j}\in\mathsf{up}^{0} & i\colon\,R{\restriction}_{i,j}\in\mathsf{up}^{1} & i\colon\,R{\restriction}_{i,j}\in\mathsf{ret}^{1} & i\colon\,R{\restriction}_{i,j}\in\mathsf{ret}^{2}\\
\hline
0 & [[a,b],[c,d]] & 0 & 0 & - & -\\
1 & [[a,b],[c,d],[c,e]] & 0,1 & 0,1 & - & -\\
2 & [[a,b],[c,d],[c]] & 2 & 0,1,2 & 0,1 & -\\
3 & [[a,b],[c,d]] & 0,3 & 0,3 & - & 1,2\\
4 & [[a,b],[c]] & 4 & 0,3,4 & 0,3 & -\\
5 & [[a,b],[c,d]] & 4,5 & 0,3,4,5 & - & -\\
6 & [[a,b],[c]] & 4,6 & 0,3,4,5,6 & 5 & -
\end{array}

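Notice that $R$ itself is not a $1$ -return: although the history of the topmost $0$ -stack of $R(6)$ is the topmost $0$ -stack of $\mathsf{pop}^{1}(R(0))$ (at position $(2,1)$ ), the subrun $R{\restriction}_{4,6}$ is $0$ -upper, which violates the second condition. We have $\mathsf{hist}(R{\restriction}_{0,5},(d,(2,2)))=(c,(2,1))$ .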
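Unfolding the inductive definition of $\mathsf{hist}$ along $R{\restriction}_{0,5}$ shows where the value $(c,(2,1))$ comes from; as in the claim above, a $0$-stack is written as a pair of its symbol and its position. Only the last transition, $\mathsf{push}^{1}_{d}$, moves the tracked $0$-stack, because $(2,2)$ is the topmost $0$-stack of $R(5)$; the earlier $\mathsf{push}^{2}_{e}$ does not, because position $(2,1)$ is not in the topmost $1$-stack of $R(1)$, which is the third one.

$$
\begin{aligned}
\mathsf{hist}(R{\restriction}_{0,5},(d,(2,2)))
&=\mathsf{hist}(R{\restriction}_{0,4},(c,(2,1))) && \text{$\mathsf{push}^{1}_{d}$: $(2,2)$ lies in the topmost $0$-stack of $R(5)$}\\
&=\mathsf{hist}(R{\restriction}_{0,1},(c,(2,1))) && \text{$\mathsf{pop}^{1}$, $\mathsf{pop}^{2}$, $\mathsf{pop}^{1}$ keep positions}\\
&=\mathsf{hist}(R{\restriction}_{0,0},(c,(2,1))) && \text{$\mathsf{push}^{2}_{e}$: $(2,1)$ is not in the topmost $1$-stack of $R(1)$}\\
&=(c,(2,1)).
\end{aligned}
$$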

6.1. Basic Properties of Runs

We now state several easy propositions, which are useful later, and also give more intuition about the above definitions.

Proposition 3.

Let $R$ be a $k$ -upper run (where $k\in[0,n]$ ) such that $R{\restriction}_{i,|R|}\not\in\mathsf{up}^{k}$ for each $i\in[1,|R|-1]$ . Then either

•

$\mathsf{top}^{k}(R(0))\cong\mathsf{top}^{k}(R(|R|))$ ; additionally for every $0$ -stack $s^{0}$ in $\mathsf{top}^{k}(R(|R|))$ , $\mathsf{hist}(R,s^{0})$ is the corresponding $0$ -stack in $\mathsf{top}^{k}(R(0))$ , or
•

$|R|=1$ and the only transition of $R$ performs $\mathsf{pop}^{r}$ for $r\leq k$ , or $\mathsf{push}^{r}_{\gamma}$ for $r\leq k$ .

Proof 6.1.

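For $|R|\leq 1$ we immediately fall into one of the possibilities. Otherwise, consider the history of the topmost $k$ -stack of $R(|R|)$ ; since $R$ is $k$ -upper, at time $0$ this is the topmost $k$ -stack of $R(0)$ . Since $R{\restriction}_{i,|R|}\not\in\mathsf{up}^{k}$ for each $i\in[1,|R|-1]$ , this $k$ -stack is covered by the first operation of $R$ , and afterwards it is not the topmost $k$ -stack until $R(|R|)$ . Thus, it remains unchanged (we have the first possibility). ∎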

Next, we give four propositions about $k$ -upper runs and $k$ -returns.

Proposition 4.

Let $R$ be a $k$ -upper run, where $k\in[1,n]$ . Then $R$ is $(k-1)$ -upper if and only if $|\mathsf{top}^{k}(R(0))|\leq|\mathsf{top}^{k}(R(i))|$ for each $i\in[0,|R|]$ such that $R{\restriction}_{i,|R|}\in\mathsf{up}^{k}$ .
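As a sanity check of our own, Proposition 4 with $k=1$ can be traced on the run $R$ of the example above, reading the memberships off Table 1 (we read $|s^{1}|$ as the number of $0$-stacks in the $1$-stack $s^{1}$, an assumption about the notation):

$$
\begin{array}{ll}
R{\restriction}_{0,3}: & \{i : R{\restriction}_{i,3}\in\mathsf{up}^{1}\}=\{0,3\},\quad |\mathsf{top}^{1}(R(0))|=|\mathsf{top}^{1}(R(3))|=2,\quad\text{and } R{\restriction}_{0,3}\in\mathsf{up}^{0};\\
R{\restriction}_{0,2}: & \{i : R{\restriction}_{i,2}\in\mathsf{up}^{1}\}=\{0,1,2\},\quad |\mathsf{top}^{1}(R(2))|=1<2=|\mathsf{top}^{1}(R(0))|,\quad\text{and } R{\restriction}_{0,2}\notin\mathsf{up}^{0}.
\end{array}
$$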

Proposition 5.

Let $S\circ T$ be a $(k-1)$ -upper run in which $T$ is $k$ -upper, where $k\in[1,n]$ . Then $S$ is $(k-1)$ -upper.

Proposition 6.

Let $R$ be a run that is not $(k-1)$ -upper, where $k\in[1,n]$ . Suppose that $R{\restriction}_{0,j}$ is $(k-1)$ -upper for the greatest index $j\in[0,|R|-1]$ such that $R{\restriction}_{j,|R|}$ is $k$ -upper (in particular such an index $j$ exists). Then $R$ is a $k$ -return.
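Proposition 6 can be traced on two subruns of the run $R$ of the example above, again reading the memberships off Table 1 (a check of ours; indices are taken in $R$, so for $R{\restriction}_{1,3}$ the range $[0,|R|-1]$ of the proposition becomes $[1,2]$, and every run is $2$-upper since $n=2$):

$$
\begin{array}{l}
R{\restriction}_{0,2}\notin\mathsf{up}^{0},\quad \max\{j\in[0,1] : R{\restriction}_{j,2}\in\mathsf{up}^{1}\}=1,\quad R{\restriction}_{0,1}\in\mathsf{up}^{0}\ \Longrightarrow\ R{\restriction}_{0,2}\in\mathsf{ret}^{1};\\
R{\restriction}_{1,3}\notin\mathsf{up}^{1},\quad \max\{j\in[1,2] : R{\restriction}_{j,3}\in\mathsf{up}^{2}\}=2,\quad R{\restriction}_{1,2}\in\mathsf{up}^{1}\ \Longrightarrow\ R{\restriction}_{1,3}\in\mathsf{ret}^{2}.
\end{array}
$$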

Proposition 7.

Let $R$ be a $k$ -return, where $k\in[1,n]$ . Then $\mathsf{pop}^{k}(\mathsf{top}^{k}(R(0)))\cong\mathsf{top}^{k}(R(|R|))$ . Additionally for every $0$ -stack $s^{0}$ in $\mathsf{top}^{k}(R(|R|))$ , $\mathsf{hist}(R,s^{0})$ is the corresponding $0$ -stack in $\mathsf{pop}^{k}(\mathsf{top}^{k}(R(0)))$ .
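Proposition 7 can be observed on two returns listed in Table 1 for the example run (a check of ours):

$$
\begin{array}{ll}
R{\restriction}_{0,4}\in\mathsf{ret}^{1}: & \mathsf{pop}^{1}(\mathsf{top}^{1}(R(0)))\ \text{and}\ \mathsf{top}^{1}(R(4))\ \text{both have contents } [c];\\
R{\restriction}_{1,3}\in\mathsf{ret}^{2}: & \mathsf{pop}^{2}(\mathsf{top}^{2}(R(1)))\ \text{and}\ \mathsf{top}^{2}(R(3))\ \text{both have contents } [[a,b],[c,d]].
\end{array}
$$

In both cases $\mathsf{hist}$ keeps the positions of the $0$-stacks of the final topmost $k$-stack: $R{\restriction}_{1,3}$ performs only $\mathsf{pop}$ operations, and the single $\mathsf{push}^{2}_{e}$ of $R{\restriction}_{0,4}$ affects only $0$-stacks in the third $1$-stack of $R(1)$, whereas $\mathsf{top}^{1}(R(4))$ is the second $1$-stack. Hence each of these $0$-stacks is mapped to the corresponding $0$-stack of $\mathsf{pop}^{k}(\mathsf{top}^{k}(R(0)))$.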
