Various Types of Comet Languages and their Application in External Contextual Grammars

Marvin Ködding, Bianca Truthe
Institut für Informatik, Universität Giessen
Arndtstr. 2, 35392 Giessen, Germany
{marvin.koedding,bianca.truthe}@informatik.uni-giessen.de

Abstract

In this paper, we continue the research on the power of contextual grammars with selection languages from subfamilies of the family of regular languages. We investigate various comet-like types of languages and compare such language families to some other subregular families of languages (finite, monoidal, nilpotent, combinational, (symmetric) definite, ordered, non-counting, power-separating, suffix-closed, commutative, circular, or union-free languages). Further, we compare the language families defined by these types for the selection with each other and with the families of the hierarchy obtained for external contextual grammars. In this way, we extend the existing hierarchy by new language families.

Keywords: Comet languages, contextual grammars, subregular selection languages, computational capacity.

1 Introduction

Contextual grammars were first introduced by Solomon Marcus in [17] as a formal model that might be used for the generation of natural languages. The derivation steps consist in adding contexts to given well-formed sentences, starting from an initial finite basis. Formally, a context is given by a pair $(u,v)$ of words and the external adding to a word $x$ gives the word $uxv$ . In order to control the derivation process, contextual grammars with selection in a certain family of languages were defined. In such contextual grammars, a context $(u,v)$ may be added only around a word $x$ if this word $x$ belongs to a language which is associated with the context. Language families were defined where all selection languages in a contextual grammar belong to some language family $F$ .

The study of external contextual grammars with selection in special regular sets was started by Jürgen Dassow in [7]. The research was continued by Jürgen Dassow, Florin Manea, and Bianca Truthe (see [9]) where further subregular families of selection languages were considered.

In the present paper, we extend the hierarchy of subregular language families by families of comet-like languages. Furthermore, we investigate the generative capacity of external contextual grammars with selection in such subregular language families.

2 Preliminaries

Throughout the paper, we assume that the reader is familiar with the basic concepts of the theory of automata and formal languages. For details, we refer to [23]. Here we only recall some notation, definitions, and previous results which we need for the present research.

An alphabet is a non-empty finite set of symbols. For an alphabet $V$ , we denote by $V^{*}$ and $V^{+}$ the set of all words and the set of all non-empty words over $V$ , respectively. The empty word is denoted by $\lambda$ . For a word $w$ and a letter $a$ , we denote the length of $w$ by $|w|$ and the number of occurrences of the letter $a$ in the word $w$ by $|w|_{a}$ . For a set $A$ , we denote its cardinality by $|A|$ . The reversal of a word $w$ is denoted by $w^{R}$ : if $w=x_{1}x_{2}\ldots x_{n}$ for letters $x_{1},\ldots,x_{n}$ , then $w^{R}=x_{n}x_{n-1}\ldots x_{1}$ . By $L^{R}$ , we denote the language of all reversals of the words in $L$ : $L^{R}=\{\;w^{R}\mid w\in L\;\}$ .

A deterministic finite automaton is a quintuple

{\cal A}=(V,Z,z_{0},F,\delta)

where $V$ is a finite set of input symbols, $Z$ is a finite set of states, $z_{0}\in Z$ is the initial state, $F\subseteq Z$ is a set of accepting states, and $\delta$ is a transition function $\delta:Z\times V\to Z$ . The language accepted by such an automaton is the set of all input words over the alphabet $V$ which lead letterwise by the transition function from the initial state to an accepting state.
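The acceptance condition just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `accepts`, the dictionary encoding of $\delta$, and the two-state sample automaton `delta_even` are hypothetical and do not occur in the paper.

```python
from typing import Dict, Set, Tuple

def accepts(delta: Dict[Tuple[str, str], str], start: str,
            accepting: Set[str], word: str) -> bool:
    """Read the word letter by letter and test whether the run ends in an accepting state."""
    state = start
    for letter in word:
        # delta is total on Z x V, so every lookup succeeds for words over V
        state = delta[(state, letter)]
    return state in accepting

# Hypothetical sample automaton over V = {a}: it accepts exactly the words of even length.
delta_even = {("even", "a"): "odd", ("odd", "a"): "even"}
```

With this encoding, `accepts(delta_even, "even", {"even"}, "aa")` follows the run even, odd, even and ends in the accepting state.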

A regular expression over some alphabet $V$ is defined inductively as follows:

1. $\emptyset$ is a regular expression;
2. every element $x\in V$ is a regular expression;
3. if $R$ and $S$ are regular expressions, so are the concatenation $R\cdot S$, the union $R\cup S$, and the Kleene closure $R^{*}$;
4. for every regular expression, there is a natural number $n$ such that the regular expression is obtained from the atomic elements $\emptyset$ and $x\in V$ by $n$ operations concatenation, union, or Kleene closure.

The language $L(R)$ which is described by a regular expression $R$ is also inductively defined:

1. $L(\emptyset)=\emptyset$;
2. for every element $x\in V$, we have $L(x)=\{x\}$;
3. if $R$ and $S$ are regular expressions, then

$L(R\cdot S)=L(R)\cdot L(S),\quad L(R\cup S)=L(R)\cup L(S),\quad L(R^{*})=(L(R))^{*}.$

A general regular expression admits as operations (in the third item of the definition above) also intersection (where $L(R\cap S)=L(R)\cap L(S)$ ) and complementation (where $L(\overline{R})=\overline{L(R)}$ ).

All the languages accepted by a finite automaton or described by some regular expression are called regular and form a family denoted by $\mathit{REG}$ . Any subfamily of this set is called a subregular language family.

2.1 Some subregular language families

We consider the following restrictions for regular languages. In the following list of properties, we already give the abbreviation which denotes the family of all languages with the respective property. Let $L$ be a regular language over an alphabet $V$. With respect to the alphabet $V$, the language $L$ is said to be

• monoidal ($\mathit{MON}$) if and only if $L=V^{*}$,
• nilpotent ($\mathit{NIL}$) if and only if it is finite or its complement $V^{*}\setminus L$ is finite,
• combinational ($\mathit{COMB}$) if and only if it has the form $L=V^{*}X$ for some subset $X\subseteq V$,
• definite ($\mathit{DEF}$) if and only if it can be represented in the form $L=A\cup V^{*}B$ where $A$ and $B$ are finite subsets of $V^{*}$,
• symmetric definite ($\mathit{SYDEF}$) if and only if $L=EV^{*}H$ for some regular languages $E$ and $H$,
• suffix-closed ($\mathit{SUF}$) (or fully initial or multiple-entry language) if and only if, for any two words over $V$, say $x\in V^{*}$ and $y\in V^{*}$, the relation $xy\in L$ implies the relation $y\in L$,
• ordered ($\mathit{ORD}$) if and only if the language is accepted by some deterministic finite automaton ${\cal A}=(V,Z,z_{0},F,\delta)$ with an input alphabet $V$, a finite set $Z$ of states, a start state $z_{0}\in Z$, a set $F\subseteq Z$ of accepting states and a transition mapping $\delta$ where $(Z,\preceq)$ is a totally ordered set and, for any input symbol $a\in V$, the relation $z\preceq z^{\prime}$ implies $\delta(z,a)\preceq\delta(z^{\prime},a)$,
• commutative ($\mathit{COMM}$) if and only if it contains with each word also all permutations of this word,
• circular ($\mathit{CIRC}$) if and only if it contains with each word also all circular shifts of this word,
• non-counting ($\mathit{NC}$) if and only if there is a natural number $k\geq 1$ such that, for any three words $x\in V^{*}$, $y\in V^{*}$, and $z\in V^{*}$, it holds $xy^{k}z\in L$ if and only if $xy^{k+1}z\in L$,
• star-free ($\mathit{SF}$) if and only if $L$ can be described by a regular expression which is built by concatenation, union, and complementation,
• power-separating ($\mathit{PS}$) if and only if, there is a natural number $m\geq 1$ such that for any word $x\in V^{*}$, either $J_{x}^{m}\cap L=\emptyset$ or $J_{x}^{m}\subseteq L$ where $J_{x}^{m}=\{\;x^{n}\mid n\geq m\;\}$,
• union-free ($\mathit{UF}$) if and only if $L$ can be described by a regular expression which is only built by concatenation and Kleene closure,
• star ($\mathit{STAR}$) if and only if $L=H^{*}$ for some regular language $H\subseteq V^{*}$,
• left-sided comet ($\mathit{LCOM}$) if and only if $L=EG^{*}$ for some regular language $E$ and a regular language $G\notin\{\emptyset,\{\lambda\}\}$,
• right-sided comet ($\mathit{RCOM}$) if and only if $L=G^{*}H$ for some regular language $H$ and a regular language $G\notin\{\emptyset,\{\lambda\}\}$,
• two-sided comet ($\mathit{2COM}$) if and only if $L=EG^{*}H$ for two regular languages $E$ and $H$ and a regular language $G\notin\{\emptyset,\{\lambda\}\}$.

We remark that monoidal, nilpotent, combinational, (symmetric) definite, ordered, non-counting, star-free, union-free, star, and (left-, right-, or two-sided) comet languages are regular, whereas non-regular languages of the other types mentioned above exist. Here, we consider among the suffix-closed, commutative, circular, and power-separating languages only those which are also regular. By $\mathit{FIN}$ , we denote the family of languages with finitely many words. In [18], it was shown that the families of the non-counting languages and the star-free languages are equivalent ( $\mathit{NC}=\mathit{SF}$ ).

Some properties of the languages of the classes mentioned above can be found in [24] (monoids), [11] (nilpotent languages), [14] (combinational and commutative languages), [22] (definite languages), [21] (symmetric definite languages), [12] and [6] (suffix-closed languages), [25] (ordered languages), [16] (circular languages), [18] (non-counting and star free languages), [26] (power-separating languages), [3] (union-free languages), [4] (star languages), [5] (comet languages).

2.2 Contextual grammars

Let ${\cal F}$ be a family of languages. A contextual grammar with selection in ${\cal F}$ is a triple $G=(V,{\cal S},A)$ where

– $V$ is an alphabet,
– ${\cal S}$ is a finite set of selection pairs $(S,C)$ with a selection language $S$ over some subset $U$ of the alphabet $V$ which belongs to the family ${\cal F}$ with respect to the alphabet $U$ and a finite set $C\subset V^{*}\times V^{*}$ of contexts where, for each context $(u,v)\in C$, at least one side is not empty: $uv\not=\lambda$,
– $A$ is a finite subset of $V^{*}$ (its elements are called axioms).

We write a selection pair $(S,C)$ also as $S\to C$ . In the case that $C$ is a singleton set $C=\{(u,v)\}$ , we also write $S\to(u,v)$ . For a contextual grammar $G=(V,\left\{\,(S_{1},C_{1}),(S_{2},C_{2}),\dots,(S_{n},C_{n})\,\right\},A)$ , we set

\ell_{A}(G)=\max\left\{\>|w|\;|\;w\in A\>\right\},\quad\ell_{C}(G)=\max\left\{\>|uv|\;|\;(u,v)\in C_{i},1\leq i\leq n\>\right\},\quad\ell(G)=\ell_{A}(G)+\ell_{C}(G)+1.

We now define the derivation modes for contextual grammars with selection.

Let $G=(V,{\cal S},A)$ be a contextual grammar with selection. A direct external derivation step in $G$ is defined as follows: a word $x$ derives a word $y$ (written as $x\Longrightarrow y$ ) if and only if there is a pair $(S,C)\in{\cal S}$ such that $x\in S$ and $y=uxv$ for some pair $(u,v)\in C$ . Intuitively, one can only wrap a context $(u,v)\in C$ around a word $x$ if $x$ belongs to the corresponding selection language $S$ .

By $\Longrightarrow^{*}$ we denote the reflexive and transitive closure of the relation $\Longrightarrow$. The language generated by $G$ is $L(G)=\{\;z\mid x\Longrightarrow^{*}z\mbox{ for some }x\in A\;\}$.
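The external derivation relation can be simulated up to a length bound. The following Python sketch is an illustration only: the function `generate`, the encoding of a selection language as a membership predicate, and the one-rule sample grammar are assumptions made here and are not taken from the paper. Because every context satisfies $uv\neq\lambda$, each derivation step strictly lengthens the word, so discarding words longer than the bound loses no shorter word.

```python
from typing import Callable, Iterable, List, Set, Tuple

Context = Tuple[str, str]
SelectionPair = Tuple[Callable[[str], bool], List[Context]]

def generate(selection: List[SelectionPair], axioms: Iterable[str], max_len: int) -> Set[str]:
    """All words of length <= max_len that are derivable externally from the axioms."""
    reached = {w for w in axioms if len(w) <= max_len}
    frontier = list(reached)
    while frontier:
        x = frontier.pop()
        for in_selection, contexts in selection:
            if not in_selection(x):          # context may only be added if x is in S
                continue
            for u, v in contexts:
                y = u + x + v                # external adding: x ==> uxv
                if len(y) <= max_len and y not in reached:
                    reached.add(y)
                    frontier.append(y)
    return reached

# Hypothetical grammar: axiom λ, selection language V* (always true), context (a, b).
toy = [(lambda x: True, [("a", "b")])]
```

For the sample grammar, `generate(toy, [""], 6)` collects the words $a^{n}b^{n}$ with $n\leq 3$.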

Example 1

Consider the contextual grammar $G=(\{a,b,c\},\{(S_{1},C_{1}),(S_{2},C_{2})\},\{\lambda\})$ with

S_{1}=\{a,b\}^{*},\quad C_{1}=\{(\lambda,a),(\lambda,b)\},\qquad S_{2}=\{ab\}^{*},\quad C_{2}=\{(c,c)\}.

Starting from the axiom $\lambda$ , every word of the language $S_{1}$ is generated by applying the first selection. Starting from any word of $S_{2}\subset S_{1}$ , every word of the language $\{c\}S_{2}\{c\}$ is generated by applying the second selection. Other words are not generated.

Thus, the language generated is

L(G)=\{a,b\}^{*}\cup\{\;c(ab)^{n}c\mid n\geq 0\;\}.

Both selection languages are ordered. The language $S_{1}$ is accepted by a deterministic finite automaton with exactly one state; hence, it is ordered. The language $S_{2}$ is accepted by the deterministic finite automaton ${\cal A}=(\{a,b\},\{z_{0},z_{1},z_{2},z_{3}\},z_{1},\{z_{1}\},\delta)$ whose transition function $\delta$ is given in the following table. With respect to the order $z_{0}\preceq z_{1}\preceq z_{2}\preceq z_{3}$, both rows of the table are monotone, from which it can be seen that the automaton is ordered:

	$z_{0}$	$z_{1}$	$z_{2}$	$z_{3}$
$a$	$z_{0}$	$z_{2}$	$z_{3}$	$z_{3}$
$b$	$z_{0}$	$z_{0}$	$z_{1}$	$z_{3}$

$\Diamond$
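The two claims about the automaton in Example 1 (it is ordered and it accepts $\{ab\}^{*}$) can be checked mechanically. The sketch below transcribes the transition table; the order $z_{0}\preceq z_{1}\preceq z_{2}\preceq z_{3}$ is an assumption read off from the indices, and the names `DELTA`, `is_order_preserving`, and `accepts` are chosen here for illustration.

```python
from itertools import product

STATES = ["z0", "z1", "z2", "z3"]     # listed in the assumed order z0 < z1 < z2 < z3
DELTA = {                              # one row of the table per input symbol
    "a": {"z0": "z0", "z1": "z2", "z2": "z3", "z3": "z3"},
    "b": {"z0": "z0", "z1": "z0", "z2": "z1", "z3": "z3"},
}
START, ACCEPTING = "z1", {"z1"}

def is_order_preserving(states, delta) -> bool:
    """Check that z <= z' implies delta(z, x) <= delta(z', x) for every symbol x."""
    rank = {z: i for i, z in enumerate(states)}
    return all(
        rank[row[z]] <= rank[row[y]]
        for row in delta.values()
        for z, y in product(states, repeat=2)
        if rank[z] <= rank[y]
    )

def accepts(word: str) -> bool:
    """Run the automaton of Example 1 on a word over {a, b}."""
    state = START
    for letter in word:
        state = DELTA[letter][state]
    return state in ACCEPTING
```

The acceptance test can be compared against $\{ab\}^{*}$ on all words up to some length; this is a bounded check, not a proof.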

By ${\cal EC}({\cal F})$ , we denote the family of all languages generated externally by contextual grammars with selection in ${\cal F}$ . When a contextual grammar works in the external mode, we call it an external contextual grammar.

The language generated by the external contextual grammar in Example 1 belongs, for instance, to the family ${\cal EC}(\mathit{ORD})$ because all selection languages ( $S_{1}$ and $S_{2}$ ) are ordered.

3 Results on families of comet languages

We first present some observations about star languages and two-sided comet languages, we give normal forms for two-sided comets, and we insert the subregular families investigated here into the existing hierarchy.

From the structure of two-sided comet languages (languages $L$ of the form $EG^{*}H$ where $G$ is neither the empty set nor the set with the empty word only), we see that every such language is infinite if none of the sets $E$ , $G$ , and $H$ is the empty set. If one of the sets $E$ or $H$ is empty, then the whole language $L$ is also empty.

Lemma 2

For each language $L\in\mathit{2COM}$ , it holds that $L$ is either infinite or empty.

A similar observation can be made for star languages.

Lemma 3

For each language $L\in\mathit{STAR}$ , it holds that $L$ either is infinite or consists of the empty word $\lambda$ .

3.1 Normal forms

We first establish some observations from which we derive a normal form for languages from the class $\mathit{2COM}$. This normal form is later used when we prove that $\mathit{2COM}$-languages as selection languages are as powerful as arbitrary regular languages.

Lemma 4

Each two-sided comet language $L=EG^{*}H$ can be represented as a finite union

L=\bigcup_{i=1}^{n}E_{i}G^{*}H

for some number $n\geq 1$ and with union-free languages $E_{i}$ for all $1\leq i\leq n$ .

Proof.

Let $L=EG^{*}H$ be a two-sided comet language. Every regular language is the union of finitely many union-free languages [19]. Let $n\geq 1$ be a natural number and $E_{i}$ be a union-free language for any $i$ with $1\leq i\leq n$ such that $E=E_{1}\cup E_{2}\cup\cdots\cup E_{n}$ . Then, it follows $L=E_{1}G^{*}H\cup E_{2}G^{*}H\cup\cdots\cup E_{n}G^{*}H$ . ∎

In order to show later that we can transform any $\mathit{2COM}$ -language into the mentioned normal form, we now present how an infinite union-free language can be represented by a special $\mathit{2COM}$ -form.

Lemma 5

For an infinite union-free language $L$ , there exist sets $L_{l}$ , $L_{i}$ , and $L_{r}$ such that $L=L_{l}L_{i}^{*}L_{r}$ where $L_{l}$ is finite and $L_{i}\notin\{\emptyset,\{\lambda\}\}$ .

Proof.

We prove the assertion inductively via the number of construction steps required to create a regular expression $\mathcal{R}$ such that $L=L(\mathcal{R})$ holds. In construction step 0, only finite languages are created. Therefore, the base case is $n=1$ .

Base case $n=1$ : Since $L$ is infinite, we have $\mathcal{R}=\{x\}^{*}$ for a letter $x\in V$ . A desired representation for the language $L$ is then $\{\lambda\}\{x\}^{*}\{\lambda\}$ .

Induction step $n\to n+1$ : Assume the induction hypothesis: For every regular expression $\mathcal{R}$ without the union operator which describes an infinite language and which is at construction level of at most $n$ , the language $L(\mathcal{R})$ can be represented as $L(\mathcal{R})=L_{l}L_{i}^{*}L_{r}$ with $|L_{l}|<\infty$ and $L_{i}\notin\{\emptyset,\{\lambda\}\}$ . Now, let ${\cal R}$ be a regular expression of construction level $n+1$ which describes an infinite language and which does not contain the union operator. Then, there are two possibilities how ${\cal R}$ is built: by concatenation of two regular expressions where for at least one of the described languages the induction hypothesis holds or by Kleene closure of a regular expression which neither describes the empty set nor the language $\{\lambda\}$ (otherwise, $L({\cal R})$ would be finite).

Case 1: Let $\mathcal{R}=\mathcal{S}\mathcal{T}$ . Then, the equation $L(\mathcal{R})=L(\mathcal{S})L(\mathcal{T})$ holds. If $L({\cal S})$ is infinite, we get, according to the induction hypothesis, $L(\mathcal{S})=S_{l}S_{i}^{*}S_{r}$ for suitable sets $S_{l}$ , $S_{i}$ , and $S_{r}$ . With $R_{l}=S_{l}$ , $R_{i}=S_{i}$ , and $R_{r}=S_{r}L({\cal T})$ , we obtain $L(\mathcal{R})=R_{l}R_{i}^{*}R_{r}$ with $|R_{l}|<\infty$ and $R_{i}\notin\{\emptyset,\{\lambda\}\}$ . If $L({\cal S})$ is finite, then $L({\cal T})$ is infinite (because we consider only such ${\cal R}$ where $L({\cal R})$ is infinite) and we get, according to the induction hypothesis, that $L(\mathcal{T})=T_{l}T_{i}^{*}T_{r}$ for suitable sets $T_{l}$ , $T_{i}$ , and $T_{r}$ . With $R_{l}=L({\cal S})T_{l}$ , $R_{i}=T_{i}$ , and $R_{r}=T_{r}$ , we obtain a desired representation $L(\mathcal{R})=R_{l}R_{i}^{*}R_{r}$ with $|R_{l}|<\infty$ and $R_{i}\notin\{\emptyset,\{\lambda\}\}$ .

Case 2: Let $\mathcal{R}=\mathcal{S}^{*}$ . Then, the equation $L(\mathcal{R})=(L(\mathcal{S}))^{*}$ holds. Thus, with $R_{l}=\{\lambda\}$ , $R_{i}=L(\mathcal{S})$ , and $R_{r}=\{\lambda\}$ , we obtain that $L(\mathcal{R})=R_{l}R_{i}^{*}R_{r}$ with $|R_{l}|<\infty$ and $R_{i}\notin\{\emptyset,\{\lambda\}\}$ .

Hence, every infinite union-free language can be expressed in the claimed form. ∎
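The induction in the proof of Lemma 5 is constructive, so it can be sketched as a recursive procedure on union-free expressions. The Python code below is an illustration under assumptions: the node classes `Eps` (standing for $\emptyset^{*}=\{\lambda\}$), `Sym`, `Cat`, `Star` and all function names are hypothetical, and the expression $\emptyset$ itself is left out for brevity. The helper `words` enumerates a language up to a length bound so that a decomposition can be compared with the original language on short words.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple

@dataclass(frozen=True)
class Eps:                      # the language {λ}
    pass

@dataclass(frozen=True)
class Sym:
    letter: str

@dataclass(frozen=True)
class Cat:
    left: object
    right: object

@dataclass(frozen=True)
class Star:
    inner: object

def finite_language(r) -> Optional[Set[str]]:
    """Return L(r) as a set if it is finite, otherwise None."""
    if isinstance(r, Eps):
        return {""}
    if isinstance(r, Sym):
        return {r.letter}
    if isinstance(r, Cat):
        left, right = finite_language(r.left), finite_language(r.right)
        if left is None or right is None:
            return None
        return {u + v for u in left for v in right}
    inner = finite_language(r.inner)          # Star: finite only if the inner language is {λ}
    return {""} if inner == {""} else None

def decompose(r) -> Tuple[Set[str], object, object]:
    """For infinite L(r), return (L_l, R_i, R_r) with L(r) = L_l L(R_i)* L(R_r), L_l finite."""
    if finite_language(r) is not None:
        raise ValueError("expression describes a finite language")
    if isinstance(r, Star):                   # Case 2 of the proof
        return {""}, r.inner, Eps()
    left = finite_language(r.left)            # Case 1 of the proof: r is a concatenation
    if left is None:                          # left factor is infinite
        l, i, rr = decompose(r.left)
        return l, i, Cat(rr, r.right)
    l, i, rr = decompose(r.right)             # left factor finite, so the right one is infinite
    return {u + v for u in left for v in l}, i, rr

def words(r, n: int) -> Set[str]:
    """All words of L(r) of length at most n."""
    if isinstance(r, Eps):
        return {""}
    if isinstance(r, Sym):
        return {r.letter} if n >= 1 else set()
    if isinstance(r, Cat):
        return {u + v for u in words(r.left, n) for v in words(r.right, n) if len(u + v) <= n}
    base = words(r.inner, n) - {""}
    result, frontier = {""}, {""}
    while frontier:                           # close {λ} under appending words of the inner language
        frontier = {u + v for u in frontier for v in base if len(u + v) <= n} - result
        result |= frontier
    return result
```

For the expression $a(ab)^{*}c$, the procedure moves the finite prefix $\{a\}$ into the first tail and keeps $ab$ as the iterated part, which mirrors the second sub-case of Case 1.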

With Lemma 4, we proved that any $\mathit{2COM}$-language can be given as a union of finitely many $\mathit{2COM}$-languages in which the first comet tail is union-free. Together with Lemma 5, we now obtain that any $\mathit{2COM}$-language with a union-free first comet tail has a representation in the $\mathit{2COM}$-form where the first comet tail is a finite set.

Lemma 6

For each two-sided comet language $L=EG^{*}H$ with $E\in\mathit{UF}$ , there exist a finite language $E^{\prime}$ , a language $G^{\prime}\notin\{\emptyset,\{\lambda\}\}$ , and a regular language $H^{\prime}$ such that $L=E^{\prime}(G^{\prime})^{*}H^{\prime}$ .

Proof.

We have shown in Lemma 2 that each two-sided comet language $L$ is either empty or infinite. For the first case, the assertion holds with $E^{\prime}=\emptyset$ and any regular languages $G^{\prime}\notin\{\emptyset,\{\lambda\}\}$ and $H^{\prime}$ .

Now, let $L=EG^{*}H$ be an infinite $\mathit{2COM}$ -language with $E\in\mathit{UF}$ . If $E$ is finite, then we already have a desired form with $E^{\prime}=E$ , $G^{\prime}=G$ , and $H^{\prime}=H$ .

So, let $E$ be infinite. By Lemma 5, we know that there are languages $E_{l}$ , $E_{i}$ , and $E_{r}$ such that $E_{l}$ is a finite set, $E_{i}\notin\{\emptyset,\{\lambda\}\}$ , and $E=E_{l}E_{i}^{*}E_{r}$ . If we set $E^{\prime}=E_{l}$ , $G^{\prime}=E_{i}$ , and $H^{\prime}=E_{r}G^{*}H$ , then we obtain a desired form because $L=E^{\prime}(G^{\prime})^{*}H^{\prime}$ where $E^{\prime}$ is finite, $G^{\prime}\notin\{\emptyset,\{\lambda\}\}$ , and $H^{\prime}$ is a regular language. ∎

Now we connect the previous lemmas and conclude that, for every two-sided comet language, there is such a representation where the first comet tail of the language is finite.

Theorem 7 (Normal form for $\mathit{2COM}$ -languages)

For each two-sided comet language, there exists a representation $L=EG^{*}H$ such that $E$ is a finite language and $G\notin\{\emptyset,\{\lambda\}\}$ .

Proof.

According to Lemma 4, any two-sided comet language $L=E^{\prime}(G^{\prime})^{*}H^{\prime}$ can be represented as a union of finitely many languages $E^{\prime}_{i}(G^{\prime})^{*}H^{\prime}$ such that all languages $E^{\prime}_{i}$ are union-free. According to Lemma 6, every such language $E^{\prime}_{i}(G^{\prime})^{*}H^{\prime}$ can in turn be represented as a $\mathit{2COM}$ -language $E_{i}G^{*}H$ where the first tail $E_{i}$ is finite. The union $E$ of all these finite languages $E_{i}$ is also finite. Hence, we obtain

L=E^{\prime}(G^{\prime})^{*}H^{\prime}=\bigcup_{i=1}^{n}E^{\prime}_{i}(G^{\prime})^{*}H^{\prime}=\bigcup_{i=1}^{n}E_{i}G^{*}H=\left(\bigcup_{i=1}^{n}E_{i}\right)G^{*}H=EG^{*}H

where $E$ is finite and $G\notin\{\emptyset,\{\lambda\}\}$ . ∎

We refer to this representation as a left-sided normal form. A right-sided normal form (where the last comet tail is a finite set) can be derived in a similar way.

3.2 Hierarchy of subregular language classes

In this section, we investigate inclusion relations between various subregular language classes. Figure 1 shows the results.

Figure 1: Resulting hierarchy of subregular language families

An arrow from a node $X$ to a node $Y$ stands for the proper inclusion $X\subset Y$ . If two families are not connected by a directed path, then they are incomparable. An edge label refers to the paper where the proper inclusion has been shown (in some cases, it might be that it is not the first paper where the respective inclusion has been mentioned, since it is so obvious that it was not emphasized in a publication) or the lemma of this paper where the proper inclusion will be shown.

In the literature, it is often said that two languages are equivalent if they are equal or differ at most in the empty word. Similarly, two families can be regarded as equivalent if they differ only in the languages $\emptyset$ or $\{\lambda\}$. Therefore, the set $\mathit{STAR}$ of all star languages is sometimes regarded as a proper subset of the set $\mathit{COM}$ of all (left-, right-, or two-sided) comet languages although $\{\lambda\}$ belongs to the family $\mathit{STAR}$ but not to $\mathit{LCOM}$, $\mathit{RCOM}$, or $\mathit{2COM}$. We regard $\mathit{STAR}$ and $\mathit{STAR}\setminus\{\{\lambda\}\}$ as different. Then, the family $\mathit{STAR}$ is incomparable to $\mathit{LCOM}$, $\mathit{RCOM}$, and $\mathit{2COM}$, as we will show later.

For space reasons, we give the following observation without a proof.

Lemma 8

Whenever a language $L$ is a right-sided comet then its reversal $L^{R}$ is a left-sided comet language and vice versa.

Corollary 9

We have $\mathit{LCOM}=\{\;L^{R}\mid L\in\mathit{RCOM}\;\}$ and $\mathit{RCOM}=\{\;L^{R}\mid L\in\mathit{LCOM}\;\}$ .
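A short calculation, given here only as a sketch, indicates why the observation of Lemma 8 holds. It assumes the standard identities $(XY)^{R}=Y^{R}X^{R}$ and $(X^{*})^{R}=(X^{R})^{*}$ as well as the closure of the family of regular languages under reversal. For a right-sided comet $L=G^{*}H$, we get

L^{R}=(G^{*}H)^{R}=H^{R}(G^{*})^{R}=H^{R}(G^{R})^{*}.

Since $G^{R}\in\{\emptyset,\{\lambda\}\}$ holds exactly when $G\in\{\emptyset,\{\lambda\}\}$, the language $L^{R}$ has the form $E^{\prime}(G^{\prime})^{*}$ with the regular languages $E^{\prime}=H^{R}$ and $G^{\prime}=G^{R}\notin\{\emptyset,\{\lambda\}\}$ and is therefore a left-sided comet. The converse direction is symmetric.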

We now present some languages which will serve later as witness languages for proper inclusions or incomparabilities.

Lemma 10

The language $L=\{\lambda\}$ is in $\mathit{STAR}\setminus\mathit{2COM}.$

Proof.

The language $L$ is a star language since $L=H^{*}$ with $H=\{\lambda\}$ . According to Lemma 2, a two-sided comet language is either infinite or the empty language. Hence, $L$ is not a two-sided comet. ∎

Lemma 11

Let $L=\{\;a^{2n}\mid n\geq 0\;\}$ . Then, it holds $L\in(\mathit{STAR}\cap\mathit{LCOM}\cap\mathit{RCOM})\setminus\mathit{PS}$ .

Proof.

Let $G=\{aa\}$ and $E=H=\{\lambda\}$ . The language $L$ can be expressed as $L=G^{*}=EG^{*}=G^{*}H$ . Therefore, $L\in\mathit{STAR}\cap\mathit{LCOM}\cap\mathit{RCOM}$ .

Assume that $L\in\mathit{PS}$. Then, there is a natural number $m\geq 1$ such that, for any word $x\in\{a\}^{*}$, either $J_{x}^{m}\cap L=\emptyset$ or $J_{x}^{m}\subseteq L$ where $J_{x}^{m}=\{\;x^{n}\mid n\geq m\;\}$. For any natural number $m\geq 1$, we have with the word $x=a$ the set $J_{a}^{m}=\{\;a^{n}\mid n\geq m\;\}$. Since $a^{2m}\in J_{a}^{m}\cap L$, the intersection is not empty. But, since $a^{2m+1}\in J_{a}^{m}\setminus L$, the inclusion $J_{a}^{m}\subseteq L$ does not hold either. Hence, the language $L$ is not power-separating. ∎
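The argument can be replayed on finite samples. The following Python sketch is illustrative only; the names `in_L` and `separates` and the parameter `horizon` are assumptions made here. A mixed sample refutes the power-separating condition for the given $m$ and $x$, whereas a uniform sample is merely consistent with it.

```python
def in_L(word: str) -> bool:
    """Membership in L = { a^(2n) | n >= 0 }."""
    return set(word) <= {"a"} and len(word) % 2 == 0

def separates(m: int, x: str, horizon: int) -> bool:
    """On the sample x^m, ..., x^(m+horizon): are all powers in L or none of them?"""
    sample = [in_L(x * n) for n in range(m, m + horizon + 1)]
    return all(sample) or not any(sample)
```

For $x=a$, every sample of at least two consecutive powers contains both an even and an odd exponent, so `separates(m, "a", 1)` fails for each $m$, in line with the witnesses $a^{2m}$ and $a^{2m+1}$ used in the proof.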

Lemma 12

Let $L=\{ab\}^{*}$ . Then, it holds $L\in(\mathit{STAR}\cap\mathit{LCOM}\cap\mathit{RCOM})\setminus\mathit{CIRC}.$

Proof.

Let $G=\{ab\}$ and $E=H=\{\lambda\}$ . The language $L$ can be expressed as $L=G^{*}=EG^{*}=G^{*}H$ . Therefore, $L\in\mathit{STAR}\cap\mathit{LCOM}\cap\mathit{RCOM}$ .

Assume that the language $L$ is circular. Then, the word $ba$ would belong to it because $ab\in L$ but it does not. Hence, $L\notin\mathit{CIRC}$ . ∎
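The counterexample can also be found by enumerating circular shifts. The sketch below is illustrative; the helper names `circular_shifts`, `in_L`, and `circular_counterexamples` are assumptions made here.

```python
def circular_shifts(word: str) -> set:
    """All circular shifts of a word (the empty word is its own only shift)."""
    return {word[i:] + word[:i] for i in range(len(word))} or {word}

def in_L(word: str) -> bool:
    """Membership in L = {ab}*."""
    return word == "ab" * (len(word) // 2)

def circular_counterexamples(max_n: int) -> list:
    """Circular shifts of (ab)^n, n <= max_n, that do not belong to L."""
    return sorted(s for n in range(max_n + 1)
                  for s in circular_shifts("ab" * n) if not in_L(s))
```

Already for $n=1$, the shift $ba$ of $ab$ is reported, which is the word used in the proof.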

Lemma 13 ([20])

Let $V=\{a,b\}$ be an alphabet, $H=\{ba\}\{b\}^{*}(\{aa\}\{b\}^{*})^{*}$ a regular language over $V$ , and $L=V^{*}H$ . Then, $L\in\mathit{SYDEF}\setminus\mathit{SF}$ .

Proof.

The language $L$ can be represented as $\{\lambda\}V^{*}H$ . So, the language is symmetric definite. As shown in [20], the language is not star-free. ∎

Lemma 14

Let $L_{1}=\{\;a^{n}b\mid n\geq 0\;\}$ and $L_{2}=L_{1}^{R}$ . Then, $L_{1}\in\mathit{RCOM}\setminus\mathit{LCOM}$ and $L_{2}\in\mathit{LCOM}\setminus\mathit{RCOM}$ .

Proof.

The language $L_{1}$ can be expressed as $\{a\}^{*}\{b\}$ , hence, in the form $L_{1}=G^{*}H$ with $G=\{a\}$ and $H=\{b\}$ . Thus, $L_{1}\in\mathit{RCOM}$ .

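Assume that $L_{1}\in\mathit{LCOM}$. Then, two regular languages $E$ and $I$ with $I\notin\{\emptyset,\{\lambda\}\}$ would exist such that $L_{1}=EI^{*}$. Since $L_{1}$ is not empty, the language $E$ is not empty either. Let $e\in E$ and let $w$ be a non-empty word of $I$. Then $ew\in L_{1}$ and, since $b$ is a suffix of every word in $L_{1}$, the letter $b$ is also a suffix of the word $w$. But then the word $eww\in L_{1}$ would contain more than one $b$, which is a contradiction. Hence, $L_{1}\notin\mathit{LCOM}$.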

By Corollary 9, it follows that $L_{2}\in\mathit{LCOM}\setminus\mathit{RCOM}$ . ∎

Lemma 15

The language $L=\{\lambda,a\}$ belongs to the set $(\mathit{FIN}\cap\mathit{SUF}\cap\mathit{COMM})\setminus(\mathit{STAR}\cup\mathit{2COM})$ .

Proof.

All suffixes of all words of the language $L$ belong to $L$ . Thus, $L$ is suffix-closed. Furthermore, the language is finite but not empty and commutative. According to Lemma 2, each two-sided comet language is either empty or infinite. Hence, $L$ is not a two-sided comet language. According to Lemma 3, each star language is either infinite or contains only the empty word. Hence, $L$ is not a star language either. ∎
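For a finite language, the two closure properties used in this proof can be tested exhaustively. The Python sketch below is an illustration; the function names are chosen here and do not stem from the paper. That $L$ is neither a star language nor a two-sided comet follows from Lemmas 2 and 3 and is not part of the check.

```python
from itertools import permutations

def is_suffix_closed(language: set) -> bool:
    """Every suffix of every word of the language belongs to the language."""
    return all(w[i:] in language for w in language for i in range(len(w) + 1))

def is_commutative(language: set) -> bool:
    """Every permutation of every word of the language belongs to the language."""
    return all("".join(p) in language for w in language for p in permutations(w))

L = {"", "a"}   # the witness language {λ, a}
```

Both tests succeed for `L`, whereas a language such as `{"ab"}` fails both of them because the suffix $b$ and the permutation $ba$ are missing.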

We now prove some proper inclusions.

Lemma 16

We have the proper inclusions $\mathit{MON}\subset\mathit{STAR}\subset\mathit{UF}$ .

Proof.

We first prove the relation $\mathit{MON}\subset\mathit{STAR}$ : Any monoidal language can be expressed as $L=V^{*}$ for some alphabet $V$ . Since $V$ is a regular language, $L$ is a star language. A witness language for the properness is the language $L=\{\;a^{2n}\mid n\geq 0\;\}$ as shown in Lemma 11.

We now prove the relation $\mathit{STAR}\subset\mathit{UF}$ : Every language $H^{*}$ for some regular language $H$ is union-free according to [19]. A witness language for the properness is $L=\{a\}$ which is union-free but, according to Lemma 3, not a star language since it is neither infinite nor equal to $\{\lambda\}$ . ∎

Lemma 17

We have the proper inclusions $\mathit{MON}\subset\mathit{SYDEF}\subset{\cal C}\subset\mathit{2COM}$ for ${\cal C}\in\{\mathit{LCOM},\mathit{RCOM}\}$ .

Proof.

1. $\mathit{MON}\subset\mathit{SYDEF}$: Any monoidal language can be expressed as $L=V^{*}$ for some alphabet $V$ and, with $E=H=\{\lambda\}$ also in the form $EV^{*}H$. Hence, the language $L$ is symmetric definite. A witness language for the properness is $\{a,b\}^{*}\{ba\}\{b\}^{*}(\{aa\}\{b\}^{*})^{*}$ from Lemma 13 (and originally [20]).
2. $\mathit{SYDEF}\subset\mathit{RCOM}$: This relation was proved in [20].
3. $\mathit{SYDEF}\subset\mathit{LCOM}$: The family $\mathit{SYDEF}$ is closed under reversal. For any symmetric definite language $L$, its reversal $L^{R}$ also belongs to the family $\mathit{SYDEF}$ and, by [20], is also a right-sided comet language. By Lemma 8, the reversal of the language $L^{R}$, hence $L$ itself, is a left-sided comet language. A witness language for the properness is the language $L=\{\;a^{2n}\mid n\geq 0\;\}$ according to Lemma 11 where it is shown that $L\in\mathit{LCOM}\setminus\mathit{PS}$ and according to [20] where the inclusion $\mathit{SYDEF}\subset\mathit{PS}$ is proved.
4. $\mathit{RCOM}\subset\mathit{2COM}$: This relation was proved in [20].
5. $\mathit{LCOM}\subset\mathit{2COM}$: Any left-sided comet language $L=EG^{*}$ is also a two-sided comet $EG^{*}H$ with $H=\{\lambda\}$. In Lemma 14, it was shown that the language $L=\{\;a^{n}b\mid n\geq 0\;\}$ is a right-sided comet language but not a left-sided comet. By [20], it is a two-sided comet language. ∎

We now prove the incomparability relations mentioned in Figure 1 which have not been proved earlier. These are the relations regarding the families $\mathit{STAR}$ , $\mathit{SYDEF}$ , $\mathit{LCOM}$ , $\mathit{RCOM}$ , and $\mathit{2COM}$ .

Lemma 18

Each of the families $\mathit{STAR}$ and $\mathit{UF}$ is incomparable to each of the families $\mathit{COMB}$ , $\mathit{SYDEF}$ , $\mathit{RCOM}$ , $\mathit{LCOM}$ , and $\mathit{2COM}$ .

Proof.

Due to inclusion relations, it suffices to show that there are a language $L_{1}\in\mathit{STAR}\setminus\mathit{2COM}$ and a language $L_{2}\in\mathit{COMB}\setminus\mathit{UF}$ . From Lemma 10, we get $L_{1}=\{\lambda\}$ . From [15], we take $L_{2}=\{a,b,c\}^{*}\{a,b\}$ . ∎

Lemma 19

The language family $\mathit{STAR}$ is incomparable to each of the families $\mathit{FIN}$ , $\mathit{NIL}$ , $\mathit{DEF}$ , $\mathit{ORD}$ , $\mathit{NC}$ , $\mathit{SF}$ , $\mathit{PS}$ , and $\mathit{SUF}$ .

Proof.

Due to inclusion relations, it suffices to show that there are a language $L_{1}\in\mathit{STAR}\setminus\mathit{PS}$ , a language $L_{2}\in\mathit{FIN}\setminus\mathit{STAR}$ , and a language $L_{3}\in\mathit{SUF}\setminus\mathit{STAR}$ . As $L_{1}$ , we obtain from Lemma 11 the language $L_{1}=\{\;a^{2n}\mid n\geq 0\;\}$ . From Lemma 15, we take $L_{2}=L_{3}=\{\lambda,a\}$ . ∎

Lemma 20

The language family $\mathit{STAR}$ is incomparable to the families $\mathit{CIRC}$ and $\mathit{COMM}$ .

Proof.

Due to inclusion relations, it suffices to show that there are a language $L_{1}\in\mathit{STAR}\setminus\mathit{CIRC}$ and a language $L_{2}\in\mathit{COMM}\setminus\mathit{STAR}$ . From Lemma 12, we have $L_{1}=\{ab\}^{*}$ . From Lemma 15, we take again the language $L_{2}=\{\lambda,a\}$ . ∎

Lemma 21

The language families $\mathit{LCOM}$ and $\mathit{RCOM}$ are incomparable to each other.

Proof.

With the witness languages $L_{1}=\{\;a^{n}b\mid n\geq 0\;\}\in\mathit{RCOM}\setminus\mathit{LCOM}$ and $L_{2}=L_{1}^{R}\in\mathit{LCOM}\setminus\mathit{RCOM}$ , the statement follows from Lemma 14. ∎

Lemma 22

The language families $\mathit{SYDEF}$ , $\mathit{LCOM}$ , $\mathit{RCOM}$ , and $\mathit{2COM}$ are incomparable to each of the families $\mathit{FIN}$ , $\mathit{NIL}$ , $\mathit{DEF}$ , $\mathit{ORD}$ , $\mathit{NC}$ , and $\mathit{SF}$ .

Proof.

Due to inclusion relations, it suffices to show that there are a language $L_{1}\in\mathit{SYDEF}\setminus\mathit{SF}$ and a language $L_{2}\in\mathit{FIN}\setminus\mathit{2COM}$ . From Lemma 13 (and previously [20]), for the first language, we obtain the language $L_{1}=\{a,b\}^{*}\{ba\}\{b\}^{*}(\{aa\}\{b\}^{*})^{*}$ . From Lemma 15, we take the language $L_{2}=\{\lambda,a\}$ . ∎

Lemma 23

The language families $\mathit{LCOM}$ , $\mathit{RCOM}$ , and $\mathit{2COM}$ are incomparable to the family $\mathit{PS}$ .

Proof.

Due to inclusion relations, it suffices to show that there are a language $L_{1}\in\mathit{LCOM}\setminus\mathit{PS}$ and a language $L_{2}\in\mathit{PS}\setminus\mathit{2COM}$. The property of being power-separating (or not) is preserved under reversal. Hence, if there is a language $L_{1}\in\mathit{LCOM}\setminus\mathit{PS}$, then there is also a language in the set $\mathit{RCOM}\setminus\mathit{PS}$, namely $L_{1}^{R}$. From Lemma 11, we have $L_{1}=\{\;a^{2n}\mid n\geq 0\;\}\in(\mathit{LCOM}\cap\mathit{RCOM})\setminus\mathit{PS}$. As language $L_{2}$, we take again the language $L_{2}=\{\lambda,a\}$ from Lemma 15. ∎

Lemma 24

The language families $\mathit{SYDEF}$ , $\mathit{LCOM}$ , $\mathit{RCOM}$ , and $\mathit{2COM}$ are incomparable to each of the families $\mathit{SUF}$ , $\mathit{CIRC}$ and $\mathit{COMM}$ .

Proof.

Due to inclusion relations, it suffices to show that there are a language $L_{1}\in\mathit{SYDEF}\setminus\mathit{SUF}$ , a language $L_{2}\in\mathit{SYDEF}\setminus\mathit{CIRC}$ , a language $L_{3}\in\mathit{SUF}\setminus\mathit{2COM}$ , and a language $L_{4}\in\mathit{COMM}\setminus\mathit{2COM}$ . In [15], it was shown that the families $\mathit{COMB}$ and $\mathit{SUF}$ are disjoint. Since $\mathit{COMB}\subseteq\mathit{SYDEF}$ , we can take any combinational language as $L_{1}$ , for instance, $L_{1}=\{a,b\}^{*}\{b\}$ . The same language serves as $L_{2}$ because it is not circular. From Lemma 15, we take again the language $\{\lambda,a\}$ as $L_{3}$ and $L_{4}$ . ∎

From all these relations, the hierarchy presented in Figure 1 follows.

Theorem 25 (Resulting hierarchy)

The inclusion relations presented in Figure 1 hold. An arrow from an entry $X$ to an entry $Y$ depicts the proper inclusion $X\subset Y$ ; if two families are not connected by a directed path, then they are incomparable.

Proof.

An edge label refers to the paper or lemma in the present paper where the proper inclusion is shown. The incomparability results are proved in Lemmas 18 to 24. ∎

4 Results on subregular control in external contextual grammars

In this section, we include the families of languages generated by external contextual grammars with selection languages from the subregular families under investigation into the existing hierarchy with respect to external contextual grammars.

If, in a contextual grammar, all selection languages belong to some language family $X$ , then they also belong to every superset $Y$ of $X$ . Therefore, each language in $\mathcal{EC}(X)$ is also generated by a contextual grammar with selection languages from $Y$ and we have the following monotonicity.

Lemma 26

For any two language classes $X$ and $Y$ with $X\subseteq Y$ , we have the inclusion ${\cal EC}(X)\subseteq{\cal EC}(Y)$ .

Figure 2 shows a hierarchy of some language families which are generated by external contextual grammars where the selection languages belong to subregular classes investigated before. The hierarchy contains results which were already known (marked by a reference to the literature) and results which will be proved in this section (marked by a number which refers to the respective lemma).

Figure 2: Resulting hierarchy of language families by external contextual grammars with special selection languages

We now present some languages which will serve later as witness languages for proper inclusions or incomparabilities. Due to space limitations, we give only proof sketches in those cases where we believe that the idea is clear to the reader.

Lemma 27

Let $L=\{\;a^{n}bbb\mid n\geq 1\;\}\cup\{\lambda\}$ . Then, it holds $L\in\mathcal{EC}(\mathit{NIL})\setminus\mathcal{EC}(\mathit{STAR})$ .

Proof.

The contextual grammar $G=(\{a,b\},\{\{a,b\}^{*}\{a,b\}^{4}\to(a,\lambda)\},\{abbb,\lambda\})$ generates $L$ .

During a derivation, the number of occurrences of the letter $a$ increases while the number of occurrences of $b$ remains unchanged; hence, any contextual grammar generating $L$ has a context which consists of letters $a$ only. If the selection languages are from $\mathit{STAR}$ , then each of them contains the empty word and such a context could be wrapped around the axiom $\lambda$ yielding a word without $b$ , which is a contradiction. ∎
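The grammar from this proof can be checked mechanically for short words. The following is a minimal sketch, not part of the original proof: the helper `generate`, the length bound, and the encoding of selection languages as Python regular expressions are assumptions made for illustration. Since external derivation steps never shorten a word, enumerating up to a length bound is exact for all words within that bound. The second grammar uses the Kleene closure of the same selection language as one illustrative star language.

```python
import re

def generate(axioms, components, max_len):
    """All words of L(G) with length <= max_len, using external derivation."""
    seen = {w for w in axioms if len(w) <= max_len}
    todo = list(seen)
    while todo:
        x = todo.pop()
        for selection, contexts in components:
            if re.fullmatch(selection, x):      # x belongs to the selection language
                for u, v in contexts:
                    w = u + x + v               # wrap the context around x
                    if len(w) <= max_len and w not in seen:
                        seen.add(w)
                        todo.append(w)
    return seen

# Grammar G of Lemma 27; the regex encodes {a,b}^*{a,b}^4 (all words of length >= 4).
nil = [(r"[ab]*[ab]{4}", [("a", "")])]
words = generate({"abbb", ""}, nil, max_len=8)
expected = {""} | {"a" * n + "bbb" for n in range(1, 6)}
print(words == expected)

# Illustrative star language: it contains the empty word, so the context
# (a, lambda) can be wrapped around the axiom lambda.
star = [(r"(?:[ab]*[ab]{4})*", [("a", "")])]
with_star = generate({"abbb", ""}, star, max_len=8)
print("a" in with_star)
```

The word `a` produced in the second run contains no letter $b$ and therefore lies outside $L$; this only illustrates the argument for one star language and does not replace the general proof.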

Lemma 28

Let $L=\{\;b^{n}a\mid n\geq 0\;\}\cup\{\lambda\}$ . Then, it holds $L\in\mathcal{EC}(\mathit{COMB})\setminus\mathcal{EC}(\mathit{STAR})$ .

Proof.

The contextual grammar $G=(\{a,b\},\{\{a,b\}^{*}\{a\}\to(b,\lambda)\},\{\lambda,a\})$ generates the language $L$ and the selection language is combinational.

Similarly to the proof before: With star selection languages, a word with the letter $b$ but without $a$ could be generated. ∎

Lemma 29

Let $L_{1}=\{a,b\}^{*}\{\;a^{n}b^{m}\mid n\geq 1,\ m\geq 1\;\}$ , $L_{2}=\{\;ca^{n}b^{m}c\mid n\geq 1,\ m\geq 1\;\}$ , and $L=L_{1}\cup L_{2}$ . Then, it holds $L\in\mathcal{EC}(\mathit{SUF})\setminus\mathcal{EC}(\mathit{STAR})$ .

Proof.

It holds $L=L(G)$ for the contextual grammar $G=(\{a,b,c\},\{(S_{1},C_{1}),(S_{2},C_{2})\},\{ab\})$ with

S_{1}=\{a,b\}^{*},\quad C_{1}=\{(a,\lambda),(b,\lambda),(\lambda,b)\},\qquad S_{2}=\{\;a^{n}b^{m}\mid n\geq 0,\ m\geq 1\;\}\cup\{\lambda\},\quad C_{2}=\{(c,c)\}.

Using star selection languages, the two letters $c$ could be wrapped around a word with more than one $a$ -to- $b$ -change from $L_{1}$ which would yield a word not belonging to $L$ . ∎

Lemma 30

Let $L_{1}=\{\;a^{n}\mid n\geq 2\;\}$ and $L_{2}=\{\;ba^{2n}b\mid n\geq 1\;\}$ be two languages and $L=L_{1}\cup L_{2}$ their union. Then, the relation $L\in\mathcal{EC}(\mathit{STAR})\setminus\mathcal{EC}(\mathit{PS})$ holds.

Proof.

It holds $L=L(G)$ for the contextual grammar $G=(\{a,b\},\{(S_{1},C_{1}),(S_{2},C_{2})\},\{aa\})$ with

S_{1}=\{\;a^{n}\mid n\geq 0\;\},\quad C_{1}=\{(\lambda,a)\}\quad\mbox{and}\quad S_{2}=\{\;a^{2n}\mid n\geq 0\;\},\quad C_{2}=\{(b,b)\}.

Now assume that $L\in\mathcal{EC}(\mathit{PS})$ . Then, $L=L(G^{\prime})$ for a contextual grammar $G^{\prime}$ where every selection language is power-separating.

For every selection language (since it is power-separating), there is a number $m_{S}\in\mathbb{N}$ such that, for every word $x\in\{a,b\}^{*}$ , either $J_{x}^{m_{S}}\cap S=\emptyset$ or $J_{x}^{m_{S}}\subseteq S$ with $J_{x}^{m_{S}}=\{\;x^{n}\mid n\geq m_{S}\;\}$ . Let $m_{S}$ be the minimum of these numbers for $S$ and let $m$ be the maximum of all the values $m_{S}$ for a selection language $S$ .

Further, let $p=m+\ell(G^{\prime})$ . Then, we have the following statement for every selection language $S$ : For each word $x\in\{a,b\}^{*}$ , it is

\text{either }J_{x}^{p}\cap S=\emptyset\text{ or }J_{x}^{p}\subseteq S\qquad(1)

where $J_{x}^{p}=\{\;x^{n}\mid n\geq p\;\}$ .

The language $L_{2}$ contains words with an arbitrary even number of letters $a$ and a letter $b$ at each end. Hence, there is a derivation $w_{0}\Longrightarrow^{*}w_{1}\Longrightarrow uw_{1}v$ with $w_{0}\in A$ , $|w_{1}|_{a}>p$ , $|w_{1}|_{b}=0$ , and $|uv|_{b}>0$ . This implies $w_{1}=a^{k}$ with $k>p$ .

Let $S$ be the selection language used in the last derivation step. Then, we have $a^{k}\in S$ and, with property (1), also $a^{k+1}\in S$ . Since $a^{k+1}$ belongs to $L_{1}$ and therefore also to $L$ , the last derivation step can also be applied to $a^{k+1}$ which yields the word $ua^{k+1}v$ . Since $|uv|_{b}>0$ , the word $ua^{k+1}v$ can belong to $L$ only as a word of $L_{2}$ . Since $ua^{k}v\in L_{2}$ , we know that $|ua^{k}v|_{a}$ is an even number and, hence, $|ua^{k+1}v|_{a}$ is an odd number. Therefore, the word $ua^{k+1}v$ does not belong to $L_{2}$ and neither to $L$ which is a contradiction to $L=L(G^{\prime})$ . Thus, we conclude $L\notin{\cal EC}(\mathit{PS})$ . ∎
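The grammar $G$ of this proof and the parity effect used in the contradiction can be illustrated for short words. The sketch below is an illustration under assumptions: the helper `generate`, the length bound 8, and the regex encoding of the selection languages are not from the paper. As a stand-in for a power-separating attempt, the second grammar lets the context $(b,b)$ use the selection language $\{a\}^{*}$, for which the powers of any word lie either all inside or all outside the language.

```python
import re

def generate(axioms, components, max_len):
    """All words of L(G) with length <= max_len, using external derivation."""
    seen = {w for w in axioms if len(w) <= max_len}
    todo = list(seen)
    while todo:
        x = todo.pop()
        for selection, contexts in components:
            if re.fullmatch(selection, x):
                for u, v in contexts:
                    w = u + x + v
                    if len(w) <= max_len and w not in seen:
                        seen.add(w)
                        todo.append(w)
    return seen

# Grammar G of Lemma 30: S1 = a^*, C1 = {(lambda, a)}; S2 = (aa)^*, C2 = {(b, b)}.
G = [(r"a*", [("", "a")]), (r"(?:aa)*", [("b", "b")])]
words = generate({"aa"}, G, max_len=8)
expected = ({"a" * n for n in range(2, 9)}
            | {"b" + "a" * (2 * n) + "b" for n in range(1, 4)})
print(words == expected)

# Hypothetical variant: the context (b, b) is selected by a^* instead of (aa)^*.
# The selection can no longer tell even from odd numbers of a.
G_ps = [(r"a*", [("", "a"), ("b", "b")])]
bad = generate({"aa"}, G_ps, max_len=8) - expected
print(sorted(bad))
```

The set `bad` collects the words with an odd number of letters $a$ between the two letters $b$; they correspond to the word $ua^{k+1}v$ in the proof. The run covers only this one variant, whereas the proof excludes every grammar with power-separating selection languages.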

Lemma 31

Let $L=\{\;a^{n}b^{n}\mid n\geq 1\;\}\cup\{\;b^{n}a^{n}\mid n\geq 1\;\}$ . Then, it holds $L\in\mathcal{EC}(\mathit{STAR})\setminus\mathcal{EC}(\mathit{CIRC})$ .

Proof.

It holds $L=L(G)$ for the contextual grammar $G=(\{a,b\},\{(S_{1},C_{1}),(S_{2},C_{2})\},\{ab,ba\})$ with

S_{1}=\{\;a^{n}b^{m}\mid n\geq 1,\ m\geq 1\;\}^{*},\quad C_{1}=\{(a,b)\}\quad\mbox{and}\quad S_{2}=\{\;b^{n}a^{m}\mid n\geq 1,\ m\geq 1\;\}^{*},\quad C_{2}=\{(b,a)\}.

With circular selection languages, a context $(a^{k},b^{k})$ could be wrapped around a word $b^{m}a^{m}$ yielding a word which does not belong to the language $L$ . ∎
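For short words, the grammar of this proof can be enumerated directly. The following sketch rests on assumptions that are not part of the paper: the helper `generate`, the length bound, and the regex encoding of $S_{1}$ and $S_{2}$. The last lines illustrate the obstacle for circular selection languages, assuming the usual definition that a circular language is closed under cyclic shifts.

```python
import re

def generate(axioms, components, max_len):
    """All words of L(G) with length <= max_len, using external derivation."""
    seen = {w for w in axioms if len(w) <= max_len}
    todo = list(seen)
    while todo:
        x = todo.pop()
        for selection, contexts in components:
            if re.fullmatch(selection, x):
                for u, v in contexts:
                    w = u + x + v
                    if len(w) <= max_len and w not in seen:
                        seen.add(w)
                        todo.append(w)
    return seen

# Grammar G of Lemma 31 with the star languages S1 and S2 written as regexes.
G = [(r"(?:a+b+)*", [("a", "b")]), (r"(?:b+a+)*", [("b", "a")])]
words = generate({"ab", "ba"}, G, max_len=8)
expected = ({"a" * n + "b" * n for n in range(1, 5)}
            | {"b" * n + "a" * n for n in range(1, 5)})
print(words == expected)

# A circular language containing aabb also contains its cyclic shift bbaa.
# Wrapping the context (a, b) around that shift leaves the language L.
rotated = "aabb"[2:] + "aabb"[:2]
wrapped = "a" + rotated + "b"
print(wrapped, wrapped in expected)
```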

Lemma 32

The language $L=\{a,b\}^{*}\cup\{c\}\{ab\}^{*}\{c\}$ belongs to the set $\mathcal{EC}(\mathit{ORD})\setminus\mathcal{EC}(\mathit{SYDEF})$ .

Proof.

In Example 1, we have given a contextual grammar where all selection languages are accepted by ordered finite automata, and thus, have shown that $L\in{\cal EC}(\mathit{ORD})$ .

Suppose that the language $L$ is also generated by a contextual grammar $G^{\prime}$ where all selection languages are symmetric definite.

Let us consider a word $w=c(ab)^{n}c\in L$ for some $n\geq\ell(G^{\prime})$ . Due to the choice of $n$ , the word $w$ is derived in one step from some word $z$ by using a selection language $S$ and context $(u,v)$ : $z\Longrightarrow uzv=w$ . The word $u$ begins with the letter $c$ ; the word $v$ ends with $c$ . Due to the choice of $n$ , we also have $|z|_{a}>0$ and $|z|_{b}>0$ . Since $S$ is symmetric definite over the alphabet $V=\{a,b\}$ , it can be expressed as $S=EV^{*}H$ for some regular languages $E$ and $H$ over $V$ . The sets $E$ and $H$ are not empty because $S$ contains at least the word $z$ . Let $e$ be a word of $E$ and $h$ a word of $H$ . Then, the word $ebbh$ belongs to the selection language $S$ as well. Since $ebbh\in\{a,b\}^{*}$ and $\{a,b\}^{*}\subseteq L$ , we can apply the same derivation to this word and obtain $uebbhv$ . This word starts and ends with $c$ but it does not have the form of those words from $L$ because of the double $b$ . From this contradiction, it follows $L\notin{\cal EC}(\mathit{SYDEF})$ . ∎

Lemma 33

The language $L=\{a,b\}^{*}\cup\{c\}\{\lambda,b\}\{ab\}^{*}\{c\}$ belongs to $\mathcal{EC}(\mathit{SUF})\setminus\mathcal{EC}(\mathit{SYDEF})$ .

Proof.

The language $L$ is generated by the contextual grammar $G=(\{a,b,c\},\{(S_{1},C_{1}),(S_{2},C_{2})\},\{\lambda\})$ with

S_{1}=\{a,b\}^{*},\quad C_{1}=\{(\lambda,a),(\lambda,b)\}\quad\mbox{and}\quad S_{2}=\mathit{Suf}(\{ab\}^{*}),\quad C_{2}=\{(c,c)\}

where $\mathit{Suf}(M)$ denotes the suffix-closure of the set $M$ .

With the same argumentation as in the proof of Lemma 32, one can show also here $L\notin{\cal EC}(\mathit{SYDEF})$ (the letters $c$ are in both cases wrapped around words which are an alternating sequence of $a$ and $b$ , which cannot be checked by a symmetric definite selection language). ∎
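The claim that $G$ generates $L$ can be checked for all words up to a small length. This is a minimal sketch under assumptions: the helper `generate`, the bound 5, and the regex `b?(?:ab)*` as an encoding of $\mathit{Suf}(\{ab\}^{*})$ are introduced here for illustration only.

```python
import re
from itertools import product

def generate(axioms, components, max_len):
    """All words of L(G) with length <= max_len, using external derivation."""
    seen = {w for w in axioms if len(w) <= max_len}
    todo = list(seen)
    while todo:
        x = todo.pop()
        for selection, contexts in components:
            if re.fullmatch(selection, x):
                for u, v in contexts:
                    w = u + x + v
                    if len(w) <= max_len and w not in seen:
                        seen.add(w)
                        todo.append(w)
    return seen

# Grammar G of Lemma 33: S1 = {a,b}^*, S2 = Suf({ab}^*) = {(ab)^n, b(ab)^n}.
G = [(r"[ab]*", [("", "a"), ("", "b")]), (r"b?(?:ab)*", [("c", "c")])]
words = generate({""}, G, max_len=5)

# The words of L up to length 5, built directly from the definition of L.
over_ab = {"".join(t) for n in range(6) for t in product("ab", repeat=n)}
framed = {"c" + w + "c" for w in ("", "b", "ab", "bab")}
print(words == over_ab | framed)
```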

Lemma 34

Let $L_{1}=\{\;a^{n}\mid n\geq 1\;\}$ , $L_{2}=\{\;ba^{n}b\mid n\geq 1\;\}$ , $L_{3}=\{\;cba^{2n}bc\mid n\geq 1\;\}$ , and $L=L_{1}\cup L_{2}\cup L_{3}$ . Then, it holds $L\in\mathcal{EC}(\mathit{SYDEF})\setminus\mathcal{EC}(\mathit{NC})$ .

Proof.

Let $V=\{a,b,c\}$ . The contextual grammar $G=(V,\{(S_{1},C_{1}),(S_{2},C_{2})\},\{a\})$ with

S_{1}=\{a\}V^{*}\{\lambda\},\quad C_{1}=\{(\lambda,a),(b,b)\}\quad\mbox{and}\quad S_{2}=\{\;ba^{2m}b\mid m\geq 1\;\}V^{*}\{\lambda\},\quad C_{2}=\{(c,c)\}

generates the language $L$ . This can be seen as follows: The shortest word of $L$ is $a$ which is the axiom. To every word of $L$ starting with the letter $a$ (hence, any word of $L_{1}$ ), another $a$ can be added or the letter $b$ is added at the beginning and the end of the word (using the first selection component) yielding all and only words of the languages $L_{1}$ and $L_{2}$ . To every word of $L_{2}$ which also belongs to $S_{2}$ , the letter $c$ is added at the beginning and the end of the word (using the second selection component) yielding exactly the words of the language $L_{3}$ . To the words of $L_{3}$ , no selection component can be applied. All the selection languages are symmetric definite as can be seen from the form in which they are given.

In [29], it was proved that the language $L$ does not belong to the family ${\cal EC}(\mathit{NC})$ . ∎
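The description of the derivations in $G$ can be reproduced for short words. The sketch below makes assumptions that are not in the paper: the helper `generate`, the length bound 8, and the regexes `a[abc]*` and `b(?:aa)+b[abc]*` as encodings of $S_{1}$ and $S_{2}$.

```python
import re

def generate(axioms, components, max_len):
    """All words of L(G) with length <= max_len, using external derivation."""
    seen = {w for w in axioms if len(w) <= max_len}
    todo = list(seen)
    while todo:
        x = todo.pop()
        for selection, contexts in components:
            if re.fullmatch(selection, x):
                for u, v in contexts:
                    w = u + x + v
                    if len(w) <= max_len and w not in seen:
                        seen.add(w)
                        todo.append(w)
    return seen

# Grammar G of Lemma 34 over V = {a, b, c}.
G = [(r"a[abc]*", [("", "a"), ("b", "b")]),      # S1 = {a} V^*
     (r"b(?:aa)+b[abc]*", [("c", "c")])]         # S2 = { b a^(2m) b | m >= 1 } V^*
words = generate({"a"}, G, max_len=8)

L1 = {"a" * n for n in range(1, 9)}
L2 = {"b" + "a" * n + "b" for n in range(1, 7)}
L3 = {"cb" + "a" * (2 * n) + "bc" for n in range(1, 3)}
print(words == L1 | L2 | L3)
```

The second selection language only inspects a prefix of the word, so it accepts $ba^{n}b$ exactly for even $n$; this is what keeps words with an odd number of letters $a$ out of $L_{3}$.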

Next, we show some equalities.

Lemma 35

A restriction to comet languages (left, right, two-sided) as selection languages does not decrease the generative capacity of external contextual grammars:

\mathcal{EC}(\mathit{REG})=\mathcal{EC}(\mathit{LCOM})=\mathcal{EC}(\mathit{RCOM})=\mathcal{EC}(\mathit{2COM}).

Proof.

With the inclusions $\mathit{LCOM}\subseteq\mathit{2COM}$ , $\mathit{RCOM}\subseteq\mathit{2COM}$ , and $\mathit{2COM}\subseteq\mathit{REG}$ (see Theorem 25 and Figure 1), we obtain also the inclusions ${\cal EC}(\mathit{LCOM})\subseteq{\cal EC}(\mathit{2COM})$ , ${\cal EC}(\mathit{RCOM})\subseteq{\cal EC}(\mathit{2COM})$ , and ${\cal EC}(\mathit{2COM})\subseteq{\cal EC}(\mathit{REG})$ according to Lemma 26.

Let $G=(V,\{(S_{1},C_{1}),\dots,(S_{n},C_{n})\},A)$ be a contextual grammar with arbitrary regular selection languages. Further, let $X$ be a new symbol ( $X\notin V$ ). We set $S_{i}^{\prime}=\{X\}^{*}S_{i}$ for $1\leq i\leq n$ . Then, the contextual grammar $G^{\prime}=(V\cup\{X\},\{(S_{1}^{\prime},C_{1}),\dots,(S_{n}^{\prime},C_{n})\},A)$ generates the same language as $G$ . The selection languages are all right-sided comet languages. The letter $X$ neither occurs in an axiom nor in a context. Therefore, the part $\{X\}^{*}$ of the selection languages has no impact on the possible derivations (the only word used is $\lambda$ ). Thus, the inclusion $\mathcal{EC}(\mathit{REG})\subseteq\mathcal{EC}(\mathit{RCOM})$ holds.

With $S_{i}^{\prime}=S_{i}\{X\}^{*}$ for $1\leq i\leq n$ , the same language is generated and the selection languages are left-sided comets. Hence, we also have the inclusion $\mathcal{EC}(\mathit{REG})\subseteq\mathcal{EC}(\mathit{LCOM})$ . Together with the inclusions above, we obtain the chain of inclusions ${\cal EC}(\mathit{REG})\subseteq{\cal EC}({\cal C})\subseteq{\cal EC}(\mathit{2COM})\subseteq{\cal EC}(\mathit{REG})$ for ${\cal C}\in\{\mathit{LCOM},\mathit{RCOM}\}$ which implies the equalities stated in the lemma. ∎
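The padding construction of this proof is easy to try out. The following sketch is an illustration under assumptions: selection languages are encoded as Python regular expressions, the helper names `generate`, `pad_right_comet`, and `pad_left_comet` are hypothetical, and the grammar of Lemma 30 merely serves as a sample input.

```python
import re

def generate(axioms, components, max_len):
    """All words of L(G) with length <= max_len, using external derivation."""
    seen = {w for w in axioms if len(w) <= max_len}
    todo = list(seen)
    while todo:
        x = todo.pop()
        for selection, contexts in components:
            if re.fullmatch(selection, x):
                for u, v in contexts:
                    w = u + x + v
                    if len(w) <= max_len and w not in seen:
                        seen.add(w)
                        todo.append(w)
    return seen

def pad_right_comet(components, new_symbol="X"):
    """Replace every selection language S by {X}^* S for a new letter X."""
    return [(f"{new_symbol}*(?:{sel})", ctxs) for sel, ctxs in components]

def pad_left_comet(components, new_symbol="X"):
    """Replace every selection language S by S {X}^* for a new letter X."""
    return [(f"(?:{sel}){new_symbol}*", ctxs) for sel, ctxs in components]

# Sample grammar (the one from Lemma 30); X occurs in no axiom and no context.
G = [(r"a*", [("", "a")]), (r"(?:aa)*", [("b", "b")])]
original = generate({"aa"}, G, max_len=10)
right = generate({"aa"}, pad_right_comet(G), max_len=10)
left = generate({"aa"}, pad_left_comet(G), max_len=10)
print(original == right == left)
```

Because no derivable word contains the letter `X`, only the empty word of the part `X*` is ever used, so the three runs agree.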

We now prove some proper inclusions.

Lemma 36

The family $\mathcal{EC}(\mathit{MON})$ is a proper subset of the family $\mathcal{EC}(\mathit{STAR})$ .

Proof.

With the inclusion $\mathit{MON}\subseteq\mathit{STAR}$ (see Theorem 25 and Figure 1), we obtain also the inclusion ${\cal EC}(\mathit{MON})\subseteq{\cal EC}(\mathit{STAR})$ according to Lemma 26.

The language $L=\{\;a^{n}\mid n\geq 2\;\}\cup\{\;ba^{2n}b\mid n\geq 1\;\}$ from Lemma 30 belongs to the family $\mathcal{EC}(\mathit{STAR})$ but not to the family $\mathcal{EC}(\mathit{PS})$ and, hence, neither to $\mathcal{EC}(\mathit{MON})$ . Thus, the language is a witness for the properness of the inclusion. ∎

Lemma 37

The family $\mathcal{EC}(\mathit{FIN})$ is a proper subset of the family $\mathcal{EC}(\mathit{STAR})$ .

Proof.

According to [7], ${\cal EC}(\mathit{FIN})\subset{\cal EC}(\mathit{MON})$ . According to Lemma 36, $\mathcal{EC}(\mathit{MON})\subset\mathcal{EC}(\mathit{STAR})$ . Hence, the family $\mathcal{EC}(\mathit{FIN})$ is also a proper subset of the family ${\cal EC}(\mathit{STAR})$ . ∎

Lemma 38

The family $\mathcal{EC}(\mathit{STAR})$ is a proper subset of the families $\mathcal{EC}(\mathit{LCOM})$ and ${\cal EC}(\mathit{RCOM})$ .

Proof.

The inclusions $\mathit{STAR}\setminus\{\{\lambda\}\}\subseteq\mathit{LCOM}$ and $\mathit{STAR}\setminus\{\{\lambda\}\}\subseteq\mathit{RCOM}$ hold as recalled in Section 3.2. Consider an external contextual grammar with a single selection component $(\{\lambda\},C)$ (if there are more components with the selection language $\{\lambda\}$ , they can be joined to one where the new set of contexts is the union of the single sets and the selection language is still the same). If the generated language contains the empty word, then this is an axiom since it cannot be obtained by derivation. Then, exactly the (finitely many) words $uv$ with $(u,v)\in C$ are generated using this selection component. Thus, if we put all these words $uv$ with $(u,v)\in C$ into the set of axioms as well, we can remove the component $(\{\lambda\},C)$ and obtain a contextual grammar which generates the same language but has no selection language $\{\lambda\}$ anymore. Then, the remaining selection languages belong to the families $\mathit{LCOM}$ and $\mathit{RCOM}$ . Hence, every language of $\mathcal{EC}(\mathit{STAR})$ also belongs to the families $\mathcal{EC}(\mathit{LCOM})$ and ${\cal EC}(\mathit{RCOM})$ .

According to Lemma 28, the language $L=\{\;b^{n}a\mid n\geq 0\;\}\cup\{\lambda\}$ belongs to $\mathcal{EC}(\mathit{COMB})$ (and also to $\mathcal{EC}(\mathit{LCOM})$ and $\mathcal{EC}(\mathit{RCOM})$ by Theorem 25, Figure 1, and Lemma 26) but not to $\mathcal{EC}(\mathit{STAR})$ . This proves the properness of the inclusion. ∎
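The elimination of a selection component $(\{\lambda\},C)$ described in the first part of this proof can be sketched as a small transformation. Everything in the block is an assumption made for illustration: the helper names `generate` and `drop_lambda_component`, the convention that the empty regex `""` encodes the selection language $\{\lambda\}$, and the sample grammar.

```python
import re

def generate(axioms, components, max_len):
    """All words of L(G) with length <= max_len, using external derivation."""
    seen = {w for w in axioms if len(w) <= max_len}
    todo = list(seen)
    while todo:
        x = todo.pop()
        for selection, contexts in components:
            if re.fullmatch(selection, x):
                for u, v in contexts:
                    w = u + x + v
                    if len(w) <= max_len and w not in seen:
                        seen.add(w)
                        todo.append(w)
    return seen

def drop_lambda_component(axioms, components):
    """Remove components with selection language {lambda} (regex "")."""
    new_axioms = set(axioms)
    kept = []
    for sel, ctxs in components:
        if sel == "":
            # The component is usable only if lambda is an axiom; the words uv
            # it produces are then moved into the set of axioms.
            if "" in axioms:
                new_axioms |= {u + v for u, v in ctxs}
        else:
            kept.append((sel, ctxs))
    return new_axioms, kept

# Hypothetical sample grammar with one component ({lambda}, C) and one star language.
axioms = {"", "ab"}
G = [("", [("a", "b"), ("b", "")]), (r"(?:a+b+)*", [("a", "b")])]
new_axioms, G_reduced = drop_lambda_component(axioms, G)
print(generate(axioms, G, 8) == generate(new_axioms, G_reduced, 8))
```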

Lemma 39

The family $\mathcal{EC}(\mathit{DEF})$ is a proper subset of the family $\mathcal{EC}(\mathit{SYDEF})$ .

Proof.

Let $G=(V,\{(S_{1},C_{1}),\ldots,(S_{n},C_{n})\},A)$ be a contextual grammar where all selection languages are definite: $S_{i}=U_{i}^{*}B_{i}\cup A_{i}$ for $1\leq i\leq n$ . We first separate the finite parts and obtain the contextual grammar $G^{\prime}=(V,\{(U_{1}^{*}B_{1},C_{1}),(A_{1},C_{1}),\ldots,(U_{n}^{*}B_{n},C_{n}),(A_{n},C_{n})\},A)$ which generates the same language as $G$ . Next, we eliminate the components which are empty or finite: If a set $B_{i}$ is empty, then the entire selection language $U_{i}^{*}B_{i}$ is empty and cannot be used for derivation. Hence, we can simply omit such selection components without changing the generated language. For every component $(A_{i},C_{i})$ where $A_{i}$ is a finite language ( $1\leq i\leq n$ ), we move all words $uwv$ with $(u,v)\in C_{i}$ and $w\in A_{i}\cap L(G)$ into the set of axioms. These are finitely many words (as $A_{i}$ and $C_{i}$ are finite) and they are exactly the words generated by these components. Hence, we can remove these components afterwards. Then, we have obtained a contextual grammar which still generates the same language $L(G)$ but has only symmetric definite selection languages left.

The language $L=\{\;a^{n}\mid n\geq 1\;\}\cup\{\;ba^{n}b\mid n\geq 1\;\}\cup\{\;cba^{2n}bc\mid n\geq 1\;\}$ is a witness language for the properness of the inclusion: according to Lemma 34, it belongs to the family $\mathcal{EC}(\mathit{SYDEF})$ but not to the family $\mathcal{EC}(\mathit{NC})$ and, hence, not to $\mathcal{EC}(\mathit{DEF})$ either (since ${\cal EC}(\mathit{DEF})\subset{\cal EC}(\mathit{NC})$ according to [7]). ∎

Lemma 40

The family $\mathcal{EC}(\mathit{SYDEF})$ is a proper subset of the family $\mathcal{EC}(\mathit{PS})$ .

Proof.

From [20], we know the inclusion $\mathit{SYDEF}\subseteq\mathit{PS}$ . Therefore, by Lemma 26, we have the inclusion $\mathcal{EC}(\mathit{SYDEF})\subseteq\mathcal{EC}(\mathit{PS})$ . Its properness follows from Lemma 32 with $L=\{a,b\}^{*}\cup\{c\}\{ab\}^{*}\{c\}$ which belongs to the family $\mathcal{EC}(\mathit{ORD})$ (and also to $\mathcal{EC}(\mathit{PS})$ by [29]) but not to the family $\mathcal{EC}(\mathit{SYDEF})$ . ∎

Now, we prove the incomparability relations mentioned in Figure 2 which have not been proved earlier. These are the relations regarding the families ${\cal EC}(\mathit{STAR})$ and ${\cal EC}(\mathit{SYDEF})$ since the families ${\cal EC}(\mathit{LCOM})$ , ${\cal EC}(\mathit{RCOM})$ , and ${\cal EC}(\mathit{2COM})$ coincide with ${\cal EC}(\mathit{REG})$ and are therefore not incomparable to the other families mentioned.

Lemma 41

Let ${\cal F}=\{\mathit{COMB},\mathit{DEF},\mathit{SYDEF},\mathit{ORD},\mathit{NC},\mathit{PS}\}$ . The family $\mathcal{EC}(\mathit{STAR})$ is incomparable to each family $\mathcal{EC}(F)$ with $F\in{\cal F}$ .

Proof.

Due to the inclusion relations, it suffices to show that there are two languages $L_{1}$ and $L_{2}$ with the properties $L_{1}\in{\cal EC}(\mathit{COMB})\setminus{\cal EC}(\mathit{STAR})$ and $L_{2}\in{\cal EC}(\mathit{STAR})\setminus{\cal EC}(\mathit{PS})$ . From Lemma 28, we have the language $L_{1}=\{\;b^{n}a\mid n\geq 0\;\}\cup\{\lambda\}$ . From Lemma 30, we have $L_{2}=\{\;ba^{2n}b\mid n\geq 1\;\}\cup\{\;a^{n}\mid n\geq 2\;\}$ . ∎

Lemma 42

Let ${\cal F}=\{\mathit{NIL},\mathit{COMM},\mathit{CIRC}\}$ . The family $\mathcal{EC}(\mathit{STAR})$ is incomparable to each family $\mathcal{EC}(F)$ with $F\in{\cal F}$ .

Proof.

Due to the inclusion relations, it suffices to show that there are two languages $L_{1}$ and $L_{2}$ with the properties $L_{1}\in{\cal EC}(\mathit{NIL})\setminus{\cal EC}(\mathit{STAR})$ and $L_{2}\in{\cal EC}(\mathit{STAR})\setminus{\cal EC}(\mathit{CIRC})$ . From Lemma 27, we have the language $L_{1}=\{\;a^{n}bbb\mid n\geq 1\;\}\cup\{\lambda\}$ . From Lemma 31, we have $L_{2}=\{\;a^{n}b^{n}\mid n\geq 1\;\}\cup\{\;b^{n}a^{n}\mid n\geq 1\;\}$ . ∎

Lemma 43

The language family $\mathcal{EC}(\mathit{STAR})$ is incomparable to the family $\mathcal{EC}(\mathit{SUF})$ .

Proof.

We have $L_{1}=\{a,b\}^{*}\{\;a^{n}b^{m}\mid n\geq 1,\ m\geq 1\;\}\cup\{\;ca^{n}b^{m}c\mid n\geq 1,\ m\geq 1\;\}\in\mathcal{EC}(\mathit{SUF})\setminus\mathcal{EC}(\mathit{STAR})$ from Lemma 29. From Lemma 30, we know that $L_{2}=\{\;a^{n}\mid n\geq 2\;\}\cup\{\;ba^{2n}b\mid n\geq 1\;\}$ belongs to the family $\mathcal{EC}(\mathit{STAR})$ but not to $\mathcal{EC}(\mathit{PS})$ (and neither to $\mathcal{EC}(\mathit{SUF})$ by [29]). ∎

Lemma 44

The language family $\mathcal{EC}(\mathit{SYDEF})$ is incomparable to the family $\mathcal{EC}(\mathit{SUF})$ .

Proof.

We have $L_{1}=\{a,b\}^{*}\cup\{c\}\{\lambda,b\}\{ab\}^{*}\{c\}\in\mathcal{EC}(\mathit{SUF})\setminus\mathcal{EC}(\mathit{SYDEF})$ from Lemma 33. From [7], we know that $L_{2}=\{\;ab^{n}\mid n\geq 1\;\}\cup\{\lambda\}$ belongs to the family $\mathcal{EC}(\mathit{COMB})$ but not to $\mathcal{EC}(\mathit{SUF})$ . By [29] and Lemma 39, the language $L_{2}$ also belongs to ${\cal EC}(\mathit{SYDEF})$ . ∎

Lemma 45

The family $\mathcal{EC}(\mathit{SYDEF})$ is incomparable to each of the families ${\cal EC}(\mathit{ORD})$ and ${\cal EC}(\mathit{NC})$ .

Proof.

Due to the inclusion relations, it suffices to show that there are two languages $L_{1}$ and $L_{2}$ with the properties $L_{1}\in{\cal EC}(\mathit{ORD})\setminus{\cal EC}(\mathit{SYDEF})$ and $L_{2}\in{\cal EC}(\mathit{SYDEF})\setminus{\cal EC}(\mathit{NC})$ . From Lemma 32, we have $L_{1}=\{a,b\}^{*}\cup\{c\}\{ab\}^{*}\{c\}$ . As $L_{2}$ , we take $L_{2}=\{\;a^{n}\mid n\geq 1\;\}\cup\{\;ba^{n}b\mid n\geq 1\;\}\cup\{\;cba^{2n}bc\mid n\geq 1\;\}$ from Lemma 34. ∎

Lemma 46

The family $\mathcal{EC}(\mathit{SYDEF})$ is incomparable to each of the families ${\cal EC}(\mathit{COMM})$ and ${\cal EC}(\mathit{CIRC})$ .

Proof.

Due to the inclusion relations, it suffices to show that there are two languages $L_{1}$ and $L_{2}$ with the properties $L_{1}\in{\cal EC}(\mathit{COMM})\setminus{\cal EC}(\mathit{SYDEF})$ and $L_{2}\in{\cal EC}(\mathit{SYDEF})\setminus{\cal EC}(\mathit{CIRC})$ . In [29], it was proved that the language $L_{1}=\{\;a^{n}\mid n\geq 2\;\}\cup\{\;ba^{2n}b\mid n\geq 1\;\}$ belongs to ${\cal EC}(\mathit{COMM})$ but not to ${\cal EC}(\mathit{PS})$ (this can be seen also in the proof of Lemma 30). By Lemma 40, the language $L_{1}$ does not belong to the family ${\cal EC}(\mathit{SYDEF})$ either.

In [7], it was proved that the language $L_{2}=\{\;abc^{n}\mid n\geq 1\;\}\cup\{\;c^{n}ab\mid n\geq 1\;\}$ belongs to ${\cal EC}(\mathit{COMB})$ but not to ${\cal EC}(\mathit{CIRC})$ . By [29] and Lemma 39, the language $L_{2}$ also belongs to the family ${\cal EC}(\mathit{SYDEF})$ . ∎

Theorem 47 (Hierarchy of the ${\cal EC}$ language families)

The inclusion relations presented in Figure 2 hold. An arrow from an entry $X$ to an entry $Y$ depicts the proper inclusion $X\subset Y$ ; if two families are not connected by a directed path, then they are incomparable.

Proof.

An edge label refers to the paper or lemma in the present paper where the proper inclusion is shown. The incomparability results are proved in Lemmas 41 to 46. ∎

5 Conclusion and future work

In this paper, we have extended the previous hierarchy of subregular language families and families generated by external contextual grammars with selection in certain subregular language families.

Various other subregular language families have also been investigated in the past (for instance, in [2, 13, 20]). Future research will extend and unify current hierarchies of subregular language families (presented, for instance, in [10, 29]) by additional families and use them as control in external contextual grammars. We have already started investigating the position of prefix- and infix-closed as well as prefix-, suffix-, and infix-free languages in the current hierarchy and their impact on the generative power of external contextual grammars when used for selection. The extension of the hierarchy with other families of definite-like languages (for instance, ultimate definite, central definite, noninitial definite) has also already begun.

The research will also be extended to internal contextual grammars or tree-controlled grammars where results are already available in [10, 29, 30, 31].

References

[2] Henning Bordihn, Markus Holzer & Martin Kutrib (2009): Determination of finite automata accepting subregular languages. Theoretical Computer Science 410(35), pp. 3209–3222, 10.1016/j.tcs.2009.05.019.
[3] Janusz A. Brzozowski (1962): Regular expression techniques for sequential circuits. Ph.D. thesis, Princeton University, Princeton, NJ, USA.
[4] Janusz A. Brzozowski (1967): Roots of star events. Journal of the ACM 14(3), pp. 466–477, 10.1109/SWAT.1966.21.
[5] Janusz A. Brzozowski & Rina Cohen (1969): On decompositions of regular events. Journal of the ACM 16(1), pp. 132–144, 10.1145/321495.321505.
[6] Janusz A. Brzozowski, Galina Jirásková & Chenglong Zou (2014): Quotient complexity of closed languages. Theory of Computing Systems 54, pp. 277–292, 10.1007/s00224-013-9515-7.
[7] Jürgen Dassow (2005): Contextual grammars with subregular choice. Fundamenta Informaticae 64(1–4), pp. 109–118.
[8] Jürgen Dassow (2015): Contextual languages with strictly locally testable and star free selection languages. Analele Universitatii Bucuresti 62, pp. 25–36.
[9] Jürgen Dassow, Florin Manea & Bianca Truthe (2012): On external contextual grammars with subregular selection languages. Theoretical Computer Science 449, pp. 64–73, 10.1016/j.tcs.2012.04.008.
[10] Jürgen Dassow & Bianca Truthe (2023): Relations of contextual grammars with strictly locally testable selection languages. RAIRO – Theoretical Informatics and Applications 57, p. #10, 10.1051/ita/2023012.
[11] Ferenc Gécseg & István Peák (1972): Algebraic Theory of Automata. Akadémiai Kiadó, Budapest.
[12] Arthur Gill & Lawrence T. Kou (1974): Multiple-entry finite automata. Journal of Computer and System Sciences 9(1), pp. 1–19, 10.1016/S0022-0000(74)80034-6.
[13] Yo-Sub Han & Kai Salomaa (2009): State complexity of basic operations on suffix-free regular languages. Theoretical Computer Science 410(27), pp. 2537–2548, 10.1016/j.tcs.2008.12.054.
[14] Ivan M. Havel (1969): The theory of regular events II. Kybernetika 5(6), pp. 520–544.
[15] Markus Holzer & Bianca Truthe (2015): On relations between some subregular language families. In Rudolf Freund, Markus Holzer, Nelma Moreira & Rogério Reis, editors: Seventh Workshop on Non-Classical Models of Automata and Applications – NCMA 2015, Porto, Portugal, August 31 – September 1, 2015. Proceedings, books@ocg.at 318, Österreichische Computer Gesellschaft, pp. 109–124.
[16] Manfred Kudlek (2004): On languages of cyclic words. In Natasha Jonoska, Gheorghe Păun & Grzegorz Rozenberg, editors: Aspects of Molecular Computing, Essays Dedicated to Tom Head on the Occasion of His 70th Birthday, LNCS 2950, Springer-Verlag, pp. 278–288, 10.1007/978-3-540-24635-0_20.
[17] Solomon Marcus (1969): Contextual grammars. Revue Roumaine de Mathématique Pures et Appliquées 14, pp. 1525–1534.
[18] Robert McNaughton & Seymour Papert (1971): Counter-Free Automata. MIT Press, Cambridge, USA.
[19] Benedek Nagy (2019): Union-Freeness, Deterministic Union-Freeness and Union-Complexity. In Michal Hospodár, Galina Jirásková & Stavros Konstantinidis, editors: Descriptional Complexity of Formal Systems, 21st IFIP WG 1.02 International Conference, DCFS 2019, Košice, Slovakia, July 17–19, 2019, Proceedings, Springer, Cham, pp. 46–56, 10.1007/978-3-030-23247-4_3.
[20] Viktor Olejár & Alexander Szabari (2023): Closure Properties of Subregular Languages Under Operations. International Journal of Foundations of Computer Science, pp. 1–25, 10.1142/S0129054123450016.
[21] Azaria Paz & Bezalel Peleg (1965): Ultimate-definite and symmetric-definite events and automata. Journal of the ACM 12(3), pp. 399–410, 10.1145/321281.321292.
[22] Micha A. Perles, Michael O. Rabin & Eli Shamir (1963): The theory of definite automata. IEEE Transactions on Electronic Computers 12, pp. 233–243, 10.1109/PGEC.1963.263534.
[23] Grzegorz Rozenberg & Arto Salomaa, editors (1997): Handbook of Formal Languages. Springer-Verlag, Berlin, 10.1007/978-3-642-59136-5.
[24] Huei-Jan Shyr (1991): Free Monoids and Languages. Hon Min Book Co., Taichung, Taiwan.
[25] Huei-Jan Shyr & Gabriel Thierrin (1974): Ordered automata and associated languages. Tamkang Journal of Mathematics 5(1), pp. 9–20.
[26] Huei-Jan Shyr & Gabriel Thierrin (1974): Power-separating regular languages. Mathematical Systems Theory 8(1), pp. 90–95, 10.1007/BF01761710.
[27] Bianca Truthe (2014): A relation between definite and ordered finite automata. In Suna Bensch, Rudolf Freund & Friedrich Otto, editors: Sixth Workshop on Non-Classical Models for Automata and Applications – NCMA 2014, Kassel, Germany, July 28–29, 2014. Proceedings, books@ocg.at 304, Österreichische Computer Gesellschaft, pp. 235–247.
[28] Bianca Truthe (2018): Hierarchy of Subregular Language Families. Technical Report, Justus-Liebig-Universität Giessen, Institut für Informatik, IFIG Research Report 1801.
[29] Bianca Truthe (2021): Generative Capacity of Contextual Grammars with Subregular Selection Languages. Fundamenta Informaticae 180(1–2), pp. 123–150, 10.3233/FI-2021-2037.
[30] Bianca Truthe (2023): Merging two Hierarchies of Internal Contextual Grammars with Subregular Selection. In Benedek Nagy & Rudolf Freund, editors: Proceedings of the 13th International Workshop on Non-Classical Models of Automata and Applications, NCMA 2023, Famagusta, North Cyprus, 18th–19th September, 2023, EPTCS 388, pp. 125–139, 10.4204/EPTCS.388.12.
[31] Bianca Truthe (2023): Strictly Locally Testable and Resources Restricted Control Languages in Tree-Controlled Grammars. In Zsolt Gazdag, Szabolcs Iván & Gergely Kovásznai, editors: Proceedings of the 16th International Conference on Automata and Formal Languages, AFL 2023, Eger, Hungary, September 5–7, 2023, EPTCS 386, pp. 253–268, 10.4204/EPTCS.386.20.
[32] Barbara Wiedemann (1978): Vergleich der Leistungsfähigkeit endlicher determinierter Automaten. Diplomarbeit, Universität Rostock.
