This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Various Types of Comet Languages and their Application in External Contextual Grammars

Marvin Ködding     Bianca Truthe Institut für Informatik, Universität Giessen
Arndtstr. 2, 35392 Giessen, Germany {marvin.koedding,bianca.truthe}@informatik.uni-giessen.de
Abstract

In this paper, we continue the research on the power of contextual grammars with selection languages from subfamilies of the family of regular languages. We investigate various comet-like types of languages and compare such language families to some other subregular families of languages (finite, monoidal, nilpotent, combinational, (symmetric) definite, ordered, non-counting, power-separating, suffix-closed, commutative, circular, or union-free languages). Further, we compare the language families defined by these types for the selection with each other and with the families of the hierarchy obtained for external contextual grammars. In this way, we extend the existing hierarchy by new language families.

Keywords: Comet languages, contextual grammars, subregular selection languages, computational capacity.

1 Introduction

Contextual grammars were first introduced by Solomon Marcus in [17] as a formal model that might be used for the generation of natural languages. The derivation steps consist in adding contexts to given well-formed sentences, starting from an initial finite basis. Formally, a context is given by a pair (u,v)(u,v) of words and the external adding to a word xx gives the word uxvuxv. In order to control the derivation process, contextual grammars with selection in a certain family of languages were defined. In such contextual grammars, a context (u,v)(u,v) may be added only around a word xx if this word xx belongs to a language which is associated with the context. Language families were defined where all selection languages in a contextual grammar belong to some language family FF.

The study of external contextual grammars with selection in special regular sets was started by Jürgen Dassow in [7]. The research was continued by Jürgen Dassow, Florin Manea, and Bianca Truthe (see [9]) where further subregular families of selection languages were considered.

In the present paper, we extend the hierarchy of subregular language families by families of comet-like languages. Furthermore, we investigate the generative capacity of external contextual grammars with selection in such subregular language families.

2 Preliminaries

Throughout the paper, we assume that the reader is familiar with the basic concepts of the theory of automata and formal languages. For details, we refer to [23]. Here we only recall some notation, definitions, and previous results which we need for the present research.

An alphabet is a non-empty finite set of symbols. For an alphabet VV, we denote by VV^{*} and V+V^{+} the set of all words and the set of all non-empty words over VV, respectively. The empty word is denoted by λ\lambda. For a word ww and a letter aa, we denote the length of ww by |w||w| and the number of occurrences of the letter aa in the word ww by |w|a|w|_{a}. For a set AA, we denote its cardinality by |A||A|. The reversal of a word ww is denoted by wRw^{R}: if w=x1x2xnw=x_{1}x_{2}\ldots x_{n} for letters x1,,xnx_{1},\ldots,x_{n}, then wR=xnxn1x1w^{R}=x_{n}x_{n-1}\ldots x_{1}. By LRL^{R}, we denote the language of all reversals of the words in LL: LR={wRwL}L^{R}=\{\;w^{R}\mid w\in L\;\}.

A deterministic finite automaton is a quintuple

𝒜=(V,Z,z0,F,δ){\cal A}=(V,Z,z_{0},F,\delta)

where VV is a finite set of input symbols, ZZ is a finite set of states, z0Zz_{0}\in Z is the initial state, FZF\subseteq Z is a set of accepting states, and δ\delta is a transition function δ:Z×VZ\delta:Z\times V\to Z. The language accepted by such an automaton is the set of all input words over the alphabet VV which lead letterwise by the transition function from the initial state to an accepting state.

A regular expression over some alphabet VV is defined inductively as follows:

  1. 1.

    \emptyset is a regular expression;

  2. 2.

    every element xVx\in V is a regular expression;

  3. 3.

    if RR and SS are regular expressions, so are the concatenation RSR\cdot S, the union RSR\cup S, and the Kleene closure RR^{*};

  4. 4.

    for every regular expression, there is a natural number nn such that the regular expression is obtained from the atomic elements \emptyset and xVx\in V by nn operations concatenation, union, or Kleene closure.

The language L(R)L(R) which is described by a regular expression RR is also inductively defined:

  1. 1.

    L()=L(\emptyset)=\emptyset;

  2. 2.

    for every element xVx\in V, we have L(x)={x}L(x)=\{x\};

  3. 3.

    if RR and SS are regular expressions, then

    L(RS)=L(R)L(S),L(RS)=L(R)L(S),L(R)=(L(R)).L(R\cdot S)=L(R)\cdot L(S),\quad L(R\cup S)=L(R)\cup L(S),\quad L(R^{*})=(L(R))^{*}.

A general regular expression admits as operations (in the third item of the definition above) also intersection (where L(RS)=L(R)L(S)L(R\cap S)=L(R)\cap L(S)) and complementation (where L(R¯)=L(R)¯L(\overline{R})=\overline{L(R)}).

All the languages accepted by a finite automaton or described by some regular expression are called regular and form a family denoted by 𝑅𝐸𝐺\mathit{REG}. Any subfamily of this set is called a subregular language family.

2.1 Some subregular language families

We consider the following restrictions for regular languages. In the following list of properties, we give already the abbreviation which denotes the family of all languages with the respective property. Let LL be a regular language over an alphabet VV. With respect to the alphabet VV, the language LL is said to be

  • monoidal (𝑀𝑂𝑁\mathit{MON}) if and only if L=VL=V^{*},

  • nilpotent (𝑁𝐼𝐿\mathit{NIL}) if and only if it is finite or its complement VLV^{*}\setminus L is finite,

  • combinational (𝐶𝑂𝑀𝐵\mathit{COMB}) if and only if it has the form L=VXL=V^{*}X for some subset XVX\subseteq V,

  • definite (𝐷𝐸𝐹\mathit{DEF}) if and only if it can be represented in the form L=AVBL=A\cup V^{*}B where AA and BB are finite subsets of VV^{*},

  • symmetric definite (𝑆𝑌𝐷𝐸𝐹\mathit{SYDEF}) if and only if L=EVHL=EV^{*}H for some regular languages EE and HH,

  • suffix-closed (𝑆𝑈𝐹\mathit{SUF}) (or fully initial or multiple-entry language) if and only if, for any two words over VV, say xVx\in V^{*} and yVy\in V^{*}, the relation xyLxy\in L implies the relation yLy\in L,

  • ordered (𝑂𝑅𝐷\mathit{ORD}) if and only if the language is accepted by some deterministic finite automaton

    𝒜=(V,Z,z0,F,δ){\cal A}=(V,Z,z_{0},F,\delta)

    with an input alphabet VV, a finite set ZZ of states, a start state z0Zz_{0}\in Z, a set FZF\subseteq Z of accepting states and a transition mapping δ\delta where (Z,)(Z,\preceq) is a totally ordered set and, for any input symbol aVa\in V, the relation zzz\preceq z^{\prime} implies δ(z,a)δ(z,a)\delta(z,a)\preceq\delta(z^{\prime},a),

  • commutative (𝐶𝑂𝑀𝑀\mathit{COMM}) if and only if it contains with each word also all permutations of this word,

  • circular (𝐶𝐼𝑅𝐶\mathit{CIRC}) if and only if it contains with each word also all circular shifts of this word,

  • non-counting (𝑁𝐶\mathit{NC}) if and only if there is a natural number k1k\geq 1 such that, for any three words xVx\in V^{*}, yVy\in V^{*}, and zVz\in V^{*}, it holds xykzLxy^{k}z\in L if and only if xyk+1zLxy^{k+1}z\in L,

  • star-free (𝑆𝐹\mathit{SF}) if and only if LL can be described by a regular expression which is built by concatenation, union, and complementation,

  • power-separating (𝑃𝑆\mathit{PS}) if and only if, there is a natural number m1m\geq 1 such that for any word xVx\in V^{*}, either JxmL=J_{x}^{m}\cap L=\emptyset or JxmLJ_{x}^{m}\subseteq L where Jxm={xnnm}J_{x}^{m}=\{\;x^{n}\mid n\geq m\;\},

  • union-free (𝑈𝐹\mathit{UF}) if and only if LL can be described by a regular expression which is only built by concatenation and Kleene closure,

  • star (𝑆𝑇𝐴𝑅\mathit{STAR}) if and only if L=HL=H^{*} for some regular language HVH\subseteq V^{*},

  • left-sided comet (𝐿𝐶𝑂𝑀\mathit{LCOM}) if and only if L=EGL=EG^{*} for some regular language EE and a regular language G{,{λ}}G\notin\{\emptyset,\{\lambda\}\},

  • right-sided comet (𝑅𝐶𝑂𝑀\mathit{RCOM}) if and only if L=GHL=G^{*}H for some regular language HH and a regular language G{,{λ}}G\notin\{\emptyset,\{\lambda\}\},

  • two-sided comet (2COM\mathit{2COM}) if and only if L=EGHL=EG^{*}H for two regular languages EE and HH and a regular language G{,{λ}}G\notin\{\emptyset,\{\lambda\}\}.

We remark that monoidal, nilpotent, combinational, (symmetric) definite, ordered, non-counting, star-free, union-free, star, and (left-, right-, or two-sided) comet languages are regular, whereas non-regular languages of the other types mentioned above exist. Here, we consider among the suffix-closed, commutative, circular, and power-separating languages only those which are also regular. By 𝐹𝐼𝑁\mathit{FIN}, we denote the family of languages with finitely many words. In [18], it was shown that the families of the non-counting languages and the star-free languages are equivalent (𝑁𝐶=𝑆𝐹\mathit{NC}=\mathit{SF}).

Some properties of the languages of the classes mentioned above can be found in [24] (monoids), [11] (nilpotent languages), [14] (combinational and commutative languages), [22] (definite languages), [21] (symmetric definite languages), [12] and [6] (suffix-closed languages), [25] (ordered languages), [16] (circular languages), [18] (non-counting and star free languages), [26] (power-separating languages), [3] (union-free languages), [4] (star languages), [5] (comet languages).

2.2 Contextual grammars

Let {\cal F} be a family of languages. A contextual grammar with selection in {\cal F} is a triple G=(V,𝒮,A)G=(V,{\cal S},A) where

  • VV is an alphabet,

  • 𝒮{\cal S} is a finite set of selection pairs (S,C)(S,C) with a selection language SS over some subset UU of the alphabet VV which belongs to the family {\cal F} with respect to the alphabet UU and a finite set CV×VC\subset V^{*}\times V^{*} of contexts where, for each context (u,v)C(u,v)\in C, at least one side is not empty: uvλuv\not=\lambda,

  • AA is a finite subset of VV^{*} (its elements are called axioms).

We write a selection pair (S,C)(S,C) also as SCS\to C. In the case that CC is a singleton set C={(u,v)}C=\{(u,v)\}, we also write S(u,v)S\to(u,v). For a contextual grammar G=(V,{(S1,C1),(S2,C2),,(Sn,Cn)},A)G=(V,\left\{\,(S_{1},C_{1}),(S_{2},C_{2}),\dots,(S_{n},C_{n})\,\right\},A), we set

A(G)=max{|w||wA},C(G)=max{|uv||(u,v)Ci,1in},(G)=A(G)+C(G)+1.\ell_{A}(G)=\max\left\{\>|w|\;|\;w\in A\>\right\},\quad\ell_{C}(G)=\max\left\{\>|uv|\;|\;(u,v)\in C_{i},1\leq i\leq n\>\right\},\quad\ell(G)=\ell_{A}(G)+\ell_{C}(G)+1.

We now define the derivation modes for contextual grammars with selection.

Let G=(V,𝒮,A)G=(V,{\cal S},A) be a contextual grammar with selection. A direct external derivation step in GG is defined as follows: a word xx derives a word yy (written as xyx\Longrightarrow y) if and only if there is a pair (S,C)𝒮(S,C)\in{\cal S} such that xSx\in S and y=uxvy=uxv for some pair (u,v)C(u,v)\in C. Intuitively, one can only wrap a context (u,v)C(u,v)\in C around a word xx if xx belongs to the corresponding selection language SS.

By \Longrightarrow^{*} we denote the reflexive and transitive closure of the relation \Longrightarrow. The language generated by GG is L={zxz for some xA}L=\{\;z\mid x\Longrightarrow^{*}z\mbox{ for some }x\in A\;\}.

Example 1

Consider the contextual grammar G=({a,b,c},{(S1,C1),(S2,C2)},{λ})G=(\{a,b,c\},\{(S_{1},C_{1}),(S_{2},C_{2})\},\{\lambda\}) with

S1={a,b},C1={(λ,a),(λ,b)},S2={ab},C2={(c,c)}.S_{1}=\{a,b\}^{*},\quad C_{1}=\{(\lambda,a),(\lambda,b)\},\qquad S_{2}=\{ab\}^{*},\quad C_{2}=\{(c,c)\}.

Starting from the axiom λ\lambda, every word of the language S1S_{1} is generated by applying the first selection. Starting from any word of S2S1S_{2}\subset S_{1}, every word of the language {c}S2{c}\{c\}S_{2}\{c\} is generated by applying the second selection. Other words are not generated.

Thus, the language generated is

L(G)={a,b}{c(ab)ncn0}.L(G)=\{a,b\}^{*}\cup\{\;c(ab)^{n}c\mid n\geq 0\;\}.

Both selection languages are ordered: The language S1S_{1} is accepted by a finite automaton with exactly one state. Hence, it is ordered. The language S2S_{2} is accepted by the following deterministic finite automaton A=({z0,z1,z2,z3},{a,b},δ,z1,{z1})A=(\{z_{0},z_{1},z_{2},z_{3}\},\{a,b\},\delta,z_{1},\{z_{1}\}) where the transition function is illustrated in the following picture and given in the table next to it, from which it can be seen that the automaton is ordered:

z1z_{1}startz2z_{2}z3z_{3}z0z_{0}aabbbbaaa,ba,ba,ba,b
z0z_{0} z1z_{1} z2z_{2} z3z_{3}
aa z0z_{0} z2z_{2} z3z_{3} z3z_{3}
bb z0z_{0} z0z_{0} z1z_{1} z3z_{3}

\Diamond

By 𝒞(){\cal EC}({\cal F}), we denote the family of all languages generated externally by contextual grammars with selection in {\cal F}. When a contextual grammar works in the external mode, we call it an external contextual grammar.

The language generated by the external contextual grammar in Example 1 belongs, for instance, to the family 𝒞(𝑂𝑅𝐷){\cal EC}(\mathit{ORD}) because all selection languages (S1S_{1} and S2S_{2}) are ordered.

3 Results on families of comet languages

We first present some observations about star languages and two-sided comet languages, we give normal forms for two-sided comets, and we insert the subregular families investigated here into the existing hierarchy.

From the structure of two-sided comet languages (languages LL of the form EGHEG^{*}H where GG is neither the empty set nor the set with the empty word only), we see that every such language is infinite if none of the sets EE, GG, and HH is the empty set. If one of the sets EE or HH is empty, then the whole language LL is also empty.

Lemma 2

For each language L2COML\in\mathit{2COM}, it holds that LL is either infinite or empty.

A similar observation can be made for star languages.

Lemma 3

For each language L𝑆𝑇𝐴𝑅L\in\mathit{STAR}, it holds that LL either is infinite or consists of the empty word λ\lambda.

3.1 Normal forms

We first show some observations before we conclude a normal form for languages from the class 2COM\mathit{2COM}. This normal form is later used when we prove that 2COM\mathit{2COM}-languages as selection languages are as powerful as arbitrary regular languages.

Lemma 4

Each two-sided comet language L=EGHL=EG^{*}H can be represented as a finite union

L=i=1nEiGHL=\bigcup_{i=1}^{n}E_{i}G^{*}H

for some number n1n\geq 1 and with union-free languages EiE_{i} for all 1in1\leq i\leq n.

Proof.

Let L=EGHL=EG^{*}H be a two-sided comet language. Every regular language is the union of finitely many union-free languages [19]. Let n1n\geq 1 be a natural number and EiE_{i} be a union-free language for any ii with 1in1\leq i\leq n such that E=E1E2EnE=E_{1}\cup E_{2}\cup\cdots\cup E_{n}. Then, it follows L=E1GHE2GHEnGHL=E_{1}G^{*}H\cup E_{2}G^{*}H\cup\cdots\cup E_{n}G^{*}H. ∎

In order to show later that we can transform any 2COM\mathit{2COM}-language into the mentioned normal form, we now present how an infinite union-free language can be represented by a special 2COM\mathit{2COM}-form.

Lemma 5

For an infinite union-free language LL, there exist sets LlL_{l}, LiL_{i}, and LrL_{r} such that L=LlLiLrL=L_{l}L_{i}^{*}L_{r} where LlL_{l} is finite and Li{,{λ}}L_{i}\notin\{\emptyset,\{\lambda\}\}.

Proof.

We prove the assertion inductively via the number of construction steps required to create a regular expression \mathcal{R} such that L=L()L=L(\mathcal{R}) holds. In construction step 0, only finite languages are created. Therefore, the base case is n=1n=1.

Base case n=1n=1: Since LL is infinite, we have ={x}\mathcal{R}=\{x\}^{*} for a letter xVx\in V. A desired representation for the language LL is then {λ}{x}{λ}\{\lambda\}\{x\}^{*}\{\lambda\}.

Induction step nn+1n\to n+1: Assume the induction hypothesis: For every regular expression \mathcal{R} without the union operator which describes an infinite language and which is at construction level of at most nn, the language L()L(\mathcal{R}) can be represented as L()=LlLiLrL(\mathcal{R})=L_{l}L_{i}^{*}L_{r} with |Ll|<|L_{l}|<\infty and Li{,{λ}}L_{i}\notin\{\emptyset,\{\lambda\}\}. Now, let {\cal R} be a regular expression of construction level n+1n+1 which describes an infinite language and which does not contain the union operator. Then, there are two possibilities how {\cal R} is built: by concatenation of two regular expressions where for at least one of the described languages the induction hypothesis holds or by Kleene closure of a regular expression which neither describes the empty set nor the language {λ}\{\lambda\} (otherwise, L()L({\cal R}) would be finite).

Case 1: Let =𝒮𝒯\mathcal{R}=\mathcal{S}\mathcal{T}. Then, the equation L()=L(𝒮)L(𝒯)L(\mathcal{R})=L(\mathcal{S})L(\mathcal{T}) holds. If L(𝒮)L({\cal S}) is infinite, we get, according to the induction hypothesis, L(𝒮)=SlSiSrL(\mathcal{S})=S_{l}S_{i}^{*}S_{r} for suitable sets SlS_{l}, SiS_{i}, and SrS_{r}. With Rl=SlR_{l}=S_{l}, Ri=SiR_{i}=S_{i}, and Rr=SrL(𝒯)R_{r}=S_{r}L({\cal T}), we obtain L()=RlRiRrL(\mathcal{R})=R_{l}R_{i}^{*}R_{r} with |Rl|<|R_{l}|<\infty and Ri{,{λ}}R_{i}\notin\{\emptyset,\{\lambda\}\}. If L(𝒮)L({\cal S}) is finite, then L(𝒯)L({\cal T}) is infinite (because we consider only such {\cal R} where L()L({\cal R}) is infinite) and we get, according to the induction hypothesis, that L(𝒯)=TlTiTrL(\mathcal{T})=T_{l}T_{i}^{*}T_{r} for suitable sets TlT_{l}, TiT_{i}, and TrT_{r}. With Rl=L(𝒮)TlR_{l}=L({\cal S})T_{l}, Ri=TiR_{i}=T_{i}, and Rr=TrR_{r}=T_{r}, we obtain a desired representation L()=RlRiRrL(\mathcal{R})=R_{l}R_{i}^{*}R_{r} with |Rl|<|R_{l}|<\infty and Ri{,{λ}}R_{i}\notin\{\emptyset,\{\lambda\}\}.

Case 2: Let =𝒮\mathcal{R}=\mathcal{S}^{*}. Then, the equation L()=(L(𝒮))L(\mathcal{R})=(L(\mathcal{S}))^{*} holds. Thus, with Rl={λ}R_{l}=\{\lambda\}, Ri=L(𝒮)R_{i}=L(\mathcal{S}), and Rr={λ}R_{r}=\{\lambda\}, we obtain that L()=RlRiRrL(\mathcal{R})=R_{l}R_{i}^{*}R_{r} with |Rl|<|R_{l}|<\infty and Ri{,{λ}}R_{i}\notin\{\emptyset,\{\lambda\}\}.

Hence, every infinite union-free language can be expressed in the claimed form. ∎

We proved with Lemma 4 that any 2COM\mathit{2COM}-language can be given as a union of finitely many 2COM\mathit{2COM}-languages where the first comet tail is always union-free. Together, we obtain that any 2COM\mathit{2COM}-language has a representation in the 2COM\mathit{2COM}-form where the first comet tail is a finite set.

Lemma 6

For each two-sided comet language L=EGHL=EG^{*}H with E𝑈𝐹E\in\mathit{UF}, there exist a finite language EE^{\prime}, a language G{,{λ}}G^{\prime}\notin\{\emptyset,\{\lambda\}\}, and a regular language HH^{\prime} such that L=E(G)HL=E^{\prime}(G^{\prime})^{*}H^{\prime}.

Proof.

We have shown in Lemma 2 that each two-sided comet language LL is either empty or infinite. For the first case, the assertion holds with E=E^{\prime}=\emptyset and any regular languages G{,{λ}}G^{\prime}\notin\{\emptyset,\{\lambda\}\} and HH^{\prime}.

Now, let L=EGHL=EG^{*}H be an infinite 2COM\mathit{2COM}-language with E𝑈𝐹E\in\mathit{UF}. If EE is finite, then we already have a desired form with E=EE^{\prime}=E, G=GG^{\prime}=G, and H=HH^{\prime}=H.

So, let EE be infinite. By Lemma 5, we know that there are languages ElE_{l}, EiE_{i}, and ErE_{r} such that ElE_{l} is a finite set, Ei{,{λ}}E_{i}\notin\{\emptyset,\{\lambda\}\}, and E=ElEiErE=E_{l}E_{i}^{*}E_{r}. If we set E=ElE^{\prime}=E_{l}, G=EiG^{\prime}=E_{i}, and H=ErGHH^{\prime}=E_{r}G^{*}H, then we obtain a desired form because L=E(G)HL=E^{\prime}(G^{\prime})^{*}H^{\prime} where EE^{\prime} is finite, G{,{λ}}G^{\prime}\notin\{\emptyset,\{\lambda\}\}, and HH^{\prime} is a regular language. ∎

Now we connect the previous lemmas and conclude that, for every two-sided comet language, there is such a representation where the first comet tail of the language is finite.

Theorem 7 (Normal form for 2COM\mathit{2COM}-languages)

For each two-sided comet language, there exists a representation L=EGHL=EG^{*}H such that EE is a finite language and G{,{λ}}G\notin\{\emptyset,\{\lambda\}\}.

Proof.

According to Lemma 4, any two-sided comet language L=E(G)HL=E^{\prime}(G^{\prime})^{*}H^{\prime} can be represented as a union of finitely many languages Ei(G)HE^{\prime}_{i}(G^{\prime})^{*}H^{\prime} such that all languages EiE^{\prime}_{i} are union-free. According to Lemma 6, every such language Ei(G)HE^{\prime}_{i}(G^{\prime})^{*}H^{\prime} can in turn be represented as a 2COM\mathit{2COM}-language EiGHE_{i}G^{*}H where the first tail EiE_{i} is finite. The union EE of all these finite languages EiE_{i} is also finite. Hence, we obtain

L=E(G)H=i=1nEi(G)H=i=1nEiGH=(i=1nEi)GH=EGHL=E^{\prime}(G^{\prime})^{*}H^{\prime}=\bigcup_{i=1}^{n}E^{\prime}_{i}(G^{\prime})^{*}H^{\prime}=\bigcup_{i=1}^{n}E_{i}G^{*}H=\left(\bigcup_{i=1}^{n}E_{i}\right)G^{*}H=EG^{*}H

where EE is finite and G{,{λ}}G\notin\{\emptyset,\{\lambda\}\}. ∎

We refer to this representation as a left-sided normal form. A right-sided normal form (where the last comet tail is a finite set) can be derived in a similar way.

3.2 Hierarchy of subregular language classes

In this section, we investigate inclusion relations between various subregular languages classes. Figure 1 shows the results.

𝑀𝑂𝑁\mathit{MON}𝐹𝐼𝑁\mathit{FIN}𝑁𝐼𝐿\mathit{NIL}𝐶𝑂𝑀𝐵\mathit{COMB}𝐷𝐸𝐹\mathit{DEF}𝑆𝑌𝐷𝐸𝐹\mathit{SYDEF}𝑆𝑈𝐹\mathit{SUF}𝑂𝑅𝐷\mathit{ORD}𝑁𝐶=[18]𝑆𝐹\mathit{NC}\stackrel{{\scriptstyle\text{\cite[cite]{[\@@bibref{}{McNaughton_Papert.1971}{}{}]}}}}{{=}}\mathit{SF}𝑃𝑆\mathit{PS}𝑅𝐶𝑂𝑀\mathit{RCOM}𝐿𝐶𝑂𝑀\mathit{LCOM}2COM\mathit{2COM}𝐶𝑂𝑀𝑀\mathit{COMM}𝐶𝐼𝑅𝐶\mathit{CIRC}𝑈𝐹\mathit{UF}𝑆𝑇𝐴𝑅\mathit{STAR}𝑅𝐸𝐺\mathit{REG}1617[2]17[2]16[15][28][15][15][28][32][15][25][26][15][28][15][32][14][20]17[20][20]
Figure 1: Resulting hierarchy of subregular language families

An arrow from a node XX to a node YY stands for the proper inclusion XYX\subset Y. If two families are not connected by a directed path, then they are incomparable. An edge label refers to the paper where the proper inclusion has been shown (in some cases, it might be that it is not the first paper where the respective inclusion has been mentioned, since it is so obvious that it was not emphasized in a publication) or the lemma of this paper where the proper inclusion will be shown.

In the literature, it is often said that two languages are equivalent if they are equal or differ at most in the empty word. Similarly, two families can be regarded to be equivalent if they differ only in the languages \emptyset or {λ}\{\lambda\}. Therefore, the set 𝑆𝑇𝐴𝑅\mathit{STAR} of all star languages is sometimes regarded as a proper subset of the set 𝐶𝑂𝑀\mathit{COM} of all (left-, right-, or two-sided) comet languages although {λ}\{\lambda\} belongs to the family 𝑆𝑇𝐴𝑅\mathit{STAR} but not to 𝐿𝐶𝑂𝑀\mathit{LCOM}, 𝑅𝐶𝑂𝑀\mathit{RCOM} or 2COM\mathit{2COM}. We regard 𝑆𝑇𝐴𝑅\mathit{STAR} and 𝑆𝑇𝐴𝑅{{λ}}\mathit{STAR}\setminus\{\{\lambda\}\} as different. Then, the family 𝑆𝑇𝐴𝑅\mathit{STAR} is incomparable to 𝐿𝐶𝑂𝑀\mathit{LCOM}, 𝑅𝐶𝑂𝑀\mathit{RCOM}, and 2COM\mathit{2COM}, as we will later show.

For space reasons, we give the following observation without a proof.

Lemma 8

Whenever a language LL is a right-sided comet then its reversal LRL^{R} is a left-sided comet language and vice versa.

Corollary 9

We have 𝐿𝐶𝑂𝑀={LRL𝑅𝐶𝑂𝑀}\mathit{LCOM}=\{\;L^{R}\mid L\in\mathit{RCOM}\;\} and 𝑅𝐶𝑂𝑀={LRL𝐿𝐶𝑂𝑀}\mathit{RCOM}=\{\;L^{R}\mid L\in\mathit{LCOM}\;\}.

We now present some languages which will serve later as witness languages for proper inclusions or incomparabilities.

Lemma 10

The language L={λ}L=\{\lambda\} is in 𝑆𝑇𝐴𝑅2COM.\mathit{STAR}\setminus\mathit{2COM}.

Proof.

The language LL is a star language since L=HL=H^{*} with H={λ}H=\{\lambda\}. According to Lemma 2, a two-sided comet language is either infinite or the empty language. Hence, LL is not a two-sided comet. ∎

Lemma 11

Let L={a2nn0}L=\{\;a^{2n}\mid n\geq 0\;\}. Then, it holds L(𝑆𝑇𝐴𝑅𝐿𝐶𝑂𝑀𝑅𝐶𝑂𝑀)𝑃𝑆L\in(\mathit{STAR}\cap\mathit{LCOM}\cap\mathit{RCOM})\setminus\mathit{PS}.

Proof.

Let G={aa}G=\{aa\} and E=H={λ}E=H=\{\lambda\}. The language LL can be expressed as L=G=EG=GHL=G^{*}=EG^{*}=G^{*}H. Therefore, L𝑆𝑇𝐴𝑅𝐿𝐶𝑂𝑀𝑅𝐶𝑂𝑀L\in\mathit{STAR}\cap\mathit{LCOM}\cap\mathit{RCOM}.

Assume that L𝑃𝑆L\in\mathit{PS}. Then, there is a natural number m1m\geq 1 such that, for any word x{a}x\in\{a\}^{*}, either JxmL=J_{x}^{m}\cap L=\emptyset or JxmLJ_{x}^{m}\subseteq L where Jxm={xnnm}J_{x}^{m}=\{\;x^{n}\mid n\geq m\;\}. For any natural number m1m\geq 1, we have with the word x=ax=a the set Jam={annm}J_{a}^{m}=\{\;a^{n}\mid n\geq m\;\}. Since a2mJamLa^{2m}\in J_{a}^{m}\cap L, the intersection is not empty. But, since a2m+1JamLa^{2m+1}\in J_{a}^{m}\setminus L, it neither holds JamLJ_{a}^{m}\subseteq L. Hence, the language LL is not power-separating. ∎

Lemma 12

Let L={ab}L=\{ab\}^{*}. Then, it holds L(𝑆𝑇𝐴𝑅𝐿𝐶𝑂𝑀𝑅𝐶𝑂𝑀)𝐶𝐼𝑅𝐶.L\in(\mathit{STAR}\cap\mathit{LCOM}\cap\mathit{RCOM})\setminus\mathit{CIRC}.

Proof.

Let G={ab}G=\{ab\} and E=H={λ}E=H=\{\lambda\}. The language LL can be expressed as L=G=EG=GHL=G^{*}=EG^{*}=G^{*}H. Therefore, L𝑆𝑇𝐴𝑅𝐿𝐶𝑂𝑀𝑅𝐶𝑂𝑀L\in\mathit{STAR}\cap\mathit{LCOM}\cap\mathit{RCOM}.

Assume that the language LL is circular. Then, the word baba would belong to it because abLab\in L but it does not. Hence, L𝐶𝐼𝑅𝐶L\notin\mathit{CIRC}. ∎

Lemma 13 ([20])

Let V={a,b}V=\{a,b\} be an alphabet, H={ba}{b}({aa}{b})H=\{ba\}\{b\}^{*}(\{aa\}\{b\}^{*})^{*} a regular language over VV, and L=VHL=V^{*}H. Then, L𝑆𝑌𝐷𝐸𝐹𝑆𝐹L\in\mathit{SYDEF}\setminus\mathit{SF}.

Proof.

The language LL can be represented as {λ}VH\{\lambda\}V^{*}H. So, the language is symmetric definite. As shown in [20], the language is not star-free. ∎

Lemma 14

Let L1={anbn0}L_{1}=\{\;a^{n}b\mid n\geq 0\;\} and L2=L1RL_{2}=L_{1}^{R}. Then, L1𝑅𝐶𝑂𝑀𝐿𝐶𝑂𝑀L_{1}\in\mathit{RCOM}\setminus\mathit{LCOM} and L2𝐿𝐶𝑂𝑀𝑅𝐶𝑂𝑀L_{2}\in\mathit{LCOM}\setminus\mathit{RCOM}.

Proof.

The language L1L_{1} can be expressed as {a}{b}\{a\}^{*}\{b\}, hence, in the form L1=GHL_{1}=G^{*}H with G={a}G=\{a\} and H={b}H=\{b\}. Thus, L1𝑅𝐶𝑂𝑀L_{1}\in\mathit{RCOM}.

Assume that L1𝐿𝐶𝑂𝑀L_{1}\in\mathit{LCOM}. Then, two languages EE and II would exist such that L1=EIL_{1}=EI^{*}. Since bb is a suffix of every word in L1L_{1}, the letter bb is also a suffix of a word in II. But then L1L_{1} would also contain a word with more than one bb which is a contradiction. Hence, L1𝐿𝐶𝑂𝑀L_{1}\notin\mathit{LCOM}.

By Corollary 9, it follows that L2𝐿𝐶𝑂𝑀𝑅𝐶𝑂𝑀L_{2}\in\mathit{LCOM}\setminus\mathit{RCOM}. ∎

Lemma 15

The language L={λ,a}L=\{\lambda,a\} belongs to the set (𝐹𝐼𝑁𝑆𝑈𝐹𝐶𝑂𝑀𝑀)(𝑆𝑇𝐴𝑅2COM)(\mathit{FIN}\cap\mathit{SUF}\cap\mathit{COMM})\setminus(\mathit{STAR}\cup\mathit{2COM}).

Proof.

All suffixes of all words of the language LL belong to LL. Thus, LL is suffix-closed. Furthermore, the language is finite but not empty and commutative. According to Lemma 2, each two-sided comet language is either empty or infinite. Hence, LL is not a two-sided comet language. According to Lemma 3, each star language is either infinite or contains only the empty word. Hence, LL is not a star language either. ∎

We now prove some proper inclusions.

Lemma 16

We have the proper inclusions 𝑀𝑂𝑁𝑆𝑇𝐴𝑅𝑈𝐹\mathit{MON}\subset\mathit{STAR}\subset\mathit{UF}.

Proof.

We first prove the relation 𝑀𝑂𝑁𝑆𝑇𝐴𝑅\mathit{MON}\subset\mathit{STAR}: Any monoidal language can be expressed as L=VL=V^{*} for some alphabet VV. Since VV is a regular language, LL is a star language. A witness language for the properness is the language L={a2nn0}L=\{\;a^{2n}\mid n\geq 0\;\} as shown in Lemma 11.

We now prove the relation 𝑆𝑇𝐴𝑅𝑈𝐹\mathit{STAR}\subset\mathit{UF}: Every language HH^{*} for some regular language HH is union-free according to [19]. A witness language for the properness is L={a}L=\{a\} which is union-free but, according to Lemma 3, not a star language since it is neither infinite nor equal to {λ}\{\lambda\}. ∎

Lemma 17

We have the proper inclusions 𝑀𝑂𝑁𝑆𝑌𝐷𝐸𝐹𝒞2COM\mathit{MON}\subset\mathit{SYDEF}\subset{\cal C}\subset\mathit{2COM} for 𝒞{𝐿𝐶𝑂𝑀,𝑅𝐶𝑂𝑀}{\cal C}\in\{\mathit{LCOM},\mathit{RCOM}\}.

Proof.

  1. 1.

    𝑀𝑂𝑁𝑆𝑌𝐷𝐸𝐹\mathit{MON}\subset\mathit{SYDEF}: Any monoidal language can be expressed as L=VL=V^{*} for some alphabet VV and, with E=H={λ}E=H=\{\lambda\} also in the form EVHEV^{*}H. Hence, the language LL is symmetric definite. A witness language for the properness is {a,b}{ba}{b}({aa}{b})\{a,b\}^{*}\{ba\}\{b\}^{*}(\{aa\}\{b\}^{*})^{*} from Lemma 13 (and originally [20]).

  2. 2.

    𝑆𝑌𝐷𝐸𝐹𝑅𝐶𝑂𝑀\mathit{SYDEF}\subset\mathit{RCOM}: This relation was proved in [20].

  3. 3.

    𝑆𝑌𝐷𝐸𝐹𝐿𝐶𝑂𝑀\mathit{SYDEF}\subset\mathit{LCOM}: The family 𝑆𝑌𝐷𝐸𝐹\mathit{SYDEF} is closed under reversal. For any symmetric definite language LL, its reversal LRL^{R} also belongs to the family 𝑆𝑌𝐷𝐸𝐹\mathit{SYDEF} and, by [20], is also a right-sided comet language. By Lemma 8, the reversal of the language LRL^{R}, hence LL itself, is a left-sided comet language. A witness language for the properness is the language L={a2nn0}L=\{\;a^{2n}\mid n\geq 0\;\} according to Lemma 11 where it is shown that L𝐿𝐶𝑂𝑀𝑃𝑆L\in\mathit{LCOM}\setminus\mathit{PS} and according to [20] where the inclusion 𝑆𝑌𝐷𝐸𝐹𝑃𝑆\mathit{SYDEF}\subset\mathit{PS} is proved.

  4. 4.

    𝑅𝐶𝑂𝑀2COM\mathit{RCOM}\subset\mathit{2COM}: This relation was proved in [20].

  5. 5.

    𝐿𝐶𝑂𝑀2COM\mathit{LCOM}\subset\mathit{2COM}: Any left-sided comet language L=EGL=EG^{*} is also a two-sided comet EGHEG^{*}H with H={λ}H=\{\lambda\}. In Lemma 14, it was shown that the language L={anbn0}L=\{\;a^{n}b\mid n\geq 0\;\} is a right-sided comet language but not a left-sided comet. By [20], it is a two-sided comet language.\Box

We now prove the incomparability relations mentioned in Figure 1 which have not been proved earlier. These are the relations regarding the families 𝑆𝑇𝐴𝑅\mathit{STAR}, 𝑆𝑌𝐷𝐸𝐹\mathit{SYDEF}, 𝐿𝐶𝑂𝑀\mathit{LCOM}, 𝑅𝐶𝑂𝑀\mathit{RCOM}, and 2COM\mathit{2COM}.

Lemma 18

Each of the families 𝑆𝑇𝐴𝑅\mathit{STAR} and 𝑈𝐹\mathit{UF} is incomparable to each of the families 𝐶𝑂𝑀𝐵\mathit{COMB}, 𝑆𝑌𝐷𝐸𝐹\mathit{SYDEF}, 𝑅𝐶𝑂𝑀\mathit{RCOM}, 𝐿𝐶𝑂𝑀\mathit{LCOM}, and 2COM\mathit{2COM}.

Proof.

Due to inclusion relations, it suffices to show that there are a language L1𝑆𝑇𝐴𝑅2COML_{1}\in\mathit{STAR}\setminus\mathit{2COM} and a language L2𝐶𝑂𝑀𝐵𝑈𝐹L_{2}\in\mathit{COMB}\setminus\mathit{UF}. From Lemma 10, we get L1={λ}L_{1}=\{\lambda\}. From [15], we take L2={a,b,c}{a,b}L_{2}=\{a,b,c\}^{*}\{a,b\}. ∎

Lemma 19

The language family 𝑆𝑇𝐴𝑅\mathit{STAR} is incomparable to each of the families 𝐹𝐼𝑁\mathit{FIN}, 𝑁𝐼𝐿\mathit{NIL}, 𝐷𝐸𝐹\mathit{DEF}, 𝑂𝑅𝐷\mathit{ORD}, 𝑁𝐶\mathit{NC}, 𝑆𝐹\mathit{SF}, 𝑃𝑆\mathit{PS}, and 𝑆𝑈𝐹\mathit{SUF}.

Proof.

Due to inclusion relations, it suffices to show that there are a language L1𝑆𝑇𝐴𝑅𝑃𝑆L_{1}\in\mathit{STAR}\setminus\mathit{PS}, a language L2𝐹𝐼𝑁𝑆𝑇𝐴𝑅L_{2}\in\mathit{FIN}\setminus\mathit{STAR}, and a language L3𝑆𝑈𝐹𝑆𝑇𝐴𝑅L_{3}\in\mathit{SUF}\setminus\mathit{STAR}. As L1L_{1}, we obtain from Lemma 11 the language L1={a2nn0}L_{1}=\{\;a^{2n}\mid n\geq 0\;\}. From Lemma 15, we take L2=L3={λ,a}L_{2}=L_{3}=\{\lambda,a\}. ∎

Lemma 20

The language family 𝑆𝑇𝐴𝑅\mathit{STAR} is incomparable to the families 𝐶𝐼𝑅𝐶\mathit{CIRC} and 𝐶𝑂𝑀𝑀\mathit{COMM}.

Proof.

Due to inclusion relations, it suffices to show that there are a language L1𝑆𝑇𝐴𝑅𝐶𝐼𝑅𝐶L_{1}\in\mathit{STAR}\setminus\mathit{CIRC} and a language L2𝐶𝑂𝑀𝑀𝑆𝑇𝐴𝑅L_{2}\in\mathit{COMM}\setminus\mathit{STAR}. From Lemma 12, we have L1={ab}L_{1}=\{ab\}^{*}. From Lemma 15, we take again the language L2={λ,a}L_{2}=\{\lambda,a\}. ∎

Lemma 21

The language families 𝐿𝐶𝑂𝑀\mathit{LCOM} and 𝑅𝐶𝑂𝑀\mathit{RCOM} are incomparable to each other.

Proof.

With the witness languages L1={anbn0}𝑅𝐶𝑂𝑀𝐿𝐶𝑂𝑀L_{1}=\{\;a^{n}b\mid n\geq 0\;\}\in\mathit{RCOM}\setminus\mathit{LCOM} and L2=L1R𝐿𝐶𝑂𝑀𝑅𝐶𝑂𝑀L_{2}=L_{1}^{R}\in\mathit{LCOM}\setminus\mathit{RCOM}, the statement follows from Lemma 14. ∎

Lemma 22

The language families 𝑆𝑌𝐷𝐸𝐹\mathit{SYDEF}, 𝐿𝐶𝑂𝑀\mathit{LCOM}, 𝑅𝐶𝑂𝑀\mathit{RCOM}, and 2COM\mathit{2COM} are incomparable to each of the families 𝐹𝐼𝑁\mathit{FIN}, 𝑁𝐼𝐿\mathit{NIL}, 𝐷𝐸𝐹\mathit{DEF}, 𝑂𝑅𝐷\mathit{ORD}, 𝑁𝐶\mathit{NC}, and 𝑆𝐹\mathit{SF}.

Proof.

Due to inclusion relations, it suffices to show that there are a language L1𝑆𝑌𝐷𝐸𝐹𝑆𝐹L_{1}\in\mathit{SYDEF}\setminus\mathit{SF} and a language L2𝐹𝐼𝑁2COML_{2}\in\mathit{FIN}\setminus\mathit{2COM}. From Lemma 13 (and previously [20]), for the first language, we obtain the language L1={a,b}{ba}{b}({aa}{b})L_{1}=\{a,b\}^{*}\{ba\}\{b\}^{*}(\{aa\}\{b\}^{*})^{*}. From Lemma 15, we take the language L2={λ,a}L_{2}=\{\lambda,a\}. ∎

Lemma 23

The language families 𝐿𝐶𝑂𝑀\mathit{LCOM}, 𝑅𝐶𝑂𝑀\mathit{RCOM}, and 2COM\mathit{2COM} are incomparable to the family 𝑃𝑆\mathit{PS}.

Proof.

Due to inclusion relations, it suffices to show that there are a language L1𝐿𝐶𝑂𝑀𝑃𝑆L_{1}\in\mathit{LCOM}\setminus\mathit{PS} and a language L2𝑃𝑆2COML_{2}\in\mathit{PS}\setminus\mathit{2COM}. The property of (non) power-separating is not influenced by the reversal operation. If there is a language L1𝐿𝐶𝑂𝑀𝑃𝑆L_{1}\in\mathit{LCOM}\setminus\mathit{PS}, then there is also a language in the set 𝑅𝐶𝑂𝑀𝑃𝑆\mathit{RCOM}\setminus\mathit{PS}, namely L1RL_{1}^{R}. From Lemma 11, we have L1={a2nn0}(𝐿𝐶𝑂𝑀𝑅𝐶𝑂𝑀)𝑃𝑆L_{1}=\{\;a^{2n}\mid n\geq 0\;\}\in(\mathit{LCOM}\cap\mathit{RCOM})\setminus\mathit{PS}. As language L2L_{2}, we take again the language L2={λ,a}L_{2}=\{\lambda,a\} from Lemma 15. ∎

Lemma 24

The language families 𝑆𝑌𝐷𝐸𝐹\mathit{SYDEF}, 𝐿𝐶𝑂𝑀\mathit{LCOM}, 𝑅𝐶𝑂𝑀\mathit{RCOM}, and 2COM\mathit{2COM} are incomparable to each of the families 𝑆𝑈𝐹\mathit{SUF}, 𝐶𝐼𝑅𝐶\mathit{CIRC} and 𝐶𝑂𝑀𝑀\mathit{COMM}.

Proof.

Due to inclusion relations, it suffices to show that there are a language L1𝑆𝑌𝐷𝐸𝐹𝑆𝑈𝐹L_{1}\in\mathit{SYDEF}\setminus\mathit{SUF}, a language L2𝑆𝑌𝐷𝐸𝐹𝐶𝐼𝑅𝐶L_{2}\in\mathit{SYDEF}\setminus\mathit{CIRC}, a language L3𝑆𝑈𝐹2COML_{3}\in\mathit{SUF}\setminus\mathit{2COM}, and a language L4𝐶𝑂𝑀𝑀2COML_{4}\in\mathit{COMM}\setminus\mathit{2COM}. In [15], it was shown that the families 𝐶𝑂𝑀𝐵\mathit{COMB} and 𝑆𝑈𝐹\mathit{SUF} are disjoint. Since 𝐶𝑂𝑀𝐵𝑆𝑌𝐷𝐸𝐹\mathit{COMB}\subseteq\mathit{SYDEF}, we can take any combinational language as L1L_{1}, for instance, L1={a,b}{b}L_{1}=\{a,b\}^{*}\{b\}. The same language serves as L2L_{2} because it is not circular. From Lemma 15, we take again the language {λ,a}\{\lambda,a\} as L3L_{3} and L4L_{4}. ∎

From all these relations, the hierachy presented in Figure 1 follows.

Theorem 25 (Resulting hierarchy)

The inclusion relations presented in Figure 1 hold. An arrow from an entry XX to an entry YY depicts the proper inclusion XYX\subset Y; if two families are not connected by a directed path, then they are incomparable.

Proof.

An edge label refers to the paper or lemma in the present paper where the proper inclusion is shown. The incomparability results are proved in Lemmas 18 to 24. ∎

4 Results on subregular control in external contextual grammars

In this section, we include the families of languages generated by external contextual grammars with selection languages from the subregular families under investigation into the existing hierarchy with respect to external contextual grammars.

If, in a contextual grammar, all selection languages belong to some language family XX, then they belong also to every super set YY of XX. Therefore, each language in 𝒞(X)\mathcal{EC}(X) is also generated by a contextual grammar with selection languages from YY and we have the following monotonicity.

Lemma 26

For any two language classes XX and YY with XYX\subseteq Y, we have the inclusion 𝒞(X)𝒞(Y){\cal EC}(X)\subseteq{\cal EC}(Y).

Figure 2 shows a hierarchy of some language families which are generated by external contextual grammars where the selection languages belong to subregular classes investigated before. The hierarchy contains results which were already known (marked by a reference to the literature) and results which will be proved in this section (marked by a number which refers to the respective lemma).

𝒞(𝑀𝑂𝑁)\mathcal{EC}(\mathit{MON})𝒞(𝐹𝐼𝑁)\mathcal{EC}(\mathit{FIN})𝒞(𝐶𝑂𝑀𝐵)\mathcal{EC}(\mathit{COMB})𝒞(𝑁𝐼𝐿)\mathcal{EC}(\mathit{NIL})𝒞(𝐷𝐸𝐹)\mathcal{EC}(\mathit{DEF})𝒞(𝑂𝑅𝐷)\mathcal{EC}(\mathit{ORD})𝒞(𝑆𝑌𝐷𝐸𝐹)\mathcal{EC}(\mathit{SYDEF})𝒞(𝑆𝑈𝐹)\mathcal{EC}(\mathit{SUF})𝒞(𝐶𝑂𝑀𝑀)\mathcal{EC}(\mathit{COMM})𝒞(𝑆𝑇𝐴𝑅)\mathcal{EC}(\mathit{STAR})𝒞(𝑁𝐶)\mathcal{EC}(\mathit{NC})𝒞(𝑃𝑆)\mathcal{EC}(\mathit{PS})𝒞(𝐶𝐼𝑅𝐶)\mathcal{EC}(\mathit{CIRC})𝒞(𝑅𝐸𝐺)=[9]𝒞(𝑈𝐹)=35𝒞(𝐿𝐶𝑂𝑀)=35𝒞(𝑅𝐶𝑂𝑀)=35𝒞(2COM)\mathcal{EC}(\mathit{REG})\stackrel{{\scriptstyle\text{\cite[cite]{[\@@bibref{}{Dassow_Manea_Truthe.2012}{}{}]}}}}{{=}}\mathcal{EC}(\mathit{UF})\stackrel{{\scriptstyle\text{\ref{lemma:ec:reg_eq_com}}}}{{=}}{\cal EC}(\mathit{LCOM})\stackrel{{\scriptstyle\text{\ref{lemma:ec:reg_eq_com}}}}{{=}}{\cal EC}(\mathit{RCOM})\stackrel{{\scriptstyle\text{\ref{lemma:ec:reg_eq_com}}}}{{=}}{\cal EC}(\mathit{2COM})[7][7][8][29][27]39[10][29][7][29]363840[7][7][9][9][29]
Figure 2: Resulting hierarchy of language families by external contextual grammars with special selection languages

An arrow from a node XX to a node YY stands for the proper inclusion XYX\subset Y. If two families are not connected by a directed path, then they are incomparable. An edge label refers to the paper where the proper inclusion has been shown or the lemma of this paper where the proper inclusion will be shown.

We now present some languages which will serve later as witness languages for proper inclusions or incomparabilities. Due to space limitations, we give only proof sketches in some cases where we believe that the reader finds the idea feasible.

Lemma 27

Let L={anbbbn1}{λ}L=\{\;a^{n}bbb\mid n\geq 1\;\}\cup\{\lambda\}. Then, it holds L𝒞(𝑁𝐼𝐿)𝒞(𝑆𝑇𝐴𝑅)L\in\mathcal{EC}(\mathit{NIL})\setminus\mathcal{EC}(\mathit{STAR}).

Proof.

The contextual grammar G=({a,b},{{a,b}{a,b}4(a,λ)},{abbb,λ})G=(\{a,b\},\{\{a,b\}^{*}\{a,b\}^{4}\to(a,\lambda)\},\{abbb,\lambda\}) generates LL.

During the derivation, the number of the letter aa is increasing without changing the number of bb. If the selection languages are from 𝑆𝑇𝐴𝑅\mathit{STAR}, then such a context containing letters aa only could be wrapped around the empty word yielding a word without bb which is a contradiction. ∎

Lemma 28

Let L={bnan0}{λ}L=\{\;b^{n}a\mid n\geq 0\;\}\cup\{\lambda\}. Then, it holds L𝒞(𝐶𝑂𝑀𝐵)𝒞(𝑆𝑇𝐴𝑅)L\in\mathcal{EC}(\mathit{COMB})\setminus\mathcal{EC}(\mathit{STAR}).

Proof.

The contextual grammar G=({a,b},{{a,b}{a}(b,λ)},{λ,a})G=(\{a,b\},\{\{a,b\}^{*}\{a\}\to(b,\lambda)\},\{\lambda,a\}) generates the language LL and the selection language is combinational.

Similarly to the proof before: With star selection languages, a word with the letter bb but without aa could be generated. ∎

Lemma 29

Let L1={a,b}{anbmn1,m1}L_{1}=\{a,b\}^{*}\{\;a^{n}b^{m}\mid n\geq 1,\ m\geq 1\;\}, L2={canbmcn1,m1}L_{2}=\{\;ca^{n}b^{m}c\mid n\geq 1,\ m\geq 1\;\}, and L=L1L2L=L_{1}\cup L_{2}. Then, it holds L𝒞(𝑆𝑈𝐹)𝒞(𝑆𝑇𝐴𝑅)L\in\mathcal{EC}(\mathit{SUF})\setminus\mathcal{EC}(\mathit{STAR}).

Proof.

It holds L=L(G)L=L(G) for the contextual grammar G=({a,b,c},{(S1,C1),(S2,C2)},{ab})G=(\{a,b,c\},\{(S_{1},C_{1}),(S_{2},C_{2})\},\{ab\}) with

S1={a,b},C1={(a,λ),(b,λ),(λ,b)},S2={anbmn0,m1}{λ},C2={(c,c)}.S_{1}=\{a,b\}^{*},\quad C_{1}=\{(a,\lambda),(b,\lambda),(\lambda,b)\},\qquad S_{2}=\{\;a^{n}b^{m}\mid n\geq 0,\ m\geq 1\;\}\cup\{\lambda\},\quad C_{2}=\{(c,c)\}.

Using star selection languages, the two letters cc could be wrapped around a word with more than one aa-to-bb-change from L1L_{1} which would yield a word not belonging to LL. ∎

Lemma 30

Let L1={ann2}L_{1}=\{\;a^{n}\mid n\geq 2\;\} and L2={ba2nbn1}L_{2}=\{\;ba^{2n}b\mid n\geq 1\;\} be two languages and L=L1L2L=L_{1}\cup L_{2} its union. Then, the relation L𝒞(STAR)𝒞(PS)L\in\mathcal{EC}(STAR)\setminus\mathcal{EC}(PS) holds.

Proof.

It holds L=L(G)L=L(G) for the contextual grammar G=({a,b},{(S1,C1),(S2,C2)},{aa})G=(\{a,b\},\{(S_{1},C_{1}),(S_{2},C_{2})\},\{aa\}) with

S1={ann0},C1={(λ,a)}andS2={a2nn0},C2={(b,b)}.S_{1}=\{\;a^{n}\mid n\geq 0\;\},\quad C_{1}=\{(\lambda,a)\}\quad\mbox{and}\quad S_{2}=\{\;a^{2n}\mid n\geq 0\;\},\quad C_{2}=\{(b,b)\}.

Now assume that L𝒞(𝑃𝑆)L\in\mathcal{EC}(\mathit{PS}). Then, L=L(G)L=L(G^{\prime}) for a contextual grammar GG^{\prime} where every selection language is power-separating.

For every selection language (since it is power-separating), there is a number mSm_{S}\in\mathbb{N} such that, for every word x{a,b}x\in\{a,b\}^{*}, either JxmSS=J_{x}^{m_{S}}\cap S=\emptyset or JxmSSJ_{x}^{m_{S}}\subseteq S with JxmS={xnnmS}J_{x}^{m_{S}}=\{\;x^{n}\mid n\geq m_{S}\;\}. Let mSm_{S} be the minimum of these numbers for SS and let mm be the maximum of all the values mSm_{S} for a selection language SS.

Further, let p=m+(G)p=m+\ell(G^{\prime}). Then, we have the following statement for every selection language SS: For each word x{a,b}x\in\{a,b\}^{*}, it is

either JxpS= or JxpS\text{either }J_{x}^{p}\cap S=\emptyset\text{ or }J_{x}^{p}\subseteq S (1)

where Jxp={xnnp}J_{x}^{p}=\{\;x^{n}\mid n\geq p\;\}.

The language L2L_{2} contains words with an arbitrary even number of letters aa and a letter bb at each end. Hence, there is a derivation w0w1uw1vw_{0}\Longrightarrow^{*}w_{1}\Longrightarrow uw_{1}v with w0Aw_{0}\in A, |w1|a>p|w_{1}|_{a}>p, |w1|b=0|w_{1}|_{b}=0, and |uv|b>0|uv|_{b}>0. This implies w1=akw_{1}=a^{k} with k>pk>p.

Let SS be the selection language used in the last derivation step. Then, we have akSa^{k}\in S and, with property (1), also ak+1Sa^{k+1}\in S. Since ak+1a^{k+1} belongs to L1L_{1} and therefore also to LL, the last derivation step can also be applied to ak+1a^{k+1} which yields the word uak+1vua^{k+1}v. Since |uv|b>0|uv|_{b}>0, the word uak+1vua^{k+1}v belongs at most to L2L_{2}. Since uakvL1ua^{k}v\in L_{1}, we know that |uakv|a|ua^{k}v|_{a} is an even number and |uak+1v|a|ua^{k+1}v|_{a} is an odd number. Therefore, the word uak+1vua^{k+1}v does not belong to L2L_{2} and neither to LL which is a contradiction to L=L(G)L=L(G^{\prime}). Thus, we conclude L𝒞(𝑃𝑆)L\notin{\cal EC}(\mathit{PS}). ∎

Lemma 31

Let L={anbnn1}{bnann1}L=\{\;a^{n}b^{n}\mid n\geq 1\;\}\cup\{\;b^{n}a^{n}\mid n\geq 1\;\}. Then, it holds L𝒞(𝑆𝑇𝐴𝑅)𝒞(𝐶𝐼𝑅𝐶)L\in\mathcal{EC}(\mathit{STAR})\setminus\mathcal{EC}(\mathit{CIRC}).

Proof.

It holds L=L(G)L=L(G) for the contextual grammar G=({a,b},{(S1,C1),(S2,C2)},{ab,ba})G=(\{a,b\},\{(S_{1},C_{1}),(S_{2},C_{2})\},\{ab,ba\}) with

S1={anbmn1,m1},C1={(a,b)}andS2={bnamn1,m1},C2={(b,a)}.S_{1}=\{\;a^{n}b^{m}\mid n\geq 1,\ m\geq 1\;\}^{*},\quad C_{1}=\{(a,b)\}\quad\mbox{and}\quad S_{2}=\{\;b^{n}a^{m}\mid n\geq 1,\ m\geq 1\;\}^{*},\quad C_{2}=\{(b,a)\}.

With circular selection languages, a context (ak,bk)(a^{k},b^{k}) could be wrapped around a word bmamb^{m}a^{m} yielding a word which does not belong to the language LL. ∎

Lemma 32

The language L={a,b}{c}{ab}{c}L=\{a,b\}^{*}\cup\{c\}\{ab\}^{*}\{c\} belongs to the set 𝒞(𝑂𝑅𝐷)𝒞(𝑆𝑌𝐷𝐸𝐹)\mathcal{EC}(\mathit{ORD})\setminus\mathcal{EC}(\mathit{SYDEF}).

Proof.

In Example 1, we have given a contextual grammar where all selection languages are accepted by ordered finite automata, and thus, have shown that L𝒞(𝑂𝑅𝐷)L\in{\cal EC}(\mathit{ORD}).

Suppose that the language LL is also generated by a contextual grammar GG^{\prime} where all selection languages are symmetric definite.

Let us consider a word w=c(ab)ncLw=c(ab)^{n}c\in L for some n(G)n\geq\ell(G^{\prime}). Due to the choice of nn, the word ww is derived in one step from some word zz by using a selection language SS and context (u,v)(u,v): zuzv=wz\Longrightarrow uzv=w. The word uu begins with the letter cc; the word vv ends with cc. Due to the choice of nn, we also have |z|a>0|z|_{a}>0 and |z|b>0|z|_{b}>0. Since SS is symmetric definite over the alphabet V={a,b}V=\{a,b\}, it can be expressed as S=EVHS=EV^{*}H for some regular languages EE and HH over VV. The sets EE and HH are not empty because SS contains at least the word zz. Let ee be a word of EE and hh a word of HH. Then, the word ebbhebbh belongs to the selection language SS as well. Since ebbh{a,b}ebbh\in\{a,b\}^{*} and {a,b}L\{a,b\}^{*}\subseteq L, we can apply the same derivation to this word and obtain uebbhvuebbhv. This word starts and ends with cc but it does not have the form of those words from LL because of the double bb. From this contradiction, it follows L𝒞(𝑆𝑌𝐷𝐸𝐹)L\notin{\cal EC}(\mathit{SYDEF}). ∎

Lemma 33

The language L={a,b}{c}{λ,b}{ab}{c}L=\{a,b\}^{*}\cup\{c\}\{\lambda,b\}\{ab\}^{*}\{c\} belongs to 𝒞(𝑆𝑈𝐹)𝒞(𝑆𝑌𝐷𝐸𝐹)\mathcal{EC}(\mathit{SUF})\setminus\mathcal{EC}(\mathit{SYDEF}).

Proof.

The language LL is generated by the contextual grammar G=({a,b,c},{(S1,C1),(S2,C2)},{λ})G=(\{a,b,c\},\{(S_{1},C_{1}),(S_{2},C_{2})\},\{\lambda\}) with

S1={a,b},C1={(λ,a),(λ,b)}andS2=𝑆𝑢𝑓({ab}),C2={(c,c)}S_{1}=\{a,b\}^{*},\quad C_{1}=\{(\lambda,a),(\lambda,b)\}\quad\mbox{and}\quad S_{2}=\mathit{Suf}(\{ab\}^{*}),\quad C_{2}=\{(c,c)\}

where 𝑆𝑢𝑓(M)\mathit{Suf}(M) denotes the suffix-closure of the set MM.

With the same argumentation as in the proof of Lemma 32, one can show also here L𝒞(𝑆𝑌𝐷𝐸𝐹)L\notin{\cal EC}(\mathit{SYDEF}) (the letters cc are in both cases wrapped around words which are an alternating sequence of aa and bb what cannot be checked by a symmetric definite selection language). ∎

Lemma 34

Let L1={ann1}L_{1}=\{\;a^{n}\mid n\geq 1\;\}, L2={banbn1}L_{2}=\{\;ba^{n}b\mid n\geq 1\;\}, L3={cba2nbcn1}L_{3}=\{\;cba^{2n}bc\mid n\geq 1\;\}, and L=L1L2L3L=L_{1}\cup L_{2}\cup L_{3}. Then, it holds L𝒞(𝑆𝑌𝐷𝐸𝐹)𝒞(𝑁𝐶)L\in\mathcal{EC}(\mathit{SYDEF})\setminus\mathcal{EC}(\mathit{NC}).

Proof.

Let V={a,b,c}V=\{a,b,c\}. The contextual grammar G=(V,{(S1,C1),(S2,C2)},{a})G=(V,\{(S_{1},C_{1}),(S_{2},C_{2})\},\{a\}) with

S1={a}V{λ},C1={(λ,a),(b,b)}andS2={ba2mbm1}V{λ},C2={(c,c)}S_{1}=\{a\}V^{*}\{\lambda\},\quad C_{1}=\{(\lambda,a),(b,b)\}\quad\mbox{and}\quad S_{2}=\{\;ba^{2m}b\mid m\geq 1\;\}V^{*}\{\lambda\},\quad C_{2}=\{(c,c)\}

generates the language LL. This can be seen as follows: The shortest word of LL is aa which is the axiom. To every word of LL starting with the letter aa (hence, any word of L1L_{1}), another aa can be added or the letter bb is added at the beginning and the end of the word (using the first selection component) yielding all and only words of the languages L1L_{1} and L2L_{2}. To every word of L2L_{2} which also belongs to S2S_{2}, the letter cc is added at the beginning and the end of the word (using the second selection component) yielding exactly the words of the language L3L_{3}. To the words of L3L_{3}, no selection component can be applied. All the selection languages are symmetric definite as can be seen from the form in which they are given.

In [29], it was proved that the language LL does not belong to the family 𝒞(𝑁𝐶){\cal EC}(\mathit{NC}). ∎

Next, we show some equalities.

Lemma 35

A restriction to comet languages (left, right, two-sided) as selection languages does not decrease the generative capacity of external contextual grammars:

𝒞(𝑅𝐸𝐺)=𝒞(𝐿𝐶𝑂𝑀)=𝒞(𝑅𝐶𝑂𝑀)=𝒞(2COM).\mathcal{EC}(\mathit{REG})=\mathcal{EC}(\mathit{LCOM})=\mathcal{EC}(\mathit{RCOM})=\mathcal{EC}(\mathit{2COM}).
Proof.

With the inclusions 𝐿𝐶𝑂𝑀2COM\mathit{LCOM}\subseteq\mathit{2COM}, 𝑅𝐶𝑂𝑀2COM\mathit{RCOM}\subseteq\mathit{2COM}, and 2COM𝑅𝐸𝐺\mathit{2COM}\subseteq\mathit{REG} (see Theorem 25 and Figure 1), we obtain also the inclusions 𝒞(𝐿𝐶𝑂𝑀)𝒞(2COM){\cal EC}(\mathit{LCOM})\subseteq{\cal EC}(\mathit{2COM}), 𝒞(𝑅𝐶𝑂𝑀)𝒞(2COM){\cal EC}(\mathit{RCOM})\subseteq{\cal EC}(\mathit{2COM}), and 𝒞(2COM)𝒞(𝑅𝐸𝐺){\cal EC}(\mathit{2COM})\subseteq{\cal EC}(\mathit{REG}) according to Lemma 26.

Let G=(V,{(S1,C1),,(Sn,Cn)},A)G=(V,\{(S_{1},C_{1}),\dots,(S_{n},C_{n})\},A) be a contextual grammar with arbitrary regular selection languages. Further, let XX be a new symbol (XVX\notin V). We set Si={X}SiS_{i}^{\prime}=\{X\}^{*}S_{i} for 1in1\leq i\leq n. Then, the contextual grammar G=(V{X},{(S1,C1),,(Sn,Cn)},A)G^{\prime}=(V\cup\{X\},\{(S_{1}^{\prime},C_{1}),\dots,(S_{n}^{\prime},C_{n})\},A) generates the same language as GG. The selection languages are all right-sided comet languages. The letter XX neither occurs in an axiom nor in a context. Therefore, the part {X}\{X\}^{*} of the selection languages has no impact on the possible derivations (the only word used is λ\lambda). Thus, the inclusion 𝒞(𝑅𝐸𝐺)𝒞(𝑅𝐶𝑂𝑀)\mathcal{EC}(\mathit{REG})\subseteq\mathcal{EC}(\mathit{RCOM}) holds.

With Si=Si{X}S_{i}^{\prime}=S_{i}\{X\}^{*} for 1in1\leq i\leq n, the same language is generated and the selection languages are left-sided comets. Hence, we also have the inclusion 𝒞(𝑅𝐸𝐺)𝒞(𝐿𝐶𝑂𝑀)\mathcal{EC}(\mathit{REG})\subseteq\mathcal{EC}(\mathit{LCOM}). Hence, we obtain the chain of inclusions 𝒞(𝑅𝐸𝐺)𝒞2COM𝒞(𝑅𝐸𝐺){\cal EC}(\mathit{REG})\subseteq{\cal C}\subseteq\mathit{2COM}\subseteq{\cal EC}(\mathit{REG}) for 𝒞{𝐿𝐶𝑂𝑀,𝑅𝐶𝑂𝑀}{\cal C}\in\{\mathit{LCOM},\mathit{RCOM}\} which implies the equalities stated in the lemma. ∎

We now prove some proper inclusions.

Lemma 36

The family 𝒞(𝑀𝑂𝑁)\mathcal{EC}(\mathit{MON}) is a proper subset of the family 𝒞(𝑆𝑇𝐴𝑅)\mathcal{EC}(\mathit{STAR}).

Proof.

With the inclusion 𝑀𝑂𝑁𝑆𝑇𝐴𝑅\mathit{MON}\subseteq\mathit{STAR} (see Theorem 25 and Figure 1), we obtain also the inclusion 𝒞(𝑀𝑂𝑁)𝒞(𝑆𝑇𝐴𝑅){\cal EC}(\mathit{MON})\subseteq{\cal EC}(\mathit{STAR}) according to Lemma 26.

The language L={ann2}{ba2nbn1}L=\{\;a^{n}\mid n\geq 2\;\}\cup\{\;ba^{2n}b\mid n\geq 1\;\} from Lemma 30 belongs to the family 𝒞(𝑆𝑇𝐴𝑅)\mathcal{EC}(\mathit{STAR}) but not to the family 𝒞(𝑃𝑆)\mathcal{EC}(\mathit{PS}) and, hence, neither to 𝒞(𝑀𝑂𝑁)\mathcal{EC}(\mathit{MON}). Thus, the language is a witness for the properness of the inclusion. ∎

Lemma 37

The family 𝒞(𝐹𝐼𝑁)\mathcal{EC}(\mathit{FIN}) is a proper subset of the family 𝒞(𝑆𝑇𝐴𝑅)\mathcal{EC}(\mathit{STAR})

Proof.

According to [7], 𝒞(𝐹𝐼𝑁)𝒞(𝑀𝑂𝑁){\cal EC}(\mathit{FIN})\subset{\cal EC}(\mathit{MON}). According to Lemma 36, 𝒞(𝑀𝑂𝑁)𝒞(STAR)\mathcal{EC}(\mathit{MON})\subset\mathcal{EC}(STAR). Hence, the family 𝒞(𝐹𝐼𝑁)\mathcal{EC}(\mathit{FIN}) is also a proper subset of the family 𝒞(𝑆𝑇𝐴𝑅){\cal EC}(\mathit{STAR}). ∎

Lemma 38

The family 𝒞(𝑆𝑇𝐴𝑅)\mathcal{EC}(\mathit{STAR}) is a proper subset of the families 𝒞(𝐿𝐶𝑂𝑀)\mathcal{EC}(\mathit{LCOM}) and 𝒞(𝑅𝐶𝑂𝑀){\cal EC}(\mathit{RCOM}).

Proof.

The inclusions 𝑆𝑇𝐴𝑅{{λ}}𝐿𝐶𝑂𝑀\mathit{STAR}\setminus\{\{\lambda\}\}\subseteq\mathit{LCOM} and 𝑆𝑇𝐴𝑅{{λ}}𝑅𝐶𝑂𝑀\mathit{STAR}\setminus\{\{\lambda\}\}\subseteq\mathit{RCOM} hold as recalled in Section 3.2. Consider an external contextual grammar with a single selection component ({λ},C)(\{\lambda\},C) (if there are more components with the selection language {λ}\{\lambda\}, they can be joined to one where the new set of contexts is the union of the single sets and the selection language is still the same). If the generated language contains the empty word, then this is an axiom since it cannot be obtained by derivation. Then, exactly the (finitely many) words uvuv with (u,v)C(u,v)\in C are generated using this selection component. Thus, if we put all these words uvuv with (u,v)C(u,v)\in C into the set of axioms as well, we can remove the component ({λ},C)(\{\lambda\},C) and obtain a contextual grammar which generates the same language but has no selection language {λ}\{\lambda\} anymore. Then, the remaining selection languages belong to the families 𝐿𝐶𝑂𝑀\mathit{LCOM} and 𝑅𝐶𝑂𝑀\mathit{RCOM}. Hence, every language of 𝒞(𝑆𝑇𝐴𝑅)\mathcal{EC}(\mathit{STAR}) also belongs to the families 𝒞(𝐿𝐶𝑂𝑀)\mathcal{EC}(\mathit{LCOM}) and 𝒞(𝑅𝐶𝑂𝑀){\cal EC}(\mathit{RCOM}).

According to Lemma 28, the language L={bnan0}{λ}L=\{\;b^{n}a\mid n\geq 0\;\}\cup\{\lambda\} belongs to 𝒞(𝐶𝑂𝑀𝐵)\mathcal{EC}(\mathit{COMB}) (and also to 𝒞(𝐿𝐶𝑂𝑀)\mathcal{EC}(\mathit{LCOM}) and 𝒞(𝑅𝐶𝑂𝑀)\mathcal{EC}(\mathit{RCOM}) by Theorem 25, Figure 1, and Lemma 26) but not to 𝒞(𝑆𝑇𝐴𝑅)\mathcal{EC}(\mathit{STAR}). This proves the properness of the inclusion. ∎

Lemma 39

The family 𝒞(𝐷𝐸𝐹)\mathcal{EC}(\mathit{DEF}) is a proper subset of the family 𝒞(𝑆𝑌𝐷𝐸𝐹)\mathcal{EC}(\mathit{SYDEF}).

Proof.

Let G=(V,{(S1,C1),,(Sn,Cn)},A)G=(V,\{(S_{1},C_{1}),\ldots,(S_{n},C_{n})\},A) be a contextual grammar where all selection languages are definite: Si=UiBiAiS_{i}=U_{i}^{*}B_{i}\cup A_{i} for 1in1\leq i\leq n. We first separate the finite parts and obtain the contextual grammar G=(V,{(U1B1,C1),(A1,C1),,(UnBn,Cn),(An,Cn)},A)G^{\prime}=(V,\{(U_{1}^{*}B_{1},C_{1}),(A_{1},C_{1}),\ldots,(U_{n}^{*}B_{n},C_{n}),(A_{n},C_{n})\},A) which generates the same language as GG. Next, we eliminate the components with finite selection languages: If a set BiB_{i} is empty, then the entire selection language is empty and cannot be used for derivation. Hence, we can simply omit such selection components without changing the generated language. For every component (Ai,Ci)(A_{i},C_{i}) where AiA_{i} is a finite language (1in1\leq i\leq n), we move all words uwvuwv with (u,v)Ci(u,v)\in C_{i} and wAiL(G)w\in A_{i}\cap L(G) into the set of axioms. These are finitely many (as AiA_{i} and CiC_{i} are finite) and are exactly the words generated by these components). Hence, we can remove these components afterwards. Then, we have obtained a contextual grammar which still generates the same language L(G)L(G) but has only symmetric definite languages left.

The language L={ann1}{banbn1}{cba2nbcn1}L=\{\;a^{n}\mid n\geq 1\;\}\cup\{\;ba^{n}b\mid n\geq 1\;\}\cup\{\;cba^{2n}bc\mid n\geq 1\;\} is a witness language for the properness of the inclusion which, according to Lemma 34, belongs to the family 𝒞(𝑆𝑌𝐷𝐸𝐹)\mathcal{EC}(\mathit{SYDEF}) but not to the family 𝒞(𝑁𝐶)\mathcal{EC}(\mathit{NC}) and, hence, not to (since 𝒞(𝐷𝐸𝐹)𝒞(𝑁𝐶){\cal EC}(\mathit{DEF})\subset{\cal EC}(\mathit{NC}) according to [7]). ∎

Lemma 40

The family 𝒞(𝑆𝑌𝐷𝐸𝐹)\mathcal{EC}(\mathit{SYDEF}) is a proper subset of the family 𝒞(𝑃𝑆)\mathcal{EC}(\mathit{PS}).

Proof.

From [20], we know the inclusion 𝑆𝑌𝐷𝐸𝐹𝑃𝑆\mathit{SYDEF}\subseteq\mathit{PS}. Therefore, by Lemma 26, we have the inclusion 𝒞(𝑆𝑌𝐷𝐸𝐹)𝒞(𝑃𝑆)\mathcal{EC}(\mathit{SYDEF})\subseteq\mathcal{EC}(\mathit{PS}). Its properness follows from Lemma 32 with L={a,b}{c}{ab}{c}L=\{a,b\}^{*}\cup\{c\}\{ab\}^{*}\{c\} which belongs to the family 𝒞(𝑂𝑅𝐷)\mathcal{EC}(\mathit{ORD}) (and also to 𝒞(𝑃𝑆)\mathcal{EC}(\mathit{PS}) by [29]) but not to the family 𝒞(𝑆𝑌𝐷𝐸𝐹)\mathcal{EC}(\mathit{SYDEF}). ∎

Now, we prove the incomparability relations mentioned in Figure 2 which have not been proved earlier. These are the relations regarding the families 𝒞(𝑆𝑇𝐴𝑅){\cal EC}(\mathit{STAR}) and 𝒞(𝑆𝑌𝐷𝐸𝐹){\cal EC}(\mathit{SYDEF}) since the families 𝒞(𝐿𝐶𝑂𝑀){\cal EC}(\mathit{LCOM}), 𝒞(𝑅𝐶𝑂𝑀){\cal EC}(\mathit{RCOM}), and 𝒞(2COM){\cal EC}(\mathit{2COM}) coincide with 𝒞(𝑅𝐸𝐺){\cal EC}(\mathit{REG}) and are therefore not incomparable to the other families mentioned.

Lemma 41

Let ={𝐶𝑂𝑀𝐵,𝐷𝐸𝐹,𝑆𝑌𝐷𝐸𝐹,𝑂𝑅𝐷,𝑁𝐶,𝑃𝑆}{\cal F}=\{\mathit{COMB},\mathit{DEF},\mathit{SYDEF},\mathit{ORD},\mathit{NC},\mathit{PS}\}. The family 𝒞(𝑆𝑇𝐴𝑅)\mathcal{EC}(\mathit{STAR}) is incomparable to each family 𝒞(F)\mathcal{EC}(F) with FF\in{\cal F}.

Proof.

Due to the inclusion relations, it suffices to show that there are two languages L1L_{1} and L2L_{2} with the properties L1𝒞(𝐶𝑂𝑀𝐵)𝒞(𝑆𝑇𝐴𝑅)L_{1}\in{\cal EC}(\mathit{COMB})\setminus{\cal EC}(\mathit{STAR}) and L2𝒞(𝑆𝑇𝐴𝑅)𝒞(𝑃𝑆)L_{2}\in{\cal EC}(\mathit{STAR})\setminus{\cal EC}(\mathit{PS}). From Lemma 28, we have the language L1={bnan0}{λ}L_{1}=\{\;b^{n}a\mid n\geq 0\;\}\cup\{\lambda\}. From Lemma 30, we have L2={ba2nbn1}{ann2}L_{2}=\{\;ba^{2n}b\mid n\geq 1\;\}\cup\{\;a^{n}\mid n\geq 2\;\}. ∎

Lemma 42

Let ={𝑁𝐼𝐿,𝐶𝑂𝑀𝑀,𝐶𝐼𝑅𝐶}{\cal F}=\{\mathit{NIL},\mathit{COMM},\mathit{CIRC}\}. The family 𝒞(𝑆𝑇𝐴𝑅)\mathcal{EC}(\mathit{STAR}) is incomparable to each family 𝒞(F)\mathcal{EC}(F) with FF\in{\cal F}.

Proof.

Due to the inclusion relations, it suffices to show that there are two languages L1L_{1} and L2L_{2} with the properties L1𝒞(𝑁𝐼𝐿)𝒞(𝑆𝑇𝐴𝑅)L_{1}\in{\cal EC}(\mathit{NIL})\setminus{\cal EC}(\mathit{STAR}) and L2𝒞(𝑆𝑇𝐴𝑅)𝒞(𝐶𝐼𝑅𝐶)L_{2}\in{\cal EC}(\mathit{STAR})\setminus{\cal EC}(\mathit{CIRC}). From Lemma 27, we have the language L1={anbbbn1}{λ}L_{1}=\{\;a^{n}bbb\mid n\geq 1\;\}\cup\{\lambda\}. From Lemma 31, we have L2={anbnn1}{bnann1}L_{2}=\{\;a^{n}b^{n}\mid n\geq 1\;\}\cup\{\;b^{n}a^{n}\mid n\geq 1\;\}. ∎

Lemma 43

The language family 𝒞(𝑆𝑇𝐴𝑅)\mathcal{EC}(\mathit{STAR}) is incomparable to the family 𝒞(𝑆𝑈𝐹)\mathcal{EC}(\mathit{SUF}).

Proof.

We have L1={a,b}{anbmn1,m1}{canbmcn1,m1}𝒞(𝑆𝑈𝐹)𝒞(𝑆𝑇𝐴𝑅)L_{1}=\{a,b\}^{*}\{\;a^{n}b^{m}\mid n\geq 1,\ m\geq 1\;\}\cup\{\;ca^{n}b^{m}c\mid n\geq 1,\ m\geq 1\;\}\in\mathcal{EC}(\mathit{SUF})\setminus\mathcal{EC}(\mathit{STAR}) from Lemma 29. From Lemma 30, we know that L2={ann2}{ba2nbn1}L_{2}=\{\;a^{n}\mid n\geq 2\;\}\cup\{\;ba^{2n}b\mid n\geq 1\;\} belongs to the family 𝒞(𝑆𝑇𝐴𝑅)\mathcal{EC}(\mathit{STAR}) but not to 𝒞(𝑃𝑆)\mathcal{EC}(\mathit{PS}) (and neither to 𝒞(𝑆𝑈𝐹)\mathcal{EC}(\mathit{SUF}) by [29]). ∎

Lemma 44

The language family 𝒞(𝑆𝑌𝐷𝐸𝐹)\mathcal{EC}(\mathit{SYDEF}) is incomparable to the family 𝒞(𝑆𝑈𝐹)\mathcal{EC}(\mathit{SUF}).

Proof.

We have L1={a,b}{c}{λ,b}{ab}{c}𝒞(𝑆𝑈𝐹)𝒞(𝑆𝑌𝐷𝐸𝐹)L_{1}=\{a,b\}^{*}\cup\{c\}\{\lambda,b\}\{ab\}^{*}\{c\}\in\mathcal{EC}(\mathit{SUF})\setminus\mathcal{EC}(\mathit{SYDEF}) from Lemma 33. From [7], we know that L2={abnn1}{λ}L_{2}=\{\;ab^{n}\mid n\geq 1\;\}\cup\{\lambda\} belongs to the family 𝒞(𝐶𝑂𝑀𝐵)\mathcal{EC}(\mathit{COMB}) but not to 𝒞(𝑆𝑈𝐹)\mathcal{EC}(\mathit{SUF}). By [29] and Lemma 39, the language L2L_{2} also belongs to 𝒞(𝑆𝑌𝐷𝐸𝐹){\cal EC}(\mathit{SYDEF}). ∎

Lemma 45

The family 𝒞(𝑆𝑌𝐷𝐸𝐹)\mathcal{EC}(\mathit{SYDEF}) is incomparable to each of the families 𝒞(𝑂𝑅𝐷){\cal EC}(\mathit{ORD}) and 𝒞(𝑁𝐶){\cal EC}(\mathit{NC}).

Proof.

Due to the inclusion relations, it suffices to show that there are two languages L1L_{1} and L2L_{2} with the properties L1𝒞(𝑂𝑅𝐷)𝒞(𝑆𝑌𝐷𝐸𝐹)L_{1}\in{\cal EC}(\mathit{ORD})\setminus{\cal EC}(\mathit{SYDEF}) and L2𝒞(𝑆𝑌𝐷𝐸𝐹)𝒞(𝑁𝐶)L_{2}\in{\cal EC}(\mathit{SYDEF})\setminus{\cal EC}(\mathit{NC}). From Lemma 32, we have L1={a,b}{c}{ab}{c}L_{1}=\{a,b\}^{*}\cup\{c\}\{ab\}^{*}\{c\}. As L2L_{2}, we take L2={ann1}{banbn1}{cba2nbcn1}L_{2}=\{\;a^{n}\mid n\geq 1\;\}\cup\{\;ba^{n}b\mid n\geq 1\;\}\cup\{\;cba^{2n}bc\mid n\geq 1\;\} from Lemma 34. ∎

Lemma 46

The family 𝒞(𝑆𝑌𝐷𝐸𝐹)\mathcal{EC}(\mathit{SYDEF}) is incomparable to each of the families 𝒞(𝐶𝑂𝑀𝑀){\cal EC}(\mathit{COMM}) and 𝒞(𝐶𝐼𝑅𝐶){\cal EC}(\mathit{CIRC}).

Proof.

Due to the inclusion relations, it suffices to show that there are two languages L1L_{1} and L2L_{2} with the properties L1𝒞(𝐶𝑂𝑀𝑀)𝒞(𝑆𝑌𝐷𝐸𝐹)L_{1}\in{\cal EC}(\mathit{COMM})\setminus{\cal EC}(\mathit{SYDEF}) and L2𝒞(𝑆𝑌𝐷𝐸𝐹)𝒞(𝐶𝐼𝑅𝐶)L_{2}\in{\cal EC}(\mathit{SYDEF})\setminus{\cal EC}(\mathit{CIRC}). In [29], it was proved that the language L1={ann2}{ba2nbn1}L_{1}=\{\;a^{n}\mid n\geq 2\;\}\cup\{\;ba^{2n}b\mid n\geq 1\;\} belongs to 𝒞(COMM){\cal EC}(COMM) but not to 𝒞(𝑃𝑆){\cal EC}(\mathit{PS}) (this can be seen also in the proof of Lemma 30). By Lemma 40, the language L1L_{1} neither belongs to the family 𝒞(𝑆𝑌𝐷𝐸𝐹){\cal EC}(\mathit{SYDEF}).

In [7], it was proved that the language L2={abcnn1}{cnabn1}L_{2}=\{\;abc^{n}\mid n\geq 1\;\}\cup\{\;c^{n}ab\mid n\geq 1\;\} belongs 𝒞(𝐶𝑂𝑀𝐵){\cal EC}(\mathit{COMB}) but not to 𝒞(𝐶𝐼𝑅𝐶){\cal EC}(\mathit{CIRC}). By [29] and Lemma 39, the language L2L_{2} also belongs to the family 𝒞(𝑆𝑌𝐷𝐸𝐹){\cal EC}(\mathit{SYDEF}). ∎

Theorem 47 (Hierarchy of the 𝒞{\cal EC} language families)

The inclusion relations presented in Figure 2 hold. An arrow from an entry XX to an entry YY depicts the proper inclusion XYX\subset Y; if two families are not connected by a directed path, then they are incomparable.

Proof.

An edge label refers to the paper or lemma in the present paper where the proper inclusion is shown. The incomparability results are proved in Lemmas 41 to 46. ∎

5 Conclusion and future work

In this paper, we have extended the previous hierarchy of subregular language families and families generated by external contextual grammars with selection in certain subregular language families.

Various other subregular language families have also been investigated in the past (for instance, in [2, 13, 20]). Future research will be on extending and unifying current hierarchies of subregular language families (presented, for instance, in [10, 29]) by additional families and to use them as control in external contextual grammars. We already started investigations on the position of prefix- and infix-closed as well as prefix-, suffix-, and infix-free languages in the current hierarchy and their impact on the generative power of external contextual grammars when used for selection. The extension of the hierarchy with other families of definite-like languages (for instance, ultimate definite, central definite, noninital definite) has also already begun.

The research will be also extended to internal contextual grammars or tree-controlled grammars where results are already available in [10, 29, 30, 31].

References

  • [1]
  • [2] Henning Bordihn, Markus Holzer & Martin Kutrib (2009): Determination of finite automata accepting subregular languages. Theoretical Computer Science 410(35), pp. 3209–3222, 10.1016/j.tcs.2009.05.019.
  • [3] Janusz A. Brzozowski (1962): Regular expression techniques for sequential circuits. Ph.D. thesis, Princeton University, Princeton, NJ, USA.
  • [4] Janusz A. Brzozowski (1967): Roots of star events. Journal of the ACM 14(3), pp. 466–477, 10.1109/SWAT.1966.21.
  • [5] Janusz A. Brzozowski & Rina Cohen (1969): On decompositions of regular events. Journal of the ACM 16(1), pp. 132–144, 10.1145/321495.321505.
  • [6] Janusz A. Brzozowski, Galina Jirásková & Chenglong Zou (2014): Quotient complexity of closed languages. Theory of Computing Systems 54, pp. 277–292, 10.1007/s00224-013-9515-7.
  • [7] Jürgen Dassow (2005): Contextual grammars with subregular choice. Fundamenta Informaticae 64(1–4), pp. 109–118.
  • [8] Jürgen Dassow (2015): Contextual languages with strictly locally testable and star free selection languages. Analele Universitatii Bucuresti 62, pp. 25–36.
  • [9] Jürgen Dassow, Florin Manea & Bianca Truthe (2012): On external contextual grammars with subregular selection languages. Theoretical Computer Science 449, pp. 64–73, 10.1016/j.tcs.2012.04.008.
  • [10] Jürgen Dassow & Bianca Truthe (2023): Relations of contextual grammars with strictly locally testable selection languages. RAIRO – Theoretical Informatics and Applications 57, p. #10, 10.1051/ita/2023012.
  • [11] Ference Gécseg & István Peák (1972): Algebraic Theory of Automata. Academiai Kiado, Budapest.
  • [12] Arthur Gill & Lawrence T. Kou (1974): Multiple-entry finite automata. Journal of Computer and System Sciences 9(1), pp. 1–19, 10.1016/S0022-0000(74)80034-6.
  • [13] Yo-Sub Han & Kai Salomaa (2009): State complexity of basic operations on suffix-free regular languages. Theoretical Computer Science 410(27), pp. 2537–2548, 10.1016/j.tcs.2008.12.054.
  • [14] Ivan M. Havel (1969): The theory of regular events II. Kybernetika 5(6), pp. 520–544.
  • [15] Markus Holzer & Bianca Truthe (2015): On relations between some subregular language families. In Rudolf Freund, Markus Holzer, Nelma Moreira & Rogério Reis, editors: Seventh Workshop on Non-Classical Models of Automata and Applications – NCMA 2015, Porto, Portugal, August 31 – September 1, 2015. Proceedings, [email protected] 318, Österreichische Computer Gesellschaft, pp. 109–124.
  • [16] Manfred Kudlek (2004): On languages of cyclic words. In Natasha Jonoska, Gheorghe Păun & Grzegorz Rozenberg, editors: Aspects of Molecular Computing, Essays Dedicated to Tom Head on the Occasion of His 70th Birthday, LNCS 2950, Springer-Verlag, pp. 278–288, 10.1007/978-3-540-24635-0_20.
  • [17] Solomon Marcus (1969): Contextual grammars. Revue Roumaine de Mathématique Pures et Appliquées 14, pp. 1525–1534.
  • [18] Robert McNaughton & Seymour Papert (1971): Counter-Free Automata. MIT Press, Cambridge, USA.
  • [19] Benedek Nagy (2019): Union-Freeness, Deterministic Union-Freeness and Union-Complexity. In Michal Hospodár, Galina Jirásková & Stavros Konstantinidis, editors: Descriptional Complexity of Formal Systems, 21st IFIP WG 1.02 International Conference, DCFS 2019, Košice, Slovakia, July 17–19, 2019, Proceedings, Springer, Cham, pp. 46–56, 10.1007/978-3-030-23247-4_3.
  • [20] Viktor Olejár & Alexander Szabari (2023): Closure Properties of Subregular Languages Under Operations. International Journal of Foundations of Computer Science, pp. 1–25, 10.1142/S0129054123450016.
  • [21] Azaria Paz & Bezalel Peleg (1965): Ultimate-definite and symmetric-definite events and automata. Journal of the ACM 12(3), pp. 399–410, 10.1145/321281.321292.
  • [22] Micha A. Perles, Michael O. Rabin & Eli Shamir (1963): The theory of definite automata. IEEE Transactions of Electronic Computers 12, pp. 233–243, 10.1109/PGEC.1963.263534.
  • [23] Grzegorz Rozenberg & Arto Salomaa, editors (1997): Handbook of Formal Languages. Springer-Verlag, Berlin, 10.1007/978-3-642-59136-5.
  • [24] Huei-Jan Shyr (1991): Free Monoids and Languages. Hon Min Book Co., Taichung, Taiwan.
  • [25] Huei-Jan Shyr & Gabriel Thierrin (1974): Ordered automata and associated languages. Tamkang Journal of Mathematics 5(1), pp. 9–20.
  • [26] Huei-Jan Shyr & Gabriel Thierrin (1974): Power-separating regular languages. Mathematical Systems Theory 8(1), pp. 90–95, 10.1007/BF01761710.
  • [27] Bianca Truthe (2014): A relation between definite and ordered finite automata. In Suna Bensch, Rudolf Freund & Friedrich Otto, editors: Sixth Workshop on Non-Classical Models for Automata and Applications – NCMA 2014, Kassel, Germany, July 28–29, 2014. Proceedings, [email protected] 304, Österreichische Computer Gesellschaft, pp. 235–247.
  • [28] Bianca Truthe (2018): Hierarchy of Subregular Language Families. Technical Report, Justus-Liebig-Universität Giessen, Institut für Informatik, IFIG Research Report 1801.
  • [29] Bianca Truthe (2021): Generative Capacity of Contextual Grammars with Subregular Selection Languages. Fundamenta Informaticae 180(1–2), pp. 123–150, 10.3233/FI-2021-2037.
  • [30] Bianca Truthe (2023): Merging two Hierarchies of Internal Contextual Grammars with Subregular Selection. In Benedek Nagy & Rudolf Freund, editors: Proceedings of the 13th International Workshop on Non-Classical Models of Automata and Applications, NCMA 2023, Famagusta, North Cyprus, 18th–19th September, 2023, EPTCS 388, pp. 125–139, 10.4204/EPTCS.388.12.
  • [31] Bianca Truthe (2023): Strictly Locally Testable and Resources Restricted Control Languages in Tree-Controlled Grammars. In Zsolt Gazdag, Szabolcs Iván & Gergely Kovásznai, editors: Proceedings of the 16th International Conference on Automata and Formal Languages, AFL 2023, Eger, Hungary, September 5–7, 2023, EPTCS 386, pp. 253–268, 10.4204/EPTCS.386.20.
  • [32] Barbara Wiedemann (1978): Vergleich der Leistungsfähigkeit endlicher determinierter Automaten. Diplomarbeit, Universität Rostock.