Indian Statistical Institute, Kolkata, India and https://www.isical.ac.in/~ansuman[email protected]://orcid.org/0000-0003-0220-646X Government College of Engineering and Ceramic Technology, Kolkata, India and https://sites.google.com/view/kingshukchatterjee/home[email protected]://orcid.org/0000-0002-2617-6309 Tata Institute of Fundamental Research, Mumbai, India and https://www.tifr.res.in/~shibashis.guha/[email protected]://orcid.org/0000-0002-9814-6651 \ccsdesc[500]Theory of computation Formal languages and automata theory \relatedversion
Acknowledgements.
We thank Amaldev Manuel for providing useful comments on a preliminary version of this paper.\EventEditorsJohn Q. Open and Joan R. Access \EventNoEds2 \EventLongTitle42nd Conference on Very Important Topics (CVIT 2016) \EventShortTitleCVIT 2016 \EventAcronymCVIT \EventYear2016 \EventDateDecember 24–27, 2016 \EventLocationLittle Whinging, United Kingdom \EventLogo \SeriesVolume42 \ArticleNo23Set Augmented Finite Automata over Infinite Alphabets
Abstract
A data language is a set of finite words defined on an infinite alphabet. Data languages are used to express properties associated with data values (domain defined over a countably infinite set). In this paper, we introduce set augmented finite automata (SAFA), a new class of automata for expressing data languages. We investigate the decision problems, closure properties, and expressiveness of SAFA. We also study the deterministic variant of these automata.
keywords:
automata on infinite alphabet, data languages, register automata, expressiveness, closure properties1 Introduction
A data language is a set of data words that are concatenations of attribute, data-value pairs. While the set of attributes is finite, the values that these attributes hold often come from a countably infinite set (e.g. natural numbers). With large scale availability of data in recent times, there is a need for methods for modeling and analysis of data languages. Thus, there is a demand for automated methods for recognizing attribute data relationships and languages defined on infinite alphabets. A data word is a concatenation of a finite number of attribute, data-value pairs, i.e. a data word , where each belongs to a finite set and each belongs to a countably infinite set. We denote by the length of . This work introduces a new model for data languages on infinite alphabets.
-register automata (finite automata with registers, each capable of storing one data value) [16] are finite automata with the ability to handle infinite alphabets. The nonemptiness and membership problems for register automata are both NP-complete [24]. However, the language recognizability is somewhat restricted, since it uses only a finite number of registers to store the data values. Thus, register automata cannot accept many data languages, one such being which is a collection of data words where all data values associated with the attribute have to be distinct. Pushdown versions of automata on infinite alphabets using stacks have also been introduced in [8]. However, even with the introduction of a stack, these models are unable to accept . Data automata are introduced in [6] and the emptiness problem is shown to be nonelementary. Further, class memory automata (CMA) and class counter automata (CCA) are introduced in [5, 19] respectively. While CMA and data automata are shown to be equivalent [5], the set of languages accepted by CCA is a subset of the set of languages accepted by CMA. The nonemptiness problem for CCA is EXPSPACE-complete [19], and the nonemptiness problem for CMA is interreducible to the reachability problem in Petri nets [20, 6, 18, 21], and hence Ackermann-complete [9].
Our Contribution:
This work introduces set augmented finite automata (SAFA) that are finite automaton models equipped with a finite number of finite sets for storing data values. Using these sets as auxiliary storage, SAFA is able to recognize many important data languages including . This paper has the following contributions.
-
•
We present the formal definition of the SAFA model on infinite alphabets (Definition 3.1).
- •
- •
-
•
We also study the deterministic variant of SAFA and show that there are languages that necessarily need nondeterminism to be accepted (Theorem 4.38).
-
•
We present a strict hierarchy of languages with respect to the number of sets associated with SAFA models (Theorem 3.5).
-
•
Finally, we study the expressiveness of SAFA models (See Section 5). While we show that the class of languages recognized by SAFA and register automata are incomparable, the set of languages accepted by SAFA is a strict subset of the set of languages accepted by CCA, and hence by CMA.
Related work:
Register automata introduced by Kaminski et.al.[16] use a finite number of registers to store data values; hence they can only accept those data languages in which membership depends on remembering properties of a finite number of data values where is bounded above by the number of registers in the register automata. An extension to finite register models are pushdown automata models for infinite alphabets [8]. Cheng et.al. [8] and Autebut et.al. [1] both describe context free languages for infinite alphabets. The membership problem for the grammar proposed by Cheng et. al. is decidable unlike Autebut’s. Sakamoto et.al.[24] show that the membership problem is PTIME-complete and nonemptiness is NP-complete for deterministic register finite automata model on infinite alphabets. Neven et.al.[22, 23] discuss the properties of register and pebble automata on infinite alphabets, in terms of their language acceptance capabilities and also establish their relationship with logic. Tan et.al. [25] introduces a weak 2-pebble automata model whose emptiness is decidable but with significant reduction in acceptance capabilities with respect to pebble automata.Kaminski et.al.[17] also develop a regular expression over infinite alphabets. Choffrut et.al.[4] define finite automata on infinite alphabets which uses first order logic on transitions. This is later extended by Iosif et.al. to an alternating automata model [14] with emptiness problem being undecidable but they give two efficient semi-algorithms for emptiness checking. Demri et.al. [11] explore the relationship between linear temporal logic (LTL) and register automata. Grumberg et.al. [13] introduce variable automata over infinite alphabets where transitions are defined over alphabets as well as over variables. Data automata are introduced in [6]. Class memory automata (CMA) is introduced in [5] and is shown to be equivalent to data automata. The set of languages accepted by CMA is a superset of Class counter automata (CCA) [19], another infinite alphabet automata introduced by Manuel et.al. Bollig [7] combines CMA and register automata and shows that local existential monadic second order (MSO) logic can be converted to class register automata in time polynomial in the size of the input automata. Figuera [12] discusses the properties of alternating register automata and also discusses restricted variants of this model where decidability is tractable. Dassow et.al. [10] introduces the P-automata model and establishes that these are equivalent to a restricted version of register automata. A detailed survey of existing finite automata models on infinite alphabets can be found in [20].
The elegance of SAFA is its simple structure that is easy to implement. Moreover, the membership and nonemptiness problems are NP-complete for our model. On one hand, this gives us an advantage over the hash-based family of models on infinite alphabets, with respect to the associated problem complexities. On the other hand, this puts our model in the same complexity class as the -register automata, while having the ability to accept many important data languages.
A preliminary version of this paper appeared in [3].
2 Preliminaries
Let denote the set of natural numbers, and the set where . Let be a finite alphabet which comprises a finite set of attributes, and be a countably infinite set of attribute values. A data word is a concatenation of attribute, data value pairs, where denotes zero or more repetitions. A data value is also known as an attribute value, that is the value associated with an attribute. An example data word is of the form where , , and denotes the length of . An example data word on and is: with = 8. A data language is a set of data words. Some example data languages with , are mentioned below.
-
•
: language of data words, wherein the data values associated with attribute are all distinct.
-
•
: language of data words wherein all data values appear exactly twice.
-
•
: the language of data words where there exists a data value which does not appear twice. is the complement of .
-
•
: the language of data words wherein the data values associated with attribute are those which have already appeared with the attribute .
For a word , we denote by and the projection of on and respectively. Let be the set of all data words such that where is the set of all words over generated by the regular expression .
Post Correspondence Problem (PCP)
: \chadded[id=KC]The PCP problem consists of two lists of equal length, say . The items of the lists are finite strings defined on an alphabet where . Without loss of generality, we can assume . List 1 consists of the strings , and list 2 consists of the strings where . The PCP problem is true, if there exists a sequence where such that and false otherwise (see Example 2.1). Each pair is considered as a domino and the PCP solution is an arrangement of these dominoes such that the strings in the upper part and lower part of the arranged dominoes become same.
Example 2.1.
An instance of the PCP problem on is as follows: List 1= and List 2=, Then one possible solution for the above PCP problem is the sequence , i.e. . ∎
-register automata [16]
: A -register automaton is a tuple , where is a finite set of states, is an initial state and is a set of final states, is an initial register configuration given by , where is a countably infinite set, and denotes an uninitialized register, and is a partial update function: . The transition relation is . The registers initially contain distinct data values other than which can be present in more than one register. The automaton works as follows. Consider a register automaton in state . Each of its registers holds datum where , . Let at some instance reads the data element of the input word where , . Two cases may arise.
-
•
Case 1: There exists an such that : In this case two situations may arise (i) and (ii) . In situation (i) the corresponding transition is executed, and in situation (ii) the automaton stops without consuming the data element.
-
•
Case 2: There exists no register such that : In this case, for all , we have . We look at the partial update function . If is not defined, the automaton stops without consuming the data element. If is defined, then is inserted in the register and the automaton executes the transition if , otherwise it halts if .
The automaton accepts an input data word if it consumes the whole word and ends in a final state.
Class counting automata [19]
: A class counting automaton (a.k.a 1-bag CCA) is defined as a -tuple where is a finite set of states, is an initial state, and is the set of accepting states. A constraint is a pair , where , . Let denote a collection of constraints. The transition relation is . A bag is a finite set . Initially, is set to for all data values . The set . An element of is called an operation. When making a transition, a CCA reads an attribute, data-value pair , and checks if holds. If it holds, then (i) either is incremented by if the operation is , or (ii) is reset to if the operation is , and we go to the next state. A CCA accepts a data word if it is in a final state after consuming .
A -bag CCA has bags. For a data value, constraint checking can be done on a subset of the bags. The bags can also be updated or reset independently. The set of transitions of a -bag CCA is a subset of . We denote by , the bag. It is shown in [19] that for every -bag CCA that accepts a language , there exists a -bag CCA which accepts the same language .
Class memory automata [5]
: A class memory automaton is a -tuple where is a finite set of states, is an initial state and are a set of global and local accepting states respectively. The transition relation is . The automaton keeps track of the last state where a data value is encountered. If a data value is not yet encountered, then it is associated with . Each transition of a CMA is dependent on the current state of the automaton and the state the automaton was in when the data value being read currently was last encountered. A data word is accepted if the automaton reaches a state and the last state of all the data values encountered in are in .
3 Set augmented finite automata
Definition 3.1.
A set augmented finite automaton (SAFA) is defined as a -tuple where is a finite set of states, is a finite alphabet, is a countably infinite set, is the initial state, is a set of final states, is a finite set of finite sets of data values. The transition relation is defined as: where , denotes the set in , and . ∎
We call a SAFA a singleton if . The unary Boolean predicate evaluates to true if the data value currently being read by the automaton is present in the set . The predicate is true if the data value currently being read is not in . Further, denotes a set of operations that a SAFA can execute on reading a symbol; the operation inserts the data value currently being read by the automaton into the set , while denotes no such insertion is done. For any combination not in , we assume the transition is absent.
For a SAFA , we define a configuration as follows: is a state of the automaton, where each for is a finite subset of , and denotes the content of the sets in . A run of on an input is a sequence , where , and for is the content of the set after reading the prefix for . A configuration succeeds a configuration if there is a transition where
-
(i)
for , we have that the data value .
-
(ii)
for , we have that the data value .
The execution of the operation takes the content of the sets of data values from to . If is , then . If is ins, then for all , and . If the consumes the whole word , and , then the run is accepting, otherwise, it is rejecting. A word is accepted by if it has an accepting run. The language accepted by consists of all words accepted by . We denote by the length of the run which equals the number of transitions taken. Note that for the run on an input word , we have that = .
Definition 3.2.
A SAFA is deterministic (DSAFA) if for every and , if there is a transition , where , , , , then there cannot be any transition of the form , , where , , , . The only other allowed transition can be for , , and . ∎
Let and denote the set of all languages accepted by nondeterministic SAFA and deterministic SAFA respectively. We illustrate the SAFA model with some instances of data languages.
Example 3.3.
The language can be accepted by the DSAFA in Figure 1. Here, , , is any countably infinite set, , , the transition relation consists of the transitions shown in Figure 1. The automaton works as follows. The set is used to store the data values encountered in the input word associated with . At , if reads , it remains in without modifying . At when the automaton reads , it checks whether the corresponding data value is present in . If present, it indicates it has already encountered this data value with attribute before; therefore the automaton goes to which is a dead state and the input word is rejected. If the data value is not present in , it implies that it has not encountered this value with , thus it remains in and inserts the data value into . Only if the automaton encounters a duplicate data value, it goes to . If it does not encounter duplicate data values with respect to in the input, the automaton remains in after consuming the entire word and it is accepted. ∎
Example 3.4.
Consider the data language over the alphabet . A nonempty word is in the language if there exists a data value that appears times in with 2. This can be accepted by the nondeterministic SAFA in Figure 2. At state , the automaton nondeterministically guesses the data value that does not appear exactly twice \chadded[id=KC]and stores it in set and goes to state . The automaton remains in state if the count of the data value is or moves to state (via ) and remains there if the count of the data value is greater than . In both the cases, it accepts the input word if it can be consumed entirely. If the guess is incorrect, the data value appears twice, and it is in the nonaccepting state after consuming the input word. Thus, if a data word has all its data values that appear exactly twice, then all the runs end in and the input is rejected. ∎
[id=KC]The number of sets in the SAFA model impacts the language accepting capacity. The following theorem establishes a hierarchy of accepted languages by SAFA based on the size of .
Theorem 3.5.
No SAFA with , can accept the language . 111The language could have also been considered but the proof is relatively simpler if we instead consider .
Proof 3.6.
Assume that there exists a SAFA with , which accepts and has states. Since the automaton accepts and contains words which are longer than , the automaton must have at least one cycle of sequence of transitions in its structure. Thus, the automaton accepts where is the prefix of before entering the cycle, is the infix of the word that is consumed in the cycle and and is suffix of the word that is consumed after exiting the cycle. The sequence of transitions that consume and themselves may or may not contain cycles of transitions. Let us focus on the cycle of sequence of transitions that consumes .
-
•
CASE 1: contains a transition with . The (attribute, data value) pair consumed in , can be consumed again in the next execution of . The new word accepted by will then not be . Therefore cannot have any transition with
-
•
CASE 2: Now suppose has a transition of the form or , , . When is executed for the first time, let the transition consume a data value which has not consumed before and will not consume in the next execution of except when it is executing . Since the number of transitions in is finite and is countably infinite, it is always possible to find such a value. When is executed again and is being executed, the same (attribute, data-value) pair can be consumed by it as the data value has not been inserted in , when executed before. Therefore, the new word which accepts cannot be in .
-
•
CASE 3: Since , the cycle of transitions cannot be defined on two different . This is because if is executed again, then the projection of the new word on will no longer be in .
From Cases 1 to 3, we conclude that for to accept , must have at least one cycle for each and the sequence of transitions in the cycle for must be of the form , , , .
As , by pigeon hole principle there will be two such that they insert the value in the same set . Now, suppose inserts a data value in set . Consider the data word where all the positions have unique data value except at and , and is of the form . The string is a valid data word in . But the automaton will reject such a data word , because when implementing the transition for a cycle of , when it reads it will fail as is already stored in when the data element was being consumed. It can be argued that the data word can be accepted by some other cycle involving but increasing the number of cycles will further result in storing its data in the same set , with some other , . Thus, the above mentioned problem will persist. Therefore, it is not possible to construct a SAFA with , which accepts the language
Let be the set of all languages accepted by SAFA with . Since every SAFA with can be simulated by a SAFA with and by using dummy sets that are never used in an execution of , we have the following.
Corollary 3.7.
.
Corollary 3.7 shows that there is a strict hierarchy in terms of accepting capabilities of SAFA with respect to .
4 Decision problems and closure properties
We study the nonemptiness, membership problems and closure properties of SAFA.
4.1 Nonemptiness and membership
We study the nonemptiness and the membership problems of SAFA and show that both are -complete. Given a SAFA and an input word , the membership problem is to check if . Given a SAFA , the nonemptiness problem is to check if . We start with the nonemptiness problem. To show the NP-membership, we first show that there exists a small run if the language accepted by a given SAFA is nonempty.
Lemma 4.1.
Every SAFA with has a data word in with an accepting run such that .
Proof 4.2.
We prove by contradiction. Assume that and that for every , for all accepting runs of , we have that . We define an indicator function which maps to , where corresponding to a set denotes that is empty while denotes that is nonempty. Since , the run can be divided into segments, each of length , that is, each segment is an infix over transitions. By pigeon hole principle, in each segment, there exists a state that is visited more than once. Further, since there are such segments, again by pigeon hole principle, there exists a segment and a state such that is visited more than once in this segment and does not change over the infix of the run between the two successive visits of . We now note that the sequence of transitions reading this infix makes a loop over , and thus can be removed from resulting into a word and the corresponding run is such that and is accepting. Now there can be two cases if the suffix of following reading in has a transition with and it reads a data value .
-
•
It may happen that the data value was inserted along the infix . Since does not change while reading the infix , it implies that the set was nonempty even before the infix was read. Let a data value was inserted into while reading the prefix before . Then may be modified to so that the suffix following in reads the data value instead of . Let the run corresponding to be that follows the same sequence of states as . Note that , and that is an accepting run.
-
•
If while reading , the transition reads a data value that was inserted while reading a prefix appearing before , then does not need to be modified, and we thus have the accepting run .
Since is an arbitrary accepting run of length or more, starting from , we can remove infixes repeatedly and modify it as mentioned above if needed until we reach an accepting run of length strictly smaller than without affecting acceptance, and hence the contradiction.
Using Lemma 4.1 we get the following.
Lemma 4.3.
Nonemptiness problem for SAFA is in NP.
Proof 4.4.
Consider a SAFA with . By Lemma 4.1, \chadded[id=KC]a Turing machine can nondeterministically guess an accepting run of polynomial length, hence the result.
We now show that the nonemptiness problem is NP-hard even for deterministic acyclic SAFA over an alphabet of size . Example 4.5 describes our construction.
Example 4.5.
For a 3CNF formula , the corresponding SAFA is shown in Figure 3. We denote by the transition and by the transition with , , , , . In particular, if there are variables in the formula, then . ∎
Lemma 4.6.
The nonemptiness problem is NP-hard for deterministic acyclic SAFA over an alphabet of size .
Theorem 4.7.
The nonemptiness problem for SAFA is NP-complete.
We now show that for singleton SAFA, nonemptiness is NL-complete. Towards this, we first show the following lemma.
Lemma 4.8.
The nonemptiness problem for singleton SAFA is reducible to the nonemptiness of a nondeterministic finite automaton (NFA) in PTIME.
Proof 4.9.
We begin with the following observations on SAFA transitions for a run on an input word. The idea is to see how we can construct a word accepted by a given SAFA that takes it from the initial state to a final state following the transition rules.
-
•
Transitions with can always be satisfied since we have an infinite number of data values. We can always introduce a new data value with an attribute so that it is not in the set . However, transitions with should only be executed if there exists an somewhere earlier on the path before reaching the transition with .
-
•
Given a SAFA, it is just not enough to only look for simple paths from the initial state to a final state satisfying the observations stated above (see Figure 4). In Figure 4, the only simple path is with transition . Since the simple path does not contain any transition having prior to the transition containing , no word is accepted by the automaton along this simple path. However, we find that the automaton accepts the string . Thus, when checking for emptiness of a SAFA, we cannot just restrict our analysis to simple paths.
-
•
The language accepted by a SAFA is nonempty iff there exists a sequence of transitions that takes it from an initial state to a final state with the added condition that if the sequence of transitions contains a , then there must exist a corresponding prior to the in the sequence of transitions.
Given a singleton SAFA , , we check for nonemptiness by constructing three nondeterministic finite automata (NFA). The first NFA accepts all those sequences of transitions which can take the SAFA from its initial state to any of its final states. However, does not check if the sequence of transitions on the path from the initial state to a final state of is valid (i.e. Condition 1 above). The second NFA accepts all possible sequences of transitions that have prior to encountering a . We construct the synchronous product [2] of and that gives us another NFA . We check for emptiness of . If is not empty (final state is reachable), we can conclude that there exists at least one sequence of transitions that takes the SAFA from the initial state to a final state and every encountered on that sequence of transitions has an prior to it. Therefore, the SAFA is not empty. If is empty, it indicates that there exists no such sequence of transitions. Thus, if is empty, we can conclude that the SAFA is empty as well and there is no data word that is accepted by the SAFA.
Given a SAFA , we construct NFA as below:
-
•
where , . The alphabet contains all transitions that M can have.
-
•
with the triplets in considered as elements of .
From the construction of , we see that it accepts all those possible sequences of transitions that may take from its initial state to any of its final states. We construct NFA as below:
-
•
The transitions in are as follows:
-
•
,
-
•
,
-
•
,
-
•
,
-
•
,
-
•
,
The automaton works as follows. State denotes that we have not yet come across , state denotes we have seen an . For inputs of the form where , we remain in state . If we come across inputs of the form where , we move to state from . At state , the automaton remains in state for every element
By construction, accepts all those sequences of transitions of where every transition containing is preceded by at least one transition containing . The NFA is a synchronous product of NFA and NFA . Therefore NFA accepts all those sequences which take the SAFA from its initial state to a final state. Thus, if the language accepted by is empty, so is the language accepted by the SAFA , and non-empty otherwise. The automaton is nonempty if there exists a simple path from its initial state to any of its final states. This can be found out using a standard Depth First Search (DFS).
The time complexity of the emptiness check is polynomial in the size of . The size of depends on the size of and . Size of an NFA is defined as . The number of states in is same as . The number of states in is and the number of states in is at most and number of edges in is at most which is polynomial in the input size.
Example 4.10.
Theorem 4.11.
The nonemptiness problem for singleton SAFA is -complete.
Proof 4.12.
We first discuss -membership. By Lemma 4.8, the nonemptiness for singleton SAFA is in by reducing the problem to checking the nonemptiness of a nondeterministic finite automaton (NFA) with states. This NFA can be constructed on-the-fly leading to an -membership of the nonemptiness problem of singleton SAFA.
For -hardness, we show a reduction from the reachability problem on a directed graph having vertex set which is known to be -complete [15]. Let be a directed graph with and we are given the vertices and . We define a SAFA where , , is a countably infinite set, , , i.e. . The transitions in are as follows: if is an edge in . It is easy to see that can be constructed from in logspace and that iff there is a path from vertex to vertex in . Hence, the result.
We now show that membership for SAFA is NP-complete. We first note that unlike the nonemptiness problem, for DSAFA, membership can be decided in by reading the input word and by checking if a final state is reached.
Theorem 4.13.
The membership problem for SAFA is NP-complete.
Proof 4.14.
Given a SAFA and an input word , if , then a nondeterministic Turing machine can guess an accepting run in polynomial time and hence the membership problem is in NP.
For showing NP-hardness, we reduce from SAT for the nonemptiness problem as done for Lemma 4.6. Instead of a deterministic automaton that was constructed in the proof of Lemma 4.6, we construct a nondeterministic SAFA with and all the transitions in are labelled with the same letter . Everything else remains the same as in the construction in Lemma 4.6. Note that in the 3SAT formula , if there are variables and clauses, then there is a path of length from the initial state to the unique final state of . We consider an input word , that is a word in which all the attribute, data-value pairs are identical in the whole word such that . It is not difficult to see that iff is satisfiable.
Finally, we show that given a SAFA defined on , whether (universality problem) is undecidable.
Theorem 4.15.
The universality problem for SAFA is undecidable.
Proof 4.16.
The proof is similar to showing \chadded[id=KC]that the universality problem for -register automata is undecidable [23]. We reduce the Post Correspondence Problem (PCP) which is already known to be undecidable to the universality problem for SAFA. \chdeleted[id=KC]The PCP problem consists of two lists of equal length, say . The items of the lists are finite strings defined on an alphabet where . Without loss of generality, we can assume . List 1 consists of the strings , and list 2 consists of the strings where . The PCP problem is true, if there exists a sequence where such that and false otherwise (see Example 2.1). We reduce the PCP problem to the universality problem for SAFA such that the constructed SAFA does not accept a word which corresponds to a PCP solution. Thus, the SAFA is universal if and only if there does not exist a solution to the PCP problem. For the reduction, we consider input data words of the format with , where data item is a separator and the data item is an end-marker. The data words and represent a candidate solution () where of the PCP instance. Such a candidate solution is a true solution of the PCP instance if the following conditions hold.
-
•
for each which denotes \chadded[id=KC]the fact that the corresponding strings are taken from the same domino.
-
•
, i.e. both strings are same.
We now describe the format in more detail.
-
•
Each is encoded as where gives a unique data value to this particular occurrence of domino string from the first list. The symbols , the data values represent the position of each in uniquely and . Similarly, is also encoded. The data words . Every and is unique in , that is even across different instances of the data values and used are different.
-
•
A string is syntactically correct if the above conditions hold and also the following two conditions are true.
-
–
, i.e. the sequence of data values associated with the symbols in in are same as that in . Having this same sequence of data values in both and corresponds to the fact that the and the appear in the same order i.e. for each
-
–
, i.e. the sequence of data values associated with the symbols in in are same as that in . This corresponds to the fact that the strings in and obtained by concatenating the and the respectively match, i.e. .
-
–
A syntactically correct string is a true solution of a PCP instance if
-
•
for each data value in the number in associated with that data value in both and are same. This ensures that the strings in both and are chosen from the same domino, i.e. for each , and
-
•
for each data value in , the letter in associated with that data value in both and are same. This ensures that the strings formed from both the list are same. i.e. .
We now describe a nondeterministic SAFA which accepts an input data word where if and only if the input data word is not in the correct format or it is not a solution of the PCP instance. The SAFA checks and accepts if the following conditions are satisfied for the input string .
-
1.
The input strng is not in the format as required by a PCP instance:
-
(a)
The input word is not in the form . This checking can be done using an NFA.
-
(b)
Consider a substring between two consecutive in where , , and , and we call the -projection of . The string is not in the right format if there exists a substring as above whose -projection is not the same as . Similarly, the string is not in the right format if there exists a substring as above whose -projection is not the same as . Corresponding to every string for , there is a deterministic finite automaton (DFA) that accepts . Given the , we can use the corresponding DFA to check that the -projection of is not the same as . The nondeterminstic SAFA guesses such a substring of which is not in the right format.
-
(a)
-
2.
The projections in and are not in the required format.
-
(a)
Two data values in are same. The SAFA can nondeterministically guess that a particular data value is repeated in and store it in a set. If it comes across that same data value again while traversing , it accepts the input word.
-
(b)
Two data values in are same. The SAFA can nondeterministically guess that a particular data value is repeated in and store it in a set. If it comes across that same data value again while traversing , it accepts the input word.
-
(c)
The first data value in and the first data value are not the same. The SAFA can store the first data value in in a set and match it while reading the first data value in after .
-
(d)
The last data value in and the last data value are not the same. Again the SAFA can nondeterministically read the last two data values in each of these sequences and match them.
-
(e)
Two data values and are successors in but not in .
-
•
The SAFA can again nondeterministically decide on reading such a pair of data values and in that are different from those in and store them in two different sets. Then as it parses through data values in and comes across the first data value it checks whether the successor data value in both cases are same or not.
-
•
-
(a)
-
3.
The projections in and are not in the required format. This checking can be done in a similar manner as the checking for projections. Again recall that this can easily be done since all values are unique in .
-
4.
The input word is not a true solution of the PCP instance.
-
(a)
For the input word to be a correct solution of the PCP instance, the attribute corresponding to each data value in needs to be same as that in . This ensures the strings are chosen from the same domino of the PCP instance.
-
(b)
The attribute corresponding to each data value in needs to be same as the attribute in . This ensures the concatenation of strings chosen from List 1 and List 2 are the same.
-
(c)
If any of the corresponding data values in and have different attributes associated with them, then the input word is not a true solution of the PCP instance and the SAFA accepts the input word. We know describe how the SAFA finds that the input word is not a correct solution for and above. Based on the data value in the SAFA nondeterministically checks whether the same data value in has the same attribute associated with it. If they are not the same, the input word is accepted. The SAFA after making the guess stores the data value in a set and also remembers in its state space the attribute it encountered with the data value. When it again comes across the data value in , it checks the associated attribute. It accepts the input word if they are not the same.
-
(a)
The above mentioned nondeterministic SAFA accepts an input word if and only if the input data word is not in the correct format or it is not a solution to the PCP instance. Therefore if the SAFA does not accept an input word, i.e. its not universal, then it implies that the input word is in the required format and is also a solution to the PCP instance, implying that the PCP instance has a solution.
If the PCP instance has a solution then that solution can be represented in the correct format, and the SAFA does not accept the input word which represents such a solution to the PCP instance is in the correct format. Therefore, the SAFA is not universal.
Hence, universality of SAFA is undecidable.
4.2 Closure Properties
We now study the closure properties. We first study the closure properties of SAFA followed by those of DSAFA.
4.2.1 Closure Properties of SAFA
We start with the Boolean closure properties. We show that SAFA are closed under union, but not under intersection and complementation.
Lemma 4.17.
SAFA are closed under union.
Proof 4.18.
We show here that SAFA models are closed under union. Union of two SAFA models can be obtained by superimposing their start states together. Let us consider two SAFA and . The SAFA which accepts the language is constructed as follows: , if and , otherwise , . All transitions in and are in . Additionally, for every transition from state to any state in such that , with and , , the transition is included in from state to state . Similarly, for every transition from state to a state in such that , , and , , the transition is included in from state to state . The automaton on an input data word nondeterministically decides to which of and the input word belongs. If the input word is accepted by either of the automata, it is accepted by . If an input word is not accepted by both the automata, it is rejected. Thus, the SAFA accepts the language .
Lemma 4.19.
SAFA are not closed under intersection.
Proof 4.20.
Consider the language . We show that there exists no SAFA which accepts . We prove by contradiction. Assume that there exists a SAFA with such that accepts . Then must accept the following word where and are all distinct. In order to accept , the SAFA must go through a sequence of transitions to completely consume and end in an accepting state. Here consumes the data element , and consumes the data element and . Two cases are possible:
-
•
There is a transition say in consuming the data element of where and is of the form or where . The SAFA using the same sequence of transitions can accept another data word where is replaced by such that for . It is always possible to get such a data value as is finite but is countably infinite. Therefore, at the time of executing the data value is not present in and executes successfully. Recall that the data values in that follow are different from . The data values in that follow the data value are not equal to . Therefore whether has been inserted to any set or not while executing does not impact the successful execution of the transitions in that follow . Now, in there is a data value associated with attribute which is not associated with attribute , thus .
-
•
All transitions in following are of the form or where . The number of transitions in that follow is greater than . Hence, by pigeon hole principle, there must be two transitions and where which have the same condition for some . The SAFA using the same sequence of transitions can accept another data word where is replaced by . The SAFA when executing on can successfully consume the data element instead of . This is because and have the same condition and is already present in when is executed. Note that the data values in that follow the execution of are not equal to . Therefore whether has been inserted to any set or not does not impact the successful execution of the transitions in that follow .
Using a pumping argument, we show that these automata are not closed under complementation.
Lemma 4.21.
Let . Then there exists a SAFA with states that accepts such that every data word of length at least can be written as and corresponds to the sequence of transitions that takes to accept , where , , is the sequence of transitions that takes to read , , respectively, and denotes the transition of the transition sequence with , satisfying the following:
-
•
-
•
-
•
for all , for all words such that is the sequence of transitions that takes to accept and , .
-
–
if has then the datum of , , .
-
–
if has then the datum of , , .
-
–
if has then the datum of , .
-
–
if has , then the datum of , .
-
–
-
•
.
Proof 4.22.
Since , there exists a SAFA , with say states that accepts . As and , the sequence of states that traverses to accept must contain a cycle. Let us take the first such cycle and call it . Let the sequence of transitions that executes to traverse the cycle be and the infix of read along be . Let be the sequence of transitions that traverses before entering the first cycle and the prefix of read along be . Let be the sequence of transitions that traverses after exiting the cycle to reach a final state in , and let the suffix of read along be . Therefore, and the sequence of transitions that traverses to accept is say . Moreover since is the first such cycle, we have and . Now, consider a sequence of transitions , then , where but may or may not be equal to each other. Since , the sequence of transitions must have at least one transition. The transition in with can always be executed successfully because the SAFA when executing can always read a new data value which it has not read till executing . We can always find such data values as is finite whereas is countably infinite. The transition in with is executed successfully when consuming , since, is consumed successfully by when accepting . As, is consumed successfully by , the set corresponding to is non-empty as there are no removal operations in SAFA. Therefore, after consuming , every time the sequence is executed, is also executed successfully.
The sequence is executed successfully due to same reasons as . Thus, accepts a data word .
Lemma 4.23.
SAFA are not closed under complementation.
Proof 4.24.
To show SAFA are not closed under complementation, we first define the following functions. The function gives the number of times data value is present in a data word and gives the number of data values with in . We consider the language , which is the language of data words where there exists a data value such that . Example 3.4 shows a SAFA that accepts this.
Consider the complement language wherein all data values occur exactly twice. Using Lemma 4.21 we show no SAFA can accept .
The proof is by contradiction. Suppose that there exists a SAFA with n states accepting . Let be a data word such that and .
For every decomposition of as and sequence of transitions that takes to accept with , we have a such that . Since , we have that must have either a transition with for some or a transition with for some or both. If has , then the first time is executed while consuming , assume that it consumes a data value . It is able to consume the data value as it is already inserted in before is executed. Now if after consuming the word , is executed again, then when executing the transition it can again consume the same data value as before. So, every time is executed, the SAFA will consume the data value while executing the transition . After executing three times, the SAFA executes the transition sequence . All the transitions with in can be executed successfully with the same data value that they consumed when accepted because and both have the same prefix . The transitions with in consume data values that had not encountered prior to executing these transitions. Thus, if has a transition with for some , then accepts the data word where there exists a data value with .
If has a transition, say with for some , then every time is executed after consuming , the SAFA when executing can always read a new data value which it has not read till executing and that it will not read later. We can always find such data values as is finite whereas is countably infinite. The sequence is executed successfully due to same reasons as before. Thus, if has a transition with , then accepts a data word where . Note that the data value consumed by when taking the transition while reading may already be present in the prefix being read by the sequence of transitions prior to taking . Therefore, .
Note that from the above construction, we see that SAFA with are not closed under complementation. We observe that singleton SAFA are closed under union but not under intersection. From the hierarchy theorem (Theorem 3.5), we see singleton SAFA cannot accept but singleton SAFA can accept and . Thus, singleton SAFA are not closed under intersection, and hence also not closed under complementation.
Theorem 4.25.
SAFA are closed under concatenation.
Proof 4.26.
SAFA are shown to be closed under concatenation in a manner similar to NFA. Note that if the sets used in the two input automata and be and respectively, then the automaton accepting the language obtained as a result of concatenation uses the sets .
Theorem 4.27.
SAFA are not closed under Kleene’s closure.
Proof 4.28.
Consider the language . The language can be accepted by a DSAFA (see Figure 8). The Kleene’s closure of is the language , i.e., is the set of all data words where every data value appears in pairs. We show that there exists no SAFA which accepts . We prove by contradiction. Assume that there exists a SAFA with such that accepts . Then must accept the following word where and are all distinct. In order to accept , the SAFA must go through a sequence of transitions to completely consume and end in an accepting state. Here consumes the first data item of the data value pair and consumes the second data item of the data value pair and .
-
•
The transitions must be of the form or where . This is essential because if is of the form or then instead of consuming it can also consume successfully a new data value which is not present in . It is always possible to get such a data value as is countably infinite. The SAFA will then accept the data word which is not in .
-
•
The transitions must be of the form where . This is essential because if is of the form or , then will fail to consume the first instance of the data value , since has not encountered prior to the transition as all data value pairs in are distinct and therefore is not present in any set. The transition cannot be of the form because the following transition which is of the form or where will not be executed successfully. Note that for the transition to consume the data value , the data value must be inserted in a set when it was first encountered.
-
•
Since the number of distinct data values in is more than the number of sets in and all the distinct data values are inserted in the sets in , by pigeon hole principle, there are two distinct data values and , with which are inserted in the same set . Thus, if accepts the data word then also accepts the data word which is not in .
Thus, SAFA are not closed under Kleene’s closure.
Theorem 4.29.
SAFA are not closed under reversal.
Proof 4.30.
Consider the language which can be accepted by a DSAFA (see Figure 9). The reversal of is the language i.e. is the set of all data words where for every attribute there is an attribute which comes after it and whose data value is same as that of attribute . We show that there exists no SAFA which accepts . We prove by contradiction. Assume that there exists a SAFA with such that accepts . Then must accept the following word where and are all distinct. In order to accept , the SAFA must go through a sequence of transitions to completely consume and end in an accepting state. Here consumes the data item and consumes the data item of .
-
•
The transitions must be of the form or where . This is essential because if is of the form or then instead of consuming it can also consume successfully a new data value which is not present in . It is always possible to get such a data value as is countably infinite. The SAFA will then accept the data word which is not in .
-
•
The transitions must be of the form where . This is essential because if is of the form or , then will fail to consume the first instance of the data value , since has not encountered prior to the transition as all data values in associated with attribute are distinct and therefore is not present in any set. The transition cannot be of the form because there is a following transition which is of the form or where that will not be executed successfully. Recall that for the transition to consume the data value , the data value must be inserted in a set when it was first encountered.
-
•
Since the number of distinct data values in associated with attribute are more than the number of sets in and all the distinct data values associated with attribute are inserted in the sets in , by pigeon hole principle, there are two distinct data values and , with which are inserted in the same set . Thus, if accepts the data word then also accepts the data word
which is not in .
Thus, SAFA are not closed under reversal.
Theorem 4.31.
SAFA are not closed under homomorphism.
Proof 4.32.
Consider the language where the data values are taken from the set of natural numbers and . The homomorphism function is , for all and . The language is a language of data words where every data value occurs exactly twice and also consecutively. There exists no SAFA which accepts . The proof is by contradiction. Suppose that there exists a SAFA with n states accepting . Let be a data word such that and .
For every decomposition of as and sequence of transitions that takes to accept with , we have a such that . Since , we have that must have either a transition with for some or a transition with for some or both. If has , then the first time is executed while consuming , assume that it consumes a data value . It is able to consume the data value as it is already inserted in before is executed. Now if after consuming the word , is executed again, then when executing the transition it can again consume the same data value as before. So, every time is executed, the SAFA will consume the data value while executing the transition . After executing three times, the SAFA executes the transition sequence . All the transitions with in can be executed successfully with the same data value that they consumed when accepted because and both have the same prefix . The transitions with in consume data values that has not encountered prior to executing these transitions. Thus, if has a transition with for some , then accepts the data word where there exists a data value with .
If has a transition, say with for some , then every time is executed after consuming , the SAFA when executing can always read a new data value which it has not read till executing and that it will not read later. We can always find such data values as is finite whereas is countably infinite. The sequence is executed successfully due to same reasons as before. Thus, if has a transition with , then accepts a data word where . Note that the data value consumed by when taking the transition while reading may already be present in the prefix being read by the sequence of transitions prior to taking . Therefore, .
Theorem 4.33.
SAFA are not closed under inverse homomorphism.
Proof 4.34.
Consider the language . There exists a SAFA which accepts . The data values are taken from the set of natural numbers and . The homomorphism function is , , for all . The language is a language of data words where the data word is or data words only having data value 1 present in it. Since SAFA cannot be initialized \chadded[id=KC], SAFA cannot identify that it has seen the data value 1 as it does not have the data value 1 in any of its sets at the time of beginning the computation. Hence, no SAFA can recognise .
4.2.2 Closure properties of Deterministic SAFA:
Here we discuss deterministic SAFA and compare their expressiveness with SAFA. Using standard complementation construction as in deterministic finite automata (DFA), by changing non-accepting states to accepting and vice versa, we can show that DSAFA are closed under complementation. Moreover, as deterministic SAFA are closed under complementation but not under intersection hence, it follows that they are also not closed under union. Since the languages used to show non-closure of SAFA under Kleene’s closure, homomorphism, and inverse homomorphism are accepted by DSAFA, we have that DSAFA are also not closed under Kleene’s closure, homomorphism and inverse homomorphism.
Theorem 4.35.
DSAFA are not closed under concatenation.
Proof 4.36.
Consider the language with and . Both and can be accepted by DSAFA. The concatenation of and is the language . i.e. is the set of all data words that ends with a pair of same data values and all other data values present in the data word other than the data value in the pair are distinct. The frequency of the data value in the pair is either two or three. We show that there exists no DSAFA which accepts . We prove by contradiction. Assume that there exists a DSAFA with such that accepts . Then must accept the following word where and are all distinct. In order to accept , the DSAFA must go through a sequence of transitions to completely consume and end in an accepting state. Similarly, the DSAFA must go through a sequence of states to accept where , and is the initial state. Here consumes the data item of the input data word and takes the DSAFA from state to .
-
•
The last transition must be of the form or where . This is essential because if is of the form or then instead of consuming which is the second last data value in repeated in the last position, it can also consume successfully a new data value which is not present in . It is always possible to get such a data value as is countably infinite. The DSAFA will then accept the data word which is not in . If the last transition is of the form or then the second last transition must be of the form where . The second last transition cannot be of the form or because it consumes the data value in , which was first encountered by when executing transition . Hence, the data value cannot be present in any set prior to executing . The transition and consume the same data value. Moreover, is of the form or , therefore to execute successfully the data value must be present in the set . The data value was first encountered when executing , hence must insert the data value into the set .
-
•
The transitions must be of the form or where . This is essential because if where is of the form or , then will fail to consume the first instance of the data value , since has not encountered prior to the transition as all data values in except the last data value are distinct and therefore is not present in any set.
-
•
Since , there exists at least a state in which is repeated in the sequence .
-
–
In the sequence of states only the last state can be an accepting state, no other intermediate state can be accepting because then the DSAFA will accept a data word which is not in . Therefore, the accepting state of cannot be a repeated state in .
-
–
Let us assume that the state just prior to the final accepting state , i.e. the state in is one such repeated state, i.e. suppose it is same as the state in . Then, can execute the sequence of transitions and will accept the data word
where , all of them are distinct and also different from data values . We can always find such data values as is countably infinite. The DSAFA accepts the data word due to the following reasons:
-
*
The SAFA accepts by executing the sequence of transitions . The transition in is of the form or , therefore, for SAFA , to execute the transition and consume the data value , the data value must already be present in the set . The data value is first encountered while executing transition therefore inserts the data value in .
-
*
In consuming , the DSAFA executes the transition in its transition sequence, so the data value is already present in set which uses to execute the transition as the last transition in the sequence of transitions to consume . The data word is not in as the last two data values are not the same.
Hence the state cannot be a repeated state in .
-
*
-
–
Let us assume a state where and is one such repeated state, i.e. suppose it is same as the state in . The transitions in are of the form or . The SAFA is a DSAFA and in a DSAFA only two possible transitions can come out of a state: one with and the other with . In the state the transition of the form or takes to a state which is inside the cycle . Only other transition allowed in is a transition of the form or that takes it to a state which is not in the cycle. The state is not a final state or a state prior to a final state. Therefore there must exist a transition from in as is an accepting sequence of transitions which takes DSAFA from to such that is no longer in the cycle . The DSAFA moves towards the accepting state which is not present in the cycle by executing . In order to go from to , the DSAFA must execute a transition of the form or as it is the only other transition available in state which is a contradiction as all transitions in coming out of states which are not accepting states or states prior to accepting states are of the form or .
-
–
Hence, DSAFA are not closed under concatenation.
Thus, we get the following theorem.
Theorem 4.37.
DSAFA are closed under complementation but not under union, intersection, concatenation, Kleene’s closure, reversal, homomorphism and inverse homomorphism.
Table 1 provides a summary of the closure properties of SAFA and DSAFA. We use the symbols for union, for intersection, for complement, for homomorphism and for inverse homomorphism respectively.
. | ||||||||
---|---|---|---|---|---|---|---|---|
SAFA | ||||||||
DSAFA |
We now show that the class of languages accepted by DSAFA is strictly contained in the class of languages accepted by SAFA.
Theorem 4.38.
.
Proof 4.39.
Recall from Example 3.4 that the language . On the other hand, we show in the proof of Lemma 4.23 that there does not exist a SAFA accepting its complement language . This implies that the language cannot be accepted by a DSAFA since DSAFA are closed under complementation. The result follows since every deterministic SAFA is a SAFA.
Lemma 4.40.
For Every DSAFA accepting we can always get a DFA accepting where is the language expressed by the regular expression which has the same number of states as the DSAFA.
Proof 4.41.
Consider the language . Let be a DSAFA which accepts . We can always obtain a DSAFA with less or equal number of states as and transitions only of the form where and which accepts in the following manner:
-
•
We can remove transitions of the form or where and from to construct .
-
•
If there are transitions of the form where and in we convert it to in .
Thus all transactions in are of the form . Observe, that remains a DSAFA. The removal of or from may result in some unreachable states in which can be removed. accepts due to the following reason:
Consider the case that has a sequence of transitions where there is at least one transition of the form which results in accepting a data word in . The data word has where L(r) is the regular language expressed by the regular expression and has at least one data value which is repeated. But as accepts it should also accept the data word in where and have data values which are all distinct. The sequence of transitions which accepts consists of transitions with where only. The sequence of transitions are valid also for as retains all transitions with and removes any insert operation if there are any from . This sequence of transitions accepts both and in . Given such a DSAFA , we can get a DFA which accepts by converting the transitions labelled to in . The number of states in the DFA is same as the number of states of the DSAFA .
Lemma 4.42.
Every DSAFA accepting has at least as many states as the smallest DFA accepting where is the language expressed by the regular expression .
Proof 4.43.
Let us assume there exists a DSAFA which accepts with states and there is a minimized DFA which accepts where is the language expressed by regular expression with states such that .
From Lemma 4.40 we see that there exists a DFA which accepts with number of states. Thus, is no longer the minimized DFA, which is a contradiction.
Theorem 4.44.
There exists a language which is accepted by a non-deterministic SAFA with states but a DSAFA will require at least states to accept the same language .
Proof 4.45.
From Lemma 4.42 we see that the number of states of any DSAFA that accepts must be greater than or equal to the number of states of the minimum DFA that accepts . Now consider a minimum NFA which accepts L(r). We can obtain a nondeterministic SAFA which accepts the language using the NFA by replacing the transitions of the NFA which are of the form where by the transition . The number of states of the nondeterministic SAFA is the same as that of the NFA . We know that there exists an NFA accepting such that every DFA accepting the same language has size at least exponential in the size of the NFA. Now for one such , the number of states required for any DSAFA to accept is exponential in the number of states required to accept by a nondeterministic SAFA.
5 Expressiveness
Let and be the set of all languages accepted by -register automata and CMA respectively. We compare the computational power of SAFA with -register automata. We show that and are incomparable. Although and are incomparable, SAFA recognize many important languages which -register automata also recognize such as : wherein the first data value is repeated, : wherein attribute is associated with more than two distinct data values. On the other hand, -register automata fail to accept languages where we have to store more than data values such as , : wherein attribute is associated with an even number of distinct data values [20]. SAFA can accept both these data languages. We also show below that there are languages such as : the language of data words, each of which contains the data value associated with some attribute at some position in the data word, that can be accepted by a -register automaton but not by SAFA.
Example 5.1.
A -register automaton can accept the language . Consider the 2-register automaton with , , , , , . The transition relation is defined as:
. The automaton accepts . For an input word , the automaton checks whether the current data value of under the head of is equal to the content of register , which holds the data value from the time of initialization of the registers. If it is equal, the automaton goes to state and consumes the word. Since is a final state, the word is accepted. If the data value is not present in , the automaton remains in state and rejects the input word. ∎
Theorem 5.2.
and are incomparable.
Proof 5.3.
We first show by contradiction that no SAFA can accept . Suppose there exists a SAFA which accepts . Now consider a word where the data value has occurred at the first position only. Let the sequence of transitions that goes through to accept be . The first transition that executes cannot be a transition with , because is the first transition of and there cannot be any insertion to set prior to it. Therefore, is a transition with , and so can consume any other data value which is not present in . As does not occur anywhere else in , it is safe to say that all other transitions in with , have data values other than present in their respective sets. Thus, if accepts , where and does not have value in it, then also accepts , and does not have data value in it, which is a contradiction. From Example 5.1, Example 3.3 and the fact that -register automata cannot accept [20] and the above, we conclude and are incomparable.
[id=KC]If we equip SAFA with initialization, that is the sets of SAFA can be initialized prior to the beginning of computation, then SAFA can accept the language . Even then, -register automata and SAFA with initialization are incomparable as shown below.
Example 5.4.
A -register automaton can accept the language . Consider the -register automaton with , , , , , ,
The transition relation is defined as: . The automaton accepts . For an input word , the automaton in state reads the first data value of a data pair and inserts it into register and goes to state . In state if the second data value of the data pair is same as the first, the automaton goes back to state to read the next data value pair. Otherwise, the automaton goes to a dead state and rejects the input word . ∎
Let denote the set of all languages accepted by nondeterministic SAFA with initialization. We have the following.
Theorem 5.5.
and are incomparable.
Proof 5.6.
Consider the language i.e. is the set of all data words where every data value appears in pairs. We show that there exists no SAFA with initialization which accepts . We prove by contradiction. Assume that there exists a SAFA with and whose sets can be initialized such that accepts . Then must accept the following word where and are all distinct. In order to accept , the SAFA must go through a sequence of transitions to completely consume and end in an accepting state. Here consumes the first data item of the data value pair and consumes the second data item of the data value pair and .
-
•
The transitions must be of the form or where . This is essential because if is of the form or then instead of consuming it can also consume successfully a new data value which is not present in . It is always possible to get such a data value as is countably infinite. The SAFA will then accept the data word which is not in .
-
•
The transitions must either insert the data values it reads in to a set in or if the data value is already present in a set during initialization it may or may not insert it. Recall that for the transition to consume the data value , the data value must be inserted in a set when it was first encountered or it must already be present during initialization.
-
•
Since the number of distinct data values in are more than the number of sets in and all the distinct data values are inserted in the sets in , by pigeon hole principle, there are two distinct data values and , with which are inserted in the same set . Thus, if accepts the data word then will also accept the data word which is not in .
Also we note that both SAFA and -register automata have the same complexity for the nonemptiness and the membership problems. Similar to SAFA, CCA and CMA also accept data languages such as , which -register automata cannot, but their decision problems have higher complexity [5, 19]. We can show that the class of languages accepted by SAFA is strictly contained in the class of languages accepted by CCA.
Theorem 5.7.
Proof 5.8.
For every SAFA with accepting a language , we can construct a -bag CCA which accepts the same language in the following manner:
-
•
The set of states, the set of final states and the initial state are the same for both the SAFA and the -bag CCA A.
-
•
The transitions in the transition relation of SAFA are mapped to transitions in the transition relation of the -bag CCA as follows:
-
–
For every transition of the form in the transition relation of SAFA which takes the SAFA from state to state where , and , we have the transition where is and all other are , for all in the transition relation of the -bag CCA A. If a data value is read on the transition in SAFA , then CCA A checks if denoting that the -bag CCA has seen the data value before and it has been recorded in the bag.
-
–
For every transition of the form in the transition relation of SAFA which takes the SAFA from state to state where , and , we have the transition where is and all other are , for all in the transition relation of -bag CCA. If a data value is read on the transition in SAFA , then CCA A checks if denoting that the -bag CCA has either not seen the data value or it has not been recorded in the bag.
-
–
For every transition of the form in the transition relation of the SAFA which takes the SAFA from state to state where , and , we introduce the transition where is and all other are , and all other for all in the transition relation of the -bag CCA. Similar to SAFA inserting the data value in the set , the -bag CCA records the data value by setting while all the values for all other bags remain unchanged.
-
–
For every transition of the form in the transition relation of SAFA which takes the SAFA from state to state where , and , we introduce the transition where is and all other are , and all other for all in transition relation of the -bag CCA. Similar to SAFA inserting the data value in the set , the -bag CCA records the data value by setting .
-
–
The -bag CCA simulates the SAFA . Given a data word accepted by the SAFA , there is a sequence of transitions that takes SAFA from an initial state to a final state. The -bag CCA simulating the SAFA can replicate the corresponding sequence of transitions to accept . Given a data word not accepted by the SAFA , there is no sequence of transitions that takes the SAFA from an initial state to a final state. The -bag CCA simulating the SAFA also does not have any sequence of transitions which takes it from an initial state to a final state and thus the -bag CCA also rejects .
We know from [19], that for every -bag CCA there exists a one bag CCA which accepts the same language. Therefore for every SAFA there exists a CCA which accepts the same language. Now, we show that CCA are more expressive than SAFA. Moreover, SAFA accepts the languages and but not the language , and there exists a CCA for every language accepted by a SAFA, the languages . Since CCA are closed under intersection [19], we have that . Therefore, .
[id=KC]We know from [19] that and from the above result, we get the following theorem. \chdeleted[id=KC]We can show that the class of languages accepted by SAFA is strictly contained in the class of languages accepted by CMA.
Corollary 5.9.
.
[id=KC]Proof sketch. A state of a CMA simultaneously keeps track of the state of the SAFA that it wants to simulate and also to which sets a particular data value has been inserted in . Since both the number of states and the number of sets in SAFA are finite, therefore, the number of states in the CMA is also finite. The number of states of the CMA simulating a SAFA is exponential in the size of the SAFA since a state of the CMA keeps the information of a subset of storing a data value. Further, one can show that a CMA can accept the language , and from Lemma 4.23, we know that no SAFA can accept . ∎
6 Conclusion
In this paper, we introduce set augmented finite automata which use a finite set of sets of data values to accept data languages. We have shown examples of several data languages that can be accepted by our model. We compare the language acceptance capabilities of SAFA with -register automata, CCA, and CMA. The computational power and low complexity of nonemptiness and membership of SAFA makes it an useful tool for modeling and analysis of data languages. We show that similar to register automata, the universality problem for SAFA is undecidable. We also study the deterministic variant of SAFA which are closed under complementation, and hence have decidable universality. We believe our model is robust enough and can also be extended to infinite words. This model opens up some interesting avenues for future research. We would like to explore augmentations of SAFA with Boolean combinations of tests and the feature of updating multiple sets simultaneously. This may lead to having a more well-behaved model with respect to closure properties.
References
- [1] Jean-Michel Autebert, Joffroy Beauquier, and Luc Boasson. Langages sur des alphabets infinis. Discrete Applied Mathematics, 2(1):1–20, 1980.
- [2] Christel Baier and Joost-Pieter Katoen. Principles of model checking. MIT press, 2008.
- [3] Ansuman Banerjee, Kingshuk Chatterjee, and Shibashis Guha. Set augmented finite automata over infinite alphabets. In Frank Drewes and Mikhail Volkov, editors, Developments in Language Theory - 27th International Conference, DLT 2023, Umeå, Sweden, June 12-16, 2023, Proceedings, volume 13911 of Lecture Notes in Computer Science, pages 36–50. Springer, 2023.
- [4] Alexis Bès. An application of the feferman-vaught theorem to automata and logics for words over an infinite alphabet. Logical Methods in Computer Science, 4, 2008.
- [5] Henrik Björklund and Thomas Schwentick. On notions of regularity for data languages. Theoretical Computer Science, 411(4-5):702–715, 2010.
- [6] Mikolaj Bojanczyk, Anca Muscholl, Thomas Schwentick, Luc Segoufin, and Claire David. Two-variable logic on words with data. In 21st Annual IEEE Symposium on Logic in Computer Science (LICS’06), pages 7–16. IEEE, 2006.
- [7] Benedikt Bollig. An automaton over data words that captures emso logic. In International Conference on Concurrency Theory, pages 171–186. Springer, 2011.
- [8] Edward YC Cheng and Michael Kaminski. Context-free languages over infinite alphabets. Acta Informatica, 35(3):245–267, 1998.
- [9] W. Czerwiński and L. Orlikowski. Reachability in vector addition systems is ackermann-complete. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), pages 1229–1240, 2022. doi:10.1109/FOCS52979.2021.00120.
- [10] Jürgen Dassow and György Vaszil. P finite automata and regular languages over countably infinite alphabets. In International Workshop on Membrane Computing, pages 367–381. Springer, 2006.
- [11] Stéphane Demri and Ranko Lazić. LTL with the freeze quantifier and register automata. ACM Transactions on Computational Logic (TOCL), 10(3):1–30, 2009.
- [12] Diego Figueira. Alternating register automata on finite words and trees. Logical Methods in Computer Science, 8, 2012.
- [13] Orna Grumberg, Orna Kupferman, and Sarai Sheinvald. Variable automata over infinite alphabets. In International Conference on Language and Automata Theory and Applications, pages 561–572. Springer, 2010.
- [14] Radu Iosif and Xiao Xu. Abstraction refinement for emptiness checking of alternating data automata. In International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 93–111. Springer, 2018.
- [15] Neil D Jones. Space-bounded reducibility among combinatorial problems. Journal of Computer and System Sciences, 11(1):68–85, 1975.
- [16] Michael Kaminski and Nissim Francez. Finite-memory automata. Theoretical Computer Science, 134(2):329–363, 1994.
- [17] Michael Kaminski and Tony Tan. Regular expressions for languages over infinite alphabets. Fundamenta Informaticae, 69(3):301–318, 2006.
- [18] S. R. Kosaraju. Decidability of reachability in vector addition systems (preliminary version). In Harry R. Lewis, Barbara B. Simons, Walter A. Burkhard, and Lawrence H. Landweber, editors, Proceedings of the 14th Annual ACM Symposium on Theory of Computing, May 5-7, 1982, San Francisco, California, USA, pages 267–281. ACM, 1982.
- [19] Amaldev Manuel and Ramaswamy Ramanujam. Class counting automata on datawords. International Journal of Foundations of Computer Science, 22(04):863–882, 2011.
- [20] Amaldev Manuel and Ramaswamy Ramanujam. Automata over infinite alphabets. In Modern applications of automata theory, pages 529–553. World Scientific, 2012.
- [21] E. W. Mayr. An algorithm for the general petri net reachability problem. SIAM J. Comput., 13(3):441–460, 1984.
- [22] Frank Neven. Automata, logic, and xml. In International Workshop on Computer Science Logic, pages 2–26. Springer, 2002.
- [23] Frank Neven, Thomas Schwentick, and Victor Vianu. Finite state machines for strings over infinite alphabets. ACM Transactions on Computational Logic (TOCL), 5(3):403–435, 2004.
- [24] Hiroshi Sakamoto and Daisuke Ikeda. Intractability of decision problems for finite-memory automata. Theoretical Computer Science, 231(2):297–308, 2000.
- [25] Tony Tan. On pebble automata for data languages with decidable emptiness problem. Journal of Computer and System Sciences, 76(8):778–791, 2010.