11email: kzhou23@uic.edu
Query Learning Bounds for Advice and Nominal Automata
Abstract
Learning automata by queries is a long-studied area initiated by Angluin in 1987 with the introduction of the algorithm to learn regular languages, with a large body of work afterwards on many different variations and generalizations of DFAs. Recently, Chase and Freitag introduced a novel approach to proving query learning bounds by computing combinatorial complexity measures for the classes in question, which they applied to the setting of DFAs to obtain qualitatively different results compared to the algorithm. Using this approach, we prove new query learning bounds for two generalizations of DFAs. The first setting is that of advice DFAs, which are DFAs augmented with an advice string that informs the DFA’s transition behavior at each step. For advice DFAs, we give the first known upper bounds for query complexity. The second setting is that of nominal DFAs, which generalize DFAs to infinite alphabets which admit some structure via symmetries. For nominal DFAs, we make qualitative improvements over prior results.
Keywords:
Advice automata Nominal automata Query learning1 Introduction
Learning various forms of automata with queries is a well-studied class of problems with many applications, including in AI, automatic verification, and model checking. The field was initiated by Angluin in 1987 with the introduction of the algorithm that learns deterministic finite automata (DFAs) using a polynomially bounded number of equivalence and membership queries [1]. The algorithm has seen many applications and has also inspired generalizations to other types of automata, such as tree automata [22], nondeterministic finite automata [6], -automata [2], symbolic automata [9], and fully-ordered lattice automata [11]. The setting of learning with queries is especially well-suited for learning automata—in real-world applications where the task is to identify the behavior of some black-box system that can be modelled by an automaton, queries can be simulated by interacting with the system and observing the output.
Historically, the study of query learning of automata has centered around adapting Angluin’s algorithm to different settings. We do not do this, and instead build on a method introduced by Chase & Freitag [8], who give query learning bounds in terms of Littlestone dimension and consistency dimension, which are complexity measures for concept classes. One can then prove a query learning bound in a given setting by computing the Littlestone and consistency dimensions. This approach seems to be particularly effective for learning automata, since many types of automata exhibit form of Myhill-Nerode characterization, which can be used to bound the consistency dimension. For example, Chase and Freitag apply this method for regular languages and regular -languages and obtain qualitatively different results from prior work. Our work serves as further proof of concept of this method.
Our work studies the learnability of two generalizations of DFAs. The first is advice DFAs, which were studied as early as 1968 by Salomaa [23], though we follow the notation of Kruckman et al. [15]. Advice DFAs generalize classical DFAs by allowing the automata access to an additional advice string that it reads concurrently with the input, and are useful in modeling situations where the transition behavior can vary over time. They are also of interest to the logic community: it is a classical result that DFAs correspond to weak monadic second-order formulas over the structure of natural numbers with the successor operation; advice DFAs correspond to formulas over expansions of this structure by unary predicates, a frequently studied setting; see e.g. [10, 7, 21, 4]. Another motivating factor for advice DFAs comes from the study of automatic structures, which are structures whose domain and atomic relations are recognized by DFAs. It turns out some natural structures are not automatic or even isomorphic to an automatic structure, such as , the additive group of the rationals [24]. However, is isomorphic to a structure whose domain and atomic relations are recognized by advice DFAs [18, 15].
The second setting we consider is that of nominal DFAs, introduced by Bojańczyk, Klin, and Lasota [5], which generalize DFAs to infinite alphabets. Generalizing formal language theory to infinite alphabets is highly non-trivial, since without any restrictions, the fact that there are uncountably many subsets of any infinite set makes computation intractable. However, in most reasonable applications, there is additional structure that can be leveraged to make computation reasonable, and nominal DFAs utilize the notion of nominal sets (first introduced by Gabbay & Pitts [12]) to formalize this idea. Aside from the development of the theory of automata over infinite alphabets, nominal sets have also found much use in the concurrency and semantics communities as a formalism for modeling name binding (see e.g. [17, 19]).
1.0.1 Organization of Paper and Summary of Results
In Section 2, we present the basic definitions and notation necessary for the remainder of the paper.
In Section 3, we study query learning of advice DFAs, and give the first known bound for the query complexity of advice DFAs. Our result for advice DFAs is as follows: let be the set of languages over an alphabet of size recognized by an advice DFA on at most states, restricted to strings of length at most . {restatable*}theoremadviceLC The (EQ+MQ)-query complexity of with queries from is .
In Section 4, we study query learning of nominal DFAs.
Query learning of nominal DFAs was previously studied by Moerman et al. [16].
We give a qualitative improvement over this previous result; a detailed comparison is given in the discussion below.
Our result for nominal DFAs is as follows: given a -alphabet , let denote the set of -languages recognized by a nominal DFA with at most orbits and nominal dimension at most .
{restatable*}theoremnominalLCfixedalphabet
For a fixed -alphabet , the (EQ+MQ)-query complexity of
with queries from is at most .
1.0.2 Discussion & Related Work
The approach of Chase & Freitag of deriving query complexity bounds by computing bounds on various combinatorial notions of dimension is a fairly novel one, and in [8] they illustrate its usefulness by easily proving polynomial query learning bounds for regular languages. The approach gives a qualitatively different bound compared to the one given by the algorithm—their results have no dependence on the length of the longest counterexample returned by the oracle (while the algorithm does), but a worse dependence on the number of states. They also do not give any computational complexity guarantees—their algorithm requires computing the Littlestone dimension and finding witnesses to the consistency dimension, which in general may be hard computational problems.
As mentioned, in the setting of advice DFAs, we give the first known bounds on the query complexity. The restriction on the length of the strings in is necessary to make the problem tractable and can be thought of as giving a dependence on the length of the longest counterexample for equivalence queries. It may be of interest to see if there is an analogous version of the algorithm for advice DFAs, and if so, how the bounds with that approach compare to ours.
The prior best known bound for query complexity of nominal DFAs is given by Moerman et al. in [16], using a generalization of Angluin’s algorithm to nominal DFAs. The bound that they derive is a complicated quantity (see [16, Corollary 1]), but in particular we observe the following: if the target automaton has orbits and nominal dimension , is the nominal dimension of the (fixed) alphabet, and is the length of the longest counterexample returned by the oracle, then their bound is lower bounded by both
-
1.
and
-
2.
.
These are not explicitly stated in the original work, but follow with some additional work (see 4).
Our result improves on (a) since our bound does not depend on the length of the longest counterexample, and improves the asymptotic dependence on compared to (b) (polynomial in with respect to , instead of factorial in ). This is especially important in light of 1, which says that in this setting, is at most a constant multiple of . However, as with Chase & Freitag’s result on classical DFAs, we give no computational complexity guarantees.
2 Preliminaries
2.1 Query learning
Let be a set (the instance space). A concept is a function , and a concept class on is a nonempty set of concepts. We note that a concept is sometimes equivalently defined as a subset of , but for our purposes it will be easier to work with functions. Fix a concept , which we call the target, and another concept class , which we call the hypothesis class. An equivalence query (EQ) consists of a hypothesis , to which the oracle answers yes if , or with a counterexample for which . A membership query (MQ) consists of an element , to which the oracle responds with the value of . The learning procedure proceeds interactively in rounds: in each round, the learner poses a query to the oracle, and the oracle responds with the corresponding answer. The learner is allowed to choose queries based on the responses to previous queries, and succeeds if they submit the target as an equivalence query. In EQ-learning, the learner is only allowed to submit equivalence queries, while in (EQ+MQ)-learning, the learner can use both equivalence and membership queries.
Definition 1 (Query Complexity)
Let be two concept classes. The EQ-query complexity of with queries from is defined to be the least such that there is an algorithm for the learner to submit equivalence queries from with the property that for any , the learner can identify within at most queries, or if no such exists.
The (EQ+MQ)-query complexity of with queries from is defined in the same way, except that the learner is allowed to also use membership queries.
A significant stream of prior work studied bounds for the query complexity of with hypotheses from in terms of combinatorial complexity measures of and [14, 3, 8, 13]. These bounds are formulated in terms of the Littlestone dimension, consistency dimension, and strong consistency dimension (also known as the dual Helly number), which we define now.
A binary element tree is a complete binary tree whose internal nodes are labeled by elements of . A binary element tree is shattered by if there is a way to label all the leaves of with elements of such that the following condition holds: given a leaf node labeled by , for each internal node above labeled by , if and only if the (unique) path from the root to goes through the left child of .
An example of a (labelled) binary element tree of height 2 is given below, where , and :
This tree is shattered exactly when , but , but , and .
Definition 2 (Littlestone dimension)
The Littlestone dimension of a concept class , denoted , is the maximum such that there exists a binary element tree of height which is shattered by . If no such exists, we say that .
Remark 1
A straightforward bound for the Littlestone dimension of a finite class is . To see this, note that a binary element tree of height has more than leaves, so there must be two leaves labeled by the same element . Consider the internal node where the paths leading to these two leaves differ; suppose it is labeled by . Then there is a root-to-leaf path ending at that goes through the left child of , implying that . However, there is also a root-to-leaf path ending at that goes through the right child of , implying that , so cannot be shattered.
We now define consistency dimension. For the following bulleted definitions, let be partial functions from to .
-
•
denotes domain of .
-
•
The size of refers to the cardinality of .
-
•
For a set , the restriction of to is the partial function defined by for and undefined outside of .
-
•
We say that extends if and . On the other hand, we say that is a restriction of .
-
•
Given a concept class , is -consistent with if every size restriction of has an extension in . Otherwise, is -inconsistent.
Definition 3 (Consistency Dimension)
The consistency dimension of with respect to , denoted , is the least such that for every concept that is -consistent with , we have that . If no such exists, we say that . In the case that , we will write to denote .
Remark 2
By contrapositive, if for every concept , is -inconsistent with . That is, we can find a subset of size such that cannot be extended to anything in .
It is often easier to work concretely with the contrapositive characterization of consistency dimension. Our proofs of bounds on consistency dimension will go through this direction.
Theorem 2.1 ([8, Theorem 2.24])
Let be two concept classes on a set , , and . Then (EQ+MQ)-query complexity of with queries from is .
2.2 Deterministic finite automata
In this subsection, we set notation and terminology for deterministic finite automata and regular languages, and briefly review the Myhill-Nerode theorem.
Definition 4 (Deterministic Finite Automata)
Let be a finite set (the input alphabet). A deterministic finite automaton (DFA) over consists of the following data:
-
•
a finite set (the set of states);
-
•
a function (the transition function);
-
•
a state (the initial state); and
-
•
a set of states (the set of accepting states).
As is typical, will denote the set of finite strings over . Given an input string , and a DFA , define the run of on to be the sequence of states such that , and for , we have that . A string is accepted by if the last state appearing in the run of on is in . A language is a function . We note that languages are often defined as subsets of , but here we use functions in order to align with our definition of a concept. A language is recognized by a DFA if accepts a string if and only if . We say that a language is regular if it is recognized by some DFA.
The Myhill-Nerode theorem provides a useful syntactic characterization of regular languages. Given a language , define an equivalence relation on as follows: for strings , we say that iff for all .
Theorem 2.2 (Myhill-Nerode)
A language is regular if and only if has finitely many equivalence classes. Moreover, has exactly classes if and only if the minimal DFA recognizing has exactly states.
3 Learning Advice DFAs
3.1 Overview of advice DFAs
In this subsection, we give an introduction to automata with advice. For a more comprehensive overview, see [15].
Definition 5 (Advice DFA)
Let and be finite sets (the input and advice alphabets, respectively). An advice DFA over with advice from consists of the following data:
-
•
a finite set (the set of states);
-
•
an infinite length string over the advice alphabet (the advice string);
-
•
a function (the transition function);
-
•
a state (the initial state); and
-
•
a set of states (the set of accepting states).
Given an input string and advice DFA , define the run of on to be the sequence of states such that , and for , we have that . A string is accepted by if the last state appearing in the run of on is in . A language is recognized by an advice DFA if accepts a string if and only if . We say that a language is regular with advice if it is recognized by some advice DFA.
Advice DFAs extend classical DFAs with the addition of the (infinite) advice string (which is fixed beforehand as part of the automaton). The advice string is read in parallel with the input string; i.e., when reads the character of the input, it also has access to the character of the advice when deciding which transition to make. One way to think about the advice string is that it allows the transition function to vary at each step of the computation (although at a fixed step , the transition behavior is the same regardless of the input string).
Advice DFAs satisfy a Myhill-Nerode characterization, under a variant of the relation. Define an equivalence relation on by iff for all . Notice that is simply restricted to strings of length .
Theorem 3.1 (Myhill-Nerode for advice DFAs, cf. [15, Theorem 4])
Let be a language.
-
1.
Suppose is accepted by an advice DFA that has states. Then has at most classes for all .
-
2.
Suppose has at most classes for all . Then there is an advice DFA on states that recognizes .
In general, the bound of states in the second item is tight. A example witnesing this is given in Appendix 0.A.1. Also, note that this statement is more precise than the original version in [15]; in particular, the relationship between the number of states and the number of -classes does not appear in the original theorem. However, this relationship is easily derived from its proof.
3.2 Learning bound for advice DFAs
For the remainder of the section, fix and let be an alphabet of size . We will not fix the advice alphabet ; however, notice that if we are constructing an automaton with at most states, we may take to have size , since the advice string can be thought of as coding the transition function at a given step, and there are functions from (possible transition functions).
Note that the language consisting of the finite prefixes of any -word is regular with advice , and each of these languages is distinct, so there are uncountably many languages that are regular with advice. Thus we cannot have any finitary representation of arbitrary languages that are regular with advice. Hence, we consider the case where we restrict to strings of bounded length. Let denote the set
and let denote the set
By Theorem 3.1, for any . Because of this, it is more convenient to compute the query complexity of with queries .
Proposition 1
The consistency dimension of with respect to
is at most .
Proof
Let , and suppose that . Since , we have that . This means that there are strings which are pairwise -inequivalent for some . For each , let such that (i.e., distinguishes and according to ). Consider the set
which has size . Let extend . must have at least -classes, since are forced to be -inequivalent. Hence , so in particular . Since we found a set of size which witnesses the fact that is -inconsistent with , the consistency dimension of with respect to is at most .
Proposition 2
The Littlestone dimension of is .
Proof
We can bound the Littlestone dimension of by bounding the size of and applying 1. As noted at the beginning of the section, we may interpret as coding all possible transition functions, so the advice string simply tells the automaton which transition function to use at each step.
Consider an advice DFA such that and . Then to fully specify the behavior of the automaton on strings of length , it is enough to choose one transition function for each step, as well as the set of accepting states. There are ways to choose the transition functions, and ways to choose the set of accepting states. So there are total possible advice DFAs of this form.
Now notice that every language on strings of length up to that is accepted by an advice DFA with at most states is accepted by an advice DFA of this form, simply by appropriately permuting the states and the letters of the advice string. So indeed the size of the class is upper bounded by . Therefore
Combining Propositions 1 and 2 with Theorem 2.1(1), we obtain:
4 Learning Nominal DFAs
4.1 Overview of nominal DFAs
In this subsection, we give an introduction to nominal automata. For a more comprehensive treatment, see [5].
As mentioned in the introduction, one must leverage some underlying structure to properly generalize automata theory to infinite alphabets. For example, consider the alphabet , and define by if and only if for some . An equivalently defined language over a finite alphabet is easily seen to be regular, and so we expect this example to be “regular” as well. We can draw an infinite automaton that recognizes in the following way:
Notice that in some cases, we have combined infinitely many transitions into a single arrow—here, we assume that we are able to compare whether or not two data values are equal, and so are leveraging a form of symmetry. We can further condense this representation into an actually finite diagram:
We can compress the distinct states for each character from into a single “state” because it is enough to be able to compare whether or not the first and second characters read are equal or not. In order to formalize this intuition, we use the notion of nominal sets. Note: we do not work in the most general setting of [5], and instead only focus on what they call the equality symmetry. However, it is reasonable to expect that our results can generalize to other well-behaved symmetries.
Given a set , let denote the group of permutations on , i.e., the set of bijections from to with the operation of function composition. For the rest of the paper, let denote Given a set , a (left) action of on is an operation such that:
-
1.
for all , , where is the identity function on , and
-
2.
for all and , .
Example 1
Let . Define an action of on by . Observe that there is a permutation for which if and only if either (1) and , or (2) and . This example illustrates the action of can be used to formalize the idea of being able to compare data values for equality.
Definition 6 (Support; Least Support)
Fix and an action of on . Given a subset and an element , we say that supports if for every such that fixes every element of , we have that .
A finite set is called the least support of if
-
1.
supports ,
-
2.
no proper subset of supports , and
-
3.
no other finite subset of has properties (1) and (2).
We will use to denote the least support of .
An equivalent characterization of an element having least support is that the intersection of any two finite supports of also supports .
Definition 7 (Nominal Set)
A nominal set is a set along with an action of on such that every element of has a least support.
Lemma 1 ([5, Lemma 4.9])
Let be a nominal set and let . If supports , then for any , supports .
As a corollary, for any .
Definition 8 (Nominal Dimension)
Let be a nominal set. The nominal dimension of is (i.e., the largest size of a least support).
In the literature on nominal sets, this is simply referred to as the dimension of . However, in order to prevent confusion with other notions of dimension used in this paper, we will use the term nominal dimension.
Definition 9 (Orbit; Orbit-finiteness)
Let be a nominal set. The orbit of an element , denoted , is the set
Every nominal set is partitioned into the disjoint union of its orbits. We say that is orbit-finite if is the union of only finitely many orbits.
By 1, elements in the same orbit have the same size of least support, so orbit-finite nominal sets have finite nominal dimension.
Example 2
Let , and define the action of on as in 1. The orbits of are and , where and . These are the sets of pairs whose coordinate are either equal or unequal, respectively. Thus is orbit-finite with two orbits. Additionally, every element has least support , and so the nominal dimension of is 2.
For , let be a set with an action of on . The pointwise action of on is the action given by .
Definition 10 (Equivariance)
Let be a nominal set. A subset is equivariant if for any and , .
A relation on , where each is a nominal set, is equivariant if it is equivariant when considered as a subset of the product equipped with the pointwise action.
In particular, a function is equivariant exactly when
Remark 3
is an equivariant subset of if and only if is a union of orbits.
The following lemmas demonstrate how equivariant functions behave nicely with orbits and supports.
Lemma 2
Let be an equivariant function, and let . The image of the orbit of (in ) under is equal to the orbit of (in ).
As a corollary, if is surjective, then has at most as many orbits as .
Proof
The orbit of is the set . By equivariance of , this is equal to , which is the image under of the orbit of .
Lemma 3 ([5, Lemma 4.8])
Let be an equivariant function, , and . If supports , then supports .
As a corollary, , and hence if is surjective, the nominal dimension of is at most the nominal dimension of .
We will work extensively with quotients by equivariant equivalence relations, so we state some important general facts about them. Recall that if is an equivalence relation on , denotes the set of equivalence classes of , and is called the quotient of by .
Lemma 4 ([5, Lemma 3.5])
Let be a nominal set and be an equivariant equivalence relation. Then the quotient is a nominal set, under the action , and the quotient map is a surjective equivariant function.
Lemma 5
Let be a nominal set and be an equivariant equivalence relation. Then for every , and the nominal dimension of is at most the nominal dimension of .
Proof
This follows immediately from 3 and the fact that the quotient map is surjective and equivariant.
Lemma 6
Let be nominal sets. An equivariant function and equivariant equivalence relation on induce an equivariant equivalence relation on and an equivariant function .
Furthermore, is injective, and if is surjective, then is also surjective.
We are now ready to define nominal DFAs and state the nominal Myhill-Nerode theorem. Fix an orbit-finite nominal set , which we will call the -alphabet. The action of on naturally extends to : given a string , is the string obtained by letting act on each individual character of .
Definition 11 (-language)
A -language is a function such that the set is an equivariant subset of .
Definition 12 (Nominal DFA)
Let be an orbit-finite nominal set (the input alphabet). A nominal DFA over consists of the following data:
-
•
an orbit-finite nominal set (the set of states);
-
•
an equivariant function (the transition function)
-
•
a state such that the orbit of is (the initial state)
-
•
an equivariant subset (the set of accepting states)
As a shorthand, we say that a nominal DFA has orbits or nominal dimension when the state set has orbits or nominal dimension .
Given an input string , define the run of on to be the sequence of states such that , and for , we have that . A string is accepted by if the last state appearing in the run of on is in . We say that a langauge is recognized by a nominal DFA if accepts a string if and only if . As noted in [5, Definition 3.1], the language recognized by a nominal DFA must be a -language. We say that a -language is nominal regular if it is recognized by some nominal DFA.
Example 3
Let , and let be defined by if and only if for some as in the previous example. It is straightfoward to see this language is a -language. Additionally, in the automaton that recognizes , we can define an action of on the set of states by for any and , while for all for the states and . This automaton is a nominal DFA, and hence is nominal regular.
Let be a -language, and define the usual relation on by: if and only if for all , we have . This relation is equivariant (see [5, Lemma 3.4]), and hence by 4, is a nominal set. We will write to denote the -equivalence class of a string .
Theorem 4.1 (Myhill-Nerode for nominal DFAs [5, Theorem 5.2])
Let be an orbit-finite nominal set, and let be a -language. Then the following are equivalent:
-
1.
has at most orbits and has nominal dimension at most ;
-
2.
is nominal regular, and in particular is recognized by a nominal DFA with at most orbits and nominal dimension at most .
Note that this statement is more precise than the original version in [5]; in particular, the conditions on the number of orbits and the nominal dimension do not appear in the original theorem. However, these conditions are easily derived from its proof. For completeness, a proof of the correspondence between the number of orbits and the nominal dimension is given in Appendix 0.B.2.
4.2 Littlestone dimension of nominal DFAs
To prove bounds on the Littlestone dimension of nominal automata, we wish to bound the number of possible nominal automata. Since automata involve transition functions, we will need to understand products of nominal sets. Nominality is easily seen to be preserved under Cartesian products:
Proposition 3
The product of two nominal sets is nominal. In particular, the nominal dimension of the product of two nominal sets is at most the sum of the nominal dimensions of and .
Proof
Suppose and are nominal with nominal dimensions and , respectively. Let . Notice that , which has size at most , supports .
On the other hand, in the most general setting, products of orbit-finite nominal sets need not be orbit-finite [5, Example 2.5]. However, in our context of , products of orbit-finite sets are known to be orbit-finite [5, Section 10]. We will need an explicit bound on the number of orbits of products of orbit-finite sets, which is the purpose of the following discussion.
For a natural number , define . is a single-orbit nominal set when equipped with the pointwise action of .
Lemma 7 ([5, Lemma 4.13])
Given a nominal set of nominal dimension that has exactly one orbit, there is an equivariant surjection .
Definition 13 (cf. [16, Section 3])
Given , let denote the number of orbits of .
Proposition 4 (cf. [16, Section 3])
Let be a nominal set with orbits and nominal dimension for . Let and . Then has at most many orbits.
4 is stated but not proven in [16]. For completeness, we include a proof in Appendix 0.B.3. In light of 4, we can find bounds on the number of orbits of a product of nominal sets by bounding the value of . We do so for the case :
Proposition 5
Suppose (without loss of generality) that . Then
In particular, if is a constant, then .
The proof of 5 is given in Appendix 0.B.4. 5 will help us to calculate an explicit upper bound on the Littlestone dimension of nominal automata later, but we can first use it to substantiate a comment from the introduction about previously known query bounds.
The bound in [16, Corollary 1] involves a factor, where is the number of orbits of the state set of the target automaton, is the nominal dimension of the target automaton, is the nominal dimension of the alphabet, and is the length of the longest counterexample. Notice that is non-decreasing in all coordinates. Therefore, is lower bounded by both and . Applying the lower bounds from 5 yields the following:
Remark 4
The bound on the (EQ+MQ)-query complexity of nominal automata given in [16, Corollary 1] is at least , and at least .
We are now at a point where we can calculate a bound on the number of possible nominal DFAs and hence bound the Littlestone dimension. First, we state a general result on the number of single-orbit nominal sets:
Proposition 6
The number of distinct (up to isomorphism) single-orbit nominal sets of nominal dimension at most is at most .
The proof of 6 is given in Appendix 0.B.5. We note that it is a completely general result about nominal sets and may be of independent interest. For us, it will allow us to bound the number of possible state sets.
Lemma 8
The number of possible state sets for a nominal DFA with orbits and nominal dimension is .
Proof
Since the state set is an orbit-finite nominal set with orbits and nominal dimension , we count the number of such sets. We can view as the disjoint union of single-orbit nominal sets, each with nominal dimension at most , just by considering each orbit independently. By 6, the number of single-orbit nominal sets of nominal dimension is at most . An upper bound on the number of nominal sets with orbits and nominal dimension can be found by raising this number to , i.e., .
Now, we count the number of possible transition behaviors. Fix an input alphabet , where has orbits and nominal dimension .
Lemma 9
The number of possible transition behaviors for a nominal DFA with orbits and nominal dimension is at most
Let denote the set of -languages over recognized by a nominal DFA with at most orbits and nominal dimension at most .
Proposition 7
The Littlestone dimension of is at most
Proof
We bound the Littlestone dimension by bounding the size of . To choose a nominal DFA with at most orbits and nominal dimension at most , we must choose the state set, transition function, initial state, and set of accepting states. By 8, there are at most choices for the state set. By 9, there are at most choices for the transition function. There are at most choices for the initial state (since the initial state must be an orbit of its own) and choices for the accepting states (since the set of accepting states is a union of orbits). So in total, we can upper bound by
Applying 1, taking logs, using Stirling’s approximation, and remembering that is constant, we find that
4.3 Consistency dimension of nominal DFAs
It remains to bound the consistency dimension. To do this, we need to show that if is a language that is not recognized by a nominal DFA with at most orbits and nominal dimension at most , then there is a set of strings of bounded size such that the restriction of to cannot be extended to any -language recognized by such a nominal DFA. By the nominal Myhill-Nerode theorem, not being recognized by a nominal DFA with at most orbits and nominal dimension at most is equivalent to either having greater than orbits or having nominal dimension greater than . We first address the case where has large nominal dimension.
Lemma 10
Let be any nominal set, let , and let . Suppose that fixes every element of except for one. Then .
The proof of this lemma is given in Appendix 0.B.7.
Proposition 8
Let be a -language over alphabet , and suppose that has nominal dimension at least . Then there is a set of size such that for any -language , if , then also has nominal dimension at least .
Proof
Since has nominal dimension , there is some such that . Let denote , and let be a subset of of size . Let , and for each , let (the permutation that swaps and ). Notice that fixes all but one element of , and so by 10, . Thus for each , there is some such that . Let
which has size .
Now, let be a -language extending , and suppose for contradiction that has nominal dimension at most . That is, for every , . In particular, . For each , agrees with on and , so . This shows that . Since the only elements not fixed by are and , it must be that at least one of and are in the least support of . Thus for each , contains at least one of and . Since , all values for are distinct, and so , which contradicts the fact that .
Next, we address the case where has a large number of orbits.
Lemma 11
Let be a -language such that has orbits. Then for every orbit of , there is a string such that such that .
Proposition 9
Let be a -language over alphabet , where has nominal dimension , and suppose that has at least orbits. Then there is a set of size such that for any -language , if , then also has at least orbits.
Proof
Since has at least orbits, there are strings such that all belong to distinct orbits. That is, for every and , , and thus there is such that
Moreover, by 11, we may assume that for each .
For each , let . Since the nominal dimension of is , we have that . Also, let be some subset of such that is disjoint from , and .
Now, for each , let
Notice that has size at most : to choose a , we choose a subset of of size and a value from for each of the inputs. For each , let be an arbitrary but fixed extension of to a permutation of , and let consist of all such (so ).
Now, let
has size at most . Let be a -language extending , and assume for contradiction that has at most orbits. Thus there must be such that is in the same orbit as . Let such that . does not have to be in , and so we cannot directly derive a contradiction. Instead, we will alter in order to obtain a such that .
Define in the following way: first, let denote the image of under . For each element , select a unique element . This is possible because , and so . Let be the permutation that transposes each with its corresponding , and fixes every other element of .
Notice that , and so . Also, if , then : if , then by definition does not affect and so , whereas if , will transpose with an element in . Let denote . Since has nominal dimension at most , . Also, by 5, . Therefore, is an injection from a subset of of size at most into . Therefore, there is some such that
We may then deduce that
This means that for all , . In particular, we may choose . However, since and are in , it must be that
a contradiction! So we may finally conclude that has at least orbits.
11 yields an interesting consequence, which we do not use in any of our proofs but may be of independent interest:
Corollary 1
If is a -language over an alphabet which has nominal dimension such that has orbits, then has nominal dimension at most .
This result suggests that ultimately it is the number of orbits, and not the nominal dimension, of which characterizes the complexity of . Its proof is given in Appendix 0.B.9.
We now have all the ingredients we need to bound the consistency dimension.
Proposition 10
The consistency dimension of with respect to itself is at most .
Proof
Let , and suppose that , i.e., is not recognized by any nominal DFA with at most orbits and nominal dimension at most . We must find a set of at most strings such that any function extending is not in .
Case 1: is not equivariant. Then there is and such that . Set . Any function extending cannot be equivariant, and therefore cannot be nominal regular, so .
Case 2: is equivariant, and is recognized by some nominal DFA with at most orbits. By Theorem 4.1, has at most orbits. Thus the nominal dimension of must be at least , or else another application of Theorem 4.1 would imply that is recognized by a nominal DFA with at most orbits and nominal dimension at most , contradicting the assumption that . Let of size be as given by 8.
Now, let extend , and suppose for contradiction that . Since is recognized by a nominal DFA, it must be a -language, so by the definition of , has nominal dimension at least . However, by Theorem 4.1 and the fact that is recognized by a nominal DFA with at most orbits and nominal dimension at most , has nominal dimension at most , a contradiction! So .
Case 3: is equivariant, and is not recognized by any nominal DFAs with at most orbits. By Theorem 4.1, has at least orbits. Let of size be as given by 9.
Now, let extend , and suppose for contradiction that . Since is recognized by a nominal DFA, it must be a -language, so by the definition of , has at least orbits. However, by Theorem 4.1 and the fact that is recognized by a nominal DFA with at most orbits and nominal dimension at most , has at most orbits, a contradiction! So .
In all three cases, we found a set of size at most such that any language extending cannot be in . We can then conclude that the consistency dimension of with respect to itself is at most .
4.4 Learning bound for nominal DFAs
We can now use the bounds we proved for Littlestone dimension and consistency dimension to prove our main result on nominal DFAs:
Proof
By 7, the Littlestone dimension of is at most . By 10, the consistency dimension of with respect to itself is at most . This is upper bounded by , which is in turn at most (since is a constant). Applying Theorem 2.1, the query complexity of with queries from is at most .
5 Conclusion
In this paper, we derive query learning bounds for advice DFAs and nominal DFAs; the bounds for advice DFAs are the first known bounds for the setting, while the bounds for nominal DFAs improve upon prior results, at the cost of not making any computational guarantees about the learning algorithm.
The bounds for advice DFAs are derived in almost the exact same manner as the bounds for DFAs given by Chase and Freitag [8]. However, to the complexity of the setting of nominal DFAs requires a much deeper analysis of the structure of nominal sets and nominal DFAs. This leads to several auxillary results which may be of independent interest; namely, 5, which analyzes the asymptotics of the number of orbits of products of nominal sets; 6, which counts the number of single-orbit nominal sets of a given dimension; and 1, which bounds the nominal dimension of in terms of the number of orbits of and the nominal dimension of .
One natural extension of our work is to adapt the results for nominal automata to other well-behaved symmetries beside the equality symmetry, such as the total order symmetry, as described in [5]. Our approach is general enough that it should apply to these symmetries; however, there are many details that would need to be verified. It would also be natural to explore what the approach of Chase and Freitag yields for other previously-studied forms of automata that admit versions of the Myhill-Nerode theorem. For example, Bollig et al. [6] study query learning of NFAs via residual finite-state automata, which crucially utilize the Myhill-Nerode correspondence to find canonical NFA representations of regular languages. A final potential direction for future work would be to consider the generalization of nominal automata to -languages; i.e., nominal DFAs with a Büchi-style acceptance criteria. Angluin and Fisman have adapted the algorithm to the classical setting of -regular languages [2], and Chase and Freitag apply their method in the same setting [8]. To our knowledge, no work has been done studying the learnability of -nominal regular languages, so it would be of interest to find results using both Angluin’s method and Chase and Freitag’s method.
References
- [1] Angluin, D.: Learning regular sets from queries and counterexamples. Inf. Comput. 75, 87–106 (1987)
- [2] Angluin, D., Fisman, D.: Learning regular omega languages. Theoretical Computer Science 650, 57–72 (2016). https://doi.org/https://doi.org/10.1016/j.tcs.2016.07.031, https://www.sciencedirect.com/science/article/pii/S0304397516303760, algorithmic Learning Theory
- [3] Balcázar, J.L., Castro, J., Guijarro, D., Simon, H.U.: The consistency dimension and distribution-dependent learning from queries. Theoretical Computer Science 288(2), 197–215 (2002). https://doi.org/https://doi.org/10.1016/S0304-3975(01)00400-5, https://www.sciencedirect.com/science/article/pii/S0304397501004005, algorithmic Learning Theory
- [4] Bárány, V.: A hierarchy of automatic -words having a decidable mso theory. RAIRO - Theoretical Informatics and Applications - Informatique Théorique et Applications 42(3), 417–450 (2008). https://doi.org/10.1051/ita:2008008, http://www.numdam.org/articles/10.1051/ita:2008008/
- [5] Bojańczyk, M., Klin, B., Lasota, S.: Automata theory in nominal sets. Logical Methods in Computer Science Volume 10, Issue 3 (Aug 2014). https://doi.org/10.2168/LMCS-10(3:4)2014, https://lmcs.episciences.org/1157
- [6] Bollig, B., Habermehl, P., Kern, C., Leucker, M.: Angluin-style learning of nfa. In: IJCAI. pp. 1004–1009 (07 2009)
- [7] Carton, O., Thomas, W.: The monadic theory of morphic infinite words and generalizations. In: Nielsen, M., Rovan, B. (eds.) Mathematical Foundations of Computer Science 2000. pp. 275–284. Springer Berlin Heidelberg, Berlin, Heidelberg (2000)
- [8] Chase, H., Freitag, J.: Bounds in query learning. In: Abernethy, J., Agarwal, S. (eds.) Proceedings of Thirty Third Conference on Learning Theory. Proceedings of Machine Learning Research, vol. 125, pp. 1142–1160. PMLR (2020), http://proceedings.mlr.press/v125/chase20a.html
- [9] Drews, S., D’Antoni, L.: Learning symbolic automata. In: Legay, A., Margaria, T. (eds.) Tools and Algorithms for the Construction and Analysis of Systems. pp. 173–189. Springer Berlin Heidelberg, Berlin, Heidelberg (2017)
- [10] Elgot, C.C., Rabin, M.: Decidability and undecidability of extensions of second (first) order theory of (generalized) successor. J. Symb. Log. 31, 169–181 (1966)
- [11] Fisman, D., Saadon, S.: Learning and characterizing fully-ordered lattice automata. In: Bouajjani, A., Holík, L., Wu, Z. (eds.) Automated Technology for Verification and Analysis. pp. 266–282. Springer International Publishing, Cham (2022)
- [12] Gabbay, M.J., Pitts, A.M.: A new approach to abstract syntax with variable binding. Formal Aspects of Computing 13(3), 341–363 (07 2002). https://doi.org/10.1007/s001650200016, https://doi.org/10.1007/s001650200016
- [13] Hanneke, S., Livni, R., Moran, S.: Online learning with simple predictors and a combinatorial characterization of minimax in 0/1 games. In: COLT (2021)
- [14] Hellerstein, L., Pillaipakkamnatt, K., Raghavan, V., Wilkins, D.: How many queries are needed to learn? J. ACM 43(5), 840–862 (Sep 1996). https://doi.org/10.1145/234752.234755, https://doi.org/10.1145/234752.234755
- [15] Kruckman, A., Rubin, S., Sheridan, J., Zax, B.: A myhill-nerode theorem for automata with advice. Electronic Proceedings in Theoretical Computer Science 96 (10 2012). https://doi.org/10.4204/EPTCS.96.18
- [16] Moerman, J., Sammartino, M., Silva, A., Klin, B., Szynwelski, M.: Learning nominal automata. In: Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages. p. 613–625. POPL 2017, Association for Computing Machinery, New York, NY, USA (2017). https://doi.org/10.1145/3009837.3009879, https://doi.org/10.1145/3009837.3009879
- [17] Montanari, U., Pistore, M.: An introduction to history dependent automata. Electr. Notes Theor. Comput. Sci. 10, 170–188 (01 1997). https://doi.org/10.1016/S1571-0661(05)80696-6
- [18] Nies, A.: Describing groups. The Bulletin of Symbolic Logic 13(3), 305–339 (2007), http://www.jstor.org/stable/4493323
- [19] Pitts, A.M.: Nominal Sets: Names and Symmetry in Computer Science. Cambridge Tracts in Theoretical Computer Science, Cambridge University Press (2013). https://doi.org/10.1017/CBO9781139084673
- [20] Pyber, L.: Asymptotic results for permutation groups. In: Groups And Computation (1991)
- [21] Rabinovich, A., Thomas, W.: Decidable theories of the ordering of natural numbers with unary predicates. In: Ésik, Z. (ed.) Computer Science Logic. pp. 562–574. Springer Berlin Heidelberg, Berlin, Heidelberg (2006)
- [22] Sakakibara, Y.: Learning context-free grammars from structural data in polynomial time. In: Proceedings of the First Annual Workshop on Computational Learning Theory. p. 330–344. COLT ’88, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1988)
- [23] Salomaa, A.: On finite automata with a time-variant structure. Information and Control 13(2), 85–98 (1968). https://doi.org/https://doi.org/10.1016/S0019-9958(68)90706-7, https://www.sciencedirect.com/science/article/pii/S0019995868907067
- [24] Tsankov, T.: The additive group of the rationals does not have an automatic presentation. The Journal of Symbolic Logic 76(4), 1341–1351 (2011), http://www.jstor.org/stable/23208221
Appendix 0.A Omitted proofs from Section 3
0.A.1 A witness to the tightness of the bounds in Theorem 3.1
Example 4
Define by
Notice that has at most two classes for every —one for strings with an even number of 0’s and one for strings with an odd number of 0’s. However, it is not recognized by any DFA with advice with 3 states. To see this, let accept . The runs of strings 00 and 01 both end in reject states of , but since they are in separate -classes, they must end in distinct states. So there are at least two reject states. Similarly, the runs of 000 and 001 both end in distinct accept states. Thus there must be at least 4 states in .
Replacing “ has an even number of 0’s” with any regular language that has at most equivalence classes in the Myhill-Nerode relation (for example, “the number of 0’s in is divisible by ”), we obtain a language such that has at most classes for every , but any advice DFA accepting must have at least states.
Appendix 0.B Omitted proofs from Section 4
0.B.1 Proof of 6
Proof
Define by if and only if . A standard argument confirms that this is indeed an equivalence relation, equivariance follows from the fact that is equivariant.
Next, define by . This is well-defined: if , then
is equivariant since if , then . is injective since for ,
Finally, if is surjective, then given , there is such that and hence , so is surjective.
0.B.2 Proof of Theorem 4.1
Definition 14 (Reachable Nominal DFA)
A nominal DFA is said to be reachable if for every state in , there is such that the run of on ends in state .
Definition 15 (Syntactic Automaton)
Fix an orbit-finite nominal alphabet , and let be a -language. The syntactic automaton of , denoted is specified as follows:
-
•
the state set is the set ;
-
•
the transition function is
defined by -
•
the initial state is ;
-
•
the set of accepting states is .
Lemma 12 ([5, Lemma 3.6; Proposition 5.1])
The syntactic automaton of a -language is a reachable nominal DFA.
By [5, Lemma 3.7], a -language is always recognized by its syntactic automaton.
Definition 16 (Automaton Homomorphism)
Let and be two nominal DFAs over the same alphabet . An automaton homomorphism from to is an equivariant function such that:
-
•
;
-
•
for every ; and
-
•
for every .
If there exists an automaton homomorphism from to , then and recognize the same language: for any string , the run of on ends in state if and only if the run of on ends in the state , and if and only if , so accepts if and only if accepts .
Lemma 13 ([5, Lemma 3.7])
Let be a -language. For any reachable nominal DFA that recognizes , there is a surjective automaton homomorphism from to .
We can now prove bounds in the nominal Myhill-Nerode theorem:
Proof (Proof of Theorem 4.1)
(1. 2.) Suppose that has at most orbits and nominal dimension at most . Then is a nominal DFA that recognizes , and its state set is which by assumption has at most orbits and nominal dimension at most .
(2. 1.) Suppose that is recognized by a nominal DFA with at most orbits and nominal dimension at most . We may assume that is reachable, so by 13, there is a surjective automaton homomorphism from to the syntactic automaton . By 2 and 3, the state set of has at most orbits and nominal dimension . But the state set of is exactly , and so we obtain the desired bounds.
0.B.3 Proof of 4
Proof
Let . Its orbit must be contained in the product . Hence, it is enough to bound the number of orbits of any given product , where each is an orbit of , and multiply by the number of possible products. Fix orbits of , respectively. By 7, there are equivariant surjections . Taking the product gives us the equivariant surjection
By 2, has at most as many orbits as , which has orbits. Now, there were ways to choose the orbits , so in total, can have at most orbits.
0.B.4 Proof of 5
Proof
Consider tuples and . Notice that the orbit of the pair is exactly determined by the collection of indices such that . So to choose an orbit, we can first choose the number of indices that and coincide on. Then we need to choose indices of , indices of of , and a bijection between and in order to determine the indices that and coincide on. There are ways to choose these.
Thus the total number of orbits is
This is lower bounded by . Since and for any value of , we have that
On the other hand, since for any and ,
0.B.5 Proof of 6
The following definitions and propositions are adapted from [5, Section 9], and will allow us to bound the number of possible nominal sets.
Definition 17 (cf. [5, Definition 9.11])
A support representation is a pair , where , and is a subgroup of .
Definition 18 (cf. [5, Definition 9.14])
Given a support representation , the semantics of , denoted , is the set , where is defined as
There is a natural action of on defined by
An isomorphism of nominal sets is an equivariant bijection between two nominal sets.
Proposition 11 (cf. [5, Proposition 9.15])
For any support representation , is a single-orbit nominal set of nominal dimension , and every single-orbit nominal set of nominal dimension is isomorphic to for some .
Proposition 12 (cf. [5, Proposition 9.16])
Let and be single-orbit nominal sets. Let
Equivariant functions from to are in bijective correspondence with (where is as in 18).
Lemma 14
is determined, up to isomorphism, by and the conjugacy class of in .
Proof
Let and be single-orbit nominal sets. Suppose that and that are conjugate in . That is, there is a permutation such that . Define by (i.e., reorder the input using ). Notice that is an equivariant bijection. By 6, and induce an equivalence relation on . Since , the induced equivalence relation is actually . Then the induced function is an equivariant bijection between and .
In the other direction, suppose there is an isomorphism . The proof of 12 gives us a bijection such that . In particular, and is a permutation in that witnesses that and are conjugate.
With this machinery in hand, we can give the proof of 6:
Proof
Let be a single-orbit nominal set with nominal dimension at most . By 14, is determined up to isomorphism by the value of and the conjugacy class of in . Since the nominal dimension of is , , and so there are choices for . It remains to count the number of conjugacy classes of subgroups of . The number of subgroups of is [20, Theorem 4.2], which certainly gives an upper bound for the number of conjugacy classes of subgroups. Additionally, any subgroup has at most conjugates (one for each permutation in ), so the number of conjugacy classes is also at least .
Thus the number of single-orbit nominal sets of nominal dimension at most is at most .
0.B.6 Proof of 9
Proof
We count the number of equivariant functions , where has orbits and nominal dimension . Since is equivariant, by 2, a single orbit of must map into a single orbit of , so we can first choose a target orbit of for each orbit of . By 4, has at most orbits. Since is fixed, by 5, for any , . Furthermore, since is also fixed, has at most orbits. Thus there are many ways to choose the target orbits of each orbit of .
Once we have chosen a target orbit of for each orbit of , we must choose an equivariant function from each orbit of to one orbit of . By 3, the nominal dimension of is at most , and so also has nominal dimension at most . Similarly, has nominal dimension at most . Therefore where , and where . By 12, the number of equivariant functions from to is upper bounded by the number of injections , which is in turn at most .
We must choose one such equivariant function for each orbit of , so the total number of choices of all of these functions is at most . Once we have done this, we have chosen a transition behavior .
Hence the total number of possible transition behaviors is upper bounded by
0.B.7 Proof of 10
Proof
Suppose for contradiction that . Let be the only element of such that , and notice that since fixes every element of other than . We claim that supports . To see this, let such that fixes every element of . We need to show that . We may assume that , since otherwise fixes every element of , and since supports , we would have . Additionally, since fixes every element of other than . Now, let and , and consider the permutation . We have that , as and both fix every element of , and . Since supports , we can deduce that . Then . Since , fixes every element of , and so . This shows that , and since was an arbitrary permutation that fixed every element of , we may conclude that supports . This is a contradiction, since is the least support of , and hence .
0.B.8 Proof of 11
Proof
Suppose for contradiction that there is some orbit of such that the orbit of every string of length strictly less than is not in . We may assume that the representative of the orbit is of minimal length . Write .
Consider the collection of prefixes of of length up to , i.e., the set . Since these all have length strictly less than , they must belong to one of the -many orbits that are not in . As there are strings in the collection, there must be such that and belong to the same orbit (here, denotes the empty string). That is, there is such that . Notice that is preserved under appending common suffixes; i.e., if , then for any , . Thus . However, notice that is strictly shorter than , and its -class is in the same orbit of as (more than that, is in the same -class), which contradicts the assumption that was of minimal length.
Thus for every orbit of , there must be a string of length strictly less than that is in the orbit.
0.B.9 Proof of 1
Proof
Let . By 11, there is some with and such that . Then 1 and 5 tell us that
Each of the letters of is supported by a set of size at most . Since acts on coordinate-wise, the least support of is contained in the union of the least supports of all the letters of , which is a set of size at most . Hence every element of is supported by a set of size at most , and so the nominal dimension of is at most .