This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

11institutetext: University of Illinois Chicago, Chicago, IL 60607, USA
11email: kzhou23@uic.edu

Query Learning Bounds for Advice and Nominal Automata

Kevin Zhou
Abstract

Learning automata by queries is a long-studied area initiated by Angluin in 1987 with the introduction of the LL^{*} algorithm to learn regular languages, with a large body of work afterwards on many different variations and generalizations of DFAs. Recently, Chase and Freitag introduced a novel approach to proving query learning bounds by computing combinatorial complexity measures for the classes in question, which they applied to the setting of DFAs to obtain qualitatively different results compared to the LL^{*} algorithm. Using this approach, we prove new query learning bounds for two generalizations of DFAs. The first setting is that of advice DFAs, which are DFAs augmented with an advice string that informs the DFA’s transition behavior at each step. For advice DFAs, we give the first known upper bounds for query complexity. The second setting is that of nominal DFAs, which generalize DFAs to infinite alphabets which admit some structure via symmetries. For nominal DFAs, we make qualitative improvements over prior results.

Keywords:
Advice automata Nominal automata Query learning

1 Introduction

Learning various forms of automata with queries is a well-studied class of problems with many applications, including in AI, automatic verification, and model checking. The field was initiated by Angluin in 1987 with the introduction of the LL^{*} algorithm that learns deterministic finite automata (DFAs) using a polynomially bounded number of equivalence and membership queries [1]. The LL^{*} algorithm has seen many applications and has also inspired generalizations to other types of automata, such as tree automata [22], nondeterministic finite automata [6], ω\omega-automata [2], symbolic automata [9], and fully-ordered lattice automata [11]. The setting of learning with queries is especially well-suited for learning automata—in real-world applications where the task is to identify the behavior of some black-box system that can be modelled by an automaton, queries can be simulated by interacting with the system and observing the output.

Historically, the study of query learning of automata has centered around adapting Angluin’s LL^{*} algorithm to different settings. We do not do this, and instead build on a method introduced by Chase & Freitag [8], who give query learning bounds in terms of Littlestone dimension and consistency dimension, which are complexity measures for concept classes. One can then prove a query learning bound in a given setting by computing the Littlestone and consistency dimensions. This approach seems to be particularly effective for learning automata, since many types of automata exhibit form of Myhill-Nerode characterization, which can be used to bound the consistency dimension. For example, Chase and Freitag apply this method for regular languages and regular ω\omega-languages and obtain qualitatively different results from prior work. Our work serves as further proof of concept of this method.

Our work studies the learnability of two generalizations of DFAs. The first is advice DFAs, which were studied as early as 1968 by Salomaa [23], though we follow the notation of Kruckman et al. [15]. Advice DFAs generalize classical DFAs by allowing the automata access to an additional advice string that it reads concurrently with the input, and are useful in modeling situations where the transition behavior can vary over time. They are also of interest to the logic community: it is a classical result that DFAs correspond to weak monadic second-order formulas over the structure of natural numbers with the successor operation; advice DFAs correspond to formulas over expansions of this structure by unary predicates, a frequently studied setting; see e.g. [10, 7, 21, 4]. Another motivating factor for advice DFAs comes from the study of automatic structures, which are structures whose domain and atomic relations are recognized by DFAs. It turns out some natural structures are not automatic or even isomorphic to an automatic structure, such as (,+)(\mathbb{Q},+), the additive group of the rationals [24]. However, (,+)(\mathbb{Q},+) is isomorphic to a structure whose domain and atomic relations are recognized by advice DFAs [18, 15].

The second setting we consider is that of nominal DFAs, introduced by Bojańczyk, Klin, and Lasota [5], which generalize DFAs to infinite alphabets. Generalizing formal language theory to infinite alphabets is highly non-trivial, since without any restrictions, the fact that there are uncountably many subsets of any infinite set makes computation intractable. However, in most reasonable applications, there is additional structure that can be leveraged to make computation reasonable, and nominal DFAs utilize the notion of nominal sets (first introduced by Gabbay & Pitts [12]) to formalize this idea. Aside from the development of the theory of automata over infinite alphabets, nominal sets have also found much use in the concurrency and semantics communities as a formalism for modeling name binding (see e.g. [17, 19]).

1.0.1 Organization of Paper and Summary of Results

In Section 2, we present the basic definitions and notation necessary for the remainder of the paper.

In Section 3, we study query learning of advice DFAs, and give the first known bound for the query complexity of advice DFAs. Our result for advice DFAs is as follows: let kadv(n,m)\mathcal{L}^{\text{adv}}_{k}(n,m) be the set of languages over an alphabet of size kk recognized by an advice DFA on at most nn states, restricted to strings of length at most mm. {restatable*}theoremadviceLC The (EQ+MQ)-query complexity of kadv(n,m)\mathcal{L}^{\text{adv}}_{k}(n,m) with queries from kadv(2n,m)\mathcal{L}^{\text{adv}}_{k}(2n,m) is O(n3mklogn)O(n^{3}mk\log n).

In Section 4, we study query learning of nominal DFAs. Query learning of nominal DFAs was previously studied by Moerman et al. [16]. We give a qualitative improvement over this previous result; a detailed comparison is given in the discussion below. Our result for nominal DFAs is as follows: given a GG-alphabet AA, let Anom(n,k)\mathcal{L}^{\text{nom}}_{A}(n,k) denote the set of GG-languages recognized by a nominal DFA with at most nn orbits and nominal dimension at most kk. {restatable*}theoremnominalLCfixedalphabet For a fixed GG-alphabet AA, the (EQ+MQ)-query complexity of
Anom(n,k)\mathcal{L}^{\text{nom}}_{A}(n,k) with queries from Anom(n,k)\mathcal{L}^{\text{nom}}_{A}(n,k) is at most nO(k)kk\frac{n^{O(k)}}{k^{k}}.

1.0.2 Discussion & Related Work

The approach of Chase & Freitag of deriving query complexity bounds by computing bounds on various combinatorial notions of dimension is a fairly novel one, and in [8] they illustrate its usefulness by easily proving polynomial query learning bounds for regular languages. The approach gives a qualitatively different bound compared to the one given by the LL^{*} algorithm—their results have no dependence on the length of the longest counterexample returned by the oracle (while the LL^{*} algorithm does), but a worse dependence on the number of states. They also do not give any computational complexity guarantees—their algorithm requires computing the Littlestone dimension and finding witnesses to the consistency dimension, which in general may be hard computational problems.

As mentioned, in the setting of advice DFAs, we give the first known bounds on the query complexity. The restriction on the length of the strings in kadv(n,m)\mathcal{L}^{\text{adv}}_{k}(n,m) is necessary to make the problem tractable and can be thought of as giving a dependence on the length of the longest counterexample for equivalence queries. It may be of interest to see if there is an analogous version of the LL^{*} algorithm for advice DFAs, and if so, how the bounds with that approach compare to ours.

The prior best known bound for query complexity of nominal DFAs is given by Moerman et al. in [16], using a generalization of Angluin’s LL^{*} algorithm to nominal DFAs. The bound that they derive is a complicated quantity (see [16, Corollary 1]), but in particular we observe the following: if the target automaton has nn orbits and nominal dimension kk, pp is the nominal dimension of the (fixed) alphabet, and mm is the length of the longest counterexample returned by the oracle, then their bound is lower bounded by both

  1. 1.

    min((nke)m,(me)nk)\min\left(\left(\frac{nk}{e}\right)^{m},\left(\frac{m}{e}\right)^{nk}\right) and

  2. 2.

    (knn!)p(k^{n}n!)^{p}.

These are not explicitly stated in the original work, but follow with some additional work (see 4).

Our result improves on (a) since our bound does not depend on the length of the longest counterexample, and improves the asymptotic dependence on nn compared to (b) (polynomial in nn with respect to kk, instead of factorial in nn). This is especially important in light of 1, which says that in this setting, kk is at most a constant multiple of nn. However, as with Chase & Freitag’s result on classical DFAs, we give no computational complexity guarantees.

2 Preliminaries

2.1 Query learning

Let XX be a set (the instance space). A concept is a function C:X{0,1}C:X\to\{0,1\}, and a concept class 𝒞\mathcal{C} on XX is a nonempty set of concepts. We note that a concept is sometimes equivalently defined as a subset of XX, but for our purposes it will be easier to work with functions. Fix a concept C𝒞C\in\mathcal{C}, which we call the target, and another concept class 𝒞\mathcal{H}\supseteq\mathcal{C}, which we call the hypothesis class. An equivalence query (EQ) consists of a hypothesis HH\in\mathcal{H}, to which the oracle answers yes if H=CH=C, or with a counterexample xXx\in X for which H(x)C(x)H(x)\neq C(x). A membership query (MQ) consists of an element xXx\in X, to which the oracle responds with the value of C(x)C(x). The learning procedure proceeds interactively in rounds: in each round, the learner poses a query to the oracle, and the oracle responds with the corresponding answer. The learner is allowed to choose queries based on the responses to previous queries, and succeeds if they submit the target as an equivalence query. In EQ-learning, the learner is only allowed to submit equivalence queries, while in (EQ+MQ)-learning, the learner can use both equivalence and membership queries.

Definition 1 (Query Complexity)

Let 𝒞\mathcal{C}\subseteq\mathcal{H} be two concept classes. The EQ-query complexity of 𝒞\mathcal{C} with queries from \mathcal{H} is defined to be the least nn such that there is an algorithm for the learner to submit equivalence queries from \mathcal{H} with the property that for any C𝒞C\in\mathcal{C}, the learner can identify CC within at most nn queries, or \infty if no such nn exists.

The (EQ+MQ)-query complexity of 𝒞\mathcal{C} with queries from \mathcal{H} is defined in the same way, except that the learner is allowed to also use membership queries.

A significant stream of prior work studied bounds for the query complexity of 𝒞\mathcal{C} with hypotheses from \mathcal{H} in terms of combinatorial complexity measures of 𝒞\mathcal{C} and \mathcal{H} [14, 3, 8, 13]. These bounds are formulated in terms of the Littlestone dimension, consistency dimension, and strong consistency dimension (also known as the dual Helly number), which we define now.

A binary element tree is a complete binary tree whose internal nodes are labeled by elements of XX. A binary element tree TT is shattered by 𝒞\mathcal{C} if there is a way to label all the leaves of TT with elements of 𝒞\mathcal{C} such that the following condition holds: given a leaf node labeled by A𝒞A\in\mathcal{C}, for each internal node above AA labeled by xXx\in X, A(x)=1A(x)=1 if and only if the (unique) path from the root to AA goes through the left child of xx.

An example of a (labelled) binary element tree of height 2 is given below, where x0,x1,x2Xx_{0},x_{1},x_{2}\in X, and C0,C1,C2,C3𝒞C_{0},C_{1},C_{2},C_{3}\in\mathcal{C}:

x0x_{0}x1x_{1}C0C_{0}C1C_{1}x2x_{2}C2C_{2}C3C_{3}

This tree is shattered exactly when C0(x0)=C0(x1)=1C_{0}(x_{0})=C_{0}(x_{1})=1, C1(x0)=1C_{1}(x_{0})=1 but C1(x1)=0C_{1}(x_{1})=0, C2(x0)=0C_{2}(x_{0})=0 but C2(x2)=1C_{2}(x_{2})=1, and C3(x0)=C3(x2)=0C_{3}(x_{0})=C_{3}(x_{2})=0.

Definition 2 (Littlestone dimension)

The Littlestone dimension of a concept class 𝒞\mathcal{C}, denoted Ldim(𝒞)\operatorname{Ldim}(\mathcal{C}), is the maximum nn such that there exists a binary element tree TT of height nn which is shattered by 𝒞\mathcal{C}. If no such nn exists, we say that Ldim(𝒞)=\operatorname{Ldim}(\mathcal{C})=\infty.

Remark 1

A straightforward bound for the Littlestone dimension of a finite class 𝒞\mathcal{C} is Ldim(𝒞)log|𝒞|\operatorname{Ldim}(\mathcal{C})\leq\log|\mathcal{C}|. To see this, note that a binary element tree TT of height >log|𝒞|>\log|\mathcal{C}| has more than |𝒞||\mathcal{C}| leaves, so there must be two leaves labeled by the same element C𝒞C\in\mathcal{C}. Consider the internal node where the paths leading to these two leaves differ; suppose it is labeled by xx. Then there is a root-to-leaf path ending at CC that goes through the left child of xx, implying that xCx\in C. However, there is also a root-to-leaf path ending at CC that goes through the right child of xx, implying that xCx\notin C, so TT cannot be shattered.

We now define consistency dimension. For the following bulleted definitions, let A,BA,B be partial functions from XX to {0,1}\{0,1\}.

  • dom(A)\operatorname{dom}(A) denotes domain of AA.

  • The size of AA refers to the cardinality of dom(A)\operatorname{dom}(A).

  • For a set Ydom(A)Y\subseteq\operatorname{dom}(A), the restriction of AA to YY is the partial function A|YA|_{Y} defined by A|Y(x)=A(x)A|_{Y}(x)=A(x) for xYx\in Y and undefined outside of YY.

  • We say that BB extends AA if dom(A)dom(B)\operatorname{dom}(A)\subseteq\operatorname{dom}(B) and B|dom(A)=AB|_{\operatorname{dom}(A)}=A. On the other hand, we say that AA is a restriction of BB.

  • Given a concept class 𝒞\mathcal{C}, AA is nn-consistent with 𝒞\mathcal{C} if every size nn restriction of AA has an extension in 𝒞\mathcal{C}. Otherwise, AA is nn-inconsistent.

Definition 3 (Consistency Dimension)

The consistency dimension of 𝒞\mathcal{C} with respect to \mathcal{H}, denoted Cdim(𝒞,)\operatorname{Cdim}(\mathcal{C},\mathcal{H}), is the least nn such that for every concept A:X{0,1}A:X\to\{0,1\} that is nn-consistent with 𝒞\mathcal{C}, we have that AA\in\mathcal{H}. If no such nn exists, we say that Cdim(𝒞,)=\operatorname{Cdim}(\mathcal{C},\mathcal{H})=\infty. In the case that =𝒞\mathcal{H}=\mathcal{C}, we will write Cdim(𝒞)\operatorname{Cdim}(\mathcal{C}) to denote Cdim(𝒞,𝒞)\operatorname{Cdim}(\mathcal{C},\mathcal{C}).

Remark 2

By contrapositive, Cdim(𝒞,)n\operatorname{Cdim}(\mathcal{C},\mathcal{H})\leq n if for every concept AA\notin\mathcal{H}, AA is nn-inconsistent with 𝒞\mathcal{C}. That is, we can find a subset YXY\subseteq X of size nn such that A|YA|_{Y} cannot be extended to anything in 𝒞\mathcal{C}.

It is often easier to work concretely with the contrapositive characterization of consistency dimension. Our proofs of bounds on consistency dimension will go through this direction.

Theorem 2.1 ([8, Theorem 2.24])

Let 𝒞\mathcal{C}\subseteq\mathcal{H} be two concept classes on a set XX, c=Cdim(𝒞,)c=\operatorname{Cdim}(\mathcal{C},\mathcal{H}), and d=Ldim(𝒞)d=\operatorname{Ldim}(\mathcal{C}). Then (EQ+MQ)-query complexity of 𝒞\mathcal{C} with queries from \mathcal{H} is O(cd)O(cd).

2.2 Deterministic finite automata

In this subsection, we set notation and terminology for deterministic finite automata and regular languages, and briefly review the Myhill-Nerode theorem.

Definition 4 (Deterministic Finite Automata)

Let Σ\Sigma be a finite set (the input alphabet). A deterministic finite automaton (DFA) over Σ\Sigma consists of the following data:

  • a finite set QQ (the set of states);

  • a function δ:Q×ΣQ\delta:Q\times\Sigma\to Q (the transition function);

  • a state q0Qq_{0}\in Q (the initial state); and

  • a set of states FQF\subseteq Q (the set of accepting states).

As is typical, Σ\Sigma^{*} will denote the set of finite strings over Σ\Sigma. Given an input string x=x1x2xnΣx=x_{1}x_{2}\cdots x_{n}\in\Sigma^{*}, and a DFA MM, define the run of MM on xx to be the sequence of states α0,,αnQ\alpha_{0},\ldots,\alpha_{n}\in Q such that α0=q0\alpha_{0}=q_{0}, and for 1in1\leq i\leq n, we have that δ(αi1,xi)=αi\delta(\alpha_{i-1},x_{i})=\alpha_{i}. A string xΣx\in\Sigma^{*} is accepted by MM if the last state appearing in the run of MM on xx is in FF. A language is a function L:Σ{0,1}L:\Sigma^{*}\to\{0,1\}. We note that languages are often defined as subsets of Σ\Sigma^{*}, but here we use functions in order to align with our definition of a concept. A language is recognized by a DFA MM if MM accepts a string xx if and only if xLx\in L. We say that a language LL is regular if it is recognized by some DFA.

The Myhill-Nerode theorem provides a useful syntactic characterization of regular languages. Given a language L:Σ{0,1}L:\Sigma^{*}\to\{0,1\}, define an equivalence relation L\equiv_{L} on Σ\Sigma^{*} as follows: for strings x,yΣx,y\in\Sigma^{*}, we say that xLyx\equiv_{L}y iff L(xz)=L(yz)L(xz)=L(yz) for all zΣz\in\Sigma^{*}.

Theorem 2.2 (Myhill-Nerode)

A language LL is regular if and only if L\equiv_{L} has finitely many equivalence classes. Moreover, L\equiv_{L} has exactly nn classes if and only if the minimal DFA recognizing LL has exactly nn states.

3 Learning Advice DFAs

3.1 Overview of advice DFAs

In this subsection, we give an introduction to automata with advice. For a more comprehensive overview, see [15].

Definition 5 (Advice DFA)

Let Σ\Sigma and Γ\Gamma be finite sets (the input and advice alphabets, respectively). An advice DFA MM over Σ\Sigma with advice from Γ\Gamma consists of the following data:

  • a finite set QQ (the set of states);

  • an infinite length string AΓωA\in\Gamma^{\omega} over the advice alphabet (the advice string);

  • a function δ:Q×Σ×ΓQ\delta:Q\times\Sigma\times\Gamma\to Q (the transition function);

  • a state q0Qq_{0}\in Q (the initial state); and

  • a set of states FQF\subseteq Q (the set of accepting states).

Given an input string x=x1x2xnΣx=x_{1}x_{2}\cdots x_{n}\in\Sigma^{*} and advice DFA MM, define the run of MM on xx to be the sequence of states α0,,αnQ\alpha_{0},\ldots,\alpha_{n}\in Q such that α0=q0\alpha_{0}=q_{0}, and for 1in1\leq i\leq n, we have that δ(αi1,xi,Ai)=αi\delta(\alpha_{i-1},x_{i},A_{i})=\alpha_{i}. A string xΣx\in\Sigma^{*} is accepted by MM if the last state appearing in the run of MM on xx is in FF. A language LL is recognized by an advice DFA MM if MM accepts a string xx if and only if xLx\in L. We say that a language LL is regular with advice if it is recognized by some advice DFA.

Advice DFAs extend classical DFAs with the addition of the (infinite) advice string AA (which is fixed beforehand as part of the automaton). The advice string is read in parallel with the input string; i.e., when MM reads the nthn^{\text{th}} character of the input, it also has access to the nthn^{\text{th}} character of the advice when deciding which transition to make. One way to think about the advice string is that it allows the transition function to vary at each step of the computation (although at a fixed step ii, the transition behavior is the same regardless of the input string).

Advice DFAs satisfy a Myhill-Nerode characterization, under a variant of the L\equiv_{L} relation. Define an equivalence relation L,m\equiv_{L,m} on Σm\Sigma^{m} by xL,myx\equiv_{L,m}y iff xzLyzLxz\in L\iff yz\in L for all zΣz\in\Sigma^{*}. Notice that L,m\equiv_{L,m} is simply L\equiv_{L} restricted to strings of length mm.

Theorem 3.1 (Myhill-Nerode for advice DFAs, cf. [15, Theorem 4])

Let LL be a language.

  1. 1.

    Suppose LL is accepted by an advice DFA that has nn states. Then L,m\equiv_{L,m} has at most nn classes for all mm\in\mathbb{N}.

  2. 2.

    Suppose L,m\equiv_{L,m} has at most nn classes for all mm\in\mathbb{N}. Then there is an advice DFA on 2n2n states that recognizes LL.

In general, the bound of 2n2n states in the second item is tight. A example witnesing this is given in Appendix 0.A.1. Also, note that this statement is more precise than the original version in [15]; in particular, the relationship between the number of states and the number of L,m\equiv_{L,m}-classes does not appear in the original theorem. However, this relationship is easily derived from its proof.

3.2 Learning bound for advice DFAs

For the remainder of the section, fix kk\in\mathbb{N} and let Σ\Sigma be an alphabet of size kk. We will not fix the advice alphabet Γ\Gamma; however, notice that if we are constructing an automaton with at most nn states, we may take Γ\Gamma to have size nnkn^{nk}, since the advice string can be thought of as coding the transition function at a given step, and there are nnkn^{nk} functions from Q×ΣQQ\times\Sigma\to Q (possible transition functions).

Note that the language consisting of the finite prefixes of any ω\omega-word AA is regular with advice AA, and each of these languages is distinct, so there are uncountably many languages that are regular with advice. Thus we cannot have any finitary representation of arbitrary languages that are regular with advice. Hence, we consider the case where we restrict to strings of bounded length. Let kadv(n,m)\mathcal{L}^{\text{adv}}_{k}(n,m) denote the set

kadv(n,m):={LΣm\displaystyle\mathcal{L}^{\text{adv}}_{k}(n,m):=\{L\subseteq\Sigma^{\leq m}\ \mid\ L is recognized by an advice\displaystyle L\text{ is recognized by an advice}
DFA with at most n states},\displaystyle\text{DFA with at most }n\text{ states}\},

and let k(n,m)\mathcal{E}_{k}(n,m) denote the set

k(n,m):={LΣm\displaystyle\mathcal{E}_{k}(n,m):=\{L\subseteq\Sigma^{\leq m}\ \mid\ L, has at most\displaystyle\equiv_{L,\ell}\text{ has at most }
n classes for all m}.\displaystyle n\text{ classes for all }\ell\leq m\}.

By Theorem 3.1, kadv(n,m)k(n,m)kadv(2n,m)\mathcal{L}^{\text{adv}}_{k}(n,m)\subseteq\mathcal{E}_{k}(n,m)\subseteq\mathcal{L}^{\text{adv}}_{k}(2n,m) for any n,mn,m\in\mathbb{N}. Because of this, it is more convenient to compute the query complexity of kadv(n,m)\mathcal{L}^{\text{adv}}_{k}(n,m) with queries kadv(2n,m)\mathcal{L}^{\text{adv}}_{k}(2n,m).

Proposition 1

The consistency dimension of kadv(n,m)\mathcal{L}^{\text{adv}}_{k}(n,m) with respect to
kadv(2n,m)\mathcal{L}^{\text{adv}}_{k}(2n,m) is at most n(n+1)n(n+1).

Proof

Let L:Σm{0,1}L:\Sigma^{\leq m}\to\{0,1\}, and suppose that Lkadv(2n,m)L\notin\mathcal{L}^{\text{adv}}_{k}(2n,m). Since k(n,m)kadv(2n,m)\mathcal{E}_{k}(n,m)\subseteq\mathcal{L}^{\text{adv}}_{k}(2n,m), we have that Lk(n,m)L\notin\mathcal{E}_{k}(n,m). This means that there are strings x0,,xnΣx_{0},\ldots,x_{n}\in\Sigma^{\ell} which are pairwise L,\equiv_{L,\ell}-inequivalent for some m\ell\leq m. For each 0i<jn0\leq i<j\leq n, let zijΣz_{ij}\in\Sigma^{*} such that L(xizij)L(xjzij)L(x_{i}z_{ij})\neq L(x_{j}z_{ij}) (i.e., zijz_{ij} distinguishes xix_{i} and xjx_{j} according to L,\equiv_{L,\ell}). Consider the set

B={xkzij0i<jn,k=i,j},B=\{x_{k}z_{ij}\mid 0\leq i<j\leq n,k=i,j\},

which has size 2(n+12)=n(n+1)2\binom{n+1}{2}=n(n+1). Let L:Σm{0,1}L^{\prime}:\Sigma^{\leq m}\to\{0,1\} extend L|BL|_{B}. LL^{\prime} must have at least n+1n+1 L,\equiv_{L^{\prime},\ell}-classes, since x0,,xnx_{0},\ldots,x_{n} are forced to be L,\equiv_{L^{\prime},\ell}-inequivalent. Hence Lk(n,m)L^{\prime}\notin\mathcal{E}_{k}(n,m), so in particular Lkadv(n,m)L^{\prime}\notin\mathcal{L}^{\text{adv}}_{k}(n,m). Since we found a set of size n(n+1)n(n+1) which witnesses the fact that LL is nn-inconsistent with kadv(n,m)\mathcal{L}^{\text{adv}}_{k}(n,m), the consistency dimension of kadv(n,m)\mathcal{L}^{\text{adv}}_{k}(n,m) with respect to kadv(2n,m)\mathcal{L}^{\text{adv}}_{k}(2n,m) is at most n(n+1)n(n+1).

Proposition 2

The Littlestone dimension of kadv(n,m)\mathcal{L}^{\text{adv}}_{k}(n,m) is O(nmklogn)O(nmk\log n).

Proof

We can bound the Littlestone dimension of kadv(n,m)\mathcal{L}^{\text{adv}}_{k}(n,m) by bounding the size of kadv(n,m)\mathcal{L}^{\text{adv}}_{k}(n,m) and applying 1. As noted at the beginning of the section, we may interpret Γ\Gamma as coding all possible transition functions, so the advice string AA simply tells the automaton which transition function to use at each step.

Consider an advice DFA such that Q=[n]Q=[n] and q0=1q_{0}=1. Then to fully specify the behavior of the automaton on strings of length mm, it is enough to choose one transition function for each step, as well as the set of accepting states. There are (nnk)m=nnmk(n^{nk})^{m}=n^{nmk} ways to choose the transition functions, and 2n2^{n} ways to choose the set of accepting states. So there are nnmk2nn^{nmk}2^{n} total possible advice DFAs of this form.

Now notice that every language on strings of length up to mm that is accepted by an advice DFA with at most nn states is accepted by an advice DFA of this form, simply by appropriately permuting the states and the letters of the advice string. So indeed the size of the class kadv(n,m)\mathcal{L}^{\text{adv}}_{k}(n,m) is upper bounded by nnmk2nn^{nmk}2^{n}. Therefore

Ldim(kadv(n,m))\displaystyle\operatorname{Ldim}(\mathcal{L}^{\text{adv}}_{k}(n,m)) log|kadv(n,m)|\displaystyle\leq\log|\mathcal{L}^{\text{adv}}_{k}(n,m)|
=log(nnmk2n)\displaystyle=\log(n^{nmk}2^{n})
=nmklogn+n\displaystyle=nmk\log n+n
=O(nmklogn).\displaystyle=O(nmk\log n).

Combining Propositions 1 and 2 with Theorem 2.1(1), we obtain:

\adviceLC

4 Learning Nominal DFAs

4.1 Overview of nominal DFAs

In this subsection, we give an introduction to nominal automata. For a more comprehensive treatment, see [5].

As mentioned in the introduction, one must leverage some underlying structure to properly generalize automata theory to infinite alphabets. For example, consider the alphabet A=A=\mathbb{N}, and define L:A{0,1}L:A^{*}\to\{0,1\} by L(w)=1L(w)=1 if and only if w=aaw=aa for some aAa\in A. An equivalently defined language over a finite alphabet is easily seen to be regular, and so we expect this example to be “regular” as well. We can draw an infinite automaton that recognizes LL in the following way:

qIq_{I}q0q_{0}q1q_{1}\vdotsqAq_{A}qRq_{R}0110110\neq 01\neq 1AAAA

Notice that in some cases, we have combined infinitely many transitions into a single arrow—here, we assume that we are able to compare whether or not two data values are equal, and so are leveraging a form of symmetry. We can further condense this representation into an actually finite diagram:

qIq_{I}qxq_{x}qAq_{A}qRq_{R}xA\forall x\in Axxxxx\neq xAAAA

We can compress the distinct states for each character from AA into a single “state” because it is enough to be able to compare whether or not the first and second characters read are equal or not. In order to formalize this intuition, we use the notion of nominal sets. Note: we do not work in the most general setting of [5], and instead only focus on what they call the equality symmetry. However, it is reasonable to expect that our results can generalize to other well-behaved symmetries.

Given a set AA, let Sym(A)\operatorname{Sym}(A) denote the group of permutations on AA, i.e., the set of bijections from AA to AA with the operation of function composition. For the rest of the paper, let GG denote Sym()\operatorname{Sym}(\mathbb{N}) Given a set XX, a (left) action of GG on XX is an operation :G×XX\cdot:G\times X\to X such that:

  1. 1.

    for all xXx\in X, ex=xe\cdot x=x, where ee is the identity function on \mathbb{N}, and

  2. 2.

    for all π,πG\pi,\pi^{\prime}\in G and xXx\in X, (ππ)x=π(πx)(\pi\circ\pi^{\prime})\cdot x=\pi\cdot(\pi^{\prime}\cdot x).

Example 1

Let X=2X=\mathbb{N}^{2}. Define an action of GG on XX by π(n,m)=(π(n),π(m))\pi\cdot(n,m)=(\pi(n),\pi(m)). Observe that there is a permutation π\pi for which π(n1,m1)=(n2,m2)\pi\cdot(n_{1},m_{1})=(n_{2},m_{2}) if and only if either (1) n1=m1n_{1}=m_{1} and n2=m2n_{2}=m_{2}, or (2) n1m1n_{1}\neq m_{1} and n2m2n_{2}\neq m_{2}. This example illustrates the action of Sym()\operatorname{Sym}(\mathbb{N}) can be used to formalize the idea of being able to compare data values for equality.

Definition 6 (Support; Least Support)

Fix XX and an action of GG on XX. Given a subset DD\subseteq\mathbb{N} and an element xXx\in X, we say that DD supports xx if for every πG\pi\in G such that π\pi fixes every element of DD, we have that πx=x\pi\cdot x=x.

A finite set DD\subseteq\mathbb{N} is called the least support of xx if

  1. 1.

    DD supports xx,

  2. 2.

    no proper subset of DD supports xx, and

  3. 3.

    no other finite subset of \mathbb{N} has properties (1) and (2).

We will use supp(x)supp(x) to denote the least support of xx.

An equivalent characterization of an element having least support is that the intersection of any two finite supports of xx also supports xx.

Definition 7 (Nominal Set)

A nominal set is a set XX along with an action of GG on XX such that every element of XX has a least support.

Lemma 1 ([5, Lemma 4.9])

Let XX be a nominal set and let xXx\in X. If DD supports xx, then for any πG\pi\in G, π(D)\pi(D) supports πx\pi\cdot x.

As a corollary, |supp(x)|=|supp(πx)||supp(x)|=|supp(\pi\cdot x)| for any πG\pi\in G.

Definition 8 (Nominal Dimension)

Let XX be a nominal set. The nominal dimension of XX is supxX|supp(x)|\displaystyle\sup_{x\in X}|supp(x)| (i.e., the largest size of a least support).

In the literature on nominal sets, this is simply referred to as the dimension of XX. However, in order to prevent confusion with other notions of dimension used in this paper, we will use the term nominal dimension.

Definition 9 (Orbit; Orbit-finiteness)

Let XX be a nominal set. The orbit of an element xXx\in X, denoted GxG\cdot x, is the set

Gx={πxπG}.G\cdot x=\{\pi\cdot x\mid\pi\in G\}.

Every nominal set is partitioned into the disjoint union of its orbits. We say that XX is orbit-finite if XX is the union of only finitely many orbits.

By 1, elements in the same orbit have the same size of least support, so orbit-finite nominal sets have finite nominal dimension.

Example 2

Let X=2X=\mathbb{N}^{2}, and define the action of GG on XX as in 1. The orbits of XX are G(n,n)G\cdot(n,n) and G(n,m)G\cdot(n,m), where n,mn,m\in\mathbb{N} and nmn\neq m. These are the sets of pairs whose coordinate are either equal or unequal, respectively. Thus XX is orbit-finite with two orbits. Additionally, every element (n,m)(n,m) has least support {n,m}\{n,m\}, and so the nominal dimension of XX is 2.

For i=1,,ni=1,\ldots,n, let XiX_{i} be a set with an action of GG on XiX_{i}. The pointwise action of GG on i=1nXi\prod_{i=1}^{n}X_{i} is the action given by π(x1,,xn)=(πx1,,πxn)\pi\cdot(x_{1},\ldots,x_{n})=(\pi\cdot x_{1},\ldots,\pi\cdot x_{n}).

Definition 10 (Equivariance)

Let XX be a nominal set. A subset YXY\subseteq X is equivariant if for any yYy\in Y and πG\pi\in G, πyY\pi\cdot y\in Y.

A relation RR on i=1nXi\prod_{i=1}^{n}X_{i}, where each XiX_{i} is a nominal set, is equivariant if it is equivariant when considered as a subset of the product equipped with the pointwise action.

In particular, a function f:XYf:X\to Y is equivariant exactly when

f(πx)=πf(x).f(\pi\cdot x)=\pi\cdot f(x).
Remark 3

YY is an equivariant subset of XX if and only if YY is a union of orbits.

The following lemmas demonstrate how equivariant functions behave nicely with orbits and supports.

Lemma 2

Let f:XYf:X\to Y be an equivariant function, and let xXx\in X. The image of the orbit of xx (in XX) under ff is equal to the orbit of f(x)f(x) (in YY).

As a corollary, if ff is surjective, then YY has at most as many orbits as XX.

Proof

The orbit of f(x)f(x) is the set {πf(x)πG}\{\pi\cdot f(x)\mid\pi\in G\}. By equivariance of ff, this is equal to {f(πx)πG}=f({πxπG})\{f(\pi\cdot x)\mid\pi\in G\}=f(\{\pi\cdot x\mid\pi\in G\}), which is the image under ff of the orbit of xx.

Lemma 3 ([5, Lemma 4.8])

Let f:XYf:X\to Y be an equivariant function, xXx\in X, and DD\subseteq\mathbb{N}. If DD supports xx, then DD supports f(x)f(x).

As a corollary, supp(f(x))supp(x)supp(f(x))\subseteq supp(x), and hence if ff is surjective, the nominal dimension of YY is at most the nominal dimension of XX.

We will work extensively with quotients by equivariant equivalence relations, so we state some important general facts about them. Recall that if RR is an equivalence relation on XX, X/RX/R denotes the set of equivalence classes of RR, and is called the quotient of XX by RR.

Lemma 4 ([5, Lemma 3.5])

Let XX be a nominal set and RX×XR\subseteq X\times X be an equivariant equivalence relation. Then the quotient X/RX/R is a nominal set, under the action π[x]R=[πx]R\pi\cdot[x]_{R}=[\pi\cdot x]_{R}, and the quotient map x[x]Rx\mapsto[x]_{R} is a surjective equivariant function.

Lemma 5

Let XX be a nominal set and RX×XR\subseteq X\times X be an equivariant equivalence relation. Then supp([x]R)supp(x)supp([x]_{R})\subseteq supp(x) for every xXx\in X, and the nominal dimension of X/RX/R is at most the nominal dimension of XX.

Proof

This follows immediately from 3 and the fact that the quotient map x[x]Rx\mapsto[x]_{R} is surjective and equivariant.

Lemma 6

Let X,YX,Y be nominal sets. An equivariant function F:XYF:X\to Y and equivariant equivalence relation Y\equiv_{Y} on YY induce an equivariant equivalence relation X\equiv_{X} on XX and an equivariant function f:X/XY/Yf:X/\equiv_{X}\to Y/\equiv_{Y}.

Furthermore, ff is injective, and if FF is surjective, then ff is also surjective.

A proof of 6 is given in Appendix 0.B.1.

We are now ready to define nominal DFAs and state the nominal Myhill-Nerode theorem. Fix an orbit-finite nominal set AA, which we will call the GG-alphabet. The action of GG on AA naturally extends to AA^{*}: given a string wAw\in A^{*}, πw\pi\cdot w is the string obtained by letting π\pi act on each individual character of ww.

Definition 11 (GG-language)

A GG-language is a function L:A{0,1}L:A^{*}\to\{0,1\} such that the set {xAL(x)=1}\{x\in A^{*}\mid L(x)=1\} is an equivariant subset of AA^{*}.

Definition 12 (Nominal DFA)

Let AA be an orbit-finite nominal set (the input alphabet). A nominal DFA MM over AA consists of the following data:

  • an orbit-finite nominal set QQ (the set of states);

  • an equivariant function δ:Q×AQ\delta:Q\times A\to Q (the transition function)

  • a state q0Qq_{0}\in Q such that the orbit of q0q_{0} is {q0}\{q_{0}\} (the initial state)

  • an equivariant subset FQF\subseteq Q (the set of accepting states)

As a shorthand, we say that a nominal DFA MM has nn orbits or nominal dimension kk when the state set has nn orbits or nominal dimension kk.

Given an input string x=x1x2xnAx=x_{1}x_{2}\cdots x_{n}\in A^{*}, define the run of MM on xx to be the sequence of states α0,,αnQ\alpha_{0},\ldots,\alpha_{n}\in Q such that α0=q0\alpha_{0}=q_{0}, and for 1in1\leq i\leq n, we have that δ(αi1,xi)=αi\delta(\alpha_{i-1},x_{i})=\alpha_{i}. A string xΣx\in\Sigma^{*} is accepted by MM if the last state appearing in the run of MM on xx is in FF. We say that a langauge LL is recognized by a nominal DFA MM if MM accepts a string xx if and only if xLx\in L. As noted in [5, Definition 3.1], the language recognized by a nominal DFA must be a GG-language. We say that a GG-language LL is nominal regular if it is recognized by some nominal DFA.

Example 3

Let A=A=\mathbb{N}, and let L:A{0,1}L:A\to\{0,1\} be defined by L(w)=1L(w)=1 if and only if w=aaw=aa for some aAa\in A as in the previous example. It is straightfoward to see this language is a GG-language. Additionally, in the automaton that recognizes LL, we can define an action of GG on the set of states by σqi=qσ(i)\sigma\cdot q_{i}=q_{\sigma(i)} for any σG\sigma\in G and ii\in\mathbb{N}, while σq=q\sigma\cdot q=q for all σG\sigma\in G for the states qI,qA,q_{I},q_{A}, and qRq_{R}. This automaton is a nominal DFA, and hence LL is nominal regular.

Let LL be a GG-language, and define the usual relation L\equiv_{L} on AA^{*} by: xLyx\equiv_{L}y if and only if for all zAz\in A^{*}, we have L(xz)=L(yz)L(xz)=L(yz). This relation is equivariant (see [5, Lemma 3.4]), and hence by 4, A/LA^{*}/\equiv_{L} is a nominal set. We will write [x]L[x]_{L} to denote the L\equiv_{L}-equivalence class of a string xAx\in A^{*}.

Theorem 4.1 (Myhill-Nerode for nominal DFAs [5, Theorem 5.2])

Let AA be an orbit-finite nominal set, and let LL be a GG-language. Then the following are equivalent:

  1. 1.

    A/LA^{*}/\equiv_{L} has at most nn orbits and has nominal dimension at most kk;

  2. 2.

    LL is nominal regular, and in particular is recognized by a nominal DFA with at most nn orbits and nominal dimension at most kk.

Note that this statement is more precise than the original version in [5]; in particular, the conditions on the number of orbits and the nominal dimension do not appear in the original theorem. However, these conditions are easily derived from its proof. For completeness, a proof of the correspondence between the number of orbits and the nominal dimension is given in Appendix 0.B.2.

4.2 Littlestone dimension of nominal DFAs

To prove bounds on the Littlestone dimension of nominal automata, we wish to bound the number of possible nominal automata. Since automata involve transition functions, we will need to understand products of nominal sets. Nominality is easily seen to be preserved under Cartesian products:

Proposition 3

The product of two nominal sets is nominal. In particular, the nominal dimension of the product of two nominal sets X×YX\times Y is at most the sum of the nominal dimensions of XX and YY.

Proof

Suppose XX and YY are nominal with nominal dimensions kk and \ell, respectively. Let (x,y)X×Y(x,y)\in X\times Y. Notice that supp(x)supp(y)supp(x)\cup supp(y), which has size at most k+k+\ell, supports (x,y)X×Y(x,y)\in X\times Y.

On the other hand, in the most general setting, products of orbit-finite nominal sets need not be orbit-finite [5, Example 2.5]. However, in our context of Sym()\operatorname{Sym}(\mathbb{N}), products of orbit-finite sets are known to be orbit-finite [5, Section 10]. We will need an explicit bound on the number of orbits of products of orbit-finite sets, which is the purpose of the following discussion.

For a natural number kk, define (k):={(a1,,ak)aiaj for ij}\mathbb{N}^{(k)}:=\{(a_{1},\ldots,a_{k})\mid a_{i}\neq a_{j}\text{ for }i\neq j\}. (k)\mathbb{N}^{(k)} is a single-orbit nominal set when equipped with the pointwise action of GG.

Lemma 7 ([5, Lemma 4.13])

Given a nominal set XX of nominal dimension kk that has exactly one orbit, there is an equivariant surjection fX:(k)Xf_{X}:\mathbb{N}^{(k)}\to X.

Definition 13 (cf. [16, Section 3])

Given k1,,knk_{1},\ldots,k_{n}\in\mathbb{N}, let f(k1,,kn)f_{\mathbb{N}}(k_{1},\ldots,k_{n}) denote the number of orbits of (k1)××(kn)\mathbb{N}^{(k_{1})}\times\cdots\times\mathbb{N}^{(k_{n})}.

Proposition 4 (cf. [16, Section 3])

Let XiX_{i} be a nominal set with i\ell_{i} orbits and nominal dimension kik_{i} for i=1,,ni=1,\ldots,n. Let X:=X1××XnX:=X_{1}\times\cdots\times X_{n} and =1n\ell=\ell_{1}\cdots\ell_{n}. Then XX has at most f(k1,,kn)\ell f_{\mathbb{N}}(k_{1},\ldots,k_{n}) many orbits.

4 is stated but not proven in [16]. For completeness, we include a proof in Appendix 0.B.3. In light of 4, we can find bounds on the number of orbits of a product of nominal sets by bounding the value of ff_{\mathbb{N}}. We do so for the case n=2n=2:

Proposition 5

Suppose (without loss of generality) that k1k2k_{1}\geq k_{2}. Then

(k1e)k2(k1k2)k2!f(k1,k2)(2k1)k2.\left(\frac{k_{1}}{e}\right)^{k_{2}}\leq\binom{k_{1}}{k_{2}}k_{2}!\leq f_{\mathbb{N}}(k_{1},k_{2})\leq(2k_{1})^{k_{2}}.

In particular, if k2k_{2} is a constant, then f(k1,k2)=Θ(k1k2)f_{\mathbb{N}}(k_{1},k_{2})=\Theta(k_{1}^{k_{2}}).

The proof of 5 is given in Appendix 0.B.4. 5 will help us to calculate an explicit upper bound on the Littlestone dimension of nominal automata later, but we can first use it to substantiate a comment from the introduction about previously known query bounds.

The bound in [16, Corollary 1] involves a f(p(n+m),pn(k+klogk+1))f_{\mathbb{N}}(p(n+m),pn(k+k\log k+1)) factor, where nn is the number of orbits of the state set of the target automaton, kk is the nominal dimension of the target automaton, pp is the nominal dimension of the alphabet, and mm is the length of the longest counterexample. Notice that ff_{\mathbb{N}} is non-decreasing in all coordinates. Therefore, f(p(n+m),pn(k+klogk+1))f_{\mathbb{N}}(p(n+m),pn(k+k\log k+1)) is lower bounded by both f(pn,pnk)f_{\mathbb{N}}(pn,pnk) and f(m,nk)f_{\mathbb{N}}(m,nk). Applying the lower bounds from 5 yields the following:

Remark 4

The bound on the (EQ+MQ)-query complexity of nominal automata given in [16, Corollary 1] is at least (knn!)p(k^{n}n!)^{p}, and at least min((nke)m,(me)nk)\min\left(\left(\frac{nk}{e}\right)^{m},\left(\frac{m}{e}\right)^{nk}\right).

We are now at a point where we can calculate a bound on the number of possible nominal DFAs and hence bound the Littlestone dimension. First, we state a general result on the number of single-orbit nominal sets:

Proposition 6

The number of distinct (up to isomorphism) single-orbit nominal sets of nominal dimension at most kk is at most 2O(k2)2^{O(k^{2})}.

The proof of 6 is given in Appendix 0.B.5. We note that it is a completely general result about nominal sets and may be of independent interest. For us, it will allow us to bound the number of possible state sets.

Lemma 8

The number of possible state sets for a nominal DFA with nn orbits and nominal dimension kk is 2O(nk2)2^{O(nk^{2})}.

Proof

Since the state set QQ is an orbit-finite nominal set with nn orbits and nominal dimension kk, we count the number of such sets. We can view QQ as the disjoint union of nn single-orbit nominal sets, each with nominal dimension at most kk, just by considering each orbit independently. By 6, the number of single-orbit nominal sets of nominal dimension k\leq k is at most 2O(k2)2^{O(k^{2})}. An upper bound on the number of nominal sets with nn orbits and nominal dimension kk can be found by raising this number to nn, i.e., (2O(k2))n=2O(nk2)\left(2^{O(k^{2})}\right)^{n}=2^{O(nk^{2})}.

Now, we count the number of possible transition behaviors. Fix an input alphabet AA, where AA has \ell orbits and nominal dimension pp.

Lemma 9

The number of possible transition behaviors for a nominal DFA with nn orbits and nominal dimension kk is at most

(n(k+p)!)O(nkp).\left(n(k+p)!\right)^{O\left(nk^{p}\right)}.

The proof of 9 is given in Appendix 0.B.6

Let Anom(n,k)\mathcal{L}^{\text{nom}}_{A}(n,k) denote the set of GG-languages over AA recognized by a nominal DFA with at most nn orbits and nominal dimension at most kk.

Proposition 7

The Littlestone dimension of Anom(n,k)\mathcal{L}^{\text{nom}}_{A}(n,k) is at most

O(nkp(logn+klogk))O\left(nk^{p}\left(\log n+k\log k\right)\right)
Proof

We bound the Littlestone dimension by bounding the size of Anom(n,k)\mathcal{L}^{\text{nom}}_{A}(n,k). To choose a nominal DFA with at most nn orbits and nominal dimension at most kk, we must choose the state set, transition function, initial state, and set of accepting states. By 8, there are at most 2O(nk2)2^{O(nk^{2})} choices for the state set. By 9, there are at most (n(k+p)!)O(nkp)\left(n(k+p)!\right)^{O(nk^{p})} choices for the transition function. There are at most nn choices for the initial state (since the initial state must be an orbit of its own) and 2n2^{n} choices for the accepting states (since the set of accepting states is a union of orbits). So in total, we can upper bound |Anom(n,k)||\mathcal{L}^{nom}_{A}(n,k)| by

|Anom(n,k)|2O(nk2)(n(k+p)!)O(nkp)n2n.|\mathcal{L}^{nom}_{A}(n,k)|\leq 2^{O\left(nk^{2}\right)}\left(n(k+p)!\right)^{O\left(nk^{p}\right)}n2^{n}.

Applying 1, taking logs, using Stirling’s approximation, and remembering that pp is constant, we find that

Ldim(Anom(n,k))\displaystyle\operatorname{Ldim}(\mathcal{L}^{nom}_{A}(n,k)) log|Anom(n,k)|\displaystyle\leq\log|\mathcal{L}^{nom}_{A}(n,k)|
O(nk2)+O(nkp)(logn+log((k+p)!))+logn+n\displaystyle\leq O\left(nk^{2}\right)+O\left(nk^{p}\right)(\log n+\log((k+p)!))+\log n+n
O(nkp(logn+(k+p)log(k+p)))\displaystyle\leq O\left(nk^{p}\left(\log n+(k+p)\log(k+p)\right)\right)
=O(nkp(logn+klogk)).\displaystyle=O\left(nk^{p}\left(\log n+k\log k\right)\right).

4.3 Consistency dimension of nominal DFAs

It remains to bound the consistency dimension. To do this, we need to show that if LL is a language that is not recognized by a nominal DFA with at most nn orbits and nominal dimension at most kk, then there is a set of strings BB of bounded size such that the restriction of LL to BB cannot be extended to any GG-language recognized by such a nominal DFA. By the nominal Myhill-Nerode theorem, LL not being recognized by a nominal DFA with at most nn orbits and nominal dimension at most kk is equivalent to A/LA^{*}/\equiv_{L} either having greater than nn orbits or having nominal dimension greater than kk. We first address the case where A/LA^{*}/\equiv_{L} has large nominal dimension.

Lemma 10

Let XX be any nominal set, let xXx\in X, and let D=supp(x)D=supp(x). Suppose that τG\tau\in G fixes every element of DD except for one. Then τxx\tau\cdot x\neq x.

The proof of this lemma is given in Appendix 0.B.7.

Proposition 8

Let LL be a GG-language over alphabet AA, and suppose that A/LA^{*}/\equiv_{L} has nominal dimension at least k+1k+1. Then there is a set BAB\subseteq A^{*} of size 2(k+1)2(k+1) such that for any GG-language LL^{\prime}, if L|B=L|BL|_{B}=L^{\prime}|_{B}, then A/LA^{*}/\equiv_{L^{\prime}} also has nominal dimension at least k+1k+1.

Proof

Since A/LA^{*}/\equiv_{L} has nominal dimension k+1\geq k+1, there is some x0Ax_{0}\in A^{*} such that |supp([x0]L)|k+1|supp([x_{0}]_{L})|\geq k+1. Let DD denote supp([x0]L)supp([x_{0}]_{L}), and let D0D_{0} be a subset of DD of size k+1k+1. Let j=max(D)+1j=\max(D)+1, and for each iD0i\in D_{0}, let τi=(ii+j)G\tau_{i}=(i\enspace i+j)\in G (the permutation that swaps ii and i+ji+j). Notice that τi\tau_{i} fixes all but one element of DD, and so by 10, [τix0]L=τi[x0]L[x0]L[\tau_{i}\cdot x_{0}]_{L}=\tau_{i}\cdot[x_{0}]_{L}\neq[x_{0}]_{L}. Thus for each iD0i\in D_{0}, there is some ziAz_{i}\in A^{*} such that L((τix0)zi)L(x0zi)L((\tau_{i}\cdot x_{0})z_{i})\neq L(x_{0}z_{i}). Let

B={(τix0)ziiD0}{x0ziiD0},B=\{(\tau_{i}\cdot x_{0})z_{i}\mid i\in D_{0}\}\cup\{x_{0}z_{i}\mid i\in D_{0}\},

which has size 2(k+1)2(k+1).

Now, let LL^{\prime} be a GG-language extending L|BL|_{B}, and suppose for contradiction that A/LA^{*}/\equiv_{L^{\prime}} has nominal dimension at most kk. That is, for every wAw\in A^{*}, |supp([w]L)|k|supp([w]_{L^{\prime}})|\leq k. In particular, |supp([x0]L)|k|supp([x_{0}]_{L^{\prime}})|\leq k. For each iD0i\in D_{0}, LL^{\prime} agrees with LL on (τix0)zi(\tau_{i}\cdot x_{0})z_{i} and x0zix_{0}z_{i}, so L((τix0)zi)L(x0zi)L^{\prime}((\tau_{i}\cdot x_{0})z_{i})\neq L^{\prime}(x_{0}z_{i}). This shows that τi[x0]L=[τix0]L[x0]L\tau_{i}\cdot[x_{0}]_{L^{\prime}}=[\tau_{i}\cdot x_{0}]_{L^{\prime}}\neq[x_{0}]_{L^{\prime}}. Since the only elements not fixed by τi\tau_{i} are ii and i+ji+j, it must be that at least one of ii and i+ji+j are in the least support of [x0]L[x_{0}]_{L^{\prime}}. Thus for each iD0i\in D_{0}, supp([x0]L)supp([x_{0}]_{L^{\prime}}) contains at least one of ii and i+ji+j. Since j>max(D)j>\max(D), all values i,i+ji,i+j for iD0i\in D_{0} are distinct, and so |supp([x0]L)||D0|=k+1|supp([x_{0}]_{L^{\prime}})|\geq|D_{0}|=k+1, which contradicts the fact that |supp([x0]L)|k|supp([x_{0}]_{L^{\prime}})|\leq k.

Next, we address the case where A/LA^{*}/\equiv_{L} has a large number of orbits.

Lemma 11

Let LL be a GG-language such that A/LA^{*}/\equiv_{L} has nn orbits. Then for every orbit G[x]LG\cdot[x]_{L} of A/LA^{*}/\equiv_{L}, there is a string xAx^{\prime}\in A^{*} such that |x|<n|x^{\prime}|<n such that [x]LG[x]L[x^{\prime}]_{L}\in G\cdot[x]_{L}.

The proof of 11 is given in Appendix 0.B.8.

Proposition 9

Let LL be a GG-language over alphabet AA, where AA has nominal dimension pp, and suppose that A/LA^{*}/\equiv_{L} has at least n+1n+1 orbits. Then there is a set BAB\subseteq A^{*} of size 2(n+12)(pnk)(3pn)k2\binom{n+1}{2}\binom{pn}{k}(3pn)^{k} such that for any GG-language LL^{\prime}, if L|B=L|BL|_{B}=L^{\prime}|_{B}, then A/LA^{*}/\equiv_{L^{\prime}} also has at least n+1n+1 orbits.

Proof

Since A/LA^{*}/\equiv_{L} has at least n+1n+1 orbits, there are strings x0,,xnx_{0},\ldots,x_{n} such that [xi]L[x_{i}]_{L} all belong to distinct orbits. That is, for every τG\tau\in G and 0i<jn0\leq i<j\leq n, [τxi]L=τ[xi]L[xj]L[\tau\cdot x_{i}]_{L}=\tau[x_{i}]_{L}\neq[x_{j}]_{L}, and thus there is zijτAz^{\tau}_{ij}\in A^{*} such that

L((τxi)zijτ)L(xjzijτ).L((\tau\cdot x_{i})z^{\tau}_{ij})\neq L(x_{j}z^{\tau}_{ij}).

Moreover, by 11, we may assume that |xi|n|x_{i}|\leq n for each 0in0\leq i\leq n.

For each 0in0\leq i\leq n, let Di=supp(xi)D_{i}=supp(x_{i}). Since the nominal dimension of AA is pp, we have that |Di|p|xi|pn|D_{i}|\leq p\cdot|x_{i}|\leq pn. Also, let DD be some subset of \mathbb{N} such that DD is disjoint from D0,,DnD_{0},\ldots,D_{n}, and |D|=max0in|Di|pn\displaystyle|D|=\max_{0\leq i\leq n}|D_{i}|\leq pn.

Now, for each 0i<jn0\leq i<j\leq n, let

Σij:={σ:D′′DiDjDD′′Di of size k and σ is an injection}.\Sigma_{ij}^{\prime}:=\{\sigma^{\prime}:D^{\prime\prime}\to D_{i}\cup D_{j}\cup D\ \mid\ D^{\prime\prime}\subseteq D_{i}\text{ of size }k\text{ and $\sigma^{\prime}$ is an injection}\}.

Notice that Σij\Sigma_{ij}^{\prime} has size at most (pnk)(3pn)k\binom{pn}{k}(3pn)^{k}: to choose a σ\sigma^{\prime}, we choose a subset of DiD_{i} of size kk and a value from DiDjDD_{i}\cup D_{j}\cup D for each of the kk inputs. For each σΣij\sigma^{\prime}\in\Sigma_{ij}^{\prime}, let σG\sigma\in G be an arbitrary but fixed extension of σ\sigma^{\prime} to a permutation of \mathbb{N}, and let Σij\Sigma_{ij} consist of all such σ\sigma (so |Σij|=|Σij|(pnk)(3pn)k|\Sigma_{ij}|=|\Sigma_{ij}^{\prime}|\leq\binom{pn}{k}(3pn)^{k}).

Now, let

B=\displaystyle B=\ {(σxi)zijσ 0i<jn,σΣij}\displaystyle\{(\sigma\cdot x_{i})z^{\sigma}_{ij}\ \mid\ 0\leq i<j\leq n,\ \sigma\in\Sigma_{ij}\}\cup
{xjzijσ 0i<jn,σΣij}.\displaystyle\{x_{j}z^{\sigma}_{ij}\ \mid\ 0\leq i<j\leq n,\ \sigma\in\Sigma_{ij}\}.

BB has size at most 2(n+12)(pnk)(3pn)k2\binom{n+1}{2}\binom{pn}{k}(3pn)^{k}. Let LL^{\prime} be a GG-language extending L|BL|_{B}, and assume for contradiction that A/LA^{*}/\equiv_{L^{\prime}} has at most nn orbits. Thus there must be 0i<jn0\leq i<j\leq n such that [xi]L[x_{i}]_{L^{\prime}} is in the same orbit as [xj]L[x_{j}]_{L^{\prime}}. Let πG\pi\in G such that π[xi]L=[xj]L\pi\cdot[x_{i}]_{L^{\prime}}=[x_{j}]_{L^{\prime}}. π\pi does not have to be in Σij\Sigma_{ij}, and so we cannot directly derive a contradiction. Instead, we will alter π\pi in order to obtain a σΣij\sigma\in\Sigma_{ij} such that σ[xi]L=[xj]L\sigma\cdot[x_{i}]_{L^{\prime}}=[x_{j}]_{L^{\prime}}.

Define πG\pi^{\prime}\in G in the following way: first, let π(Di)\pi(D_{i}) denote the image of DiD_{i} under π\pi. For each element aπ(Di)(DiDjD)a\in\pi(D_{i})\setminus(D_{i}\cup D_{j}\cup D), select a unique element baDπ(Di)b_{a}\in D\setminus\pi(D_{i}). This is possible because |D||Di|=|π(Di)||D|\geq|D_{i}|=|\pi(D_{i})|, and so |Dπ(Di)||π(Di)D||π(Di)(DiDjD)||D\setminus\pi(D_{i})|\geq|\pi(D_{i})\setminus D|\geq|\pi(D_{i})\setminus(D_{i}\cup D_{j}\cup D)|. Let π\pi^{\prime} be the permutation that transposes each aπ(Di)(DiDjD)a\in\pi(D_{i})\setminus(D_{i}\cup D_{j}\cup D) with its corresponding baDπ(Di)b_{a}\in D\setminus\pi(D_{i}), and fixes every other element of \mathbb{N}.

Notice that π|Dj=idDj\pi^{\prime}|_{D_{j}}=\mathrm{id}_{D_{j}}, and so πxj=xj\pi^{\prime}\cdot x_{j}=x_{j}. Also, if aDia\in D_{i}, then ππ(a)DiDjD\pi^{\prime}\circ\pi(a)\in D_{i}\cup D_{j}\cup D: if π(a)DiDjD\pi(a)\in D_{i}\cup D_{j}\cup D, then by definition π\pi^{\prime} does not affect π(a)\pi(a) and so π(π(a))=π(a)DiDjD\pi^{\prime}(\pi(a))=\pi(a)\in D_{i}\cup D_{j}\cup D, whereas if π(a)π(Di)(DiDjD)\pi(a)\in\pi(D_{i})\setminus(D_{i}\cup D_{j}\cup D), π\pi^{\prime} will transpose π(a)\pi(a) with an element in DD. Let DiD_{i}^{\prime} denote supp([xi]L)supp([x_{i}]_{L^{\prime}}). Since A/LA^{*}/\equiv_{L^{\prime}} has nominal dimension at most kk, |Di|k|D_{i}^{\prime}|\leq k. Also, by 5, Disupp(xi)=DiD_{i}^{\prime}\subseteq supp(x_{i})=D_{i}. Therefore, (ππ)|Di(\pi^{\prime}\circ\pi)|_{D_{i}^{\prime}} is an injection from a subset of DiD_{i} of size at most kk into DiDjDD_{i}\cup D_{j}\cup D. Therefore, there is some σΣij\sigma\in\Sigma_{ij} such that

σ|Di=(ππ)|Di.\sigma|_{D_{i}^{\prime}}=(\pi^{\prime}\circ\pi)|_{D_{i}^{\prime}}.

We may then deduce that

[σxi]L\displaystyle[\sigma\cdot x_{i}]_{L^{\prime}} =σ[xi]L\displaystyle=\sigma\cdot[x_{i}]_{L^{\prime}}
=(ππ)[xi]L\displaystyle=(\pi^{\prime}\circ\pi)\cdot[x_{i}]_{L^{\prime}}
=π(π[xi]L)\displaystyle=\pi^{\prime}\cdot(\pi\cdot[x_{i}]_{L^{\prime}})
=π[xj]L\displaystyle=\pi^{\prime}\cdot[x_{j}]_{L^{\prime}}
=[πxj]L\displaystyle=[\pi^{\prime}\cdot x_{j}]_{L^{\prime}}
=[xj]L.\displaystyle=[x_{j}]_{L^{\prime}}.

This means that for all zAz\in A^{*}, L((σxi)z)=L(xjz)L^{\prime}((\sigma\cdot x_{i})z)=L^{\prime}(x_{j}z). In particular, we may choose z=zijσz=z^{\sigma}_{ij}. However, since (σxi)zijσ(\sigma\cdot x_{i})z^{\sigma}_{ij} and xjzijσx_{j}z^{\sigma}_{ij} are in BB, it must be that

L((σxi)zijσ)\displaystyle L((\sigma\cdot x_{i})z^{\sigma}_{ij}) =L((σxi)zijσ)\displaystyle=L^{\prime}((\sigma\cdot x_{i})z^{\sigma}_{ij})
=L(xjzijσ)\displaystyle=L^{\prime}(x_{j}z^{\sigma}_{ij})
=L(xjzijσ),\displaystyle=L(x_{j}z^{\sigma}_{ij}),

a contradiction! So we may finally conclude that A/LA^{*}/\equiv_{L^{\prime}} has at least n+1n+1 orbits.

11 yields an interesting consequence, which we do not use in any of our proofs but may be of independent interest:

Corollary 1

If LL is a GG-language over an alphabet AA which has nominal dimension pp such that A/LA^{*}/\equiv_{L} has nn orbits, then A/LA^{*}/\equiv_{L} has nominal dimension at most (n1)p(n-1)p.

This result suggests that ultimately it is the number of orbits, and not the nominal dimension, of A/LA^{*}/\equiv_{L} which characterizes the complexity of LL. Its proof is given in Appendix 0.B.9.


We now have all the ingredients we need to bound the consistency dimension.

Proposition 10

The consistency dimension of Anom(n,k)\mathcal{L}^{nom}_{A}(n,k) with respect to itself is at most 2(n+12)(pnk)(3pn)k2\binom{n+1}{2}\binom{pn}{k}(3pn)^{k}.

Proof

Let L:A{0,1}L:A^{*}\to\{0,1\}, and suppose that LAnom(n,k)L\notin\mathcal{L}^{nom}_{A}(n,k), i.e., LL is not recognized by any nominal DFA with at most nn orbits and nominal dimension at most kk. We must find a set BB of at most 2(n+12)(pnk)(3pn)k2\binom{n+1}{2}\binom{pn}{k}(3pn)^{k} strings such that any function extending L|BL|_{B} is not in Anom(n,k)\mathcal{L}^{nom}_{A}(n,k).

Case 1: LL is not equivariant. Then there is x0Ax_{0}\in A^{*} and πG\pi\in G such that L(x0)L(πx0)L(x_{0})\neq L(\pi\cdot x_{0}). Set B={x0,πx0}B=\{x_{0},\pi\cdot x_{0}\}. Any function LL^{\prime} extending L|BL|_{B} cannot be equivariant, and therefore cannot be nominal regular, so LAnom(n,k)L^{\prime}\notin\mathcal{L}^{nom}_{A}(n,k).

Case 2: LL is equivariant, and is recognized by some nominal DFA with at most nn orbits. By Theorem 4.1, A/LA^{*}/\equiv_{L} has at most nn orbits. Thus the nominal dimension of A/LA^{*}/\equiv_{L} must be at least k+1k+1, or else another application of Theorem 4.1 would imply that LL is recognized by a nominal DFA with at most nn orbits and nominal dimension at most kk, contradicting the assumption that LAnom(n,k)L\notin\mathcal{L}^{nom}_{A}(n,k). Let BAB\subseteq A^{*} of size 2(k+1)2(k+1) be as given by 8.

Now, let L:A{0,1}L^{\prime}:A^{*}\to\{0,1\} extend L|BL|_{B}, and suppose for contradiction that LAnom(n,k)L^{\prime}\in\mathcal{L}^{nom}_{A}(n,k). Since LL^{\prime} is recognized by a nominal DFA, it must be a GG-language, so by the definition of BB, A/LA^{*}/\equiv_{L^{\prime}} has nominal dimension at least k+1k+1. However, by Theorem 4.1 and the fact that LL^{\prime} is recognized by a nominal DFA with at most nn orbits and nominal dimension at most kk, A/LA^{*}/\equiv_{L^{\prime}} has nominal dimension at most kk, a contradiction! So LAnom(n,k)L^{\prime}\notin\mathcal{L}^{nom}_{A}(n,k).

Case 3: LL is equivariant, and LL is not recognized by any nominal DFAs with at most nn orbits. By Theorem 4.1, A/LA^{*}/\equiv_{L} has at least n+1n+1 orbits. Let BAB\subseteq A^{*} of size 2(n+12)(pnk)(3pn)k2\binom{n+1}{2}\binom{pn}{k}(3pn)^{k} be as given by 9.

Now, let L:A{0,1}L^{\prime}:A^{*}\to\{0,1\} extend L|BL|_{B}, and suppose for contradiction that LAnom(n,k)L^{\prime}\in\mathcal{L}^{nom}_{A}(n,k). Since LL^{\prime} is recognized by a nominal DFA, it must be a GG-language, so by the definition of BB, A/LA^{*}/\equiv_{L^{\prime}} has at least n+1n+1 orbits. However, by Theorem 4.1 and the fact that LL^{\prime} is recognized by a nominal DFA with at most nn orbits and nominal dimension at most kk, A/LA^{*}/\equiv_{L^{\prime}} has at most nn orbits, a contradiction! So LAnom(n,k)L^{\prime}\notin\mathcal{L}^{nom}_{A}(n,k).

In all three cases, we found a set BB of size at most 2(n+12)(pnk)(3pn)k2\binom{n+1}{2}\binom{pn}{k}(3pn)^{k} such that any language extending L|BL|_{B} cannot be in Anom(n,k)\mathcal{L}^{nom}_{A}(n,k). We can then conclude that the consistency dimension of Anom(n,k)\mathcal{L}^{nom}_{A}(n,k) with respect to itself is at most 2(n+12)(pnk)(3pn)k2\binom{n+1}{2}\binom{pn}{k}(3pn)^{k}.

4.4 Learning bound for nominal DFAs

We can now use the bounds we proved for Littlestone dimension and consistency dimension to prove our main result on nominal DFAs:

\nominalLCfixedalphabet
Proof

By 7, the Littlestone dimension of Anom(n,k)\mathcal{L}^{nom}_{A}(n,k) is at most O(nkp(logn+klogk))O\left(nk^{p}\left(\log n+k\log k\right)\right). By 10, the consistency dimension of Anom(n,k)\mathcal{L}^{nom}_{A}(n,k) with respect to itself is at most 2(n+12)(pnk)(3pn)k2\binom{n+1}{2}\binom{pn}{k}(3pn)^{k}. This is upper bounded by n(n+1)(epn)kkk(3pn)kn(n+1)\frac{(e\cdot pn)^{k}}{k^{k}}(3pn)^{k}, which is in turn at most nO(k)kk\frac{n^{O(k)}}{k^{k}} (since pp is a constant). Applying Theorem 2.1, the query complexity of Anom(n,k)\mathcal{L}^{nom}_{A}(n,k) with queries from Anom(n,k)\mathcal{L}^{nom}_{A}(n,k) is at most O(nkp(logn+klogk)nO(k)kk)=nO(k)kkO\left(nk^{p}\left(\log n+k\log k\right)\frac{n^{O(k)}}{k^{k}}\right)=\frac{n^{O(k)}}{k^{k}}.

5 Conclusion

In this paper, we derive query learning bounds for advice DFAs and nominal DFAs; the bounds for advice DFAs are the first known bounds for the setting, while the bounds for nominal DFAs improve upon prior results, at the cost of not making any computational guarantees about the learning algorithm.

The bounds for advice DFAs are derived in almost the exact same manner as the bounds for DFAs given by Chase and Freitag [8]. However, to the complexity of the setting of nominal DFAs requires a much deeper analysis of the structure of nominal sets and nominal DFAs. This leads to several auxillary results which may be of independent interest; namely, 5, which analyzes the asymptotics of the number of orbits of products of nominal sets; 6, which counts the number of single-orbit nominal sets of a given dimension; and 1, which bounds the nominal dimension of A/LA^{*}/\equiv_{L} in terms of the number of orbits of A/LA^{*}/\equiv_{L} and the nominal dimension of AA.

One natural extension of our work is to adapt the results for nominal automata to other well-behaved symmetries beside the equality symmetry, such as the total order symmetry, as described in [5]. Our approach is general enough that it should apply to these symmetries; however, there are many details that would need to be verified. It would also be natural to explore what the approach of Chase and Freitag yields for other previously-studied forms of automata that admit versions of the Myhill-Nerode theorem. For example, Bollig et al. [6] study query learning of NFAs via residual finite-state automata, which crucially utilize the Myhill-Nerode correspondence to find canonical NFA representations of regular languages. A final potential direction for future work would be to consider the generalization of nominal automata to ω\omega-languages; i.e., nominal DFAs with a Büchi-style acceptance criteria. Angluin and Fisman have adapted the LL^{*} algorithm to the classical setting of ω\omega-regular languages [2], and Chase and Freitag apply their method in the same setting [8]. To our knowledge, no work has been done studying the learnability of ω\omega-nominal regular languages, so it would be of interest to find results using both Angluin’s method and Chase and Freitag’s method.

References

  • [1] Angluin, D.: Learning regular sets from queries and counterexamples. Inf. Comput. 75, 87–106 (1987)
  • [2] Angluin, D., Fisman, D.: Learning regular omega languages. Theoretical Computer Science 650, 57–72 (2016). https://doi.org/https://doi.org/10.1016/j.tcs.2016.07.031, https://www.sciencedirect.com/science/article/pii/S0304397516303760, algorithmic Learning Theory
  • [3] Balcázar, J.L., Castro, J., Guijarro, D., Simon, H.U.: The consistency dimension and distribution-dependent learning from queries. Theoretical Computer Science 288(2), 197–215 (2002). https://doi.org/https://doi.org/10.1016/S0304-3975(01)00400-5, https://www.sciencedirect.com/science/article/pii/S0304397501004005, algorithmic Learning Theory
  • [4] Bárány, V.: A hierarchy of automatic ω\omega-words having a decidable mso theory. RAIRO - Theoretical Informatics and Applications - Informatique Théorique et Applications 42(3), 417–450 (2008). https://doi.org/10.1051/ita:2008008, http://www.numdam.org/articles/10.1051/ita:2008008/
  • [5] Bojańczyk, M., Klin, B., Lasota, S.: Automata theory in nominal sets. Logical Methods in Computer Science Volume 10, Issue 3 (Aug 2014). https://doi.org/10.2168/LMCS-10(3:4)2014, https://lmcs.episciences.org/1157
  • [6] Bollig, B., Habermehl, P., Kern, C., Leucker, M.: Angluin-style learning of nfa. In: IJCAI. pp. 1004–1009 (07 2009)
  • [7] Carton, O., Thomas, W.: The monadic theory of morphic infinite words and generalizations. In: Nielsen, M., Rovan, B. (eds.) Mathematical Foundations of Computer Science 2000. pp. 275–284. Springer Berlin Heidelberg, Berlin, Heidelberg (2000)
  • [8] Chase, H., Freitag, J.: Bounds in query learning. In: Abernethy, J., Agarwal, S. (eds.) Proceedings of Thirty Third Conference on Learning Theory. Proceedings of Machine Learning Research, vol. 125, pp. 1142–1160. PMLR (2020), http://proceedings.mlr.press/v125/chase20a.html
  • [9] Drews, S., D’Antoni, L.: Learning symbolic automata. In: Legay, A., Margaria, T. (eds.) Tools and Algorithms for the Construction and Analysis of Systems. pp. 173–189. Springer Berlin Heidelberg, Berlin, Heidelberg (2017)
  • [10] Elgot, C.C., Rabin, M.: Decidability and undecidability of extensions of second (first) order theory of (generalized) successor. J. Symb. Log. 31, 169–181 (1966)
  • [11] Fisman, D., Saadon, S.: Learning and characterizing fully-ordered lattice automata. In: Bouajjani, A., Holík, L., Wu, Z. (eds.) Automated Technology for Verification and Analysis. pp. 266–282. Springer International Publishing, Cham (2022)
  • [12] Gabbay, M.J., Pitts, A.M.: A new approach to abstract syntax with variable binding. Formal Aspects of Computing 13(3), 341–363 (07 2002). https://doi.org/10.1007/s001650200016, https://doi.org/10.1007/s001650200016
  • [13] Hanneke, S., Livni, R., Moran, S.: Online learning with simple predictors and a combinatorial characterization of minimax in 0/1 games. In: COLT (2021)
  • [14] Hellerstein, L., Pillaipakkamnatt, K., Raghavan, V., Wilkins, D.: How many queries are needed to learn? J. ACM 43(5), 840–862 (Sep 1996). https://doi.org/10.1145/234752.234755, https://doi.org/10.1145/234752.234755
  • [15] Kruckman, A., Rubin, S., Sheridan, J., Zax, B.: A myhill-nerode theorem for automata with advice. Electronic Proceedings in Theoretical Computer Science 96 (10 2012). https://doi.org/10.4204/EPTCS.96.18
  • [16] Moerman, J., Sammartino, M., Silva, A., Klin, B., Szynwelski, M.: Learning nominal automata. In: Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages. p. 613–625. POPL 2017, Association for Computing Machinery, New York, NY, USA (2017). https://doi.org/10.1145/3009837.3009879, https://doi.org/10.1145/3009837.3009879
  • [17] Montanari, U., Pistore, M.: An introduction to history dependent automata. Electr. Notes Theor. Comput. Sci. 10, 170–188 (01 1997). https://doi.org/10.1016/S1571-0661(05)80696-6
  • [18] Nies, A.: Describing groups. The Bulletin of Symbolic Logic 13(3), 305–339 (2007), http://www.jstor.org/stable/4493323
  • [19] Pitts, A.M.: Nominal Sets: Names and Symmetry in Computer Science. Cambridge Tracts in Theoretical Computer Science, Cambridge University Press (2013). https://doi.org/10.1017/CBO9781139084673
  • [20] Pyber, L.: Asymptotic results for permutation groups. In: Groups And Computation (1991)
  • [21] Rabinovich, A., Thomas, W.: Decidable theories of the ordering of natural numbers with unary predicates. In: Ésik, Z. (ed.) Computer Science Logic. pp. 562–574. Springer Berlin Heidelberg, Berlin, Heidelberg (2006)
  • [22] Sakakibara, Y.: Learning context-free grammars from structural data in polynomial time. In: Proceedings of the First Annual Workshop on Computational Learning Theory. p. 330–344. COLT ’88, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1988)
  • [23] Salomaa, A.: On finite automata with a time-variant structure. Information and Control 13(2), 85–98 (1968). https://doi.org/https://doi.org/10.1016/S0019-9958(68)90706-7, https://www.sciencedirect.com/science/article/pii/S0019995868907067
  • [24] Tsankov, T.: The additive group of the rationals does not have an automatic presentation. The Journal of Symbolic Logic 76(4), 1341–1351 (2011), http://www.jstor.org/stable/23208221

Appendix 0.A Omitted proofs from Section 3

0.A.1 A witness to the tightness of the bounds in Theorem 3.1

Example 4

Define L:{0,1}{0,1}L:\{0,1\}^{*}\to\{0,1\} by

L(w)={1(w has an even number of 0’s|w|2)(|w|=3)0otherwiseL(w)=\begin{cases}1&(w\text{ has an even number of 0's}\land|w|\neq 2)\lor(|w|=3)\\ 0&\text{otherwise}\end{cases}

Notice that L,m\equiv_{L,m} has at most two classes for every mm—one for strings with an even number of 0’s and one for strings with an odd number of 0’s. However, it is not recognized by any DFA MM with advice with 3 states. To see this, let MM accept LL. The runs of strings 00 and 01 both end in reject states of MM, but since they are in separate L,2\equiv_{L,2}-classes, they must end in distinct states. So there are at least two reject states. Similarly, the runs of 000 and 001 both end in distinct accept states. Thus there must be at least 4 states in MM.

Replacing “ww has an even number of 0’s” with any regular language that has at most kk equivalence classes in the Myhill-Nerode relation (for example, “the number of 0’s in ww is divisible by kk”), we obtain a language LkL_{k} such that Lk,m\equiv_{L_{k},m} has at most kk classes for every mm, but any advice DFA accepting LkL_{k} must have at least 2k2k states.

Appendix 0.B Omitted proofs from Section 4

0.B.1 Proof of 6

Proof

Define X\equiv_{X} by x1Xx2x_{1}\equiv_{X}x_{2} if and only if F(x1)YF(x2)F(x_{1})\equiv_{Y}F(x_{2}). A standard argument confirms that this is indeed an equivalence relation, equivariance follows from the fact that FF is equivariant.

Next, define f:X/XYf:X/\equiv_{X}\to\equiv_{Y} by f([x]X)=[F(x)]Yf([x]_{\equiv_{X}})=[F(x)]_{\equiv_{Y}}. This is well-defined: if x1Xx2x_{1}\equiv_{X}x_{2}, then

f([x1]X)\displaystyle f([x_{1}]_{\equiv_{X}}) =[F(x1)]Y\displaystyle=[F(x_{1})]_{\equiv_{Y}}
=[F(x2)]Y\displaystyle=[F(x_{2})]_{\equiv_{Y}} since x1Xx2F(x1)YF(x2)\displaystyle\text{since }x_{1}\equiv_{X}x_{2}\Rightarrow F(x_{1})\equiv_{Y}F(x_{2})
=f([x2]X)\displaystyle=f([x_{2}]_{\equiv_{X}})

ff is equivariant since if πG\pi\in G, then f(π[x]X)=f([πx]X)=[F(πx)]Y=[πF(x)]Y=π[F(x)]Y=πf([x]X)f(\pi\cdot[x]_{\equiv_{X}})=f([\pi\cdot x]_{\equiv_{X}})=[F(\pi\cdot x)]_{\equiv_{Y}}=[\pi\cdot F(x)]_{\equiv_{Y}}=\pi\cdot[F(x)]_{\equiv_{Y}}=\pi\cdot f([x]_{\equiv_{X}}). ff is injective since for x1,x2Xx_{1},x_{2}\in X,

f([x1]X)\displaystyle f([x_{1}]_{\equiv_{X}}) =f([x2]X)\displaystyle=f([x_{2}]_{\equiv_{X}})
[F(x1)]Y\displaystyle\Rightarrow[F(x_{1})]_{\equiv_{Y}} =[F(x2)]Y\displaystyle=[F(x_{2})]_{\equiv_{Y}}
F(x1)\displaystyle\Rightarrow F(x_{1}) YF(x2)\displaystyle\equiv_{Y}F(x_{2})
x1\displaystyle\Rightarrow x_{1} Xx2\displaystyle\equiv_{X}x_{2}
[x1]X\displaystyle\Rightarrow[x_{1}]_{\equiv_{X}} =[x2]X.\displaystyle=[x_{2}]_{\equiv_{X}}.

Finally, if FF is surjective, then given [y]YY/Y[y]_{\equiv_{Y}}\in Y/\equiv_{Y}, there is xXx\in X such that F(x)=yF(x)=y and hence f([x]X)=[y]Yf([x]_{\equiv_{X}})=[y]_{\equiv_{Y}}, so ff is surjective.

0.B.2 Proof of Theorem 4.1

Definition 14 (Reachable Nominal DFA)

A nominal DFA MM is said to be reachable if for every state qq in MM, there is xAx\in A^{*} such that the run of MM on xx ends in state qq.

Definition 15 (Syntactic Automaton)

Fix an orbit-finite nominal alphabet AA, and let L:A{0,1}L:A^{*}\to\{0,1\} be a GG-language. The syntactic automaton of LL, denoted MLM_{L} is specified as follows:

  • the state set is the set A/LA^{*}/\equiv_{L};

  • the transition function is δL:A/L×AA/L\delta_{L}:A^{*}/\equiv_{L}\times\ A\to A^{*}/\equiv_{L}
    defined by

    δL([x]L,a)=[xa]L,\delta_{L}([x]_{L},a)=[xa]_{L},
  • the initial state is [ϵ]L[\epsilon]_{L};

  • the set of accepting states is {[x]LxL}\{[x]_{L}\mid x\in L\}.

Lemma 12 ([5, Lemma 3.6; Proposition 5.1])

The syntactic automaton of a GG-language is a reachable nominal DFA.

By [5, Lemma 3.7], a GG-language is always recognized by its syntactic automaton.

Definition 16 (Automaton Homomorphism)

Let M=(Q,δ,q0,F)M=(Q,\delta,q_{0},F) and M=(Q,δ,q0,F)M^{\prime}=(Q^{\prime},\delta^{\prime},q_{0}^{\prime},F^{\prime}) be two nominal DFAs over the same alphabet AA. An automaton homomorphism from MM to MM^{\prime} is an equivariant function f:QQf:Q\to Q^{\prime} such that:

  • f(q0)=q0f(q_{0})=q_{0}^{\prime};

  • qFf(q)Fq\in F\iff f(q)\in F^{\prime} for every qQq\in Q; and

  • f(δ(q,a))=δ(f(q),a)f(\delta(q,a))=\delta^{\prime}(f(q),a) for every qQ,aAq\in Q,a\in A.

If there exists an automaton homomorphism from MM to MM^{\prime}, then MM and MM^{\prime} recognize the same language: for any string xx, the run of MM on xx ends in state qq if and only if the run of MM^{\prime} on xx ends in the state f(q)f(q), and qFq\in F if and only if f(q)Ff(q)\in F^{\prime}, so MM accepts xx if and only if MM^{\prime} accepts xx.

Lemma 13 ([5, Lemma 3.7])

Let LL be a GG-language. For any reachable nominal DFA MM that recognizes LL, there is a surjective automaton homomorphism ff from MM to MLM_{L}.

We can now prove bounds in the nominal Myhill-Nerode theorem:

Proof (Proof of Theorem 4.1)

(1. \Rightarrow 2.) Suppose that A/LA^{*}/\equiv_{L} has at most nn orbits and nominal dimension at most kk. Then MLM_{L} is a nominal DFA that recognizes LL, and its state set is A/LA^{*}/\equiv_{L} which by assumption has at most nn orbits and nominal dimension at most kk.

(2. \Rightarrow 1.) Suppose that LL is recognized by a nominal DFA MM with at most nn orbits and nominal dimension at most kk. We may assume that MM is reachable, so by 13, there is a surjective automaton homomorphism ff from MM to the syntactic automaton MLM_{L}. By 2 and 3, the state set of MLM_{L} has at most nn orbits and nominal dimension kk. But the state set of MLM_{L} is exactly A/LA^{*}/\equiv_{L}, and so we obtain the desired bounds.

0.B.3 Proof of 4

Proof

Let (x1,,xn)X(x_{1},\ldots,x_{n})\in X. Its orbit G(x1,,xn)G\cdot(x_{1},\ldots,x_{n}) must be contained in the product (Gx1)××(Gxn)(G\cdot x_{1})\times\cdots\times(G\cdot x_{n}). Hence, it is enough to bound the number of orbits of any given product O1××OnO_{1}\times\ldots\times O_{n}, where each OiO_{i} is an orbit of XiX_{i}, and multiply by the number of possible products. Fix orbits O1,,OnO_{1},\ldots,O_{n} of X1,,XnX_{1},\ldots,X_{n}, respectively. By 7, there are equivariant surjections fOi:(ki)Oif_{O_{i}}:\mathbb{N}^{(k_{i})}\to O_{i}. Taking the product gives us the equivariant surjection

(k1)××(kn)fO1××fOnO1××On.\mathbb{N}^{(k_{1})}\times\cdots\times\mathbb{N}^{(k_{n})}\xrightarrow{f_{O_{1}}\times\cdots\times f_{O_{n}}}O_{1}\times\cdots\times O_{n}.

By 2, O1××OnO_{1}\times\cdots\times O_{n} has at most as many orbits as (k1)××(kn)\mathbb{N}^{(k_{1})}\times\cdots\times\mathbb{N}^{(k_{n})}, which has f(k1,,kn)f_{\mathbb{N}}(k_{1},\ldots,k_{n}) orbits. Now, there were =12n\ell=\ell_{1}\ell_{2}\cdots\ell_{n} ways to choose the orbits O1,,OnO_{1},\ldots,O_{n}, so in total, XX can have at most f(k1,,kn)\ell f_{\mathbb{N}}(k_{1},\ldots,k_{n}) orbits.

0.B.4 Proof of 5

Proof

Consider tuples a¯=(a1,,ak1)(k1)\bar{a}=(a_{1},\ldots,a_{k_{1}})\in\mathbb{N}^{(k_{1})} and b¯=(b1,,bk2)(k2)\bar{b}=(b_{1},\ldots,b_{k_{2}})\in\mathbb{N}^{(k_{2})}. Notice that the orbit of the pair (a¯,b¯)(\bar{a},\bar{b}) is exactly determined by the collection of indices i,ji,j such that ai=bja_{i}=b_{j}. So to choose an orbit, we can first choose the number of indices 0rk20\leq r\leq k_{2} that a¯\bar{a} and b¯\bar{b} coincide on. Then we need to choose rr indices i1,,iri_{1},\ldots,i_{r} of a¯\bar{a}, rr indices of j1,,jrj_{1},\ldots,j_{r} of b¯\bar{b}, and a bijection between {i1,,ir}\{i_{1},\ldots,i_{r}\} and {j1,,jr}\{j_{1},\ldots,j_{r}\} in order to determine the indices that a¯\bar{a} and b¯\bar{b} coincide on. There are (k1r)(k2r)r!\binom{k_{1}}{r}\binom{k_{2}}{r}r! ways to choose these.

Thus the total number of orbits is

f(k1,k2)=r=0k2(k1r)(k2r)r!,f_{\mathbb{N}}(k_{1},k_{2})=\sum_{r=0}^{k_{2}}\binom{k_{1}}{r}\binom{k_{2}}{r}r!,

This is lower bounded by (k1k2)k2!\binom{k_{1}}{k_{2}}k_{2}!. Since k2!(k2e)k2k_{2}!\geq\left(\frac{k_{2}}{e}\right)^{k_{2}} and (k1k2)(k1k2)k2\binom{k_{1}}{k_{2}}\geq\left(\frac{k_{1}}{k_{2}}\right)^{k_{2}} for any value of k1,k2k_{1},k_{2}, we have that

f(k1,k2)(k1k2)k2!(k1k2)k2(k2e)k2=(k1e)k2.f_{\mathbb{N}}(k_{1},k_{2})\geq\binom{k_{1}}{k_{2}}k_{2}!\geq\left(\frac{k_{1}}{k_{2}}\right)^{k_{2}}\left(\frac{k_{2}}{e}\right)^{k_{2}}=\left(\frac{k_{1}}{e}\right)^{k_{2}}.

On the other hand, since (nk)nkk!\binom{n}{k}\leq\frac{n^{k}}{k!} for any nn and kk,

r=0k2(k1r)(k2r)r!\displaystyle\sum_{r=0}^{k_{2}}\binom{k_{1}}{r}\binom{k_{2}}{r}r! r=0k2k1rr!(k2r)r!\displaystyle\leq\sum_{r=0}^{k_{2}}\frac{k_{1}^{r}}{r!}\binom{k_{2}}{r}r!
=r=0k2k1r(k2r)\displaystyle=\sum_{r=0}^{k_{2}}k_{1}^{r}\binom{k_{2}}{r}
r=0k2k1k2(k2r)\displaystyle\leq\sum_{r=0}^{k_{2}}k_{1}^{k_{2}}\binom{k_{2}}{r}
=k1k2r=0k2(k2r)\displaystyle=k_{1}^{k_{2}}\sum_{r=0}^{k_{2}}\binom{k_{2}}{r}
=k1k22k2=(2k1)k2\displaystyle=k_{1}^{k_{2}}2^{k_{2}}=(2k_{1})^{k_{2}}

0.B.5 Proof of 6

The following definitions and propositions are adapted from [5, Section 9], and will allow us to bound the number of possible nominal sets.

Definition 17 (cf. [5, Definition 9.11])

A support representation is a pair (k,S)(k,S), where kk\in\mathbb{N}, and SS is a subgroup of Sym([k])\operatorname{Sym}([k]).

Definition 18 (cf. [5, Definition 9.14])

Given a support representation (k,S)(k,S), the semantics of (k,S)(k,S), denoted [k,S]ec[k,S]^{ec}, is the set (k)/S\mathbb{N}^{(k)}/\equiv_{S}, where S\equiv_{S} is defined as

(a1,,ak)S(b1,,bk)τSi[k],aτ(i)=bi.(a_{1},\ldots,a_{k})\equiv_{S}(b_{1},\ldots,b_{k})\iff\\ \exists\tau\in S\ \forall i\in[k],\ a_{\tau(i)}=b_{i}.

There is a natural action of GG on [k,S]ec[k,S]^{ec} defined by

π[(a1,,ak)]S=[(π(a1),,π(ak))]S.\pi\cdot[(a_{1},\ldots,a_{k})]_{S}=[(\pi(a_{1}),\ldots,\pi(a_{k}))]_{S}.

An isomorphism of nominal sets is an equivariant bijection between two nominal sets.

Proposition 11 (cf. [5, Proposition 9.15])

For any support representation (k,S)(k,S), [k,S]ec[k,S]^{ec} is a single-orbit nominal set of nominal dimension kk, and every single-orbit nominal set XX of nominal dimension kk is isomorphic to [k,S]ec[k,S]^{ec} for some SSym([k])S\leq\operatorname{Sym}([k]).

Proposition 12 (cf. [5, Proposition 9.16])

Let X=[k,S]ecX=[k,S]^{ec} and Y=[,T]ecY=[\ell,T]^{ec} be single-orbit nominal sets. Let

U={u:\displaystyle U=\{u:\ [][k]u is injective and\displaystyle[\ell]\to[k]\mid u\text{ is injective and }
σSτT,σu=uτ}.\displaystyle\forall\sigma\in S\ \exists\tau\in T,\ \sigma\circ u=u\circ\tau\}.

Equivariant functions from XX to YY are in bijective correspondence with U/TU/\equiv_{T} (where T\equiv_{T} is as in 18).

Lemma 14

[k,S]ec[k,S]^{ec} is determined, up to isomorphism, by kk and the conjugacy class of SS in Sym([k])\operatorname{Sym}([k]).

Proof

Let X=[k,S]ecX=[k,S]^{ec} and Y=[,T]ecY=[\ell,T]^{ec} be single-orbit nominal sets. Suppose that k=k=\ell and that S,TS,T are conjugate in Sym([k])\operatorname{Sym}([k]). That is, there is a permutation ρ:[k][k]\rho:[k]\to[k] such that ρSρ1=T\rho S\rho^{-1}=T. Define Fρ:(k)(k)F^{\rho}:\mathbb{N}^{(k)}\to\mathbb{N}^{(k)} by Fρ((a1,,ak))=(aρ(1),,aρ(k))F^{\rho}((a_{1},\ldots,a_{k}))=(a_{\rho(1)},\ldots,a_{\rho(k)}) (i.e., reorder the input using ρ\rho). Notice that FρF^{\rho} is an equivariant bijection. By 6, FρF^{\rho} and T\equiv_{T} induce an equivalence relation on (k)\mathbb{N}^{(k)}. Since ρSρ1=T\rho S\rho^{-1}=T, the induced equivalence relation is actually S\equiv_{S}. Then the induced function ff is an equivariant bijection between XX and YY.

In the other direction, suppose there is an isomorphism f:XYf:X\to Y. The proof of 12 gives us a bijection u:[][k]u:[\ell]\to[k] such that uS=TuuS=Tu. In particular, k=k=\ell and uu is a permutation in Sym([k])\operatorname{Sym}([k]) that witnesses that SS and TT are conjugate.

With this machinery in hand, we can give the proof of 6:

Proof

Let X=[k,S]ecX=[k^{\prime},S]^{ec} be a single-orbit nominal set with nominal dimension at most kk. By 14, XX is determined up to isomorphism by the value of kk^{\prime} and the conjugacy class of SS in Sym([k])\operatorname{Sym}([k^{\prime}]). Since the nominal dimension of XX is kk, kkk^{\prime}\leq k, and so there are kk choices for kk^{\prime}. It remains to count the number of conjugacy classes of subgroups of Sym([k])\operatorname{Sym}([k]). The number of subgroups of Sym([k])\operatorname{Sym}([k]) is 2Θ(k2)2^{\Theta(k^{2})} [20, Theorem 4.2], which certainly gives an upper bound for the number of conjugacy classes of subgroups. Additionally, any subgroup has at most k!k! conjugates (one for each permutation in Sym([k])\operatorname{Sym}([k])), so the number of conjugacy classes is also at least 2O(k2)/k!=2O(k2)2^{O(k^{2})}/k!=2^{O(k^{2})}.

Thus the number of single-orbit nominal sets of nominal dimension at most kk is at most k2O(k2)=2O(k2)k2^{O(k^{2})}=2^{O(k^{2})}.

0.B.6 Proof of 9

Proof

We count the number of equivariant functions δ:Q×AQ\delta:Q\times A\to Q, where QQ has nn orbits and nominal dimension kk. Since δ\delta is equivariant, by 2, a single orbit of Q×AQ\times A must map into a single orbit of QQ, so we can first choose a target orbit of QQ for each orbit of Q×AQ\times A. By 4, Q×AQ\times A has at most nf(k,p)n\ell f_{\mathbb{N}}(k,p) orbits. Since pp is fixed, by 5, for any kk, f(k,p)=O(kp)f_{\mathbb{N}}(k,p)=O(k^{p}). Furthermore, since \ell is also fixed, Q×AQ\times A has at most O(nkp)O(nk^{p}) orbits. Thus there are nO(nkp)n^{O(nk^{p})} many ways to choose the target orbits of each orbit of Q×AQ\times A.

Once we have chosen a target orbit of QQ for each orbit of Q×AQ\times A, we must choose an equivariant function from each orbit O1O_{1} of Q×AQ\times A to one orbit O2O_{2} of QQ. By 3, the nominal dimension of Q×AQ\times A is at most k+pk+p, and so O1O_{1} also has nominal dimension at most k+pk+p. Similarly, O2O_{2} has nominal dimension at most kk. Therefore O1=[k,S]ecO_{1}=[k^{\prime},S]^{ec} where kk+pk^{\prime}\leq k+p, and O2=[,T]ecO_{2}=[\ell,T]^{ec} where k\ell\leq k. By 12, the number of equivariant functions from O1O_{1} to O2O_{2} is upper bounded by the number of injections [][k][\ell]\to[k], which is in turn at most (k+p)!p!(k+p)!\frac{(k+p)!}{p!}\leq(k+p)!.

We must choose one such equivariant function for each orbit of Q×AQ\times A, so the total number of choices of all of these functions is at most ((k+p)!)O(nkp)\left((k+p)!\right)^{O(nk^{p})}. Once we have done this, we have chosen a transition behavior δ:Q×AQ\delta:Q\times A\to Q.

Hence the total number of possible transition behaviors is upper bounded by

nO(nkp)((k+p)!)nkp=(n(k+p)!)O(nkp).n^{O(nk^{p})}\cdot\left((k+p)!\right)^{nk^{p}}=\left(n(k+p)!\right)^{O(nk^{p})}.

0.B.7 Proof of 10

Proof

Suppose for contradiction that τx=x\tau\cdot x=x. Let iDi\in D be the only element of DD such that τ(i)i\tau(i)\neq i, and notice that τ(i)D\tau(i)\notin D since τ\tau fixes every element of DD other than ii. We claim that D{i}D\setminus\{i\} supports xx. To see this, let σG\sigma\in G such that σ\sigma fixes every element of DiD\setminus i. We need to show that σx=x\sigma\cdot x=x. We may assume that σ(i)i\sigma(i)\neq i, since otherwise σ\sigma fixes every element of DD, and since DD supports xx, we would have σx=x\sigma\cdot x=x. Additionally, σ(i)D\sigma(i)\notin D since σ\sigma fixes every element of DD other than ii. Now, let j=τ(i)j=\tau(i) and k=σ(i)k=\sigma(i), and consider the permutation (jk)σ(j\ k)\sigma. We have that (jk)σ|D=τ|D(j\ k)\sigma|_{D}=\tau|_{D}, as σ\sigma and τ\tau both fix every element of D{i}D\setminus\{i\}, and τ(i)=j=(jk)(k)=(j/k)σ(i)\tau(i)=j=(j\ k)(k)=(j/k)\sigma(i). Since DD supports xx, we can deduce that (jk)σx=τx=x(j\ k)\sigma\cdot x=\tau\cdot x=x. Then σx=(jk)x\sigma\cdot x=(j\ k)\cdot x. Since j,kDj,k\notin D, (jk)(j\ k) fixes every element of DD, and so (jk)x=x(j\ k)\cdot x=x. This shows that σx=x\sigma\cdot x=x, and since σ\sigma was an arbitrary permutation that fixed every element of D{i}D\setminus\{i\}, we may conclude that D{i}D\setminus\{i\} supports xx. This is a contradiction, since DD is the least support of xx, and hence τxx\tau\cdot x\neq x.

0.B.8 Proof of 11

Proof

Suppose for contradiction that there is some orbit G[x]LG\cdot[x]_{L} of A/LA^{*}/\equiv_{L} such that the orbit of every string of length strictly less than nn is not in G[x]LG\cdot[x]_{L}. We may assume that the representative of the orbit xx is of minimal length mnm\geq n. Write x=a1amx=a_{1}\cdots a_{m}.

Consider the collection of prefixes of xx of length up to n1n-1, i.e., the set {ϵ,a1,a1a2,,a1an1}\{\epsilon,a_{1},a_{1}a_{2},\ldots,a_{1}\cdots a_{n-1}\}. Since these all have length strictly less than nn, they must belong to one of the (n1)(n-1)-many orbits that are not in G[x]LG\cdot[x]_{L}. As there are nn strings in the collection, there must be 0i<jn10\leq i<j\leq n-1 such that [a1ai]L[a_{1}\cdots a_{i}]_{L} and [a1aj]L[a_{1}\cdots a_{j}]_{L} belong to the same orbit (here, a1a0a_{1}\cdots a_{0} denotes the empty string). That is, there is τG\tau\in G such that τ(a1ai)La1aj\tau\cdot(a_{1}\cdots a_{i})\equiv_{L}a_{1}\cdots a_{j}. Notice that L\equiv_{L} is preserved under appending common suffixes; i.e., if xLyx\equiv_{L}y, then for any zAz\in A^{*}, xzLyzxz\equiv_{L}yz. Thus [τ(a1ai)]aj+1amLa1ajaj+1am=x[\tau\cdot(a_{1}\cdots a_{i})]a_{j+1}\cdots a_{m}\equiv_{L}a_{1}\cdots a_{j}a_{j+1}\cdots a_{m}=x. However, notice that [τ(a1ai)]aj+1am[\tau\cdot(a_{1}\cdots a_{i})]a_{j+1}\cdots a_{m} is strictly shorter than xx, and its L\equiv_{L}-class is in the same orbit of A/LA^{*}/\equiv_{L} as [x]L[x]_{L} (more than that, is in the same L\equiv_{L}-class), which contradicts the assumption that xx was of minimal length.

Thus for every orbit of A/LA^{*}/\equiv_{L}, there must be a string of length strictly less than nn that is in the orbit.

0.B.9 Proof of 1

Proof

Let [x]LA/L[x]_{L}\in A^{*}/\equiv_{L}. By 11, there is some xAx^{\prime}\in A^{*} with |x|<n|x^{\prime}|<n and πG\pi\in G such that π[x]L=[x]L\pi\cdot[x]_{L}=[x^{\prime}]_{L}. Then 1 and 5 tell us that

|supp([x]L)|=|supp(π[x]L)|=|supp([x]L)||supp(x)|.|supp([x]_{L})|=|supp(\pi\cdot[x]_{L})|=|supp([x^{\prime}]_{L})|\leq|supp(x^{\prime})|.

Each of the letters of xx^{\prime} is supported by a set of size at most pp. Since GG acts on xx^{\prime} coordinate-wise, the least support of xx^{\prime} is contained in the union of the least supports of all the letters of xx^{\prime}, which is a set of size at most (n1)p(n-1)p. Hence every element of A/LA^{*}/\equiv_{L} is supported by a set of size at most (n1)p(n-1)p, and so the nominal dimension of A/LA^{*}/\equiv_{L} is at most (n1)p(n-1)p.