The palindromization map

Dominique Perrin and Christophe Reutenauer

Abstract

The palindromization map was initially defined by Aldo de Luca in the context of Sturmian words. It was extended to the free group of rank $2$ by Kassel and the second author. We extend their construction to arbitrary alphabets. We also investigate the suffix automaton and compact suffix automaton of the words obtained by palindromization.

1 Introduction

The iterated palindromic closure is an injective map sending arbitrary words to palindromes. It was introduced by Aldo de Luca in [9]. This map is used to define a representation of Sturmian words by means of a directive word and is related to a transformation introduced by Rauzy (see [20, 3]). The iterated palindromic closure has been shown in [18] to be extendable, in the case of two letters, to a map (no longer injective) from the free group into itself and to have many interesting properties.

In this article, we study the extension of the iterated palindromic closure to a map from the free group on more than two letters to itself. We show that some of the features appearing with two letters remain valid while others no longer hold. In particular, we show that the map is continuous for the profinite topology (Proposition 4.1). We were not able to characterise the kernel of the map as it is done in [18], where it is related to the braid group. We also discuss the relation with noncommutative cohomology evidenced in [18] but we show that, on more than two letters, the cocycle corresponding to the iterated palindromization map is not trivial.

In Section 5, we describe the suffix automaton of a word of the form $\operatorname{Pal}(w)$ , that is the minimal automaton of the set of suffixes of this word. We extend to arbitrary alphabets results concerning the suffix automaton; the corresponding results for binary alphabets are from [13].

In Section 6, we study compact automata. These automata have already been studied in the case of suffix automata (see the chapter by Maxime Crochemore in [19]), but they do not seem to have been considered before in the general case of automata. We fill this gap and present a direct definition of a minimal compact automaton, which is shown to be unique (Corollary 6.6), together with two other results related to the notion of reduction of automata (Propositions 6.5 and 6.7).

This will apply in Section 7 to the construction of the minimal compact suffix automaton of $\operatorname{Pal}(u)$ . In that section, we construct directly the minimal compact automaton of the set of suffixes of $\operatorname{Pal}(w)$ . The construction extends the known construction in the binary alphabet case due to Epifanio, Mignosi, Shallit and Venturini [13] (see also [7]). It consists in computing the automaton for $\operatorname{Pal}(ua)$ from the automaton of $\operatorname{Pal}(u)$ , by adjoining one state, and several transitions from the first automaton to this state. From this, we deduce the exact form of the automaton (Theorem 7.1) and several of its properties (Corollaries 7.6 and 7.7).

Acknowledgements

We thank Maxime Crochemore for his advice about compact automata.

2 The palindromization map

We denote by $A^{*}$ the free monoid on the alphabet $A$ and by $1$ the empty word. We denote by $\tilde{w}=a_{n}\cdots a_{2}a_{1}$ the reversal of the word $w=a_{1}a_{2}\cdots a_{n}$ with $a_{i}\in A$ . The word $w$ is a palindrome if $\tilde{w}=w$ .

We denote by $FG(A)$ the free group on $A$ . If $w$ is in $FG(A)$ and $a$ in $A$ , we denote by $|w|_{a}$ the number of occurrences of $a$ in $w$ , where one counts with $-1$ the occurrences of $a^{-1}$ ; this is well defined and does not depend on the chosen expression for $w$ ; we call it the $a$-degree of $w$ . Moreover, define $|w|=\sum_{a\in A}|w|_{a}$ , the algebraic length of $w$ . In particular, if $w\in A^{*}$ , then $|w|$ is the length of $w$ .

The reversal of the element $w=a_{1}a_{2}\cdots a_{n}$ with $a_{i}\in A\cup A^{-1}$ is the element $\tilde{w}=a_{n}\cdots a_{2}a_{1}$ . This does not depend on the chosen expression for $w$ . The map $w\mapsto\tilde{w}$ is an antimorphism, that is, it satisfies $\widetilde{uv}=\tilde{v}\tilde{u}$ . We also say that an element $w$ of $FG(A)$ is a palindrome if $\tilde{w}=w$ .

Let us define two morphisms $L\colon u\mapsto L_{u}$ and $R\colon u\mapsto R_{u}$ from $FG(A)$ into its group of automorphisms as follows. For $a,b\in A$ , we set

L_{a}(b)=\begin{cases}a&\mbox{if $b=a$}\\ ab&\mbox{otherwise}\end{cases}

The map $L_{a}$ is an automorphism of $FG(A)$ since its inverse is the map

L_{a}^{-1}(b)=\begin{cases}a&\mbox{if $b=a$}\\ a^{-1}b&\mbox{otherwise}\end{cases}

Symmetrically, for $a,b\in A$ , we set $R_{a}(b)=\widetilde{L_{a}(b)}$ .

Thus, for example, $L_{a}(ab^{-1})=aL_{a}(b)^{-1}=ab^{-1}a^{-1}$ and $R_{a}(ab^{-1})=aR_{a}(b)^{-1}=aa^{-1}b^{-1}=b^{-1}$ .
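The two computations above can be replayed mechanically. The following is a minimal sketch, not part of the original construction: it assumes that elements of $FG(A)$ are encoded as strings in which an uppercase letter stands for the inverse of the corresponding lowercase letter, and the helper names (`free_reduce`, `inverse`, `apply_morphism`, `L`, `R`) are hypothetical.

```python
def free_reduce(word):
    """Freely reduce a word; an uppercase letter encodes the inverse of the lowercase one."""
    out = []
    for x in word:
        if out and out[-1] == x.swapcase():
            out.pop()          # cancel x^{-1} x or x x^{-1}
        else:
            out.append(x)
    return "".join(out)


def inverse(word):
    """Inverse of a group element: reverse the word and invert each letter."""
    return word[::-1].swapcase()


def apply_morphism(images, word):
    """Apply the group morphism defined on positive letters by `images`."""
    parts = []
    for x in word:
        parts.append(images[x] if x.islower() else inverse(images[x.lower()]))
    return free_reduce("".join(parts))


def L(a, word, alphabet):
    """L_a for a positive letter a: a -> a and b -> ab for b != a."""
    return apply_morphism({b: (a if b == a else a + b) for b in alphabet}, word)


def R(a, word, alphabet):
    """R_a for a positive letter a: a -> a and b -> ba for b != a."""
    return apply_morphism({b: (a if b == a else b + a) for b in alphabet}, word)
```

With this encoding, `L("a", "aB", "ab")` stands for $L_{a}(ab^{-1})$ and `R("a", "aB", "ab")` for $R_{a}(ab^{-1})$.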

Note that $L,R$ are related by two identities. The first one is

aR_{a}(u)=L_{a}(u)a

(2.1)

for every $a\in A\cup A^{-1}$ and $u\in FG(A)$ . Indeed, this is true when $u$ is a letter; by rewriting equivalently $R_{a}(u)=a^{-1}L_{a}(u)a$ , this equality follows since both sides are the image of $u$ under an automorphism of $FG(A)$ .

The second one is

R_{u}(v)=\widetilde{L_{u}(\tilde{v})}

(2.2)

for every $u,v\in FG(A)$ . Indeed, this is true when $u$ is a letter and the general case follows similarly.

Every word $w\in A^{*}$ is a prefix of some palindrome since $w\tilde{w}$ is always a palindrome. Thus, there exists a palindrome of shortest length which has $w$ as a prefix. Actually, this palindrome is unique. It is called the palindromic closure of $w$ and denoted $w^{(+)}$ . One has $w^{(+)}=yz\tilde{y}$ where $z$ is the longest palindromic suffix of $w=yz$ (for these results, due to Aldo de Luca, see for example [21] Proposition 12.1.1).

For example, $(abaa)^{(+)}=abaaba$ , since the longest palindromic suffix of $abaa$ is $z=aa$ , and $y=ab$ .

Let $\operatorname{Pal}$ be the unique map from $A^{*}$ to $A^{*}$ such that $\operatorname{Pal}(1)=1$ , and for $w\in A^{*}$ and $a\in A$

\operatorname{Pal}(wa)=(\operatorname{Pal}(w)a)^{(+)}.

For a word $w$ , $\operatorname{Pal}(w)$ is called the iterated palindromic closure of $w$ . The iterated palindromic closure has been introduced by Aldo de Luca [9] who has shown that it is injective (see for example [21] p. 102, where the injectivity is proved by an algorithm).

For example $\operatorname{Pal}(aba)=abaaba$ , since $\operatorname{Pal}(a)=a$ , $\operatorname{Pal}(ab)=(ab)^{(+)}=aba$ , and finally $\operatorname{Pal}(aba)=(abaa)^{(+)}=abaaba$ .
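The definitions of $w^{(+)}$ and $\operatorname{Pal}$ on $A^{*}$ translate directly into a few lines of code. This is a naive sketch (quadratic search for the longest palindromic suffix), with words encoded as Python strings; the function names are illustrative choices.

```python
def palindromic_closure(w):
    """Return w^(+) = y z y~, where w = y z and z is the longest palindromic suffix of w."""
    for i in range(len(w) + 1):
        z = w[i:]
        if z == z[::-1]:           # first hit is the longest palindromic suffix
            y = w[:i]
            return w + y[::-1]     # y z y~ equals w followed by the reversal of y


def pal(w):
    """Iterated palindromic closure: Pal(1) = 1 and Pal(wa) = (Pal(w) a)^(+)."""
    p = ""
    for a in w:
        p = palindromic_closure(p + a)
    return p
```

For instance, `palindromic_closure("abaa")` and `pal("aba")` both reproduce the word $abaaba$ computed in the examples above.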

The mapping $\operatorname{Pal}$ may be extended to infinite words, since when $u$ is a proper prefix of $v$ , $\operatorname{Pal}(u)$ is a proper prefix of $\operatorname{Pal}(v)$ .

An important property of $\operatorname{Pal}$ is the following functional equation, known as Justin’s Formula [17]. For all $u,v\in A^{*}$ ,

\operatorname{Pal}(uv)=\operatorname{Pal}(u)R_{u}(\operatorname{Pal}(v)).

(2.3)

A dual form of (2.3) (which is the one actually given in [17, Lemma 2.1]) is

\operatorname{Pal}(uv)=L_{u}(\operatorname{Pal}(v))\operatorname{Pal}(u).

(2.4)

Indeed, assuming (2.4), we have using (2.2),

\begin{align}
\operatorname{Pal}(uv)&=\widetilde{\operatorname{Pal}(uv)}=\widetilde{\operatorname{Pal}(u)}\,\widetilde{L_{u}(\operatorname{Pal}(v))}\tag{2.5}\\
&=\operatorname{Pal}(u)R_{u}(\widetilde{\operatorname{Pal}(v)})=\operatorname{Pal}(u)R_{u}(\operatorname{Pal}(v)).\tag{2.6}
\end{align}

Hence (2.4) implies (2.3). The reverse implication is true, too, as is similarly verified.
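Both forms of Justin's Formula can be checked exhaustively on short words. The sketch below is only a sanity check on $A^{*}$, not a proof; it assumes words are Python strings, computes $R_{u}$ and $L_{u}$ by composing the letter morphisms from right to left, and uses illustrative function names.

```python
import itertools


def palindromic_closure(w):
    """Shortest palindrome having w as a prefix."""
    for i in range(len(w) + 1):
        if w[i:] == w[i:][::-1]:
            return w + w[:i][::-1]


def pal(w):
    """Iterated palindromic closure of the word w."""
    p = ""
    for a in w:
        p = palindromic_closure(p + a)
    return p


def R(u, w):
    """R_u(w) for words: R_u = R_{a_1} o ... o R_{a_n}, so R_{a_n} is applied first."""
    for a in reversed(u):
        w = "".join(b if b == a else b + a for b in w)
    return w


def L(u, w):
    """L_u(w) for words, with L_a(b) = ab for b != a and L_a(a) = a."""
    for a in reversed(u):
        w = "".join(b if b == a else a + b for b in w)
    return w


def check_justin(alphabet, max_len):
    """Check (2.3) and (2.4) for all pairs of words of length at most max_len."""
    words = ["".join(t) for n in range(max_len + 1)
             for t in itertools.product(alphabet, repeat=n)]
    return all(pal(u + v) == pal(u) + R(u, pal(v)) == L(u, pal(v)) + pal(u)
               for u in words for v in words)
```

Calling `check_justin("abc", 3)` runs the comparison over all pairs of words of length at most $3$ on three letters.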

We want to prove the following result, extending the construction of [18] to arbitrary alphabets.

Theorem 2.1

There exists a unique extension of $\operatorname{Pal}$ to a map from $FG(A)$ to itself fixing every $a\in A\cup A^{-1}$ and satisfying (2.3) for every $u,v\in FG(A)$ .

We will still denote by $\operatorname{Pal}$ the extension of the iterated palindromic closure to the free group and call it the palindromization map. We will see below that the extension also satisfies (2.4), for any $u,v\in FG(A)$ .

The statement follows directly from the following property.

Proposition 2.2

Let $\alpha:u\in FG(A)\mapsto\alpha_{u}\in\operatorname{Aut}(FG(A))$ be a morphism from the free group on $A$ to the group of its automorphisms, such that $\alpha_{a}(a)=a$ for every $a\in A$ . There exists a unique map $f:FG(A)\to FG(A)$ such that

(i) $f(a)=a$ for every $a\in A\cup A^{-1}$ .

(ii) $f(uv)=f(u)\alpha_{u}(f(v))$ (2.7) for every $u,v\in FG(A)$ .

Proof.

We prove first uniqueness of $f$ . Equation (2.7) with $u=v=1$ implies that $f(1)=1$ .

Next, the same equation implies

f(au)=a\alpha_{a}(f(u))

for each $u\in FG(A)$ and each $a\in A\cup A^{-1}$ . Thus there is a unique $f$ such that for each reduced word $au$ with $a\in A\cup A^{-1}$ one has $f(au)=a\alpha_{a}(f(u))$ . Indeed, this is true when $u=1$ and follows easily using induction on the length of $au$ . Thus there is at most one map $f$ satisfying conditions (i) and (ii).

To prove the existence, let us prove that, for this map $f$ , Equation (2.7) holds for every $v\in FG(A)$ by induction on the length $l(u)$ of the reduced word $u$ . It holds for $l(u)=0$ . Next, let $u\in FG(A)$ be of positive length. Set $u=u^{\prime}a$ with $a\in A\cup A^{-1}$ , in reduced form. Then $l(u^{\prime})=l(u)-1$ . Since $uv=u^{\prime}(av)$ , the induction hypothesis implies

f(uv)=f(u^{\prime})\alpha_{u^{\prime}}(f(av)).

Applying again the induction hypothesis, we obtain

f(u)=f(u^{\prime}a)=f(u^{\prime})\alpha_{u^{\prime}}(a).

- Assume first that $av$ is reduced. Then, by definition of $f$ , we have $f(av)=a\alpha_{a}(f(v))$ . Thus

\begin{aligned}
f(uv)&=f(u^{\prime})\alpha_{u^{\prime}}(a\alpha_{a}(f(v)))\\
&=f(u^{\prime})\alpha_{u^{\prime}}(a)\alpha_{u}(f(v)).
\end{aligned}

Thus we obtain

f(uv)=f(u)\alpha_{u}(f(v)).

- Assume now that $av$ is not reduced; set $v=a^{-1}v^{\prime}$ in reduced form. Then, by definition of $f$ , we have $f(v)=f(a^{-1}v^{\prime})=a^{-1}\alpha_{a^{-1}}(f(v^{\prime}))$ .

Thus, since $\alpha_{a}(a^{-1})=a^{-1}$ , we have

\begin{aligned}
f(u)\alpha_{u}(f(v))&=f(u^{\prime})\alpha_{u^{\prime}}(a)\alpha_{u^{\prime}a}(a^{-1}\alpha_{a^{-1}}(f(v^{\prime})))\\
&=f(u^{\prime})\alpha_{u^{\prime}}(a)\alpha_{u^{\prime}}(a^{-1})\alpha_{u^{\prime}}(f(v^{\prime}))\\
&=f(u^{\prime})\alpha_{u^{\prime}}(f(v^{\prime}))
\end{aligned}

which by induction hypothesis is equal to $f(u^{\prime}v^{\prime})=f(uv)$ . ∎

We now verify the following property of the palindromization map.

Proposition 2.3

For every $w\in FG(A)$ , $\operatorname{Pal}(w)$ is a palindrome.

Proof.

For every $u\in FG(A)$ and $a\in A\cup A^{-1}$ , we have

aR_{a}(\tilde{u})=\widetilde{R_{a}(u)}a.

(2.8)

Indeed, $aR_{a}(\tilde{u})=L_{a}(\tilde{u})a=\widetilde{R_{a}(u)}a$ by (2.1) and (2.2). It follows from (2.8) that for every $a\in A\cup A^{-1}$ and every palindrome $u\in FG(A)$ , $aR_{a}(u)$ is a palindrome.

Let us now show by induction on the length of the reduced word representing $w\in FG(A)$ that $\operatorname{Pal}(w)$ is a palindrome. It is true if $w=1$ . Next, set $w=au$ in reduced form with $a\in A\cup A^{-1}$ and $u\in FG(A)$ . We have by (2.3), $\operatorname{Pal}(w)=aR_{a}(\operatorname{Pal}(u))$ . By induction hypothesis, $\operatorname{Pal}(u)$ is a palindrome and it follows from the previous observation that $\operatorname{Pal}(w)$ is a palindrome, too. ∎

It follows from Proposition 2.3, using the same argument as in (2.6), that the map $\operatorname{Pal}$ satisfies also (2.4) for every $u,v\in FG(A)$ .

As an example, we have $\operatorname{Pal}(b^{-1})=b^{-1}$ and $\operatorname{Pal}(ab^{-1})=\operatorname{Pal}(a)R_{a}(\operatorname{Pal}(b^{-1}))=aR_{a}(b^{-1})=a(ba)^{-1}=b^{-1}$ . This shows that the extension of $\operatorname{Pal}$ to $FG(A)$ is not injective. In the case of a binary alphabet, one can characterize the kernel of $\operatorname{Pal}$ as follows.
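The extension to $FG(A)$ can be computed from the recursion $\operatorname{Pal}(au)=aR_{a}(\operatorname{Pal}(u))$ used in the proofs above, for $au$ reduced and $a\in A\cup A^{-1}$. The sketch below assumes a string encoding in which an uppercase letter denotes the inverse of the lowercase one, and uses $R_{a^{-1}}=R_{a}^{-1}$, that is, $R_{a^{-1}}(b)=ba^{-1}$ for $b\neq a$ and $R_{a^{-1}}(a)=a$. All function names are hypothetical.

```python
def free_reduce(word):
    """Freely reduce a word; an uppercase letter encodes an inverse letter."""
    out = []
    for x in word:
        if out and out[-1] == x.swapcase():
            out.pop()
        else:
            out.append(x)
    return "".join(out)


def inverse(word):
    """Inverse of a group element in the string encoding."""
    return word[::-1].swapcase()


def R_letter(x, word, alphabet):
    """R_x(word) for x in A or A^{-1}: the letter a = x.lower() is fixed, b -> b x otherwise."""
    a = x.lower()
    images = {b: (b if b == a else b + x) for b in alphabet}
    parts = [images[y] if y.islower() else inverse(images[y.lower()]) for y in word]
    return free_reduce("".join(parts))


def pal_fg(word, alphabet):
    """Palindromization map on FG(A), via Pal(x u) = x R_x(Pal(u)) read from right to left."""
    result = ""
    for x in reversed(free_reduce(word)):
        result = free_reduce(x + R_letter(x, result, alphabet))
    return result
```

Here `pal_fg("aB", "ab")` and `pal_fg("B", "ab")` encode $\operatorname{Pal}(ab^{-1})$ and $\operatorname{Pal}(b^{-1})$ , and on positive words the function agrees with the iterated palindromic closure.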

Let $B_{3}$ be the braid group on three strands defined as

B_{3}=\langle\sigma_{1},\sigma_{2}\mid\sigma_{1}\sigma_{2}\sigma_{1}=\sigma_{2}\sigma_{1}\sigma_{2}\rangle.

Let $\beta:FG(a,b)\to B_{3}$ be the morphism $\beta:a\mapsto\sigma_{1},b\mapsto\sigma_{2}$ . For any $u,v\in FG(a,b)$ , one has $\operatorname{Pal}(u)=\operatorname{Pal}(v)$ if and only if $\beta(u^{-1}v)\in\langle\sigma_{1}\sigma_{2}^{-1}\sigma_{1}^{-1}\rangle$ (see [18, Proposition 5.2]). No such characterization is known on more than two letters.

3 Semidirect products, cocycles and sequential functions

We discuss now several interpretations of Justin’s Formula.

Semidirect products

Observe first that Equation (2.7) (and thus also Equation (2.3)) can be expressed in terms of semidirect products. Indeed, consider the semidirect product $FG(A)*_{\alpha}FG(A)$ of $FG(A)$ with itself corresponding to the morphism $\alpha$ from $FG(A)$ into $\operatorname{Aut}(FG(A))$ . By definition, it is the set of pairs $(u,v)\in FG(A)\times FG(A)$ with the product

(u,v)(r,s)=(u\alpha_{v}(r),vs)

Equation (2.7) expresses the fact that $\delta:w\mapsto(f(w),w)$ is a morphism from $FG(A)$ to $FG(A)*_{\alpha}FG(A)$ . Indeed, assuming (2.7), we have for every $u,v\in FG(A)$ ,

\delta(u)\delta(v)=(f(u),u)(f(v),v)=(f(u)\alpha_{u}(f(v)),uv)=(f(uv),uv)=\delta(uv).

This proves the following statement.

Proposition 3.1

The map $u\mapsto(\operatorname{Pal}(u),u)$ is the unique morphism from $FG(A)$ to $FG(A)*_{R}FG(A)$ sending every $a\in A$ to $(a,a)$ .

Cocycles

Justin’s Formula is also related to the notion of nonabelian group cohomology, as pointed out in [18]. A function $f$ , from $FG(A)$ to itself, is a 1-cocycle, with respect to a group morphism $\alpha:u\mapsto\alpha_{u}$ from $FG(A)$ to $\operatorname{Aut}(FG(A))$ , if (2.7) holds for all $u,v\in FG(A)$ . Thus $\operatorname{Pal}$ , as a function from $FG(A)$ to itself, is a 1-cocycle with respect to the morphism $R$ . Such a 1-cocycle is trivial if there is an element $x\in FG(A)$ such that

f(u)=x^{-1}\alpha_{u}(x)

(3.1)

for every $u\in FG(A)$ . When $A$ has two elements, one has $\operatorname{Pal}(u)=(ab)^{-1}R_{u}(ab)$ by [18, Equation (3.1)]. Thus the 1-cocycle $\operatorname{Pal}$ is trivial. This is not the case on more than two letters. Indeed, suppose that $x\in FG(A)$ is such that $\operatorname{Pal}(u)=x^{-1}R_{u}(x)$ for all $u\in FG(A)$ . One has then $xa=R_{a}(x)$ for every $a\in A$ and thus, by taking the $a$ -degree: $|x|_{a}+1=|x|$ . This implies, by summing over all $a\in A$ , $|x|(\operatorname{Card}(A)-1)=\operatorname{Card}(A)$ , which is impossible for $\operatorname{Card}(A)\geq 3$ since $|x|$ is an integer.
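The degree count in this argument can be written out step by step. The following display is a worked restatement that uses only $R_{a}(a)=a$ and $R_{a}(b)=ba$ for $b\neq a$ ; it adds no new hypothesis.

```latex
\begin{align*}
|R_{a}(x)|_{a} &= |x|_{a}+\sum_{b\neq a}|x|_{b}=|x|,
\qquad |xa|_{a}=|x|_{a}+1,\\
xa=R_{a}(x)\ &\Longrightarrow\ |x|_{a}+1=|x| \quad\text{for every } a\in A,\\
\sum_{a\in A}\bigl(|x|_{a}+1\bigr)=\sum_{a\in A}|x|
\ &\Longrightarrow\ |x|+\operatorname{Card}(A)=\operatorname{Card}(A)\,|x|\\
&\Longrightarrow\ |x|\,(\operatorname{Card}(A)-1)=\operatorname{Card}(A).
\end{align*}
```

Since $\operatorname{Card}(A)-1$ and $\operatorname{Card}(A)$ are coprime, the last equality forces $\operatorname{Card}(A)-1=1$ when $|x|$ is an integer. For $\operatorname{Card}(A)=2$ it gives $|x|=2$ , in agreement with $x=ab$ .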

Sequential functions

Equation (2.3) can also be seen as expressing that, as a function from $A^{*}$ to $A^{*}$ , the map $\operatorname{Pal}$ is a sequential function, that is, a function computed by a sequential transducer. Let us recall the definition of a sequential transducer on a set $Q$ of states. Let

(q,a)\in Q\times A\mapsto q\cdot a\in Q

be a map, called the transition function. This map extends to a right action of $A^{*}$ on $Q$ by $q\cdot(ua)=(q\cdot u)\cdot a$ for $u\in A^{*}$ and $a\in A$ . In addition, let

(q,a)\in Q\times A\mapsto q*a\in A^{*}

be a map called the output function. This map extends to a map from $Q\times A^{*}$ to $A^{*}$ by

q*(ua)=(q*u)((q\cdot u)*a).

Given a sequential transducer on $Q$ defined by the maps $(q,a)\mapsto q\cdot a$ and $(q,a)\mapsto q*a$ and given an initial state $i\in Q$ , the function $f:A^{*}\to A^{*}$ defined by the transducer is

f(w)=i*w

Proposition 3.2

The function $\operatorname{Pal}$ is defined by the transducer on the set of states $\operatorname{Aut}(FG(A))$ with transition and output functions

R_{u}\cdot a=R_{ua}\mbox{ and }R_{u}*a=R_{u}(a)

respectively, and with initial state $i=R_{1}$ .

Proof.

We prove by induction on the length of $w\in A^{*}$ that $\operatorname{Pal}(w)=i*w$ . It is true for $w=1$ and next, assuming that $\operatorname{Pal}(u)=i*u$ , we obtain for every $a\in A$ ,

i*(ua)=(i*u)((i\cdot u)*a)=\operatorname{Pal}(u)R_{u}(a)=\operatorname{Pal}(ua).

∎
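The transducer of Proposition 3.2 has infinitely many states, but a run on a given word only visits the states $R_{u}$ for $u$ a prefix of the input, so it can be simulated lazily. In the sketch below, which is an illustration restricted to words of $A^{*}$, a state $R_{u}$ is stored as the dictionary of the images $R_{u}(b)$ , $b\in A$ ; this representation and the function name are assumptions of the example.

```python
def pal_transducer(w, alphabet):
    """Run the sequential transducer of Proposition 3.2 on the word w."""
    state = {b: b for b in alphabet}      # initial state R_1, the identity
    output = []
    for a in w:
        output.append(state[a])           # output function: R_u * a = R_u(a)
        # transition R_u . a = R_{ua} = R_u o R_a, with R_a(a) = a and R_a(b) = ba
        state = {b: (state[a] if b == a else state[b] + state[a]) for b in alphabet}
    return "".join(output)
```

On the input $abc$ the successive outputs are $a$ , $ba$ and $caba$ , whose concatenation is $\operatorname{Pal}(abc)=abacaba$ .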

4 Uniform continuity of $\operatorname{Pal}$ for the profinite distance

The profinite topology on the free group $FG(A)$ is the topology generated by the inverse images of subsets of a finite group $F$ by a morphism $\varphi:FG(A)\to F$ . Equivalently, it is the coarsest topology such that every morphism to a finite discrete group is continuous. This topology on the free group was introduced by Hall (see [16]). It is a particular case of a more general notion which extends to varieties of groups and also of semigroups (see [22] and [1]).

The following is proved in [18] for the case of a binary alphabet.

Proposition 4.1

The map $\operatorname{Pal}:FG(A)\to FG(A)$ is continuous for the profinite topology.

Actually we shall prove a slightly more general result. Following [1] p. 57, we define the profinite distance $d$ on $FG(A)$ by the formula

d(u,v)=\frac{1}{r(u,v)}

where $u,v$ are distinct, and where $r(u,v)$ is the minimal cardinality of a group $G$ such that, for some morphism $\varphi:FG(A)\to G$ , $\varphi(u)\neq\varphi(v)$ . Since $FG(A)$ is residually finite, $r(u,v)$ is a finite integer for each pair $u,v$ of distinct elements in $FG(A)$ . It is easy to prove that $r(u,w)\geq\min(r(u,v),r(v,w))$ , hence

d(u,w)\leq\max(d(u,v),d(v,w)),

which means that $d$ is an ultrametric distance.

The topology of $FG(A)$ induced by $d$ is precisely the profinite topology.

Proposition 4.1 follows from the next result, since each uniformly continuous function is continuous.

Proposition 4.2

The map $\operatorname{Pal}:FG(A)\to FG(A)$ is uniformly continuous for the profinite distance.

We need in the proof the following characterizations of uniformly continuous functions.

1. A function $P:FG(A)\to FG(A)$ is uniformly continuous if and only if for any morphism $\varphi:FG(A)\to G$ , the function $\varphi\circ P:FG(A)\to G$ is uniformly continuous, where $G$ has the discrete distance.

2. A function $P^{\prime}:FG(A)\to G$ , with $G$ a finite group with discrete distance, is uniformly continuous if and only if it factorizes as $P^{\prime}=h\circ\psi$ , where $\psi:FG(A)\to S$ is a morphism into a finite group, and $h:S\to G$ is some function.

Proof.

We apply the first criterion above to the function $P=\operatorname{Pal}$ . Thus let $\varphi:FG(A)\to G$ be a group morphism into a finite group $G$ .

We define a right action of $FG(A)$ on the finite set $Q=G\times\operatorname{Hom}(FG(A),G)$ by

(g,\pi)\cdot w=(g\pi(\operatorname{Pal}(w)),\pi\circ R_{w}).

(4.1)

It is indeed a right action, since the associativity follows from

\begin{aligned}
((g,\pi)\cdot u)\cdot v&=(g\pi(\operatorname{Pal}(u)),\pi\circ R_{u})\cdot v\\
&=(g\pi(\operatorname{Pal}(u))\,(\pi\circ R_{u})(\operatorname{Pal}(v)),\pi\circ R_{u}\circ R_{v})\\
&=(g\pi(\operatorname{Pal}(u)R_{u}(\operatorname{Pal}(v))),\pi\circ R_{uv})\\
&=(g\pi(\operatorname{Pal}(uv)),\pi\circ R_{uv})=(g,\pi)\cdot(uv).
\end{aligned}

It follows from Equation (4.1) that for every $w\in FG(A)$ , one has

(1_{G},\varphi)\cdot w=(\varphi(\operatorname{Pal}(w)),\varphi\circ R_{w}).

In order to apply the second criterion, we define now a morphism $\psi:FG(A)\to S$ , where $S$ is the symmetric group on $Q$ : for any $w$ in $FG(A)$ , $\psi(w)$ is the permutation $q\mapsto q\cdot w$ . Then $\psi$ is a morphism, because of the right action defined above, letting $S$ act on the right on $Q$ . Note that $S$ is a finite group, since $Q$ is finite.

Define a function $h:S\to G$ ; it sends each element $\sigma\in S$ onto the first component $g$ of the pair $(g,\pi)=(1_{G},\varphi)\sigma$ . Thus we have $h\circ\psi(w)=$ first component of $(1_{G},\varphi)\cdot w$ , which is equal to $\varphi\circ\operatorname{Pal}(w)$ ; thus $h\circ\psi=\varphi\circ\operatorname{Pal}$ , which allows us to conclude, according to the second criterion, that $\varphi\circ\operatorname{Pal}$ is uniformly continuous. With the first criterion, we see that $\operatorname{Pal}$ is uniformly continuous. ∎
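The finite action (4.1) can be made concrete in a small case. The sketch below assumes, purely for illustration, that $G=\mathbb{Z}/n\mathbb{Z}$ (written additively), so that a morphism $\pi$ is the dictionary of the residues $\pi(b)$ , $b\in A$ , and it only treats words of $A^{*}$. The state $(g,\pi)$ is updated letter by letter by $(g,\pi)\cdot a=(g+\pi(a),\pi\circ R_{a})$ , and the first component is compared with a direct evaluation of $\varphi(\operatorname{Pal}(w))$ . Function names are hypothetical.

```python
def palindromic_closure(w):
    """Shortest palindrome having w as a prefix."""
    for i in range(len(w) + 1):
        if w[i:] == w[i:][::-1]:
            return w + w[:i][::-1]


def pal(w):
    """Iterated palindromic closure of the word w."""
    p = ""
    for a in w:
        p = palindromic_closure(p + a)
    return p


def phi_of_pal_by_action(w, phi, n):
    """First component of (0, phi) . w for the action (4.1), with G = Z/nZ."""
    g, pi = 0, dict(phi)
    for a in w:
        g = (g + pi[a]) % n                        # g * pi(Pal(a)) = g + pi(a)
        # (pi o R_a)(b) = pi(ba) = pi(b) + pi(a) for b != a, and pi(a) for b = a
        pi = {b: (pi[a] if b == a else (pi[b] + pi[a]) % n) for b in pi}
    return g


def phi_of_pal_direct(w, phi, n):
    """Evaluate phi on Pal(w) letter by letter."""
    return sum(phi[x] for x in pal(w)) % n
```

The point of the construction is that the pair $(g,\pi)$ ranges over a finite set (here of size $n^{1+\operatorname{Card}(A)}$ ), whereas $\operatorname{Pal}(w)$ itself grows with $w$ .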

Since $A^{*}$ is dense in $FG(A)$ for the profinite topology (see [1] or [2] for example), we deduce from Proposition 4.1 the following statement.

Corollary 4.3

The map $\operatorname{Pal}:FG(A)\to FG(A)$ is the unique continuous extension to $FG(A)$ of the iterated palindromic closure of $A^{*}$ .

Though $\operatorname{Pal}$ is continuous for the profinite topology, it is not continuous for the pro-$p$ topology on the free group $F_{2}$ of rank $2$ , where $p$ is a prime number. Recall that the pro-$p$ topology is the coarsest topology such that every group homomorphism from $F_{2}$ into a finite $p$-group is continuous (see [18, Remark 6.3]).

5 Suffix automaton

The minimal automaton of the set of suffixes of a word $w$ is called the suffix automaton of $w$ . These automata have been extensively studied (see [19, Chapter 2]). A striking property (originally due to [4]) is that the number of states of the suffix automaton of $w$ is at most $2|w|-1$ .

Let $u$ be a word over an arbitrary alphabet $A$ . We denote by $\mathcal{S}(u)$ the suffix automaton of $\operatorname{Pal}(u)$ .

Part (i) of the following result is not new, but we give a proof for the sake of completeness.

Theorem 5.1

The automaton $\mathcal{S}(u)$ has the following properties:

(i) It has $|\operatorname{Pal}(u)|+1$ states, which may be naturally identified with the prefixes of $\operatorname{Pal}(u)$ .

(ii) Its terminal states are the palindromic prefixes of $\operatorname{Pal}(u)$ .

This result, for a binary alphabet, is due to [13] ((i) in the binary case also follows from Theorem 1 in [24], which characterizes the binary words whose suffix automaton has $|w|+1$ states); see also [13], [12], [7]. Moreover, for general alphabets, Part (i) of the theorem is a consequence of the remark in Section 5 in Fici’s article [14]. Note that characterizations of the words $w$ such that the suffix automaton of $w$ has $|w|+1$ states have been given by Fici [14] and Richomme [23].

A factor $w$ of a word $u$ (resp. an infinite word $s$ ) is left-special if there are at least two letters $a$ such that $aw$ is a factor of $u$ (resp. $s$ ).

The following is from [10, Proposition 5] (see also [11, Proposition 1.5.11]). Recall that for any infinite word $x$ , the infinite word $\operatorname{Pal}(x)$ is well-defined.

Proposition 5.2

If $s=\operatorname{Pal}(x)$ for some infinite word $x$ , the left-special factors of $s$ are the prefixes of $s$ .

We will use the following consequence.

Corollary 5.3

For any (finite) word $u$ , the left-special factors of $\operatorname{Pal}(u)$ are prefixes of $\operatorname{Pal}(u)$ .

Proof.

Set $s=\operatorname{Pal}(u^{\omega})$ . Let $p$ be a left-special factor of $\operatorname{Pal}(u)$ . Since $\operatorname{Pal}(u)$ is a prefix of $s$ , the word $p$ is also left-special with respect to $s$ . By Proposition 5.2, $p$ is a prefix of $s$ and thus of $\operatorname{Pal}(u)$ . ∎

Note that not all prefixes of $\operatorname{Pal}(u)$ need to be left-special. For example, if $u=ab$ , the factor $ab$ is not a left-special factor of $\operatorname{Pal}(u)=aba$ .
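Corollary 5.3 and the remark following it are easy to observe on small examples. The sketch below enumerates the left-special factors of a finite word by brute force (it is only meant for short words) and checks that each of them is a prefix of $\operatorname{Pal}(u)$ ; the function names are illustrative.

```python
def palindromic_closure(w):
    """Shortest palindrome having w as a prefix."""
    for i in range(len(w) + 1):
        if w[i:] == w[i:][::-1]:
            return w + w[:i][::-1]


def pal(w):
    """Iterated palindromic closure of the word w."""
    p = ""
    for a in w:
        p = palindromic_closure(p + a)
    return p


def left_special_factors(x):
    """Factors w of x such that aw is a factor of x for at least two letters a."""
    preceding = {}
    n = len(x)
    for i in range(n + 1):
        for j in range(i, n + 1):
            letters = preceding.setdefault(x[i:j], set())
            if i > 0:
                letters.add(x[i - 1])      # letter preceding this occurrence
    return {w for w, letters in preceding.items() if len(letters) >= 2}


def left_special_are_prefixes(u):
    """Check Corollary 5.3 on the word u."""
    p = pal(u)
    return all(p.startswith(w) for w in left_special_factors(p))
```

For $u=abc$ the left-special factors of $\operatorname{Pal}(u)=abacaba$ found this way are the empty word and $a$ , so the prefixes $ab$ , $aba$ , and the longer ones are not left-special.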

Given a language $L$ , a residual of $L$ is a set of the form $u^{-1}L=\{v\in A^{*}\mid uv\in L\}$ . It is well-known that the minimal automaton of a language $L$ , denoted $\mathcal{A}(L)$ , has the set of nonempty residuals of $L$ as set of states.

Proof of Theorem 5.1. Let $1$ be the initial state of $\mathcal{S}(u)$ . Let $P$ be the set of prefixes of $\operatorname{Pal}(u)$ . The map $\alpha\colon p\mapsto 1\cdot p$ is injective. Indeed, let $p,p^{\prime}\in P$ be such that $1\cdot p=1\cdot p^{\prime}$ . Assuming that $|p|\leq|p^{\prime}|$ , let $r$ be such that $p^{\prime}=pr$ . Then $1\cdot pr=1\cdot p^{\prime}=1\cdot p$ and thus $(1\cdot p)\cdot r=1\cdot p$ . Since the language recognized by $\mathcal{S}(u)$ is finite, the graph of $\mathcal{S}(u)$ is acyclic, which forces $r=1$ . Thus $p=p^{\prime}$ .

Let us show now that $\alpha$ is surjective. Let $q$ be a state of the automaton $\mathcal{S}(u)$ . Let $w$ be a word such that $1\cdot w=q$ . Since the automaton is co-accessible, there is some word $s$ such that $1\cdot ws$ is a terminal state, and thus $ws$ is a suffix of $\operatorname{Pal}(u)$ . Hence the word $w$ is a factor of $\operatorname{Pal}(u)$ . Let $p$ be the shortest prefix of $\operatorname{Pal}(u)$ such that $pw$ is a prefix of $\operatorname{Pal}(u)$ . Let us show by induction on the length of $p^{\prime}$ that for every suffix $p^{\prime}$ of $p$ , one has $1\cdot p^{\prime}w=q$ . It is true if $p^{\prime}=1$ . Otherwise, set $p^{\prime}=ap^{\prime\prime}$ . We have $1\cdot p^{\prime\prime}w=q$ by induction hypothesis.

Let $s$ be an arbitrary word such that $ws$ belongs to the set $S$ of suffixes of $\operatorname{Pal}(u)$ . Since $1\cdot p^{\prime\prime}w=1\cdot w$ , and since the automaton $\mathcal{S}(u)$ is the minimal automaton of $S$ , so that $(p^{\prime\prime}w)^{-1}S=w^{-1}S$ , and $s\in w^{-1}S$ , we have $p^{\prime\prime}ws\in S$ . We cannot have $p^{\prime\prime}ws=\operatorname{Pal}(u)$ , since otherwise $p^{\prime\prime}w$ is a prefix of $\operatorname{Pal}(u)$ with $p^{\prime\prime}$ shorter than $p$ .

Thus there is some $b\in A$ such that $bp^{\prime\prime}ws\in S$ . Since $ap^{\prime\prime}w=p^{\prime}w$ is a factor of $\operatorname{Pal}(u)$ and since, by Corollary 5.3, $p^{\prime\prime}w$ is not left-special (because $p^{\prime\prime}w$ is not a prefix of $\operatorname{Pal}(u)$ for the same reason as above), we have $a=b$ . This shows that $ws\in S\Rightarrow p^{\prime}ws\in S$ . The converse is also true and thus $w^{-1}S=(p^{\prime}w)^{-1}S$ , that is $1\cdot p^{\prime}w=1\cdot w=q$ , since the automaton is minimal.

We conclude that $1\cdot pw=q$ . This shows that $\alpha$ is surjective and proves property (i). Next, if $p$ is a prefix of $\operatorname{Pal}(u)$ which is also a suffix, then $p$ is a palindrome and thus property (ii) is true.
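Theorem 5.1 can be observed numerically by counting residuals directly. In the sketch below, which is a brute-force illustration and not the construction used in the proof, the residual $w^{-1}S$ of a factor $w$ of $x$ is encoded by the set of positions at which an occurrence of $w$ ends in $x$ ; two factors have the same residual exactly when these sets coincide, and the state is terminal when $w$ is a suffix of $x$ . The function names are hypothetical.

```python
def palindromic_closure(w):
    """Shortest palindrome having w as a prefix."""
    for i in range(len(w) + 1):
        if w[i:] == w[i:][::-1]:
            return w + w[:i][::-1]


def pal(w):
    """Iterated palindromic closure of the word w."""
    p = ""
    for a in w:
        p = palindromic_closure(p + a)
    return p


def suffix_automaton_states(x):
    """Map each state (a set of end positions) of the suffix automaton of x to its terminality."""
    n = len(x)
    states = {}
    for i in range(n + 1):
        for j in range(i, n + 1):
            w = x[i:j]
            ends = frozenset(k for k in range(len(w), n + 1) if x[k - len(w):k] == w)
            states[ends] = n in ends       # terminal iff w is a suffix of x
    return states


def check_theorem_5_1(u):
    """Compare the state counts with properties (i) and (ii) of Theorem 5.1."""
    p = pal(u)
    states = suffix_automaton_states(p)
    palindromic_prefixes = [p[:k] for k in range(len(p) + 1) if p[:k] == p[:k][::-1]]
    return (len(states) == len(p) + 1
            and sum(states.values()) == len(palindromic_prefixes))
```

For $u=abc$ this yields $8$ states, of which $4$ are terminal, matching the palindromic prefixes $1$ , $a$ , $aba$ and $abacaba$ of $\operatorname{Pal}(abc)$ .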

Note that the automaton $\mathcal{S}(u)$ has the additional property

(iii) The label of an edge depends only on its end.

Actually, this property holds for any suffix automaton, as is well-known. Indeed, if $p,q$ are two states of the suffix automaton of a word $w$ such that $p\cdot a=q\cdot b=r$ , let $u,v,t$ be such that $1\stackrel{{\scriptstyle u}}{{\rightarrow}}p\stackrel{{\scriptstyle a}}{{\rightarrow}}r\stackrel{{\scriptstyle t}}{{\rightarrow}}s$ and $1\stackrel{{\scriptstyle v}}{{\rightarrow}}q\stackrel{{\scriptstyle b}}{{\rightarrow}}r\stackrel{{\scriptstyle t}}{{\rightarrow}}s$ with $s$ a terminal state. Then $uat$ and $vbt$ are suffixes of $w$ , which implies $a=b$ .

Example 5.4

Consider $u=abc$ . We have $\operatorname{Pal}(u)=abacaba$ and the automaton $\mathcal{S}(abc)$ is represented in Figure 1.

Figure 1: The automaton $\mathcal{S}(abc)$ .

6 Compact automata

We explore the notion of compact automaton in which the edges can be labeled by nonempty words instead of letters. This version of automata appears, in the case of compact suffix automata, in [6] or [8]. It is also presented in the chapter by Maxime Crochemore in [19]. In particular, the construction of a minimal compact suffix automaton is described (see also [5]). We will show here that it is possible to define in complete generality a minimal compact automaton for every language.

A compact automaton $\mathcal{A}=(Q,E,I,T)$ is given by a set of states $Q$ , a set of edges $E\subset Q\times A^{+}\times Q$ , a set of initial states $I\subset Q$ and a set of terminal states $T\subset Q$ . A path $p_{0}\stackrel{{\scriptstyle u_{0}}}{{\rightarrow}}p_{1}\stackrel{{\scriptstyle u_{1}}}{{\rightarrow}}p_{2}\ldots\stackrel{{\scriptstyle u_{n-1}}}{{\rightarrow}}p_{n}$ is a sequence of consecutive edges. Its label is $u_{0}u_{1}\cdots u_{n-1}$ . The language recognized by $\mathcal{A}$ , denoted $L(\mathcal{A})$ , is the set of labels of successful paths, that is, paths from $I$ to $T$ .

An ordinary automaton is clearly a particular case of a compact automaton.

A compact automaton is deterministic if $\operatorname{Card}(I)=1$ and if for every state $p$ , the labels of the edges starting at $p$ begin with distinct letters.

Again, an ordinary deterministic automaton is deterministic as a compact automaton.

Example 6.1

The compact automaton of Figure 2 is deterministic. Its initial state (indicated with an incoming arrow) is $0$ and $2$ (with a double circle) is the unique terminal state.

Figure 2: A deterministic compact automaton.

The set of special states of a compact automaton $\mathcal{A}=(Q,E,I,T)$ is the set $Q_{s}$ of states $q$ which either belong to $I\cup T$ or such that there are edges going out of $q$ with labels beginning with distinct letters.

Let $p,q$ be special states. A path $p\stackrel{{\scriptstyle w}}{{\rightarrow}}q$ is special if the only special states on the path are its origin and its end.

A reduction from a deterministic compact automaton $\mathcal{A}=(Q,E,i,T)$ onto a deterministic compact automaton $\mathcal{A}^{\prime}=(Q^{\prime},E^{\prime},i^{\prime},T^{\prime})$ is a map $\varphi$ from $Q_{s}$ onto $Q^{\prime}_{s}$ such that

1. $\varphi(i)=i^{\prime}$ ,

2. $\varphi(p)\in T^{\prime}$ if and only if $p\in T$ ,

3. for every $p,q\in Q_{s}$ , there is a special path $p\stackrel{{\scriptstyle w}}{{\rightarrow}}q$ in $\mathcal{A}$ if and only if there is a special path $\varphi(p)\stackrel{{\scriptstyle w}}{{\rightarrow}}\varphi(q)$ in $\mathcal{A}^{\prime}$ .

An automaton is trim if every state is on some successful path.

Proposition 6.2

If $\mathcal{A},\mathcal{A}^{\prime}$ are deterministic compact automata and if $\varphi\colon\mathcal{A}\to\mathcal{A}^{\prime}$ is a reduction, then $L(\mathcal{A})=L(\mathcal{A}^{\prime})$ .

Proof.

If $w$ is in $L(\mathcal{A})$ , there is a path $i\stackrel{{\scriptstyle w}}{{\rightarrow}}t$ with $t\in T$ . Let $w=w_{0}w_{1}\cdots w_{n}$ be the factorisation of $w$ such that the path has the form $q_{0}\stackrel{{\scriptstyle w_{0}}}{{\rightarrow}}q_{1}\stackrel{{\scriptstyle w_{1}}}{{\rightarrow}}\cdots q_{n}\stackrel{{\scriptstyle w_{n}}}{{\rightarrow}}q_{n+1}$ with each path $q_{i}\stackrel{{\scriptstyle w_{i}}}{{\rightarrow}}q_{i+1}$ being special and where $q_{0}=i,q_{n+1}=t$ . Since $\varphi$ is a reduction, there is for each $i$ with $0\leq i\leq n$ , a special path $\varphi(q_{i})\stackrel{{\scriptstyle w_{i}}}{{\rightarrow}}\varphi(q_{i+1})$ . Thus there is in $\mathcal{A}^{\prime}$ a path $i^{\prime}=\varphi(i)\stackrel{{\scriptstyle w}}{{\rightarrow}}\varphi(t)\in T^{\prime}$ , which implies that $w$ is in $L(\mathcal{A}^{\prime})$ .

Conversely, let $w\in L(\mathcal{A}^{\prime})$ . Then there exists a path $i^{\prime}\stackrel{{\scriptstyle w}}{{\rightarrow}}t^{\prime}$ with $t^{\prime}\in T^{\prime}$ . We may decompose this path as $i^{\prime}=q^{\prime}_{0}\stackrel{{\scriptstyle w_{0}}}{{\rightarrow}}q^{\prime}_{1}\stackrel{{\scriptstyle w_{1}}}{{\rightarrow}}\cdots q^{\prime}_{n}\stackrel{{\scriptstyle w_{n}}}{{\rightarrow}}q^{\prime}_{n+1}=t^{\prime}$ , where each path $q^{\prime}_{i}\stackrel{{\scriptstyle w_{i}}}{{\rightarrow}}q^{\prime}_{i+1}$ is special. By surjectivity of $\varphi$ , we have $q^{\prime}_{i}=\varphi(q_{i})$ , $q_{0}=i$ by Condition 1 in the definition of reduction, and $q_{n+1}=t\in T$ by Condition 2. Next, there is in $\mathcal{A}$ a special path $q_{i}\stackrel{{\scriptstyle w_{i}}}{{\rightarrow}}q_{i+1}$ by Condition 3. Thus there is in $\mathcal{A}$ a path $i\stackrel{{\scriptstyle w}}{{\rightarrow}}t$ and consequently $w\in L(\mathcal{A})$ . ∎

Given a language $L\subset A^{*}$ , a nonempty residual $u^{-1}L$ is called special if either

1. $u=1$ , or
2. $u\in L$ , or
3. there are two words $v,w\in u^{-1}L$ which begin with different letters.

The minimal compact automaton of a language $L$ , denoted $\mathcal{A}_{c}(L)$ , is the following compact automaton. The set of states is the set of special residuals of $L$ . The initial state is $L$ and the terminal states are the $u^{-1}L$ such that $u$ is in $L$ . The edges are the $(p,v,q)$ such that $p=u^{-1}L$ , $q=(uv)^{-1}L$ and there is no factorization $v=v^{\prime}v^{\prime\prime}$ with $v^{\prime},v^{\prime\prime}$ nonempty such that $(uv^{\prime})^{-1}L$ is a special residual. By definition, $\mathcal{A}_{c}(L)$ is deterministic and all its states are special; in particular, all its special paths are edges.
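For a finite language, the definition above can be turned into a small computation. The following Python sketch is only an illustration under that finiteness assumption: residuals are represented as sets of words, and the helper names `residual`, `is_special` and `minimal_compact_automaton` are hypothetical, not taken from the paper.

```python
def residual(lang, u):
    """Return u^{-1}L = {w : uw in L} for a finite language given as a set of strings."""
    return frozenset(w[len(u):] for w in lang if w.startswith(u))


def is_special(res, lang):
    """Test the three conditions of the definition on a residual of lang."""
    if not res:
        return False                       # only nonempty residuals can be special
    if res == lang:
        return True                        # condition 1: the residual of u = 1
    if "" in res:
        return True                        # condition 2: u is in L
    return len({w[0] for w in res}) >= 2   # condition 3: two different first letters


def minimal_compact_automaton(words):
    """Build (states, initial, terminals, edges) of A_c(L) for a finite nonempty L."""
    lang = frozenset(words)
    states, edges, todo = {lang}, set(), [lang]
    while todo:
        p = todo.pop()
        for a in sorted({w[0] for w in p if w}):
            v, q = a, residual(p, a)
            # Extend the label until the residual reached is special.
            while not is_special(q, lang):
                (b,) = {w[0] for w in q}   # non-special: only one possible next letter
                v += b
                q = residual(p, v)
            edges.add((p, v, q))
            if q not in states:
                states.add(q)
                todo.append(q)
    terminals = {s for s in states if "" in s}
    return states, lang, terminals, edges


if __name__ == "__main__":
    states, initial, terminals, edges = minimal_compact_automaton({"aaa", "aba"})
    print(len(states), sorted(v for _, v, _ in edges))
```

On the language $\{aaa,aba\}$ the sketch yields three states and the edge labels $a$, $aa$, $ba$. The loop terminates because labels cannot be longer than the longest word of the finite language.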

Example 6.3

The compact automaton of Figure 2 is the minimal compact automaton of the language $\{aaa,aba\}$ .

Example 6.4

The compact automaton of Figure 3 is deterministic. Its initial state is indicated by an incoming arrow, and all states are terminal.

Figure 3: A deterministic compact automaton.

This automaton is the minimal compact automaton of the set of suffixes of the word $abacaba=\operatorname{Pal}(abc)$ .

Proposition 6.5

For every trim deterministic compact automaton $\mathcal{A}$ , there is a unique reduction from $\mathcal{A}$ onto the minimal compact automaton of $L(\mathcal{A})$ .

Proof.

Let $\mathcal{A}=(Q,i,T)$ and $L=L(\mathcal{A})$ . Set $\mathcal{A}_{c}(L)=(R,j,S)$ . We define a mapping $\varphi\colon Q_{s}\to R$ as follows. First $\varphi(i)=j$ , so that Condition 1 in the definition of reduction is satisfied. Next, for $p\in Q_{s}$ let $u$ be such that $i\stackrel{{\scriptstyle u}}{{\rightarrow}}p$ . We set $\varphi(p)=u^{-1}L$ . The map is well-defined because if $i\stackrel{{\scriptstyle u}}{{\rightarrow}}p$ and $i\stackrel{{\scriptstyle u^{\prime}}}{{\rightarrow}}p$ , then $u^{-1}L=u^{\prime-1}L$ . If $p$ is in $T$ , then $u$ is in $L$ and thus $\varphi(p)$ is in $S$ ; conversely, if $\varphi(p)\in S$ , then $u\in L$ , hence $p\in T$ , and Condition 2 is satisfied.

If there are two edges $p\stackrel{{\scriptstyle av}}{{\rightarrow}}q$ and $p\stackrel{{\scriptstyle a^{\prime}v^{\prime}}}{{\rightarrow}}q^{\prime}$ in $\mathcal{A}$ with $a\neq a^{\prime}$ , let $w,w^{\prime}$ be such that $q\stackrel{{\scriptstyle w}}{{\rightarrow}}t$ , $q^{\prime}\stackrel{{\scriptstyle w^{\prime}}}{{\rightarrow}}t^{\prime}$ and $t,t^{\prime}\in T$ . Then $avw,a^{\prime}v^{\prime}w^{\prime}\in u^{-1}L$ and thus $\varphi(p)$ is a special residual. This shows that $\varphi$ maps $Q_{s}$ into $R_{s}$ .

The mapping $\varphi$ is surjective because for each $u^{-1}L$ in $R$ , the state $p\in Q$ such that $i\stackrel{{\scriptstyle u}}{{\rightarrow}}p$ is special.

We verify Condition 3, that is, $p\stackrel{{\scriptstyle w}}{{\rightarrow}}q$ is a special path of $\mathcal{A}$ if and only if $\varphi(p)\stackrel{{\scriptstyle w}}{{\rightarrow}}\varphi(q)$ is an edge of $\mathcal{A}_{c}(L)$ . Indeed, if $p\stackrel{{\scriptstyle w}}{{\rightarrow}}q$ is a special path, let $i\stackrel{{\scriptstyle u}}{{\rightarrow}}p$ be a path in $\mathcal{A}$ . Then $u^{-1}L\stackrel{{\scriptstyle w}}{{\rightarrow}}(uw)^{-1}L$ is an edge of $\mathcal{A}_{c}$ since otherwise the path $p\stackrel{{\scriptstyle w}}{{\rightarrow}}q$ would not be special. Conversely, if $\varphi(p)\stackrel{{\scriptstyle w}}{{\rightarrow}}\varphi(q)$ is a special path of $\mathcal{A}_{c}(L)$ , then it is an edge; let $u$ be such that there is a path $i\stackrel{{\scriptstyle u}}{{\rightarrow}}p$ in $\mathcal{A}$ ; then, by definition of edges in $\mathcal{A}_{c}(L)$ , $\varphi(p)=u^{-1}L$ and $\varphi(q)=(uw)^{-1}L$ ; thus there is a path $i\stackrel{{\scriptstyle uw}}{{\rightarrow}}q$ and finally, since the automaton is deterministic, a path $p\stackrel{{\scriptstyle w}}{{\rightarrow}}q$ ; it must be special, since $u^{-1}L\stackrel{{\scriptstyle w}}{{\rightarrow}}(uw)^{-1}L$ is an edge of $\mathcal{A}_{c}$ .

Thus $\varphi$ is a reduction from $\mathcal{A}$ onto $\mathcal{A}_{c}(L)$ .

We now prove uniqueness: let $\psi$ be some reduction from $\mathcal{A}$ onto $\mathcal{A}_{c}$ . Let $q=i\cdot u$ (that is, the unique state $q$ such that there is a path from $i$ to $q$ with label $u$ ). Then $(Q,q,T)$ recognizes $u^{-1}L$ . If $\psi(q)=k$ , then it is easily verified that $\psi$ is a reduction from $(Q,q,T)$ onto $(R,k,S)$ . Hence by Proposition 6.2, these two automata recognize the same language. But, since the states of $\mathcal{A}_{c}$ are distinct residuals, $u^{-1}L$ is the unique state of $\mathcal{A}_{c}$ such that $(R,u^{-1}L,S)$ recognizes $u^{-1}L$ . Thus we must have $k=u^{-1}L$ . ∎

Since all the states of the compact automaton $\mathcal{A}_{c}(L)$ are special, its number of states is at most the number of special states of any compact automaton $\mathcal{A}$ recognizing $L$ . We have therefore the following statement which justifies the name of minimal compact automaton for $\mathcal{A}_{c}(L)$ .

Corollary 6.6

The compact automaton $\mathcal{A}_{c}(L)$ is, for every recognizable language $L$ , the unique compact automaton with the minimal number of states which recognizes $L$ .

Let $\mathcal{A}=(Q,i,T)$ be a trim compact deterministic automaton. Let $q\in Q$ be a non-special state. Let $q\stackrel{{\scriptstyle v}}{{\rightarrow}}r$ be the unique edge going out of $q$ . Then $q\neq r$ : indeed, if $q=r$ , then $q$ is not co-accessible, since $q$ is not terminal (being not special), and there is no other outgoing edge than the loop $q\stackrel{{\scriptstyle v}}{{\rightarrow}}q$ ; this contradicts that $\mathcal{A}$ is trim. Since $i$ is special, we have also $q\neq i$ . Consider the compact automaton $\mathcal{A}^{\prime}=(Q\setminus\{q\},i,T)$ with set of edges

(i) the edges $(p,w,r)$ of $\mathcal{A}$ with $p,r\neq q$ ,
(ii) the edges $(p,uv,r)$ for every edge $(p,u,q)$ of $\mathcal{A}$ .

The identity map of $Q_{s}$ is a reduction from $\mathcal{A}$ onto $\mathcal{A}^{\prime}$ , called an elementary reduction.
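The suppression of a non-special state can be sketched in a few lines of Python. This is a minimal illustration, assuming that an automaton is given by a set of edges `(p, label, r)`, and assuming (as the text above suggests) that in a trim deterministic compact automaton a state is non-special exactly when it is neither initial nor terminal and has a single outgoing edge. The function names are hypothetical.

```python
def non_special_states(edges, initial, terminals):
    """States that are neither initial nor terminal and have exactly one outgoing edge."""
    out = {}
    for p, _, _ in edges:
        out[p] = out.get(p, 0) + 1
    states = {p for p, _, _ in edges} | {r for _, _, r in edges}
    return sorted(q for q in states
                  if q != initial and q not in terminals and out.get(q, 0) == 1)


def elementary_reduction(edges, q):
    """Suppress the non-special state q and return the new set of edges."""
    [(v, r)] = [(lab, dst) for src, lab, dst in edges if src == q]   # unique edge out of q
    kept = {(p, w, s) for p, w, s in edges if p != q and s != q}     # edges of type (i)
    merged = {(p, u + v, r) for p, u, s in edges if s == q}          # edges of type (ii)
    return kept | merged


def reduce_all(edges, initial, terminals):
    """Apply elementary reductions until every remaining state is special."""
    edges = set(edges)
    while True:
        candidates = non_special_states(edges, initial, terminals)
        if not candidates:
            return edges
        edges = elementary_reduction(edges, candidates[0])


if __name__ == "__main__":
    # Minimal automaton of {aaa, aba}: state 2 is the only non-special state.
    dfa = {(0, "a", 1), (1, "a", 2), (1, "b", 2), (2, "a", 3)}
    print(sorted(reduce_all(dfa, 0, {3})))
```

Starting from the minimal automaton of $\{aaa,aba\}$, the single reduction suppressing state $2$ produces the edges labelled $a$, $aa$ and $ba$.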

Figure 4: An elementary reduction.

Proposition 6.7

The minimal compact automaton $\mathcal{A}_{c}(L)$ is obtained from $\mathcal{A}(L)$ by a sequence of elementary reductions.

Proof.

Consider the deterministic compact automata recognizing $L$ , having the following property, denoted by (R): for each state $q$ , reachable from the initial state by a path labelled $u$ , define the (well defined) mapping $\varphi_{\mathcal{A}}$ from the set of states into the set of residuals of $L$ by $q\mapsto u^{-1}L$ ; then this mapping is injective.

The minimal automaton $\mathcal{A}(L)$ has property $(R)$ . We claim that property (R) is preserved by each elementary reduction. If an automaton has property (R) and has only special states, then it must be the minimal compact automaton. This proves the proposition.

We prove the claim. Let $\mathcal{A}$ and $\mathcal{A}^{\prime}$ be as above. Then, as is easily verified, one has $\varphi_{\mathcal{A}^{\prime}}=\varphi_{\mathcal{A}}|(Q\setminus\{q\})$ , and more precisely, if $p\neq q$ is reachable by $u$ from $i$ in $\mathcal{A}$ , then it is reachable from $i$ by $u$ in $\mathcal{A}^{\prime}$ and therefore $\varphi_{\mathcal{A}}(p)=u^{-1}L=\varphi_{\mathcal{A}^{\prime}}(p)$ . ∎

Example 6.8

Let $\mathcal{A}$ be the deterministic automaton represented in Figure 1.

The special states are $0,1,3,7$ . There is a reduction from this automaton to the compact automaton of Figure 3.

Two elementary reductions, suppressing $5,6$ , give the automaton of Figure 5.

Figure 5: Suppression of $5,6$.

The suppression of $4$ gives then the compact automaton of Figure 6.

Figure 6: Suppression of $4$.

Finally, the suppression of $2$ gives the minimal compact automaton of Figure 3.

7 Direct construction of the compact suffix automaton of $\operatorname{Pal}(u)$

In Theorem 5.1, we have given some properties of the minimal automaton $\mathcal{S}(u)$ of the set of suffixes of $\operatorname{Pal}(u)$ . By Proposition 6.7, we know how to transform this automaton into the minimal compact automaton of this set, which we denote $\mathcal{S}_{c}(u)$ .

In the present section, we construct this compact automaton directly. One reason to proceed directly from $u$ to $\mathcal{S}_{c}(u)$ is that the number of states of $\mathcal{S}_{c}(u)$ is $1+$ the length of $u$ (by Lemma 7.2 below), while the number of states of $\mathcal{S}(u)$ (which is $1+$ the length of $\operatorname{Pal}(u)$ by Theorem 5.1) can be exponential in $|u|$ (for example, if $u=(ab)^{n}$ , the length of $\operatorname{Pal}(u)$ is $F_{2n+3}-2$ , which grows exponentially with $n$ ).
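The gap between the two state counts is easy to observe numerically. The sketch below assumes the usual definition of the iterated palindromic closure ($\operatorname{Pal}(1)=1$ and $\operatorname{Pal}(ux)$ is the shortest palindrome having $\operatorname{Pal}(u)x$ as a prefix) and the Fibonacci convention $F_{1}=F_{2}=1$; the function names are illustrative only.

```python
def palindromic_closure(w):
    """Shortest palindrome having w as a prefix."""
    for k in range(len(w) + 1):
        if w[k:] == w[k:][::-1]:          # w[k:] is the longest palindromic suffix of w
            return w + w[:k][::-1]


def pal(u):
    """Iterated palindromic closure: Pal(1) = 1, Pal(ux) = closure of Pal(u)x."""
    p = ""
    for x in u:
        p = palindromic_closure(p + x)
    return p


def fib(k):
    """Fibonacci numbers with the convention F_1 = F_2 = 1 (an assumption of this sketch)."""
    a, b = 1, 1
    for _ in range(k - 1):
        a, b = b, a + b
    return a


if __name__ == "__main__":
    for n in range(1, 7):
        u = "ab" * n
        # states of S_c(u) versus states of S(u), according to the counts quoted above
        print(n, len(u) + 1, len(pal(u)) + 1, fib(2 * n + 3) - 2)
```

The second printed column grows linearly in $n$, the third one like the Fibonacci numbers.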

Theorem 7.1

The automaton $\mathcal{S}_{c}(u)$ is completely characterized as follows: the states are the prefixes of $u$ , all terminal, and $1$ is the initial state. For each factorization $u=xyaz$ , where $a$ is a letter and $x,y,z$ are words, with $y$ $a$ -free, there is a transition $x\to xya$ , labelled $\operatorname{Pal}(xy)^{-1}\operatorname{Pal}(xya)$ .
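The description in Theorem 7.1 translates directly into code. The following Python sketch enumerates the factorizations $u=xyaz$ and then lists the labels of all paths from the initial state, so that the recognized language can be compared with the set of suffixes of $\operatorname{Pal}(u)$ on small examples. The helper names are hypothetical, and `pal` is computed from the usual definition by palindromic closure.

```python
def palindromic_closure(w):
    """Shortest palindrome having w as a prefix."""
    for k in range(len(w) + 1):
        if w[k:] == w[k:][::-1]:
            return w + w[:k][::-1]


def pal(u):
    """Iterated palindromic closure of u."""
    p = ""
    for x in u:
        p = palindromic_closure(p + x)
    return p


def compact_suffix_automaton(u):
    """Edges (x, label, xya), one per factorization u = x y a z with y a-free."""
    edges = []
    for j, a in enumerate(u):                       # a = u[j], target state xya = u[:j+1]
        label = pal(u[:j + 1])[len(pal(u[:j])):]    # Pal(xy)^{-1} Pal(xya)
        for i in range(j, -1, -1):                  # x = u[:i], y = u[i:j]
            if a in u[i:j]:
                break                               # any longer y also contains a
            edges.append((u[:i], label, u[:j + 1]))
    return edges


def recognized_words(u):
    """Labels of all paths starting at the initial state 1 (every state is terminal)."""
    edges = compact_suffix_automaton(u)
    words, todo = set(), [("", "")]                 # pairs (state, label read so far)
    while todo:
        state, word = todo.pop()
        words.add(word)
        todo.extend((r, word + v) for p, v, r in edges if p == state)
    return words


if __name__ == "__main__":
    for p, v, r in compact_suffix_automaton("abc"):
        print(repr(p), "->", repr(r), "labelled", v)
```

The path enumeration is exponential in $|u|$ in general, so it is meant for checking small cases only.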

Recall from Section 6 that the states of the automaton $\mathcal{S}_{c}(u)$ are the special residuals of $L$ , the set of suffixes of $\operatorname{Pal}(u)$ . By Theorem 5.1, the nonempty residuals of $L$ are the $p^{-1}L$ where $p$ is a prefix of $\operatorname{Pal}(u)$ . Clearly, the mapping $p\mapsto p^{-1}L$ is a bijection from the set of prefixes of $\operatorname{Pal}(u)$ onto the set of nonempty residuals of $L$ .

Lemma 7.2

The set of states of $\mathcal{S}_{c}(u)$ is naturally in bijection with the set of palindromic prefixes of $\operatorname{Pal}(u)$ . This set is naturally in bijection with the set of prefixes of $u$ ; with this identification, the initial state is $1$ and all states are terminal.

Proof.

The second bijection maps a prefix $p$ of $u$ onto the prefix $\operatorname{Pal}(p)$ of $\operatorname{Pal}(u)$ (see [17] p. 209).

Let $p$ be a prefix of $\operatorname{Pal}(u)$ . According to the definition of special residuals in Section 6, $p^{-1}L$ is special if and only if (i) either $p=1$ , or (ii) $p\in L$ , or (iii) there are two words in $p^{-1}L$ beginning with different letters.

Let $p^{-1}L$ be a special residual of $L$ . We show that in all three cases, $p$ is a palindromic prefix of $\operatorname{Pal}(u)$ .

In case (i), $p=1$ which is clearly a palindromic prefix of $\operatorname{Pal}(u)$ . In case (ii), $p$ is a suffix of $\operatorname{Pal}(u)$ , and being also a prefix, it is a palindromic prefix of $\operatorname{Pal}(u)$ . In case (iii), let $s,t$ be the two words, with $s=as^{\prime}$ , $t=bt^{\prime}$ , for distinct letters $a,b$ ; then $pas^{\prime},pbt^{\prime}$ are in $L$ , hence $p$ is a right special factor of $\operatorname{Pal}(u)$ ; then $\tilde{p}$ is a left special factor of $\operatorname{Pal}(u)$ , thus by Corollary 5.3, $\tilde{p}$ is a prefix of $\operatorname{Pal}(u)$ and therefore $p$ is a palindromic prefix of $\operatorname{Pal}(u)$ .

Conversely, if $p$ is a palindromic prefix of $\operatorname{Pal}(u)$ , then $p$ is also a suffix of $\operatorname{Pal}(u)$ , hence $p\in L$ and $p^{-1}L$ is a special residual of $L$ , by case (ii). ∎
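The second bijection of Lemma 7.2 can be observed on examples: the palindromic prefixes of $\operatorname{Pal}(u)$ are the words $\operatorname{Pal}(p)$ for $p$ a prefix of $u$. A short Python check, with illustrative function names and `pal` computed from the usual definition by palindromic closure, might look as follows.

```python
def palindromic_closure(w):
    """Shortest palindrome having w as a prefix."""
    for k in range(len(w) + 1):
        if w[k:] == w[k:][::-1]:
            return w + w[:k][::-1]


def pal(u):
    """Iterated palindromic closure of u."""
    p = ""
    for x in u:
        p = palindromic_closure(p + x)
    return p


def palindromic_prefixes(w):
    """All prefixes of w that are palindromes, by increasing length."""
    return [w[:k] for k in range(len(w) + 1) if w[:k] == w[:k][::-1]]


if __name__ == "__main__":
    u = "abc"
    print(palindromic_prefixes(pal(u)))
    print([pal(u[:k]) for k in range(len(u) + 1)])
```

For $u=abc$ both lists contain the four words $1$, $a$, $aba$, $abacaba$, in agreement with the count $1+|u|$ of states.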

We use another formula of Justin, see [17] p. 209. Let $x\in A$ . If $u$ is $x$ -free then $\operatorname{Pal}(ux)=\operatorname{Pal}(u)x\operatorname{Pal}(u)$ . If on the other hand $x$ occurs in $u$ , write $u=u_{1}xu_{2}$ with $u_{2}$ $x$ -free. Then

$$\operatorname{Pal}(ux)=\operatorname{Pal}(u)\operatorname{Pal}(u_{1})^{-1}\operatorname{Pal}(u).\tag{7.1}$$
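Justin's two formulas give a way to compute $\operatorname{Pal}$ letter by letter without searching for palindromic suffixes. The sketch below implements them and compares the result with the definition by palindromic closure on all short words over a three-letter alphabet; the function names and the test alphabet are choices made for this illustration.

```python
from itertools import product


def palindromic_closure(w):
    """Shortest palindrome having w as a prefix."""
    for k in range(len(w) + 1):
        if w[k:] == w[k:][::-1]:
            return w + w[:k][::-1]


def pal(u):
    """Iterated palindromic closure, computed from the definition."""
    p = ""
    for x in u:
        p = palindromic_closure(p + x)
    return p


def pal_justin(u):
    """Compute Pal(u) letter by letter with Justin's formulas."""
    pals = [""]                              # pals[k] = Pal(prefix of u of length k)
    for k, x in enumerate(u):
        cur = pals[k]
        j = u.rfind(x, 0, k)                 # last occurrence of x in u[:k], or -1
        if j < 0:
            nxt = cur + x + cur              # u[:k] is x-free
        else:
            nxt = cur + cur[len(pals[j]):]   # formula (7.1) with u_1 = u[:j]
        pals.append(nxt)
    return pals[-1]


def formulas_agree(alphabet="abc", max_len=5):
    """Compare both computations on every word of length at most max_len."""
    return all(pal_justin("".join(t)) == pal("".join(t))
               for n in range(max_len + 1) for t in product(alphabet, repeat=n))


if __name__ == "__main__":
    print(pal_justin("abca"), formulas_agree())
```

The slice `cur[len(pals[j]):]` is the word $\operatorname{Pal}(u_{1})^{-1}\operatorname{Pal}(u)$, which makes sense because $\operatorname{Pal}(u_{1})$ is a prefix of $\operatorname{Pal}(u)$.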

The recursive definition of $\mathcal{S}_{c}(u)$ is explained in the following result.

Proposition 7.3

Let $u\in A^{*},x\in A$ . Define $u=hu_{2}$ , where $u_{2}$ is the longest $x$ -free suffix of $u$ . Identifying the set of states of $\mathcal{S}_{c}(u)$ with the set of prefixes of $u$ , as stated in Lemma 7.2, construct an automaton $\mathcal{S}$ as follows:

• add to $\mathcal{S}_{c}(u)$ the new state $ux$ , which is terminal;
• for each prefix $p$ of $u_{2}$ , add an edge from the state $hp$ of $\mathcal{S}_{c}(u)$ to the new state $ux$ , labelled $\operatorname{Pal}(u)^{-1}\operatorname{Pal}(ux)$ .

Then $\mathcal{S}=\mathcal{S}_{c}(ux)$ .

Note that $\operatorname{Pal}(u)$ is a prefix of $\operatorname{Pal}(ux)$ , so that $\operatorname{Pal}(u)^{-1}\operatorname{Pal}(ux)\in A^{*}$ . Moreover, $hp$ is a prefix of $u$ , hence is a state of $\mathcal{S}_{c}(u)$ .
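The construction of Proposition 7.3 can be run incrementally, one letter at a time. In the Python sketch below, states are represented by the lengths of the corresponding prefixes and the labels are obtained from Justin's formulas recalled above; this representation and the names `extend` and `build` are assumptions of the sketch.

```python
def extend(u, pals, edges, x):
    """One step of the construction: add the state ux and its incoming edges.

    States are prefix lengths; pals[k] is Pal of the prefix of length k.
    """
    j = u.rfind(x)                           # last occurrence of x in u, or -1
    h_len = j + 1                            # u = h u_2, u_2 the longest x-free suffix
    cur = pals[-1]                           # Pal(u)
    if j < 0:
        new = cur + x + cur                  # u is x-free
    else:
        new = cur + cur[len(pals[j]):]       # Pal(u) Pal(u_1)^{-1} Pal(u), u_1 = u[:j]
    label = new[len(cur):]                   # Pal(u)^{-1} Pal(ux)
    for k in range(h_len, len(u) + 1):       # the states hp, p a prefix of u_2
        edges.append((k, label, len(u) + 1))
    pals.append(new)
    return u + x


def build(word):
    """Start from the one-state automaton for the empty word and add letters one by one."""
    u, pals, edges = "", [""], []
    for x in word:
        u = extend(u, pals, edges, x)
    return pals, edges


if __name__ == "__main__":
    pals, edges = build("abca")
    print([e for e in edges if e[2] == 4])
```

For the word $abca$, the last step adds three edges, from the states $a$, $ab$ and $abc$ to the new state $abca$, all labelled $abacaba$.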

Figure 7: From $\mathcal{S}_{c}(abc)$ to $\mathcal{S}_{c}(abca)$.

Figure 7 illustrates the construction in Proposition 7.3: the construction from $\mathcal{S}_{c}(abc)$ in Figure 3 to $\mathcal{S}_{c}(abca)$ ; here, $u=abc$ , $h=a,u_{2}=bc$ and only the new edges are drawn. Note that $\operatorname{Pal}(abc)^{-1}\operatorname{Pal}(abca)=abacaba$ .

Lemma 7.4

Each word recognized by the automaton $\mathcal{S}$ of Proposition 7.3 is a suffix of $\operatorname{Pal}(ux)$ .

Proof.

Let $w$ be a word recognized by $\mathcal{S}$ , that is the label of some path in $\mathcal{S}$ . If this path does not end at $ux$ , then by the construction, it is a path in $\mathcal{S}_{c}(u)$ ; hence $w$ is a suffix of $\operatorname{Pal}(u)$ , and since $\operatorname{Pal}(u)$ is a suffix of $\operatorname{Pal}(ux)$ (the former is a prefix of the latter, and both words are palindromes), $w$ is a suffix of $\operatorname{Pal}(ux)$ . If this path ends at $ux$ , then its last edge is one of the new edges; hence $w=s\operatorname{Pal}(u)^{-1}\operatorname{Pal}(ux)$ , where $s$ is a suffix of $\operatorname{Pal}(u)$ . Suppose first that $u$ is $x$ -free; then by Justin’s result recalled above, $\operatorname{Pal}(ux)=\operatorname{Pal}(u)x\operatorname{Pal}(u)$ and $w=sx\operatorname{Pal}(u)$ is a suffix of $\operatorname{Pal}(u)x\operatorname{Pal}(u)=\operatorname{Pal}(ux)$ . Suppose now that $u$ is not $x$ -free; then $u=u_{1}xu_{2}$ , $u_{2}$ is $x$ -free and $\operatorname{Pal}(ux)=\operatorname{Pal}(u)\operatorname{Pal}(u_{1})^{-1}\operatorname{Pal}(u)$ ; moreover, $w=s\operatorname{Pal}(u)^{-1}\operatorname{Pal}(u)\operatorname{Pal}(u_{1})^{-1}\operatorname{Pal}(u)=s\operatorname{Pal}(u_{1})^{-1}\operatorname{Pal}(u)$ ; since $s$ is a suffix of $\operatorname{Pal}(u)$ and since $\operatorname{Pal}(u_{1})^{-1}\operatorname{Pal}(u)$ is in $A^{*}$ , we see that $w$ is a suffix of $\operatorname{Pal}(u)\operatorname{Pal}(u_{1})^{-1}\operatorname{Pal}(u)=\operatorname{Pal}(ux)$ .

∎

Lemma 7.5

For every prefix $p$ of $u$ , $\mathcal{S}_{c}(p)$ is obtained from $\mathcal{S}_{c}(u)$ by keeping in the latter only the states which are prefixes of $p$ .

The number of paths in $\mathcal{S}_{c}(u)$ from the initial state to the state $u$ is equal to $|\operatorname{Pal}(u)|-|\operatorname{Pal}(u^{-})|$ if $u$ is nonempty (where $x^{-}$ denotes the word $x$ with the last letter removed), and it is $1$ if $u$ is empty.

Proof.

We prove the lemma by induction on the length of $u$ . For $u=1$ , it is immediate.

Suppose now that $u\in A^{*}$ and $x\in A$ . We prove the assertions for $ux$ , admitting them for shorter words.

Take the notation $\mathcal{S}$ and $u=hu_{2}$ as in Proposition 7.3. We know by Lemma 7.4 that each word recognized by $\mathcal{S}$ is recognized by $\mathcal{S}_{c}(ux)$ . We prove now the converse, by a counting argument, using the induction hypothesis. Let $n_{w}$ denote the number of words recognized by $\mathcal{S}_{c}(w)$ . Then $n_{w}$ is equal to the number of suffixes of $\operatorname{Pal}(w)$ , hence is equal to $1+$ the length of $\operatorname{Pal}(w)$ . Hence $n_{ux}-n_{u}=|\operatorname{Pal}(ux)|-|\operatorname{Pal}(u)|$ . Now let $n$ be the number of words recognized by $\mathcal{S}$ . By construction of the automaton $\mathcal{S}$ , each word recognized by it, and not recognized by $\mathcal{S}_{c}(u)$ , is of the form $w=s\operatorname{Pal}(u)^{-1}\operatorname{Pal}(ux)$ , where $s$ is the label of some path in $\mathcal{S}_{c}(u)$ which starts at $1$ and ends at $hp$ for some prefix $p$ of $u_{2}$ . By induction, the number of such words $s$ is equal to $|\operatorname{Pal}(hp)|-|\operatorname{Pal}((hp)^{-})|$ if $hp$ is nonempty, and 1 if $hp$ is empty. Since the corresponding sum is telescoping, it follows that the number $n-n_{u}$ of possible words $s$ is equal to $|\operatorname{Pal}(u)|-|\operatorname{Pal}(h^{-})|$ if $h$ is nonempty, and to $1+|\operatorname{Pal}(u)|$ if $h$ is empty. If $u$ is not $x$ -free, then $h=u_{1}x$ is nonempty, $h^{-}=u_{1}$ and $\operatorname{Pal}(ux)=\operatorname{Pal}(u)\operatorname{Pal}(u_{1})^{-1}\operatorname{Pal}(u)$ , so that $n_{ux}-n_{u}=|\operatorname{Pal}(ux)|-|\operatorname{Pal}(u)|=|\operatorname{Pal}(u_{1})^{-1}\operatorname{Pal}(u)|=|\operatorname{Pal}(u)|-|\operatorname{Pal}(u_{1})|=|\operatorname{Pal}(u)|-|\operatorname{Pal}(h^{-})|=n-n_{u}$ . If $u$ is $x$ -free, then $h=1$ , $\operatorname{Pal}(ux)=\operatorname{Pal}(u)x\operatorname{Pal}(u)$ , and $n_{ux}-n_{u}=|\operatorname{Pal}(ux)|-|\operatorname{Pal}(u)|=|x\operatorname{Pal}(u)|=1+|\operatorname{Pal}(u)|=n-n_{u}$ . 
Thus in both cases, $n_{ux}=n$ , which implies that $\mathcal{S}_{c}(ux)$ and $\mathcal{S}$ both recognize the language of suffixes of $\operatorname{Pal}(ux)$ ; since both automata have the same number of states and the first is minimal, they are isomorphic; but since both have a unique longest path of the same length, they are equal.

The two assertions of the lemma now clearly follow for $ux$ . ∎
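As a sanity check of the counting argument, one may work out by hand the case $u=abc$, $x=a$ (the situation of Figure 7). The three summands below are the numbers of paths from the initial state to the states $a$, $ab$, $abc$ given by the induction hypothesis.

```latex
\begin{align*}
h &= a,\qquad u_{2}=bc,\qquad h^{-}=u_{1}=1,\\
\operatorname{Pal}(u) &= abacaba,\qquad \operatorname{Pal}(ux)=abacabaabacaba,\\
n-n_{u} &= \underbrace{\bigl(|\operatorname{Pal}(a)|-|\operatorname{Pal}(1)|\bigr)}_{1}
  +\underbrace{\bigl(|\operatorname{Pal}(ab)|-|\operatorname{Pal}(a)|\bigr)}_{2}
  +\underbrace{\bigl(|\operatorname{Pal}(abc)|-|\operatorname{Pal}(ab)|\bigr)}_{4}
  = |\operatorname{Pal}(u)|-|\operatorname{Pal}(h^{-})| = 7,\\
n_{ux}-n_{u} &= |\operatorname{Pal}(ux)|-|\operatorname{Pal}(u)| = 14-7 = 7.
\end{align*}
```

Both counts agree, and the value $7$ is also the number of paths from the initial state to the state $abca$ predicted by the second assertion of the lemma.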

Proof of Proposition 7.3. It follows from the proof of Lemma 7.5.

Proof of Theorem 7.1. The theorem follows by a straightforward induction from Proposition 7.3.

Theorem 7.1 has the following corollary, which could also be proved using (iv) in Section 5 and Proposition 6.7.

Corollary 7.6

In the graph $\mathcal{S}_{c}(u)$ , the label of an edge depends only on the final state of the edge.

Proof.

Indeed, the label of each transition $v\to w$ is $\operatorname{Pal}(w^{-})^{-1}\operatorname{Pal}(w)$ . ∎

Corollary 7.7

Let $u\in A^{*}$ , of length $n$ , and for any $a\in A$ , denote by $p_{a}(u)$ the position of the rightmost occurrence of $a$ in the word $u$ , with $p_{a}(u)=0$ when $u$ is $a$ -free. Then the number of states in $\mathcal{S}_{c}(u)$ is $n+1$ and the number of transitions is $\sum_{a\in A}p_{a}(u)$ .

Proof.

By Theorem 7.1, the number of states is the number of prefixes of $u$ , thus it is $n+1$ . The number $t$ of transitions is equal to $t=\sum_{a\in A}t_{a}$ , where $t_{a}$ is the number of factorizations $u=xyaz$ , $x,y,z\in A^{*}$ , $y$ $a$ -free. We have $t_{a}=0$ if $u$ is $a$ -free.

Suppose that $u$ contains $a$ . Denote $I_{a}=A^{*}aA^{*}$ , the set of words containing letter $a$ ; clearly, each word $v$ in $I_{a}$ has a unique factorization $v=yaz$ , $y,z\in A^{*}$ , $y$ $a$ -free. Hence $t_{a}$ is equal to the number of factorizations $u=xv$ ,

$$x\in A^{*},\quad v\in I_{a}.\tag{$*$}$$

If we factorize $u=u_{1}au_{2}$ , where $u_{2}$ is the longest $a$ -free suffix of $u$ , then $|u_{1}|+1=p_{a}(u)$ , and a factorization $u=xv$ satisfies $(*)$ if and only if $x$ is a prefix of $u_{1}$ . Hence the number of such factorizations is $|u_{1}|+1=p_{a}(u)$ . ∎
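The counting formula of Corollary 7.7 is easy to check by brute force. The sketch below compares $\sum_{a\in A}p_{a}(u)$ with a direct enumeration of the factorizations $u=xyaz$ with $y$ $a$-free; the function names are illustrative.

```python
def rightmost_position(u, a):
    """1-based position of the rightmost occurrence of a in u, and 0 if u is a-free."""
    return u.rfind(a) + 1


def transition_count(u):
    """Sum of p_a(u) over the letters a; letters not occurring in u contribute 0."""
    return sum(rightmost_position(u, a) for a in set(u))


def factorization_count(u):
    """Number of factorizations u = x y a z with a a letter and y a-free."""
    return sum(1
               for j, a in enumerate(u)      # a = u[j]
               for i in range(j + 1)         # x = u[:i], y = u[i:j]
               if a not in u[i:j])


if __name__ == "__main__":
    for u in ["abc", "abca", "aabb"]:
        print(u, len(u) + 1, transition_count(u), factorization_count(u))
```

For $u=abc$ this gives $4$ states and $1+2+3=6$ transitions, and for $u=abca$ it gives $5$ states and $4+2+3=9$ transitions.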

For a binary alphabet, Theorem 7.1 was obtained, in another but equivalent form, by Epifanio, Mignosi, Shallit and Venturini [13] (see also [7], especially Figure 1). Similarly for Corollary 7.7, see [13] Proposition 1.

8 Further comments

Following [13], we obtain for any word $u$ on any alphabet, a directed graph that counts from 0 to $n=|\operatorname{Pal}(u)|$ in the following sense: replace in $\mathcal{S}_{c}(u)$ each label by its length; then one obtains a directed graph, with the initial vertex $1$ , such that for each $k=0,\ldots,n$ , there is a unique path, starting from the initial vertex, whose label is $k$ (here, labels of paths are additive). This follows clearly since there is a unique suffix of $\operatorname{Pal}(u)$ of each length $k=0,\ldots,n$ . For $u$ on a binary alphabet, Epifanio et al. call this graph a Sturmian graph.
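The counting property can be observed directly. The Python sketch below builds the weighted graph from the description of $\mathcal{S}_{c}(u)$ in Theorem 7.1, with states represented by prefix lengths, and lists the total weight of every path from the initial vertex. The names `length_graph` and `path_sums` are hypothetical, and the enumeration is only practical for short words.

```python
def palindromic_closure(w):
    """Shortest palindrome having w as a prefix."""
    for k in range(len(w) + 1):
        if w[k:] == w[k:][::-1]:
            return w + w[:k][::-1]


def pal(u):
    """Iterated palindromic closure of u."""
    p = ""
    for x in u:
        p = palindromic_closure(p + x)
    return p


def length_graph(u):
    """Edges (i, weight, j) of S_c(u) with labels replaced by their lengths, and n = |Pal(u)|."""
    lengths = [len(pal(u[:k])) for k in range(len(u) + 1)]
    edges = []
    for j, a in enumerate(u):
        for i in range(j, -1, -1):           # factorization u = x y a z, x = u[:i], y = u[i:j]
            if a in u[i:j]:
                break
            edges.append((i, lengths[j + 1] - lengths[j], j + 1))
    return edges, lengths[-1]


def path_sums(edges):
    """Sorted list of the total weights of all paths starting at vertex 0."""
    sums, todo = [], [(0, 0)]
    while todo:
        state, total = todo.pop()
        sums.append(total)
        todo.extend((r, total + w) for p, w, r in edges if p == state)
    return sorted(sums)


if __name__ == "__main__":
    edges, n = length_graph("abc")
    print(n, path_sums(edges))
```

For $u=abc$ the weights are $1$, $2$ and $4$, and the path sums are exactly $0,1,\ldots,7$, each obtained once.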

As an open problem, we mention that Theorem 7.1 has certainly an interpretation as a factorization result: each suffix of $\operatorname{Pal}(u)$ has a certain factorization as a product of the words which are the labels of the edges of the automaton; these words are all of the form $\operatorname{Pal}(p^{-})^{-1}\operatorname{Pal}(p)$ , $p$ a nonempty prefix of $u$ . In the binary alphabet case, this is known: the factorization is related to the Ostrowski lazy factorization (see [12]), and to the factorization theorem of Anna Frid [15] Corollary 1, which following [7] Section 9, may be stated as follows: for $u$ on a binary alphabet, of length $n$ , each suffix $s$ of $\operatorname{Pal}(u)$ , of length $\ell$ , has a unique factorization $L_{0}^{d_{1}}L_{1}^{d_{2}}\cdots L_{n-1}^{d_{n}}$ , where $L_{i}=\operatorname{Pal}(p_{i})^{-1}\operatorname{Pal}(p_{i+1})$ , $p_{i}$ the prefix of length $i$ of $u$ , and where $\ell=\sum_{1\leq i\leq n}d_{i}q_{i-1}$ is the lazy Ostrowski representation of $\ell$ , with $q_{j}=|L_{j}|$ .

References

[1] Jorge Almeida. Profinite semigroups and applications. In Structural theory of automata, semigroups, and universal algebra, volume 207 of NATO Sci. Ser. II Math. Phys. Chem., pages 1–45. Springer, Dordrecht, 2005. Notes taken by Alfredo Costa.
[2] Jorge Almeida, Alfredo Costa, Revekka Kyriakoglou, and Dominique Perrin. Profinite Semigroups and Symbolic Dynamics. Springer Verlag, 2020.
[3] Pierre Arnoux and Gérard Rauzy. Représentation géométrique de suites de complexité $2n+1$ . Bull. Soc. Math. France, 119(2):199–215, 1991.
[4] A. Blumer, J. Blumer, D. Haussler, A. Ehrenfeucht, M. T. Chen, and J. Seiferas. The smallest automaton recognizing the subwords of a text. Theoret. Comput. Sci., 40(1):31–55, 1985. Special issue: Eleventh international colloquium on automata, languages and programming (Antwerp, 1984).
[5] Anselm Blumer, J. Blumer, David Haussler, Ross M. McConnell, and Andrzej Ehrenfeucht. Complete inverted files for efficient text retrieval and analysis. J. ACM, 34(3):578–595, 1987.
[6] Anselm Blumer, Andrzej Ehrenfeucht, and David Haussler. Average sizes of suffix trees and dawgs. Discret. Appl. Math., 24(1-3):37–45, 1989.
[7] Y. Bugeaud and C. Reutenauer. On the conjugates of Christoffel words, 2022.
[8] Maxime Crochemore and Renaud Vérin. Direct construction of compact directed acyclic word graphs. In Alberto Apostolico and Jotun Hein, editors, Combinatorial Pattern Matching, 8th Annual Symposium, CPM 97, Aarhus, Denmark, June 30 - July 2, 1997, Proceedings, volume 1264 of Lecture Notes in Computer Science, pages 116–129. Springer, 1997.
[9] Aldo de Luca. Sturmian words: structure, combinatorics, and their arithmetics. Theoret. Comput. Sci., 183(1):45 – 82, 1997.
[10] Xavier Droubay, Jacques Justin, and Giuseppe Pirillo. Episturmian words and some constructions of de Luca and Rauzy. Theoret. Comput. Sci., 255(1-2):539–553, 2001.
[11] Fabien Durand and Dominique Perrin. Dimension Groups and Dynamical Systems. Cambridge University Press, 2022.
[12] Chiara Epifanio, Christiane Frougny, Alessandra Gabriele, Filippo Mignosi, and Jeffrey O. Shallit. Sturmian graphs and integer representations over numeration systems. Discret. Appl. Math., 160(4-5):536–547, 2012.
[13] Chiara Epifanio, Filippo Mignosi, Jeffrey O. Shallit, and Ilaria Venturini. On Sturmian graphs. Discret. Appl. Math., 155(8):1014–1030, 2007.
[14] Gabriele Fici. Special factors and the combinatorics of suffix and factor automata. Theoret. Comput. Sci., 412(29):3604–3615, 2011.
[15] Anna E. Frid. Sturmian numeration systems and decompositions to palindromes. European J. Combin., 71:202–212, 2018.
[16] Marshall Hall, Jr. A topology for free groups and related groups. Ann. of Math. (2), 52:127–139, 1950.
[17] Jacques Justin. Episturmian morphisms and a Galois theorem on continued fractions. ITA, 39(1):207–215, 2005.
[18] Christian Kassel and Christophe Reutenauer. A palindromization map for the free group. Theor. Comput. Sci., 409(3):461–470, 2008.
[19] M. Lothaire. Applied Combinatorics on Words. Encyclopedia of Mathematics and its Applications. Cambridge University Press, 2005.
[20] Gérard Rauzy. Mots infinis en arithmétique. In Maurice Nivat and Dominique Perrin, editors, Automata on Infinite Words, volume 192 of Lecture Notes in Computer Science, pages 165–171. Springer-Verlag, 1984.
[21] Christophe Reutenauer. From Christoffel Words to Markoff Numbers. Oxford University Press, 2019.
[22] Luis Ribes and Pavel Zalesskii. Profinite groups, volume 40 of Ergebnisse der Mathematik und ihrer Grenzgebiete. 3. Folge. A Series of Modern Surveys in Mathematics [Results in Mathematics and Related Areas. 3rd Series. A Series of Modern Surveys in Mathematics]. Springer-Verlag, Berlin, second edition, 2010.
[23] Gwenaël Richomme. A characterization of infinite LSP words. In Developments in language theory, volume 10396 of Lecture Notes in Comput. Sci., pages 320–331. Springer, Cham, 2017.
[24] Marinella Sciortino and Luca Q. Zamboni. Suffix automata and standard Sturmian words. In Tero Harju, Juhani Karhumäki, and Arto Lepistö, editors, Developments in Language Theory, 11th International Conference, DLT 2007, Turku, Finland, July 3-6, 2007, Proceedings, volume 4588 of Lecture Notes in Computer Science, pages 382–398. Springer, 2007.
