This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

11institutetext: Laboratoire de Combinatoire et d’Informatique Mathématique,
Université du Québec à Montréal,
CP 8888 Succ. Centre-ville, Montréal (QC) Canada H3C 3P8
11email: [email protected]

On the number of kk-powers in a finite word

Shuo LI 11
Abstract

This note is an attempt to attack a conjecture of Fraenkel and Simpson stated in 1998 concerning the number of distinct squares in a finite word. By counting the number of (right-)special factors, we give an upper bound of the number of kk-powers in a finite word for any integer k3k\geq 3. By kk-power, we mean a word of the form uuuktimes\underbrace{uu...u}_{k\;\text{times}}.

1 Introduction and notation

Given a finite word, the problem of counting the number of distinct squares was introduced by Fraenkel and Simpson. In [4] they conjectured that the number of distinct squares in a finite word ww is bounded by its length |w||w| and they proved that this number is bounded by 2|w|2|w|. After that Ilie [5] strengthened this bound to 2|w|Θ(n)2|w|-\Theta(n); Lam [6] improved this result to 9548|w|\frac{95}{48}|w|; Deza, Franek and Thierry [2] achieved a bound of 116|w|\frac{11}{6}|w|; Thierry [7] refined this bound to 32n\frac{3}{2}n. A basic fact about the square-counting problem is that no more than two squares can have their last occurrence starting at the same position, this fact is proved in [4] using the three squares lemma of Crochemore and Rytter [1]. After that, the ideas of improving the bound of distinct squares in a finite word are about counting the number of positions at which there exist two different squares having their last occurrence starting. In this article, instead of studying the same motif, we propose to consider the number of special factors. The idea is from the fact that (almost) every occurrent factor can be associated with a (right-)special factor. Even though we can not achieve an injection from the set of squares to the set of special factors in the finite word. This correspondence leads an upper bound of the number of kk-powers in the given word. The main result of this note is announced as follows:

Theorem 1

Let kk be an integer larger than 2. For any finite word ww, let Nk(w)N_{k}(w) denote the number of its distinct non-empty factors of the form uuuktimes\underbrace{uu...u}_{k\;\text{times}}, let |w||w| denote the length of ww and let |Alph(w)||\hbox{\rm Alph}(w)| denote the number of distinct letters in ww. We then have

Nk(w)|w||Alph(w)|k2.N_{k}(w)\leq\frac{|w|-|\hbox{\rm Alph}(w)|}{k-2}.

2 Preliminaries

Let us recall the basic terminology about words. By word we mean a finite concatenation of symbols w=w1w2wnw=w_{1}w_{2}\cdots w_{n}, with nn\in\mathbb{N}. The length of ww, denoted |w||w|, is nn and we say that the symbol wiw_{i} is at position ii. The set Alph(w)={wi|1in}\hbox{\rm Alph}(w)=\left\{w_{i}|1\leq i\leq n\right\} is called the alphabet of ww and its elements are called letters. Let |Alph(w)||\hbox{\rm Alph}(w)| denote the cardinality of Alph(w)\hbox{\rm Alph}(w). A word of length 0 is called the empty word and it is denoted by ε\varepsilon. For any word uu, we have u=εu=uεu=\varepsilon u=u\varepsilon. Let u,vu,v be two different words, we say aa is shorter (resp. longer) than bb if |a|<|b||a|<|b| (resp. |a|>|b||a|>|b|).

A word uu is called a factor of ww if w=pusw=pus for some words p,sp,s. When p=εp=\varepsilon (resp. s=εs=\varepsilon) uu is called a prefix (resp. suffix) of ww. The set of all factors (resp. prefixes, resp. suffixes) of ww is denoted by Fac(w)\hbox{\rm Fac}(w) (resp. Pref(w)\hbox{\rm Pref}(w), resp. Suff(w)\hbox{\rm Suff}(w)). For any integer ii satisfying 1i|w|1\leq i\leq|w|, let Cw(i)C_{w}(i) denote the number of distinct factors of length ii in ww.

A factor uu of ww is called right-special if there exist two different letters a,bAlph(w)a,b\in\hbox{\rm Alph}(w) such that uaua and ubub are both factors of ww. ua,ubua,ub are called right-extensions of uu.

For any natural number kk, the kk-th power of a finite word uu is denoted by uk=uuuu^{k}=uu\cdots u and consists of the concatenation of kk copies of uu. A finite word ww is said to be primitive if it is not a power of another word, that is if w=ukw=u^{k} implies k=1k=1. Let Prim(w)\hbox{\rm Prim}(w) denote the set of all primitive factors of ww. A kk-power is a word ww satisfying w=uuuktilmesw=\underbrace{uu...u}_{k\;\text{tilmes}} for a certain uFac(w)u\in\hbox{\rm Fac}(w) and for a k2k\geq 2. Let Nk(w)N_{k}(w) denote the number of its distinct non-empty kk-powers. For a given word ww and two positive integers a,ba,b satisfying ab=|w|a\leq b=|w|, let us define wabw^{\frac{a}{b}} to be the prefix of ww of length aa. Now we can define the rational power a word: let ww be a finite word and let ab+\frac{a}{b}\in\mathbb{Q}^{+} be a positive rational number, wabw^{\frac{a}{b}} is well defined only if b=|w|b=|w|, and in this case, there is a couple of non-negative integers (c,d)(c,d) satisfying a=c|w|+da=c|w|+d, we define wabw^{\frac{a}{b}} to be wcwdbw^{c}w^{\frac{d}{b}}. For a given word ww and a given integer kk, we say that ww is of period kk if there exists a word uu of length kk such that w=u|w|kw=u^{\frac{|w|}{k}}.

Here we recall a basic lemma concerning the repetitions:

Lemma 2 (Fine and Wilf [3])

Let ww be a word having kk and ll for periods. If |w|k+lgcd(k,l)|w|\geq k+l-\gcd(k,l) then gcd(k,l)\gcd(k,l) is also a period of ww.

3 Number of right-special factors

Let ww be a finite word. In this section, we consider the word ww^{*} obtained by concatenating a special letter * at the end of ww, with the condition that Alph(w)*\not\in\hbox{\rm Alph}(w).

For any uFac(w)u\in\hbox{\rm Fac}(w), let us define mw(u)=max{i|uiFac(w),i+}m_{w}(u)=\max\left\{i|u^{i}\in\hbox{\rm Fac}(w),i\in\mathbb{Q}^{+}\right\}, and similarly, let us define mw(u)=max{i|uiFac(w),i+}m_{w^{*}}(u)=\max\left\{i|u^{i}\in\hbox{\rm Fac}(w^{*}),i\in\mathbb{Q}^{+}\right\}. There are the following facts between mw(u)m_{w}(u) and mw(u)m_{w^{*}}(u):

1) For any factor uFac(w)u\in\hbox{\rm Fac}(w), mw(u)=mw(u)1m_{w}(u)=m_{w^{*}}(u)\geq 1;

2) If uFac(w)u\in\hbox{\rm Fac}(w^{*}) but uFac(w)u\not\in\hbox{\rm Fac}(w), mw(u)=1m_{w^{*}}(u)=1.

For any factor uFac(w)u\in\hbox{\rm Fac}(w), let us define m(u)=mw(u)=mw(u)m(u)=m_{w}(u)=m_{w^{*}}(u). For any integer ii satisfying 1im(u)1\leq i\leq m(u), let us define u(i)u(i) to be the shortest suffix of um(u)u^{m(u)} containing uiu^{i} as a prefix.

Example 3

Let us consider the following word

w=abababacababa.w=abababacababa.

For u=abu=ab,m(u)=72m(u)=\frac{7}{2}, thus u(1),u(2),u(3)u(1),u(2),u(3) are all well-defined, we have u(1)=abau(1)=aba, u(2)=ababau(2)=ababa and u(3)=abababau(3)=abababa. For v=ababv=abab, we then have m(v)=74m(v)=\frac{7}{4}, thus only v(1)v(1) is well-defined, we have v(1)=ababav(1)=ababa.

Lemma 4

Let ww be a finite word and let uFac(w)u\in\hbox{\rm Fac}(w), if m(u)2m(u)\geq 2, then, for any integer ii satisfying 1im(u)11\leq i\leq m(u)-1, the factor u(i)u(i) is a right-special factor in ww^{*}.

Proof

Let w=w1w2w|w|w=w_{1}w_{2}...w_{|w|} and let w=w1w2w|w|+1w^{*}=w^{*}_{1}w^{*}_{2}...w^{*}_{|w|+1}, with wi=wiw_{i}=w^{*}_{i} for all ii satisfying 1i|w|1\leq i\leq|w| and w|w|+1=w_{|w|+1}=*. As um(u)Fac(w)u^{m(u)}\in\hbox{\rm Fac}(w), there exists an integer kk satisfying 1k|w|1\leq k\leq|w| such that um(u)=wkwk+1wk+m(u)|u|1u^{m(u)}=w_{k}w_{k+1}...w_{k+m(u)|u|-1}. For any integer ii satisfying 1im(u)11\leq i\leq m(u)-1, u(i)u(i) occurs at least twice in um(u)u^{m(u)} as respectively a prefix and a suffix of um(u)u^{m(u)}. If we suppose that |u(i)|=l|u(i)|=l, then

u(i)=wkwk+1wk+l1=wk+m(u)|u|l1wk+m(u)|u|lwk+m(u)|u|1.u(i)=w_{k}w_{k+1}...w_{k+l-1}=w_{k+m(u)|u|-l-1}w_{k+m(u)|u|-l}...w_{k+m(u)|u|-1}.

Now let us prove that u(i)u(i) is a right-special factor of ww^{*}. As um(u)Fac(w)u^{m(u)}\in\hbox{\rm Fac}(w), wkwk+1wk+m(u)|u|1wk+m(u)|u|w^{*}_{k}w^{*}_{k+1}...w^{*}_{k+m(u)|u|-1}w^{*}_{k+m(u)|u|} is well-defined and it is a factor of ww^{*}. We claim that wk+lwk+m(u)|u|w^{*}_{k+l}\neq w^{*}_{k+m(u)|u|}. In fact, if k+m(u)|u|=|w|+1k+m(u)|u|=|w|+1, then wk+m(u)|u|=wk+lw^{*}_{k+m(u)|u|}=*\neq w^{*}_{k+l}; if k+m(u)|u|<|w|+1k+m(u)|u|<|w|+1,

wkwk+1wk+m(u)|u|1wk+m(u)|u|=wkwk+1wk+m(u)|u|1wk+m(u)|u|Fac(w),w^{*}_{k}w^{*}_{k+1}...w^{*}_{k+m(u)|u|-1}w^{*}_{k+m(u)|u|}=w_{k}w_{k+1}...w_{k+m(u)|u|-1}w_{k+m(u)|u|}\in\hbox{\rm Fac}(w),

in this case if wk+l1=wk+m(u)|u|w_{k+l-1}=w_{k+m(u)|u|}, then wkwk+1wk+m(u)|u|1wk+m(u)|u|=um(u)+1|u|w_{k}w_{k+1}...w_{k+m(u)|u|-1}w_{k+m(u)|u|}=u^{m(u)+\frac{1}{|u|}}, this contradicts the maximality of m(u)m(u). As a consequence, wk+l1wk+m(u)|u|w^{*}_{k+l-1}\neq w^{*}_{k+m(u)|u|} in both cases, thus

wkwk+1wk+lwk+m(u)|u|l1wk+m(u)|u|lwk+m(u)|u|.w^{*}_{k}w^{*}_{k+1}...w^{*}_{k+l}\neq w^{*}_{k+m(u)|u|-l-1}w^{*}_{k+m(u)|u|-l}...w^{*}_{k+m(u)|u|}.

Hence, u(i)u(i) is a right-special factor in ww^{*}.

Example 5

Let us consider the word given in the previous example:

w=abababacababa.w=abababacababa.

ww^{*} is to be

w=abababacababa.w^{*}=abababacababa*.

For u=abu=ab,m(u)=72m(u)=\frac{7}{2}, thus u(1),u(2)u(1),u(2) are right-special factors in ww^{*}. In fact, (ab)iab(ab)^{i}ab, (ab)iac(ab)^{i}ac and (ab)ia(ab)^{i}a* are all factors of ww^{*} for i=1,2i=1,2. For v=ababv=abab, m(v)=74m(v)=\frac{7}{4}. v(1)v(1) is also a right-special factor of ww^{*} because v(1)=ababa=u(2)v(1)=ababa=u(2). However, it is not counted in the lemma, because m(v)1<1m(v)-1<1.

Lemma 6

For any couple of different primitive factors (u,v)Fac(w)2(u,v)\in\hbox{\rm Fac}(w)^{2} satisfying m(u),m(v)3m(u),m(v)\geq 3, we have

{u(i)|2im(u)1}{v(i)|2im(v)1}=\left\{u(i)|2\leq i\leq m(u)-1\right\}\cap\left\{v(i)|2\leq i\leq m(v)-1\right\}=\emptyset

.

Proof

If there exist two different primitive factors u,vu,v and two integers i,ji,j such that u(i)=v(j)u(i)=v(j) then uiu=vjvu^{i}u^{\prime}=v^{j}v^{\prime} with u,vu^{\prime},v^{\prime} respectively a prefix of uu and vv. From the hypothesis that i2,j2i\geq 2,j\geq 2 and Lemma 2, there exists a factor pp such that u,vu,v are both a power of pp, this fact contradicts the primitivities of u,vu,v.

Corollary 7

Let ww be a finite word and let M(w)M(w^{*}) denote the number of right-special factors in ww^{*}, then we have

uPrim(w)(m(u)2)M(w).\sum_{u\in\hbox{\rm Prim}(w)}(m(u)-2)\leq M(w^{*}).
Proof

Let us consider the set of factors of ww:

s={u(i)|uPrim(w),2im(u)1,u(i)Fac(w)}.s=\left\{u(i)|u\in\hbox{\rm Prim}(w),2\leq i\leq m(u)-1,u(i)\in\hbox{\rm Fac}(w)\right\}.

From Lemma 4, the elements in ss are all right-special factors in ww^{*} and from Lemma 6, the cardinality of ss is exactly uPrim(w)(m(u)2)\sum_{u\in\hbox{\rm Prim}(w)}(m(u)-2).

4 Proof of the main theorem

Proof ( of Theorem 1)

Let kk be an integer larger than 22, we have

Nk(w)=uPrim(w)m(u)k,N_{k}(w)=\sum_{u\in\hbox{\rm Prim}(w)}\lfloor\frac{m(u)}{k}\rfloor,

where x\lfloor x\rfloor represents the largest integer smaller than or equal to xx. On the other hand, for any primitive uu satisfying m(u)km(u)\geq k, we have m(u)km(u)2k2\frac{m(u)}{k}\leq\frac{m(u)-2}{k-2}. Hence,

Nk(w)\displaystyle N_{k}(w) =uPrim(w)m(u)k\displaystyle=\sum_{u\in\hbox{\rm Prim}(w)}\lfloor\frac{m(u)}{k}\rfloor
=uPrim(w)m(u)km(u)k\displaystyle=\sum_{\begin{subarray}{c}u\in\hbox{\rm Prim}(w)\\ m(u)\geq k\end{subarray}}\lfloor\frac{m(u)}{k}\rfloor
uPrim(w)m(u)km(u)2k2\displaystyle\leq\sum_{\begin{subarray}{c}u\in\hbox{\rm Prim}(w)\\ m(u)\geq k\end{subarray}}\frac{m(u)-2}{k-2}
uPrim(w)m(u)2k2\displaystyle\leq\sum_{u\in\hbox{\rm Prim}(w)}\frac{m(u)-2}{k-2}
M(w)k2,\displaystyle\leq\frac{M(w^{*})}{k-2},

where M(w)M(w^{*}) denote the number of right-special factors in ww^{*}.

Now let us give an upper bound of M(w)M(w^{*}). First, we claim that if uFac(w)u\in\hbox{\rm Fac}(w^{*}) is a right-special factor, then uFac(w)u\in\hbox{\rm Fac}(w). Otherwise, uu is a suffix of ww^{*} and it does not have any right-extensions. Now, for any right-special factor of ww^{*} of length ii, it has at least two right-extensions, and the suffix of ww^{*} of length ii is not right-special. Thus, if we let s(w)(i)s(w^{*})(i) denote the number of right-special factors of ww^{*} of length ii, we then have s(w)(i)Cw(i+1)Cw(i)+1s(w^{*})(i)\leq C_{w^{*}}(i+1)-C_{w^{*}}(i)+1, where Cw(i)C_{w^{*}}(i) is the number of distinct factors of length ii in ww^{*} defined in the section Preliminaries. Hence,

M(w)\displaystyle M(w^{*}) =i=1|w|s(w)(i)\displaystyle=\sum_{i=1}^{|w^{*}|}s(w^{*})(i)
i=1|w|Cw(i+1)Cw(i)+1\displaystyle\leq\sum_{i=1}^{|w^{*}|}C_{w^{*}}(i+1)-C_{w^{*}}(i)+1
|w|+1+Cw(|w|+1)Cw(1)\displaystyle\leq|w|+1+C_{w^{*}}(|w^{*}|+1)-C_{w^{*}}(1)
|w||Alph(w)|.\displaystyle\leq|w|-|\hbox{\rm Alph}(w)|.

Thus,

Nk(w)|w||Alph(w)|k2.N_{k}(w)\leq\frac{|w|-|\hbox{\rm Alph}(w)|}{k-2}.

5 Concluding Remarks

The result obtained in the main theorem is not sharp. Let us consider the word w=aaaw=aaa, it is easy to check that N3(w)=1N_{3}(w)=1 while the bound given in Theorem 1 is 22. The author believes that the problem is from the way we count the number of right-special factors. From Lemme 4 we prove that for a given primitive uu, all u(i)u(i) are right-special if 1im(u)11\leq i\leq m(u)-1. However, in Lemma 6 we count just the u(i)u(i) for 2im(u)12\leq i\leq m(u)-1. In fact, we can only prove that the words of the form u(i)u(i) are pairwisely different when i2i\geq 2. Meanwhile, we can have two different primitives u,vu,v such that u(1)=v(i)u(1)=v(i) for some positive integer ii. Thus, further work to to be done to investigate that

uPrim(w)(m(u)2)M(w).\sum_{u\in\hbox{\rm Prim}(w)}(m(u)-2)\leq M(w^{*}).

If the previous inequality holds, we may expect to prove

Nk(w)|w||Alph(w)|k1.N_{k}(w)\leq\frac{|w|-|\hbox{\rm Alph}(w)|}{k-1}.

References

  • [1] Crochemore, M., Rytter, W.: Squares, cubes, and time-space efficient string searching. Algorithmica 13, 405–425 (1995)
  • [2] Deza, A., Franek, F., Thierry, A.: How many double squares can a string contain? Discrete Applied Mathematics 180, 52–69 (2015)
  • [3] Fine, N.J., Wilf, H.S.: Uniqueness theorems for periodic functions. Proceedings of the American Mathematical Society 16(1), 109–114 (1965)
  • [4] Fraenkel, A.S., Simpson, J.: How many squares can a string contain? J. Comb. Theory, Ser. A 82(1), 112–120 (1998)
  • [5] Ilie, L.: A note on the number of distinct squares in a word. In: Brlek, S., Reutenauer, C. (eds.) Proc. Words2005, 5-th International Conference on Words. vol. 36, pp. 289–294. Publications du LaCIM, Montreal, Canada (13–17 Sep 2005)
  • [6] Lam, N.H.: On the number of squares in a string. AdvOL-Report 2 (2013)
  • [7] Thierry, A.: A proof that a word of length n has less than 1.5n distinct squares (2020)