Laboratoire de Combinatoire et d’Informatique Mathématique,
Université du Québec à Montréal,
CP 8888 Succ. Centre-ville, Montréal (QC) Canada H3C 3P8
email: [email protected]

On the number of $k$-powers in a finite word

Shuo LI

Abstract

This note is an attempt to attack a conjecture of Fraenkel and Simpson, stated in 1998, concerning the number of distinct squares in a finite word. By counting the number of (right-)special factors, we give an upper bound on the number of $k$-powers in a finite word for any integer $k\geq 3$. By a $k$-power, we mean a word of the form $\underbrace{uu...u}_{k\;\text{times}}$.

1 Introduction and notation

Given a finite word, the problem of counting its distinct squares was introduced by Fraenkel and Simpson. In [4] they conjectured that the number of distinct squares in a finite word $w$ is bounded by its length $|w|$, and they proved that this number is bounded by $2|w|$. Ilie [5] then strengthened this bound to $2|w|-\Theta(\log|w|)$; Lam [6] improved it to $\frac{95}{48}|w|$; Deza, Franek and Thierry [2] achieved a bound of $\frac{11}{6}|w|$; and Thierry [7] refined it to $\frac{3}{2}|w|$. A basic fact about the square-counting problem is that no more than two squares can have their last occurrence starting at the same position; this fact is proved in [4] using the three squares lemma of Crochemore and Rytter [1]. Subsequent improvements of the bound on the number of distinct squares rely on counting the positions at which two different squares have their last occurrence. In this article, instead of following the same approach, we propose to count special factors. The idea comes from the fact that (almost) every factor occurring in a word can be associated with a (right-)special factor. Even though we cannot obtain an injection from the set of squares to the set of special factors of a finite word, this correspondence leads to an upper bound on the number of $k$-powers in the given word. The main result of this note is the following:

Theorem 1

Let $k$ be an integer larger than 2. For any finite word $w$ , let $N_{k}(w)$ denote the number of its distinct non-empty factors of the form $\underbrace{uu...u}_{k\;\text{times}}$ , let $|w|$ denote the length of $w$ and let $|\hbox{\rm Alph}(w)|$ denote the number of distinct letters in $w$ . We then have

N_{k}(w)\leq\frac{|w|-|\hbox{\rm Alph}(w)|}{k-2}.
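To make the two sides of this inequality concrete, here is a minimal brute-force sketch in Python. The function names `count_k_powers` and `theorem_1_bound` are illustrative choices and do not come from the paper; the enumeration is quadratic and only meant for short words.

```python
def count_k_powers(w: str, k: int) -> int:
    """Return N_k(w): the number of distinct non-empty factors of w of the form u^k."""
    n = len(w)
    found = set()
    for period in range(1, n // k + 1):              # candidate length |u|
        for start in range(n - k * period + 1):      # candidate starting position
            u = w[start:start + period]
            if w[start:start + k * period] == u * k:
                found.add(u * k)
    return len(found)


def theorem_1_bound(w: str, k: int) -> float:
    """Right-hand side of Theorem 1: (|w| - |Alph(w)|) / (k - 2), for k >= 3."""
    if k < 3:
        raise ValueError("Theorem 1 is stated for k >= 3")
    return (len(w) - len(set(w))) / (k - 2)


w = "abababacababa"
for k in (3, 4):
    print(k, count_k_powers(w, k), theorem_1_bound(w, k))
```

For $w=abababacababa$ and $k=3$, the only cubes are $ababab$ and $bababa$, so $N_{3}(w)=2$, while the bound evaluates to $10$.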

2 Preliminaries

Let us recall the basic terminology about words. By word we mean a finite concatenation of symbols $w=w_{1}w_{2}\cdots w_{n}$, with $n\in\mathbb{N}$. The length of $w$, denoted $|w|$, is $n$ and we say that the symbol $w_{i}$ is at position $i$. The set $\hbox{\rm Alph}(w)=\left\{w_{i}\mid 1\leq i\leq n\right\}$ is called the alphabet of $w$ and its elements are called letters. Let $|\hbox{\rm Alph}(w)|$ denote the cardinality of $\hbox{\rm Alph}(w)$. A word of length $0$ is called the empty word and it is denoted by $\varepsilon$. For any word $u$, we have $u=\varepsilon u=u\varepsilon$. Let $u,v$ be two different words; we say that $u$ is shorter (resp. longer) than $v$ if $|u|<|v|$ (resp. $|u|>|v|$).

A word $u$ is called a factor of $w$ if $w=pus$ for some words $p,s$ . When $p=\varepsilon$ (resp. $s=\varepsilon$ ) $u$ is called a prefix (resp. suffix) of $w$ . The set of all factors (resp. prefixes, resp. suffixes) of $w$ is denoted by $\hbox{\rm Fac}(w)$ (resp. $\hbox{\rm Pref}(w)$ , resp. $\hbox{\rm Suff}(w)$ ). For any integer $i$ satisfying $1\leq i\leq|w|$ , let $C_{w}(i)$ denote the number of distinct factors of length $i$ in $w$ .

A factor $u$ of $w$ is called right-special if there exist two different letters $a,b\in\hbox{\rm Alph}(w)$ such that $ua$ and $ub$ are both factors of $w$. The words $ua$ and $ub$ are then called right-extensions of $u$.
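The definition of a right-special factor can be sketched in Python as follows. This is a naive enumeration under the assumption that words are short Python strings; the name `right_special_factors` is hypothetical, and only non-empty factors are considered.

```python
from collections import defaultdict


def right_special_factors(w: str) -> set:
    """Non-empty factors u of w such that ua and ub are factors of w for two letters a != b."""
    following = defaultdict(set)      # factor u -> letters that follow some occurrence of u
    n = len(w)
    for i in range(n):
        for j in range(i + 1, n):     # u = w[i:j] is followed by the letter w[j]
            following[w[i:j]].add(w[j])
    return {u for u, letters in following.items() if len(letters) >= 2}


print(sorted(right_special_factors("abac")))
```

In $abac$ the only right-special factor is $a$, since $ab$ and $ac$ are both factors while every other factor has at most one right-extension.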

For any natural number $k$, the $k$-th power of a finite word $u$ is denoted by $u^{k}=uu\cdots u$ and consists of the concatenation of $k$ copies of $u$. A finite word $w$ is said to be primitive if it is not a power of another word, that is, if $w=u^{k}$ implies $k=1$. Let $\hbox{\rm Prim}(w)$ denote the set of all primitive factors of $w$. A $k$-power is a word $w$ satisfying $w=\underbrace{uu...u}_{k\;\text{times}}$ for a certain $u\in\hbox{\rm Fac}(w)$ and for a $k\geq 2$. For a word $w$, let $N_{k}(w)$ denote the number of distinct non-empty $k$-powers that are factors of $w$. For a given word $w$ and two integers $a,b$ satisfying $0\leq a\leq b=|w|$, let us define $w^{\frac{a}{b}}$ to be the prefix of $w$ of length $a$. Now we can define the rational power of a word: let $w$ be a finite word and let $\frac{a}{b}\in\mathbb{Q}^{+}$ be a positive rational number; $w^{\frac{a}{b}}$ is well defined only if $b=|w|$, and in this case there is a unique couple of non-negative integers $(c,d)$ satisfying $a=c|w|+d$ and $0\leq d<|w|$, and we define $w^{\frac{a}{b}}$ to be $w^{c}w^{\frac{d}{b}}$. For a given word $w$ and a given integer $k$, we say that $w$ is of period $k$ if there exists a word $u$ of length $k$ such that $w=u^{\frac{|w|}{k}}$.
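The rational power and the notion of period can be illustrated by a short Python sketch. It assumes the exponent is given as a `fractions.Fraction` whose product with $|w|$ is an integer, which is slightly more permissive than the convention $b=|w|$ used in the text; the names `rational_power` and `has_period` are illustrative.

```python
from fractions import Fraction


def rational_power(w: str, exponent: Fraction) -> str:
    """Return w^exponent = w^c followed by the prefix of w of length d,
    where exponent * |w| = c * |w| + d and 0 <= d < |w|."""
    length = exponent * len(w)
    if exponent <= 0 or length.denominator != 1:
        raise ValueError("exponent * |w| must be a positive integer")
    c, d = divmod(length.numerator, len(w))
    return w * c + w[:d]


def has_period(w: str, k: int) -> bool:
    """w is of period k if w = u^(|w|/k) for the word u = prefix of w of length k."""
    return 1 <= k <= len(w) and w == rational_power(w[:k], Fraction(len(w), k))


print(rational_power("ab", Fraction(7, 2)))
print(has_period("abababa", 2), has_period("abababa", 3))
```

With these definitions, $(ab)^{\frac{7}{2}}=abababa$, and $abababa$ is of period $2$ but not of period $3$.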

Here we recall a basic lemma concerning the repetitions:

Lemma 2 (Fine and Wilf [3])

Let $w$ be a word having $k$ and $l$ for periods. If $|w|\geq k+l-\gcd(k,l)$ then $\gcd(k,l)$ is also a period of $w$ .
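As a sanity check of Lemma 2, the following Python sketch verifies the statement exhaustively on short binary words. It is only an experiment on small cases, not a proof, and the helper names are hypothetical.

```python
from itertools import product
from math import gcd


def has_period(w: str, p: int) -> bool:
    """w has period p (1 <= p <= |w|) if w[i] == w[i + p] wherever both positions exist."""
    return all(w[i] == w[i + p] for i in range(len(w) - p))


def satisfies_fine_wilf(w: str) -> bool:
    """Check Lemma 2 for every pair of periods (k, l) of w."""
    n = len(w)
    periods = [p for p in range(1, n + 1) if has_period(w, p)]
    return all(
        has_period(w, gcd(k, l))
        for k in periods
        for l in periods
        if n >= k + l - gcd(k, l)
    )


# Exhaustive check on all binary words of length 1 to 8.
print(all(satisfies_fine_wilf("".join(t))
          for n in range(1, 9)
          for t in product("ab", repeat=n)))

# The length condition matters: aabaa has periods 3 and 4 but not period 1.
print(has_period("aabaa", 3), has_period("aabaa", 4), has_period("aabaa", 1))
```

The word $aabaa$ has length $5<3+4-\gcd(3,4)$, so the lemma does not apply to it, which is consistent with $1$ not being one of its periods.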

3 Number of right-special factors

Let $w$ be a finite word. In this section, we consider the word $w^{*}$ obtained by concatenating a special letter $*$ at the end of $w$ , with the condition that $*\not\in\hbox{\rm Alph}(w)$ .

For any $u\in\hbox{\rm Fac}(w)$, let us define $m_{w}(u)=\max\left\{i\mid u^{i}\in\hbox{\rm Fac}(w),i\in\mathbb{Q}^{+}\right\}$, and similarly, let us define $m_{w^{*}}(u)=\max\left\{i\mid u^{i}\in\hbox{\rm Fac}(w^{*}),i\in\mathbb{Q}^{+}\right\}$. The quantities $m_{w}(u)$ and $m_{w^{*}}(u)$ are related as follows:

1) For any factor $u\in\hbox{\rm Fac}(w)$ , $m_{w}(u)=m_{w^{*}}(u)\geq 1$ ;

2) If $u\in\hbox{\rm Fac}(w^{*})$ but $u\not\in\hbox{\rm Fac}(w)$ , $m_{w^{*}}(u)=1$ .

For any factor $u\in\hbox{\rm Fac}(w)$ , let us define $m(u)=m_{w}(u)=m_{w^{*}}(u)$ . For any integer $i$ satisfying $1\leq i\leq m(u)$ , let us define $u(i)$ to be the shortest suffix of $u^{m(u)}$ containing $u^{i}$ as a prefix.

Example 3

Let us consider the following word

w=abababacababa.

For $u=ab$, we have $m(u)=\frac{7}{2}$, thus $u(1),u(2),u(3)$ are all well-defined: $u(1)=aba$, $u(2)=ababa$ and $u(3)=abababa$. For $v=abab$, we have $m(v)=\frac{7}{4}$, thus only $v(1)$ is well-defined, and $v(1)=ababa$.
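The values in Example 3 can be reproduced with the following Python sketch. The names `max_exponent` and `u_of` are illustrative stand-ins for $m(u)$ and $u(i)$; the computation scans every occurrence of $u$ in $w$ and extends it as long as the period $|u|$ is preserved.

```python
from fractions import Fraction


def max_exponent(w: str, u: str) -> Fraction:
    """m(u): the largest rational e such that u^e is a factor of w (u must be a factor of w)."""
    if not u or u not in w:
        raise ValueError("u must be a non-empty factor of w")
    p, best = len(u), 0
    start = w.find(u)
    while start != -1:
        end = start + p
        while end < len(w) and w[end] == w[end - p]:
            end += 1                              # extend the run of period |u|
        best = max(best, end - start)
        start = w.find(u, start + 1)
    return Fraction(best, p)


def u_of(w: str, u: str, i: int) -> str:
    """u(i): the shortest suffix of u^{m(u)} that has u^i as a prefix, for 1 <= i <= m(u)."""
    m = max_exponent(w, u)
    if not 1 <= i <= m:
        raise ValueError("i must satisfy 1 <= i <= m(u)")
    total = (m * len(u)).numerator                # |u^{m(u)}|, always an integer
    full = (u * (total // len(u) + 1))[:total]    # the word u^{m(u)}
    return full[full.rfind(u * i):]               # last occurrence of u^i gives the shortest suffix


w = "abababacababa"
print(max_exponent(w, "ab"), [u_of(w, "ab", i) for i in (1, 2, 3)])
print(max_exponent(w, "abab"), u_of(w, "abab", 1))
```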

Lemma 4

Let $w$ be a finite word and let $u\in\hbox{\rm Fac}(w)$ , if $m(u)\geq 2$ , then, for any integer $i$ satisfying $1\leq i\leq m(u)-1$ , the factor $u(i)$ is a right-special factor in $w^{*}$ .

Proof

Let $w=w_{1}w_{2}...w_{|w|}$ and let $w^{*}=w^{*}_{1}w^{*}_{2}...w^{*}_{|w|+1}$, with $w_{i}=w^{*}_{i}$ for all $i$ satisfying $1\leq i\leq|w|$ and $w^{*}_{|w|+1}=*$. As $u^{m(u)}\in\hbox{\rm Fac}(w)$, there exists an integer $k$ satisfying $1\leq k\leq|w|$ such that $u^{m(u)}=w_{k}w_{k+1}...w_{k+m(u)|u|-1}$. For any integer $i$ satisfying $1\leq i\leq m(u)-1$, $u(i)$ occurs at least twice in $u^{m(u)}$, as a prefix and as a suffix of $u^{m(u)}$ respectively. If we suppose that $|u(i)|=l$, then

u(i)=w_{k}w_{k+1}...w_{k+l-1}=w_{k+m(u)|u|-l}w_{k+m(u)|u|-l+1}...w_{k+m(u)|u|-1}.

Now let us prove that $u(i)$ is a right-special factor of $w^{*}$ . As $u^{m(u)}\in\hbox{\rm Fac}(w)$ , $w^{*}_{k}w^{*}_{k+1}...w^{*}_{k+m(u)|u|-1}w^{*}_{k+m(u)|u|}$ is well-defined and it is a factor of $w^{*}$ . We claim that $w^{*}_{k+l}\neq w^{*}_{k+m(u)|u|}$ . In fact, if $k+m(u)|u|=|w|+1$ , then $w^{*}_{k+m(u)|u|}=*\neq w^{*}_{k+l}$ ; if $k+m(u)|u|<|w|+1$ ,

w^{*}_{k}w^{*}_{k+1}...w^{*}_{k+m(u)|u|-1}w^{*}_{k+m(u)|u|}=w_{k}w_{k+1}...w_{k+m(u)|u|-1}w_{k+m(u)|u|}\in\hbox{\rm Fac}(w),

in this case, if $w_{k+l}=w_{k+m(u)|u|}$, then $w_{k}w_{k+1}...w_{k+m(u)|u|-1}w_{k+m(u)|u|}=u^{m(u)+\frac{1}{|u|}}$, which contradicts the maximality of $m(u)$. As a consequence, $w^{*}_{k+l}\neq w^{*}_{k+m(u)|u|}$ in both cases, thus

w^{*}_{k}w^{*}_{k+1}...w^{*}_{k+l}\neq w^{*}_{k+m(u)|u|-l}w^{*}_{k+m(u)|u|-l+1}...w^{*}_{k+m(u)|u|}.

Hence, $u(i)$ is a right-special factor in $w^{*}$ .

Example 5

Let us consider the word given in the previous example:

w=abababacababa.

Then $w^{*}$ is

w^{*}=abababacababa*.

For $u=ab$ , $m(u)=\frac{7}{2}$ , thus $u(1),u(2)$ are right-special factors in $w^{*}$ . In fact, $(ab)^{i}ab$ , $(ab)^{i}ac$ and $(ab)^{i}a*$ are all factors of $w^{*}$ for $i=1,2$ . For $v=abab$ , $m(v)=\frac{7}{4}$ . $v(1)$ is also a right-special factor of $w^{*}$ because $v(1)=ababa=u(2)$ . However, it is not counted in the lemma, because $m(v)-1<1$ .
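The right-extensions claimed in Example 5 can be listed directly. The following Python sketch uses a hypothetical helper `right_extensions` that collects the letters following each occurrence of a factor in $w^{*}$.

```python
def right_extensions(text: str, u: str) -> set:
    """Letters x such that ux is a factor of text."""
    return {
        text[i + len(u)]
        for i in range(len(text) - len(u))    # occurrences of u that are followed by a letter
        if text.startswith(u, i)
    }


w_star = "abababacababa" + "*"
for factor in ("aba", "ababa", "ab"):
    print(factor, sorted(right_extensions(w_star, factor)))
```

Both $u(1)=aba$ and $u(2)=ababa$ are followed by $b$, $c$ and $*$ in $w^{*}$, whereas $ab$ itself is always followed by $a$ and is therefore not right-special.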

Lemma 6

For any couple of different primitive factors $(u,v)\in\hbox{\rm Fac}(w)^{2}$ satisfying $m(u),m(v)\geq 3$ , we have

\left\{u(i)|2\leq i\leq m(u)-1\right\}\cap\left\{v(i)|2\leq i\leq m(v)-1\right\}=\emptyset

Proof

If there exist two different primitive factors $u,v$ and two integers $i,j$ such that $u(i)=v(j)$, then $u^{i}u^{\prime}=v^{j}v^{\prime}$ with $u^{\prime},v^{\prime}$ respectively a prefix of $u$ and of $v$. This word has $|u|$ and $|v|$ as periods, and since $i\geq 2$ and $j\geq 2$ its length is at least $|u|+|v|$. By Lemma 2, there exists a factor $p$ such that $u,v$ are both powers of $p$, which contradicts the primitivity of $u$ and $v$.

Corollary 7

Let $w$ be a finite word and let $M(w^{*})$ denote the number of right-special factors in $w^{*}$ , then we have

\sum_{u\in\hbox{\rm Prim}(w)}(m(u)-2)\leq M(w^{*}).

Proof

Let us consider the set of factors of $w$ :

s=\left\{u(i)|u\in\hbox{\rm Prim}(w),2\leq i\leq m(u)-1,u(i)\in\hbox{\rm Fac}(w)\right\}.

From Lemma 4, the elements in $s$ are all right-special factors in $w^{*}$ and from Lemma 6, the cardinality of $s$ is exactly $\sum_{u\in\hbox{\rm Prim}(w)}(m(u)-2)$ .
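The set $s$ and the quantity $M(w^{*})$ can be computed by brute force on small examples. The Python sketch below assumes that the letter `*` does not occur in $w$; all function names are illustrative, and the check covers single words only.

```python
from fractions import Fraction


def max_exponent(w: str, u: str) -> Fraction:
    """m(u): the largest rational e such that u^e is a factor of w (u must occur in w)."""
    p, best = len(u), 0
    start = w.find(u)
    while start != -1:
        end = start + p
        while end < len(w) and w[end] == w[end - p]:
            end += 1                              # extend the run of period |u|
        best = max(best, end - start)
        start = w.find(u, start + 1)
    return Fraction(best, p)


def is_primitive(u: str) -> bool:
    """u is primitive iff it occurs in uu only as a prefix and as a suffix."""
    return (u + u).find(u, 1) == len(u)


def u_of(w: str, u: str, i: int) -> str:
    """u(i): the shortest suffix of u^{m(u)} that has u^i as a prefix (1 <= i <= m(u))."""
    total = (max_exponent(w, u) * len(u)).numerator
    full = (u * (total // len(u) + 1))[:total]
    return full[full.rfind(u * i):]


def count_right_special(text: str) -> int:
    """Number of non-empty right-special factors of text."""
    following = {}
    for i in range(len(text)):
        for j in range(i + 1, len(text)):
            following.setdefault(text[i:j], set()).add(text[j])
    return sum(1 for letters in following.values() if len(letters) >= 2)


def corollary_7_data(w: str):
    """Return the list of all u(i), u primitive, 2 <= i <= m(u) - 1, together with M(w*)."""
    factors = {w[i:j] for i in range(len(w)) for j in range(i + 1, len(w) + 1)}
    s = []
    for u in sorted(factors):
        if is_primitive(u):
            m = max_exponent(w, u)
            floor_m = m.numerator // m.denominator
            s.extend(u_of(w, u, i) for i in range(2, floor_m))   # i = 2, ..., floor(m(u)) - 1
    return s, count_right_special(w + "*")


s, M = corollary_7_data("abababacababa")
print(s, len(s) == len(set(s)), M)
```

For $w=abababacababa$ the list contains $u(2)=ababa$ for $u=ab$ and $v(2)=baba$ for $v=ba$, which are distinct as Lemma 6 predicts, and $w^{*}$ has $5$ right-special factors.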

4 Proof of the main theorem

Proof (of Theorem 1)

Let $k$ be an integer larger than $2$ , we have

N_{k}(w)=\sum_{u\in\hbox{\rm Prim}(w)}\lfloor\frac{m(u)}{k}\rfloor,

where $\lfloor x\rfloor$ represents the largest integer smaller than or equal to $x$ . On the other hand, for any primitive $u$ satisfying $m(u)\geq k$ , we have $\frac{m(u)}{k}\leq\frac{m(u)-2}{k-2}$ . Hence,

\begin{aligned}
N_{k}(w) &= \sum_{u\in\hbox{\rm Prim}(w)}\left\lfloor\frac{m(u)}{k}\right\rfloor \\
&= \sum_{\begin{subarray}{c}u\in\hbox{\rm Prim}(w)\\ m(u)\geq k\end{subarray}}\left\lfloor\frac{m(u)}{k}\right\rfloor \\
&\leq \sum_{\begin{subarray}{c}u\in\hbox{\rm Prim}(w)\\ m(u)\geq k\end{subarray}}\frac{m(u)-2}{k-2} \\
&\leq \sum_{u\in\hbox{\rm Prim}(w)}\frac{m(u)-2}{k-2} \\
&\leq \frac{M(w^{*})}{k-2},
\end{aligned}

where $M(w^{*})$ denotes the number of right-special factors in $w^{*}$.
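The elementary inequality $\frac{m(u)}{k}\leq\frac{m(u)-2}{k-2}$ used in this chain can be checked by a one-line computation, written here with $m$ standing for $m(u)$:

```latex
\frac{m-2}{k-2}-\frac{m}{k}
  =\frac{k(m-2)-m(k-2)}{k(k-2)}
  =\frac{2(m-k)}{k(k-2)}\geq 0
  \qquad\text{whenever } k\geq 3 \text{ and } m\geq k.
```

For instance, with $k=3$ and $m=\frac{7}{2}$ one gets $\frac{m}{k}=\frac{7}{6}$ and $\frac{m-2}{k-2}=\frac{3}{2}$.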

Now let us give an upper bound on $M(w^{*})$. First, we claim that if $u\in\hbox{\rm Fac}(w^{*})$ is a right-special factor, then $u\in\hbox{\rm Fac}(w)$. Otherwise, $u$ contains the letter $*$, so it is a suffix of $w^{*}$ and it does not have any right-extension. Now, every right-special factor of $w^{*}$ of length $i$ has at least two right-extensions, and every other factor of length $i$ has at least one, except the suffix of $w^{*}$ of length $i$, which has none and is not right-special. Thus, if we let $s(w^{*})(i)$ denote the number of right-special factors of $w^{*}$ of length $i$, we have $s(w^{*})(i)\leq C_{w^{*}}(i+1)-C_{w^{*}}(i)+1$, where $C_{w^{*}}(i)$ is the number of distinct factors of length $i$ in $w^{*}$ defined in Section 2, with the convention $C_{w^{*}}(|w^{*}|+1)=0$. Hence,

\begin{aligned}
M(w^{*}) &= \sum_{i=1}^{|w^{*}|}s(w^{*})(i) \\
&\leq \sum_{i=1}^{|w^{*}|}\left(C_{w^{*}}(i+1)-C_{w^{*}}(i)+1\right) \\
&\leq |w|+1+C_{w^{*}}(|w^{*}|+1)-C_{w^{*}}(1) \\
&\leq |w|-|\hbox{\rm Alph}(w)|.
\end{aligned}
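The telescoping argument can be followed numerically with the Python sketch below. It tabulates, for each length $i$, the number of right-special factors of $w^{*}$ together with $C_{w^{*}}(i)$ and $C_{w^{*}}(i+1)$. The function name `special_factor_profile` and the choice of `*` as end marker are assumptions made for the illustration.

```python
def special_factor_profile(w: str, marker: str = "*"):
    """For i = 1..|w*|, return tuples (i, s(i), C(i), C(i+1)) computed on w* = w + marker."""
    if marker in w:
        raise ValueError("the marker must not occur in w")
    text = w + marker
    n = len(text)
    rows = []
    for i in range(1, n + 1):
        factors_i = {text[p:p + i] for p in range(n - i + 1)}
        factors_next = {text[p:p + i + 1] for p in range(n - i)}   # empty when i = |w*|
        extensions = {}
        for f in factors_next:
            extensions.setdefault(f[:-1], set()).add(f[-1])
        s_i = sum(1 for letters in extensions.values() if len(letters) >= 2)
        rows.append((i, s_i, len(factors_i), len(factors_next)))
    return rows


w = "abababacababa"
rows = special_factor_profile(w)
M = sum(s_i for _, s_i, _, _ in rows)
telescoped = sum(c_next - c_i + 1 for _, _, c_i, c_next in rows)
print(M, telescoped, len(w) - len(set(w)))
```

For $w=abababacababa$ this gives $M(w^{*})=5$, while the telescoped sum equals $|w|-|\hbox{\rm Alph}(w)|=10$.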

Thus,

N_{k}(w)\leq\frac{|w|-|\hbox{\rm Alph}(w)|}{k-2}.

5 Concluding Remarks

The result obtained in the main theorem is not sharp. Let us consider the word $w=aaa$: it is easy to check that $N_{3}(w)=1$, while the bound given in Theorem 1 is $2$. The author believes that the problem comes from the way we count the number of right-special factors. In Lemma 4 we prove that, for a given primitive $u$, all $u(i)$ are right-special if $1\leq i\leq m(u)-1$. However, in Lemma 6 we count only the $u(i)$ for $2\leq i\leq m(u)-1$. In fact, we can only prove that the words of the form $u(i)$ are pairwise different when $i\geq 2$. Meanwhile, we can have two different primitive words $u,v$ such that $u(1)=v(i)$ for some positive integer $i$. Thus, further work remains to be done to investigate whether

\sum_{u\in\hbox{\rm Prim}(w)}(m(u)-1)\leq M(w^{*}).

If the previous inequality holds, we may expect to prove

N_{k}(w)\leq\frac{|w|-|\hbox{\rm Alph}(w)|}{k-1}.

References

[1] Crochemore, M., Rytter, W.: Squares, cubes, and time-space efficient string searching. Algorithmica 13, 405–425 (1995)
[2] Deza, A., Franek, F., Thierry, A.: How many double squares can a string contain? Discrete Applied Mathematics 180, 52–69 (2015)
[3] Fine, N.J., Wilf, H.S.: Uniqueness theorems for periodic functions. Proceedings of the American Mathematical Society 16(1), 109–114 (1965)
[4] Fraenkel, A.S., Simpson, J.: How many squares can a string contain? J. Comb. Theory, Ser. A 82(1), 112–120 (1998)
[5] Ilie, L.: A note on the number of distinct squares in a word. In: Brlek, S., Reutenauer, C. (eds.) Proc. Words2005, 5-th International Conference on Words. vol. 36, pp. 289–294. Publications du LaCIM, Montreal, Canada (13–17 Sep 2005)
[6] Lam, N.H.: On the number of squares in a string. AdvOL-Report 2 (2013)
[7] Thierry, A.: A proof that a word of length n has less than 1.5n distinct squares (2020)
