M&D Data Science Center, Institute of Integrated Research, Institute of Science Tokyo, [email protected]://orcid.org/0000-0002-6856-5185JSPS KAKENHI Grant Number JP24K02899 Kyushu Institute of Technology, [email protected]://orcid.org/0000-0001-9106-6192 Department of Informatics, Kyushu University, [email protected]://orcid.org/0000-0001-6269-9353JSPS KAKENHI Grant Number JP21K17705 and JP23H04386 \ccsdesc[500]Theory of computation Data compression \ccsdesc[500]Mathematics of computing Combinatorics on words \EventEditorsJohn Q. Open and Joan R. Access \EventNoEds2 \EventLongTitle42nd Conference on Very Important Topics (CVIT 2016) \EventShortTitleCVIT 2016 \EventAcronymCVIT \EventYear2016 \EventDateDecember 24–27, 2016 \EventLocationLittle Whinging, United Kingdom \EventLogo \SeriesVolume42 \ArticleNo23
On the compressiveness of the Burrows-Wheeler transform
Abstract
The Burrows-Wheeler transform (BWT) is a reversible transform that converts a string into another string . The size of the run-length encoded BWT (RLBWT) can be interpreted as a measure of repetitiveness in the class of representations called dictionary compression which are essentially representations based on copy and paste operations. In this paper, we shed new light on the compressiveness of BWT and the bijective BWT (BBWT). We first extend previous results on the relations of their run-length compressed sizes and . We also show that the so-called “clustering effect” of BWT and BBWT can be captured by measures other than empirical entropy or run-length encoding. In particular, we show that BWT and BBWT do not increase the repetitiveness of the string with respect to various measures based on dictionary compression by more than a polylogarithmic factor. Furthermore, we show that there exists an infinite family of strings that are maximally incompressible by any dictionary compression measure, but become very compressible after applying BBWT. An interesting implication of this result is that it is possible to transcend dictionary compression in some cases by simply applying BBWT before applying dictionary compression.
keywords:
Data Compression, Bijective Burrows-Wheeler Transform, Fibonacci wordscategory:
\relatedversion1 Introduction
The Burrows-Wheeler Transform (BWT) [5] is a reversible mapping from a string to another string that enables compression and efficient pattern search, and is the theoretical cornerstone for essential tools in the field of bioinformatics [17, 18]. The compressibility of BWT has been studied in various contexts, but more recently, rather than statistical measures such as empirical entropy which are not helpful in highly repetitive datasets, the size of the run-length encoded BWT (RLBWT) and its relation to the many other repetitiveness measures related to dictionary compression has become an important topic of study (See [21] for a comprehensive survey).
Dictionary compression is a family of compressed representations which are essentially based on copy and paste operations. Kempa and Prezza [15] proposed the notion of string attractors to view dictionary compression in a uniform way, and showed that the size of the smallest string attractor lower bounds all other measures of dictionary compression, since they implicitly give a string attractor of the same size as their representations: the size of the smallest bidirectional macro scheme (BMS) [23], the size of the LZ77 parsing [25], the size of the smallest grammar with run-length rules, the size of the smallest grammar, , and more. Another measure based on substring complexity is known to lowerbound [16].
Concerning in relation to the other measures, Navarro et al. [22] showed that given an RLBWT of size , we can construct a BMS of size , thus showing . Kempa and Kociumaka [14] showed , and further 111More precisely, . Also, [22] has been shown, and therefore or, . Recently, the size of the run-length encoded bijective BWT (RLBBWT) has been studied [2, 1], and was shown, similarly to , that [1]. Also, the existence of an infinite family of strings such that , namely, and was shown [1]. The existence of the opposite case, i.e., if there are string families for which , was left open.
The “clustering effect” of the BWT in terms of the run length encoding was studied by Mantaci et al. [19], where they proved that BWT can increase the size of the run-length encoding of the string by a factor of at most . The maximal clustering effect for run-length encoding can be seen for example with the Fibonacci words: Fibonacci words of length have a run-length encoding of size , while . While this is remarkable when viewed from run-length encoding since a maximally incompressible string (in terms of rle) is transformed into a compressible one, in terms of the smallest bidirectional macro scheme, Fibonacci words and their BWT have the same size , and one might argue that BWT is merely transforming a compressible string into another compressible string.
In this paper, we extend previous results on the relation between and . We also investigate the clustering effect of BWT and BBWT in terms of other repetitiveness measures and show that they can be much more powerful than previously perceived, giving an example where BBWT transforms a maximally incompressible string (in terms of dictionary compression) into a compressible one.
The contributions of this paper are as follows.
-
1.
We show an infinite family of strings such that and , answering affirmatively an open question about the existence of a family of strings where raised by Badkobeh et al. [1].
-
2.
We show a polylogarithmic () upper bound on the multiplicative difference between and .
-
3.
We analyze the clustering effect of BWT and BBWT in terms of other repetitiveness measures, and show that the application of BWT and BBWT can only increase various repetitiveness measures of a string by a polylogarithmic factor.
-
4.
We show that BBWT on the strings whose BBWT images are Fibonacci words exhibit a maximal clustering effect in terms of the repetitiveness measures , , , , or , from almost linear () to constant. The main combinatorial part of our proof is an elegant characterization of the LF mapping function on Fibonacci strings that uses the Zeckendorf representation of the positions.
As a byproduct, this result shows that it is possible in some cases to transcend dictionary compression by simply applying BBWT before applying dictionary compression.
2 Preliminaries
Let denote a set of symbols called the alphabet. A string is an element of . If for any strings , then, , , are respectively a prefix, substring, and suffix of and are a proper prefix, substring, or suffix if they are not equal to . The length of string is denoted by . The empty string, i.e., the string of length is denoted by . The symbol at position will be denoted by , and we will use a -based index, i.e., . For integers , let if and if . We will also use . For any string , , and for any integer , . A string is primitive if it cannot be represented as for some string and integer . For any symbol let , i.e., the number of ’s in .
For any string , let , denote a left cyclic rotation of by one symbol. Notice that for any integer , naturally corresponds to the cyclic rotation of by symbols to the left if , and to the right if . We will use to denote the lexicographic smallest rotation of . A conjugacy class is an equivalence class of the equivalence relation defined by .
A string is a Lyndon word if it is lexicographically smaller than any of its, proper rotations, i.e., for any . From the definition, it is clear that a Lyndon word must be primitive. A rotation of a primitive string is always primitive and is Lyndon and unique, which we will sometimes refer to as the Lyndon rotation of . It is known that Lyndon words cannot have a non-empty proper border (a proper prefix that is also a proper suffix). The Lyndon factorization [6] of a string is a unique partitioning of into a sequence of non-increasing Lyndon words, i.e., where each , which we will call Lyndon factors of , is a Lyndon word and for . It is known that any occurrence of a Lyndon word in must be a substring of a Lyndon factor of .
The -order between primitive strings or same length strings is defined as .222As we will not be comparing non-primitive strings of different lengths in this paper, the definition here is a simplified version of the original. The -order can be different from the standard lexicographic order, but are identical when comparing strings of the same length or when comparing Lyndon words.
For a string , let , i.e., the number of symbols in . Let denote the size of the run-length encoding of , i.e., the maximal number of same symbol runs in . Let denote the number of runs of symbol in the run-length encoding of .
2.1 Repetitiveness Measures
A set of positions is a string attractor [15] of if any substring of has an occurrence in that covers a position in . The size of the smallest string attractor of is denoted by . The measure [16] is defined as , where is the number of distinct length- substrings of .
The Burrows-Wheeler transform (BWT) [5] of a string is defined as the sequence of last (or equivalently, previous, in the cyclic sense) symbols of all rotations of , in lexicographic order of the rotations. The size of the run-length encoding of will be denoted by , i.e., . The bijective BWT (BBWT) [11] of a string is defined as the sequence of last (or again, previous, in the cyclic sense) symbols of all the rotations of all the Lyndon factors of , in -order of the rotations. Actually, BWT can be understood as a special case of BBWT: because for some Lyndon word and integer , and the -order is equivalent to lexicographic order in this case since all the compared strings are of the same length. The size of the run-length encoding of will be denoted by , i.e., .
The inverse transform of and on a string can be defined by the LF mapping, which is a function over positions of where , , and is the string obtained by sorting the multiset of symbols of in increasing order. is a permutation and thus forms cycles on the set of positions of . A cycle where is the smallest positive integer such that , corresponds to a (cyclic) string . This string is always primitive, and by concatenating, in non-increasing order, all the Lyndon rotations of the strings corresponding to all cycles, it can be shown that is obtained. can be interpreted as returning, given the -order rank of a given rotation of a cycle, the -order rank of the previous rotation of the cycle. Note that when is a image of a primitive string, consists of only a single cycle and that although the Lyndon rotation will give the string for which , we have for any rotation of .
A Bidirectional Macro Scheme (BMS) [23] of a string is a partitioning of into phrases, where each phrase is either a single symbol, or is a substring that has an occurrence elsewhere in which we call the source, or the reference of the phrase. The references of the phrases must be such that the induced reference for each position in the phrases are acyclic, i.e., the referencing on the position forms a forest where the roots are the positions corresponding to single symbol phrases. The size of the smallest bidirectional macro scheme for is denoted by .
The LZ77 factorization [25] of a string is a BMS of where all references are left-referencing, i.e., they must point to a smaller position, and the phrases are determined greedily from left-to-right. It is known that LZ77 is the smallest among left-referencing BMS. The size of LZ77 of is denoted by .
The lex-parse [22] of a string is a BMS of where all references point to a rotation of smaller lexicographic (or equivalently -order) rotation, and the phrases are determined greedily from left-to-right. Similarly, it is known that lex-parse is the smallest among BMS with such constraint. The size of lex-parse of is denoted by .
2.2 Fibonacci Words
Since we will later use Fibonacci words or their slight modifications to show some of our results, we introduce them here.
The Fibonacci words are defined recursively as: , , and for any integer , . Fibonacci words can also be defined via a morphism defined as , and , and for all . The lengths of the Fibonacci words correspond to the Fibonacci sequence, i.e., , and . For technical reasons, we define for . The observation below follows from a simple induction. {observation} For any , , and .
3 New results for vs
3.1 Strings family giving
Here, we answer an open question raised by Badkobeh et al. [1], and show that there exists an infinite family of strings such that is asymptotically strictly smaller than .
Theorem 3.1.
There exists an infinite family of strings such that .
Proof 3.2.
Consider string , where . Then, for any , (Lemma 3.3), and (Corollary 3.8).
Lemma 3.3.
Let , where . Then, for any .
Proof 3.4.
The Lyndon factorization of results in the factors and , and it is known that . Since is greater, in -order, than any rotation of , we have and .
The rest of the proof will focus on showing . We show two proofs, one that relies heavily on previous results, and the other an alternate direct proof.
3.1.1 Proof for
Actually, it turns out that is a rotation of the strings whose BWT are shown to have runs in Proposition 3 of [12] and in Proposition 3 of [13]. The former defines it as for . The latter defines it as inserting at position of , and mentions that it is a rotation of the former string. Our proof of further connects these strings with the lexicographically smallest rotation of .
The following is known:
Theorem 3.5 (Theorem 1 in [7]).
The rotation of with rank in the lexicographically sorted list of all the rotations of , for , is the rotation where
Therefore, we have
Corollary 3.6.
For every integer , .
Proof 3.7.
From Theorem 3.5 with , where .
which means that inserting a at position of is a rotation of , leading to:
Corollary 3.8.
Let , where . Then, for any .
3.1.2 Alternate proof for
We also give an alternate direct proof based on morphisms, which is perhaps slightly simpler compared to the proof for presented in [12] which relies on additional results on special factors of standard words [4]. Results on morphisms and their effects on have been studied by Fici et al. [9], but to the best of our knowledge, their results do not directly apply to our case.
We use a string morphism defined as: , . The following claim can be shown by a simple induction.
Claim 1 (Claim 5 in [20]).
For any string , .
To prove Theorem 3.1, we first show the following lemma.
Lemma 3.9.
For every integer , .
Proof 3.10.
It clearly holds for . For , we show that by induction on . We can see that the statement holds for since . Suppose that holds for some . We have
by induction hypothesis | ||||
by 1 |
Since the last symbol of is and , we have
Hence, the statement also holds for , and for every . By combining it with Corollary 3.6, we obtain .
We are ready to prove the following lemma.
Lemma 3.11.
Let , where . Then, for any .
Proof 3.12.
We first prove that for for any . From Lemma 3.9, we have . Below, we extend the definition of so that , so . We claim that
(1) |
We show this statement by induction on . If , and hold. Assume that the statement holds for some . We show the statement also holds for . We consider the three disjoint sets of the rotations of as follows: the rotations (i) starting with , (ii) starting with , (iii) starting with or .
-
(i)
From the definition of , the number of rotations starting with of is . These rotations are the first rotations in the lexicographically increasingly sorted list of all rotations of . By the definition of , the lexicographically smallest rotation is , and the preceding symbol of this rotation is . The other rotations of this case are preceded by . Hence, .
-
(ii)
Let be the set of positions in that are preceded by an occurrence of . From the definitions of and , every occurrence of in is a suffix of an occurrence of the substrings ( or ) produced from and an occurrence of or in . Therefore, we have and if denotes the position in that produces the corresponding symbol at position in by the morphism , then for any . Thus, for any positions ,
(2) where the last relation follows from the fact that is an order preserving morphism (i.e., holds for any ). We can also see that for any , the symbol preceding the occurrence of is (resp. ) iff is (resp. ). Thus, together with Equation 2, the sequence of entries in corresponding to rotations that start with are equivalent to the sequence of entries in for rotations that start with or , i.e., . See also Figure 1.
-
(iii)
From the definition of , the number of rotations starting with in is , and the lexicographically largest rotation is itself. These rotations are the last rotations in the lexicographically increasingly sorted list of all rotations of . The preceding symbol of the largest rotation, or equivalently, the last symbol of is . All rotations starting with are preceded by . Hence, .
All together, we have
and the statement holds for , proving Equation 1. Therefore, . We note that Equation 1 holds for , but since , .
The last symbol of is , so changing the in to changes the unique occurrence of in all rotations of except for itself, to the unique occurrence of in all rotations of except for itself. Therefore, the lexicographic order of rotations of are equivalent to those in , except for itself. We can see that will have the following changes compared to : 1) the lexicographically smallest rotation is preceded by instead of , and 2) is the lexicographically smallest rotation that starts with (rather than ). 1) changes to and thus decreases the run-length by . 2) moves the last right after the second to last in (the preceding symbol of the largest rotation starting with ) and thus decreases the run-length by . Therefore, in total, which can be confirmed to hold for as well.

3.2 Bounds for
We show poly-logarithmic upper and lower bounds on the ratio for any string.
We first show a simple upper bound on the multiplicative rotation sensitivity of .
Lemma 3.13.
For any strings of length in the same conjugacy class, .
Proof 3.14.
It is easy to see that . This is because, given any BMS of of size , we can reuse the parsing and referencing, except for the following changes, to construct a BMS for of size : (1) if starts in the middle of a BMS phrase of , we split the phrase, and (2) if a phrase references a substring of but is split in , we split the phrase. Since only one of the phrases of the split phrase in (1) is split by (2), the total number of phrases is at most .
Since and , we have .
Lemma 3.15.
For any strings of length in the same conjugacy class, .
Proof 3.16.
Badkobeh et al. [1] showed (1) , which implies , and (2) . Therefore, from Lemma 3.13.
Corollary 3.17.
For any string of length , and .
4 Clustering effects of and
4.1 In terms of
The clustering effect of was measured by the length of the run-length encoding. Namely, the following statement was claimed by Mantaci et al. [19].
Theorem 4.1 (Theorem 3.3 in [19]).
For any string , .
While the statement is true, we believe there is a non-trivial case that was not covered in their original proof. Here, we address this case and also show that the same statement holds for . Note that the following Theorem 4.2 implies Theorem 4.1 because and .
Theorem 4.2.
For any string , .
Proof 4.3.
We first recall the arguments of [19] for BWT. Consider a range of that correspond to rotations of that start with a symbol . Then, will consist of occurrences of ’s that correspond to those preceding a in the same run of ’s in , and occurrence of symbols not equal to that precede a maximal run of ’s in . Thus, is maximized when the non- symbols in are not adjacent to each other. Mantaci et al. argued that this implies summing up to for all . However, they did not give a reason to why could not be (summing up to ), which is possibly the case when starts and ends with the symbol . We show that in such a case, an LF mapping with a single cycle cannot be defined, and ultimately, holds.
Let be a maximal run of symbol in and let be such that corresponds to , i.e., is the -order rank of . Since , . Also, since , the -order ranks , corresponding respectively to those of , are in (except ) and monotonic, i.e., either or holds. Also, . On the other hand, for any s.t. and , must hold implying that for any such , it cannot be that is part of a monotonically decreasing cycle and is part of a monotonically increasing cycle. Since each of the above monotonic sequences end at a non- position in , there must be some position such that , and cannot be defined to cover all positions in in a single cycle. Therefore holds.
For , similarly consider the range of such that the corresponding rotations of the Lyndon factors of start with the symbol . Since it is known that Lyndon factors of (except for single symbol factors) must start and end at run-boundaries of [10], a maximal run of ’s in is either a substring of a Lyndon factor of , or a maximal run of a single symbol Lyndon factors. If it is a substring of a Lyndon factor but not a prefix, then there is one non- symbol per such run that precedes the run and occurs in . If it is a prefix of a Lyndon factor longer than , then the previous symbol (or the last symbol of the Lyndon factor) is non- due to the Lyndon factor not having a proper border. Otherwise, the maximal run is a maximal run of single symbol Lyndon factors , in which case the previous symbol for these occurring in will also be . Thus, the arguments in the previous paragraph for BWT hold for BBWT as well, except for the last part: it can be that and for some . However, this implies that this corresponds to a single symbol Lyndon factor which would not have introduced a non- symbol in . Therefore still holds.
4.2 In terms of other repetitiveness measures
We consider measuring the clustering effect of and other than . Here, we show that or can only increase the repetitiveness of the string by a polylogarithmic factor.
Theorem 4.4.
For repetitiveness measure and any string of length , it holds that . Moreover, for , and for .
Proof 4.5.
The proof for follows from a simple observation that for any string , implying . Since , we have , where the last relation follows from the fact that lower bounds all the repetitiveness measures considered.
For , holds, but only has been proved, from which and the statement follows for . For the other measures, Corollary 3.17 gives us from which the statement follows.
4.3 A family of significantly “clustered” strings by BBWT
Finally, we show an infinite family of strings for which BBWT exhibits an asymptotically maximal clustering effect in terms of , and slightly weaker in terms of . Namely, we consider the family of strings whose BBWT image are Fibonacci words.
Theorem 4.6.
Let be a string where is a Fibonacci word of length . Then, for , and for , .
In order to understand , we first introduce the tools we use to characterize the LF mapping on Fibonacci words.
The Zeckendorf representation [24] of a non-negative integer is a unique (sub-)set of distinct non-consecutive Fibonacci numbers that sum up to the integer. We represent a non-negative integer as a bit string , where . We use to denote the length- prefix of . Note that for any , it suffices that a subset of is used, and thus requires only up to bits, i.e., ’s will only occur in , and the rest will be .
We observe that length prefix of a Fibonacci word can be uniquely factorized into a sequence of distinct non-consecutive Fibonacci words in order of decreasing length, where the length of each Fibonacci word corresponds to an element in the Zeckendorf representation of .
Lemma 4.7.
For any ,
(3) |
where is the smallest integer such that .
Proof 4.8.
Consider a greedy factorization of that takes the longest prefix of the remaining string that is a Fibonacci word. Let be the smallest integer such that . If then we simply take as the last element. Otherwise, we take . Notice that in this case, the remaining string is of length , and we repeat the process to find a greedy factorization of . The remaining string is a proper prefix of and thus will not be chosen next, implying that consecutive Fibonacci words are not chosen. Thus, the lengths of the sequence of strings will be a set of non-consecutive set of distinct Fibonacci numbers that sum up to . The lemma follows from the uniqueness of the Zeckendorf representation of .
The following relation about the least significant bit in the Zeckendorf representation and the th symbol in the (infinite) Fibonacci word is known.
Lemma 4.9 (Problem 6 in [8]).
For ,
Now, we make observations on the LF-mapping for Fibonacci words.
Lemma 4.10.
The LF-mapping function for the Fibonacci word is, for any ,
Proof 4.11.
Straightforward from the definition of and Section 2.2.
The next lemma shows that the LF mappings on can be interpreted as rotation operations on the Zeckendorf representation of the position to be mapped.
Lemma 4.12.
For any ,
Proof 4.13.
Let . Since , the most significant bits of and , corresponding to , are always zero, i.e., .
From Lemma 4.9, implies that . Thus, . Therefore, we have
by Lemma 4.10 | ||||
by Lemma 4.7 | ||||
On the other hand, again from Lemma 4.9, implies that and since the Zeckendorf representation does not use consecutive Fibonacci numbers, . Thus, . Therefore, we have
by Lemma 4.7 | ||||
Furthermore,
by Lemma 4.10 | ||||
thus proving the lemma (see also Figure 2).

We are now ready to prove Theorem 4.6: See 4.6
Proof 4.14.
For any Fibonacci word , and are known. In the rest of the proof, we show implying the same lower bound for all other measures which finishes the proof.
From Lemma 4.12, can be interpreted as a rotation operation on a -bit string corresponding to the Zeckendorf representation of the given position. Therefore, the number of cycles that produces is equivalent to the number of conjugacy classes among bit strings corresponding to the Zeckendorf representations of the set of positions . Since the Zeckendorf representations do not use adjacent Fibonacci numbers, this is equivalent to the number of binary necklaces of length that do not contain “11”. This corresponds to entry A000358 in the on-line encyclopedia of integer sequences, which is known [3] to be:
(4) |
where represents that is a divisor of , and is Euler’s totient function which returns, for integer , the number of positive integers less than that are relatively prime to (i.e., having of ).
As seen in the proof of Lemma 4.12, an in implies a 1-bit left rotation on for , and in implies a 2-bit left rotation on for . For simplicity of the analysis, we will restrict to being prime. Each necklace will then correspond to some cyclic string , where
(5) |
except for the string corresponding to the bit string , since all other necklaces must be primitive and a total of -bits must be rotated in order to come back to the same bit string. Consider the lexicographically smallest rotations of all of them, which are Lyndon words. All of them are distinct, and any two of them (except for the single ) cannot be a substring of the other due to the constraint on their lengths (Equation 5). The string is a concatenation (in non-increasing order) of all these strings with the single appended at the end, corresponding to the Lyndon factorization of . Since Lyndon words can only occur as a substring of a Lyndon factor, it follows that there are distinct uniquely occurring strings (Lyndon factors) that do not overlap. For prime , Equation 4 becomes:
Since the uniquely occurring Lyndon factors of satisfy Equation 5 and have length at most , any length substring of will contain at least one full occurrence of a Lyndon factor of . Since all such strings must also have unique occurrences and therefore be distinct, holds.
For example, for , the binary necklaces of length that do not contain “11” are: , , , , , which respectively correspond to , , , , and , and therefore, .
We note that is asymptotically maximally incompressible by dictionary compression since it is known that for any string, holds, which is for binary strings.
5 Discussion
The factors in our bounds, especially in Theorem 4.4 are most likely not tight; Our focus here was on showing the existence of such bounds. A natural open question is how many factors can be shaved.
References
- [1] Golnaz Badkobeh, Hideo Bannai, and Dominik Köppl. Bijective BWT based compression schemes. In Zsuzsanna Lipták, Edleno Silva de Moura, Karina Figueroa, and Ricardo Baeza-Yates, editors, String Processing and Information Retrieval - 31st International Symposium, SPIRE 2024, Puerto Vallarta, Mexico, September 23-25, 2024, Proceedings, volume 14899 of Lecture Notes in Computer Science, pages 16–25. Springer, 2024. doi:10.1007/978-3-031-72200-4\_2.
- [2] Elena Biagi, Davide Cenzato, Zsuzsanna Lipták, and Giuseppe Romana. On the number of equal-letter runs of the bijective Burrows-Wheeler transform. In Giuseppa Castiglione and Marinella Sciortino, editors, Proceedings of the 24th Italian Conference on Theoretical Computer Science, Palermo, Italy, September 13-15, 2023, volume 3587 of CEUR Workshop Proceedings, pages 129–142. CEUR-WS.org, 2023. URL: https://ceur-ws.org/Vol-3587/4564.pdf.
- [3] M. Bona. Handbook of Enumerative Combinatorics. Discrete Mathematics and Its Applications. CRC Press, 2015. URL: https://books.google.co.jp/books?id=j3kZBwAAQBAJ.
- [4] Jean-Pierre Borel and Christophe Reutenauer. On Christoffel classes. RAIRO Theor. Informatics Appl., 40(1):15–27, 2006. URL: https://doi.org/10.1051/ita:2005038, doi:10.1051/ITA:2005038.
- [5] Michael Burrows and David J. Wheeler. A block-sorting lossless data compression algorithm. Technical report, Digital Equipment Corporation, Systems Reseach Center, 1994. SRC Research Report 124.
- [6] Kuo Tsai Chen, Ralph H. Fox, and Roger C. Lyndon. Free differential calculus, IV. The quotient groups of the lower central series. Annals of Mathematics, 68(1):81–95, 1958.
- [7] Manolis Christodoulakis, Costas S. Iliopoulos, and Yoan José Pinzón Ardila. Simple algorithm for sorting the fibonacci string rotations. In Jirí Wiedermann, Gerard Tel, Jaroslav Pokorný, Mária Bieliková, and Julius Stuller, editors, SOFSEM 2006: Theory and Practice of Computer Science, 32nd Conference on Current Trends in Theory and Practice of Computer Science, Merín, Czech Republic, January 21-27, 2006, Proceedings, volume 3831 of Lecture Notes in Computer Science, pages 218–225. Springer, 2006. doi:10.1007/11611257\_19.
- [8] Maxime Crochemore, Thierry Lecroq, and Wojciech Rytter. Fibonacci Words and Fibonacci Numeration System, page 17–52. Cambridge University Press, 2021.
- [9] Gabriele Fici, Giuseppe Romana, Marinella Sciortino, and Cristian Urbina. On the impact of morphisms on BWT-runs. In Laurent Bulteau and Zsuzsanna Lipták, editors, 34th Annual Symposium on Combinatorial Pattern Matching, CPM 2023, June 26-28, 2023, Marne-la-Vallée, France, volume 259 of LIPIcs, pages 10:1–10:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2023. URL: https://doi.org/10.4230/LIPIcs.CPM.2023.10, doi:10.4230/LIPICS.CPM.2023.10.
- [10] Yuta Fujishige, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Almost linear time computation of maximal repetitions in run length encoded strings. In Yoshio Okamoto and Takeshi Tokuyama, editors, 28th International Symposium on Algorithms and Computation, ISAAC 2017, December 9-12, 2017, Phuket, Thailand, volume 92 of LIPIcs, pages 33:1–33:12. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2017. URL: https://doi.org/10.4230/LIPIcs.ISAAC.2017.33, doi:10.4230/LIPICS.ISAAC.2017.33.
- [11] Joseph Yossi Gil and David Allen Scott. A bijective string sorting transform. CoRR, abs/1201.3077, 2012. URL: http://arxiv.org/abs/1201.3077, arXiv:1201.3077.
- [12] Sara Giuliani, Shunsuke Inenaga, Zsuzsanna Lipták, Nicola Prezza, Marinella Sciortino, and Anna Toffanello. Novel results on the number of runs of the Burrows-Wheeler-transform. In Tomás Bures, Riccardo Dondi, Johann Gamper, Giovanna Guerrini, Tomasz Jurdzinski, Claus Pahl, Florian Sikora, and Prudence W. H. Wong, editors, SOFSEM 2021: Theory and Practice of Computer Science - 47th International Conference on Current Trends in Theory and Practice of Computer Science, SOFSEM 2021, Bolzano-Bozen, Italy, January 25-29, 2021, Proceedings, volume 12607 of Lecture Notes in Computer Science, pages 249–262. Springer, 2021. doi:10.1007/978-3-030-67731-2\_18.
- [13] Sara Giuliani, Shunsuke Inenaga, Zsuzsanna Lipták, Giuseppe Romana, Marinella Sciortino, and Cristian Urbina. Bit catastrophes for the Burrows-Wheeler transform. In Frank Drewes and Mikhail Volkov, editors, Developments in Language Theory - 27th International Conference, DLT 2023, Umeå, Sweden, June 12-16, 2023, Proceedings, volume 13911 of Lecture Notes in Computer Science, pages 86–99. Springer, 2023. doi:10.1007/978-3-031-33264-7\_8.
- [14] Dominik Kempa and Tomasz Kociumaka. Resolution of the Burrows-Wheeler transform conjecture. Commun. ACM, 65(6):91–98, 2022. doi:10.1145/3531445.
- [15] Dominik Kempa and Nicola Prezza. At the roots of dictionary compression: string attractors. In Ilias Diakonikolas, David Kempe, and Monika Henzinger, editors, Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018, Los Angeles, CA, USA, June 25-29, 2018, pages 827–840. ACM, 2018. doi:10.1145/3188745.3188814.
- [16] Tomasz Kociumaka, Gonzalo Navarro, and Nicola Prezza. Toward a definitive compressibility measure for repetitive sequences. IEEE Transactions on Information Theory, 69(4):2074–2092, 2023. doi:10.1109/TIT.2022.3224382.
- [17] Ben Langmead, Cole Trapnell, Mihai Pop, and Steven L. Salzberg. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology, 10, 2009.
- [18] Heng Li and Richard Durbin. Fast and accurate short read alignment with burrows–wheeler transform. Bioinformatics, 25(14):1754–1760, 05 2009. arXiv:https://academic.oup.com/bioinformatics/article-pdf/25/14/1754/48994219/bioinformatics\_25\_14\_1754.pdf, doi:10.1093/bioinformatics/btp324.
- [19] Sabrina Mantaci, Antonio Restivo, Giovanna Rosone, Marinella Sciortino, and Luca Versari. Measuring the clustering effect of BWT via RLE. Theor. Comput. Sci., 698:79–87, 2017. URL: https://doi.org/10.1016/j.tcs.2017.07.015, doi:10.1016/J.TCS.2017.07.015.
- [20] Takuya Mieno, Shunsuke Inenaga, and Takashi Horiyama. RePair grammars are the smallest grammars for Fibonacci words. In Hideo Bannai and Jan Holub, editors, 33rd Annual Symposium on Combinatorial Pattern Matching, CPM 2022, June 27-29, 2022, Prague, Czech Republic, volume 223 of LIPIcs, pages 26:1–26:17. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2022. URL: https://doi.org/10.4230/LIPIcs.CPM.2022.26, doi:10.4230/LIPICS.CPM.2022.26.
- [21] Gonzalo Navarro. Indexing highly repetitive string collections, part II: compressed indexes. ACM Comput. Surv., 54(2):26:1–26:32, 2022. doi:10.1145/3432999.
- [22] Gonzalo Navarro, Carlos Ochoa, and Nicola Prezza. On the approximation ratio of ordered parsings. IEEE Trans. Inf. Theory, 67(2):1008–1026, 2021. doi:10.1109/TIT.2020.3042746.
- [23] James A. Storer and Thomas G. Szymanski. Data compression via textual substitution. J. ACM, 29(4):928–951, 1982. doi:10.1145/322344.322346.
- [24] Edouard Zeckendorf. Representations des nombres naturels par une somme de nombres de Fibonacci on de nombres de Lucas. Bulletin de La Society Royale des Sciences de Liege, pages 179–182, 1972. URL: https://cir.nii.ac.jp/crid/1570009749187075840.
- [25] Jacob Ziv and Abraham Lempel. A universal algorithm for sequential data compression. IEEE Trans. Inf. Theory, 23(3):337–343, 1977. doi:10.1109/TIT.1977.1055714.