Longest (Sub-)Periodic Subsequence
Abstract
We present an algorithm computing the longest periodic subsequence of a string of length in time with words of space. We obtain improvements when restricting the exponents or extending the search allowing the reported subsequence to be subperiodic down to time and words of space.
1 Introduction
A natural extension of the analysis of regularities such as squares or palindromes perceived as substrings of a given text is the study of the same type of regularities when considering subsequences. In this line of research, given a text of length , Kosowski [17] proposed an algorithm running in time using words of space to find the longest subsequence that is a square. Inoue et al. [13] generalized this setting to consider the longest such subsequence common of two texts and of length , and gave an algorithm computing this sequence in time using space, also providing improvements in case that the number of matching characters pairs is rather small. Recently, Inoue et al. [14] provided similar improvements for the longest square subsequence of a single string. Here, we consider the problem for a single text, but allow the subsequence to have different exponents. In detail, we want to find the longest subsequence that is (sub-)periodic.
A non-exhaustive list of related problems are finding the longest palindromic subsequence [5, 12], absent subsequences [16], longest increasing and decreasing subsequences [19, 7], maximal common subsequences [18, 6], the longest run subsequence [20], the longest Lyndon subsequence [2], longest rollercoasters [9, 3, 8], and computing subsequence automaton [4, 1]. Our techniques rely on finding longest common subsequences, which is conceived as a well-studied problem (see [10, 11, 15] and the references therein).
2 Preliminaries
Let denote all natural numbers , and the set of all rational numbers greater than or equal to 1. We distinguish integer intervals and intervals of rational numbers .
Let denote a totally ordered set of symbols called the alphabet. An element of is called a string. Given a string , we denote its length with and its -th symbol with for . Further, we write . A factorization is a partitioning of into substrings . A subsequence of a string with length is a string with .
Given a string , we can write in the form with being either empty or a proper prefix of . Then is called a period of , and is called its exponent with respect to the period . For the largest possible such exponent , is called periodic if , or sub-periodic if . For instance, the unary string has the minimum period with exponent , or more generally, period with exponent .
Further, for two strings and , let denote the longest common subsequence of and . We omit the strings in subscript if they are clear from the context. Also, we allow us to write LCS for an arbitrary number of strings; the number of strings is reflected by the arguments given to LCS. It is known that we can answer the longest common subsequence of strings in constant time by building a table of size and filling its entries in time via dynamic programming.
Structure of the Paper
In what follows, we first show an algorithm (Sect. 3) computing the longest subsequence that is periodic or sub-periodic. Subsequently, we refine this algorithm to omit the sub-periodic subsequences by allowing more time and space in Sect. 4. Table 1 gives an overview of our obtained results. We observe that, if we are only interested in the longest subsequence having an exponent , we obtain a algorithm faster than those finding subsequences with more restricted exponents.
exponent time space solution [17] Thm. 3.2 Thm. 4.2 Thm. 4.3
A key observation is that a longest (sub-)periodic subsequence is maximal, meaning that no occurrence of in can be extended with a character in or to form a longer subsequence without breaking the periodicity.
3 Longest (Sub-)periodic Subsequence
We start with the search for the longest subsequence that is periodic or subperiodic, meaning that one of its exponents is in . The idea is to compute every possible factorization of into two factors, and try to (a) prolong each square to for being a subsequence in , or (b) extend to with being a proper prefix of found in and a subsequence found in . For a fixed partition , we define
(1) |
for and to be the longest subsequence of having an exponent in of the form with being a non-empty common subsequence of and and a subsequence of , where . Note that we can add as another selectable value for the maximum determining , but this does not change the maximum, since the value of is used as an option for the maximum determining , which is already an option for . has the following property:
a | b | c | a | a | |
---|---|---|---|---|---|
a | aba | abca | abcaa | abcaaa | |
c | aba | abca | acaac | acaaac | |
a | aba | abca | acaac | acaaaca |
Lemma 3.1.
is the length of the longest subsequence of of the form with being a proper prefix of and a common subsequence of and , and being a subsequence of .
Proof.
Let be the longest subsequence having an exponent in in with . We partition such that and are subsequences in and , respectively. Since is a proper prefix of , . If , then there is another common subsequence of and , and is a longer subsequence of than , so is a longer subsequence having an exponent in than , a contradiction. Therefore, . Obviously, , since otherwise we could extend further.
Finally, we show that the sequence is considered in the construction of . Since (otherwise cannot be a proper prefix of ), , and equality results from the fact that we otherwise have found a longer subsequence in that we can extend to a subsequence of longer than by applying the second option in Eq. 1. The third option of Eq. 1 fills up such that we obtain the length of in . ∎
Theorem 3.2.
We can find the longest subsequence having an exponent in in time using space.
Proof.
For each of the different partitions , we precompute a table answering in constant time. This table needs space and can be constructed in time. Next, we create a table of size , and fill each of its cells by Eq. 1 in constant time thanks to the precomputation step. In total, we need time per partition , and therefore time for the entire computation. ∎
In the following we want to omit subsequences having exponents only in .
4 Longest Periodic Subsequence
We now extend our ideas of the previous section to omit the sub-periodic subsequences, such that our algorithm always reports a subsequence with an exponent of at least . Our main idea is to generalize the factorization of from to factors. Computing the LCS of these factors, we obtain all longest periodic subsequences having an exponent of length with . For , we can find all square subsequences, i.e., the longest common subsequence with an exponent of for , similar to [17]. Like in Sect. 3, we can support exponent values by stopping matching characters in the last factor. In general, the number of factors is a good value if the exponent is not in . With an exponent , each factor starts capturing at the root of the repetition, i.e., if we match a subsequence , then we start capturing in all factors simultaneously. However, the last factor has to capture characters if . Hence, we need to split this last factor such that we have four factors, i.e., we need for .
However, suffices for . To see that, we let the first factor capture the subsequence . The last factor captures with , which works if , i.e., , which holds for . For , each of the first factors captures , while the last factor captures with .
4.1 Three Factors
There are possible factorizations of the form . Let us fix one such factorization . For this factorization, we define the three-dimensional table by
where if and otherwise. Compared to , the number options for a cell value have increased. The first option takes the longest common subsequence of , and , which induces the subsequence of , and tries to prolong to a subsequence of with being a prefix of and (or gives if ). The second option performs an extension of to with . The last options are similar to the standard LCS computation by copying a previously computed result on a mismatch of and .
Lemma 4.1.
is the longest subsequence of with (possibly empty) prefix such that is a common subsequence of , and , and is a subsequence of .
Proof.
The longest periodic subsequence with an exponent in is given by . In what follows, we show that we can also find the longest periodic subsequence with an exponent not divisible by three, i.e., the exponent is in . This longest periodic subsequence has the form with being a (not necessarily proper) prefix of (see the aforementioned discussion of why this subsequence has to admit this form). Then is a common subsequence of , , and with . Similar to the proof of Lemma 3.1, we can argue that , otherwise has a longer periodic subsequence with an exponent in . Hence, the longest periodic subsequence with an exponent in has the form with being a prefix of . But then is the longest common subsequence of and , computed in . ∎
Theorem 4.2.
We can find the longest periodic subsequence with an exponent in in time using space.
Proof.
For the claimed time and space complexities, let us fix a factorization . We first pre-process LCS with a three-dimensional table taking space and time such that we can answer an LCS query in time. The table also takes space, and each cell can be filled in constant time thanks to the pre-processing step. Finally, we compute the maximum value of for each factorization , which are many. Hence, we fill times, which gives us the total time of . ∎
4.2 Four Factors
Finally, we consider a factorization of size four to capture exponents in . We have possibilities to factorize into four factors . Let us fix a factorization . We fill the 4-dimensional table as follows:
where if and otherwise.
Theorem 4.3.
We can compute the longest periodic subsequence with an exponent in time using space.
5 Open Problems
We are unaware of polynomial-time algorithms computing several other types of regularities when considering subsequences. For instance, we are not aware of an algorithm computing the longest sub-periodic subsequence. For that, we would need an efficient algorithm computing the longest (common) subsequence without a border. Then we could use the algorithm of Sect. 3 and compute this subsequence without a border instead of the (plain) LCS. Other problems are finding the longest (common) subsequence that is primitive (exponent in ), or the longest (common) subsequence that is non-primitive (exponent in ).
References
- Baeza-Yates [1991] R. A. Baeza-Yates. Searching subsequences. Theor. Comput. Sci., 78(2):363–376, 1991.
- Bannai et al. [2022] H. Bannai, T. I, T. Kociumaka, D. Köppl, and S. J. Puglisi. Computing longest (common) Lyndon subsequences. CoRR, abs/2201.06773, 2022.
- Biedl et al. [2019] T. C. Biedl, A. Biniaz, R. Cummings, A. Lubiw, F. Manea, D. Nowotka, and J. O. Shallit. Rollercoasters: Long sequences without short runs. SIAM J. Discret. Math., 33(2):845–861, 2019.
- Bille et al. [2017] P. Bille, I. L. Gørtz, and F. R. Skjoldjensen. Subsequence automata with default transitions. J. Discrete Algorithms, 44:48–55, 2017.
- Chowdhury et al. [2014] S. R. Chowdhury, M. M. Hasan, S. Iqbal, and M. S. Rahman. Computing a longest common palindromic subsequence. Fundam. Informaticae, 129(4):329–340, 2014.
- Conte et al. [2019] A. Conte, R. Grossi, G. Punzi, and T. Uno. Polynomial-delay enumeration of maximal common subsequences. In Proc. SPIRE, volume 11811 of LNCS, pages 189–202, 2019.
- Duraj [2020] L. Duraj. A sub-quadratic algorithm for the longest common increasing subsequence problem. In Proc. STACS, volume 154 of LIPIcs, pages 41:1–41:18, 2020.
- Fujita et al. [2021] K. Fujita, Y. Nakashima, S. Inenaga, H. Bannai, and M. Takeda. Longest common rollercoasters. In Proc. SPIRE, volume 12944 of LNCS, pages 21–32, 2021.
- Gawrychowski et al. [2019] P. Gawrychowski, F. Manea, and R. Serafin. Fast and longest rollercoasters. In Proc. STACS, volume 126 of LIPIcs, pages 30:1–30:17, 2019.
- Hirschberg [1975] D. S. Hirschberg. A linear space algorithm for computing maximal common subsequences. Commun. ACM, 18(6):341–343, 1975.
- Hirschberg [1977] D. S. Hirschberg. Algorithms for the longest common subsequence problem. J. ACM, 24(4):664–675, 1977.
- Inenaga and Hyyrö [2018] S. Inenaga and H. Hyyrö. A hardness result and new algorithm for the longest common palindromic subsequence problem. Inf. Process. Lett., 129:11–15, 2018.
- Inoue et al. [2018] T. Inoue, S. Inenaga, H. Hyyrö, H. Bannai, and M. Takeda. Computing longest common square subsequences. In Proc. CPM, volume 105 of LIPIcs, pages 15:1–15:13, 2018.
- Inoue et al. [2020] T. Inoue, S. Inenaga, and H. Bannai. Longest square subsequence problem revisited. In Proc. SPIRE, volume 12303 of LNCS, pages 147–154, 2020.
- Kiyomi et al. [2021] M. Kiyomi, T. Horiyama, and Y. Otachi. Longest common subsequence in sublinear space. Inf. Process. Lett., 168:106084, 2021.
- Kosche et al. [2021] M. Kosche, T. Koß, F. Manea, and S. Siemer. Absent subsequences in words. In Proc. RP, volume 13035 of LNCS, pages 115–131, 2021.
- Kosowski [2004] A. Kosowski. An efficient algorithm for the longest tandem scattered subsequence problem. In Proc. SPIRE, volume 3246 of LNCS, pages 93–100, 2004.
- Sakai [2019] Y. Sakai. Maximal common subsequence algorithms. Theor. Comput. Sci., 793:132–139, 2019.
- Schensted [1961] C. Schensted. Longest increasing and decreasing subsequences. Canadian Journal of Mathematics, 13:179–191, 1961.
- Schrinner et al. [2020] S. Schrinner, M. Goel, M. Wulfert, P. Spohr, K. Schneeberger, and G. W. Klau. The longest run subsequence problem. In Proc. WABI, volume 172 of LIPIcs, pages 6:1–6:13, 2020.