This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

PalFM-index: FM-index for Palindrome Pattern Matching

Shinya Nagashita
Kyushu Institute of Technology, Japan
[email protected]

Tomohiro I
Kyushu Institute of Technology, Japan
[email protected]
Abstract

The palindrome pattern matching (pal-matching) is a kind of generalized pattern matching, in which two strings xx and yy of same length are considered to match (pal-match) if they have the same palindromic structures, i.e., for any possible 1i<j|x|=|y|1\leq i<j\leq|x|=|y|, x[i..j]x[i..j] is a palindrome if and only if y[i..j]y[i..j] is a palindrome. The pal-matching problem is the problem of searching for, in a text, the occurrences of the substrings that pal-match with a pattern. Given a text TT of length nn over an alphabet of size σ\sigma, an index for pal-matching is to support, given a pattern PP of length mm, the counting queries that compute the number 𝗈𝖼𝖼\mathsf{occ} of occurrences of PP and the locating queries that compute the occurrences of PP. The authors in [I et al., Theor. Comput. Sci., 2013] proposed an O(nlgn)O(n\lg n)-bit data structure to support the counting queries in O(mlgσ)O(m\lg\sigma) time and the locating queries in O(mlgσ+𝗈𝖼𝖼)O(m\lg\sigma+\mathsf{occ}) time. In this paper, we propose an FM-index type index for the pal-matching problem, which we call the PalFM-index, that occupies 2nlgmin(σ,lgn)+2n+o(n)2n\lg\min(\sigma,\lg n)+2n+o(n) bits of space and supports the counting queries in O(m)O(m) time. The PalFM-indexes can support the locating queries in O(m+Δ𝗈𝖼𝖼)O(m+\Delta\mathsf{occ}) time by adding nΔlgn+n+o(n)\frac{n}{\Delta}\lg n+n+o(n) bits of space, where Δ\Delta is a parameter chosen from {1,2,,n}\{1,2,\dots,n\} in the preprocessing phase.

1 Introduction

A palindrome is a string that can be read same backward as forward. Palindromic structures in a string are one of the most fundamental structures in the string and have been extensively studied. For example, it is known that any string ww contains at most |w|+1|w|+1 distinct palindromic substrings [6], and the strings reaching the maximum values have some intriguing properties [15, 28]. Another concept regarding palindromic structures is the palindrome complexity [1, 4, 2], which is the number of distinct palindromic substrings of a given length in a string.

Instead of thinking about distinct palindromic substrings, one might be interested in occurrences of palindromic substrings. The palindromic structures in such a sense are captured by the maximal palindromes from all possible “centers” in a string. Manacher’s algorithm [26], originally proposed for computing a prefix-palindrome, can be extended to compute all the maximal palindromes in O(|w|)O(|w|) time for a string ww. The authors in [18] considered the problem of inferring strings from a given set of maximal palindromes and showed that the problem can be solved in O(|w|)O(|w|) time.

In [19], a new concept called palindrome pattern matching was introduced as a generalized pattern matching. Two strings xx and yy of the same length are said to palindrome pattern match (pal-match in short) iff they have the same palindromic structures, i.e., the following condition holds: for any possible 1i<j|x|=|y|1\leq i<j\leq|x|=|y|, x[i..j]x[i..j] is a palindrome iff y[i..j]y[i..j] is a palindrome. We remark that xx and yy themselves are not necessarily palindromes. The palindrome pattern matching has potential applications to genomic analysis, in which some palindromic structures play an important role to estimate RNA secondary structures [21].

The pal-matching problem is to search for, in a text, the occurrences of the substrings that pal-match with a pattern. Given a text TT of length nn and a pattern PP of length mm, a Morris-Pratt type algorithm for solving the pal-matching problem in O(n)O(n) time was proposed in [19]. The method in [19] is based on the 𝗅𝗉𝖺𝗅\mathsf{lpal}-encoding of a string ww, denoted as 𝗅𝗉𝖺𝗅w\mathsf{lpal}_{w}, that is the integer array of length |w||w| such that 𝗅𝗉𝖺𝗅w[i]\mathsf{lpal}_{w}[i] is the length of the longest suffix palindrome of w[1..i]w[1..i]. The 𝗅𝗉𝖺𝗅\mathsf{lpal}-encoding is helpful because two strings xx and yy pal-match iff 𝗅𝗉𝖺𝗅x=𝗅𝗉𝖺𝗅y\mathsf{lpal}_{x}=\mathsf{lpal}_{y}. When TT is large and static, and patterns come online later, one might think of preprocessing TT to construct an index for pal-matching. An index for pal-matching is to support the counting queries that compute the number 𝗈𝖼𝖼\mathsf{occ} of occurrences of PP and the locating queries that compute the occurrences of PP. For this purpose, I et al. [19] proposed the palindrome suffix tree of TT, which is a compacted tree of the 𝗅𝗉𝖺𝗅\mathsf{lpal}-encoded suffixes of TT. The palindrome suffix tree takes O(nlgn)O(n\lg n) bits of space and supports the counting queries in O(mlgσ)O(m\lg\sigma) time and the locating queries in O(mlgσ+𝗈𝖼𝖼)O(m\lg\sigma+\mathsf{occ}) time, where σ\sigma is the size of the alphabet from which characters in TT are taken and 𝗈𝖼𝖼\mathsf{occ} is the number of occurrences.

In this paper, we present a new index, named the PalFM-index, by applying the technique of the FM-index [7] to the pal-matching problem. In so doing we introduce a new encoding, named the 𝗌𝗌𝗉\mathsf{ssp}-encoding, that is based on the non-trivial shortest suffix-palindrome of each prefix. In contrast to the 𝗅𝗉𝖺𝗅\mathsf{lpal}-encoding, the 𝗌𝗌𝗉\mathsf{ssp}-encoding has a good property to design the PalFM-index. The PalFM-index occupies 2nlgmin(σ,lgn)+2n+o(n)2n\lg\min(\sigma,\lg n)+2n+o(n) bits of space and supports the counting queries in O(m)O(m) time. The locating queries can be supported in O(m+Δ𝗈𝖼𝖼)O(m+\Delta\mathsf{occ}) time by adding nΔlgn+n+o(n)\frac{n}{\Delta}\lg n+n+o(n) bits of space, where Δ\Delta is a parameter chosen from {1,2,,n}\{1,2,\dots,n\} in the preprocessing phase.

1.1 Related work

One of the well-studied algorithmic problems related to palindromes is factorizing a string into non-empty palindromes, or in other words, recognizing a string that is obtained by concatenating a certain number of non-empty palindromes [26, 24, 12, 9, 20, 25, 3, 29]. The combinatorial properties discovered during tackling this factorization problem are useful to work on palindromes-related problems.

Developing techniques of designing space-efficient indexes for generalized pattern matching is of great interest. Our PalFM-index was inspired by that of Kim and Cho [23], which is a simplified version of the FM-index for parameterized pattern matching [13]. Indexes based on the FM-index for other generalized pattern matching problems were considered in [14, 11, 22].

2 Preliminaries

2.1 Notations

An integer interval {i,i+1,,j}\{i,i+1,\dots,j\} is denoted by [i..j][i..j], where [i..j][i..j] represents the empty interval if i>ji>j.

Let Σ\Sigma be a finite alphabet, a set of characters. An element of Σ\Sigma^{*} is called a string. The length of a string ww is denoted by |w||w|. The empty string ε\varepsilon is a string of length 0, that is, |ε|=0|\varepsilon|=0. The concatenated string of two strings xx and yy are denoted as xyx\cdot y or simply xyxy. The ii-th character of a string ww is denoted by w[i]w[i] for 1i|w|1\leq i\leq|w|, and the substring of a string ww that begins at position ii and ends at position jj is denoted by w[i..j]w[i..j] for 1ij|w|1\leq i\leq j\leq|w|, i.e. w[i..j]=w[i]w[i+1]w[j]w[i..j]=w[i]w[i+1]\dots w[j]. For convenience, let w[i..j]=εw[i..j]=\varepsilon if i>ji>j. A substring of the form w[1..j]w[1..j] (resp. w[i..|w|]w[i..|w|]) is called a prefix (resp. suffix) of ww and denoted as w[..j]w[..j] (resp. w[i..]w[i..]) in shorthand. Note that ε\varepsilon is a substring/prefix/suffix of any string ww. A substring of ww is called proper if it is not ww itself. When needed we use parentheses to indicate positions in a concatenated string, for example, (xy)[i](xy)[i] refers to the ii-th character of the string xyxy. Hence, (xy)[i](xy)[i] should be distinguished from xy[i]xy[i], which can be interpreted as the concatenated string of xx and y[i]y[i].

Let \prec denote the total order over an alphabet we consider. In particular, we will consider strings over a set consisting of integers and \infty, in which natural total order based on their values is employed. We extend \prec to denote the lexicographic order of strings over the alphabet. For any strings xx and yy that do not match, we say that xx is lexicographically smaller than yy and denote it by xyx\prec y iff x[i+1]y[i+1]x[i+1]\prec y[i+1] for largest integer ii with x[..i]=y[..i]x[..i]=y[..i], where we assume that x[i+1]x[i+1] or y[i+1]y[i+1] refers to the lexicographically smallest character $\$ if it points to out of bounds.

For any string ww, let wRw^{R} denote the reversed string of ww, that is, wR=w[|w|]w[2]w[1]w^{R}=w[|w|]\cdots w[2]w[1]. A string ww is called a palindrome if w=wRw=w^{R}. The radius of a palindrome ww is |w|2\frac{|w|}{2}. The center of a palindromic substring w[i..j]w[i..j] of a string ww is i+j2\frac{i+j}{2}. A palindromic substring w[i..j]w[i..j] is called the maximal palindrome at the center i+j2\frac{i+j}{2} if no other palindromes at the center i+j2\frac{i+j}{2} have a larger radius than w[i..j]w[i..j], i.e., if w[i1]w[j+1]w[i-1]\neq w[j+1], i=1i=1, or j=|w|j=|w|.

Two strings xx and yy of same length are said to palindrome pattern match (pal-match in short) iff they have the same palindromic structures, i.e., the following condition holds: for any possible 1i<j|x|=|y|1\leq i<j\leq|x|=|y|, x[i..j]x[i..j] is a palindrome iff y[i..j]y[i..j] is a palindrome. For example, 𝚊𝚋𝚌𝚋𝚊𝚊𝚌𝚊\mathtt{abcbaaca} and 𝚋𝚌𝚊𝚌𝚋𝚋𝚍𝚋\mathtt{bcacbbdb} pal-match since their palindromic structures coincide (see Figure 1). Note that pal-matching induces a substring consistent equivalent relation [27], i.e., if xx and yy pal-match then x[i..j]x[i..j] and y[i..j]y[i..j] pal-match for any possible 1i<j|x|=|y|1\leq i<j\leq|x|=|y|.

Refer to caption
Figure 1: Illustration of the palindromic structures for pal-matching strings 𝚊𝚋𝚌𝚋𝚊𝚊𝚌𝚊\mathtt{abcbaaca} and 𝚋𝚌𝚊𝚌𝚋𝚋𝚍𝚋\mathtt{bcacbbdb}. Check that the radii of their maximal palindromes for all possible centers, which are illustrated by two-headed arrows, coincide.

The pal-matching problem is to search for, in a text string TT, the occurrences of the substrings that pal-match with a pattern PP. In the pal-matching problem, an occurrence of PP refers to a position ii such that T[i..i+|P|1]T[i..i+|P|-1] and PP pal-match. Throughout this paper we consider indexing a text TT of length nn over an alphabet Σ\Sigma of size σ\sigma.

2.2 Toolbox

As a component of our PalFM-index, we use a data structure for a string ww over an integer alphabet UU supporting the following queries.

  • 𝗋𝖺𝗇𝗄w(i,c)\mathsf{rank}_{w}(i,c): return the number of occurrences of character cUc\in U in w[..i]w[..i].

  • 𝗌𝖾𝗅𝖾𝖼𝗍w(i,c)\mathsf{select}_{w}(i,c): return the ii-th smallest position of the occurrences of character cUc\in U in ww.

  • 𝗋𝖺𝗇𝗀𝖾𝖢𝗈𝗎𝗇𝗍w(i,j,c,d)\mathsf{rangeCount}_{w}(i,j,c,d): return the number of the occurrences of any character in [c..d]U[c..d]\subseteq U in w[i..j]w[i..j].

The Wavelet tree [17] supports these queries in O(lg|Σ|)O(\lg|\Sigma|) time using |w|0(w)+o(|w|lg|U|)|w|\mathcal{H}_{0}(w)+o(|w|\lg|U|) bits of space, where 0(w)=O(lg|U|)\mathcal{H}_{0}(w)=O(\lg|U|) is the 0-th order empirical entropy of ww. The subsequent studies [8, 16] improved the complexities, resulting in the following theorem.

Theorem 1 ([16]).

For a string ww over an integer alphabet UU, there is a data structure in |w|0(w)+o(|w|)|w|\mathcal{H}_{0}(w)+o(|w|) bits of space that supports 𝗋𝖺𝗇𝗄\mathsf{rank}, 𝗌𝖾𝗅𝖾𝖼𝗍\mathsf{select} and 𝗋𝖺𝗇𝗀𝖾𝖢𝗈𝗎𝗇𝗍\mathsf{rangeCount} in O(1+lg|U|lglg|w|)O(1+\frac{\lg|U|}{\lg\lg|w|}) time.

We also use a data structure for the Range Maximum Queries (RMQs) over an integer array VV. Given an interval [i..j][i..j] over VV, a query 𝖱𝖬𝖰V(i,j)\mathsf{RMQ}_{V}(i,j) returns a position in [i..j][i..j] that has the maximum value in V[i..j]V[i..j], that is, 𝖱𝖬𝖰V(i,j)=argmaxk[i..j]V[k]\mathsf{RMQ}_{V}(i,j)=\arg\max_{k\in[i..j]}V[k]. We use the following result.

Theorem 2 ([10]).

For an integer array VV of length nn, there is a data structure with 2n+o(n)2n+o(n) bits of space that supports the RMQs in O(1)O(1) time.

2.3 FM-index

The suffix array 𝖲𝖠\mathsf{SA} of TT is the integer array of length n+1n+1 such that 𝖲𝖠[i]\mathsf{SA}[i] is the starting position of the lexicographically ii-th suffix of TT.111Against convention, we include the empty string that starts with the position n+1n+1 to 𝖲𝖠\mathsf{SA}. In particular, 𝖲𝖠[1]=n+1\mathsf{SA}[1]=n+1 holds as the empty string is always the smallest suffix. We define the string 𝖫\mathsf{L} (a.k.a. the Burrows-Wheeler Transform (BWT) [5] of TT) of length n+1n+1 as follows:

𝖫[i]={$(𝖲𝖠[i]=1),T[𝖲𝖠[i]1](𝖲𝖠[i]>1).\mathsf{L}[i]=\begin{cases}\$&(\mathsf{SA}[i]=1),\\ T[\mathsf{SA}[i]-1]&(\mathsf{SA}[i]>1).\end{cases}

We define the string 𝖥\mathsf{F} of length n+1n+1 as 𝖥=T[𝖲𝖠[1]]T[𝖲𝖠[2]]T[𝖲𝖠[n+1]]\mathsf{F}=T[\mathsf{SA}[1]]T[\mathsf{SA}[2]]\cdots T[\mathsf{SA}[n+1]]. The so-called LF-mapping 𝖫𝖥\mathsf{LF} is the function defined to map a position ii to jj such that 𝖲𝖠[j]=𝖲𝖠[i]1\mathsf{SA}[j]=\mathsf{SA}[i]-1 (with the corner case 𝖫𝖥(i)=1\mathsf{LF}(i)=1 for 𝖲𝖠[i]=1\mathsf{SA}[i]=1). A crucial point is that LF-mapping can be efficiently implemented by rank queries on 𝖫\mathsf{L} and select queries on 𝖥\mathsf{F} with 𝖫𝖥(i)=𝗌𝖾𝗅𝖾𝖼𝗍𝖥(𝗋𝖺𝗇𝗄𝖫(i,𝖫[i]),𝖫[i])\mathsf{LF}(i)=\mathsf{select}_{\mathsf{F}}(\mathsf{rank}_{\mathsf{L}}(i,\mathsf{L}[i]),\mathsf{L}[i]). 222In the plain LF-mapping, select queries on 𝖥\mathsf{F} can be implemented by a simple table that counts, for each character cc, the number of occurrences of characters smaller than cc in TT, but it is not the case in our generalized LF-mapping for pal-matching. The occurrences of pattern PP in TT can be answered by finding the maximal interval [Pb..Pe][P_{b}..P_{e}] in the 𝖲𝖠\mathsf{SA} array such that T[𝖲𝖠[i]..]T[\mathsf{SA}[i]..] is prefixed by PP iff i[Pb..Pe]i\in[P_{b}..P_{e}], and computing the 𝖲𝖠\mathsf{SA}-values in the interval. For a string ww and character cc, the so-called backward search computes the maximal interval in the 𝖲𝖠\mathsf{SA} prefixed by cwcw from that of ww using a similar mechanism of the LF-mapping (see [7] for more details).

3 Encodings for pal-matching

The pal-matching algorithms in [19] are based on the 𝗅𝗉𝖺𝗅\mathsf{lpal}-encoding of a string ww, denoted as 𝗅𝗉𝖺𝗅w\mathsf{lpal}_{w}. 𝗅𝗉𝖺𝗅w\mathsf{lpal}_{w} is the integer array of length |w||w| such that, for any position 1i|w|1\leq i\leq|w|, 𝗅𝗉𝖺𝗅w[i]\mathsf{lpal}_{w}[i] is the length of the longest suffix-palindrome of w[1..i]w[1..i]. See Table 1 for example.

Lemma 3 (Lemma 2 in [19]).

For any strings xx and yy, xx and yy pal-match iff 𝗅𝗉𝖺𝗅x=𝗅𝗉𝖺𝗅y\mathsf{lpal}_{x}=\mathsf{lpal}_{y}.

Although Lemma 3 is sufficient to design suffix-tree type indexes, it seems that the 𝗅𝗉𝖺𝗅\mathsf{lpal}-encoding is not suitable to design FM-index type indexes. For example, more than one position could change when a character is prepended (see Table 1) and this unstable property make messes up lexicographic order of 𝗅𝗉𝖺𝗅\mathsf{lpal}-encoded suffixes, which prevents us to implement LF-mapping space efficiently.

In this paper, we introduce a new encoding suitable to design FM-index type indexes for pal-matching. Our new encoding is based on the shortest suffix-palindrome for each prefix, where the shortest suffix is chosen excluding the trivial palindromes of length 1\leq 1. We call the encoding the shortest suffix-palindrome encoding (the 𝗌𝗌𝗉\mathsf{ssp}-encoding in short). For any string ww, the 𝗌𝗌𝗉\mathsf{ssp}-encoding 𝗌𝗌𝗉w\mathsf{ssp}_{w} of ww is the integer array of length |w||w| such that, for any position 1i|w|1\leq i\leq|w|, 𝗌𝗌𝗉w[i]\mathsf{ssp}_{w}[i] is the length of the non-trivial shortest suffix-palindrome of w[..i]w[..i] if such exists, and otherwise \infty. See Table 1 for example.

Table 1: A comparison between 𝗅𝗉𝖺𝗅\mathsf{lpal} and 𝗌𝗌𝗉\mathsf{ssp} for w=𝚊𝚋𝚋𝚋𝚊𝚋𝚋w=\mathtt{abbbabb} and w=𝚋w=𝚋𝚊𝚋𝚋𝚋𝚊𝚋𝚋w^{\prime}=\mathtt{b}w=\mathtt{babbbabb}. The values that change when prepending 𝚋\mathtt{b} to ww are underlined.
w=w= 𝚊\mathtt{a} 𝚋\mathtt{b} 𝚋\mathtt{b} 𝚋\mathtt{b} 𝚊\mathtt{a} 𝚋\mathtt{b} 𝚋\mathtt{b}
𝗅𝗉𝖺𝗅w=\mathsf{lpal}_{w}= 1 1 2 3 5 3 5
𝗌𝗌𝗉w=\mathsf{ssp}_{w}= \infty \infty 2 2 5 3 2
w=w^{\prime}= 𝚋\mathtt{b} 𝚊\mathtt{a} 𝚋\mathtt{b} 𝚋\mathtt{b} 𝚋\mathtt{b} 𝚊\mathtt{a} 𝚋\mathtt{b} 𝚋\mathtt{b}
𝗅𝗉𝖺𝗅w=\mathsf{lpal}_{w^{\prime}}= 1 1 3 2 3 5 7 5
𝗌𝗌𝗉w=\mathsf{ssp}_{w^{\prime}}= \infty \infty 3 2 2 5 3 2
Lemma 4.

Two strings xx and yy pal-match iff 𝗌𝗌𝗉x=𝗌𝗌𝗉y\mathsf{ssp}_{x}=\mathsf{ssp}_{y}.

Proof.

Since the 𝗌𝗌𝗉\mathsf{ssp}-encoding relies only on palindromic structures, the direction from left to right is clear.

In what follows, we focus on the opposite direction; xx and yy pal-match if 𝗌𝗌𝗉x=𝗌𝗌𝗉y\mathsf{ssp}_{x}=\mathsf{ssp}_{y}. Assume for contrary that xx and yy does not pal-match. Without loss of generality, we can assume that there are positions ii and jj such that x[i..j]x[i..j] is a palindrome but y[i..j]y[i..j] is not, with smallest jj if there are many. Note that the smallest assumption on jj implies that y[i+1..j1]y[i+1..j-1] is a palindrome: If y[i+1..j1]y[i+1..j-1] is not a palindrome (clearly |y[i+1..j1]|>1|y[i+1..j-1]|>1 in such a case), j1j-1 must be a smaller position that satisfies the above condition because x[i+1..j1]x[i+1..j-1] is a palindrome. Let k=𝗌𝗌𝗉x[j]=𝗌𝗌𝗉y[j]k=\mathsf{ssp}_{x}[j]=\mathsf{ssp}_{y}[j]. Since x[i..j]x[i..j] is a palindrome, it holds that 1<k|x[i..j]|1<k\leq|x[i..j]|. Moreover, k|y[i..j]|k\neq|y[i..j]| as y[i..j]y[i..j] is not a palindrome. Since the palindrome x[i..j]x[i..j] has a suffix-palindrome of length kk, the prefix x[i..i+k1]x[i..i+k-1] of length kk is a palindrome, too. On the other hand, since y[i..j]y[i..j] is not a palindrome that has a suffix-palindrome of length kk, the prefix y[i..i+k1]y[i..i+k-1] of length kk cannot be a palindrome. This contradicts the smallest assumption on jj because i+k1i+k-1 is a smaller position such that x[i..i+k1]x[i..i+k-1] and y[i..i+k1]y[i..i+k-1] disagree on their palindromic structures. ∎

In contrast to the 𝗅𝗉𝖺𝗅\mathsf{lpal}-encoding, the 𝗌𝗌𝗉\mathsf{ssp}-encoding has a stable property when prepending a character.

Lemma 5.

For any string ww and character cc, there is at most one position i(1i|w|)i~{}(1\leq i\leq|w|) such that 𝗌𝗌𝗉w[i]𝗌𝗌𝗉cw[i+1]\mathsf{ssp}_{w}[i]\neq\mathsf{ssp}_{cw}[i+1]. Moreover, if such a position ii exists, 𝗌𝗌𝗉w[i]=\mathsf{ssp}_{w}[i]=\infty and 𝗌𝗌𝗉cw[i+1]=i+1\mathsf{ssp}_{cw}[i+1]=i+1.

Proof.

By definition it is obvious that 𝗌𝗌𝗉w[i]=𝗌𝗌𝗉cw[i+1]\mathsf{ssp}_{w}[i]=\mathsf{ssp}_{cw}[i+1] if 𝗌𝗌𝗉w[i]\mathsf{ssp}_{w}[i]\neq\infty. In what follows, we assume for contrary that there exist two positions ii and ii^{\prime} with 1i<i|w|1\leq i<i^{\prime}\leq|w| such that 𝗌𝗌𝗉w[i]=>𝗌𝗌𝗉cw[i+1]\mathsf{ssp}_{w}[i]=\infty>\mathsf{ssp}_{cw}[i+1] and 𝗌𝗌𝗉w[i]=>𝗌𝗌𝗉cw[i+1]\mathsf{ssp}_{w}[i^{\prime}]=\infty>\mathsf{ssp}_{cw}[i^{\prime}+1]. Note that 𝗌𝗌𝗉cw[i+1]=i+1\mathsf{ssp}_{cw}[i+1]=i+1 and 𝗌𝗌𝗉cw[i+1]=i+1\mathsf{ssp}_{cw}[i^{\prime}+1]=i^{\prime}+1 by definition, and (cw)[..i+1](cw)[..i+1] and (cw)[..i+1](cw)[..i^{\prime}+1] are palindromes. Since (cw)[..i+1](cw)[..i+1] is a prefix-palindrome of (cw)[..i+1](cw)[..i^{\prime}+1], it is also a suffix-palindrome of (cw)[..i+1](cw)[..i^{\prime}+1]. It contradicts that (cw)[..i+1](cw)[..i^{\prime}+1] is the non-trivial shortest suffix-palindrome of (cw)[..i+1](cw)[..i^{\prime}+1]. ∎

We consider yet another encoding based on the shortest suffix of w[..i1]w[..i-1] that is extended outwards when appending a character w[i]w[i]. The concept is closely related to the 𝗌𝗌𝗉\mathsf{ssp}-encoding because the extended palindrome is the non-trivial shortest suffix-palindrome of w[..i]w[..i]. An advantage of this new encoding is that we can reduce the number of distinct integers to be used to O(min(σ,lg|w|))O(\min(\sigma,\lg|w|)), which will be used (in a symmetric way) to define 𝖫𝗉𝖺𝗅\mathsf{L}_{\mathsf{pal}} and obtain a space-efficient FM-index specialized for pal-matching.

For any string ww we partition the suffix-palindromes (including the empty suffix) by the characters they have immediately to their left and call each group a suffix-pal-group for ww. We utilize the following lemma.

Lemma 6.

For any string ww, the number of suffix-pal-groups for ww is O(min(σ,lg|w|))O(\min(\sigma,\lg|w|)).

Proof.

It is obvious that the number of suffix-pal-groups is at most σ\sigma because each character is associated to at most one suffix-pal-group. Also it is known that the lengths of the suffix-palindromes can be represented by O(lg|w|)O(\lg|w|) arithmetic progressions and each arithmetic progression induces a period in the involved suffix (e.g., see [20]). Then we can see that every suffix-palindrome represented by an arithmetic progression is in the same group. Hence there are O(lg|w|)O(\lg|w|) groups. ∎

The next lemma shows that pal-matching strings share the same structure of suffix-pal-groups.

Lemma 7.

Let xx and yy be strings that pal-match and let ii and jj be integers with 1i<j|x|=|y|1\leq i<j\leq|x|=|y|. If x[i+1..]x[i+1..] and x[j+1..]x[j+1..] are palindromes with x[i]=x[j]x[i]=x[j], then y[i+1..]y[i+1..] and y[j+1..]y[j+1..] are palindromes with y[i]=y[j]y[i]=y[j].

Proof.

Since the palindrome x[i+1..]x[i+1..] has a suffix-palindrome of length k=|x[j+1..]|k=|x[j+1..]|, it also has a prefix-palindrome of length kk, that is, x[i+1..i+k]x[i+1..i+k] is a palindrome. Also, x[i+k+1]=x[j]x[i+k+1]=x[j] holds. Since x[i]=x[j]=x[i+k+1]x[i]=x[j]=x[i+k+1], x[i..i+k+1]x[i..i+k+1] is a palindrome.

Since xx and yy pal-match, y[i+1..]y[i+1..], y[j+1..]y[j+1..] and y[i..i+k+1]y[i..i+k+1] are palindromes. By transition of equivalence induced by the palindromes y[i..i+k+1]y[i..i+k+1] and y[i+1..]y[i+1..], we can see that y[i]=y[i+k+1]=y[j]y[i]=y[i+k+1]=y[j]. Thus the claim holds. ∎

Let the shortest palindrome in a suffix-pal-group be the representative of the group. We assign consecutive integer identifiers starting from 11 to the suffix-pal-groups in increasing order of their representative’s lengths. See Figure 2 for example.

Refer to caption
Figure 2: An example of suffix-pal-groups for 𝚋𝚊𝚋𝚊𝚋𝚊𝚋𝚊𝚋𝚊𝚌𝚊𝚋𝚊𝚋𝚊𝚌𝚊𝚋𝚊𝚋𝚊𝚌𝚊𝚋𝚊𝚋𝚊\mathtt{bababababacababacababacababa}. The number enclosed in a circle denotes the pal-group-id. The suffix-palindromes in the suffix-pal-group with identifier 11 (resp. 22 and 33) have 𝚊\mathtt{a} (resp. 𝚋\mathtt{b} and 𝚌\mathtt{c}) immediately to their left. The identifiers are given in increasing order of their representative’s lengths, that is, |ε|=0,|𝚊|=1|\varepsilon|=0,|\mathtt{a}|=1 and |𝚊𝚋𝚊𝚋𝚊|=5|\mathtt{ababa}|=5.

For any string ww, we define the shortest suffix-pal-group encoding 𝗌𝗌𝗉𝗀w\mathsf{sspg}_{w} of ww as the integer array of length |w||w| such that, for any position 1i|w|1\leq i\leq|w|, 𝗌𝗌𝗉𝗀w[i]\mathsf{sspg}_{w}[i] is the identifier assigned to the suffix-pal-group of the suffix-palindrome in w[..i1]w[..i-1] that is extended outwards by appending w[i]w[i], if such exists, and otherwise \infty. See Table 2 and Figure 3 for example. Since the non-trivial shortest suffix of w[..i]w[..i] is extended outwards from the representative of the suffix-pal-group for w[1..i1]w[1..i-1] that has w[i]w[i] immediately to the left, 𝗌𝗌𝗉𝗀w[i]\mathsf{sspg}_{w}[i] has essentially equivalent information to 𝗌𝗌𝗉w[i]\mathsf{ssp}_{w}[i]. Formally the next lemma holds.

Lemma 8.

For any string xx of length kk, suppose we have the set of lengths of the representatives of suffix-pal-gropus of x[..k1]x[..k-1]. Given 𝗌𝗌𝗉𝗀x[k]\mathsf{sspg}_{x}[k] we can identify 𝗌𝗌𝗉x[k]\mathsf{ssp}_{x}[k], and vice versa.

Proof.

It is clear that 𝗌𝗌𝗉x[k]=\mathsf{ssp}_{x}[k]=\infty iff 𝗌𝗌𝗉𝗀x[k]=\mathsf{sspg}_{x}[k]=\infty. Given 𝗌𝗌𝗉𝗀x[k]\mathsf{sspg}_{x}[k]\neq\infty we can identify 𝗌𝗌𝗉x[k]\mathsf{ssp}_{x}[k] from the representative of the suffix-pal-group with identifier 𝗌𝗌𝗉𝗀x[k]\mathsf{sspg}_{x}[k]. Given 𝗌𝗌𝗉x[k]\mathsf{ssp}_{x}[k]\neq\infty we can identify 𝗌𝗌𝗉𝗀x[k]\mathsf{sspg}_{x}[k] from the representative that has length 𝗌𝗌𝗉x[k]2\mathsf{ssp}_{x}[k]-2. ∎

The next lemma shows that the 𝗌𝗌𝗉𝗀\mathsf{sspg}-encoding is another encoding for pal-matching, and induces the same lexicographic order with the 𝗌𝗌𝗉\mathsf{ssp}-encoding.

Lemma 9.

Let xx and yy be strings of length kk such that 𝗌𝗌𝗉x[..k1]=𝗌𝗌𝗉y[..k1]\mathsf{ssp}_{x}[..k-1]=\mathsf{ssp}_{y}[..k-1]. Then, 𝗌𝗌𝗉x[k]=𝗌𝗌𝗉y[k]\mathsf{ssp}_{x}[k]=\mathsf{ssp}_{y}[k] iff 𝗌𝗌𝗉𝗀x[k]=𝗌𝗌𝗉𝗀y[k]\mathsf{sspg}_{x}[k]=\mathsf{sspg}_{y}[k]. Also, 𝗌𝗌𝗉x[k]<𝗌𝗌𝗉y[k]\mathsf{ssp}_{x}[k]<\mathsf{ssp}_{y}[k] iff 𝗌𝗌𝗉𝗀x[k]<𝗌𝗌𝗉𝗀y[k]\mathsf{sspg}_{x}[k]<\mathsf{sspg}_{y}[k].

Proof.

It follows from Lemma 7 that x[..k1]x[..k-1] and y[..k1]y[..k-1] have the same structure of suffix-pal-groups. By Lemma 8, 𝗌𝗌𝗉x[k]=𝗌𝗌𝗉y[k]\mathsf{ssp}_{x}[k]=\mathsf{ssp}_{y}[k] if 𝗌𝗌𝗉𝗀x[k]=𝗌𝗌𝗉𝗀y[k]\mathsf{sspg}_{x}[k]=\mathsf{sspg}_{y}[k], and vice versa. Since the identifiers of suffix-pal-groups are given in increasing order of their representative’s lengths, it holds that 𝗌𝗌𝗉x[k]<𝗌𝗌𝗉y[k]\mathsf{ssp}_{x}[k]<\mathsf{ssp}_{y}[k] if and only if 𝗌𝗌𝗉𝗀x[k]<𝗌𝗌𝗉𝗀y[k]\mathsf{sspg}_{x}[k]<\mathsf{sspg}_{y}[k]. ∎

Table 2: A comparison between 𝗌𝗌𝗉w\mathsf{ssp}_{w} and 𝗌𝗌𝗉𝗀w\mathsf{sspg}_{w} for w=𝚋𝚊𝚋𝚋𝚋𝚊𝚋𝚋w=\mathtt{babbbabb}. 𝗌𝗌𝗉w[6]=5\mathsf{ssp}_{w}[6]=5 because the non-trivial shortest suffix-palindrome of w[1..6]=𝚋𝚊𝚋𝚋𝚋𝚊w[1..6]=\mathtt{babbba} is 𝚊𝚋𝚋𝚋𝚊\mathtt{abbba}, which is of length 55. On the other hand, 𝗌𝗌𝗉𝗀w[6]=2\mathsf{sspg}_{w}[6]=2 because the shortest suffix-palindrome 𝚊𝚋𝚋𝚋𝚊\mathtt{abbba} ending at 66 is extended from 𝚋𝚋𝚋\mathtt{bbb} and the suffix-pal-group to which 𝚋𝚋𝚋\mathtt{bbb} belongs for w[1..5]=𝚋𝚊𝚋𝚋𝚋w[1..5]=\mathtt{babbb} has the identifier 22.
w=w= 𝚋\mathtt{b} 𝚊\mathtt{a} 𝚋\mathtt{b} 𝚋\mathtt{b} 𝚋\mathtt{b} 𝚊\mathtt{a} 𝚋\mathtt{b} 𝚋\mathtt{b}
𝗌𝗌𝗉w=\mathsf{ssp}_{w}= \infty \infty 3 2 2 5 3 2
𝗌𝗌𝗉𝗀w=\mathsf{sspg}_{w}= \infty \infty 2 1 1 2 2 2
Refer to caption
Figure 3: Illustration to show 𝗌𝗌𝗉𝗀w[6]=2\mathsf{sspg}_{w}[6]=2 for w=𝚋𝚊𝚋𝚋𝚋𝚊𝚋𝚋w=\mathtt{babbbabb}.

For any string ww, let π(w)=𝗌𝗌𝗉𝗀wR[|w|]\pi(w)=\mathsf{sspg}_{w^{R}}[|w|]. Intuitively, π(w)\pi(w) holds the information from which prefix-palindrome of w[2..]w[2..] the non-trivial shortest prefix-palindrome of ww is extended, and the information is encoded with the identifier defined in the completely symmetric way as the case of the suffix-pal-groups. The function π()\pi(\cdot) will be applied to the suffixes of TT to define 𝖥𝗉𝖺𝗅\mathsf{F}_{\mathsf{pal}} and 𝖫𝗉𝖺𝗅\mathsf{L}_{\mathsf{pal}}, and the next lemma is a key to implement LF-mapping for our PalFM-index.

Lemma 10.

Let xx and yy be strings of length 1\geq 1 such that π(x)=π(y)\pi(x)=\pi(y). Then, 𝗌𝗌𝗉x𝗌𝗌𝗉y\mathsf{ssp}_{x}\prec\mathsf{ssp}_{y} iff 𝗌𝗌𝗉x[2..]𝗌𝗌𝗉y[2..]\mathsf{ssp}_{x[2..]}\prec\mathsf{ssp}_{y[2..]}.

Proof.

Let ii be the largest integer such that x[2..i]x[2..i] and y[2..i]y[2..i] pal-match. Since π(x)=π(y)\pi(x)=\pi(y), using Lemma 9 in a symmetric way, it holds that x[..i]x[..i] and y[..i]y[..i] pal-match. Recall Lemma 5 that at most one \infty in 𝗌𝗌𝗉x[2..]\mathsf{ssp}_{x[2..]} (resp. 𝗌𝗌𝗉y[2..]\mathsf{ssp}_{y[2..]}) turns into the largest possible integer at the changed position when prepending x[1]x[1] (resp. y[1])y[1]). We analyze the cases focusing on the changed positions:

  1. 1.

    The claim clearly holds if neither 𝗌𝗌𝗉x\mathsf{ssp}_{x} nor 𝗌𝗌𝗉y\mathsf{ssp}_{y} has the changed position less than or equal to i+1i+1.

  2. 2.

    If both of 𝗌𝗌𝗉x\mathsf{ssp}_{x} and 𝗌𝗌𝗉y\mathsf{ssp}_{y} have the changed position at ji+1j\leq i+1, it holds that 𝗌𝗌𝗉x[j]=𝗌𝗌𝗉y[j]=j\mathsf{ssp}_{x}[j]=\mathsf{ssp}_{y}[j]=j and 𝗌𝗌𝗉x[2..][j1]=𝗌𝗌𝗉y[2..][j1]=\mathsf{ssp}_{x[2..]}[j-1]=\mathsf{ssp}_{y[2..]}[j-1]=\infty, which also indicates that j<i+1j<i+1. Since this change does not affect the lexicographic order, the claim holds. See the left part of Figure 4 for an illustration of this case.

  3. 3.

    Assume 𝗌𝗌𝗉y\mathsf{ssp}_{y} has the changed position at ji+1j\leq i+1, but 𝗌𝗌𝗉x\mathsf{ssp}_{x} does not. Since x[..i]x[..i] and y[..i]y[..i] pal-match, jj cannot be less than i+1i+1, and hence, j=i+1j=i+1 and 𝗌𝗌𝗉x[i+1]=𝗌𝗌𝗉x[2..][i]i+1=𝗌𝗌𝗉y[i+1]=𝗌𝗌𝗉y[2..][i]\mathsf{ssp}_{x}[i+1]=\mathsf{ssp}_{x[2..]}[i]\prec i+1=\mathsf{ssp}_{y}[i+1]\prec\infty=\mathsf{ssp}_{y[2..]}[i]. Note that the lexicographic order between 𝗌𝗌𝗉x\mathsf{ssp}_{x} and 𝗌𝗌𝗉y\mathsf{ssp}_{y} (resp. 𝗌𝗌𝗉x[2..]\mathsf{ssp}_{x[2..]} and 𝗌𝗌𝗉y[2..]\mathsf{ssp}_{y[2..]}) is determined by that between 𝗌𝗌𝗉x[i+1]\mathsf{ssp}_{x}[i+1] and 𝗌𝗌𝗉y[i+1]\mathsf{ssp}_{y}[i+1] (resp. 𝗌𝗌𝗉x[2..][i]\mathsf{ssp}_{x[2..]}[i] and 𝗌𝗌𝗉y[2..][i]\mathsf{ssp}_{y[2..]}[i]). Since the lexicographic order between 𝗌𝗌𝗉x[i+1]\mathsf{ssp}_{x}[i+1] and 𝗌𝗌𝗉y[i+1]\mathsf{ssp}_{y}[i+1] is the same as that between 𝗌𝗌𝗉x[2..][i]\mathsf{ssp}_{x[2..]}[i] and 𝗌𝗌𝗉y[2..][i]\mathsf{ssp}_{y[2..]}[i], the claim holds. See the right part of Figure 4 for an illustration of this case.

Thus, we conclude that the lemma holds. ∎

Refer to caption
Figure 4: The left (resp. right) figure illustrates the second (resp. third) case in the proof of Lemma 10.

4 Computational results for new encodings

In this section, we show that the 𝗌𝗌𝗉\mathsf{ssp}- and 𝗌𝗌𝗉𝗀\mathsf{sspg}-encodings can be computed in linear time for a given string.

We use the following known results.

Lemma 11 ([26]).

For any string ww, we can compute all the maximal palindromes in O(|w|)O(|w|) time.

Lemma 12 (Lemma 3 in [19]).

For any string ww, we can compute 𝗅𝗉𝖺𝗅w\mathsf{lpal}_{w} in O(|w|)O(|w|) time.

Using Lemmas 11 and 12, we obtain:

Lemma 13.

For any string ww, we can compute 𝗌𝗌𝗉w\mathsf{ssp}_{w} in O(|w|)O(|w|) time.

Proof.

Manacher’s algorithm [26] can compute the radius of the maximal palindrome in increasing order of centers in linear time. It can be extended to compute the length 𝗅𝗉𝖺𝗅w[i]\mathsf{lpal}_{w}[i] of the longest palindrome ending at each position ii because the maximal palindrome with the smallest center that ends at position i\geq i gives us the longest suffix-palindrome ending at ii by truncating the palindrome at ii (e.g., see Lemma 3 of [19]). In a similar way, we can compute the length 𝗅𝗉𝖺𝗅w[i]\mathsf{lpal}^{\prime}_{w}[i] of the second longest palindrome ending at ii.

Using 𝗅𝗉𝖺𝗅w\mathsf{lpal}_{w} and 𝗅𝗉𝖺𝗅w\mathsf{lpal}^{\prime}_{w}, we can compute 𝗌𝗌𝗉w[i]\mathsf{ssp}_{w}[i] in increasing order as follows:

  1. 1.

    If 𝗅𝗉𝖺𝗅w[i]=1\mathsf{lpal}_{w}[i]=1, then 𝗌𝗌𝗉w[i]=\mathsf{ssp}_{w}[i]=\infty.

  2. 2.

    If 𝗅𝗉𝖺𝗅w[i]>1\mathsf{lpal}_{w}[i]>1 and 𝗅𝗉𝖺𝗅w[i]=1\mathsf{lpal}^{\prime}_{w}[i]=1, then 𝗌𝗌𝗉w[i]=𝗅𝗉𝖺𝗅w[i]\mathsf{ssp}_{w}[i]=\mathsf{lpal}_{w}[i].

  3. 3.

    If 𝗅𝗉𝖺𝗅w[i]>1\mathsf{lpal}_{w}[i]>1 and 𝗅𝗉𝖺𝗅w[i]>1\mathsf{lpal}^{\prime}_{w}[i]>1, then 𝗌𝗌𝗉w[i]=𝗌𝗌𝗉w[i𝗅𝗉𝖺𝗅w[i]+𝗅𝗉𝖺𝗅w[i]]\mathsf{ssp}_{w}[i]=\mathsf{ssp}_{w}[i-\mathsf{lpal}_{w}[i]+\mathsf{lpal}^{\prime}_{w}[i]].

In the third case, we use the fact that the non-trivial shortest suffix-palindrome ending at ii has length 𝗅𝗉𝖺𝗅w[i]\leq\mathsf{lpal}^{\prime}_{w}[i] and it ends at i𝗅𝗉𝖺𝗅w[i]+𝗅𝗉𝖺𝗅w[i]i-\mathsf{lpal}_{w}[i]+\mathsf{lpal}^{\prime}_{w}[i], too.

Clearly all can be done in O(|w|)O(|w|) time. ∎

For any string ww, let 𝖦w\mathsf{G}_{w} denote the array of length |w||w| such that 𝖦w[i]\mathsf{G}_{w}[i] stores the number of suffix-pal-groups for w[..i]w[..i].

Lemma 14.

For any string ww, we can compute 𝖦w\mathsf{G}_{w} in O(|w|)O(|w|) time.

Proof.

Let 𝗌𝗉𝗉w\mathsf{spp}_{w} be the array defined in a symmetric way of 𝗌𝗌𝗉w\mathsf{ssp}_{w} such that 𝗌𝗉𝗉w[i]\mathsf{spp}_{w}[i] stores the length of the non-trivial shortest prefix-palindrome starting at ii (or \infty if such a palindrome does not exist). Using Lemma 13 in a symmetric way, we can compute 𝗌𝗉𝗉w\mathsf{spp}_{w} in O(|w|)O(|w|) time.

Let us focus on the palindromes involved in 𝖦w[j]\mathsf{G}_{w}[j]. First, there is a suffix-pal-group for w[..j]w[..j] that has w[j+1]w[j+1] immediately to their left iff 𝗅𝗉𝖺𝗅w[j+1]>1\mathsf{lpal}_{w}[j+1]>1. Next observe that the palindromes in other suffix-pal-groups for w[..j]w[..j], which do not have w[j+1]w[j+1] immediately to their left, are the maximal palindromes ending at jj. Also, a maximal palindrome w[i..j]w[i..j] is the representative (i.e., the shortest palindrome) in a suffix-pal-group to which it belongs. if and only if 𝗌𝗉𝗉w[i1]>|w[i1..j]|\mathsf{spp}_{w}[i-1]>|w[i-1..j]| or i=1i=1. See Figure 5 for illustrations of these observations.

Based on the above observations, we compute 𝖦w\mathsf{G}_{w} as follows: First, we compute the maximal palindromes and 𝗅𝗉𝖺𝗅w\mathsf{lpal}_{w} in O(|w|)O(|w|) time by Lemmas 11 and 12. Next we check every maximal palindrome and assign it to its ending position if it is a representative, which can be done in O(|w|)O(|w|) time in total. We also check if 𝗅𝗉𝖺𝗅w[j+1]>1\mathsf{lpal}_{w}[j+1]>1 for all positions jj in O(|w|)O(|w|) time to count a suffix-pal-group that has w[j+1]w[j+1] immediately to their left. To sum up, 𝖦w\mathsf{G}_{w} can be computed in O(|w|)O(|w|) time. ∎

Refer to caption
Figure 5: The left figure illustrates the case with 𝗅𝗉𝖺𝗅w[j+1]>1\mathsf{lpal}_{w}[j+1]>1, in which we see that there is a suffix-pal-group for w[..j]w[..j] that has w[j+1]=𝚌w[j+1]=\mathtt{c} immediately to their left. The right figure illustrates the case with 𝗌𝗉𝗉w[i1]|w[i1..j]|\mathsf{spp}_{w}[i-1]\leq|w[i-1..j]|, in which we see that the maximal palindrome w[i..j]w[i..j] is not the representative because there is a shorter palindrome that ends at jj and has the same character 𝚌\mathtt{c}^{\prime} immediately to the left.

Generalizing the algorithm presented in the proof of Lemma 14, we obtain:

Lemma 15.

For any string ww, we can compute 𝗌𝗌𝗉𝗀w\mathsf{sspg}_{w} in O(|w|)O(|w|) time.

Proof.

We modify the algorithm presented in the proof of Lemma 14 slightly. Now the task is to count, for every position j+1j+1, the number of suffix-pal-groups for w[..j]w[..j] whose representative is shorter than 𝗌𝗌𝗉[j+1]1\mathsf{ssp}[j+1]-1 because the number is exactly 𝗌𝗌𝗉𝗀w[j+1]\mathsf{sspg}_{w}[j+1] by definition. We check every maximal palindrome w[i..j]w[i..j] and assign it to its ending position jj if 𝗌𝗉𝗉w[i1]>|w[i1..j]|\mathsf{spp}_{w}[i-1]>|w[i-1..j]| and 𝗌𝗌𝗉[j+1]1>ji+1\mathsf{ssp}[j+1]-1>j-i+1. Finally the number of representatives assigned to jj plus one is 𝗌𝗌𝗉𝗀w[j+1]\mathsf{sspg}_{w}[j+1]. Similarly to the proof of Lemma 14, all can be done in O(|w|)O(|w|) time. ∎

5 PalFM-index

The PalFM-index of TT conceptually sort the suffixes of TT in lexicographic order of their 𝗌𝗌𝗉\mathsf{ssp}-encodings (or equivalently 𝗌𝗌𝗉𝗀\mathsf{sspg}-encodings). Let 𝖲𝖠𝗉𝖺𝗅\mathsf{SA}_{\mathsf{pal}} be the integer array of length n+1n+1 such that 𝖲𝖠𝗉𝖺𝗅[i]\mathsf{SA}_{\mathsf{pal}}[i] is the starting position of the ii-th suffix of TT in 𝗌𝗌𝗉\mathsf{ssp}-encoded order. We define the strings 𝖥𝗉𝖺𝗅\mathsf{F}_{\mathsf{pal}} and 𝖫𝗉𝖺𝗅\mathsf{L}_{\mathsf{pal}} of length n+1n+1 based on π\pi function applied to the sorted suffixes. Formally, for any position i(1in+1)i~{}(1\leq i\leq n+1) we define:

𝖥𝗉𝖺𝗅[i]={$if i=1,π(T[𝖲𝖠𝗉𝖺𝗅[i]..])otherwise.\mathsf{F}_{\mathsf{pal}}[i]=\begin{cases}\$&\text{if $i=1$,}\\ \pi(T[\mathsf{SA}_{\mathsf{pal}}[i]..])&\text{otherwise.}\end{cases}
𝖫𝗉𝖺𝗅[i]={$if 𝖲𝖠𝗉𝖺𝗅[i]=1,π(T[𝖲𝖠𝗉𝖺𝗅[i]1..])otherwise.\mathsf{L}_{\mathsf{pal}}[i]=\begin{cases}\$&\text{if $\mathsf{SA}_{\mathsf{pal}}[i]=1$,}\\ \pi(T[\mathsf{SA}_{\mathsf{pal}}[i]-1..])&\text{otherwise.}\end{cases}

See Figure 6 for example.

As in the case of 𝖫𝖥\mathsf{LF}, we define a function 𝖫𝖥𝗉𝖺𝗅:ij\mathsf{LF}_{\mathsf{pal}}:i\mapsto j so that 𝖲𝖠𝗉𝖺𝗅[j]=𝖲𝖠𝗉𝖺𝗅[i]1\mathsf{SA}_{\mathsf{pal}}[j]=\mathsf{SA}_{\mathsf{pal}}[i]-1 (with the corner case 𝖫𝖥𝗉𝖺𝗅(i)=1\mathsf{LF}_{\mathsf{pal}}(i)=1 for 𝖲𝖠𝗉𝖺𝗅[i]=1\mathsf{SA}_{\mathsf{pal}}[i]=1). Thanks to Lemma 10, for any value cc, the suffixes used to obtain ii-th kk in 𝖫𝗉𝖺𝗅\mathsf{L}_{\mathsf{pal}} and in 𝖥𝗉𝖺𝗅\mathsf{F}_{\mathsf{pal}} are the same, which enables us to implement the 𝖫𝖥𝗉𝖺𝗅\mathsf{LF}_{\mathsf{pal}} function by 𝖫𝖥𝗉𝖺𝗅(i)=𝗌𝖾𝗅𝖾𝖼𝗍𝖥𝗉𝖺𝗅(𝗋𝖺𝗇𝗄𝖫𝗉𝖺𝗅(i,𝖫𝗉𝖺𝗅[i]),𝖫𝗉𝖺𝗅[i])\mathsf{LF}_{\mathsf{pal}}(i)=\mathsf{select}_{\mathsf{F}_{\mathsf{pal}}}(\mathsf{rank}_{\mathsf{L}_{\mathsf{pal}}}(i,\mathsf{L}_{\mathsf{pal}}[i]),\mathsf{L}_{\mathsf{pal}}[i]). See Figure 7 for an illustration.

ii T[i..]T[i..] 𝗌𝗌𝗉T[i..]\mathsf{ssp}_{T[i..]} 𝗌𝗌𝗉T[𝖲𝖠𝗉𝖺𝗅[i]..]\mathsf{ssp}_{T[\mathsf{SA}_{\mathsf{pal}}[i]..]} 𝖲𝖠𝗉𝖺𝗅[i]\mathsf{SA}_{\mathsf{pal}}[i] 𝖥𝗉𝖺𝗅[i]\mathsf{F}_{\mathsf{pal}}[i] 𝖫𝗉𝖺𝗅[i]\mathsf{L}_{\mathsf{pal}}[i] 𝖫𝖥𝗉𝖺𝗅(i)\mathsf{LF}_{\mathsf{pal}}(i)
1 𝚊𝚋𝚋𝚊𝚋𝚋𝚌𝚋𝚌\mathtt{abbabbcbc} 𝟸𝟺𝟹𝟸𝟹𝟹\mathtt{\infty\infty 2432\infty 33} ε\varepsilon 10 $\mathtt{\$} \mathtt{\infty} 2
2 𝚋𝚋𝚊𝚋𝚋𝚌𝚋𝚌\mathtt{bbabbcbc} 𝟸𝟹𝟸𝟹𝟹\mathtt{\infty 2\infty 32\infty 33} \mathtt{\infty} 9 \mathtt{\infty} \mathtt{\infty} 5
3 𝚋𝚊𝚋𝚋𝚌𝚋𝚌\mathtt{babbcbc} 𝟹𝟸𝟹𝟹\mathtt{\infty\infty 32\infty 33} 𝟸𝟹𝟸𝟹𝟹\mathtt{\infty 2\infty 32\infty 33} 2 𝟷\mathtt{1} 𝟸\mathtt{2} 6
4 𝚊𝚋𝚋𝚌𝚋𝚌\mathtt{abbcbc} 𝟸𝟹𝟹\mathtt{\infty\infty 2\infty 33} 𝟸𝟹𝟹\mathtt{\infty 2\infty 33} 5 𝟷\mathtt{1} \mathtt{\infty} 7
5 𝚋𝚋𝚌𝚋𝚌\mathtt{bbcbc} 𝟸𝟹𝟹\mathtt{\infty 2\infty 33} \mathtt{\infty\infty} 8 \mathtt{\infty} 𝟸\mathtt{2} 8
6 𝚋𝚌𝚋𝚌\mathtt{bcbc} 𝟹𝟹\mathtt{\infty\infty 33} 𝟸𝟺𝟹𝟸𝟹𝟹\mathtt{\infty\infty 2432\infty 33} 1 𝟸\mathtt{2} $\mathtt{\$} 1
7 𝚌𝚋𝚌\mathtt{cbc} 𝟹\mathtt{\infty\infty 3} 𝟸𝟹𝟹\mathtt{\infty\infty 2\infty 33} 4 \mathtt{\infty} 𝟸\mathtt{2} 9
8 𝚋𝚌\mathtt{bc} \mathtt{\infty\infty} 𝟹\mathtt{\infty\infty 3} 7 𝟸\mathtt{2} 𝟸\mathtt{2} 10
9 𝚌\mathtt{c} \mathtt{\infty} 𝟹𝟸𝟹𝟹\mathtt{\infty\infty 32\infty 33} 3 𝟸\mathtt{2} 𝟷\mathtt{1} 3
10 ε\varepsilon ε\varepsilon 𝟹𝟹\mathtt{\infty\infty 33} 6 𝟸\mathtt{2} 𝟷\mathtt{1} 4
Figure 6: An example of 𝖲𝖠𝗉𝖺𝗅[i]\mathsf{SA}_{\mathsf{pal}}[i], 𝖥𝗉𝖺𝗅[i]\mathsf{F}_{\mathsf{pal}}[i] and 𝖫𝗉𝖺𝗅[i]\mathsf{L}_{\mathsf{pal}}[i] for T=𝚊𝚋𝚋𝚊𝚋𝚋𝚌𝚋𝚌T=\mathtt{abbabbcbc}.
Refer to caption
Figure 7: An illustration for 𝖥𝗉𝖺𝗅[i]\mathsf{F}_{\mathsf{pal}}[i], 𝖫𝗉𝖺𝗅[i]\mathsf{L}_{\mathsf{pal}}[i] and 𝖫𝖥𝗉𝖺𝗅(i)\mathsf{LF}_{\mathsf{pal}}(i). Except the corner cases that have $\mathtt{\$}, 𝖥𝗉𝖺𝗅[i]\mathsf{F}_{\mathsf{pal}}[i] and 𝖫𝗉𝖺𝗅[i]\mathsf{L}_{\mathsf{pal}}[i] are defined by π(T[𝖲𝖠𝗉𝖺𝗅[i]..])\pi(T[\mathsf{SA}_{\mathsf{pal}}[i]..]) and π(T[𝖲𝖠𝗉𝖺𝗅[i]1..])\pi(T[\mathsf{SA}_{\mathsf{pal}}[i]-1..]), respectively. Since π(w)\pi(w) encodes the information about the non-trivial shortest prefix of ww, in each row the non-trivial shortest prefix is shown in grayed background. For example, π(𝚊𝚋𝚋𝚊𝚋𝚋𝚌𝚋𝚌)=𝟸\pi(\mathtt{abbabbcbc})=\mathtt{2} because its non-trivial shortest prefix-palindrome 𝚊𝚋𝚋𝚊\mathtt{abba} is extended from the prefix-palindrome 𝚋𝚋\mathtt{bb} of 𝚋𝚋𝚊𝚋𝚋𝚌𝚋𝚌\mathtt{bbabbcbc} and 𝚋𝚋\mathtt{bb} belongs to the prefix-pal-group with the identifier 𝟸\mathtt{2}. Observe that 𝖥𝗉𝖺𝗅\mathsf{F}_{\mathsf{pal}} is a permutation of 𝖫𝗉𝖺𝗅\mathsf{L}_{\mathsf{pal}} since both 𝖥𝗉𝖺𝗅\mathsf{F}_{\mathsf{pal}} and 𝖫𝗉𝖺𝗅\mathsf{L}_{\mathsf{pal}} use every suffix ww of TT exactly once to obtain π(w)\pi(w). Roughly speaking, 𝖫𝖥𝗉𝖺𝗅()\mathsf{LF}_{\mathsf{pal}}(\cdot) is meant to map a row having a suffix ww in the T[𝖲𝖠𝗉𝖺𝗅[i]1..])T[\mathsf{SA}_{\mathsf{pal}}[i]-1..]) column to the row having the same suffix ww in the T[𝖲𝖠𝗉𝖺𝗅[i]..]T[\mathsf{SA}_{\mathsf{pal}}[i]..] column. Thanks to Lemma 10, for any value kk, the suffixes used to obtain ii-th kk in 𝖫𝗉𝖺𝗅\mathsf{L}_{\mathsf{pal}} and in 𝖥𝗉𝖺𝗅\mathsf{F}_{\mathsf{pal}} are the same, and hence, one can observe visually that the arrows starting from the same 𝖫𝗉𝖺𝗅\mathsf{L}_{\mathsf{pal}}-value are not crossed.

For any string ww, let ww-interval refer to the maximal interval [b..e][b..e] such that 𝗌𝗌𝗉T[𝖲𝖠𝗉𝖺𝗅[i]..]\mathsf{ssp}_{T[\mathsf{SA}_{\mathsf{pal}}[i]..]} is prefixed by 𝗌𝗌𝗉w\mathsf{ssp}_{w}, where ww-interval is empty if there is no substring of TT that pal-matches with ww. Notice that the substring of TT of length |w||w| starting at 𝖲𝖠𝗉𝖺𝗅[i]\mathsf{SA}_{\mathsf{pal}}[i] pal-matches with ww iff i[b..e]i\in[b..e]. A single step of backward search computes cwcw-interval from ww-interval for some character cc.

The following theorems are the main contributions of this paper.

Theorem 16.

Let TT be a string of length nn over an alphabet of size σ\sigma. There is a data structure of 2nlgmin(σ,lgn)+2n+o(n)2n\lg\min(\sigma,\lg n)+2n+o(n) bits of space to support the counting queries for the pal-matching problem in O(m)O(m) time, where mm is the length of a given pattern PP.

Proof.

We use the data structures of Theorem 1 for 𝖫𝗉𝖺𝗅\mathsf{L}_{\mathsf{pal}} and 𝖥𝗉𝖺𝗅\mathsf{F}_{\mathsf{pal}}, and the RMQ data structure of Theorem 2 for the integer array VV with V[i]=𝖫𝖥𝗉𝖺𝗅(i)V[i]=\mathsf{LF}_{\mathsf{pal}}(i). Since the number of distinct symbols in 𝖫𝗉𝖺𝗅\mathsf{L}_{\mathsf{pal}} and 𝖥𝗉𝖺𝗅\mathsf{F}_{\mathsf{pal}} are O(min(σ,lgn))O(\min(\sigma,\lg n)) by Lemma 6, the data structures occupy 2nlgmin(σ,lgn)+2n+o(n)2n\lg\min(\sigma,\lg n)+2n+o(n) bits of space in total and all queries (𝗋𝖺𝗇𝗄\mathsf{rank}, 𝗌𝖾𝗅𝖾𝖼𝗍\mathsf{select}, 𝗋𝖺𝗇𝗀𝖾𝖢𝗈𝗎𝗇𝗍\mathsf{rangeCount} and 𝖱𝖬𝖰\mathsf{RMQ}) can be supported in O(1)O(1) time.

The number of occurrences of PP can be answered by computing the width of PP-interval. Thus we focus on a single step of backward search. In a general setting, for any string ww and a character cc, we show how to compute cwcw-interval [b..e][b^{\prime}..e^{\prime}] in O(1)O(1) time from ww-interval [b..e][b..e], π(cw)\pi(cw) and the number gg of prefix-pal-groups of ww. The procedure differs depending on π(cw)=\pi(cw)=\infty or not.

  1. 1.

    When π(cw)=k\pi(cw)=k\neq\infty. Using Lemma 9 in a symmetric way, [b..e][b^{\prime}..e^{\prime}] is obtained by mapping the positions of π(cw)\pi(cw) in 𝖫𝗉𝖺𝗅[b..e]\mathsf{L}_{\mathsf{pal}}[b..e] by the 𝖫𝖥𝗉𝖺𝗅\mathsf{LF}_{\mathsf{pal}} function. More specifically, b=𝗌𝖾𝗅𝖾𝖼𝗍𝖥𝗉𝖺𝗅(𝗋𝖺𝗇𝗄𝖫𝗉𝖺𝗅(b1,k)+1,k)b^{\prime}=\mathsf{select}_{\mathsf{F}_{\mathsf{pal}}}(\mathsf{rank}_{\mathsf{L}_{\mathsf{pal}}}(b-1,k)+1,k) and e=𝗌𝖾𝗅𝖾𝖼𝗍𝖥𝗉𝖺𝗅(𝗋𝖺𝗇𝗄𝖫𝗉𝖺𝗅(e,k),k)e^{\prime}=\mathsf{select}_{\mathsf{F}_{\mathsf{pal}}}(\mathsf{rank}_{\mathsf{L}_{\mathsf{pal}}}(e,k),k), which can be computed in O(1)O(1) time.

  2. 2.

    When π(cw)=\pi(cw)=\infty. We note that [b..e][b^{\prime}..e^{\prime}] is the maximal interval such that T[𝖲𝖠𝗉𝖺𝗅[i]..]T[\mathsf{SA}_{\mathsf{pal}}[i]..] does not have non-trivial prefix-palindrome (i.e. π(T[𝖲𝖠𝗉𝖺𝗅[i]..])=\pi(T[\mathsf{SA}_{\mathsf{pal}}[i]..])=\infty) or T[𝖲𝖠𝗉𝖺𝗅[i]..]T[\mathsf{SA}_{\mathsf{pal}}[i]..] has the non-trivial shortest prefix-palindrome of length longer than |cw||cw| (i.e. π(T[𝖲𝖠𝗉𝖺𝗅[i]..])>g\pi(T[\mathsf{SA}_{\mathsf{pal}}[i]..])>g). Thus, eb+1e^{\prime}-b^{\prime}+1 is equivalent to the number of occurrences of values larger than gg in 𝖫𝗉𝖺𝗅[b..e]\mathsf{L}_{\mathsf{pal}}[b..e], which can be computed in 𝗋𝖺𝗇𝗀𝖾𝖢𝗈𝗎𝗇𝗍𝖫𝗉𝖺𝗅(b,e,g,)\mathsf{rangeCount}_{\mathsf{L}_{\mathsf{pal}}}(b,e,g,\infty) in O(1)O(1) time. Moreover, it holds that e=𝖫𝖥𝗉𝖺𝗅(𝖱𝖬𝖰V(b,e))e^{\prime}=\mathsf{LF}_{\mathsf{pal}}(\mathsf{RMQ}_{V}(b,e)) because 𝗌𝗌𝗉(T[𝖲𝖠𝗉𝖺𝗅[i]1..])\mathsf{ssp}(T[\mathsf{SA}_{\mathsf{pal}}[i]-1..]) with π(T[𝖲𝖠𝗉𝖺𝗅[i]1..])=𝖫𝗉𝖺𝗅[i]>g\pi(T[\mathsf{SA}_{\mathsf{pal}}[i]-1..])=\mathsf{L}_{\mathsf{pal}}[i]>g is always lexicographically larger than 𝗌𝗌𝗉(T[𝖲𝖠𝗉𝖺𝗅[j]1..])\mathsf{ssp}(T[\mathsf{SA}_{\mathsf{pal}}[j]-1..]) with π(T[𝖲𝖠𝗉𝖺𝗅[j]1..])=𝖫𝗉𝖺𝗅[j]g\pi(T[\mathsf{SA}_{\mathsf{pal}}[j]-1..])=\mathsf{L}_{\mathsf{pal}}[j]\leq g. Thus, we can compute [b..e][b^{\prime}..e^{\prime}] in O(1)O(1) time.

Backward search for PP requires π(P[i..])\pi(P[i..]) and the number gg of prefix-pal-groups of P[i..]P[i..] for all 1im1\leq i\leq m, which can be computed by 𝗌𝗌𝗉𝗀PR\mathsf{sspg}_{P^{R}} and 𝖦PR\mathsf{G}_{P^{R}} in O(m)O(m) time using Lemmas 15 and 14.

Putting all together, we get the theorem. ∎

Theorem 17.

Let TT be a string of length nn over an alphabet of size σ\sigma and Δ\Delta be an integer in [1..n][1..n]. There is a data structure of 2nlgmin(σ,lgn)+nΔlgn+3n+o(n)2n\lg\min(\sigma,\lg n)+\frac{n}{\Delta}\lg n+3n+o(n) bits of space to support the locating queries for the pal-matching problem in O(m+Δ𝗈𝖼𝖼)O(m+\Delta\mathsf{occ}) time, where mm is the length of a given pattern PP and 𝗈𝖼𝖼\mathsf{occ} is the number of occurrences to report.

Proof.

We use the data structure and the algorithm of Theorem 16 to compute PP-interval in 2n(1+lgmin(σ,lgn))+o(n)2n(1+\lg\min(\sigma,\lg n))+o(n) bits of space and O(m)O(m) time. The occurrences of PP (in the sense of pal-matching) can be answered by the 𝖲𝖠𝗉𝖺𝗅\mathsf{SA}_{\mathsf{pal}}-values in PP-interval. We employ exactly the same sampling technique used in the FM-index to retrieve 𝖲𝖠\mathsf{SA}-values (e.g., see [7]): We make a bit vector BB of length n+1n+1 marking the positions ii in 𝖲𝖠𝗉𝖺𝗅\mathsf{SA}_{\mathsf{pal}} such that 𝖲𝖠𝗉𝖺𝗅[i]=Δk+1\mathsf{SA}_{\mathsf{pal}}[i]=\Delta k+1 for some integer kk, and the sparse suffix array SS holding only the marked 𝖲𝖠𝗉𝖺𝗅\mathsf{SA}_{\mathsf{pal}}-values in the order. BB is equipped with a data structure to support the rank queries and the additional space to Theorem 16 is nΔlgn+n+o(n)\frac{n}{\Delta}\lg n+n+o(n) bits in total.

If position ii is marked, 𝖲𝖠𝗉𝖺𝗅[i]\mathsf{SA}_{\mathsf{pal}}[i] is retrieved by S[𝗋𝖺𝗇𝗄B(i,1)]S[\mathsf{rank}_{B}(i,1)] in O(1)O(1) time. If position ii is not marked, we apply LF-mapping kk times from ii until we reach a marked position jj and retrieve 𝖲𝖠𝗉𝖺𝗅[i]\mathsf{SA}_{\mathsf{pal}}[i] by S[𝗋𝖺𝗇𝗄B(j,1)]+kS[\mathsf{rank}_{B}(j,1)]+k. Since text positions are marked every Δ\Delta positions, the number kk of LF-mapping steps is at most Δ\Delta, and hence, 𝖲𝖠𝗉𝖺𝗅[i]\mathsf{SA}_{\mathsf{pal}}[i] can be retrieved in O(Δ)O(\Delta) time. Therefore we can report each occurrence of PP in O(Δ)O(\Delta) time, and the theorem follows. ∎

6 Conclusions and future work

In this paper, we developed new encoding schemes for pal-matching and proposed the PalFM-index, a space-efficient index for pal-matching based on the FM-index. Future work includes to present an efficient construction algorithm of the PalFM-index, and to reduce the space requirement (e.g. by incorporating with the idea of [13]). Another interesting research direction would be to develop a general framework to design FM-index type indexes in generalized pattern matching. We believe that switching encoding from 𝗅𝗉𝖺𝗅\mathsf{lpal} to 𝗌𝗌𝗉\mathsf{ssp} to design the PalFM-indexes gives a good hint to pursue this direction, and conjecture that any generalized pattern matching under a substring consistent equivalent relation [27] admits such shortest positional encodings to design FM-index type indexes.

Acknowledgements

This work was supported by JSPS KAKENHI (Grant Number 19K20213).

References

  • [1] Jean-Paul Allouche, Michael Baake, Julien Cassaigne, and David Damanik. Palindrome complexity. Theor. Comput. Sci., 292(1):9–31, 2003.
  • [2] Mira-Cristiana Anisiu, Valeriu Anisiu, and Zoltán Kása. Total palindrome complexity of finite words. Discrete Mathematics, 310(1):109–114, 2010. \hrefhttps://doi.org/http://dx.doi.org/10.1016/j.disc.2009.08.002 doi:http://dx.doi.org/10.1016/j.disc.2009.08.002.
  • [3] Kirill Borozdin, Dmitry Kosolobov, Mikhail Rubinchik, and Arseny M. Shur. Palindromic length in linear time. In Proc. 28th Annual Symposium on Combinatorial Pattern Matching (CPM) 2017, pages 23:1–23:12, 2017. \hrefhttps://doi.org/10.4230/LIPIcs.CPM.2017.23 doi:10.4230/LIPIcs.CPM.2017.23.
  • [4] Srecko Brlek, Sylvie Hamel, Maurice Nivat, and Christophe Reutenauer. On the palindromic complexity of infinite words. Int. J. Found. Comput. Sci., 15(2):293–306, 2004. \hrefhttps://doi.org/10.1142/S012905410400242X doi:10.1142/S012905410400242X.
  • [5] Michael Burrows and David J Wheeler. A block-sorting lossless data compression algorithm. Technical report, HP Labs, 1994.
  • [6] Xavier Droubay, Jacques Justin, and Giuseppe Pirillo. Episturmian words and some constructions of de luca and rauzy. Theor. Comput. Sci., 255(1-2):539–553, 2001. \hrefhttps://doi.org/10.1016/S0304-3975(99)00320-5 doi:10.1016/S0304-3975(99)00320-5.
  • [7] Paolo Ferragina and Giovanni Manzini. Opportunistic data structures with applications. In FOCS, pages 390–398, 2000.
  • [8] Paolo Ferragina, Giovanni Manzini, Veli Mäkinen, and Gonzalo Navarro. Compressed representations of sequences and full-text indexes. ACM Trans. Algorithms, 3(2), 2007.
  • [9] Gabriele Fici, Travis Gagie, Juha Kärkkäinen, and Dominik Kempa. A subquadratic algorithm for minimum palindromic factorization. Journal of Discrete Algorithms, 28:41–48, 2014. StringMasters 2012 & 2013 Special Issue (Volume 1). URL: http://www.sciencedirect.com/science/article/pii/S1570866714000525, \hrefhttps://doi.org/https://doi.org/10.1016/j.jda.2014.08.001 doi:https://doi.org/10.1016/j.jda.2014.08.001.
  • [10] Johannes Fischer and Volker Heun. Space-efficient preprocessing schemes for range minimum queries on static arrays. SIAM J. Comput., 40(2):465–492, 2011.
  • [11] Travis Gagie, Giovanni Manzini, and Rossano Venturini. An encoding for order-preserving matching. In Proc. 25th Annual European Symposium on Algorithms (ESA) 2017, pages 38:1–38:15, 2017. \hrefhttps://doi.org/10.4230/LIPIcs.ESA.2017.38 doi:10.4230/LIPIcs.ESA.2017.38.
  • [12] Zvi Galil and Joel I. Seiferas. A linear-time on-line recognition algorithm for “palstar”. J. ACM, 25(1):102–111, 1978. \hrefhttps://doi.org/http://doi.acm.org/10.1145/322047.322056 doi:http://doi.acm.org/10.1145/322047.322056.
  • [13] Arnab Ganguly, Rahul Shah, and Sharma V. Thankachan. pBWT: Achieving succinct data structures for parameterized pattern matching and related problems. In Proc. 28th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA) 2017, pages 397–407, 2017. \hrefhttps://doi.org/10.1137/1.9781611974782.25 doi:10.1137/1.9781611974782.25.
  • [14] Arnab Ganguly, Rahul Shah, and Sharma V. Thankachan. Structural pattern matching - succinctly. In Proc. 28th International Symposium on Algorithms and Computation (ISAAC) 2017, pages 35:1–35:13, 2017. \hrefhttps://doi.org/10.4230/LIPIcs.ISAAC.2017.35 doi:10.4230/LIPIcs.ISAAC.2017.35.
  • [15] Amy Glen, Jacques Justin, Steve Widmer, and Luca Q. Zamboni. Palindromic richness. Eur. J. Comb., 30(2):510–531, 2009. \hrefhttps://doi.org/10.1016/j.ejc.2008.04.006 doi:10.1016/j.ejc.2008.04.006.
  • [16] Alexander Golynski, Rajeev Raman, and S. Srinivasa Rao. On the redundancy of succinct data structures. In Joachim Gudmundsson, editor, Proc. 11th Scandinavian Workshop on Algorithm Theory (SWAT) 2008, volume 5124 of Lecture Notes in Computer Science, pages 148–159. Springer, 2008.
  • [17] Roberto Grossi, Ankur Gupta, and Jeffrey Scott Vitter. High-order entropy-compressed text indexes. In Proc. 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA) 2003, pages 841–850. ACM/SIAM, 2003.
  • [18] Tomohiro I, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Counting and verifying maximal palindromes. In Proc. 17th International Symposium on String Processing and Information Retrieval (SPIRE) 2010, pages 135–146, 2010.
  • [19] Tomohiro I, Shunsuke Inenaga, and Masayuki Takeda. Palindrome pattern matching. Theor. Comput. Sci., 483:162–170, 2013. \hrefhttps://doi.org/http://dx.doi.org/10.1016/j.tcs.2012.01.047 doi:http://dx.doi.org/10.1016/j.tcs.2012.01.047.
  • [20] Tomohiro I, Shiho Sugimoto, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Computing palindromic factorizations and palindromic covers on-line. In Proc. 25th Annual Symposium on Combinatorial Pattern Matching (CPM) 2014, volume 8486 of Lecture Notes in Computer Science, pages 150–161. Springer, 2014.
  • [21] Ignacio Tinoco Jr., Olke C. Uhlenbeck, and Mark D. Levine. Estimation of secondary structure in ribonucleic acids. Nature, 230:362–367, 1971.
  • [22] Sung-Hwan Kim and Hwan-Gue Cho. A compact index for cartesian tree matching. In Pawel Gawrychowski and Tatiana Starikovskaya, editors, Proc. 32nd Annual Symposium on Combinatorial Pattern Matching (CPM) 2021, volume 191 of LIPIcs, pages 18:1–18:19. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021.
  • [23] Sung-Hwan Kim and Hwan-Gue Cho. Simpler FM-index for parameterized string matching. Inf. Process. Lett., 165:106026, 2021. \hrefhttps://doi.org/10.1016/j.ipl.2020.106026 doi:10.1016/j.ipl.2020.106026.
  • [24] Donald E. Knuth, James H. Morris, and Vaughan R. Pratt. Fast pattern matching in strings. SIAM J. Comput., 6(2):323–350, 1977.
  • [25] Dmitry Kosolobov, Mikhail Rubinchik, and Arseny M. Shur. Pal k is linear recognizable online. In Proc. 41st International Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM) 2015, pages 289–301, 2015.
  • [26] Glenn K. Manacher. A new linear-time “on-line” algorithm for finding the smallest initial palindrome of a string. J. ACM, 22(3):346–351, 1975. URL: http://doi.acm.org/10.1145/321892.321896, \hrefhttps://doi.org/10.1145/321892.321896 doi:10.1145/321892.321896.
  • [27] Yoshiaki Matsuoka, Takahiro Aoki, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Generalized pattern matching and periodicity under substring consistent equivalence relations. Theor. Comput. Sci., 656:225–233, 2016.
  • [28] Antonio Restivo and Giovanna Rosone. Burrows-wheeler transform and palindromic richness. Theor. Comput. Sci., 410(30-32):3018–3026, 2009. \hrefhttps://doi.org/10.1016/j.tcs.2009.03.008 doi:10.1016/j.tcs.2009.03.008.
  • [29] Mikhail Rubinchik and Arseny M. Shur. EERTREE: an efficient data structure for processing palindromes in strings. Eur. J. Comb., 68:249–265, 2018. \hrefhttps://doi.org/10.1016/j.ejc.2017.07.021 doi:10.1016/j.ejc.2017.07.021.