This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Computing matching statistics on Wheeler DFAs

Alessio Conte1    Nicola Cotumaccio2,3    Travis Gagie3    Giovanni Manzini1   
Nicola Prezza4 and Marinella Sciortino5
1 University of Pisa, Italy, [email protected], [email protected] 2 GSSI, L’Aquila, Italy, [email protected] 3 Dalhousie University, Halifax, Canada, [email protected], [email protected] 4 Ca’ Foscari Unversity, Venice, Italy, [email protected] 5 University of Palermo, Italy, [email protected]
Abstract

Matching statistics were introduced to solve the approximate string matching problem, which is a recurrent subroutine in bioinformatics applications. In 2010, Ohlebusch et al. [SPIRE 2010] proposed a time and space efficient algorithm for computing matching statistics which relies on some components of a compressed suffix tree - notably, the longest common prefix (LCP) array. In this paper, we show how their algorithm can be generalized from strings to Wheeler deterministic finite automata. Most importantly, we introduce a notion of LCP array for Wheeler automata, thus establishing a first clear step towards extending (compressed) suffix tree functionalities to labeled graphs.

1 Introduction

Given a string TT and a pattern π\pi, the classical formulation of the pattern matching problem requires to decide whether the pattern π\pi occurs in the string TT and, possibly, count the number of such occurrences and report the positions where they occur. The invention of the FM-index [1], which is based on the Burrows-Wheeler transform [2], opened a new line of research in the pattern matching field. The indexing and compression techniques behind the FM-index deeply rely on the idea of suffix sorting, and over the years have been generalized from strings to trees [3], De Brujin graphs [4, 5], Wheeler graphs [6, 7] and arbitrary graphs [8, 9]. In particular, the class of Wheeler graphs is probably the one that captures the intuition behind the FM-index in the simplest way, and indeed the notion of Wheeler order has relevant consequences in automata theory [7, 10].

However, in bioinformatics we are not only interested in exact pattern matching, but also in a myriad of variations of the pattern matching problem [11]. In particular, matching statistics were introduced to solve the approximate pattern matching problem [12]. A powerful data structure that is able to address the variations of the pattern matching problem at once is the suffix tree [13]. The main drawback of the suffix tree is its space consumption, which is non-negligible both in theory and in practice. As a consequence, the suffix tree has been replaced by the suffix array [14]. While suffix arrays do not have all the functionalities of suffix trees, it has been shown that they can be augmented with some additional data structures — notably, the longest common prefix (LCP) array — so that it is possible to retrieve the full functionalities of a suffix trees [15]. All these components can be successfully compressed, leading to the so-called compressed suffix trees [16].

The natural question is whether it is possible to provide suffix tree functionalities not only to strings, but also to graphs, and in particular Wheeler graphs. In this paper, we provide a first partial affirmative answer by considering the problem of computing matching statistics. In 2010, Ohlebusch et al. [17] proposed a time and space efficient algorithm for computing matching statistics which relies on some components of a compressed suffix tree. In this paper, we show how their algorithm can be generalized from strings to Wheeler deterministic finite automata. Most importantly, we introduce a notion of longest common prefix (LCP) array for Wheeler automata, thus establishing an important step towards extending (compressed) suffix tree functionalities to labeled graphs.

2 Notation and first definitions

Throughout the paper, we consider an alphabet Σ\Sigma and a fixed total order \preceq on Σ\Sigma. We denote by Σ\Sigma^{*} the set of all finite strings on Σ\Sigma and by Σω\Sigma^{\omega} the set of all (countably) infinite strings on Σ\Sigma. The empty word is ϵ\epsilon. If αΣ\alpha\in\Sigma^{*}, then αR\alpha^{R} is the reverse string of α\alpha. We extend the total order \preceq from Σ\Sigma to ΣΣω\Sigma^{*}\cup\Sigma^{\omega} lexicographically. If ii and jj are integers, with iji\leq j, define [i,j]={i,i+1,,j1,j}[i,j]=\{i,i+1,\dots,j-1,j\}. If TT is a string, the ii-th character of TT is T[i]T[i], and T[i..j]=T[i]..T[j]T[i..j]=T[i]..T[j].

We will consider deterministic automata 𝒜=(Q,E,s0,F)\mathcal{A}=(Q,E,s_{0},F), where QQ is the set of states, EQ×Q×ΣE\subseteq Q\times Q\times\Sigma is the set of labeled edges, s0Qs_{0}\in Q is the initial state and FQF\subseteq Q is the set of final states. The definition implies that for every uQu\in Q and for every aΣa\in\Sigma there exists at most one edge labeled aa leaving uu. Following [7, 10], we assume that s0s_{0} has no incoming edges, and every state is reachable from the initial state; moreover, all edges entering the same state have the same label (input-consistency), so that for every uQ{s0}u\in Q\setminus\{s_{0}\} we can let λ(u)\lambda(u) be the label of all edges entering uu. We define λ(s0)=#\lambda(s_{0})=\#, where #Σ\#\not\in\Sigma is a special character for which we assume #a\#\prec a for every aΣa\in\Sigma (the character #\# is an analogous of the termination character $\$ used for suffix trees and suffix arrays). As a consequence, an edge (u,u,a)(u^{\prime},u,a) can be simply written as (u,u)(u^{\prime},u), because it must be a=λ(u)a=\lambda(u).

We assume familiarity with the notions of suffix array (SA), Burrows Wheeler transform (BWT), FM-index and backward search [1].

The matching statistics of a pattern π=π[1..m]\pi=\pi[1..m] with respect to a string T=T[1..n]T=T[1..n] are defined as follows. Assume that T[n]=$ΣT[n]=\$\not\in\Sigma, where $a\$\prec a for every aΣa\in\Sigma. Determining the matching statistics of π\pi with respect to TT means determining, for 1im1\leq i\leq m, (i) the longest prefix π\pi^{\prime} of π[i..m]\pi[i..m] which occurs in TT, and (ii) the interval corresponding to the set of all strings starting with π\pi^{\prime} in the list of all lexicographically sorted suffixes. We can describe (i) and (ii) by means of three values: the length i\ell_{i} of π\pi^{\prime}, and the endpoints lil_{i} and rir_{i} of the interval considered in (ii). For example, let T=mississippi$T=mississippi\$ (see Figure 1), and π=stpissi\pi=stpissi. For i=1i=1, we have π=s\pi^{\prime}=s, so 1=1\ell_{1}=1 and [l1,r1]=[9,12][l_{1},r_{1}]=[9,12] (suffixes starting with ss). For i=2i=2, we have π=ϵ\pi^{\prime}=\epsilon, so 2=0\ell_{2}=0 and [l2,r2]=[1,n]=[1,12][l_{2},r_{2}]=[1,n]=[1,12] (all suffixes start with the empty string). For i=3i=3, we have π=pi\pi^{\prime}=pi, so 3=2\ell_{3}=2, and [l3,r3]=[7,7][l_{3},r_{3}]=[7,7] (suffixes starting with pipi). For i=4i=4, we have π=issi\pi^{\prime}=issi, so 4=4\ell_{4}=4, and [l4,r4]=[4,5][l_{4},r_{4}]=[4,5] (suffixes starting with issiissi). One can proceed analogously for i=5,6,7i=5,6,7.

ii Sorted suffixes LCP SA BWT
1 $ 12 i
2 i$ 0 11 p
3 ippi$ 1 8 s
4 issippi$ 1 5 s
5 ississippi$ 4 2 m
6 mississippi$ 0 1 $
7 pi$ 0 10 p
8 ppi$ 1 9 i
9 sippi$ 0 7 s
10 sissippi$ 2 4 s
11 ssippi$ 1 6 i
12 ssissippi$ 3 3 i
Figure 1: The sorted suffixes of “mississippi$” and the LCP, SA, and BWT arrays.

3 Computing matching statistics for strings

We will first describe the algorithm by Ohlebusch et al. [17], emphasizing the ideas that we will generalize when switching to Wheeler DFAs. The algorithms computes the matching statistics using a number of iterations linear in mm by exploiting the backward search. We start from the end of π\pi, and we use the backward search (starting from the interval [1,n][1,n] which corresponds to the set of suffixes prefixed by the empty string) to find the interval of all occurrences of the last character of π\pi in TT (if any). Then, starting from the new interval, we use the backward search to find all the occurrences of the suffix of length 2 of π\pi in TT (if any), and so on. At some point, it may happen that for some im+1i\leq m+1 we have that π[i..m]\pi[i..m] occurs in TT, but the next application of the backward search returns the empty interval, so that π[i1..m]\pi[i-1..m] does not occur in TT (the case i=m+1i=m+1 corresponds to the initial setting when π[i..m]\pi[i..m] is the empty string). We distinguish two cases:

  • (Case 1) If li=1l_{i}=1 and ri=nr_{i}=n, this means that all suffixes of TT are prefixed by π[i..m]\pi[i..m]. This may happen in particular if i=m+1i=m+1: this means that the first backward search has been unsuccessful. We immediately conclude that character π[i1]\pi[i-1] does not occur in TT, so i1=0\ell_{i-1}=0 and [li1,ri1]=[1,n][l_{i-1},r_{i-1}]=[1,n] (because all suffixes start with the empty string). In this case, in the following iterations of the algorithm, we can simply discard π[i1,m]\pi[i-1,m]: when for ii2i^{\prime}\leq i-2 we will be searching for the longest prefix of π[i,m]\pi[i^{\prime},m] occurring in TT, it will suffice to search for the longest prefix of π[i,i2]\pi[i^{\prime},i-2] occurring in TT.

  • (Case 2) If li>1l_{i}>1 or ri<nr_{i}<n, this means that the number of suffixes of TT starting with π[i..m]\pi[i..m] is less than nn. Now, every suffix starting with π[i..m]\pi[i..m] also starts with π[i..m1]\pi[i..m-1]. If the number of suffixes starting with π[i..m1]\pi[i..m-1] is equal to the number of suffixes starting with π[i..m]\pi[i..m], then also π[i1..m1]\pi[i-1..m-1] does not occur in TT. More in general, for jm1j\leq m-1 we can have that π[i1..j]\pi[i-1..j] occurs in TT only if the number of suffixes starting with π[i..j]\pi[i..j] is larger than the number of suffixes starting with π[i..m]\pi[i..m]. Since we are interested in maximal matches, we want jj to be as large as possible: we will show later how to compute the largest integer jj such that the number of suffixes starting with π[i..j]\pi[i..j] is larger than the number of suffixes starting with π[i..m]\pi[i..m]. Notice that jj always exists, because all nn suffixes start with the empty string, but less than nn suffixes start with π[i..m]\pi[i..m]. After determining jj we discard π[j+1..m]\pi[j+1..m] (so in the following iterations of the algorithm we will simply consider π[1..j]\pi[1..j]), and we recursively apply the backward search starting from the interval associated with the occurrences of π[i..j]\pi[i..j] — we will also see how to compute this interval.

Let us apply the above algorithm to T=mississippi$T=mississippi\$ and π=stpissi\pi=stpissi. We start with the interval [1,n]=[1,12][1,n]=[1,12], corresponding to the empty pattern, and character π[7]=i\pi[7]=i. A backward step yields the interval [l7,r7]=[2,5][l_{7},r_{7}]=[2,5] (suffixes starting with ii), so 7=1\ell_{7}=1. Now, we apply a backward step from [2,5][2,5] and π[6]=s\pi[6]=s, obtaining [l6,r6]=[9,10][l_{6},r_{6}]=[9,10] (suffixes starting with sisi), so 6=2\ell_{6}=2. Again, we apply a backward step from [9,10][9,10] and π[5]=s\pi[5]=s, obtaining [l5,r5]=[11,12][l_{5},r_{5}]=[11,12] (suffixes starting with ssissi), so 5=3\ell_{5}=3. Again, we apply a backward step from [11,12][11,12] and π[4]=i\pi[4]=i, obtaining [l4,r4]=[4,5][l_{4},r_{4}]=[4,5] (suffixes starting with issiissi), so 4=4\ell_{4}=4. We now apply a backward step from [4,5][4,5] and π[3]=p\pi[3]=p, and we obtain the empty interval. This means that no suffix starts with pissipissi. Notice in Figure 1 that the number of suffixes starting with issiissi is equal to the number of suffixes starting with ississ or isis, but the number of suffixes starting with ii is bigger. As a consequence, we consider the interval of all suffixes starting with ii — which is [2,5][2,5] — and we apply a backward step with π[3]=p\pi[3]=p. This time the backward step is successful, and we obtain [l3,r3]=[7,7][l_{3},r_{3}]=[7,7] (suffixes starting with pipi), and 3=2\ell_{3}=2. We now apply a backward step from [7,7][7,7] and π[2]=t\pi[2]=t, obtaining the empty interval. This means that no suffix starts with tpitpi. Notice in Figure 1 that the number of suffixes starting with pp is bigger than the number of suffixes starting with pipi. The corresponding interval is [7,8][7,8], but a backward step with π[2]=t\pi[2]=t is still unsuccessful (so no suffix starts with tptp). The number of suffixes starting with pp is smaller than the number of suffixes starting with the empty string (which is equal to n=12n=12), so we apply a backward step with [1,12][1,12] and π[2]=t\pi[2]=t. Since the backward step is still unsuccessful, we conclude that π[2]=t\pi[2]=t does not occur in SS, so [l2,r2]=[1,n]=[1,12][l_{2},r_{2}]=[1,n]=[1,12] and 2=0\ell_{2}=0. Finally, we start again from the whole interval [1,12][1,12], and a backward step with π[1]=s\pi[1]=s returns [l1,r1]=[9,12][l_{1},r_{1}]=[9,12] (suffixes starting with ss), so 1=1\ell_{1}=1.

It is easy to see that the number of iterations is linear in mm. Indeed, every time we apply a backward step, either we move to the left across π\pi to compute a new matching statistic, or we increase by at least 1 the length of the suffix of π\pi which is forever discarded. This implies that the number of iterations is bounded by 2|π|=2m2|\pi|=2m.

We are only left with showing (i) how to compute jj and (ii) the interval of all suffixes starting with π[i..j]\pi[i..j] in Case 2 of the algorithm. To this end, we introduce the longest common prefix (LCP) array 𝖫𝖢𝖯=𝖫𝖢𝖯[2,n]\mathsf{LCP}=\mathsf{LCP}[2,n] of TT. We define 𝖫𝖢𝖯[i]\mathsf{LCP}[i] to be the length of the longest common prefix of the (i1)(i-1)-st lexicographically smallest suffix of TT and the ii-th lexicographically smallest suffix of TT. In Figure 1 we have 𝖫𝖢𝖯[5]=4\mathsf{LCP}[5]=4 because the fourth lexicographically smallest suffix of TT is issippi$issippi\$, the fifth lexicographically smallest suffix of TT is ississippi$ississippi\$, and the longest common prefix of issippi$issippi\$ and ississippi$ississippi\$ is issiissi, which has length 44. Remember that in the example the backward search starting from [4,5][4,5] (suffixes starting with issiissi) and pp was unsuccessful, so computing jj means determining the longest prefix of issiissi such that the the number of suffixes starting with such a prefix is bigger than 22. This is easy to compute by using the LCP array: the longest such prefix is the one of length max{𝖫𝖢𝖯[4],𝖫𝖢𝖯[6]}=max{1,0}=1\max\{\mathsf{LCP}[4],\mathsf{LCP}[6]\}=\max\{1,0\}=1, so that the desired prefix is ii. As a consequence, we are only left with showing how to compute the interval of all suffixes starting with the prefix ii — which is [2,5][2,5]. Notice that in order to compute this interval, it is enough to expand the interval [4,6][4,6] in both directions as long as the LCP value does not go below 11. Since 𝖫𝖢𝖯[4]=1\mathsf{LCP}[4]=1, 𝖫𝖢𝖯[3]=1\mathsf{LCP}[3]=1, and 𝖫𝖢𝖯[2]=0\mathsf{LCP}[2]=0, and we already know that 𝖫𝖢𝖯[6]=0\mathsf{LCP}[6]=0, we conclude that the desired interval is [2,5][2,5]. In other words, given a position tt, we must be able to compute the biggest integer kk less than tt such that 𝖫𝖢𝖯[k]<𝖫𝖢𝖯[t]\mathsf{LCP}[k]<\mathsf{LCP}[t], and the smallest integer kk bigger than tt such that 𝖫𝖢𝖯[k]<𝖫𝖢𝖯[t]\mathsf{LCP}[k]<\mathsf{LCP}[t] (in our case, t=4t=4). These queries are called PSV (“previous smaller value”) and NSV (“next smaller value”) queries. The LCP array can be augmented in such a way that PSV and NSV queries can be solved efficiently: different space-time trade-offs are possible, we refer the reader to [17] for details.

4 Matching statistics for Wheeler DFAs

Let us define Wheeler DFAs [7].

Definition 1.

Let 𝒜=(Q,E,s0,F)\mathcal{A}=(Q,E,s_{0},F) be a DFA. A Wheeler order on 𝒜\mathcal{A} is a total order \leq on Q such that s0us_{0}\leq u for every uQu\in Q and:

  1. (Axiom 1) If u,vQu,v\in Q and u<vu<v, then λ(u)λ(v)\lambda(u)\preceq\lambda(v).

  2. (Axiom 2) If (u,u),(v,v)E(u^{\prime},u),(v^{\prime},v)\in E, λ(u)=λ(v)\lambda(u)=\lambda(v) and u<vu<v, then u<vu^{\prime}<v^{\prime}.

A DFA 𝒜\mathcal{A} is Wheeler if it admits a Wheeler order.

2255667788993344101011111212131314141515161617171818191911startaaaaabbbccddeeefghila
Figure 2: A Wheeler DFA. States are numbered according to their positions in the Wheeler order.

It is immediate to check that this definition is equivalent to the one in [7], where it was shown that if a DFA 𝒜\mathcal{A} admits a Wheeler order \leq, then \leq is uniquely determined (that is, \leq is the Wheeler order on 𝒜\mathcal{A}). In the following, we fix a Wheeler DFA 𝒜=(Q,E,s0,F)\mathcal{A}=(Q,E,s_{0},F), where we assume Q={u1,,un}Q=\{u_{1},\dots,u_{n}\}, with u1<u2<<unu_{1}<u_{2}<\dots<u_{n} in the Wheeler order, and u1u_{1} coincides with the initial state s0s_{0}. See Figure 2 for an example.

We now show that a Wheeler order can be seen of as a permutation of the set of all states playing the same role as the suffix array of a string. In the following, it will be expedient to (conceptually) assume that s0s_{0} has a self-loop labeled #\# (this is consistent with Axiom 1, because #a\#\prec a for every aΣa\in\Sigma). This implies that every state has at least one incoming edge, so for every state uiu_{i} there exists at least one infinite string αΣω\alpha\in\Sigma^{\omega} that can be read starting from uiu_{i} and following edges in a backward fashion (for example, in Figure 2 for u9u_{9} such a string is cel###cel\#\#\#\dots). We denote by IuiI_{u_{i}} the set of all such strings. Formally:

Definition 2.

Let i[1,n]i\in[1,n]. For every state uiQu_{i}\in Q define:

Iui={αΣω|there exist integers f1,f2, in [1,n] such that (i) f1=i,(ii) (ufk+1,ufk)E for every k1 and (iii) α=λ(uf1)λ(uf2)}.\begin{split}I_{u_{i}}=\{\alpha\in\Sigma^{\omega}\;|\;\text{there exist integers $f_{1},f_{2},\dots$ in $[1,n]$ such that (i) $f_{1}=i$,}\\ \text{(ii) $(u_{f_{k+1}},u_{f_{k}})\in E$ for every $k\geq 1$ and (iii) $\alpha=\lambda(u_{f_{1}})\lambda(u_{f_{2}})\dots$}\}.\end{split}

For example, in Figure 2 we have Iu3={abdg###,abeh###,acei###}I_{u_{3}}=\{abdg\#\#\#\dots,abeh\#\#\#\dots,acei\#\#\#\dots\}.

The following lemma shows that the permutation of the states defined by the Wheeler order is the one lexicographically sorting the strings entering each state, just like the permutation defined by the suffix array lexicographically sorts the suffixes of the strings (a suffix is seen as a string “leaving” a text position).

Lemma 3.

Let i,j[1,n]i,j\in[1,n], with i<ji<j. Let αIui\alpha\in I_{u_{i}} and βIuj\beta\in I_{u_{j}}. Then, αβ\alpha\preceq\beta.

Proof.

Let f1,f2,f_{1},f_{2},\dots in [1,n][1,n] be such that (i) f1=if_{1}=i, (ii) (ufk+1,ufk)E(u_{f_{k+1}},u_{f_{k}})\in E for every k1k\geq 1 and (iii) α=λ(uf1)λ(uf2)\alpha=\lambda(u_{f_{1}})\lambda(u_{f_{2}})\dots. Analogously, let g1,g2,g_{1},g_{2},\dots in [1,n][1,n] be such that (i) g1=jg_{1}=j, (ii) (ugk+1,ugk)E(u_{g_{k+1}},u_{g_{k}})\in E for every k1k\geq 1 and (iii) β=λ(ug1)λ(ug2)\beta=\lambda(u_{g_{1}})\lambda(u_{g_{2}})\dots. Let αβ\alpha\not=\beta. We must prove that αβ\alpha\prec\beta. Let p1p\geq 1 be the smallest integer such that the pp-th character of α\alpha is different than the pp-th character of β\beta. In other words, we know that λ(uf1)=λ(ug1)\lambda(u_{f_{1}})=\lambda(u_{g_{1}}), λ(uf2)=λ(ug2)\lambda(u_{f_{2}})=\lambda(u_{g_{2}}), \dots, λ(ufp1)=λ(ugp1)\lambda(u_{f_{p-1}})=\lambda(u_{g_{p-1}}), but λ(ufp)λ(ugp)\lambda(u_{f_{p}})\not=\lambda(u_{g_{p}}). We must prove that λ(ufp)λ(ugp)\lambda(u_{f_{p}})\prec\lambda(u_{g_{p}}). Since λ(uf1)=λ(ug1)\lambda(u_{f_{1}})=\lambda(u_{g_{1}}) f1=i<j=g1f_{1}=i<j=g_{1}, and (uf2,uf1),(ug2,ug1)E(u_{f_{2}},u_{f_{1}}),(u_{g_{2}},u_{g_{1}})\in E, from Axiom 2 we obtain f2<g2f_{2}<g_{2}. Since λ(uf2)=λ(ug2)\lambda(u_{f_{2}})=\lambda(u_{g_{2}}), f2<g2f_{2}<g_{2}, and (uf3,uf2),(ug3,ug2)E(u_{f_{3}},u_{f_{2}}),(u_{g_{3}},u_{g_{2}})\in E, from Axiom 2 we obtain f3<g3f_{3}<g_{3}. By iterating this argument, we conclude fp<gpf_{p}<g_{p}. By Axiom 1, we obtain λ(ufp)λ(ugp)\lambda(u_{f_{p}})\preceq\lambda(u_{g_{p}}). Since λ(ufp)λ(ugp)\lambda(u_{f_{p}})\not=\lambda(u_{g_{p}}), we conclude λ(ufp)λ(ugp)\lambda(u_{f_{p}})\prec\lambda(u_{g_{p}}). ∎

If we think of a string as a labeled path, then the suffix array sorts the strings that can be read from each position by moving forward (that is, the suffixes of the string), while the Wheeler order sorts the strings that can be read from each position by moving backward towards the initial state. The underlying idea is the same: the forward vs backward difference is only due to historical reasons [6]. To compute the matching statistics on Wheeler DFA we reason as in the previous section replacing backward search with the forward search [6] defined as follows: given an interval [i,j][i,j] in [1,n][1,n] and aΣa\in\Sigma, find the (possibly empty) interval [i,j][i^{\prime},j^{\prime}] in [1,n][1,n] such that a state vkv_{k^{\prime}} is reachable from some state vkv_{k}, with ikji\leq k\leq j, through an edge labeled aa, if and only if ikji^{\prime}\leq k^{\prime}\leq j^{\prime} (this easily follows by using the axioms of Definition 1). For a constant size alphabet, given [i,j][i,j] and aa then [i,j][i^{\prime},j^{\prime}] can be determined in constant time. Given a string πΣ\pi\in\Sigma^{*}, if we start from the whole set of states and repeatedly apply the forward search we reach the set of all states uiu_{i} for which there exists αIui\alpha\in I_{u_{i}} prefixed by πR\pi^{R}; this is an interval with respect to the Wheeler order: in the following we call this interval T(π)T(\pi).

Because of the forward vs backward difference the problem of matching statistics will be defined in a symmetrical way on Wheeler DFAs. Given a pattern π=π[1..m]\pi=\pi[1..m], for every 1im1\leq i\leq m we want to determine (i) the longest suffix π\pi^{\prime} of π[1..i]\pi[1..i] which occurs in the Wheeler DFA 𝒜\mathcal{A} (that is, that can be read somewhere on 𝒜\mathcal{A} by concatenating edges), and (ii)(ii) the endpoints of the interval T(π)T(\pi^{\prime}).

Broadly speaking, we can apply the same idea of the algorithm for strings, but in a symmetrical way. We start from the beginning of π\pi (not from the end of π)\pi), and initially we consider the whole set of states. We repeatedly apply the forward search (not the backward search), until the forward search returns the empty interval for some i0i\geq 0. This means that π[1..i+1]\pi[1..i+1] does not occur in 𝒜\mathcal{A}. Then, if T(π[1..i])T(\pi[1..i]) is the whole set of states, we conclude that the character π[i+1]\pi[i+1] labels no edge in the graph. Otherwise, we must find the smallest jj such that T(π[1..i])T(\pi[1..i]) is strictly contained in T(π[j..i])T(\pi[j..i]) (that is, we must determine the longest suffix π[j..i]\pi[j..i] of π[1..i]\pi[1..i] which reaches more states than π[1..i]\pi[1..i]). Then we must determine the endpoints of the interval T(π[j..i])T(\pi[j..i]) so that we can go on with the forward search.

The challenge now is to find a way to solve the same subproblems that we identified in Case 2 of the algorithm for strings. In other words, we must find a way to determine jj and find the endpoints of the interval T(π[j..i])T(\pi[j..i]). We will show that the solution is not as simple as the one for the algorithm on strings.

5 The LCP array and matching statistics for Wheeler DFAs

We start observing that IuiI_{u_{i}} may be an infinite set. For example, in Figure 2, we have

Iu2={aaaaa,abdf###,aabdf###,aaabdf###,}.I_{u_{2}}=\{aaaaa\dots,abdf\#\#\#\dots,aabdf\#\#\#\dots,aaabdf\#\#\#\dots,\dots\}.

In general, an infinite set of (lexicographically sorted) strings in Σω\Sigma^{\omega} need not admit a minimum or a maximum. For example, the set {baaaa,abaaa,aabaa,aaaba}\{baaaa\dots,abaaa\dots,aabaa\dots,aaaba\dots\} does not admit a minimum (but only the infimum string aaaaaaaaaa\dots). Nonetheless, Lemma 3 implies that each IuiI_{u_{i}} admits both a minimum and a maximum. For example, the minimum is obtained as follows. Let f1=if_{1}=i, and for every k1k\geq 1, recursively let fk+1f_{k+1} be the smallest integer in [1,n][1,n] such that (ufk+1,ufk)E(u_{f_{k+1}},u_{f_{k}})\in E. Then, the minimum of IuiI_{u_{i}} is λ(uf1)λ(uf2)\lambda(u_{f_{1}})\lambda(u_{f_{2}})\dots, and analogously one can determine the maximum.

In the following, we will denote the minimum and the maximum of IuiI_{u_{i}} by mini\min_{i} and maxi\max_{i}, respectively (for example, in Figure 2 we have min2=aaaaa\min_{2}=aaaaa\dots, and max2=abdf###\max_{2}=abdf\#\#\#\dots). Lemma 3 implies that:

min1max1min2max2maxn1minnmaxn.\mathrm{min}_{1}\preceq\mathrm{max}_{1}\preceq\mathrm{min}_{2}\preceq\mathrm{max}_{2}\preceq\dots\preceq\mathrm{max}_{n-1}\preceq\mathrm{min}_{n}\preceq\mathrm{max}_{n}.

This suggests to generalize the LCP array as follows. Given α,βΣΣω\alpha,\beta\in\Sigma^{*}\cup\Sigma^{\omega}, let 𝗅𝖼𝗉(α,β)\mathop{\mathsf{lcp}}(\alpha,\beta) be the length of the longest common prefix of α\alpha and β\beta (if α=βΣω\alpha=\beta\in\Sigma^{\omega}, define 𝗅𝖼𝗉(α,β)=\mathop{\mathsf{lcp}}(\alpha,\beta)=\infty).

Definition 4.

The LCP-array of a Wheeler automaton 𝒜\mathcal{A} is the array 𝖫𝖢𝖯𝒜=𝖫𝖢𝖯𝒜[2,2n]\mathsf{LCP}_{\mathcal{A}}=\mathsf{LCP}_{\mathcal{A}}[2,2n] which contains the following 2n12n-1 values in this order: 𝗅𝖼𝗉(min1,max1)\mathop{\mathsf{lcp}}(\min_{1},\max_{1}), 𝗅𝖼𝗉(max1,min2)\mathop{\mathsf{lcp}}(\max_{1},\min_{2}), 𝗅𝖼𝗉(min2,max2)\mathop{\mathsf{lcp}}(\min_{2},\max_{2}), \dots, 𝗅𝖼𝗉(maxn1,minn)\mathop{\mathsf{lcp}}(\max_{n-1},\min_{n}), 𝗅𝖼𝗉(minn,maxn)\mathop{\mathsf{lcp}}(\min_{n},\max_{n}).

From the above characterization of mini\min_{i} and maxi\max_{i}, one can prove that for every entry either 𝖫𝖢𝖯𝒜[i]=\mathsf{LCP}_{\mathcal{A}}[i]=\infty or 𝖫𝖢𝖯𝒜[i]<3n\mathsf{LCP}_{\mathcal{A}}[i]<3n (it follows from Fine and Wilf Theorem [18, 19]), and one can design a polynomial time algorithm to compute 𝖫𝖢𝖯𝒜\mathsf{LCP}_{\mathcal{A}}.

Unfortunately, the array 𝖫𝖢𝖯𝒜\mathsf{LCP}_{\mathcal{A}} alone is not sufficient for computing matching statistics. Assume that T(π)={ur,ur+1,,us1,us}T(\pi)=\{u_{r},u_{r+1},\dots,u_{s-1},u_{s}\}, and that when we apply the forward search by adding a character cc, we obtain T(πc)=T(\pi c)=\emptyset. We must then determine the largest suffix π\pi^{\prime} of T(π)T(\pi) such that T(π)T(\pi) is strictly contained in T(π)T(\pi^{\prime}). Suppose that every string in IurI_{u_{r}} is prefixed by πR\pi^{R}, and every string in IusI_{u_{s}} is prefixed by πR\pi^{R}. In particular, both minr\min_{r} and maxs\max_{s} are prefixed by πR\pi^{R}. In this case, we can proceed like in the algorithm for strings: the desired suffix π\pi^{\prime} is the one having length max{𝗅𝖼𝗉(maxr1,minr),𝗅𝖼𝗉(maxs,mins+1)}\max\{\mathop{\mathsf{lcp}}(\max_{r-1},\min_{r}),\mathop{\mathsf{lcp}}(\max_{s},\min_{s+1})\}, which can be determined using 𝖫𝖢𝖯𝒜\mathsf{LCP}_{\mathcal{A}}. However, in general, even if some string in IurI_{u_{r}} must be prefixed by πR\pi^{R}, the string minr\min_{r} need not be prefixed by πR\pi^{R}, and similarly maxs\max_{s} need not be prefixed by πR\pi^{R}. The worst-case scenario occurs when r=sr=s. Consider Figure 2, and assume that π=heba\pi=heba. Then, we have r=s=3r=s=3 (note that abeh###abeh\#\#\#\dots is a string in Iu3I_{u_{3}} prefixed by πR\pi^{R}). However, both min3=abdg###\min_{3}=abdg\#\#\#\dots, and max3=acei###\max_{3}=acei\#\#\#\dots, are not prefixed by πR\pi^{R}. Notice that 𝗅𝖼𝗉(max2,min3)=3\mathop{\mathsf{lcp}}(\max_{2},\min_{3})=3 and 𝗅𝖼𝗉(max3,min4)=3\mathop{\mathsf{lcp}}(\max_{3},\min_{4})=3, but π\pi^{\prime} is not the suffix of length 3 of π\pi. Indeed, since min3\min_{3} is only prefixed by the prefix of πR\pi^{R} of length 22, and max3\max_{3} is only prefixed by the prefix of πR\pi^{R} of length 11, we conclude that it must be |π|=2|\pi^{\prime}|=2. In general, the desired suffix π\pi^{\prime} is the one having length |π||\pi^{\prime}| given by:

max{min{𝗅𝖼𝗉(maxr1,minr),𝗅𝖼𝗉(minr,πR)},min{𝗅𝖼𝗉(πR,maxs),𝗅𝖼𝗉(maxs,mins+1)}}.\max\left\{\,\mathrm{min}\{\mathop{\mathsf{lcp}}(\mathrm{max}_{r-1},\mathrm{min}_{r}),\!\mathop{\mathsf{lcp}}(\mathrm{min}_{r},\pi^{R})\},\mathrm{min}\{\mathop{\mathsf{lcp}}(\pi^{R},\mathrm{max}_{s}),\!\mathop{\mathsf{lcp}}(\mathrm{max}_{s},\mathrm{min}_{s+1})\}\,\right\}. (1)

The above formula shows that, in order to compute π\pi^{\prime}, in addition to 𝖫𝖢𝖯𝒜\mathsf{LCP}_{\mathcal{A}} it suffices to know the values 𝗅𝖼𝗉(minr,πR)\mathop{\mathsf{lcp}}(\min_{r},\pi^{R}) and 𝗅𝖼𝗉(πR,maxs)\mathop{\mathsf{lcp}}(\pi^{R},\max_{s}) (π\pi^{\prime} is a suffix of π\pi, so it is determined by its length). We now show how our algorithm can efficiently maintain the current pattern π\pi, the set T(π)={ur,ur+1,,us1,us}T(\pi)=\{u_{r},u_{r+1},\dots,u_{s-1},u_{s}\} and the values 𝗅𝖼𝗉(minr,πR)\mathop{\mathsf{lcp}}(\min_{r},\pi^{R}) and 𝗅𝖼𝗉(πR,maxs)\mathop{\mathsf{lcp}}(\pi^{R},\max_{s}) during the computation of the matching statistics. We assume that the input automaton is encoded with the rank/select data structures supporting the execution of a step of forward search in O(log|Σ|)O(\log|\Sigma|) time, see [6] for details. In addition, we will use the following result.

Lemma 5.

Let A[1,n]A[1,n] be a sequence of values over an ordered alphabet Σ\Sigma. Consider the following queries: (i) given i,j[1..n]i,j\in[1..n], compute the minimum value in S[i..j]S[i..j], and (ii) given t[1..n]t\in[1..n] and cΣc\in\Sigma, determine the biggest k<tk<t (or the smallest k>tk>t) such that A[k]<cA[k]<c. Then, AA can be augumented with a data structure of 2n+o(n)2n+o(n) bits such that query (i) can be answered in constant time and query (ii) can be answered in O(logn)O(\log n) time.

Proof.

There exists a data structure of 2n+o(n)2n+o(n) bits that allows to solve range minimum queries in constant time [20], so using AA we can solve queries (i) in constant time. Now, let us show how to solve queries (ii). Let f1f_{1} be the answer of query (i) on input i=t/2i=\lceil t/2\rceil and j=t1j=t-1. If f1<cf_{1}<c, then we must keep searching in the interval [t/2,t1][\lceil t/2\rceil,t-1], otherwise, we must keep searching in the interval [1,t/21][1,\lceil t/2\rceil-1]. In other words, we can answer a query (ii) by means of a binary search on [1,t1][1,t-1], which takes O(logt)O(\log t) (and so O(logn)O(\log n)) time. ∎

Notice that query (ii) can be seen as a variant of PSV and NSV queries. In the following, we assume that the array 𝖫𝖢𝖯𝒜\mathsf{LCP}_{\mathcal{A}} has been augmented with the data structure of Lemma 5.

At the beginning we have π=ϵ\pi=\epsilon, so T(ϵ)={1,2,,n}T(\epsilon)=\{1,2,\ldots,n\} and trivially 𝗅𝖼𝗉(minr,πR)=𝗅𝖼𝗉(πR,maxs)=0\mathop{\mathsf{lcp}}(\min_{r},\pi^{R})=\mathop{\mathsf{lcp}}(\pi^{R},\max_{s})=0. At each iteration we perform a step of forward search computing T(πc)T(\pi c) given T(π)T(\pi); then we distinguish two cases according to whether T(πc)T(\pi c) is empty or not.

Case 1. T(πc)={ur,ur+1,,us1,us}T(\pi c)=\{u_{r^{\prime}},u_{r^{\prime}+1},\dots,u_{s^{\prime}-1},u_{s^{\prime}}\} is not empty. In that case πc\pi c will become the pattern at the next iteration. Since we already have T(πc)T(\pi c) we are left with the task of computing 𝗅𝖼𝗉(minr,cπR)\mathop{\mathsf{lcp}}(\min_{r^{\prime}},c\pi^{R}) and 𝗅𝖼𝗉(cπR,maxs)\mathop{\mathsf{lcp}}(c\pi^{R},\max_{s^{\prime}}). We only show how to compute 𝗅𝖼𝗉(minr,cπR)\mathop{\mathsf{lcp}}(\min_{r^{\prime}},c\pi^{R}), the latter computation being analogous. Let kk be the smallest integer in [1,n][1,n] such that (uk,ur)E(u_{k},u_{r^{\prime}})\in E. Notice that we can easily compute kk by means of standard rank/select operations on the compact data structure used to encode 𝒜\mathcal{A}. Since urT(πc)u_{r^{\prime}}\in T(\pi c), it must be ksk\leq s. Moreover, the characterization of minr\min_{r^{\prime}} that we described above implies that minr=cmink\min_{r^{\prime}}=c\min_{k}, hence 𝗅𝖼𝗉(minr,cπR)=𝗅𝖼𝗉(cmink,cπR)=1+𝗅𝖼𝗉(mink,πR)\mathop{\mathsf{lcp}}(\min_{r^{\prime}},c\pi^{R})=\mathop{\mathsf{lcp}}(c\min_{k},c\pi^{R})=1+\mathop{\mathsf{lcp}}(\min_{k},\pi^{R}). To compute 𝗅𝖼𝗉(mink,πR)\mathop{\mathsf{lcp}}(\min_{k},\pi^{R}) we distinguish two subcases:

  1. a)a)

    k>rk>r, hence r<ksr<k\leq s. Since ur,usT(π)u_{r},u_{s}\in T(\pi), there exist αIur\alpha\in I_{u_{r}} and βIus\beta\in I_{u_{s}} both prefixed by πR\pi^{R}. But αmaxrminkminsβ\alpha\preceq\max_{r}\preceq\min_{k}\preceq\min_{s}\preceq\beta, so mink\min_{k} is also prefixed by πR\pi^{R}, and we conclude 𝗅𝖼𝗉(mink,πR)=|π|\mathop{\mathsf{lcp}}(\min_{k},\pi^{R})=|\pi|.

  2. b)b)

    krk\leq r. In this case, we have minkmaxkmink+1maxk+1minrπR\min_{k}\preceq\max_{k}\preceq\min_{k+1}\prec\max_{k+1}\preceq\dots\preceq\min_{r}\prec\pi^{R}, and therefore 𝗅𝖼𝗉(mink,πR)\mathop{\mathsf{lcp}}(\mathrm{min}_{k},\pi^{R}) is equal to

    min{𝗅𝖼𝗉(mink,maxk),𝗅𝖼𝗉(maxk,mink+1),𝗅𝖼𝗉(mink+1,maxk+1),,𝗅𝖼𝗉(minr,πR)}.\mathrm{min}\{\mathop{\mathsf{lcp}}(\mathrm{min}_{k},\mathrm{max}_{k}),\mathop{\mathsf{lcp}}(\mathrm{max}_{k},\mathrm{min}_{k+1}),\mathop{\mathsf{lcp}}(\mathrm{min}_{k+1},\mathrm{max}_{k+1}),\dots,\mathop{\mathsf{lcp}}(\mathrm{min}_{r},\pi^{R})\}.

    With the above formula we can compute 𝗅𝖼𝗉(mink,πR)\mathop{\mathsf{lcp}}(\min_{k},\pi^{R}) using query (i) of Lemma 5 over the range 𝖫𝖢𝖯𝒜[2k,2r1]\mathsf{LCP}_{\mathcal{A}}[2k,2r-1] and the value 𝗅𝖼𝗉(minr,πR)\mathop{\mathsf{lcp}}(\min_{r},\pi^{R}).

Case 2. T(πc)T(\pi c) is empty. In this case at the next iteration the pattern will be largest suffix π\pi^{\prime} of π\pi such that T(π)T(\pi) is strictly contained in T(π)={ur′′,,us′′}T(\pi^{\prime})=\{u_{r^{\prime\prime}},\dots,u_{s^{\prime\prime}}\}. We compute |π||\pi^{\prime}| using (1); if |π|>𝗅𝖼𝗉(minr,πR)|\pi^{\prime}|>\mathop{\mathsf{lcp}}(\mathrm{min}_{r},\pi^{R}) we set r′′=rr^{\prime\prime}=r, otherwise we apply query (ii) of Lemma 5 to find the rightmost entry r′′r^{\prime\prime} in 𝖫𝖢𝖯𝒜[2,2r1]\mathsf{LCP}_{\mathcal{A}}[2,2r-1] smaller than |π||\pi^{\prime}|. Computing s′′s^{\prime\prime} is analogous.

Given T(π)={ur′′,ur′′+1,,us′′1,us′′}T(\pi^{\prime})=\{u_{r^{\prime\prime}},u_{r^{\prime\prime}+1},\dots,u_{s^{\prime\prime}-1},u_{s^{\prime\prime}}\}, where r′′rr^{\prime\prime}\leq r, ss′′s\leq s^{\prime\prime}, and at least one inequality is strict, we want to compute 𝗅𝖼𝗉(minr′′,(π)R)\mathop{\mathsf{lcp}}(\min_{r^{\prime\prime}},(\pi^{\prime})^{R}) and 𝗅𝖼𝗉((π)R,maxs′′)\mathop{\mathsf{lcp}}((\pi^{\prime})^{R},\max_{s^{\prime\prime}}). We only consider 𝗅𝖼𝗉(minr′′,(π)R)\mathop{\mathsf{lcp}}(\min_{r^{\prime\prime}},(\pi^{\prime})^{R}), the latter computation being analogous. We distinguish two subcases:

  1. a)a)

    r′′=rr^{\prime\prime}=r. Then 𝗅𝖼𝗉(minr′′,(π)R)=𝗅𝖼𝗉(minr,(π)R)=min{𝗅𝖼𝗉(minr,πR),|π|}\mathop{\mathsf{lcp}}(\min_{r^{\prime\prime}},(\pi^{\prime})^{R})=\mathop{\mathsf{lcp}}(\min_{r},(\pi^{\prime})^{R})=\min\{\mathop{\mathsf{lcp}}(\min_{r},\pi^{R}),|\pi^{\prime}|\}.

  2. b)b)

    r′′<rr^{\prime\prime}<r. In particular, since ur′′u_{r^{\prime\prime}} is the left endpoint of T(π)T(\pi^{\prime}) and |T(π)|2|T(\pi^{\prime})|\geq 2, one can prove like in Case 1a)1a) that maxr′′\max_{r^{\prime\prime}} is prefixed by (π)R(\pi^{\prime})^{R}. We immediately conclude that 𝗅𝖼𝗉(minr′′,(π)R)=min{𝗅𝖼𝗉(minr′′,maxr′′),|π|}\mathop{\mathsf{lcp}}(\min_{r^{\prime\prime}},(\pi^{\prime})^{R})=\min\{\mathop{\mathsf{lcp}}(\min_{r^{\prime\prime}},\max_{r^{\prime\prime}}),|\pi^{\prime}|\}, which can be immediately computed since 𝗅𝖼𝗉(minr′′,maxr′′)\mathop{\mathsf{lcp}}(\min_{r^{\prime\prime}},\max_{r^{\prime\prime}}) is a value stored in 𝖫𝖢𝖯𝒜\mathsf{LCP}_{\mathcal{A}}.

We can summarize the above discussion as follows.

Theorem 6.

Given a Wheeler DFA 𝒜\mathcal{A}, there exists a data structure occupying O(|𝒜|)O(|\mathcal{A}|) words which can compute the pattern matching statistics of a pattern PP in time O(|P|log|𝒜|)O(|P|\log|\mathcal{A}|).

Funding

TG funded by National Institutes of Health (NIH) NIAID (grant no. HG011392), the National Science Foundation NSF IIBR (grant no. 2029552) and a Natural Science and Engineering Research Council (NSERC) Discovery Grant (grant no. RGPIN-07185-2020). GM funded by the Italian Ministry of University and Research (PRIN 2017WR7SHH). MS funded by the INdAM-GNCS Project (CUP_E55F22000270001). NP funded by the European Union (ERC, REGINDEX, 101039208). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council. Neither the European Union nor the granting authority can be held responsible for them.

6 References

References

  • [1] P. Ferragina and G. Manzini, “Opportunistic data structures with applications,” in Proc. 41st Annual Symposium on Foundations of Computer Science (FOCS’00), 2000, pp. 390–398.
  • [2] M. Burrows and D. J. Wheeler, “A block-sorting lossless data compression algorithm,” Tech. Rep., 1994.
  • [3] P. Ferragina, F. Luccio, G. Manzini, and S. Muthukrishnan, “Structuring labeled trees for optimal succinctness, and beyond,” in proc. 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS’05), 2005, pp. 184–193.
  • [4] A. Bowe, T. Onodera, K. Sadakane, and T. Shibuya, “Succinct de Bruijn graphs,” in Algorithms in Bioinformatics, Berlin, Heidelberg, 2012, pp. 225–235, Springer Berlin Heidelberg.
  • [5] V. Mäkinen, N. Välimäki, and J. Sirén, “Indexing graphs for path queries with applications in genome research,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 11, pp. 375–388, 2014.
  • [6] T. Gagie, G. Manzini, and J. Sirén, “Wheeler graphs: A framework for BWT-based data structures,” Theoret. Comput. Sci., vol. 698, pp. 67 – 78, 2017.
  • [7] J. Alanko, G. D’Agostino, A. Policriti, and N. Prezza, “Regular languages meet prefix sorting,” in Proc. of the 31st Symposium on Discrete Algorithms, (SODA’20). 2020, pp. 911–930, SIAM.
  • [8] N. Cotumaccio and N. Prezza, “On indexing and compressing finite automata,” in Proc. of the 32nd Symposium on Discrete Algorithms, (SODA’21). 2021, pp. 2585–2599, SIAM.
  • [9] N. Cotumaccio, “Graphs can be succinctly indexed for pattern matching in O(|E|2+|V|5/2)O(|E|^{2}+|V|^{5/2}) time,” in 2022 Data Compression Conference (DCC), 2022, pp. 272–281.
  • [10] J. Alanko, G. D’Agostino, A. Policriti, and N. Prezza, “Wheeler languages,” Information and Computation, vol. 281, pp. 104820, 2021.
  • [11] D. Gusfield, Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Cambridge University Press, 1997.
  • [12] W. I. Chang and E. L. Lawler, “Sublinear approximate string matching and biological applications,” Algorithmica, vol. 12, pp. 327–344, 2005.
  • [13] P. Weiner, “Linear pattern matching algorithms,” in Proc. 14th IEEE Annual Symposium on Switching and Automata Theory, 1973, pp. 1–11.
  • [14] U. Manber and G. Myers, “Suffix arrays: A new method for on-line string searches,” SIAM J. Comput., vol. 22, no. 5, pp. 935–948, 1993.
  • [15] M. I. Abouelhoda, S. Kurtz, and E. Ohlebusch, “Replacing suffix trees with enhanced suffix arrays,” J. of Discrete Algorithms, vol. 2, no. 1, pp. 53–86, 2004.
  • [16] K. Sadakane, “Compressed suffix trees with full functionality,” Theor. Comp. Sys., vol. 41, no. 4, pp. 589–607, 2007.
  • [17] E. Ohlebusch, S. Gog, and A. Kügell, “Computing matching statistics and maximal exact matches on compressed full-text indexes,” in Proceedings of the 17th International Conference on String Processing and Information Retrieval (SPIRE’10), Berlin, Heidelberg, 2010, p. 347–358, Springer-Verlag.
  • [18] N. J. Fine and H. S. Wilf, “Uniqueness theorem for periodic functions,” Proc. Amer. Math. Soc., , no. 16, pp. 109–114, 1965.
  • [19] S. Mantaci, A. Restivo, G. Rosone, and M. Sciortino, “An extension of the Burrows-Wheeler transform,” Theor. Comput. Sci., vol. 387, no. 3, pp. 298–312, 2007.
  • [20] Johannes Fischer, “Optimal succinctness for range minimum queries,” in LATIN 2010: Theoretical Informatics, Alejandro López-Ortiz, Ed., Berlin, Heidelberg, 2010, pp. 158–169, Springer Berlin Heidelberg.