This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Covering Sequences for \ell-tuples

Sagi Marcovich, , Tuvi Etzion,  and Eitan Yaakobi,  S. Marcovich, T. Etzion and E. Yaakobi are with the Department of Computer Science, Technion — Israel Institute of Technology, Haifa 3200003, Israel (e-mail: {sagimar,etzion,yaakobi}@cs.technion.ac.il).
Abstract

de Bruijn sequences of order \ell, i.e., sequences that contain each -tuple {\ell\text{-tuple }}as a window exactly once, have found many diverse applications in information theory and most recently in DNA storage. This family of binary sequences has asymptotic rate of 1/21/2. To overcome this low rate, we study \ell-tuples covering sequences, which impose that each \ell-tuple appears at least once as a window in the sequence. The cardinality of this family of sequences is analyzed while assuming that \ell is a function of the sequence length nn. Lower and upper bounds on the asymptotic rate of this family are given. Moreover, we study an upper bound for \ell such that the redundancy of the set of \ell-tuples covering sequences is at most a single symbol. Lastly, we present efficient encoding and decoding schemes for \ell-tuples covering sequences that meet this bound.

I Introduction

The binary de Bruijn graph of order \ell, GG_{\ell}, was introduced in 1946 by de Bruijn [6]. His target in introducing this graph was to find a recursive method to enumerate the number of cyclic binary sequences of length 22^{\ell} such that each \ell-tuple appears as a window exactly once in each sequence. These sequences were later called de Bruijn sequences. These results were later generalized in [22] for any alphabet of finite size qq, using a qq-ary generalization of the de Bruijn graph of order \ell, Gq,G_{q,\ell}.

The vertices of Gq,G_{q,\ell} are the qq-ary (1)(\ell-1)-tuples, and its edges correspond to the qq-ary \ell-tuples. There is an edge uv{u\rightarrow v} if vv can be obtained from uu by shifting one entry left and appending a symbol. Eulerian cycles in de Bruijn graphs, i.e., cycles that visit all the edges of Gq,G_{q,\ell} exactly once, are called de Bruijn cycles. It was proved that the number of de Bruijn cycles in Gq,G_{q,\ell} is (q!)q1/q(q!)^{q^{\ell-1}}/{q^{\ell}} [22].

Each de Bruijn cycle induces a single (cyclic) de Bruijn sequence of length qq^{\ell}, by picking any edge in the cycle as a starting point, considering its first entry and appending the first entry of each consecutive edge in the cycle. All sequences that can be generated in this way are considered as the same sequence. Contrary to this, each de Bruijn cycle induces qq^{\ell} distinct acyclic de Bruijn sequences, i.e., sequences of length q+1q^{\ell}+\ell-1 that contain each \ell-tuple as a window exactly once, using a similar method with the exception of appending the (1\ell-1)-suffix of the last edge as well; each sequence corresponds to a choice of different starting edge from Gq,G_{q,\ell}. Hence, the number of such acyclic de Bruijn sequences is (q!)q1(q!)^{q^{\ell-1}} and their asymptotic rate is logq(q!)/q\log_{q}(q!)/q (equals 1/21/2 for q=2q=2). One of the first applications of the de Bruijn graph was in the introduction of shift-register sequences and linear feedback shift registers [11]. Throughout the years, an extensive number of papers have studied the de Bruijn sequences and their applications, several of those include [2, 4, 8, 9, 13, 17, 19]. Most recently, DNA storage has brought fresh interest to this family of sequences; for more information on such applications the reader is referred to [1, 12, 21].

This paper studies a novel generalization of de Bruijn sequences (for the rest of this paper we refer only to acyclic sequences). We say that a sequence is an \ell-tuples covering sequence if it contains each qq-ary \ell-tuple as a window at least once. This work follows recent generalizations of de Bruijn sequences that proposed unique variations regarding the appearances of \ell-tuples in the sequence: \ell-repeat free sequences [7, 10] require each \ell-tuple to appear at most once, (b,b,\ell)-locally-constrained de Bruijn sequences [3] require each \ell-tuple to appear at most once in every window of length bb, and (,μ)(\ell,\mu)-balanced de Bruijn sequences [16] require each -tuple{\ell\text{-tuple}} to appear exactly μ\mu times in the sequence.

Notice that for sequences of length q+1q^{\ell}+\ell-1, all \ell-tuples covering sequences are simply the de Bruijn sequences deduced from the de Bruijn graph Gq,G_{q,\ell}; as a result, their asymptotic rate is logq(q!)/q\log_{q}(q!)/q. Our main goal is to efficiently construct codes of \ell-tuples covering sequences with higher rates (specifically larger than 1/21/2 for binary sequences) and fixed number of redundancy symbols. We study the cardinality for the set of \ell-tuples covering sequences and present lower bounds on its asymptotic rate for various values of \ell. Additionally, we present an upper bound on \ell such that the redundancy of the set of all \ell-tuples covering sequences is at most one symbol. Later, we present an encoding algorithm for the set of binary \ell-tuples covering sequences that uses a single redundancy bit and meets this bound on \ell. Finally, we use a generalization of the de Bruijn graph to develop an upper bound for the cardinality of this set of sequences.

Another interesting family of sequences is introduced as a building block to our analysis of \ell-tuples covering sequences. For some \ell-tuple 𝒗{\boldsymbol{v}}, we say that a sequence is a 𝐯{\boldsymbol{v}}-avoiding sequence if it does not contain 𝒗{\boldsymbol{v}} as a window. Note that if 𝒗{\boldsymbol{v}} is the all-zero \ell-tuple, then this family of sequences is known as RLL sequences and was studied before, for example in [14, 20]. We study this family of sequences for any \ell-tuple 𝒗{\boldsymbol{v}}.

The rest of this paper is organized as follows. In Section II we formally define the families of sequences studied in this paper and review several previous results. In Section III, we study the family of 𝒗{\boldsymbol{v}}-avoiding sequences for any \ell-tuple 𝒗{\boldsymbol{v}}. Based on these results, in Section IV we analyze the cardinality of \ell-tuples covering sequences and present an encoding scheme for q=2q=2 which uses a single redundancy bit.

II Definitions and Preliminaries

For two integers i,ki,k\in\mathbb{N} such that iki\leqslant k we denote by [i,k][i,k] the set {i,,k}\{i,\dots,k\} and use [k][k] as a shorthand for [0,k1][0,k-1]. We use the notation Σq={0,1,,q1}\Sigma_{q}=\{0,1,\dots,q-1\} as the alphabet of finite size qq. For simplicity, when q=2q=2, we omit the parameter qq from this notation and similar ones.

Let nn\in\mathbb{N} and let 𝒘=(w0,,wn1)Σqn{\boldsymbol{w}}=(w_{0},\dots,w_{n-1})\in\Sigma_{q}^{n} denote a sequence of length nn. For two positive integers ii and kk such that i+k1ni+k-1\leqslant n, let 𝒘i,k{\boldsymbol{w}}_{i,k} denote the substring (wi,,wi+k1)(w_{i},\dots,w_{i+k-1}). Additionally, let Pref(𝒘)k𝒘0,k,{}_{k}({\boldsymbol{w}})\triangleq{\boldsymbol{w}}_{0,k}, Suff(𝒘)k𝒘nk,k{}_{k}({\boldsymbol{w}})\triangleq{\boldsymbol{w}}_{n-k,k} denote the kk-prefix, kk-suffix of 𝒘{\boldsymbol{w}}, respectively. The notation 𝒘𝒗{\boldsymbol{w}}\circ{\boldsymbol{v}} is the concatenation of 𝒘{\boldsymbol{w}} and another sequence 𝒗{\boldsymbol{v}}, and 𝒘i{\boldsymbol{w}}^{i} denotes the concatenation of 𝒘{\boldsymbol{w}} ii times, i.e., 𝒘i=𝒘𝒘i1{{\boldsymbol{w}}^{i}={\boldsymbol{w}}\circ{\boldsymbol{w}}^{i-1}}. Let W,VW,V denote two sets of vectors over Σq\Sigma_{q}. We denote the set WV={𝒘𝒗𝒘W,𝒗V}WV=\{{\boldsymbol{w}}\circ{\boldsymbol{v}}\mid{\boldsymbol{w}}\in W,{\boldsymbol{v}}\in V\} and the set WiW^{i} to be ii concatenations of the set WW. Finally, the redundancy of a set AΣqnA\subseteq\Sigma_{q}^{n} is defined as red(A)nlogq|A|\text{red}({A})\triangleq n-\log_{q}|A|.

Definition 1

. The \ell-th order qq-ary de Bruijn graph Gq,G_{q,\ell} is the digraph (V,E)(V,E), where V=Σq1V=\Sigma_{q}^{\ell-1} and

E={((s0,s1,,s2),(s1,s2,,s1))siΣq}.E=\{\left((s_{0},s_{1},\dots,s_{\ell-2}),(s_{1},s_{2},\dots,s_{\ell-1})\right)\mid s_{i}\in\Sigma_{q}\}.

Note that the edges of Gq,G_{q,\ell} correspond to the set of qq-ary -tuples{\ell\text{-tuples}}, Σq\Sigma_{q}^{\ell}.

Definition 2

. Let >1\ell>1 be an integer and n=q+1n=q^{\ell}+\ell-1. A sequence 𝒔Σqn{\boldsymbol{s}}\in\Sigma_{q}^{n} is called a de Bruijn sequence of order \ell if 𝒔{\boldsymbol{s}} contains each qq-ary \ell-tuple as a window exactly once.

Let 𝒮q(){\cal S}_{q}(\ell) denote the set of qq-ary de Bruijn sequences of order \ell. The connection between Eulerian cycles in Gq,G_{q,\ell} to de Bruijn sequences is as follows. In order to generate a sequence from a cycle, we pick any edge in the cycle and set its first entry as the start of the sequence. Then, we append to the sequence the first entry of each consecutive edge in the cycle. Finally, we append the (1)(\ell-1)-suffix of the last edge of the cycle to form the whole sequence. Note that since each edge of Gq,G_{q,\ell} can be picked as the first edge of the Eulerian cycle, a single cycle generates qq^{\ell} unique de Bruijn sequences.

Example 1

. Let q=2,=3,n=10q=2,\ell=3,n=10. The sequence 𝒔=0001011100{\boldsymbol{s}}~{}=~{}0001011100 is a de Bruijn sequence. 𝒔{\boldsymbol{s}} can be generated from Gq,G_{q,\ell} using the Eulerian cycle

000000000101010101010101111111111101010000.00\overset{000}{\rightarrow}00\overset{001}{\rightarrow}01\overset{010}{\rightarrow}10\overset{101}{\rightarrow}01\overset{011}{\rightarrow}11\overset{111}{\rightarrow}11\overset{110}{\rightarrow}10\overset{100}{\rightarrow}00.

Recall that the number of de Bruijn cycles in Gq,G_{q,\ell} is (q!)q1/q(q!)^{q^{\ell-1}}/{q^{\ell}}. Since each de Bruijn cycle generates qq^{\ell} unique de Bruijn sequences of length q+1q^{\ell}+\ell-1, it follows that |𝒮q()|=(q!)q1|{\cal S}_{q}(\ell)|=(q!)^{q^{\ell-1}}. Therefore, the asymptotic rate of 𝒮q(){\cal S}_{q}(\ell) is

lim suplogq|𝒮q()|q+1=logq(q!)q.\displaystyle\limsup_{\ell\rightarrow\infty}\frac{\log_{q}|{\cal S}_{q}(\ell)|}{q^{\ell}+\ell-1}=\frac{\log_{q}(q!)}{q}.

Note that when q=2q=2, this asymptotic rate equals 1/21/2. However, for qq\rightarrow\infty, it approaches 11.

Next, we introduce the main family of sequences that is discussed in this paper.

Definition 3

. Let n,n,\ell be integers. A sequence 𝒘Σqn{\boldsymbol{w}}\in\Sigma_{q}^{n} is called an \ell-tuples covering sequence if 𝒘{\boldsymbol{w}} contains each qq-ary \ell-tuple as a window at least once, i.e., for each 𝒗Σq{\boldsymbol{v}}\in\Sigma_{q}^{\ell}, there exists i[n+1]i\in[n-\ell+1] such that 𝒘i,=𝒗{\boldsymbol{w}}_{i,\ell}={\boldsymbol{v}}.

Example 2

. Let q=2,=3,n=13q=2,\ell=3,n=13. The sequence 𝒘1=0001001110101{\boldsymbol{w}}_{1}=0001001110101 is an \ell-tuples covering sequence. However, the sequence 𝒘2=1001001110101{\boldsymbol{w}}_{2}=1001001110101 is not an \ell-tuples covering sequence, since it does not contain the 33-tuple 000000.

We denote the set of all qq-ary \ell-tuples covering sequence over Σqn\Sigma_{q}^{n} by q(n,){\cal R}_{q}(n,\ell) and notate the size of such code by rq(n,)|q(n,)|r_{q}(n,\ell)\triangleq|{\cal R}_{q}(n,\ell)|. For a window length that is a function of nn, that is =f(n)\ell=f(n), we denote the asymptotic rate of q(n,f(n)){\cal R}_{q}(n,f(n)) by

q()lim supnlogqrq(n,f(n))n.\mathbb{R}_{q}(\ell)\triangleq\limsup_{n\rightarrow\infty}\frac{\log_{q}r_{q}(n,f(n))}{n}.

Note the following connection between \ell-tuples covering sequences and de Bruijn sequences; if n=q+1n=q^{\ell}+\ell-1, then the set q(n,){\cal R}_{q}(n,\ell) is exactly the set of de Bruijn sequences 𝒮q(){\cal S}_{q}(\ell). Therefore, q()=logq(q!)/q\mathbb{R}_{q}(\ell)={\log_{q}(q!)}/{q} in this case. In Section IV we study the cardinality of q(n,){\cal R}_{q}(n,\ell) for various sizes of \ell, i.e., for various functions f(n)f(n). Moreover, we present an encoding algorithm for q=2q=2 that uses a single redundancy bit.

III 𝒗{\boldsymbol{v}}-Avoiding Sequences

In this section, we present the auxiliary family of 𝐯{\boldsymbol{v}}-avoiding sequences that is used later in our analysis of \ell-tuples covering sequences in Section IV.

Definition 4

. Let \ell be an integer and 𝒗Σq{\boldsymbol{v}}\in\Sigma_{q}^{\ell} a fixed \ell-tuple. The set of 𝒗{\boldsymbol{v}}-avoiding sequences over Σqn\Sigma_{q}^{n}, denoted by 𝒜q(n,𝒗){\cal A}_{q}(n,{\boldsymbol{v}}) contains all qq-ary sequences of length nn that do not contain 𝒗{\boldsymbol{v}} as a window. Namely,

𝒜q(n,𝒗)={𝒘Σqni[n+1],𝒘i,𝒗}{\cal A}_{q}(n,{\boldsymbol{v}})=\{{\boldsymbol{w}}\in\Sigma_{q}^{n}\mid\forall i\in[n-\ell+1],{\boldsymbol{w}}_{i,\ell}\neq{\boldsymbol{v}}\}

For a given \ell-tuple 𝒗{\boldsymbol{v}}, we notate the size of this code by aq(n,𝒗)|𝒜q(n,𝒗)|a_{q}(n,{\boldsymbol{v}})\triangleq|{\cal A}_{q}(n,{\boldsymbol{v}})|. Note that for 𝒗=0{\boldsymbol{v}}=0^{\ell}, this family of sequences is the family of (0,1)(0,\ell-1)-RLL sequences [20] (for integers d,kd,k, a (d,k)(d,k)-RLL sequence satisfies that the number of zeros between two consecutive ones is in the range [d,k][d,k]). These sequences were studied extensively in [14] for different functions =f(n)\ell=f(n).

We are motivated to study this family of sequences due to the following connection to the family of \ell-tuples covering sequences; a sequence 𝒔{\boldsymbol{s}} is an \ell-tuples covering sequence if and only if for every 𝒗Σq{\boldsymbol{v}}\in\Sigma_{q}^{\ell}, 𝒔{\boldsymbol{s}} is not a 𝒗{\boldsymbol{v}}-avoiding sequence. This connection will be utilized later in order to encode and analyze the cardinality of q(n,){\cal R}_{q}(n,\ell).

III-A Cardinality Upper Bound Analysis

First, we give an upper bound for aq(n,𝒗)a_{q}(n,{\boldsymbol{v}}) for any 𝒗Σq{\boldsymbol{v}}\in\Sigma^{\ell}_{q} in order to use it later to estimate the cardinality of q(n,){\cal R}_{q}(n,\ell). For a sequence 𝒔Σqn{\boldsymbol{s}}\in\Sigma_{q}^{n}, let p(𝒔)p({\boldsymbol{s}}) denote its period, that is, the smallest positive integer that satisfies si=si+p(𝒔)s_{i}=s_{i+p({\boldsymbol{s}})} for every i[np(𝒔)]i\in[n-p({\boldsymbol{s}})]. Initially, we prove the following useful lemma.

Lemma 5

. Let \ell be an integer and 𝒗Σq{\boldsymbol{v}}\in\Sigma_{q}^{\ell} a tuple of length \ell. Let β(𝒗)\beta({\boldsymbol{v}}) denote the set of sequences of length 22\ell that contain 𝒗{\boldsymbol{v}} as a window at least once, i.e.,

β(𝒗)=Σq2𝒜q(2,𝒗).\beta({\boldsymbol{v}})=\Sigma_{q}^{2\ell}\setminus{\cal A}_{q}(2\ell,{\boldsymbol{v}}).

Then,

|β(𝒗)|12(+1)(q1)2q2.|\beta({\boldsymbol{v}})|\geqslant\frac{1}{2}(\ell+1)(q-1)^{2}q^{\ell-2}.
Proof:

We observe that 𝒗{\boldsymbol{v}} can appear in a sequence of length 22\ell many times only if its period is small. In particular, we show that if p(𝒗)>/2p({\boldsymbol{v}})>\ell/2 then 𝒗{\boldsymbol{v}} can appear at most twice. Hence, we lower bound |β(𝒗)||\beta({\boldsymbol{v}})| by placing 𝒗{\boldsymbol{v}} in a sequence 𝒙β(𝒗){\boldsymbol{x}}\in\beta({\boldsymbol{v}}) and setting the minimal number of symbols in order to ensure that 𝒗{\boldsymbol{v}} appears at most twice in 𝒙{\boldsymbol{x}}. At last, we account for occurrences that were enumerated multiple times.

Let i[+1]i\in[\ell+1] denote the starting position of 𝒗{\boldsymbol{v}} in 𝒙{\boldsymbol{x}}, and assume w.l.o.g that 𝒗{\boldsymbol{v}} is in the middle of 𝒙{\boldsymbol{x}}, i.e., i[1,]i\in[1,\ell]. We increase the period of 𝒙{\boldsymbol{x}} forward by selecting xi+vmodp(𝒗)x_{i+\ell}\neq v_{\ell\bmod p({\boldsymbol{v}})} in order to ensure that p(𝒙i,+1)>/2p({\boldsymbol{x}}_{i,\ell+1})>\ell/2. Let p=p(𝒙i,+1)p=p({\boldsymbol{x}}_{i,\ell+1}) and assume in the contrary that p/2p\leqslant\ell/2. However, this means that pp is a multiple of p(𝒗)p({\boldsymbol{v}}) and hence since xi+vmodp(𝒗),x_{i+\ell}\neq v_{\ell\bmod p({\boldsymbol{v}})}, then xi+xi+px_{i+\ell}\neq x_{i+\ell-p} which is a contradiction to the definition of pp. Similarly, we increase the period of 𝒙{\boldsymbol{x}} backwards by selecting xi1v(1modp(𝒗))x_{i-1}\neq v_{(-1\bmod p({\boldsymbol{v}}))}. If 𝒙{\boldsymbol{x}} starts or ends with 𝒗{\boldsymbol{v}}, we only need to select a single symbol in order to increase the period of 𝒗{\boldsymbol{v}} forward or backwards, respectively. Hence, we enumerate for each position i[+1]i\in[\ell+1] at least (q1)2q2(q-1)^{2}q^{\ell-2} possible choices of 𝒙{\boldsymbol{x}}.

Next, it is clear that another instance of 𝒗{\boldsymbol{v}} can appear at most once after ii, at position i>i+/2i^{\prime}>i+\ell/2, and once before ii, at position i′′<i/2i^{\prime\prime}<i-\ell/2. Moreover, it is only possible for one of those instances to occur, since |[i′′,i+]|>2|[i^{\prime\prime},i^{\prime}+\ell]|>2\ell. Thus, when summing the possible choices of 𝒙{\boldsymbol{x}} for every possible position of the tuple 𝒗{\boldsymbol{v}}, we count each choice at most twice since 𝒗{\boldsymbol{v}} can appear at most twice in 𝒙{\boldsymbol{x}}. Thus,

|β(𝒗)|12(+1)(q1)2q2.|\beta({\boldsymbol{v}})|\geqslant\frac{1}{2}(\ell+1)(q-1)^{2}q^{\ell-2}.

Next, we use the result of Lemma 5 to obtain an upper bound on aq(n,𝒗)a_{q}(n,{\boldsymbol{v}}) for every n,n,\ell and 𝒗Σq{\boldsymbol{v}}\in\Sigma_{q}^{\ell}.

Lemma 6

. Let n,n,\ell be positive integers such that n\ell\leqslant n, and let 𝒗Σq{\boldsymbol{v}}\in\Sigma_{q}^{\ell} be any sequence of length \ell. Then,

aq(n,𝒗)qnc1n2q,a_{q}(n,{\boldsymbol{v}})\leqslant q^{n-c_{1}\frac{n-2\ell}{q^{\ell}}},

where c1=(q1)2logqe4q2c_{1}=\frac{(q-1)^{2}\log_{q}e}{4q^{2}} and ee is the base of the natural logarithm.

Proof:

Consider the following set

𝒜q(n,𝒗)=𝒜q(2,𝒗)n2Σqnmod2,{\cal A}^{\prime}_{q}(n,{\boldsymbol{v}})={\cal A}_{q}(2\ell,{\boldsymbol{v}})^{\left\lfloor\frac{n}{2\ell}\right\rfloor}\Sigma_{q}^{n\bmod 2\ell},

which is the set of sequences which are concatenation of n2\left\lfloor\frac{n}{2\ell}\right\rfloor sequences from 𝒜q(2,𝒗){\cal A}_{q}(2\ell,{\boldsymbol{v}}) appended by any sequence of length nmod2n\bmod 2\ell. Note that 𝒜q(n,𝒗)𝒜q(n,𝒗){\cal A}_{q}(n,{\boldsymbol{v}})\subseteq{\cal A}^{\prime}_{q}(n,{\boldsymbol{v}}) since for every 𝒙𝒜q(n,𝒗){\boldsymbol{x}}\in{\cal A}_{q}(n,{\boldsymbol{v}}) and i[n2]i\in[\left\lfloor\frac{n}{2\ell}\right\rfloor], 𝒗{\boldsymbol{v}} is not a substring of 𝒙2i,2{\boldsymbol{x}}_{2\ell\cdot i,2\ell}, i.e., 𝒙2i,2𝒜q(2,𝒗){\boldsymbol{x}}_{2\ell\cdot i,2\ell}\in{\cal A}_{q}(2\ell,{\boldsymbol{v}}). Hence,

aq(n,𝒗)|𝒜q(n,𝒗)|=aq(2,𝒗)n2qnmod2.\displaystyle a_{q}(n,{\boldsymbol{v}})\leqslant|{\cal A}^{\prime}_{q}(n,{\boldsymbol{v}})|=a_{q}(2\ell,{\boldsymbol{v}})^{\left\lfloor\frac{n}{2\ell}\right\rfloor}q^{n\bmod 2\ell}. (1)

All sequences of length 22\ell that contain 𝒗{\boldsymbol{v}} at least once are not included in 𝒜q(2,𝒗){\cal A}_{q}(2\ell,{\boldsymbol{v}}). Therefore, using Lemma 5,

aq(2,𝒗)=q2|β(𝒗)|\displaystyle a_{q}(2\ell,{\boldsymbol{v}})=q^{2\ell}-|\beta({\boldsymbol{v}})| q212(+1)(q1)2q2\displaystyle\leqslant q^{2\ell}-\frac{1}{2}(\ell+1)(q-1)^{2}q^{\ell-2} (2)
=q2(1(+1)(q1)22q+2).\displaystyle=q^{2\ell}\left(1-\frac{(\ell+1)(q-1)^{2}}{2q^{\ell+2}}\right).

By combining inequalities (1) and (2) we get

aq(n,𝒗)\displaystyle a_{q}(n,{\boldsymbol{v}}) (q2(1(+1)(q1)22q+2))n2qnmod2\displaystyle\leqslant\left(q^{2\ell}\left(1-\frac{(\ell+1)(q-1)^{2}}{2q^{\ell+2}}\right)\right)^{\left\lfloor\frac{n}{2\ell}\right\rfloor}\cdot q^{n\bmod 2\ell}
=qn(1(+1)(q1)22q+2)n2\displaystyle=q^{n}\left(1-\frac{(\ell+1)(q-1)^{2}}{2q^{\ell+2}}\right)^{\left\lfloor\frac{n}{2\ell}\right\rfloor}
(a)qnexp((+1)(q1)22q+2)n2\displaystyle\overset{(a)}{\leqslant}q^{n}\cdot\exp\left(-\frac{(\ell+1)(q-1)^{2}}{2q^{\ell+2}}\right)^{\left\lfloor\frac{n}{2\ell}\right\rfloor}
qnexp((+1)(q1)22q+2(n21))\displaystyle\leqslant q^{n}\cdot\exp\left(-\frac{(\ell+1)(q-1)^{2}}{2q^{\ell+2}}(\frac{n}{2\ell}-1)\right)
qn(q1)2(n2)4q+2logqe\displaystyle\leqslant q^{n-\frac{(q-1)^{2}(n-2\ell)}{4q^{\ell+2}}\log_{q}e}

where (a)(a) results from the inequality 1xex1-x\leqslant e^{-x} for all xx. ∎

III-B Compression Algorithm for Binary Sequences

Next, we focus on binary sequences, i.e., q=2q=2, and present a 𝒗-avoiding {{\boldsymbol{v}}\text{-avoiding }}sequences compression algorithm for any 𝒗{\boldsymbol{v}} of length logn6\ell\leqslant\log n-6 and nn large enough. The algorithm receives a 𝒗{\boldsymbol{v}}-avoiding sequence of length nn and outputs a unique unconstrained sequence of length n1n-1. Clearly, this algorithm can be used for any <\ell^{\prime}<\ell by padding 𝒗{\boldsymbol{v}} to size \ell and continuing regularly; hence we assume from now on that =logn6\ell=\log n-6. This compression algorithm will be utilized in Section IV to encode binary \ell-tuples covering sequences.

For every 𝒗Σ{\boldsymbol{v}}\in\Sigma^{\ell}, we denote two functions,

f1(𝒗)=𝒗(1v|𝒗|modp(𝒗))f_{1}({\boldsymbol{v}})={\boldsymbol{v}}\circ(1-v_{|{\boldsymbol{v}}|\bmod p({\boldsymbol{v}})})
f2(𝒗)=Pref|𝒗|/2+3(𝒗)f1(Suff|𝒗|/23(𝒗)).f_{2}({\boldsymbol{v}})=\text{Pref}_{\lfloor|{\boldsymbol{v}}|/2\rfloor+3}({\boldsymbol{v}})\circ f_{1}(\text{Suff}_{\lceil|{\boldsymbol{v}}|/2\rceil-3}({\boldsymbol{v}})).

Note that both functions append a single bit to 𝒗{\boldsymbol{v}}. We have the following lemma,

Lemma 7

. For every 𝒗Σ{\boldsymbol{v}}\in\Sigma^{\ell}, p(f1(𝒗))(+1)/2p(f_{1}({\boldsymbol{v}}))\geqslant\lceil(\ell+1)/2\rceil.

We say that a sequence has a long period if its period is at least half of its length, i.e., p(𝒗)|𝒗|/2p({\boldsymbol{v}})\geqslant\lceil|{\boldsymbol{v}}|/2\rceil. Hence, from Lemma 7, for every 𝒗Σ{\boldsymbol{v}}\in\Sigma^{\ell}, f1(𝒗)f_{1}({\boldsymbol{v}}) has a long period, and f2(𝒗)f_{2}({\boldsymbol{v}}) satisfies that its (/22)(\lceil\ell/2\rceil-2)-suffix has a long period. These functions are utilized in the following compression algorithm.

The 𝒗{\boldsymbol{v}}-avoiding compression algorithm (Algorithm 1) receives a sequence 𝒔𝒜(n,𝒗){\boldsymbol{s}}\in{\cal A}(n,{\boldsymbol{v}}) for 𝒗Σ{\boldsymbol{v}}\in\Sigma^{\ell} and compresses it to some uniquely decodable sequence 𝒙Σn1{\boldsymbol{x}}\in\Sigma^{n-1}. Initially, the algorithm checks the first bit of 𝒔{\boldsymbol{s}}. If it is zero, then the rest of 𝒔{\boldsymbol{s}} is returned as the result (see Figure 1). Otherwise, an index ii is decoded from the subsequent logn1\log n-1 bits of 𝒔{\boldsymbol{s}} (by converting this binary sequence to its integer representation) and the algorithm will construct 𝒙{\boldsymbol{x}} by inserting an occurrence of 𝒗{\boldsymbol{v}} at this index. However, since such an insertion might create new instances of 𝒗{\boldsymbol{v}} in the sequence 𝒙{\boldsymbol{x}}, 55 additional bits are appended to 𝒗{\boldsymbol{v}} in order to ensure that the insertion index can always be deduced by the decoder (see Figure 2). The redundancy bits are added as follows; first, two bits are appended to 𝒗{\boldsymbol{v}} (independently of the input sequence 𝒔{\boldsymbol{s}}) to construct 𝒖{\boldsymbol{u}}, a sequence that satisfies that both 𝒖{\boldsymbol{u}} and Suff(+1)/22(𝒖)\text{Suff}_{\lceil(\ell+1)/2\rceil-2}({\boldsymbol{u}}) have long periods. As a result, when 𝒖{\boldsymbol{u}} is inserted at position ii, at most three new occurrences can be created to the right of it (see Lemma 8 which follows). These cases are eliminated using the 33 remaining bits appended to 𝒖{\boldsymbol{u}}, denoted by 𝒂{\boldsymbol{a}}. The result is a sequence 𝒙Σn1{\boldsymbol{x}}\in\Sigma^{n-1} with its rightmost occurrence of 𝒖{\boldsymbol{u}} at position ii.

  Refer to caption  

Figure 1: Illustration of Algorithm 1 for the case s0=0s_{0}=0.

  Refer to caption  

Figure 2: Illustration of Algorithm 1 for the case s0=1s_{0}=1.
Algorithm 1 𝒗{\boldsymbol{v}}-avoiding compression algorithm

Input: A sequence 𝒔𝒜(n,𝒗){\boldsymbol{s}}\in{\cal A}(n,{\boldsymbol{v}})
Output: A sequence 𝒙Σn1{\boldsymbol{x}}\in\Sigma^{n-1}

1:If s0=0s_{0}=0, return 𝒔1,n1{\boldsymbol{s}}_{1,n-1}. Otherwise, decode index ii from 𝒔1,logn1{\boldsymbol{s}}_{1,\log n-1} and set 𝒘=Suffnlogn(𝒔){\boldsymbol{w}}=\text{Suff}_{n-\log n}({\boldsymbol{s}})
2:Construct 𝒖=f2(f1(𝒗)){\boldsymbol{u}}=f_{2}(f_{1}({\boldsymbol{v}}))
3:Let
A={p(𝒖)3m|𝒖|1:𝒘i+1,m=Suffm(𝒖)}A=\left\{p({\boldsymbol{u}})-3\leqslant m\leqslant|{\boldsymbol{u}}|-1:{\boldsymbol{w}}_{i+1,m}=\text{Suff}_{m}({\boldsymbol{u}})\right\}
4:Set 𝒂=000{\boldsymbol{a}}=000, notate the elements of AA in decreasing order m2>m1>m0Am_{2}>m_{1}>m_{0}\in A (starting from m2m_{2})
5:for every mkAm_{k}\in A do
6:     Set ak=1u1mk+ka_{k}=1-{u}_{\ell-1-m_{k}+k}
7:end for
8:Return 𝒙=Prefi(𝒘)𝒖𝒂Suff|𝒘|i(𝒘){\boldsymbol{x}}=\text{Pref}_{i}({\boldsymbol{w}})\circ{\boldsymbol{u}}\circ{\boldsymbol{a}}\circ\text{Suff}_{|{\boldsymbol{w}}|-i}({\boldsymbol{w}})
Lemma 8

. At Step 3 of Algorithm 1, we have that |A|3|A|\leqslant 3 .

Proof:

Remind that

A={p(𝒖)3m|𝒖|1:𝒘i+1,m=Suffm(𝒖)}.A=\left\{p({\boldsymbol{u}})-3\leqslant m\leqslant|{\boldsymbol{u}}|-1:{\boldsymbol{w}}_{i+1,m}=\text{Suff}_{m}({\boldsymbol{u}})\right\}.

For any two values m0<m1Am_{0}<m_{1}\in A, it holds that 𝒖m0+1,m0=𝒖m1+1,m0{\boldsymbol{u}}_{\ell-m_{0}+1,m_{0}}={\boldsymbol{u}}_{\ell-m_{1}+1,m_{0}} and thus p(Suffm1(𝒖))m1m0p(\text{Suff}_{m_{1}}({\boldsymbol{u}}))\leqslant m_{1}-m_{0}. Let 𝒖^\hat{{\boldsymbol{u}}} denote the ((+1)/22)(\lceil(\ell+1)/2\rceil-2)-suffix of 𝒖{\boldsymbol{u}} which its period was increased by invoking f2f_{2}. Since

m1>m0p(𝒖)3p(f1(𝒗))3(+1)/23,m_{1}>m_{0}\geqslant p({\boldsymbol{u}})-3\geqslant p(f_{1}({\boldsymbol{v}}))-3\geqslant\lceil(\ell+1)/2\rceil-3,

it follows that Suffm1(𝒖)\text{Suff}_{m_{1}}({\boldsymbol{u}}) contains 𝒖^\hat{{\boldsymbol{u}}} and hence

m1m0p(𝒖^)(a)(+1)/222m_{1}-m_{0}\geqslant p(\hat{{\boldsymbol{u}}})\overset{(a)}{\geqslant}\frac{\lceil(\ell+1)/2\rceil-2}{2}

where (a) follows from Corollary 7. Finally, assume in the contrary that there exist four unique values m0<m1<m2<m3m_{0}<m_{1}<m_{2}<m_{3} that satisfy the condition of the lemma. We have

m3m03(m1m0)3((+1)/22)2,m_{3}-m_{0}\geqslant 3(m_{1}-m_{0})\geqslant\frac{3(\lceil(\ell+1)/2\rceil-2)}{2},

which is larger than the difference between the maximal and the minimal values in [p(𝒖)3,|𝒖|1][p({\boldsymbol{u}})-3,|{\boldsymbol{u}}|-1] for \ell large enough, that is,

|𝒖|1p(𝒖)+3\displaystyle|{\boldsymbol{u}}|-1-p({\boldsymbol{u}})+3 +1(+1)/2+3\displaystyle\leqslant\ell+1-\lceil(\ell+1)/2\rceil+3
=(+1)/2+3,\displaystyle=\lfloor(\ell+1)/2\rfloor+3,

and we have a contradiction. ∎

Lemma 9

. The assignments at Steps 6 and 8 of Algorithm 1 are correctly defined for nn large enough.

Proof:

Assume in the contrary that the assignment at Step 6 is incorrect for some mkm_{k}, i.e., 1mk+k<0\ell-1-m_{k}+k<0. It follows that mk>1m_{k}>\ell-1. However, from the proof of Lemma 8 mkm_{k} must be the maximal value in AA and since the values of AA are indexed in decreasing order, the index of mkm_{k} is k=2k=2 and the assignment is correctly defined. The assignment at Step 8 is correctly defined since for nn large enough, it holds that

|𝒘|=nlogn>n2i.|{\boldsymbol{w}}|=n-\log n>\frac{n}{2}\geqslant i.

Lemma 10

. After Step 8 of Algorithm 1, 𝒙{\boldsymbol{x}} has its rightmost occurrence of 𝒖{\boldsymbol{u}} at position ii.

Proof:

Assume in the contrary that 𝒙{\boldsymbol{x}} contains another instance of 𝒖{\boldsymbol{u}} at position j>ij>i. Clearly, 𝒘{\boldsymbol{w}} does not contain 𝒖{\boldsymbol{u}} since 𝒘{\boldsymbol{w}} is a substring of the input sequence 𝒔𝒜(n,𝒗){\boldsymbol{s}}\in{\cal A}(n,{\boldsymbol{v}}). Therefore, from the construction of 𝒙{\boldsymbol{x}} at Step 8, ji|𝒖|+2j-i\leqslant|{\boldsymbol{u}}|+2. On the other hand, jip(𝒖)j-i\geqslant p({\boldsymbol{u}}). It follows that 𝒘i+1,m=Suffm(𝒖){\boldsymbol{w}}_{i+1,m}=\text{Suff}_{m}({\boldsymbol{u}}) for mk=ji3m_{k}=j-i-3 and hence mkm_{k} belongs to the set AA that is constructed at Step 3. Combining with Lemma 8, this occurrence was eliminated at Step 5 of the algorithm using

xj+1mk+k=xi++2+k=aku1mk+k,x_{j+\ell-1-m_{k}+k}=x_{i+\ell+2+k}=a_{k}\neq u_{\ell-1-m_{k}+k},

and we have a contradiction. ∎

Theorem 11

. Algorithm 1 compresses a sequence 𝒔𝒜(n,𝒗){\boldsymbol{s}}\in{\cal A}(n,{\boldsymbol{v}}) to a sequence 𝒙Σn1{\boldsymbol{x}}\in\Sigma^{n-1} that can be uniquely decoded to its input sequence 𝒔{\boldsymbol{s}}. The time complexity of the Algorithm 1 is 𝒪(logn){\cal O}(\log n) and the time complexity of its decoder is 𝒪(n){\cal O}(n).

Proof:

Clearly, if s0=0s_{0}=0 Algorithm 1 returns a sequence of length n1n-1. Otherwise, the algorithm inserts +5=logn1\ell+5=\log n-1 bits at Step 8 after removing log(n)\log(n) bits at Step 1, and the length of the result is n1n-1 as well. The time complexity of the algorithm follows from Θ(logn)\Theta(\log n) to decode the index ii, and other operations are on 𝒗{\boldsymbol{v}} and a substring of length 𝒪(){\cal O}(\ell) of 𝒔{\boldsymbol{s}}. Note that the time complexity of finding the period of 𝒗{\boldsymbol{v}} (or a substring of it) is also 𝒪(){\cal O}(\ell) [5]. Hence, the total time complexity is 𝒪(logn){\cal O}(\log n).

A decoder for this algorithm constructs 𝒖=f2(f1(𝒗)){\boldsymbol{u}}=f_{2}(f_{1}({\boldsymbol{v}})) and looks for the rightmost occurrence of 𝒖{\boldsymbol{u}} in 𝒙{\boldsymbol{x}}. If such occurrence is found, then it must be at the insertion index ii from Lemma 10. From here, reconstructing 𝒔{\boldsymbol{s}} is straightforward. Since the decoder scans the sequence 𝒙{\boldsymbol{x}} in order to find the rightmost occurrence of 𝒗{\boldsymbol{v}}, its time complexity is 𝒪(n){\cal O}(n), which is larger than the time complexity of the encoder.

Remark 1

. Note that Algorithm 1 is generic and fits any 𝒗Σ{{\boldsymbol{v}}\in\Sigma^{\ell}}. Clearly, with some knowledge of the tuple 𝒗{\boldsymbol{v}} some steps can be skipped and the supported tuple length \ell can be larger. For example, if 𝒗{\boldsymbol{v}} has a long period it is unnecessary to invoke f1f_{1} at Step 2 and =logn5\ell=\log n-5 can be used. If p(𝒗)=p({\boldsymbol{v}})=\ell, i.e., 𝒗{\boldsymbol{v}} is aperiodic, then no additional bits are necessary (the algorithm returns Prefi(𝒘)𝒗Suff|𝒘|i(𝒘)\text{Pref}_{i}({\boldsymbol{w}})\circ{\boldsymbol{v}}\circ\text{Suff}_{|{\boldsymbol{w}}|-i}({\boldsymbol{w}})) and the algorithm is applicable to =logn1\ell=\log n-1.

Example 3

. Let =9\ell=9, 𝒗=010101010{\boldsymbol{v}}=010101010. Notice that p(𝒗)=2p({\boldsymbol{v}})~{}=~{}2. We can construct in advance f1(𝒗)=0101010100,f_{1}({\boldsymbol{v}})=0101010100, and since |f1(𝒗)|/23=2|f_{1}({\boldsymbol{v}})|/2-3=2,

𝒖=f2(f1(𝒗))=01010101f1(00)=01010101001.{\boldsymbol{u}}=f_{2}(f_{1}({\boldsymbol{v}}))=01010101\circ f_{1}(00)=01010101001.

Notice that p(𝒖)=9p({\boldsymbol{u}})=9.

Let n=215n=2^{15} and assume the input sequence 𝒔1=0n𝒜(n,𝒗){\boldsymbol{s}}_{1}=0^{n}~{}\in~{}{\cal A}(n,{\boldsymbol{v}}). In this case, the algorithm simply returns 𝒙1=0n1{\boldsymbol{x}}_{1}=0^{n-1} at Step 1. The decoder receives 𝒙1{\boldsymbol{x}}_{1} and notices that no occurrences of 𝒗{\boldsymbol{v}} exist in the sequence, and thus it correctly returns 0𝒙1=𝒔10\circ{\boldsymbol{x}}_{1}={\boldsymbol{s}}_{1}.

Next, assume 𝒔2=(10101001)212𝒜(n,𝒗){\boldsymbol{s}}_{2}=(10101001)^{2^{12}}\in{\cal A}(n,{\boldsymbol{v}}). One can easily verify that |𝒔2|=n|{\boldsymbol{s}}_{2}|=n and that 𝒗{\boldsymbol{v}} does not appear as a window in 𝒔2{\boldsymbol{s}}_{2}. At Step 1, since the first bit of the sequence is 11, the algorithm decodes an index ii from the integer value of (𝒔2)1,14=01010011010100,({\boldsymbol{s}}_{2})_{1,14}=01010011010100, that is, i=5332i=5332, and denotes by 𝒘{\boldsymbol{w}} the unused part of 𝒔2{\boldsymbol{s}}_{2}. Soon, the algorithm will insert the tuple 𝒖{\boldsymbol{u}} at position ii. However, we need to ensure that this insertion does not create additional occurrences of 𝒖{\boldsymbol{u}} that start to the right of ii.

Since 𝒘i+1,10=1010100110{\boldsymbol{w}}_{i+1,10}=1010100110, the set of indices that is constructed at Step 3 is A={10}A=\{10\}. Note that we have |A|=1|A|=1 although by Lemma 8 the maximal size of AA is 33. At Step 6 the algorithm constructs 𝒂=001{\boldsymbol{a}}=001 where a2=1a_{2}=1 ensures that a new occurrence of 𝒖{\boldsymbol{u}} is not created when concatenating 𝒖{\boldsymbol{u}} and 𝒘i+1,10{\boldsymbol{w}}_{i+1,10}. Finally, at Step 8 the algorithm returns the sequence

𝒙2=Prefi(𝒘)01010101001001Suff|𝒘|i(𝒘).{\boldsymbol{x}}_{2}=\text{Pref}_{i}({\boldsymbol{w}})\circ 01010101001\circ 001\circ\text{Suff}_{|{\boldsymbol{w}}|-i}({\boldsymbol{w}}).

The decoder receives 𝒙2{\boldsymbol{x}}_{2} and identifies the rightmost occurrence of 𝒗{\boldsymbol{v}} at position i=5332i=5332. Thus, it constructs

𝒔2=1b(i)Prefi(𝒙2)Suff|𝒙2|i5(𝒙2).{\boldsymbol{s}}_{2}=1\circ b(i)\circ\text{Pref}_{i}({\boldsymbol{x}}_{2})\circ\text{Suff}_{|{\boldsymbol{x}}_{2}|-i-5}({\boldsymbol{x}}_{2}).

IV Tuples Covering Sequences

In this section, we focus on the main family of sequences discussed in this paper, \ell-tuples covering sequences. We study the cardinality of this set of sequences and present a lower bound on its asymptotic rate for various values of \ell. Then, we present an upper bound for \ell such that the redundancy of the set of \ell-tuples covering sequences is at most one symbol. Later, we present an encoding algorithm for binary \ell-tuples covering sequences that uses a single redundancy bit that meets this bound on \ell. Finally, we use a generalization of the de Bruijn graph to develop an upper bound for the cardinality of this set of sequences.

IV-A Lower Bound Analysis on the Rate

We begin the discussion of \ell-tuples covering sequences for any alphabet of size qq. For integers n,n,\ell, it is clear that if n<q+1n<q^{\ell}+\ell-1 then rq(n,)=0r_{q}(n,\ell)=0 since a sequence of such length can contain at most n+1n-\ell+1 unique \ell-tuples where n+1<qn-\ell+1<q^{\ell}. In the case where n=q+1n=q^{\ell}+\ell-1 the set q(n,){\cal R}_{q}(n,\ell) is exactly the set of de Bruijn sequences of order \ell, and hence rq(n,)=|𝒮q()|r_{q}(n,\ell)=|{\cal S}_{q}(\ell)|. For larger values of nn, we have the following lemma.

Lemma 12

. Let n=q+1+kn=q^{\ell}+\ell-1+k for ,k\ell,k\in\mathbb{N}. Then,

rq(n,)(q!)q1qk.r_{q}(n,\ell)\geqslant(q!)^{q^{\ell-1}}\cdot q^{k}.
Proof:

We construct \ell-tuples covering sequences of length nn using any de Bruijn sequence of order \ell followed by any kk symbols from Σq\Sigma_{q}. The result follows immediately from the size of 𝒮q(){\cal S}_{q}(\ell). ∎

Therefore, we have the next results.

Corollary 13

. Let n=q+1+f(n)n=q^{\ell}+\ell-1+f(n) for \ell\in\mathbb{N}. Then,

q(){logq(q!)qf(n)=o(q)q1logq(q!)+α1+αf(n)=αq+o(q) for α>01f(n)=ω(q)\mathbb{R}_{q}(\ell)\geqslant\begin{cases}\frac{\log_{q}(q!)}{q}&f(n)=o(q^{\ell})\\ \frac{q^{-1}\log_{q}(q!)+\alpha}{1+\alpha}&f(n)=\alpha q^{\ell}+o(q^{\ell})\text{ for }\alpha>0\\ 1&f(n)=\omega(q^{\ell})\\ \end{cases}
Proof:

Using Lemma 12 we have

q()lim supnq1logq(q!)+f(n)q+f(n),\mathbb{R}_{q}(\ell)\geqslant\limsup_{n\rightarrow\infty}\frac{q^{\ell-1}\log_{q}(q!)+f(n)}{q^{\ell}+f(n)},

and the results for f(n)=o(q)f(n)=o(q^{\ell}) and f(n)=ω(q)f(n)=\omega(q^{\ell}) follow immediately. For the last case, we apply f(n)=cq+o(q)f(n)=c\cdot q^{\ell}+o(q^{\ell}) and receive

q()lim supnq1logq(q!)+cqq+cq=q1logq(q!)+c1+c.\mathbb{R}_{q}(\ell)\geqslant\limsup_{n\rightarrow\infty}\frac{q^{\ell-1}\log_{q}(q!)+cq^{\ell}}{q^{\ell}+cq^{\ell}}=\frac{q^{-1}\log_{q}(q!)+c}{1+c}.

Alternatively, we have

Corollary 14

. Let nn\in\mathbb{N} and let =logng(n)\ell=\log n-g(n). Then,

q(){logq(q!)qg(n)=o(1)1+1qc+1logq(q!)1qcg(n)=c for c>01g(n)=ω(1)\mathbb{R}_{q}(\ell)\geqslant\begin{cases}\frac{\log_{q}(q!)}{q}&g(n)=o(1)\\ 1+\frac{1}{q^{c+1}}\log_{q}(q!)-\frac{1}{q^{c}}&g(n)=c\text{ for }c>0\\ 1&g(n)=\omega(1)\\ \end{cases}
Proof:

By applying n=q+1+kn=q^{\ell}+\ell-1+k, we have q=nqg(n)q^{\ell}=\frac{n}{q^{g(n)}}, and k=nnqg(n)𝒪(logq(n))k=n-\frac{n}{q^{g(n)}}-{\cal O}(\log_{q}(n)). Thus, using Claim 12 we have

q()\displaystyle\mathbb{R}_{q}(\ell) limnnqg(n)+1logq(q!)+nnqg(n)n\displaystyle\geqslant\lim_{n\rightarrow\infty}\frac{\frac{n}{q^{g(n)+1}}\log_{q}(q!)+n-\frac{n}{q^{g(n)}}}{n}
=limn1+1qg(n)+1logq(q!)1qg(n),\displaystyle=\lim_{n\rightarrow\infty}1+\frac{1}{q^{g(n)+1}}\log_{q}(q!)-\frac{1}{q^{g(n)}},

and the result follows as limn1qg(n)=1\lim_{n\rightarrow\infty}\frac{1}{q^{g(n)}}=1 when g(n)=o(1)g(n)=o(1) and limn1qg(n)=0\lim_{n\rightarrow\infty}\frac{1}{q^{g(n)}}=0 when g(n)=ω(1)g(n)=\omega(1). ∎

For convenience, we use both representations of nn and \ell throughout this section.

IV-B Single Symbol Analysis on the Redundancy

Next, we present an upper bound on \ell, where the redundancy of q(n,){\cal R}_{q}(n,\ell) is at most a single symbol. This bound uses the upper bound on the cardinality of 𝒗{\boldsymbol{v}}-avoiding sequences presented in Section III.

Theorem 15

. If n,n,\ell are integers such that logqnlogqlogqn𝒪(1)\ell\leqslant\log_{q}n-\log_{q}\log_{q}n-{\cal O}(1), then for nn large enough, red(q(n,))1\textmd{red}({\cal R}_{q}(n,\ell))\leqslant 1.

Proof:

Let =logq(n)logq(logq(n))c2\ell=\log_{q}(n)-\log_{q}(\log_{q}(n))-c_{2} for some constant c2>0c_{2}>0 that will be determined later. For every sequence 𝒘Σqn{\boldsymbol{w}}\in\Sigma_{q}^{n} that is not an \ell-tuples covering sequence, there exists 𝒗Σq{\boldsymbol{v}}\in\Sigma_{q}^{\ell} such that 𝒘{\boldsymbol{w}} is a 𝒗{\boldsymbol{v}}-avoiding sequence, i.e., 𝒘𝒜q(n,𝒗){\boldsymbol{w}}\in{\cal A}_{q}(n,{\boldsymbol{v}}). Thus, using Lemma 6, the number of sequences that are not \ell-tuples covering sequences can be bounded above by

𝒗Σqaq(n,𝒗)qnc1n2q+,\sum_{{\boldsymbol{v}}\in\Sigma_{q}^{\ell}}a_{q}(n,{\boldsymbol{v}})\leqslant q^{n-c_{1}\frac{n-2\ell}{q^{\ell}}+\ell}, (3)

where c1=(q1)2logqe4q2c_{1}=\frac{(q-1)^{2}\log_{q}e}{4q^{2}}. Therefore, in order to have red(q(n,))1\text{red}({\cal R}_{q}(n,\ell))\leqslant 1, i.e., rq(n,)qn1r_{q}(n,\ell)\geqslant q^{n-1}, we require that the value in equation (3) is bounded from above by (q1)qn1(q-1)q^{n-1}. Hence, it is enough to have

c1n2qlogqqq1.c_{1}\frac{n-2\ell}{q^{\ell}}-\ell\geqslant\log_{q}\frac{q}{q-1}. (4)

By applying =logqnlogqlogqnc2\ell=\log_{q}n-\log_{q}\log_{q}n-c_{2}, or alternatively q=nqc2lognq^{\ell}=\frac{n}{q^{c_{2}}\log n}, we get

c1n2q\displaystyle c_{1}\frac{n-2\ell}{q^{\ell}}-\ell =c1nq(2c1q1)\displaystyle=c_{1}\frac{n}{q^{\ell}}-\ell\left(\frac{2c_{1}}{q^{\ell}}-1\right)
=c1qc2logqn(2c1qc2logqnn1)\displaystyle=c_{1}q^{c_{2}}\log_{q}n-\ell\left(\frac{2c_{1}q^{c_{2}}\log_{q}n}{n}-1\right)
c1qc2logqn2c1qc2(logqn)2nlogqn\displaystyle\geqslant c_{1}q^{c_{2}}\log_{q}n-\frac{2c_{1}q^{c_{2}}(\log_{q}n)^{2}}{n}-\log_{q}n

which satisfies inequality (4) when c1qc2>1c_{1}q^{c_{2}}>1, i.e., c2>logqc1c_{2}>-\log_{q}c_{1}, and for nn large enough. ∎

IV-C Encoding Algorithm for the Binary Case

For the rest of this section, we assume that q=2q=2. We present an encoder for \ell-tuples covering sequences over Σn\Sigma^{n}. This encoder uses a single redundancy bit and handles \ell-tuples of length lognloglogn6\ell\leqslant\log n-\log\log n-6 for nn large enough. Note that this value of \ell is associated with the bound presented in Theorem 15.

This algorithm is based on the compressor of 𝒗{\boldsymbol{v}}-avoiding sequences presented in Algorithm 1. For 𝒗Σ{\boldsymbol{v}}\in\Sigma^{\ell}, let 𝒗{\cal E}_{{\boldsymbol{v}}} denote this compressor, i.e., 𝒗{\cal E}_{{\boldsymbol{v}}} receives a 𝒗{\boldsymbol{v}}-avoiding sequence of length nn such that logn6\ell\leqslant\log n-6 and outputs an unconstrained and uniquely decodable sequence over Σn1\Sigma^{n-1}. Let nn_{{\cal E}} denote the maximal sequence length that can be compressed with 𝒗{\cal E}_{{\boldsymbol{v}}} such that |𝒗|=|{\boldsymbol{v}}|=\ell, that is, n=2+6=n/logn{n_{{\cal E}}=2^{\ell+6}=n/\log n}. Moreover, 𝒗{\cal E}_{{\boldsymbol{v}}} can be used to efficiently compress 𝒗{\boldsymbol{v}}-avoiding sequences of length nnn\geqslant n_{{\cal E}} as well; the input sequence is split to consecutive segments of length nn_{{\cal E}} and each of them is compressed separately. If there is a remainder smaller than nn_{{\cal E}}, it is not compressed at all. This way, a 𝒗{\boldsymbol{v}}-avoiding sequence of length nn can be compressed to a uniquely decodable sequence of length nn/nn-\lfloor n/n_{{\cal E}}\rfloor. We abuse the notation 𝒗{\cal E}_{{\boldsymbol{v}}} to denote this generalized compressor as well. Similarly, let 𝒟𝒗{\cal D}_{\boldsymbol{v}} denote the matching decoder of 𝒗{\cal E}_{{\boldsymbol{v}}} for any 𝒗Σ{\boldsymbol{v}}\in\Sigma^{\ell}.

Algorithm 2 receives as an input 𝒘{\boldsymbol{w}}, a sequence of length n1n-1 and outputs 𝒙{\boldsymbol{x}}, an \ell-tuples covering sequence of length nn. The goal of the algorithm is to shorten the input sequence enough in order to make room for appending a de Bruijn sequence of order \ell at its end. The shortening procedure uses the family of compressors {𝒗𝒗Σ}\{{\cal E}_{{\boldsymbol{v}}}\mid{\boldsymbol{v}}\in\Sigma^{\ell}\}, based on the observation that as long as the sequence is not an \ell-tuples covering sequence, then it is 𝒗{\boldsymbol{v}}-avoiding for some tuple 𝒗Σ{\boldsymbol{v}}\in\Sigma^{\ell}. Let 𝒔{\boldsymbol{s}} denote a fixed de Bruijn sequence of length 2+12^{\ell}+\ell-1; 𝒔{\boldsymbol{s}} can be produced with time complexity of 𝒪(2){\cal O}(\ell 2^{\ell}), see [9]. The algorithm first sets 𝒙=0𝒘{\boldsymbol{x}}=0\circ{\boldsymbol{w}} in order to mark the start of the encoding process for the decoder. Then, as long as 𝒙{\boldsymbol{x}} is not \ell-tuples covering, the algorithm repeatedly shortens 𝒙{\boldsymbol{x}} by finding an \ell-tuple 𝒗{\boldsymbol{v}} that does not appear in 𝒙{\boldsymbol{x}}. The algorithm encodes the occurrence and compresses 𝒙{\boldsymbol{x}} using 𝒗{\cal E}_{{\boldsymbol{v}}}. This process ends when 𝒙{\boldsymbol{x}} is either an \ell-tuples covering sequence or it is short enough to be appended by 𝒔{\boldsymbol{s}}. Either way, this results with an \ell-tuples covering sequence which is returned after being padded to length nn.

Algorithm 2 \ell-tuples covering sequences encoding

Input: A sequence 𝒘Σn1{\boldsymbol{w}}\in\Sigma^{n-1}
Output: A sequence 𝒙(n,){\boldsymbol{x}}\in{\cal R}(n,\ell)

1:Set 𝒙=0𝒘{\boldsymbol{x}}=0\circ{\boldsymbol{w}}
2:while 𝒙{\boldsymbol{x}} is not \ell-tuples covering and |𝒙|>n|𝒔||{\boldsymbol{x}}|>n-|{\boldsymbol{s}}| do
3:     Pick 𝒗Σ{𝒙i,:i[|𝒙|+1]}{\boldsymbol{v}}\in\Sigma^{\ell}\setminus\{{\boldsymbol{x}}_{i,\ell}:i\in[|{\boldsymbol{x}}|-\ell+1]\}
4:     Set 𝒙=1𝒗𝒗(𝒙){\boldsymbol{x}}=1\circ{\boldsymbol{v}}\circ{\cal E}_{{\boldsymbol{v}}}({\boldsymbol{x}})
5:end while
6:Return Prefn(𝒙𝒔1n)\text{Pref}_{n}({\boldsymbol{x}}\circ{\boldsymbol{s}}\circ 1^{n})

We prove the correctness of Algorithm 2 in the next two lemmas.

Lemma 16

. At every iteration of the while loop of Algorithm 2, at Step 4 the size of 𝒙{\boldsymbol{x}} decreases.

Proof:

Let 𝒙{\boldsymbol{x}}^{\prime} denote the sequence 𝒙{\boldsymbol{x}} after the execution of Step 4 at some iteration of the loop. Hence,

|𝒙|\displaystyle|{\boldsymbol{x}}^{\prime}| =|𝒙||𝒙|n++1\displaystyle=|{\boldsymbol{x}}|-\left\lfloor\frac{|{\boldsymbol{x}}|}{n_{{\cal E}}}\right\rfloor+\ell+1
(a)|𝒙|n|𝒔|n++1\displaystyle\overset{(a)}{\leqslant}|{\boldsymbol{x}}|-\left\lfloor\frac{n-|{\boldsymbol{s}}|}{n_{{\cal E}}}\right\rfloor+\ell+1
(b)|𝒙|nn++2\displaystyle\overset{(b)}{\leqslant}|{\boldsymbol{x}}|-\frac{n}{n_{{\cal E}}}+\ell+2
(c)|𝒙|logn++2\displaystyle\overset{(c)}{\leqslant}|{\boldsymbol{x}}|-\log n+\ell+2
(d)|𝒙|loglogn2\displaystyle\overset{(d)}{\leqslant}|{\boldsymbol{x}}|-\log\log n-2
<|𝒙|\displaystyle<|{\boldsymbol{x}}|

where (a) follows from the fact that throughout the while loop |𝒙|>n|𝒔||{\boldsymbol{x}}|>n-|{\boldsymbol{s}}|, (b) follows from |𝒔|<n|{\boldsymbol{s}}|<n_{{\cal E}}, and (c) follows from plugging n=nlognn_{{\cal E}}=\frac{n}{\log n}, and (d) follows from plugging lognloglogn6\ell\leqslant\log n-\log\log n-6. ∎

Theorem 17

. Algorithm 2 successfully outputs a uniquely decodable \ell-tuples covering sequence of length nn. The time complexity of the algorithm and its decoder is 𝒪(n2lognloglogn){\cal O}\left(\frac{n^{2}}{\log n\log\log n}\right).

Proof:

Following Lemma 16, the while loop terminates and the algorithm reaches Step 6. At Step 6 𝒙{\boldsymbol{x}} satisfies at least one of the following, 1. |𝒙|n|{\boldsymbol{x}}|\leqslant n and 𝒙{\boldsymbol{x}} is \ell-tuples covering, or 2. |𝒙|n|𝒔||{\boldsymbol{x}}|\leqslant n-|{\boldsymbol{s}}|. In both cases, Prefn(𝒙𝒔1n)\text{Pref}_{n}({\boldsymbol{x}}\circ{\boldsymbol{s}}\circ 1^{n}) is a sequence of length nn that contains an entire \ell-tuples covering sequence, 𝒙{\boldsymbol{x}} in the first case and 𝒔{\boldsymbol{s}} in the second case.

As for the time complexity, at every iteration the algorithm invokes 𝒗{\cal E}_{{\boldsymbol{v}}} at most nn\lfloor\frac{n}{n_{{\cal E}}}\rfloor times, each requires 𝒪(){\cal O}(\ell) from Theorem 11. Additionally at every iteration finding 𝒗{\boldsymbol{v}} that is not contained in 𝒙{\boldsymbol{x}} can be done in 𝒪(n){\cal O}(n). From Lemma 16, 𝒙{\boldsymbol{x}} is shortened by Θ(loglogn)\Theta(\log\log n) at every iteration, hence we have at most 𝒪(2loglogn)=𝒪(n2lognloglogn){\cal O}\left(\frac{2^{\ell}}{\log\log n}\right)={\cal O}\left(\frac{n^{2}}{\log n\log\log n}\right) iterations, and we receive the result in the theorem statement. Note that the time complexity of constructing the de Bruijn sequence 𝒔{\boldsymbol{s}} is 𝒪(2)=𝒪(n){\cal O}(\ell 2^{\ell})={\cal O}(n) which does not affect the time complexity of the encoder.

In order to decode 𝒘Σn1{\boldsymbol{w}}\in\Sigma^{n-1} given 𝒙{\boldsymbol{x}}, an output of Algorithm 2, we iteratively inverse the operation of the while loop using the set of decoders {𝒟𝒗𝒗Σ}\{{\cal D}_{{\boldsymbol{v}}}\mid{\boldsymbol{v}}\in\Sigma^{\ell}\}. As long as x0=1x_{0}=1, we repeatedly extract 𝒗=𝒙1,{\boldsymbol{v}}={\boldsymbol{x}}_{1,\ell} and decode the rest of 𝒙{\boldsymbol{x}} using 𝒟𝒗{\cal D}_{{\boldsymbol{v}}}. This process ends when x0=0x_{0}=0, where the decoder returns 𝒘=Prefn1(𝒙){\boldsymbol{w}}=\text{Pref}_{n-1}({\boldsymbol{x}}).

Note that as a result of the possible concatenation of 𝒔{\boldsymbol{s}} and 1n1^{n} to 𝒙{\boldsymbol{x}} at Step 6, in some cases the values of 𝒙{\boldsymbol{x}} in the matching iterations of the encoder and the decoder are not equal. In those cases, the decoded sequence contains an additional suffix and 𝒟𝒗{\cal D}_{{\boldsymbol{v}}} can be invoked on segments that were not outputs of 𝒗{\cal E}_{{\boldsymbol{v}}} in Algorithm 2. However, this does not impact the correctness of the decoder since those segments will be trimmed from 𝒙{\boldsymbol{x}} at the end of the decoding process. ∎

Remark 2

. Algorithm 2 can be generalized to any q>2q>2 using a qq-ary generalization of the compressor 𝒗{\cal E}_{\boldsymbol{v}}. In this case, the algorithm can encode qq-ary -tuples {\ell\text{-tuples }}covering sequences of length logqnlogqlogqn4\ell\leqslant\log_{q}n-\log_{q}\log_{q}n-4 for nn large enough.

IV-D Upper Bound on the Rate Using de Bruijn Graph

In this section we use an enumeration technique which was first used to enumerate de Bruijn sequences using the de Bruijn graph [18]. It was recently used to enumerate another generalization of de Bruijn sequences [16]. In this paper, this technique is used to derive an upper bound on the cardinality of \ell-tuple covering sequences. For simplicity, we focus on binary sequences, although this technique can be extended to any alphabet of finite size.

Let n=2+1+kn=2^{\ell}+\ell-1+k, for k=f(n)k=f(n). For every selection of kk \ell-tuples (with repetitions) we construct a graph GG which is a generalization of the de Bruijn graph GG_{\ell}. The vertices of each graph are the same vertices of GG_{\ell} which are represented by the binary (1)(\ell-1)-tuples. The edges are the edges of GG_{\ell} with additional parallel edges corresponding to each of the kk \ell-tuples picked. There are (2+k1k)\binom{2^{\ell}+k-1}{k} different graphs which are generated this way.

A reverse spanning tree TT of a generalized graph GG is a graph whose underlying graph is a tree rooted at some 𝒓Σ1{{\boldsymbol{r}}\in\Sigma^{\ell-1}}, it contains all the vertices of GG, and there is a unique directed path from each vertex vv of TT to 𝒓{\boldsymbol{r}}. Clearly, all the generalized graphs share the same set of reverse spanning trees as GG_{\ell}, and there are 2212^{2^{\ell-1}-\ell} such trees for each 𝒓Σ1{\boldsymbol{r}}\in\Sigma^{\ell-1} [18].

For each graph GG, reverse spanning tree TT and root vertex 𝒓{\boldsymbol{r}}, we use a nondeterministic algorithm to attempt traversing all the edges of GG exactly once, starting from the root vertex 𝒓{\boldsymbol{r}}. In order to ensure the uniqueness of the path with respect to TT, its edges are notated as starred, and the algorithm leaves a vertex vv on a starred edge only if it is the last outgoing edge of vv that was not traversed before (this algorithm is defined formally in [16]). If the algorithm succeeded traversing all the edges of GG, the result is a path that corresponds to an \ell-tuples covering sequence of length nn. Notice that for some graphs, the algorithm might fail to traverse all the edges and to generate sequences. However, all the sequences of (n,){\cal R}(n,\ell) are produced by this method and thus enumerating the paths constructed by such algorithm derives an upper bound for r(n,)r(n,\ell).

Lemma 18

. For each reverse spanning tree, at most 2k+(2+k1k){2^{k+\ell}\dbinom{2^{\ell}+k-1}{k}} distinct acyclic sequences are constructed by the algorithm.

Proof:

First, there are (2+t1t)\binom{2^{\ell}+t-1}{t} graphs and 212^{\ell-1} vertices in each graph. For each graph, root vertex and spanning tree, we can bound the number of different orders to traverse the edges by 2t+12^{t+1}. That is since each vertex besides the root that has t0t^{\prime}\geqslant 0 parallel outgoing edges has at most 2t2^{t^{\prime}} different options to traverse its outgoing edges, since the edges of TT are traversed last. The root 𝒓{\boldsymbol{r}} does not share outgoing edges with TT and therefore has at most 2t+12^{t^{\prime}+1} different orders. Hence, we receive the value mentioned in the lemma statement. ∎

Corollary 19

. The total number of distinct acyclic sequences of length nn formed by the algorithm is at most 221+t(2+t1t)2^{2^{\ell-1}+t}\binom{2^{\ell}+t-1}{t}.

The value presented in Corollary 19 provides an upper bound on r(n,)r(n,\ell). Next, we derive an upper bound for the asymptotic rate of \ell-tuples covering sequences, for different values of \ell. Recall the following result of Lemma 14, applied for q=2q=2,

Corollary 20

. Let n=2+1+f(n)n=2^{\ell}+\ell-1+f(n) for \ell\in\mathbb{N}. Then,

(){12f(n)=o(2)2α+12α+2f(n)=α2+o(2) for constant α>01f(n)=ω(2)\mathbb{R}(\ell)\geqslant\begin{cases}\frac{1}{2}&f(n)=o(2^{\ell})\\ \frac{2\alpha+1}{2\alpha+2}&f(n)=\alpha 2^{\ell}+o(2^{\ell})\text{ for constant }\alpha>0\\ 1&f(n)=\omega(2^{\ell})\\ \end{cases}
Theorem 21

. If n=2+1+f(n)n=2^{\ell}+\ell-1+f(n) where f(n)=o(2)f(n)=o(2^{\ell}), then the asymptotic rate of \ell-tuples covering sequences satisfies

()=12.\mathbb{R}(\ell)=\frac{1}{2}.
Proof:

Following Corollary 20 it is necessary to show that ()1/2\mathbb{R}(\ell)\leqslant 1/2. Let t=f(n)=o(2)t=f(n)=o(2^{\ell}), then based on (ab)2aH2(b/a)\binom{a}{b}\leqslant 2^{aH_{2}(b/a)}, where H2H_{2} is the binary entropy function (see, e.g., [15, Ch.10, Sec.11, Lem.7]), it is possible to derive that log(2+t1t)=o(2)\log\binom{2^{\ell}+t-1}{t}=o(2^{\ell}). Hence,

()\displaystyle\mathbb{R}(\ell) lim sup21+t+log(2+t1t)2\displaystyle\leqslant\limsup_{\ell\rightarrow\infty}\frac{2^{\ell-1}+t+\log\binom{2^{\ell}+t-1}{t}}{2^{\ell}}
=12+lim supo(2)2=12.\displaystyle=\frac{1}{2}+\limsup_{\ell\rightarrow\infty}\frac{o(2^{\ell})}{2^{\ell}}=\frac{1}{2}.

Similarly, when t=α2+o(2)t=\alpha 2^{\ell}+o(2^{\ell}) for α>0\alpha>0, we have the next corollary which uses similar considerations as Theorem 21.

Corollary 22

. Let n=2+1+f(n)n=2^{\ell}+\ell-1+f(n) where f(n)=α2+o(2)f(n)=\alpha 2^{\ell}+o(2^{\ell}) for α>0\alpha>0, then the asymptotic rate of \ell-covering sequences satisfies

()H(αα+1)+2α+12α+2.\mathbb{R}(\ell)\leqslant H\left(\frac{\alpha}{\alpha+1}\right)+\frac{2\alpha+1}{2\alpha+2}.

Note that the result of Corollary 22 is useful only for small values of α\alpha in the range 0<α<10<\alpha<1. Other values of α\alpha are subject for future research.

Table I summarizes the results presented in this paper regarding the asymptotic rate of \ell-tuples covering sequences.

TABLE I: Asymptotic rate of binary \ell- tuples covering sequences n=2+1+f(n)n=2^{\ell}+\ell-1+f(n)
Case Result
f(n)=o(2)f(n)=o(2^{\ell}) ()=1/2\mathbb{R}(\ell)=1/2
f(n)=α2+o(2) , α>0f(n)=\alpha 2^{\ell}+o(2^{\ell})\text{ , }\alpha>0 2α+12α+2()min{H(αα+1)+2α+12α+2,1}\frac{2\alpha+1}{2\alpha+2}\leqslant\mathbb{R}(\ell)\leqslant\min\left\{H\left(\frac{\alpha}{\alpha+1}\right)+\frac{2\alpha+1}{2\alpha+2},1\right\}
f(n)=ω(2)f(n)=\omega(2^{\ell}) ()=1\mathbb{R}(\ell)=1

Acknowledgment

S. Marcovich and E. Yaakobi were supported by the United States-Israel BSF grant no. 2018048. T. Etzion was supported by ISF grant no. 222/19.

References

  • [1] N. Alon, J. Bruck, F. F. Hassanzadeh, and S. Jain, “Duplication distance to the root for binary sequences,” IEEE Transactions on Information Theory, vol. 63, no. 12, pp. 7793–7803, 2017.
  • [2] A. H. Chan, R. A. Games, and E. L. Key, “On the complexities of de bruijn sequences,” Journal of Combinatorial Theory, Series A, vol. 33, no. 3, pp. 233 – 246, 1982.
  • [3] Y. M. Chee, T. Etzion, H. M. Kiah, S. Marcovich, A. Vardy, V. Khu Vu, and E. Yaakobi, “Locally-constrained de bruijn codes: Properties, enumeration, code constructions, and applications,” IEEE Transactions on Information Theory, vol. 67, no. 12, pp. 7857–7875, 2021.
  • [4] P. Compeau, P. Pevzner, and G. Tesler, “How to apply de bruijn graphs to genome assembly,” Nature biotechnology, no. 11, pp. 987–991, 2011.
  • [5] A. Czumaj and L. Gasieniec, “On the complexity of determining the period of a string,” in CPM, 2000.
  • [6] N. G. de Bruijn, “A combinatorial problem,” Koninklijke Nederlandse Akademie v. Wetenschappen, vol. 49, pp. 758–764, 1946.
  • [7] O. Elishco, R. Gabrys, E. Yaakobi, and M. Mèdard, “Repeat-free codes,” IEEE Transactions on Information Theory, vol. 67, no. 9, pp. 5749–5764, 2021.
  • [8] H. Fredricksen, “A survey of full length nonlinear shift register cycle algorithms,” SIAM Review, vol. 24, pp. 195–221, 1982.
  • [9] H. M. Fredricksen, “A class of nonlinear de Bruijn cycles,” Journal of Combinatorial Theory, vol. 19, pp. 192–199, 1975.
  • [10] R. Gabrys and O. Milenkovic, “Unique reconstruction of coded sequences from multiset substring spectra,” in Proc. of the IEEE International Symposium on Information Theory, Vail, Colorado, USA, 2018, pp. 2540–2544.
  • [11] S. W. Golomb, “Shift register sequences,” San Francisco: Holden-Day, 1967.
  • [12] H. M. Kiah, G. J. Puleo, and O. Milenkovic, “Codes for DNA sequence profiles,” IEEE Transactions on Information Theory, vol. 62, no. 6, pp. 3125–3146, 2016.
  • [13] A. Lempel, “On a homomorphism of the de Bruijn graph and its applications to the design of feedback shift registers,” IEEE Transactions on Computers, vol. C-19, no. 12, pp. 1204–1209, 1970.
  • [14] M. Levy and E. Yaakobi, “Mutually uncorrelated codes for DNA storage,” IEEE Transactions on Information Theory, vol. 65, no. 6, pp. 3671–3691, 2019.
  • [15] F. J. MacWilliams and N. J. A. Sloane, The Theory of Error-Correcting Codes.   North-Holland, 1978.
  • [16] S. Marcovich, T. Etzion, and E. Yaakobi, “Balanced de Bruijn sequences,” in IEEE International Symposium on Information Theory (ISIT), 2021, pp. 1528–1533.
  • [17] U. M. Maurer, “Asymptotically-tight bounds on the number of cycles in generalized de Bruijn-Good graphs,” Discrete Applied Mathematics, vol. 37-38, pp. 421 – 436, 1992.
  • [18] F. J. Mowle, “Relations between pnp_{n} cycles and stable feedback shift registers,” IEEE Transactions on Computers, vol. C-15, pp. 375–378, 1966.
  • [19] A. Ralston, “A new memoryless algorithm for de Bruijn sequences,” Journal of Algorithms, vol. 2, pp. 50–62, 1981.
  • [20] K. Schouhamer-Immink, Coding techniques for digital recorders.   Prentice-Hall, 1991.
  • [21] L. Song, F. Geng, Z. Gong, B. Li, and Y. Yuan, “Super-robust data storage in DNA by de bruijn graph-based decoding,” 2020.
  • [22] T. van Aardenne-Ehrenfest and N. G. de Bruijn, “Circuits and trees in oriented linear graphs,” Simon Stevin : Wis-en Natuurkundig Tijdschrif, vol. 28, pp. 203–217, 1951.