Covering Sequences for -tuples
Abstract
de Bruijn sequences of order , i.e., sequences that contain each as a window exactly once, have found many diverse applications in information theory and most recently in DNA storage. This family of binary sequences has asymptotic rate of . To overcome this low rate, we study -tuples covering sequences, which impose that each -tuple appears at least once as a window in the sequence. The cardinality of this family of sequences is analyzed while assuming that is a function of the sequence length . Lower and upper bounds on the asymptotic rate of this family are given. Moreover, we study an upper bound for such that the redundancy of the set of -tuples covering sequences is at most a single symbol. Lastly, we present efficient encoding and decoding schemes for -tuples covering sequences that meet this bound.
I Introduction
The binary de Bruijn graph of order , , was introduced in 1946 by de Bruijn [6]. His target in introducing this graph was to find a recursive method to enumerate the number of cyclic binary sequences of length such that each -tuple appears as a window exactly once in each sequence. These sequences were later called de Bruijn sequences. These results were later generalized in [22] for any alphabet of finite size , using a -ary generalization of the de Bruijn graph of order , .
The vertices of are the -ary -tuples, and its edges correspond to the -ary -tuples. There is an edge if can be obtained from by shifting one entry left and appending a symbol. Eulerian cycles in de Bruijn graphs, i.e., cycles that visit all the edges of exactly once, are called de Bruijn cycles. It was proved that the number of de Bruijn cycles in is [22].
Each de Bruijn cycle induces a single (cyclic) de Bruijn sequence of length , by picking any edge in the cycle as a starting point, considering its first entry and appending the first entry of each consecutive edge in the cycle. All sequences that can be generated in this way are considered as the same sequence. Contrary to this, each de Bruijn cycle induces distinct acyclic de Bruijn sequences, i.e., sequences of length that contain each -tuple as a window exactly once, using a similar method with the exception of appending the ()-suffix of the last edge as well; each sequence corresponds to a choice of different starting edge from . Hence, the number of such acyclic de Bruijn sequences is and their asymptotic rate is (equals for ). One of the first applications of the de Bruijn graph was in the introduction of shift-register sequences and linear feedback shift registers [11]. Throughout the years, an extensive number of papers have studied the de Bruijn sequences and their applications, several of those include [2, 4, 8, 9, 13, 17, 19]. Most recently, DNA storage has brought fresh interest to this family of sequences; for more information on such applications the reader is referred to [1, 12, 21].
This paper studies a novel generalization of de Bruijn sequences (for the rest of this paper we refer only to acyclic sequences). We say that a sequence is an -tuples covering sequence if it contains each -ary -tuple as a window at least once. This work follows recent generalizations of de Bruijn sequences that proposed unique variations regarding the appearances of -tuples in the sequence: -repeat free sequences [7, 10] require each -tuple to appear at most once, ()-locally-constrained de Bruijn sequences [3] require each -tuple to appear at most once in every window of length , and -balanced de Bruijn sequences [16] require each to appear exactly times in the sequence.
Notice that for sequences of length , all -tuples covering sequences are simply the de Bruijn sequences deduced from the de Bruijn graph ; as a result, their asymptotic rate is . Our main goal is to efficiently construct codes of -tuples covering sequences with higher rates (specifically larger than for binary sequences) and fixed number of redundancy symbols. We study the cardinality for the set of -tuples covering sequences and present lower bounds on its asymptotic rate for various values of . Additionally, we present an upper bound on such that the redundancy of the set of all -tuples covering sequences is at most one symbol. Later, we present an encoding algorithm for the set of binary -tuples covering sequences that uses a single redundancy bit and meets this bound on . Finally, we use a generalization of the de Bruijn graph to develop an upper bound for the cardinality of this set of sequences.
Another interesting family of sequences is introduced as a building block to our analysis of -tuples covering sequences. For some -tuple , we say that a sequence is a -avoiding sequence if it does not contain as a window. Note that if is the all-zero -tuple, then this family of sequences is known as RLL sequences and was studied before, for example in [14, 20]. We study this family of sequences for any -tuple .
The rest of this paper is organized as follows. In Section II we formally define the families of sequences studied in this paper and review several previous results. In Section III, we study the family of -avoiding sequences for any -tuple . Based on these results, in Section IV we analyze the cardinality of -tuples covering sequences and present an encoding scheme for which uses a single redundancy bit.
II Definitions and Preliminaries
For two integers such that we denote by the set and use as a shorthand for . We use the notation as the alphabet of finite size . For simplicity, when , we omit the parameter from this notation and similar ones.
Let and let denote a sequence of length . For two positive integers and such that , let denote the substring . Additionally, let Pref Suff denote the -prefix, -suffix of , respectively. The notation is the concatenation of and another sequence , and denotes the concatenation of times, i.e., . Let denote two sets of vectors over . We denote the set and the set to be concatenations of the set . Finally, the redundancy of a set is defined as .
Definition 1
. The -th order -ary de Bruijn graph is the digraph , where and
Note that the edges of correspond to the set of -ary , .
Definition 2
. Let be an integer and . A sequence is called a de Bruijn sequence of order if contains each -ary -tuple as a window exactly once.
Let denote the set of -ary de Bruijn sequences of order . The connection between Eulerian cycles in to de Bruijn sequences is as follows. In order to generate a sequence from a cycle, we pick any edge in the cycle and set its first entry as the start of the sequence. Then, we append to the sequence the first entry of each consecutive edge in the cycle. Finally, we append the -suffix of the last edge of the cycle to form the whole sequence. Note that since each edge of can be picked as the first edge of the Eulerian cycle, a single cycle generates unique de Bruijn sequences.
Example 1
. Let . The sequence is a de Bruijn sequence. can be generated from using the Eulerian cycle
Recall that the number of de Bruijn cycles in is . Since each de Bruijn cycle generates unique de Bruijn sequences of length , it follows that . Therefore, the asymptotic rate of is
Note that when , this asymptotic rate equals . However, for , it approaches .
Next, we introduce the main family of sequences that is discussed in this paper.
Definition 3
. Let be integers. A sequence is called an -tuples covering sequence if contains each -ary -tuple as a window at least once, i.e., for each , there exists such that .
Example 2
. Let . The sequence is an -tuples covering sequence. However, the sequence is not an -tuples covering sequence, since it does not contain the -tuple .
We denote the set of all -ary -tuples covering sequence over by and notate the size of such code by . For a window length that is a function of , that is , we denote the asymptotic rate of by
Note the following connection between -tuples covering sequences and de Bruijn sequences; if , then the set is exactly the set of de Bruijn sequences . Therefore, in this case. In Section IV we study the cardinality of for various sizes of , i.e., for various functions . Moreover, we present an encoding algorithm for that uses a single redundancy bit.
III -Avoiding Sequences
In this section, we present the auxiliary family of -avoiding sequences that is used later in our analysis of -tuples covering sequences in Section IV.
Definition 4
. Let be an integer and a fixed -tuple. The set of -avoiding sequences over , denoted by contains all -ary sequences of length that do not contain as a window. Namely,
For a given -tuple , we notate the size of this code by . Note that for , this family of sequences is the family of -RLL sequences [20] (for integers , a -RLL sequence satisfies that the number of zeros between two consecutive ones is in the range ). These sequences were studied extensively in [14] for different functions .
We are motivated to study this family of sequences due to the following connection to the family of -tuples covering sequences; a sequence is an -tuples covering sequence if and only if for every , is not a -avoiding sequence. This connection will be utilized later in order to encode and analyze the cardinality of .
III-A Cardinality Upper Bound Analysis
First, we give an upper bound for for any in order to use it later to estimate the cardinality of . For a sequence , let denote its period, that is, the smallest positive integer that satisfies for every . Initially, we prove the following useful lemma.
Lemma 5
. Let be an integer and a tuple of length . Let denote the set of sequences of length that contain as a window at least once, i.e.,
Then,
Proof:
We observe that can appear in a sequence of length many times only if its period is small. In particular, we show that if then can appear at most twice. Hence, we lower bound by placing in a sequence and setting the minimal number of symbols in order to ensure that appears at most twice in . At last, we account for occurrences that were enumerated multiple times.
Let denote the starting position of in , and assume w.l.o.g that is in the middle of , i.e., . We increase the period of forward by selecting in order to ensure that . Let and assume in the contrary that . However, this means that is a multiple of and hence since then which is a contradiction to the definition of . Similarly, we increase the period of backwards by selecting . If starts or ends with , we only need to select a single symbol in order to increase the period of forward or backwards, respectively. Hence, we enumerate for each position at least possible choices of .
Next, it is clear that another instance of can appear at most once after , at position , and once before , at position . Moreover, it is only possible for one of those instances to occur, since . Thus, when summing the possible choices of for every possible position of the tuple , we count each choice at most twice since can appear at most twice in . Thus,
∎
Next, we use the result of Lemma 5 to obtain an upper bound on for every and .
Lemma 6
. Let be positive integers such that , and let be any sequence of length . Then,
where and is the base of the natural logarithm.
Proof:
Consider the following set
which is the set of sequences which are concatenation of sequences from appended by any sequence of length . Note that since for every and , is not a substring of , i.e., . Hence,
(1) |
III-B Compression Algorithm for Binary Sequences
Next, we focus on binary sequences, i.e., , and present a sequences compression algorithm for any of length and large enough. The algorithm receives a -avoiding sequence of length and outputs a unique unconstrained sequence of length . Clearly, this algorithm can be used for any by padding to size and continuing regularly; hence we assume from now on that . This compression algorithm will be utilized in Section IV to encode binary -tuples covering sequences.
For every , we denote two functions,
Note that both functions append a single bit to . We have the following lemma,
Lemma 7
. For every , .
We say that a sequence has a long period if its period is at least half of its length, i.e., . Hence, from Lemma 7, for every , has a long period, and satisfies that its -suffix has a long period. These functions are utilized in the following compression algorithm.
The -avoiding compression algorithm (Algorithm 1) receives a sequence for and compresses it to some uniquely decodable sequence . Initially, the algorithm checks the first bit of . If it is zero, then the rest of is returned as the result (see Figure 1). Otherwise, an index is decoded from the subsequent bits of (by converting this binary sequence to its integer representation) and the algorithm will construct by inserting an occurrence of at this index. However, since such an insertion might create new instances of in the sequence , additional bits are appended to in order to ensure that the insertion index can always be deduced by the decoder (see Figure 2). The redundancy bits are added as follows; first, two bits are appended to (independently of the input sequence ) to construct , a sequence that satisfies that both and have long periods. As a result, when is inserted at position , at most three new occurrences can be created to the right of it (see Lemma 8 which follows). These cases are eliminated using the remaining bits appended to , denoted by . The result is a sequence with its rightmost occurrence of at position .
Input: A sequence
Output: A sequence
Proof:
Remind that
For any two values , it holds that and thus . Let denote the -suffix of which its period was increased by invoking . Since
it follows that contains and hence
where (a) follows from Corollary 7. Finally, assume in the contrary that there exist four unique values that satisfy the condition of the lemma. We have
which is larger than the difference between the maximal and the minimal values in for large enough, that is,
and we have a contradiction. ∎
Proof:
Assume in the contrary that the assignment at Step 6 is incorrect for some , i.e., . It follows that . However, from the proof of Lemma 8 must be the maximal value in and since the values of are indexed in decreasing order, the index of is and the assignment is correctly defined. The assignment at Step 8 is correctly defined since for large enough, it holds that
∎
Proof:
Assume in the contrary that contains another instance of at position . Clearly, does not contain since is a substring of the input sequence . Therefore, from the construction of at Step 8, . On the other hand, . It follows that for and hence belongs to the set that is constructed at Step 3. Combining with Lemma 8, this occurrence was eliminated at Step 5 of the algorithm using
and we have a contradiction. ∎
Theorem 11
Proof:
Clearly, if Algorithm 1 returns a sequence of length . Otherwise, the algorithm inserts bits at Step 8 after removing bits at Step 1, and the length of the result is as well. The time complexity of the algorithm follows from to decode the index , and other operations are on and a substring of length of . Note that the time complexity of finding the period of (or a substring of it) is also [5]. Hence, the total time complexity is .
A decoder for this algorithm constructs and looks for the rightmost occurrence of in . If such occurrence is found, then it must be at the insertion index from Lemma 10. From here, reconstructing is straightforward. Since the decoder scans the sequence in order to find the rightmost occurrence of , its time complexity is , which is larger than the time complexity of the encoder.
∎
Remark 1
. Note that Algorithm 1 is generic and fits any . Clearly, with some knowledge of the tuple some steps can be skipped and the supported tuple length can be larger. For example, if has a long period it is unnecessary to invoke at Step 2 and can be used. If , i.e., is aperiodic, then no additional bits are necessary (the algorithm returns ) and the algorithm is applicable to .
Example 3
. Let , . Notice that . We can construct in advance and since ,
Notice that .
Let and assume the input sequence . In this case, the algorithm simply returns at Step 1. The decoder receives and notices that no occurrences of exist in the sequence, and thus it correctly returns .
Next, assume . One can easily verify that and that does not appear as a window in . At Step 1, since the first bit of the sequence is , the algorithm decodes an index from the integer value of that is, , and denotes by the unused part of . Soon, the algorithm will insert the tuple at position . However, we need to ensure that this insertion does not create additional occurrences of that start to the right of .
Since , the set of indices that is constructed at Step 3 is . Note that we have although by Lemma 8 the maximal size of is . At Step 6 the algorithm constructs where ensures that a new occurrence of is not created when concatenating and . Finally, at Step 8 the algorithm returns the sequence
The decoder receives and identifies the rightmost occurrence of at position . Thus, it constructs
IV Tuples Covering Sequences
In this section, we focus on the main family of sequences discussed in this paper, -tuples covering sequences. We study the cardinality of this set of sequences and present a lower bound on its asymptotic rate for various values of . Then, we present an upper bound for such that the redundancy of the set of -tuples covering sequences is at most one symbol. Later, we present an encoding algorithm for binary -tuples covering sequences that uses a single redundancy bit that meets this bound on . Finally, we use a generalization of the de Bruijn graph to develop an upper bound for the cardinality of this set of sequences.
IV-A Lower Bound Analysis on the Rate
We begin the discussion of -tuples covering sequences for any alphabet of size . For integers , it is clear that if then since a sequence of such length can contain at most unique -tuples where . In the case where the set is exactly the set of de Bruijn sequences of order , and hence . For larger values of , we have the following lemma.
Lemma 12
. Let for . Then,
Proof:
We construct -tuples covering sequences of length using any de Bruijn sequence of order followed by any symbols from . The result follows immediately from the size of . ∎
Therefore, we have the next results.
Corollary 13
. Let for . Then,
Proof:
Using Lemma 12 we have
and the results for and follow immediately. For the last case, we apply and receive
∎
Alternatively, we have
Corollary 14
. Let and let . Then,
Proof:
By applying , we have , and . Thus, using Claim 12 we have
and the result follows as when and when . ∎
For convenience, we use both representations of and throughout this section.
IV-B Single Symbol Analysis on the Redundancy
Next, we present an upper bound on , where the redundancy of is at most a single symbol. This bound uses the upper bound on the cardinality of -avoiding sequences presented in Section III.
Theorem 15
. If are integers such that , then for large enough, .
Proof:
Let for some constant that will be determined later. For every sequence that is not an -tuples covering sequence, there exists such that is a -avoiding sequence, i.e., . Thus, using Lemma 6, the number of sequences that are not -tuples covering sequences can be bounded above by
(3) |
where . Therefore, in order to have , i.e., , we require that the value in equation (3) is bounded from above by . Hence, it is enough to have
(4) |
By applying , or alternatively , we get
which satisfies inequality (4) when , i.e., , and for large enough. ∎
IV-C Encoding Algorithm for the Binary Case
For the rest of this section, we assume that . We present an encoder for -tuples covering sequences over . This encoder uses a single redundancy bit and handles -tuples of length for large enough. Note that this value of is associated with the bound presented in Theorem 15.
This algorithm is based on the compressor of -avoiding sequences presented in Algorithm 1. For , let denote this compressor, i.e., receives a -avoiding sequence of length such that and outputs an unconstrained and uniquely decodable sequence over . Let denote the maximal sequence length that can be compressed with such that , that is, . Moreover, can be used to efficiently compress -avoiding sequences of length as well; the input sequence is split to consecutive segments of length and each of them is compressed separately. If there is a remainder smaller than , it is not compressed at all. This way, a -avoiding sequence of length can be compressed to a uniquely decodable sequence of length . We abuse the notation to denote this generalized compressor as well. Similarly, let denote the matching decoder of for any .
Algorithm 2 receives as an input , a sequence of length and outputs , an -tuples covering sequence of length . The goal of the algorithm is to shorten the input sequence enough in order to make room for appending a de Bruijn sequence of order at its end. The shortening procedure uses the family of compressors , based on the observation that as long as the sequence is not an -tuples covering sequence, then it is -avoiding for some tuple . Let denote a fixed de Bruijn sequence of length ; can be produced with time complexity of , see [9]. The algorithm first sets in order to mark the start of the encoding process for the decoder. Then, as long as is not -tuples covering, the algorithm repeatedly shortens by finding an -tuple that does not appear in . The algorithm encodes the occurrence and compresses using . This process ends when is either an -tuples covering sequence or it is short enough to be appended by . Either way, this results with an -tuples covering sequence which is returned after being padded to length .
Input: A sequence
Output: A sequence
We prove the correctness of Algorithm 2 in the next two lemmas.
Proof:
Let denote the sequence after the execution of Step 4 at some iteration of the loop. Hence,
where (a) follows from the fact that throughout the while loop , (b) follows from , and (c) follows from plugging , and (d) follows from plugging . ∎
Theorem 17
. Algorithm 2 successfully outputs a uniquely decodable -tuples covering sequence of length . The time complexity of the algorithm and its decoder is .
Proof:
Following Lemma 16, the while loop terminates and the algorithm reaches Step 6. At Step 6 satisfies at least one of the following, 1. and is -tuples covering, or 2. . In both cases, is a sequence of length that contains an entire -tuples covering sequence, in the first case and in the second case.
As for the time complexity, at every iteration the algorithm invokes at most times, each requires from Theorem 11. Additionally at every iteration finding that is not contained in can be done in . From Lemma 16, is shortened by at every iteration, hence we have at most iterations, and we receive the result in the theorem statement. Note that the time complexity of constructing the de Bruijn sequence is which does not affect the time complexity of the encoder.
In order to decode given , an output of Algorithm 2, we iteratively inverse the operation of the while loop using the set of decoders . As long as , we repeatedly extract and decode the rest of using . This process ends when , where the decoder returns .
Note that as a result of the possible concatenation of and to at Step 6, in some cases the values of in the matching iterations of the encoder and the decoder are not equal. In those cases, the decoded sequence contains an additional suffix and can be invoked on segments that were not outputs of in Algorithm 2. However, this does not impact the correctness of the decoder since those segments will be trimmed from at the end of the decoding process. ∎
Remark 2
. Algorithm 2 can be generalized to any using a -ary generalization of the compressor . In this case, the algorithm can encode -ary covering sequences of length for large enough.
IV-D Upper Bound on the Rate Using de Bruijn Graph
In this section we use an enumeration technique which was first used to enumerate de Bruijn sequences using the de Bruijn graph [18]. It was recently used to enumerate another generalization of de Bruijn sequences [16]. In this paper, this technique is used to derive an upper bound on the cardinality of -tuple covering sequences. For simplicity, we focus on binary sequences, although this technique can be extended to any alphabet of finite size.
Let , for . For every selection of -tuples (with repetitions) we construct a graph which is a generalization of the de Bruijn graph . The vertices of each graph are the same vertices of which are represented by the binary -tuples. The edges are the edges of with additional parallel edges corresponding to each of the -tuples picked. There are different graphs which are generated this way.
A reverse spanning tree of a generalized graph is a graph whose underlying graph is a tree rooted at some , it contains all the vertices of , and there is a unique directed path from each vertex of to . Clearly, all the generalized graphs share the same set of reverse spanning trees as , and there are such trees for each [18].
For each graph , reverse spanning tree and root vertex , we use a nondeterministic algorithm to attempt traversing all the edges of exactly once, starting from the root vertex . In order to ensure the uniqueness of the path with respect to , its edges are notated as starred, and the algorithm leaves a vertex on a starred edge only if it is the last outgoing edge of that was not traversed before (this algorithm is defined formally in [16]). If the algorithm succeeded traversing all the edges of , the result is a path that corresponds to an -tuples covering sequence of length . Notice that for some graphs, the algorithm might fail to traverse all the edges and to generate sequences. However, all the sequences of are produced by this method and thus enumerating the paths constructed by such algorithm derives an upper bound for .
Lemma 18
. For each reverse spanning tree, at most distinct acyclic sequences are constructed by the algorithm.
Proof:
First, there are graphs and vertices in each graph. For each graph, root vertex and spanning tree, we can bound the number of different orders to traverse the edges by . That is since each vertex besides the root that has parallel outgoing edges has at most different options to traverse its outgoing edges, since the edges of are traversed last. The root does not share outgoing edges with and therefore has at most different orders. Hence, we receive the value mentioned in the lemma statement. ∎
Corollary 19
. The total number of distinct acyclic sequences of length formed by the algorithm is at most .
The value presented in Corollary 19 provides an upper bound on . Next, we derive an upper bound for the asymptotic rate of -tuples covering sequences, for different values of . Recall the following result of Lemma 14, applied for ,
Corollary 20
. Let for . Then,
Theorem 21
. If where , then the asymptotic rate of -tuples covering sequences satisfies
Proof:
Similarly, when for , we have the next corollary which uses similar considerations as Theorem 21.
Corollary 22
. Let where for , then the asymptotic rate of -covering sequences satisfies
Note that the result of Corollary 22 is useful only for small values of in the range . Other values of are subject for future research.
Table I summarizes the results presented in this paper regarding the asymptotic rate of -tuples covering sequences.
Case | Result |
---|---|
Acknowledgment
S. Marcovich and E. Yaakobi were supported by the United States-Israel BSF grant no. 2018048. T. Etzion was supported by ISF grant no. 222/19.
References
- [1] N. Alon, J. Bruck, F. F. Hassanzadeh, and S. Jain, “Duplication distance to the root for binary sequences,” IEEE Transactions on Information Theory, vol. 63, no. 12, pp. 7793–7803, 2017.
- [2] A. H. Chan, R. A. Games, and E. L. Key, “On the complexities of de bruijn sequences,” Journal of Combinatorial Theory, Series A, vol. 33, no. 3, pp. 233 – 246, 1982.
- [3] Y. M. Chee, T. Etzion, H. M. Kiah, S. Marcovich, A. Vardy, V. Khu Vu, and E. Yaakobi, “Locally-constrained de bruijn codes: Properties, enumeration, code constructions, and applications,” IEEE Transactions on Information Theory, vol. 67, no. 12, pp. 7857–7875, 2021.
- [4] P. Compeau, P. Pevzner, and G. Tesler, “How to apply de bruijn graphs to genome assembly,” Nature biotechnology, no. 11, pp. 987–991, 2011.
- [5] A. Czumaj and L. Gasieniec, “On the complexity of determining the period of a string,” in CPM, 2000.
- [6] N. G. de Bruijn, “A combinatorial problem,” Koninklijke Nederlandse Akademie v. Wetenschappen, vol. 49, pp. 758–764, 1946.
- [7] O. Elishco, R. Gabrys, E. Yaakobi, and M. Mèdard, “Repeat-free codes,” IEEE Transactions on Information Theory, vol. 67, no. 9, pp. 5749–5764, 2021.
- [8] H. Fredricksen, “A survey of full length nonlinear shift register cycle algorithms,” SIAM Review, vol. 24, pp. 195–221, 1982.
- [9] H. M. Fredricksen, “A class of nonlinear de Bruijn cycles,” Journal of Combinatorial Theory, vol. 19, pp. 192–199, 1975.
- [10] R. Gabrys and O. Milenkovic, “Unique reconstruction of coded sequences from multiset substring spectra,” in Proc. of the IEEE International Symposium on Information Theory, Vail, Colorado, USA, 2018, pp. 2540–2544.
- [11] S. W. Golomb, “Shift register sequences,” San Francisco: Holden-Day, 1967.
- [12] H. M. Kiah, G. J. Puleo, and O. Milenkovic, “Codes for DNA sequence profiles,” IEEE Transactions on Information Theory, vol. 62, no. 6, pp. 3125–3146, 2016.
- [13] A. Lempel, “On a homomorphism of the de Bruijn graph and its applications to the design of feedback shift registers,” IEEE Transactions on Computers, vol. C-19, no. 12, pp. 1204–1209, 1970.
- [14] M. Levy and E. Yaakobi, “Mutually uncorrelated codes for DNA storage,” IEEE Transactions on Information Theory, vol. 65, no. 6, pp. 3671–3691, 2019.
- [15] F. J. MacWilliams and N. J. A. Sloane, The Theory of Error-Correcting Codes. North-Holland, 1978.
- [16] S. Marcovich, T. Etzion, and E. Yaakobi, “Balanced de Bruijn sequences,” in IEEE International Symposium on Information Theory (ISIT), 2021, pp. 1528–1533.
- [17] U. M. Maurer, “Asymptotically-tight bounds on the number of cycles in generalized de Bruijn-Good graphs,” Discrete Applied Mathematics, vol. 37-38, pp. 421 – 436, 1992.
- [18] F. J. Mowle, “Relations between cycles and stable feedback shift registers,” IEEE Transactions on Computers, vol. C-15, pp. 375–378, 1966.
- [19] A. Ralston, “A new memoryless algorithm for de Bruijn sequences,” Journal of Algorithms, vol. 2, pp. 50–62, 1981.
- [20] K. Schouhamer-Immink, Coding techniques for digital recorders. Prentice-Hall, 1991.
- [21] L. Song, F. Geng, Z. Gong, B. Li, and Y. Yuan, “Super-robust data storage in DNA by de bruijn graph-based decoding,” 2020.
- [22] T. van Aardenne-Ehrenfest and N. G. de Bruijn, “Circuits and trees in oriented linear graphs,” Simon Stevin : Wis-en Natuurkundig Tijdschrif, vol. 28, pp. 203–217, 1951.