Reconstruction of Sequences Distorted by Two Insertions
Abstract
Reconstruction codes are generalizations of error-correcting codes that correct errors with the help of a given number of noisy reads. The study of such codes was initiated by Levenshtein in 2001 and has developed rapidly in recent years due to applications in modern storage devices such as racetrack memories and DNA storage. The central problem on this topic is to design codes with redundancy as small as possible for a given number of noisy reads. In this paper, the minimum redundancy of such codes for binary channels with exactly two insertions is determined asymptotically for all values of the number of noisy reads. Previously, such codes were studied only for channels with a single edit error or with two deletion errors.
Index Terms:
Sequence reconstruction, insertion channel, DNA storage
I Introduction
The study of sequence reconstruction problems was initiated by Levenshtein in [1, 2, 3] to combat errors by repeatedly transmitting a message without coding, which is of interest in fields like informatics, molecular biology, and chemistry. The setting of this problem is as follows: the sender transmits a sequence through several different noisy channels and the receiver obtains all the channel outputs, called noisy reads; the receiver then aims to reconstruct the transmitted sequence with the help of these outputs. There are probabilistic and combinatorial versions of this problem. In the probabilistic version, the noisy reads are obtained by transmitting the sequence through probabilistic channels, and the goal is to minimize the number of noisy reads required for reconstruction with high probability; see [2, 4, 5, 6, 7, 8, 9] and the references therein. In the combinatorial version, the noisy reads are obtained by introducing the maximum number of errors, and the goal is to determine the minimum number of noisy reads needed for zero-error reconstruction [2].
In this paper, we focus on the combinatorial version, which requires that all channel outputs are different from each other [2]. The problem of determining the minimum number of noisy reads is then reduced to determining the maximum intersection size of two distinct error balls; specifically, the minimum number of reads needed for zero-error reconstruction is one more than this maximum intersection size. In [2, 3], Levenshtein solved the problem for several error types, such as substitutions, insertions, deletions, transpositions, and asymmetric errors. In [10, 11, 12], permutation errors were analyzed. For general error graphs, see [2, 13, 14]. Note that these works considered the uncoded case, that is, the transmitted sequences are selected from the entire space. Due to applications in DNA storage, many researchers have also investigated this problem under the setting where the transmitted sequences are chosen from a given code with a certain error-correcting property; see for example [15, 16, 17, 18, 19].
Recently, the dual problem of sequence reconstruction was developed under the scenario that the number of noisy reads is a fixed system parameter. Motivated by applications in DNA-based data storage and racetrack memories, Kiah et al. [20, 21] and Chrisnata et al. [22] initiated the study of designing reconstruction codes, for which the size of the intersection between any two error balls is strictly less than the given number of reads. Note that when only one read is available, reconstruction codes are exactly the classical error-correcting codes. With more reads, one can increase the information capacity, or equivalently, reduce the number of redundant bits, by leveraging these multiple reads. The redundancy of a binary code of length is defined to be . In [20, 21], the authors focused on channels that introduce a single edit error, that is, a single substitution, insertion, or deletion, and their variants. For single-deletion binary codes, they showed that the number of redundant symbols required can be greatly reduced if one increases the number of noisy reads from one to two. For two deletions, the redundancy of the best known two-deletion correcting code (i.e., the single-read case) is given in [23]. In [22, 24], Chrisnata et al. reduced this redundancy by increasing the number of noisy reads from one to five, where each noisy read suffers exactly two deletions. For general multiple deletions, it was proved in [24] that the reconstruction code with two reads for a single deletion [21] is able to reconstruct codewords from a suitable number of distinct noisy reads.
It is well known that a code can correct a given number of deletions if and only if it can correct the same number of insertions. However, this equivalence no longer holds for reconstruction codes with more than one read, since the problem is then closely related to the size of the intersection of two error balls. From [17], [18] and [21, Proposition 9], we can see that in general the intersection size of two deletion balls is different from that of two insertion balls.
In this paper, we study binary reconstruction codes for channels that cause exactly two insertion errors. We show that the number of redundant symbols required to reconstruct a codeword uniquely can be reduced by increasing the number of noisy reads from one to five, and reduced further if the number of noisy reads is large enough. For the reader's easy reference, we summarize the best known results on binary two-deletion reconstruction codes and two-insertion reconstruction codes in Table I. For general multiple insertions, we can show a result similar to that in [24]: the reconstruction code with two reads for a single insertion [21] is able to reconstruct codewords from a suitable number of distinct noisy reads.
TABLE I: Best known results on binary two-deletion reconstruction codes (2-DRC) and two-insertion reconstruction codes (2-IRC).
Code | Number of reads | Redundancy | Reference
2-DRC | | | [23, Theorem 1]
| | | [24, Theorem 10]
| | | [24, Theorem 5]
| | | [24, Theorem 9]
| | | [2, Page 7], [24, Theorem 3]
2-IRC | | | [23, Theorem 1]
| | | Theorem V.2
| | | Proposition III.2
| | | Proposition III.2, Theorem IV.2
| | | Proposition III.2
This paper is organized as follows. In Section II, we introduce some notations and formally state the problem. In Section III, we derive some preliminary results, which help us to settle the trivial cases. In Section IV, we first completely characterize the structures of two sequences whose 2-insertion balls have an intersection of one of two particular sizes. Then, by utilizing these structures, we give explicit reconstruction codes for the corresponding range of the number of reads, which are asymptotically optimal in terms of redundancy. In Section V, by applying higher order parity checks on short subwords, we construct asymptotically optimal codes for the remaining range. Finally, we conclude this paper in Section VI with some open problems.
II Notations and Problem statement
In this section, we introduce some notations and preliminary results needed throughout this paper. Let denote the binary alphabet, and let denote the set of all binary sequences of length . Further, let denote the set consisting of all binary sequences of finite length. A sequence of length zero is the empty sequence, denoted by .
For a sequence and a sequence , where is an integer satisfying , if there exist indices such that for all , we say that is a subsequence of , or that is a supersequence of . In particular, when the chosen indices are consecutive, we say that is a subword of , and use the notation with and to denote . Let denote the length of . For any sequence $\mathbf{x}$ of length $n$ and any integer $t\ge 1$, let $I_t(\mathbf{x})$ denote the set of all supersequences of $\mathbf{x}$ of length $n+t$; we call $I_t(\mathbf{x})$ the $t$-insertion ball centered at $\mathbf{x}$. It is known from [2] that the size of $I_t(\mathbf{x})$ is independent of the choice of the center $\mathbf{x}$ and is equal to
$|I_t(\mathbf{x})| = \sum_{i=0}^{t} \binom{n+t}{i}$.    (1)
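As a quick sanity check of Equation (1), the following Python sketch (the helper names are mine, not from the paper) enumerates the $t$-insertion ball of a binary word by brute force and confirms that its size matches the binomial sum above for every centre of a small length.

```python
from itertools import product
from math import comb

def insertion_ball(x: str, t: int) -> set[str]:
    """All binary supersequences of x of length len(x) + t."""
    ball = {x}
    for _ in range(t):
        ball = {y[:i] + b + y[i:] for y in ball
                for i in range(len(y) + 1) for b in "01"}
    return ball

def ball_size(n: int, t: int) -> int:
    # Equation (1): |I_t(x)| = sum_{i=0}^{t} C(n+t, i), independent of the centre x
    return sum(comb(n + t, i) for i in range(t + 1))

if __name__ == "__main__":
    n, t = 6, 2
    for bits in product("01", repeat=n):
        assert len(insertion_ball("".join(bits), t)) == ball_size(n, t)
    print(f"|I_{t}(x)| = {ball_size(n, t)} for every centre of length {n}")
```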
The Levenshtein distance between two sequences, denoted by , is the smallest integer $t$ such that their $t$-insertion balls have a nonempty intersection. It is easy to see that this distance is symmetric, and that it equals zero if and only if the two sequences are equal. Finally, the Hamming distance between and , denoted by , is the number of coordinates where and differ, and the Hamming weight of is the number of nonzero coordinates of .
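Under this reading of the Levenshtein distance (the smallest $t$ for which the two $t$-insertion balls meet), it coincides, for two binary words of the same length, with the length minus the length of a longest common subsequence. The sketch below (my own helper names) verifies this identity exhaustively for a small length.

```python
from itertools import product

def lcs_len(x: str, y: str) -> int:
    """Length of a longest common subsequence, by the standard dynamic program."""
    dp = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i, a in enumerate(x, 1):
        for j, b in enumerate(y, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if a == b else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(x)][len(y)]

def insertion_ball(x: str, t: int) -> set[str]:
    ball = {x}
    for _ in range(t):
        ball = {y[:i] + b + y[i:] for y in ball for i in range(len(y) + 1) for b in "01"}
    return ball

def smallest_t_with_common_supersequence(x: str, y: str) -> int:
    t = 0
    while not insertion_ball(x, t) & insertion_ball(y, t):
        t += 1
    return t

if __name__ == "__main__":
    n = 4
    words = ["".join(bits) for bits in product("01", repeat=n)]
    for x in words:
        for y in words:
            assert smallest_t_with_common_supersequence(x, y) == n - lcs_len(x, y)
    print(f"the smallest t with intersecting t-insertion balls equals n - LCS for n = {n}")
```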
The intersection size of two error balls is a key ingredient of the reconstruction problem in coding theory. By Levenshtein’s original framework [2], for channels causing insertion errors, it is always possible to exactly reconstruct a transmitted sequence given sufficiently many distinct elements of its insertion ball, namely one more than the maximum intersection size of two distinct insertion balls, which is given by
(2)
This notation was generalized in [18] for any two sequences with a minimum Levenshtein distance. For integers , , let
then the explicit formula is given by
(3) |
Note that when , the values of reduce to and , respectively. See [18, Corollary 9] and [18, Corollary 10] for details.
Given a code with , the read coverage of after insertions is defined as
(4) |
It is clear that if , we can uniquely recover any codeword in by its distinct reads. So we have the following definition.
Definition II.1.
(Reconstruction codes) A code is an -reconstruction code if , where is the read coverage of after insertions.
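The following sketch (hypothetical helper names; it assumes that the read coverage in Equation (4) is the largest intersection of two $t$-insertion balls over distinct codewords) checks Definition II.1 by brute force for a small code. It is only a toy verifier for tiny parameters, not one of the constructions given in this paper.

```python
from itertools import combinations

def insertion_ball(x: str, t: int) -> set[str]:
    ball = {x}
    for _ in range(t):
        ball = {y[:i] + b + y[i:] for y in ball for i in range(len(y) + 1) for b in "01"}
    return ball

def read_coverage(code: list[str], t: int) -> int:
    """Assumed form of Equation (4): max |I_t(x) ∩ I_t(y)| over distinct codewords."""
    return max(len(insertion_ball(x, t) & insertion_ball(y, t))
               for x, y in combinations(code, 2))

def is_reconstruction_code(code: list[str], t: int, N: int) -> bool:
    # Definition II.1: the read coverage must be smaller than the number of reads N,
    # so that any N distinct reads from one t-insertion ball identify the codeword.
    return read_coverage(code, t) < N

if __name__ == "__main__":
    # toy example: length-6 words of even weight, checked for t = 2 insertions
    even_weight = [format(v, "06b") for v in range(64) if format(v, "06b").count("1") % 2 == 0]
    nu = read_coverage(even_weight, 2)
    print("read coverage after 2 insertions:", nu)
    print("reconstruction code with N =", nu + 1, "reads:",
          is_reconstruction_code(even_weight, 2, nu + 1))
```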
Given a code , the redundancy of is defined to be the value . Then we are interested in the following quantity
which is called the optimal redundancy of an -reconstruction code. By definition, an -reconstruction code is also an -reconstruction code for any . So we have for all .
When , an -reconstruction code is indeed a -insertion correcting code, or equivalently a -deletion correcting code. This class of codes has been extensively studied in recent years, see [25, 26, 27, 28, 23] for more details. For , we have the following single deletion correcting code [29]:
$\mathrm{VT}_a(n) = \Big\{\mathbf{x}\in\{0,1\}^n : \sum_{i=1}^{n} i\,x_i \equiv a \pmod{n+1}\Big\},$    (5)
where $a$ is a fixed integer between $0$ and $n$. This is the famous Varshamov-Tenengolts (VT) code, which has asymptotically optimal redundancy in the single-read case [30, Corollary 2.3]. In [21, Theorem 24], Cai et al. determined the asymptotic value of the optimal redundancy for all numbers of reads, as below.
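Equation (5) can be tried out directly. The sketch below (helper names are mine) builds $\mathrm{VT}_a(n)$ for a small $n$ and confirms the classical fact that each such code corrects a single deletion, i.e., the single-deletion balls of distinct codewords are disjoint.

```python
from itertools import combinations

def vt_code(n: int, a: int) -> list[str]:
    """VT_a(n): binary words x of length n with sum_i i*x_i ≡ a (mod n+1), as in Equation (5)."""
    words = [format(v, f"0{n}b") for v in range(2 ** n)]
    return [x for x in words if sum((i + 1) * int(b) for i, b in enumerate(x)) % (n + 1) == a]

def single_deletion_ball(x: str) -> set[str]:
    """All words obtained from x by deleting exactly one symbol."""
    return {x[:i] + x[i + 1:] for i in range(len(x))}

if __name__ == "__main__":
    n = 8
    for a in range(n + 1):
        code = vt_code(n, a)
        for x, y in combinations(code, 2):
            assert single_deletion_ball(x).isdisjoint(single_deletion_ball(y))
    print(f"every VT_a({n}) corrects a single deletion")
```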
Theorem II.1.
Taking in [21, Theorem 24], we have
Our work extends the result in Theorem II.1 to the case of two insertions. As will be shown in Corollary III.2 and the paragraph prior to Proposition III.2, the lower bound on the optimal redundancy can be deduced from the single-insertion case. The upper bound needs explicit code constructions. So the main task is to construct an -reconstruction code with redundancy as small as possible, for any given number of reads. Before closing this section, we introduce the following notations, which will be used later.
For , we write for the concatenation of and . In particular, if , we define to be the sequence obtained by concatenating ’s; if , is the empty sequence. If , we denote as the complementary sequence of . For any and , we denote , , , , and .
Example II.1.
Let , and . Then , and . Let , and . Then , , , and .
Given a sequence , we say that it has period $p$ if $p$ is the smallest positive integer such that every symbol equals the symbol $p$ positions later, whenever both are defined. A sequence with period 2 is called alternating. We also view an empty sequence or a length-one sequence as an alternating sequence. For example, 0, 10, 010, 0101, and 10101 are all alternating sequences.
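The period of a word can be computed by a direct scan. The short sketch below (my own helper names) does this and flags the alternating words, i.e., those of length at most one or of period two, matching the convention above.

```python
def period(x: str) -> int:
    """Smallest p >= 1 with x[i] == x[i+p] for all valid i; the full length if no proper self-overlap."""
    for p in range(1, len(x)):
        if all(x[i] == x[i + p] for i in range(len(x) - p)):
            return p
    return len(x)

def is_alternating(x: str) -> bool:
    # length-0/1 words are treated as alternating, as in the text; otherwise period 2
    return len(x) <= 1 or period(x) == 2

if __name__ == "__main__":
    for x in ["", "0", "01", "0101", "10101", "0110", "000"]:
        print(repr(x), "period:", period(x) if x else "-", "alternating:", is_alternating(x))
```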
III Preliminary results
As mentioned before, the intersection size of two error balls is a key ingredient of the reconstruction problem. In this section, we characterize the structures of two binary sequences whose -insertion balls have an intersection of a given size.
Let us first recall the results for a single insertion, that is, the size of the intersection of two 1-insertion balls. By Equation (2), we know that this intersection has size at most two for any two distinct sequences. The characterization of when equality holds was given in [21], which needs the following notions.
Definition III.1.
(Type-A confusability) Suppose that . We say that are Type-A confusable, if there exist , , such that
where is an alternating sequence of length at least one, i.e., for some or for some , where or .
For example, the two sequences and are Type-A confusable with , and . In general, if the Hamming distance , then and are always Type-A confusable by definition. Note that the case where was not included in the original definition in [21, Definition 8], where is required when . However, the following statement is still true, which is just a straightforward combination of the two claims in [21, Proposition 9].
Lemma III.1.
[21, Proposition 9] Let be distinct. Then if and only if are Type-A confusable.
In fact, if two sequences and are Type-A confusable, then
for some , and . By Lemma III.1, for the left case and for the right case. Next, we define the concept of Type-B confusability, which was introduced in [22, Definition 18] and [24, Definition 11]. This concept will help to characterize the case .
Definition III.2.
(Type-B confusability) Suppose that with . We say that are Type-B confusable, if there exist , , and such that
Example III.1.
We give some examples of these two types of confusability.
• Let and . Then and are Type-A confusable with , and . They are also Type-B confusable with , , , and .
• Let and . Then and are Type-A confusable with and . Obviously, they are not Type-B confusable.
• Let and . Then and are Type-B confusable with , , and . It is easy to see that they are not Type-A confusable.
• The two sequences and are neither Type-A confusable nor Type-B confusable.
From Example III.1, we can see that Type-A confusability and Type-B confusability do not contain each other. In general, we have the following proposition, which is easy to verify.
Proposition III.1.
From Definition III.1 and Definition III.2, one can easily check that
(1) if , are Type-A confusable, then they are Type-B confusable if and only if for some , or for some ;
(2) if , are Type-B confusable, then they are Type-A confusable if and only if one of the following two conditions is satisfied:
(i) and for some ;
(ii) and for some .
Lemma III.2.
Suppose . If , then are Type-B confusable.
Proof:
Since , then by Lemma III.1, and are not Type-A confusable, which further implies that . So we can assume that
where and . Let be the leftmost and rightmost indices, respectively, where and differ. See Figure 1 for a depiction.
Suppose that . If , then , are Type-A confusable, which is a contradiction. If , then and thus , a contradiction. Therefore, we conclude that and are both nonempty.
Let and . Suppose that and are inserted in and to get , respectively. Then there are only two possibilities: and , or and . If and , then and thus . This means that there exists such that and . Therefore, are Type-B confusable. If and , the proof is similar. ∎
By Lemma III.2, if , then we must have for some , , and . Consequently, is the unique sequence in . Combining Lemma III.1 and Lemma III.2, we completely characterize the structures of two binary sequences for which the intersection of -insertion balls is of a certain size. See the following corollary and Figure 2.
Corollary III.1.
Let be distinct. Then
• if and only if and are neither Type-A confusable, nor Type-B confusable;
• if and only if and are Type-B confusable, but not Type-A confusable;
• if and only if and are Type-A confusable.
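Corollary III.1 can be checked exhaustively for small lengths. The sketch below (hypothetical helper names) tabulates the intersection sizes of the 1-insertion balls over all pairs of distinct words of a small length; in line with the single-insertion bound recalled after Equation (2), no pair shares more than two common supersequences, and the counts of the sizes that occur can be read off directly.

```python
from collections import Counter
from itertools import combinations, product

def insertion_ball(x: str, t: int) -> set[str]:
    ball = {x}
    for _ in range(t):
        ball = {y[:i] + b + y[i:] for y in ball for i in range(len(y) + 1) for b in "01"}
    return ball

if __name__ == "__main__":
    n = 6
    words = ["".join(bits) for bits in product("01", repeat=n)]
    sizes = Counter(len(insertion_ball(x, 1) & insertion_ball(y, 1))
                    for x, y in combinations(words, 2))
    print(f"|I_1(x) ∩ I_1(y)| over all distinct pairs of length {n}:", dict(sizes))
    assert max(sizes) <= 2  # consistent with the single-insertion bound
```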
Now we proceed to estimate the intersection size of two 2-insertion balls, that is , based on the value of . By Equation (2), we have that . That is, for any . The following lemma further restricts its value in the case that .
Lemma III.3.
Let and . If , then . In particular, when and , we have
• if and only if and are neither Type-B confusable, nor Type-A confusable;
• if and only if and are Type-B confusable, but not Type-A confusable;
• if and only if and are Type-A confusable.
Proof:
Since , we conclude that are Type-B confusable by Lemma III.2. Assume that and for some and . Let . Then . Let . We prove that by induction on the length .
The base case is , that is, and . In this case, we have and
The second equality in the first line follows from the fact that , and similarly for the second equality in the second line.
By the form of , we have and . Thus is the disjoint union of and . By Equation 1, . So it remains to show that , or equivalently by Equation 2. Otherwise, if , then and , or and for some . For both cases, and are Type-A confusable by Proposition III.1 (2), which contradicts the fact that . This completes the proof of the base case by Corollary III.1.
In the base case, we have shown that for some set (i.e. ) of size at most two. Suppose we have proved the same conclusion for the codeword length , and we will prove it for length . Now , and . Without loss of generality, we assume that and for some and . Then . Denote , and , which are obtained from by deleting the first element . By the induction hypothesis, , for some set of size at most two. Then . Hence , and this completes the proof. ∎
By Lemma III.3, we are able to give a rough estimate of based on the different values of , and consequently based on the confusability of and by Corollary III.1. See the following lemma and Figure 2.
Lemma III.4.
Let where . Then
(i) if and only if ;
(ii) if and only if ;
(iii) if and only if .
Proof:
(i) We prove the sufficiency by contradiction. If , that is , then by Equation 3. If , then by Lemma III.3. So we must have .
For the necessity, recall that . Since , by Lemma III.1, there must be some , and , such that
For the left case, let and . Then and . Noting that by Equation 1, we have . Therefore, . For the right case, the proof is similar.
The claim (ii) is clear by Lemma III.3, the claim (i) and the fact that if . The claim (iii) then follows too. ∎
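Lemma III.4 links the size of the intersection of two 2-insertion balls to that of the corresponding 1-insertion balls. The following brute-force sketch (toy length only, my own helper names) groups all distinct pairs of short words by the single-insertion intersection size and prints which two-insertion intersection sizes occur in each group, which makes the correspondence visible for small lengths.

```python
from collections import defaultdict
from itertools import combinations, product

def insertion_ball(x: str, t: int) -> set[str]:
    ball = {x}
    for _ in range(t):
        ball = {y[:i] + b + y[i:] for y in ball for i in range(len(y) + 1) for b in "01"}
    return ball

if __name__ == "__main__":
    n = 5
    words = ["".join(bits) for bits in product("01", repeat=n)]
    by_single = defaultdict(set)
    for x, y in combinations(words, 2):
        s1 = len(insertion_ball(x, 1) & insertion_ball(y, 1))
        s2 = len(insertion_ball(x, 2) & insertion_ball(y, 2))
        by_single[s1].add(s2)
    for s1 in sorted(by_single):
        print(f"|I_1 ∩ I_1| = {s1} -> observed |I_2 ∩ I_2| values:", sorted(by_single[s1]))
```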
By Lemma III.4 and Figure 2, the following relations between reconstruction codes for two insertions and those for a single insertion are immediate under certain conditions. See Figure 3 for a depiction.
Corollary III.2.
Let and .
(i) When , is an -reconstruction code if and only if is an -reconstruction code.
(ii) When , is an -reconstruction code if and only if is an -reconstruction code.
By now, we are able to determine for most values of . First, since for any , we have and for any . By Lemma III.4, we see that if , an -reconstruction code must be an -reconstruction code, and thus by Theorem II.1; if , an -reconstruction code must be an -reconstruction code, and thus by Theorem II.1. The last case corresponds to a 2-insertion error-correcting code, or equivalently a 2-deletion error-correcting code, whose best known upper bound on the optimal redundancy is given in [23, Theorem 1], and whose best known lower bound follows from [29]. Proposition III.2 summarizes the information given above.
Proposition III.2.
Therefore, the remaining cases are and , which will be handled in the rest of this paper. In fact, we construct asymptotically optimal reconstruction codes for with redundancy , and for with redundancy , thus determining the asymptotically optimal redundancy for all .
IV Reconstruction Codes for
In this section, we construct reconstruction codes for with redundancy as small as possible. The main idea is as follows. First, we choose an -reconstruction code with small redundancy, or equivalently an -reconstruction code (see Corollary III.2 (i)), which can be found in [21, Theorem 17]. Then we impose additional constraints on such that under these constraints, and consequently we obtain an -reconstruction code .
The main task in our construction is to characterize the structures of two sequences when or . By Lemma III.4, we have . Then by Lemma III.2, we can assume that
for some and , where satisfies neither of the following conditions by Proposition III.1:
(6) |
Let . In the proof of Lemma III.3, we have shown that for some set of size at most two. The following result reveals more information about , whose proof is deferred to Appendix A.
Lemma IV.1.
Let and for some and , and let . If and are not Type-A confusable, then
and the union is disjoint.
Let be the length of . Then by Equation 1, we have and . Lemma IV.1 implies
By Equation 6 and Proposition III.1, and are Type-B confusable but not Type-A confusable, and hence by Corollary III.1. Then by Lemma III.3. We also have due to the fact . So the above equality implies that or if and only if or , respectively. Thus, we can further assume that . The following theorem characterizes the conditions when or .
TABLE II
Row number | Conditions
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
Theorem IV.1.
Proof:
Denote . Let and . Then
(7) |
(i). From Lemma III.3, we know that if and only if and are Type-A confusable. That is to say, and for some , where is an alternating sequence of length at least one. Denote and . Then
(8) |
Since , we have . If , consider the prefix of . By the third line of Equation 8 and the second line of Equation 7, we have and , respectively. Thus , which contradicts the second line of Equation 8. So we have , and Equation 8 becomes
(9) |
Next, we determine the explicit structure of according to the values of and . Write . Then
(10) |
For convenience, let , . Since , the first two lines of Equation 7 and the first line of Equation 9 imply that is alternating. Therefore,
(11) |
We divide our discussion into three cases.
• . In this case, we have and . Therefore, and when is even, or and when is odd.
• . In this case, we have and . Therefore, and when is odd, or and when is even.
• . In this case, for by the second line of Equation 9 and Equation 10. So by the third line of Equation 9. Hence is alternating by the first line of Equation 9 and Equation 10. Since , we have and by the first line of Equation 9. Now, combining with Equation 11, we come to the desired forms of : for all ,
and
This completes the proof of (i).
(ii). The proof of this case is long and tedious, thus we move it to Appendix B. ∎
Theorem IV.1 gives a full characterization of the structures of two sequences when or , respectively. This will help to exclude such pairs of sequences from an -reconstruction code (or an -reconstruction code) to obtain an -reconstruction code, as mentioned in the beginning of this section. We use the -reconstruction code in [21, Theorem 17] as a candidate for our construction. For that, we need the following notation.
For any , define
By definition, for any and any ,
(12) |
where denotes the number of appearing in . Let denote the set of all sequences such that the length of any subword with period of is at most , for any . By definition, if and .
Lemma IV.2.
([31, Theorem 13]) For , if , we have that .
For , let and . In [21, Theorem 17], the authors defined the following code
They proved that is an -reconstruction code in the following way. The constraint on implies that the Hamming weights of any two distinct codewords and in have the same parity and thus . Therefore, if , we can conclude that and for some and some alternating sequence of positive even length. On the one hand, it can be shown that . So the constraint on leads to and hence . On the other hand, the length of is upper bounded by since has period . So we have , which is impossible. Therefore, is an -reconstruction code.
By Corollary III.2 (i), the code is an -reconstruction code. To obtain an -reconstruction code from , we have to exclude pairs of sequences satisfying that or . By Theorem IV.1, such pairs have a common subword with a certain periodic property. Thus we can further bound the length of and use the constraint on to get a contradiction and exclude such pairs. See our construction below.
Theorem IV.2.
() For integers such that , let and . Let
Then is an -reconstruction code. Furthermore, if , then has redundancy at most for some choice of and .
Proof:
Since , that is , it suffices to show that for any two distinct sequences .
If, on the contrary, there exists such a pair , then without loss of generality, we can assume that
for some and . Since , we conclude that .
If , then for some by Theorem IV.1(i). Since , we have . From Equation 12, we know that
So by the definition of . Then , which implies that , a contradiction.
If , then is one of the four forms in Table II under the case by Theorem IV.1(ii). For each case, we can deduce that , thus a contradiction. For example, if for some , then . On the other hand, since , we have , , and . So , which implies , a contradiction.
So we conclude that is an -reconstruction code. Finally, we can choose . Then by Lemma IV.2. Note that for different pairs , the codes are disjoint from each other. So by the pigeonhole principle, there must be some and such that , which has redundancy for large enough. ∎
Obviously, the code in Theorem IV.2 is also an -reconstruction code. Then combining with Proposition III.2, we have for . However, we can do a little bit better for . Under the same notations, let
Then we can show that is an -reconstruction code by an argument similar to that in Theorem IV.2. Furthermore, if we set , then has redundancy at most for some choice of and . Since this is about half of the corresponding value in Theorem IV.2, we know that the redundancy of is about one bit smaller than that of .
V Reconstruction codes for
In this section, we provide reconstruction codes for with redundancy as small as possible. It seems that we could also construct such a code by characterizing the structures of two sequences such that or , as we did in the last section. However, this would be a long and tedious task. So the construction we use here is quite different from the one for .
In [27, Theorem 2], Sima et al. constructed a class of -deletion correcting codes with redundancy , by generalizing the VT code with higher order parity checks of some indicator vectors of the original sequences. Inspired by their constructions, we will present an -reconstruction code with redundancy by utilizing five different reads. This improves the trivial upper bound to , which is tight in terms of the leading term. We first recall the construction in [27, Theorem 2], which needs the following notations.
For ( ) and , the -indicator of is defined as follows:
For example, if , then and . Note that the - and -indicators of any sequence do not contain consecutive ones. Define the following integer vectors of length :
(13) |
In other words, the th components of , and are , and , respectively. Then given , the higher order parity checks for and are defined as
(14) |
where denotes the inner product over the integers. By applying the above parity checks, Sima et al. proved the following result.
Theorem V.1.
Clearly, the redundancy of is at most for some choice of , , , and . Although this implies an -reconstruction code, the redundancy is large. To reduce the redundancy, we note that the parity checks in the construction of Theorem V.1 are applied to the whole sequence of length to correct two random insertion errors. However, as will be shown later, the two random insertion errors are located in a short subword (say of length ) of the original sequence, if we impose some constraint (belonging to ) on the sequences. So we can apply the parity checks in Equation 14 only on this subword of length to combat the two insertion errors, which will greatly reduce the redundancy of the code if .
Now assume that () and . Then by Lemma III.4, and hence by Lemma III.1. So we can always assume that
for some and . Further, and , since otherwise . As in Lemma IV.1, we have
(15) |
In particular, . The proof of Equation 15 is similar to that of Lemma IV.1 and thus omitted.
By Equation 15, we know that if , then any sequence in can be obtained from by inserting two symbols in , or from by inserting two symbols in . Therefore, the idea is: if for some , then by partitioning (or ) into non-overlapping segments of length (see Figure 4; the rightmost segment has length at most ), we can see that the two inserted symbols are located in at most two adjacent segments. Thus we can apply the higher order parity checks in Equation 14 to the subword of length . Intuitively, the redundancy can thus be greatly reduced.
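The localization argument can be illustrated with a toy simulation (parameters and helper names are mine; this is not the parity-check construction itself, only the segment bookkeeping). Whenever the two insertion positions are at most one segment length apart, as the discussion around Equation 15 guarantees under the constraints imposed later, they fall inside a single window formed by two adjacent segments, and the read agrees with the original word outside that window up to the length shift.

```python
import random

def insert(x: str, pos: int, bit: str) -> str:
    return x[:pos] + bit + x[pos:]

if __name__ == "__main__":
    random.seed(1)
    n, k = 60, 10  # toy sequence length and segment length, with k dividing n
    for _ in range(5_000):
        x = "".join(random.choice("01") for _ in range(n))
        # two insertion positions whose span is at most k (the situation ensured above)
        p = random.randrange(n)
        q = random.randrange(p, min(p + k, n) + 1)
        z = insert(insert(x, q, random.choice("01")), p, random.choice("01"))
        i = p // k                              # first affected segment of x
        lo, hi = i * k, min((i + 2) * k, n)     # window of at most two adjacent segments
        # outside this window, the read agrees with x (up to the length-2 shift)
        assert z[:lo] == x[:lo]
        assert z[hi + 2:] == x[hi:]
    print(f"two insertions with span at most {k} always lie in one window of two adjacent segments")
```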
To bound the length of or , we restrict our sequences to be within , which is of size at least when by Lemma IV.2. Suppose . If we can prove that and are both concatenations of a finite number of sequences of period at most , then and can be upper bounded by . Next, we will show that this is indeed the case.
Let , where and . Then , where
It is clear that , and are mutually disjoint, and each has size at most two. Suppose that . Then we must have:
(16) |
For the former case of Equation 16, and are Type-A confusable, i.e., there exist some such that
(17) |
where is an alternating sequence of length at least one. Since , we conclude that and are Type-A confusable or Type-B confusable. With these notations and observations, we characterize the structures of and in the following two lemmas for different cases. The proofs are in Appendix C.
Lemma V.1.
Suppose that and are Type-A confusable, and that and are Type-A confusable. Then and for some , where the four sequences and all have period at most .
Lemma V.2.
Suppose that and are Type-A confusable, and that and are Type-B confusable. Then and for some , where the six sequences and all have period at most .
For the latter case of Equation 16, we have and are Type-A confusable by , and and are Type-A (or B) confusable by . As in Equation 17, we can let and , and show that Lemma V.1 and Lemma V.2 are both true for this case.
Now suppose that , and . Then Lemma V.1 and Lemma V.2 imply that
(18) |
So in the rest of this subsection, we let . For convenience, we always assume that , that is, each segment in Figure 4 has length . All results in this subsection can be extended to the case by a truncation method; see Remark V.2.
For with , partition into segments of length as in Figure 4. For any two consecutive segments, we define the parity checks on this subword of length as in Equation 14 as follows. For each , let and , which are the indicators of the -th segment of . Define
where is defined in Equation 13 but with length . If we let , then it is easy to see that and . Next, we will sum up these ’s and ’s according to the parity of . Let be the set of even integers and let be the set of odd integers in , respectively. It is clear that if (or ) and , then the two intervals and are disjoint. Define
with the summation over , and define
with the summation over . Now we are ready to give our construction.
Theorem V.2.
() Let and be integers such that and . For any , , and , let be the set of all such that the following four conditions hold:
• ;
• ;
• ; and
• .
Then is an -reconstruction code. Furthermore, if , then has redundancy at most for some choice of , , and .
Proof:
According to Equation 5, the first condition implies that is a subset of a VT code. So , and hence for any distinct . Assume that
for some and . By Lemma III.4, . We will show that by contradiction.
Assume that . Then by Equation 18 we can partition the sequences and into segments each of length , such that the two inserted symbols in the common supersequence are located in at most two adjacent segments containing or . See Figure 4 for illustrations. This means that there exists some such that and are subwords of and , respectively. Without loss of generality, we assume and the proof is the same for the case . Let and . Then , , and . By the choice of , we know that for each , and hence and . Therefore, we have
The second condition in this theorem implies and , i.e., and . By Theorem V.1, we know that . However, by Equation 15, , a contradiction. Thus the code is indeed an -reconstruction code.
Finally, we choose , which implies that by Lemma IV.2. So there must be some , , , such that , where . By computing the redundancy, we complete the proof. ∎
Remark V.1.
If we replace the fourth condition in Theorem V.2 with , and take and , then we will get an -reconstruction code with redundancy at most , the constant term of which is slightly smaller than that of the redundancy of the code given in Theorem V.2.
Remark V.2.
Now we explain how to extend the construction in Theorem V.2 to the case . Let and be as before. Let be the smallest integer such that and . For each , let be the word of length obtained by appending zeros at the end of . Then in Theorem V.2, modify the second and third conditions by replacing with , and keep the other two conditions unchanged. By almost the same arguments, we can show that there exists an -reconstruction code with redundancy at most .
When , an -reconstruction code must be an -reconstruction code. So the lower bound of is (see Equation 5). Therefore, the redundancy of the code in Theorem V.2 has optimal leading term. It is worth noting that the idea in this section can be applied to all . The main task is to upper bound the length of and under some restrictions. We only accomplished this task when . When , the analysis will be more complicated.
To conclude this section, we summarize the main results into the following theorem.
Theorem V.3.
VI Conclusion
In this paper, we study the redundancy of reconstruction codes for channels with exactly two insertions. Let be the number of given channels and let be the sequence length. It turns out that the nontrivial cases are and . For and , our constructions of codes are both explicit, and the ideas are quite different. When , the construction is based on characterizations of two sequences when the intersection size of their -insertion balls is exactly or . When , the construction is based on higher order parity checks on short subwords. Consequently, for all , we construct codes which are asymptotically optimal in terms of redundancy.
For general -insertions (with being a constant), we have the following result as a generalization of Lemma III.3.
Lemma VI.1.
Let and . Suppose that , then
for any .
The proof of Lemma VI.1 is in Appendix D. By Equation 1 and Equation 2, we can see that . Thus Lemma VI.1 gives a result similar to that in [24]: the reconstruction code with two reads for a single insertion [21] is able to reconstruct codewords from distinct noisy reads.
For future research, we raise the following questions.
(1) In Theorem V.2, we construct an -reconstruction code with redundancy . The leading term is optimal, but we do not know whether the coefficient of the secondary term is optimal or not. This is left for future study.
(2) Using the notations of Section V, we can see that if and only if and . In other words, if and only if
By these equivalent conditions, we can determine the explicit structures of and . Then, similarly to Theorem IV.2, we can construct an -reconstruction code with redundancy . However, the proofs are long and tedious. The details of this construction can be provided upon request. Again, we do not know whether the secondary term is optimal or not.
(3) Find a way to upper bound the lengths of and in Section V. This will be helpful for constructing -reconstruction codes for .
Appendix A The proof of Lemma IV.1
Proof:
Recall that , and . For simplicity, we denote , , and . Clearly, , and the two sets are disjoint. Next we will prove . It is easy to see that
So we only need to prove that the opposite inclusion is true. Since , it is sufficient to show . Assume that . Suppose that are the two insertion positions in (meaning that two symbols are inserted to the left of and , respectively, to get ), and that are the two insertion positions in . Let and be the leftmost and rightmost indices where and differ (see Figure 5). Comparing with , we can divide the discussion into three cases: (1) ; (2) ; (3) .
Case (1): (Figure 6). Since , we must have in this case, and either or .
Subcase (1): If , then by matching positions, we have
Subcase (2): If , then
For , we have
If the second bracket is not empty, then . This holds if and only if and , or and , for some . Both cases imply that and are Type-A confusable by Proposition III.1(2), which is a contradiction. Therefore, is empty and so
This means that , since is also in () by assumption.
Case (2) (Figure 7): . Since , we cannot have exactly one of them less than . Thus we have or in this case.
Subcase (1): If then we must have since . So we have
Subcase (2): If , then
Note that . So
The last equality follows from . If is not empty, then we have , which means and so . This implies that and are Type-A confusable, which is a contradiction. Therefore, is empty and
This means that , since is also in () by assumption.
Case (3) (Figure 8): . In this case, we have or . By a process similar to that of the previous two cases, we will get or .
Combining cases (1), (2) and (3), we conclude that . Thus the proof is completed. ∎
Appendix B Proof of Case (ii) of Theorem IV.1
Proof:
(ii). Recall that , and , where is the length of . In addition, we write and .
By Lemma III.3, if and only if and are Type-B confusable, but not Type-A confusable. That is to say,
(19) |
or
(20) |
for some and . Note that since and have the same Hamming weight. Suppose that , where . Then and so .
In order to simplify arguments in the sequel, we let , , and . Then it is clear that and for all . Both of Equation 19 and Equation 20 indicate that and . Therefore, we have the following two identities:
(21) |
for any and . From the first row in Equation 21, we know that is the same as that in Equation 11. And the second row in Equation 21 implies that
(22) |
From Equation 22, we can see that if , then . However, if , we get no information from Equation 22. Furthermore, if is even, we have , ; if is odd, we have , . Therefore, we must have since .
Now we divide our discussions into two cases, according to Equation 19 or Equation 20. As we will see later, they will lead to different values of .
• Firstly, we assume that Equation 19 holds. Then , , , and . Thus we have
(23) The three identities above imply that
(24) Therefore, if , then . Combining Equations 11, 22 and 24, we get
(25) for all and . Recall that if and if . So the first and last rows correspond to the case where , while the second and third rows correspond to the case where .
• Secondly, we assume that Equation 20 holds. Comparing with the previous case, the only difference is that Equation 23 becomes
(26) Therefore, if , we have . Recall that . From the first two rows in Equation 26, we can deduce that . Otherwise, the first two rows in Equation 26 will result in , which contradicts Equation 11. Now the three identities in Equation 26 indicate that has period , and that and .
If , then and . So from the periodicity of , we have and . Similarly, if , then and . So and . If , then . So and , where the last equality follows from Equation 11. This contradicts the fact that . So we cannot have .
By the above discussions, we have
(27) Recall that if and if . Now, combining Equations 11, 22 and 27, for all and , we get eight cases for the values of , which are listed in Table III. Note that in the third and seventh rows of Table III, we cannot have . Otherwise, we will get , which contradicts the second row in Equation 26.
From Equation 25 and Table III, we can derive the results in Table II. Precisely (in Table III, we require ),
• the first row in Table II follows from the second row in Equation 25;
• the second row in Table II follows from the third row in Equation 25;
• the third and fourth rows in Table II follow from Table III;
• the fifth row in Table II follows from the first row in Equation 25;
• the sixth row in Table II follows from the fourth row in Equation 25;
• the seventh and eighth rows in Table II follow from Table III.
Therefore, we have proved the “only if” part of the conclusion.
For the “if” part, it is straightforward to verify that if we take to be one of these values, and are Type-B confusable. Comparing Table II with the results in (i), we can see that and are not Type-A confusable. This completes the proof of (ii).
TABLE III
No. | Conditions for and |
1 | | No
2 | | Yes
3 | | Yes
4 | | No
5 | | Yes
6 | | No
7 | | No
8 | | Yes
∎
Appendix C Proofs of Lemma V.1 and Lemma V.2
Lemma V.1 is the combination of Claim 1–Claim 4, and Lemma V.2 is the combination of Claim 1 and Claim 5–Claim 7.
In this section, we always denote , and . Then , and . Let . For simplicity, we define and . So . Since , we conclude that and are Type-A confusable or Type-B confusable. Therefore, we can choose to be the leftmost index such that and to be the rightmost index such that . By the choice of and , we have and . Furthermore, according to Definition III.1 and Definition III.2, we conclude that
• if and are Type-A confusable, then is an alternating sequence and .
• if and are Type-B confusable, then . Besides, one of the following two conditions must hold:
(28) or
(29)
Claim 1.
If , then is a periodic sequence and its period is at most . If , then is a periodic sequence and its period is at most .
Proof:
We only prove the conclusion for , since for , the proof is similar.
When , it is clear that the period of is . So we assume that . When , we have , which implies that is a periodic sequence and its period is at most . ∎
Claim 2.
Suppose that . If and are Type-A confusable, then , where are sequences of period at most .
Proof:
Since , we know that . When , the conclusion is trivial. So we assume that . When , we have , and . The first and third equalities imply that and both have period at most . Recall that is an alternating sequence. So or . Otherwise the second equality will lead to , which is impossible. Now we can conclude that is the concatenation of two sequences, and each of them has period at most . ∎
Claim 3.
Suppose that and . If and are Type-A confusable, then , where are sequences of period at most .
Proof:
Since , we know that . When , the conclusion is trivial. So we assume that . When and , we have and . The second equality implies that has period at most . Recall that is an alternating sequence and . So or . Otherwise the first equality will lead to , which is impossible. Now we can take and . ∎
Claim 4.
Suppose that and are Type-A confusable, and that . If , or if and , then , where are sequences of period at most .
Claim 5.
Suppose that and , and that and are Type-B confusable. Then
• , where are sequences of period at most ; or
• , where are sequences of period at most .
Proof:
When , the conclusion is trivial. So we assume that . If Equation 28 is true, then and . Now we can conclude that is a run and hence has period , and that is a sequence of period at most .
If Equation 29 is true, then and . Now we can conclude that is a sequence of period at most , and that is a sequence of period at most .
∎
Claim 6.
Suppose that , and that and are Type-B confusable. Then , where are sequences of period at most .
Proof:
Similar to the proof of Claim 2, we can see that and both have period at most . If Equation 28 is true, then . Now we can conclude that is a run and hence has period .
If Equation 29 is true, then . Therefore, is a sequence of period at most . ∎
Claim 7.
Suppose that and are Type-B confusable, and that .
• If and , then
– , where are sequences of period at most ; or
– , where are sequences of period at most .
• If , then , where are sequences of period at most .
Appendix D The proof of Lemma VI.1
Firstly, we need the following lemma, which is a generalization of the base case (i.e., the case where ) in the proof of Lemma III.3.
Lemma D.1.
Let and such that
for some and . If , then
for any .
Proof:
It is clear that . Let . Then and
If , then and , or and for some . For both cases, and are Type-A confusable by Proposition III.1 (2), which contradicts the fact that . Therefore, and hence . ∎
Proof of Lemma VI.1 Since , by Lemma III.2 we can assume that and for some and . Let . Then . Let . We prove this theorem by induction on and .
The base case is or . When , the conclusion follows from Lemma D.1. When , the conclusion follows from Lemma III.3, since . Now suppose we have proved this theorem for all and . Without loss of generality, we assume that for some and . Clearly, , and
By the induction hypothesis, and . By Equations (1) and (2), we can get the following two identities,
Then the proof is completed.
References
- [1] V. I. Levenshtein, “Reconstruction of objects from the minimum number of distorted patterns,” Dokl. Akad. Nauk, vol. 354, no. 5, pp. 593–596, 1997.
- [2] ——, “Efficient Reconstruction of Sequences,” IEEE Trans. Inf. Theory, vol. 47, no. 1, pp. 2–22, Jan. 2001.
- [3] ——, “Efficient Reconstruction of Sequences from Their Subsequences or Supersequences,” J. Combinat. Theory, A, vol. 93, no. 2, pp. 310–332, Feb. 2001.
- [4] T. Batu, S. Kannan, S. Khanna, and A. McGregor, “Reconstructing Strings from Random Traces,” in Proc. ACM-SIAM Symp. Discrete Algorithms (SODA), New Orleans, LA, USA, Jan. 2004, pp. 910–918.
- [5] S. Kannan and A. McGregor, “More on reconstructing strings from random traces: insertions and deletions,” in Proc. Int. Symp. Inf. Theory (ISIT), Adelaide, Australia, Sep. 2005, pp. 297–301.
- [6] K. Viswanathan and R. Swaminathan, “Improved string reconstruction over insertion-deletion channels,” in Proc. ACM-SIAM Symp. Discrete Algorithms (SODA), San Francisco, CA, USA, Jan. 2008, pp. 399–408.
- [7] T. Holenstein, M. Mitzenmacher, R. Panigrahy, and U. Wieder, “Trace reconstruction with constant deletion probability and related results,” in Proc. ACM-SIAM Symp. Discrete Algorithms (SODA), San Francisco, CA, USA, Jan. 2008, pp. 389–398.
- [8] M. Cheraghchi, R. Gabrys, O. Milenkovic, and J. Ribeiro, “Coded Trace Reconstruction,” IEEE Trans. Inf. Theory, vol. 66, no. 10, pp. 6084–6103, Oct. 2020.
- [9] J. Brakensiek, R. Li, and B. Spang, “Coded trace reconstruction in a constant number of traces,” in Proc. Annu. Symp. Found. Comput. Sci. (FOCS), Durham, NC, USA, Nov. 2020, pp. 482–493.
- [10] E. Konstantinova, V. I. Levenshtein, and J. Siemons, “Reconstruction of permutations distorted by single transposition errors,” arXiv: 0702191, 2007.
- [11] E. Konstantinova, “On reconstruction of signed permutations distorted by reversal errors,” Discrete Math., vol. 308, no. 5, pp. 974–984, Mar. 2008.
- [12] ——, “Reconstruction of permutations distorted by reversal errors,” Discrete Appl. Math., vol. 155, no. 18, pp. 2426–2434, Nov. 2007.
- [13] V. I. Levenshtein, E. Konstantinova, E. Konstantinov, and S. Molodtsov, “Reconstruction of a graph from 2-vicinities of its vertices,” Discrete Appl. Math., vol. 156, no. 9, pp. 1399–1406, May 2008.
- [14] V. I. Levenshtein and J. Siemons, “Error graphs and the reconstruction of elements in groups,” J. Combinat. Theory, A, vol. 116, no. 4, pp. 795–815, May 2009.
- [15] E. Yaakobi, M. Schwartz, M. Langberg, and J. Bruck, “Sequence reconstruction for Grassmann graphs and permutations,” in Proc. Int. Symp. Inf. Theory (ISIT), Istanbul, Turkey, Oct. 2013, pp. 874–878.
- [16] R. Gabrys and E. Yaakobi, “Sequence reconstruction over the deletion channel,” in Proc. Int. Symp. Inf. Theory (ISIT), Barcelona, Spain, Aug. 2016, pp. 1596–1600.
- [17] ——, “Sequence Reconstruction Over the Deletion Channel,” IEEE Trans. Inf. Theory, vol. 64, no. 4, pp. 2924–2931, Apr. 2018.
- [18] F. Sala, R. Gabrys, C. Schoeny, and L. Dolecek, “Exact Reconstruction From Insertions in Synchronization Codes,” IEEE Trans. Inf. Theory, vol. 63, no. 4, pp. 2428–2445, Apr. 2017.
- [19] V. L. P. Pham, K. Goyal, and H. M. Kiah, “Sequence Reconstruction Problem for Deletion Channels: A Complete Asymptotic Solution,” arXiv:2111.04255v1, 2021.
- [20] H. M. Kiah, T. Thanh Nguyen, and E. Yaakobi, “Coding for Sequence Reconstruction for Single Edits,” in Proc. Int. Symp. Inf. Theory (ISIT), Los Angeles, CA, USA, 2020, pp. 676–681.
- [21] K. Cai, H. M. Kiah, T. T. Nguyen, and E. Yaakobi, “Coding for Sequence Reconstruction for Single Edits,” IEEE Trans. Inf. Theory, vol. 68, no. 1, pp. 66–79, Jan. 2022.
- [22] J. Chrisnata, H. M. Kiah, and E. Yaakobi, “Optimal Reconstruction Codes for Deletion Channels,” arXiv: 2004.06032, 2020.
- [23] V. Guruswami and J. Håstad, “Explicit two-deletion codes with redundancy matching the existential bound,” IEEE Trans. Inf. Theory, vol. 67, no. 10, pp. 6384–6394, Oct. 2021.
- [24] J. Chrisnata, H. M. Kiah, and E. Yaakobi, “Correcting Deletions with Multiple Reads,” IEEE Trans. Inf. Theory, vol. Early Access, 2022.
- [25] J. Brakensiek, V. Guruswami, and S. Zbarsky, “Efficient Low-Redundancy Codes for Correcting Multiple Deletions,” IEEE Trans. Inf. Theory, vol. 64, no. 5, pp. 3403–3410, May 2018.
- [26] R. Gabrys and F. Sala, “Codes Correcting Two Deletions,” IEEE Trans. Inf. Theory, vol. 65, no. 2, pp. 965–974, Feb. 2019.
- [27] J. Sima, N. Raviv, and J. Bruck, “Two Deletion Correcting Codes From Indicator Vectors,” IEEE Trans. Inf. Theory, vol. 66, no. 4, pp. 2375–2391, Apr. 2020.
- [28] J. Sima and J. Bruck, “On Optimal -Deletion Correcting Codes,” IEEE Trans. Inf. Theory, vol. 67, no. 6, pp. 3360–3375, Jun. 2021.
- [29] V. I. Levenshtein, “Binary codes capable of correcting deletions, insertions and reversals,” Soviet Physics Doklady, vol. 10, no. 8, pp. 707–710, Feb. 1966.
- [30] N. J. A. Sloane, “On single-deletion-correcting codes,” Codes and Designs, vol. 10, pp. 273–291, May 2002.
- [31] Y. M. Chee, H. M. Kiah, A. Vardy, V. K. Vu, and E. Yaakobi, “Coding for Racetrack Memories,” IEEE Trans. Inf. Theory, vol. 64, no. 11, pp. 7094–7112, Nov. 2018.