This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Delayed Coding Scheme for Channels with Insertion, Deletion, and Substitution Errors

Ryo Shibata and Hiroyuki Yashima Dept. of Information and Computer Technology, Faculty of Engineering,
Tokyo University of Science, Tokyo 125-8585, Japan
Email: {r_shibata, yashima}@rs.tus.ac.jp
Abstract

We propose a new coding scheme, called the delayed coding (DC) scheme, for channels with insertion, deletion, and substitution (IDS) errors. The proposed scheme employs delayed encoding and non-iterative detection and decoding strategies to manage the transmission of multiple codewords in a linear code. In the DC scheme, a channel input sequence consists of subblocks of multiple codewords from the previous to current time instances. At the receiver side, the maximum a posteriori detection applies to the received sequences that contain information of the codeword at the current time instance, where priorly decoded codewords aid the detection. The channel code decoding is then performed, and extrinsic messages are exploited for the codeword estimations of the following time instances. We show that the rate achievable with the DC scheme over the IDS channel approaches the symmetric information rate of the channel. Moreover, we show the excellent asymptotic and finite-length performances of the DC scheme in conjunction with low-density parity-check codes.

I Introduction

Recently, synchronization error correction codes have garnered much attention owing to their occurrence in future storage devices[1, 2]. An insertion or deletion of a symbol is the most well-known instance of synchronization errors. To tackle random insertion and deletion (ID) errors, several constructions based on low-density parity-check (LDPC)[3, 4] codes have been proposed in the past two decades [5, 6, 7, 8, 9] and achieved promising results in approaching the symmetric information rate (SIR) of the channel. Davey and Mackay [5] and Wang et al. [6] studied the concatenated codes of outer LDPC codes with inner synchronization codes (e.g., marker and watermark codes). The inner codes serve to discriminate positions of ID errors by exploiting their non-uniform input distributions. Currently, standalone LDPC codes (i.e., without an inner code) are being investigated [7, 8, 9]. These standalone codes have synchronization recovery structures (e.g., structured irregularities [7] and low-degree check nodes[8]). Both concatenated and standalone codes are used in conjunction with a maximum a posteriori (MAP) detector, which infers a posteriori probabilities (APPs), prior to the belief-propagation (BP) decoder of an LDPC code. In general, the re-estimation of the APPs after each BP iteration (i.e., a joint iterative detection and decoding strategy) enhances decoding accuracy, at the expense of increased complexity and decreased throughput. To the best of our knowledge, there are no coding schemes that approach the SIRs of ID-type channels without the joint iteration.

Another concern for most of the existing coding schemes is the fact that they do not possess universal performance behaviors for noisy channels with ID errors, such as ID-substitution (IDS) and ID-AWGN channels. This means that different code structures are required to suppress different noise levels (or variance) [6, 9]. For instance, inner synchronization codes are utterly useless for correcting random noise errors as they simply result in a rate loss.

In this paper, we propose a new coding scheme, called the delayed coding (DC) scheme, which considers the transmission of a large number of codewords in a linear code. In the DC scheme, subblocks of multiple codewords from the previous to current time instances comprise a channel input sequence. Such a structure serves to remove the memory effects of ID errors from a particular codeword. At the receiver side, the codewords are successively decoded by employing a non-iterative detection and decoding algorithm. In estimating a particular codeword, the MAP detection is applied to the received sequences containing information regarding that codeword, where priorly decoded codewords function as pilot markers and aid the detection. We show that the rates achievable with the DC scheme over the IDS channel universally approach the SIRs, regardless of substitution probabilities. Furthermore, we demonstrate that the DC scheme with LDPC codes approaches the SIRs in terms of the asymptotic and finite-length performances, with reduced complexity compared to the existing coding scheme.

The remainder of this paper is organized as follows. In Section II, we introduce the proposed system model. In Section III, we show the achievable information rates (AIRs) for the proposed DC scheme. In Section IV, we demonstrate the asymptotic and finite-length performances when using LDPC codes. Finally, in Section V, we present our conclusions and future directions.

II System model

Throughout this paper, upper case letters are used to denote random variables, whereas their realizations are denoted by lower case letters.

II-A IDS channel model

We consider the concatenation of a binary input–output ID channel with a binary symmetric channel, i.e., an IDS channel [6, 9]. Let 𝑿=(X1,X2,,Xn){0,1}n\bm{X}=(X_{1},X_{2},\ldots,X_{n})\in\{0,1\}^{n} denote the input sequence. In this channel, each input bit XtX_{t}, t{1,,n}t\in\{1,\ldots,n\}, is deleted with probability pid/2p_{\rm id}/2, replaced by two random bits (i.e., insertion) with probability pid/2p_{\rm id}/2, or transmitted with probability 1pid1-p_{\rm id}. During transmission, XtX_{t} is flipped with a probability of psp_{\rm s}. Through this process, 𝑿\bm{X} is mapped to the output sequence 𝒀=(Y1,Y2,,Yn){0,1}n\bm{Y}=(Y_{1},Y_{2},\ldots,Y_{n^{\prime}})\in\{0,1\}^{n^{\prime}}, where nn^{\prime} can be larger or smaller than nn. For ID-type channels, the process YtY_{t} is typically formulated as the output of a hidden Markov model (HMM). Similar to [7, 8, 9], we adopt the channel state process StS_{t} given by a simple random walk in the presence of reflecting boundaries over the set {Dmax,,+Dmax}\{-D_{\rm max},...,+D_{\rm max}\}, where DmaxD_{\rm max} is the maximum allowed drift (Dmax>0D_{\rm max}>0). The state St=iS_{t}=i represents the timing drift at time tt (the difference between the number of transmitted and received bits is ii). The maximum drift Dmax=8D_{\rm max}=8 is used in this study based on observations in [8].

For ID-type channels, instead of capacity, the SIR is generally used as an achievable information rate. The SIR is defined as

CSIR=limn1nI(𝑿;𝒀)p(𝑿=𝒙)=2n,\displaystyle C_{\rm SIR}=\lim_{n\to\infty}\frac{1}{n}I(\bm{X};\bm{Y})\mid_{p(\bm{X}=\bm{x})=2^{-n}},

that is, the mutual information rates between the i.i.d. equiprobable inputs and corresponding outputs. The estimation of CSIRC_{\rm SIR} was carried out in [10, 11] based on the Shannon–McMillan–Breiman theorem for discrete stationary ergodic random processes.

II-B DC scheme

Refer to caption
Figure 1: Example illustrating the proposed DC encoding scheme with 𝑻=(0,1,2,3)\bm{T}=(0,1,2,3) and L=5L=5: (a) block level and (b) bit level.

We now describe the proposed DC scheme for channels with ID errors. The main idea underlying our scheme is inspired by a synchronization marker [6] and recently proposed modulation technique called delayed bit interleaved coded modulation [12].

Let 𝒄t=(c1,c2,,cn){0,1}n\bm{c}_{t}=(c_{1},c_{2},\ldots,c_{n})\in\{0,1\}^{n} represent the interleaved codeword selected from a binary linear code 𝒞\mathcal{C} at time t{1,,L}t\in\{1,\ldots,L\}, where LL is the number of transmitted codewords. Hereafter, no distinction is made between codewords and “interleaved” codewords. At the transmitter side, each codeword is divided into mm subblocks 𝒄t(1),𝒄t(2),,𝒄t(m)\bm{c}_{t}^{(1)},\bm{c}_{t}^{(2)},\ldots,\bm{c}_{t}^{(m)} of n/mn/m bits (for simplicity, nn is assumed to be divisible by mm). For i{1,,m}i\in\{1,\ldots,m\}, each subblock 𝒄t(i)\bm{c}^{(i)}_{t} is associated with a delay Ti{0,,Tmax}T_{i}\in\{0,\ldots,T_{\rm max}\}, where TmaxT_{\rm max} denotes the maximum allowed delay. We call 𝑻=(T1,T2,,Tm)\bm{T}=(T_{1},T_{2},\ldots,T_{m}) a delay scheme. While using a delay scheme 𝑻\bm{T}, the tt-th transmitted sequence 𝒙t{0,1}n\bm{x}_{t}\in\{0,1\}^{n}, t{1,,L+Tmax}t\in\{1,\ldots,L+T_{\rm max}\}, is obtained by interleaving 𝒄tT1(1),𝒄tT2(2),,𝒄tTm(m)\bm{c}^{(1)}_{t-T_{1}},\bm{c}^{(2)}_{t-T_{2}},\ldots,\bm{c}^{(m)}_{t-T_{m}} as follows:

𝒙t\displaystyle\bm{x}_{t} =𝒄tT1(1)||𝒄tT2(2)||||𝒄tTm(m),\displaystyle=\bm{c}^{(1)}_{t-T_{1}}||\bm{c}^{(2)}_{t-T_{2}}||\cdots||\bm{c}^{(m)}_{t-T_{m}}, (1)
=(ctT1,1(1),ctT2,1(2),ctTm,1(m),ctT1,2(1),ctT1,2(2),\displaystyle=(c^{(1)}_{t-T_{1},1},c^{(2)}_{t-T_{2},1},\ldots c^{(m)}_{t-T_{m},1},c^{(1)}_{t-T_{1},2},c^{(2)}_{t-T_{1},2},
,ctTm,2(m),ctT1,3(1),,ctTm,n/m(m)),\displaystyle\qquad\qquad\ldots,c^{(m)}_{t-T_{m},2},c^{(1)}_{t-T_{1},3},\ldots,c^{(m)}_{t-T_{m},n/m}), (2)

where |||| denotes the interleaving operator. The sequence 𝒙t\bm{x}_{t} is transmitted over an IDS channel, and the corresponding output 𝒚t\bm{y}_{t} is observed at the receiver. Figure 1 shows the configuration of transmitted sequences for 𝑻=(0,1,2,3)\bm{T}=(0,1,2,3) and L=5L=5, where m=4m=4 and Tmax=3T_{\rm max}=3. As shown in the figure, to start and terminate the encoding process, known bits (white-colored blocks in Fig. 1) participate in some transmitted sequences. In this study, we set these known-bits to be independent and uniformly distributed (i.u.d.), and the delay scheme 𝑻=(0,1,,Tmax)\bm{T}=(0,1,\ldots,T_{\rm max}) is employed for a given TmaxT_{\rm max}.

As explained in the following subsection, the delay and known bits lead to a performance improvement in detection (re-synchronization). However, they also incur a rate loss. Given a code rate RR of a component code 𝒞\mathcal{C}, the transmission rate of the DC scheme is

RDC=R(LL+Tmax).\displaystyle R_{\rm DC}=R\left(\frac{L}{L+T_{\rm max}}\right). (3)

We can see that, for large LL (LTmax)(L\gg T_{\rm max}), RDCR_{\rm DC} approaches RR. Note that the DC scheme does not impose a constant rate-loss as required for the use of inner codes.

For future use, we introduce the sets 𝒟j\mathcal{D}_{j} and 𝒟~j\mathcal{\tilde{D}}_{j} characterized by a given DC scheme. For j{0,,Tmax}j\in\{0,\ldots,T_{\rm max}\}, 𝒟j={iTi=j}\mathcal{D}_{j}=\{i\mid T_{i}=j\} and 𝒟~j={ij<Ti}\mathcal{\tilde{D}}_{j}=\{i\mid j<T_{i}\} denote the index sets of subblocks ii at delay equal to jj and larger than jj, respectively.

II-C Chained detection and decoding algorithm

Refer to caption
Figure 2: Example illustrating chained detection with 𝑻=(0,1,2,3)\bm{T}=(0,1,2,3) at time instances tt and t+1t+1.

We explain the chained detection and decoding algorithm used in the DC scheme in this section. This algorithm proceeds by successively estimating original codewords 𝒄t\bm{c}_{t}, t{1,,L}t\in\{1,\ldots,L\}, one by one starting from t=1t=1 to LL, where priorly decoded codewords help to recover the current codeword in the detection. The MAP detector, implemented with the Bahl–Cocke–Jelinek–Raviv (BCJR) algorithm, is employed here.

For i{1,,L}i\in\{1,\ldots,L\}, 𝒒i=(𝒒i(1),𝒒i(2),,𝒒i(m))\bm{q}_{i}=(\bm{q}_{i}^{(1)},\bm{q}_{i}^{(2)},\ldots,\bm{q}_{i}^{(m)}) denotes the extrinsic soft-messages output from the ii-th code decoder, whereas 𝒐i=(𝒐i(1),𝒐i(2),,𝒐i(m))\bm{o}_{i}=(\bm{o}_{i}^{(1)},\bm{o}_{i}^{(2)},\ldots,\bm{o}_{i}^{(m)}) denotes the sequence of log-APP ratios provided by the detector. These 𝒒i\bm{q}_{i} and 𝒐i\bm{o}_{i} correspond to the ii-th codeword 𝒄i\bm{c}_{i} and are also expressed as using mm divided subblocks. Let us now look at the procedure for estimating 𝒄t\bm{c}_{t}. The tt-th estimation can operate after the (t1)(t-1)-th estimation is completed and the sequences 𝒚t,𝒚t+1,,𝒚t+Tmax\bm{y}_{t},\bm{y}_{t+1},\ldots,\bm{y}_{t+T_{\rm max}} are received. First, for all i{0,,Tmax}i\in\{0,\ldots,T_{\rm max}\}, we run the MAP detection for the sequence 𝒚t+i\bm{y}_{t+i} given priori messages 𝒒tTj(j)\bm{q}_{t-T_{j}}^{(j)}, j𝒟~ij\in\mathcal{\tilde{D}}_{i}, and exploit a part of the log-APP ratios sequence 𝒐t(j)\bm{o}_{t}^{(j)}, j𝒟ij\in\mathcal{D}_{i}. This is illustrated in Fig. 2, where 𝑻=(0,1,2,3)\bm{T}=(0,1,2,3) (i.e., 𝒟i={i+1}\mathcal{D}_{i}=\{i+1\} for i{0,,Tmax}i\in\{0,\ldots,T_{\rm max}\}, 𝒟~0={2,3,4}\mathcal{\tilde{D}}_{0}=\{2,3,4\}, 𝒟~1={3,4}\mathcal{\tilde{D}}_{1}=\{3,4\}, 𝒟~2={4}\mathcal{\tilde{D}}_{2}=\{4\}, and 𝒟~3=ϕ\mathcal{\tilde{D}}_{3}=\phi). As shown in the figure, the log-APP ratios 𝒐t\bm{o}_{t} are collected from different (Tmax+1)(T_{\rm max}+1) received sequences and then fed to the tt-th code decoder, such as the BP decoder of an LDPC code. Finally, the decoder estimates 𝒄t\bm{c}_{t}, operating as it would on a memoryless channel, and sends the extrinsic messages 𝒒t(j)\bm{q}_{t}^{(j)}, j𝒟~0j\in\mathcal{\tilde{D}}_{0} to subsequent estimations.

As shown in Fig. 2, at the tt-th estimation round, the MAP detector can use reliable a priori information corresponding to the pre-estimated codewords 𝒄t3\bm{c}_{t-3}, 𝒄t2\bm{c}_{t-2}, and 𝒄t1\bm{c}_{t-1}. Moreover, the known bits provide perfect a priori information to the detector. In other words, these reliable bits behave as pilot symbols. The chained estimation strategy benefits from this advantage, which prevents the DC scheme from using a joint iterative detection and decoding strategy.

Before leaving this section, we compare the computational complexity between the existing and DC schemes in terms of the MAP detection. It goes without saying that the non-iterative scheme typically used with concatenated codes has the lowest complexity because it applies the MAP detection only once. For the iterative scheme, the execution number of the MAP detection depends on the received sequences, channel parameters, code structures, and decoder configuration. For instance, if we use a standalone LDPC code with a BP-based decoding algorithm, the detection is performed at every iteration until all parity check equations are satisfied, or a predefined number of iterations is reached. Meanwhile, for the DC scheme, the MAP detection is performed just (Tmax+1)(T_{\rm max}+1) times before channel code decoding. Note that these multiple detections are parallelizable because they are performed on received sequences that are independent of each other. Such a structure and non-iterative data flow strategy can improve the throughput and latency of the receiver and mitigate the effect of delay due to the DC scheme.

III Achievable information rates

Refer to caption
Figure 3: (a) Delay scheme channel. (b) ii-th delay subchannel.

In this section, we present AIRs for the DC scheme. As the BCJR algorithm has high complexity, strategies that apply the detection step only once are often considered for channels with states. The AIR with such a strategy is called the BCJR-once rate [13, 14, 15], which is defined as the mutual information between the channel inputs and corresponding log-APP ratios. To evaluate the AIRs of our scheme, we focus on the transmission of a particular random codeword. Specifically, we consider the concatenation channel of a DC, an IDS channel, and MAP detection, whose input and output are 𝑪{0,1}n\bm{C}\in\{0,1\}^{n} and 𝑶n\bm{O}\in\mathbb{R}^{n}, respectively. This channel, called a DC scheme channel (DCSC), is illustrated in Fig. 3 and can be decomposed into (1+Tmax)(1+T_{\rm max}) parallel independent channels, called delay subchannels, related to each delay.

Let 𝑪i\bm{C}_{i} and 𝑶i\bm{O}_{i} denote parts of the interleaved 𝑪\bm{C} and 𝑶\bm{O} of lengths n|𝒟i|/mn|\mathcal{D}_{i}|/m corresponding to the ii-th delay subchannel, respectively. We assume that the input sequence 𝑪\bm{C} is i.u.d.  and define the BCJR-once rate over a DCSC as

RDC=limn1ni=0Tmaxj=1n|𝒟i|/mI(Ci,j;Oi,j).\displaystyle R_{\rm DC}=\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{T_{\rm max}}\sum_{j=1}^{n|\mathcal{D}_{i}|/m}I(C_{i,j};O_{i,j}). (4)

In the following, instead of estimating RDCR_{\rm DC} directly, we determine it by separately considering a BCJR-once rate for each delay subchannel. In the ii-th delay subchannel, the IDS channel input 𝑿i=𝑿(1)||𝑿(2)||||𝑿(m)\bm{X}_{i}=\bm{X}^{(1)}||\bm{X}^{(2)}||\cdots||\bm{X}^{(m)} consists of three parts: (i) bits corresponding to 𝑪i\bm{C}_{i} (i.e., 𝑿(k),k𝒟i\bm{X}^{(k)},k\in\mathcal{D}_{i}), (ii) priorly estimated bits (i.e., 𝑿(z),z𝒟~i\bm{X}^{(z)},z\in\mathcal{\tilde{D}}_{i}), and (iii) the other bits (i.e., 𝑿(j),j𝒟i,𝒟~i\bm{X}^{(j)},j\notin\mathcal{D}_{i},\mathcal{\tilde{D}}_{i}). Suppose that the codewords transmitted before 𝑪\bm{C} are successfully decoded, then the BCJR-once rate R(i)R^{(i)} for the ii-th channel is given by

R(i)=limns1ns|𝒟i|j=1ns|𝒟i|I(Ci,j;Oi,j),\displaystyle R^{(i)}=\lim_{n_{\rm s}\to\infty}\frac{1}{n_{\rm s}|\mathcal{D}_{i}|}\sum_{j=1}^{n_{\rm s}|\mathcal{D}_{i}|}I(C_{i,j};O_{i,j}), (5)

where ns:=n/mn_{\rm s}:=n/m, and the MAP detector provides

oi,j=lnPr[Ci,j=0𝒀i=𝒚i,{𝑿(z)=𝒙(z);z𝒟~i}]Pr[Ci,j=1𝒀i=𝒚i,{𝑿(z)=𝒙(z);z𝒟~i}],\displaystyle o_{i,j}=\ln\frac{\Pr\left[C_{i,j}=0\mid\bm{Y}_{i}=\bm{y}_{i},\{\bm{X}^{(z)}=\bm{x}^{(z)};z\in\mathcal{\tilde{D}}_{i}\}\right]}{\Pr\left[C_{i,j}=1\mid\bm{Y}_{i}=\bm{y}_{i},\{\bm{X}^{(z)}=\bm{x}^{(z)};z\in\mathcal{\tilde{D}}_{i}\}\right]},

because the detector can exploit perfect information about bits for 𝑿(z)\bm{X}^{(z)}, z𝒟~iz\in\mathcal{\tilde{D}}_{i}. Ultimately, the BCJR-once rate over a DCSC is given by RDC=1mi=0Tmax|𝒟i|R(i)R_{\rm DC}=\frac{1}{m}\sum_{i=0}^{T_{\rm max}}|\mathcal{D}_{i}|R^{(i)}. The rate R(i)R^{(i)} can be numerically computed in a similar manner to [13]. We note that, assuming a DCSC is a memoryless channel, any rate r<RDCr<R_{\rm DC} is achievable. Although the channel in fact has memory (i.e., Pr(Oi,j,Oi,zCi,j,Ci,z)Pr(Oi,jCi,j)Pr(Oi,zCi,z)\Pr(O_{i,j},O_{i,z}\mid C_{i,j},C_{i,z})\neq\Pr(O_{i,j}\mid C_{i,j})\Pr(O_{i,z}\mid C_{i,z})), the use of a delay scheme 𝑻\bm{T} with large mm and |𝒟i|=1|\mathcal{D}_{i}|=1 for all i{0,,Tmax}i\in\{0,\ldots,T_{\rm max}\} makes a DCSC virtually memoryless.

Refer to caption
Figure 4: BCJR-once rates for the DC scheme, marker coding scheme, and i.u.d. inputs, on the ID channel (ps=0p_{\rm s}=0).
Refer to caption
Figure 5: BCJR-once limits for the DC scheme and marker coding scheme.

Figure 4 shows the BCJR-once rates RDCR_{\rm DC} with Tmax{3,7,15,31}T_{\rm max}\in\{3,7,15,31\} over ID channels (i.e., ps=0p_{\rm s}=0). It also shows three other types of curves: (1) the SIR CSIRC_{\rm SIR} as computed using the Arnold–Loeliger method[10], (2) the BCJR-once rate for the marker coding scheme[6] with two-bit periodic markers inserted into the i.u.d. sequence at every d{10,20}d\in\{10,20\} bits, and (3) the BCJR-once rate with i.u.d. inputs (i.e., the rate achievable with a standalone linear code using MAP detection applied only once). As shown in the figure, the BCJR-once rate of the DC scheme outperforms the marker coding scheme and i.u.d. inputs for a wide range of ID probabilities. For Tmax=15T_{\rm max}=15 and 3131, the gap between the once rate and SIR is almost closed without depending on ID probabilities pidp_{\rm id}. Figure 5 shows the pairs (pid,ps)(p_{\rm id},p_{\rm s}) such that the SIR/BCJR-once rate is equal to 0.50.5 for IDS channels, as a function of psp_{\rm s} (we call the pairs the SIR/BCJR-once limits). It is clear that the DC scheme also performs well over the IDS channel, and it universally approaches the SIR when Tmax=15T_{\rm max}=15 and 3131 regardless of psp_{\rm s}. Such a universal property does not apply to the coding schemes with inner codes (e.g., watermark[6] and marker[6] codes) because they themselves incur a rate loss for correcting random errors.

IV Performance of DC scheme with LDPC codes

This section presents the asymptotic and finite-length performances of the DC scheme with LDPC codes.

IV-A Asymptotic performance analysis

For memoryless symmetric channels, density evolution[4] is well known as a useful technique to predict the asymptotic performance of LDPC codes with BP-based decoding algorithms. This technique is also extended to channels with states or/and asymmetric nature[7, 13, 14], where LDPC coset codes are employed. Building on their results, we can establish the density evolution for the DC scheme with LDPC coset codes. When using LDPC codes, the chained estimation algorithm of the DC scheme involves (Tmax+1)(T_{\rm max}+1) MAP detection and subsequent BP decoding for each codeword/time instant t{1,,L}t\in\{1,\ldots,L\}, starting from t=1t=1. Here, our interest is the convergence behavior of the BP decoding for the tt-th codeword. We assume that the tt^{\prime}-th (t<tt^{\prime}<t) BP decoder produces a probability of error that converges to zero as the number of BP iterations tends to infinity, i.e., the prior decoding round is successful. Under this assumption, the transmission of the tt-th codeword is reduced to the transmission over a DCSC. Therefore, by running density evolution for a DCSC, we can predict the asymptotic threshold (i.e., the BP threshold) of an ID probability pidp^{\ast}_{\rm id} over the IDS channel with a given psp_{\rm s}.

In this study, we primarily investigate the irregular LDPC code ensembles introduced in [4]. They are specified by degree distribution pairs λ(x)=i=2dvλixi1\lambda(x)=\sum^{d_{\rm v}}_{i=2}\lambda_{i}x^{i-1} and ρ(x)=i=2dcρixi1\rho(x)=\sum^{d_{\rm c}}_{i=2}\rho_{i}x^{i-1}, where λi\lambda_{i} (resp. ρi\rho_{i}) denotes the fraction of the edges connected to the degree i2i\geq 2 variable (resp. check) nodes and dvd_{\rm v} (resp. dcd_{\rm c}) denotes the maximum degree of the variable (resp. check) nodes. The design rate of an LDPC code is given by R=1(01ρ(x)𝑑x/01λ(x)𝑑x)R=1-(\int_{0}^{1}\rho(x)dx/\int_{0}^{1}\lambda(x)dx).

Let us now compare the BP thresholds of some existing irregular LDPC codes of a rate R=0.5R=0.5, whose degree distributions are shown in Table I. The ID and IDS codes are designed for the iterative scheme over the ID (ps=0p_{\rm s}=0) and IDS (ps=0.04p_{\rm s}=0.04) channels, respectively. In Table II, we show the BP thresholds of these codes with the DC scheme. For comparison, we also show the SIRs and BP threshold of the C(J=3,K=6,L)C(J=3,K=6,L) SC-LDPC code [16] (variable node degree JJ, check node degree KK, and coupling length LL). We can see that, with the iterative scheme, the ID, IDS, and SC-LDPC codes exhibit outstanding performance, whereas the Bi-AWGN code cannot correct IDS errors. However, interestingly, for the Bi-AWGN code with the DC scheme, the BP thresholds approach the corresponding BCJR-once rates (see Fig. 4) given TmaxT_{\rm max} and are close to the SIRs, without depending on psp_{\rm s}. This is also observed for the DC scheme with the SC-LDPC code. These results indicate that the DC scheme provides its universal performance even in conjunction with practical linear codes and decoding schemes. For this to be realized, careful code selection is necessary. More precisely, it seems reasonable to use codes designed for random errors, instead of ID and IDS codes, which have many low-degree check nodes.

TABLE I: Degree distributions of the Bi-AWGN [4], ID [8], and IDS codes [9].
Bi-AWGN[4] ID[8] IDS[9]
ii λi\lambda_{i} ρi\rho_{i} ii λi\lambda_{i} ρi\rho_{i} ii λi\lambda_{i} ρi\rho_{i}
2 0.24426 2 0.39884 0.10018 2 0.24136 0.08363
3 0.25907 3 0.45664 3 0.49091
4 0.01054 4 0.30375 5 0.11696
5 0.05510 9 0.22102 7 0.07771 0.45892
7 0.25475 12 0.14452 0.37505 9 0.00506 0.13615
8 0.01455 0.73438 11 0.30544
9 0.01087 12 0.06800 0.01586
10 0.01275
12 0.40373
TABLE II: Comparison of BP thresholds pidp_{\rm id}^{\ast} with the DC scheme, iterative scheme, and non-iterative scheme. The left and right values in each cell denote the BP thresholds for the ID (ps=0p_{\rm s}=0) and IDS (ps=0.04p_{\rm s}=0.04) channels, where \ast signifies pid<0.001p_{\rm id}^{\ast}<0.001.
scheme code Bi-AWGN[4] ID[8] IDS[9] SC[7]
non-iterative \ast, \ast \ast, \ast \ast, \ast \ast, \ast
iterative \ast, \ast 0.097, \ast 0.072, 0.046 0.097, 0.048
DC (Tmax=3T_{\rm max}=3) 0.072, 0.018 0.024, \ast 0.048, \ast 0.072, 0.017
DC (Tmax=7T_{\rm max}=7) 0.088, 0.035 0.038, \ast 0.063, 0.011 0.087, 0.035
DC (Tmax=15T_{\rm max}=15) 0.091, 0.039 0.040, \ast 0.067, 0.018 0.090, 0.040
DC (Tmax=31T_{\rm max}=31) 0.091, 0.041 0.040, \ast 0.067, 0.019 0.090, 0.041
SIR limit 0.0997, 0.0499

IV-B Finite-length simulation

Refer to caption
Figure 6: BER curves of the proposed DC scheme with the Bi-AWGN code, iterative scheme with the ID and SC-LDPC codes, and non-iterative scheme with the concatenated LDPC/marker code in the ID channel (ps=0p_{\rm s}=0), where an LDPC code length of n5×104n\approx 5\times 10^{4} is used.
TABLE III: Average number of executions of MAP detection for the DC and iterative schemes.
pidp_{\rm id} code, scheme Bi-AWGN[4] ID[8] SC[7]
DC (Tmax=15T_{\rm max}=15) iterative
0.070 16 21.3 49.4
0.075 16 25.5 57.9
0.080 16 31.7 72.5

In this section, we present the simulation results of finite-length LDPC codes with the DC scheme. We compare the bit-error rate (BER) performance of the following schemes: (1) the DC scheme (we use 𝑻=(0,1,,Tmax)\bm{T}=(0,1,\ldots,T_{\rm max})) with the Bi-AWGN code, (2) the iterative scheme with the C(3,6,25)C(3,6,25) SC-LDPC and ID codes, and (3) the non-iterative scheme with concatenated rate-0.50.5 optimized LDPC and marker codes[6] (two-bit markers are inserted into an LDPC codeword every 10 bits).

Figure 6 shows the BER curves with length n5×104n\approx 5\times 10^{4} LDPC codes as a function of pidp_{\rm id} with a sufficient number of iterations, where ps=0p_{\rm s}=0. From the figure, the good performance can be obtained with the DC scheme for a wide range of TmaxT_{\rm max}, with a sharp waterfall drop in BER. The BER curves of the DC scheme are positioned between the non-iterative and iterative schemes, and approach the latter when Tmax=15T_{\rm max}=15 at ID probabilities around pid=0.08p_{\rm id}=0.08. Table III shows the average number of executions of MAP detection for the DC and iterative schemes. Although it is clear from Fig. 6 that the BERs of both schemes are vanishing (we did not observe any errors in our simulations), the DC scheme has smaller execution numbers compared with the iterative scheme. At pid=0.080p_{\rm id}=0.080, in terms of the complexity of detection, the DC scheme is 2 and 4.5 times as efficient as the iterative scheme with the ID and C(3,6,25)C(3,6,25) SC-LDPC codes, respectively.

V Conclusion

This paper proposed the DC scheme for IDS channels, which employs delayed encoding and non-iterative chained detection and decoding strategies. We presented the AIRs of the DC scheme over the channel and showed that for a large delay TmaxT_{\rm max}, the AIRs universally approach the SIRs without depending on substitution and ID occurrence probabilities. We also analyzed the asymptotic performance of the DC scheme with LDPC codes, and showed that, with an appropriate LDPC code, the BP threshold approaches the corresponding AIR, even though the MAP detection is performed only once before BP decoding. Moreover, the results of simulations conducted show that the DC scheme provides excellent performance with the finite-length codes while reducing complexity compared to the conventional scheme. In future work, to approach the SIRs as closely as possible with a small delay and low complexity, we plan to design a delay scheme 𝑻\bm{T} and degree distribution of LDPC codes.

Acknowledgments

This work was supported by JSPS KAKENHI Grant Numbers JP21K14160 and JP20K04473.

References

  • [1] C. Zhang, G. Sun, X. Zhang, W. Zhang, W. Zhao, T. Wang, Y. Liang, Y.Liu, Y. Wang, and J. Shu, “Hi-fi playback: tolerating position errors in shift operations of racetrack memory”, Proc. ACM/IEEE Annu. Int. Symp. on Computer Architecture (ISCA), pp.694–706, Portland, OR, USA, June 2015.
  • [2] S. M. H. T. Yazdi, H. M. Kiah, E. Garcia-Ruiz, J. Ma, H. Zhao, and O. Milenkovic, “DNA-based storage: trends and methods”, IEEE Trans. Mol. Biol. Multi-Scale Commun., vol.1, no.3, pp.230–248, Sep. 2015.
  • [3] R. Gallager, “Low-density parity-check codes”, IRE Trans. Inf. Theory, vol.8, no.1, pp.21–28, Jan. 1962.
  • [4] T. J. Richardson, M. A. Shokrollahi, and R. L. Urbanke, “Design of capacity-approaching irregular low-density parity-check codes”, IEEE Trans. Inf. Theory, vol.47, no.2, pp.619–637, Feb. 2001.
  • [5] M. C. Davey and D. J. C. Mackay, “Reliable communication over channels with insertions, deletions, and substitutions”, IEEE Trans. Inf. Theory, vol.47, no.2, pp.687–698, Feb. 2001.
  • [6] F. Wang, D. Fertonani, and T. M. Duman, “Symbol-level synchronization and LDPC code design for insertion/deletion channels”, IEEE Trans. Commun., vol.59, no.5, pp.1287–1297, May 2011.
  • [7] R. Goto and K. Kasai, “Sparse graph codes for channels with synchronous errors”, IEICE Trans. on Fundamentals, vol.E101.A, pp.2064–2071, Dec. 2018.
  • [8] R. Shibata, G. Hosoya, and H. Yashima, “Design and construction of irregular LDPC codes for channels with synchronization errors: new aspect of degree profiles,” IEICE Trans. on Fundamentals, vol.E103-A, no.10, pp.1237–1247, Oct. 2020.
  • [9] R. Shibata and H. Yashima, “Design and performance of low-density parity-check codes for noisy channels with synchronization errors,” IEICE Trans. on Fundamentals, vol.E105-A, no.1, pp.63–67, Jan. 2022.
  • [10] D. M. Arnold, H.-A. Loeliger, P. O. Vontobel, A. Kavcic, and W. Zeng, “Simulation-based computation of information rates for channels with memory”, IEEE Trans. Inf. Theory, vol.52, no.8, pp.3498–3508, Aug. 2006.
  • [11] J. Hu, T. M. Duman, M. F. Erden, and A. Kavcic, “Achievable information rates for channels with insertions, deletions, and intersymbol interference with i.i.d. inputs,” IEEE Trans. Commun., vol.58, no.4, pp.1102–1111, Apr. 2010.
  • [12] Y. Liao et al., “Design and analysis of delayed bit-interleaved coded modulation with LDPC codes,” IEEE Trans. Commun., vol.69, no.6, pp.3556–3571, Jun. 2021.
  • [13] A. Kavcic, Xiao Ma, and M. Mitzenmacher, “Binary intersymbol interference channels: Gallager codes, density evolution, and code performance bounds”, IEEE Trans. Inf. Theory, vol.49, no.7, pp.1636–1652, Jul. 2003.
  • [14] B. Soriaga, H. D. Pfister, and P. H. Siegel, “Determining and approaching achievable rates of binary intersymbol interference channels using multistage decoding”, IEEE Trans. Inf. Theory, vol.53, no.4, pp.1416–1429, Apr. 2007.
  • [15] S. R. Srinivasavaradhan, S. Gopi, H. D. Pfister, and S. Yekhanin, “Trellis BMA: coded trace reconstruction on IDS channels for DNA storage,” Proc. IEEE Int. Symp. on Information Theory (ISIT), pp.2253–2458, Melbourne, Australia, Jul. 2021.
  • [16] D. G. M. Mitchell, M. Lentmaier, and D. J. Costello, “Spatially coupled LDPC codes constructed from protographs”, IEEE Trans. Inf. Theory, vol.61, no.9, pp.4866–4889, Sep. 2015.