
Performance Limit and Coding Schemes for Resistive Random-Access Memory Channels

Guanghui Song, Kui Cai, Xingwei Zhong,
Jiang Yu, and Jun Cheng
Abstract

Resistive random-access memory (ReRAM) is a promising candidate for the next-generation non-volatile memory technology due to its simple read/write operations and high storage density. However, its crossbar array structure causes a severe interference effect known as the “sneak path.” In this paper, we propose channel coding techniques that mitigate both the sneak-path interference and the channel noise. The main challenge is that the sneak-path interference is data-dependent and correlated within a memory array, so conventional error correction coding schemes are inadequate. In this work, we propose an across-array coding strategy that assigns a codeword to multiple independent memory arrays, and exploit a real-time channel estimation scheme to estimate the instantaneous status of the ReRAM channel. Since the coded bits from different arrays experience independent channels, a “diversity” gain is obtained during decoding, and when the codeword is adequately distributed over different memory arrays, the code performs as if over an uncorrelated channel. By performing decoding based on the scheme of treating-interference-as-noise (TIN), the ReRAM channel over different memory arrays is equivalent to a block-varying channel that we define, for which we derive capacity bounds and propose a coding scheme. The proposed coding scheme consists of a serial concatenation of an optimized error correction code with a data shaper, which enables the ReRAM system to achieve a storage efficiency close to the capacity limit.

Index Terms:
ReRAM, sneak path, across-array coding, data shaping

I Introduction

Resistive random-access memory (ReRAM) is an emerging non-volatile memory technology that changes the resistance value of a memristor to represent the two states of the binary user data: the High-Resistance State corresponds to logic 0, while the Low-Resistance State corresponds to logic 1. A memristor cell is positioned at each row-column intersection of a crossbar structure, which offers a huge density advantage for ReRAM systems [1].

When a cell in a memory array is read, a voltage is applied to the memristor cell, and the current that flows through the memristor is used to sense its resistance value. If the memristor is detected in the High-Resistance State, the bit is decided to be a ‘0’; if it is detected in the Low-Resistance State, the bit is decided to be a ‘1’. A fundamental drawback of the ReRAM crossbar array is the sneak-path problem [2]. Sneak paths are undesirable current paths in parallel with the selected cell being read. Current flowing through the sneak paths degrades the measured resistance value. Sneak paths are detrimental when a cell in the High-Resistance State is read, because the resistance degradation may lead to an erroneous decision.

In the literature, several works [3, 4, 5] tackled the sneak-path problem using information- and communication-theoretic frameworks. In particular, Y. Cassuto et al. [3] studied the maximum storage efficiency when constrained codes are employed to completely avoid sneak paths. This method incurs a high code rate loss, especially when the array size is large: Y. Cassuto et al. proved that as the array size approaches infinity, the storage information rate must approach 0 in order to achieve a sneak-path-free channel. On the other hand, a commonly used method to eliminate sneak paths is to introduce a cell selector in series with each array cell. However, these selectors are also prone to failure due to imperfections in the memory fabrication and maintenance process, leading to the reoccurrence of sneak paths [4][5]. Y. Ben-Hur [4] and Zehui et al. [5] considered ReRAM systems with imperfect selectors that fail with a certain probability. They built a probabilistic sneak-path model and developed the corresponding data detection schemes. A main challenge for the ReRAM channel is that the sneak-path induced interference is data-dependent. Moreover, the sneak-path interference is also correlated between different locations of the crossbar array. Previous work [4] developed single-cell data detection schemes that detect the data of each memory cell independently. More sophisticated joint-cell data detection schemes were developed in [5] by introducing pilot cells. However, the probabilistic model in [5] becomes too complex when the array size grows large. No error correction code (ECC) was employed in these previous works [4][5].

According to information theory, an efficient way to achieve reliable data storage is to apply ECC to the system. Such a design cannot be a straightforward extension of the conventional ECC designed for symbol-wise independent and identically distributed (i.i.d.) channels: besides the channel noise, the code must overcome the sneak-path interference, which is data-dependent and also correlated within the crossbar array.

In this paper, we propose efficient coding and decoding schemes for ReRAM channels. To reduce the correlation of the sneak-path interference within a codeword, we propose an across-array coding strategy that spreads a codeword over multiple independent memory arrays, and also exploit a real-time channel estimation scheme to obtain the channel status of each memory array. Since the coded bits from different memory arrays experience independent channels, the overall channel is averaged and a “diversity” gain is obtained during decoding. In this way, we can design the coding scheme over a symbol-wise i.i.d. channel. By further applying treating-interference-as-noise (TIN) decoding, the ReRAM channel is equivalent to a block-varying channel whose status does not depend on the input data, based on which we derive the lower and upper bounds of its channel capacity and propose a coding scheme. The proposed coding scheme consists of an outer irregular repeat-accumulate (IRA) code concatenated with an inner data shaper, which changes the input data distribution so as to achieve the maximum information rate. A low-complexity joint message-passing decoding for the IRA code and the data shaper is developed based on the state-of-the-art sparse-graph coding theory. With an optimized IRA code, our ReRAM system achieves a storage efficiency close to the capacity limit.

The rest of this paper is organized as follows. In Section II, we present the ReRAM channel model and describe the data-dependent feature of the sneak-path interference. The across-array coding strategy and the capacity bounds for the ReRAM channel are proposed in Section III. In Section IV, we propose a coding scheme for ReRAM channels and present both numerical and simulation results. In Section V, we conclude the paper.

II ReRAM Channel Model

Consider an $m\times n$ crossbar memory array. The memristor at the intersection of row $i$ and column $j$ is memory cell $(i,j)$. Each array stores an $m\times n$ binary data matrix $\boldsymbol{X}=[x_{i,j}]_{m\times n}$, where bit $x_{i,j}\in\{0,1\}$ is stored in memory cell $(i,j)$, $i\in\{1,...,m\}$, $j\in\{1,...,n\}$. During the writing process, each bit is written into the memory cell by changing the resistance value of the memristor, i.e., for a logical “0” bit, the cell is changed to a high resistance of $R_{0}$, referred to as the High-Resistance State, and for a logical “1” bit, it is changed to a low resistance of $R_{1}$, referred to as the Low-Resistance State.

During the reading process, the data bit can be detected by measuring the resistance value of the corresponding cell. If it is in the High-Resistance State, the bit is identified as a ‘0’; if it is in the Low-Resistance State, the bit is identified as a ‘1’. However, due to the existence of the sneak-path interference, as well as a mix of other noises which can be modeled as additive Gaussian noise, the memory reading becomes unreliable [3, 4, 5]. When cell $(i,j)$ in a memory array is read, a certain voltage is applied to the target cell to measure its resistance. A sneak path is defined as a closed path that originates from and returns to location $(i,j)$ while traversing logical-1 cells through alternating vertical and horizontal steps. An example is shown in Fig. 1, where $(3,2)$ is the target cell for reading and the green line shows the desired path for measuring the resistance. However, $(3,2)\rightarrow(3,4)\rightarrow(1,4)\rightarrow(1,2)\rightarrow(3,2)$ forms a sneak path (red line) in parallel with the selected cell $(3,2)$. Since sneak paths always degrade the measured resistance value, they actually benefit the data detection when a cell in the Low-Resistance State (logic 1) is read. The detrimental effect only occurs when a High-Resistance State cell (logic 0) is read, making it more vulnerable to noise. In this paper, we only consider sneak paths that occur when a High-Resistance State cell is read.

Figure 1: (a) Example of a $4\times 4$ memory array. (b) Corresponding logical values of the memory array. $(3,2)$ is the target cell for reading. Voltage is applied to memristor cell $(3,2)$ and the green line is the desired path for resistance measuring. However, $(3,2)\rightarrow(3,4)\rightarrow(1,4)\rightarrow(1,2)\rightarrow(3,2)$ forms a sneak path (red line) in parallel with target cell $(3,2)$ that degrades the measured resistance value. Arrows show current flow directions. Note that the sneak path produces a reverse current across cell $(1,4)$.

The most popular method to mitigate the sneak-path interference is to introduce a cell selector in series with each array cell. A cell selector is an electrical device that allows current to flow in only one direction across the cell. Since sneak paths inherently produce reverse currents in at least one of the cells along the parallel path, cell selectors can completely eliminate sneak paths from the entire array. In this paper we follow the previous work [4, 5] and consider the 1D1R structure, even though our proposed approaches can be extended to other structures as well. Although cell selectors can effectively eliminate sneak paths, they are also prone to failures due to imperfections in the production or maintenance of the memory array, leading to the reoccurrence of sneak paths. Following previous work [4, 5], we assume that the selectors in a memory array fail i.i.d. with probability $p_{f}$. However, our work is based on the fact that once a selector fails it never recovers, and hence the locations of the failed selectors are fixed during the reading of the whole array. This differs from the assumption made in the previous work [4, 5] that the locations of the failed cells vary randomly in the crossbar array during the reading of each cell. Although these two assumptions may not affect the detection performance significantly for uncoded ReRAM systems, they are fundamentally different for coded systems. The previous assumption [4, 5] actually leads to a near-i.i.d. model for the sneak-path interference of each cell. In our model, the sneak-path events for cells at different locations in the same array are highly correlated, which is much more difficult to tackle with a channel coding scheme.

We define a sneak-path event indicator $e_{i,j}$ for cell $(i,j)$ as a Boolean variable with value $e_{i,j}=1$ if cell $(i,j)$ is affected by sneak paths, and $e_{i,j}=0$ otherwise. According to the previous work [5], a sneak-path event occurs during the reading of cell $(i,j)$ and leads to $e_{i,j}=1$ if and only if the following three conditions are satisfied:

[Sneak-Path Condition:]

1) The cell $(i,j)$ is in the High-Resistance State, i.e., $x_{i,j}=0$.

2) There exists at least one combination of $k\in\{1,...,m\}$, $\ell\in\{1,...,n\}$ that induces a sneak path, defined by

$$x_{i,\ell}=x_{k,\ell}=x_{k,j}=1. \qquad (1)$$

3) The selector at the diagonal cell $(k,\ell)$ fails. Due to the circuit structure of the crossbar array, cells $(i,\ell)$ and $(k,j)$ conduct current in the forward direction and are not affected by their selectors. A sneak path is formed only when the selector of cell $(k,\ell)$ is faulty.

The above Sneak-Path Condition limits the sneak path to a length of 3, i.e., traversing three cells [5]. More sophisticated cases of sneak paths were considered in [4]. The principle of our work can be extended to those cases.
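As an illustration, the three conditions above can be checked directly for a single target cell. The following sketch is our own (the paper gives no code); the binary data array `X` and the selector-failure map `failed` are hypothetical inputs, and indices are 0-based.

```python
import numpy as np

def sneak_path_indicator(X, failed, i, j):
    """Return e_{i,j} under the length-3 Sneak-Path Condition:
    1) x_{i,j} = 0, and for some (k, l):
    2) x_{i,l} = x_{k,l} = x_{k,j} = 1, and
    3) the selector at the diagonal cell (k, l) has failed."""
    if X[i, j] != 0:                     # condition 1): target must store a '0'
        return 0
    m, n = X.shape
    for k in range(m):
        for l in range(n):
            if X[i, l] == 1 and X[k, l] == 1 and X[k, j] == 1 and failed[k, l]:
                return 1                 # conditions 2) and 3) both hold
    return 0
```

For the Fig. 1 example, target cell $(3,2)$ with logic-1 cells $(3,4)$, $(1,4)$, $(1,2)$ and a failed selector at the diagonal cell $(1,4)$ yields $e_{3,2}=1$; note that $k=i$ or $\ell=j$ is automatically excluded because it would contradict $x_{i,j}=0$.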

Building on the above Sneak-Path Condition, we define a

[ReRAM Channel:]

Let $\boldsymbol{X}=[x_{i,j}]_{m\times n}$ be the stored data array and $\boldsymbol{Y}=[y_{i,j}]_{m\times n}$ be the corresponding readback signal of the crossbar array. Let $\mathcal{R}$ be the set of real numbers. An ReRAM channel is a channel with input $\boldsymbol{X}\in\{0,1\}^{m\times n}$ and output $\boldsymbol{Y}\in\mathcal{R}^{m\times n}$:

$$y_{i,j}=\begin{cases}\left(\frac{1}{R_{0}}+\frac{e_{i,j}}{R_{s}}\right)^{-1}+\eta_{i,j} & \text{if } x_{i,j}=0\\ R_{1}+\eta_{i,j} & \text{if } x_{i,j}=1\end{cases} \qquad (2)$$

where $R_{s}$ is the parasitic resistance value introduced by sneak paths. Here $\eta_{i,j}\sim\mathcal{N}(0,\sigma^{2})$, $i=1,...,m$, $j=1,...,n$, is an additive white Gaussian noise (AWGN) with mean 0 and variance $\sigma^{2}$ [4]. It models a mix of various noises of the ReRAM system.
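A minimal sketch of the per-cell readback model in (2); the default resistance values and noise level are the placeholder parameters used later in the numerical section, not fixed by the model itself.

```python
import numpy as np

def readback(x, e, rng, R0=1000.0, R1=100.0, Rs=250.0, sigma=30.0):
    """Readback y_{i,j} of Eq. (2): a '0' cell hit by a sneak path (e = 1)
    reads as R0 in parallel with the parasitic resistance Rs, plus AWGN."""
    if x == 0:
        r = 1.0 / (1.0 / R0 + e / Rs)
    else:
        r = R1
    return r + rng.normal(0.0, sigma)
```

With $R_{0}=1000\ \Omega$ and $R_{s}=250\ \Omega$, a sneak-path-affected ‘0’ cell reads around $200\ \Omega$, much closer to the $R_{1}=100\ \Omega$ level of a ‘1’ cell, which is exactly why sneak paths cause detection errors.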

Figure 2: PMFs of the sneak-path rate within a single memory array, simulated for array sizes $m\times n=64\times 64, 128\times 128$ and input distributions with $q=0.25, 0.5$. The mean values of the sneak-path rates for the four cases, indicated by the dashed lines, are $\epsilon_{q}=0.06, 0.3888, 0.2216$, and $0.8626$, respectively.

The fundamental problem of the ReRAM channel is to recover the stored data array $\boldsymbol{X}$ from the readback signal $\boldsymbol{Y}$ in the presence of the sneak-path interference $[e_{i,j}]_{m\times n}$ and the Gaussian noise $[\eta_{i,j}]_{m\times n}$. The ReRAM channel $\{0,1\}^{m\times n}\rightarrow\mathcal{R}^{m\times n}$ with input and output size $m\times n$ actually consists of $mn$ symbol-wise channels $\{0,1\}\rightarrow\mathcal{R}$. Since the sneak-path indicator $e_{i,j}$ of each target cell is related to the entire data array, these $mn$ symbol-wise channels are data-dependent and correlated.

The ReRAM channel is also asymmetrical: its channel status (the sneak-path occurrence probability) is highly related to the channel input distribution. We define the input distribution as i.i.d. Bernoulli$(q)$, i.e., $\textrm{Pr}(x_{i,j}=1)=q$ and $\textrm{Pr}(x_{i,j}=0)=1-q$ for $i=1,...,m$, $j=1,...,n$.

For a fixed input distribution, we investigate the fraction of sneak-path affected cells in the crossbar array and define the sneak-path rate over the array as $\frac{\sum_{i=1}^{m}\sum_{j=1}^{n}e_{i,j}}{mn(1-q)}$. Its mean value is derived as a function of $q$:

$$\epsilon_{q} \overset{\Delta}{=} E\left[\frac{\sum_{i=1}^{m}\sum_{j=1}^{n}e_{i,j}}{mn(1-q)}\right] = \textrm{Pr}(e_{i,j}=1|x_{i,j}=0) = 1-\sum_{u=0}^{m-1}\sum_{v=0}^{n-1}\binom{m-1}{u}\binom{n-1}{v}q^{u+v}(1-q)^{m-1-u+n-1-v}(1-p_{f}q)^{uv}. \qquad (4)$$

When $m$ or $n$ is large, (4) is difficult to calculate. However, when $p_{f}q$ is small, using the Taylor expansion $(1-p_{f}q)^{uv}\approx 1-uvp_{f}q+\alpha\binom{uv}{2}p_{f}^{2}q^{2}$, we can approximately calculate (4):

$$\begin{aligned}\epsilon_{q} &\approx 1-\sum_{u=0}^{m-1}\sum_{v=0}^{n-1}\binom{m-1}{u}\binom{n-1}{v}q^{u+v}(1-q)^{m-1-u+n-1-v}\left(1-uvp_{f}q+\alpha\binom{uv}{2}p_{f}^{2}q^{2}\right)\\ &= (m-1)(n-1)p_{f}q^{3}-\alpha\left(2q\binom{m-1}{2}\binom{n-1}{2}+(n-1)\binom{m-1}{2}+(m-1)\binom{n-1}{2}\right)p_{f}^{2}q^{5}\end{aligned} \qquad (6)$$

where $\alpha$ is a balance factor for the last term of the Taylor expansion. Here a good setting for $\alpha$ is $0.8$.
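Both the exact double sum (4) and the closed-form approximation (6) are straightforward to evaluate numerically; the sketch below is our own, using `math.comb` for the binomial coefficients.

```python
import math

def eps_exact(m, n, q, pf):
    """Mean sneak-path rate eps_q via the exact double sum in Eq. (4)."""
    s = 0.0
    for u in range(m):
        pu = math.comb(m - 1, u) * q**u * (1 - q)**(m - 1 - u)
        for v in range(n):
            pv = math.comb(n - 1, v) * q**v * (1 - q)**(n - 1 - v)
            s += pu * pv * (1 - pf * q)**(u * v)
    return 1.0 - s

def eps_approx(m, n, q, pf, alpha=0.8):
    """Taylor approximation of Eq. (6) with balance factor alpha = 0.8."""
    c2 = lambda x: math.comb(x, 2)
    first = (m - 1) * (n - 1) * pf * q**3
    second = (2 * q * c2(m - 1) * c2(n - 1)
              + (n - 1) * c2(m - 1) + (m - 1) * c2(n - 1)) * pf**2 * q**5
    return first - alpha * second
```

For $m=n=64$, $q=0.25$, $p_{f}=10^{-3}$, both expressions evaluate to roughly $0.06$, matching the dashed mean line in Fig. 2.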

We define the probability mass function (PMF) of the sneak-path rate as

$$F(\epsilon)=\textrm{Pr}\left(\frac{\sum_{i=1}^{m}\sum_{j=1}^{n}e_{i,j}}{mn(1-q)}=\epsilon\right). \qquad (7)$$

For memory array sizes of $m\times n=64\times 64, 128\times 128$ and input distributions with $q=0.25, 0.5$, we simulate $F(\epsilon)$, and the results are illustrated in Fig. 2. In particular, we generate a large number of input data arrays, compute the sneak-path rate of each array, and obtain the PMF statistically. In the simulations as well as the numerical results of this paper, we assume the selectors fail i.i.d. with probability $p_{f}=10^{-3}$. Fig. 2 shows that a larger value of $q$, or a larger array size, leads to higher sneak-path rates, i.e., worse channels. The values of the sneak-path rate are quite diverse for different input data patterns, since the PMF spreads over a large range of the abscissa, which indicates that the channel varies significantly with the input data pattern. This is because the occurrence of sneak-path events depends on the input data pattern. It creates a big challenge for designing the coding scheme for ReRAM channels, since a code designed directly for the average sneak-path rate $\epsilon_{q}=\sum_{\epsilon}\epsilon F(\epsilon)$ will be inadequate.
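The PMF $F(\epsilon)$ can be reproduced by Monte Carlo simulation. A sketch of this (ours, not the paper's code): the sneak-path indicators of a whole array follow from the matrix product $\mathrm{count} = X B^{\mathsf{T}} X$, where $B$ marks logic-1 cells with failed selectors, since $\mathrm{count}_{i,j}=\sum_{k,\ell}x_{i,\ell}\,(x_{k,\ell}f_{k,\ell})\,x_{k,j}$ counts the length-3 sneak paths of cell $(i,j)$.

```python
import numpy as np

def sneak_path_map(X, failed):
    """Vectorized sneak-path indicators e_{i,j} for the whole array."""
    B = (X * failed).astype(int)        # logic-1 cells whose selectors failed
    count = X @ B.T @ X                 # count[i,j] = #length-3 sneak paths
    return ((X == 0) & (count > 0)).astype(int)

def sneak_rate_samples(m, n, q, pf, num, rng):
    """Monte Carlo draws of the sneak-path rate over an m x n array;
    F(eps) is then obtained as a histogram of these draws."""
    rates = []
    for _ in range(num):
        X = (rng.random((m, n)) < q).astype(int)
        failed = rng.random((m, n)) < pf
        rates.append(sneak_path_map(X, failed).sum() / (m * n * (1 - q)))
    return np.array(rates)
```

The matrix-product form avoids the per-cell quadruple loop, which is what makes simulating large arrays such as $128\times 128$ practical.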

Figure 3: Across-array coding strategy for ReRAM.

III Across-Array Coding Strategy and Channel Capacity Bounds

To mitigate the variability of the ReRAM channel, we propose an across-array coding strategy that assigns a codeword to multiple crossbar arrays. Since the coded bits in different arrays experience independent channels, the sneak-path rate within one codeword will be close to its mean value, and hence the channel is closer to an i.i.d. channel. Based on the across-array coding strategy, we further investigate the ReRAM channel capacity bounds. As the sneak path depends on the data message, it is difficult to derive the exact capacity of the ReRAM channel. By treating the sneak-path interference as i.i.d. noise during decoding, the ReRAM channel over multiple memory arrays resembles a block-varying channel that we will define in Section III-B, whose status does not depend on the input data. We then derive the capacity bounds of the block-varying channel, which can be regarded as an approximation of the ReRAM channel capacity.

III-A Across-Array Coding Strategy

The proposed across-array coding strategy is illustrated in Fig. 3. Consider the processing of the data vector $\boldsymbol{b}=(b_{1},b_{2},...,b_{NR})$, where $N$ is the code length and $R$ is the code rate. Here, $\boldsymbol{b}$ is encoded into codeword $\boldsymbol{x}=(x_{1},x_{2},...,x_{N})$, which is assigned to $T$ memory arrays, where $N=sT$ for some integer $s$. Thus, we split $\boldsymbol{x}$ into $T$ equal-length vectors, each of which is assigned to an independent memory array. Without loss of generality, we assign $\boldsymbol{x}^{t}=(x_{1}^{t},x_{2}^{t},...,x_{N/T}^{t})$ with $x_{i}^{t}=x_{(t-1)N/T+i}$, $i=1,...,N/T$, to the $t$-th memory array for $t=1,...,T$. Since each memory array is of size $m\times n$, $mnT/N$ codewords can be stored in these $T$ memory arrays, where $mnT/N$ is assumed to be an integer. As the code rate is $R$, the storage efficiency is $R$ bits/cell.
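The bit-to-array mapping $x_{i}^{t}=x_{(t-1)N/T+i}$ is simply a block split of the codeword; a minimal sketch (ours, with 0-based block indices):

```python
import numpy as np

def split_across_arrays(x, T):
    """Split a length-N codeword into T equal blocks; block t (0-based)
    holds x_i^{t+1} = x_{t*N/T + i} and is written to the t-th array."""
    N = len(x)
    assert N % T == 0, "code length must be divisible by the number of arrays"
    return x.reshape(T, N // T)

def gather_readback(y_blocks):
    """Reassemble per-array readback blocks y^1,...,y^T for joint decoding."""
    return np.concatenate(y_blocks)
```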

Each codeword is decoded independently based on its readback signal. The codeword $\boldsymbol{x}=(\boldsymbol{x}^{1},...,\boldsymbol{x}^{T})$ is decoded based on its readback signal $\boldsymbol{y}=(\boldsymbol{y}^{1},...,\boldsymbol{y}^{T})$, where $\boldsymbol{y}^{t}$ is the readback signal of $\boldsymbol{x}^{t}$ from the $t$-th memory array, $t=1,...,T$, to obtain the estimated data $\hat{\boldsymbol{b}}$. Here we employ a suboptimal decoding scheme known as TIN decoding [6]. The TIN decoder ignores the correlation between the channel input and the sneak paths and treats the sneak-path interference as i.i.d. noise.
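Under TIN decoding, each cell is treated as one use of an i.i.d. channel whose ‘0’ output is a two-Gaussian mixture, so the decoder input is a per-cell log-likelihood ratio. A sketch of this LLR computation (ours; the default resistance values are the placeholder parameters of the numerical section):

```python
import numpy as np

def tin_llr(y, eps, sigma, q, R0=1000.0, R1=100.0, Rs=250.0):
    """LLR log(Pr(x=0)P(y|x=0) / (Pr(x=1)P(y|x=1))) where, under TIN,
    P(y|x=0) = eps*phi(y,R0') + (1-eps)*phi(y,R0) and P(y|x=1) = phi(y,R1)."""
    R0p = 1.0 / (1.0 / R0 + 1.0 / Rs)   # degraded High-Resistance level
    phi = lambda mu: np.exp(-(y - mu)**2 / (2.0 * sigma**2))  # unnormalized
    like0 = eps * phi(R0p) + (1.0 - eps) * phi(R0)
    like1 = phi(R1)
    return np.log((1.0 - q) * like0) - np.log(q * like1)
```

Including the mixture weight `eps` matters: a readback near the degraded level $R_{0}^{\prime}$ is then still credited to a sneak-path-affected ‘0’ rather than forced toward a ‘1’ decision.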

To illustrate the advantage of the across-array coding strategy more explicitly, in Fig. 4 we evaluate the PMF of the sneak-path rate over one codeword that is stored in $T$ memory arrays. In the figure, we employ array sizes and code lengths of $N=m\times n=64\times 64$ and $128\times 128$. Since the coded bits are distributed over $T$ memory arrays, the sneak-path rate and its PMF are rewritten as $\sum_{t=1}^{T}\sum_{i=1}^{N/T}e_{i}^{t}/(mn(1-q))$ and $F^{T}_{q}(\epsilon)=\textrm{Pr}(\sum_{t=1}^{T}\sum_{i=1}^{N/T}e_{i}^{t}/(mn(1-q))=\epsilon)$, where $e_{i}^{t}$ is the sneak-path event indicator during the reading of the $i$-th bit of the $t$-th array. For each case of $m\times n=64\times 64$ and $128\times 128$ and the given input distribution with $q=0.25$ and $0.5$, as $T$ increases, the spread of the PMF of the sneak-path rate shrinks and concentrates around the mean value $\epsilon_{q}$, and the channel becomes more stable. The reason is that, since the codeword is assigned to $T$ independent memory arrays, the sneak-path rate is averaged over the $T$ arrays. By the law of large numbers, as $T\rightarrow\infty$, the sneak-path rate converges to the mean value with probability 1, and therefore we can design a code based on this mean value to guarantee error-free decoding. Note that the across-array coding strategy does not change the channel correlation. It reduces the correlation of the sneak-path interference within one codeword, since the readback signals of coded bits from different memory arrays are independent of each other. As all the coded bits are encoded from the same message data, this resembles a “code diversity” strategy for block fading channels [7].

Figure 4: PMFs of sneak-path rates over one codeword distributed over $T$ memory arrays with $T=1,2,4,8,16$. Code length and memory array size are $m\times n=64\times 64$ (left) and $128\times 128$ (right).

The drawback of a joint coding across TT arrays is a potential increase of the read/write latency. Note that such additional latency is unavoidable for many sneak-path mitigating approaches in the literature. For example, the multistage reading technique presented in [8] requires three readings and three writings to get a better estimation of the sneak current; [4] investigates the multiple-read detector and shows that it can achieve near-optimum performance with 10 reads. We remark that for our proposed across-array coding strategy, the additional read/write latency can be minimized if a parallel reading/writing circuit [9] is adopted across different crossbar arrays. Moreover, the additional latency incurred will become negligible if the two-dimensional crossbar arrays are stacked to form a 3D structure, which naturally enables the parallel reading/writing across different arrays [10].

III-B Channel Equivalence and Capacity Bounds

We first define a block-varying $(\boldsymbol{\epsilon},\sigma)$-channel and show how the ReRAM channel capacity is related to that of the block-varying $(\boldsymbol{\epsilon},\sigma)$-channel.

To begin, we first define an $(\epsilon,\sigma)$-channel. As illustrated in Fig. 5, an $(\epsilon,\sigma)$-channel is a concatenation of an i.i.d. asymmetrical discrete channel and an i.i.d. additive Gaussian channel; therefore, it is also an i.i.d. channel without channel correlation. The asymmetrical discrete channel has binary input $X\in\{0,1\}$ and ternary output from $\{R_{0},R_{0}^{\prime},R_{1}\}$ with $R_{0}^{\prime}=\left(\frac{1}{R_{0}}+\frac{1}{R_{s}}\right)^{-1}$, and its transition probabilities are $\textrm{Pr}(R_{0}|0)=1-\epsilon$, $\textrm{Pr}(R_{0}^{\prime}|0)=\epsilon$, and $\textrm{Pr}(R_{1}|1)=1$. The additive Gaussian channel has noise distribution $\eta\sim\mathcal{N}(0,\sigma^{2})$, and its output $Y\in\mathcal{R}$ serves as the output of the $(\epsilon,\sigma)$-channel.

Figure 5: $(\epsilon,\sigma)$-channel model with $R_{0}^{\prime}=\left(\frac{1}{R_{0}}+\frac{1}{R_{s}}\right)^{-1}$.

For a given input distribution $\textrm{Pr}(X=0)=1-q$, $\textrm{Pr}(X=1)=q$, the $(\epsilon,\sigma)$-channel capacity can be derived as

$$\begin{aligned}C_{q}(\epsilon,\sigma) &= I(X;Y)\\ &= H(Y)-H(Y|X)\\ &= H(Y)-qH(Y|X=1)-(1-q)H(Y|X=0)\\ &= -\int_{-\infty}^{+\infty}p_{Y}(y)\log_{2}p_{Y}(y)\,dy-q\log_{2}\sqrt{2\pi e\sigma^{2}}+(1-q)\int_{-\infty}^{+\infty}p_{Y|X=0}(y)\log_{2}p_{Y|X=0}(y)\,dy\end{aligned}$$

where

$$\begin{aligned}p_{Y}(y) &= (1-q)\left(\epsilon\phi(y,R_{0}^{\prime})+(1-\epsilon)\phi(y,R_{0})\right)+q\phi(y,R_{1})\\ p_{Y|X=0}(y) &= \epsilon\phi(y,R_{0}^{\prime})+(1-\epsilon)\phi(y,R_{0})\\ \phi(y,m) &= \frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(y-m)^{2}}{2\sigma^{2}}}.\end{aligned}$$

The $(0,\sigma)$- and $(1,\sigma)$-channels are asymmetrical binary-input AWGN channels, which are two special cases of an $(\epsilon,\sigma)$-channel. Obviously, the $(\epsilon,\sigma)$-channel capacity decreases as $\epsilon$ increases, leading to $C_{q}(1,\sigma)<C_{q}(\epsilon,\sigma)<C_{q}(0,\sigma)$.
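$C_{q}(\epsilon,\sigma)$ has no closed form, but the integrals above are one-dimensional and easy to evaluate on a grid. A numerical sketch (ours), using the resistance values of the paper's numerical section as hypothetical defaults:

```python
import numpy as np

def capacity(q, eps, sigma, R0=1000.0, R1=100.0, Rs=250.0):
    """C_q(eps, sigma) via Riemann-sum integration of
    H(Y) - q*H(Y|X=1) - (1-q)*H(Y|X=0) on a uniform grid."""
    R0p = 1.0 / (1.0 / R0 + 1.0 / Rs)
    y = np.linspace(R1 - 10 * sigma, R0 + 10 * sigma, 40001)
    dy = y[1] - y[0]
    phi = lambda mu: np.exp(-(y - mu)**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
    p0 = eps * phi(R0p) + (1 - eps) * phi(R0)          # p_{Y|X=0}
    pY = (1 - q) * p0 + q * phi(R1)                    # p_Y
    h = lambda p: -np.sum(p * np.log2(np.maximum(p, 1e-300))) * dy
    hY1 = np.log2(np.sqrt(2 * np.pi * np.e * sigma**2))  # H(Y|X=1): Gaussian
    return h(pY) - q * hY1 - (1 - q) * h(p0)
```

Sweeping `eps` with this routine reproduces the monotonic ordering $C_{q}(1,\sigma)<C_{q}(\epsilon,\sigma)<C_{q}(0,\sigma)$ stated above.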

A $T$-block block-varying $(\boldsymbol{\epsilon},\sigma)$-channel with parameters $\boldsymbol{\epsilon}=(\epsilon^{1},\epsilon^{2},...,\epsilon^{T})$ varies from data block to data block, while within the $t$-th data block the channel is a symbol-wise i.i.d. $(\epsilon^{t},\sigma)$-channel, $t=1,2,...,T$. The block-varying $(\boldsymbol{\epsilon},\sigma)$-channel is a type of channel with block interference, as proposed in [11].

The ReRAM channel over $T$ memory arrays resembles the $T$-block block-varying $(\boldsymbol{\epsilon},\sigma)$-channel. In particular, the sneak-path rate of the ReRAM channel varies from memory array to memory array, resembling the block-varying property (the channel varies from block to block) of the block-varying $(\boldsymbol{\epsilon},\sigma)$-channel. The channel parameter $\epsilon^{t}=\textrm{Pr}(R_{0}^{\prime}|X=0)$ corresponds to the instantaneous sneak-path rate of the $t$-th memory array. The main difference is that in the ReRAM channel the sneak-path interference depends on the input data, and this data-dependency leads to channel correlation, while the parameter $\boldsymbol{\epsilon}$ of the block-varying $(\boldsymbol{\epsilon},\sigma)$-channel is data-independent and the channel within each block is i.i.d. However, if we adopt TIN decoding, where the decoder ignores this data-dependency and regards the sneak paths as i.i.d. noise, an ReRAM channel is equivalent to a block-varying $(\boldsymbol{\epsilon},\sigma)$-channel. The memory block length, denoted by $M$, of the block-varying $(\boldsymbol{\epsilon},\sigma)$-channel should be identical to the data array size of the ReRAM channel, i.e., $M=mn$. The channel parameters $\epsilon^{t}$, $t=1,2,...,T$, should be i.i.d. generated based on the PMF of the sneak-path rate of the ReRAM channel. Therefore, the maximum achievable coding rate over the ReRAM channel under TIN decoding can be approximated by the block-varying $(\boldsymbol{\epsilon},\sigma)$-channel capacity.

In preparation to give the capacity limit, we define an (ϵ,σ)(\epsilon,\sigma)-channel code:

Definition 1

An $(\epsilon,\sigma)$-channel code is a sequence of codes with rate $C_{q}(\epsilon,\sigma)$ and different code lengths $n$, which achieve asymptotically error-free decoding over the i.i.d. $(\epsilon,\sigma)$-channel as the code length approaches infinity.

The existence of the $(\epsilon,\sigma)$-channel code ensemble is guaranteed by the conventional channel coding theorem.

Consider a block-varying $(\boldsymbol{\epsilon},\sigma)$-channel with memory block size $M$. Parameter $\epsilon^{t}$ has PMF $F(\epsilon^{t})$, $t=1,2,...,T$, and mean value $\bar{\epsilon}=\sum_{\epsilon}\epsilon F(\epsilon)$, as defined in (4). Since the channel varies from block to block, we consider joint $T$-block encoding and decoding. The code length is therefore $MT$. Let $R$ be the encoding rate; we then have the following theorem:

Theorem 1

For a fixed input distribution of Bernoulli$(q)$, as $T\rightarrow\infty$, the maximum achievable code rate $R$ with joint $T$-block encoding and decoding is bounded by

$$C_{q}(\bar{\epsilon},\sigma)\leq R\leq\overline{C}_{q}(\sigma) \qquad (8)$$

where $\overline{C}_{q}(\sigma)=\sum_{\epsilon}F(\epsilon)C_{q}(\epsilon,\sigma)$.

Proof: We first show that for a fixed input distribution of Bernoulli$(q)$, $C_{q}(\bar{\epsilon},\sigma)$ is achievable. Let $\boldsymbol{x}=(\boldsymbol{x}^{1},...,\boldsymbol{x}^{T})$ be the joint $T$-block codeword, where the $t$-th block $\boldsymbol{x}^{t}=(x^{t}_{1},...,x^{t}_{M})$ experiences an $(\epsilon^{t},\sigma)$-channel, i.e., each symbol $x^{t}_{j}$, $j=1,...,M$, experiences an i.i.d. $(\epsilon^{t},\sigma)$-channel. We assume that the codeword is encoded such that the $i$-th bits of the different data blocks, i.e., $(x_{i}^{1},x_{i}^{2},...,x_{i}^{T})$, form a codeword of a length-$T$ $(\bar{\epsilon},\sigma)$-channel code. In this way, the original length-$MT$ codeword $\boldsymbol{x}$ can be considered as a vector consisting of $M$ length-$T$ $(\bar{\epsilon},\sigma)$-channel codewords. This is possible because we can always split the uncoded data vector into $M$ equal-length sub-vectors and encode each of them independently using an $(\bar{\epsilon},\sigma)$-channel code. As the encoding rate of each $(\bar{\epsilon},\sigma)$-channel code is $C_{q}(\bar{\epsilon},\sigma)$, the overall code rate is $R=C_{q}(\bar{\epsilon},\sigma)$.

During decoding, for each $i=1,...,M$, $(x_{i}^{1},x_{i}^{2},...,x_{i}^{T})$ is decoded based on its channel output $(y_{i}^{1},y_{i}^{2},...,y_{i}^{T})$, where $y_{i}^{t}$ is the channel observation of $x_{i}^{t}$. Since coded bit $x_{i}^{t}$ experiences an $(\epsilon^{t},\sigma)$-channel, where $\epsilon^{t}$, $t=1,2,...,T$, are i.i.d. generated based on the PMF $F(\epsilon)$, the overall codeword experiences an $(\bar{\epsilon},\sigma)$-channel with $\bar{\epsilon}=\sum_{\epsilon}\epsilon F(\epsilon)$. Since $(x_{i}^{1},x_{i}^{2},...,x_{i}^{T})$ is an $(\bar{\epsilon},\sigma)$-channel codeword, the decoding error probability approaches 0 as $T\rightarrow\infty$ according to Definition 1.

In [11], an upper bound for the block interference channel is derived. Given the channel parameters $\bm{\epsilon}$, the channel becomes memoryless, leading to
$$R\leq\frac{1}{MT}I(\bm{x};\bm{y})\leq\frac{1}{MT}I(\bm{x};\bm{y}|\bm{\epsilon})\leq I(x_{1}^{1};y_{1}^{1}|\epsilon^{1})=\sum_{\epsilon}F(\epsilon)C_{q}(\epsilon,\sigma)=\overline{C}_{q}(\sigma).\qquad\Box$$

It was also shown in [11] that if the channel state is finite, i.e., $\epsilon$ takes values in a finite set, the upper bound is tight as the block size $M\rightarrow\infty$.
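The codeword arrangement used in the achievability proof can be sketched in a few lines. The following is our own illustration (not the paper's code): $M$ independent length-$T$ codewords are arranged so that the $t$-th symbol of every codeword is written to memory array $t$, and the inverse gathers each codeword back from the $T$ readback blocks.

```python
import numpy as np

# Illustrative sketch of the across-array arrangement: row i of `codewords`
# is the i-th length-T codeword; block t collects the t-th bit of every
# codeword, so each codeword sees T independent (epsilon^t, sigma)-channels.
def to_arrays(codewords):
    cw = np.asarray(codewords)  # shape (M, T)
    return [cw[:, t] for t in range(cw.shape[1])]

def from_arrays(blocks):
    """Inverse: gather bit i of each block back into codeword i."""
    return np.stack(blocks, axis=1)

M, T = 4, 3
rng = np.random.default_rng(0)
cws = rng.integers(0, 2, size=(M, T))
blocks = to_arrays(cws)
assert np.array_equal(from_arrays(blocks), cws)
```

Splitting the data into $M$ sub-vectors and encoding each independently, as in the proof, reduces to exactly this transpose-style mapping.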

Figure 6: Upper bound $\sum_{\epsilon}F_{q}(\epsilon)C_{q}(\epsilon,\sigma)$ and lower bound $C(\epsilon_{q},\sigma)$ of the capacity as $T\rightarrow\infty$, with $R_{1}=100\ \Omega$, $R_{0}=1000\ \Omega$, $R_{s}=250\ \Omega$, and $p_{f}=10^{-3}$.

Based on our channel equivalence, $\max_{q}C_{q}(\epsilon_{q},\sigma)$ is an approximate lower bound of the ReRAM channel capacity with TIN decoding, and $\max_{q}\sum_{\epsilon}F_{q}(\epsilon)C_{q}(\epsilon,\sigma)$ is an upper bound. Since we have the explicit formula (4) for $\epsilon_{q}$, the lower bound is much easier to calculate than the upper bound, which requires $F_{q}(\epsilon)$. Fortunately, we can show that when the array size $N$ is large, the two bounds are numerically very close to each other, and hence the lower bound $\max_{q}C(\epsilon_{q},\sigma)$ is a good approximation of the ReRAM channel capacity. Fig. 6 illustrates the capacity upper bound $\sum_{\epsilon}F_{q}(\epsilon)C_{q}(\epsilon,\sigma)$ and lower bound $C(\epsilon_{q},\sigma)$ as functions of $q$, for memory array sizes $m\times n=32\times 32,64\times 64,128\times 128,256\times 256$ and noise levels $\sigma=30,50,100$. The resistance parameters are fixed at $R_{1}=100\ \Omega$, $R_{0}=1000\ \Omega$, and $R_{s}=250\ \Omega$. Observe that when the memory array size $N$ is large, the two bounds nearly coincide.

Fig. 6 also indicates that the ReRAM channel capacity bounds decrease as the array size increases, due to the increase of the sneak-path rate: the larger the data array, the lower the average storage efficiency per cell. For a very low noise level of $\sigma=30$, the capacity bounds are maximized at about $q=0.5$, while for $\sigma=50,100$ they are maximized at some $q<0.5$. The optimal $q$ that maximizes the capacity bounds decreases as the noise level increases. This is because noise amplifies the detrimental effect of sneak paths, while reducing $q$ reduces the sneak-path rate.
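The per-cell information rate $C_{q}(\epsilon,\sigma)$ with a known sneak-path rate can be checked numerically by Monte Carlo. The sketch below is our own illustration under simplified assumptions: a Bernoulli$(q)$ input, output $\mathcal{N}(R_{1},\sigma^{2})$ for a 1-cell and the two-component Gaussian mixture $\epsilon\,\mathcal{N}(R_{0}^{\prime},\sigma^{2})+(1-\epsilon)\,\mathcal{N}(R_{0},\sigma^{2})$ for a 0-cell; the value $R_{0}^{\prime}=250\ \Omega$ is an assumed sneak-path-affected level, not taken from the paper.

```python
import numpy as np

def cap_q(q, eps, sigma, R1=100.0, R0=1000.0, R0p=250.0, n=200_000, seed=0):
    """Monte Carlo estimate of I(X;Y) in bits/cell for the (eps, sigma)-channel
    with Bernoulli(q) input.  R0p is an illustrative sneak-path level."""
    rng = np.random.default_rng(seed)
    phi = lambda y, mu: np.exp(-(y - mu) ** 2 / (2 * sigma ** 2)) \
        / np.sqrt(2 * np.pi * sigma ** 2)
    x = rng.random(n) < q                    # True -> logic 1 (level R1)
    sneak = rng.random(n) < eps              # sneak path hits some 0-cells
    mu = np.where(x, R1, np.where(sneak, R0p, R0))
    y = mu + sigma * rng.normal(size=n)
    p1 = phi(y, R1)                          # likelihood of a 1-cell
    p0 = eps * phi(y, R0p) + (1 - eps) * phi(y, R0)  # mixture for a 0-cell
    pxy = np.where(x, p1, p0)
    py = q * p1 + (1 - q) * p0
    return float(np.mean(np.log2(pxy / py)))  # I(X;Y) = E[log p(y|x)/p(y)]

print(cap_q(q=0.3, eps=0.1, sigma=30.0))  # close to h(0.3) ~ 0.88 at low noise
```

At low noise the levels are well separated, so the estimate approaches the input entropy $h(q)$; as $\sigma$ grows it drops, matching the qualitative behavior in Fig. 6.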

IV Design of the Coding Scheme

In this section, we present the design of a coding scheme for the ReRAM channel. We also propose a real-time maximum-likelihood channel estimator, based on which message-passing decoding is performed. We utilize state-of-the-art sparse-graph codes and message-passing decoding theory to design the coding scheme, which is essentially an $(\epsilon_{q},\sigma)$-channel code design. A major difference between the $(\epsilon_{q},\sigma)$-channel code and a classical ECC is that the former operates over an asymmetric channel, so the coded data should follow the desired distribution to approach the channel capacity (Fig. 6). We propose a coding scheme that serially concatenates a classical ECC with a data shaper that produces the desired data distribution. Bit error rate (BER) simulations and performance comparisons for the proposed coding scheme are also presented in this section.

Figure 7: Proposed coding scheme for the ReRAM channel.

IV-A Coding and Decoding Model

A system model for the proposed coding scheme is illustrated in Fig. 7. The encoder consists of an ECC encoder and a data shaper. The ECC encoder encodes the data vector $\bm{b}=(b_{1},b_{2},...,b_{NR})$ into codeword $\bm{c}=(c_{1},c_{2},...,c_{N})$ whose entries are uniformly distributed on $\{0,1\}$. The data shaper reshapes the data distribution into Bernoulli$(q)$; its output is $\bm{x}=(x_{1},x_{2},...,x_{N})$. Since the data shaper has rate 1, the overall code rate is still $R$.

The decoder consists of a real-time channel estimator, an elementary signal estimator (ESE), a de-shaper, and an ECC decoder. Since the decoding is actually block-varying $(\bm{\epsilon},\sigma)$-channel decoding, the channel estimator first estimates the channel parameters $\bm{\epsilon}=(\epsilon^{1},...,\epsilon^{T})$ of the $T$ memory arrays based on the readback signal $\bm{y}=(\bm{y}^{1},...,\bm{y}^{T})$. Based on $\hat{\bm{\epsilon}}$ and $\bm{y}$, the ESE calculates a soft estimate $\{L(x_{i}^{t}|y_{i}^{t},\hat{\epsilon}^{t})\}$, i.e., the log-likelihood ratio (LLR), of each coded bit $x_{i}^{t}$, which is used as the decoder input. The de-shaper and the ECC decoder both use soft-in soft-out (SISO) processing and operate iteratively to improve the decoding reliability; the corresponding decoding is standard message-passing decoding. Specifically, the de-shaper calculates a soft LLR $\{L^{e}(c_{i})\}$ for each ECC coded bit, based on which the ECC decoder refines the estimate and feeds back an updated LLR $\{L^{a}(c_{i})\}$ to the de-shaper for the next decoding iteration. After a fixed maximum number of iterations, a hard decision is performed at the ECC decoder to produce the data estimate $\hat{\bm{b}}$.

Figure 8: Data shaper and its factor graph illustration.

IV-B Data Shaper

The data shaper consists of a length-$L$ repeater, a length-$NL$ bit interleaver $\pi$, and an $L$-to-1 mapper (Fig. 8). The repeater duplicates each ECC coded bit $L$ times. The bit interleaver permutes the repeater output to relocate the bits. The $L$-to-1 mapper maps every $L$ bits to one bit, i.e., $\mathcal{M}:\{0,1\}^{L}\rightarrow\{0,1\}$. Therefore, the data shaper's overall rate is 1. Since there are $2^{L}$ patterns for the mapping inputs, by mapping $i$ of them to 1 and the remaining $2^{L}-i$ to 0, we obtain an output with distribution Bernoulli$(\frac{i}{2^{L}})$. By choosing $i=1,2,...,2^{L}-1$, we achieve the data distributions Bernoulli$(q)$ with $q=\frac{1}{2^{L}},\frac{2}{2^{L}},...,\frac{2^{L}-1}{2^{L}}$.
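The forward pass of the shaper can be sketched as follows. This is our own minimal illustration (names and the particular mapping are ours): each coded bit is repeated $L$ times, the $j$-th copies are routed through sub-interleaver $\pi_{j}$ (the structured interleaving described below), and each mapping node turns its $L$-tuple into one output bit.

```python
import numpy as np

# Hypothetical sketch of the rate-1 data shaper: length-L repeater, L random
# sub-interleavers pi_j, and an L-to-1 mapper sending a chosen set of input
# patterns to 1 and the rest to 0.
def shape(c, perms, ones_patterns):
    """c: length-N binary ECC codeword.
    perms: list of L permutations of range(N) (the sub-interleavers pi_j);
    the j-th input of mapping node i is the j-th repetition of bit perms[j][i].
    ones_patterns: set of L-tuples mapped to 1."""
    N, L = len(c), len(perms)
    x = np.empty(N, dtype=int)
    for i in range(N):
        pattern = tuple(int(c[perms[j][i]]) for j in range(L))
        x[i] = 1 if pattern in ones_patterns else 0
    return x

rng = np.random.default_rng(1)
N, L = 1 << 14, 3
c = rng.integers(0, 2, size=N)
perms = [rng.permutation(N) for _ in range(L)]
# Map patterns 110 and 111 to 1 -> Bernoulli(1/4) output, as in Fig. 9.
x = shape(c, perms, {(1, 1, 0), (1, 1, 1)})
print(x.mean())  # empirically close to q = 1/4
```

Because each `perms[j]` is a permutation, every coded bit occupies the $j$-th input of exactly one mapping node, which is precisely the structured-interleaving property argued for below.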

Figure 9: 3-to-1 mapping for output data distribution Pr$(\tilde{x}=0)=\frac{3}{4}$, Pr$(\tilde{x}=1)=\frac{1}{4}$.

The interleaver inside the data shaper is crucial in our scheme. Rather than adopting random interleaving, we propose a structured interleaving scheme, as shown in the data shaper's factor graph in Fig. 8. The interleaver $\pi$ consists of $L$ sub-interleavers $\pi_{i},i=1,...,L$, each of which can be random. The data-shaping process can be described using a factor graph. The $i$-th variable node is associated with ECC coded bit $c_{i}$ and has $L$ edges to the interleavers, corresponding to the $L$ repetitions of that bit. Each mapping node has $L$ edges from the interleavers, corresponding to its $L$ mapping inputs; the $i$-th mapping node is associated with the mapping output $x_{i}$. With our structured interleaver, the $i$-th repetitions of the ECC coded bits enter sub-interleaver $\pi_{i}$, whose outputs are used as the $i$-th inputs of the mapping nodes. In this way, each ECC coded bit has exactly one repetition occupying the $i$-th input of some mapping node, for every $i=1,...,L$.

The advantage of this structured interleaver can be explained with an example. Consider a data shaper with an $(L=3)$-repeater and a 3-to-1 mapper $\mathcal{M}(\tilde{c}_{1},\tilde{c}_{2},\tilde{c}_{3})=\tilde{x}$ (Fig. 9). There are eight patterns for the three binary inputs $\tilde{c}_{1},\tilde{c}_{2},\tilde{c}_{3}$, of which only two, 110 and 111, are mapped to 1; the other six are mapped to 0. If $\tilde{c}_{1},\tilde{c}_{2},\tilde{c}_{3}$ are i.i.d. with Pr$(\tilde{c}_{i}=0)=$ Pr$(\tilde{c}_{i}=1)=\frac{1}{2},i=1,2,3$, the mapping realizes the output distribution Pr$(\tilde{x}=0)=\frac{3}{4}$, Pr$(\tilde{x}=1)=\frac{1}{4}$. Next we address the de-mapping. The mapping output $\tilde{x}$ contains a different quantity of information about each of the three input bits. Writing the mapping rule as $\tilde{x}=\mathcal{M}(\tilde{c}_{1},\tilde{c}_{2},\tilde{c}_{3})=\tilde{c}_{1}\cdot\tilde{c}_{2}$, where $\cdot$ denotes multiplication, the mutual information between $\tilde{x}$ and each input bit is $I(\tilde{x};\tilde{c}_{1})=I(\tilde{x};\tilde{c}_{2})=\frac{3}{4}\log_{2}\frac{4}{3}$ and $I(\tilde{x};\tilde{c}_{3})=0$. Therefore, during de-mapping we can obtain information about the first and second bits $\tilde{c}_{1},\tilde{c}_{2}$, but no information about the third bit $\tilde{c}_{3}$, since $\tilde{x}$ is independent of $\tilde{c}_{3}$. In other words, one-third of the bits are erased after de-mapping. If random interleaving were employed, then with probability $\frac{1}{27}$ all three repetitions of an ECC coded bit would be assigned to the third input of a mapping node and erased; that is, with probability $\frac{1}{27}$ an ECC coded bit is erased after de-shaping, which leads to poor ECC decoding performance. Our structured interleaving guarantees that every ECC coded bit obtains a positive and statistically equal quantity of information from the de-shaper, which benefits the ECC decoding.
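The mutual-information claim in the example can be verified numerically. The short check below (our own illustration) builds the joint PMF of $(\tilde{x},\tilde{c}_{k})$ for the mapping $\tilde{x}=\tilde{c}_{1}\cdot\tilde{c}_{2}$ under i.i.d. uniform inputs and evaluates $I(\tilde{x};\tilde{c}_{k})$:

```python
import numpy as np
from itertools import product

def mutual_info(joint):
    """Mutual information in bits from a 2x2 joint PMF Pr(x, c)."""
    px = joint.sum(axis=1, keepdims=True)
    pc = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (px * pc)[nz])).sum())

def joint_with(bit_index):
    """Joint PMF of (x, c_bit) under uniform i.i.d. c1, c2, c3."""
    joint = np.zeros((2, 2))
    for c in product((0, 1), repeat=3):
        x = c[0] * c[1]                 # the mapping x = c1 * c2
        joint[x, c[bit_index]] += 1 / 8
    return joint

print(mutual_info(joint_with(0)))  # (3/4)*log2(4/3) ~ 0.3113 bits
print(mutual_info(joint_with(2)))  # 0.0: x is independent of c3
```

The computed values match the closed forms stated above, confirming that a randomly interleaved repetition landing on the third input is completely erased.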

IV-C Channel Estimation and ESE

We propose a maximum-likelihood channel estimator for the parameters $\bm{\epsilon}=(\epsilon^{1},...,\epsilon^{T})$. Since the decoder treats the channel created by the $t$-th memory array as an i.i.d. $(\epsilon^{t},\sigma)$-channel, given the channel observation $\bm{y}^{t}=(y^{t}_{1},...,y^{t}_{N/T})$, the log-likelihood function of $\epsilon^{t}$ is

$$\begin{aligned}\Lambda(\epsilon^{t};\bm{y}^{t})&=\log\prod_{i=1}^{N/T}\textrm{Pr}\left(y^{t}_{i}|\epsilon^{t}\right)\\&=\sum_{i=1}^{N/T}\log\left[(1-q)\left(\epsilon^{t}\phi(y^{t}_{i},R_{0}^{\prime})+(1-\epsilon^{t})\phi(y^{t}_{i},R_{0})\right)+q\,\phi(y^{t}_{i},R_{1})\right].\end{aligned}\qquad(10)$$

Next we consider an approximation of (10). Let $\bar{y}^{t}_{i}=\arg\min_{x\in\{R_{0},R_{0}^{\prime},R_{1}\}}|y^{t}_{i}-x|$ be the hard-decision value of $y^{t}_{i}$. Since each term of (10) has the form $\log\sum_{x\in\{R_{0},R_{0}^{\prime},R_{1}\}}p_{x}\phi(y^{t}_{i},x)$, when the channel noise level is low it is dominated by the term with $x=\bar{y}^{t}_{i}$. We thereby apply

$$\log\sum_{x\in\{R_{0},R_{0}^{\prime},R_{1}\}}p_{x}\phi(y^{t}_{i},x)\approx\log\left(p_{\bar{y}^{t}_{i}}\phi(y^{t}_{i},\bar{y}^{t}_{i})\right)=\log p_{\bar{y}^{t}_{i}}-\frac{(y^{t}_{i}-\bar{y}^{t}_{i})^{2}}{2\sigma^{2}}-\frac{1}{2}\log(2\pi\sigma^{2}),$$

and hence have the following approximation:

$$\Lambda(\epsilon^{t};\bm{y}^{t})\approx n_{R_{0}^{\prime}}\log\epsilon^{t}+n_{R_{0}}\log(1-\epsilon^{t})-\sum_{i=1}^{N/T}\frac{(y^{t}_{i}-\bar{y}^{t}_{i})^{2}}{2\sigma^{2}}+c,$$

where $n_{x}\overset{\Delta}{=}\sum_{i=1}^{N/T}1\{\bar{y}^{t}_{i}=x\}$ is the number of observations $y^{t}_{i}$ whose hard decision is $x$, and $c$ is a constant independent of $\epsilon^{t}$. Maximizing $\Lambda(\epsilon^{t};\bm{y}^{t})$ yields the estimate of $\epsilon^{t}$:

$$\hat{\epsilon}^{t}=\arg\max_{\epsilon^{t}}\Lambda(\epsilon^{t};\bm{y}^{t})\approx\frac{n_{R_{0}^{\prime}}}{n_{R_{0}^{\prime}}+n_{R_{0}}}.\qquad(11)$$
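The estimator (11) is just a hard-decision count. The following sketch (our own simulation; the sneak-path-affected level $R_{0}^{\prime}=250\ \Omega$ is an assumed value) classifies each readback sample to the nearest of the three levels and counts:

```python
import numpy as np

# Sketch of the hard-decision ML estimator (11): classify each readback value
# to the nearest of {R1, R0, R0'} and estimate the sneak-path rate as
# n_{R0'} / (n_{R0'} + n_{R0}).
def estimate_eps(y, R1, R0, R0p):
    levels = np.array([R1, R0, R0p])
    hard = levels[np.argmin(np.abs(y[:, None] - levels[None, :]), axis=1)]
    n_R0p = np.sum(hard == R0p)
    n_R0 = np.sum(hard == R0)
    return n_R0p / (n_R0p + n_R0)

rng = np.random.default_rng(2)
R1, R0, R0p, sigma = 100.0, 1000.0, 250.0, 30.0
eps, q, n = 0.2, 0.5, 4096
x = rng.random(n) < q                    # True -> logic 1 (level R1)
sneak = (~x) & (rng.random(n) < eps)     # sneak path hits some 0-cells
mean = np.where(x, R1, np.where(sneak, R0p, R0))
y = mean + sigma * rng.normal(size=n)
est = estimate_eps(y, R1, R0, R0p)
print(est)  # close to eps = 0.2 at this noise level
```

At low noise the hard decisions are nearly error-free, so the count-based estimate converges to the true sneak-path rate of the array.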

Next, the ESE calculates the LLR of each coded bit of $(\bm{x}^{1},...,\bm{x}^{T})$ based on the readback signal $(\bm{y}^{1},...,\bm{y}^{T})$ and the estimated channel parameters:

$$L(x_{i}^{t}|y_{i}^{t},\hat{\epsilon}^{t})=\log\frac{\textrm{Pr}(y_{i}^{t}|x_{i}^{t}=0,\hat{\epsilon}^{t})}{\textrm{Pr}(y_{i}^{t}|x_{i}^{t}=1,\hat{\epsilon}^{t})}=\log\frac{\hat{\epsilon}^{t}\phi(y_{i}^{t},R_{0}^{\prime})+(1-\hat{\epsilon}^{t})\phi(y_{i}^{t},R_{0})}{\phi(y_{i}^{t},R_{1})}\qquad(13)$$

for $i=1,...,N/T$, $t=1,...,T$. Note that the ESE implementation (13) ignores the correlation between cells within a memory array and regards the sneak-path interference as i.i.d. noise. A more sophisticated decoding scheme could exploit the cell correlation and perform joint data and sneak-path detection; this is left as future work.
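A direct sketch of the ESE computation (our own illustration; $R_{0}^{\prime}=250\ \Omega$ is again an assumed level): the numerator is the two-component Gaussian mixture of a 0-cell with the estimated $\hat{\epsilon}^{t}$ plugged in, and the denominator is the Gaussian likelihood of a 1-cell.

```python
import numpy as np

def phi(y, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    return np.exp(-(y - mu) ** 2 / (2 * sigma ** 2)) \
        / np.sqrt(2 * np.pi * sigma ** 2)

def ese_llr(y, eps_hat, R1, R0, R0p, sigma):
    """Per-cell LLR of (13), treating sneak-path interference as i.i.d."""
    num = eps_hat * phi(y, R0p, sigma) + (1 - eps_hat) * phi(y, R0, sigma)
    den = phi(y, R1, sigma)
    return np.log(num / den)

R1, R0, R0p, sigma = 100.0, 1000.0, 250.0, 30.0
y = np.array([100.0, 250.0, 1000.0])       # readback near each level
llr = ese_llr(y, eps_hat=0.1, R1=R1, R0=R0, R0p=R0p, sigma=sigma)
print(np.sign(llr))  # negative favors x=1 (level R1), positive favors x=0
```

A readback near $R_{1}$ yields a strongly negative LLR, while readbacks near $R_{0}^{\prime}$ or $R_{0}$ yield positive LLRs; the sneak-path level is thus still correctly attributed to a stored 0.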

Figure 10: MSE $E[(\hat{\epsilon}-\epsilon)^{2}]$ of our proposed channel estimation (solid lines) and MSE $E[(\epsilon_{q}-\epsilon)^{2}]$ of the average channel parameter (dashed lines).

To demonstrate the accuracy of the proposed channel estimation, Fig. 10 shows the mean squared error (MSE) between the estimated and the actual sneak-path rates. We obtain the MSE $E[(\hat{\epsilon}-\epsilon)^{2}]$ by simulation for $T=1$ and memory array sizes $m\times n=64\times 64,128\times 128$, where $\epsilon$ is the actual sneak-path occurrence rate and $\hat{\epsilon}$ is the estimate obtained by (11). In general, the MSE is below $10^{-2}$ and decreases as the channel noise level decreases. For comparison, we also show the MSE $E[(\epsilon_{q}-\epsilon)^{2}]$ between the average and the actual sneak-path rates, where the average sneak-path rate is what the decoder would employ when channel estimation is unavailable. Our proposed channel estimation predicts the channel much more accurately than the average channel parameter, especially for large array sizes.

IV-D De-Shaper

The SISO de-shaper is realized by message-passing over the factor graph shown in Fig. 8. During de-shaping, each node acts as a local processor and the edges pass LLR messages in both directions. The overall processing is iterative; in each iteration, every node in the factor graph acts once. A mapping node performs de-mapping based on the $L$ a priori LLRs from its neighboring variable nodes and the LLR from the ESE, and outputs an extrinsic LLR for each mapping input. The extrinsic LLR is used as an a priori LLR for the variable node processing. A variable node combines the $L$ a priori LLRs from its neighboring mapping nodes and feeds back an extrinsic LLR to each of them. After a certain number of iterations, the variable nodes output a more reliable LLR for each ECC coded bit as the de-shaper output.

IV-D1 Mapping Node Processing

A mapping node represents a mapping constraint: the $L$ edges on its left link to the $L$ variable nodes providing the $L$ mapping inputs, and the edge on its right links to the mapping output. The $i$-th mapping node thus represents the constraint $\mathcal{M}(c_{i,1}^{\prime},c_{i,2}^{\prime},\cdots,c_{i,L}^{\prime})=x_{i}$, where $c_{i,1}^{\prime},c_{i,2}^{\prime},\cdots,c_{i,L}^{\prime}$ are the mapping inputs and $x_{i}$ is the mapping output. The edges on the left of a mapping node pass the LLRs for $c_{i,1}^{\prime},c_{i,2}^{\prime},\cdots,c_{i,L}^{\prime}$, and the edge on the right passes the LLR for $x_{i}$.

Let $L(x_{i})=\log\frac{\textrm{Pr}(y_{i}|x_{i}=0)}{\textrm{Pr}(y_{i}|x_{i}=1)}$ be the LLR of $x_{i},i=1,...,N$, obtained from the ESE, and let $L^{a}(c_{i,j}^{\prime})=\log\frac{\textrm{Pr}(c_{i,j}^{\prime}=0)}{\textrm{Pr}(c_{i,j}^{\prime}=1)}$ be the a priori LLR of $c_{i,j}^{\prime}$ from a variable node. The $i$-th mapping node calculates an extrinsic LLR for $c_{i,k}^{\prime},k=1,2,...,L$, given by

$$\begin{aligned}L^{e}(c_{i,k}^{\prime})&=\log\frac{\textrm{Pr}(y_{i}|c_{i,k}^{\prime}=0)}{\textrm{Pr}(y_{i}|c_{i,k}^{\prime}=1)}\\&=\log\frac{\sum_{c_{i,j}^{\prime},j\neq k}\textrm{Pr}(y_{i}|c_{i,k}^{\prime}=0,c_{i,j}^{\prime},j\neq k)\prod_{j\neq k}\textrm{Pr}(c_{i,j}^{\prime})}{\sum_{c_{i,j}^{\prime},j\neq k}\textrm{Pr}(y_{i}|c_{i,k}^{\prime}=1,c_{i,j}^{\prime},j\neq k)\prod_{j\neq k}\textrm{Pr}(c_{i,j}^{\prime})}\\&=\log\frac{\sum_{c_{i,1}^{\prime},\cdots,c_{i,L}^{\prime}}(1-c_{i,k}^{\prime})\textrm{Pr}(y_{i}|x_{i})\prod_{j\neq k}\textrm{Pr}(c_{i,j}^{\prime})}{\sum_{c_{i,1}^{\prime},\cdots,c_{i,L}^{\prime}}c_{i,k}^{\prime}\textrm{Pr}(y_{i}|x_{i})\prod_{j\neq k}\textrm{Pr}(c_{i,j}^{\prime})}\\&=\log\frac{\sum_{c_{i,1}^{\prime},\cdots,c_{i,L}^{\prime}}(1-c_{i,k}^{\prime})e^{(1-x_{i})L(x_{i})+\sum_{j\neq k}(1-c_{i,j}^{\prime})L^{a}(c_{i,j}^{\prime})}}{\sum_{c_{i,1}^{\prime},\cdots,c_{i,L}^{\prime}}c_{i,k}^{\prime}e^{(1-x_{i})L(x_{i})+\sum_{j\neq k}(1-c_{i,j}^{\prime})L^{a}(c_{i,j}^{\prime})}},\end{aligned}$$

where $x_{i}=\mathcal{M}(c_{i,1}^{\prime},c_{i,2}^{\prime},\cdots,c_{i,L}^{\prime})$.
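The mapping-node update can be implemented by brute force over the $2^{L}$ input patterns, exactly as in the last line of the equation above. The sketch below is our own illustration (function names are ours; `mapping` is any function $\{0,1\}^{L}\rightarrow\{0,1\}$):

```python
import numpy as np
from itertools import product

def map_node_extrinsic(mapping, L_x, L_a, k):
    """Extrinsic LLR L^e(c'_k) of one mapping node.
    L_x: channel LLR of the mapping output x (from the ESE).
    L_a: list of the L a priori LLRs of the mapping inputs."""
    L = len(L_a)
    num = den = 0.0
    for c in product((0, 1), repeat=L):      # enumerate all 2^L patterns
        x = mapping(c)
        w = np.exp((1 - x) * L_x
                   + sum((1 - c[j]) * L_a[j] for j in range(L) if j != k))
        if c[k] == 0:
            num += w
        else:
            den += w
    return float(np.log(num / den))

# 3-to-1 mapping x = c1 * c2 from the example of Section IV-B:
m = lambda c: c[0] * c[1]
print(map_node_extrinsic(m, 2.0, [0.0, 0.0, 0.0], k=2))  # 0.0: x says nothing about c3
print(map_node_extrinsic(m, 2.0, [0.0, 0.0, 0.0], k=0))  # positive: x informs c1
```

Note how the output reproduces the erasure phenomenon discussed earlier: the extrinsic LLR for the third input is exactly zero, regardless of the channel LLR.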

IV-D2 Variable Node Processing

Since a variable node is associated with an ECC coded bit (the $i$-th node with $c_{i}$), the edges connected to it all pass LLR messages for the same bit, i.e., $c_{i,1}=c_{i,2}=\cdots=c_{i,L}=c_{i}$, where $c_{i,j},j=1,...,L$, are the $L$ repetitions of $c_{i}$.

Consider the processing at the $i$-th variable node. Let $L^{a}(c_{i})$ be the a priori LLR of $c_{i}$ from the ECC decoder and $L^{a}(c_{i,j})$ be the a priori LLR of $c_{i,j},j=1,...,L$, from the neighboring mapping nodes. Since $c_{i,1}=c_{i,2}=\cdots=c_{i,L}=c_{i}$, the variable node calculates an extrinsic LLR for each $c_{i,k}$, given by

$$L^{e}(c_{i,k})=L^{a}(c_{i})+\sum_{j\neq k}L^{a}(c_{i,j}).\qquad(14)$$

After a certain number of processing iterations, the variable node outputs an extrinsic LLR about cic_{i} to the ECC decoder

$$L^{e}(c_{i})=\sum_{k=1}^{L}L^{a}(c_{i,k}).\qquad(15)$$
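The variable-node updates (14) and (15) are plain LLR sums; a minimal sketch (our own naming) makes the bookkeeping explicit:

```python
def vnode_extrinsic(L_ecc, L_a, k):
    """Extrinsic LLR to mapping node k, as in (14): the ECC decoder's prior
    plus the incoming LLRs of all other copies of the bit."""
    return L_ecc + sum(v for j, v in enumerate(L_a) if j != k)

def vnode_to_ecc(L_a):
    """Output LLR to the ECC decoder, as in (15): sum of all L incoming LLRs."""
    return sum(L_a)

L_a = [0.5, -1.0, 2.0]                  # LLRs from the L = 3 mapping nodes
print(vnode_extrinsic(0.2, L_a, k=1))   # 0.2 + 0.5 + 2.0 = 2.7
print(vnode_to_ecc(L_a))                # 0.5 - 1.0 + 2.0 = 1.5
```

Excluding the $k$-th incoming message in (14) keeps the message extrinsic, which is what makes the iterative exchange with the mapping nodes converge properly.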

IV-E ECC Optimization and BER Simulations

In this section, we present the ECC optimization and BER simulation results for ReRAM systems. With our optimized ECC, we achieve a high storage efficiency with a gap of about 0.1 bit/cell from the ReRAM channel capacity.

Figure 11: Mappings A and B that achieve data distributions with $q=5/16$ and $q=3/16$.
TABLE I: Code parameters for $m\times n=64\times 64$ and $128\times 128$ ReRAM systems. The IRA code parameters are its variable node degree distribution $\{\lambda_{i}\}$ and combiner factor $a$ [12, 13].

| Array size $m\times n$ | $64\times 64$ | $128\times 128$ |
| Data shaper | $L=4$, Mapping A, $q=5/16$ | $L=4$, Mapping B, $q=3/16$ |
| IRA code (ECC) | $\lambda_{3}=0.567736$, $\lambda_{50}=0.432264$, combiner $a=6$ | $\lambda_{3}=0.501564$, $\lambda_{50}=0.498436$, combiner $a=4$ |
| Code rate | $R=0.542824$ | $R=0.414735$ |
| $\max_{q}C_{q}(\epsilon_{q},\sigma=100)$ | 0.660 | 0.494 |
Figure 12: BERs for the IRA-coded ReRAM channel (solid lines) and the IRA-coded block-varying $(\bm{\epsilon},\sigma)$-channel (marked by $\circ$) with the across-array coding strategy, where the channel parameters are $m\times n=64\times 64$, $R_{1}=100\ \Omega$, $R_{0}=1000\ \Omega$, $R_{s}=250\ \Omega$, and $p_{f}=10^{-3}$.
Figure 13: BERs for the IRA-coded ReRAM channel (solid lines) and the IRA-coded block-varying $(\bm{\epsilon},\sigma)$-channel (marked by $\circ$) with the across-array coding strategy, where the channel parameters are $m\times n=128\times 128$, $R_{1}=100\ \Omega$, $R_{0}=1000\ \Omega$, $R_{s}=250\ \Omega$, and $p_{f}=10^{-3}$.
Figure 14: BER comparison between the IRA-coded ReRAM channel with and without channel estimation (CE), for memory array sizes $m\times n=64\times 64,128\times 128$ and joint-coding array numbers $T=1,16$.
Figure 15: BER comparison between our proposed structured interleaving and random interleaving over the i.i.d. $(\epsilon_{q},\sigma)$-channel.

For the ECC, we adopt an IRA code, a type of irregular low-density parity-check (LDPC) code that approaches the i.i.d. channel capacity with low encoding and decoding complexity [12, 13]. We consider the code design for two ReRAM systems with memory array sizes $m\times n=64\times 64$ and $128\times 128$, at a noise level of $\sigma=100$. We adopt data shapers with an $(L=4)$-repeater and 4-to-1 mappers using Mappings A and B (Fig. 11). Mappings A and B produce data distributions with $q=\frac{5}{16}$ and $q=\frac{3}{16}$, respectively, which approach the maximum storage efficiency for the considered ReRAM systems (Fig. 6). The maximum storage efficiencies for the $64\times 64$ and $128\times 128$ systems are $\max_{q}C_{q}(\epsilon_{q},\sigma=100)=0.660$ and $0.494$ bit/cell, respectively. We optimize the IRA code for these two cases over an i.i.d. $(\epsilon_{q},\sigma)$-channel using density evolution [14]. The code parameters for the two ReRAM systems are listed in TABLE I. Our codes achieve $R=0.542824$ and $0.414735$ bit/cell, which are close to the capacity bounds with gaps of about $0.12$ and $0.08$ bit/cell, respectively.

In Figs. 12 and 13, we simulate the BER of the two coded ReRAM systems in TABLE I with the across-array coding strategy, over both the ReRAM channels (solid lines) and the equivalent block-varying $(\bm{\epsilon},\sigma)$-channels (marked by $\circ$). Message-passing decoding is employed for both the de-shaper and the ECC decoder; the decoding of IRA codes can be found in [13]. For the $m\times n=64\times 64$ system we employ code length $N=64\times 64=4096$, and for $128\times 128$ we adopt $N=128\times 128=16384$. In both figures, the BERs over the ReRAM channels are almost the same as those over the block-varying $(\bm{\epsilon},\sigma)$-channels, demonstrating the proposed channel equivalence. The BERs improve as $T$ increases and approach the decoding performance over the i.i.d. $(\epsilon_{q},\sigma)$-channel, which is the performance limit for ReRAM channels as $T\rightarrow\infty$. This improvement with increasing $T$ can be regarded as a “diversity” gain obtained by assigning the codeword to multiple memory arrays. For comparison, we also show the BER performance of the same IRA codes without data shaping ($q=1/2$) over the i.i.d. $(\epsilon_{q},\sigma)$-channel; the decoding performance deteriorates significantly due to the lack of data shaping.

To emphasize the importance of the proposed real-time channel estimation, in Fig. 14 we compare the BERs of the IRA-coded ReRAM channels with and without channel estimation, for memory array sizes $m\times n=64\times 64,128\times 128$ and joint-coding array numbers $T=1,16$. We employ the same code parameters listed in TABLE I for both cases. For the IRA-coded ReRAM channel without channel estimation, the decoder employs the average channel parameter $\epsilon_{q}$. Applying channel estimation yields an obvious BER improvement for each comparison pair.

Note that the codes in TABLE I are designed at $\sigma=100$ for the i.i.d. $(\epsilon_{q},\sigma)$-channel, which means that in theory the codes can be decoded without errors at $\sigma=100$. However, the actual decoding performance shown in Figs. 12 and 13 is much worse. This is because density evolution designs the code under an infinite-code-length assumption, while our simulations employ finite-length codes; in other words, the codes are asymptotically decodable at $\sigma=100$ as the code length approaches infinity. For the same reason, the $128\times 128$ ReRAM system achieves a lower BER than the $64\times 64$ system, since a much longer code is employed in the $128\times 128$ system, although the codes in both systems are designed at the same noise level of $\sigma=100$.

In Fig. 15, we compare the BER of our proposed structured interleaving scheme with that of random interleaving for the codes in TABLE I over the i.i.d. $(\epsilon_{q},\sigma)$-channel. For both codes, the proposed structured interleaving outperforms random interleaving, and the BER curves are steeper with structured interleaving than with random interleaving. This verifies our analysis in Section IV-B. Similar performance gains can also be observed over the ReRAM channels, which are block-varying forms of the $(\epsilon,\sigma)$-channel.

V Conclusion

We have considered the design of effective channel coding schemes to tackle both the sneak-path interference and the additive noise of ReRAM channels. We have proposed an across-array coding strategy that mitigates the channel instability and enables a “diversity” gain during decoding. With TIN decoding, the ReRAM channel is equivalent to a block-varying channel whose status is not data-dependent, based on which we derived capacity bounds and proposed a coding scheme. We have also proposed a real-time channel estimation scheme to obtain the sneak-path rates of the $T$ arrays, based on which an ESE calculates the LLR of each coded bit for decoding. To deal with the channel asymmetry, we proposed an ECC concatenated with a data shaper, where the latter forms the desired input data distribution to achieve the maximum information rate. With an optimized ECC design, the ReRAM system achieves a high storage efficiency with a gap of about 0.1 bit/cell from the ReRAM channel capacity limit.

We would also like to point out some possible extensions that lead to future work. Although we only considered AWGN in this paper, our work can be extended to other types of noise, such as lognormal noise, by reformulating the channel capacity and the LLR formula in the ESE. Moreover, to handle more general sneak-path models in which multiple sneak paths affect a read cell, the channel model in Fig. 5 should be modified into a binary-input multi-level-output channel, where the output signal levels depend on the corresponding sneak-path combinations [4].

References

  • [1] D. B. Strukov, G. S. Snider, D. R. Stewart, and R. S. Williams, “The missing memristor found,” Nature, vol. 453, no. 7191, p. 80, 2008.
  • [2] M. A. Zidan, H. A. H. Fahmy, M. M. Hussain, and K. N. Salama, “Memristor-based memory: The sneak paths problem and solutions,” Microelectron. J., vol. 44, no. 2, pp. 176–183, 2013.
  • [3] Y. Cassuto, S. Kvatinsky, and E. Yaakobi, “Information-theoretic sneak-path mitigation in memristor crossbar arrays,” IEEE Trans. Inf. Theory, vol. 62, no. 9, pp. 4801–4813, Sep. 2016.
  • [4] Y. Ben-Hur and Y. Cassuto, “Detection and coding schemes for sneak-path interference in resistive memory arrays,” IEEE Trans. Commun., vol. 67, no. 6, pp. 3821–3833, Feb. 2019.
  • [5] Z. Chen, C. Schoeny, and L. Dolecek, “Pilot assisted adaptive thresholding for sneak-path mitigation in resistive memories with failed selection devices,” IEEE Trans. Commun., vol. 68, no. 1, pp. 66–81, Jan. 2020.
  • [6] P. Huang, P. H. Siegel, and E. Yaakobi, “Performance of multilevel flash memories with different binary labelings: A multi-user perspective,” IEEE JSAC, vol. 34, no. 9, pp. 2336–2353, Sep. 2016.
  • [7] R. Knopp and P. A. Humblet, “On coding for block fading channels,” IEEE Trans. Inf. Theory, vol. 46, no. 1, pp. 189–205, Jan. 2000.
  • [8] P. O. Vontobel, W. Robinett, P. J. Kuekes, D. R. Stewart, J. Straznicky, and R. S. Williams, “Writing to and reading from a nano-scale crossbar memory based on memristors,” Nanotechnology, vol. 20, no. 42, Oct. 2009.
  • [9] J. Zhou, K. Kim, and W. Lu, “Crossbar RRAM arrays: Selector device requirements during read operation,” IEEE Trans. Electron Devices, vol. 61, no. 5, pp. 1369–1376, May 2014.
  • [10] Q. Luo, X. Xu, T. Gong, and H. Lv, “8-layers 3D vertical RRAM with excellent scalability towards storage class memory applications,” in Proc. IEEE IEDM 2017, pp. 2.7.1–2.7.4.
  • [11] R. J. McEliece and W. E. Stark, “Channels with block interference,” IEEE Trans. Inf. Theory, vol. IT-30, no. 1, pp. 44–53, Jan. 1984.
  • [12] D. Divsalar, H. Jin, and R. J. McEliece, “Coding theorems for ‘turbo-like’ codes,” in Proc. 36th Allerton Conf. on Communication, Control, and Computing, Allerton, Illinois, Sept. 1998, pp. 201–210.
  • [13] S. ten Brink and G. Kramer, “Design of repeat-accumulate codes for iterative detection and decoding,” IEEE Trans. Signal Processing, vol. 51, no. 11, pp. 2764–2772, Nov. 2003.
  • [14] T. J. Richardson and R. L. Urbanke, “The capacity of low-density parity-check codes under message-passing decoding,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 599–618, Feb. 2001.