
Spatio-Temporal Modeling for Flash Memory Channels Using Conditional Generative Nets

Simeng Zheng, Chih-Hui Ho, Wenyu Peng, and Paul H. Siegel
Electrical and Computer Engineering Dept., University of California, San Diego, La Jolla, CA 92093 U.S.A
{sizheng,chh279,w6peng,psiegel}@ucsd.edu
Abstract

We propose a data-driven approach to modeling the spatio-temporal characteristics of NAND flash memory read voltages using conditional generative networks. The learned model reconstructs read voltages from an individual memory cell based on the program levels of the cell and its surrounding cells, as well as the specified program/erase (P/E) cycling time stamp. We evaluate the model over a range of time stamps using the cell read voltage distributions, the cell level error rates, and the relative frequency of errors for patterns most susceptible to inter-cell interference (ICI) effects. We conclude that the model accurately captures the spatial and temporal features of the flash memory channel.

Index Terms:
Machine learning, Flash memory channel, Generative modeling, Spatio-temporal analysis.

I Introduction

Realistic models for digital communication and storage channels are essential tools in the design and optimization of signal processing, detection, and coding algorithms. For NAND flash memories, the steady reduction in technology feature size and the increase in cell bit-density have been accompanied by diminished memory reliability and reduced device endurance. Sources of errors are manifold, including programming errors, inter-cell interference (ICI), cell wear during program/erase (P/E) cycling, cell charge loss due to data retention, and read disturb effects. Consequently, there is a critical need for comprehensive models that capture the complex behavior of the flash memory channel and accurately simulate the spatial and temporal characteristics of the read voltages.

Several models of voltage distributions supported by empirical measurements or simulated voltages in NAND flash have appeared in the literature. Cai et al. [1] model the voltage distribution in 2-bit per cell MLC flash devices as a Gaussian distribution. Parnell et al. [14] proposed a parameterized Normal-Laplace mixture model that more accurately describes MLC flash read voltage distributions. Luo et al. [11] proposed another accurate and computationally more efficient model for MLC flash, based on a modified version of the Student's t-distribution and a temporal power law. Liu et al. [10] used a neural network to model simulated read voltages as a function of P/E cycles for one individual program level in isolated MLC flash cells. In addition, analyses of hard bit error statistics [18, 13] and characterizations of dominant error patterns [2] offer a valuable empirical understanding of flash memories. However, as effective as these models have been in the scenarios to which they have been applied, none has provided an accurate model of the complex spatial and temporal characteristics of flash memory read voltages.

Our goal in this paper is to use machine-learning techniques to develop a comprehensive and accurate model of the flash memory channel, with the aim of capturing both spatial ICI effects and temporal distortions arising from P/E cycling.

Recently, generative modeling techniques [3, 8] have been successfully applied to image processing [7]. In view of the demonstrated power of neural networks in learning complex multidimensional distributions, we propose the use of conditional generative nets as an approach to modeling flash memory read voltages in space and time. We adopt the conditional VAE-GAN as our training architecture, where the fusion of the variational autoencoder (VAE) [8] and the generative adversarial network (GAN) [3] leverages information from the latent space to produce high-quality, accurate reconstructions with the help of the discriminator.

We train the conditional VAE-GAN network to regenerate the soft read voltage levels from program levels, using a dataset of measurements from a 1X-nm, 3-bit per cell TLC NAND flash memory. By incorporating the exact P/E cycle counts into the architecture, we are able to control the regenerated voltage levels at a specified time stamp.

We formulate a novel generative modeling workflow to generate soft read voltages from all program levels over a range of P/E cycles. When we train the network on measured data, this conditional method can regenerate realistic cell-level read voltages on an array of flash cells from a corresponding array of program levels and a given P/E cycle. To our knowledge, this is the first work that models both temporal features (P/E cycles) and spatial characteristics (ICI effects). We use two evaluation metrics to compare the model results to measured data: the read voltage distributions, i.e., the estimated probability density functions (PDFs), at different time stamps, and the relative frequencies of spatially-dependent and pattern-dependent ICI-induced hard read errors. Details of these experimental comparisons are provided in Section IV.

II Flash Memory Basics

II-A Flash Structure and Experimental Procedure

Figure 1: (Left) Example mapping of cell program levels to binary representations in TLC flash. (Right) Schematic diagram of a TLC flash memory block showing the 2-D array of cells connected in the horizontal direction by wordlines (WLs) and in the vertical direction by bitlines (BLs).

The basic unit of data storage in NAND flash memory is a floating-gate transistor, referred to as a cell. Today's flash memories are capable of storing single or multiple bits (e.g., 2 to 5 bits) per cell, where the $n$-bit strings correspond to $2^n$ program levels. The cells are organized into an interconnected two-dimensional (2-D) array, called a block, via horizontal wordlines (WLs) and vertical bitlines (BLs). The flash memory chip is composed of a collection of such blocks. Fig. 1 depicts a schematic diagram of a planar TLC flash memory block and an example of a mapping from program levels to binary bits.

The basic unit of write (i.e., program) and read operations in flash memory is a page, corresponding to a logical bit position in a wordline of a block. On the other hand, the basic unit of an erase operation is an entire block.

To characterize the channel and create datasets for machine learning, we conducted program/erase (P/E) cycling experiments on several blocks of a commercial 1X-nm TLC flash memory chip. During each P/E cycle, the selected block is first erased, and then pseudo-random data are programmed into the successive pages in the block. At 4000, 7000, and 10000 cycles, we perform a read operation on the tested block. For each cell, we then record the program level and the measured voltage level. The P/E cycling experiments were performed at room temperature in a continuous manner with no wait time between the erase-program-read operations.

In the program operation, we refer to the program levels of three consecutive cells along WLs and BLs as a pattern. As an example, in Fig. 1, the programmed levels $\text{PL}_{i-1,j}\,\text{PL}_{i,j}\,\text{PL}_{i+1,j}$ in WLs $(i-1), i, (i+1)$ of BL $j$ correspond to bit strings “011”, “111”, “011”, which we associate with the pattern 707 in the vertical (BL) direction.
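To make the notation concrete, the following sketch (ours, not from the paper) extracts WL- and BL-direction patterns from a hypothetical 2-D array of program levels, with rows indexing WLs and columns indexing BLs as in Fig. 1:

```python
import numpy as np

# Hypothetical 64x64 block of TLC program levels (0-7); rows are WLs, columns are BLs.
pl = np.random.randint(0, 8, size=(64, 64))

def wl_pattern(pl, i, j):
    """Pattern PL[i,j-1] PL[i,j] PL[i,j+1] around interior cell (i, j), WL direction."""
    return (pl[i, j - 1], pl[i, j], pl[i, j + 1])

def bl_pattern(pl, i, j):
    """Pattern PL[i-1,j] PL[i,j] PL[i+1,j] around interior cell (i, j), BL direction."""
    return (pl[i - 1, j], pl[i, j], pl[i + 1, j])

# A 707 pattern in the BL direction corresponds to bl_pattern(pl, i, j) == (7, 0, 7).
```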

II-B Spatio-temporal Characteristics

Repeated P/E operations induce wear on the flash cells, and high-low-high program levels in three consecutive cells often lead to severe ICI effects. These distortions represent the spatio-temporal nature of the flash memory channel. As a result, the level error rate increases as the temporal P/E cycle count grows, as seen in Fig. 2.

The spatial ICI effect refers to the phenomenon where programming of a cell induces changes in the voltage levels of neighboring cells within its block. In particular, the read voltage level of a cell programmed to a low level may be inadvertently increased if its adjacent cells are programmed to high levels, i.e., when the programming pattern is high-low-high. For example, if we program a 707 pattern in a TLC flash memory, the read voltage of the low central cell may be increased by its high adjacent cells. During data detection, the central “victim” cell may therefore be read as an incorrect level. These severe ICI effects can be observed in Fig. 2. At each P/E cycle, we see that the cell errors are not randomly distributed; they are clearly affected by neighboring program levels. The 9 patterns shown are the most error-prone in both WL and BL directions. Pattern 707 in the BL direction is the most severely affected by ICI. Moreover, patterns 707, 706, and 607 in the BL direction are more error-prone than those in the WL direction.

We note that, to mitigate the effects of ICI in flash memory, the use of constrained codes that prevent the appearance of ICI-prone patterns has been proposed; see, for example, [15, 4, 5]. Accurate modeling of the dependence of WL and BL pattern errors on the P/E cycle count can be a valuable tool to help researchers design efficient, time-aware constrained codes.

Figure 2: Counts of the top error-prone patterns and level error rates at selected P/E cycles. The error pattern counts are normalized by the count of pattern 707 in the bitline direction at 4000 P/E cycles.

III Generative Modeling for Flash Memory

In this section, we discuss our generative modeling pipeline for the flash memory channel. We adopt a conditional VAE-GAN (cVAE-GAN) architecture [9] for our pipeline, depicted in Fig. 3. Our goal is to learn a mapping between program levels and soft read voltage levels at various P/E cycles, where the reconstructed voltage levels accurately reflect the spatial and temporal nature of the channel.

Figure 3: Generative modeling pipeline: encoder, generator, and discriminator constitute the generative modeling workflow. Here, $z$ is the latent vector; P/E is the corresponding P/E cycle count; PL is the array of program levels; VL is the array of measured read voltage levels; and $\widetilde{\text{VL}}$ is the reconstructed array of read voltage levels. In our implementation, PL, VL, and $\widetilde{\text{VL}}$ are all $64\times 64$ arrays.

III-A Generative Flash Modeling

Using the P/E cycling experiment outlined in Section II-A, we collect paired channel instances at specific P/E cycles, where the channel instances are denoted as $\{(\text{PL},\text{VL},\text{P/E})\}$. Here, we use PL, VL, and P/E to denote the channel input program levels, the channel output voltage levels, and the P/E cycle count, respectively. The goal of channel reconstruction and generative modeling in this paper is to learn the analytically intractable likelihood $P(\text{VL}|\text{PL},\text{P/E})$, with the aim of capturing the spatio-temporal nature of the flash memory channel.

Fig. 3 summarizes the architecture of the generative modeling pipeline. The conditional VAE-GAN architecture consists of three components: an encoder ($Enc$), a generator ($Gen$), and a discriminator ($Dis$). The encoder maps the read voltages to the latent vector $z$ at a specific P/E and replaces the prior distribution $P(z)$ in the GAN with the learned posterior distribution $P(z|\text{VL},\text{P/E})$. The decoder in the VAE shares its weights with the GAN generator [3]. In the conditional setting, the variational lower bound of $P(\text{VL}|\text{P/E})$ can be derived as

$$\log P(\text{VL}|\text{P/E}) \geq -D_{KL}\big(Q(z|\text{VL},\text{P/E})\,\|\,P(z|\text{P/E})\big) + \mathbb{E}_{Q(z|\text{VL},\text{P/E})}\big[\log P(\text{VL}|z,\text{P/E})\big]$$

where $D_{KL}$ represents the Kullback-Leibler (KL) divergence. The distribution $Q(z|\text{VL},\text{P/E})$ of the latent vector $z$ is trained to approach $P(z|\text{P/E})$ via the KL loss $\mathcal{L}_{KL}$, where $P(z|\text{P/E})$ is assumed to be a Gaussian distribution.

The generator $Gen$ takes both the learned latent vector and PL as input and generates a “fake” $\widetilde{\text{VL}}$. The latent vectors are sampled from $Q(z|\text{VL},\text{P/E})$ using the re-parameterization trick [8]. By sampling different latent vectors $z$ from the same distribution, we can generate multiple arrays of plausible voltage levels. The variations in these output arrays for a given array of program levels reflect the stochasticity of the channel. The discriminator distinguishes authentic pairs $(\text{PL},\text{VL})$ from generated pairs $(\text{PL},\widetilde{\text{VL}})$. The loss in the conditional GAN part is

$$\mathcal{L}_{GAN} = \log\big(1 - Dis(\text{PL}, Gen(\text{PL},\text{P/E},z))\big) + \log\big(Dis(\text{PL},\text{VL})\big).$$

Similar to VAE-GAN [9], we encourage the reconstructed voltage levels to match the authentic voltage levels, using the $\ell_2$-norm to measure the reconstruction loss

$$\mathcal{L}_{recon} = \|\text{VL} - Gen(\text{PL},\text{P/E},z)\|_2.$$

Combining these equations, we formulate the loss function of the cVAE-GAN architecture as

$$\min_{Gen,Enc}\max_{Dis}\ \mathcal{L}_{GAN} + \alpha\mathcal{L}_{recon} + \beta\mathcal{L}_{KL}. \tag{1}$$
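For concreteness, the following PyTorch sketch (a minimal illustration of (1), not the authors' code) assembles the three loss terms, assuming modules `enc`, `gen`, and `dis` with the interfaces described above and a discriminator that outputs probabilities; the generator term uses the common non-saturating form of the GAN loss:

```python
import torch
import torch.nn.functional as F

def cvae_gan_losses(enc, gen, dis, pl, vl, pe, alpha=10.0, beta=0.01):
    # Posterior Q(z | VL, P/E) and a re-parameterized latent sample.
    mu, logvar = enc(vl, pe)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
    vl_fake = gen(pl, pe, z)                        # reconstructed VL~

    # KL loss against the Gaussian prior, and the l2 reconstruction loss.
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    recon = torch.mean((vl - vl_fake) ** 2)

    real_score = dis(pl, vl)                        # D on authentic (PL, VL) pairs
    ones = torch.ones_like(real_score)
    zeros = torch.zeros_like(real_score)

    # Discriminator loss: real pairs vs. generated pairs (generator detached).
    d_loss = F.binary_cross_entropy(real_score, ones) + \
             F.binary_cross_entropy(dis(pl, vl_fake.detach()), zeros)
    # Generator/encoder loss: fool the discriminator plus weighted recon and KL terms.
    g_loss = F.binary_cross_entropy(dis(pl, vl_fake), ones) \
             + alpha * recon + beta * kl
    return g_loss, d_loss
```

The defaults $\alpha=10$ and $\beta=0.01$ follow the training settings given later in Remark 2.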

III-B Spatio-temporal Combination

Aiming to capture the spatial ICI effects in the channel model, we implement the generator $Gen$ as a convolutional neural network (CNN), where each VL is reconstructed from the PL values of its cell and neighboring cells. To generate VL at an explicit P/E cycle count, we control the generator with an additional temporal factor and incorporate the explicit P/E cycle count into $Gen$.

We first encode the normalized P/E cycle count into a $d$-dimensional P/E vector, which contains expressive powers of the normalized P/E cycle, e.g., $\text{P/E}^2$, $\sqrt{\text{P/E}}$, etc. Then, we spatially replicate the $d$-dimensional P/E vector into a feature map of appropriate size $H\times W\times d$ and concatenate it with the $H\times W\times C$ feature from each layer in $Gen$, where $H\times W$ is the spatial dimension of the feature from each convolutional layer and $C$ is the number of channels in the CNN. The channel-wise combination produces a final feature of size $H\times W\times (C+d)$ at each layer. The fusion of the features from the program levels and the P/E feature maps ensures that the reconstructed voltage levels capture both spatial and temporal characteristics.
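A minimal sketch of this fusion step follows; the exact set of powers in the P/E vector is our assumption, since the paper gives only examples such as $\text{P/E}^2$ and $\sqrt{\text{P/E}}$:

```python
import torch

def pe_vector(pe_norm: float) -> torch.Tensor:
    # Hypothetical choice of d = 6 expressive powers of the normalized P/E count.
    return torch.tensor([pe_norm, pe_norm ** 2, pe_norm ** 3,
                         pe_norm ** 0.5, pe_norm ** (1 / 3), 1.0])

def fuse_pe(feat: torch.Tensor, pe_norm: float) -> torch.Tensor:
    """Channel-wise fusion: (N, C, H, W) feature map -> (N, C + d, H, W)."""
    n, _, h, w = feat.shape
    # Spatially replicate the d-dimensional P/E vector to an H x W x d map.
    pe = pe_vector(pe_norm).view(1, -1, 1, 1).expand(n, -1, h, w)
    return torch.cat([feat, pe], dim=1)

fused = fuse_pe(torch.randn(2, 64, 32, 32), pe_norm=7000 / 10000)
print(fused.shape)  # torch.Size([2, 70, 32, 32])
```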

III-C Implementation Details

We implement and evaluate our method using data recorded from a commercial TLC flash chip. With the generative modeling framework, we believe this data-driven approach can be flexibly applied to flash memories of any technology generation and scale. In TLC flash memory, each block contains hundreds of pages, each of which is typically 8-16 kB in size. In order to represent the TLC flash memory without bias, we collect data from several blocks of one 1X-nm TLC chip at selected P/E cycle counts. We crop the blocks into non-overlapping $64\times 64$ 2-D arrays to form our paired data. The number of 2-D arrays in the training set is $1.5\times 10^5$ ($5\times 10^4$ for each P/E cycle) and the size of the evaluation dataset is $2.1\times 10^4$ ($7\times 10^3$ for each P/E cycle). The dimensions of the latent vector $z$ and the P/E vector are both set to 6.
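As an illustration of this data preparation (block dimensions here are hypothetical), the paired arrays can be cropped as follows:

```python
import numpy as np

def crop_pairs(pl_block, vl_block, pe, size=64):
    """Crop one block's (PL, VL) arrays into non-overlapping size x size patches."""
    rows, cols = pl_block.shape
    pairs = []
    for r in range(0, rows - size + 1, size):
        for c in range(0, cols - size + 1, size):
            pairs.append((pl_block[r:r + size, c:c + size],
                          vl_block[r:r + size, c:c + size], pe))
    return pairs

# Toy stand-ins for one measured block at 4000 P/E cycles.
pl_block = np.random.randint(0, 8, size=(256, 1024))
vl_block = np.random.randn(256, 1024)
print(len(crop_pairs(pl_block, vl_block, pe=4000)))  # 64 patches
```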

Remark 1. We adapt the conditional VAE-GAN in [19] for input arrays of $64\times 64$ program levels and output arrays of $64\times 64$ voltage levels, each with a single channel. The three network modules in Fig. 3 are: ResNet [6] ($Enc$), U-net [16] ($Gen$), and PatchGAN [7] ($Dis$). The following descriptions of the modules use the terminology of the corresponding references.

1. Encoder: We use two residual blocks, each of which contains two $3\times 3$ convolutional layers with stride 1 and padding 1. We then add two linear layers, which map the output features to the mean and variance of the latent vector.

2. Generator: $Ck$ denotes a Convolution-BatchNorm-ReLU layer with $k$ output channels. All convolutions use $4\times 4$ kernels applied with stride 2 and padding 1. The network architecture before spatio-temporal combination can be described as

(Down part) $C64, C128, C256, C512, C512, C512$
(Up part) $C512, C512, C256, C128, C64, C1$

where we inject the latent vector $z$ by replication and concatenation into every layer in the “Down” part [19], and each layer in the “Up” part receives skip connections from the corresponding layer in the “Down” part [16].

3. Discriminator $Dis$: The input to the discriminator is the concatenation of fake voltage levels and program levels. With the same naming convention as in the generator, we express the discriminator as $C64, C128, C1$; a minimal sketch of this convention follows the list.
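The sketch below shows the $Ck$ building block and the discriminator stack; implementing the final 1-channel layer as a plain convolution with a sigmoid head is our PatchGAN-style assumption rather than a detail stated above:

```python
import torch
import torch.nn as nn

def C(in_ch: int, k: int) -> nn.Sequential:
    """Ck: Convolution-BatchNorm-ReLU, k output channels, 4x4 kernel, stride 2, padding 1."""
    return nn.Sequential(
        nn.Conv2d(in_ch, k, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(k),
        nn.ReLU(inplace=True),
    )

# Discriminator C64, C128, C1 on concatenated single-channel (PL, VL~) 64x64 arrays.
dis = nn.Sequential(C(2, 64), C(64, 128), nn.Conv2d(128, 1, 4, 2, 1), nn.Sigmoid())
out = dis(torch.randn(1, 2, 64, 64))
print(out.shape)  # patch of real/fake probabilities, torch.Size([1, 1, 8, 8])
```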

Remark 2. The training details of the generative modeling method are as follows. The Adam optimizer is used with learning rate $2\times 10^{-4}$. The parameters in the loss function (1) are set to $\alpha=10$ and $\beta=0.01$. We train the conditional VAE-GAN for 7 epochs with batch size 2.

During evaluation, we use program levels and latent vectors $z$ sampled from a standard Gaussian distribution. For each program level array, we prepare 10 different sampled latent vectors to evaluate the learned model.

IV Experimental Results

Figure 4: PDF visualizations for measured and cVAE-GAN voltage levels at (a) 4000, (b) 7000, and (c) 10000 P/E cycles. In each sub-figure, solid curves represent the cVAE-GAN modeled distribution and triangles represent the measured distribution. The plots are on a linear scale. Vertical dash-dotted lines are the fixed default voltage thresholds.
Figure 5: Total error counts of the measured (‘M’), cVAE-GAN (‘cV-G’), Gaussian (‘G’), Normal-Laplace (‘NL’), and Student's t (‘St’) models. In each bar, we stack the errors from program level 1 to program level 7. We normalize the error counts so that the measured data at 4000 P/E cycles has value 1.

To evaluate the quality of the reconstructed voltage levels and analyze the spatio-temporal nature of flash memory channel, we summarize our evaluation approaches as follows.

1. Distribution: The frequency of occurrence of each voltage level given the program level and P/E cycle count is used to estimate the conditional probability of that level and time (a sketch of this estimation follows the list). We visualize the PDFs for measured data and reconstructed data. For quantitative comparison, we compare the distributions generated by our data-driven method with three statistical fitting methods [11], using the metric of level error count.

2. Inter-cell interference (ICI): For cells programmed to level 0 that suffer an error according to their voltage level, we compute the relative frequencies of the patterns of program levels in adjacent cells in the WL and BL directions. We visualize these relative error frequencies using pie charts.
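A sketch of the first metric, estimating the conditional PDF $P(\text{VL}\,|\,\text{PL}=\ell,\text{P/E})$ by binned relative frequencies; the arrays here are toy stand-ins for evaluation patches:

```python
import numpy as np

def conditional_pdf(vl, pl, level, bins=128, vrange=None):
    """Histogram-based estimate of the PDF of voltages for cells at one program level."""
    v = vl[pl == level]                     # voltages of cells programmed to `level`
    hist, edges = np.histogram(v, bins=bins, range=vrange, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, hist

# Toy stand-ins: voltages clustered by program level.
pl = np.random.randint(0, 8, size=(64, 64))
vl = np.random.randn(64, 64) + 10.0 * pl
x, pdf = conditional_pdf(vl, pl, level=1)
```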

Remark 3. We compared the cVAE-GAN model to other popular generative modeling architectures: conditional GAN [7], conditional VAE [17], and BicycleGAN [19]. The networks were adapted for the array sizes of program levels and voltage levels, as in Remark 1. We then compared the learned distributions with the measured distribution using the total variation distance $d_{TV}$,

$$d_{TV}(P_{real},P_{fake}) = \frac{1}{2}\sum_{\text{VL}}\big|P_{real}(\text{VL}) - P_{fake}(\text{VL})\big|.$$

The numerical results of dTVd_{TV} indicated that the cVAE-GAN achieved the smallest total variation distance with respect to the measured distributions. Therefore, we selected cVAE-GAN as our model for the present study.
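Computed over binned voltage-level distributions, $d_{TV}$ reduces to a one-line calculation:

```python
import numpy as np

def d_tv(p_real: np.ndarray, p_fake: np.ndarray) -> float:
    """Total variation distance between two discrete distributions that sum to one."""
    return 0.5 * np.abs(p_real - p_fake).sum()

print(d_tv(np.array([0.7, 0.2, 0.1]), np.array([0.6, 0.3, 0.1])))  # 0.1
```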

IV-A Distribution Analysis

To evaluate our learned model, we apply input arrays of program levels, collect the regenerated voltage levels, and count the frequency of occurrence of voltage levels over the voltage range. We then estimate the conditional PDFs of the voltages associated with each program level and given P/E cycle.

Fig. 4 shows the conditional PDFs for measured data and regenerated data in the evaluation dataset at three different time stamps. The $x$-axis represents the soft read voltages spanning a certain voltage level range. The $y$-axis represents the conditional PDF. We only show conditional PDFs for program levels 1 to 7 due to normalization problems with program level 0; a detailed analysis of level 0 is given in Section IV-B. We make two observations from Fig. 4. First, as the P/E cycle count increases, the peak of the distribution for each program level drops and the distribution becomes wider. This indicates that more voltage levels exceed the read thresholds and more errors will be detected, which is consistent with the increased error rate in Fig. 2. Second, our modeled data (solid curves) closely match the measured data (triangle markers) and, thus, capture the dependence on P/E cycles.

For a more quantitative assessment, we compare our generative model with three state-of-the-art statistical models: the Gaussian model [1], the Normal-Laplace model [14], and the Student's t-distribution model [11]. Following the optimization process used in [11] for an MLC flash device, we fit these statistical distributions to our measured TLC distributions. We minimize the KL divergence between the real distribution and the fake distribution, denoted $D_{KL}(P_{real},P_{fake})$, using the Nelder-Mead simplex method [12]. We obtain the best-fit parameters for all program levels, except $\text{PL}=0$, with each of the statistical distributions.
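As an illustration of this fitting procedure, the sketch below fits a Gaussian (the simplest of the three models) to a binned distribution by minimizing the KL divergence with Nelder-Mead; the binned target here is synthetic, and the Normal-Laplace and Student's t fits proceed analogously with more parameters:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Hypothetical binned "measured" distribution standing in for one program level.
centers = np.linspace(0.0, 10.0, 100)
p_real = norm.pdf(centers, loc=4.0, scale=1.2)
p_real /= p_real.sum()

def kl_to_gaussian(params):
    """D_KL(P_real || P_fake) for a Gaussian candidate with parameters (mu, sigma)."""
    mu, sigma = params
    p_fake = norm.pdf(centers, loc=mu, scale=abs(sigma) + 1e-9)
    p_fake = p_fake / p_fake.sum() + 1e-12
    return float(np.sum(p_real * np.log(p_real / p_fake)))

res = minimize(kl_to_gaussian, x0=[3.0, 1.0], method="Nelder-Mead")
print(res.x)  # best-fit (mu, sigma), close to (4.0, 1.2)
```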

We then compute the level error counts under each of the distributions and quantitatively compare those fake distributions with the real distributions, where the error count is a measure of the reliability and endurance of the flash memory. To calculate the level error counts from the distributions, we fix 7 default read thresholds, as shown by the dash-dotted vertical lines in Fig. 4. The hard read voltages are determined by comparing soft voltages to these thresholds. For instance, if the voltage level of a cell programmed to level 1 lies below the first threshold or above the second threshold, the hard read level of the cell will not be designated as 1. The error probability of $\text{PL}=1$ is

$$P(\text{VL} < V_{th(01)}\,|\,\text{PL}=1) + P(\text{VL} > V_{th(12)}\,|\,\text{PL}=1)$$

where $V_{th(01)}$ denotes the voltage threshold used to distinguish between level 0 and level 1, and $V_{th(12)}$ denotes the voltage threshold used to distinguish between level 1 and level 2.
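A sketch of the corresponding error count computation, with hypothetical threshold values; each hard level follows from the position of the soft voltage among the 7 ascending thresholds:

```python
import numpy as np

def level_errors(vl, pl, thresholds):
    """Count cells whose hard-read level differs from the intended program level."""
    hard = np.searchsorted(thresholds, vl.ravel())   # maps each voltage to a level 0..7
    return int(np.sum(hard != pl.ravel()))

thresholds = np.arange(5.0, 75.0, 10.0)              # 7 hypothetical ascending thresholds
pl = np.random.randint(0, 8, size=(64, 64))
vl = 10.0 * pl + 2.0 * np.random.randn(64, 64)       # toy soft voltages clustered by level
print(level_errors(vl, pl, thresholds))
```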

We present the level error counts for the five models as bar charts in Fig. 5. The $x$-axis represents the chosen P/E cycle count, and the label directly under each bar gives the corresponding model name. The $y$-axis corresponds to the normalized error count. At each P/E cycle count, for each model, the errors from the 7 program levels are stacked as one bar, whose height represents the total error count. For the measured distributions, we see that level 1 has the highest error count and that the total error count at 10000 P/E cycles is around $2.5\times$ that at 4000 P/E cycles.

For the statistical distributions, we observe that the Gaussian model does not estimate the errors well; this is because the tails of the actual distribution become heavier as the device is cycled to severe wear conditions. The Normal-Laplace model, on the other hand, takes the heavier tails into consideration and provides accurate estimates of the error counts at each P/E cycle. The Student's t-distribution overestimates the errors at these P/E cycles. For the machine learning approach, cVAE-GAN slightly overestimates the wear conditions at 7000 and 10000 P/E cycles, and at 4000 P/E cycles it produces more errors than the measured data. In conclusion, Normal-Laplace is the best statistical model for capturing the distributions of flash devices. Our cVAE-GAN performs better than Normal-Laplace at 7000 and 10000 P/E cycles but overestimates the errors at 4000 P/E cycles. However, as shown in the next section, the generative model can not only generate realistic-looking voltage distributions but also accurately learn spatial characteristics of the read voltages.

Overall, we conclude that cVAE-GAN can regenerate voltages as a function of P/E cycles and produce high-quality voltage levels according to visual and quantitative metrics.

IV-B Characterization of ICI Effects

As discussed in Section II-B, the read voltage of a cell may be adversely increased by ICI when the neighboring cells are programmed to high levels. Cells programmed to level 0 are the most susceptible to such ICI effects. Generating voltage levels with ICI effects is complicated due to the pattern-dependent and spatially-dependent distortions. We remark that classical statistical models focus on regeneration of the PDFs of the measured data and, as such, are not expected to be effective in capturing ICI effects.

We evaluate how well the generative model learns spatial ICI properties by examining errors associated with program level patterns $\text{PL}_{i,j-1}\,\text{PL}_{i,j}\,\text{PL}_{i,j+1}$ and $\text{PL}_{i-1,j}\,\text{PL}_{i,j}\,\text{PL}_{i+1,j}$ in the WL and BL directions, respectively. As observed in Fig. 2, the most error-prone patterns have a central victim cell programmed to level 0, i.e., $\text{PL}_{i,j}=0$. We consider pattern-dependent error probabilities in both directions. The error probability measures the relative frequency of occurrence of the WL and BL patterns when an error occurs in the victim cell. More precisely, we calculate the pattern-dependent error probabilities in the WL and BL directions,

$$P\big(\text{PL}_{i,j-1},\,\text{PL}_{i,j}=0,\,\text{PL}_{i,j+1}\,\big|\,\text{VL}_{i,j}>V_{th(01)},\,\text{PL}_{i,j}=0\big)$$
$$P\big(\text{PL}_{i-1,j},\,\text{PL}_{i,j}=0,\,\text{PL}_{i+1,j}\,\big|\,\text{VL}_{i,j}>V_{th(01)},\,\text{PL}_{i,j}=0\big).$$
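A sketch of this statistic in the BL direction (the WL direction is analogous with indices $j-1$ and $j+1$); the arrays and the threshold value are toy stand-ins:

```python
import numpy as np
from collections import Counter

def bl_error_pattern_freqs(pl, vl, vth01):
    """Relative frequency of (PL[i-1,j], PL[i+1,j]) pairs over erroneous level-0 cells."""
    counts = Counter()
    for i in range(1, pl.shape[0] - 1):
        for j in range(pl.shape[1]):
            if pl[i, j] == 0 and vl[i, j] > vth01:   # victim cell read in error
                counts[(pl[i - 1, j], pl[i + 1, j])] += 1
    total = sum(counts.values())
    return {pat: c / total for pat, c in counts.items()} if total else {}

# Toy stand-ins: voltages clustered by level, with a deliberately low 0/1 threshold.
pl = np.random.randint(0, 8, size=(64, 64))
vl = 10.0 * pl + np.random.randn(64, 64)
freqs = bl_error_pattern_freqs(pl, vl, vth01=1.0)
```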

The probabilities for measured and regenerated data are visualized as pie charts in Fig. 6. When we program an interior cell to level 0, there are 64 such patterns of program levels for the pair of adjacent cells in both WL and BL directions. Without ICI effects, the errors would occur randomly for all possible patterns.

In the measured data, the 23 listed patterns account for 55% of the errors in the WL direction and around 70% of the errors in the BL direction. The dominant error pattern in both WL and BL directions is 707. Comparing the areas of the 707 sectors, we find that the ICI effect of pattern 707 is less severe in the WL direction than in the BL direction.

As shown in Fig. 6, for the prevalent error patterns at 7000 P/E cycles, probabilities observed in the data generated by cVAE-GAN are very similar to those seen in the measured data. The only substantial discrepancy we observed is that the generative model underestimates the fraction of the 707 pattern in the WL direction. At 4000 (resp., 10000) P/E cycles, the model underestimates (resp., overestimates) the fraction of the 707 pattern in both directions. However, at all P/E cycles, cVAE-GAN generates the same rank ordering of pattern fractions as the measured data in both directions.

These results show that cVAE-GAN generally produces spatial distributions of voltage levels in a flash memory block that capture with good accuracy the effects of ICI in both vertical and horizontal directions.

Figure 6: Pie charts visualizing error-causing pattern probabilities according to measured voltages and cVAE-GAN generated voltages at 7000 P/E cycles. (a) Measured errors in WL (left) and BL (right), from 97921 total errors observed at $\text{PL}=0$. (b) cVAE-GAN errors in WL (left) and BL (right), from 989565 total errors observed at $\text{PL}=0$. The sector labeled “others” combines 41 lower-frequency patterns. The first column of pie charts corresponds to the WL direction and the second column to the BL direction.

V Conclusion

In this paper, we explored the use of conditional generative networks to model the flash memory channel. Unlike traditional modeling and analysis, our model can generate “realistic” soft voltage levels from program levels, not only representing accurate statistical distributions among different P/E cycle counts but also preserving spatial ICI effects. Our generative model marks a first step in capturing the spatio-temporal characteristics of flash devices.

Acknowledgment

The authors would like to thank Zachary Blair and Yi Liu for the flash memory test platform used in this study. The authors would also like to acknowledge very helpful discussions with Naoaki Kokubun, Sarah Ekaireb, and Shengqiu Jin.

References

  • [1] Y. Cai, E. F. Haratsch, O. Mutlu, and K. Mai, “Threshold voltage distribution in MLC NAND flash memory: Characterization, analysis, and modeling,” in Proc. Design, Automation & Test in Europe Conf. & Exhib., Grenoble, France, Mar. 2013.
  • [2] Y. Cai, E. F. Haratsch, O. Mutlu, and K. Mai, “Error patterns in MLC NAND flash memory: Measurement, characterization, and analysis,” in Proc. Design, Automation & Test in Europe Conf. & Exhib., Dresden, Germany, Mar. 2012.
  • [3] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Proc. Neural Inf. Process. Syst. (NIPS), Montréal, Canada, Dec. 2014, pp. 2672–2680.
  • [4] A. Hareedy, B. Dabak, and R. Calderbank, “Managing device lifecycle: Reconfigurable constrained codes for M/T/Q/P-LC flash memories,” IEEE Trans. Inf. Theory, vol. 67, no. 1, pp. 282–295, Oct. 2020.
  • [5] A. Hareedy, S. Zheng, P. H. Siegel, and R. Calderbank, “Read-and-run constrained coding for modern flash devices,” in Proc. IEEE Int. Conf. Commun. (ICC), Seoul, South Korea, May 2022.
  • [6] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Las Vegas, NV, USA, July 2016, pp. 770–778.
  • [7] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Honolulu, HI, USA, July 2017, pp. 1125–1134.
  • [8] D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” in Proc. Int. Conf. Represent. Learn. (ICLR), Banff, Canada, Apr. 2014.
  • [9] A. B. L. Larsen, S. K. Sønderby, H. Larochelle, and O. Winther, “Autoencoding beyond pixels using a learned similarity metric,” in Proc. Int. Conf. Mach. Learn. (ICML), New York, NY, USA, June 2016.
  • [10] Z. Liu, Y. Liu, and P. H. Siegel, “Generative modeling of NAND flash memory voltage level,” in Non-Volatile Memories Workshop (NVMW), CA, USA, Mar. 2021.
  • [11] Y. Luo, S. Ghose, Y. Cai, E. F. Haratsch, and O. Mutlu, “Enabling accurate and practical online flash channel modeling for modern MLC NAND flash memory,” IEEE J. Select. Areas Commun., vol. 34, no. 9, pp. 2294–2311, Sept. 2016.
  • [12] J. A. Nelder and R. Mead, “A simplex method for function minimization,” The Computer Journal, vol. 7, no. 4, pp. 308–313, 1965.
  • [13] N. Papandreou, H. Pozidis, T. Parnell, N. Ioannou, R. Pletka, S. Tomic, P. Breen, G. Tressler, A. Fry, and T. Fisher, “Characterization and analysis of bit errors in 3D TLC NAND flash memory,” in Proc. IEEE Int. Reliability Phys. Symp. (IRPS), Monterey, CA, USA, Apr. 2019.
  • [14] T. Parnell, N. Papandreou, T. Mittelholzer, and H. Pozidis, “Modelling of the threshold voltage distributions of sub-20nm NAND flash memory,” in Proc. IEEE Global Commun. Conf., Dec. 2014, pp. 2351–2356.
  • [15] M. Qin, E. Yaakobi, and P. H. Siegel, “Constrained codes that mitigate inter-cell interference in read/write cycles for flash memories,” IEEE J. Select. Areas Commun., vol. 32, no. 5, pp. 836–846, May 2014.
  • [16] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in Proc. Int. Conf. Med. Image Comput. Comput.-Assisted Intervention (MICCAI), 2015.
  • [17] K. Sohn, H. Lee, and X. Yan, “Learning structured output representation using deep conditional generative models,” in Proc. Neural Inf. Process. Syst. (NIPS), Montréal, Canada, Dec. 2015.
  • [18] V. Taranalli, H. Uchikawa, and P. H. Siegel, “Channel models for multi-level cell flash memories based on empirical error analysis,” IEEE Trans. Commun., vol. 64, no. 8, pp. 3169–3181, Aug. 2016.
  • [19] J.-Y. Zhu, R. Zhang, D. Pathak, T. Darrell, A. A. Efros, O. Wang, and E. Shechtman, “Toward multimodal image-to-image translation,” in Proc. Neural Inf. Process. Syst. (NIPS), Long Beach, CA, USA, Dec. 2017, pp. 465–476.