Spatio-Temporal Modeling for Flash Memory Channels Using Conditional Generative Nets
Abstract
We propose a data-driven approach to modeling the spatio-temporal characteristics of NAND flash memory read voltages using conditional generative networks. The learned model reconstructs read voltages from an individual memory cell based on the program levels of the cell and its surrounding cells, as well as the specified program/erase (P/E) cycling time stamp. We evaluate the model over a range of time stamps using the cell read voltage distributions, the cell level error rates, and the relative frequency of errors for patterns most susceptible to inter-cell interference (ICI) effects. We conclude that the model accurately captures the spatial and temporal features of the flash memory channel.
Index Terms:
Machine learning, Flash memory channel, Generative modeling, Spatio-temporal analysis.
I Introduction
Realistic models for digital communication and storage channels are essential tools in the design and optimization of signal processing, detection, and coding algorithms. For NAND flash memories, the steady reduction in technology feature size and the increase in cell bit-density have been accompanied by diminished memory reliability and reduced device endurance. Sources of errors are manifold, including programming errors, inter-cell interference (ICI), cell wear during program/erase (P/E) cycling, cell charge loss due to data retention, and read disturb effects. Consequently, there is a critical need for comprehensive models that capture the complex behavior of the flash memory channel and accurately simulate the spatial and temporal characteristics of the read voltages.
Several models of voltage distributions in NAND flash, supported by empirical measurements or simulated voltages, have appeared in the literature. Cai et al. [1] model the voltage distribution in 2-bit per cell MLC flash devices as a Gaussian distribution. Parnell et al. [14] proposed a parameterized Normal-Laplace mixture model that more accurately describes MLC flash read voltage distributions. Luo et al. [11] proposed another accurate and computationally more efficient model for MLC flash, based on a modified version of the Student’s t-distribution and a temporal power law. Liu et al. [10] used a neural network to model simulated read voltages as a function of P/E cycles for one individual program level in isolated MLC flash cells. In addition, statistical analyses of hard bit errors [18, 13] and the characterization of dominant error patterns [2] offer valuable empirical insight into flash memories. However, effective as these models have been in the scenarios to which they have been applied, none provides an accurate model of the complex spatial and temporal characteristics of flash memory read voltages.
Our goal in this paper is to use machine-learning techniques to develop a comprehensive and accurate model of the flash memory channel, with the aim of capturing both spatial ICI effects and temporal distortions arising from P/E cycling.
Recently, generative modeling techniques [3, 8] have been successfully applied to image processing [7]. In view of the demonstrated power of neural networks in learning complex multidimensional distributions, we propose the use of conditional generative nets as an approach to modeling flash memory read voltages in space and time. We adopt the conditional VAE-GAN as our training architecture, where the fusion of the variational autoencoder (VAE) [8] and the generative adversarial network (GAN) [3] can leverage the information from the latent space to produce high-quality, accurate reconstruction with the help of the discriminator.
We train the conditional VAE-GAN network to regenerate the soft read voltage levels from program levels, using a dataset of measurements from a 1X-nm, 3-bit per cell TLC NAND flash memory. By incorporating the exact P/E cycle counts into the architecture, we are able to control the regenerated voltage levels at a specified time stamp.
We formulate a novel generative modeling workflow to generate soft read voltages from all program levels over a range of P/E cycles. When we train the network on measured data, this conditional method can regenerate realistic cell-level read voltages on an array of flash cells from a corresponding array of program levels and a given P/E cycle. To our knowledge, this is the first work that models both temporal features (P/E cycles) and spatial characteristics (ICI effects). We use two evaluation metrics to compare the model results to measured data: the read voltage distributions, i.e., the estimated probability density functions (PDFs), at different time stamps, and the relative frequencies of spatially-dependent and pattern-dependent ICI-induced hard read errors. Details of these experimental comparisons are provided in Section IV.
II Flash Memory Basics
II-A Flash Structure and Experimental Procedure

The basic unit of data storage in NAND flash memory is a floating-gate transistor, referred to as a cell. Today’s flash memories are capable of storing single or multiple bits (e.g., 2 to 5 bits) per cell, where the $2^b$ possible $b$-bit strings correspond to $2^b$ program levels. The cells are organized into an interconnected two-dimensional (2-D) array, called a block, via horizontal wordlines (WLs) and vertical bitlines (BLs). The flash memory chip is composed of a collection of such blocks. Fig. 1 depicts a schematic diagram of a planar TLC flash memory block and an example of a mapping from program levels to binary bits.
The basic unit of write (i.e., program) and read operations in flash memory is a page, corresponding to a logical bit position in a wordline of a block. On the other hand, the basic unit of an erase operation is an entire block.
To characterize the channel and create datasets for machine learning, we conducted program/erase (P/E) cycling experiments on several blocks of a commercial 1X-nm TLC flash memory chip. During each P/E cycle, the selected block is first erased, and then pseudo-random data are programmed into the successive pages in the block. At 4000, 7000, and 10000 cycles, we perform read operations on the tested block. For each cell, we then record the program level and the measured voltage level. The P/E cycling experiments were performed at room temperature in a continuous manner, with no wait time between the erase, program, and read operations.
In the program operation, we refer to the program levels of three consecutive cells along a WL or a BL as a pattern. As an example, in Fig. 1, the program levels of three consecutive cells along one BL correspond to the bit strings “011”, “111”, and “011”, which we associate with the pattern 707 in the vertical (BL) direction.
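To make the pattern definition concrete, the sketch below reads off the WL and BL patterns around one cell of a 2-D array of program levels. It is a minimal illustration, assuming (hypothetically) that rows of the array are wordlines and columns are bitlines; `extract_patterns` is an illustrative helper, not part of the authors' pipeline.

```python
import numpy as np

def extract_patterns(levels: np.ndarray, row: int, col: int) -> tuple[str, str]:
    """Return the (WL, BL) patterns of the interior cell at (row, col).

    Assumed layout (hypothetical): rows of `levels` are wordlines and columns
    are bitlines, so a WL pattern runs along a row and a BL pattern along a column.
    """
    n_rows, n_cols = levels.shape
    if not (0 < row < n_rows - 1 and 0 < col < n_cols - 1):
        raise ValueError("cell must have neighbors on both sides in each direction")
    wl = levels[row, col - 1 : col + 2]  # left neighbor, cell, right neighbor (same WL)
    bl = levels[row - 1 : row + 2, col]  # upper neighbor, cell, lower neighbor (same BL)
    return "".join(str(v) for v in wl), "".join(str(v) for v in bl)


# Toy 3x3 array of TLC program levels (values 0..7), not measured data.
levels = np.array([[3, 7, 2],
                   [5, 0, 1],
                   [4, 7, 6]])
print(extract_patterns(levels, 1, 1))  # ('501', '707')
```

Here the central level-0 cell sits in a 707 pattern along its BL, the high-low-high case that Section II-B identifies as most ICI-prone.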
II-B Spatio-temporal Characteristics
Repeated P/E operations induce wear on the flash cells, and high-low-high program levels in three consecutive cells often lead to severe ICI effects. These distortions reflect the spatio-temporal nature of the flash memory channel. As wear accumulates, the level error rate increases with the P/E cycle count, as seen in Fig. 2.
The spatial ICI effect refers to the phenomenon in which programming a cell induces changes in the voltage levels of neighboring cells within its block. In particular, the read voltage of a cell programmed to a low level may be inadvertently increased if its adjacent cells are programmed to high levels, i.e., when the programming pattern is high-low-high. For example, if we program a 707 pattern in a TLC flash memory, the read voltage of the low central cell may be increased by its high adjacent cells. During data detection, the central “victim” cell may then be read as an incorrect program level. These severe ICI effects can be observed in Fig. 2. At each P/E cycle count, we see that the cell errors are not randomly distributed; they clearly depend on the neighboring program levels. The 9 patterns shown are the most error-prone in both the WL and BL directions. Pattern 707 in the BL direction is the most severely affected by ICI. Moreover, patterns 707, 706, and 607 are more error-prone in the BL direction than in the WL direction.
We note that, to mitigate the effects of ICI in flash memory, the use of constrained codes that prevent the appearance of ICI-prone patterns has been proposed; see, for example, [15, 4, 5]. Accurate modeling of the dependence of WL and BL pattern errors on the P/E cycle count can be a valuable tool to help researchers design efficient, time-aware constrained codes.

III Generative Modeling for Flash Memory
In this section, we discuss our generative modeling pipeline for the flash memory channel. We adopt a conditional VAE-GAN (cVAE-GAN) architecture [9] for our pipeline, depicted in Fig. 3. Our goal is to learn a mapping between program levels and soft read voltage levels at various P/E cycles, where the reconstructed voltage levels accurately reflect the spatial and temporal nature of the channel.

III-A Generative Flash Modeling
Using the P/E cycling experiment outlined in Section II-A, we collect paired channel instances at specific P/E cycles, where the channel instances are denoted as $(\mathrm{PL}, \mathrm{VL}, \mathrm{P/E})$. Here, we use PL, VL, and P/E to denote the input program level of the channel, the output voltage level of the channel, and the P/E cycle count, respectively. The goal of channel reconstruction and generative modeling in this paper is to learn the analytically intractable likelihood $p(\mathrm{VL} \mid \mathrm{PL}, \mathrm{P/E})$, with the aim of capturing the spatio-temporal nature of the flash memory channel.
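Fig. 3 summarizes the architecture of the generative modeling pipeline. The conditional VAE-GAN architecture consists of three components: an encoder ($E$), a generator ($G$), and a discriminator ($D$). The encoder maps the read voltages at a specific P/E cycle count to the latent vector $z$, replacing the prior distribution in the GAN with the learned posterior distribution $q(z \mid \mathrm{VL}, \mathrm{P/E})$. The decoder in the VAE shares its weights with the GAN generator [3]. In the conditional setting, the variational lower bound of $\log p(\mathrm{VL} \mid \mathrm{PL}, \mathrm{P/E})$ can be derived as

$$\log p(\mathrm{VL} \mid \mathrm{PL}, \mathrm{P/E}) \geq \mathbb{E}_{z \sim q(z \mid \mathrm{VL}, \mathrm{P/E})}\left[\log p(\mathrm{VL} \mid \mathrm{PL}, z, \mathrm{P/E})\right] - D_{\mathrm{KL}}\left(q(z \mid \mathrm{VL}, \mathrm{P/E}) \,\|\, p(z)\right),$$

where $D_{\mathrm{KL}}(\cdot \,\|\, \cdot)$ represents the Kullback-Leibler (KL) divergence. The distribution of the latent vector $q(z \mid \mathrm{VL}, \mathrm{P/E})$ is trained to approach $p(z)$ via the KL loss $\mathcal{L}_{\mathrm{KL}}(E) = \mathbb{E}_{\mathrm{VL}}\left[D_{\mathrm{KL}}\left(q(z \mid \mathrm{VL}, \mathrm{P/E}) \,\|\, p(z)\right)\right]$, where $p(z)$ is assumed to be the standard Gaussian distribution $\mathcal{N}(0, I)$.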
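The generator $G$ takes both the learned latent vector $z$ and PL as input and generates a “fake” voltage array $\widehat{\mathrm{VL}} = G(\mathrm{PL}, z, \mathrm{P/E})$. The latent vectors are sampled from $q(z \mid \mathrm{VL}, \mathrm{P/E})$ using the re-parameterization trick [8]. By sampling different latent vectors from the same distribution, we can generate multiple arrays of plausible voltage levels; the variations among these output arrays for a given array of program levels reflect the stochasticity of the channel. The discriminator $D$ distinguishes real pairs $(\mathrm{PL}, \mathrm{VL})$ from fake pairs $(\mathrm{PL}, \widehat{\mathrm{VL}})$. The loss in the conditional GAN part is

$$\mathcal{L}_{\mathrm{GAN}}(G, D, E) = \mathbb{E}_{\mathrm{PL}, \mathrm{VL}}\left[\log D(\mathrm{PL}, \mathrm{VL})\right] + \mathbb{E}_{\mathrm{PL}, \mathrm{VL}, z \sim q(z \mid \mathrm{VL}, \mathrm{P/E})}\left[\log\left(1 - D\left(\mathrm{PL}, G(\mathrm{PL}, z, \mathrm{P/E})\right)\right)\right].$$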
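The re-parameterization step can be sketched in a few lines. This is a NumPy illustration, assuming the encoder outputs a mean and a log-variance per latent dimension (the log-variance parameterization is our assumption); in a real training loop the same expression would be written in an autodiff framework so gradients reach the encoder.

```python
import numpy as np

def reparameterize(mu: np.ndarray, log_var: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Draw z = mu + sigma * eps with eps ~ N(0, I).

    Assumes the encoder outputs a log-variance; in an autodiff framework this
    form lets gradients flow through mu and log_var while eps stays random.
    """
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


rng = np.random.default_rng(0)
mu = np.zeros(6)        # 6-dim latent vector, matching Section III-C
log_var = np.zeros(6)   # unit variance, for illustration only
z_samples = [reparameterize(mu, log_var, rng) for _ in range(3)]
# Each z would be fed to G with the same PL array, giving three different VL arrays.
```

Feeding several such samples with one fixed PL array is what produces the channel-like variability described above.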
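Similar to VAE-GAN [9], we encourage the reconstructed voltage levels to match the authentic voltage levels, using the $\ell_1$-norm to measure the reconstruction loss

$$\mathcal{L}_{1}(G, E) = \mathbb{E}_{\mathrm{PL}, \mathrm{VL}, z \sim q(z \mid \mathrm{VL}, \mathrm{P/E})}\left[\left\| \mathrm{VL} - G(\mathrm{PL}, z, \mathrm{P/E}) \right\|_1\right].$$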
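Combining these equations, we formulate the loss function of the cVAE-GAN architecture as

$$G^{*}, E^{*} = \arg\min_{G, E} \max_{D} \; \mathcal{L}_{\mathrm{GAN}}(G, D, E) + \lambda \mathcal{L}_{1}(G, E) + \lambda_{\mathrm{KL}} \mathcal{L}_{\mathrm{KL}}(E). \qquad (1)$$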
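A mini-batch estimate of the three terms of objective (1) can be sketched as follows. This is an illustrative NumPy version, not the authors' code: `cvae_gan_objective` is a hypothetical helper, the discriminator outputs are assumed to be probabilities in (0, 1), and the default weights `lam` and `lam_kl` are placeholders rather than the values used in the paper.

```python
import numpy as np

def cvae_gan_objective(d_real, d_fake, vl, vl_hat, mu, log_var,
                       lam=10.0, lam_kl=0.01, eps=1e-8):
    """Mini-batch estimate of the objective in (1).

    d_real = D(PL, VL) and d_fake = D(PL, VL_hat) are discriminator outputs in (0, 1).
    lam and lam_kl are placeholder weights, not the values used in the paper.
    D is trained to increase this value; G and E are trained to decrease it.
    """
    # Conditional GAN term: E[log D(PL, VL)] + E[log(1 - D(PL, VL_hat))]
    l_gan = np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps))
    # l1 reconstruction term: per-array l1 norm, averaged over the batch
    l_1 = np.mean(np.sum(np.abs(vl - vl_hat), axis=tuple(range(1, vl.ndim))))
    # Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I)), averaged over the batch
    l_kl = np.mean(0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1))
    return l_gan + lam * l_1 + lam_kl * l_kl
```

Only the GAN term depends on $D$, so the min-max in (1) amounts to alternating updates: $D$ ascends the value, while $G$ and $E$ descend it.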
III-B Spatio-temporal Combination
Aiming to capture the spatial ICI effects in the channel model, we implement the generator $G$ using a convolutional neural network (CNN), in which each VL is reconstructed from the PL values of its own cell and of its neighboring cells. To generate VL at an explicit P/E cycle count, we condition the generator on an additional temporal factor by incorporating the explicit P/E cycle count into $G$.
We first encode the normalized P/E cycle count into a $k$-dimensional P/E vector, which contains several powers of the normalized P/E cycle count. Then, we spatially replicate the $k$-dimensional P/E vector to a feature map of size $H \times W \times k$ and concatenate it with the feature of size $H \times W \times C$ from each layer in $G$, where $H \times W$ is the spatial dimension of the feature from each convolutional layer and $C$ is the number of channels in the CNN. This channel-wise combination produces a final feature of size $H \times W \times (C + k)$ at each layer. Fusing the features from the program levels with the P/E feature maps allows the reconstructed voltage levels to reflect both the spatial and the temporal characteristics of the channel.
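The spatio-temporal combination can be sketched in NumPy as below. The exact powers used in the P/E vector are not recoverable here, so `encode_pe` assumes (hypothetically) the powers $t, t^2, \ldots, t^k$ of $t = \mathrm{P/E}/10000$; the choice $k = 6$ follows Section III-C, and the channels-first layout $(C, H, W)$ is also our assumption.

```python
import numpy as np

def encode_pe(pe_count: float, max_pe: float = 10000.0, k: int = 6) -> np.ndarray:
    """Hypothetical P/E encoding: powers t, t^2, ..., t^k of t = pe_count / max_pe."""
    t = pe_count / max_pe
    return np.array([t ** (i + 1) for i in range(k)], dtype=np.float32)

def fuse_pe(features: np.ndarray, pe_vec: np.ndarray) -> np.ndarray:
    """Replicate pe_vec spatially and concatenate it channel-wise.

    features: (C, H, W) feature map from one generator layer.
    Returns an array of shape (C + k, H, W).
    """
    _, h, w = features.shape
    pe_map = np.broadcast_to(pe_vec[:, None, None], (pe_vec.shape[0], h, w))
    return np.concatenate([features, pe_map], axis=0)


fused = fuse_pe(np.zeros((64, 8, 8)), encode_pe(5000))
print(fused.shape)  # (70, 8, 8)
```

Because every spatial position receives the same P/E values, the convolutions that follow can modulate their output by time stamp without altering the spatial neighborhood structure.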
III-C Implementation Details
We implement and evaluate our method using the recorded data in a commercial TLC flash chip. With the generative modeling framework, we believe this data-driven approach can be flexibly applied to flash memories of any technology generation and scale. In TLC flash memory, each block contains hundreds of pages, each of which is typically 8-16 kB in size. In order to represent the TLC flash memory without bias, we collect data from several blocks of one 1X-nm TLC chip at selected P/E cycle counts. We crop the blocks into non-overlapping 2-D arrays to formulate our paired data. The number of 2-D arrays in the training set is ( for each P/E cycle) and the size of the evaluation dataset is ( for each P/E cycle). The dimensions of latent vector and P/E cycle vector are both set to 6.
Remark 1. We adapt the conditional VAE-GAN in [19] for input arrays of program levels and output arrays of voltage levels, each of which has a single channel. The three network modules in Fig. 3 are: ResNet [6] ($E$), U-Net [16] ($G$), and PatchGAN [7] ($D$). The following descriptions of the modules use the terminology of the corresponding references.
1. Encoder ($E$): We use two residual blocks, each of which contains two convolutional layers with stride 1 and padding 1. We then add two linear layers, which map the output features to the mean and variance of the latent vector.
2. Generator ($G$): C$k$ denotes a Convolution-BatchNorm-ReLU layer with $k$ output channels. All convolutions are $4 \times 4$ kernels applied with stride 2 and padding 1. The network architecture before spatio-temporal combination can be described as
where we inject the latent vector by replication and concatenation into every layer in the “Down” part [19], and each layer in the “Up” part receives skip connections from the corresponding layer in the “Down” part [16].
3. Discriminator ($D$): The input to the discriminator is the concatenation of the (real or fake) voltage levels and the program levels. With the same naming convention as in the generator, we express the discriminator as .
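The effect of the stride and padding settings on feature-map size follows from standard convolution arithmetic, sketched below. The $4 \times 4$ kernel is an assumption carried over from the pix2pix generator [7], the $3 \times 3$ kernel in the encoder check is likewise assumed, and the 64-cell input size is purely illustrative because the crop size is not given here.

```python
def conv_out_size(n_in: int, kernel: int = 4, stride: int = 2, padding: int = 1) -> int:
    """Output length of a convolution along one axis: floor((n + 2p - k) / s) + 1."""
    return (n_in + 2 * padding - kernel) // stride + 1


# Generator "Down" path: each assumed 4x4, stride-2, padding-1 layer halves H and W.
sizes = [64]  # illustrative input size, not the paper's crop size
while sizes[-1] > 1:
    sizes.append(conv_out_size(sizes[-1]))
print(sizes)  # [64, 32, 16, 8, 4, 2, 1]

# Encoder residual blocks: an assumed 3x3, stride-1, padding-1 layer keeps the size.
print(conv_out_size(64, kernel=3, stride=1, padding=1))  # 64
```

The exact halving at each "Down" layer is what lets every "Up" layer receive a skip connection of matching spatial size.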
Remark 2. The training details of the generative modeling methods are as follows. The Adam optimizer is used with learning rate . Parameters in the loss function (1) are set to and . We train the conditional VAE-GAN for 7 epochs with batch size 2.
During evaluation, we use program levels and latent vector sampled from a standard Gaussian distribution. For each program level array, we prepare different sampled latent vectors to evaluate the learned model.
IV Experimental Results





To evaluate the quality of the reconstructed voltage levels and analyze the spatio-temporal nature of flash memory channel, we summarize our evaluation approaches as follows.
1. Distribution: The frequency of occurrence of each voltage level, given the program level and P/E cycle count, is used to estimate the conditional PDF of the read voltage for that program level and time stamp. We visualize the PDFs for the measured and reconstructed data. For a quantitative comparison, we compare the distributions generated by our data-driven method with those of three statistical models fitted following [11], using the level error count as the metric.
2. Inter-cell interference (ICI): For cells programmed to level 0 that are read in error according to their voltage level, we compute the relative frequencies of the program-level patterns formed with the adjacent cells in the WL and BL directions. We visualize these relative error frequencies using pie charts.
Remark 3. We compared the cVAE-GAN model to other popular generative modeling architectures: conditional GAN [7], conditional VAE [17], and BicycleGAN [19]. The networks were adapted for the array sizes of program levels and voltage levels, as in Remark 1. We then compared the learned distributions with the measured distribution using the total variation distance

$$\delta(P, Q) = \frac{1}{2} \sum_{v} \left| P(v) - Q(v) \right|,$$

where $P$ and $Q$ denote the measured and learned voltage distributions, respectively. The numerical results indicated that cVAE-GAN achieved the smallest total variation distance with respect to the measured distributions. Therefore, we selected cVAE-GAN as our model for the present study.
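For discrete voltage histograms, the total variation distance reduces to half the $\ell_1$ distance between the normalized histograms. The helper below is a minimal sketch of that computation; it assumes both histograms are defined over the same voltage bins.

```python
import numpy as np

def total_variation(p, q) -> float:
    """delta(P, Q) = 0.5 * sum_v |P(v) - Q(v)| for histograms over the same bins."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()  # normalize raw counts to probabilities
    return 0.5 * float(np.abs(p - q).sum())


print(total_variation([3, 1], [1, 1]))  # 0.25
```

The distance lies in [0, 1]: it is 0 for identical distributions and 1 for distributions with disjoint support.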
IV-A Distribution Analysis
As we evaluate our learned model using input arrays of program levels, we collect regenerated voltage levels and count the frequency of occurrence of voltage levels over the voltage range. We then estimate the conditional PDFs of voltages associated with each program level and given P/E cycle.
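The PDF estimation step can be sketched as a per-level normalized histogram. This is an illustration under stated assumptions: `conditional_pdfs` is a hypothetical helper, the bin edges are chosen by the caller, and it would be called separately on the cells of each P/E cycle count.

```python
import numpy as np

def conditional_pdfs(pl, vl, bins):
    """Histogram estimate of p(v | PL = level) for every program level present.

    pl, vl: arrays of program levels and voltages for cells at one P/E cycle count.
    Returns {level: (density, bin_edges)}; each density integrates to 1.
    """
    pl = np.ravel(pl)
    vl = np.ravel(vl)
    pdfs = {}
    for level in np.unique(pl):
        density, edges = np.histogram(vl[pl == level], bins=bins, density=True)
        pdfs[int(level)] = (density, edges)
    return pdfs


# Toy example: two cells at level 1 and two at level 2, four bins on [0, 1].
pdfs = conditional_pdfs([1, 1, 2, 2], [0.1, 0.3, 0.6, 0.8], bins=np.linspace(0.0, 1.0, 5))
print(pdfs[1][0])  # [2. 2. 0. 0.]
```

Applying the same bins to measured and regenerated voltages makes the resulting curves directly comparable.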
Fig. 4 shows the conditional PDFs for the measured data and the regenerated data in the evaluation dataset at three different time stamps. The $x$-axis represents the soft read voltages spanning a certain voltage range, and the $y$-axis represents the conditional PDF. We only show the conditional PDFs for program levels 1 to 7 because of normalization problems with program level 0; a detailed analysis of level 0 is given in Section IV-B. We make two observations from Fig. 4. First, as the P/E cycle count increases, the peak of the distribution for each program level drops and the distribution becomes wider. This indicates that more voltage levels cross the read thresholds and more errors will occur, which is consistent with the increased error rate in Fig. 2. Second, our modeled data (solid curves) closely match the measured data (triangle markers) and thus capture the dependence on P/E cycles.
For a more quantitative assessment, we compare our generative model with three state-of-the-art statistical models: the Gaussian model [1], the Normal-Laplace model [14], and the Student’s t-distribution model [11]. Following the optimization procedure used in [11] for an MLC flash device, we fit these statistical distributions to our measured TLC distributions. We minimize the KL divergence between the real (measured) distribution $P$ and the fake (fitted) distribution $Q$ using the Nelder-Mead simplex method [12], where the KL divergence is

$$D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{v} P(v) \log \frac{P(v)}{Q(v)}.$$

We obtain the best-fit parameters for all program levels except level 0 with each of the statistical distributions.
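The fitting procedure can be sketched for the simplest of the three models, the Gaussian. This is an illustrative version, not the authors' code: it assumes SciPy is available, takes the KL direction as $D_{\mathrm{KL}}(P \,\|\, Q)$ with $P$ the measured histogram, and uses a moment-based starting point of our own choosing. The Normal-Laplace and Student's t models would replace only the bin-mass computation inside `objective`.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """Discrete D_KL(P || Q) over shared histogram bins (eps guards log(0))."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def fit_gaussian_kl(hist: np.ndarray, edges: np.ndarray) -> tuple[float, float]:
    """Fit N(mu, sd^2) to a measured histogram by Nelder-Mead minimization of KL."""
    p = hist / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    mu0 = np.sum(p * centers)                          # moment-based starting point
    sd0 = np.sqrt(np.sum(p * (centers - mu0) ** 2))

    def objective(theta):
        mu, log_sd = theta                             # log-sd keeps the scale positive
        q = np.diff(norm.cdf(edges, loc=mu, scale=np.exp(log_sd)))  # model bin masses
        return np.inf if q.sum() <= 0 else kl_divergence(p, q)

    res = minimize(objective, x0=[mu0, np.log(sd0)], method="Nelder-Mead")
    return float(res.x[0]), float(np.exp(res.x[1]))


# Synthetic check on Gaussian samples (not flash measurements).
rng = np.random.default_rng(0)
hist, edges = np.histogram(rng.normal(2.0, 0.5, 200_000), bins=np.linspace(0.0, 4.0, 81))
mu_hat, sd_hat = fit_gaussian_kl(hist, edges)
```

Nelder-Mead needs no gradients, which is convenient when the model's bin masses come from CDFs that are awkward to differentiate, as with a Normal-Laplace mixture.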
We then compute the level error counts under each of the distributions and quantitatively compare the fake distributions with the real distributions, where the error count is a measure of the reliability and endurance of the flash memory. To calculate the level error counts from the distributions, we fix 7 default read thresholds, shown by the dash-dotted vertical lines in Fig. 4. The hard read levels are determined by comparing the soft voltages to these thresholds. For instance, if the voltage of a cell programmed to level 1 lies below the first threshold or above the second threshold, the cell will not be read as level 1, and an error occurs. The error probability of program level 1 is

$$P_e(1) = \int_{-\infty}^{t_1} p(v \mid \mathrm{PL} = 1)\, dv + \int_{t_2}^{\infty} p(v \mid \mathrm{PL} = 1)\, dv,$$

where $t_1$ denotes the voltage threshold used to distinguish between level 0 and level 1, and $t_2$ denotes the voltage threshold used to distinguish between level 1 and level 2.
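On cell-level data, the same hard-decision rule can be applied directly to voltages. The sketch below counts errors per program level; `level_error_counts` is a hypothetical helper, and any threshold values passed to it are placeholders, since the default read thresholds are only shown graphically in Fig. 4.

```python
import numpy as np

def level_error_counts(pl, vl, thresholds, num_levels: int = 8) -> np.ndarray:
    """Count hard-read errors per program level.

    thresholds: the 7 read thresholds t_1 < ... < t_7 (placeholder values here).
    A voltage v is read as level i when t_i <= v < t_{i+1} (t_0 = -inf, t_8 = +inf).
    """
    pl = np.ravel(np.asarray(pl, dtype=np.int64))
    hard = np.searchsorted(np.asarray(thresholds), np.ravel(vl), side="right")
    wrong = hard != pl
    return np.bincount(pl[wrong], minlength=num_levels)


counts = level_error_counts([1, 1, 1, 0, 7], [0.5, 1.5, 2.5, 0.2, 7.5],
                            thresholds=[1, 2, 3, 4, 5, 6, 7])
print(counts)  # [0 2 0 0 0 0 0 0]
```

Dividing `counts[1]` by the number of cells programmed to level 1 gives an empirical estimate of $P_e(1)$; summing all entries gives the height of one stacked bar.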
We present the level error counts for the five models as bar charts in Fig. 5. The $x$-axis represents the chosen P/E cycle count, and the label directly under each bar gives the corresponding model name. The $y$-axis corresponds to the normalized error count. At each P/E cycle count, for each model, the errors from the 7 program levels are stacked into one bar, so the height of each stacked bar represents the total error count. For the measured distributions, we see that level 1 has the highest error count and that the total error count at 10000 P/E cycles is around 2.5 times that at 4000 P/E cycles.
For the statistical distributions, we observe that the Gaussian model does not estimate the errors well; this is because the tails of the actual distribution become heavier as the device is cycled to more severe conditions. The Normal-Laplace model, on the other hand, accounts for the heavier tails and provides accurate estimates of the error counts at each P/E cycle count. The Student’s t-distribution overestimates the errors at these P/E cycle counts. For the machine learning approach, cVAE-GAN slightly overestimates the wear at 7000 and 10000 P/E cycles, and at 4000 P/E cycles it produces more errors than the measured data. In conclusion, Normal-Laplace is the best of the three statistical models at capturing the distributions of our flash device. Our cVAE-GAN works better than Normal-Laplace at 7000 and 10000 P/E cycles but overestimates the errors at 4000 P/E cycles. (However, as shown in the next section, the generative model not only generates realistic-looking voltage distributions but also accurately learns the spatial characteristics of the read voltages.)
Overall, we conclude that cVAE-GAN can regenerate voltages as a function of P/E cycles and produce high-quality voltage levels according to visual and quantitative metrics.
IV-B Characterization of ICI Effects
As discussed in Section II-B, the read voltage of a cell may be adversely increased by ICI when the neighboring cells are programmed to high levels. Cells programmed to level 0 are the most susceptible to such ICI effects. Generating voltage levels with ICI effects is complicated due to the pattern-dependent and spatially-dependent distortions. We remark that classical statistical models focus on regeneration of the PDFs of the measured data and, as such, are not expected to be effective in capturing ICI effects.
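We evaluate how well the generative model learns spatial ICI properties by examining the errors associated with program-level patterns in the WL and BL directions. As we observed in Fig. 2, the most error-prone patterns have a central victim cell at level 0, i.e., they are of the form $a0b$ with $a, b \in \{0, 1, \ldots, 7\}$. We consider pattern-dependent error probabilities in both directions. The error probability measures the relative frequency of occurrence of each WL or BL pattern when an error occurs in the victim cell. More precisely, we calculate the pattern-dependent error probabilities in the WL and BL directions as

$$P_{\mathrm{WL}}(a0b) = \Pr\left(\text{WL pattern} = a0b \mid \text{victim cell at level 0 is read in error}\right),$$
$$P_{\mathrm{BL}}(a0b) = \Pr\left(\text{BL pattern} = a0b \mid \text{victim cell at level 0 is read in error}\right).$$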
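These conditional probabilities can be estimated by counting. The sketch below is a minimal version, assuming (hypothetically) that rows of the arrays are wordlines, that `hard` holds the hard read levels obtained from the thresholds, and that only interior cells are counted; `pattern_error_probs` is an illustrative name.

```python
import numpy as np
from collections import Counter

def pattern_error_probs(pl, hard, direction: str = "BL") -> dict:
    """Relative frequency of a0b patterns among level-0 interior cells read in error.

    pl, hard: 2-D arrays of program levels and hard read levels (rows = WLs, assumed).
    direction: "WL" uses left/right neighbors, "BL" uses upper/lower neighbors.
    """
    pl = np.asarray(pl)
    hard = np.asarray(hard)
    counts = Counter()
    rows, cols = pl.shape
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            if pl[r, c] != 0 or hard[r, c] == 0:
                continue  # keep only level-0 victims that were read incorrectly
            if direction == "WL":
                a, b = pl[r, c - 1], pl[r, c + 1]
            else:
                a, b = pl[r - 1, c], pl[r + 1, c]
            counts[f"{a}0{b}"] += 1
    total = sum(counts.values())
    return {pattern: n / total for pattern, n in counts.items()} if total else {}


pl = np.array([[0, 7, 0],
               [0, 0, 6],
               [0, 7, 0]])
hard = np.array([[0, 7, 0],
                 [0, 1, 6],
                 [0, 7, 0]])  # the central victim is misread as level 1
print(pattern_error_probs(pl, hard, "BL"))  # {'707': 1.0}
```

With pseudo-random data and no ICI, each of the 64 possible $a0b$ patterns would receive about $1/64 \approx 1.6\%$ of the errors, which is a useful baseline when reading the pie charts.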
The probabilities for the measured and regenerated data are visualized as pie charts in Fig. 6. When we program an interior cell to level 0, there are 64 possible patterns of program levels for the pair of adjacent cells in each of the WL and BL directions. Without ICI effects, errors would not depend on the neighboring program levels, so with pseudo-random data each of the 64 patterns would account for roughly 1/64 (about 1.6%) of the errors.
In the measured data, the 23 listed patterns account for 55% of the errors in the WL direction and around 70% of the errors in the BL direction. The dominant error pattern in both WL and BL directions is 707. Comparing the area of pattern 707 in WL and BL, we find that pattern 707 in the WL direction is less severe than that in the BL direction.
As shown in Fig. 6, for the prevalent error patterns at 7000 P/E cycles, probabilities observed in the data generated by cVAE-GAN are very similar to those seen in the measured data. The only substantial discrepancy we observed is that the generative model underestimates the fraction of the 707 pattern in the WL direction. At 4000 (resp., 10000) P/E cycles, the model underestimates (resp., overestimates) the fraction of the 707 pattern in both directions. However, at all P/E cycles, cVAE-GAN generates the same rank ordering of pattern fractions as the measured data in both directions.
These results show that cVAE-GAN generally produces spatial distributions of voltage levels in a flash memory block that capture with good accuracy the effects of ICI in both vertical and horizontal directions.



Fig. 6 panel notes: 97921 total errors observed; 989565 total errors observed.
V Conclusion
In this paper, we explored the use of conditional generative networks to model the flash memory channel. Unlike traditional modeling and analysis, our model can generate “realistic” soft voltage levels from program levels, not only representing accurate statistical distributions among different P/E cycle counts but also preserving spatial ICI effects. Our generative model marks a first step in capturing the spatio-temporal characteristics of flash devices.
Acknowledgment
The authors would like to thank Zachary Blair and Yi Liu for the flash memory test platform used in this study. The authors would also like to acknowledge very helpful discussions with Naoaki Kokubun, Sarah Ekaireb, and Shengqiu Jin.
References
- [1] Y. Cai, E. F. Haratsch, O. Mutlu, and K. Mai, “Threshold voltage distribution in MLC NAND flash memory: Characterization, analysis, and modeling,” in Proc. Design, Automation & Test in Europe Conf. & Exhib., Grenoble, France, Mar. 2013.
- [2] Y. Cai, E. F. Haratsch, O. Mutlu, and K. Mai, “Error patterns in MLC NAND flash memory: Measurement, characterization, and analysis,” in Proc. Design, Automation & Test in Europe Conf. & Exhib., Dresden, Germany, Mar. 2012.
- [3] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Proc. Neural Inf. Process. Syst. (NIPS), Montréal, Canada, Dec. 2014, pp. 2672–2680.
- [4] A. Hareedy, B. Dabak, and R. Calderbank, “Managing device lifecycle: Reconfigurable constrained codes for M/T/Q/P-LC flash memories,” IEEE Trans. Inf. Theory, vol. 67, no. 1, pp. 282–295, Oct. 2020.
- [5] A. Hareedy, S. Zheng, P. H. Siegel, and R. Calderbank, “Read-and-run constrained coding for modern flash devices,” in Proc. IEEE Int. Conf. Commun. (ICC), Seoul, South Korea, May 2022.
- [6] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Las Vegas, NV, USA, July 2016, pp. 770–778.
- [7] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Honolulu, HI, USA, July 2017, pp. 1125–1134.
- [8] D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” in Proc. Int. Conf. Represent. Learn. (ICLR), Banff, Canada, Apr. 2014.
- [9] A. B. L. Larsen, S. K. Sønderby, H. Larochelle, and O. Winther, “Autoencoding beyond pixels using a learned similarity metric,” in Proc. Int. Conf. Mach. Learn. (ICML), New York, NY, USA, June 2016.
- [10] Z. Liu, Y. Liu, and P. H. Siegel, “Generative modeling of NAND flash memory voltage level,” in Non-Volatile Memories Workshop (NVMW), CA, USA, Mar. 2021.
- [11] Y. Luo, S. Ghose, Y. Cai, E. F. Haratsch, and O. Mutlu, “Enabling accurate and practical online flash channel modeling for modern MLC NAND flash memory,” IEEE J. Select. Areas Commun., vol. 34, no. 9, pp. 2294–2311, Sept. 2016.
- [12] J. A. Nelder and R. Mead, “A simplex method for function minimization,” The Computer Journal, vol. 7, no. 4, pp. 308–313, 1965.
- [13] N. Papandreou, H. Pozidis, T. Parnell, N. Ioannou, R. Pletka, S. Tomic, P. Breen, G. Tressler, A. Fry, and T. Fisher, “Characterization and analysis of bit errors in 3D TLC NAND flash memory,” in Proc. IEEE Int. Reliability Phys. Symp. (IRPS), Monterey, CA, USA, Apr. 2019.
- [14] T. Parnell, N. Papandreou, T. Mittelholzer, and H. Pozidis, “Modelling of the threshold voltage distributions of sub-20nm NAND flash memory,” in Proc. IEEE Global Commun. Conf., Dec. 2014, pp. 2351–2356.
- [15] M. Qin, E. Yaakobi, and P. H. Siegel, “Constrained codes that mitigate inter-cell interference in read/write cycles for flash memories,” IEEE J. Select. Areas Commun., vol. 32, no. 5, pp. 836–846, May 2014.
- [16] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in Proc. Int. Conf. Med. Image Comput. Comput.-Assisted Intervention (MICCAI), Munich, Germany, Oct. 2015.
- [17] K. Sohn, H. Lee, and X. Yan, “Learning structured output representation using deep conditional generative models,” in Proc. Neural Inf. Process. Syst. (NIPS), Montréal, Canada, Dec. 2015.
- [18] V. Taranalli, H. Uchikawa, and P. H. Siegel, “Channel models for multi-level cell flash memories based on empirical error analysis,” IEEE Trans. Commun., vol. 64, no. 8, pp. 3169–3181, Aug. 2016.
- [19] J.-Y. Zhu, R. Zhang, D. Pathak, T. Darrell, A. A. Efros, O. Wang, and E. Shechtman, “Toward multimodal image-to-image translation,” in Proc. Neural Inf. Process. Syst. (NIPS), Long Beach, CA, USA, Dec. 2017, pp. 465–476.