A Deep Dive into the Computational Fidelity of High Variability Low Energy Barrier Magnet Technology for Accelerating Optimization and Bayesian Problems†

†[email protected], [email protected]

Md Golam Morshed¹, Samiran Ganguly², and Avik W. Ghosh¹˒³
¹Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, VA 22904, USA
²Department of Electrical and Computer Engineering, Virginia Commonwealth University, Richmond, VA 23284, USA
³Department of Physics, University of Virginia, Charlottesville, VA 22904, USA

Abstract

Low energy barrier magnet (LBM) technology has recently been proposed as a candidate for accelerating algorithms based on energy minimization and probabilistic graphs, because the physical characteristics of LBMs map one-to-one onto the primitives of these algorithms. Many of these algorithms tolerate much more error than high-accuracy numerical computation does. LBM, however, is a nascent technology, and its devices show high sample-to-sample variability. In this work, we take a deep dive into the overall fidelity this technology affords in providing computational primitives for these algorithms. We show that although the computed results deviate finitely from those of zero-variability devices, the margin of error can almost always be certified to within a certain percentage. This suggests that LBM technology could be a viable accelerator for popular emerging computing paradigms.

Index Terms:

Nanomagnetics, binary stochastic neurons, probabilistic computing, energy minimization-based optimization algorithms, probabilistic graphical algorithms.

I INTRODUCTION

Low energy barrier magnet (LBM) technology, which utilizes nanomagnets with barrier heights on the order of the thermal energy, has recently been proposed as a potential candidate for hardware accelerators for probabilistic computing and stochastic sampling [1, 2]. These accelerators may be broadly considered hardware Markov chain Monte Carlo (MCMC) implementations that utilize the built-in stochasticity provided by the dynamics of the LBM. This results in highly compact devices with true stochasticity, in contrast to pseudo-random number generators (pRNGs) based on linear feedback shift registers (LFSRs) [3]. The magnetization component $m_{z}$ of the LBM randomly fluctuates between two stable states ($\uparrow$, $\downarrow$) under the influence of thermal noise, and the probability of finding the magnet in either of the two stable states can be tuned via an external current [4]. A number of applications, ranging from probabilistic computing to machine learning and artificial intelligence, leverage the intrinsic stochastic nature of LBMs [4, 5, 6, 7, 8]. The prototypical hardware building blocks are binary stochastic neurons (BSNs), popularly known as “p-bits,” connected through programmable weights in a recurrent configuration. An illustrative example of a dual-stacked feedback cross-bar structure is shown in Fig. 1(a). The synaptic weights, i.e., the “program,” are loaded in memristors located at the cross-points of the core cross-bar structure, whereas the neurons are at the peripheries. Using a dual cross-bar structure, it is possible to build recurrent networks, including a Restricted Boltzmann Machine [RBM, Fig. 1(b)], an example application area of this accelerator. The RBM is embedded in the computing fabric by enabling certain neurons and synaptic connections while disabling the rest.
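As a rough illustration of embedding an RBM by enabling some neurons and synapses and disabling the rest, the Python sketch below masks a full symmetric weight matrix down to the bipartite visible-hidden block structure of an RBM. The fabric size, the choice of visible and hidden neurons, and the helper name `embed_rbm` are hypothetical; the sketch only shows the masking idea, not the actual cross-bar programming procedure.

```python
import numpy as np


def embed_rbm(W_full: np.ndarray, visible: list[int], hidden: list[int]) -> np.ndarray:
    """Return the effective J matrix after disabling every neuron and synapse
    that is not part of a visible-hidden (bipartite) RBM.

    W_full  : hypothetical symmetric n x n weights stored at the cross-points.
    visible : indices of neurons enabled as visible units.
    hidden  : indices of neurons enabled as hidden units.
    """
    n = W_full.shape[0]
    mask = np.zeros((n, n), dtype=bool)
    # Enable only visible<->hidden synapses; visible-visible, hidden-hidden and
    # any synapse touching an unused neuron stay disabled (set to zero).
    mask[np.ix_(visible, hidden)] = True
    mask[np.ix_(hidden, visible)] = True
    return np.where(mask, W_full, 0.0)


# Hypothetical 8-neuron fabric with random symmetric weights.
rng = np.random.default_rng(0)
A = rng.normal(size=(8, 8))
W_full = (A + A.T) / 2
np.fill_diagonal(W_full, 0.0)

J = embed_rbm(W_full, visible=[0, 1, 2], hidden=[3, 4])
print(np.count_nonzero(J))  # 12: the 3 x 2 visible-hidden synapses, stored symmetrically
```

Neurons 5 to 7 remain in the fabric but are greyed out, in the sense of Fig. 1(a): their rows and columns of `J` are zero.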

Although BSN-based non-Boolean probabilistic applications are inherently more error resilient than conventional nanomagnet switches used for deterministic Boolean memory and logic applications, the computational reliability of accelerators that employ LBMs as their hardware RNGs needs to be carefully assessed. Several recent studies have discussed how device-to-device geometric, structural, and process variations can produce anywhere from negligible to high variability in the characteristics of LBMs, depending on the degree of variation [9, 10, 11]. However, the resulting impact of these “non-idealities” on computational networks is still largely not understood.

Figure 1: (a) Illustrative schematic of an embedded RBM, an energy-based optimization and learning algorithm, in a dual-stacked feedback cross-bar structure, with neurons (the compute units) at the edges (large circles) and the synaptic weights (the program) loaded in memristors located at the cross-points of the core cross-bar structure (small circles). The active neurons and synapses are shown in bold colors (red and yellow), while inactive units are greyed out. (b) The RBM network that gets embedded in (a). The bidirectional blue lines represent the synaptic connections between the neurons (red circles); the yellow circles used in (a) are omitted for simplicity. (c) Design of an LBM-magnetic tunnel junction (MTJ)-based p-bit unit. (d) Ideal characteristics of a p-bit device. (e) Schematics of different characteristics distortions. (f) Illustration of energy barrier variation in a nanomagnet. The symbols (diamond, square, etc.) in (e) and (f) denote the different variability types throughout the rest of the paper.

In this letter, we discuss variability in the context of circuits and networks built from LBM-based BSN devices. We categorize the variability into a few broad classes, namely shifting and scaling of the device characteristics away from the ideal behavior expected from the mathematical model, and variability of the barrier heights. We study these for two broad classes of algorithms that can be solved using p-bits: energy minimization-based optimization algorithms (EMOA) and probabilistic graphical algorithms (PGA). EMOA includes problems such as the Ising model and RBMs, which define a problem in terms of a thermodynamically meaningful “energy landscape” with the desired optimal result embedded in the ground (vacuum) state energy, while PGA includes Bayesian decision diagrams, which have no inherent notion of energy or thermodynamics. In terms of network connectivity, the spectral theorem of linear algebra [12] implies that EMOA networks, which have symmetric or undirected connections, have real-valued eigenstates that are reachable via real-space computation, whereas PGA networks are asymmetric or directed and can have non-real (complex) eigenstates that are not reachable via real-space computation.
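The symmetric-versus-directed distinction can be checked numerically. The Python sketch below, with hypothetical 3 x 3 coupling matrices and a hypothetical helper `has_real_spectrum`, shows that a symmetric matrix has a purely real spectrum while a directed 3-cycle has complex eigenvalues. Not every asymmetric matrix has complex eigenvalues (a triangular one, for example, does not); the cycle is chosen because it does.

```python
import numpy as np


def has_real_spectrum(J: np.ndarray, tol: float = 1e-9) -> bool:
    """True if every eigenvalue of the coupling matrix J is real (within tol)."""
    eigenvalues = np.linalg.eigvals(J)
    return bool(np.all(np.abs(np.imag(eigenvalues)) < tol))


# Symmetric (undirected) couplings, as in an EMOA network: eigenvalues 2, -1, -1.
J_sym = np.array([[0.0, 1.0, 1.0],
                  [1.0, 0.0, 1.0],
                  [1.0, 1.0, 0.0]])

# Directed 3-cycle (asymmetric couplings), as can occur in a PGA network:
# its eigenvalues are the three cube roots of unity, two of which are complex.
J_dir = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [1.0, 0.0, 0.0]])

print(has_real_spectrum(J_sym))  # True
print(has_real_spectrum(J_dir))  # False
```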

We estimate the mean absolute error (MAE) to quantify the deviation in performance from ideal devices. We find that the MAE saturates sub-linearly for EMOA, whereas for PGA the error grows linearly to super-linearly. Moreover, the networks are more sensitive to shifting variability than to scaling variability. Additionally, for EMOA, larger networks are less affected by variability, while for PGA the trend is the opposite. Our findings may provide a path toward designing reliable LBM-based hardware accelerators.

II BUILDING ‘p-bits’ USING LOW BARRIER MAGNETS

Thin-film magnets used in magnetic random access memory (MRAM) technology exhibit a double potential well whose two minima correspond to the two easy-axis directions [Fig. 1(f)]. The height of the barrier between them determines the expected state retention time through the Arrhenius relation:

$$\tau=\tau_{0}\,e^{U/k_{B}T} \tag{1}$$

In the above equation, $U(=\mu_{0}M_{s}H_{k}\Omega/2)$ is the energy barrier, where $\mu_{0}$, $M_{s}$, $H_{k}$, and $\Omega$ are the permeability of free space, the saturation magnetization, the magnetic anisotropy field, and the magnet volume, respectively. For conventional storage-class memory, $U$ is set to $40$ to $60~k_{B}T$ for the free layer of an MTJ, which yields a decade-long state retention time $\tau$, depending on $\tau_{0}$, the inverse of the attempt frequency, which ranges from $0.1$ to $1~\mathrm{ns}$ [13]. However, if the magnet is ultra-scaled by reducing its volume $\Omega$, or if its profile is made circular, which reduces $H_{k}$ by removing the shape anisotropy, the retention time can be scaled down to near $\tau_{0}$ [14]. In this case, the free layer’s magnetization vector fluctuates between the two easy-axis directions under the influence of thermal noise, which can readily “kick” the magnetization over the barrier at near-GHz frequencies. The MTJ structure translates this fluctuation into an equivalent fluctuation in the device resistance, which can be used to build devices that harvest true randomness from the environment.
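To make the exponential dependence in (1) concrete, the short Python sketch below evaluates the retention time for a few barrier heights, assuming $\tau_{0}=1~\mathrm{ns}$ (the upper end of the quoted range; the function name `retention_time` is hypothetical). With this assumption, $40~k_{B}T$ gives $\tau\approx 2.4\times 10^{8}$ s, about 7.5 years, while $0~k_{B}T$ gives $\tau=\tau_{0}$. Each additional $1~k_{B}T$ of barrier multiplies $\tau$ by $e\approx 2.718$.

```python
import math


def retention_time(U_over_kT: float, tau0: float = 1e-9) -> float:
    """Arrhenius state-retention time tau = tau0 * exp(U / kB T) of Eq. (1), in seconds.

    tau0 = 1 ns is an assumed attempt time (the quoted range is 0.1 to 1 ns).
    """
    return tau0 * math.exp(U_over_kT)


SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Storage-class barrier vs. progressively lower barriers.
for U in (40, 10, 1, 0):
    tau = retention_time(U)
    print(f"U = {U:2d} kBT -> tau = {tau:.3e} s ({tau / SECONDS_PER_YEAR:.2e} years)")

# Exponential sensitivity: one extra kBT of barrier multiplies tau by e.
print(retention_time(11) / retention_time(10))  # ~2.718
```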

One such device is the “p-bit”, which is a binary stochastic neuron with a compact model given by:

$$V^{out}_{i}=\mathrm{sgn}\left[\tanh\left(\beta V^{in}_{i}\right)+\alpha\,\mathrm{rnd}(-1,+1)\right]\frac{V_{DD}}{2} \tag{2}$$

In this device, the output swings between $-V_{DD}/2$ and $V_{DD}/2$, corresponding to the $-1$ and $+1$ state labels of $m_{z}$; however, the relative occupancy of these two states is controlled by an input signal, which imposes a $\tanh$-like dependence of the output probability on the input. Here, $\mathrm{rnd}(-1,+1)$ denotes a random number drawn from a uniform distribution on $(-1,+1)$. The parameters $\beta$ and $\alpha$ represent the transfer gain of the unit and the relative contribution of the stochasticity to the characteristics, respectively. For large-scale correlated networks, the input $V^{in}_{i}$ can be written as:

$$V^{in}_{i}=\kappa\left[h_{i}+\sum_{j}J_{ij}\,\frac{V^{out}_{j}}{V_{DD}/2}\right] \tag{3}$$

where $j$ runs over all input devices connected to the $i$-th device, $h$ is the bias vector, and $J$ is the synaptic matrix. Different functionalities correspond to different choices of $h$ and $J$. The coupling coefficient $\kappa$ represents the inverse of the “temperature” of the system.
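For a single p-bit, (2) can be sampled directly. The Python sketch below (a hypothetical helper `pbit_output`, not the authors’ implementation) draws outputs for a fixed input and shows the $\tanh$-like characteristic. For $\alpha=1$, $P(+1)=P[\mathrm{rnd}>-\tanh(\beta V^{in})]=[1+\tanh(\beta V^{in})]/2$, so the time-averaged state label equals $\tanh(\beta V^{in})$.

```python
import numpy as np


def pbit_output(v_in, beta=1.0, alpha=1.0, vdd=2.0, rng=None):
    """Draw p-bit outputs from the compact model of Eq. (2).

    v_in can be a scalar or an array; one independent sample is drawn per element.
    Each output is +vdd/2 or -vdd/2 (state labels +1 / -1).
    """
    rng = np.random.default_rng() if rng is None else rng
    v_in = np.asarray(v_in, dtype=float)
    noise = rng.uniform(-1.0, 1.0, size=v_in.shape)  # rnd(-1, +1)
    # np.sign would return 0 only on an exact tie, which essentially never happens.
    return np.sign(np.tanh(beta * v_in) + alpha * noise) * vdd / 2


# Time-averaged characteristic: with alpha = 1, mean label = tanh(beta * V_in).
rng = np.random.default_rng(1)
vdd = 2.0
for v in (-2.0, -0.5, 0.0, 0.5, 2.0):
    labels = pbit_output(np.full(100_000, v), vdd=vdd, rng=rng) / (vdd / 2)
    print(f"V_in = {v:+.1f}: mean label {labels.mean():+.3f}, tanh(V_in) {np.tanh(v):+.3f}")
```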

III SIMULATION METHOD

We implement the compact model of $p$-bit networks described by (2) and (3) in MATLAB, following the methodology of Camsari et al. [4]. The MATLAB model is a parameterized version of the compact-model simulation performed in SPICE [4, 15]. In the MATLAB implementation, we use $\alpha=1$, $\beta=1$ (for the ideal case), $\kappa=0.8$, and $V_{DD}=2~{}V$ throughout, unless otherwise specified.
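The network update can be sketched as a sequential loop that applies (3) and then (2) to one p-bit at a time. The Python code below is not the authors’ MATLAB code. It uses the parameters listed above ($\alpha=\beta=1$, $\kappa=0.8$, $V_{DD}=2$ V) and a hypothetical 3-p-bit AND-gate encoding; the `J_AND` and `h_AND` values are assumed, of the kind used in [4], and have been chosen so that the four valid truth-table rows share the lowest energy. Because each conditional update matches Gibbs sampling at inverse temperature $\kappa\beta$, a Boltzmann estimate for this encoding puts roughly 97% of the samples in valid AND states.

```python
import numpy as np
from collections import Counter

# Hypothetical AND-gate encoding over the p-bits (A, B, C = A AND B), labels +/-1.
# Assumed values: the four valid rows have energy E = -(h.m + sum_{i<j} J_ij m_i m_j) = -3,
# while every invalid row has E >= 1.
J_AND = np.array([[0.0, -1.0, 2.0],
                  [-1.0, 0.0, 2.0],
                  [2.0, 2.0, 0.0]])
h_AND = np.array([1.0, 1.0, -2.0])


def run_pbit_network(J, h, kappa=0.8, beta=1.0, vdd=2.0, n_sweeps=20_000, seed=0):
    """Sequentially update every p-bit: V_in from Eq. (3), V_out from Eq. (2) with alpha = 1.

    Returns one visited state (a tuple of +/-1 labels) per sweep.
    """
    rng = np.random.default_rng(seed)
    n = len(h)
    v_out = rng.choice([-1.0, 1.0], size=n) * vdd / 2
    states = []
    for _ in range(n_sweeps):
        for i in range(n):  # one p-bit at a time (sequential / asynchronous update)
            v_in = kappa * (h[i] + J[i] @ (v_out / (vdd / 2)))  # Eq. (3)
            v_out[i] = np.sign(np.tanh(beta * v_in) + rng.uniform(-1.0, 1.0)) * vdd / 2  # Eq. (2)
        states.append(tuple(int(label) for label in np.sign(v_out)))
    return states


states = run_pbit_network(J_AND, h_AND)
counts = Counter(states)
valid = {(-1, -1, -1), (-1, 1, -1), (1, -1, -1), (1, 1, 1)}
p_valid = sum(counts[s] for s in valid) / len(states)
print(f"fraction of sweeps in valid AND states: {p_valid:.3f}")
```

Updating one p-bit at a time is a design choice. Updating all p-bits simultaneously on a symmetric network can make neighboring p-bits oscillate instead of sampling the intended distribution.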

We use computational networks built from different numbers of p-bits. For EMOA, we use an AND gate and a full adder, with $J$ matrices of size $3\times 3$ and $14\times 14$, respectively [4]. For a large network, we construct an arbitrary symmetric $50\times 50$ $J$ matrix. For PGA, we use Bayesian networks (BNs) constructed from $8$, $20$, and $50$ p-bits (the $J$ matrices are asymmetric in these cases). For EMOA, the MAE is computed as the sum of the absolute differences between the output probability distributions of the ideal and non-ideal cases, normalized by the number of LBMs in the network. For PGA, we instead calculate the normalized MAE from the difference between the correlation matrices ($\sigma(i,j)=\frac{1}{T}\int_{0}^{T}V_{i}^{out}V_{j}^{out}\,dt$) of the ideal and non-ideal cases. For both algorithms, we use $T=10^{6}$ simulation steps to obtain $V^{out}$. For a sample generation time of $2~{}ns$, this corresponds to $2~{}ms$ of compute time. The mean and standard deviation of the MAE are calculated from $N=100$ simulations.
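The two error metrics can be sketched as follows. The EMOA metric follows the text: a sum of absolute differences between output probability distributions, divided by the number of LBMs. For PGA, the discrete-time correlation matrix replaces the integral with a sum over samples. The outputs are assumed to be normalized to $\pm 1$, and the final averaging over the $n\times n$ entries is also an assumption, because the text does not state the exact normalization. All function names are hypothetical.

```python
import numpy as np


def emoa_mae(p_ideal, p_nonideal, n_lbms):
    """Sum of |p_ideal - p_nonideal| over output states, divided by the number of LBMs."""
    p_ideal = np.asarray(p_ideal, dtype=float)
    p_nonideal = np.asarray(p_nonideal, dtype=float)
    return np.abs(p_ideal - p_nonideal).sum() / n_lbms


def correlation_matrix(samples):
    """Discrete-time version of sigma(i, j) = (1/T) * sum_t V_i(t) V_j(t).

    samples: T x n array of p-bit outputs, assumed normalized to +/-1.
    """
    samples = np.asarray(samples, dtype=float)
    return samples.T @ samples / samples.shape[0]


def pga_mae(samples_ideal, samples_nonideal):
    """Mean absolute difference between correlation matrices (normalization assumed)."""
    diff = correlation_matrix(samples_ideal) - correlation_matrix(samples_nonideal)
    return np.abs(diff).mean()


# Toy EMOA example: a 2-p-bit network with 4 possible output states.
print(emoa_mae([0.5, 0.5, 0.0, 0.0], [0.25, 0.25, 0.25, 0.25], n_lbms=2))  # 0.5

# Toy PGA example: perfectly correlated vs. uncorrelated (on average) p-bit pairs.
ideal = np.array([[1, 1], [-1, -1], [1, 1], [-1, -1]])
nonideal = np.array([[1, -1], [-1, 1], [1, 1], [-1, -1]])
print(pga_mae(ideal, nonideal))  # 0.5
```

Multiplying these values by 100 gives the percentage error. Over $N=100$ runs, the reported mean and standard deviation would be `np.mean` and `np.std` of the per-run values.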

IV RESULTS

LBM devices are hybrids of silicon CMOS, a highly mature technology, and spintronics/magnetics, a relatively new one. Although the two have been successfully integrated by several commercial vendors in high-energy-barrier, storage-class MRAM technology, the LBM variant comes with lithographic challenges that may require a long process of technological development to overcome. These challenges mainly concern the quality of the magnetic films and precise control over their geometry. Abeed and Bandyopadhyay [9] studied the impact of geometrical irregularities such as dimples, holes, and shape variance on the characteristic correlation times of LBMs and found that the distribution of correlation times can be broad. Such variations can have implications that extend beyond the intrinsic behavior of the MTJ’s free-layer magnet itself.

Two critical sets of variations are discussed next. Note that these variations become relevant specifically in the context of circuits and networks built from these devices.

IV-A Characteristics Distortion

Figs. 1(c) and 1(d) show the proposed device and its ideal output characteristics, respectively. The characteristics of the device depend on the swing generated by the NMOS transistor turning on or off, balanced around the MTJ’s characteristic resistance; i.e., the resistance of the transistor in the linear intermediate mode should match the MTJ’s average resistance. In the linear mode of operation, the $\tanh$ shape arises from the interplay between the MTJ’s average resistance and the transistor’s intermediate resistance as the transistor swings from on to off, while the flipping of the MTJ’s magnetization adds fluctuations to the characteristics. A mismatch between these two resistances can lead to a deviation from the “ideal” model presented in (2). We group the variations into four categories that broadly cover the phase space of such distortions [shown in Fig. 1(e)]: 1. horizontal shift; 2. vertical shift; 3. horizontal scale, due to variation in the gain $\beta$; and 4. vertical scale, due to loading by follow-on p-bits that the output cannot drive adequately.

Fig. 2 shows the normalized MAE arising from horizontal shifting in the networks for both the EMOA and PGA problem classes. We vary the maximum voltage shift from $0~{}V$ to $1~{}V$. From Fig. 2(a), we find that for the AND gate, the error increases rapidly up to a $\sim 20\%$ horizontal voltage shift and more slowly afterward, although it keeps an overall increasing trend. For larger networks, the error starts saturating at a $\sim 10\%$ voltage shift. For the AND gate, we find a maximum MAE of $\sim 30\%$ at a horizontal voltage shift of $1~{}V$. As the network size increases, the error percentage decreases for EMOA. For PGA, on the other hand, Fig. 2(b) shows that the error increases almost linearly with the horizontal voltage shift, and the relation between the error and the network size is opposite to that of EMOA. Figs. 3-5 show the normalized MAE arising from vertical shifting, horizontal scaling, and vertical scaling, respectively, for both EMOA and PGA. The MAE follows a similar increasing trend for all types of distortion; however, the percentage error varies with the problem class, distortion type, and network size. We list the maximum error arising from each type of distortion in Fig. 6, where different colors represent the overall trend of the MAE. Note that we vary only one type of distortion at a time.
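One way to express the four distortion categories in terms of (2) is to distort the time-averaged characteristic inside the $\mathrm{sgn}$, so the output stays binary. The Python sketch below assumes the parameterization $m(V)=s_{v}\tanh[\beta(V-\delta_{h})]+\delta_{v}$, where a change in $\beta$ is the horizontal scale. This is a hypothetical model chosen for illustration, not necessarily the one used to produce Figs. 2-6. With $\alpha=1$ and uniform noise, $P(+1)=(1+m)/2$, clipped to $[0,1]$.

```python
import numpy as np


def p_up(v_in, beta=1.0, h_shift=0.0, v_shift=0.0, v_scale=1.0):
    """Probability that a (possibly distorted) p-bit outputs +1, assuming alpha = 1.

    Assumed distortion model (hypothetical, for illustration only):
      ideal     : m(V) = tanh(beta * V)
      distorted : m(V) = v_scale * tanh(beta * (V - h_shift)) + v_shift
    Horizontal scale corresponds to changing beta. With uniform rnd(-1, +1) noise,
    P(+1) = P(m + rnd > 0) = (1 + m) / 2, clipped to [0, 1].
    """
    m = v_scale * np.tanh(beta * (np.asarray(v_in, dtype=float) - h_shift)) + v_shift
    return np.clip((1.0 + m) / 2.0, 0.0, 1.0)


def distorted_pbit(v_in, rng, vdd=2.0, **distortion):
    """Draw one +/- vdd/2 output per input using the distorted probability above."""
    p = p_up(v_in, **distortion)
    return np.where(rng.uniform(size=np.shape(p)) < p, 1.0, -1.0) * vdd / 2


V = np.linspace(-2.0, 2.0, 5)
print(p_up(V))               # ideal characteristic
print(p_up(V, h_shift=0.5))  # 1. horizontal shift by 0.5 V
print(p_up(V, v_shift=0.2))  # 2. vertical shift
print(p_up(V, beta=0.5))     # 3. horizontal scale (gain change)
print(p_up(V, v_scale=0.7))  # 4. vertical scale
```

As in the text, each call varies only one distortion at a time. A horizontal shift $\delta_{h}$ moves the unbiased point ($P=0.5$) from $V^{in}=0$ to $V^{in}=\delta_{h}$.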

IV-B Energy Barrier Variability

Because $\tau$ in (1) depends exponentially on $U$, a small variation in the energy barrier can lead to a large variation in the expected state retention time. A circuit therefore encounters widely different time scales, i.e., a large dynamic range of operation, across its individual components, which can significantly compromise the operational viability of a circuit built from p-bits. We therefore analyze the effect of energy barrier variation on the performance of the networks. Because of this variation, the magnetic states of different nanomagnets update at different times than in the ideal case (which assumes a $0~{}k_{B}T$ energy barrier), leading to an overall error in the output quantity. Fig. 7 shows the normalized MAE for EMOA and PGA arising from energy barrier variability. For both classes of problems, the error percentage stays small (within $\sim 10\%$) up to an energy barrier of $\sim 10~{}k_{B}T$. For EMOA, a high energy barrier severely affects a small network ($\sim 40\%$ error), while the large network is more forgiving ($\sim 4\%$ error). For PGA, the trend is the opposite: we find a maximum error of $\sim 50\%$ for a large BN. Note that characteristics distortions are not included when we analyze energy barrier variability.
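A simple way to model “updating at different times” is to let each p-bit refresh its state only with some probability per step and otherwise hold its previous state. The Python sketch below assumes that the ideal ($0~k_{B}T$) p-bit refreshes every step and that a p-bit with barrier $U$ refreshes at the relative rate $\tau_{0}/\tau=e^{-U/k_{B}T}$ from (1). This is a hypothetical modeling choice, not necessarily the authors’ scheme. Under this assumption, a $10~k_{B}T$ p-bit refreshes only about once every 22,000 steps ($e^{10}\approx 2.2\times 10^{4}$).

```python
import numpy as np


def update_probability(U_over_kT):
    """Per-step probability that a p-bit with barrier U refreshes its state.

    Assumption: the ideal (0 kBT) p-bit refreshes every step, and a p-bit with
    barrier U refreshes at the relative rate tau0 / tau = exp(-U / kB T) from Eq. (1).
    """
    return np.exp(-np.asarray(U_over_kT, dtype=float))


def sweep_with_barriers(m, J, h, U_over_kT, rng, kappa=0.8):
    """One sequential sweep over the +/-1 labels m (beta = alpha = 1, normalized units).

    Slow (high-barrier) p-bits may keep their previous state during this sweep.
    """
    p_refresh = update_probability(U_over_kT)
    for i in range(len(m)):
        if rng.uniform() < p_refresh[i]:  # otherwise the magnet stays in its old state
            local = kappa * (h[i] + J[i] @ m)  # Eq. (3) in normalized units
            m[i] = 1.0 if np.tanh(local) + rng.uniform(-1.0, 1.0) > 0 else -1.0  # Eq. (2)
    return m


# Hypothetical barrier spread: barriers drawn uniformly in [0, 10] kBT for 5 p-bits.
rng = np.random.default_rng(0)
U = rng.uniform(0.0, 10.0, size=5)
print(np.round(update_probability(U), 5))
```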

IV-C Sampling vs. Simulated Annealing

The EMOA results discussed above are obtained with the sampling technique. This technique keeps the interaction strength $\kappa$ (a pseudo-inverse temperature) fixed throughout the simulation and runs the simulation long enough ($10^{6}$ steps) that the p-bits primarily visit the low-energy states. Fig. 8 compares the MAE obtained using simulated annealing with that of the sampling technique. For simulated annealing, we increase $\kappa$ from $0.5$ to $5$, updating it every $2\times 10^{5}$ steps. We find that the error percentage is slightly higher with simulated annealing for all types of characteristics distortions. We conjecture that this is because the sampling method, when run long enough, covers the system’s phase space more ergodically. By contrast, a linear simulated annealing schedule, which is in essence guided importance sampling over a shorter time, may not sample the phase space comprehensively enough to discover the true ground state. More complex annealing schedules may improve this; we do not discuss them further.
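The piecewise-constant annealing schedule can be written down explicitly. The Python sketch below assumes that the $10^{6}$ steps are split into five stages of $2\times 10^{5}$ steps and that the $\kappa$ levels are linearly spaced between $0.5$ and $5$. The linear spacing matches the phrase “linear simulated annealing schedule”, but the exact stage values are an assumption. The helper name `kappa_schedule` is hypothetical.

```python
import numpy as np


def kappa_schedule(step, total_steps=10**6, stage_len=2 * 10**5, k_start=0.5, k_end=5.0):
    """Piecewise-constant annealing schedule for the coupling kappa.

    kappa is held fixed for stage_len steps and then raised; the stage values are
    assumed to be linearly spaced between k_start and k_end.
    """
    n_stages = total_steps // stage_len
    levels = np.linspace(k_start, k_end, n_stages)
    return float(levels[min(step // stage_len, n_stages - 1)])


print([kappa_schedule(s) for s in range(0, 10**6, 2 * 10**5)])
# [0.5, 1.625, 2.75, 3.875, 5.0]
```

In a sweep loop, `kappa_schedule(step)` would replace the fixed value $\kappa=0.8$ used by the sampling technique.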

V CONCLUSION

In summary, we quantify the impact of non-idealities in computational networks built from LBM-based BSNs using two different techniques. For all the variabilities studied in this work, the EMOA error saturates sub-linearly toward the extremal device-variability points, while the PGA error grows linearly to super-linearly. We conjecture that this is because, in EMOA, the system seeks a single thermodynamically favorable fixed point in a finite phase space, which limits the growth of the error, whereas in PGA no similar principle checks the growth of the error. Additionally, running multiple samples of the same problem with different random seeds (thereby simulating the “real world”) reduces the variance of the error, but not its mean value. This suggests that for a given amount of device variability, the average error is fixed and may be estimated or characterized beforehand, so that the results can be certified accordingly. These findings may provide critical design insights for building suitable LBM-based hardware accelerators.

ACKNOWLEDGMENTS

This work is supported in part by the NSF I/UCRC on Multi-functional Integrated System Technology (MIST) Center; IIP-1439644, IIP-1439680, IIP-1738752, IIP-1939009, IIP-1939050, and IIP-1939012. We thank Kerem Yunus Camsari and Faiyaz Elahi Mullick for useful discussions. All calculations were performed using the High-Performance Computing systems at the University of Virginia (Rivanna).

References

[1] Camsari K. Y., Sutton B. M., Datta S. (2019), “p-bits for probabilistic spin logic,” Appl. Phys. Rev., vol. 6, 011305, doi: 10.1063/1.5055860.
[2] Parks B., Bapna M., Igbokwe J., Almasi H., Wang W., Majetich S. A. (2018), “Superparamagnetic perpendicular magnetic tunnel junctions for true random number generators,” AIP Adv., vol. 8, 055903, doi: 10.1063/1.5006422.
[3] Vodenicarevic D., Locatelli N., Mizrahi A., Friedman J. S., Vincent A. F., Romera M., Fukushima A., Yakushiji K., Kubota H., Yuasa S., Tiwari S., Grollier J., Querlioz D. (2017), “Low-Energy Truly Random Number Generation with Superparamagnetic Tunnel Junctions for Unconventional Computing,” Phys. Rev. Appl., vol. 8, 054045, doi: 10.1103/PhysRevApplied.8.054045.
[4] Camsari K. Y., Faria R., Sutton B. M., Datta S. (2017a), “Stochastic $p$-Bits for Invertible Logic,” Phys. Rev. X, vol. 7, 031014, doi: 10.1103/PhysRevX.7.031014.
[5] Faria R., Camsari K. Y., Datta S. (2017), “Low-Barrier Nanomagnets as p-Bits for Spin Logic,” IEEE Magnetics Letters, vol. 8, pp. 1-5, doi: 10.1109/LMAG.2017.2685358.
[6] Sutton B., Camsari K. Y., Behin-Aein B., Datta S. (2017), “Intrinsic optimization using stochastic nanomagnets,” Sci. Rep., vol. 7, 44370, doi: 10.1038/srep44370.
[7] Hassan O., Camsari K. Y., Datta S. (2019), “Voltage-Driven Building Block for Hardware Belief Networks,” IEEE Design & Test, vol. 36, pp. 15-21, doi: 10.1109/MDAT.2019.2897964.
[8] Ganguly S., Camsari K. Y., Ghosh A. W. (2021), “Analog Signal Processing Using Stochastic Magnets,” IEEE Access, vol. 9, pp. 92640-92650, doi: 10.1109/ACCESS.2021.3075839.
[9] Abeed Md. A., Bandyopadhyay S. (2019a), “Low Energy Barrier Nanomagnet Design for Binary Stochastic Neurons: Design Challenges for Real Nanomagnets With Fabrication Defects,” IEEE Magnetics Letters, vol. 10, pp. 1-5, doi: 10.1109/LMAG.2019.2929484.
[10] Abeed Md. A., Bandyopadhyay S. (2019b), “Sensitivity of the Power Spectra of Thermal Magnetization Fluctuations in Low Barrier Nanomagnets Proposed for Stochastic Computing to In-Plane Barrier Height Variations and Structural Defects,” SPIN, vol. 10, 2050001, doi: 10.1142/S2010324720500010.
[11] Drobitch J. L., Bandyopadhyay S. (2019), “Reliability and Scalability of p-Bits Implemented With Low Energy Barrier Nanomagnets,” IEEE Magnetics Letters, vol. 10, pp. 1-4, doi: 10.1109/LMAG.2019.2956913.
[12] Strang G. (2016), Introduction to Linear Algebra, Wellesley, MA, USA: Wellesley-Cambridge Press.
[13] Lopez-Diaz L., Torres L., Moro E. (2002), “Transition from ferromagnetism to superparamagnetism on the nanosecond time scale,” Phys. Rev. B, vol. 65, 224406, doi: 10.1103/PhysRevB.65.224406.
[14] Debashis P., Faria R., Camsari K. Y., Appenzeller J., Datta S., Chen Z. (2016), “Experimental demonstration of nanomagnet networks as hardware for Ising computing,” 2016 IEEE International Electron Devices Meeting (IEDM), pp. 34.3.1-34.3.4, doi: 10.1109/IEDM.2016.7838539.
[15] Camsari K. Y., Salahuddin S., Datta S. (2017b), “Implementing p-bits With Embedded MTJ,” IEEE Electron Device Lett., vol. 38, pp. 1767-1770, doi: 10.1109/LED.2017.2768321.
