
1 School of Information Science and Engineering, Shandong Normal University, Jinan, China
  Email: [email protected]
2 School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China

Obtaining Optimal Spiking Neural Network in Sequence Learning via CRNN-SNN Conversion

Jiahao Su1,2 (work done during an internship at Shanghai Jiao Tong University), Kang You2, Zekai Xu2, Weizhi Xu1, Zhezhi He (corresponding author: [email protected])
Abstract

Spiking neural networks (SNNs) are becoming a promising alternative to conventional artificial neural networks (ANNs) due to their rich neural dynamics and the availability of energy-efficient neuromorphic chips. However, the non-differentiable binary communication mechanism makes it hard for SNNs to converge to ANN-level accuracy. When SNNs encounter sequence learning, the situation becomes worse because of the difficulty of modeling long-range dependencies. To overcome these difficulties, researchers have developed variants of LIF neurons and different surrogate gradients, but still fail to obtain good results when the sequence becomes long (e.g., >500). Unlike them, we obtain an optimal SNN in sequence learning by directly mapping parameters from a quantized CRNN. We design two sub-pipelines to support the end-to-end conversion of different structures in neural networks, which we call CNN-Morph (CNN → QCNN → BIFSNN) and RNN-Morph (RNN → QRNN → RBIFSNN). Using the conversion pipelines and the s-analog encoding method, the conversion error of our framework is zero. Furthermore, we give theoretical and experimental demonstrations of the lossless CRNN-SNN conversion. Our results show the effectiveness of our method on short- and long-timescale tasks compared with state-of-the-art learning- and conversion-based methods. We reach the highest accuracies of 99.16% (0.46 ↑) on S-MNIST and 94.95% (3.95 ↑) on PS-MNIST (sequence length 784), and the lowest loss of 0.057 (0.013 ↓) within 8 time-steps on the collision avoidance dataset.

Keywords:
CRNN-SNN conversion · Sequence learning

1 Introduction

Spiking Neural Networks (SNNs), known as third-generation neural networks [22], are inspired by the biological structure of the brain. Recent studies have shown that brain-inspired neuron models (e.g., the integrate-and-fire (IF) neuron) can obtain results comparable to ANNs with high energy efficiency and low latency [12, 15]. Unlike traditional ANNs, SNNs use discrete spikes to convey information between neurons. Such a binary communication mechanism can be executed smoothly on neuromorphic chips (e.g., TrueNorth [1], Loihi [8]).

SNNs and RNNs share similarities in many ways, such as the design of hidden states and the ability to learn through time. Many efforts have been made in RNNs to improve the learning of long-term dependencies, achieving astonishing results in sequence learning [16, 20]. Attracted by the performance of RNNs, a question arises: how can we obtain SNNs that perform as well as RNNs in sequence learning? An obstacle to answering this question is the non-differentiable binary communication mechanism of SNNs, which results in significant information loss. To address the problem, surrogate gradient (SG) based back-propagation methods [24, 23] and learning with variants of neurons [32, 30] were introduced. However, such approaches still suffer from the spike vanishing phenomenon [24] and inaccurate gradient approximation [23]. When the temporal sequence becomes longer, SNNs cannot achieve ANN-level accuracy (e.g., the SNN SoTA is 91% while the RNN SoTA is 97.2% on permuted-sequential MNIST).

Figure 1: RNN-RBIF conversion. A quantized RNN (left) is converted to its corresponding RBIFSNN (right) via the QCRC framework without accuracy loss.

Instead of expecting the non-differentiable binary network to converge directly to ANN-level accuracy through learning, conversion-based methods obtain an SNN by mapping parameters from its counterpart ANN. However, existing neuron models in conversion methods [27, 19, 6] are not compatible with RNN cells, because the data on the recurrent connection would remain in floating-point form after conversion, which is not allowed. In addition, conversion still suffers from conversion errors, and these errors are magnified over time. To address the aforementioned issues, we propose the Recurrent Bipolar Integrate-and-Fire (RBIF) neuron to support RNN-SNN conversion (as shown in fig. 1), which guarantees the spike form of the recurrent connection after conversion. Furthermore, we propose a comprehensive framework that supports lossless Quantized Convolutional and Recurrent neural network to SNN Conversion (QCRC) end-to-end. Our main contributions are summarized as follows:

  • We propose the Recurrent Bipolar Integrate-and-Fire (RBIF) neuron to address the incompatibility problem of RNN cells. We further give theoretical and experimental proofs of lossless CRNN-SNN conversion.

  • We obtain an optimal SNN in sequence learning via the CRNN-SNN conversion framework, which includes a conversion pipeline with two branches, namely CNN-Morph and RNN-Morph, enabling end-to-end conversion of various types of networks into SNNs.

  • We outperform SoTA learning-based works with accuracies of 99.16% (0.46 ↑) on S-MNIST and 94.95% (3.95 ↑) on PS-MNIST. We also surpass SoTA conversion-based methods on the collision avoidance dataset, achieving the lowest loss at every time-step (e.g., a loss of 0.118 (0.118 ↓) at time-step 2 and 0.057 (0.013 ↓) at time-step 8).

2 Related Works

2.1 Relation of RNN and SNN

Spiking neural networks share structural similarities with vanilla RNNs and their variants. Since the change of membrane potential depends on time, an SNN can be understood as an RNN without recurrent connections [23]. Recurrent neural networks (RNNs) are powerful models for processing sequential data, while spiking neural networks (SNNs) show great potential for processing sequential event-based data. To address the vanishing and exploding gradient problems during RNN training, long short-term memory (LSTM) [14] was proposed. Besides adding gate units to recurrent neurons, other works address the problem through weight initialization, such as IRNN [16], or by changing the form of the recurrent neuron, such as indRNN [20]. Similarly, many efforts have been made to help SNNs learn long-term patterns. Variants of LIF (e.g., adaptive LIF [2, 31], GLIF) have been proposed to enlarge the representation of neuronal behaviors. RSNNs that contain recurrent connections are adopted by [32, 29], yielding better performance than feedforward-only connections. However, it remains challenging to obtain an SNN with RNN-level performance on datasets that RNNs are good at, such as sequential image classification and time series forecasting.

2.2 ANN-to-SNN conversion

The ANN-to-SNN conversion algorithm was first introduced in [7] by changing the activation function to ReLU. [9] presented two ways to normalize the network weights (i.e., data-based and model-based normalization) to prevent the overestimation of output activations. [26, 10] took the threshold into consideration and proposed different normalization methods. By theoretically analyzing the conversion error between the source ANN and the converted SNN, [6, 21] achieved high-performance ANN-SNN conversion with ultra-low latency. To mitigate the sequential error, a neuron that can trigger both positive and negative spikes was proposed and has been widely used in recent works [15, 19, 28, 33].

The main idea of conversion is to map the firing rate of the SNN to the output of a quantized ReLU. This idea is in tune with our goal of bridging the recurrent dynamics of SNNs and RNNs. However, previous proofs of ANN-SNN conversion mainly focused on linear and convolutional layers; consequently, the effectiveness of these conversion methods was only demonstrated on static datasets. In table 1, we summarize the techniques and settings shared by state-of-the-art works on ANN-SNN conversion. It shows that our work supports different structures of the original ANN and different data types.

Table 1: Technical settings of related works. “Eq.” is the abbreviation of equivalence. “✓” represents support. “m-analog” indicates the analog input is fed to SNN at every time-step. “s-analog” indicates the analog input is only fed to SNN at the first time-step, which is equal to the RNN input.
                           QCFS [6]   Offset [13]   Fast-SNN [15]   Ours
Encoding                   m-analog   m-analog      m-analog        s-analog
Neuron                     IF         IF            signed IF       BIF/RBIF
Theoretical Eq. of CNN     ✓          ✓             ✓               ✓
Theoretical Eq. of RNN                                              ✓
Experimental Eq. of CRNN                                            ✓
Data Type                  static     static        static          static/temporal

2.3 Quantization in ANN Compression

Quantization refers to techniques for performing computations and storing tensors at lower bit-widths than floating point precision. The mathematics of quantization for neural networks is as follows:

x_{q} = \mathrm{clip}\!\left(\mathrm{round}\!\left(\dfrac{x}{s} + z\right), a, b\right).  (1)

where s and z denote the quantization scale and zero point, respectively, and the clip(·, a, b) function sets the lower bound a and upper bound b. There are two main classes of algorithms: post-training quantization (PTQ) and quantization-aware training (QAT). Compared with PTQ, QAT usually leads to a more robust model. It inserts fake-quantization modules into the computational graph of the model to simulate the effect of quantization during training, where the straight-through estimator (STE) [3] is typically adopted to approximate the gradient of the quantization function. To further mitigate the quantization error, LSQ [11] makes s learnable, like other network parameters (i.e., z, a, b). We adopt LSQ as our quantization method, following previous works [5, 15].
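To make the QAT step concrete, a minimal PyTorch sketch of a fake-quantized ReLU with a learnable scale (LSQ-style) and a straight-through estimator is given below; the module and parameter names are illustrative rather than the exact implementation used in our experiments.

```python
import torch
import torch.nn as nn

class QuantReLU(nn.Module):
    """Fake-quantized ReLU for QAT: s * clip(round(x/s), 0, n) with a
    learnable scale s (LSQ-style) and a straight-through estimator."""
    def __init__(self, n_levels=15, init_scale=1.0):
        super().__init__()
        self.n = n_levels                                  # upper bound b = L
        self.s = nn.Parameter(torch.tensor(float(init_scale)))  # learnable scale

    def forward(self, x):
        v = x / self.s
        # straight-through estimator: round in the forward pass,
        # identity gradient in the backward pass
        v = v + (v.round() - v).detach()
        v = torch.clamp(v, 0, self.n)                      # lower bound a = 0
        return v * self.s
```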

3 Method

3.1 SNN Model

Figure 2: The computational graphs of the BIF neuron (a) and the RBIF neuron (b). The recurrent connections in (b) are in spike form; \bm{s}_{k-1}(t) is charged into \bm{H}(t) at time-step t.

3.1.1 Bipolar Integrate-and-Fire Neuron

To mitigate the sequential error (the phenomenon that spikes are generated in spiking neurons where they should not be), we adopt the bipolar integrate-and-fire (BIF) neuron as our basic neuron (fig. 2 (a)). The overall dynamics of the BIF neuron can be expressed as follows:

\bm{H}^{l}(t) = \bm{V}^{l}(t-1) + \bm{W}^{l}\bm{s}^{l-1}(t)\lambda^{l-1},  (2)
\bm{V}^{l}(t) = \bm{H}^{l}(t) - \bm{s}^{l}(t)\lambda^{l}.  (3)

where \bm{H}^{l}(t) and \bm{V}^{l}(t) represent the membrane potential before and after firing, and \bm{W}^{l} denotes the synaptic weight between layer l-1 and layer l. To minimize information loss, we adopt the “reset-by-subtraction” mechanism [25]. Here, \bm{s}^{l}(t) denotes the bipolar output spike at time-step t and \lambda^{l} represents the threshold of layer l. We mitigate the sequential error by allowing \bm{s}^{l}(t) to be either positive or negative, while a spike tracer \bm{S}^{l}(t) records the sum of spikes. The firing rules are described by the equations below.

\bm{S}^{l}(t) = \bm{S}^{l}(t-1) + \bm{s}^{l}(t),  (4)

where \bm{S}^{l}(t) \in \{0, 1, \dots, S^{l}_{\max}\}.

\bm{s}^{l}(t) = \begin{cases} 1, & \bm{H}^{l}(t) \geq \lambda^{l} \ \&\ \bm{S}^{l}(t-1) < S^{l}_{\max} \\ 0, & \text{otherwise} \\ -1, & \bm{H}^{l}(t) < 0 \ \&\ \bm{S}^{l}(t-1) > 0 \end{cases}  (5)
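The dynamics in eqs. 2-5 can be summarized in a short sketch of one BIF update step (a simplified PyTorch illustration; tensor shapes and variable names are assumptions):

```python
import torch

def bif_step(v, S, x_in, lam, S_max):
    """One BIF time-step (eqs. 2-5).
    v: membrane potential V^l(t-1); S: spike tracer S^l(t-1);
    x_in: weighted input W^l s^{l-1}(t) lambda^{l-1}; lam: threshold lambda^l."""
    h = v + x_in                                   # eq. 2: charge
    pos = (h >= lam) & (S < S_max)                 # positive spike condition (eq. 5)
    neg = (h < 0) & (S > 0)                        # negative spike condition (eq. 5)
    s = pos.float() - neg.float()                  # bipolar spike in {-1, 0, 1}
    S = S + s                                      # spike tracer update (eq. 4)
    v = h - s * lam                                # eq. 3: reset by subtraction
    return v, S, s
```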

3.1.2 Recurrent Bipolar Integrate-and-Fire Neuron

As RNNs introduce external recurrent connections, the computation graph differs from that of linear and convolutional layers. Accordingly, the BIF pattern is not compatible with RNN cells, because it would lead to illegal non-spiking recurrent connections after conversion. To address the problem, we propose a novel neuron called the recurrent bipolar integrate-and-fire (RBIF) neuron (fig. 2 (b)). The neural dynamics of the RBIF are defined as:

Figure 3: Conversion pipelines. The conversion pipeline has two branches: the top one is RNN-Morph and the bottom one is CNN-Morph. In general, both sub-pipelines can be divided into two steps: quantization and Neuron-Morph.

\bm{H}_{k}^{l}(t) = \bm{V}_{k}^{l}(t-1) + \bm{W}_{ih}\bm{s}_{k}^{l-1}(t)\lambda^{l-1} + \bm{W}_{hh}\bm{s}_{k-1}^{l}(t)\lambda^{l},  (6)
\bm{V}_{k}^{l}(t) = \bm{H}_{k}^{l}(t) - \bm{s}_{k}^{l}(t)\lambda^{l}.  (7)

Here we use the subscript k, as distinct from the SNN time-step t, to denote the k-th input of the RNN sequence. \bm{W}_{ih} and \bm{W}_{hh} denote the learnable input-hidden and hidden-hidden weights, respectively. As shown in fig. 2 (b), for the k-th input of a sequence and the l-th layer, the potential of the RBIF at time-step t, \bm{H}(t), depends on three parts: the inherited potential \bm{V}(t-1), the charge from the previous layer \bm{X}^{l-1} (\bm{s}_{k}^{l-1}(t)\lambda^{l-1} in eq. 6), and the output of the (k-1)-th RBIF at the same time-step, \bm{s}_{k-1}^{l}(t). Note that, to avoid sequential errors, we adopt the same firing rules as described in eqs. 4 and 5.
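Analogously to the BIF sketch above, one RBIF update step for the k-th sequence input (eqs. 6 and 7) can be sketched as follows; again the interface and tensor shapes are illustrative:

```python
import torch

def rbif_step(v, S_k, s_prev_layer, s_prev_input, W_ih, W_hh, lam_prev, lam, S_max):
    """One RBIF time-step for the k-th sequence input (eqs. 6 and 7).
    s_prev_layer: spikes s_k^{l-1}(t) from layer l-1;
    s_prev_input: spikes s_{k-1}^l(t) of the same layer for input k-1."""
    h = v + (s_prev_layer @ W_ih.T) * lam_prev + (s_prev_input @ W_hh.T) * lam  # eq. 6
    pos = (h >= lam) & (S_k < S_max)               # positive spike condition (eq. 5)
    neg = (h < 0) & (S_k > 0)                      # negative spike condition (eq. 5)
    s = pos.float() - neg.float()                  # bipolar spike in {-1, 0, 1}
    S_k = S_k + s                                  # spike tracer (eq. 4)
    v = h - s * lam                                # eq. 7: reset by subtraction
    return v, S_k, s
```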

3.2 Conversion Pipelines

As illustrated in fig. 3, QCRC can simultaneously convert different layers to their corresponding SNN layers via two sub-pipelines, which makes it versatile and suitable for compound models (i.e., models that contain different types of layers) such as CRNNs. We design two conversion pipelines for the different types of layers in networks, which we call CNN-Morph (CNN → QCNN → BIFSNN) and RNN-Morph (RNN → QRNN → RBIFSNN). In brief, the conversion pipeline can be divided into two steps: the quantization process and the Neuron-Morph process.

3.2.1 Quantization.

(1) Operator Substitution: The first step is to make sure all operators in the original ANN are compatible with the SNN. For example, all activation functions should be ReLU, based on the equivalence requirements, before training at full precision. In addition, max-pooling should be replaced by average-pooling because computing maxima with spiking neurons is non-trivial [25]. (2) Activation Substitution: In this step, the ReLU function is replaced by the quantized ReLU function, where the lower bound a is set to 0 and the upper bound b is set to L. After this configuration, the quantized ANN is trained using the protocols defined in [11, 4].
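A minimal sketch of the operator-substitution step for a PyTorch model is shown below; only the max-pooling replacement is illustrated (the handling of other operators is analogous), and the helper name is ours:

```python
import torch.nn as nn

def substitute_operators(model: nn.Module) -> nn.Module:
    """Replace SNN-incompatible operators before QAT: max-pooling becomes
    average-pooling, since computing maxima with spiking neurons is non-trivial."""
    for name, module in model.named_children():
        if isinstance(module, nn.MaxPool2d):
            setattr(model, name, nn.AvgPool2d(module.kernel_size,
                                              module.stride, module.padding))
        else:
            substitute_operators(module)          # recurse into sub-modules
    return model
```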

3.2.2 Neuron-Morph.

(1) Neuron Substitution: Benefiting from neuronal equivalence (section 3.3), the synaptic weights of a quantized ANN can be directly mapped to their corresponding SNNs. Specifically, BIF neurons are converted from convolutional/linear neurons, while RBIF neurons are converted from recurrent neurons. (2) Neuron Configuration: The last step of conversion is to configure the BIF/RBIF neuron attributes (i.e., \lambda^{l}, S_{\max}^{l}, \bm{V}_{k}^{l}(0)) and to set the s-analog encoding method for input and bias based on the QCRC equivalence requirements. The s-analog encoding is the prerequisite for conversion, ensuring that the inputs to the l-th layer of the ANN and the SNN are the same. Two operations are performed: a) the current X is charged into the network only at the first time-step, otherwise the input is zero; b) the bias term calculations are turned off after the first time-step.
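A sketch of the inference loop with s-analog encoding is given below; the `snn` call signature is hypothetical and only illustrates that the analog current is injected once and the bias is switched off afterwards:

```python
import torch

def run_s_analog(snn, x, T):
    """s-analog encoding: the analog current X is charged into the converted SNN
    only at the first time-step; for t > 1 the input is zero and bias terms are
    turned off (they can be absorbed into the initial potential V(0))."""
    accumulated = 0
    for t in range(1, T + 1):
        inp = x if t == 1 else torch.zeros_like(x)   # operation (a)
        out = snn(inp, use_bias=(t == 1))            # operation (b), hypothetical flag
        accumulated = accumulated + out              # accumulate the spiking output
    return accumulated
```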

3.3 Theoretical Equivalence in QCRC

Table 2: Summary of notations in this paper.
Symbol          Definition                          Symbol          Definition
l               Layer index                         s_k^l(t)        Output spike for the k-th input at time-step t
k               RNN input index                     S_k^l(t)        Spike tracer² at time-step t
t               SNN time-step                       S_max^l         Maximum value in the spike tracer
H_k^l(t)        Potential before firing             x_k^l(t)        UPP¹ for the k-th input at time-step t
V_k^l(t)        Potential after firing              X_k^l(t)        UPP tracer at time-step t
T               Total time-steps                    n               Quantization level in ANN
λ^l             Trainable threshold in ANN          s               Learnable quantization scale in ANN
W_hh^l          Learnable hidden-hidden weights     ⌊·⌉             Round operation
W_ih^l          Learnable input-hidden weights      clip(x, a, b)   Clip function that limits x between a and b

  1. UPP: unweighted postsynaptic potential.
  2. The spike tracer records the sum of the first t values.

Theorem 3.1.

Assume a quantized CNN with ReLU activation function, parameterized by \bm{W}^{l}, is converted to a BIFSNN based on CNN-Morph with s-analog encoding. Then the accumulated outputs of the SNN are equal to the quantized CNN outputs when T is long enough that the remaining membrane potential is insufficient to fire a spike.

Proof.

The proof is given in the appendix (Theorem A.1). ∎

Theorem 3.2.

Suppose an RNN with ReLU activation function, parameterized by \bm{W}_{ih} and \bm{W}_{hh}, is quantized into n quantization levels with quantization scale s:

\bm{h}_{k} = s \cdot \mathrm{clip}\!\left(\left\lfloor \dfrac{\bm{W}_{ih}\bm{x}_{k} + b_{ih} + \bm{W}_{hh}\bm{h}_{k-1} + b_{hh}}{s} \right\rceil, 0, n\right).  (8)

If an RBIFSNN is converted from the QRNN with \bm{V}_{k}^{l}(0) = 0.5s, S_{\max}^{l} = n, \lambda^{l} = s, and s-analog encoding is adopted, then for any k-th input of the RNN sequence, the accumulated output of the SNN is equal to the QRNN output:

\bm{X}_{k}^{l}(T) = \bm{h}_{k},  (9)

when T is long enough that the remaining membrane potential is not sufficient to fire a spike.

Proof.

The key idea of QRNN-RBIF conversion is that for each RNN sequence input, the activation value of the RNN neuron can be equivalently mapped to the accumulated output of the SNN neuron. Based on this, we first combine eq. 6 and eq. 7 to get the potential update equation:

\bm{V}_{k}^{l}(t) - \bm{V}_{k}^{l}(t-1) = \bm{W}_{ih}\bm{s}_{k}^{l-1}(t)\lambda^{l-1} + \bm{W}_{hh}\bm{s}_{k-1}^{l}(t)\lambda^{l} - \bm{s}_{k}^{l}(t)\lambda^{l}.  (10)

Summing eq. 10 over t = 1 to the inference time-step T, we have:

\bm{V}_{k}^{l}(T) - \bm{V}_{k}^{l}(0) = \bm{W}_{ih}\lambda^{l-1}\sum_{t=1}^{T}\bm{s}_{k}^{l-1}(t) + \bm{W}_{hh}\lambda^{l}\sum_{t=1}^{T}\bm{s}_{k-1}^{l}(t) - \lambda^{l}\sum_{t=1}^{T}\bm{s}_{k}^{l}(t),  (11)

where \sum_{t=1}^{T}\bm{s}_{k}^{l}(t) = \sum_{t=1}^{T}(\bm{S}_{k}^{l}(t) - \bm{S}_{k}^{l}(t-1)) = \bm{S}_{k}^{l}(T) - \bm{S}_{k}^{l}(0) according to eq. 4. If we set \bm{S}_{k}^{l}(0) = 0, eq. 11 can be simplified as

\bm{V}_{k}^{l}(T) - \bm{V}_{k}^{l}(0) = \bm{W}_{ih}\lambda^{l-1}\bm{S}_{k}^{l-1}(T) + \bm{W}_{hh}\lambda^{l}\bm{S}_{k-1}^{l}(T) - \lambda^{l}\bm{S}_{k}^{l}(T).  (12)

Then, we divide both sides of eq. 12 by the threshold \lambda^{l}. After a simple rearrangement, we obtain the expression for the spike tracer:

\bm{S}_{k}^{l}(T) = \dfrac{\bm{W}_{ih}\lambda^{l-1}\bm{S}_{k}^{l-1}(T) + \bm{W}_{hh}\lambda^{l}\bm{S}_{k-1}^{l}(T) + \bm{V}_{k}^{l}(0) - \bm{V}_{k}^{l}(T)}{\lambda^{l}}.  (13)

When the simulation time-step T is long enough that the remaining membrane potential \bm{V}_{k}^{l}(T) is insufficient to fire a spike, eq. 13 can be written as

\bm{S}_{k}^{l}(T) = \left\lfloor \dfrac{\bm{W}_{ih}\lambda^{l-1}\bm{S}_{k}^{l-1}(T) + \bm{W}_{hh}\lambda^{l}\bm{S}_{k-1}^{l}(T) + \bm{V}_{k}^{l}(0)}{\lambda^{l}} \right\rfloor,  (14)

where \bm{S}_{k}^{l}(T) \in \{0, 1, \dots, S^{l}_{\max}\}. Multiplying both sides of eq. 14 by \lambda^{l} and inserting the clip function, we get the final equation:

\bm{X}_{k}^{l}(T) = \lambda^{l} \cdot \mathrm{clip}\!\left(\left\lfloor \dfrac{\bm{W}_{ih}\bm{X}_{k}^{l-1}(T) + \bm{W}_{hh}\bm{X}_{k-1}^{l}(T) + \bm{V}_{k}^{l}(0)}{\lambda^{l}} \right\rfloor, 0, S^{l}_{\max}\right),  (15)

where \bm{X}_{k}^{l}(T) = \lambda^{l}\bm{S}_{k}^{l}(T) by definition.

Equation 15 describes the relationship between the unweighted postsynaptic potentials of RBIF neurons in adjacent layers. By setting \lambda^{l} = s, S_{\max}^{l} = n, and \bm{V}_{k}^{l}(0) = 0.5s + b_{ih} + b_{hh}, eq. 15 and eq. 8 become equivalent, which leads to the conclusion in eq. 9. Note that setting \bm{V}_{k}^{l}(0) = 0.5s, which is called the pre-charge method in [5], makes the floor operator \lfloor\cdot\rfloor and the round operator \lfloor\cdot\rceil equal. ∎
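As a sanity check of Theorem 3.2, the following self-contained sketch simulates a single quantized recurrent layer and its converted RBIF counterpart under the stated configuration (\lambda^{l} = s, S_{\max}^{l} = n, \bm{V}_{k}^{l}(0) = 0.5s + b); the dimensions, seed, and number of time-steps are arbitrary choices, with T taken large enough for the residual potential to settle.

```python
import torch

torch.manual_seed(0)
d_in, d_h, K, T = 4, 3, 5, 128          # input dim, hidden dim, sequence length, SNN steps
s, n = 0.1, 16                          # quantization scale and quantization level
W_ih, W_hh = torch.randn(d_h, d_in) * 0.3, torch.randn(d_h, d_h) * 0.3
b = torch.randn(d_h) * 0.1              # combined bias b_ih + b_hh
xs = torch.rand(K, d_in)

# Reference: quantized RNN, eq. 8
h, h_ref = torch.zeros(d_h), []
for k in range(K):
    a = W_ih @ xs[k] + W_hh @ h + b
    h = s * torch.clamp(torch.round(a / s), 0, n)
    h_ref.append(h)

# Converted RBIFSNN (eqs. 4-7) with s-analog encoding
lam, S_max = s, n
spk_prev = torch.zeros(T, d_h)          # spike train of the previous sequence input
for k in range(K):
    V, S = 0.5 * s + b, torch.zeros(d_h)                     # pre-charge absorbs the bias
    spk = torch.zeros(T, d_h)
    for t in range(T):
        inp = W_ih @ xs[k] if t == 0 else torch.zeros(d_h)   # s-analog encoding
        H = V + inp + (W_hh @ spk_prev[t]) * lam             # eq. 6
        pos = (H >= lam) & (S < S_max)
        neg = (H < 0) & (S > 0)
        sp = pos.float() - neg.float()
        S, V, spk[t] = S + sp, H - sp * lam, sp
    spk_prev = spk
    assert torch.allclose(lam * S, h_ref[k]), "accumulated SNN output should equal h_k"
```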

4 Experiments

In this section, we obtain optimal SNNs in sequence learning via CRNN-SNN conversion. We validate the effectiveness of our method against state-of-the-art learning-based and conversion-based approaches, demonstrating the advantages of our method on different datasets (i.e., the S-MNIST/PS-MNIST benchmarks [18] and the collision avoidance dataset [17]). We further experimentally demonstrate the lossless conversion of QCRC and the effectiveness of s-analog encoding in the ablation study.

4.1 Implementation details

The experiments exactly follow the quantization and conversion stages introduced in section 3.2. Both ANN quantization training and SNN implementation are carried out with PyTorch. Unless otherwise specified, the optimizer is Adam and the learning rate scheduler is cosine annealing.

4.1.1 S-MNIST and PS-MNIST

We only apply a normalization transform to the dataset. The main hyper-parameters of the models follow their corresponding papers [2, 31, 12]. The number of training epochs and the batch size are 200 and 256 for all models. The learning rate of our model is 0.0002. The cross-entropy (CE) loss is used to evaluate the difference between the estimated and actual values.

4.1.2 Obstacle detection and avoidance

The total dataset (20 training and 5 validation traces) is split into sub-sequences of length 32 and fed into the model sequentially. The input from the LiDAR scanner is fed to the main structure, while the estimated robot pose is first clipped to the range of -1.0 to 1.0 and then concatenated with the output of layer 5 before being sent to the next layer. We follow a network similar to [17], consisting of an RNN preceded by a set of convolutional layers. The number of epochs and the batch size are set to 1000 and 32, respectively. We use a fixed learning rate of 0.0001 and train the model with the mean squared error (MSE) loss function.
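As an illustration of this preprocessing, a minimal sketch is given below; the tensor shapes, variable names, and the handling of the incomplete tail are our assumptions:

```python
import torch

def make_subsequences(lidar, pose, seq_len=32):
    """Split one recorded trace into sub-sequences of length 32.
    lidar: (T0, num_beams) LiDAR scans; pose: (T0, pose_dim) estimated robot poses."""
    pose = pose.clamp(-1.0, 1.0)                  # clip the pose estimate to [-1, 1]
    n = (lidar.shape[0] // seq_len) * seq_len     # drop the incomplete tail (assumption)
    lidar = lidar[:n].reshape(-1, seq_len, lidar.shape[1])
    pose = pose[:n].reshape(-1, seq_len, pose.shape[1])
    return lidar, pose                            # fed to the model sequentially
```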

4.2 Sequential MNIST

The sequential and permuted-sequential MNIST (S/PS-MNIST) [18] are widely used benchmarks for verifying the ability to learn long-term dependencies. Each image is divided into 784 pixels and sent to the network pixel by pixel, as sketched below. The network is asked to predict the class of the MNIST image only after all 784 pixels have been fed sequentially to the recurrent network. Therefore, achieving high accuracy on the “pixel-by-pixel MNIST” problem is not easy, because neurons must be able to learn from long contexts.
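A minimal sketch of the pixel-by-pixel input construction (and the permuted variant) follows; the permutation seed is an arbitrary assumption, the only requirement being that one fixed permutation is shared across the whole dataset:

```python
import torch

def to_pixel_sequence(img, permutation=None):
    """Turn a 28x28 MNIST image into a length-784 sequence of single pixels.
    With a fixed random permutation, this yields the PS-MNIST variant."""
    seq = img.reshape(-1, 1)                      # (784, 1): one pixel per RNN step
    if permutation is not None:
        seq = seq[permutation]                    # same permutation for every image
    return seq

# e.g., one fixed permutation shared across the dataset (seed is an assumption)
perm = torch.randperm(784, generator=torch.Generator().manual_seed(0))
```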

Benefiting from the high scalability of our method, we use the indRNN cell [20] as the original RNN model. We set the quantization step and the number of time-steps to 128 and 512, respectively. A performance comparison is given in Table 3. RBIF reads the image pixel by pixel without any extra encoding process, just like the LSTM. It outperforms all models, achieving 99.16% and 94.95% classification accuracy on S-MNIST and PS-MNIST, respectively. Note that the accuracy of pr-ALIF (94.3%) on PS-MNIST is not included for comparison because its adoption of a sliding window is unfair to the other models. We also compare our method with conversion-based methods. Their performance deteriorates rapidly as the sequence gets longer due to the propagation of sequential error, which we explain further in section 4.4.2.

Table 3: Test accuracy (%) on S-MNIST and PS-MNIST. “-” refers to data not reported or not reproducible. Current and compared best results are in bold and grey, respectively.
Dataset     | RBIF  | LSTM [14] | pr-ALIF [32] | ALIF [31] | MPSN [12] | LSNN [2] | LIF
S-MNIST     | 99.16 | 98.2      | 98.7         | 97.82     | 63.6      | 96.4     | 28.6
PS-MNIST    | 94.95 | 88        | 94.3¹        | 91        | 34.9      | -        | 23.9

  1. Not included for comparison due to the adoption of a sliding window.

4.3 Obstacle detection and avoidance

To explore the application of SNNs to sequential robotic tasks, we conduct robot navigation experiments using the dataset proposed in [17]. The objective of this task is to navigate a Pioneer 3-AT mobile robot safely through obstacles. Specifically, the network input comprises data streams from a 270-degree 2D LiDAR scanner and a time series of estimated robot poses sampled at 10 Hz. By generating a decision in the form of a target angular velocity, the network maneuvers the robot safely around the obstacles.

Table 4 reports the results on the collision avoidance dataset; our method outperforms the others at every time-step with a lower MSE loss. Note that, because of the incompatibility problem mentioned in section 1, we add a recurrent structure to the neurons of the compared works in a way that is compatible with their conversion algorithms. The results clearly show that the IF neuron in QCFS still suffers considerably from conversion errors. Calibrating offset spikes [13] can bridge the gap between ANN and SNN to a certain extent, but cannot control the errors at the source. Fast-SNN achieves nearly lossless performance but cannot eliminate the sequential errors entirely. In contrast, QCRC achieves the lowest loss of 0.0569 when T = 8, which is equal to the loss of the ANN. Even with only 2 time-steps, we achieve a very low loss of 0.1180.

Table 4: Experimental results on the collision avoidance dataset (MSE loss). σ denotes the time-steps used to calculate offset spikes. “-” means no result can be obtained under L = 8 according to [15]. Best results are in bold.
Method             | Neuron    | ANN    | T=2    | T=4    | T=8    | T=16   | T=32
QCFS [6]           | IF        | 0.0907 | 0.2726 | 0.1929 | 0.1469 | 0.1208 | 0.1078
Offset (σ=6) [13]  | IF        | 0.0907 | 0.2187 | 0.1414 | 0.1125 | 0.0985 | 0.0969
Fast-SNN [15]      | signed IF | 0.0669 | 0.2364 | 0.1356 | 0.0694 | -      | -
Ours               | BIF/RBIF  | 0.0569 | 0.1180 | 0.0780 | 0.0569 | 0.0569 | 0.0569
Figure 4: Conversion error study. (Left) The L1 norm between the feature map of the accumulated spiking output and the feature map of the quantized activation output. (Right) The summed output of the activation layers in QCRC. “C/L” refers to convolutional/linear layers and “R” denotes recurrent layers.

4.4 Ablation Study

4.4.1 Conversion Error Analysis

We perform a two-fold validation of the equivalence of QCRC, using the dataset and network from section 4.3. The choice of a CRNN network makes the analysis more comprehensive, since it contains the three commonly used layer types (i.e., linear, convolutional, and recurrent layers). To measure the conversion error directly, we use a batch of data to visualize the L1 norm (a.k.a. Manhattan distance) between the QANN and its counterpart SNN for the intermediate activation layers, as shown on the left of fig. 4. The use of IF neurons keeps the L1 norm of QCFS at a large value due to the accumulating sequential error. Although Fast-SNN proposes the signed IF neuron and a layer-wise fine-tuning scheme to mitigate the sequential error, its m-analog encoding still leads to inequivalence at the model level and degrades performance at deeper layers. Compared with them, only QCRC reaches the truly lossless level (i.e., the L1 norm between the QANN and the converted SNN is 0). Furthermore, the bar graphs (right part of Figure 4) show that the sum of activations for each neuron layer in the QANN and the SNN is equal.
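The layer-wise L1 norm on the left of fig. 4 can be computed with a few lines, assuming the intermediate activations of the QANN and the accumulated spiking outputs of the SNN have already been collected (e.g., via forward hooks); the helper below is illustrative:

```python
import torch

def layer_l1_gap(qann_acts, snn_accumulated):
    """L1 norm (Manhattan distance) between the quantized-ANN activation and the
    accumulated spiking output of the corresponding SNN layer, reported per layer."""
    return [torch.sum(torch.abs(a - x)).item()
            for a, x in zip(qann_acts, snn_accumulated)]
```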

Figure 5: The absolute distance between the quantized ANN and the SNN for the two analog encoding methods. The results show the first 15 time-steps of the sequence (left) and the entire sequence (right).

4.4.2 Effect of s-analog Encoding

We perform experiments on sequential MNIST to explore the effects of different analog encoding methods. The m-analog encoding (used in [6, 15]) charges the current X into the network at every time-step, while the s-analog encoding we use only charges X into the network at the first time-step. We randomly select an image from MNIST and evaluate the two analog encodings using the same network with the same settings. Figure 5 depicts the absolute distance between the quantized ANN and the SNN for the first 15 time-steps of the sequence (left) and the entire sequence (right). We can see that m-analog encoding only mitigates the sequential error. If the sequential error is non-zero for one input (e.g., the fourth input of the sequence), it propagates pixel by pixel through the recurrent structure and is magnified by the larger quantization error. When the sequence gets longer, the accumulated error grows rapidly and becomes uncontrollable. Since we only need the output of the last time-step, the accumulated error severely degrades performance. In contrast, the adoption of s-analog encoding together with our conversion pipelines guarantees lossless conversion at every step of the sequence.

5 Discussion and Conclusion

This paper proposes the comprehensive QCRC framework to help SNNs overcome the challenge of not achieving ANN-level results in sequence learning, enabling SNNs to achieve results comparable to RNNs. To overcome the incompatibility problem of RNN cells, we propose the RBIF neuron. Based on this, we further demonstrate lossless CRNN-SNN conversion through the design of the conversion pipelines and s-analog encoding. The framework includes two sub-pipelines (i.e., CNN-Morph and RNN-Morph), which support end-to-end conversion of complex models with both recurrent and convolutional structures into SNNs and are not limited by the type of dataset. Ours is the first work to implement lossless RNN-SNN conversion on time series tasks. Our results show promising advantages compared to state-of-the-art conversion- and learning-based methods, and they answer the question in section 1: we can easily achieve ANN-level performance for SNNs in sequence learning via CRNN-SNN conversion. We believe our work paves the way for the application of SNNs to time series tasks.

5.0.1 Acknowledgment.

This work is partially supported by the National Key R&D Program of China (2022YFB4500200), the National Natural Science Foundation of China (No. 62102257), the Biren Technology–Shanghai Jiao Tong University Joint Laboratory Open Research Fund, the Microsoft Research Asia Gift Fund, and the Shandong Normal University Undergraduate Research Fund.

Appendix

Theorem A.1.

Assume a quantized CNN with ReLU activation function, parameterized by \bm{W}^{l}, is converted to a BIFSNN based on CNN-Morph with s-analog encoding. Then the accumulated outputs of the SNN are equal to the quantized CNN outputs when T is long enough that the remaining membrane potential is insufficient to fire a spike.

Proof.

We first combine eq. 2 and eq. 3 to get the potential update equation:

\bm{V}^{l}(t) - \bm{V}^{l}(t-1) = \bm{W}^{l}\bm{s}^{l-1}(t)\lambda^{l-1} - \bm{s}^{l}(t)\lambda^{l}.  (16)

Summing eq. 16 over t = 1 to the inference time-step T, we have:

\bm{V}^{l}(T) - \bm{V}^{l}(0) = \bm{W}^{l}\lambda^{l-1}\sum_{t=1}^{T}\bm{s}^{l-1}(t) - \lambda^{l}\sum_{t=1}^{T}\bm{s}^{l}(t),  (17)

where \sum_{t=1}^{T}\bm{s}^{l}(t) = \sum_{t=1}^{T}(\bm{S}^{l}(t) - \bm{S}^{l}(t-1)) = \bm{S}^{l}(T) - \bm{S}^{l}(0) according to eq. 4. If we set \bm{S}^{l}(0) = 0, eq. 17 can be simplified as:

\bm{V}^{l}(T) - \bm{V}^{l}(0) = \bm{W}^{l}\lambda^{l-1}\bm{S}^{l-1}(T) - \lambda^{l}\bm{S}^{l}(T).  (18)

Then, we divide both sides of eq. 18 by the threshold \lambda^{l}. After a simple rearrangement, we obtain the expression for the spike tracer:

\bm{S}^{l}(T) = \dfrac{\bm{W}^{l}\lambda^{l-1}\bm{S}^{l-1}(T) + \bm{V}^{l}(0) - \bm{V}^{l}(T)}{\lambda^{l}}.  (19)

When the simulation time-step T is long enough that the remaining membrane potential \bm{V}^{l}(T) is insufficient to fire a spike, eq. 19 can be rewritten as:

\bm{S}^{l}(T) = \left\lfloor \dfrac{\bm{W}^{l}\lambda^{l-1}\bm{S}^{l-1}(T) + \bm{V}^{l}(0)}{\lambda^{l}} \right\rfloor,  (20)

where \bm{S}^{l}(T) \in \{0, 1, \dots, S^{l}_{\max}\}. Multiplying both sides of eq. 20 by \lambda^{l}, we get the final equation:

\bm{X}^{l}(T) = \lambda^{l} \cdot \mathrm{clip}\!\left(\left\lfloor \dfrac{\bm{W}^{l}\bm{X}^{l-1}(T) + \bm{V}^{l}(0)}{\lambda^{l}} \right\rfloor, 0, S^{l}_{\max}\right),  (21)

where \bm{X}^{l}(T) = \lambda^{l}\bm{S}^{l}(T) by definition.

Equation 21 describes the relationship between the unweighted postsynaptic potentials of BIF neurons in adjacent layers.

Consider a quantized CNN with quantization scale s and quantization level n:

\bm{X}' = s \cdot \mathrm{clip}\!\left(\left\lfloor \dfrac{\bm{W}^{l}\bm{X}^{l-1} + b}{s} \right\rceil, 0, n\right).  (22)

If we set \lambda^{l} = s, S_{\max}^{l} = n, and \bm{V}^{l}(0) = b + 0.5s, eq. 22 and eq. 21 are equivalent. ∎

References

  • [1] Akopyan, F., Sawada, J., Cassidy, A., Alvarez-Icaza, R., Arthur, J., Merolla, P., Imam, N., Nakamura, Y., Datta, P., Nam, G.J., et al.: Truenorth: Design and tool flow of a 65 mw 1 million neuron programmable neurosynaptic chip. IEEE transactions on computer-aided design of integrated circuits and systems 34(10), 1537–1557 (2015)
  • [2] Bellec, G., Salaj, D., Subramoney, A., Legenstein, R., Maass, W.: Long short-term memory and learning-to-learn in networks of spiking neurons. CoRR abs/1803.09574 (2018), http://arxiv.org/abs/1803.09574
  • [3] Bengio, Y., Léonard, N., Courville, A.: Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432 (2013)
  • [4] Bhalgat, Y., Lee, J., Nagel, M., Blankevoort, T., Kwak, N.: Lsq+: Improving low-bit quantization through learnable offsets and better initialization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. pp. 696–697 (2020)
  • [5] Bu, T., Fang, W., Ding, J., DAI, P., Yu, Z., Huang, T.: Optimal ANN-SNN conversion for high-accuracy and ultra-low-latency spiking neural networks. In: International Conference on Learning Representations (2022), https://openreview.net/forum?id=7B3IJMM1k_M
  • [6] Bu, T., Fang, W., Ding, J., Dai, P., Yu, Z., Huang, T.: Optimal ann-snn conversion for high-accuracy and ultra-low-latency spiking neural networks. arXiv preprint arXiv:2303.04347 (2023)
  • [7] Cao, Y., Chen, Y., Khosla, D.: Spiking deep convolutional neural networks for energy-efficient object recognition. International Journal of Computer Vision 113, 54–66 (2015)
  • [8] Davies, M., Srinivasa, N., Lin, T.H., Chinya, G., Cao, Y., Choday, S.H., Dimou, G., Joshi, P., Imam, N., Jain, S., et al.: Loihi: A neuromorphic manycore processor with on-chip learning. Ieee Micro 38(1), 82–99 (2018)
  • [9] Diehl, P.U., Neil, D., Binas, J., Cook, M., Liu, S.C., Pfeiffer, M.: Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing. In: 2015 International joint conference on neural networks (IJCNN). pp. 1–8. ieee (2015)
  • [10] Ding, J., Yu, Z., Tian, Y., Huang, T.: Optimal ann-snn conversion for fast and accurate inference in deep spiking neural networks. arXiv preprint arXiv:2105.11654 (2021)
  • [11] Esser, S.K., McKinstry, J.L., Bablani, D., Appuswamy, R., Modha, D.S.: Learned step size quantization. arXiv preprint arXiv:1902.08153 (2019)
  • [12] Fang, W., Yu, Z., Zhou, Z., Chen, D., Chen, Y., Ma, Z., Masquelier, T., Tian, Y.: Parallel spiking neurons with high efficiency and ability to learn long-term dependencies. In: Thirty-seventh Conference on Neural Information Processing Systems (2023)
  • [13] Hao, Z., Ding, J., Bu, T., Huang, T., Yu, Z.: Bridging the gap between anns and snns by calibrating offset spikes. arXiv preprint arXiv:2302.10685 (2023)
  • [14] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997)
  • [15] Hu, Y., Zheng, Q., Jiang, X., Pan, G.: Fast-snn: Fast spiking neural network by converting quantized ann. arXiv preprint arXiv:2305.19868 (2023)
  • [16] Le, Q.V., Jaitly, N., Hinton, G.E.: A simple way to initialize recurrent networks of rectified linear units. arXiv preprint arXiv:1504.00941 (2015)
  • [17] Lechner, M., Hasani, R., Rus, D., Grosu, R.: Gershgorin loss stabilizes the recurrent neural network compartment of an end-to-end robot learning scheme. In: 2020 International Conference on Robotics and Automation (ICRA). IEEE (2020)
  • [18] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998). https://doi.org/10.1109/5.726791
  • [19] Li, C., Ma, L., Furber, S.: Quantization framework for fast spiking neural networks. Frontiers in Neuroscience 16, 918793 (2022)
  • [20] Li, S., Li, W., Cook, C., Zhu, C., Gao, Y.: Independently recurrent neural network (indrnn): Building a longer and deeper rnn (2018)
  • [21] Li, Y., Deng, S., Dong, X., Gong, R., Gu, S.: A free lunch from ann: Towards efficient, accurate spiking neural networks calibration. In: International conference on machine learning. pp. 6316–6325. PMLR (2021)
  • [22] Maass, W.: Networks of spiking neurons: the third generation of neural network models. Neural networks 10(9), 1659–1671 (1997)
  • [23] Neftci, E.O., Mostafa, H., Zenke, F.: Surrogate gradient learning in spiking neural networks: Bringing the power of gradient-based optimization to spiking neural networks. IEEE Signal Processing Magazine 36(6), 51–63 (2019)
  • [24] Panda, P., Aketi, S.A., Roy, K.: Toward scalable, efficient, and accurate deep spiking neural networks with backward residual connections, stochastic softmax, and hybridization. Frontiers in Neuroscience 14, 535502 (2020)
  • [25] Rueckauer, B., Lungu, I.A., Hu, Y., Pfeiffer, M., Liu, S.C.: Conversion of continuous-valued deep networks to efficient event-driven networks for image classification. Frontiers in neuroscience 11,  682 (2017)
  • [26] Sengupta, A., Ye, Y., Wang, R., Liu, C., Roy, K.: Going deeper in spiking neural networks: Vgg and residual architectures. Frontiers in neuroscience 13,  95 (2019)
  • [27] Stanojevic, A., Woźniak, S., Bellec, G., Cherubini, G., Pantazi, A., Gerstner, W.: An exact mapping from relu networks to spiking neural networks. Neural Networks 168, 74–88 (2023). https://doi.org/https://doi.org/10.1016/j.neunet.2023.09.011, https://www.sciencedirect.com/science/article/pii/S0893608023005051
  • [28] Wang, Y., Zhang, M., Chen, Y., Qu, H.: Signed neuron with memory: Towards simple, accurate and high-efficient ann-snn conversion. In: International Joint Conference on Artificial Intelligence (2022)
  • [29] Xing, Y., Di Caterina, G., Soraghan, J.: A new spiking convolutional recurrent neural network (scrnn) with applications to event-based hand gesture recognition. Frontiers in neuroscience 14, 590164 (2020)
  • [30] Xu*, Z., You*, K., Wang, X., Guo, Q., He, Z.: Bkdsnn: Enhancing the performance of learning-based spiking neural networks training with blurred knowledge distillation. In: Proceedings of The 18th European Conference on Computer Vision (ECCV) 2024 (2024)
  • [31] Yin, B., Corradi, F., Bohté, S.M.: Effective and efficient computation with multiple-timescale spiking recurrent neural networks. In: International Conference on Neuromorphic Systems 2020. pp. 1–8 (2020)
  • [32] Yin, B., Corradi, F., Bohté, S.M.: Accurate and efficient time-domain classification with adaptive spiking recurrent neural networks. CoRR abs/2103.12593 (2021), https://arxiv.org/abs/2103.12593
  • [33] You*, K., Xu*, Z., Nie*, C., Deng, Z., Guo, Q., Wang, X., He, Z.: Spikezip-tf: Conversion is all you need for transformer-based snn. In: Proceedings of the Forty-First International Conference on Machine Learning (ICML) (2024)