BSNN: Towards Faster and Better Conversion of Artificial Neural Networks to Spiking Neural Networks with Bistable Neurons

Yang Li, Yi Zeng, and Dongcheng Zhao Yang Li is with the Research Center for Brain-inspired Intelligence, Institute of Automation, Chinese Academy of Sciences (CASIA), Beijing 100190, China, and School of Artificial Intelligence, University of Chinese Academy of Sciences (UCAS), Beijing 100190, China, e-mail: [email protected] Zhao is with CASIA and UCAS, Beijing 100190, China.Yi Zeng is with CASIA, UCAS and Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai 200031, China, e-mail: [email protected] corresponding author is Yi Zeng.Manuscript received May 27, 2021; revised XX XX, 2021.

Abstract

The spiking neural network (SNN) computes and communicates information through discrete binary events. It is considered more biologically plausible and more energy-efficient than artificial neural networks (ANN) in emerging neuromorphic hardware. However, due to the discontinuous and non-differentiable characteristics, training SNN is a relatively challenging task. Recent work has achieved essential progress on an excellent performance by converting ANN to SNN. Due to the difference in information processing, the converted deep SNN usually suffers serious performance loss and large time delay. In this paper, we analyze the reasons for the performance loss and propose a novel bistable spiking neural network (BSNN) that addresses the problem of spikes of inactivated neurons (SIN) caused by the phase lead and phase lag. Also, when ResNet structure-based ANNs are converted, the information of output neurons is incomplete due to the rapid transmission of the shortcut path. We design synchronous neurons (SN) to help efficiently improve performance. Experimental results show that the proposed method only needs 1/4-1/10 of the time steps compared to previous work to achieve nearly lossless conversion. We demonstrate state-of-the-art ANN-SNN conversion for VGG16, ResNet20, and ResNet34 on challenging datasets including CIFAR-10 (95.16% top-1), CIFAR-100 (78.12% top-1), and ImageNet (72.64% top-1).

Index Terms:

Spiking Neural Network, Bitability, Neuromorphic Computing, Neural Coding.

I Introduction

Deep learning (or Deep Neural Network, DNN) has made breakthroughs in many fields such as computer vision [1, 2, 3], natural language processing [4, 5], and speech processing [6], and has even surpassed humans in some specific fields. But many difficulties and challenges also need to be overcome in the development process of deep learning [7, 8, 9, 10]. One concerning issue is that researchers pay more attention to higher computing power and better performance while ignoring the cost of energy consumption [11]. Taking natural language processing tasks as an example, the power consumption and carbon emissions of Transformer [12] model training are very considerable. In recent years, the cost advantages and environmental advantages of low-energy AI have attracted the attention of researchers. They design compression algorithms [13, 14]to enable artificial neural networks (ANN) to significantly reduce network parameters and calculations while maintaining their original performance. Another part of the work focuses on computing architecture [15], less computational energy consumption can be achieved by designing hardware that is more suitable for the operational characteristics of neural network models. But the problem of the high computational complexity of deep neural networks still exists. Therefore, the spiking neural network, known as the third-generation artificial neural network [16], has received more and more attention [17, 18, 19, 20, 21].

Spike neural networks (SNNs) process discrete spike signals through the dynamic characteristics of spiking neurons, rather than real values, and are considered to be more biologically plausible and more energy-efficient [22, 23, 24]. For the former, the event-type information transmitted by neurons in SNN is the spike, which is generated when the membrane potential reaches the neuron firing threshold. Thus, its information processing process is more in line with biological reality than traditional artificial neurons [25, 26, 27]. For the latter, the information in SNN is based on the event, e.g., neurons that do not emit spikes do not participate in calculations, and the information integration of neurons is an accumulate (AC) operation, which is more energy-efficient than the multiply-accumulate (MAC) operations in ANN [28, 29]. Therefore, researchers put forward the concept of neuromorphic computing [30, 31, 32], which realizes the more biologically plausible SNN on hardware. It shows more significant progress in fast information processing and energy saving. But due to the non-differentiable characteristics of SNN, training SNN is still a challenging task. Because of the lack of the derivative of the output, the common backpropagation algorithm cannot be used directly. How to use SNN for effective reference has become a problem for researchers.

Taking inspiration from the brain, such as Spike-Timing Dependent Plasticity (STDP) [33, 34], lateral inhibition [35, 36], Long-Term Potentiation (LTP) [37], and Long-Term Depression (LTD) [38] are effective methods. By properly integrating different neural mechanisms in the brain [39], SNN can be effectively trained. Because most of these methods are unsupervised, researchers often add SVM [40] or other classifiers for supervised learning [18, 41] or directly do learning in an unsupervised manner [19, 42]. All of them are of great importance for further enhancing the interpretability of SNN and exploring the working mechanism of the human brain. However, this optimization method that only uses local neural activities is challenging to achieve high performance and be applied to complex tasks. Some researchers try to train SNNs through approximated gradient algorithms [43, 44, 45, 46], where the backpropagation algorithm can be applied to the SNN by continuous the spike firing process of the neuron. However, this method suffers from difficulty in convergence and requires a lot of time in training procedure in the deep neural networks (DNN) because it is difficult to balance the whole firing rate. For the above two methods, they perform poorly in large networks and complex tasks. We believe that the inability to obtain an SNN with effective reference ability is a key issue in the development and application of SNN.

Refer to caption — Figure 1: Illustration of ANN-SNN Conversion

Recently, the conversion method has been proposed to convert the training result of ANN to SNN [47]. The ANN-SNN conversion method maps the trained ANN parameters with ReLU activation function to SNN with the same topology as illustrated in Figure 1, which makes it possible for SNN to obtain extremely high performance at a very low computational cost. But direct mapping will lead to severe performance degradation [48]. Diehl et al. [49] propose the data-based normalization method, which scales the parameters with the maximum activation values of each layer in ANN, improving the performance of the converted SNN. Reuckauer et al. [50] and Han et al. [51] use integrate-and-fire (IF) neurons with soft reset to make SNN achieve performance comparable to ANN. Nonetheless, it usually takes more than 1000-4000 time steps to achieve better performance on complex datasets. And when converting ResNet [52] to SNN, researchers suffer from a certain performance loss [53, 54, 55] because the information received by the output neuron of the residual block is incomplete with the spikes on the shortcut path arriving earlier.

Bistability is a special activity form in biological neurons [56]. Neurons can switch between spike and non-spike states under the action of neuromodulating substances, thus exhibiting short-term memory function [57]. Inspired from the bistability characteristic, we focus on improving the performance of SNN and propose a bistable spiking neural network (BSNN), which combines phase coding and the bistability mechanism that greatly improves the performance after conversion and reduces the time delay. For high-performance spiking ResNet, we propose synchronous neurons (SN), which can help spikes in the residual block synchronously reach the output neurons from input neurons through two paths. The experimental results demonstrate they can help achieve nearly lossless conversion and state-of-the-art in MNIST, CIFAR-10, CIFAR-100, and ImageNet while significantly reduce time delay. Our contributions can be summarized as follows:

•

We propose a novel BSNN that combines phase coding and bistability mechanism. It effectively solves the problem of SIN and greatly reduces the performance loss and time delay of the converted SNN.
•

We propose synchronous neurons to solve the problem that information in the spiking ResNet cannot synchronously reach the output neurons from two paths.
•

We achieve state-of-the-art on the MNIST, CIFAR-10, CIFAR-100, and ImageNet datasets, verifying the effectiveness of the proposed method.

II Related Work

Many conversion methods have been proposed in order to obtain high-performance SNN. According to the encoding method they can be divided into three kinds.

Temporal Coding Based Conversion. Temporal coding uses neural firing time to encode the input to spike trains and approximate activations in ANN [58]. However, since neurons in the hidden layer need to accumulate membrane potential to spike, when the activation value is equal to the maximum, neurons in deep layers are difficult to spike immediately, making this method difficult to convert deep ANNs. Zhang et al. [59] use ticking neurons to modify the method above, which transfers information layer by layer. Nevertheless, this method is less robust and difficult to be used in models with complex network structures like the residual block.

Rate Coding Based Conversion. Unlike temporal coding, the rate coding-based conversion method uses the firing rates of spiking neurons to approximate the activation values in the ANN [47]. Diehl et al. [49] propose data-based and model-based normalization, which use the maximum activation value of neurons in each layer to normalize the weights. When disturbed by noise, the normalization parameter may be quite large, which will cause the weight smaller and the time to spike longer. Researchers propose to use the p-th largest value for normalization operation, thereby greatly improving robustness and reducing time delay [50]. Therefore, the conversion method based on rate coding has achieved better performance in ResNet [53] and Inception Networks [55, 54]. However, the processing speed of spikes on the paths with different processing units is different. The information received by the output neuron is delayed to various degrees when spreading on these wider networks. The difference between the firing rate and the activation value in the ANN will be greater. Therefore, the performance loss and the time delay of the SNN is more significant when converting these ANNs.

Phase Coding Based Conversion. To overcome the large time delay of the converted SNN, researchers propose SNN with weighted spike, which assigns different weights to the spikes in different phases to pack more information in the spike [60]. Nonetheless, when neurons do not spike in the expected phase, the spikes of neurons in hidden layers will deviate from the coding rules to a certain extent, resulting in poor performance on complex datasets and large networks. Phase coding and burst coding are combined to speed up the information transmission [61], but still needs 3000 simulation time on CIFAR-100 dataset.

III Proposed BSNN

In this section, we introduce the spiking neurons and encoding methods in detail, and then analyze the reasons for the loss of conversion performance based on the process of phase coding conversion methods. The detailed information of the model to reduce conversion loss and time delay is described. And we will introduce the effect of synchronized neurons in spiking ResNet.

III-A Spiking Neuron and Encoding

The most commonly used spiking neuron model is the integrate-and-fire (IF) model. The IF neuron continuously receives spikes from the presynaptic neuron and dynamically changes its membrane potential. When it exceeds the threshold, the neuron spikes and the membrane potential is traditionally reset to zero. But it will cause a lot of information loss. We follow [50] and use the soft reset to subtract the threshold from the membrane potential:

\displaystyle V_{i,t}^{l}=V_{i,t-1}^{l}+\sum_{j}w_{ij}\delta_{j,t}^{l-1},

(1)

\displaystyle if\quad V_{i,t}^{l}\geq V_{th},\quad\begin{cases}V_{i,t}^{l}=V_{i,t}^{l}-V_{th},\\ \delta_{i,t}^{l}=1.\end{cases}

(2)

where $V_{i,t}^{l}$ represents the membrane potential of neuron $i$ in layer $l$ at time $t$ , $w_{ij}$ is the weight connecting the neuron $j$ and $i$ , $\delta_{j,t}^{l-1}$ is the spike of neuron $j$ in layer ( $l-1$ ) at time $t$ .

The spike trains can be encoded by real values with different encoding methods. The real value is equal to the firing rate in rate coding, which is the number of spikes in a period, or the ratio of the spike time and total simulation time $T$ in temporal coding, which is:

\displaystyle a_{rate}=\frac{N}{T},\quad a_{temporal}=\frac{t_{spike}}{T},

(3)

where $N$ denotes the number of spikes, $t_{spike}$ is the time of the first spike. Previous work shows a considerable time delay with the use of rate and temporal coding. For example, they all need at least 1000 time steps to represent 0.001 of input.

Therefore, we use phase coding [60] to encode activation values to spike trains. It can pack more information in one spike by assigning different weights to spikes and thresholds of each phase. Thus, phase coding is more energy efficient. Experiments show a shorter time is taken to accurately represent the real value when phase coding is used:

\displaystyle a_{j}^{l}=\frac{1}{n}\sum\limits_{k=1}^{nK}S_{k}\delta_{j,k}^{l},\quad V_{th,t}=S_{k}V_{th},

(4)

where $a_{j}^{l}$ is the activation value of neuron $j$ in layer $l,$ $K$ is the number of the phase of a period, $n=\frac{T}{K}$ is the number of the period, the phase function $S$ is represented by

\displaystyle S_{t}=2^{-(1+\mod(t,K))}.

(5)

III-B Framework of ANN-SNN Conversion

To make SNN work, we need to do some processing on ANN before conversion. We use $a_{i}^{l}=\max\{0,\sum\limits_{j}w_{ij}a_{j}^{l-1}+b_{i}^{l}\}$ to denote the arbitrary activation value in the ANN, $w_{ij}$ and $b_{i}^{l}$ are weight and bias respectively. The maximum firing rate in SNN is one because neurons emit one spike at most at every time step. Thus, we normalize the weight and bias with the data-norm method [50] by

\displaystyle\hat{w}_{ij}^{l}=w_{ij}^{l}\frac{\lambda_{l-1}}{\lambda_{l}},\quad\hat{b}_{i}^{l}=\frac{b_{i}^{l}}{\lambda_{l}},

(6)

where $\hat{w}_{ij}^{l}$ and $\hat{b}_{i}^{l}$ represent the weights and biases used in SNN, $\lambda_{l}$ is the maximum activation value of the $l$ -th layer. Then all activation values in ANN are at most 1.

As mentioned above, it is hard to perform max-pooling and batch normalization (BN) in SNN. We choose the spike of the neuron with the largest firing rate to output as the max-pooling operation in SNN. We follow [50] and merge the convolutional layer and the subsequent BN layer to form a new convolutional layer. An input $x$ is transformed into $BN[x]=\frac{\gamma}{\theta}(x-\mu)+\beta$ , where $\mu$ and $\theta$ are mean and variance of batch, $\beta$ and $\gamma$ are two learned parameters during training. The parameters of the new convolutional layer which can be converted, are described by

\displaystyle\hat{w_{ij}}=\frac{\gamma_{i}}{\theta_{i}}w_{ij},\quad\hat{b_{i}}=\frac{\gamma_{i}}{\theta_{i}}(b_{i}-\mu_{i})+\beta_{i}.

(7)

III-C Analysis of Performance Loss

Even though the ANN is processed, the converted SNN usually suffers performance loss. To simplify the analysis of performance loss, we assume that $a_{i}^{l}\geq 0$ , $b_{i}^{l}=0$ and the threshold $V_{th}$ is 1. The neuron membrane potential is $V_{i,nK}^{l}$ at the end of the simulation. The total number of spikes of the neuron is numerically equal to the total received input minus the membrane potential at $T$ :

\displaystyle N=\sum_{t}^{nK}S_{t}\sum\limits_{j}w_{ij}\delta_{j,k}^{l-1}-V_{i,nK}^{l}.

(8)

Then the firing rate of neurons is approximately equal to the activation value in ANN when $T$ is long enough:

$\displaystyle r_{i,nK}^{l}=$	$\displaystyle\frac{N}{n}$	(9)
$\displaystyle=$	$\displaystyle\frac{1}{n}\ \sum_{t}^{nK}S_{t}\sum\limits_{j}w_{ij}\delta_{j,k}^{l-1}-V_{i,nK}^{l}$	(10)
$\displaystyle=$	$\displaystyle\frac{1}{n}\left(\sum\limits^{n}\sum_{j}w_{ij}\sum\limits_{k=1}^{K}S_{k}\delta_{j,k}^{l-1}-V_{i,nK}^{l}\right)$
$\displaystyle=$	$\displaystyle\sum\limits_{j}w_{ij}a_{j}^{l-1}-\frac{1}{n}V_{i,nK}^{l}.$	(11)

Note that the postsynaptic current at each moment is as follows:

\displaystyle I_{j,t}=\sum_{j}w_{ij}\delta_{j,t}^{l-1}.

(12)

As shown in Figure 2, once the neuron in hidden layers spikes earlier or later than the time directly encoded, which we call phase lead or phase lag, the neuron will transmit too much or too little information to the next layer. It makes some features over-activated or not activated and may cause spikes of inactivated neurons (SIN), which means neurons that cannot be activated in ANN spike in the SNN. SNN needs a long time to accumulate spikes to reduce the impact of these destructive spikes. Thus, the features corresponding to the network firing rate can be approximately equal and proportional to the ANN features, which is the reason for the large time delay of the converted SNN. When the problem of SIN is quite severe, e.g., a large number of features that should not be activated in the ANN are activated in the SNN, it cannot be solved by long-time simulation and causes serve performance loss. Note that the above analysis is also applicable to rate-based conversion methods.

III-D Bistable SNN

The immediate response of the neuron to the received current is unreliable. How should the information propagate in the spiking neurons to make the spike trains conform to the encoding rules to avoid the SIN problem caused by phase lag and phase lead? We solve the problem by proposing a bistable IF neuron (BIF) combining the IF neuron and bistability mechanism. We model the process of spiking as a piecewise function according to the fact that the bistability is shown as the periodic change of spike and non-spike states. In the spike stage, neurons spike according to the membrane potential normally while can’t spike in the non-spike phase:

	$\displaystyle\delta_{A,i,t}^{l}=\begin{cases}\mathcal{H}(V_{A,i,t}^{l}-V_{th,t}),\mod(\lfloor\frac{t}{K}\rfloor,2)=1,\\ 0,\quad else.\end{cases}$
	$\displaystyle\delta_{B,i,t}^{l}=\begin{cases}\mathcal{H}(V_{B,i,t}^{l}-V_{th,t}),\mod(\lfloor\frac{t}{K}\rfloor,2)=0,\\ 0,\quad else.\end{cases}$		(13)

where $\mathcal{H}(x)$ is unit step function, $\lfloor x\rfloor$ is the round-down operation. With periodic input, neurons do not have to respond to the input spikes all the time but accumulate spikes first and then respond and loop. Neurons respond accurately in each phase by accumulating spikes in the non-spike stage, which can effectively avoid the phase lead or lag mentioned above.

We use two BIF neurons as one unit to represent one activation value in the ANN, which is:

\displaystyle\delta_{i,t}^{l}=\delta_{B,i,t}^{l}+\delta_{A,i,t}^{l}.

(14)

One reason for using two BIF neurons is that the BIF neuron does not spike half the simulation time. The use of two neurons with complementary spike states can make the information be transmitted to the next layer in time and maintain the continuity of information transmission. One of the neurons in two adjacent layers is in the spike state to release memory information, and the other is in the non-spike state to accumulate spikes. Note that even if the neurons in the previous layer are in the non-spike state, its silence will not interfere with the neurons in the spike state connected to the next layer. Another reason is its powerful scalability. We can convert ANNs of various topologies without carefully designing the spike stage for each layer when converting deeper and wider ANNs. If only one BIF neuron is used in each layer, when the neuron is in a spike state, it cannot play the role of accumulation as described above.

As shown in Figure 3, there are two connections between the two units: neuron A of one unit is connected to neuron B of the other unit:

	$\displaystyle V_{A,i,t}^{l}=V_{A,i,t-1}^{l}+\sum_{j}w_{ij}\delta_{B,j,t}^{l-1},$
	$\displaystyle V_{B,i,t}^{l}=V_{B,i,t-1}^{l}+\sum_{j}w_{ij}\delta_{A,j,t}^{l-1}.$		(15)

They share the same weight. When the presynaptic neuron is in the spike phase, the postsynaptic neuron in the non-spike phase accumulates spikes to respond accurately later. In fact, the information between the two adjacent layers is periodically switched between the red connection and the blue connection with the simulation time, which also reflects that our BSNN can convert any structure of ANN.

The residual block of ResNet has two information paths, in which shortcut path connects input and output directly or through a convolution operation. The convolutional layer and the BN layer are merged to facilitate the conversion. When converting ResNet, two key problems need to be addressed:

•

The information of the two paths cannot be scaled synchronously. The information of two paths received by output neurons of the residual block is not proportional to the activation values. Because it is impossible to normalize the shortcut path which has no convolutional layer.
•

The information of the two paths cannot reach the output neuron synchronously. The shortcut path is one less ReLU operation, which corresponds to two BIF neurons in the SNN, than the convolution path. Since neurons need time to accumulate membrane potential to spike, the information of the shortcut path reaches the output neuron faster.

III-E Synchronous Neurons for Spiking ResNet

For the first problem, we determine the scale parameters according to the maximum activation value of the input and output so that the sum of the information of the two paths received by the output is proportional to the activation value:

\displaystyle scale=\frac{\lambda_{in}}{\lambda_{out}}.

(16)

To solve the second problem, we add synchronous neurons, which are two BIF neurons, in the shortcut path. It is equivalent to adding a ReLU function to the head of the shortcut path in ANN. Figure 4 shows the conversion process of the residual block. The information reaches the output of the residual block through the synchronous neurons. Since the input of the shortcut path is all non-negative, the transmission in ANN will not have any impact. In SNN, due to the existence of synchronous neurons, the output of the shortcut path and the convolutional path will reach the output neuron at the same time, thereby eliminating the phase lead and lag and SIN problems in spiking ResNet.

The entire conversion process summarized in Algorithm 1 where the SNNs transmit information with BIF neurons.

Algorithm 1 ANN-SNN Conversion with BIF Neurons

Input: Training and test set, simulation time $T$ , trained ANN
Output: Performance of the SNN

1: Let

V_{th}=1,\lambda_{l}=0

for

l=1,\cdots,L

to save the maximum activation value of each ANN layer.

2: Merge the convolutional layer and BN layer according to equation (7).

3: for

l=1

L

a^{l}\leftarrow

layer-wise activation value

\lambda_{l}=\max\{a_{i}^{l}\}

6: end for

7: for

l=1

L

\hat{w}_{ij}^{l}=w_{ij}^{l}\frac{\lambda_{l-1}}{\lambda_{l}},\quad\hat{b}_{i}^{l}=\frac{b_{i}^{l}}{\lambda_{l}}

9: end for

10: Map the processed parameters to the SNN.

11: for

s=1

\#

of test set do

12: for

t=1

L

13: do inference according to equation (III-D)(14)(III-D)

14: end for

15: end for

16: return performance of the SNN

IV Experiment

In this section, various experiments are conducted to evaluate the performance of our proposed conversion algorithm. We also test the effect of the synchronous neurons and compare our BSNN with various advanced conversion algorithms.

IV-A Dataset

The MNIST [62], CIFAR-10, CiFAR-100 [63], and ImageNet [64] datasets are used to test the performance of our proposed BSNN.

The MNIST dataset is the most commonly used dataset and benchmark for classification tasks. It contains 60,000 handwritten digital images from 0 to 9, 50,000 images for the training set, and 10,000 images for the test set. Each image contains 28x28 pixels, which are represented in the form of 8-bit gray values. Note that we do not perform any preprocessing on the MNIST dataset.

The CIFAR-10 dataset is the color image dataset closer to universal objects and a benchmark test set of the CNN architecture. It contains 60,000 images of 10 classes. 50,000 images for the training sets, and 10,000 images for the test sets. It is a 3-channel color RGB image, whose size of each image is 32x32. Unlike MNIST, we normalize the dataset to make the CIFAR-10 obey a standard normal distribution.

The CIFAR-100 dataset has the same image format as CIFAR-10. We also perform the same normalization operation on it, with different normalization parameters. The difference with CIFAR-10 is that CIFAR-100 contains 100 categories instead of 10. Each category contains 500 training images and 100 test images.

ImageNet is currently the world’s largest image recognition large-scale labeled image database organized according to the wordnet structure, and it is also the most challenging classification dataset for SNN. Among them, the training set is 1281167 pictures, and the verification set is 50,000 pictures, including 1000 different categories and 3-channel natural images. The normalization process is also performed to obtain a sufficiently high classification performance.

IV-B Experimental Setup

Our experiments are implemented on the Pytroch framework and NVIDIA A100. We convert CNN with 12c5-2s-64c5-2s-10 architecture [60] on MNIST. 12c5 means a convolutional layer with 12 output channels and kernel size of 5 and 2s refers to non-overlapping pooling layer with kernel size of 2. We use VGG16, ResNet18, ResNet20 architecture on CIFAR-10 and CIFAR-100, while ResNet18 and ResNet34 are used for experiments on ImageNet. Their structures are the same as that of Pytorch’s built-in model. We train the ANN for 100 or 300 epochs by using the stochastic gradient descent algorithm. The initial learning rate is 0.01, and the learning rate is scaled by 0.1 at the training epoch of [180, 240, 270]. Other parameters are listed in Table I. We perform data augmentation on the input except for the MNIST dataset and use real-value input in SNNs for better performance. When comparing various conversion methods, except for the encoding and the way of information transmission, the other settings are the same.

Parameters	MNIST	CIFAR-10	CIFAR-100	ImageNet
training epoch	100	300	300	300
total time step	80	400	800	1000
batch size	10000	100	100	50
p-max	0.999	1.0	1.0	1.0
threshold	1.0	1.0	1.0	1.0

TABLE I: Parameters of BSNN

IV-C Performance and Comparsion with other Methods

To obtain a high-performance SNN, the firing rate of the converted SNN should be similar or equal to the activation value of ANN, which is consistent with the conversion principle. We check the output difference of 100 samples of CIFAR-100 between the firing rate of converted SNN and the corresponding activation value of the ANN with architecture of VGG16. Ideally, due to the weight normalization, the output of the ANN is proportional to the firing frequency of the SNN output, and the multiple is the maximum value of the ANN output layer. We multiply the output of the SNN with the multiple for comparison. As we can see from Figure 5, the difference between the output of the selected 100 samples and the output of the ANN is mostly near 0. However, although the rate-bsed conversion method is widely used, it can be seen from the output of the network that the performance loss is that SNN cannot approach the activation value of ANN very well. The method based on phase encoding reduces the difference between the outputs by increasing the amount of information contained in the spikes, however, the problem of inaccurate approximation is still not solved. As can be seen in Figure 5(c), the output of BSNN is at most 0.005 different from the corresponding activation value of ANN. This indicates that the improvement of performance with BSNN comes from the accurate approximation to ANN activation values.

Dataset	Method	Network	Encoding	ANN (%)	SNN (%)	Loss (%)	Time Steps
	p-Norm [50]	CNN	Rate	99.44	99.44	0.00	-
MNIST	Weighted Spikes [60]	CNN	Phase	99.20	99.20	0.00	16
	BSNN	CNN	Phase	99.30	99.31	-0.01	35
	p-Norm [50]	VGG16	Rate	91.91	91.85	0.06	35
	Spike-Norm [54]	VGG16	Rate	91.70	91.55	0.15	-
	Hybrid Training [65]	VGG16	Rate	92.81	91.13	1.68	100
	RMP-SNN [51]	VGG16	Rate	93.63	93.63	0.00	1536
	TSC [66]	VGG16	Temporal	93.63	93.63	0.00	2048
	CQ Trained [67]	VGG16	Rate	92.56	92.48	0.08	600
CIFAR-10	BSNN	VGG16	Phase	94.11	94.12	-0.01	166
	Weighted Spikes [60]	ResNet20	Phase	91.40	91.40	0.00	-
	Hybrid Training [65]	ResNet20	Rate	93.15	92.22	0.93	250
	RMP-SNN [51]	ResNet20	Rate	91.47	91.36	0.11	-
	TSC [66]	ResNet20	Temporal	91.47	91.42	0.05	1536
	BSNN	ResNet20	Phase	95.02	95.16	-0.14	206
	Hybrid Training [65]	VGG11	Rate	71.21	67.87	3.34	125
	RMP-SNN [51]	VGG16	Rate	71.22	70.93	0.29	2048
	TSC [66]	VGG16	Temporal	71.22	70.97	0.25	1024
	CQ Trained [67]	VGG	Rate	71.84	71.84	0.00	300
	BSNN	VGG16	Phase	73.26	73.41	-0.15	242
CIFAR-100	Spiking ResNet [53]	ResNet44	Rate	70.18	68.56	1.62	-
	Weighted Spikes [60]	ResNet32	Phase	66.10	66.20	-0.10	-
	RMP-SNN [51]	ResNet20	Rate	68.72	67.82	0.90	2048
	TSC [66]	ResNet	Temporal	68.72	68.18	0.54	2048
	BSNN	ResNet20	Phase	77.97	78.12	-0.15	265
	Spike-Norm [54]	ResNet20	Rate	70.52	69.39	1.13	-
	BSNN	ResNet18	Phase	69.65	69.65	0.00	200
ImageNet	Hybrid Training [65]	ResNet34	Rate	70.20	61.48	8.72	250
	RMP-SNN [51]	ResNet34	Rate	70.64	69.89	0.75	4096
	BSNN	ResNet34	Phase	73.27	72.64	0.63	989

TABLE II: Top-1 classification accuracy on MNIST, CIFAR-10, CIFAR-100 and ImageNet for our converted SNNs, compared to the original ANNs, and compared to other conversion methods

Then we compare the performance of our model and other conversion methods on MNIST, CIFAR-10, CIFAR-100, and ImageNet, as shown in Table II. The time step is the simulation time required to achieve the best performance. We choose rate-based methods including p-Norm [50], Spike-Norm [54], RMP-SNN [51], etc., phase-based Weighted Spikes [60] method, temporal coding-based TSC [66] method, and other advanced methods such as CQ trained [67], Hybrid training [65], etc. for comparison. Here we do not compare the BSNN with algorithms based on biological rules and backpropagation. Because the former focuses on the biological interpretability of the network, while the latter focuses on exploring the temporal and spatial representation of features. The training cost of both is particularly high because of the information processing method similar to RNN in the training process. It is difficult to apply them to complex networks such as VGG16 and ResNet34, Thus, their performance significantly lags behind advanced conversion-based methods.

We first focus on the performance loss of the conversion method. The phase-based method is usually better than other methods because it combines the advantages of rate coding and temporal coding. The time information expressed in phase and the rate information expressed in period improve the information expressing ability of the spike. Based on this, our BSNN improves the information propagation of SNN based on BIF neurons and reduces the phase lead and lag problems in the Weighted Spike method, thus minimizing the performance loss. We achieved 99.31% performance on MNIST, 94.12% (VGG16), and 95.02% (ResNet20) performance on CIFAR-10, 73.41% (VGG16) and 78.12% (ResNet20) performance on CIFAR-100, and 69.65% (ResNet18) performance on ImageNet, which are better than other conversion method. To continue testing the ability of our method to convert deep networks, we conduct experiments on ResNet34. The results show that BSNN only needs less than 1000 time steps to achieve the performance of 72.64% with only 0.63% performance loss. As far as we know, this is also the highest performance that SNN can achieve.

In addition to the excellence in accuracy, our model has also achieved outstanding performance in time steps. The conversion method based on rate and timing naturally takes a long time to accurately represent the information and therefore requires a longer time step. The Hybrid Training method sacrifices part of the performance in exchange for shorter simulation time. We analyze above that the reason why the conversion method requires a long simulation time is that SNN needs enough spikes to compensate for the destruction of the proportional relationship caused by spikes of inactivated neurons. BSNN uses the bistable mechanism to accumulate and release spikes, thus the SIN problem is significantly improved. As shown in the Table II, on complex data sets such as CIFAR-10 and ImageNet, BSNN only needs a time step of 1/4 to 1/10 to achieve the performance of other advanced algorithms. hence, BSNN can save at least 25% of calculation loss and energy consumption to a certain extent, which plays an important role in the development and application of SNN.

IV-D Effect of Synchronous Neuron

In order to verify the effectiveness of the proposed synchronous neuron in converting ResNet, we convert ResNet18 on multiple datasets. As shown in Figure 6 ¹¹1We only run 400 time steps on the CIFAR-10 dataset. For the convenience of drawing, we use the result of step 400 for the next 400 time steps., since neurons are not always in the spike state but switch between spike and non-spike states, BSNN doesn’t work in the early simulation but completes the high-precision conversion with a small time delay. The detailed results are listed in Table III. The loss means the accuracy difference ( $acc_{ANN}-acc_{SNN}$ ) between the source ANN and the converted SNN. The experimental results show that the performance of the spiking ResNet using synchronous neurons exceeds the SNNs without synchronous neurons on CIFAR-10, CIFAR-100, and ImageNet datasets. It achieves the same performance as the ANN with 200-800 time-steps reduction. The use of synchronous neurons on ResNet conversion can ensure that the information of two paths reaches the output neuron of the residual block synchronously, which significantly improves the conversion accuracy and reduces the time delay.

	Dataset	SNN (%)	Loss (%)	Time
	CIFAR-10	94.04	0.74	395
w/out SN	CIFAR-100	76.37	0.03	741
	ImageNet	69.32	7.08	996
	CIFAR-10	94.83	-0.05	218
w/ SN	CIFAR-100	76.48	-0.08	237
	ImageNet	69.64	0.00	200

TABLE III: The Results of Adding Synchronous Neurons on ResNet18.

Note that previous work like Spike-Norm [54] uses average pooling and dropout instead of max-pooling and BN, limiting the performance of the converted SNN to a certain extent. The results show that our work can be adapted to various types of ANNs, and achieve almost lossless conversion with less time delay. Experimental results on complex datasets like CIFAR-100 and deep networks like ResNet34 show that BSNN can solve the difficulty in approximating features in deep layers to ANN by cooperating two BIF neurons of each unit to accumulate and emit spikes periodically. It means that we can achieve the same effect as current deep learning with a more biologically plausible network structure, less computational cost and energy consumption.

V Conclusion

In this paper, we analyze the reasons for the performance loss and large time delay in the conversion method. Our analysis reveals that the immediate response of neurons to the received current is unreliable in converted SNNs. It can bring the problem of SIN, which makes the firing rate in the deep layer cannot approximate the activation values in ANNs. Based on these analysis and observation, we propose a novel Bistable SNN which combines phase coding and the bistability mechanism, and design synchronous neurons to improve energy-efficiency, performance, and inference speed. Our experiments demonstrate that the BSNNs could significantly reduce performance loss and time delay. The efficiency and efficacy of our proposed BSNN could thus be of great importance for fast and energy-efficiency spike-based neuromorphic computing.

Acknowledgments

This work is supported by the National Key Research and Development Program (2020AAA0107800), the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDB32070100), the Beijing Municipal Commission of Science and Technology (Grant No. Z181100001518006), and the Beijing Academy of Artificial Intelligence (BAAI).

References

[1] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 779–788.
[2] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, “Ssd: Single shot multibox detector,” in European conference on computer vision. Springer, 2016, pp. 21–37.
[3] R. Girshick, “Fast r-cnn,” in Proceedings of the IEEE international conference on computer vision, 2015, pp. 1440–1448.
[4] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” arXiv preprint arXiv:1409.0473, 2014.
[5] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018.
[6] D. S. Park, Y. Zhang, Y. Jia, W. Han, C.-C. Chiu, B. Li, Y. Wu, and Q. V. Le, “Improved noisy student training for automatic speech recognition,” arXiv preprint arXiv:2005.09629, 2020.
[7] R. Kemker, M. McClure, A. Abitino, T. Hayes, and C. Kanan, “Measuring catastrophic forgetting in neural networks,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, no. 1, 2018.
[8] M. Yan, C. A. Chan, A. F. Gygax, J. Yan, L. Campbell, A. Nirmalathas, and C. Leckie, “Modeling the total energy consumption of mobile network services and applications,” Energies, vol. 12, no. 1, p. 184, 2019.
[9] A. Nguyen, J. Yosinski, and J. Clune, “Deep neural networks are easily fooled: High confidence predictions for unrecognizable images,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 427–436.
[10] B. M. Lake, R. Salakhutdinov, and J. B. Tenenbaum, “Human-level concept learning through probabilistic program induction,” Science, vol. 350, no. 6266, pp. 1332–1338, 2015.
[11] E. Strubell, A. Ganesh, and A. McCallum, “Energy and policy considerations for deep learning in nlp,” arXiv preprint arXiv:1906.02243, 2019.
[12] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” arXiv preprint arXiv:1706.03762, 2017.
[13] X. He and J. Cheng, “Learning compression from limited unlabeled data,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 752–769.
[14] J. Wu, C. Leng, Y. Wang, Q. Hu, and J. Cheng, “Quantized convolutional neural networks for mobile devices,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 4820–4828.
[15] T. Chen, Z. Du, N. Sun, J. Wang, C. Wu, Y. Chen, and O. Temam, “Diannao: A small-footprint high-throughput accelerator for ubiquitous machine-learning,” ACM SIGARCH Computer Architecture News, vol. 42, no. 1, pp. 269–284, 2014.
[16] W. Maass, “Networks of spiking neurons: the third generation of neural network models,” Neural networks, vol. 10, no. 9, pp. 1659–1671, 1997.
[17] A. Tavanaei, M. Ghodrati, S. R. Kheradpisheh, T. Masquelier, and A. Maida, “Deep learning in spiking neural networks,” Neural Networks, vol. 111, pp. 47–63, 2019.
[18] X. Wang, X. Lin, and X. Dang, “Supervised learning in spiking neural networks: A review of algorithms and evaluations,” Neural Networks, vol. 125, pp. 258–280, 2020.
[19] B. Illing, W. Gerstner, and J. Brea, “Biologically plausible deep learning—but how far can we go with shallow networks?” Neural Networks, vol. 118, pp. 90–101, 2019.
[20] H. Jang, O. Simeone, B. Gardner, and A. Gruning, “An introduction to probabilistic spiking neural networks: Probabilistic models, learning rules, and applications,” IEEE Signal Processing Magazine, vol. 36, no. 6, pp. 64–77, 2019.
[21] Z. Bing, C. Meschede, F. Röhrbein, K. Huang, and A. C. Knoll, “A survey of robotics control based on learning-inspired spiking neural networks,” Frontiers in neurorobotics, vol. 12, p. 35, 2018.
[22] K. Roy, A. Jaiswal, and P. Panda, “Towards spike-based machine intelligence with neuromorphic computing,” Nature, vol. 575, no. 7784, pp. 607–617, 2019.
[23] J. L. Lobo, J. Del Ser, A. Bifet, and N. Kasabov, “Spiking neural networks and online learning: An overview and perspectives,” Neural Networks, vol. 121, pp. 88–100, 2020.
[24] M. Pfeiffer and T. Pfeil, “Deep learning with spiking neurons: opportunities and challenges,” Frontiers in neuroscience, vol. 12, p. 774, 2018.
[25] T. Zhang, Y. Zeng, D. Zhao, and M. Shi, “A plasticity-centric approach to train the non-differential spiking neural networks,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, no. 1, 2018.
[26] Q. Liang and Y. Zeng, “Stylistic composition of melodies based on a brain-inspired spiking neural network,” Frontiers in systems neuroscience, vol. 15, p. 21, 2021.
[27] H. Fang, Y. Zeng, and F. Zhao, “Brain inspired sequences production by spiking neural networks with reward-modulated stdp,” Frontiers in Computational Neuroscience, vol. 15, p. 8, 2021.
[28] B. Zhao, Q. Yu, R. Ding, S. Chen, and H. Tang, “Event-driven simulation of the tempotron spiking neuron,” in 2014 IEEE Biomedical Circuits and Systems Conference (BioCAS) Proceedings. IEEE, 2014, pp. 667–670.
[29] I. Marian, R. Reilly, and D. Mackey, “Efficient event-driven simulation of spiking neural networks,” 2002.
[30] G. W. Burr, R. M. Shelby, A. Sebastian, S. Kim, S. Kim, S. Sidler, K. Virwani, M. Ishii, P. Narayanan, A. Fumarola et al., “Neuromorphic computing using non-volatile memory,” Advances in Physics: X, vol. 2, no. 1, pp. 89–124, 2017.
[31] M. Davies, “Benchmarks for progress in neuromorphic computing,” Nature Machine Intelligence, vol. 1, no. 9, pp. 386–388, 2019.
[32] S. Song, A. Balaji, A. Das, N. Kandasamy, and J. Shackleford, “Compiling spiking neural networks to neuromorphic hardware,” in The 21st ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems, 2020, pp. 38–50.
[33] G.-q. Bi and M.-m. Poo, “Synaptic modifications in cultured hippocampal neurons: dependence on spike timing, synaptic strength, and postsynaptic cell type,” Journal of neuroscience, vol. 18, no. 24, pp. 10 464–10 472, 1998.
[34] Y. Bengio, T. Mesnard, A. Fischer, S. Zhang, and Y. Wu, “Stdp as presynaptic activity times rate of change of postsynaptic activity,” arXiv preprint arXiv:1509.05936, 2015.
[35] L. F. Abbott and S. B. Nelson, “Synaptic plasticity: taming the beast,” Nature neuroscience, vol. 3, no. 11, pp. 1178–1183, 2000.
[36] C. Blakemore, R. H. Carpenter, and M. A. Georgeson, “Lateral inhibition between orientation detectors in the human visual system,” Nature, vol. 228, no. 5266, pp. 37–39, 1970.
[37] R. C. Malenka, “The long-term potential of ltp,” Nature Reviews Neuroscience, vol. 4, no. 11, pp. 923–926, 2003.
[38] M. Ito, “Long-term depression,” Annual review of neuroscience, vol. 12, no. 1, pp. 85–102, 1989.
[39] Y. Zeng, T. Zhang, and B. Xu, “Improving multi-layer spiking neural networks by incorporating brain-inspired rules,” Science China Information Sciences, vol. 60, no. 5, p. 052201, 2017.
[40] W. S. Noble, “What is a support vector machine?” Nature biotechnology, vol. 24, no. 12, pp. 1565–1567, 2006.
[41] Y. Hao, X. Huang, M. Dong, and B. Xu, “A biologically plausible supervised learning method for spiking neural networks using the symmetric stdp rule,” Neural Networks, vol. 121, pp. 387–395, 2020.
[42] P. U. Diehl and M. Cook, “Unsupervised learning of digit recognition using spike-timing-dependent plasticity,” Frontiers in computational neuroscience, vol. 9, p. 99, 2015.
[43] Y. Wu, L. Deng, G. Li, J. Zhu, Y. Xie, and L. Shi, “Direct training for spiking neural networks: Faster, larger, better,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2019, pp. 1311–1318.
[44] S. B. Shrestha and G. Orchard, “Slayer: Spike layer error reassignment in time,” Advances in neural information processing systems, vol. 31, pp. 1412–1421, 2018.
[45] C. Lee, S. S. Sarwar, P. Panda, G. Srinivasan, and K. Roy, “Enabling spike-based backpropagation for training deep neural network architectures,” Frontiers in neuroscience, vol. 14, 2020.
[46] W. Zhang and P. Li, “Temporal spike sequence learning via backpropagation for deep spiking neural networks,” arXiv preprint arXiv:2002.10085, 2020.
[47] Y. Cao, Y. Chen, and D. Khosla, “Spiking deep convolutional neural networks for energy-efficient object recognition,” International Journal of Computer Vision, vol. 113, no. 1, pp. 54–66, 2015.
[48] X. Yang, Z. Zhang, W. Zhu, S. Yu, L. Liu, and N. Wu, “Deterministic conversion rule for cnns to efficient spiking convolutional neural networks,” Science China Information Sciences, vol. 63, no. 2, p. 122402, 2020.
[49] P. U. Diehl, D. Neil, J. Binas, M. Cook, S.-C. Liu, and M. Pfeiffer, “Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing,” in 2015 International Joint Conference on Neural Networks (IJCNN). ieee, 2015, pp. 1–8.
[50] B. Rueckauer, I.-A. Lungu, Y. Hu, M. Pfeiffer, and S.-C. Liu, “Conversion of continuous-valued deep networks to efficient event-driven networks for image classification,” Frontiers in neuroscience, vol. 11, p. 682, 2017.
[51] B. Han, G. Srinivasan, and K. Roy, “Rmp-snn: Residual membrane potential neuron for enabling deeper high-accuracy and low-latency spiking neural network,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 13 558–13 567.
[52] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
[53] Y. Hu, H. Tang, Y. Wang, and G. Pan, “Spiking deep residual network,” arXiv preprint arXiv:1805.01352, 2018.
[54] A. Sengupta, Y. Ye, R. Wang, C. Liu, and K. Roy, “Going deeper in spiking neural networks: Vgg and residual architectures,” Frontiers in neuroscience, vol. 13, p. 95, 2019.
[55] F. Xing, Y. Yuan, H. Huo, and T. Fang, “Homeostasis-based cnn-to-snn conversion of inception and residual architectures,” in International Conference on Neural Information Processing. Springer, 2019, pp. 173–184.
[56] E. M. Izhikevich, “Simple model of spiking neurons,” IEEE Transactions on neural networks, vol. 14, no. 6, pp. 1569–1572, 2003.
[57] E. Marder, L. Abbott, G. G. Turrigiano, Z. Liu, and J. Golowasch, “Memory from the dynamics of intrinsic membrane currents,” Proceedings of the national academy of sciences, vol. 93, no. 24, pp. 13 481–13 486, 1996.
[58] B. Rueckauer and S.-C. Liu, “Conversion of analog to spiking neural networks using sparse temporal coding,” in 2018 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 2018, pp. 1–5.
[59] L. Zhang, S. Zhou, T. Zhi, Z. Du, and Y. Chen, “Tdsnn: From deep neural networks to deep spike neural networks with temporal-coding,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2019, pp. 1319–1326.
[60] J. Kim, H. Kim, S. Huh, J. Lee, and K. Choi, “Deep neural networks with weighted spikes,” Neurocomputing, vol. 311, pp. 373–386, 2018.
[61] S. Park, S. Kim, H. Choe, and S. Yoon, “Fast and efficient information transmission with burst spikes in deep spiking neural networks,” in 2019 56th ACM/IEEE Design Automation Conference (DAC). IEEE, 2019, pp. 1–6.
[62] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[63] A. Krizhevsky, G. Hinton et al., “Learning multiple layers of features from tiny images,” 2009.
[64] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in 2009 IEEE conference on computer vision and pattern recognition. Ieee, 2009, pp. 248–255.
[65] N. Rathi, G. Srinivasan, P. Panda, and K. Roy, “Enabling deep spiking neural networks with hybrid conversion and spike timing dependent backpropagation,” arXiv preprint arXiv:2005.01807, 2020.
[66] B. Han and K. Roy, “Deep spiking neural network: Energy efficiency through time based coding,” in Proc. IEEE Eur. Conf. Comput. Vis.(ECCV), 2020, pp. 388–404.
[67] Z. Yan, J. Zhou, and W.-F. Wong, “Near lossless transfer learning for spiking neural networks,” 2021.

Yi Zeng obtained his Bachelor degree in 2004 and Ph.D degree in 2010 from Beijing University of Technology, China. He is currently a Professor and Deputy Director at Research Center for Brain-inspired Intelligence, Institute of Automation, Chinese Academy of Sciences (CASIA), China. He is also with the National Laboratory of Pattern Recognition, CASIA, and University of Chinese Academy of Sciences, China. He is a Principal Investigator at Center for Excellence of Brain Science and Intelligence Technology, Chinese Academy of Sciences, China. His research interests include cognitive brain computational modeling, brain-inspired neural networks, brain-inspired robotics, etc.