Distributed Compressed Sparse Row Format for Spiking Neural Network Simulation, Serialization, and Interoperability
Abstract.
With the increasing development of neuromorphic platforms and their related software tools, as well as the increasing scale of spiking neural network (SNN) models, there is growing pressure for interoperable and scalable representations of network state. In response, we discuss a parallel extension of a widely used format for efficiently representing sparse matrices, compressed sparse row (CSR), in the context of supporting the simulation and serialization of large-scale SNNs. Sparse matrices for graph adjacency structure provide a natural fit for describing the connectivity of an SNN, and prior work in parallel graph partitioning has developed the distributed CSR (dCSR) format for storing and ingesting large graphs. We contend that organizing additional network information, such as neuron and synapse state, in alignment with its adjacency structure under dCSR provides a straightforward, partition-based distribution of network state. For large-scale simulations, this means each parallel process is responsible only for its own partition of state, which becomes especially useful when the size of an SNN exceeds the memory resources of a single compute node. For potentially long-running simulations, this also enables network serialization to and from disk (e.g. for checkpoint/restart fault-tolerant computing) to be performed largely independently between parallel processes. We also provide a potential implementation, and put it forward for adoption within the neural computing community.
1. Introduction
The field of neural computing has witnessed significant developments and expansions in the software frameworks, network simulators, hardware platforms, and engineering tools available to the community (Schuman et al., 2017; Dai et al., 2020; Stimberg et al., 2019; Aimone et al., 2019; Intel, 2022; Rothganger et al., 2014; Davison et al., 2009; Brette et al., 2007). Especially in the software space, there has also been a continual push toward larger scale experiments in line with the advancements in high performance computing (HPC) and use of accelerators such as graphics processing units (GPU) (Golosio et al., 2021; Knight and Nowotny, 2021; Pehle and Pedersen, 2021; Wang, 2015). However, alongside this expansion of frameworks and tools is the question of their compatibility and interoperability within a shared ecosystem. For example, a common thread of discussion is how to adequately compare and benchmark between different simulators and target neuromorphic platforms (Vineyard et al., 2019; Kulkarni et al., 2021; Davies, 2019).
Recent efforts such as Fugu have attempted to address interoperability between a spiking neural algorithm and different neuromorphic backends by employing only the most widely supported neuron and synapse models (Aimone et al., 2019). In the adjacent machine learning (ML) community, there are also efforts such as the Open Neural Network Exchange (ONNX), which provides an open format built to represent ML models (e.g. between PyTorch and TensorFlow) (Bai et al., 2019). Perhaps most closely related to this work is the Scalable Open Network Architecture TemplAte (SONATA) data format for large-scale network models (Dai and et al, 2020). Although it leverages common file formats (e.g. CSV, HDF5, JSON), we noted some concerns due to its use of hierarchical, population-based grouping (for both neurons and synapses), especially with regard to data locality.
In this paper, we propose the use of a relatively straightforward data format for interoperability and sharing of SNN models between simulators and neuromorphic hardware platforms. We base this on the widely used compressed sparse row (CSR) format for efficiently representing sparse matrices, and extend it to accommodate parallel partitions of network state. We first provide a high-level description of this format in Section 2, a pointer to an implementation in Section 3, and finally a discussion in Section 4.
2. Overview
Compressed sparse row is a widely used format to efficiently represent sparse matrices (Saad, 2011). Although there are several implementations, the central idea is to store the non-zero values of a matrix alongside indexical arrays over the rows and columns. For an $m \times n$ matrix with $nnz$ non-zero entries, the row array is of size $m + 1$, where each entry provides a prefix or cumulative sum over the number of non-zero column indices for that row, and the last entry in the row array is $nnz$. The column array is of size $nnz$ and contains the column indices of the non-zero entries for a given row-column pair. The corresponding value array is also of size $nnz$ and simply concatenates the non-zero values as read off in row-major order.
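As a minimal sketch of this layout (in Python, with purely illustrative values), the three CSR arrays for a small matrix might look as follows:

```python
import numpy as np

# A small 4x4 sparse matrix:
#     [[0, 2, 0, 0],
#      [1, 0, 0, 3],
#      [0, 0, 0, 0],
#      [0, 4, 5, 0]]
# CSR stores only the non-zeros, indexed by a row-offset array and a column-index array.
row_ptr = np.array([0, 1, 3, 3, 5])       # size m+1; last entry equals nnz
col_idx = np.array([1, 0, 3, 1, 2])       # size nnz; column of each non-zero, row-major order
values  = np.array([2., 1., 3., 4., 5.])  # size nnz; the non-zero values themselves

# Iterating the non-zeros of row i uses the half-open slice [row_ptr[i], row_ptr[i+1]).
for i in range(len(row_ptr) - 1):
    for k in range(row_ptr[i], row_ptr[i + 1]):
        print(f"A[{i},{col_idx[k]}] = {values[k]}")
```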
As an extension to this, the distributed CSR (dCSR) format was most notably introduced for storing and ingesting large graphs in prior work on parallel graph partitioning algorithms (Karypis and Kumar, 1998). This format provides an additional indexical array of size $p + 1$ which supplies the prefix or cumulative sum over the number of rows assigned to each part of a given $p$-way partitioning of the rows, where $p$ is typically the number of parallel processes. Furthermore, the original CSR value and column arrays are split into $p$ separate arrays along these partition boundaries.
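Continuing the small example above, a sketch of how the partition offsets (here named vtxdist, following ParMETIS convention) split the CSR arrays between two partitions; the slicing details are illustrative rather than prescriptive:

```python
import numpy as np

# Partition offsets: partition p owns rows [vtxdist[p], vtxdist[p+1]).
vtxdist = np.array([0, 2, 4])  # size p+1: partition 0 owns rows 0-1, partition 1 owns rows 2-3

row_ptr = np.array([0, 1, 3, 3, 5])
col_idx = np.array([1, 0, 3, 1, 2])
values  = np.array([2., 1., 3., 4., 5.])

# Each partition holds only its local slices of the CSR arrays, with row offsets
# re-based to start at zero.
local = []
for p in range(len(vtxdist) - 1):
    lo, hi = vtxdist[p], vtxdist[p + 1]
    a, b = row_ptr[lo], row_ptr[hi]
    local.append({
        "row_ptr": row_ptr[lo:hi + 1] - row_ptr[lo],
        "col_idx": col_idx[a:b],
        "values":  values[a:b],
    })
# local[0] describes rows 0-1; local[1] describes rows 2-3.
```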
By comparison, for an SNN, a natural representation of the connectivity and overall network structure is as a directed graph $G = (V, E)$ (West, 2001). Vertices $v \in V$ correspond to the spiking neurons, and edges $e \in E$ correspond to the synaptic connections. The direction of an edge $(u, v)$, for $u, v \in V$, corresponds to the propagation of an event (e.g. a spike) from the presynaptic neuron $u$ to the postsynaptic neuron $v$. We may further partition this graph by splitting up the vertices into disjoint subsets $V_1, \ldots, V_k$ such that $V = V_1 \cup \cdots \cup V_k$, known as a $k$-way partition. Here, edges may be assigned to a given partition based on either their source vertex or their target vertex. With respect to the communication and computational patterns of SNNs, where synaptic weights typically apply current to their target neuron, colocating a directed edge with its target vertex is the more sensible choice.
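To make this edge-assignment choice concrete, here is a small hypothetical helper (not part of any particular tool) that colocates each directed edge with the partition of its target vertex:

```python
def partition_edges_by_target(edges, part_of_vertex):
    """edges: iterable of (u, v); part_of_vertex: list or dict mapping vertex -> partition id."""
    by_part = {}
    for u, v in edges:
        # Assign the edge to the partition that owns its postsynaptic (target) vertex v.
        by_part.setdefault(part_of_vertex[v], []).append((u, v))
    return by_part

edges = [(1, 0), (0, 1), (3, 1), (1, 3), (2, 3)]
part_of_vertex = [0, 0, 1, 1]   # a 2-way partition of vertices {0, 1, 2, 3}
print(partition_edges_by_target(edges, part_of_vertex))
# {0: [(1, 0), (0, 1), (3, 1)], 1: [(1, 3), (2, 3)]}
```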
We can see that there is essentially an isomorphism between a sparse matrix and an SNN graph: the rows (and columns) correspond to the vertices, and the non-zero entries correspond to the edges. The main difference is that for an SNN, the vertices and edges typically carry far more additional information (e.g. connection delays, neural and synaptic states) than may be afforded by the standard, single non-zero entry in the value array. To address this, we suggest the extension of simply allowing tuples of values to be associated with the column array (as well as tuples of values to be associated with the row array). Because the amount of unique state necessary for any given vertex or edge depends on its specific model dynamics, we also introduce an additional model dictionary to provide tuple sizes.
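A minimal sketch of what such value tuples and a model dictionary might look like in practice; the model names, tuple contents, and variable names below are purely illustrative and not a fixed schema:

```python
# The model dictionary maps a model identifier to the size of its state tuple.
model_dict = {
    "lif":    3,   # e.g. (membrane potential, recovery variable, refractory counter)
    "static": 1,   # e.g. (weight,)
    "stdp":   3,   # e.g. (weight, presynaptic trace, postsynaptic trace)
}

# One model identifier and state tuple per vertex, aligned with the row array ...
vertex_model = ["lif", "lif"]
vertex_state = [(-65.0, 0.2, 0.0), (-70.0, 0.0, 0.0)]

# ... and one per edge (non-zero entry), aligned with the column array.
edge_model = ["static", "stdp", "static"]
edge_state = [(0.5,), (1.2, 0.0, 0.0), (-0.3,)]

# Consistency check: each state tuple has the size declared by its model.
assert all(len(s) == model_dict[m] for m, s in zip(vertex_model, vertex_state))
assert all(len(s) == model_dict[m] for m, s in zip(edge_model, edge_state))
```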
3. Serialization Format
Transitioning from this high-level description to a more concrete implementation is fairly straightforward. In fact, we may simply extend the dCSR format such as that used by ParMETIS (Karypis and Schloegel, 2013). This has the added advantage that we may directly interoperate with existing graph partitioning tools. Although plain text on disk is generally less space efficient than the in-memory representation used during simulation, we opt to serialize to plain-text files for portability.
Instead of serializing a graph adjacency as two contiguous arrays of row offsets and column indices, respectively, one of the shortcuts implemented by ParMETIS is to implicitly index each row by its line number in a given .adjcy.k (adjacency) file and then list its column indices as space-separated values. Because the entire file must be read from disk for processing anyway, data structures such as the row offsets can simply be rebuilt at load time. The partitioning offsets are still needed, however, and these are supplied through a separate .dist (distribution) file.
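A hedged sketch of loading these files, assuming one whitespace-separated list of column indices per line and plain integer offsets in the distribution file (a simplification; the ParMETIS manual and the STACS sources define the authoritative layouts):

```python
def read_adjcy(path):
    """Read one partition's adjacency file: line number implies the local row index,
    and the line lists that row's column indices."""
    row_ptr, col_idx = [0], []
    with open(path) as f:
        for line in f:
            cols = [int(tok) for tok in line.split()]
            col_idx.extend(cols)
            row_ptr.append(len(col_idx))   # row offsets rebuilt at load time
    return row_ptr, col_idx

def read_dist(path):
    """Read the partition offsets: partition p owns global rows [dist[p], dist[p+1])."""
    with open(path) as f:
        return [int(tok) for tok in f.read().split()]
```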
As additional information for a geometric partitioner, we also initialize a .coord.k (coordinate) file containing the spatial coordinates of each vertex within a Cartesian coordinate system. This becomes especially useful when network sizes exceed the memory available to more advanced partitioners and the workflow must fall back to simple voxel-based partitioning.
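As an illustration of such a fallback, a minimal voxel-based partitioning sketch that bins vertices into a regular grid by their coordinates (purely illustrative, not the STACS implementation):

```python
def voxel_partition(coords, voxel_size):
    """coords: list of (x, y, z); returns a partition id per vertex, one per occupied voxel."""
    voxels = {}
    part = []
    for x, y, z in coords:
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        part.append(voxels.setdefault(key, len(voxels)))
    return part

print(voxel_partition([(0.1, 0.2, 0.0), (0.9, 0.1, 0.0), (3.5, 0.2, 0.0)], 1.0))
# [0, 0, 1]
```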
For the main network .state.k (state) files, we supply a space-separated list of string-based model identifiers and tuples of state data. We opt to colocate the vertex and edge models together, resulting in a file where each record begins with a vertex identifier and its state, followed by edge identifiers and state for each of its incoming connections. Of note here is that because the adjacency file for graph partitioning is typically undirected rather than directed, we additionally include special ‘none’ model identifiers with no associated state for edges where there is an outgoing connection but no incoming one. As mentioned previously, we also introduce a .model file which provides a mapping between each string-based model identifier and the size of its state tuple, as well as shared model parameters.
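A purely illustrative example of parsing one such record, assuming model identifiers interleaved with their state tuples and tuple sizes taken from the model dictionary (the exact STACS layout may differ):

```python
# One record: the vertex model and its state tuple, followed by a model/state pair
# for each adjacency entry; 'none' marks an outgoing-only connection with no state.
line = "lif -65.0 0.2 0.0  stdp 1.2 0.0 0.0  none  static 0.5"

def parse_state_record(tokens, model_dict):
    """tokens: whitespace-split record; model_dict maps model name -> tuple size."""
    records, i = [], 0
    while i < len(tokens):
        name = tokens[i]
        size = model_dict.get(name, 0)   # 'none' carries no state
        state = tuple(float(t) for t in tokens[i + 1:i + 1 + size])
        records.append((name, state))
        i += 1 + size
    return records

model_dict = {"lif": 3, "static": 1, "stdp": 3, "none": 0}
print(parse_state_record(line.split(), model_dict))
# [('lif', (-65.0, 0.2, 0.0)), ('stdp', (1.2, 0.0, 0.0)), ('none', ()), ('static', (0.5,))]
```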
There are also .event.k (event) files which serialize any simulation events “in-flight” that have not yet been processed at their target vertex due to connection delays. These are stored as space-separated tuples of the source vertex, arrival time, event type, and any associated data.
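As an illustrative example (field names assumed here, not the exact STACS schema), an in-flight event record and its serialized form might look like:

```python
from collections import namedtuple

# An in-flight event: already emitted by its source vertex but not yet applied at the
# target because of a connection delay.
Event = namedtuple("Event", ["source", "arrival_time", "event_type", "data"])

pending = [
    Event(source=42, arrival_time=10.5, event_type="spike", data=()),
    Event(source=17, arrival_time=11.0, event_type="spike", data=()),
]

# One space-separated tuple per line.
for e in pending:
    print(e.source, e.arrival_time, e.event_type, *e.data)
```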
Here, we point to an implementation of this distributed file format in the Simulation Tool for Asynchronous Cortical Streams (STACS) simulator (Wang, 2015). Incidentally, STACS was built from the ground up for parallel simulation using the Charm++ parallel programming framework, which essentially requires serialization for the packing and unpacking of messages between parallel objects to support fault-tolerant computing (Kale and Krishnan, 1993). In effect, this enabled the decoupling of the network generation process and the simulation process through an intermediate serialized representation. It also served as an efficient format to snapshot the network state for checkpoint/restart and offline analysis.
As a comparative scalability example, we built and serialized the cortical microcircuit model consisting of roughly 76K neurons and 0.3B synapses (Potjans and Diesmann, 2014), resulting in about 12GB on disk (regardless of the number of partitions). For a version scaled 2x in neurons, with 154K neurons and 1.2B synapses, the result was about 49GB on disk. At roughly 40 bytes per synapse in both cases, this is an effectively linear cost in the number of synapses.
4. Discussion
What makes the extended dCSR format particularly appealing is that it draws from pre-existing, widely used formats that are intuitive to understand. For our implementation, all of the network state is serialized into essentially four main types of parallel files (adjacency, coordinate, state, and event), and there is little overhead in storage costs from the introduction of only a few additional metadata files (dist, model). Due to its simplicity, it also becomes relatively straightforward to interoperate with popular graph analysis packages such as NetworkX and its directed graph data structure (Hagberg et al., 2008).
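As a hedged sketch of this interoperability, the small adjacency arrays from Section 2 can be loaded into a NetworkX DiGraph, reading each row as the incoming connections of its target vertex (consistent with colocating edges with their targets); the mapping shown is one possible convention, not a fixed specification:

```python
import networkx as nx

row_ptr = [0, 1, 3, 3, 5]
col_idx = [1, 0, 3, 1, 2]

# Rebuild a directed graph: row tgt lists the source vertices of tgt's incoming edges.
G = nx.DiGraph()
G.add_nodes_from(range(len(row_ptr) - 1))
for tgt in range(len(row_ptr) - 1):
    for k in range(row_ptr[tgt], row_ptr[tgt + 1]):
        G.add_edge(col_idx[k], tgt)   # directed edge source -> target

print(G.number_of_nodes(), G.number_of_edges())  # 4 5
```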
Beyond its simplicity, we also contend that a partition-based distribution of network state makes dCSR more immediately suitable for computational parallelism, whether the target platform distributes state across compute nodes for simulation or across chips on neuromorphic hardware. Furthermore, as a result of its lineage in graph partitioners, such a serialization may also be readily used to inform a potential repartitioning of an SNN model so that it may be optimally fit to different backends. For these reasons, we put forward the extended dCSR data format for adoption within the neural computing community.
Acknowledgements.
Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government.
References
- Aimone et al. (2019) James B. Aimone, William Severa, and Craig M. Vineyard. 2019. Composing Neural Algorithms with Fugu. In Proceedings of the International Conference on Neuromorphic Systems (Knoxville, TN, USA) (ICONS ’19). Association for Computing Machinery, New York, NY, USA, Article 3, 8 pages. https://doi.org/10.1145/3354265.3354268
- Bai et al. (2019) Junjie Bai, Fang Lu, Ke Zhang, et al. 2019. ONNX: Open Neural Network Exchange. https://github.com/onnx/onnx.
- Brette et al. (2007) Romain Brette, Michelle Rudolph, Ted Carnevale, Michael Hines, David Beeman, James M Bower, Markus Diesmann, Abigail Morrison, Philip H Goodman, Frederick C Jr Harris, Milind Zirpe, Thomas Natschläger, Dejan Pecevski, Bard Ermentrout, Mikael Djurfeldt, Anders Lansner, Olivier Rochel, Thierry Vieville, Eilif Muller, Andrew P Davison, Sami El Boustani, and Alain Destexhe. 2007. Simulation of networks of spiking neurons: a review of tools and strategies. Journal of computational neuroscience 23, 3 (Dec 2007), 349–398.
- Dai and et al (2020) Kael Dai and et al. 2020. The SONATA data format for efficient description of large-scale network models. PLOS Computational Biology 16, 2 (2020), e1007696.
- Dai et al. (2020) Kael Dai, Sergey L. Gratiy, Yazan N. Billeh, Richard Xu, Binghuang Cai, Nicholas Cain, Atle E. Rimehaug, Alexander J. Stasik, Gaute T. Einevoll, Stefan Mihalas, Christof Koch, and Anton Arkhipov. 2020. Brain Modeling ToolKit: An open source software suite for multiscale modeling of brain circuits. PLoS Computational Biology 16 (2020).
- Davies (2019) Mike Davies. 2019. Benchmarks for progress in neuromorphic computing. Nature Machine Intelligence 1, 9 (2019), 386–388.
- Davison et al. (2009) Andrew Davison, Daniel Brüderle, Jochen Eppler, Jens Kremkow, Eilif Muller, Dejan Pecevski, Laurent Perrinet, and Pierre Yger. 2009. PyNN: a common interface for neuronal network simulators. Frontiers in Neuroinformatics 2 (2009). https://doi.org/10.3389/neuro.11.011.2008
- Golosio et al. (2021) Bruno Golosio, Gianmarco Tiddia, Chiara De Luca, Elena Pastorelli, Francesco Simula, and Pier Stanislao Paolucci. 2021. Fast Simulations of Highly-Connected Spiking Cortical Models Using GPUs. Frontiers in Computational Neuroscience 15 (2021).
- Hagberg et al. (2008) Aric A. Hagberg, Daniel A. Schult, and Pieter J. Swart. 2008. Exploring Network Structure, Dynamics, and Function using NetworkX. In Proceedings of the 7th Python in Science Conference, Gaël Varoquaux, Travis Vaught, and Jarrod Millman (Eds.). Pasadena, CA USA, 11 – 15.
- Intel (2022) Intel. 2022. Lava: A Software Framework for Neuromorphic Computing. GitHub Repository. https://github.com/lava-nc/lava
- Kale and Krishnan (1993) Laxmikant V. Kale and Sanjeev Krishnan. 1993. CHARM++: A Portable Concurrent Object Oriented System Based on C++. In Proceedings of the Eighth Annual Conference on Object-Oriented Programming Systems, Languages, and Applications (Washington, D.C., USA) (OOPSLA ’93). Association for Computing Machinery, New York, NY, USA, 91–108. https://doi.org/10.1145/165854.165874
- Karypis and Kumar (1998) George Karypis and Vipin Kumar. 1998. A Parallel Algorithm for Multilevel Graph Partitioning and Sparse Matrix Ordering. J. Parallel and Distrib. Comput. 48, 1 (1998), 71–95. https://doi.org/10.1006/jpdc.1997.1403
- Karypis and Schloegel (2013) George Karypis and Kirk Schloegel. 2013. ParMETIS: Parallel Graph Partitioning and Sparse Matrix Ordering Library (4.0 ed.). University of Minnesota, Department of Computer Science and Engineering, Minneapolis, MN.
- Knight and Nowotny (2021) James C. Knight and Thomas Nowotny. 2021. Larger GPU-accelerated brain simulations with procedural connectivity. Nature Computational Science 1, 2 (2021), 136–142. https://doi.org/10.1038/s43588-020-00022-7
- Kulkarni et al. (2021) S. Kulkarni, M. Parsa, J. P. Mitchell, and C. D. Schuman. 2021. Benchmarking the Performance of Neuromorphic and Spiking Neural Simulators. Neurocomputing 447 (2021), 145–160.
- Pehle and Pedersen (2021) Christian Pehle and Jens Egholm Pedersen. 2021. Norse - A deep learning library for spiking neural networks. https://doi.org/10.5281/zenodo.4422025 Documentation: https://norse.ai/docs/.
- Potjans and Diesmann (2014) Tobias C Potjans and Markus Diesmann. 2014. The cell-type specific cortical microcircuit: relating structure and activity in a full-scale spiking network model. Cereb Cortex 24, 3 (Mar 2014), 785–806.
- Rothganger et al. (2014) Fredrick Rothganger, Christina Warrender, Derek Trumbo, and James Aimone. 2014. N2A: a computational tool for modeling from neurons to algorithms. Frontiers in Neural Circuits 8 (2014). https://doi.org/10.3389/fncir.2014.00001
- Saad (2011) Yousef Saad. 2011. Numerical Methods for Large Eigenvalue Problems. Society for Industrial and Applied Mathematics. https://doi.org/10.1137/1.9781611970739 arXiv:https://epubs.siam.org/doi/pdf/10.1137/1.9781611970739
- Schuman et al. (2017) Catherine D. Schuman, Thomas E. Potok, Robert M. Patton, J. Douglas Birdwell, Mark E. Dean, Garrett S. Rose, and James S. Plank. 2017. A Survey of Neuromorphic Computing and Neural Networks in Hardware. ArXiv abs/1705.06963 (2017).
- Stimberg et al. (2019) Marcel Stimberg, Romain Brette, and Dan FM Goodman. 2019. Brian 2, an intuitive and efficient neural simulator. eLife 8 (Aug. 2019), e47314. https://doi.org/10.7554/eLife.47314
- Vineyard et al. (2019) Craig M. Vineyard, Sam Green, William M. Severa, and Çetin Kaya Koç. 2019. Benchmarking Event-Driven Neuromorphic Architectures. In Proceedings of the International Conference on Neuromorphic Systems (Knoxville, TN, USA). Association for Computing Machinery, New York, NY, USA, 5 pages.
- Wang (2015) Felix Wang. 2015. Simulation Tool for Asynchronous Cortical Streams (STACS): Interfacing with Spiking Neural Networks. In Procedia Computer Science (San Jose, CA, USA), Vol. 61. 322–7.
- West (2001) Douglas B. West. 2001. Introduction to Graph Theory (2 ed.). Prentice Hall.