Hongzhi You

1. MOE Key Laboratory for NeuroInformation, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
2. School of Electronic Engineering and Automation, Guilin University of Electronic Technology, Guilin, Guangxi, China
3. SynSense Tech. Co. Ltd., Ningbo, Zhejiang, China
Vector-Symbolic Architecture for Event-Based Optical Flow
Abstract
From the perspective of feature matching, optical flow estimation for event cameras involves identifying event correspondences by comparing feature similarity across accompanying event frames. In this work, we introduce an effective and robust high-dimensional (HD) feature descriptor for event frames, utilizing Vector Symbolic Architectures (VSA). The topological similarity among neighboring variables within VSA enhances the representation similarity of feature descriptors for flow-matching points, while its structured symbolic representation capacity facilitates feature fusion from both event polarities and multiple spatial scales. Based on this HD feature descriptor, we propose a novel feature matching framework for event-based optical flow, encompassing both model-based (VSA-Flow) and self-supervised learning (VSA-SM) methods. In VSA-Flow, accurate optical flow estimation validates the effectiveness of the HD feature descriptors. In VSA-SM, a novel similarity maximization method based on the HD feature descriptor is proposed to learn optical flow in a self-supervised way from events alone, eliminating the need for auxiliary grayscale images. Evaluation results demonstrate that our VSA-based method achieves superior accuracy compared to both model-based and self-supervised learning methods on the DSEC benchmark, while remaining competitive with both on the MVSEC benchmark. This contribution marks a significant advancement in event-based optical flow within the feature matching methodology.
Keywords: Vector-symbolic architecture, Optical flow, Event camera, Feature matching

1 Introduction
Event-based cameras are bio-inspired vision sensors that asynchronously provide per-pixel brightness changes as an event stream [18]. Leveraging their high temporal resolution, high dynamic range, and low latency, these cameras have the potential to enable accurate motion estimation, particularly optical flow [6, 3]. However, event-based optical flow estimation poses challenges due to the asynchronous and sparse nature of event-based visual information and the difficulty of obtaining ground-truth optical flow, compared to traditional cameras [18, 52]. Therefore, it is crucial to develop unsupervised optical flow methods that capitalize on the unique characteristics of event data, eliminating the dependency on expensive-to-collect and error-prone ground truth [52].
Optical flow estimation involves finding pixel correspondences between images captured at different moments. The feature matching method, a fundamental approach for event-based optical flow, relies on maximizing feature similarity between accompanying frames [18]. In this method, the feature for each event is typically represented by the image pattern around the corresponding pixel in the event frame [40, 41]. However, the inherent randomness in events [18] results in inconsistent image patterns of the same object across frames, posing challenges for acquiring accurate and robust feature descriptors. Due to the absence of an effective event-only local feature descriptor, the feature matching method for event-based optical flow has generally been limited to estimating sparse optical flow for key points, showing suboptimal performance [40, 41]. In self-supervised learning, accurate dense optical flow estimation remains challenging without restoring luminance or using additional sensor information, such as grayscale images [64, 22, 11, 13].
In this study, we introduce a high-dimensional (HD) feature descriptor for event frames, leveraging the Vector Symbolic Architecture (VSA). VSAs, recognized for their effectiveness in utilizing high-dimensional distributed vectors [33, 34], have traditionally been employed for symbolic representations of artificial shapes [29, 50, 51, 24] or few-shot classification tasks [23, 30]. In this work, VSAs form the basis of our novel descriptor for natural scenes captured by event cameras. The descriptor utilizes the local similarity characteristics of neighboring variables within VSA [14, 50] to reduce the impact of event randomness on representation accuracy. Employing structured symbolic representations [35], it achieves multi-spatial-scale and two-polarity feature fusion for the descriptor. Our evaluation of descriptor similarity for flow-matching points on the DSEC and MVSEC datasets demonstrates the effectiveness of the proposed approach.
Further, we focus on a unifying framework for event-based optical flow within the feature matching strategy, centered around the proposed HD feature descriptors. The model-based VSA-Flow method, derived from the framework, utilizes the similarity of HD feature descriptors to achieve more accurate dense optical flow. Integrating similarities in the cost volume from three event frame pairs with progressively doubling time intervals at gradually downsampled scales enables VSA-Flow to estimate large optical flow within a limited neighboring region. Meanwhile, the proposed VSA-SM method relies on a similarity maximization (SM) proxy loss for predicted flow-matching points. This novel self-supervised learning approach effectively estimates optical flow from event-only HD feature descriptors, eliminating the need for additional sensor information. Evaluation results show that we obtain the best accuracy among both model-based and self-supervised learning methods on the DSEC-Flow benchmark, and competitive performance on the MVSEC benchmark.
2 Related Works
2.1 Event-based Optical Flow Estimation
From a methodological perspective, event-based optical flow estimation encompasses three primary approaches [18]. The first is the gradient-based method, which leverages the spatial and temporal derivative information provided by event data, either directly or after appropriate processing, to compute optical flow [5, 6]. Previous studies have explored event-based adaptations of Horn-Schunck and Lucas-Kanade [26, 42, 5, 2], distance surfaces [3, 8], and spatio-temporal plane fitting [6, 1].
The second approach is the feature matching method, which calculates optical flow by evaluating the similarity or correlation of feature representations for individual pixels between consecutive event frames in the temporal domain. For instance, the model-based EDFLOW estimates optical flow by applying adaptive block matching [40, 41]. Meanwhile, this approach is frequently employed in the design of learning-based optical flow neural networks that incorporate cost volume modules capable of computing feature similarity or correlation, such as E-RAFT [21] and TMA [61]. In addition, treating auxiliary grayscale images as low-dimensional features, EV-FlowNet engages in self-supervised learning by minimizing the intensity difference between warped images based on the estimated optical flow [64].
The third approach, exclusive to event cameras, is the contrast maximization method. This method maximizes an objective function, often related to contrast, to quantify the alignment of events generated by the same scene edge [53, 16, 17]. The underlying idea is to estimate motion by reconstructing a clear motion-compensated image of the edge patterns that triggered the events. This approach can be applied not only to model-based optical flow estimation [52] but also frequently serves as a loss function for unsupervised and self-supervised optical flow learning [52, 60, 46, 22, 47].
In contrast to prior work, our proposed VSA-based framework for event-based optical flow adopts a classical feature matching approach to offer deeper insights into the problem. This framework is adaptable to both model-based and self-supervised learning methods, akin to the contrast maximization method [16, 52]. Particularly, the self-supervised learning method in the framework can achieve accurate optical flow solely from event-only VSA-based HD feature descriptors, eliminating the need for auxiliary grayscale images.
2.2 High-dimensional Representations of Images Using Vector Symbolic Architecture
Vector Symbolic Architectures (VSAs) are regarded as a powerful algorithmic framework that leverages high-dimensional distributed vectors and employs specific algebraic operations and structured symbolic representations [33, 34]. VSAs have demonstrated remarkable capabilities in various domains, including spatial cognition and visual scene understanding. The hypervector encoding of color images and event frames, including artificial shapes, is achieved through a superposition of spatial index vectors weighted by the corresponding image pixel values [51, 50]. These HD representations find application in neuromorphic visual scene understanding [51] and visual odometry [50]. Leveraging the structured symbolic representation capacity of VSAs, a biologically inspired spatial representation has been employed to generate hierarchical cognitive maps, each containing objects at various locations [35]. Moreover, several VSA-based approaches have been introduced as frameworks for the systematic aggregation of image descriptors suited to visual place recognition [45, 31]. Overall, VSA endows HD representations of images with intrinsic attributes of hierarchical structure and semantics.
Accurate representations of feature descriptors that encompass individual pixels and their contextual features are crucial for optical flow estimation based on the feature matching method. In contrast to prior work, we adopt a specific type of VSA, Vector Function Architecture (VFA), which embodies continuous similarity characteristics to reduce the impact of randomness in events. This specific VSA is employed as a HD kernel to extract localized feature information from event frames. Meanwhile, optical flow estimation models commonly incorporate a multi-scale pyramid design to enhance their performance. Utilizing the binding capacity of structured features in VSA, we amalgamate HD feature representations from multiple scales and two event polarities into a unified feature descriptor.
3 Methodology
3.1 Preliminary
VSAs constitute a family of computational models with vector representations that have two distinct properties [33, 14]. First, symbols are represented by mutually pseudo-orthogonal randomized $D$-dimensional vectors ($D \gg 1$), which facilitates a clear distinction between different symbols. Second, all computations within VSAs can be composed from a limited set of elementary vector algebraic operations, the primary ones being the binding ($\circledast$) and superposition ($+$) operations. The binding operation commonly signifies associations between symbols, such as a role-filler pair [27], while the superposition operation is frequently used to represent sets of symbols. Neither operation changes the hypervector dimensionality. Through the combination of these operations and symbols, VSAs can effectively achieve structured symbolic representations. For instance, consider a scenario in which character $a$ is located at position $P_1$ and character $b$ at position $P_2$ in a given image. The hypervector symbolic representation of this image can be denoted as $\mathbf{s} = \mathbf{a} \circledast \mathbf{p}_1 + \mathbf{b} \circledast \mathbf{p}_2$, where $\mathbf{a}$, $\mathbf{b}$, $\mathbf{p}_1$, and $\mathbf{p}_2$ represent mutually orthogonal randomized hypervectors of the corresponding concepts.
VSAs comprise various models that use different types of random vectors [33]. In this study, an improved Holographic Reduced Representation (HRR) is employed as the VSA model to ensure high concept retrieval efficacy [19]. For HRR, the binding operation is the circular convolution of two hypervectors, and the superposition operation is the component-wise sum. Additionally, the similarity between two HRRs can be measured through the cosine similarity.
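To make these primitives concrete, the following minimal sketch (illustrative only; the function names and the choice of NumPy are ours, not part of the original implementation) shows HRR binding via circular convolution, superposition via component-wise summation, cosine similarity, and the structured encoding of the image example above.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_hv(dim):
    """Random real-valued hypervector; in high dimensions such vectors are pseudo-orthogonal."""
    return rng.standard_normal(dim) / np.sqrt(dim)

def bind(a, b):
    """HRR binding: circular convolution, computed via the FFT."""
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=a.shape[-1])

def superpose(*vs):
    """Superposition: component-wise sum."""
    return np.sum(vs, axis=0)

def cosine(a, b):
    """Similarity between two HRR hypervectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Structured scene encoding: character a at position p1, character b at position p2.
a, b, p1, p2 = (random_hv(1024) for _ in range(4))
scene = superpose(bind(a, p1), bind(b, p2))
print(cosine(bind(a, p1), scene))   # high similarity: (a, p1) is contained in the scene
print(cosine(bind(a, p2), scene))   # near zero: a is not located at p2
```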
In this work, the feature extraction from event frames requires the VSA-based 2-D spatial representation. Here, we first introduce the fractional power encoding (FPE) method [48, 49] for representing integers along each coordinate axis in an image plane, and then the VSA-based spatial representation.
3.1.1 The Fractional Power Encoding Method
In the fractional power encoding method [49], let $k$ be an integer and $\mathbf{z}$ be a random base hypervector. The hypervector representation of the integer $k$ can be obtained by repeatedly binding the base vector with itself $k$ times as follows:

$$\mathbf{z}^{k} = \underbrace{\mathbf{z} \circledast \mathbf{z} \circledast \cdots \circledast \mathbf{z}}_{k\ \text{times}} = \mathcal{F}^{-1}\!\left\{\left(\mathcal{F}\{\mathbf{z}\}\right)^{k}\right\}, \tag{1}$$

where the rightmost expression denotes the fractional binding operation, obtained by expressing the binding in the complex domain [35, 14]. $\mathcal{F}$ denotes the Fourier transform, and $(\cdot)^{k}$ is a component-wise exponentiation of the corresponding complex vector.
3.1.2 The VSA-based Spatial Representation
Recent studies have demonstrated that the hypervector spatial representation of a point $(x, y)$ in 2-D space can be obtained using VSA with FPE [35, 15], as expressed in the following:

$$\mathbf{P}(x, y) = \mathbf{X}^{x} \circledast \mathbf{Y}^{y}. \tag{2}$$

Here, the random vectors $\mathbf{X}$ and $\mathbf{Y}$ represent the base vectors for the horizontal and vertical axes, respectively, and $\mathbf{X}^{x}$ and $\mathbf{Y}^{y}$ represent pseudo-orthogonal representation vectors for distinct integer positions $x$ and $y$ along each axis.
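The following sketch illustrates fractional power encoding and the 2-D spatial representation of Equations 1 and 2. It is a minimal example under our own naming; we assume a unitary base vector (unit-magnitude spectrum) so that component-wise exponentiation is well behaved, as in [14, 35].

```python
import numpy as np

def fpe_base(dim, rng):
    """Unitary base hypervector: unit-magnitude spectrum with random phases (real-valued in space)."""
    phases = rng.uniform(-np.pi, np.pi, dim // 2 + 1)
    phases[0] = 0.0      # keep the DC component real
    phases[-1] = 0.0     # keep the Nyquist component real (dim assumed even)
    return np.fft.irfft(np.exp(1j * phases), n=dim)

def fpe(base, k):
    """Fractional power encoding z^k (Eq. 1): component-wise exponentiation of the spectrum."""
    return np.fft.irfft(np.fft.rfft(base) ** k, n=base.shape[-1])

def spatial_hv(X, Y, x, y):
    """Spatial representation P(x, y) = X^x bound with Y^y (Eq. 2);
    binding of the two fractional powers is a product in the Fourier domain."""
    spec = np.fft.rfft(X) ** x * np.fft.rfft(Y) ** y
    return np.fft.irfft(spec, n=X.shape[-1])

rng = np.random.default_rng(1)
X, Y = fpe_base(1024, rng), fpe_base(1024, rng)
p = spatial_hv(X, Y, 3, -2)   # hypervector for the 2-D point (x, y) = (3, -2)
```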
3.2 The VSA-based Feature Matching Framework
This work aims to establish a novel framework for event-based optical flow utilizing VSA, adaptable to both model-based and self-supervised learning methods within the feature matching approach. Optical flow estimation involves finding pixel correspondences between images captured at distinct moments in time; effective event representations and precise feature descriptors are therefore essential in the framework.
3.2.1 Accumulative Time Surface
Event cameras are innovative bio-inspired sensors that respond to changes in brightness through continuous streams of events in a sparse and asynchronous manner. Each event $e_j$ comprises the space-time coordinates $(\mathbf{x}_j, t_j)$ and the polarity $p_j \in \{+, -\}$. In this work, we use an event representation called the accumulative Time Surface (TS) [36, 63]. An accumulative TS at pixel $\mathbf{x}$ and time $t$ is defined as follows:

$$T(\mathbf{x}, t) = \sum_{j:\ \mathbf{x}_j = \mathbf{x},\ t_j < t} \exp\!\left(-\frac{t - t_j}{\tau}\right). \tag{3}$$

Here, $\tau$ represents the exponential-decay rate, and $t_j$ denotes the timestamp of any event that occurred at pixel $\mathbf{x}$ prior to time $t$. Thus, the accumulative TS emulates the synaptic activity that follows the reception of a stream of events.
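A possible reading of Equation 3 is sketched below: every event that occurred at a pixel before the reference time contributes an exponentially decayed value to that pixel. The event array layout and the unit of the decay constant are assumptions for illustration, not the original implementation.

```python
import numpy as np

def accumulative_ts(events, t_ref, height, width, tau=35e-3):
    """Accumulative time surface at time t_ref (Eq. 3), one channel per polarity.

    events: (N, 4) array of (x, y, t, p) with polarity p in {0, 1};
    tau: exponential-decay constant (seconds assumed here)."""
    ts = np.zeros((2, height, width))
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    t = events[:, 2]
    p = events[:, 3].astype(int)
    keep = t < t_ref                                   # only events prior to the reference time
    np.add.at(ts, (p[keep], y[keep], x[keep]),
              np.exp(-(t_ref - t[keep]) / tau))        # decayed contribution per event
    return ts
```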
3.2.2 VSA-based HD Kernel for Feature Extraction
[Figure 1: Similarity structure of the VSA-based 2-D spatial representations: (a) the basic VSA representation is pseudo-orthogonal across positions; (b) the VFA-based representation exhibits topological (local) similarity.]
Utilizing the spatial representation described in Equation 2, the HD feature representation of the $(2r+1) \times (2r+1)$ neighborhood centered around the pixel $(x, y)$ in an image $I$ can be encoded as a hypervector using the following formula [51]:

$$\mathbf{v}(x, y) = \sum_{\Delta x = -r}^{r} \sum_{\Delta y = -r}^{r} I(x + \Delta x,\, y + \Delta y)\; \mathbf{X}^{\Delta x} \circledast \mathbf{Y}^{\Delta y}, \tag{4}$$

where $(\Delta x, \Delta y)$ denotes the offset from the pixel $(x, y)$ to any pixel within its neighborhood, in a scope of $\Delta x, \Delta y \in [-r, r]$. From the perspective of 2-D image convolution, we can utilize the collection of spatial hypervectors $\mathbf{K}(\Delta x, \Delta y) = \mathbf{X}^{\Delta x} \circledast \mathbf{Y}^{\Delta y}$ in Equation 4 as the HD kernel to achieve local feature extraction within a $(2r+1) \times (2r+1)$ neighborhood for each pixel in the image. Consequently, the HD feature descriptor of the image can be efficiently obtained by convolving the image with the HD kernel [62] as follows:

$$\mathbf{V} = I \ast \mathbf{K}, \tag{5}$$

where $\ast$ denotes a 2-D convolution applied independently to each hypervector dimension of $\mathbf{K}$.
In principle, feature descriptors are required to capture differences between distinct image patterns of event frames while exhibiting similarities among comparable image patterns, displaying a certain degree of continuous similarity as image patterns vary. However, the basic VSA spatial representation defined in Equations 2 and 4 ignores important topological similarity relationships in 2-D space due to its pseudo-orthogonal property (Figure 1a) [15]. Given the inherent randomness in the event representations of the same object at different times, the spatial representation (Equation 2) is unsuitable as a HD kernel for feature extraction from event frames in tasks involving feature matching.
Recent studies have revealed that the Vector Function Architecture (VFA) [14] and the hyperdimensional transform [12] exhibit continuous translation-invariant similarity kernels. Inspired by these findings, and for the sake of simplicity, here we employ a Gaussian-smoothed HD kernel with topological similarity to achieve the HD feature descriptors of the accumulative TS, as follows:

$$\mathbf{K}_{G} = G_{\sigma} \ast \mathbf{K}. \tag{6}$$

Here, $G_{\sigma}$ represents a two-dimensional Gaussian kernel with a standard deviation of $\sigma$, enabling the HD kernel to possess a translation-invariant similarity characteristic similar to VFA (Figure 3, Equation 12 and Theorem 1 in [14]). Hence, we consider $\mathbf{K}_{G}$ as a specific instance of VFA. The corresponding hypervector spatial representation exhibits topological similarity relationships within 2-D space (Figure 1b). Compared to the basic VSA (Equation 2), the local similarity characteristic of the spatial representation in VFA (Equation 6) effectively assists the feature descriptor in reducing the impact of event randomness on representation accuracy. Unless explicitly noted, the VSA used in the following sections is VFA.
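The sketch below outlines how the Gaussian-smoothed HD kernel and the resulting HD feature descriptor map might be computed, reusing `spatial_hv` from the earlier sketch. It is a simplified illustration (dense per-dimension filtering; the names and the SciPy-based smoothing are our assumptions), not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import correlate, gaussian_filter

def hd_kernel(X, Y, radius, sigma=1.5):
    """VFA-style HD kernel (Eq. 6): spatial hypervectors for all offsets in [-r, r]^2,
    smoothed with a 2-D Gaussian over the spatial (offset) axes only."""
    size = 2 * radius + 1
    dim = X.shape[-1]
    K = np.zeros((size, size, dim))
    for i, dy in enumerate(range(-radius, radius + 1)):
        for j, dx in enumerate(range(-radius, radius + 1)):
            K[i, j] = spatial_hv(X, Y, dx, dy)          # Eq. 2 hypervector for this offset
    return gaussian_filter(K, sigma=(sigma, sigma, 0))  # Gaussian smoothing of the kernel

def hd_descriptor(ts, K):
    """HD feature descriptor map (Eqs. 4-5): for each pixel, a weighted superposition of the
    kernel hypervectors over its neighborhood, computed as per-dimension 2-D filtering."""
    H, W = ts.shape
    D = K.shape[-1]
    V = np.empty((H, W, D))
    for d in range(D):                                  # simple loop kept for clarity
        V[..., d] = correlate(ts, K[..., d], mode='constant')
    return V
```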
[Figure 2: Overview of the VSA-based feature matching framework: (a) the VSA-based HD feature descriptor; (b) the model-based VSA-Flow method; (c) the cost volume computation.]
3.2.3 VSA-based HD Feature Descriptor
Inspired by classical estimation methods, feature descriptors at time $t$ are obtained with a multi-scale strategy [7, 43]. The VSA-based HD feature descriptor involves three steps (Figure 2a): transforming event streams into multiple scales of polarity-dependent accumulative TSs; generating HD feature descriptors for each scale by merging the TSs from both polarities; and amalgamating the HD feature descriptors from the various scales into the final HD descriptor at the original scale of the TSs. We leverage role-filler binding [33] to achieve the fusion of HD features, thereby realizing a structured representation of multi-scale, two-polarity HD feature descriptors.

First, the accumulative TSs for each polarity $p$ at time $t$ are obtained from the event streams according to Equation 3. These TSs then undergo successive down-interpolation at a fixed ratio, resulting in a set of TSs $T_{p}^{s}$ at scales $s = 0, \ldots, S-1$.
Second, utilizing the polarity-dependent HD kernel (Equation 6), the HD feature descriptor for the corresponding TS of each polarity at Scale $s$ can be efficiently computed as follows:

$$\mathbf{V}_{p}^{s} = T_{p}^{s} \ast \mathbf{K}_{G}^{p}, \qquad p \in \{+, -\}. \tag{7}$$

By role-filler binding, the HD feature descriptor for each scale is then obtained from the corresponding polarity-specific HD feature descriptors as follows:

$$\mathbf{V}^{s} = \mathbf{R}_{+} \circledast \mathbf{V}_{+}^{s} + \mathbf{R}_{-} \circledast \mathbf{V}_{-}^{s}, \tag{8}$$

where $\mathbf{R}_{+}$ and $\mathbf{R}_{-}$ denote the random role (key) vectors for the two polarities, respectively.
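A compact sketch of the two-polarity fusion of Equation 8 is given below (per-pixel binding with the polarity role vectors, implemented as a spectral product; names are illustrative).

```python
import numpy as np

def fuse_polarities(V_pos, V_neg, R_pos, R_neg):
    """Eq. 8: bind each polarity's descriptor map with its role vector and superpose.

    V_pos, V_neg: (H, W, D) descriptor maps; R_pos, R_neg: random role hypervectors of length D."""
    D = V_pos.shape[-1]
    Fp = np.fft.rfft(R_pos)[None, None, :]
    Fn = np.fft.rfft(R_neg)[None, None, :]
    bound_pos = np.fft.irfft(np.fft.rfft(V_pos, axis=-1) * Fp, n=D, axis=-1)
    bound_neg = np.fft.irfft(np.fft.rfft(V_neg, axis=-1) * Fn, n=D, axis=-1)
    return bound_pos + bound_neg                       # superposition of the two bound maps
```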
3.2.4 Description of the Framework
Optical flow estimation involves identifying pixel correspondences between images captured at two different moments in time. The foundation of the feature matching method lies in the assumption that accurately estimated optical flow corresponds to a higher similarity between corresponding pixels in accompanying event frames than between any other pixels. The VSA-based feature matching framework consists of two primary steps: 1) utilizing the VSA-based HD kernel to derive HD feature descriptors of consecutive event frames, and 2) employing algorithms such as search and optimization (for model-based methods) or neural networks with a proxy loss (for self-supervised learning methods). Both approaches aim to estimate optical flow by maximizing the similarity of the feature descriptors of flow-matching points. In the following, we apply this framework to a model-based method (VSA-Flow) and a self-supervised learning method (VSA-SM) for event-based optical flow.
3.3 VSA-Flow: A Model-based Method Using VSA
The details of VSA-Flow are illustrated in Figure 2b; the method comprises three main components: the HD feature extractors, the cost volume module, and the optical flow estimator. The HD feature extractors are responsible for obtaining the VSA-based HD feature descriptors of the accumulative TSs essential for optical flow estimation. The cost volume module calculates local visual similarity by constructing a volume representing the similarity between pairs of TS descriptors. Finally, the optical flow estimator generates the optical flow based on the local visual similarity.
3.3.1 HD feature extractors
The accuracy of event-based optical flow estimation is hindered by the stochastic nature of events, especially when relying solely on two accumulative TSs separated by the full estimation interval. To address this limitation and incorporate more comprehensive intermediate motion information into our method, we include accumulative TSs captured at four time instances, each with two polarities (Figure 2b). By utilizing this extended set of event frames, we can achieve more precise optical flow estimation over the full interval. Notably, the latter three time instances follow a progressive doubling pattern, which will be further explained in the subsequent subsection. Following that, the HD feature descriptors corresponding to the above event frames are acquired using the HD feature extractors depicted in Equation 9 and Figure 2a.
3.3.2 The cost volume module
Inspired by the basic cost volume in [56, 57], we adopt a strategy that integrates multiple pairs of HD feature descriptors with different time intervals: one descriptor pair per scale, at Scales 0, 1, and 2 (Figure 2b), where the time interval between the two descriptors of a pair doubles from one scale to the next (Figure 2c). The HD feature descriptors at the latter two scales are obtained through average pooling of those at Scale 0, with scale-dependent kernel sizes and equivalent strides. In this module, we first compute the local visual similarity for each pair of HD descriptors $\mathbf{V}_{1}^{s}$ and $\mathbf{V}_{2}^{s}$ at Scale $s$. Specifically, the HD descriptor of any event in $\mathbf{V}_{1}^{s}$ is compared for similarity only with the descriptors of pixels within a surrounding neighborhood in $\mathbf{V}_{2}^{s}$ (Figure 2c). Thus, the cost volume $C^{s}$ can be efficiently computed using the cosine similarity as follows:

$$C^{s}(\mathbf{x}, \mathbf{d}) = \frac{\mathbf{V}_{1}^{s}(\mathbf{x}) \cdot \mathbf{V}_{2}^{s}(\mathbf{x} + \mathbf{d})}{\left\lVert \mathbf{V}_{1}^{s}(\mathbf{x}) \right\rVert \left\lVert \mathbf{V}_{2}^{s}(\mathbf{x} + \mathbf{d}) \right\rVert}, \tag{10}$$

where $\mathbf{d}$ ranges over the displacements within the search neighborhood. The displacement $\mathbf{d}^{s}(\mathbf{x})$, which maps each event in $\mathbf{V}_{1}^{s}$ to its corresponding coordinates in $\mathbf{V}_{2}^{s}$, is obtained through the maximal similarity (Figure 2c). The estimated optical flow at the original scale of the event camera (Scale 0) can be calculated as follows:

$$\mathbf{f}(\mathbf{x}) = \frac{\Delta t}{\Delta t_{s}}\, \mathbf{d}_{0}^{s}(\mathbf{x}). \tag{11}$$

Here, $\mathbf{d}_{0}^{s}$ denotes the corresponding displacement of $\mathbf{d}^{s}$ transformed from Scale $s$ to Scale 0, $\Delta t_{s}$ is the time interval of the descriptor pair at Scale $s$, and $\Delta t$ is the full estimation interval. Assuming the optical flow is constant during the interval, Equation 11 reveals that $\mathbf{f}$ is independent of the scale and remains the same at different scales. This indicates that the cost volume $C^{s}$ should theoretically be the same for different scales. Hence, by up-interpolating all cost volumes at the different scales to the same size as at Scale 0, we obtain the final cost volume $C$ as the sum of all the cost volumes (Figure 2b). Integrating similarities in the cost volume from three event frame pairs with progressively doubling time intervals at gradually downsampled scales enables VSA-Flow to achieve large optical flow estimation within a limited neighboring region.
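The local cosine-similarity cost volume of Equation 10 and the displacement read-out can be sketched as follows (a direct, unoptimized loop over displacements; the array layout and names are our assumptions).

```python
import numpy as np

def cost_volume(V1, V2, radius):
    """Eq. 10: cosine similarity between each descriptor of V1 and the descriptors of V2
    within a (2r+1) x (2r+1) search neighborhood; output shape (H, W, 2r+1, 2r+1)."""
    H, W, D = V1.shape
    eps = 1e-8
    V1n = V1 / (np.linalg.norm(V1, axis=-1, keepdims=True) + eps)
    V2n = V2 / (np.linalg.norm(V2, axis=-1, keepdims=True) + eps)
    size = 2 * radius + 1
    C = np.full((H, W, size, size), -1.0)
    V2p = np.pad(V2n, ((radius, radius), (radius, radius), (0, 0)))
    for i, dy in enumerate(range(-radius, radius + 1)):
        for j, dx in enumerate(range(-radius, radius + 1)):
            shifted = V2p[radius + dy: radius + dy + H, radius + dx: radius + dx + W]
            C[:, :, i, j] = np.sum(V1n * shifted, axis=-1)
    return C

def displacement_from_cost(C, radius):
    """Per-pixel displacement (dx, dy) at the maximal similarity."""
    H, W = C.shape[:2]
    idx = C.reshape(H, W, -1).argmax(axis=-1)
    dy, dx = np.unravel_index(idx, (2 * radius + 1, 2 * radius + 1))
    return dx - radius, dy - radius
```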
3.3.3 The optical flow estimator
In this module, we adopt the scheme of optical flow probability volumes combined with prior information on the position of the optical flow to estimate the flow (Figure 2b) [9]. The optical flow probability volumes predict the probability of the optical flow within a local area for each pixel, based on the final cost volume $C$. The prior information on the position of the optical flow is provided by a predefined 2-D grid template of optical flow containing all candidate flow displacements, aligned with the optical flow probability volumes.
The optical flow probability volumes $P$ are calculated as follows:

(12)

Here, $C_{\max}$ and $C_{\text{mean}}$ represent the maximal and mean values of the cost volume, and a coefficient determines the probability area contributing to the estimated optical flow. The cost volume in Equation 12 is obtained by average pooling the original $C$ from the cost volume module, with a fixed kernel size and stride, to remove fluctuations due to the stochastic nature of events.
The template of optical flow is formed by concatenating two 2-D grids of candidate displacements along the $x$ and $y$ axes, each covering the search range of the cost volume [9]:

(13)

Next, the optical flow is estimated from a weighted average of its probability volumes over the predefined template (Figure 2b) [9], formulated as:

$$u(\mathbf{x}) = \sum_{\mathbf{d}} P(\mathbf{x}, \mathbf{d})\, d_{x}, \qquad v(\mathbf{x}) = \sum_{\mathbf{d}} P(\mathbf{x}, \mathbf{d})\, d_{y}, \tag{14}$$

where $u$ and $v$ represent the components of the predicted optical flow along the $x$ and $y$ axes, respectively, $P$ denotes the optical flow probability volume (Equation 12), and $(d_{x}, d_{y})$ ranges over the displacements of the template (Equation 13).
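Assuming the probability volume P has already been obtained from the cost volume as in Equation 12, the weighted average over the displacement template (Equations 13-14) reduces to a soft-argmax; a minimal sketch:

```python
import numpy as np

def flow_from_probability(P, radius):
    """Eqs. 13-14: probability-weighted average over the predefined displacement template.

    P: (H, W, 2r+1, 2r+1) probability volume that sums to 1 over the last two axes."""
    d = np.arange(-radius, radius + 1, dtype=float)
    dY, dX = np.meshgrid(d, d, indexing='ij')      # template grids along the y and x axes
    u = np.sum(P * dX, axis=(-2, -1))              # horizontal flow component
    v = np.sum(P * dY, axis=(-2, -1))              # vertical flow component
    return u, v
```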
[Figure 3: The VSA-SM self-supervised learning method: (a) the multi-frame similarity maximization scheme; (b) bilinear-weighted similarity at flow-matching points.]
3.4 VSA-SM: A Self-supervised Learning Method Through Similarity Maximization
Here, we adopt a self-supervised approach to learn optical flow estimation from accumulative TSs by maximizing the similarity of HD feature descriptors (Figure 3). We use a classical multi-frame approach for flow refinement, as illustrated in Figure 3a. During a time interval of $\Delta t$, we extract HD feature descriptors from the corresponding accumulative TSs at equally spaced sub-intervals of $\Delta t / N$, yielding a set of descriptors denoted as $\mathbf{V}_{0}, \ldots, \mathbf{V}_{N}$. Assuming the optical flow within the interval is represented by $\mathbf{f}$, the inferred optical flow between descriptor $\mathbf{V}_{i}$ and descriptor $\mathbf{V}_{j}$ equates to $\frac{j - i}{N}\,\mathbf{f}$. As a result, we utilize pairs of descriptors $(\mathbf{V}_{i}, \mathbf{V}_{j})$, where $i < j$, to facilitate flow refinement within the context of self-supervised learning.
Knowing the per-pixel optical flow, the matching point of a pixel $\mathbf{x}$ at the later time can be obtained through:

$$\mathbf{x}' = \mathbf{x} + \mathbf{f}_{i \to j}(\mathbf{x}), \tag{15}$$

where $\mathbf{f}_{i \to j}$ denotes the inferred flow between the descriptor pair $(\mathbf{V}_{i}, \mathbf{V}_{j})$. However, the matching point $\mathbf{x}'$ may not correspond to an actual pixel. Thus, the similarity between the HD feature descriptor of $\mathbf{x}$ in $\mathbf{V}_{i}$ and that of the matching point in $\mathbf{V}_{j}$ is calculated by evaluating its similarity to the descriptors of the neighboring pixels around the matching point in $\mathbf{V}_{j}$, with normalized weights $w_{\mathbf{q}}$ obtained via bilinear interpolation (Figure 3b):

$$S_{i,j}(\mathbf{x}) = \sum_{\mathbf{q} \in \mathcal{N}(\mathbf{x}')} w_{\mathbf{q}}\, \cos\!\left(\mathbf{V}_{i}(\mathbf{x}),\, \mathbf{V}_{j}(\mathbf{q})\right), \qquad \sum_{\mathbf{q} \in \mathcal{N}(\mathbf{x}')} w_{\mathbf{q}} = 1, \tag{16}$$

where $\mathcal{N}(\mathbf{x}')$ denotes the pixels neighboring the matching point and $\cos(\cdot, \cdot)$ the cosine similarity.
In this study, we use the similarity maximization proxy loss for feature matching, as outlined in Equation 16, to learn to estimate event-based optical flow. Building upon the principles of previous unsupervised learning methods that emphasize contrast maximization [52, 47], we build the loss function as follows:

$$\mathcal{L} = \mathcal{L}_{\text{sim}} + \lambda\, \mathcal{L}_{\text{smooth}}, \tag{17}$$

which is a weighted combination of two terms: the similarity loss $\mathcal{L}_{\text{sim}}$ and the smoothness term $\mathcal{L}_{\text{smooth}}$ with weight $\lambda$. The computation of the similarity loss involves two sets of pixels: those at which the most recent events occurred before the reference time, as well as pixels sampled at every 5th position both horizontally and vertically across the image plane (Figures 3a and 3b). The formulation of the similarity loss is as follows:

(18)

Here, the average similarity encompassing all relevant pixels within all pairs of descriptors enters Equation 18, scaled by a coefficient. A higher average similarity corresponds to more accurate optical flow estimation and a diminished similarity loss. Additionally, the smoothness term $\mathcal{L}_{\text{smooth}}$ adopts the Charbonnier smoothness prior [22, 65] or the first-order edge-aware smoothness [55].
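Since VSA-SM is trained with PyTorch, the bilinear-weighted similarity of Equations 15-16 can be sketched with `grid_sample`, as below. The exact form of Equation 18 is not reproduced here; `similarity_loss` is only a hedged stand-in that decreases as the mean similarity grows, with an assumed exponential shape.

```python
import torch
import torch.nn.functional as F

def similarity_at_matches(V1, V2, flow):
    """Eqs. 15-16: cosine similarity between descriptors of V1 and descriptors of V2 sampled
    (bilinearly) at the flow-displaced, possibly sub-pixel, matching points.

    V1, V2: (B, D, H, W) HD descriptor maps; flow: (B, 2, H, W) in pixels."""
    B, D, H, W = V1.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=flow.dtype, device=flow.device),
        torch.arange(W, dtype=flow.dtype, device=flow.device), indexing='ij')
    x_m = xs[None] + flow[:, 0]                          # matching-point x coordinates (Eq. 15)
    y_m = ys[None] + flow[:, 1]                          # matching-point y coordinates
    grid = torch.stack((2 * x_m / (W - 1) - 1,           # normalize to [-1, 1] for grid_sample
                        2 * y_m / (H - 1) - 1), dim=-1)
    V2_warp = F.grid_sample(V2, grid, mode='bilinear', align_corners=True)
    return F.cosine_similarity(V1, V2_warp, dim=1)       # (B, H, W) similarity map (Eq. 16)

def similarity_loss(sim, gamma=5.0):
    """Stand-in for Eq. 18 (assumed form): the loss shrinks as the mean similarity increases."""
    return torch.exp(-gamma * sim.mean())
```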
In this study, we train E-RAFT [21] in a self-supervised manner, utilizing the loss function described in Equation 17, to demonstrate the effectiveness of our self-supervised learning method based on similarity maximization of HD feature descriptors. In principle, this methodology holds applicability across various event-based optical flow networks. Meanwhile, we adopt the full-image warping technique [55] to improve flow quality near image boundaries.
4 Experiments
4.1 Datasets, Metrics and Implementation Details
Following previous works [21, 52], both VSA-Flow and VSA-SM are evaluated on the well-established event-based datasets DSEC-Flow (640 × 480 pixel resolution) [21] and MVSEC (346 × 260 pixel resolution) [64].

For the model-based method (VSA-Flow), experiments are conducted on the official testing set of the public DSEC-Flow benchmark and on the outdoor_day1 and three indoor_flying sequences of the MVSEC benchmark, using the grayscale-frame time intervals of the benchmark (dt = 1 and dt = 4). For the self-supervised learning method (VSA-SM), E-RAFT is trained on the official training set of DSEC and on the outdoor_day2 sequence of MVSEC, respectively. To increase the variation in the optical flow magnitude during training, the training sequences on MVSEC are extended with longer grayscale-frame time intervals. Following separate training, evaluations are performed on the same testing sets as for VSA-Flow on DSEC and MVSEC, respectively. Both methods are implemented using the PyTorch library. For VSA-SM training, we use the Adam optimizer [32] with a fixed batch size and learning rate.
We evaluate the accuracy of our predictions using the following metrics: (i) EPE, the endpoint error; (ii) 1PE and 3PE, the percentage of points with an EPE greater than 1 and 3 pixels, respectively; and (iii) AE, the angular error. For both the DSEC-Flow [20, 21] and MVSEC [64] datasets, metrics are measured over pixels with valid ground truth and at least one event in the evaluation intervals.
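For reference, the three metrics can be computed as below; the angular error follows the conventional 3-D formulation (appending a unit temporal component), which we assume matches the benchmarks' definition.

```python
import numpy as np

def flow_metrics(pred, gt, valid, n=3):
    """EPE, nPE (percentage of valid pixels with EPE > n px) and angular error (degrees).

    pred, gt: (H, W, 2) flow fields; valid: boolean mask of pixels with ground truth
    and at least one event in the evaluation interval."""
    err = np.linalg.norm(pred - gt, axis=-1)[valid]
    epe = err.mean()
    npe = 100.0 * (err > n).mean()
    p = np.concatenate([pred, np.ones_like(pred[..., :1])], axis=-1)[valid]
    g = np.concatenate([gt, np.ones_like(gt[..., :1])], axis=-1)[valid]
    cos = np.sum(p * g, axis=-1) / (np.linalg.norm(p, axis=-1) * np.linalg.norm(g, axis=-1))
    ae = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))).mean()
    return epe, npe, ae
```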
[Figure 4: Descriptor similarity of flow-matching points on the DSEC and MVSEC datasets for the basic VSA (blue) and VFA (red) HD kernels.]
4.2 Descriptor Similarity of Flow-Matching Points
In this study, HD feature descriptors are derived from feature extractors utilizing VSA-based HD kernels. We explore the impact of different VSA types (basic VSA and VFA) on the descriptor similarity among flow-matching points within the DSEC and MVSEC datasets (Figure 4).
In the basic VSA HD kernel, all hypervectors are pseudo-orthogonal, implying that each pixel within the neighborhoods contributes independently to the feature descriptor. Feature descriptors obtained from the basic VSA HD kernel reflect the most fundamental image patterns. Hence, Figure 4 (blue curves) reveals that the similarity of flow-matching points in the MVSEC dataset is inferior to that in the DSEC dataset. This observation suggests that, in comparison to the DSEC dataset, the MVSEC dataset experiences greater randomness in event frames, leading to lower event frame quality.
Figure 4 (red curves) illustrates that VFA yields higher descriptor similarity for flow-matching points compared to basic VSA. In contrast to basic VSA, VFA exhibits an improved ability to encode the similarity of flow-matching points in event frames.
Table 1: Evaluation results on the DSEC-Flow benchmark. MB: model-based, SL: supervised learning, SSL: self-supervised learning.

| | Methods | EPE | 1PE | 3PE | AE |
|---|---|---|---|---|---|
| MB | MultiCM [52] | 3.47 | 76.57 | 30.86 | 13.98 |
| | RTEF [8] | 4.88 | 82.81 | 41.96 | 10.82 |
| | VSA-Flow (VFA)† | 3.46±0.06 | 68.94±0.57 | 28.97±0.40 | 9.45±0.17 |
| | VSA-Flow (Basic VSA)† | 4.19±0.15 | 77.50±1.08 | 32.34±0.72 | 13.41±0.56 |
| SL | EV-FlowNet [21]∗ | 2.32 | 55.4 | 18.6 | — |
| | E-RAFT [21] | 0.79 | 12.74 | 2.68 | 2.85 |
| | IDNet [59] | 0.72 | 10.07 | 2.04 | 2.72 |
| | TMA [39] | 0.74 | 10.86 | 2.30 | 2.68 |
| | E-Flowformer [38] | 0.76 | 11.23 | 2.45 | 2.68 |
| SSL | EV-FlowNet [47]∗ | 3.86 | — | 31.45 | — |
| | TamingCM [47] | 2.33 | 68.29 | 17.77 | 10.56 |
| | VSA-SM (VFA)† | 2.22 | 55.46 | 16.83 | 8.86 |
Table 2: Evaluation results on the MVSEC benchmark. The upper block corresponds to the shorter evaluation interval (dt = 1 grayscale frame) and the lower block to the longer interval (dt = 4).

| | Methods | indoor_flying1 | | indoor_flying2 | | indoor_flying3 | | outdoor_day1 | |
|---|---|---|---|---|---|---|---|---|---|
| | | EPE | 3PE | EPE | 3PE | EPE | 3PE | EPE | 3PE |
| dt = 1 | | | | | | | | | |
| MB | Nagata et al. [44] | 0.62 | — | 0.93 | — | 0.84 | — | 0.77 | — |
| | Akolkar et al. [1] | 1.52 | — | 1.59 | — | 1.89 | — | 2.75 | — |
| | Brebion et al. [8] | 0.52 | 0.10 | 0.98 | 5.50 | 0.71 | 2.10 | 0.53 | 0.20 |
| | MultiCM [52] | 0.42 | 0.10 | 0.60 | 0.59 | 0.50 | 0.28 | 0.30 | 0.10 |
| | VSA-Flow (VFA)† | 0.46 | 0.05 | 0.65 | 1.08 | 0.53 | 0.29 | 0.65 | 3.60 |
| SL | EV-FlowNet+ [54] | 0.56 | 1.00 | 0.66 | 1.00 | 0.59 | 1.00 | 0.68 | 0.99 |
| | E-RAFT [21] | 1.10 | 5.72 | 1.94 | 30.79 | 1.66 | 25.20 | 0.24 | 0.00 |
| | TMA [39] | 1.06 | 3.63 | 1.81 | 27.29 | 1.58 | 23.26 | 0.25 | 0.07 |
| SSLF | EV-FlowNet [64] | 1.03 | 2.20 | 1.72 | 15.10 | 1.53 | 11.90 | 0.49 | 0.20 |
| | Spike-FlowNet [37] | 0.84 | — | 1.28 | — | 1.11 | — | 0.49 | — |
| | STE-FlowNet [13] | 0.57 | 0.10 | 0.79 | 1.60 | 0.72 | 1.30 | 0.42 | 0.00 |
| SSL | EV-FlowNet [54] | 0.58 | 0.00 | 1.02 | 4.00 | 0.87 | 3.00 | 0.32 | 0.00 |
| | Hagenaars et al. [22] | 0.60 | 0.51 | 1.17 | 8.06 | 0.93 | 5.64 | 0.47 | 0.25 |
| | VSA-SM (VFA)∗ | 0.57 | 0.07 | 0.91 | 3.91 | 0.69 | 1.63 | 0.46 | 3.42 |
| dt = 4 | | | | | | | | | |
| MB | MultiCM [52] | 1.69 | 12.95 | 2.49 | 26.35 | 2.06 | 19.03 | 1.25 | 9.21 |
| | VSA-Flow (VFA)† | 1.44 | 6.71 | 2.49 | 18.01 | 1.79 | 11.90 | 1.66 | 13.96 |
| SL | E-RAFT [21] | 2.81 | 40.25 | 5.09 | 64.19 | 4.46 | 57.11 | 0.72 | 1.12 |
| | TMA [39] | 2.43 | 29.91 | 4.32 | 52.74 | 3.60 | 42.02 | 0.70 | 1.08 |
| SSLF | EV-FlowNet [64] | 2.25 | 24.70 | 4.05 | 45.30 | 3.45 | 39.70 | 1.23 | 7.30 |
| | Spike-FlowNet [37] | 2.24 | — | 3.83 | — | 3.18 | — | 1.09 | — |
| | STE-FlowNet [13] | 1.77 | 14.70 | 2.52 | 26.10 | 2.23 | 22.10 | 0.99 | 3.90 |
| SSL | EV-FlowNet [54] | 2.18 | 24.20 | 3.85 | 46.80 | 3.18 | 47.80 | 1.30 | 9.70 |
| | Hagenaars et al. [22] | 2.16 | 21.51 | 3.90 | 40.72 | 3.00 | 29.60 | 1.69 | 12.50 |
| | VSA-SM (VFA)∗ | 1.63 | 10.05 | 2.92 | 22.57 | 1.98 | 13.12 | 1.24 | 8.31 |
4.3 Results on DSEC
Table 1 presents the evaluation results on the DSEC-Flow benchmark [21]. The methods listed in the rows are classified into three types: model-based methods (MB), supervised learning methods (SL), and self-supervised learning methods (SSL). The notations 'VFA' and 'Basic VSA' within the parentheses for our methods indicate the use of the VFA (Equation 6) or the basic VSA (Equation 2) HD kernel for the feature descriptors. It is important to note that the stochastic nature of generating spatial base vectors for the HD kernel impacts the evaluation of the VSA-Flow method; therefore, all evaluation metrics for VSA-Flow represent statistics (mean and standard deviation) over 10 randomly generated sets of HD kernels. Regarding the VSA-SM method, due to its prolonged training duration, Table 1 showcases the evaluation results based on a single set of randomly generated HD kernels used during training.
The VSA-Flow (VFA) method provides the best performance among all model-based methods on the DSEC-Flow dataset. In particular, the EPE and 3PE metrics slightly outperform other methods, whereas the 1PE and AE metrics display substantial improvements. Moreover, it is evident that employing VFA as the HD kernel in VSA-Flow leads to a significant performance improvement compared to the basic VSA, which is consistent with the observations in Figure 4. In the self-supervised learning group, the proposed VSA-SM (VFA) method demonstrates the best results among all self-supervised learning methods. The extent of its improvement across the metrics aligns with the evaluation outcomes of VSA-Flow (VFA).
4.4 Results on MVSEC
Table 2 reports the evaluation results on the MVSEC benchmark [64]. Due to the small standard deviations of all metrics across randomly generated HD kernels (Table 3), for the sake of simplicity, the evaluation results on MVSEC for our methods come from a single set of randomly generated HD kernels. Consistent with [64] and [52], Table 2 compares the primary methods that use the same training and testing sequences; learning-based methods trained on other outdoor sequences or datasets are not included in the comparison.
The VSA-Flow method achieves the best results among all methods on the indoor_flying sequences for the larger time interval (dt = 4) and competitive results for the smaller one (dt = 1). These results indicate that the model-based VSA-Flow method, based on HD feature descriptors, is well-suited for large optical flow estimation (dt = 4) and maintains competitiveness for low optical flow (dt = 1). In addition, compared to the indoor_flying sequences, the performance of VSA-Flow is less competitive on the outdoor_day sequence. This discrepancy may primarily stem from the fact that, compared to the indoor_flying scenes, the smaller motion in the outdoor_day scene leads to sparser events [65], thereby impacting the representation of the HD feature descriptors.
As mentioned earlier, the training sequences for VSA-SM on MVSEC are extended with longer grayscale-frame time intervals. Because VSA-Flow exhibits relatively weaker performance for low optical flow (dt = 1, Table 2) than for large optical flow (dt = 4, Table 2), the training strategy for VSA-SM scales the optical flow predictions at the different time intervals by corresponding factors, and self-supervised learning is then conducted using the HD feature descriptors of the event frames at the extended intervals. Evaluation results indicate that the VSA-SM method achieves competitive performance compared to other self-supervised learning methods. Furthermore, it outperforms some semi-supervised learning methods that employ grayscale images for supervision, particularly on certain sequences.
It is noteworthy that many learning methods, including VSA-SM, exhibit lower performance in the indoor scenes compared to model-based methods. This discrepancy arises because training for MVSEC is exclusively conducted on the outdoor_day2 sequence, but indoor and outdoor sequences contain distinct scene information.
[Figure 5: Qualitative results of VSA-Flow and VSA-SM on sequences from the DSEC-Flow test partition, compared with E-RAFT.]
4.5 Qualitative Results on DSEC
Qualitative results of both the VSA-Flow and VSA-SM methods on multiple sequences from the test partition of the DSEC-Flow dataset are shown in Figure 5. Given the unavailability of ground truth for the official testing set, a comparison with the state-of-the-art E-RAFT architecture [21] is performed. Our model-based and self-supervised learning methods achieve high-quality event-based optical flow estimation from events alone, without additional sensory information. Several conclusions can be drawn from these results: (1) Both VSA-Flow and VSA-SM accurately estimate optical flow, particularly in regions containing events; event-masked sparse optical flow estimation appears more accurate than dense flow estimation. (2) The optical flow estimation from VSA-SM appears smoother than that from VSA-Flow. (3) VSA-Flow exhibits inaccuracies in optical flow estimation near image boundaries, whereas the adoption of a full-image warping technique [55] for VSA-SM during self-supervised learning enhances its accuracy near image boundaries. (4) Because both methods rely solely on event frames for flow estimation, accuracy diminishes in large areas devoid of events, sometimes resulting in zero flow estimation, a trend consistent with other self-supervised learning methods [22, 47]. (5) As model-based and self-supervised learning approaches relying on event-only local features, our methods predict optical flow less smoothly than supervised learning methods; they also exhibit less sharp optical flow estimation at object edges, displaying a smoother transition.
Table 3: Effects of the hypervector dimension (D) and the multi-scale number (S) on the VSA-Flow method (DSEC-Flow; mean ± standard deviation over randomly generated HD kernels).

| D | S | EPE | 1PE | 3PE | AE |
|---|---|---|---|---|---|
| 1024 | 1 | 3.40±0.05 | 70.26±1.17 | 28.36±0.49 | 9.93±0.31 |
| 1024 | 2 | 3.46±0.06 | 68.94±0.57 | 28.97±0.40 | 9.45±0.17 |
| 1024 | 3 | 3.85±0.06 | 69.81±0.46 | 31.35±0.28 | 9.64±0.10 |
| 512 | 2 | 3.56±0.07 | 69.44±1.20 | 29.55±0.56 | 9.73±0.32 |
| 256 | 2 | 3.63±0.12 | 70.10±1.25 | 29.97±0.84 | 10.00±0.43 |
| 128 | 2 | 3.87±0.28 | 72.71±2.21 | 31.75±1.64 | 11.08±1.02 |
4.6 Effects of Hypervector Dimension and Multi-scale
Table 3 reports evaluation results for experiments on the VSA-Flow method with varying hypervector dimensions ($D$) and different multi-scale numbers ($S$) for the HD feature descriptor. When $D = 1024$, VSA-Flow exhibits better EPE and 3PE metrics with $S = 1$, while for $S = 2$ it demonstrates better 1PE and AE metrics. Moreover, when $S$ remains constant and $D$ is altered, all metrics improve as $D$ increases, indicating that a larger hypervector dimension leads to enhanced performance. This result is consistent with the understanding that, within VSA, increased hypervector dimensions contribute to heightened information encoding capabilities [33, 34].
[Figure 6: EPE and 3PE of VSA-Flow on DSEC and MVSEC as a function of the exponential-decay rate of the time surface.]
4.7 Effects of Exponential-decay Rate of Time Surface
The temporal information of the HD feature descriptor is primarily governed by the exponential-decay rate $\tau$ of the accumulative Time Surface (TS). Figure 6 illustrates the EPE and 3PE metrics for a single trial of the VSA-Flow method on the DSEC and MVSEC datasets with varying $\tau$. Both metrics exhibit a trend of initially decreasing and then increasing with $\tau$. These results indicate that the performance of VSA-Flow's optical flow estimation deteriorates when $\tau$ is either too small or too large, with optimal performance observed within a suitable range of $\tau$. This is because a short $\tau$ emphasizes only the most recent events, resulting in a sparse and inadequate TS; conversely, an excessively long $\tau$ causes the TS to encompass events over an extended period, leading to a blurred representation. Hence, an appropriate $\tau$ is essential. It is worth noting that the optimal range of $\tau$ for the VSA-Flow method differs between DSEC and MVSEC due to variations in the characteristics of the event cameras used. In contrast to DSEC, events in MVSEC are sparser, requiring a larger $\tau$; this indicates the necessity of accumulating events over a longer period for MVSEC to achieve more accurate information encoding in the TS.
5 Conclusions and Discussions
In summary, our work introduces a novel VSA-based feature matching framework for event-based optical flow, applicable to both model-based (VSA-Flow) and self-supervised learning (VSA-SM) methods. The key to our work lies in the effective utilization of a VSA-based HD feature descriptor for event frames. The proposed methods achieve accurate estimation of event-based optical flow within the feature matching methodology without restoring luminance or requiring additional sensor information [64, 22, 11, 13, 58]. This work signifies an important advancement in event-based optical flow within the feature matching methodology, underscored by our compelling and robust results. The proposed framework has broad applicability, extending to other event-based tasks such as depth estimation and tracking.
Currently, most primary methods for event-based optical flow estimation applicable to both model-based and self-supervised learning are contrast maximization methods [52, 60, 46, 22, 47]. Contrast maximization (CM) methods excel at utilizing the temporal information of events but are less adept at exploiting local spatial features. Hence, these methods perform well in estimating optical flow within short time intervals or for small flow magnitudes, but require more complex strategies to achieve satisfactory performance over larger time intervals, such as producing sharp images of warped events (IWEs) at multiple reference times through iterative warping [22, 47]. In contrast, our methods, based on feature similarity maximization, excel at utilizing the local spatial features of events but are comparatively weaker at exploiting temporal information. Consequently, our methods demonstrate better performance in optical flow estimation over larger time intervals (Table 2). Our methods achieve competitive performance without complex strategies and circumvent issues such as occlusions and overfitting observed when warping events in CM methods [52]. Future research will focus on enhancing the temporal encoding capability of the HD feature descriptors.
Traditionally, feature matching is primarily determined by the differences between two local image patches within the neighborhoods of two feature points, often quantified using metrics such as the sum of absolute differences or the Euclidean distance [36, 40, 63]. This approach is frequently applied in event camera hardware platforms [41]. However, due to the inherent randomness in events, it may not be the most effective way to gauge feature similarity directly from local event frames. Inspired by [14, 50], we utilize the VSA-based HD kernel to extract local features and structured symbolic representations to achieve feature fusion from both event polarities and multiple spatial scales. These approaches enhance the similarity of flow-matching feature descriptors, as shown in our evaluation results. VSA, also known as Hyperdimensional Computing, is considered an emerging neuromorphic computing model for ultra-efficient edge AI [28, 4, 66]. Presently, our method focuses on dense optical flow estimation. With appropriate adjustments and configurations, it holds promise for efficient and rapid sparse optical flow estimation on hardware, facilitating the design of event-driven hardware optical flow sensors [10, 25, 41].
Acknowledgements
Data Availability Statement
The DSEC-Flow dataset is available for download from the website at https://dsec.ifi.uzh.ch/dsec-datasets/download/.
The MVSEC dataset is available for download from the website at https://daniilidis-group.github.io/mvsec/download/.
Appendix A Parameter Configurations
Table 4: Parameter configurations of the proposed VSA-Flow and VSA-SM methods.

| Parameters | VSA-Flow (DSEC) | VSA-Flow (MVSEC) | VSA-SM (DSEC) | VSA-SM (MVSEC) |
|---|---|---|---|---|
| Accumulative time surface | | | | |
| Exponential-decay rate τ | 35∗ | 35∗ | 35 | 35 |
| VSA-based kernel and HD feature descriptor | | | | |
| Hypervector dimension D | 1024∗ | 1024 | 1024∗ | 1024 |
| Size of the HD kernel | 21 | 25 | 21 | 25 |
| Standard deviation σ of the Gaussian kernel | 1.5 | 1.5 | 1.5 | 1.5 |
| Multi-scale number S in the descriptor | 2∗ | 2 | 2∗ | 2 |
| Cost volume module and optical flow estimator (VSA-Flow only) | | | | |
| Neighborhood size of the cost volume | 31 | 31 | — | — |
| Coefficient in the optical flow probability volume | 0.85 | 0.60 | — | — |
| Kernel size of average pooling | 71 | 71 | — | — |
| Loss function (VSA-SM only) | | | | |
| Weight of the smoothness term | — | — | 1.0 | 1.0 |
| Coefficient in the similarity loss | — | — | 5.0 | 5.0 |

∗ Unless explicitly noted.
Table 4 lists the parameter values used in the proposed VSA-Flow and VSA-SM methods.
References
- Akolkar et al [2020] Akolkar H, Ieng SH, Benosman R (2020) Real-time high speed motion prediction using fast aperture-robust event-driven visual flow. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(1):361–372
- Almatrafi and Hirakawa [2019] Almatrafi M, Hirakawa K (2019) Davis camera optical flow. IEEE Transactions on Computational Imaging 6:396–407
- Almatrafi et al [2020] Almatrafi M, Baldwin R, Aizawa K, et al (2020) Distance surface for event-based optical flow. IEEE transactions on pattern analysis and machine intelligence 42(7):1547–1556
- Amrouch et al [2022] Amrouch H, Imani M, Jiao X, et al (2022) Brain-inspired hyperdimensional computing for ultra-efficient edge ai. In: 2022 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ ISSS), IEEE, pp 25–34
- Benosman et al [2012] Benosman R, Ieng SH, Clercq C, et al (2012) Asynchronous frameless event-based optical flow. Neural Networks 27:32–37
- Benosman et al [2013] Benosman R, Clercq C, Lagorce X, et al (2013) Event-based visual flow. IEEE transactions on neural networks and learning systems 25(2):407–417
- Black and Anandan [1996] Black MJ, Anandan P (1996) The robust estimation of multiple motions: Parametric and piecewise-smooth flow fields. Computer vision and image understanding 63(1):75–104
- Brebion et al [2022] Brebion V, Moreau J, Davoine F (2022) Real-time optical flow for vehicular perception with low-and high-resolution event cameras. IEEE Transactions on Intelligent Transportation Systems 23(9)
- Cao et al [2023] Cao YJ, Zhang XS, Luo FY, et al (2023) Learning generalized visual odometry using position-aware optical flow and geometric bundle adjustment. Pattern Recognition 136:109262
- Chao et al [2013] Chao H, Gu Y, Gross J, et al (2013) A comparative study of optical flow and traditional sensors in uav navigation. In: 2013 American Control Conference, IEEE, pp 3858–3863
- Deng et al [2021] Deng Y, Chen H, Chen H, et al (2021) Learning from images: A distillation learning framework for event cameras. IEEE Transactions on Image Processing 30:4919–4931
- Dewulf et al [2023] Dewulf P, De Baets B, Stock M (2023) The hyperdimensional transform for distributional modelling, regression and classification. arXiv preprint arXiv:231108150
- Ding et al [2022] Ding Z, Zhao R, Zhang J, et al (2022) Spatio-temporal recurrent networks for event-based optical flow estimation. In: Proceedings of the AAAI conference on artificial intelligence, pp 525–533
- Frady et al [2021] Frady EP, Kleyko D, Kymn CJ, et al (2021) Computing on functions using randomized vector representations. arXiv preprint arXiv:210903429
- Frady et al [2022] Frady EP, Kleyko D, Kymn CJ, et al (2022) Computing on functions using randomized vector representations (in brief). In: Proceedings of the 2022 Annual Neuro-Inspired Computational Elements Conference, pp 115–122
- Gallego et al [2018] Gallego G, Rebecq H, Scaramuzza D (2018) A unifying contrast maximization framework for event cameras, with applications to motion, depth, and optical flow estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3867–3876
- Gallego et al [2019] Gallego G, Gehrig M, Scaramuzza D (2019) Focus is all you need: Loss functions for event-based vision. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 12280–12289
- Gallego et al [2020] Gallego G, Delbrück T, Orchard G, et al (2020) Event-based vision: A survey. IEEE transactions on pattern analysis and machine intelligence 44(1):154–180
- Ganesan et al [2021] Ganesan A, Gao H, Gandhi S, et al (2021) Learning with holographic reduced representations. Advances in Neural Information Processing Systems 34:25606–25620
- Gehrig et al [2021a] Gehrig M, Aarents W, Gehrig D, et al (2021a) Dsec: A stereo event camera dataset for driving scenarios. IEEE Robotics and Automation Letters 6(3):4947–4954
- Gehrig et al [2021b] Gehrig M, Millhäusler M, Gehrig D, et al (2021b) E-raft: Dense optical flow from event cameras. In: 2021 International Conference on 3D Vision (3DV), IEEE, pp 197–206
- Hagenaars et al [2021] Hagenaars J, Paredes-Vallés F, De Croon G (2021) Self-supervised learning of event-based optical flow with spiking neural networks. Advances in Neural Information Processing Systems 34:7167–7179
- Hersche et al [2022] Hersche M, Karunaratne G, Cherubini G, et al (2022) Constrained few-shot class-incremental learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 9057–9067
- Hersche et al [2023] Hersche M, Zeqiri M, Benini L, et al (2023) A neuro-vector-symbolic architecture for solving raven’s progressive matrices. Nature Machine Intelligence 5(4):363–375
- Honegger et al [2013] Honegger D, Meier L, Tanskanen P, et al (2013) An open source and open hardware embedded metric optical flow cmos camera for indoor and outdoor applications. In: 2013 IEEE International Conference on Robotics and Automation, IEEE, pp 1736–1741
- Horn and Schunck [1981] Horn BK, Schunck BG (1981) Determining optical flow. Artificial intelligence 17(1-3):185–203
- Kanerva [2009] Kanerva P (2009) Hyperdimensional computing: An introduction to computing in distributed representation with high-dimensional random vectors. Cognitive computation 1:139–159
- Karunaratne et al [2020] Karunaratne G, Le Gallo M, Cherubini G, et al (2020) In-memory hyperdimensional computing. Nature Electronics 3(6):327–337
- Karunaratne et al [2021] Karunaratne G, Schmuck M, Le Gallo M, et al (2021) Robust high-dimensional memory-augmented neural networks. Nature communications 12(1):2468
- Karunaratne et al [2022] Karunaratne G, Hersche M, Langeneager J, et al (2022) In-memory realization of in-situ few-shot continual learning with a dynamically evolving explicit memory. In: ESSCIRC 2022-IEEE 48th European Solid State Circuits Conference (ESSCIRC), IEEE, pp 105–108
- Kempitiya et al [2022] Kempitiya T, De Silva D, Kahawala S, et al (2022) Parameterization of vector symbolic approach for sequence encoding based visual place recognition. In: 2022 International Joint Conference on Neural Networks (IJCNN), IEEE, pp 1–7
- Kingma and Ba [2014] Kingma DP, Ba J (2014) Adam: A method for stochastic optimization. arXiv preprint arXiv:14126980
- Kleyko et al [2021] Kleyko D, Rachkovskij DA, Osipov E, et al (2021) A survey on hyperdimensional computing aka vector symbolic architectures, part i: Models and data transformations. ACM Computing Surveys (CSUR)
- Kleyko et al [2023] Kleyko D, Rachkovskij D, Osipov E, et al (2023) A survey on hyperdimensional computing aka vector symbolic architectures, part ii: Applications, cognitive models, and challenges. ACM Computing Surveys 55(9):1–52
- Komer [2020] Komer B (2020) Biologically inspired spatial representation
- Lagorce et al [2016] Lagorce X, Orchard G, Galluppi F, et al (2016) Hots: a hierarchy of event-based time-surfaces for pattern recognition. IEEE transactions on pattern analysis and machine intelligence 39(7):1346–1359
- Lee et al [2020] Lee C, Kosta AK, Zhu AZ, et al (2020) Spike-flownet: event-based optical flow estimation with energy-efficient hybrid neural networks. In: European Conference on Computer Vision, Springer, pp 366–382
- Li et al [2023] Li Y, Huang Z, Chen S, et al (2023) Blinkflow: A dataset to push the limits of event-based optical flow estimation. arXiv preprint arXiv:230307716
- Liu et al [2023] Liu H, Chen G, Qu S, et al (2023) Tma: Temporal motion aggregation for event-based optical flow. arXiv preprint arXiv:230311629
- Liu and Delbruck [2018] Liu M, Delbruck T (2018) Adaptive time-slice block-matching optical flow algorithm for dynamic vision sensors. BMVC
- Liu and Delbruck [2022] Liu M, Delbruck T (2022) Edflow: Event driven optical flow camera with keypoint detection and adaptive block matching. IEEE Transactions on Circuits and Systems for Video Technology 32(9):5776–5789
- Lucas and Kanade [1981] Lucas BD, Kanade T (1981) An iterative image registration technique with an application to stereo vision. In: IJCAI’81: 7th international joint conference on Artificial intelligence, pp 674–679
- Mémin and Pérez [2002] Mémin E, Pérez P (2002) Hierarchical estimation and segmentation of dense motion fields. International Journal of Computer Vision 46:129–155
- Nagata et al [2021] Nagata J, Sekikawa Y, Aoki Y (2021) Optical flow estimation by matching time surface with event-based cameras. Sensors 21(4):1150
- Neubert and Schubert [2021] Neubert P, Schubert S (2021) Hyperdimensional computing as a framework for systematic aggregation of image descriptors. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 16938–16947
- Paredes-Vallés and de Croon [2021] Paredes-Vallés F, de Croon GC (2021) Back to event basics: Self-supervised learning of image reconstruction for event cameras via photometric constancy. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 3446–3455
- Paredes-Vallés et al [2023] Paredes-Vallés F, Scheper KY, De Wagter C, et al (2023) Taming contrast maximization for learning sequential, low-latency, event-based optical flow. arXiv preprint arXiv:230305214
- Plate [1992] Plate TA (1992) Holographic recurrent networks. Advances in neural information processing systems 5
- Plate [1994] Plate TA (1994) Distributed representations and nested compositional structure. Citeseer
- Renner et al [2022a] Renner A, Supic L, Danielescu A, et al (2022a) Neuromorphic visual odometry with resonator networks. arXiv preprint arXiv:220902000
- Renner et al [2022b] Renner A, Supic L, Danielescu A, et al (2022b) Neuromorphic visual scene understanding with resonator networks. arXiv preprint arXiv:220812880
- Shiba et al [2022] Shiba S, Aoki Y, Gallego G (2022) Secrets of event-based optical flow. In: European Conference on Computer Vision, Springer, pp 628–645
- Stoffregen and Kleeman [2018] Stoffregen T, Kleeman L (2018) Simultaneous optical flow and segmentation (sofas) using dynamic vision sensor. arXiv preprint arXiv:180512326
- Stoffregen et al [2020] Stoffregen T, Scheerlinck C, Scaramuzza D, et al (2020) Reducing the sim-to-real gap for event cameras. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXVII 16, Springer, pp 534–549
- Stone et al [2021] Stone A, Maurer D, Ayvaci A, et al (2021) Smurf: Self-teaching multi-frame unsupervised raft with full-image warping. In: Proceedings of the IEEE/CVF conference on Computer Vision and Pattern Recognition, pp 3887–3896
- Sun et al [2018] Sun D, Yang X, Liu MY, et al (2018) Pwc-net: Cnns for optical flow using pyramid, warping, and cost volume. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8934–8943
- Teed and Deng [2020] Teed Z, Deng J (2020) Raft: Recurrent all-pairs field transforms for optical flow. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part II 16, Springer, pp 402–419
- Wan et al [2022] Wan Z, Dai Y, Mao Y (2022) Learning dense and continuous optical flow from an event camera. IEEE Transactions on Image Processing 31:7237–7251
- Wu et al [2022] Wu Y, Paredes-Vallés F, de Croon GC (2022) Lightweight event-based optical flow estimation via iterative deblurring. arXiv preprint arXiv:221113726
- Ye et al [2020] Ye C, Mitrokhin A, Fermüller C, et al (2020) Unsupervised learning of dense optical flow, depth and egomotion with event-based sensors. In: 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, pp 5831–5838
- Ye et al [2023] Ye Y, Shi H, Yang K, et al (2023) Towards anytime optical flow estimation with event cameras. arXiv preprint arXiv:230705033
- Zhang et al [1988] Zhang W, Tanida J, Itoh K, et al (1988) Shift-invariant pattern recognition neural network and its optical architecture. In: Proceedings of annual conference of the Japan Society of Applied Physics, Montreal, CA
- Zhou et al [2021] Zhou Y, Gallego G, Shen S (2021) Event-based stereo visual odometry. IEEE Transactions on Robotics 37(5):1433–1450
- Zhu and Yuan [2018] Zhu AZ, Yuan L (2018) Ev-flownet: Self-supervised optical flow estimation for event-based cameras. In: Robotics: Science and Systems
- Zhu et al [2019] Zhu AZ, Yuan L, Chaney K, et al (2019) Unsupervised event-based learning of optical flow, depth, and egomotion. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 989–997
- Zou et al [2022] Zou Z, Alimohamadi H, Kim Y, et al (2022) Eventhd: Robust and efficient hyperdimensional learning with neuromorphic sensor. Frontiers in Neuroscience 16:858329