
EV-Planner: Energy-Efficient Robot Navigation via Event-Based Physics-Guided Neuromorphic Planner

Sourav Sanyal, Rohan Kumar Manna, and Kaushik Roy
Elmore Family School of Electrical and Computer Engineering, Purdue University
Manuscript received: July 20, 2023; Revised: November 9, 2023; Accepted: December 16, 2023. This paper was recommended for publication by Editor Tetsuya Ogata upon evaluation of the Associate Editor and Reviewers’ comments. This work was supported in part by the Center for Brain-inspired Computing (C-BRIC), a DARPA-sponsored JUMP center, the Semiconductor Research Corporation (SRC), the National Science Foundation, US Army Research Lab, and IARPA MicroE4AI. Sourav and Rohan contributed equally to implement this work. Code and video demo available at https://github.com/souravsanyal06/EV-Planner. Digital Object Identifier (DOI): see top of this page.
Abstract

Vision-based object tracking is an essential precursor to performing autonomous aerial navigation in order to avoid obstacles. Biologically inspired neuromorphic event cameras are emerging as a powerful alternative to frame-based cameras, owing to their ability to asynchronously detect intensity changes (even in poor lighting conditions), their high dynamic range, and their robustness to motion blur. Spiking neural networks (SNNs) have gained traction for processing events asynchronously in an energy-efficient manner. On the other hand, physics-based artificial intelligence (AI) has recently gained prominence, as it enables embedding system knowledge via physical modeling inside traditional analog neural networks (ANNs). In this letter, we present an event-based physics-guided neuromorphic planner (EV-Planner) to perform obstacle avoidance using neuromorphic event cameras and physics-based AI. We consider the task of autonomous drone navigation where the mission is to detect moving gates and fly through them while avoiding a collision. We use event cameras to perform object detection using a shallow spiking neural network in an unsupervised fashion. Utilizing the physical equations of the brushless DC motors present in the drone rotors, we train a lightweight energy-aware physics-guided neural network (PgNN) with depth inputs. This network predicts the optimal flight time responsible for generating near-minimum-energy paths. We spawn the drone in the Gazebo simulator and implement a sensor-fused vision-to-planning neuro-symbolic framework using the Robot Operating System (ROS). Simulation results for safe, collision-free flight trajectories are presented with a performance analysis, an ablation study, and potential future research directions.

Index Terms:
Event cameras, Neuromorphic vision, Physics-based AI, Spiking Neural Networks, Vision-based navigation

I Introduction

For performing vision-based robot navigation [1], object tracking is a fundamental task. As navigation environments become increasingly challenging [2] (due to demands for higher robot speeds or for flying under reduced lighting conditions), the need for newer vision sensors arises. Biologically inspired event cameras or Dynamic Vision Sensors (DVS) [3, 4, 5, 6], which trigger events in response to changes in the logarithm of pixel intensities past a certain threshold, have emerged as a promising candidate. Event cameras are relatively immune to problems such as motion blur, offer higher temporal resolution (10 µs vs 3 ms), wider dynamic illumination range (140 dB vs 60 dB), and lower power consumption (10 mW vs 3 W) compared to traditional frame-based cameras [7]. Consequently, there has been considerable interest in the autonomous systems community in using event cameras as vision sensors for navigation purposes. Inspired by the neuronal dynamics observed in biological brains, spiking neurons, specifically Leaky Integrate and Fire (LIF) neurons [8], have been designed to leverage temporal information processing – quite similar to the signals produced by event cameras. Furthermore, the asynchronous event-driven nature of spiking neuron firing makes it a natural candidate for handling asynchronous events [9]. A recent work [10] utilized a shallow spiking architecture for spatio-temporal event processing to perform low-latency object detection for autonomous navigation systems.

Figure 1: High-level overview of EV-Planner framework

On the other hand, physics-based machine learning/artificial intelligence [11, 12, 13, 14] is gaining momentum, as it enables encoding prior knowledge of physical systems while learning from data. In robotic systems, symbolic dynamics have traditionally been used to describe physical properties (of both systems and environments) [15]. Works on predictive control [16, 17, 18, 19] have incorporated such physics-encoded priors inside neural network pipelines, and have been shown to present advantages in terms of training efficiency (data and hence compute) as well as inference (latency and therefore energy). Moreover, adding physical knowledge as priors reduces laborious dataset preparation efforts [20] and makes neural networks more robust and interpretable – a quality immensely desired by the AI community in mission-critical scenarios.

Considering today’s actuators with a 15 V supply, the average instantaneous power for a simulated Parrot Bebop2 is ∼124 W (see Section III-B), while the maximum hovering time without any maneuver is ∼25 minutes for a 2700 mAh battery [21]. For agile navigation tasks, the power increases, further reducing the flight time. To that effect, we present EV-Planner – an event-based physics-guided neuromorphic planner to perform autonomous navigation energy-efficiently using neuromorphic event cameras and physics-based AI. By exploiting the energy-efficient nature of spike-based computation while capturing the temporal event information, and by harnessing the benefits of physics-based simulations in training neural networks, the main goal of this work is to perform energy-efficient, generalizable planning using a sensor fusion of event and depth cameras for small drones. We consider the task of autonomous quadrotor navigation where the mission is to detect moving gates and fly through them without collisions.

Figure 1 illustrates the logical overview of the EV-Planner framework. We consider two sensors – an event camera and a depth camera. The Spiking Neural Network (SNN) block detects objects using event camera inputs. The Physics-guided Neural Network (PgNN) block learns to predict near-optimal trajectory times to the target destination from the depth input. The main contributions of this work are:

  • We use event cameras to perform object detection in a 3D environment using a shallow low-latency spiking neural architecture in an unsupervised fashion (Section III-A).

  • Utilizing the physical equations of the brushless DC motors present in the drone rotors, we train a PgNN with depth camera inputs. The pre-trained network predicts near-optimal flight times responsible for generating near-minimum energy paths during inference (Section III-B).

  • We spawn a Parrot Bebop2 drone in the Gazebo simulator and implement a sensor-fused vision-to-planning neuro-symbolic framework (Section III-C) using the Robot Operating System (ROS) [22].

  • Using our simulation setup (Section IV), we perform autonomous collision-free trajectory planning. We present performance statistics along with an ablation study to quantify the role of each design component (Section V).

To the best of our knowledge, this is the first work that uses event-based spike-driven neuromorphic vision coupled with physics-based AI for vision-based autonomous navigation. This results in a hybrid neuro-symbolic network architecture for energy-efficient generalizable robot planning.

II Background and Related Work

We consider vision-based autonomous navigation systems which use event-based perception for obstacle avoidance and physics-guided robot learning for planning and control. Relevant research pursuits to achieve this are briefly discussed.

II-A Event-based Robot Perception

Event cameras such as Dynamic Vision Sensors (DVS) capture asynchronous changes in the logarithm of light intensities. Events are encoded as an array of event tuples, given as E = {e_1, e_2, e_3, ..., e_n} across n time-steps. Each discrete event e_i, ∀ i ∈ {1, 2, 3, ..., n}, is represented as e_i = (x_i, y_i, p_i, t_i), where (x_i, y_i) is the location of the camera pixel, p_i represents the polarity (the sign of the change in log-intensity), and t_i corresponds to the i-th time-step.
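As an illustration of this representation, the sketch below stores a handful of events in a NumPy structured array; the field names and values are illustrative and do not correspond to any particular camera driver.

import numpy as np

# Each event carries a pixel location (x, y), a polarity p in {-1, +1} and a timestamp t.
event_dtype = np.dtype([("x", np.uint16), ("y", np.uint16),
                        ("p", np.int8), ("t", np.float64)])
events = np.array([(120, 64, +1, 10e-6),
                   (121, 64, -1, 25e-6)], dtype=event_dtype)
print(events["t"])   # asynchronous, microsecond-scale timestamps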

Early efforts on event-based object detection applied simple kernels on event outputs to capture salient patterns [23, 24, 25]. Although these works paved the way for future endeavors, they could not handle events generated by the background. To overcome this, [26] proposed a motion compensation scheme using ego-motion filtering to eliminate background events. For object tracking, [27] utilized frames as well as event cameras. Such hybrid techniques are computationally expensive as they rely on frame inputs to detect objects. The works in [28, 29] estimate time-to-collision using events. Related to our work, event cameras have been mounted on drones for dynamic obstacle dodging [30]. However, optical flow and segmentation using learning-based techniques require bulky neural networks with many parameters. In this work, we focus on object detection for tracking using spiking neurons [10] – achievable with low-latency neural networks.

TABLE I: Perception Works for 3D Object Tracking
Sensor Modality Available Literature
Events [10, 24, 25, 26, 31, 32]
Depth [33, 34, 35, 36]
Events + Depth EV-Planner

Why Sensor Fusion of Events and Depth? In order to track an object in a 3D environment, the notion of depth is of utmost importance. As shown in Table I, existing works either 1) estimate depth using non-depth sensors [10, 24, 25, 26, 31, 32] or 2) use depth sensors [33, 34, 35, 36]. The absence of an event sensor makes sensing and perception energy-expensive. On the other hand, the absence of a depth sensor requires estimating depth, adding to the processing delay of the perception algorithm. Hence, in order to limit the overhead of perception delays (which determine whether object tracking can run in real time), it is imperative to have dedicated sensors for both events and depth. While adding sensors increases the robot payload, future designs will consider embedding multiple sensors into a single device. The reduced perception time will make the design more energy-efficient, rendering sensor fusion a promising design paradigm.

II-B Physics-Based Robot Learning for Planning/Control

Physics-based AI shows promise as these approaches embed system knowledge via physical modeling inside neural networks, making them robust and interpretable. Traditionally, in robotic systems, symbolic methods have been used for describing physical properties as they are fast and accurate [15, 37]. For controlling dynamical systems, the work in [13] added the provision of control variables to physics-informed neural networks. The work in [18] has showcased the advantages of using physical priors in control tasks. The research presented in [16] used [13] for controlling multi-link manipulators, while the work in [17] extended the previous framework for trajectory tracking of drones subjected to uncertain disturbances. The authors in [19] used neural ordinary differential equations (ODEs) for drones. While all these works use physics to solve differential equations, the same principle can be applied to add non-ODE (or non-PDEs) as soft constraints to neural network training, which is explored in this work. Incorporating physics into neural networks as a hard constraint still remains unexplored and is beyond the scope of this work.

III EV-Planner

We present EV-Planner – an event-based physics-guided neuromorphic planner to perform obstacle avoidance using neuromorphic event cameras and physics-based AI. For object tracking, we use event-based low-latency SNNs, and we train a lightweight energy-aware neural network with depth inputs. This makes our design loosely coupled, with no interaction between the SNN and PgNN blocks (see Figure 1). Information about the current object position (via the SNN block) and the estimated trajectory time (via the PgNN block) is consumed by a symbolic program (a ROS node, see the footnote below) for predictive planning.

Footnote: The basic unit of compute in ROS is called a node. A planner which performs decision-making via symbolic rules is implemented as a ROS node.
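To make the vision-to-planning glue concrete, the following is a minimal rospy sketch of such a symbolic planner node. The topic names, message types, and the way the setpoint is filled in are assumptions for illustration, not the exact interfaces of the released code.

#!/usr/bin/env python
# Minimal sketch of the symbolic planner as a ROS node (illustrative topics and messages).
import rospy
from std_msgs.msg import Float32MultiArray
from geometry_msgs.msg import PoseStamped

class SymbolicPlanner:
    def __init__(self):
        self.t_traj = None                                   # PgNN output (trajectory time)
        rospy.Subscriber("/snn/gate_center", Float32MultiArray, self.on_gate)
        rospy.Subscriber("/pgnn/t_traj", Float32MultiArray, self.on_time)
        self.pub = rospy.Publisher("/planner/setpoint", PoseStamped, queue_size=1)

    def on_time(self, msg):
        self.t_traj = msg.data[0]

    def on_gate(self, msg):
        if self.t_traj is None:                              # wait until both networks have published
            return
        target = PoseStamped()
        target.pose.position.y = msg.data[1]                 # placeholder: y* comes from Algorithm 1
        self.pub.publish(target)

if __name__ == "__main__":
    rospy.init_node("ev_planner")
    SymbolicPlanner()
    rospy.spin()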

III-A Event-based Object Tracking via Spiking Neural Network

Due to the lack of photometric features such as light intensity and texture in event streams, traditional computer vision algorithms fail to work on events. Hence, we use a modified version of [10], which utilizes the inherent temporal information in the events with the help of a shallow architecture of spiking LIF neurons. The SNN can separate events based on the speed of the object present in the robot’s field of view.

Figure 2: On the left, the spikes generated do not exceed the membrane potential threshold due to the absence of enough events. On the right, due to object motion, densely generated events produce more input spikes which exceed the membrane potential threshold to generate output spikes.

III-A1 Filtering out objects based on Speed

The network architecture used for filtering objects based on their motion consists of a single spike-based neural layer, whose parameters are fine-tuned to directly represent the speed of the moving object. The LIF neuron model [8] has a membrane potential U[t] and a leak factor β. At first, the initial membrane potential U[t_0] and the threshold value U_th are initialized. Next, according to Eqn. (1), at each time step t the weighted sum of the inputs WX[t] is added to the leaky membrane potential and stored in U[t]. Finally, if at any instant the stored U[t] exceeds U_th, the neuron produces an output spike and resets U[t].

U[t] = β U[t−1] + W X[t]    (1)

In our approach, hyper-parameters such as β and U_th are fine-tuned to select nearby input spikes at any given instant of time. Figure 2 demonstrates the difference in the output spikes generated by the neuron according to the frequency of the input events. As shown, the frequency of the generated events is directly proportional to the speed of the moving object. Exploiting this property of spiking neurons and event data, we design an SNN capable of filtering out a particular object of interest based on its speed.
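As a concrete illustration, here is a minimal Python sketch of the LIF update in Eqn. (1) for a single neuron, using the β and U_th values from Table IV; the scalar weight w stands in for the normalized neighborhood weights, and the input arrays are made up for illustration.

import numpy as np

def lif_filter(spike_inputs, beta=0.1, u_th=1.75, w=1.0):
    """Minimal LIF neuron per Eqn. (1): leak, integrate weighted input, fire, reset."""
    u = 0.0
    out_spikes = np.zeros_like(spike_inputs, dtype=np.uint8)
    for t, x in enumerate(spike_inputs):
        u = beta * u + w * x           # leaky integration of the weighted input
        if u > u_th:                   # membrane potential crosses the threshold
            out_spikes[t] = 1
            u = 0.0                    # reset after firing
    return out_spikes

print(lif_filter(np.array([1, 0, 1, 0, 1])))   # sparse events: no output spikes
print(lif_filter(np.array([1, 2, 3, 3, 2])))   # dense events: fires repeatedly once U[t] > U_th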

Figure 3: Spike-based neural architecture responsible for filtering events from objects based on speed. This SNN is used for subsequent object tracking.

III-A2 Finding Center of the Bounding box

To detect an object, the initial step involves connecting the output pixels from the event camera to the spiking neurons. Then, we apply a weighted sum W over a neighborhood of pixels and pass that as input to the spiking neuron. Neurons connected to a neighborhood containing the moving object will register more than one input pixel every consecutive time step. This enables the SNN to identify such pixels as fast-moving. In the next step, we locate the minimum (X_min, Y_min) and maximum (X_max, Y_max) pixel values in the two dimensions and subsequently draw the bounding box. Finally, we obtain the center coordinates of the moving object using Eqn. (2).

center_x = X_min + ⌊(X_max − X_min)/2⌋    (2a)
center_y = Y_min + ⌊(Y_max − Y_min)/2⌋    (2b)

Figure 3 depicts our network architecture. We selected a reasonable weighted neighborhood size W (see Section IV) as input to every spiking neuron. The weights of the neighborhood were normalized so that they sum to approximately one. Note that the SNN described above detects the moving object, but we still need to isolate the input events which represent the object of interest. Therefore, after detection, we recover the input events around the neighborhood of the spiking neuron output. Hence, the SNN works asynchronously to find the output spikes and recovers the inputs from the previous time step (one time-step delay). Generally, after separating objects based on their movement speed, the corresponding events are spatially grouped using some clustering technique for predicting the object class. However, in this work, we only consider a single moving object. Hence, our current SNN-based design does not require any spatial clustering technique, which reduces the latency and energy overheads of our object detection and tracking. Please note that the SNN is scene-independent and can be deployed in various scenarios with minimal fine-tuning. Figure 4 illustrates the output of the event-based object (gate) tracking using the SNN, with the gate situated at different depths (see Section V-A for more details).
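A minimal sketch of the bounding-box and center computation of Eqn. (2), assuming the (x, y) locations of the pixels that produced output spikes have already been collected into an integer array (the helper name is hypothetical):

import numpy as np

def bounding_box_center(event_xy):
    """event_xy: (N, 2) array of (x, y) pixels that produced output spikes."""
    x_min, y_min = event_xy.min(axis=0)
    x_max, y_max = event_xy.max(axis=0)
    center_x = x_min + (x_max - x_min) // 2    # Eqn. (2a)
    center_y = y_min + (y_max - y_min) // 2    # Eqn. (2b)
    return (x_min, y_min, x_max, y_max), (center_x, center_y)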

Figure 4: Isolating the moving object and finding the center of the bounding box in the pixel coordinate system at different depths.

III-B Physics-Based Robot Modelling

III-B1 Energy model

We model the actuation energy of the drone in flight by considering an electrical model of the LiPo-battery-powered brushless DC (BLDC) motors of the rotor propellers [38]. Figure 5 represents the equivalent circuit of each BLDC motor, considering the resistive and inductive windings. The energy for a particular flight path must overcome the motor and load inertias and frictions (the moments of inertia are J_m and J_L). The instantaneous motor current is:

i(t) = (1/K_T) [T_f + T_L ω(t) + D_f ω(t) + (J_m + J_L) dω(t)/dt]    (3)

K_T is the torque constant. T_f and T_L ω(t) are the motor and load friction torques, respectively. D_f represents the viscous damping coefficient. For the BLDC motors of drone propellers, we can safely neglect T_f and D_f.

J_L = (1/4) n_B m_B (r − ε)²    (4)

n_B is the number of propeller blades, m_B is the mass of each blade, and r is the propeller radius. ε is the clearance between the blade and the motor.

TABLE II: Energy Model Specification
Parameters | Values
Motor Resistance (R) | 0.3 Ω
Supply Voltage (V) | 15 V
Maximum Motor Speed | 7994 rpm
Motor Friction Torque (T_f) | 0.0187 Nm
Aerodynamic Drag Coefficient (k_τ) | 9.04969e−09
Viscous Damping Coefficient (D_f) | 2e−04 Nm
Motor Voltage Constant (K_E = K_T) | 0.532 V/rpm
Motor Moment of Inertia (J_m) | 4.9e−06 kg·m²
Number of Blades (n_B) | 3
Mass of Blade (m_B) | 0.001 kg
Radius of Blade (r) | 0.1 m
Blade Clearance (ε) | 0.023 m

The motor voltage is as follows:

e(t) = R i(t) + K_E ω(t) + L di(t)/dt    (5)

R and L denote the resistance and inductance of the phase windings, respectively. K_E represents the motor voltage constant. We also neglect the electronic speed controller (ESC) voltage drops and any non-idealities present in the LiPo battery. For small drones (which we assume in this work), the propellers are directly connected to the motor shafts. At steady-state operation, the motor voltage becomes:

e(t) = R i(t) + K_E ω(t)    (6)

For a flight from time 0 to t, the total actuation energy of the flight path can, for all practical purposes, be written as:

E(t) = ∫_0^t Σ_{j=1}^{4} e_j(t) i_j(t) dt    (7)

Using equations (3) and (6), the energy expression becomes:

E(t) = ∫_0^t Σ_{j=1}^{4} [c_0 + c_1 ω_j(t) + c_2 ω_j^2(t) + c_3 ω_j^3(t) + c_4 ω_j^4(t) + c_5 (dω_j(t)/dt)^2] dt    (8)

where,

c_0 = R T_f^2 / K_T^2,   c_1 = (T_f/K_T)(2R D_f/K_T + K_E)    (9a)
c_2 = (D_f/K_T)(R D_f/K_T + K_E) + 2R T_f k_τ / K_T^2    (9b)
c_3 = (k_τ/K_T)(2R D_f/K_T + K_E),   c_4 = R k_τ^2 / K_T^2,   c_5 = 2R J k_τ / K_T^2    (9c)
Figure 5: Energy model of the brushless DC motor of the quadrotor propellers.
Figure 6: Energy burned by the quadrotor motors for traversing different depths with varying times of flight. This only considers linear motion along one direction.
Figure 7: Gazebo world containing the dynamic gate moving with velocity v_r, which changes direction after reaching a distance of +/−L. t_traj is the trajectory time estimated by the PgNN.

Using the expression given in equation (8), and plugging in the constant values given in equations (9), we are thus able to calculate the energy for a particular trajectory up to time t𝑡t. Given a certain distance away from a target destination (depth), an energy-efficient planner should be able to approximately estimate the flight time that would minimize the energy. We utilize the knowledge obtained through the aforementioned equations by simulating several trajectories to reach targets situated at different depths.
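The sketch below shows how the energy of a candidate trajectory can be evaluated numerically from Eqns. (8) and (9) with the Table II parameters; the rotor-speed profile ω(t) and the assumption of four identical rotors are placeholders for the profiles produced by the simulated flights.

import numpy as np

# Table II parameters and the Eqn. (9) coefficients.
R, K_T, K_E = 0.3, 0.532, 0.532
T_f, D_f, k_tau = 0.0187, 2e-4, 9.04969e-9
J_m, n_B, m_B, r, eps = 4.9e-6, 3, 0.001, 0.1, 0.023
J_L = 0.25 * n_B * m_B * (r - eps) ** 2            # Eqn. (4): load (propeller) inertia
J = J_m + J_L

c0 = R * T_f**2 / K_T**2
c1 = (T_f / K_T) * (2 * R * D_f / K_T + K_E)
c2 = (D_f / K_T) * (R * D_f / K_T + K_E) + 2 * R * T_f * k_tau / K_T**2
c3 = (k_tau / K_T) * (2 * R * D_f / K_T + K_E)
c4 = R * k_tau**2 / K_T**2
c5 = 2 * R * J * k_tau / K_T**2

def flight_energy(t, omega):
    """Eqn. (8): integrate the per-motor power and multiply by 4, assuming the
    four rotors follow identical speed profiles (an illustrative simplification)."""
    omega_dot = np.gradient(omega, t)
    power = (c0 + c1*omega + c2*omega**2 + c3*omega**3
             + c4*omega**4 + c5*omega_dot**2)
    return 4 * np.trapz(power, t)

t = np.linspace(0.0, 2.0, 500)                      # a 2 s segment, illustrative
omega = np.full_like(t, 500.0)                      # constant rotor speed (rad/s), illustrative
print(flight_energy(t, omega))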

III-B2 Energy characterization as a function of velocity

We implemented the energy model using the values in Table II and present key insights from our physics-guided simulation. Figure 6 presents the actuation energy for traversing different depths. Using our physics-guided simulation, we report an average instantaneous power of ∼124 W for a 2700 mAh battery. For each depth, we sweep the linear velocity as the control variable from 1 m/s to 16 m/s (the top speed of the drone in this study) in steps of 1 m/s. We observe that the energy is highest when the robot speed is lowest. The time taken by the drone to reach the destination varies inversely with speed. As a consequence, if we use Eqn. (8) to estimate the energy, the effect of time dominates at low speeds, for which the energy adds up. With increasing speed, the energy burned initially drops dramatically due to the reduction in flight time. Beyond a certain speed, the energy increases again: at higher speeds, the motor currents drawn from the battery increase, resulting in higher instantaneous power and consequently higher energy. We observe that there is a sweet spot, a near-optimal velocity which yields the least energy for a particular trajectory. This establishes the need to interpolate a near-optimal velocity given the target depth.

III-B3 Physics-Guided Neural Network

We collect several trajectories as part of our training dataset by flying the drone across different depths and by varying the linear velocity as shown in Figure 7. A vanilla ANN can be trained with several such depth-velocity pairs. However, the ANN will have no knowledge about the underlying source data distribution.

E = c_0 + c_1 v + c_2 v^2 + c_3 v^3 + c_4 v^4 + c_5 v^5    (10a)
dE/dv = c_1 + 2c_2 v + 3c_3 v^2 + 4c_4 v^3 + 5c_5 v^4    (10b)

As shown in Equation (10a), we fit 5th-order polynomial curves to the energy-velocity profile obtained from the simulations, one for each depth. To approximately evaluate the optimal velocity for near-minimum energy, we equate the derivative of the fitted polynomial to zero, as shown in Eqn. (10b). As shown in Table III, for each training data point, the expression from Eqn. (10b) with the corresponding polynomial coefficients is added as an additional regularization term in the loss function. While a mid-range value of v would cover most cases, the addition of physical equations makes the model more interpretable. We use the standard mean-square error loss along with the additional physical constraints:

ℒ = (1/n) Σ_{i=1}^{n} ||v_i^* − v_i^pred||^2 + λ Σ_{i=1}^{n} Σ_{j=1}^{5} j c_j (v_i^*)^(j−1)    (11)

n is the number of training samples. v^* is the desired velocity obtained from the groundtruth, while v^pred is the velocity the network learns. λ is the regularization coefficient.

TABLE III: Dataset for Physics-guided Neural Network
Depth | Velocity | Constraint
d_1 | v_1 | c_11 + 2c_21 v_1 + 3c_31 v_1^2 + 4c_41 v_1^3 + 5c_51 v_1^4 = 0
d_2 | v_2 | c_12 + 2c_22 v_2 + 3c_32 v_2^2 + 4c_42 v_2^3 + 5c_52 v_2^4 = 0
... | ... | ...
d_n | v_n | c_1n + 2c_2n v_n + 3c_3n v_n^2 + 4c_4n v_n^3 + 5c_5n v_n^4 = 0
t_traj = d / v^pred    (12)

Note that the output of the PgNN is a velocity. Using Eqn. (12), we calculate an approximate trajectory time, which we use to estimate the future position of the moving gate. Also, the training set is small because we constrain the training via physical equations. This enhances sample efficiency, which accelerates training. While the training set consists of a similar range of depths as seen during inference, inference can encounter a depth value not available in the training data, making the proposed method superior to a regression-less lookup-based approach.
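A minimal sketch of how the groundtruth velocity v^* and the physics term of Eqn. (11) can be assembled for the training set is given below; the function names are illustrative, and in the actual training the same terms would be expressed with TensorFlow ops so that gradients flow through v^pred.

import numpy as np

def fit_energy_polynomial(velocities, energies):
    """Fit E(v) = c0 + c1 v + ... + c5 v^5 (Eqn. 10a) for one depth.
    np.polyfit returns coefficients from the highest power down."""
    return np.polyfit(velocities, energies, deg=5)

def near_optimal_velocity(p):
    """Solve dE/dv = 0 (Eqn. 10b) and keep the real, positive root of lowest energy."""
    roots = np.roots(np.polyder(p))
    real = roots[np.isreal(roots)].real
    real = real[real > 0]
    return real[np.argmin(np.polyval(p, real))]

def physics_guided_loss(v_star, v_pred, coeffs, lam=1e-4):
    """Eqn. (11): MSE on the predicted velocity plus the stationarity residual
    dE/dv evaluated at v* for every sample (coeffs holds one fitted polynomial per row)."""
    mse = np.mean((v_star - v_pred) ** 2)
    residual = np.array([np.polyval(np.polyder(p), v) for p, v in zip(coeffs, v_star)])
    return mse + lam * np.sum(residual)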

III-C Planning Robot Trajectories

III-C1 Motion Estimation of Moving Gate

As shown in Figure 7, we assume the moving gate has a linear velocity v_r along the Y axis. We use our SNN-based object tracking to estimate v_r. Consider the position of the gate along the Y axis to be y_1 at a given instant, as obtained from the object tracker. For a sensing interval of δt, let the position of the same point on the gate after δt be y_2. Then, the instantaneous velocity of the gate is estimated using the expression:

v_r = (y_2 − y_1) / δt    (13)
 1: Require: t_traj, L, y_1, y_2                 // input arguments
 2: v_r ← (y_2 − y_1)/δt                         // v_r estimate via SNN
 3: d_1 ← v_r × t_traj                           // gate distance traversed during t_traj
 4: Compute the distance d_2 = f(L, y_2) from the approaching end
 5: if y_2 > y_1 then                            // gate moving right
 6:     if y_2 > 0 then
 7:         d_2 = L − y_2
 8:     end if
 9:     if y_2 < 0 then
10:         d_2 = L + |y_2|
11:     end if
12:     Compare d_1 and d_2
13:     if d_1 > d_2                             // direction change, since d_1 > d_2
14:     then
15:         y* = L − d_1 + d_2
16:     end if
17:     if d_1 < d_2 then
18:         y* = y_2 + d_1
19:     end if
20: end if
21: if y_2 < y_1 then                            // gate moving left
22:     if y_2 < 0 then
23:         d_2 = |−L − y_2|
24:     end if
25:     if y_2 > 0 then
26:         d_2 = L + y_2
27:     end if
28:     Compare d_1 and d_2
29:     if d_1 > d_2                             // direction change, since d_1 > d_2
30:     then
31:         y* = −L + d_1 − d_2
32:     end if
33:     if d_1 < d_2 then
34:         y* = y_2 − d_1
35:     end if
36: end if
37: return y*
Algorithm 1: trajectory_learner.py

III-C2 Symbolic Planning

We assume the gate is constrained within distances +L and −L, beyond which it reverses its direction of motion. This is the only prior knowledge that we use during inference. Algorithm 1 presents the pseudo-code of our symbolic predictive planner. The planner accepts the variables t_traj (trajectory time) and L as input arguments, along with y_1 and y_2. Here, y_1 and y_2 are the outputs from the SNN, and t_traj is the output from the PgNN. The information from the two networks is combined as discussed next. The algorithm outputs y* – the desired destination of the robot after t_traj, which enables collision-free gate crossing. The distance d_1 traversed by the gate during the trajectory time t_traj is computed first (Line 3). The distance d_2 (the distance of the gate from the nearest end, −L or +L) is then computed by considering the different sub-cases. If the gate moves right (Line 5), we consider the cases where the gate is on the positive Y axis (Line 6) or the negative Y axis (Line 9), and the value of d_2 is evaluated accordingly (Line 7 or Line 10). If d_1 > d_2, the gate will change its direction of motion (from right to left), and y* is computed (Line 15). If d_1 < d_2, the gate keeps moving right, and y* is computed accordingly (Line 18). Similarly, if the gate is detected to be moving left (Line 21), y* is evaluated symbolically by stepping through the hard-coded rules (Lines 22-36). Once y* is obtained, we use the minimum-jerk trajectory [39], which results in an almost zero rate of change of acceleration and thereby keeps the current drawn from the battery as low as possible. Consequently, the planned trajectory burns near-minimum motor currents for the concerned navigation task. Note that during PgNN training, the predicted velocity is along the depth axis (path AB = depth d in Figure 8). During inference, however, there will be a component of the drone velocity along the Y axis (v_y in Figure 8) in the same direction as v_r, since the target destination is C, which depends on the speed of the moving ring (v_r, which cannot be known in advance). The predicted drone velocity v^pred is only a proxy to calculate t_traj (see Eqn. (12)). However, a lower value of t_traj reduces the difference between paths AB and AC, keeping the actuation energy near minimum. Also note that the flight path has a parabolic nature, but for the sake of explanation we simplify it as straight lines in Figure 8. While it is beyond the scope of this work to mathematically derive lower bounds, our method is reasonably energy-efficient (see Sections V-C, V-D and V-E).
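For reference, a minimal Python sketch of Algorithm 1 follows. It treats d_1 as a distance magnitude (the sign of the motion is carried by the comparison of y_2 with y_1) and, like the algorithm, assumes at most one direction reversal during t_traj; the function name is illustrative.

def predict_gate_position(t_traj, L, y1, y2, delta_t):
    """Predict the gate's Y position y* after t_traj seconds (Algorithm 1 sketch)."""
    v_r = (y2 - y1) / delta_t                            # gate velocity estimate (Eqn. 13)
    d1 = abs(v_r) * t_traj                               # distance covered by the gate during t_traj
    if y2 > y1:                                          # gate moving right (+Y)
        d2 = L - y2 if y2 > 0 else L + abs(y2)           # distance to the +L end
        y_star = L - (d1 - d2) if d1 > d2 else y2 + d1   # reversal vs. straight continuation
    else:                                                # gate moving left (-Y)
        d2 = abs(-L - y2) if y2 < 0 else L + y2          # distance to the -L end
        y_star = -L + (d1 - d2) if d1 > d2 else y2 - d1
    return y_star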

TABLE IV: Neural Network Specifications
SNN Parameters | Physics-guided ANN Parameters
Kernel (W) Size: 3×3 | Number of Layers: 3
W[0,0,1,1]: 0.15 | Neurons per Layer: [64, 128, 128]
Leak Factor (β): 0.1 | Regularization Coefficient (λ): 1e−04
Threshold (U_th): 1.75 | Epochs: 2000

IV Methodology

We implemented the SNN for event-based tracking using SNNTorch [40], whereas the physics-guided ANN was implemented using TensorFlow [41]. For the PgNN, we also used batch normalization after each layer. The moving gate was created using Blender [42]. To integrate the proposed planner with a controller, we used the RotorS simulator [22] and spawned the drone using the Gazebo physics engine [43]. Table IV lists the parameters used for the networks.
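A minimal Keras sketch of a PgNN with the Table IV architecture (three dense layers of 64, 128 and 128 units with batch normalization, scalar depth in, scalar velocity out) is shown below; the activations and optimizer are assumptions, and the physics term of Eqn. (11) would be supplied through a custom loss.

import tensorflow as tf

def build_pgnn():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(1,)),   # depth d
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dense(1),                                         # predicted velocity v_pred
    ])
    model.compile(optimizer="adam", loss="mse")   # placeholder loss; Eqn. (11) adds the physics term
    return model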

V Experimental Results

As a comparative scheme, we implemented an equivalent fusion-less physics-guided planner that uses the depth camera for gate tracking as well. We call this the Depth-Planner.

V-A Object Tracking

Figure 9 illustrates the mean and peak intersection-over-union (IOU) values for object tracking using the event-based SNN at different depths. The corresponding visual representations are shown in Figure 4. For a depth of 2 m, the IOU is lower than at depths of 3 m or 4 m (where we observe the best tracking). At depths less than 2 m, the moving gate (which has a diameter of 2 m) comes too near to fit within the 640×480 viewing window (see Figure 4). For depths greater than 6 m, the performance degrades. The success rate of the overall design reduces with increased perception overhead. Hence, we only report performance statistics up to a depth of 6 m in what follows.

Figure 8: Actuation Energy: Training vs Inference

V-B Success Rate

Figure 10 illustrates, in a time-series snapshot, the Parrot Bebop2 drone flying through the moving gate using EV-Planner. Table V enumerates the success rate of collision-free gate crossing for different starting points of both the drone and the gate. The results showcase the generalization capability of our design for points not present in the training data. The success rates are lower when the drone starts from y = ±2, as the gates are detected (entirely) for a shorter duration (especially at depths < 3 m). For certain cases, the sensor-fused EV-Planner performs better than the fusion-less Depth-Planner. Tracking an object using depth involves a series of intermediate image-processing steps that incur significant latency. By the time the target is evaluated, the gate has shifted slightly, which the Depth-Planner struggles to take into account (mostly at greater depths). As the depth increases, this effect is observed more for both methods (the IOU reduces in Figure 9). EV-Planner still performs better, as event-based tracking is faster than depth-based tracking.

Figure 9: Object Tracking using Event-based SNN
Figure 10: Parrot Bebop2 crossing a moving gate using the EV-Planner algorithm.
TABLE V: Success rates for different initial locations. Each data point considers 10 observations. Higher is better.
Depth | Drone start (x, y) | Ring start (−2, y) | Depth-Planner | EV-Planner
2 m | (0, 0) | ±2 | 0.8 | 1
2 m | (0, ±1) | ∓1 | 0.7 | 0.8
2 m | (0, ±2) | ±1 | 0.7 | 0.8
3 m | (1, 0) | ±2 | 1 | 1
3 m | (1, ±1) | ∓1 | 0.9 | 0.9
3 m | (1, ±2) | ±1 | 0.8 | 0.9
4 m | (2, 0) | ±2 | 1 | 1
4 m | (2, ±1) | ∓1 | 0.9 | 1
4 m | (2, ±2) | ±1 | 0.8 | 0.9
5 m | (3, 0) | 0 | 0.9 | 0.9
5 m | (3, ±1) | 0/±1 | 0.8 | 0.9
5 m | (3, ±2) | 0 | 0.7 | 0.8
6 m | (4, 0) | ±2 | 0.7 | 0.7
6 m | (4, ±1) | ±1/∓2 | 0.6 | 0.7
6 m | (4, ±2) | ∓2 | 0.5 | 0.6
Figure 11: Actuation energy (averaged over 25 flights, with each flight averaged over 10 runs) for different starting points of the drone while flying through the ring without colliding. Lower is better.
Figure 12: Ablation study for actuation energy to quantify the contribution of each design component. Lower is better.

V-C Actuation Energy: Event vs Depth

Figure 11 presents the average actuation energy of EV-Planner compared to Depth-Planner for different starting locations of the drone. Although we do not compare the computation energy for perception, the increased latency of depth-based tracking increases the actuation energy: the drone hovers longer during depth-based tracking, which in turn burns more rotor current. On average, Depth-Planner takes 2 seconds longer to provide its inputs to the symbolic planner. For an average instantaneous power of 124 W, this translates to an additional 248 J of energy as depth-based perception overhead. On the other hand, EV-Planner relies on events for faster object tracking using low-latency SNNs. Over 25 flights (each averaged over 10 runs), EV-Planner achieves 2.6× lower actuation energy compared to Depth-Planner.

V-D Ablation Study

Figure 12 presents an ablation study to quantify the contribution of each design component (i.e., the SNN-based perception and the PgNN-based flight-time estimator). Note that when the SNN is not used, depth-sensor-based tracking is employed (Depth-Planner). In the absence of the PgNN, a vanilla ANN without any underlying physical prior is used for time estimation. A systematic comparison among the four possible cases reveals that the SNN + PgNN based EV-Planner results in ∼2.85× lower actuation energy compared to the baseline, which uses depth-based perception and physics-less planning, highlighting the energy efficiency of our proposed design.

V-E Comparison with SOTA Event-based Perception

Table VI presents a summary comparing two state-of-the-art (SOTA) perception works which use events for object detection – a) Recurrent Vision Transformers (RVT) [31] and b) Asynchronous Event-based Graph Neural Networks (AEGNN) [32]. EV-Planner, which uses a lightweight variant of [10], is scene-independent, requires no training, and is orders of magnitude more parameter-efficient. RVT and AEGNN both detect objects elegantly, but require curated datasets with many instances and labels. AEGNN uses different configurations for different datasets. To use any supervised-learning (SL) technique in place of our shallow SNN, one has to prepare a dataset with several instances of the ring along with proper labels, resulting in training overhead. Also, since the model complexity is greater than that of the shallow SNN, we did not use SL-based methods as the perception block.

TABLE VI: Model Complexity Comparison for Event-based Perception
Works | Architecture Details | Mode of Learning
RVT [31] | 4 transformer blocks with ∼9-18 million parameters | Supervised
AEGNN [32] | 7 graph convolution blocks, 2 pooling layers; ∼230k parameters for the N-Cars dataset | Supervised
This work | One 3×3 Conv2d block with 9 spiking neurons and no trainable parameters | Unsupervised

VI Limitations, Discussions and Future Work

The results clearly highlight the benefits of sensor-fused (events and depth) perception for navigation via SNNs and PgNNs. For future evaluations in the real world, noise could be an issue, for which the perception pipeline would need modification. While event cameras are more robust to low light and motion blur [7], it is possible to encounter spurious events in the environment. One way to filter those spurious events would be to introduce clustering [10]. However, that will add a latency overhead which may be unavoidable in real-world experiments but is not encountered in a simulator. Also, note that although the PgNN-predicted velocities would result in actuation energies as low as ∼200 J, the planner burns more power during the perception window for gate tracking and due to the Y component of the velocity, as shown in Figure 8. Hence, the energies reported for the actual navigation are higher. There have been works that evaluate compute energy for object detection by considering special-purpose hardware [44, 45]. Compute or non-actuation energy refers to the perception energy, which is typically expended on low-power processors with current ratings of nano-amperes. The actuators are the rotor blades, which draw currents of several amperes (10^9× more than the compute current) from a 15 V battery, making the actuation energy orders of magnitude higher. Hence, we limit our energy calculations in this work to actuation energy only. Research on reducing compute energy will, in the future, consider designing embedded sensor-fused controllers with scaled in-robot compute hardware and programming support. This will reduce the payload mass, and hence the actuation energy. Future work will also consider deploying the proposed method on a real robot with the aforementioned considerations.

VII Summary

We presented EV-Planner – an event-based physics-guided neuromorphic planner to perform obstacle avoidance using neuromorphic event cameras and physics-based AI. For object tracking, we used event-based low-latency SNNs. Utilizing the physical equations of the drone rotors, an energy-aware PgNN was trained with depth inputs to predict flight times. The outputs of the two neural networks were consumed by a symbolic planner to publish the robot trajectories. The task of autonomous quadrotor navigation was considered, with the aim of detecting moving gates and flying through them while avoiding a collision. Extensive simulation results show that the sensor-fusion-based EV-Planner performs generalizable collision-free dynamic gate crossing with decent success rates and lower actuation energy compared to ablated variants.

References

  • [1] G. Desouza and A. Kak, “Vision for mobile robot navigation: a survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 2, pp. 237–267, 2002.
  • [2] S. Li, M. M. Ozo, C. De Wagter, and G. C. de Croon, “Autonomous drone race: A computationally efficient vision-based navigation and control strategy,” Robotics and Autonomous Systems, vol. 133, p. 103621, 2020.
  • [3] A. Amir, B. Taba, D. Berg et al., “A low power, fully event-based gesture recognition system,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 7388–7397.
  • [4] P. Lichtsteiner, C. Posch, and T. Delbruck, “A 128×128 120 dB 15 µs latency asynchronous temporal contrast vision sensor,” IEEE Journal of Solid-State Circuits, vol. 43, no. 2, pp. 566–576, 2008.
  • [5] C. Posch, D. Matolin, and R. Wohlgenannt, “A QVGA 143 dB dynamic range frame-free PWM image sensor with lossless pixel-level video compression and time-domain CDS,” IEEE Journal of Solid-State Circuits, vol. 46, no. 1, pp. 259–275, 2011.
  • [6] C. Brandli, R. Berner, M. Yang, S.-C. Liu, and T. Delbruck, “A 240×180 130 dB 3 µs latency global shutter spatiotemporal vision sensor,” IEEE Journal of Solid-State Circuits, vol. 49, no. 10, pp. 2333–2341, 2014.
  • [7] G. Gallego, T. Delbrück, G. Orchard, C. Bartolozzi, B. Taba, A. Censi, S. Leutenegger, A. J. Davison, J. Conradt, K. Daniilidis, and D. Scaramuzza, “Event-based vision: A survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 1, pp. 154–180, 2022.
  • [8] A. Delorme, J. Gautrais, R. van Rullen, and S. Thorpe, “Spikenet: A simulator for modeling large networks of integrate and fire neurons,” Neurocomputing, vol. 26-27, pp. 989–996, 1999.
  • [9] C. Lee, A. K. Kosta, A. Z. Zhu, K. Chaney, K. Daniilidis, and K. Roy, “Spike-flownet: event-based optical flow estimation with energy-efficient hybrid neural networks,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, 2020, pp. 366–382.
  • [10] M. Nagaraj, C. M. Liyanagedera, and K. Roy, “Dotie - detecting objects through temporal isolation of events using a spiking architecture,” in 2023 International Conference on Robotics and Automation (ICRA).   IEEE, 2023, pp. 4858–4864.
  • [11] M. Raissi, P. Perdikaris, and G. E. Karniadakis, “Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations,” Journal of Computational physics, vol. 378, pp. 686–707, 2019.
  • [12] G. E. Karniadakis, I. G. Kevrekidis, L. Lu, P. Perdikaris, S. Wang, and L. Yang, “Physics-informed machine learning,” Nature Reviews Physics, vol. 3, no. 6, pp. 422–440, 2021.
  • [13] E. A. Antonelo, E. Camponogara, L. O. Seman, E. R. de Souza, J. P. Jordanou, and J. F. Hubner, “Physics-informed neural nets for control of dynamical systems,” SSRN Electronic Journal, 2021.
  • [14] S. Sanyal and K. Roy, “Neuro-ising: Accelerating large-scale traveling salesman problems via graph neural network guided localized ising solvers,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 41, no. 12, pp. 5408–5420, 2022.
  • [15] C. Belta, A. Bicchi, M. Egerstedt, E. Frazzoli, E. Klavins, and G. J. Pappas, “Symbolic planning and control of robot motion [grand challenges of robotics],” IEEE Robotics & Automation Magazine, vol. 14, no. 1, pp. 61–70, 2007.
  • [16] J. Nicodemus, J. Kneifl, J. Fehr, and B. Unger, “Physics-informed neural networks-based model predictive control for multi-link manipulators,” IFAC-PapersOnLine, vol. 55, no. 20, pp. 331–336, 2022, 10th Vienna International Conference on Mathematical Modelling MATHMOD 2022.
  • [17] S. Sanyal and K. Roy, “Ramp-net: A robust adaptive mpc for quadrotors via physics-informed neural network,” in 2023 International Conference on Robotics and Automation (ICRA).   IEEE, 2023, pp. 1019–1025.
  • [18] A. Salehi and S. Doncieux, “Data-efficient, explainable and safe payload manipulation: An illustration of the advantages of physical priors in model-predictive control,” arXiv preprint arXiv:2303.01563, 2023.
  • [19] K. Y. Chee, T. Z. Jiahao, and M. A. Hsieh, “Knode-mpc: A knowledge-based data-driven predictive control framework for aerial robots,” IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 2819–2826, 2022.
  • [20] S. N. Wadekar, B. J. Schwartz, S. S. Kannan, M. Mar, R. K. Manna, V. Chellapandi, D. J. Gonzalez, and A. E. Gamal, “Towards end-to-end deep learning for autonomous racing: On data collection and a unified architecture for steering and throttle prediction,” arXiv preprint arXiv:2105.01799, 2021.
  • [21] [Online]. Available: https://www.parrot.com/assets/s3fs-public/media-public/EN_Pressrelease2015/parrotbebop2theall-in-onedrone.pdf
  • [22] F. Furrer, M. Burri, M. Achtelik, and R. Siegwart, Robot Operating System (ROS): The Complete Reference (Volume 1).   Cham: Springer International Publishing, 2016, ch. RotorS—A Modular Gazebo MAV Simulator Framework, pp. 595–625.
  • [23] T. Delbruck and P. Lichtsteiner, “Fast sensory motor control based on event-based hybrid neuromorphic-procedural system,” in 2007 IEEE International Symposium on Circuits and Systems, 2007, pp. 845–848.
  • [24] X. Lagorce, C. Meyer, S.-H. Ieng, D. Filliat, and R. Benosman, “Asynchronous event-based multikernel algorithm for high-speed visual features tracking,” IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 8, pp. 1710–1720, 2015.
  • [25] X. Clady, S.-H. Ieng, and R. Benosman, “Asynchronous event-based corner detection and matching,” Neural Networks, vol. 66, pp. 91–106, 2015.
  • [26] A. Mitrokhin, C. Fermüller, C. Parameshwara, and Y. Aloimonos, “Event-based moving object detection and tracking,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018, pp. 1–9.
  • [27] D. Gehrig, H. Rebecq, G. Gallego, and D. Scaramuzza, “Asynchronous, photometric feature tracking using events and frames,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 750–765.
  • [28] C. Walters and S. Hadfield, “Evreflex: Dense time-to-impact prediction for event-based obstacle avoidance,” in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).   IEEE, 2021, pp. 1304–1309.
  • [29] Z. Wang, F. C. Ojeda, A. Bisulco, D. Lee, C. J. Taylor, K. Daniilidis, M. A. Hsieh, D. D. Lee, and V. Isler, “Ev-catcher: High-speed object catching using low-latency event-based neural networks,” IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 8737–8744, 2022.
  • [30] D. Falanga, K. Kleber, and D. Scaramuzza, “Dynamic obstacle avoidance for quadrotors with event cameras,” Science Robotics, vol. 5, no. 40, p. 9712, 2020.
  • [31] M. Gehrig and D. Scaramuzza, “Recurrent vision transformers for object detection with event cameras,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 13 884–13 893.
  • [32] S. Schaefer, D. Gehrig, and D. Scaramuzza, “Aegnn: Asynchronous event-based graph neural networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 12 371–12 381.
  • [33] L. Zhang, J. Sturm, D. Cremers, and D. Lee, “Real-time human motion tracking using multiple depth cameras,” in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2012, pp. 2389–2395.
  • [34] M. Wüthrich, P. Pastor, M. Kalakrishnan, J. Bohg, and S. Schaal, “Probabilistic object tracking using a range camera,” in 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2013, pp. 3195–3202.
  • [35] C. Reading, A. Harakeh, J. Chae, and S. L. Waslander, “Categorical depth distribution network for monocular 3d object detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 8555–8564.
  • [36] Z. Ren, J. Meng, and J. Yuan, “Depth camera based hand gesture recognition and its applications in human-computer-interaction,” in 2011 8th International Conference on Information, Communications & Signal Processing, 2011, pp. 1–5.
  • [37] R. K. Manna, D. J. Gonzalez, V. Chellapandi, M. Mar, S. S. Kannan, S. Wadekar, E. J. Dietz, C. M. Korpela, and A. El Gamal, “Control challenges for high-speed autonomous racing: Analysis and simulated experiments,” SAE International Journal of Connected and Automated Vehicles, vol. 5, no. 1, pp. 101–114, jan 2022.
  • [38] F. Morbidi, R. Cano, and D. Lara, “Minimum-energy path generation for a quadrotor uav,” in 2016 IEEE International Conference on Robotics and Automation (ICRA), 2016, pp. 1492–1498.
  • [39] D. Mellinger and V. Kumar, “Minimum snap trajectory generation and control for quadrotors,” in 2011 IEEE international conference on robotics and automation.   IEEE, 2011, pp. 2520–2525.
  • [40] J. K. Eshraghian, M. Ward, E. Neftci, X. Wang, G. Lenz, G. Dwivedi, M. Bennamoun, D. S. Jeong, and W. D. Lu, “Training spiking neural networks using lessons from deep learning,” arXiv preprint arXiv:2109.12894, 2021.
  • [41] M. Abadi, A. Agarwal et al., “TensorFlow: Large-scale machine learning on heterogeneous systems,” 2015, software available from tensorflow.org. [Online]. Available: https://www.tensorflow.org/
  • [42] B. O. Community, Blender - a 3D modelling and rendering package, Blender Foundation, Stichting Blender Foundation, Amsterdam, 2018. [Online]. Available: http://www.blender.org
  • [43] N. Koenig and A. Howard, “Design and use paradigms for gazebo, an open-source multi-robot simulator,” in 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), vol. 3, 2004, pp. 2149–2154 vol.3.
  • [44] S. Sanyal, A. Ankit, C. M. Vineyard, and K. Roy, “Energy-efficient target recognition using reram crossbars for enabling on-device intelligence,” in 2020 IEEE Workshop on Signal Processing Systems (SiPS), 2020, pp. 1–6.
  • [45] A. Roy, M. Nagaraj, C. M. Liyanagedera, and K. Roy, “Live demonstration: Real-time event-based speed detection using spiking neural networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2023, pp. 4080–4081.