This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

AsynEIO: Asynchronous Monocular Event-Inertial Odometry Using Gaussian Process Regression

Zhixiang Wang, Xudong Li, Yizhai Zhang, Fan Zhang, and Panfeng Huang This work was supported by the National Natural Science Foundation of China under Grant 62022067. (Corresponding author: Yizhai Zhang.)Zhixiang Wang is with the Research Center for Intelligent Robotics, Shaanxi Province Innovation Team of Intelligent Robotic Technology, School of Automation, Northwestern Polytechnical University, Xi’an 710072, China (e-mail: [email protected]).Xudong Li, Yizhai Zhang, Fan Zhang, and Panfeng Huang are with the Research Center for Intelligent Robotics, Shaanxi Province Innovation Team of Intelligent Robotic Technology, School of Astronautics, Northwestern Polytechnical University, Xi’an 710072, China (e-mail: [email protected]; [email protected]; [email protected]; [email protected]).
Abstract

Event cameras, when combined with inertial sensors, show significant potential for motion estimation in challenging scenarios, such as high-speed maneuvers and low-light environments. There are many methods for producing such estimations, but most boil down to a synchronous discrete-time fusion problem. However, the asynchronous nature of event cameras and their unique fusion mechanism with inertial sensors remain underexplored. In this paper, we introduce a monocular event-inertial odometry method called AsynEIO, designed to fuse asynchronous event and inertial data within a unified Gaussian Process (GP) regression framework. Our approach incorporates an event-driven frontend that tracks feature trajectories directly from raw event streams at a high temporal resolution. These tracked feature trajectories, along with various inertial factors, are integrated into the same GP regression framework to enable asynchronous fusion. With deriving analytical residual Jacobians and noise models, our method constructs a factor graph that is iteratively optimized and pruned using a sliding-window optimizer. Comparative assessments highlight the performance of different inertial fusion strategies, suggesting optimal choices for varying conditions. Experimental results on both public datasets and our own event-inertial sequences indicate that AsynEIO outperforms existing methods, especially in high-speed and low-illumination scenarios.

Index Terms:
event-inertial fusion, Gaussian process regression, motion estimation.
This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

I Introduction

The visual inertial odometry (VIO) technology is critical for autonomous robots to navigate in unknown environments. The VIO systems estimate motion states of robots by means of fusing Inertial Measurement Units (IMUs) and visual sensors. Generally, the Inertial Measurement Units (IMUs) directly measure angular velocities and linear accelerations while the visual sensors observe the bearing information with respect to environments. The complementarity of inertial and visual sensors makes the estimation problem more stable in both static and dynamic scenarios. Nevertheless, the conventional VIO systems are still prone to crash in several challenging scenarios, including high-speed movements, High-Dynamic-Range (HDR) environments, and sensors with incompatible frequencies. Specifically, high-speed movements and HDR environments will cause low-quality grayscale images for intensity cameras and long-period preintegration for IMUs. The long-period preintegration rapidly accumulates integration errors, meanwhile, low-quality images intend to deteriorate the visual frondend of VIO systems. Both of them result in a decline in estimation performance. In addition, discrete-time estimation and preintegration methods inherently have little support to fuse gyroscopes and accelerometers with incompatible frequencies or even asynchronous measurements. Therefore, it is imperative to develop a more robust VIO system to enhance estimation performance in challenging situations.

Refer to caption
Figure 1: Illustration of two asynchronous fusion schemes (GPIF and GPP). The GPIF conducts a mass of factors for inertial measurements and deforms the continuous-time trajectory in conjunction with GP-based motion prior. The GPP first reduces a series of latent states and then integrates them efficiently with the linear operator at arbitrary timestamps.

Event cameras have gained popularity in computer vision and robotics for applications such as object detection [1], image enhancement [2], and motion estimation [3, 4]. As a bio-inspired visual sensor, the event camera has an underlying asynchronous trigger mechanism where the illumination intensity variety of the scenario is independently recorded in a per-pixel manner. Benefiting from this special mechanism, event cameras gain prominent performance improvements to conventional cameras, such as low power consumption, low latency, HDR, high-temporal resolution, etc [5]. Intuitively, it means that event cameras has potential in motion estimation of high-speed and HDR scenarios. However, the discrete-time estimation framework adopted in event-based VIO systems greatly hinders the potential of event cameras as illustrated in [6].

Continuous-time estimation methods have been proposed to deal with foregoing drawbacks of event-based VIO systems. For these methods, a continuous-time trajectory is firstly defined with splines or kinematic assumptions, and then employed to build the measurement residuals. As the continuous-time trajectory have valid definitions at arbitrary timestamps, it naturally has the capacity of inferring the motion trajectory from totally asynchronous measurements and sensors. Among them, the Gaussian Process (GP) based methods have gained great interests of researchers for its clearer physical meaning. Although previous researchers have proposed to leverage GP-based methods to infer the SE(3) trajectory for event-based VIOs, existing methods still struggle to address the aforementioned challenges. With the introduction of GP-based methods, various inertial fusion schemes combining event cameras and IMUs have emerged. Thus, the interesting question is: which is the best fusion scheme for event-based VIO?

In this paper, we suggest an Asynchronous Event-Inertial Odometry (AsynEIO) system to improve the performance in challenging cases and comprehensively assess the reasonable inertial schemes. This study significantly builds on our previous conference publication [7] by developing a purely event-based frontend and introducing new inertial fusion schemes. The proposed AsynEIO utilizes a fully asynchronous fusion pipeline based on GP regression to improve the estimation performance in HDR and high-speed environments. Two main types of GP-based inertial fusion methods are integrated into the system to permit to fuse incompatible frequencies or asynchronous inertial measurements (i.e., GP-inertial Factor (GPIF) and GP-preintegration (GPP)), as depicted in Fig. 1. The GPIF is specifically designed for GP-based continuous-time methods to avoid the need for integrating raw measurements. We achieve this objective by leveraging a White Noise on Jerk (WNOJ) motion prior and investigating the underlying relationship between the differential SE(3) state variables and raw inertial measurements. Instead, the GPP infers data-driven GP models from real-time inertial measurements and generate a series of latent states to replace the motion prior in GPIF. Inspired by GPIF, we further extend the conventional discrete preintegration (Preint) to develop a new GP-based inertial factor, referred to as ExtPreint, where new residuals are directly incorporated into the GP trajectory. We fairly compare the performance of these inertial fusion methods under the unified event-inertial pipeline. Eventually, we conduct experiments on both public datasets and own-collected event-inertial sequences to demonstrate the advantages of the proposal with respect to the state-of-the-art. In summary, the main contributions of this article are as follows:

  1. 1.

    A fully asynchronous event-inertial odometry named as AsynEIO that fuses pure event streams with inertial measurements using a unified GP regression framework.

  2. 2.

    Comparative assessments of different inertial fusion schemes where four types of inertial schemes (i.e., GPIF, GPP, Preint, and ExtPreint) are compared to demonstrate the advantages of different fusion schemes.

  3. 3.

    Comprehensive experiments conducted on both public datasets and own-collected sequences to show the improvements of our proposal with respect to the state-of-the-art.

The remainder of this paper is organized as follows. Sec. II reviews the relevant literature. The GP regression on SE(3)SE(3) with WNOJ are detailed in Sec. III. The GPP and GPIF are introduced in Sec. IV and Sec, V, respectively. The proposed AsynEIO with different inertial factors is described in Sec. VI. Experimental assessments are carried out in Sec. LABEL:sec:experiments. Finally, conclusions are drawn in Sec. VII.

II Related Work

Event-based visual-inertial odometry has gained significant research interest for challenging scenarios such as aggressive motion estimation as well as HDR scene. The most critical issue in this field is how to fuse data from the synchronous IMU data and the asynchronous event streams [5]. Zhu et al. [8] constructed a batch event packets aligning it by two Expectation-Maximization[9] steps and fused with pre-integration IMU in an EKF backend. Mahlknecht et al. [10] synchronized measurements from the EKLT tracker [11] using extrapolation when a fixed number of events is triggered. Then, these associated visual measurements was fused with IMU under a filter-based backend. However, the frontend computational complexity of these methods is very high. Rebecq et al. [12] detected corners on motion compensated event image and tracked through Lucas Kanade (LK) optical flow, and then fused the IMU-preintegrated measurements in a standard keyframe-based framework. Furthermore, this work was extended to the Ultimate-SLAM [4] by combing the standard frames. Lee et al. [13] proposed a 8-DOF warping model to fuse events and frames for accurate feature tracks. The residual from frontend optimization process was used to update an adaptive noise covariance. There is another trend of using the Time Surface (TS) [14] ,which stores the timestamp of the recent event at each pixel, as event representation to apply frame-based detection and tracking technique. [15] performed Arc* [16] for corner detection and LK tracking on TS with polarity. A tightly coupled graph-based optimization framework was designed to obtain estimation from the IMU and visual observations. PL-EVIO [17] combined the event-based line and point features and image-based point features to provide more information in human-made scene based on [15]. ESVO [18] proposed a first event-based stereo visual odometry by maximizing the spatio-temporal consistency using TS. Extended from [18], Junkai et al. [19] proposed a gyroscope measurements prior via pre-integration to circumvent the degeneracy issue. Kai et al. [20] utilized an adaptive decay-based TS to extract features and proposed a polarity-aware strategy to enhance the robustness, Then, these tracked features and IMU were fused in the MSCKF state estimator Overall, all these methods transform asynchronous event streams to synchronous data association and convert high rate IMU data into inter frame motion constrain through IMU-preintegration.

Another category of works directly processed event streams asynchronously [21, 16, 22, 23] and applied continuous-time state estimation methods to fuse asynchronous data. Unlike estimating discrete states, continuous-time estimation methods assume a continuous trajectory and approximates it through a set of basis functions. Interpolating on continuous trajectories, it can relate measurements observed at arbitrary times like multi cameras system [24], Lidar odometry [25, 26, 27] and event camera [28, 29]. Mueggler et al. [30] firstly used cubic B-splines trajectory representation to directly integrate the information tracked by a given line-based maps. Furthermore, to estimate the absolute scale, they fused pre-integration IMU between adjacent control points and extended [30] to point-based maps in [31]. Lu et al. [32] proposed event-based visual-inertial velometer which leveraged a B-spline continuous-time formulation to fuse the heterogeneous measurements from a stereo event camera and IMU. Besides the B-spline method, GP regression models the system’s motion state using GP and construct a GP prior between adjacent discrete states, like White-Noise-on-Acceleration (WNOA) prior [33] and WNOJ prior [34]. Wang et al. [28] applied a motion-compensated RANSAC and WNOA motion prior to a stereo event-based VO pipeline. Comparing the above two continuous trajectory representations, Johnson et al. [35] demonstrated the accuracy and computation time of WNOJ prior GP-based method and spline-based method are similar. Note that there is no work to implement the WNOJ prior to event-based odometry.

Recently, researchers have focused on designing new inertial fusion formulas specially for continuous-time estimation frameworks. Some of research works have proposed continuous-time formulations of IMU pre-integration [36, 37, 38]. These methods used inertial measurements to characterize the system’s continuous motion rather than direct prior like B-spline or GP-prior. [36] formulated a GP-based preintegration method for one-axis rotations and upsample the gyroscope data to compute the rotation preintegration when rotating on multiple axes. This strategy was employed in line features [39] and corner features [40] event-based VIO. IDOL [39] extracted event clusters of line segments and tracked lines between consecutive event windows. they leveraged [36] to associate asynchronous event measurements. However, this method shown a high computational cost for its full-batch optimization. Dai et al. [40] proposed a asynchronous tracking method relied on decaying event-based corners and adopted a similar fusing scheme as[39]. More recently, Gentil et al. [38] proposed a latent variable model to analytically integrate inertial measurements and implicitly induce a data-driven GP. This method formulated a unified GP-based integration to address the issue of continuous integration over SE(3) in [36]. Although their methods is specially designed for GP frameworks, they still integrate the inertial measurements rather than directly inferring on the raw measurements. Burnett et al. [41] introduced a velocity measurement model with integration in linear velocity under the WNOA motion prior [42]. Then, they further improve their methods with the Singer motion prior [27] where angular velocity and body-centric acceleration [43]. However, both of their methods accomplished poor preciseness in the accelerometer measurement model. Zheng et al. [25] developed a multi-LiDAR and IMU state estimator based on the GP-based trajectory. They separately defined the kinematic state on SO(3) and linear vector spaces rather than SE(3) and achieved a direct residual model on the raw measurements of IMUs. Chamorro et al. [44] investigated the event-IMU fusion strategies under Kalman filter which compared the conventional predict-with-IMU-correction-with-vision method and predict-with-model-correction-with-measurement. However, these solutions may face challenges in scenarios where straight lines are absent. Our previous work [7] investigated two different GP-based methods to fuse asynchronous visual measurements and IMU preintegration. Building on this foundation, we further extend our previous work to incorporate more inertial fusion methods and conduct comprehensive experiments to evaluate the performance of different schemes in fusing asynchronous measurements. In addition, our previous work relies on intensity images as auxiliary measurements for tracking event streams, limiting its performance in high-speed and HDR scenarios. In AsynEIO, we directly track pure event streams, achieving excellent adaptability in these challenging conditions.

III Sparse GP regression on SE(3)SE(3)

III-A Preliminaries and Kinematics

Define the transformation of the rigid body frame bb with respect to (w.r.t) the fixed inertial frame ii as a matrix 𝑻ib\bm{T}_{ib}, where the pose 𝑻ib\bm{T}_{ib} forms the special Euclidean group, denoted SE(3)SE(3). Suppose the WNOJ prior to the kinematics of 𝑻ib\bm{T}_{ib} and apply the right multiply update of SE(3)SE(3). We have the differential equation as follows:

𝑻˙ib(t)\displaystyle\dot{\bm{T}}_{ib}(t) =𝑻ib(t)(ϖbbi(t))\displaystyle=\bm{T}_{ib}(t)(\bm{\varpi}_{b}^{bi}(t))^{\land}
ϖ˙bbi(t)\displaystyle\dot{\bm{\varpi}}_{b}^{bi}(t) =𝒔(t)\displaystyle=\bm{s}(t) (1)
𝒔˙(t)\displaystyle\quad\dot{\bm{s}}(t) 𝑮𝑷(𝟎,𝑸cδ(tt)),\displaystyle\sim\bm{GP}(\bm{0},\bm{Q}_{c}\delta(t-t^{\prime})),

where

ϖbbi=[𝝎bbi𝝂bbi]\bm{\varpi}_{b}^{bi}=\left[\begin{array}[]{cc}\bm{\omega}_{b}^{bi}\\ \bm{\nu}_{b}^{bi}\end{array}\right] (2)

is the generalized velocity, 𝒔\bm{s} is the generalized acceleration, and 𝑸c\bm{Q}_{c} is the power spectral density matrix of the GP. Given the detailed rotation and translation components as

𝑻ib=[𝑪ib𝒓ibi𝟎1],ϖbbi=[(𝝎bbi)𝝂bbi𝟎0],\bm{T}_{ib}=\begin{bmatrix}\bm{C}_{ib}&\bm{r}_{i}^{bi}\\ \bm{0}^{\top}&1\end{bmatrix},\quad{\bm{\varpi}_{b}^{bi}}^{\land}=\begin{bmatrix}(\bm{\omega}^{bi}_{b})^{\land}&\bm{\nu}_{b}^{bi}\\ \bm{0}^{\top}&0\end{bmatrix}, (3)

where (𝝎bbi)(\bm{\omega}^{bi}_{b})^{\land} maps the angular velocity vector to a skew-symmetric matrix. Then, we have 𝒓˙ibi=𝑪ib𝝂bbi\dot{\bm{r}}_{i}^{bi}=\bm{C}_{ib}\bm{\nu}_{b}^{bi}. Comparing the subscripts, 𝒓˙ibi\dot{\bm{r}}_{i}^{bi} can be defined as 𝝂ibi\bm{\nu}_{i}^{bi}, which represents the linear velocity of body frame w.r.t the inertial frame observed and expressed in the inertial frame. In fact, the generalized linear velocity 𝝂bbi\bm{\nu}_{b}^{bi} indicates the same linear velocity observed in the inertial frame but expressed in the body frame. Notice that 𝝂bbi𝒓˙bbi\bm{\nu}_{b}^{bi}\neq\dot{\bm{r}}_{b}^{bi}, where 𝒓˙bbi\dot{\bm{r}}_{b}^{bi} represents the linear velocity of body frame w.r.t the inertial frame observed and expressed in the body frame. Thereby, the generalized acceleration is given by

ϖ˙bbi=[𝝎˙bbi𝝂˙bbi]=[𝝎˙bbi𝑪˙bi𝒓˙ibi+𝑪bi𝒓¨ibi],\dot{\bm{\varpi}}_{b}^{bi}=\left[\begin{array}[]{cc}\dot{\bm{\omega}}_{b}^{bi}\\ \dot{\bm{\nu}}_{b}^{bi}\end{array}\right]=\left[\begin{array}[]{cc}\dot{\bm{\omega}}_{b}^{bi}\\ \dot{\bm{C}}_{bi}\dot{\bm{r}}_{i}^{bi}+\bm{C}_{bi}\ddot{\bm{r}}_{i}^{bi}\end{array}\right], (4)

where 𝑪˙bi=𝑪bi(𝝎iib)\dot{\bm{C}}_{bi}=\bm{C}_{bi}(\bm{\omega}_{i}^{ib})^{\land}. We know that (𝑪𝝎)𝑪(𝝎)𝑪(\bm{C}\bm{\omega})^{\land}\equiv\bm{C}(\bm{\omega})^{\land}\bm{C}^{\top} and 𝝎bbi=𝑪bi𝝎iib\bm{\omega}_{b}^{bi}=-\bm{C}_{bi}\bm{\omega}_{i}^{ib}. Substituting them into (4), we can show

𝝂˙bbi=(𝝎bbi)𝝂bbi+𝑪bi𝒓¨ibi,\dot{\bm{\nu}}_{b}^{bi}=-(\bm{\omega}^{bi}_{b})^{\land}\bm{\nu}_{b}^{bi}+\bm{C}_{bi}\ddot{\bm{r}}_{i}^{bi}, (5)

where 𝒓¨ibi\ddot{\bm{r}}_{i}^{bi} represents the linear acceleration of body frame w.r.t the inertial frame observed and expressed in the inertial frame, and 𝝎bbi\bm{\omega}_{b}^{bi} is the angular velocity of body frame w.r.t the inertial frame observed in the body frame. The kinematic state can be defined as

𝒙(t)={𝑻(t),ϖ(t),ϖ˙(t)},\bm{x}(t)=\{\bm{T}(t),\bm{\varpi}(t),\dot{\bm{\varpi}}(t)\}, (6)

where the sub- and superscripts have been dropped and (III-A)-(5) describe the stochastic differential equations (SDEs) of kinematics. Since 𝒙(t)SE(3)×12\bm{x}(t)\in SE(3)\times\mathbb{R}^{12} forms a manifold, these nonlinear SDEs are difficult to directly deploy in continuous-time estimation.

III-B GP Regression on Local Linear Space

To linearize it, the local variable 𝝃\bm{\xi} in the tangent space of 𝑻k\bm{T}_{k} can be given by

𝑻(t)=𝑻kexp((𝝃k(t))),\bm{T}(t)=\bm{T}_{k}\exp((\bm{\xi}_{k}(t))^{\land}), (7)

where 𝑻k𝑻(tk)\bm{T}_{k}\doteq\bm{T}(t_{k}), and exp()\exp(\bullet) is the exponential map from 𝔰𝔢(3)\mathfrak{se}(3) to SE(3)SE(3). The time derivative of (7) indicates the relationship between local variable and kinematic state as

ϖ(t)=𝓙(𝝃k(t))𝝃˙k(t),\bm{\varpi}(t)=\bm{\mathcal{J}}(\bm{\xi}_{k}(t))\dot{\bm{\xi}}_{k}(t), (8)

where 𝓙(𝝃k(t))\bm{\mathcal{J}}(\bm{\xi}_{k}(t)) is the right Jacobian of 𝝃k(t)\bm{\xi}_{k}(t). Notice that 𝓙𝟏6×6\bm{\mathcal{J}}\approx\bm{1}^{6\times 6} when 𝝃k(t)\bm{\xi}_{k}(t) is small enough. Further take the time derivative of (8), we can show the relationship between ϖ(t)\bm{\varpi}(t) and 𝝃¨k(t)\ddot{\bm{\xi}}_{k}(t):

𝝃¨k(t)=𝓙1(𝝃k(t))ϖ˙(t)+ddt𝓙1(𝝃k(t))ϖ(t).\ddot{\bm{\xi}}_{k}(t)=\bm{\mathcal{J}}^{-1}(\bm{\xi}_{k}(t))\dot{\bm{\varpi}}(t)+\frac{d}{dt}\bm{\mathcal{J}}^{-1}(\bm{\xi}_{k}(t))\bm{\varpi}(t). (9)

Apply the first-order approximation and substitute (8), we have

𝝃¨k(t)𝓙(𝝃k(t))1ϖ˙(t)+12(𝓙(𝝃k(t))1ϖ(t))ϖ(t),\ddot{\bm{\xi}}_{k}(t)\approx\bm{\mathcal{J}}(\bm{\xi}_{k}(t))^{-1}\dot{\bm{\varpi}}(t)+\frac{1}{2}(\bm{\mathcal{J}}(\bm{\xi}_{k}(t))^{-1}\bm{\varpi}(t))^{\curlywedge}\bm{\varpi}(t), (10)

where ()(\bullet)^{\curlywedge} is the adjoint matrix operator. For our definition, we have

𝝃=[ϕ𝝆]=[ϕ𝟎𝝆ϕ],\bm{\xi}^{\curlywedge}=\left[\begin{array}[]{cc}\bm{\phi}\\ \bm{\rho}\end{array}\right]^{\curlywedge}=\left[\begin{array}[]{cc}\bm{\phi}^{\land}&\bm{0}\\ \bm{\rho}^{\land}&\bm{\phi}^{\land}\end{array}\right], (11)

where ϕ3\bm{\phi}\in\mathbb{R}^{3} is the rotation component and 𝝆3\bm{\rho}\in\mathbb{R}^{3} is the translation component. Now, we can define the local linear kinematic state w.r.t 𝑻k\bm{T}_{k} as

𝜸k(t)[𝝃k(t)𝝃˙k(t)𝝃¨k(t)].\bm{\gamma}_{k}(t)\triangleq\left[\begin{array}[]{ccc}\bm{\xi}_{k}(t)\\ \dot{\bm{\xi}}_{k}(t)\\ \ddot{\bm{\xi}}_{k}(t)\end{array}\right]. (12)

Similarly, we can assume the motion prior to be

𝝃˙˙˙k(t)𝑮𝑷(𝟎,𝑸cδ(tt)).\dddot{\bm{\xi}}_{k}(t)\sim\bm{GP}(\bm{0},\bm{Q}_{c}\delta(t-t^{\prime})). (13)

On this basis, the linear time-invariant SDEs are

𝜸˙k(t)=[𝟎𝟏𝟎𝟎𝟎𝟏𝟎𝟎𝟎]𝑨𝜸k(t)+[𝟎𝟎𝟏]𝑳𝝃˙˙˙k(t),\dot{\bm{\gamma}}_{k}(t)=\underbrace{\begin{bmatrix}\bm{0}&\bm{1}&\bm{0}\\ \bm{0}&\bm{0}&\bm{1}\\ \bm{0}&\bm{0}&\bm{0}\end{bmatrix}}_{\bm{A}}\bm{\gamma}_{k}(t)+\underbrace{\begin{bmatrix}\bm{0}\\ \bm{0}\\ \bm{1}\end{bmatrix}}_{\bm{L}}\dddot{\bm{\xi}}_{k}(t), (14)

which has a closed-form solution for the state transition function 𝚽(t,tk)=exp(𝑨(ttk))\bm{\Phi}(t,t_{k})=\exp(\bm{A}(t-t_{k})). Meanwhile, the transition equation of the covariance matrix can be given by

𝑸k(t)=tkt𝚽(s,tk)𝑳𝑸c𝑳𝚽(s,tk)𝑑s.\bm{Q}_{k}(t)=\int_{t_{k}}^{t}\bm{\Phi}(s,t_{k})\bm{L}\bm{Q}_{c}\bm{L}^{\top}\bm{\Phi}(s,t_{k})^{\top}ds. (15)

III-C WNOJ Prior Residual

Under the WNOJ assumption, the GP prior residual between timestamp tkt_{k} and tk+1t_{k+1} can be defined as

𝒆k=[Δtkϖk+12Δtk2ϖ˙k𝝃k,k+1ϖk+Δtkϖ˙k𝓙k,k+11ϖk+1ϖ˙k𝓙k,k+11ϖ˙k+112(𝓙k,k+11ϖk+1)ϖk+1],\bm{e}_{k}=\left[\begin{array}[]{ccc}\Delta t_{k}\bm{\varpi}_{k}+\frac{1}{2}\Delta t_{k}^{2}\dot{\bm{\varpi}}_{k}-\bm{\xi}_{k,k+1}\\ \bm{\varpi}_{k}+\Delta t_{k}\dot{\bm{\varpi}}_{k}-\bm{\mathcal{J}}_{k,k+1}^{-1}\bm{\varpi}_{k+1}\\ \dot{\bm{\varpi}}_{k}-\bm{\mathcal{J}}_{k,k+1}^{-1}\dot{\bm{\varpi}}_{k+1}-\frac{1}{2}(\bm{\mathcal{J}}_{k,k+1}^{-1}\bm{\varpi}_{k+1})^{\curlywedge}\bm{\varpi}_{k+1}\end{array}\right], (16)

in which Δtk=tk+1kk\Delta t_{k}=t_{k+1}-k_{k}, 𝝃k,k+1=log(𝑻k1𝑻k+1)\bm{\xi}_{k,k+1}=\log(\bm{T}_{k}^{-1}\bm{T}_{k+1})^{\vee}, 𝓙k,k+1=𝓙(𝝃k,k+1)\bm{\mathcal{J}}_{k,k+1}=\bm{\mathcal{J}}(\bm{\xi}_{k,k+1}), and ϖk=ϖ(tk)\bm{\varpi}_{k}=\bm{\varpi}(t_{k}). Notice that log()\log(\bullet) and ()(\bullet)^{\vee} are the inverse maps of exp()\exp(\bullet) and ()(\bullet)^{\land}, respectively. We can compute the covariance matrix for prior residual from (15). Furthermore, the detailed Jacobians of prior residuals are listed in Appendix-C.

IV GP-Inertial Factor

The GPIF is directly built upon the continuous-time trajectory with the GP-based motion prior. The relationships between GP-based motion states and raw inertial measurements are derived as follows. Subsequently, the residuals and Jacobians are provided analytically.

IV-A Inertial Measurement Model

Generally, the measurement of the gyroscope is the noisy angular velocity 𝝎~bbi=𝝎bbi+𝐛g+𝝇g\tilde{\bm{\omega}}_{b}^{bi}=\bm{\omega}_{b}^{bi}+\mathbf{b}_{g}+\bm{\varsigma}_{g}, and the measurement of the accelerometer 𝒂~\tilde{\bm{a}} is the noisy value

𝒂~=𝑪bi(𝒓¨ibi𝒈i)+𝒃a+𝝇a,\tilde{\bm{a}}=\bm{C}_{bi}(\ddot{\bm{r}}_{i}^{bi}-\bm{g}_{i})+\bm{b}_{a}+\bm{\varsigma}_{a}, (17)

where 𝝇g𝒩(𝟎,𝑸g)\bm{\varsigma}_{g}\sim\mathcal{N}(\bm{0},\bm{Q}_{g}) and 𝝇a𝒩(𝟎,𝑸a)\bm{\varsigma}_{a}\sim\mathcal{N}(\bm{0},\bm{Q}_{a}) are the Gaussian measurement noises, 𝒈i\bm{g}_{i} is the gravitational acceleration, 𝐛g\mathbf{b}_{g} and 𝐛a\mathbf{b}_{a} are biases that obey the Brownian movement.

𝒃˙g𝒩(𝟎,𝑸bg),\displaystyle\dot{\bm{b}}_{g}\sim\mathcal{N}(\bm{0},\bm{Q}_{bg}), (18)
𝒃˙a𝒩(𝟎,𝑸ba),\displaystyle\dot{\bm{b}}_{a}\sim\mathcal{N}(\bm{0},\bm{Q}_{ba}), (19)

where Qa,Qg,Qbg,QbaQ_{a},Q_{g},Q_{bg},Q_{ba} are covariance matrices.

IV-B GP-Inertial Residual and Jacobian

Before describing the GPIF, we need first know the kinematic state at an arbitrary timestamp. To query the trajectory at timestamp τ\tau between tkt_{k} and tk+1t_{k+1}, we firstly interpolate the local variable 𝜸k(τ)\bm{\gamma}_{k}(\tau) with

𝜸k(τ)\displaystyle\bm{\gamma}_{k}(\tau) =𝚲(τ)𝜸k(tk)+𝚿(τ)𝜸k(tk+1),\displaystyle=\bm{\Lambda}(\tau)\bm{\gamma}_{k}(t_{k})+\bm{\Psi}(\tau)\bm{\gamma}_{k}(t_{k+1}),
𝚲(τ)\displaystyle\bm{\Lambda}(\tau) =𝚽(τ,tk)𝚿(τ)𝚽(tk+1,tk),\displaystyle=\bm{\Phi}(\tau,t_{k})-\bm{\Psi}(\tau)\bm{\Phi}(t_{k+1},t_{k}),
𝚿(τ)\displaystyle\bm{\Psi}(\tau) =𝑸k(τ)𝚽(tk+1,τ)𝑸k(tk+1)1.\displaystyle=\bm{Q}_{k}(\tau)\bm{\Phi}(t_{k+1},\tau)^{\top}\bm{Q}_{k}(t_{k+1})^{-1}. (20)

Then, in conjunction with (7)-(12), the kinematic state 𝒙(τ)\bm{x}(\tau) in (6) can be resolved.

With the GP-based continuous-time trajectory, we construct the GPIF directly from the angular velocity and linear acceleration states, which avoids the preintegration. Considering (2), (5) and (17), the residual of GPIF can be given by

𝒆g(τ)\displaystyle\bm{e}_{g}(\tau) =𝝎~(τ)𝝎(τ)𝒃g(τ),\displaystyle=\tilde{\bm{\omega}}(\tau)-\bm{\omega}(\tau)-\bm{b}_{g}(\tau),
𝒆a(τ)\displaystyle\bm{e}_{a}(\tau) =𝒂~(τ)𝝂˙(τ)(𝝎(τ))𝝂(τ)+𝑪(τ)𝒈𝒃a(τ),\displaystyle=\tilde{\bm{a}}(\tau)-\dot{\bm{\nu}}(\tau)-(\bm{\omega}(\tau))^{\land}\bm{\nu}(\tau)+\bm{C}(\tau)^{\top}\bm{g}-\bm{b}_{a}(\tau), (21)

where the sub- and superscripts have been dropped. Neglecting the effects of bias and gravity, we can see that the raw acceleration measurement 𝒂~\tilde{\bm{a}} has a direct relationship with the kinematic state 𝝂˙\dot{\bm{\nu}} and an extra term 𝝎𝝂\bm{\omega}^{\land}\bm{\nu}. Notably, when the angular velocity or linear velocity is very small, or when the axis of rotation is approximately aligned with the direction of the linear velocity, this extra term can be neglected. In other cases, this extra term should be taken into account. Up to this point, we have established a detailed residual model between the raw IMU measurements and the kinematic states.

The covariance matrix of the direct residual is exactly the measurement covariance diag(𝑸g,𝑸a)diag(\bm{Q}_{g},\bm{Q}_{a}). The jacobians of GPIF residuals can be derived from the chain rule as follows:

𝒆(τ)𝒙(tk)=𝒆(τ)𝒙(τ)𝒙(τ)𝒙(tk),\frac{\partial\bm{e}(\tau)}{\partial\bm{x}(t_{k})}=\frac{\partial\bm{e}(\tau)}{\partial\bm{x}(\tau)}\frac{\partial\bm{x}(\tau)}{\partial\bm{x}(t_{k})}, (22)

where the derivations of 𝒙(τ)𝒙(tk)\frac{\partial\bm{x}(\tau)}{\partial\bm{x}(t_{k})} can be found in the Appendix-D. Here, the nonzero components of 𝒆(τ)𝒙(τ)\frac{\partial\bm{e}(\tau)}{\partial\bm{x}(\tau)} are listed as follows:

𝒆g(τ)𝝎(τ)\displaystyle\frac{\partial\bm{e}_{g}(\tau)}{\partial\bm{\omega}(\tau)} =𝒆g(τ)𝒃g(τ)=𝒆a(τ)𝝂˙(τ)=𝒆a(τ)𝒃a(τ)=𝟏,\displaystyle=\frac{\partial\bm{e}_{g}(\tau)}{\partial\bm{b}_{g}(\tau)}=\frac{\partial\bm{e}_{a}(\tau)}{\partial\dot{\bm{\nu}}(\tau)}=\frac{\partial\bm{e}_{a}(\tau)}{\partial\bm{b}_{a}(\tau)}=-\bm{1},
𝒆a(τ)𝝂(τ)\displaystyle\frac{\partial\bm{e}_{a}(\tau)}{\partial\bm{\nu}(\tau)} =(𝝎(τ)),𝒆a(τ)𝝎(τ)=(𝝂(τ)),\displaystyle=-(\bm{\omega}(\tau))^{\land},\ \frac{\partial\bm{e}_{a}(\tau)}{\partial\bm{\omega}(\tau)}=(\bm{\nu}(\tau))^{\land},
𝒆a(τ)𝑪(τ)\displaystyle\frac{\partial\bm{e}_{a}(\tau)}{\partial\bm{C}(\tau)} =(𝑪(τ)𝒈),\displaystyle=\left(\bm{C}(\tau)^{\top}\bm{g}\right)^{\land}, (23)

where 𝒃a(τ)\bm{b}_{a}(\tau) and 𝒃g(τ)\bm{b}_{g}(\tau) are calculated with linear interpolation. In addition, a bias prior factor is attached between each pair of sampling timestamps tkt_{k} and tk+1t_{k+1}, as follows:

𝒆ba(tk)=𝒃a(tk+1)𝒃a(tk),\displaystyle\bm{e}_{ba}(t_{k})=\bm{b}_{a}(t_{k+1})-\bm{b}_{a}(t_{k}),
𝒆bg(tk)=𝒃g(tk+1)𝒃g(tk).\displaystyle\bm{e}_{bg}(t_{k})=\bm{b}_{g}(t_{k+1})-\bm{b}_{g}(t_{k}). (24)

V GP-Preintegration

The GPP provides a different insight to utilize GP regression framework to tightly fuse inertial and other asynchronous measurements. Rather than conducting the kinematic prior, the GPP assumes local inertial measurements as a group of independent GPs and instantiates them on-the-fly in a data-driven manner.

V-A GP Assumptions on Local Measurements

Concretely, the GPP models the “local accelerations” 𝒂k(t)3\bm{a}_{k}(t)\in\mathbb{R}^{3} and the “local angular velocities” ϕ˙k(t)3\dot{\bm{\phi}}_{k}(t)\in\mathbb{R}^{3} as six independent GPs. Therefore, we have

ϕ˙k(t)𝑮𝑷(0,kϕ(t,tk)𝑰3),\displaystyle\dot{\bm{\phi}}_{k}(t)\sim\bm{GP}(0,k_{\phi}(t,t_{k})\bm{I}_{3}), (25)
𝒂k(t)𝑮𝑷(0,ka(t,tk)𝑰3),\displaystyle\bm{a}_{k}(t)\sim\bm{GP}(0,k_{a}(t,t_{k})\bm{I}_{3}),

where ϕ˙k(t)3\dot{\bm{\phi}}_{k}(t)\in\mathbb{R}^{3} is the time derivative of ϕk(t)\bm{\phi}_{k}(t), and kϕ(t,tk),ka(t,tk)k_{\phi}(t,t_{k}),k_{a}(t,t_{k}) are kernel functions of GPs, and 𝑰33×3\bm{I}_{3}\in\mathbb{R}^{3\times 3} is an identity matrix. The local variables are defined as

ϕk(t)\displaystyle\bm{\phi}_{k}(t) =log(𝑪(tk)𝑪(t)),\displaystyle=\log(\bm{C}(t_{k})^{\top}\bm{C}(t))^{\vee}, (26)
𝒂k(t)\displaystyle\bm{a}_{k}(t) =exp(ϕk(t))𝒂(t),\displaystyle=\exp(\bm{\phi}_{k}(t)^{\land})\bm{a}(t),

in which 𝑪(t)\bm{C}(t) is the rotation matrix of the body frame with respect to the inertial frame at the timestamp tt, and 𝒂(t)\bm{a}(t) is so called the proper acceleration of the body frame. From the definition of (17), we know that 𝒂(t)=𝑪bi(𝒓¨ibi𝒈i)\bm{a}(t)=\bm{C}_{bi}(\ddot{\bm{r}}_{i}^{bi}-\bm{g}_{i}). As shown in (25) and (26), the local accelerations and angular velocities are expressed in a local body frame at the timestamp tkt_{k}. On this basis, these variables can be integrated linearly.

V-B GP-Preintegration on Induced Latent States

To permit efficient analytical preintegration, the GPP introduces a series of the latent states as the induced pseudo-observations and conducts an online optimization problem to infer them from inertial measurements. With the linear operator [45] defined on corresponding latent states, it can effectively execute the preintegration for given timestamps.

Suppose the latent states 𝝆^j\hat{\bm{\rho}}_{j} and 𝜶^j\hat{\bm{\alpha}}_{j} be noisy observed vectors of local variables at the timestamp tjt_{j} as in (25), we have

𝝆~j\displaystyle\tilde{\bm{\rho}}_{j} =ϕ˙k(tj)+𝜼ρwith𝜼ρ𝒩(0,σϕ2𝑰3),\displaystyle=\dot{\bm{\phi}}_{k}(t_{j})+\bm{\eta}_{\rho}\ with\ \bm{\eta}_{\rho}\sim\mathcal{N}(0,\sigma_{\phi}^{2}\bm{I}_{3}), (27)
𝜶~j\displaystyle\tilde{\bm{\alpha}}_{j} =𝒂k(tj)+𝜼αwith𝜼α𝒩(0,σa2𝑰3),\displaystyle=\bm{a}_{k}(t_{j})+\bm{\eta}_{\alpha}\ with\ \bm{\eta}_{\alpha}\sim\mathcal{N}(0,\sigma_{a}^{2}\bm{I}_{3}),

where 𝜼\bm{\eta} follows the zero-mean Gaussian distribution with a diagonal covariance matrix. By conditioning upon these latent states, the posterior values of local variables in (25) can be expressed as

ϕ˙~k(τ)\displaystyle\tilde{\dot{\bm{\phi}}}_{k}(\tau) =𝝆~[𝑲ϕ(𝒕,𝒕)+σϕ2𝑰]𝒌ϕ(𝒕,τ),\displaystyle=\tilde{\bm{\rho}}[\bm{K}_{\phi}(\bm{t},\bm{t})+\sigma_{\phi}^{2}\bm{I}]^{-\top}\bm{k}_{\phi}(\bm{t},\tau), (28)
𝒂~k(τ)\displaystyle\tilde{\bm{a}}_{k}(\tau) =𝜶~[𝑲a(𝒕,𝒕)+σa2𝑰]𝒌a(𝒕,τ),\displaystyle=\tilde{\bm{\alpha}}[\bm{K}_{a}(\bm{t},\bm{t})+\sigma_{a}^{2}\bm{I}]^{-\top}\bm{k}_{a}(\bm{t},\tau),

where 𝒕=[t1,t2,,tN]\bm{t}=[t_{1},t_{2},\cdots,t_{N}] is timestamp vector of desired latent states, 𝑲(𝒕,𝒕)=k(𝒕,𝒕)\bm{K}(\bm{t},\bm{t})=k(\bm{t},\bm{t}) is N×NN\times N, 𝒌(𝒕,τ)=k(𝒕,τ)\bm{k}(\bm{t},\tau)=k(\bm{t},\tau) is N×1N\times 1, 𝝆~=[𝝆~1,𝝆~2,,𝝆~N]\tilde{\bm{\rho}}=[\tilde{\bm{\rho}}_{1},\tilde{\bm{\rho}}_{2},\cdots,\tilde{\bm{\rho}}_{N}] is 3×N3\times N, and 𝜶~=[𝜶~1,𝜶~2,,𝜶~N]\tilde{\bm{\alpha}}=[\tilde{\bm{\alpha}}_{1},\tilde{\bm{\alpha}}_{2},\cdots,\tilde{\bm{\alpha}}_{N}] is 3×N3\times N. According to [38], the latent states can be recovered by means of constructing several optimization problems where the latent states 𝝆~\tilde{\bm{\rho}} and 𝜶~\tilde{\bm{\alpha}} in (28) are optimized to predict a bunch of true IMU measurements as accurate as possible. After that, the preintegrated velocity Δ𝝂~k(τ)\Delta\tilde{\bm{\nu}}_{k}(\tau), position Δ𝒓~k(τ)\Delta\tilde{\bm{r}}_{k}(\tau) and rotation vector increment Δϕ~k(τ)\Delta\tilde{\bm{\phi}}_{k}(\tau) can be written as:

Δϕ~k(τ)\displaystyle\Delta\tilde{\bm{\phi}}_{k}(\tau) =tkτϕ˙~k(t)𝑑t,\displaystyle=\int_{t_{k}}^{\tau}\tilde{\dot{\bm{\phi}}}_{k}(t)dt, (29)
Δ𝝂~k(τ)\displaystyle\Delta\tilde{\bm{\nu}}_{k}(\tau) =tkτ𝒂~k(t)𝑑t,\displaystyle=\int_{t_{k}}^{\tau}\tilde{\bm{a}}_{k}(t)dt,
Δ𝒓~k(τ)\displaystyle\Delta\tilde{\bm{r}}_{k}(\tau) =tkτtkt𝒂~k(s)𝑑s𝑑t,\displaystyle=\int_{t_{k}}^{\tau}\int_{t_{k}}^{t}\tilde{\bm{a}}_{k}(s)dsdt,

where τ>tk\tau>t_{k} is the queried timestamp. From (28), we know that the related variables of integration operations in (29) are only 𝒌ϕ(𝒕,τ)\bm{k}_{\phi}(\bm{t},\tau) and 𝒌a(𝒕,τ)\bm{k}_{a}(\bm{t},\tau). Indeed, the integration on latent states can be calculated analytically when 𝒌ϕ(𝒕,τ)\bm{k}_{\phi}(\bm{t},\tau) and 𝒌a(𝒕,τ)\bm{k}_{a}(\bm{t},\tau) are given. The efficient analytical preintegration offers an ability for inferring on asynchronous measurements and continuous-time trajectories. The preintegration for the rotation matrix can be calculated with (26) as Δ𝑪~k(τ)=exp(Δϕ~k(τ))\Delta\tilde{\bm{C}}_{k}(\tau)=\exp(\Delta\tilde{\bm{\phi}}_{k}(\tau)^{\land}). Also, it is convenient to recover the kinematic state 𝒙(τ)\bm{x}(\tau) by means of composing the previous state 𝒙(tk)\bm{x}(t_{k}) with (29). With these composed states, the asynchronous visual projection can be done on arbitrary timestamps.

V-C Residual Terms for GP-Preintegration

According to the inertial preintegration method, the residual terms for GPP are built with the integration increments in (29) as follows:

𝒆ϕ\displaystyle\bm{e}_{\phi} =log(Δ𝑪~k,k+1𝑪k,k+1),\displaystyle=\log(\Delta\tilde{\bm{C}}_{k,k+1}^{\top}\bm{C}_{k,k+1})^{\vee}, (30)
𝒆ν\displaystyle\bm{e}_{\nu} =𝑪k,k+1𝝂k+1𝝂k𝑪k𝒈iΔtk,k+1Δ𝝂~k,k+1,\displaystyle=\bm{C}_{k,k+1}\bm{\nu}_{k+1}-\bm{\nu}_{k}-\bm{C}_{k}^{\top}\bm{g}_{i}\Delta t_{k,k+1}-\Delta\tilde{\bm{\nu}}_{k,k+1},
𝒆r\displaystyle\bm{e}_{r} =𝑪k(𝒓k+1𝒓k12𝒈iΔtk,k+12)𝝂kΔtk,k+1Δ𝒓~k,k+1,\displaystyle=\bm{C}_{k}^{\top}(\bm{r}_{k+1}-\bm{r}_{k}-\frac{1}{2}\bm{g}_{i}\Delta t_{k,k+1}^{2})-\bm{\nu}_{k}\Delta t_{k,k+1}-\Delta\tilde{\bm{r}}_{k,k+1},

where we have 𝒓k=𝒓ibi(tk)\bm{r}_{k}=\bm{r}_{i}^{bi}(t_{k}), and 𝝂k=𝝂bbi(tk)\bm{\nu}_{k}=\bm{\nu}_{b}^{bi}(t_{k}), and Δ𝝂~k,k+1=Δ𝝂~k(tk+1)\Delta\tilde{\bm{\nu}}_{k,k+1}=\Delta\tilde{\bm{\nu}}_{k}(t_{k+1}), and 𝑪k=𝑪ib(tk)\bm{C}_{k}=\bm{C}_{ib}(t_{k}), and 𝑪k,k+1=𝑪k𝑪k+1\bm{C}_{k,k+1}=\bm{C}_{k}^{\top}\bm{C}_{k+1}, and so on. All notations and frames of (30) follow the definitions in Sec. III.

VI Event-driven VIO

Refer to caption
Figure 2: System framework of AsynEIO.

Given the GPIF and GPP , we design an event-inertial odometry that can estimate motion trajectory from pure event streams and inertial measurements. Both GPIF and GPP, in conjunction with the discrete preintegration [46], are built into the same EIO system (as shown in Fig. 2). This designment allows us to switch to different inertial fusion methods and compare them fairly.

VI-A Event-Driven Frontend

To realize high-temporal resolution, we design an asynchronous event-driven frontend where sparse features are detected and tracked in an event-by-event manner, as depicted in Fig. 3. Although separate event-driven feature detecting and tracking methods have been proposed in previous literature, a whole event-driven frontend needs further design on strategies of screening and maintenance. Concretely, we leverage a registration table to record the unique identifier of each active event feature. The pixel location of each event feature can be infer from its index in the registration table. Then, the unique identifier is mapped to the corresponding feature trajectory and its independent tracker. When a event measurement is obtained, we first search in the registration table to determine if its neighbor region already exists activate event features. If there is at least one active event feature, we utilize the identifier-tracker map to find the corresponding tracker and feed the event measurement to it. The movement of active feature will be reported by the tracker and collected into the feature trajectory. Otherwise, we update the SAE and determine if a FA-Harris [47] corner feature can be detected at the pixel location of this event measurement.

Refer to caption
Figure 3: Event-driven feature trajectory generation and management.
Data: Event stream queue =𝒆1,𝒆2,,𝒆n\mathcal{E}=\langle\bm{e}_{1},\bm{e}_{2},...,\bm{e}_{n}\rangle,
   Pair of fts trajectory and tracker 𝒫i=(𝑭i,𝒯i)\mathcal{P}_{i}=(\bm{F}_{i},\mathcal{T}_{i})
   Map function :i𝒫i\mathcal{M}:i\to\mathcal{P}_{i},
   Registration table and SAE 𝒢,𝒮w,h\mathcal{G},\mathcal{S}\in\mathbb{R}^{w,h},
   Set of pairs Φ={𝒫i|i}\Phi=\{\mathcal{P}_{i}|i\in\mathcal{M}\}.
Result: Fts trajectory set 𝑭={𝑭i|i}\bm{F}=\{\bm{F}_{i}|i\in\mathcal{M}\}.
1 i0i\leftarrow 0, and Φ,,𝑭\Phi,\mathcal{M},\bm{F}\leftarrow\emptyset, and 𝒢1\mathcal{G}\leftarrow-1, and 𝒮=0\mathcal{S}=0;
2 while \mathcal{E} not empty do
3       𝒆j\bm{e}_{j} = \mathcal{E}.pop();
4       𝒮\mathcal{S}.update(𝒆j\bm{e}_{j});
5       if 𝒢\mathcal{G}.search(𝐞j\bm{e}_{j}) then
6             k𝒢k\leftarrow\mathcal{G}.nearest(𝒆j\bm{e}_{j});
7             𝒫k(k)\mathcal{P}_{k}\leftarrow\mathcal{M}(k);
8             Update tracker 𝒯k\mathcal{T}_{k} with 𝒆j\bm{e}_{j};
9             if  tmin<𝐞j.t𝒯k.time<tmaxt_{min}<\bm{e}_{j}.t-\mathcal{T}_{k}.time<t_{max}  then
10                   𝒯k.time𝒆j.t\mathcal{T}_{k}.time\leftarrow\bm{e}_{j}.t;
11                   𝑭k\bm{F}_{k}.append(𝒯k.time,𝒯k.location\langle\mathcal{T}_{k}.time,\mathcal{T}_{k}.location\rangle);
12                   Move kk to 𝒯k.location\mathcal{T}_{k}.location in 𝒢\mathcal{G};
13                  
14            else if 𝐞j.t𝒯k.timetmax\bm{e}_{j}.t-\mathcal{T}_{k}.time\geq t_{max} then
15                   Set 1-1 in 𝒢\mathcal{G} at 𝒯k.location\mathcal{T}_{k}.location;
16                   Delete kk, 𝒫k\mathcal{P}_{k}, 𝑭k\bm{F}_{k} in \mathcal{M}, Φ\Phi and 𝑭\bm{F};
17                  
18            
19      else if len(𝐅)<ftsmaxlen(\bm{F})<fts_{max} then
20             if 𝒮\mathcal{S}.query(𝐞j\bm{e}_{j}) is FA-Harris then
21                   ii+1i\leftarrow i+1;
22                   𝒢\mathcal{G}.record(ii) at 𝒆j.location\bm{e}_{j}.location;
23                   Create 𝑭i=𝒆j.t,𝒆j.location\bm{F}_{i}=\langle\bm{e}_{j}.t,\bm{e}_{j}.location\rangle;
24                   Create a new tracker 𝒯i\mathcal{T}_{i}, Update with 𝒆j\bm{e}_{j};
25                   𝒯i.time𝒆j.t\mathcal{T}_{i}.time\leftarrow\bm{e}_{j}.t;
26                   Create 𝒫i\mathcal{P}_{i} with 𝑭i\bm{F}_{i} and 𝒯i\mathcal{T}_{i};
27                   Add i𝒫ii\rightarrow\mathcal{P}_{i} to \mathcal{M}, 𝑭i\bm{F}_{i} to 𝑭\bm{F}, 𝒫i\mathcal{P}_{i} to Φ\Phi;
28                  
29            
30      
Algorithm 1 Event-Driven Visual Frontend

Since the registration table has the same width and height as the resolution of event cameras, it is high-efficient to access and modify elements in the registration table. The search of existing features only pays attention to a small patch within the registration table, and thus has a low time complexity. The small patch also reject the feature clustering and realize sparse features. Similarly, the detection only occurs at the location of the current event measurement and is thus efficient.

As the tracking of active features is triggered by subsequent events, a special case is that an active feature have not be triggered for a long time. For example, a corner feature is detected in noisy events by mistake, and thus it should be discarded promptly. To realize it, we assume the active feature should be triggered frequently within a maximum time threshold. We periodically delete the active features that have not been updated for more than the maximum time threshold. In addition, we also set a minimum time threshold to avoid the feature trajectory collecting a mass of redundant measurements. With patch search and time threshold mechanisms, the result frontend can obtain sparse and asynchronous feature tracking.

VI-B Asynchronous Repojection

Refer to caption
Figure 4: Illustration of event streams and feature trajectories. When the event camera observes some landmarks and moves in the scenario, the event stream triggered by the same landmark will be managed as a feature trajectory.

The feature trajectory, uniquely indexed by ii, can be defined as

𝑭i𝒎1i,𝒎2i,,𝒎ni,\bm{F}_{i}\triangleq\langle\bm{m}_{1}^{i},\bm{m}_{2}^{i},...,\bm{m}_{n}^{i}\rangle, (31)

which represents an ordered spatial-temporal association set. The 𝒎ji\bm{m}_{j}^{i} indicates the jj-th event feature measurement of the landmark 𝒍i\bm{l}_{i} at timestamp ti,jt_{i,j}, and it contains

𝒎ji{ti,j,𝒒~i,j},\bm{m}^{i}_{j}\triangleq\{t_{i,j},\tilde{\bm{q}}_{i,j}\}, (32)

where ti,jt_{i,j} is the measurement timestamp and 𝒒~i,j=[xi,j,yi,j]\tilde{\bm{q}}_{i,j}=[x_{i,j},y_{i,j}]^{\top} is the observed pixel position.

When a feature trajectory has collected enough measurements for triangulation, the induced projection will be utilized to triangulate and embed these measurements into the underlying factor graph. Given the triangulated landmark 𝒍i=[xi,yi,zi]\bm{l}_{i}=[x_{i},y_{i},z_{i}]^{\top} and the event feature measurement 𝒎ji\bm{m}_{j}^{i} as defined in (32), the projection model can be given by

𝒆c(ti,j)=𝒒~i,j1di𝑲(𝑻(ti,j)𝑻bc)𝒍i,\bm{e}_{c}(t_{i,j})=\tilde{\bm{q}}_{i,j}-\frac{1}{d_{i}}\bm{K}(\bm{T}(t_{i,j})\bm{T}_{bc})^{\top}\bm{l}_{i}, (33)

where 𝑲\bm{K} indicates the intrinsic matrix, did_{i} is the depth of this landmark in current camera frame, 𝑻bc\bm{T}_{bc} is the extrinsic parameter of the event camera w.r.t the IMU, which can be solved by calibration. The kinematic state 𝑻(ti,j)\bm{T}(t_{i,j}) can be queried using (7) and (IV-B).

Different from the interpolated projection factor, the induced projection factor has reshaped the continuous-time trajectory using current accessible inertial measurements. Intuitively, it is more credible to build visual projection with this inertial induced factor. This concept is similar to the induce GP in the latent state preintegration method, while we leverage the proposed direct inertial factor to accomplish it. In contrast, the discrete preintegration scheme produces a unique pseudo-observation until the next sampling state. As a result, the asynchronous visual projection factors between inter-sampling states can only be interpolated under the motion prior assumption, which may sometimes lead to overly approximate results. Generally, both the direct factor and latent state preintegration can enhance the asynchronous projection of our event-driven VIO.

VI-C Inverse Depth Parametrization

Refer to caption
Figure 5: Inverse depth parametrization.

The inverse depth parametrization [48] is adopted to enhance the numerical stability and realize undelayed initialization. Different from the filter-based method in [48], we leverage this parametrization to optimize the GP-based factor graph and derive the analytical Jacobians. Furthermore, we define the directional vector in local frames rather than global frames, which is compatible with the optimization method. The (33) can be simplified as

𝒆c=𝒒2𝒒~2.\bm{e}_{c}=\bm{q}_{2}-\tilde{\bm{q}}_{2}. (34)

We use the inverse depth ρ\rho, the directional vector 𝜿\bm{\kappa} in the unit sphere of the first camera pose 𝑻1\bm{T}_{1} observing this feature trajectory 𝑭i\bm{F}_{i} to represent the landmark 𝒍i\bm{l}_{i}. As displayed in Fig. 5, the latter camera pose 𝑻2\bm{T}_{2} should observe the ray 𝒉2\bm{h}_{2} and have a reprojected pixel location 𝒒2\bm{q}_{2}. When the poses and landmark are estimated precisely, this reprojected pixel location should coincide with the visual tracking result 𝒒~2\tilde{\bm{q}}_{2} at the same camera pose. In fact, the direction of 𝒉2\bm{h}_{2} can uniquely determine the 𝒒2\bm{q}_{2}. From the geometric constraints in Fig. 5, the projected directional vector can be given by

𝒉2=𝑪12(𝜿ρ𝒓121)=𝑷𝑻121[𝜿ρ]=[hxhyhz],\bm{h}_{2}^{\dagger}=\bm{C}_{12}^{\top}(\bm{\kappa}-\rho\bm{r}_{1}^{21})=\bm{P}\bm{T}_{12}^{-1}\begin{bmatrix}\bm{\kappa}\\ \rho\end{bmatrix}=\begin{bmatrix}h_{x}\\ h_{y}\\ h_{z}\end{bmatrix}, (35)

where

𝑷=[𝟏3×3,𝟎3×1].\bm{P}=\begin{bmatrix}\bm{1}^{3\times 3},\bm{0}^{3\times 1}\end{bmatrix}. (36)

Therefore, the projection function can be written as

𝒒2=𝑲(𝒉2),\bm{q}_{2}=\bm{K}(\bm{h}_{2}^{\dagger})^{\flat}, (37)

where

(𝒉2)=[hx/hzhy/hz1].(\bm{h}_{2}^{\dagger})^{\flat}=\begin{bmatrix}h_{x}/h_{z}\\ h_{y}/h_{z}\\ 1\end{bmatrix}. (38)

It means the infinity landmark or the pure rotation when the inverse depth ρ\rho is 0 for all time. In both situations, the pose and inverse depth can be optimized without ambiguity. Therefore, this parametrization permits immediate initialization of feature trajectories. The analytical Jacobians can be derived using the chain rule, as detailed in Appendix-E. Although inverse depth has a better numerical performance in optimization, it introduces more state variables (6-DoF first observed pose) in each projection factor. When the first observed pose occurs in inter-samples of the GP trajectory, it will increase further due to the GP interpolation (Two adjacent trajectory points with 36 dimensions). To alleviate this problem, we interpolate on the feature trajectory to find the closest sampling timestamp on the motion trajectory. The interpolated pixel position is then used to compute 𝜿\bm{\kappa}. This process initializes each feature trajectory for inverse depth optimization and reduces 30 dimensions for each projection factor. Benefited from high-time resolution of visual frontend, the interpolation on feature trajectories normally has negligible deviations.

VI-D Overall Cost Function in Sliding-Window

Refer to caption
Figure 6: Factor graph of two asynchronous fusing schemes. (a) is the factor graph of AsynEIO with sparse GP priors and GPIFs, and the asynchronous projection is realized by GP interpolation on the motion trajectory. (b) is factor graph with GPP factors, and the GPP allows to analytically preintegrate at arbitrary timestamps. Therefore, the GPP can also support the asynchronous projection. The dynamic marginalization and sliding-window optimization are adopted to reduce the computational consumption.

The primary advancement of GPIF is that it avoids the integral operations on raw measurements. For the micro-electro-mechanical systems (MEMS) IMU, these integral operations would introduce collective integral errors and deteriorate the estimation system after a fewer second. Instead, we establish the detailed relationship between the differential SE(3) states of the kinematic trajectory and the physical measurements obtained from the IMU. We find that the residual, Jacobians and covariance matrix for GPIF are elegant and suitable to be embedded into the GP-based factor graph. The overall cost for our event-driven VIO with GPIF is given by

min𝒙,𝒍𝒆k𝑸k+12𝑾𝑵𝑶𝑱𝒑𝒓𝒊𝒐𝒓+𝒆bg(tk)𝑸bg2+𝒆bg(tk)𝑸bg2𝒃𝒊𝒂𝒔𝒑𝒓𝒊𝒐𝒓\displaystyle\mathop{\min}_{\bm{x},\bm{l}}\underbrace{\sum\|\bm{e}_{k}\|_{\bm{Q}_{k+1}}^{2}}_{\bm{WNOJ\ prior}}+\underbrace{\sum\|\bm{e}_{bg}(t_{k})\|_{\bm{Q}_{bg}}^{2}+\sum\|\bm{e}_{bg}(t_{k})\|_{\bm{Q}_{bg}}^{2}}_{\bm{bias\ prior}}
+𝒆c(ti,j)𝑹2𝒆𝒗𝒆𝒏𝒕𝒄𝒐𝒔𝒕+𝒆g(τ)𝑸g2+𝒆a(τ)𝑸a2𝒊𝒏𝒆𝒓𝒕𝒊𝒂𝒍𝒄𝒐𝒔𝒕,\displaystyle+\underbrace{\sum\|\bm{e}_{c}(t_{i,j})\|_{\bm{R}}^{2}}_{\bm{event\ cost}}+\underbrace{\sum\|\bm{e}_{g}(\tau)\|_{\bm{Q}_{g}}^{2}+\sum\|\bm{e}_{a}(\tau)\|_{\bm{Q}_{a}}^{2}}_{\bm{inertial\ cost}}, (39)

where 𝑹\bm{R} is the measurement covariance matrix of the event camera. We can see that there is only measurement covariance required by (VI-D) if the direct inertial factor is applied. Compared with preintegration schemes, we reduce the computational consumption paid for the covariance propagation. Also, this propagation is unreliable when the inertial measurements are partly lost or even asynchronous.

Refer to caption
Figure 7: Event-inertial initialization in AsynEIO.

VI-E Event and Inertial Initialization

We design an initialization scheme (as shown in Fig. 7) to warm up our event-inertial system. The asynchronous continuous-time initialization is first converted to a synchronous discrete-time problem using a sampling strategy. A standard visual-inertial initialization method [49] is adopted to achieve initial estimating of pose, linear velocity and bias. With these known states, the remaining unknown states on the continuous-time GP trajectory are solved by a warm up optimization as

min𝝎,ϖ˙𝒆k𝑸k+12𝑾𝑵𝑶𝑱𝒑𝒓𝒊𝒐𝒓+λ1𝝎02+λ2ϖ˙02,\mathop{\min}_{\bm{\omega},\dot{\bm{\varpi}}}\underbrace{\sum\|\bm{e}_{k}\|_{\bm{Q}_{k+1}}^{2}}_{\bm{WNOJ\ prior}}+\lambda_{1}\|\bm{\omega}_{0}\|^{2}+\lambda_{2}\|\dot{\bm{\varpi}}_{0}\|^{2}, (40)

where 𝝎0\bm{\omega}_{0} and ϖ˙0\dot{\bm{\varpi}}_{0} are the rotational velocity and generalized acceleration of the first trajectory point, λ1\lambda_{1} and λ2\lambda_{2} are two weight coefficients. This procedure adds some zero priors and limits the drift of GP trajectory in null space. Finally, a joint optimization as (VI-D) is built to add all asynchronous measurements and refine overall state variables.

VII Conclusion

This article introduces an asynchronous event-inertial odometry called AsynEIO that leverages a unified GP regression framework, where several important inertial fusion schemes are realized and evaluated. To the best of our knowledge, it is the first work that makes fair comparisons for asynchronous event-inertial fusion system. The expressing abilities of GP trajectories defined in different spaces and motion priors are assessed in detail, which can offer valuable suggest for choosing the suitable one according to certain conditions. The comparison analysis of various inertial factors explores their responses to sampling intervals and noises. We demonstrate that the GPP factor has greater contribution for inferring a fine-grained trajectory than GPIF and preintegration methods when the inertial measurements have low-level noises. Meanwhile, the GPP factor is more sensitive than other methods as the noise increases. Experiments conducted on real event-inertial sequences show that AsynEIO have a higher precision than the comparison method. Especially, our AsynEIO has a satisfied performance in high-speed maneuvering and low-illumination scenarios, which mainly benefited from the proposed asynchronous tracking and fusion mechanism. Although AsynEIO still need more improvements and engineering tricks for real-time estimation, we expect the proposed asynchronous event-inertial fusion framework and comparison results can contribute the community.

References

  • [1] D. Gehrig and D. Scaramuzza, “Low-latency automotive vision with event cameras,” Nature, vol. 629, no. 8014, pp. 1034–1040, 2024.
  • [2] G. Liang, K. Chen, H. Li, Y. Lu, and L. Wang, “Towards robust event-guided low-light image enhancement: A large-scale real-world event-image dataset and novel approach,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2024, pp. 23–33.
  • [3] Y.-F. Zuo, W. Xu, X. Wang, Y. Wang, and L. Kneip, “Cross-modal semi-dense 6-dof tracking of an event camera in challenging conditions,” IEEE Trans. Robot., vol. 40, pp. 1600–1616, 2024.
  • [4] A. R. Vidal, H. Rebecq, T. Horstschaefer, and D. Scaramuzza, “Ultimate slam? combining events, images, and imu for robust visual slam in hdr and high-speed scenarios,” IEEE Robot. Autom. Lett., vol. 3, no. 2, pp. 994–1001, 2018.
  • [5] G. Gallego, T. Delbrück, G. Orchard, C. Bartolozzi, B. Taba, A. Censi, S. Leutenegger, A. J. Davison, J. Conradt, K. Daniilidis et al., “Event-based vision: A survey,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 1, pp. 154–180, 2020.
  • [6] Z. Wang, X. Li, T. Liu, Y. Zhang, and P. Huang, “Efficient continuous-time ego-motion estimation for asynchronous event-based data associations,” arXiv preprint arXiv:2402.16398, 2024.
  • [7] X. Li, Z. Wang, Y. Zhang, F. Zhang, and P. Huang, “Asynchronous event-inertial odometry using a unified gaussian process regression framework,” in Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst.   IEEE, 2024, accepted.
  • [8] A. Zihao Zhu, N. Atanasov, and K. Daniilidis, “Event-based visual inertial odometry,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2017, pp. 5391–5399.
  • [9] A. Z. Zhu, N. Atanasov, and K. Daniilidis, “Event-based feature tracking with probabilistic data association,” in Proc. IEEE Int. Conf. Robot. Autom.   IEEE, 2017, pp. 4465–4470.
  • [10] F. Mahlknecht, D. Gehrig, J. Nash, F. M. Rockenbauer, B. Morrell, J. Delaune, and D. Scaramuzza, “Exploring event camera-based odometry for planetary robots,” IEEE Robot. Autom. Lett., vol. 7, no. 4, pp. 8651–8658, 2022.
  • [11] D. Gehrig, H. Rebecq, G. Gallego, and D. Scaramuzza, “Eklt: Asynchronous photometric feature tracking using events and frames,” Int. J. Comput. Vis., vol. 128, no. 3, pp. 601–618, 2020.
  • [12] H. Rebecq, T. Horstschaefer, and D. Scaramuzza, “Real-time visual-inertial odometry for event cameras using keyframe-based nonlinear optimization,” in Proc. British Mach. Vis. Conf., 2017.
  • [13] M. S. Lee, J. H. Jung, Y. J. Kim, and C. G. Park, “Event-and frame-based visual-inertial odometry with adaptive filtering based on 8-dof warping uncertainty,” IEEE Robot. Autom. Lett., 2023.
  • [14] X. Lagorce, G. Orchard, F. Galluppi, B. E. Shi, and R. B. Benosman, “Hots: a hierarchy of event-based time-surfaces for pattern recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 7, pp. 1346–1359, 2016.
  • [15] W. Guan and P. Lu, “Monocular event visual inertial odometry based on event-corner using sliding windows graph-based optimization,” in Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst.   IEEE, 2022, pp. 2438–2445.
  • [16] I. Alzugaray and M. Chli, “Asynchronous corner detection and tracking for event cameras in real time,” IEEE Robot. Autom. Lett., vol. 3, no. 4, pp. 3177–3184, 2018.
  • [17] W. Guan, P. Chen, Y. Xie, and P. Lu, “Pl-evio: Robust monocular event-based visual inertial odometry with point and line features,” IEEE Trans. Autom. Sci. Eng., pp. 1–17, 2023.
  • [18] Y. Zhou, G. Gallego, and S. Shen, “Event-based stereo visual odometry,” IEEE Trans. Robot., vol. 37, no. 5, pp. 1433–1450, 2021.
  • [19] J. Niu, S. Zhong, and Y. Zhou, “Imu-aided event-based stereo visual odometry,” arXiv preprint arXiv:2405.04071, 2024.
  • [20] K. Tang, X. Lang, Y. Ma, Y. Huang, L. Li, Y. Liu, and J. Lv, “Monocular event-inertial odometry with adaptive decay-based time surface and polarity-aware tracking,” arXiv preprint arXiv:2409.13971, 2024.
  • [21] S. Hu, Y. Kim, H. Lim, A. J. Lee, and H. Myung, “ecdt: Event clustering for simultaneous feature detection and tracking,” in Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst.   IEEE, 2022, pp. 3808–3815.
  • [22] I. Alzugaray and M. Chli, “Haste: Multi-hypothesis asynchronous speeded-up tracking of events,” in Proc. Brit. Mach. Vis. Conf., 2020, p. 744.
  • [23] X. Clady, S.-H. Ieng, and R. Benosman, “Asynchronous event-based corner detection and matching,” Neural Networks, vol. 66, pp. 91–106, 2015.
  • [24] A. J. Yang, C. Cui, I. A. Bârsan, R. Urtasun, and S. Wang, “Asynchronous multi-view slam,” in Proc. IEEE Int. Conf. Robot. Autom.   IEEE, 2021, pp. 5669–5676.
  • [25] X. Zheng and J. Zhu, “Traj-lio: A resilient multi-lidar multi-imu state estimator through sparse gaussian process,” arXiv preprint arXiv:2402.09189, 2024.
  • [26] P. Dellenbach, J.-E. Deschaud, B. Jacquet, and F. Goulette, “Ct-icp: Real-time elastic lidar odometry with loop closure,” in Proc. IEEE Int. Conf. Robot. Autom.   IEEE, 2022, pp. 5580–5586.
  • [27] J. N. Wong, D. J. Yoon, A. P. Schoellig, and T. D. Barfoot, “A data-driven motion prior for continuous-time trajectory estimation on se (3),” IEEE Robot. Autom. Lett., vol. 5, no. 2, pp. 1429–1436, 2020.
  • [28] J. Wang and J. D. Gammell, “Event-based stereo visual odometry with native temporal resolution via continuous-time gaussian process regression,” IEEE Robot. Autom. Lett., vol. 8, no. 10, pp. 6707–6714, 2023.
  • [29] D. Liu, A. Parra, Y. Latif, B. Chen, T.-J. Chin, and I. Reid, “Asynchronous optimisation for event-based visual odometry,” in Proc. IEEE Int. Conf. Robot. Autom., 2022, pp. 9432–9438.
  • [30] E. Mueggler, G. Gallego, and D. Scaramuzza, “Continuous-time trajectory estimation for event-based vision sensors,” in Proc. Robot.: Sci. Syst., 2015.
  • [31] E. Mueggler, G. Gallego, H. Rebecq, and D. Scaramuzza, “Continuous-time visual-inertial odometry for event cameras,” IEEE Trans. Robot., vol. 34, no. 6, pp. 1425–1440, 2018.
  • [32] X. Lu, Y. Zhou, and S. Shen, “Event-based visual inertial velometer,” arXiv preprint arXiv:2311.18189, 2023.
  • [33] T. D. Barfoot, C. H. Tong, and S. Särkkä, “Batch continuous-time trajectory estimation as exactly sparse gaussian process regression.” in Proc. Robot.: Sci. and Syst., vol. 10.   Citeseer, 2014, pp. 1–10.
  • [34] T. Y. Tang, D. J. Yoon, and T. D. Barfoot, “A white-noise-on-jerk motion prior for continuous-time trajectory estimation on se(3),” IEEE Robot. Autom. Lett., vol. 4, no. 2, pp. 594–601, 2019.
  • [35] J. Johnson, J. Mangelson, T. Barfoot, and R. Beard, “Continuous-time trajectory estimation: A comparative study between gaussian process and spline-based approaches,” arXiv preprint arXiv:2402.00399, 2024.
  • [36] C. Le Gentil, T. Vidal-Calleja, and S. Huang, “Gaussian process preintegration for inertial-aided state estimation,” IEEE Robot. Autom. Lett., vol. 5, no. 2, pp. 2108–2114, 2020.
  • [37] C. Le Gentil and T. Vidal-Calleja, “Continuous integration over so (3) for imu preintegration,” Robot.: Sci. Syst., 2021.
  • [38] ——, “Continuous latent state preintegration for inertial-aided systems,” Int. J. Robot. Res., vol. 42, no. 10, pp. 874–900, 2023.
  • [39] C. Le Gentil, F. Tschopp, I. Alzugaray, T. Vidal-Calleja, R. Siegwart, and J. Nieto, “Idol: A framework for imu-dvs odometry using lines,” in Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst.   IEEE, 2020, pp. 5863–5870.
  • [40] B. Dai, C. Le Gentil, and T. Vidal-Calleja, “A tightly-coupled event-inertial odometry using exponential decay and linear preintegrated measurements,” in Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst.   IEEE, 2022, pp. 9475–9482.
  • [41] K. Burnett, A. P. Schoellig, and T. D. Barfoot, “Continuous-time radar-inertial and lidar-inertial odometry using a gaussian process motion prior,” arXiv preprint arXiv:2402.06174, 2024.
  • [42] S. Anderson and T. D. Barfoot, “Full steam ahead: Exactly sparse gaussian process regression for batch continuous-time trajectory estimation on se (3),” in Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst.   IEEE, 2015, pp. 157–164.
  • [43] K. Burnett, A. P. Schoellig, and T. D. Barfoot, “Imu as an input vs. a measurement of the state in inertial-aided state estimation,” arXiv preprint arXiv:2403.05968, 2024.
  • [44] W. Chamorro, J. Solà, and J. Andrade-Cetto, “Event-imu fusion strategies for faster-than-imu estimation throughput,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2023, pp. 3975–3982.
  • [45] S. Särkkä, “Linear operators and stochastic partial differential equations in gaussian process regression,” in Artificial Neural Networks and Machine Learning – ICANN 2011.   Springer, 2011, pp. 151–158.
  • [46] C. Forster, L. Carlone, F. Dellaert, and D. Scaramuzza, “On-manifold preintegration for real-time visual–inertial odometry,” IEEE Trans. Robot., vol. 33, no. 1, pp. 1–21, 2016.
  • [47] R. Li, D. Shi, Y. Zhang, K. Li, and R. Li, “Fa-harris: A fast and asynchronous corner detector for event cameras,” in Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst.   IEEE, 2019, pp. 6223–6229.
  • [48] J. Civera, A. J. Davison, and J. M. Montiel, “Inverse depth parametrization for monocular slam,” IEEE Trans. Robot., vol. 24, no. 5, pp. 932–945, 2008.
  • [49] T. Qin, P. Li, and S. Shen, “Vins-mono: A robust and versatile monocular visual-inertial state estimator,” IEEE Trans. Robot., vol. 34, no. 4, pp. 1004–1020, 2018.

Appendix

VII-A Adjoint of SE(3)SE(3)

𝓣=Ad(exp(𝝃))\displaystyle\bm{\mathcal{T}}=Ad(\exp(\bm{\xi}^{\land})) =Ad([𝑪𝑪𝑱𝝆𝟎1])\displaystyle=Ad(\begin{bmatrix}\bm{C}&\bm{C}\bm{J}\bm{\rho}\\ \bm{0}^{\top}&1\end{bmatrix})
=[𝑪𝟎𝑪(𝑱𝝆)𝑪]\displaystyle=\begin{bmatrix}\bm{C}&\bm{0}\\ \bm{C}(\bm{J}\bm{\rho})^{\land}&\bm{C}\end{bmatrix}
=exp(𝝃)\displaystyle=\exp(\bm{\xi}^{\curlywedge}) (41)

where 𝑪=exp(ϕ)\bm{C}=\exp(\bm{\phi}^{\land}), 𝑱=𝑱(ϕ)\bm{J}=\bm{J}(\bm{\phi}) is the right Jacobian matrix.

𝑻exp(𝝃)𝑻1exp((Ad(𝑻)𝝃))\bm{T}\exp(\bm{\xi}^{\land})\bm{T}^{-1}\equiv\exp((Ad(\bm{T})\bm{\xi})^{\land}) (42)

VII-B Perturbing the Pose on SE(3)SE(3)

𝑻\displaystyle\bm{T} =𝑻¯exp(ϵ)\displaystyle=\bar{\bm{T}}\exp(\bm{\epsilon}^{\land}) (43)
𝑻\displaystyle\bm{T} =exp(ϵ)𝑻¯\displaystyle=\exp(\bm{\epsilon}^{\land})\bar{\bm{T}}
=𝑻¯exp((Ad(𝑻¯1)ϵ))\displaystyle=\bar{\bm{T}}\exp((Ad(\bar{\bm{T}}^{-1})\bm{\epsilon})^{\land})
=𝑻¯exp((Ad(𝑻¯)1ϵ))\displaystyle=\bar{\bm{T}}\exp((Ad(\bar{\bm{T}})^{-1}\bm{\epsilon})^{\land})
=𝑻¯exp(𝓣1ϵ))\displaystyle=\bar{\bm{T}}\exp(\bm{\mathcal{T}}^{-1}\bm{\epsilon})^{\land}) (44)

VII-C Perturbing WNOJ Prior Residuals

𝝃k,k+1\displaystyle\bm{\xi}_{k,k+1} =log(𝑻k1𝑻k+1)\displaystyle=\log(\bm{T}_{k}^{-1}\bm{T}_{k+1})^{\vee}
=log(exp(ϵk)𝑻¯k1𝑻¯k+1exp(ϵk+1))\displaystyle=\log(\exp(-\bm{\epsilon}^{\land}_{k})\bar{\bm{T}}_{k}^{-1}\bar{\bm{T}}_{k+1}\exp(\bm{\epsilon}^{\land}_{k+1}))
𝝃¯k,k+1+𝓙k,k+1,op1(ϵk+1exp(𝝃)ϵk)\displaystyle\approx\bar{\bm{\xi}}_{k,k+1}+\bm{\mathcal{J}}^{-1}_{k,k+1,op}(\bm{\epsilon}_{k+1}-\exp(-\bm{\xi}^{\curlywedge})\bm{\epsilon}_{k})
=𝝃¯k,k+1+𝓙k,k+1,op1(ϵk+1𝓣k,k+1,op1ϵk)δ𝝃k,k+1\displaystyle=\bar{\bm{\xi}}_{k,k+1}+\underbrace{\bm{\mathcal{J}}_{k,k+1,op}^{-1}(\bm{\epsilon}_{k+1}-\bm{\mathcal{T}}_{k,k+1,op}^{-1}\bm{\epsilon}_{k})}_{\triangleq\delta\bm{\xi}_{k,k+1}} (45)
𝓙k,k+11\displaystyle\bm{\mathcal{J}}^{-1}_{k,k+1} =𝓙1(𝝃k,k+1)\displaystyle=\bm{\mathcal{J}}^{-1}(\bm{\xi}_{k,k+1})
𝓙1(𝝃¯k,k+1+𝓙k,k+1,op1(ϵk+1𝓣k,k+1,op1ϵk))\displaystyle\approx\bm{\mathcal{J}}^{-1}(\bar{\bm{\xi}}_{k,k+1}+\bm{\mathcal{J}}_{k,k+1,op}^{-1}(\bm{\epsilon}_{k+1}-\bm{\mathcal{T}}_{k,k+1,op}^{-1}\bm{\epsilon}_{k}))
𝓙k,k+1,op1+12(𝓙k,k+1,op1(ϵk+1𝓣k,k+1,op1ϵk))\displaystyle\approx\bm{\mathcal{J}}_{k,k+1,op}^{-1}+\frac{1}{2}(\bm{\mathcal{J}}_{k,k+1,op}^{-1}(\bm{\epsilon}_{k+1}-\bm{\mathcal{T}}_{k,k+1,op}^{-1}\bm{\epsilon}_{k}))^{\curlywedge}
=𝓙k,k+1,op1+12(δ𝝃k,k+1)\displaystyle=\bm{\mathcal{J}}_{k,k+1,op}^{-1}+\frac{1}{2}(\delta\bm{\xi}_{k,k+1})^{\curlywedge} (46)
ϖk\displaystyle\bm{\varpi}_{k} =ϖ¯k+δϖk\displaystyle=\bar{\bm{\varpi}}_{k}+\delta\bm{\varpi}_{k} (47)
ϖk+1\displaystyle\bm{\varpi}_{k+1} =ϖ¯k+1+δϖk+1\displaystyle=\bar{\bm{\varpi}}_{k+1}+\delta\bm{\varpi}_{k+1} (48)
ϖ˙k\displaystyle\dot{\bm{\varpi}}_{k} =ϖ˙¯k+δϖ˙k\displaystyle=\bar{\dot{\bm{\varpi}}}_{k}+\delta\dot{\bm{\varpi}}_{k} (49)
ϖ˙k+1\displaystyle\dot{\bm{\varpi}}_{k+1} =ϖ˙¯k+1+δϖ˙k+1\displaystyle=\bar{\dot{\bm{\varpi}}}_{k+1}+\delta\dot{\bm{\varpi}}_{k+1} (50)
𝓙k,k+11ϖk+1\displaystyle\bm{\mathcal{J}}_{k,k+1}^{-1}\bm{\varpi}_{k+1} =(𝓙k,k+1,op1+12(δ𝝃k,k+1))ϖk+1\displaystyle=\left(\bm{\mathcal{J}}_{k,k+1,op}^{-1}+\frac{1}{2}(\delta\bm{\xi}_{k,k+1})^{\curlywedge}\right)\bm{\varpi}_{k+1}
𝓙k,k+1,op1ϖ¯k+1+𝓙k,k+1,op1δϖk+1\displaystyle\approx\bm{\mathcal{J}}_{k,k+1,op}^{-1}\bar{\bm{\varpi}}_{k+1}+\bm{\mathcal{J}}_{k,k+1,op}^{-1}\delta\bm{\varpi}_{k+1}
+12(δ𝝃k,k+1)ϖ¯k+112ϖ¯k+1δ𝝃k,k+1\displaystyle+\underbrace{\frac{1}{2}(\delta\bm{\xi}_{k,k+1})^{\curlywedge}\bar{\bm{\varpi}}_{k+1}}_{-\frac{1}{2}\bar{\bm{\varpi}}_{k+1}^{\curlywedge}\delta\bm{\xi}_{k,k+1}} (51)
𝓙k,k+11ϖ˙k+1\displaystyle\bm{\mathcal{J}}_{k,k+1}^{-1}\dot{\bm{\varpi}}_{k+1} =(𝓙k,k+1,op1+12(δ𝝃k,k+1))ϖ˙k+1\displaystyle=\left(\bm{\mathcal{J}}_{k,k+1,op}^{-1}+\frac{1}{2}(\delta\bm{\xi}_{k,k+1})^{\curlywedge}\right)\dot{\bm{\varpi}}_{k+1}
𝓙k,k+1,op1ϖ˙¯k+1+𝓙k,k+1,op1δϖ˙k+1\displaystyle\approx\bm{\mathcal{J}}_{k,k+1,op}^{-1}\bar{\dot{\bm{\varpi}}}_{k+1}+\bm{\mathcal{J}}_{k,k+1,op}^{-1}\delta\dot{\bm{\varpi}}_{k+1}
+12(δ𝝃k,k+1)ϖ˙¯k+1\displaystyle+\frac{1}{2}(\delta\bm{\xi}_{k,k+1})^{\curlywedge}\bar{\dot{\bm{\varpi}}}_{k+1} (52)
(𝓙k,k+11ϖk+1)ϖk+1=(𝓙k,k+1,op1ϖ¯k+1)ϖ¯k+1\displaystyle(\bm{\mathcal{J}}_{k,k+1}^{-1}\bm{\varpi}_{k+1})^{\curlywedge}\bm{\varpi}_{k+1}=(\bm{\mathcal{J}}_{k,k+1,op}^{-1}\bar{\bm{\varpi}}_{k+1})^{\curlywedge}\bar{\bm{\varpi}}_{k+1}
+(𝓙k,k+1,op1ϖ¯k+1)δϖk+1\displaystyle+(\bm{\mathcal{J}}_{k,k+1,op}^{-1}\bar{\bm{\varpi}}_{k+1})^{\curlywedge}\delta\bm{\varpi}_{k+1}
ϖ¯k+1𝓙k,k+1,op1δϖk+1\displaystyle-\bar{\bm{\varpi}}_{k+1}^{\curlywedge}\bm{\mathcal{J}}_{k,k+1,op}^{-1}\delta\bm{\varpi}_{k+1}
+(12(δ𝝃k,k+1)ϖ¯k+1)ϖ¯k+112ϖ¯k+1ϖ¯k+1δ𝝃k,k+1\displaystyle+\underbrace{\left(\frac{1}{2}(\delta\bm{\xi}_{k,k+1})^{\curlywedge}\bar{\bm{\varpi}}_{k+1}\right)^{\curlywedge}\bar{\bm{\varpi}}_{k+1}}_{\frac{1}{2}\bar{\bm{\varpi}}_{k+1}^{\curlywedge}\bar{\bm{\varpi}}_{k+1}^{\curlywedge}\delta\bm{\xi}_{k,k+1}} (53)

VII-D Jacobians of Interpolated State w.r.t Sampling States

The relationship between the linear local state and kinematic state at sampling timestamps is

𝜸k(tk)\displaystyle\bm{\gamma}_{k}(t_{k}) =[𝟎ϖkϖ˙k],\displaystyle=\begin{bmatrix}\bm{0}\\ \bm{\varpi}_{k}\\ \dot{\bm{\varpi}}_{k}\end{bmatrix}, (54)
𝜸k(tk+1)\displaystyle\bm{\gamma}_{k}(t_{k+1}) =[𝝃k,k+1𝓙k,k+11ϖk+1𝓙k,k+11ϖ˙k+1+12(𝓙k,k+11ϖk+1)ϖk+1].\displaystyle=\begin{bmatrix}\bm{\xi}_{k,k+1}\\ \bm{\mathcal{J}}^{-1}_{k,k+1}\bm{\varpi}_{k+1}\\ \bm{\mathcal{J}}^{-1}_{k,k+1}\dot{\bm{\varpi}}_{k+1}+\frac{1}{2}(\bm{\mathcal{J}}^{-1}_{k,k+1}\bm{\varpi}_{k+1})^{\curlywedge}\bm{\varpi}_{k+1}\end{bmatrix}. (55)

The interpolated local state can be expressed with sampling states as

𝝃k,τ\displaystyle\bm{\xi}_{k,\tau} =𝝃k(τ)\displaystyle=\bm{\xi}_{k}(\tau) (56)
=𝚲12(τ)ϖk+𝚲13(τ)ϖ˙k+𝚿11(τ)𝝃k,k+1\displaystyle=\bm{\Lambda}_{12}(\tau)\bm{\varpi}_{k}+\bm{\Lambda}_{13}(\tau)\dot{\bm{\varpi}}_{k}+\bm{\Psi}_{11}(\tau)\bm{\xi}_{k,k+1}
+𝚿12(τ)𝓙k,k+11ϖk+1+𝚿13(τ)𝓙k,k+11ϖ˙k+1\displaystyle+\bm{\Psi}_{12}(\tau)\bm{\mathcal{J}}^{-1}_{k,k+1}\bm{\varpi}_{k+1}+\bm{\Psi}_{13}(\tau)\bm{\mathcal{J}}^{-1}_{k,k+1}\dot{\bm{\varpi}}_{k+1}
+12𝚿13(τ)(𝓙k,k+11ϖk+1)ϖk+1\displaystyle+\frac{1}{2}\bm{\Psi}_{13}(\tau)(\bm{\mathcal{J}}^{-1}_{k,k+1}\bm{\varpi}_{k+1})^{\curlywedge}\bm{\varpi}_{k+1} (57)
𝑻(τ)\displaystyle\bm{T}(\tau) =𝑻kexp((𝝃k,τ))\displaystyle=\bm{T}_{k}\exp((\bm{\xi}_{k,\tau})^{\land}) (58)

Notice that 𝝃˙k,τ\dot{\bm{\xi}}_{k,\tau} and 𝝃¨k,τ\ddot{\bm{\xi}}_{k,\tau} have the same forms as 𝝃k,τ\bm{\xi}_{k,\tau}, but have different rows of matrix coefficients. With (8) and (9), we can resolve ϖ(τ)\bm{\varpi}(\tau) and ϖ˙(τ)\dot{\bm{\varpi}}(\tau).

We first derive the perturbing Jacobians of 𝑻(τ)\bm{T}(\tau) w.r.t 𝑻k\bm{T}_{k} and 𝝃k,τ\bm{\xi}_{k,\tau}, respectively. Then, the perturbing Jacobians of 𝝃k,τ\bm{\xi}_{k,\tau} w.r.t 𝒙(k)\bm{x}(k) and 𝒙(k+1)\bm{x}(k+1) are given.

log(𝑻(τ))\displaystyle\log(\bm{T}(\tau))^{\vee} =log(𝑻¯(τ)exp(ϵτ))\displaystyle=\log(\bar{\bm{T}}(\tau)\exp(\bm{\epsilon}_{\tau}^{\land}))^{\vee}
log(𝑻¯(τ))+𝓙1ϵτ\displaystyle\approx\log(\bar{\bm{T}}(\tau))^{\vee}+\bm{\mathcal{J}}^{-1}\bm{\epsilon}_{\tau} (59)
log(𝑻(τ))\displaystyle\log(\bm{T}(\tau))^{\vee} =log(𝑻kexp((𝝃k,τ)))\displaystyle=\log(\bm{T}_{k}\exp((\bm{\xi}_{k,\tau})^{\land}))^{\vee}
=log(𝑻kexp((𝝃¯k,τ+δ𝝃τ)))\displaystyle=\log(\bm{T}_{k}\exp((\bar{\bm{\xi}}_{k,\tau}+\delta\bm{\xi}_{\tau})^{\land}))^{\vee}
=log(𝑻kexp((𝝃¯k,τ)))+𝓙1𝓙(𝝃¯k,τ)δ𝝃τ\displaystyle=\log(\bm{T}_{k}\exp((\bar{\bm{\xi}}_{k,\tau})^{\land}))^{\vee}+\bm{\mathcal{J}}^{-1}\bm{\mathcal{J}}(\bar{\bm{\xi}}_{k,\tau})\delta\bm{\xi}_{\tau}
=log(𝑻¯(τ))+𝓙1𝓙(𝝃¯k,τ)δ𝝃τ\displaystyle=\log(\bar{\bm{T}}(\tau))^{\vee}+\bm{\mathcal{J}}^{-1}\bm{\mathcal{J}}(\bar{\bm{\xi}}_{k,\tau})\delta\bm{\xi}_{\tau} (60)
log(𝑻(τ))\displaystyle\log(\bm{T}(\tau))^{\vee} =log(𝑻¯kexp(ϵk)exp(𝝃k,τ))\displaystyle=\log(\bar{\bm{T}}_{k}\exp(\bm{\epsilon}_{k}^{\land})\exp(\bm{\xi}_{k,\tau}^{\land}))^{\vee}
=log(𝑻¯(τ))+𝓙1(𝓣(𝝃k,τ))1ϵk\displaystyle=\log(\bar{\bm{T}}(\tau))^{\vee}+\bm{\mathcal{J}}^{-1}(\bm{\mathcal{T}}(\bm{\xi}_{k,\tau}))^{-1}\bm{\epsilon}_{k} (61)

According the chain rule, we can define the Jacobians of interpolated state w.r.t 𝑻k\bm{T}_{k} and 𝝃k,τ\bm{\xi}_{k,\tau} as

𝑻(τ)𝝃k,τ\displaystyle\frac{\partial\bm{T}(\tau)}{\partial\bm{\xi}_{k,\tau}} =𝑻kexp((𝝃¯k,τ+δ𝝃τ))δ𝝃τ=𝓙(𝝃¯k,τ),\displaystyle=\frac{\partial\bm{T}_{k}\exp((\bar{\bm{\xi}}_{k,\tau}+\delta\bm{\xi}_{\tau})^{\land})}{\partial\delta\bm{\xi}_{\tau}}=\bm{\mathcal{J}}(\bar{\bm{\xi}}_{k,\tau}), (62)
𝑻(τ)𝑻k\displaystyle\frac{\partial\bm{T}(\tau)}{\partial\bm{T}_{k}} =𝑻¯kexp(ϵk)exp(𝝃k,τ)ϵk=(𝓣(𝝃k,τ))1.\displaystyle=\frac{\partial\bar{\bm{T}}_{k}\exp(\bm{\epsilon}_{k}^{\land})\exp(\bm{\xi}_{k,\tau}^{\land})}{\partial\bm{\epsilon}_{k}}=(\bm{\mathcal{T}}(\bm{\xi}_{k,\tau}))^{-1}. (63)

Perturbing the right-hand side of (8) with δ𝝃\delta\bm{\xi} and δ𝝃˙\delta\dot{\bm{\xi}}, we have

ϖ(τ)\displaystyle\bm{\varpi}(\tau) =𝓙(𝝃k,τ)𝝃˙k,τ\displaystyle=\bm{\mathcal{J}}(\bm{\xi}_{k,\tau})\dot{\bm{\xi}}_{k,\tau} (64)
=𝓙(𝝃¯k,τ+δ𝝃)(𝝃˙¯k,τ+δ𝝃˙)\displaystyle=\bm{\mathcal{J}}(\bar{\bm{\xi}}_{k,\tau}+\delta\bm{\xi})(\bar{\dot{\bm{\xi}}}_{k,\tau}+\delta\dot{\bm{\xi}}) (65)
(𝓙(𝝃¯k,τ)12δ𝝃)(𝝃˙¯k,τ+δ𝝃˙)\displaystyle\approx\left(\bm{\mathcal{J}}(\bar{\bm{\xi}}_{k,\tau})-\frac{1}{2}\delta\bm{\xi}^{\curlywedge}\right)(\bar{\dot{\bm{\xi}}}_{k,\tau}+\delta\dot{\bm{\xi}}) (66)
𝓙k,τ,op𝝃˙¯k,τ+12𝝃˙¯k,τδ𝝃+𝓙k,τ,opδ𝝃˙.\displaystyle\approx\bm{\mathcal{J}}_{k,\tau,op}\bar{\dot{\bm{\xi}}}_{k,\tau}+\frac{1}{2}\bar{\dot{\bm{\xi}}}_{k,\tau}^{\curlywedge}\delta\bm{\xi}+\bm{\mathcal{J}}_{k,\tau,op}\delta\dot{\bm{\xi}}. (67)

On this basis, the Jacobians of ϖ(τ)\bm{\varpi}(\tau) w.r.t the local variables are given by

ϖ(τ)𝝃k,τ\displaystyle\frac{\partial\bm{\varpi}(\tau)}{\partial\bm{\xi}_{k,\tau}} =12𝝃˙¯k,τ\displaystyle=\frac{1}{2}\bar{\dot{\bm{\xi}}}_{k,\tau}^{\curlywedge} (68)
ϖ(τ)𝝃˙k,τ\displaystyle\frac{\partial\bm{\varpi}(\tau)}{\partial\dot{\bm{\xi}}_{k,\tau}} =𝓙k,τ,op\displaystyle=\bm{\mathcal{J}}_{k,\tau,op} (69)
ϖ(τ)𝝃¨k,τ\displaystyle\frac{\partial\bm{\varpi}(\tau)}{\partial\ddot{\bm{\xi}}_{k,\tau}} =𝟎\displaystyle=\bm{0} (70)

Calculating the derivative on both sides of (8) yields

ϖ˙(τ)\displaystyle\dot{\bm{\varpi}}(\tau) =𝓙k,τ𝝃¨k,τ+ddt𝓙k,τ𝝃˙k,τ\displaystyle=\bm{\mathcal{J}}_{k,\tau}\ddot{\bm{\xi}}_{k,\tau}+\frac{d}{dt}\bm{\mathcal{J}}_{k,\tau}\dot{\bm{\xi}}_{k,\tau}
𝓙k,τ𝝃¨k,τ12𝝃˙k,τ𝝃˙k,τ\displaystyle\approx\bm{\mathcal{J}}_{k,\tau}\ddot{\bm{\xi}}_{k,\tau}-\frac{1}{2}\dot{\bm{\xi}}_{k,\tau}^{\curlywedge}\dot{\bm{\xi}}_{k,\tau}
=𝓙k,τ𝝃¨k,τ.\displaystyle=\bm{\mathcal{J}}_{k,\tau}\ddot{\bm{\xi}}_{k,\tau}. (71)

Therefore, we have the Jacobians as

ϖ˙(τ)𝝃k,τ\displaystyle\frac{\partial\dot{\bm{\varpi}}(\tau)}{\partial\bm{\xi}_{k,\tau}} =12𝝃¨¯k,τ\displaystyle=\frac{1}{2}\bar{\ddot{\bm{\xi}}}_{k,\tau}^{\curlywedge} (72)
ϖ˙(τ)𝝃˙k,τ\displaystyle\frac{\partial\dot{\bm{\varpi}}(\tau)}{\partial\dot{\bm{\xi}}_{k,\tau}} =𝟎\displaystyle=\bm{0} (73)
ϖ˙(τ)𝝃¨k,τ\displaystyle\frac{\partial\dot{\bm{\varpi}}(\tau)}{\partial\ddot{\bm{\xi}}_{k,\tau}} =𝓙k,τ,op\displaystyle=\bm{\mathcal{J}}_{k,\tau,op} (74)

Perturbing the right-hand side of (56) with δϖk\delta\bm{\varpi}_{k}, δϖk+1\delta\bm{\varpi}_{k+1} and δ𝝃k,k+1\delta\bm{\xi}_{k,k+1}, we have

𝝃k,τ\displaystyle\bm{\xi}_{k,\tau} =𝝃k,τ,op+𝚲12δϖk+𝚲13δϖ˙k+𝚿11(τ)δ𝝃k,k+1\displaystyle=\bm{\xi}_{k,\tau,op}+\bm{\Lambda}_{12}\delta\bm{\varpi}_{k}+\bm{\Lambda}_{13}\delta\dot{\bm{\varpi}}_{k}+\bm{\Psi}_{11}(\tau)\delta\bm{\xi}_{k,k+1}
+𝚿12(τ)𝓙k,k+1,op1δϖk+112𝚿12(τ)ϖ¯k+1δ𝝃k,k+1\displaystyle+\bm{\Psi}_{12}(\tau)\bm{\mathcal{J}}_{k,k+1,op}^{-1}\delta\bm{\varpi}_{k+1}-\frac{1}{2}\bm{\Psi}_{12}(\tau)\bar{\bm{\varpi}}_{k+1}^{\curlywedge}\delta\bm{\xi}_{k,k+1}
+𝚿13(τ)𝓙k,k+1,op1δϖ˙k+112𝚿13(τ)ϖ˙¯k+1δ𝝃k,k+1\displaystyle+\bm{\Psi}_{13}(\tau)\bm{\mathcal{J}}_{k,k+1,op}^{-1}\delta\dot{\bm{\varpi}}_{k+1}-\frac{1}{2}\bm{\Psi}_{13}(\tau)\bar{\dot{\bm{\varpi}}}_{k+1}^{\curlywedge}\delta\bm{\xi}_{k,k+1}
+12𝚿13(τ)(𝓙k,k+1,op1ϖ¯k+1)δϖk+1\displaystyle+\frac{1}{2}\bm{\Psi}_{13}(\tau)(\bm{\mathcal{J}}_{k,k+1,op}^{-1}\bar{\bm{\varpi}}_{k+1})^{\curlywedge}\delta\bm{\varpi}_{k+1}
12𝚿13(τ)ϖ¯k+1𝓙k,k+1,op1δϖk+1\displaystyle-\frac{1}{2}\bm{\Psi}_{13}(\tau)\bar{\bm{\varpi}}_{k+1}^{\curlywedge}\bm{\mathcal{J}}_{k,k+1,op}^{-1}\delta\bm{\varpi}_{k+1}
+14𝚿13(τ)ϖ¯k+1ϖ¯k+1δ𝝃k,k+1\displaystyle+\frac{1}{4}\bm{\Psi}_{13}(\tau)\bar{\bm{\varpi}}_{k+1}^{\curlywedge}\bar{\bm{\varpi}}_{k+1}^{\curlywedge}\delta\bm{\xi}_{k,k+1} (75)

Therefore, we have Jacobians as follows:

𝝃k,τϵk+1\displaystyle\frac{\partial\bm{\xi}_{k,\tau}}{\partial\bm{\epsilon}_{k+1}} =𝚿11(τ)𝓙k,k+1,op1\displaystyle=\bm{\Psi}_{11}(\tau)\bm{\mathcal{J}}_{k,k+1,op}^{-1}
12𝚿12(τ)ϖ¯k+1𝓙k,k+1,op1\displaystyle-\frac{1}{2}\bm{\Psi}_{12}(\tau)\bar{\bm{\varpi}}_{k+1}^{\curlywedge}\bm{\mathcal{J}}_{k,k+1,op}^{-1}
12𝚿13(τ)ϖ˙¯k+1𝓙k,k+1,op1\displaystyle-\frac{1}{2}\bm{\Psi}_{13}(\tau)\bar{\dot{\bm{\varpi}}}_{k+1}^{\curlywedge}\bm{\mathcal{J}}_{k,k+1,op}^{-1}
+14𝚿13(τ)ϖ¯k+1ϖ¯k+1𝓙k,k+1,op1\displaystyle+\frac{1}{4}\bm{\Psi}_{13}(\tau)\bar{\bm{\varpi}}_{k+1}^{\curlywedge}\bar{\bm{\varpi}}_{k+1}^{\curlywedge}\bm{\mathcal{J}}_{k,k+1,op}^{-1} (76)
𝝃k,τϵk\displaystyle\frac{\partial\bm{\xi}_{k,\tau}}{\partial\bm{\epsilon}_{k}} =𝝃k,τϵk+1𝓣k,k+1,op1\displaystyle=-\frac{\partial\bm{\xi}_{k,\tau}}{\partial\bm{\epsilon}_{k+1}}\bm{\mathcal{T}}_{k,k+1,op}^{-1} (77)
𝝃k,τδϖk+1\displaystyle\frac{\partial\bm{\xi}_{k,\tau}}{\partial\delta\bm{\varpi}_{k+1}} =𝚿12(τ)𝓙k,k+1,op1\displaystyle=\bm{\Psi}_{12}(\tau)\bm{\mathcal{J}}_{k,k+1,op}^{-1}
+12𝚿13(τ)(𝓙k,k+1,op1ϖ¯k+1)\displaystyle+\frac{1}{2}\bm{\Psi}_{13}(\tau)(\bm{\mathcal{J}}_{k,k+1,op}^{-1}\bar{\bm{\varpi}}_{k+1})^{\curlywedge}
12𝚿13(τ)ϖ¯k+1𝓙k,k+1,op1\displaystyle-\frac{1}{2}\bm{\Psi}_{13}(\tau)\bar{\bm{\varpi}}_{k+1}^{\curlywedge}\bm{\mathcal{J}}_{k,k+1,op}^{-1} (78)
𝝃k,τδϖk\displaystyle\frac{\partial\bm{\xi}_{k,\tau}}{\partial\delta\bm{\varpi}_{k}} =𝚲12\displaystyle=\bm{\Lambda}_{12} (79)
𝝃k,τδϖ˙k\displaystyle\frac{\partial\bm{\xi}_{k,\tau}}{\partial\delta\dot{\bm{\varpi}}_{k}} =𝚲13\displaystyle=\bm{\Lambda}_{13} (80)
𝝃k,τδϖ˙k+1\displaystyle\frac{\partial\bm{\xi}_{k,\tau}}{\partial\delta\dot{\bm{\varpi}}_{k+1}} =𝚿13(τ)𝓙k,k+1,op1\displaystyle=\bm{\Psi}_{13}(\tau)\bm{\mathcal{J}}_{k,k+1,op}^{-1} (81)

VII-E Jacobians of Visual Residuals

[𝜿ρ]\displaystyle\begin{bmatrix}\bm{\kappa}\\ \rho\end{bmatrix}^{\odot} =[𝜿ρ𝟏𝟎𝟎]\displaystyle=\begin{bmatrix}-\bm{\kappa}^{\land}&\rho\bm{1}\\ \bm{0}^{\top}&\bm{0}^{\top}\end{bmatrix} (82)
𝝃[𝜿ρ]\displaystyle\bm{\xi}^{\land}\begin{bmatrix}\bm{\kappa}\\ \rho\end{bmatrix} [𝜿ρ]𝝃\displaystyle\equiv\begin{bmatrix}\bm{\kappa}\\ \rho\end{bmatrix}^{\odot}\bm{\xi} (83)
𝑻121[𝜿ρ]\displaystyle\bm{T}_{12}^{-1}\begin{bmatrix}\bm{\kappa}\\ \rho\end{bmatrix} =(𝑻11𝑻2)1[𝜿ρ]\displaystyle=\left(\bm{T}_{1}^{-1}\bm{T}_{2}\right)^{-1}\begin{bmatrix}\bm{\kappa}\\ \rho\end{bmatrix}
=((𝑻¯1exp(ϵ1))1𝑻¯2exp(ϵ2))1[𝜿ρ]\displaystyle=\left((\bar{\bm{T}}_{1}\exp(\bm{\epsilon}_{1}^{\land}))^{-1}\bar{\bm{T}}_{2}\exp(\bm{\epsilon}_{2}^{\land})\right)^{-1}\begin{bmatrix}\bm{\kappa}\\ \rho\end{bmatrix}
=exp(ϵ2)𝑻¯21𝑻¯1exp(ϵ1)[𝜿ρ]\displaystyle=\exp(-\bm{\epsilon}_{2}^{\land})\bar{\bm{T}}_{2}^{-1}\bar{\bm{T}}_{1}\exp(\bm{\epsilon}_{1}^{\land})\begin{bmatrix}\bm{\kappa}\\ \rho\end{bmatrix}
(𝟏ϵ2)𝑻¯121(𝟏+ϵ1)[𝜿ρ]\displaystyle\approx\left(\bm{1}-\bm{\epsilon}_{2}^{\land}\right)\bar{\bm{T}}_{12}^{-1}\left(\bm{1}+\bm{\epsilon}_{1}^{\land}\right)\begin{bmatrix}\bm{\kappa}\\ \rho\end{bmatrix}
𝑻¯121[𝜿ρ]ϵ2𝑻¯121[𝜿ρ]+𝑻¯121ϵ1[𝜿ρ]\displaystyle\approx\bar{\bm{T}}_{12}^{-1}\begin{bmatrix}\bm{\kappa}\\ \rho\end{bmatrix}-\bm{\epsilon}_{2}^{\land}\bar{\bm{T}}_{12}^{-1}\begin{bmatrix}\bm{\kappa}\\ \rho\end{bmatrix}+\bar{\bm{T}}_{12}^{-1}\bm{\epsilon}_{1}^{\land}\begin{bmatrix}\bm{\kappa}\\ \rho\end{bmatrix}
=𝑻¯121[𝜿ρ](𝑻¯121[𝜿ρ])ϵ2+𝑻¯121[𝜿ρ]ϵ1\displaystyle=\bar{\bm{T}}_{12}^{-1}\begin{bmatrix}\bm{\kappa}\\ \rho\end{bmatrix}-\left(\bar{\bm{T}}_{12}^{-1}\begin{bmatrix}\bm{\kappa}\\ \rho\end{bmatrix}\right)^{\odot}\bm{\epsilon}_{2}+\bar{\bm{T}}_{12}^{-1}\begin{bmatrix}\bm{\kappa}\\ \rho\end{bmatrix}^{\odot}\bm{\epsilon}_{1} (84)
𝑻121[𝜿ρ]\displaystyle\bm{T}_{12}^{-1}\begin{bmatrix}\bm{\kappa}\\ \rho\end{bmatrix} =𝑻121[𝜿ρ¯+δρ]\displaystyle=\bm{T}_{12}^{-1}\begin{bmatrix}\bm{\kappa}\\ \bar{\rho}+\delta\rho\end{bmatrix}
=𝑻121[𝜿ρ¯]+𝑻121[𝟎1]δρ\displaystyle=\bm{T}_{12}^{-1}\begin{bmatrix}\bm{\kappa}\\ \bar{\rho}\end{bmatrix}+\bm{T}_{12}^{-1}\begin{bmatrix}\bm{0}\\ 1\end{bmatrix}\delta\rho (85)
𝒉2=𝑷𝑻121[𝜿ρ]\bm{h}_{2}^{\dagger}=\bm{P}\bm{T}_{12}^{-1}\begin{bmatrix}\bm{\kappa}\\ \rho\end{bmatrix} (86)
𝒉2𝑻1\displaystyle\frac{\partial\bm{h}_{2}^{\dagger}}{\partial\bm{T}_{1}} =𝑷𝑻¯121[𝜿ρ]\displaystyle=\bm{P}\bar{\bm{T}}_{12}^{-1}\begin{bmatrix}\bm{\kappa}\\ \rho\end{bmatrix}^{\odot} (87)
𝒉2𝑻2\displaystyle\frac{\partial\bm{h}_{2}^{\dagger}}{\partial\bm{T}_{2}} =𝑷(𝑻¯121[𝜿ρ])\displaystyle=-\bm{P}\left(\bar{\bm{T}}_{12}^{-1}\begin{bmatrix}\bm{\kappa}\\ \rho\end{bmatrix}\right)^{\odot} (88)
𝒉2ρ\displaystyle\frac{\partial\bm{h}_{2}^{\dagger}}{\partial\rho} =𝑷𝑻121[𝟎1]\displaystyle=\bm{P}\bm{T}_{12}^{-1}\begin{bmatrix}\bm{0}\\ 1\end{bmatrix} (89)
𝒒2𝒉¯2=[fx00fy][1h2,z0h2,xh2,z201h2,zh2,yh2,z2]\frac{\partial\bm{q}_{2}}{\partial\bar{\bm{h}}_{2}}=\begin{bmatrix}f_{x}&0\\ 0&f_{y}\end{bmatrix}\begin{bmatrix}\frac{1}{h_{2,z}}&0&-\frac{h_{2,x}}{h_{2,z}^{2}}\\ 0&\frac{1}{h_{2,z}}&-\frac{h_{2,y}}{h_{2,z}^{2}}\\ \end{bmatrix} (90)
𝑻1\displaystyle\bm{T}_{1} =𝑻ib1𝑻bc\displaystyle=\bm{T}_{ib1}\bm{T}_{bc} (91)
𝑻1𝑻ib1\displaystyle\frac{\partial\bm{T}_{1}}{\partial\bm{T}_{ib1}} =Ad(𝑻bc)1\displaystyle=Ad(\bm{T}_{bc})^{-1} (92)