

Data-Driven Protection Levels for Camera and 3D Map-based Safe Urban Localization

Shubh Gupta and Grace Xingxin Gao    Stanford University

Abstract

Reliably assessing the error in an estimated vehicle position is integral for ensuring the vehicle’s safety in urban environments. Many existing approaches use GNSS measurements to characterize protection levels (PLs) as probabilistic upper bounds on the position error. However, GNSS signals might be reflected or blocked in urban environments, and thus additional sensor modalities need to be considered to determine PLs. In this paper, we propose an approach for computing PLs by matching camera image measurements to a LiDAR-based 3D map of the environment. We specify a Gaussian mixture model probability distribution of position error using deep neural network-based data-driven models and statistical outlier weighting techniques. From the probability distribution, we compute the PLs by evaluating the position error bound using numerical line-search methods. Through experimental validation with real-world data, we demonstrate that the PLs computed from our method are reliable bounds on the position error in urban environments.

1 Introduction

In recent years, research on autonomous navigation for urban environments has been garnering increasing attention. Many publications have targeted different aspects of navigation such as route planning [1], perception [2] and localization [3, 4]. For trustworthy operation in each of these aspects, assessing the level of safety of the vehicle from potential system failures is critical. However, fewer works have examined the problem of safety quantification for autonomous vehicles.

In the context of satellite-based localization, safety is typically addressed via integrity monitoring (IM) [5]. Within IM, protection levels specify a statistical upper bound on the error in an estimated position of the vehicle, which can be trusted to enclose the position errors with a required probabilistic guarantee. To detect an unsafe estimated vehicle position, these protection levels are compared with the maximum allowable position error value, known as the alarm limit. Various methods [6, 7, 8] have been proposed over the years for computing protection levels; however, most of these approaches focus on GNSS-only navigation. These approaches do not directly apply to GNSS-denied urban environments, where visual sensors are becoming increasingly preferred [9]. Although various options in visual sensors exist in the market, camera sensors are inexpensive, lightweight, and have been widely employed in industry. For quantifying localization safety in GNSS-denied urban environments, there is thus a need to develop new ways of computing protection levels using camera image measurements.

Since protection levels are bounds over the position error, computing them from camera image measurements requires a model that relates the measurements to the position error in the estimate of the vehicle location. Furthermore, since the lateral, longitudinal and vertical directions are well-defined with respect to a vehicle’s location on the road, the model must estimate the maximum position error in each of these directions for computing protection levels [10]. However, characterizing such a model is not straightforward. This is because the relation between a vehicle location in an environment and the corresponding camera image measurement is complex: it depends on identifying and matching structural patterns in the measurements with prior known information about the environment [3, 4, 11, 12].

Recently, data-driven techniques based on deep neural networks (DNNs) have demonstrated state-of-the-art performance in determining the state of the camera sensor, comprising its position and orientation, by identifying and matching patterns in images with a known map of the environment [13, 14, 15] or an existing database of images [16, 11]. By leveraging datasets consisting of multiple images with known camera states in an environment, these approaches train a DNN to model the relationship between an image and the corresponding state. However, the model characterized by the DNN can often be erroneous or brittle. For instance, recent research has shown that the output of a DNN can change significantly with minimal changes to the inputs [17]. Thus, for using DNNs to determine the position error, uncertainty in the output of the DNN must also be addressed.

DNN-based algorithms consider two types of uncertainty [18, 19]. Aleatoric or statistical uncertainty results from the noise present in the inputs to the DNN, due to which a precise output cannot be produced. For camera image inputs, sources of noise include illumination changes, occlusion or the presence of visually ambiguous structures, such as windows tessellated along a wall [18]. On the other hand, epistemic or systematic uncertainty exists within the model itself. Sources of epistemic uncertainty include poorly determined DNN model parameters as well as external factors that are not considered in the model [20], such as environmental features that might be ignored by the algorithm while matching the camera images to the environment map.

While aleatoric uncertainty is typically modeled as the input-dependent variance in the output of the DNN [18, 21, 22], epistemic uncertainty relates to the DNN model and, therefore, requires further deliberation. Existing approaches approximate epistemic uncertainty by assuming a probability distribution over the weight parameters of the DNN to represent the ignorance about the correct parameters [23, 24, 25]. However, these approaches assume that a correct value of the parameters exists and that the probability distribution over the weight parameters captures the uncertainty in the model, both of which do not necessarily hold in practice [26]. This inability of existing DNN-based methods to properly characterize uncertainty limits their applicability to safety-critical applications, such as localization of autonomous vehicles.

In this paper, we propose a novel method for computing protection levels associated with a given vehicular state estimate (position and orientation) from camera image measurements and a 3D map of the environment. This work is based on our recent ION GNSS+ 2020 conference paper [27] and includes additional experiments and improvements to the DNN training process. Recently, high-definition 3D environment maps in the form of LiDAR point clouds have become increasingly available through industry players such as HERE, TomTom, Waymo and NVIDIA, as well as through projects such as USGS 3DEP [28] and OpenTopography [29]. Furthermore, LiDAR-based 3D maps are more robust to noise from environmental factors, such as illumination and weather, than image-based maps [30]. Hence, we use LiDAR-based 3D point cloud maps in our approach.

Previously, CMRNet [15] has been proposed as a DNN-based approach for determining the vehicular state from camera images and a LiDAR-based 3D map. In our approach, we extend the DNN architecture proposed in [15] to model the position error and the covariance matrix (aleatoric uncertainty) in the vehicular state estimate. To assess the epistemic uncertainty in the position error, we evaluate the DNN position error outputs at multiple candidate states in the vicinity of the state estimate, and combine the outputs into samples of the state estimate position error. Fig. 1 shows the architecture of our proposed approach. Given a state estimate, we first select multiple candidate states from its neighborhood. Using the DNN, we then evaluate the position error and covariance for each candidate state by comparing the camera image measurement with a local map constructed from the candidate state and the 3D environment map. Next, we linearly transform the position error and covariance outputs from the DNN with the relative positions of candidate states into samples of the state estimate position error and variance. We then separate these samples into the lateral, longitudinal and vertical directions and weight the samples to mitigate the impact of outliers in each direction. Subsequently, we combine the position error samples, outlier weights, and variance samples to construct a Gaussian mixture model probability distribution of the position error in each direction, and numerically evaluate its intervals to compute protection levels.

Our main contributions are as follows:

  1. We extend the CMRNet [15] architecture to model both the position error in the vehicular state estimate and the associated covariance matrix. Using the 3D LiDAR-based map of the environment, we first construct a local map representation with respect to the vehicular state estimate. Then, we use the DNN to analyze the correspondence between the camera image measurement and the local map for determining the position error and the covariance matrix.

  2. We develop a novel method for capturing epistemic uncertainty in the DNN position error output. Unlike existing approaches which assume a probability distribution over DNN weight parameters, we directly analyze different position errors that are determined by the DNN for multiple candidate states selected from within a neighborhood of the state estimate. The position error outputs from the DNN corresponding to the candidate states are then linearly combined with the candidate states’ relative position from the state estimate, to obtain an empirical distribution of the state estimate position error.

  3. We design an outlier weighting scheme to account for possible errors in the DNN output at inputs that differ from the training data. Our approach weighs the position error samples from the empirical distribution using a robust outlier detection metric, known as the robust Z-score [31], along the lateral, longitudinal and vertical directions individually.

  4. We construct the lateral, longitudinal and vertical protection levels as intervals over the probability distribution of the position error. We model this probability distribution as a Gaussian Mixture Model [32] from the position error samples, DNN covariance and outlier weights.

  5. We demonstrate the applicability of our approach in urban environments by experimentally validating the protection levels computed from our method on real-world data with multiple camera images and different state estimates.

Figure 1: Architecture of our proposed approach for computing protection levels. Given a state estimate, multiple candidate states are selected from its neighborhood and the corresponding position error and the covariance matrix for each candidate state are evaluated using the DNN. The position errors and covariance are then linearly transformed to obtain samples of the state estimate position error and variance, which are then weighted to determine outliers. Finally, the position error samples, outlier weights and variance are combined to construct a Gaussian Mixture Model probability distribution, from which the lateral, longitudinal and vertical protection levels are computed through numerical evaluation of its probability intervals.

The remainder of this paper is structured as follows: Section II discusses related work. Section III formulates the problem of estimating protection levels. Section IV describes the two types of uncertainties considered in our approach. Section V details our algorithm. Section VI presents the results from experimentation with real-world data. We conclude the paper in Section VII.

2 Related Work

Several methods have been developed over the years which characterize protection levels in the context of GNSS-based urban navigation. Jiang and Wang [6] compute horizontal protection levels using an iterative search-based method and test statistic based on the bivariate normal distribution. Cezón et al. [7] analyze methods which utilize the isotropy of residual vectors from the least-squares position estimation to compute the protection levels. Tran and Presti [8] combine Advanced Receiver Autonomous Integrity Monitoring (ARAIM) with Kalman filtering, and compute the protection levels by considering the set of position solutions which arise after excluding faulty measurements. These approaches compute the protection levels by deriving the mathematical relation between measurement and position domain errors. However, such a relation is difficult to formulate with camera image measurements and a LiDAR-based 3D map, since the position error in this case depends on various factors such as the structure of buildings in the environment, available visual features and illumination levels.

Previous works have proposed IM approaches for LiDAR and camera-based navigation where the vehicle is localized by associating identified landmarks with a stored map or a database. Joerger et al. [33] developed a method to quantify integrity risk for LiDAR-based navigation algorithms by analyzing failures of feature extraction and data association subroutines. Zhu et al. [34] derived a bound on the integrity risk in camera-based navigation with an Extended Kalman Filter (EKF) caused by incorrect feature associations. However, these IM approaches have been developed for localization algorithms based on data association and cannot be directly applied to many recent camera and LiDAR-based localization techniques which use deep learning to model the complex relation between measurements and the stored map or the database. Furthermore, these IM techniques do not estimate protection levels, which is the focus of our work.

Deep learning has been widely applied for determining position information from camera images. Kendall et al. [35] train a DNN using images from a single environment to learn a relation between the image and the camera 6-DOF pose. Taira et al. [11] learn image features using a DNN and apply feature extraction and matching techniques to estimate the 6-DOF camera pose relative to a known 3D map of the environment. Sarlin et al. [16] develop a deep learning-based 2D-3D matching technique to obtain 6-DOF camera pose from images and a 3D environment model. However, these approaches do not model the corresponding uncertainty associated with the estimated camera pose, or account for failures in DNN approximation [26], which is necessary for characterizing safety measures such as protection levels.

Some recent works have proposed to estimate the uncertainty associated with deep learning algorithms. Kendall and Cipolla [23] estimate the uncertainty in DNN-based camera pose estimation from images, by evaluating the network multiple times through dropout [24]. Loquercio et al. [19] propose a general framework for estimating uncertainty in deep learning as variance computed from both aleatoric and epistemic sources. McAllister et al. [21] suggest using Bayesian deep learning to determine uncertainty and quantify safety in autonomous vehicles, by placing probability distributions over DNN weights to represent the uncertainty in the DNN model. Yang et al. [22] jointly estimate the vehicle odometry, scene depth and uncertainty from sequential camera images. However, the uncertainty estimates from these algorithms do not take into account the inaccuracy of the trained DNN model, or the influence of the underlying environment structure on the DNN outputs. In our approach, we evaluate the DNN position error outputs at inputs corresponding to multiple states in the environment, and utilize these position errors for characterizing uncertainty both from inaccuracy in the DNN model as well as from the environment structure around the state estimate.

To the best of our knowledge, our approach is the first that applies data-driven algorithms for computing protection levels by characterizing the uncertainty from different error sources. The proposed method seeks to leverage the high-fidelity function modeling capability of DNNs and combine it with techniques from robust statistics and integrity monitoring to compute robust protection levels using camera image measurements and a 3D map of the environment.

3 Problem Formulation

We consider the scenario of a vehicle navigating in an urban environment using measurements acquired by an on-board camera. The 3D LiDAR map of the environment $\mathcal{M}$, which consists of points $\mathbf{p}\in\mathbb{R}^{3}$, is assumed to be known a priori from either openly available repositories [28, 29] or from Simultaneous Localization and Mapping algorithms [36].

The vehicular state $\mathbf{s}_{t}=[\mathbf{x}_{t},\mathbf{o}_{t}]$ at time $t$ is a 7-element vector comprising its 3D position $\mathbf{x}_{t}=[x_{t},y_{t},z_{t}]^{\top}\in\mathbb{R}^{3}$ along the $x$, $y$ and $z$-dimensions and its 3D orientation unit quaternion $\mathbf{o}_{t}=[o_{1,t},o_{2,t},o_{3,t},o_{4,t}]\in\textrm{SU}(2)$. The vehicle state estimates over time are denoted as $\{\mathbf{s}_{t}\}_{t=1}^{T_{\text{max}}}$, where $T_{\text{max}}$ denotes the total time in a navigation sequence. At each time $t$, the vehicle captures an RGB camera image $I_{t}\in\mathbb{R}^{l\times w\times 3}$ from the on-board camera, where $l$ and $w$ denote pixels along the length and width dimensions, respectively.

Given an integrity risk specification $IR$, our objective is to compute the lateral protection level $PL_{lat,t}$, longitudinal protection level $PL_{lon,t}$, and vertical protection level $PL_{vert,t}$ at time $t$, which denote the maximal bounds on the position error magnitude with a probabilistic guarantee of at least $1-IR$. Considering the $x$, $y$ and $z$-dimensions in the rotational frame of the vehicle,

$$\begin{aligned} PL_{lat,t} &= \sup\left\{\rho \mid \mathbb{P}\left(|x_{t}-x^{*}_{t}|\leq\rho\right)\leq 1-IR\right\} \\ PL_{lon,t} &= \sup\left\{\rho \mid \mathbb{P}\left(|y_{t}-y^{*}_{t}|\leq\rho\right)\leq 1-IR\right\} \\ PL_{vert,t} &= \sup\left\{\rho \mid \mathbb{P}\left(|z_{t}-z^{*}_{t}|\leq\rho\right)\leq 1-IR\right\}, \end{aligned} \quad (1)$$

where $\mathbf{x}^{*}_{t}=[x^{*}_{t},y^{*}_{t},z^{*}_{t}]$ denotes the unknown true vehicle position at time $t$.

4 Types of Uncertainty in Position Error

Protection levels for a state estimate $\mathbf{s}_{t}$ at time $t$ depend on the uncertainty in determining the associated position error $\Delta\mathbf{x}_{t}=[\Delta x_{t},\Delta y_{t},\Delta z_{t}]$ between the state estimate position $\mathbf{x}_{t}$ and the true position $\mathbf{x}^{*}_{t}$ from the camera image $I_{t}$ and the environment map $\mathcal{M}$. We consider two different kinds of uncertainty, which are categorized by the source of inaccuracy in determining the position error $\Delta\mathbf{x}_{t}$: aleatoric and epistemic uncertainty.

4.1 Aleatoric Uncertainty

Aleatoric uncertainty refers to the uncertainty from noise present in the camera image measurements $I_{t}$ and the environment map $\mathcal{M}$, due to which a precise value of the position error $\Delta\mathbf{x}_{t}$ cannot be determined. Existing DNN-based localization approaches model the aleatoric uncertainty as a covariance matrix with only diagonal entries [18, 21, 22] or with both diagonal and off-diagonal terms [37, 38]. Similar to the existing approaches, we characterize the aleatoric uncertainty by using a DNN to model the covariance matrix $\Sigma_{t}$ associated with the position error $\Delta\mathbf{x}_{t}$. We consider both nonzero diagonal and off-diagonal terms in $\Sigma_{t}$ to model the correlation between the $x$, $y$ and $z$-dimension uncertainties, such as along the ground plane.

Aleatoric uncertainty by itself does not accurately represent the uncertainty in determining the position error. This is because aleatoric uncertainty assumes that the noise present in the training data also represents the noise in all future inputs and that the DNN approximation is error-free. These assumptions fail in scenarios when the input at evaluation time is different from the training data or when the input contains features that occur rarely in the real world [26]. Thus, relying purely on aleatoric uncertainty can lead to overconfident estimates of the position error uncertainty [18].

Figure 2: Position error $\Delta\mathbf{x}_{t}$ in the state estimate position $\mathbf{x}_{t}$ is a linear combination of the position error $\Delta\mathbf{x}^{i}_{t}$ in the position $\mathbf{x}^{i}_{t}$ of any candidate state $\mathbf{s}^{i}_{t}$ and the relative position vector between $\mathbf{x}^{i}_{t}$ and $\mathbf{x}_{t}$. (The figure marks the state estimate, the true state, and a candidate state.)

4.2 Epistemic Uncertainty

Epistemic uncertainty relates to the inaccuracies in the model for determining the position error $\Delta\mathbf{x}_{t}$. In our approach, we characterize the epistemic uncertainty by leveraging a geometrical property of the position error $\Delta\mathbf{x}_{t}$, where for the same camera image $I_{t}$, $\Delta\mathbf{x}_{t}$ can be obtained by linearly combining the position error $\Delta\mathbf{x}^{\prime}_{t}$ computed for any candidate state $\mathbf{s}^{\prime}_{t}$ and the relative position of $\mathbf{s}^{\prime}_{t}$ from the state estimate $\mathbf{s}_{t}$ (Fig. 2). Hence, using the known relative positions and orientations of $N_{C}$ candidate states $\{\mathbf{s}_{t}^{1},\ldots,\mathbf{s}_{t}^{N_{C}}\}$ from $\mathbf{s}_{t}$, we transform the different position errors $\{\Delta\mathbf{x}_{t}^{1},\ldots,\Delta\mathbf{x}_{t}^{N_{C}}\}$ determined for the candidate states into samples of the state estimate position error $\Delta\mathbf{x}_{t}$. The empirical distribution comprised of these position error samples characterizes the epistemic uncertainty in the position error estimated using the DNN.

5 Data-Driven Protection Levels

This section details our algorithm for computing data-driven protection levels for the state estimate $\mathbf{s}_{t}$ at time $t$, using the camera image $I_{t}$ and the environment map $\mathcal{M}$. First, we describe the method for generating local representations of the 3D environment map $\mathcal{M}$ with respect to the state estimate $\mathbf{s}_{t}$. Then, we illustrate the architecture of the DNN. Next, we discuss the loss functions used in DNN training. We then detail the method for selecting multiple candidate states from the neighborhood of the state estimate $\mathbf{s}_{t}$. Using the position errors and covariance matrices evaluated from the DNN for each of these candidate states, we then illustrate the process for transforming the candidate state position errors into multiple samples of the state estimate position error. Then, to mitigate the impact of outliers in the computed position error samples in each of the lateral, longitudinal and vertical directions, we detail the procedure for computing outlier weights. Next, we characterize the probability distribution over the position error in the lateral, longitudinal and vertical directions. Finally, we detail the approach for determining protection levels from the probability distribution by numerical methods.

5.1 Local Map Construction

A local representation of the 3D LiDAR map of the environment captures the environment information in the vicinity of the state estimate $\mathbf{s}_{t}$ at time $t$. By comparing the environment information captured in the local map with the camera image $I_{t}\in\mathbb{R}^{l\times w\times 3}$ using a DNN, we estimate the position error $\Delta\mathbf{x}_{t}$ and covariance $\Sigma_{t}$ in the state estimate $\mathbf{s}_{t}$. For computing the local maps, we utilize the LiDAR-image generation procedure described in [15]. Similar to their approach, we generate the local map $L(\mathbf{s},\mathcal{M})\in\mathbb{R}^{l\times w}$ associated with a vehicle state $\mathbf{s}$ and LiDAR environment map $\mathcal{M}$ in two steps.

  1. First, we determine the rigid-body transformation matrix $H_{\mathbf{s}}$ in the special Euclidean group $\textrm{SE}(3)$ corresponding to the vehicle state $\mathbf{s}$

$$H_{\mathbf{s}}=\left[\begin{matrix}R_{\mathbf{s}}&T_{\mathbf{s}}\\ \mathbf{0}_{1\times 3}&1\end{matrix}\right]\in\textrm{SE}(3), \quad (2)$$

     where

     1. $R_{\mathbf{s}}$ denotes the rotation matrix corresponding to the orientation quaternion elements $\mathbf{o}=[o_{1},o_{2},o_{3},o_{4}]$ in the state $\mathbf{s}$

     2. $T_{\mathbf{s}}$ denotes the translation vector corresponding to the position elements $\mathbf{x}=[x,y,z]$ in the state $\mathbf{s}$.

     Using the matrix $H_{\mathbf{s}}$, we rotate and translate the points in the map $\mathcal{M}$ to the map $\mathcal{M}_{\mathbf{s}}$ in the reference frame of the state $\mathbf{s}$

$$\mathcal{M}_{\mathbf{s}}=\left\{\left[\begin{matrix}I_{3\times 3}&\mathbf{0}_{3\times 1}\end{matrix}\right]\cdot H_{\mathbf{s}}\cdot\left[\begin{matrix}\mathbf{p}^{\top}&1\end{matrix}\right]^{\top}\;\middle|\;\mathbf{p}\in\mathcal{M}\right\}, \quad (3)$$

     where $I$ denotes the identity matrix.

     For maintaining computational efficiency in the case of large maps, we use the points in the LiDAR map $\mathcal{M}_{\mathbf{s}}$ that lie in a sub-region around the state $\mathbf{s}$ and in the direction of the vehicle orientation.

  2. In the second step, we apply the occlusion estimation filter presented in [39] to identify and remove occluded points along rays from the camera center. For each pair of points $(\mathbf{p}^{(i)},\mathbf{p}^{(j)})$ where $\mathbf{p}^{(i)}$ is closer to the state $\mathbf{s}$, $\mathbf{p}^{(j)}$ is marked occluded if the angle between the ray from $\mathbf{p}^{(j)}$ to the camera center and the line from $\mathbf{p}^{(j)}$ to $\mathbf{p}^{(i)}$ is less than a threshold. Then, the remaining points are projected to the camera image frame using the camera projection matrix $K$ to generate the local depth map $L(\mathbf{s},\mathcal{M})$. The $i$th point $\mathbf{p}^{(i)}$ in $\mathcal{M}_{\mathbf{s}}$ is projected as

$$\begin{aligned} \left[\begin{matrix}p_{x}&p_{y}&c\end{matrix}\right]^{\top} &= K\cdot\mathbf{p}^{(i)} \\ [L(\mathbf{s},\mathcal{M})]_{(\lceil p_{x}/c\rceil,\lceil p_{y}/c\rceil)} &= \left[\begin{matrix}0&0&1\end{matrix}\right]\cdot\mathbf{p}^{(i)}, \end{aligned} \quad (4)$$

     where

     1. $p_{x},p_{y}$ denote the projected 2D coordinates with scaling term $c$

     2. $[L(\mathbf{s},\mathcal{M})]_{(p_{x},p_{y})}$ denotes the $(p_{x},p_{y})$ pixel position in the local map $L(\mathbf{s},\mathcal{M})$.

The local depth map $L(\mathbf{s},\mathcal{M})$ for state $\mathbf{s}$ visualizes the environment features that are expected to be captured in a camera image obtained from the state $\mathbf{s}$. However, the obtained camera image $I_{t}$ is associated with the true state $\mathbf{s}^{*}_{t}$ that might be different from the state estimate $\mathbf{s}_{t}$. Nevertheless, for reasonably small position and orientation differences between the state estimate $\mathbf{s}_{t}$ and the true state $\mathbf{s}^{*}_{t}$, the local map $L(\mathbf{s},\mathcal{M})$ contains features that correspond with some of the features in the camera image $I_{t}$ that we use to estimate the position error.
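
As an illustration of the projection step in Eq. (4), the following is a minimal sketch (our own, not the authors' released code) that builds a sparse local depth map from occlusion-filtered points already expressed in the frame of the state $\mathbf{s}$; the function name, the row/column convention, and the near-point overwrite rule are assumptions:

```python
import numpy as np

def local_depth_map(points_s, K, height, width):
    """Project map points (already occlusion-filtered and expressed in the
    state's reference frame) onto the image plane to form a sparse depth map.

    points_s : (N, 3) array of 3D points in the camera/state frame
    K        : (3, 3) camera projection (intrinsic) matrix
    """
    depth_map = np.zeros((height, width), dtype=np.float32)

    # Keep only points in front of the camera (positive depth).
    points_s = points_s[points_s[:, 2] > 0]

    # Homogeneous pixel coordinates: [p_x, p_y, c]^T = K * p  (Eq. 4).
    proj = (K @ points_s.T).T
    u = np.ceil(proj[:, 0] / proj[:, 2]).astype(int)   # column index
    v = np.ceil(proj[:, 1] / proj[:, 2]).astype(int)   # row index
    depth = points_s[:, 2]                             # [0 0 1] * p

    # Discard projections that fall outside the image bounds.
    valid = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, depth = u[valid], v[valid], depth[valid]

    # If several points land on the same pixel, keep the nearest one:
    # write far points first so nearer points overwrite them.
    order = np.argsort(-depth)
    depth_map[v[order], u[order]] = depth[order]
    return depth_map
```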

Figure 3: Architecture of our deep neural network for estimating the translation and rotation errors as well as the parameters of the covariance matrix. The translation and rotation errors are determined using CMRNet [15], which employs correlation layers [40] for comparing feature representations of the camera image and the local depth map. Using a similar architecture, we design CovarianceNet, which produces the parameters of the covariance matrix associated with the translation error output.

5.2 DNN Architecture

We use a DNN to estimate the position error $\Delta\mathbf{x}_{t}$ and the associated covariance matrix $\Sigma_{t}$ by implicitly identifying and comparing the positions of corresponding features in the camera image $I_{t}$ and the local depth map $L(\mathbf{s}_{t},\mathcal{M})$ associated with the state estimate $\mathbf{s}_{t}$.

The architecture of our DNN is given in Fig. 3. Our DNN comprises two separate modules, one for estimating the position error $\Delta\mathbf{x}_{t}$ and the other for the parameters of the covariance matrix $\Sigma_{t}$. The first module for estimating the position error $\Delta\mathbf{x}_{t}$ is based on CMRNet [15]. CMRNet was originally proposed as an algorithm to iteratively determine the position and orientation of a vehicle using a camera image and a 3D LiDAR map, starting from a provided initial state. For determining the position error $\Delta\mathbf{x}_{t}$ using CMRNet, we use the state estimate $\mathbf{s}_{t}$ as the provided initial state and use the corresponding DNN translation error $\Delta\tilde{\mathbf{x}}_{t}$ and rotation error $\Delta\tilde{\mathbf{r}}_{t}$ outputs for transforming the state $\mathbf{s}_{t}$ towards the true state $\mathbf{s}^{*}_{t}$. Formally, given any state $\mathbf{s}$ and camera image $I_{t}$ at time $t$, the translation error $\Delta\tilde{\mathbf{x}}$ and rotation error $\Delta\tilde{\mathbf{r}}$ are expressed as

$$\Delta\tilde{\mathbf{x}},\Delta\tilde{\mathbf{r}}=\textrm{CMRNet}(I_{t},L(\mathbf{s},\mathcal{M})). \quad (5)$$

CMRNet estimates the rotation error $\Delta\tilde{\mathbf{r}}$ as a unit quaternion. Furthermore, the architecture determines both the translation error $\Delta\tilde{\mathbf{x}}$ and rotation error $\Delta\tilde{\mathbf{r}}$ in the reference frame of the state $\mathbf{s}$. Since the protection levels depend on the position error $\Delta\mathbf{x}$ in the reference frame from which the camera image $I_{t}$ is captured (the vehicle reference frame), we transform the translation error $\Delta\tilde{\mathbf{x}}$ to the vehicle reference frame by rotating it with the inverse of $\Delta\tilde{\mathbf{r}}$

$$\Delta\mathbf{x}=-\tilde{R}^{\top}\cdot\Delta\tilde{\mathbf{x}}, \quad (6)$$

where $\tilde{R}$ is the $3\times 3$ rotation matrix corresponding to the rotation error quaternion $\Delta\tilde{\mathbf{r}}$.

In the second module, we determine the covariance matrix $\Sigma$ associated with $\Delta\mathbf{x}$ by first estimating the covariance matrix $\tilde{\Sigma}$ associated with the translation error $\Delta\tilde{\mathbf{x}}$ obtained from CMRNet and then transforming it to the vehicle reference frame using $\Delta\tilde{\mathbf{r}}$.

We model the covariance matrix $\tilde{\Sigma}$ by following a similar approach to [37]. Since the covariance matrix is both symmetric and positive-definite, we consider the decomposition of $\tilde{\Sigma}$ into diagonal standard deviations $\boldsymbol{\sigma}=[\sigma_{1},\sigma_{2},\sigma_{3}]$ and correlation coefficients $\boldsymbol{\eta}=[\eta_{21},\eta_{31},\eta_{32}]$

$$\begin{aligned} [\tilde{\Sigma}]_{ii} &= \sigma_{i}^{2} \\ [\tilde{\Sigma}]_{ij} &= [\tilde{\Sigma}]_{ji} = \eta_{ij}\sigma_{i}\sigma_{j}, \end{aligned} \quad (7)$$

where $i,j\in\{1,2,3\}$ and $j<i$. We estimate these terms using our second DNN module (referred to as CovarianceNet), which has a similar network structure to CMRNet, but with $256$ and $6$ artificial neurons in the last two fully connected layers to prevent overfitting. For stable training, CovarianceNet produces the logarithm of the standard deviation output, which is converted to the standard deviation by taking the exponent. Additionally, we use the tanh function to scale the correlation coefficient outputs $\boldsymbol{\eta}$ in CovarianceNet between $\pm 1$. Formally, given a vehicle state $\mathbf{s}$ and camera image $I_{t}$ at time $t$, the standard deviations $\boldsymbol{\sigma}$ and correlation coefficients $\boldsymbol{\eta}$ are approximated as

$$\boldsymbol{\sigma},\boldsymbol{\eta}=\textrm{CovarianceNet}(I_{t},L(\mathbf{s},\mathcal{M})). \quad (8)$$

Using the $\tilde{\Sigma}$ constructed from the obtained $\boldsymbol{\sigma},\boldsymbol{\eta}$, we obtain the covariance matrix $\Sigma$ associated with $\Delta\mathbf{x}$ as

$$\Sigma=\tilde{R}^{\top}\cdot\tilde{\Sigma}\cdot\tilde{R}. \quad (9)$$

We keep the aleatoric uncertainty restricted to position domain errors in this work for simplicity, and thus treat $\Delta\tilde{\mathbf{r}}$ as a point estimate. The impact of errors in estimating $\Delta\tilde{\mathbf{r}}$ on protection levels is taken into consideration as epistemic uncertainty, and is discussed in more detail in Sections V.5 and V.7.

The feature extraction modules in CovarianceNet and CMRNet are separate since the two tasks are complementary: for estimating position error, the DNN must learn features that are robust to noise in the inputs while the variance in the estimated position error depends on the noise itself.
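
To make Eqs. (7)-(9) concrete, here is a small sketch (ours, with hypothetical variable names) that assembles the covariance matrix from the raw CovarianceNet outputs and rotates it into the vehicle reference frame:

```python
import numpy as np

def assemble_covariance(log_sigma, eta_raw, R_tilde):
    """Build the covariance of Eq. (7) and rotate it per Eq. (9).

    log_sigma : (3,) log standard deviations output by CovarianceNet
    eta_raw   : (3,) raw correlation outputs, squashed by tanh to (-1, 1)
    R_tilde   : (3, 3) rotation matrix of the estimated rotation error
    """
    sigma = np.exp(log_sigma)          # standard deviations (always positive)
    eta = np.tanh(eta_raw)             # correlation coefficients eta_21, eta_31, eta_32

    cov = np.diag(sigma ** 2)          # diagonal entries: sigma_i^2
    pairs = [(1, 0), (2, 0), (2, 1)]   # (i, j) index pairs with j < i
    for (i, j), e in zip(pairs, eta):
        cov[i, j] = cov[j, i] = e * sigma[i] * sigma[j]

    # Eq. (9): transform the covariance into the vehicle reference frame.
    return R_tilde.T @ cov @ R_tilde
```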

5.3 Loss Functions

The loss function for training the DNN must penalize position error outputs that differ from the corresponding ground truth present in the dataset, as well as penalize covariance that overestimates or underestimates the uncertainty in the position error predictions. Furthermore, the loss must incentivize the DNN to extract useful features from the camera image and local map inputs for predicting the position error. Hence, we consider three additive components in our loss function $\mathcal{L}(\cdot)$

$$\mathcal{L}=\alpha_{\textrm{Huber}}\mathcal{L}_{\textrm{Huber}}(\Delta\tilde{\mathbf{x}}^{*},\Delta\tilde{\mathbf{x}})+\alpha_{\textrm{MLE}}\mathcal{L}_{\textrm{MLE}}(\Delta\tilde{\mathbf{x}}^{*},\Delta\tilde{\mathbf{x}},\tilde{\Sigma})+\alpha_{\textrm{Ang}}\mathcal{L}_{\textrm{Ang}}(\Delta\tilde{\mathbf{r}}^{*},\Delta\tilde{\mathbf{r}}), \quad (10)$$

where

  1. $\Delta\tilde{\mathbf{x}}^{*},\Delta\tilde{\mathbf{r}}^{*}$ denote the vector-valued translation and rotation errors in the reference frame of the state estimate $\mathbf{s}$ to the unknown true state $\mathbf{s}^{*}$

  2. $\mathcal{L}_{\textrm{Huber}}(\cdot)$ denotes the Huber loss function [41]

  3. $\mathcal{L}_{\textrm{MLE}}(\cdot)$ denotes the loss function for the maximum likelihood estimation of the position error $\Delta\mathbf{x}$ and covariance $\tilde{\Sigma}$

  4. $\mathcal{L}_{\textrm{Ang}}(\cdot)$ denotes the quaternion angular distance from [15]

  5. $\alpha_{\textrm{Huber}},\alpha_{\textrm{MLE}},\alpha_{\textrm{Ang}}$ are coefficients for weighting each loss term.

We employ the Huber loss $\mathcal{L}_{\textrm{Huber}}(\cdot)$ and quaternion angular distance $\mathcal{L}_{\textrm{Ang}}(\cdot)$ terms from [15]. The Huber loss term $\mathcal{L}_{\textrm{Huber}}(\cdot)$ penalizes the translation error output $\Delta\tilde{\mathbf{x}}$ of the DNN

$$\begin{aligned} \mathcal{L}_{\textrm{Huber}}(\Delta\tilde{\mathbf{x}}^{*},\Delta\tilde{\mathbf{x}}) &= \sum_{X=x,y,z}D_{\textrm{Huber}}(\Delta\tilde{X}^{*},\Delta\tilde{X}) \\ D_{\textrm{Huber}}(a^{*},a) &= \begin{cases}\frac{1}{2}(a-a^{*})^{2}&\textrm{for }|a-a^{*}|\leq\delta\\ \delta\cdot\left(|a-a^{*}|-\frac{1}{2}\delta\right)&\textrm{otherwise,}\end{cases} \end{aligned} \quad (11)$$

where $\delta$ is a hyperparameter for adjusting the penalty assignment to small error values. In this paper, we set $\delta=1$. Unlike the more common mean squared error, the penalty assigned to higher error values is linear in the Huber loss instead of quadratic. Thus, the Huber loss is more robust to outliers and leads to more stable training as compared with the squared error. The quaternion angular distance term $\mathcal{L}_{\textrm{Ang}}(\cdot)$ penalizes the rotation error output $\Delta\tilde{\mathbf{r}}$ from CMRNet

$$\begin{aligned} \mathcal{L}_{\textrm{Ang}}(\Delta\tilde{\mathbf{r}}^{*},\Delta\tilde{\mathbf{r}}) &= D_{\textrm{Ang}}(\Delta\tilde{\mathbf{r}}^{*}\times\Delta\tilde{\mathbf{r}}^{-1}) \\ D_{\textrm{Ang}}(\mathbf{q}) &= \operatorname{atan2}\left(\sqrt{q_{2}^{2}+q_{3}^{2}+q_{4}^{2}},\,|q_{1}|\right), \end{aligned} \quad (12)$$

where

  1. $q_{i}$ denotes the $i$th element in the quaternion $\mathbf{q}$

  2. $\Delta\mathbf{r}^{-1}$ denotes the inverse of the quaternion $\Delta\mathbf{r}$

  3. $\mathbf{q}\times\mathbf{r}$ here denotes element-wise multiplication of the quaternions $\mathbf{q}$ and $\mathbf{r}$

  4. $\operatorname{atan2}(\cdot)$ is the two-argument version of the arctangent function.

Including the quaternion angular distance term $\mathcal{L}_{\textrm{Ang}}(\cdot)$ in the loss function incentivizes the DNN to learn features that are relevant to the geometry between the camera image and the local depth map. Hence, it provides additional supervision to the DNN training as a multi-task objective [42], and is important for the stability and speed of the training process.

The maximum likelihood loss term $\mathcal{L}_{\textrm{MLE}}(\cdot)$ depends on both the translation error $\Delta\tilde{\mathbf{x}}$ and the covariance matrix $\tilde{\Sigma}$ estimated from the DNN. The loss function is analogous to the negative log-likelihood of the Gaussian distribution

$$\mathcal{L}_{\textrm{MLE}}(\Delta\tilde{\mathbf{x}}^{*},\Delta\tilde{\mathbf{x}},\tilde{\Sigma})=\frac{1}{2}\log|\tilde{\Sigma}|+\frac{1}{2}(\Delta\tilde{\mathbf{x}}^{*}-\Delta\tilde{\mathbf{x}})^{\top}\cdot\tilde{\Sigma}^{-1}\cdot(\Delta\tilde{\mathbf{x}}^{*}-\Delta\tilde{\mathbf{x}}). \quad (13)$$

If the covariance output from the DNN has small values, the corresponding translation error is penalized much more than a translation error corresponding to a large-valued covariance. Hence, the maximum likelihood loss term $\mathcal{L}_{\textrm{MLE}}(\cdot)$ incentivizes the DNN to output a small covariance only when the corresponding translation error output has high confidence, and to output a large covariance otherwise.
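
A PyTorch-style sketch of the combined loss in Eq. (10) is given below. It is a simplified illustration under our own assumptions: the quaternion helper assumes a scalar-first $[q_1, q_2, q_3, q_4]$ ordering, PyTorch's built-in smooth L1 loss stands in for the Huber term with $\delta = 1$, and the weighting coefficients default to 1:

```python
import torch
import torch.nn.functional as F

def quaternion_angular_distance(q_true, q_pred):
    # D_Ang of Eq. (12): angle of the relative rotation between two unit
    # quaternions, assumed scalar-first ([q1, q2, q3, q4]).
    q_pred_inv = q_pred * torch.tensor([1.0, -1.0, -1.0, -1.0])
    w1, v1 = q_true[..., :1], q_true[..., 1:]
    w2, v2 = q_pred_inv[..., :1], q_pred_inv[..., 1:]
    # Hamilton product of q_true and q_pred^{-1}.
    w = w1 * w2 - (v1 * v2).sum(-1, keepdim=True)
    v = w1 * v2 + w2 * v1 + torch.cross(v1, v2, dim=-1)
    return torch.atan2(v.norm(dim=-1), w.squeeze(-1).abs())

def total_loss(dx_true, dx_pred, cov_pred, q_true, q_pred,
               a_huber=1.0, a_mle=1.0, a_ang=1.0):
    # Huber term of Eq. (11), summed over the x, y, z components (delta = 1).
    huber = F.smooth_l1_loss(dx_pred, dx_true, reduction="sum")
    # Negative Gaussian log-likelihood term of Eq. (13).
    diff = (dx_true - dx_pred).unsqueeze(-1)
    mle = 0.5 * torch.logdet(cov_pred) + 0.5 * (
        diff.transpose(-1, -2) @ torch.linalg.inv(cov_pred) @ diff).squeeze()
    # Quaternion angular distance term of Eq. (12).
    ang = quaternion_angular_distance(q_true, q_pred)
    return a_huber * huber + a_mle * mle.mean() + a_ang * ang.mean()
```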

5.4 Multiple Candidate State Selection

To assess the uncertainty in the DNN-based position error estimation process as well as the uncertainty from environmental factors, we evaluate the DNN output at $N_{C}$ candidate states $\{\mathbf{s}^{1}_{t},\ldots,\mathbf{s}^{N_{C}}_{t}\}$ in the neighborhood of the state estimate $\mathbf{s}_{t}$.

For selecting the candidate states $\{\mathbf{s}^{1}_{t},\ldots,\mathbf{s}^{N_{C}}_{t}\}$, we randomly generate multiple values of translation offset $\{\mathbf{t}^{1},\ldots,\mathbf{t}^{N_{C}}\}$ and rotation offset $\{\mathbf{r}^{1},\ldots,\mathbf{r}^{N_{C}}\}$ about the state estimate $\mathbf{s}_{t}$, where $N_{C}$ is the total number of selected candidate states. The $i$th translation offset $\mathbf{t}^{i}\in\mathbb{R}^{3}$ denotes translation in the $x$, $y$ and $z$ dimensions and is sampled from a uniform probability distribution between a specified range $\pm t_{max}$ in each dimension. Similarly, the $i$th rotation offset $\mathbf{r}^{i}\in\textrm{SU}(2)$ is obtained by uniformly sampling between $\pm r_{max}$ angular deviations about each axis and converting the resulting rotation to a quaternion. The $i$th candidate state $\mathbf{s}^{i}_{t}$ is generated by rotating and translating the state estimate $\mathbf{s}_{t}$ by $\mathbf{r}^{i}$ and $\mathbf{t}^{i}$, respectively. Corresponding to each candidate state $\mathbf{s}^{i}_{t}$, we generate a local depth map $L(\mathbf{s}^{i}_{t},\mathcal{M})$ using the procedure laid out in Section V.1.
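
A sketch of this candidate-state sampling, using SciPy's rotation utilities purely for illustration (the function itself and the dictionary keys are our own; $t_{max}$, $r_{max}$ and $N_C$ are the parameters defined above):

```python
import numpy as np
from scipy.spatial.transform import Rotation

def sample_candidate_states(x_est, q_est, n_c=24, t_max=1.0, r_max_deg=5.0,
                            rng=None):
    """Randomly select N_C candidate states around the state estimate.

    x_est : (3,) estimated position
    q_est : scipy Rotation representing the estimated orientation
    """
    rng = np.random.default_rng() if rng is None else rng
    candidates = []
    for _ in range(n_c):
        # Uniform translation offset in x, y, z within +/- t_max.
        t_i = rng.uniform(-t_max, t_max, size=3)
        # Uniform angular deviations about each axis within +/- r_max,
        # converted to a rotation (quaternion) offset.
        angles = rng.uniform(-r_max_deg, r_max_deg, size=3)
        r_i = Rotation.from_euler("xyz", angles, degrees=True)
        # Candidate state: rotate and translate the state estimate.
        candidates.append({"position": x_est + t_i,
                           "orientation": r_i * q_est,
                           "t_offset": t_i,
                           "r_offset": r_i})
    return candidates
```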

5.5 Linear Transformation of Position Errors

Using each local depth map $L(\mathbf{s}^{i}_{t},\mathcal{M})$ and the camera image $I_{t}$ for the $i$th candidate state $\mathbf{s}^{i}_{t}$ as inputs to the DNN in Section V.2, we evaluate the candidate state position error $\Delta\mathbf{x}^{i}_{t}$ and covariance matrix $\Sigma^{i}_{t}$. From the known translation offset $\mathbf{t}^{i}$ between the candidate state $\mathbf{s}^{i}_{t}$ and the state estimate $\mathbf{s}_{t}$ and the DNN-based rotation error $\Delta\tilde{\mathbf{r}}_{t}$ in $\mathbf{s}_{t}$, we compute the transformation matrix $H_{\mathbf{s}^{i}_{t}\to\mathbf{s}_{t}}$ for converting the candidate state position error $\Delta\mathbf{x}^{i}_{t}$ to the state estimate position error $\Delta\mathbf{x}_{t}$ in the vehicle reference frame

$$H_{\mathbf{s}^{i}_{t}\to\mathbf{s}_{t}}=\left[\begin{matrix}I_{3\times 3}&-\tilde{R}_{t}^{\top}\mathbf{t}^{i}\end{matrix}\right], \quad (14)$$

where $I_{3\times 3}$ denotes the identity matrix and $\tilde{R}_{t}$ is the $3\times 3$ rotation matrix computed from the DNN-based rotation error $\Delta\tilde{\mathbf{r}}_{t}$ between the state estimate $\mathbf{s}_{t}$ and the unknown true state $\mathbf{s}^{*}_{t}$. Note that the rotation offset $\mathbf{r}^{i}$ is not used in the transformation, since we are only concerned with the position errors from the true state $\mathbf{s}^{*}_{t}$ to the state estimate $\mathbf{s}_{t}$, which are invariant to the orientation of the candidate state $\mathbf{s}^{i}_{t}$. Using the transformation matrix $H_{\mathbf{s}^{i}_{t}\to\mathbf{s}_{t}}$, we obtain the $i$th sample of the state estimate position error $\Delta\mathbf{x}_{t}^{(i)}$

$$\Delta\mathbf{x}_{t}^{(i)}=H_{\mathbf{s}^{i}_{t}\to\mathbf{s}_{t}}\cdot\left[\begin{matrix}\Delta\mathbf{x}^{i}_{t}&1\end{matrix}\right]^{\top}=\Delta\mathbf{x}^{i}_{t}-\tilde{R}_{t}^{\top}\mathbf{t}^{i}. \quad (15)$$

We use parentheses in the notation $\Delta\mathbf{x}_{t}^{(i)}$ for the transformed samples of the position error between the true state $\mathbf{s}^{*}_{t}$ and the state estimate $\mathbf{s}_{t}$ to differentiate them from the position error $\Delta\mathbf{x}^{i}_{t}$ between $\mathbf{s}^{*}_{t}$ and the candidate state $\mathbf{s}^{i}_{t}$. Next, we modify the candidate state covariance matrix $\Sigma^{i}_{t}$ to account for uncertainty in the DNN-based rotation error $\Delta\tilde{\mathbf{r}}_{t}$. The resulting covariance matrix $\Sigma^{(i)}_{t}$ in terms of the covariance matrix $\Sigma^{i}_{t}$ for $\Delta\mathbf{x}^{i}_{t}$, $\tilde{R}_{t}$ and $\mathbf{t}^{i}$ is

$$\Sigma^{(i)}_{t}=\Sigma^{i}_{t}+\textrm{Var}[\tilde{R}_{t}^{\top}\mathbf{t}^{i}]. \quad (16)$$

Assuming small errors in determining the true rotation offsets between the state estimate $\mathbf{s}_{t}$ and the true state $\mathbf{s}^{*}_{t}$, we consider the random variable $R^{\prime}\tilde{R}_{t}^{\top}\mathbf{t}^{i}$, where $R^{\prime}$ represents the random rotation matrix corresponding to small angular deviations [43]. Using $R^{\prime}\tilde{R}_{t}^{\top}\mathbf{t}^{i}$, we approximate the covariance matrix $\Sigma^{(i)}_{t}$ as

$$\begin{aligned} \Sigma^{(i)}_{t} &\approx \Sigma^{i}_{t}+\mathbb{E}\left[(R^{\prime}-I)(\tilde{R}_{t}^{\top}\mathbf{t}^{i})(\tilde{R}_{t}^{\top}\mathbf{t}^{i})^{\top}(R^{\prime}-I)^{\top}\right] \\ [\Sigma^{(i)}_{t}]_{i^{\prime}j^{\prime}} &\approx [\Sigma^{i}_{t}]_{i^{\prime}j^{\prime}}+\mathbb{E}\left[(\mathbf{r}^{\prime}_{i^{\prime}})^{\top}(\tilde{R}_{t}^{\top}\mathbf{t}^{i})(\tilde{R}_{t}^{\top}\mathbf{t}^{i})^{\top}(\mathbf{r}^{\prime}_{j^{\prime}})\right] \\ &= [\Sigma^{i}_{t}]_{i^{\prime}j^{\prime}}+\mathrm{Tr}\left((\tilde{R}_{t}^{\top}\mathbf{t}^{i})(\tilde{R}_{t}^{\top}\mathbf{t}^{i})^{\top}\mathbb{E}\left[(\mathbf{r}^{\prime}_{i^{\prime}})(\mathbf{r}^{\prime}_{j^{\prime}})^{\top}\right]\right) \\ &= [\Sigma^{i}_{t}]_{i^{\prime}j^{\prime}}+\mathrm{Tr}\left((\tilde{R}_{t}^{\top}\mathbf{t}^{i})(\tilde{R}_{t}^{\top}\mathbf{t}^{i})^{\top}Q_{i^{\prime}j^{\prime}}\right), \end{aligned} \quad (17)$$

where $(\mathbf{r}^{\prime}_{i^{\prime}})^{\top}$ represents the $i^{\prime}$th row vector in $R^{\prime}-I$. Since errors in $\tilde{R}$ depend on the DNN output, we specify $R^{\prime}$ through the empirical distribution of the angular deviations in $\tilde{R}$ as observed for the trained DNN on the training and validation data, and precompute the expectation $Q_{i^{\prime}j^{\prime}}$ for each $(i^{\prime},j^{\prime})$ pair.

The samples of the state estimate position error $\{\Delta\mathbf{x}_{t}^{(1)},\ldots,\Delta\mathbf{x}_{t}^{(N_{C})}\}$ represent both inaccuracy in the DNN estimation as well as uncertainties due to environmental factors. If the DNN approximation fails at the input corresponding to the state estimate $\mathbf{s}_{t}$, the estimated position errors at candidate states would lead to a wide range of different values for the state estimate position errors. Similarly, if the environment map $\mathcal{M}$ near the state estimate $\mathbf{s}_{t}$ contains repetitive features, the position errors computed from candidate states would be different and hence indicate high uncertainty.
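
The following sketch (ours) implements the transformation of Eq. (15) and the covariance inflation of Eq. (17) for a single candidate state, assuming the $Q_{i^{\prime}j^{\prime}}$ expectations have been precomputed and stacked into a 4-D array `Q`:

```python
import numpy as np

def transform_candidate_error(dx_cand, cov_cand, R_tilde, t_offset, Q):
    """Map a candidate-state error and covariance to the state estimate.

    dx_cand  : (3,) DNN position error for the candidate state (vehicle frame)
    cov_cand : (3, 3) DNN covariance for the candidate state
    R_tilde  : (3, 3) rotation matrix of the DNN rotation error at the estimate
    t_offset : (3,) known translation offset of the candidate from the estimate
    Q        : (3, 3, 3, 3) precomputed E[r'_{i'} r'_{j'}^T] terms of Eq. (17)
    """
    # Eq. (15): sample of the state-estimate position error.
    dx_sample = dx_cand - R_tilde.T @ t_offset

    # Eq. (17): inflate the covariance for uncertainty in the rotation error.
    v = R_tilde.T @ t_offset
    outer = np.outer(v, v)
    cov_sample = cov_cand.copy()
    for i in range(3):
        for j in range(3):
            cov_sample[i, j] += np.trace(outer @ Q[i, j])
    return dx_sample, cov_sample
```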

5.6 Outlier Weights

Since the candidate states $\{\mathbf{s}^{1}_{t},\ldots,\mathbf{s}^{N_{C}}_{t}\}$ are selected randomly, some position error samples may correspond to local depth map and camera image pairs for which the DNN performs poorly. Thus, we compute outlier weights $\{\mathbf{w}^{(1)}_{t},\ldots,\mathbf{w}^{(N_{C})}_{t}\}$ corresponding to the position error samples $\{\Delta\mathbf{x}_{t}^{(1)},\ldots,\Delta\mathbf{x}_{t}^{(N_{C})}\}$ to mitigate the effect of these erroneous position error values in determining the protection levels. We compute outlier weights in each of the $x$, $y$, and $z$-dimensions separately, since the DNN approximation might not necessarily fail in all of its outputs. An example of this scenario is when the input camera image and local map contain features such as building edges that can be used to robustly determine errors along certain directions but not others.

For computing the outlier weights $\mathbf{w}_{t}^{(i)}=[w^{(i)}_{x,t},w^{(i)}_{y,t},w^{(i)}_{z,t}]$ associated with the $i$th position error value $\Delta\mathbf{x}_{t}^{(i)}=[\Delta x_{t}^{(i)},\Delta y_{t}^{(i)},\Delta z_{t}^{(i)}]$, we employ the robust Z-score based outlier detection technique [31]. The robust Z-score is used in a variety of anomaly detection approaches due to its resilience to outliers [44]. We apply the following operations in each dimension $X=x,y,$ and $z$:

  1. We compute the Median Absolute Deviation statistic [31] ${M\negthinspace AD}_{X}$ using all position error values $\{\Delta X_{t}^{(1)},\ldots,\Delta X_{t}^{(N_{C})}\}$

$${M\negthinspace AD}_{X}=\operatorname{median}\left(|\Delta X_{t}^{(i)}-\operatorname{median}(\Delta X_{t}^{(i)})|\right). \quad (18)$$

  2. Using the statistic ${M\negthinspace AD}_{X}$, we compute the robust Z-score $\mathcal{Z}^{(i)}_{X}$ for each position error value $\Delta X_{t}^{(i)}$

$$\mathcal{Z}^{(i)}_{X}=\frac{|\Delta X_{t}^{(i)}-\operatorname{median}(\Delta X_{t}^{(i)})|}{{M\negthinspace AD}_{X}}. \quad (19)$$

     The robust Z-score $\mathcal{Z}^{(i)}_{X}$ is high if the position error $\Delta\mathbf{x}^{(i)}$ deviates from the median error by a large value when compared with the median deviation value.

  3. We compute the outlier weights $\{w^{(1)}_{X},\ldots,w^{(N_{C})}_{X}\}$ from the robust Z-scores $\{\mathcal{Z}^{(1)}_{X},\ldots,\mathcal{Z}^{(N_{C})}_{X}\}$ by applying the softmax operation [45] such that the sum of the weights is unity

$$w^{(i)}_{X,t}=\frac{e^{-\gamma\cdot\mathcal{Z}^{(i)}_{X}}}{\sum_{j=1}^{N_{C}}e^{-\gamma\cdot\mathcal{Z}^{(j)}_{X}}}, \quad (20)$$

     where $\gamma$ denotes the scaling coefficient in the softmax function. We set $\gamma=0.6745$ as the approximate inverse of the standard normal distribution evaluated at $3/4$ to make the scaling in the statistic consistent with the standard deviation of a normal distribution [31]. A small value of the outlier weight $w^{(i)}_{X,t}$ indicates that the position error $\Delta X_{t}^{(i)}$ is an outlier.

For brevity, we extract the diagonal variances associated with each dimension for all position error samples

$$\begin{aligned} (\sigma^{2}_{x,t})^{(i)} &= [\Sigma^{(i)}_{t}]_{11} \\ (\sigma^{2}_{y,t})^{(i)} &= [\Sigma^{(i)}_{t}]_{22} \\ (\sigma^{2}_{z,t})^{(i)} &= [\Sigma^{(i)}_{t}]_{33}. \end{aligned} \quad (21)$$
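
For one dimension, the outlier weighting of Eqs. (18)-(20) reduces to a few lines; the sketch below is our own and adds a small guard against a zero MAD, which the text above does not discuss:

```python
import numpy as np

def outlier_weights(errors, gamma=0.6745, eps=1e-9):
    """Robust Z-score based softmax weights for one dimension (x, y, or z).

    errors : (N_C,) position error samples in that dimension
    """
    med = np.median(errors)
    mad = np.median(np.abs(errors - med)) + eps      # Eq. (18), guarded against 0
    z = np.abs(errors - med) / mad                   # Eq. (19): robust Z-scores
    w = np.exp(-gamma * z)
    return w / w.sum()                               # Eq. (20): weights sum to 1
```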

5.7 Probability Distribution of Position Error

We construct a probability distribution in each of the $X=x,y$ and $z$-dimensions from the previously obtained samples of position errors $\Delta X^{(i)}_{t}$, variances $(\sigma^{2}_{X,t})^{(i)}$ and outlier weights $w^{(i)}_{X,t}$. We model the probability distribution using the Gaussian Mixture Model (GMM) distribution [32]

$$\mathbb{P}(\rho_{X,t})=\sum_{i=1}^{N_{C}}w^{(i)}_{X,t}\,\mathcal{N}\left(\Delta X_{t}^{(i)},(\sigma^{2}_{X,t})^{(i)}\right), \quad (22)$$

where

  1. $\rho_{X,t}$ denotes the position error random variable

  2. $\mathcal{N}(\mu,\sigma^{2})$ is the Gaussian distribution with mean $\mu$ and variance $\sigma^{2}$.

The probability distributions $\mathbb{P}(\rho_{x,t})$, $\mathbb{P}(\rho_{y,t})$ and $\mathbb{P}(\rho_{z,t})$ incorporate both the aleatoric uncertainty from the DNN-based covariance and the epistemic uncertainty from the multiple DNN evaluations associated with different candidate states. Both the position error and the covariance matrix depend on the rotation error point estimate from CMRNet for transforming the error values to the vehicle reference frame. Since each DNN evaluation for a candidate state estimates the rotation error independently, the epistemic uncertainty incorporates the effects of errors in the DNN-based estimation of both rotation and translation. The epistemic uncertainty is reflected in the multiple GMM components and their weight coefficients, which represent the different possible position error values that may arise from the same camera image measurement and the environment map. The aleatoric uncertainty is present as the variance in each possible value of the position error represented by the individual components.

5.8 Protection Levels

We compute the protection levels along the lateral, longitudinal and vertical directions using the probability distributions obtained in the previous section. Since the position errors are in the vehicle reference frame, the $x$, $y$ and $z$-dimensions coincide with the lateral, longitudinal and vertical directions, respectively. First, we obtain the cumulative distribution function $\textrm{CDF}(\cdot)$ for each probability distribution

$$\textrm{CDF}(\rho_{X,t})=\sum_{i=1}^{N_{C}}w^{(i)}_{X,t}\,\Phi\left(\frac{\rho_{X,t}-\Delta X_{t}^{(i)}}{(\sigma_{X,t})^{(i)}}\right), \quad (23)$$

where $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution. Then, for a specified value of the integrity risk $IR$, we compute the protection level $PL$ in the lateral, longitudinal and vertical directions from equation (1) using the CDF as the probability distribution. For the numerical optimization, we employ a simple interval-halving line search, i.e., the bisection method [46]. To account for both positive and negative errors, we perform the optimization using both the CDF (supremum) and $1-\textrm{CDF}$ (infimum) with $IR/2$ as the integrity risk, and use the maximum absolute value as the protection level.

The computed protection levels account for heavy tails in the GMM probability distribution of the position error, which arise because of the different possible values of the position error that can be computed from the available camera measurements and environment map. Our method computes large protection levels when many different values of the position error are equally probable from the measurements, resulting in larger tail probabilities in the GMM, and small protection levels only if the uncertainty from both aleatoric and epistemic sources is small.
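
A sketch of the numerical evaluation described above: the GMM CDF of Eq. (23) is inverted by bisection at $IR/2$ and $1-IR/2$, and the larger absolute bound is returned. The search bracket and tolerance below are our own assumptions:

```python
import numpy as np
from scipy.stats import norm

def gmm_cdf(rho, means, sigmas, weights):
    # Eq. (23): weighted sum of Gaussian CDFs.
    return np.sum(weights * norm.cdf((rho - means) / sigmas))

def invert_cdf(target, means, sigmas, weights, bracket=20.0, tol=1e-4):
    # Bisection (interval halving) for rho such that CDF(rho) = target.
    lo, hi = -bracket, bracket
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gmm_cdf(mid, means, sigmas, weights) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def protection_level(means, sigmas, weights, integrity_risk):
    # Allocate IR/2 to each tail and take the larger absolute bound.
    rho_hi = invert_cdf(1.0 - integrity_risk / 2, means, sigmas, weights)
    rho_lo = invert_cdf(integrity_risk / 2, means, sigmas, weights)
    return max(abs(rho_hi), abs(rho_lo))
```

For example, with $IR = 0.01$ the two bisection targets are the 0.005 and 0.995 quantiles of the mixture in that direction.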

6 Experimental Results

6.1 Real-World Driving Dataset

We use the KITTI visual odometry dataset [47] to evaluate the performance of the protection levels computed by our approach. The dataset was recorded around Karlsruhe, Germany over multiple driving sequences and contains images recorded by multiple on-board cameras, along with ground truth positions and orientations. Additionally, the dataset contains LiDAR point cloud measurements, which we use to generate the environment map corresponding to each sequence. Since our approach for computing protection levels just requires a monocular camera sensor, we use the images recorded by the left RGB camera in our experiments. We use sequences 00, 03, 05, 06, 07, 08 and 09 from the dataset based on the availability of a LiDAR environment map. We use sequence 00 for validation of our approach and the rest of the sequences are utilized in training our DNN. The experimental parameters are provided in Table 5.

6.2 LiDAR Environment Map

To construct a precise LiDAR point cloud map $\mathcal{M}$ of the environment, we exploit the openly available position and orientation values for the dataset computed via Simultaneous Localization and Mapping [4]. Similar to [15], we aggregate the LiDAR point clouds across all time instances. Then, we detect and remove sparse outliers within the aggregated point cloud by computing the Z-score [31] of each point in a 0.1 m local neighborhood, discarding points with a Z-score higher than 3. Finally, the remaining points are down-sampled into a voxel map of the environment $\mathcal{M}$ with a resolution of 0.1 m. The corresponding map for sequence 00 in the KITTI dataset is shown in Fig. 5. For storing large maps, we divide the LiDAR point cloud sequences into multiple overlapping parts and construct separate maps of roughly 500 Megabytes each.
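
A compact sketch of this map-building pipeline using Open3D is shown below; it is our own stand-in, and Open3D's statistical outlier filter (parameterized by a neighbor count and a standard-deviation ratio) only approximates the local Z-score test described above:

```python
import open3d as o3d

def build_voxel_map(scan_files, poses, voxel_size=0.1):
    """Aggregate pose-transformed LiDAR scans into a down-sampled voxel map.

    scan_files : list of point cloud file paths, one per time instance
    poses      : list of 4x4 pose matrices (e.g. from a SLAM solution)
    """
    aggregated = o3d.geometry.PointCloud()
    for path, pose in zip(scan_files, poses):
        scan = o3d.io.read_point_cloud(path)
        scan.transform(pose)                 # move the scan into the map frame
        aggregated += scan

    # Remove sparse outliers (stand-in for the 0.1 m local Z-score test).
    aggregated, _ = aggregated.remove_statistical_outlier(
        nb_neighbors=20, std_ratio=3.0)

    # Down-sample into a 0.1 m voxel grid.
    return aggregated.voxel_down_sample(voxel_size=voxel_size)
```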

6.3 DNN Training and Testing Datasets

We generate the training dataset for our DNN in two steps. First, we randomly select a state estimate $\mathbf{s}_{t}$ at time $t$ from within a 2 m translation and a $10^{\circ}$ rotation of the ground truth positions and orientations in each driving sequence. The translation and rotation used for generating the state estimate are utilized as the ground truth position error $\Delta\mathbf{x}^{*}_{t}$ and orientation error $\Delta\mathbf{r}^{*}_{t}$. Then, using the LiDAR map $\mathcal{M}$, we generate the local depth map $L(\mathbf{s}_{t},\mathcal{M})$ corresponding to the state estimate $\mathbf{s}_{t}$ and use it as the DNN input along with the camera image $I_{t}$ from the driving sequence data. The training dataset comprises camera images from 11455 different time instances, with the state estimate selected at runtime so as to have different state estimates for the same camera images in different epochs.

Similar to the data augmentation techniques described in [15], we

  1. Randomly change the contrast, saturation and brightness of images,

  2. Apply random rotations in the range of $\pm 5^{\circ}$ to both the camera images and the local depth maps,

  3. Horizontally mirror the camera image and compute the local depth map using a modified camera projection matrix.

All three of these data augmentation techniques are used in training CMRNet in the first half of the optimization process. However, for training CovarianceNet, we skip the contrast, saturation and brightness changes during the second half of the optimization so that the DNN can learn real-world noise features from camera images.

We generate the validation and test datasets from sequence 00 in the KITTI odometry dataset, which is not used for training. We follow a similar procedure to the one for generating the training dataset, except that we do not augment the data. The validation dataset comprises 100 randomly selected time instances from sequence 00, while the test dataset contains the remaining 4441 time instances in sequence 00.

Parameter                                               Value
Integrity risk IR                                       0.01
Candidate state maximum translation offset t_{max}      1.0 m
Candidate state maximum rotation offset r_{max}         5^{\circ}
Number of candidate states N_C                          24
Lateral alarm limit AL_{lat}                            0.85 m
Longitudinal alarm limit AL_{lon}                       1.50 m
Vertical alarm limit AL_{vert}                          1.47 m
Table 5: Experimental parameters

Figure 5: 3D LiDAR environment map from KITTI dataset sequence 00 [47].

6.4 Training Procedure

We train the DNN using stochastic gradient descent. Directly optimizing the maximum likelihood loss term \mathcal{L}_{\textrm{MLE}}(\cdot) might suffer from instability caused by the interdependence between the translation error \Delta\tilde{\mathbf{x}} and covariance \tilde{\Sigma} outputs [48]. Therefore, we employ the mean-variance split training strategy proposed in [48]: first, we set (\alpha_{\textrm{Huber}}=1, \alpha_{\textrm{MLE}}=1, \alpha_{\textrm{Ang}}=1) and optimize only the parameters of CMRNet until the validation error stops decreasing. Next, we set (\alpha_{\textrm{Huber}}=0, \alpha_{\textrm{MLE}}=1, \alpha_{\textrm{Ang}}=0) and optimize the parameters of CovarianceNet. We alternate between these two steps until the validation loss stops decreasing. Our DNN is implemented using the PyTorch library [49] and takes advantage of the open-source implementation available for CMRNet [15] as well as the available pretrained weights for initialization. Similar to CMRNet, all the layers in our DNN use the leaky ReLU activation function with a negative slope of 0.1. We train the DNN on a single NVIDIA Tesla P40 GPU with a batch size of 24 and a learning rate of 10^{-5}, selected via grid search.
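The following simplified PyTorch sketch illustrates one phase of the mean-variance split training. The forward signatures of `cmrnet` and `covariancenet`, the data loader format, and the omission of the angular loss term are illustrative simplifications of the procedure described above.

```python
# Minimal sketch of one phase of the alternating (mean-variance split)
# optimization, assuming PyTorch; names and signatures are placeholders.
import torch
import torch.nn.functional as F

def loss_terms(cmrnet, covariancenet, image, depth_map, err_gt):
    pred_err = cmrnet(image, depth_map)          # predicted translation error
    var = covariancenet(image, depth_map)        # predicted diagonal variance
    huber = F.smooth_l1_loss(pred_err, err_gt)
    # Gaussian negative log-likelihood with a diagonal covariance.
    nll = 0.5 * (torch.log(var) + (err_gt - pred_err) ** 2 / var).mean()
    return huber, nll

def train_phase(params, cmrnet, covariancenet, loader, a_huber, a_mle,
                lr=1e-5, epochs=1):
    optimizer = torch.optim.SGD(params, lr=lr)
    for _ in range(epochs):
        for image, depth_map, err_gt in loader:
            optimizer.zero_grad()
            huber, nll = loss_terms(cmrnet, covariancenet, image, depth_map, err_gt)
            (a_huber * huber + a_mle * nll).backward()
            optimizer.step()

# Alternate the two phases until the validation loss stops improving, e.g.:
#   train_phase(cmrnet.parameters(), ..., a_huber=1.0, a_mle=1.0)        # CMRNet step
#   train_phase(covariancenet.parameters(), ..., a_huber=0.0, a_mle=1.0) # CovarianceNet step
```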

6.5 Metrics

We evaluate the lateral, longitudinal and vertical protection levels computed by our approach using the following three metrics, with the subscript t dropped for brevity (a computational sketch of all three metrics follows the list):

  1. Bound gap measures the difference between the computed protection levels PL_{lat}, PL_{lon}, PL_{vert} and the true position error magnitude during nominal operations (protection level is less than the alarm limit and greater than the position error):

     BG_{lat} = \textrm{avg}(PL_{lat} - |\Delta x^{*}|)
     BG_{lon} = \textrm{avg}(PL_{lon} - |\Delta y^{*}|)
     BG_{vert} = \textrm{avg}(PL_{vert} - |\Delta z^{*}|),     (24)

     where

     1. BG_{lat}, BG_{lon} and BG_{vert} denote the bound gaps in the lateral, longitudinal and vertical dimensions, respectively,

     2. \textrm{avg}(\cdot) denotes the average computed over the test dataset instances for which the protection level is greater than the position error and less than the alarm limit.

     A small bound gap BG_{lat}, BG_{lon}, BG_{vert} is desirable, since it implies that the algorithm both accurately estimates the position error magnitude during nominal operations and has low uncertainty in its predictions. We only consider the bound gap for nominal operations, since the estimated position is declared unsafe when the protection level exceeds the alarm limit.

  2. Failure rate measures the fraction of time instances in the test data sequence for which the computed protection levels PL_{lat}, PL_{lon}, PL_{vert} are smaller than the true position error magnitude:

     FR_{lat} = \frac{1}{T_{\textrm{max}}}\sum_{t=1}^{T_{\textrm{max}}}\mathbbm{1}_{t}\left(PL_{lat}<|\Delta x^{*}|\right)
     FR_{lon} = \frac{1}{T_{\textrm{max}}}\sum_{t=1}^{T_{\textrm{max}}}\mathbbm{1}_{t}\left(PL_{lon}<|\Delta y^{*}|\right)
     FR_{vert} = \frac{1}{T_{\textrm{max}}}\sum_{t=1}^{T_{\textrm{max}}}\mathbbm{1}_{t}\left(PL_{vert}<|\Delta z^{*}|\right),     (25)

     where

     1. FR_{lat}, FR_{lon} and FR_{vert} denote the failure rates for the lateral, longitudinal and vertical protection levels, respectively,

     2. \mathbbm{1}_{t}(\cdot) denotes the indicator function evaluated using the protection level and true position error values at time t; it equals 1 if the event in its argument holds true, and 0 otherwise,

     3. T_{\textrm{max}} denotes the total time duration of the test sequence.

     The failure rates FR_{lat}, FR_{lon}, FR_{vert} should be consistent with the specified integrity risk IR to meet the safety requirements.

  3. False alarm rate is computed for specified alarm limits AL_{lat}, AL_{lon}, AL_{vert} in the lateral, longitudinal and vertical directions, and measures the fraction of time instances in the test data sequence for which the computed protection levels PL_{lat}, PL_{lon}, PL_{vert} exceed the alarm limit while the position error magnitude is within the alarm limit. We first define the following integrity events:

     \Omega_{lat,PL} = (PL_{lat} > AL_{lat})
     \Omega_{lat,PE} = (|\Delta x^{*}| > AL_{lat})
     \Omega_{lon,PL} = (PL_{lon} > AL_{lon})
     \Omega_{lon,PE} = (|\Delta y^{*}| > AL_{lon})
     \Omega_{vert,PL} = (PL_{vert} > AL_{vert})
     \Omega_{vert,PE} = (|\Delta z^{*}| > AL_{vert}).     (26)

     The complement of each event is denoted by \bar{\Omega}. Next, we define the counts of false alarms N_{X,FA}, true alarms N_{X,TA} and the number of times the position error exceeds the alarm limit N_{X,PE}, with X = lat, lon and vert:

     N_{X,FA} = \sum_{t=1}^{T_{\textrm{max}}}\mathbbm{1}_{t}\left(\Omega_{X,PL}\cap\bar{\Omega}_{X,PE}\right)
     N_{X,TA} = \sum_{t=1}^{T_{\textrm{max}}}\mathbbm{1}_{t}\left(\Omega_{X,PL}\cap\Omega_{X,PE}\right)
     N_{X,PE} = \sum_{t=1}^{T_{\textrm{max}}}\mathbbm{1}_{t}\left(\Omega_{X,PE}\right).     (27)

     Finally, we compute the false alarm rates FAR_{lat}, FAR_{lon}, FAR_{vert} after normalizing by the total number of position error magnitudes lying above and below the alarm limit AL:

     FAR_{X} = \frac{N_{X,FA}\cdot(T_{\textrm{max}}-N_{X,PE})}{N_{X,FA}\cdot(T_{\textrm{max}}-N_{X,PE})+N_{X,TA}\cdot N_{X,PE}}.     (28)
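The sketch below shows how the three metrics can be computed for one direction, assuming NumPy arrays of per-epoch protection levels, absolute position errors, and a scalar alarm limit; the names are illustrative.

```python
# Evaluate bound gap, failure rate and false alarm rate for one direction.
# `pl` and `pe` are arrays of protection levels and |position error| over the
# test set; `al` is the alarm limit for this direction.
import numpy as np

def evaluate_direction(pl, pe, al):
    # Nominal operation: PL bounds the error and stays below the alarm limit.
    nominal = (pl > pe) & (pl < al)
    bound_gap = np.mean(pl[nominal] - pe[nominal])            # Eq. (24)

    failure_rate = np.mean(pl < pe)                           # Eq. (25)

    false_alarm = (pl > al) & (pe <= al)                      # Eq. (26)-(27)
    true_alarm = (pl > al) & (pe > al)
    n_fa, n_ta = false_alarm.sum(), true_alarm.sum()
    n_pe, n_total = (pe > al).sum(), len(pe)
    denom = n_fa * (n_total - n_pe) + n_ta * n_pe
    far = n_fa * (n_total - n_pe) / denom if denom else 0.0   # Eq. (28)
    return bound_gap, failure_rate, far
```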

6.6 Results

Fig. 6 shows the lateral and longitudinal protection levels computed by our approach on two 200 s subsets of the test sequence. For clarity, protection levels are shown at every 5th time instance. Similarly, Fig. 7 shows the vertical protection levels along with the vertical position error magnitude for a subset of the test sequence. As can be seen from both figures, the computed protection levels successfully enclose the position error magnitudes at the majority of points (\sim 99\%) in the visualized subsequences. Furthermore, the vertical protection levels are visually closer to the position error than the lateral and longitudinal protection levels. This is due to the superior performance of the DNN in determining position errors along the vertical dimension, which is easier to learn since all camera images in the dataset are captured by a ground-based vehicle.

Figure 6: Lateral and longitudinal protection level results on the test sequence from the real-world dataset. We show protection levels for two subsets of the total sequence, computed at 5 s intervals. The protection levels successfully enclose the position errors in \sim 99\% of the cases.

Figure 7: Vertical protection level results on the test sequence from the real-world dataset. We show protection levels for a subset of the total sequence. The protection levels successfully enclose the position error magnitudes with a small bound gap.

Fig. 8 displays the integrity diagrams generated following the Stanford-ESA integrity diagram proposed for SBAS integrity [50]. The diagrams are generated from 15000 samples of protection levels corresponding to randomly selected state estimates and camera images within the test sequence. For the protection levels in each direction, we set the alarm limit (Table 5) based on the specifications suggested for mid-size vehicles in [10], beyond which the state estimate is declared unsafe to use. The lateral, longitudinal and vertical protection levels are greater than the position error magnitudes in \sim 99\% of cases, which is consistent with the specified integrity requirement. Furthermore, a large fraction of the failures lies in the region where the protection level is greater than the alarm limit, and thus the system is correctly identified as being under unsafe operation.

We conducted an ablation study to numerically evaluate the impact of our proposed epistemic uncertainty measure and outlier weighting method on the computed protection levels. We evaluated protection levels in three different cases: incorporating the DNN covariance, epistemic uncertainty and outlier weighting (VAR+EO); incorporating just the DNN covariance and epistemic uncertainty, with equal weights assigned to all position error samples (VAR+E); and using only the DNN covariance (VAR). For VAR, we constructed a Gaussian distribution using the DNN position error output and the diagonal variance entries in each dimension, and then computed protection levels from the inverse cumulative distribution function of this Gaussian corresponding to the specified integrity risk IR. Table 1 summarizes our results. Incorporating the epistemic uncertainty in computing protection levels improved the failure rate from 0.05 for the lateral, 0.05 for the longitudinal and 0.03 for the vertical protection levels to within 0.01 in all cases. This is because the covariance estimate from the DNN provides an overconfident measure of uncertainty, which is corrected by our epistemic uncertainty measure. Furthermore, incorporating outlier weighting reduced the average nominal bound gap by about 0.02 m for the lateral, 0.05 m for the longitudinal and 0.05 m for the vertical protection levels, as well as the false alarm rate by about 0.02 in each direction, while keeping the failure rate within the specified integrity risk requirement.
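As an illustration of the VAR baseline, the sketch below derives a protection level from the quantile of a Gaussian fitted with the DNN error mean and standard deviation. Treating the one-sided (1 - IR) quantile of |mean| + sigma as the bound is a simplifying assumption for illustration, not necessarily the exact construction used in our experiments.

```python
# Illustrative Gaussian-quantile protection level for the VAR baseline.
from scipy.stats import norm

def gaussian_pl(mu, sigma, integrity_risk=0.01):
    """Approximate bound on |error| for error ~ N(mu, sigma^2) at the given risk."""
    return abs(mu) + sigma * norm.ppf(1.0 - integrity_risk)

# Example (hypothetical numbers): a predicted 0.2 m lateral error with a
# 0.3 m standard deviation yields a protection level of about 0.90 m.
pl_lat = gaussian_pl(0.2, 0.3, integrity_risk=0.01)
```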

The mean bound gap between the lateral protection levels computed by our approach and the position error magnitudes in the nominal cases is smaller than a quarter of the width of a standard U.S. lane. In the longitudinal direction, the bound gap is somewhat larger, since fewer visual features are present along the road for determining the position error using the DNN. The corresponding value in the vertical dimension is smaller, owing to the DNN's superior performance in determining position errors and uncertainty in the vertical dimension. This demonstrates the applicability of our approach to urban roads.

For an integrity risk requirement of 0.01, the protection levels computed by our method demonstrate a failure rate equal to or within 0.01 as well. However, further lowering the integrity risk requirement in our experiments either did not similarly improve the failure rate or caused a significant increase in the bound gaps and the false alarm rate. A possible reason is that the uncertainty approximated by our approach through the aleatoric and epistemic measures fails to act as an accurate uncertainty representation for integrity risk requirements smaller than 0.01. Future research will consider more and varied training data, better strategies for selecting candidate states, and different DNN architectures to meet smaller integrity risk requirements.

A shortcoming of our approach is the large false alarm rate exhibited by the computed protection levels in Table 1. The large value results both from the inherent noise in the DNN-based estimation of position and rotation errors and from frequently selecting candidate states that produce large outlier error values. A direction for future work on reducing the false alarm rate is to explore better strategies for selecting candidate states and mitigating outliers.

A key advantage offered by our approach is its application to scenarios where a direct analysis of the error sources in the state estimation algorithm is difficult, such as when feature rich visual information is processed by a machine learning algorithm for estimating the state. In such scenarios, our approach computes protection levels separately from the state estimation algorithm by both evaluating a data-driven model of the position error uncertainty and characterizing the epistemic uncertainty in the model outputs.

            Lateral PL                Longitudinal PL           Vertical PL
            BG (m)   FR     FAR       BG (m)   FR     FAR       BG (m)   FR      FAR
VAR+EO      0.49     0.01   0.47      0.77     0.01   0.40      0.38     <0.01   0.14
VAR+E       0.51     0.01   0.49      0.82     0.01   0.43      0.43     <0.01   0.16
VAR         0.42     0.05   0.45      0.64     0.05   0.36      0.30     0.02    0.12
Table 1: Evaluation of lateral, longitudinal and vertical protection levels from our approach. We compare protection levels computed by our trained model using the DNN covariance, epistemic uncertainty and outlier weighting (VAR+EO), the DNN covariance and epistemic uncertainty (VAR+E), and only the DNN covariance (VAR). Incorporating epistemic uncertainty results in a lower failure rate, while incorporating outlier weights reduces the bound gap and false alarm rate.
Figure 8: Integrity diagram results for the lateral, longitudinal and vertical protection levels. The diagram contains protection levels evaluated across 15000 different state estimates and camera images randomly selected from the test sequence. The majority of the samples are close to and greater than the position error magnitude, validating the applicability of the computed protection levels as a robust safety measure.

7 Conclusions

In this work, we presented a data-driven approach for computing lateral, longitudinal and vertical protection levels associated with a given state estimate from camera images and a 3D LiDAR map of the environment. Our approach estimates both aleatoric and epistemic measures of uncertainty for computing protection levels, thereby providing robust measures of localization safety. We demonstrated the efficacy of our method on real-world data in terms of bound gap, failure rate and false alarm rate. The results show that the lateral, longitudinal and vertical protection levels computed by our method enclose the position error magnitudes with a 0.01 probability of failure and less than a 1 m bound gap in all directions, demonstrating that our approach is applicable to GNSS-denied urban environments.

Acknowledgements

This material is based upon work supported by the National Science Foundation under award #2006162.

References

  • Delling et al. [2017] Daniel Delling, Andrew V. Goldberg, Thomas Pajor, and Renato F. Werneck. Customizable Route Planning in Road Networks. Transportation Science, 51(2):566–591, May 2017. ISSN 0041-1655, 1526-5447. 10.1287/trsc.2014.0579.
  • Jensen et al. [2016] Morten Borno Jensen, Mark Philip Philipsen, Andreas Mogelmose, Thomas Baltzer Moeslund, and Mohan Manubhai Trivedi. Vision for Looking at Traffic Lights: Issues, Survey, and Perspectives. IEEE Transactions on Intelligent Transportation Systems, 17(7):1800–1815, July 2016. ISSN 1524-9050, 1558-0016. 10.1109/TITS.2015.2509509.
  • Wolcott and Eustice [2017] Ryan W Wolcott and Ryan M Eustice. Robust LIDAR localization using multiresolution Gaussian mixture maps for autonomous driving. The International Journal of Robotics Research, 36(3):292–319, March 2017. ISSN 0278-3649, 1741-3176. 10.1177/0278364917696568.
  • Caselitz et al. [2016] Tim Caselitz, Bastian Steder, Michael Ruhnke, and Wolfram Burgard. Monocular camera localization in 3D LiDAR maps. In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1926–1931, Daejeon, South Korea, October 2016. IEEE. ISBN 978-1-5090-3762-9. 10.1109/IROS.2016.7759304.
  • Spilker Jr. et al. [1996] James J. Spilker Jr., Penina Axelrad, Bradford W. Parkinson, and Per Enge, editors. Global Positioning System: Theory and Applications, Volume I. American Institute of Aeronautics and Astronautics, Washington DC, January 1996. ISBN 978-1-56347-106-3 978-1-60086-638-8. 10.2514/4.866388.
  • Jiang and Wang [2016] Yiping Jiang and Jinling Wang. A New Approach to Calculate the Horizontal Protection Level. Journal of Navigation, 69(1):57–74, January 2016. ISSN 0373-4633, 1469-7785. 10.1017/S0373463315000545.
  • Cezón et al. [2013] A. Cezón, M. Cueto, and I. Fernández. Analysis of Multi-GNSS Service Performance Assessment: ARAIM vs. IBPL Performances Comparison. pages 2654–2663, September 2013. ISSN: 2331-5954.
  • Tran and Lo Presti [2019] Hieu Trung Tran and Letizia Lo Presti. Kalman filter-based ARAIM algorithm for integrity monitoring in urban environment. ICT Express, 5(1):65–71, March 2019. ISSN 24059595. 10.1016/j.icte.2018.05.002.
  • Badue et al. [2021] Claudine Badue, Rânik Guidolini, Raphael Vivacqua Carneiro, Pedro Azevedo, Vinicius B. Cardoso, Avelino Forechi, Luan Jesus, Rodrigo Berriel, Thiago M. Paixão, Filipe Mutz, Lucas de Paula Veronese, Thiago Oliveira-Santos, and Alberto F. De Souza. Self-driving cars: A survey. Expert Systems with Applications, 165:113816, March 2021. ISSN 09574174. 10.1016/j.eswa.2020.113816.
  • Reid et al. [2019] Tyler G. R. Reid, Sarah E. Houts, Robert Cammarata, Graham Mills, Siddharth Agarwal, Ankit Vora, and Gaurav Pandey. Localization Requirements for Autonomous Vehicles. SAE International Journal of Connected and Automated Vehicles, 2(3):12–02–03–0012, September 2019. ISSN 2574-075X. 10.4271/12-02-03-0012. arXiv: 1906.01061.
  • Taira et al. [2021] Hajime Taira, Masatoshi Okutomi, Torsten Sattler, Mircea Cimpoi, Marc Pollefeys, Josef Sivic, Tomas Pajdla, and Akihiko Torii. InLoc: Indoor Visual Localization with Dense Matching and View Synthesis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(4):1293–1307, April 2021. ISSN 0162-8828, 2160-9292, 1939-3539. 10.1109/TPAMI.2019.2952114.
  • Kim et al. [2018] Youngji Kim, Jinyong Jeong, and Ayoung Kim. Stereo Camera Localization in 3D LiDAR Maps. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1–9, Madrid, October 2018. IEEE. ISBN 978-1-5386-8094-0. 10.1109/IROS.2018.8594362.
  • Lyrio et al. [2015] Lauro J. Lyrio, Thiago Oliveira-Santos, Claudine Badue, and Alberto Ferreira De Souza. Image-based mapping, global localization and position tracking using VG-RAM weightless neural networks. In 2015 IEEE International Conference on Robotics and Automation (ICRA), pages 3603–3610, Seattle, WA, May 2015. IEEE. ISBN 978-1-4799-6923-4. 10.1109/ICRA.2015.7139699.
  • Oliveira et al. [2020] Gabriel L. Oliveira, Noha Radwan, Wolfram Burgard, and Thomas Brox. Topometric Localization with Deep Learning. In Nancy M. Amato, Greg Hager, Shawna Thomas, and Miguel Torres-Torriti, editors, Robotics Research, volume 10, pages 505–520. Springer International Publishing, Cham, 2020. ISBN 978-3-030-28618-7 978-3-030-28619-4. 10.1007/978-3-030-28619-4_38. Series Title: Springer Proceedings in Advanced Robotics.
  • Cattaneo et al. [2019] Daniele Cattaneo, Matteo Vaghi, Augusto Luis Ballardini, Simone Fontana, Domenico Giorgio Sorrenti, and Wolfram Burgard. CMRNet: Camera to LiDAR-Map Registration. 2019 IEEE Intelligent Transportation Systems Conference (ITSC), pages 1283–1289, October 2019. 10.1109/ITSC.2019.8917470. arXiv: 1906.10109.
  • Sarlin et al. [2019] Paul-Edouard Sarlin, Cesar Cadena, Roland Siegwart, and Marcin Dymczyk. From Coarse to Fine: Robust Hierarchical Localization at Large Scale. arXiv:1812.03506 [cs], pages 12708–12717, April 2019. 10.1109/CVPR.2019.01300. arXiv: 1812.03506 version: 2.
  • Recht et al. [2019] Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, and Vaishaal Shankar. Do ImageNet Classifiers Generalize to ImageNet? In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 5389–5400. PMLR, June 2019.
  • Kendall and Gal [2017] Alex Kendall and Yarin Gal. What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? arXiv:1703.04977 [cs], 30, October 2017. arXiv: 1703.04977.
  • Loquercio et al. [2020] Antonio Loquercio, Mattia Segu, and Davide Scaramuzza. A General Framework for Uncertainty Estimation in Deep Learning. IEEE Robotics and Automation Letters, 5(2):3153–3160, April 2020. ISSN 2377-3766, 2377-3774. 10.1109/LRA.2020.2974682.
  • Kiureghian and Ditlevsen [2009] Armen Der Kiureghian and Ove Ditlevsen. Aleatory or epistemic? Does it matter? Structural Safety, 31(2):105–112, March 2009. ISSN 01674730. 10.1016/j.strusafe.2008.06.020.
  • McAllister et al. [2017] Rowan McAllister, Yarin Gal, Alex Kendall, Mark van der Wilk, Amar Shah, Roberto Cipolla, and Adrian Weller. Concrete Problems for Autonomous Vehicle Safety: Advantages of Bayesian Deep Learning. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, pages 4745–4753, Melbourne, Australia, August 2017. International Joint Conferences on Artificial Intelligence Organization. ISBN 978-0-9992411-0-3. 10.24963/ijcai.2017/661.
  • Yang et al. [2020] Nan Yang, Lukas von Stumberg, Rui Wang, and Daniel Cremers. D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1278–1289, Seattle, WA, USA, June 2020. IEEE. ISBN 978-1-72817-168-5. 10.1109/CVPR42600.2020.00136.
  • Kendall and Cipolla [2016] Alex Kendall and Roberto Cipolla. Modelling uncertainty in deep learning for camera relocalization. In 2016 IEEE International Conference on Robotics and Automation (ICRA), pages 4762–4769, Stockholm, Sweden, May 2016. IEEE. ISBN 978-1-4673-8026-3. 10.1109/ICRA.2016.7487679.
  • Gal and Ghahramani [2016] Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. In Maria Florina Balcan and Kilian Q. Weinberger, editors, Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 1050–1059, New York, New York, USA, June 2016. PMLR.
  • Blundell et al. [2015] Charles Blundell, Julien Cornebise, Koray Kavukcuoglu, and Daan Wierstra. Weight Uncertainty in Neural Network. In Francis Bach and David Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pages 1613–1622, Lille, France, July 2015. PMLR.
  • Smith and Gal [2018] Lewis Smith and Yarin Gal. Understanding Measures of Uncertainty for Adversarial Example Detection. arXiv:1803.08533 [cs, stat], pages 560–569, March 2018. arXiv: 1803.08533.
  • Gupta and Gao [2020] Shubh Gupta and Grace X. Gao. Data-Driven Protection Levels for Camera and 3D Map-based Safe Urban Localization. pages 2483–2499, October 2020. 10.33012/2020.17698.
  • Lukas and Stoker [2016] Vicki Lukas and J. M. Stoker. 3D Elevation Program—Virtual USA in 3D. USGS Fact Sheet 2016-3022, U.S. Geological Survey, Reston, VA, 2016.
  • Krishnan et al. [2011] Sriram Krishnan, Christopher Crosby, Viswanath Nandigam, Minh Phan, Charles Cowart, Chaitanya Baru, and Ramon Arrowsmith. OpenTopography: a services oriented architecture for community access to LIDAR topography. In Proceedings of the 2nd International Conference on Computing for Geospatial Research & Applications - COM.Geo ’11, pages 1–8, Washington, DC, 2011. ACM Press. ISBN 978-1-4503-0681-2. 10.1145/1999320.1999327.
  • Wang et al. [2020] Cheng Wang, Chenglu Wen, Yudi Dai, Shangshu Yu, and Minghao Liu. Urban 3D modeling with mobile laser scanning: a review. Virtual Reality & Intelligent Hardware, 2(3):175–212, June 2020. ISSN 20965796. 10.1016/j.vrih.2020.05.003.
  • Iglewicz and Hoaglin [1993] Boris Iglewicz and David Caster Hoaglin. How to Detect and Handle Outliers. ASQC Quality Press, 1993. ISBN 978-0-87389-247-6.
  • Lindsay [1995] Bruce G. Lindsay. Mixture Models: Theory, Geometry, and Applications. IMS, 1995. ISBN 978-0-940600-32-4.
  • Joerger and Pervan [2019] M. Joerger and B. Pervan. Quantifying Safety of Laser-Based Navigation. IEEE Transactions on Aerospace and Electronic Systems, 55(1):273–288, February 2019. ISSN 1557-9603. 10.1109/TAES.2018.2850381.
  • Zhu et al. [2020] C. Zhu, M. Joerger, and M. Meurer. Quantifying Feature Association Error in Camera-based Positioning. In 2020 IEEE/ION Position, Location and Navigation Symposium (PLANS), pages 967–972, April 2020. 10.1109/PLANS46316.2020.9109919. ISSN: 2153-3598.
  • Kendall et al. [2015] Alex Kendall, Matthew Grimes, and Roberto Cipolla. PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization. In 2015 IEEE International Conference on Computer Vision (ICCV), pages 2938–2946, Santiago, Chile, December 2015. IEEE. ISBN 978-1-4673-8391-2. 10.1109/ICCV.2015.336.
  • Cadena et al. [2016] Cesar Cadena, Luca Carlone, Henry Carrillo, Yasir Latif, Davide Scaramuzza, Jose Neira, Ian Reid, and John J. Leonard. Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age. IEEE Transactions on Robotics, 32(6):1309–1332, December 2016. ISSN 1552-3098, 1941-0468. 10.1109/TRO.2016.2624754. arXiv: 1606.05830.
  • Russell and Reale [2019] Rebecca L. Russell and Christopher Reale. Multivariate Uncertainty in Deep Learning. arXiv:1910.14215 [cs, stat], October 2019. arXiv: 1910.14215.
  • Liu et al. [2018] Katherine Liu, Kyel Ok, William Vega-Brown, and Nicholas Roy. Deep Inference for Covariance Estimation: Learning Gaussian Noise Models for State Estimation. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 1436–1443, Brisbane, QLD, May 2018. IEEE. ISBN 978-1-5386-3081-5. 10.1109/ICRA.2018.8461047.
  • Pintus et al. [2011] Ruggero Pintus, Enrico Gobbetti, and Marco Agus. Real-time rendering of massive unstructured raw point clouds using screen-space operators. In Proceedings of the 12th International conference on Virtual Reality, Archaeology and Cultural Heritage, pages 105–112, 2011.
  • Dosovitskiy et al. [2015] Alexey Dosovitskiy, Philipp Fischer, Eddy Ilg, Philip Hausser, Caner Hazirbas, Vladimir Golkov, Patrick van der Smagt, Daniel Cremers, and Thomas Brox. FlowNet: Learning Optical Flow with Convolutional Networks. In 2015 IEEE International Conference on Computer Vision (ICCV), pages 2758–2766, Santiago, December 2015. IEEE. ISBN 978-1-4673-8391-2. 10.1109/ICCV.2015.316.
  • Huber [1992] Peter J. Huber. Robust Estimation of a Location Parameter. In Samuel Kotz and Norman L. Johnson, editors, Breakthroughs in Statistics, pages 492–518. Springer New York, New York, NY, 1992. ISBN 978-0-387-94039-7 978-1-4612-4380-9. 10.1007/978-1-4612-4380-9_35. Series Title: Springer Series in Statistics.
  • Zeng and Ji [2015] Tao Zeng and Shuiwang Ji. Deep Convolutional Neural Networks for Multi-instance Multi-task Learning. In 2015 IEEE International Conference on Data Mining, pages 579–588, Atlantic City, NJ, USA, November 2015. IEEE. ISBN 978-1-4673-9504-5. 10.1109/ICDM.2015.92.
  • Barfoot et al. [2011] Timothy Barfoot, James R. Forbes, and Paul T. Furgale. Pose estimation using linearized rotations and quaternion algebra. Acta Astronautica, 68(1-2):101–112, January 2011. ISSN 00945765. 10.1016/j.actaastro.2010.06.049.
  • Rousseeuw and Hubert [2018] Peter J. Rousseeuw and Mia Hubert. Anomaly detection by robust statistics. WIREs Data Mining and Knowledge Discovery, 8(2), March 2018. ISSN 1942-4787, 1942-4795. 10.1002/widm.1236.
  • Goodfellow et al. [2016] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, November 2016. ISBN 978-0-262-03561-3.
  • Burden and Faires [2011] Richard L. Burden and J. Douglas Faires. Numerical Analysis. Brooks/Cole, Cengage Learning, 2011. ISBN 978-0-538-73564-3.
  • Geiger et al. [2012] A. Geiger, P. Lenz, and R. Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 3354–3361, Providence, RI, June 2012. IEEE. ISBN 978-1-4673-1228-8 978-1-4673-1226-4 978-1-4673-1227-1. 10.1109/CVPR.2012.6248074.
  • Skafte et al. [2019] Nicki Skafte, Martin Jørgensen, and Søren Hauberg. Reliable training and estimation of variance networks. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32, volume 32, pages 6326–6336. Curran Associates, Inc., 2019.
  • Paszke et al. [2019] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019.
  • Tossaint et al. [2007] M. Tossaint, J. Samson, F. Toran, J. Ventura-Traveset, M. Hernandez-Pajares, J.M. Juan, J. Sanz, and P. Ramos-Bosch. The Stanford - ESA Integrity Diagram: A New Tool for The User Domain SBAS Integrity Assessment. Navigation, 54(2):153–162, June 2007. ISSN 00281522. 10.1002/j.2161-4296.2007.tb00401.x.