

Data-Driven Protection Levels for Camera and 3D Map-based Safe Urban Localization

Shubh Gupta and Grace Xingxin Gao    Stanford University

Abstract

Reliably assessing the error in an estimated vehicle position is integral for ensuring the vehicle’s safety in urban environments. Many existing approaches use GNSS measurements to characterize protection levels (PLs) as probabilistic upper bounds on the position error. However, GNSS signals might be reflected or blocked in urban environments, and thus additional sensor modalities need to be considered to determine PLs. In this paper, we propose an approach for computing PLs by matching camera image measurements to a LiDAR-based 3D map of the environment. We specify a Gaussian mixture model probability distribution of position error using deep neural network-based data-driven models and statistical outlier weighting techniques. From the probability distribution, we compute the PLs by evaluating the position error bound using numerical line-search methods. Through experimental validation with real-world data, we demonstrate that the PLs computed from our method are reliable bounds on the position error in urban environments.

1 Introduction

In recent years, research on autonomous navigation for urban environments has been garnering increasing attention. Many publications have targeted different aspects of navigation such as route planning [1], perception [2] and localization [3, 4]. For trustworthy operation in each of these aspects, assessing the level of safety of the vehicle from potential system failures is critical. However, fewer works have examined the problem of safety quantification for autonomous vehicles.

In the context of satellite-based localization, safety is typically addressed via integrity monitoring (IM) [5]. Within IM, protection levels specify a statistical upper bound on the error in an estimated position of the vehicle, which can be trusted to enclose the position errors with a required probabilistic guarantee. To detect an unsafe estimated vehicle position, these protection levels are compared with the maximum allowable position error value, known as the alarm limit. Various methods [6, 7, 8] have been proposed over the years for computing protection levels; however, most of these approaches focus on GNSS-only navigation. These approaches do not directly apply to GNSS-denied urban environments, where visual sensors are becoming increasingly preferred [9]. Although various options in visual sensors exist in the market, camera sensors are inexpensive, lightweight, and have been widely employed in industry. For quantifying localization safety in GNSS-denied urban environments, there is thus a need to develop new ways of computing protection levels using camera image measurements.

Since protection levels are bounds over the position error, computing them from camera image measurements requires a model that relates the measurements to the position error in the estimate of the vehicle location. Furthermore, since the lateral, longitudinal and vertical directions are well-defined with respect to a vehicle’s location on the road, the model must estimate the maximum position error in each of these directions for computing protection levels [10]. However, characterizing such a model is not straightforward. This is because the relation between a vehicle location in an environment and the corresponding camera image measurement is complex: it depends on identifying and matching structural patterns in the measurements with prior known information about the environment [3, 4, 11, 12].

Recently, data-driven techniques based on deep neural networks (DNNs) have demonstrated state-of-the-art performance in determining the state of the camera sensor, comprising its position and orientation, by identifying and matching patterns in images with a known map of the environment [13, 14, 15] or an existing database of images [16, 11]. By leveraging datasets consisting of multiple images with known camera states in an environment, these approaches train a DNN to model the relationship between an image and the corresponding state. However, the model characterized by the DNN can often be erroneous or brittle. For instance, recent research has shown that the output of a DNN can change significantly with minimal changes to the inputs [17]. Thus, for using DNNs to determine the position error, uncertainty in the output of the DNN must also be addressed.

DNN-based algorithms consider two types of uncertainty [18, 19]. Aleatoric or statistical uncertainty results from the noise present in the inputs to the DNN, due to which a precise output cannot be produced. For camera image inputs, sources of noise include illumination changes, occlusion or the presence of visually ambiguous structures, such as windows tessellated along a wall [18]. On the other hand, epistemic or systematic uncertainty exists within the model itself. Sources of epistemic uncertainty include poorly determined DNN model parameters as well as external factors that are not considered in the model [20], such as environmental features that might be ignored by the algorithm while matching the camera images to the environment map.

While aleatoric uncertainty is typically modeled as the input-dependent variance in the output of the DNN [18, 21, 22], epistemic uncertainty relates to the DNN model and, therefore, requires further deliberation. Existing approaches approximate epistemic uncertainty by assuming a probability distribution over the weight parameters of the DNN to represent the ignorance about the correct parameters [23, 24, 25]. However, these approaches assume that a correct value of the parameters exists and that the probability distribution over the weight parameters captures the uncertainty in the model, both of which do not necessarily hold in practice [26]. This inability of existing DNN-based methods to properly characterize uncertainty limits their applicability to safety-critical applications, such as localization of autonomous vehicles.

In this paper, we propose a novel method for computing protection levels associated with a given vehicular state estimate (position and orientation) from camera image measurements and a 3D map of the environment. This work is based on our recent ION GNSS+ 2020 conference paper [27] and includes additional experiments and improvements to the DNN training process. Recently, high-definition 3D environment maps in the form of LiDAR point clouds have become increasingly available through industry players such as HERE, TomTom, Waymo and NVIDIA, as well as through projects such as USGS 3DEP [28] and OpenTopography [29]. Furthermore, LiDAR-based 3D maps are more robust to noise from environmental factors, such as illumination and weather, than image-based maps [30]. Hence, we use LiDAR-based 3D point cloud maps in our approach.

Previously, CMRNet [15] has been proposed as a DNN-based approach for determining the vehicular state from camera images and a LiDAR-based 3D map. In our approach, we extend the DNN architecture proposed in [15] to model the position error and the covariance matrix (aleatoric uncertainty) in the vehicular state estimate. To assess the epistemic uncertainty in the position error, we evaluate the DNN position error outputs at multiple candidate states in the vicinity of the state estimate, and combine the outputs into samples of the state estimate position error. Fig. 1 shows the architecture of our proposed approach. Given a state estimate, we first select multiple candidate states from its neighborhood. Using the DNN, we then evaluate the position error and covariance for each candidate state by comparing the camera image measurement with a local map constructed from the candidate state and the 3D environment map. Next, we linearly transform the position error and covariance outputs from the DNN with the relative positions of candidate states into samples of the state estimate position error and variance. We then separate these samples into the lateral, longitudinal and vertical directions and weight the samples to mitigate the impact of outliers in each direction. Subsequently, we combine the position error samples, outlier weights, and variance samples to construct a Gaussian mixture model probability distribution of the position error in each direction, and numerically evaluate its intervals to compute protection levels.

Our main contributions are as follows:

  1. We extend the CMRNet [15] architecture to model both the position error in the vehicular state estimate and the associated covariance matrix. Using the 3D LiDAR-based map of the environment, we first construct a local map representation with respect to the vehicular state estimate. Then, we use the DNN to analyze the correspondence between the camera image measurement and the local map for determining the position error and the covariance matrix.

  2. We develop a novel method for capturing epistemic uncertainty in the DNN position error output. Unlike existing approaches which assume a probability distribution over DNN weight parameters, we directly analyze different position errors that are determined by the DNN for multiple candidate states selected from within a neighborhood of the state estimate. The position error outputs from the DNN corresponding to the candidate states are then linearly combined with the candidate states’ relative position from the state estimate, to obtain an empirical distribution of the state estimate position error.

  3. We design an outlier weighting scheme to account for possible errors in the DNN output at inputs that differ from the training data. Our approach weighs the position error samples from the empirical distribution using a robust outlier detection metric, known as the robust Z-score [31], along the lateral, longitudinal and vertical directions individually.

  4. We construct the lateral, longitudinal and vertical protection levels as intervals over the probability distribution of the position error. We model this probability distribution as a Gaussian Mixture Model [32] from the position error samples, DNN covariance and outlier weights.

  5. We demonstrate the applicability of our approach in urban environments by experimentally validating the protection levels computed from our method on real-world data with multiple camera images and different state estimates.

Figure 1: Architecture of our proposed approach for computing protection levels. Given a state estimate, multiple candidate states are selected from its neighborhood and the corresponding position error and the covariance matrix for each candidate state are evaluated using the DNN. The position errors and covariance are then linearly transformed to obtain samples of the state estimate position error and variance, which are then weighted to determine outliers. Finally, the position error samples, outlier weights and variance are combined to construct a Gaussian Mixture Model probability distribution, from which the lateral, longitudinal and vertical protection levels are computed through numerical evaluation of its probability intervals.

The remainder of this paper is structured as follows: Section II discusses related work. Section III formulates the problem of estimating protection levels. Section IV describes the two types of uncertainties considered in our approach. Section V details our algorithm. Section VI presents the results from experimentation with real-world data. We conclude the paper in Section VII.

2 Related Work

Several methods have been developed over the years which characterize protection levels in the context of GNSS-based urban navigation. Jiang and Wang [6] compute horizontal protection levels using an iterative search-based method and test statistic based on the bivariate normal distribution. Cezón et al. [7] analyze methods which utilize the isotropy of residual vectors from the least-squares position estimation to compute the protection levels. Tran and Presti [8] combine Advanced Receiver Autonomous Integrity Monitoring (ARAIM) with Kalman filtering, and compute the protection levels by considering the set of position solutions which arise after excluding faulty measurements. These approaches compute the protection levels by deriving the mathematical relation between measurement and position domain errors. However, such a relation is difficult to formulate with camera image measurements and a LiDAR-based 3D map, since the position error in this case depends on various factors such as the structure of buildings in the environment, available visual features and illumination levels.

Previous works have proposed IM approaches for LiDAR and camera-based navigation where the vehicle is localized by associating identified landmarks with a stored map or a database. Joerger et al. [33] developed a method to quantify integrity risk for LiDAR-based navigation algorithms by analyzing failures of feature extraction and data association subroutines. Zhu et al. [34] derived a bound on the integrity risk in camera-based navigation with an Extended Kalman Filter (EKF) caused by incorrect feature associations. However, these IM approaches have been developed for localization algorithms based on data association and cannot be directly applied to many recent camera and LiDAR-based localization techniques which use deep learning to model the complex relation between measurements and the stored map or the database. Furthermore, these IM techniques do not estimate protection levels, which is the focus of our work.

Deep learning has been widely applied for determining position information from camera images. Kendall et al. [35] train a DNN using images from a single environment to learn a relation between the image and the camera 6-DOF pose. Taira et al. [11] learn image features using a DNN and apply feature extraction and matching techniques to estimate the 6-DOF camera pose relative to a known 3D map of the environment. Sarlin et al. [16] develop a deep learning-based 2D-3D matching technique to obtain 6-DOF camera pose from images and a 3D environment model. However, these approaches do not model the corresponding uncertainty associated with the estimated camera pose, or account for failures in DNN approximation [26], which is necessary for characterizing safety measures such as protection levels.

Some recent works have proposed to estimate the uncertainty associated with deep learning algorithms. Kendall and Cipolla [23] estimate the uncertainty in DNN-based camera pose estimation from images, by evaluating the network multiple times through dropout [24]. Loquercio et al. [19] propose a general framework for estimating uncertainty in deep learning as variance computed from both aleatoric and epistemic sources. McAllister et al. [21] suggest using Bayesian deep learning to determine uncertainty and quantify safety in autonomous vehicles, by placing probability distributions over DNN weights to represent the uncertainty in the DNN model. Yang et al. [22] jointly estimate the vehicle odometry, scene depth and uncertainty from sequential camera images. However, the uncertainty estimates from these algorithms do not take into account the inaccuracy of the trained DNN model, or the influence of the underlying environment structure on the DNN outputs. In our approach, we evaluate the DNN position error outputs at inputs corresponding to multiple states in the environment, and utilize these position errors for characterizing uncertainty both from inaccuracy in the DNN model as well as from the environment structure around the state estimate.

To the best of our knowledge, our approach is the first that applies data-driven algorithms for computing protection levels by characterizing the uncertainty from different error sources. The proposed method seeks to leverage the high-fidelity function modeling capability of DNNs and combine it with techniques from robust statistics and integrity monitoring to compute robust protection levels using camera image measurements and a 3D map of the environment.

3 Problem Formulation

We consider the scenario of a vehicle navigating in an urban environment using measurements acquired by an on-board camera. The 3D LiDAR map of the environment $\mathcal{M}$, which consists of points $\mathbf{p}\in\mathbb{R}^{3}$, is assumed to be known a priori from either openly available repositories [28, 29] or from Simultaneous Localization and Mapping algorithms [36].

The vehicular state $\mathbf{s}_{t}=[\mathbf{x}_{t},\mathbf{o}_{t}]$ at time $t$ is a 7-element vector comprising its 3D position $\mathbf{x}_{t}=[x_{t},y_{t},z_{t}]^{\top}\in\mathbb{R}^{3}$ along the $x$, $y$ and $z$-dimensions and its 3D orientation unit quaternion $\mathbf{o}_{t}=[o_{1,t},o_{2,t},o_{3,t},o_{4,t}]\in\textrm{SU}(2)$. The vehicle state estimates over time are denoted as $\{\mathbf{s}_{t}\}_{t=1}^{T_{\text{max}}}$, where $T_{\text{max}}$ denotes the total time in a navigation sequence. At each time $t$, the vehicle captures an RGB camera image $I_{t}\in\mathbb{R}^{l\times w\times 3}$ from the on-board camera, where $l$ and $w$ denote pixels along the length and width dimensions, respectively.

Given an integrity risk specification $IR$, our objective is to compute the lateral protection level $PL_{lat,t}$, longitudinal protection level $PL_{lon,t}$, and vertical protection level $PL_{vert,t}$ at time $t$, which denote the maximal bounds on the position error magnitude with a probabilistic guarantee of at least $1-IR$. Considering the $x$, $y$ and $z$-dimensions in the rotational frame of the vehicle,

$$\begin{aligned} PL_{lat,t} &= \sup\left\{\rho \mid \mathbb{P}\left(|x_{t}-x^{*}_{t}|\leq\rho\right)\leq 1-IR\right\} \\ PL_{lon,t} &= \sup\left\{\rho \mid \mathbb{P}\left(|y_{t}-y^{*}_{t}|\leq\rho\right)\leq 1-IR\right\} \\ PL_{vert,t} &= \sup\left\{\rho \mid \mathbb{P}\left(|z_{t}-z^{*}_{t}|\leq\rho\right)\leq 1-IR\right\}, \end{aligned} \quad (1)$$

where $\mathbf{x}^{*}_{t}=[x^{*}_{t},y^{*}_{t},z^{*}_{t}]$ denotes the unknown true vehicle position at time $t$.

4 Types of Uncertainty in Position Error

Protection levels for a state estimate $\mathbf{s}_{t}$ at time $t$ depend on the uncertainty in determining the associated position error $\Delta\mathbf{x}_{t}=[\Delta x_{t},\Delta y_{t},\Delta z_{t}]$ between the state estimate position $\mathbf{x}_{t}$ and the true position $\mathbf{x}^{*}_{t}$ from the camera image $I_{t}$ and the environment map $\mathcal{M}$. We consider two different kinds of uncertainty, which are categorized by the source of inaccuracy in determining the position error $\Delta\mathbf{x}_{t}$: aleatoric and epistemic uncertainty.

4.1 Aleatoric Uncertainty

Aleatoric uncertainty refers to the uncertainty from noise present in the camera image measurements $I_{t}$ and the environment map $\mathcal{M}$, due to which a precise value of the position error $\Delta\mathbf{x}_{t}$ cannot be determined. Existing DNN-based localization approaches model the aleatoric uncertainty as a covariance matrix with only diagonal entries [18, 21, 22] or with both diagonal and off-diagonal terms [37, 38]. Similar to the existing approaches, we characterize the aleatoric uncertainty by using a DNN to model the covariance matrix $\Sigma_{t}$ associated with the position error $\Delta\mathbf{x}_{t}$. We consider both nonzero diagonal and off-diagonal terms in $\Sigma_{t}$ to model the correlation between the $x$, $y$ and $z$-dimension uncertainties, such as along the ground plane.

Aleatoric uncertainty by itself does not accurately represent the uncertainty in determining the position error. This is because aleatoric uncertainty assumes that the noise present in the training data also represents the noise in all future inputs and that the DNN approximation is error-free. These assumptions fail in scenarios when the input at evaluation time is different from the training data or when the input contains features that occur rarely in the real world [26]. Thus, relying purely on aleatoric uncertainty can lead to overconfident estimates of the position error uncertainty [18].

Figure 2: Position error $\Delta\mathbf{x}_{t}$ in the state estimate position $\mathbf{x}_{t}$ is a linear combination of the position error $\Delta\mathbf{x}^{i}_{t}$ in the position $\mathbf{x}^{i}_{t}$ of any candidate state $\mathbf{s}^{i}_{t}$ and the relative position vector between $\mathbf{x}^{i}_{t}$ and $\mathbf{x}_{t}$. (The figure marks the state estimate, the true state, and a candidate state.)

4.2 Epistemic Uncertainty

Epistemic uncertainty relates to the inaccuracies in the model for determining the position error $\Delta\mathbf{x}_{t}$. In our approach, we characterize the epistemic uncertainty by leveraging a geometrical property of the position error $\Delta\mathbf{x}_{t}$, where for the same camera image $I_{t}$, $\Delta\mathbf{x}_{t}$ can be obtained by linearly combining the position error $\Delta\mathbf{x}^{\prime}_{t}$ computed for any candidate state $\mathbf{s}^{\prime}_{t}$ and the relative position of $\mathbf{s}^{\prime}_{t}$ from the state estimate $\mathbf{s}_{t}$ (Fig. 2). Hence, using the known relative positions and orientations of $N_{C}$ candidate states $\{\mathbf{s}_{t}^{1},\ldots,\mathbf{s}_{t}^{N_{C}}\}$ from $\mathbf{s}_{t}$, we transform the different position errors $\{\Delta\mathbf{x}_{t}^{1},\ldots,\Delta\mathbf{x}_{t}^{N_{C}}\}$ determined for the candidate states into samples of the state estimate position error $\Delta\mathbf{x}_{t}$. The empirical distribution comprised of these position error samples characterizes the epistemic uncertainty in the position error estimated using the DNN.

5 Data-Driven Protection Levels

This section details our algorithm for computing data-driven protection levels for the state estimate $\mathbf{s}_{t}$ at time $t$, using the camera image $I_{t}$ and the environment map $\mathcal{M}$. First, we describe the method for generating local representations of the 3D environment map $\mathcal{M}$ with respect to the state estimate $\mathbf{s}_{t}$. Then, we illustrate the architecture of the DNN. Next, we discuss the loss functions used in DNN training. We then detail the method for selecting multiple candidate states from the neighborhood of the state estimate $\mathbf{s}_{t}$. Using the position errors and covariance matrices evaluated from the DNN for each of these candidate states, we then illustrate the process for transforming the candidate state position errors into multiple samples of the state estimate position error. Then, to mitigate the impact of outliers in the computed position error samples in each of the lateral, longitudinal and vertical directions, we detail the procedure for computing outlier weights. Next, we characterize the probability distribution over the position error in the lateral, longitudinal and vertical directions. Finally, we detail the approach for determining protection levels from the probability distribution by numerical methods.

5.1 Local Map Construction

A local representation of the 3D LiDAR map of the environment captures the environment information in the vicinity of the state estimate $\mathbf{s}_{t}$ at time $t$. By comparing the environment information captured in the local map with the camera image $I_{t}\in\mathbb{R}^{l\times w\times 3}$ using a DNN, we estimate the position error $\Delta\mathbf{x}_{t}$ and covariance $\Sigma_{t}$ in the state estimate $\mathbf{s}_{t}$. For computing the local maps, we utilize the LiDAR-image generation procedure described in [15]. Similar to their approach, we generate the local map $L(\mathbf{s},\mathcal{M})\in\mathbb{R}^{l\times w}$ associated with a vehicle state $\mathbf{s}$ and LiDAR environment map $\mathcal{M}$ in two steps.

  1. First, we determine the rigid-body transformation matrix $H_{\mathbf{s}}$ in the special Euclidean group $\textrm{SE}(3)$ corresponding to the vehicle state $\mathbf{s}$

$$H_{\mathbf{s}}=\left[\begin{matrix}R_{\mathbf{s}}&T_{\mathbf{s}}\\ \mathbf{0}_{1\times 3}&1\end{matrix}\right]\in\textrm{SE}(3), \quad (2)$$

     where

     1. $R_{\mathbf{s}}$ denotes the rotation matrix corresponding to the orientation quaternion elements $\mathbf{o}=[o_{1},o_{2},o_{3},o_{4}]$ in the state $\mathbf{s}$

     2. $T_{\mathbf{s}}$ denotes the translation vector corresponding to the position elements $\mathbf{x}=[x,y,z]$ in the state $\mathbf{s}$.

     Using the matrix $H_{\mathbf{s}}$, we rotate and translate the points in the map $\mathcal{M}$ to the map $\mathcal{M}_{\mathbf{s}}$ in the reference frame of the state $\mathbf{s}$

$$\mathcal{M}_{\mathbf{s}}=\left\{\left[\begin{matrix}I_{3\times 3}&\mathbf{0}_{3\times 1}\end{matrix}\right]\cdot H_{\mathbf{s}}\cdot\left[\begin{matrix}\mathbf{p}^{\top}&1\end{matrix}\right]^{\top}\;\middle|\;\mathbf{p}\in\mathcal{M}\right\}, \quad (3)$$

     where $I$ denotes the identity matrix.

     For maintaining computational efficiency in the case of large maps, we use the points in the LiDAR map $\mathcal{M}_{\mathbf{s}}$ that lie in a sub-region around the state $\mathbf{s}$ and in the direction of the vehicle orientation.

  2. In the second step, we apply the occlusion estimation filter presented in [39] to identify and remove occluded points along rays from the camera center. For each pair of points $(\mathbf{p}^{(i)},\mathbf{p}^{(j)})$ where $\mathbf{p}^{(i)}$ is closer to the state $\mathbf{s}$, $\mathbf{p}^{(j)}$ is marked occluded if the angle between the ray from $\mathbf{p}^{(j)}$ to the camera center and the line from $\mathbf{p}^{(j)}$ to $\mathbf{p}^{(i)}$ is less than a threshold. Then, the remaining points are projected to the camera image frame using the camera projection matrix $K$ to generate the local depth map $L(\mathbf{s},\mathcal{M})$. The $i$th point $\mathbf{p}^{(i)}$ in $\mathcal{M}_{\mathbf{s}}$ is projected as

$$\begin{aligned} \left[\begin{matrix}p_{x}&p_{y}&c\end{matrix}\right]^{\top} &= K\cdot\mathbf{p}^{(i)} \\ [L(\mathbf{s},\mathcal{M})]_{(\lceil p_{x}/c\rceil,\lceil p_{y}/c\rceil)} &= \left[\begin{matrix}0&0&1\end{matrix}\right]\cdot\mathbf{p}^{(i)}, \end{aligned} \quad (4)$$

     where

     1. $p_{x},p_{y}$ denote the projected 2D coordinates with scaling term $c$

     2. $[L(\mathbf{s},\mathcal{M})]_{(p_{x},p_{y})}$ denotes the $(p_{x},p_{y})$ pixel position in the local map $L(\mathbf{s},\mathcal{M})$.

The local depth map $L(\mathbf{s},\mathcal{M})$ for state $\mathbf{s}$ visualizes the environment features that are expected to be captured in a camera image obtained from the state $\mathbf{s}$. However, the obtained camera image $I_{t}$ is associated with the true state $\mathbf{s}^{*}_{t}$ that might be different from the state estimate $\mathbf{s}_{t}$. Nevertheless, for reasonably small position and orientation differences between the state estimate $\mathbf{s}_{t}$ and the true state $\mathbf{s}^{*}_{t}$, the local map $L(\mathbf{s},\mathcal{M})$ contains features that correspond with some of the features in the camera image $I_{t}$ that we use to estimate the position error.
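
As an illustration of the projection step in Eq. (4), the following is a minimal sketch (our own, not the authors' released code) that builds a sparse local depth map from occlusion-filtered points already expressed in the frame of the state $\mathbf{s}$; the function name, the row/column convention, and the near-point overwrite rule are assumptions:

```python
import numpy as np

def local_depth_map(points_s, K, height, width):
    """Project map points (already occlusion-filtered and expressed in the
    state's reference frame) onto the image plane to form a sparse depth map.

    points_s : (N, 3) array of 3D points in the camera/state frame
    K        : (3, 3) camera projection (intrinsic) matrix
    """
    depth_map = np.zeros((height, width), dtype=np.float32)

    # Keep only points in front of the camera (positive depth).
    points_s = points_s[points_s[:, 2] > 0]

    # Homogeneous pixel coordinates: [p_x, p_y, c]^T = K * p  (Eq. 4).
    proj = (K @ points_s.T).T
    u = np.ceil(proj[:, 0] / proj[:, 2]).astype(int)   # column index
    v = np.ceil(proj[:, 1] / proj[:, 2]).astype(int)   # row index
    depth = points_s[:, 2]                             # [0 0 1] * p

    # Discard projections that fall outside the image bounds.
    valid = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, depth = u[valid], v[valid], depth[valid]

    # If several points land on the same pixel, keep the nearest one:
    # write far points first so nearer points overwrite them.
    order = np.argsort(-depth)
    depth_map[v[order], u[order]] = depth[order]
    return depth_map
```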

Figure 3: Architecture of our deep neural network for estimating the translation and rotation errors as well as the parameters of the covariance matrix. The translation and rotation errors are determined using CMRNet [15], which employs correlation layers [40] for comparing feature representations of the camera image and the local depth map. Using a similar architecture, we design CovarianceNet, which produces the parameters of the covariance matrix associated with the translation error output.

5.2 DNN Architecture

We use a DNN to estimate the position error $\Delta\mathbf{x}_{t}$ and the associated covariance matrix $\Sigma_{t}$ by implicitly identifying and comparing the positions of corresponding features in the camera image $I_{t}$ and the local depth map $L(\mathbf{s}_{t},\mathcal{M})$ associated with the state estimate $\mathbf{s}_{t}$.

The architecture of our DNN is given in Fig. 3. Our DNN comprises two separate modules, one for estimating the position error $\Delta\mathbf{x}_{t}$ and the other for the parameters of the covariance matrix $\Sigma_{t}$. The first module for estimating the position error $\Delta\mathbf{x}_{t}$ is based on CMRNet [15]. CMRNet was originally proposed as an algorithm to iteratively determine the position and orientation of a vehicle using a camera image and a 3D LiDAR map, starting from a provided initial state. For determining the position error $\Delta\mathbf{x}_{t}$ using CMRNet, we use the state estimate $\mathbf{s}_{t}$ as the provided initial state and use the corresponding DNN translation error $\Delta\tilde{\mathbf{x}}_{t}$ and rotation error $\Delta\tilde{\mathbf{r}}_{t}$ outputs for transforming the state $\mathbf{s}_{t}$ towards the true state $\mathbf{s}^{*}_{t}$. Formally, given any state $\mathbf{s}$ and camera image $I_{t}$ at time $t$, the translation error $\Delta\tilde{\mathbf{x}}$ and rotation error $\Delta\tilde{\mathbf{r}}$ are expressed as

$$\Delta\tilde{\mathbf{x}},\Delta\tilde{\mathbf{r}}=\textrm{CMRNet}(I_{t},L(\mathbf{s},\mathcal{M})). \quad (5)$$

CMRNet estimates the rotation error $\Delta\tilde{\mathbf{r}}$ as a unit quaternion. Furthermore, the architecture determines both the translation error $\Delta\tilde{\mathbf{x}}$ and rotation error $\Delta\tilde{\mathbf{r}}$ in the reference frame of the state $\mathbf{s}$. Since the protection levels depend on the position error $\Delta\mathbf{x}$ in the reference frame from which the camera image $I_{t}$ is captured (the vehicle reference frame), we transform the translation error $\Delta\tilde{\mathbf{x}}$ to the vehicle reference frame by rotating it with the inverse of $\Delta\tilde{\mathbf{r}}$

$$\Delta\mathbf{x}=-\tilde{R}^{\top}\cdot\Delta\tilde{\mathbf{x}}, \quad (6)$$

where $\tilde{R}$ is the $3\times 3$ rotation matrix corresponding to the rotation error quaternion $\Delta\tilde{\mathbf{r}}$.

In the second module, we determine the covariance matrix $\Sigma$ associated with $\Delta\mathbf{x}$ by first estimating the covariance matrix $\tilde{\Sigma}$ associated with the translation error $\Delta\tilde{\mathbf{x}}$ obtained from CMRNet and then transforming it to the vehicle reference frame using $\Delta\tilde{\mathbf{r}}$.

We model the covariance matrix $\tilde{\Sigma}$ by following a similar approach to [37]. Since the covariance matrix is both symmetric and positive-definite, we consider the decomposition of $\tilde{\Sigma}$ into diagonal standard deviations $\boldsymbol{\sigma}=[\sigma_{1},\sigma_{2},\sigma_{3}]$ and correlation coefficients $\boldsymbol{\eta}=[\eta_{21},\eta_{31},\eta_{32}]$

$$\begin{aligned} [\tilde{\Sigma}]_{ii} &= \sigma_{i}^{2} \\ [\tilde{\Sigma}]_{ij} &= [\tilde{\Sigma}]_{ji} = \eta_{ij}\sigma_{i}\sigma_{j}, \end{aligned} \quad (7)$$

where $i,j\in\{1,2,3\}$ and $j<i$. We estimate these terms using our second DNN module (referred to as CovarianceNet), which has a similar network structure to CMRNet, but with $256$ and $6$ artificial neurons in the last two fully connected layers to prevent overfitting. For stable training, CovarianceNet produces the logarithm of the standard deviation output, which is converted to the standard deviation by taking the exponent. Additionally, we use the tanh function to scale the correlation coefficient outputs $\boldsymbol{\eta}$ in CovarianceNet between $\pm 1$. Formally, given a vehicle state $\mathbf{s}$ and camera image $I_{t}$ at time $t$, the standard deviations $\boldsymbol{\sigma}$ and correlation coefficients $\boldsymbol{\eta}$ are approximated as

$$\boldsymbol{\sigma},\boldsymbol{\eta}=\textrm{CovarianceNet}(I_{t},L(\mathbf{s},\mathcal{M})). \quad (8)$$

Using the $\tilde{\Sigma}$ constructed from the obtained $\boldsymbol{\sigma},\boldsymbol{\eta}$, we obtain the covariance matrix $\Sigma$ associated with $\Delta\mathbf{x}$ as

$$\Sigma=\tilde{R}^{\top}\cdot\tilde{\Sigma}\cdot\tilde{R}. \quad (9)$$

We keep the aleatoric uncertainty restricted to position domain errors in this work for simplicity, and thus treat $\Delta\tilde{\mathbf{r}}$ as a point estimate. The impact of errors in estimating $\Delta\tilde{\mathbf{r}}$ on protection levels is taken into consideration as epistemic uncertainty, and is discussed in more detail in Sections V.5 and V.7.

The feature extraction modules in CovarianceNet and CMRNet are separate since the two tasks are complementary: for estimating position error, the DNN must learn features that are robust to noise in the inputs while the variance in the estimated position error depends on the noise itself.
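
To make Eqs. (7)-(9) concrete, here is a small sketch (ours, with hypothetical variable names) that assembles the covariance matrix from the raw CovarianceNet outputs and rotates it into the vehicle reference frame:

```python
import numpy as np

def assemble_covariance(log_sigma, eta_raw, R_tilde):
    """Build the covariance of Eq. (7) and rotate it per Eq. (9).

    log_sigma : (3,) log standard deviations output by CovarianceNet
    eta_raw   : (3,) raw correlation outputs, squashed by tanh to (-1, 1)
    R_tilde   : (3, 3) rotation matrix of the estimated rotation error
    """
    sigma = np.exp(log_sigma)          # standard deviations (always positive)
    eta = np.tanh(eta_raw)             # correlation coefficients eta_21, eta_31, eta_32

    cov = np.diag(sigma ** 2)          # diagonal entries: sigma_i^2
    pairs = [(1, 0), (2, 0), (2, 1)]   # (i, j) index pairs with j < i
    for (i, j), e in zip(pairs, eta):
        cov[i, j] = cov[j, i] = e * sigma[i] * sigma[j]

    # Eq. (9): transform the covariance into the vehicle reference frame.
    return R_tilde.T @ cov @ R_tilde
```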

5.3 Loss Functions

The loss function for training the DNN must penalize position error outputs that differ from the corresponding ground truth present in the dataset, as well as penalize covariance that overestimates or underestimates the uncertainty in the position error predictions. Furthermore, the loss must incentivize the DNN to extract useful features from the camera image and local map inputs for predicting the position error. Hence, we consider three additive components in our loss function $\mathcal{L}(\cdot)$

$$\mathcal{L}=\alpha_{\textrm{Huber}}\mathcal{L}_{\textrm{Huber}}(\Delta\tilde{\mathbf{x}}^{*},\Delta\tilde{\mathbf{x}})+\alpha_{\textrm{MLE}}\mathcal{L}_{\textrm{MLE}}(\Delta\tilde{\mathbf{x}}^{*},\Delta\tilde{\mathbf{x}},\tilde{\Sigma})+\alpha_{\textrm{Ang}}\mathcal{L}_{\textrm{Ang}}(\Delta\tilde{\mathbf{r}}^{*},\Delta\tilde{\mathbf{r}}), \quad (10)$$

where

  1. $\Delta\tilde{\mathbf{x}}^{*},\Delta\tilde{\mathbf{r}}^{*}$ denote the vector-valued translation and rotation errors in the reference frame of the state estimate $\mathbf{s}$ to the unknown true state $\mathbf{s}^{*}$

  2. $\mathcal{L}_{\textrm{Huber}}(\cdot)$ denotes the Huber loss function [41]

  3. $\mathcal{L}_{\textrm{MLE}}(\cdot)$ denotes the loss function for the maximum likelihood estimation of the position error $\Delta\mathbf{x}$ and covariance $\tilde{\Sigma}$

  4. $\mathcal{L}_{\textrm{Ang}}(\cdot)$ denotes the quaternion angular distance from [15]

  5. $\alpha_{\textrm{Huber}},\alpha_{\textrm{MLE}},\alpha_{\textrm{Ang}}$ are coefficients for weighting each loss term.

We employ the Huber loss $\mathcal{L}_{\textrm{Huber}}(\cdot)$ and quaternion angular distance $\mathcal{L}_{\textrm{Ang}}(\cdot)$ terms from [15]. The Huber loss term $\mathcal{L}_{\textrm{Huber}}(\cdot)$ penalizes the translation error output $\Delta\tilde{\mathbf{x}}$ of the DNN

$$\begin{aligned} \mathcal{L}_{\textrm{Huber}}(\Delta\tilde{\mathbf{x}}^{*},\Delta\tilde{\mathbf{x}}) &= \sum_{X=x,y,z}D_{\textrm{Huber}}(\Delta\tilde{X}^{*},\Delta\tilde{X}) \\ D_{\textrm{Huber}}(a^{*},a) &= \begin{cases}\frac{1}{2}(a-a^{*})^{2}&\textrm{for }|a-a^{*}|\leq\delta\\ \delta\cdot\left(|a-a^{*}|-\frac{1}{2}\delta\right)&\textrm{otherwise,}\end{cases} \end{aligned} \quad (11)$$

where $\delta$ is a hyperparameter for adjusting the penalty assignment to small error values. In this paper, we set $\delta=1$. Unlike the more common mean squared error, the penalty assigned to higher error values is linear in the Huber loss instead of quadratic. Thus, the Huber loss is more robust to outliers and leads to more stable training as compared with the squared error. The quaternion angular distance term $\mathcal{L}_{\textrm{Ang}}(\cdot)$ penalizes the rotation error output $\Delta\tilde{\mathbf{r}}$ from CMRNet

$$\begin{aligned} \mathcal{L}_{\textrm{Ang}}(\Delta\tilde{\mathbf{r}}^{*},\Delta\tilde{\mathbf{r}}) &= D_{\textrm{Ang}}(\Delta\tilde{\mathbf{r}}^{*}\times\Delta\tilde{\mathbf{r}}^{-1}) \\ D_{\textrm{Ang}}(\mathbf{q}) &= \operatorname{atan2}\left(\sqrt{q_{2}^{2}+q_{3}^{2}+q_{4}^{2}},\,|q_{1}|\right), \end{aligned} \quad (12)$$

where

  1. $q_{i}$ denotes the $i$th element in the quaternion $\mathbf{q}$

  2. $\Delta\mathbf{r}^{-1}$ denotes the inverse of the quaternion $\Delta\mathbf{r}$

  3. $\mathbf{q}\times\mathbf{r}$ here denotes element-wise multiplication of the quaternions $\mathbf{q}$ and $\mathbf{r}$

  4. $\operatorname{atan2}(\cdot)$ is the two-argument version of the arctangent function.

Including the quaternion angular distance term $\mathcal{L}_{\textrm{Ang}}(\cdot)$ in the loss function incentivizes the DNN to learn features that are relevant to the geometry between the camera image and the local depth map. Hence, it provides additional supervision to the DNN training as a multi-task objective [42], and is important for the stability and speed of the training process.

The maximum likelihood loss term $\mathcal{L}_{\textrm{MLE}}(\cdot)$ depends on both the translation error $\Delta\tilde{\mathbf{x}}$ and the covariance matrix $\tilde{\Sigma}$ estimated from the DNN. The loss function is analogous to the negative log-likelihood of the Gaussian distribution

$$\mathcal{L}_{\textrm{MLE}}(\Delta\tilde{\mathbf{x}}^{*},\Delta\tilde{\mathbf{x}},\tilde{\Sigma})=\frac{1}{2}\log|\tilde{\Sigma}|+\frac{1}{2}(\Delta\tilde{\mathbf{x}}^{*}-\Delta\tilde{\mathbf{x}})^{\top}\cdot\tilde{\Sigma}^{-1}\cdot(\Delta\tilde{\mathbf{x}}^{*}-\Delta\tilde{\mathbf{x}}). \quad (13)$$

If the covariance output from the DNN has small values, the corresponding translation error is penalized much more than a translation error corresponding to a large-valued covariance. Hence, the maximum likelihood loss term $\mathcal{L}_{\textrm{MLE}}(\cdot)$ incentivizes the DNN to output a small covariance only when the corresponding translation error output has high confidence, and to output a large covariance otherwise.
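
A PyTorch-style sketch of the combined loss in Eq. (10) is given below. It is a simplified illustration under our own assumptions: the quaternion helper assumes a scalar-first $[q_1, q_2, q_3, q_4]$ ordering, PyTorch's built-in smooth L1 loss stands in for the Huber term with $\delta = 1$, and the weighting coefficients default to 1:

```python
import torch
import torch.nn.functional as F

def quaternion_angular_distance(q_true, q_pred):
    # D_Ang of Eq. (12): angle of the relative rotation between two unit
    # quaternions, assumed scalar-first ([q1, q2, q3, q4]).
    q_pred_inv = q_pred * torch.tensor([1.0, -1.0, -1.0, -1.0])
    w1, v1 = q_true[..., :1], q_true[..., 1:]
    w2, v2 = q_pred_inv[..., :1], q_pred_inv[..., 1:]
    # Hamilton product of q_true and q_pred^{-1}.
    w = w1 * w2 - (v1 * v2).sum(-1, keepdim=True)
    v = w1 * v2 + w2 * v1 + torch.cross(v1, v2, dim=-1)
    return torch.atan2(v.norm(dim=-1), w.squeeze(-1).abs())

def total_loss(dx_true, dx_pred, cov_pred, q_true, q_pred,
               a_huber=1.0, a_mle=1.0, a_ang=1.0):
    # Huber term of Eq. (11), summed over the x, y, z components (delta = 1).
    huber = F.smooth_l1_loss(dx_pred, dx_true, reduction="sum")
    # Negative Gaussian log-likelihood term of Eq. (13).
    diff = (dx_true - dx_pred).unsqueeze(-1)
    mle = 0.5 * torch.logdet(cov_pred) + 0.5 * (
        diff.transpose(-1, -2) @ torch.linalg.inv(cov_pred) @ diff).squeeze()
    # Quaternion angular distance term of Eq. (12).
    ang = quaternion_angular_distance(q_true, q_pred)
    return a_huber * huber + a_mle * mle.mean() + a_ang * ang.mean()
```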

5.4 Multiple Candidate State Selection

To assess the uncertainty in the DNN-based position error estimation process as well as the uncertainty from environmental factors, we evaluate the DNN output at $N_{C}$ candidate states $\{\mathbf{s}^{1}_{t},\ldots,\mathbf{s}^{N_{C}}_{t}\}$ in the neighborhood of the state estimate $\mathbf{s}_{t}$.

For selecting the candidate states $\{\mathbf{s}^{1}_{t},\ldots,\mathbf{s}^{N_{C}}_{t}\}$, we randomly generate multiple values of translation offset $\{\mathbf{t}^{1},\ldots,\mathbf{t}^{N_{C}}\}$ and rotation offset $\{\mathbf{r}^{1},\ldots,\mathbf{r}^{N_{C}}\}$ about the state estimate $\mathbf{s}_{t}$, where $N_{C}$ is the total number of selected candidate states. The $i$th translation offset $\mathbf{t}^{i}\in\mathbb{R}^{3}$ denotes translation in the $x$, $y$ and $z$ dimensions and is sampled from a uniform probability distribution between a specified range $\pm t_{max}$ in each dimension. Similarly, the $i$th rotation offset $\mathbf{r}^{i}\in\textrm{SU}(2)$ is obtained by uniformly sampling between $\pm r_{max}$ angular deviations about each axis and converting the resulting rotation to a quaternion. The $i$th candidate state $\mathbf{s}^{i}_{t}$ is generated by rotating and translating the state estimate $\mathbf{s}_{t}$ by $\mathbf{r}^{i}$ and $\mathbf{t}^{i}$, respectively. Corresponding to each candidate state $\mathbf{s}^{i}_{t}$, we generate a local depth map $L(\mathbf{s}^{i}_{t},\mathcal{M})$ using the procedure laid out in Section V.1.
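
A sketch of this candidate-state sampling, using SciPy's rotation utilities purely for illustration (the function itself and the dictionary keys are our own; $t_{max}$, $r_{max}$ and $N_C$ are the parameters defined above):

```python
import numpy as np
from scipy.spatial.transform import Rotation

def sample_candidate_states(x_est, q_est, n_c=24, t_max=1.0, r_max_deg=5.0,
                            rng=None):
    """Randomly select N_C candidate states around the state estimate.

    x_est : (3,) estimated position
    q_est : scipy Rotation representing the estimated orientation
    """
    rng = np.random.default_rng() if rng is None else rng
    candidates = []
    for _ in range(n_c):
        # Uniform translation offset in x, y, z within +/- t_max.
        t_i = rng.uniform(-t_max, t_max, size=3)
        # Uniform angular deviations about each axis within +/- r_max,
        # converted to a rotation (quaternion) offset.
        angles = rng.uniform(-r_max_deg, r_max_deg, size=3)
        r_i = Rotation.from_euler("xyz", angles, degrees=True)
        # Candidate state: rotate and translate the state estimate.
        candidates.append({"position": x_est + t_i,
                           "orientation": r_i * q_est,
                           "t_offset": t_i,
                           "r_offset": r_i})
    return candidates
```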

5.5 Linear Transformation of Position Errors

Using each local depth map $L(\mathbf{s}^{i}_{t},\mathcal{M})$ and the camera image $I_{t}$ for the $i$th candidate state $\mathbf{s}^{i}_{t}$ as inputs to the DNN in Section V.2, we evaluate the candidate state position error $\Delta\mathbf{x}^{i}_{t}$ and covariance matrix $\Sigma^{i}_{t}$. From the known translation offset $\mathbf{t}^{i}$ between the candidate state $\mathbf{s}^{i}_{t}$ and the state estimate $\mathbf{s}_{t}$ and the DNN-based rotation error $\Delta\tilde{\mathbf{r}}_{t}$ in $\mathbf{s}_{t}$, we compute the transformation matrix $H_{\mathbf{s}^{i}_{t}\to\mathbf{s}_{t}}$ for converting the candidate state position error $\Delta\mathbf{x}^{i}_{t}$ to the state estimate position error $\Delta\mathbf{x}_{t}$ in the vehicle reference frame

$$H_{\mathbf{s}^{i}_{t}\to\mathbf{s}_{t}}=\left[\begin{matrix}I_{3\times 3}&-\tilde{R}_{t}^{\top}\mathbf{t}^{i}\end{matrix}\right], \quad (14)$$

where $I_{3\times 3}$ denotes the identity matrix and $\tilde{R}_{t}$ is the $3\times 3$ rotation matrix computed from the DNN-based rotation error $\Delta\tilde{\mathbf{r}}_{t}$ between the state estimate $\mathbf{s}_{t}$ and the unknown true state $\mathbf{s}^{*}_{t}$. Note that the rotation offset $\mathbf{r}^{i}$ is not used in the transformation, since we are only concerned with the position errors from the true state $\mathbf{s}^{*}_{t}$ to the state estimate $\mathbf{s}_{t}$, which are invariant to the orientation of the candidate state $\mathbf{s}^{i}_{t}$. Using the transformation matrix $H_{\mathbf{s}^{i}_{t}\to\mathbf{s}_{t}}$, we obtain the $i$th sample of the state estimate position error $\Delta\mathbf{x}_{t}^{(i)}$

$$\Delta\mathbf{x}_{t}^{(i)}=H_{\mathbf{s}^{i}_{t}\to\mathbf{s}_{t}}\cdot\left[\begin{matrix}\Delta\mathbf{x}^{i}_{t}&1\end{matrix}\right]^{\top}=\Delta\mathbf{x}^{i}_{t}-\tilde{R}_{t}^{\top}\mathbf{t}^{i}. \quad (15)$$

We use parentheses in the notation $\Delta\mathbf{x}_{t}^{(i)}$ for the transformed samples of the position error between the true state $\mathbf{s}^{*}_{t}$ and the state estimate $\mathbf{s}_{t}$ to differentiate them from the position error $\Delta\mathbf{x}^{i}_{t}$ between $\mathbf{s}^{*}_{t}$ and the candidate state $\mathbf{s}^{i}_{t}$. Next, we modify the candidate state covariance matrix $\Sigma^{i}_{t}$ to account for uncertainty in the DNN-based rotation error $\Delta\tilde{\mathbf{r}}_{t}$. The resulting covariance matrix $\Sigma^{(i)}_{t}$ in terms of the covariance matrix $\Sigma^{i}_{t}$ for $\Delta\mathbf{x}^{i}_{t}$, $\tilde{R}_{t}$ and $\mathbf{t}^{i}$ is

$$\Sigma^{(i)}_{t}=\Sigma^{i}_{t}+\textrm{Var}[\tilde{R}_{t}^{\top}\mathbf{t}^{i}]. \quad (16)$$

Assuming small errors in determining the true rotation offsets between the state estimate $\mathbf{s}_{t}$ and the true state $\mathbf{s}^{*}_{t}$, we consider the random variable $R^{\prime}\tilde{R}_{t}^{\top}\mathbf{t}^{i}$, where $R^{\prime}$ represents the random rotation matrix corresponding to small angular deviations [43]. Using $R^{\prime}\tilde{R}_{t}^{\top}\mathbf{t}^{i}$, we approximate the covariance matrix $\Sigma^{(i)}_{t}$ as

$$\begin{aligned} \Sigma^{(i)}_{t} &\approx \Sigma^{i}_{t}+\mathbb{E}\left[(R^{\prime}-I)(\tilde{R}_{t}^{\top}\mathbf{t}^{i})(\tilde{R}_{t}^{\top}\mathbf{t}^{i})^{\top}(R^{\prime}-I)^{\top}\right] \\ [\Sigma^{(i)}_{t}]_{i^{\prime}j^{\prime}} &\approx [\Sigma^{i}_{t}]_{i^{\prime}j^{\prime}}+\mathbb{E}\left[(\mathbf{r}^{\prime}_{i^{\prime}})^{\top}(\tilde{R}_{t}^{\top}\mathbf{t}^{i})(\tilde{R}_{t}^{\top}\mathbf{t}^{i})^{\top}(\mathbf{r}^{\prime}_{j^{\prime}})\right] \\ &= [\Sigma^{i}_{t}]_{i^{\prime}j^{\prime}}+\mathrm{Tr}\left((\tilde{R}_{t}^{\top}\mathbf{t}^{i})(\tilde{R}_{t}^{\top}\mathbf{t}^{i})^{\top}\mathbb{E}\left[(\mathbf{r}^{\prime}_{i^{\prime}})(\mathbf{r}^{\prime}_{j^{\prime}})^{\top}\right]\right) \\ &= [\Sigma^{i}_{t}]_{i^{\prime}j^{\prime}}+\mathrm{Tr}\left((\tilde{R}_{t}^{\top}\mathbf{t}^{i})(\tilde{R}_{t}^{\top}\mathbf{t}^{i})^{\top}Q_{i^{\prime}j^{\prime}}\right), \end{aligned} \quad (17)$$

where $(\mathbf{r}^{\prime}_{i^{\prime}})^{\top}$ represents the $i^{\prime}$th row vector in $R^{\prime}-I$. Since errors in $\tilde{R}$ depend on the DNN output, we specify $R^{\prime}$ through the empirical distribution of the angular deviations in $\tilde{R}$ as observed for the trained DNN on the training and validation data, and precompute the expectation $Q_{i^{\prime}j^{\prime}}$ for each $(i^{\prime},j^{\prime})$ pair.

The samples of the state estimate position error $\{\Delta\mathbf{x}_{t}^{(1)},\ldots,\Delta\mathbf{x}_{t}^{(N_{C})}\}$ represent both inaccuracy in the DNN estimation as well as uncertainties due to environmental factors. If the DNN approximation fails at the input corresponding to the state estimate $\mathbf{s}_{t}$, the estimated position errors at candidate states would lead to a wide range of different values for the state estimate position errors. Similarly, if the environment map $\mathcal{M}$ near the state estimate $\mathbf{s}_{t}$ contains repetitive features, the position errors computed from candidate states would be different and hence indicate high uncertainty.
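
The following sketch (ours) implements the transformation of Eq. (15) and the covariance inflation of Eq. (17) for a single candidate state, assuming the $Q_{i^{\prime}j^{\prime}}$ expectations have been precomputed and stacked into a 4-D array `Q`:

```python
import numpy as np

def transform_candidate_error(dx_cand, cov_cand, R_tilde, t_offset, Q):
    """Map a candidate-state error and covariance to the state estimate.

    dx_cand  : (3,) DNN position error for the candidate state (vehicle frame)
    cov_cand : (3, 3) DNN covariance for the candidate state
    R_tilde  : (3, 3) rotation matrix of the DNN rotation error at the estimate
    t_offset : (3,) known translation offset of the candidate from the estimate
    Q        : (3, 3, 3, 3) precomputed E[r'_{i'} r'_{j'}^T] terms of Eq. (17)
    """
    # Eq. (15): sample of the state-estimate position error.
    dx_sample = dx_cand - R_tilde.T @ t_offset

    # Eq. (17): inflate the covariance for uncertainty in the rotation error.
    v = R_tilde.T @ t_offset
    outer = np.outer(v, v)
    cov_sample = cov_cand.copy()
    for i in range(3):
        for j in range(3):
            cov_sample[i, j] += np.trace(outer @ Q[i, j])
    return dx_sample, cov_sample
```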

5.6 Outlier Weights

Since the candidate states $\{\mathbf{s}^{1}_{t},\ldots,\mathbf{s}^{N_{C}}_{t}\}$ are selected randomly, some position error samples may correspond to local depth map and camera image pairs for which the DNN performs poorly. Thus, we compute outlier weights $\{\mathbf{w}^{(1)}_{t},\ldots,\mathbf{w}^{(N_{C})}_{t}\}$ corresponding to the position error samples $\{\Delta\mathbf{x}_{t}^{(1)},\ldots,\Delta\mathbf{x}_{t}^{(N_{C})}\}$ to mitigate the effect of these erroneous position error values in determining the protection levels. We compute outlier weights in each of the $x$, $y$, and $z$-dimensions separately, since the DNN approximation might not necessarily fail in all of its outputs. An example of this scenario is when the input camera image and local map contain features such as building edges that can be used to robustly determine errors along certain directions but not others.

For computing the outlier weights $\mathbf{w}_{t}^{(i)}=[w^{(i)}_{x,t},w^{(i)}_{y,t},w^{(i)}_{z,t}]$ associated with the $i$th position error value $\Delta\mathbf{x}_{t}^{(i)}=[\Delta x_{t}^{(i)},\Delta y_{t}^{(i)},\Delta z_{t}^{(i)}]$, we employ the robust Z-score based outlier detection technique [31]. The robust Z-score is used in a variety of anomaly detection approaches due to its resilience to outliers [44]. We apply the following operations in each dimension $X=x,y,$ and $z$:

  1. We compute the Median Absolute Deviation statistic [31] ${M\negthinspace AD}_{X}$ using all position error values $\{\Delta X_{t}^{(1)},\ldots,\Delta X_{t}^{(N_{C})}\}$

$${M\negthinspace AD}_{X}=\operatorname{median}\left(|\Delta X_{t}^{(i)}-\operatorname{median}(\Delta X_{t}^{(i)})|\right). \quad (18)$$

  2. Using the statistic ${M\negthinspace AD}_{X}$, we compute the robust Z-score $\mathcal{Z}^{(i)}_{X}$ for each position error value $\Delta X_{t}^{(i)}$

$$\mathcal{Z}^{(i)}_{X}=\frac{|\Delta X_{t}^{(i)}-\operatorname{median}(\Delta X_{t}^{(i)})|}{{M\negthinspace AD}_{X}}. \quad (19)$$

     The robust Z-score $\mathcal{Z}^{(i)}_{X}$ is high if the position error $\Delta\mathbf{x}^{(i)}$ deviates from the median error by a large value when compared with the median deviation value.

  3. We compute the outlier weights $\{w^{(1)}_{X},\ldots,w^{(N_{C})}_{X}\}$ from the robust Z-scores $\{\mathcal{Z}^{(1)}_{X},\ldots,\mathcal{Z}^{(N_{C})}_{X}\}$ by applying the softmax operation [45] such that the sum of the weights is unity

$$w^{(i)}_{X,t}=\frac{e^{-\gamma\cdot\mathcal{Z}^{(i)}_{X}}}{\sum_{j=1}^{N_{C}}e^{-\gamma\cdot\mathcal{Z}^{(j)}_{X}}}, \quad (20)$$

     where $\gamma$ denotes the scaling coefficient in the softmax function. We set $\gamma=0.6745$ as the approximate inverse of the standard normal distribution evaluated at $3/4$ to make the scaling in the statistic consistent with the standard deviation of a normal distribution [31]. A small value of the outlier weight $w^{(i)}_{X,t}$ indicates that the position error $\Delta X_{t}^{(i)}$ is an outlier.

For brevity, we extract the diagonal variances associated with each dimension for all position error samples

$$\begin{aligned} (\sigma^{2}_{x,t})^{(i)} &= [\Sigma^{(i)}_{t}]_{11} \\ (\sigma^{2}_{y,t})^{(i)} &= [\Sigma^{(i)}_{t}]_{22} \\ (\sigma^{2}_{z,t})^{(i)} &= [\Sigma^{(i)}_{t}]_{33}. \end{aligned} \quad (21)$$
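
For one dimension, the outlier weighting of Eqs. (18)-(20) reduces to a few lines; the sketch below is our own and adds a small guard against a zero MAD, which the text above does not discuss:

```python
import numpy as np

def outlier_weights(errors, gamma=0.6745, eps=1e-9):
    """Robust Z-score based softmax weights for one dimension (x, y, or z).

    errors : (N_C,) position error samples in that dimension
    """
    med = np.median(errors)
    mad = np.median(np.abs(errors - med)) + eps      # Eq. (18), guarded against 0
    z = np.abs(errors - med) / mad                   # Eq. (19): robust Z-scores
    w = np.exp(-gamma * z)
    return w / w.sum()                               # Eq. (20): weights sum to 1
```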

5.7 Probability Distribution of Position Error

We construct a probability distribution in each of the $X=x,y$ and $z$-dimensions from the previously obtained samples of position errors $\Delta X^{(i)}_{t}$, variances $(\sigma^{2}_{X,t})^{(i)}$ and outlier weights $w^{(i)}_{X,t}$. We model the probability distribution using the Gaussian Mixture Model (GMM) distribution [32]

$$\mathbb{P}(\rho_{X,t})=\sum_{i=1}^{N_{C}}w^{(i)}_{X,t}\,\mathcal{N}\left(\Delta X_{t}^{(i)},(\sigma^{2}_{X,t})^{(i)}\right), \quad (22)$$

where

  1. $\rho_{X,t}$ denotes the position error random variable

  2. $\mathcal{N}(\mu,\sigma^{2})$ is the Gaussian distribution with mean $\mu$ and variance $\sigma^{2}$.

The probability distributions $\mathbb{P}(\rho_{x,t})$, $\mathbb{P}(\rho_{y,t})$ and $\mathbb{P}(\rho_{z,t})$ incorporate both the aleatoric uncertainty from the DNN-based covariance and the epistemic uncertainty from the multiple DNN evaluations associated with different candidate states. Both the position error and the covariance matrix depend on the rotation error point estimate from CMRNet for transforming the error values to the vehicle reference frame. Since each DNN evaluation for a candidate state estimates the rotation error independently, the epistemic uncertainty incorporates the effects of errors in the DNN-based estimation of both rotation and translation. The epistemic uncertainty is reflected in the multiple GMM components and their weight coefficients, which represent the different possible position error values that may arise from the same camera image measurement and the environment map. The aleatoric uncertainty is present as the variance in each possible value of the position error represented by the individual components.

5.8 Protection Levels

We compute the protection levels along the lateral, longitudinal and vertical directions using the probability distributions obtained in the previous section. Since the position errors are in the vehicle reference frame, the $x$, $y$ and $z$-dimensions coincide with the lateral, longitudinal and vertical directions, respectively. First, we obtain the cumulative distribution function $\textrm{CDF}(\cdot)$ for each probability distribution

$$\textrm{CDF}(\rho_{X,t})=\sum_{i=1}^{N_{C}}w^{(i)}_{X,t}\,\Phi\left(\frac{\rho_{X,t}-\Delta X_{t}^{(i)}}{(\sigma_{X,t})^{(i)}}\right), \quad (23)$$

where $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution. Then, for a specified value of the integrity risk $IR$, we compute the protection level $PL$ in the lateral, longitudinal and vertical directions from equation (1) using the CDF as the probability distribution. For the numerical optimization, we employ a simple interval-halving line search, i.e., the bisection method [46]. To account for both positive and negative errors, we perform the optimization using both the CDF (supremum) and $1-\textrm{CDF}$ (infimum) with $IR/2$ as the integrity risk, and use the maximum absolute value as the protection level.

The computed protection levels account for heavy tails in the GMM probability distribution of the position error, which arise because of the different possible values of the position error that can be computed from the available camera measurements and environment map. Our method computes large protection levels when many different values of the position error are equally probable from the measurements, resulting in larger tail probabilities in the GMM, and small protection levels only if the uncertainty from both aleatoric and epistemic sources is small.
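
A sketch of the numerical evaluation described above: the GMM CDF of Eq. (23) is inverted by bisection at $IR/2$ and $1-IR/2$, and the larger absolute bound is returned. The search bracket and tolerance below are our own assumptions:

```python
import numpy as np
from scipy.stats import norm

def gmm_cdf(rho, means, sigmas, weights):
    # Eq. (23): weighted sum of Gaussian CDFs.
    return np.sum(weights * norm.cdf((rho - means) / sigmas))

def invert_cdf(target, means, sigmas, weights, bracket=20.0, tol=1e-4):
    # Bisection (interval halving) for rho such that CDF(rho) = target.
    lo, hi = -bracket, bracket
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gmm_cdf(mid, means, sigmas, weights) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def protection_level(means, sigmas, weights, integrity_risk):
    # Allocate IR/2 to each tail and take the larger absolute bound.
    rho_hi = invert_cdf(1.0 - integrity_risk / 2, means, sigmas, weights)
    rho_lo = invert_cdf(integrity_risk / 2, means, sigmas, weights)
    return max(abs(rho_hi), abs(rho_lo))
```

For example, with $IR = 0.01$ the two bisection targets are the 0.005 and 0.995 quantiles of the mixture in that direction.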

6 Experimental Results

6.1 Real-World Driving Dataset

We use the KITTI visual odometry dataset [47] to evaluate the performance of the protection levels computed by our approach. The dataset was recorded around Karlsruhe, Germany over multiple driving sequences and contains images recorded by multiple on-board cameras, along with ground truth positions and orientations. Additionally, the dataset contains LiDAR point cloud measurements, which we use to generate the environment map corresponding to each sequence. Since our approach for computing protection levels just requires a monocular camera sensor, we use the images recorded by the left RGB camera in our experiments. We use sequences 00, 03, 05, 06, 07, 08 and 09 from the dataset based on the availability of a LiDAR environment map. We use sequence 00 for validation of our approach and the rest of the sequences are utilized in training our DNN. The experimental parameters are provided in Table 5.

6.2 LiDAR Environment Map

To construct a precise LiDAR point cloud map $\mathcal{M}$ of the environment, we exploit the openly available position and orientation values for the dataset computed via Simultaneous Localization and Mapping [4]. Similar to [15], we aggregate the LiDAR point clouds across all time instances. Then, we detect and remove sparse outliers within the aggregated point cloud by computing the Z-score [31] of each point in a 0.1 m local neighborhood, discarding points with a Z-score higher than 3. Finally, the remaining points are down-sampled into a voxel map of the environment $\mathcal{M}$ with a resolution of 0.1 m. The corresponding map for sequence 00 in the KITTI dataset is shown in Fig. 5. For storing large maps, we divide the LiDAR point cloud sequences into multiple overlapping parts and construct separate maps of roughly 500 Megabytes each.
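
A compact sketch of this map-building pipeline using Open3D is shown below; it is our own stand-in, and Open3D's statistical outlier filter (parameterized by a neighbor count and a standard-deviation ratio) only approximates the local Z-score test described above:

```python
import open3d as o3d

def build_voxel_map(scan_files, poses, voxel_size=0.1):
    """Aggregate pose-transformed LiDAR scans into a down-sampled voxel map.

    scan_files : list of point cloud file paths, one per time instance
    poses      : list of 4x4 pose matrices (e.g. from a SLAM solution)
    """
    aggregated = o3d.geometry.PointCloud()
    for path, pose in zip(scan_files, poses):
        scan = o3d.io.read_point_cloud(path)
        scan.transform(pose)                 # move the scan into the map frame
        aggregated += scan

    # Remove sparse outliers (stand-in for the 0.1 m local Z-score test).
    aggregated, _ = aggregated.remove_statistical_outlier(
        nb_neighbors=20, std_ratio=3.0)

    # Down-sample into a 0.1 m voxel grid.
    return aggregated.voxel_down_sample(voxel_size=voxel_size)
```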

6.3 DNN Training and Testing Datasets

We generate the training dataset for our DNN in two steps. First, we randomly select a state estimate $\mathbf{s}_{t}$ at time $t$ from within a 2 m translation and a $10^{\circ}$ rotation of the ground truth positions and orientations in each driving sequence. The translation and rotation used for generating the state estimate are utilized as the ground truth position error $\Delta\mathbf{x}^{*}_{t}$ and orientation error $\Delta\mathbf{r}^{*}_{t}$. Then, using the LiDAR map $\mathcal{M}$, we generate the local depth map $L(\mathbf{s}_{t},\mathcal{M})$ corresponding to the state estimate $\mathbf{s}_{t}$ and use it as the DNN input along with the camera image $I_{t}$ from the driving sequence data. The training dataset comprises camera images from 11455 different time instances, with the state estimate selected at runtime so as to have different state estimates for the same camera images in different epochs.

Similar to the data augmentation techniques described in [15], we

  1. Randomly change the contrast, saturation and brightness of images,

  2. Apply random rotations in the range of $\pm 5^{\circ}$ to both the camera images and the local depth maps,

  3. Horizontally mirror the camera image and compute the local depth map using a modified camera projection matrix.

All three of these data augmentation techniques are used in training CMRNet in the first half of the optimization process. However, for training CovarianceNet, we skip the contrast, saturation and brightness changes during the second half of the optimization so that the DNN can learn real-world noise features from camera images.

We generate the validation and test datasets from sequence 00 in the KITTI odometry dataset, which is not used for training. We follow a similar procedure to the one for generating the training dataset, except that we do not augment the data. The validation dataset comprises 100 randomly selected time instances from sequence 00, while the test dataset contains the remaining 4441 time instances in sequence 00.

Parameter                                               Value
Integrity risk IR                                       0.01
Candidate state maximum translation offset t_{max}      1.0 m
Candidate state maximum rotation offset r_{max}         5^{\circ}
Number of candidate states N_C                          24
Lateral alarm limit AL_{lat}                            0.85 m
Longitudinal alarm limit AL_{lon}                       1.50 m
Vertical alarm limit AL_{vert}                          1.47 m
Table 5: Experimental parameters

Figure 5: 3D LiDAR environment map from KITTI dataset sequence 00 [47].

6.4 Training Procedure

We train the DNN using stochastic gradient descent. Directly optimizing the maximum likelihood loss term \mathcal{L}_{\textrm{MLE}}(\cdot) might suffer from instability caused by the interdependence between the translation error \Delta\tilde{\mathbf{x}} and covariance \tilde{\Sigma} outputs [48]. Therefore, we employ the mean-variance split training strategy proposed in [48]: first, we set (\alpha_{\textrm{Huber}}=1, \alpha_{\textrm{MLE}}=1, \alpha_{\textrm{Ang}}=1) and optimize only the parameters of CMRNet until the validation error stops decreasing. Next, we set (\alpha_{\textrm{Huber}}=0, \alpha_{\textrm{MLE}}=1, \alpha_{\textrm{Ang}}=0) and optimize the parameters of CovarianceNet. We alternate between these two steps until the validation loss stops decreasing. Our DNN is implemented using the PyTorch library [49] and takes advantage of the open-source implementation available for CMRNet [15] as well as the available pretrained weights for initialization. Similar to CMRNet, all the layers in our DNN use the leaky ReLU activation function with a negative slope of 0.1. We train the DNN on a single NVIDIA Tesla P40 GPU with a batch size of 24 and a learning rate of 10^{-5}, selected via grid search.
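The following simplified PyTorch sketch illustrates one phase of the mean-variance split training. The forward signatures of `cmrnet` and `covariancenet`, the data loader format, and the omission of the angular loss term are illustrative simplifications of the procedure described above.

```python
# Minimal sketch of one phase of the alternating (mean-variance split)
# optimization, assuming PyTorch; names and signatures are placeholders.
import torch
import torch.nn.functional as F

def loss_terms(cmrnet, covariancenet, image, depth_map, err_gt):
    pred_err = cmrnet(image, depth_map)          # predicted translation error
    var = covariancenet(image, depth_map)        # predicted diagonal variance
    huber = F.smooth_l1_loss(pred_err, err_gt)
    # Gaussian negative log-likelihood with a diagonal covariance.
    nll = 0.5 * (torch.log(var) + (err_gt - pred_err) ** 2 / var).mean()
    return huber, nll

def train_phase(params, cmrnet, covariancenet, loader, a_huber, a_mle,
                lr=1e-5, epochs=1):
    optimizer = torch.optim.SGD(params, lr=lr)
    for _ in range(epochs):
        for image, depth_map, err_gt in loader:
            optimizer.zero_grad()
            huber, nll = loss_terms(cmrnet, covariancenet, image, depth_map, err_gt)
            (a_huber * huber + a_mle * nll).backward()
            optimizer.step()

# Alternate the two phases until the validation loss stops improving, e.g.:
#   train_phase(cmrnet.parameters(), ..., a_huber=1.0, a_mle=1.0)        # CMRNet step
#   train_phase(covariancenet.parameters(), ..., a_huber=0.0, a_mle=1.0) # CovarianceNet step
```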

6.5 Metrics

We evaluate the lateral, longitudinal and vertical protection levels computed by our approach using the following three metrics, with the subscript t dropped for brevity (a computational sketch of all three metrics follows the list):

  1. Bound gap measures the difference between the computed protection levels PL_{lat}, PL_{lon}, PL_{vert} and the true position error magnitude during nominal operations (protection level is less than the alarm limit and greater than the position error):

     BG_{lat} = \textrm{avg}(PL_{lat} - |\Delta x^{*}|)
     BG_{lon} = \textrm{avg}(PL_{lon} - |\Delta y^{*}|)
     BG_{vert} = \textrm{avg}(PL_{vert} - |\Delta z^{*}|),     (24)

     where

     1. BG_{lat}, BG_{lon} and BG_{vert} denote the bound gaps in the lateral, longitudinal and vertical dimensions, respectively,

     2. \textrm{avg}(\cdot) denotes the average computed over the test dataset instances for which the protection level is greater than the position error and less than the alarm limit.

     A small bound gap BG_{lat}, BG_{lon}, BG_{vert} is desirable, since it implies that the algorithm both accurately estimates the position error magnitude during nominal operations and has low uncertainty in its predictions. We only consider the bound gap for nominal operations, since the estimated position is declared unsafe when the protection level exceeds the alarm limit.

  2. Failure rate measures the fraction of time instances in the test data sequence for which the computed protection levels PL_{lat}, PL_{lon}, PL_{vert} are smaller than the true position error magnitude:

     FR_{lat} = \frac{1}{T_{\textrm{max}}}\sum_{t=1}^{T_{\textrm{max}}}\mathbbm{1}_{t}\left(PL_{lat}<|\Delta x^{*}|\right)
     FR_{lon} = \frac{1}{T_{\textrm{max}}}\sum_{t=1}^{T_{\textrm{max}}}\mathbbm{1}_{t}\left(PL_{lon}<|\Delta y^{*}|\right)
     FR_{vert} = \frac{1}{T_{\textrm{max}}}\sum_{t=1}^{T_{\textrm{max}}}\mathbbm{1}_{t}\left(PL_{vert}<|\Delta z^{*}|\right),     (25)

     where

     1. FR_{lat}, FR_{lon} and FR_{vert} denote the failure rates for the lateral, longitudinal and vertical protection levels, respectively,

     2. \mathbbm{1}_{t}(\cdot) denotes the indicator function evaluated using the protection level and true position error values at time t; it equals 1 if the event in its argument holds true, and 0 otherwise,

     3. T_{\textrm{max}} denotes the total time duration of the test sequence.

     The failure rates FR_{lat}, FR_{lon}, FR_{vert} should be consistent with the specified integrity risk IR to meet the safety requirements.

  3. False alarm rate is computed for specified alarm limits AL_{lat}, AL_{lon}, AL_{vert} in the lateral, longitudinal and vertical directions, and measures the fraction of time instances in the test data sequence for which the computed protection levels PL_{lat}, PL_{lon}, PL_{vert} exceed the alarm limit while the position error magnitude is within the alarm limit. We first define the following integrity events:

     \Omega_{lat,PL} = (PL_{lat} > AL_{lat})
     \Omega_{lat,PE} = (|\Delta x^{*}| > AL_{lat})
     \Omega_{lon,PL} = (PL_{lon} > AL_{lon})
     \Omega_{lon,PE} = (|\Delta y^{*}| > AL_{lon})
     \Omega_{vert,PL} = (PL_{vert} > AL_{vert})
     \Omega_{vert,PE} = (|\Delta z^{*}| > AL_{vert}).     (26)

     The complement of each event is denoted by \bar{\Omega}. Next, we define the counts of false alarms N_{X,FA}, true alarms N_{X,TA} and the number of times the position error exceeds the alarm limit N_{X,PE}, with X = lat, lon and vert:

     N_{X,FA} = \sum_{t=1}^{T_{\textrm{max}}}\mathbbm{1}_{t}\left(\Omega_{X,PL}\cap\bar{\Omega}_{X,PE}\right)
     N_{X,TA} = \sum_{t=1}^{T_{\textrm{max}}}\mathbbm{1}_{t}\left(\Omega_{X,PL}\cap\Omega_{X,PE}\right)
     N_{X,PE} = \sum_{t=1}^{T_{\textrm{max}}}\mathbbm{1}_{t}\left(\Omega_{X,PE}\right).     (27)

     Finally, we compute the false alarm rates FAR_{lat}, FAR_{lon}, FAR_{vert} after normalizing by the total number of position error magnitudes lying above and below the alarm limit AL:

     FAR_{X} = \frac{N_{X,FA}\cdot(T_{\textrm{max}}-N_{X,PE})}{N_{X,FA}\cdot(T_{\textrm{max}}-N_{X,PE})+N_{X,TA}\cdot N_{X,PE}}.     (28)
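The sketch below shows how the three metrics can be computed for one direction, assuming NumPy arrays of per-epoch protection levels, absolute position errors, and a scalar alarm limit; the names are illustrative.

```python
# Evaluate bound gap, failure rate and false alarm rate for one direction.
# `pl` and `pe` are arrays of protection levels and |position error| over the
# test set; `al` is the alarm limit for this direction.
import numpy as np

def evaluate_direction(pl, pe, al):
    # Nominal operation: PL bounds the error and stays below the alarm limit.
    nominal = (pl > pe) & (pl < al)
    bound_gap = np.mean(pl[nominal] - pe[nominal])            # Eq. (24)

    failure_rate = np.mean(pl < pe)                           # Eq. (25)

    false_alarm = (pl > al) & (pe <= al)                      # Eq. (26)-(27)
    true_alarm = (pl > al) & (pe > al)
    n_fa, n_ta = false_alarm.sum(), true_alarm.sum()
    n_pe, n_total = (pe > al).sum(), len(pe)
    denom = n_fa * (n_total - n_pe) + n_ta * n_pe
    far = n_fa * (n_total - n_pe) / denom if denom else 0.0   # Eq. (28)
    return bound_gap, failure_rate, far
```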

6.6 Results

Fig. 6 shows the lateral and longitudinal protection levels computed by our approach on two 200 s subsets of the test sequence. For clarity, protection levels are shown at every 5th time instance. Similarly, Fig. 7 shows the vertical protection levels along with the vertical position error magnitude for a subset of the test sequence. As can be seen from both figures, the computed protection levels successfully enclose the position error magnitudes at the majority of points (\sim 99\%) in the visualized subsequences. Furthermore, the vertical protection levels are visually closer to the position error than the lateral and longitudinal protection levels. This is due to the superior performance of the DNN in determining position errors along the vertical dimension, which is easier to learn since all camera images in the dataset are captured by a ground-based vehicle.

Figure 6: Lateral and longitudinal protection level results on the test sequence from the real-world dataset. We show protection levels for two subsets of the total sequence, computed at 5 s intervals. The protection levels successfully enclose the position errors in \sim 99\% of the cases.

Figure 7: Vertical protection level results on the test sequence from the real-world dataset. We show protection levels for a subset of the total sequence. The protection levels successfully enclose the position error magnitudes with a small bound gap.

Fig. 8 displays the integrity diagrams generated following the Stanford-ESA integrity diagram proposed for SBAS integrity [50]. The diagrams are generated from 15000 samples of protection levels corresponding to randomly selected state estimates and camera images within the test sequence. For the protection levels in each direction, we set the alarm limit (Table 5) based on the specifications suggested for mid-size vehicles in [10], beyond which the state estimate is declared unsafe to use. The lateral, longitudinal and vertical protection levels are greater than the position error magnitudes in \sim 99\% of cases, which is consistent with the specified integrity requirement. Furthermore, a large fraction of the failures lies in the region where the protection level is greater than the alarm limit, and thus the system is correctly identified as being under unsafe operation.

We conducted an ablation study to numerically evaluate the impact of our proposed epistemic uncertainty measure and outlier weighting method on the computed protection levels. We evaluated protection levels in three different cases: incorporating the DNN covariance, epistemic uncertainty and outlier weighting (VAR+EO); incorporating just the DNN covariance and epistemic uncertainty, with equal weights assigned to all position error samples (VAR+E); and using only the DNN covariance (VAR). For VAR, we constructed a Gaussian distribution using the DNN position error output and the diagonal variance entries in each dimension, and then computed protection levels from the inverse cumulative distribution function of this Gaussian corresponding to the specified integrity risk IR. Table 1 summarizes our results. Incorporating the epistemic uncertainty in computing protection levels improved the failure rate from 0.05 for the lateral, 0.05 for the longitudinal and 0.03 for the vertical protection levels to within 0.01 in all cases. This is because the covariance estimate from the DNN provides an overconfident measure of uncertainty, which is corrected by our epistemic uncertainty measure. Furthermore, incorporating outlier weighting reduced the average nominal bound gap by about 0.02 m for the lateral, 0.05 m for the longitudinal and 0.05 m for the vertical protection levels, as well as the false alarm rate by about 0.02 in each direction, while keeping the failure rate within the specified integrity risk requirement.
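As an illustration of the VAR baseline, the sketch below derives a protection level from the quantile of a Gaussian fitted with the DNN error mean and standard deviation. Treating the one-sided (1 - IR) quantile of |mean| + sigma as the bound is a simplifying assumption for illustration, not necessarily the exact construction used in our experiments.

```python
# Illustrative Gaussian-quantile protection level for the VAR baseline.
from scipy.stats import norm

def gaussian_pl(mu, sigma, integrity_risk=0.01):
    """Approximate bound on |error| for error ~ N(mu, sigma^2) at the given risk."""
    return abs(mu) + sigma * norm.ppf(1.0 - integrity_risk)

# Example (hypothetical numbers): a predicted 0.2 m lateral error with a
# 0.3 m standard deviation yields a protection level of about 0.90 m.
pl_lat = gaussian_pl(0.2, 0.3, integrity_risk=0.01)
```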

The mean bound gap between the lateral protection levels computed by our approach and the position error magnitudes in the nominal cases is smaller than a quarter of the width of a standard U.S. lane. In the longitudinal direction, the bound gap is somewhat larger, since fewer visual features are present along the road for determining the position error using the DNN. The corresponding value in the vertical dimension is smaller, owing to the DNN's superior performance in determining position errors and uncertainty in the vertical dimension. This demonstrates the applicability of our approach to urban roads.

For an integrity risk requirement of 0.01, the protection levels computed by our method demonstrate a failure rate equal to or within 0.01 as well. However, further lowering the integrity risk requirement in our experiments either did not similarly improve the failure rate or caused a significant increase in the bound gaps and the false alarm rate. A possible reason is that the uncertainty approximated by our approach through the aleatoric and epistemic measures fails to act as an accurate uncertainty representation for integrity risk requirements smaller than 0.01. Future research will consider more and varied training data, better strategies for selecting candidate states, and different DNN architectures to meet smaller integrity risk requirements.

A shortcoming of our approach is the large false alarm rate exhibited by the computed protection levels in Table 1. The large value results both from the inherent noise in the DNN-based estimation of position and rotation errors and from frequently selecting candidate states that produce large outlier error values. A direction for future work on reducing the false alarm rate is to explore better strategies for selecting candidate states and mitigating outliers.

A key advantage offered by our approach is its application to scenarios where a direct analysis of the error sources in the state estimation algorithm is difficult, such as when feature rich visual information is processed by a machine learning algorithm for estimating the state. In such scenarios, our approach computes protection levels separately from the state estimation algorithm by both evaluating a data-driven model of the position error uncertainty and characterizing the epistemic uncertainty in the model outputs.

            Lateral PL                Longitudinal PL           Vertical PL
            BG (m)   FR     FAR       BG (m)   FR     FAR       BG (m)   FR      FAR
VAR+EO      0.49     0.01   0.47      0.77     0.01   0.40      0.38     <0.01   0.14
VAR+E       0.51     0.01   0.49      0.82     0.01   0.43      0.43     <0.01   0.16
VAR         0.42     0.05   0.45      0.64     0.05   0.36      0.30     0.02    0.12
Table 1: Evaluation of lateral, longitudinal and vertical protection levels from our approach. We compare protection levels computed by our trained model using the DNN covariance, epistemic uncertainty and outlier weighting (VAR+EO), the DNN covariance and epistemic uncertainty (VAR+E), and only the DNN covariance (VAR). Incorporating epistemic uncertainty results in a lower failure rate, while incorporating outlier weights reduces the bound gap and false alarm rate.
Figure 8: Integrity diagram results for the lateral, longitudinal and vertical protection levels. The diagram contains protection levels evaluated across 15000 different state estimates and camera images randomly selected from the test sequence. The majority of the samples are close to and greater than the position error magnitude, validating the applicability of the computed protection levels as a robust safety measure.

7 Conclusions

In this work, we presented a data-driven approach for computing lateral, longitudinal and vertical protection levels associated with a given state estimate from camera images and a 3D LiDAR map of the environment. Our approach estimates both aleatoric and epistemic measures of uncertainty for computing protection levels, thereby providing robust measures of localization safety. We demonstrated the efficacy of our method on real-world data in terms of bound gap, failure rate and false alarm rate. The results show that the lateral, longitudinal and vertical protection levels computed by our method enclose the position error magnitudes with a 0.01 probability of failure and less than a 1 m bound gap in all directions, demonstrating that our approach is applicable to GNSS-denied urban environments.

Acknowledgements

This material is based upon work supported by the National Science Foundation under award #2006162.

References

  • Delling et al. [2017] Daniel Delling, Andrew V. Goldberg, Thomas Pajor, and Renato F. Werneck. Customizable Route Planning in Road Networks. Transportation Science, 51(2):566–591, May 2017. ISSN 0041-1655, 1526-5447. 10.1287/trsc.2014.0579.
  • Jensen et al. [2016] Morten Borno Jensen, Mark Philip Philipsen, Andreas Mogelmose, Thomas Baltzer Moeslund, and Mohan Manubhai Trivedi. Vision for Looking at Traffic Lights: Issues, Survey, and Perspectives. IEEE Transactions on Intelligent Transportation Systems, 17(7):1800–1815, July 2016. ISSN 1524-9050, 1558-0016. 10.1109/TITS.2015.2509509.
  • Wolcott and Eustice [2017] Ryan W Wolcott and Ryan M Eustice. Robust LIDAR localization using multiresolution Gaussian mixture maps for autonomous driving. The International Journal of Robotics Research, 36(3):292–319, March 2017. ISSN 0278-3649, 1741-3176. 10.1177/0278364917696568.
  • Caselitz et al. [2016] Tim Caselitz, Bastian Steder, Michael Ruhnke, and Wolfram Burgard. Monocular camera localization in 3D LiDAR maps. In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1926–1931, Daejeon, South Korea, October 2016. IEEE. ISBN 978-1-5090-3762-9. 10.1109/IROS.2016.7759304.
  • Spilker Jr. et al. [1996] James J. Spilker Jr., Penina Axelrad, Bradford W. Parkinson, and Per Enge, editors. Global Positioning System: Theory and Applications, Volume I. American Institute of Aeronautics and Astronautics, Washington DC, January 1996. ISBN 978-1-56347-106-3 978-1-60086-638-8. 10.2514/4.866388.
  • Jiang and Wang [2016] Yiping Jiang and Jinling Wang. A New Approach to Calculate the Horizontal Protection Level. Journal of Navigation, 69(1):57–74, January 2016. ISSN 0373-4633, 1469-7785. 10.1017/S0373463315000545.
  • Cezón et al. [2013] A. Cezón, M. Cueto, and I. Fernández. Analysis of Multi-GNSS Service Performance Assessment: ARAIM vs. IBPL Performances Comparison. pages 2654–2663, September 2013. ISSN: 2331-5954.
  • Tran and Lo Presti [2019] Hieu Trung Tran and Letizia Lo Presti. Kalman filter-based ARAIM algorithm for integrity monitoring in urban environment. ICT Express, 5(1):65–71, March 2019. ISSN 24059595. 10.1016/j.icte.2018.05.002.
  • Badue et al. [2021] Claudine Badue, Rânik Guidolini, Raphael Vivacqua Carneiro, Pedro Azevedo, Vinicius B. Cardoso, Avelino Forechi, Luan Jesus, Rodrigo Berriel, Thiago M. Paixão, Filipe Mutz, Lucas de Paula Veronese, Thiago Oliveira-Santos, and Alberto F. De Souza. Self-driving cars: A survey. Expert Systems with Applications, 165:113816, March 2021. ISSN 09574174. 10.1016/j.eswa.2020.113816.
  • Reid et al. [2019] Tyler G. R. Reid, Sarah E. Houts, Robert Cammarata, Graham Mills, Siddharth Agarwal, Ankit Vora, and Gaurav Pandey. Localization Requirements for Autonomous Vehicles. SAE International Journal of Connected and Automated Vehicles, 2(3):12–02–03–0012, September 2019. ISSN 2574-075X. 10.4271/12-02-03-0012. arXiv: 1906.01061.
  • Taira et al. [2021] Hajime Taira, Masatoshi Okutomi, Torsten Sattler, Mircea Cimpoi, Marc Pollefeys, Josef Sivic, Tomas Pajdla, and Akihiko Torii. InLoc: Indoor Visual Localization with Dense Matching and View Synthesis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(4):1293–1307, April 2021. ISSN 0162-8828, 2160-9292, 1939-3539. 10.1109/TPAMI.2019.2952114.
  • Kim et al. [2018] Youngji Kim, Jinyong Jeong, and Ayoung Kim. Stereo Camera Localization in 3D LiDAR Maps. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1–9, Madrid, October 2018. IEEE. ISBN 978-1-5386-8094-0. 10.1109/IROS.2018.8594362.
  • Lyrio et al. [2015] Lauro J. Lyrio, Thiago Oliveira-Santos, Claudine Badue, and Alberto Ferreira De Souza. Image-based mapping, global localization and position tracking using VG-RAM weightless neural networks. In 2015 IEEE International Conference on Robotics and Automation (ICRA), pages 3603–3610, Seattle, WA, May 2015. IEEE. ISBN 978-1-4799-6923-4. 10.1109/ICRA.2015.7139699.
  • Oliveira et al. [2020] Gabriel L. Oliveira, Noha Radwan, Wolfram Burgard, and Thomas Brox. Topometric Localization with Deep Learning. In Nancy M. Amato, Greg Hager, Shawna Thomas, and Miguel Torres-Torriti, editors, Robotics Research, volume 10, pages 505–520. Springer International Publishing, Cham, 2020. ISBN 978-3-030-28618-7 978-3-030-28619-4. 10.1007/978-3-030-28619-4_38. Series Title: Springer Proceedings in Advanced Robotics.
  • Cattaneo et al. [2019] Daniele Cattaneo, Matteo Vaghi, Augusto Luis Ballardini, Simone Fontana, Domenico Giorgio Sorrenti, and Wolfram Burgard. CMRNet: Camera to LiDAR-Map Registration. 2019 IEEE Intelligent Transportation Systems Conference (ITSC), pages 1283–1289, October 2019. 10.1109/ITSC.2019.8917470. arXiv: 1906.10109.
  • Sarlin et al. [2019] Paul-Edouard Sarlin, Cesar Cadena, Roland Siegwart, and Marcin Dymczyk. From Coarse to Fine: Robust Hierarchical Localization at Large Scale. arXiv:1812.03506 [cs], pages 12708–12717, April 2019. 10.1109/CVPR.2019.01300. arXiv: 1812.03506 version: 2.
  • Recht et al. [2019] Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, and Vaishaal Shankar. Do ImageNet Classifiers Generalize to ImageNet? In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 5389–5400. PMLR, June 2019.
  • Kendall and Gal [2017] Alex Kendall and Yarin Gal. What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? arXiv:1703.04977 [cs], 30, October 2017. arXiv: 1703.04977.
  • Loquercio et al. [2020] Antonio Loquercio, Mattia Segu, and Davide Scaramuzza. A General Framework for Uncertainty Estimation in Deep Learning. IEEE Robotics and Automation Letters, 5(2):3153–3160, April 2020. ISSN 2377-3766, 2377-3774. 10.1109/LRA.2020.2974682.
  • Kiureghian and Ditlevsen [2009] Armen Der Kiureghian and Ove Ditlevsen. Aleatory or epistemic? Does it matter? Structural Safety, 31(2):105–112, March 2009. ISSN 01674730. 10.1016/j.strusafe.2008.06.020.
  • McAllister et al. [2017] Rowan McAllister, Yarin Gal, Alex Kendall, Mark van der Wilk, Amar Shah, Roberto Cipolla, and Adrian Weller. Concrete Problems for Autonomous Vehicle Safety: Advantages of Bayesian Deep Learning. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, pages 4745–4753, Melbourne, Australia, August 2017. International Joint Conferences on Artificial Intelligence Organization. ISBN 978-0-9992411-0-3. 10.24963/ijcai.2017/661.
  • Yang et al. [2020] Nan Yang, Lukas von Stumberg, Rui Wang, and Daniel Cremers. D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1278–1289, Seattle, WA, USA, June 2020. IEEE. ISBN 978-1-72817-168-5. 10.1109/CVPR42600.2020.00136.
  • Kendall and Cipolla [2016] Alex Kendall and Roberto Cipolla. Modelling uncertainty in deep learning for camera relocalization. In 2016 IEEE International Conference on Robotics and Automation (ICRA), pages 4762–4769, Stockholm, Sweden, May 2016. IEEE. ISBN 978-1-4673-8026-3. 10.1109/ICRA.2016.7487679.
  • Gal and Ghahramani [2016] Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. In Maria Florina Balcan and Kilian Q. Weinberger, editors, Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 1050–1059, New York, New York, USA, June 2016. PMLR.
  • Blundell et al. [2015] Charles Blundell, Julien Cornebise, Koray Kavukcuoglu, and Daan Wierstra. Weight Uncertainty in Neural Network. In Francis Bach and David Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pages 1613–1622, Lille, France, July 2015. PMLR.
  • Smith and Gal [2018] Lewis Smith and Yarin Gal. Understanding Measures of Uncertainty for Adversarial Example Detection. arXiv:1803.08533 [cs, stat], pages 560–569, March 2018. arXiv: 1803.08533.
  • Gupta and Gao [2020] Shubh Gupta and Grace X. Gao. Data-Driven Protection Levels for Camera and 3D Map-based Safe Urban Localization. pages 2483–2499, October 2020. 10.33012/2020.17698.
  • Lukas and Stoker [2016] Vicki Lukas and J. M. Stoker. 3D Elevation Program—Virtual USA in 3D. USGS Fact Sheet 2016-3022, U.S. Geological Survey, Reston, VA, 2016.
  • Krishnan et al. [2011] Sriram Krishnan, Christopher Crosby, Viswanath Nandigam, Minh Phan, Charles Cowart, Chaitanya Baru, and Ramon Arrowsmith. OpenTopography: a services oriented architecture for community access to LIDAR topography. In Proceedings of the 2nd International Conference on Computing for Geospatial Research & Applications - COM.Geo ’11, pages 1–8, Washington, DC, 2011. ACM Press. ISBN 978-1-4503-0681-2. 10.1145/1999320.1999327.
  • Wang et al. [2020] Cheng Wang, Chenglu Wen, Yudi Dai, Shangshu Yu, and Minghao Liu. Urban 3D modeling with mobile laser scanning: a review. Virtual Reality & Intelligent Hardware, 2(3):175–212, June 2020. ISSN 20965796. 10.1016/j.vrih.2020.05.003.
  • Iglewicz and Hoaglin [1993] Boris Iglewicz and David Caster Hoaglin. How to Detect and Handle Outliers. ASQC Quality Press, 1993. ISBN 978-0-87389-247-6.
  • Lindsay [1995] Bruce G. Lindsay. Mixture Models: Theory, Geometry, and Applications. IMS, 1995. ISBN 978-0-940600-32-4.
  • Joerger and Pervan [2019] M. Joerger and B. Pervan. Quantifying Safety of Laser-Based Navigation. IEEE Transactions on Aerospace and Electronic Systems, 55(1):273–288, February 2019. ISSN 1557-9603. 10.1109/TAES.2018.2850381.
  • Zhu et al. [2020] C. Zhu, M. Joerger, and M. Meurer. Quantifying Feature Association Error in Camera-based Positioning. In 2020 IEEE/ION Position, Location and Navigation Symposium (PLANS), pages 967–972, April 2020. 10.1109/PLANS46316.2020.9109919. ISSN: 2153-3598.
  • Kendall et al. [2015] Alex Kendall, Matthew Grimes, and Roberto Cipolla. PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization. In 2015 IEEE International Conference on Computer Vision (ICCV), pages 2938–2946, Santiago, Chile, December 2015. IEEE. ISBN 978-1-4673-8391-2. 10.1109/ICCV.2015.336.
  • Cadena et al. [2016] Cesar Cadena, Luca Carlone, Henry Carrillo, Yasir Latif, Davide Scaramuzza, Jose Neira, Ian Reid, and John J. Leonard. Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age. IEEE Transactions on Robotics, 32(6):1309–1332, December 2016. ISSN 1552-3098, 1941-0468. 10.1109/TRO.2016.2624754. arXiv: 1606.05830.
  • Russell and Reale [2019] Rebecca L. Russell and Christopher Reale. Multivariate Uncertainty in Deep Learning. arXiv:1910.14215 [cs, stat], October 2019. arXiv: 1910.14215.
  • Liu et al. [2018] Katherine Liu, Kyel Ok, William Vega-Brown, and Nicholas Roy. Deep Inference for Covariance Estimation: Learning Gaussian Noise Models for State Estimation. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 1436–1443, Brisbane, QLD, May 2018. IEEE. ISBN 978-1-5386-3081-5. 10.1109/ICRA.2018.8461047.
  • Pintus et al. [2011] Ruggero Pintus, Enrico Gobbetti, and Marco Agus. Real-time rendering of massive unstructured raw point clouds using screen-space operators. In Proceedings of the 12th International conference on Virtual Reality, Archaeology and Cultural Heritage, pages 105–112, 2011.
  • Dosovitskiy et al. [2015] Alexey Dosovitskiy, Philipp Fischer, Eddy Ilg, Philip Hausser, Caner Hazirbas, Vladimir Golkov, Patrick van der Smagt, Daniel Cremers, and Thomas Brox. FlowNet: Learning Optical Flow with Convolutional Networks. In 2015 IEEE International Conference on Computer Vision (ICCV), pages 2758–2766, Santiago, December 2015. IEEE. ISBN 978-1-4673-8391-2. 10.1109/ICCV.2015.316.
  • Huber [1992] Peter J. Huber. Robust Estimation of a Location Parameter. In Samuel Kotz and Norman L. Johnson, editors, Breakthroughs in Statistics, pages 492–518. Springer New York, New York, NY, 1992. ISBN 978-0-387-94039-7 978-1-4612-4380-9. 10.1007/978-1-4612-4380-9_35. Series Title: Springer Series in Statistics.
  • Zeng and Ji [2015] Tao Zeng and Shuiwang Ji. Deep Convolutional Neural Networks for Multi-instance Multi-task Learning. In 2015 IEEE International Conference on Data Mining, pages 579–588, Atlantic City, NJ, USA, November 2015. IEEE. ISBN 978-1-4673-9504-5. 10.1109/ICDM.2015.92.
  • Barfoot et al. [2011] Timothy Barfoot, James R. Forbes, and Paul T. Furgale. Pose estimation using linearized rotations and quaternion algebra. Acta Astronautica, 68(1-2):101–112, January 2011. ISSN 00945765. 10.1016/j.actaastro.2010.06.049.
  • Rousseeuw and Hubert [2018] Peter J. Rousseeuw and Mia Hubert. Anomaly detection by robust statistics. WIREs Data Mining and Knowledge Discovery, 8(2), March 2018. ISSN 1942-4787, 1942-4795. 10.1002/widm.1236.
  • Goodfellow et al. [2016] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, November 2016. ISBN 978-0-262-03561-3.
  • Burden and Faires [2011] Richard L. Burden and J. Douglas Faires. Numerical Analysis. Brooks/Cole, Cengage Learning, 2011. ISBN 978-0-538-73564-3.
  • Geiger et al. [2012] A. Geiger, P. Lenz, and R. Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 3354–3361, Providence, RI, June 2012. IEEE. ISBN 978-1-4673-1228-8 978-1-4673-1226-4 978-1-4673-1227-1. 10.1109/CVPR.2012.6248074.
  • Skafte et al. [2019] Nicki Skafte, Martin Jørgensen, and Søren Hauberg. Reliable training and estimation of variance networks. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32, volume 32, pages 6326–6336. Curran Associates, Inc., 2019.
  • Paszke et al. [2019] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019.
  • Tossaint et al. [2007] M. Tossaint, J. Samson, F. Toran, J. Ventura-Traveset, M. Hernandez-Pajares, J.M. Juan, J. Sanz, and P. Ramos-Bosch. The Stanford - ESA Integrity Diagram: A New Tool for The User Domain SBAS Integrity Assessment. Navigation, 54(2):153–162, June 2007. ISSN 00281522. 10.1002/j.2161-4296.2007.tb00401.x.