Quantifying and Using System Uncertainty
in UAV Navigation
Abstract
As autonomous systems increasingly rely on Deep Neural Networks (DNNs) to implement the functions of the navigation pipeline, uncertainty estimation methods have become paramount for assessing confidence in DNN predictions. Bayesian Deep Learning (BDL) offers a principled approach to modeling uncertainty in DNNs. However, the DNN components of autonomous systems typically capture uncertainty only partially, and, more importantly, the effect of that uncertainty on downstream tasks is often ignored. This paper provides a method to capture the overall system uncertainty in a UAV navigation task. In particular, we study the effect of the uncertainty from perception representations on downstream control predictions. Moreover, we leverage the uncertainty in the system’s output to improve control decisions, which positively impacts the UAV’s performance on its task.
I Introduction
Navigation in complex environments remains a major challenge for automated systems. Particular instances of this problem are autonomous driving and autonomous aerial navigation with Unmanned Aerial Vehicles (UAVs). In both cases, the navigation task is addressed by first acquiring rich and complex raw sensory information (e.g., from camera, radar, or LiDAR), which is then processed to drive the agent towards its goal. Usually, this processing is done in sequence, with tasks and specific software components linked together in the so-called perception-planning-control software pipeline [1, 2]. In the last decade, Deep Neural Networks (DNNs) have become a popular choice to implement navigation components. For this purpose, three paradigms exist to develop and train DNN-based components: modular (isolated) learning [3], End-to-End (E2E) learning [4, 5, 6], and a mixed approach [7, 8].
Despite the remarkable progress in representation learning, DNNs must also represent the confidence in their predictions before they can be deployed in safety-critical systems. McAllister et al. [2] proposed using Bayesian Deep Learning (BDL) to implement the components of navigation pipelines. Bayesian methods offer a principled framework to model and capture uncertainty. Nevertheless, if the Bayesian approach is followed, all the components in the system should use BDL methods to enable uncertainty propagation along the navigation pipeline. Hence, BDL components should admit uncertainty information as an input to account for the uncertainty in the outputs of preceding BDL components (see Fig. 1, bottom).
In recent years, a large body of literature has employed uncertainty estimation methods in robotic tasks, thanks to their potential to improve the safety of automated functions [9] and to increase task performance [10, 11]. However, uncertainty is only partially captured in navigation pipelines that use DNNs. BDL methods are mainly used in perception tasks, while downstream components (e.g., planning and control) usually ignore the uncertainty from preceding components or do not capture uncertainty in their own predictions. Although some works propagate perceptual uncertainty downstream through intermediate representations [12, 13, 14], the overall system output does not take into account all the uncertainty sources from the DNN components in the pipeline.

Quantifying uncertainty in a BDL-based system (i.e., a pipeline of BDL components) remains a challenging task. Uncertainties from BDL components must be assembled in a principled way to provide a reliable measure of overall system uncertainty, based on which safe decisions can be made [2, 15, 16] (see Fig. 1, top). In this paper, we propose to capture and use the overall system uncertainty in the navigation pipeline to improve the UAV's performance on its task.
II System Task Formulation
In this paper, we address the problem of autonomous aerial navigation. The goal of the UAV is to navigate through a set of gates with unknown locations arranged in a circular track. Following prior work [7, 12], the navigation architecture consists of two neural networks, one for perception and one for control, as shown in Fig. 2. To achieve the goal, the navigation task is formulated as a sequential decision-making problem, where a control action is produced given an environment observation. In this regard, at time $t$ the simulation environment provides an observation comprised of an RGB image $x_t$ acquired from a front-facing camera on the UAV. The perception component defines an encoder function that maps the input image $x_t$ to a rich, low-dimensional representation $z_t$. Next, a control policy maps the compact representation $z_t$ to control commands $y_t$, corresponding to linear and yaw velocities in the UAV body frame.
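The two-stage architecture can be summarized with a short sketch. The following PyTorch snippet (module names and tensor contracts are our own illustration, not the released code) chains an encoder and a control policy; the probabilistic variants introduced later extend this skeleton.

```python
import torch
import torch.nn as nn

class NavigationPipeline(nn.Module):
    """Perception encoder followed by a control policy (illustrative sketch)."""

    def __init__(self, encoder: nn.Module, policy: nn.Module):
        super().__init__()
        self.encoder = encoder  # RGB image -> latent mean and log-variance
        self.policy = policy    # latent vector -> velocity commands

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        mu, log_var = self.encoder(image)  # noisy latent representation
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # reparameterization
        return self.policy(z)  # linear and yaw velocities in the UAV body frame
```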
In the perception component, a cross-modal variational autoencoder (CMVAE) [17, 7] is used to learn a rich and robust compact representation. A CMVAE is a variant of the traditional variational autoencoder (VAE) [18] that learns a single latent representation for multiple data modalities. In this case, two data modalities are used: the RGB images and the pose of the gate relative to the UAV body frame. The CMVAE encoder then maps an input image to a noisy representation in the latent space with mean $\mu_z$ and variance $\sigma_z^2$, from which latent vectors $z$ are sampled. The encoder is based on the Dronet architecture [4], and additional constraints are imposed on the latent space to promote the learning of robust, disentangled representations. For the downstream control task, a feed-forward network (the control policy) operates on the latent vectors $z$. For more information about the general architecture for aerial navigation, we refer the reader to [7, 12].
III Methodology
III-A Uncertainty from Perception Representations
Although the CMVAE encoder employs Bayesian inference to obtain latent vectors $z$, the CMVAE does not capture epistemic uncertainty, since the encoder lacks a distribution over its parameters $\theta$. To capture uncertainty in the perception encoder, we follow prior work [19, 20] that attempts to capture epistemic uncertainty in VAEs. We adapt the CMVAE to capture the posterior over the latent vectors, as shown in eq. 1.
$$p(z \mid x, \mathcal{D}) = \int p(z \mid x, \theta)\, p(\theta \mid \mathcal{D})\, d\theta \qquad (1)$$
To approximate eq. 1, we take a set of encoder parameter samples $\{\theta^{(m)}\}_{m=1}^{M}$ to obtain a set of latent samples at the output of the encoder. In practice, we modify the CMVAE by adding a dropout layer in the encoder. Then, we use Monte Carlo Dropout (MCD) [21] to approximate the posterior over the encoder weights $p(\theta \mid \mathcal{D})$. Finally, for a given input image we perform $M$ stochastic forward passes (with dropout “turned on”) to compute a set of latent vector samples $\{z^{(m)}\}_{m=1}^{M}$ at runtime.
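A minimal sketch of this sampling step is shown below, assuming the encoder contains `nn.Dropout` layers and returns the latent mean and log-variance (the function name and contracts are ours):

```python
import torch

def mc_dropout_latents(encoder, image, n_samples=32):
    """Draw latent samples with dropout active at test time (MC Dropout)."""
    encoder.eval()  # keep layers such as batch-norm in eval mode...
    for m in encoder.modules():
        if isinstance(m, torch.nn.Dropout):
            m.train()  # ...but keep dropout "turned on"
    zs = []
    with torch.no_grad():
        for _ in range(n_samples):  # one stochastic forward pass per sample
            mu, log_var = encoder(image)
            zs.append(mu + torch.randn_like(mu) * (0.5 * log_var).exp())
    return torch.stack(zs)  # (n_samples, batch, latent_dim)
```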

III-B Input Uncertainty for Control
In BDL, downstream uncertainty propagation assumes that a neural network component is able to handle, or admit, uncertainty at its input. In our case, this implies that the neural network for control must handle the uncertainty coming from the perception component. When Bayesian neural networks are used with latent variables (BNN+LV), both types of uncertainty (aleatoric and epistemic) can be captured [22, 23, 24, 12]. To capture the overall system uncertainty at the output of the controller, we compute the posterior predictive distribution for the target variable $y$ associated with a new input image $x^{*}$, as shown in eq. 2.
$$p(y \mid x^{*}, \mathcal{D}) = \int p(z \mid x^{*}, \mathcal{D}) \left[\int p(y \mid z, w)\, p(w \mid \mathcal{D})\, dw\right] dz \qquad (2)$$
The above integrals are intractable, and we rely on approximations to estimate the predictive distribution. The posterior over the control policy weights $p(w \mid \mathcal{D})$ is difficult to evaluate, so we approximate the inner integral using an ensemble of neural networks [25]. In practice, we train an ensemble of $E$ probabilistic control policies with weights $\{w_e\}_{e=1}^{E}$, where each control policy in the ensemble predicts a mean $\mu_e$ and a variance $\sigma_e^{2}$ for each velocity command. Each probabilistic control policy is trained with imitation learning and the heteroscedastic loss function, as suggested by [26, 27].
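For reference, the heteroscedastic loss from [26] is the Gaussian negative log-likelihood with a predicted, input-dependent variance. The sketch below (our own minimal formulation) predicts the log-variance for numerical stability:

```python
import torch

def heteroscedastic_nll(mu: torch.Tensor, log_var: torch.Tensor,
                        target: torch.Tensor) -> torch.Tensor:
    """Gaussian NLL with input-dependent variance (cf. Kendall & Gal [26]).

    `mu` and `log_var` are the policy head outputs for each velocity command;
    predicting the log-variance keeps the variance positive during training.
    """
    return (0.5 * torch.exp(-log_var) * (target - mu) ** 2 + 0.5 * log_var).mean()
```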
The outer integral is approximated by taking a set of samples from the perception component's latent space. In [12], latent samples are drawn using the encoder mean $\mu_z$ and variance $\sigma_z^{2}$. For the sake of simplicity, we directly use the samples obtained in the perception component, which takes into account the epistemic uncertainty from the previous stage. Finally, the predictions obtained by passing each latent vector through each ensemble member are used to estimate the posterior predictive distribution in eq. 2. From the control policy's perspective, using multiple latent samples can be seen as taking a better “picture” of the latent space (the perception representation) to gather more information about the environment. Interestingly, we can also draw a connection between our sampling approach and MEMO [28], a method that performs augmentations on test points to improve prediction robustness. Our method is illustrated in Fig. 2.
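Concretely, with $M$ latent samples and $E$ ensemble members, the predictive distribution is approximated by an $M \times E$ component Gaussian mixture. A sketch of the fusion step, assuming each probabilistic policy returns a mean and log-variance per command, could look as follows:

```python
import torch

def predictive_moments(ensemble, latents):
    """Approximate eq. (2): pass every latent sample through every ensemble
    member and fuse the resulting Gaussians as a uniform mixture."""
    mus, variances = [], []
    for policy in ensemble:          # E members
        for z in latents:            # M latent samples -> M*E mixture components
            mu, log_var = policy(z)
            mus.append(mu)
            variances.append(log_var.exp())
    mu = torch.stack(mus)            # (M*E, action_dim)
    var = torch.stack(variances)
    mix_mean = mu.mean(dim=0)
    # Law of total variance: expected variance + variance of the means.
    mix_var = var.mean(dim=0) + mu.var(dim=0, unbiased=False)
    return mix_mean, mix_var
```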
III-C Decision-Making Under Uncertainty
For control, assigning the same weight to each ensemble member's prediction to compute the ensemble mixture predictive mean and variance [27, 25] can result in sub-optimal solutions when facing multimodal predictions arising from ambiguous inputs. For example, the predictions from one control policy network could attempt to move the UAV to the left, while the predictions from another could try to move it to the right; averaging a positive and a negative lateral velocity of equal magnitude then yields a command close to zero. This can result in the UAV moving straight ahead and can lead to a catastrophic outcome.
To overcome this problem, we take inspiration from mutual-information-based techniques in active learning for Bayesian deep learning [29, 30]. However, in our approach, given the samples from the latent space, we propose to choose the control predictions (the predicted density) from the ensemble member that minimizes the mutual information, as presented in eq. 3.
$$e^{*} = \operatorname*{arg\,min}_{e \in \{1, \dots, E\}} I_{e}(y; z) \qquad (3)$$
In our navigation architecture, the mutual information between the control predictions $y$ and the latent samples $z$ is formulated as follows:
$$I_{e}(y; z) = \iint p_{e}(y, z) \log \frac{p_{e}(y, z)}{p_{e}(y)\, p(z)}\, dy\, dz \qquad (4)$$
To estimate the mutual information in the previous integral, we use the variational lower bound approximation from [31], taking the latent samples $z^{(m)}$ and each ensemble member's predictions ($\mu_e$ and $\sigma_e^{2}$), as presented in eq. 5:
$$I_{e}(y; z) \geq H(y) + \mathbb{E}_{p(y, z)}\!\left[\log q_{e}(y \mid z)\right], \quad q_{e}(y \mid z) = \mathcal{N}\!\left(y; \mu_{e}(z), \sigma_{e}^{2}(z)\right) \qquad (5)$$
Once an ensemble member is chosen, we can use the control policy's predicted densities by taking the mean or the modes, as presented in Fig. 2. A sketch of this selection step follows.
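The snippet below sketches eqs. 3-5 with a simple sample-based estimate in the spirit of the bounds in [31]; the conditional densities are evaluated at their own means as a cheap surrogate for sampling, and the paper's exact estimator may differ:

```python
import math
import torch

def select_member_by_mi(ensemble, latents):
    """Pick the ensemble member whose predicted densities share the least
    mutual information with the latent samples (eq. 3, sketch)."""

    def log_gauss(y, mu, var):
        # log N(y; mu, diag(var)), summed over the velocity-command dimension
        return -0.5 * (((y - mu) ** 2) / var + var.log()
                       + math.log(2 * math.pi)).sum(-1)

    scores = []
    for policy in ensemble:
        outs = [policy(z) for z in latents]
        mus = torch.stack([mu for mu, _ in outs])              # (M, action_dim)
        variances = torch.stack([lv.exp() for _, lv in outs])  # (M, action_dim)
        cond = log_gauss(mus, mus, variances)                  # log q(y_m | z_m)
        # Marginal: uniform mixture of the conditionals over all latent samples.
        pair = log_gauss(mus.unsqueeze(1), mus.unsqueeze(0), variances.unsqueeze(0))
        marg = torch.logsumexp(pair, dim=1) - math.log(len(latents))
        scores.append((cond - marg).mean())                    # MI estimate for member e
    return int(torch.stack(scores).argmin())                   # e* in eq. (3)
```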
IV Experiments
In our experiments, we study the impact of the uncertainty from perception representations on a downstream control component. We seek to answer the following research questions. RQ1: How does uncertainty from perception representations affect the downstream control predictions? RQ2: Can we improve the UAV's performance using perception and control uncertainties?


IV-A Experimental setup
IV-A1 Navigation Model Baselines
All the navigation architectures are based on [7] and are implemented in PyTorch. Table I shows the uncertainty-aware navigation architectures used in our experiments: the type of perception component, the number of latent variable samples (LVS), the type of control policy, and the number of control prediction samples (CPS) at the output of the system. The first row represents our Bayesian navigation pipeline. Its perception component captures epistemic uncertainty using MCD with 32 forward passes per input, yielding 32 latent variable predictions. For the sake of simplicity, the perception predictions are directly used as latent variable samples in the downstream control component. The control component uses an ensemble of 5 probabilistic control policies, yielding 32 × 5 = 160 control prediction samples.
TABLE I: Uncertainty-aware navigation architectures.

| Model | Perception | LVS | Control Policy | CPS |
|---|---|---|---|---|
| | MCD-CMVAE | 32 | Ensemble (5) Prob. | 160 |
| | CMVAE | 32 | Ensemble (5) Prob. | 160 |
| | CMVAE | 1 | Ensemble (5) Prob. | 5 |
| | CMVAE | 32 | Deterministic | 32 |
| | CMVAE | 1 | Prob. | 1 |


IV-A2 Datasets
We use two independent datasets, one for each component in the navigation pipeline. The CMVAE uses a dataset of 300k images with gate-pose labels. The control component uses a dataset of 17k images with drone-velocity labels. The perception dataset is split into 80% for training, with the remaining 20% for validation and testing. The control dataset uses a split of 90% for training and the remainder for validation and testing. In both cases the image size is 64×64 pixels.
IV-A3 Evaluation Procedure
First, we observe predictions for an out-of-distribution input sample that introduces ambiguity to the model (a double gate). Then, we evaluate the navigation models using AirSim. We use a circular track with eight equally spaced gates, initially positioned on a radius of 8 m at constant height. To assess the system's robustness to perturbations in the environment, we generate new tracks by adding random noise to each gate's radius and height. In the context of the AirSim [32] simulation environment, a track is entirely defined by a set of gates, their poses in three-dimensional space, and the agent's navigation direction. For perception-based navigation, the complexity of a track resides in the “gate-visibility” difficulty [33, 34], i.e., how well the camera Field-of-View (FoV) captures the gate. A natural way to increase track complexity is to add a random displacement to the position of each gate. A track without random gate displacement remains circular. Gate-position randomness alters the shape of the track and affects gate visibility: gates may be not visible, partially visible, or multiple gates may be captured in the UAV FoV, as presented in Fig. 3. We measure system performance by the average number of gates passed over six different tracks. The UAV mission considers a maximum of 32 gates, equivalent to 4 laps (8 gates/lap), and two levels of noise for the gate offset: Gate Radius Noise (GRN) and Gate Height Noise (GHN). Each navigation model has two trials on each track. In addition, we consider two control decision-making strategies (CDMS) for using the control predictions. The first strategy uses the deep ensemble mean. The second strategy uses the mutual information lower bound to select an ensemble member's predicted density, from which we choose the lowest velocity modes for the linear velocities, while for the yaw velocity we follow a conservative strategy by selecting the lowest predicted velocity.
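As an illustration of the track randomization described above, the following sketch places gates uniformly on a circle and perturbs each gate's radius and height (the noise magnitudes and the base height are placeholder values, not those used in our evaluation):

```python
import numpy as np

def make_noisy_track(n_gates=8, radius=8.0, height=2.0,
                     radius_noise=1.0, height_noise=0.5, seed=0):
    """Gate positions for one randomized circular track (illustrative)."""
    rng = np.random.default_rng(seed)
    angles = np.linspace(0.0, 2.0 * np.pi, n_gates, endpoint=False)
    r = radius + rng.uniform(-radius_noise, radius_noise, n_gates)  # GRN
    h = height + rng.uniform(-height_noise, height_noise, n_gates)  # GHN
    # (x, y, z) position per gate; yaw would follow the track direction.
    return np.stack([r * np.cos(angles), r * np.sin(angles), h], axis=1)
```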
TABLE II: Avg. number of gates passed.

| Model | CDMS | Passed gates (noise level 1) | Passed gates (noise level 2) |
|---|---|---|---|
| | MI-mode | 28.88 | 23.05 |
| | DE-mean | 19.77 | 9.22 |
| | DE-mean | 17.67 | 6.0 |
| | DE-mean | 17.33 | 4.0 |
| | DE-mean | 8.33 | 5.0 |
| | DE-mean | 15.16 | 4.38 |
IV-B Results
Fig. 4(a) shows predictions at the output of the perception and control components. Predictions are made using an input sample image with two gates, to see whether the model is able to capture the ambiguity in the sample. Interestingly, the control densities, using all the CPS from the ensemble, show that the model is able to represent the ambiguity in the input: the predicted velocity densities show complex multimodal distributions (two peaks) for several velocity commands. The multimodal predictions can be observed in more detail in Fig. 4(b), in the prediction densities of each ensemble member for two of the velocity commands.
Finally, Table II presents the navigation performance results. In general, learning to predict uncertainty in the control component can boost performance significantly. For our Bayesian pipeline, the strategy used to select control predictions clearly impacts navigation performance: using the control ensemble mean leads to results similar to the other uncertainty-aware baselines, whereas exploiting the ensemble members' predicted distributions (as seen in Fig. 4) can boost the model's performance.
V Conclusion
We presented a method to capture and propagate uncertainty along a UAV navigation pipeline implemented with Bayesian deep learning components. We analyzed the effect of uncertainty propagation on system component predictions and performance. Our experiments show that our approach for capturing and propagating uncertainty along the system can provide valuable predictions and uncertainty estimates for building dependable systems. However, proper use of component predictions and uncertainty estimates is needed to positively impact system performance. In future work, we aim to explore sampling-free methods for uncertainty estimation [35] to reduce the computational budget and memory footprint of our approach.
Acknowledgment
This work has received funding from the COMP4DRONES project, under Joint Undertaking (JU) grant agreement N°826610. The JU receives support from the European Union’s Horizon 2020 research and innovation programme and from Spain, Austria, Belgium, Czech Republic, France, Italy, Latvia, Netherlands.
References
- [1] R. Siegwart, I. R. Nourbakhsh, and D. Scaramuzza, Introduction to autonomous mobile robots. MIT press, 2011.
- [2] R. McAllister, Y. Gal, A. Kendall, M. Van Der Wilk, A. Shah, R. Cipolla, and A. Weller, “Concrete problems for autonomous vehicle safety: Advantages of bayesian deep learning,” in Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence. International Joint Conferences on Artificial Intelligence, Inc., 2017.
- [3] S. Grigorescu, B. Trasnea, T. Cocias, and G. Macesanu, “A survey of deep learning techniques for autonomous driving,” Journal of Field Robotics, vol. 37, no. 3, pp. 362–386, 2020.
- [4] A. Loquercio, A. I. Maqueda, C. R. Del-Blanco, and D. Scaramuzza, “Dronet: Learning to fly by driving,” IEEE Robotics and Automation Letters, vol. 3, no. 2, pp. 1088–1095, 2018.
- [5] F. Codevilla, E. Santana, A. M. López, and A. Gaidon, “Exploring the limitations of behavior cloning for autonomous driving,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 9329–9338.
- [6] W. Zeng, W. Luo, S. Suo, A. Sadat, B. Yang, S. Casas, and R. Urtasun, “End-to-end interpretable neural motion planner,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 8660–8669.
- [7] R. Bonatti, R. Madaan, V. Vineet, S. Scherer, and A. Kapoor, “Learning visuomotor policies for aerial navigation using cross-modal representations,” arXiv preprint arXiv:1909.06993, 2019.
- [8] M. Mueller, A. Dosovitskiy, B. Ghanem, and V. Koltun, “Driving policy transfer via modularity and abstraction,” in Conference on Robot Learning. PMLR, 2018, pp. 1–15.
- [9] R. Michelmore, M. Kwiatkowska, and Y. Gal, “Evaluating uncertainty quantification in end-to-end autonomous driving control,” arXiv preprint arXiv:1811.06817, 2018.
- [10] F. Nozarian, C. Müller, and P. Slusallek, “Uncertainty quantification and calibration of imitation learning policy in autonomous driving.” in TAILOR, 2020, pp. 146–162.
- [11] E. Ohn-Bar, A. Prakash, A. Behl, K. Chitta, and A. Geiger, “Learning situational driving,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 11 296–11 305.
- [12] F. Arnez, H. Espinoza, A. Radermacher, and F. Terrier, “Improving robustness of deep neural networks for aerial navigation by incorporating input uncertainty,” in International Conference on Computer Safety, Reliability, and Security. Springer, 2021, pp. 219–225.
- [13] B. Ivanovic, K.-H. Lee, P. Tokmakov, B. Wulfe, R. McAllister, A. Gaidon, and M. Pavone, “Heterogeneous-agent trajectory forecasting incorporating class uncertainty,” arXiv preprint arXiv:2104.12446, 2021.
- [14] S. Casas, C. Gulino, S. Suo, K. Luo, R. Liao, and R. Urtasun, “Implicit latent variable model for scene-consistent motion forecasting,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXIII 16. Springer, 2020, pp. 624–641.
- [15] A. Lavin, C. M. Gilligan-Lee, A. Visnjic, S. Ganju, D. Newman, S. Ganguly, D. Lange, A. G. Baydin, A. Sharma, A. Gibson et al., “Technology readiness levels for machine learning systems,” arXiv preprint arXiv:2101.03989, 2021.
- [16] H. Rueß and S. Burton, “Safe ai–how is this possible?” arXiv preprint arXiv:2201.10436, 2022.
- [17] A. Spurr, J. Song, S. Park, and O. Hilliges, “Cross-modal deep variational hand pose estimation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 89–98.
- [18] D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” arXiv preprint arXiv:1312.6114, 2013.
- [19] E. Daxberger and J. M. Hernández-Lobato, “Bayesian variational autoencoders for unsupervised out-of-distribution detection,” arXiv preprint arXiv:1912.05651, 2019.
- [20] A. Jesson, S. Mindermann, U. Shalit, and Y. Gal, “Identifying causal-effect inference failure with uncertainty-aware models,” Advances in Neural Information Processing Systems, vol. 33, pp. 11 637–11 649, 2020.
- [21] Y. Gal and Z. Ghahramani, “Dropout as a bayesian approximation: Representing model uncertainty in deep learning,” in International Conference on Machine Learning, 2016, pp. 1050–1059.
- [22] S. Depeweg, J. Hernández-Lobato, F. Doshi-Velez, and S. Udluft, “Learning and policy search in stochastic dynamical systems with bayesian neural networks,” in 5th International Conference on Learning Representations, ICLR 2017-Conference Track Proceedings, 2017.
- [23] S. Depeweg, J.-M. Hernandez-Lobato, F. Doshi-Velez, and S. Udluft, “Decomposition of uncertainty in bayesian deep learning for efficient and risk-sensitive learning,” in International Conference on Machine Learning. PMLR, 2018, pp. 1184–1193.
- [24] M. Henaff, A. Canziani, and Y. LeCun, “Model-predictive policy learning with uncertainty regularization for driving in dense traffic,” in International Conference on Learning Representations, 2018.
- [25] F. K. Gustafsson, M. Danelljan, and T. B. Schön, “Evaluating scalable bayesian deep learning methods for robust computer vision,” arXiv preprint arXiv:1906.01620, 2019.
- [26] A. Kendall and Y. Gal, “What uncertainties do we need in bayesian deep learning for computer vision?” in Advances in neural information processing systems, 2017, pp. 5574–5584.
- [27] B. Lakshminarayanan, A. Pritzel, and C. Blundell, “Simple and scalable predictive uncertainty estimation using deep ensembles,” in Advances in neural information processing systems, 2017, pp. 6402–6413.
- [28] M. Zhang, S. Levine, and C. Finn, “Memo: Test time robustness via adaptation and augmentation,” arXiv preprint arXiv:2110.09506, 2021.
- [29] Y. Gal, R. Islam, and Z. Ghahramani, “Deep bayesian active learning with image data,” in International Conference on Machine Learning. PMLR, 2017, pp. 1183–1192.
- [30] A. Kirsch, J. Van Amersfoort, and Y. Gal, “Batchbald: Efficient and diverse batch acquisition for deep bayesian active learning,” Advances in neural information processing systems, vol. 32, 2019.
- [31] B. Poole, S. Ozair, A. Van Den Oord, A. Alemi, and G. Tucker, “On variational bounds of mutual information,” in International Conference on Machine Learning. PMLR, 2019, pp. 5171–5180.
- [32] S. Shah, D. Dey, C. Lovett, and A. Kapoor, “Airsim: High-fidelity visual and physical simulation for autonomous vehicles,” in Field and service robotics. Springer, 2018, pp. 621–635.
- [33] R. Madaan, N. Gyde, S. Vemprala, M. Brown, K. Nagami, T. Taubner, E. Cristofalo, D. Scaramuzza, M. Schwager, and A. Kapoor, “Airsim drone racing lab,” arXiv preprint arXiv:2003.05654, 2020.
- [34] Y. Song, M. Steinweg, E. Kaufmann, and D. Scaramuzza, “Autonomous drone racing with deep reinforcement learning,” in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2021, pp. 1205–1212.
- [35] B. Charpentier, O. Borchert, D. Zügner, S. Geisler, and S. Günnemann, “Natural posterior network: Deep bayesian predictive uncertainty for exponential family distributions,” arXiv preprint arXiv:2105.04471, 2021.