
Probabilistic Traversability Model for Risk-Aware Motion Planning in Off-Road Environments

Xiaoyi Cai1, Michael Everett2, Lakshay Sharma1, Philip R. Osteen3, and Jonathan P. How1 1Massachusetts Institute of Technology, Cambridge, MA 02139, USA. {xyc, lakshays, jhow}@mit.edu.2Northeastern University, Boston, MA 02115, USA. [email protected].3DEVCOM Army Research Laboratory, Adelphi, MD 20783, USA. [email protected]. Distribution Statement A. Approved for public release: distribution unlimited.
Abstract

A key challenge in off-road navigation is that even visually similar terrains or ones from the same semantic class may have substantially different traction properties. Existing work typically assumes no wheel slip or uses the expected traction for motion planning, where the predicted trajectories provide a poor indication of the actual performance if the terrain traction has high uncertainty. In contrast, this work proposes to analyze terrain traversability with the empirical distribution of traction parameters in unicycle dynamics, which can be learned by a neural network in a self-supervised fashion. The probabilistic traction model leads to two risk-aware cost formulations that account for the worst-case expected cost and traction. To help the learned model generalize to unseen environments, terrains with features that lead to unreliable predictions are detected via a density estimator fit to the trained network’s latent space and avoided via auxiliary penalties during planning. Simulation results demonstrate that the proposed approach outperforms existing work that assumes no slip or uses the expected traction in both navigation success rate and completion time. Furthermore, avoiding terrains with low density-based confidence scores achieves up to a 30% improvement in success rate when the learned traction model is used in a novel environment.

Supplementary Material

Video and GPU implementation of planners are available at https://github.com/mit-acl/mppi_numba.

© 2023 IEEE. The paper will appear in the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2023). Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

I Introduction

Progress in autonomous robot navigation has expanded the set of non-urban environments where robots can be deployed, such as mines, forests, oceans, and Mars [1, 2, 3, 4]. Unlike environments where safe and reliable navigation can be achieved by avoiding hazards easily detected from geometric features, navigation in forested environments poses unique challenges that still prevent systems from achieving good performance, because a purely geometric view of the world is not sufficient to identify non-geometric hazards (e.g., mud puddles, slippery surfaces) and geometric non-hazards (e.g., grass and foliage). To this end, recent approaches train semantic classifiers for camera images [5, 6, 7, 8, 9] or lidar point clouds [10] to identify terrains or objects that could cause failures of the robotic platform, in order to manually design semantics-based cost functions for navigation. However, existing labeled datasets for off-road navigation either have a limited number of class labels, such as “bush”, “grass”, and “tree” for vegetation, without capturing the varying traversability within each class (e.g., [11, 12]), or have limited transferability due to specificity to the vehicles that collected the data.

Figure 1: Imperfect sensing and coarse semantic labels about the terrain make it difficult to model traction and future states given a control sequence. This work proposes to learn the empirical traction distribution in order to estimate the worst-case expected outcomes for risk-aware planning.

While manually designing cost functions based on semantically labeled terrains is common, self-supervised learning techniques are increasingly adopted to model traversability based on historically collected data about terrain properties [13, 14, 15]. For example, a robot can learn to map camera images to terrain properties such as traction that affect achievable velocities [16, 17, 18]. However, these methods do not fully account for the uncertainty in the learned terrain models. Based on our empirical findings (see the real-world traction distribution learned by a neural network (NN) in the “Traversability Analysis” block of Fig. 3), terrain properties such as traction can be non-Gaussian, which makes assuming no slip or using the expected traction inadequate for capturing the risk of obtaining poor performance. Even if the full empirical distribution of the traction parameters can be learned, working with distributions that are not necessarily Gaussian makes it difficult to efficiently characterize and estimate the cost of experiencing tail events during planning.

In this work, we propose to analyze terrain traversability with the empirical distribution of traction parameters in the unicycle model without making Gaussian assumptions. In addition, we propose two cost formulations that exploit the learned traction distribution to estimate the impact of tail events on navigation performance. The resultant planners mitigate the risks that stem from the stochastic dynamics and the downstream realization of the objectives. Lastly, as a NN’s predictions are unreliable when input features are significantly different from training data, we fit a Gaussian Mixture Model (GMM) in the latent space of the trained NN to provide confidence scores for detecting and avoiding novel terrains via auxiliary planning costs. In summary, the contributions of this work are:

  • A new representation of traversability as an empirical traction distribution in the unicycle dynamics conditioned on both semantic and geometric terrain features;

  • Two risk-aware cost formulations based on the worst-case expected traction and objective that outperform methods that assume no slip or use expected traction;

  • A GMM-based detector for detecting terrains that may lead to unreliable NN predictions and should be avoided during planning, which improves navigation success rate by 30% when a learned traction model is used in an environment unseen during training.

II Related Work

II-A Traversability Representation

Traversability analysis is a key component of off-road navigation algorithms; a more complete summary of various approaches is provided in [17], including representations based on proprioceptive measurements [19, 20], geometric features [21, 22, 23, 1] and combinations of geometric and semantic features [24, 25, 10]. WayFast [16] is a more recent approach, similar to this work, that proposes to represent traversability by learning traction coefficients for a unicycle model from terrain perception data. Another recent work [2] represents traversability as the probability that a quadruped robot can stabilize itself on uneven terrain, based on 3D occupancy data of the terrain. In [26], speed and gait policies are learned based on terrain semantics and human demonstrations; these policies provide a novel interpretation of the terrain’s traversability and can be used by the robot’s motor control policy. A key limitation, however, is that these point estimates of traversability do not capture the uncertainty of terrain properties on similar-looking terrains. Instead, our approach represents traversability as the distribution of traction parameters in the dynamics model.

II-B Planning with Terrain-Dependent Stochastic Dynamics

After learning the traction distribution in the dynamics model, a further challenge exists in incorporating this model into the planner. Despite capturing various types of uncertainties and risks, many existing methods still plan with the nominal or expected parameters in the dynamics model, such as [25, 27, 17]. Alternatively, our work is inspired by [28] that proposes a general framework for optimizing the conditional value at risk (CVaR) of the objective under uncertain dynamics, parameters and initial conditions, by taking extra samples from sources of uncertainty. In this work, we evaluate each control sequence based on traction samples and use the CVaR of the noisy realizations of the objectives instead of the nominal objective. Additionally, we propose a more computationally efficient approach which computes the nominal objective using trajectory rollouts based on the CVaR of traction parameters. Compared to WayFast [16] that uses the expected traction values (a risk-neutral special case of our second method), our approach can produce behaviors that are more risk-averse by adjusting the worst-case quantiles used to compute the CVaR of traction.

II-C Uncertainty Estimation for Neural Networks

Data collection in practice is often expensive and limited in diversity, so it is important to know when a learned model cannot be trusted (e.g., due to input features that are significantly different from training data). NN uncertainty estimation is well studied in the machine learning literature (e.g., see survey [29]), where representative techniques include Monte Carlo dropouts [30], ensembles [31], and single-pass methods [32]. Most similar to the single-pass technique [32] that only requires a single neural network evaluation to estimate uncertainty, our work tries to capture the aleatoric uncertainty (the inherent process uncertainty that cannot be reduced with more data) by predicting a distribution, and the epistemic uncertainty (the model uncertainty that can be reduced with more data) by leveraging the latent-space density of the NN. Compared to using dropouts or ensembles, single-pass methods are attractive because they do not require taking extra samples or using more memory.

III Problem Formulation

We consider the problem of motion planning for a wheeled vehicle whose dynamics depend on the underlying terrains, where the traction values are uncertain due to imperfect sensing. Therefore, we model traction values as random variables whose distributions can be learned empirically.

III-A Unicycle Model with Uncertain Traction Parameters

Consider the discrete time system:

\mathbf{x}_{t+1}=F(\mathbf{x}_{t},\mathbf{u}_{t},\bm{\psi}_{t}), (1)

where $\mathbf{x}_{t}\in\mathbf{X}\subseteq\mathbb{R}^{n}$ is the state vector, $\mathbf{u}_{t}\in\mathbb{R}^{m}$ is the control input, and $\bm{\psi}_{t}\in\bm{\Psi}\subseteq\mathbb{R}^{q}$ is the parameter vector that captures the terrain traction. For each mission, terrain-dependent parameter vectors are sampled from the ground-truth distribution $\bm{\psi}\sim p^{*}(\cdot\mid\mathbf{o}_{\mathbf{x}})$ for every $\mathbf{x}\in\mathbf{X}$ and the associated terrain features $\mathbf{o}_{\mathbf{x}}\in\bm{O}$. For concreteness, we use the following unicycle model

\begin{bmatrix}p_{t+1}^{x}\\ p_{t+1}^{y}\\ \theta_{t+1}\end{bmatrix}=\begin{bmatrix}p_{t}^{x}\\ p_{t}^{y}\\ \theta_{t}\end{bmatrix}+\Delta\cdot\begin{bmatrix}\psi_{1}\cdot v_{t}\cdot\cos(\theta_{t})\\ \psi_{1}\cdot v_{t}\cdot\sin(\theta_{t})\\ \psi_{2}\cdot\omega_{t}\end{bmatrix}, (2)

where $\mathbf{x}_{t}=[p_{t}^{x},p_{t}^{y},\theta_{t}]^{\top}$ contains the X, Y positions and yaw, $\mathbf{u}_{t}=[v_{t},\omega_{t}]^{\top}$ contains the linear and angular velocities, $\bm{\psi}_{t}=[\psi_{1},\psi_{2}]^{\top}$ contains the linear and angular traction values $0\leq\psi_{1},\psi_{2}\leq 1$, and $\Delta>0$ is the time interval. Intuitively, traction captures how much of the commanded velocities can be achieved and is a good indicator of terrain traversability for fast off-road navigation.
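As a concrete reference, below is a minimal NumPy sketch of one propagation step of (2); the time step value and the array layout are illustrative choices rather than the paper's implementation.

```python
import numpy as np

def unicycle_step(x, u, psi, dt=0.1):
    """One step of the traction-scaled unicycle model in Eq. (2).

    x   = [px, py, theta]   state (X, Y position and yaw)
    u   = [v, omega]        commanded linear and angular velocity
    psi = [psi1, psi2]      linear and angular traction in [0, 1]
    """
    px, py, theta = x
    v, omega = u
    psi1, psi2 = psi
    return np.array([
        px + dt * psi1 * v * np.cos(theta),
        py + dt * psi1 * v * np.sin(theta),
        theta + dt * psi2 * omega,
    ])
```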

III-B Motion Planning

As this work focuses on achieving fast navigation to a given goal position, we adopt and modify the minimum-time formulation used in [17], but any other task-specific objective can be used instead. Given the initial state $\mathbf{x}_{0}$ and goal position $\mathbf{p}^{\text{goal}}\in\mathbb{R}^{2}$, the problem of finding a control sequence $\mathbf{u}_{0:T-1}$ can be written as

\min_{\mathbf{u}_{0:T-1}}\quad C(\mathbf{x}_{0:T}):=\phi(\mathbf{x}_{T})+\sum_{t=0}^{T-1}q(\mathbf{x}_{t}), (3)
\text{s.t.}\quad\mathbf{x}_{t+1}=F(\mathbf{x}_{t},\mathbf{u}_{t},\bm{\psi}_{t}),\quad\forall t\in\{0,\dots,T-1\}, (4)

where $\phi(\mathbf{x}_{T})$ and $q(\mathbf{x}_{t})$ are the terminal cost and the stage cost:

\phi(\mathbf{x}_{T})=\frac{\left\lVert\mathbf{p}^{\text{goal}}-\mathbf{p}_{T}\right\rVert}{s^{\text{default}}}\left(1-\mathbb{1}^{\text{done}}(\mathbf{x}_{0:T})\right), (5)
q(\mathbf{x}_{t})=\Delta\left(1-\mathbb{1}^{\text{done}}(\mathbf{x}_{0:t})\right)+w^{\text{dist}}\cdot\left\lVert\mathbf{p}^{\text{goal}}-\mathbf{p}_{t}\right\rVert, (6)

where $s^{\text{default}}$ is the default speed for estimating the time-to-go at the end of the rollout, $\Delta$ is the sampling interval, $\mathbf{p}_{t}$ is the robot position at time $t$, and $w^{\text{dist}}>0$ is the weight for penalizing distance from the goal. To avoid accumulating costs after the robot reaches the goal, we use an indicator function $\mathbb{1}^{\text{done}}(\mathbf{x}_{0:t})$ that returns $1$ when any state $\mathbf{x}_{\tau}$ with $0\leq\tau\leq t$ has reached $\mathbf{p}^{\text{goal}}$, and returns $0$ otherwise. Intuitively, the objective encourages the robot to arrive at the goal location as quickly as possible.
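For illustration, the sketch below evaluates the objective (3) with the costs (5)-(6) on a single state rollout; the goal tolerance, default speed, and distance weight are placeholder values, not the tuning used in the experiments.

```python
import numpy as np

def rollout_cost(states, p_goal, dt=0.1, s_default=3.0, w_dist=0.01, goal_tol=0.5):
    """Minimum-time objective (3) with terminal cost (5) and stage cost (6).

    states: (T+1, 3) array of [px, py, theta] along one rollout.
    """
    dists = np.linalg.norm(states[:, :2] - p_goal, axis=1)
    # 1_done(x_{0:t}) = 1 once any state up to time t has reached the goal
    done = np.cumsum(dists <= goal_tol) > 0
    stage = dt * (1 - done[:-1]) + w_dist * dists[:-1]   # q(x_t) for t = 0..T-1
    terminal = dists[-1] / s_default * (1 - done[-1])    # phi(x_T): time-to-go estimate
    return stage.sum() + terminal
```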

While problem (3) can be optimized via nonlinear optimization techniques such as Model Predictive Path Integral control (MPPI [33, Algorithm 2]), the main challenge is that the terrain traction $\bm{\psi}$ is uncertain. To address this issue, existing techniques try to learn the expected traction, or use the nominal traction while manually penalizing undesirable terrains. However, these approaches are either risk-neutral or require human expertise in cost design. In this work, we propose to learn the full traction distribution empirically (Sec. IV), which can then be used to design risk-aware costs with an easy-to-tune risk tolerance (Sec. V).

IV Traversability Model

In this section, we introduce how to learn traction distribution (aleatoric uncertainty) and how to leverage density estimation in the trained NN’s latent space for detecting unfamiliar terrains (epistemic uncertainty). An overview of the entire traversability analysis procedure is shown in Fig. 3.

Figure 2: Illustration of the real-world data collection used to obtain terrain features and traction values. (a) The robot was manually driven for 10 minutes over different terrains while building a semantic octomap, from which a projected 2D semantic map and an elevation map can be extracted. Example terrain types include vegetation (light green), grass (dark green), and mulch (brown). (b) Only a subset of the collected linear and angular traction values is shown for clarity. Note that discontinuities in the traction values occurred when linear or angular commands were not sent.
Figure 3: Overview of the proposed traversability analysis procedure that shows the real-world terrain features and the actual empirical traction distribution learned by the trained NN. The proposed method captures both the aleatoric uncertainty via the predicted distribution, and the epistemic uncertainty via the latent space density. In a sliding-window fashion, the trained NN takes in local semantic and elevation features to predict linear and angular traction distributions. By learning categorical distributions over the discretized traction values, the NN is able to capture rich terrain properties that are not necessarily Gaussian. To identify terrain features that are significantly different from the training data, we fit a density estimator in the trained NN’s latent space, such as a Gaussian Mixture Model (GMM), in order to obtain the likelihood of any input terrain during deployment. If the likelihood of input terrain features is below a certain threshold, the terrain is deemed out-of-distribution (OOD) and later avoided during planning via auxiliary penalties.

IV-A Terrain-Dependent Traction Distribution

Given the set of system parameters $\bm{\Psi}$ and the set of terrain features $\bm{O}$, we want to model the conditional distribution

p_{\bm{\theta}}(\bm{\psi}\mid\mathbf{o}):\bm{\Psi}\mid\bm{O}\rightarrow\mathbb{R}, (7)

where $p_{\bm{\theta}}$ is a probability distribution parameterized by $\bm{\theta}$, which in practice can be learned by a NN using an empirically collected dataset $\{(\mathbf{o},\bm{\psi})_{k}\}_{k=1}^{K}$ with $K>0$. A real-world example can be found in Fig. 2, where a Clearpath Husky was manually driven in a forest to build an environment model and collect traction data. Note that the semantic and geometric information about the environment can be built using a semantic octomap [34] that temporally fuses semantic point clouds. We used PointRend [35] trained on the RUGD off-road navigation dataset [11] to segment RGB images and subsequently projected the semantics onto lidar point clouds. To estimate the true linear and angular velocities of the robot, we could not rely on the wheel encoders due to wheel slip. Therefore, we used direct lidar odometry [36], which produced accurate pose estimates even when driving through tall grass, and the resultant pose estimates were fused with IMU measurements in an extended Kalman filter to obtain high-rate velocity estimates. The velocity estimates were further filtered to reduce noise due to bumpy terrain. Finally, the traction values were computed as the ratios between the estimated and the commanded velocities and stored for offline training.
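As a sketch of this label-generation step, traction labels can be computed from time-aligned velocity estimates and commands as follows; the command threshold and the clipping to [0, 1] are assumptions made for illustration.

```python
import numpy as np

def traction_labels(v_est, w_est, v_cmd, w_cmd, min_cmd=1e-2):
    """Traction labels as ratios of estimated over commanded velocities.

    Samples without meaningful commands are dropped, matching the
    discontinuities noted in Fig. 2(b); min_cmd is an assumed threshold.
    """
    valid = (np.abs(v_cmd) > min_cmd) & (np.abs(w_cmd) > min_cmd)
    psi1 = np.clip(v_est[valid] / v_cmd[valid], 0.0, 1.0)   # linear traction
    psi2 = np.clip(w_est[valid] / w_cmd[valid], 0.0, 1.0)   # angular traction
    return np.stack([psi1, psi2], axis=1)
```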

Given the terrain features, the traversed path, and the estimated traction values, we can train a NN (e.g., with convolutional layers followed by fully connected layers) to map local terrain features to categorical distributions over discretized traction values in order to capture rich terrain properties. To facilitate planning, we store the learned traction distributions in a map $\mathbf{M}_{\bm{\theta}}$, where each cell $\mathbf{M}_{\bm{\theta}}^{h,w}$ indexed by row $h$ and column $w$ stores the traction distribution $p_{\bm{\theta}}^{h,w}:\bm{\Psi}\rightarrow\mathbb{R}$ that has already been conditioned on the associated terrain features.
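A minimal PyTorch-style sketch of such a predictor is given below; the input channel count, layer sizes, and latent dimension are illustrative assumptions, with 20 bins per traction axis matching the discretization used in Sec. VI-A. A network of this form can be trained, e.g., with a cross-entropy loss against the binned traction labels collected above.

```python
import torch
import torch.nn as nn

class TractionPredictor(nn.Module):
    """Map a local terrain patch (stacked semantic and elevation channels)
    to categorical distributions over discretized linear/angular traction."""

    def __init__(self, in_channels=9, n_bins=20, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, latent_dim), nn.ReLU(),
        )
        self.head = nn.Linear(latent_dim, 2 * n_bins)
        self.n_bins = n_bins

    def forward(self, patch):
        z = self.encoder(patch)                      # latent features, reused by the GMM in Sec. IV-B
        logits = self.head(z).view(-1, 2, self.n_bins)
        return torch.softmax(logits, dim=-1), z      # per-axis categorical traction distributions
```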

IV-B Density-Based Detector for Unfamiliar Terrain Features

As the learned traction predictor is trained on limited data, its predictions based on terrain features significantly different from training data are unreliable, which can lead to degraded navigation performance. Therefore, we fit a density estimator in the latent space of the trained NN in order to use the predicted likelihood as a measure of model uncertainty.

We first apply principal component analysis (PCA) to the latent-space features that correspond to all terrain features observed during training, $\bm{O}^{\text{train}}=\{\mathbf{o}_{k}\}_{k=1}^{K}$. Next, we fit a Gaussian Mixture Model (GMM) to the entire training dataset in the reduced latent space, where the likelihood of observing a particular feature $\mathbf{o}_{k}$ is denoted as $p_{\bm{\theta}}^{\text{latent}}(\mathbf{o}_{k})$. We design a simple confidence score $g$ based on the log-likelihood of the query data normalized between the maximum and minimum log-likelihoods observed in the training data:

g(\mathbf{o})=\frac{p_{\bm{\theta}}^{\text{latent}}(\mathbf{o})-p^{\text{min}}}{p^{\text{max}}-p^{\text{min}}}, (8)
p^{\text{max}}=\max_{\mathbf{o}^{\prime}\in\bm{O}^{\text{train}}}p_{\bm{\theta}}^{\text{latent}}(\mathbf{o}^{\prime}), (9)
p^{\text{min}}=\min_{\mathbf{o}^{\prime}\in\bm{O}^{\text{train}}}p_{\bm{\theta}}^{\text{latent}}(\mathbf{o}^{\prime}). (10)

Note that $g(\mathbf{o})$ is not limited to $[0,1]$, and lower values indicate that the terrain features are less similar to the training data. With the NN trained in the environment shown in Fig. 2, we project the latent-space features onto the first 2 principal components and fit a GMM with 2 clusters. During deployment, terrain features with confidence scores below some threshold $g^{\text{thres}}\in[0,1]$ are deemed out-of-distribution (OOD) and should be explicitly avoided during planning via auxiliary penalties. This strategy improves the navigation success rate when the NN is deployed in an environment unseen during training (see Sec. VI-C).
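The sketch below illustrates this confidence score using scikit-learn's PCA and GaussianMixture; it follows the text's normalization of log-likelihoods, and the surrounding interfaces (how latent features are gathered from the trained NN) are assumed.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

def fit_confidence_model(latent_train, n_pca=2, n_components=2):
    """Fit PCA + GMM in the trained NN's latent space (Sec. IV-B)."""
    pca = PCA(n_components=n_pca).fit(latent_train)
    gmm = GaussianMixture(n_components=n_components).fit(pca.transform(latent_train))
    logp_train = gmm.score_samples(pca.transform(latent_train))   # per-sample log-likelihoods
    return pca, gmm, logp_train.max(), logp_train.min()

def confidence_score(latent_query, pca, gmm, logp_max, logp_min):
    """Normalized log-likelihood score g(o) in (8); may fall outside [0, 1]."""
    logp = gmm.score_samples(pca.transform(latent_query))
    return (logp - logp_min) / (logp_max - logp_min)
```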

V Planning with Learned Traction Distribution

Given the learned traction model, we propose two risk-aware cost formulations in order to generate control inputs that are less likely to lead to worst-case failures. Note that the proposed cost formulations can substitute for the nominal objective in (3), and the problem can still be solved using a nonlinear optimization technique such as MPPI.

V-A Conditional Value at Risk (CVaR)

We adopt the Conditional Value at Risk (CVaR) as a risk metric because it satisfies a group of axioms important for rational risk assessment [37], but the conventional definition assumes the worst case occurs at the right tail of the distribution. In this work, we define CVaR at both the right and left tails (see Fig. 4) at level $\alpha\in(0,1]$ for a random variable $Z$ and its possible realization $z\in\mathbb{R}$ as follows:

\text{CVaR}_{\alpha}^{\rightarrow}(Z):=\frac{1}{\alpha}\int_{0}^{\alpha}\text{VaR}_{\tau}^{\rightarrow}(Z)\,d\tau, (11)
\text{CVaR}_{\alpha}^{\leftarrow}(Z):=\frac{1}{\alpha}\int_{0}^{\alpha}\text{VaR}_{\tau}^{\leftarrow}(Z)\,d\tau, (12)

where the right and left Values at Risk are defined as:

\text{VaR}_{\alpha}^{\rightarrow}(Z):=\min\{z\mid p(Z>z)\leq\alpha\}, (13)
\text{VaR}_{\alpha}^{\leftarrow}(Z):=\max\{z\mid p(Z<z)\leq\alpha\}. (14)

Intuitively, $\text{CVaR}_{\alpha}^{\rightarrow}(Z)$ and $\text{CVaR}_{\alpha}^{\leftarrow}(Z)$ capture the expected outcomes that fall in the right tail and left tail of the distribution, respectively, where each tail occupies an $\alpha$ portion of the total probability. Notice that either definition of CVaR produces the mean when $\alpha=1$.
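For discretized or sampled distributions, both tails can be computed directly; a sketch is given below, where the right-tail version averages the worst (largest) samples and the left-tail version integrates a discretized traction distribution up to probability alpha.

```python
import numpy as np

def cvar_right(samples, alpha):
    """Right-tail CVaR (11): mean of the largest alpha fraction of samples."""
    samples = np.sort(np.asarray(samples))
    k = max(1, int(np.ceil(alpha * len(samples))))
    return float(samples[-k:].mean())

def cvar_left(pmf, bin_values, alpha):
    """Left-tail CVaR (12) of a discretized distribution: expected value over
    the lowest-value bins that together hold an alpha portion of probability."""
    clipped = np.minimum(np.cumsum(pmf), alpha)      # CDF truncated at alpha
    mass = np.diff(clipped, prepend=0.0)             # probability mass per bin inside the tail
    return float(mass @ np.asarray(bin_values) / alpha)
```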

V-B Risk-Aware Cost Formulations

In order to account for the risk of obtaining high cost due to the uncertain system parameters $\bm{\psi}_{t}$, we propose to optimize two modified versions of the cost function (3). Fig. 5 gives a high-level illustration of the core ideas behind the two cost functions.

Figure 4: This work defines two versions of Conditional Value at Risk (CVaR) to capture the worst-case expected values at either the left tail, $\text{CVaR}_{\alpha}^{\leftarrow}(Z)$, or the right tail, $\text{CVaR}_{\alpha}^{\rightarrow}(Z)$, of a random variable $Z$, where the worst-case scenarios constitute an $\alpha\in(0,1]$ portion of the total probability. The left-tail and right-tail Values at Risk (VaR) are defined as $\text{VaR}_{\alpha}^{\leftarrow}(Z)$ and $\text{VaR}_{\alpha}^{\rightarrow}(Z)$. Note that the right-tail definitions are suitable for costs to be minimized, and the left-tail definitions are suitable for low traction values.
Figure 5: Illustrations of the proposed risk-aware costs in a toy problem with known traction distributions for vegetation and dirt terrains, as shown in Fig. 7. Linear and angular traction models are assumed equal. (a) The state rollout based on nominal dynamics (i.e., no slip) does not account for uncertain traction. (b) CVaR-Cost computes the expected costs in the right $\alpha$-quantile by evaluating the given control sequence over $M$ traction map samples. (c) CVaR-Dyn requires only a single rollout over the traction map that contains the expected traction in the left $\alpha$-quantile.

V-B1 Worst-Case Expected Cost (CVaR-Cost)

Given the control sequence $\mathbf{u}_{0:T-1}$, we want to evaluate the worst-case expected value of the nominal objective $C$ (3) due to uncertain terrain traction. First, we sample from the traction distribution map $\mathbf{M}_{\bm{\theta}}$ to obtain $M>0$ traction maps, where the $m$-th map, $m\in\{1,\dots,M\}$, contains a traction sample drawn from the distribution $p_{\bm{\theta}}^{h,w}$ in every cell indexed by row $h$ and column $w$. Next, we compute the empirical right-tail CVaR of the performance of the $M$ rollouts:

c^{\text{CVaR-Cost}}:=\text{CVaR}_{\alpha}^{\rightarrow}\left(\{C(\mathbf{x}^{m}_{0:T})\}_{m=1}^{M}\right), (15)

where the $m$-th state rollout $\mathbf{x}^{m}_{0:T}$ follows

\mathbf{x}_{t+1}^{m}=F(\mathbf{x}_{t}^{m},\mathbf{u}_{t},\bm{\psi}_{t}^{m}),\quad\mathbf{x}_{0}^{m}=\mathbf{x}_{0}, (16)

for $t\in\{0,\dots,T-1\}$. The traction parameter $\bm{\psi}_{t}^{m}$ is queried from the $m$-th sampled traction map at state $\mathbf{x}_{t}^{m}$. Note that this approach is inspired by [28], but we additionally handle terrain-dependent parameter distributions and the sampling of traction maps. For better real-time performance, the sampled traction maps can be reused when evaluating different control sequences.
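A sketch of this evaluation is shown below, reusing the unicycle step, rollout cost, and right-tail CVaR sketches from earlier sections; `cell_of` is a hypothetical helper that maps a continuous position to its grid indices, and the map-array layout is an assumption. In practice these loops are parallelized on the GPU, as discussed in Sec. VI-A.

```python
import numpy as np

def cvar_cost(u_seq, x0, traction_map_samples, p_goal, alpha):
    """CVaR-Cost (15): evaluate one control sequence on M sampled traction maps
    and return the right-tail CVaR of the resulting objective values.

    traction_map_samples: (M, H, W, 2) array of sampled [psi1, psi2] per cell.
    """
    costs = []
    for psi_map in traction_map_samples:             # one rollout per sampled map
        x = np.asarray(x0, dtype=float)
        states = [x]
        for u in u_seq:
            h, w = cell_of(x)                        # hypothetical state-to-cell lookup
            x = unicycle_step(x, u, psi_map[h, w])
            states.append(x)
        costs.append(rollout_cost(np.array(states), p_goal))
    return cvar_right(np.array(costs), alpha)
```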

V-B2 Worst-Case Expected System Parameters (CVaR-Dyn)

The procedure for evaluating a control sequence over a large number of sampled traction maps can be efficiently parallelized on GPUs, but the computational overhead can still grow prohibitively when considering many control sequences. Therefore, we propose an alternative cost design that accounts for the worst-case expectation of the traction values in the dynamical model.

Given the control sequence $\mathbf{u}_{0:T-1}$, we evaluate the nominal mission objective $C$ (3) based on the state rollout simulated with the worst-case expected traction, i.e.,

c^{\text{CVaR-Dyn}}:=C(\overline{\mathbf{x}}_{0:T}), (17)

where the state rollout follows

\overline{\mathbf{x}}_{t+1}=F(\overline{\mathbf{x}}_{t},\mathbf{u}_{t},\overline{\bm{\psi}}_{t}),\quad\overline{\mathbf{x}}_{0}=\mathbf{x}_{0}, (18)

for $t\in\{0,\dots,T-1\}$, and the worst-case expected traction $\overline{\bm{\psi}}_{t}$ is computed from the traction distribution at the row $h$ and column $w$ determined by the state $\overline{\mathbf{x}}_{t}$:

\overline{\bm{\psi}}_{t}=\begin{bmatrix}\text{CVaR}_{\alpha}^{\leftarrow}(\Psi_{1})\\ \text{CVaR}_{\alpha}^{\leftarrow}(\Psi_{2})\end{bmatrix},\quad\begin{bmatrix}\Psi_{1}\\ \Psi_{2}\end{bmatrix}\sim p_{\bm{\theta}}^{h,w}. (19)

When $\alpha=1$, the expected values of the traction parameters are used, which is equivalent to the approach in [16]. However, as the results in Sec. VI-B show for a go-to-goal task, planning with the worst-case expected traction can improve navigation performance when the traction distribution is not Gaussian.
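A corresponding sketch of CVaR-Dyn is given below; in practice the left-tail CVaR of traction can be precomputed once per cell from the stored categorical distributions, and `cell_of` is again a hypothetical lookup helper while the map-array layout is an assumption.

```python
import numpy as np

def cvar_dyn_cost(u_seq, x0, pmf_map, bin_values, p_goal, alpha):
    """CVaR-Dyn (17)-(19): a single rollout that replaces sampled traction with
    the left-tail CVaR of each cell's traction distribution.

    pmf_map: (H, W, 2, n_bins) categorical traction distributions per cell.
    """
    x = np.asarray(x0, dtype=float)
    states = [x]
    for u in u_seq:
        h, w = cell_of(x)                            # hypothetical state-to-cell lookup
        psi_bar = [cvar_left(pmf_map[h, w, i], bin_values, alpha) for i in range(2)]
        x = unicycle_step(x, u, psi_bar)
        states.append(x)
    return rollout_cost(np.array(states), p_goal)
```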

Remark 1.

The proposed costs (15) and (17) provide intuitive notions of risk that depend on the worst-case expected cost and terrain traction. Moreover, they are simple to tune with a single risk parameter $\alpha$, avoiding the need to manually design weights for a potentially large variety of terrains.

Remark 2.

While adding auxiliary penalties for trajectories entering low-traction terrains can generate risk-aware behaviors similar to those achievable by our approach, we show that using the proposed costs (15) and (17) leads to solutions with better trade-offs between success rate and time-to-goal than those achieved by using the nominal objective augmented with terrain penalties (see Fig. 8). However, the highest success rate achieved by the proposed cost formulations is lower. Combining the best of both worlds, using the proposed costs with auxiliary penalties for undesirable terrains, such as OOD terrains, leads to better performance (see Sec. VI-C).

VI Simulation Results

Using simulated semantic environments, we show that the proposed methods outperform existing approaches [16, 17] that either assume no slip or use the expected traction. Moreover, we discuss the advantages and limitations of our proposed risk-aware costs compared to auxiliary penalties for low-traction terrains. Lastly, via simulations based on real-world traction data, we show that avoiding OOD terrains improves the navigation success rate, and using our approach with an auxiliary cost for OOD terrains improves time-to-goal. To prevent simulations from running indefinitely when the robot encounters near-zero traction, we impose time limits (selected based on mission difficulty) that are much longer than the average time required to complete the missions.

Figure 6: Results comparing the proposed methods (CVaR-Cost and CVaR-Dyn in Sec. V-B) against existing methods based on the expected traction (WayFAST [16]) and the method that assumes nominal traction [17] (i.e., no slip). Note that a mission is successful if the goal is reached within 15 s. Overall, the CVaR-Cost planner always maintains a better success rate and time-to-goal than WayFAST and the method using nominal traction. As the risk tolerance increases, the CVaR-Cost planner becomes more optimistic and achieves lower time-to-goal. When the risk tolerance is sufficiently low (e.g., $\alpha=0.2$), the second proposed planner, CVaR-Dyn, achieves a similar or better success rate and time-to-goal compared to the CVaR-Cost planner.

VI-A Implementation Details

In order to solve the optimization problem (3), we adopt MPPI [33, Algorithm 2] to generate controls for achieving fast navigation to the goal. This approach is attractive because it is derivative-free, parallelizable on GPUs, and compatible with our proposed risk-aware cost formulations (15) and (17). The MPPI planners run in a receding-horizon fashion with 100 time steps, where each step is 0.1 s. The maximum linear and angular speeds are capped at 3 m/s and $\pi$ rad/s, and the noise standard deviations for the control signals are 2 m/s and 2 rad/s. The number of control rollouts is 1024, and the number of sampled traction maps is 1024 (only applicable for CVaR-Cost). We use probability mass functions with 20 uniform bins to approximate the parameter distribution. The CVaR-Cost planner is the most expensive to compute, but it is able to re-plan at 15 Hz while sampling new control actions and maps with dimensions of $200\times 200$. Planners that do not sample traction maps can be executed at over 50 Hz. A computer with an Intel Core i9 CPU and an Nvidia GeForce RTX 3070 GPU is used for the simulations, where the majority of the computation happens on the GPU.
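As a reference for how the risk-aware costs plug into the planner, the sketch below shows a standard MPPI control update applied to the rollout costs; it follows the information-theoretic weighting of [33] but omits the control-cost term for brevity, and the temperature value is an assumption.

```python
import numpy as np

def mppi_update(u_nominal, noise, costs, lam=1.0):
    """One MPPI update step: exponentially weight the sampled control
    perturbations by their rollout costs (e.g., CVaR-Cost or CVaR-Dyn values)
    and shift the nominal control sequence toward the low-cost samples.

    u_nominal: (T, m) nominal controls; noise: (K, T, m) sampled perturbations;
    costs: (K,) cost of each perturbed control sequence; lam: temperature.
    """
    weights = np.exp(-(costs - costs.min()) / lam)   # subtract the minimum for numerical stability
    weights /= weights.sum()
    return u_nominal + np.einsum('k,ktm->tm', weights, noise)
```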

VI-B Simulated Semantic Environments

We consider a grid-world scenario where “dirt” and “vegetation” cells have known traction distributions, as shown in Fig. 7. Vegetation patches are randomly spawned with increasing probabilities at the center of the arena, and a robot may experience significant slow-down on certain vegetation cells due to vegetation’s bi-modal traction distribution. The mission is deemed successful if the goal is reached within 15 s.

Figure 7: The simulation environment where a robot has to move from start to goal as fast as possible within the bounded arena. Linear and angular traction parameters share the same distribution for simplicity. Vegetation terrain patches are randomly sampled at the center.

Overall, we sample 40 different semantic maps and 5 random realizations of traction parameters for every semantic map. The traction parameters are drawn before starting each trial and remain fixed. The benchmark results can be found in Fig. 6, where we compare the two proposed costs with existing methods, namely WayFAST [16], which uses the expected traction, and the technique in [17], which assumes nominal dynamics while adjusting the time cost with the CVaR of linear traction. The takeaway is that the proposed methods outperform the two existing ones by accounting for the worst-case expected cost and traction. Notably, although the CVaR-Dyn planner does not sample from the entire parameter distribution, it achieves performance similar to the CVaR-Cost planner when $\alpha$ is set sufficiently low. The comparatively poor performance of the CVaR-Cost planner at low $\alpha$ can be attributed to the general difficulty of estimating CVaR from samples when $\alpha$ is small. However, the CVaR-Dyn planner makes a conservative assumption about the dynamics, so it does not outperform CVaR-Cost in achieving low time-to-goal.

To compare our cost formulations against the nominal objective with auxiliary stage costs that penalize states in vegetation cells, we focus on the most challenging setting with 70% vegetation, where it is easy to get stuck in local minima. The benchmark result is shown in Fig. 8, where we compare the trade-offs between success rate and time-to-goal achieved by different methods. Although there exist risk tolerances $\alpha$ that allow CVaR-Cost and CVaR-Dyn to obtain better trade-offs than adding vegetation penalties, their conservativeness in considering the CVaR of cost and traction prevents them from achieving solutions with higher success rates. Therefore, when domain knowledge is available, adding auxiliary costs can help achieve the much higher success rates desirable in practice, but tuning the costs may be challenging when a large variety of terrains exists.

Figure 8: In the most challenging scenario of 70% vegetation, solutions obtained by CVaR-Dyn (green) achieve a better trade-off between success rate and time-to-goal than those achieved by adding an auxiliary penalty $w>0$ for states entering vegetation terrain (black), as indicated by being in the upper left of the figure. However, the success rate of CVaR-Dyn plateaus as $\alpha$ decreases due to shorter state rollouts that lead to local minima. Although the CVaR-Cost planner (red) achieves the best time-to-goal with high $\alpha$, its performance suffers significantly from the conservativeness of using the CVaR of the objective, which leads to worsening success rate and time-to-goal as $\alpha$ decreases. The conservativeness of the proposed costs can be mitigated by adding auxiliary penalties for undesirable terrains to improve navigation performance when domain knowledge is available (see Sec. VI-C).

VI-C Simulated Traction Based on Real-World Data

Figure 9: In a test environment unseen during training (left), the robot has to reach two goals selected to highlight the danger of using unreliable NN predictions due to high model uncertainty. (Right) The GMM-based confidence score (8) indicates the amount of model uncertainty for the predicted traction distribution in each map location, where unknown terrains and known terrains with negative scores are shown in black. Note that the brown semantic region (mulch) at the top has confidence below zero due to the presence of unknown cells, in contrast to the brown semantic region to the left with much fewer unknown cells.
Figure 10: Navigation success rate improves thanks to the proposed OOD terrain detector when deploying a trained NN in a novel environment. As the focus is not on benchmarking different planning algorithms, we select CVaR-Dyn with $\alpha=0.2$ and $g^{\text{thres}}=0$ as the baseline planner, but similar conclusions can be drawn using a different planner. The OOD terrains are handled by either assigning zero traction (blue) or imposing penalties (orange). The performance of the planner that uses the ground-truth (GT) traction is also shown to indicate the best achievable performance. Overall, a higher $g^{\text{thres}}$ improves the navigation success rate at the cost of higher time-to-goal, because there are more OOD terrain cells to avoid. However, auxiliary penalties for OOD terrains make it easier for a given planner to find solutions that lead to the goal. Notably, the average success rate when $g^{\text{thres}}=0.75$ approaches 1, indicating that the learned traction model generalizes well to terrains with high confidence values in the test environment.

In this section, we demonstrate the benefit of the proposed density-based confidence score (8) for detecting terrains that lead to unreliable NN predictions. Because benchmarking different planning algorithms is not the main focus of this section, we use the proposed CVaR-Dyn planner, which has been shown to achieve higher success rates than CVaR-Cost, and set a low risk tolerance $\alpha=0.2$ (experience has shown that similar results occur with other planners). In order to simulate training and testing environments, we leverage data collected in two distinct forests, where the first one (visualized in Fig. 2) is used to train the traction predictor, and the second one (whose semantic top-down view is shown in Fig. 9) is used to simulate the test environment. The traction values are drawn from the test environment’s empirical traction distribution learned by a separate NN, which serves as the proxy ground truth. Two specific start-goal pairs are selected in order to highlight the most challenging parts of the test environment with novel features. Each start-goal pair is repeated 10 times for each selected confidence threshold $g^{\text{thres}}$. We investigate two different approaches to prevent the planner from entering OOD terrains with low confidence: (1) assigning zero traction to OOD terrains, and (2) adding large penalties for states entering OOD terrains. Note that the mission is successful if each goal is reached within 30 s.

As shown in Fig. 10, the navigation success rate improves by up to 30% as $g^{\text{thres}}$ increases, because the robot avoids regions where the network’s predictions may be significantly different from the ground truth. Interestingly, using CVaR-Dyn with additional penalties for states entering OOD terrains leads to better time-to-goal while retaining a similar success rate. Intuitively, the auxiliary costs make it easier for the CVaR-Dyn planner to find trajectories that move around the OOD terrains. Therefore, it is advantageous to combine the proposed cost formulations with auxiliary costs when domain knowledge is available in order to achieve both a high success rate and fast navigation in practice.

VII Conclusion & Future Work

This work proposed a probabilistic traversability model that is easy to train thanks to self-supervision and that captures the full empirical distribution of the traction parameters in the unicycle dynamics. For navigation tasks in simulated environments, planning with the proposed risk-aware costs led to better performance than methods that assumed no slip or used the expected traction. Furthermore, the learned traction model generalized better in novel environments by avoiding terrains that had low confidence scores from the GMM-based density estimator. Lastly, using the proposed costs with auxiliary penalties for undesirable terrains, when such prior knowledge is available, can lead to further improved performance.

Based on our results, the proposed CVaR-Cost planner achieved the best time-to-goal but suffered from poor sample efficiency and conservativeness. Therefore, two interesting research directions are to design a sample-efficient planner that optimizes the CVaR of the objective, and to investigate other risk metrics that are less conservative. Additionally, the proposed framework can be streamlined by replacing the GMM with a normalizing flow model that can be jointly trained. Lastly, extensive hardware experiments are needed to validate the proposed approaches in practice.

Acknowledgment

Research was sponsored by ARL W911NF-21-2-0150 and by ONR grant N00014-18-1-2832.

References

  • [1] D. D. Fan, K. Otsu, Y. Kubo, A. Dixit, J. Burdick, and A.-A. Agha-Mohammadi, “Step: Stochastic traversability evaluation and planning for risk-aware off-road navigation,” in Robotics: Science and Systems.   RSS Foundation, 2021, pp. 1–21.
  • [2] J. Frey, D. Hoeller, S. Khattak, and M. Hutter, “Locomotion policy guided traversability learning using volumetric representations of complex environments,” arXiv preprint arXiv:2203.15854, 2022.
  • [3] A. A. Pereira, J. Binney, G. A. Hollinger, and G. S. Sukhatme, “Risk-aware path planning for autonomous underwater vehicles using predictive ocean models,” Journal of Field Robotics, vol. 30, no. 5, pp. 741–762, 2013.
  • [4] M. Massari, G. Giardini, and F. Bernelli-Zazzera, “Autonomous navigation system for planetary exploration rover based on artificial potential fields,” in Proceedings of Dynamics and Control of Systems and Structures in Space (DCSSS) 6th Conference, 2004, pp. 153–162.
  • [5] A. Valada, G. Oliveira, T. Brox, and W. Burgard, “Deep multispectral semantic scene understanding of forested environments using multimodal fusion,” in International Symposium on Experimental Robotics (ISER), 2016.
  • [6] A. Valada, J. Vertens, A. Dhall, and W. Burgard, “Adapnet: Adaptive semantic segmentation in adverse environmental conditions,” in 2017 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2017, pp. 4644–4651.
  • [7] Z. Chen, D. Pushp, and L. Liu, “Cali: Coarse-to-fine alignments based unsupervised domain adaptation of traversability prediction for deployable autonomous navigation,” arXiv preprint arXiv:2204.09617, 2022.
  • [8] T. Guan, D. Kothandaraman, R. Chandra, A. J. Sathyamoorthy, K. Weerakoon, and D. Manocha, “Ga-nav: Efficient terrain segmentation for robot navigation in unstructured outdoor environments,” IEEE Robotics and Automation Letters, vol. 7, no. 3, pp. 8138–8145, 2022.
  • [9] T. Guan, Z. He, R. Song, D. Manocha, and L. Zhang, “Tns: Terrain traversability mapping and navigation system for autonomous excavators,” Robotics: Science and Systems XVIII, 2021.
  • [10] A. Shaban, X. Meng, J. Lee, B. Boots, and D. Fox, “Semantic terrain classification for off-road autonomous driving,” in Conference on Robot Learning.   PMLR, 2022, pp. 619–629.
  • [11] M. Wigness, S. Eum, J. G. Rogers, D. Han, and H. Kwon, “A rugd dataset for autonomous navigation and visual perception in unstructured outdoor environments,” in International Conference on Intelligent Robots and Systems (IROS), 2019.
  • [12] P. Jiang, P. Osteen, M. Wigness, and S. Saripalli, “Rellis-3d dataset: Data, benchmarks and analysis,” in 2021 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2021, pp. 1110–1116.
  • [13] G. Kahn, P. Abbeel, and S. Levine, “Badgr: An autonomous self-supervised learning-based navigation system,” IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 1312–1319, 2021.
  • [14] X. Yao, J. Zhang, and J. Oh, “Rca: Ride comfort-aware visual navigation via self-supervised learning,” arXiv preprint arXiv:2207.14460, 2022.
  • [15] J. Zürn, W. Burgard, and A. Valada, “Self-supervised visual terrain classification from unsupervised acoustic feature learning,” IEEE Transactions on Robotics, vol. 37, no. 2, pp. 466–481, 2020.
  • [16] M. V. Gasparino, A. N. Sivakumar, Y. Liu, A. E. B. Velasquez, V. A. H. Higuti, J. Rogers, H. Tran, and G. Chowdhary, “Wayfast: Navigation with predictive traversability in the field,” IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 10 651–10 658, 2022.
  • [17] X. Cai, M. Everett, J. Fink, and J. P. How, “Risk-aware off-road navigation via a learned speed distribution map,” in 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).   IEEE, 2022, pp. 2931–2937.
  • [18] A. J. Sathyamoorthy, K. Weerakoon, T. Guan, J. Liang, and D. Manocha, “Terrapn: Unstructured terrain navigation through online self-supervised learning,” arXiv preprint arXiv:2202.12873, 2022.
  • [19] F. G. Oliveira, A. A. Neto, D. Howard, P. Borges, M. F. Campos, and D. G. Macharet, “Three-dimensional mapping with augmented navigation cost through deep learning,” Journal of Intelligent & Robotic Systems, vol. 101, no. 3, pp. 1–21, 2021.
  • [20] S. Otte, C. Weiss, T. Scherer, and A. Zell, “Recurrent neural networks for fast and robust vibration-based ground classification on mobile robots,” in 2016 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2016, pp. 5603–5608.
  • [21] J. Larson, M. Trivedi, and M. Bruch, “Off-road terrain traversability analysis and hazard avoidance for ugvs,” California University San Diego, Dept. of Electrical Engineering, Tech. Rep., 2011.
  • [22] T. Overbye and S. Saripalli, “Fast local planning and mapping in unknown off-road terrain,” in 2020 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2020, pp. 5912–5918.
  • [23] ——, “G-vom: A gpu accelerated voxel off-road mapping system,” in 2022 IEEE Intelligent Vehicles Symposium (IV).   IEEE, 2022, pp. 1480–1486.
  • [24] T. Guan, Z. He, D. Manocha, and L. Zhang, “Ttm: Terrain traversability mapping for autonomous excavator navigation in unstructured environments,” arXiv preprint arXiv:2109.06250, 2021.
  • [25] Y. Tan, N. Virani, B. Good, S. Gray, M. Yousefhussien, Z. Yang, K. Angeliu, N. Abate, and S. Sen, “Risk-aware autonomous navigation,” in Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications III, vol. 11746.   International Society for Optics and Photonics, 2021, p. 117461D.
  • [26] Y. Yang, X. Meng, W. Yu, T. Zhang, J. Tan, and B. Boots, “Learning semantics-aware locomotion skills from human demonstration,” arXiv preprint arXiv:2206.13631, 2022.
  • [27] D. D. Fan, A.-A. Agha-Mohammadi, and E. A. Theodorou, “Learning risk-aware costmaps for traversability in challenging environments,” IEEE Robotics and Automation Letters, vol. 7, no. 1, pp. 279–286, 2021.
  • [28] Z. Wang, O. So, K. Lee, and E. A. Theodorou, “Adaptive risk sensitive model predictive control with stochastic search,” in Learning for Dynamics and Control.   PMLR, 2021, pp. 510–522.
  • [29] J. Gawlikowski, C. R. N. Tassi, M. Ali, J. Lee, M. Humt, J. Feng, A. Kruspe, R. Triebel, P. Jung, R. Roscher, et al., “A survey of uncertainty in deep neural networks,” arXiv preprint arXiv:2107.03342, 2021.
  • [30] Y. Gal and Z. Ghahramani, “Dropout as a bayesian approximation: Representing model uncertainty in deep learning,” in international conference on machine learning.   PMLR, 2016, pp. 1050–1059.
  • [31] I. Osband, C. Blundell, A. Pritzel, and B. Van Roy, “Deep exploration via bootstrapped dqn,” Advances in neural information processing systems, vol. 29, 2016.
  • [32] B. Charpentier, O. Borchert, D. Zügner, S. Geisler, and S. Günnemann, “Natural Posterior Network: Deep Bayesian Predictive Uncertainty for Exponential Family Distributions,” in International Conference on Learning Representations, 2022.
  • [33] G. Williams, N. Wagener, B. Goldfain, P. Drews, J. M. Rehg, B. Boots, and E. A. Theodorou, “Information theoretic mpc for model-based reinforcement learning,” in 2017 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2017, pp. 1714–1721.
  • [34] A. Asgharivaskasi and N. Atanasov, “Active bayesian multi-class mapping from range and semantic segmentation observations,” in 2021 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2021, pp. 1–7.
  • [35] A. Kirillov, Y. Wu, K. He, and R. Girshick, “Pointrend: Image segmentation as rendering,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 9799–9808.
  • [36] K. Chen, B. T. Lopez, A.-a. Agha-mohammadi, and A. Mehta, “Direct lidar odometry: Fast localization with dense point clouds,” IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 2000–2007, 2022.
  • [37] A. Majumdar and M. Pavone, “How should a robot assess risk? towards an axiomatic theory of risk in robotics,” in Robotics Research.   Springer, 2020, pp. 75–84.