This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Risk Bounded Nonlinear Robot Motion Planning With Integrated Perception & Control

Venkatraman Renganathan [email protected] Sleiman Safaoui [email protected] Aadi Kothari [email protected] Benjamin Gravell [email protected] Iman Shames [email protected] Tyler Summers [email protected]
Abstract

Robust autonomy stacks require tight integration of perception, motion planning, and control layers, but these layers often inadequately incorporate inherent perception and prediction uncertainties, either ignoring them altogether or making questionable assumptions of Gaussianity. Robots with nonlinear dynamics and complex sensing modalities operating in an uncertain environment demand more careful consideration of how uncertainties propagate across stack layers. We propose a framework to integrate perception, motion planning, and control by explicitly incorporating perception and prediction uncertainties into planning so that risks of constraint violation can be mitigated. Specifically, we use a nonlinear model predictive control based steering law coupled with a decorrelation scheme based Unscented Kalman Filter for state and environment estimation to propagate the robot state and environment uncertainties. Subsequently, we use distributionally robust risk constraints to limit the risk in the presence of these uncertainties. Finally, we present a layered autonomy stack consisting of a nonlinear steering-based distributionally robust motion planning module and a reference trajectory tracking module. Our numerical experiments with nonlinear robot models and an urban driving simulator show the effectiveness of our proposed approaches.

keywords:
Risk-Bounded Motion Planning , Distributional Robustness , Integrated Perception & Planning
journal: Artificial Intelligencet1t1footnotetext: This work was completed when V. Renganathan was a PhD student at The University of Texas at Dallas, USA.
\affiliation

[inst1] organization=Department of Automatic Control, Lund University, addressline=Naturvetarvägen 18, city=Lund, postcode=221 00, state=SE, country=Sweden.

\affiliation

[inst2] organization=Eric Jonsson School of Engineering & Computer Science, The University of Texas at Dallas, addressline=800 W Campbell Rd, city=Richardson, postcode=75080, state=TX, country=USA.

\affiliation

[inst3]organization=School of Engineering,addressline=Massachusetts Institute of Technology, city=Cambridge, postcode=02139, state=MA, country=USA.

\affiliation

[inst4]organization=CIICADA Lab, School of Engineering,addressline=The Australian National University, city=Acton, postcode=0200, state=ACT, country=Australia.

Supplementary Material

Video of the simulation is available at https://youtu.be/KpyWXRZ-wSI and the simulation code is available at https://github.com/TSummersLab/Risk_Bounded_Nonlinear_Robot_Motion_Planning.

Acknowledgement

This work is partially supported by Defence Science and Technology Group, through agreement MyIP: ID10266 entitled “Hierarchical Verification of Autonomy Architectures”, the Australian Government, via grant AUSMURIB000001 associated with ONR MURI grant N00014-19-1-2571, and by the United States Air Force Office of Scientific Research under award number FA2386-19-1-4073.

1 Introduction

Safety is a critical issue for robotic and autonomous systems that must traverse through uncertain environments. More sophisticated motion planning and control algorithms are needed as environments become increasingly dynamic and uncertain to ensure safe and effective autonomous behavior. Safely deploying robots in such dynamic environments requires a systematic accounting of various risks both within and across layers in an autonomy stack from perception to motion planning and control. Many widely used motion planning algorithms have been developed in deterministic settings. However, since motion planning algorithms must be coupled with the outputs of inherently uncertain perception systems, there is a crucial need for more tightly coupled perception and planning frameworks that explicitly incorporate perception and prediction uncertainties.

Motion planning under uncertainty has been considered in several lines of recent research [1, 2, 3, 4, 5, 6, 7]. Many approaches make questionable assumptions of Gaussianity and utilize chance constraints, ostensibly to maintain computational tractability. However, this can cause significant miscalculations of risk, and the underlying risk metrics do not necessarily possess desirable coherence properties [8, 9]. The emerging area of distributionally robust optimization (DRO) shows that stochastic uncertainty can be handled in much more sophisticated ways without necessarily sacrificing computational tractability [10]. These approaches allow modelers to explicitly incorporate inherent ambiguity in probability distributions, rather than making overly strong structural assumptions on the distribution.

Traditionally, the perception and planning components in a robot autonomy stack are loosely coupled in the sense that nominal estimates from the perception system may be used for planning, while inherent perception uncertainties are usually ignored. This paradigm is inherited, in part, from the classical separation of estimation and control in linear systems theory. However, in the presence of uncertainties and constraints, estimation and control should not be separated as there are needs and opportunities to explicitly incorporate perception uncertainties into planning, both to mitigate risks of constraint violation [1, 11, 3, 12, 7] and to actively plan paths that improve perception [13].

The present paper is aimed towards establishing a systematic framework for integrating the perception and control components using a coherent risk assessment with distributionally robust risk constraints. Algorithms for motion planning under uncertainty have begun to address mission safety, state and control constraints, and trajectory robustness using chance constraints, as in [3]. This approach has been recently broadened in [12, 14, 15] using a more general framework of axiomatic risk theory and distributionally robust optimization.

Contributions: This manuscript is a significant extension of our previous works [14, 16]. In this paper, we relax the assumptions from our previous works by considering both motion model and sensor models to be nonlinear with additive uncertainties and propose an unified framework aimed toward a tighter integration of perception and planning in autonomous robotic systems. Our main contributions are:

  • 1.

    We propose a distributionally robust incremental sampling-based motion planning framework that explicitly and coherently incorporates perception and prediction uncertainties. Our solution approach called Nonlinear Risk Bounded RRT\texttt{RRT}^{\star} (NRB-RRT)(\texttt{NRB-RRT}^{\star}) (Algorithm 1), approximates asymptotically optimal risk-bounded trajectories.

  • 2.

    We design output feedback policies and consider moment-based ambiguity sets of distributions to enforce probabilistic collision avoidance constraints under the worst-case distribution in the ambiguity set (Algorithm 2). Our output feedback policy satisfying input and state risk constraints is constructed using a nonlinear model predictive controller coupled with an Unscented Kalman Filter for state estimation.

  • 3.

    We formulate distributionally robust collision avoidance risk constraints. We demonstrate via numerical simulation results that this gives a more sophisticated and coherent risk quantification compared to an approach that accounts for uncertainty using Gaussian assumptions, without increasing the computation complexity.

  • 4.

    We demonstrate our proposed algorithms and approaches using a unicycle model and a bicycle model in an open urban driving simulator [17] and show that risk bounded motion planning can be achieved effectively for nonlinear robotic systems.

The rest of the paper is organized as follows. The integrated environment state formulation using the nonlinear dynamical model of the robot and the obstacle and their uncertainty modeling is discussed in §2. Next, we describe the layers of a distributionally robust autonomy stack and formulate a general stochastic optimal control problem in §3. An output feedback based steering law with uncertainty propagation is presented in §4. In §5, we describe the planner module that uses the (NRB-RRT)(\texttt{NRB-RRT}^{\star}) motion planning algorithm for planning a reference trajectory. Then, in §6, we discuss about the reformulation technique for the distributionally robust risk constraints that define obstacle avoidance. The simulation results are then presented in §7. Finally, the paper summary and future research directions are discussed in §8.

Notation

The set of real numbers and natural numbers are denoted by \mathbb{R} and \mathbb{N}, respectively. The subset of natural numbers between and including aa and bb with a<ba<b is denoted by [a:b][a:b]. The operators |||\cdot| and ()c(\cdot)^{c} denote the set cardinality and the set complement of its argument, respectively. The operators ,\\oplus,\backslash denote the set translation and set subtraction respectively. An identity matrix in dimension nn is denoted by InI_{n}. For a non-zero vector xnx\in\mathbb{R}^{n} and a matrix P𝕊++nP\in\mathbb{S}^{n}_{++} (set of positive definite matrices), let xP=xPx\left\|x\right\|_{P}=\sqrt{x^{\top}Px}.

2 Description of the Robot & the Environment

Consider a robot operating in an environment, 𝒳n\mathcal{X}\subset\mathbb{R}^{n} cluttered with obstacles whose locations and motion are uncertain. The robot is assumed to be equipped with sensors and a perception system that support safe navigation through the environment. In what follows, we present the necessary ingredients for presenting the results such as the models for the robot, the environment, and the perception system.

2.1 Robot & Environment Model

2.1.1 Robot Model:

The robot is modeled as a stochastic discrete-time dynamical system:

xt+1\displaystyle x_{t+1} =𝐟(xt,ut)+wr,t,\displaystyle=\mathbf{f}(x_{t},u_{t})+w_{r,t}, (1)

where xtn,utmx_{t}\in\mathbb{R}^{n},u_{t}\in\mathbb{R}^{m} are the robot state and input respectively at time tt and 𝐟:n×mn\mathbf{f}:\mathbb{R}^{n}\times\mathbb{R}^{m}\rightarrow\mathbb{R}^{n} is a nonlinear function that represents the robot dynamics. The initial condition x0x_{0} is subject to an uncertainty model with the true distribution x0\mathbb{P}_{x_{0}} of x0x_{0} belonging to a moment-based ambiguity set 𝒫x0\mathcal{P}^{x_{0}}, i.e., x0𝒫x0\mathbb{P}_{x_{0}}\in\mathcal{P}^{x_{0}} where

𝒫x0:={x0𝔼[x0]=x¯0,𝔼[(x0x¯0)(x0x¯0)]=Σx0},\displaystyle\mathcal{P}^{x_{0}}:=\left\{\mathbb{P}_{x_{0}}\mid\mathbb{E}[x_{0}]=\bar{x}_{0},\mathbb{E}[(x_{0}-\bar{x}_{0})(x_{0}-\bar{x}_{0})^{\top}]=\Sigma_{x_{0}}\right\}, (2)

and x¯0\bar{x}_{0} and Σx0\Sigma_{x_{0}} are some known parameters of appropriate dimensions. The additive process noise wr,tnw_{r,t}\in\mathbb{R}^{n} is a zero-mean random vector independent and identically distributed across time according to some prescribed distribution wr\mathbb{P}_{w_{r}} with covariance Σwr\Sigma_{w_{r}}.

2.1.2 Environment Model

The environment 𝒳\mathcal{X} is assumed to be convex and represented by finite number of linear inequalities

𝒳:={xtAextbe}n,\displaystyle\mathcal{X}:=\left\{x_{t}\mid A_{e}x_{t}\leq b_{e}\right\}\subset\mathbb{R}^{n}, (3)

where the matrices Ae,beA_{e},b_{e} are of appropriate dimensions. We collectively refer the set of obstacles in the environment to be avoided as \mathcal{B} with ||=F|\mathcal{B}|=F\in\mathbb{N}. We represent the shape of an obstacle ii\in\mathcal{B} at a given initial time (t0=0)(t_{0}=0) by 𝒪i,0n\mathcal{O}_{i,0}\subset\mathbb{R}^{n}. For simplicity, i\forall i\in\mathcal{B} the obstacles shapes 𝒪i,0\mathcal{O}_{i,0} are assumed to be convex polytopes. That is, for ii\in\mathcal{B}

𝒪i,0:={x0Aix0bi,0}n,\displaystyle\mathcal{O}_{i,0}:=\left\{x_{0}\mid A_{i}x_{0}\leq b_{i,0}\right\}\subset\mathbb{R}^{n}, (4)

with the matrices Ain×n,bi,tnA_{i}\in\mathbb{R}^{n\times n},b_{i,t}\in\mathbb{R}^{n}. All obstacles are assumed to have an unpredictable motion and the set defining the space occupied by the obstacle ii\in\mathcal{B} at any time tt is given by 𝒪i,t\mathcal{O}_{i,t}. Then, the free space in the environment at time tt is

𝒳tfree\displaystyle\mathcal{X}^{\texttt{free}}_{t} :=𝒳\i𝒪i,t,where,\displaystyle:=\mathcal{X}\,\big{\backslash}\,\bigcup_{i\in\mathcal{B}}\mathcal{O}_{i,t},\quad\text{where,} (5)
𝒪i,t\displaystyle\mathcal{O}_{i,t} =𝖱i,t1𝒪i,t1{r¯i,t}{ri,t1},t1.\displaystyle=\mathsf{R}_{i,t-1}\mathcal{O}_{i,t-1}\oplus\{\bar{r}_{i,t}\}\oplus\{r_{i,t-1}\},\quad\forall t\geq 1. (6)

Here, r¯i,tn\bar{r}_{i,t}\in\mathbb{R}^{n} represents a known nominal translation and ri,tnr_{i,t}\in\mathbb{R}^{n} is a zero-mean random vector that represents the uncertain translation (possibly an unknown location or unpredictable obstacle motion) of the obstacle ii\in\mathcal{B} at time tt, and it is assumed to follow a known distribution i,tr\mathbb{P}^{r}_{i,t} with covariance Σitr\Sigma^{r}_{it}. Further, 𝖱i,t1n×n\mathsf{R}_{i,t-1}\in\mathbb{R}^{n\times n} denotes the product of random rotation matrices that represents the uncertain rotation of the obstacle ii\in\mathcal{B} at time tt where we used the definition that multiplying a matrix 𝖱i,t\mathsf{R}_{i,t} to a set 𝒪i,t\mathcal{O}_{i,t} is defined by 𝖱i,t𝒪i,t:={𝖱i,t(oco,i)+co,io𝒪i,t}\mathsf{R}_{i,t}\mathcal{O}_{i,t}:=\left\{\mathsf{R}_{i,t}\cdot(o-c_{o,i})+c_{o,i}\mid o\in\mathcal{O}_{i,t}\right\} with co,inc_{o,i}\in\mathbb{R}^{n} denoting the centroid of obstacle set 𝒪i,t\mathcal{O}_{i,t}. Let us denote the vectorized version of the uncertain rotation matrix 𝖱i,t\mathsf{R}_{i,t} as 𝗋i,t:=𝗏𝖾𝖼(𝖱i,t)n2\mathsf{r}_{i,t}:=\mathsf{vec}\left(\mathsf{R}_{i,t}\right)\in\mathbb{R}^{n^{2}} and it is assumed to be a zero-mean random vector following a known distribution i,t𝗋\mathbb{P}^{\mathsf{r}}_{i,t} with covariance Σit𝗋\Sigma^{\mathsf{r}}_{it}. However, for the ease of exposition in the remainder of this paper, we shall assume that either 𝖱i,t\mathsf{R}_{i,t} is known or 𝖱i,t\mathsf{R}_{i,t} being an identity transformation (no rotation) at all time steps tt, so that there is uncertainty only in the translation. We denote by 𝖷i,tl,\mathsf{X}_{i,t}\in\mathbb{R}^{l}, the states of the obstacle ii\in\mathcal{B}, which for instance could represent the position and velocity of the centroid of the obstacle at time tt. Then, the evolution of the state of the obstacle ii\in\mathcal{B} can be written as

𝖷i,t+1\displaystyle\mathsf{X}_{i,t+1} =𝗀i,t(𝖷i,t)+w𝒪,i,t.\displaystyle=\mathsf{g}_{i,t}\left(\mathsf{X}_{i,t}\right)+w_{\mathcal{O},i,t}. (7)

The random vector ri,tr_{i,t} that resulted in the set translation in (6) manifests itself as the zero-mean process noise that affects the states of the obstacle ii\in\mathcal{B} as w𝒪,i,tlw_{\mathcal{O},i,t}\in\mathbb{R}^{l} and w𝒪,i,tw_{\mathcal{O},i,t} is assumed to follow a known distribution w𝒪i\mathbb{P}_{w_{\mathcal{O}_{i}}} with covariance Σw𝒪i\Sigma_{w_{\mathcal{O}_{i}}}. Further, 𝗀i,t:ll\mathsf{g}_{i,t}:\mathbb{R}^{l}\rightarrow\mathbb{R}^{l} denotes the function (possibly nonlinear) that represents the dynamics of the obstacle state at time tt. For instance, the obstacle can be assumed to travel at a constant velocity but with process noise. Further, we assume that the uncertainty in the motion of obstacle ii is independent from every other obstacle jj\in\mathcal{B}, which implies 𝔼[w𝒪,i,tw𝒪,j,t]=0,i,j,ij\mathbb{E}\left[w_{\mathcal{O},i,t}w^{\top}_{\mathcal{O},j,t}\right]=0,\forall i,j\in\mathcal{B},i\neq j. Then, the joint evolution of all obstacles can be written as

𝖷𝒪,t+1=𝐠(𝖷𝒪,t)+w𝒪,t,\displaystyle\mathsf{X}_{\mathcal{O},t+1}=\mathbf{g}\left(\mathsf{X}_{\mathcal{O},t}\right)+w_{\mathcal{O},t}, (8)

where 𝖷𝒪t\mathsf{X}_{\mathcal{O}_{t}}, 𝐠()\mathbf{g}(\cdot), and w𝒪,tw_{\mathcal{O},t} represent the concatenated states, nonlinear dynamics and the process noises of all the FF obstacles at time tt respectively. The additive process noise w𝒪,tFlw_{\mathcal{O},t}\in\mathbb{R}^{Fl} is a zero-mean random vector independent and identically distributed across time according to some prescribed distribution w𝒪\mathbb{P}_{w_{\mathcal{O}}} with covariance Σw𝒪\Sigma_{w_{\mathcal{O}}}. We assume that the process noises affecting the robot and the obstacles ii\in\mathcal{B} are independent of each other. That is, 𝔼[wr,tw𝒪,i,t]=0,i\mathbb{E}\left[w_{r,t}w^{\top}_{\mathcal{O},i,t}\right]=0,\forall i\in\mathcal{B}.

2.1.3 Integrated Environmental Dynamics

We concatenate both the robot’s state and the obstacle states at time tt to form the environmental state

𝒵t\displaystyle\mathcal{Z}_{t} =[xt𝖷𝒪t]nz,nz=n+Fl.\displaystyle=\begin{bmatrix}x_{t}\\ \mathsf{X}_{\mathcal{O}_{t}}\end{bmatrix}\in\mathbb{R}^{n_{z}},\quad n_{z}=n+Fl. (9)

Then the dynamics of the environmental state is given by

𝒵t+1\displaystyle\mathcal{Z}_{t+1} =[𝐟(xt,ut)𝐠(𝖷𝒪t)]f~(𝒵t,ut)+[wr,tw𝒪,t]wt.\displaystyle=\underbrace{\begin{bmatrix}\mathbf{f}(x_{t},u_{t})\\ \mathbf{g}(\mathsf{X}_{\mathcal{O}_{t}})\end{bmatrix}}_{\tilde{f}\left(\mathcal{Z}_{t},u_{t}\right)}+\underbrace{\begin{bmatrix}w_{r,t}\\ w_{\mathcal{O},t}\end{bmatrix}}_{w_{t}}. (10)

The initial condition 𝒵0\mathcal{Z}_{0} is subject to an uncertainty model with the true distribution 𝒵0\mathbb{P}_{\mathcal{Z}_{0}} of 𝒵0\mathcal{Z}_{0} belonging to a moment-based ambiguity set 𝒫𝒵0\mathcal{P}^{\mathcal{Z}_{0}}, i.e., 𝒵0𝒫𝒵0\mathbb{P}_{\mathcal{Z}_{0}}\in\mathcal{P}^{\mathcal{Z}_{0}} where

𝒫𝒵0:={𝒵0𝔼[𝒵0]=𝒵¯0,𝔼[(𝒵0𝒵¯0)(𝒵0𝒵¯0)]=Σ𝒵0}.\displaystyle\mathcal{P}^{\mathcal{Z}_{0}}:=\left\{\mathbb{P}_{\mathcal{Z}_{0}}\mid\mathbb{E}[\mathcal{Z}_{0}]=\bar{\mathcal{Z}}_{0},\mathbb{E}[(\mathcal{Z}_{0}-\bar{\mathcal{Z}}_{0})(\mathcal{Z}_{0}-\bar{\mathcal{Z}}_{0})^{\top}]=\Sigma_{\mathcal{Z}_{0}}\right\}. (11)

Here, wtnzw_{t}\in\mathbb{R}^{n_{z}} is a zero-mean random vector independent and identically distributed across time with covariance Σw=[Σwr00Σw𝒪]\Sigma_{w}=\begin{bmatrix}\Sigma_{w_{r}}&0\\ 0&\Sigma_{w_{\mathcal{O}}}\end{bmatrix}. At any instant of time tt, xtx_{t} can be extracted from the environmental state 𝒵t\mathcal{Z}_{t} as

xt\displaystyle x_{t} =[In𝟎n×Fl]Cxr𝒵t.\displaystyle=\underbrace{\begin{bmatrix}I_{n}&\mathbf{0}_{n\times Fl}\end{bmatrix}}_{C_{xr}}\mathcal{Z}_{t}. (12)

2.2 Sensor Model

In an autonomous robot, the environmental state 𝒵t\mathcal{Z}_{t} must be estimated with a perception system represented by a noisy on-board sensor measurements. We assume that a high-level perception system, such as Semantic SLAM described in [18] is in place and it processes high dimensional raw data ΘtNΘ,NΘ\Theta_{t}\in\mathbb{R}^{N_{\Theta}},N_{\Theta}\in\mathbb{N} to recognize the robot and the obstacles to produce noisy joint measurements of their respective states. In particular, the perception system recognizes the robot and obstacles through their distinctive features through mappings Υx:NΘn,Υ𝒪:NΘl\Upsilon_{x}:\mathbb{R}^{N_{\Theta}}\rightarrow\mathbb{R}^{n},\Upsilon_{\mathcal{O}}:\mathbb{R}^{N_{\Theta}}\rightarrow\mathbb{R}^{l}, labels the entities accordingly and returns a noisy measurement about their position and orientation. We abstract this whole process and assume that the perception system produces noisy measurements of the robot and obstacle states using an assumed nonlinear output model

yt\displaystyle y_{t} =𝒮(𝒵t)+vt,\displaystyle=\mathcal{S}(\mathcal{Z}_{t})+v_{t}, (13)

where ytpy_{t}\in\mathbb{R}^{p} is the output measurement and 𝒮:nzp\mathcal{S}:\mathbb{R}^{n_{z}}\rightarrow\mathbb{R}^{p} is a nonlinear function. The additive measurement noise vtpv_{t}\in\mathbb{R}^{p} is a zero-mean random vector independent and identically distributed across time according to some prescribed distribution v\mathbb{P}_{v} with covariance Σv\Sigma_{v}.

There can be interactions between the uncertainties of the perception system and the process disturbance, due to coupling between robot and environmental states in the perception and the use of an output feedback control policy. Often, this interaction is ignored for the sake of simplicity and as a result can impose undue risk. To model the interaction, we assume that the process noise wtw_{t} and the measurement noise vtv_{t} are cross correlated with each other. That is, the joint noise [wtvt]\begin{bmatrix}w_{t}\\ v_{t}\end{bmatrix} has zero mean with covariance

Σwv=𝔼[[wtvt][wtvt]]=[ΣwMMΣv]0,\displaystyle\Sigma_{wv}=\mathbb{E}\left[\begin{bmatrix}w_{t}\\ v_{t}\end{bmatrix}\begin{bmatrix}w_{t}\\ v_{t}\end{bmatrix}^{\top}\right]=\begin{bmatrix}\Sigma_{w}&M\\ M^{\top}&\Sigma_{v}\end{bmatrix}\succ 0, (14)

where M=𝔼[wtvt]nz×pM=\mathbb{E}[w_{t}v^{\top}_{t}]\in\mathbb{R}^{n_{z}\times p} denotes the rank-one cross-correlation matrix across all time steps tt and the matrix M𝟎nz×pM\neq\mathbf{0}_{n_{z}\times p} is such that the joint noise covariance Σwv\Sigma_{wv} given by (14) is positive definite.

2.3 State & Input Constraints

The robot is nominally subject to constraints on the state and input of the form, t=0,,T1\forall t=0,\dots,T-1,

xt\displaystyle x_{t} 𝒳tfree,\displaystyle\in\mathcal{X}^{\texttt{free}}_{t}, (15)
ut\displaystyle u_{t} 𝒰,\displaystyle\in\mathcal{U}, (16)

where the 𝒰m\mathcal{U}\subset\mathbb{R}^{m} are assumed to be convex polytope. We can rewrite the set description of the required sets using the robot inputs and environmental state as

𝒰\displaystyle\mathcal{U} ={utAuutbu},\displaystyle=\{u_{t}\mid A_{u}u_{t}\leq b_{u}\}, (17)
𝒳\displaystyle\mathcal{X} ={𝒵tA0Cxr𝒵tb0},\displaystyle=\{\mathcal{Z}_{t}\mid A_{0}C_{xr}\mathcal{Z}_{t}\leq b_{0}\}, (18)
𝒪it\displaystyle\mathcal{O}_{it} ={𝒵tAiCxr𝒵tbit},i\displaystyle=\{\mathcal{Z}_{t}\mid A_{i}C_{xr}\mathcal{Z}_{t}\leq b_{it}\},\quad i\in\mathcal{B} (19)

where bunu,b0nE,bitni,b_{u}\in\mathbb{R}^{n_{u}},b_{0}\in\mathbb{R}^{n_{E}},b_{it}\in\mathbb{R}^{n_{i}}, and Au,A0,A_{u},A_{0}, and AiA_{i} are matrices of appropriate dimension. Suppose that nobin_{ob_{i}} and nenvn_{env} be the number of constraints for obstacle ii\in\mathcal{B} and the environment 𝒳\mathcal{X} respectively. Then, the total number of constraints is denoted by

ntotal=nenv+i=1Fnobi.\displaystyle n_{total}=n_{env}+\sum_{i=1}^{F}n_{ob_{i}}. (20)

3 Layers of a Distributionally Robust Autonomy Stack

The autonomy stack in an autonomous robot, as depicted in Figure 1, can be partitioned into a hierarchy of i) a perception system that jointly estimates the robot and environmental states; ii) a motion planner which generates a reference trajectory through the environment, and iii) a feedback controller that tracks the reference trajectory online to mitigate the effects of disturbances. To emphasize our integrated approach, we first pose a general stochastic optimal control problem that incorporates all of these layers. Then we describe how the layers are integrated with explicit incorporation of uncertainties across layers in subsequent sections.

Refer to caption
Figure 1: A distributionally robust autonomy stack with integrated perception, planning, and control layers.

Given an initial state distribution 𝒵0𝒵0\mathcal{Z}_{0}\sim\mathbb{P}_{\mathcal{Z}_{0}} and a set of final goal locations 𝒳goal𝒳\mathcal{X}_{goal}\subset\mathcal{X}, we find a measurable output-and-input-history-dependent control policy π=[π0,πT1]\pi=[\pi_{0},\dots\pi_{T-1}] with ut=πt(y[0:t],u[0:t1])u_{t}=\pi_{t}(y_{[0:t]},u_{[0:t-1]}) that moves the robot state mean to the goal set while respecting input constraints and distributionally robust collision risk constraints, while incorporating the ambiguities in the distributions described above. To this aim, we define the following distributionally robust constrained stochastic optimal control problem

minimize𝜋\displaystyle\underset{\pi}{\text{minimize}} [t=0T1t(𝔼[𝒵t],ut)+T(𝔼[𝒵T])]\displaystyle\left[\sum^{T-1}_{t=0}\ell_{t}(\mathbb{E}[\mathcal{Z}_{t}],u_{t})+\ell_{T}(\mathbb{E}[\mathcal{Z}_{T}])\right] (21a)
subject to 𝒵t+1=f~(𝒵t,ut)+wt,\displaystyle\mathcal{Z}_{t+1}=\tilde{f}(\mathcal{Z}_{t},u_{t})+w_{t}, (21b)
yt=𝒮(𝒵t)+vt,\displaystyle y_{t}=\mathcal{S}(\mathcal{Z}_{t})+v_{t}, (21c)
𝒵0𝒵0𝒫𝒵,\displaystyle\mathcal{Z}_{0}\sim\mathbb{P}_{\mathcal{Z}_{0}}\in\mathcal{P}^{\mathcal{Z}}, (21d)
wtw(0,Σw),vtv(0,Σv),\displaystyle w_{t}\sim\mathbb{P}_{w}(0,\Sigma_{w}),v_{t}\sim\mathbb{P}_{v}(0,\Sigma_{v}), (21e)
ut𝒰={utAuutbu},\displaystyle u_{t}\in\mathcal{U}=\{u_{t}\mid A_{u}u_{t}\leq b_{u}\}, (21f)
sup𝒵t𝒫𝒵(Cxr𝒵t𝒳tfree)αt,t[0:T],\displaystyle\underset{\mathbb{P}_{\mathcal{Z}_{t}}\in\mathcal{P}^{\mathcal{Z}}}{\sup}\mathcal{R}\left(C_{xr}\mathcal{Z}_{t}\notin\mathcal{X}^{\texttt{free}}_{t}\right)\leq\alpha_{t},\quad\forall t\in[0:T], (21g)

where 𝒫𝒵\mathcal{P}^{\mathcal{Z}} is an ambiguity set of marginal state distributions, αt(0,0.5]\alpha_{t}\in(0,0.5] is a user-prescribed stage risk parameter, and ()\mathcal{R}(\cdot) is a risk measure that operates on the distribution of states and satisfies certain desirable axioms mentioned in [9]. The stage cost functions t()\ell_{t}(\cdot) penalizes the robot’s distance to the goal set and actuator effort, and are assumed to be expressed in terms of the environmental state mean 𝔼[𝒵t]\mathbb{E}[\mathcal{Z}_{t}], so that the state uncertainty appears only in the constraints.

Two key features distinguish our problem formulation. First, the state constraints are expressed as distributionally robust risk constraints, using the risk measure ()\mathcal{R}(\cdot) and the ambiguity set 𝒫𝒵\mathcal{P}^{\mathcal{Z}} whose construction will be detailed below. Since we will work with moment based ambiguity sets for the states, we give special consideration to the distributionally robust value-at-risk (DR-VaR) risk measure

sup𝒵t𝒫𝒵𝒵t(Cxr𝒵t𝒳tfree)αt,t[0:T],\displaystyle\underset{\mathbb{P}_{\mathcal{Z}_{t}}\in\mathcal{P}^{\mathcal{Z}}}{\sup}\mathbb{P}_{\mathcal{Z}_{t}}(C_{xr}\mathcal{Z}_{t}\notin\mathcal{X}^{\texttt{free}}_{t})\leq\alpha_{t},\quad\forall t\in[0:T], (22)

which is a special case of (21g) with a tractable reformulation. This means that the nominal constraints xt𝒳tfreex_{t}\in\mathcal{X}^{\texttt{free}}_{t} are enforced with probability αt\alpha_{t} at time tt under the worst-case distribution in the ambiguity set and thus making them distributionally robust chance constraints.

Second, since information about the environmental state is obtained only from noisy measurements, we optimize over dynamic output feedback policies. This policy encompasses the full autonomy stack from perception to planning to control. However, the problem is infinite-dimensional, and moreover, the distributionally robust risk constraint in (22) is also infinite-dimensional and inherently non-convex due to the underlying obstacle avoidance constraints. Thus, solving (21) exactly is essentially impossible. Instead, we aim for an approximate solution. Our proposed solution framework, detailed below, integrates a dynamic nonlinear state estimator (the estimator block in Figure 1), a nonlinear feedback controller (the controller block in Figure 1), and a sampling-based kinodynamic motion planning under uncertainty algorithm (the planner block in Figure 1). Specifically, in this work we propose using an Unscented Kalman filter as the nonlinear estimator, a nonlinear model predictive control as the nonlinear feedback controller, and a novel distributionally robust, kinodynamic variant of the RRT\texttt{RRT}^{\star} called Nonlinear Risk Bounded RRT(NRB-RRT)\texttt{RRT}^{\star}(\texttt{NRB-RRT}^{\star}) as the planner111It is worth noting that other choices for estimator, controller, and planner subsystems can be incorporated in our proposed framework, nevertheless, in this work, for reasons that will be explained later, we make the aforementioned choices for these subsystems.. This integration and explicit incorporation of state estimation uncertainty into the motion planning and control takes a step toward tighter integration of perception, planning, and control, which are nearly always separated in state-of-the-art robotic systems.

4 Output Feedback Based Steering Law: Unscented Kalman Filter with Nonlinear Model Predictive Controller

Sampling based motion planning algorithms require a steering law to steer the robot from one pose to another feasible pose in the free space. Since the environment state has nonlinear dynamics and must be estimated from noisy nonlinear output measurements (13), our proposed steering law π=[π0,,πT1]\pi=[\pi_{0},\dots,\pi_{T-1}] with ut=πt(y0:t,u0:t1)u_{t}=\pi_{t}(y_{0:t},u_{0:t-1}) comprises a combination of nonlinear dynamic state estimator and a nonlinear feedback controller. We utilize the Unscented Kalman Filter (UKF) for joint estimation of robot and obstacle states. The UKF provides derivative-free estimation of the first and second moment, which can be more accurate than the Extended Kalman Filter while remaining computationally tractable. A nonlinear model predictive controller as described in [19], [20] is then utilized based on the state estimate from the UKF.

4.1 The Unscented Kalman Filter (UKF)

In this subsection, we elaborate about the Unscented Kalman Filter algorithm which serves as an estimator module for a perception system present in the autonomy stack shown in Figure 1. Later in subsection 4.2, we will describe how the estimates from UKF are then used to realize an output feedback based controller.

Noise Decorrelation. Before describing the UKF, we present a decorrelation scheme [21] to handle cross-correlation between process noise and sensor induced by the coupling between perception and control layers from (14). The scheme uses a pseudo state process equation to reconstruct a corresponding pseudo process noise, which is no longer correlated with the measurement noise. It is evident from (13) that

yt𝒮(𝒵t)vt=0.\displaystyle y_{t}-\mathcal{S}(\mathcal{Z}_{t})-v_{t}=0. (23)

Thus the environmental dynamics given in (10) can be rearranged as follows

𝒵t+1\displaystyle\mathcal{Z}_{t+1} =f~(𝒵t,ut)+wt+(yt𝒮(𝒵t)vt)\displaystyle=\tilde{f}(\mathcal{Z}_{t},u_{t})+w_{t}+\mathcal{H}(y_{t}-\mathcal{S}(\mathcal{Z}_{t})-v_{t}) (24)
=f~(𝒵t,ut)𝒮(𝒵t)+ytf(𝒵t,ut)+wtvtwt,\displaystyle=\underbrace{\tilde{f}(\mathcal{Z}_{t},u_{t})-\mathcal{H}\mathcal{S}(\mathcal{Z}_{t})+\mathcal{H}y_{t}}_{f^{*}(\mathcal{Z}_{t},u_{t})}+\underbrace{w_{t}-\mathcal{H}v_{t}}_{w^{*}_{t}}, (25)

where f(𝒵t,ut),wtf^{*}(\mathcal{Z}_{t},u_{t}),w^{*}_{t} are the corresponding pseudo process dynamics and pseudo process noise respectively. The term yt\mathcal{H}y_{t} is considered as a known input and is approximated with y^t\mathcal{H}\hat{y}_{t}. The pseudo gain term \mathcal{H} is a design parameter chosen such that the cross-correlation between the pseudo process noise and the sensor noise is made zero. That is,

𝔼[wtvt]=0𝔼[(wtvt)vt]=MΣv=0,=MΣv1.\mathbb{E}\left[w^{*}_{t}v^{\top}_{t}\right]=0\\ \implies\mathbb{E}\left[(w_{t}-\mathcal{H}v_{t})v^{\top}_{t}\right]=M-\mathcal{H}\Sigma_{v}=0,\\ \implies\mathcal{H}=M\Sigma^{-1}_{v}.

The mean and the covariance of the pseudo process noise are then given by

𝔼[wt]\displaystyle\mathbb{E}\left[w^{*}_{t}\right] =𝔼[wtvt]=𝔼[wt]𝔼[vt]=0,\displaystyle=\mathbb{E}\left[w_{t}-\mathcal{H}v_{t}\right]=\mathbb{E}\left[w_{t}\right]-\mathcal{H}\mathbb{E}\left[v_{t}\right]=0, (26)
Σw\displaystyle\Sigma^{*}_{w} =𝔼[wtwt]\displaystyle=\mathbb{E}\left[w^{*}_{t}{w^{*}_{t}}^{\top}\right]
=𝔼[(wtvt)(wtvt)]\displaystyle=\mathbb{E}\left[(w_{t}-\mathcal{H}v_{t})(w_{t}-\mathcal{H}v_{t})^{\top}\right]
=𝔼[wtwtwtvtvtwt+vtvt]\displaystyle=\mathbb{E}\left[w_{t}w^{\top}_{t}-w_{t}v^{\top}_{t}\mathcal{H}^{\top}-\mathcal{H}v_{t}w^{\top}_{t}+\mathcal{H}v_{t}v^{\top}_{t}\mathcal{H}^{\top}\right]
=ΣwMM+Σv\displaystyle=\Sigma_{w}-M\mathcal{H}^{\top}-\mathcal{H}M^{\top}+\mathcal{H}\Sigma_{v}\mathcal{H}^{\top}
=ΣwMΣv1MΣv+Σv\displaystyle=\Sigma_{w}-M\Sigma^{-1}_{v}M^{\top}-\mathcal{H}\Sigma_{v}\mathcal{H}^{\top}+\mathcal{H}\Sigma_{v}\mathcal{H}^{\top}
=ΣwMΣv1M.\displaystyle=\Sigma_{w}-M\Sigma^{-1}_{v}M^{\top}. (27)

Then, applying Schur complement on (14), it is easy to observe that Σw\Sigma^{*}_{w} is positive definite as well:

[ΣwMMΣv]0Σv0,ΣwMΣv1MΣw0.\begin{bmatrix}\Sigma_{w}&M\\ M^{\top}&\Sigma_{v}\end{bmatrix}\succ 0\iff\Sigma_{v}\succ 0,\,\underbrace{\Sigma_{w}-M\Sigma^{-1}_{v}M^{\top}}_{\Sigma^{*}_{w}}\succ 0.

With the new pseudo process equation f()f^{*}(\cdot) given by (25) and the measurement update given by (13), the standard Unscented Kalman Filter can be implemented as the new process noise wtw^{*}_{t} is no longer cross-correlated with the sensor noise vtv_{t}.

The Unscented Kalman Filter. The UKF is a popular tool for nonlinear state estimation. It uses the so-called Unscented Transformation (UT) in both the propagation and update step, yielding derivative-free Kalman filtering for nonlinear systems. For Gaussian inputs, the moment estimates from UT are accurate up to the third order approximation and for the case of non-Gaussian, the approximations are accurate to at least the second-order as described in [22]. An ensemble of 2nz+12n_{z}+1 samples called the sigma points following the Van Der Merwe algorithm is generated deterministically as follows:

χ0\displaystyle\chi_{0} =Z^t1\displaystyle=\hat{Z}_{t-1} (28)
χi\displaystyle\chi_{i} =Z^t1+[(nz+λ)Σ^𝒵t1]i,i=[1:nz]\displaystyle=\hat{Z}_{t-1}+\left[\sqrt{(n_{z}+\lambda)\hat{\Sigma}_{\mathcal{Z}_{t-1}}}\right]_{i},i=[1:n_{z}] (29)
χi+n\displaystyle\chi_{i+n} =Z^t1[(nz+λ)Σ^𝒵t1]i,i=[1:nz],\displaystyle=\hat{Z}_{t-1}-\left[\sqrt{(n_{z}+\lambda)\hat{\Sigma}_{\mathcal{Z}_{t-1}}}\right]_{i},i=[1:n_{z}], (30)

where Z^t1,Σ^𝒵t1\hat{Z}_{t-1},\hat{\Sigma}_{\mathcal{Z}_{t-1}} denote the posterior UKF estimate of mean and covariance of the environmental state at t1t-1 and [(nz+λ)Σ^𝒵t1]i\left[\sqrt{(n_{z}+\lambda)\hat{\Sigma}_{\mathcal{Z}_{t-1}}}\right]_{i} is the ithi^{th} row or column of the matrix square root (nz+λ)Σ^𝒵t1\sqrt{(n_{z}+\lambda)\hat{\Sigma}_{\mathcal{Z}_{t-1}}}. The weights of the sigma points in the calculation of propagated mean and covariance are

W0(m)\displaystyle W^{(m)}_{0} =λnz+λ,W0(c)=λnz+λ+1αu2+βu,\displaystyle=\frac{\lambda}{n_{z}+\lambda},\ W^{(c)}_{0}=\frac{\lambda}{n_{z}+\lambda}+1-\alpha^{2}_{u}+\beta_{u}, (31)
Wi(m)\displaystyle W^{(m)}_{i} =Wi(c)=λ2(nz+λ),i=1,2,,2nz.\displaystyle=W^{(c)}_{i}=\frac{\lambda}{2(n_{z}+\lambda)},\ i=1,2,\dots,2n_{z}. (32)

The scaling parameter λ=αu2(n+κ)nz\lambda=\alpha^{2}_{u}(n+\kappa)-n_{z}, where αu,βu,κ\alpha_{u},\beta_{u},\kappa are used to tune the unscented transformation. Then, every sigma point is propagated through non-linear state update equation to yield an ensemble of sigma points capturing the a priori statistics of 𝒵t\mathcal{Z}_{t} namely 𝒵t,Σ𝒵t\mathcal{Z}^{-}_{t},\Sigma^{-}_{\mathcal{Z}_{t}}. In the update step, we use the transformed set of sigma points described above and propagate them using the given nonlinear measurement model. Finally, the aposteriori state Z^t\hat{Z}_{t} and covariance Σ^𝒵t\hat{\Sigma}_{\mathcal{Z}_{t}} are computed using the output residual and the obtained Unscented Kalman filter gain, t\mathcal{L}_{t} as follows

ξi\displaystyle\xi_{i} =f(χi,ut),i=0,1,,2nz,\displaystyle=f^{*}(\chi_{i},u_{t}),\quad i=0,1,\dots,2n_{z}, (33)
𝒵t\displaystyle\mathcal{Z}^{-}_{t} =i=02nzWi(m)ξi,\displaystyle=\sum^{2n_{z}}_{i=0}W^{(m)}_{i}\xi_{i}, (34)
Σ𝒵t\displaystyle\Sigma^{-}_{\mathcal{Z}_{t}} =i=02nzWi(c)(ξi𝒵t)(ξi𝒵t)+Σw\displaystyle=\sum^{2n_{z}}_{i=0}W^{(c)}_{i}(\xi_{i}-\mathcal{Z}^{-}_{t})(\xi_{i}-\mathcal{Z}^{-}_{t})^{\top}+\Sigma^{*}_{w} (35)
Θi\displaystyle\Theta_{i} =𝒮(ξi),i=0,1,,2nz,\displaystyle=\mathcal{S}(\xi_{i}),\quad i=0,1,\dots,2n_{z}, (36)
μΘ\displaystyle\mu_{\Theta} =i=02nzWi(m)Θi,\displaystyle=\sum^{2n_{z}}_{i=0}W^{(m)}_{i}\Theta_{i}, (37)
ΣΘ\displaystyle\Sigma_{\Theta} =i=02nzWi(c)(ΘiμΘ)(ΘiμΘ)+Σv\displaystyle=\sum^{2n_{z}}_{i=0}W^{(c)}_{i}(\Theta_{i}-\mu_{\Theta})(\Theta_{i}-\mu_{\Theta})^{\top}+\Sigma_{v} (38)
t\displaystyle\mathcal{L}_{t} =[i=02nzWi(c)(ξi𝒵t)(ΘiμΘ)]ΣΘ1\displaystyle=\left[\sum^{2n_{z}}_{i=0}W^{(c)}_{i}(\xi_{i}-\mathcal{Z}^{-}_{t})(\Theta_{i}-\mu_{\Theta})^{\top}\right]\Sigma^{-1}_{\Theta} (39)
Z^t\displaystyle\hat{Z}_{t} =𝒵t+t(ytμΘ)\displaystyle=\mathcal{Z}^{-}_{t}+\mathcal{L}_{t}(y_{t}-\mu_{\Theta}) (40)
Σ^𝒵t\displaystyle\hat{\Sigma}_{\mathcal{Z}_{t}} =Σ𝒵ttΣΘt.\displaystyle=\Sigma^{-}_{\mathcal{Z}_{t}}-\mathcal{L}_{t}\Sigma_{\Theta}\mathcal{L}^{\top}_{t}. (41)
Remark

While the values of the scaling parameters λ,αu,βu,κ\lambda,\alpha_{u},\beta_{u},\kappa that yield third-order approximation accuracy for Gaussian state distributions are well known, there are still no known guidelines to chose the best values for the above parameters when the state distributions are non-Gaussian. With the environmental state distribution being generally non-Gaussian and along with the moments calculation being done using finite number of deterministic samples result in the obtained moments being only approximations of the true moments of the non-Gaussian state distributions. This motivates our use of distributionally robust risk constraints to explicitly account for this non-Gaussianity.

4.2 Nonlinear Model Predictive Control (NMPC) Law

In this subsection, we elaborate about the feedback controller that represents an important module in the autonomy stack shown in Figure 1. Since the robot dynamics is both nonlinear and stochastic, we will be describing an output feedback based nonlinear model predictive control algorithm to steer the robots in the considered setting. Specifically, the steering law for steering the robots is realized through a multiple shooting-based nonlinear model predictive controller [19], [20] that uses the state estimate obtained through an Unscented Kalman filter at each time step. An illustration of the output feedback is shown in Figure 2.

Refer to caption
Figure 2: An output feedback based controller using NMPC and UKF is illustrated here in \mathbb{R}. The sigma points and the distribution of the states at different time steps are depicted with orange dots and shaded blue area respectively. The NMPC-steered trajectory from each sigma point is depicted in dotted black lines. At each time step tt, the distributionally robust budget risk constraint is shown using darkly shaded color whose area is at most αt\alpha_{t}.

The error at a time step tt is

et=CxrZ^txs=x^txs,\displaystyle e_{t}=C_{xr}\hat{Z}_{t}-x_{s}=\hat{x}_{t}-x_{s}, (42)

where xsx_{s} represents a sample in the free space to be steered to. At every time instance tt, the nonlinear model predictive controller repeatedly solves the following optimization problem in a receding horizon fashion with prediction horizon NtN_{t} to find a control input sequence as follows

ut=argmin{uk}k=tt+Nt1\displaystyle u^{\dagger}_{t}=\operatorname*{argmin}_{\left\{u_{k}\right\}^{t+N_{t}-1}_{k=t}}\quad k=tt+Nt1(ekQ2+ukR2)+ek+NtQ2,\displaystyle\sum^{t+N_{t}-1}_{k=t}\left(\left\|e_{k}\right\|^{2}_{Q}+\left\|u_{k}\right\|^{2}_{R}\right)+\left\|e_{k+N_{t}}\right\|^{2}_{Q}, (43)
subject to x^k+1=𝐟(x^k,uk),\displaystyle\hat{x}_{k+1}=\mathbf{f}(\hat{x}_{k},u_{k}),
x^k𝒳,uk𝒰,\displaystyle\hat{x}_{k}\in\mathcal{X},\ u_{k}\in\mathcal{U},

where only the first control input in the optimal sequence utu^{\dagger}_{t} is applied at time tt and the horizon is shifted to t+1t+1 for the problem to be solved again. Here, Q,RQ,R denote the state penalty and control penalty matrices respectively.

5 Sampling-Based Motion Planning Algorithm

The planner module of an autonomy stack as shown in the Figure 1 is responsible for planning a path from a given source to a desired destination adhering to all motion planning specifications. Typically, the planner module solves a motion planning problem as in (21) to design the reference trajectory from source to destination. However, due to the non-convex collision avoidance constraints and presence of uncertainties, it is difficult to exactly solve the motion planning problem subject to the distributionally robust risk constraints. Hence, we resort to a sampling based motion planner, which incrementally constructs a motion plan from the source to the destination for a robot by sampling a point in the obstacle free space and connecting the point to the tree of motion plans if the trajectory connecting the points is collision-free. We propose to use a distributionally robust, kinodynamic variant of the RRT\texttt{RRT}^{\star} motion planning algorithm with dynamic output feedback policies. Our proposed algorithm, called Nonlinear Risk Bounded RRT(NRB-RRT)\texttt{RRT}^{\star}(\texttt{NRB-RRT}^{\star}), grows trees of state and state estimate distributions, rather than merely trees of states, and incorporates distributionally robust risk constraints to build risk-bounded state trajectories and feedback policies. The NRB-RRT\texttt{NRB-RRT}^{\star} tree expansion procedure, inspired by the chance constrained RRT\texttt{RRT}^{\star} algorithm developed in [4], is presented in Algorithm 1. The NRB-RRT\texttt{NRB-RRT}^{\star} tree is denoted by 𝒯\mathcal{T}, consisting of |𝒯||\mathcal{T}| nodes. Each node NN of the tree 𝒯\mathcal{T} consists of a sequence of state distributions, characterized by a distribution mean x^\hat{x} and covariance DD. A sequence of means and covariance matrices is denoted by σ¯\bar{\sigma} and Π¯\bar{\Pi}, respectively. The final mean and covariance of a node’s sequence are denoted by x[N]x[N] and D[N]D[N], respectively. For the state distribution sequence (σ¯,Π¯)(\bar{\sigma},\bar{\Pi}), the notation ΔJ(σ¯,Π¯)\Delta J(\bar{\sigma},\bar{\Pi}) denotes the cost of that sequence. If (σ¯,Π¯)(\bar{\sigma},\bar{\Pi}) denotes the trajectory of node NN with parent NparentN_{parent}, then we denote by J[N]J[N], the entire path cost from the starting state to the terminal state of node NN, constructed recursively as

J[N]=J[Nparent]+ΔJ(σ¯,Π¯).\displaystyle J[N]=J[N_{parent}]+\Delta J(\bar{\sigma},\bar{\Pi}). (44)

5.1 Approximating Optimal Cost-To-Go

In order to efficiently explore the reachable set of the dynamics and increase the likelihood of generating collision-free trajectories, authors in [23] described the limitations of the standard Euclidean distance metric and rather advocated the use of an optimal “cost-to-go” metric, that takes into account a dynamics and control-related quantities to steer the robot. When the dynamics are linear and obstacles are ignored, the optimal “cost-to-go” between any two nodes in the tree can be computed using dynamic programming. However, with nonlinear dynamics and potentially non-holonomic constraints, cost-to-go can be tailored to the specific robot dynamics. Inspired by [24], we present an approximate cost-to-go for some special robot representations and car-like dynamics. The initial pose of a robot, p0p_{0} can be defined using a tuple (x,y,ψ)(x,y,\psi), where (x,y)(x,y) are the positions in global coordinate frame and ψ\psi being its orientation with respect to the positive xx axis. We use an egocentric polar coordinate system to describes the relative location of the target pose pTp_{T} observed from the robot with radial distance rr along the line of sight vector from the robot to the target, orientation of the target ϕ\phi, and orientation of the robot δ\delta, where angles are measured from the line of sight vector as shown in the Figure 3.

Refer to caption
Figure 3: The problem of steering a non-holonomic robot from an initial pose p0p_{0} to a target pose pTp_{T} with different orientation is shown here.
Definition 1.

The non-holonomic directed distance from an initial pose p0p_{0} to a target pose pTp_{T}, with non-holonomic constraints is defined as

𝔇(p0,pT)=r2+kϕ2ϕ2+kδ|δ|.\displaystyle\mathfrak{D}(p_{0},p_{T})=\sqrt{r^{2}+k^{2}_{\phi}\phi^{2}}+k_{\delta}|\delta|. (45)

Here, kϕ>0k_{\phi}>0 is a constant that represent the weight of ϕ\phi with respect to the radial distance rr and kδ>0k_{\delta}>0 is another constant that emphasizes the weight on |δ||\delta| as δ\delta can take both positive and negative values. The non-holonomic distance in (45) is asymmetric meaning that 𝔇(p0,pT)\mathfrak{D}(p_{0},p_{T}) is not equal to 𝔇(pT,p0)\mathfrak{D}(p_{T},p_{0}).

5.2 Outline of NRB-RRT\texttt{NRB-RRT}^{\star} Algorithm

In the first step, using the Sample()(\cdot) function, a random pose xrandx_{rand} is sampled from the free space 𝒳tfree\mathcal{X}^{\texttt{free}}_{t}. Then the tree node, NnearestN_{nearest} that is nearest to the sampled pose is selected using the NearestNode()(\cdot) function (line 3 of Algorithm 1) which uses the “cost-to-go” distance metric defined in (45). Attempts are then made to steer the robot from the nearest tree node to the random sample using the Steer()(\cdot) function that employs the steering law explained in Section 4 (line 4). The control policy obtained is then used to propagate the state mean and covariance, and the entire trajectory (σ¯,Π¯)(\bar{\sigma},\bar{\Pi}) is returned by the Steer()(\cdot) function. Using the DR-Feasible()(\cdot) function, each state distribution in the trajectory is checked for distributionally robust probabilistic constraint satisfaction discussed in the next section. Further, the line connecting subsequent state distributions in the trajectory are also checked for collision with the obstacle sets 𝒪it,i\mathcal{O}_{it},i\in\mathcal{B}. An outline of the DR-Feasible subroutine is shown in Algorithm 2. If the entire trajectory (σ¯,Π¯)(\bar{\sigma},\bar{\Pi}) is probabilistically feasible, a new node NminN_{min} with that distribution sequence (σ¯,Π¯)(\bar{\sigma},\bar{\Pi}) is created (line 7) but not yet added to 𝒯\mathcal{T}. Instead, nearby nodes are identified for possible connections via the NearNodes()(\cdot) function (line 8), which returns a subset of nodes 𝒩near𝒯\mathcal{N}_{near}\subseteq\mathcal{T}, if they are within a search radius ensuring probabilistic asymptotic optimality guarantees specified in [25]. The search radius is given by

r=min{γ(log(|𝒩t|)𝒩t)1d+1,μmax},\displaystyle r=\min\left\{\gamma\left(\frac{\log(|\mathcal{N}_{t}|)}{\mathcal{N}_{t}}\right)^{\frac{1}{d+1}},\mu_{max}\right\}, (46)

where 𝒩t\mathcal{N}_{t} refers to the number of nodes in the tree at time tt, the positive scalar μmax\mu_{max} is the maximum radius specified by the user, and γ\gamma refers to the planning constant based on the dd dimensional environment and is selected using [25, Theorem 1]. Then we seek to identify the lowest-cost, probabilistically feasible connection from the 𝒩near\mathcal{N}_{near} nodes to xrandx_{rand} (lines 10-14). For each possible connection, a distribution sequence is simulated via the steering law (line 11). If the resulting sequence is probabilistically feasible, and the cost of that node represented as crand=J[Nnear]+ΔJ(σ¯,Π¯c_{rand}=J[N_{near}]+\Delta J(\bar{\sigma},\bar{\Pi}, is lower than the cost of NminN_{min} denote by J[Nmin]J[N_{min}], then a new node with this sequence replaces NminN_{min} (line 14). The lowest-cost node is ultimately added to 𝒯\mathcal{T} (line 15). Finally, edges are rewired based on attempted connections from the new node NminN_{min} to nearby nodes 𝒩near\mathcal{N}_{near} (lines 17-22), ancestors excluded (line 17) which are found using Ancestors()(\cdot) function. A distribution sequence is simulated via the steering law from NminN_{min} to the terminal state of each nearby node Nnear𝒩nearN_{near}\in\mathcal{N}_{near} (line 18). If the resulting sequence is probabilistically feasible, and the cost of that node cminc_{min} is lower than the cost of NnearN_{near} given by J[Nnear]J[N_{near}] (line 19), then a new node with this distribution sequence replaces NnearN_{near} (lines 21-22). The tree expansion procedure is then repeated until a node from the goal set is added to the tree. At that point, a distributionally robust feasible trajectory is obtained from the tree root to 𝒳goal\mathcal{X}_{goal}.

Algorithm 1 NRB-RRT\texttt{NRB-RRT}^{\star}- Tree Expansion Procedure
1:Inputs: Current Tree 𝒯, time step t\text{Inputs: Current Tree }\mathcal{T},\text{ time step }t
2:xrandx_{rand}\leftarrow Sample(𝒳tfree)(\mathcal{X}^{\texttt{free}}_{t})
3:NnearestN_{nearest}\leftarrow NearestNode(xrand,𝒯)(x_{rand},\mathcal{T})
4:(σ¯,Π¯)(\bar{\sigma},\bar{\Pi})\leftarrow Steer(x^[Nnearest],D[Nnearest],xrand)(\hat{x}[N_{nearest}],D[N_{nearest}],x_{rand})
5: // Check if sequence (σ¯,Π¯)(\bar{\sigma},\bar{\Pi}) is DR-Feasible
6:if DR-Feasible(σ¯,Π¯)(\bar{\sigma},\bar{\Pi}) then
7:     Create node Nmin{σ¯,Π¯}N_{min}\{\bar{\sigma},\bar{\Pi}\}
8:     𝒩near\mathcal{N}_{near}\leftarrow NearNodes(𝒯,xrand,|𝒯|)(\mathcal{T},x_{rand},\left|\mathcal{T}\right|)
9:      // Connect via a minimum-cost path
10:     for each Nnear𝒩near\NnearestN_{near}\in\mathcal{N}_{near}\backslash N_{nearest} do
11:         (σ¯,Π¯)(\bar{\sigma},\bar{\Pi})\leftarrow Steer(x^[Nnear],D[Nnear],xrand)(\hat{x}[N_{near}],D[N_{near}],x_{rand})
12:         crandc_{rand}\leftarrow J[xnear]+ΔJ(σ¯,Π¯)[x_{near}]+\Delta J(\bar{\sigma},\bar{\Pi})
13:         if DR-Feasible(σ¯,Π¯)(\bar{\sigma},\bar{\Pi}) & crand<J[Nmin]c_{rand}<J[N_{min}] then
14:              Replace NminN_{min} with Nmin{σ¯,Π¯}N_{min}\{\bar{\sigma},\bar{\Pi}\}               
15:     Add NminN_{min} to 𝒯\mathcal{T}
16:      // Re-Wire the Tree
17:     for each Nnear𝒩near\N_{near}\in\mathcal{N}_{near}\backslash Ancestors(Nmin)(N_{min}) do
18:         (σ¯,Π¯)(\bar{\sigma},\bar{\Pi})\leftarrow Steer(x^[Nmin],D[Nmin],x^[Nnear])(\hat{x}[N_{min}],D[N_{min}],\hat{x}[N_{near}])
19:         cminc_{min}\leftarrow J[Nmin]+ΔJ(σ¯,Π¯)[N_{min}]+\Delta J(\bar{\sigma},\bar{\Pi})
20:         if DR-Feasible(σ¯,Π¯)(\bar{\sigma},\bar{\Pi}) & cmin<J[Nnear]c_{min}<J[N_{near}] then
21:              Delete NnearN_{near} from 𝒯\mathcal{T}
22:              Add new node Nnew{σ¯,Π¯}N_{new}\{\bar{\sigma},\bar{\Pi}\} to 𝒯\mathcal{T}               
Algorithm 2 DR-Feasible Subroutine
Input: 𝐓time step distribution sequence(σ¯,Π¯)\text{Input: }\mathbf{T-}\text{time step distribution sequence}(\bar{\sigma},\bar{\Pi})
2:for t=1t=1 to 𝐓\mathbf{T} do
     (x^t,Dx^t)(\hat{x}_{t},D_{\hat{x}_{t}})\leftarrow ttht^{th} element in (σ¯,Π¯)(\bar{\sigma},\bar{\Pi}) sequence
4:     𝕃\mathbb{L}\leftarrow Line connecting position block of x^t1\hat{x}_{t-1} to x^t\hat{x}_{t}
     for each ii\in\mathcal{B} do
6:         if (x^t,Dx^t)(\hat{x}_{t},D_{\hat{x}_{t}}) dissatisfies (51) or 𝕃𝒪it\mathbb{L}\cap\mathcal{O}_{it}\neq\varnothing then
              Return false               
8:Return true

6 Distributionally Robust Collision Check

It is necessary to ensure that the total risk of the trajectory returned by the planner module of the autonomy stack in Figure 1 does not exceed the given total risk budget and thereby agree with the motion planning specifications. In this section, we evaluate the safety of a trajectory returned by the planner module using distributionally robust risk constraints and formally present a risk treatment to ensure the safety of the whole mission plan.

6.1 Moment-Based Ambiguity Set To Model Uncertainty

Unlike most stochastic motion planning algorithms that often assume a functional form (often Gaussian) for probability distributions to model uncertainties, we will focus here on uncertainty modeling using moment-based ambiguity sets. Since the initial environmental state 𝒵0\mathcal{Z}_{0} is not assumed to be Gaussian, then neither are the environmental state distributions 𝒵t\mathbb{P}_{\mathcal{Z}_{t}} at any point of future time tt. The ambiguity set defining the true environmental state is

𝒫𝒵t={𝒵t𝔼[𝒵t]=𝒵^t,𝔼[(𝒵t𝒵^t)(𝒵t𝒵^t)]=Σ𝒵t}.\mathcal{P}^{\mathcal{Z}_{t}}=\left\{\mathbb{P}_{\mathcal{Z}_{t}}\mid\mathbb{E}[\mathcal{Z}_{t}]=\hat{\mathcal{Z}}_{t},\mathbb{E}[(\mathcal{Z}_{t}-\hat{\mathcal{Z}}_{t})(\mathcal{Z}_{t}-\hat{\mathcal{Z}}_{t})^{\top}]=\Sigma_{\mathcal{Z}_{t}}\right\}. (47)

6.2 Risk Treatment for Trajectory Safety

To study the stage risk constraints (21) for the safety of the planned reference trajectory, we follow the steps similar to the one described in [16]. Let 𝔓Safe\mathfrak{P}_{Safe} denote the event that plan 𝔓\mathfrak{P} succeeds and 𝔓Fail\mathfrak{P}_{Fail} as the complementary event (i.e. failure). Given a confidence β(0,0.5]\beta\in(0,0.5], we consider the specification that a plan succeeds with high probability or equivalently that it fails with low probability. That is,

(𝔓Safe)1β(𝔓Fail)β.\displaystyle\mathbb{P}(\mathfrak{P}_{Safe})\geq 1-\beta\iff\mathbb{P}(\mathfrak{P}_{Fail})\leq\beta. (48)

Failure of the total plan requires at least one stage risk constraints to be violated. Then, t[0:T]\forall t\in[0:T],

inf𝒵t𝒫𝒵𝒵t(Cxr𝒵t𝒳tfree)1αt\displaystyle\underset{\mathbb{P}_{\mathcal{Z}_{t}}\in\mathcal{P}^{\mathcal{Z}}}{\inf}\mathbb{P}_{\mathcal{Z}_{t}}(C_{xr}\mathcal{Z}_{t}\in\mathcal{X}^{\texttt{free}}_{t})\geq 1-\alpha_{t}
\displaystyle\Leftrightarrow sup𝒵t𝒫𝒵𝒵t(Cxr𝒵t𝒳tfree)αt.\displaystyle\underset{\mathbb{P}_{\mathcal{Z}_{t}}\in\mathcal{P}^{\mathcal{Z}}}{\sup}\mathbb{P}_{\mathcal{Z}_{t}}(C_{xr}\mathcal{Z}_{t}\notin\mathcal{X}^{\texttt{free}}_{t})\leq\alpha_{t}.

Applying Boole’s law, probability of the success event can be bounded as

(𝔓Safe)=1(𝔓Fail)\displaystyle\mathbb{P}(\mathfrak{P}_{Safe})=1-\mathbb{P}(\mathfrak{P}_{Fail}) =1𝒵t(t=0TCxr𝒵t𝒳tfree)\displaystyle=1-\mathbb{P}_{\mathcal{Z}_{t}}\left({\bigcup_{t=0}^{T}C_{xr}\mathcal{Z}_{t}\not\in\mathcal{X}^{\texttt{free}}_{t}}\right)
1sup𝒵t𝒫𝒵𝒵t(t=0TCxr𝒵t𝒳tfree)\displaystyle\geq 1-\underset{\mathbb{P}_{\mathcal{Z}_{t}}\in\mathcal{P}^{\mathcal{Z}}}{\sup}\mathbb{P}_{\mathcal{Z}_{t}}\left({\bigcup_{t=0}^{T}C_{xr}\mathcal{Z}_{t}\not\in\mathcal{X}^{\texttt{free}}_{t}}\right)
1t=0Tsup𝒵t𝒫𝒵𝒵t(Cxr𝒵t𝒳tfree)\displaystyle\geq 1-\sum_{t=0}^{T}\underset{\mathbb{P}_{\mathcal{Z}_{t}}\in\mathcal{P}^{\mathcal{Z}}}{\sup}\mathbb{P}_{\mathcal{Z}_{t}}\left({C_{xr}\mathcal{Z}_{t}\not\in\mathcal{X}^{\texttt{free}}_{t}}\right)
1t=0Tαt:=β\displaystyle\geq 1-\underbrace{\sum_{t=0}^{T}\alpha_{t}}_{:=\beta}

If the stage risks t[0:T]\forall t\in[0:T] are equal, meaning αt=α\alpha_{t}=\alpha, then β=(T+1)α\beta=(T+1)\alpha. Furthermore, if the stage risk αt=α\alpha_{t}=\alpha is equally distributed over all ntotaln_{total} constraints, then, a risk bound for a single constraint is αntotal\frac{\alpha}{n_{total}}. Subsequently, the corresponding risk bound for an obstacle ii\in\mathcal{B} and the environment 𝒳\mathcal{X} would be αnobintotal\frac{\alpha n_{ob_{i}}}{n_{total}} and αnenvntotal\frac{\alpha n_{env}}{n_{total}} respectively. It is possible to have a spatio-temporal based risk allocation and is currently being pursued as a future work.

6.3 Distributionally Robust Collision Check

The non-convex obstacle avoidance constraints for obstacle ii\in\mathcal{B} can be expressed as the disjunction

¬(AiCxr𝒵tbit)j=1ni(aijCxr𝒵taijcijt),\displaystyle\neg(A_{i}C_{xr}\mathcal{Z}_{t}\leq b_{it})\Leftrightarrow\bigvee^{n_{i}}_{j=1}(a^{\top}_{ij}C_{xr}\mathcal{Z}_{t}\geq a^{\top}_{ij}c_{ijt}), (49)

where \lor denotes disjunction (logical OR operator) and ¬\neg denotes negation (logical NOT operator) and cijt=c^ijt+ri,tc_{ijt}=\hat{c}_{ijt}+r_{i,t} is a point nominally on the jjth constraint of the iith obstacle whose covariance is the same as that of ri,tr_{i,t}. The control law returned by the steering function should also satisfy the state constraints which are expressed as distributionally robust chance constraints. The nominal state constraints, Cxr𝒵t𝒳tfreeC_{xr}\mathcal{Z}_{t}\in\mathcal{X}^{\texttt{free}}_{t}, are required to be satisfied with probability 1α1-\alpha, under the worst case probability distribution in the ambiguity set. Let the mean and covariance of the uncertain obstacle motion be defined using 𝐄[cijt]=c^ijt\mathbf{E}[c_{ijt}]=\hat{c}_{ijt} and 𝐄[(cijtc^ijt)(cijtc^ijt)]=Σjtc\mathbf{E}[(c_{ijt}-\hat{c}_{ijt})(c_{ijt}-\hat{c}_{ijt})^{\top}]=\Sigma^{c}_{jt}. Following [26], under the moment-based ambiguity set defined by (47), a constraint on the worst-case probability of violating the jthj^{th} constraint of obstacle ii\in\mathcal{B}

sup𝒵t𝒫𝒵𝒵t(aijCxr𝒵taijcijt)αi\underset{\mathbb{P}_{\mathcal{Z}_{t}}\in\mathcal{P}^{\mathcal{Z}}}{\sup}\mathbb{P}_{\mathcal{Z}_{t}}(a^{\top}_{ij}C_{xr}\mathcal{Z}_{t}\geq a^{\top}_{ij}c_{ijt})\leq\alpha_{i} (50)

is equivalent to the linear constraint on the state mean 𝒵^t\hat{\mathcal{Z}}_{t}

aijCxr𝒵^taijc^ijt+1αiαi(Dx^t+Σjtc)12aij2,a^{\top}_{ij}C_{xr}\hat{\mathcal{Z}}_{t}\geq a^{\top}_{ij}\hat{c}_{ijt}+\sqrt{\frac{1-\alpha_{i}}{\alpha_{i}}}{\left\|(D_{\hat{x}_{t}}+\Sigma^{c}_{jt})^{\frac{1}{2}}a_{ij}\right\|}_{2}, (51)

where Dx^t=CxrΣ𝒵tCxrD_{\hat{x}_{t}}=C_{xr}\Sigma_{\mathcal{Z}_{t}}C^{\top}_{xr} and αi\alpha_{i} is the user prescribed risk parameter for obstacle ii\in\mathcal{B}. Obstacle risks are allocated such that their sum does not exceed the total constraint risk α\alpha (as described in subsection 6.2). The scaling constant 1αiαi\sqrt{\frac{1-\alpha_{i}}{\alpha_{i}}} in the deterministic tightening of the nominal constraint in (51) is larger than the one obtained with a Gaussian assumption, leading to a stronger tightening that reflects the weaker assumptions about the uncertainty distributions.

7 Simulation Results

In this section, we demonstrate our proposed framework using simulations with both a unicycle and a bicycle robot dynamics. Specifically, we demonstrate the bicycle dynamics in an urban driving simulator [17]. The robot is assumed to be moving in a bounded and cluttered environment. While the proposed framework can handle both dynamic and uncertain obstacles, we assume the obstacles to be static and deterministic (w𝒪t=0,t)(w_{\mathcal{O}_{t}}=0,\forall t) for simplicity, so that all uncertainty comes from the unknown initial state, robot process disturbance, and measurement noise. That is, we assume that gi(x)=0,ig_{i}(x)=0,\forall i\in\mathcal{B}.

7.1 Unicycle Model Based Simulation

7.1.1 Robot Motion Model:

Refer to caption
Figure 4: A unicycle robot operating in 2\mathbb{R}^{2} is shown here.

We consider the problem of navigating a robot with unicycle dynamics from an initial state to a final set of states. The nonlinear robot and the static obstacle dynamics as shown in Figure 4 are given by

[xt+1yt+1θt+1xobs,t+1yobs,t+1]\displaystyle\begin{bmatrix}x_{t+1}\\ y_{t+1}\\ \theta_{t+1}\\ x_{obs,t+1}\\ y_{obs,t+1}\end{bmatrix} =[xtytθtxobs,tyobs,t]𝒵t+Δt[νtcos(θt)νtsin(θt)ωt00]+Δtwt\displaystyle=\underbrace{\begin{bmatrix}x_{t}\\ y_{t}\\ \theta_{t}\\ x_{obs,t}\\ y_{obs,t}\end{bmatrix}}_{\mathcal{Z}_{t}}+\Delta t\begin{bmatrix}\nu_{t}\cos(\theta_{t})\\ \nu_{t}\sin(\theta_{t})\\ \omega_{t}\\ 0\\ 0\end{bmatrix}+\Delta t\,w_{t} (52)

where xt,ytx_{t},y_{t}\in\mathbb{R} are the horizontal and vertical positions of the robot, θt\theta_{t}\in\mathbb{R} is the heading of the robot relative to the xx-axis with vt,ωtv_{t},\omega_{t}\in\mathbb{R} being the linear and angular velocity control inputs, and wtw_{t} denoting the disturbances at time tt. Further, xobs,t,yobs,tx_{obs,t},y_{obs,t}\in\mathbb{R} denote the the horizontal and vertical positions of the obstacles at time tt. The discretization time step was selected to be Δt=0.2\Delta t=0.2sec.

7.1.2 Measurement Model:

We assume that the unicycle robot is equipped with a sensor with measurement noise. Further we consider independent sensor and process noises for simplicity. The measurement model is given by

zt=𝒮(𝒵t)+vt=𝒵t+vt.z_{t}=\mathcal{S}(\mathcal{Z}_{t})+v_{t}=\mathcal{Z}_{t}+v_{t}. (53)

7.1.3 Discussion of Results:

Refer to caption
Figure 5: A risk-bounded motion plan of a unicycle robot operating in 2\mathbb{R}^{2} computed using the NRB-RRT\texttt{NRB-RRT}^{\star} algorithm with stage-risk budget αt=104\alpha_{t}=10^{-4} and the corresponding 1σ1\sigma uncertainty covariance ellipses for its positions are shown here in yellow. The white triangle and the green rectangle respectively denote the start position and the goal region.
Refer to caption
Figure 6: Results of 1000 independent Monte-Carlo trials using zero-mean noises wt,vtw_{t},v_{t} sampled from multivariate Laplacian distribution with covariance matrices Σw=Σv=107In\Sigma_{w}=\Sigma_{v}=10^{-7}I_{n} are shown here. With noises being minimal, success rate is 100%100\%. The white triangle, star and green rectangle denote the starting position, goal position and the goal region respectively.
Refer to caption
Figure 7: Results of 1000 independent Monte-Carlo trials using zero-mean noises wt,vtw_{t},v_{t} sampled from multivariate Laplacian distribution with covariance matrices Σw=Σv=103In\Sigma_{w}=\Sigma_{v}=10^{-3}I_{n} are shown here. With noises being stronger, success rate falls to 44.9%44.9\%. The white triangle, star and green rectangle denote the starting position, goal position and the goal region respectively.
Refer to caption
Figure 8: Results of 1000 independent Monte-Carlo trials using zero-mean noises wt,vtw_{t},v_{t} sampled from multivariate Laplacian distribution with covariance matrices Σw=Σv=102In\Sigma_{w}=\Sigma_{v}=10^{-2}I_{n} are shown here. With noises being strong, success rate falls to 0%0\%. The white triangle, star and green rectangle denote the starting position, goal position and the goal region respectively.

The motion plan using NRB-RRT\texttt{NRB-RRT}^{\star} algorithm was generated on a machine with an Intel Core i7 CPU and 8GB of RAM. The trajectory tracking Monte Carlo simulations were performed on a machine with a Ryzen 7 2700X and 64GB of RAM. The nonlinear MPC problem in (43) is modeled with CasADi Opti and solved with IPOPT solver [27]. We demonstrate NRB-RRT\texttt{NRB-RRT}^{\star} in a obstacle cluttered environment as shown in Figure 5. The environment consisted of a root node (white triangle), a goal area (dashed green rectangle), and a 10×1010\times 10 environment with rectangular obstacles for its sides (black boundary). The robot is assumed to occupy a single point. The input bounds are ±0.5\pm 0.5 units/sec for linear velocity and ±π\pm\pirad/sec for angular velocity. The NRB-RRT\texttt{NRB-RRT}^{\star} steering horizon is N=30N=30 and the NMPC planning horizon is Nll=10N_{ll}=10. The planner control cost matrix was R=diag([1,1])R=\text{diag}([1,1]). The tracking cost matrices were Q=diag([100,100,10])Q=\text{diag}([100,100,10]) and R=diag([1,1])R=\text{diag}([1,1]) for all t=[0:T1]t=[0:T-1], and QT=10QQ_{T}=10Q. We used a high-level plan risk bound of β=0.1\beta=0.1. It is divided equally across the time steps Tmax=1000T_{max}=1000 and among the obstacle constraints. The process noise distribution w\mathbb{P}_{w} and the sensor noise distribution v\mathbb{P}_{v} were taken as a multivariate Laplace distribution with zero mean and covariance Σw=Σv=107In\Sigma_{w}=\Sigma_{v}=10^{-7}I_{n}. We employed NRB-RRT\texttt{NRB-RRT}^{\star} algorithm with just the euclidean distance metric for generating the motion plan. Clearly, the NRB-RRT\texttt{NRB-RRT}^{\star} tree as shown in Figure 5 avoided the unsafe gap to the top-right of the goal region in process of reaching the goal as it was deemed to be too risky. To evaluate the effectiveness of the NRB-RRT\texttt{NRB-RRT}^{\star} algorithm, we performed Monte-Carlo simulation using the reference trajectory obtained after 1000 iterations of NRB-RRT\texttt{NRB-RRT}^{\star} algorithm. Particularly, at each independent trials out of the total 1000 trials, we realized both the process noise wtw_{t} and the sensor noise vtv_{t} from either a multivariate Laplacian distribution or a multivariate Gaussian distribution with the same mean and covariance. The results of the Monte-Carlo trials using multivariate Laplacian noises with covariance matrices (Σw=Σv)(\Sigma_{w}=\Sigma_{v}) for different noise levels (107In,103In,102In)(10^{-7}I_{n},10^{-3}I_{n},10^{-2}I_{n}) are shown in Figures 8, 8, 8 respectively. An analogous results using multivariate Gaussian distribution based noises for same noise levels were also obtained. The Monte-carlo trials subjected to noises from both distributions with different noise levels reveal that as the noise levels were smaller (Σw=Σv=107In)(\Sigma_{w}=\Sigma_{v}=10^{-7}I_{n}), the feedback control layer had good tracking of the reference trajectory provided by the planning layer. However, when the noises got stronger (Σw=Σv=103In)(\Sigma_{w}=\Sigma_{v}=10^{-3}I_{n}), more collisions/failure occurred and finally with even stronger noises, (Σw=Σv=102In)(\Sigma_{w}=\Sigma_{v}=10^{-2}I_{n}), no trial was capable to track the whole trajectory without collision. This demonstrates that as the planner in the autonomy stack plans a reference trajectory with a conservative noise setting, the controller can tolerate noises up to a certain level after which it cannot reach the goal set without collisions. A similar analysis was done in [16] but without a measurement model with sensor noise. In the next subsection, we simulate a car-like robot having bicycle dynamics with a nonlinear measurement model and sensor noise and demonstrate our approach using an urban driving simulator.

Noise Multivariate Laplacian Multivariate Gaussian
# of Avg. Run # of Avg. Run
(×In)(\times I_{n}) collisions Time (s) collisions Time (s)
10710^{-7} 0 4.2117 0 4.2043
10310^{-3} 551 3.5119 495 3.7024
10210^{-2} 1000 0.8964 1000 0.9310
Table 1: Performance metrics from Monte Carlo trials with different covariance matrices Σw=Σv\Sigma_{w}=\Sigma_{v} for noises corresponding to two different distributions are tabulated here. It is evident the multivariate Laplacian being an heavy-tailed distribution results in more collisions than its counterpart multivariate Gaussian.

7.2 Bicycle Model Based Simulation

7.2.1 Robot Motion Model:

Refer to caption
Figure 9: A bicycle model [28] depicting the motion of a car-like vehicle is shown here.

The kinematics of the robot modeled using bicycle dynamics as shown in Figure 9 and the static obstacle dynamics are given by

[xt+1yt+1ψt+1νt+1xobs,t+1yobs,t+1]\displaystyle\begin{bmatrix}x_{t+1}\\ y_{t+1}\\ \psi_{t+1}\\ \nu_{t+1}\\ x_{obs,t+1}\\ y_{obs,t+1}\end{bmatrix} =[xtytψtνtxobs,tyobs,t]𝒵t+Δt[νtcos(ψt)νtsin(ψt)νtLtan(δt)at00]+Δtwt,\displaystyle=\underbrace{\begin{bmatrix}x_{t}\\ y_{t}\\ \psi_{t}\\ \nu_{t}\\ x_{obs,t}\\ y_{obs,t}\end{bmatrix}}_{\mathcal{Z}_{t}}+\Delta t\begin{bmatrix}\nu_{t}\cos(\psi_{t})\\ \nu_{t}\sin(\psi_{t})\\ \frac{\nu_{t}}{L}\tan(\delta_{t})\\ a_{t}\\ 0\\ 0\end{bmatrix}+\Delta t\,w_{t}, (54)

where (xt,yt)(x_{t},y_{t}) denote the position of center of mass of the robot in the simulation environment, and ψt\psi_{t} its orientation at time tt. The pair (xobs,t,yobs,t)(x_{obs,t},y_{obs,t}) denote the position of center of mass of the obstacle at time tt. The discretization time step was selected to be Δt=0.2\Delta t=0.2. The length of the vehicle wheelbase is denoted by LL and L=lf+lrL=l_{f}+l_{r}, where lf,lrl_{f},l_{r} denote the distance from the center of mass to the front and rear wheels respectively. The angle of the current velocity νt\nu_{t} of the center of mass with respect to the longitudinal axis of the car is denoted by βt\beta_{t}. The control inputs (at,δt)(a_{t},\delta_{t}) respectively denote the linear acceleration and the steering angle of the robot and acceleration of the center of mass is assumed to be in the same direction as the velocity νt\nu_{t}. We assume the robot to start from the origin with an orientation (ψ0=270(\psi_{0}=270^{\circ} in Carla)). The pose of the robot is sampled uniformly within the bounds of the feasible environment in 2\mathbb{R}^{2} whose boundaries are not treated probabilistically. We chose to simulate our model using the Audi E-Tron vehicle in the simulator whose parameter values are L=2.9mL=2.9m.

7.2.2 Measurement Model:

We assume that the robot is equipped with a radar sensor which provides a noisy bearing and range to multiple known obstacle locations in the landscape. Thus the measurement model is given by

zt=𝒮(xt,yt,θ,xobs,t,yobs,t)=[(xtxgoal)2+(ytygoal)2tan1(ytygoalxtxgoal)ψtcos(θ)xobs,tsin(θ)yobs,t]+vt,\footnotesize z_{t}=\mathcal{S}(x_{t},y_{t},\theta^{\star},x_{obs,t},y_{obs,t})=\begin{bmatrix}\sqrt{(x_{t}-x_{goal})^{2}+(y_{t}-y_{goal})^{2}}\\ \tan^{-1}\left(\frac{y_{t}-y_{goal}}{x_{t}-x_{goal}}\right)-\psi_{t}\\ \cos(\theta^{\star})x_{obs,t}\\ \sin(\theta^{\star})y_{obs,t}\end{bmatrix}+v_{t}, (55)

where the point (xgoal,ygoal)(x_{goal},y_{goal}) denotes the location of a landmark inside the goal region 𝒳goal\mathcal{X}_{goal} and θ=0.01\theta^{\star}=0.01 radian denotes a known distortion of a sensor that estimates the position of obstacles. The initial state is assumed to be of zero mean with covariance matrix Σ^𝒵0=0.001Inz\hat{\Sigma}_{\mathcal{Z}_{0}}=0.001I_{n_{z}}. The sensor noise and process noise covariance matrices were Σv=0.001Ip\Sigma_{v}=0.001I_{p} and Σw=diag(0.001,0.001,0.001,0.001,0,0)\Sigma_{w}=\text{diag}(0.001,0.001,0.001,0.001,0,0). For simplicity, we assumed the cross-correlation matrix to be zero (M=0)(M=0). While executing the algorithm 1, the equation (45) is used to approximate the cost-to-go quantity with kϕ=1.2k_{\phi}=1.2 and kδ=3k_{\delta}=3 respectively in the NearestNode()(\cdot) and NearNodes()(\cdot) modules and for the rest, the cost-to-go is approximated using the steering law that employs nonlinear model predictive control. The search radius used in the Algorithm 1 is obtained with γ=30\gamma=30. For simplicity, the environmental boundaries are not treated probabilistically. Since the motion model given by (54) and the measurement model given by (55) are nonlinear, this particular set up demands using Unscented Kalman Filter for state estimation and a nonlinear model predictive control for steering. To generate the sigma points and weigh them accordingly, we use αu=1,β=2,κ=3n\alpha_{u}=1,\beta=2,\kappa=3-n. Further, we define the error et=Cxr𝒵^txse_{t}=C_{xr}\hat{\mathcal{Z}}_{t}-x_{s} for i=0,,Ti=0,\dots,T, with T=1000T=1000. A dynamic output feedback policy utu_{t} that minimizes the cost function

J\displaystyle J =k=tt+N1(ekQek+ukRuk)+ek+NQek+N,\displaystyle=\sum^{t+N-1}_{k=t}\left(e^{\top}_{k}Qe_{k}+u^{\top}_{k}Ru_{k}\right)+e^{\top}_{k+N}Qe_{k+N}, (56)

is computed using nonlinear model predictive control to steer the robot from a tree node state xtx_{t} to a random feasible sample pose xsx_{s} with prediction horizon N=50N=50. The control inputs are constrained as |a|3ms2|a|\leq 3ms^{-2} and |δ|70|\delta|\leq 70^{\circ}. The nonlinear model predictive control law is obtained through the CasADi toolbox [27] that employs the IPOPT large-scale nonlinear optimization solver. The state and control penalty matrices namely Q=100diag([2,2,1,5,1,1]),R=0.01ImQ=100\texttt{diag}([2,2,1,5,1,1]),R=0.01I_{m} are used to penalize the state and control deviations respectively. The distributionally robust state constraints are enforced over a mission horizon length T=50T=50 and with probabilistic satisfaction parameter α=0.05\alpha=0.05. For all the simulations, the NRB-RRT\texttt{NRB-RRT}^{\star} algorithm with the distributionally robust collision check or Gaussian collision check procedure is run for 200 iterations, with 2-σ\sigma position uncertainty ellipses from the covariance matrix being drawn at the end of each trajectory.

7.2.3 Results and Discussion:

Refer to caption
Figure 10: Our proposed approach is demonstrated using an urban driving simulator [17]. An NRB-RRT\texttt{NRB-RRT}^{\star} tree (in orange color) as a result of 200 iterations and the resulting reference path (in yellow color) from the start to the goal region are shown here. The NRB-RRT\texttt{NRB-RRT}^{\star} algorithm employed distributionally robust chance constraints with α=0.05\alpha=0.05.

The NRB-RRT\texttt{NRB-RRT}^{\star} tree expansion procedure was run until at most 2 nodes from the goal region were added to the tree successfully and the resulting reference trajectory from the starting pose to the goal is shown in Figure 10. The corresponding 2D projection is shown in the left side of Figure 11 and distributionally robust risk constraints generated more conservative trajectories around the obstacles, by explicitly incorporating the uncertainty in the state due to the initial localization, system dynamics, and measurement uncertainties in the form of ambiguity sets. It produces trajectories that satisfy the chance constraints under the worst-case distribution in the ambiguity sets. Clearly, the feasible set is smaller with the distributionally robust constraints. On the other hand, a similar reference trajectory was generated using Gaussian chance constraints assuming system uncertainties are Gaussian. The reference trajectory generated using distributionally robust risk constraints (with UKF and EKF), and Gaussian chance constraints are shown in the center of Figure 11. The distributionally robust trajectories with a more sophisticated and coherent quantification of risk, are generated with the same computational complexity as with Gaussian chance constraints and exhibit conservatism to account for state distribution ambiguity. Though the result with Extended Kalman Filter based state estimation also produces collision free trajectories, the result inherently suffers from the (first order) approximations due to linearization. To corroborate the results generated by the high-level motion planner, a Monte-Carlo simulation of 100 independent trials involving noises sampled from multivariate Gaussian distribution was conducted and the results are shown in the right side of Figure 11. It is evident that given the conservative noise assumption and risk quantification performed using the distributionally robust approach, the realized sampled paths from the source to destination had little deviations around the reference trajectory. This further demonstrates our proposed approach that with a high-level motion plan that accounts for uncertainty, the low level online tracking controller has more room to maneuver and respond to noise realizations. The video of the simulation is available at https://youtu.be/KpyWXRZ-wSI and the simulation code is available at https://github.com/TSummersLab/Risk_Bounded_Nonlinear_Robot_Motion_Planning.

Refer to caption
Figure 11: A 2-D projection of the urban driving environment containing the robot starting pose, goal region (blue shaded rectangle), the obstacle (black shaded rectangle), constructed motion plan with distributionally robust chance constraints (in blue color) after executing 200 iterations of NRB-RRT\texttt{NRB-RRT}^{\star} algorithm with the two-standard deviation covariance ellipses and the generated reference path from the source to the goal are shown on the plot to the left. A comparison of NRB-RRT\texttt{NRB-RRT}^{\star} algorithm with distributionally robust (in blue color(UKF) and red color(EKF)) & Gaussian chance constraints (in green color(UKF)) are shown on the middle. Clearly, the added conservatism greater than that of the chance constrained counterpart that comes due to the distributional robustness would account for any arbitrary unknown state distribution satisfying the given moments. The results of 100 independent Monte Carlo trials of trajectory tracking with noises drawn from multivariate Gaussian distributions are shown on the right.

8 Conclusion

In this paper, we presented a methodological framework aimed towards tighter integration of perception and planning in nonlinear robotic systems. Uncertainties in perception and motion prediction are explicitly accounted for through distributionally robust risk constraints. Using a dynamic output feedback based control policy realized using a nonlinear MPC and an Unscented Kalman filter, a new algorithm called NRB-RRT\texttt{NRB-RRT}^{\star} is shown to produce risk bounded trajectories with systematic risk assessment. Potential future research involves using deep learning based perception system with semantic SLAM combined with our approach. We will also explore optimal spatio-temporal risk allocation for obstacles and environmental constraints which can result less conservative motion plans for robots operating in cluttered environments. Another promising future research direction involves studying propagation of state distributions of nonlinear systems with higher order moments and using measure concentration techniques to estimate the risk.

References

  • [1] L. Blackmore, H. Li, B. Williams, A probabilistic approach to optimal robust path planning with obstacles, in: 2006 American Control Conference, IEEE, 2006, pp. 7–pp.
  • [2] A.-A. Agha-Mohammadi, S. Chakravorty, N. M. Amato, Firm: Sampling-based feedback motion-planning under motion uncertainty and imperfect measurements, The International Journal of Robotics Research 33 (2) (2014) 268–304.
  • [3] B. Luders, M. Kothari, J. How, Chance constrained rrt for probabilistic robustness to environmental uncertainty, in: AIAA guidance, navigation, and control conference, 2010, p. 8160.
  • [4] B. D. Luders, S. Karaman, J. P. How, Robust sampling-based motion planning with asymptotic optimality guarantees, in: AIAA Guidance, Navigation, and Control (GNC) Conference, 2013, p. 5097.
  • [5] L. Blackmore, M. Ono, B. C. Williams, Chance-constrained optimal path planning with obstacles, IEEE Transactions on Robotics 27 (6) (2011) 1080–1094.
  • [6] W. Liu, M. H. Ang, Incremental sampling-based algorithm for risk-aware planning under motion uncertainty, in: 2014 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2014, pp. 2051–2058.
  • [7] H. Zhu, J. Alonso-Mora, Chance-constrained collision avoidance for mavs in dynamic environments, IEEE Robotics and Automation Letters 4 (2) (2019) 776–783.
  • [8] R. T. Rockafellar, Coherent approaches to risk in optimization under uncertainty, in: OR Tools and Applications: Glimpses of Future Technologies, Informs, 2007, pp. 38–61.
  • [9] A. Majumdar, M. Pavone, How should a robot assess risk? towards an axiomatic theory of risk in robotics, in: Robotics Research, Springer, 2020, pp. 75–84.
  • [10] J. Goh, M. Sim, Distributionally robust optimization and its tractable approximations, Operations research 58 (4-part-1) (2010) 902–917.
  • [11] P. Florence, J. Carter, R. Tedrake, Integrated perception and control at high speed: Evaluating collision avoidance maneuvers without maps, in: Workshop on the Algorithmic Foundations of Robotics (WAFR), 2016.
  • [12] T. Summers, Distributionally robust sampling-based motion planning under uncertainty, in: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, 2018, pp. 6518–6523.
  • [13] G. Costante, C. Forster, J. Delmerico, P. Valigi, D. Scaramuzza, Perception-aware path planning, arXiv preprint arXiv:1605.04151 (2016).
  • [14] V. Renganathan, I. Shames, T. H. Summers, Towards integrated perception and motion planning with distributionally robust risk constraints, IFAC-PapersOnLine 53 (2) (2020) 15530–15536, 21th IFAC World Congress. doi:https://doi.org/10.1016/j.ifacol.2020.12.2396.
  • [15] A. Hakobyan, I. Yang, Wasserstein distributionally robust motion control for collision avoidance using conditional value-at-risk, arXiv preprint arXiv:2001.04727 (2020).
  • [16] S. Safaoui, B. J. Gravell, V. Renganathan, T. H. Summers, Risk-averse rrt* planning with nonlinear steering and tracking controllers for nonlinear robotic systems under uncertainty, arXiv preprint arXiv:2103.05572 (2021).
  • [17] A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, V. Koltun, Carla: An open urban driving simulator, in: Conference on robot learning, PMLR, 2017, pp. 1–16.
  • [18] N. Sünderhauf, T. T. Pham, Y. Latif, M. Milford, I. Reid, Meaningful maps with object-oriented semantic mapping, in: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, 2017, pp. 5079–5085.
  • [19] D. Gu, H. Hu, A stabilizing receding horizon regulator for nonholonomic mobile robots, IEEE Transactions on Robotics 21 (5) (2005) 1022–1028.
  • [20] R. Findeisen, L. Imsland, F. Allgower, B. A. Foss, State and output feedback nonlinear model predictive control: An overview, European Journal of Control 9 (2) (2003) 190 – 206. doi:https://doi.org/10.3166/ejc.9.190-206.
  • [21] G. Chang, Marginal unscented kalman filter for cross-correlated process and observation noise at the same epoch, IET Radar, Sonar & Navigation 8 (1) (2014) 54–64.
  • [22] E. A. Wan, R. Van Der Merwe, The Unscented Kalman Filter for nonlinear estimation, in: Proceedings of the IEEE 2000 Adaptive Systems for Signal Processing, Communications, and Control Symposium (Cat. No. 00EX373), IEEE, 2000, pp. 153–158.
  • [23] E. Frazzoli, M. A. Dahleh, E. Feron, Real-time motion planning for agile autonomous vehicles, Journal of guidance, control & dynamics 25 (1) (2002) 116–129.
  • [24] J. J. Park, B. Kuipers, Feedback motion planning via non-holonomic rrt* for mobile robots, in: 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, 2015, pp. 4035–4040.
  • [25] K. Solovey, L. Janson, E. Schmerling, E. Frazzoli, M. Pavone, Revisiting the asymptotic optimality of RRT*, in: 2020 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2020, pp. 2189–2195.
  • [26] G. C. Calafiore, L. El Ghaoui, On distributionally robust chance-constrained linear programs, Jour. of Optimization Theory and Applications 130 (1) (Dec. 2006).
  • [27] J. A. E. Andersson, J. Gillis, G. Horn, J. B. Rawlings, M. Diehl, CasADi – A software framework for nonlinear optimization and optimal control, Mathematical Programming Computation 11 (1) (2019) 1–36. doi:10.1007/s12532-018-0139-4.
  • [28] J. Kong, M. Pfeiffer, G. Schildbach, F. Borrelli, Kinematic and dynamic vehicle models for autonomous driving control design, in: 2015 IEEE Intelligent Vehicles Symposium (IV), IEEE, 2015, pp. 1094–1099.