
Belief Space Planning: A Covariance
Steering Approach

Dongliang Zheng1, Jack Ridderhof1, Panagiotis Tsiotras1, and Ali-akbar Agha-mohammadi2
1Georgia Institute of Technology, Atlanta, Georgia 30332–0150, USA
{dzheng, jridderhof3, tsiotras}@gatech.edu
2NASA-JPL, California Institute of Technology, Pasadena, CA 91109 USA
[email protected]
Abstract

A new belief space planning algorithm, called covariance steering Belief RoadMap (CS-BRM), is introduced, which is a multi-query algorithm for motion planning of dynamical systems under simultaneous motion and observation uncertainties. CS-BRM extends the probabilistic roadmap (PRM) approach to belief spaces and is based on the recently developed theory of covariance steering (CS) that enables guaranteed satisfaction of terminal belief constraints in finite-time. The nodes in the CS-BRM are sampled in belief space and represent distributions of the system states. A covariance steering controller steers the system from one BRM node to another, thus acting as an edge controller of the corresponding belief graph that ensures belief constraint satisfaction. After the edge controller is computed, a specific edge cost is assigned to that edge. The CS-BRM algorithm allows the sampling of non-stationary belief nodes, and thus is able to explore the velocity space and find efficient motion plans. The performance of CS-BRM is evaluated and compared to a previous belief space planning method, demonstrating the benefits of the proposed approach.

I Introduction

Motion uncertainty and measurement noise arise in all real-world robotic applications. When evaluating the safety of a robot under motion and estimation uncertainties, it is no longer sufficient to rely only on deterministic indicators of performance, such as whether or not the robot is in collision. Instead, the state of the robot is best characterized by a probability distribution function (pdf) over all possible states, which is commonly referred to as the belief or information state [1, 2, 3]. Explicitly taking into account the motion and observation uncertainties thus requires planning in the belief space, which allows one to compute the collision probability and thus make more informed decisions. Planning under motion and observation uncertainties is referred to as belief space planning, which can be formulated as a partially observable Markov decision process (POMDP) problem [4]. Solving POMDPs in general domains, however, is very challenging, despite some recent progress in terms of more efficient POMDP solvers [5, 6, 7, 8, 9]. Planning in infinite-dimensional distributional (e.g., belief) spaces can become more tractable by the use of roadmaps, that is, graphs constructed by sampling. Since their introduction [10, 11], such belief roadmaps (BRMs) have increased in popularity owing to their simplicity and their ability to avoid local minima.

Sampling-based motion planning algorithms such as probabilistic roadmaps (PRM) [12] and rapidly exploring random trees (RRTs) [13] can be used to solve planning problems in high-dimensional continuous state spaces, by building a roadmap or a tree incrementally through sampling the search space. However, traditional PRM-based methods only address deterministic systems. PRM methods have been extended to belief space planning using belief space roadmaps (BRMs) [11, 14, 15]. One of the main challenges of belief space roadmap (BRM) methods is that the costs of different edges depend on each other, resulting in the “curse of history” problem for POMDPs [16, 14]. This dependence between edges breaks the optimal substructure property, which is required for search algorithms such as Dijkstra’s algorithm or A*. This problem arises from the unreachability of the belief nodes; even if the robot has full control of its mean, it is difficult to reach higher order moments (e.g., a specified covariance). Since the nodes in BRM are sampled in the belief space, the edges in BRM should ideally steer the robot from one distribution to another. If reachability of BRM nodes is not achieved, an edge in the BRM depends on all preceding edges along the path.

In general, in the literature, the belief may be classified into (i) the estimation belief (e-belief), describing the output of the estimator, i.e., the pdf over the error between the estimated value and true value of the state; (ii) control belief (c-belief), which refers to the pdf over “separated control” error, i.e., the error between the estimated state and the desired state; and (iii) the full belief (f-belief), that is, the belief over the true state. We discuss these different beliefs in greater detail in Section III. We just note here that e-belief planning tries to obtain better state estimates and finds a path with minimum estimation uncertainty. The f-belief planning aims at minimizing the full uncertainty that takes into account the impact of the controller as well. In this paper, we consider the f-belief planning problem and will call it belief planning for simplicity.

Planning in e-belief space was studied in [11, 17], where the goal was to find the minimum estimation uncertainty path for a robot from a starting position to a goal position. In [15], a linear-quadratic Gaussian (LQG) controller was used for motion planning within the BRM context. By taking into account the controller and the sensors used, [15] computes the true a priori probability distribution of the state of the robot. Thus, [15] studies the f-belief planning problem. However, the independence between edges is not satisfied. Similarly, [18] studies the f-belief planning problem using a tree.

The state-of-the-art in terms of BRM methods is probably [14], which tackles the “curse of history” problem. The proposed SLQG-FIRM method achieves node reachability using a stationary LQG controller. One limitation of this method is that the nodes have to be stationary. That is, the nodes in the BRM graph need to be sampled in the equilibrium space of the robot, which usually means zero velocity. Thus, this method cannot explore the velocity space and the resulting paths are suboptimal. Secondly, a converging process is required at every node. That is, the robot will have to “wait” at each node, which will increase the time required for the robot to reach the goal. Some remedies are introduced in [19] for systems (e.g., a fixed-wing aircraft) that cannot reach zero velocity by using periodic trajectories and periodic controllers. The periodic controller is applied repeatedly until the trajectory of the vehicle converges to the periodic trajectory. Thus, a “waiting” procedure is still required in these approaches. Online replanning in belief space is studied in [20]. The method in [20] improves the online phase by recomputing the local plans, which includes adding a virtual belief node and local belief edges to the FIRM graph, and solving a dynamic program at every time-step. The offline roadmap construction phase is the same as FIRM [14].

Recent developments in explicitly controlling the covariance of a linear system [21, 22] provide an appealing approach to construct the BRM with guarantees of node reachability. In particular, for a discrete-time linear stochastic system, covariance steering theory designs a controller that steers the system from an initial Gaussian distribution to a terminal Gaussian distribution in finite-time [23, 24, 25]. Reference [21] formulated the finite-horizon covariance control problem as a stochastic optimal control problem. In [23], the covariance steering problem is formulated as a convex program. Additional state chance constraints are considered in [25] and nonlinear systems are considered in [26] using iterative linearization. The covariance steering problem with output feedback has also been studied in [27, 28, 29].

In this paper, we propose the CS-BRM algorithm, which uses covariance steering as the edge controller of a BRM to ensure a priori node reachability. Since the goal of covariance steering is to reach a given distribution of the state, it is well-suited for reaching a belief node. In addition, covariance steering avoids the limitation of sampling in the equilibrium space, and thus the proposed CS-BRM method allows sampling of non-stationary belief nodes. Our method allows searching in the velocity space and thus finds paths with lower cost, as demonstrated in the numerical examples in Section VI.

Extending belief planning to nonlinear systems requires a nominal trajectory for each edge [14]. In the SLQG-FIRM framework, these nominal trajectories are either assumed to be given or are approximated by simple straight lines [15, 14]. However, the nominal trajectory has to be dynamically feasible in order to apply a time-varying LQG controller. Also, the nominal trajectory must be optimal if one wishes to generate optimal motion plans. Finding an optimal nominal trajectory requires solving a two-point boundary value problem (TPBVP), which for general nonlinear systems can be computationally expensive [30]. In this paper, we develop a simple, yet efficient, algorithm to find suitable nominal trajectories between the nodes of the BRM graph to steer the mean states in an optimal fashion.

The contributions of the paper are summarized as follows.

  • A new belief space planning method, called CS-BRM, is developed to construct a roadmap in belief space using the recently developed theory of finite-time covariance control. CS-BRM achieves finite-time belief node reachability using covariance steering and overcomes the limitation of sampling only stationary nodes.

  • The concept of a compatible nominal trajectory is introduced, which aims to improve the performance of linearization-based methods for controlling nonlinear systems. An efficient algorithm, called the CNT algorithm, is proposed to compute feasible nominal trajectories for nonlinear systems.

The paper is organized as follows. The statement of the problem is given in Section II. In Section III, the output-feedback covariance steering theory for linear systems is outlined. In Section IV, the method for computing nominal trajectories for nonlinear systems is introduced. The main algorithm, CS-BRM, is given in Section V. The numerical implementation of the proposed algorithm is presented in Section VI. Finally, Section VII concludes the paper.

II Problem Statement

We consider the problem of planning for a nonholonomic robot in an uncertain environment which contains obstacles. The uncertainty in the problem stems from model uncertainty, as well as from sensor noise that corrupts the measurements. We model such a system by a stochastic difference equation of the form

x_{k+1}=f(x_{k},u_{k},w_{k}), (1)

where k=0,1,\ldots,N-1 are the discrete time steps, x_{k}\in\mathbb{R}^{n_{x}} is the state, and u_{k}\in\mathbb{R}^{n_{u}} is the control input. The steps of the noise process w_{k}\in\mathbb{R}^{n_{w}} are i.i.d. standard Gaussian random vectors. The measurements are given by the noisy and partial sensing model

y_{k}=h(x_{k},v_{k}), (2)

where y_{k}\in\mathbb{R}^{n_{y}} is the measurement at time step k, and the steps of the process v_{k}\in\mathbb{R}^{n_{y}} are i.i.d. standard Gaussian random vectors. We assume that (w_{k})_{k=0}^{N-1} and (v_{k})_{k=0}^{N-1} are independent.

The objective is to steer the system (1) from some initial state x_{0} to some final state x_{N} within N time steps while avoiding obstacles and, at the same time, minimizing a given performance index. The controller u_{k} at time step k is allowed to depend on the whole history of measurements y_{\ell}, \ell=0,\ldots,k, up to time k, but not on any future measurements.

This is a difficult problem to solve in its full generality. Here, we use graph-based methods to build a roadmap in the space of distributions of the state (i.e., the belief space) for multi-query motion planning. Planning in the belief space accounts for the uncertainty inherent in the system (1)-(2) owing to the noises w_{k} and v_{k}, as well as for the uncertainty owing to the distribution of the initial state. Each node sampled in the belief space is a distribution over the state. The edges in a BRM steer the state from one distribution to another. In the next section, we describe a methodology to design BRM edge controllers that steer the system from one node (i.e., distribution) to another. Similar to previous works [11, 14, 15], we consider Gaussian distributions, so that the belief is given by the state mean and the state covariance.

III Covariance Steering

In this section, we give an introduction to covariance steering and outline some key results. Some of these results are explicitly used in subsequent sections. The derivation in this section closely follows [27, 25].

Given a nominal trajectory (n_{k})_{k=0}^{N-1}, where n_{k}=(x_{k}^{r},u_{k}^{r}), we can construct a linear approximation of (1)-(2) around (n_{k})_{k=0}^{N-1} as follows

x_{k+1} = A_{k}x_{k}+B_{k}u_{k}+h_{k}+G_{k}w_{k}, (3)
y_{k} = C_{k}x_{k}+D_{k}v_{k}, (4)

where h_{k}\in\mathbb{R}^{n_{x}} is the drift term, A_{k}\in\mathbb{R}^{n_{x}\times n_{x}}, B_{k}\in\mathbb{R}^{n_{x}\times n_{u}}, and G_{k}\in\mathbb{R}^{n_{x}\times n_{x}} are system matrices, and C_{k}\in\mathbb{R}^{n_{y}\times n_{x}} and D_{k}\in\mathbb{R}^{n_{y}\times n_{y}} are observation model matrices.
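In practice, the matrices in (3)-(4) can be obtained by differentiating f and h along the nominal trajectory. The sketch below is a minimal numerical-Jacobian version in Python; the helper names and the finite-difference scheme are illustrative choices, not part of the paper.

```python
import numpy as np

def jacobian(fun, z0, eps=1e-6):
    """Forward-difference Jacobian of fun at z0 (illustrative helper)."""
    f0 = np.asarray(fun(z0))
    J = np.zeros((f0.size, z0.size))
    for i in range(z0.size):
        dz = np.zeros_like(z0)
        dz[i] = eps
        J[:, i] = (np.asarray(fun(z0 + dz)) - f0) / eps
    return J

def linearize(f, h, x_r, u_r, n_w, n_v, eps=1e-6):
    """Linearize x_{k+1}=f(x,u,w), y=h(x,v) about (x_r, u_r) with w = v = 0, as in (3)-(4)."""
    w0, v0 = np.zeros(n_w), np.zeros(n_v)
    A = jacobian(lambda x: f(x, u_r, w0), x_r, eps)
    B = jacobian(lambda u: f(x_r, u, w0), u_r, eps)
    G = jacobian(lambda w: f(x_r, u_r, w), w0, eps)
    C = jacobian(lambda x: h(x, v0), x_r, eps)
    D = jacobian(lambda v: h(x_r, v), v0, eps)
    # drift term h_k chosen so the affine model matches f exactly at the linearization point
    h_k = f(x_r, u_r, w0) - A @ x_r - B @ u_r
    return A, B, G, C, D, h_k
```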

We define the covariance steering problem as follows.

Problem 1: Find the control sequence u=(u_{k})_{k=0}^{N-1} such that the system given by (3) and (4), starting from the initial state distribution x_{0}\sim\mathcal{N}(\bar{x}_{0},P_{0}), reaches the final distribution x_{N}\sim\mathcal{N}(\bar{x}_{N},P_{N}), where P_{N}\preceq P_{f}, while minimizing the cost functional

J(u)=\mathbb{E}\left[\sum_{k=0}^{N-1}(x_{k}-m_{k})^{\top}Q_{k}(x_{k}-m_{k})+u_{k}^{\top}R_{k}u_{k}\right], (5)

where (m_{k})_{k=0}^{N-1} is a given reference trajectory of the states, \bar{x}_{k}=\mathbb{E}(x_{k}) is the mean of the state x_{k}, P_{0} and P_{N} are the covariance matrices of x_{0} and x_{N}, respectively, the terminal covariance P_{N} is upper bounded by the given covariance matrix P_{f}, and the matrices Q_{k}\succeq 0 and R_{k}\succ 0 are given.

III-A Separation of Observation and Control

Similarly to [27], we assume that the control input u_{k} at time step k is an affine function of the measurement data. It follows that the state will be Gaussian distributed over the entire horizon of the problem. We assume the system (3)-(4) is observable. To solve Problem 1, we use a Kalman filter to estimate the state. Specifically, let the prior initial state estimate be \hat{x}_{0^{-}}, distributed according to \hat{x}_{0^{-}}\sim\mathcal{N}(\bar{x}_{0},\hat{P}_{0^{-}}), and let the prior initial estimation error be \tilde{x}_{0^{-}}=x_{0}-\hat{x}_{0^{-}}, with distribution \tilde{x}_{0^{-}}\sim\mathcal{N}(0,\tilde{P}_{0^{-}}). The estimated state at time k, denoted by \hat{x}_{k}=\mathbb{E}[x_{k}|Y_{k}], with Y_{k} denoting the filtration generated by \{\hat{x}_{0^{-}},y_{i}:0\leq i\leq k\}, is computed from [31]

\hat{x}_{k}=\hat{x}_{k^{-}}+L_{k}(y_{k}-C_{k}\hat{x}_{k^{-}}),\qquad \hat{x}_{k^{-}}=A_{k-1}\hat{x}_{k-1}+B_{k-1}u_{k-1}+h_{k-1}, (6)

where

L_{k}=\tilde{P}_{k^{-}}C_{k}^{\top}(C_{k}\tilde{P}_{k^{-}}C_{k}^{\top}+D_{k}D_{k}^{\top})^{-1},\qquad \tilde{P}_{k}=(I-L_{k}C_{k})\tilde{P}_{k^{-}},\qquad \tilde{P}_{k^{-}}=A_{k-1}\tilde{P}_{k-1}A_{k-1}^{\top}+G_{k-1}G_{k-1}^{\top}, (7)

where L_{k} is the Kalman gain, and the covariances of x_{k}, \hat{x}_{k}, and \tilde{x}_{k} are denoted by P_{k}=\mathbb{E}[(x_{k}-\bar{x}_{k})(x_{k}-\bar{x}_{k})^{\top}], \hat{P}_{k}=\mathbb{E}[(\hat{x}_{k}-\bar{x}_{k})(\hat{x}_{k}-\bar{x}_{k})^{\top}], and \tilde{P}_{k}=\mathbb{E}[(x_{k}-\hat{x}_{k})(x_{k}-\hat{x}_{k})^{\top}], respectively.
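For reference, the recursions (6)-(7) translate directly into a few lines of NumPy; the code below is a plain transcription (variable names mirror the paper's notation and are otherwise arbitrary).

```python
import numpy as np

def kalman_step(x_hat_prev, P_tilde_prev, u_prev, y, A, B, h, G, C, D):
    """One time and measurement update of the Kalman filter in (6)-(7)."""
    # time update (prior estimate and prior error covariance)
    x_hat_minus = A @ x_hat_prev + B @ u_prev + h
    P_tilde_minus = A @ P_tilde_prev @ A.T + G @ G.T
    # measurement update with Kalman gain L_k
    S = C @ P_tilde_minus @ C.T + D @ D.T
    L = P_tilde_minus @ C.T @ np.linalg.inv(S)
    x_hat = x_hat_minus + L @ (y - C @ x_hat_minus)
    P_tilde = (np.eye(len(x_hat)) - L @ C) @ P_tilde_minus
    return x_hat, P_tilde, P_tilde_minus, L
```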

It can be shown that the cost functional (5) can be written as

J(u)=\mathbb{E}\left[\sum_{k=0}^{N-1}\hat{x}_{k}^{\top}Q_{k}\hat{x}_{k}+u_{k}^{\top}R_{k}u_{k}\right]-2\sum_{k=0}^{N-1}\bar{x}_{k}^{\top}Q_{k}m_{k}+\sum_{k=0}^{N-1}\left(\mathrm{trace}(\tilde{P}_{k}Q_{k})+m_{k}^{\top}Q_{k}m_{k}\right), (8)

where the last summation is deterministic and does not depend on the control and thus it can be discarded.

By defining the innovation process (\xi_{k})_{k=0}^{N} as

\xi_{k}=y_{k}-\mathbb{E}[y_{k}|Y_{k-1}], (9)

and noting that \mathbb{E}[y_{k}|Y_{k-1}]=\mathbb{E}[C_{k}x_{k}+D_{k}v_{k}|Y_{k-1}]=C_{k}\hat{x}_{k^{-}}, the estimated state dynamics in equation (6) can be rewritten as

\hat{x}_{k+1}=A_{k}\hat{x}_{k}+B_{k}u_{k}+h_{k}+L_{k+1}\xi_{k+1}, (10)

with \hat{x}_{0}=\hat{x}_{0^{-}}+L_{0}\xi_{0}. We can then restate Problem 1 as follows.

Problem 2: Find the control sequence (u_{k})_{k=0}^{N-1} such that the system (10), starting from the initial distribution \hat{x}_{0^{-}}\sim\mathcal{N}(\bar{x}_{0},P_{0}-\tilde{P}_{0^{-}}), reaches the final distribution \hat{x}_{N}\sim\mathcal{N}(\bar{x}_{N},P_{N}-\tilde{P}_{N}), where P_{N}\preceq P_{f}, while minimizing the cost functional

\hat{J}(u)=\mathbb{E}\left[\sum_{k=0}^{N-1}\hat{x}_{k}^{\top}Q_{k}\hat{x}_{k}+u_{k}^{\top}R_{k}u_{k}\right]-2\sum_{k=0}^{N-1}\bar{x}_{k}^{\top}Q_{k}m_{k}. (11)

To summarize, the covariance steering problem of the state xkx_{k} with output feedback has been transformed to a covariance steering problem of the estimated state x^k\hat{x}_{k}, where x^k\hat{x}_{k} is computed using the Kalman filter. This transformation relies on the separation principle of estimation and control. The solution to this problem is given in the next section.

III-B Separation of Mean Control and Covariance Control

In this section, we will separate Problem 2 into a mean control problem and a covariance control problem.

Define the augmented vectors U_{k}=[u_{0}^{\top}\ u_{1}^{\top}\ \cdots\ u_{k}^{\top}]^{\top}, \Xi_{k}=[\xi_{0}^{\top}\ \xi_{1}^{\top}\ \cdots\ \xi_{k}^{\top}]^{\top}, and \hat{X}_{k}=[\hat{x}_{0}^{\top}\ \hat{x}_{1}^{\top}\ \cdots\ \hat{x}_{k}^{\top}]^{\top}. We can then compute \hat{X}_{k} as follows

\hat{X}=A\hat{x}_{0^{-}}+BU+H+L\Xi, (12)

where \hat{X}=\hat{X}_{N}, U=U_{N-1}, \Xi=\Xi_{N}, and A, B, H, and L are block matrices constructed using the system matrices in (10). Defining \check{x}_{0^{-}}\triangleq\hat{x}_{0^{-}}-\bar{x}_{0}, \bar{X}\triangleq\mathbb{E}[\hat{X}], \bar{U}\triangleq\mathbb{E}[U], \check{X}\triangleq\hat{X}-\bar{X}, and \tilde{U}\triangleq U-\bar{U}, and using (12), it follows that

\bar{X} = A\bar{x}_{0}+B\bar{U}+H, (13)
\check{X} = A\check{x}_{0^{-}}+B\tilde{U}+L\Xi. (14)

The cost functional in (11) can be rewritten as

\hat{J}(u)=\mathbb{E}[\check{X}^{\top}Q\check{X}+\tilde{U}^{\top}R\tilde{U}]+\bar{X}^{\top}Q\bar{X}+\bar{U}^{\top}R\bar{U}-2\bar{X}^{\top}QM_{r}, (15)

where Q=\mathrm{blkdiag}(Q_{0},Q_{1},\ldots,Q_{N-1},0), R=\mathrm{blkdiag}(R_{0},R_{1},\ldots,R_{N-1}), and M_{r}=[m_{0}^{\top}\ m_{1}^{\top}\ \cdots\ m_{N}^{\top}]^{\top}.
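One way to assemble the lifted matrices A, B, and H appearing in (12)-(13) is sketched below; this is our own construction following the standard state-transition convention and is not taken from the authors' implementation.

```python
import numpy as np

def lift_dynamics(A_list, B_list, h_list):
    """Stack (10) over k = 0,...,N so that X = A x0 + B U + H (noise terms handled separately).

    A_list, B_list, h_list contain A_k, B_k, h_k for k = 0,...,N-1.
    """
    N = len(A_list)
    nx, nu = A_list[0].shape[0], B_list[0].shape[1]
    # Phi[k] = A_{k-1} ... A_0 is the state transition matrix from step 0 to step k
    Phi = [np.eye(nx)]
    for k in range(N):
        Phi.append(A_list[k] @ Phi[-1])
    A = np.vstack(Phi)                                # shape ((N+1)nx, nx)
    B = np.zeros(((N + 1) * nx, N * nu))
    H = np.zeros((N + 1) * nx)
    for k in range(1, N + 1):
        for j in range(k):
            Phi_kj = np.eye(nx)                       # transition from step j+1 to step k
            for i in range(j + 1, k):
                Phi_kj = A_list[i] @ Phi_kj
            B[k * nx:(k + 1) * nx, j * nu:(j + 1) * nu] = Phi_kj @ B_list[j]
            H[k * nx:(k + 1) * nx] += Phi_kj @ np.asarray(h_list[j])
    return A, B, H
```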

From (13), (14), and (15), the covariance steering problem of the estimated state \hat{x} (Problem 2) can thus be divided into the following mean control and covariance control problems. The mean control problem is given by

\min_{\bar{U}}\ \ \bar{X}^{\top}Q\bar{X}+\bar{U}^{\top}R\bar{U}-2\bar{X}^{\top}QM_{r}
\mathrm{s.t.}\ \ \bar{X}=A\bar{x}_{0}+B\bar{U}+H,\ \ E_{0}\bar{X}=\bar{x}_{0},\ \ E_{N}\bar{X}=\bar{x}_{N}, (16)

where E_{k} is a matrix defined such that E_{k}\bar{X}=\bar{x}_{k}. The covariance control problem is given by

\min_{\tilde{U}}\ \ \mathbb{E}[\check{X}^{\top}Q\check{X}+\tilde{U}^{\top}R\tilde{U}]
\mathrm{s.t.}\ \ \check{X}=A\check{x}_{0^{-}}+B\tilde{U}+L\Xi,\ \ \mathbb{E}[\check{x}_{0^{-}}\check{x}_{0^{-}}^{\top}]=\hat{P}_{0^{-}},\ \ E_{N}\mathbb{E}[\check{X}\check{X}^{\top}]E_{N}^{\top}=\hat{P}_{N}, (17)

where \hat{P}_{0^{-}}=P_{0}-\tilde{P}_{0^{-}} and \hat{P}_{N}=P_{N}-\tilde{P}_{N}\preceq P_{f}-\tilde{P}_{N}.

III-C Solutions of the Mean Control and Covariance Control Problems

Assuming that the system (3) is controllable, the solution of the mean control problem (16), which yields the mean trajectory (\bar{x}_{k})_{k=0}^{N} and (\bar{u}_{k})_{k=0}^{N-1}, is given by [25]

\bar{U}^{*}=W\big(-V+\bar{B}_{N}^{\top}(\bar{B}_{N}W\bar{B}_{N}^{\top})^{-1}(\bar{x}_{N}-\bar{A}_{N}\bar{x}_{0}-H_{N}+\bar{B}_{N}WV)\big), (18)

where W=(B^{\top}QB+R)^{-1} and V=B^{\top}Q(A\bar{x}_{0}+H-M_{r}).
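Equation (18) is a closed-form expression and can be evaluated directly once the lifted matrices are available. In the NumPy transcription below we take \bar{A}_{N}, \bar{B}_{N}, and H_{N} to be the last block rows of A, B, and H (i.e., E_N A, E_N B, E_N H), which is an assumption consistent with the terminal-state selector E_N.

```python
import numpy as np

def mean_control(A, B, H, Q, R, Mr, x0_bar, xN_bar, nx):
    """Optimal mean control (18) and the resulting mean trajectory (13)."""
    W = np.linalg.inv(B.T @ Q @ B + R)
    V = B.T @ Q @ (A @ x0_bar + H - Mr)
    # last block rows of the lifted matrices, assumed to equal E_N A, E_N B, E_N H
    AN, BN, HN = A[-nx:], B[-nx:], H[-nx:]
    rhs = xN_bar - AN @ x0_bar - HN + BN @ W @ V
    U_bar = W @ (-V + BN.T @ np.linalg.solve(BN @ W @ BN.T, rhs))
    X_bar = A @ x0_bar + B @ U_bar + H
    return U_bar, X_bar
```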

The solution to the covariance control problem (17) is given by the controller of the form

\tilde{u}_{k}=\sum_{i=0}^{k}K_{k,i}\check{x}_{i}, (19)

or, equivalently, by \tilde{U}=K\check{X}, where the control gain matrix K is lower block triangular, of the form

K=\begin{bmatrix}K_{0,0}&0&\cdots&0&0\\ K_{1,0}&K_{1,1}&\cdots&0&0\\ \vdots&\vdots&\ddots&\vdots&0\\ K_{N-1,0}&K_{N-1,1}&\cdots&K_{N-1,N-1}&0\end{bmatrix}, (20)

and is computed from K=F(I+BF)^{-1}, where F is obtained by solving the following convex optimization problem [27]

\min_{F}\ \mathrm{trace}\big[\big((I+BF)^{\top}Q(I+BF)+F^{\top}RF\big)P_{Z}\big], (21)

subject to

\lVert P_{Z}^{1/2}(I+BF)^{\top}E_{N}^{\top}(P_{f}-\tilde{P}_{N})^{-1/2}\rVert-1\leq 0, (22)

where P_{Z}=A\hat{P}_{0^{-}}A^{\top}+LP_{\Xi}L^{\top} is the covariance of Z\triangleq A\check{x}_{0^{-}}+L\Xi, and P_{\Xi}=\mathrm{blkdiag}(P_{\xi_{0}},\ldots,P_{\xi_{N}}) is the covariance of the innovation process.
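The program (21)-(22) is convex in F and can be posed, for instance, with CVXPY. The sketch below uses the identity trace(M^{\top}QMP_{Z}) = \lVert Q^{1/2}MP_{Z}^{1/2}\rVert_{F}^{2} and a largest-singular-value constraint; for brevity it leaves F unstructured, whereas the lower-block-triangular (causal) structure in (20) would be imposed with additional equality constraints. Matrix names follow the paper; the function itself is our own sketch, not the authors' implementation.

```python
import numpy as np
import cvxpy as cp
from scipy.linalg import sqrtm

def covariance_control(A, B, L, Q, R, P0_hat_minus, P_Xi, EN, Pf_minus_PtildeN):
    """Solve (21)-(22) for F and return the feedback gain K = F (I + B F)^{-1}."""
    n_big, m_big = B.shape
    PZ = A @ P0_hat_minus @ A.T + L @ P_Xi @ L.T           # covariance of Z
    PZ_h = np.real(sqrtm(PZ))
    Q_h, R_h = np.real(sqrtm(Q)), np.real(sqrtm(R))
    Pf_h_inv = np.linalg.inv(np.real(sqrtm(Pf_minus_PtildeN)))
    F = cp.Variable((m_big, n_big))
    I = np.eye(n_big)
    # trace[((I+BF)^T Q (I+BF) + F^T R F) P_Z] written as two squared Frobenius norms
    obj = cp.sum_squares(Q_h @ (I + B @ F) @ PZ_h) + cp.sum_squares(R_h @ F @ PZ_h)
    # terminal covariance constraint (22) as a bound on the largest singular value
    cons = [cp.sigma_max(PZ_h @ (I + B @ F).T @ EN.T @ Pf_h_inv) <= 1]
    cp.Problem(cp.Minimize(obj), cons).solve(solver=cp.SCS)  # any SDP-capable solver works
    return F.value @ np.linalg.inv(I + B @ F.value)
```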

IV Computing the Nominal Trajectory

In this section, we develop an efficient algorithm to find nominal trajectories for nonlinear systems using the mean controller (18). A deterministic nonlinear dynamic model, i.e., with the noise w_{k} set to zero, is considered when designing the nominal trajectory. To this end, consider the deterministic nonlinear system

x_{k+1}=f(x_{k},u_{k},0). (23)

Let (x_{k}^{r})_{k=0}^{N-1} and (u_{k}^{r})_{k=0}^{N-1} be a nominal trajectory for this system. The linearized model (3) along this nominal trajectory will be used by the mean controller (18) to compute a control sequence (u_{k}^{c})_{k=0}^{N-1} and the corresponding state sequence (x_{k}^{c})_{k=0}^{N-1}.

Definition 1: A nominal trajectory is called a compatible nominal trajectory for system (23) if x_{k}^{r}=x_{k}^{c} and u_{k}^{r}=u_{k}^{c} for k=0,\ldots,N-1.

Linearizing along a compatible nominal trajectory ensures that the nonlinear system is linearized at the “correct” points. When applying the designed control to the linearized system, the resulting trajectory is the same as the nominal trajectory, which means that the system reaches the exact points where the linearization is performed. On the other hand, there will be extra errors caused by the linearization if the system is linearized along a non-compatible nominal trajectory.

The iterative algorithm to find a compatible nominal trajectory is given in Algorithm 1.

1 Initialize (x_{k}^{r})_{k=0}^{N-1} and (u_{k}^{r})_{k=0}^{N-1};
2 while NotConverged do
3       Linearize (23) along (x_{k}^{r})_{k=0}^{N-1} and (u_{k}^{r})_{k=0}^{N-1};
4       Compute the mean optimal control using (18) to obtain the control sequence (u_{k}^{c})_{k=0}^{N-1};
5       Compute the controlled state trajectory (x_{k}^{c})_{k=0}^{N-1} using (13);
6       Set (x_{k}^{r}=x_{k}^{c})_{k=0}^{N-1} and (u_{k}^{r}=u_{k}^{c})_{k=0}^{N-1};
return (x_{k}^{r})_{k=0}^{N-1} and (u_{k}^{r})_{k=0}^{N-1};
Algorithm 1 Compatible Nominal Trajectory (CNT)

When the algorithm converges, the nominal trajectory is the same as the mean optimal trajectory, which, by definition, makes it a compatible nominal trajectory. Note that (18) gives the analytical solution of the mean control problem, which can be computed quickly. Each iteration of the algorithm therefore has a low computational load, and the overall algorithm runs efficiently. As seen later in Section VI, the algorithm typically converges within a few iterations.
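A compact way to implement Algorithm 1 is the fixed-point iteration below, written against hypothetical helpers: linearize_dynamics(x, u) returns (A_k, B_k, h_k) along the current nominal trajectory, and mean_control_fn solves (16) via (18) for the lifted system (for example, by combining the lift_dynamics and mean_control sketches above). The convergence test is one reasonable choice, not necessarily the authors'.

```python
import numpy as np

def compatible_nominal_trajectory(linearize_dynamics, mean_control_fn,
                                  X_r, U_r, tol=1e-6, max_iter=50):
    """Iterate: linearize along the nominal trajectory, re-solve the mean control
    problem, and stop when the trajectory is a fixed point (Definition 1)."""
    for _ in range(max_iter):
        mats = [linearize_dynamics(X_r[k], U_r[k]) for k in range(len(U_r))]
        A_list, B_list, h_list = map(list, zip(*mats))
        X_c, U_c = mean_control_fn(A_list, B_list, h_list)
        converged = (np.max(np.abs(np.asarray(X_c) - np.asarray(X_r))) < tol and
                     np.max(np.abs(np.asarray(U_c) - np.asarray(U_r))) < tol)
        X_r, U_r = X_c, U_c
        if converged:
            break
    return X_r, U_r
```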

Figure 1: Construction of the CS-BRM. Gray shapes denote obstacles. (a) Node sampling. Each node is represented by its mean, a state covariance, and a state estimation error covariance, all of which are sampled in proper sample spaces; (b) Mean trajectories are computed for all neighboring node pairs using the mean controller. Only mean trajectories that are collision-free are preserved. Those mean trajectories indicate possible connections/edges in the CS-BRM; (c) Kalman filter updates are simulated for each possible edge; (d) Covariance control is applied for each edge to execute the transition between the nodes.

V The CS-BRM Algorithm

The nodes in the CS-BRM are sampled in the belief space. In a partially observable environment, the belief b_{k} at time step k is given by the conditional probability distribution of the state x_{k}, conditioned on the history of observations (y_{i})_{i=0}^{k} and the history of control inputs (u_{i})_{i=0}^{k-1}, that is, b_{k}=\mathbb{P}(x_{k}|(y_{i})_{i=0}^{k},(u_{i})_{i=0}^{k-1}). In a Gaussian belief space, b_{k} can be equivalently represented by the estimated state \hat{x}_{k} and the estimation error covariance \tilde{P}_{k}, that is, b_{k}=(\hat{x}_{k},\tilde{P}_{k}) [18, 14]. The state estimate \hat{x}_{k} is Gaussian, given by \hat{x}_{k}\sim\mathcal{N}(\bar{x}_{k},\hat{P}_{k}). Hence, the Gaussian belief can also be written as b_{k}=(\bar{x}_{k},\hat{P}_{k},\tilde{P}_{k}). The main idea of CS-BRM is to use covariance steering theory to design the edge controllers to achieve node reachability.

1 V=\{nod_{1},\dots,nod_{n}\}\leftarrow SampleNodes(n);
2 V_{r}\leftarrow V, E\leftarrow\emptyset;
3 for i=1:n do
4       V_{near}\leftarrow Neighbor(V_{r}, nod_{i});
5       foreach nod_{j}\in V_{near} do
6             (\bar{U}_{ij}, \tau_{ij}, MCost_{ij})\leftarrow MTraj(nod_{i}, nod_{j});
7             if ObstacleFree(\tau_{ij}) then
8                   \tilde{P}_{N^{-}}\leftarrow KF(nod_{i}, nod_{j});
9                   if \tilde{P}_{N^{-}}\preceq\tilde{P}_{nod_{j}} then
10                         (\tilde{U}_{ij}, CovCost_{ij})\leftarrow CovControl(nod_{i}, nod_{j});
11                         CollisionCost_{ij}\leftarrow MonteCarlo(\bar{U}_{ij}, \tilde{U}_{ij});
12                         E_{ij}\leftarrow(\bar{U}_{ij}, \tilde{U}_{ij}, EdgeCost_{ij});
13                         E\leftarrow E\cup E_{ij};
14            (\bar{U}_{ji}, \tau_{ji}, MCost_{ji})\leftarrow MTraj(nod_{j}, nod_{i});
15             if ObstacleFree(\tau_{ji}) then
16                   Repeat lines 8-13 with i and j swapped;
17      V_{r}\leftarrow V_{r}\setminus nod_{i};
18 CS-BRM\leftarrow(V, E);
return CS-BRM;
Algorithm 2 Constructing CS-BRM

An illustration of the steps for building the CS-BRM is shown in Fig. 1. The algorithm for constructing the CS-BRM is given in Algorithm 2. The following procedures are used in the algorithm.
Sample Nodes: The function SampleNodes(n) samples n CS-BRM nodes. A node in the CS-BRM is represented by the tuple (\bar{x},\hat{P}_{-},\tilde{P}_{-}), where \bar{x} is the state mean, \hat{P}_{-} is the prior estimated state covariance, and \tilde{P}_{-} is the prior state estimation error covariance. Since P=\hat{P}_{-}+\tilde{P}_{-}, the node can also be equivalently represented by (\bar{x},P,\tilde{P}_{-}). Note that we can compute the a posteriori covariances \hat{P} and \tilde{P} using \hat{P}_{-} and \tilde{P}_{-}. For constructing node j, the algorithm starts by sampling the free state space, which provides the mean \bar{x}_{j} of the distribution. Then, an estimated state covariance \hat{P}_{j^{-}} is sampled from the space of symmetric positive definite matrices. Similarly, a state estimation error covariance \tilde{P}_{j^{-}} is also sampled at node j. \tilde{P}_{j^{-}} is the initial condition of the Kalman filter update for the edges that emanate from node j. In Fig. 1(a), for each node, \bar{x} is shown as a black dot, P is shown as a solid ellipse, and \tilde{P}_{-} is shown as a dashed ellipse.
Neighbor: The function Neighbor(V_{r},nod_{i}) finds all the nodes in V_{r} that are within a given distance d_{1} of node nod_{i}, where V_{r} is a node set containing all nodes in the CS-BRM.
Mean Trajectory: Given two nodes nod_{i} and nod_{j}, MTraj(nod_{i},nod_{j}) uses the mean controller (18) and Algorithm 1 to find the compatible nominal trajectory, which is also the mean trajectory from nod_{i} to nod_{j}. The function returns the mean control \bar{U}_{ij}, the mean trajectory \tau_{ij}, and the cost of the mean control MCost_{ij}.
Obstacle Checking: The function ObstacleFree(\tau_{ij}) returns true if the mean trajectory \tau_{ij} is collision-free.
Kalman Filter: Given two nodes nod_{i} and nod_{j}, KF(nod_{i},nod_{j}) returns the prior estimation error covariance at the last time step of the trajectory from nod_{i} to nod_{j}.
Covariance Control: CovControl(nod_{i},nod_{j}) solves the covariance control problem from nod_{i} to nod_{j}. It returns the control \tilde{U}_{ij} and the cost CovCost_{ij}.
Monte Carlo: We use Monte Carlo simulations to calculate the probability of collision of the edges. For edge E_{ij}, the initial state x_{0} and the initial state estimate \hat{x}_{0^{-}} are sampled from their corresponding distributions. The state trajectory is simulated using the mean control \bar{U}_{ij} and the covariance control \tilde{U}_{ij}. Then, collision checking is performed on the simulated state trajectory. By repeating this process, we approximate the probability of collision of this edge. In our implementation, the collision cost CollisionCost_{ij} is taken to be proportional to the probability of collision along that edge.
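A sketch of this Monte Carlo collision-cost estimate is given below; the closed-loop rollout (Kalman filter together with the mean and covariance controllers) is abstracted into a hypothetical simulate_edge function, the node attribute names are placeholders, and the cost weight is arbitrary.

```python
import numpy as np

def monte_carlo_collision_cost(node_i, U_bar, U_tilde, simulate_edge, in_collision,
                               n_samples=500, weight=100.0, seed=0):
    """Estimate the collision probability of an edge by repeated closed-loop rollouts."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_samples):
        # sample the true initial state and the prior initial estimate from the node
        x0 = rng.multivariate_normal(node_i.x_bar, node_i.P)
        x0_hat = rng.multivariate_normal(node_i.x_bar, node_i.P_hat_minus)
        trajectory = simulate_edge(x0, x0_hat, U_bar, U_tilde, rng)  # hypothetical rollout
        if any(in_collision(x) for x in trajectory):
            hits += 1
    p_collision = hits / n_samples
    return weight * p_collision, p_collision
```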

Algorithm 2 starts by sampling n nodes in the belief space using SampleNodes (Line 1). Lines 3-17 are the steps to add CS-BRM edges. Given two neighboring nodes nod_{i} and nod_{j}, Lines 6-13 try to construct the edge E_{ij} and Lines 14-16 try to construct the edge E_{ji}.

Every edge in the CS-BRM is constructed by solving a covariance steering problem. Each node tries to connect to its neighboring nodes whenever possible. First, for each edge, we use Algorithm 1 to find the mean trajectory of that edge (Line 6). Next, we check whether the mean trajectory is collision-free. Only if the mean trajectory is collision-free do we proceed to the next step of edge construction. Then, the Kalman filter updates are simulated, which give the state estimation error covariance at every time step along that edge. The state estimation error covariances of the edges \overline{ab} and \overline{ac} are shown as red dashed ellipses in Figure 1(c). Take the edge \overline{ab} as an example. The initial condition of the Kalman filter is \tilde{P}_{0^{-}}=\tilde{P}_{a^{-}}. The prior estimation error covariance at the final time step, \tilde{P}_{N^{-}}, is compared with the state estimation error covariance of node b, \tilde{P}_{b^{-}}. If \tilde{P}_{N^{-}}\preceq\tilde{P}_{b^{-}} (Line 9), the algorithm proceeds to solve the covariance control problem and this edge is added to the graph. For the covariance control problem of edge \overline{ab}, the initial constraint and the terminal constraint are given by \hat{P}_{0^{-}}=\hat{P}_{a^{-}} and \hat{P}_{N}\preceq\hat{P}_{b}, respectively. The covariance \hat{P}_{b} is computed using \hat{P}_{b^{-}} and \tilde{P}_{b^{-}}. The final edge controller is the combination of the mean controller, the covariance controller, and the Kalman filter.

In addition to the edge controller, an edge cost is computed for each edge. The edge cost EdgeCost_{ij} is a weighted sum of MCost_{ij}, CovCost_{ij}, and CollisionCost_{ij}. With covariance steering serving as the lower-level controller, the higher-level motion planning problem on the roadmap is a graph search problem, similar to that of a PRM. Thus, the covariance steering approach transforms the belief space roadmap into a traditional PRM with specific edge costs.
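Since every edge carries a self-contained controller and cost, a query on the finished roadmap is an ordinary shortest-path search. A minimal Dijkstra sketch over a plain adjacency dictionary is shown below; the data layout is an assumption, and any graph library would serve equally well.

```python
import heapq

def shortest_path(edges, start, goal):
    """Dijkstra over CS-BRM nodes; edges[i] is a list of (j, edge_cost) pairs."""
    dist, prev = {start: 0.0}, {}
    pq = [(0.0, start)]
    while pq:
        d, i = heapq.heappop(pq)
        if i == goal:
            break
        if d > dist.get(i, float("inf")):
            continue
        for j, cost in edges.get(i, []):
            nd = d + cost
            if nd < dist.get(j, float("inf")):
                dist[j], prev[j] = nd, i
                heapq.heappush(pq, (nd, j))
    if goal not in dist:
        return None, float("inf")
    path, node = [goal], goal
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1], dist[goal]
```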

In CS-BRM, each edge is an independent covariance steering problem and the planned path using CS-BRM consists of a concatenation of edges. Since the terminal estimated state covariance satisfies an inequality constraint (17) and the terminal state estimation error covariance satisfies an inequality relation by construction (Line 9), it is important to verify that the covariance constraints at all nodes are still satisfied by concatenating the edges.

To this end, consider the covariance steering problem from node a to node b. For the edge \overline{ab}, we have the initial constraints \hat{P}_{0^{-}}=\hat{P}_{a^{-}} and \tilde{P}_{0^{-}}=\tilde{P}_{a^{-}}, and the terminal covariance constraint \hat{P}_{N}\preceq\hat{P}_{b}. Note that the constraint \tilde{P}_{N^{-}}\preceq\tilde{P}_{b^{-}} is satisfied for \overline{ab} (Line 9, Algorithm 2). Suppose that the solution of the covariance control problem of edge \overline{ab} is the feedback gain K. By concatenating the edges, the system may not start exactly at node a. Instead, it will start at some node a^{\prime}=(\bar{x}_{a},\hat{P}_{a^{-}}^{\prime},\tilde{P}_{a^{-}}^{\prime}) that satisfies \hat{P}_{a^{-}}^{\prime}\preceq\hat{P}_{a^{-}} and \tilde{P}_{a^{-}}^{\prime}\preceq\tilde{P}_{a^{-}}. Next, we show that, by applying the pre-computed feedback gain K, the terminal covariance still satisfies \hat{P}_{N}^{\prime}\preceq\hat{P}_{b} for the new covariances \hat{P}_{a^{-}}^{\prime} and \tilde{P}_{a^{-}}^{\prime}. Before verifying this result, we provide a property of the covariance of the Kalman filter. Given an initial estimation error covariance \tilde{P}_{0^{-}}, we obtain a sequence of error covariances (\tilde{P}_{k^{-}})_{k=1}^{N} using the Kalman filter updates (7). Similarly, given a new initial error covariance \tilde{P}_{0^{-}}^{\prime} which satisfies \tilde{P}_{0^{-}}^{\prime}\preceq\tilde{P}_{0^{-}}, we obtain a new sequence of error covariances (\tilde{P}_{k^{-}}^{\prime})_{k=0}^{N}. We have the following lemma.

Lemma 1

If \tilde{P}_{0^{-}}^{\prime}\preceq\tilde{P}_{0^{-}}, then \tilde{P}_{k^{-}}^{\prime}\preceq\tilde{P}_{k^{-}} for all k=0,\ldots,N.

The proof of this lemma is omitted. A proof of a similar result can be found in [18]. From Lemma 1, it is straightforward to show that, if \tilde{P}_{0^{-}}^{\prime}\preceq\tilde{P}_{0^{-}}, we also have \tilde{P}_{k}^{\prime}\preceq\tilde{P}_{k} for all k=0,\ldots,N.

Proposition 2

Consider a path on the CS-BRM roadmap, where the initial node of the path is denoted by (\bar{x}_{i},\hat{P}_{i},\tilde{P}_{i}) and the final node of the path is denoted by (\bar{x}_{j},\hat{P}_{j},\tilde{P}_{j}). Starting from the initial node and following this path by applying the sequence of edge controllers, the robot will arrive at a belief node (\bar{x},\hat{P},\tilde{P}) such that \bar{x}=\bar{x}_{j}, \hat{P}\preceq\hat{P}_{j}, and \tilde{P}\preceq\tilde{P}_{j}.

The proof of this proposition is straightforward and is omitted. Proposition 2 guarantees that, when planning on the CS-BRM roadmap, the covariance at each arrived node is always smaller than the assigned fixed covariance at the corresponding node of the CS-BRM roadmap.

VI Numerical Examples

In this section, we first illustrate our theoretical results for the motion planning problem of a 2-D double integrator. We compare our method with the SLQG-FIRM algorithm [14], and we show that our method overcomes some of the limitations of SLQG-FIRM. Subsequently, the problem of a fixed-wing aerial vehicle in a 3-D environment is studied.

VI-A 2-D Double Integrator

A 2-D double integrator is a linear system with a 4-dimensional state space given by its position and velocity, and a 2-dimensional control space given by its acceleration.
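For concreteness, one standard discretization of this model is given below (state [p_x, p_y, v_x, v_y], input [a_x, a_y]); the time step and the noise scaling are illustrative values, not those used in the experiments.

```python
import numpy as np

def double_integrator_2d(dt=0.2, sigma_w=0.1):
    """Discrete-time 2-D double integrator matrices for (3) with constant A, B, G."""
    A = np.block([[np.eye(2), dt * np.eye(2)],
                  [np.zeros((2, 2)), np.eye(2)]])
    B = np.block([[0.5 * dt**2 * np.eye(2)],
                  [dt * np.eye(2)]])
    G = sigma_w * np.eye(4)   # process-noise input matrix (illustrative)
    return A, B, G
```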

The environment considered is shown in Figure 2. There are \ell landmarks placed in the environment, represented by black stars. The black polygonal shapes represent the obstacles. The agent observes all landmarks and obtains estimates of its position at all time steps. The agent achieves better position estimates when it is closer to the landmarks. Let the Euclidean distance between the position of the 2-D double integrator and the j-th landmark be given by d_{j}. Then, the j-th position measurement, corresponding to landmark j, is

{}^{j}\!y=[x^{(1)}\ x^{(2)}]^{\top}+\eta_{p}d_{j}v_{p},\quad j=1,2,\ldots,\ell, (24)

where \eta_{p} is a parameter related to the intensity of the position measurement noise, set to 0.1, and v_{p} is a two-dimensional standard Gaussian random vector. The velocity measurement is given by

y_{v}=[x^{(3)}\ x^{(4)}]^{\top}+\eta_{v}v_{v}, (25)

where \eta_{v} is a parameter related to the intensity of the velocity measurement noise, set to 0.2, and v_{v} is a two-dimensional standard Gaussian random vector. Thus, the total measurement vector y is a (2\ell+2)-dimensional vector, composed of \ell position measurements and one velocity measurement.
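The stacked measurement (24)-(25) can be written as a single function of the state; in the sketch below the landmark positions are placeholders, while the noise parameters match the values quoted in the text.

```python
import numpy as np

def measure(x, landmarks, eta_p=0.1, eta_v=0.2, rng=None):
    """Stacked landmark and velocity measurements (24)-(25) for x = [p_x, p_y, v_x, v_y]."""
    rng = rng or np.random.default_rng()
    p, v = x[:2], x[2:]
    ys = []
    for lm in landmarks:                      # one noisy position measurement per landmark
        d = np.linalg.norm(p - lm)            # noise grows with distance to the landmark
        ys.append(p + eta_p * d * rng.standard_normal(2))
    ys.append(v + eta_v * rng.standard_normal(2))
    return np.concatenate(ys)                 # dimension 2*len(landmarks) + 2
```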

The sampled CS-BRM nodes are shown in Figure 2(a). The velocities are restricted to be zero for the purpose of comparison with SLQG-FIRM. The gray dots are the state means of the nodes. The red dot shows the position of the starting node and the blue square shows the position of the goal node. The black ellipses around the state means represent the 3\sigma confidence intervals of the position covariances. The red dashed ellipses correspond to the prior estimation error covariances. Following Algorithm 2, the edge controllers are computed using the covariance steering controller for each edge, and the collision cost is computed using Monte Carlo simulations. The final CS-BRM is shown in Figure 2(b).

Figure 2: (a) Sampled CS-BRM nodes. (b) CS-BRM of the 2-D double integrator. The green lines are the mean trajectories between the nodes. The gray ellipses are the 3\sigma confidence intervals of the covariances of the positions.

After the CS-BRM is built, the path planning problem on CS-BRM is the same as the problem of planning using a PRM, which can be easily solved using a graph search algorithm. The difference between the CS-BRM and PRM is that the edge cost in CS-BRM is specifically designed to deal with dynamical system models and uncertainties, and the transition between two nodes of CS-BRM is achieved using covariance steering as the edge controller.

Figure 3: Results of sampling stationary belief nodes. (a) Path planned using CS-BRM; (b) Path planned using SLQG-FIRM.

The planning results of CS-BRM and SLQG-FIRM are given in Figure 3. The paths go through the same set of belief nodes for this example. In Figure 3(b), the red ellipses are the state covariances at the last step of the time-varying LQG controller. As we see, each one of these ellipses is larger than the corresponding state covariance of the intermediate belief node (bold black ellipses). Thus, a converging step is required at every intermediate node: the time-varying LQG controller is switched to the SLQG controller, which runs until the state covariance converges to a value that is smaller than the state covariance of the belief node (bold black ellipses).

The CS-BRM that samples nonstationary nodes is shown in Figure 4(a). For clarity, only the mean trajectories of the edges are shown (green lines). The environment, the sampled mean positions, and the sampled covariances are all the same as those in Figures 3(a) and 3(b). The only difference is that multiple velocities are sampled at each position in Figure 4(a). The planned path is shown in Figure 4(b). The cost of the planned path is 104.87, which is much lower than the costs of the paths in Figure 3(a) and Figure 3(b), which are 239.36 and 240.47, respectively. The decrease in the path cost is due to the decrease of the mean control cost. In Figure 3(b), the robot has to stop (zero velocity) at every intermediate node. With the proposed method, the robot goes through each intermediate node smoothly, which results in a more efficient path.

Figure 4: (a) CS-BRM results of sampling nonstationary nodes. The green lines are the mean trajectories of the edges. (b) Planning results. By exploring the velocity space, the robot does not need to stop at every node. The path cost is much smaller compared to Figures 3(a) and 3(b).

VI-B Fixed-Wing Aircraft

In this section, we apply the proposed method to a nonlinear example, namely, planning for a fixed-wing aerial vehicle [32]. The discrete-time system model is given by

x_{k+1} = x_{k}+V_{k}\cos\psi_{k}\cos\gamma_{k}\Delta t+g_{1k}w_{1k},
y_{k+1} = y_{k}+V_{k}\sin\psi_{k}\cos\gamma_{k}\Delta t+g_{2k}w_{2k},
z_{k+1} = z_{k}+V_{k}\sin\gamma_{k}\Delta t+g_{3k}w_{3k},
\psi_{k+1} = \psi_{k}+\frac{g}{V_{k}}\tan\phi_{k}\Delta t+g_{4k}w_{4k}, (26)

where the 4-dimensional state is [x\ y\ z\ \psi]^{\top}, with [x\ y\ z]^{\top} the 3-D position of the vehicle and \psi the heading angle. The 3-dimensional control input is [V\ \gamma\ \phi]^{\top}, where V is the air speed, \gamma is the flight-path angle, and \phi is the bank angle. \Delta t is the time step size, w_{ik}, i=1,2,3,4, are standard Gaussian random variables, and g_{1k}, g_{2k}, g_{3k}, and g_{4k} are multipliers corresponding to the magnitude of the noise. Their values are all set to 0.1.
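A direct transcription of (26) is shown below; \Delta t is an illustrative value, g denotes gravitational acceleration, and the noise multipliers are set to 0.1 as stated above.

```python
import numpy as np

def fixed_wing_step(state, u, w, dt=0.1, g=9.81, g_noise=0.1):
    """One step of the fixed-wing model (26): state = [x, y, z, psi], u = [V, gamma, phi]."""
    x, y, z, psi = state
    V, gamma, phi = u
    return np.array([
        x + V * np.cos(psi) * np.cos(gamma) * dt + g_noise * w[0],
        y + V * np.sin(psi) * np.cos(gamma) * dt + g_noise * w[1],
        z + V * np.sin(gamma) * dt + g_noise * w[2],
        psi + (g / V) * np.tan(phi) * dt + g_noise * w[3],
    ])
```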

Similarly to the 2-D double integrator example, several landmarks are placed in the environment. Let the Euclidean distance between the 3-D position of the vehicle and the j-th landmark be given by d_{j}. Then, the j-th position measurement, corresponding to landmark j, is

{}^{j}\!y=[x\ y\ z\ \psi]^{\top}+\eta d_{j}v,\quad j=1,2,\ldots,\ell, (27)

where \eta is a parameter related to the intensity of the measurement noise, set to 0.05, and v is a 4-dimensional standard Gaussian random vector. Thus, the total measurement vector y is a 4\ell-dimensional vector, where \ell is the number of landmarks.

Algorithm 1 was used to find a compatible nominal trajectory for the fixed-wing vehicle. One example is given in Figure 5. The nominal trajectory and the reference trajectory are initialized as straight lines in all four state dimensions and the three control input dimensions. The algorithm converged in eight iterations.

Figure 5: Generation of a compatible nominal trajectory for the fixed-wing vehicle. The nominal trajectory and the reference trajectory are initialized as straight lines.

The planning environment is shown in Figure 6. The four spheres in the middle of the environment are the obstacles. The three black stars represent the landmarks. The black circles represent the state means (positions) of the nodes. The red dot shows the position of the starting node and the blue square shows the position of the goal node. The state means are deterministically chosen to discretize the environment. The covariances of the nodes are not shown.

Figure 6: Planning results using the CS-BRM for the fixed-wing vehicle. The gray lines are the trajectories from the Monte-Carlo simulations. The algorithm finds the optimal path that trades off between control effort and collision cost.

The planned path using this CS-BRM is also shown in Figure 6. Similarly to the example in Section VI-A, instead of flying through the narrow passage between the four obstacles, which would result in a high probability of collision, the algorithm chooses a path that trades off control cost and collision cost while minimizing the total edge cost.

VII Conclusion

A belief space roadmap (BRM) algorithm is developed in this paper. The nodes in BRM represent distributions of the state of the system and are sampled in the belief space. The main idea is to use covariance steering to design the edge controllers of the BRM graph to steer the system from one distribution to another. Compared to [14], the proposed method allows sampling nonstationary belief nodes, which has the advantage of more complete exploration of the belief space. For covariance steering of nonlinear systems, we introduce the concept of compatible nominal trajectories, which aim to better approximate the nonlinear dynamics through successive linearization. We also propose an efficient algorithm to compute compatible nominal trajectories. Compared to the standard PRM, the additional computation load comes from the computation of the edge controllers and edge cost evaluations, which, however, are done offline. By explicitly incorporating motion and observation uncertainties, we show that the proposed CS-BRM algorithm generates efficient motion plans that take into account both the control effort and collision probability.

Acknowledgments

This work has been supported by NSF awards IIS-1617630 and IIS-2008695, NASA NSTRF Fellowship 80NSSC17K0093, and ARL under CRA DCIST W911NF-17-2-0181. The authors would also like to thank Dipankar Maity for many insightful discussions regarding the output-feedback covariance steering problem.

References

  • [1] B. Bonet and H. Geffner, “Planning with incomplete information as heuristic search in belief space,” in International Conference on Artificial Intelligence Planning Systems, Breckenridge, CO, April 2000, pp. 52–61.
  • [2] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics.   MIT Press, 2005.
  • [3] J. Van Den Berg, S. Patil, and R. Alterovitz, “Motion planning under uncertainty using iterative local optimization in belief space,” The International Journal of Robotics Research, vol. 31, no. 11, pp. 1263–1278, 2012.
  • [4] L. P. Kaelbling, M. L. Littman, and A. R. Cassandra, “Planning and acting in partially observable stochastic domains,” Artificial Intelligence, vol. 101, pp. 99–134, 1998.
  • [5] S. C. Ong, S. W. Png, D. Hsu, and W. S. Lee, “Planning under uncertainty for robotic tasks with mixed observability,” The International Journal of Robotics Research, vol. 29, no. 8, pp. 1053–1068, 2010.
  • [6] S. Ross, J. Pineau, S. Paquet, and B. Chaib-Draa, “Online planning algorithms for POMDP,” Journal of Artificial Intelligence Research, vol. 32, pp. 663–704, 2008.
  • [7] G. Shani, J. Pineau, and R. Kaplow, “A survey of point-based POMDP solvers,” Autonomous Agents and Multi-Agent Systems, vol. 27, no. 1, pp. 1–51, 2013.
  • [8] A. Somani, N. Ye, D. Hsu, and W. S. Lee, “DESPOT: online POMDP planning with regularization,” Advances in neural information processing systems, vol. 26, pp. 1772–1780, 2013.
  • [9] K. Sun, B. Schlotfeldt, G. J. Pappas, and V. Kumar, “Stochastic motion planning under partial observability for mobile robots with continuous range measurements,” IEEE Transactions on Robotics, 2020, early access.
  • [10] R. Alterovitz, T. Siméon, and K. Y. Goldberg, “The stochastic motion roadmap: A sampling framework for planning with Markov motion uncertainty.” in Robotics: Science and systems, 2007, pp. 233–241.
  • [11] S. Prentice and N. Roy, “The belief roadmap: Efficient planning in belief space by factoring the covariance,” The International Journal of Robotics Research, vol. 28, pp. 1448–1465, 2009.
  • [12] L. E. Kavraki, P. Svestka, J. C. Latombe, and M. H. Overmars, “Probabilistic roadmaps for path planning in high-dimensional configuration spaces,” IEEE Transactions on Robotics and Automation, vol. 12, no. 4, pp. 566–580, 1996.
  • [13] S. Karaman and E. Frazzoli, “Sampling-based algorithms for optimal motion planning,” The International Journal of Robotics Research, vol. 30, no. 7, pp. 846–894, June 2011.
  • [14] A. A. Agha-Mohammadi, S. Chakravorty, and N. M. Amato, “FIRM: sampling-based feedback motion-planning under motion uncertainty and imperfect measurements,” The International Journal of Robotics Research, vol. 33, no. 2, pp. 268–304, 2014.
  • [15] J. Van Den Berg, P. Abbeel, and K. Goldberg, “LQG-MP: optimized path planning for robots with motion uncertainty and imperfect state information,” The International Journal of Robotics Research, vol. 30, no. 7, pp. 895–913, 2011.
  • [16] J. Pineau, G. Gordon, and S. Thrun, “Point-based value iteration: An anytime algorithm for POMDPs,” in International joint conference on artificial intelligence, vol. 3, Acapulco, Mexico, August 2003, pp. 1025–1032.
  • [17] S. D. Bopardikar, B. Englot, A. Speranzon, and J. van den Berg, “Robust belief space planning under intermittent sensing via a maximum eigenvalue-based bound,” The International Journal of Robotics Research, vol. 35, no. 13, pp. 1609–1626, 2016.
  • [18] A. Bry and N. Roy, “Rapidly-exploring random belief trees for motion planning under uncertainty,” in 2011 IEEE international conference on robotics and automation.   IEEE, 2011, pp. 723–730.
  • [19] A. A. Agha-Mohammadi, S. Chakravorty, and N. M. Amato, “Periodic feedback motion planning in belief space for non-holonomic and/or nonstoppable robots,” Technical Report: TR12-003, Parasol Lab., CSE Dept., Texas A&M University, pp. 1–23, 2012.
  • [20] A. A. Agha-mohammadi, S. Agarwal, S. K. Kim, S. Chakravorty, and N. M. Amato, “SLAP: simultaneous localization and planning under uncertainty via dynamic replanning in belief space,” IEEE Transactions on Robotics, vol. 34, no. 5, pp. 1195–1214, 2018.
  • [21] Y. Chen, T. T. Georgiou, and M. Pavon, “Optimal steering of a linear stochastic system to a final probability distribution, Part I,” IEEE Transactions on Automatic Control, vol. 61, p. 1158–1169, 2015.
  • [22] ——, “Optimal steering of a linear stochastic system to a final probability distribution, Part II,” IEEE Transactions on Automatic Control, vol. 61, no. 5, pp. 1170–1180, 2015.
  • [23] E. Bakolas, “Optimal covariance control for discrete-time stochastic linear systems subject to constraints,” in IEEE Conference on Decision and Control, Las Vegas, NV, December 2016, pp. 1153–1158.
  • [24] M. Goldshtein and P. Tsiotras, “Finite-horizon covariance control of linear time-varying systems,” in IEEE Conference on Decision and Control, Melbourne, Australia, December 2017, p. 3606–3611.
  • [25] K. Okamoto, M. Goldshtein, and P. Tsiotras, “Optimal covariance control for stochastic systems under chance constraints,” IEEE Control Systems Letters, vol. 2, no. 2, pp. 266–271, 2018.
  • [26] J. Ridderhof, K. Okamoto, and P. Tsiotras, “Nonlinear uncertainty control with iterative covariance steering,” in IEEE Conference on Decision and Control, Nice, France, December 2019, pp. 3484–3490.
  • [27] ——, “Chance constrained covariance control for linear stochastic systems with output feedback,” in IEEE Conference on Decision and Control, Jeju Island, South Korea, Dec. 8–11 2020, pp. 1153–1158.
  • [28] Y. Chen, T. Georgiou, and M. Pavon, “Steering state statistics with output feedback,” in IEEE Conference on Decision and Control, Osaka, Japan, December 2015, pp. 6502–6507.
  • [29] E. Bakolas, “Covariance control for discrete-time stochastic linear systems with incomplete state information,” in American Control Conference, Seattle, WA, May 2017, pp. 432–437.
  • [30] J. T. Betts, “Survey of numerical methods for trajectory optimization,” Journal of Guidance, Control, and Dynamics, vol. 21, no. 2, pp. 193–207, 1998.
  • [31] A. E. Bryson and Y. C. Ho, Applied Optimal Control: Optimization, Estimation and Control.   CRC Press, 1975.
  • [32] M. Owen, R. Beard, and T. McLain, “Implementing Dubins airplane paths on fixed-wing UAVs,” in Handbook of Unmanned Aerial Vehicles, K. Valavanis and G. Vachtsevanos, Eds.   Springer, 2015, ch. 64, pp. 1677–1701.