DIRECT: A Differential Dynamic Programming Based Framework for Trajectory Generation

Kun Cao^∗, Muqing Cao^∗, Shenghai Yuan, and Lihua Xie This research is supported by the National Research Foundation, Singapore under its Medium Sized Center for Advanced Robotics Technology Innovation. ^∗Equal contribution. K. Cao, M. Cao, S. Yuan and L. Xie (corresponding author) are with School of Electrical and Electronic Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798 (email: [email protected]; {mqcao, shyuan, elhxie}@ntu.edu.sg).

Abstract

This paper introduces a differential dynamic programming (DDP) based framework for polynomial trajectory generation for differentially flat systems. In particular, instead of using a linear equation with increasing size to represent multiple polynomial segments as in literature, we take a new perspective from state-space representation such that the linear equation reduces to a finite horizon control system with a fixed state dimension and the required continuity conditions for consecutive polynomials are automatically satisfied. Consequently, the constrained trajectory generation problem (both with and without time optimization) can be converted to a discrete-time finite-horizon optimal control problem with inequality constraints, which can be approached by a recently developed interior-point DDP (IPDDP) algorithm. Furthermore, for unconstrained trajectory generation with preallocated time, we show that this problem is indeed a linear-quadratic tracking (LQT) problem (DDP algorithm with exact one iteration). All these algorithms enjoy linear complexity with respect to the number of segments. Both numerical comparisons with state-of-the-art methods and physical experiments are presented to verify and validate the effectiveness of our theoretical findings. The implementation code will be open-sourced¹¹1https://github.com/ntu-caokun/DIRECT.

I Introduction

Recent decade has witnessed many emerging applications of unmanned systems, such as search and rescue [1], environmental monitoring [2] and mapping [3], etc. As one of the fundamental problems in robotics and control researches, trajectory planning of mobile robots has been actively studied, see [4, 5, 6].

Although polynomial-based approaches can generate energy-optimal smooth trajectories with fixed time allocation efficiently, they are usually formulated as a quadratic programming problem of big size and solved via various commercial solvers. In addition, although there are several methods considering bi-level optimization on energy and time in view of high nonlinearity and non-convexity of joint energy-time optimization, the computation complexity with respect to the number of segments is unknown in general. On the other hand, although DDP based methods enjoy linear complexity in the length of prediction horizon, they usually can only plan a relatively short time ahead for highly nonlinear systems such as quadrotor as the discretization step should be very fine.

Refer to caption — Figure 1: (Top) Trajectory generated using our proposed method in a two-room environment, the safe corridor consists of $34$ polyhedra. (Bottom) Snapshots of a quadrotor executing the trajectory generated.

To address these issues, we propose DIRECT, a differential dynamic programming based framework for trajectory generation. In particular, this framework uses a state-space equation to characterize the polynomial trajectory such that half of its coefficients can be saved and the continuity conditions on the derivatives of two consecutive segments can be automatically satisfied. By leveraging the differential flatness of quadrotor systems and this perspective of polynomial trajectory, we further reformulate the trajectory generation problem into a finite-horizon free-end discrete-time optimal control problem such that it can be addressed by DDP algorithm without fine discretization and the help of solvers. Unlike many existing approaches where optimization with respect to time and polynomial coefficients are carried out alternately, this algorithm features joint optimization of all variables directly, which explains the naming of this framework.

The proposed approach is based on the observation that the state at the proximal endpoint of an odd order polynomial is solely determined by half of the coefficients (low-order terms) while the state at the distant endpoint is co-determined by all of the coefficients and time. Along with the continuity condition, a long trajectory of multiple segments can be characterized by the initial state and a sequence of control inputs which are functions of the other half of the coefficients (high-order terms) and time. By incorporating the final state constraint into the terminal cost and introducing the safety and dynamical feasibility constraints on the state and control variables via control points, the optimization problem is reformulated into a discrete-time finite-horizon optimal control problem with inequality constraints. The resulting problem either with or without time optimization can be solved under the same framework via a recently developed IPDDP algorithm in [7] with linear complexity. In addition, we show that this problem can be further simplified to a LQT problem if there is no constraint and hence admits an analytical solution, which is still within the DDP framework (exactly one round of backward forward iteration). Our contributions can be summarized as follows:

•

We show that an odd-order piecewise-polynomial trajectory can be characterized by state-space equations;
•

We propose a DDP-based framework to generate polynomial trajectories with / without time optimization and constraints;
•

We achieve comparable results with state-of-the-art methods and perform extensive real-world experiments. The code will be open-sourced.

The rest of this paper is organized as follows. Some related works are presented in Section II. Section III details the DDP-based framework for trajectory generation. Some specializations of this scheme are discussed in Section IV. Section V gives some comparisons with state-of-the-art methods and real-world implementations. Section VI draws the conclusion.

Notations: In this paper, the sets of real, positive real and nonnegative real numbers are denoted by $\mathbb{R}$ , $\mathbb{R}_{+}$ and $\mathbb{R}_{\geq 0}$ , respectively. Denote by $\mathbb{R}^{n}$ and $\mathbb{R}^{m\times n}$ the sets of $n$ -dimensional real vectors and $m\times n$ real matrices, respectively. Let $\mathbb{Z}_{+}$ and $\mathbb{Z}_{\geq 0}$ be the sets of positive and nonnegative integers, respectively. Let $\|\mathbf{x}\|$ be the $2$ norm of $\mathbf{x}$ and $\|\mathbf{x}\|_{\mathbf{A}}^{2}=\mathbf{x}^{\top}\mathbf{A}\mathbf{x}$ . Denote by $\mathbf{A}^{\top}$ and $\mathbf{A}^{-1}$ the transpose and inverse of $\mathbf{A}\in\mathbb{R}^{n\times n}$ , respectively. Let $\mathbf{I}_{n}\in\mathbb{R}^{n\times n}$ be the $n$ -dimensional identity matrix and $\mathbf{1}_{n}$ the $n$ -dimensional column vector with all entries of $1$ . Let $\otimes$ denote the Kronecker product and $\mathcal{I}_{n}=\{0,\dots,n-1\}$ . Let $\mathbf{x}^{(i)}$ be the $i$ -th order derivative of vector $\mathbf{x}$ and $\mathbf{x}^{[i]}=[\mathbf{x}^{(0)},\dots,\mathbf{x}^{(i-1)}]$ . Denote the vectorization operation by $\operatorname{vec}(\cdot)$ , i.e., $\operatorname{vec}([\mathbf{a},\mathbf{b}])=[\mathbf{a}^{\top},\mathbf{b}^{\top}]^{\top}$ .

II Related Works

II-A Trajectory planning for quadrotors

The seminal work [4] firstly proved the differential flatness for quadrotor and proposed the minimum snap piecewise polynomial trajectory generation problem, which can be transformed into a quadratic programming (QP) problem with safety corridors being included as linear constraints on discretized points and hence can be solved efficiently via solver. The time allocation and dynamical feasibility problems were approached by projected gradient with backtracking line search in an outer loop and temporal scaling, respectively. Later, the authors in [5] proposed a closed-form solution for the trajectory generation problem with a generalized cost where the time allocation, safety, and dynamical feasibility aspects are handled by gradient descent of joint energy-time cost, addition of waypoints, and temporal scaling, respectively. In [8], the authors presented an efficient safe flight corridor construction method and a temporal scaling step (similar to [5]) on time allocation to generate safe and dynamical feasible trajectories. To avoid high nonlinearity related to joint space-time optimization, some recent works propose methods that iteratively optimize spatial and time variables in two separate subroutines. The authors in [6] introduced a spatial-temporal optimization method which iteratively refines the geometrical coefficients and time-warping functions via QP and second order cone program (SOCP), respectively. In [9], the authors decomposed the trajectory optimization problem as a bi-level optimization problem where the gradient required in the outer loop is analytically obtained from the dual solution of the inner loop QP. An alternating minimization scheme was proposed in [10] where complex safety constraints were not considered. Recently in [11], the authors formulate an unconstrained nonlinear optimization that can be solved by quasi-Newton methods efficiently, but the safety and dynamical feasibility are enforced as penalty functions on discretized points instead of hard constraints. In terms of large scale trajectory ( $\geq 10^{3}$ ), the authors in [12] proved that the minimum snap problem with fixed time allocation can be solved in linear complexity by exploiting the structure of optimality conditions. The computational complexity has been further improved in [13] by eliminating a matrix inversion operation. An outer loop with analytical projected gradient was introduced for [12] in [14] such that the time allocation can be optimized. However, the computational complexity is unspecified for the case with safety and dynamical feasibility constraints.

II-B DDP algorithms

The DDP algorithm firstly introduced in [15] enjoys linear computational complexity (w.r.t. horizon) and local quadratic convergence [16] and has been applied in trajectory optimization of various systems, e.g., biped walking [17] and robotic manipulation [18]. Much research has been devoted to generalizing the DDP to the case with inequality constraints and can be generally classified into two categories. The first category converts the constrained problems to unconstrained ones via penalty [19] while the other deals with the constraints explicitly by identifying the active ones [20]. To avoid the combinatorial problem regarding the active constraints, a constrained version of Bellman’s principle of optimality has been introduced in [7, 21], which augments the control input with dual variables. However, these DDP algorithms have only been applied to a short-duration (e.g. $2\sim 3\mathrm{s}$ , $200\sim 300$ steps of sampling period $0.01\mathrm{s}$ ) quadrotor planning problem and takes quite long time for simple constraints ( $\geq 5\mathrm{s}$ , see Table 7 in [21]). In this paper, we shall exploit the differential flatness of quadrotors and the linear complexity property of DDP algorithm to generate long-duration smooth trajectories efficiently.

III DDP based framework for trajectory generation

In this section, we shall first introduce our proposed state-space representation for polynomial trajectories and then present our algorithm for jointly optimizing energy and time.

III-A State-space representation

We consider the problem of generating an $N$ -segment $n$ -th order polynomial trajectory in $d$ -dimensional space with the $k$ -th segment $\mathbf{p}_{k}\in\mathbb{R}^{d}$ being parametrized by:

\mathbf{p}_{k}(t)=\mathbf{C}_{k}^{\top}\mathbf{b}(t),t\in[0,t_{k}],

(1)

where $\mathbf{C}_{k}\in\mathbb{R}^{(n+1)\times d}$ is the coefficient matrix and $\mathbf{b}(t)=[1,t,\dots,t^{n}]^{\top}$ is the $n$ -order monomial basis. Hence, the entire trajectory on $[0,\tau_{N-1}]$ can be defined with

\mathbf{p}(t)=\mathbf{p}_{k}(t-\tau_{k-1})=\mathbf{C}_{k}^{\top}\mathbf{b}(t-\tau_{k-1})

for $t\in[\tau_{k-1},\tau_{k}]$ where $\tau_{-1}=0$ and $\tau_{k}=\sum_{i=0}^{k}t_{k}$ , $k\in\mathcal{I}_{N}$ .

Following [4, 13, 11], we consider the problem of enforcing the smoothness of trajectory up to the $(m-1)$ -th ( $m=\frac{n+1}{2}$ ) order derivative and optimizing on the energy cost up to the $m$ -th order derivative.

We consider a single segment $k$ . By (1), the collection of trajectory up to $(m-1)$ -th order derivative can be written as follows:

\mathbf{p}_{k}^{[m]}(t)=\mathbf{C}_{k}^{\top}\mathbf{b}^{[m]}(t).

(2)

Collecting both the endpoints, one has:

[\mathbf{p}_{k}^{[m]}(0),\mathbf{p}_{k}^{[m]}(t_{k})]=\mathbf{C}_{k}^{\top}[\mathbf{b}^{[m]}(0),\mathbf{b}^{[m]}(t_{k})]:=\mathbf{C}_{k}^{\top}\mathbf{F}_{k}^{\top},

(3)

where $\mathbf{F}_{k}$ can be expressed explicitly with a $2$ -by- $2$ block matrix [13], i.e., $\mathbf{F}_{k}=\left[\begin{smallmatrix}\mathbf{F}_{k,11}&\mathbf{0}\\ \mathbf{F}_{k,21}&\mathbf{F}_{k,22}\\ \end{smallmatrix}\right]$ , where each block is of size $m\times m$ , and

_ij	$\displaystyle=\begin{cases}\begin{aligned} (i-1)!,&~{}\mathrm{if}~{}i=j,\\ 0,&~{}\mathrm{otherwise},\end{aligned}\end{cases}$	(4)
$\displaystyle[\mathbf{F}_{k,21}]_{ij}$	$\displaystyle=\begin{cases}\begin{aligned} \frac{(j-1)}{(j-i)!}t_{k}^{j-i},&~{}\mathrm{if}~{}i\leq j,\\ 0,&~{}\mathrm{otherwise},\end{aligned}\end{cases}$
$\displaystyle[\mathbf{F}_{k,22}]_{ij}$	$\displaystyle=\frac{(m+j-1)!}{(m+j-i)!}t_{k}^{m+j-i}.$

Next, we partition $\mathbf{C}_{k}$ according to the block structure of $\mathbf{F}_{k}$ , i.e., $\mathbf{C}_{k}=[\mathbf{C}_{k,1}^{\top},\mathbf{C}_{k,2}^{\top}]^{\top}$ and obtain the following equality:

(\mathbf{p}_{k}^{[m]}(t_{k}))^{\top}=\mathbf{F}_{k,21}\mathbf{F}_{k,11}^{-1}(\mathbf{p}_{k}^{[m]}(0))^{\top}+\mathbf{F}_{k,22}\mathbf{C}_{k,2},

(5)

where $\mathbf{C}_{k,1}$ was substituted with $\mathbf{F}_{k,11}^{-1}(\mathbf{p}_{k}^{[m]}(0))^{\top}$ .

Letting $\mathbf{x}_{k}=\operatorname{vec}(\mathbf{p}_{k}^{[m]}(0))$ , $\mathbf{v}_{k}=\operatorname{vec}(\mathbf{C}_{k,2}^{\top})$ , $\mathbf{u}_{k}=[\mathbf{v}_{k}^{\top},t_{k}]^{\top}$ and by the continuity condition $\mathbf{p}_{k}^{[m]}(t_{k})=\mathbf{p}_{k+1}^{[m]}(0)$ , one has

\displaystyle\mathbf{x}_{k+1}=\mathbf{A}(t_{k})\mathbf{x}_{k}+\mathbf{B}(t_{k})\mathbf{v}_{k}:=\mathbf{f}(\mathbf{x}_{k},\mathbf{u}_{k}),

(6)

where $\mathbf{A}(t_{k})=(\mathbf{F}_{k,21}\mathbf{F}_{k,11}^{-1})\otimes\mathbf{I}_{d}$ and $\mathbf{B}(t_{k})=\mathbf{F}_{k,22}\otimes\mathbf{I}_{d}$ . As shown in Fig. 2, this operation transfers the original representation of trajectory into a state-space equation, where the system state is the state (position, velocity, acceleration, etc.) at each endpoint and the control input is a concatenation of half of the coefficients of the segment (i.e., $\mathbf{C}_{k,2}$ ) and duration (i.e., $t_{k}$ ). Unlike previous application of DDP algorithms in quadrotor planning and control where direct discretization is used, this representation enables the characterization of a long-duration trajectory with fewer number of variables.

III-B Objective function

Previous results on generating polynomial trajectories (e.g., [4, 13]) usually consider the so-called interpolating splines [22], where the spline must pass some prescribed waypoints exactly at specific times. However, for the trajectory generation problem for time-critical aggressive flights, quite often it is sufficient that the generated trajectories only pass through the vicinity of these waypoints, i.e., smoothing splines [23]. In this paper, we aim to find a smoothing spline such that the following objective function can be minimized:

	$\displaystyle J(\mathcal{C},\mathcal{T})=$	$\displaystyle\sum_{k=0}^{N-1}(\\|\mathbf{x}_{k}-\mathbf{x}_{k,\mathrm{g}}\\|_{\mathbf{Q}_{k}}^{2}+\int_{0}^{t_{k}}\sum_{i=1}^{m}\eta_{i,k}\\|\mathbf{p}_{k}^{(i)}(t)\\|^{2}\differential t$		(7)
		$\displaystyle\quad\quad+w_{k}t_{k}^{2})+\\|\mathbf{x}_{N}-\mathbf{x}_{N,\mathrm{g}}\\|_{\mathbf{Q}_{N}}^{2},$		(7)

where $\mathcal{C}:=\{\mathbf{C}_{k}\}_{k\in\mathcal{I}_{N}}$ and $\mathcal{T}:=\{t_{k}\}_{k\in\mathcal{I}_{N}}$ . In this cost function, the first term is designed to minimize the discrepancy between the current state $\mathbf{x}_{k}$ and the desired state $\mathbf{x}_{k,\mathrm{g}}$ at the beginning of each segment, which “attracts” the endpoint to a prescribed waypoint; the second and third terms penalize the aggressiveness of planned trajectory up to the $m$ -th order and flight time during the $k$ -th segment, respectively; the last term penalizes the discrepancy between the terminal state $\mathbf{x}_{N}$ and the goal state $\mathbf{x}_{N,\mathrm{g}}$ ; and $\mathbf{Q}_{k}$ , $\eta_{i,k}$ , $w_{k}$ , ( $k\in\mathcal{I}_{N}$ ), and $\mathbf{Q}_{N}$ denote the weights for each term. It can be found that different from the literature (e.g., [4, 13]) where hard constraints for the endpoints of $\mathbf{p}_{k}$ were used, we have softened these constraints by incorporating them into an objective function.

For usually considered minimum jerk or minimum snap trajectories, $\mathbf{Q}_{k}=\mathbf{0}$ , $\eta_{i,k}=0$ for all $i=1,\dots,m-1$ and $k\in\mathcal{I}_{N}$ . In this case, one can partition $\mathbf{b}^{(m)}(t)(\mathbf{b}^{(m)}(t))^{\top}$ into $2$ -by- $2$ block matrix with the size of each block being $m\times m$ .

Observing that only the right-bottom block is non-zero and denoting it by $\mathbf{R}_{\mathbf{v}}(t)$ , one can rewrite (7) as

J(\mathcal{U})=\sum_{k=0}^{N-1}(\|\mathbf{x}_{k}-\mathbf{x}_{k,\mathrm{g}}\|_{\mathbf{Q}_{k}}^{2}+\|\mathbf{u}_{k}\|_{\mathbf{R}_{k}}^{2})+\|\mathbf{x}_{N}-\mathbf{x}_{N,\mathrm{g}}\|_{\mathbf{Q}_{N}}^{2}

(8)

with $\mathcal{U}:=\{\mathbf{u}_{k}\}_{k\in\mathcal{I}_{N}}$ and $\mathbf{R}_{k}=\operatorname{diag}(\eta_{m,k}\int_{0}^{t_{k}}\mathbf{R}_{\mathbf{v}}(t)\differential t,w_{k})$ .

Remark III.1.

For the case with $\mathbf{Q}_{k}\neq\mathbf{0}$ , $\eta_{i,k}>0$ for some $i=1,\dots,m-1$ and $k\in\mathcal{I}_{N}$ , an equation similar to (8) can be obtained by using

	$\displaystyle\\|\mathbf{p}_{k}^{(i)}(t)\\|^{2}$	$\displaystyle=(\mathbf{b}^{(i)}(t))^{\top}\mathbf{C}_{k}\mathbf{C}_{k}^{\top}\mathbf{b}^{(i)}(t)$
		$\displaystyle=(\operatorname{vec}(\mathbf{C}_{k}^{\top}))^{\top}[\mathbf{b}^{(i)}(t)(\mathbf{b}^{(i)}(t))^{\top}\otimes\mathbf{I}_{d}]\operatorname{vec}(\mathbf{C}_{k}^{\top})$

and $\operatorname{vec}(\mathbf{C}_{k}^{\top})=\left[\begin{smallmatrix}(\mathbf{F}_{k,11}^{-1}\otimes\mathbf{I}_{d})\mathbf{x}_{k}\\ \mathbf{v}_{k}\end{smallmatrix}\right]$ . In addition, while other general differentiable functions can be used for each cost term in (7), we use a quadratic form for simplicity.

III-C Constraints

In general, there are two types of methods handling the safety and dynamical feasibility constraints. The first method discretizes each segment of trajectory and adds inequality constraints on the states at these time instants, while the issue is that the feasibility for the continuous time interval cannot be guaranteed. The other method is to add constraints on a set of control points, which are the vertices of the convex hull containing the segment, but it suffers from conservativeness. A recent work [24] proposes the MINVO basis, which has been shown to be less conservative than the widely used Bezier basis. We use the MINVO basis to construct a set of control points for our polynomial trajectory. Denote the $i$ -th order control points by $\bm{\Xi}_{k}^{\langle i\rangle}\in\mathbb{R}^{(n+1-i)\times d}$ , which can be computed by:

\bm{\Xi}_{k}^{\langle i\rangle}=\mathbf{T}^{\langle i\rangle}(t_{k})\mathbf{C}_{k},

(9)

where $\mathbf{T}^{\langle i\rangle}(t_{k})\in\mathbb{R}^{(n+1-i)\times(n+1)}$ represents the transformation matrix from polynomial coefficients to the $i$ -th order control points.

Assuming that a safe flight corridor (consists of $N$ convex polyhedra) can be generated from perception system and polyhedron $k$ is represented by $s_{k}$ hyperplanes. With (9), the safety constraint can be written as:

\mathbf{W}_{k}\cdot(\bm{\Xi}_{k}^{\langle 0\rangle})^{\top}\preceq(\mathbf{1}_{n+1}^{\top}\otimes\mathbf{h}_{k}^{\langle 0\rangle}),

(10)

where each row of $\mathbf{W}_{k}\in\mathbb{R}^{s_{k}\times d}$ and $\mathbf{h}_{k}^{\langle 0\rangle}\in\mathbb{R}^{s_{k}}$ corresponds to the parameters of a single hyperplane in polyhedron $k$ .

On the other hand, the dynamical feasibility is encoded by the following box constraint on the $i$ -th order control points $\bm{\Xi}_{k}^{\langle i\rangle}$ with $i\in[1,m-1]$ :

\underline{h}_{k}^{\langle i\rangle}\mathbf{I}_{d\times(n+1-i)}\preceq(\bm{\Xi}_{k}^{\langle i\rangle})^{\top}\preceq\overline{h}_{k}^{\langle i\rangle}\mathbf{I}_{d\times(n+1-i)},

(11)

where $\underline{h}_{k}^{\langle i\rangle}<0$ and $\overline{h}_{k}^{\langle i\rangle}>0$ are the lower and upper bounds (e.g., minimal / maximal velocity / acceleration), respectively.

Additionally, one more inequality constraint on time can be introduced to avoid negative time allocation during optimization:

t_{k}\geq\underline{t},

(12)

where $\underline{t}\in\mathbb{R}_{+}$ .

Concatenating the vectorized forms of (10) and (11) with (12), one has

\mathbf{g}_{k}:=\operatorname{diag}(\mathbf{H}_{k},-1)[\mathbf{x}_{k}^{\top},\mathbf{u}_{k}^{\top}]^{\top}-[\mathbf{h}_{k}^{\top},-\underline{t}]^{\top}\preceq\mathbf{0},

(13)

where

\mathbf{H}_{k}=\left[\begin{smallmatrix}\mathbf{T}^{\langle 0\rangle}(t_{k})\otimes\mathbf{W}_{k}\\ -\mathbf{T}^{\langle 1\rangle}(t_{k})\otimes\mathbf{I}_{d}\\ \mathbf{T}^{\langle 1\rangle}(t_{k})\otimes\mathbf{I}_{d}\\ \vdots\\ -\mathbf{T}^{\langle m-1\rangle}(t_{k})\otimes\mathbf{I}_{d}\\ \mathbf{T}^{\langle m-1\rangle}(t_{k})\otimes\mathbf{I}_{d}\\ \end{smallmatrix}\right]\operatorname{diag}(\mathbf{F}_{k,11}^{-1}\otimes\mathbf{I}_{d},\mathbf{I}_{md})

and

	$\displaystyle\mathbf{h}_{k}$	$\displaystyle=[(\mathbf{1}_{n+1}\otimes\mathbf{h}_{k}^{\langle 0\rangle})^{\top},-\underline{h}_{k}^{\langle 1\rangle}\mathbf{1}_{nd}^{\top},\overline{h}_{k}^{\langle 1\rangle}\mathbf{1}_{nd}^{\top},\cdots,$
		$\displaystyle\quad\quad-\underline{h}_{k}^{\langle m-1\rangle}\mathbf{1}_{(n-m+2)d}^{\top},\overline{h}_{k}^{\langle m-1\rangle}\mathbf{1}_{(n-m+2)d}^{\top}]^{\top}.$

III-D The overall problem and algorithm

With the objective function and constraints constructed above, we can formulate the trajectory generation problem as the following multi-stage constrained optimal control problem:

	$\displaystyle\min_{\mathcal{U}}$	$\displaystyle~{}~{}\eqref{eq:obj1}$		(14)
	$\displaystyle\mathrm{s.t.}$	$\displaystyle~{}~{}\eqref{eq:state},\eqref{eq:cons},\mathbf{x}_{0}~{}\mathrm{is~{}given}.$		(14)

It should be noted that the constraint and objective function in (14) are respectively in linear and quadratic forms, in fact, all the matrices involved are functions of $t_{k}$ and hence they are not linear-quadratic. Therefore, we shall solve the formulated problem via IPDDP [7], which generalizes the traditional unconstrained DDP to a constrained version. Similar to DDP, the IPDDP algorithm iterates between backward pass, which computes control inputs to minimize a quadratic approximation of cost in the vicinity of the current trajectory, and forward pass, which updates the current trajectory to a new one, until some terminating condition is triggered.

Define $V_{N}=\|\mathbf{x}_{N}-\mathbf{x}_{N,\mathrm{g}}\|_{\mathbf{Q}_{N}}^{2}$ . Let $\ell_{k}=\|\mathbf{x}_{k}-\mathbf{x}_{k,\mathrm{g}}\|_{\mathbf{Q}_{k}}^{2}+\|\mathbf{u}_{k}\|_{\mathbf{R}_{k}}^{2}+\bm{\lambda}_{k}^{\top}\mathbf{g}_{k}$ , and $L_{k}=\ell_{k}+V_{k+1}$ for all $k\in\mathcal{I}_{N}$ , where $\bm{\lambda}_{k}\succeq\mathbf{0}$ is the Lagrange multiplier. In the following, we shall omit subscript $k$ and denote $(\cdot)_{k+1}$ by $(\cdot)^{+}$ for clarity. Letting $(\cdot)_{\mathbf{a}}:=\partialderivative{(\cdot)}{\mathbf{a}}$ and $(\cdot)_{\mathbf{ab}}:=\partialderivative{(\cdot)}{\mathbf{a}}{\mathbf{b}}$ , one has the following iteration which resembles traditional DDP:

$\displaystyle L_{\bm{\lambda}}$	$\displaystyle=\mathbf{g},L_{\bm{\lambda}\mathbf{x}}=\mathbf{g}_{\mathbf{x}},L_{\bm{\lambda}\mathbf{u}}=\mathbf{g}_{\mathbf{u}},L_{\bm{\lambda\lambda}}=\mathbf{0},$	(15)
$\displaystyle L_{\mathbf{x}}$	$\displaystyle=\ell_{\mathbf{x}}+\mathbf{f}_{\mathbf{x}}^{\top}V_{\mathbf{x}}^{+},L_{\mathbf{u}}=\ell_{\mathbf{u}}+\mathbf{f}_{\mathbf{u}}^{\top}V_{\mathbf{u}}^{+},$
$\displaystyle L_{\mathbf{xx}}$	$\displaystyle=\ell_{\mathbf{xx}}+\mathbf{f}_{\mathbf{x}}^{\top}V_{\mathbf{xx}}^{+}\mathbf{f}_{\mathbf{x}}+V_{\mathbf{x}}^{+}\odot\mathbf{f}_{\mathbf{xx}},$
$\displaystyle L_{\mathbf{ux}}$	$\displaystyle=\ell_{\mathbf{ux}}+\mathbf{f}_{\mathbf{u}}^{\top}V_{\mathbf{ux}}^{+}\mathbf{f}_{\mathbf{x}}+V_{\mathbf{x}}^{+}\odot\mathbf{f}_{\mathbf{ux}},$
$\displaystyle L_{\mathbf{uu}}$	$\displaystyle=\ell_{\mathbf{uu}}+\mathbf{f}_{\mathbf{u}}^{\top}V_{\mathbf{uu}}^{+}\mathbf{f}_{\mathbf{u}}+V_{\mathbf{x}}^{+}\odot\mathbf{f}_{\mathbf{uu}},$

where $\odot$ denotes the tensor contraction and $\mathbf{f}$ is defined in (6). In addition to the decision variable $\delta\mathbf{u}_{k}$ (i.e., the first order variation of $\mathbf{u}_{k}$ ), the primal-dual version of DDP introduces the additional decision variable $\delta\bm{\lambda}_{k}$ . Both $\delta\mathbf{u}_{k}$ and $\delta\bm{\lambda}_{k}$ can be represented in an affine form of $\delta\mathbf{x}_{k}$ by solving the optimality conditions of an approximation of constrained Bellman equation (see $(6)$ in [7]), i.e.,

\displaystyle\begin{bmatrix}\mathbf{k}^{\mathbf{u}}&\mathbf{K}^{\mathbf{u}}\\ \mathbf{k}^{\bm{\lambda}}&\mathbf{K}^{\bm{\lambda}}\end{bmatrix}=-\begin{bmatrix}L_{\mathbf{uu}}&L_{\mathbf{u}\bm{\lambda}}\\ \bm{\Lambda}L_{\bm{\lambda}\mathbf{u}}&\mathbf{G}\end{bmatrix}^{-1}\begin{bmatrix}L_{\mathbf{u}}&L_{\mathbf{ux}}\\ \mathbf{r}&\bm{\Lambda}L_{\bm{\lambda}\mathbf{x}}\end{bmatrix},

(16)

where $\mathbf{k}^{\mathbf{u}}$ , $\mathbf{K}^{\mathbf{u}}$ , $\mathbf{k}^{\bm{\lambda}}$ and $\mathbf{K}^{\bm{\lambda}}$ are the update gains for $\delta\mathbf{u}_{k}$ and $\delta\bm{\lambda}_{k}$ , respectively; $\mathbf{r}=\bm{\Lambda}\mathbf{g}+\mu\mathbf{1}$ , $\mathbf{G}=\operatorname{diag}(\mathbf{g})$ , $\bm{\Lambda}=\operatorname{diag}(\bm{\lambda})$ and $\mu$ is a perturbation constant.

With these at hand, the gradient and Hessian of $V_{k}$ can be updated by

	$\displaystyle V_{\mathbf{x}}$	$\displaystyle=\hat{L}_{\mathbf{x}}+\hat{L}_{\mathbf{xu}}\mathbf{k}^{\mathbf{u}},$		(17)
	$\displaystyle V_{\mathbf{xx}}$	$\displaystyle=\hat{L}_{\mathbf{xx}}+\hat{L}_{\mathbf{xu}}\mathbf{K}^{\mathbf{u}},$		(17)

with the following equation:

$\displaystyle\hat{L}_{\mathbf{x}}$	$\displaystyle=L_{\mathbf{x}}-L_{\mathbf{x}\bm{\lambda}}\mathbf{G}^{-1}\mathbf{r},$	(18)
$\displaystyle\hat{L}_{\mathbf{u}}$	$\displaystyle=L_{\mathbf{u}}-L_{\mathbf{u}\bm{\lambda}}\mathbf{G}^{-1}\mathbf{r},$
$\displaystyle\hat{L}_{\mathbf{xx}}$	$\displaystyle=L_{\mathbf{xx}}-L_{\mathbf{x}\bm{\lambda}}\mathbf{G}^{-1}\bm{\Lambda}L_{\bm{\lambda}\mathbf{x}},$
$\displaystyle\hat{L}_{\mathbf{ux}}$	$\displaystyle=L_{\mathbf{ux}}-L_{\mathbf{u}\bm{\lambda}}\mathbf{G}^{-1}\bm{\Lambda}L_{\bm{\lambda}\mathbf{x}},$
$\displaystyle\hat{L}_{\mathbf{uu}}$	$\displaystyle=L_{\mathbf{uu}}-L_{\mathbf{u}\bm{\lambda}}\mathbf{G}^{-1}\bm{\Lambda}L_{\bm{\lambda}\mathbf{u}},$

where $\delta\bm{\lambda}_{k}$ was eliminated.

The forward pass resembles the traditional DDP while updating both the control input $\mathbf{u}_{k}$ and the dual variable $\bm{\lambda}_{k}$ :

$\displaystyle\mathbf{x}_{0}^{\dagger}$	$\displaystyle=\mathbf{x}_{0},$	(19)
$\displaystyle\mathbf{u}_{k}^{\dagger}$	$\displaystyle=\mathbf{u}_{k}+\mathbf{k}^{\mathbf{u}}_{k}+\mathbf{K}^{\mathbf{u}}_{k}(\mathbf{x}_{k}^{\dagger}-\mathbf{x}_{k}),$
$\displaystyle\bm{\lambda}_{k}^{\dagger}$	$\displaystyle=\bm{\lambda}_{k}+\mathbf{k}^{\bm{\lambda}}_{k}+\mathbf{K}^{\bm{\lambda}}_{k}(\mathbf{x}_{k}^{\dagger}-\mathbf{x}_{k}),$
$\displaystyle\mathbf{x}_{k+1}^{\dagger}$	$\displaystyle=\mathbf{f}(\mathbf{x}_{k}^{\dagger},\mathbf{u}_{k}^{\dagger}),$

where $\mathbf{x}^{\dagger}$ , $\mathbf{u}^{\dagger}$ , and $\bm{\lambda}^{\dagger}$ denote the new state, control variable, and dual variable, respectively.

Algorithm 1 presents the entire algorithm for generating a polynomial trajectory using IPDDP. It should be noted that the presented algorithm requires a feasible initial trajectory. This can be generated from a similar adaptation of the Infeasible-IPDDP algorithm in [7].

Algorithm 1 DIRECT

\mathbf{x}_{k,\mathrm{g}}

\mathbf{Q}_{k}

\eta_{m,k}

w_{k}

\mathbf{x}_{k,\mathrm{g}}

k\in\mathcal{I}_{N}

\mathbf{Q}_{N}

, a feasible trajectory with

\mathcal{C}

and

\mathcal{T}

\mathcal{C}^{\ast}

\mathcal{T}^{\ast}

Convert

\mathcal{C}

\mathcal{T}

\mathcal{U}

for

\mathrm{iter}=1:\mathrm{max\_iteration}

Set

V_{N}

V_{\mathbf{x},N}

V_{\mathbf{xx},N}

for

k=N-1,\dots,0

Evaluate (15)

Compute control inputs and auxiliary matrices from (16) and (18)

Update

V_{\mathbf{x}}

V_{\mathbf{xx}}

according to (17)

end for

Set

\mathbf{x}_{0}^{\dagger}=\mathbf{x}_{0}

for

k=0,\dots,N-1

Update the control variable

\mathbf{u}_{k}^{\dagger}

, multiplier

\bm{\lambda}_{k}^{\dagger}

and next state

\mathbf{x}_{k+1}^{\dagger}

according to (19)

end for

if termination condition is satisfied then

break

end if

end for

Extract

\mathcal{U}^{\ast}

and convert back to

\mathcal{C}^{\ast}

and

\mathcal{T}^{\ast}

IV Specializations

In last section, we have introduced an algorithm for jointly optimizing energy and time subject to safety and dynamical feasibility constraints. In that algorithm, part of the polynomial coefficient $\mathbf{C}_{k,2}$ (i.e., $\mathbf{v}_{k}$ ) and allocated time $t_{k}$ for segment $k$ are both regarded as control inputs in the formulation (14). In this section, we shall show that this framework can be reduced to two other types of simpler problems.

IV-A Fixed time allocation

It can be easily found that the above algorithm can be used for constrained polynomial trajectory generation with preallocated time. In particular, letting $t_{k}=\operatorname{const}$ and redefining $\mathbf{u}_{k}=\mathbf{v}_{k}$ for all $k\in\mathcal{I}_{N}$ , $\eqref{eq:state}$ turns to be a linear-time-varying system and $\eqref{eq:cons}$ is a linear-time-varying constraint w.r.t. $[\mathbf{x}_{k}^{\top},\mathbf{v}_{k}^{\top}]^{\top}$ ²²2The “time-varying” is in the sense of $k$ .. The problem can be formally written as:

$\displaystyle\min_{\mathcal{V}}$	$\displaystyle~{}~{}J(\mathcal{V})=\sum_{k=0}^{N-1}(\\|\mathbf{x}_{k}-\mathbf{x}_{k,\mathrm{g}}\\|_{\mathbf{Q}_{k}}^{2}+\\|\mathbf{v}_{k}\\|_{\mathbf{R}_{k}}^{2})$	(20)
	$\displaystyle\quad\quad\quad\quad\quad+\\|\mathbf{x}_{N}-\mathbf{x}_{N,\mathrm{g}}\\|_{\mathbf{Q}_{N}}^{2}$
$\displaystyle\mathrm{s.t.}$	$\displaystyle~{}~{}\mathbf{x}_{k+1}=\mathbf{A}_{k}\mathbf{x}_{k}+\mathbf{B}_{k}\mathbf{v}_{k},$
	$\displaystyle~{}~{}\mathbf{H}_{k}[\mathbf{x}_{k}^{\top},\mathbf{v}_{k}^{\top}]^{\top}\preceq\mathbf{h}_{k},$
	$\displaystyle~{}~{}\mathbf{x}_{0}~{}\mathrm{is~{}given},$

where $\mathcal{V}:=\{\mathbf{v}_{k}\}_{k\in\mathcal{I}_{N}}$ . In this case, Algorithm 1 can still be used for solving such type of problem³³3It is also a typical linear-time-varying model predictive control (MPC) problem, which can be solved by existing MPC algorithms., with all $\ell$ and $\mathbf{f}$ related terms in (15) being precomputed and saved during the initialization stage instead of being reevaluated at each iteration.

IV-B Fixed time allocation and no constraint

If neither safety nor dynamical feasibility constraint is considered, one has the following problem:

$\displaystyle\min_{\mathcal{V}}$	$\displaystyle~{}~{}J(\mathcal{V})=\sum_{k=0}^{N-1}(\\|\mathbf{x}_{k}-\mathbf{x}_{k,\mathrm{g}}\\|_{\mathbf{Q}_{k}}^{2}+\\|\mathbf{v}_{k}\\|_{\mathbf{R}_{k}}^{2})$	(21)
	$\displaystyle\quad\quad\quad\quad\quad+\\|\mathbf{x}_{N}-\mathbf{x}_{N,\mathrm{g}}\\|_{\mathbf{Q}_{N}}^{2}$
$\displaystyle\mathrm{s.t.}$	$\displaystyle~{}~{}\mathbf{x}_{k+1}=\mathbf{A}_{k}\mathbf{x}_{k}+\mathbf{B}_{k}\mathbf{v}_{k},$
	$\displaystyle~{}~{}\mathbf{x}_{0}~{}\mathrm{is~{}given}.$

We show that (21) is indeed a LQT problem by proving that $\mathbf{R}_{k}\succ\mathbf{0}$ . Firstly, the positive semi-definiteness $\mathbf{R}_{k}=\eta_{m,k}\int_{0}^{t_{k}}\mathbf{R}_{\mathbf{v}}(t)\differential t$ can be easily established by the linearity of integral. Secondly, denoting the last $m$ elements of $\mathbf{b}^{(m)}(t)$ by $\bm{\sigma}(t)$ , one has

	$\displaystyle\bm{\sigma}(t)$	$\displaystyle=[\frac{m!}{0!}t^{0},\cdots,\frac{n!}{(m-1)!}t^{m-1}]^{\top}$
		$\displaystyle=\operatorname{diag}(\{(m+i)!\}_{i=0}^{m-1})[\frac{1}{0!}t^{0},\cdots,\frac{1}{(m-1)!}t^{m-1}]^{\top}$
		$\displaystyle=:\bm{\Psi}\bm{\tau}.$

On the other hand, $\bm{\tau}$ can be written as $\bm{\tau}=e^{\underline{\mathbf{A}}^{\top}t}\mathbf{e}_{1}$ with $\underline{\mathbf{A}}=\left[\begin{smallmatrix}\mathbf{0}_{m-1}&\mathbf{I}_{m-1}\\ 0&\mathbf{0}_{m-1}^{\top}\end{smallmatrix}\right]$ and $\mathbf{e}_{1}=[1,\mathbf{0}_{m-1}^{\top}]^{\top}$ . Since the pair $(\underline{\mathbf{A}},\mathbf{e}_{1}^{\top})$ is observable, the matrix $\int_{0}^{\tau}e^{\underline{\mathbf{A}}^{\top}t}\mathbf{e}_{1}\mathbf{e}_{1}^{\top}e^{\underline{\mathbf{A}}t}\differential t$ is nonsingular for any $\tau>0$ . Therefore,

\int_{0}^{t_{k}}\mathbf{R}_{\mathbf{v}}(t)\differential t=\int_{0}^{t_{k}}\bm{\Psi}\bm{\tau}\bm{\tau}^{\top}\bm{\Psi}^{\top}\differential t=\bm{\Psi}\left(\int_{0}^{t_{k}}\bm{\tau}\bm{\tau}^{\top}\differential t\right)\bm{\Psi}^{\top}

is positive definite and this shows that (21) is an LQT problem.

Let $\mathbf{P}_{N}=\mathbf{Q}_{N}$ . The backward iteration for the resulting LQT problem reduces to

$\displaystyle\mathbf{K}_{k}$	$\displaystyle=(\mathbf{B}_{k}^{\top}\mathbf{P}_{k+1}\mathbf{B}_{k}+\mathbf{R}_{k})^{-1}\mathbf{B}_{k}\mathbf{P}_{k+1}\mathbf{A}_{k},$	(22)
$\displaystyle\mathbf{P}_{k}$	$\displaystyle=\mathbf{A}_{k}^{\top}\mathbf{P}_{k+1}(\mathbf{A}_{k}-\mathbf{B}_{k}\mathbf{K}_{k})+\mathbf{Q}_{k},$
$\displaystyle\mathbf{q}_{k}$	$\displaystyle=(\mathbf{A}_{k}-\mathbf{B}_{k}\mathbf{K}_{k})^{\top}\mathbf{q}_{k+1}+\mathbf{Q}_{k}\mathbf{x}_{k,\mathrm{g}},$
$\displaystyle\mathbf{k}_{k}$	$\displaystyle=(\mathbf{B}_{k}^{\top}\mathbf{P}_{k+1}\mathbf{B}_{k}+\mathbf{R}_{k})^{-1}\mathbf{B}_{k}\mathbf{q}_{k+1},$

where $\mathbf{K}_{k}$ and $\mathbf{k}_{k}$ are the parameters for expressing the new control inputs as a linear form of current state (resembles $\mathbf{k}^{\mathbf{u}}$ and $\mathbf{K}^{\mathbf{u}}$ in (16)). The forward pass reads:

	$\displaystyle\mathbf{u}_{k}$	$\displaystyle=-\mathbf{K}_{k}\mathbf{x}_{k}+\mathbf{k}_{k},$		(23)
	$\displaystyle\mathbf{x}_{k+1}$	$\displaystyle=\mathbf{A}_{k}\mathbf{x}_{k}+\mathbf{B}_{k}\mathbf{u}_{k},$		(23)

which iteratively computes the control input and next state.

Algorithm 2 summarizes the above iteration and this gives the analytical solution to the unconstrained fixed-time polynomial trajectory generation problem (21). It should be noted that Algorithm 2 can be regarded as a simplified version of Algorithm 1 with exact one iteration.

Algorithm 2 LQT-based trajectory generation.

\mathbf{x}_{k,\mathrm{g}}

\mathbf{Q}_{k}

\eta_{m,k}

(

k\in\mathcal{I}_{N}

\mathbf{Q}_{N}

\mathcal{T}

\mathcal{C}^{\ast}

Set

\mathbf{P}_{N}=\mathbf{Q}_{N}

\mathbf{q}_{N}=\mathbf{x}_{N,\mathrm{g}}

for

k=N-1,\dots,0

Compute

\mathbf{K}_{k}

\mathbf{k}_{k}

according to (22)

end for

for

k=0,\dots,N-1

Compute the control variable

\mathbf{u}_{k}

and next state

\mathbf{x}_{k+1}

according to (23)

end for

Extract

\mathcal{U}

and convert it to

\mathcal{C}^{\ast}

V Simulation and Experimental Results

In this section, we shall demonstrate the efficacy of our proposed framework by comparing with state-of-the-art methods as well as performing real-world experiments.

V-A Benchmarks for trajectory generation methods

For trajectory generation with time optimization as well as safety and dynamical feasibility constraints, we compare our algorithm with the alternating spatial-temporal optimization[6] and bi-level optimization with analytic gradient[9].⁴⁴4According to Fig. 9 of [11], these two algorithms are the state-of-the-art methods for generating a safe and dynamical feasible trajectory with time optimization. The code for [11] is currently unavailable. In [6] the energy and time costs are separately optimized while in [11] the overall cost function takes a similar form as ours, but without the terminal cost and the time cost takes the form of $w_{k}t_{k}$ . The benchmark is conducted using an i7-8550U CPU. Our algorithm is implemented in three steps: $1)$ initialize our algorithm with Infeasible-IPDDP [7] and zero initial condition to find a feasible solution with $w_{k}=1$ and $\mathbf{Q}_{N}=\mathbf{I}$ ; $2)$ feed the obtained solution into Algorithm 1 with $w_{k}=20$ and $\mathbf{Q}_{N}=100\mathbf{I}$ to jointly optimize energy and time; and $3)$ feed the obtained solution into Algorithm 1 with fixed time allocation setting (see Section IV-A). The algorithm in [6] is implemented as it is (the coefficient of time is set as $10$ ). The algorithm in [9] is implemented in C++ with MOSEK⁵⁵5https://www.mosek.com/ solver for the inner loop QP and backtracking line search for the outer loop and the coefficient of time is set as $20$ . We randomly generate $10$ paths with $64$ safety flight corridors (the facet number ranges from $6$ to $117$ , and is $28$ on average) and extract $2\sim 64$ of them to implement these three algorithms. The initial allocation time is set by the subroutine in [6]. We consider the problem of generating minimum jerk trajectory with $n=5$ , where the maximal velocity and acceleration are set as $2\mathrm{m/s}$ and $2\mathrm{m/s^{2}}$ , respectively. The success rate versus number of segments is shown in Fig. 3. The success rate is dropping in general with the increasing number of segments. The success rate of our algorithm is marginally higher than Gao’s in some cases since the terminal constraint is softened in our formulation. As a result of terminal constraints and enforcing the dynamical feasibility via control points in inner QP, Sun’s method has the lowest success rate. The overall computation time and the reduction rate of the flight time (computed by $[\sum_{k}^{N-1}(t_{k}^{\mathrm{ini}}-t_{k}^{\ast})]/\sum_{k}^{N-1}t^{\mathrm{ini}}_{k}$ , where $t_{k}^{\mathrm{ini}}$ is the initial allocated time for segment $k$ and $t_{k}^{\ast}$ is the optimized allocated time computed by each algorithm) and the control effort are also compared in Fig. 3. It can be found that for middle number of segments ( $N\leq 40$ ), our algorithm consumes less time than the other two algorithms. For $N>40$ , our algorithm still consumes less time than Gao’s while more than Sun’s. The reason is that the outer loop of Sun’s method terminates in only a few iterations and gets stuck in local minimum in these cases, which can be seen from the relatively low reduction rate in Fig. 3. In addition, there is a trade-off between flight time and control effort: a shorter flight time (i.e., higher reduction rate) corresponds to a higher control effort. Unlike Gao’s method which has increasing reduction rate and control effort w.r.t. $N$ , our algorithm maintains relatively consistent reduction rate and level of control energy. Since our formulation also incorporates the terminal cost term (see (14)), the change of this term affects the other two terms and hence causes the fluctuation of control energy in our approach, as observed in Fig. 3. Overall, for the application of trajectory generation with $N\in[10,30]$ , our algorithm achieves comparable results with Gao’s method while consumes less runtime.

For trajectory generation with fixed time allocation and without constraint, we compare our Algorithm 2 with [13]. Our parameters are set as $\eta_{m,k}=10^{-5}$ , $\mathbf{Q}_{k}=100\mathbf{I}$ , $k\in\mathcal{I}_{N}$ , $\mathbf{Q}_{N}=100\mathbf{I}$ . We randomly sample $100$ times for each number of waypoints and run $4$ algorithms ( $2$ for minimum jerk trajectory and $2$ for minimum snap trajectory). The average runtime versus number of segments is shown in Fig. 4. It can be found that all the algorithms enjoy linear complexity w.r.t. number of segments. Although our algorithm for minimum jerk trajectory consumes around three times the time of [13], the gap for minimum snap trajectory is much narrower(around 1.5 times). Furthermore, based on the results reported in [13], our algorithm is faster than the optimized version of algorithm in [12], which is one order of magnitude slower than [13] (we do not plot the result of [12] as the code is not available.)

V-B Flight experiments

The proposed framework is implemented in ROS and we conduct multiple flight experiments of a small quadrotor executing the trajectories generated using our method. We use stereo camera fusion with IMU[25] for localization and state estimation of the quadrotor in the environment and use the robust and perfect tracking (RPT) controller [26] for trajectory tracking. Firstly, we conduct a set of experiments where a long trajectory is generated in a pre-built map consisting of two rooms and multiple obstacles, as shown in Fig. 1. The trajectory is generated by the i7 onboard computer and executed by the quadrotor immediately. We use multiple settings of dynamical feasibility. The trajectory with maximal velocity and acceleration set as $2.5\mathrm{m/s}$ and $3\mathrm{m/s^{2}}$ is shown in Fig. 5. In the second set of experiments the onboard computer generates safe and feasible trajectories online to reach target points set by a commander remotely. The video of the experiments can be found online⁶⁶6https://youtu.be/BM8_ABM_2VM or in the supplemental material.

VI Conclusion and Future Work

In this paper, we have proposed to use a state-space equation to characterize a piecewise-polynomial trajectory such that the trajectory generation problem (either with or without time optimization and constraints) can be subsequently solved via DDP algorithm. The proposed framework is implemented in C++ without the use of solvers, and the efficiency of our proposed framework was demonstrated by comparing with state-of-the-art methods as well as real-world experiments.

The proposed framework opens up opportunity for further developments. For example, the joint energy-time optimization can be extended to a multi-robot scenario with non-collision constraints. The fixed-time optimization can be used for online trajectory replanning due to its fast computation.

References

[1] J. Hu, J. Xu, and L. Xie, “Cooperative search and exploration in robotic networks,” Unmanned Systems, vol. 1, no. 1, pp. 121–142, 2013.
[2] P. Ogren, E. Fiorelli, and N. E. Leonard, “Cooperative control of mobile sensor networks: Adaptive gradient climbing in a distributed environment,” IEEE Transactions on Automatic Control, vol. 49, no. 8, pp. 1292–1302, 2004.
[3] Y. Yue, C. Yang, Y. Wang, P. C. N. Senarathne, J. Zhang, M. Wen, and D. Wang, “A multilevel fusion system for multirobot 3-d mapping using heterogeneous sensors,” IEEE Systems Journal, published online, 2019.
[4] D. Mellinger and V. Kumar, “Minimum snap trajectory generation and control for quadrotors,” in 2011 IEEE International Conference on Robotics and Automation, pp. 2520–2525, IEEE, 2011.
[5] A. Bry, C. Richter, A. Bachrach, and N. Roy, “Aggressive flight of fixed-wing and quadrotor aircraft in dense indoor environments,” The International Journal of Robotics Research, vol. 34, no. 7, pp. 969–1002, 2015.
[6] F. Gao, L. Wang, B. Zhou, X. Zhou, J. Pan, and S. Shen, “Teach-repeat-replan: A complete and robust system for aggressive flight in complex environments,” IEEE Transactions on Robotics, vol. 36, no. 5, pp. 1526–1545, 2020.
[7] A. Pavlov, I. Shames, and C. Manzie, “Interior point differential dynamic programming,” IEEE Transactions on Control Systems Technology, early access, 2021.
[8] S. Liu, M. Watterson, K. Mohta, K. Sun, S. Bhattacharya, C. J. Taylor, and V. Kumar, “Planning dynamically feasible trajectories for quadrotors using safe flight corridors in 3-d complex environments,” IEEE Robotics and Automation Letters, vol. 2, no. 3, pp. 1688–1695, 2017.
[9] W. Sun, G. Tang, and K. Hauser, “Fast uav trajectory optimization using bilevel optimization with analytical gradients,” IEEE Transactions on Robotics, early access, 2021.
[10] Z. Wang, X. Zhou, C. Xu, J. Chu, and F. Gao, “Alternating minimization based trajectory generation for quadrotor aggressive flight,” IEEE Robotics and Automation Letters, vol. 5, no. 3, pp. 4836–4843, 2020.
[11] Z. Wang, X. Zhou, C. Xu, and F. Gao, “Geometrically constrained trajectory optimization for multicopters,” arXiv preprint arXiv:2103.00190, 2021.
[12] D. Burke, A. Chapman, and I. Shames, “Generating minimum-snap quadrotor trajectories really fast,” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1487–1492, IEEE, 2020.
[13] Z. Wang, H. Ye, C. Xu, and F. Gao, “Generating large-scale trajectories efficiently using double descriptions of polynomials,” arXiv preprint arXiv:2011.02662, 2020.
[14] D. Burke, A. Chapman, and I. Shames, “Fast spline trajectory planning: Minimum snap and beyond,” arXiv preprint arXiv:2105.01788, 2021.
[15] D. Mayne, “A second-order gradient method for determining optimal trajectories of non-linear discrete-time systems,” International Journal of Control, vol. 3, no. 1, pp. 85–95, 1966.
[16] J. De O. Pantoja, “Differential dynamic programming and newton’s method,” International Journal of Control, vol. 47, no. 5, pp. 1539–1553, 1988.
[17] J. Morimoto, G. Zeglin, and C. G. Atkeson, “Minimax differential dynamic programming: Application to a biped walking robot,” in Proceedings 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003)(Cat. No. 03CH37453), vol. 2, pp. 1927–1932, IEEE, 2003.
[18] V. Kumar, E. Todorov, and S. Levine, “Optimal control with learned local models: Application to dexterous manipulation,” in 2016 IEEE International Conference on Robotics and Automation (ICRA), pp. 378–383, IEEE, 2016.
[19] B. Plancher, Z. Manchester, and S. Kuindersma, “Constrained unscented dynamic programming,” in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 5674–5680, IEEE, 2017.
[20] Z. Xie, C. K. Liu, and K. Hauser, “Differential dynamic programming with nonlinear constraints,” in 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 695–702, IEEE, 2017.
[21] Y. Aoyama, G. Boutselis, A. Patel, and E. A. Theodorou, “Constrained differential dynamic programming revisited,” arXiv preprint arXiv:2005.00985, 2020.
[22] Z. Zhang, J. Tomlinson, and C. Martin, “Splines and linear control theory,” Acta Applicandae Mathematica, vol. 49, no. 1, pp. 1–34, 1997.
[23] S. Sun, M. B. Egerstedt, and C. F. Martin, “Control theoretic smoothing splines,” IEEE Transactions on Automatic Control, vol. 45, no. 12, pp. 2271–2279, 2000.
[24] J. Tordesillas and J. P. How, “Minvo basis: Finding simplexes with minimum volume enclosing polynomial curves,” arXiv preprint arXiv:2010.10726, 2020.
[25] T. Qin, J. Pan, S. Cao, and S. Shen, “A general optimization-based framework for local odometry estimation with multiple sensors,” 2019.
[26] G. Cai, B. M. Chen, and T. H. Lee, Unmanned rotorcraft systems. Springer Science & Business Media, 2011.