This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

DIRECT: A Differential Dynamic Programming Based Framework for Trajectory Generation

Kun Cao, Muqing Cao, Shenghai Yuan, and Lihua Xie This research is supported by the National Research Foundation, Singapore under its Medium Sized Center for Advanced Robotics Technology Innovation. Equal contribution. K. Cao, M. Cao, S. Yuan and L. Xie (corresponding author) are with School of Electrical and Electronic Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798 (email: [email protected]; {mqcao, shyuan, elhxie}@ntu.edu.sg).
Abstract

This paper introduces a differential dynamic programming (DDP) based framework for polynomial trajectory generation for differentially flat systems. In particular, instead of using a linear equation with increasing size to represent multiple polynomial segments as in literature, we take a new perspective from state-space representation such that the linear equation reduces to a finite horizon control system with a fixed state dimension and the required continuity conditions for consecutive polynomials are automatically satisfied. Consequently, the constrained trajectory generation problem (both with and without time optimization) can be converted to a discrete-time finite-horizon optimal control problem with inequality constraints, which can be approached by a recently developed interior-point DDP (IPDDP) algorithm. Furthermore, for unconstrained trajectory generation with preallocated time, we show that this problem is indeed a linear-quadratic tracking (LQT) problem (DDP algorithm with exact one iteration). All these algorithms enjoy linear complexity with respect to the number of segments. Both numerical comparisons with state-of-the-art methods and physical experiments are presented to verify and validate the effectiveness of our theoretical findings. The implementation code will be open-sourced111https://github.com/ntu-caokun/DIRECT.

I Introduction

Recent decade has witnessed many emerging applications of unmanned systems, such as search and rescue [1], environmental monitoring [2] and mapping [3], etc. As one of the fundamental problems in robotics and control researches, trajectory planning of mobile robots has been actively studied, see [4, 5, 6].

Although polynomial-based approaches can generate energy-optimal smooth trajectories with fixed time allocation efficiently, they are usually formulated as a quadratic programming problem of big size and solved via various commercial solvers. In addition, although there are several methods considering bi-level optimization on energy and time in view of high nonlinearity and non-convexity of joint energy-time optimization, the computation complexity with respect to the number of segments is unknown in general. On the other hand, although DDP based methods enjoy linear complexity in the length of prediction horizon, they usually can only plan a relatively short time ahead for highly nonlinear systems such as quadrotor as the discretization step should be very fine.

Refer to caption
Refer to caption
Figure 1: (Top) Trajectory generated using our proposed method in a two-room environment, the safe corridor consists of 3434 polyhedra. (Bottom) Snapshots of a quadrotor executing the trajectory generated.

To address these issues, we propose DIRECT, a differential dynamic programming based framework for trajectory generation. In particular, this framework uses a state-space equation to characterize the polynomial trajectory such that half of its coefficients can be saved and the continuity conditions on the derivatives of two consecutive segments can be automatically satisfied. By leveraging the differential flatness of quadrotor systems and this perspective of polynomial trajectory, we further reformulate the trajectory generation problem into a finite-horizon free-end discrete-time optimal control problem such that it can be addressed by DDP algorithm without fine discretization and the help of solvers. Unlike many existing approaches where optimization with respect to time and polynomial coefficients are carried out alternately, this algorithm features joint optimization of all variables directly, which explains the naming of this framework.

The proposed approach is based on the observation that the state at the proximal endpoint of an odd order polynomial is solely determined by half of the coefficients (low-order terms) while the state at the distant endpoint is co-determined by all of the coefficients and time. Along with the continuity condition, a long trajectory of multiple segments can be characterized by the initial state and a sequence of control inputs which are functions of the other half of the coefficients (high-order terms) and time. By incorporating the final state constraint into the terminal cost and introducing the safety and dynamical feasibility constraints on the state and control variables via control points, the optimization problem is reformulated into a discrete-time finite-horizon optimal control problem with inequality constraints. The resulting problem either with or without time optimization can be solved under the same framework via a recently developed IPDDP algorithm in [7] with linear complexity. In addition, we show that this problem can be further simplified to a LQT problem if there is no constraint and hence admits an analytical solution, which is still within the DDP framework (exactly one round of backward forward iteration). Our contributions can be summarized as follows:

  • We show that an odd-order piecewise-polynomial trajectory can be characterized by state-space equations;

  • We propose a DDP-based framework to generate polynomial trajectories with / without time optimization and constraints;

  • We achieve comparable results with state-of-the-art methods and perform extensive real-world experiments. The code will be open-sourced.

The rest of this paper is organized as follows. Some related works are presented in Section II. Section III details the DDP-based framework for trajectory generation. Some specializations of this scheme are discussed in Section IV. Section V gives some comparisons with state-of-the-art methods and real-world implementations. Section VI draws the conclusion.

Notations: In this paper, the sets of real, positive real and nonnegative real numbers are denoted by \mathbb{R}, +\mathbb{R}_{+} and 0\mathbb{R}_{\geq 0}, respectively. Denote by n\mathbb{R}^{n} and m×n\mathbb{R}^{m\times n} the sets of nn-dimensional real vectors and m×nm\times n real matrices, respectively. Let +\mathbb{Z}_{+} and 0\mathbb{Z}_{\geq 0} be the sets of positive and nonnegative integers, respectively. Let 𝐱\|\mathbf{x}\| be the 22 norm of 𝐱\mathbf{x} and 𝐱𝐀2=𝐱𝐀𝐱\|\mathbf{x}\|_{\mathbf{A}}^{2}=\mathbf{x}^{\top}\mathbf{A}\mathbf{x}. Denote by 𝐀\mathbf{A}^{\top} and 𝐀1\mathbf{A}^{-1} the transpose and inverse of 𝐀n×n\mathbf{A}\in\mathbb{R}^{n\times n}, respectively. Let 𝐈nn×n\mathbf{I}_{n}\in\mathbb{R}^{n\times n} be the nn-dimensional identity matrix and 𝟏n\mathbf{1}_{n} the nn-dimensional column vector with all entries of 11. Let \otimes denote the Kronecker product and n={0,,n1}\mathcal{I}_{n}=\{0,\dots,n-1\}. Let 𝐱(i)\mathbf{x}^{(i)} be the ii-th order derivative of vector 𝐱\mathbf{x} and 𝐱[i]=[𝐱(0),,𝐱(i1)]\mathbf{x}^{[i]}=[\mathbf{x}^{(0)},\dots,\mathbf{x}^{(i-1)}]. Denote the vectorization operation by vec()\operatorname{vec}(\cdot), i.e., vec([𝐚,𝐛])=[𝐚,𝐛]\operatorname{vec}([\mathbf{a},\mathbf{b}])=[\mathbf{a}^{\top},\mathbf{b}^{\top}]^{\top}.

II Related Works

II-A Trajectory planning for quadrotors

The seminal work [4] firstly proved the differential flatness for quadrotor and proposed the minimum snap piecewise polynomial trajectory generation problem, which can be transformed into a quadratic programming (QP) problem with safety corridors being included as linear constraints on discretized points and hence can be solved efficiently via solver. The time allocation and dynamical feasibility problems were approached by projected gradient with backtracking line search in an outer loop and temporal scaling, respectively. Later, the authors in [5] proposed a closed-form solution for the trajectory generation problem with a generalized cost where the time allocation, safety, and dynamical feasibility aspects are handled by gradient descent of joint energy-time cost, addition of waypoints, and temporal scaling, respectively. In [8], the authors presented an efficient safe flight corridor construction method and a temporal scaling step (similar to [5]) on time allocation to generate safe and dynamical feasible trajectories. To avoid high nonlinearity related to joint space-time optimization, some recent works propose methods that iteratively optimize spatial and time variables in two separate subroutines. The authors in [6] introduced a spatial-temporal optimization method which iteratively refines the geometrical coefficients and time-warping functions via QP and second order cone program (SOCP), respectively. In [9], the authors decomposed the trajectory optimization problem as a bi-level optimization problem where the gradient required in the outer loop is analytically obtained from the dual solution of the inner loop QP. An alternating minimization scheme was proposed in [10] where complex safety constraints were not considered. Recently in [11], the authors formulate an unconstrained nonlinear optimization that can be solved by quasi-Newton methods efficiently, but the safety and dynamical feasibility are enforced as penalty functions on discretized points instead of hard constraints. In terms of large scale trajectory (103\geq 10^{3}), the authors in [12] proved that the minimum snap problem with fixed time allocation can be solved in linear complexity by exploiting the structure of optimality conditions. The computational complexity has been further improved in [13] by eliminating a matrix inversion operation. An outer loop with analytical projected gradient was introduced for [12] in [14] such that the time allocation can be optimized. However, the computational complexity is unspecified for the case with safety and dynamical feasibility constraints.

II-B DDP algorithms

The DDP algorithm firstly introduced in [15] enjoys linear computational complexity (w.r.t. horizon) and local quadratic convergence [16] and has been applied in trajectory optimization of various systems, e.g., biped walking [17] and robotic manipulation [18]. Much research has been devoted to generalizing the DDP to the case with inequality constraints and can be generally classified into two categories. The first category converts the constrained problems to unconstrained ones via penalty [19] while the other deals with the constraints explicitly by identifying the active ones [20]. To avoid the combinatorial problem regarding the active constraints, a constrained version of Bellman’s principle of optimality has been introduced in [7, 21], which augments the control input with dual variables. However, these DDP algorithms have only been applied to a short-duration (e.g. 23s2\sim 3\mathrm{s}, 200300200\sim 300 steps of sampling period 0.01s0.01\mathrm{s}) quadrotor planning problem and takes quite long time for simple constraints (5s\geq 5\mathrm{s}, see Table 7 in [21]). In this paper, we shall exploit the differential flatness of quadrotors and the linear complexity property of DDP algorithm to generate long-duration smooth trajectories efficiently.

III DDP based framework for trajectory generation

In this section, we shall first introduce our proposed state-space representation for polynomial trajectories and then present our algorithm for jointly optimizing energy and time.

III-A State-space representation

We consider the problem of generating an NN-segment nn-th order polynomial trajectory in dd-dimensional space with the kk-th segment 𝐩kd\mathbf{p}_{k}\in\mathbb{R}^{d} being parametrized by:

𝐩k(t)=𝐂k𝐛(t),t[0,tk],\mathbf{p}_{k}(t)=\mathbf{C}_{k}^{\top}\mathbf{b}(t),t\in[0,t_{k}], (1)

where 𝐂k(n+1)×d\mathbf{C}_{k}\in\mathbb{R}^{(n+1)\times d} is the coefficient matrix and 𝐛(t)=[1,t,,tn]\mathbf{b}(t)=[1,t,\dots,t^{n}]^{\top} is the nn-order monomial basis. Hence, the entire trajectory on [0,τN1][0,\tau_{N-1}] can be defined with

𝐩(t)=𝐩k(tτk1)=𝐂k𝐛(tτk1)\mathbf{p}(t)=\mathbf{p}_{k}(t-\tau_{k-1})=\mathbf{C}_{k}^{\top}\mathbf{b}(t-\tau_{k-1})

for t[τk1,τk]t\in[\tau_{k-1},\tau_{k}] where τ1=0\tau_{-1}=0 and τk=i=0ktk\tau_{k}=\sum_{i=0}^{k}t_{k}, kNk\in\mathcal{I}_{N}.

Following [4, 13, 11], we consider the problem of enforcing the smoothness of trajectory up to the (m1)(m-1)-th (m=n+12m=\frac{n+1}{2}) order derivative and optimizing on the energy cost up to the mm-th order derivative.

We consider a single segment kk. By (1), the collection of trajectory up to (m1)(m-1)-th order derivative can be written as follows:

𝐩k[m](t)=𝐂k𝐛[m](t).\mathbf{p}_{k}^{[m]}(t)=\mathbf{C}_{k}^{\top}\mathbf{b}^{[m]}(t). (2)

Collecting both the endpoints, one has:

[𝐩k[m](0),𝐩k[m](tk)]=𝐂k[𝐛[m](0),𝐛[m](tk)]:=𝐂k𝐅k,[\mathbf{p}_{k}^{[m]}(0),\mathbf{p}_{k}^{[m]}(t_{k})]=\mathbf{C}_{k}^{\top}[\mathbf{b}^{[m]}(0),\mathbf{b}^{[m]}(t_{k})]:=\mathbf{C}_{k}^{\top}\mathbf{F}_{k}^{\top}, (3)

where 𝐅k\mathbf{F}_{k} can be expressed explicitly with a 22-by-22 block matrix [13], i.e., 𝐅k=[𝐅k,11𝟎𝐅k,21𝐅k,22]\mathbf{F}_{k}=\left[\begin{smallmatrix}\mathbf{F}_{k,11}&\mathbf{0}\\ \mathbf{F}_{k,21}&\mathbf{F}_{k,22}\\ \end{smallmatrix}\right], where each block is of size m×mm\times m, and

ij ={(i1)!,ifi=j,0,otherwise,\displaystyle=\begin{cases}\begin{aligned} (i-1)!,&~{}\mathrm{if}~{}i=j,\\ 0,&~{}\mathrm{otherwise},\end{aligned}\end{cases} (4)
[𝐅k,21]ij\displaystyle[\mathbf{F}_{k,21}]_{ij} ={(j1)(ji)!tkji,ifij,0,otherwise,\displaystyle=\begin{cases}\begin{aligned} \frac{(j-1)}{(j-i)!}t_{k}^{j-i},&~{}\mathrm{if}~{}i\leq j,\\ 0,&~{}\mathrm{otherwise},\end{aligned}\end{cases}
[𝐅k,22]ij\displaystyle[\mathbf{F}_{k,22}]_{ij} =(m+j1)!(m+ji)!tkm+ji.\displaystyle=\frac{(m+j-1)!}{(m+j-i)!}t_{k}^{m+j-i}.

Next, we partition 𝐂k\mathbf{C}_{k} according to the block structure of 𝐅k\mathbf{F}_{k}, i.e., 𝐂k=[𝐂k,1,𝐂k,2]\mathbf{C}_{k}=[\mathbf{C}_{k,1}^{\top},\mathbf{C}_{k,2}^{\top}]^{\top} and obtain the following equality:

(𝐩k[m](tk))=𝐅k,21𝐅k,111(𝐩k[m](0))+𝐅k,22𝐂k,2,(\mathbf{p}_{k}^{[m]}(t_{k}))^{\top}=\mathbf{F}_{k,21}\mathbf{F}_{k,11}^{-1}(\mathbf{p}_{k}^{[m]}(0))^{\top}+\mathbf{F}_{k,22}\mathbf{C}_{k,2}, (5)

where 𝐂k,1\mathbf{C}_{k,1} was substituted with 𝐅k,111(𝐩k[m](0))\mathbf{F}_{k,11}^{-1}(\mathbf{p}_{k}^{[m]}(0))^{\top}.

Letting 𝐱k=vec(𝐩k[m](0))\mathbf{x}_{k}=\operatorname{vec}(\mathbf{p}_{k}^{[m]}(0)), 𝐯k=vec(𝐂k,2)\mathbf{v}_{k}=\operatorname{vec}(\mathbf{C}_{k,2}^{\top}), 𝐮k=[𝐯k,tk]\mathbf{u}_{k}=[\mathbf{v}_{k}^{\top},t_{k}]^{\top} and by the continuity condition 𝐩k[m](tk)=𝐩k+1[m](0)\mathbf{p}_{k}^{[m]}(t_{k})=\mathbf{p}_{k+1}^{[m]}(0), one has

𝐱k+1=𝐀(tk)𝐱k+𝐁(tk)𝐯k:=𝐟(𝐱k,𝐮k),\displaystyle\mathbf{x}_{k+1}=\mathbf{A}(t_{k})\mathbf{x}_{k}+\mathbf{B}(t_{k})\mathbf{v}_{k}:=\mathbf{f}(\mathbf{x}_{k},\mathbf{u}_{k}), (6)

where 𝐀(tk)=(𝐅k,21𝐅k,111)𝐈d\mathbf{A}(t_{k})=(\mathbf{F}_{k,21}\mathbf{F}_{k,11}^{-1})\otimes\mathbf{I}_{d} and 𝐁(tk)=𝐅k,22𝐈d\mathbf{B}(t_{k})=\mathbf{F}_{k,22}\otimes\mathbf{I}_{d}. As shown in Fig. 2, this operation transfers the original representation of trajectory into a state-space equation, where the system state is the state (position, velocity, acceleration, etc.) at each endpoint and the control input is a concatenation of half of the coefficients of the segment (i.e., 𝐂k,2\mathbf{C}_{k,2}) and duration (i.e., tkt_{k}). Unlike previous application of DDP algorithms in quadrotor planning and control where direct discretization is used, this representation enables the characterization of a long-duration trajectory with fewer number of variables.

Refer to caption
Figure 2: State space representation.

III-B Objective function

Previous results on generating polynomial trajectories (e.g., [4, 13]) usually consider the so-called interpolating splines [22], where the spline must pass some prescribed waypoints exactly at specific times. However, for the trajectory generation problem for time-critical aggressive flights, quite often it is sufficient that the generated trajectories only pass through the vicinity of these waypoints, i.e., smoothing splines [23]. In this paper, we aim to find a smoothing spline such that the following objective function can be minimized:

J(𝒞,𝒯)=\displaystyle J(\mathcal{C},\mathcal{T})= k=0N1(𝐱k𝐱k,g𝐐k2+0tki=1mηi,k𝐩k(i)(t)2dt\displaystyle\sum_{k=0}^{N-1}(\|\mathbf{x}_{k}-\mathbf{x}_{k,\mathrm{g}}\|_{\mathbf{Q}_{k}}^{2}+\int_{0}^{t_{k}}\sum_{i=1}^{m}\eta_{i,k}\|\mathbf{p}_{k}^{(i)}(t)\|^{2}\differential t (7)
+wktk2)+𝐱N𝐱N,g𝐐N2,\displaystyle\quad\quad+w_{k}t_{k}^{2})+\|\mathbf{x}_{N}-\mathbf{x}_{N,\mathrm{g}}\|_{\mathbf{Q}_{N}}^{2},

where 𝒞:={𝐂k}kN\mathcal{C}:=\{\mathbf{C}_{k}\}_{k\in\mathcal{I}_{N}} and 𝒯:={tk}kN\mathcal{T}:=\{t_{k}\}_{k\in\mathcal{I}_{N}}. In this cost function, the first term is designed to minimize the discrepancy between the current state 𝐱k\mathbf{x}_{k} and the desired state 𝐱k,g\mathbf{x}_{k,\mathrm{g}} at the beginning of each segment, which “attracts” the endpoint to a prescribed waypoint; the second and third terms penalize the aggressiveness of planned trajectory up to the mm-th order and flight time during the kk-th segment, respectively; the last term penalizes the discrepancy between the terminal state 𝐱N\mathbf{x}_{N} and the goal state 𝐱N,g\mathbf{x}_{N,\mathrm{g}}; and 𝐐k\mathbf{Q}_{k}, ηi,k\eta_{i,k}, wkw_{k}, (kNk\in\mathcal{I}_{N}), and 𝐐N\mathbf{Q}_{N} denote the weights for each term. It can be found that different from the literature (e.g., [4, 13]) where hard constraints for the endpoints of 𝐩k\mathbf{p}_{k} were used, we have softened these constraints by incorporating them into an objective function.

For usually considered minimum jerk or minimum snap trajectories, 𝐐k=𝟎\mathbf{Q}_{k}=\mathbf{0}, ηi,k=0\eta_{i,k}=0 for all i=1,,m1i=1,\dots,m-1 and kNk\in\mathcal{I}_{N}. In this case, one can partition 𝐛(m)(t)(𝐛(m)(t))\mathbf{b}^{(m)}(t)(\mathbf{b}^{(m)}(t))^{\top} into 22-by-22 block matrix with the size of each block being m×mm\times m.

Observing that only the right-bottom block is non-zero and denoting it by 𝐑𝐯(t)\mathbf{R}_{\mathbf{v}}(t), one can rewrite (7) as

J(𝒰)=k=0N1(𝐱k𝐱k,g𝐐k2+𝐮k𝐑k2)+𝐱N𝐱N,g𝐐N2J(\mathcal{U})=\sum_{k=0}^{N-1}(\|\mathbf{x}_{k}-\mathbf{x}_{k,\mathrm{g}}\|_{\mathbf{Q}_{k}}^{2}+\|\mathbf{u}_{k}\|_{\mathbf{R}_{k}}^{2})+\|\mathbf{x}_{N}-\mathbf{x}_{N,\mathrm{g}}\|_{\mathbf{Q}_{N}}^{2} (8)

with 𝒰:={𝐮k}kN\mathcal{U}:=\{\mathbf{u}_{k}\}_{k\in\mathcal{I}_{N}} and 𝐑k=diag(ηm,k0tk𝐑𝐯(t)dt,wk)\mathbf{R}_{k}=\operatorname{diag}(\eta_{m,k}\int_{0}^{t_{k}}\mathbf{R}_{\mathbf{v}}(t)\differential t,w_{k}).

Remark III.1.

For the case with 𝐐k𝟎\mathbf{Q}_{k}\neq\mathbf{0}, ηi,k>0\eta_{i,k}>0 for some i=1,,m1i=1,\dots,m-1 and kNk\in\mathcal{I}_{N}, an equation similar to (8) can be obtained by using

𝐩k(i)(t)2\displaystyle\|\mathbf{p}_{k}^{(i)}(t)\|^{2} =(𝐛(i)(t))𝐂k𝐂k𝐛(i)(t)\displaystyle=(\mathbf{b}^{(i)}(t))^{\top}\mathbf{C}_{k}\mathbf{C}_{k}^{\top}\mathbf{b}^{(i)}(t)
=(vec(𝐂k))[𝐛(i)(t)(𝐛(i)(t))𝐈d]vec(𝐂k)\displaystyle=(\operatorname{vec}(\mathbf{C}_{k}^{\top}))^{\top}[\mathbf{b}^{(i)}(t)(\mathbf{b}^{(i)}(t))^{\top}\otimes\mathbf{I}_{d}]\operatorname{vec}(\mathbf{C}_{k}^{\top})

and vec(𝐂k)=[(𝐅k,111𝐈d)𝐱k𝐯k]\operatorname{vec}(\mathbf{C}_{k}^{\top})=\left[\begin{smallmatrix}(\mathbf{F}_{k,11}^{-1}\otimes\mathbf{I}_{d})\mathbf{x}_{k}\\ \mathbf{v}_{k}\end{smallmatrix}\right]. In addition, while other general differentiable functions can be used for each cost term in (7), we use a quadratic form for simplicity.

III-C Constraints

In general, there are two types of methods handling the safety and dynamical feasibility constraints. The first method discretizes each segment of trajectory and adds inequality constraints on the states at these time instants, while the issue is that the feasibility for the continuous time interval cannot be guaranteed. The other method is to add constraints on a set of control points, which are the vertices of the convex hull containing the segment, but it suffers from conservativeness. A recent work [24] proposes the MINVO basis, which has been shown to be less conservative than the widely used Bezier basis. We use the MINVO basis to construct a set of control points for our polynomial trajectory. Denote the ii-th order control points by 𝚵ki(n+1i)×d\bm{\Xi}_{k}^{\langle i\rangle}\in\mathbb{R}^{(n+1-i)\times d}, which can be computed by:

𝚵ki=𝐓i(tk)𝐂k,\bm{\Xi}_{k}^{\langle i\rangle}=\mathbf{T}^{\langle i\rangle}(t_{k})\mathbf{C}_{k}, (9)

where 𝐓i(tk)(n+1i)×(n+1)\mathbf{T}^{\langle i\rangle}(t_{k})\in\mathbb{R}^{(n+1-i)\times(n+1)} represents the transformation matrix from polynomial coefficients to the ii-th order control points.

Assuming that a safe flight corridor (consists of NN convex polyhedra) can be generated from perception system and polyhedron kk is represented by sks_{k} hyperplanes. With (9), the safety constraint can be written as:

𝐖k(𝚵k0)(𝟏n+1𝐡k0),\mathbf{W}_{k}\cdot(\bm{\Xi}_{k}^{\langle 0\rangle})^{\top}\preceq(\mathbf{1}_{n+1}^{\top}\otimes\mathbf{h}_{k}^{\langle 0\rangle}), (10)

where each row of 𝐖ksk×d\mathbf{W}_{k}\in\mathbb{R}^{s_{k}\times d} and 𝐡k0sk\mathbf{h}_{k}^{\langle 0\rangle}\in\mathbb{R}^{s_{k}} corresponds to the parameters of a single hyperplane in polyhedron kk.

On the other hand, the dynamical feasibility is encoded by the following box constraint on the ii-th order control points 𝚵ki\bm{\Xi}_{k}^{\langle i\rangle} with i[1,m1]i\in[1,m-1]:

h¯ki𝐈d×(n+1i)(𝚵ki)h¯ki𝐈d×(n+1i),\underline{h}_{k}^{\langle i\rangle}\mathbf{I}_{d\times(n+1-i)}\preceq(\bm{\Xi}_{k}^{\langle i\rangle})^{\top}\preceq\overline{h}_{k}^{\langle i\rangle}\mathbf{I}_{d\times(n+1-i)}, (11)

where h¯ki<0\underline{h}_{k}^{\langle i\rangle}<0 and h¯ki>0\overline{h}_{k}^{\langle i\rangle}>0 are the lower and upper bounds (e.g., minimal / maximal velocity / acceleration), respectively.

Additionally, one more inequality constraint on time can be introduced to avoid negative time allocation during optimization:

tkt¯,t_{k}\geq\underline{t}, (12)

where t¯+\underline{t}\in\mathbb{R}_{+}.

Concatenating the vectorized forms of (10) and (11) with (12), one has

𝐠k:=diag(𝐇k,1)[𝐱k,𝐮k][𝐡k,t¯]𝟎,\mathbf{g}_{k}:=\operatorname{diag}(\mathbf{H}_{k},-1)[\mathbf{x}_{k}^{\top},\mathbf{u}_{k}^{\top}]^{\top}-[\mathbf{h}_{k}^{\top},-\underline{t}]^{\top}\preceq\mathbf{0}, (13)

where

𝐇k=[𝐓0(tk)𝐖k𝐓1(tk)𝐈d𝐓1(tk)𝐈d𝐓m1(tk)𝐈d𝐓m1(tk)𝐈d]diag(𝐅k,111𝐈d,𝐈md)\mathbf{H}_{k}=\left[\begin{smallmatrix}\mathbf{T}^{\langle 0\rangle}(t_{k})\otimes\mathbf{W}_{k}\\ -\mathbf{T}^{\langle 1\rangle}(t_{k})\otimes\mathbf{I}_{d}\\ \mathbf{T}^{\langle 1\rangle}(t_{k})\otimes\mathbf{I}_{d}\\ \vdots\\ -\mathbf{T}^{\langle m-1\rangle}(t_{k})\otimes\mathbf{I}_{d}\\ \mathbf{T}^{\langle m-1\rangle}(t_{k})\otimes\mathbf{I}_{d}\\ \end{smallmatrix}\right]\operatorname{diag}(\mathbf{F}_{k,11}^{-1}\otimes\mathbf{I}_{d},\mathbf{I}_{md})

and

𝐡k\displaystyle\mathbf{h}_{k} =[(𝟏n+1𝐡k0),h¯k1𝟏nd,h¯k1𝟏nd,,\displaystyle=[(\mathbf{1}_{n+1}\otimes\mathbf{h}_{k}^{\langle 0\rangle})^{\top},-\underline{h}_{k}^{\langle 1\rangle}\mathbf{1}_{nd}^{\top},\overline{h}_{k}^{\langle 1\rangle}\mathbf{1}_{nd}^{\top},\cdots,
h¯km1𝟏(nm+2)d,h¯km1𝟏(nm+2)d].\displaystyle\quad\quad-\underline{h}_{k}^{\langle m-1\rangle}\mathbf{1}_{(n-m+2)d}^{\top},\overline{h}_{k}^{\langle m-1\rangle}\mathbf{1}_{(n-m+2)d}^{\top}]^{\top}.

III-D The overall problem and algorithm

With the objective function and constraints constructed above, we can formulate the trajectory generation problem as the following multi-stage constrained optimal control problem:

min𝒰\displaystyle\min_{\mathcal{U}} (8)\displaystyle~{}~{}\eqref{eq:obj1} (14)
s.t.\displaystyle\mathrm{s.t.} (6),(13),𝐱0isgiven.\displaystyle~{}~{}\eqref{eq:state},\eqref{eq:cons},\mathbf{x}_{0}~{}\mathrm{is~{}given}.

It should be noted that the constraint and objective function in (14) are respectively in linear and quadratic forms, in fact, all the matrices involved are functions of tkt_{k} and hence they are not linear-quadratic. Therefore, we shall solve the formulated problem via IPDDP [7], which generalizes the traditional unconstrained DDP to a constrained version. Similar to DDP, the IPDDP algorithm iterates between backward pass, which computes control inputs to minimize a quadratic approximation of cost in the vicinity of the current trajectory, and forward pass, which updates the current trajectory to a new one, until some terminating condition is triggered.

Define VN=𝐱N𝐱N,g𝐐N2V_{N}=\|\mathbf{x}_{N}-\mathbf{x}_{N,\mathrm{g}}\|_{\mathbf{Q}_{N}}^{2}. Let k=𝐱k𝐱k,g𝐐k2+𝐮k𝐑k2+𝝀k𝐠k\ell_{k}=\|\mathbf{x}_{k}-\mathbf{x}_{k,\mathrm{g}}\|_{\mathbf{Q}_{k}}^{2}+\|\mathbf{u}_{k}\|_{\mathbf{R}_{k}}^{2}+\bm{\lambda}_{k}^{\top}\mathbf{g}_{k}, and Lk=k+Vk+1L_{k}=\ell_{k}+V_{k+1} for all kNk\in\mathcal{I}_{N}, where 𝝀k𝟎\bm{\lambda}_{k}\succeq\mathbf{0} is the Lagrange multiplier. In the following, we shall omit subscript kk and denote ()k+1(\cdot)_{k+1} by ()+(\cdot)^{+} for clarity. Letting ()𝐚:=()𝐚(\cdot)_{\mathbf{a}}:=\partialderivative{(\cdot)}{\mathbf{a}} and ()𝐚𝐛:=2()𝐚𝐛(\cdot)_{\mathbf{ab}}:=\partialderivative{(\cdot)}{\mathbf{a}}{\mathbf{b}}, one has the following iteration which resembles traditional DDP:

L𝝀\displaystyle L_{\bm{\lambda}} =𝐠,L𝝀𝐱=𝐠𝐱,L𝝀𝐮=𝐠𝐮,L𝝀𝝀=𝟎,\displaystyle=\mathbf{g},L_{\bm{\lambda}\mathbf{x}}=\mathbf{g}_{\mathbf{x}},L_{\bm{\lambda}\mathbf{u}}=\mathbf{g}_{\mathbf{u}},L_{\bm{\lambda\lambda}}=\mathbf{0}, (15)
L𝐱\displaystyle L_{\mathbf{x}} =𝐱+𝐟𝐱V𝐱+,L𝐮=𝐮+𝐟𝐮V𝐮+,\displaystyle=\ell_{\mathbf{x}}+\mathbf{f}_{\mathbf{x}}^{\top}V_{\mathbf{x}}^{+},L_{\mathbf{u}}=\ell_{\mathbf{u}}+\mathbf{f}_{\mathbf{u}}^{\top}V_{\mathbf{u}}^{+},
L𝐱𝐱\displaystyle L_{\mathbf{xx}} =𝐱𝐱+𝐟𝐱V𝐱𝐱+𝐟𝐱+V𝐱+𝐟𝐱𝐱,\displaystyle=\ell_{\mathbf{xx}}+\mathbf{f}_{\mathbf{x}}^{\top}V_{\mathbf{xx}}^{+}\mathbf{f}_{\mathbf{x}}+V_{\mathbf{x}}^{+}\odot\mathbf{f}_{\mathbf{xx}},
L𝐮𝐱\displaystyle L_{\mathbf{ux}} =𝐮𝐱+𝐟𝐮V𝐮𝐱+𝐟𝐱+V𝐱+𝐟𝐮𝐱,\displaystyle=\ell_{\mathbf{ux}}+\mathbf{f}_{\mathbf{u}}^{\top}V_{\mathbf{ux}}^{+}\mathbf{f}_{\mathbf{x}}+V_{\mathbf{x}}^{+}\odot\mathbf{f}_{\mathbf{ux}},
L𝐮𝐮\displaystyle L_{\mathbf{uu}} =𝐮𝐮+𝐟𝐮V𝐮𝐮+𝐟𝐮+V𝐱+𝐟𝐮𝐮,\displaystyle=\ell_{\mathbf{uu}}+\mathbf{f}_{\mathbf{u}}^{\top}V_{\mathbf{uu}}^{+}\mathbf{f}_{\mathbf{u}}+V_{\mathbf{x}}^{+}\odot\mathbf{f}_{\mathbf{uu}},

where \odot denotes the tensor contraction and 𝐟\mathbf{f} is defined in (6). In addition to the decision variable δ𝐮k\delta\mathbf{u}_{k} (i.e., the first order variation of 𝐮k\mathbf{u}_{k}), the primal-dual version of DDP introduces the additional decision variable δ𝝀k\delta\bm{\lambda}_{k}. Both δ𝐮k\delta\mathbf{u}_{k} and δ𝝀k\delta\bm{\lambda}_{k} can be represented in an affine form of δ𝐱k\delta\mathbf{x}_{k} by solving the optimality conditions of an approximation of constrained Bellman equation (see (6)(6) in [7]), i.e.,

[𝐤𝐮𝐊𝐮𝐤𝝀𝐊𝝀]=[L𝐮𝐮L𝐮𝝀𝚲L𝝀𝐮𝐆]1[L𝐮L𝐮𝐱𝐫𝚲L𝝀𝐱],\displaystyle\begin{bmatrix}\mathbf{k}^{\mathbf{u}}&\mathbf{K}^{\mathbf{u}}\\ \mathbf{k}^{\bm{\lambda}}&\mathbf{K}^{\bm{\lambda}}\end{bmatrix}=-\begin{bmatrix}L_{\mathbf{uu}}&L_{\mathbf{u}\bm{\lambda}}\\ \bm{\Lambda}L_{\bm{\lambda}\mathbf{u}}&\mathbf{G}\end{bmatrix}^{-1}\begin{bmatrix}L_{\mathbf{u}}&L_{\mathbf{ux}}\\ \mathbf{r}&\bm{\Lambda}L_{\bm{\lambda}\mathbf{x}}\end{bmatrix}, (16)

where 𝐤𝐮\mathbf{k}^{\mathbf{u}}, 𝐊𝐮\mathbf{K}^{\mathbf{u}}, 𝐤𝝀\mathbf{k}^{\bm{\lambda}} and 𝐊𝝀\mathbf{K}^{\bm{\lambda}} are the update gains for δ𝐮k\delta\mathbf{u}_{k} and δ𝝀k\delta\bm{\lambda}_{k}, respectively; 𝐫=𝚲𝐠+μ𝟏\mathbf{r}=\bm{\Lambda}\mathbf{g}+\mu\mathbf{1}, 𝐆=diag(𝐠)\mathbf{G}=\operatorname{diag}(\mathbf{g}), 𝚲=diag(𝝀)\bm{\Lambda}=\operatorname{diag}(\bm{\lambda}) and μ\mu is a perturbation constant.

With these at hand, the gradient and Hessian of VkV_{k} can be updated by

V𝐱\displaystyle V_{\mathbf{x}} =L^𝐱+L^𝐱𝐮𝐤𝐮,\displaystyle=\hat{L}_{\mathbf{x}}+\hat{L}_{\mathbf{xu}}\mathbf{k}^{\mathbf{u}}, (17)
V𝐱𝐱\displaystyle V_{\mathbf{xx}} =L^𝐱𝐱+L^𝐱𝐮𝐊𝐮,\displaystyle=\hat{L}_{\mathbf{xx}}+\hat{L}_{\mathbf{xu}}\mathbf{K}^{\mathbf{u}},

with the following equation:

L^𝐱\displaystyle\hat{L}_{\mathbf{x}} =L𝐱L𝐱𝝀𝐆1𝐫,\displaystyle=L_{\mathbf{x}}-L_{\mathbf{x}\bm{\lambda}}\mathbf{G}^{-1}\mathbf{r}, (18)
L^𝐮\displaystyle\hat{L}_{\mathbf{u}} =L𝐮L𝐮𝝀𝐆1𝐫,\displaystyle=L_{\mathbf{u}}-L_{\mathbf{u}\bm{\lambda}}\mathbf{G}^{-1}\mathbf{r},
L^𝐱𝐱\displaystyle\hat{L}_{\mathbf{xx}} =L𝐱𝐱L𝐱𝝀𝐆1𝚲L𝝀𝐱,\displaystyle=L_{\mathbf{xx}}-L_{\mathbf{x}\bm{\lambda}}\mathbf{G}^{-1}\bm{\Lambda}L_{\bm{\lambda}\mathbf{x}},
L^𝐮𝐱\displaystyle\hat{L}_{\mathbf{ux}} =L𝐮𝐱L𝐮𝝀𝐆1𝚲L𝝀𝐱,\displaystyle=L_{\mathbf{ux}}-L_{\mathbf{u}\bm{\lambda}}\mathbf{G}^{-1}\bm{\Lambda}L_{\bm{\lambda}\mathbf{x}},
L^𝐮𝐮\displaystyle\hat{L}_{\mathbf{uu}} =L𝐮𝐮L𝐮𝝀𝐆1𝚲L𝝀𝐮,\displaystyle=L_{\mathbf{uu}}-L_{\mathbf{u}\bm{\lambda}}\mathbf{G}^{-1}\bm{\Lambda}L_{\bm{\lambda}\mathbf{u}},

where δ𝝀k\delta\bm{\lambda}_{k} was eliminated.

The forward pass resembles the traditional DDP while updating both the control input 𝐮k\mathbf{u}_{k} and the dual variable 𝝀k\bm{\lambda}_{k}:

𝐱0\displaystyle\mathbf{x}_{0}^{\dagger} =𝐱0,\displaystyle=\mathbf{x}_{0}, (19)
𝐮k\displaystyle\mathbf{u}_{k}^{\dagger} =𝐮k+𝐤k𝐮+𝐊k𝐮(𝐱k𝐱k),\displaystyle=\mathbf{u}_{k}+\mathbf{k}^{\mathbf{u}}_{k}+\mathbf{K}^{\mathbf{u}}_{k}(\mathbf{x}_{k}^{\dagger}-\mathbf{x}_{k}),
𝝀k\displaystyle\bm{\lambda}_{k}^{\dagger} =𝝀k+𝐤k𝝀+𝐊k𝝀(𝐱k𝐱k),\displaystyle=\bm{\lambda}_{k}+\mathbf{k}^{\bm{\lambda}}_{k}+\mathbf{K}^{\bm{\lambda}}_{k}(\mathbf{x}_{k}^{\dagger}-\mathbf{x}_{k}),
𝐱k+1\displaystyle\mathbf{x}_{k+1}^{\dagger} =𝐟(𝐱k,𝐮k),\displaystyle=\mathbf{f}(\mathbf{x}_{k}^{\dagger},\mathbf{u}_{k}^{\dagger}),

where 𝐱\mathbf{x}^{\dagger}, 𝐮\mathbf{u}^{\dagger}, and 𝝀\bm{\lambda}^{\dagger} denote the new state, control variable, and dual variable, respectively.

Algorithm 1 presents the entire algorithm for generating a polynomial trajectory using IPDDP. It should be noted that the presented algorithm requires a feasible initial trajectory. This can be generated from a similar adaptation of the Infeasible-IPDDP algorithm in [7].

Algorithm 1 DIRECT
0:  𝐱k,g\mathbf{x}_{k,\mathrm{g}}, 𝐐k\mathbf{Q}_{k}, ηm,k\eta_{m,k}, wkw_{k}, 𝐱k,g\mathbf{x}_{k,\mathrm{g}}, kNk\in\mathcal{I}_{N}, 𝐐N\mathbf{Q}_{N}, a feasible trajectory with 𝒞\mathcal{C} and 𝒯\mathcal{T}.
0:  𝒞\mathcal{C}^{\ast}, 𝒯\mathcal{T}^{\ast}.
  Convert 𝒞\mathcal{C}, 𝒯\mathcal{T} to 𝒰\mathcal{U}
  for iter=1:max_iteration\mathrm{iter}=1:\mathrm{max\_iteration} do
     Set VNV_{N}, V𝐱,NV_{\mathbf{x},N}, V𝐱𝐱,NV_{\mathbf{xx},N}
     for k=N1,,0k=N-1,\dots,0 do
        Evaluate (15)
        Compute control inputs and auxiliary matrices from (16) and (18)
        Update V𝐱V_{\mathbf{x}}, V𝐱𝐱V_{\mathbf{xx}} according to (17)
     end for
     Set 𝐱0=𝐱0\mathbf{x}_{0}^{\dagger}=\mathbf{x}_{0}
     for k=0,,N1k=0,\dots,N-1 do
        Update the control variable 𝐮k\mathbf{u}_{k}^{\dagger}, multiplier 𝝀k\bm{\lambda}_{k}^{\dagger} and next state 𝐱k+1\mathbf{x}_{k+1}^{\dagger} according to (19)
     end for
     if termination condition is satisfied then
        break
     end if
  end for
  Extract 𝒰\mathcal{U}^{\ast} and convert back to 𝒞\mathcal{C}^{\ast} and 𝒯\mathcal{T}^{\ast}

IV Specializations

In last section, we have introduced an algorithm for jointly optimizing energy and time subject to safety and dynamical feasibility constraints. In that algorithm, part of the polynomial coefficient 𝐂k,2\mathbf{C}_{k,2} (i.e., 𝐯k\mathbf{v}_{k}) and allocated time tkt_{k} for segment kk are both regarded as control inputs in the formulation (14). In this section, we shall show that this framework can be reduced to two other types of simpler problems.

IV-A Fixed time allocation

It can be easily found that the above algorithm can be used for constrained polynomial trajectory generation with preallocated time. In particular, letting tk=constt_{k}=\operatorname{const} and redefining 𝐮k=𝐯k\mathbf{u}_{k}=\mathbf{v}_{k} for all kNk\in\mathcal{I}_{N}, (6)\eqref{eq:state} turns to be a linear-time-varying system and (13)\eqref{eq:cons} is a linear-time-varying constraint w.r.t. [𝐱k,𝐯k][\mathbf{x}_{k}^{\top},\mathbf{v}_{k}^{\top}]^{\top}222The “time-varying” is in the sense of kk.. The problem can be formally written as:

min𝒱\displaystyle\min_{\mathcal{V}} J(𝒱)=k=0N1(𝐱k𝐱k,g𝐐k2+𝐯k𝐑k2)\displaystyle~{}~{}J(\mathcal{V})=\sum_{k=0}^{N-1}(\|\mathbf{x}_{k}-\mathbf{x}_{k,\mathrm{g}}\|_{\mathbf{Q}_{k}}^{2}+\|\mathbf{v}_{k}\|_{\mathbf{R}_{k}}^{2}) (20)
+𝐱N𝐱N,g𝐐N2\displaystyle\quad\quad\quad\quad\quad+\|\mathbf{x}_{N}-\mathbf{x}_{N,\mathrm{g}}\|_{\mathbf{Q}_{N}}^{2}
s.t.\displaystyle\mathrm{s.t.} 𝐱k+1=𝐀k𝐱k+𝐁k𝐯k,\displaystyle~{}~{}\mathbf{x}_{k+1}=\mathbf{A}_{k}\mathbf{x}_{k}+\mathbf{B}_{k}\mathbf{v}_{k},
𝐇k[𝐱k,𝐯k]𝐡k,\displaystyle~{}~{}\mathbf{H}_{k}[\mathbf{x}_{k}^{\top},\mathbf{v}_{k}^{\top}]^{\top}\preceq\mathbf{h}_{k},
𝐱0isgiven,\displaystyle~{}~{}\mathbf{x}_{0}~{}\mathrm{is~{}given},

where 𝒱:={𝐯k}kN\mathcal{V}:=\{\mathbf{v}_{k}\}_{k\in\mathcal{I}_{N}}. In this case, Algorithm 1 can still be used for solving such type of problem333It is also a typical linear-time-varying model predictive control (MPC) problem, which can be solved by existing MPC algorithms., with all \ell and 𝐟\mathbf{f} related terms in (15) being precomputed and saved during the initialization stage instead of being reevaluated at each iteration.

IV-B Fixed time allocation and no constraint

If neither safety nor dynamical feasibility constraint is considered, one has the following problem:

min𝒱\displaystyle\min_{\mathcal{V}} J(𝒱)=k=0N1(𝐱k𝐱k,g𝐐k2+𝐯k𝐑k2)\displaystyle~{}~{}J(\mathcal{V})=\sum_{k=0}^{N-1}(\|\mathbf{x}_{k}-\mathbf{x}_{k,\mathrm{g}}\|_{\mathbf{Q}_{k}}^{2}+\|\mathbf{v}_{k}\|_{\mathbf{R}_{k}}^{2}) (21)
+𝐱N𝐱N,g𝐐N2\displaystyle\quad\quad\quad\quad\quad+\|\mathbf{x}_{N}-\mathbf{x}_{N,\mathrm{g}}\|_{\mathbf{Q}_{N}}^{2}
s.t.\displaystyle\mathrm{s.t.} 𝐱k+1=𝐀k𝐱k+𝐁k𝐯k,\displaystyle~{}~{}\mathbf{x}_{k+1}=\mathbf{A}_{k}\mathbf{x}_{k}+\mathbf{B}_{k}\mathbf{v}_{k},
𝐱0isgiven.\displaystyle~{}~{}\mathbf{x}_{0}~{}\mathrm{is~{}given}.

We show that (21) is indeed a LQT problem by proving that 𝐑k𝟎\mathbf{R}_{k}\succ\mathbf{0}. Firstly, the positive semi-definiteness 𝐑k=ηm,k0tk𝐑𝐯(t)dt\mathbf{R}_{k}=\eta_{m,k}\int_{0}^{t_{k}}\mathbf{R}_{\mathbf{v}}(t)\differential t can be easily established by the linearity of integral. Secondly, denoting the last mm elements of 𝐛(m)(t)\mathbf{b}^{(m)}(t) by 𝝈(t)\bm{\sigma}(t), one has

𝝈(t)\displaystyle\bm{\sigma}(t) =[m!0!t0,,n!(m1)!tm1]\displaystyle=[\frac{m!}{0!}t^{0},\cdots,\frac{n!}{(m-1)!}t^{m-1}]^{\top}
=diag({(m+i)!}i=0m1)[10!t0,,1(m1)!tm1]\displaystyle=\operatorname{diag}(\{(m+i)!\}_{i=0}^{m-1})[\frac{1}{0!}t^{0},\cdots,\frac{1}{(m-1)!}t^{m-1}]^{\top}
=:𝚿𝝉.\displaystyle=:\bm{\Psi}\bm{\tau}.

On the other hand, 𝝉\bm{\tau} can be written as 𝝉=e𝐀¯t𝐞1\bm{\tau}=e^{\underline{\mathbf{A}}^{\top}t}\mathbf{e}_{1} with 𝐀¯=[𝟎m1𝐈m10𝟎m1]\underline{\mathbf{A}}=\left[\begin{smallmatrix}\mathbf{0}_{m-1}&\mathbf{I}_{m-1}\\ 0&\mathbf{0}_{m-1}^{\top}\end{smallmatrix}\right] and 𝐞1=[1,𝟎m1]\mathbf{e}_{1}=[1,\mathbf{0}_{m-1}^{\top}]^{\top}. Since the pair (𝐀¯,𝐞1)(\underline{\mathbf{A}},\mathbf{e}_{1}^{\top}) is observable, the matrix 0τe𝐀¯t𝐞1𝐞1e𝐀¯tdt\int_{0}^{\tau}e^{\underline{\mathbf{A}}^{\top}t}\mathbf{e}_{1}\mathbf{e}_{1}^{\top}e^{\underline{\mathbf{A}}t}\differential t is nonsingular for any τ>0\tau>0. Therefore,

0tk𝐑𝐯(t)dt=0tk𝚿𝝉𝝉𝚿dt=𝚿(0tk𝝉𝝉dt)𝚿\int_{0}^{t_{k}}\mathbf{R}_{\mathbf{v}}(t)\differential t=\int_{0}^{t_{k}}\bm{\Psi}\bm{\tau}\bm{\tau}^{\top}\bm{\Psi}^{\top}\differential t=\bm{\Psi}\left(\int_{0}^{t_{k}}\bm{\tau}\bm{\tau}^{\top}\differential t\right)\bm{\Psi}^{\top}

is positive definite and this shows that (21) is an LQT problem.

Let 𝐏N=𝐐N\mathbf{P}_{N}=\mathbf{Q}_{N}. The backward iteration for the resulting LQT problem reduces to

𝐊k\displaystyle\mathbf{K}_{k} =(𝐁k𝐏k+1𝐁k+𝐑k)1𝐁k𝐏k+1𝐀k,\displaystyle=(\mathbf{B}_{k}^{\top}\mathbf{P}_{k+1}\mathbf{B}_{k}+\mathbf{R}_{k})^{-1}\mathbf{B}_{k}\mathbf{P}_{k+1}\mathbf{A}_{k}, (22)
𝐏k\displaystyle\mathbf{P}_{k} =𝐀k𝐏k+1(𝐀k𝐁k𝐊k)+𝐐k,\displaystyle=\mathbf{A}_{k}^{\top}\mathbf{P}_{k+1}(\mathbf{A}_{k}-\mathbf{B}_{k}\mathbf{K}_{k})+\mathbf{Q}_{k},
𝐪k\displaystyle\mathbf{q}_{k} =(𝐀k𝐁k𝐊k)𝐪k+1+𝐐k𝐱k,g,\displaystyle=(\mathbf{A}_{k}-\mathbf{B}_{k}\mathbf{K}_{k})^{\top}\mathbf{q}_{k+1}+\mathbf{Q}_{k}\mathbf{x}_{k,\mathrm{g}},
𝐤k\displaystyle\mathbf{k}_{k} =(𝐁k𝐏k+1𝐁k+𝐑k)1𝐁k𝐪k+1,\displaystyle=(\mathbf{B}_{k}^{\top}\mathbf{P}_{k+1}\mathbf{B}_{k}+\mathbf{R}_{k})^{-1}\mathbf{B}_{k}\mathbf{q}_{k+1},

where 𝐊k\mathbf{K}_{k} and 𝐤k\mathbf{k}_{k} are the parameters for expressing the new control inputs as a linear form of current state (resembles 𝐤𝐮\mathbf{k}^{\mathbf{u}} and 𝐊𝐮\mathbf{K}^{\mathbf{u}} in (16)). The forward pass reads:

𝐮k\displaystyle\mathbf{u}_{k} =𝐊k𝐱k+𝐤k,\displaystyle=-\mathbf{K}_{k}\mathbf{x}_{k}+\mathbf{k}_{k}, (23)
𝐱k+1\displaystyle\mathbf{x}_{k+1} =𝐀k𝐱k+𝐁k𝐮k,\displaystyle=\mathbf{A}_{k}\mathbf{x}_{k}+\mathbf{B}_{k}\mathbf{u}_{k},

which iteratively computes the control input and next state.

Algorithm 2 summarizes the above iteration and this gives the analytical solution to the unconstrained fixed-time polynomial trajectory generation problem (21). It should be noted that Algorithm 2 can be regarded as a simplified version of Algorithm 1 with exact one iteration.

Algorithm 2 LQT-based trajectory generation.
0:  𝐱k,g\mathbf{x}_{k,\mathrm{g}}, 𝐐k\mathbf{Q}_{k}, ηm,k\eta_{m,k} (kNk\in\mathcal{I}_{N}), 𝐐N\mathbf{Q}_{N}, 𝒯\mathcal{T}.
0:  𝒞\mathcal{C}^{\ast}.
  Set 𝐏N=𝐐N\mathbf{P}_{N}=\mathbf{Q}_{N}, 𝐪N=𝐱N,g\mathbf{q}_{N}=\mathbf{x}_{N,\mathrm{g}}
  for k=N1,,0k=N-1,\dots,0 do
     Compute 𝐊k\mathbf{K}_{k}, 𝐤k\mathbf{k}_{k} according to (22)
  end for
  for k=0,,N1k=0,\dots,N-1 do
     Compute the control variable 𝐮k\mathbf{u}_{k} and next state 𝐱k+1\mathbf{x}_{k+1} according to (23)
  end for
  Extract 𝒰\mathcal{U} and convert it to 𝒞\mathcal{C}^{\ast}

V Simulation and Experimental Results

In this section, we shall demonstrate the efficacy of our proposed framework by comparing with state-of-the-art methods as well as performing real-world experiments.

V-A Benchmarks for trajectory generation methods

For trajectory generation with time optimization as well as safety and dynamical feasibility constraints, we compare our algorithm with the alternating spatial-temporal optimization[6] and bi-level optimization with analytic gradient[9].444According to Fig. 9 of [11], these two algorithms are the state-of-the-art methods for generating a safe and dynamical feasible trajectory with time optimization. The code for [11] is currently unavailable. In [6] the energy and time costs are separately optimized while in [11] the overall cost function takes a similar form as ours, but without the terminal cost and the time cost takes the form of wktkw_{k}t_{k}. The benchmark is conducted using an i7-8550U CPU. Our algorithm is implemented in three steps: 1)1) initialize our algorithm with Infeasible-IPDDP [7] and zero initial condition to find a feasible solution with wk=1w_{k}=1 and 𝐐N=𝐈\mathbf{Q}_{N}=\mathbf{I}; 2)2) feed the obtained solution into Algorithm 1 with wk=20w_{k}=20 and 𝐐N=100𝐈\mathbf{Q}_{N}=100\mathbf{I} to jointly optimize energy and time; and 3)3) feed the obtained solution into Algorithm 1 with fixed time allocation setting (see Section IV-A). The algorithm in [6] is implemented as it is (the coefficient of time is set as 1010). The algorithm in [9] is implemented in C++ with MOSEK555https://www.mosek.com/ solver for the inner loop QP and backtracking line search for the outer loop and the coefficient of time is set as 2020. We randomly generate 1010 paths with 6464 safety flight corridors (the facet number ranges from 66 to 117117, and is 2828 on average) and extract 2642\sim 64 of them to implement these three algorithms. The initial allocation time is set by the subroutine in [6]. We consider the problem of generating minimum jerk trajectory with n=5n=5, where the maximal velocity and acceleration are set as 2m/s2\mathrm{m/s} and 2m/s22\mathrm{m/s^{2}}, respectively. The success rate versus number of segments is shown in Fig. 3. The success rate is dropping in general with the increasing number of segments. The success rate of our algorithm is marginally higher than Gao’s in some cases since the terminal constraint is softened in our formulation. As a result of terminal constraints and enforcing the dynamical feasibility via control points in inner QP, Sun’s method has the lowest success rate. The overall computation time and the reduction rate of the flight time (computed by [kN1(tkinitk)]/kN1tkini[\sum_{k}^{N-1}(t_{k}^{\mathrm{ini}}-t_{k}^{\ast})]/\sum_{k}^{N-1}t^{\mathrm{ini}}_{k}, where tkinit_{k}^{\mathrm{ini}} is the initial allocated time for segment kk and tkt_{k}^{\ast} is the optimized allocated time computed by each algorithm) and the control effort are also compared in Fig. 3. It can be found that for middle number of segments (N40N\leq 40), our algorithm consumes less time than the other two algorithms. For N>40N>40, our algorithm still consumes less time than Gao’s while more than Sun’s. The reason is that the outer loop of Sun’s method terminates in only a few iterations and gets stuck in local minimum in these cases, which can be seen from the relatively low reduction rate in Fig. 3. In addition, there is a trade-off between flight time and control effort: a shorter flight time (i.e., higher reduction rate) corresponds to a higher control effort. Unlike Gao’s method which has increasing reduction rate and control effort w.r.t. NN, our algorithm maintains relatively consistent reduction rate and level of control energy. Since our formulation also incorporates the terminal cost term (see (14)), the change of this term affects the other two terms and hence causes the fluctuation of control energy in our approach, as observed in Fig. 3. Overall, for the application of trajectory generation with N[10,30]N\in[10,30], our algorithm achieves comparable results with Gao’s method while consumes less runtime.

Refer to caption
Figure 3: Success rate, average runtime, flight time reduction rate and control effort of the three algorithms. “Gao” and “Sun” refer to the algorithms proposed in [6] and [9], respectively.

For trajectory generation with fixed time allocation and without constraint, we compare our Algorithm 2 with [13]. Our parameters are set as ηm,k=105\eta_{m,k}=10^{-5}, 𝐐k=100𝐈\mathbf{Q}_{k}=100\mathbf{I}, kNk\in\mathcal{I}_{N}, 𝐐N=100𝐈\mathbf{Q}_{N}=100\mathbf{I}. We randomly sample 100100 times for each number of waypoints and run 44 algorithms (22 for minimum jerk trajectory and 22 for minimum snap trajectory). The average runtime versus number of segments is shown in Fig. 4. It can be found that all the algorithms enjoy linear complexity w.r.t. number of segments. Although our algorithm for minimum jerk trajectory consumes around three times the time of [13], the gap for minimum snap trajectory is much narrower(around 1.5 times). Furthermore, based on the results reported in [13], our algorithm is faster than the optimized version of algorithm in [12], which is one order of magnitude slower than [13] (we do not plot the result of [12] as the code is not available.)

V-B Flight experiments

The proposed framework is implemented in ROS and we conduct multiple flight experiments of a small quadrotor executing the trajectories generated using our method. We use stereo camera fusion with IMU[25] for localization and state estimation of the quadrotor in the environment and use the robust and perfect tracking (RPT) controller [26] for trajectory tracking. Firstly, we conduct a set of experiments where a long trajectory is generated in a pre-built map consisting of two rooms and multiple obstacles, as shown in Fig. 1. The trajectory is generated by the i7 onboard computer and executed by the quadrotor immediately. We use multiple settings of dynamical feasibility. The trajectory with maximal velocity and acceleration set as 2.5m/s2.5\mathrm{m/s} and 3m/s23\mathrm{m/s^{2}} is shown in Fig. 5. In the second set of experiments the onboard computer generates safe and feasible trajectories online to reach target points set by a commander remotely. The video of the experiments can be found online666https://youtu.be/BM8_ABM_2VM or in the supplemental material.

Refer to caption
Figure 4: Comparison of large scale trajectory generation.
Refer to caption
Figure 5: The trajectory command generated from our proposed method.

VI Conclusion and Future Work

In this paper, we have proposed to use a state-space equation to characterize a piecewise-polynomial trajectory such that the trajectory generation problem (either with or without time optimization and constraints) can be subsequently solved via DDP algorithm. The proposed framework is implemented in C++ without the use of solvers, and the efficiency of our proposed framework was demonstrated by comparing with state-of-the-art methods as well as real-world experiments.

The proposed framework opens up opportunity for further developments. For example, the joint energy-time optimization can be extended to a multi-robot scenario with non-collision constraints. The fixed-time optimization can be used for online trajectory replanning due to its fast computation.

References

  • [1] J. Hu, J. Xu, and L. Xie, “Cooperative search and exploration in robotic networks,” Unmanned Systems, vol. 1, no. 1, pp. 121–142, 2013.
  • [2] P. Ogren, E. Fiorelli, and N. E. Leonard, “Cooperative control of mobile sensor networks: Adaptive gradient climbing in a distributed environment,” IEEE Transactions on Automatic Control, vol. 49, no. 8, pp. 1292–1302, 2004.
  • [3] Y. Yue, C. Yang, Y. Wang, P. C. N. Senarathne, J. Zhang, M. Wen, and D. Wang, “A multilevel fusion system for multirobot 3-d mapping using heterogeneous sensors,” IEEE Systems Journal, published online, 2019.
  • [4] D. Mellinger and V. Kumar, “Minimum snap trajectory generation and control for quadrotors,” in 2011 IEEE International Conference on Robotics and Automation, pp. 2520–2525, IEEE, 2011.
  • [5] A. Bry, C. Richter, A. Bachrach, and N. Roy, “Aggressive flight of fixed-wing and quadrotor aircraft in dense indoor environments,” The International Journal of Robotics Research, vol. 34, no. 7, pp. 969–1002, 2015.
  • [6] F. Gao, L. Wang, B. Zhou, X. Zhou, J. Pan, and S. Shen, “Teach-repeat-replan: A complete and robust system for aggressive flight in complex environments,” IEEE Transactions on Robotics, vol. 36, no. 5, pp. 1526–1545, 2020.
  • [7] A. Pavlov, I. Shames, and C. Manzie, “Interior point differential dynamic programming,” IEEE Transactions on Control Systems Technology, early access, 2021.
  • [8] S. Liu, M. Watterson, K. Mohta, K. Sun, S. Bhattacharya, C. J. Taylor, and V. Kumar, “Planning dynamically feasible trajectories for quadrotors using safe flight corridors in 3-d complex environments,” IEEE Robotics and Automation Letters, vol. 2, no. 3, pp. 1688–1695, 2017.
  • [9] W. Sun, G. Tang, and K. Hauser, “Fast uav trajectory optimization using bilevel optimization with analytical gradients,” IEEE Transactions on Robotics, early access, 2021.
  • [10] Z. Wang, X. Zhou, C. Xu, J. Chu, and F. Gao, “Alternating minimization based trajectory generation for quadrotor aggressive flight,” IEEE Robotics and Automation Letters, vol. 5, no. 3, pp. 4836–4843, 2020.
  • [11] Z. Wang, X. Zhou, C. Xu, and F. Gao, “Geometrically constrained trajectory optimization for multicopters,” arXiv preprint arXiv:2103.00190, 2021.
  • [12] D. Burke, A. Chapman, and I. Shames, “Generating minimum-snap quadrotor trajectories really fast,” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1487–1492, IEEE, 2020.
  • [13] Z. Wang, H. Ye, C. Xu, and F. Gao, “Generating large-scale trajectories efficiently using double descriptions of polynomials,” arXiv preprint arXiv:2011.02662, 2020.
  • [14] D. Burke, A. Chapman, and I. Shames, “Fast spline trajectory planning: Minimum snap and beyond,” arXiv preprint arXiv:2105.01788, 2021.
  • [15] D. Mayne, “A second-order gradient method for determining optimal trajectories of non-linear discrete-time systems,” International Journal of Control, vol. 3, no. 1, pp. 85–95, 1966.
  • [16] J. De O. Pantoja, “Differential dynamic programming and newton’s method,” International Journal of Control, vol. 47, no. 5, pp. 1539–1553, 1988.
  • [17] J. Morimoto, G. Zeglin, and C. G. Atkeson, “Minimax differential dynamic programming: Application to a biped walking robot,” in Proceedings 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003)(Cat. No. 03CH37453), vol. 2, pp. 1927–1932, IEEE, 2003.
  • [18] V. Kumar, E. Todorov, and S. Levine, “Optimal control with learned local models: Application to dexterous manipulation,” in 2016 IEEE International Conference on Robotics and Automation (ICRA), pp. 378–383, IEEE, 2016.
  • [19] B. Plancher, Z. Manchester, and S. Kuindersma, “Constrained unscented dynamic programming,” in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 5674–5680, IEEE, 2017.
  • [20] Z. Xie, C. K. Liu, and K. Hauser, “Differential dynamic programming with nonlinear constraints,” in 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 695–702, IEEE, 2017.
  • [21] Y. Aoyama, G. Boutselis, A. Patel, and E. A. Theodorou, “Constrained differential dynamic programming revisited,” arXiv preprint arXiv:2005.00985, 2020.
  • [22] Z. Zhang, J. Tomlinson, and C. Martin, “Splines and linear control theory,” Acta Applicandae Mathematica, vol. 49, no. 1, pp. 1–34, 1997.
  • [23] S. Sun, M. B. Egerstedt, and C. F. Martin, “Control theoretic smoothing splines,” IEEE Transactions on Automatic Control, vol. 45, no. 12, pp. 2271–2279, 2000.
  • [24] J. Tordesillas and J. P. How, “Minvo basis: Finding simplexes with minimum volume enclosing polynomial curves,” arXiv preprint arXiv:2010.10726, 2020.
  • [25] T. Qin, J. Pan, S. Cao, and S. Shen, “A general optimization-based framework for local odometry estimation with multiple sensors,” 2019.
  • [26] G. Cai, B. M. Chen, and T. H. Lee, Unmanned rotorcraft systems. Springer Science & Business Media, 2011.