Simultaneous Spatial and Temporal Assignment for Fast UAV Trajectory Optimization using Bilevel Optimization

I Introduction

II Related Work

III Background

IV Method

V Simulation results

VI Conclusion

References

II-A Trajectory Planning

II-B Time Allocation

II-C Bilevel Optimization

V-A Trajectory Planning with Two Adjustable Waypoints

V-B Scalability Experiment with Multiple Adjustable Waypoints

V-C Trajectory Optimization with Dynamic Constraints

Abstract

Index Terms:

Remark 1

Qianzhong Chen, Sheng Cheng, and Naira Hovakimyan Manuscript received: December 1, 2022; Revised March 10, 2023; Accepted April 8, 2023.This paper was recommended for publication by Editor Hanna Kurniawati upon evaluation of the Associate Editor and Reviewers’ comments. This work was supported by National Aeronautics and Space Administration (NASA) grant 80NSSC22M0070, National Science Foundation (NSF) under the RI grant #2133656, and Air Force Office of Scientific Research (AFOSR) grant FA9550-21-1-0411.All the authors are with the Department of Mechanical Science and Engineering, University of Illinois Urbana-Champaign, USA. {qc19, chengs, nhovakim}@illinois.eduDigital Object Identifier (DOI): see top of this page.

In this paper, we propose a framework for fast trajectory planning for unmanned aerial vehicles (UAVs). Our framework is reformulated from an existing bilevel optimization, in which the lower-level problem solves for the optimal trajectory with a fixed time allocation, whereas the upper-level problem updates the time allocation using analytical gradients. The lower-level problem incorporates the safety-set constraints (in the form of inequality constraints) and is cast as a convex quadratic program (QP). Our formulation modifies the lower-level QP by excluding the inequality constraints for the safety sets, which significantly reduces the computation time. The safety-set constraints are moved to the upper-level problem, where the feasible waypoints are updated together with the time allocation using analytical gradients enabled by the OptNet. We validate our approach in simulations, where our method’s computation time scales linearly with respect to the number of safety sets, in contrast to the state-of-the-art that scales exponentially.

Constrained Motion Planning; Aerial Systems: Applications; Optimization and Optimal Control

Trajectory planning has long been a critical problem in robotics. As trajectory planning affects the quality of robots’ motion to a great extent, people have long been investigating this problem for various kinds of robots [1]. Pioneered by Mellinger et al. [2], the minimum-snap method has been dominantly used in trajectory planning of UAVs with differentially flat dynamics (e.g., quadrotors). Such a method takes the snap of the whole trajectory as cost and generates the planning problem in the form of a quadratic program (QP). Meanwhile, various equality or inequality constraints with different requirements on the trajectory can be added to the QP, making it convenient for addressing more complex tasks.

One of the most important features of a trajectory planning algorithm is that it needs to quickly generate a feasible trajectory for the given UAV. The computational efficiency is not a significant issue if the optimization problem is convex, e.g., the minimum-snap formulation [2] with fixed time allocation to the adjustable waypoints.

However, the optimization problem normally becomes non-convex when the time allocation is included as an optimization variable. Since time allocation can largely affect the quality of the planned trajectory, researchers are searching for more efficient ways to optimize time allocation for trajectory planning. Early trials have used heuristics [3] and decoupling methods [4], although they both can lead to inefficient trajectories and burdensome computation time. Mellinger et al. [2] propose to solve this problem by gradient descent, which can be seen as the prototype of the bilevel optimization method, though the finite difference is used to approximate the gradient. Sun et al. [5] finalize the idea of using bilevel optimization to determine time allocation using analytical gradients. In their design, the lower-level problem optimizes the trajectory with fixed time allocation, which is generated by the upper-level problem that optimizes the time allocation using analytical gradients.

Another feature that burdens computational efficiency is the requirement of collision avoidance. To fulfill the collision-free requirement, a flight corridor, which is a space with safety assurance, must be specified in advance. Ways to plan a trajectory with flight corridors vary, e.g., sampling method [2], two-staged planning strategy [6], Bernstein polynomials [7, 5]. Notwithstanding, all of the methods mentioned above are characterized as inequality constraints, introducing burdens on QP’s computation time as a QP can be solved in a much shorter time when equality constraints are involved only.

We propose a bilevel optimization framework to plan a trajectory subject to variable time allocation and safety-set constraints. Our formulation is similar to that of [5]: the lower-level problem is convex and solved for the global minimum; the upper-level problem is non-convex, and a new “solution” is obtained by one-step gradient descent to reduce the optimization cost. The uniqueness of our approach is that we exclude the inequality constraint for the safety set in the lower-level problem, making it a QP with equality constraints only. Specifically, the feasible waypoints in the safety set originally characterized by the inequality constraints are fixed in the lower-level problem. Such a formulation can significantly reduce the computation time on the lower-level QP since it can be turned into an unconstrained optimization problem. Correspondingly, the upper-level problem will adjust the time allocation and the feasible waypoints using analytical gradients that are efficiently obtained via OptNet [8]. Compared with the upper-level problem formulation in [5], where only the time allocation is updated via analytical gradients, the extra computation time to obtain the gradient for the spatial update (of the feasible waypoints) is negligible. This comparison concludes the major reason why our formulation can significantly reduce the computation time of that in [5], where we validated the comparison in simulations with a scalability test. While both methods minimize the cost to a similar level, our approach’s computation time scales linearly with the number of waypoints, whereas the one by [5] scales exponentially.

Our contributions are summarized as follows: 1. We reformulate the bilevel optimization in [5] by excluding the inequality constraints in the lower-level problem, which leads to drastically reduced computation time for an optimal solution. 2. We use analytical gradients in the upper-level problem to simultaneously update the spatial and temporal assignment, which is more accurate and efficient than using finite-difference methods or other numerical approximations.

The remainder of the paper is organized as follows: Section II reviews the related work. Section III introduces the background of trajectory planning and the existing bilevel optimization formulation. Section IV shows our formulation of the bilevel optimization, with simulations results given in Section V showing the advantage of our method in drastically reducing the computation time. Finally, Section VI summarizes the paper and discusses future work.

A general introduction to trajectory planning is provided in [1]. The authors of [2] first introduce the method of minimum-snap that leverages quadrotors’ differential flatness and apply monomial polynomial parametrization to the planning trajectory. The method uses snap information to build the cost function and formulate the optimization problem in the form of a convex QP. The problem can be solved efficiently with off-the-shelf solvers, making it possible for real-time onboard motion planning. Moreover, the QP formulation permits more features of trajectory planning to be integrated using equality or inequality constraints. Based on this formulation, the authors of [7] replace monomial polynomials with Bernstein polynomials to confine the whole trajectory in a convex hull with safety assurance. Another variant uses B-spline instead of monomial polynomials to include curvature constraints on the planned trajectory [9].

As mentioned above, trajectory planning in the form of piecewise polynomials is a well-studied problem when the temporal assignment is fixed. However, as temporal assignment determines the coefficients in the cost function, it affects the resultant optimal trajectory to a great extent. People have long found it difficult to efficiently obtain an optimal time allocation, especially when planning the trajectory in a complex environment. The authors of [10] formulate the problem in the QP form with a time penalty and conduct gradient descent to iteratively find a better time allocation. However, the computation time of the method becomes the major obstacle to widely using it in real-time trajectory planning. Heuristics methods [3] have also been used to achieve better time allocation. Notwithstanding, such methods do not guarantee optimal time allocation and can lead to suboptimal trajectories. Another strategy to find optimal time allocation is to decouple trajectory planning into two problems: a spatial one and a temporal one, and then make use of convex second-order conic program (SOCP) reformulation to find optimal time allocation [4]. However, this method may be infeasible when the initial and final states are not static.

A bilevel optimization is a type of mathematical program where one optimization problem (the lower-level optimization problem) is embedded inside another optimization problem (the upper-level optimization problem), and the lower-level optimization problem is constrained by the upper-level optimization problem [11]. The hierarchical nature of bilevel or multi-level optimization makes it rather appropriate for solving complex optimization problems with more than one level, i.e., the coefficients of the lower level’s optimization problem are dependent on the upper level’s optimization results. In robotics, bilevel optimization has been deployed in optimal control [12] and various kinds of robots’ motion planning [13, 14, 15]. Sun et al. [5] present a method using bilevel optimization to conduct trajectory optimization with optimal time allocation. The time allocation is solved in the upper-level problem and passed to the lower-level one as a parameter. The analytical gradients of the optimal cost with respect to the time allocation are obtained with Karush–Kuhn–Tucker (KKT) conditions of the lower-level problem. Using the analytical gradient with line search, the cost in the upper-level problem is reduced, and the updated time allocation updates the parameters in the lower-level problem to generate the trajectory. In the bilevel optimization problem, the derivation of gradients is crucial. One method of obtaining gradients is by analyzing the sensitivity of the optimization problem [16]. A similar approach for obtaining the analytical gradient of the solution of a QP with respect to the QP’s parameters via implicit differentiation over the KKT condition is shown in [8].

We use piecewise polynomials of time to parameterize the UAV trajectories. Since the differential flatness of UAVs’ dynamics excludes the necessity of explicitly enforcing dynamics (e.g., quadrotors [2]), we can characterize the trajectories as a smooth curve in the space of flat outputs $\bm{\sigma}:[t_{0},t_{m}]\rightarrow\mathbb{R}^{3}\times SO(2)$ as $\bm{\sigma}(t)=[x(t),y(t),z(t),\psi(t)]^{\top}$ , which contains the coordinates of the vehicle’s center of mass in the world coordinate system $\bm{r}=[x,y,z]^{\top}$ and yaw angle $\psi$ . Consider $m$ piecewise polynomials that characterize the entire trajectory, where each piece is an $n$ -th order polynomial, i.e., for $i\in\{1,2,\dots,m\}$ ,

\bm{\sigma}_{i}(t)=\sum_{k=0}^{n}\bm{\sigma}_{ik}t^{k},\quad t\in[t_{i-1},t_{i}),\ \bm{\sigma}_{ik}\in\mathbb{R}^{4}.

(1)

The problem is to find the polynomial coefficients $\{\bm{\sigma}_{ik}\}_{i\in\mathcal{I},k\in\mathcal{K}}$ and the temporal assignment $\bm{T}=[t_{1},t_{2},\dots,t_{m-1}]^{\top}$ such that the following cost function is minimized

\sum_{i=0}^{m-1}\int_{t_{i}}^{t_{i+1}}\mu_{r}\lVert\frac{\text{d}^{k_{r}}\bm{r}(t)}{\text{d}t^{k_{r}}}\rVert^{2}+\mu_{\psi}\lVert\frac{\text{d}^{k_{\psi}}\psi(t)}{\text{d}t^{k_{\psi}}}\rVert^{2}\text{d}t,

(2)

where $k_{r}$ and $k_{\psi}$ refer to the order of derivative of the coordinate $\bm{r}$ and yaw angle $\psi$ , respectively. For the minimum-snap cost in [2], $k_{r}=4$ and $k_{\psi}=2$ are used. The constraints include convex set $\mathcal{C}_{i}\subseteq\mathbb{R}^{3}$ of adjustable waypoints at time instances, i.e.,

\bm{r}_{1}(t_{0})\in\mathcal{C}_{0},\ \bm{r}_{i}(t_{i})\in\mathcal{C}_{i},\ i\in\{1,2,\dots,m\}.

(3)

The sets $\{\mathcal{C}_{i}\}_{i=0:m}$ serve as one type of safety constraint, e.g., each set can be a corner connecting two corridors such that the UAV must pass through the set to ensure safety. (The usage of Bernstein polynomials can characterize the corridors as safety sets [7, 5], which is out of the scope of this paper and will not be discussed.)

Without loss of generality, the feasible sets $\mathcal{C}_{i}$ ’s are polytopes, yielding simple characterization by linear inequalities. The continuity between adjacent polynomials is enforced by

\frac{\text{d}^{k_{1}}\bm{r}_{i}(t_{i})}{\text{d}t^{k_{1}}}=\frac{\text{d}^{k_{1}}\bm{r}_{i+1}(t_{i})}{\text{d}t^{k_{1}}},\ \frac{\text{d}^{k_{2}}\psi_{i}(t_{i})}{\text{d}t^{k_{2}}}=\frac{\text{d}^{k_{2}}\psi_{i+1}(t_{i})}{\text{d}t^{k_{2}}},

(4)

for $i\in\{1,2,\dots,m-1\}$ , $k_{1}\in\{0,1,\dots,k_{r}\}$ , and $k_{2}\in\{0,1,\dots,k_{\psi}\}$ . Finally, the temporal constraint requires the time allocation to satisfy

\bm{T}\in\mathcal{T}=\{[t_{1},\dots,t_{m-1}]^{\top}|t_{0}<t_{1}<\dots<t_{m-1}<t_{m}\}.

(5)

The optimization problem seeks to find the polynomial coefficients $\bm{\sigma}$ and temporal assignment $\bm{T}$ that minimize the objective function in (2) subject to the constraints in (3)–(5). Since polynomials are highly structured basis functions with coefficients captured by $\bm{\sigma}$ , the problem can be conveniently presented as

$\displaystyle\underset{\bm{\sigma},\bm{T}\in\mathcal{T}}{\text{minimize}}$	$\displaystyle J(\bm{\sigma},\bm{T})=\bm{\sigma}^{\top}P(\bm{T})\bm{\sigma}+\bm{\sigma}^{\top}\bm{q}(\bm{T})$	(P)
subject to	$\displaystyle G(\bm{T})\bm{\sigma}\preceq\bm{h},$
$\displaystyle A(\bm{T})\bm{\sigma}=\bm{b},$

where $\bm{\sigma}$ denotes a permutation of all polynomial coefficients $\{\bm{\sigma}_{ik}\}_{i\in\mathcal{I},k\in\mathcal{K}}$ . The inequality constraint $G(\bm{T})\bm{\sigma}\preceq\bm{h}$ associates with the safety-set constraint (3), whereas the equality constraint $A(\bm{T})=\bm{b}$ associates with the continuity constraints (4). All the coefficients $P(\bm{T})$ , $\bm{q}(\bm{T})$ , $G(\bm{T})$ , $\bm{h}$ , $A(\bm{T})$ , and $\bm{b}$ are derived based on the objective function (2) and the constraints (3) and (4). The matrix $P(\bm{T})$ is positive semi-definite.

In general, the problem (P) is a non-convex problem. Therefore, bilevel optimization has been widely applied to solve the problem (P), e.g., [2, 10, 5]. The bilevel optimization has an upper-level problem:

		$\displaystyle\underset{\bm{T}\in\mathcal{T}}{\text{minimize}}$		$\displaystyle J(\bm{\sigma}^{*},\bm{T})$		(6)
		subject to		$\displaystyle\bm{\sigma}^{*}(\bm{T})\in\underset{\bm{\sigma}}{\text{argmin}}\{J(\bm{\sigma},\bm{T}):\bm{\sigma}\in\mathcal{F}\},$		(6)

where the $\mathcal{F}$ denotes the feasible set of $\bm{\sigma}$ such that $\mathcal{F}=\{\bm{\sigma}:G(\bm{T})\bm{\sigma}\preceq\bm{h},A(\bm{T})\bm{\sigma}=\bm{b}\}$ . Embedded in the upper-level problem, the lower-level problem is

$\displaystyle\underset{\bm{\sigma}}{\text{minimize}}$	$\displaystyle J(\bm{\sigma},\bm{T})$	(7)
subject to	$\displaystyle G(\bm{T})\bm{\sigma}\preceq\bm{h},$
$\displaystyle A(\bm{T})\bm{\sigma}=\bm{b}.$

Note that in the lower-level problem, the time assignment $\bm{T}$ is a parameter of the problem with a known value (obtained via solving (6)). Hence, (7) reduces to the minimum-snap formulation in [2] and turns into a quadratic program, which is convex and efficiently solvable. The challenges come from the upper-level optimization (6), which is non-convex. A typical approach in solving (6) is to use gradient descent, where the temporal assignment $\bm{T}$ is iteratively updated. In [2], the descent direction is obtained by computing the directional derivative along unit vectors. In [5], the descent direction is the gradient of the optimal lower-level cost $\nabla_{\bm{T}}J^{*}(\bm{\sigma}^{*}(\bm{T}),\bm{T})$ , which is computed as an analytical gradient by implicit differentiation of the KKT condition of the low-level problem (7).

In this section, we describe our reformulation of the bilevel-optimization problem based on [5]. Our reformulation can drastically reduce the computation time to that of [5] by reducing the computation time of the lower-level QP problem. Such a QP problem is reported in [5] to dominate the computation time despite being convex. We keep equality constraints only in the lower-level problem, which turns into an unconstrained problem that can be solved in a shorter time.

Our formulation starts by introducing a new optimization variable, which allows decomposing the inequality constraint $G(\bm{T})\bm{\sigma}\preceq\bm{h}$ into an equality constraint and an inequality constraint. Specifically, let the adjustable waypoint $\bm{\xi}_{i}\in\mathbb{R}^{3}$ be the new optimization variable such that $\bm{\xi}_{i}\in\mathcal{C}_{i}$ for $i\in\{0,1,\dots,m\}$ . Then the constraint (3) breaks down to

\bm{\xi}_{0}=\bm{r}_{1}(t_{0}),\quad\bm{\xi}_{0}\in\mathcal{C}_{0},\quad\bm{\xi}_{i}=\bm{r}_{i}(t_{i}),\quad\bm{\xi}_{i}\in\mathcal{C}_{i},

(8)

for $i\in\{1,\dots,m\}$ . Correspondingly, the inequality constraint $G(\bm{T})\bm{\sigma}\preceq\bm{h}$ turns into

C(\bm{T})\bm{\sigma}=\bm{\xi},\quad R\bm{\xi}\preceq\bm{s},

(9)

where $\bm{\xi}^{\top}=[\bm{\xi}_{0}^{\top}\ \bm{\xi}_{1}^{\top}\ \dots\ \bm{\xi}_{m}^{\top}]$ . The matrix $C(\bm{T})$ evaluates the positions of the trajectory on the allocated time instances in $\bm{T}$ , whereas the inequality $R\bm{\xi}\preceq\bm{s}$ is the half-space representation of the polytopes $\{\mathcal{C}_{i}\}_{i=0:m}$ . With the newly introduced variable and constraints, problem (P) turns into

The introduction of the new variable $\bm{\xi}$ and additional constraints preserves the equivalence between (P) and (AP): from a solution of one problem, a solution of the other is readily found, and vice versa [17]. Similar to (P), problem (AP) also can be solved using bilevel optimization. However, the formulation of (AP) permits only keeping the equality constraints in the lower-level optimization:

$\displaystyle\underset{\bm{\sigma}}{\text{minimize}}$	$\displaystyle J(\bm{\sigma},\bm{T})$	(LLP)
subject to	$\displaystyle C(\bm{T})\bm{\sigma}=\bm{\xi},$
$\displaystyle A(\bm{T})\bm{\sigma}=\bm{b},$

where the spatial and temporal assignments, $\bm{\xi}$ and $\bm{T}$ , are generated from the upper-level optimization:

		$\displaystyle\underset{\bm{\xi}\in\mathcal{X},\bm{T}\in\mathcal{T}}{\text{minimize}}$		$\displaystyle J(\bm{\sigma}^{*},\bm{T})$		(ULP)
		subject to		$\displaystyle\bm{\sigma}^{*}(\bm{\xi},\bm{T})\in\underset{\bm{\sigma}}{\text{argmin}}\{J(\bm{\sigma},\bm{T}):\bm{\sigma}\in\mathcal{F}(\bm{\xi},\bm{T})\}.$		(ULP)

Here, the set $\mathcal{F}(\bm{\xi},\bm{T})$ denotes the feasible set of $\bm{\sigma}$ such that $\mathcal{F}(\bm{\xi},\bm{T})=\{\bm{\sigma}:C(\bm{T})\bm{\sigma}=\bm{\xi},A(\bm{T})\bm{\sigma}=\bm{b}\}$ and $\mathcal{X}$ denotes the feasible waypoints such that $\mathcal{X}=\{\bm{\xi}:R\bm{\xi}\preceq\bm{s}\}$ .

Our formulation can be extended to include dynamic constraints of a UAV, e.g., velocity and acceleration limits. Such constraints can be cast as convex sets $\mathcal{C}_{i}^{v}$ and $\mathcal{C}_{i}^{a}$ for feasible velocities and accelerations, respectively. In other words, we have

\displaystyle\bm{v}_{1}(t_{0})\in\mathcal{C}_{0}^{v},\ \bm{v}_{i}(t_{i})\in\mathcal{C}_{i}^{v},\ \bm{a}_{1}(t_{0})\in\mathcal{C}_{0}^{a},\ \bm{a}_{i}(t_{i})\in\mathcal{C}_{i}^{a},

(10)

for $i\in\{1,2,\dots,m\}$ (analogous to (3) for the feasible waypoints), which can be augmented to the inequality constraints $G(\bm{T})\bm{\sigma}\preceq\bm{h}$ in (P). Subsequently, the decomposition in (9) will include the augmented velocity and acceleration assignments. Therefore, our design of lower- and upper-level problems also applies to the dynamic constraints, where the upper-level one determines the temporal allocation and augmented spatial assignments (for position, velocity, and acceleration), and the lower-level one determines the polynomial coefficients to meet these assignments.

A few notes follow our construction of the lower-upper-level problems. First, the lower-level optimization is still a convex problem: a quadratic program with equality constraints only. The insight here is to significantly reduce the computation time of the quadratic program by keeping the equality constraints only. Such a problem can be reduced to an equivalent unconstrained problem by introducing Lagrangian multipliers, which reduces to solving a system of linear equations [17]. Second, the upper-level problem, despite different forms than in [2, 10, 5], is generally hard to solve due to its non-convexity. Coordinate-descent may be applied to successively solve (LLP) and (ULP) in a row despite the fact that the computation time can be forbidden for real-time planning. Instead, we use gradient-descent on (ULP) to only update on $\bm{\xi}$ and $\bm{T}$ once (LLP) is solved. Consequently, the updated values of $\bm{\xi}$ and $\bm{T}$ are plugged into (LLP) as parameters. In order to update $\bm{\xi}$ and $\bm{T}$ , we need $\partial J^{*}/\partial\bm{\xi}$ and $\partial J^{*}/\partial\bm{T}$ to update $\bm{\xi}$ and $\bm{T}$ by

\bm{\xi}\leftarrow P_{\mathcal{X}}\left(\bm{\xi}-\alpha_{1}(\frac{\partial J^{*}}{\partial\bm{\xi}})^{\top}\right),\ \bm{T}\leftarrow P_{\mathcal{T}}\left(\bm{T}-\alpha_{2}(\frac{\partial J^{*}}{\partial\bm{T}})^{\top}\right),

(11)

where $\alpha_{1},\alpha_{2}>0$ are the user-selected step sizes and $P_{\mathcal{Y}}(y)$ denotes the projection operator [18] that projects $y$ onto the set $\mathcal{Y}$ . The step sizes $\alpha_{1}$ and $\alpha_{2}$ dominate the update of the spatial assignment and time allocation, respectively. Since the two variables $\bm{\xi}$ and $\bm{T}$ in space and time have totally different physical units, individual step sizes are needed for efficient parameter updates. For the gradient descent, the key is to obtain the partial derivatives $\partial J^{*}/\partial\bm{\xi}$ and $\partial J^{*}/\partial\bm{T}$ , which stand for the sensitivity of the optimal cost of (LLP) to $\bm{\xi}$ and $\bm{T}$ , respectively. We use OptNet [8] to obtain these two gradients. Specifically, the OptNet allows computing the analytical gradient of the optimal cost with respect to the coefficients of a QP (e.g., $P(\bm{T})$ , $\bm{q}(\bm{T})$ , $C(\bm{T})$ , $A(\bm{T})$ , $\bm{\xi}$ , and $\bm{b}$ in (LLP)) by implicitly differentiating the KKT condition. Using the method enabled by OptNet, we can directly obtain

\frac{\partial J^{*}}{\partial\bm{\xi}},\frac{\partial J^{*}}{\partial P(\bm{T})},\frac{\partial J^{*}}{\partial\bm{q}(\bm{T})},\frac{\partial J^{*}}{\partial C(\bm{T})},\frac{\partial J^{*}}{\partial A(\bm{T})}.

(12)

Using chain rule, we can obtain

	$\displaystyle\frac{\partial J^{*}}{\partial\bm{T}}=$	$\displaystyle\frac{\partial J^{}}{\partial P(\bm{T})}\frac{\partial P(\bm{T})}{\partial\bm{T}}+\frac{\partial J^{}}{\partial\bm{q}(\bm{T})}\frac{\partial\bm{q}(\bm{T})}{\partial\bm{T}}$
		$\displaystyle+\frac{\partial J^{}}{\partial C(\bm{T})}\frac{\partial C(\bm{T})}{\partial\bm{T}}+\frac{\partial J^{}}{\partial A(\bm{T})}\frac{\partial A(\bm{T})}{\partial\bm{T}},$		(13)

where $\frac{\partial P(\bm{T})}{\partial\bm{T}}$ , $\frac{\partial\bm{q}(\bm{T})}{\partial\bm{T}}$ , $\frac{\partial C(\bm{T})}{\partial\bm{T}}$ , $\frac{\partial A(\bm{T})}{\partial\bm{T}}$ are easy to compute since $P(\bm{T})$ , $\bm{q}(\bm{T})$ , $C(\bm{T})$ , $A(\bm{T})$ are explicit functions of $\bm{T}$ . We summarize our solution method to (AP) in Alg. 1.

Algorithm 1 Solution method of (AP)

1:Initial temporal assignment

\bm{T}_{0}

, initial spatial assignment

\bm{\xi}_{0}

, spatial step size

\alpha_{1}

, temporal step size

\alpha_{2}

, and termination condition

\mathcal{C}

2:Optimal spatial and temporal assignment

(\bm{\xi}^{*},\bm{T}^{*})

and polynomial coefficients

\bm{\sigma}^{*}

(\bm{\xi},\bm{T})\leftarrow(\bm{\xi}_{0},\bm{T}_{0})

4:while

\mathcal{C}

is FALSE do

5: Solve the (LLP) using a QP solver (with the parameters

P(\bm{T})

\bm{q}(\bm{T})

A(\bm{T})

\bm{b}

C(\bm{T})

\bm{\xi}

) to obtain the optimal solution

\bm{\sigma}^{*}

and optimal value

J^{*}

6: Obtain the gradients

\partial J^{*}/\partial\bm{\xi}

and

\partial J^{*}/\partial\bm{T}

using OptNet [8] and chain rule in (13)

7: Use the projected gradient descent in (11) to update

\bm{\xi}

and

\bm{T}

(\bm{\xi}^{*},\bm{T}^{*})\leftarrow(\bm{\xi},\bm{T})

9:end while

10:return

(\bm{\xi}^{*},\bm{T}^{*})

and

\bm{\sigma}^{*}

All the simulations shown in this section are implemented using MATLAB R2021b on a computer with a 2.20 GHz Intel i7-8750H CPU. We use quadprog as the QP solver, in which we choose the interior-point method instead of the active-set method in quadprog because the former provided better solution quality with shorter computation time than the latter in our simulations. This observation is consistent with the solver comparison shown in [5]. We compare our method with that of [5] (referred to as “compared method” in this section) in three experiments. In the first one, we fix the number of adjustable waypoints and show a breakdown of the computation time. In the second one, we conduct a scalability experiment to show how the computation time scales with the number of adjustable waypoints. In the third one, we fix the number of adjustable waypoints and include the dynamic constraints. Note that the number of adjustable waypoints equals the number of safety sets as indicated in (8).

We randomly select four waypoints, forming a path with a total length of 10 m. Initially, we allocate 2 s to each trajectory segment. The start and end points are fixed with velocity, acceleration, jerk, and snap set to 0. Two adjustable waypoints are constrained in safety sets $\mathcal{C}_{1}$ and $\mathcal{C}_{2}$ , which are cubes with 0.6 m side lengths centered at the initial waypoints. The scenario is illustrated in Fig. 3. Our method optimizes the time allocation and spatial assignment for the two adjustable waypoints, whereas the compared method only updates the time allocation (the adjustable waypoints are solved in their lower-level problem). The iterations terminate when the relative cost reduction between consecutive iterations is within 3%.

The hyperparameter choice of the spatial and temporal step sizes, $\alpha_{1}$ and $\alpha_{2}$ , largely determine the performance and computation time of the proposed method. We first investigate the optimal cost and total computation time subject to different combinations of $\alpha_{1}$ and $\alpha_{2}$ , where the results are shown in Figs. 1 and 2. It can be observed that when $\alpha_{1}\geq 0.2$ , increasing $\alpha_{1}$ does not lower computation time, and when $\alpha_{2}\leq 0.08$ , decreasing $\alpha_{2}$ does not reduce the optimal cost significantly. Therefore, we choose $\alpha_{1}=0.2$ and $\alpha_{2}=0.08$ to balance the optimality and computation time, where we use this combination of step sizes in the simulations next.

Refer to caption — Figure 1: Optimal costs of the trajectories planned by the proposed method under the combination of different step sizes $\alpha_{1}$ (spatial) and $\alpha_{2}$ (temporal)

Figure 3 shows the gradually updated trajectory in the first 20 iterations. Compared to the stretched initial trajectory, the final trajectory in iteration 20 is better shaped, indicating the benefit of simultaneously conducting spatial and temporal assignments. Meanwhile, the adjustable waypoints are contained within the cubic safety sets and close to the initial waypoints. The cost reduction is demonstrated in Fig. 4, where both the proposed method and compared method are initialized with identical temporal assignments and terminate on the 13th iteration according to the 3% relative cost reduction. It can be seen that both methods achieve similar optimal costs in a similar number of iterations.

The optimal cost and computation time of the two methods are shown in Table I. The comparison shows that the proposed method reaches a slightly smaller cost with a significant time reduction compared to [5] with the same iterations. We dissect the computation in each iteration into four parts and count the individual computation time. The averaged computation time and its standard deviation is shown in Table II. In the compared method, the computation time of solving QP fluctuates due to the inequality constraints in the QPs. Specifically, the change in the time coefficients dramatically affects the parameters of the QP problem, causing the number of iterations and computation time to vary in a wide range for solving the QP. Since the compared method updates the adjustable waypoints in the lower-level QP, the spatial update time is not applicable. It can be observed that the large gap in total computation time mainly comes from the discrepancy in the time of solving the QP in these two methods. Besides, we observe that the QP solver’s reported iterations are consistently 1 when running the proposed method, whereas the number could vary from 5 to 20 when running the compared method. These observations are consistent with our previous analysis that the merits of the proposed method mainly lie in the exclusion of inequality constraints when solving QP.

TABLE I: Comparison on the optimal costs and computation time till convergence

	Compared	Proposed
Optimal Cost	135.69	132.54
Total Computation Time [ms]	361.14	141.31

TABLE II: Breakdown of the averaged computation time and its standard deviation for one iteration. (U) and (L) associate with the upper- and lower-level problems, respectively.

Computation Time [ms]	Compared	Proposed
Gradients from OptNet (U)	5.21 $\pm$ 0.45	4.04 $\pm$ 0.03
Update QP Parameters (U)	1.75 $\pm$ 0.08	1.72 $\pm$ 0.12
Update Waypoints (U)	N/A	0.34 $\pm$ 0.03
Solving QP (L)	20.82 $\pm$ 17.39	4.77 $\pm$ 0.29
Total	27.78 $\pm$ 17.92	10.87 $\pm$ 0.47

We also study how the size of the safety sets affects the performance of the two methods. Keeping all other conditions fixed, we expand the side length of the two cubic safety sets to 0.9 m and 1.2 m. With new safety sets, we conduct the simulations and record both methods’ optimal cost and computation time, which are shown in Table III. The results show that for both methods, the optimal cost and total computation time decline remarkably when the side length of the safety sets expands from 0.6 m to 0.9 m. However, the values remain almost unchanged when the side length of safety sets expands from 0.9 m to 1.2 m. It occurs because the optimal waypoints are on the boundary of the cubes with 0.6 m side length, in contrast to those in the interior of the cubes with 0.9 m side length. In other words, the optimization constraints change from active to inactive when the side length of safety sets sides expands from 0.6 m to 0.9 m. Therefore, expanding from 0.6 m to 0.9 m lowers the cost and computation time dramatically. However, expanding from 0.9 m to 1.2 m does not have the same effect since the optimal waypoints are all in the interior of the safety sets. Regardless of the size of safety sets, the proposed method has the advantage of shorter computation time while achieving almost the same planning quality as the compared method for all three cases.

TABLE III: Comparison on optimal cost and computation time for safety sets with different sizes

Side Length of

Safety Sets [m]

Compared

Proposed

0.6

Optimal Cost

135.69

132.54

Comp. Time [ms]

361.14

141.31

0.9

Optimal Cost

98.53

99.16

Comp. Time [ms]

261.30

132.30

1.2

Optimal Cost

96.44

95.25

Comp. Time [ms]

283.62

159.48

We conduct the scalability experiment to test the scalability of the proposed method and also demonstrate the proposed method’s merit in computation time when planning more complex trajectories. In this experiment, we conduct a series of tests with the number of adjustable waypoints spanning from one to ten. The start and end waypoints are fixed at (0, 0, 0) and (10, 10, 10), respectively, with velocity, acceleration, jerk, and snap set to 0 at both waypoints. The adjustable waypoints are initialized as the vertices of a slalom path and are evenly distributed in the $z$ -axis (see Fig. 5 for an illustration with ten adjustable waypoints). Therefore, the planned trajectories twist and turn around the line connecting the start and end waypoints, causing a dramatic change in the cost as the number of adjustable waypoints increases (the cost increases from 20 to 833,536 as the waypoints increase from one to ten). The total time of the trajectory is 8 s. The initial time allocated to the adjustable waypoints is equally distributed in the 8 s interval. For both methods, the adjustable waypoints are constrained within a cube with 0.6 m side length centered at the initially selected waypoints. For all cases, the termination criterion is set to 5% relative reduction (which is met within 20 iterations for both methods). Figure 5 shows an example with ten adjustable waypoints. Similar to what has been observed in Fig. 3, the final trajectory in iteration 20 is better shaped and less stretched than the initial trajectory.

The total computation time (when the 5% relative cost reduction is reached) of the two methods scales similar to that of a single iteration shown in Table II. The time comparison is demonstrated in Fig. 6. It is clear that the proposed method has a great advantage compared with [5]: The computation time of the proposed method scales linearly with the number of adjustable waypoints, whereas the compared method in [5] scales exponentially. Specifically, we conduct a regression analysis of the computation time of the two methods with the safety set size of 0.6 m. Denote the computation time by $t$ . Then for $N\in\{1,2,\dots,10\}$ number of safety sets, the proposed method’s computation time fits $t=7.52N-3.42$ with an R-square score of 0.99 (the exponential regression $t=11.45\exp{(0.19N)}$ gives an R-score of 0.97, which has less fidelity than the linear fit). The compared method’s computation time fits $t=72.06\exp{(0.31N)}$ with an R-square score of 0.98 (the linear regression $t=170.32N-370.94$ gives an R-score of 0.93, which has less fidelity than the exponential fit). Furthermore, the standard deviations are shown in Fig. 6 by error bars, which indicate the more consistent computation time of the proposed method than the compared method.

The optimal costs of the two methods are shown in Fig. 7. The proposed method either reaches a smaller cost or is within 20% of the cost obtained by the compared method. Summarizing the comparison in Fig. 6 and Fig. 7, we conclude that the proposed method can deliver similar performance in the planned trajectory with the benefit of significantly reducing the computation time.

We further compare the two methods subject to larger safety sets. By expanding the side length of safety sets from 0.6 m to 1.2 m and keeping all other conditions unchanged as above, we display the average computation time in one iteration and optimal cost of both methods in Fig. 6 and Fig. 7. The results show that the computation time and the optimal cost of both methods decrease with the expansion of safety sets. Meanwhile, under larger safety sets, the proposed method still has the advantage of generating trajectories with similar quality while significantly reducing the computation time than the compared method.

We randomly select five waypoints, forming a path with a total length of 20 m. Initially, we allocate 2 s to each trajectory segment. The start and end points are fixed with velocity, acceleration, jerk, and snap set to 0. Three adjustable waypoints are constrained in the cubic safety sets with 0.6 m side lengths centered at the initial waypoints. Meanwhile, the maximum velocity and acceleration are set to 5 m/s and 5 m/s², respectively, on the three adjustable waypoints. The scenario is illustrated in Fig. 8.

Our method optimizes the time allocation and the augmented spatial assignment for the three adjustable waypoints in the upper-level problem, whereas the compared method updates the time allocation only in their upper-level problem and solves the adjustable waypoints velocity and acceleration subject to the maximum in their lower-level problem. The cost reduction is demonstrated in Fig. 9, where both the proposed method and compared method are initialized with identical temporal assignments. Under the termination condition of 3% relative cost reduction, the compared method terminates at the 6th iteration, and the proposed method terminates at the 25th iteration. Meanwhile, the proposed method’s optimal cost is within 3.5% of the cost obtained by the compared method.

Table IV shows the optimization results of two methods. Though the proposed method takes more iterations to terminate, the remarkable advantage in computation time of one iteration still makes the proposed method use much less time to plan a trajectory in contrast with the compared method, while a similar optimality is achieved in both methods. The breakdown of the average computation time for one iteration and their standard deviations are demonstrated in Table V. It is clear that the major gap between the two methods in computation time is caused by the gap in solving the lower-level QP. For each adjustable waypoint, the compared method needs to solve 18 inequality constraints while the proposed method delegates them to the upper-level problem and solves the lower-level problem with equality constraints only, reducing the time consumption significantly.

TABLE IV: Comparison on the optimal costs and computation time till convergence when the dynamic constraints are present.

	Compared	Proposed
Optimal Cost	1609.34	1662.36
Total Computation Time [ms]	4342.73	362.75

TABLE V: Breakdown of the averaged computation time and its standard deviation for one iteration when the dynamic constraints are present. (U) and (L) associate with the upper- and lower-level problems, respectively.

Computation Time [ms]	Compared	Proposed
Gradients from OptNet (U)	9.82 $\pm$ 1.49	5.56 $\pm$ 0.88
Update QP Parameters (U)	1.73 $\pm$ 0.21	2.41 $\pm$ 0.61
Update Waypoints (U)	N/A	0.35 $\pm$ 0.03
Update Dynamic Constraints(U)	N/A	0.38 $\pm$ 0.04
Solving QP (L)	608.84 $\pm$ 19.54	5.81 $\pm$ 0.29
Total	620.39 $\pm$ 21.24	14.51 $\pm$ 1.85

We present a novel method of UAV trajectory planning based on bilevel optimization by simultaneously conducting spatial and temporal assignments. Our bilevel optimization is composed of a lower-level problem that is a QP with equality constraints only (which contributes majorly to the reduced computation time) and an upper-level problem that uses analytical gradients to make the spatial and temporal update on the adjustable waypoints. Simulation results show that our method has great advantages in computation time compared to the existing bilevel optimization method, where the former scales linearly and the latter scales exponentially to the number of adjustable waypoints. Future work will deploy our method on a real quadrotor to demonstrate its capability for onboard and real-time trajectory planning in complex environments.

$\displaystyle\underset{\bm{\sigma},\bm{\xi},\bm{T}\in\mathcal{T}}{\text{minimize}}$

$\displaystyle J(\bm{\sigma},\bm{T})=\bm{\sigma}^{\top}P(\bm{T})\bm{\sigma}+\bm{\sigma}^{\top}\bm{q}(\bm{T})$

subject to

$\displaystyle R\bm{\xi}\preceq\bm{s},$

$\displaystyle C(\bm{T})\bm{\sigma}=\bm{\xi},$

$\displaystyle A(\bm{T})\bm{\sigma}=\bm{b}.$

[1] A. Gasparetto, P. Boscariol, A. Lanzutti, and R. Vidoni, “Path planning and trajectory planning algorithms: A general overview,” Motion and Operation Planning of Robotic Systems, pp. 3–27, 2015.
[2] D. Mellinger and V. Kumar, “Minimum snap trajectory generation and control for quadrotors,” in Proceedings of the IEEE International Conference on Robotics and Automation, 2011, pp. 2520–2525.
[3] S. Liu, M. Watterson, K. Mohta, K. Sun, S. Bhattacharya, C. J. Taylor, and V. Kumar, “Planning dynamically feasible trajectories for quadrotors using safe flight corridors in 3-D complex environments,” IEEE Robotics and Automation Letters, vol. 2, no. 3, pp. 1688–1695, 2017.
[4] F. Gao, W. Wu, J. Pan, B. Zhou, and S. Shen, “Optimal time allocation for quadrotor trajectory generation,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2018, pp. 4715–4722.
[5] W. Sun, G. Tang, and K. Hauser, “Fast UAV trajectory optimization using bilevel optimization with analytical gradients,” IEEE Transactions on Robotics, vol. 37, no. 6, pp. 2010–2024, 2021.
[6] J. Chen, T. Liu, and S. Shen, “Online generation of collision-free trajectories for quadrotor flight in unknown cluttered environments,” in Proceedings of the IEEE International Conference on Robotics and Automation. IEEE, 2016, pp. 1476–1483.
[7] F. Gao, W. Wu, Y. Lin, and S. Shen, “Online safe trajectory generation for quadrotors using fast marching method and Bernstein basis polynomial,” in Proceedings of the IEEE International Conference on Robotics and Automation, 2018, pp. 344–351.
[8] B. Amos and J. Z. Kolter, “OptNet: Differentiable optimization as a layer in neural networks,” in Proceedings of the International Conference on Machine Learning. PMLR, 2017, pp. 136–145.
[9] H. Kano and H. Fujioka, “B-spline trajectory planning with curvature constraint,” in Proceedings of the American Control Conference, 2018, pp. 1963–1968.
[10] C. Richter, A. Bry, and N. Roy, “Polynomial trajectory planning for aggressive quadrotor flight in dense indoor environments,” in Robotics Research. Springer, 2016, pp. 649–666.
[11] A. Sinha, P. Malo, and K. Deb, “A review on bilevel optimization: From classical to evolutionary approaches and applications,” IEEE Transactions on Evolutionary Computation, vol. 22, no. 2, pp. 276–295, 2018.
[12] B. Landry, Z. Manchester, and M. Pavone, “A differentiable augmented Lagrangian method for bilevel nonlinear optimization,” in Proceedings of Robotics: Science and Systems, 2019.
[13] T. Stouraitis, I. Chatzinikolaidis, M. Gienger, and S. Vijayakumar, “Online hybrid motion planning for dyadic collaborative manipulation via bilevel optimization,” IEEE Transactions on Robotics, vol. 36, no. 5, pp. 1452–1471, 2020.
[14] R. Menasri, A. Nakib, B. Daachi, H. Oulhadj, and P. Siarry, “A trajectory planning of redundant manipulators based on bilevel optimization,” Applied Mathematics and Computation, vol. 250, pp. 934–947, 2015.
[15] F. Farshidian, M. Neunert, A. W. Winkler, G. Rey, and J. Buchli, “An efficient optimal planning and control framework for quadrupedal locomotion,” in Proceedings of the IEEE International Conference on Robotics and Automation. IEEE, 2017, pp. 93–100.
[16] H. Pirnay, R. López-Negrete, and L. T. Biegler, “Optimal sensitivity based on IPOPT,” Mathematical Programming Computation, vol. 4, no. 4, pp. 307–331, 2012.
[17] S. Boyd, S. P. Boyd, and L. Vandenberghe, Convex Optimization. Cambridge university press, 2004.
[18] N. Parikh, S. Boyd et al., “Proximal algorithms,” Foundations and Trends® in Optimization, vol. 1, no. 3, pp. 127–239, 2014.