This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Optimizing Trajectories with Closed-Loop Dynamic SQP

Sumeet Singh1, Jean-Jacques Slotine1, Vikas Sindhwani1 1 Sumeet Singh, Jean-Jacques Slotine (MIT), and Vikas Sindhwani are with Robotics at Google, New York City, NY 10011, USA {ssumeet,jeanjacquess, sindhwani}@google.com
Abstract

Indirect trajectory optimization methods such as Differential Dynamic Programming (DDP) have found considerable success when only planning under dynamic feasibility constraints. Meanwhile, nonlinear programming (NLP) has been the state-of-the-art approach when faced with additional constraints (e.g., control bounds, obstacle avoidance). However, a naïve implementation of NLP algorithms, e.g., shooting-based sequential quadratic programming (SQP), may suffer from slow convergence – caused from natural instabilities of the underlying system manifesting as poor numerical stability within the optimization. Re-interpreting the DDP closed-loop rollout policy as a sensitivity-based correction to a second-order search direction, we demonstrate how to compute analogous closed-loop policies (i.e., feedback gains) for constrained problems. Our key theoretical result introduces a novel dynamic programming-based constraint-set recursion that augments the canonical “cost-to-go” backward pass. On the algorithmic front, we develop a hybrid-SQP algorithm incorporating DDP-style closed-loop rollouts, enabled via efficient parallelized computation of the feedback gains. Finally, we validate our theoretical and algorithmic contributions on a set of increasingly challenging benchmarks, demonstrating significant improvements in convergence speed over standard open-loop SQP.

I Introduction

Trajectory optimization forms the backbone of model-based optimal control with myriad applications in robot mobility and manipulation [1, 2, 3, 4, 5]. The problem formulation is as follows: consider a robotic system with state xnx\in\mathbb{R}^{n}, control input umu\in\mathbb{R}^{m}, subject to the discrete-time dynamics:

xk+1=f(xk,uk),k0.x_{k+1}=f(x_{k},u_{k}),\quad k\in\mathbb{N}_{\geq 0}. (1)

Let N>0N\in\mathbb{N}_{>0} be some fixed planning horizon. Given some initial state x0x_{0}, the trajectory optimization problem is as follows:

min𝒖,𝒙\displaystyle\min_{\bm{u},\bm{x}}\qquad k=0N1lk(xk,uk)+lN(xN)=:𝒥(𝒖,𝒙)\displaystyle\sum_{k=0}^{N-1}l_{k}(x_{k},u_{k})+l_{N}(x_{N})=:\mathcal{J}(\bm{u},\bm{x}) (2a)
s.t.\displaystyle\mathrm{s.t.}\qquad k=0,,N1:{xk+1=f(xk,uk)cku(uk)0ck+1x(xk+1)0,\displaystyle k=0,\ldots,N-1:\quad\begin{cases}x_{k+1}=f(x_{k},u_{k})\\ c_{k}^{u}(u_{k})\geq 0\\ c_{k+1}^{x}(x_{k+1})\geq 0\end{cases}, (2b)

where we use (𝒖,𝒙)(\bm{u},\bm{x}) to denote the concatenations (u0,,uN1)(u_{0},\ldots,u_{N-1}) and (x0,,xN)(x_{0},\ldots,x_{N}), respectively. Here, lk:n×m0l_{k}:\mathbb{R}^{n}\times\mathbb{R}^{m}\rightarrow\mathbb{R}_{\geq 0} is the running cost, lN:n0l_{N}:\mathbb{R}^{n}\rightarrow\mathbb{R}_{\geq 0} is the terminal cost, and ckx:nnx,cku:mnuc_{k}^{x}:\mathbb{R}^{n}\rightarrow\mathbb{R}^{n_{x}},c_{k}^{u}:\mathbb{R}^{m}\rightarrow\mathbb{R}^{n_{u}} are the vector-valued constraint functions on the state and control input. We assume that the control constraint encodes simple box constraints: u¯uku¯\underline{u}\leq u_{k}\leq\overline{u}, though, the results in this paper may be generalized beyond this assumption.

Solution methods generally fall into one of two approaches: optimal control-based (indirect methods), or optimization-based (direct methods). The former leverages necessary conditions of optimality for optimal control, such as dynamic programming (DP), while the latter treats the problem as a pure mathematical optimization program [6]. A further sub-categorization of the direct method distinguishes between a Full or a Condensed formulation, where the former treats both the states and controls as optimization variables, subject to dynamics equality constraints, while the latter optimizes only over the control variables, with the dynamics implicit.

Refer to caption
Figure 1: Motion planning using our methods for a planar quadrotor with attached pendulum, starting at rest with the pendulum in stable equilibrium (top-left, green), and ending at rest with the pendulum upright (bottom-right, red).

Lacking constraints beyond dynamic feasibility, ubiquitous indirect methods [7, 8, 9, 10] such as Differential Dynamic Programming (DDP) and its Gauss-Newton relaxation, iterative Linear Quadratic Regulator (iLQR) rely upon the DP recursion to split the full-horizon planning problem into a sequence of one-step optimizations, and alternate between a backward and forward pass through the time-steps. The backward pass recursively forms quadratic expansions of the optimal cost-to-go function and computes a time-varying affine perturbation policy that is subsequently rolled out through the system’s dynamics in the forward pass to yield the updated trajectory iterate. Under mild assumptions, DDP locally achieves quadratic convergence, and the proof relies upon establishing the close link between DDP and the Newton method, as applied to the condensed optimization-based formulation [9]. The stability properties of the underlying nonlinear system manifest as numerical stability during the optimization process, and hence the closed-loop nature of the forward pass in DDP typically leads to better performance [11] than Newton’s method, which implements open-loop rollouts.

Computing DDP-style closed-loop updates within the constrained setting (beyond dynamics feasibility) is much more challenging, since quadratization of the cost and linearization of the dynamics and constraints yields constrained quadratic programs (QPs) with piecewise-affine optimal perturbation policies [12], and complexity growing exponentially in the number of constraints and horizon of the problem. Consequently, direct methods (featuring open-loop updates) are the prevailing solution approach, typically combined with interior-point or SQP algorithms. Aside from possessing more variables, the direct formulation must additionally resolve dynamic feasibility, which can be non-trivial and lead to slower convergence even for unconstrained systems, as compared to indirect methods. Moreover, while the condensed formulation yields smaller problems, instabilities have been observed [13] due to divergences between the predicted step from the linearized dynamics, and the open-loop nonlinear rollout, leading to vanishing step-sizes.

Contributions: Re-interpreting the canonical DDP closed-loop rollout as a sensitivity-based correction to a second-order search direction, we demonstrate how to compute a locally affine approximation to the constrained perturbation policies, i.e., a set of feedback gains similar to those employed by the DDP rollouts. Our key theoretical result states that in the constrained setting, one must first compute an optimal perturbation sequence about the current trajectory iterate by solving a full-horizon QP (as opposed to the one-step DP backward recursion), and then augment the canonical cost-to-go backward pass with a constraint-set recursion. We then demonstrate how to approximate the desired feedback gains using an efficient, parallelized algorithm, eliminating the backward pass. The closed-loop rollout is integrated into an SQP line-search, yielding a hybrid indirect/direct algorithm that combines the theoretical foundations of SQP for constrained optimization with the algorithmic efficiencies of DDP-style forward rollouts. The method is rigorously evaluated within several environments, where we confirm significant convergence speed improvements over naïve (i.e., open-loop) SQP.

Related Work: Quasi-DDP methods for constrained trajectory optimization fall into one of three main categories: control-bounds only [14],[15], modified backward pass via KKT analysis [16, 17, 18, 19, 20], and augmented Lagrangian methods [21, 4, 3, 22, 23]. We provide a comprehensive overview of these approaches in Appendix A111All appendices referenced herein may be found in the online version of this work [24]..

II Shooting SQP

We detail below the core algorithmic steps for SQP, as applied to the shooting formulation of problem (2), i.e., where dynamics are treated implicitly and we optimize only over the control sequence 𝒖\bm{u}. The three steps are [25, 26]: (i) solving a QP sub-problem to compute a search direction, i.e., a sequence of control perturbations 𝜹𝒖=(δu0,,δuN1)\bm{\delta u}=(\delta u_{0},\ldots,\delta u_{N-1}), (ii) performing line-search along 𝜹𝒖\bm{\delta u} using a merit function, and (iii) monitoring termination conditions. We provide some details regarding (i) and (ii) here, and refer the reader to Appendices B and C for the rest.

Let ck(xk,uk)c_{k}(x_{k},u_{k}) denote the concatenation (ckx(xk),cku(uk))(c_{k}^{x}(x_{k}),c_{k}^{u}(u_{k})), k=0,,Nk=0,\ldots,N (where cNuc_{N}^{u} is null), and let yky_{k} denote the corresponding dual variable. Consider the Lagrangian at current primal-dual iterate (𝒖,𝒚)(\bm{u},\bm{y}):

(𝒖,𝒙,𝒚)=k=0N1lkykTck+lNyNTcN\mathcal{L}(\bm{u},\bm{x},\bm{y})=\sum_{k=0}^{N-1}l_{k}-y_{k}^{T}c_{k}+l_{N}-y_{N}^{T}c_{N} (3)

For brevity, we omit the explicit arguments (xk,uk)(x_{k},u_{k}) where possible. As we are only optimizing over 𝒖\bm{u}, let 𝒙[𝒖]\bm{x}[\bm{u}] represent the state trajectory starting at x0x_{0} obtained from propagating the open-loop control sequence 𝒖\bm{u} through the discrete-time dynamics in (1) for k=0,,N1k=0,\ldots,N-1. Define the “reduced” objective and Lagrangian as 𝒥R(𝒖)=𝒥(𝒖,𝒙[𝒖])\mathcal{J}_{R}(\bm{u})=\mathcal{J}(\bm{u},\bm{x}[\bm{u}]) and R(𝒖,𝒚):=(𝒖,𝒙[𝒖],𝒚)\mathcal{L}_{R}(\bm{u},\bm{y}):=\mathcal{L}(\bm{u},\bm{x}[\bm{u}],\bm{y}), respectively, and define the sets {δ𝒳k}k=1N\{\delta\mathcal{X}_{k}\}_{k=1}^{N} and {δ𝒰k}k=0N1\{\delta\mathcal{U}_{k}\}_{k=0}^{N-1} as δ𝒳k:={δx:ckx+Jkxδx0}\delta\mathcal{X}_{k}:=\{\delta x:c_{k}^{x}+J_{k}^{x}\delta x\geq 0\} and δ𝒰k:={δu:cku+Jkuδu0}\delta\mathcal{U}_{k}:=\{\delta u:c_{k}^{u}+J_{k}^{u}\delta u\geq 0\}. Then, the QP sub-problem takes the form:

min𝜹𝒖,𝜹𝒙\displaystyle\min_{\bm{\delta u},\bm{\delta x}}\quad 𝜹𝒖,𝒖𝒥R(𝒖)+12𝜹𝒖,𝒖2R(𝒖,𝒚)𝜹𝒖\displaystyle\langle\bm{\delta u},\nabla_{\bm{u}}\mathcal{J}_{R}(\bm{u})\rangle+\dfrac{1}{2}\langle\bm{\delta u},\nabla_{\bm{u}}^{2}\mathcal{L}_{R}(\bm{u},\bm{y})\bm{\delta u}\rangle (4a)
s.t.\displaystyle\mathrm{s.t.}\quad δxk+1=Akδxk+Bkδuk,k=0,,N1\displaystyle\delta x_{k+1}=A_{k}\delta x_{k}+B_{k}\delta u_{k},\ k=0,\ldots,N-1 (4b)
δukδ𝒰k,δxk+1δ𝒳k+1,k=0,,N1,\displaystyle\delta u_{k}\in\delta\mathcal{U}_{k},\quad\delta x_{k+1}\in\delta\mathcal{X}_{k+1},\ k=0,\ldots,N-1, (4c)

where δx0=0\delta x_{0}=0 and ,\langle\cdot,\cdot\rangle denotes the standard Euclidean dot product; (Ak,Bk)(A_{k},B_{k}) are the dynamics Jacobians (f/x,f/u)(xk,uk)(\partial f/\partial x,\partial f/\partial u)(x_{k},u_{k}), and (Jku,Jkx)(J_{k}^{u},J_{k}^{x}) are the constraint Jacobians (cku(uk)/u,ckx(xk)/x)(\partial c_{k}^{u}(u_{k})/\partial u,\partial c_{k}^{x}(x_{k})/\partial x). The sub-problem objective (4a) has the form:

k=0N1[qkTδxk+rkTδuk+12[δxkδuk]TZk[δxkδuk]]:=l~k(δxk,δuk)++12δxNTZNδxN+qNTδxN:=l~N(δxN)\begin{split}&\sum_{k=0}^{N-1}\overbrace{\left[q_{k}^{T}\delta x_{k}+r_{k}^{T}\delta u_{k}+\dfrac{1}{2}\begin{bmatrix}\delta x_{k}\\ \delta u_{k}\end{bmatrix}^{T}Z_{k}\begin{bmatrix}\delta x_{k}\\ \delta u_{k}\end{bmatrix}\right]}^{:=\tilde{l}_{k}(\delta x_{k},\delta u_{k})}+\\ &\quad+\underbrace{\dfrac{1}{2}\delta x_{N}^{T}Z_{N}\delta x_{N}+q_{N}^{T}\delta x_{N}}_{:=\tilde{l}_{N}(\delta x_{N})}\end{split} (5)

where the terms {qk,rk,Zk}\{q_{k},r_{k},Z_{k}\} are provided in Appendix C. Let (𝜹𝒖,𝜹𝒙)(\bm{\delta u}^{*},\bm{\delta x}^{*}) represent the optimal primal, and 𝒚^\hat{\bm{y}} the optimal inequality dual solutions for (4), and define the dual search direction 𝜹𝒚:=𝒚^𝒚\bm{\delta y}^{*}:=\hat{\bm{y}}-\bm{y}.

II-A Line-Search

For α(0,1]\alpha\in(0,1], define 𝒖[α]:=𝒖+α𝜹𝒖\bm{u}[\alpha]:=\bm{u}+\alpha\bm{\delta u}^{*} and 𝒙[𝒖[α]]:=𝒙[α]\bm{x}[\bm{u}[\alpha]]:=\bm{x}[\alpha]. The line-search merit function is defined as:

ϕ(α;ρ)=I(𝒖[α],𝒚+α𝜹𝒚,𝒔+α𝜹𝒔;ρ),𝒙=𝒙[α],\phi(\alpha;\rho)=\mathcal{M}_{I}(\bm{u}[\alpha],\bm{y}+\alpha\bm{\delta y}^{*},\bm{s}+\alpha\bm{\delta s^{*}};\rho),\ \bm{x}=\bm{x}[\alpha], (6)

where I\mathcal{M}_{I} is the augmented Lagrangian function for problem (2), 𝒔\bm{s} is a vector of slack variables for the inequality constraints with search direction 𝜹𝒔\bm{\delta s}^{*}, introduced solely for the line-search, and ρ={ρk}k=0N\rho=\{\rho_{k}\}_{k=0}^{N} is a set of penalty parameters. Please see Appendix C for details on {𝒔,𝜹𝒔,ρ}\{\bm{s},\bm{\delta s},\rho\}.

III Dynamic Programming SQP

Implicit within the line-search is the open-loop rollout along the search direction 𝜹𝒖\bm{\delta u}^{*}, i.e., 𝒙[α]=𝒙[𝒖+α𝜹𝒖]\bm{x}[\alpha]=\bm{x}[\bm{u}+\alpha\bm{\delta u}^{*}]. For unstable nonlinear systems, this state trajectory may differ significantly from 𝒙+𝜹𝒙\bm{x}+\bm{\delta x}^{*}, the “predicted” sequence from the QP sub-problem, forcing the line-search to take sub-optimal step-sizes and slowing convergence. This observation is corroborated in [11] in context of comparing DDP and Newton methods for unconstrained problems, and within [13] in the constrained context. Our objective therefore, is to efficiently compute a set of feedback gains to perform DDP-style closed-loop rollouts within the SQP line-search. We hypothesize that such an enhancement will (i) improve the numerical stability of the line-search, and (ii) accelerate convergence of Shooting SQP. We first demonstrate how the classical DP recursion is ill-posed in the context of constrained trajectory optimization, and propose a correction inspired from sensitivity analysis.

III-A Sensitivity-Based Dynamic Programming

The starting point for the derivation of iLQR and DDP algorithms for unconstrained problems is with the Bellman form of the optimal cost-to-go function:

Vk(x):=minπk[lk(x,πk(x))+Vk+1(f(x,πk(x)))],k=0,,N1VN(x):=lN(x),\small\begin{split}V_{k}(x)&:=\min_{\pi_{k}}\left[l_{k}(x,\pi_{k}(x))+V_{k+1}(f(x,\pi_{k}(x)))\right],\ k=0,\ldots,N-1\\ V_{N}(x)&:=l_{N}(x),\end{split}

where πk:nm\pi_{k}:\mathbb{R}^{n}\rightarrow\mathbb{R}^{m} is a policy for time-step kk, mapping states to controls. For a non-optimal state-control sequence (𝒙,𝒖)(\bm{x},\bm{u}), consider the local expansion of the optimal cost-to-go function:

δVk(δxk):=Vk(xk+δxk)=minδπk[lk(xk+δxk,uk+δπk(δxk))+δVk+1(δxk+1):=Qk(δxk,δπk)],\begin{split}&\delta V_{k}(\delta x_{k}):=V_{k}(x_{k}+\delta x_{k})\\ &\quad=\min_{\delta\pi_{k}}\left[\underbrace{l_{k}(x_{k}+\delta x_{k},u_{k}+\delta\pi_{k}(\delta x_{k}))+\delta V_{k+1}(\delta x_{k+1})}_{:=Q_{k}(\delta x_{k},\delta\pi_{k})}\right],\end{split}

where δxk+1=f(xk+δxk,uk+δπk(δxk))xk+1\delta x_{k+1}=f(x_{k}+\delta x_{k},u_{k}+\delta\pi_{k}(\delta x_{k}))-x_{k+1}, and δπk:nm\delta\pi_{k}:\mathbb{R}^{n}\rightarrow\mathbb{R}^{m} is a perturbation policy for time-step kk at xkx_{k}, as a function of δxk\delta x_{k}. Now, by recursively (in a backward pass) taking quadratic approximations of the state-action variation function QkQ_{k} about (δxk,δuk)=(0,0)(\delta x_{k},\delta u_{k})=(0,0), one can solve for an affine approximation to the minimizing perturbation policy. In particular, let Q˘k\breve{Q}_{k} represent the quadratic approximation of QkQ_{k}, and define δπ˘k(δxk):=argminδuQ˘k(δxk,δu)\delta\breve{\pi}_{k}^{*}(\delta x_{k}):=\operatornamewithlimits{\mathrm{argmin}}_{\delta u}\breve{Q}_{k}(\delta x_{k},\delta u). Then δπ˘k(δxk)=δu¯k+Kkδxk\delta\breve{\pi}_{k}^{*}(\delta x_{k})=\overline{\delta u}_{k}+K_{k}\delta x_{k}. For step-length α\alpha, this perturbation policy is rolled out to obtain the new trajectory iterate:

δxk+1[α]=f(xk+δxk[α],uk+δuk[α])xk+1,δuk[α]=αδu¯k+Kkδxk[α],α(0,1],\begin{split}\delta x_{k+1}[\alpha]&=f(x_{k}+\delta x_{k}[\alpha],u_{k}+\delta u_{k}[\alpha])-x_{k+1},\\ \delta u_{k}[\alpha]&=\alpha\overline{\delta u}_{k}+K_{k}\delta x_{k}[\alpha],\quad\alpha\in(0,1],\end{split} (7)

where δx0[α]=0\delta x_{0}[\alpha]=0. Notice that one may interpret the terms of the unconstrained perturbation policy δπ˘k\delta\breve{\pi}_{k}^{*} as follows:

δu¯k:=δπ˘k(0),Kk:=δπ˘k(0)δxk.\overline{\delta u}_{k}:=\delta\breve{\pi}_{k}^{*}(0),\quad K_{k}:=\dfrac{\partial\delta\breve{\pi}_{k}^{*}(0)}{\partial\delta x_{k}}. (8)
Remark 1.

Since δπ˘k(δxk)\delta\breve{\pi}^{*}_{k}(\delta x_{k}) is the solution of an unconstrained convex quadratic, the argument 0 is redundant for the sensitivity matrix KkK_{k}. This will not be the case in the constrained setting.

Consider now the constrained setting, and define for k{0,,N1}k\in\{0,\ldots,N-1\}:

δπk(δxk):=argminδukδ𝒰kl~k(δxk,δuk)+δVk+1(δxk+1)\small\delta\pi_{k}^{*}(\delta x_{k}):=\operatornamewithlimits{\mathrm{argmin}}_{\delta u_{k}\in\delta\mathcal{U}_{k}}\quad\tilde{l}_{k}(\delta x_{k},\delta u_{k})+\delta V_{k+1}(\delta x_{k+1}) (9)

where l~k\tilde{l}_{k} is the stage-kk term in (5), δxk+1=Akδxk+Bkδuk\delta x_{k+1}=A_{k}\delta x_{k}+B_{k}\delta u_{k}, and δVk+1\delta V_{k+1} is the optimal “cost-to-go” for problem (4). That is, for k+1=Nk+1=N, δxNδ𝒳N\delta x_{N}\in\delta\mathcal{X}_{N}, δVN(δxN)=l~N(δxN)\delta V_{N}(\delta x_{N})=\tilde{l}_{N}(\delta x_{N}), while for k+1{1,,N1}k+1\in\{1,\ldots,N-1\}, δxk+1δ𝒳k+1\delta x_{k+1}\in\delta\mathcal{X}_{k+1}, δVk+1(δxk+1)\delta V_{k+1}(\delta x_{k+1}) is the optimal value of the tail-truncation of QP sub-problem (4), starting at time-step k+1k+1 at δxk+1\delta x_{k+1}.

Notice that since δVk+1\delta V_{k+1} is the optimal value of a constrained QP, δπk(0)\delta\pi_{k}^{*}(0) and the sensitivity matrix δπk(0)/δxk\partial\delta\pi_{k}^{*}(0)/\partial\delta x_{k} (paralleling the terms defined in (8)) may be ill-defined, for instance when the tail sub-problem is infeasible at δxk=0\delta x_{k}=0. This is a consequence of the linearized constraints, irrespective of the objective function used to define the DP recursion.

Instead, consider the following equivalent re-arrangement of the unconstrained DDP control law in (7):

δuk[α]=αδu¯k+Kkδxk[α]=αδu¯k+KkαδxkL+Kk(δxk[α]αδxkL)=αδukL+Kk(δxk[α]αδxkL)\small\begin{split}\delta u_{k}[\alpha]&=\alpha\overline{\delta u}_{k}+K_{k}\delta x_{k}[\alpha]\\ &=\alpha\overline{\delta u}_{k}+K_{k}\alpha\delta x_{k}^{L}+K_{k}(\delta x_{k}[\alpha]-\alpha\delta x_{k}^{L})\\ &=\alpha\delta u_{k}^{L}+K_{k}(\delta x_{k}[\alpha]-\alpha\delta x_{k}^{L})\end{split} (10)

where, the sequence (𝜹𝒙L,𝜹𝒖L)(\bm{\delta x}^{L},\bm{\delta u}^{L}) is defined by the rollout of δπ˘k\delta\breve{\pi}^{*}_{k} via the linearized dynamics:

δxk+1L=AkδxkL+BkδukL,δx0L=0δukL:=δπ˘k(δxkL).\begin{split}\delta x_{k+1}^{L}&=A_{k}\delta x_{k}^{L}+B_{k}\delta u_{k}^{L},\quad\delta x_{0}^{L}=0\\ \delta u_{k}^{L}&:=\delta\breve{\pi}^{*}_{k}(\delta x_{k}^{L}).\end{split} (11)

In light of the homogeneity of the above recursion (i.e., the sequence α𝜹𝒖L\alpha\bm{\delta u}^{L} rolled out via the linear dynamics yields α𝜹𝒙L\alpha\bm{\delta x}^{L}), eq. (10) suggests interpreting 𝜹𝒖L\bm{\delta u}^{L} as a search-direction, α𝜹𝒖L\alpha\bm{\delta u}^{L} as the search step, and the feedback term as a sensitivity-based correction. Thus, we may interpret the DDP rollout as a local sensitivity-based correction to the Newton search direction (𝛅𝐮L\bm{\delta u}^{L}). Generalizing this interpretation to the constrained setting, consider the following control law:

δuk[α]=clipu¯uku¯uk[αδπk(δxkL)+δπk(αδxkL)δxk(δxk[α]αδxkL)],\delta u_{k}[\alpha]=\mathrm{clip}_{\underline{u}-u_{k}}^{\overline{u}-u_{k}}\bigg{[}\alpha\delta\pi_{k}^{*}(\delta x_{k}^{L})+\dfrac{\partial\delta\pi_{k}^{*}(\alpha\delta x_{k}^{L})}{\partial\delta x_{k}}(\delta x_{k}[\alpha]-\alpha\delta x_{k}^{L})\bigg{]},

where similarly to (11), 𝜹𝒙L\bm{\delta x}^{L} is obtained from rolling-out {δπk(δxkL)}k=0N1\{\delta\pi_{k}^{*}(\delta x_{k}^{L})\}_{k=0}^{N-1} through the linearized dynamics. Now, it follows from Bellman’s principle of optimality that δπk(δxk)=δuk\delta\pi_{k}^{*}(\delta x_{k}^{*})=\delta u_{k}^{*}, where (𝜹𝒙,𝜹𝒖)(\bm{\delta x},\bm{\delta u})^{*} are the optimal solution to (4). Further since δx0L=δx0=0\delta x_{0}^{L}=\delta x_{0}^{*}=0, it follows inductively that 𝜹𝒙L=𝜹𝒙\bm{\delta x}^{L}=\bm{\delta x}^{*}. Thus, the final control law for δuk[α]\delta u_{k}[\alpha] becomes

clipu¯uku¯uk[αδuk+δπk(αδxk)δxk(δxk[α]αδxk)].\mathrm{clip}_{\underline{u}-u_{k}}^{\overline{u}-u_{k}}\bigg{[}\alpha\delta u_{k}^{*}+\dfrac{\partial\delta\pi^{*}_{k}(\alpha\delta x_{k}^{*})}{\partial\delta x_{k}}(\delta x_{k}[\alpha]-\alpha\delta x_{k}^{*})\bigg{]}. (12)

Given that δπk\delta\pi^{*}_{k}, as defined in (9), is implicitly the solution of a variable-horizon optimization, it is computationally prohibitive to compute the sensitivity matrices above via explicit differentiation. Instead, we next define a DP recursion to exactly compute these sensitivities about the fixed sequence 𝜹𝒙\bm{\delta x}^{*}.

Remark 2.

Since problem (9) is a multi-parametric QP in δxk\delta x_{k}, δπk\delta\pi^{*}_{k} is a piecewise-affine function of δxk\delta x_{k} [12]. Thus, for α=1\alpha=1, the expression inside the brackets in (12) lies within δ𝒰k\delta\mathcal{U}_{k} only for δxk[1]\delta x_{k}[1] in a local region around δxk\delta x_{k}^{*}, thereby necessitating the clipping operation (i.e., projection onto δ𝒰k\delta\mathcal{U}_{k}).

III-B DP Recursion for Computing Sensitivity Gains

We outline the DP recursion first and characterize its correctness in Theorem 1.

Initialization: Set PN=ZN,pN=qN,vN=0,GNcr=JNx,hNcr=cNxP_{N}=Z_{N},p_{N}=q_{N},v_{N}=0,G^{\mathrm{cr}}_{N}=-J_{N}^{x},h^{\mathrm{cr}}_{N}=c_{N}^{x}.

Time-step k=N1,,0k=N-1,\ldots,0: Define the function δV~k+1(δxk+1):=(1/2)δxk+1TPk+1δxk+1+pk+1Tδxk+1+vk+1\delta\tilde{V}_{k+1}(\delta x_{k+1}):=(1/2)\delta x_{k+1}^{T}P_{k+1}\delta x_{k+1}+p_{k+1}^{T}\delta x_{k+1}+v_{k+1}. For δxkδ𝒳k\delta x_{k}\in\delta\mathcal{X}_{k}, consider the one-step QP:

minδukδxk+1l~k(δxk,δuk)+δV~k+1(δxk+1)s.t.δxk+1=Akδxk+Bkδuk[Jku00Gk+1cr]:=G¯k[δukδxk+1][ckuhk+1cr]:=h¯k.\small\begin{split}\min_{\begin{subarray}{c}\delta u_{k}\\ \delta x_{k+1}\end{subarray}}\quad&\tilde{l}_{k}(\delta x_{k},\delta u_{k})+\delta\tilde{V}_{k+1}(\delta x_{k+1})\\ \mathrm{s.t.}\quad&\delta x_{k+1}=A_{k}\delta x_{k}+B_{k}\delta u_{k}\\ &\underbrace{\begin{bmatrix}-J_{k}^{u}&0\\ 0&G^{\mathrm{cr}}_{k+1}\end{bmatrix}}_{:=\bar{G}_{k}}\begin{bmatrix}\delta u_{k}\\ \delta x_{k+1}\end{bmatrix}\leq\begin{bmatrix}c^{u}_{k}\\ h^{cr}_{k+1}\end{bmatrix}:=\bar{h}_{k}.\end{split} (13)

Let δπ^k(δxk)\delta\hat{\pi}^{*}_{k}(\delta x_{k}) and y^k(δxk)\hat{y}^{*}_{k}(\delta x_{k}) denote the optimal control perturbation and inequality dual solutions for the above one-step QP, as a function of δxk\delta x_{k}. Define the sensitivity matrices KkuK_{k}^{u} and KkyK_{k}^{y} as the Jacobians  [27] of δπ^k\delta\hat{\pi}^{*}_{k} and y^k\hat{y}^{*}_{k} respectively, evaluated at δxk\delta x_{k}^{*}, and define the affine functions:

δπ^k,a(δxk)\displaystyle\delta\hat{\pi}^{*}_{k,a}(\delta x_{k}) =δπ^k(δxk)+Kku(δxkδxk)\displaystyle=\delta\hat{\pi}^{*}_{k}(\delta x_{k}^{*})+K_{k}^{u}(\delta x_{k}-\delta x_{k}^{*}) (14a)
y^k,a(δxk)\displaystyle\hat{y}^{*}_{k,a}(\delta x_{k}) =y^k(δxk)+Kky(δxkδxk).\displaystyle=\hat{y}^{*}_{k}(\delta x_{k}^{*})+K_{k}^{y}(\delta x_{k}-\delta x_{k}^{*}). (14b)

Recurse: Compute:

Gkcr:=[JkxG¯k[KkuA¯k]Kky],hkcr=[ckxh¯kG¯k[IBk]δπ^k,a(0)y^k,a(0)],\small G^{\mathrm{cr}}_{k}:=\begin{bmatrix}-J_{k}^{x}\\ \bar{G}_{k}\begin{bmatrix}K^{u}_{k}\\ \bar{A}_{k}\end{bmatrix}\\ -K_{k}^{y}\end{bmatrix},\ h^{\mathrm{cr}}_{k}=\begin{bmatrix}c_{k}^{x}\\ \bar{h}_{k}-\bar{G}_{k}\begin{bmatrix}I\\ B_{k}\end{bmatrix}\delta\hat{\pi}^{*}_{k,a}(0)\\ \hat{y}^{*}_{k,a}(0)\end{bmatrix}, (15)

where A¯k:=Ak+BkKku\bar{A}_{k}:=A_{k}+B_{k}K_{k}^{u}, and:

r¯k\displaystyle\bar{r}_{k} =rk+BkTpk+1R¯k=Zk,uu+BkTPk+1Bk\displaystyle=r_{k}+B_{k}^{T}p_{k+1}\quad\bar{R}_{k}=Z_{k,uu}+B_{k}^{T}P_{k+1}B_{k} (16a)
M¯k\displaystyle\bar{M}_{k} =Zk,xu+AkTPk+1Bk\displaystyle=Z_{k,xu}+A_{k}^{T}P_{k+1}B_{k} (16b)
pk\displaystyle p_{k} =qk+AkTpk+1+KkuTr¯k+(KkuTR¯k+M¯k)δπ^k,a(0)\displaystyle=q_{k}+A_{k}^{T}p_{k+1}+K_{k}^{u^{T}}\bar{r}_{k}+(K_{k}^{u^{T}}\bar{R}_{k}+\bar{M}_{k})\delta\hat{\pi}^{*}_{k,a}(0) (16c)
Pk\displaystyle P_{k} =Zk,xx+AkTPk+1Ak+KkuTR¯kKkuT+\displaystyle=Z_{k,xx}+A_{k}^{T}P_{k+1}A_{k}+K_{k}^{u^{T}}\bar{R}_{k}K_{k}^{u^{T}}+
+M¯kKku+KkuTM¯kT\displaystyle\quad+\bar{M}_{k}K_{k}^{u}+K_{k}^{u^{T}}\bar{M}_{k}^{T} (16d)
vk\displaystyle v_{k} =r¯kTδπ^k,a(0)+12δπ^k,a(0)TR¯kδπ^k,a(0)+vk+1.\displaystyle=\bar{r}_{k}^{T}\delta\hat{\pi}^{*}_{k,a}(0)+\dfrac{1}{2}\delta\hat{\pi}^{*}_{k,a}(0)^{T}\bar{R}_{k}\delta\hat{\pi}^{*}_{k,a}(0)+v_{k+1}. (16e)

We now characterize the correctness of this DP recursion in the following theorem; the proof is provided in Appendix D.

Theorem 1.

Suppose that for each k{N1,,0}k\in\{N-1,\ldots,0\}, the solution of the one-step QP in (13) at δxk=δxk\delta x_{k}=\delta x_{k}^{*} satisfies Linear Independent Constraint Qualification (LICQ). Then, the recursion in (15)–(16) is well-defined. Define the sets 𝒞k:{δxk:Gkcrδxkhkcr}\mathcal{CR}_{k}:\{\delta x_{k}:G^{\mathrm{cr}}_{k}\delta x_{k}\leq h^{\mathrm{cr}}_{k}\}, for k=N,,0k=N,\ldots,0. It holds that:

δxk𝒞kk=N,,0and{δxk𝒞k:δπk(δxk)=δπ^k,a(δxk)δxk+1𝒞k+1:δVk+1(δxk+1)=δV~k+1(δxk+1),\begin{split}&\delta x_{k}^{*}\in\mathcal{CR}_{k}\ k=N,\ldots,0\ \ \text{and}\\ &\begin{cases}\forall\delta x_{k}\in\mathcal{CR}_{k}&:\ \delta\pi^{*}_{k}(\delta x_{k})=\delta\hat{\pi}^{*}_{k,a}(\delta x_{k})\\ \forall\delta x_{k+1}\in\mathcal{CR}_{k+1}&:\ \delta V_{k+1}(\delta x_{k+1})=\delta\tilde{V}_{k+1}(\delta x_{k+1}),\end{cases}\end{split}

for k=N1,,0k=N-1,\ldots,0.

Remark 3.

A notable consequence of Theorem 1 is that the canonical cost-to-go recursion is ill-posed in the presence of constraints. One must back-propagate both the cost-to-go terms and a set of constraints (i.e., the sets {𝒞k}\{\mathcal{CR}_{k}\}) that define the regions where the quadratic models of the cost-to-go functions are precise.

Despite the exactness of the DP recursion, there are some computational drawbacks. First, one must solve both the “full-horizon” QP defined in (4) and the one-step QPs defined in (13), serially. Second, back-propagating sets {𝒞k}\{\mathcal{CR}_{k}\} is not numerically robust, particularly if the sensitivity KkyK_{k}^{y} is ill-defined. This occurs when the LICQ condition fails and the resulting matrix solve computation for the sensitivities is singular. Thus, in the next section, we outline a parallelized and tuneable approximation to the sensitivity gains, derived from the viewpoint of interior point methods.

III-C Approximating the Sensitivity Gains

For k=N1,,0k=N-1,\ldots,0, define problem 𝒫k(δx)\mathcal{P}_{k\multimap}(\delta x) as the tail portion of QP sub-problem (4), starting at time-step kk at δx\delta x. Let 𝜹𝒖k(δx)\bm{\delta u}^{*}_{k\multimap}(\delta x) represent the optimal solution as a function of δx\delta x, i.e., the optimal control perturbation sequence starting at time-step kk. Notice that δπk(δx)\delta\pi_{k}^{*}(\delta x), as defined in (9), corresponds to the first element of the sequence 𝜹𝒖k(δx)\bm{\delta u}^{*}_{k\multimap}(\delta x).

Now define the QP problem 𝒫k(δx)\mathcal{P}_{k}(\delta x) as QP sub-problem (4), subject to an additional equality constraint: δxk=δx\delta x_{k}=\delta x, and let 𝜹𝒖k:(δx)\bm{\delta u}^{*}_{k:}(\delta x) represent the optimal tail control perturbation sequence starting at time-step kk.

Notice then that for all δx\delta x where 𝒫k(δx)\mathcal{P}_{k}(\delta x) is feasible, we have that 𝜹𝒖k(δx)=𝜹𝒖k:(δx)\bm{\delta u}^{*}_{k\multimap}(\delta x)=\bm{\delta u}^{*}_{k:}(\delta x). Thus, δπk(δx)\delta\pi_{k}^{*}(\delta x) is equal to the optimal control perturbation at time-step kk for problem 𝒫k(δx)\mathcal{P}_{k}(\delta x), hereby denoted as the function δπ^k(δx)\delta\hat{\pi}_{k}^{*}(\delta x). Further, since 𝒫k(δxk)\mathcal{P}_{k}(\delta x_{k}^{*}) is feasible, one may compute the desired sensitivity gains KkuK_{k}^{u} as the Jacobian δπ^k(δxk)/δx\partial\delta\hat{\pi}_{k}^{*}(\delta x_{k}^{*})/\partial\delta x.

Note the distinction: the sensitivity δπ^k/δx\partial\delta\hat{\pi}_{k}^{*}/\partial\delta x corresponds to the Jacobian of the solution of a fixed-horizon QP (problem 𝒫k(δx)\mathcal{P}_{k}(\delta x)) w.r.t. a parameter (δx\delta x) that defines the equality constraint δxk=δx\delta x_{k}=\delta x. In comparison, the sensitivity δπk/δx\partial\delta\pi_{k}^{*}/\partial\delta x corresponds to the Jacobian of the solution of a variable-horizon QP (problem 𝒫k(δx)\mathcal{P}_{k\multimap}(\delta x)) w.r.t. a parameter (δx\delta x) that defines the “initial condition.” The former computation is easily parallelized.

Leveraging a recent result in [28], we approximate δπ^k(δxk)/δx\partial\delta\hat{\pi}_{k}^{*}(\delta x_{k}^{*})/\partial\delta x by the Jacobian of the solution of the following unconstrained barrier re-formulation of problem 𝒫k(δx)\mathcal{P}_{k}(\delta x) w.r.t. δx\delta x at δx=δxk\delta x=\delta x_{k}^{*}:

min𝜹𝒖,𝜹𝒙j=0N1[l~jγ𝟏Tlog(cjx+Jjxδxj)γ𝟏Tlog(cju+Jjuδuj)]+12γδxkδx2+l~Nγ𝟏Tlog(cNx+JNxδxN),\begin{split}\small\min_{\bm{\delta u},\bm{\delta x}}\ &\sum_{j=0}^{N-1}\bigg{[}\tilde{l}_{j}-\gamma\bm{1}^{T}\log(c_{j}^{x}+J_{j}^{x}\delta x_{j})-\gamma\bm{1}^{T}\log(c_{j}^{u}+J_{j}^{u}\delta u_{j})\bigg{]}\\ &\ +\dfrac{1}{2\gamma}\|\delta x_{k}-\delta x\|^{2}+\tilde{l}_{N}-\gamma\bm{1}^{T}\log(c_{N}^{x}+J_{N}^{x}\delta x_{N}),\end{split}

subject to the linear dynamics in (4b); where γ>0\gamma>0 is the barrier constant. Denote Kku(γ)K_{k}^{u}(\gamma) to be the barrier-based Jacobian with parameter γ\gamma and let KkuK_{k}^{u} be the true Jacobian. Under appropriate conditions on the solution of QP sub-problem (4), Kku(γ)KkuK_{k}^{u}(\gamma)\rightarrow K_{k}^{u} as γ0\gamma\rightarrow 0 [28].

Practically, we compute the Jacobians Kku(γ)K_{k}^{u}(\gamma) efficiently using iLQR and a straightforward application of the Implicit Function Theorem [29]. We initialized the solver with the QP sub-problem (4) solution 𝜹𝒖\bm{\delta u}^{*}, and found only a handful of iterations were needed to converge, particularly since problem (III-C) is convex.

III-D Hybrid SQP Algorithm

We formally state the Hybrid SQP algorithm as a line-search modification of Shooting SQP, introduced in Section II. Thus, at the current primal-dual iterate (𝒖,𝒙[𝒖],𝒚)(\bm{u},\bm{x}[\bm{u}],\bm{y}):

Step 1: Solve QP-subproblem (4) to obtain the optimal perturbation sequence pair (𝜹𝒖,𝜹𝒙)(\bm{\delta u}^{*},\bm{\delta x}^{*}).

Step 2: Compute the sensitivity gains {Kk}k=0N1\{K_{k}\}_{k=0}^{N-1}, using either the DP recursion in Section III-B (i.e., Kk=KkuK_{k}=K_{k}^{u}), or the smoothed approximation method in Section III-C (i.e., Kk=Kku(γ)K_{k}=K_{k}^{u}(\gamma) for some γ>0\gamma>0).

Step 3: Perform line-search using (6), where 𝒙[α]:=𝒙+𝜹𝒙[α]\bm{x}[\alpha]:=\bm{x}+\bm{\delta x}[\alpha] is now defined by the closed-loop rollout:

δuk[α]:=clipu¯uku¯uk[αδuk+Kk(δxk[α]αδxk)]δxk+1[α]=f(xk+δxk[α],uk+δuk[α])xk+1,δx0[α]=0.\begin{split}\delta u_{k}[\alpha]:=&\mathrm{clip}_{\underline{u}-u_{k}}^{\overline{u}-u_{k}}\left[\alpha\delta u_{k}^{*}+K_{k}(\delta x_{k}[\alpha]-\alpha\delta x_{k}^{*})\right]\\ \delta x_{k+1}[\alpha]&=f(x_{k}+\delta x_{k}[\alpha],u_{k}+\delta u_{k}[\alpha])-x_{k+1},\ \delta x_{0}[\alpha]=0.\end{split}

Notice that if α=1\alpha=1, the rollout corresponds with the ideal DDP rollout in (12), while for α<1\alpha<1, we end up with an approximation222As the sensitivity gains are only valid in a neighborhood of 𝜹𝒙\bm{\delta x}^{*}, it is possible (though rare in our experiments) that the computed step-length α\alpha falls below the user-set threshold α¯\underline{\alpha} for a specific iteration. As a backup, we compute a set of TV-LQR gains {Kklqr}\{K_{k}^{\mathrm{lqr}}\} using the linearized dynamics and the Hessian of the objective function 𝒥(𝒖,𝒙)\mathcal{J}(\bm{u},\bm{x}), and perform the closed-loop rollout with these gains. This strategy is motivated by the goal of tracking the perturbation α𝜹𝒙\alpha\bm{\delta x}^{*} during the rollout [13]. stemming from using a fixed gain matrix.

IV Experiments

We compare the naïve, open-loop, Shooting SQP implementation introduced in Section II (referred to as OL) with the DDP-style closed-loop variation developed in Section III (referred to as CL and CLγ\texttt{\bf{CL}}_{\gamma}) on two environments. The identifiers CL and CLγ\texttt{\bf{CL}}_{\gamma} distinguish between the exact DP recursion and the smoothed barrier-based approximation for computing the forward rollout gains. Please see Appendix E for details regarding problem setup, SQP hyperparameters, additional plots, and an extra worked example.

IV-A Motion Planning for a Car

The first example is taken from [17], featuring a 2D car (n=4n=4, m=2m=2) moving within the obstacle-ridden environment shown in Figure 2. The objective is to drive to the goal position (3,3)(3,3) with final velocity 0 and orientation aligned with the horizontal axis in N=40N=40 steps, while avoiding the obstacles.

Figure 2 shows the computed trajectories by the three methods for three different initial conditions, while Table I provides the solver statistics. Notice that the OL method fails to converge within 100 iterations (the limit) for two of the three cases. In contrast, both closed-loop variations are quickly able to converge to stationary solutions.

Method Case Converged Iter Obj Viol Time/it [s]
OL 1 100 278.53 -0.0346 0.34
2 100 2.17 0.0061 0.30
3 12 21.49 3.25e-6 0.30
CL 1 19 3.19 7.87e-6 13.17
2 19 6.98 1.43e-6 19.19
3 13 21.58 1.12e-5 17.90
CLγ\texttt{\bf{CL}}_{\gamma} 1 19 3.19 1.34e-5 0.40
2 16 2.06 1.28e-5 0.41
3 11 21.58 -4.67e-6 0.38
TABLE I: Solver statistics for the car planning example. Baseline: OL, Ours: {CL,CLγ}\{\texttt{\bf{CL}},\texttt{\bf{CL}}_{\gamma}\}. Iter, Obj, and Viol report the number of iterations, objective, and max state constraint violation (negative value indicates infeasibility), respectively, for the final solution. Time/it reports the average (over the course of the optimization) computation time per SQP iteration.
Refer to caption
Figure 2: Solution trajectories for all solvers on the car planning example. Start/end locations shaded green/red.
Refer to caption
Figure 3: Re-construction error between open-loop solution 𝜹𝒖\bm{\delta u}^{*} and perturbation policy solution δπ^k(δxk)\delta\hat{\pi}_{k}^{*}(\delta x_{k}^{*}) as a function of time-step across all SQP iterations. Top: CL; Bottom: CLγ\texttt{\bf{CL}}_{\gamma}, γ=104\gamma=10^{-4}; Left-to-right: Case index. Colorbar identifies the SQP iteration.

In Figure 3, we plot the re-construction error δπ^k(δxk)δuk\|\delta\hat{\pi}_{k}^{*}(\delta x_{k}^{*})-\delta u_{k}^{*}\| for CL and CLγ\texttt{\bf{CL}}_{\gamma} over the course of the SQP iterations. Observe that for most iterations, the error is negligible for CL, with occasional spikes resulting from the numerical instability of the constrained DP recursion. In contrast, the error remains sufficiently low for all iterations of CLγ\texttt{\bf{CL}}_{\gamma}, even leading to a better quality (lower objective) solution for Case #2. We hypothesize that the better numerical stability of CLγ\texttt{\bf{CL}}_{\gamma} stems from a smoothing of the computed Jacobians (i.e., feedback gains), courtesy of the unconstrained barrier re-formulation. Finally, a notable advantage of CLγ\texttt{\bf{CL}}_{\gamma} over CL is the computation time. While CL involves differentiating through the KKT conditions of one-step horizon QPs, this computation must happen serially in the backward pass. In contrast, CLγ\texttt{\bf{CL}}_{\gamma} computes the required Jacobians across all time-steps in parallel using an efficient adjoint recursion associated with problem (III-C). Consequently, the computation times-per iteration are much closer together for OL and CLγ\texttt{\bf{CL}}_{\gamma} than for OL and CL. For the remaining experiments, we only compare OL and CLγ\texttt{\bf{CL}}_{\gamma}.

IV-B Quad-Pendulum

Consider a quadrotor with an attached pendulum (n=8n=8, m=2m=2) moving within an obstacle-cluttered 2D vertical plane, subject to viscous friction at the pendulum joint. The system is subject to operational constraints on the state and control input, as well as obstacle avoidance constraints. The task involves planning a trajectory starting at rest with the pendulum at the stable equilibrium, to a goal location, with the pendulum upright and both quadrotor and pendulum stationary. Table II provides the solver statistics for CLγ\texttt{\bf{CL}}_{\gamma} and OL (up until the algorithm stalls due to infeasibility of the QP sub-problem). Figure 1 shows a timelapse of the solution for the more difficult of the two cases, highlighting the ability of CLγ\texttt{\bf{CL}}_{\gamma} in solving challenging planning tasks.

Method Case Converged Iter Obj Viol Time/it [s]
OL 1 stall (2) 2272.8 -7.33 1.49
2 stall (4) 488.9 -0.58 1.35
CLγ\texttt{\bf{CL}}_{\gamma} 1 39 9.31 -2.73e-10 6.25
2 59 11.57 -4.33e-10 5.65
TABLE II: Solver statistics for the Quad-Pendulum example. Algorithm OL stalls within a handful of iterations.

.

V Conclusions

In this work, we re-interpret DDP rollout policies from a perspective of sensitivity-based corrections, and use this insight to develop algorithms for computing analogous policies for constrained problems. We incorporate the resulting closed-loop rollouts within a shooting-based SQP framework, and demonstrate significant improvements in convergence speed over a standard SQP implementation using open-loop rollouts.

Our work opens several avenues for future research. First, a key bottleneck of SQP involves solving the QP sub-problem at each iteration to compute a “nominal” perturbation sequence. This may be achieved with fast, but potentially, less-accurate unconstrained solvers (e.g., augmented-Lagrangian iLQR), that additionally compute the desired sensitivity gains using an efficient Riccati recursion. Second, leveraging recent results on differentiating through the solution of general convex problems, the sensitivity-based computations may be applied to the sequential-convex-programming algorithm. Finally, while the SQP algorithm was studied in the shooting context, recent work [30] has demonstrated how to incorporate nonlinear rollouts with both states and controls as optimization variables, albeit in an otherwise unconstrained setting. The sensitivity-based gain computation can be extended to this setting, potentially boosting the efficiency of “full” direct methods.

References

  • [1] J. Schulman, Y. Duan, J. Ho, A. Lee, I. Awwal, H. Bradlow, J. Pan, S. Patil, K. Goldberg, and P. Abbeel, “Motion planning with sequential convex optimization and convex collision checking,” The International Journal of Robotics Research, vol. 33, no. 9, pp. 1251–1270, 2014.
  • [2] M. Kalakrishnan, S. Chitta, E. Theodorou, P. Pastor, and S. Schaal, “STOMP: Stochastic trajectory optimization for motion planning,” in 2011 IEEE international conference on robotics and automation.   IEEE, 2011, pp. 4569–4574.
  • [3] V. Sindhwani, R. Roelofs, and M. Kalakrishnan, “Sequential operator splitting for constrained nonlinear optimal control,” in 2017 American Control Conference (ACC).   IEEE, 2017, pp. 4864–4871.
  • [4] T. A. Howell, B. E. Jackson, and Z. Manchester, “ALTRO: A fast solver for constrained trajectory optimization,” in 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).   IEEE, 2019, pp. 7674–7679.
  • [5] Y. Tassa, T. Erez, and W. D. Smart, “Receding horizon differential dynamic programming,” in nips, 2007, pp. 1465–1472.
  • [6] D. Bertsekas, Nonlinear Programming.   Athena Scientific; 3rd edition, 2016.
  • [7] W. Li and E. Todorov, “Iterative linear quadratic regulator design for nonlinear biological movement systems.” in ICINCO (1), 2004, pp. 222–229.
  • [8] D. Q. Mayne, “Differential dynamic programming–a unified approach to the optimization of dynamic systems,” Control and Dynamic Systems, vol. 10, pp. 179–254, 1973.
  • [9] D. Murray and S. Yakowitz, “Differential dynamic programming and newton’s method for discrete optimal control problems,” Journal of Optimization Theory and Applications, vol. 43, no. 3, pp. 395–414, 1984.
  • [10] J. C. Dunn and D. Bertsekas, “Efficient dynamic programming implementations of Newton’s method for unconstrained optimal control problems,” Journal of Optimization Theory and Applications, vol. 63, no. 1, pp. 23–38, 1989.
  • [11] L.-z. Liao and C. A. Shoemaker, “Advantages of differential dynamic programming over Newton’s method for discrete-time optimal control problems,” Cornell University, Tech. Rep., 1992.
  • [12] F. Borrelli, A. Bemporad, and M. Morari, Predictive control for linear and hybrid systems.   Cambridge University Press, 2017.
  • [13] M. J. Tenny, S. J. Wright, and J. B. Rawlings, “Nonlinear model predictive control via feasibility-perturbed sequential quadratic programming,” Computational Optimization and Applications, vol. 28, no. 1, pp. 87–121, 2004.
  • [14] Y. Tassa, N. Mansard, and E. Todorov, “Control-limited differential dynamic programming,” in 2014 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2014, pp. 1168–1175.
  • [15] J. Marti-Saumell, J. Solà, C. Mastalli, and A. Santamaria-Navarro, “Squash-box feasibility driven differential dynamic programming,” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).   IEEE, 2020, pp. 7637–7644.
  • [16] M. Giftthaler and J. Buchli, “A projection approach to equality constrained iterative linear quadratic optimal control,” in 2017 IEEE-RAS 17th International Conference on Humanoid Robotics (Humanoids).   IEEE, 2017, pp. 61–66.
  • [17] Y. Aoyama, G. Boutselis, A. Patel, and E. A. Theodorou, “Constrained differential dynamic programming revisited,” arXiv preprint arXiv:2005.00985, 2020.
  • [18] Z. Xie, C. K. Liu, and K. Hauser, “Differential dynamic programming with nonlinear constraints,” in 2017 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2017, pp. 695–702.
  • [19] S. Yakowitz, “The stagewise Kuhn-Tucker condition and differential dynamic programming,” IEEE transactions on automatic control, vol. 31, no. 1, pp. 25–30, 1986.
  • [20] G. Lantoine and R. P. Russell, “A hybrid differential dynamic programming algorithm for constrained optimal control problems. part 1: Theory,” Journal of Optimization Theory and Applications, vol. 154, no. 2, pp. 382–417, 2012.
  • [21] B. Plancher, Z. Manchester, and S. Kuindersma, “Constrained unscented dynamic programming,” in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).   IEEE, 2017, pp. 5674–5680.
  • [22] J. Ma, Z. Cheng, X. Zhang, M. Tomizuka, and T. H. Lee, “Alternating direction method of multipliers for constrained iterative LQR in autonomous driving,” arXiv preprint arXiv:2011.00462, 2020.
  • [23] A. Pavlov, I. Shames, and C. Manzie, “Interior point differential dynamic programming,” IEEE Transactions on Control Systems Technology, 2021.
  • [24] S. Singh, J. J. Slotine, and V. Sindhwani, “Optimizing trajectories with closed-loop dynamic SQP,” arXiv preprint arXiv:2109.07081, 2021.
  • [25] P. E. Gill, W. Murray, M. A. Saunders, and M. H. Wright, “User’s guide for NPSOL (version 4.0): A Fortran package for nonlinear programming.” Stanford Univ CA Systems Optimization Lab, Tech. Rep., 1986.
  • [26] P. E. Gill, W. Murray, and M. A. Saunders, “SNOPT: An SQP algorithm for large-scale constrained optimization,” SIAM review, vol. 47, no. 1, pp. 99–131, 2005.
  • [27] A. Agrawal, B. Amos, S. Barratt, S. Boyd, S. Diamond, and J. Z. Kolter, “Differentiable convex optimization layers,” in nips, 2019, pp. 9562–9574.
  • [28] W. Jin, S. Mou, and G. J. Pappas, “Safe pontryagin differentiable programming,” arXiv preprint arXiv:2105.14937, 2021.
  • [29] B. Amos, I. Jimenez, J. Sacks, B. Boots, and J. Z. Kolter, “Differentiable mpc for end-to-end planning and control,” in Advances in Neural Information Processing Systems, 2018, pp. 8289–8300.
  • [30] C. Mastalli, R. Budhiraja, W. Merkt, G. Saurel, B. Hammoud, M. Naveau, J. Carpentier, L. Righetti, S. Vijayakumar, and N. Mansard, “Crocoddyl: An efficient and versatile framework for multi-contact optimal control,” in 2020 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2020, pp. 2536–2542.
  • [31] W. Murray and F. J. Prieto, “A sequential quadratic programming algorithm using an incomplete solution of the subproblem,” SIAM Journal on Optimization, vol. 5, no. 3, pp. 590–640, 1995.
  • [32] A. Bemporad, M. Morari, V. Dua, and E. N. Pistikopoulos, “The explicit linear quadratic regulator for constrained systems,” Automatica, vol. 38, no. 1, pp. 3–20, 2002.
  • [33] R. Tedrake, “Underactuated robotics: Learning, planning, and control for efficient and agile machines course notes for MIT 6.832,” MIT, Tech. Rep., 2009.

Appendix A Related Work

iLQR/DDP-like methods for constrained trajectory optimization fall into one of three main categories: control-bounds only [14],[15], modification of the backward pass via KKT analysis [16, 17, 18, 19, 20], and augmented Lagrangian methods [21, 4, 3, 22, 23]. In the first category, only control-bound constraints are considered. For instance, [14] leverage the box-constrained DDP algorithm where the projected Newton algorithm is used to compute the affine perturbation policy in the backward pass, accounting for the box constraints on the control input. On the other hand, [15] composes a “squashing” function to constrain the input, and augments the objective with a barrier penalty to encourage the iterates to stay away from the plateaued regions of squashing function.

More general constraints are considered in the second and third categories [16, 17, 18, 19, 20, 21, 4, 3, 22, 23]. The work in [16] features equality constraints. The update step leverages constraint linearization and a nullspace projection to reduce the problem to a singular optimal control problem in an unconstrained control input lying within the nullspace of the linearized constraints. The works in [17, 18, 19, 20] formulate a constrained backward pass, where the one-step optimization problems now feature general state and control inequality constraints, linearized about the current trajectory iterate, yielding constrained one-step QPs. The algorithm in [18] extracts a guess of the active inequality constraints at each time-step by looking at the current trajectory iterate, and formulates the backward pass using linearized active constraints as equalities. The resulting KKT system is solved using Schur’s complement and using the dual variable extracted from the “feedforward” perturbation (i.e., the control perturbation computed assuming zero state perturbation) to yield an affine perturbation policy. The work in [19] formulates the theoretical underpinnings of such a constrained backward pass by introducing the stage-wise KKT conditions, and implements a very similar algorithm to [18], but where the active set is guessed to be equal to the active set at the feedforward solution, computed using linearized inequality constraints. [20] implements an identical backward pass, but with the active set guessed by extracting the violating constraints from implementing the feedforward perturbation (computed using a trust-region constraint). More recently, [17] leverages a slack-form of the linearized inequality constraints in the one-step optimization problems and uses an iterative procedure to refine the dual variable assuming zero state perturbation. A final solve of the KKT system using Schur’s complement yields the affine perturbation policy. The forward pass of all these methods discard this affine policy and instead use the one-step QPs to compute the control perturbation trajectory. In similar spirit, [23] introduced an interior-point variation of the backward pass, where the one-step optimization is re-written in min-max form w.r.t. the Lagrangian, and linearization of the associated perturbed KKT conditions is used to compute the affine perturbation policy.

The algorithms in [21, 4, 3, 22] incorporate the constraints by forming the augmented Lagrangian, and alternate between unconstrained trajectory optimization (with the augmented Lagrangian as the objective), and updating the dual variables and penalty parameter – loosely mimicking the method of multipliers [6][21] implements this in combination with a sampling-based construction of the optimal cost-to-go approximation; [4] dovetails the multipliers algorithm with Newton-based projections to project the intermediate solution onto the current active set;  [3] adopts an ADMM-based solution, by leveraging indicator function representations of the constraints and alternates between an unconstrained TV-LQR problem, linearized constraint projection, consensus update, and dual update.

In contrast to the methods above, our approach does not involve guessing active constraint sets for the one-step QPs and instead re-interprets the DDP-style control law from the lens of sensitivity analysis about a known feasible perturbation sequence. We use this interpretation to derive an augmented “backward pass” where we additionally back-propagate a “next-step” constraint set, along with the quadratic cost-to-go parameters. Further, we demonstrate how to approximate the desired feedback gains across all time-steps in parallel, dispelling the need for backward passes which are numerically less robust. Finally, we embed the closed-loop rollout within a theoretically sound SQP framework.

Appendix B SQP Preliminaries

In this appendix, we outline the main steps of the SQP algorithm. These steps represent a simplified version of the commercial code NPSOL [25], and use the termination conditions from the commercial code SNOPT [26]. Consider the following smooth optimization problem:

𝒫:minxnf(x)s.t.c(x)0m,\begin{split}\mathcal{P}:\quad\min_{x\in\mathbb{R}^{n}}\quad&f(x)\\ \mathrm{s.t.}\quad&c(x)\in\mathbb{R}^{m}_{\geq 0},\end{split} (17)

where the functions ff and cc are at least 𝒞2\mathcal{C}^{2}. Define the Lagrangian as:

(x,y)=f(x)yTc(x).\mathcal{L}(x,y)=f(x)-y^{T}c(x).

A standard SQP iteration consists of the following steps:

  • Solve a quadratic problem (QP) at current primal-dual point (x,y)(x,y):

    δx=argminδxg(x)Tδx+12δxTH(x,y)δxs.t.c(x)+J(x)δx0.\begin{split}\delta x^{*}=\operatornamewithlimits{\mathrm{argmin}}_{\delta x}\quad&g(x)^{T}\delta x+\dfrac{1}{2}\delta x^{T}H(x,y)\delta x\\ \mathrm{s.t.}\quad&c(x)+J(x)\delta x\geq 0.\end{split}

    where g(x)=xf(x)g(x)=\nabla_{x}f(x), H(x,y)=xx2(x,y)H(x,y)=\nabla^{2}_{xx}\mathcal{L}(x,y), and J(x)=c(x)/xJ(x)=\partial c(x)/\partial x. Let y^\hat{y} be the optimal dual variable for the inequality constraint above, and define the primal-dual search directions (δx,δy)(\delta x^{*},\delta y^{*}), where δy:=y^y\delta y^{*}:=\hat{y}-y.

  • Define the Merit function to be the augmented Lagrangian:

    (x,y,s;ρ)=f(x)yT(c(x)s)+12ρc(x)s2,\mathcal{M}(x,y,s;\rho)=f(x)-y^{T}(c(x)-s)+\dfrac{1}{2}\rho\|c(x)-s\|^{2},

    where s0ms\in\mathbb{R}^{m}_{\geq 0} is the inequality “slack,” defined only for the line-search, and ρ\rho is a non-negative penalty parameter. At the current iterate (x,y)(x,y) and penalty value ρ\rho, set ss as follows:

    s={max(0,c(x))ifρ=0max(0,c(x)1ρy)otherwise,s=\begin{cases}\max(0,c(x))\quad&\text{if}\ \rho=0\\ \max\left(0,c(x)-\dfrac{1}{\rho}y\right)\quad&\text{otherwise}\end{cases},

    where the max\max operator is component-wise, and set the slack search direction δs\delta s^{*} as c(x)+J(x)δxsc(x)+J(x)\delta x^{*}-s.

  • Define the line-search function ϕ(α;ρ):=(x+αδx,y+αδy,s+αδs;ρ)\phi(\alpha;\rho):=\mathcal{M}(x+\alpha\delta x^{*},y+\alpha\delta y^{*},s+\alpha\delta s^{*};\rho) and pick the updated penalty parameter ρ+\rho^{+} s.t. ϕ(0;ρ+)(1/2)Δ\phi^{\prime}(0;\rho^{+})\leq-(1/2)\Delta^{*}, where Δ\Delta^{*} is the decrement, defined as: δxTH(x,y)δx\delta x^{*^{T}}H(x,y)\delta x^{*}. A simple update rule is given below [31]:

    ρ+{ρifϕ(0;ρ)12Δmax{2ρ,g(x)Tδx+12Δ+(2yy^)T(c(x)s)c(x)s2}otherwise.\rho^{+}\leftarrow\begin{cases}\rho\quad&\text{if}\quad\phi^{\prime}(0;\rho)\leq-\dfrac{1}{2}\Delta^{*}\\ \max\left\{2\rho,\dfrac{g(x)^{T}\delta x+\frac{1}{2}\Delta^{*}+(2y-\hat{y})^{T}(c(x)-s)}{\|c(x)-s\|^{2}}\right\}\quad&\text{otherwise}.\end{cases} (18)
  • Compute (e.g., using backtracking, safe-guarded polynomial interpolation, etc.) the largest step length α[α¯,1]\alpha\in[\underline{\alpha},1], where α¯\underline{\alpha} is a user-specified lower-bound, s.t. the following line-search conditions are satisfied:

    ϕ(α;ρ+)ϕ(0;ρ+)σαϕ(0;ρ+),and|ϕ(α;ρ+)|ηϕ(0;ρ+),\phi(\alpha;\rho^{+})-\phi(0;\rho^{+})\leq\sigma\alpha\phi^{\prime}(0;\rho^{+}),\quad\text{and}\quad|\phi^{\prime}(\alpha;\rho^{+})|\leq-\eta\phi^{\prime}(0;\rho^{+}), (19)

    where 0<ση<120<\sigma\leq\eta<\frac{1}{2}.

  • Update (x,y)(x,y) using the computed step length, and set ρρ+\rho\leftarrow\rho^{+}.

Note that this stripped-down version of SQP does not account for infeasibility detection, which involves slackened forms of the QP sub-problem.

The termination conditions are based upon specified relative tolerances τp,τd>0\tau_{p},\tau_{d}\in\mathbb{R}_{>0}. Define τx:=τp(1+x)\tau_{x}:=\tau_{p}(1+\|x\|), and τy:=τd(1+y)\tau_{y}:=\tau_{d}(1+\|y\|). Then, convergence to a KKT stationary point is declared and the algorithm terminated if:

{minici(x)τxminjyjτyc(x)yτyg(x)J(x)Tyτy\begin{cases}\min_{i}c_{i}(x)\geq-\tau_{x}\\ \min_{j}y_{j}\geq-\tau_{y}\\ \|c(x)\circ y\|_{\infty}\leq\tau_{y}\\ \|g(x)-J(x)^{T}y\|_{\infty}\leq\tau_{y}\end{cases} (20)

where \circ denotes the Hadamard product.

Appendix C Details for Shooting SQP

We provide here the explicit expressions used within various stages of Shooting SQP.

C-A QP Sub-Problem

The linear term in (4a) and (5) is given as follows:

𝜹𝒖,𝒖𝒥R(𝒖)=k=0N1xlkTδxk+ulkTδuk+xlNTδxN:=k=0N1qkTδxk+rkTδuk+qNTδxN\begin{split}\langle\bm{\delta u},\nabla_{\bm{u}}\mathcal{J}_{R}(\bm{u})\rangle&=\sum_{k=0}^{N-1}\nabla_{x}l_{k}^{T}\delta x_{k}+\nabla_{u}l_{k}^{T}\delta u_{k}+\nabla_{x}l_{N}^{T}\delta x_{N}\\ &:=\sum_{k=0}^{N-1}q_{k}^{T}\delta x_{k}+r_{k}^{T}\delta u_{k}+q_{N}^{T}\delta x_{N}\end{split} (21)

where δxk\delta x_{k} is defined recursively using the linearized dynamics:

δx0=0,δxk+1=Akδxk+Bkδuk,k=0,,N1.\delta x_{0}=0,\ \ \delta x_{k+1}=A_{k}\delta x_{k}+B_{k}\delta u_{k},\quad k=0,\ldots,N-1. (22)

The Hessian term in (4a) and (5) is given by:

𝜹𝒖,𝒖2R(𝒖,𝒚)𝜹𝒖=k=0N1[δxkδuk]T[xx2H^kxu2H^kuu2H^k]:=Zk[δxkδuk]+δxNTxx2l^N:=ZNδxN,where:l^k(xk,uk,yk):=lk(xk,uk)ykTck(xk,uk),k=0,,N1l^N(xN,yN):=lN(xN)yNTcN(xN)H^k=l^k(xk,uk,yk)+ν^k+1Tf(xk,uk),k=0,,N1{ν^N=xl^N(xN,yN)ν^k=xl^k(xk,uk,yk)+AkTν^k+1,k=N1,,1\begin{split}\langle\bm{\delta u},\nabla_{\bm{u}}^{2}\mathcal{L}_{R}(\bm{u},\bm{y})\bm{\delta u}\rangle=&\sum_{k=0}^{N-1}\begin{bmatrix}\delta x_{k}\\ \delta u_{k}\end{bmatrix}^{T}\overbrace{\begin{bmatrix}\nabla_{xx}^{2}\hat{H}_{k}&\nabla_{xu}^{2}\hat{H}_{k}\\ *&\nabla_{uu}^{2}\hat{H}_{k}\end{bmatrix}}^{:=Z_{k}}\begin{bmatrix}\delta x_{k}\\ \delta u_{k}\end{bmatrix}+\delta x_{N}^{T}\overbrace{\nabla_{xx}^{2}\hat{l}_{N}}^{:=Z_{N}}\delta x_{N},\\ &\qquad\qquad\text{where:}\\ &\hat{l}_{k}(x_{k},u_{k},y_{k}):=l_{k}(x_{k},u_{k})-y_{k}^{T}c_{k}(x_{k},u_{k}),\quad k=0,\ldots,N-1\\ &\hat{l}_{N}(x_{N},y_{N}):=l_{N}(x_{N})-y_{N}^{T}c_{N}(x_{N})\\ &\hat{H}_{k}=\hat{l}_{k}(x_{k},u_{k},y_{k})+\hat{\nu}_{k+1}^{T}f(x_{k},u_{k}),\quad k=0,\ldots,N-1\\ &\begin{cases}\hat{\nu}_{N}=\nabla_{x}\hat{l}_{N}(x_{N},y_{N})\\ \hat{\nu}_{k}=\nabla_{x}\hat{l}_{k}(x_{k},u_{k},y_{k})+A_{k}^{T}\hat{\nu}_{k+1},\ k=N-1,\ldots,1\end{cases}\end{split} (23)
Remark 4.

The matrices ZkZ_{k} above may not be positive semi-definite. To ensure problem (4) is convex, we project ZkZ_{k} to the positive semi-definite cone with some ϵ>0\epsilon>0 tolerance.

C-B Line-Search

Define sk=(skx,sku)s_{k}=(s_{k}^{x},s_{k}^{u}) where skxs_{k}^{x} and skus_{k}^{u} have the same dimensionality as ckxc_{k}^{x} and ckuc_{k}^{u} respectively. We use the augmented Lagrangian function as the merit function for line-search:

I(𝒖,𝒚,𝒔;ρ)=k=0N1lkykT(cksk)+ρk2cksk2+lNyNT(cNsN)+ρN2cNsN2,\mathcal{M}_{I}(\bm{u},\bm{y},\bm{s};\rho)=\sum_{k=0}^{N-1}l_{k}-y_{k}^{T}(c_{k}-s_{k})+\dfrac{\rho_{k}}{2}\|c_{k}-s_{k}\|^{2}+l_{N}-y_{N}^{T}(c_{N}-s_{N})+\dfrac{\rho_{N}}{2}\|c_{N}-s_{N}\|^{2}, (24)

where the dependence on 𝒙\bm{x} is implicit, i.e., 𝒙=𝒙[𝒖]\bm{x}=\bm{x}[\bm{u}]. The non-negative constants {ρk}k=0N\{\rho_{k}\}_{k=0}^{N} are a set of penalty parameters, which for brevity, we denote by ρ\rho. The vectors sks_{k} represent a set of slack vectors, used only for the line-search. At the current iterate (𝒖,𝒙)(\bm{u},\bm{x}) and set of penalty parameters ρ\rho, set sks_{k} as

sk={max(0,ck)ifρk=0max(0,ckyk/ρ)otherwise,s_{k}=\begin{cases}\max(0,c_{k})\quad&\text{if}\ \rho_{k}=0\\ \max(0,c_{k}-y_{k}/\rho)\quad&\text{otherwise}\end{cases},

and set δsk=(δskx,δsku)\delta s_{k}^{*}=(\delta s_{k}^{x^{*}},\delta s_{k}^{u^{*}}), the slack-search directions, as:

δskx=ckx(xk)+Jkxδxkskxδsku=cku(uk)+Jkuδuksku.\begin{split}\delta s_{k}^{x^{*}}&=c_{k}^{x}(x_{k})+J_{k}^{x}\delta x_{k}^{*}-s_{k}^{x}\\ \delta s_{k}^{u^{*}}&=c_{k}^{u}(u_{k})+J_{k}^{u}\delta u_{k}^{*}-s_{k}^{u}.\end{split}

As in Appendix B, we now must compute an updated set of penalty parameters ρ+\rho^{+} so that ϕ(0;ρ+)(1/2)𝜹𝒖,𝒖2R(𝒖,𝒚)𝜹𝒖:=(1/2)Δ\phi^{\prime}(0;\rho^{+})\leq-(1/2)\langle\bm{\delta u}^{*},\nabla_{\bm{u}}^{2}\mathcal{L}_{R}(\bm{u},\bm{y})\bm{\delta u}^{*}\rangle:=-(1/2)\Delta^{*}. To derive this update, let us consider an abstract optimization problem of the form:

minxnf(x)s.t.ck(x)0,k=0,,N,\begin{split}\min_{x\in\mathbb{R}^{n}}\quad&f(x)\\ \mathrm{s.t.}\quad&c_{k}(x)\geq 0,\quad k=0,\ldots,N,\end{split}

where {ck:nmk,k=0,,N}\{c_{k}:\mathbb{R}^{n}\rightarrow\mathbb{R}^{m_{k}},k=0,\ldots,N\} is a set of vector-valued inequality constraints, with corresponding Jacobians Jk(x):=ck(x)/xJ_{k}(x):=\partial c_{k}(x)/\partial x. Let 𝒚=(y0,,yN)\bm{y}=(y_{0},\ldots,y_{N}) be the stacked dual vector for the inequality constraints, H(x,𝒚)H(x,\bm{y}) the Lagrangian Hessian, and g(x)g(x) the objective gradient. The QP sub-problem is then defined as:

δx=argminδxnψ(δx):=g(x)Tδx+12δxTH(x,𝒚)δxs.t.ck(x)+Jk(x)δx0,k=0,,N.\begin{split}\delta x^{*}=\operatornamewithlimits{\mathrm{argmin}}_{\delta x\in\mathbb{R}^{n}}\quad&\psi(\delta x):=g(x)^{T}\delta x+\dfrac{1}{2}\delta x^{T}H(x,\bm{y})\delta x\\ \mathrm{s.t.}\quad&c_{k}(x)+J_{k}(x)\delta x\geq 0,\quad k=0,\ldots,N.\end{split}

Let y^k\hat{y}_{k} be the optimal dual vector for the QP-subproblem for constraint kk and define δyk:=y^kyk\delta y_{k}^{*}:=\hat{y}_{k}-y_{k}. Consider the Augmented Lagrangian, \mathcal{M}, defined as:

(x,𝒚,𝒔;{ρk}k=0N)=f(x)k=0NykT(ck(x)sk)+k=0N12ρkck(x)sk2(sk0),\mathcal{M}(x,\bm{y},\bm{s};\{\rho_{k}\}_{k=0}^{N})=f(x)-\sum_{k=0}^{N}y_{k}^{T}(c_{k}(x)-s_{k})+\sum_{k=0}^{N}\dfrac{1}{2}\rho_{k}\|c_{k}(x)-s_{k}\|^{2}\quad(s_{k}\geq 0),

where sk0mks_{k}\in\mathbb{R}^{m_{k}}_{\geq 0} is the “slack” for the kthk^{\mathrm{th}} constraint, and 𝒔\bm{s} denotes the concatenatation (s0,,sN)(s_{0},\ldots,s_{N}). At the current iterate (x,𝒚)(x,\bm{y}) set the slack vectors sks_{k} as follows:

sk={max(0,ck)ifρk=0max(0,ckyk/ρ)otherwise,s_{k}=\begin{cases}\max(0,c_{k})\quad&\text{if}\ \rho_{k}=0\\ \max(0,c_{k}-y_{k}/\rho)\quad&\text{otherwise}\end{cases},

and set the slack search directions δsk=ck(x)+Jk(x)δxsk\delta s_{k}^{*}=c_{k}(x)+J_{k}(x)\delta x^{*}-s_{k}. Define the line-search function ϕ(α;ρ)\phi(\alpha;\rho) as:

ϕ(α;ρ):=(x+αδx,𝒚+α𝜹𝒚,𝒔+α𝜹𝒔;ρ).\phi(\alpha;\rho):=\mathcal{M}(x+\alpha\delta x^{*},\bm{y}+\alpha\bm{\delta y},\bm{s}+\alpha\bm{\delta s};\rho).

Now, we wish to choose the set of penalty parameters {ρk+}k=0N\{\rho_{k}^{+}\}_{k=0}^{N} s.t. ϕ(0;ρ+)12δxTH(x,𝒚)δx:=12Δ\phi^{\prime}(0;\rho^{+})\leq-\dfrac{1}{2}\delta x^{*^{T}}H(x,\bm{y})\delta x^{*}:=-\dfrac{1}{2}\Delta^{*}. Consider the gradient ϕ(0;ρ+)\phi^{\prime}(0;\rho^{+}):

ϕ(0;ρ+)=gTδx+k=0N[δxTJkTyk+ρk+δxTJkT(cksk)(cksk)Tδyk+ykTδskρk+δskT(cksk)]=gTδx+k=0N(2yky^k)T(cksk)ρk+cksk2,\begin{split}\phi^{\prime}(0;\rho^{+})&=g^{T}\delta x^{*}+\sum_{k=0}^{N}\left[-\delta x^{*^{T}}J_{k}^{T}y_{k}+\rho_{k}^{+}\delta x^{*^{T}}J_{k}^{T}(c_{k}-s_{k})-(c_{k}-s_{k})^{T}\delta y_{k}^{*}+y_{k}^{T}\delta s_{k}^{*}-\rho_{k}^{+}\delta s_{k}^{*^{T}}(c_{k}-s_{k})\right]\\ &=g^{T}\delta x^{*}+\sum_{k=0}^{N}(2y_{k}-\hat{y}_{k})^{T}(c_{k}-s_{k})-\rho_{k}^{+}\|c_{k}-s_{k}\|^{2},\end{split}

where we have used the definitions of δsk\delta s_{k}^{*} and δyk\delta y_{k}^{*}. Notice now that if cksk=0c_{k}-s_{k}=0 for all kk, then since sk0s_{k}\geq 0, δx=0\delta x=0 is a feasible solution to the QP sub-problem. Thus, ψ(δx)ψ(0)\psi(\delta x^{*})\leq\psi(0), from which one obtains the desired inequality. Consider then the case that the set :={k:cksk>0}\mathcal{I}:=\{k:\|c_{k}-s_{k}\|>0\} is non-empty. Then, we can re-write ϕ(0;ρ+)\phi^{\prime}(0;\rho^{+}) as:

k1||gTδx+(2yky^k)T(cksk)ρk+cksk2\sum_{k\in\mathcal{I}}\dfrac{1}{|\mathcal{I}|}g^{T}\delta x^{*}+(2y_{k}-\hat{y}_{k})^{T}(c_{k}-s_{k})-\rho_{k}^{+}\|c_{k}-s_{k}\|^{2}

Now, for ρk+=ρ^k,k\rho_{k}^{+}=\hat{\rho}_{k},k\in\mathcal{I}, defined as:

ρ^k:=1||ψ(δx)+(2yky^k)T(cksk)cksk2,\hat{\rho}_{k}:=\dfrac{\frac{1}{|\mathcal{I}|}\psi(\delta x^{*})+(2y_{k}-\hat{y}_{k})^{T}(c_{k}-s_{k})}{\|c_{k}-s_{k}\|^{2}},

we obtain:

ϕ(0;{ρ^k}k{ρk}k)=12Δ.\phi^{\prime}\left(0;\{\hat{\rho}_{k}\}_{k\in\mathcal{I}}\cup\{\rho_{k}\}_{k\notin\mathcal{I}}\right)=-\dfrac{1}{2}\Delta^{*}.

Thus, the update equation for {ρk}\{\rho_{k}\} may be written as:

ρk+={max(2ρk,ρ^k)ifkρkotherwise.\rho_{k}^{+}=\begin{cases}\max(2\rho_{k},\hat{\rho}_{k})\quad&\text{if}\quad k\in\mathcal{I}\\ \rho_{k}\quad&\text{otherwise}.\end{cases}

In context of Section II-A, the term ψ(δx)\psi(\delta x^{*}) is equivalent to the optimal objective of problem (4), while the correspondences for the dual and slack vectors follows straightforwardly. The line-search acceptance conditions are as given in (19).

C-C Termination Conditions

Let τp,τd>0\tau_{p},\tau_{d}\in\mathbb{R}_{>0} be user-specified relative tolerances, and define τx=τp(1+𝒖)\tau_{x}=\tau_{p}(1+\|\bm{u}\|) and τy=τd(1+𝒚)\tau_{y}=\tau_{d}(1+\|\bm{y}\|). Then, the KKT stationarity termination conditions are given as:

k=0,,N:{ckτxykτyckykτyk=0,,N1:ukH^kτy\begin{split}&k=0,\ldots,N:\begin{cases}c_{k}\geq-\tau_{x}\\ y_{k}\geq-\tau_{y}\\ \|c_{k}\circ y_{k}\|_{\infty}\leq\tau_{y}\end{cases}\\ &k=0,\ldots,N-1:\|\nabla_{u_{k}}{\hat{H}}_{k}\|_{\infty}\leq\tau_{y}\end{split} (25)

where \circ denotes the Hadamard product, and H^k\hat{H}_{k}, the Hamiltonian, is defined in (23).

Appendix D Computing Sensitivity Gains using DP

Proof of Theorem 1.

We will prove the result via induction. Notice that for the base case k=N1k=N-1, the cost-to-go function δVN(δxN)\delta V_{N}(\delta x_{N}) is defined only for δxN𝒳N\delta x_{N}\in\mathcal{X}_{N}, which by construction, is the set 𝒞N\mathcal{CR}_{N}. Thus δxN𝒞N\delta x_{N}^{*}\in\mathcal{CR}_{N} and δV~N(δxN)=δVN(δxN)\delta\tilde{V}_{N}(\delta x_{N})=\delta V_{N}(\delta x_{N}) for δxN𝒞N\delta x_{N}\in\mathcal{CR}_{N}. It follows that problem (13) for k=N1k=N-1 coincides exactly with the definition for δπN1\delta\pi_{N-1}^{*} given in (9). Thus, δπ^N1(δxN1)=δπN1(δxN1)\delta\hat{\pi}^{*}_{N-1}(\delta x_{N-1})=\delta\pi^{*}_{N-1}(\delta x_{N-1}) for all δxN1δXN1\delta x_{N-1}\in\delta X_{N-1} where the problem is feasible.

Now, notice that problem (13) for k=N1k=N-1 is a multi-parametric QP in the “parameter” δxN1\delta x_{N-1}. Thus, by Theorem 2 in [32], linear independence of the active inequality constraints at δπ^N1(δxN1)\delta\hat{\pi}^{*}_{N-1}(\delta x_{N-1}^{*}) implies that the solution functions δπ^N1(δxN1)\delta\hat{\pi}^{*}_{N-1}(\delta x_{N-1}) and y^N1(δxN1)\hat{y}^{*}_{N-1}(\delta x_{N-1}) are locally333One can actually show that the solution function δπ^N1\delta\hat{\pi}^{*}_{N-1} is in fact piecewise affine over δ𝒳N1\delta\mathcal{X}_{N-1} however this is not necessary for the proof. affine in a region containing δxN1\delta x_{N-1}^{*}, taking the form in (14). This region containing δxN1\delta x_{N-1}^{*} is precisely the set:

{δxN1δ𝒳N1:δπ^N1,a(δxN1)=δπ^N1(δxN1)}.\{\delta x_{N-1}\in\delta\mathcal{X}_{N-1}:\delta\hat{\pi}^{*}_{N-1,a}(\delta x_{N-1})=\delta\hat{\pi}^{*}_{N-1}(\delta x_{N-1})\}.

A second consequence of the aforementioned theorem is that the above set is defined by the intersection of the following two sets:

{δxN1δ𝒳N1:G¯N1[δπ^N1,a(δxN1)AN1δxN1+BN1π^N1,a(δxN1)]h¯N1}{δxN1δ𝒳N1:y^N1,a(δxN1)0},\begin{split}&\{\delta x_{N-1}\in\delta\mathcal{X}_{N-1}:\bar{G}_{N-1}\begin{bmatrix}\delta\hat{\pi}^{*}_{N-1,a}(\delta x_{N-1})\\ A_{N-1}\delta x_{N-1}+B_{N-1}\hat{\pi}^{*}_{N-1,a}(\delta x_{N-1})\end{bmatrix}\leq\bar{h}_{N-1}\}\\ &\{\delta x_{N-1}\in\delta\mathcal{X}_{N-1}:\hat{y}^{*}_{N-1,a}(\delta x_{N-1})\geq 0\},\end{split}

that is, the subset of δ𝒳N1\delta\mathcal{X}_{N-1} where the locally affine solutions satisfy the primal-dual feasibility constraints of problem (13) for k=N1k=N-1. This however is precisely the definition of the set 𝒞N1\mathcal{CR}_{N-1} in (15). Thus, we have established that δπ^N1,a(δxN1)=δπN1(δxN1)\delta\hat{\pi}^{*}_{N-1,a}(\delta x_{N-1})=\delta\pi^{*}_{N-1}(\delta x_{N-1}) for δxN1𝒞N1\delta x_{N-1}\in\mathcal{CR}_{N-1} and by construction, δxN1𝒞N1\delta x_{N-1}^{*}\in\mathcal{CR}_{N-1}. Substituting the affine feedback law into the objective for problem (13) for k=N1k=N-1 gives the recursion for {PN1,pN1,vN1}\{P_{N-1},p_{N-1},v_{N-1}\}, as defined in (16). Thus, δV~N1(δxN1)=δVN1(δxN1)\delta\tilde{V}_{N-1}(\delta x_{N-1})=\delta V_{N-1}(\delta x_{N-1}) for δxN1𝒞N1\delta x_{N-1}\in\mathcal{CR}_{N-1}, completing the proof for k=N1k=N-1.

Suppose then that the following statements are true for some k+1N1k+1\leq N-1: (i) δxk+1𝒞k+1\delta x_{k+1}^{*}\in\mathcal{CR}_{k+1}, and (ii) δVk+1(δxk+1)=δV~k+1(δxk+1)\delta V_{k+1}(\delta x_{k+1})=\delta\tilde{V}_{k+1}(\delta x_{k+1}) for δxk+1𝒞k+1\delta x_{k+1}\in\mathcal{CR}_{k+1}. Consider the definition of δπk\delta\pi^{*}_{k} given in (9), and define:

Pre(𝒞k+1):={δxkδ𝒳k:Akδxk+Bkδπk(δxk)𝒞k+1}\mathrm{Pre}^{*}(\mathcal{CR}_{k+1}):=\{\delta x_{k}\in\delta\mathcal{X}_{k}:A_{k}\delta x_{k}+B_{k}\delta\pi^{*}_{k}(\delta x_{k})\in\mathcal{CR}_{k+1}\}

Since δxk+1𝒞k+1\delta x_{k+1}^{*}\in\mathcal{CR}_{k+1} by the inductive hypothesis, it follows that δxkPre(𝒞k+1)\delta x_{k}^{*}\in\mathrm{Pre}^{*}(\mathcal{CR}_{k+1}) and hence, the set is non-empty. Now, for all δxkPre(𝒞k+1)\delta x_{k}\in\mathrm{Pre}^{*}(\mathcal{CR}_{k+1}), one may add the redundant constraint: δxk+1𝒞k+1\delta x_{k+1}\in\mathcal{CR}_{k+1} and equivalently re-write δπk\delta\pi_{k}^{*} as:

argminδukδ𝒰kl~k(δxk,δuk)+δVk+1(δxk+1)s.t.δxk+1𝒞k+1.\begin{split}\operatornamewithlimits{\mathrm{argmin}}_{\delta u_{k}\in\delta\mathcal{U}_{k}}\quad&\tilde{l}_{k}(\delta x_{k},\delta u_{k})+\delta V_{k+1}(\delta x_{k+1})\\ \mathrm{s.t.}\quad&\delta x_{k+1}\in\mathcal{CR}_{k+1}.\end{split}

Now, leveraging the inductive hypothesis that δV~k+1(δxk+1)=δVk+1(δxk+1)\delta\tilde{V}_{k+1}(\delta x_{k+1})=\delta V_{k+1}(\delta x_{k+1}) for all δxk+1𝒞k+1\delta x_{k+1}\in\mathcal{CR}_{k+1}, we can re-write δπk\delta\pi_{k}^{*} as:

argminδukδ𝒰kl~k(δxk,δuk)+δV~k+1(δxk+1)s.t.δxk+1𝒞k+1.\begin{split}\operatornamewithlimits{\mathrm{argmin}}_{\delta u_{k}\in\delta\mathcal{U}_{k}}\quad&\tilde{l}_{k}(\delta x_{k},\delta u_{k})+\delta\tilde{V}_{k+1}(\delta x_{k+1})\\ \mathrm{s.t.}\quad&\delta x_{k+1}\in\mathcal{CR}_{k+1}.\end{split}

Comparing with problem (13), this is precisely the definition of δπ^k\delta\hat{\pi}^{*}_{k}. Thus, we have that δπ^k(δxk)=δπk(δxk)\delta\hat{\pi}^{*}_{k}(\delta x_{k})=\delta\pi^{*}_{k}(\delta x_{k}) for δxkPre(𝒞k+1)\delta x_{k}\in\mathrm{Pre}^{*}(\mathcal{CR}_{k+1}).

Now, consider the fixed affine feedback law δπ^k,a\delta\hat{\pi}^{*}_{k,a} in (14), that is well-defined thanks to the linear independence assumption at δxk\delta x_{k}^{*}. Once again, applying Theorem 2 in [32], we have that δπ^k(δxk)=δπ^k,a(δxk)\delta\hat{\pi}^{*}_{k}(\delta x_{k})=\delta\hat{\pi}^{*}_{k,a}(\delta x_{k}) for δxk𝒞k\delta x_{k}\in\mathcal{CR}_{k}, as defined in (15). Substituting the affine law into the objective of problem (13) yields the recursion in (16); thus, δV~k(δxk)=δVk(δxk)\delta\tilde{V}_{k}(\delta x_{k})=\delta V_{k}(\delta x_{k}) for all δxk𝒞kPre(𝒞k+1)\delta x_{k}\in\mathcal{CR}_{k}\cap\mathrm{Pre}^{*}(\mathcal{CR}_{k+1}). Further, by construction, δxk𝒞k\delta x_{k}^{*}\in\mathcal{CR}_{k}; thus, the intersection 𝒞kPre(𝒞k+1)\mathcal{CR}_{k}\cap\mathrm{Pre}^{*}(\mathcal{CR}_{k+1}) is non-empty.

To finish the proof, we must show that 𝒞kPre(𝒞k+1)\mathcal{CR}_{k}\subseteq\mathrm{Pre}^{*}(\mathcal{CR}_{k+1}). By doing so, we establish the equivalency between the fixed affine feedback law δπ^k,a\delta\hat{\pi}^{*}_{k,a} and δπk\delta\pi^{*}_{k} and consequently, the equivalency of δVk\delta V_{k} and δV~k\delta\tilde{V}_{k} for δxk𝒞k\delta x_{k}\in\mathcal{CR}_{k}.

To show this last part, notice that for any δxk\delta x_{k} that lies on the boundary Pre(𝒞k+1)\partial\mathrm{Pre}^{*}(\mathcal{CR}_{k+1}), it holds that Akδxk+Bkδπk(δxk)A_{k}\delta x_{k}+B_{k}\delta\pi_{k}^{*}(\delta x_{k}) lies on the boundary 𝒞k+1\partial\mathcal{CR}_{k+1}. This follows from the continuity of δπk(δxk)\delta\pi_{k}^{*}(\delta x_{k}). Then, by the equivalence of δπk\delta\pi^{*}_{k} and δπ^k\delta\hat{\pi}_{k}^{*} for δxkPre(𝒞k+1)\delta x_{k}\in\mathrm{Pre}^{*}(\mathcal{CR}_{k+1}) and the equivalence of δπ^k\delta\hat{\pi}^{*}_{k} and δπ^k,a\delta\hat{\pi}^{*}_{k,a} for δxk𝒞k\delta x_{k}\in\mathcal{CR}_{k}, for any δxk𝒞kPre(𝒞k+1)\delta x_{k}\in\mathcal{CR}_{k}\cap\partial\mathrm{Pre}^{*}(\mathcal{CR}_{k+1}), it holds that Akδxk+Bkδπ^k,a(δxk)𝒞k+1A_{k}\delta x_{k}+B_{k}\delta\hat{\pi}_{k,a}^{*}(\delta x_{k})\in\partial\mathcal{CR}_{k+1}. It follows that δxk𝒞k\delta x_{k}\in\partial\mathcal{CR}_{k}.

Thus, since the intersection 𝒞kPre(𝒞k+1)\mathcal{CR}_{k}\cap\mathrm{Pre}^{*}(\mathcal{CR}_{k+1}) is non-empty, then either (i) 𝒞kPre(𝒞k+1)\mathcal{CR}_{k}\subset\mathrm{Pre}^{*}(\mathcal{CR}_{k+1}), or (ii) 𝒞kPre(𝒞k+1)\mathcal{CR}_{k}\cap\partial\mathrm{Pre}^{*}(\mathcal{CR}_{k+1}) is non-empty. In the latter case, we have shown that is the set 𝒞kPre(𝒞k+1)\partial\mathcal{CR}_{k}\cap\partial\mathrm{Pre}^{*}(\mathcal{CR}_{k+1}), completing the proof.

Appendix E Experiment Details

The following hyperparameters were held fixed for all Shooting SQP variations over all environments:

Parameter Value
Line-search decrease ratio: σ\sigma 0.4
Line-search gradient ratio: η\eta 0.49
Line-search step-size lower-bound: α¯\underline{\alpha} 1e-5
Termination Primal tolerance: τp\tau_{p} 1e-3
Termination Dual tolerance: τd\tau_{d} 1e-3*
TABLE III: Hyperparameters for Shooting SQP. (*) For the Quad-Pendulum example, the dual tolerance τd\tau_{d} was set as 10210^{-2}.

In the following, we give details specific to each environment case-study.

E-A Motion Planning for a Car

This is a system with 4 states: x=(px,py,θ,v)x=(p_{x},p_{y},\theta,v), where (px,py)2(p_{x},p_{y})\in\mathbb{R}^{2} is the 2D position, θ𝒮1\theta\in\mathcal{S}^{1} is the orientation, and vv\in\mathbb{R} is the forward velocity. The continuous-time dynamics are given by:

p˙x=vsinθ,p˙y=vcosθ,θ˙=vuθ,v˙=uv,\dot{p}_{x}=v\sin\theta,\quad\dot{p}_{y}=v\cos\theta,\quad\dot{\theta}=vu^{\theta},\quad\dot{v}=u^{v},

where (uθ,uv)(u^{\theta},u^{v}) are the steering and acceleration inputs. As in [18], we use Euler integration with a time-step of 0.050.05s to yield the discrete-time model. The control limits are taken from [17]: uθ[π3,π3]u^{\theta}\in[-\frac{\pi}{3},\frac{\pi}{3}] rad/s, and uv[6,6]s2u^{v}\in[-6,6]\ \mathrm{s}^{-2}.

The stage cost is lk(x,u)=0.05uTRul_{k}(x,u)=0.05u^{T}Ru with R=diag(0.2,0.1)R=\mathrm{diag}(0.2,0.1), and terminal cost is lN(x)=(xxg)TQN(xxg)l_{N}(x)=(x-x_{g})^{T}Q_{N}(x-x_{g}) with QN=diag(50,50,50,10)Q_{N}=\mathrm{diag}(50,50,50,10) and xg=(3,3,π2,0)x_{g}=(3,3,\frac{\pi}{2},0). The horizon NN is 40 steps. The initial control sequence guess was all zeros. The constant γ\gamma for CLγ\texttt{\bf{CL}}_{\gamma} was set to 10410^{-4}.

Figure 4 shows the solver progress for all algorithms and cases.

Refer to caption
Figure 4: Step-size, objective, constraint violation (positive indicates strict feasibility), and computation time per iteration, as a function of iteration. Left-to-right: Case index.

E-B Acrobot

We study the classic acrobot “swing-up” task with n=4n=4 states and m=1m=1 control input. The state is defined as x=(q1,q2,v1,v2)x=(q_{1},q_{2},v_{1},v_{2}) where q1q_{1} the root joint angle, q2q_{2} is the relative elbow joint angle, and v1,v2v_{1},v_{2} are their respective angular rates. The dynamics were taken from [33]. Starting from an initial state x0=0x_{0}=0, we require the acrobot to swing up to the goal state xg=(π,0,0,0)x_{g}=(\pi,0,0,0), subject to the control limits u[2,2]u\in[-2,2]. The stage cost is lk(x,u)=12[w1(cosq1+cos(q1+q2)+2)+w2u2]l_{k}(x,u)=\frac{1}{2}\left[w_{1}(\cos q_{1}+\cos(q_{1}+q_{2})+2)+w_{2}u^{2}\right], and the terminal cost is lN(x)=12w3xxg2l_{N}(x)=\frac{1}{2}w_{3}\|x-x_{g}\|^{2}, with weights (w1,w2,w3)=(0.1,0.01,10)(w_{1},w_{2},w_{3})=(0.1,0.01,10). Additionally, we enforce the limit constraints444Due to the fact that these two state components really live on 𝒮1×𝒮1\mathcal{S}^{1}\times\mathcal{S}^{1}, we employ chart switching to smoothly handle this constraint near the boundary π/π-\pi/\pi. This is achieved by adjusting the boundary constraints for each angle to lie in [π±δk,π±δk][-\pi\pm\delta_{k},\pi\pm\delta_{k}] for the next iterate, where the per time-step adjustment δk\delta_{k} is chosen based on the current values for q1q_{1} and q2q_{2} at time-step kk. We also wrap both angles back to the interval [π,π][-\pi,\pi] at the end of each iteration. (q1,q2)[π,π]2(q_{1},q_{2})\in[-\pi,\pi]^{2} and the terminal constraint: xNxg20.22\|x_{N}-x_{g}\|^{2}\leq 0.2^{2}. The discrete-time dynamics were obtained using explicit Euler integration with a time-step of 0.05s, and the horizon NN was 150150 steps.

To generate the initial guess, we set 𝝌0\bm{\chi}_{0}, the initial guess for the state-trajectory to be a linear interpolation between x0x_{0} and xgx_{g}, and 𝝁0\bm{\mu}_{0}, the initial guess for the controls as all zeros. Then, the following rollout was used to generate the initial control trajectory for SQP:

xk+1=f(xk,clipu¯u¯[μk+Kklqr(xkχk)]),x0=0,k=0,,N1,x_{k+1}=f(x_{k},\mathrm{clip}_{\underline{u}}^{\overline{u}}[\mu_{k}+K_{k}^{\mathrm{lqr}}(x_{k}-\chi_{k})]),\ x_{0}=0,\ k=0,\ldots,N-1,

where the gains {Kklqr}\{K_{k}^{\mathrm{lqr}}\} were computed via a TV-LQR solve, with cost defined by the Hessian of the objective and dynamics linearized about (𝝌0,𝝁0)(\bm{\chi}_{0},\bm{\mu}_{0}).

We found that for this problem, the Gauss-Newton Hessian approximation was more stable for both algorithms. This entails dropping the second-order gradients stemming from ν^k+1Tf(xk,uk)\hat{\nu}_{k+1}^{T}f(x_{k},u_{k}) in (23). The constant γ\gamma for CLγ\texttt{\bf{CL}}_{\gamma} was set to 10410^{-4}. Figure 5 illustrates solver progress.

Refer to caption
Figure 5: Step-size, objective, constraint violation (positive indicates strict feasibility), and computation time per iteration.

Despite sub-optimal step-sizes for both algorithms (due to the natural instabilities of the dynamics), CLγ\texttt{\bf{CL}}_{\gamma} successfully converges within 33 iterations while OL is still struggling to resolve the Lagrangian stationarity residual at 100 iterations.

E-C Quad-Pendulum

The state is defined as x=(px,pz,θ,ϕ,p˙x,p˙z,θ˙,ϕ˙)x=(p_{x},p_{z},\theta,\phi,\dot{p}_{x},\dot{p}_{z},\dot{\theta},\dot{\phi}), where (px,pz)2(p_{x},p_{z})\in\mathbb{R}^{2} is the CoM position of the quadrotor in the vertical plane, θ𝒮1\theta\in\mathcal{S}^{1} is its orientation, and ϕ𝒮1\phi\in\mathcal{S}^{1} is the orientation of the attached pendulum w.r.t. the vertical. The two control inputs u=(u1,u2)u=(u_{1},u_{2}) corresponding to the rotor thrusts are subject to the limits [0.1,3]mqg[0.1,3]m_{q}g, where mqm_{q} is the mass of the quadrotor and gg is the gravitational acceleration.

We provide the Lagrangian characterization of the dynamics. The mass matrix M(q)M(q), as a function of the generalized coordinates q=(px,pz,θ,ϕ)q=(p_{x},p_{z},\theta,\phi) is given by:

[mq+mp00mpLcosϕ0mq+mp0mpLsinϕ00J0mpLcosϕmpLsinϕ0mpL2],\begin{bmatrix}m_{q}+m_{p}&0&0&m_{p}L\cos\phi\\ 0&m_{q}+m_{p}&0&m_{p}L\sin\phi\\ 0&0&J&0\\ m_{p}L\cos\phi&m_{p}L\sin\phi&0&m_{p}L^{2}\end{bmatrix},

where mpm_{p} is the mass of the pendulum (assumed concentrated at the endpoint), LL is the pendulum length, and JJ is the quadrotor moment of inertia. The kinetic energy is thus: T(q,q˙)=12q˙TM(q)q˙T(q,\dot{q})=\frac{1}{2}\dot{q}^{T}M(q)\dot{q}; the potential energy is V(q)=mqgpz+mpg(pzLcosϕ)V(q)=m_{q}gp_{z}+m_{p}g(p_{z}-L\cos\phi), and the mechanical Lagrangian is (q,q˙)=T(q,q˙)V(q)\mathcal{L}(q,\dot{q})=T(q,\dot{q})-V(q). The dynamics are thus given as:

ddt((q,q˙)q˙)(q,q˙)q=F(q),\dfrac{d}{dt}\left(\dfrac{\partial\mathcal{L}(q,\dot{q})}{\partial\dot{q}}\right)-\dfrac{\partial\mathcal{L}(q,\dot{q})}{\partial q}=F(q),

where F(q)F(q), the generalized force vector is given as:

F(q)=[(u1+u2)sinθ(u1+u2)cosθ(u1u2)lτfτf],F(q)=\begin{bmatrix}-(u_{1}+u_{2})\sin\theta\\ (u_{1}+u_{2})\cos\theta\\ (u_{1}-u_{2})l-\tau_{f}\\ \tau_{f}\end{bmatrix},

where 2l2l is the quadrotor wing-span, τf\tau_{f} is the viscous frictional torque: ν(ϕ˙θ˙)-\nu(\dot{\phi}-\dot{\theta}), where ν\nu is the constant friction coefficient. We used the constants: mq=0.486m_{q}=0.486, mp=0.2mqm_{p}=0.2m_{q}, l=0.25l=0.25, L=2lL=2l, g=9.81g=9.81, J=0.00383J=0.00383, ν=0.01\nu=0.01, and employed Euler discretization with time-step 0.025 s.

The stage cost is lk(x,u)=12(w1((px,pz,θ)(gx,gz,gθ)2+(1+cosϕ))+w2uuh2)l_{k}(x,u)=\frac{1}{2}(w_{1}(\|(p_{x},p_{z},\theta)-(g_{x},g_{z},g_{\theta})\|^{2}+(1+\cos\phi))+w_{2}\|u-u_{h}\|^{2}), and the terminal cost is lN(x)=12(xxg)TQN(xxg)l_{N}(x)=\frac{1}{2}(x-x_{g})^{T}Q_{N}(x-x_{g}), where xg=(gx,gz,gθ,gϕ,0,0,0,0)x_{g}=(g_{x},g_{z},g_{\theta},g_{\phi},0,0,0,0) is the goal state with: (gx,gz,gθ,gϕ)=(3,1.5,0,π)(g_{x},g_{z},g_{\theta},g_{\phi})=(3,-1.5,0,\pi). The vector uh=0.5(mq+mp)g𝟏u_{h}=0.5(m_{q}+m_{p})g\bm{1} is the hover thrust setpoint, and QN=diag(10,10,1,1,1,1,1,1)Q_{N}=\mathrm{diag}(10,10,1,1,1,1,1,1). The weights are (w1,w2,w3)=(0.01,0.05,5)(w_{1},w_{2},w_{3})=(0.01,0.05,5). The horizon NN was 160 steps.

In addition to the obstacle avoidance constraints, we also set the operational constraints (px,pz)[4,4]×[2,2](p_{x},p_{z})\in[-4,4]\times[-2,2] and θ[3π4,3π4]\theta\in[-\frac{3\pi}{4},\frac{3\pi}{4}]. The initial control sequence guess was set to uhu_{h}, the hover setpoint, for all timesteps.

Due to the difficulty of this problem, the constant γ\gamma for CLγ\texttt{\bf{CL}}_{\gamma} was initialized at 10310^{-3} and reduced by a factor of 10 every SQP iteration, until a lower-bound of 10510^{-5}.

Figure 6 shows a timelapse for the first case solution from CLγ\texttt{\bf{CL}}_{\gamma}, and in Figure 7 we plot the re-construction error δπ^k(δxk)δuk\|\delta\hat{\pi}_{k}^{*}(\delta x_{k}^{*})-\delta u_{k}^{*}\| for both cases for CLγ\texttt{\bf{CL}}_{\gamma}.

Refer to caption
Figure 6: Timelapse of the solution obtained by CLγ\texttt{\bf{CL}}_{\gamma} for Case #1. Robot snapshots are 0.1s apart at the beginning and expand to 0.3s towards the end; initial render is in green, final render is in red. Light shaded circle centered on the quadrotor is the circumscribing disc used for the collision-avoidance constraint for the quadrotor body (in addition to the avoidance constraint w.r.t. the pendulum pole).
Refer to caption
Figure 7: Re-construction error between open-loop solution 𝜹𝒖\bm{\delta u}^{*} and perturbation policy solution δπ^k(δxk)\delta\hat{\pi}_{k}^{*}(\delta x_{k}^{*}) as a function of time-step across all SQP iterations. Left: Case #1; Right: Case #2. Colorbar identifies the SQP iteration.