Optimizing Trajectories with Closed-Loop Dynamic SQP
Abstract
Indirect trajectory optimization methods such as Differential Dynamic Programming (DDP) have found considerable success when only planning under dynamic feasibility constraints. Meanwhile, nonlinear programming (NLP) has been the state-of-the-art approach when faced with additional constraints (e.g., control bounds, obstacle avoidance). However, a naïve implementation of NLP algorithms, e.g., shooting-based sequential quadratic programming (SQP), may suffer from slow convergence, caused by the natural instabilities of the underlying system manifesting as poor numerical stability within the optimization. Re-interpreting the DDP closed-loop rollout policy as a sensitivity-based correction to a second-order search direction, we demonstrate how to compute analogous closed-loop policies (i.e., feedback gains) for constrained problems. Our key theoretical result introduces a novel dynamic programming-based constraint-set recursion that augments the canonical “cost-to-go” backward pass. On the algorithmic front, we develop a hybrid-SQP algorithm incorporating DDP-style closed-loop rollouts, enabled via efficient parallelized computation of the feedback gains. Finally, we validate our theoretical and algorithmic contributions on a set of increasingly challenging benchmarks, demonstrating significant improvements in convergence speed over standard open-loop SQP.
I Introduction
Trajectory optimization forms the backbone of model-based optimal control with myriad applications in robot mobility and manipulation [1, 2, 3, 4, 5]. The problem formulation is as follows: consider a robotic system with state , control input , subject to the discrete-time dynamics:
\[
x_{t+1} = f(x_t, u_t), \qquad t = 0, \ldots, T-1. \tag{1}
\]
Let be some fixed planning horizon. Given some initial state , the trajectory optimization problem is as follows:
\[
\min_{U} \;\; \sum_{t=0}^{T-1} \ell_t(x_t, u_t) + \ell_T(x_T) \tag{2a}
\]
\[
\text{s.t.} \quad x_{t+1} = f(x_t, u_t), \quad g^x_t(x_t) \le 0, \quad g^u_t(u_t) \le 0, \qquad t = 0, \ldots, T-1, \tag{2b}
\]
where we use to denote the concatenations and , respectively. Here, is the running cost, is the terminal cost, and are the vector-valued constraint functions on the state and control input. We assume that the control constraint encodes simple box constraints: , though the results in this paper may be generalized beyond this assumption.
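To fix ideas, the following is a minimal Python sketch of the shooting-formulation ingredients just described (dynamics rollout and total cost). All function and symbol names here are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def rollout(f, x0, U):
    """Propagate an open-loop control sequence U through the dynamics (1)."""
    X = [x0]
    for u in U:
        X.append(f(X[-1], u))
    return np.asarray(X)

def total_cost(stage_cost, terminal_cost, X, U):
    """Objective (2a): sum of running costs plus the terminal cost."""
    return sum(stage_cost(x, u) for x, u in zip(X[:-1], U)) + terminal_cost(X[-1])
```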
Solution methods generally fall into one of two approaches: optimal control-based (indirect methods), or optimization-based (direct methods). The former leverages necessary conditions of optimality for optimal control, such as dynamic programming (DP), while the latter treats the problem as a pure mathematical optimization program [6]. A further sub-categorization of the direct method distinguishes between a Full and a Condensed formulation, where the former treats both the states and controls as optimization variables, subject to dynamics equality constraints, while the latter optimizes only over the control variables, with the dynamics implicit.
In the absence of constraints beyond dynamic feasibility, ubiquitous indirect methods [7, 8, 9, 10] such as Differential Dynamic Programming (DDP) and its Gauss-Newton relaxation, the iterative Linear Quadratic Regulator (iLQR), rely upon the DP recursion to split the full-horizon planning problem into a sequence of one-step optimizations, and alternate between a backward and a forward pass through the time-steps. The backward pass recursively forms quadratic expansions of the optimal cost-to-go function and computes a time-varying affine perturbation policy that is subsequently rolled out through the system’s dynamics in the forward pass to yield the updated trajectory iterate. Under mild assumptions, DDP locally achieves quadratic convergence, and the proof relies upon establishing the close link between DDP and the Newton method, as applied to the condensed optimization-based formulation [9]. The stability properties of the underlying nonlinear system manifest as numerical stability during the optimization process, and hence the closed-loop nature of the forward pass in DDP typically leads to better performance [11] than Newton’s method, which implements open-loop rollouts.
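For concreteness, below is a minimal numpy sketch of the backward/forward structure just described, in its Gauss-Newton (iLQR) form. The per-step dynamics Jacobians (fx, fu) and cost expansions (lx, lu, lxx, luu, lux) are assumed to be supplied, and all names are illustrative.

```python
import numpy as np

def backward_pass(fx, fu, lx, lu, lxx, luu, lux, VxT, VxxT):
    """Recursively build the quadratic cost-to-go and the affine policy (k_t, K_t)."""
    T = len(fx)
    Vx, Vxx = VxT, VxxT
    k, K = [None] * T, [None] * T
    for t in reversed(range(T)):
        Qx  = lx[t]  + fx[t].T @ Vx
        Qu  = lu[t]  + fu[t].T @ Vx
        Qxx = lxx[t] + fx[t].T @ Vxx @ fx[t]
        Quu = luu[t] + fu[t].T @ Vxx @ fu[t]
        Qux = lux[t] + fu[t].T @ Vxx @ fx[t]
        k[t] = -np.linalg.solve(Quu, Qu)      # feedforward term
        K[t] = -np.linalg.solve(Quu, Qux)     # feedback gain
        Vx  = Qx  + K[t].T @ Quu @ k[t] + K[t].T @ Qu + Qux.T @ k[t]
        Vxx = Qxx + K[t].T @ Quu @ K[t] + K[t].T @ Qux + Qux.T @ K[t]
    return k, K

def closed_loop_rollout(f, x_bar, u_bar, k, K, alpha):
    """Forward pass: roll the affine perturbation policy through the true dynamics."""
    x, X, U = x_bar[0], [x_bar[0]], []
    for t in range(len(u_bar)):
        u = u_bar[t] + alpha * k[t] + K[t] @ (x - x_bar[t])
        x = f(x, u)
        U.append(u); X.append(x)
    return np.asarray(X), np.asarray(U)
```

Note that the forward pass re-evaluates the true nonlinear dynamics at every step, which is what keeps the iterates dynamically feasible.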
Computing DDP-style closed-loop updates within the constrained setting (beyond dynamics feasibility) is much more challenging, since quadratization of the cost and linearization of the dynamics and constraints yields constrained quadratic programs (QPs) with piecewise-affine optimal perturbation policies [12], whose complexity grows exponentially in the number of constraints and the horizon of the problem. Consequently, direct methods (featuring open-loop updates) are the prevailing solution approach, typically combined with interior-point or SQP algorithms. Aside from possessing more variables, the direct formulation must additionally resolve dynamic feasibility, which can be non-trivial and lead to slower convergence even for unconstrained systems, as compared to indirect methods. Moreover, while the condensed formulation yields smaller problems, instabilities have been observed [13] due to divergences between the predicted step from the linearized dynamics and the open-loop nonlinear rollout, leading to vanishing step-sizes.
Contributions: Re-interpreting the canonical DDP closed-loop rollout as a sensitivity-based correction to a second-order search direction, we demonstrate how to compute a locally affine approximation to the constrained perturbation policies, i.e., a set of feedback gains similar to those employed by the DDP rollouts. Our key theoretical result states that in the constrained setting, one must first compute an optimal perturbation sequence about the current trajectory iterate by solving a full-horizon QP (as opposed to the one-step DP backward recursion), and then augment the canonical cost-to-go backward pass with a constraint-set recursion. We then demonstrate how to approximate the desired feedback gains using an efficient, parallelized algorithm, eliminating the backward pass. The closed-loop rollout is integrated into an SQP line-search, yielding a hybrid indirect/direct algorithm that combines the theoretical foundations of SQP for constrained optimization with the algorithmic efficiencies of DDP-style forward rollouts. The method is rigorously evaluated within several environments, where we confirm significant convergence speed improvements over naïve (i.e., open-loop) SQP.
Related Work: Quasi-DDP methods for constrained trajectory optimization fall into one of three main categories: control-bounds only [14], [15], modified backward pass via KKT analysis [16, 17, 18, 19, 20], and augmented Lagrangian methods [21, 4, 3, 22, 23]. We provide a comprehensive overview of these approaches in Appendix A. (All appendices referenced herein may be found in the online version of this work [24].)
II Shooting SQP
We detail below the core algorithmic steps for SQP, as applied to the shooting formulation of problem (2), i.e., where dynamics are treated implicitly and we optimize only over the control sequence . The three steps are [25, 26]: (i) solving a QP sub-problem to compute a search direction, i.e., a sequence of control perturbations , (ii) performing line-search along using a merit function, and (iii) monitoring termination conditions. We provide some details regarding (i) and (ii) here, and refer the reader to Appendices B and C for the rest.
Let denote the concatenation , (where is null), and let denote the corresponding dual variable. Consider the Lagrangian at the current primal-dual iterate :
(3)
For brevity, we omit the explicit arguments where possible. As we are only optimizing over , let represent the state trajectory starting at obtained from propagating the open-loop control sequence through the discrete-time dynamics in (1) for . Define the “reduced” objective and Lagrangian as and , respectively, and define the sets and as and . Then, the QP sub-problem takes the form:
(4a)–(4c)
where and denotes the standard Euclidean dot product; are the dynamics Jacobians , and are the constraint Jacobians . The sub-problem objective (4a) has the form:
(5)
where the terms are provided in Appendix C. Let represent the optimal primal, and the optimal inequality dual solutions for (4), and define the dual search direction .
II-A Line-Search
For , define and . The line-search merit function is defined as:
(6)
where is the augmented Lagrangian function for problem (2), is a vector of slack variables for the inequality constraints with search direction , introduced solely for the line-search, and is a set of penalty parameters. Please see Appendix C for details on .
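As an illustration of step (ii), a minimal backtracking sketch of a merit-function line-search is shown below. The callable merit(alpha) is assumed to roll out the perturbed controls and evaluate the merit function in (6); the default constants mirror the hyperparameters listed in Appendix E, but their exact role there is an assumption.

```python
def merit_line_search(merit, directional_derivative, ratio=0.4, c=0.49, alpha_min=1e-5):
    """Backtrack from alpha = 1 until a sufficient-decrease condition on the merit holds."""
    phi0 = merit(0.0)
    alpha = 1.0
    while alpha >= alpha_min:
        # Armijo-style condition; directional_derivative is the (negative) merit slope at 0
        if merit(alpha) <= phi0 + c * alpha * directional_derivative:
            return alpha
        alpha *= ratio
    return None  # line-search failure; caller may fall back to a safeguard
```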
III Dynamic Programming SQP
Implicit within the line-search is the open-loop rollout along the search direction , i.e., . For unstable nonlinear systems, this state trajectory may differ significantly from , the “predicted” sequence from the QP sub-problem, forcing the line-search to take sub-optimal step-sizes and slowing convergence. This observation is corroborated in [11] in the context of comparing DDP and Newton methods for unconstrained problems, and in [13] in the constrained context. Our objective, therefore, is to efficiently compute a set of feedback gains to perform DDP-style closed-loop rollouts within the SQP line-search. We hypothesize that such an enhancement will (i) improve the numerical stability of the line-search, and (ii) accelerate convergence of Shooting SQP. We first demonstrate how the classical DP recursion is ill-posed in the context of constrained trajectory optimization, and propose a correction inspired by sensitivity analysis.
III-A Sensitivity-Based Dynamic Programming
The starting point for the derivation of iLQR and DDP algorithms for unconstrained problems is the Bellman form of the optimal cost-to-go function:
where is a policy for time-step , mapping states to controls. For a non-optimal state-control sequence , consider the local expansion of the optimal cost-to-go function:
where , and is a perturbation policy for time-step at , as a function of . Now, by recursively (in a backward pass) taking quadratic approximations of the state-action variation function about , one can solve for an affine approximation to the minimizing perturbation policy. In particular, let represent the quadratic approximation of , and define . Then . For step-length , this perturbation policy is rolled out to obtain the new trajectory iterate:
\[
u_t = \bar{u}_t + \alpha\, k_t + K_t (x_t - \bar{x}_t), \qquad x_{t+1} = f(x_t, u_t), \qquad x_0 = \bar{x}_0, \tag{7}
\]
where . Notice that one may interpret the terms of the unconstrained perturbation policy as follows:
\[
k_t = \operatorname*{argmin}_{\delta u}\, Q_t(0, \delta u), \qquad K_t = \frac{\partial}{\partial\, \delta x}\Bigl[\operatorname*{argmin}_{\delta u}\, Q_t(\delta x, \delta u)\Bigr]. \tag{8}
\]
Remark 1.
Since is the solution of an unconstrained convex quadratic, the argument is redundant for the sensitivity matrix . This will not be the case in the constrained setting.
Consider now the constrained setting, and define for :
(9)
where is the stage- term in (5), , and is the optimal “cost-to-go” for problem (4). That is, for , , , while for , , is the optimal value of the tail-truncation of QP sub-problem (4), starting at time-step at .
Notice that since is the optimal value of a constrained QP, the sensitivity matrix (paralleling the terms defined in (8)) may be ill-defined, for instance when the tail sub-problem is infeasible at . This is a consequence of the linearized constraints, irrespective of the objective function used to define the DP recursion.
Instead, consider the following equivalent re-arrangement of the unconstrained DDP control law in (7):
\[
u_t = \bar{u}_t + \alpha\, \delta u^*_t + K_t\bigl(x_t - \bar{x}_t - \alpha\, \delta x^*_t\bigr), \tag{10}
\]
where the sequence is defined by the rollout of via the linearized dynamics:
\[
\delta u^*_t = k_t + K_t\, \delta x^*_t, \qquad \delta x^*_{t+1} = A_t\, \delta x^*_t + B_t\, \delta u^*_t, \qquad \delta x^*_0 = 0. \tag{11}
\]
In light of the homogeneity of the above recursion (i.e., the sequence rolled out via the linear dynamics yields ), eq. (10) suggests interpreting as a search-direction, as the search step, and the feedback term as a sensitivity-based correction. Thus, we may interpret the DDP rollout as a local sensitivity-based correction to the Newton search direction (). Generalizing this interpretation to the constrained setting, consider the following control law:
where, similarly to (11), is obtained from rolling out through the linearized dynamics. Now, it follows from Bellman’s principle of optimality that , where are the optimal solution to (4). Further, since , it follows inductively that . Thus, the final control law for becomes
\[
u_t = \bar{u}_t + \alpha\, \delta u^*_t + K_t\bigl(x_t - \bar{x}_t - \alpha\, \delta x^*_t\bigr), \qquad x_{t+1} = f(x_t, u_t), \qquad x_0 = \bar{x}_0. \tag{12}
\]
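A minimal sketch of the closed-loop rollout implied by (12) is given below; the QP solution (dx_star, du_star), the gains K, and the step-length alpha are assumed given, and the interface is illustrative.

```python
import numpy as np

def sensitivity_rollout(f, x_bar, u_bar, dx_star, du_star, K, alpha):
    """Roll out the sensitivity-corrected control law (12) through the true dynamics."""
    x = x_bar[0]
    X, U = [x], []
    for t in range(len(u_bar)):
        # feedforward step plus feedback on the deviation from the predicted state
        u = u_bar[t] + alpha * du_star[t] + K[t] @ (x - (x_bar[t] + alpha * dx_star[t]))
        x = f(x, u)
        U.append(u); X.append(x)
    return np.asarray(X), np.asarray(U)
```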
Given that , as defined in (9), is implicitly the solution of a variable-horizon optimization, it is computationally prohibitive to compute the sensitivity matrices above via explicit differentiation. Instead, we next define a DP recursion to exactly compute these sensitivities about the fixed sequence .
III-B DP Recursion for Computing Sensitivity Gains
We outline the DP recursion first and characterize its correctness in Theorem 1.
Initialization: Set .
Time-step : Define the function . For , consider the one-step QP:
(13)
Let and denote the optimal control perturbation and inequality dual solutions for the above one-step QP, as a function of . Define the sensitivity matrices and as the Jacobians [27] of and respectively, evaluated at , and define the affine functions:
(14a)–(14b)
Recurse: Compute:
(15)
where , and:
(16a)–(16e)
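To illustrate the sensitivity computation within one step of this recursion, the sketch below differentiates the active-set KKT system of a generic one-step QP of the form min_{du} ½ duᵀ H du + (q0 + Qux dx)ᵀ du subject to G du ≤ h0 + E dx. This parameterization and the data names are assumptions for illustration, not the exact quantities in (13).

```python
import numpy as np

def one_step_sensitivities(H, Qux, G, E, active):
    """Jacobians of the one-step QP solution (and active duals) w.r.t. the state perturbation."""
    Ga, Ea = G[active], E[active]   # active rows, assumed linearly independent (LICQ)
    na, nu = Ga.shape[0], H.shape[0]
    # KKT matrix of the equality-constrained problem obtained by fixing the active set
    KKT = np.block([[H, Ga.T], [Ga, np.zeros((na, na))]])
    # Differentiate the KKT conditions: H K + Qux + Ga^T dLam = 0,  Ga K = Ea
    rhs = np.vstack([-Qux, Ea])
    sol = np.linalg.solve(KKT, rhs)
    K, dLam = sol[:nu], sol[nu:]
    return K, dLam
```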
We now characterize the correctness of this DP recursion in the following theorem; the proof is provided in Appendix D.
Theorem 1.
Remark 3.
A notable consequence of Theorem 1 is that the canonical cost-to-go recursion is ill-posed in the presence of constraints. One must back-propagate both the cost-to-go terms and a set of constraints (i.e., the sets ) that define the regions where the quadratic models of the cost-to-go functions are precise.
Despite the exactness of the DP recursion, there are some computational drawbacks. First, one must solve the “full-horizon” QP defined in (4) and then, serially within the backward pass, the one-step QPs defined in (13). Second, back-propagating sets is not numerically robust, particularly if the sensitivity is ill-defined. This occurs when the LICQ condition fails, rendering the matrix solve for the sensitivities singular. Thus, in the next section, we outline a parallelized and tunable approximation to the sensitivity gains, derived from the viewpoint of interior point methods.
III-C Approximating the Sensitivity Gains
For , define problem as the tail portion of QP sub-problem (4), starting at time-step at . Let represent the optimal solution as a function of , i.e., the optimal control perturbation sequence starting at time-step . Notice that , as defined in (9), corresponds to the first element of the sequence .
Now define the QP problem as QP sub-problem (4), subject to an additional equality constraint: , and let represent the optimal tail control perturbation sequence starting at time-step .
Notice then that for all where is feasible, we have that . Thus, is equal to the optimal control perturbation at time-step for problem , hereby denoted as the function . Further, since is feasible, one may compute the desired sensitivity gains as the Jacobian .
Note the distinction: the sensitivity corresponds to the Jacobian of the solution of a fixed-horizon QP (problem ) w.r.t. a parameter () that defines the equality constraint . In comparison, the sensitivity corresponds to the Jacobian of the solution of a variable-horizon QP (problem ) w.r.t. a parameter () that defines the “initial condition.” The former computation is easily parallelized.
Leveraging a recent result in [28], we approximate by the Jacobian of the solution of the following unconstrained barrier re-formulation of problem w.r.t. at :
subject to the linear dynamics in (4b); where is the barrier constant. Denote to be the barrier-based Jacobian with parameter and let be the true Jacobian. Under appropriate conditions on the solution of QP sub-problem (4), as [28].
In practice, we compute the Jacobians efficiently using iLQR and a straightforward application of the Implicit Function Theorem [29]. We initialized the solver with the QP sub-problem (4) solution , and found that only a handful of iterations were needed to converge, particularly since problem (III-C) is convex.
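The following is a minimal sketch of this barrier-smoothed sensitivity: the stationarity condition of the (convex) barrier problem is differentiated via the Implicit Function Theorem. The callable grad_z (gradient of the barrier objective in the decision variable, with the barrier constant baked in) and the finite-difference construction of the partial Jacobians are illustrative assumptions; in practice these derivatives are available analytically from the QP data.

```python
import numpy as np

def barrier_jacobian(grad_z, z_star, theta, eps=1e-6):
    """d z*(theta) / d theta from the stationarity condition grad_z(z, theta) = 0.

    By the Implicit Function Theorem: dz/dtheta = -[d grad_z / d z]^{-1} [d grad_z / d theta].
    """
    nz, nth = z_star.size, theta.size
    g0 = grad_z(z_star, theta)
    Hzz = np.zeros((nz, nz))
    Hzt = np.zeros((nz, nth))
    for i in range(nz):
        dz = np.zeros(nz); dz[i] = eps
        Hzz[:, i] = (grad_z(z_star + dz, theta) - g0) / eps
    for j in range(nth):
        dt = np.zeros(nth); dt[j] = eps
        Hzt[:, j] = (grad_z(z_star, theta + dt) - g0) / eps
    return -np.linalg.solve(Hzz, Hzt)
```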
III-D Hybrid SQP Algorithm
We formally state the Hybrid SQP algorithm as a line-search modification of Shooting SQP, introduced in Section II. Thus, at the current primal-dual iterate :
Step 1: Solve QP-subproblem (4) to obtain the optimal perturbation sequence pair .
Step 2: Compute the sensitivity gains , using either the DP recursion in Section III-B (i.e., ), or the smoothed approximation method in Section III-C (i.e., for some ).
Step 3: Perform line-search using (6), where is now defined by the closed-loop rollout:
Notice that if , the rollout corresponds to the ideal DDP rollout in (12), while for , we end up with an approximation stemming from using a fixed gain matrix. (As the sensitivity gains are only valid in a neighborhood of , it is possible, though rare in our experiments, that the computed step-length falls below the user-set threshold for a specific iteration. As a backup, we compute a set of TV-LQR gains using the linearized dynamics and the Hessian of the objective function , and perform the closed-loop rollout with these gains. This strategy is motivated by the goal of tracking the perturbation during the rollout [13].)
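A high-level sketch of one Hybrid SQP iteration follows; the callables are placeholders for the QP solve, gain computation, closed-loop rollout, and merit line-search described above, and the whole interface is illustrative rather than the paper's implementation.

```python
def hybrid_sqp_step(iterate, solve_qp, compute_gains, closed_loop_rollout, merit_line_search):
    """One iteration of Hybrid SQP: QP search direction, sensitivity gains, CL line-search."""
    dU, dX, duals = solve_qp(iterate)                    # Step 1: sub-problem (4)
    K = compute_gains(iterate, dU, dX)                   # Step 2: DP recursion or barrier approx.
    rollout = lambda alpha: closed_loop_rollout(iterate, dU, dX, K, alpha)
    alpha, new_traj = merit_line_search(rollout, duals)  # Step 3: merit function (6)
    return new_traj, duals, alpha
```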
IV Experiments
We compare the naïve, open-loop Shooting SQP implementation introduced in Section II (referred to as OL) with the DDP-style closed-loop variations developed in Section III (referred to as CL and ) on two environments. The identifiers CL and distinguish between the exact DP recursion and the smoothed barrier-based approximation for computing the forward rollout gains. Please see Appendix E for details regarding problem setup, SQP hyperparameters, additional plots, and an extra worked example.
IV-A Motion Planning for a Car
The first example is taken from [17], featuring a 2D car (, ) moving within the obstacle-ridden environment shown in Figure 2. The objective is to drive to the goal position with final velocity and orientation aligned with the horizontal axis in 40 steps, while avoiding the obstacles.
Figure 2 shows the computed trajectories by the three methods for three different initial conditions, while Table I provides the solver statistics. Notice that the OL method fails to converge within 100 iterations (the limit) for two of the three cases. In contrast, both closed-loop variations are quickly able to converge to stationary solutions.
Table I: Solver statistics for the car example (the final three rows correspond to the smoothed barrier-based closed-loop variant).

Method | Case | Converged | Iter | Obj | Viol | Time/it [s]
---|---|---|---|---|---|---
OL | 1 | ✗ | 100 | 278.53 | -0.0346 | 0.34
OL | 2 | ✗ | 100 | 2.17 | 0.0061 | 0.30
OL | 3 | ✓ | 12 | 21.49 | 3.25e-6 | 0.30
CL | 1 | ✓ | 19 | 3.19 | 7.87e-6 | 13.17
CL | 2 | ✓ | 19 | 6.98 | 1.43e-6 | 19.19
CL | 3 | ✓ | 13 | 21.58 | 1.12e-5 | 17.90
 | 1 | ✓ | 19 | 3.19 | 1.34e-5 | 0.40
 | 2 | ✓ | 16 | 2.06 | 1.28e-5 | 0.41
 | 3 | ✓ | 11 | 21.58 | -4.67e-6 | 0.38
In Figure 3, we plot the reconstruction error for CL and over the course of the SQP iterations. Observe that for most iterations, the error is negligible for CL, with occasional spikes resulting from the numerical instability of the constrained DP recursion. In contrast, the error remains sufficiently low for all iterations of , even leading to a better quality (lower objective) solution for Case #2. We hypothesize that the better numerical stability of stems from a smoothing of the computed Jacobians (i.e., feedback gains), courtesy of the unconstrained barrier re-formulation. Finally, a notable advantage of over CL is the computation time. While CL involves differentiating through the KKT conditions of one-step QPs, this computation must happen serially within the backward pass. In contrast, computes the required Jacobians across all time-steps in parallel using an efficient adjoint recursion associated with problem (III-C). Consequently, the computation times per iteration are much closer for OL and than for OL and CL. For the remaining experiments, we only compare OL and .
IV-B Quad-Pendulum
Consider a quadrotor with an attached pendulum (, ) moving within an obstacle-cluttered 2D vertical plane, subject to viscous friction at the pendulum joint. The system is subject to operational constraints on the state and control input, as well as obstacle avoidance constraints. The task involves planning a trajectory starting at rest with the pendulum at the stable equilibrium, to a goal location, with the pendulum upright and both quadrotor and pendulum stationary. Table II provides the solver statistics for and OL (up until the algorithm stalls due to infeasibility of the QP sub-problem). Figure 1 shows a timelapse of the solution for the more difficult of the two cases, highlighting the ability of in solving challenging planning tasks.
Table II: Solver statistics for the quad-pendulum (the final two rows correspond to the smoothed barrier-based closed-loop variant).

Method | Case | Converged | Iter | Obj | Viol | Time/it [s]
---|---|---|---|---|---|---
OL | 1 | ✗ | stall (2) | 2272.8 | -7.33 | 1.49
OL | 2 | ✗ | stall (4) | 488.9 | -0.58 | 1.35
 | 1 | ✓ | 39 | 9.31 | -2.73e-10 | 6.25
 | 2 | ✓ | 59 | 11.57 | -4.33e-10 | 5.65
V Conclusions
In this work, we re-interpret DDP rollout policies from the perspective of sensitivity-based corrections, and use this insight to develop algorithms for computing analogous policies for constrained problems. We incorporate the resulting closed-loop rollouts within a shooting-based SQP framework, and demonstrate significant improvements in convergence speed over a standard SQP implementation using open-loop rollouts.
Our work opens several avenues for future research. First, a key bottleneck of SQP involves solving the QP sub-problem at each iteration to compute a “nominal” perturbation sequence. This may be achieved with fast but potentially less accurate unconstrained solvers (e.g., augmented-Lagrangian iLQR) that additionally compute the desired sensitivity gains using an efficient Riccati recursion. Second, leveraging recent results on differentiating through the solution of general convex problems, the sensitivity-based computations may be applied to the sequential-convex-programming algorithm. Finally, while the SQP algorithm was studied in the shooting context, recent work [30] has demonstrated how to incorporate nonlinear rollouts with both states and controls as optimization variables, albeit in an otherwise unconstrained setting. The sensitivity-based gain computation can be extended to this setting, potentially boosting the efficiency of “full” direct methods.
References
- [1] J. Schulman, Y. Duan, J. Ho, A. Lee, I. Awwal, H. Bradlow, J. Pan, S. Patil, K. Goldberg, and P. Abbeel, “Motion planning with sequential convex optimization and convex collision checking,” The International Journal of Robotics Research, vol. 33, no. 9, pp. 1251–1270, 2014.
- [2] M. Kalakrishnan, S. Chitta, E. Theodorou, P. Pastor, and S. Schaal, “STOMP: Stochastic trajectory optimization for motion planning,” in 2011 IEEE international conference on robotics and automation. IEEE, 2011, pp. 4569–4574.
- [3] V. Sindhwani, R. Roelofs, and M. Kalakrishnan, “Sequential operator splitting for constrained nonlinear optimal control,” in 2017 American Control Conference (ACC). IEEE, 2017, pp. 4864–4871.
- [4] T. A. Howell, B. E. Jackson, and Z. Manchester, “ALTRO: A fast solver for constrained trajectory optimization,” in 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2019, pp. 7674–7679.
- [5] Y. Tassa, T. Erez, and W. D. Smart, “Receding horizon differential dynamic programming,” in Advances in Neural Information Processing Systems (NIPS), 2007, pp. 1465–1472.
- [6] D. Bertsekas, Nonlinear Programming. Athena Scientific; 3rd edition, 2016.
- [7] W. Li and E. Todorov, “Iterative linear quadratic regulator design for nonlinear biological movement systems.” in ICINCO (1), 2004, pp. 222–229.
- [8] D. Q. Mayne, “Differential dynamic programming–a unified approach to the optimization of dynamic systems,” Control and Dynamic Systems, vol. 10, pp. 179–254, 1973.
- [9] D. Murray and S. Yakowitz, “Differential dynamic programming and Newton’s method for discrete optimal control problems,” Journal of Optimization Theory and Applications, vol. 43, no. 3, pp. 395–414, 1984.
- [10] J. C. Dunn and D. Bertsekas, “Efficient dynamic programming implementations of Newton’s method for unconstrained optimal control problems,” Journal of Optimization Theory and Applications, vol. 63, no. 1, pp. 23–38, 1989.
- [11] L.-z. Liao and C. A. Shoemaker, “Advantages of differential dynamic programming over Newton’s method for discrete-time optimal control problems,” Cornell University, Tech. Rep., 1992.
- [12] F. Borrelli, A. Bemporad, and M. Morari, Predictive control for linear and hybrid systems. Cambridge University Press, 2017.
- [13] M. J. Tenny, S. J. Wright, and J. B. Rawlings, “Nonlinear model predictive control via feasibility-perturbed sequential quadratic programming,” Computational Optimization and Applications, vol. 28, no. 1, pp. 87–121, 2004.
- [14] Y. Tassa, N. Mansard, and E. Todorov, “Control-limited differential dynamic programming,” in 2014 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2014, pp. 1168–1175.
- [15] J. Marti-Saumell, J. Solà, C. Mastalli, and A. Santamaria-Navarro, “Squash-box feasibility driven differential dynamic programming,” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2020, pp. 7637–7644.
- [16] M. Giftthaler and J. Buchli, “A projection approach to equality constrained iterative linear quadratic optimal control,” in 2017 IEEE-RAS 17th International Conference on Humanoid Robotics (Humanoids). IEEE, 2017, pp. 61–66.
- [17] Y. Aoyama, G. Boutselis, A. Patel, and E. A. Theodorou, “Constrained differential dynamic programming revisited,” arXiv preprint arXiv:2005.00985, 2020.
- [18] Z. Xie, C. K. Liu, and K. Hauser, “Differential dynamic programming with nonlinear constraints,” in 2017 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2017, pp. 695–702.
- [19] S. Yakowitz, “The stagewise Kuhn-Tucker condition and differential dynamic programming,” IEEE Transactions on Automatic Control, vol. 31, no. 1, pp. 25–30, 1986.
- [20] G. Lantoine and R. P. Russell, “A hybrid differential dynamic programming algorithm for constrained optimal control problems. part 1: Theory,” Journal of Optimization Theory and Applications, vol. 154, no. 2, pp. 382–417, 2012.
- [21] B. Plancher, Z. Manchester, and S. Kuindersma, “Constrained unscented dynamic programming,” in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2017, pp. 5674–5680.
- [22] J. Ma, Z. Cheng, X. Zhang, M. Tomizuka, and T. H. Lee, “Alternating direction method of multipliers for constrained iterative LQR in autonomous driving,” arXiv preprint arXiv:2011.00462, 2020.
- [23] A. Pavlov, I. Shames, and C. Manzie, “Interior point differential dynamic programming,” IEEE Transactions on Control Systems Technology, 2021.
- [24] S. Singh, J. J. Slotine, and V. Sindhwani, “Optimizing trajectories with closed-loop dynamic SQP,” arXiv preprint arXiv:2109.07081, 2021.
- [25] P. E. Gill, W. Murray, M. A. Saunders, and M. H. Wright, “User’s guide for NPSOL (version 4.0): A Fortran package for nonlinear programming.” Stanford Univ CA Systems Optimization Lab, Tech. Rep., 1986.
- [26] P. E. Gill, W. Murray, and M. A. Saunders, “SNOPT: An SQP algorithm for large-scale constrained optimization,” SIAM review, vol. 47, no. 1, pp. 99–131, 2005.
- [27] A. Agrawal, B. Amos, S. Barratt, S. Boyd, S. Diamond, and J. Z. Kolter, “Differentiable convex optimization layers,” in Advances in Neural Information Processing Systems (NeurIPS), 2019, pp. 9562–9574.
- [28] W. Jin, S. Mou, and G. J. Pappas, “Safe pontryagin differentiable programming,” arXiv preprint arXiv:2105.14937, 2021.
- [29] B. Amos, I. Jimenez, J. Sacks, B. Boots, and J. Z. Kolter, “Differentiable MPC for end-to-end planning and control,” in Advances in Neural Information Processing Systems, 2018, pp. 8289–8300.
- [30] C. Mastalli, R. Budhiraja, W. Merkt, G. Saurel, B. Hammoud, M. Naveau, J. Carpentier, L. Righetti, S. Vijayakumar, and N. Mansard, “Crocoddyl: An efficient and versatile framework for multi-contact optimal control,” in 2020 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2020, pp. 2536–2542.
- [31] W. Murray and F. J. Prieto, “A sequential quadratic programming algorithm using an incomplete solution of the subproblem,” SIAM Journal on Optimization, vol. 5, no. 3, pp. 590–640, 1995.
- [32] A. Bemporad, M. Morari, V. Dua, and E. N. Pistikopoulos, “The explicit linear quadratic regulator for constrained systems,” Automatica, vol. 38, no. 1, pp. 3–20, 2002.
- [33] R. Tedrake, “Underactuated robotics: Learning, planning, and control for efficient and agile machines course notes for MIT 6.832,” MIT, Tech. Rep., 2009.
Appendix A Related Work
iLQR/DDP-like methods for constrained trajectory optimization fall into one of three main categories: control-bounds only [14], [15], modification of the backward pass via KKT analysis [16, 17, 18, 19, 20], and augmented Lagrangian methods [21, 4, 3, 22, 23]. In the first category, only control-bound constraints are considered. For instance, [14] leverages the box-constrained DDP algorithm where the projected Newton algorithm is used to compute the affine perturbation policy in the backward pass, accounting for the box constraints on the control input. On the other hand, [15] composes a “squashing” function to constrain the input, and augments the objective with a barrier penalty to encourage the iterates to stay away from the plateaued regions of the squashing function.
More general constraints are considered in the second and third categories [16, 17, 18, 19, 20, 21, 4, 3, 22, 23]. The work in [16] features equality constraints. The update step leverages constraint linearization and a nullspace projection to reduce the problem to a singular optimal control problem in an unconstrained control input lying within the nullspace of the linearized constraints. The works in [17, 18, 19, 20] formulate a constrained backward pass, where the one-step optimization problems now feature general state and control inequality constraints, linearized about the current trajectory iterate, yielding constrained one-step QPs. The algorithm in [18] extracts a guess of the active inequality constraints at each time-step by looking at the current trajectory iterate, and formulates the backward pass using linearized active constraints as equalities. The resulting KKT system is solved via the Schur complement, using the dual variable extracted from the “feedforward” perturbation (i.e., the control perturbation computed assuming zero state perturbation) to yield an affine perturbation policy. The work in [19] formulates the theoretical underpinnings of such a constrained backward pass by introducing the stage-wise KKT conditions, and implements a very similar algorithm to [18], but where the active set is guessed to be equal to the active set at the feedforward solution, computed using linearized inequality constraints. [20] implements an identical backward pass, but with the active set guessed by extracting the violating constraints from implementing the feedforward perturbation (computed using a trust-region constraint). More recently, [17] leverages a slack-form of the linearized inequality constraints in the one-step optimization problems and uses an iterative procedure to refine the dual variable assuming zero state perturbation. A final solve of the KKT system via the Schur complement yields the affine perturbation policy. The forward passes of all these methods discard this affine policy and instead use the one-step QPs to compute the control perturbation trajectory. In a similar spirit, [23] introduced an interior-point variation of the backward pass, where the one-step optimization is re-written in min-max form w.r.t. the Lagrangian, and linearization of the associated perturbed KKT conditions is used to compute the affine perturbation policy.
The algorithms in [21, 4, 3, 22] incorporate the constraints by forming the augmented Lagrangian, and alternate between unconstrained trajectory optimization (with the augmented Lagrangian as the objective) and updating the dual variables and penalty parameter, loosely mimicking the method of multipliers [6]. [21] implements this in combination with a sampling-based construction of the optimal cost-to-go approximation; [4] dovetails the multipliers algorithm with Newton-based projections to project the intermediate solution onto the current active set; [3] adopts an ADMM-based solution by leveraging indicator-function representations of the constraints, alternating between an unconstrained TV-LQR problem, linearized constraint projection, a consensus update, and a dual update.
In contrast to the methods above, our approach does not involve guessing active constraint sets for the one-step QPs and instead re-interprets the DDP-style control law through the lens of sensitivity analysis about a known feasible perturbation sequence. We use this interpretation to derive an augmented “backward pass” where we additionally back-propagate a “next-step” constraint set, along with the quadratic cost-to-go parameters. Further, we demonstrate how to approximate the desired feedback gains across all time-steps in parallel, obviating the need for the numerically less robust backward pass. Finally, we embed the closed-loop rollout within a theoretically sound SQP framework.
Appendix B SQP Preliminaries
In this appendix, we outline the main steps of the SQP algorithm. These steps represent a simplified version of the commercial code NPSOL [25], and use the termination conditions from the commercial code SNOPT [26]. Consider the following smooth optimization problem:
(17)
where the functions and are at least . Define the Lagrangian as:
A standard SQP iteration consists of the following steps:
- Solve a quadratic problem (QP) at the current primal-dual point :
where , , and . Let be the optimal dual variable for the inequality constraint above, and define the primal-dual search directions , where .
- Define the merit function to be the augmented Lagrangian:
where is the inequality “slack,” defined only for the line-search, and is a non-negative penalty parameter. At the current iterate and penalty value , set as follows:
where the operator is component-wise, and set the slack search direction as .
- Define the line-search function and pick the updated penalty parameter s.t. , where is the decrement, defined as: . A simple update rule is given below [31]:
(18)
- Compute (e.g., using backtracking, safeguarded polynomial interpolation, etc.) the largest step length , where is a user-specified lower-bound, s.t. the following line-search conditions are satisfied:
(19), where .
- Update using the computed step length, and set .
Note that this stripped-down version of SQP does not account for infeasibility detection, which involves slackened forms of the QP sub-problem.
The termination conditions are based upon specified relative tolerances . Define , and . Then, convergence to a KKT stationary point is declared and the algorithm terminated if:
(20)
where denotes the Hadamard product.
Appendix C Details for Shooting SQP
We provide here the explicit expressions used within various stages of Shooting SQP.
C-A QP Sub-Problem
(21)
where is defined recursively using the linearized dynamics:
(22)
The Hessian term in (4a) and (5) is given by:
(23)
Remark 4.
The matrices above may not be positive semi-definite. To ensure problem (4) is convex, we project these matrices onto the positive semi-definite cone with some tolerance.
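As one concrete piece of the condensed construction above, the reduced-objective gradient can be accumulated with a single adjoint (costate) sweep through the linearized dynamics; the sketch below assumes the per-step Jacobians and cost gradients are provided, with illustrative names.

```python
import numpy as np

def reduced_gradient(fx, fu, lx, lu, lTx):
    """Gradient of the reduced (condensed) objective w.r.t. each control via the adjoint recursion."""
    T = len(fu)
    grad = [None] * T
    lam = lTx                              # costate at the terminal step
    for t in reversed(range(T)):
        grad[t] = lu[t] + fu[t].T @ lam    # d J / d u_t
        lam = lx[t] + fx[t].T @ lam        # back-propagate the costate
    return grad
```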
C-B Line-Search
Define where and have the same dimensionality as and respectively. We use the augmented Lagrangian function as the merit function for line-search:
(24)
where the dependence on is implicit, i.e., . The non-negative constants are a set of penalty parameters, which, for brevity, we denote by . The vectors represent a set of slack vectors, used only for the line-search. At the current iterate and set of penalty parameters , set as
and set , the slack-search directions, as:
As in Appendix B, we now must compute an updated set of penalty parameters so that . To derive this update, let us consider an abstract optimization problem of the form:
where is a set of vector-valued inequality constraints, with corresponding Jacobians . Let be the stacked dual vector for the inequality constraints, the Lagrangian Hessian, and the objective gradient. The QP sub-problem is then defined as:
Let be the optimal dual vector for the QP-subproblem for constraint and define . Consider the Augmented Lagrangian, , defined as:
where is the “slack” for the constraint, and denotes the concatenation . At the current iterate, set the slack vectors as follows:
and set the slack search directions . Define the line-search function as:
Now, we wish to choose the set of penalty parameters s.t. . Consider the gradient :
where we have used the definitions of and . Notice now that if for all , then since , is a feasible solution to the QP sub-problem. Thus, , from which one obtains the desired inequality. Consider then the case that the set is non-empty. Then, we can re-write as:
Now, for , defined as:
we obtain:
Thus, the update equation for may be written as:
In the context of Section II-A, the term is equivalent to the optimal objective of problem (4), while the correspondences for the dual and slack vectors follow straightforwardly. The line-search acceptance conditions are as given in (19).
C-C Termination Conditions
Let be user-specified relative tolerances, and define and . Then, the KKT stationarity termination conditions are given as:
(25)
where denotes the Hadamard product, and , the Hamiltonian, is defined in (23).
Appendix D Computing Sensitivity Gains using DP
Proof of Theorem 1.
We will prove the result via induction. Notice that for the base case , the cost-to-go function is defined only for , which by construction, is the set . Thus and for . It follows that problem (13) for coincides exactly with the definition for given in (9). Thus, for all where the problem is feasible.
Now, notice that problem (13) for is a multi-parametric QP in the “parameter” . Thus, by Theorem 2 in [32], linear independence of the active inequality constraints at implies that the solution functions and are locally affine in a region containing , taking the form in (14). (One can actually show that the solution function is in fact piecewise affine over ; however, this is not necessary for the proof.) This region containing is precisely the set:
A second consequence of the aforementioned theorem is that the above set is defined by the intersection of the following two sets:
that is, the subset of where the locally affine solutions satisfy the primal-dual feasibility constraints of problem (13) for . This however is precisely the definition of the set in (15). Thus, we have established that for and by construction, . Substituting the affine feedback law into the objective for problem (13) for gives the recursion for , as defined in (16). Thus, for , completing the proof for .
Suppose then that the following statements are true for some : (i) , and (ii) for . Consider the definition of given in (9), and define:
Since by the inductive hypothesis, it follows that and hence, the set is non-empty. Now, for all , one may add the redundant constraint: and equivalently re-write as:
Now, leveraging the inductive hypothesis that for all , we can re-write as:
Comparing with problem (13), this is precisely the definition of . Thus, we have that for .
Now, consider the fixed affine feedback law in (14), which is well-defined thanks to the linear independence assumption at . Once again, applying Theorem 2 in [32], we have that for , as defined in (15). Substituting the affine law into the objective of problem (13) yields the recursion in (16); thus, for all . Further, by construction, ; thus, the intersection is non-empty.
To finish the proof, we must show that . By doing so, we establish the equivalency between the fixed affine feedback law and and consequently, the equivalency of and for .
To show this last part, notice that for any that lies on the boundary , it holds that lies on the boundary . This follows from the continuity of . Then, by the equivalence of and for and the equivalence of and for , for any , it holds that . It follows that .
Thus, since the intersection is non-empty, either (i) , or (ii) is non-empty. In the latter case, we have shown that is the set , completing the proof.
∎
Appendix E Experiment Details
The following hyperparameters were held fixed for all Shooting SQP variations over all environments:
Parameter | Value |
---|---
Line-search decrease ratio: | 0.4 |
Line-search gradient ratio: | 0.49 |
Line-search step-size lower-bound: | 1e-5 |
Termination Primal tolerance: | 1e-3 |
Termination Dual tolerance: | 1e-3* |
In the following, we give details specific to each environment case-study.
E-A Motion Planning for a Car
This is a system with 4 states: , where is the 2D position, is the orientation, and is the forward velocity. The continuous-time dynamics are given by:
where are the steering and acceleration inputs. As in [18], we use Euler integration with a time-step of s to yield the discrete-time model. The control limits are taken from [17]: rad/s, and .
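For reference, a minimal sketch of one common kinematic-car model consistent with the description above (planar position, heading, forward speed, with steering-rate and acceleration inputs) under Euler integration. The exact model, time-step, and constants used in the experiments may differ; all values here are illustrative.

```python
import numpy as np

def car_step(state, control, dt=0.1):
    """One Euler step: state = (px, py, theta, v), control = (omega, a)."""
    px, py, theta, v = state
    omega, a = control
    return np.array([px + dt * v * np.cos(theta),
                     py + dt * v * np.sin(theta),
                     theta + dt * omega,
                     v + dt * a])
```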
The stage cost is with , and terminal cost is with and . The horizon is 40 steps. The initial control sequence guess was all zeros. The constant for was set to .
Figure 4 shows the solver progress for all algorithms and cases.
E-B Acrobot
We study the classic acrobot “swing-up” task with states and control input. The state is defined as , where is the root joint angle, is the relative elbow joint angle, and are their respective angular rates. The dynamics were taken from [33]. Starting from an initial state , we require the acrobot to swing up to the goal state , subject to the control limits . The stage cost is , and the terminal cost is , with weights . Additionally, we enforce the limit constraints and the terminal constraint: . (Since these two state components really live on , we employ chart switching to smoothly handle this constraint near the boundary . This is achieved by adjusting the boundary constraints for each angle to lie in for the next iterate, where the per time-step adjustment is chosen based on the current values for and at time-step . We also wrap both angles back to the interval at the end of each iteration.) The discrete-time dynamics were obtained using explicit Euler integration with a time-step of 0.05 s, and the horizon was steps.
To generate the initial guess, we set , the initial guess for the state-trajectory to be a linear interpolation between and , and , the initial guess for the controls as all zeros. Then, the following rollout was used to generate the initial control trajectory for SQP:
where the gains were computed via a TV-LQR solve, with cost defined by the Hessian of the objective and dynamics linearized about .
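A minimal sketch of this initialization rollout is shown below, assuming the TV-LQR gains K and the interpolated state/control guesses are provided; names are illustrative.

```python
import numpy as np

def initial_guess_rollout(f, x0, X_guess, U_guess, K):
    """Track the interpolated state guess with TV-LQR feedback to seed the SQP controls."""
    x, U = x0, []
    for t in range(len(U_guess)):
        u = U_guess[t] + K[t] @ (x - X_guess[t])
        x = f(x, u)
        U.append(u)
    return np.asarray(U)
```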
We found that for this problem, the Gauss-Newton Hessian approximation was more stable for both algorithms. This entails dropping the second-order gradients stemming from in (23). The constant for was set to . Figure 5 illustrates solver progress.
Despite sub-optimal step-sizes for both algorithms (due to the natural instabilities of the dynamics), successfully converges within 33 iterations, while OL is still struggling to resolve the Lagrangian stationarity residual after 100 iterations.
E-C Quad-Pendulum
The state is defined as , where is the CoM position of the quadrotor in the vertical plane, is its orientation, and is the orientation of the attached pendulum w.r.t. the vertical. The two control inputs corresponding to the rotor thrusts are subject to the limits , where is the mass of the quadrotor and is the gravitational acceleration.
We provide the Lagrangian characterization of the dynamics. The mass matrix , as a function of the generalized coordinates is given by:
where is the mass of the pendulum (assumed concentrated at the endpoint), is the pendulum length, and is the quadrotor moment of inertia. The kinetic energy is thus: ; the potential energy is , and the mechanical Lagrangian is . The dynamics are thus given as:
where , the generalized force vector is given as:
where is the quadrotor wing-span, is the viscous frictional torque: , where is the constant friction coefficient. We used the constants: , , , , , , , and employed Euler discretization with time-step 0.025 s.
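A generic sketch of Euler-integrating such Lagrangian dynamics follows, with M (mass matrix), C (Coriolis terms), g_q (gravity vector), and tau (generalized forces, including the rotor-thrust mapping and joint friction) supplied as callables; this interface is an illustrative assumption rather than the exact implementation.

```python
import numpy as np

def lagrangian_step(q, qd, u, M, C, g_q, tau, dt=0.025):
    """Euler step of M(q) qdd + C(q, qd) qd + g_q(q) = tau(q, qd, u)."""
    qdd = np.linalg.solve(M(q), tau(q, qd, u) - C(q, qd) @ qd - g_q(q))
    return q + dt * qd, qd + dt * qdd
```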
The stage cost is , and the terminal cost is , where is the goal state with: . The vector is the hover thrust setpoint, and . The weights are . The horizon was 160 steps.
In addition to the obstacle avoidance constraints, we also set the operational constraints and . The initial control sequence guess was set to , the hover setpoint, for all timesteps.
Due to the difficulty of this problem, the constant for was initialized at and reduced by a factor of 10 every SQP iteration, until a lower-bound of .
Figure 6 shows a timelapse for the first case solution from , and in Figure 7 we plot the reconstruction error for both cases for .