Sequential Convex Programming for
Non-Linear Stochastic Optimal Control
Abstract.
This work introduces a sequential convex programming framework for non-linear, finite-dimensional stochastic optimal control, where uncertainties are modeled by a multidimensional Wiener process. We prove that any accumulation point of the sequence of iterates generated by sequential convex programming is a candidate locally-optimal solution for the original problem in the sense of the stochastic Pontryagin Maximum Principle. Moreover, we provide sufficient conditions for the existence of at least one such accumulation point. We then leverage these properties to design a practical numerical method for solving non-linear stochastic optimal control problems based on a deterministic transcription of stochastic sequential convex programming.
Key words and phrases:
Nonlinear stochastic optimal control, sequential convex programming, convergence of Pontryagin extremals, numerical deterministic reformulation.
1991 Mathematics Subject Classification: 49K40, 65C30, 93E20.
Résumé. This work introduces a sequential convex programming framework for the optimal control of non-linear, finite-dimensional stochastic systems. We prove that every accumulation point of the sequence of iterates generated by sequential convex programming is a candidate locally-optimal solution of the original problem, in the sense of the stochastic Pontryagin Maximum Principle. Moreover, we develop sufficient conditions for the existence of at least one such accumulation point. We leverage these properties to design a practical numerical method that solves non-linear stochastic control problems through a deterministic reformulation of sequential convex programming.
1. Introduction
Over the past few decades, the applied control community has devoted increasing attention to the optimal control of stochastic systems. The general formulation of this problem entails steering a dynamical system from an initial configuration to a final configuration while optimizing some prescribed performance criterion (e.g., minimizing some given cost) and satisfying constraints. This dynamical system is also subject to uncertainties which are modeled by Wiener processes and may come from unmodeled and/or unpredicted behaviors, as well as from measurement errors. Active development of new theoretical and numerical methods to address this problem continues, and the subject already has a rich literature.
We can classify the existing works into two main categories.
The first category consists of contributions that focus on Linear Convex Problems (LCPs), i.e., whose dynamics are linear and whose costs are convex in both state and control variables. An important class of LCPs is given by Linear Quadratic Problems (LQPs) in which costs are quadratic in both state and control variables and for which the analysis of optimal solutions may be reduced to the study of an algebraic relation known as the Stochastic Riccati Equation (SRE) [1, 2, 3, 4]. Efficient algorithmic frameworks have been devised to numerically solve LCPs, ranging from local search [5] and dual representations [6, 7], to deterministic-equivalent reformulations [8, 9], among others. In the special case of LQPs, those techniques may be further improved by combining SRE theory with semidefinite programming [10, 11], finite-dimensional approximation [12, 13], or chaos expansion [14].
The second category of works deals with problems that do not enjoy any specific regularity, allowing non-linear dynamics or non-convex (therefore non-quadratic) costs. Throughout this paper, we call these Non-Linear Problems (NLPs). It is unquestionable that NLPs have so far received less attention from the community than LCPs, especially since the analysis of the former is usually more involved. Similar to the deterministic case, there are two main theoretical tools that have been developed to analyze NLPs: stochastic Dynamic Programming (DP) [15, 16] and the stochastic Pontryagin Maximum Principle (PMP) [17, 18, 19] (an extensive survey of generalizations of DP and the PMP may be found in [20]). In the case of LQPs, one can show that DP and the PMP lead to SRE [20]. DP provides optimal policies through the solution of a partial differential equation, whereas the necessary conditions for optimality offered by the PMP allow one to set up a two-point boundary value problem which returns candidate locally-optimal solutions when solved. Both methods only lead to analytical solutions in a few cases, and they can involve complex numerical challenges (the stochastic setting is even more problematic than the deterministic one, the latter being better understood for a wide range of problems, see, e.g., [21, 22, 23]). This has fostered the investigation of more tractable approaches to solve NLPs such as Monte Carlo simulation [24, 25], Markov chain discretization [26, 27], and deterministic (though non-equivalent) reformulations [28, 9], among others. Importantly, many of the aforementioned approaches, e.g., [26], are often based on some sort of approximation of the original formulation and thus offer powerful alternatives to DP and the PMP, especially since they are more numerically tractable and can be shown to converge to policies satisfying DP or the stochastic PMP.
In this paper, our objective is to lay the theoretical foundations to leverage Sequential Convex Programming (SCP) for the purposes of computing candidate optimal solutions for a specific class of NLPs. SCP is among the most well-known and earliest approximation techniques for deterministic non-linear optimal control and, to the best of our knowledge, such an approach has not been extended to stochastic settings yet. The simplest SCP scheme (which we consider in this work) consists of successively linearizing any non-linear terms in the dynamics and any non-convex functions in the cost and seeking a solution to the original formulation by solving a sequence of LCPs. This approach leads to two desirable properties, which jointly are instrumental to the design of efficient numerical schemes. First, one can rely on the many efficient techniques and associated software libraries that have been devised to solve LCPs (or LQPs depending on the shape of the original NLP). Second, as we will show in this paper, when this iterative process converges, it returns a strategy that satisfies the PMP related to the original NLP, i.e., a candidate optimum for the original formulation. Unlike existing methods such as [26], which introduce approximated formulations that are still non-linear, our approach offers the main advantage of requiring the solution to LCPs only. Specifically, we identify three key contributions:
(1) We introduce and analyze a new framework to compute candidate optimal solutions for finite-horizon, finite-dimensional non-linear stochastic optimal control problems with control-affine dynamics and uncontrolled diffusion. This hinges on the basic principle of SCP, i.e., iteratively solving a sequence of LCPs that stem from successive linear approximations of the original problem.
(2) Through a meticulous study of the continuity of the stochastic Pontryagin cones of variations with respect to linearization, we prove that any accumulation point of the sequence of iterates generated by SCP is a strategy satisfying the PMP related to the original formulation. In addition, by leveraging additional assumptions, we prove that at least one such accumulation point exists, which in turn provides a “weak” guarantee of success for the method.
(3) Through an explicit example, we show how to leverage the properties offered by this framework to better understand what approximations may be adopted for the design of efficient numerical schemes for NLPs, although we leave the theoretical investigation of the approximation error as a future direction.
The paper is organized as follows. Section 2 introduces notation and preliminary results and defines the stochastic optimal control problem of interest. In Section 3, we introduce the framework of stochastic SCP and the stochastic PMP, and we state our main result of convergence. For the sake of clarity, in Section 4 we retrace the proof of the stochastic PMP and introduce all the necessary technicalities we need to prove our main result of convergence, though we recall that the stochastic PMP is a well-established result and Section 4 should not be understood as part of our main contribution. In Section 5, we discuss extensions to problems with free final time and stochastic admissible controls. In Section 6, we show how to leverage our analysis to design a practical numerical method to solve non-linear stochastic optimal control problems, and we provide numerical experiments. Finally, Section 7 provides concluding remarks and perspectives on future directions.
2. Stochastic Optimal Control Setting
Let be a second–countable probability space and let be a –dimensional Brownian motion with continuous sample paths starting at zero and whose filtration is complete. We consider processes that are defined within bounded time intervals. Hence, for every , and maximal time , we introduce the space of progressive processes (with respect to the filtration ) such that , where is the Euclidean norm. In this setting, for every and , the Itô integral of with respect to is the continuous, bounded in , and –dimensional martingale in (with respect to the filtration ) that starts at zero, denoted . For , we denote by the space of –adapted processes that have continuous sample paths and satisfy .
2.1. Stochastic Differential Equations
From now on, we fix two integers , a maximal time , and a compact, convex subset . Although in this work we consider differential equations steered by deterministic controls (see Section 2.2 below), for the sake of generality, we introduce stochastic dynamics which depend on either deterministic or stochastic controls. Specifically, we denote by the set of admissible controls and consider either deterministic controls or stochastic controls . Note that since is compact, admissible controls are almost everywhere, or a.e. for brevity (and additionally almost surely, or a.s. for brevity) bounded. We are given continuous mappings , and , which are at least with respect to the variable . For a given , we consider dynamical systems modeled through the following forward stochastic differential equation with uncontrolled diffusion
(1)
where we assume that the fixed initial condition satisfies , for every (for instance, this holds when is a deterministic vector of ).
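To make the class of dynamics in (1) concrete, the sketch below simulates a control-affine SDE with uncontrolled diffusion by the Euler–Maruyama scheme; the drift, control matrix, diffusion, dimensions, and the toy instantiation are illustrative placeholders and not the data used elsewhere in the paper.

```python
import numpy as np

def euler_maruyama(b, B, sigma, u, x0, T=1.0, N=200, n_w=1, seed=0):
    """Simulate dx = (b(t, x) + B(t, x) u(t)) dt + sigma(t, x) dW on [0, T]
    by the Euler--Maruyama scheme (uncontrolled diffusion, deterministic control)."""
    rng = np.random.default_rng(seed)
    dt = T / N
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    for i in range(N):
        t = i * dt
        dW = rng.normal(scale=np.sqrt(dt), size=n_w)      # Brownian increments
        x = x + (b(t, x) + B(t, x) @ u(t)) * dt + sigma(t, x) @ dW
        path.append(x.copy())
    return np.array(path)

# Toy instantiation (hypothetical data): 2-dimensional state, scalar control and noise.
b     = lambda t, x: np.array([x[1], -np.sin(x[0])])
B     = lambda t, x: np.array([[0.0], [1.0]])
sigma = lambda t, x: np.array([[0.0], [0.1]])
u     = lambda t: np.array([0.5 * np.cos(t)])             # an admissible control in U = [-1, 1]
path = euler_maruyama(b, B, sigma, u, x0=[0.0, 0.0])
print(path.shape)
```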
The procedure developed in this work is based on the following linearization of (1). For , let and . For a given , we define the linearization of (1) around to be the following well-defined forward stochastic differential equation with uncontrolled diffusion
(2)
We require the solutions to (1) and to (2) to be bounded in expectation uniformly with respect to , and . For this, we consider the following (standard) assumption:
Functions , , , , have compact supports in .
Under , for every and every , the stochastic equation (1) has a unique (up to stochastic indistinguishability) solution , whereas for every and , the stochastic equation (2) has a unique (up to stochastic indistinguishability) solution (see, e.g., [29, 30]), and the following technical result holds, which will be used in the proof of our convergence result, with proof given in the appendix (see Section 7.1).
2.2. Stochastic Optimal Control Problem
Given , we consider continuous mappings , , , and with . We require , , and , , to be at least with respect to the variable , and we require , to be convex. In particular, is Lipschitz when restricted to the compact and convex set . We focus on finite-horizon, finite-dimensional non-linear stochastic Optimal Control Problems (OCP) with control-affine dynamics and uncontrolled diffusion, of the form
where we optimize over deterministic controls . We adopt the (fairly mild) assumption:
Mappings , , and , , either are affine-in-state or have compact supports in and in , respectively.
Our choice of optimizing over deterministic controls as opposed to stochastic controls is motivated by practical considerations. Specifically, in several applications of interest ranging from aerospace to robotics, it is often advantageous to compute and implement simpler deterministic controls at higher control rates, to be able to quickly react to external disturbances, unmodeled dynamical effects, and changes in the cost function (e.g., moving obstacles that a robot should avoid in real-time). Moreover, in cases where a feedback controller is accounted for, a common and efficient approach entails decomposing the stochastic control into a state-dependent feedback term, plus a nominal deterministic control trajectory to be optimized for, which is equivalent to adopting deterministic controls (see, e.g., [31, 32]). Nevertheless, for the sake of completeness and generality, in Section 5 we introduce appropriate conditions under which our method may extend to stochastic controls; we also analyze possible extensions to the case of free-final-time optimal control problems.
Many applications of interest often involve state constraints. In this case, to make sure the procedure developed in this work still applies, every such constraint needs to be considered in expectation and penalized within the cost of OCP (for example by including those contributions in or through some penalization function). In future work, we plan to extend our method to more general settings, whereby for instance state constraints are enforced through more accurate chance constraints (see Section 7 for a thorough discussion).
3. Stochastic Sequential Convex Programming
We propose the following framework to solve OCP, based on the classical SCP methodology. Starting from some initial guesses of control and trajectory , , we inductively define a sequence of stochastic linear-convex problems whose dynamics and costs stem from successive linearizations of the mappings , , and , and we successively solve those problems while updating user-defined parameters. The convergence of the method generally depends on a good choice of the initial guess and on how the trust-region constraints are updated at each iteration. These constraints are added to make the successive linearizations of OCP well-posed. Below, we detail this procedure.
3.1. The Method
At iteration , by denoting
(3)
we define the following stochastic Linearized Optimal Control Problem (LOCP)
where we optimize over deterministic controls . The tuple is defined inductively and denotes a solution to (LOCP).
Each problem LOCP consists of linearizing OCP around the solution at the previous iteration , starting from . To prevent the iterates from being misled by large linearization errors, we must restrict the search for optimal solutions of LOCP to neighborhoods of . This is achieved by the final constraints listed in LOCP, referred to as trust-region constraints, where the constant is the trust-region radius. No such constraint is enforced on controls, since those appear linearly in , , and . To ensure well-posedness of the sequence (LOCP)_{k∈ℕ}, we require that each LOCP have a solution, for which we consider the following assumption:
For every , problem LOCP is feasible.
[Existence of solutions of LOCP] Under , LOCP has a solution for every .
Proof.
The proof is based on the argument developed to prove [20, Theorem 5.2, Chapter 2]. Specifically, let and be a minimizing sequence for LOCP.
The sequence is uniformly bounded in , and thus we may assume the existence of such that converges, up to some subsequence, to for the weak topology of . Moreover, the compactness and convexity of yield . Finally, by Mazur’s theorem there exists a sequence of convex combinations
such that converges to for the strong topology of (although, since controls are deterministic and the diffusion is uncontrolled, weak convergence of controls would suffice in our setting).
Thanks to Lemma 2.1, the sequence of trajectories converges to the trajectory for the strong topology of . At this step, for every , thanks to the linearity of (2) we obtain that
and therefore the linearity of the function yields
whereas the convexity of the norm yields
In turn, passing to the limit for , we infer that the tuple is admissible for LOCP. In addition, the convexity of the function allows us to compute
from which we conclude that is a solution of LOCP. ∎
Although one finds empirically that is often satisfied in practice (see for instance the results in Section 6), it is generally difficult to derive sufficient conditions for this assumption to hold a priori. In particular, to the best of our knowledge, only stochastic dynamics with constant coefficients and with specific regularity conditions on the stochastic diffusion yield controllability [33]. In deterministic settings, some works obtain feasibility of each convex subproblem, though only locally around the unknown solution of the original formulation, by assuming second-order regularity conditions which cannot be checked a priori [34, 35]. Thus, even SCP-based schemes in simpler deterministic settings must often either explicitly assume the feasibility of each convex subproblem, or modify the linearized dynamics by introducing additional slack controls to force feasibility [36]. Motivated by these remarks, we leave the investigation of sufficient conditions for the validity of as a direction of future research; it is beyond the scope of this work, which focuses on the properties of accumulation points of the sequence generated by SCP. Note that is satisfied when , i.e., when no final constraints are imposed: a problem that remains generally relevant, though computationally difficult to solve.
Under assumptions –, the method consists of iteratively solving the aforementioned linearized problems through the update of the sequence of trust-region radii, producing a sequence of tuples such that for each , solves LOCP. The user may often steer this procedure to convergence (with respect to appropriate topologies) by adequately selecting an initial guess and an update rule for the trust-region radii (when state-constraint penalization is adopted, SCP procedures may also consider update rules for the penalization weights, which must be provided together with the update rules for the trust-region radii; further details can be found in [37, 38]); appropriate choices will be described in Section 6. Assuming that an accumulation point for can be found (whose existence will be discussed shortly), our objective consists of proving that this is a candidate locally-optimal solution to the original formulation OCP. Specifically, we show that any accumulation point for satisfies the stochastic PMP related to OCP. To develop such an analysis, we require the absence of state constraints, and in particular of trust-region constraints.
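The following sketch illustrates the linearize–solve–update loop just described on a deterministic, discretized analogue (the diffusion term is ignored, and the dynamics, cost weights, trust-region schedule, and tolerance are hypothetical; the reference trajectory is re-integrated through the non-linear dynamics between iterations, a common variant that keeps every convex subproblem feasible). It is only meant to convey the structure of the method, not the stochastic subproblems LOCP themselves.

```python
import numpy as np
import cvxpy as cp

# Hypothetical discretized non-linear dynamics x_{t+1} = f(x_t, u_t) and its Jacobians
# (a pendulum-like toy model; none of this data comes from the paper).
dt, N, n, m = 0.1, 40, 2, 1
f    = lambda x, u: x + dt * np.array([x[1], -np.sin(x[0]) + u[0]])
dfdx = lambda x, u: np.eye(n) + dt * np.array([[0.0, 1.0], [-np.cos(x[0]), 0.0]])
dfdu = lambda x, u: dt * np.array([[0.0], [1.0]])

def rollout(u_traj, x_init):
    """Integrate the non-linear dynamics under a given control sequence."""
    xs = [np.array(x_init)]
    for u in u_traj:
        xs.append(f(xs[-1], u))
    return np.array(xs)

x0, xg = np.array([0.0, 0.0]), np.array([np.pi, 0.0])
u_ref = np.zeros((N, m))                       # initial control guess
x_ref = rollout(u_ref, x0)                     # dynamically consistent initial trajectory
radius = 2.0                                   # trust-region radius

for k in range(30):
    X, U = cp.Variable((N + 1, n)), cp.Variable((N, m))
    cons = [X[0] == x0, cp.abs(U) <= 2.0]      # compact, convex control set
    for t in range(N):
        A, B = dfdx(x_ref[t], u_ref[t]), dfdu(x_ref[t], u_ref[t])
        c = f(x_ref[t], u_ref[t]) - A @ x_ref[t] - B @ u_ref[t]
        cons += [X[t + 1] == A @ X[t] + B @ U[t] + c]         # linearized dynamics
        cons += [cp.norm(X[t + 1] - x_ref[t + 1]) <= radius]  # trust-region constraint
    cost = cp.sum_squares(U) + 100.0 * cp.sum_squares(X[N] - xg)  # effort + soft goal penalty
    cp.Problem(cp.Minimize(cost), cons).solve()
    u_new = U.value
    step = np.max(np.abs(u_new - u_ref))
    u_ref, x_ref = u_new, rollout(u_new, x0)   # re-linearize around the new iterate
    radius *= 0.9                              # shrinking sequence of trust-region radii
    if step < 1e-4:                            # successive iterates agree: stop
        break
print("iterations:", k + 1, "goal error:", np.linalg.norm(x_ref[-1] - xg))
```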
3.2. Stochastic Pontryagin Maximum Principle
In this section, we recall the statement of the PMP which provides classical first-order necessary conditions for optimality, upon which we will establish our main result. For the sake of clarity, we introduce the PMP related to OCP and the PMP related to each convexified problem LOCP separately.
3.2.1. PMP related to OCP
For every , , and define the Hamiltonian (same notation as in (1))
[Stochastic Pontryagin Maximum Principle for OCP [20]] Let be a locally-optimal solution to OCP. There exist , a tuple , where and are constant, and such that the following relations are satisfied:
(1) Non-Triviality Condition: .
(2) Adjoint Equation:
(3) Maximality Condition:
The quantity uniquely determines , , and and is called extremal for OCP (associated with the tuple , or simply with ). An extremal is called normal if .
Although final conditions, rather than initial conditions, are specified for , it turns out that processes satisfying backward stochastic differential equations are adapted with respect to the filtration (see, e.g., [30, 20]), which makes the adjoint equation well-posed. Although conditions for optimality for stochastic optimal control problems are usually developed when considering stochastic controls only, the proof of Theorem 3.2.1 (and of Theorem 3.2.2 below) readily follows from classical arguments (e.g., see [20, Chapter 3.6]). Nevertheless, to prove our main result, we rely on a proof of Theorem 3.2.1 (and of Theorem 3.2.2 below) which stems from implicit-function-theorem-type results (see Sections 4.1 and 4.2). Thus, in Section 4.1, we provide a new proof of Theorem 3.2.1 (more precisely, of Theorem 3.2.2 below) which follows from the original idea developed by Pontryagin and his group (see, e.g., [17, 39, 40]), though the latter result should not be understood as part of our main contribution.
3.2.2. PMP related to LOCP
For every , , , and , define the Hamiltonian (same notation as in (2))
[Weak Stochastic Pontryagin Maximum Principle for LOCP [20]] Let be a locally-optimal solution to LOCP. There exists , a tuple , where , , and are constant, and such that the following relations are satisfied:
(1) Non-Triviality Condition: .
(2) Adjoint Equation:
(3) Maximality Condition:
The quantity uniquely determines , , and and is called extremal for LOCP (associated with , or simply with ). An extremal is called normal if .
By introducing the new variable
(4)
the trust-region constraint in LOCP can be rewritten as . Thus, by leveraging this transformation, LOCP may be reformulated as a standard stochastic optimal control problem with final inequality constraints. Nevertheless, although the conditions listed in Theorem 3.2.1 are essentially sharp, the statement of Theorem 3.2.2 may be strengthened as follows. If is an extremal for LOCP, one can additionally prove that (e.g., see [41]; note that in [41] the multipliers have opposite signs because a different convention is adopted) and
(the latter is known as the slackness condition), which motivates the name “weak stochastic PMP for LOCP” for Theorem 3.2.2. Nevertheless, since the trust-region constraints do not appear in the original problem, we do not need to leverage these additional conditions on to prove our claims, i.e., Theorem 3.2.2 suffices to establish the aforementioned properties of accumulation points for SCP when applied to solve OCP.
3.3. Main Results
Our contribution is twofold. First, under very mild assumptions, we prove that any accumulation point of the sequence of iterates generated by SCP is a candidate locally-optimal solution for OCP. Specifically, we prove that any accumulation point of the sequence generated by SCP, where is an extremal for LOCP associated with in the sense of Theorem 3.2.2, is an extremal for OCP in the sense of Theorem 3.2.1 (see Theorem 3.3.1). Although the optimization community is aware that establishing the convergence of the sequence generated by SCP is generally difficult (see, e.g., [42, 43]), one can often prove optimality-related properties of its accumulation points, a natural property which justifies the use of SCP (see, e.g., [43]). Second, by strengthening our original assumptions, we prove the existence of at least one accumulation point of the aforementioned sequence generated by SCP (see Theorem 3.3.2). Below, we organize these claims into two more precise statements.
3.3.1. Properties of Accumulation Points for SCP
[Properties of Accumulation Points for SCP] Assume that – hold and that SCP generates a sequence such that converges to zero, and for every , the tuple locally solves LOCP. For every , letting be an extremal associated with for LOCP (whose existence is ensured by Theorem 3.2.2), assume the following Accumulation Condition holds:
(AC) Up to some subsequence, converges to some for the weak topology of .
If , then is an extremal for OCP associated with .
The guarantees offered by Theorem 3.3.1 read as follows. Under – and by selecting a shrinking-to-zero sequence of trust-region radii, if iteratively solving problems LOCP returns a sequence of strategies whose extremals satisfy (AC) with a non-trivial multiplier , then SCP finds a candidate (local) solution to OCP. Theorem 3.3.1 extends classical results on the well-posedness of SCP (see, e.g., [43]) from deterministic to stochastic settings. In particular, the requirement in Theorem 3.3.1 is natural and has an equivalent in deterministic settings, playing the role of some sort of qualification condition (see, e.g., [43, Theorem 3.4]). Note that the requirement can be easily numerically checked (see our discussion after Theorem 3.3.2).
3.3.2. Existence of Accumulation Points for SCP
Assumptions – together with some minor requirements are sufficient to establish that any accumulation point for the sequence of iterates satisfies the stochastic PMP related to OCP (Theorem 3.3.1). We can additionally infer the existence of accumulation points, i.e., (AC) in Theorem 3.3.1 holds true, if some more structure on the data defining OCP is assumed. Importantly, the validity of (AC) endows stochastic SCP with the additional guarantee that at least one accumulation point (with respect to weak topologies) exists, which in turn provides a “weak” guarantee of success for the method via the result of Theorem 3.3.1 (“weak” because convergence is satisfied up to some subsequence). We introduce the following technical condition:
The mapping is given by , where each is symmetric positive definite and the mapping is continuous.
The use we make of is essentially contained in the following result.
Under , for every , every normal extremal for LOCP, i.e., for which , is such that the corresponding control is time-continuous.
Proof.
Fix , and let be a normal extremal for LOCP. Due to , the convexity of and the maximality condition in Theorem 3.2.2 yield
where we denote with , whereas denotes the projection over the convex set . The claim readily follows once we prove the mappings are continuous, given that and are continuous. We only prove the continuity of , given that the continuity of can be proved by leveraging similar arguments, and and . For this, thanks to , , Lemma 2.1, Theorem 3.2.2, and Hölder and Burkholder–Davis–Gundy inequalities, for every , we obtain that
for some appropriate constant , and the conclusion follows. ∎
Through Lemma 3.3.2, becomes crucial to ensure the validity of (AC) in Theorem 3.3.1. In particular, the works [44, 45, 46, 47], which analyze continuity properties of extremals with respect to appropriate deformations of some deterministic optimal control problems and which inspired our work, show that the time-continuity of optimal controls for each LOCP is a requirement which is not easy to relax (in particular, see the counterexample in [47, Section 2.3]), especially in the presence of trust-region constraints. Importantly, motivated by regularity results in deterministic optimal control settings (see, e.g., [48, Theorem 3.2]), we expect that more generic mappings might yield Lemma 3.3.2, especially when optimizing over deterministic controls. We nevertheless leave the investigation of more general sufficient conditions for the time-continuity of optimal controls for each LOCP as a future research direction, as it is beyond the scope of this work, which again focuses on the properties of accumulation points of the sequence generated by SCP.
[Existence of Accumulation Points for SCP] Assume that – hold and that SCP generates a sequence such that converges to zero, and for every , the tuple locally solves LOCP. For every , let be an extremal associated with for LOCP (whose existence is ensured by Theorem 3.2.2). If for every , then (AC) in Theorem 3.3.1 holds true.
In addition, if denotes the extremal for OCP associated with some , which is provided by Theorem 3.3.1, the following convergence holds, up to some subsequence, for every when :
(5)
The main takeaway of Theorem 3.3.2 is that an accumulation point for stochastic SCP always exists, which in turn is a candidate (local) solution to OCP due to Theorem 3.3.1, as soon as appropriate qualification-type conditions are satisfied. In particular, similarly to the condition in Theorem 3.3.1, the requirement in Theorem 3.3.2 plays the role of an additional qualification condition, but at the level of each subproblem LOCP. Note that the latter is a generic property in deterministic optimal control (see, e.g., [49]). Finally, the requirement in Theorem 3.3.1 from which (local) optimality of the strategy found by SCP stems can be numerically checked thanks to (5), as soon as the multipliers are accessible through SCP iterations.
3.3.3. Insights to Speed Up the Convergence of SCP
In Theorem 3.3.2, there are also insightful statements concerning the convergence of Pontryagin extremals. Let us outline how those statements may be leveraged to speed up convergence.
For this, adopt the notation of Theorem 3.2.1 and assume that we are in the situation where applying the maximality condition of the PMP to problem OCP leads to smooth candidate optimal controls, as functions of the variables , , and (be aware that this might not be straightforward to obtain). We are then in a position to define two-point boundary value problems to solve OCP, also known as shooting methods, for which the decision variables become , , and . In particular, the core of the method consists of iteratively choosing and making the adjoint equation evolve until some given final condition is met (see, e.g., [50, 51] for a more detailed explanation of shooting methods). In the context of deterministic optimal control, when convergence is achieved, shooting methods terminate quite fast (at least quadratically). However, here the bottlenecks are: 1) to deal with the presence of the variable and 2) to find a good guess for the initial value of to make the whole procedure converge. In the setting of Theorem 3.3.2, a valid option to design well-posed shooting methods is as follows. With the notation and assumptions of Theorem 3.3.2, up to some subsequence it holds that (with respect to appropriate topologies) as . Therefore, assuming we have access to Pontryagin extremals along iterations and given some large enough iteration , we can fix and initialize with a shooting method for OCP that operates on the finite-dimensional variable . If successful, this strategy would speed up the convergence of the entire numerical scheme, though we leave its investigation as a future research direction.
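For illustration only, the following sketch shows the single-shooting mechanics on a deterministic toy problem (minimize the integral of u²/2 subject to dx/dt = −x + u with fixed endpoints), where the unknown is the initial costate p(0) and the maximality condition gives u = p; in the stochastic setting one would additionally have to handle the martingale part of the adjoint equation, and the SCP-derived costate would supply the initial guess p0_guess below. All names and values are hypothetical.

```python
import numpy as np
from scipy.optimize import root
from scipy.integrate import solve_ivp

# Toy deterministic problem: minimize 0.5 * int_0^T u(t)^2 dt subject to
# dx/dt = -x + u, x(0) = x_init, x(T) = x_goal.  The PMP gives u = p and
# dp/dt = p, so single shooting searches for the initial costate p(0).
x_init, x_goal, T = 1.0, 2.0, 1.0

def flow(p0):
    def ode(t, y):            # y = (x, p); u*(t) = p(t) from the maximality condition
        x, p = y
        return [-x + p, p]
    return solve_ivp(ode, (0.0, T), [x_init, p0], rtol=1e-10, atol=1e-10).y[:, -1]

def residual(p0):
    return [flow(p0[0])[0] - x_goal]    # terminal condition x(T) - x_goal = 0

p0_guess = 0.5                          # e.g., a costate estimate read off the SCP iterates
sol = root(residual, x0=[p0_guess])
print("p(0) =", sol.x[0], "terminal error =", residual(sol.x)[0])
```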
4. Proof of the Main Results
Since the statement of Theorem 3.3.1 is in particular contained in Theorem 3.3.2, it is sufficient to prove Theorem 3.3.2 only. We split this proof into three main steps. First, we retrace the proof of the stochastic PMP to introduce necessary notation and expressions. In addition, we leverage this step to provide novel insight on how to prove the stochastic PMP by following the lines of the original work of Pontryagin and his group (see, e.g., [17, 39, 40]), a proof that we could not find in the stochastic literature. Second, we show the convergence of trajectories and controls, together with the convergence of variational inequalities (see Section 5.2.2 for a definition). The latter represents the cornerstone of the proof and paves the way for the final step, which consists of proving the convergence of the Pontryagin extremals. For the sake of clarity and brevity and without loss of generality, we carry out the proof in the case of scalar Brownian motion, i.e., we assume . Moreover, for any with , we adopt the notation .
4.1. Main Steps of the Proof of the Stochastic Maximum Principle
For the sake of clarity and brevity, we retrace the proof of Theorem 3.2.1 only. The proof of Theorem 3.2.2 follows from a straightforward modification of the steps we provide below, by introducing the additional final constraint via (4). In particular, we highlight those modifications below and in Section 4.2.2.
4.1.1. Linear Stochastic Differential Equations
Define the stochastic matrices and . For any time and any bounded initial condition , the following problem
(6)
is well-posed [20]. Its unique solution is the –adapted with right-continuous sample paths process , where the matrix-valued –adapted with continuous sample paths processes and satisfy
(7)
respectively. In particular, a straightforward application of the Itô formula shows that , and therefore , for every .
4.1.2. Needle-like Variations and End-point Mapping
One way to prove the PMP comes from the analysis of specific variations called needle-like variations on a mapping called the end-point mapping. Those concepts are introduced below in the context of optimization over deterministic controls (see Section 5 for the generalization of this argument to stochastic controls).
Given an integer , fix times which are Lebesgue points for , and fix random variables such that . For fixed scalars , , and , the needle-like variation of the control is defined to be the admissible control if and otherwise. Denote by the solution related to an admissible control of the augmented system
(8)
and define the mapping . For every fixed time , by denoting , the end-point mapping at time is defined to be the function
(9)
where is the open ball in of radius . Due to Lemma 2.1, it is not difficult to see that is Lipschitz (see also the argument developed to prove Lemma 4.1.2 below). In addition, this mapping may be Gateaux differentiated at zero along admissible directions of the cone . For this, denote , and let be the unique solution to (6) with .
For the proof of Theorem 3.2.2, the only change compared to the proof of Theorem 3.2.1 which is required up to this point consists of replacing the function which defines the end-point mapping (9) by the function
Note that this change is consistent since we will effectively make use of the mapping at the time only.
[Stochastic needle-like variation formula] Let . For any , it holds that
The proof of this result is technical (it requires an intense use of stochastic inequalities) but not difficult. We provide an extensive proof of Lemma 4.1.2 in the appendix in a more general context (see also Section 5).
4.1.3. Variational Inequalities
The main step in the proof of the PMP goes by contradiction, leveraging Lemma 4.1.2. To this end, for every , define the linear mapping
which due to Lemma 4.1.2, satisfies
for every . Finally, consider the closed, convex cone of given by
If , it would hold , and by applying [40, Lemma 12.1], one would find that the origin is an interior point of . This would imply that cannot be optimal for OCP, which gives a contradiction.
The argument above (together with an application of the separating hyperplane theorem) provides the existence of a non-zero vector denoted such that the following variational inequality holds
(10)
4.1.4. Conclusion of the Proof of the Stochastic Maximum Principle
The conditions of the PMP are derived by working out the variational inequality (10) and finding expressions of some appropriate conditional expectations. The main details are developed below in the context of optimization over deterministic controls (see Section 5 for the generalization to stochastic controls).
First, by appropriately developing solutions to (6), (10) can be rewritten as
for every Lebesgue point for and every . Second, again from the structure of (6), it can be readily checked that by denoting
(18)
the quantity is constant in (in addition, its negativity can be shown through a standard reformulation of problem OCP, as done in [40, Section 12.4]). Notice that the stochastic process is by definition –adapted. The quantities so far introduced allow one to reformulate the inequality above as
for every Lebesgue point for and , from which we infer the maximality condition of the PMP.
It remains to show the existence of the process , the continuity of the sample paths of the process , and the validity of the adjoint equation. For this, remark that, due to Jensen inequality and Lemma 2.1, the martingale is bounded in . Hence, the martingale representation theorem provides the existence of a process such that , where is a constant matrix. The definition in (18) immediately gives that the sample paths of the process are continuous. Next, an application of Itô formula (component-wise) readily shows that the product satisfies, for ,
Denoting , the computations above readily give the adjoint equation of the PMP. Those computations may also be leveraged to show and (see, e.g., [20, Section 7.2]).
4.2. Proof of the Convergence Result
Here we enter the core of the proof of Theorem 3.3.1. The convergence of trajectories and controls is addressed first. We devote the last two sections to the convergence of variational inequalities and Pontryagin extremals. For the sake of clarity and brevity, we only consider fixed-final-time problems. From now on, we implicitly assume –.
4.2.1. Convergence of Controls and Trajectories
Due to –, there exists a sequence of tuples such that for every , the tuple solves LOCP. In what follows, we implicitly adopt the reformulation of each convexified problem LOCP, which consists of adding to the final constraints through the new variable in (4). If denotes an admissible control for OCP that fulfills the conditions of Theorem 3.3.1, we denote by the –adapted with continuous sample paths process solution to the augmented system (8) related to OCP with control . The following holds.
[Convergence of trajectories] Assume that the sequence converges to zero. If the sequence converges to for the weak topology of , then , as , for every .
Proof.
For every , we have (below, denotes an appropriate constant, possibly changing from line to line)
(19)
Now, we take expectations. For the last term, we compute
due to Hölder and Burkholder–Davis–Gundy inequalities, and the last inequality comes from Lemma 2.1. Similar computations can be carried out for the first and third terms of (19).
To handle the second term of (19), we proceed as follows. We see that since implies , –almost surely, for every fixed and , the convergence of the sequence of controls for the weak topology of entails that , –almost surely as . In addition, gives for every and –almost surely. Hence, [52, Lemma 3.4] and the dominated convergence theorem finally provide that
It is easy to conclude the proof by applying a routine Grönwall inequality argument. ∎
The sought-after convergence of trajectories is a consequence of Lemma 4.2.1 when the conditions of Theorem 3.3.1 are met. In addition, limiting points for the sequences of controls fulfilling the conditions of Theorem 3.3.1 always exist up to some subsequence. Indeed, in this case, the set of admissible controls is closed and convex for the strong topology of . Hence it is closed for the weak topology of . Therefore, since is bounded in , there exists such that, up to some subsequence, weakly converges to for the weak topology of . It remains to show that the process is feasible for OCP. For this, we compute
due to Lemma 4.2.1 and the dominated convergence theorem (here is a constant coming from , and we use the continuity of the sample paths of ).
4.2.2. Convergence of Variational Inequalities
We start with a crucial result on linear stochastic differential equations. Recall the notation of Section 4.1.
[Convergence of variational inequalities] Fix and consider sequences of times and of uniformly bounded variables such that for every , . Assume that with for and with bounded. Denote , the stochastic process solutions, respectively, to
Under the assumptions of Lemma 4.2.1, it holds that for .
Proof.
From , , for we have (below, is a constant)
(20)
We consider the expectation of the fourth term on the right-hand side (a similar argument can be developed for the first three terms, which are omitted in the interest of clarity and brevity). The Burkholder–Davis–Gundy and Hölder inequalities give
where the last inequality holds due to Lemma 4.2.1 and because Lemma 2.1 may be readily extended to .
Consider which converges to for the weak convergence of . By assuming , we may denote by the full Lebesgue-measure subset such that if and only if is a Lebesgue point for and (we use the notation introduced with ). We prove the existence of a non-zero vector whose last component is non-positive such that for every , ,
(21)
where the –adapted with continuous sample paths stochastic process solves (6) with
where we denote – we will use this notation from now on.
For this, due to (17), for every , the optimality of for LOCP provides a non-zero vector whose second to last component is non-zero by assumption, so that for and , it holds that
where with the notation
the –adapted with continuous sample paths stochastic process solves a higher dimensional version of (6) with
where again we denote .
Now, fix and . The following comes from combining Lemma 3.3.2 with [47, Lemma 3.11].
[Pointwise convergence on controls] Under , there exists such that for every , is a Lebesgue point for , and , as . If denotes the sequence given by Lemma 4.2.2, we define and . Straightforward computations give (below, is a constant)
(22)
from which for (Lemmas 4.2.1 and 4.2.2). Therefore, from Lemma 4.2.2 we infer that
which, together with , readily yields
At this step, we point out that the variational inequalities in (10) and in (17) still hold if we take multipliers of norm one. Specifically, we may assume that for every . Therefore, up to some subsequence, there exists a vector such that for and satisfying , . We use this remark to conclude as follows. The definition of and the Hölder inequality give ( is a constant)
and in this case, (21) follows from Lemma 4.2.1 and the convergences obtained above. From what we showed in Section 4.1.4, this latter inequality yields that is extremal for OCP as soon as .
Finally, we turn to the case for which the sequence converges to for the strong topology of , but without assuming . For this, fix and define the stochastic processes and , where . Similar computations to the ones developed to compute the bound (22) provide that as , and therefore up to some subsequence, the quantity converges to zero for a.e. in . By taking countable intersections of sets of Lebesgue points (one for each control , for all ), it follows that the argument above can be iterated exactly in the same manner (via Lemma 4.2.2), leading to the same conclusion.
4.2.3. Convergence of Multipliers and Conclusion
By applying the construction developed in Section 4.1.4 to the variational inequality (21), we retrieve a tuple , where , are constant, and , such that, as soon as , the tuple is extremal for OCP associated with satisfying conditions 1., 2., and 3. of Theorem 3.2.1. To conclude, it only remains to prove that
(23)
Here, each tuple is the extremal of LOCP associated with the tuple , where solves the adjoint equation of Theorem 3.2.2. Since, under the assumption , the multipliers , , and do not play any role, for the sake of clarity and brevity of notation, and without loss of generality, in what follows we implicitly assume , for every .
Let us start with the first convergence of (23). For this, fix and consider the process
where , solve (7) with matrices , from which, due to , for every , we remove the last column and row, whereas , solve (7) with matrices , , those matrices being defined above. Due to a straightforward extension of Lemma 2.1 to equations (7), this process is a martingale, bounded in . Hence, the martingale representation theorem allows us to infer that this process is a martingale with continuous sample paths, and Doob and Jensen inequalities give
(24)
which holds for every . By combining (24) with (18), we compute
where is a constant. Up to some subsequence, the first term on the right-hand side converges to zero. Moreover, the definition of and Hölder inequality give
and a straightforward extension of Lemma 4.2.2 to equations (7) entails that all the terms on the right-hand side tend to zero. The convergence of is proved.
5. Extension to Problems with Free Final Time and Stochastic Controls
The accumulation properties of SCP can be extended to OCPs with free final time and to optimization over stochastic controls. Specifically, in this section we investigate how results from Theorem 3.3.1 may be extended to finite-horizon, finite-dimensional non-linear stochastic General Optimal Control Problems (GOCP) having control-affine dynamics and uncontrolled diffusion, of the form
where the final time may be free or not (here, is some fixed maximal time), and we optimize over controls which are either deterministic, i.e., , or stochastic, i.e., . Solutions to GOCP will be denoted by , for .
By adopting the notation given in (3), the stochastic Linearized General Optimal Control Problem (LGOCP) at iteration may be defined accordingly as
whose solutions are denoted by , for . In particular, trust-region constraints are now imposed on the variable as well to derive convergence guarantees.
We point out that the results we present in this section are not to be considered as part of the main contribution, but they rather aim at providing insights for the future design of efficient numerical methods for stochastic optimal control problems which consider free final time and stochastic admissible controls. In particular, extending SCP in the presence of free final time and stochastic controls requires introducing additional assumptions which might seem demanding. We leave the investigation of the validity of this framework under sharper assumptions as a future direction of research.
5.1. Refined assumptions and extended result of convergence
The presence of the free final time hinders the well-posedness of the stochastic PMP when applied to each LGOCP, i.e., Theorem 3.2.1, and in turn the validity of Theorem 3.3.1 in this more general setting. To overcome this issue, we need to tighten assumptions –. Specifically, although remains unchanged, assumptions – are replaced by the following, respectively:
Mappings , , and , , either are affine-in-state or have compact supports in and in , respectively. In the case of free final time, is affine.
For every , LGOCP is feasible. In addition, in the case of free final time, for every , any optimal control for LGOCP has continuous sample paths at the optimal final time for LGOCP.
Assumption plays a crucial role in providing the existence of pointwise necessary conditions for optimality for the linearized problems LGOCP, thus enabling classical formulations of the stochastic PMP. However, we do recognize that might be demanding and, as a future research direction, we propose to relax it by leveraging integral-type necessary conditions for optimality, which are best suited to deal with optimal control problems that show an explicit discontinuous dependence on time (e.g., [53]). Unfortunately, when optimization over stochastic controls is adopted, successfully leveraging weakly converging subsequences of controls to show a “weak” guarantee of success for SCP, i.e., the equivalent of Theorem 3.3.2, becomes more challenging; in that case only Theorem 3.3.1 still holds, though under appropriate modifications. We leave proving success guarantees for SCP when optimizing over stochastic controls under weaker assumptions as a future direction of research.
Extending the stochastic PMP to GOCP and LGOCP comes by assuming , , and . In particular, under those assumptions we have the following extension of Theorem 3.2.1 and of Theorem 3.2.2:
[Stochastic Pontryagin Maximum Principle for GOCP] Let be a locally-optimal solution to GOCP. There exist and a tuple , where and are constant, and such that the following relations are satisfied:
(1) Non-Triviality Condition: .
(2) Adjoint Equation:
(3) Maximality Condition:
(4) Transversality Condition: if the final time is free
where equalities hold in the case .
The quantity uniquely determines , , and and is called extremal for GOCP (associated with the tuple , or simply with ). An extremal is called normal if .
[Weak Stochastic Pontryagin Maximum Principle for LGOCP] Let be a locally-optimal solution to LGOCP. There exist , a tuple , where , , and are constant, and such that the following relations are satisfied:
(1) Non-Triviality Condition: .
(2) Adjoint Equation:
(3) Maximality Condition:
(4) Transversality Condition: if the final time is free
where equalities hold in the case .
The quantity uniquely determines , , and and is called extremal for LGOCP (associated with , or ). An extremal is called normal if .
Extending the stochastic PMP to the new linearized problems LGOCP, and in particular the new transversality condition 4., additionally requires assuming together with and . Although the proof of this result for fixed-final-time problems is well-established (see [20, Chapter 3]), we could not find any published proof of Theorem 5.1 when the final time is free. Therefore, we provide its proof in Section 5.2. The proof of Theorem 5.1 is achieved similarly; thus, for the sake of clarity and brevity, we avoid reporting any detail concerning the latter. Thanks to Theorem 5.1, we can extend the convergence of SCP as follows:
[Generalized Properties of Accumulation Points for SCP] Assume that , , and hold and that SCP generates a sequence such that converges to zero, and for every , the tuple locally solves LGOCP. For every , letting be an extremal associated with for LGOCP (whose existence is ensured by Theorem 5.1), assume the following Accumulation Condition holds:
(AC) Up to some subsequence, converges to some for the strong topology of .
If , then is an extremal for GOCP associated with .
The guarantees offered by Theorem 5.1 read similarly to what we have explained in Section 3.3. One sees that the computations provided in Section 4.2 for the proof of Theorem 3.3.1 straightforwardly generalize as soon as weak convergence of controls in is replaced with strong convergence of controls in (actually, the proofs become even simpler), and may be adopted to prove Theorem 5.1, provided that we are able to extend the proofs of Theorem 3.2.1 and Theorem 3.2.2 to the case of free final time and stochastic controls. Therefore, to conclude, in the next section we develop the necessary technical details which enable proving Theorem 5.1 and Theorem 5.1 through the machinery developed in Section 4.2. Specifically, since the proofs of Theorem 5.1 and Theorem 5.1 are similar, for the sake of clarity and brevity we only provide details for the proof of Theorem 5.1.
5.2. Proof of the extension for the stochastic Pontryagin Maximum Principle
Before getting started, we need to introduce the notion of a Lebesgue point for a stochastic control . For this, we adopt the theory of Bochner integrals, showing that , where the latter is the space of Bochner integrable mappings . Let . First of all, by definition we see that for almost every , it holds that , and thus this control is well-defined as a mapping . Since is second-countable, is separable, and therefore the claim follows from the Pettis measurability theorem once we prove that is strongly measurable with respect to the Lebesgue measure of . For this, it is sufficient to show that for every and , the mapping is Lebesgue measurable. By fixing , this can be achieved by proving that the family is a monotone class and then using standard monotone class arguments (the details are left to the reader). At this step, the Lebesgue differentiation theorem provides that for almost every , the following relations hold:
Such a time is called Lebesgue point for the control .
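In generic notation (u the stochastic control and ‖·‖ the Euclidean norm; these symbols are ours and need not match the paper's), the Lebesgue-point property referred to above takes the standard form
\[
\lim_{h \to 0^{+}} \frac{1}{2h} \int_{t-h}^{t+h} \mathbb{E}\big[\, \| u(s) - u(t) \| \,\big] \, ds \;=\; 0 .
\]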
5.2.1. Modified Needle-like Variations and End-point Mapping
We may extend the concept of needle-like variations and of end-point mapping previously introduced in Section 4.1.2 to the setting of free final time and of stochastic controls as follows.
Given an integer , fix times which are Lebesgue points for , and fix random variables such that . For fixed scalars , , and , the needle-like variation of the control is defined to be the admissible control if and otherwise. Denote by the solution related to an admissible control of the augmented system
and define the mapping . For every fixed time , by denoting , the end-point mapping at time is defined to be
where is the open ball in of radius . Variations on the variable are necessary only if free-final-time problems are considered, in which case , and in particular the fact that is an affine function, plays a crucial role for computations. Due to Lemma 2.1 (and to in the case of free-final-time problems), it is not difficult to see that is Lipschitz (see also the argument developed to prove Lemma 5.2.1 below). In addition, this mapping may be Gateaux differentiated at zero along admissible directions of the cone . For this, denote , and let be the unique solution to (6) with .
[Generalized stochastic needle-like variation formula] Let and assume holds (in particular, is an affine function when ). If is a Lebesgue point for , then it holds that
As for Lemma 4.1.2, we provide an extensive proof of Lemma 5.2.1 in the appendix.
5.2.2. Variational Inequalities and Conclusion
Similarly to what we argued in Section 4.1.3, the main step in the proof of the stochastic PMP with free final time and stochastic controls goes by contradiction, leveraging Lemma 5.2.1. For this, from now on we assume that, when free, the final time is a Lebesgue point for the optimal control. Otherwise, one may proceed by mimicking the argument developed in [39, Section 7.3].
Assume that and hold. For every , define the linear mapping
which due to Lemma 5.2.1, satisfies
for every . Finally, consider the closed, convex cone of given by
If , it would hold , and by [40, Lemma 12.1], one would find that the origin is an interior point of . This would imply that is not optimal for GOCP, a contradiction.
The argument above (together with an application of the separating hyperplane theorem) provides the existence of a non-zero vector denoted such that the following variational inequalities hold
(25)
In the case of deterministic controls, the random variables in (25) are replaced by deterministic vectors . Moreover, when , only negative variations on the final time are allowed. Hence in this case, the first equality of (25) actually becomes a greater-or-equal-to-zero inequality (see also [39, Chapter 7]).
6. An Example Numerical Scheme
Although the procedure detailed previously provides methodological steps to tackle OCP through successive linearizations, numerically solving LCPs that depend on stochastic coefficients remains a challenge. In this last section, under appropriate assumptions we propose an approximate, though very practical, numerical scheme to effectively solve each subproblem LOCP. We stress that our main goal is not the development of an ultimate algorithm, but rather to demonstrate how one may leverage the theoretical insights provided by Theorem 3.3.1 to design efficient strategies to practically solve OCP.
6.1. A Simplified Context
The proposed approach relies on a specific shape of the cost and the dynamics of OCP. Specifically, we consider OCPs with deterministic admissible controls (over a fixed time horizon ) only and whose cost functions are such that . Moreover, we assume the state variable is given by two components for , satisfying the following system of forward stochastic differential equations ( and are accordingly defined as in (1))
(26)
In particular, any -adapted process solution to (26) with continuous sample paths for a given control is such that is deterministic. For the sake of clarity and to avoid cumbersome notation, from now on we assume and that does not explicitly depend on . This is clearly done without loss of generality.
6.2. The Proposed Approach
With the assumptions adopted previously, we see that the diffusion in the dynamics of OCP is now forced to be deterministic. This fact is at the root of our method, which mimics the procedure proposed in [9]. Specifically, we transcribe every stochastic subproblem LOCP into a deterministic and convex optimal control problem, whereby the variables are the mean and the covariance of the solution to LOCP. The main advantage in doing so is that deterministic reformulations of the subproblems LOCP can be efficiently solved via off-the-shelf convex solvers. Unlike [9], the design of this numerical scheme relies on upstream information entailed by Theorem 3.3.1, as follows.
By recalling the notation introduced in the previous sections, we denote , , and for , and . Heuristically, assuming that solutions to LOCP have small variance, i.e., , one can compute the linearization of , , , and at rather than at . In doing so, for the cost of LOCP we obtain the following approximation (the notation goes accordingly as in (3))
(27)
whereas for the dynamics of LOCP we obtain (the notation follows (2))
(28)
where , , , , and are deterministic and with affine in . Accordingly, by introducing and , slightly tighter trust-region constraints are
(29)
At this point, as all coefficients are deterministic, solutions to (28) are Gaussian processes whose dynamics take the form (see, e.g., [54])
(30)
The system above is not linear because of . Nevertheless, we may invoke the convergence results of Theorem 3.3.1, and in particular the convergence of the sequence to some deterministic curve , to replace (30) with
(31)
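To illustrate the passage from (30) to (31), consider the generic mean-covariance propagation associated with dynamics of the form (28), written here with illustrative coefficients (the exact coefficients are those of the paper):
\[
\dot{\mu}(t) = A(t)\,\mu(t) + B(t)\,u(t) + c(t),
\qquad
\dot{\Sigma}(t) = A(t)\,\Sigma(t) + \Sigma(t)\,A(t)^{\top} + \sigma\big(t,\mu(t)\big)\,\sigma\big(t,\mu(t)\big)^{\top}.
\]
In this generic form, the dependence of the diffusion term on the mean (or on the state more generally) makes the covariance equation non-linear. Freezing that argument along the deterministic curve provided by Theorem 3.3.1 yields a system that is affine in the pair mean-covariance, which is the spirit of the replacement (31).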
In conclusion, we may heuristically replace every LOCP with the deterministic convex optimal control problem whose dynamics are (31), whose trust-region constraints are (29), whose variables are and (and additionally and ), and whose cost is obtained by replacing (27) with
to force solutions to have small variances, which in turn justifies the whole approach.
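The deterministic convex structure of these subproblems can be illustrated with a minimal sketch in Python using CVXPY, an off-the-shelf convex modeling tool; this is not the solver setup used in the experiments below (which rely on IPOPT), and all matrices, dimensions, and weights are hypothetical placeholders chosen only to make the example self-contained. The mean and covariance obey Euler-discretized affine dynamics, the control is confined to a trust region around the previous iterate, and the terminal covariance is penalized as described above.

```python
import numpy as np
import cvxpy as cp

# Hypothetical problem data (placeholders, not the paper's values).
N, dt = 30, 0.1            # discretization nodes and time step
n, m = 2, 1                # state and control dimensions
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
c = np.zeros(n)
sigma = 0.05 * np.eye(n)   # deterministic diffusion (frozen along the reference curve)
u_ref = np.zeros((m, N))   # control from the previous SCP iterate
delta = 0.5                # trust-region radius
lam = 10.0                 # weight on the terminal covariance penalty

mu = cp.Variable((n, N + 1))                                   # mean trajectory
Sig = [cp.Variable((n, n), symmetric=True) for _ in range(N + 1)]  # covariances
u = cp.Variable((m, N))                                        # open-loop control

constraints = [mu[:, 0] == np.zeros(n), Sig[0] == np.zeros((n, n))]
for k in range(N):
    # Forward Euler on the mean/covariance dynamics; both are affine in (mu, Sig, u).
    constraints += [
        mu[:, k + 1] == mu[:, k] + dt * (A @ mu[:, k] + B @ u[:, k] + c),
        Sig[k + 1] == Sig[k] + dt * (A @ Sig[k] + Sig[k] @ A.T + sigma @ sigma.T),
        cp.norm(u[:, k] - u_ref[:, k], "inf") <= delta,        # trust region
    ]
constraints += [mu[:, N] == np.ones(n)]                        # reach the goal in expectation

cost = dt * cp.sum_squares(u) + lam * cp.trace(Sig[N])         # effort + variance penalty
prob = cp.Problem(cp.Minimize(cost), constraints)
prob.solve()
print(prob.status, prob.value)
```

Because the covariance dynamics become affine once the diffusion is frozen along the reference curve, the resulting program is convex and can be handled by any generic conic or quadratic programming solver.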
6.3. Uncertain Car Trajectory Planning Problem
We now provide numerical experiments. We consider the problem of planning the trajectory of a car whose state and control inputs are and and whose dynamics are
where and quantify the effect of slip. The evolution of is deterministic: given actuator commands, the change in velocity is known exactly, but uncertainty in the positional variables persists. This model is motivated by driving applications in which the vehicle may drift during tight turns and suffer from significant positional uncertainty, but limits its acceleration to avoid wheel slip. By defining , this problem setting matches the one presented in the previous section. We consider minimizing the control effort . The initial state is fixed, and the final state constraint consists of reaching the goal in expectation such that . Further, we consider cylindrical obstacles of radius centered at , for which we define the potential function as if and otherwise, where and is an additional clearance. We penalize violations of the obstacle-avoidance constraint directly within the cost, defining , with (note that in this setting, following Section 6.1).
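As one illustrative instance consistent with the description above (not necessarily the exact model used in the experiments), a kinematic car with state $(p_x, p_y, \theta, v)$ and controls $(\omega, a)$, with slip-induced noise entering only the positional channels and with diffusion coefficients depending on the deterministic component, could read:
\[
\begin{aligned}
\mathrm{d}p_x &= v\cos\theta\,\mathrm{d}t + \sigma_1(v)\,\mathrm{d}W^{1}_t, \qquad &
\mathrm{d}\theta &= \omega\,\mathrm{d}t, \\
\mathrm{d}p_y &= v\sin\theta\,\mathrm{d}t + \sigma_2(v)\,\mathrm{d}W^{2}_t, \qquad &
\mathrm{d}v &= a\,\mathrm{d}t,
\end{aligned}
\]
so that the velocity (and heading) evolve deterministically while positional uncertainty accumulates, matching the structure required in Section 6.1.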
6.4. Results
We set , , and , and we set up four obstacles. We discretize (31) using a forward Euler scheme with discretization nodes, set , set at each SCP iteration, and use IPOPT to solve each convexified problem. We check convergence of SCP by verifying that . Although our method is initialized with an infeasible straight-line trajectory, SCP converges in iterations to a trajectory avoiding the obstacles in expectation. Indeed, after evaluating sample paths of the system, only of the trajectories intersect obstacles. We also verify that at every SCP iteration, and at convergence, so that . Thus, Theorems 3.3.1 and 3.3.2 apply and the solution generated by SCP is an extremal for OCP. We visualize our results in Figure 1 and release our implementation at https://github.com/StanfordASL/stochasticSCP.
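A Monte Carlo check of the kind reported above can be sketched with a short Euler-Maruyama sampler. The example below uses a toy single-integrator model with hypothetical obstacle data and a placeholder open-loop control; the actual dynamics, control, and obstacle layout are those of the released implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data (placeholders): horizon, step, obstacles, and an open-loop control.
T, N = 3.0, 60
dt = T / N
obstacles = [(np.array([1.0, 0.6]), 0.3)]      # (center, radius)
u_opt = np.tile(np.array([0.4, 0.2]), (N, 1))  # optimized open-loop control (placeholder)

def drift(x, u):
    # Toy single-integrator drift; the real model would be the car dynamics above.
    return u

def diffusion(x):
    # Deterministic, state-independent diffusion (consistent with Section 6.1).
    return 0.05 * np.eye(2)

def hits_obstacle(path):
    return any(np.linalg.norm(p - c) <= r for p in path for (c, r) in obstacles)

n_samples, n_hit = 1000, 0
for _ in range(n_samples):
    x = np.zeros(2)
    path = [x.copy()]
    for k in range(N):
        dw = rng.normal(scale=np.sqrt(dt), size=2)
        x = x + drift(x, u_opt[k]) * dt + diffusion(x) @ dw   # Euler-Maruyama step
        path.append(x.copy())
    n_hit += hits_obstacle(path)

print(f"fraction of sample paths intersecting an obstacle: {n_hit / n_samples:.3f}")
```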
7. Conclusion and Perspectives
In this paper, we introduced sequential convex programming for non-linear stochastic optimal control and analyzed its convergence properties, from which we derived a practical numerical framework for solving non-linear stochastic optimal control problems.
Future work may extend this analysis to more general problem formulations, e.g., risk measures as costs and state (chance) constraints. In this context, preliminary SCP-based results exist only for discrete-time problem formulations [32]; tackling continuous-time formulations will require more sophisticated necessary conditions for optimality. In addition, we plan to investigate further the setting of free final time and stochastic admissible controls, devising sharper assumptions and improving the convergence results. Finally, we plan to further leverage our theoretical insights to design new and more efficient numerical schemes for non-linear stochastic optimal control.
References
- [1] J.E. Potter. A matrix equation arising in statistical filter theory. Rep. RE-9, Experimental Astronomy Laboratory, Massachusetts Institute of Technology, 1965.
- [2] J.-M. Bismut. Linear quadratic optimal stochastic control with random coefficients. SIAM Journal on Control and Optimization, 14(3):419–444, 1976.
- [3] S. Peng. Stochastic Hamilton–Jacobi–Bellman equations. SIAM Journal on Control and Optimization, 30(2):284–304, 1992.
- [4] S. Tang. General linear quadratic optimal stochastic control problems with random coefficients: linear stochastic Hamilton systems and backward stochastic Riccati equations. SIAM Journal on Control and Optimization, 42(1):53–75, 2003.
- [5] P. Kleindorfer and K. Glover. Linear convex stochastic optimal control with applications in production planning. IEEE Transactions on Automatic Control, 18(1):56–59, 1973.
- [6] R. T. Rockafellar and R. J. B. Wets. Generalized linear-quadratic problems of deterministic and stochastic optimal control in discrete time. SIAM Journal on Control and Optimization, 28(4):810–822, 1990.
- [7] D. Kuhn, W. Wiesemann, and A. Georghiou. Primal and dual linear decision rules in stochastic and robust optimization. Mathematical Programming, 130(1):177–209, 2011.
- [8] C. Bes and S. Sethi. Solution of a class of stochastic linear-convex control problems using deterministic equivalents. Journal of Optimization Theory and Applications, 62(1):17–27, 1989.
- [9] B. Berret and F. Jean. Efficient computation of optimal open-loop controls for stochastic systems. Automatica, 115(108874), 2020.
- [10] M. A. Rami and X. Y. Zhou. Linear matrix inequalities, Riccati equations, and indefinite stochastic linear quadratic controls. IEEE Transactions on Automatic Control, 45(6):1131–1143, 2000.
- [11] D. D. Yao, S. Zhang, and X. Y. Zhou. Stochastic linear-quadratic control via semidefinite programming. SIAM Journal on Control and Optimization, 40(3):801–823, 2001.
- [12] D. Bertsimas and D. B. Brown. Constrained stochastic LQC: a tractable approach. IEEE Transactions on Automatic Control, 52(10):1826–1841, 2007.
- [13] T. Damm, H. Mena, and T. Stillfjord. Numerical solution of the finite horizon stochastic linear quadratic control problem. Numerical Linear Algebra with Applications, 24(4), 2017.
- [14] T. Levajković, H. Mena, and L.-M. Pfurtscheller. Solving stochastic LQR problems by polynomial chaos. IEEE Control Systems Letters, 2(4):641–646, 2018.
- [15] R. Bellman. Dynamic Programming. Princeton Univ. Press, Princeton, New Jersey, 1957.
- [16] P.-L. Lions. Optimal control of diffusion processes and Hamilton-Jacobi-Bellman equations, Part I. Comm. Partial Differential Equations, 8:1101–1174, 1983.
- [17] L.S. Pontryagin. Mathematical theory of optimal processes. Routledge, 2018.
- [18] H. J. Kushner and F. C. Schweppe. A maximum principle for stochastic control systems. Journal of Mathematical Analysis and Applications, 8(2):287–302, 1964.
- [19] S. Peng. A general stochastic maximum principle for optimal control problems. SIAM Journal on Control and Optimization, 28(4):966–979, 1990.
- [20] J. Yong and X. Y. Zhou. Stochastic controls: Hamiltonian systems and HJB equations, volume 43. Springer Science & Business Media, 1999.
- [21] E. Trélat. Optimal control and applications to aerospace: some results and challenges. Journal of Optimization Theory and Applications, 154(3):713–758, 2012.
- [22] R. Bonalli, B. Hérissé, and E. Trélat. Analytical Initialization of a Continuation-Based Indirect Method for Optimal Control of Endo-Atmospheric Launch Vehicle Systems. In IFAC World Congress, 2017.
- [23] R. Bonalli, B. Hérissé, and E. Trélat. Optimal control of endo-atmospheric launch vehicle systems: Geometric and computational issues. IEEE Transactions on Automatic Control, 65(6):2418–2433, 2019.
- [24] A. Shapiro and A. Nemirovski. On complexity of stochastic programming problems. In Continuous optimization, pages 111–146. Springer, 2005.
- [25] E. Gobet. Monte-Carlo methods and stochastic processes: from linear to non-linear. CRC Press, 2016.
- [26] H. J. Kushner. Numerical methods for stochastic control problems in continuous time. SIAM Journal on Control and Optimization, 28(5):999–1048, 1990.
- [27] H. J. Kushner and L. F. Martins. Numerical methods for stochastic singular control problems. SIAM Journal on Control and Optimization, 29(6):1443–1475, 1991.
- [28] M. Annunziato and A. Borzì. A Fokker–Planck control framework for multidimensional stochastic processes. Journal of Computational and Applied Mathematics, 237(1):487–507, 2013.
- [29] J.-F. Le Gall. Brownian motion, martingales, and stochastic calculus, volume 274. Springer, 2016.
- [30] R. Carmona. Lectures on BSDEs, stochastic control, and stochastic differential games with financial applications, volume 1. SIAM, 2016.
- [31] O. Kazuhide, M. Goldshtein, and P. Tsiotras. Optimal covariance control for stochastic systems under chance constraints. IEEE Control Systems Letters, 2(2):266–271, 2018.
- [32] T. Lew, R. Bonalli, and M. Pavone. Chance-constrained sequential convex programming for robust trajectory optimization. In European Control Conference, 2020.
- [33] Y. Wang, D. Yang, J. Yong, and Z. Yu. Exact controllability of linear stochastic differential equations and related problems. American Institute of Mathematical Sciences, 7(2):305–345, 2017.
- [34] Q. T. Dinh and M. Diehl. Local Convergence of Sequential Convex Programming for Nonconvex Optimization. In Recent Advances in Optimization and its Applications in Engineering, pages 93–102. Springer, 2010.
- [35] M. Diehl and F. Messerer. Local Convergence of Generalized Gauss-Newton and Sequential Convex Programming. In Conference on Decision and Control, 2019.
- [36] Y. Mao, D. Dueri, M. Szmuk, and B. Açikmeşe. Successive Convexification of Non-Convex Optimal Control Problems with State Constraints. IFAC-PapersOnLine, 50(1):4063–4069, 2017.
- [37] F. Palacios-Gomez, L. Lasdon, and M. Engquist. Nonlinear optimization by successive linear programming. Management Science, 28(10):1106–1120, 1982.
- [38] P. T. Boggs and J. W. Tolle. Sequential quadratic programming. Acta numerica, 4(1):1–51, 1995.
- [39] R. Gamkrelidze. Principles of optimal control theory, volume 7. Springer Science & Business Media, 2013.
- [40] A.A. Agrachev and Y. Sachkov. Control theory from the geometric viewpoint, volume 87. Springer Science & Business Media, 2013.
- [41] H. Frankowska, H. Zhang, and X. Zhang. Stochastic Optimal Control Problems with Control and Initial-Final States Constraints. SIAM Journal on Control and Optimization, 56:1823–1855, 2018.
- [42] J. Nocedal and S. Wright. Numerical Optimization. Springer, 1999.
- [43] Z. Lu. Sequential convex programming methods for a class of structured nonlinear programming. Technical report, 2013.
- [44] T. Haberkorn and E. Trélat. Convergence results for smooth regularizations of hybrid nonlinear optimal control problems. SIAM Journal on Control and Optimization, 49:1498–1522, 2011.
- [45] R. Bonalli, B. Hérissé, and E. Trélat. Solving Optimal Control Problems for Delayed Control-Affine Systems with Quadratic Cost by Numerical Continuation. In American Control Conference, 2017.
- [46] R. Bonalli. Optimal control of aerospace systems with control-state constraints and delays. PhD thesis, Sorbonne Université, 2018.
- [47] R. Bonalli, B. Hérissé, and E. Trélat. Continuity of pontryagin extremals with respect to delays in nonlinear optimal control. SIAM Journal on Control and Optimization, 57(2):1440–1466, 2019.
- [48] I. A. Shvartsman and R. B. Vinter. Regularity properties of optimal controls for problems with time-varying state and control constraints. Nonlinear Analysis: Theory, Methods & Applications, 65:448–474, 2006.
- [49] Y. Chitour, F. Jean, and E. Trélat. Singular trajectories of control-affine systems. SIAM Journal on Control and Optimization, 47:1078–1095, 2008.
- [50] A.E. Bryson. Applied optimal control: optimization, estimation and control. CRC Press, 1975.
- [51] J. T. Betts. Survey of numerical methods for trajectory optimization. Journal of Guidance, Control, and Dynamics, 21(2):193–207, 1998.
- [52] E. Trélat. Some properties of the value function and its level sets for affine control systems with quadratic cost. Journal of Dynamical and Control Systems, 6(4):511–541, 2000.
- [53] L. Bourdin and E. Trélat. Pontryagin maximum principle for finite dimensional nonlinear optimal control problem on time scales. SIAM Journal on Control and Optimization, 51(5):3781–3813, 2013.
- [54] P. S. Maybeck. Stochastic models, estimation, and control. Academic press, 1982.
Appendix
7.1. Proof of Lemma 2.1
Proof of Lemma 2.1.
For clarity of notation, we only consider the case , the case of a multivariate Brownian motion being similar.
Let us start with the first inequality. For this, by denoting , for every we compute (below, is an appropriate constant)
For the last term, by denoting , the Burkholder–Davis–Gundy, Hölder, and Young inequalities give
Similar computations apply to the other terms and when considering solutions to (1). Therefore, we conclude from a routine Grönwall inequality argument.
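Schematically, the estimate of the stochastic-integral term follows the standard chain below, written with generic constants and illustrative notation (the precise integrand is the one appearing in the computation above); for $p \ge 2$,
\[
\mathbb{E}\Big[\sup_{s\le t}\Big|\int_0^s \sigma_r\,\mathrm{d}W_r\Big|^{p}\Big]
\;\le\; C_p\,\mathbb{E}\Big[\Big(\int_0^t |\sigma_r|^{2}\,\mathrm{d}r\Big)^{p/2}\Big]
\;\le\; C_p\, t^{\frac{p}{2}-1}\int_0^t \mathbb{E}\big[|\sigma_r|^{p}\big]\,\mathrm{d}r,
\]
where the first step is the Burkholder–Davis–Gundy inequality and the second the Hölder inequality in time; a Lipschitz-type bound on the integrand then reduces the right-hand side to the quantity under estimation, and Grönwall's inequality concludes.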
Let us prove the second inequality of the lemma. For , we compute
For the second term on the right-hand side, for , the Hölder inequality gives
and similar computations hold for the remaining terms and when considering solutions to (1). Again, we conclude by a Grönwall inequality argument. ∎
7.2. Proof of Lemma 4.1.2 and of Lemma 5.2.1
The proof of Lemma 4.1.2 immediately follows from the following preliminary result.

Lemma 7.2 (Stochastic needle-like variation formula - no free final time). Let (in particular, no or any assumption on are required). For , uniformly for ,
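For reference, the needle-like (Pontryagin-type) variation underlying this lemma takes the standard form, written here with illustrative notation, where $\tau$ is a Lebesgue point of the reference control $u$ and $v$ an admissible control value:
\[
u_{\varepsilon}(t) \;=\;
\begin{cases}
v, & t \in (\tau - \varepsilon,\, \tau],\\[2pt]
u(t), & \text{otherwise},
\end{cases}
\]
and the lemma quantifies, uniformly in the relevant parameters, the first-order effect of this perturbation on the corresponding trajectory as $\varepsilon \to 0^{+}$.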
Proof.
We only consider the case ; the most general case follows by a classical induction argument (see, e.g., [40]). We only need to prove that
(32)
uniformly for every . For this, first we remark that for . Since and are fixed and we take the limit , we may assume that . Therefore, without loss of generality, we replace with , assuming uniformly. We have (below, denotes a constant)
Let us analyze those integrals separately. Starting with the last one, we have
Lemma 2.1 immediately gives that the first term on the right-hand side is . In addition, from the Hölder inequality, it follows that
Since is a Lebesgue point for , the two terms above go to zero as . Next, by the Burkholder–Davis–Gundy inequality and a Taylor expansion, we have
from Lemma 2.1. Similar estimates hold for the remaining terms in the first inequality of the proof. Summarizing, there exists a constant such that
and the conclusion follows from a routine Grönwall inequality argument. ∎
Proof of Lemma 5.2.1.
We may assume and . Hence, is an affine function, and in the rest of this proof we denote , where and .
Developing, we have
From Lemma 7.2, the second term on the right-hand side is . For the last summand, the properties of the stochastic integral give
where the last term is . It is worth pointing out that being affine is essential for this last claim. Finally, for the remaining term, we apply the Itô formula to each coordinate , obtaining
and therefore
As is a Lebesgue point for , this last quantity goes to zero as , and we conclude. ∎