Sequential Convex Programming for
Non-Linear Stochastic Optimal Control
Abstract.
This work introduces a sequential convex programming framework for non-linear, finite-dimensional stochastic optimal control, where uncertainties are modeled by a multidimensional Wiener process. We prove that any accumulation point of the sequence of iterates generated by sequential convex programming is a candidate locally-optimal solution for the original problem in the sense of the stochastic Pontryagin Maximum Principle. Moreover, we provide sufficient conditions for the existence of at least one such accumulation point. We then leverage these properties to design a practical numerical method for solving non-linear stochastic optimal control problems based on a deterministic transcription of stochastic sequential convex programming.
Key words and phrases:
Nonlinear stochastic optimal control, sequential convex programming, convergence of Pontryagin extremals, numerical deterministic reformulation.
1991 Mathematics Subject Classification: 49K40, 65C30, 93E20.
Résumé. This work introduces a sequential convex programming framework for the optimal control of non-linear, finite-dimensional stochastic systems. We prove that every accumulation point of the sequence of iterates generated by sequential convex programming is a candidate locally-optimal solution of the original problem, in the sense of the stochastic Pontryagin Maximum Principle. Moreover, we develop sufficient conditions for the existence of at least one such accumulation point. We leverage these properties to design a practical numerical method that solves non-linear stochastic control problems through a deterministic reformulation of sequential convex programming.
1. Introduction
Over the past few decades, the applied control community has devoted increasing attention to the optimal control of stochastic systems. The general formulation of this problem entails steering a dynamical system from an initial configuration to a final configuration while optimizing some prescribed performance criterion (e.g., minimizing some given cost) and satisfying constraints. This dynamical system is also subject to uncertainties which are modeled by Wiener processes and may come from unmodeled and/or unpredicted behaviors, as well as from measurement errors. Active development of new theoretical and numerical methods to address this problem continues, and the subject already has a rich literature.
We can classify the existing works into two main categories.
The first category consists of contributions that focus on Linear Convex Problems (LCPs), i.e., whose dynamics are linear and whose costs are convex in both state and control variables. An important class of LCPs is given by Linear Quadratic Problems (LQPs) in which costs are quadratic in both state and control variables and for which the analysis of optimal solutions may be reduced to the study of an algebraic relation known as the Stochastic Riccati Equation (SRE) [1, 2, 3, 4]. Efficient algorithmic frameworks have been devised to numerically solve LCPs, ranging from local search [5] and dual representations [6, 7], to deterministic-equivalent reformulations [8, 9], among others. In the special case of LQPs, those techniques may be further improved by combining SRE theory with semidefinite programming [10, 11], finite-dimensional approximation [12, 13], or chaos expansion [14].
The second category of works deals with problems that do not enjoy any specific regularity, allowing non-linear dynamics or non-convex (therefore non-quadratic) costs. Throughout this paper, we call these Non-Linear Problems (NLPs). It is unquestionable that NLPs have so far received less attention from the community than LCPs, especially since the analysis of the former is usually more involved. Similar to the deterministic case, there are two main theoretical tools that have been developed to analyze NLPs: stochastic Dynamic Programming (DP) [15, 16] and the stochastic Pontryagin Maximum Principle (PMP) [17, 18, 19] (an extensive survey of generalizations of DP and the PMP may be found in [20]). In the case of LQPs, one can show that DP and the PMP lead to SRE [20]. DP provides optimal policies through the solution of a partial differential equation, whereas the necessary conditions for optimality offered by the PMP allow one to set up a two-point boundary value problem which returns candidate locally-optimal solutions when solved. Both methods only lead to analytical solutions in a few cases, and they can involve complex numerical challenges (the stochastic setting is even more problematic than the deterministic one, the latter being better understood for a wide range of problems, see, e.g., [21, 22, 23]). This has fostered the investigation of more tractable approaches to solve NLPs such as Monte Carlo simulation [24, 25], Markov chain discretization [26, 27], and deterministic (though non-equivalent) reformulations [28, 9], among others. Importantly, many of the aforementioned approaches, e.g., [26], are often based on some sort of approximation of the original formulation and thus offer powerful alternatives to DP and the PMP, especially since they are more numerically tractable and can be shown to converge to policies satisfying DP or the stochastic PMP.
In this paper, our objective is to lay the theoretical foundations to leverage Sequential Convex Programming (SCP) for the purposes of computing candidate optimal solutions for a specific class of NLPs. SCP is among the most well-known and earliest approximation techniques for deterministic non-linear optimal control and, to the best of our knowledge, such an approach has not been extended to stochastic settings yet. The simplest SCP scheme (which we consider in this work) consists of successively linearizing any non-linear terms in the dynamics and any non-convex functions in the cost and seeking a solution to the original formulation by solving a sequence of LCPs. This approach leads to two desirable properties, which jointly are instrumental to the design of efficient numerical schemes. First, one can rely on the many efficient techniques and associated software libraries that have been devised to solve LCPs (or LQPs depending on the shape of the original NLP). Second, as we will show in this paper, when this iterative process converges, it returns a strategy that satisfies the PMP related to the original NLP, i.e., a candidate optimum for the original formulation. Unlike existing methods such as [26], which introduce approximated formulations that are still non-linear, our approach offers the main advantage of requiring the solution to LCPs only. Specifically, we identify three key contributions:
(1) We introduce and analyze a new framework to compute candidate optimal solutions for finite-horizon, finite-dimensional non-linear stochastic optimal control problems with control-affine dynamics and uncontrolled diffusion. This hinges on the basic principle of SCP, i.e., iteratively solving a sequence of LCPs that stem from successive linear approximations of the original problem.
(2) Through a meticulous study of the continuity of the stochastic Pontryagin cones of variations with respect to linearization, we prove that any accumulation point of the sequence of iterates generated by SCP is a strategy satisfying the PMP related to the original formulation. In addition, by leveraging additional assumptions, we prove that at least one such accumulation point exists, which in turn provides a “weak” guarantee of success for the method.
(3) Through an explicit example, we show how to leverage the properties offered by this framework to better understand what approximations may be adopted for the design of efficient numerical schemes for NLPs, although we leave the theoretical investigation of the approximation error as a future direction.
The paper is organized as follows. Section 2 introduces notation and preliminary results and defines the stochastic optimal control problem of interest. In Section 3, we introduce the framework of stochastic SCP and the stochastic PMP, and we state our main result of convergence. For the sake of clarity, in Section 4 we retrace the proof of the stochastic PMP and introduce all the necessary technicalities we need to prove our main result of convergence, though we recall that the stochastic PMP is a well-established result and Section 4 should not be understood as part of our main contribution. In Section 5, we discuss extensions to problems with free final time and stochastic admissible controls. In Section 6, we show how to leverage our analysis to design a practical numerical method to solve non-linear stochastic optimal control problems, and we provide numerical experiments. Finally, Section 7 provides concluding remarks and perspectives on future directions.
2. Stochastic Optimal Control Setting
Let be a second–countable probability space and let be a –dimensional Brownian motion with continuous sample paths starting at zero and whose filtration is complete. We consider processes that are defined within bounded time intervals. Hence, for every , and maximal time , we introduce the space of progressive processes (with respect to the filtration ) such that , where is the Euclidean norm. In this setting, for every and , the Itô integral of with respect to is the continuous, bounded in , and –dimensional martingale in (with respect to the filtration ) that starts at zero, denoted . For , we denote by the space of –adapted processes that have continuous sample paths and satisfy .
2.1. Stochastic Differential Equations
From now on, we fix two integers , a maximal time , and a compact, convex subset . Although in this work we consider differential equations steered by deterministic controls (see Section 2.2 below), for the sake of generality, we introduce stochastic dynamics which depend on either deterministic or stochastic controls. Specifically, we denote by the set of admissible controls and consider either deterministic controls or stochastic controls . Note that since is compact, admissible controls are almost everywhere, or a.e. for brevity (and additionally almost surely, or a.s. for brevity) bounded. We are given continuous mappings , and , which are at least with respect to the variable . For a given , we consider dynamical systems modeled through the following forward stochastic differential equation with uncontrolled diffusion
(1)
where we assume that the fixed initial condition satisfies , for every (for instance, this holds when is a deterministic vector of ).
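To make the class of dynamics in (1) concrete, the sketch below simulates a control-affine SDE with uncontrolled diffusion by the Euler–Maruyama scheme; the drift, control matrix, diffusion, dimensions, and the toy instantiation are illustrative placeholders and not the data used elsewhere in the paper.

```python
import numpy as np

def euler_maruyama(b, B, sigma, u, x0, T=1.0, N=200, n_w=1, seed=0):
    """Simulate dx = (b(t, x) + B(t, x) u(t)) dt + sigma(t, x) dW on [0, T]
    by the Euler--Maruyama scheme (uncontrolled diffusion, deterministic control)."""
    rng = np.random.default_rng(seed)
    dt = T / N
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    for i in range(N):
        t = i * dt
        dW = rng.normal(scale=np.sqrt(dt), size=n_w)      # Brownian increments
        x = x + (b(t, x) + B(t, x) @ u(t)) * dt + sigma(t, x) @ dW
        path.append(x.copy())
    return np.array(path)

# Toy instantiation (hypothetical data): 2-dimensional state, scalar control and noise.
b     = lambda t, x: np.array([x[1], -np.sin(x[0])])
B     = lambda t, x: np.array([[0.0], [1.0]])
sigma = lambda t, x: np.array([[0.0], [0.1]])
u     = lambda t: np.array([0.5 * np.cos(t)])             # an admissible control in U = [-1, 1]
path = euler_maruyama(b, B, sigma, u, x0=[0.0, 0.0])
print(path.shape)
```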
The procedure developed in this work is based on the following linearization of (1). For , let and . For a given , we define the linearization of (1) around to be the following well-defined forward stochastic differential equation with uncontrolled diffusion
(2)
We require the solutions to (1) and to (2) to be bounded in expectation uniformly with respect to , and . For this, we consider the following (standard) assumption:
Functions , , , , have compact supports in .
Under , for every and every , the stochastic equation (1) has a unique (up to stochastic indistinguishability) solution , whereas for every and , the stochastic equation (2) has a unique (up to stochastic indistinguishability) solution (see, e.g., [29, 30]), and the following technical result holds, which will be used in the proof of our convergence result, with proof given in the appendix (see Section 7.1).
2.2. Stochastic Optimal Control Problem
Given , we consider continuous mappings , , , and with . We require , , and , , to be at least with respect to the variable , and we require , to be convex. In particular, is Lipschitz when restricted to the compact and convex set . We focus on finite-horizon, finite-dimensional non-linear stochastic Optimal Control Problems (OCP) with control-affine dynamics and uncontrolled diffusion, of the form
where we optimize over deterministic controls . We adopt the (fairly mild) assumption:
Mappings , , and , , either are affine-in-state or have compact supports in and in , respectively.
Our choice of optimizing over deterministic controls as opposed to stochastic controls is motivated by practical considerations. Specifically, in several applications of interest ranging from aerospace to robotics, it is often advantageous to compute and implement simpler deterministic controls at higher control rates, to be able to quickly react to external disturbances, unmodeled dynamical effects, and changes in the cost function (e.g., moving obstacles that a robot should avoid in real-time). Moreover, in cases where a feedback controller is accounted for, a common and efficient approach entails decomposing the stochastic control into a state-dependent feedback term, plus a nominal deterministic control trajectory to be optimized for, which is equivalent to adopting deterministic controls (see, e.g., [31, 32]). Nevertheless, for the sake of completeness and generality, in Section 5 we introduce appropriate conditions under which our method may extend to stochastic controls; we also analyze possible extensions to the case of free-final-time optimal control problems.
Many applications of interest often involve state constraints. In this case, to make sure the procedure developed in this work still applies, every such constraint needs to be considered in expectation and penalized within the cost of OCP (for example by including those contributions in or through some penalization function). In future work, we plan to extend our method to more general settings, whereby for instance state constraints are enforced through more accurate chance constraints (see Section 7 for a thorough discussion).
3. Stochastic Sequential Convex Programming
We propose the following framework to solve OCP, based on the classical SCP methodology. Starting from some initial guesses of control and trajectory , , we inductively define a sequence of stochastic linear-convex problems whose dynamics and costs stem from successive linearizations of the mappings , , and , and we successively solve those problems while updating user-defined parameters. The convergence of the method generally depends on a good choice of the initial guess and on how the trust-region constraints are updated at each iteration. These constraints are added to make the successive linearizations of OCP well-posed. Below, we detail this procedure.
3.1. The Method
At iteration , by denoting
(3)
we define the following stochastic Linearized Optimal Control Problem (LOCP)
where we optimize over deterministic controls . The tuple is defined inductively and denotes a solution to (LOCP).
Each problem LOCP consists of linearizing OCP around the solution at the previous iteration , starting from . To prevent the iterates from being misled by large linearization errors, we must restrict the search for optimal solutions of LOCP to neighborhoods of . This is achieved by the final constraints listed in LOCP, referred to as trust-region constraints, where the constant is the trust-region radius. No such constraint is enforced on controls, since those appear linearly in , , and . To ensure well-posedness of the sequence (LOCP)_{k∈ℕ}, we require that each LOCP have a solution, for which we consider the following assumption:
For every , problem LOCP is feasible.
[Existence of solutions of LOCP] Under , LOCP has a solution for every .
Proof.
The proof is based on the argument developed to prove [20, Theorem 5.2, Chapter 2]. Specifically, let and be a minimizing sequence for LOCP.
The sequence is uniformly bounded in , and thus we may assume the existence of such that converges, up to some subsequence, to for the weak topology of . Moreover, the compactness and convexity of yield . Finally, by Mazur’s theorem there exists a sequence of convex combinations
such that converges to for the strong topology of (although, since controls are deterministic and the diffusion is uncontrolled, weak convergence of controls would suffice in our setting).
Thanks to Lemma 2.1, the sequence of trajectories converges to the trajectory for the strong topology of . At this step, for every , thanks to the linearity of (2) we obtain that
and therefore the linearity of the function yields
whereas the convexity of the norm yields
In turn, passing to the limit for , we infer that the tuple is admissible for LOCP. In addition, the convexity of the function allows us to compute
from which we conclude that is a solution of LOCP. ∎
Although one finds empirically that is often satisfied in practice (see for instance the results in Section 6), it is generally difficult to derive sufficient conditions for this assumption to hold a priori. In particular, to the best of our knowledge, only stochastic dynamics with constant coefficients and with specific regularity conditions on the stochastic diffusion yield controllability [33]. In deterministic settings, some works obtain feasibility of each convex subproblem, though only locally around the unknown solution of the original formulation, by assuming second-order regularity conditions which cannot be checked a priori [34, 35]. Thus, even SCP-based schemes in simpler deterministic settings must often either explicitly assume the feasibility of each convex subproblem, or modify the linearized dynamics by introducing additional slack controls to force feasibility [36]. Motivated by these remarks, we leave the investigation of sufficient conditions for the validity of as a direction of future research; it is beyond the scope of this work, which focuses on the properties of accumulation points of the sequence generated by SCP. Note that is satisfied when , i.e., when no final constraints are imposed: a problem that remains generally relevant, though computationally difficult to solve.
Under assumptions –, the method consists of iteratively solving the aforementioned linearized problems through the update of the sequence of trust-region radii, producing a sequence of tuples such that for each , solves LOCP. The user may often steer this procedure to convergence (with respect to appropriate topologies) by adequately selecting an initial guess and an update rule for the trust-region radii (when state-constraint penalization is adopted, SCP procedures may also consider update rules for the penalization weights, which must be provided together with the update rules for the trust-region radii; further details can be found in [37, 38]); appropriate choices will be described in Section 6. Assuming that an accumulation point for can be found (whose existence will be discussed shortly), our objective consists of proving that this is a candidate locally-optimal solution to the original formulation OCP. Specifically, we show that any accumulation point for satisfies the stochastic PMP related to OCP. To develop such an analysis, we require the absence of state constraints, and in particular of trust-region constraints.
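The following sketch illustrates the linearize–solve–update loop just described on a deterministic, discretized analogue (the diffusion term is ignored, and the dynamics, cost weights, trust-region schedule, and tolerance are hypothetical; the reference trajectory is re-integrated through the non-linear dynamics between iterations, a common variant that keeps every convex subproblem feasible). It is only meant to convey the structure of the method, not the stochastic subproblems LOCP themselves.

```python
import numpy as np
import cvxpy as cp

# Hypothetical discretized non-linear dynamics x_{t+1} = f(x_t, u_t) and its Jacobians
# (a pendulum-like toy model; none of this data comes from the paper).
dt, N, n, m = 0.1, 40, 2, 1
f    = lambda x, u: x + dt * np.array([x[1], -np.sin(x[0]) + u[0]])
dfdx = lambda x, u: np.eye(n) + dt * np.array([[0.0, 1.0], [-np.cos(x[0]), 0.0]])
dfdu = lambda x, u: dt * np.array([[0.0], [1.0]])

def rollout(u_traj, x_init):
    """Integrate the non-linear dynamics under a given control sequence."""
    xs = [np.array(x_init)]
    for u in u_traj:
        xs.append(f(xs[-1], u))
    return np.array(xs)

x0, xg = np.array([0.0, 0.0]), np.array([np.pi, 0.0])
u_ref = np.zeros((N, m))                       # initial control guess
x_ref = rollout(u_ref, x0)                     # dynamically consistent initial trajectory
radius = 2.0                                   # trust-region radius

for k in range(30):
    X, U = cp.Variable((N + 1, n)), cp.Variable((N, m))
    cons = [X[0] == x0, cp.abs(U) <= 2.0]      # compact, convex control set
    for t in range(N):
        A, B = dfdx(x_ref[t], u_ref[t]), dfdu(x_ref[t], u_ref[t])
        c = f(x_ref[t], u_ref[t]) - A @ x_ref[t] - B @ u_ref[t]
        cons += [X[t + 1] == A @ X[t] + B @ U[t] + c]         # linearized dynamics
        cons += [cp.norm(X[t + 1] - x_ref[t + 1]) <= radius]  # trust-region constraint
    cost = cp.sum_squares(U) + 100.0 * cp.sum_squares(X[N] - xg)  # effort + soft goal penalty
    cp.Problem(cp.Minimize(cost), cons).solve()
    u_new = U.value
    step = np.max(np.abs(u_new - u_ref))
    u_ref, x_ref = u_new, rollout(u_new, x0)   # re-linearize around the new iterate
    radius *= 0.9                              # shrinking sequence of trust-region radii
    if step < 1e-4:                            # successive iterates agree: stop
        break
print("iterations:", k + 1, "goal error:", np.linalg.norm(x_ref[-1] - xg))
```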
3.2. Stochastic Pontryagin Maximum Principle
In this section, we recall the statement of the PMP which provides classical first-order necessary conditions for optimality, upon which we will establish our main result. For the sake of clarity, we introduce the PMP related to OCP and the PMP related to each convexified problem LOCP separately.
3.2.1. PMP related to OCP
For every , , and define the Hamiltonian (same notation as in (1))
[Stochastic Pontryagin Maximum Principle for OCP [20]] Let be a locally-optimal solution to OCP. There exist , a tuple , where and are constant, and such that the following relations are satisfied:
(1) Non-Triviality Condition: .
(2) Adjoint Equation:
(3) Maximality Condition:
The quantity uniquely determines , , and and is called extremal for OCP (associated with the tuple , or simply with ). An extremal is called normal if .
Although final conditions, rather than initial conditions, are specified for , it turns out that processes satisfying backward stochastic differential equations are adapted with respect to the filtration (see, e.g., [30, 20]), which makes the adjoint equation well-posed. Although conditions for optimality for stochastic optimal control problems are usually developed when considering stochastic controls only, the proof of Theorem 3.2.1 (and of Theorem 3.2.2 below) readily follows from classical arguments (e.g., see [20, Chapter 3.6]). Nevertheless, to prove our main result, we rely on a proof of Theorem 3.2.1 (and of Theorem 3.2.2 below) which stems from implicit-function-theorem-type results (see Sections 4.1 and 4.2). Thus, in Section 4.1, we provide a new proof of Theorem 3.2.1 (more precisely, of Theorem 3.2.2 below) which follows from the original idea developed by Pontryagin and his group (see, e.g., [17, 39, 40]), though the latter result should not be understood as part of our main contribution.
3.2.2. PMP related to LOCP
For every , , , and , define the Hamiltonian (same notation as in (2))
[Weak Stochastic Pontryagin Maximum Principle for LOCP [20]] Let be a locally-optimal solution to LOCP. There exists , a tuple , where , , and are constant, and such that the following relations are satisfied:
(1) Non-Triviality Condition: .
(2) Adjoint Equation:
(3) Maximality Condition:
The quantity uniquely determines , , and and is called extremal for LOCP (associated with , or simply with ). An extremal is called normal if .
By introducing the new variable
(4)
the trust-region constraint in LOCP can be rewritten as . Thus, by leveraging this transformation, LOCP may be reformulated as a standard stochastic optimal control problem with final inequality constraints. Nevertheless, although the conditions listed in Theorem 3.2.1 are essentially sharp, the statement of Theorem 3.2.2 may be strengthened as follows. If is an extremal for LOCP, one can additionally prove that (e.g., see [41]; note that in [41] the multipliers have opposite signs because a different convention is adopted) and
(the latter is known as the slackness condition), which motivates the name “weak stochastic PMP for LOCP” for Theorem 3.2.2. Nevertheless, since the trust-region constraints do not appear in the original problem, we do not need to leverage these additional conditions on to prove our claims, i.e., Theorem 3.2.2 suffices to establish the aforementioned properties of accumulation points for SCP when applied to solve OCP.
3.3. Main Results
Our contribution is twofold. First, under very mild assumptions, we prove that any accumulation point of the sequence of iterates generated by SCP is a candidate locally-optimal solution for OCP. Specifically, we prove that any accumulation point of the sequence generated by SCP, where is an extremal for LOCP associated with in the sense of Theorem 3.2.2, is an extremal for OCP in the sense of Theorem 3.2.1 (see Theorem 3.3.1). Although the optimization community is aware that establishing the convergence of the sequence generated by SCP is generally difficult (see, e.g., [42, 43]), one can often prove optimality-related properties of its accumulation points, a natural property which justifies the use of SCP (see, e.g., [43]). Second, by strengthening our original assumptions, we prove the existence of at least one accumulation point of the aforementioned sequence generated by SCP (see Theorem 3.3.2). Below, we organize these claims into two more precise statements.
3.3.1. Properties of Accumulation Points for SCP
[Properties of Accumulation Points for SCP] Assume that – hold and that SCP generates a sequence such that converges to zero, and for every , the tuple locally solves LOCP. For every , letting be an extremal associated with for LOCP (whose existence is ensured by Theorem 3.2.2), assume the following Accumulation Condition holds:
(AC) Up to some subsequence, converges to some for the weak topology of .
If , then is an extremal for OCP associated with .
The guarantees offered by Theorem 3.3.1 read as follows. Under – and by selecting a shrinking-to-zero sequence of trust-region radii, if iteratively solving problems LOCP returns a sequence of strategies whose extremals satisfy (AC) with a non-trivial multiplier , then SCP finds a candidate (local) solution to OCP. Theorem 3.3.1 extends classical results on the well-posedness of SCP (see, e.g., [43]) from deterministic to stochastic settings. In particular, the requirement in Theorem 3.3.1 is natural and has an equivalent in deterministic settings, playing the role of some sort of qualification condition (see, e.g., [43, Theorem 3.4]). Note that the requirement can be easily numerically checked (see our discussion after Theorem 3.3.2).
3.3.2. Existence of Accumulation Points for SCP
Assumptions – together with some minor requirements are sufficient to establish that any accumulation point for the sequence of iterates satisfies the stochastic PMP related to OCP (Theorem 3.3.1). We can additionally infer the existence of accumulation points, i.e., (AC) in Theorem 3.3.1 holds true, if some more structure on the data defining OCP is assumed. Importantly, the validity of (AC) endows stochastic SCP with the additional guarantee that at least one accumulation point (with respect to weak topologies) exists, which in turn provides a “weak” guarantee of success for the method via the result of Theorem 3.3.1 (“weak” because convergence is satisfied up to some subsequence). We introduce the following technical condition:
The mapping is given by , where each is symmetric positive definite and the mapping is continuous.
The use we make of is essentially contained in the following result.
Under , for every , every normal extremal for LOCP, i.e., for which , is such that the corresponding control is time-continuous.
Proof.
Fix , and let be a normal extremal for LOCP. Due to , the convexity of and the maximality condition in Theorem 3.2.2 yield
where we denote with , whereas denotes the projection over the convex set . The claim readily follows once we prove the mappings are continuous, given that and are continuous. We only prove the continuity of , given that the continuity of can be proved by leveraging similar arguments, and and . For this, thanks to , , Lemma 2.1, Theorem 3.2.2, and Hölder and Burkholder–Davis–Gundy inequalities, for every , we obtain that
for some appropriate constant , and the conclusion follows. ∎
Through Lemma 3.3.2, becomes crucial to ensure the validity of (AC) in Theorem 3.3.1. In particular, the works [44, 45, 46, 47], which analyze continuity properties of extremals with respect to appropriate deformations of some deterministic optimal control problems and which inspired our work, show that the time-continuity of optimal controls for each LOCP is a requirement which is not easy to relax (in particular, see the counterexample in [47, Section 2.3]), especially in the presence of trust-region constraints. Importantly, motivated by regularity results in deterministic optimal control settings (see, e.g., [48, Theorem 3.2]), we expect that more generic mappings might yield Lemma 3.3.2, especially when optimizing over deterministic controls. We nevertheless leave the investigation of more general sufficient conditions for the time-continuity of optimal controls for each LOCP as a future research direction, as it is beyond the scope of this work, which again focuses on the properties of accumulation points of the sequence generated by SCP.
[Existence of Accumulation Points for SCP] Assume that – hold and that SCP generates a sequence such that converges to zero, and for every , the tuple locally solves LOCP. For every , let be an extremal associated with for LOCP (whose existence is ensured by Theorem 3.2.2). If for every , then (AC) in Theorem 3.3.1 holds true.
In addition, if denotes the extremal for OCP associated with some , which is provided by Theorem 3.3.1, the following convergence holds, up to some subsequence, for every when :
(5)
The main takeaway of Theorem 3.3.2 is that an accumulation point for stochastic SCP always exists, which in turn is a candidate (local) solution to OCP due to Theorem 3.3.1, as soon as appropriate qualification-type conditions are satisfied. In particular, similarly to the condition in Theorem 3.3.1, the requirement in Theorem 3.3.2 plays the role of an additional qualification condition, but at the level of each subproblem LOCP. Note that the latter is a generic property in deterministic optimal control (see, e.g., [49]). Finally, the requirement in Theorem 3.3.1 from which (local) optimality of the strategy found by SCP stems can be numerically checked thanks to (5), as soon as the multipliers are accessible through SCP iterations.
3.3.3. Insights to Speed Up the Convergence of SCP
In Theorem 3.3.2, there are also insightful statements concerning the convergence of Pontryagin extremals. Let us outline how those statements may be leveraged to speed up convergence.
For this, adopt the notation of Theorem 3.2.1 and assume that we are in the situation where applying the maximality condition of the PMP to problem OCP leads to smooth candidate optimal controls, as functions of the variables , , and (be aware that this might not be straightforward to obtain). We are then in a position to define two-point boundary value problems to solve OCP, also known as shooting methods, for which the decision variables become , , and . In particular, the core of the method consists of iteratively choosing and making the adjoint equation evolve until some given final condition is met (see, e.g., [50, 51] for a more detailed explanation of shooting methods). In the context of deterministic optimal control, when convergence is achieved, shooting methods terminate quite fast (at least quadratically). However, here the bottlenecks are: 1) to deal with the presence of the variable and 2) to find a good guess for the initial value of to make the whole procedure converge. In the setting of Theorem 3.3.2, a valid option to design well-posed shooting methods is as follows. With the notation and assumptions of Theorem 3.3.2, up to some subsequence it holds that (with respect to appropriate topologies) as . Therefore, assuming we have access to Pontryagin extremals along iterations and given some large enough iteration , we can fix and initialize with a shooting method for OCP that operates on the finite-dimensional variable . If successful, this strategy would speed up the convergence of the entire numerical scheme, though we leave its investigation as a future research direction.
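For illustration only, the following sketch shows the single-shooting mechanics on a deterministic toy problem (minimize the integral of u²/2 subject to dx/dt = −x + u with fixed endpoints), where the unknown is the initial costate p(0) and the maximality condition gives u = p; in the stochastic setting one would additionally have to handle the martingale part of the adjoint equation, and the SCP-derived costate would supply the initial guess p0_guess below. All names and values are hypothetical.

```python
import numpy as np
from scipy.optimize import root
from scipy.integrate import solve_ivp

# Toy deterministic problem: minimize 0.5 * int_0^T u(t)^2 dt subject to
# dx/dt = -x + u, x(0) = x_init, x(T) = x_goal.  The PMP gives u = p and
# dp/dt = p, so single shooting searches for the initial costate p(0).
x_init, x_goal, T = 1.0, 2.0, 1.0

def flow(p0):
    def ode(t, y):            # y = (x, p); u*(t) = p(t) from the maximality condition
        x, p = y
        return [-x + p, p]
    return solve_ivp(ode, (0.0, T), [x_init, p0], rtol=1e-10, atol=1e-10).y[:, -1]

def residual(p0):
    return [flow(p0[0])[0] - x_goal]    # terminal condition x(T) - x_goal = 0

p0_guess = 0.5                          # e.g., a costate estimate read off the SCP iterates
sol = root(residual, x0=[p0_guess])
print("p(0) =", sol.x[0], "terminal error =", residual(sol.x)[0])
```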
4. Proof of the Main Results
Since the statement of Theorem 3.3.1 is in particular contained in Theorem 3.3.2, it is sufficient to prove Theorem 3.3.2 only. We split this proof into three main steps. First, we retrace the proof of the stochastic PMP to introduce necessary notation and expressions. In addition, we leverage this step to provide novel insight on how to prove the stochastic PMP by following the lines of the original work of Pontryagin and his group (see, e.g., [17, 39, 40]), a proof that we could not find in the stochastic literature. Second, we show the convergence of trajectories and controls, together with the convergence of variational inequalities (see Section 5.2.2 for a definition). The latter represents the cornerstone of the proof and paves the way for the final step, which consists of proving the convergence of the Pontryagin extremals. For the sake of clarity and brevity and without loss of generality, we carry out the proof in the case of scalar Brownian motion, i.e., we assume . Moreover, for any with , we adopt the notation .
4.1. Main Steps of the Proof of the Stochastic Maximum Principle
For the sake of clarity and brevity, we retrace the proof of Theorem 3.2.1 only. The proof of Theorem 3.2.2 follows from a straightforward modification of the steps we provide below, by introducing the additional final constraint via (4). In particular, we highlight those modifications below and in Section 4.2.2.
4.1.1. Linear Stochastic Differential Equations
Define the stochastic matrices and . For any time and any bounded initial condition , the following problem
(6)
is well-posed [20]. Its unique solution is the –adapted with right-continuous sample paths process , where the matrix-valued –adapted with continuous sample paths processes and satisfy
(7)
respectively. In particular, a straightforward application of the Itô formula shows that , and therefore , for every .
4.1.2. Needle-like Variations and End-point Mapping
One way to prove the PMP comes from the analysis of specific variations called needle-like variations on a mapping called the end-point mapping. Those concepts are introduced below in the context of optimization over deterministic controls (see Section 5 for the generalization of this argument to stochastic controls).
Given an integer , fix times which are Lebesgue points for , and fix random variables such that . For fixed scalars , , and , the needle-like variation of the control is defined to be the admissible control if and otherwise. Denote by the solution related to an admissible control of the augmented system
(8)
and define the mapping . For every fixed time , by denoting , the end-point mapping at time is defined to be the function
(9)
where is the open ball in of radius . Due to Lemma 2.1, it is not difficult to see that is Lipschitz (see also the argument developed to prove Lemma 4.1.2 below). In addition, this mapping may be Gateaux differentiated at zero along admissible directions of the cone . For this, denote , and let be the unique solution to (6) with .
For the proof of Theorem 3.2.2, the only change compared to the proof of Theorem 3.2.1 which is required up to this point consists of replacing the function which defines the end-point mapping (9) by the function
Note that this change is consistent since we will effectively make use of the mapping at the time only.
[Stochastic needle-like variation formula] Let . For any , it holds that
The proof of this result is technical (it requires an intense use of stochastic inequalities) but not difficult. We provide an extensive proof of Lemma 4.1.2 in the appendix in a more general context (see also Section 5).
4.1.3. Variational Inequalities
The main step in the proof of the PMP goes by contradiction, leveraging Lemma 4.1.2. To this end, for every , define the linear mapping
which due to Lemma 4.1.2, satisfies
for every . Finally, consider the closed, convex cone of given by
If , it would hold , and by applying [40, Lemma 12.1], one would find that the origin is an interior point of . This would imply that cannot be optimal for OCP, which gives a contradiction.
The argument above (together with an application of the separating hyperplane theorem) provides the existence of a non-zero vector denoted such that the following variational inequality holds
(10)
4.1.4. Conclusion of the Proof of the Stochastic Maximum Principle
The conditions of the PMP are derived by working out the variational inequality (10) and finding expressions of some appropriate conditional expectations. The main details are developed below in the context of optimization over deterministic controls (see Section 5 for the generalization to stochastic controls).
First, by appropriately developing solutions to (6), (10) can be rewritten as
for every Lebesgue point for and every . Second, again from the structure of (6), it can be readily checked that by denoting
(18)
the quantity is constant in (in addition, its negativity can be shown through a standard reformulation of problem OCP, as done in [40, Section 12.4]). Notice that the stochastic process is by definition –adapted. The quantities so far introduced allow one to reformulate the inequality above as
for every Lebesgue point for and , from which we infer the maximality condition of the PMP.
It remains to show the existence of the process , the continuity of the sample paths of the process , and the validity of the adjoint equation. For this, remark that, due to Jensen inequality and Lemma 2.1, the martingale is bounded in . Hence, the martingale representation theorem provides the existence of a process such that , where is a constant matrix. The definition in (18) immediately gives that the sample paths of the process are continuous. Next, an application of Itô formula (component-wise) readily shows that the product satisfies, for ,
Denoting , the computations above readily give the adjoint equation of the PMP. Those computations may also be leveraged to show and (see, e.g., [20, Section 7.2]).
4.2. Proof of the Convergence Result
Here we enter the core of the proof of Theorem 3.3.1. The convergence of trajectories and controls is addressed first. We devote the last two sections to the convergence of variational inequalities and Pontryagin extremals. For the sake of clarity and brevity, we only consider fixed-final-time problems. From now on, we implicitly assume –.
4.2.1. Convergence of Controls and Trajectories
Due to –, there exists a sequence of tuples such that for every , the tuple solves LOCP. In what follows, we implicitly adopt the reformulation of each convexified problem LOCP, which consists of adding to the final constraints through the new variable in (4). If denotes an admissible control for OCP that fulfills the conditions of Theorem 3.3.1, we denote by the –adapted with continuous sample paths process solution to the augmented system (8) related to OCP with control . The following holds.
[Convergence of trajectories] Assume that the sequence converges to zero. If the sequence converges to for the weak topology of , then , as , for every .
Proof.
For every , we have (below, denotes an appropriate constant, possibly changing from line to line)
(19)
Now, we take expectations. For the last term, we compute
due to Hölder and Burkholder–Davis–Gundy inequalities, and the last inequality comes from Lemma 2.1. Similar computations can be carried out for the first and third terms of (19).
To handle the second term of (19), we proceed as follows. We see that since implies , –almost surely, for every fixed and , the convergence of the sequence of controls for the weak topology of entails that , –almost surely as . In addition, gives for every and –almost surely. Hence, [52, Lemma 3.4] and the dominated convergence theorem finally provide that
It is easy to conclude the proof by applying a routine Grönwall inequality argument. ∎
The sought-after convergence of trajectories is a consequence of Lemma 4.2.1 when the conditions of Theorem 3.3.1 are met. In addition, limiting points for the sequences of controls fulfilling the conditions of Theorem 3.3.1 always exist up to some subsequence. Indeed, in this case, the set of admissible controls is closed and convex for the strong topology of . Hence it is closed for the weak topology of . Therefore, since is bounded in , there exists such that, up to some subsequence, weakly converges to for the weak topology of . It remains to show that the process is feasible for OCP. For this, we compute
due to Lemma 4.2.1 and the dominated convergence theorem (here is a constant coming from , and we use the continuity of the sample paths of ).
4.2.2. Convergence of Variational Inequalities
We start with a crucial result on linear stochastic differential equations. Recall the notation of Section 4.1.
[Convergence of variational inequalities] Fix and consider sequences of times and of uniformly bounded variables such that for every , . Assume that with for and with bounded. Denote , the stochastic process solutions, respectively, to
Under the assumptions of Lemma 4.2.1, it holds that for .
Proof.
From , , for we have (below, is a constant)
(20)
We consider the expectation of the fourth term on the right-hand side (a similar argument can be developed for the first three terms, which are omitted in the interest of clarity and brevity). The Burkholder–Davis–Gundy and Hölder inequalities give
where the last inequality holds due to Lemma 4.2.1 and because Lemma 2.1 may be readily extended to .
Consider which converges to for the weak convergence of . By assuming , we may denote by the full Lebesgue-measure subset such that if and only if is a Lebesgue point for and (we use the notation introduced with ). We prove the existence of a non-zero vector whose last component is non-positive such that for every , ,
(21)
where the –adapted with continuous sample paths stochastic process solves (6) with
where we denote – we will use this notation from now on.
For this, due to (17), for every , the optimality of for LOCP provides a non-zero vector whose second to last component is non-zero by assumption, so that for and , it holds that
where with the notation
the –adapted with continuous sample paths stochastic process solves a higher dimensional version of (6) with
where again we denote .
Now, fix and . The following comes from combining Lemma 3.3.2 with [47, Lemma 3.11].
[Pointwise convergence on controls] Under , there exists such that for every , is a Lebesgue point for , and , as . If denotes the sequence given by Lemma 4.2.2, we define and . Straightforward computations give (below, is a constant)
(22)
from which for (Lemmas 4.2.1 and 4.2.2). Therefore, from Lemma 4.2.2 we infer that
which, together with , readily yields
At this step, we point out that the variational inequalities in (10) and in (17) still hold if we take multipliers of norm one. Specifically, we may assume that for every . Therefore, up to some subsequence, there exists a vector such that for and satisfying , . We use this remark to conclude as follows. The definition of and the Hölder inequality give ( is a constant)
and in this case, (21) follows from Lemma 4.2.1 and the convergences obtained above. From what we showed in Section 4.1.4, this latter inequality yields that is extremal for OCP as soon as .
Finally, we turn to the case for which the sequence converges to for the strong topology of , but without assuming . For this, fix and define the stochastic processes and , where . Similar computations to the ones developed to compute the bound (22) provide that as , and therefore up to some subsequence, the quantity converges to zero for a.e. in . By taking countable intersections of sets of Lebesgue points (one for each control , for all ), it follows that the argument above can be iterated exactly in the same manner (via Lemma 4.2.2), leading to the same conclusion.
4.2.3. Convergence of Multipliers and Conclusion
By applying the construction developed in Section 4.1.4 to the variational inequality (21), we retrieve a tuple , where , are constant, and , such that, as soon as , the tuple is extremal for OCP associated with satisfying conditions 1., 2., and 3. of Theorem 3.2.1. To conclude, it only remains to prove that
(23)
Here, each tuple is the extremal of LOCP associated with the tuple , where solves the adjoint equation of Theorem 3.2.2. Since, under the assumption , the multipliers , , and do not play any role, for the sake of clarity and brevity of notation, and without loss of generality, in what follows we implicitly assume , for every .
Let us start with the first convergence of (23). For this, fix and consider the process
where , solve (7) with matrices , from which, due to , for every , we remove the last column and row, whereas , solve (7) with matrices , , those matrices being defined above. Due to a straightforward extension of Lemma 2.1 to equations (7), this process is a martingale, bounded in . Hence, the martingale representation theorem allows us to infer that this process is a martingale with continuous sample paths, and Doob and Jensen inequalities give
(24)
which holds for every . By combining (24) with (18), we compute
where is a constant. Up to some subsequence, the first term on the right-hand side converges to zero. Moreover, the definition of and Hölder inequality give
and a straightforward extension of Lemma 4.2.2 to equations (7) entails that all the terms on the right-hand side tend to zero. The convergence of is proved.
5. Extension to Problems with Free Final Time and Stochastic Controls
The accumulation properties of SCP can be extended to OCPs with free final time and to optimization over stochastic controls. Specifically, in this section we investigate how results from Theorem 3.3.1 may be extended to finite-horizon, finite-dimensional non-linear stochastic General Optimal Control Problems (GOCP) having control-affine dynamics and uncontrolled diffusion, of the form
where the final time may be free or not (here, is some fixed maximal time), and we optimize over controls which are either deterministic, i.e., , or stochastic, i.e., . Solutions to GOCP will be denoted by , for .
By adopting the notation given in (3), the stochastic Linearized General Optimal Control Problem (LGOCP) at iteration may be defined accordingly as
whose solutions are denoted by , for . In particular, trust-region constraints are now imposed on the variable as well to derive convergence guarantees.
We point out that the results we present in this section are not to be considered as part of the main contribution, but they rather aim at providing insights for the future design of efficient numerical methods for stochastic optimal control problems which consider free final time and stochastic admissible controls. In particular, extending SCP in the presence of free final time and stochastic controls requires introducing additional assumptions which might seem demanding. We leave the investigation of the validity of this framework under sharper assumptions as a future direction of research.
5.1. Refined assumptions and extended result of convergence
The presence of the free final time hinders the well-posedness of the stochastic PMP when applied to each LGOCP, i.e., Theorem 3.2.1, and in turn the validity of Theorem 3.3.1 in this more general setting. To overcome this issue, we need to tighten assumptions –. Specifically, although remains unchanged, assumptions – are replaced by the following, respectively:
Mappings , , and , , either are affine-in-state or have compact supports in and in , respectively. In the case of free final time, is affine.
For every , LGOCP is feasible. In addition, in the case of free final time, for every , any optimal control for LGOCP has continuous sample paths at the optimal final time for LGOCP.
Assumption plays a crucial role in providing the existence of pointwise necessary conditions for optimality for the linearized problems LGOCP, thus enabling classical formulations of the stochastic PMP. However, we do recognize that might be demanding and, as a future research direction, we propose to relax it by leveraging integral-type necessary conditions for optimality, which are best suited to deal with optimal control problems that show an explicit discontinuous dependence on time (e.g., [53]). Unfortunately, when optimization over stochastic controls is adopted, successfully leveraging weakly converging subsequences of controls to show a “weak” guarantee of success for SCP, i.e., the equivalent of Theorem 3.3.2, becomes more challenging; in that case only Theorem 3.3.1 still holds, though under appropriate modifications. We leave proving success guarantees for SCP when optimizing over stochastic controls under weaker assumptions as a future direction of research.
Extending the stochastic PMP to GOCP and LGOCP comes by assuming , , and . In particular, under those assumptions we have the following extension of Theorem 3.2.1 and of Theorem 3.2.2:
[Stochastic Pontryagin Maximum Principle for GOCP] Let be a locally-optimal solution to GOCP. There exist and a tuple , where and are constant, and such that the following relations are satisfied:
(1) Non-Triviality Condition: .
(2) Adjoint Equation:
(3) Maximality Condition:
(4) Transversality Condition: if the final time is free
where equalities hold in the case .
The quantity uniquely determines , , and and is called extremal for GOCP (associated with the tuple , or simply with ). An extremal is called normal if .
[Weak Stochastic Pontryagin Maximum Principle for LGOCP] Let be a locally-optimal solution to LGOCP. There exist , a tuple , where , , and are constant, and such that the following relations are satisfied:
(1) Non-Triviality Condition: .
(2) Adjoint Equation:
(3) Maximality Condition:
(4) Transversality Condition: if the final time is free
where equalities hold in the case .
The quantity uniquely determines , , and and is called extremal for LGOCP (associated with , or ). An extremal is called normal if .
Extending the stochastic PMP to the new linearized problems LGOCP, and in particular the new transversality condition 4., additionally requires assuming together with and . Although the proof of this result for fixed-final-time problems is well-established (see [20, Chapter 3]), we could not find any published proof of Theorem 5.1 when the final time is free. Therefore, we provide its proof in Section 5.2. The proof of Theorem 5.1 is achieved similarly; thus, for the sake of clarity and brevity, we avoid reporting any detail concerning the latter. Thanks to Theorem 5.1, we can extend the convergence of SCP as follows:
[Generalized Properties of Accumulation Points for SCP] Assume that , , and hold and that SCP generates a sequence such that converges to zero, and for every , the tuple locally solves LGOCP. For every , letting be an extremal associated with for LGOCP (whose existence is ensured by Theorem 5.1), assume the following Accumulation Condition holds:
(AC) Up to some subsequence, converges to some for the strong topology of .
If , then is an extremal for GOCP associated with .
The guarantees offered by Theorem 5.1 read similarly to what we have explained in Section 3.3. One sees that the computations provided in Section 4.2 for the proof of Theorem 3.3.1 straightforwardly generalize as soon as weak convergence of controls in is replaced with strong convergence of controls in (actually, the proofs become even simpler), and may be adopted to prove Theorem 5.1, provided that we are able to extend the proofs of Theorem 3.2.1 and Theorem 3.2.2 to the case of free final time and stochastic controls. Therefore, to conclude, in the next section we develop the necessary technical details which enable proving Theorem 5.1 and Theorem 5.1 through the machinery developed in Section 4.2. Specifically, since the proofs of Theorem 5.1 and Theorem 5.1 are similar, for the sake of clarity and brevity we only provide details for the proof of Theorem 5.1.
5.2. Proof of the extension for the stochastic Pontryagin Maximum Principle
Before getting started, we need to introduce the notion of a Lebesgue point for a stochastic control . For this, we adopt the theory of Bochner integrals, showing that , where the latter is the space of Bochner integrable mappings . Let . First of all, by definition we see that for almost every , it holds that , and thus this control is well-defined as a mapping . Since is second-countable, is separable, and therefore the claim follows from the Pettis measurability theorem once we prove that is strongly measurable with respect to the Lebesgue measure of . For this, it is sufficient to show that for every and , the mapping is Lebesgue measurable. By fixing , this can be achieved by proving that the family is a monotone class and then using standard monotone class arguments (the details are left to the reader). At this step, the Lebesgue differentiation theorem provides that for almost every , the following relations hold:
Such a time is called Lebesgue point for the control .
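In generic notation (u the stochastic control and ‖·‖ the Euclidean norm; these symbols are ours and need not match the paper's), the Lebesgue-point property referred to above takes the standard form
\[
\lim_{h \to 0^{+}} \frac{1}{2h} \int_{t-h}^{t+h} \mathbb{E}\big[\, \| u(s) - u(t) \| \,\big] \, ds \;=\; 0 .
\]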
5.2.1. Modified Needle-like Variations and End-point Mapping
We may extend the concept of needle-like variations and of end-point mapping previously introduced in Section 4.1.2 to the setting of free final time and of stochastic controls as follows.
Given an integer , fix times which are Lebesgue points for , and fix random variables such that . For fixed scalars , , and , the needle-like variation of the control is defined to be the admissible control if and otherwise. Denote by the solution related to an admissible control of the augmented system
and define the mapping . For every fixed time , by denoting , the end-point mapping at time is defined to be
where is the open ball in of radius . Variations on the variable are necessary only if free-final-time problems are considered, in which case , and in particular the fact that is an affine function, plays a crucial role for computations. Due to Lemma 2.1 (and to in the case of free-final-time problems), it is not difficult to see that is Lipschitz (see also the argument developed to prove Lemma 5.2.1 below). In addition, this mapping may be Gateaux differentiated at zero along admissible directions of the cone . For this, denote , and let be the unique solution to (6) with .
[Generalized stochastic needle-like variation formula] Let and assume holds (in particular, is an affine function when ). If is a Lebesgue point for , then it holds that
As for Lemma 4.1.2, we provide an extensive proof of Lemma 5.2.1 in the appendix.
5.2.2. Variational Inequalities and Conclusion
Similarly to what we argued in Section 4.1.3, the main step in the proof of the stochastic PMP with free final time and stochastic controls goes by contradiction, leveraging Lemma 5.2.1. For this, from now on we assume that, when free, the final time is a Lebesgue point for the optimal control. Otherwise, one may proceed by mimicking the argument developed in [39, Section 7.3].
Assume that and hold. For every , define the linear mapping
which due to Lemma 5.2.1, satisfies
for every . Finally, consider the closed, convex cone of given by
If , it would hold , and by [40, Lemma 12.1], one would find that the origin is an interior point of . This would imply that is not optimal for GOCP, a contradiction.
The argument above (together with an application of the separating hyperplane theorem) provides the existence of a non-zero vector denoted such that the following variational inequalities hold
(25)
In the case of deterministic controls, the random variables in (25) are replaced by deterministic vectors . Moreover, when , only negative variations on the final time are allowed. Hence in this case, the first equality of (25) actually becomes a greater-or-equal-to-zero inequality (see also [39, Chapter 7]).
6. An Example Numerical Scheme
Although the procedure detailed previously provides methodological steps to tackle OCP through successive linearizations, numerically solving LCPs that depend on stochastic coefficients remains a challenge. In this last section, under appropriate assumptions we propose an approximate, though very practical, numerical scheme to effectively solve each subproblem LOCP. We stress that our main goal is not the development of an ultimate algorithm, but rather to demonstrate how one may leverage the theoretical insights provided by Theorem 3.3.1 to design efficient strategies to practically solve OCP.
6.1. A Simplified Context
The proposed approach relies on a specific shape of the cost and the dynamics of OCP. Specifically, we consider OCPs with deterministic admissible controls (over a fixed time horizon ) only and whose cost functions are such that . Moreover, we assume the state variable is given by two components for , satisfying the following system of forward stochastic differential equations ( and are accordingly defined as in (1))
(26)
In particular, any -adapted process solution to (26) with continuous sample paths for a given control is such that is deterministic. For the sake of clarity and to avoid cumbersome notation, from now on we assume and that does not explicitly depend on . This is clearly done without loss of generality.
6.2. The Proposed Approach
With the assumptions adopted previously, we see that the diffusion in the dynamics of OCP is now forced to be deterministic. This fact is at the root of our method, which mimics the procedure proposed in [9]. Specifically, we transcribe every stochastic subproblem LOCP into a deterministic and convex optimal control problem, whereby the variables are the mean and the covariance of the solution to LOCP. The main advantage in doing so is that deterministic reformulations of the subproblems LOCP can be efficiently solved via off-the-shelf convex solvers. Unlike [9], the design of this numerical scheme relies on upstream information entailed by Theorem 3.3.1, as follows.
By recalling the notation introduced in the previous sections, we denote , , and for , and . Heuristically, assuming that solutions to LOCP have small variance, i.e., , one can compute the linearization of , , , and at rather than at . In doing so, for the cost of LOCP we obtain the following approximation (the notation goes accordingly as in (3))
(27)
whereas for the dynamics of LOCP we obtain (the notation follows (2))
(28)
where , , , , and are deterministic and with affine in . Accordingly, by introducing and , slightly tighter trust-region constraints are
(29)
At this point, as all coefficients are deterministic, solutions to (28) are Gaussian processes whose dynamics take the form (see, e.g., [54])
(30)
The system above is not linear because of . Nevertheless, we may invoke the convergence results of Theorem 3.3.1, and in particular the convergence of the sequence to some deterministic curve , to replace (30) with
(31)
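To illustrate the passage from (30) to (31), consider the generic mean-covariance propagation associated with dynamics of the form (28), written here with illustrative coefficients (the exact coefficients are those of the paper):
\[
\dot{\mu}(t) = A(t)\,\mu(t) + B(t)\,u(t) + c(t),
\qquad
\dot{\Sigma}(t) = A(t)\,\Sigma(t) + \Sigma(t)\,A(t)^{\top} + \sigma\big(t,\mu(t)\big)\,\sigma\big(t,\mu(t)\big)^{\top}.
\]
In this generic form, the dependence of the diffusion term on the mean (or on the state more generally) makes the covariance equation non-linear. Freezing that argument along the deterministic curve provided by Theorem 3.3.1 yields a system that is affine in the pair mean-covariance, which is the spirit of the replacement (31).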
In conclusion, we may heuristically replace every LOCP with the deterministic convex optimal control problem whose dynamics are (31), whose trust-region constraints are (29), whose variables are and (and additionally and ), and whose cost is obtained by replacing (27) with
to force solutions to have small variances, which in turn justifies the whole approach.
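The deterministic convex structure of these subproblems can be illustrated with a minimal sketch in Python using CVXPY, an off-the-shelf convex modeling tool; this is not the solver setup used in the experiments below (which rely on IPOPT), and all matrices, dimensions, and weights are hypothetical placeholders chosen only to make the example self-contained. The mean and covariance obey Euler-discretized affine dynamics, the control is confined to a trust region around the previous iterate, and the terminal covariance is penalized as described above.

```python
import numpy as np
import cvxpy as cp

# Hypothetical problem data (placeholders, not the paper's values).
N, dt = 30, 0.1            # discretization nodes and time step
n, m = 2, 1                # state and control dimensions
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
c = np.zeros(n)
sigma = 0.05 * np.eye(n)   # deterministic diffusion (frozen along the reference curve)
u_ref = np.zeros((m, N))   # control from the previous SCP iterate
delta = 0.5                # trust-region radius
lam = 10.0                 # weight on the terminal covariance penalty

mu = cp.Variable((n, N + 1))                                   # mean trajectory
Sig = [cp.Variable((n, n), symmetric=True) for _ in range(N + 1)]  # covariances
u = cp.Variable((m, N))                                        # open-loop control

constraints = [mu[:, 0] == np.zeros(n), Sig[0] == np.zeros((n, n))]
for k in range(N):
    # Forward Euler on the mean/covariance dynamics; both are affine in (mu, Sig, u).
    constraints += [
        mu[:, k + 1] == mu[:, k] + dt * (A @ mu[:, k] + B @ u[:, k] + c),
        Sig[k + 1] == Sig[k] + dt * (A @ Sig[k] + Sig[k] @ A.T + sigma @ sigma.T),
        cp.norm(u[:, k] - u_ref[:, k], "inf") <= delta,        # trust region
    ]
constraints += [mu[:, N] == np.ones(n)]                        # reach the goal in expectation

cost = dt * cp.sum_squares(u) + lam * cp.trace(Sig[N])         # effort + variance penalty
prob = cp.Problem(cp.Minimize(cost), constraints)
prob.solve()
print(prob.status, prob.value)
```

Because the covariance dynamics become affine once the diffusion is frozen along the reference curve, the resulting program is convex and can be handled by any generic conic or quadratic programming solver.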
6.3. Uncertain Car Trajectory Planning Problem
We now provide numerical experiments. We consider the problem of planning the trajectory of a car whose state and control inputs are and and whose dynamics are
where and quantify the effect of slip. The evolution of is deterministic: given actuator commands, the change in velocity is known exactly, but uncertainty in the positional variables persists. This model is motivated by driving applications in which the vehicle may drift during tight turns and suffer from significant positional uncertainty, but limits its acceleration to avoid wheel slip. By defining , this problem setting matches the one presented in the previous section. We consider minimizing the control effort . The initial state is fixed, and the final state constraint consists of reaching the goal in expectation such that . Further, we consider cylindrical obstacles of radius centered at , for which we define the potential function as if and otherwise, where and is an additional clearance. We penalize violations of the obstacle-avoidance constraint directly within the cost, defining , with (note that in this setting, following Section 6.1).
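As one illustrative instance consistent with the description above (not necessarily the exact model used in the experiments), a kinematic car with state $(p_x, p_y, \theta, v)$ and controls $(\omega, a)$, with slip-induced noise entering only the positional channels and with diffusion coefficients depending on the deterministic component, could read:
\[
\begin{aligned}
\mathrm{d}p_x &= v\cos\theta\,\mathrm{d}t + \sigma_1(v)\,\mathrm{d}W^{1}_t, \qquad &
\mathrm{d}\theta &= \omega\,\mathrm{d}t, \\
\mathrm{d}p_y &= v\sin\theta\,\mathrm{d}t + \sigma_2(v)\,\mathrm{d}W^{2}_t, \qquad &
\mathrm{d}v &= a\,\mathrm{d}t,
\end{aligned}
\]
so that the velocity (and heading) evolve deterministically while positional uncertainty accumulates, matching the structure required in Section 6.1.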
6.4. Results
We set , , and , and we set up four obstacles. We discretize (31) using a forward Euler scheme with discretization nodes, set , set at each SCP iteration, and use IPOPT to solve each convexified problem. We check convergence of SCP by verifying that . Although our method is initialized with an infeasible straight-line trajectory, SCP converges in iterations to a trajectory avoiding the obstacles in expectation. Indeed, after evaluating sample paths of the system, only of the trajectories intersect obstacles. We also verify that at every SCP iteration, and at convergence, so that . Thus, Theorems 3.3.1 and 3.3.2 apply and the solution generated by SCP is an extremal for OCP. We visualize our results in Figure 1 and release our implementation at https://github.com/StanfordASL/stochasticSCP.
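A Monte Carlo check of the kind reported above can be sketched with a short Euler-Maruyama sampler. The example below uses a toy single-integrator model with hypothetical obstacle data and a placeholder open-loop control; the actual dynamics, control, and obstacle layout are those of the released implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data (placeholders): horizon, step, obstacles, and an open-loop control.
T, N = 3.0, 60
dt = T / N
obstacles = [(np.array([1.0, 0.6]), 0.3)]      # (center, radius)
u_opt = np.tile(np.array([0.4, 0.2]), (N, 1))  # optimized open-loop control (placeholder)

def drift(x, u):
    # Toy single-integrator drift; the real model would be the car dynamics above.
    return u

def diffusion(x):
    # Deterministic, state-independent diffusion (consistent with Section 6.1).
    return 0.05 * np.eye(2)

def hits_obstacle(path):
    return any(np.linalg.norm(p - c) <= r for p in path for (c, r) in obstacles)

n_samples, n_hit = 1000, 0
for _ in range(n_samples):
    x = np.zeros(2)
    path = [x.copy()]
    for k in range(N):
        dw = rng.normal(scale=np.sqrt(dt), size=2)
        x = x + drift(x, u_opt[k]) * dt + diffusion(x) @ dw   # Euler-Maruyama step
        path.append(x.copy())
    n_hit += hits_obstacle(path)

print(f"fraction of sample paths intersecting an obstacle: {n_hit / n_samples:.3f}")
```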
7. Conclusion and Perspectives
In this paper, we introduced sequential convex programming for non-linear stochastic optimal control and analyzed its convergence properties, from which we derived a practical numerical framework for solving non-linear stochastic optimal control problems.
Future work may extend this analysis to more general problem formulations, e.g., risk measures as costs and state (chance) constraints. In this context, preliminary SCP-based results exist only for discrete-time problem formulations [32]; tackling continuous-time formulations will require more sophisticated necessary conditions for optimality. In addition, we plan to investigate further the setting of free final time and stochastic admissible controls, devising sharper assumptions and improving the convergence results. Finally, we plan to further leverage our theoretical insights to design new and more efficient numerical schemes for non-linear stochastic optimal control.
References
- [1] J.E. Potter. A matrix equation arising in statistical filter theory. Rep. RE-9, Experimental Astronomy Laboratory, Massachusetts Institute of Technology, 1965.
- [2] J.-M. Bismut. Linear quadratic optimal stochastic control with random coefficients. SIAM Journal on Control and Optimization, 14(3):419–444, 1976.
- [3] S. Peng. Stochastic Hamilton–Jacobi–Bellman equations. SIAM Journal on Control and Optimization, 30(2):284–304, 1992.
- [4] S. Tang. General linear quadratic optimal stochastic control problems with random coefficients: linear stochastic Hamilton systems and backward stochastic Riccati equations. SIAM Journal on Control and Optimization, 42(1):53–75, 2003.
- [5] P. Kleindorfer and K. Glover. Linear convex stochastic optimal control with applications in production planning. IEEE Transactions on Automatic Control, 18(1):56–59, 1973.
- [6] R. T. Rockafellar and R. J. B. Wets. Generalized linear-quadratic problems of deterministic and stochastic optimal control in discrete time. SIAM Journal on Control and Optimization, 28(4):810–822, 1990.
- [7] D. Kuhn, W. Wiesemann, and A. Georghiou. Primal and dual linear decision rules in stochastic and robust optimization. Mathematical Programming, 130(1):177–209, 2011.
- [8] C. Bes and S. Sethi. Solution of a class of stochastic linear-convex control problems using deterministic equivalents. Journal of Optimization Theory and Applications, 62(1):17–27, 1989.
- [9] B. Berret and F. Jean. Efficient computation of optimal open-loop controls for stochastic systems. Automatica, 115(108874), 2020.
- [10] M. A. Rami and X. Y. Zhou. Linear matrix inequalities, Riccati equations, and indefinite stochastic linear quadratic controls. IEEE Transactions on Automatic Control, 45(6):1131–1143, 2000.
- [11] D. D. Yao, S. Zhang, and X. Y. Zhou. Stochastic linear-quadratic control via semidefinite programming. SIAM Journal on Control and Optimization, 40(3):801–823, 2001.
- [12] D. Bertsimas and D. B. Brown. Constrained stochastic LQC: a tractable approach. IEEE Transactions on Automatic Control, 52(10):1826–1841, 2007.
- [13] T. Damm, H. Mena, and T. Stillfjord. Numerical solution of the finite horizon stochastic linear quadratic control problem. Numerical Linear Algebra with Applications, 24(4), 2017.
- [14] T. Levajković, H. Mena, and L.-M. Pfurtscheller. Solving stochastic LQR problems by polynomial chaos. IEEE Control Systems Letters, 2(4):641–646, 2018.
- [15] R. Bellman. Dynamic Programming. Princeton Univ. Press, Princeton, New Jersey, 1957.
- [16] P.-L. Lions. Optimal control of diffusion processes and Hamilton-Jacobi-Bellman equations, Part I. Comm. Partial Differential Equations, 8:1101–1174, 1983.
- [17] L.S. Pontryagin. Mathematical theory of optimal processes. Routledge, 2018.
- [18] H. J. Kushner and F. C. Schweppe. A maximum principle for stochastic control systems. Journal of Mathematical Analysis and Applications, 8(2):287–302, 1964.
- [19] S. Peng. A general stochastic maximum principle for optimal control problems. SIAM Journal on Control and Optimization, 28(4):966–979, 1990.
- [20] J. Yong and X. Y. Zhou. Stochastic controls: Hamiltonian systems and HJB equations, volume 43. Springer Science & Business Media, 1999.
- [21] E. Trélat. Optimal control and applications to aerospace: some results and challenges. Journal of Optimization Theory and Applications, 154(3):713–758, 2012.
- [22] R. Bonalli, B. Hérissé, and E. Trélat. Analytical Initialization of a Continuation-Based Indirect Method for Optimal Control of Endo-Atmospheric Launch Vehicle Systems. In IFAC World Congress, 2017.
- [23] R. Bonalli, B. Hérissé, and E. Trélat. Optimal control of endo-atmospheric launch vehicle systems: Geometric and computational issues. IEEE Transactions on Automatic Control, 65(6):2418–2433, 2019.
- [24] A. Shapiro and A. Nemirovski. On complexity of stochastic programming problems. In Continuous optimization, pages 111–146. Springer, 2005.
- [25] E. Gobet. Monte-Carlo methods and stochastic processes: from linear to non-linear. CRC Press, 2016.
- [26] H. J. Kushner. Numerical methods for stochastic control problems in continuous time. SIAM Journal on Control and Optimization, 28(5):999–1048, 1990.
- [27] H. J. Kushner and L. F. Martins. Numerical methods for stochastic singular control problems. SIAM Journal on Control and Optimization, 29(6):1443–1475, 1991.
- [28] M. Annunziato and A. Borzì. A Fokker–Planck control framework for multidimensional stochastic processes. Journal of Computational and Applied Mathematics, 237(1):487–507, 2013.
- [29] J.-F. Le Gall. Brownian motion, martingales, and stochastic calculus, volume 274. Springer, 2016.
- [30] R. Carmona. Lectures on BSDEs, stochastic control, and stochastic differential games with financial applications, volume 1. SIAM, 2016.
- [31] O. Kazuhide, M. Goldshtein, and P. Tsiotras. Optimal covariance control for stochastic systems under chance constraints. IEEE Control Systems Letters, 2(2):266–271, 2018.
- [32] T. Lew, R. Bonalli, and M. Pavone. Chance-constrained sequential convex programming for robust trajectory optimization. In European Control Conference, 2020.
- [33] Y. Wang, D. Yang, J. Yong, and Z. Yu. Exact controllability of linear stochastic differential equations and related problems. American Institute of Mathematical Sciences, 7(2):305–345, 2017.
- [34] Q. T. Dinh and M. Diehl. Local Convergence of Sequential Convex Programming for Nonconvex Optimization. In Recent Advances in Optimization and its Applications in Engineering, pages 93–102. Springer, 2010.
- [35] M. Diehl and F. Messerer. Local Convergence of Generalized Gauss-Newton and Sequential Convex Programming. In Conference on Decision and Control, 2019.
- [36] Y. Mao, D. Dueri, M. Szmuk, and B. Açikmeşe. Successive Convexification of Non-Convex Optimal Control Problems with State Constraints. IFAC-PapersOnLine, 50(1):4063–4069, 2017.
- [37] F. Palacios-Gomez, L. Lasdon, and M. Engquist. Nonlinear optimization by successive linear programming. Management Science, 28(10):1106–1120, 1982.
- [38] P. T. Boggs and J. W. Tolle. Sequential quadratic programming. Acta numerica, 4(1):1–51, 1995.
- [39] R. Gamkrelidze. Principles of optimal control theory, volume 7. Springer Science & Business Media, 2013.
- [40] A.A. Agrachev and Y. Sachkov. Control theory from the geometric viewpoint, volume 87. Springer Science & Business Media, 2013.
- [41] H. Frankowska, H. Zhang, and X. Zhang. Stochastic Optimal Control Problems with Control and Initial-Final States Constraints. SIAM Journal on Control and Optimization, 56:1823–1855, 2018.
- [42] J. Nocedal and S. Wright. Numerical Optimization. Springer, 1999.
- [43] Z. Lu. Sequential convex programming methods for a class of structured nonlinear programming. Technical report, 2013.
- [44] T. Haberkorn and E. Trélat. Convergence results for smooth regularizations of hybrid nonlinear optimal control problems. SIAM Journal on Control and Optimization, 49:1498–1522, 2011.
- [45] R. Bonalli, B. Hérissé, and E. Trélat. Solving Optimal Control Problems for Delayed Control-Affine Systems with Quadratic Cost by Numerical Continuation. In American Control Conference, 2017.
- [46] R. Bonalli. Optimal control of aerospace systems with control-state constraints and delays. PhD thesis, Sorbonne Université, 2018.
- [47] R. Bonalli, B. Hérissé, and E. Trélat. Continuity of pontryagin extremals with respect to delays in nonlinear optimal control. SIAM Journal on Control and Optimization, 57(2):1440–1466, 2019.
- [48] I. A. Shvartsman and R. B. Vinter. Regularity properties of optimal controls for problems with time-varying state and control constraints. Nonlinear Analysis: Theory, Methods & Applications, 65:448–474, 2006.
- [49] Y. Chitour, F. Jean, and E. Trélat. Singular trajectories of control-affine systems. SIAM Journal on Control and Optimization, 47:1078–1095, 2008.
- [50] A.E. Bryson. Applied optimal control: optimization, estimation and control. CRC Press, 1975.
- [51] J. T. Betts. Survey of numerical methods for trajectory optimization. Journal of Guidance, Control, and Dynamics, 21(2):193–207, 1998.
- [52] E. Trélat. Some properties of the value function and its level sets for affine control systems with quadratic cost. Journal of Dynamical and Control Systems, 6(4):511–541, 2000.
- [53] L. Bourdin and E. Trélat. Pontryagin maximum principle for finite dimensional nonlinear optimal control problem on time scales. SIAM Journal on Control and Optimization, 51(5):3781–3813, 2013.
- [54] P. S. Maybeck. Stochastic models, estimation, and control. Academic press, 1982.
Appendix
7.1. Proof of Lemma 2.1
Proof of Lemma 2.1.
For clarity of notation, we only consider the case , the case of a multivariate Brownian motion being similar.
Let us start with the first inequality. For this, by denoting , for every we compute (below, is an appropriate constant)
For the last term, by denoting , the Burkholder–Davis–Gundy, Hölder, and Young inequalities give
Similar computations apply to the other terms and when considering solutions to (1). Therefore, we conclude from a routine Grönwall inequality argument.
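Schematically, the estimate of the stochastic-integral term follows the standard chain below, written with generic constants and illustrative notation (the precise integrand is the one appearing in the computation above); for $p \ge 2$,
\[
\mathbb{E}\Big[\sup_{s\le t}\Big|\int_0^s \sigma_r\,\mathrm{d}W_r\Big|^{p}\Big]
\;\le\; C_p\,\mathbb{E}\Big[\Big(\int_0^t |\sigma_r|^{2}\,\mathrm{d}r\Big)^{p/2}\Big]
\;\le\; C_p\, t^{\frac{p}{2}-1}\int_0^t \mathbb{E}\big[|\sigma_r|^{p}\big]\,\mathrm{d}r,
\]
where the first step is the Burkholder–Davis–Gundy inequality and the second the Hölder inequality in time; a Lipschitz-type bound on the integrand then reduces the right-hand side to the quantity under estimation, and Grönwall's inequality concludes.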
Let us prove the second inequality of the lemma. For , we compute
For the second term on the right-hand side, for , the Hölder inequality gives
and similar computations hold for the remaining terms and when considering solutions to (1). Again, we conclude by a Grönwall inequality argument. ∎
7.2. Proof of Lemma 4.1.2 and of Lemma 5.2.1
The proof of Lemma 4.1.2 immediately follows from the following preliminary result.

Lemma 7.2 (Stochastic needle-like variation formula - no free final time). Let (in particular, no or any assumption on are required). For , uniformly for ,
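For reference, the needle-like (Pontryagin-type) variation underlying this lemma takes the standard form, written here with illustrative notation, where $\tau$ is a Lebesgue point of the reference control $u$ and $v$ an admissible control value:
\[
u_{\varepsilon}(t) \;=\;
\begin{cases}
v, & t \in (\tau - \varepsilon,\, \tau],\\[2pt]
u(t), & \text{otherwise},
\end{cases}
\]
and the lemma quantifies, uniformly in the relevant parameters, the first-order effect of this perturbation on the corresponding trajectory as $\varepsilon \to 0^{+}$.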
Proof.
We only consider the case ; the most general case follows by a classical induction argument (see, e.g., [40]). We only need to prove that
(32)
uniformly for every . For this, first we remark that for . Since and are fixed and we take the limit , we may assume that . Therefore, without loss of generality, we replace with , assuming uniformly. We have (below, denotes a constant)
Let us analyze those integrals separately. Starting with the last one, we have
Lemma 2.1 immediately gives that the first term on the right-hand side is . In addition, from the Hölder inequality, it follows that
Since is a Lebesgue point for , the two terms above go to zero as . Next, by the Burkholder–Davis–Gundy inequality and a Taylor expansion, we have
from Lemma 2.1. Similar estimates hold for the remaining terms in the first inequality of the proof. Summarizing, there exists a constant such that
and the conclusion follows from a routine Grönwall inequality argument. ∎
Proof of Lemma 5.2.1.
We may assume and . Hence, is an affine function, and in the rest of this proof we denote , where and .
Developing, we have
From Lemma 7.2, the second term on the right-hand side is . For the last summand, the properties of the stochastic integral give
where the last term is . It is worth pointing out that being affine is essential for this last claim. Finally, for the remaining term, we apply the Itô formula to each coordinate , obtaining
and therefore
As is a Lebesgue point for , this last quantity goes to zero as , and we conclude. ∎