
Sequential Convex Programming for
Non-Linear Stochastic Optimal Control

Riccardo Bonalli (Université Paris-Saclay, CNRS, CentraleSupélec, Laboratoire des signaux et systèmes, 91190 Gif-sur-Yvette, France; email: [email protected] and [email protected]), Thomas Lew, and Marco Pavone (Department of Aeronautics and Astronautics, Stanford University, Stanford, CA 94305; emails: {thomas.lew, pavone}@stanford.edu). This research was supported by the National Science Foundation under the CPS program (grant #1931815).
Abstract.

This work introduces a sequential convex programming framework for non-linear, finite-dimensional stochastic optimal control, where uncertainties are modeled by a multidimensional Wiener process. We prove that any accumulation point of the sequence of iterates generated by sequential convex programming is a candidate locally-optimal solution for the original problem in the sense of the stochastic Pontryagin Maximum Principle. Moreover, we provide sufficient conditions for the existence of at least one such accumulation point. We then leverage these properties to design a practical numerical method for solving non-linear stochastic optimal control problems based on a deterministic transcription of stochastic sequential convex programming.

Key words and phrases:
Nonlinear stochastic optimal control, sequential convex programming, convergence of Pontryagin extremals, numerical deterministic reformulation.
1991 Mathematics Subject Classification:
49K40, 65C30, 93E20
Résumé.

This work introduces a sequential convex programming framework for the optimal control of non-linear, finite-dimensional stochastic systems. We prove that any accumulation point of the sequence of iterates generated by sequential convex programming is a candidate locally-optimal solution of the original problem, in the sense of the stochastic Pontryagin Maximum Principle. Moreover, we develop sufficient conditions for the existence of at least one such accumulation point. We exploit these properties to design a practical numerical method that solves non-linear stochastic control problems through a deterministic reformulation of sequential convex programming.

1. Introduction

Over the past few decades, the applied control community has devoted increasing attention toward the optimal control of stochastic systems. The general formulation of this problem entails steering a dynamical system from an initial configuration to a final configuration while optimizing some prescribed performance criterion (e.g., minimizing some given cost) and satisfying constraints. This dynamical system is also subject to uncertainties which are modeled by Wiener processes and may come from unmodeled and/or unpredicted behaviors, as well as from measurement errors. Active development of new theoretical and numerical methods to address this problem continues, and the subject already has a rich literature.

We can classify the existing works into two main categories.

The first category consists of contributions that focus on Linear Convex Problems (LCPs), i.e., whose dynamics are linear and whose costs are convex in both state and control variables. An important class of LCPs is given by Linear Quadratic Problems (LQPs) in which costs are quadratic in both state and control variables and for which the analysis of optimal solutions may be reduced to the study of an algebraic relation known as Stochastic Riccati Equation (SRE) [1, 2, 3, 4]. Efficient algorithmic frameworks have been devised to numerically solve LCPs, ranging from local search [5] and dual representations [6, 7], to deterministic-equivalent reformulations [8, 9], among others. In the special case of LQPs, those techniques may be further improved by combining SRE theory with semidefinite programming [10, 11], finite-dimensional approximation [12, 13], or chaos expansion [14].

The second category of works deals with problems that do not enjoy any specific regularity, allowing non-linear dynamics or non-convex (therefore non-quadratic) costs. Throughout this paper, we call these Non-Linear Problems (NLPs). It is unquestionable that NLPs have so far received less attention from the community than LCPs, especially since the analysis of the former is usually more involved. Similar to the deterministic case, there are two main theoretical tools that have been developed to analyze NLPs: stochastic Dynamic Programming (DP) [15, 16] and the stochastic Pontryagin Maximum Principle (PMP) [17, 18, 19] (an extensive survey of generalizations of DP and the PMP may be found in [20]). In the case of LQPs, one can show that DP and the PMP lead to SRE [20]. DP provides optimal policies through the solution of a partial differential equation, whereas the necessary conditions for optimality offered by the PMP allow one to set up a two-point boundary value problem which returns candidate locally-optimal solutions when solved. Both methods only lead to analytical solutions in a few cases, and they can involve complex numerical challenges (the stochastic setting is even more problematic than the deterministic one, the latter being better understood for a wide range of problems, see, e.g., [21, 22, 23]). This has fostered the investigation of more tractable approaches to solve NLPs such as Monte Carlo simulation [24, 25], Markov chain discretization [26, 27], and deterministic (though non-equivalent) reformulations [28, 9], among others. Importantly, many of the aforementioned approaches, e.g., [26], are often based on some sort of approximation of the original formulation and thus offer powerful alternatives to DP and the PMP, especially since they are more numerically tractable and can be shown to converge to policies satisfying DP or the stochastic PMP.

In this paper, our objective is to lay the theoretical foundations to leverage Sequential Convex Programming (SCP) for the purposes of computing candidate optimal solutions for a specific class of NLPs. SCP is among the most well-known and earliest approximation techniques for deterministic non-linear optimal control and, to the best of our knowledge, such an approach has not been extended to stochastic settings yet. The simplest SCP scheme (which we consider in this work) consists of successively linearizing any non-linear terms in the dynamics and any non-convex functions in the cost and seeking a solution to the original formulation by solving a sequence of LCPs. This approach leads to two desirable properties, which jointly are instrumental to the design of efficient numerical schemes. First, one can rely on the many efficient techniques and associated software libraries that have been devised to solve LCPs (or LQPs depending on the shape of the original NLP). Second, as we will show in this paper, when this iterative process converges, it returns a strategy that satisfies the PMP related to the original NLP, i.e., a candidate optimum for the original formulation. Unlike existing methods such as [26], which introduce approximated formulations that are still non-linear, our approach offers the main advantage of requiring the solution to LCPs only. Specifically, we identify three key contributions:

(1) We introduce and analyze a new framework to compute candidate optimal solutions for finite-horizon, finite-dimensional non-linear stochastic optimal control problems with control-affine dynamics and uncontrolled diffusion. This hinges on the basic principle of SCP, i.e., iteratively solving a sequence of LCPs that stem from successive linear approximations of the original problem.

(2) Through a meticulous study of the continuity of the stochastic Pontryagin cones of variations with respect to linearization, we prove that any accumulation point of the sequence of iterates generated by SCP is a strategy satisfying the PMP related to the original formulation. In addition, under additional assumptions we prove that at least one such accumulation point exists, which in turn provides a “weak” guarantee of success for the method.

(3) Through an explicit example, we show how to leverage the properties offered by this framework to better understand which approximations may be adopted for the design of efficient numerical schemes for NLPs, although we leave the theoretical investigation of the approximation error as a future direction.

The paper is organized as follows. Section 2 introduces notation and preliminary results and defines the stochastic optimal control problem of interest. In Section 3, we introduce the framework of stochastic SCP and the stochastic PMP, and we state our main result of convergence. For the sake of clarity, in Section 4 we retrace the proof of the stochastic PMP and introduce all the necessary technicalities we need to prove our main result of convergence, though we recall that the stochastic PMP is a well-established result and Section 4 should not be understood as part of our main contribution. In Section 6, we show how to leverage our analysis to design a practical numerical method to solve non-linear stochastic optimal control problems, and we provide numerical experiments. Finally, Section 7 provides concluding remarks and perspectives on future directions.

2. Stochastic Optimal Control Setting

Let $(\Omega,\mathcal{G},P)$ be a second-countable probability space and let $B_{t}=(B^{1}_{t},\dots,B^{d}_{t})$ be a $d$-dimensional Brownian motion with continuous sample paths starting at zero and whose filtration $\mathcal{F}\triangleq(\mathcal{F}_{t})_{t\geq 0}=\big(\sigma(B_{s}:0\leq s\leq t)\big)_{t\geq 0}$ is complete. We consider processes that are defined within bounded time intervals. Hence, for every $n\in\mathbb{N}$, $\ell\geq 2$, and maximal time $T\in\mathbb{R}_{+}$, we introduce the space $L^{\ell}_{\mathcal{F}}([0,T]\times\Omega;\mathbb{R}^{n})$ of progressive processes $x:[0,T]\times\Omega\rightarrow\mathbb{R}^{n}$ (with respect to the filtration $\mathcal{F}$) such that $\mathbb{E}\left[\int^{T}_{0}\|x(s,\omega)\|^{\ell}\,\mathrm{d}s\right]<\infty$, where $\|\cdot\|$ is the Euclidean norm. In this setting, for every $x\in L^{\ell}_{\mathcal{F}}([0,T]\times\Omega;\mathbb{R}^{n})$ and $i=1,\dots,d$, the Itô integral of $x$ with respect to $B^{i}$, denoted $\int^{\cdot}_{0}x(s)\,\mathrm{d}B^{i}_{s}:[0,T]\times\Omega\rightarrow\mathbb{R}^{n}$, is the continuous, bounded in $L^{2}$, $n$-dimensional martingale in $[0,T]$ (with respect to the filtration $\mathcal{F}$) that starts at zero. For $\ell\geq 2$, we denote by $L^{\ell}_{\mathcal{F}}(\Omega;C([0,T];\mathbb{R}^{n}))$ the space of $\mathcal{F}$-adapted processes $x:[0,T]\times\Omega\rightarrow\mathbb{R}^{n}$ that have continuous sample paths and satisfy $\mathbb{E}\left[\sup_{s\in[0,T]}\|x(s,\omega)\|^{\ell}\right]<\infty$.

2.1. Stochastic Differential Equations

From now on, we fix two integers $n,m\in\mathbb{N}$, a maximal time $T\in\mathbb{R}_{+}$, and a compact, convex subset $U\subseteq\mathbb{R}^{m}$. Although in this work we consider differential equations steered by deterministic controls (see Section 2.2 below), for the sake of generality, we introduce stochastic dynamics which depend on either deterministic or stochastic controls. Specifically, we denote by $\mathcal{U}$ the set of admissible controls and consider either deterministic controls $\mathcal{U}=L^{2}([0,T];U)$ or stochastic controls $\mathcal{U}=L^{2}_{\mathcal{F}}([0,T]\times\Omega;U)$. Note that since $U$ is compact, admissible controls are almost everywhere (a.e., for brevity) and almost surely (a.s., for brevity) bounded. We are given continuous mappings $b_{i}:\mathbb{R}\times\mathbb{R}^{n}\rightarrow\mathbb{R}^{n}$, $i=0,\dots,m$, and $\sigma_{j}:\mathbb{R}\times\mathbb{R}^{n}\rightarrow\mathbb{R}^{n}$, $j=1,\dots,d$, which are at least $C^{2}$ with respect to the variable $x$. For a given $u\in\mathcal{U}$, we consider dynamical systems modeled through the following forward stochastic differential equation with uncontrolled diffusion

\mathrm{d}x(t)=b(t,u(t),x(t))\,\mathrm{d}t+\sigma(t,x(t))\,\mathrm{d}B_{t},\quad x(0)=x^{0} \qquad (1)
\triangleq\left(b_{0}(t,x(t))+\sum^{m}_{i=1}u^{i}(t)\,b_{i}(t,x(t))\right)\mathrm{d}t+\sum^{d}_{j=1}\sigma_{j}(t,x(t))\,\mathrm{d}B^{j}_{t}

where we assume that the fixed initial condition satisfies $x^{0}\in L^{\ell}_{\mathcal{F}_{0}}(\Omega;\mathbb{R}^{n})$ for every $\ell\geq 2$ (for instance, this holds when $x^{0}$ is a deterministic vector of $\mathbb{R}^{n}$).
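For intuition, the following self-contained Python sketch simulates sample paths of an equation of the form (1) with the Euler–Maruyama scheme for $m=1$ and $d=1$. The vector fields and the control below are illustrative placeholders only; they are not taken from this paper.

import numpy as np

rng = np.random.default_rng(0)

# Placeholder 2-D example of (1): dx = (b0(t,x) + u(t) b1(t,x)) dt + sigma(t,x) dB_t.
T, N, n_paths = 1.0, 200, 1000
dt = T / N
x0 = np.array([1.0, 0.0])

b0 = lambda t, x: np.stack([-x[:, 1], x[:, 0]], axis=1)                 # uncontrolled drift
b1 = lambda t, x: np.stack([np.ones(len(x)), -x[:, 0]], axis=1)         # control vector field
sigma = lambda t, x: 0.1 * np.stack([np.sin(x[:, 0]), np.cos(x[:, 1])], axis=1)  # diffusion (d = 1)
u = lambda t: 0.5 * np.cos(2 * np.pi * t)                               # deterministic control in U = [-1, 1]

x = np.tile(x0, (n_paths, 1))
for k in range(N):
    t = k * dt
    dB = rng.normal(scale=np.sqrt(dt), size=(n_paths, 1))               # Brownian increments, one per path
    x = x + (b0(t, x) + u(t) * b1(t, x)) * dt + sigma(t, x) * dB        # Euler-Maruyama step
print("E[x(T)] ~", x.mean(axis=0))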

The procedure developed in this work is based on the following linearization of (1). For $\ell\geq 2$, let $v\in\mathcal{U}$ and $y\in L^{\ell}_{\mathcal{F}}(\Omega;C([0,T];\mathbb{R}^{n}))$. For a given $u\in\mathcal{U}$, we define the linearization of (1) around $(v,y)$ to be the following well-defined forward stochastic differential equation with uncontrolled diffusion

\mathrm{d}x(t)=b_{v,y}(t,u(t),x(t))\,\mathrm{d}t+\sigma_{y}(t,x(t))\,\mathrm{d}B_{t},\quad x(0)=x^{0} \qquad (2)
\triangleq\left(b_{0}(t,y(t))+\frac{\partial b_{0}}{\partial x}(t,y(t))(x(t)-y(t))+\sum^{m}_{i=1}\left(u^{i}(t)\,b_{i}(t,y(t))+v^{i}(t)\,\frac{\partial b_{i}}{\partial x}(t,y(t))(x(t)-y(t))\right)\right)\mathrm{d}t
+\sum^{d}_{j=1}\left(\sigma_{j}(t,y(t))+\frac{\partial\sigma_{j}}{\partial x}(t,y(t))(x(t)-y(t))\right)\mathrm{d}B^{j}_{t}.
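If the linearized coefficients in (2) are needed numerically, only the Jacobians $\partial b_{i}/\partial x$ and $\partial\sigma_{j}/\partial x$ along the reference pair $(v,y)$ are required. The sketch below is illustrative only: it assembles $b_{v,y}$ and $\sigma_{y}$ from user-supplied vector fields, with finite-difference Jacobians standing in for exact derivatives.

import numpy as np

def jacobian(f, x, eps=1e-6):
    # Central finite-difference Jacobian of f: R^n -> R^n at x (illustrative helper).
    n = x.size
    J = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n); e[i] = eps
        J[:, i] = (f(x + e) - f(x - e)) / (2 * eps)
    return J

def linearized_drift(t, x, u, v_t, y_t, b_funcs):
    # Drift b_{v,y}(t,u,x) of (2); b_funcs = [b0, b1, ..., bm], each mapping (t, x) -> R^n.
    b0 = b_funcs[0]
    out = b0(t, y_t) + jacobian(lambda z: b0(t, z), y_t) @ (x - y_t)
    for i, bi in enumerate(b_funcs[1:]):
        out += u[i] * bi(t, y_t) + v_t[i] * (jacobian(lambda z: bi(t, z), y_t) @ (x - y_t))
    return out

def linearized_diffusion(t, x, y_t, sigma_funcs):
    # Columns sigma_{j,y}(t,x) of (2); sigma_funcs = [sigma_1, ..., sigma_d], each (t, x) -> R^n.
    cols = [sj(t, y_t) + jacobian(lambda z: sj(t, z), y_t) @ (x - y_t) for sj in sigma_funcs]
    return np.stack(cols, axis=1)   # shape (n, d)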

We require the solutions to (1) and to (2) to be bounded in expectation uniformly with respect to $u$, $v$, and $y$. For this, we consider the following (standard) assumption:

$(A_{1})$ The functions $b_{i}$, $i=0,\dots,m$, and $\sigma_{j}$, $j=1,\dots,d$, have compact supports in $[0,T]\times\mathbb{R}^{n}$.

Under $(A_{1})$, for every $\ell\geq 2$ and every $u\in\mathcal{U}$, the stochastic equation (1) has a unique (up to stochastic indistinguishability) solution $x_{u}\in L^{\ell}_{\mathcal{F}}(\Omega;C([0,T];\mathbb{R}^{n}))$, whereas for every $u,v\in\mathcal{U}$ and $y\in L^{\ell}_{\mathcal{F}}(\Omega;C([0,T];\mathbb{R}^{n}))$, the stochastic equation (2) has a unique (up to stochastic indistinguishability) solution $x_{u,v,y}\in L^{\ell}_{\mathcal{F}}(\Omega;C([0,T];\mathbb{R}^{n}))$ (see, e.g., [29, 30]). Moreover, the following technical result holds; it will be used in the proof of our convergence result, and its proof is given in the appendix (see Section 7.1).

Lemma 2.1 (Uniform boundedness and continuity with respect to control inputs). Fix $\ell\geq 2$, and for $u\in\mathcal{U}$, let $x_{u}$ denote the solution to (1), whereas for $u,v\in\mathcal{U}$ and $y\in L^{\ell}_{\mathcal{F}}(\Omega;C([0,T];\mathbb{R}^{n}))$, let $x_{u,v,y}$ denote the solution to (2). Under $(A_{1})$, there exists a constant $C\geq 0$, which does not depend on $u$, $v$, or $y$, such that

\mathbb{E}\left[\sup_{t\in[0,T]}\|x_{u}(t)\|^{\ell}+\sup_{t\in[0,T]}\|x_{u,v,y}(t)\|^{\ell}\right]\leq C,
\mathbb{E}\left[\sup_{t\in[0,T]}\|x_{u_{1}}(t)-x_{u_{2}}(t)\|^{\ell}+\sup_{t\in[0,T]}\|x_{u_{1},v,y}(t)-x_{u_{2},v,y}(t)\|^{\ell}\right]\leq C\,\mathbb{E}\left[\left(\int^{T}_{0}\|u_{1}(s)-u_{2}(s)\|\,\mathrm{d}s\right)^{\ell}\right],\quad u_{1},u_{2}\in\mathcal{U}.

2.2. Stochastic Optimal Control Problem

Given $q\in\mathbb{N}$, we consider continuous mappings $g:\mathbb{R}^{n}\rightarrow\mathbb{R}^{q}$, $G:\mathbb{R}^{m}\rightarrow\mathbb{R}$, $H:\mathbb{R}^{n}\rightarrow\mathbb{R}$, and $L:\mathbb{R}\times\mathbb{R}^{m}\times\mathbb{R}^{n}\rightarrow\mathbb{R}$ with $L(t,x,u)\triangleq L_{0}(t,x)+\sum^{m}_{i=1}u^{i}L_{i}(t,x)$. We require $g$, $H$, and $L_{i}$, $i=0,\dots,m$, to be at least $C^{2}$ with respect to the variable $x$, and we require $G$ and $H$ to be convex. In particular, $G$ is Lipschitz when restricted to the compact and convex set $U$. We focus on finite-horizon, finite-dimensional non-linear stochastic Optimal Control Problems (OCP) with control-affine dynamics and uncontrolled diffusion, of the form

\begin{cases}
\displaystyle\min_{u\in\mathcal{U}}\ \mathbb{E}\left[\int^{T}_{0}f^{0}(s,u(s),x(s))\,\mathrm{d}s\right]\triangleq\mathbb{E}\left[\int^{T}_{0}\Big(G(u(s))+H(x(s))+L(s,u(s),x(s))\Big)\,\mathrm{d}s\right]\\[4pt]
\mathrm{d}x(t)=b(t,u(t),x(t))\,\mathrm{d}t+\sigma(t,x(t))\,\mathrm{d}B_{t},\quad x(0)=x^{0},\quad\mathbb{E}\left[g(x(T))\right]=0
\end{cases}

where we optimize over deterministic controls $u\in\mathcal{U}=L^{2}([0,T];U)$. We adopt the (fairly mild) assumption:

$(A_{2})$ Mappings $g$, $H$, and $L_{i}$, $i=0,\dots,m$, either are affine-in-state or have compact supports in $\mathbb{R}^{n}$ and in $[0,T]\times\mathbb{R}^{n}$, respectively.

Our choice of optimizing over deterministic controls as opposed to stochastic controls is motivated by practical considerations. Specifically, in several applications of interest ranging from aerospace to robotics, it is often advantageous to compute and implement simpler deterministic controls at higher control rates, to be able to quickly react to external disturbances, unmodeled dynamical effects, and changes in the cost function (e.g., moving obstacles that a robot should avoid in real-time). Moreover, in cases where a feedback controller is accounted for, a common and efficient approach entails decomposing the stochastic control into a state-dependent feedback term, plus a nominal deterministic control trajectory to be optimized for, which is equivalent to adopting deterministic controls (see, e.g., [31, 32]). Nevertheless, for the sake of completeness and generality, in Section 5 we introduce appropriate conditions under which our method may extend to stochastic controls; we also analyze possible extensions to the case of free-final-time optimal control problems.

Many applications of interest often involve state constraints. In this case, to make sure the procedure developed in this work still applies, every such constraint needs to be considered in expectation and penalized within the cost of OCP (for example by including those contributions in HH or LL through some penalization function). In future work, we plan to extend our method to more general settings, whereby for instance state constraints are enforced through more accurate chance constraints (see Section 7 for a thorough discussion).
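As a concrete illustration of this penalization idea, the sketch below estimates, by Monte Carlo, a running cost in which a scalar state constraint $c(x)\leq 0$ has been folded into the state cost through a quadratic penalty. The penalty shape, the constraint, and the numerical data are assumptions made for the example only and are not choices made in this paper.

import numpy as np

def penalized_running_cost(x_samples, u, t, G, H, L, c, rho=100.0):
    # Monte Carlo estimate of E[G(u) + H(x) + rho * max(0, c(x))^2 + L(t, u, x)] at a fixed time t,
    # where the state constraint c(x) <= 0 is handled by an expectation penalty inside the state cost.
    penalty = rho * np.maximum(0.0, c(x_samples)) ** 2
    return G(u) + np.mean(H(x_samples) + penalty + L(t, u, x_samples))

# Placeholder data: keep E[||x||^2] small while encouraging x_1 <= 0.8.
x_samples = np.random.default_rng(1).normal(size=(1000, 2))      # samples of x(t)
cost = penalized_running_cost(
    x_samples, u=np.array([0.2]), t=0.0,
    G=lambda u: float(u @ u),
    H=lambda x: np.sum(x ** 2, axis=1),
    L=lambda t, u, x: np.zeros(len(x)),
    c=lambda x: x[:, 0] - 0.8,
)
print("penalized running cost ~", cost)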

3. Stochastic Sequential Convex Programming

We propose the following framework to solve OCP, based on the classical SCP methodology. Starting from some initial guess of the control $u_{0}\in\mathcal{U}$ and of the trajectory $x_{0}\in L^{\ell}_{\mathcal{F}}(\Omega;C([0,T];\mathbb{R}^{n}))$, $\ell\geq 2$, we inductively define a sequence of stochastic linear-convex problems whose dynamics and costs stem from successive linearizations of the mappings $b$, $\sigma$, and $L$, and we successively solve those problems while updating user-defined parameters. The convergence of the method generally depends on a good choice of the initial guess $(u_{0},x_{0})$ and of the updates applied at each iteration to the trust-region constraints. These constraints are added to make the successive linearizations of OCP well-posed. Below, we detail this procedure.

3.1. The Method

At iteration $k+1\in\mathbb{N}$, by denoting

f^{0}_{v,y}(s,u,x)\triangleq G(u)+H(x)+L(s,u,y)+\frac{\partial L}{\partial x}(s,v,y)(x-y), \qquad (3)

we define the following stochastic Linearized Optimal Control Problem (LOCP$^{\Delta}_{k+1}$)

\begin{cases}
\displaystyle\min_{u\in\mathcal{U}}\ \mathbb{E}\left[\int^{T}_{0}f^{0}_{k+1}(s,u(s),x(s))\,\mathrm{d}s\right]\triangleq\mathbb{E}\left[\int^{T}_{0}f^{0}_{u_{k},x_{k}}(s,u(s),x(s))\,\mathrm{d}s\right]\\[4pt]
\mathrm{d}x(t)=b_{k+1}(t,u(t),x(t))\,\mathrm{d}t+\sigma_{k+1}(t,x(t))\,\mathrm{d}B_{t}\triangleq b_{u_{k},x_{k}}(t,u(t),x(t))\,\mathrm{d}t+\sigma_{x_{k}}(t,x(t))\,\mathrm{d}B_{t},\quad x(0)=x^{0}\\[4pt]
\mathbb{E}\left[g_{k+1}(x(T))\right]\triangleq\mathbb{E}\left[g(x_{k}(T))+\dfrac{\partial g}{\partial x}(x_{k}(T))(x(T)-x_{k}(T))\right]=0\\[4pt]
\displaystyle\int^{T}_{0}\mathbb{E}\left[\|x(s)-x_{k}(s)\|^{2}\right]\mathrm{d}s\leq\Delta_{k+1}
\end{cases}

where we optimize over deterministic controls $u\in\mathcal{U}=L^{2}([0,T];U)$. The tuple $(u_{k},x_{k})\in\mathcal{U}\times L^{\ell}_{\mathcal{F}}(\Omega;C([0,T];\mathbb{R}^{n}))$ is defined inductively and denotes a solution to LOCP$^{\Delta}_{k}$.

Each problem LOCP$^{\Delta}_{k+1}$ consists of linearizing OCP around the solution $(u_{k},x_{k})$ computed at the previous iteration, starting from $(u_{0},x_{0})$. To avoid misguidance due to high linearization error, we must restrict the search for optimal solutions of LOCP$^{\Delta}_{k+1}$ to neighborhoods of $(u_{k},x_{k})$. This is achieved by the last constraint listed in LOCP$^{\Delta}_{k+1}$, referred to as the trust-region constraint, where the constant $\Delta_{k+1}\geq 0$ is the trust-region radius. No such constraint is enforced on controls, since those appear linearly in $b$, $\sigma$, and $L$. To ensure well-posedness of the sequence (LOCP$^{\Delta}_{k}$)$_{k\in\mathbb{N}}$, we require that each problem LOCP$^{\Delta}_{k+1}$ have a solution, for which we consider the following assumption:

$(A_{3})$ For every $k\in\mathbb{N}$, problem LOCP$^{\Delta}_{k+1}$ is feasible.

Proposition 3.1 (Existence of solutions of LOCP$^{\Delta}_{k+1}$). Under $(A_{3})$, LOCP$^{\Delta}_{k+1}$ has a solution for every $k\in\mathbb{N}$.

Proof.

The proof is based on the argument developed to prove [20, Theorem 5.2, Chapter 2]. Specifically, let $k\in\mathbb{N}$ and let $\{(u^{\alpha}_{k+1},x^{\alpha}_{k+1})\}_{\alpha\in\mathbb{N}}\in\mathcal{U}\times L^{\ell}_{\mathcal{F}}(\Omega;C([0,T];\mathbb{R}^{n}))$ be a minimizing sequence for LOCP$^{\Delta}_{k+1}$.

The sequence $\{u^{\alpha}_{k+1}\}_{\alpha\in\mathbb{N}}$ is uniformly bounded in $L^{2}([0,T];\mathbb{R}^{m})$, and thus we may assume the existence of $\tilde{u}_{k+1}\in L^{2}([0,T];\mathbb{R}^{m})$ such that $\{u^{\alpha}_{k+1}\}_{\alpha\in\mathbb{N}}$ converges, up to some subsequence, to $\tilde{u}_{k+1}$ for the weak topology of $L^{2}([0,T];\mathbb{R}^{m})$. Moreover, the compactness and convexity of $U$ yield $\tilde{u}_{k+1}\in\mathcal{U}$. Finally, by Mazur's theorem there exists a sequence of convex combinations

\tilde{u}^{\alpha}_{k+1}\triangleq\sum_{\beta\geq 1}c^{k+1}_{\alpha,\beta}\,u^{\alpha+\beta}_{k+1},\quad c^{k+1}_{\alpha,\beta}\geq 0,\quad\sum_{\beta\geq 1}c^{k+1}_{\alpha,\beta}=1,

such that $\{\tilde{u}^{\alpha}_{k+1}\}_{\alpha\in\mathbb{N}}$ converges to $\tilde{u}_{k+1}$ for the strong topology of $L^{2}([0,T];\mathbb{R}^{m})$ (although, since controls are deterministic and the diffusion is uncontrolled, weak convergence of controls would suffice in our setting).

Thanks to Lemma 2.1, the sequence of trajectories $\{x_{\tilde{u}^{\alpha}_{k+1},u_{k},x_{k}}\}_{\alpha\in\mathbb{N}}\subseteq L^{\ell}_{\mathcal{F}}(\Omega;C([0,T];\mathbb{R}^{n}))$ converges to the trajectory $\tilde{x}_{k+1}\triangleq x_{\tilde{u}_{k+1},u_{k},x_{k}}\in L^{\ell}_{\mathcal{F}}(\Omega;C([0,T];\mathbb{R}^{n}))$ for the strong topology of $L^{\ell}_{\mathcal{F}}(\Omega;C([0,T];\mathbb{R}^{n}))$. At this step, for every $\alpha\in\mathbb{N}$, thanks to the linearity of (2) we obtain that

x_{\tilde{u}^{\alpha}_{k+1},u_{k},x_{k}}\equiv\sum_{\beta\geq 1}c^{k+1}_{\alpha,\beta}\,x_{u^{\alpha+\beta}_{k+1},u_{k},x_{k}}=\sum_{\beta\geq 1}c^{k+1}_{\alpha,\beta}\,x^{\alpha+\beta}_{k+1},

and therefore the linearity of the function $g_{k+1}$ yields

\mathbb{E}\left[g_{k+1}(x_{\tilde{u}^{\alpha}_{k+1},u_{k},x_{k}}(T))\right]=\sum_{\beta\geq 1}c^{k+1}_{\alpha,\beta}\,\mathbb{E}\left[g_{k+1}(x^{\alpha+\beta}_{k+1}(T))\right]=0,

whereas the convexity of the norm yields

\left(\int^{T}_{0}\mathbb{E}\left[\|x_{\tilde{u}^{\alpha}_{k+1},u_{k},x_{k}}(s)-x_{k}(s)\|^{2}\right]\mathrm{d}s\right)^{\frac{1}{2}}\leq\sum_{\beta\geq 1}c^{k+1}_{\alpha,\beta}\left(\int^{T}_{0}\mathbb{E}\left[\|x^{\alpha+\beta}_{k+1}(s)-x_{k}(s)\|^{2}\right]\mathrm{d}s\right)^{\frac{1}{2}}\leq\Delta^{\frac{1}{2}}_{k+1}.

In turn, passing to the limit for $\alpha\rightarrow\infty$, we infer that the tuple $(\tilde{u}_{k+1},\tilde{x}_{k+1})\in\mathcal{U}\times L^{\ell}_{\mathcal{F}}(\Omega;C([0,T];\mathbb{R}^{n}))$ is admissible for LOCP$^{\Delta}_{k+1}$. In addition, the convexity of the function $f^{0}_{k+1}$ allows us to compute

\mathbb{E}\left[\int^{T}_{0}f^{0}_{k+1}(s,\tilde{u}_{k+1}(s),\tilde{x}_{k+1}(s))\,\mathrm{d}s\right]\leq\lim_{\alpha\rightarrow\infty}\ \sum_{\beta\geq 1}c^{k+1}_{\alpha,\beta}\,\mathbb{E}\left[\int^{T}_{0}f^{0}_{k+1}(s,u^{\alpha+\beta}_{k+1}(s),x^{\alpha+\beta}_{k+1}(s))\,\mathrm{d}s\right]=\min_{u\in\mathcal{U}}\ \mathbb{E}\left[\int^{T}_{0}f^{0}_{k+1}(s,u(s),x(s))\,\mathrm{d}s\right],

from which we conclude that $(\tilde{u}_{k+1},\tilde{x}_{k+1})\in\mathcal{U}\times L^{\ell}_{\mathcal{F}}(\Omega;C([0,T];\mathbb{R}^{n}))$ is a solution of LOCP$^{\Delta}_{k+1}$.

Remark. Although one finds empirically that $(A_{3})$ is often satisfied in practice (see for instance the results in Section 6), it is generally difficult to derive sufficient conditions for this assumption to hold true a priori. In particular, to the best of our knowledge, only stochastic dynamics with constant coefficients and with specific regularity conditions on the stochastic diffusion yield controllability [33]. In deterministic settings, some works obtain feasibility of each convex subproblem, though only locally around the unknown solution of the original formulation, by assuming second-order regularity conditions which cannot be checked a priori [34, 35]. Thus, even SCP-based schemes in simpler deterministic settings must often either explicitly assume the feasibility of each convex subproblem, or modify the linearized dynamics by introducing additional slack controls to force feasibility [36]. Motivated by these remarks, we leave the investigation of sufficient conditions for the validity of $(A_{3})$ as a direction of future research, as it is out of the scope of this work, which focuses on the properties of accumulation points of the sequence generated by SCP. Note that $(A_{3})$ is satisfied when $g=0$, i.e., when no final constraints are imposed: a problem that remains generally relevant though computationally difficult to solve.

Under assumptions $(A_{1})$–$(A_{3})$, the method consists of iteratively solving the aforementioned linearized problems while updating the sequence of trust-region radii, producing a sequence of tuples $(u_{k},x_{k})_{k\in\mathbb{N}}$ such that, for each $k\in\mathbb{N}$, $(u_{k+1},x_{k+1})$ solves LOCP$^{\Delta}_{k+1}$. The user may often steer this procedure to convergence (with respect to appropriate topologies) by adequately selecting an initial guess $(u_{0},x_{0})$ and an update rule for $(\Delta_{k})_{k\in\mathbb{N}}$ (when state-constraint penalization is adopted, SCP procedures may also consider update rules for penalization weights, and those must be provided together with the update rules for the trust-region radii; further details can be found in [37, 38]); appropriate choices will be described in Section 6. Assuming that an accumulation point of $(u_{k},x_{k})_{k\in\mathbb{N}}$ can be found (whose existence will be discussed shortly), our objective consists of proving that it is a candidate locally-optimal solution to the original formulation OCP. Specifically, we show that any accumulation point of $(u_{k},x_{k})_{k\in\mathbb{N}}$ satisfies the stochastic PMP related to OCP. To develop such an analysis, we require the absence of state constraints, and in particular of trust-region constraints.
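The iteration just described can be summarized by the following schematic Python loop. It is a sketch under stated assumptions: `solve_locp` stands for any user-supplied routine that solves the convex subproblem (for instance, via the deterministic transcription of Section 6) and is not an API provided in this paper, and the geometric shrinking of the trust-region radius is just one simple update rule among those discussed in Section 6.

import numpy as np

def control_distance(u1, u2):
    # L^2-type distance between two controls sampled on a common time grid (simple proxy).
    return float(np.sqrt(np.mean((np.asarray(u1) - np.asarray(u2)) ** 2)))

def stochastic_scp(solve_locp, u0, x0, delta0, n_iter=20, shrink=0.5, tol=1e-4):
    # Schematic SCP loop: linearize around the previous iterate, solve the convex subproblem,
    # shrink the trust-region radius, and repeat.  solve_locp(u_ref, x_ref, delta) is assumed
    # to return a solution (u, x, cost) of the subproblem built around (u_ref, x_ref).
    u, x, delta = u0, x0, delta0
    costs = []
    for _ in range(n_iter):
        u_new, x_new, cost = solve_locp(u, x, delta)   # convex subproblem, assumed feasible (cf. (A3))
        costs.append(cost)
        done = control_distance(u_new, u) < tol        # stop once the control iterates stabilize
        u, x = u_new, x_new
        if done:
            break
        delta *= shrink                                # trust-region radii shrinking to zero
    return u, x, costs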

3.2. Stochastic Pontryagin Maximum Principle

In this section, we recall the statement of the PMP, which provides classical first-order necessary conditions for optimality and upon which we will establish our main result. For the sake of clarity, we introduce the PMP related to OCP and the PMP related to each convexified problem LOCP$^{\Delta}_{k}$ separately.

3.2.1. PMP related to OCP

For every $p\in\mathbb{R}^{n}$, $p^{0}\in\mathbb{R}$, and $q=(q_{1},\dots,q_{d})\in\mathbb{R}^{n\times d}$, define the Hamiltonian (same notation as in (1))

H(s,u,x,p,p^{0},q)\triangleq p^{\top}b(s,x,u)+p^{0}f^{0}(s,x,u)+\sum^{d}_{i=1}q^{\top}_{i}\sigma_{i}(s,x).
Theorem 3.2.1 (Stochastic Pontryagin Maximum Principle for OCP [20]). Let $(u,x)$ be a locally-optimal solution to OCP. There exist $p\in L^{2}_{\mathcal{F}}(\Omega;C([0,T];\mathbb{R}^{n}))$, a tuple $(\mathfrak{p},p^{0})$, where $\mathfrak{p}\in\mathbb{R}^{q}$ and $p^{0}\leq 0$ are constant, and $q=(q_{1},\dots,q_{d})\in L^{2}_{\mathcal{F}}([0,T]\times\Omega;\mathbb{R}^{n\times d})$ such that the following relations are satisfied:

(1) Non-Triviality Condition: $(\mathfrak{p},p^{0})\neq 0$.

(2) Adjoint Equation:

\mathrm{d}p(t)=-\frac{\partial H}{\partial x}(t,u(t),x(t),p(t),p^{0},q(t))\,\mathrm{d}t+q(t)\,\mathrm{d}B_{t},\quad p(T)=\mathbb{E}\left[\frac{\partial g}{\partial x}(x(T))\right]^{\top}\mathfrak{p}\in\mathbb{R}^{n}.
(3) Maximality Condition:

u(t)=\underset{v\in U}{\arg\max}\ \mathbb{E}\Big[H(t,v,x(t),p(t),p^{0},q(t))\Big],\quad\text{a.e.}

The quantity $(u,\mathfrak{p},p^{0})$ uniquely determines $x$, $p$, and $q$ and is called an extremal for OCP (associated with the tuple $(u,x,p,\mathfrak{p},p^{0},q)$, or simply with $(u,x)$). An extremal $(u,\mathfrak{p},p^{0})$ is called normal if $p^{0}\neq 0$.

Although final conditions, rather than initial conditions, are specified for $p$, processes satisfying backward stochastic differential equations are adapted with respect to the filtration $\mathcal{F}$ (see, e.g., [30, 20]), which makes the adjoint equation well-posed. Although conditions for optimality for stochastic optimal control problems are usually developed when considering stochastic controls only, the proof of Theorem 3.2.1 (and of Theorem 3.2.2 below) readily follows from classical arguments (e.g., see [20, Chapter 3.6]). Nevertheless, to prove our main result, we rely on a proof of Theorem 3.2.1 (and of Theorem 3.2.2 below) which stems from implicit-function-theorem-type results (see Sections 4.1 and 4.2). Thus, in Section 4.1, we provide a new proof of Theorem 3.2.1 (more precisely, of Theorem 3.2.2 below) which follows the original idea developed by Pontryagin and his group (see, e.g., [17, 39, 40]), though the latter result should not be understood as part of our main contribution.

3.2.2. PMP related to LOCP$^{\Delta}_{k}$

For every $k\geq 1$, $p\in\mathbb{R}^{n}$, $p^{0},p^{1}\in\mathbb{R}$, and $q=(q_{1},\dots,q_{d})\in\mathbb{R}^{n\times d}$, define the Hamiltonian (same notation as in (2))

H_{k}(s,u,x,p,p^{0},p^{1},q)\triangleq p^{\top}b_{k}(s,x,u)+p^{0}f^{0}_{k}(s,x,u)+p^{1}\|x-x_{k-1}(s)\|^{2}+\sum^{d}_{i=1}q^{\top}_{i}(\sigma_{k})_{i}(s,x).
Theorem 3.2.2 (Weak Stochastic Pontryagin Maximum Principle for LOCP$^{\Delta}_{k}$ [20]). Let $(u_{k},x_{k})$ be a locally-optimal solution to LOCP$^{\Delta}_{k}$. There exist $p_{k}\in L^{2}_{\mathcal{F}}(\Omega;C([0,T];\mathbb{R}^{n}))$, a tuple $(\mathfrak{p}_{k},p^{0}_{k},p^{1}_{k})$, where $\mathfrak{p}_{k}\in\mathbb{R}^{q}$, $p^{0}_{k}\leq 0$, and $p^{1}_{k}\in\mathbb{R}$ are constant, and $q_{k}=((q_{k})_{1},\dots,(q_{k})_{d})\in L^{2}_{\mathcal{F}}([0,T]\times\Omega;\mathbb{R}^{n\times d})$ such that the following relations are satisfied:

(1) Non-Triviality Condition: $(\mathfrak{p}_{k},p^{0}_{k},p^{1}_{k})\neq 0$.

(2) Adjoint Equation:

\mathrm{d}p_{k}(t)=-\frac{\partial H_{k}}{\partial x}(t,u_{k}(t),x_{k}(t),p_{k}(t),p^{0}_{k},p^{1}_{k},q_{k}(t))\,\mathrm{d}t+q_{k}(t)\,\mathrm{d}B_{t},\quad p_{k}(T)=\mathbb{E}\left[\frac{\partial g_{k}}{\partial x}(x_{k}(T))\right]^{\top}\mathfrak{p}_{k}\in\mathbb{R}^{n}.
(3) Maximality Condition:

u_{k}(t)=\underset{v\in U}{\arg\max}\ \mathbb{E}\Big[H_{k}(t,v,x_{k}(t),p_{k}(t),p^{0}_{k},p^{1}_{k},q_{k}(t))\Big],\quad\text{a.e.}

The quantity $(u_{k},\mathfrak{p}_{k},p^{0}_{k},p^{1}_{k})$ uniquely determines $x_{k}$, $p_{k}$, and $q_{k}$ and is called an extremal for LOCP$^{\Delta}_{k}$ (associated with $(u_{k},x_{k},p_{k},\mathfrak{p}_{k},p^{0}_{k},p^{1}_{k},q_{k})$, or simply with $(u_{k},x_{k})$). An extremal $(u_{k},\mathfrak{p}_{k},p^{0}_{k},p^{1}_{k})$ is called normal if $p^{0}_{k}\neq 0$.

Remark. By introducing the new variable

y(t)\triangleq\int^{t}_{0}\|x(s)-x_{k}(s)\|^{2}\,\mathrm{d}s, \qquad (4)

the trust-region constraint in LOCP$^{\Delta}_{k+1}$ can be rewritten as $\mathbb{E}[y(T)-\Delta_{k+1}]\leq 0$. Thus, by leveraging this transformation, LOCP$^{\Delta}_{k+1}$ may be reformulated as a standard stochastic optimal control problem with final inequality constraints. Nevertheless, although the conditions listed in Theorem 3.2.1 are essentially sharp, the statement of Theorem 3.2.2 may be strengthened as follows. If $(u_{k+1},x_{k+1},\mathfrak{p}_{k+1},p^{0}_{k+1},p^{1}_{k+1},q_{k+1})$ is an extremal for LOCP$^{\Delta}_{k+1}$, one can additionally prove that $p^{1}_{k+1}\leq 0$ (e.g., see [41]; note that in [41] the multipliers have opposite signs because a different convention is adopted) and

p^{1}_{k+1}\,\mathbb{E}\left[\int^{T}_{0}\|x_{k+1}(s)-x_{k}(s)\|^{2}\,\mathrm{d}s-\Delta_{k+1}\right]=0

(the latter is known as the slackness condition), which motivates the name “weak stochastic PMP for LOCP$^{\Delta}_{k+1}$” for Theorem 3.2.2. Nevertheless, since the trust-region constraints do not appear in the original problem, we do not need to leverage these latter additional conditions on $p^{1}_{k+1}$ to prove our claims, i.e., Theorem 3.2.2 suffices to establish the aforementioned properties of accumulation points for SCP when applied to solve OCP.

3.3. Main Results

Our contribution is twofold. First, under very mild assumptions, we prove that any accumulation point of the sequence of iterates generated by SCP is a candidate locally-optimal solution for OCP. Specifically, we prove that any accumulation point of the sequence $(u_{k},x_{k},\mathfrak{p}_{k},p^{0}_{k},p^{1}_{k},q_{k})_{k\in\mathbb{N}}$ generated by SCP, where $(u_{k},\mathfrak{p}_{k},p^{0}_{k},p^{1}_{k})$ is an extremal for LOCP$^{\Delta}_{k}$ associated with $(u_{k},x_{k})$ in the sense of Theorem 3.2.2, is an extremal for OCP in the sense of Theorem 3.2.1 (see Theorem 3.3.1). Although the optimization community is aware that establishing the convergence of the sequence generated by SCP is generally difficult (see, e.g., [42, 43]), one can often prove optimality-related properties of its accumulation points, a natural property which justifies the use of SCP (see, e.g., [43]). Second, by strengthening our original assumptions, we prove the existence of at least one accumulation point of the aforementioned sequence $(u_{k},x_{k},\mathfrak{p}_{k},p^{0}_{k},p^{1}_{k},q_{k})_{k\in\mathbb{N}}$ generated by SCP (see Theorem 3.3.2). Below, we organize these claims in two more precise statements.

3.3.1. Properties of Accumulation Points for SCP

Theorem 3.3.1 (Properties of Accumulation Points for SCP). Assume that $(A_{1})$–$(A_{3})$ hold and that SCP generates a sequence $(\Delta_{k},u_{k},x_{k})_{k\in\mathbb{N}}$ such that $(\Delta_{k})_{k\in\mathbb{N}}\subseteq\mathbb{R}_{+}\setminus\{0\}$ converges to zero and, for every $k\geq 1$, the tuple $(u_{k},x_{k})$ locally solves LOCP$^{\Delta}_{k}$. For every $k\geq 1$, letting $(u_{k},\mathfrak{p}_{k},p^{0}_{k},p^{1}_{k})$ be an extremal associated with $(u_{k},x_{k})$ for LOCP$^{\Delta}_{k}$ (whose existence is ensured by Theorem 3.2.2), assume the following Accumulation Condition holds:

(AC) Up to some subsequence, $(u_{k},\mathfrak{p}_{k},p^{0}_{k},p^{1}_{k})$ converges to some $(u,\mathfrak{p},p^{0},p^{1})\in L^{2}([0,T];\mathbb{R}^{m})\times\mathbb{R}^{q+2}$ for the weak topology of $L^{2}([0,T];\mathbb{R}^{m})\times\mathbb{R}^{q+2}$.

If $(\mathfrak{p},p^{0})\neq 0$, then $(u,\mathfrak{p},p^{0})$ is an extremal for OCP associated with $(u,x_{u})$.

The guarantees offered by Theorem 3.3.1 read as follows. Under $(A_{1})$–$(A_{3})$ and by selecting a sequence of trust-region radii shrinking to zero, if iteratively solving the problems LOCP$^{\Delta}_{k}$ returns a sequence of strategies whose extremals satisfy (AC) with a non-trivial multiplier $(\mathfrak{p},p^{0})\neq 0$, then SCP finds a candidate (local) solution to OCP. Theorem 3.3.1 extends classical results on the well-posedness of SCP (see, e.g., [43]) from deterministic to stochastic settings. In particular, the requirement $(\mathfrak{p},p^{0})\neq 0$ in Theorem 3.3.1 is natural and has an equivalent in deterministic settings, playing the role of a qualification condition (see, e.g., [43, Theorem 3.4]). Note that the requirement $(\mathfrak{p},p^{0})\neq 0$ can easily be checked numerically (see our discussion after Theorem 3.3.2).

3.3.2. Existence of Accumulation Points for SCP

Assumptions $(A_{1})$–$(A_{3})$, together with some minor requirements, are sufficient to establish that any accumulation point of the sequence of iterates $(u_{k},x_{k})_{k\in\mathbb{N}}$ satisfies the stochastic PMP related to OCP (Theorem 3.3.1). We can additionally infer the existence of accumulation points, i.e., that (AC) in Theorem 3.3.1 holds true, if some more structure is assumed on the data defining OCP. Importantly, the validity of (AC) endows stochastic SCP with the additional guarantee that at least one accumulation point (with respect to weak topologies) exists, which in turn provides a “weak” guarantee of success for the method via the result of Theorem 3.3.1 (“weak” because convergence holds up to some subsequence). We introduce the following technical condition:

$(A_{4})$ The mapping $G:\mathbb{R}^{m+1}\rightarrow\mathbb{R}$ is given by $G(t,u)=u^{\top}\mathbb{U}(t)u$, where each $\mathbb{U}(t)\in\mathbb{R}^{m\times m}$ is symmetric positive definite and the mapping $t\mapsto\mathbb{U}(t)^{-1}$ is continuous.

The use we make of $(A_{4})$ is essentially contained in the following result.

Lemma 3.3.2. Under $(A_{4})$, for every $k\in\mathbb{N}$, every normal extremal $(u_{k+1},\mathfrak{p}_{k+1},p^{0}_{k+1},p^{1}_{k+1})$ for LOCP$^{\Delta}_{k+1}$, i.e., such that $p^{0}_{k+1}\neq 0$, is such that the corresponding control $u_{k+1}$ is time-continuous.

Proof.

Fix $k\in\mathbb{N}$, and let $(u_{k+1},\mathfrak{p}_{k+1},p^{0}_{k+1},p^{1}_{k+1})$ be a normal extremal for LOCP$^{\Delta}_{k+1}$. Due to $p^{0}_{k+1}\neq 0$, the convexity of $U\subseteq\mathbb{R}^{m}$ and the maximality condition in Theorem 3.2.2 yield

u_{k+1}(s)=\begin{cases}\dfrac{1}{2p^{0}_{k+1}}\mathbb{U}^{-1}(s)\gamma_{k}(s)&\text{if}\quad\dfrac{1}{2p^{0}_{k+1}}\mathbb{U}^{-1}(s)\gamma_{k}(s)\in U,\\[6pt]\textnormal{Proj}_{U}\left(\dfrac{1}{2p^{0}_{k+1}}\mathbb{U}^{-1}(s)\gamma_{k}(s)\right)&\text{if}\quad\dfrac{1}{2p^{0}_{k+1}}\mathbb{U}^{-1}(s)\gamma_{k}(s)\notin U,\end{cases}

where we denote $\gamma_{k}(s)\triangleq\big((\gamma_{k}(s))_{1},\dots,(\gamma_{k}(s))_{m}\big)$ with $(\gamma_{k}(s))_{i}\triangleq\mathbb{E}\big[p^{\top}_{k+1}(s)b_{i}(s,x_{k}(s))+p^{0}_{k+1}L_{i}(s,x_{k}(s))\big]$, whereas $\textnormal{Proj}_{U}:\mathbb{R}^{m}\rightarrow\mathbb{R}^{m}$ denotes the projection onto the convex set $U$. The claim readily follows once we prove that the mappings $t\in[0,T]\mapsto\gamma_{k}(t)$ are continuous, given that $t\in[0,T]\mapsto\mathbb{U}^{-1}(t)$ and $v\mapsto\textnormal{Proj}_{U}(v)$ are continuous. We only prove the continuity of $t\in[0,T]\mapsto\mathbb{E}[p_{k}(t)]$, given that the continuity of $t\in[0,T]\mapsto\gamma_{k}(t)$ can be proved by leveraging similar arguments together with $(A_{1})$ and $(A_{2})$. For this, thanks to $(A_{1})$, $(A_{2})$, Lemma 2.1, Theorem 3.2.2, and the Hölder and Burkholder–Davis–Gundy inequalities, for every $0\leq s<t\leq T$ we obtain that

\mathbb{E}\big[\|p_{k}(t)-p_{k}(s)\|\big]\leq 2\,\mathbb{E}\bigg[\bigg\|\int^{t}_{s}\bigg(p^{\top}_{k}(r)\frac{\partial f_{k}}{\partial x}(r,x_{k}(r),u_{k}(r))+p^{0}_{k}\frac{\partial f^{0}_{k}}{\partial x}(r,x_{k}(r),u_{k}(r))+2p^{1}_{k}(x_{k}(r)-x_{k-1}(r))+\sum^{d}_{i=1}(q_{k})^{\top}_{i}(r)\frac{\partial(\sigma_{k})_{i}}{\partial x}(r,x_{k}(r))\bigg)\,\mathrm{d}r\bigg\|\bigg]+2\,\mathbb{E}\left[\left\|\int^{t}_{s}q_{k}(r)\,\mathrm{d}B_{r}\right\|\right]
\leq C\left(p^{0}_{k}+p^{1}_{k}+\mathbb{E}\left[\sup_{t\in[0,T]}\|x_{k}(t)\|^{2}\right]^{\frac{1}{2}}+\mathbb{E}\left[\sup_{t\in[0,T]}\|p_{k}(t)\|^{2}\right]^{\frac{1}{2}}+\mathbb{E}\left[\int^{T}_{0}\|q_{k}(t)\|^{2}\,\mathrm{d}t\right]^{\frac{1}{2}}\right)(t-s)^{\frac{1}{2}},

for some appropriate constant $C\geq 0$, and the conclusion follows.
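For intuition, the control prescribed by the formula in the proof above is cheap to evaluate once $\gamma_{k}(s)$ and $\mathbb{U}^{-1}(s)$ are available. The sketch below is illustrative only and assumes, for simplicity, that $U$ is a box, so that the projection $\textnormal{Proj}_{U}$ reduces to componentwise clipping; the numerical values are placeholders.

import numpy as np

def control_from_maximality(gamma_s, U_inv_s, p0, u_lo, u_hi):
    # Evaluate u_{k+1}(s) = Proj_U( (1 / (2 p0)) * U(s)^{-1} gamma_k(s) ) for a box U = [u_lo, u_hi]^m,
    # with gamma_s = gamma_k(s) in R^m, U_inv_s = U(s)^{-1}, and p0 != 0 the (normal) cost multiplier.
    unconstrained = (1.0 / (2.0 * p0)) * (U_inv_s @ gamma_s)
    return np.clip(unconstrained, u_lo, u_hi)          # Euclidean projection onto a box

gamma_s = np.array([0.3, -1.2])
U_inv_s = np.diag([2.0, 0.5])
print(control_from_maximality(gamma_s, U_inv_s, p0=-1.0, u_lo=-1.0, u_hi=1.0))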

Remark. Through Lemma 3.3.2, $(A_{4})$ becomes crucial to ensure the validity of (AC) in Theorem 3.3.1. In particular, the works [44, 45, 46, 47], which analyze continuity properties of extremals with respect to appropriate deformations of some deterministic optimal control problems and which inspired our work, show that the time-continuity of the optimal controls $u_{k}$ for each LOCP$^{\Delta}_{k}$ is a requirement that is not easy to relax (in particular, see the counterexample in [47, Section 2.3]), especially in the presence of trust-region constraints. Importantly, motivated by regularity results in deterministic optimal control settings (see, e.g., [48, Theorem 3.2]), we expect that more general mappings $G$ might still yield Lemma 3.3.2, especially when optimizing over deterministic controls. However, we leave the investigation of more general sufficient conditions for the time-continuity of the optimal controls $u_{k}$ for each LOCP$^{\Delta}_{k}$ as a future research direction, in that it is out of the scope of this work, which again focuses on the properties of accumulation points of the sequence generated by SCP.

Theorem 3.3.2 (Existence of Accumulation Points for SCP). Assume that $(A_{1})$–$(A_{4})$ hold and that SCP generates a sequence $(\Delta_{k},u_{k},x_{k})_{k\in\mathbb{N}}$ such that $(\Delta_{k})_{k\in\mathbb{N}}\subseteq\mathbb{R}_{+}\setminus\{0\}$ converges to zero and, for every $k\geq 1$, the tuple $(u_{k},x_{k})$ locally solves LOCP$^{\Delta}_{k}$. For every $k\geq 1$, let $(u_{k},\mathfrak{p}_{k},p^{0}_{k},p^{1}_{k})$ be an extremal associated with $(u_{k},x_{k})$ for LOCP$^{\Delta}_{k}$ (whose existence is ensured by Theorem 3.2.2). If $p^{0}_{k}\neq 0$ for every $k\geq 1$, then (AC) in Theorem 3.3.1 holds true.

In addition, if $(u,\mathfrak{p},p^{0})$ denotes the extremal for OCP associated with some $(u,x,p,\mathfrak{p},p^{0},q)$, as provided by Theorem 3.3.1, then the following convergence holds, up to some subsequence, for every $\ell\geq 2$ as $k\rightarrow\infty$:

\|(\mathfrak{p}_{k},p^{0}_{k})-(\mathfrak{p},p^{0})\|+\mathbb{E}\left[\sup_{s\in[0,T]}\left\|x_{k}(s)-x(s)\right\|^{\ell}+\sup_{s\in[0,T]}\left\|p_{k}(s)-p(s)\right\|^{2}+\int^{T}_{0}\left\|q_{k}(s)-q(s)\right\|^{2}\,\mathrm{d}s\right]\rightarrow 0. \qquad (5)

The main takeaway of Theorem 3.3.2 is that an accumulation point for stochastic SCP always exists, which in turn is a candidate (local) solution to OCP due to Theorem 3.3.1, as soon as appropriate qualification-type conditions are satisfied. In particular, similarly to the condition $(\mathfrak{p},p^{0})\neq 0$ in Theorem 3.3.1, the requirement $p^{0}_{k}\neq 0$ in Theorem 3.3.2 plays the role of an additional qualification condition, but at the level of each subproblem LOCP$^{\Delta}_{k}$. Note that the latter is a generic property in deterministic optimal control (see, e.g., [49]). Finally, the requirement $(\mathfrak{p},p^{0})\neq 0$ in Theorem 3.3.1, from which the (local) optimality of the strategy found by SCP stems, can be checked numerically thanks to (5), as soon as the multipliers $(\mathfrak{p}_{k},p^{0}_{k})$ are accessible through the SCP iterations.

3.3.3. Insights to Speed Up the Convergence of SCP

In Theorem 3.3.2, there are also insightful statements concerning the convergence of Pontryagin extremals. Let us outline how those statements may be leveraged to speed up convergence.

For this, adopt the notation of Theorem 3.2.1 and assume that we are in the situation where applying the maximality condition of the PMP to problem OCP leads to smooth candidate optimal controls, as functions of the variables $x$, $p^{0}$, and $p$ (be aware that this might not be straightforward to obtain). We are then in a position to define two-point boundary value problems to solve OCP, also known as shooting methods, for which the decision variables become $p^{0}$, $p(T)$, and $q$. In particular, the core of the method consists of iteratively choosing $(p^{0},p(T),q)$ and making the adjoint equation evolve until some given final condition is met (see, e.g., [50, 51] for a more detailed explanation of shooting methods). In the context of deterministic optimal control, when convergence is achieved, shooting methods terminate quite fast (at least quadratically). However, here the bottlenecks are: 1) dealing with the presence of the variable $q$, and 2) finding a good guess for the initial value of $(p^{0},p(T),q)$ to make the whole procedure converge. In the setting of Theorem 3.3.2, a valid option to design well-posed shooting methods is as follows. With the notation and assumptions of Theorem 3.3.2, up to some subsequence it holds that $(p^{0}_{k},p_{k}(T),q_{k})\rightarrow(p^{0},p(T),q)$ (with respect to appropriate topologies) as $k\rightarrow\infty$. Therefore, assuming we have access to Pontryagin extremals along the iterations and given some large enough iteration $k$, we can fix $q=q_{k}$ and initialize with $(p^{0}_{k},p_{k}(T))$ a shooting method for OCP that operates on the finite-dimensional variable $(p^{0},p(T))\in\mathbb{R}^{q+1}$. If successful, this strategy would speed up the convergence of the entire numerical scheme, though we leave its investigation as a future research direction.

4. Proof of the Main Results

Since the statement of Theorem 3.3.1 is in particular contained in Theorem 3.3.2, it is sufficient to prove Theorem 3.3.2 only. We split this proof into three main steps. First, we retrace the proof of the stochastic PMP to introduce necessary notation and expressions. In addition, we leverage this step to provide novel insight on how to prove the stochastic PMP by following the lines of the original work of Pontryagin and his group (see, e.g., [17, 39, 40]), a proof that we could not find in the stochastic literature. Second, we show the convergence of trajectories and controls, together with the convergence of variational inequalities (see Section 5.2.2 for a definition). The latter represents the cornerstone of the proof and paves the way for the final step, which consists of proving the convergence of the Pontryagin extremals. For the sake of clarity and brevity and without loss of generality, we carry out the proof in the case of a scalar Brownian motion, i.e., we assume $d=1$. Moreover, for any $x\in\mathbb{R}^{n}$ with $n\in\mathbb{N}$, we adopt the notation $\tilde{x}\triangleq(x,x^{n+1})\in\mathbb{R}^{n+1}$.

4.1. Main Steps of the Proof of the Stochastic Maximum Principle

For the sake of clarity and brevity, we retrace the proof of Theorem 3.2.1 only. The proof of Theorem 3.2.2 follows from a straightforward modification of the steps we provide below, by introducing the additional final constraint $\mathbb{E}[y(T)-\Delta_{k+1}]\leq 0$ via (4). In particular, we highlight those modifications below and in Section 4.2.2.

4.1.1. Linear Stochastic Differential Equations

Define the stochastic matrices $A(t)\triangleq\frac{\partial(b,f^{0})}{\partial x}(t,u(t),x(t))$ and $D(t)\triangleq\frac{\partial(\sigma,0)}{\partial x}(t,u(t),x(t))$. For any time $r\in[0,T]$ and any bounded initial condition $\tilde{\xi}_{r}\in L^{2}_{\mathcal{F}_{r}}(\Omega;\mathbb{R}^{n+1})$, the following problem

\begin{cases}\mathrm{d}z(t)=A(t)z(t)\,\mathrm{d}t+D(t)z(t)\,\mathrm{d}B_{t}\\ z(s)=0,\ s\in[0,r),\quad z(r)=\tilde{\xi}_{r}\end{cases} \qquad (6)

is well-posed [20]. Its unique solution is the $\mathcal{F}$-adapted process with right-continuous sample paths $z:[0,T]\times\Omega\rightarrow\mathbb{R}^{n+1}:(t,\omega)\mapsto\mathbbm{1}_{[r,T]}(t)\,\phi(t,\omega)\psi(r,\omega)\tilde{\xi}_{r}(\omega)$, where the matrix-valued, $\mathcal{F}$-adapted processes $\phi$ and $\psi$ with continuous sample paths satisfy

\begin{cases}\mathrm{d}\phi(t)=A(t)\phi(t)\,\mathrm{d}t+D(t)\phi(t)\,\mathrm{d}B_{t}\\ \phi(0)=I,\end{cases}\qquad\begin{cases}\mathrm{d}\psi(t)=-\psi(t)\left(A(t)-D(t)^{2}\right)\mathrm{d}t-\psi(t)D(t)\,\mathrm{d}B_{t}\\ \psi(0)=I,\end{cases} \qquad (7)

respectively. In particular, a straightforward application of the Itô formula shows that $\phi(t)\psi(t)=\psi(t)\phi(t)=I$, and therefore $\psi(t)=\phi(t)^{-1}$, for every $t\in[0,T]$.
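As a quick numerical sanity check (a sketch with placeholder coefficient processes $A$ and $D$, not an example from the paper), one can co-simulate the two equations in (7) with the Euler–Maruyama scheme and verify that $\psi(t)$ approximately inverts $\phi(t)$; the identity $\phi\psi=I$ holds exactly only in the continuous-time limit, so the printed norm is small but nonzero.

import numpy as np

rng = np.random.default_rng(2)
n, N, T = 2, 4000, 1.0
dt = T / N
A = lambda t: np.array([[0.0, 1.0], [-1.0, -0.2 * np.cos(t)]])    # placeholder A(t)
D = lambda t: 0.1 * np.eye(n)                                      # placeholder D(t)

phi, psi = np.eye(n), np.eye(n)
for k in range(N):
    t = k * dt
    dB = rng.normal(scale=np.sqrt(dt))                             # scalar Brownian increment (d = 1)
    At, Dt = A(t), D(t)
    phi = phi + (At @ phi) * dt + (Dt @ phi) * dB
    psi = psi + (-psi @ (At - Dt @ Dt)) * dt - (psi @ Dt) * dB
print("|| phi(T) psi(T) - I || =", np.linalg.norm(phi @ psi - np.eye(n)))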

4.1.2. Needle-like Variations and End-point Mapping

One way to prove the PMP comes from the analysis of specific variations called needle-like variations on a mapping called the end-point mapping. Those concepts are introduced below in the context of optimization over deterministic controls (see Section 5 for the generalization of this argument to stochastic controls).

Given an integer $j\in\mathbb{N}$, fix $j$ times $0<t_{1}<\dots<t_{j}<T$ which are Lebesgue points for $u$, and fix $j$ random variables $u_{1},\dots,u_{j}$ such that $u_{i}\in U$. For fixed scalars $0\leq\eta_{i}<t_{i+1}-t_{i}$, $i=1,\dots,j-1$, and $0\leq\eta_{j}<T-t_{j}$, the needle-like variation $\pi=\{t_{i},\eta_{i},u_{i}\}_{i=1,\dots,j}$ of the control $u$ is defined to be the admissible control $u_{\pi}(t)=u_{i}$ if $t\in[t_{i},t_{i}+\eta_{i}]$ and $u_{\pi}(t)=u(t)$ otherwise. Denote by $\tilde{x}_{v}$ the solution related to an admissible control $v$ of the augmented system

\begin{cases}\mathrm{d}x(t)=b(t,v(t),x(t))\,\mathrm{d}t+\sigma(t,x(t))\,\mathrm{d}B_{t},\quad x(0)=x^{0}\\ \mathrm{d}x^{n+1}(t)=f^{0}(t,v(t),x(t))\,\mathrm{d}t,\quad x^{n+1}(0)=0\end{cases} \qquad (8)

and define the mapping $\tilde{g}:\mathbb{R}^{n+1}\rightarrow\mathbb{R}^{q+1}:\tilde{x}\mapsto(g(x),x^{n+1})$. For every fixed time $t\in(t_{j},T]$, by denoting $\delta_{t}\triangleq\min\{t_{i+1}-t_{i},\,t-t_{j}:i=1,\dots,j-1\}>0$, the end-point mapping at time $t$ is defined to be the function

F^{j}_{t}:\ \mathcal{C}^{j}_{t}\triangleq B^{j}_{\delta_{t}}(0)\cap\mathbb{R}^{j}_{+}\rightarrow\mathbb{R}^{q+1}:(\eta_{1},\dots,\eta_{j})\mapsto\mathbb{E}\left[\tilde{g}(\tilde{x}_{u_{\pi}}(t))-\tilde{g}(\tilde{x}_{u}(t))\right] \qquad (9)

where $B^{j}_{\rho}$ is the open ball in $\mathbb{R}^{j}$ of radius $\rho>0$. Due to Lemma 2.1, it is not difficult to see that $F^{j}_{t}$ is Lipschitz (see also the argument developed to prove Lemma 4.1.2 below). In addition, this mapping may be Gateaux differentiated at zero along admissible directions of the cone $\mathcal{C}^{j}_{t}$. For this, denote $\tilde{b}=(b^{\top},f^{0})^{\top}$, $\tilde{\sigma}=(\sigma^{\top},0)^{\top}$, and let $z_{t_{i},u_{i}}$ be the unique solution to (6) with $\tilde{\xi}_{t_{i}}=\tilde{b}(t_{i},u_{i},x(t_{i}))-\tilde{b}(t_{i},u(t_{i}),x(t_{i}))$.
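The needle-like variation $u_{\pi}$ is simple to construct in practice. The sketch below is illustrative, for a control represented as a Python callable: it overrides $u$ with the constant value $u_{i}$ on each window $[t_{i},t_{i}+\eta_{i}]$ and leaves it unchanged elsewhere (the windows are assumed disjoint, as in the definition above).

import numpy as np

def needle_variation(u, variations):
    # Return the needle-like variation u_pi of a control t -> u(t).
    # `variations` is a list of triples (t_i, eta_i, u_i): on [t_i, t_i + eta_i] the control
    # is replaced by the constant value u_i; elsewhere u is left untouched.
    def u_pi(t):
        for t_i, eta_i, u_i in variations:
            if t_i <= t <= t_i + eta_i:
                return u_i
        return u(t)
    return u_pi

# Example with placeholder data: perturb a sinusoidal control on two short windows.
u = lambda t: np.array([np.sin(2 * np.pi * t)])
u_pi = needle_variation(u, [(0.25, 0.01, np.array([1.0])), (0.60, 0.02, np.array([-1.0]))])
print(u(0.255), u_pi(0.255))   # original vs. perturbed value inside the first window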

Remark. For the proof of Theorem 3.2.2, the only change required up to this point, compared to the proof of Theorem 3.2.1, consists of replacing the function $\tilde{g}:\mathbb{R}^{n+1}\rightarrow\mathbb{R}^{q+1}$ which defines the end-point mapping (9) with the function

(\tilde{g}_{k},g^{n+2}_{k}):\mathbb{R}^{n+2}\rightarrow\mathbb{R}^{q+2}:(\tilde{x},y)\mapsto(g_{k}(x),x^{n+1},y).

Note that this change is consistent since we will effectively make use of the mapping $F^{j}_{t}$ at the time $t=T$ only.

Lemma 4.1.2 (Stochastic needle-like variation formula). Let $(\eta_{1},\dots,\eta_{j})\in\mathcal{C}^{j}_{t}$. For any $t>t_{j}$, it holds that

\bigg\|\mathbb{E}\bigg[\tilde{g}(\tilde{x}_{u_{\pi}}(t))-\tilde{g}(\tilde{x}_{u}(t))-\sum^{j}_{i=1}\eta_{i}\frac{\partial\tilde{g}}{\partial\tilde{x}}(\tilde{x}_{u}(t))\,z_{t_{i},u_{i}}(t)\bigg]\bigg\|=o\left(\sum^{j}_{i=1}\eta_{i}\right).

The proof of this result is technical (it makes intensive use of stochastic inequalities) but not difficult. We provide a detailed proof of Lemma 4.1.2 in the appendix in a more general context (see also Section 5).

4.1.3. Variational Inequalities

The main step in the proof of the PMP proceeds by contradiction, leveraging Lemma 4.1.2. To this end, for every j\in\mathbb{N}, define the linear mapping

dFTj:+jq+1:(η1,,ηj)i=1jηi𝔼[g~x~(x~u(T))zti,ui(T)],dF^{j}_{T}:\mathbb{R}^{j}_{+}\rightarrow\mathbb{R}^{q+1}:(\eta_{1},\dots,\eta_{j})\mapsto\sum_{i=1}^{j}\eta_{i}\mathbb{E}\left[\frac{\partial\tilde{g}}{\partial\tilde{x}}(\tilde{x}_{u}(T))z_{t_{i},u_{i}}(T)\right],

which due to Lemma 4.1.2, satisfies

limα>0,α0FTj(αη)α=dFTj(η),\underset{\alpha>0,\alpha\rightarrow 0}{\lim}\ \frac{F^{j}_{T}\left(\alpha\eta\right)}{\alpha}=dF^{j}_{T}(\eta),

for every η+j\eta\in\mathbb{R}^{j}_{+}. Finally, consider the closed, convex cone of q+1\mathbb{R}^{q+1} given by

KCl(Cone{𝔼[g~x~(x~u(T))zti,ui(T)]:foruiUandti(0,T)is Lebesgue foru}).\displaystyle K\triangleq\textnormal{Cl}\bigg{(}\textnormal{Cone}\ \bigg{\{}\mathbb{E}\left[\frac{\partial\tilde{g}}{\partial\tilde{x}}(\tilde{x}_{u}(T))z_{t_{i},u_{i}}(T)\right]:\ \textnormal{for}\ u_{i}\in U\ \textnormal{and}\ t_{i}\in(0,T)\ \textnormal{is Lebesgue for}\ u\bigg{\}}\bigg{)}.

If K=\mathbb{R}^{q+1} held, then dF^{j}_{T}(\mathbb{R}^{j}_{+})=K=\mathbb{R}^{q+1} would hold as well, and by applying [40, Lemma 12.1], one would find that the origin is an interior point of F^{j}_{T}(\mathcal{C}^{j}_{T}). This would imply that (u,x) cannot be optimal for OCP, which gives a contradiction.

The argument above (together with an application of the separating hyperplane theorem) provides the existence of a non-zero vector, denoted \tilde{\mathfrak{p}}=(\mathfrak{p}^{\top},\mathfrak{p}^{0})\in\mathbb{R}^{q+1}, such that the following variational inequality holds

𝔭~𝔼[g~x~(x~u(T))zr,v(T)]0,r[0,T]is Lebesgue foru,vU.\tilde{\mathfrak{p}}^{\top}\mathbb{E}\left[\frac{\partial\tilde{g}}{\partial\tilde{x}}(\tilde{x}_{u}(T))z_{r,v}(T)\right]\leq 0,\ r\in[0,T]\ \textnormal{is Lebesgue for}\ u,\ v\in U. (10)
{rmrk}

For the proof of Theorem 3.2.2, the variational inequalities (10) are replaced by the following upgraded variational inequalities which hold for (uk+1,xk+1)(u_{k+1},x_{k+1}), solution to LOCPk+1Δ{}^{\Delta}_{k+1}:

(𝔭~k+1pk+11)𝔼[(g~x~(x~k(T))001)\displaystyle\left(\begin{array}[]{c}\tilde{\mathfrak{p}}_{k+1}\\ p^{1}_{k+1}\end{array}\right)^{\top}\mathbb{E}\bigg{[}\bigg{(}\begin{array}[]{cc}\displaystyle\frac{\partial\tilde{g}}{\partial\tilde{x}}(\tilde{x}_{k}(T))&0\\ 0&1\end{array}\bigg{)} (zr,vk(T)(zr,vk)n+2(T))]0,\displaystyle\bigg{(}\begin{array}[]{c}z^{k}_{r,v}(T)\\ (z^{k}_{r,v})^{n+2}(T)\end{array}\bigg{)}\bigg{]}\leq 0, (17)
r[0,T]is Lebesgue foruk,vU,\displaystyle r\in[0,T]\ \textnormal{is Lebesgue for}\ u_{k},\ v\in U,

where each ((zr,vk),(zr,vk)n+2)((z^{k}_{r,v})^{\top},(z^{k}_{r,v})^{n+2})^{\top} solves an equation similar to (6) (we report this new equation in Section 4.2.2).

4.1.4. Conclusion of the Proof of the Stochastic Maximum Principle

The conditions of the PMP are derived by working out the variational inequality (10) and finding expressions of some appropriate conditional expectations. The main details are developed below in the context of optimization over deterministic controls (see Section 5 for the generalization to stochastic controls).

First, by appropriately developing solutions to (6), (10) can be rewritten as

𝔼[(𝔭~g~x~(x~u(T))ϕ(T)ψ(r))((bf0)(r,v,x(r))(bf0)(r,u(r),x(r)))]0\mathbb{E}\left[\left(\tilde{\mathfrak{p}}^{\top}\frac{\partial\tilde{g}}{\partial\tilde{x}}(\tilde{x}_{u}(T))\phi(T)\psi(r)\right)^{\top}\left(\left(\begin{array}[]{c}b\\ f^{0}\end{array}\right)(r,v,x(r))-\left(\begin{array}[]{c}b\\ f^{0}\end{array}\right)(r,u(r),x(r))\right)\right]\leq 0

for every r[0,T]r\in[0,T] Lebesgue point for uu and every vUv\in U. Second, again from the structure of (6), it can be readily checked that by denoting

p(t)(𝔭~𝔼[g~x~(x~u(T))ϕ(T)|t]ψ(t))1,,n,p0(𝔭~g~x~(x~u(T))ϕ(T)ψ(t))n+1,p(t)\triangleq\left(\tilde{\mathfrak{p}}^{\top}\mathbb{E}\left[\frac{\partial\tilde{g}}{\partial\tilde{x}}(\tilde{x}_{u}(T))\phi(T)\Big{|}\mathcal{F}_{t}\right]\psi(t)\right)_{1,\dots,n},\quad p^{0}\triangleq\left(\tilde{\mathfrak{p}}^{\top}\frac{\partial\tilde{g}}{\partial\tilde{x}}(\tilde{x}_{u}(T))\phi(T)\psi(t)\right)_{n+1}, (18)

the quantity p^{0} is constant on [0,T] (in addition, its non-positivity can be shown through a standard reformulation of problem OCP, as done in [40, Section 12.4]). Notice that the stochastic process p:[0,T]\times\Omega\rightarrow\mathbb{R}^{n} is by definition \mathcal{F}–adapted. The quantities introduced so far allow one to reformulate the inequality above as

𝔼[p(t)(b(t,u(t),x(t))b(t,v,x(t)))+p0(f0(t,u(t),x(t))f0(t,v,x(t)))]0\displaystyle\mathbb{E}\bigg{[}p(t)^{\top}\Big{(}b(t,u(t),x(t))-b(t,v,x(t))\Big{)}+p^{0}\Big{(}f^{0}(t,u(t),x(t))-f^{0}(t,v,x(t))\Big{)}\bigg{]}\geq 0

for every t[0,T]t\in[0,T] Lebesgue point for uu and vUv\in U, from which we infer the maximality condition of the PMP.

It remains to show the existence of the process qL(Ω;L2([0,T];n))q\in L_{\mathcal{F}}(\Omega;L^{2}([0,T];\mathbb{R}^{n})), the continuity of the sample paths of the process pp, and the validity of the adjoint equation. For this, remark that, due to Jensen inequality and Lemma 2.1, the martingale (𝔼[g~x~(x~u(T))ϕ(T)|t])t[0,T]\left(\mathbb{E}\left[\frac{\partial\tilde{g}}{\partial\tilde{x}}(\tilde{x}_{u}(T))\phi(T)\Big{|}\mathcal{F}_{t}\right]\right)_{t\in[0,T]} is bounded in L2L^{2}. Hence, the martingale representation theorem provides the existence of a process μL2([0,T]×Ω;(q+1)×(n+1))\mu\in L^{2}_{\mathcal{F}}([0,T]\times\Omega;\mathbb{R}^{(q+1)\times(n+1)}) such that 𝔼[g~x~(x~u(T))ϕ(T)|t]=𝔼[g~x~(x~u(T))ϕ(T)|0]+0tμ(s)dBsN+χ(t)\mathbb{E}\left[\frac{\partial\tilde{g}}{\partial\tilde{x}}(\tilde{x}_{u}(T))\phi(T)\Big{|}\mathcal{F}_{t}\right]=\mathbb{E}\left[\frac{\partial\tilde{g}}{\partial\tilde{x}}(\tilde{x}_{u}(T))\phi(T)\Big{|}\mathcal{F}_{0}\right]+\int^{t}_{0}\mu(s)\;\mathrm{d}B_{s}\triangleq N+\chi(t), where N(q+1)×(n+1)N\in\mathbb{R}^{(q+1)\times(n+1)} is a constant matrix. The definition in (18) immediately gives that the sample paths of the process pp are continuous. Next, an application of Itô formula (component-wise) readily shows that the product χψ\chi\psi satisfies, for t[0,T]t\in[0,T],

(χψ)(t)\displaystyle(\chi\psi)(t) =(0tμ(s)dBs)ψ(t)=0tμ(s)ψ(s)dBs0tμ(s)ψ(s)D(s)ds\displaystyle=\left(\int^{t}_{0}\mu(s)\;\mathrm{d}B_{s}\right)\psi(t)=\int^{t}_{0}\mu(s)\psi(s)\;\mathrm{d}B_{s}-\int^{t}_{0}\mu(s)\psi(s)D(s)\;\mathrm{d}s
0tχ(s)ψ(s)(A(s)D(s)2)ds0tχ(s)ψ(s)D(s)dBs.\displaystyle\quad-\int^{t}_{0}\chi(s)\psi(s)\left(A(s)-D(s)^{2}\right)\;\mathrm{d}s-\int^{t}_{0}\chi(s)\psi(s)D(s)\;\mathrm{d}B_{s}.

Denoting q(t)(𝔭~(μ(t)ψ(t)(Nψ(t)+χ(t)ψ(t))D(t)))1,,nq(t)\triangleq\left(\tilde{\mathfrak{p}}^{\top}\Big{(}\mu(t)\psi(t)-\left(N\psi(t)+\chi(t)\psi(t)\right)D(t)\Big{)}\right)_{1,\dots,n}, the computations above readily give the adjoint equation of the PMP. Those computations may also be leveraged to show pL2(Ω;C([0,T];n))p\in L^{2}_{\mathcal{F}}(\Omega;C([0,T];\mathbb{R}^{n})) and qL2([0,T]×Ω;n)q\in L^{2}_{\mathcal{F}}([0,T]\times\Omega;\mathbb{R}^{n}) (see, e.g., [20, Section 7.2]).

4.2. Proof of the Convergence Result

Here we enter the core of the proof of Theorem 3.3.1. The convergence of trajectories and controls is addressed first. We devote the last two subsections to the convergence of variational inequalities and of Pontryagin extremals. For the sake of clarity and brevity, we only consider fixed-final-time problems (free final times are treated in Section 5). From now on, we implicitly assume (A_{1})–(A_{3}).

4.2.1. Convergence of Controls and Trajectories

Due to (A_{1})–(A_{3}), there exists a sequence of tuples (u_{k},x_{k})_{k\in\mathbb{N}} such that for every k\in\mathbb{N}, the tuple (u_{k+1},x_{k+1}) solves LOCP^{\Delta}_{k+1}. In what follows, we implicitly adopt the reformulation of each convexified problem LOCP^{\Delta}_{k+1}, which consists of adding \mathbb{E}[y(T)-\Delta_{k+1}]\leq 0 to the final constraints through the new variable y in (4). If u\in\mathcal{U} denotes an admissible control for OCP that fulfills the conditions of Theorem 3.3.1, we denote by \tilde{x}:[0,T]\times\Omega\rightarrow\mathbb{R}^{n+1} the \mathcal{F}–adapted process with continuous sample paths that solves the augmented system (8) related to OCP with control u. The following holds. {lmm}[Convergence of trajectories] Assume that the sequence (\Delta_{k})_{k\in\mathbb{N}}\subseteq\mathbb{R}_{+} converges to zero. If the sequence (u_{k})_{k\in\mathbb{N}} converges to u for the weak topology of L^{2}, then \mathbb{E}\left[\underset{t\in[0,T]}{\sup}\ \|x_{k}(t)-x(t)\|^{\ell}\right]\rightarrow 0, as k\rightarrow\infty, for every \ell\geq 2.

Proof.

For every t\in[0,T], we have the following estimate, where C\geq 0 denotes a generic constant whose value may change from line to line:

xk+1(t)x(t)C0t(b0(s,xk(s))b0(s,x(s)))ds\displaystyle\|x_{k+1}(t)-x(t)\|^{\ell}\leq C\left\|\int^{t}_{0}\Big{(}b_{0}(s,x_{k}(s))-b_{0}(s,x(s))\Big{)}\;\mathrm{d}s\right\|^{\ell} (19)
+Ci=1m0t(uk+1i(s)bi(s,xk(s))ui(s)bi(s,x(s)))ds\displaystyle+C\sum^{m}_{i=1}\left\|\int^{t}_{0}\Big{(}u^{i}_{k+1}(s)b_{i}(s,x_{k}(s))-u^{i}(s)b_{i}(s,x(s))\Big{)}\;\mathrm{d}s\right\|^{\ell}
+C0t(b0x(s,xk(s))+i=1muki(s)bix(s,xk(s)))(xk+1(s)xk(s))ds\displaystyle+C\left\|\int^{t}_{0}\left(\frac{\partial b_{0}}{\partial x}(s,x_{k}(s))+\sum^{m}_{i=1}u^{i}_{k}(s)\frac{\partial b_{i}}{\partial x}(s,x_{k}(s))\right)(x_{k+1}(s)-x_{k}(s))\;\mathrm{d}s\right\|^{\ell}
+C0t(σ(s,xk(s))σ(s,x(s))+σx(s,xk(s))(xk+1(s)xk(s)))dBs\displaystyle+C\left\|\int^{t}_{0}\left(\sigma(s,x_{k}(s))-\sigma(s,x(s))+\frac{\partial\sigma}{\partial x}(s,x_{k}(s))(x_{k+1}(s)-x_{k}(s))\right)\;\mathrm{d}B_{s}\right\|^{\ell}

Now, we take expectations. For the last term, we compute

𝔼[0t(σ(s,xk(s))σ(s,x(s))+σx(s,xk(s))(xk+1(s)xk(s)))dBs]\displaystyle\mathbb{E}\left[\left\|\int^{t}_{0}\left(\sigma(s,x_{k}(s))-\sigma(s,x(s))+\frac{\partial\sigma}{\partial x}(s,x_{k}(s))(x_{k+1}(s)-x_{k}(s))\right)\;\mathrm{d}B_{s}\right\|^{\ell}\right]\leq
C(0t𝔼[supr[0,s]xk+1(r)x(r)]ds+𝔼[0Txk+1(s)xk(s)1+(1)ds])\displaystyle\leq C\left(\int^{t}_{0}\mathbb{E}\left[\underset{r\in[0,s]}{\sup}\ \left\|x_{k+1}(r)-x(r)\right\|^{\ell}\right]\;\mathrm{d}s+\mathbb{E}\left[\int^{T}_{0}\left\|x_{k+1}(s)-x_{k}(s)\right\|^{1+(\ell-1)}\;\mathrm{d}s\right]\right)
C0t𝔼[supr[0,s]xk+1(r)x(r)]ds+C(0T𝔼[xk+1(s)xk(s)2]ds)12(supk𝔼[sups[0,T]xk(s)2(1)])12\displaystyle\leq C\int^{t}_{0}\mathbb{E}\left[\underset{r\in[0,s]}{\sup}\ \left\|x_{k+1}(r)-x(r)\right\|^{\ell}\right]\;\mathrm{d}s+C\left(\int^{T}_{0}\mathbb{E}\left[\left\|x_{k+1}(s)-x_{k}(s)\right\|^{2}\right]\mathrm{d}s\right)^{\frac{1}{2}}\left(\underset{k\in\mathbb{N}}{\sup}\ \mathbb{E}\left[\underset{s\in[0,T]}{\sup}\ \|x_{k}(s)\|^{2(\ell-1)}\right]\right)^{\frac{1}{2}}
C(0t𝔼[supr[0,s]xk+1(r)x(r)]ds+Δk+1)\displaystyle\leq C\left(\int^{t}_{0}\mathbb{E}\left[\underset{r\in[0,s]}{\sup}\ \left\|x_{k+1}(r)-x(r)\right\|^{\ell}\right]\;\mathrm{d}s+\Delta_{k+1}\right)

due to Hölder and Burkholder–Davis–Gundy inequalities, and the last inequality comes from Lemma 2.1. Similar computations can be carried out for the first and third terms of (19).

To handle the second term of (19), we proceed as follows. Since \mathbb{E}\left[\int^{T}_{0}\|b_{i}(s,x(s))\|^{2}\;\mathrm{d}s\right]<\infty implies \int^{T}_{0}\|b_{i}(s,x(s))\|^{2}\;\mathrm{d}s<\infty, \mathcal{G}–almost surely, for every fixed i=1,\dots,m and t\in[0,T] the convergence of the sequence of controls for the weak topology of L^{2} entails that \left\|\int^{t}_{0}\left(u^{i}_{k}(s)-u^{i}(s)\right)b_{i}(s,x(s))\;\mathrm{d}s\right\|\rightarrow 0, \mathcal{G}–almost surely as k\rightarrow\infty. In addition, (A_{1}) gives \left\|\int^{b}_{a}\left(u^{i}_{k}(s)-u^{i}(s)\right)b_{i}(s,x(s))\;\mathrm{d}s\right\|\leq C|b-a| for every a,b\in[0,T] and \mathcal{G}–almost surely. Hence, [52, Lemma 3.4] and the dominated convergence theorem finally provide that

𝔼[supt[0,T]0t(uki(s)ui(s))bi(s,x(s))ds]0,k.\mathbb{E}\left[\underset{t\in[0,T]}{\sup}\ \left\|\int^{t}_{0}\left(u^{i}_{k}(s)-u^{i}(s)\right)b_{i}(s,x(s))\;\mathrm{d}s\right\|^{\ell}\right]\rightarrow 0,\quad k\rightarrow\infty.

It is easy to conclude the proof by applying a routine Grönwall inequality argument. ∎

The sought-after convergence of trajectories is a consequence of Lemma 4.2.1 when the conditions of Theorem 3.3.1 are met. In addition, limit points for sequences of controls fulfilling the conditions of Theorem 3.3.1 always exist up to some subsequence. Indeed, in this case, the set of admissible controls \mathcal{U} is closed and convex for the strong topology of L^{2}, hence it is closed for the weak topology of L^{2}. Therefore, since (u_{k})_{k\in\mathbb{N}} is bounded in L^{2}, there exists u\in\mathcal{U} such that, up to some subsequence, (u_{k})_{k\in\mathbb{N}} converges to u for the weak topology of L^{2}. It remains to show that the process x is feasible for OCP. For this, we compute

|𝔼[\displaystyle\Big{|}\mathbb{E}[ g(x(T))]|𝔼[g(x(T))g(xk(T))]+𝔼[gx(xk(T))(xk+1(T)xk(T))]\displaystyle g(x(T))]\Big{|}\leq\mathbb{E}\Big{[}\|g(x(T))-g(x_{k}(T))\|\Big{]}+\mathbb{E}\left[\left\|\frac{\partial g}{\partial x}(x_{k}(T))(x_{k+1}(T)-x_{k}(T))\right\|\right]
C(𝔼[x(T)xk(T)2]12+𝔼[xk+1(T)x(T)2]12)0\displaystyle\leq C\left(\mathbb{E}\Big{[}\|x(T)-x_{k}(T)\|^{2}\Big{]}^{\frac{1}{2}}+\mathbb{E}\Big{[}\|x_{k+1}(T)-x(T)\|^{2}\Big{]}^{\frac{1}{2}}\right)\rightarrow 0

due to Lemma 4.2.1 and the dominated convergence theorem (here C0C\geq 0 is a constant coming from (A2)(A_{2}), and we use the continuity of the sample paths of xx).
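As an aside, the quantity controlled by Lemma 4.2.1 is straightforward to monitor numerically once sample paths of x_{k} and x are available on a common grid with common noise. A minimal sketch, assuming the paths have already been simulated (e.g., by Euler–Maruyama) and stored as arrays, is the following; the array shapes and names are assumptions, not notation from the paper.

```python
import numpy as np

def sup_moment_gap(paths_k, paths, ell=2):
    """Monte Carlo estimate of E[ sup_{t in [0,T]} ||x_k(t) - x(t)||^ell ].
    Both inputs have shape (M, N+1, n): M sample paths on N+1 grid points."""
    gap = np.linalg.norm(paths_k - paths, axis=-1)   # pathwise norm, shape (M, N+1)
    return float(np.mean(np.max(gap, axis=1) ** ell))  # sup over time, then average over paths
```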

4.2.2. Convergence of Variational Inequalities

We start with a crucial result on linear stochastic differential equations. Recall the notation of Section 4.1. {lmm}[Convergence of variational inequalities] Fix 2\ell\geq 2 and consider sequences of times (rk)k[0,T](r_{k})_{k\in\mathbb{N}}\subseteq[0,T] and of uniformly bounded variables (ξ~k)k(\tilde{\xi}_{k})_{k\in\mathbb{N}} such that for every kk\in\mathbb{N}, ξ~kLrk2(Ω;n+1)\tilde{\xi}_{k}\in L^{2}_{\mathcal{F}_{r_{k}}}(\Omega;\mathbb{R}^{n+1}). Assume that rkrr_{k}\rightarrow r with rkrr_{k}\leq r for kk\in\mathbb{N} and ξ~kLξ~Lr2(Ω;n+1)\tilde{\xi}_{k}\overset{L^{\ell}}{\rightarrow}\tilde{\xi}\in L^{2}_{\mathcal{F}_{r}}(\Omega;\mathbb{R}^{n+1}) with ξ~\tilde{\xi} bounded. Denote w~k+1\tilde{w}_{k+1}, w~\tilde{w} the stochastic process solutions, respectively, to

{dw(t)=(b0x(t,xk(t))+i=1muki(t)bix(t,xk(t)))w(t)dt+σx(t,xk(t))w(t)dBtdwn+1(t)=(Hx(xk+1(t))+Lx(t,uk(t),xk(t)))w(t)dtw~(t)=0,t[0,rk),w~(rk)=ξ~k+1,\begin{cases}\displaystyle\mathrm{d}w(t)=\left(\frac{\partial b_{0}}{\partial x}(t,x_{k}(t))+\sum^{m}_{i=1}u^{i}_{k}(t)\frac{\partial b_{i}}{\partial x}(t,x_{k}(t))\right)w(t)\;\mathrm{d}t+\frac{\partial\sigma}{\partial x}(t,x_{k}(t))w(t)\;\mathrm{d}B_{t}\\ \displaystyle\mathrm{d}w^{n+1}(t)=\left(\frac{\partial H}{\partial x}(x_{k+1}(t))+\frac{\partial L}{\partial x}(t,u_{k}(t),x_{k}(t))\right)w(t)\;\mathrm{d}t\\ \tilde{w}(t)=0,\ t\in[0,r_{k}),\quad\tilde{w}(r_{k})=\tilde{\xi}_{k+1},\end{cases}
{dw(t)=(b0x(t,x(t))+i=1mui(t)bix(t,x(t)))w(t)dt+σx(t,x(t))w(t)dBtdwn+1(t)=(Hx(x(t))+Lx(t,u(t),x(t)))w(t)dtw~(t)=0,t[0,r),w~(r)=ξ~.\begin{cases}\displaystyle\mathrm{d}w(t)=\left(\frac{\partial b_{0}}{\partial x}(t,x(t))+\sum^{m}_{i=1}u^{i}(t)\frac{\partial b_{i}}{\partial x}(t,x(t))\right)w(t)\;\mathrm{d}t+\frac{\partial\sigma}{\partial x}(t,x(t))w(t)\;\mathrm{d}B_{t}\\ \displaystyle\mathrm{d}w^{n+1}(t)=\left(\frac{\partial H}{\partial x}(x(t))+\frac{\partial L}{\partial x}(t,u(t),x(t))\right)w(t)\;\mathrm{d}t\\ \tilde{w}(t)=0,\ t\in[0,r),\quad\tilde{w}(r)=\tilde{\xi}.\end{cases}

Under the assumptions of Lemma 4.2.1, it holds that 𝔼[supt[r,T]w~k(t)w~(t)]0\mathbb{E}\left[\underset{t\in[r,T]}{\sup}\ \|\tilde{w}_{k}(t)-\tilde{w}(t)\|^{\ell}\right]\rightarrow 0 for kk\rightarrow\infty.

Proof.

From rkrr_{k}\leq r, kk\in\mathbb{N}, for t[r,T]t\in[r,T] we have (below, C0C\geq 0 is a constant)

\displaystyle\| w~k+1(t)w~(t)Cξ~k+1ξ~\displaystyle\tilde{w}_{k+1}(t)-\tilde{w}(t)\|^{\ell}\leq C\|\tilde{\xi}_{k+1}-\tilde{\xi}\|^{\ell} (20)
+Crkt(Hx(xk+1(s))wk+1(s)Hx(x(s))w(s))ds\displaystyle+C\left\|\int^{t}_{r_{k}}\left(\frac{\partial H}{\partial x}(x_{k+1}(s))w_{k+1}(s)-\frac{\partial H}{\partial x}(x(s))w(s)\right)\;\mathrm{d}s\right\|^{\ell}
+Crkt(b0x(s,xk(s))wk+1(s)b0x(s,x(s))w(s))ds\displaystyle+C\left\|\int^{t}_{r_{k}}\left(\frac{\partial b_{0}}{\partial x}(s,x_{k}(s))w_{k+1}(s)-\frac{\partial b_{0}}{\partial x}(s,x(s))w(s)\right)\;\mathrm{d}s\right\|^{\ell}
+C0t𝟙[rk,T](t)(σx(s,xk(s))wk+1(s)σx(s,x(s))w(s))dBs\displaystyle+C\left\|\int^{t}_{0}\mathbbm{1}_{[r_{k},T]}(t)\left(\frac{\partial\sigma}{\partial x}(s,x_{k}(s))w_{k+1}(s)-\frac{\partial\sigma}{\partial x}(s,x(s))w(s)\right)\;\mathrm{d}B_{s}\right\|^{\ell}
+Crkt(Lx(t,uk(t),xk(t))wk+1(s)Lx(t,u(t),x(t))w(s))ds\displaystyle+C\left\|\int^{t}_{r_{k}}\left(\frac{\partial L}{\partial x}(t,u_{k}(t),x_{k}(t))w_{k+1}(s)-\frac{\partial L}{\partial x}(t,u(t),x(t))w(s)\right)\;\mathrm{d}s\right\|^{\ell}
+Ci=1mrkt(uki(s)bix(s,xk(s))wk+1(s)ui(s)bix(s,x(s))w(s))ds.\displaystyle+C\sum^{m}_{i=1}\left\|\int^{t}_{r_{k}}\left(u^{i}_{k}(s)\frac{\partial b_{i}}{\partial x}(s,x_{k}(s))w_{k+1}(s)-u^{i}(s)\frac{\partial b_{i}}{\partial x}(s,x(s))w(s)\right)\;\mathrm{d}s\right\|^{\ell}.

We consider the expectation of the fourth term on the right-hand side (a similar argument can be developed for the first three terms, whose details are omitted in the interest of clarity and brevity). The Burkholder–Davis–Gundy and Hölder inequalities give

𝔼[0t𝟙[rk,T](t)(σx(s,xk(s))wk+1(s)σx(s,x(s))w(s))dBs]\displaystyle\mathbb{E}\bigg{[}\bigg{\|}\int^{t}_{0}\mathbbm{1}_{[r_{k},T]}(t)\left(\frac{\partial\sigma}{\partial x}(s,x_{k}(s))w_{k+1}(s)-\frac{\partial\sigma}{\partial x}(s,x(s))w(s)\right)\;\mathrm{d}B_{s}\bigg{\|}^{\ell}\bigg{]}\leq
C|rrk|+Crt𝔼[sups[0,s]wk+1(s)w(s)]ds+C𝔼[sups[0,T]w(s)2]𝔼[sups[0,T]xk(s)x(s)2],\displaystyle\leq C|r-r_{k}|+C\int^{t}_{r}\mathbb{E}\left[\underset{s^{\prime}\in[0,s]}{\sup}\ \left\|w_{k+1}(s^{\prime})-w(s^{\prime})\right\|^{\ell}\right]\;\mathrm{d}s+C\mathbb{E}\left[\underset{s\in[0,T]}{\sup}\ \|w(s)\|^{2\ell}\right]\mathbb{E}\left[\underset{s\in[0,T]}{\sup}\ \|x_{k}(s)-x(s)\|^{2\ell}\right],

where the last inequality holds due to Lemma 4.2.1 and because Lemma 2.1 may be readily extended to w~\tilde{w}.

Finally, due to the fact that LL is affine with respect to the control variable, we may handle the fifth and the sixth terms in (20) by combining the argument above with the final steps in the proof of Lemma 4.2.1, and a routine Grönwall inequality argument provides the conclusion. ∎
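For intuition, the variational processes considered here (such as \tilde{w} above and z_{r,v} below) are simply linear SDEs started from an impulse at a given time. A minimal Euler–Maruyama sketch, assuming a scalar Brownian motion and user-supplied matrix-valued callables A and D evaluated along a nominal trajectory (these interfaces are illustrative and not defined in the paper), reads as follows.

```python
import numpy as np

def simulate_variation(A, D, xi, r, T=1.0, N=400, rng=None):
    """Simulate dz = A(t) z dt + D(t) z dB_t with z = 0 on [0, r) and z(r) = xi."""
    rng = rng or np.random.default_rng(0)
    dt = T / N
    grid = np.linspace(0.0, T, N + 1)
    z = np.zeros((N + 1, xi.shape[0]))
    k0 = int(np.ceil(r / dt))              # first grid index at or after the impulse time r
    z[k0] = xi                             # impulse initial condition z(r) = xi
    for k in range(k0, N):
        dB = rng.normal(scale=np.sqrt(dt))
        z[k + 1] = z[k] + A(grid[k]) @ z[k] * dt + D(grid[k]) @ z[k] * dB
    return grid, z
```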

Consider (u_{k})_{k\in\mathbb{N}}\subseteq\mathcal{U} which converges to u\in\mathcal{U} for the weak topology of L^{2}. By assuming (A_{4}), we may denote by \mathcal{L}\subseteq[0,T] the full Lebesgue-measure subset such that r\in\mathcal{L} if and only if r is a Lebesgue point for u and r\notin\cup_{k\in\mathbb{N}}\mathcal{D}_{k+1} (we use the notation introduced with (A_{4})). We prove the existence of a non-zero vector \tilde{\mathfrak{p}}\in\mathbb{R}^{q+1} whose last component is non-positive such that for every r\in\mathcal{L}, v\in U,

𝔭~𝔼[g~x~(x~(T))zr,v(T)]0,\tilde{\mathfrak{p}}^{\top}\mathbb{E}\left[\frac{\partial\tilde{g}}{\partial\tilde{x}}(\tilde{x}(T))z_{r,v}(T)\right]\leq 0, (21)

where the \mathcal{F}–adapted with continuous sample paths stochastic process zr,v:[0,T]×Ωn+1z_{r,v}:[0,T]\times\Omega\rightarrow\mathbb{R}^{n+1} solves (6) with

A(t)=(b0x(t,x(t))+i=1mui(t)bix(t,x(t))Hx(x(t))+Lx(t,u(t),x(t))),D(t)=(σx(t,x(t))0),A(t)=\left(\begin{array}[]{c}\displaystyle\frac{\partial b_{0}}{\partial x}(t,x(t))+\sum^{m}_{i=1}u^{i}(t)\frac{\partial b_{i}}{\partial x}(t,x(t))\\ \displaystyle\frac{\partial H}{\partial x}(x(t))+\frac{\partial L}{\partial x}(t,u(t),x(t))\end{array}\right),\quad D(t)=\left(\begin{array}[]{c}\displaystyle\frac{\partial\sigma}{\partial x}(t,x(t))\\ \displaystyle 0\end{array}\right),
ξ~r=ξ~r,vb~(r,v,x(r))b~(r,u(r),x(r)),\tilde{\xi}_{r}=\tilde{\xi}_{r,v}\triangleq\tilde{b}(r,v,x(r))-\tilde{b}(r,u(r),x(r)),

where we denote b~=(b,f0)\tilde{b}=(b^{\top},f^{0})^{\top} – we will use this notation from now on.

For this, due to (17), for every kk\in\mathbb{N}, the optimality of (uk+1,xk+1)(u_{k+1},x_{k+1}) for LOCPk+1Δ{}^{\Delta}_{k+1} provides a non-zero vector (𝔭~k+1,pk+11)q+2(\tilde{\mathfrak{p}}^{\top}_{k+1},p^{1}_{k+1})^{\top}\in\mathbb{R}^{q+2} whose second to last component pk+100p^{0}_{k+1}\leq 0 is non-zero by assumption, so that for rr\in\mathcal{L} and vUv\in U, it holds that

(𝔭~k+1pk+11)𝔼[(g~x~(x~k(T))001)(zr,vk+1(T)(zr,vk+1)n+2(T))]0,\left(\begin{array}[]{c}\tilde{\mathfrak{p}}_{k+1}\\ p^{1}_{k+1}\end{array}\right)^{\top}\mathbb{E}\left[\left(\begin{array}[]{cc}\displaystyle\frac{\partial\tilde{g}}{\partial\tilde{x}}(\tilde{x}_{k}(T))&0\\ 0&1\end{array}\right)\left(\begin{array}[]{c}z^{k+1}_{r,v}(T)\\ (z^{k+1}_{r,v})^{n+2}(T)\end{array}\right)\right]\leq 0,

where with the notation

Ak+1(t)=(b0x(t,xk(t))+i=1muki(t)bix(t,xk(t))0Hx(xk+1(t))+Lx(t,uk(t),xk(t))02(xk+1(t)xk(t))0),Dk+1(t)=(σx(t,xk(t))00000),A_{k+1}(t)=\left(\begin{array}[]{cc}\displaystyle\frac{\partial b_{0}}{\partial x}(t,x_{k}(t))+\sum^{m}_{i=1}u^{i}_{k}(t)\frac{\partial b_{i}}{\partial x}(t,x_{k}(t))&0\\ \displaystyle\frac{\partial H}{\partial x}(x_{k+1}(t))+\frac{\partial L}{\partial x}(t,u_{k}(t),x_{k}(t))&0\\ 2(x_{k+1}(t)-x_{k}(t))&0\end{array}\right),\quad D_{k+1}(t)=\left(\begin{array}[]{cc}\displaystyle\frac{\partial\sigma}{\partial x}(t,x_{k}(t))&0\\ \displaystyle 0&0\\ \displaystyle 0&0\end{array}\right),

the \mathcal{F}–adapted with continuous sample paths stochastic process ((zr,vk),(zr,vk)n+2):[0,T]×Ωn+2((z^{k}_{r,v})^{\top},(z^{k}_{r,v})^{n+2})^{\top}:[0,T]\times\Omega\rightarrow\mathbb{R}^{n+2} solves a higher dimensional version of (6) with

A=Ak+1,D=Dk+1,and initial condition(ξ~r,vk+10)(b~k+1(r,v,xk+1(r))b~k+1(r,uk+1(r),xk+1(r))0),A=A_{k+1},\ D=D_{k+1},\ \textnormal{and initial condition}\ \left(\begin{array}[]{c}\tilde{\xi}^{k+1}_{r,v}\\ 0\end{array}\right)\triangleq\left(\begin{array}[]{c}\tilde{b}_{k+1}(r,v,x_{k+1}(r))-\tilde{b}_{k+1}(r,u_{k+1}(r),x_{k+1}(r))\\ 0\end{array}\right),

where again we denote b~k+1=(bk+1,fk+10)\tilde{b}_{k+1}=(b^{\top}_{k+1},f^{0}_{k+1})^{\top}.

Now, fix rr\in\mathcal{L} and vUv\in U. The following comes from combining Lemma 3.3.2 with [47, Lemma 3.11]. {lmm}[Pointwise convergence on controls] Under (A4)(A_{4}), there exists (rk)k(0,r)(r_{k})_{k\in\mathbb{N}}\subseteq(0,r) such that for every kk\in\mathbb{N}, rkr_{k} is a Lebesgue point for uku_{k}, and rkrr_{k}\rightarrow r, uk(rk)u(r)u_{k}(r_{k})\rightarrow u(r) as kk\rightarrow\infty. If (rk)k(0,r)(r_{k})_{k\in\mathbb{N}}\subseteq(0,r) denotes the sequence given by Lemma 4.2.2, we define ξ~k=ξ~rk,vk\tilde{\xi}_{k}=\tilde{\xi}^{k}_{r_{k},v} and ξ~=ξ~r,v\tilde{\xi}=\tilde{\xi}_{r,v}. Straightforward computations give (below, C0C\geq 0 is a constant)

𝔼[ξ~k+1ξ~2]\displaystyle\mathbb{E}\left[\|\tilde{\xi}_{k+1}-\tilde{\xi}\|^{2}\right]\leq C(uk+1(rk+1)u(r)2+|rk+1r|2\displaystyle C\bigg{(}\|u_{k+1}(r_{k+1})-u(r)\|^{2}+|r_{k+1}-r|^{2} (22)
+𝔼[x(rk+1)x(r)2]+𝔼[sups[0,T]xk(s)x(s)2]),\displaystyle+\mathbb{E}\left[\|x(r_{k+1})-x(r)\|^{2}\right]+\mathbb{E}\left[\underset{s\in[0,T]}{\sup}\ \|x_{k}(s)-x(s)\|^{2}\right]\bigg{)},

from which \mathbb{E}\left[\|\tilde{\xi}_{k+1}-\tilde{\xi}\|^{2}\right]\rightarrow 0 as k\rightarrow\infty (by Lemmas 4.2.1 and 4.2.2). Therefore, from Lemma 4.2.2 we infer that

𝔼[supt[r,T]zrk,vk(t)zr,v(t)2]0,k,\mathbb{E}\left[\underset{t\in[r,T]}{\sup}\ \|z^{k}_{r_{k},v}(t)-z_{r,v}(t)\|^{2}\right]\rightarrow 0,\quad k\rightarrow\infty,

which, together with Δk0\Delta_{k}\rightarrow 0, readily yields

𝔼[|(zrk,vk)n+2(T)|]0,k.\mathbb{E}\left[|(z^{k}_{r_{k},v})^{n+2}(T)|\right]\rightarrow 0,\quad k\rightarrow\infty.

At this step, we point out that the variational inequalities in (10) and in (17) still hold if we take multipliers of norm one. Specifically, we may assume that \|(\tilde{\mathfrak{p}}^{\top}_{k+1},p^{1}_{k+1})^{\top}\|=1 for every k\in\mathbb{N}. Therefore, up to some subsequence, there exists a vector (\tilde{\mathfrak{p}}^{\top},p^{1})^{\top}=(\mathfrak{p}^{\top},p^{0},p^{1})^{\top}\in\mathbb{R}^{q+2} such that (\tilde{\mathfrak{p}}^{\top}_{k},p^{1}_{k})^{\top}\rightarrow(\tilde{\mathfrak{p}}^{\top},p^{1})^{\top} as k\rightarrow\infty, satisfying (\tilde{\mathfrak{p}}^{\top},p^{1})^{\top}\neq 0 and p^{0}\leq 0. We use this remark to conclude as follows. The definition of \tilde{g} and the Hölder inequality give (below, C\geq 0 is a constant)

𝔭~𝔼[g~x~(x~(T))zr,v(T)]|𝔭~𝔼[g~x~(x~(T))zr,v(T)](𝔭~k+1pk+11)𝔼[(g~x~(x~k(T))001)(zrk+1,vk+1(T)(zrk+1,vk+1)n+2(T))]|\displaystyle\tilde{\mathfrak{p}}^{\top}\mathbb{E}\left[\frac{\partial\tilde{g}}{\partial\tilde{x}}(\tilde{x}(T))z_{r,v}(T)\right]\leq\left|\tilde{\mathfrak{p}}^{\top}\mathbb{E}\left[\frac{\partial\tilde{g}}{\partial\tilde{x}}(\tilde{x}(T))z_{r,v}(T)\right]-\left(\begin{array}[]{c}\tilde{\mathfrak{p}}_{k+1}\\ p^{1}_{k+1}\end{array}\right)^{\top}\mathbb{E}\left[\left(\begin{array}[]{cc}\displaystyle\frac{\partial\tilde{g}}{\partial\tilde{x}}(\tilde{x}_{k}(T))&0\\ 0&1\end{array}\right)\left(\begin{array}[]{c}z^{k+1}_{r_{k+1},v}(T)\\ (z^{k+1}_{r_{k+1},v})^{n+2}(T)\end{array}\right)\right]\right|
C(𝔭~𝔭~k+1+pk+11𝔼[|(zrk,vk)n+2(T)|]\displaystyle\leq C\bigg{(}\|\tilde{\mathfrak{p}}-\tilde{\mathfrak{p}}_{k+1}\|+p^{1}_{k+1}\mathbb{E}\left[|(z^{k}_{r_{k},v})^{n+2}(T)|\right]
+𝔼[zrk+1,vk+1(T)zr,v(T)2]12+𝔼[zr,v(T)2]12𝔼[sups[0,T]xk(s)x(s)2]12),\displaystyle\hskip 129.16626pt+\mathbb{E}\left[\|z^{k+1}_{r_{k+1},v}(T)-z_{r,v}(T)\|^{2}\right]^{\frac{1}{2}}+\mathbb{E}\Big{[}\|z_{r,v}(T)\|^{2}\Big{]}^{\frac{1}{2}}\mathbb{E}\left[\underset{s\in[0,T]}{\sup}\ \|x_{k}(s)-x(s)\|^{2}\right]^{\frac{1}{2}}\bigg{)},

and in this case, (21) follows from Lemma 4.2.1 and the convergences obtained above. From what we showed in Section 4.1.4, this latter inequality yields that (u,𝔭,p0)(u,\mathfrak{p},p^{0}) is extremal for OCP as soon as (𝔭,p0)0(\mathfrak{p}^{\top},p^{0})^{\top}\neq 0.

Finally, we turn to the case in which the sequence (u_{k})_{k\in\mathbb{N}}\subseteq\mathcal{U} converges to u\in\mathcal{U} for the strong topology of L^{2}, but without assuming (A_{4}). For this, fix v\in U and define the stochastic processes \tilde{\xi}_{k}(s)=\tilde{\xi}^{k}_{s,v} and \tilde{\xi}(s)=\tilde{\xi}_{s,v}, where s\in[0,T]. Computations similar to the ones developed for the bound (22) provide that \int^{T}_{0}\mathbb{E}\left[\|\tilde{\xi}_{k}(s)-\tilde{\xi}(s)\|^{2}\right]\;\mathrm{d}s\rightarrow 0 as k\rightarrow\infty, and therefore, up to some subsequence, the quantity \mathbb{E}\left[\|\tilde{\xi}_{k}(s)-\tilde{\xi}(s)\|^{2}\right] converges to zero as k\rightarrow\infty for a.e. s\in[0,T]. By taking countable intersections of sets of Lebesgue points (one for each control u_{k}, for all k\in\mathbb{N}), it follows that the argument above can be iterated exactly in the same manner (via Lemma 4.2.2), leading to the same conclusion.

4.2.3. Convergence of Multipliers and Conclusion

By applying the construction developed in Section 4.1.4 to the variational inequality (21), we retrieve a tuple (p,𝔭,p0,q)(p,\mathfrak{p},p^{0},q), where 𝔭q\mathfrak{p}\in\mathbb{R}^{q}, p00p^{0}\leq 0 are constant, pL2(Ω;C0([0,T];n))p\in L^{2}_{\mathcal{F}}(\Omega;C^{0}([0,T];\mathbb{R}^{n})) and qL2([0,T]×Ω;n)q\in L^{2}_{\mathcal{F}}([0,T]\times\Omega;\mathbb{R}^{n}), such that, as soon as (𝔭,p0)0(\mathfrak{p}^{\top},p^{0})^{\top}\neq 0, the tuple (u,𝔭,p0,q)(u,\mathfrak{p},p^{0},q) is extremal for OCP associated with (u,x,p,𝔭,p0,q)(u,x,p,\mathfrak{p},p^{0},q) satisfying conditions 1., 2., and 3. of Theorem 3.2.1. To conclude, it only remains to prove that

𝔼[sups[0,T]pk(s)p(s)2]0,0T𝔼[qk(s)q(s)2]ds0,k.\mathbb{E}\left[\underset{s\in[0,T]}{\sup}\ \|p_{k}(s)-p(s)\|^{2}\right]\rightarrow 0,\quad\int^{T}_{0}\mathbb{E}\Big{[}\|q_{k}(s)-q(s)\|^{2}\Big{]}\;\mathrm{d}s\rightarrow 0,\quad k\rightarrow\infty. (23)

Here, each tuple (uk,𝔭k,pk0)(u_{k},\mathfrak{p}_{k},p^{0}_{k}) is the extremal of LOCPkΔ{}^{\Delta}_{k} associated with the tuple (uk,xk,pk,𝔭k,pk0,qk)(u_{k},x_{k},p^{k},\mathfrak{p}_{k},p^{0}_{k},q_{k}), where (pk,qk)(p_{k},q_{k}) solves the adjoint equation of Theorem 3.2.2. Since, under the assumption (𝔭,p0)0(\mathfrak{p}^{\top},p^{0})^{\top}\neq 0, the multipliers pk1p^{1}_{k}, kk\in\mathbb{N}, and p1p^{1} do not play any role, for the sake of clarity and brevity of notation, and without loss of generality, in what follows we implicitly assume pk1=p1=0p^{1}_{k}=p^{1}=0, for every kk\in\mathbb{N}.

Let us start with the first convergence of (23). For this, fix kk\in\mathbb{N} and consider the process

(𝔼[sups[0,T]g~x~(x~k(T))ϕk+1(T)ψk+1(s)g~x~(x~(T))ϕ(T)ψ(s)|t])t[0,T],\left(\mathbb{E}\left[\underset{s\in[0,T]}{\sup}\ \left\|\frac{\partial\tilde{g}}{\partial\tilde{x}}(\tilde{x}_{k}(T))\phi_{k+1}(T)\psi_{k+1}(s)-\frac{\partial\tilde{g}}{\partial\tilde{x}}(\tilde{x}(T))\phi(T)\psi(s)\right\|\bigg{|}\mathcal{F}_{t}\right]\right)_{t\in[0,T]},

where ϕk+1\phi_{k+1}, ψk+1\psi_{k+1} solve (7) with matrices Ak+1A_{k+1}, Dk+1D_{k+1} from which, due to pk1=p1=0p^{1}_{k}=p^{1}=0, for every kk\in\mathbb{N}, we remove the last column and row, whereas ϕ\phi, ψ\psi solve (7) with matrices AA, DD, those matrices being defined above. Due to a straightforward extension of Lemma 2.1 to equations (7), this process is a martingale, bounded in L2L^{2}. Hence, the martingale representation theorem allows us to infer that this process is a martingale with continuous sample paths, and Doob and Jensen inequalities give

𝔼\displaystyle\mathbb{E} [supt[0,T]𝔼[sups[0,T]g~x~(x~k(T))ϕk+1(T)ψk+1(s)g~x~(x~(T))ϕ(T)ψ(s)|t]2]\displaystyle\left[\underset{t\in[0,T]}{\sup}\ \mathbb{E}\left[\underset{s\in[0,T]}{\sup}\ \left\|\frac{\partial\tilde{g}}{\partial\tilde{x}}(\tilde{x}_{k}(T))\phi_{k+1}(T)\psi_{k+1}(s)-\frac{\partial\tilde{g}}{\partial\tilde{x}}(\tilde{x}(T))\phi(T)\psi(s)\right\|\bigg{|}\mathcal{F}_{t}\right]^{2}\right]\leq (24)
4𝔼[sups[0,T]g~x~(x~k(T))ϕk+1(T)ψk+1(s)g~x~(x~(T))ϕ(T)ψ(s)2],\displaystyle\leq 4\mathbb{E}\left[\underset{s\in[0,T]}{\sup}\ \left\|\frac{\partial\tilde{g}}{\partial\tilde{x}}(\tilde{x}_{k}(T))\phi_{k+1}(T)\psi_{k+1}(s)-\frac{\partial\tilde{g}}{\partial\tilde{x}}(\tilde{x}(T))\phi(T)\psi(s)\right\|^{2}\right],

which holds for every kk\in\mathbb{N}. By combining (24) with (18), we compute

\displaystyle\mathbb{E}\left[\underset{s\in[0,T]}{\sup}\ \|p_{k+1}(s)-p(s)\|^{2}\right]\leq C\,\mathbb{E}\left[\underset{s\in[0,T]}{\sup}\ \left\|\frac{\partial\tilde{g}}{\partial\tilde{x}}(\tilde{x}(T))\phi(T)\psi(s)\right\|^{2}\right]\|\tilde{\mathfrak{p}}_{k+1}-\tilde{\mathfrak{p}}\|^{2}
+C\,\mathbb{E}\left[\underset{s\in[0,T]}{\sup}\ \left\|\frac{\partial\tilde{g}}{\partial\tilde{x}}(\tilde{x}_{k}(T))\phi_{k+1}(T)\psi_{k+1}(s)-\frac{\partial\tilde{g}}{\partial\tilde{x}}(\tilde{x}(T))\phi(T)\psi(s)\right\|^{2}\right],

where C0C\geq 0 is a constant. Up to some subsequence, the first term on the right-hand side converges to zero. Moreover, the definition of g~\tilde{g} and Hölder inequality give

𝔼[sups[0,T]g~x~(x~k(T))ϕk+1(T)ψk+1(s)g~x~(x~(T))ϕ(T)ψ(s)2]\displaystyle\mathbb{E}\left[\underset{s\in[0,T]}{\sup}\ \left\|\frac{\partial\tilde{g}}{\partial\tilde{x}}(\tilde{x}_{k}(T))\phi_{k+1}(T)\psi_{k+1}(s)-\frac{\partial\tilde{g}}{\partial\tilde{x}}(\tilde{x}(T))\phi(T)\psi(s)\right\|^{2}\right]\leq
C𝔼[ϕk+1(T)4]12𝔼[sups[0,T]ψk+1(s)ψ(s)4]12\displaystyle\leq C\mathbb{E}\left[\|\phi_{k+1}(T)\|^{4}\right]^{\frac{1}{2}}\mathbb{E}\left[\underset{s\in[0,T]}{\sup}\ \|\psi_{k+1}(s)-\psi(s)\|^{4}\right]^{\frac{1}{2}}
+C(𝔼[sups[0,T]ψ(s)4]12𝔼[sups[0,T]ϕk+1(s)ϕ(s)4]12+𝔼[sups[0,T]ϕ(T)ψ(s)4]12𝔼[sups[0,T]xk(s)x(s)4]12),\displaystyle\ +C\left(\mathbb{E}\left[\underset{s\in[0,T]}{\sup}\ \|\psi(s)\|^{4}\right]^{\frac{1}{2}}\mathbb{E}\left[\underset{s\in[0,T]}{\sup}\ \|\phi_{k+1}(s)-\phi(s)\|^{4}\right]^{\frac{1}{2}}+\mathbb{E}\left[\underset{s\in[0,T]}{\sup}\ \|\phi(T)\psi(s)\|^{4}\right]^{\frac{1}{2}}\mathbb{E}\left[\underset{s\in[0,T]}{\sup}\ \|x_{k}(s)-x(s)\|^{4}\right]^{\frac{1}{2}}\right),

and a straightforward extension of Lemma 4.2.2 to equations (7) entails that all the terms on the right-hand side tend to zero. The convergence of (pk)k(p_{k})_{k\in\mathbb{N}} is proved.

It remains to prove the second convergence of (23). For this, we apply Itô formula to pk+1(t)p(t)2\|p_{k+1}(t)-p(t)\|^{2}, which due to (18), (24) and Lemma 2.1 extended to (7) gives (below, C0C\geq 0 is a constant)

\displaystyle\mathbb{E}\Big{[}\|p_{k+1}(0)-p(0)\|^{2}\Big{]}+\int^{T}_{0}\mathbb{E}\Big{[}\|q_{k+1}(s)-q(s)\|^{2}\Big{]}\;\mathrm{d}s=\mathbb{E}\Big{[}\|p_{k+1}(T)-p(T)\|^{2}\Big{]}
\displaystyle+2\mathbb{E}\bigg{[}\int^{T}_{0}\Big{(}p_{k+1}(s)-p(s)\Big{)}^{\top}\bigg{(}p_{k+1}(s)^{\top}\frac{\partial b_{k+1}}{\partial x}(s,u_{k+1}(s),x_{k+1}(s))-p(s)^{\top}\frac{\partial b}{\partial x}(s,u(s),x(s))\bigg{)}\;\mathrm{d}s\bigg{]}
\displaystyle+2\mathbb{E}\left[\int^{T}_{0}\Big{(}p_{k+1}(s)-p(s)\Big{)}^{\top}\bigg{(}p^{0}_{k+1}\frac{\partial f^{0}_{k+1}}{\partial x}(s,u_{k+1}(s),x_{k+1}(s))-p^{0}\frac{\partial f^{0}}{\partial x}(s,u(s),x(s))\bigg{)}\;\mathrm{d}s\right]
\displaystyle+2\mathbb{E}\left[\int^{T}_{0}\Big{(}p_{k+1}(s)-p(s)\Big{)}^{\top}\bigg{(}q_{k+1}(s)^{\top}\frac{\partial\sigma_{k+1}}{\partial x}(s,x_{k+1}(s))-q(s)^{\top}\frac{\partial\sigma}{\partial x}(s,x(s))\bigg{)}\;\mathrm{d}s\right]
\displaystyle\leq C\mathbb{E}\left[\underset{s\in[0,T]}{\sup}\ \|p_{k+1}(s)-p(s)\|^{2}\right]^{\frac{1}{2}}\left(\int^{T}_{0}\mathbb{E}\Big{[}\|q_{k+1}(s)-q(s)\|^{2}\Big{]}\;\mathrm{d}s\right)^{\frac{1}{2}}
\displaystyle\quad+C\mathbb{E}\left[\underset{s\in[0,T]}{\sup}\ \|p_{k+1}(s)-p(s)\|^{2}\right]^{\frac{1}{2}}\left(1+\left(\int^{T}_{0}\mathbb{E}\Big{[}\|q(s)\|^{2}\Big{]}\;\mathrm{d}s\right)^{\frac{1}{2}}\right).

The conclusion finally follows from Young’s inequality and the convergence of (pk)k(p_{k})_{k\in\mathbb{N}}.

5. Extension to Problems with Free Final Time and Stochastic Controls

The accumulation properties of SCP can be extended to OCPs with free final time and in which optimization is carried out over stochastic controls. Specifically, in this section we investigate how the results of Theorem 3.3.1 may be extended to finite-horizon, finite-dimensional non-linear stochastic General Optimal Control Problems (GOCP) having control-affine dynamics and uncontrolled diffusion, of the form

{minu,tf𝔼[0tff0(s,u(s),x(s))ds]𝔼[0tf(G(u(s))+H(x(s))+L(s,u(s),x(s)))ds]dx(t)=b(t,u(t),x(t))dt+σ(t,x(t))dBt,x(0)=x0,𝔼[g(x(tf))]=0,\begin{cases}\displaystyle\underset{u,t_{f}}{\min}\ \mathbb{E}\left[\int^{t_{f}}_{0}f^{0}(s,u(s),x(s))\;\mathrm{d}s\right]\triangleq\mathbb{E}\left[\int^{t_{f}}_{0}\Big{(}G(u(s))+H(x(s))+L(s,u(s),x(s))\Big{)}\;\mathrm{d}s\right]\vskip 6.0pt plus 2.0pt minus 2.0pt\\ \mathrm{d}x(t)=b(t,u(t),x(t))\;\mathrm{d}t+\sigma(t,x(t))\;\mathrm{d}B_{t},\quad x(0)=x^{0},\quad\mathbb{E}\left[g(x(t_{f}))\right]=0,\end{cases}

where the final time 0tfT0\leq t_{f}\leq T may be free or not (here, T>0T>0 is some fixed maximal time), and we optimize over controls u𝒰u\in\mathcal{U} which are either deterministic, i.e., 𝒰=L2([0,T];U)\mathcal{U}=L^{2}([0,T];U), or stochastic, i.e., 𝒰=L2([0,T]×Ω;U)\mathcal{U}=L^{2}_{\mathcal{F}}([0,T]\times\Omega;U). Solutions to GOCP will be denoted by (tf,u,x)[0,T]×𝒰×L(Ω;C([0,T];n))(t_{f},u,x)\in[0,T]\times\mathcal{U}\times L^{\ell}_{\mathcal{F}}(\Omega;C([0,T];\mathbb{R}^{n})), for 2\ell\geq 2.

By adopting the notation given in (3), the stochastic Linearized General Optimal Control Problem (LGOCPk+1Δ{}^{\Delta}_{k+1}) at iteration kk\in\mathbb{N} may be defined accordingly as

{minu,tf𝔼[0tffk+10(s,u(s),x(s))ds]𝔼[0tffuk,xk0(s,u(s),x(s))ds]dx(t)=bk+1(t,u(t),x(t))dt+σk+1(t,x(t))dBt,x(0)=x0buk,xk(t,u(t),x(t))dt+σxk(t,x(t))dBt𝔼[gk+1(x(tf))]𝔼[g(xk(tfk))+gx(xk(tfk))(x(tf)xk(tfk))]=00T𝔼[x(s)xk(s)2]dsΔk+1,|tftfk|Δk+1,\begin{cases}\displaystyle\underset{u,t_{f}}{\min}\ \mathbb{E}\left[\int^{t_{f}}_{0}f^{0}_{k+1}(s,u(s),x(s))\;\mathrm{d}s\right]\triangleq\mathbb{E}\bigg{[}\int^{t_{f}}_{0}f^{0}_{u_{k},x_{k}}(s,u(s),x(s))\;\mathrm{d}s\bigg{]}\vskip 6.0pt plus 2.0pt minus 2.0pt\\ \mathrm{d}x(t)=b_{k+1}(t,u(t),x(t))\;\mathrm{d}t+\sigma_{k+1}(t,x(t))\;\mathrm{d}B_{t},\quad x(0)=x^{0}\vskip 6.0pt plus 2.0pt minus 2.0pt\\ \hskip 25.83325pt\triangleq b_{u_{k},x_{k}}(t,u(t),x(t))\;\mathrm{d}t+\sigma_{x_{k}}(t,x(t))\;\mathrm{d}B_{t}\vskip 6.0pt plus 2.0pt minus 2.0pt\\ \displaystyle\mathbb{E}\left[g_{k+1}(x(t_{f}))\right]\triangleq\mathbb{E}\left[g(x_{k}(t^{k}_{f}))+\frac{\partial g}{\partial x}(x_{k}(t^{k}_{f}))(x(t_{f})-x_{k}(t^{k}_{f}))\right]=0\vskip 6.0pt plus 2.0pt minus 2.0pt\\ \displaystyle\int^{T}_{0}\mathbb{E}\left[\|x(s)-x_{k}(s)\|^{2}\right]\;\mathrm{d}s\leq\Delta_{k+1},\quad|t_{f}-t^{k}_{f}|\leq\Delta_{k+1},\end{cases}

whose solutions are denoted by (tfk+1,uk+1,xk+1)[0,T]×𝒰×L(Ω;C([0,T];n))(t^{k+1}_{f},u_{k+1},x_{k+1})\in[0,T]\times\mathcal{U}\times L^{\ell}_{\mathcal{F}}(\Omega;C([0,T];\mathbb{R}^{n})), for 2\ell\geq 2. In particular, trust-region constraints are now imposed on the variable tft_{f} as well to derive convergence guarantees.
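To fix ideas, the resulting outer iteration can be summarized by the following sketch, written under assumed interfaces: solve_lgocp is a placeholder routine that solves the convex subproblem LGOCP^{\Delta}_{k+1} around the previous iterate, and the dictionary fields are illustrative names rather than notation from the paper.

```python
def scp_free_final_time(prev, solve_lgocp, Delta0=1.0, shrink=0.5, max_iter=20, tol=1e-6):
    """Sketch of SCP with trust regions on both the state trajectory and the final time."""
    Delta = Delta0
    for _ in range(max_iter):
        t_f, u, x, info = solve_lgocp(prev, Delta)       # solve LGOCP^Delta_{k+1}
        # progress measured by the trust-region quantities of the subproblem:
        # the integral state deviation and the final-time deviation |t_f - t_f^k|
        step = max(info["state_deviation"], abs(t_f - prev["t_f"]))
        prev = {"t_f": t_f, "u": u, "x": x}
        if step < tol:
            break
        Delta *= shrink                                  # Delta_k -> 0, as in the assumptions
    return prev
```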

We point out that the results we present in this section are not to be considered part of the main contribution; rather, they aim to provide insights for the future design of efficient numerical methods for stochastic optimal control problems with free final time and stochastic admissible controls. In particular, extending SCP in the presence of a free final time and of stochastic controls requires introducing additional assumptions which might seem demanding. We leave the investigation of the validity of this framework under sharper assumptions as a future direction of research.

5.1. Refined assumptions and extended result of convergence

The presence of a free final time hinders the well-posedness of the stochastic PMP when applied to each LGOCP^{\Delta}_{k+1}, i.e., of Theorem 3.2.1, and in turn the validity of Theorem 3.3.1 in this more general setting. To overcome this issue, we need to tighten assumptions (A_{1})–(A_{3}). Specifically, although (A_{1}) remains unchanged, assumptions (A_{2})–(A_{3}) are replaced by the following, respectively:

(A2)(A^{\prime}_{2}) Mappings gg, HH, and LiL_{i}, i=0,,mi=0,\dots,m, either are affine-in-state or have compact supports in n\mathbb{R}^{n} and in [0,T]×n[0,T]\times\mathbb{R}^{n}, respectively. In the case of free final time, gg is affine.

(A3)(A^{\prime}_{3}) For every kk\in\mathbb{N}, LGOCPk+1Δ{}^{\Delta}_{k+1} is feasible. In addition, in the case of free final time, for every kk\in\mathbb{N}, any optimal control uku_{k} for LGOCPkΔ{}^{\Delta}_{k} has continuous sample paths at the optimal final time tfk+1t^{k+1}_{f} for LGOCPk+1Δ{}^{\Delta}_{k+1}.

Assumption (A^{\prime}_{3}) plays a crucial role in providing the existence of pointwise necessary conditions for optimality for the linearized problems LGOCP^{\Delta}_{k}, thus enabling classical formulations of the stochastic PMP. However, we recognize that (A^{\prime}_{3}) might be demanding and, as a future research direction, we propose to relax it by leveraging integral-type necessary conditions for optimality, which are best suited to deal with optimal control problems that show explicit discontinuous dependence on time (e.g., [53]). Unfortunately, when optimization over stochastic controls is adopted, successfully leveraging weakly converging subsequences of controls to show “weak” guarantees of success for SCP, i.e., the equivalent of Theorem 3.3.2, becomes more challenging; in that case only Theorem 3.3.1 still holds, though under appropriate modifications. We leave proving success guarantees for SCP when optimizing over stochastic controls under weaker assumptions as a future direction of research.

Extending the stochastic PMP to GOCP and LGOCPk+1Δ{}^{\Delta}_{k+1} comes by assuming (A1)(A_{1}), (A2)(A^{\prime}_{2}), and (A3)(A^{\prime}_{3}). In particular, under those assumptions we have the following extension of Theorem 3.2.1 and of Theorem 3.2.2: {thrm}[Stochastic Pontryagin Maximum Principle for GOCP] Let (tf,u,x)(t_{f},u,x) be a locally-optimal solution to GOCP. There exist pL2(Ω;C([0,T];n))p\in L^{2}_{\mathcal{F}}(\Omega;C([0,T];\mathbb{R}^{n})) and a tuple (𝔭,p0,q)(\mathfrak{p},p^{0},q), where 𝔭q\mathfrak{p}\in\mathbb{R}^{q} and p00p^{0}\leq 0 are constant, and q=(q1,,qd)L2([0,T]×Ω;n×d)q=(q_{1},\dots,q_{d})\in L^{2}_{\mathcal{F}}([0,T]\times\Omega;\mathbb{R}^{n\times d}) such that the following relations are satisfied:

  1. (1)

    Non-Triviality Condition: (𝔭,p0)0(\mathfrak{p},p^{0})\neq 0.

  2. (2)

    Adjoint Equation:

    dp(t)\displaystyle\mathrm{d}p(t) =Hx(t,u(t),x(t),p(t),p0,q(t))dt+q(t)dBt,p(tf)=𝔼[gx(x(tf))]𝔭n.\displaystyle=\displaystyle-\frac{\partial H}{\partial x}(t,u(t),x(t),p(t),p^{0},q(t))\;\mathrm{d}t+q(t)\;\mathrm{d}B_{t},\quad p(t_{f})=\displaystyle\mathbb{E}\left[\frac{\partial g}{\partial x}(x(t_{f}))\right]^{\top}\mathfrak{p}\in\mathbb{R}^{n}.
  3. (3)

    Maximality Condition:

    u(t)=argmaxvU𝔼[H(t,v,x(t),p(t),p0,q(t))],a.e.(deterministic controls){\color[rgb]{0,0,0}u(t)=\underset{v\in U}{\arg\max}\ \mathbb{E}\Big{[}H(t,v,x(t),p(t),p^{0},q(t))\Big{]}},\ \textnormal{a.e.}\ \textnormal{(deterministic controls)}
    u(t)=argmaxvUH(t,v,x(t),p(t),p0,q(t)),a.e.,  a.s.(stochastic controls){\color[rgb]{0,0,0}u(t)=\underset{v\in U}{\arg\max}\ H(t,v,x(t),p(t),p^{0},q(t))},\ \textnormal{a.e., \ a.s.}\ \textnormal{(stochastic controls)}
  4. (4)

    Transversality Condition: if the final time is free

    maxvU𝔼[H(tf,v,x(tf),p(tf),p0,q(tf))]0(deterministic controls){\color[rgb]{0,0,0}\underset{v\in U}{\max}\ \mathbb{E}\Big{[}H(t_{f},v,x(t_{f}),p(t_{f}),p^{0},q(t_{f}))\Big{]}\geq 0}\ \textnormal{(deterministic controls)}
    𝔼[maxvUH(tf,v,x(tf),p(tf),p0,q(tf))]0(stochastic controls){\color[rgb]{0,0,0}\mathbb{E}\left[\underset{v\in U}{\max}\ H(t_{f},v,x(t_{f}),p(t_{f}),p^{0},q(t_{f}))\right]\geq 0}\ \textnormal{(stochastic controls)}

    where equalities hold in the case tf<Tt_{f}<T.

The quantity (tf,u,𝔭,p0)(t_{f},u,\mathfrak{p},p^{0}) uniquely determines xx, pp, and qq and is called extremal for GOCP (associated with the tuple (tf,u,x,p,𝔭,p0,q)(t_{f},u,x,p,\mathfrak{p},p^{0},q), or simply with (tf,u,x)(t_{f},u,x)). An extremal (tf,u,𝔭,p0)(t_{f},u,\mathfrak{p},p^{0}) is called normal if p00p^{0}\neq 0.

{thrm}

[Weak Stochastic Pontryagin Maximum Principle for LGOCPkΔ{}^{\Delta}_{k}] Let (tfk,uk,xk)(t^{k}_{f},u_{k},x_{k}) be a locally-optimal solution to LGOCPkΔ{}^{\Delta}_{k}. There exist pkL2(Ω;C([0,T];n))p_{k}\in L^{2}_{\mathcal{F}}(\Omega;C([0,T];\mathbb{R}^{n})), a tuple (𝔭k,pk0,pk1)(\mathfrak{p}_{k},p^{0}_{k},p^{1}_{k}), where 𝔭kq\mathfrak{p}_{k}\in\mathbb{R}^{q}, pk00p^{0}_{k}\leq 0, and pk1p^{1}_{k}\in\mathbb{R} are constant, and qk=((qk)1,,(qk)d)L2([0,T]×Ω;n×d)q_{k}=((q_{k})_{1},\dots,(q_{k})_{d})\in L^{2}_{\mathcal{F}}([0,T]\times\Omega;\mathbb{R}^{n\times d}) such that the following relations are satisfied:

  1. (1)

    Non-Triviality Condition: (𝔭k,pk0,pk1)0(\mathfrak{p}_{k},p^{0}_{k},p^{1}_{k})\neq 0.

  2. (2)

    Adjoint Equation:

    dpk(t)\displaystyle\mathrm{d}p_{k}(t) =Hkx(t,uk(t),xk(t),pk(t),pk0,pk1,qk(t))dt+qk(t)dBt,pk(tfk)=𝔼[gkx(xk(tfk))]𝔭kn.\displaystyle=\displaystyle-\frac{\partial H_{k}}{\partial x}(t,u_{k}(t),x_{k}(t),p_{k}(t),p^{0}_{k},p^{1}_{k},q_{k}(t))\;\mathrm{d}t+q_{k}(t)\;\mathrm{d}B_{t},\quad p_{k}(t^{k}_{f})=\displaystyle\mathbb{E}\left[\frac{\partial g_{k}}{\partial x}(x_{k}(t^{k}_{f}))\right]^{\top}\mathfrak{p}_{k}\in\mathbb{R}^{n}.
  3. (3)

    Maximality Condition:

    uk(t)=argmaxvU𝔼[Hk(t,v,xk(t),pk(t),pk0,pk1,qk(t))],a.e.(deterministic controls)u_{k}(t)=\underset{v\in U}{\arg\max}\ \mathbb{E}\Big{[}H_{k}(t,v,x_{k}(t),p_{k}(t),p^{0}_{k},p^{1}_{k},q_{k}(t))\Big{]},\ \textnormal{a.e.}\ \textnormal{(deterministic controls)}
    uk(t)=argmaxvUHk(t,v,xk(t),pk(t),pk0,pk1,qk(t)),a.e.,  a.s.(stochastic controls)u_{k}(t)=\underset{v\in U}{\arg\max}\ H_{k}(t,v,x_{k}(t),p_{k}(t),p^{0}_{k},p^{1}_{k},q_{k}(t)),\ \textnormal{a.e., \ a.s.}\ \textnormal{(stochastic controls)}
  4. (4)

    Transversality Condition: if the final time is free

    maxvU𝔼[Hk(tfk,v,xk(tfk),pk(tfk),pk0,pk1,qk(tfk))]0(deterministic controls)\underset{v\in U}{\max}\ \mathbb{E}\Big{[}H_{k}(t^{k}_{f},v,x_{k}(t^{k}_{f}),p_{k}(t^{k}_{f}),p^{0}_{k},p^{1}_{k},q_{k}(t^{k}_{f}))\Big{]}\geq 0\ \textnormal{(deterministic controls)}
    𝔼[maxvUHk(tfk,v,xk(tfk),pk(tfk),pk0,pk1,qk(tfk))]0(stochastic controls)\mathbb{E}\left[\underset{v\in U}{\max}\ H_{k}(t^{k}_{f},v,x_{k}(t^{k}_{f}),p_{k}(t^{k}_{f}),p^{0}_{k},p^{1}_{k},q_{k}(t^{k}_{f}))\right]\geq 0\ \textnormal{(stochastic controls)}

    where equalities hold in the case tfk<Tt^{k}_{f}<T.

The quantity (tfk,uk,𝔭k,pk0,pk1)(t^{k}_{f},u_{k},\mathfrak{p}_{k},p^{0}_{k},p^{1}_{k}) uniquely determines xkx_{k}, pkp_{k}, and qkq_{k} and is called extremal for LOCPkΔ{}^{\Delta}_{k} (associated with (tfk,uk,xk,pk,𝔭k,pk0,pk1,qk)(t^{k}_{f},u_{k},x_{k},p_{k},\mathfrak{p}_{k},p^{0}_{k},p^{1}_{k},q_{k}), or (tfk,uk,xk)(t^{k}_{f},u_{k},x_{k})). An extremal (tfk,uk,𝔭k,pk0,pk1)(t^{k}_{f},u_{k},\mathfrak{p}_{k},p^{0}_{k},p^{1}_{k}) is called normal if pk00p^{0}_{k}\neq 0.

Extending the stochastic PMP to the new linearized problems LGOCP^{\Delta}_{k}, and in particular the new transversality condition 4., additionally requires assuming (A^{\prime}_{3}) together with (A_{1}) and (A^{\prime}_{2}). Although the proof of this result for fixed-final-time problems is well-established (see [20, Chapter 3]), we could not find any published proof of Theorem 5.1 when the final time is free. Therefore, we provide its proof in Section 5.2. The proof of Theorem 5.1 for the linearized problems is achieved similarly; thus, for the sake of clarity and brevity, we do not report the details of the latter. Thanks to Theorem 5.1, we can extend the convergence of SCP as follows:

{thrm}

[Generalized Properties of Accumulation Points for SCP] Assume that (A1)(A_{1}), (A2)(A^{\prime}_{2}), and (A3)(A^{\prime}_{3}) hold and that SCP generates a sequence (Δk,tfk,uk,xk)k(\Delta_{k},t^{k}_{f},u_{k},x_{k})_{k\in\mathbb{N}} such that (Δk)k+{0}(\Delta_{k})_{k\in\mathbb{N}}\subseteq\mathbb{R}_{+}\setminus\{0\} converges to zero, and for every k1k\geq 1, the tuple (tfk,uk,xk)(t^{k}_{f},u_{k},x_{k}) locally solves LGOCPkΔ{}^{\Delta}_{k}. For every k1k\geq 1, letting (tfk,uk,𝔭k,pk0,pk1)(t^{k}_{f},u_{k},\mathfrak{p}_{k},p^{0}_{k},p^{1}_{k}) be an extremal associated with (tfk,uk,xk)(t^{k}_{f},u_{k},x_{k}) for LGOCPkΔ{}^{\Delta}_{k} (whose existence is ensured by Theorem 5.1), assume the following Accumulation Condition holds:

  • (AC)

    Up to some subsequence, (tfk,uk,𝔭k,pk0,pk1)(t^{k}_{f},u_{k},\mathfrak{p}_{k},p^{0}_{k},p^{1}_{k}) converges to some (u,𝔭,p0,p1)L2([0,T];m)×q+2(u,\mathfrak{p},p^{0},p^{1})\in L^{2}([0,T];\mathbb{R}^{m})\times\mathbb{R}^{q+2} for the strong topology of L2([0,T]×Ω;m)×q+2L^{2}_{\mathcal{F}}([0,T]\times\Omega;\mathbb{R}^{m})\times\mathbb{R}^{q+2}.

If (\mathfrak{p},p^{0})\neq 0, then (t_{f},u,\mathfrak{p},p^{0}) is an extremal for GOCP associated with (t_{f},u,x_{u}), where t_{f} denotes the limit (up to a further subsequence) of (t^{k}_{f})_{k\in\mathbb{N}}.

The guarantees offered by Theorem 5.1 read similarly to those explained in Section 3.3. One sees that the computations provided in Section 4.2 for the proof of Theorem 3.3.1 generalize straightforwardly as soon as weak convergence of controls in L^{2}([0,T];\mathbb{R}^{m}) is replaced with strong convergence of controls in L^{2}_{\mathcal{F}}([0,T]\times\Omega;\mathbb{R}^{m}) (the proofs actually become simpler), and they may be leveraged to prove the convergence result above, provided that we are able to extend the proofs of Theorem 3.2.1 and Theorem 3.2.2 to the case of free final time and stochastic controls. Therefore, to conclude, in the next section we develop the necessary technical details which enable proving both results through the machinery developed in Section 4.2. Specifically, since the two proofs are similar, for the sake of clarity and brevity we only provide details for the proof of the generalized stochastic PMP (Theorem 5.1).

5.2. Proof of the extension for the stochastic Pontryagin Maximum Principle

Before getting started, we need to introduce the notion of a Lebesgue point for a stochastic control uL2([0,T]×Ω;m)u\in L^{2}_{\mathcal{F}}([0,T]\times\Omega;\mathbb{R}^{m}). For this, we adopt the theory of Bochner integrals, showing that L2([0,T]×Ω;m)L2([0,T];L𝒢2(Ω;m))L^{2}_{\mathcal{F}}([0,T]\times\Omega;\mathbb{R}^{m})\subseteq L^{2}([0,T];L^{2}_{\mathcal{G}}(\Omega;\mathbb{R}^{m})), where the latter is the space of Bochner integrable mappings u:[0,T]L𝒢2(Ω;m)u:[0,T]\rightarrow L^{2}_{\mathcal{G}}(\Omega;\mathbb{R}^{m}). Let uL2([0,T]×Ω;m)u\in L^{2}_{\mathcal{F}}([0,T]\times\Omega;\mathbb{R}^{m}). First of all, by definition we see that for almost every t[0,T]t\in[0,T], it holds that 𝔼[u(t)2]<\mathbb{E}\left[\|u(t)\|^{2}\right]<\infty, and thus this control is well-defined as a mapping u:[0,T]L𝒢2(Ω;m)u:[0,T]\rightarrow L^{2}_{\mathcal{G}}(\Omega;\mathbb{R}^{m}). Since Ω\Omega is second-countable, L𝒢2(Ω;m)L^{2}_{\mathcal{G}}(\Omega;\mathbb{R}^{m}) is separable, and therefore the claim follows from the Pettis measurability theorem once we prove that u:[0,T]L𝒢2(Ω;m)u:[0,T]\rightarrow L^{2}_{\mathcal{G}}(\Omega;\mathbb{R}^{m}) is strongly measurable with respect to the Lebesgue measure of [0,T][0,T]. For this, it is sufficient to show that for every A([0,T])𝒢A\in\mathcal{B}([0,T])\otimes\mathcal{G} and αL𝒢2(Ω;m)\alpha\in L^{2}_{\mathcal{G}}(\Omega;\mathbb{R}^{m}), the mapping t𝔼[𝟙A(t,ω)α(ω)]t\mapsto\mathbb{E}\left[\mathbbm{1}_{A}(t,\omega)\alpha(\omega)\right] is Lebesgue measurable. By fixing αL𝒢2(Ω;m)\alpha\in L^{2}_{\mathcal{G}}(\Omega;\mathbb{R}^{m}), this can be achieved by proving that the family {A([0,T])𝒢:t𝔼[𝟙A(t,ω)α(ω)]is Lebesgue measurable}\{A\in\mathcal{B}([0,T])\otimes\mathcal{G}:\ t\mapsto\mathbb{E}\left[\mathbbm{1}_{A}(t,\omega)\alpha(\omega)\right]\ \textnormal{is Lebesgue measurable}\} is a monotone class and then using standard monotone class arguments (the details are left to the reader). At this step, the Lebesgue differentiation theorem provides that for almost every t[0,T]t\in[0,T], the following relations hold:

limη01ηtt+η𝔼[u(s)u(t)]ds=0,limη01ηtt+η𝔼[u(s)u(t)2]ds=0.\underset{\eta\rightarrow 0}{\lim}\ \frac{1}{\eta}\int^{t+\eta}_{t}\mathbb{E}\Big{[}\|u(s)-u(t)\|\Big{]}\;\mathrm{d}s=0,\quad\underset{\eta\rightarrow 0}{\lim}\ \frac{1}{\eta}\int^{t+\eta}_{t}\mathbb{E}\Big{[}\|u(s)-u(t)\|^{2}\Big{]}\;\mathrm{d}s=0.

Such a time t\in[0,T] is called a Lebesgue point for the control u\in L^{2}_{\mathcal{F}}([0,T]\times\Omega;\mathbb{R}^{m}).
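As a purely illustrative aside, the defining quantities above can be approximated from sampled control paths; the sketch below assumes the control has been simulated on a uniform grid and stored as an array (shapes and names are assumptions, not notation from the paper). At a Lebesgue point the returned defect shrinks as eta tends to zero.

```python
import numpy as np

def lebesgue_defect(u_samples, grid, t_index, eta, power=2):
    """Approximate (1/eta) * int_t^{t+eta} E[ ||u(s) - u(t)||^power ] ds.
    `u_samples` has shape (M, N+1, m): M sample paths of the control on the grid."""
    dt = grid[1] - grid[0]
    window = max(1, int(round(eta / dt)))
    diff = u_samples[:, t_index:t_index + window, :] - u_samples[:, [t_index], :]
    moments = np.mean(np.sum(diff**2, axis=-1)**(power / 2), axis=0)  # E[||u(s)-u(t)||^power]
    return float(np.sum(moments) * dt / eta)
```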

5.2.1. Modified Needle-like Variations and End-point Mapping

We may extend the concept of needle-like variations and of end-point mapping previously introduced in Section 4.1.2 to the setting of free final time and of stochastic controls as follows.

Given an integer j\in\mathbb{N}, fix j times 0<t_{1}<\dots<t_{j}<t_{f} which are Lebesgue points for u, and fix j random variables u_{1},\dots,u_{j} such that u_{i}\in L^{2}_{\mathcal{F}_{t_{i}}}(\Omega;U). For fixed scalars 0\leq\eta_{i}<t_{i+1}-t_{i}, i=1,\dots,j-1, and 0\leq\eta_{j}<t_{f}-t_{j}, the needle-like variation \pi=\{t_{i},\eta_{i},u_{i}\}_{i=1,\dots,j} of the control u is defined to be the admissible control u_{\pi}(t)=u_{i} if t\in[t_{i},t_{i}+\eta_{i}] and u_{\pi}(t)=u(t) otherwise. Denote by \tilde{x}_{v} the solution related to an admissible control v of the augmented system

{dx(t)=b(t,v(t),x(t))dt+σ(t,x(t))dBt,x(0)=x0dxn+1(t)=f0(t,v(t),x(t))dt,xn+1(0)=0\begin{cases}\mathrm{d}x(t)=b(t,v(t),x(t))\;\mathrm{d}t+\sigma(t,x(t))\;\mathrm{d}B_{t},\quad x(0)=x^{0}\vskip 6.0pt plus 2.0pt minus 2.0pt\\ \mathrm{d}x^{n+1}(t)=f^{0}(t,v(t),x(t))\;\mathrm{d}t,\hskip 60.27759ptx^{n+1}(0)=0\end{cases}

and define the mapping g~:n+1q+1:x~(g(x),xn+1)\tilde{g}:\mathbb{R}^{n+1}\rightarrow\mathbb{R}^{q+1}:\tilde{x}\mapsto(g(x),x^{n+1}). For every fixed time t(tj,tf]t\in(t_{j},t_{f}], by denoting δtmin{ti+1ti,ttj,Tt:i=1,,j1}>0\delta_{t}\triangleq\min\{t_{i+1}-t_{i},t-t_{j},T-t:i=1,\dots,j-1\}>0, the end-point mapping at time tt is defined to be

Ftj:\displaystyle F^{j}_{t}:\ 𝒞tjBδtj+1(0)(×+j)q+1\displaystyle\mathcal{C}^{j}_{t}\triangleq B^{j+1}_{\delta_{t}}(0)\cap(\mathbb{R}\times\mathbb{R}^{j}_{+})\rightarrow\mathbb{R}^{q+1}
(δ,η1,,ηj)𝔼[g~(x~uπ(t+δ))]𝔼[g~(x~u(t))]\displaystyle(\delta,\eta_{1},\dots,\eta_{j})\mapsto\mathbb{E}\left[\tilde{g}(\tilde{x}_{u_{\pi}}(t+\delta))\right]-\mathbb{E}\left[\tilde{g}(\tilde{x}_{u}(t))\right]

where Bρj+1B^{j+1}_{\rho} is the open ball in j+1\mathbb{R}^{j+1} of radius ρ>0\rho>0. Variations on the variable δ\delta are necessary only if free-final-time problems are considered, in which case (A2)(A^{\prime}_{2}), and in particular the fact that gg is an affine function, play a crucial role for computations. Due to Lemma 2.1 (and to (A2)(A^{\prime}_{2}) in the case of free-final-time problems), it is not difficult to see that FtjF^{j}_{t} is Lipschitz (see also the argument developed to prove Lemma 5.2.1 below). In addition, this mapping may be Gateaux differentiated at zero along admissible directions of the cone 𝒞tj\mathcal{C}^{j}_{t}. For this, denote b~=(b,f0)\tilde{b}=(b^{\top},f^{0})^{\top}, σ~=(σ,0)\tilde{\sigma}=(\sigma^{\top},0)^{\top} and let zti,uiz_{t_{i},u_{i}} be the unique solution to (6) with ξti=b~(ti,ui,x(ti))b~(ti,u(ti),x(ti))\xi_{t_{i}}=\tilde{b}(t_{i},u_{i},x(t_{i}))-\tilde{b}(t_{i},u(t_{i}),x(t_{i})). {lmm}[Generalized stochastic needle-like variation formula] Let (δ,η1,,ηj)𝒞tj(\delta,\eta_{1},\dots,\eta_{j})\in\mathcal{C}^{j}_{t} and assume (A2)(A_{2}) holds (in particular, g~\tilde{g} is an affine function when δ0\delta\neq 0). If t>tjt>t_{j} is a Lebesgue point for uu, then it holds that

𝔼[g~(x~uπ(t+δ))g~(x~u(t))δg~x~(x~u(t))b~(t,u(t),xu(t))i=1jηig~x~(x~u(t))zti,ui(t)]=o(δ+i=1jηi).\bigg{\|}\mathbb{E}\bigg{[}\tilde{g}(\tilde{x}_{u_{\pi}}(t+\delta))-\tilde{g}(\tilde{x}_{u}(t))-\delta\frac{\partial\tilde{g}}{\partial\tilde{x}}(\tilde{x}_{u}(t))\tilde{b}(t,u(t),x_{u}(t))-\sum^{j}_{i=1}\eta_{i}\frac{\partial\tilde{g}}{\partial\tilde{x}}(\tilde{x}_{u}(t))z_{t_{i},u_{i}}(t)\bigg{]}\bigg{\|}=o\left(\delta+\sum^{j}_{i=1}\eta_{i}\right).

As for Lemma 4.1.2, we provide an extensive proof of Lemma 5.2.1 in the appendix.

5.2.2. Variational Inequalities and Conclusion

Similarly to what we have argued in Section 4.1.3, the main step in the proof of the stochastic PMP with free final time and stochastic controls proceeds by contradiction, leveraging Lemma 5.2.1. For this, from now on we assume that, when free, the final time is a Lebesgue point for the optimal control. Otherwise, one may proceed by mimicking the argument developed in [39, Section 7.3].

Assume that $(A_{1})$ and $(A^{\prime}_{2})$ hold. For every $j\in\mathbb{N}$, define the linear mapping

dF^{j}_{t_{f}}(\delta,\eta)=\delta\,\mathbb{E}\bigg[\frac{\partial\tilde{g}}{\partial\tilde{x}}(\tilde{x}_{u}(t_{f}))\tilde{b}(t_{f},u(t_{f}),x_{u}(t_{f}))\bigg]+\sum_{i=1}^{j}\eta_{i}\,\mathbb{E}\left[\frac{\partial\tilde{g}}{\partial\tilde{x}}(\tilde{x}_{u}(t_{f}))z_{t_{i},u_{i}}(t_{f})\right],

which, due to Lemma 5.2.1, satisfies

\underset{\alpha>0,\,\alpha\rightarrow 0}{\lim}\ \frac{F^{j}_{t_{f}}\left(\alpha(\delta,\eta)\right)}{\alpha}=dF^{j}_{t_{f}}(\delta,\eta),

for every $(\delta,\eta)\in\mathbb{R}\times\mathbb{R}^{j}_{+}$. Finally, consider the closed, convex cone of $\mathbb{R}^{q+1}$ given by

K\triangleq\textnormal{Cl}\bigg(\textnormal{Cone}\bigg\{\mathbb{E}\left[\frac{\partial\tilde{g}}{\partial\tilde{x}}(\tilde{x}_{u}(t_{f}))z_{t_{i},u_{i}}(t_{f})\right],\ \pm\mathbb{E}\left[\frac{\partial\tilde{g}}{\partial\tilde{x}}(\tilde{x}_{u}(t_{f}))\tilde{b}(t_{f},u(t_{f}),x_{u}(t_{f}))\right]\ \textnormal{for}\ u_{i}\in U\ \textnormal{and}\ t_{i}\in(0,t_{f})\ \textnormal{Lebesgue point of}\ u\bigg\}\bigg).

If it were the case that $K=\mathbb{R}^{q+1}$, then we would have $dF^{j}_{t_{f}}(\mathbb{R}\times\mathbb{R}^{j}_{+})=K=\mathbb{R}^{q+1}$ and, by [40, Lemma 12.1], the origin would be an interior point of $F^{j}_{t_{f}}(\mathcal{C}^{j}_{t_{f}})$. This would imply that $(t_{f},u,x)$ is not optimal for OCP, a contradiction.

The argument above (together with an application of the separating hyperplane theorem) provides the existence of a non-zero vector $\tilde{\mathfrak{p}}=(\mathfrak{p}^{\top},\mathfrak{p}^{0})\in\mathbb{R}^{q+1}$ such that the following variational inequalities hold:

\begin{cases}
\displaystyle\tilde{\mathfrak{p}}^{\top}\mathbb{E}\left[\frac{\partial\tilde{g}}{\partial\tilde{x}}(\tilde{x}_{u}(t_{f}))\tilde{b}(t_{f},u(t_{f}),x_{u}(t_{f}))\right]=0\\
\displaystyle\tilde{\mathfrak{p}}^{\top}\mathbb{E}\left[\frac{\partial\tilde{g}}{\partial\tilde{x}}(\tilde{x}_{u}(t_{f}))z_{r,v}(t_{f})\right]\leq 0,\quad r\in[0,t_{f}]\ \textnormal{Lebesgue point of}\ u,\ v\in L^{2}_{\mathcal{F}_{r}}(\Omega;U).
\end{cases} (25)

In the case of deterministic controls, the random variables $v\in L^{2}_{\mathcal{F}_{r}}(\Omega;U)$ in (25) are replaced by deterministic vectors $v\in U$. Moreover, when $t_{f}=T$, only negative variations of the final time are allowed; hence, in this case, the first equality of (25) becomes a greater-than-or-equal-to-zero inequality (see also [39, Chapter 7]).

The rest of the proof remains unchanged, i.e., it follows the argument of Section 4.1.4 verbatim, and the proof of Theorem 5.1 may be readily inferred from the computations of Section 4.2.

6. An Example Numerical Scheme

Although the procedure detailed previously provides methodological steps to tackle OCP through successive linearizations, numerically solving LCPs that depend on stochastic coefficients remains challenging. In this last section, under appropriate assumptions we propose an approximate, though very practical, numerical scheme to effectively solve each subproblem LOCP$^{\Delta}_{k}$. We stress that our main goal is not the development of a definitive algorithm, but rather to demonstrate how the theoretical insights provided by Theorem 3.3.1 may be leveraged to design efficient strategies for solving OCP in practice.

6.1. A Simplified Context

The proposed approach relies on a specific structure of the cost and of the dynamics of OCP. Specifically, we consider OCPs with deterministic admissible controls $u\in\mathcal{U}=L^{2}([0,T];U)$ only (over a fixed time horizon $[0,T]$) and whose cost functions $f^{0}$ are such that $H=0$. Moreover, we assume the state variable consists of two components $(x,z)\in\mathbb{R}^{n_{x}+n_{z}}$, $n_{x},n_{z}\in\mathbb{N}$, satisfying the following system of forward stochastic differential equations ($b^{x}$ and $b^{z}$ are defined accordingly, as in (1))

\begin{cases}
\mathrm{d}x(t)=b^{x}(t,u(t),x(t),z(t))\;\mathrm{d}t+\sigma(t,z(t))\;\mathrm{d}B_{t}, & x(0)=x^{0}\\
\mathrm{d}z(t)=b^{z}(t,u(t),z(t))\;\mathrm{d}t, & z(0)=z^{0}\in\mathbb{R}^{n_{z}}.
\end{cases} (26)

In particular, for any $\mathcal{F}$-adapted process $(x,z)$ with continuous sample paths solving (26) for a given control $u\in\mathcal{U}$, the component $z$ is deterministic. For the sake of clarity and to avoid cumbersome notation, from now on we assume $g=0$ and that $b^{x}$ does not explicitly depend on $z$; this is done without loss of generality.
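To make the role of this structure concrete, the following minimal sketch (our own illustration, with placeholder drift and diffusion functions that are not taken from the paper) propagates (26) with forward Euler for $z$ and Euler–Maruyama for $x$; since no Brownian increment enters the update of $z$, that component is deterministic regardless of the sampled path.

```python
# Minimal sketch of simulating (26): z follows a deterministic ODE, x an SDE.
# The drift/diffusion functions are illustrative placeholders, not the paper's.
import numpy as np

def b_x(t, u, x, z):   # hypothetical drift of the stochastic component x
    return -x + z

def b_z(t, u, z):      # hypothetical drift of the deterministic component z
    return u - z

def sigma(t, z):       # diffusion depends on (t, z) only, hence is deterministic
    return 0.1 * z

def simulate(u_traj, x0, z0, T, rng):
    N = len(u_traj)
    dt = T / N
    x, z = np.array(x0, float), np.array(z0, float)
    for i in range(N):
        t, u = i * dt, u_traj[i]
        dB = rng.normal(scale=np.sqrt(dt), size=x.shape)   # Brownian increment
        x = x + b_x(t, u, x, z) * dt + sigma(t, z) * dB    # Euler-Maruyama step
        z = z + b_z(t, u, z) * dt                          # forward Euler step (no noise)
    return x, z

rng = np.random.default_rng(0)
x_T, z_T = simulate(np.ones(100), x0=[0.0], z0=[0.0], T=1.0, rng=rng)
# z_T is identical across sample paths; only x_T varies with the Brownian motion.
```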

6.2. The Proposed Approach

With the assumptions adopted previously, the diffusion in the dynamics of OCP is now forced to be deterministic. This fact is at the root of our method, which mimics the procedure proposed in [9]. Specifically, we transcribe every stochastic subproblem LOCP$^{\Delta}_{k}$ into a deterministic, convex optimal control problem whose variables are the mean and the covariance of the solution $x_{k}$ to LOCP$^{\Delta}_{k}$. The main advantage of doing so is that these deterministic reformulations of the subproblems LOCP$^{\Delta}_{k}$ can be efficiently solved via off-the-shelf convex solvers. Unlike [9], here the design of the numerical scheme relies on upstream information entailed by Theorem 3.3.1, as follows.

By recalling the notation introduced in the previous sections, we denote $\mu(t)\triangleq\mathbb{E}[x(t)]$, $\Sigma(t)\triangleq\mathbb{E}[(x(t)-\mu(t))(x(t)-\mu(t))^{\top}]$ and, for $k\in\mathbb{N}$, $\mu_{k}(t)\triangleq\mathbb{E}[x_{k}(t)]$ and $\Sigma_{k}(t)\triangleq\mathbb{E}[(x_{k}(t)-\mu_{k}(t))(x_{k}(t)-\mu_{k}(t))^{\top}]$. Heuristically, assuming that solutions to LOCP$^{\Delta}_{k}$ have small variance, i.e., $\|\textnormal{tr}\,\Sigma_{k}\|_{L^{2}}\ll 1$, one may compute the linearizations of $b^{x}$, $b^{z}$, $\sigma$, and $L$ at $\mu_{k}$ rather than at $x_{k}$. In doing so, for the cost of LOCP$^{\Delta}_{k+1}$ we obtain the following approximation (with notation as in (3))

\mathbb{E}\left[\int^{T}_{0}f^{0}_{k+1}(s,u(s),x(s),z(s))\;\mathrm{d}s\right]\approx\int^{T}_{0}f^{0}_{u_{k},\mu_{k}}(s,u(s),\mu(s),z(s))\;\mathrm{d}s, (27)

whereas for the dynamics of LOCP$^{\Delta}_{k+1}$ we obtain (with notation as in (2))

\begin{cases}
\mathrm{d}x(t)\approx b^{x}_{u_{k},\mu_{k}}(t,u(t),x(t))\;\mathrm{d}t+\sigma_{z_{k}}(t,z(t))\;\mathrm{d}B_{t}\\
\hphantom{\mathrm{d}x(t)}=\Big(\mathcal{A}_{k+1}(t)x(t)+\mathcal{B}_{k+1}(t,u(t))\Big)\;\mathrm{d}t+\mathcal{C}_{k+1}(t,z(t))\;\mathrm{d}B_{t}\\
\mathrm{d}z(t)\approx b^{z}_{u_{k},z_{k}}(t,u(t),z(t))\;\mathrm{d}t=\Big(\mathcal{D}_{k+1}(t)z(t)+\mathcal{E}_{k+1}(t,u(t))\Big)\;\mathrm{d}t
\end{cases} (28)

where $\mathcal{A}_{k+1}(t)\in\mathbb{R}^{n_{x}\times n_{x}}$, $\mathcal{B}_{k+1}(t,u)\in\mathbb{R}^{n_{x}}$, $\mathcal{C}_{k+1}(t,z)\in\mathbb{R}^{n_{x}}$, $\mathcal{D}_{k+1}(t)\in\mathbb{R}^{n_{z}\times n_{z}}$, and $\mathcal{E}_{k+1}(t,u)\in\mathbb{R}^{n_{z}}$ are deterministic, with $\mathcal{C}_{k+1}(t,z)$ affine in $z$. Accordingly, by introducing $\mu^{e_{k}}(t)\triangleq\mu(t)-\mu_{k}(t)$ and $\Sigma^{e_{k}}(t)\triangleq\mathbb{E}[(x(t)-x_{k}(t)-\mu(t)+\mu_{k}(t))(x(t)-x_{k}(t)-\mu(t)+\mu_{k}(t))^{\top}]$, slightly tighter trust-region constraints are

\int^{T}_{0}\textnormal{tr}\,\Sigma^{e_{k}}(t)\;\mathrm{d}t+\int^{T}_{0}\|\mu^{e_{k}}(t)\|^{2}\;\mathrm{d}t\leq\Delta_{k+1},\quad|t^{k+1}_{f}-t^{k}_{f}|\leq\Delta_{k+1}. (29)

At this point, as all coefficients are deterministic, solutions to (28) are Gaussian processes whose mean and covariance dynamics take the form (see, e.g., [54])

\begin{cases}
\dot{\mu}(t)=\mathcal{A}_{k+1}(t)\mu(t)+\mathcal{B}_{k+1}(t,u(t)),\quad\dot{z}(t)=\mathcal{D}_{k+1}(t)z(t)+\mathcal{E}_{k+1}(t,u(t))\\
\dot{\Sigma}(t)=\mathcal{A}_{k+1}(t)\Sigma(t)+\Sigma(t)\mathcal{A}_{k+1}(t)^{\top}+\mathcal{C}_{k+1}(t,z(t))\mathcal{C}_{k+1}(t,z(t))^{\top}.
\end{cases} (30)

The system above is not linear because of the term $\mathcal{C}_{k+1}(t,z(t))\mathcal{C}_{k+1}(t,z(t))^{\top}$. Nevertheless, we may call upon the convergences established in Theorem 3.3.1, and in particular the convergence of the sequence $(z_{k})_{k\in\mathbb{N}}$ to some deterministic curve $z$, to replace (30) with

\begin{cases}
\dot{\mu}(t)=\mathcal{A}_{k+1}(t)\mu(t)+\mathcal{B}_{k+1}(t,u(t)),\quad\dot{z}(t)=\mathcal{D}_{k+1}(t)z(t)+\mathcal{E}_{k+1}(t,u(t))\\
\dot{\Sigma}(t)=\mathcal{A}_{k+1}(t)\Sigma(t)+\Sigma(t)\mathcal{A}_{k+1}(t)^{\top}+\mathcal{C}_{k+1}(t,z_{k}(t))\mathcal{C}_{k+1}(t,z_{k}(t))^{\top}.
\end{cases} (31)

In conclusion, we may heuristically replace every LOCP$^{\Delta}_{k+1}$ with the deterministic, convex optimal control problem whose dynamics are (31), whose trust-region constraints are (29), whose variables are $\mu$ and $\Sigma$ (together with $\mu^{e_{k}}$ and $\Sigma^{e_{k}}$), and whose cost consists of replacing (27) with

\int^{T}_{0}\Big(f^{0}_{u_{k},\mu_{k}}(s,u(s),\mu(s),z(s))+\textnormal{tr}\,\Sigma(s)\Big)\;\mathrm{d}s

to force solutions to have small variances, which in turn justifies the whole approach.
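To fix ideas, the following sketch shows how the transcription above can be propagated numerically: it integrates the mean and covariance dynamics (31) with forward Euler and accumulates the variance-penalized cost. All linearized coefficients $\mathcal{A}_{k+1},\mathcal{B}_{k+1},\mathcal{C}_{k+1},\mathcal{D}_{k+1},\mathcal{E}_{k+1}$ and the running cost below are hypothetical placeholders; in practice they come from the linearization of OCP around the previous SCP iterate.

```python
# Sketch of the deterministic transcription: forward-Euler propagation of (31)
# and evaluation of the variance-penalized cost. Coefficients are placeholders.
import numpy as np

nx, nz, N, T = 2, 1, 41, 1.0
dt = T / (N - 1)

def A(t):       return -np.eye(nx)              # placeholder A_{k+1}(t)
def B(t, u):    return np.array([u, 0.0])       # placeholder B_{k+1}(t, u)
def C(t, zk):   return np.array([[0.1 * zk[0]], # placeholder C_{k+1}(t, z_k),
                                 [0.0]])        # affine in z_k
def D(t):       return -np.eye(nz)              # placeholder D_{k+1}(t)
def E(t, u):    return np.array([u])            # placeholder E_{k+1}(t, u)
def f0(t, u, mu, z): return u ** 2              # placeholder running cost

def rollout(u_traj, mu0, z0, zk_traj):
    mu, z, Sig, cost = np.array(mu0), np.array(z0), np.zeros((nx, nx)), 0.0
    for i in range(N - 1):
        t, u, zk = i * dt, u_traj[i], zk_traj[i]
        cost += (f0(t, u, mu, z) + np.trace(Sig)) * dt      # cost (27) + tr(Sigma)
        Cz = C(t, zk)
        Sig = Sig + (A(t) @ Sig + Sig @ A(t).T + Cz @ Cz.T) * dt
        mu = mu + (A(t) @ mu + B(t, u)) * dt
        z = z + (D(t) @ z + E(t, u)) * dt
    return mu, Sig, cost

mu, Sig, cost = rollout(np.zeros(N), mu0=[1.0, 0.0], z0=[0.5],
                        zk_traj=np.ones((N, nz)))   # z_k taken from the previous iterate
```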

6.3. Uncertain Car Trajectory Planning Problem

We now provide numerical experiments. We consider the problem of planning the trajectory of a car whose state and control inputs are $x=(r_{x},r_{y},\theta,v,\omega)$ and $u=(a_{v},a_{\omega})$, and whose dynamics are

b(t,x,u)=(v\cos(\theta),\,v\sin(\theta),\,\omega,\,a_{v},\,a_{\omega}),\qquad\sigma(t,x)=\textrm{diag}([\alpha^{2}\omega v,\,\alpha^{2}\omega v,\,\beta^{2}\omega v,\,0,\,0]),

where $\alpha^{2}=0.1$ and $\beta^{2}=0.01$ quantify the effect of slip. The evolution of $(v,\omega)$ is deterministic: given actuator commands, the change in velocity is known exactly, whereas uncertainty in the positional variables persists. This model is motivated by driving applications in which the vehicle may drift during tight turns and suffer from significant positional uncertainty, but limits its acceleration to avoid wheel slip. By defining $z=(v,\omega)$, this problem setting matches the one presented in the previous section. We minimize the control effort $G(u(s))=\|u(s)\|_{2}^{2}$. The initial state $x^{0}$ is fixed, and the final constraint consists of reaching the goal $x^{f}$ in expectation, i.e., $g(x(t_{f}))=x(t_{f})-x^{f}$. Further, we consider cylindrical obstacles of radius $\delta_{o}>0$ centered at $r_{o}\in\mathbb{R}^{2}$, for which we define the potential function $c_{o}:\mathbb{R}^{n}\rightarrow\mathbb{R}$ as $c_{o}(x)=(\delta_{o}+\varepsilon)^{2}-\|r-r_{o}\|^{2}$ if $\|r-r_{o}\|<\delta_{o}+\varepsilon$ and $c_{o}(x)=0$ otherwise, where $r=(r_{x},r_{y})$ denotes the position components of $x$ and $\varepsilon=0.1$ is an additional clearance, so that $c_{o}$ is positive exactly when the inflated obstacle is violated. We penalize violations of the obstacle-avoidance constraint directly within the cost by defining $L(s,x(s),u(s))=L_{0}(s,x(s))=\lambda\,c_{o}(x(s))$ with $\lambda=500$ (note that $H=0$ in this setting, following Section 6.1).
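For concreteness, a minimal sketch of the car model and of the obstacle penalty entering the cost is given below; the obstacle position and radius are hypothetical and used only for illustration, while $\alpha^{2}$, $\beta^{2}$, $\varepsilon$, and $\lambda$ follow the values above.

```python
# Sketch of the car dynamics, diffusion, and obstacle penalty of Section 6.3.
# The obstacle data are hypothetical; alpha^2, beta^2, eps, lambda follow the text.
import numpy as np

ALPHA2, BETA2, EPS, LAM = 0.1, 0.01, 0.1, 500.0

def drift(t, x, u):
    rx, ry, th, v, om = x
    av, aom = u
    return np.array([v * np.cos(th), v * np.sin(th), om, av, aom])

def diffusion(t, x):
    _, _, _, v, om = x
    # only position and heading are noisy; (v, omega) remain deterministic
    return np.diag([ALPHA2 * om * v, ALPHA2 * om * v, BETA2 * om * v, 0.0, 0.0])

def obstacle_penalty(x, r_o, delta_o):
    d2 = np.sum((x[:2] - r_o) ** 2)
    c_o = max((delta_o + EPS) ** 2 - d2, 0.0)   # positive only inside the inflated obstacle
    return LAM * c_o

x = np.array([0.0, 0.0, 0.0, 1.0, 0.2])
u = np.array([0.5, 0.1])
print(drift(0.0, x, u), obstacle_penalty(x, r_o=np.array([1.0, 1.5]), delta_o=0.3))
```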

6.4. Results

We set $x(0)=[0,\dots,0]$, $x^{f}=[2.2,3,0,0,0]$, and $t_{f}=5$ s, and we set up four obstacles. We discretize (31) using a forward Euler scheme with $N=41$ discretization nodes, set $\Delta_{0}=100$ and $\Delta_{k+1}=0.99\,\Delta_{k}$ at each SCP iteration, and use IPOPT to solve each convexified problem. We check convergence of SCP by verifying that $\int^{t_{f}}_{0}\|u_{k+1}-u_{k}\|^{2}(s)+\|u_{k}-u_{k-1}\|^{2}(s)\;\mathrm{d}s\leq 10^{-3}$. Although our method is initialized with an infeasible straight-line trajectory, SCP converges in $10$ iterations to a trajectory that avoids the obstacles in expectation. Indeed, after evaluating $10^{4}$ sample paths of the system, only $6.14\%$ of the trajectories intersect the obstacles. We also verify that $p^{0}_{k}\neq 0$ at every SCP iteration and at convergence, so that $(\mathfrak{p},p^{0})\neq 0$. Thus, Theorems 3.3.1 and 3.3.2 apply and the solution returned by SCP is an extremal for OCP. We visualize our results in Figure 1 and release our implementation at https://github.com/StanfordASL/stochasticSCP.
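The overall procedure can be summarized by the following loop skeleton, in which solve_convexified_problem stands for the deterministic convex subproblem of Section 6.2 (solved in our experiments with IPOPT); it is a hypothetical placeholder here, included only to make the trust-region update and the stopping criterion explicit.

```python
# Skeleton of the SCP outer loop: shrink the trust region, solve the convexified
# deterministic subproblem, and stop once successive controls stop changing.
# solve_convexified_problem is a placeholder for the convex solver call.
import numpy as np

def scp(solve_convexified_problem, u_init, dt, delta0=100.0, shrink=0.99,
        tol=1e-3, max_iters=50):
    u_prev, u_curr, delta = u_init.copy(), u_init.copy(), delta0
    for k in range(max_iters):
        u_next = solve_convexified_problem(u_curr, delta)  # controls on the time grid
        # discrete analogue of the L^2 convergence criterion on the last two updates
        err = (np.sum((u_next - u_curr) ** 2) + np.sum((u_curr - u_prev) ** 2)) * dt
        u_prev, u_curr = u_curr, u_next
        delta *= shrink                                    # Delta_{k+1} = 0.99 Delta_k
        if k >= 1 and err <= tol:
            break
    return u_curr
```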

Figure 1. Left: solution of each SCP iteration and 100100 sample paths of the resulting final trajectory. Right: velocities and control inputs at each SCP iteration.

7. Conclusion and Perspectives

In this paper we introduced a sequential convex programming framework for non-linear stochastic optimal control, analyzed its convergence properties, and derived from them a practical numerical method for solving non-linear stochastic optimal control problems.

Future work may consider extending this analysis to more general problem formulations, e.g., risk measures as costs and state (chance) constraints. In this context, preliminary results using SCP exist only for discrete-time problem formulations [32]. However, tackling continuous-time formulations will require more sophisticated necessary conditions for optimality. In addition, we plan to further investigate the setting of free final time and stochastic admissible controls, devising sharper assumptions and improving the convergence result. Finally, we plan to further leverage our theoretical insights to design new and more efficient numerical schemes for non-linear stochastic optimal control.

References

  • [1] J.E. Potter. A matrix equation arising in statistical filter theory. Rep. RE-9, Experimental Astronomy Laboratory, Massachusetts Institute of Technology, 1965.
  • [2] J.-M. Bismut. Linear quadratic optimal stochastic control with random coefficients. SIAM Journal on Control and Optimization, 14(3):419–444, 1976.
  • [3] S. Peng. Stochastic Hamilton–Jacobi–Bellman equations. SIAM Journal on Control and Optimization, 30(2):284–304, 1992.
  • [4] S. Tang. General linear quadratic optimal stochastic control problems with random coefficients: linear stochastic Hamilton systems and backward stochastic Riccati equations. SIAM journal on control and optimization, 42(1):53–75, 2003.
  • [5] P. Kleindorfer and K. Glover. Linear convex stochastic optimal control with applications in production planning. IEEE Transactions on Automatic Control, 18(1):56–59, 1973.
  • [6] R. T. Rockafellar and R. J. B. Wets. Generalized linear-quadratic problems of deterministic and stochastic optimal control in discrete time. SIAM Journal on control and optimization, 28(4):810–822, 1990.
  • [7] D. Kuhn, W. Wiesemann, and A. Georghiou. Primal and dual linear decision rules in stochastic and robust optimization. Mathematical Programming, 130(1):177–209, 2011.
  • [8] C. Bes and S. Sethi. Solution of a class of stochastic linear-convex control problems using deterministic equivalents. Journal of optimization theory and applications, 62(1):17–27, 1989.
  • [9] B. Berret and F. Jean. Efficient computation of optimal open-loop controls for stochastic systems. Automatica, 115(108874), 2020.
  • [10] M. A. Rami and X. Y. Zhou. Linear matrix inequalities, Riccati equations, and indefinite stochastic linear quadratic controls. IEEE Transactions on Automatic Control, 45(6):1131–1143, 2000.
  • [11] D. D. Yao, S. Zhang, and X. Y. Zhou. Stochastic linear-quadratic control via semidefinite programming. SIAM Journal on Control and Optimization, 40(3):801–823, 2001.
  • [12] D. Bertsimas and D. B. Brown. Constrained stochastic LQC: a tractable approach. IEEE Transactions on Automatic Control, 52(10):1826–1841, 2007.
  • [13] T. Damm, H. Mena, and T. Stillfjord. Numerical solution of the finite horizon stochastic linear quadratic control problem. Numerical Linear Algebra with Applications, 24(4), 2017.
  • [14] T. Levajković, H. Mena, and L.-M. Pfurtscheller. Solving stochastic LQR problems by polynomial chaos. IEEE control systems letters, 2(4):641–646, 2018.
  • [15] R. Bellman. Dynamic Programming. Princeton Univ. Press, Princeton, New Jersey, 1957.
  • [16] P.-L. Lions. Optimal control of diffusion processes and Hamilton-Jacobi-Bellman equations, part i. Comm. Partial Differential Equations, 8:1101–1174, 1983.
  • [17] L.S. Pontryagin. Mathematical theory of optimal processes. Routledge, 2018.
  • [18] H. J. Kushner and F. C. Schweppe. A maximum principle for stochastic control systems. Journal of Mathematical Analysis and Applications, 8(2):287–302, 1964.
  • [19] S. Peng. A general stochastic maximum principle for optimal control problems. SIAM Journal on control and optimization, 28(4):966–979, 1990.
  • [20] J. Yong and X. Y. Zhou. Stochastic controls: Hamiltonian systems and HJB equations, volume 43. Springer Science & Business Media, 1999.
  • [21] E. Trélat. Optimal control and applications to aerospace: some results and challenges. Journal of Optimization Theory and Applications, 154(3):713–758, 2012.
  • [22] R. Bonalli, B. Hérissé, and E. Trélat. Analytical Initialization of a Continuation-Based Indirect Method for Optimal Control of Endo-Atmospheric Launch Vehicle Systems. In IFAC World Congress, 2017.
  • [23] R. Bonalli, B. Hérissé, and E. Trélat. Optimal control of endo-atmospheric launch vehicle systems: Geometric and computational issues. IEEE Transactions on Automatic Control, 65(6):2418–2433, 2019.
  • [24] A. Shapiro and A. Nemirovski. On complexity of stochastic programming problems. In Continuous optimization, pages 111–146. Springer, 2005.
  • [25] E. Gobet. Monte-Carlo methods and stochastic processes: from linear to non-linear. CRC Press, 2016.
  • [26] H. J. Kushner. Numerical methods for stochastic control problems in continuous time. SIAM Journal on Control and Optimization, 28(5):999–1048, 1990.
  • [27] H. J. Kushner and L. F. Martins. Numerical methods for stochastic singular control problems. SIAM journal on control and optimization, 29(6):1443–1475, 1991.
  • [28] M. Annunziato and A. Borzì. A Fokker–Planck control framework for multidimensional stochastic processes. Journal of Computational and Applied Mathematics, 237(1):487–507, 2013.
  • [29] J.-F. Le Gall. Brownian motion, martingales, and stochastic calculus, volume 274. Springer, 2016.
  • [30] R. Carmona. Lectures on BSDEs, stochastic control, and stochastic differential games with financial applications, volume 1. SIAM, 2016.
  • [31] O. Kazuhide, M. Goldshtein, and P. Tsiotras. Optimal covariance control for stochastic systems under chance constraints. IEEE Control Systems Letters, 2(2):266–271, 2018.
  • [32] T. Lew, R. Bonalli, and M. Pavone. Chance-constrained sequential convex programming for robust trajectory optimization. In European Control Conference, 2020.
  • [33] Y. Wang, D. Yang, J. Yong, and Z. Yu. Exact controllability of linear stochastic differential equations and related problems. American Institute of Mathematical Sciences, 7(2):305–345, 2017.
  • [34] Q. T. Dinh and M. Diehl. Local Convergence of Sequential Convex Programming for Nonconvex Optimization. In Recent Advances in Optimization and its Applications in Engineering, pages 93–102. Springer, 2010.
  • [35] M. Diehl and F. Messerer. Local Convergence of Generalized Gauss-Newton and Sequential Convex Programming. In Conference on Decision and Control, 2019.
  • [36] Y. Mao, D. Dueri, M. Szmuk, and B. Açikmeşe. Successive Convexification of Non-Convex Optimal Control Problems with State Constraints. IFAC-PapersOnLine, 50(1):4063–4069, 2017.
  • [37] F. Palacios-Gomez, L. Lasdon, and M. Engquist. Nonlinear optimization by successive linear programming. Management science, 28(10):1106–1120, 1982.
  • [38] P. T. Boggs and J. W. Tolle. Sequential quadratic programming. Acta numerica, 4(1):1–51, 1995.
  • [39] R. Gamkrelidze. Principles of optimal control theory, volume 7. Springer Science & Business Media, 2013.
  • [40] A.A. Agrachev and Y. Sachkov. Control theory from the geometric viewpoint, volume 87. Springer Science & Business Media, 2013.
  • [41] H. Frankowska, H. Zhang, and X. Zhang. Stochastic Optimal Control Problems with Control and Initial-Final States Constraints. SIAM Journal on Control and Optimization, 56:1823–1855, 2018.
  • [42] J. Nocedal and S. Wright. Numerical Optimization. Springer, 1999.
  • [43] Z. Lu. Sequential convex programming methods for a class of structured nonlinear programming. Technical report, 2013.
  • [44] T. Haberkorn and E. Trélat. Convergence results for smooth regularizations of hybrid nonlinear optimal control problems. SIAM Journal on Control and Optimization, 49:1498–1522, 2011.
  • [45] R. Bonalli, B. Hérissé, and E. Trélat. Solving Optimal Control Problems for Delayed Control-Affine Systems with Quadratic Cost by Numerical Continuation. In American Control Conference, 2017.
  • [46] R. Bonalli. Optimal control of aerospace systems with control-state constraints and delays. PhD thesis, Sorbonne Université, 2018.
  • [47] R. Bonalli, B. Hérissé, and E. Trélat. Continuity of pontryagin extremals with respect to delays in nonlinear optimal control. SIAM Journal on Control and Optimization, 57(2):1440–1466, 2019.
  • [48] I. A. Shvartsman and R. B. Vinter. Regularity properties of optimal controls for problems with time-varying state and control constraints. Nonlinear Analysis: Theory, Methods & Applications, 65:448–474, 2006.
  • [49] Y. Chitour, F. Jean, and E. Trélat. Singular trajectories of control-affine systems. SIAM Journal on Control and Optimization, 47:1078–1095, 2008.
  • [50] A.E. Bryson. Applied optimal control: optimization, estimation and control. CRC Press, 1975.
  • [51] J.T. Betts. Survey of numerical methods for trajectory optimization. Journal of guidance, control, and dynamics, 21(2):193–207, 1998.
  • [52] Emmanuel Trélat. Some properties of the value function and its level sets for affine control systems with quadratic cost. Journal of Dynamical and Control Systems, 6(4):511–541, 2000.
  • [53] L. Bourdin and E. Trélat. Pontryagin maximum principle for finite dimensional nonlinear optimal control problem on time scales. SIAM Journal on Control and Optimization, 51(5):3781–3813, 2013.
  • [54] P. S. Maybeck. Stochastic models, estimation, and control. Academic press, 1982.

Appendix

7.1. Proof of Lemma 2.1

Proof of Lemma 2.1.

For clarity of notation, we consider only the case $d=1$; the case of a multidimensional Brownian motion is similar.

Let us start with the first inequality. For this, denoting $x=x_{u,v,y}$, for every $t\in[0,T]$ we compute (below, $C\geq 0$ is an appropriate constant)

\mathbb{E}\left[\underset{s\in[0,t]}{\sup}\ \|x(s)\|^{p}\right]\leq C\,\mathbb{E}\left[\|x^{0}\|^{p}\right]+C\,\mathbb{E}\left[\underset{r\in[0,t]}{\sup}\ \left\|\int^{r}_{0}\sigma(s,y(s))\;\mathrm{d}B_{s}\right\|^{p}\right]
+C\,\mathbb{E}\left[\underset{r\in[0,t]}{\sup}\ \left\|\int^{r}_{0}\left(b_{0}(s,y(s))+\sum^{m}_{i=1}u^{i}(s)b_{i}(s,y(s))\right)\;\mathrm{d}s\right\|^{p}\right]
+C\,\mathbb{E}\left[\underset{r\in[0,t]}{\sup}\ \left\|\int^{r}_{0}\left(\frac{\partial b_{0}}{\partial x}(s,y(s))+\sum^{m}_{i=1}v^{i}(s)\frac{\partial b_{i}}{\partial x}(s,y(s))\right)(x(s)-y(s))\;\mathrm{d}s\right\|^{p}\right]
+C\,\mathbb{E}\left[\underset{r\in[0,t]}{\sup}\ \left\|\int^{r}_{0}\frac{\partial\sigma}{\partial x}(s,y(s))(x(s)-y(s))\;\mathrm{d}B_{s}\right\|^{p}\right].

For the last term, denoting $S_{y,\sigma}\triangleq\big\{(s,\omega)\in[0,T]\times\Omega:\ (s,y(s,\omega))\in\textnormal{supp}\,\sigma\big\}$, the Burkholder–Davis–Gundy, Hölder, and Young inequalities give

\mathbb{E}\left[\underset{r\in[0,t]}{\sup}\ \left\|\int^{r}_{0}\frac{\partial\sigma}{\partial x}(s,y(s))(x(s)-y(s))\;\mathrm{d}B_{s}\right\|^{p}\right]
\leq C\left(\int^{t}_{0}\mathbb{E}\left[\underset{r\in[0,s]}{\sup}\ \left\|x(r)\right\|^{p}\right]\;\mathrm{d}s+\int_{S_{y,\sigma}}\left\|\frac{\partial\sigma}{\partial x}(s,y(s))\right\|^{p}\left\|y(s)\right\|^{p}\;\mathrm{d}(s\times P)\right)
\leq C\left(1+\int^{t}_{0}\mathbb{E}\left[\underset{r\in[0,s]}{\sup}\ \left\|x(r)\right\|^{p}\right]\;\mathrm{d}s\right).

Similar computations apply to the other terms and when considering solutions to (1). Collecting all estimates yields $\mathbb{E}\big[\sup_{s\in[0,t]}\|x(s)\|^{p}\big]\leq C\big(1+\int^{t}_{0}\mathbb{E}\big[\sup_{r\in[0,s]}\|x(r)\|^{p}\big]\,\mathrm{d}s\big)$, and we therefore conclude from a routine Grönwall inequality argument.

Let us prove the second inequality of the lemma. For $t\in[0,T]$, we compute

\mathbb{E}\bigg[\underset{s\in[0,t]}{\sup}\ \|x_{u_{1}}(s)-x_{u_{2}}(s)\|^{p}\bigg]\leq C\,\mathbb{E}\left[\underset{r\in[0,t]}{\sup}\ \left\|\int^{r}_{0}\left(b_{0}(s,x_{u_{1}}(s))-b_{0}(s,x_{u_{2}}(s))\right)\;\mathrm{d}s\right\|^{p}\right]
+C\sum^{m}_{i=1}\mathbb{E}\left[\underset{r\in[0,t]}{\sup}\ \left\|\int^{r}_{0}\left(u^{i}_{1}(s)b_{i}(s,x_{u_{1}}(s))-u^{i}_{2}(s)b_{i}(s,x_{u_{2}}(s))\right)\;\mathrm{d}s\right\|^{p}\right]
+C\,\mathbb{E}\left[\underset{r\in[0,t]}{\sup}\ \left\|\int^{r}_{0}\left(\sigma(s,x_{u_{1}}(s))-\sigma(s,x_{u_{2}}(s))\right)\;\mathrm{d}B_{s}\right\|^{p}\right].

For the second term on the right-hand side, for $i=1,\dots,m$ the Hölder inequality gives

\mathbb{E}\left[\underset{r\in[0,t]}{\sup}\ \left\|\int^{r}_{0}\left(u^{i}_{1}(s)b_{i}(s,x_{u_{1}}(s))-u^{i}_{2}(s)b_{i}(s,x_{u_{2}}(s))\right)\;\mathrm{d}s\right\|^{p}\right]
\leq C\left(\int^{t}_{0}\mathbb{E}\left[\underset{r\in[0,s]}{\sup}\ \left\|x_{u_{1}}(r)-x_{u_{2}}(r)\right\|^{p}\right]\;\mathrm{d}s+\mathbb{E}\left[\left(\int^{T}_{0}\|u_{1}(s)-u_{2}(s)\|\;\mathrm{d}s\right)^{p}\right]\right),

and similar computations hold for the remaining terms and when considering solutions to (1). Again, we conclude by a Grönwall inequality argument. ∎

7.2. Proof of Lemma 4.1.2 and of Lemma 5.2.1

The proof of Lemma 4.1.2 immediately follows from the following preliminary result. {lmm}[Stochastic needle-like variation formula - no free final time] Let $(\eta_{1},\dots,\eta_{j})\in\textnormal{Pr}_{\mathbb{R}^{j}_{+}}(\mathcal{C}^{j}_{t})$ (in particular, neither $(A^{\prime}_{2})$ nor any assumption on $t>t_{j}$ is required). For $\varepsilon\in[0,t-t_{j})$, uniformly for $\delta\in[-\varepsilon,T-t]$,

\mathbb{E}\bigg[\bigg\|\tilde{g}(\tilde{x}_{u_{\pi}}(t+\delta))-\tilde{g}(\tilde{x}_{u}(t+\delta))-\sum_{i=1}^{j}\eta_{i}\frac{\partial\tilde{g}}{\partial\tilde{x}}(\tilde{x}_{u}(t+\delta))z_{t_{i},u_{i}}(t+\delta)\bigg\|^{2}\bigg]^{\frac{1}{2}}=o\left(\sum_{i=1}^{j}\eta_{i}\right).
Proof.

We consider only the case $j=1$; the more general case $j>1$ follows by a classical induction argument (see, e.g., [40]). We only need to prove that

\mathbb{E}\left[\left\|\tilde{x}_{u_{\pi}}(t+\delta)-\tilde{x}_{u}(t+\delta)-\eta_{1}z_{t_{1},u_{1}}(t+\delta)\right\|^{2}\right]=o(\eta^{2}_{1}) (32)

uniformly for every $\delta\in[-\varepsilon,T-t]$. For this, first remark that $\tilde{x}_{u_{\pi}}(t)=\tilde{x}_{u}(t)$ for $t\in[0,t_{1}]$. Since $t\in(t_{j},t_{f}]$ and $\varepsilon\geq 0$ are fixed and we take the limit $\eta_{1}\rightarrow 0$, we may assume that $\eta_{1}+t_{1}<t-\varepsilon$. Therefore, without loss of generality, we replace $t+\delta$ with $t$, assuming $t\geq t_{1}+\eta_{1}$ uniformly. We have (below, $C\geq 0$ denotes a constant)

\mathbb{E}\left[\left\|\tilde{x}_{u_{\pi}}(t)-\tilde{x}_{u}(t)-\eta_{1}z_{t_{1},u_{1}}(t)\right\|^{2}\right]\leq C\,\mathbb{E}\left[\left\|\int^{t_{1}+\eta_{1}}_{t_{1}}\eta_{1}A(s)z_{t_{1},u_{1}}(s)\,\mathrm{d}s\right\|^{2}\right]
+C\,\mathbb{E}\left[\left\|\int^{t}_{t_{1}+\eta_{1}}\left(\tilde{b}(s,u_{\pi}(s),x_{u_{\pi}}(s))-\tilde{b}(s,u(s),x_{u}(s))-\eta_{1}A(s)z_{t_{1},u_{1}}(s)\right)\mathrm{d}s\right\|^{2}\right]
+C\,\mathbb{E}\left[\left\|\int^{t}_{0}\mathbbm{1}_{(t_{1},T]}(s)\left(\tilde{\sigma}(s,x_{u_{\pi}}(s))-\tilde{\sigma}(s,x_{u}(s))-\eta_{1}D(s)z_{t_{1},u_{1}}(s)\right)\mathrm{d}B_{s}\right\|^{2}\right]
+C\,\mathbb{E}\left[\left\|\int^{t_{1}+\eta_{1}}_{t_{1}}\left(\tilde{b}(s,u_{\pi}(s),x_{u_{\pi}}(s))-\tilde{b}(s,u(s),x_{u}(s))\right)\mathrm{d}s-\eta_{1}z_{t_{1},u_{1}}(t_{1})\right\|^{2}\right].

Let us analyze these integrals separately. Starting with the last one, we have

\mathbb{E}\left[\left\|\int^{t_{1}+\eta_{1}}_{t_{1}}\left(\tilde{b}(s,u_{\pi}(s),x_{u_{\pi}}(s))-\tilde{b}(s,u(s),x_{u}(s))\right)\mathrm{d}s-\eta_{1}z_{t_{1},u_{1}}(t_{1})\right\|^{2}\right]\leq
C\,\mathbb{E}\left[\left\|\int^{t_{1}+\eta_{1}}_{t_{1}}\left(\tilde{b}(s,u_{1},x_{u_{\pi}}(s))-\tilde{b}(s,u_{1},x_{u}(s))\right)\mathrm{d}s\right\|^{2}\right]
+C\,\mathbb{E}\left[\left\|\int^{t_{1}+\eta_{1}}_{t_{1}}\left(\tilde{b}(s,u_{1},x_{u}(s))-\tilde{b}(s,u(s),x_{u}(s))\right)\mathrm{d}s-\eta_{1}z_{t_{1},u_{1}}(t_{1})\right\|^{2}\right].

Lemma 2.1 immediately gives that the first term on the right-hand side is $o(\eta^{2}_{1})$. In addition, from the Hölder inequality, it follows that

\frac{1}{\eta^{2}_{1}}\,\mathbb{E}\left[\left\|\int^{t_{1}+\eta_{1}}_{t_{1}}\left(\tilde{b}(s,u_{1},x_{u}(s))-\tilde{b}(s,u(s),x_{u}(s))\right)\mathrm{d}s-\eta_{1}z_{t_{1},u_{1}}(t_{1})\right\|^{2}\right]\leq
\frac{C}{\eta_{1}}\int^{t_{1}+\eta_{1}}_{t_{1}}\mathbb{E}\left[\left\|\tilde{b}(s,u_{1},x_{u}(s))-\tilde{b}(t_{1},u_{1},x_{u}(t_{1}))\right\|^{2}\right]\;\mathrm{d}s
+\frac{C}{\eta_{1}}\int^{t_{1}+\eta_{1}}_{t_{1}}\mathbb{E}\left[\left\|\tilde{b}(s,u(s),x_{u}(s))-\tilde{b}(t_{1},u(t_{1}),x_{u}(t_{1}))\right\|^{2}\right]\;\mathrm{d}s.

Since $t_{1}$ is a Lebesgue point for $u$, the two terms above go to zero as $\eta_{1}\rightarrow 0$. Next, by the Burkholder–Davis–Gundy inequality and a Taylor expansion, we have

\mathbb{E}\left[\left\|\int^{t}_{0}\mathbbm{1}_{(t_{1},T]}(s)\left(\tilde{\sigma}(s,x_{u_{\pi}}(s))-\tilde{\sigma}(s,x_{u}(s))-\eta_{1}D(s)z_{t_{1},u_{1}}(s)\right)\mathrm{d}B_{s}\right\|^{2}\right]
\leq C\,\mathbb{E}\left[\int^{t}_{t_{1}}\left\|D(s)\left(\tilde{x}_{u_{\pi}}(s)-\tilde{x}_{u}(s)-\eta_{1}z_{t_{1},u_{1}}(s)\right)\right\|^{2}\mathrm{d}s\right]
+C\,\mathbb{E}\left[\int^{t}_{t_{1}}\left\|\int^{1}_{0}\theta\frac{\partial^{2}\tilde{\sigma}}{\partial\tilde{x}^{2}}\left(s,\theta\tilde{x}_{u}(s)+(1-\theta)(\tilde{x}_{u_{\pi}}-\tilde{x}_{u})(s)\right)\left(\tilde{x}_{u_{\pi}}-\tilde{x}_{u}\right)^{2}(s)\;\mathrm{d}\theta\right\|^{2}\mathrm{d}s\right]
\leq C\int^{t}_{t_{1}}\mathbb{E}\left[\left\|\tilde{x}_{u_{\pi}}(s)-\tilde{x}_{u}(s)-\eta_{1}z_{t_{1},u_{1}}(s)\right\|^{2}\right]\;\mathrm{d}s+o(\eta^{2}_{1}),

from Lemma 2.1. Similar estimates hold for the remaining terms in the first inequality of the proof. Summarizing, there exists a constant $C\geq 0$ such that

\mathbb{E}\Big[\|\tilde{x}_{u_{\pi}}(t)-\tilde{x}_{u}(t)-\eta_{1}z_{t_{1},u_{1}}(t)\|^{2}\Big]\leq o(\eta^{2}_{1})+C\int^{t}_{t_{1}}\mathbb{E}\left[\left\|\tilde{x}_{u_{\pi}}(s)-\tilde{x}_{u}(s)-\eta_{1}z_{t_{1},u_{1}}(s)\right\|^{2}\right]\;\mathrm{d}s

and the conclusion follows from a routine Grönwall inequality argument. ∎
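For completeness, the Grönwall step used above (and in the proof of Lemma 2.1) can be spelled out as follows: setting $\varphi(s)\triangleq\mathbb{E}[\|\tilde{x}_{u_{\pi}}(s)-\tilde{x}_{u}(s)-\eta_{1}z_{t_{1},u_{1}}(s)\|^{2}]$, the previous bound reads $\varphi(t)\leq a_{\eta_{1}}+C\int_{t_{1}}^{t}\varphi(s)\,\mathrm{d}s$ with $a_{\eta_{1}}=o(\eta_{1}^{2})$ uniformly in $t$, so that the Grönwall inequality yields

\varphi(t)\leq a_{\eta_{1}}\,e^{C(t-t_{1})}\leq a_{\eta_{1}}\,e^{CT}=o(\eta_{1}^{2}).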

Proof of Lemma 5.2.1.

We may assume $\delta\neq 0$ and that $(A^{\prime}_{2})$ holds. Hence, $\tilde{g}$ is an affine function and, in the rest of this proof, we write $\tilde{g}(\tilde{x})=M\tilde{x}+d$, where $M\in\mathbb{R}^{(q+1)\times(n+1)}$ and $d\in\mathbb{R}^{q+1}$.

Developing, we have

\bigg\|\mathbb{E}\bigg[\tilde{g}(\tilde{x}_{u_{\pi}}(t+\delta))-\tilde{g}(\tilde{x}_{u}(t))-\delta M\tilde{b}(t,u(t),x_{u}(t))-\sum^{j}_{i=1}\eta_{i}Mz_{t_{i},u_{i}}(t)\bigg]\bigg\|\leq
\Big\|\mathbb{E}\Big[\tilde{g}(\tilde{x}_{u}(t+\delta))-\tilde{g}(\tilde{x}_{u}(t))-\delta M\tilde{b}(t,u(t),x_{u}(t))\Big]\Big\|
+\bigg\|\mathbb{E}\bigg[\tilde{g}(\tilde{x}_{u_{\pi}}(t+\delta))-\tilde{g}(\tilde{x}_{u}(t+\delta))-\sum^{j}_{i=1}\eta_{i}Mz_{t_{i},u_{i}}(t+\delta)\bigg]\bigg\|+\sum^{j}_{i=1}\eta_{i}\left\|M\right\|\Big\|\mathbb{E}\Big[z_{t_{i},u_{i}}(t+\delta)-z_{t_{i},u_{i}}(t)\Big]\Big\|.

From Lemma 7.2, the second term on the right-hand side is $o\left(\sum_{i=1}^{j}\eta_{i}\right)$. For the last summand, from the properties of the stochastic integral, we have

\Big\|\mathbb{E}\Big[z_{t_{i},u_{i}}(t+\delta)-z_{t_{i},u_{i}}(t)\Big]\Big\|\leq\left\|\mathbb{E}\left[\int^{t+\delta}_{t}A(s)z_{t_{i},u_{i}}(s)\;\mathrm{d}s\right]\right\|+\left\|\mathbb{E}\left[\int^{t+\delta}_{0}\mathbbm{1}_{[t,T]}(s)D(s)z_{t_{i},u_{i}}(s)\;\mathrm{d}B_{s}\right]\right\|\leq C\,\mathbb{E}\left[\underset{s\in[0,T]}{\sup}\ \|z_{t_{i},u_{i}}(s)\|\right]\delta,

so that the last summand is $o\left(\delta+\sum_{i=1}^{j}\eta_{i}\right)$. It is worth pointing out the importance of $g$ being affine in obtaining this last claim. Finally, for the remaining term, we apply the Itô formula to each coordinate $h=1,\dots,q+1$, obtaining

\Big[\tilde{g}(\tilde{x}_{u}(t+\delta))-\tilde{g}(\tilde{x}_{u}(t))-\delta M\tilde{b}(t,u(t),x_{u}(t))\Big]_{h}=\sum^{n+1}_{k=1}M_{hk}\int^{t+\delta}_{0}\mathbbm{1}_{[t,T]}(s)\tilde{\sigma}_{k}(s,x_{u}(s))\;\mathrm{d}B_{s}
+\sum^{n+1}_{k=1}M_{hk}\left(\int^{t+\delta}_{t}\tilde{b}_{k}(s,u(s),x_{u}(s))\;\mathrm{d}s-\delta\tilde{b}_{k}(t,u(t),x_{u}(t))\right),

and therefore

\frac{1}{\delta+\sum_{i=1}^{j}\eta_{i}}\Big\|\mathbb{E}\Big[\tilde{g}(\tilde{x}_{u}(t+\delta))-\tilde{g}(\tilde{x}_{u}(t))-\delta M\tilde{b}(t,u(t),x_{u}(t))\Big]\Big\|\leq\frac{\|M\|}{\delta}\int^{t+\delta}_{t}\mathbb{E}\left[\left\|\tilde{b}(s,u(s),x_{u}(s))-\tilde{b}(t,u(t),x_{u}(t))\right\|\right]\;\mathrm{d}s.

As $t$ is a Lebesgue point for $u$, this last quantity goes to zero as $\delta\rightarrow 0$, and we conclude. ∎