A modified MSA for stochastic control problems

Abstract.

The classical Method of Successive Approximations (MSA) is an iterative method for solving stochastic control problems and is derived from Pontryagin’s optimality principle. It is known that the MSA may fail to converge. Using careful estimates for the backward stochastic differential equation (BSDE), this paper suggests a modification to the MSA algorithm. This modified MSA is shown to converge for general stochastic control problems with control in both the drift and diffusion coefficients. Under some additional assumptions, a rate of convergence is established. The results are valid without restrictions on the time horizon of the control problem, in contrast to iterative methods based on the theory of forward-backward stochastic differential equations.

Supported by the Alan Turing Institute under EPSRC grant no. EP/N510129/1 and by The Maxwell Institute Graduate School in Analysis and its Applications, a Centre for Doctoral Training funded by the UK Engineering and Physical Sciences Research Council (grant EP/L016508/01), the Scottish Funding Council, Heriot-Watt University and the University of Edinburgh.

1. Introduction

Stochastic control problems appear naturally in a range of applications in engineering, economics and finance. With the exception of very specific cases, such as the linear-quadratic control problem in engineering or the Merton portfolio optimization task in finance, stochastic control problems typically have no closed form solutions and have to be solved numerically. In this work, we consider a modification to the method of successive approximations (MSA), see Algorithm 1. The MSA is essentially a way of applying Pontryagin’s optimality principle to obtain numerical solutions of stochastic control problems.

We will consider the continuous space, continuous time problem where the controlled system is modelled by an $\mathbb{R}^{d}$-valued diffusion process. Let $W$ be a $d'$-dimensional Wiener martingale on a filtered probability space $(\Omega,\mathcal{F},(\mathcal{F}_{t})_{t\geq 0},\mathbb{P})$. We will state the exact assumptions we need in Section 2. For now, let us fix a finite time $T\in(0,\infty)$ and consider the controlled stochastic differential equation (SDE), for given measurable functions $b:[0,T]\times\mathbb{R}^{d}\times A\to\mathbb{R}^{d}$ and $\sigma:[0,T]\times\mathbb{R}^{d}\times A\to\mathbb{R}^{d\times d'}$,

$$dX_{s}=b(s,X_{s},\alpha_{s})\,ds+\sigma(s,X_{s},\alpha_{s})\,dW_{s}\,,\quad s\in[0,T]\,,\quad X_{0}=x\,. \tag{1}$$

Here $\alpha=(\alpha_{s})_{s\in[0,T]}$ is a control process belonging to the space of admissible controls $\mathcal{A}$, valued in a separable metric space $A$, and we will write $X^{\alpha}$ to denote the unique solution of (1) which starts from $x$ at time $0$ whilst being controlled by $\alpha$. Furthermore, let $f:[0,T]\times\mathbb{R}^{d}\times A\to\mathbb{R}$ and $g:\mathbb{R}^{d}\to\mathbb{R}$ be given measurable functions and consider the gain functional

$$J(x,\alpha):=\mathbb{E}\left[\int_{0}^{T}f(s,X_{s}^{\alpha},\alpha_{s})\,ds+g(X_{T}^{\alpha})\right] \tag{2}$$

for all $x\in\mathbb{R}^{d}$ and $\alpha\in\mathcal{A}$. We want to solve the optimisation problem, i.e. to find the optimal control $\alpha^{*}$ which achieves the minimum of (2) (or, if the infimum cannot be attained by any $\alpha\in\mathcal{A}$, then an $\varepsilon$-optimal control $\alpha^{\varepsilon}\in\mathcal{A}$ such that $J(x,\alpha^{\varepsilon})\leq\inf_{\alpha\in\mathcal{A}}J(x,\alpha)+\varepsilon$).
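Although not part of the analysis below, it may help to see how (1) and (2) are handled numerically. The following minimal Python sketch estimates the gain functional $J(x,\alpha)$ by plain Monte Carlo with an Euler–Maruyama discretisation of (1); the scalar coefficients $b,\sigma,f,g$ and the constant control chosen here are purely illustrative assumptions, not objects from this paper.

```python
import numpy as np

def estimate_gain(b, sigma, f, g, alpha, x0, T=1.0, n_steps=100, n_paths=10_000, seed=0):
    """Monte Carlo estimate of J(x, alpha) in (2) via Euler-Maruyama for (1).

    b, sigma, f: functions of (t, x, a); g: function of x;
    alpha: a deterministic control, a function of t (an assumption made
    here to keep the sketch short -- the paper allows adapted controls).
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.full(n_paths, x0, dtype=float)
    running_cost = np.zeros(n_paths)
    for i in range(n_steps):
        t = i * dt
        a = alpha(t)
        running_cost += f(t, x, a) * dt          # accumulate the running cost
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        x += b(t, x, a) * dt + sigma(t, x, a) * dw   # Euler-Maruyama step of (1)
    return np.mean(running_cost + g(x))

# Toy example (illustrative only): controlled drift, constant diffusion,
# quadratic costs, and the constant control alpha(t) = 0.5.
J_hat = estimate_gain(
    b=lambda t, x, a: a - x,
    sigma=lambda t, x, a: 0.3,
    f=lambda t, x, a: x**2 + a**2,
    g=lambda x: x**2,
    alpha=lambda t: 0.5,
    x0=1.0,
)
print(f"Monte Carlo estimate of J: {J_hat:.4f}")
```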

In the present paper, we study an approach based on Pontryagin’s optimality principle, see e.g. [4], [7] or [25]. The main idea is to consider optimality conditions for controls of the problem (2). Given $b,\sigma$ and $f$ we define the Hamiltonian $\mathcal{H}:[0,T]\times\mathbb{R}^{d}\times\mathbb{R}^{d}\times\mathbb{R}^{d\times d'}\times A\to\mathbb{R}$ as

$$\mathcal{H}(t,x,y,z,a)=b(t,x,a)\cdot y+\operatorname{tr}(\sigma^{\top}(t,x,a)z)+f(t,x,a)\,. \tag{3}$$

Consider, for each $\alpha\in\mathcal{A}$, the BSDE called the adjoint equation:

$$dY_{s}^{\alpha}=-D_{x}\mathcal{H}(s,X_{s}^{\alpha},Y_{s}^{\alpha},Z_{s}^{\alpha},\alpha_{s})\,ds+Z_{s}^{\alpha}\,dW_{s}\,,\quad Y_{T}^{\alpha}=D_{x}g(X_{T}^{\alpha})\,,\quad s\in[0,T]\,. \tag{4}$$

It is well known from Pontryagin’s optimality principle that if an admissible control $\alpha^{*}\in\mathcal{A}$ is optimal, $X^{\alpha^{*}}$ is the corresponding optimally controlled dynamics (1) and $(Y^{\alpha^{*}},Z^{\alpha^{*}})$ is the solution to the associated adjoint equation (4), then for all $a\in A$ and all $s\in[0,T]$ the following holds:

$$\mathcal{H}(s,X_{s}^{\alpha^{*}},Y_{s}^{\alpha^{*}},Z_{s}^{\alpha^{*}},\alpha_{s}^{*})\leq\mathcal{H}(s,X_{s}^{\alpha^{*}},Y_{s}^{\alpha^{*}},Z_{s}^{\alpha^{*}},a)\quad\text{a.s.} \tag{5}$$

We now define the augmented Hamiltonian $\tilde{\mathcal{H}}:[0,T]\times\mathbb{R}^{d}\times\mathbb{R}^{d}\times\mathbb{R}^{d\times d'}\times A\times A\to\mathbb{R}$ for some $\rho\geq 0$ by

$$\begin{split}\tilde{\mathcal{H}}(t,x,y,z,a',a):=\mathcal{H}(t,x,y,z,a)&+\frac{1}{2}\rho|b(t,x,a)-b(t,x,a')|^{2}+\frac{1}{2}\rho|\sigma(t,x,a)-\sigma(t,x,a')|^{2}\\ &+\frac{1}{2}\rho\left|D_{x}\mathcal{H}(t,x,y,z,a)-D_{x}\mathcal{H}(t,x,y,z,a')\right|^{2}\,.\end{split} \tag{6}$$

Notice that when $\rho=0$ we recover exactly the definition of the Hamiltonian (3). Given the augmented Hamiltonian, let us introduce the modified MSA, Algorithm 1, which consists of successive integrations of the state and adjoint systems and updates to the control. Notice that the backward SDE depends on the Hamiltonian $\mathcal{H}$, while the control update step comes from minimizing the augmented Hamiltonian $\tilde{\mathcal{H}}$.
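Before stating the algorithm, a small computational sketch may clarify (3) and (6). The snippet below implements $\mathcal{H}$ and $\tilde{\mathcal{H}}$ for a one-dimensional toy problem ($d=d'=1$, scalar control); all coefficients are illustrative assumptions of ours, and the final assertion checks that $\rho=0$ recovers the plain Hamiltonian.

```python
import numpy as np

# Toy scalar coefficients (illustrative assumptions, not from the paper).
b    = lambda t, x, a: a - x
db   = lambda t, x, a: -1.0          # D_x b
sig  = lambda t, x, a: 0.3 + 0.1 * a
dsig = lambda t, x, a: 0.0           # D_x sigma
f    = lambda t, x, a: x**2 + a**2
df   = lambda t, x, a: 2.0 * x       # D_x f

def H(t, x, y, z, a):
    """Hamiltonian (3): in one dimension the trace term is just sigma * z."""
    return b(t, x, a) * y + sig(t, x, a) * z + f(t, x, a)

def DxH(t, x, y, z, a):
    """D_x H, the driver of the adjoint equation (4), also used in (6)."""
    return db(t, x, a) * y + dsig(t, x, a) * z + df(t, x, a)

def H_tilde(t, x, y, z, a_prev, a, rho):
    """Augmented Hamiltonian (6); rho = 0 gives back H."""
    return (H(t, x, y, z, a)
            + 0.5 * rho * (b(t, x, a) - b(t, x, a_prev))**2
            + 0.5 * rho * (sig(t, x, a) - sig(t, x, a_prev))**2
            + 0.5 * rho * (DxH(t, x, y, z, a) - DxH(t, x, y, z, a_prev))**2)

# Sanity check: with rho = 0 the augmentation vanishes.
assert np.isclose(H_tilde(0.0, 1.0, 0.5, 0.2, a_prev=0.3, a=0.7, rho=0.0),
                  H(0.0, 1.0, 0.5, 0.2, 0.7))
```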

Algorithm 1 Modified Method of Successive Approximations:
  Initialisation: make a guess of the control $\alpha^{0}=(\alpha^{0}_{s})_{s\in[0,T]}$.
  while the difference between $J(x,\alpha^{n})$ and $J(x,\alpha^{n-1})$ is large do
     Given a control $\alpha^{n-1}=(\alpha^{n-1}_{s})_{s\in[0,T]}$, solve the following forward SDE, then solve the backward SDE:
$$\begin{split}dX^{n}_{s}&=b(s,X^{n}_{s},\alpha^{n-1}_{s})\,ds+\sigma(s,X^{n}_{s},\alpha^{n-1}_{s})\,dW_{s}\,,\quad X^{n}_{0}=x\,,\\ dY^{n}_{s}&=-D_{x}\mathcal{H}(s,X^{n}_{s},Y^{n}_{s},Z^{n}_{s},\alpha^{n-1}_{s})\,ds+Z^{n}_{s}\,dW_{s}\,,\quad Y^{n}_{T}=D_{x}g(X^{n}_{T})\,.\end{split} \tag{7}$$
     Update the control
$$\alpha^{n}_{s}\in\arg\min_{a\in A}\tilde{\mathcal{H}}(s,X^{n}_{s},Y^{n}_{s},Z^{n}_{s},\alpha^{n-1}_{s},a)\,,\quad\forall s\in[0,T]\,. \tag{8}$$
  end while
  return $\alpha^{n}$.
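To illustrate how the loop in Algorithm 1 fits together, here is a hedged Python sketch for the degenerate deterministic case $\sigma\equiv 0$ (so $Z\equiv 0$ and the adjoint BSDE in (7) collapses to an ODE), with Euler time stepping and the minimisation (8) taken over a finite grid of actions. The toy coefficients, the action grid and the value $\rho=2$ are our own illustrative choices, not ones made in the paper.

```python
import numpy as np

# Toy deterministic problem (sigma = 0): dx = (a - x) dt,
# running cost x^2 + a^2, terminal cost x^2. Illustrative only.
T, N, rho, x0 = 1.0, 100, 2.0, 1.0
dt = T / N
A_grid = np.linspace(-2.0, 2.0, 201)        # discretised action space A

b   = lambda x, a: a - x
dxb = lambda x, a: -1.0
f   = lambda x, a: x**2 + a**2
dxf = lambda x, a: 2.0 * x
g   = lambda x: x**2
dxg = lambda x: 2.0 * x

def H(x, y, a):                 # Hamiltonian (3) with sigma = 0
    return b(x, a) * y + f(x, a)

def DxH(x, y, a):               # D_x H for the adjoint equation
    return dxb(x, a) * y + dxf(x, a)

def H_tilde(x, y, a_prev, a):   # augmented Hamiltonian (6); Z terms absent
    return (H(x, y, a) + 0.5 * rho * (b(x, a) - b(x, a_prev))**2
            + 0.5 * rho * (DxH(x, y, a) - DxH(x, y, a_prev))**2)

alpha = np.zeros(N)             # initial guess alpha^0
J_prev = np.inf
for n in range(50):
    # Forward sweep: Euler scheme for the state equation in (7).
    x = np.empty(N + 1); x[0] = x0
    for k in range(N):
        x[k + 1] = x[k] + b(x[k], alpha[k]) * dt
    # Backward sweep: Euler scheme for the adjoint equation in (7).
    y = np.empty(N + 1); y[N] = dxg(x[N])
    for k in reversed(range(N)):
        y[k] = y[k + 1] + DxH(x[k + 1], y[k + 1], alpha[k]) * dt
    # Control update (8): pointwise minimisation over the action grid.
    for k in range(N):
        alpha[k] = A_grid[np.argmin(H_tilde(x[k], y[k], alpha[k], A_grid))]
    # Stopping rule of Algorithm 1: the decrease of the cost J is small.
    J = np.sum(f(x[:-1], alpha)) * dt + g(x[N])
    if abs(J_prev - J) < 1e-8:
        break
    J_prev = J

print(f"iterations: {n + 1}, cost J: {J:.5f}")
```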

The method of successive approximations (i.e. the case $\rho=0$) for the numerical solution of deterministic control problems was proposed already in [5]. A recent application of the modified MSA to a deep learning problem has been studied in [32], where the training of deep neural networks is formulated as an optimal control problem and the modified method of successive approximations is introduced as an alternative training algorithm for deep learning. For us, the main motivation to explore the modified MSA for stochastic control problems is to obtain convergence, ideally with a rate, of an iterative algorithm applicable to problems with the control in the diffusion part of the controlled dynamics. This is in contrast to [36], where a convergence rate of the Bellman–Howard policy iteration is shown, but only for control problems with no control in the diffusion part of the controlled dynamics.

In Lemma 2.3, which is established using careful BSDE estimates, we obtain an estimate on the change of $J$ when we perform a minimization step of the Hamiltonian as in (8). If the sum of the last three terms of (14) is bigger than the first term, then for the classical MSA algorithm (i.e. the case $\rho=0$) we cannot guarantee that the control update is a descent direction for $J$. That means that the method of successive approximations may diverge. To overcome this, we need to modify the algorithm in such a way that convergence is ensured. With this in mind, the desirability of the augmented Hamiltonian (6) for updating the control becomes clear, as long as it still characterises optimal controls like $\mathcal{H}$ does. Theorem 2.4 answers this question affirmatively, which opens the way to the modified MSA. In Theorem 2.5 we show that the modified method of successive approximations converges for arbitrary $T$, and in Corollary 2.6 we show a logarithmic convergence rate for certain stochastic control problems.

We observe that the forward and backward dynamics in (7) are decoupled, due to the iteration used. Therefore, they can be efficiently approximated, even in high dimension, using deep learning methods, see [31] and [30]. However, the minimization step (8) might be computationally expensive for some problems. A possible approach circumventing this is to replace the full minimization in (8) by gradient descent. A continuous version of this gradient flow is analyzed in [37].
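As a sketch of that alternative (our own simplification, not the scheme analysed in [37]), a single explicit gradient step on $a\mapsto\tilde{\mathcal{H}}(\dots,a)$ with a finite-difference gradient could replace the pointwise $\arg\min$; the learning rate and the derivative approximation below are illustrative assumptions.

```python
def gradient_control_update(H_tilde, a_prev, lr=0.1, eps=1e-6):
    """One explicit gradient step on a -> H_tilde(a_prev, a), as a cheap
    surrogate for the full minimisation in (8). The central finite-difference
    gradient and the step size lr are illustrative assumptions."""
    grad = (H_tilde(a_prev, a_prev + eps) - H_tilde(a_prev, a_prev - eps)) / (2 * eps)
    return a_prev - lr * grad
```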

The main contributions of this paper are a probabilistic proof of convergence of the modified method of successive approximations and the establishment of a convergence rate for a specific class of optimal control problems.

This paper is organised as follows: in Section 1.1 we compare our results with existing work. In Section 2 we state the assumptions and main results. In Section 3 we collect all proofs. Finally, in Appendix A we recall an auxiliary lemma which is needed in the proof of Corollary 2.6.

1.1. Related work

One can solve the stochastic optimal control problem using the dynamic programming principle. It is well known, see e.g. Krylov [8], that under reasonable assumptions the value function, defined as the infimum of (2) over all admissible controls, satisfies the Bellman partial differential equation (PDE). There are several approaches to solving this nonlinear problem. One may apply a finite difference method to discretise the Bellman PDE and get a high dimensional nonlinear system of equations, see e.g. [20] or [22]. Or one may linearize the Bellman PDE and then iterate. The classical approach is the Bellman–Howard policy improvement / iteration algorithm, see e.g. [1], [2] or [3]. The algorithm is initialised with a “guess” of the Markovian control. Given a Markovian control strategy at step $n$, one solves a linear PDE with the given control fixed, and then one uses the solution to the linear PDE to update the Markovian control, see e.g. [27], [28] or [29]. In [36], a global rate of convergence and stability for the policy iteration algorithm has been established using backward stochastic differential equation (BSDE) theory. However, the result only applies to stochastic control problems with no control in the diffusion coefficient of the controlled dynamics.

It is known that the solution of the stochastic optimal control problem can be obtained from a corresponding forward-backward stochastic differential equation (FBSDE) via the stochastic optimality principle, see [26, Chapter 8.1]. Indeed, let us consider (1) and (4), and recall from the stochastic optimality principle, see [25, Theorem 4.12], that for the optimal control $\alpha^{*}=(\alpha_{s}^{*})_{s\in[0,T]}$ we have that (5) holds. Assume that, under some conditions on $b,\sigma$ and $f$, the first order condition stated above uniquely determines $\alpha^{*}$ for $s\in[0,T]$ by

$$\alpha_{s}^{*}=\varphi(s,X_{s}^{\alpha^{*}},Y_{s}^{\alpha^{*}},Z_{s}^{\alpha^{*}})\,, \tag{9}$$

for some function $\varphi$. Therefore, after plugging (9) into (1) and (4), we obtain the following coupled FBSDE:

$$\begin{split}dX_{s}^{\alpha^{*}}&=\bar{b}(s,X_{s}^{\alpha^{*}},Y_{s}^{\alpha^{*}},Z_{s}^{\alpha^{*}})\,ds+\bar{\sigma}(s,X_{s}^{\alpha^{*}},Y_{s}^{\alpha^{*}},Z_{s}^{\alpha^{*}})\,dW_{s}\,,\quad s\in[0,T]\,,\quad X_{0}^{\alpha^{*}}=x\,,\\ dY_{s}^{\alpha^{*}}&=-D_{x}\bar{\mathcal{H}}(s,X_{s}^{\alpha^{*}},Y_{s}^{\alpha^{*}},Z_{s}^{\alpha^{*}})\,ds+Z_{s}^{\alpha^{*}}\,dW_{s}\,,\quad Y_{T}^{\alpha^{*}}=D_{x}g(X_{T}^{\alpha^{*}})\,,\quad s\in[0,T]\,,\end{split} \tag{10}$$

where $(\bar{b},\bar{\sigma})(s,X_{s}^{\alpha^{*}},Y_{s}^{\alpha^{*}},Z_{s}^{\alpha^{*}})=(b,\sigma)(s,X_{s}^{\alpha^{*}},\varphi(s,X_{s}^{\alpha^{*}},Y_{s}^{\alpha^{*}},Z_{s}^{\alpha^{*}}))$ and
$\bar{\mathcal{H}}(s,X_{s}^{\alpha^{*}},Y_{s}^{\alpha^{*}},Z_{s}^{\alpha^{*}})=\mathcal{H}(s,X_{s}^{\alpha^{*}},Y_{s}^{\alpha^{*}},Z_{s}^{\alpha^{*}},\varphi(s,X_{s}^{\alpha^{*}},Y_{s}^{\alpha^{*}},Z_{s}^{\alpha^{*}}))$. It is worth mentioning that when $\sigma$ does not depend on the control, $\bar{\sigma}$ depends on the forward process and time only. This means that $\bar{\sigma}$ has no $Y$ and $Z$ components.

The theory of FBSDEs has been studied widely; there are several methods to show existence and uniqueness, and a number of numerical algorithms have been proposed based on those methods. The first is the method of contraction mapping. It was first studied by Antonelli [9] and later by Pardoux and Tang [15]. The main idea there is to show that a certain map is a contraction, and then to apply a fixed point argument. However, it turns out that this method works only for a small enough time horizon $T$. In the case when $\bar{\sigma}$ does not depend on $Y$ and $Z$, having small $T$ is sufficient to get a contraction. Otherwise, one needs to assume additionally that the Lipschitz constants of $\bar{\sigma}$ in $z$ and of $g$ in $x$ satisfy a certain inequality, see [26, Theorem 8.2.1]. Using the method of contraction mapping one can then implement a Picard-iteration-type numerical algorithm and show exponential convergence for small $T$. The second method is the Four Step Scheme. It was introduced by Ma, Protter and Yong, see [10], and was later studied by Delarue [17]. The idea is to use a decoupling function and then study an associated quasi-linear PDE. We note that in [10, 17] the forward diffusion coefficient $\bar{\sigma}$ does not depend on $Z$. This corresponds to stochastic control problems with an uncontrolled diffusion coefficient. Numerical algorithms based on this method exploit the numerical solution of the associated quasi-linear PDE and therefore face some limitations for high dimensional problems, see Douglas, Ma and Protter [12], Milstein and Tretyakov [19], Ma, Shen and Zhao [21] and Delarue and Menozzi [18]. Guo, Zhang and Zhuo [24] proposed a numerical scheme for the high-dimensional quasi-linear PDE associated with the coupled FBSDE when $\bar{\sigma}$ does not depend on $Z$, which is based on a monotone scheme and on a probabilistic approach. Finally, there is the method of continuation. This method was developed by Hu and Peng [11], Peng and Wu [16] and by Yong [14]. It allows one to show existence and uniqueness for arbitrary $T$ under monotonicity conditions on the coefficients, which one would not expect to apply to FBSDEs arising from a control problem as described by (9), (10). Recently, deep learning methods have been applied to solving FBSDEs. In [35], three algorithms for solving fully coupled FBSDEs, with good accuracy and performance for high-dimensional problems, are provided. One of the algorithms is based on the Picard iteration and it converges, but only for small enough $T$. Such a method for solving high-dimensional FBSDEs has also been proposed in [34].

2. Main results

We fix a finite horizon $T\in(0,\infty)$. Let $A$ be a separable metric space. This is the space where the control processes $\alpha$ take values. We fix a filtered probability space $(\Omega,\mathcal{F},\mathbb{F}=(\mathcal{F}_{t})_{0\leq t\leq T},\mathbb{P})$. Let $W=(W_{t})_{t\in[0,T]}$ be a $d'$-dimensional Wiener martingale on this space. By $\mathbb{E}_{t}$ we denote the conditional expectation with respect to $\mathcal{F}_{t}$. Let $|\cdot|$ denote any norm in a finite dimensional Euclidean space. By $\|\cdot\|_{L^{\infty}}$ we denote the norm in $L^{\infty}(\Omega)$. Let $\|Z\|_{\mathbb{H}^{\infty}}:=\operatorname{ess\,sup}_{(t,\omega)}|Z_{t}(\omega)|$ for any predictable process $Z$. We understand $D_{x}\sigma=D_{x_{l}}\sigma^{ij}$, $D_{x}^{2}b=D^{2}_{x_{l}x_{n}}b^{i}$ and $D_{x}^{2}\sigma=D^{2}_{x_{l}x_{n}}\sigma^{ij}$, where $i,l,n=1,2,\dots,d$ and $j=1,2,\dots,d'$. By $Z^{\top}$ we denote the transpose of $Z$. The state of the system is governed by the controlled SDE (1). The corresponding adjoint equation satisfies (4).

Assumption 2.1.

The functions $b$ and $\sigma$ are jointly continuous in $t$ and twice differentiable in $x$. There exists $K\geq 0$ such that for all $x\in\mathbb{R}^{d}$, $a\in A$, $t\in[0,T]$,

$$|D_{x}b(t,x,a)|+|D_{x}\sigma(t,x,a)|+|D^{2}_{x}b(t,x,a)|\leq K\,. \tag{11}$$

Moreover, assume that $D_{x}^{2}\sigma(t,x,a)=0$ for all $x\in\mathbb{R}^{d}$, $a\in A$, $t\in[0,T]$.

Clearly the assumption (11) implies that for all $x,x'\in\mathbb{R}^{d}$, $a\in A$, $t\in[0,T]$ we have

$$|b(t,x,a)-b(t,x',a)|+|\sigma(t,x,a)-\sigma(t,x',a)|\leq K|x-x'|\,. \tag{12}$$

The assumption that $D_{x}^{2}\sigma(t,x,a)=0$ for all $x\in\mathbb{R}^{d}$, $a\in A$, $t\in[0,T]$ is needed so that (21), in the proof of Lemma 3.1, holds. Without this assumption (21) would only hold if we could show that $\|Z^{\alpha}\|_{\mathbb{H}^{\infty}}<\infty$. Without additional regularization of the control problem this is impossible. Indeed, with [13, Proposition 5.3] we see that $Z^{\alpha}_{t}$ is a version of $D_{t}Y_{t}^{\alpha}$ (the Malliavin derivative of $Y_{t}^{\alpha}$) and $D_{t}Y_{t}^{\alpha}$ itself satisfies a linear BSDE. However, to obtain the estimates using this representation, one term that arises is $D_{t}\alpha_{s}$, where $t\in[0,T]$ and $s\in[t,T]$. So we would need $\operatorname{ess\,sup}_{\omega\in\Omega,t\in(0,T),s\in(t,T)}|D_{t}\alpha_{s}(\omega)|<\infty$. This is not necessarily the case here.

Assumption 2.2.

The function $f$ is jointly continuous in $t$, and $f$ and $g$ are twice differentiable in $x$. There is a constant $K\geq 0$ such that for all $x\in\mathbb{R}^{d}$, $a\in A$, $t\in[0,T]$,

$$|D_{x}g(x)|+|D_{x}f(t,x,a)|+|D^{2}_{x}g(x)|+|D_{x}^{2}f(t,x,a)|\leq K\,. \tag{13}$$

Under these assumptions, we can obtain the following estimate.

Lemma 2.3.

Let Assumptions 2.1 and 2.2 hold. Then for any admissible controls $\varphi$ and $\theta$ there exists a constant $C>0$ such that

$$\begin{split}J(x,\varphi)-J(x,\theta)&\leq\mathbb{E}\int_{0}^{T}[\mathcal{H}(s,X^{\theta}_{s},Y^{\theta}_{s},Z^{\theta}_{s},\varphi_{s})-\mathcal{H}(s,X^{\theta}_{s},Y^{\theta}_{s},Z^{\theta}_{s},\theta_{s})]\,ds\\ &\quad+C\,\mathbb{E}\int_{0}^{T}|b(s,X^{\theta}_{s},\varphi_{s})-b(s,X^{\theta}_{s},\theta_{s})|^{2}\,ds\\ &\quad+C\,\mathbb{E}\int_{0}^{T}|\sigma(s,X^{\theta}_{s},\varphi_{s})-\sigma(s,X^{\theta}_{s},\theta_{s})|^{2}\,ds\\ &\quad+C\,\mathbb{E}\int_{0}^{T}|D_{x}\mathcal{H}(s,X^{\theta}_{s},Y^{\theta}_{s},Z^{\theta}_{s},\varphi_{s})-D_{x}\mathcal{H}(s,X^{\theta}_{s},Y^{\theta}_{s},Z^{\theta}_{s},\theta_{s})|^{2}\,ds\,.\end{split} \tag{14}$$

The proof will be given in Section 3. We now state a necessary condition for optimality in terms of the augmented Hamiltonian.

Theorem 2.4 (Extended Pontryagin’s optimality principle).

Let $\alpha^{*}$ be the (locally) optimal control, $X^{\alpha^{*}}$ be the associated controlled state solving (1), and $(Y^{\alpha^{*}},Z^{\alpha^{*}})$ be the associated adjoint processes solving (4). Then for any $a\in A$ we have

$$\tilde{\mathcal{H}}(s,X_{s}^{\alpha^{*}},Y_{s}^{\alpha^{*}},Z_{s}^{\alpha^{*}},\alpha_{s}^{*},\alpha_{s}^{*})\leq\tilde{\mathcal{H}}(s,X_{s}^{\alpha^{*}},Y_{s}^{\alpha^{*}},Z_{s}^{\alpha^{*}},\alpha_{s}^{*},a)\,,\quad\forall s\in[0,T]\,. \tag{15}$$

The proof of Theorem 2.4 will also be given in Section 3. We are now ready to present the main result of the paper.

Theorem 2.5.

Let Assumptions 2.1 and 2.2 hold. Then Algorithm 1 converges to a local minimum of (2) for sufficiently large $\rho>0$.

Theorem 2.5 will be proved in Section 3. It can be seen from the proof that $\rho$ needs to be more than twice the constant appearing in Lemma 2.3, which itself increases with $T$, $d$ and the constants from Assumptions 2.1 and 2.2.

We cannot guarantee that Algorithm 1 converges to the optimal control which minimizes (2), since the extended Pontryagin optimality principle, see Theorem 2.4, is a necessary condition for optimality. The sufficient condition for optimality tells us that to get the optimal control we need to assume convexity of the Hamiltonian in the state and control variables, as well as convexity of the terminal cost function. To that end, we would need to assume convexity of $b,\sigma,f$ and $g$ in $x$ and $a$.

In the following corollary, we show that in a particular setting of the problem we have logarithmic convergence of the modified method of successive approximations to the true solution of the problem.

Corollary 2.6.

Let Assumptions 2.1 and 2.2 hold. Moreover, assume that $b,\sigma$ and $f$ are of the form

$$\begin{split}&b(t,x,a)=b_{1}(t)x+b_{2}(t,a)\,,\\ &\sigma(t,x,a)=\sigma_{1}(t)x+\sigma_{2}(t,a)\,,\\ &f(t,x,a)=f_{1}(t,x)+f_{2}(t,a)\end{split}$$

for all $t\in[0,T]$, $x\in\mathbb{R}^{d}$, $a\in A$. In addition, assume that $f$ and $g$ are convex in $x$, and that $f_{2},b_{2},\sigma_{2}$ are convex in $a$. Then we have the following estimate for the sequence $(\alpha^{n})_{n\in\mathbb{N}}$ from Algorithm 1:

$$0\leq J(x,\alpha^{n})-J(x,\alpha^{*})\leq\frac{C}{n}\,,$$

where $\alpha^{*}$ is the optimal control for (2) and $C$ is a positive constant.

The proof of Corollary 2.6 will be given in Section 3. Theorem 2.5 and Corollary 2.6 are extensions of the result in [5] to the stochastic case.

3. Proofs

We start working towards the proof of Theorem 2.5. Recall the adjoint equation for an admissible control $\alpha$:

$$dY_{s}^{\alpha}=-D_{x}\mathcal{H}(s,X_{s}^{\alpha},Y_{s}^{\alpha},Z_{s}^{\alpha},\alpha_{s})\,ds+Z_{s}^{\alpha}\,dW_{s}\,,\quad s\in[0,T]\,,\quad Y_{T}^{\alpha}=D_{x}g(X_{T}^{\alpha})\,. \tag{16}$$

From now on, we shall use Einstein notation, so that repeated indices in a single term imply summation over all the values of that index.

Lemma 3.1.

Assume that there exists $K\geq 0$ such that for all $x\in\mathbb{R}^{d}$, $a\in A$, $t\in[0,T]$ we have

$$|D_{x}b(t,x,a)|+|D_{x}\sigma(t,x,a)|\leq K\,,$$

and

$$|D_{x}g(x)|+|D_{x}f(t,x,a)|\leq K\,.$$

Then $\|Y^{\alpha}\|_{\mathbb{H}^{\infty}}$ is bounded.

Proof.

From the definition of the Hamiltonian (3) we have

$$\begin{split}D_{x_{i}}\mathcal{H}(s,X_{s}^{\alpha},Y_{s}^{\alpha},Z_{s}^{\alpha},\alpha_{s})&=D_{x_{i}}b^{j}(s,X_{s}^{\alpha},\alpha_{s})(Y_{s}^{\alpha})^{j}+D_{x_{i}}\sigma^{jp}(s,X_{s}^{\alpha},\alpha_{s})(Z_{s}^{\alpha})^{jp}\\ &\quad+D_{x_{i}}f(s,X_{s}^{\alpha},\alpha_{s})\,,\quad\forall s\in[0,T]\,,\,\,i=1,2,\dots,d\,.\end{split}$$

Hence, one can observe that (16) is a linear BSDE. Therefore, from [33, Proposition 3.2] we can write the formula for the solution of (16):

$$Y_{t}^{\alpha}=\mathbb{E}_{t}\left[S_{t}^{-1}S_{T}D_{x}g(X_{T}^{\alpha})+\int_{t}^{T}S_{t}^{-1}S_{s}D_{x}f(s,X_{s}^{\alpha},\alpha_{s})\,ds\right]\,,$$

where the process $S$ is the unique strong solution of

$$dS_{t}^{ij}=S^{il}_{t}D_{x_{l}}b^{j}(t,X_{t}^{\alpha},\alpha_{t})\,dt+S^{il}_{t}D_{x_{l}}\sigma^{jp}(t,X_{t}^{\alpha},\alpha_{t})\,dW_{t}^{p}\,,\quad i,j=1,2,\dots,d\,,\quad S_{0}=I_{d}\,,$$

and $S^{-1}$ is the inverse process of $S$. Thus, due to [33, Corollary 3.7] and the assumptions of the lemma, we have the following bound:

$$\|Y^{\alpha}\|_{\mathbb{H}^{\infty}}\leq C\|D_{x}g(X_{T}^{\alpha})\|_{L^{\infty}}+CT\|D_{x}f(\cdot,X_{\cdot}^{\alpha},\alpha_{\cdot})\|_{\mathbb{H}^{\infty}}\,.$$

Hence, due to the assumptions of the lemma, we conclude that $\|Y^{\alpha}\|_{\mathbb{H}^{\infty}}$ is bounded. ∎

Proof of Lemma 2.3.

Let $\varphi$ and $\theta$ be some generic admissible controls. We will write $(X^{\varphi}_{s})_{s\in[0,T]}$ for the solution of (1) controlled by $\varphi$ and $(X^{\theta}_{s})_{s\in[0,T]}$ for the solution of (1) controlled by $\theta$. We denote the solutions of the corresponding adjoint equations by $(Y^{\varphi}_{s},Z^{\varphi}_{s})_{s\in[0,T]}$ and $(Y^{\theta}_{s},Z^{\theta}_{s})_{s\in[0,T]}$. Due to Taylor’s theorem, we note that for some $R^{1}(\omega)\in[0,1]$ we have, for all $\omega\in\Omega$, that

$$\begin{split}g(X^{\varphi}_{T})-g(X^{\theta}_{T})&=(D_{x}g(X^{\theta}_{T}))^{\top}(X^{\varphi}_{T}-X^{\theta}_{T})\\ &\quad+\frac{1}{2}(X^{\varphi}_{T}-X^{\theta}_{T})^{\top}D^{2}_{x}g(X^{\theta}_{T}+R^{1}(X^{\varphi}_{T}-X^{\theta}_{T}))(X^{\varphi}_{T}-X^{\theta}_{T})\\ &\leq(D_{x}g(X^{\theta}_{T}))^{\top}(X^{\varphi}_{T}-X^{\theta}_{T})\\ &\quad+\frac{1}{2}(X^{\varphi}_{T}-X^{\theta}_{T})^{\top}\left|D^{2}_{x}g(X^{\theta}_{T}+R^{1}(X^{\varphi}_{T}-X^{\theta}_{T}))\right|(X^{\varphi}_{T}-X^{\theta}_{T})\\ &\leq(D_{x}g(X^{\theta}_{T}))^{\top}(X^{\varphi}_{T}-X^{\theta}_{T})+\frac{K}{2}\left|X^{\varphi}_{T}-X^{\theta}_{T}\right|^{2}\,.\end{split}$$

The last inequality holds due to Assumption 2.2. Recall that $Y^{\theta}_{T}=D_{x}g(X^{\theta}_{T})$. Hence, using Itô’s product rule, we get

$$\begin{split}\mathbb{E}[g(X^{\varphi}_{T})-g(X^{\theta}_{T})]&\leq\mathbb{E}\left[(Y^{\theta}_{T})^{\top}(X^{\varphi}_{T}-X^{\theta}_{T})+\frac{K}{2}\left|X^{\varphi}_{T}-X^{\theta}_{T}\right|^{2}\right]\\ &\leq\mathbb{E}\int_{0}^{T}(X^{\varphi}_{s}-X^{\theta}_{s})^{\top}\,dY^{\theta}_{s}+\mathbb{E}\int_{0}^{T}(Y^{\theta}_{s})^{\top}[dX^{\varphi}_{s}-dX^{\theta}_{s}]\\ &\quad+\mathbb{E}\int_{0}^{T}\operatorname{tr}[(\sigma(s,X^{\varphi}_{s},\varphi_{s})-\sigma(s,X^{\theta}_{s},\theta_{s}))^{\top}Z^{\theta}_{s}]\,ds+\frac{K}{2}\mathbb{E}\left[\left|X^{\varphi}_{T}-X^{\theta}_{T}\right|^{2}\right]\,.\end{split}$$

From this, the forward SDE (1) and the adjoint equation (4) we thus get

$$\begin{split}\mathbb{E}[g(X^{\varphi}_{T})-g(X^{\theta}_{T})]&\leq-\mathbb{E}\int_{0}^{T}(X^{\varphi}_{s}-X^{\theta}_{s})^{\top}D_{x}\mathcal{H}(s,X^{\theta}_{s},Y^{\theta}_{s},Z^{\theta}_{s},\theta_{s})\,ds\\ &\quad+\mathbb{E}\int_{0}^{T}(Y^{\theta}_{s})^{\top}[b(s,X^{\varphi}_{s},\varphi_{s})-b(s,X^{\theta}_{s},\theta_{s})]\,ds\\ &\quad+\mathbb{E}\int_{0}^{T}\operatorname{tr}[(\sigma(s,X^{\varphi}_{s},\varphi_{s})-\sigma(s,X^{\theta}_{s},\theta_{s}))^{\top}Z^{\theta}_{s}]\,ds+\frac{K}{2}\mathbb{E}\left[\left|X^{\varphi}_{T}-X^{\theta}_{T}\right|^{2}\right]\,.\end{split} \tag{17}$$

On the other hand, by definition of the Hamiltonian we have

$$\begin{split}\mathbb{E}\int_{0}^{T}[f(s,X^{\varphi}_{s},\varphi_{s})-f(s,X^{\theta}_{s},\theta_{s})]\,ds&=\mathbb{E}\int_{0}^{T}[\mathcal{H}(s,X^{\varphi}_{s},Y^{\theta}_{s},Z^{\theta}_{s},\varphi_{s})-\mathcal{H}(s,X^{\theta}_{s},Y^{\theta}_{s},Z^{\theta}_{s},\theta_{s})]\,ds\\ &\quad-\mathbb{E}\int_{0}^{T}(Y^{\theta}_{s})^{\top}[b(s,X^{\varphi}_{s},\varphi_{s})-b(s,X^{\theta}_{s},\theta_{s})]\,ds\\ &\quad-\mathbb{E}\int_{0}^{T}\operatorname{tr}[(\sigma(s,X^{\varphi}_{s},\varphi_{s})-\sigma(s,X^{\theta}_{s},\theta_{s}))^{\top}Z^{\theta}_{s}]\,ds\,.\end{split} \tag{18}$$

Summing up (17) and (18) we get

$$\begin{split}J(x,\varphi)-J(x,\theta)&=\mathbb{E}[g(X^{\varphi}_{T})-g(X^{\theta}_{T})]+\mathbb{E}\int_{0}^{T}[f(s,X^{\varphi}_{s},\varphi_{s})-f(s,X^{\theta}_{s},\theta_{s})]\,ds\\ &\leq\mathbb{E}\int_{0}^{T}[\mathcal{H}(s,X^{\varphi}_{s},Y^{\theta}_{s},Z^{\theta}_{s},\varphi_{s})-\mathcal{H}(s,X^{\theta}_{s},Y^{\theta}_{s},Z^{\theta}_{s},\theta_{s})\\ &\qquad-(X^{\varphi}_{s}-X^{\theta}_{s})^{\top}D_{x}\mathcal{H}(s,X^{\theta}_{s},Y^{\theta}_{s},Z^{\theta}_{s},\theta_{s})]\,ds+\frac{K}{2}\mathbb{E}\left[\left|X^{\varphi}_{T}-X^{\theta}_{T}\right|^{2}\right]\,.\end{split} \tag{19}$$

Due to Taylor’s theorem, there exists $(R^{2}_{s}(\omega))_{s\in[0,T]}$ with values in $[0,1]$ such that for all $\omega\in\Omega$ we have

$$\begin{split}\mathcal{H}(s,X^{\varphi}_{s},Y^{\theta}_{s},Z^{\theta}_{s},\varphi_{s})-\mathcal{H}(s,X^{\theta}_{s},Y^{\theta}_{s},Z^{\theta}_{s},\theta_{s})&=\mathcal{H}(s,X^{\theta}_{s},Y^{\theta}_{s},Z^{\theta}_{s},\varphi_{s})-\mathcal{H}(s,X^{\theta}_{s},Y^{\theta}_{s},Z^{\theta}_{s},\theta_{s})\\ &\quad+(X^{\varphi}_{s}-X^{\theta}_{s})^{\top}D_{x}\mathcal{H}(s,X^{\theta}_{s},Y^{\theta}_{s},Z^{\theta}_{s},\varphi_{s})\\ &\quad+\frac{1}{2}(X^{\varphi}_{s}-X^{\theta}_{s})^{\top}D_{x}^{2}\mathcal{H}(s,X^{\theta}_{s}+R^{2}_{s}(X^{\varphi}_{s}-X^{\theta}_{s}),Y^{\theta}_{s},Z^{\theta}_{s},\varphi_{s})(X^{\varphi}_{s}-X^{\theta}_{s})\,.\end{split} \tag{20}$$

Since $D_{x}^{2}\sigma(s,X^{\theta}_{s}+R^{2}_{s}(X^{\varphi}_{s}-X^{\theta}_{s}),\varphi_{s})=0$ by Assumption 2.1, we have that

$$\begin{split}\left|D_{x_{i}x_{j}}^{2}\mathcal{H}(s,X^{\theta}_{s}+R^{2}_{s}(X^{\varphi}_{s}-X^{\theta}_{s}),Y^{\theta}_{s},Z^{\theta}_{s},\varphi_{s})\right|&=\left|D_{x_{i}x_{j}}^{2}b^{l}(s,X^{\theta}_{s}+R^{2}_{s}(X^{\varphi}_{s}-X^{\theta}_{s}),\varphi_{s})(Y^{\theta}_{s})^{l}\right.\\ &\quad\left.+D_{x_{i}x_{j}}^{2}f(s,X^{\theta}_{s}+R^{2}_{s}(X^{\varphi}_{s}-X^{\theta}_{s}),\varphi_{s})\right|\,,\quad i,j=1,2,\dots,d\,.\end{split}$$

From Lemma 3.1 we know that $|Y^{\theta}_{s}|$ is bounded a.s. for all $s\in[0,T]$. Hence by Assumptions 2.1 and 2.2 we have

$$|D_{x}^{2}\mathcal{H}(s,X^{\theta}_{s}+R^{2}_{s}(X^{\varphi}_{s}-X^{\theta}_{s}),Y^{\theta}_{s},Z^{\theta}_{s},\varphi_{s})|<\infty\,. \tag{21}$$

Therefore, after substituting (20) into (19), and by (21), we get

$$\begin{split}J(x,\varphi)-J(x,\theta)&\leq\mathbb{E}\int_{0}^{T}\Big[\mathcal{H}(s,X^{\theta}_{s},Y^{\theta}_{s},Z^{\theta}_{s},\varphi_{s})-\mathcal{H}(s,X^{\theta}_{s},Y^{\theta}_{s},Z^{\theta}_{s},\theta_{s})\\ &\qquad+(X^{\varphi}_{s}-X^{\theta}_{s})^{\top}(D_{x}\mathcal{H}(s,X^{\theta}_{s},Y^{\theta}_{s},Z^{\theta}_{s},\varphi_{s})-D_{x}\mathcal{H}(s,X^{\theta}_{s},Y^{\theta}_{s},Z^{\theta}_{s},\theta_{s}))\\ &\qquad+\frac{K}{2}\left|X^{\varphi}_{s}-X^{\theta}_{s}\right|^{2}\Big]\,ds+\frac{K}{2}\mathbb{E}\left[\left|X^{\varphi}_{T}-X^{\theta}_{T}\right|^{2}\right]\,.\end{split}$$

Let us now get a standard SDE estimate for the difference of $X^{\varphi}$ and $X^{\theta}$. Using $(a+b)^{2}\leq 2a^{2}+2b^{2}$, taking the expectation and applying Hölder’s inequality, Assumption 2.1, the Burkholder–Davis–Gundy inequality and Gronwall’s inequality, we obtain

$$\begin{split}\mathbb{E}\sup_{0\leq t\leq T}|X^{\varphi}_{t}-X^{\theta}_{t}|^{2}&\leq C\,\mathbb{E}\int_{0}^{T}|b(s,X^{\theta}_{s},\varphi_{s})-b(s,X^{\theta}_{s},\theta_{s})|^{2}\,ds\\ &\quad+C\,\mathbb{E}\int_{0}^{T}|\sigma(s,X^{\theta}_{s},\varphi_{s})-\sigma(s,X^{\theta}_{s},\theta_{s})|^{2}\,ds\,.\end{split} \tag{22}$$

Young’s inequality allows us to get the estimate

$$\begin{split}J(x,\varphi)-J(x,\theta)&\leq\mathbb{E}\int_{0}^{T}[\mathcal{H}(s,X^{\theta}_{s},Y^{\theta}_{s},Z^{\theta}_{s},\varphi_{s})-\mathcal{H}(s,X^{\theta}_{s},Y^{\theta}_{s},Z^{\theta}_{s},\theta_{s})]\,ds+\frac{1}{2}\mathbb{E}\int_{0}^{T}|X^{\varphi}_{s}-X^{\theta}_{s}|^{2}\,ds\\ &\quad+\frac{1}{2}\mathbb{E}\left[\int_{0}^{T}|D_{x}\mathcal{H}(s,X^{\theta}_{s},Y^{\theta}_{s},Z^{\theta}_{s},\varphi_{s})-D_{x}\mathcal{H}(s,X^{\theta}_{s},Y^{\theta}_{s},Z^{\theta}_{s},\theta_{s})|^{2}\right.\\ &\qquad\left.+\frac{K}{2}\left|X^{\varphi}_{s}-X^{\theta}_{s}\right|^{2}\,ds\right]+\frac{K}{2}\mathbb{E}\left[\left|X^{\varphi}_{T}-X^{\theta}_{T}\right|^{2}\right]\,.\end{split}$$

Hence, from (22) we have that

$$\begin{split}J(x,\varphi)-J(x,\theta)&\leq\mathbb{E}\int_{0}^{T}[\mathcal{H}(s,X^{\theta}_{s},Y^{\theta}_{s},Z^{\theta}_{s},\varphi_{s})-\mathcal{H}(s,X^{\theta}_{s},Y^{\theta}_{s},Z^{\theta}_{s},\theta_{s})]\,ds\\ &\quad+C\,\mathbb{E}\int_{0}^{T}|b(s,X^{\theta}_{s},\varphi_{s})-b(s,X^{\theta}_{s},\theta_{s})|^{2}\,ds\\ &\quad+C\,\mathbb{E}\int_{0}^{T}|\sigma(s,X^{\theta}_{s},\varphi_{s})-\sigma(s,X^{\theta}_{s},\theta_{s})|^{2}\,ds\\ &\quad+C\,\mathbb{E}\int_{0}^{T}|D_{x}\mathcal{H}(s,X^{\theta}_{s},Y^{\theta}_{s},Z^{\theta}_{s},\varphi_{s})-D_{x}\mathcal{H}(s,X^{\theta}_{s},Y^{\theta}_{s},Z^{\theta}_{s},\theta_{s})|^{2}\,ds\,,\end{split}$$

for some constant $C>0$, which depends on $K$, $T$ and $d$. ∎

Proof of Theorem 2.4.

Since $\alpha^{*}$ is the (locally) optimal control for the problem (2), Pontryagin’s optimality principle holds, see e.g. [23]. Hence for any $a\in A$ we have

$$\mathcal{H}(s,X_{s}^{\alpha^{*}},Y_{s}^{\alpha^{*}},Z_{s}^{\alpha^{*}},\alpha_{s}^{*})\leq\mathcal{H}(s,X_{s}^{\alpha^{*}},Y_{s}^{\alpha^{*}},Z_{s}^{\alpha^{*}},a)\,,\quad\forall s\in[0,T]\,. \tag{23}$$

By definition of the augmented Hamiltonian (6), for all $s\in[0,T]$ we have

$$\begin{split}\tilde{\mathcal{H}}(s,X_{s}^{\alpha^{*}},Y_{s}^{\alpha^{*}},Z_{s}^{\alpha^{*}},\alpha_{s}^{*},a)&=\mathcal{H}(s,X_{s}^{\alpha^{*}},Y_{s}^{\alpha^{*}},Z_{s}^{\alpha^{*}},a)\\ &\quad+\frac{1}{2}\rho|b(s,X_{s}^{\alpha^{*}},a)-b(s,X_{s}^{\alpha^{*}},\alpha_{s}^{*})|^{2}+\frac{1}{2}\rho|\sigma(s,X_{s}^{\alpha^{*}},a)-\sigma(s,X_{s}^{\alpha^{*}},\alpha_{s}^{*})|^{2}\\ &\quad+\frac{1}{2}\rho|D_{x}\mathcal{H}(s,X_{s}^{\alpha^{*}},Y_{s}^{\alpha^{*}},Z_{s}^{\alpha^{*}},a)-D_{x}\mathcal{H}(s,X_{s}^{\alpha^{*}},Y_{s}^{\alpha^{*}},Z_{s}^{\alpha^{*}},\alpha_{s}^{*})|^{2}\,.\end{split} \tag{24}$$

Therefore, due to (23) and (24) we have

$$\begin{split}\tilde{\mathcal{H}}(s,X_{s}^{\alpha^{*}},Y_{s}^{\alpha^{*}},Z_{s}^{\alpha^{*}},\alpha_{s}^{*},\alpha_{s}^{*})&=\mathcal{H}(s,X_{s}^{\alpha^{*}},Y_{s}^{\alpha^{*}},Z_{s}^{\alpha^{*}},\alpha_{s}^{*})\\ &\leq\mathcal{H}(s,X_{s}^{\alpha^{*}},Y_{s}^{\alpha^{*}},Z_{s}^{\alpha^{*}},a)+\frac{1}{2}\rho|b(s,X_{s}^{\alpha^{*}},a)-b(s,X_{s}^{\alpha^{*}},\alpha_{s}^{*})|^{2}\\ &\quad+\frac{1}{2}\rho|\sigma(s,X_{s}^{\alpha^{*}},a)-\sigma(s,X_{s}^{\alpha^{*}},\alpha_{s}^{*})|^{2}\\ &\quad+\frac{1}{2}\rho|D_{x}\mathcal{H}(s,X_{s}^{\alpha^{*}},Y_{s}^{\alpha^{*}},Z_{s}^{\alpha^{*}},a)-D_{x}\mathcal{H}(s,X_{s}^{\alpha^{*}},Y_{s}^{\alpha^{*}},Z_{s}^{\alpha^{*}},\alpha_{s}^{*})|^{2}\\ &=\tilde{\mathcal{H}}(s,X_{s}^{\alpha^{*}},Y_{s}^{\alpha^{*}},Z_{s}^{\alpha^{*}},\alpha_{s}^{*},a)\,.\end{split}$$

This concludes the proof. ∎

Proof of Theorem 2.5.

Let us apply Lemma 2.3 with $\varphi=\alpha^{n}$ and $\theta=\alpha^{n-1}$. Hence, for some $C>0$ we have

$$\begin{split}J(x,\alpha^{n})-J(x,\alpha^{n-1})&\leq\mathbb{E}\int_{0}^{T}[\mathcal{H}(s,X^{n}_{s},Y^{n}_{s},Z^{n}_{s},\alpha^{n}_{s})-\mathcal{H}(s,X^{n}_{s},Y^{n}_{s},Z^{n}_{s},\alpha^{n-1}_{s})]\,ds\\ &\quad+C\,\mathbb{E}\int_{0}^{T}|b(s,X^{n}_{s},\alpha^{n}_{s})-b(s,X^{n}_{s},\alpha^{n-1}_{s})|^{2}\,ds\\ &\quad+C\,\mathbb{E}\int_{0}^{T}|\sigma(s,X^{n}_{s},\alpha^{n}_{s})-\sigma(s,X^{n}_{s},\alpha^{n-1}_{s})|^{2}\,ds\\ &\quad+C\,\mathbb{E}\int_{0}^{T}\left|D_{x}\mathcal{H}(s,X^{n}_{s},Y^{n}_{s},Z^{n}_{s},\alpha^{n}_{s})-D_{x}\mathcal{H}(s,X^{n}_{s},Y^{n}_{s},Z^{n}_{s},\alpha^{n-1}_{s})\right|^{2}\,ds\,.\end{split} \tag{25}$$

Let

$$\mu(\alpha^{n-1})=\mathbb{E}\int_{0}^{T}[\mathcal{H}(s,X^{n}_{s},Y^{n}_{s},Z^{n}_{s},\alpha^{n}_{s})-\mathcal{H}(s,X^{n}_{s},Y^{n}_{s},Z^{n}_{s},\alpha^{n-1}_{s})]\,ds\,.$$

Due to the definition of $\alpha^{n}$ in (8) and due to (15), we have for all $s\in[0,T]$

$$\begin{split}\mathcal{H}(s,X_{s}^{n},Y_{s}^{n},Z_{s}^{n},\alpha^{n}_{s})&+\frac{1}{2}\rho|b(s,X^{n}_{s},\alpha^{n}_{s})-b(s,X^{n}_{s},\alpha^{n-1}_{s})|^{2}\\ &+\frac{1}{2}\rho|\sigma(s,X^{n}_{s},\alpha^{n}_{s})-\sigma(s,X^{n}_{s},\alpha^{n-1}_{s})|^{2}\\ &+\frac{1}{2}\rho|D_{x}\mathcal{H}(s,X^{n}_{s},Y^{n}_{s},Z^{n}_{s},\alpha^{n}_{s})-D_{x}\mathcal{H}(s,X^{n}_{s},Y^{n}_{s},Z^{n}_{s},\alpha^{n-1}_{s})|^{2}\\ &\leq\mathcal{H}(s,X_{s}^{n},Y_{s}^{n},Z_{s}^{n},\alpha^{n-1}_{s})\,.\end{split}$$

Therefore, we can observe that $\mu(\alpha^{n-1})\leq 0$. Hence we can rewrite the inequality (25) as

$$J(x,\alpha^{n})-J(x,\alpha^{n-1})\leq\mu(\alpha^{n-1})-\frac{2C}{\rho}\mu(\alpha^{n-1})=D\mu(\alpha^{n-1})\,, \tag{26}$$

where $D:=1-\frac{2C}{\rho}$. By choosing $\rho>2C$ we have that $D>0$. Notice that for any integer $M>1$ we have

$$\begin{split}\sum_{n=1}^{M}(-\mu(\alpha^{n-1}))&\leq D^{-1}\sum_{n=1}^{M}(J(x,\alpha^{n-1})-J(x,\alpha^{n}))\\ &=D^{-1}(J(x,\alpha^{0})-J(x,\alpha^{M}))\leq D^{-1}(J(x,\alpha^{0})-\inf_{\alpha\in\mathcal{A}}J(x,\alpha))<\infty\,.\end{split}$$

Since $(-\mu(\alpha^{n-1}))\geq 0$ and $\sum_{n=1}^{\infty}(-\mu(\alpha^{n-1}))<+\infty$, we have that $\mu(\alpha^{n-1})\rightarrow 0$ as $n\rightarrow\infty$. This concludes the proof. ∎

We need to introduce new notation, which will be used in the proof of Corollary 2.6. Denote the set

$$I_{\tau,h}:=[\tau-h,\tau+h]\cap[0,T]\,,\quad\tau\in[0,T]\,,\,\,h\in[0,+\infty)\,. \tag{27}$$

Let us define for all $s\in[0,T]$

$$\Delta_{\alpha^{n-1}}\mathcal{H}(s):=\mathcal{H}(s,X^{n}_{s},Y^{n}_{s},Z^{n}_{s},\alpha^{n}_{s})-\mathcal{H}(s,X^{n}_{s},Y^{n}_{s},Z^{n}_{s},\alpha^{n-1}_{s})\,,$$

and

$$\mu(\alpha^{n-1}):=\mathbb{E}\int_{0}^{T}\Delta_{\alpha^{n-1}}\mathcal{H}(s)\,ds\,.$$

Notice that, by definition of $\alpha^{n}$, we have $\Delta_{\alpha^{n-1}}\mathcal{H}(t)\leq 0$ for all $t\in[0,T]$. Let us first prove an auxiliary lemma.

Lemma 3.2.

For any $h>0$ there exists $\tau$, which depends on $h$ and $\alpha^{n-1}$, such that

$$\mathbb{E}\int_{I_{\tau,h}}\Delta_{\alpha^{n-1}}\mathcal{H}(t)\,dt\leq\frac{h\,\mu(\alpha^{n-1})}{T}\,.$$
Proof.

We argue by contradiction. Assume that there exists $h^{*}>0$ such that for all $\tau\in[0,T]$ we have

$$\mathbb{E}\int_{I_{\tau,h^{*}}}\Delta_{\alpha^{n-1}}\mathcal{H}(t)\,dt>\frac{h^{*}\mu(\alpha^{n-1})}{T}\,. \tag{28}$$

Denote $\tau_{i}=ih^{*}$, $i=0,1,\dots,N(h^{*})$, where $N(h^{*})=[T/h^{*}]$ is the integer part of $T/h^{*}$. Since $\Delta_{\alpha^{n-1}}\mathcal{H}(t)\leq 0$ for all $t\in[0,T]$ by definition of $\alpha^{n}$, and $\cup_{i=0}^{N(h^{*})}I_{\tau_{i},h^{*}}$ is a superset of $[0,T]$, we have

$$\mu(\alpha^{n-1})=\mathbb{E}\int_{0}^{T}\Delta_{\alpha^{n-1}}\mathcal{H}(t)\,dt\geq\sum_{i=0}^{N(h^{*})}\mathbb{E}\int_{I_{\tau_{i},h^{*}}}\Delta_{\alpha^{n-1}}\mathcal{H}(t)\,dt\,. \tag{29}$$

Hence, by (28) we get

$$\mu(\alpha^{n-1})>\frac{h^{*}N(h^{*})}{T}\,\mathbb{E}\int_{0}^{T}\Delta_{\alpha^{n-1}}\mathcal{H}(t)\,dt>\mu(\alpha^{n-1})\,.$$

Hence we get a contradiction. ∎

Now we are ready to prove Corollary 2.6.

Proof of Corollary 2.6.

First, observe that

$$\begin{split}&b(s,X^{n}_{s},\alpha^{n}_{s})-b(s,X^{n}_{s},\alpha^{n-1}_{s})=b_{2}(s,\alpha^{n}_{s})-b_{2}(s,\alpha^{n-1}_{s})\,,\\ &\sigma(s,X^{n}_{s},\alpha^{n}_{s})-\sigma(s,X^{n}_{s},\alpha^{n-1}_{s})=\sigma_{2}(s,\alpha^{n}_{s})-\sigma_{2}(s,\alpha^{n-1}_{s})\,,\\ &D_{x}\mathcal{H}(s,X^{n}_{s},Y^{n}_{s},Z^{n}_{s},\alpha^{n}_{s})-D_{x}\mathcal{H}(s,X^{n}_{s},Y^{n}_{s},Z^{n}_{s},\alpha^{n-1}_{s})=0\,.\end{split}$$

Let us consider the set $I_{\tau,h}$ given by (27). We will specify the choice of $\tau$ and $h$ later. Hence, after applying Lemma 2.3 for $\alpha^{n}$ and $\alpha^{n-1}$, we have for some $C>0$

$$\begin{split}J(x,\alpha^{n})-J(x,\alpha^{n-1})&\leq\mathbb{E}\int_{I_{\tau,h}}[\mathcal{H}(s,X^{n}_{s},Y^{n}_{s},Z^{n}_{s},\alpha^{n}_{s})-\mathcal{H}(s,X^{n}_{s},Y^{n}_{s},Z^{n}_{s},\alpha^{n-1}_{s})]\,ds\\ &\quad+C\,\mathbb{E}\int_{I_{\tau,h}}|b_{2}(s,\alpha^{n}_{s})-b_{2}(s,\alpha^{n-1}_{s})|^{2}+|\sigma_{2}(s,\alpha^{n}_{s})-\sigma_{2}(s,\alpha^{n-1}_{s})|^{2}\,ds\\ &\quad+\mathbb{E}\int_{[0,T]\setminus I_{\tau,h}}[\mathcal{H}(s,X^{n}_{s},Y^{n}_{s},Z^{n}_{s},\alpha^{n}_{s})-\mathcal{H}(s,X^{n}_{s},Y^{n}_{s},Z^{n}_{s},\alpha^{n-1}_{s})]\,ds\\ &\quad+C\,\mathbb{E}\int_{[0,T]\setminus I_{\tau,h}}|b_{2}(s,\alpha^{n}_{s})-b_{2}(s,\alpha^{n-1}_{s})|^{2}+|\sigma_{2}(s,\alpha^{n}_{s})-\sigma_{2}(s,\alpha^{n-1}_{s})|^{2}\,ds\,.\end{split}$$

Since the following holds for all $s\in[0,T]$ and $\rho\geq 0$:

$$\begin{split}\mathcal{H}(s,X_{s}^{n},Y_{s}^{n},Z_{s}^{n},\alpha^{n}_{s})&-\mathcal{H}(s,X_{s}^{n},Y_{s}^{n},Z_{s}^{n},\alpha^{n-1}_{s})\\ &+\frac{1}{2}\rho|b_{2}(s,\alpha^{n}_{s})-b_{2}(s,\alpha^{n-1}_{s})|^{2}+\frac{1}{2}\rho|\sigma_{2}(s,\alpha^{n}_{s})-\sigma_{2}(s,\alpha^{n-1}_{s})|^{2}\leq 0\,,\end{split}$$

we have for $\rho\geq 2C$

$$\begin{split}J(x,\alpha^{n})-J(x,\alpha^{n-1})&\leq\mathbb{E}\int_{I_{\tau,h}}[\mathcal{H}(s,X^{n}_{s},Y^{n}_{s},Z^{n}_{s},\alpha^{n}_{s})-\mathcal{H}(s,X^{n}_{s},Y^{n}_{s},Z^{n}_{s},\alpha^{n-1}_{s})]\,ds\\ &\quad+C\,\mathbb{E}\int_{I_{\tau,h}}|b_{2}(s,\alpha^{n}_{s})-b_{2}(s,\alpha^{n-1}_{s})|^{2}+|\sigma_{2}(s,\alpha^{n}_{s})-\sigma_{2}(s,\alpha^{n-1}_{s})|^{2}\,ds\,.\end{split}$$

Therefore, from Lemma 3.2 and from calculations similar to those in (26), there exists $\tau$ such that

$$\begin{split}J(x,\alpha^{n})-J(x,\alpha^{n-1})&\leq\left(1-\frac{2C}{\rho}\right)\mathbb{E}\int_{I_{\tau,h}}[\mathcal{H}(s,X^{n}_{s},Y^{n}_{s},Z^{n}_{s},\alpha^{n}_{s})-\mathcal{H}(s,X^{n}_{s},Y^{n}_{s},Z^{n}_{s},\alpha^{n-1}_{s})]\,ds\\ &\leq\left(1-\frac{2C}{\rho}\right)\frac{h\,\mu(\alpha^{n-1})}{T}\,.\end{split}$$

Let us choose $h=-(\rho-2C)\mu(\alpha^{n-1})/(\rho T)$. Hence

$$J(x,\alpha^{n})-J(x,\alpha^{n-1})\leq-(\rho-2C)^{2}(\mu(\alpha^{n-1}))^{2}/(\rho^{2}T^{2})\,. \tag{30}$$

Let $\alpha^{*}$ be the optimal control; indeed, by the sufficient condition for optimality, see e.g. [23], and by the assumptions of the corollary, an optimal control exists. Below we write $X:=X^{\alpha^{*}}$ for the corresponding optimal state process. Therefore, by convexity of $g$ and by Itô’s product rule, we have

$$\begin{split}0\leq J(x,\alpha^{n-1})-J(x,\alpha^{*})&=\mathbb{E}\left[\int_{0}^{T}(f(s,X^{n}_{s},\alpha^{n-1}_{s})-f(s,X_{s},\alpha_{s}^{*}))\,ds+g(X^{n}_{T})-g(X_{T})\right]\\ &\leq\mathbb{E}\left[\int_{0}^{T}(f(s,X^{n}_{s},\alpha^{n-1}_{s})-f(s,X_{s},\alpha_{s}^{*}))\,ds\right]+\mathbb{E}[(D_{x}g(X^{n}_{T}))^{\top}(X^{n}_{T}-X_{T})]\\ &\leq\mathbb{E}\left[\int_{0}^{T}(f(s,X^{n}_{s},\alpha^{n-1}_{s})-f(s,X_{s},\alpha_{s}^{*}))\,ds\right]\\ &\quad+\mathbb{E}\left[\int_{0}^{T}(Y^{n}_{s})^{\top}d(X^{n}_{s}-X_{s})+\int_{0}^{T}(X^{n}_{s}-X_{s})^{\top}dY^{n}_{s}\right]\\ &\quad+\mathbb{E}\left[\int_{0}^{T}\operatorname{tr}((\sigma(s,X^{n}_{s},\alpha^{n-1}_{s})-\sigma(s,X_{s},\alpha_{s}^{*}))^{\top}Z^{n}_{s})\,ds\right]\,.\end{split}$$

Hence, we have that

$$\begin{split}0\leq J(x,\alpha^{n-1})-J(x,\alpha^{*})&\leq\mathbb{E}\left[\int_{0}^{T}f(s,X^{n}_{s},\alpha^{n-1}_{s})-f(s,X_{s},\alpha_{s}^{*})\,ds\right]\\ &\quad+\mathbb{E}\left[\int_{0}^{T}(Y^{n}_{s})^{\top}(b(s,X^{n}_{s},\alpha^{n-1}_{s})-b(s,X_{s},\alpha_{s}^{*}))\,ds\right]\\ &\quad-\mathbb{E}\left[\int_{0}^{T}(X^{n}_{s}-X_{s})^{\top}D_{x}\mathcal{H}(s,X^{n}_{s},Y^{n}_{s},Z^{n}_{s},\alpha^{n-1}_{s})\,ds\right]\\ &\quad+\mathbb{E}\left[\int_{0}^{T}\operatorname{tr}((\sigma(s,X^{n}_{s},\alpha^{n-1}_{s})-\sigma(s,X_{s},\alpha_{s}^{*}))^{\top}Z^{n}_{s})\,ds\right]\,.\end{split}$$

Recalling the form of $b,\sigma$ and observing that

$$D_{x}\mathcal{H}(s,X^{n}_{s},Y^{n}_{s},Z^{n}_{s},\alpha^{n-1}_{s})=b_{1}(s)Y^{n}_{s}+\sigma_{1}(s)Z_{s}^{n}+D_{x}f(s,X^{n}_{s},\alpha^{n-1}_{s})\,,$$

we have

$$\begin{split}0\leq J(x,\alpha^{n-1})-J(x,\alpha^{*})&\leq\mathbb{E}\left[\int_{0}^{T}f(s,X^{n}_{s},\alpha^{n-1}_{s})-f(s,X_{s},\alpha_{s}^{*})\,ds\right]\\ &\quad+\mathbb{E}\left[\int_{0}^{T}\operatorname{tr}((\sigma_{2}(s,\alpha^{n-1}_{s})-\sigma_{2}(s,\alpha_{s}^{*}))^{\top}Z^{n}_{s})\,ds\right]\\ &\quad+\mathbb{E}\left[\int_{0}^{T}(Y^{n}_{s})^{\top}(b_{2}(s,\alpha^{n-1}_{s})-b_{2}(s,\alpha_{s}^{*}))\,ds-\int_{0}^{T}(X^{n}_{s}-X_{s})^{\top}D_{x}f(s,X^{n}_{s},\alpha^{n-1}_{s})\,ds\right]\,.\end{split}$$

Since $f$ is convex in $x$, we have for all $s\in[0,T]$ that

$$f(s,X_{s},\alpha^{n-1}_{s})\geq f(s,X_{s}^{n},\alpha_{s}^{n-1})+(X_{s}-X^{n}_{s})^{\top}D_{x}f(s,X^{n}_{s},\alpha^{n-1}_{s})\,.$$

Therefore, we obtain

$$\begin{split}J(x,\alpha^{n-1})-J(x,\alpha^{*})&\leq\mathbb{E}\int_{0}^{T}\left[\mathcal{H}(s,X^{n}_{s},Y^{n}_{s},Z^{n}_{s},\alpha^{n-1}_{s})-\mathcal{H}(s,X^{n}_{s},Y^{n}_{s},Z^{n}_{s},\alpha_{s}^{*})\right]\,ds\\ &\leq-\mu(\alpha^{n-1})\,,\end{split} \tag{31}$$

where the second inequality holds due to

$$\mathcal{H}(s,X^{n}_{s},Y^{n}_{s},Z^{n}_{s},\alpha^{n}_{s})\leq\mathcal{H}(s,X^{n}_{s},Y^{n}_{s},Z^{n}_{s},\alpha_{s}^{*})\,.$$

Let $b^{n}:=J(x,\alpha^{n})-J(x,\alpha^{*})$; then due to (30) and (31) we have that

$$b^{n}-b^{n-1}\leq\frac{-(\rho-2C)^{2}\mu(\alpha^{n-1})^{2}}{\rho^{2}T^{2}}\leq\frac{-(\rho-2C)^{2}(b^{n-1})^{2}}{\rho^{2}T^{2}}\,.$$

Therefore, due to Lemma A.1 we have

$$J(x,\alpha^{n})-J(x,\alpha^{*})\leq\frac{C_{1}}{n}$$

for some constant $C_{1}>0$. This concludes the proof. ∎

Appendix A Auxiliary Lemma

Lemma A.1.

Let $\{b_{k}\}_{k\in\mathbb{N}}$ be a sequence of nonnegative numbers such that

$$b_{k+1}\leq b_{k}-qb_{k}^{2}\,,$$

where $q$ is a positive constant. Then $b_{k}=O(1/k)$.

One can find the proof in [6, Lemma 1.4, p. 93]. However, the proof there is written in Russian; for the convenience of the reader we provide it here.

Proof.

Let $b_{k}=\frac{c_{k}}{k}$ for some nonnegative sequence $(c_{k})_{k\in\mathbb{N}}$. Then it is enough to show that $c_{k}$ is bounded for all $k\in\mathbb{N}$. By assumption we have

$$b_{k}-b_{k+1}=\frac{c_{k}}{k}-\frac{c_{k+1}}{k+1}=\frac{c_{k}}{k}\left(1-\frac{c_{k+1}}{c_{k}}\frac{k}{k+1}\right)\geq q\frac{c_{k}^{2}}{k^{2}}\,.$$

Therefore,

$$1-\frac{c_{k+1}}{c_{k}}\frac{k}{k+1}\geq q\frac{c_{k}}{k}\,.$$

After some rearrangement, we can rewrite the inequality above as

$$\left(1+\frac{1}{k}\right)\left(1-q\frac{c_{k}}{k}\right)\geq\frac{c_{k+1}}{c_{k}}\,.$$

Thus

$$1+\frac{1}{k}(1-qc_{k})-q\frac{c_{k}}{k^{2}}\geq\frac{c_{k+1}}{c_{k}}\,.$$

If $1-qc_{k}<0$ we have

$$1>1+\frac{1}{k}(1-qc_{k})-q\frac{c_{k}}{k^{2}}\geq\frac{c_{k+1}}{c_{k}}\,.$$

Hence $c_{k+1}<c_{k}$. On the other hand, if $1-qc_{k}\geq 0$, we have $c_{k}\leq\frac{1}{q}$. Therefore, we conclude that for all $k$ we have

$$c_{k}\leq\max\left\{c_{1},\frac{1}{q}\right\}\,.$$

∎
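As a quick numerical sanity check of Lemma A.1 (our own illustration, not part of the argument above), one can iterate the extremal recursion $b_{k+1}=b_{k}-qb_{k}^{2}$ and observe that $k\,b_{k}$ remains bounded, consistent with $b_{k}=O(1/k)$:

```python
q, b = 0.5, 1.0   # illustrative q and b_1; Lemma A.1 needs only q > 0, b_k >= 0
for k in range(1, 100_001):
    if k in (10, 100, 1_000, 10_000, 100_000):
        print(f"k = {k:>6}, k * b_k = {k * b:.4f}")  # stays bounded (near 1/q = 2)
    b = b - q * b * b   # the extremal case of b_{k+1} <= b_k - q b_k^2
```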

References

  • [1] R. Bellman, Functional equations in the theory of dynamic programming. v. positivity and quasi-linearity, Proc. Natl. Acad. Sci. U.S.A., 41(10):743–746, 1955.
  • [2] R. Bellman, Dynamic Programming, Princeton University Press, Princeton, NJ, USA, 1957.
  • [3] R. A. Howard, Dynamic Programming and Markov Processes. MIT Press, Cambridge, MA, 1960.
  • [4] V. G. Boltyanskii, R. V. Gamkrelidze, and L. S. Pontryagin, Theory of optimal processes. I. The maximum principle, Izv. Akad. Nauk SSSR Ser. Mat., 24(1): 3–42, 1960.
  • [5] I. A. Krylov and F. L. Chernousko, On the method of successive approximations for solution of optimal control problems (in Russian), U.S.S.R. Comput. Math. Math. Phys., 2:6, 1371–1382, 1963.
  • [6] V. D. Demyanov and A. M. Rubinov, Approximate Methods in Extremal Problems (in Russian), Leningrad, 1968.
  • [7] L. S. Pontryagin, Mathematical Theory of Optimal Processes, CRC Press, 1987.
  • [8] N. V. Krylov, Controlled diffusion processes, Springer, 1980.
  • [9] F. Antonelli, Backward-forward stochastic differential equations, Ann. Appl. Probab., 3:777–793, 1993.
  • [10] J. Ma, P. Protter, and J. M. Yong, Solving forward-backward stochastic differential equations explicitly – a four step scheme, Probab. Th. Rel. Fields, 98:339–359, 1994.
  • [11] Y. Hu and S. Peng, Solution of forward-backward stochastic differential equations, Probab. Th. Rel. Fields, 103:273–283, 1995.
  • [12] J. Douglas, J. Ma, and P. Protter, Numerical methods for forward-backward stochastic differential equations, Ann. Appl. Probab., 6(3):940–968, 1996.
  • [13] N. El Karoui and S. Peng and M. C. Quenez, Backward Stochastic Differential Equations in Finance, Math. Finance, 7(1):1–71, 1997.
  • [14] J. Yong, Finding adapted solutions of forward-backward stochastic differential equations: Method of continuation, Probab. Th. Rel. Fields, 107:537–572, 1997.
  • [15] E. Pardoux and S. Tang, Forward-backward stochastic differential equations and quasilinear parabolic PDEs, Probab. Th. Rel. Fields, 114(2):123–150, 1999.
  • [16] S. Peng and Z. Wu, Fully coupled forward-backward stochastic differential equations and applications to optimal control, SIAM J. Control Optim. 37:825–843, 1999.
  • [17] F. Delarue, On the existence and uniqueness of solutions to FBSDEs in a non-degenerate case, Stochastic Process. Appl., 99:209–289, 2002.
  • [18] F. Delarue and S. Menozzi, A forward-backward stochastic algorithm for quasi-linear PDEs, Ann. Appl. Probab., 16(1):140–184, 2006.
  • [19] G. Milstein and M. Tretyakov, Discretization of forward-backward stochastic differential equations and related quasi-linear parabolic equations, IMA J. Numer. Anal., 27(1):24–44, 2007.
  • [20] H. Dong and N. V. Krylov, The rate of convergence of finite-difference approximations for parabolic Bellman equations with Lipschitz coefficients in cylindrical domains, Appl. Math. Optim., 56(1):37–66, 2007.
  • [21] J. Ma, J. Shen, and Y. Zhao, On numerical approximations of forward-backward stochastic differential equations, SIAM J. Numer. Anal., 46(5):2636–2661, 2008.
  • [22] I. Gyöngy and D. Šiska, On Finite-Difference Approximations for Normalized Bellman Equations, Appl. Math. Optim., 60:297–339, 2009.
  • [23] H. Pham, Continuous-time stochastic control and optimization with financial applications, Springer, 2009.
  • [24] W. Guo, J. Zhang, and J. Zhuo, A monotone scheme for high-dimensional fully nonlinear PDEs, Ann. Appl. Probab., 25(3):1540–1580, 2015.
  • [25] R. Carmona, Lectures on BSDEs, Stochastic Control, and Stochastic Differential Games with Financial Applications, SIAM, 2016.
  • [26] J. Zhang, Backward Stochastic Differential Equations: From Linear to Fully Nonlinear Theory, Springer, New York, 2017.
  • [27] S. D. Jacka and A. Mijatović, On the policy improvement algorithm in continuous time, Stochastics, 89(1):348–359, 2017.
  • [28] S. D. Jacka, A. Mijatović, and D. Siraj, Coupling and a generalised Policy Iteration Algorithm in continuous time, arXiv:1707.07834, 2017.
  • [29] J. Maeda and S. D. Jacka, Evaluation of the Rate of Convergence in the PIA, arXiv:1709.06466, 2017.
  • [30] M. Sabate Vidales, D. Šiška, and Ł. Szpruch, Unbiased deep solvers for parametric PDEs, arXiv:1810.05094v2, 2018.
  • [31] J. Han, A. Jentzen, and W. E, Solving high-dimensional partial differential equations using deep learning, Proc. Natl. Acad. Sci. U.S.A., 115(34):8505–8510, 2018.
  • [32] Q. Li, L. Chen, C. Tai, and W. E, Maximum principle based algorithms for deep learning, J. Mach. Learn. Res., 18(165):1–29, 2018.
  • [33] J. Harter and A. Richou, A stability approach for solving multidimensional quadratic BSDEs, Electron. J. Probab., 24(4):1–51, 2019.
  • [34] J. Han and J. Long, Convergence of the deep BSDE method for coupled FBSDEs, Probab. Uncertain. Quant. Risk, 5(1):1–33, 2020.
  • [35] S. Ji, S. Peng, Y. Peng, and X. Zhang, Three algorithms for solving high-dimensional fully-coupled FBSDEs through deep learning, IEEE Intell. Syst., 35(3):71–84, 2020.
  • [36] B. Kerimkulov, D. Šiška, and Ł. Szpruch, Exponential convergence and stability of Howard’s policy improvement algorithm for controlled diffusions, SIAM J. Control Optim., 58(3):1314–1340, 2020.
  • [37] D. Šiška and Ł. Szpruch, Gradient Flows for Regularized Stochastic Control Problems, arXiv:2006.05956, 2020.