
RNN-BSDE method for high-dimensional fractional backward stochastic differential equations with Wick-Itô integrals

Chunhao Cai [email protected] Cong Zhang [email protected] School of Mathematics (Zhuhai), Sun Yat-sen University, Zhuhai 519082, Guangdong, People’s Republic of China
Abstract

Fractional Brownian motions (fBMs) are not semimartingales, so the classical Itô integral does not apply to them. Wick integration, one of the applications of Malliavin calculus to stochastic analysis, provides a suitable integral for fBMs. We consider fractional forward-backward stochastic differential equations (fFBSDEs) driven by an fBM with Hurst parameter in (\frac{1}{2},1), where \int_{0}^{t}f_{s}\,dB_{s}^{H} is understood as a Wick integral, and relate these fFBSDEs to a system of partial differential equations via an analogue of the Itô formula for Wick integrals. We also develop a deep learning algorithm, referred to as the RNN-BSDE method, based on recurrent neural networks and designed specifically for solving high-dimensional fractional BSDEs and their corresponding partial differential equations.

keywords:
Fractional Brownian motions, Wick calculus, Fractional backward stochastic differential equations, Deep learning, Recurrent neural networks

1 Introduction

In recent years, deep learning has been widely used to tackle high-dimensional problems involving partial differential equations (PDEs), even though key ingredients of the field, such as convolutional neural networks [1] and backpropagation [2], had already been developed by 1990. The recurrent neural network, a classical and basic architecture for handling time series, was likewise established by 2000 [3][4]. In the 21st century, it is mainly advances in hardware and datasets, rather than in theory, that have allowed deep learning to take off.

There is already a body of work on solving PDEs with deep learning algorithms such as PINNs [5] and neural operators [6]. In this paper, we focus instead on solving SDEs by deep learning, a direction that has received comparatively little attention. The late 2010s and 2020s have seen deep learning algorithms for stochastic control problems [7][8] and for backward stochastic differential equations (BSDEs) [9]. Since the connection between nonlinear PDEs and BSDEs has been established [10], solving BSDEs also yields solutions of the corresponding PDEs with the same algorithm.

This work naturally raises the question of whether deep learning, which has proved effective for BSDEs, can also solve fractional backward stochastic differential equations (fBSDEs). In our work, we consider the following fractional forward-backward stochastic differential equations (fFBSDEs):

\begin{cases}dX_{s}=\mu(s,X_{s})ds+\sigma(s,X_{s})dB_{s}^{H},\\ -dY_{s}=f(s,X_{s},Y_{s},Z_{s})ds-Z_{s}dB_{s}^{H},\\ X_{t}=x,\\ Y_{T}=g(X_{T}),\end{cases} (1)

where B_{s}^{H} is a fractional Brownian motion (fBM) with Hurst constant H\in(\frac{1}{2},1), t and T denote the initial and terminal times, and \left\{(X_{s},Y_{s},Z_{s}),t\leq s\leq T\right\} are stochastic processes.

Mandelbrot and van Ness defined a fractional Brownian motion as follows.

Definition 1.1 (fBM[11]).

Let B_{0} be an arbitrary real number. We call B_{t}^{H} a fractional Brownian motion with Hurst parameter H and starting value B_{0} at time 0 if

  1. B^{H}(0,\omega)=B_{0},

  2. B^{H}(t,\omega)-B^{H}(0,\omega)=\frac{1}{\Gamma(H+\frac{1}{2})}\int_{0}^{t}(t-s)^{H-\frac{1}{2}}dB(s,\omega).

Since fractional Brownian motions are not semimartingales, \int f\,dB^{H} is not well defined as an Itô integral [12]. In this paper, we instead interpret it as a Wick integral [13][14][15][16], which is introduced through the Wick product. The stochastic calculus used later, including Wick integration and differentiation, is based on Malliavin calculus [13]. In terms of Wick integration, we can also write (1) as

\begin{cases}dX_{s}=\mu(s,X_{s})ds+\sigma(s,X_{s})\diamond\dot{B}_{s}^{H}ds,\\ -dY_{s}=f(s,X_{s},Y_{s},Z_{s})ds-Z_{s}\diamond\dot{B}_{s}^{H}ds,\\ X_{t}=x,\\ Y_{T}=g(X_{T}),\end{cases} (2)

where \diamond denotes the Wick product.

Our purpose is to approximate the \mathscr{F}_{t}-adapted processes \left\{(Y_{s},Z_{s}),t\leq s\leq T\right\} such that

Y_{t}=g(X_{T})+\int_{t}^{T}f(s,X_{s},Y_{s},Z_{s})\,ds-\int_{t}^{T}Z_{s}\,dB_{s}^{H},\quad a.s. (3)

by using the deep learning method.

Our paper is organized as follows. In Sec. 2, we summarize the results from the applications of Malliavin calculus to stochastic analysis that we need for our work. In Sec. 3, we consider the relationship between fFBSDEs and PDEs and present some calculations used in the numerical experiments of Sec. 5. In Sec. 4, we introduce our deep learning algorithm for solving fFBSDEs, which is based on recurrent neural networks and which we refer to as the RNN-BSDE method. In Sec. 5, we report numerical experiments on solving parabolic PDEs by the RNN-BSDE method and compare it with other methods that are mainly designed for solving BSDEs rather than fractional BSDEs.

2 Wick calculus

We now review some results from the applications of Malliavin calculus to stochastic analysis. For details on stochastic calculus for Brownian motions see [13][17], and for the fractional counterpart see [14][15][16]. We first make some preparations.

Fix the Hurst constant \frac{1}{2}\leq H\leq 1 and define

\phi(s,t)=H(2H-1)\left|s-t\right|^{2H-2},\quad s,t\in\mathbb{R}. (4)

Let f:\mathbb{R}\to\mathbb{R} be measurable. We say that f belongs to the Hilbert space L^{2}_{\phi}(\mathbb{R}) if

{\left|f\right|}_{\phi}^{2}:=\int_{\mathbb{R}}\int_{\mathbb{R}}\phi(s,t)f(s)f(t)\,ds\,dt<\infty. (5)

The inner product on L^{2}_{\phi}(\mathbb{R}) is denoted by {(\cdot,\cdot)}_{\phi}.
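
As a simple consistency check (a worked example we add here, not in the original), take f=\mathbf{1}_{[0,t]}: then

\left\lvert\mathbf{1}_{[0,t]}\right\rvert_{\phi}^{2}=\int_{0}^{t}\!\int_{0}^{t}H(2H-1)\left\lvert s-u\right\rvert^{2H-2}\,du\,ds=\int_{0}^{t}H\left[s^{2H-1}+(t-s)^{2H-1}\right]ds=t^{2H},

which equals E\left[(B_{t}^{H})^{2}\right]; this is consistent with (9) below, since D_{s}f=0 for deterministic f.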

For any f\in L^{2}_{\phi}(\mathbb{R}), define \varepsilon(f) as

\varepsilon(f)=\exp\left(\int_{\mathbb{R}}f\,dB^{H}-\frac{1}{2}\left\lvert f\right\rvert^{2}_{\phi}\right). (6)

\varepsilon(f) is called an exponential function, and E denotes the linear span of \left\{\varepsilon(f),f\in L^{2}_{\phi}(\mathbb{R})\right\}.

2.1 Wick integration

Consider the fractional white noise probability space (\Omega,\mathscr{F},P)=(S^{\prime}(\mathbb{R}),\mathscr{B},P_{\phi}), where S^{\prime}(\mathbb{R}) is the dual of the Schwartz space S(\mathbb{R}) and the probability measure P_{\phi} exists by the Bochner-Minlos theorem (see e.g. [18]). By the fractional Wiener-Itô chaos expansion theorem [14], every F\in L^{2}(P_{\phi}) admits a chaos expansion (see e.g. [15]). One can then define the fractional Hida distribution space S^{*}, consisting of those F that admit a chaos expansion with finite negative norm as described in [15][16].

The Wick product is defined for X,Y admitting fractional Wiener-Itô chaos expansions. For this paper it suffices to know that the fractional Brownian motion B_{t}^{H} has a derivative \dot{B}_{t}^{H} in S^{*} such that

\int_{\mathbb{R}}f(t)\,dB^{H}_{t}=\int_{\mathbb{R}}f(t)\diamond\dot{B}_{t}^{H}\,dt,\quad \text{for } f:\mathbb{R}\to S^{*}.

E is dense in L^{2}(P_{\phi}) and we have

Definition 2.1 (Wick integration).

Consider f\in E and an arbitrary partition \pi of [0,T]. The Riemann sum

S(f,\pi)=\sum_{i=0}^{n-1}f_{t_{i}}\diamond(B_{t_{i+1}}^{H}-B_{t_{i}}^{H}) (7)

is well defined. Denoting \left\lvert\pi\right\rvert=\max_{i}(t_{i+1}-t_{i}), define \int_{0}^{T}f(t)\,dB^{H}_{t} as

\int_{0}^{T}f(t)\,dB^{H}_{t}=\lim_{\left\lvert\pi\right\rvert\to 0}S(f,\pi),\quad \text{for any } \pi. (8)

For f\in\mathcal{L}(0,T), the class defined on page 591 of [14], S(f,\pi) is a Cauchy sequence in L^{2}(P_{\phi}), so it has a limit, denoted by \int_{0}^{T}f(s)\,dB_{s}^{H}. In view of Theorem 3.9 in [14], for f\in\mathcal{L}(0,T) the limit of (7) satisfies

E\left\lvert\int_{0}^{T}f(t)\,dB^{H}_{t}\right\rvert^{2}=E\left\{\left(\int_{0}^{T}D_{s}f_{s}\,ds\right)^{2}+\left\lvert f\right\rvert_{\phi}^{2}\right\}, (9)

where D_{s} is defined in the next subsection.

2.2 Stochastic derivative

The definition of the stochastic derivative, whose derivative operator is denoted by D_{t}, can be found in [13][14]. For g\in L^{2}_{\phi}(\mathbb{R}), \Phi_{g} is defined by

(\Phi_{g})_{t}=\int_{0}^{\infty}\phi(t,u)g_{u}\,du.

Let D_{\Phi_{g}} (defined in [14]) be an analogue of the directional derivative; D_{s} is then defined through

D_{\Phi_{g}}F(\omega)=\int_{0}^{\infty}D_{s}F(\omega)g(s)\,ds,

where F is a random variable with F\in L^{p}.

The following results of stochastic derivative are used in this paper.

Lemma 2.2 (The chain rule).

Let f\in C^{1}(\mathbb{R}) and F:S^{\prime}(\mathbb{R})\to\mathbb{R} be such that D_{t}F exists. Then

D_{t}f(F)=f^{\prime}(F)\,D_{t}F. (10)

The proof is an analogue of the proof of Lemma 3.6 of [19].

Theorem 2.3 (Itô formula for Wick integration).

Let X_{t}=\xi+\int_{0}^{t}\mu(s,X_{s})\,ds+\int_{0}^{t}\sigma(s,X_{s})\,dB_{s}^{H}, with \sigma\in\mathcal{L}(0,T) and E\sup_{0\leq s\leq T}|\mu_{s}|<\infty. Assume there is an \alpha>1-H such that

E\left\lvert\sigma_{u}-\sigma_{v}\right\rvert^{2}\leq C\left\lvert u-v\right\rvert^{2\alpha},

where \left\lvert u-v\right\rvert\leq\delta for some \delta>0, and

\lim_{0\leq u,v\leq t,\ \left\lvert u-v\right\rvert\to 0}E\left\lvert D_{u}(\sigma_{u}-\sigma_{v})\right\rvert^{2}=0.

Let f:\mathbb{R}_{+}\times\mathbb{R}\to\mathbb{R} have a continuous first derivative in its first variable and a continuous second derivative in its second variable, and assume these derivatives are bounded. Moreover, assume that E\int_{0}^{T}|\sigma_{s}D_{s}X_{s}|\,ds<\infty and that \left(\frac{\partial f}{\partial x}(s,X_{s})\sigma_{s},0\leq s\leq T\right) belongs to \mathcal{L}(0,T). Then, for 0\leq t\leq T,

f(t,X_{t})=f(0,\xi)+\int_{0}^{t}\frac{\partial f}{\partial s}(s,X_{s})\,ds+\int_{0}^{t}\frac{\partial f}{\partial x}(s,X_{s})\mu_{s}\,ds
+\int_{0}^{t}\frac{\partial^{2}f}{\partial x^{2}}(s,X_{s})\sigma_{s}D_{s}X_{s}\,ds
+\int_{0}^{t}\frac{\partial f}{\partial x}(s,X_{s})\sigma_{s}\,dB_{s}^{H}\quad a.s. (11)
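
To illustrate Theorem 2.3 (a worked example we add for orientation, not in the original), take f(t,x)=\frac{1}{2}x^{2} and X_{t}=B_{t}^{H}, so that \mu=0, \sigma=1 and D_{s}B_{s}^{H}=\int_{0}^{s}\phi(s,u)\,du=Hs^{2H-1}. Then (11) gives

\frac{1}{2}(B_{t}^{H})^{2}=\int_{0}^{t}B_{s}^{H}\,dB_{s}^{H}+\int_{0}^{t}Hs^{2H-1}\,ds,
\qquad\text{i.e.}\qquad
\int_{0}^{t}B_{s}^{H}\,dB_{s}^{H}=\frac{1}{2}\left((B_{t}^{H})^{2}-t^{2H}\right),

which reduces to the classical Itô identity when H=\frac{1}{2}.
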
Proposition 2.4.

If g\in L^{2}_{\phi}(\mathbb{R}), F\in L^{2}(\Omega,\mathscr{F},P), and D_{\Phi_{g}}F\in L^{2}(\Omega,\mathscr{F},P), then

F\diamond\int_{\mathbb{R}^{+}}g_{s}\,dB^{H}_{s}=F\int_{\mathbb{R}^{+}}g_{s}\,dB^{H}_{s}-D_{\Phi_{g}}F. (12)

If, as n\to\infty, \sum_{i=0}^{n-1}f(t^{(n)}_{i})(B^{H}(t^{(n)}_{i+1})-B^{H}(t^{(n)}_{i})) converges in L^{2}(\Omega,\mathscr{F},P) to the same limit for all partitions (\pi_{n},n\in\mathbb{N}) satisfying \left\lvert\pi_{n}\right\rvert\to 0 as n\to\infty, then this limit is called the stochastic integral of Stratonovich type and is denoted by \int_{0}^{t}f_{s}\,\delta B_{s}^{H}.

In view of Proposition 2.4, we have

Theorem 2.5.

For f\in\mathcal{L}(0,T), the following equality holds:

\int_{0}^{t}f_{s}\,dB_{s}^{H}=\int_{0}^{t}f_{s}\,\delta B_{s}^{H}-\int_{0}^{t}D_{s}f_{s}\,ds\qquad a.s. (13)

3 Fractional Backward SDEs and systems of PDEs

Consider the fractional white noise space (\Omega,\mathscr{F},P)=(S^{\prime}(\mathbb{R}^{d}),\mathscr{B},P_{\phi}) (the multidimensional case d>1 is an analogue of the setting in [20]) and the fFBSDEs (2) given in Sec. 1:

\begin{cases}dX_{s}=\mu(s,X_{s})ds+\sigma(s,X_{s})\diamond\dot{B}_{s}^{H}ds,\\ -dY_{s}=f(s,X_{s},Y_{s},Z_{s})ds-Z_{s}\diamond\dot{B}_{s}^{H}ds,\\ X_{t}=x,\\ Y_{T}=g(X_{T}),\end{cases}

where H\in(\frac{1}{2},1), \mu(s,x):\mathbb{R}^{+}\times\mathbb{R}^{d}\to\mathbb{R}^{d}, \sigma(s,x):\mathbb{R}^{+}\times\mathbb{R}^{d}\to\mathbb{R}^{d\times d}, and \mu(s,X_{s}), \sigma(s,X_{s}) satisfy the conditions of Theorem 2.3. Moreover, Y:\mathbb{R}^{+}\times\Omega\to\mathbb{R}^{k}, Z:\mathbb{R}^{+}\times\Omega\to\mathbb{R}^{k\times d}, g(X_{T}) is \mathscr{F}_{T}-measurable, and f:\mathbb{R}^{+}\times\mathbb{R}^{d}\times\mathbb{R}^{k}\times\mathbb{R}^{k\times d}\to\mathbb{R}^{k}.

We want to link fractional backward SDEs with PDEs. First, consider the following system of PDEs:

\begin{cases}\begin{aligned} &\frac{\partial u(t,x)}{\partial t}+\mathcal{L}_{frac}u(t,x)+f(t,x,u(t,x),(\nabla u\,\sigma)(t,x))=0,\\ &u(T,x)=g(x),\end{aligned}\end{cases} (14)

where u:\mathbb{R}_{+}\times\mathbb{R}^{d}\to\mathbb{R}^{k}, we denote u(t,x)=\left(u^{1}(t,x),\cdots,u^{k}(t,x)\right)^{T}, and

\mathcal{L}_{frac}u=\begin{pmatrix}L_{frac}u_{1}\\ \cdots\\ L_{frac}u_{k}\end{pmatrix},\qquad
L_{frac}=\sum_{i,j=1}^{d}\sigma_{i}(t,x)D_{t}x_{j}\frac{\partial^{2}}{\partial x_{i}\partial x_{j}}+\sum_{i=1}^{d}\mu_{i}(t,x)\frac{\partial}{\partial x_{i}}.

We have the following theorem that

Theorem 3.1.

Fix \frac{1}{2}<H\leq 1 and consider X_{t}=X_{0}+\int_{0}^{t}\mu(s,X_{s})\,ds+\int_{0}^{t}\sigma(s,X_{s})\,dB_{s}^{H}, where \left\{(\mu_{s},\sigma_{s}),0\leq s\leq T\right\} satisfy the conditions of Theorem 2.3. If u\in C^{1,2}([0,T]\times\mathbb{R}^{d}) solves equation (14), then u(t,x)=Y^{t,x}_{t} (where X^{t,x}_{t}=x) for t\geq 0, x\in\mathbb{R}^{d}, where \left\{(Y_{s}^{t,x},Z_{s}^{t,x}),t\leq s\leq T\right\} is the solution of the BSDE (3).

Proof.

Apply Theorem 2.3 to the solution u of (14):

u(T,X_{T})-u(t,X_{t})=-\int_{t}^{T}\left[\mathcal{L}_{frac}u(s,X_{s})+f\right]ds
+\sum_{i=1}^{d}\int_{t}^{T}\mu_{i}(s,X_{s})\frac{\partial u(s,X_{s})}{\partial x^{i}}\,ds
+\sum_{i=1}^{d}\int_{t}^{T}\sigma_{i}(s,X_{s})\frac{\partial u(s,X_{s})}{\partial x^{i}}\,dB^{H}_{s}
+\sum_{i,j=1}^{d}\int_{t}^{T}\sigma_{i}(s,X_{s})D_{s}X_{s}^{j}\frac{\partial^{2}u(s,X_{s})}{\partial x^{i}\partial x^{j}}\,ds\quad a.s.,

and cancelling the \mathcal{L}_{frac}u terms with the help of (14) yields

u(t,X_{t})=g(X_{T})+\int_{t}^{T}f\,ds-\sum_{i=1}^{d}\int_{t}^{T}\sigma_{i}(s,X_{s})\frac{\partial u(s,X_{s})}{\partial x^{i}}\,dB^{H}_{s},\quad a.s. (15)

Comparing (15) with (3) proves the theorem. ∎

Remark 1.

From the proof, we can obtain

\begin{cases}\begin{aligned} &u(t,X_{t})=Y_{t},\\ &\sigma(t,X_{t})\nabla u(t,X_{t})=Z_{t}.\end{aligned}\end{cases} (16)

It is worth considering the example of geometric fractional Brownian motion, which solves the fractional SDE

dX_{t}=\mu X_{t}\,dt+\sigma X_{t}\,dB_{t}^{H},\quad X_{0}=x_{0}\geq 0, (17)

where x_{0}, \mu, \sigma are constants.

Proposition 3.2.

The solution of (17) is X_{t}=x_{0}\exp^{\diamond}\left(\mu t+\sigma B_{t}^{H}\right), i.e. X_{t}=x_{0}\exp\left(\mu t+\sigma B_{t}^{H}-\frac{1}{2}\sigma^{2}t^{2H}\right).

Proof.
dX_{t}=\mu X_{t}\,dt+\sigma X_{t}\diamond\dot{B}_{t}^{H}\,dt=(\mu+\sigma\dot{B}_{t}^{H})\diamond X_{t}\,dt,
\qquad\text{i.e.}\qquad
\frac{dX_{t}}{dt}=(\mu+\sigma\dot{B}_{t}^{H})\diamond X_{t}.

By Wick calculus, then

X_{t}=x_{0}\exp^{\diamond}\left(\mu t+\sigma\int_{0}^{t}\dot{B}_{s}^{H}\,ds\right) (18)
=x_{0}\exp^{\diamond}\left(\mu t+\sigma B_{t}^{H}\right)
=x_{0}\exp\left(\mu t+\sigma B_{t}^{H}-\frac{1}{2}\sigma^{2}t^{2H}\right).

Proposition 3.3.

Let s\leq t. The solution of (17) has the derivative D_{s}X_{t}=\sigma HX_{t}\left[s^{2H-1}-(t-s)^{2H-1}\right].

Proof.

From Lemma 2.2, D_{s}X_{t} exists and

D_{s}X_{t}=\frac{\partial X_{t}}{\partial x}D_{s}B_{t}^{H} (19)
=\sigma x_{0}\exp\left(\mu t+\sigma B_{t}^{H}-\frac{1}{2}\sigma^{2}t^{2H}\right)\int_{0}^{t}\phi(u,s)\,du
=\sigma HX_{t}\left[s^{2H-1}-(t-s)^{2H-1}\right].
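
The closed form in Proposition 3.2 and the derivative formula in Proposition 3.3 are easy to evaluate numerically. The sketch below is our own illustration (function names are ours), in which B^{H} is simulated by a Cholesky factorization of the standard fBM covariance E[B_{s}^{H}B_{t}^{H}]=\frac{1}{2}(s^{2H}+t^{2H}-|t-s|^{2H}):

import numpy as np

def fbm_path(T=0.5, N=200, H=0.75, rng=None):
    """Sample B^H at t_1,...,t_N (with B^H_0 = 0) via Cholesky of the fBM covariance."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.linspace(T / N, T, N)
    s, u = np.meshgrid(t, t, indexing="ij")
    cov = 0.5 * (s ** (2 * H) + u ** (2 * H) - np.abs(s - u) ** (2 * H))
    L = np.linalg.cholesky(cov + 1e-12 * np.eye(N))  # small jitter for numerical stability
    return t, L @ rng.standard_normal(N)

def geometric_fbm(t, bH, x0=100.0, mu=0.06, sigma=0.2, H=0.75):
    """X_t = x0 exp(mu t + sigma B^H_t - sigma^2 t^{2H} / 2), Proposition 3.2."""
    return x0 * np.exp(mu * t + sigma * bH - 0.5 * sigma ** 2 * t ** (2 * H))

def malliavin_DsXt(s, t, Xt, sigma=0.2, H=0.75):
    """D_s X_t = sigma H X_t [s^{2H-1} - (t-s)^{2H-1}] for s <= t, Proposition 3.3."""
    return sigma * H * Xt * (s ** (2 * H - 1) - (t - s) ** (2 * H - 1))

t, bH = fbm_path()
X = geometric_fbm(t, bH)
print(X[-1], malliavin_DsXt(t[len(t) // 2], t[-1], X[-1]))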

Corollary 3.4.

Consider X_{t}=\eta_{0}+\int_{0}^{t}\mu X_{s}\,ds+\int_{0}^{t}\sigma X_{s}\,dB_{s}^{H}, where \mu\in\mathbb{R}, \sigma\in\mathbb{R} and \eta_{0}\in\mathbb{R}^{d} are constants. If u\in C^{1,2}([0,T]\times\mathbb{R}^{d}) solves

\begin{cases}\begin{aligned} \frac{\partial u(t,x)}{\partial t}+\mathcal{L}_{frac}u(t,x)+f(t,x,u(t,x),(\nabla u\,\sigma)(t,x))&=0,\\ u(T,x)&=g(x),\end{aligned}\end{cases} (20)

with

\mathcal{L}_{frac}u=\begin{pmatrix}L_{frac}u_{1}\\ \cdots\\ L_{frac}u_{k}\end{pmatrix},\qquad
L_{frac}=\sum_{i,j=1}^{d}\sigma^{2}Hx_{j}^{2}t^{2H-1}\frac{\partial^{2}}{\partial x_{i}\partial x_{j}}+\sum_{i=1}^{d}\mu x_{i}\frac{\partial}{\partial x_{i}},

then u(t,x)=Y^{t,x}_{t}, where \left\{(Y_{s}^{t,x},Z_{s}^{t,x}),t\leq s\leq T\right\} is the solution of the BSDE (3).

Proof.

This follows directly from Theorem 3.1 and Proposition 3.3. ∎

4 RNN-BSDE method

Before building the network, we apply a time discretization to the BSDE (3). Consider the partition \pi:0=t_{0}<t_{1}<\cdots<t_{N}=T of [0,T]. For any t_{n}<t_{n+1}, from Definition 1.1 and Definition 2.1 it holds that

\begin{cases}Y_{t_{n+1}}=Y_{t_{n}}-f(t_{n},X_{t_{n}},Y_{t_{n}},Z_{t_{n}})\left(t_{n+1}-t_{n}\right)+Z_{t_{n}}\diamond\Delta B^{H}_{t_{n}},\\ B^{H}_{t_{n+1}}=\frac{1}{\Gamma(H+\frac{1}{2})}\sum_{i=0}^{n}(t_{n+1}-t_{i})^{H-\frac{1}{2}}(B_{t_{i+1}}-B_{t_{i}}).\end{cases} (21)
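
The second line of (21) can be simulated directly from standard Brownian increments. A minimal sketch (our own; function and variable names are ours):

import numpy as np
from math import gamma

def discretized_fbm(T=0.5, N=20, H=0.75, n_paths=64, rng=None):
    """B^H_{t_{n+1}} = Gamma(H+1/2)^{-1} sum_{i<=n} (t_{n+1}-t_i)^{H-1/2} (B_{t_{i+1}}-B_{t_i}), as in (21)."""
    rng = np.random.default_rng() if rng is None else rng
    dt = T / N
    t = np.linspace(0.0, T, N + 1)
    dB = rng.standard_normal((n_paths, N)) * np.sqrt(dt)  # standard Brownian increments
    BH = np.zeros((n_paths, N + 1))
    for n in range(N):
        w = (t[n + 1] - t[: n + 1]) ** (H - 0.5)          # kernel weights (t_{n+1} - t_i)^{H - 1/2}
        BH[:, n + 1] = (dB[:, : n + 1] @ w) / gamma(H + 0.5)
    return t, BH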

Moreover, to deal with the Wick product, we use Proposition 2.4 and obtain that

Y_{t_{n+1}}=Y_{t_{n}}-f(t_{n},X_{t_{n}},Y_{t_{n}},Z_{t_{n}})\left(t_{n+1}-t_{n}\right)+Z_{t_{n}}(B^{H}_{t_{n+1}}-B^{H}_{t_{n}})-D_{\Phi_{g}}Z_{t_{n}} (22)
=Y_{t_{n}}-f(t_{n},X_{t_{n}},Y_{t_{n}},Z_{t_{n}})\left(t_{n+1}-t_{n}\right)+Z_{t_{n}}(B^{H}_{t_{n+1}}-B^{H}_{t_{n}})-D_{t_{n}}Z_{t_{n}}\left(t_{n+1}-t_{n}\right).

The approximation scheme is still incomplete because D_{t_{n}}Z_{t_{n}} is unknown: Z_{t_{n}} is itself the quantity we need to find when solving the fBSDE. One option would be to construct another neural network to approximate D_{t_{n}}Z_{t_{n}}, just as we approximate Z_{t_{n}} (as introduced below). We prefer a different route: from Theorem 3.1 and (1), we may regard Z_{t} as a function g(t,X_{t}) of X_{t}, so in view of Lemma 2.2,

D_{s}Z_{t}=D_{s}g(t,X_{t})=\frac{\partial g(t,X_{t})}{\partial x}D_{s}X_{t}.

Moreover, if X_{t} is the solution of (17), then in view of Proposition 3.3,

D_{s}Z_{t}=\frac{\partial g(t,X_{t})}{\partial x}D_{s}X_{t}=\sigma HX_{t}\frac{\partial g(t,X_{t})}{\partial x}\left[s^{2H-1}-(t-s)^{2H-1}\right].

Then we can rewrite (22) as

Y_{t_{n+1}}=Y_{t_{n}}-\left(f(t_{n},X_{t_{n}},Y_{t_{n}},Z_{t_{n}})+\partial_{x}^{\ast}Z_{t_{n}}\cdot D_{t_{n}}X_{t_{n}}\right)\Delta t_{n}+Z_{t_{n}}\Delta B_{t_{n}}^{H},

where \partial_{x}^{\ast} denotes the derivative with respect to x, computed by automatic differentiation.
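
For example, \partial_{x}^{\ast}Z_{t_{n}} can be obtained by reverse-mode automatic differentiation. A minimal PyTorch sketch for d=1 (our own; z_net is a stand-in for the sub-network that produces Z from X):

import torch

d = 1
z_net = torch.nn.Sequential(torch.nn.Linear(d, 16), torch.nn.Tanh(), torch.nn.Linear(16, d))

x = torch.full((64, d), 100.0, requires_grad=True)  # a batch of samples of X_{t_n}
z = z_net(x)                                        # Z_{t_n} as a function of X_{t_n}
# dZ/dX for each sample; create_graph=True keeps it differentiable for the training loss
dz_dx, = torch.autograd.grad(z.sum(), x, create_graph=True)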

4.1 Main ideas of the RNN-BSDE method

In our work, we develop an algorithm for solving fractional BSDEs based on the deep BSDE method [9] and refer to it as the RNN-BSDE method. A new algorithm is necessary because fractional Brownian motions are not Markov processes. Moreover, fBMs exhibit long-range dependence for H\in(\frac{1}{2},1) and short-range dependence for H\in(0,\frac{1}{2}) (see e.g. [16]), so the increments of B^{H}_{t} are correlated whenever H\neq\frac{1}{2}. When approximating (Z_{t_{n}},Y_{t_{n+1}}), it is therefore not satisfactory to feed only X_{t_{n}} into the input layer; we want to make full use of all the information available before time t_{n+1}, so a recurrent neural network is a better choice than a feedforward one.

The recurrent neural network (RNN) structure [3][4], which is classical for dealing with time series, has the following advantages for solving fractional BSDEs.

  1. An RNN can make full use of the information available before time t_{n+1}. It processes sequence data stored in rank-3 tensors of shape (samples, timesteps, features). The computation applied to X_{t_{n}}, n=0,1,\cdots,N-1, in the recurrent unit, the fundamental building block of an RNN, can be expressed as

     h_{t_{n}}=f(X_{t_{n}}*U+h_{t_{n-1}}*W+b), (23)

     where U, W are weight matrices, b is the bias, h_{t_{n}} is the hidden state at time t_{n}, and f is the activation function. Unrolling (23) over t=t_{n}, n=0,1,\cdots,N-1, gives

     h_{t_{n}}=f(U*X_{t_{n}}+W*f(U*X_{t_{n-1}}+W*h_{t_{n-2}}+b)+b)
     =f(U*X_{t_{n}}+W*f(U*X_{t_{n-1}}+W*f(\cdots f(U*X_{t_{0}}+W*h_{0}+b)\cdots)+b)+b)
     =f(X_{t_{n}},X_{t_{n-1}},\dots,X_{t_{0}}),

     so the RNN indeed uses all of the past inputs, as required.

  2. For \left\{t_{n}\,|\,n=0,1,2,\cdots,N\right\}\subseteq[0,T], the deep BSDE method uses N-1 FNNs, so the larger N is, the more network parameters have to be determined, which can become a computational burden. With an RNN structure, however large N is, there is always a single RNN: the hidden layers are recurrent, so for every timestep t_{n}\in\left\{t_{n}\,|\,n=0,1,2,\cdots,N\right\} the weight matrices and biases are shared and reused within an epoch. As mentioned in [9], for N+1 time nodes, one d-dimensional input layer, two (d+10)-dimensional hidden layers and one d-dimensional output layer, there are d+1+(N-1)\left[2d(d+10)+(d+10)^{2}+4(d+10)+4d\right] parameters to train for deep BSDE, versus 1+2d(d+10)+3(d+10)^{2}+d^{2}+6(d+10)+3d parameters for RNN-BSDE with a stacked RNN (a quick comparison is sketched after this list).
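
The two parameter counts quoted above can be compared directly; the small sketch below (our own) evaluates both formulas and shows that the RNN-BSDE count does not grow with N:

def deep_bsde_params(d, N):
    """Parameter count for deep BSDE with N+1 time nodes (formula quoted in item 2)."""
    return d + 1 + (N - 1) * (2 * d * (d + 10) + (d + 10) ** 2 + 4 * (d + 10) + 4 * d)

def rnn_bsde_params(d):
    """Parameter count for the stacked-RNN sub-network (formula quoted in item 2)."""
    return 1 + 2 * d * (d + 10) + 3 * (d + 10) ** 2 + d ** 2 + 6 * (d + 10) + 3 * d

print(deep_bsde_params(d=100, N=20), rnn_bsde_params(d=100))  # only the first count grows with N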

The main idea of the RNN-BSDE method can be expressed as follows:

Z_{t_{n}}=subRNN(X_{t_{n}};\theta),\quad n=0,1,\cdots,N-1, (24)

Y_{t_{n+1}}=Y_{t_{n}}-\left(f(t_{n},X_{t_{n}},Y_{t_{n}},Z_{t_{n}})+\partial_{x}^{\ast}Z_{t_{n}}\cdot D_{t_{n}}X_{t_{n}}\right)\Delta t_{n}+Z_{t_{n}}\Delta B_{t_{n}}^{H}, (25)

loss=E\left\lvert g(X_{T})-Y_{T}\right\rvert^{2}, (26)

(Y_{0}^{new},Z_{0}^{new},\left\{Z_{n}^{new}\right\}_{n=1,\cdots,N-1},\theta^{new})=BP(loss(DNN(X_{t}))). (27)

In (24), the sub-neural network is a single RNN instead of N-1 FNNs, and to make the RNN more effective we choose a stacked RNN rather than a simple RNN; its structure is shown in Fig. 1.

Figure 1: Rough sketch of the architecture of the stacked RNN. t_{n} is simply written as n, and h_{j}^{(i)} denotes the hidden state of the i-th hidden layer at time t_{j}.

The whole forward-propagation flow is shown in Fig. 2. In the deep BSDE method, batch normalization [21] is applied right after each matrix multiplication and before activation; since batch normalization is not well suited to RNNs, we use layer normalization [22] instead. Finally, we provide the pseudocode of the RNN-BSDE method:

Figure 2: Rough sketch of the architecture of the RNN-BSDE method. \theta represents the parameters to be trained.
Algorithm 1 The RNN-BSDE method
1: Input: initial parameters (\alpha,\theta); samples \left\{X^{m}_{t_{j}}\right\};
2: Output: (\alpha^{i},\theta^{i});
3: for i=1\to maxstep do
4:     \left\{Z^{m,i}_{t_{j}}\right\}\leftarrow sub\mathcal{RNN}(\left\{X^{m}_{t_{j}}\right\};\theta^{i});
5:     \left\{\partial_{x}^{\ast}Z^{m,i}_{t_{j}}\right\}\leftarrow AD\left(\left\{Z^{m,i}_{t_{j}}\right\},\left\{X^{m}_{t_{j}}\right\}\right);    ▷ automatic differentiation
6:     for j=0\to N-1 do
7:         Y^{m,i}_{t_{j+1}}\leftarrow Y^{m,i}_{t_{j}}-\left(f(t_{j},X^{m}_{t_{j}},Y^{m,i}_{t_{j}},Z^{m,i}_{t_{j}})+\partial_{x}^{\ast}Z^{m,i}_{t_{j}}\cdot D_{t_{j}}X^{m}_{t_{j}}\right)\Delta t+Z^{m,i}_{t_{j}}\Delta B_{t_{j}}^{H};
8:     end for
9:     loss\leftarrow\frac{1}{M}\sum_{m=1}^{M}\left\lvert g(X^{m}_{T})-Y^{m,i}_{T}\right\rvert^{2};
10:     (\alpha^{i+1},\theta^{i+1})\leftarrow Adam(loss;\alpha^{i},\theta^{i});
11: end for
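
A simplified, self-contained PyTorch sketch of Algorithm 1 for the 1-d fractional Black-Scholes example of Sec. 5.1 (our own illustration with our own naming, not the authors' implementation; it uses an LSTM sub-network, the driver f(t,x,y,z)=-ry of (47), and D_{t_{n}}X_{t_{n}}=\sigma HX_{t_{n}}t_{n}^{2H-1} from Proposition 3.3):

import torch
from math import gamma

H, T, N, d = 0.75, 0.5, 20, 1
mu, sigma, x0, r, K = 0.06, 0.2, 100.0, 0.06, 100.0
dt = T / N
t = torch.linspace(0.0, T, N + 1)

def sample_paths(m):
    """Simulate B^H via (21) and X as geometric fBM (Proposition 3.2)."""
    dB = torch.randn(m, N) * dt ** 0.5
    BH = torch.zeros(m, N + 1)
    for n in range(N):
        BH[:, n + 1] = dB[:, : n + 1] @ (t[n + 1] - t[: n + 1]) ** (H - 0.5) / gamma(H + 0.5)
    X = x0 * torch.exp(mu * t + sigma * BH - 0.5 * sigma ** 2 * t ** (2 * H))
    return X, BH

class SubRNN(torch.nn.Module):
    """LSTM sub-network mapping (X_{t_0},...,X_{t_{N-1}}) to (Z_{t_0},...,Z_{t_{N-1}})."""
    def __init__(self):
        super().__init__()
        self.lstm = torch.nn.LSTM(d, d + 10, num_layers=2, batch_first=True)
        self.out = torch.nn.Linear(d + 10, d)
    def forward(self, x):
        h, _ = self.lstm(x)
        return self.out(h)

net, y0 = SubRNN(), torch.nn.Parameter(torch.tensor([[10.0]]))   # Y_0 is learned as well
opt = torch.optim.Adam(list(net.parameters()) + [y0], lr=5e-3)

for step in range(2000):
    X, BH = sample_paths(64)
    Xin = X[:, :N, None].requires_grad_(True)                    # shape (batch, N, d)
    Z = net(Xin)
    # per-timestep derivative dZ_{t_n}/dX_{t_n}, kept in the graph (Algorithm 1, line 5)
    dZdX = torch.stack([torch.autograd.grad(Z[:, n].sum(), Xin, create_graph=True)[0][:, n]
                        for n in range(N)], dim=1)
    Y = y0.expand(X.size(0), 1)
    for n in range(N):                                           # Euler scheme (25)
        DX = sigma * H * X[:, n] * t[n] ** (2 * H - 1)           # D_{t_n} X_{t_n}
        Y = Y - (-r * Y + dZdX[:, n] * DX[:, None]) * dt + Z[:, n] * (BH[:, n + 1] - BH[:, n])[:, None]
    loss = ((torch.relu(X[:, -1:] - K) - Y) ** 2).mean()         # E|g(X_T) - Y_T|^2 with g(x)=max(x-K,0)
    opt.zero_grad()
    loss.backward()
    opt.step()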

4.2 More detail of the RNN-BSDE method and an example of solving fBSDEs

In this section, we describe in more detail how to set up the neural network of the RNN-BSDE method so that the algorithm is practical. The simplest traditional RNN is convenient for explaining the main idea but too weak to solve some fBSDEs, so we give further details and use more practical types of RNN.

Suppose the samples of X_{t} are stored as \widetilde{X_{t}}\in\mathbb{R}^{m\times(N+1)\times d}, where m\in\{m_{1},m_{2}\}: m_{1} is the number of sample paths in the whole validation set and m_{2} denotes the mini-batch size; N+1 is the number of time nodes; and d is the dimension of the inputs (regarded as the number of features in deep learning). The stacked RNN approximating (Z_{t_{n}},n=0,1,\cdots,N-1) has at least four layers: one d-dimensional input layer, at least two (d+10)-dimensional hidden layers and one d-dimensional output layer. The hidden layers and the output layer are all RNN layers composed of recurrent units, with input weight matrices U_{h}\in\mathbb{R}^{d\times(d+10)} and U_{o}\in\mathbb{R}^{(d+10)\times d} and recurrent weight matrices W_{h}\in\mathbb{R}^{(d+10)\times(d+10)} and W_{o}\in\mathbb{R}^{d\times d}. There is no activation function directly after each matrix multiplication; instead we have

h^{(1)}_{t_{n}}=\tanh\left(LN^{(1)}\left(X_{t_{n}}*U_{h}^{(1)}+h_{t_{n-1}}^{(1)}*W_{h}^{(1)}\right)+b^{(1)}\right), (28)
h^{(2)}_{t_{n}}=\tanh\left(LN^{(2)}\left(h_{t_{n}}^{(1)}*U_{h}^{(2)}+h_{t_{n-1}}^{(2)}*W_{h}^{(2)}\right)+b^{(2)}\right),
Z_{t_{n}}=LN^{(3)}\left(h^{(2)}_{t_{n}}*U_{o}+Z_{t_{n-1}}*W_{o}\right)+b^{(3)},\quad n=0,1,\cdots,N-1,

where LN denotes layer normalization and \tanh is the hyperbolic tangent function. Equation (28) is easier to follow together with Fig. 1. All weights and the layer normalization parameters are randomly initialised at the start of each run.
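
A minimal PyTorch rendering of one forward pass of (28) (our own sketch; class and variable names are ours, with tanh for the hidden layers and the identity for the output layer):

import torch
import torch.nn as nn

class LNRecurrentLayer(nn.Module):
    """One layer of (28): state_n = act(LN(x_n U + state_{n-1} W) + b)."""
    def __init__(self, in_dim, out_dim, activation=torch.tanh):
        super().__init__()
        self.U = nn.Linear(in_dim, out_dim, bias=False)   # input weights
        self.W = nn.Linear(out_dim, out_dim, bias=False)  # recurrent weights
        self.ln = nn.LayerNorm(out_dim)
        self.b = nn.Parameter(torch.zeros(out_dim))
        self.act = activation
    def forward(self, x_seq):                             # x_seq: (batch, timesteps, in_dim)
        state = torch.zeros(x_seq.size(0), self.b.numel())
        outputs = []
        for n in range(x_seq.size(1)):
            state = self.act(self.ln(self.U(x_seq[:, n]) + self.W(state)) + self.b)
            outputs.append(state)
        return torch.stack(outputs, dim=1)

d, N = 50, 20
stacked = nn.Sequential(
    LNRecurrentLayer(d, d + 10),                          # first hidden layer of (28)
    LNRecurrentLayer(d + 10, d + 10),                     # second hidden layer of (28)
    LNRecurrentLayer(d + 10, d, activation=lambda v: v),  # output layer producing Z_{t_n}
)
Z = stacked(torch.randn(64, N, d))                        # (batch, N, d)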

If a stacked RNN is still not powerful enough to solve most fBSDEs, one can turn to a special type of RNN, the Long Short-Term Memory network (LSTM) [23]. For H\in\left(\frac{1}{2},1\right), the fBM B_{t}^{H} has the long-memory property, and a traditional RNN is known to struggle with such "long-term dependencies" in practice, while the LSTM handles long-term dependencies and mitigates the vanishing gradient problem of the RNN. Since LSTMs are a kind of RNN, we can convert an RNN into an LSTM simply by replacing the RNN units in the network with LSTM units, the fundamental building blocks of an LSTM. An LSTM unit consists of a cell and three gates: an input gate, a forget gate and an output gate. In the RNN-BSDE method, we use an LSTM with layer normalization whose structure is similar to the stacked RNN illustrated in Fig. 1, i.e. multiple (d+10)-dimensional LSTM layers as hidden layers and one extra d-dimensional LSTM layer before the output.

4.3 Convergence analysis

In this part, we provide a posterior estimate for the numerical solution, which justifies the convergence of the RNN-BSDE method. We first make the following assumptions.

Assumptions 4.1.

Let u\in C^{1,2}(\mathbb{R}^{+}\times\mathbb{R}^{d}); u and the process X_{t} satisfy the conditions of Theorem 2.3, so that the Itô formula for Wick integration (Theorem 2.3) is applicable to u(t,X_{t}).

Assumptions 4.2.

For any y,y^{\prime}, x,x^{\prime} and t,t^{\prime},

\left\lvert f(t,x,y,z)-f(t^{\prime},x^{\prime},y^{\prime},z)\right\rvert\leq L_{f}\left(\left\lvert t-t^{\prime}\right\rvert+\left\lvert x-x^{\prime}\right\rvert+\left\lvert y-y^{\prime}\right\rvert\right),

where L_{f} is a given positive constant.

Assumptions 4.3.

For any z,z^{\prime} and any t_{i},t_{j}\in[0,T] satisfying t_{i}\leq t_{j},

\int_{t_{i}}^{t_{j}}\left\lvert f(t,x,y,z)-f(t,x,y,z^{\prime})\right\rvert^{2}\,dt\leq C_{f}E\left\lvert\int_{t_{i}}^{t_{j}}(z-z^{\prime})\,dB_{t}^{H}\right\rvert^{2},

where C_{f} is a given positive constant, and we denote C_{f}^{\ast}=C_{f}\vee L_{f}.

Assumptions 4.4.

Assume that (14) has a unique classical solution and that there exists a unique pair of \mathscr{F}_{t}-adapted processes \left(Y_{t},Z_{t}\right) solving (3).

Consider the following fFBSDE system with state \widetilde{Y}_{t}:

\begin{cases}\begin{aligned} &X_{t}=\xi+\int_{0}^{t}\mu(s,X_{s})\,ds+\int_{0}^{t}\sigma(s,X_{s})\,dB_{s}^{H},\\ &\widetilde{Y}_{t}=\widetilde{y}-\int_{0}^{t}f(s,X_{s},\widetilde{Y}_{s},\widetilde{Z}_{s})\,ds+\int_{0}^{t}\widetilde{Z}_{s}\,dB_{s}^{H},\end{aligned}\end{cases} (29)

the aim is to minimize the objective functional

J\left(\widetilde{y},\widetilde{Z}_{\centerdot}\right)=E\left\lvert g(X_{T})-\widetilde{Y}_{T}\right\rvert^{2}, (30)

under the control \left(\widetilde{y},\widetilde{Z}_{\centerdot}\right)\in\mathbb{R}\times\mathcal{L}(0,T).

First, recalling (9), we have the following theorem.

Theorem 4.5.

Assume Assumptions 4.1-4.4 hold. Let (Y_{t},Z_{t}) be the solution of the fBSDE (3) and let \widetilde{Y}_{t} denote the state of system (29) under the control \left(\widetilde{y},\widetilde{Z}_{\centerdot}\right)\in\mathbb{R}\times\mathcal{L}(0,T). Then there exists a constant C, depending only on T and C_{f}^{\ast}, such that

\sup_{0\leq t\leq T}E\left\lvert Y_{t}-\widetilde{Y}_{t}\right\rvert^{2}+E\left\lvert\int_{0}^{T}(Z_{s}-\widetilde{Z}_{s})\,dB_{s}^{H}\right\rvert^{2} (31)
=\sup_{0\leq t\leq T}E\left\lvert Y_{t}-\widetilde{Y}_{t}\right\rvert^{2}+E\left\{\left(\int_{0}^{T}D_{s}\left(Z_{s}-\widetilde{Z}_{s}\right)\,ds\right)^{2}+\left\lvert Z_{s}-\widetilde{Z}_{s}\right\rvert_{\phi}^{2}\right\}
\leq CE\left\lvert g(X_{T})-\widetilde{Y}_{T}\right\rvert^{2}.

The proof of Theorem 4.5 is an analogue of that of Theorem 2.1 in [24]; one only needs to replace the Itô formula by the Itô formula for Wick integration.

In view of Theorem 4.5, the problem of solving (3) can be turned into a stochastic control problem for the system (29), so applying deep learning to this kind of problem is a reasonable choice.

Next, we estimate the error resulting from time discretization. From now on we mainly consider d=1 for brevity.

C^{\gamma}([0,T]) denotes the \gamma-Hölder space and \left\lVert\cdot\right\rVert_{\gamma} the \gamma-Hölder norm.

For any constant K>0, define

\mathcal{A}_{K}:=\left\{Z\in\mathcal{L}(0,T)\,\middle|\,Z(\cdot,\omega)\in C^{\frac{1}{2}}([0,T]),\ E\left(\left\lVert Z\right\rVert_{\frac{1}{2}}^{4}\right)\leq K\right\}.

Let \widehat{Y}_{t} be the state of

\begin{cases}\begin{aligned} &\widehat{X}_{t_{k+1}}=\widehat{X}_{t_{k}}+\mu(t_{k},\widehat{X}_{t_{k}})\Delta t+\sigma(t_{k},\widehat{X}_{t_{k}})\diamond\Delta B^{H}_{t_{k}},\\ &\widehat{Y}_{t_{k+1}}=\widehat{Y}_{t_{k}}-f(t_{k},\widehat{X}_{t_{k}},\widehat{Y}_{t_{k}},\widetilde{Z}_{t_{k}})\left(t_{k+1}-t_{k}\right)+\widetilde{Z}_{t_{k}}\diamond\Delta B^{H}_{t_{k}},\end{aligned}\end{cases} (32)

with the aim of minimizing the objective functional

\widehat{J}\left(\widetilde{y},\widetilde{Z}_{\centerdot}\right)=E\left\lvert g(X_{T})-\widehat{Y}_{T}\right\rvert^{2}, (33)

under the control \left(\widetilde{y},\widetilde{Z}_{\centerdot}\right)\in\mathbb{R}\times\mathcal{A}_{K}. For any partition \pi, we have the following lemma.

Lemma 4.6.

Assume \widetilde{Z}\in\mathcal{A}_{K}. Then, for N large enough, for any partition \pi and k=0,\dots,N-1, it follows that

E\left\lvert\int_{t_{k}}^{t_{k+1}}\widetilde{Z}_{t}\,dB_{t}^{H}-\widetilde{Z}_{t_{k}}\diamond\Delta B^{H}_{t_{k}}\right\rvert^{2}\leq C_{K}\left[\sqrt{E\left(\left\lVert B^{H}\right\rVert^{4}_{C^{H-\delta}}\right)}\left(\Delta t\right)^{1+2H-2\delta}+\Delta t^{2}\right], (34)

moreover,

\sup_{t_{k}\leq t\leq t_{k+1}}E\left\lvert\widetilde{Y}_{t}-\widehat{Y}_{t_{k}}\right\rvert^{2}\leq C_{K}\left[\sqrt{E\left(\left\lVert B^{H}\right\rVert^{4}_{C^{H-\delta}}\right)}\left(\Delta t\right)^{1+2H-2\delta}+\Delta t^{2}\right], (35)

and in particular,

E\left\lvert\widetilde{Y}_{T}-\widehat{Y}_{T}\right\rvert^{2}\leq C_{K}\left[\sqrt{E\left(\left\lVert B^{H}\right\rVert^{4}_{C^{H-\delta}}\right)}\left(\Delta t\right)^{1+2H-2\delta}+\Delta t^{2}\right], (36)

with some C_{K} not depending on B^{H}, and \delta\in\left(0,H-\frac{1}{2}\right).

Proof.

Define Z_{t}^{\ast}:=\widetilde{Z}_{t}-\widetilde{Z}_{t_{k}} for t\in\left[t_{k},t_{k+1}\right]. In view of Lemma 19 in [25], it follows that

\left\lvert\int_{t_{k}}^{t_{k+1}}Z_{t}^{\ast}\,\delta B_{t}^{H}\right\rvert\leq C\left\lVert Z_{t}^{\ast}\right\rVert_{\frac{1}{2}}\left\lVert B^{H}\right\rVert_{C^{H-\delta}}\left(\Delta t\right)^{\frac{1}{2}+H-\delta}, (37)

where \frac{1}{2}+H-\delta>1. Then

E\left\lvert\int_{t_{k}}^{t_{k+1}}\widetilde{Z}_{t}\,\delta B_{t}^{H}-\widetilde{Z}_{t_{k}}\diamond\Delta B^{H}_{t_{k}}\right\rvert^{2}\leq C_{K}\left[\sqrt{E\left(\left\lVert B^{H}\right\rVert^{4}_{C^{H-\delta}}\right)}\left(\Delta t\right)^{1+2H-2\delta}\right]. (38)

In view of Theorem 2.5 and (9), for N large enough,

E\left\lvert\int_{t_{k}}^{t_{k+1}}Z_{t}^{\ast}\,dB_{t}^{H}\right\rvert^{2}\leq E\left\lvert\int_{t_{k}}^{t_{k+1}}Z_{t}^{\ast}\,\delta B_{t}^{H}\right\rvert^{2}+E\left\{\left(\int_{t_{k}}^{t_{k+1}}D_{s}Z_{s}^{\ast}\,ds\right)^{2}\right\} (39)
\leq C_{K}\left[\sqrt{E\left(\left\lVert B^{H}\right\rVert^{4}_{C^{H-\delta}}\right)}\left(\Delta t\right)^{1+2H-2\delta}\right]+\sup_{t_{k}\leq t\leq t_{k+1}}E\left\lvert D_{t}\left(Z_{t}-Z_{t_{k}}\right)\right\rvert^{2}\Delta t^{2}
\leq C_{K}\left[\sqrt{E\left(\left\lVert B^{H}\right\rVert^{4}_{C^{H-\delta}}\right)}\left(\Delta t\right)^{1+2H-2\delta}+\Delta t^{2}\right].

Let b_{0} denote C_{K}\left[\sqrt{E\left(\left\lVert B^{H}\right\rVert^{4}_{C^{H-\delta}}\right)}\left(\Delta t\right)^{1+2H-2\delta}+\Delta t^{2}\right]. Considering (32) and (29), for any t\in[t_{k+1},t_{k+2}] we have

E\left\lvert\widetilde{Y}_{t}-\widehat{Y}_{t_{k+1}}\right\rvert^{2}\leq\left(1+\Delta t\right)E\left\lvert\widetilde{Y}_{t_{k}}-\widehat{Y}_{t_{k}}\right\rvert^{2} (40)
+\left(\Delta t+\Delta t^{2}\right)\int_{t_{k}}^{t_{k+1}}E\left\lvert f(s,X_{s},\widetilde{Y}_{s},\widetilde{Z}_{s})-f(t_{k},\widehat{X}_{t_{k}},\widehat{Y}_{t_{k}},\widetilde{Z}_{t_{k}})\right\rvert^{2}\,ds
+E\left\lvert\int_{t_{k}}^{t_{k+1}}\widetilde{Z}_{t}\,dB_{t}^{H}-\widetilde{Z}_{t_{k}}\diamond\Delta B^{H}_{t_{k}}\right\rvert^{2}
\leq\left(1+\Delta t\right)E\left\lvert\widetilde{Y}_{t_{k}}-\widehat{Y}_{t_{k}}\right\rvert^{2}
+C\Delta t\int_{t_{k}}^{t_{k+1}}E\left(\left\lvert X_{s}-\widehat{X}_{t_{k}}\right\rvert^{2}+\left\lvert\widetilde{Y}_{s}-\widehat{Y}_{t_{k}}\right\rvert^{2}\right)\,ds
+\left(1+C\Delta t\right)E\left\lvert\int_{t_{k}}^{t_{k+1}}\widetilde{Z}_{t}\,dB_{t}^{H}-\widetilde{Z}_{t_{k}}\diamond\Delta B^{H}_{t_{k}}\right\rvert^{2}
\leq\left(1+C\Delta t\right)\sup_{t_{k}\leq t\leq t_{k+1}}E\left\lvert\widetilde{Y}_{t}-\widehat{Y}_{t_{k}}\right\rvert^{2}
+C\Delta t\int_{t_{k}}^{t_{k+1}}E\left(\left\lvert X_{s}-\widehat{X}_{t_{k}}\right\rvert^{2}\right)\,ds+\left(1+C\Delta t\right)b_{0},

since N is large enough. Taking the supremum in (40) gives

\sup_{t_{k+1}\leq t\leq t_{k+2}}E\left\lvert\widetilde{Y}_{t}-\widehat{Y}_{t_{k+1}}\right\rvert^{2}\leq\left(1+C\Delta t\right)\left(\sup_{t_{k}\leq t\leq t_{k+1}}E\left\lvert\widetilde{Y}_{t}-\widehat{Y}_{t_{k}}\right\rvert^{2}+b_{0}\right), (41)

and in particular

E\left\lvert\widetilde{Y}_{T}-\widehat{Y}_{T}\right\rvert^{2}\leq\left(1+C\Delta t\right)\left(\sup_{t_{N-1}\leq t\leq T}E\left\lvert\widetilde{Y}_{t}-\widehat{Y}_{t_{N-1}}\right\rvert^{2}+b_{0}\right). (42)

Define a_{k}:=\sup_{t_{k-1}\leq t\leq t_{k}}E\left\lvert\widetilde{Y}_{t}-\widehat{Y}_{t_{k-1}}\right\rvert^{2}, k=1,\dots,N. Noticing \widetilde{Y}_{0}=\widehat{Y}_{t_{0}}=\widetilde{y}, we immediately have a_{1}\leq(1+C\Delta t)b_{0}, and then

\begin{cases}\begin{aligned} &a_{k}\leq\left(1+C\Delta t\right)\left(a_{k-1}+b_{0}\right),\\ &a_{1}\leq(1+C\Delta t)b_{0},\end{aligned}\end{cases} (43)

in view of (43), we have

a_{N}\leq\sum_{k=1}^{N}\left(1+C\Delta t\right)^{k}b_{0}\leq C_{K}\left[\sqrt{E\left(\left\lVert B^{H}\right\rVert^{4}_{C^{H-\delta}}\right)}\left(\Delta t\right)^{1+2H-2\delta}+\Delta t^{2}\right], (44)

and the same bound holds for a_{k}, k=1,\dots,N-1, and for E\left\lvert\widetilde{Y}_{T}-\widehat{Y}_{T}\right\rvert^{2}, which finishes the proof. ∎

Finally, we can state the following theorem.

Theorem 4.7.

Assume Assumptions 4.1-4.4 hold. Let (Y_{t},Z_{t}) be the solution of the fBSDE (3) and let \widehat{Y}_{t} denote the state of system (32) under the control \left(\widetilde{y},\widetilde{Z}_{\centerdot}\right)\in\mathbb{R}\times\mathcal{A}_{K}. Then, for N large enough, there exist constants C, depending only on T and C_{f}^{\ast}, and C_{K}, depending on T, C_{f}^{\ast} and K, such that

\max_{0\leq k\leq N}\sup_{t_{k-1}\leq t\leq t_{k}}E\left\lvert Y_{t}-\widehat{Y}_{t_{k-1}}\right\rvert^{2}+E\left\lvert\int_{0}^{T}(Z_{s}-\widetilde{Z}_{s})\,dB_{s}^{H}\right\rvert^{2} (45)
=\max_{0\leq k\leq N}\sup_{t_{k-1}\leq t\leq t_{k}}E\left\lvert Y_{t}-\widehat{Y}_{t_{k-1}}\right\rvert^{2}+E\left\{\left(\int_{0}^{T}D_{s}\left(Z_{s}-\widetilde{Z}_{s}\right)\,ds\right)^{2}+\left\lvert Z_{s}-\widetilde{Z}_{s}\right\rvert_{\phi}^{2}\right\}
\leq CE\left\lvert g(X_{T})-\widehat{Y}_{T}\right\rvert^{2}+C_{K}\left[\sqrt{E\left(\left\lVert B^{H}\right\rVert^{4}_{C^{H-\delta}}\right)}\left(\Delta t\right)^{1+2H-2\delta}+\Delta t^{2}\right],

where \delta\in\left(0,H-\frac{1}{2}\right).

Proof.

In view of Theorem 4.5 and Lemma 4.6,

\max_{0\leq k\leq N}\sup_{t_{k-1}\leq t\leq t_{k}}E\left\lvert Y_{t}-\widehat{Y}_{t_{k-1}}\right\rvert^{2}+E\left\lvert\int_{0}^{T}(Z_{s}-\widetilde{Z}_{s})\,dB_{s}^{H}\right\rvert^{2} (46)
\leq\sup_{0\leq t\leq T}E\left\lvert Y_{t}-\widetilde{Y}_{t}\right\rvert^{2}+\max_{0\leq k\leq N}\sup_{t_{k-1}\leq t\leq t_{k}}E\left\lvert\widetilde{Y}_{t}-\widehat{Y}_{t_{k-1}}\right\rvert^{2}+E\left\lvert\int_{0}^{T}(Z_{s}-\widetilde{Z}_{s})\,dB_{s}^{H}\right\rvert^{2}
\leq CE\left\lvert g(X_{T})-\widetilde{Y}_{T}\right\rvert^{2}+\max_{0\leq k\leq N}\sup_{t_{k-1}\leq t\leq t_{k}}E\left\lvert\widetilde{Y}_{t}-\widehat{Y}_{t_{k-1}}\right\rvert^{2}
\leq CE\left\lvert g(X_{T})-\widehat{Y}_{T}\right\rvert^{2}+CE\left\lvert\widetilde{Y}_{T}-\widehat{Y}_{T}\right\rvert^{2}+\max_{0\leq k\leq N}\sup_{t_{k-1}\leq t\leq t_{k}}E\left\lvert\widetilde{Y}_{t}-\widehat{Y}_{t_{k-1}}\right\rvert^{2}
\leq CE\left\lvert g(X_{T})-\widehat{Y}_{T}\right\rvert^{2}+C_{K}\left[\sqrt{E\left(\left\lVert B^{H}\right\rVert^{4}_{C^{H-\delta}}\right)}\left(\Delta t\right)^{1+2H-2\delta}+\Delta t^{2}\right].

5 Numerical examples

In this section, we present experiments that test whether the RNN-BSDE method works well on fractional BSDEs. We mainly use the RNN-BSDE algorithm based on a multi-layer LSTM, which we refer to as LSTM-BSDE for brevity; the RNN-BSDE algorithm based on a stacked RNN is referred to as mRNN-BSDE.

5.1 Fractional Black-Scholes equation

In this subsection, we consider an extension of the famous Black-Scholes equation [26], which is widely applied in finance. In view of Corollary 3.4, the fractional Black-Scholes equation in the case d=1 has the form

\begin{cases}\begin{aligned} &\frac{\partial u(t,x)}{\partial t}+\sigma^{2}Hx^{2}t^{2H-1}\frac{\partial^{2}u(t,x)}{\partial x^{2}}+rx\frac{\partial u(t,x)}{\partial x}-ru(t,x)=0,\\ &u(T,x)=g(x),\end{aligned}\end{cases} (47)

where r is a constant known as the interest rate. If H=\frac{1}{2}, (47) is exactly the standard Black-Scholes equation.

With g(x)=\max\left\{x-K,0\right\}, solving (47) is equivalent to pricing a European call option. This is not difficult and proceeds just as for the standard Black-Scholes equation in the 1-d case: by a change of variables, (47) is transformed into a standard heat equation, and one can verify that the solution of (47) is

u(t,x)=xN(d_{1})-Ke^{-r(T-t)}N(d_{2}), (48)

where N(t)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t}e^{-\frac{s^{2}}{2}}\,ds is the standard normal distribution function and

\eta=\frac{\ln\frac{x}{K}+r(T-t)}{\sigma\sqrt{T^{2H}-t^{2H}}},\qquad
d_{1}=\eta+\frac{\sigma}{2}\sqrt{T^{2H}-t^{2H}},\qquad
d_{2}=\eta-\frac{\sigma}{2}\sqrt{T^{2H}-t^{2H}}.
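
For reference, the closed form (48) is straightforward to evaluate; the helper below (our own) reproduces, up to rounding, the values 7.1559 (H=1/2) and 6.2968 (H=3/4) quoted in Sec. 5.1.1 for x_{0}=100, K=100, r=0.06, \sigma=0.2, T=0.5:

from math import erf, exp, log, sqrt

def frac_bs_call(t, x, K=100.0, r=0.06, sigma=0.2, T=0.5, H=0.75):
    """European call price u(t, x) from (48); N is the standard normal CDF."""
    N = lambda y: 0.5 * (1.0 + erf(y / sqrt(2.0)))
    v = sigma * sqrt(T ** (2 * H) - t ** (2 * H))
    eta = (log(x / K) + r * (T - t)) / v
    d1, d2 = eta + v / 2.0, eta - v / 2.0
    return x * N(d1) - K * exp(-r * (T - t)) * N(d2)

print(frac_bs_call(0.0, 100.0, H=0.5), frac_bs_call(0.0, 100.0, H=0.75))  # ~7.1559, ~6.2968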

Our goal is to approximate u(0,x_{0}), x_{0}=(x_{0}^{(1)},\dots,x_{0}^{(d)})\in\mathbb{R}^{d}, by deep learning and to compare the LSTM-BSDE method with other methods designed for solving high-dimensional PDEs and SDEs, in order to check whether the RNN-BSDE method works well on fractional BSDEs. The common setting for the LSTM-BSDE method is as follows. The multi-layer LSTM consists of one d-dimensional input layer, two (d+10)-dimensional hidden layers and one d-dimensional output layer. In every hidden layer and the output layer, Xavier initialisation [27] is used for the input weights, orthogonal initialisation for the recurrent weights, and the biases are initialised to zero (exactly the default settings in Keras for LSTM units). Normal and uniform initialisation are used for the \beta and \gamma parameters of layer normalization. Layer normalization is applied before all activation functions in the LSTM units of all hidden layers, and before the output layer.

The setting for the stacked RNN used in the mRNN-BSDE method is the same as for LSTM-BSDE. The methods we compare against are the deep splitting method [28] and the DBDP1 method [29]. Unless stated otherwise, the neural networks of these methods are FNN-based, and we configure these FNNs exactly as for the deep BSDE method described in [9].

As optimizer we choose Adam [30] for all methods, which is known to be effective and is confirmed by our experimental results.

5.1.1 Results in the one-dimensional case (d=1)

Set the dimension d=1 and 0=t_{0}<t_{1}<\dots<t_{N}=T with T=0.5, N=20, so \Delta t=T/N=0.025. For the parameters of (17), (47) and (48), \mu=0.06, \sigma=0.2, x_{0}=100, r=0.06, K=100. The learning rate is lr=0.005, the validation set size m_{1}=256 and the mini-batch size m_{2}=64. To approximate u(0,x_{0}), 5 independent runs are performed for each method.

For comparison, we first consider the trivial case H=\frac{1}{2}, i.e. (17) is a standard SDE driven by the Brownian motion B_{t}, and the explicit solution u(0,x_{0}) in (48) is around 7.1559. It is not surprising to observe in Fig. 3 that the results \widetilde{u}(0,x_{0}) of all algorithms are close to the true value, and that the FNN-based methods (except deep splitting) perform slightly better than the RNN-based ones, since the Brownian motion B_{t} has the Markov property: to forecast the future, one only needs the information at the current moment, regardless of the past, which deprives the RNN structure of its advantage.

Figure 3: Mean of the loss function and relative L^{1}-approximation error of u(0,x_{0}) in the 1-d case of the PDE (47). (a.1) mean of the loss function when H=1/2; (a.2) relative L^{1}-approximation error when H=1/2; (b.1) mean of the loss function when H=3/4; (b.2) relative L^{1}-approximation error when H=3/4.
Table 1: Numerical simulations for each method in the 1-d case of the PDE (47) with H=\frac{1}{2} (u_{true}=7.1559).

Method       Mean of u_0   Std of u_0   Rel. L^1 error   Std of rel. error   Avg. runtime /s
deep BSDE    7.1556        4.52e-3      4.71e-4          3.49e-4             277
LSTM-BSDE    7.1439        2.30e-2      2.75e-3          2.03e-3             403
mRNN-BSDE    7.1491        6.81e-3      1.23e-3          6.19e-4             180
DS           7.1472        1.64e-2      2.03e-3          1.44e-3             779
DBDP1        7.1524        1.91e-2      1.89e-3          1.77e-3             595

Now turn to the experiments with H=\frac{3}{4}, where the explicit solution u(0,x_{0}) in (48) is around 6.2968. In this case the LSTM-BSDE and mRNN-BSDE methods begin to make a difference: \widetilde{u}(0,x_{0}) obtained by LSTM-BSDE is close to the true value u(0,x_{0}), while the deep BSDE method and DBDP1 both produce estimates that do not converge to the true value after 10000 iterations. Interestingly, deep splitting is also effective for solving fBSDEs and the corresponding PDEs even without an RNN; the reason lies in the loss functions of the deep splitting method introduced in [28], which avoid estimating the integral of Z with respect to B_{t}^{H}, i.e. \int Z_{s}\,dB_{s}^{H}, directly.

Table 2: Numerical simulations for each method in the 1-d case of the PDE (47) with H=\frac{3}{4} (u_{true}=6.2968).

Method       Mean of u_0   Std of u_0   Rel. L^1 error   Std of rel. error   Avg. runtime /s
deep BSDE    4.2473        4.84e-3      0.3255           7.68e-4             286
LSTM-BSDE    6.2048        2.90e-2      0.0146           4.60e-3             402
mRNN-BSDE    6.1989        2.39e-3      0.0153           3.79e-4             181
DS           6.1819        1.77e-3      0.0184           2.81e-4             823
DBDP1        4.3427        8.85e-3      0.3101           1.41e-3             619

5.1.2 Results in the high-dimensional case (d=50)

In the high-dimensional case, the fractional Black-Scholes equation has the form

\begin{cases}\begin{aligned} &\frac{\partial u(t,x)}{\partial t}+\sum_{i=1}^{d}\sigma^{2}Hx_{i}^{2}t^{2H-1}\frac{\partial^{2}u(t,x)}{\partial x_{i}^{2}}+\sum_{i=1}^{d}rx_{i}\frac{\partial u(t,x)}{\partial x_{i}}-ru(t,x)=0,\\ &u(T,x)=g(x),\end{aligned}\end{cases} (49)

where g(X_{T})=\max\left\{\max_{1\leq i\leq d}X_{T}^{i}-K,0\right\}. In this case, unlike the one-dimensional case, no analytical solution is known.

We choose d=50 as the high-dimensional case and 0=t_{0}<t_{1}<\dots<t_{N}=T with T=0.5, N=20, so \Delta t=T/N=0.025. For the parameters, \mu=0.06, \sigma=0.2, x_{0}=(100,100,\dots,100)\in\mathbb{R}^{d}, r=0.06, K=100. The learning rate is lr=0.008, the validation set size m_{1}=256 and the mini-batch size m_{2}=64. To approximate u(0,x_{0}), 5 independent runs are performed for each method.

In principle, we try to keep the hyperparameters the same for all algorithms, but differences between the methods can hardly be ignored, especially in the high-dimensional case. For DBDP1, we use n=11 neurons in each hidden layer, because with n=d+10 the estimate \widetilde{u}(0,x_{0}) converges slowly under lr=0.008. Although we could instead increase the learning rate for DBDP1, after comparing the results we chose to keep the number of neurons per hidden layer the same as in the one-dimensional case.

First, we again compare the methods with H=\frac{1}{2} in the high-dimensional case. As in the one-dimensional case with H=\frac{1}{2}, the values of \widetilde{u}(0,x_{0}) are close to each other whichever method we use, as shown in Fig. 4 and Table 3.

Figure 4: Mean of the loss function and mean of the approximations of u(0,x_{0}) in the 50-d case of the PDE (49). (c.1) mean of the loss function when H=1/2; (c.2) mean of the approximations of u(0,x_{0}) when H=1/2; (d.1) mean of the loss function when H=3/4; (d.2) mean of the approximations of u(0,x_{0}) when H=3/4.
Table 3: Numerical simulations for each method in the 50-d case of the PDE (49) with H=\frac{1}{2}.

Method       Mean of u_0   Std of u_0   Avg. time to reach ±0.5% of mean of u_0 /s   Avg. runtime /s
deep BSDE    39.3409       0.0246       352                                          799
LSTM-BSDE    39.3214       0.0274       1144                                         2771
mRNN-BSDE    39.3441       0.0397       389                                          894
DS           39.3275       0.0405       1521                                         3041
DBDP1        39.3062       0.0099       1283                                         2571

For H=\frac{3}{4}, it can be observed from Fig. 4 and Table 4 that LSTM-BSDE, mRNN-BSDE and deep splitting give close values of \widetilde{u}(0,x_{0}) with standard deviations of the same order, while DBDP1 gives a value somewhat further away with a higher standard deviation, and deep BSDE does not yield a convergent value after 10000 iterations in this case.

Table 4: Numerical simulations for each method in the 50-d case of the PDE (49) with H=\frac{3}{4}.

Method       Mean of u_0   Std of u_0   Avg. time to reach ±0.5% of mean of u_0 /s   Avg. runtime /s
deep BSDE    NC            NC           NC                                           NC
LSTM-BSDE    30.6935       0.0165       614                                          3025
mRNN-BSDE    30.6828       0.0225       301                                          1289
DS           30.7005       0.0239       1750                                         4284
DBDP1        29.9831       0.1045       2217                                         3820

5.2 Nonlinear fractional Black-Scholes equation with different interest rates for borrowing and lending (d=100)

Next, we present numerical experiments in which approximate solutions of nonlinear parabolic PDEs are computed with mRNN-BSDE and LSTM-BSDE. Compared with the classical linear Black-Scholes equation, nonlinear Black-Scholes equations rest on more realistic assumptions, and there are many variants. Here we consider a nonlinear fractional Black-Scholes equation with different interest rates for borrowing and lending:

\begin{cases}\begin{aligned} &\frac{\partial u(t,x)}{\partial t}+\sum_{i=1}^{d}\sigma^{2}Hx_{i}^{2}t^{2H-1}\frac{\partial^{2}u(t,x)}{\partial x_{i}^{2}}+\sum_{i=1}^{d}\mu x_{i}\frac{\partial u(t,x)}{\partial x_{i}}+f(t,x,u(t,x),(\nabla u\,\sigma)(t,x))=0,\\ &f(t,x,y,z)=-r^{l}y-\frac{\mu-r^{l}}{\sigma}\sum_{i=1}^{d}z_{i}+(r^{b}-r^{l})\max\left\{0,\left[\frac{1}{\sigma}\sum_{i=1}^{d}z_{i}\right]-y\right\},\end{aligned}\end{cases} (50)

and

g(x)=\max\left\{\left[\max_{1\leq i\leq 100}x_{i}\right]-120,0\right\}-2\max\left\{\left[\max_{1\leq i\leq 100}x_{i}\right]-150,0\right\}. (51)

Assume d=100, T=0.5, N=20, \mu=0.06, \sigma=0.2, x_{0}=(100,100,\dots,100)\in\mathbb{R}^{d}, r^{l}=0.04, r^{b}=0.06 and H=0.75. For the network setting, the learning rate is lr=0.005, the validation set size m_{1}=256 and the mini-batch size m_{2}=64. For both the mRNN-BSDE and LSTM-BSDE methods, we use one d-dimensional input layer, two (d+10)-dimensional hidden layers and one d-dimensional output layer. To approximate u(0,x_{0}), 5 independent runs are performed for each method. The true value of u(0,x_{0}) for Equation (50) is replaced by the reference value u(0,x_{0})=14.8236 computed by deep splitting.
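
For concreteness, the driver and payoff of this example read as follows in code (a direct, vectorized transcription of (50)-(51); our own sketch, with names of our choosing):

import numpy as np

mu, sigma, r_l, r_b = 0.06, 0.2, 0.04, 0.06

def driver_f(t, x, y, z):
    """f(t, x, y, z) from (50); y has shape (batch, 1), z has shape (batch, d)."""
    zsum = z.sum(axis=1, keepdims=True) / sigma
    return -r_l * y - (mu - r_l) * zsum + (r_b - r_l) * np.maximum(zsum - y, 0.0)

def payoff_g(x):
    """g(x) from (51); x has shape (batch, d) with d = 100."""
    m = x.max(axis=1, keepdims=True)
    return np.maximum(m - 120.0, 0.0) - 2.0 * np.maximum(m - 150.0, 0.0)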

Figure 5: Mean of the loss function and relative L^{1}-approximation error of u(0,x_{0}) in the 100-d case of the PDE (50) with H=3/4. (a) mean of the loss function; (b) relative L^{1}-approximation error of u(0,x_{0}).
Table 5: Numerical simulations for the RNN-BSDE method in the 100-d case of the PDE (50) with H=\frac{3}{4}.

Method       Mean of u_0   Std of u_0   Rel. L^1 error   Std of rel. error   Avg. runtime /s
LSTM-BSDE    14.7947       2.29e-2      1.93e-3          1.55e-3             3726
mRNN-BSDE    14.6800       1.42e-2      9.52e-3          9.60e-4             1480

5.3 A semilinear heat equation with variable coefficients (d=50)

In this subsection, we consider a type of semilinear heat equation with variable coefficients of the form

\frac{\partial u(t,x)}{\partial t}+\sigma^{2}Ht^{2H-1}\Delta_{x}u(t,x)+\frac{1-\left\lvert u(t,x)\right\rvert^{2}}{1+\left\lvert u(t,x)\right\rvert^{2}}=0, (52)

and

g(x)=\frac{5\exp\left(T^{2H}\right)}{10+2\left\lVert x\right\rVert^{2}_{\mathbb{R}^{d}}}. (53)

Assume d=50, T=0.5, N=20, \mu(t,x)=0, \sigma(t,x)=\sigma=1, x_{0}=(0,0,\dots,0)\in\mathbb{R}^{d} and H=2/3, and set lr=0.005, the validation set size m_{1}=256 and the mini-batch size m_{2}=64. For both the mRNN-BSDE and LSTM-BSDE methods, we use one d-dimensional input layer, two (d+10)-dimensional hidden layers and one d-dimensional output layer. To approximate u(0,x_{0}), 5 independent runs are performed for each method. The true value of u(0,x_{0}) for Equation (52) is replaced by the reference value u(0,x_{0})=0.54516 computed by deep splitting.

Figure 6: Mean of the loss function and relative L^{1}-approximation error of u(0,x_{0}) in the 50-d case of the PDE (52) with H=2/3. (a) mean of the loss function; (b) relative L^{1}-approximation error of u(0,x_{0}).
Table 6: Numerical simulations for the RNN-BSDE method in the 50-d case of the PDE (52) with H=\frac{2}{3}.

Method       Mean of u_0   Std of u_0   Rel. L^1 error   Std of rel. error   Avg. runtime /s
LSTM-BSDE    0.57731       7.80e-4      5.90e-2          1.43e-3             2517
mRNN-BSDE    0.57496       1.01e-3      5.46e-2          1.84e-3             1022

6 Conclusion

For fixed H\in(\frac{1}{2},1), we have discussed in this paper the relationship between systems of PDEs and fFBSDEs in which \int_{0}^{t}f_{s}\,dB_{s}^{H} is understood as a Wick integral. We have given (20), the PDE corresponding to the fFBSDE whose forward SDE (17) is solved by geometric fractional Brownian motion, a process of significance in finance. Moreover, we have developed the RNN-BSDE method for solving fBSDEs. The numerical experiments show that deep splitting and the RNN-BSDE method are effective for solving fractional BSDEs and the corresponding PDEs in comparison with other methods. LSTM-BSDE and mRNN-BSDE perform similarly in our experiments, except that LSTM-BSDE costs more time; in particular, if one is concerned about "long-term dependencies", LSTM-BSDE may be the better choice despite the extra time cost.

References