
RNN-BSDE method for high-dimensional fractional backward stochastic differential equations with Wick-Itô integrals

Chunhao Cai [email protected] Cong Zhang [email protected] School of Mathematics (Zhuhai), Sun Yat-sen University, Zhuhai 519082, Guangdong, People’s Republic of China
Abstract

Fractional Brownian motions (fBMs) are not semimartingales, so the classical Itô integral does not apply to them. Wick integration, one of the applications of Malliavin calculus to stochastic analysis, provides a suitable integral for fBMs. We consider fractional forward-backward stochastic differential equations (fFBSDEs) driven by an fBM with Hurst parameter in (\frac{1}{2},1), where \int_{0}^{t}f_{s}\,dB_{s}^{H} is understood as a Wick integral, and relate these fFBSDEs to a system of partial differential equations via an analogue of the Itô formula for Wick integrals. We also develop a deep learning algorithm, referred to as the RNN-BSDE method, based on recurrent neural networks and designed specifically for solving high-dimensional fractional BSDEs and their corresponding partial differential equations.

keywords:
Fractional Brownian motions, Wick calculus, Fractional backward stochastic differential equations, Deep learning, Recurrent neural networks

1 Introduction

In recent years, deep learning has been widely used to tackle high-dimensional problems involving partial differential equations (PDEs), even though key ingredients of the field, such as convolutional neural networks [1] and backpropagation [2], had already been developed by 1990. The recurrent neural network, a classical and basic architecture for handling time series, was likewise established by 2000 [3][4]. In the 21st century, it is mainly advances in hardware and datasets, rather than in theory, that have allowed deep learning to take off.

There is already a body of work on solving PDEs with deep learning algorithms such as PINNs [5] and neural operators [6]. In this paper, we focus instead on solving SDEs by deep learning, a direction that has received comparatively little attention. The late 2010s and 2020s have seen deep learning algorithms for stochastic control problems [7][8] and for backward stochastic differential equations (BSDEs) [9]. Since the connection between nonlinear PDEs and BSDEs has been established [10], solving BSDEs also yields solutions of the corresponding PDEs with the same algorithm.

This work naturally raises the question of whether deep learning, which has proved effective for BSDEs, can also solve fractional backward stochastic differential equations (fBSDEs). In our work, we consider the following fractional forward-backward stochastic differential equations (fFBSDEs):

\begin{cases}dX_{s}=\mu(s,X_{s})ds+\sigma(s,X_{s})dB_{s}^{H},\\ -dY_{s}=f(s,X_{s},Y_{s},Z_{s})ds-Z_{s}dB_{s}^{H},\\ X_{t}=x,\\ Y_{T}=g(X_{T}),\end{cases} (1)

where B_{s}^{H} is a fractional Brownian motion (fBM) with Hurst constant H\in(\frac{1}{2},1), t and T denote the initial and terminal times, and \left\{(X_{s},Y_{s},Z_{s}),t\leq s\leq T\right\} are stochastic processes.

Mandelbrot and van Ness defined a fractional Brownian motion as follows.

Definition 1.1 (fBM[11]).

Let B_{0} be an arbitrary real number. We call B_{t}^{H} a fractional Brownian motion with Hurst parameter H and starting value B_{0} at time 0 if

  1. B^{H}(0,\omega)=B_{0},

  2. B^{H}(t,\omega)-B^{H}(0,\omega)=\frac{1}{\Gamma(H+\frac{1}{2})}\int_{0}^{t}(t-s)^{H-\frac{1}{2}}dB(s,\omega).

Since fractional Brownian motions are not semimartingales, \int f\,dB^{H} is not well defined as an Itô integral [12]. In this paper, we instead interpret it as a Wick integral [13][14][15][16], which is introduced through the Wick product. The stochastic calculus used later, including Wick integration and differentiation, is based on Malliavin calculus [13]. In terms of Wick integration, we can also write (1) as

\begin{cases}dX_{s}=\mu(s,X_{s})ds+\sigma(s,X_{s})\diamond\dot{B}_{s}^{H}ds,\\ -dY_{s}=f(s,X_{s},Y_{s},Z_{s})ds-Z_{s}\diamond\dot{B}_{s}^{H}ds,\\ X_{t}=x,\\ Y_{T}=g(X_{T}),\end{cases} (2)

where \diamond denotes the Wick product.

Our purpose is to approximate the \mathscr{F}_{t}-adapted processes \left\{(Y_{s},Z_{s}),t\leq s\leq T\right\} such that

Y_{t}=g(X_{T})+\int_{t}^{T}f(s,X_{s},Y_{s},Z_{s})\,ds-\int_{t}^{T}Z_{s}\,dB_{s}^{H},\quad a.s. (3)

by using the deep learning method.

Our paper is organized as follows. In Sec. 2, we summarize the results from the applications of Malliavin calculus to stochastic analysis that we need for our work. In Sec. 3, we consider the relationship between fFBSDEs and PDEs and present some calculations used in the numerical experiments of Sec. 5. In Sec. 4, we introduce our deep learning algorithm for solving fFBSDEs, which is based on recurrent neural networks and which we refer to as the RNN-BSDE method. In Sec. 5, we report numerical experiments on solving parabolic PDEs by the RNN-BSDE method and compare it with other methods that are mainly designed for solving BSDEs rather than fractional BSDEs.

2 Wick calculus

We now review some results from the applications of Malliavin calculus to stochastic analysis. For details on stochastic calculus for Brownian motions see [13][17], and for the fractional counterpart see [14][15][16]. We first make some preparations.

Fix the Hurst constant \frac{1}{2}\leq H\leq 1 and define

\phi(s,t)=H(2H-1)\left|s-t\right|^{2H-2},\quad s,t\in\mathbb{R}. (4)

Let f:\mathbb{R}\to\mathbb{R} be measurable. We say that f belongs to the Hilbert space L^{2}_{\phi}(\mathbb{R}) if

{\left|f\right|}_{\phi}^{2}:=\int_{\mathbb{R}}\int_{\mathbb{R}}\phi(s,t)f(s)f(t)\,ds\,dt<\infty. (5)

The inner product on L^{2}_{\phi}(\mathbb{R}) is denoted by {(\cdot,\cdot)}_{\phi}.
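
As a simple consistency check (a worked example we add here, not in the original), take f=\mathbf{1}_{[0,t]}: then

\left\lvert\mathbf{1}_{[0,t]}\right\rvert_{\phi}^{2}=\int_{0}^{t}\!\int_{0}^{t}H(2H-1)\left\lvert s-u\right\rvert^{2H-2}\,du\,ds=\int_{0}^{t}H\left[s^{2H-1}+(t-s)^{2H-1}\right]ds=t^{2H},

which equals E\left[(B_{t}^{H})^{2}\right]; this is consistent with (9) below, since D_{s}f=0 for deterministic f.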

For any f\in L^{2}_{\phi}(\mathbb{R}), define \varepsilon(f) as

\varepsilon(f)=\exp\left(\int_{\mathbb{R}}f\,dB^{H}-\frac{1}{2}\left\lvert f\right\rvert^{2}_{\phi}\right). (6)

\varepsilon(f) is called an exponential function, and E denotes the linear span of \left\{\varepsilon(f),f\in L^{2}_{\phi}(\mathbb{R})\right\}.

2.1 Wick integration

Consider the fractional white noise probability space (\Omega,\mathscr{F},P)=(S^{\prime}(\mathbb{R}),\mathscr{B},P_{\phi}), where S^{\prime}(\mathbb{R}) is the dual of the Schwartz space S(\mathbb{R}) and the probability measure P_{\phi} exists by the Bochner-Minlos theorem (see e.g. [18]). By the fractional Wiener-Itô chaos expansion theorem [14], every F\in L^{2}(P_{\phi}) admits a chaos expansion (see e.g. [15]). One can then define the fractional Hida distribution space S^{*}, consisting of those F that admit a chaos expansion with finite negative norm as described in [15][16].

The Wick product is defined for X,Y admitting fractional Wiener-Itô chaos expansions. For this paper it suffices to know that the fractional Brownian motion B_{t}^{H} has a derivative \dot{B}_{t}^{H} in S^{*} such that

\int_{\mathbb{R}}f(t)\,dB^{H}_{t}=\int_{\mathbb{R}}f(t)\diamond\dot{B}_{t}^{H}\,dt,\quad \text{for } f:\mathbb{R}\to S^{*}.

E is dense in L^{2}(P_{\phi}) and we have

Definition 2.1 (Wick integration).

Consider f\in E and an arbitrary partition \pi of [0,T]. The Riemann sum

S(f,\pi)=\sum_{i=0}^{n-1}f_{t_{i}}\diamond(B_{t_{i+1}}^{H}-B_{t_{i}}^{H}) (7)

is well defined. Denoting \left\lvert\pi\right\rvert=\max_{i}(t_{i+1}-t_{i}), define \int_{0}^{T}f(t)\,dB^{H}_{t} as

\int_{0}^{T}f(t)\,dB^{H}_{t}=\lim_{\left\lvert\pi\right\rvert\to 0}S(f,\pi),\quad \text{for any } \pi. (8)

For f\in\mathcal{L}(0,T), the class defined on page 591 of [14], S(f,\pi) is a Cauchy sequence in L^{2}(P_{\phi}), so it has a limit, denoted by \int_{0}^{T}f(s)\,dB_{s}^{H}. In view of Theorem 3.9 in [14], for f\in\mathcal{L}(0,T) the limit of (7) satisfies

E\left\lvert\int_{0}^{T}f(t)\,dB^{H}_{t}\right\rvert^{2}=E\left\{\left(\int_{0}^{T}D_{s}f_{s}\,ds\right)^{2}+\left\lvert f\right\rvert_{\phi}^{2}\right\}, (9)

where D_{s} is defined in the next subsection.

2.2 Stochastic derivative

The definition of the stochastic derivative, whose derivative operator is denoted by D_{t}, can be found in [13][14]. For g\in L^{2}_{\phi}(\mathbb{R}), \Phi_{g} is defined by

(\Phi_{g})_{t}=\int_{0}^{\infty}\phi(t,u)g_{u}\,du.

Let D_{\Phi_{g}} (defined in [14]) be an analogue of the directional derivative; D_{s} is then defined through

D_{\Phi_{g}}F(\omega)=\int_{0}^{\infty}D_{s}F(\omega)g(s)\,ds,

where F is a random variable with F\in L^{p}.

The following results of stochastic derivative are used in this paper.

Lemma 2.2 (The chain rule).

Let f\in C^{1}(\mathbb{R}) and F:S^{\prime}(\mathbb{R})\to\mathbb{R} be such that D_{t}F exists. Then

D_{t}f(F)=f^{\prime}(F)\,D_{t}F. (10)

The proof is an analogue of the proof of Lemma 3.6 of [19].

Theorem 2.3 (Itô formula for Wick integration).

Let X_{t}=\xi+\int_{0}^{t}\mu(s,X_{s})\,ds+\int_{0}^{t}\sigma(s,X_{s})\,dB_{s}^{H}, with \sigma\in\mathcal{L}(0,T) and E\sup_{0\leq s\leq T}|\mu_{s}|<\infty. Assume there is an \alpha>1-H such that

E\left\lvert\sigma_{u}-\sigma_{v}\right\rvert^{2}\leq C\left\lvert u-v\right\rvert^{2\alpha},

where \left\lvert u-v\right\rvert\leq\delta for some \delta>0, and

\lim_{0\leq u,v\leq t,\ \left\lvert u-v\right\rvert\to 0}E\left\lvert D_{u}(\sigma_{u}-\sigma_{v})\right\rvert^{2}=0.

Let f:\mathbb{R}_{+}\times\mathbb{R}\to\mathbb{R} have a continuous first derivative in its first variable and a continuous second derivative in its second variable, and assume these derivatives are bounded. Moreover, assume that E\int_{0}^{T}|\sigma_{s}D_{s}X_{s}|\,ds<\infty and that \left(\frac{\partial f}{\partial x}(s,X_{s})\sigma_{s},0\leq s\leq T\right) belongs to \mathcal{L}(0,T). Then, for 0\leq t\leq T,

f(t,X_{t})=f(0,\xi)+\int_{0}^{t}\frac{\partial f}{\partial s}(s,X_{s})\,ds+\int_{0}^{t}\frac{\partial f}{\partial x}(s,X_{s})\mu_{s}\,ds
+\int_{0}^{t}\frac{\partial^{2}f}{\partial x^{2}}(s,X_{s})\sigma_{s}D_{s}X_{s}\,ds
+\int_{0}^{t}\frac{\partial f}{\partial x}(s,X_{s})\sigma_{s}\,dB_{s}^{H}\quad a.s. (11)
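
To illustrate Theorem 2.3 (a worked example we add for orientation, not in the original), take f(t,x)=\frac{1}{2}x^{2} and X_{t}=B_{t}^{H}, so that \mu=0, \sigma=1 and D_{s}B_{s}^{H}=\int_{0}^{s}\phi(s,u)\,du=Hs^{2H-1}. Then (11) gives

\frac{1}{2}(B_{t}^{H})^{2}=\int_{0}^{t}B_{s}^{H}\,dB_{s}^{H}+\int_{0}^{t}Hs^{2H-1}\,ds,
\qquad\text{i.e.}\qquad
\int_{0}^{t}B_{s}^{H}\,dB_{s}^{H}=\frac{1}{2}\left((B_{t}^{H})^{2}-t^{2H}\right),

which reduces to the classical Itô identity when H=\frac{1}{2}.
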
Proposition 2.4.

If g\in L^{2}_{\phi}(\mathbb{R}), F\in L^{2}(\Omega,\mathscr{F},P), and D_{\Phi_{g}}F\in L^{2}(\Omega,\mathscr{F},P), then

F\diamond\int_{\mathbb{R}^{+}}g_{s}\,dB^{H}_{s}=F\int_{\mathbb{R}^{+}}g_{s}\,dB^{H}_{s}-D_{\Phi_{g}}F. (12)

If, as n\to\infty, \sum_{i=0}^{n-1}f(t^{(n)}_{i})(B^{H}(t^{(n)}_{i+1})-B^{H}(t^{(n)}_{i})) converges in L^{2}(\Omega,\mathscr{F},P) to the same limit for all partitions (\pi_{n},n\in\mathbb{N}) satisfying \left\lvert\pi_{n}\right\rvert\to 0 as n\to\infty, then this limit is called the stochastic integral of Stratonovich type and is denoted by \int_{0}^{t}f_{s}\,\delta B_{s}^{H}.

In view of Proposition 2.4, we have

Theorem 2.5.

For f\in\mathcal{L}(0,T), the following equality holds:

\int_{0}^{t}f_{s}\,dB_{s}^{H}=\int_{0}^{t}f_{s}\,\delta B_{s}^{H}-\int_{0}^{t}D_{s}f_{s}\,ds\qquad a.s. (13)

3 Fractional Backward SDEs and systems of PDEs

Consider the fractional white noise space (\Omega,\mathscr{F},P)=(S^{\prime}(\mathbb{R}^{d}),\mathscr{B},P_{\phi}) (the multidimensional case d>1 is an analogue of the setting in [20]) and the fFBSDEs (2) given in Sec. 1:

\begin{cases}dX_{s}=\mu(s,X_{s})ds+\sigma(s,X_{s})\diamond\dot{B}_{s}^{H}ds,\\ -dY_{s}=f(s,X_{s},Y_{s},Z_{s})ds-Z_{s}\diamond\dot{B}_{s}^{H}ds,\\ X_{t}=x,\\ Y_{T}=g(X_{T}),\end{cases}

where H\in(\frac{1}{2},1), \mu(s,x):\mathbb{R}^{+}\times\mathbb{R}^{d}\to\mathbb{R}^{d}, \sigma(s,x):\mathbb{R}^{+}\times\mathbb{R}^{d}\to\mathbb{R}^{d\times d}, and \mu(s,X_{s}), \sigma(s,X_{s}) satisfy the conditions of Theorem 2.3. Moreover, Y:\mathbb{R}^{+}\times\Omega\to\mathbb{R}^{k}, Z:\mathbb{R}^{+}\times\Omega\to\mathbb{R}^{k\times d}, g(X_{T}) is \mathscr{F}_{T}-measurable, and f:\mathbb{R}^{+}\times\mathbb{R}^{d}\times\mathbb{R}^{k}\times\mathbb{R}^{k\times d}\to\mathbb{R}^{k}.

We want to link fractional backward SDEs with PDEs. First, consider the following system of PDEs:

\begin{cases}\begin{aligned} &\frac{\partial u(t,x)}{\partial t}+\mathcal{L}_{frac}u(t,x)+f(t,x,u(t,x),(\nabla u\,\sigma)(t,x))=0,\\ &u(T,x)=g(x),\end{aligned}\end{cases} (14)

where u:\mathbb{R}_{+}\times\mathbb{R}^{d}\to\mathbb{R}^{k}, we denote u(t,x)=\left(u^{1}(t,x),\cdots,u^{k}(t,x)\right)^{T}, and

\mathcal{L}_{frac}u=\begin{pmatrix}L_{frac}u_{1}\\ \cdots\\ L_{frac}u_{k}\end{pmatrix},\qquad
L_{frac}=\sum_{i,j=1}^{d}\sigma_{i}(t,x)D_{t}x_{j}\frac{\partial^{2}}{\partial x_{i}\partial x_{j}}+\sum_{i=1}^{d}\mu_{i}(t,x)\frac{\partial}{\partial x_{i}}.

We have the following theorem that

Theorem 3.1.

Fix \frac{1}{2}<H\leq 1 and consider X_{t}=X_{0}+\int_{0}^{t}\mu(s,X_{s})\,ds+\int_{0}^{t}\sigma(s,X_{s})\,dB_{s}^{H}, where \left\{(\mu_{s},\sigma_{s}),0\leq s\leq T\right\} satisfy the conditions of Theorem 2.3. If u\in C^{1,2}([0,T]\times\mathbb{R}^{d}) solves equation (14), then u(t,x)=Y^{t,x}_{t} (where X^{t,x}_{t}=x) for t\geq 0, x\in\mathbb{R}^{d}, where \left\{(Y_{s}^{t,x},Z_{s}^{t,x}),t\leq s\leq T\right\} is the solution of the BSDE (3).

Proof.

Apply Theorem 2.3 to the solution u of (14):

u(T,X_{T})-u(t,X_{t})=-\int_{t}^{T}\left[\mathcal{L}_{frac}u(s,X_{s})+f\right]ds
+\sum_{i=1}^{d}\int_{t}^{T}\mu_{i}(s,X_{s})\frac{\partial u(s,X_{s})}{\partial x^{i}}\,ds
+\sum_{i=1}^{d}\int_{t}^{T}\sigma_{i}(s,X_{s})\frac{\partial u(s,X_{s})}{\partial x^{i}}\,dB^{H}_{s}
+\sum_{i,j=1}^{d}\int_{t}^{T}\sigma_{i}(s,X_{s})D_{s}X_{s}^{j}\frac{\partial^{2}u(s,X_{s})}{\partial x^{i}\partial x^{j}}\,ds\quad a.s.,

and cancelling the \mathcal{L}_{frac}u terms with the help of (14) yields

u(t,X_{t})=g(X_{T})+\int_{t}^{T}f\,ds-\sum_{i=1}^{d}\int_{t}^{T}\sigma_{i}(s,X_{s})\frac{\partial u(s,X_{s})}{\partial x^{i}}\,dB^{H}_{s},\quad a.s. (15)

Comparing (15) with (3) proves the theorem. ∎

Remark 1.

From the proof, we can obtain

\begin{cases}\begin{aligned} &u(t,X_{t})=Y_{t},\\ &\sigma(t,X_{t})\nabla u(t,X_{t})=Z_{t}.\end{aligned}\end{cases} (16)

It is worth considering the example of geometric fractional Brownian motion, which solves the fractional SDE

dX_{t}=\mu X_{t}\,dt+\sigma X_{t}\,dB_{t}^{H},\quad X_{0}=x_{0}\geq 0, (17)

where x_{0}, \mu, \sigma are constants.

Proposition 3.2.

The solution of (17) is X_{t}=x_{0}\exp^{\diamond}\left(\mu t+\sigma B_{t}^{H}\right), i.e. X_{t}=x_{0}\exp\left(\mu t+\sigma B_{t}^{H}-\frac{1}{2}\sigma^{2}t^{2H}\right).

Proof.
dX_{t}=\mu X_{t}\,dt+\sigma X_{t}\diamond\dot{B}_{t}^{H}\,dt=(\mu+\sigma\dot{B}_{t}^{H})\diamond X_{t}\,dt,
\qquad\text{i.e.}\qquad
\frac{dX_{t}}{dt}=(\mu+\sigma\dot{B}_{t}^{H})\diamond X_{t}.

By Wick calculus, then

X_{t}=x_{0}\exp^{\diamond}\left(\mu t+\sigma\int_{0}^{t}\dot{B}_{s}^{H}\,ds\right) (18)
=x_{0}\exp^{\diamond}\left(\mu t+\sigma B_{t}^{H}\right)
=x_{0}\exp\left(\mu t+\sigma B_{t}^{H}-\frac{1}{2}\sigma^{2}t^{2H}\right).

Proposition 3.3.

Let s\leq t. The solution of (17) has the derivative D_{s}X_{t}=\sigma HX_{t}\left[s^{2H-1}-(t-s)^{2H-1}\right].

Proof.

From Lemma 2.2, D_{s}X_{t} exists and

D_{s}X_{t}=\frac{\partial X_{t}}{\partial x}D_{s}B_{t}^{H} (19)
=\sigma x_{0}\exp\left(\mu t+\sigma B_{t}^{H}-\frac{1}{2}\sigma^{2}t^{2H}\right)\int_{0}^{t}\phi(u,s)\,du
=\sigma HX_{t}\left[s^{2H-1}-(t-s)^{2H-1}\right].
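
The closed form in Proposition 3.2 and the derivative formula in Proposition 3.3 are easy to evaluate numerically. The sketch below is our own illustration (function names are ours), in which B^{H} is simulated by a Cholesky factorization of the standard fBM covariance E[B_{s}^{H}B_{t}^{H}]=\frac{1}{2}(s^{2H}+t^{2H}-|t-s|^{2H}):

import numpy as np

def fbm_path(T=0.5, N=200, H=0.75, rng=None):
    """Sample B^H at t_1,...,t_N (with B^H_0 = 0) via Cholesky of the fBM covariance."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.linspace(T / N, T, N)
    s, u = np.meshgrid(t, t, indexing="ij")
    cov = 0.5 * (s ** (2 * H) + u ** (2 * H) - np.abs(s - u) ** (2 * H))
    L = np.linalg.cholesky(cov + 1e-12 * np.eye(N))  # small jitter for numerical stability
    return t, L @ rng.standard_normal(N)

def geometric_fbm(t, bH, x0=100.0, mu=0.06, sigma=0.2, H=0.75):
    """X_t = x0 exp(mu t + sigma B^H_t - sigma^2 t^{2H} / 2), Proposition 3.2."""
    return x0 * np.exp(mu * t + sigma * bH - 0.5 * sigma ** 2 * t ** (2 * H))

def malliavin_DsXt(s, t, Xt, sigma=0.2, H=0.75):
    """D_s X_t = sigma H X_t [s^{2H-1} - (t-s)^{2H-1}] for s <= t, Proposition 3.3."""
    return sigma * H * Xt * (s ** (2 * H - 1) - (t - s) ** (2 * H - 1))

t, bH = fbm_path()
X = geometric_fbm(t, bH)
print(X[-1], malliavin_DsXt(t[len(t) // 2], t[-1], X[-1]))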

Corollary 3.4.

Consider X_{t}=\eta_{0}+\int_{0}^{t}\mu X_{s}\,ds+\int_{0}^{t}\sigma X_{s}\,dB_{s}^{H}, where \mu\in\mathbb{R}, \sigma\in\mathbb{R} and \eta_{0}\in\mathbb{R}^{d} are constants. If u\in C^{1,2}([0,T]\times\mathbb{R}^{d}) solves

\begin{cases}\begin{aligned} \frac{\partial u(t,x)}{\partial t}+\mathcal{L}_{frac}u(t,x)+f(t,x,u(t,x),(\nabla u\,\sigma)(t,x))&=0,\\ u(T,x)&=g(x),\end{aligned}\end{cases} (20)

with

\mathcal{L}_{frac}u=\begin{pmatrix}L_{frac}u_{1}\\ \cdots\\ L_{frac}u_{k}\end{pmatrix},\qquad
L_{frac}=\sum_{i,j=1}^{d}\sigma^{2}Hx_{j}^{2}t^{2H-1}\frac{\partial^{2}}{\partial x_{i}\partial x_{j}}+\sum_{i=1}^{d}\mu x_{i}\frac{\partial}{\partial x_{i}},

then u(t,x)=Y^{t,x}_{t}, where \left\{(Y_{s}^{t,x},Z_{s}^{t,x}),t\leq s\leq T\right\} is the solution of the BSDE (3).

Proof.

This follows directly from Theorem 3.1 and Proposition 3.3. ∎

4 RNN-BSDE method

Before building the network, we apply a time discretization to the BSDE (3). Consider the partition \pi:0=t_{0}<t_{1}<\cdots<t_{N}=T of [0,T]. For any t_{n}<t_{n+1}, from Definition 1.1 and Definition 2.1 it holds that

\begin{cases}Y_{t_{n+1}}=Y_{t_{n}}-f(t_{n},X_{t_{n}},Y_{t_{n}},Z_{t_{n}})\left(t_{n+1}-t_{n}\right)+Z_{t_{n}}\diamond\Delta B^{H}_{t_{n}},\\ B^{H}_{t_{n+1}}=\frac{1}{\Gamma(H+\frac{1}{2})}\sum_{i=0}^{n}(t_{n+1}-t_{i})^{H-\frac{1}{2}}(B_{t_{i+1}}-B_{t_{i}}).\end{cases} (21)
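
The second line of (21) can be simulated directly from standard Brownian increments. A minimal sketch (our own; function and variable names are ours):

import numpy as np
from math import gamma

def discretized_fbm(T=0.5, N=20, H=0.75, n_paths=64, rng=None):
    """B^H_{t_{n+1}} = Gamma(H+1/2)^{-1} sum_{i<=n} (t_{n+1}-t_i)^{H-1/2} (B_{t_{i+1}}-B_{t_i}), as in (21)."""
    rng = np.random.default_rng() if rng is None else rng
    dt = T / N
    t = np.linspace(0.0, T, N + 1)
    dB = rng.standard_normal((n_paths, N)) * np.sqrt(dt)  # standard Brownian increments
    BH = np.zeros((n_paths, N + 1))
    for n in range(N):
        w = (t[n + 1] - t[: n + 1]) ** (H - 0.5)          # kernel weights (t_{n+1} - t_i)^{H - 1/2}
        BH[:, n + 1] = (dB[:, : n + 1] @ w) / gamma(H + 0.5)
    return t, BH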

Moreover, to deal with the Wick product, we use Proposition 2.4 and obtain that

Y_{t_{n+1}}=Y_{t_{n}}-f(t_{n},X_{t_{n}},Y_{t_{n}},Z_{t_{n}})\left(t_{n+1}-t_{n}\right)+Z_{t_{n}}(B^{H}_{t_{n+1}}-B^{H}_{t_{n}})-D_{\Phi_{g}}Z_{t_{n}} (22)
=Y_{t_{n}}-f(t_{n},X_{t_{n}},Y_{t_{n}},Z_{t_{n}})\left(t_{n+1}-t_{n}\right)+Z_{t_{n}}(B^{H}_{t_{n+1}}-B^{H}_{t_{n}})-D_{t_{n}}Z_{t_{n}}\left(t_{n+1}-t_{n}\right).

The approximation scheme is still incomplete because D_{t_{n}}Z_{t_{n}} is unknown: Z_{t_{n}} is itself the quantity we need to find when solving the fBSDE. One option would be to construct another neural network to approximate D_{t_{n}}Z_{t_{n}}, just as we approximate Z_{t_{n}} (as introduced below). We prefer a different route: from Theorem 3.1 and (1), we may regard Z_{t} as a function g(t,X_{t}) of X_{t}, so in view of Lemma 2.2,

D_{s}Z_{t}=D_{s}g(t,X_{t})=\frac{\partial g(t,X_{t})}{\partial x}D_{s}X_{t}.

Moreover, if X_{t} is the solution of (17), then in view of Proposition 3.3,

D_{s}Z_{t}=\frac{\partial g(t,X_{t})}{\partial x}D_{s}X_{t}=\sigma HX_{t}\frac{\partial g(t,X_{t})}{\partial x}\left[s^{2H-1}-(t-s)^{2H-1}\right].

Then we can rewrite (22) as

Y_{t_{n+1}}=Y_{t_{n}}-\left(f(t_{n},X_{t_{n}},Y_{t_{n}},Z_{t_{n}})+\partial_{x}^{\ast}Z_{t_{n}}\cdot D_{t_{n}}X_{t_{n}}\right)\Delta t_{n}+Z_{t_{n}}\Delta B_{t_{n}}^{H},

where \partial_{x}^{\ast} denotes the derivative with respect to x, computed by automatic differentiation.
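
For example, \partial_{x}^{\ast}Z_{t_{n}} can be obtained by reverse-mode automatic differentiation. A minimal PyTorch sketch for d=1 (our own; z_net is a stand-in for the sub-network that produces Z from X):

import torch

d = 1
z_net = torch.nn.Sequential(torch.nn.Linear(d, 16), torch.nn.Tanh(), torch.nn.Linear(16, d))

x = torch.full((64, d), 100.0, requires_grad=True)  # a batch of samples of X_{t_n}
z = z_net(x)                                        # Z_{t_n} as a function of X_{t_n}
# dZ/dX for each sample; create_graph=True keeps it differentiable for the training loss
dz_dx, = torch.autograd.grad(z.sum(), x, create_graph=True)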

4.1 Main ideas of the RNN-BSDE method

In our work, we develop an algorithm for solving fractional BSDEs based on the deep BSDE method [9] and refer to it as the RNN-BSDE method. A new algorithm is necessary because fractional Brownian motions are not Markov processes. Moreover, fBMs exhibit long-range dependence for H\in(\frac{1}{2},1) and short-range dependence for H\in(0,\frac{1}{2}) (see e.g. [16]), so the increments of B^{H}_{t} are correlated whenever H\neq\frac{1}{2}. When approximating (Z_{t_{n}},Y_{t_{n+1}}), it is therefore not satisfactory to feed only X_{t_{n}} into the input layer; we want to make full use of all the information available before time t_{n+1}, so a recurrent neural network is a better choice than a feedforward one.

The recurrent neural network (RNN) structure [3][4], which is classical for dealing with time series, has the following advantages for solving fractional BSDEs.

  1. An RNN can make full use of the information available before time t_{n+1}. It processes sequence data stored in rank-3 tensors of shape (samples, timesteps, features). The computation applied to X_{t_{n}}, n=0,1,\cdots,N-1, in the recurrent unit, the fundamental building block of an RNN, can be expressed as

     h_{t_{n}}=f(X_{t_{n}}*U+h_{t_{n-1}}*W+b), (23)

     where U, W are weight matrices, b is the bias, h_{t_{n}} is the hidden state at time t_{n}, and f is the activation function. Unrolling (23) over t=t_{n}, n=0,1,\cdots,N-1, gives

     h_{t_{n}}=f(U*X_{t_{n}}+W*f(U*X_{t_{n-1}}+W*h_{t_{n-2}}+b)+b)
     =f(U*X_{t_{n}}+W*f(U*X_{t_{n-1}}+W*f(\cdots f(U*X_{t_{0}}+W*h_{0}+b)\cdots)+b)+b)
     =f(X_{t_{n}},X_{t_{n-1}},\dots,X_{t_{0}}),

     so the RNN indeed uses all of the past inputs, as required.

  2. For \left\{t_{n}\,|\,n=0,1,2,\cdots,N\right\}\subseteq[0,T], the deep BSDE method uses N-1 FNNs, so the larger N is, the more network parameters have to be determined, which can become a computational burden. With an RNN structure, however large N is, there is always a single RNN: the hidden layers are recurrent, so for every timestep t_{n}\in\left\{t_{n}\,|\,n=0,1,2,\cdots,N\right\} the weight matrices and biases are shared and reused within an epoch. As mentioned in [9], for N+1 time nodes, one d-dimensional input layer, two (d+10)-dimensional hidden layers and one d-dimensional output layer, there are d+1+(N-1)\left[2d(d+10)+(d+10)^{2}+4(d+10)+4d\right] parameters to train for deep BSDE, versus 1+2d(d+10)+3(d+10)^{2}+d^{2}+6(d+10)+3d parameters for RNN-BSDE with a stacked RNN (a quick comparison is sketched after this list).
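
The two parameter counts quoted above can be compared directly; the small sketch below (our own) evaluates both formulas and shows that the RNN-BSDE count does not grow with N:

def deep_bsde_params(d, N):
    """Parameter count for deep BSDE with N+1 time nodes (formula quoted in item 2)."""
    return d + 1 + (N - 1) * (2 * d * (d + 10) + (d + 10) ** 2 + 4 * (d + 10) + 4 * d)

def rnn_bsde_params(d):
    """Parameter count for the stacked-RNN sub-network (formula quoted in item 2)."""
    return 1 + 2 * d * (d + 10) + 3 * (d + 10) ** 2 + d ** 2 + 6 * (d + 10) + 3 * d

print(deep_bsde_params(d=100, N=20), rnn_bsde_params(d=100))  # only the first count grows with N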

The main idea of the RNN-BSDE method can be expressed as follows:

Z_{t_{n}}=subRNN(X_{t_{n}};\theta),\quad n=0,1,\cdots,N-1, (24)

Y_{t_{n+1}}=Y_{t_{n}}-\left(f(t_{n},X_{t_{n}},Y_{t_{n}},Z_{t_{n}})+\partial_{x}^{\ast}Z_{t_{n}}\cdot D_{t_{n}}X_{t_{n}}\right)\Delta t_{n}+Z_{t_{n}}\Delta B_{t_{n}}^{H}, (25)

loss=E\left\lvert g(X_{T})-Y_{T}\right\rvert^{2}, (26)

(Y_{0}^{new},Z_{0}^{new},\left\{Z_{n}^{new}\right\}_{n=1,\cdots,N-1},\theta^{new})=BP(loss(DNN(X_{t}))). (27)

In (24), the sub-neural network is a single RNN instead of N-1 FNNs, and to make the RNN more effective we choose a stacked RNN rather than a simple RNN; its structure is shown in Fig. 1.

Figure 1: Rough sketch of the architecture of the stacked RNN. t_{n} is simply written as n, and h_{j}^{(i)} denotes the hidden state of the i-th hidden layer at time t_{j}.

The whole forward-propagation flow is shown in Fig. 2. In the deep BSDE method, batch normalization [21] is applied right after each matrix multiplication and before activation; since batch normalization is not well suited to RNNs, we use layer normalization [22] instead. Finally, we provide the pseudocode of the RNN-BSDE method:

Figure 2: Rough sketch of the architecture of the RNN-BSDE method. \theta represents the parameters to be trained.
Algorithm 1 The RNN-BSDE method
1: Input: initial parameters (\alpha,\theta); samples \left\{X^{m}_{t_{j}}\right\};
2: Output: (\alpha^{i},\theta^{i});
3: for i=1\to maxstep do
4:     \left\{Z^{m,i}_{t_{j}}\right\}\leftarrow sub\mathcal{RNN}(\left\{X^{m}_{t_{j}}\right\};\theta^{i});
5:     \left\{\partial_{x}^{\ast}Z^{m,i}_{t_{j}}\right\}\leftarrow AD\left(\left\{Z^{m,i}_{t_{j}}\right\},\left\{X^{m}_{t_{j}}\right\}\right);    ▷ automatic differentiation
6:     for j=0\to N-1 do
7:         Y^{m,i}_{t_{j+1}}\leftarrow Y^{m,i}_{t_{j}}-\left(f(t_{j},X^{m}_{t_{j}},Y^{m,i}_{t_{j}},Z^{m,i}_{t_{j}})+\partial_{x}^{\ast}Z^{m,i}_{t_{j}}\cdot D_{t_{j}}X^{m}_{t_{j}}\right)\Delta t+Z^{m,i}_{t_{j}}\Delta B_{t_{j}}^{H};
8:     end for
9:     loss\leftarrow\frac{1}{M}\sum_{m=1}^{M}\left\lvert g(X^{m}_{T})-Y^{m,i}_{T}\right\rvert^{2};
10:     (\alpha^{i+1},\theta^{i+1})\leftarrow Adam(loss;\alpha^{i},\theta^{i});
11: end for
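
A simplified, self-contained PyTorch sketch of Algorithm 1 for the 1-d fractional Black-Scholes example of Sec. 5.1 (our own illustration with our own naming, not the authors' implementation; it uses an LSTM sub-network, the driver f(t,x,y,z)=-ry of (47), and D_{t_{n}}X_{t_{n}}=\sigma HX_{t_{n}}t_{n}^{2H-1} from Proposition 3.3):

import torch
from math import gamma

H, T, N, d = 0.75, 0.5, 20, 1
mu, sigma, x0, r, K = 0.06, 0.2, 100.0, 0.06, 100.0
dt = T / N
t = torch.linspace(0.0, T, N + 1)

def sample_paths(m):
    """Simulate B^H via (21) and X as geometric fBM (Proposition 3.2)."""
    dB = torch.randn(m, N) * dt ** 0.5
    BH = torch.zeros(m, N + 1)
    for n in range(N):
        BH[:, n + 1] = dB[:, : n + 1] @ (t[n + 1] - t[: n + 1]) ** (H - 0.5) / gamma(H + 0.5)
    X = x0 * torch.exp(mu * t + sigma * BH - 0.5 * sigma ** 2 * t ** (2 * H))
    return X, BH

class SubRNN(torch.nn.Module):
    """LSTM sub-network mapping (X_{t_0},...,X_{t_{N-1}}) to (Z_{t_0},...,Z_{t_{N-1}})."""
    def __init__(self):
        super().__init__()
        self.lstm = torch.nn.LSTM(d, d + 10, num_layers=2, batch_first=True)
        self.out = torch.nn.Linear(d + 10, d)
    def forward(self, x):
        h, _ = self.lstm(x)
        return self.out(h)

net, y0 = SubRNN(), torch.nn.Parameter(torch.tensor([[10.0]]))   # Y_0 is learned as well
opt = torch.optim.Adam(list(net.parameters()) + [y0], lr=5e-3)

for step in range(2000):
    X, BH = sample_paths(64)
    Xin = X[:, :N, None].requires_grad_(True)                    # shape (batch, N, d)
    Z = net(Xin)
    # per-timestep derivative dZ_{t_n}/dX_{t_n}, kept in the graph (Algorithm 1, line 5)
    dZdX = torch.stack([torch.autograd.grad(Z[:, n].sum(), Xin, create_graph=True)[0][:, n]
                        for n in range(N)], dim=1)
    Y = y0.expand(X.size(0), 1)
    for n in range(N):                                           # Euler scheme (25)
        DX = sigma * H * X[:, n] * t[n] ** (2 * H - 1)           # D_{t_n} X_{t_n}
        Y = Y - (-r * Y + dZdX[:, n] * DX[:, None]) * dt + Z[:, n] * (BH[:, n + 1] - BH[:, n])[:, None]
    loss = ((torch.relu(X[:, -1:] - K) - Y) ** 2).mean()         # E|g(X_T) - Y_T|^2 with g(x)=max(x-K,0)
    opt.zero_grad()
    loss.backward()
    opt.step()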

4.2 More detail of the RNN-BSDE method and an example of solving fBSDEs

In this section, we describe in more detail how to set up the neural network of the RNN-BSDE method so that the algorithm is practical. The simplest traditional RNN is convenient for explaining the main idea but too weak to solve some fBSDEs, so we give further details and use more practical types of RNN.

Suppose the samples of X_{t} are stored as \widetilde{X_{t}}\in\mathbb{R}^{m\times(N+1)\times d}, where m\in\{m_{1},m_{2}\}: m_{1} is the number of sample paths in the whole validation set and m_{2} denotes the mini-batch size; N+1 is the number of time nodes; and d is the dimension of the inputs (regarded as the number of features in deep learning). The stacked RNN approximating (Z_{t_{n}},n=0,1,\cdots,N-1) has at least four layers: one d-dimensional input layer, at least two (d+10)-dimensional hidden layers and one d-dimensional output layer. The hidden layers and the output layer are all RNN layers composed of recurrent units, with input weight matrices U_{h}\in\mathbb{R}^{d\times(d+10)} and U_{o}\in\mathbb{R}^{(d+10)\times d} and recurrent weight matrices W_{h}\in\mathbb{R}^{(d+10)\times(d+10)} and W_{o}\in\mathbb{R}^{d\times d}. There is no activation function directly after each matrix multiplication; instead we have

h^{(1)}_{t_{n}}=\tanh\left(LN^{(1)}\left(X_{t_{n}}*U_{h}^{(1)}+h_{t_{n-1}}^{(1)}*W_{h}^{(1)}\right)+b^{(1)}\right), (28)
h^{(2)}_{t_{n}}=\tanh\left(LN^{(2)}\left(h_{t_{n}}^{(1)}*U_{h}^{(2)}+h_{t_{n-1}}^{(2)}*W_{h}^{(2)}\right)+b^{(2)}\right),
Z_{t_{n}}=LN^{(3)}\left(h^{(2)}_{t_{n}}*U_{o}+Z_{t_{n-1}}*W_{o}\right)+b^{(3)},\quad n=0,1,\cdots,N-1,

where LN denotes layer normalization and \tanh is the hyperbolic tangent function. Equation (28) is easier to follow together with Fig. 1. All weights and the layer normalization parameters are randomly initialised at the start of each run.
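
A minimal PyTorch rendering of one forward pass of (28) (our own sketch; class and variable names are ours, with tanh for the hidden layers and the identity for the output layer):

import torch
import torch.nn as nn

class LNRecurrentLayer(nn.Module):
    """One layer of (28): state_n = act(LN(x_n U + state_{n-1} W) + b)."""
    def __init__(self, in_dim, out_dim, activation=torch.tanh):
        super().__init__()
        self.U = nn.Linear(in_dim, out_dim, bias=False)   # input weights
        self.W = nn.Linear(out_dim, out_dim, bias=False)  # recurrent weights
        self.ln = nn.LayerNorm(out_dim)
        self.b = nn.Parameter(torch.zeros(out_dim))
        self.act = activation
    def forward(self, x_seq):                             # x_seq: (batch, timesteps, in_dim)
        state = torch.zeros(x_seq.size(0), self.b.numel())
        outputs = []
        for n in range(x_seq.size(1)):
            state = self.act(self.ln(self.U(x_seq[:, n]) + self.W(state)) + self.b)
            outputs.append(state)
        return torch.stack(outputs, dim=1)

d, N = 50, 20
stacked = nn.Sequential(
    LNRecurrentLayer(d, d + 10),                          # first hidden layer of (28)
    LNRecurrentLayer(d + 10, d + 10),                     # second hidden layer of (28)
    LNRecurrentLayer(d + 10, d, activation=lambda v: v),  # output layer producing Z_{t_n}
)
Z = stacked(torch.randn(64, N, d))                        # (batch, N, d)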

If a stacked RNN is still not powerful enough to solve most fBSDEs, one can turn to a special type of RNN, the Long Short-Term Memory network (LSTM) [23]. For H\in\left(\frac{1}{2},1\right), the fBM B_{t}^{H} has the long-memory property, and a traditional RNN is known to struggle with such "long-term dependencies" in practice, while the LSTM handles long-term dependencies and mitigates the vanishing gradient problem of the RNN. Since LSTMs are a kind of RNN, we can convert an RNN into an LSTM simply by replacing the RNN units in the network with LSTM units, the fundamental building blocks of an LSTM. An LSTM unit consists of a cell and three gates: an input gate, a forget gate and an output gate. In the RNN-BSDE method, we use an LSTM with layer normalization whose structure is similar to the stacked RNN illustrated in Fig. 1, i.e. multiple (d+10)-dimensional LSTM layers as hidden layers and one extra d-dimensional LSTM layer before the output.

4.3 Convergence analysis

In this part, we provide a posterior estimate for the numerical solution, which justifies the convergence of the RNN-BSDE method. We first make the following assumptions.

Assumptions 4.1.

Let u\in C^{1,2}(\mathbb{R}^{+}\times\mathbb{R}^{d}); u and the process X_{t} satisfy the conditions of Theorem 2.3, so that the Itô formula for Wick integration (Theorem 2.3) is applicable to u(t,X_{t}).

Assumptions 4.2.

For any y,y^{\prime}, x,x^{\prime} and t,t^{\prime},

\left\lvert f(t,x,y,z)-f(t^{\prime},x^{\prime},y^{\prime},z)\right\rvert\leq L_{f}\left(\left\lvert t-t^{\prime}\right\rvert+\left\lvert x-x^{\prime}\right\rvert+\left\lvert y-y^{\prime}\right\rvert\right),

where L_{f} is a given positive constant.

Assumptions 4.3.

For any z,z^{\prime} and any t_{i},t_{j}\in[0,T] satisfying t_{i}\leq t_{j},

\int_{t_{i}}^{t_{j}}\left\lvert f(t,x,y,z)-f(t,x,y,z^{\prime})\right\rvert^{2}\,dt\leq C_{f}E\left\lvert\int_{t_{i}}^{t_{j}}(z-z^{\prime})\,dB_{t}^{H}\right\rvert^{2},

where C_{f} is a given positive constant, and we denote C_{f}^{\ast}=C_{f}\vee L_{f}.

Assumptions 4.4.

Assume that (14) has a unique classical solution and that there exists a unique pair of \mathscr{F}_{t}-adapted processes \left(Y_{t},Z_{t}\right) solving (3).

Consider the following fFBSDE system with state \widetilde{Y}_{t}:

\begin{cases}\begin{aligned} &X_{t}=\xi+\int_{0}^{t}\mu(s,X_{s})\,ds+\int_{0}^{t}\sigma(s,X_{s})\,dB_{s}^{H},\\ &\widetilde{Y}_{t}=\widetilde{y}-\int_{0}^{t}f(s,X_{s},\widetilde{Y}_{s},\widetilde{Z}_{s})\,ds+\int_{0}^{t}\widetilde{Z}_{s}\,dB_{s}^{H},\end{aligned}\end{cases} (29)

the aim is to minimize the objective functional

J\left(\widetilde{y},\widetilde{Z}_{\centerdot}\right)=E\left\lvert g(X_{T})-\widetilde{Y}_{T}\right\rvert^{2}, (30)

under the control \left(\widetilde{y},\widetilde{Z}_{\centerdot}\right)\in\mathbb{R}\times\mathcal{L}(0,T).

First, recalling (9), we have the following theorem.

Theorem 4.5.

Assume Assumptions 4.1-4.4 hold. Let (Y_{t},Z_{t}) be the solution of the fBSDE (3) and let \widetilde{Y}_{t} denote the state of system (29) under the control \left(\widetilde{y},\widetilde{Z}_{\centerdot}\right)\in\mathbb{R}\times\mathcal{L}(0,T). Then there exists a constant C, depending only on T and C_{f}^{\ast}, such that

\sup_{0\leq t\leq T}E\left\lvert Y_{t}-\widetilde{Y}_{t}\right\rvert^{2}+E\left\lvert\int_{0}^{T}(Z_{s}-\widetilde{Z}_{s})\,dB_{s}^{H}\right\rvert^{2} (31)
=\sup_{0\leq t\leq T}E\left\lvert Y_{t}-\widetilde{Y}_{t}\right\rvert^{2}+E\left\{\left(\int_{0}^{T}D_{s}\left(Z_{s}-\widetilde{Z}_{s}\right)\,ds\right)^{2}+\left\lvert Z_{s}-\widetilde{Z}_{s}\right\rvert_{\phi}^{2}\right\}
\leq CE\left\lvert g(X_{T})-\widetilde{Y}_{T}\right\rvert^{2}.

The proof of Theorem 4.5 is an analogue of that of Theorem 2.1 in [24]; one only needs to replace the Itô formula by the Itô formula for Wick integration.

In view of Theorem 4.5, the problem of solving (3) can be turned into a stochastic control problem for the system (29), so applying deep learning to this kind of problem is a reasonable choice.

Next, we estimate the error resulting from time discretization. From now on we mainly consider d=1 for brevity.

C^{\gamma}([0,T]) denotes the \gamma-Hölder space and \left\lVert\cdot\right\rVert_{\gamma} the \gamma-Hölder norm.

For any constant K>0, define

\mathcal{A}_{K}:=\left\{Z\in\mathcal{L}(0,T)\,\middle|\,Z(\cdot,\omega)\in C^{\frac{1}{2}}([0,T]),\ E\left(\left\lVert Z\right\rVert_{\frac{1}{2}}^{4}\right)\leq K\right\}.

Let \widehat{Y}_{t} be the state of

\begin{cases}\begin{aligned} &\widehat{X}_{t_{k+1}}=\widehat{X}_{t_{k}}+\mu(t_{k},\widehat{X}_{t_{k}})\Delta t+\sigma(t_{k},\widehat{X}_{t_{k}})\diamond\Delta B^{H}_{t_{k}},\\ &\widehat{Y}_{t_{k+1}}=\widehat{Y}_{t_{k}}-f(t_{k},\widehat{X}_{t_{k}},\widehat{Y}_{t_{k}},\widetilde{Z}_{t_{k}})\left(t_{k+1}-t_{k}\right)+\widetilde{Z}_{t_{k}}\diamond\Delta B^{H}_{t_{k}},\end{aligned}\end{cases} (32)

with the aim of minimizing the objective functional

\widehat{J}\left(\widetilde{y},\widetilde{Z}_{\centerdot}\right)=E\left\lvert g(X_{T})-\widehat{Y}_{T}\right\rvert^{2}, (33)

under the control \left(\widetilde{y},\widetilde{Z}_{\centerdot}\right)\in\mathbb{R}\times\mathcal{A}_{K}. For any partition \pi, we have the following lemma.

Lemma 4.6.

Assume \widetilde{Z}\in\mathcal{A}_{K}. Then, for N large enough, for any partition \pi and k=0,\dots,N-1, it follows that

E\left\lvert\int_{t_{k}}^{t_{k+1}}\widetilde{Z}_{t}\,dB_{t}^{H}-\widetilde{Z}_{t_{k}}\diamond\Delta B^{H}_{t_{k}}\right\rvert^{2}\leq C_{K}\left[\sqrt{E\left(\left\lVert B^{H}\right\rVert^{4}_{C^{H-\delta}}\right)}\left(\Delta t\right)^{1+2H-2\delta}+\Delta t^{2}\right], (34)

moreover,

\sup_{t_{k}\leq t\leq t_{k+1}}E\left\lvert\widetilde{Y}_{t}-\widehat{Y}_{t_{k}}\right\rvert^{2}\leq C_{K}\left[\sqrt{E\left(\left\lVert B^{H}\right\rVert^{4}_{C^{H-\delta}}\right)}\left(\Delta t\right)^{1+2H-2\delta}+\Delta t^{2}\right], (35)

and in particular,

E\left\lvert\widetilde{Y}_{T}-\widehat{Y}_{T}\right\rvert^{2}\leq C_{K}\left[\sqrt{E\left(\left\lVert B^{H}\right\rVert^{4}_{C^{H-\delta}}\right)}\left(\Delta t\right)^{1+2H-2\delta}+\Delta t^{2}\right], (36)

with some C_{K} not depending on B^{H}, and \delta\in\left(0,H-\frac{1}{2}\right).

Proof.

Define Z_{t}^{\ast}:=\widetilde{Z}_{t}-\widetilde{Z}_{t_{k}} for t\in\left[t_{k},t_{k+1}\right]. In view of Lemma 19 in [25], it follows that

\left\lvert\int_{t_{k}}^{t_{k+1}}Z_{t}^{\ast}\,\delta B_{t}^{H}\right\rvert\leq C\left\lVert Z_{t}^{\ast}\right\rVert_{\frac{1}{2}}\left\lVert B^{H}\right\rVert_{C^{H-\delta}}\left(\Delta t\right)^{\frac{1}{2}+H-\delta}, (37)

where \frac{1}{2}+H-\delta>1. Then

E\left\lvert\int_{t_{k}}^{t_{k+1}}\widetilde{Z}_{t}\,\delta B_{t}^{H}-\widetilde{Z}_{t_{k}}\diamond\Delta B^{H}_{t_{k}}\right\rvert^{2}\leq C_{K}\left[\sqrt{E\left(\left\lVert B^{H}\right\rVert^{4}_{C^{H-\delta}}\right)}\left(\Delta t\right)^{1+2H-2\delta}\right]. (38)

In view of Theorem 2.5 and (9), for N large enough,

E\left\lvert\int_{t_{k}}^{t_{k+1}}Z_{t}^{\ast}\,dB_{t}^{H}\right\rvert^{2}\leq E\left\lvert\int_{t_{k}}^{t_{k+1}}Z_{t}^{\ast}\,\delta B_{t}^{H}\right\rvert^{2}+E\left\{\left(\int_{t_{k}}^{t_{k+1}}D_{s}Z_{s}^{\ast}\,ds\right)^{2}\right\} (39)
\leq C_{K}\left[\sqrt{E\left(\left\lVert B^{H}\right\rVert^{4}_{C^{H-\delta}}\right)}\left(\Delta t\right)^{1+2H-2\delta}\right]+\sup_{t_{k}\leq t\leq t_{k+1}}E\left\lvert D_{t}\left(Z_{t}-Z_{t_{k}}\right)\right\rvert^{2}\Delta t^{2}
\leq C_{K}\left[\sqrt{E\left(\left\lVert B^{H}\right\rVert^{4}_{C^{H-\delta}}\right)}\left(\Delta t\right)^{1+2H-2\delta}+\Delta t^{2}\right].

Let b_{0} denote C_{K}\left[\sqrt{E\left(\left\lVert B^{H}\right\rVert^{4}_{C^{H-\delta}}\right)}\left(\Delta t\right)^{1+2H-2\delta}+\Delta t^{2}\right]. Considering (32) and (29), for any t\in[t_{k+1},t_{k+2}] we have

E\left\lvert\widetilde{Y}_{t}-\widehat{Y}_{t_{k+1}}\right\rvert^{2}\leq\left(1+\Delta t\right)E\left\lvert\widetilde{Y}_{t_{k}}-\widehat{Y}_{t_{k}}\right\rvert^{2} (40)
+\left(\Delta t+\Delta t^{2}\right)\int_{t_{k}}^{t_{k+1}}E\left\lvert f(s,X_{s},\widetilde{Y}_{s},\widetilde{Z}_{s})-f(t_{k},\widehat{X}_{t_{k}},\widehat{Y}_{t_{k}},\widetilde{Z}_{t_{k}})\right\rvert^{2}\,ds
+E\left\lvert\int_{t_{k}}^{t_{k+1}}\widetilde{Z}_{t}\,dB_{t}^{H}-\widetilde{Z}_{t_{k}}\diamond\Delta B^{H}_{t_{k}}\right\rvert^{2}
\leq\left(1+\Delta t\right)E\left\lvert\widetilde{Y}_{t_{k}}-\widehat{Y}_{t_{k}}\right\rvert^{2}
+C\Delta t\int_{t_{k}}^{t_{k+1}}E\left(\left\lvert X_{s}-\widehat{X}_{t_{k}}\right\rvert^{2}+\left\lvert\widetilde{Y}_{s}-\widehat{Y}_{t_{k}}\right\rvert^{2}\right)\,ds
+\left(1+C\Delta t\right)E\left\lvert\int_{t_{k}}^{t_{k+1}}\widetilde{Z}_{t}\,dB_{t}^{H}-\widetilde{Z}_{t_{k}}\diamond\Delta B^{H}_{t_{k}}\right\rvert^{2}
\leq\left(1+C\Delta t\right)\sup_{t_{k}\leq t\leq t_{k+1}}E\left\lvert\widetilde{Y}_{t}-\widehat{Y}_{t_{k}}\right\rvert^{2}
+C\Delta t\int_{t_{k}}^{t_{k+1}}E\left(\left\lvert X_{s}-\widehat{X}_{t_{k}}\right\rvert^{2}\right)\,ds+\left(1+C\Delta t\right)b_{0},

since N is large enough. Taking the supremum in (40) gives

\sup_{t_{k+1}\leq t\leq t_{k+2}}E\left\lvert\widetilde{Y}_{t}-\widehat{Y}_{t_{k+1}}\right\rvert^{2}\leq\left(1+C\Delta t\right)\left(\sup_{t_{k}\leq t\leq t_{k+1}}E\left\lvert\widetilde{Y}_{t}-\widehat{Y}_{t_{k}}\right\rvert^{2}+b_{0}\right), (41)

and in particular

E\left\lvert\widetilde{Y}_{T}-\widehat{Y}_{T}\right\rvert^{2}\leq\left(1+C\Delta t\right)\left(\sup_{t_{N-1}\leq t\leq T}E\left\lvert\widetilde{Y}_{t}-\widehat{Y}_{t_{N-1}}\right\rvert^{2}+b_{0}\right). (42)

Define a_{k}:=\sup_{t_{k-1}\leq t\leq t_{k}}E\left\lvert\widetilde{Y}_{t}-\widehat{Y}_{t_{k-1}}\right\rvert^{2}, k=1,\dots,N. Noticing \widetilde{Y}_{0}=\widehat{Y}_{t_{0}}=\widetilde{y}, we immediately have a_{1}\leq(1+C\Delta t)b_{0}, and then

\begin{cases}\begin{aligned} &a_{k}\leq\left(1+C\Delta t\right)\left(a_{k-1}+b_{0}\right),\\ &a_{1}\leq(1+C\Delta t)b_{0},\end{aligned}\end{cases} (43)

in view of (43), we have

a_{N}\leq\sum_{k=1}^{N}\left(1+C\Delta t\right)^{k}b_{0}\leq C_{K}\left[\sqrt{E\left(\left\lVert B^{H}\right\rVert^{4}_{C^{H-\delta}}\right)}\left(\Delta t\right)^{1+2H-2\delta}+\Delta t^{2}\right], (44)

and the same bound holds for a_{k}, k=1,\dots,N-1, and for E\left\lvert\widetilde{Y}_{T}-\widehat{Y}_{T}\right\rvert^{2}, which finishes the proof. ∎

Finally, we can state the following theorem.

Theorem 4.7.

Assume Assumptions 4.1-4.4 hold. Let (Y_{t},Z_{t}) be the solution of the fBSDE (3) and let \widehat{Y}_{t} denote the state of system (32) under the control \left(\widetilde{y},\widetilde{Z}_{\centerdot}\right)\in\mathbb{R}\times\mathcal{A}_{K}. Then, for N large enough, there exist constants C, depending only on T and C_{f}^{\ast}, and C_{K}, depending on T, C_{f}^{\ast} and K, such that

\max_{0\leq k\leq N}\sup_{t_{k-1}\leq t\leq t_{k}}E\left\lvert Y_{t}-\widehat{Y}_{t_{k-1}}\right\rvert^{2}+E\left\lvert\int_{0}^{T}(Z_{s}-\widetilde{Z}_{s})\,dB_{s}^{H}\right\rvert^{2} (45)
=\max_{0\leq k\leq N}\sup_{t_{k-1}\leq t\leq t_{k}}E\left\lvert Y_{t}-\widehat{Y}_{t_{k-1}}\right\rvert^{2}+E\left\{\left(\int_{0}^{T}D_{s}\left(Z_{s}-\widetilde{Z}_{s}\right)\,ds\right)^{2}+\left\lvert Z_{s}-\widetilde{Z}_{s}\right\rvert_{\phi}^{2}\right\}
\leq CE\left\lvert g(X_{T})-\widehat{Y}_{T}\right\rvert^{2}+C_{K}\left[\sqrt{E\left(\left\lVert B^{H}\right\rVert^{4}_{C^{H-\delta}}\right)}\left(\Delta t\right)^{1+2H-2\delta}+\Delta t^{2}\right],

where \delta\in\left(0,H-\frac{1}{2}\right).

Proof.

In view of Theorem 4.5 and Lemma 4.6,

\max_{0\leq k\leq N}\sup_{t_{k-1}\leq t\leq t_{k}}E\left\lvert Y_{t}-\widehat{Y}_{t_{k-1}}\right\rvert^{2}+E\left\lvert\int_{0}^{T}(Z_{s}-\widetilde{Z}_{s})\,dB_{s}^{H}\right\rvert^{2} (46)
\leq\sup_{0\leq t\leq T}E\left\lvert Y_{t}-\widetilde{Y}_{t}\right\rvert^{2}+\max_{0\leq k\leq N}\sup_{t_{k-1}\leq t\leq t_{k}}E\left\lvert\widetilde{Y}_{t}-\widehat{Y}_{t_{k-1}}\right\rvert^{2}+E\left\lvert\int_{0}^{T}(Z_{s}-\widetilde{Z}_{s})\,dB_{s}^{H}\right\rvert^{2}
\leq CE\left\lvert g(X_{T})-\widetilde{Y}_{T}\right\rvert^{2}+\max_{0\leq k\leq N}\sup_{t_{k-1}\leq t\leq t_{k}}E\left\lvert\widetilde{Y}_{t}-\widehat{Y}_{t_{k-1}}\right\rvert^{2}
\leq CE\left\lvert g(X_{T})-\widehat{Y}_{T}\right\rvert^{2}+CE\left\lvert\widetilde{Y}_{T}-\widehat{Y}_{T}\right\rvert^{2}+\max_{0\leq k\leq N}\sup_{t_{k-1}\leq t\leq t_{k}}E\left\lvert\widetilde{Y}_{t}-\widehat{Y}_{t_{k-1}}\right\rvert^{2}
\leq CE\left\lvert g(X_{T})-\widehat{Y}_{T}\right\rvert^{2}+C_{K}\left[\sqrt{E\left(\left\lVert B^{H}\right\rVert^{4}_{C^{H-\delta}}\right)}\left(\Delta t\right)^{1+2H-2\delta}+\Delta t^{2}\right].

5 Numerical examples

In this section, we present experiments that test whether the RNN-BSDE method works well on fractional BSDEs. We mainly use the RNN-BSDE algorithm based on a multi-layer LSTM, which we refer to as LSTM-BSDE for brevity; the RNN-BSDE algorithm based on a stacked RNN is referred to as mRNN-BSDE.

5.1 Fractional Black-Scholes equation

In this subsection, we consider an extension of the famous Black-Scholes equation [26], which is widely applied in finance. In view of Corollary 3.4, the fractional Black-Scholes equation in the case d=1 has the form

\begin{cases}\begin{aligned} &\frac{\partial u(t,x)}{\partial t}+\sigma^{2}Hx^{2}t^{2H-1}\frac{\partial^{2}u(t,x)}{\partial x^{2}}+rx\frac{\partial u(t,x)}{\partial x}-ru(t,x)=0,\\ &u(T,x)=g(x),\end{aligned}\end{cases} (47)

where r is a constant known as the interest rate. If H=\frac{1}{2}, (47) is exactly the standard Black-Scholes equation.

With g(x)=\max\left\{x-K,0\right\}, solving (47) is equivalent to pricing a European call option. This is not difficult and proceeds just as for the standard Black-Scholes equation in the 1-d case: by a change of variables, (47) is transformed into a standard heat equation, and one can verify that the solution of (47) is

u(t,x)=xN(d_{1})-Ke^{-r(T-t)}N(d_{2}), (48)

where N(t)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t}e^{-\frac{s^{2}}{2}}\,ds is the standard normal distribution function and

\eta=\frac{\ln\frac{x}{K}+r(T-t)}{\sigma\sqrt{T^{2H}-t^{2H}}},\qquad
d_{1}=\eta+\frac{\sigma}{2}\sqrt{T^{2H}-t^{2H}},\qquad
d_{2}=\eta-\frac{\sigma}{2}\sqrt{T^{2H}-t^{2H}}.
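
For reference, the closed form (48) is straightforward to evaluate; the helper below (our own) reproduces, up to rounding, the values 7.1559 (H=1/2) and 6.2968 (H=3/4) quoted in Sec. 5.1.1 for x_{0}=100, K=100, r=0.06, \sigma=0.2, T=0.5:

from math import erf, exp, log, sqrt

def frac_bs_call(t, x, K=100.0, r=0.06, sigma=0.2, T=0.5, H=0.75):
    """European call price u(t, x) from (48); N is the standard normal CDF."""
    N = lambda y: 0.5 * (1.0 + erf(y / sqrt(2.0)))
    v = sigma * sqrt(T ** (2 * H) - t ** (2 * H))
    eta = (log(x / K) + r * (T - t)) / v
    d1, d2 = eta + v / 2.0, eta - v / 2.0
    return x * N(d1) - K * exp(-r * (T - t)) * N(d2)

print(frac_bs_call(0.0, 100.0, H=0.5), frac_bs_call(0.0, 100.0, H=0.75))  # ~7.1559, ~6.2968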

Our goal is to approximate u(0,x_{0}), x_{0}=(x_{0}^{(1)},\dots,x_{0}^{(d)})\in\mathbb{R}^{d}, by deep learning and to compare the LSTM-BSDE method with other methods designed for solving high-dimensional PDEs and SDEs, in order to check whether the RNN-BSDE method works well on fractional BSDEs. The common setting for the LSTM-BSDE method is as follows. The multi-layer LSTM consists of one d-dimensional input layer, two (d+10)-dimensional hidden layers and one d-dimensional output layer. In every hidden layer and the output layer, Xavier initialisation [27] is used for the input weights, orthogonal initialisation for the recurrent weights, and the biases are initialised to zero (exactly the default settings in Keras for LSTM units). Normal and uniform initialisation are used for the \beta and \gamma parameters of layer normalization. Layer normalization is applied before all activation functions in the LSTM units of all hidden layers, and before the output layer.

The setting for the stacked RNN used in the mRNN-BSDE method is the same as for LSTM-BSDE. The methods we compare against are the deep splitting method [28] and the DBDP1 method [29]. Unless stated otherwise, the neural networks of these methods are FNN-based, and we configure these FNNs exactly as for the deep BSDE method described in [9].

As optimizer we choose Adam [30] for all methods, which is known to be effective and is confirmed by our experimental results.

5.1.1 Results in the one-dimensional case (d=1)

Set the dimension d=1 and 0=t_{0}<t_{1}<\dots<t_{N}=T with T=0.5, N=20, so \Delta t=T/N=0.025. For the parameters of (17), (47) and (48), \mu=0.06, \sigma=0.2, x_{0}=100, r=0.06, K=100. The learning rate is lr=0.005, the validation set size m_{1}=256 and the mini-batch size m_{2}=64. To approximate u(0,x_{0}), 5 independent runs are performed for each method.

For comparison, we first consider the trivial case H=\frac{1}{2}, i.e. (17) is a standard SDE driven by the Brownian motion B_{t}, and the explicit solution u(0,x_{0}) in (48) is around 7.1559. It is not surprising to observe in Fig. 3 that the results \widetilde{u}(0,x_{0}) of all algorithms are close to the true value, and that the FNN-based methods (except deep splitting) perform slightly better than the RNN-based ones, since the Brownian motion B_{t} has the Markov property: to forecast the future, one only needs the information at the current moment, regardless of the past, which deprives the RNN structure of its advantage.

Figure 3: Mean of the loss function and relative L^{1}-approximation error of u(0,x_{0}) in the 1-d case of the PDE (47). (a.1) mean of the loss function when H=1/2; (a.2) relative L^{1}-approximation error when H=1/2; (b.1) mean of the loss function when H=3/4; (b.2) relative L^{1}-approximation error when H=3/4.
Table 1: Numerical simulations for each method in the 1-d case of the PDE (47) with H=\frac{1}{2} (u_{true}=7.1559).

Method       Mean of u_0   Std of u_0   Rel. L^1 error   Std of rel. error   Avg. runtime /s
deep BSDE    7.1556        4.52e-3      4.71e-4          3.49e-4             277
LSTM-BSDE    7.1439        2.30e-2      2.75e-3          2.03e-3             403
mRNN-BSDE    7.1491        6.81e-3      1.23e-3          6.19e-4             180
DS           7.1472        1.64e-2      2.03e-3          1.44e-3             779
DBDP1        7.1524        1.91e-2      1.89e-3          1.77e-3             595

Now turn to the experiments with H=\frac{3}{4}, where the explicit solution u(0,x_{0}) in (48) is around 6.2968. In this case the LSTM-BSDE and mRNN-BSDE methods begin to make a difference: \widetilde{u}(0,x_{0}) obtained by LSTM-BSDE is close to the true value u(0,x_{0}), while the deep BSDE method and DBDP1 both produce estimates that do not converge to the true value after 10000 iterations. Interestingly, deep splitting is also effective for solving fBSDEs and the corresponding PDEs even without an RNN; the reason lies in the loss functions of the deep splitting method introduced in [28], which avoid estimating the integral of Z with respect to B_{t}^{H}, i.e. \int Z_{s}\,dB_{s}^{H}, directly.

Table 2: Numerical simulations for each method in the 1-d case of the PDE (47) with H=\frac{3}{4} (u_{true}=6.2968).

Method       Mean of u_0   Std of u_0   Rel. L^1 error   Std of rel. error   Avg. runtime /s
deep BSDE    4.2473        4.84e-3      0.3255           7.68e-4             286
LSTM-BSDE    6.2048        2.90e-2      0.0146           4.60e-3             402
mRNN-BSDE    6.1989        2.39e-3      0.0153           3.79e-4             181
DS           6.1819        1.77e-3      0.0184           2.81e-4             823
DBDP1        4.3427        8.85e-3      0.3101           1.41e-3             619

5.1.2 Results in the high-dimensional case (d=50)

In the high-dimensional case, the fractional Black-Scholes equation has the form

\begin{cases}\begin{aligned} &\frac{\partial u(t,x)}{\partial t}+\sum_{i=1}^{d}\sigma^{2}Hx_{i}^{2}t^{2H-1}\frac{\partial^{2}u(t,x)}{\partial x_{i}^{2}}+\sum_{i=1}^{d}rx_{i}\frac{\partial u(t,x)}{\partial x_{i}}-ru(t,x)=0,\\ &u(T,x)=g(x),\end{aligned}\end{cases} (49)

where g(X_{T})=\max\left\{\max_{1\leq i\leq d}X_{T}^{i}-K,0\right\}. In this case, unlike the one-dimensional case, no analytical solution is known.

We choose d=50 as the high-dimensional case and 0=t_{0}<t_{1}<\dots<t_{N}=T with T=0.5, N=20, so \Delta t=T/N=0.025. For the parameters, \mu=0.06, \sigma=0.2, x_{0}=(100,100,\dots,100)\in\mathbb{R}^{d}, r=0.06, K=100. The learning rate is lr=0.008, the validation set size m_{1}=256 and the mini-batch size m_{2}=64. To approximate u(0,x_{0}), 5 independent runs are performed for each method.

In principle, we try to keep the hyperparameters the same for all algorithms, but differences between the methods can hardly be ignored, especially in the high-dimensional case. For DBDP1, we use n=11 neurons in each hidden layer, because with n=d+10 the estimate \widetilde{u}(0,x_{0}) converges slowly under lr=0.008. Although we could instead increase the learning rate for DBDP1, after comparing the results we chose to keep the number of neurons per hidden layer the same as in the one-dimensional case.

First, we again compare the methods with H=\frac{1}{2} in the high-dimensional case. As in the one-dimensional case with H=\frac{1}{2}, the values of \widetilde{u}(0,x_{0}) are close to each other whichever method we use, as shown in Fig. 4 and Table 3.

Figure 4: Mean of the loss function and mean of the approximations of u(0,x_{0}) in the 50-d case of the PDE (49). (c.1) mean of the loss function when H=1/2; (c.2) mean of the approximations of u(0,x_{0}) when H=1/2; (d.1) mean of the loss function when H=3/4; (d.2) mean of the approximations of u(0,x_{0}) when H=3/4.
Table 3: Numerical simulations for each method in the 50-d case of the PDE (49) with H=\frac{1}{2}.

Method       Mean of u_0   Std of u_0   Avg. time to reach ±0.5% of mean of u_0 /s   Avg. runtime /s
deep BSDE    39.3409       0.0246       352                                          799
LSTM-BSDE    39.3214       0.0274       1144                                         2771
mRNN-BSDE    39.3441       0.0397       389                                          894
DS           39.3275       0.0405       1521                                         3041
DBDP1        39.3062       0.0099       1283                                         2571

For H=\frac{3}{4}, it can be observed from Fig. 4 and Table 4 that LSTM-BSDE, mRNN-BSDE and deep splitting give close values of \widetilde{u}(0,x_{0}) with standard deviations of the same order, while DBDP1 gives a value somewhat further away with a higher standard deviation, and deep BSDE does not yield a convergent value after 10000 iterations in this case.

Table 4: Numerical simulations for each method in the 50-d case of the PDE (49) with H=\frac{3}{4}.

Method       Mean of u_0   Std of u_0   Avg. time to reach ±0.5% of mean of u_0 /s   Avg. runtime /s
deep BSDE    NC            NC           NC                                           NC
LSTM-BSDE    30.6935       0.0165       614                                          3025
mRNN-BSDE    30.6828       0.0225       301                                          1289
DS           30.7005       0.0239       1750                                         4284
DBDP1        29.9831       0.1045       2217                                         3820

5.2 Nonlinear fractional Black-Scholes equation with different interest rates for borrowing and lending (d=100)

Next, we present numerical experiments in which approximate solutions of nonlinear parabolic PDEs are computed with mRNN-BSDE and LSTM-BSDE. Compared with the classical linear Black-Scholes equation, nonlinear Black-Scholes equations rest on more realistic assumptions, and there are many variants. Here we consider a nonlinear fractional Black-Scholes equation with different interest rates for borrowing and lending:

\begin{cases}\begin{aligned} &\frac{\partial u(t,x)}{\partial t}+\sum_{i=1}^{d}\sigma^{2}Hx_{i}^{2}t^{2H-1}\frac{\partial^{2}u(t,x)}{\partial x_{i}^{2}}+\sum_{i=1}^{d}\mu x_{i}\frac{\partial u(t,x)}{\partial x_{i}}+f(t,x,u(t,x),(\nabla u\,\sigma)(t,x))=0,\\ &f(t,x,y,z)=-r^{l}y-\frac{\mu-r^{l}}{\sigma}\sum_{i=1}^{d}z_{i}+(r^{b}-r^{l})\max\left\{0,\left[\frac{1}{\sigma}\sum_{i=1}^{d}z_{i}\right]-y\right\},\end{aligned}\end{cases} (50)

and

g(x)=\max\left\{\left[\max_{1\leq i\leq 100}x_{i}\right]-120,0\right\}-2\max\left\{\left[\max_{1\leq i\leq 100}x_{i}\right]-150,0\right\}. (51)

Assume d=100, T=0.5, N=20, \mu=0.06, \sigma=0.2, x_{0}=(100,100,\dots,100)\in\mathbb{R}^{d}, r^{l}=0.04, r^{b}=0.06 and H=0.75. For the network setting, the learning rate is lr=0.005, the validation set size m_{1}=256 and the mini-batch size m_{2}=64. For both the mRNN-BSDE and LSTM-BSDE methods, we use one d-dimensional input layer, two (d+10)-dimensional hidden layers and one d-dimensional output layer. To approximate u(0,x_{0}), 5 independent runs are performed for each method. The true value of u(0,x_{0}) for Equation (50) is replaced by the reference value u(0,x_{0})=14.8236 computed by deep splitting.
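
For concreteness, the driver and payoff of this example read as follows in code (a direct, vectorized transcription of (50)-(51); our own sketch, with names of our choosing):

import numpy as np

mu, sigma, r_l, r_b = 0.06, 0.2, 0.04, 0.06

def driver_f(t, x, y, z):
    """f(t, x, y, z) from (50); y has shape (batch, 1), z has shape (batch, d)."""
    zsum = z.sum(axis=1, keepdims=True) / sigma
    return -r_l * y - (mu - r_l) * zsum + (r_b - r_l) * np.maximum(zsum - y, 0.0)

def payoff_g(x):
    """g(x) from (51); x has shape (batch, d) with d = 100."""
    m = x.max(axis=1, keepdims=True)
    return np.maximum(m - 120.0, 0.0) - 2.0 * np.maximum(m - 150.0, 0.0)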

Figure 5: Mean of the loss function and relative L^{1}-approximation error of u(0,x_{0}) in the 100-d case of the PDE (50) with H=3/4. (a) mean of the loss function; (b) relative L^{1}-approximation error of u(0,x_{0}).
Table 5: Numerical simulations for the RNN-BSDE method in the 100-d case of the PDE (50) with H=\frac{3}{4}.

Method       Mean of u_0   Std of u_0   Rel. L^1 error   Std of rel. error   Avg. runtime /s
LSTM-BSDE    14.7947       2.29e-2      1.93e-3          1.55e-3             3726
mRNN-BSDE    14.6800       1.42e-2      9.52e-3          9.60e-4             1480

5.3 A semilinear heat equation with variable coefficients (d=50)

In this subsection, we consider a type of semilinear heat equation with variable coefficients of the form

\frac{\partial u(t,x)}{\partial t}+\sigma^{2}Ht^{2H-1}\Delta_{x}u(t,x)+\frac{1-\left\lvert u(t,x)\right\rvert^{2}}{1+\left\lvert u(t,x)\right\rvert^{2}}=0, (52)

and

g(x)=\frac{5\exp\left(T^{2H}\right)}{10+2\left\lVert x\right\rVert^{2}_{\mathbb{R}^{d}}}. (53)

Assume d=50, T=0.5, N=20, \mu(t,x)=0, \sigma(t,x)=\sigma=1, x_{0}=(0,0,\dots,0)\in\mathbb{R}^{d} and H=2/3, and set lr=0.005, the validation set size m_{1}=256 and the mini-batch size m_{2}=64. For both the mRNN-BSDE and LSTM-BSDE methods, we use one d-dimensional input layer, two (d+10)-dimensional hidden layers and one d-dimensional output layer. To approximate u(0,x_{0}), 5 independent runs are performed for each method. The true value of u(0,x_{0}) for Equation (52) is replaced by the reference value u(0,x_{0})=0.54516 computed by deep splitting.

Figure 6: Mean of the loss function and relative L^{1}-approximation error of u(0,x_{0}) in the 50-d case of the PDE (52) with H=2/3. (a) mean of the loss function; (b) relative L^{1}-approximation error of u(0,x_{0}).
Table 6: Numerical simulations for the RNN-BSDE method in the 50-d case of the PDE (52) with H=\frac{2}{3}.

Method       Mean of u_0   Std of u_0   Rel. L^1 error   Std of rel. error   Avg. runtime /s
LSTM-BSDE    0.57731       7.80e-4      5.90e-2          1.43e-3             2517
mRNN-BSDE    0.57496       1.01e-3      5.46e-2          1.84e-3             1022

6 Conclusion

For fixed H\in(\frac{1}{2},1), we have discussed in this paper the relationship between systems of PDEs and fFBSDEs in which \int_{0}^{t}f_{s}\,dB_{s}^{H} is understood as a Wick integral. We have given (20), the PDE corresponding to the fFBSDE whose forward SDE (17) is solved by geometric fractional Brownian motion, a process of significance in finance. Moreover, we have developed the RNN-BSDE method for solving fBSDEs. The numerical experiments show that deep splitting and the RNN-BSDE method are effective for solving fractional BSDEs and the corresponding PDEs in comparison with other methods. LSTM-BSDE and mRNN-BSDE perform similarly in our experiments, except that LSTM-BSDE costs more time; in particular, if one is concerned about "long-term dependencies", LSTM-BSDE may be the better choice despite the extra time cost.

References