RNN-BSDE method for high-dimensional fractional backward stochastic differential equations with Wick-Itô integrals
Abstract
Fractional Brownian motions (fBMs) are not semimartingales, so the classical theory of the Itô integral does not apply to them. Wick integration, one of the applications of Malliavin calculus to stochastic analysis, provides a suitable notion of integral for fBMs. We consider fractional forward-backward stochastic differential equations (fFBSDEs) driven by an fBM with Hurst parameter $H\in(\tfrac12,1)$, where the stochastic integral is understood in the Wick sense, and relate our fFBSDEs to a system of partial differential equations by using an analogue of the Itô formula for Wick integrals. We then develop a deep learning algorithm, referred to as the RNN-BSDE method, based on recurrent neural networks and designed specifically for solving high-dimensional fractional BSDEs and their corresponding partial differential equations.
keywords:
Fractional Brownian motions, Wick calculus, Fractional backward stochastic differential equations, Deep learning, Recurrent neural networks

1 Introduction
In recent years, deep learning has been developed and widely used to deal with high-dimensional problems involving partial differential equations (PDEs), although the key ingredients of deep learning, including convolutional neural networks [1] and back propagation [2], had already been developed by 1990. The recurrent neural network, a classical and basic architecture for dealing with time series, had likewise been developed by 2000 [3][4]. In the 21st century, it is mainly advanced hardware and large datasets, rather than new theory, that have allowed deep learning to take off.
Some work on solving PDEs by deep learning algorithms, such as PINNs [5] and neural operators [6], is already well known. In this paper we focus instead on solving SDEs by deep learning, a direction that has received comparatively little attention. The late 2010s and 2020s have witnessed deep learning algorithms for stochastic control problems [7][8] and for backward stochastic differential equations (BSDEs) [9]. Since the connection between nonlinear PDEs and BSDEs has been established [10], solving BSDEs also allows us to solve the corresponding PDEs with the same algorithm.
This work inspired us to ask whether the deep learning approach, which has proved effective for BSDEs, also works for fractional backward stochastic differential equations (fBSDEs). In our work, we consider the following fractional forward-backward stochastic differential equations (fFBSDEs)

(1)

where $B^H_t$ is a fractional Brownian motion (fBM) with Hurst constant $H\in(\tfrac12,1)$, $0$ and $T$ denote the start and final time, and $X_t$, $Y_t$, $Z_t$ are all stochastic processes.
Mandelbrot and van Ness defined a fractional Brownian motion as follows.

Definition 1.1 (fBM [11]).
Let $H\in(0,1)$ be an arbitrary real number. A continuous Gaussian process $B^H=\{B^H_t\}_{t\ge0}$ is called a fractional Brownian motion with Hurst parameter $H$ and starting value $0$ at time $0$ if

1. $B^H_0=0$ and $\mathbb{E}\big[B^H_t\big]=0$ for all $t\ge0$,

2. $\mathbb{E}\big[B^H_t\,B^H_s\big]=\tfrac12\big(t^{2H}+s^{2H}-|t-s|^{2H}\big)$ for all $s,t\ge0$.
Since fractional Brownian motions are not semimartingales, integrals with respect to $B^H$ are not well-defined in the sense of the Itô integral [12]. In this paper we instead understand them as Wick integrals [13][14][15][16]. To introduce the Wick integral, the Wick product is used; moreover, the stochastic calculus, including the Wick integration and differentiation that we introduce later, is based on Malliavin calculus [13]. Using Wick integration, we can also write (1) as

(2)

where $\diamond$ is the mark of the Wick product.
Our purpose is to approximate the $\mathcal{F}_t$-adapted processes $(Y_t,Z_t)$ such that

(3)

holds, by using the deep learning method.
Our paper is organized as follows. In Sec.2, we summarize the results from the applications of Malliavin calculus to stochastic analysis that we need for our work. In Sec.3, we consider the relationship between fFBSDEs and PDEs, and we also present some calculations which are useful for the numerical experiments in Sec.5. In Sec.4, we introduce our deep learning algorithm for solving fFBSDEs, which is based on recurrent neural networks and which we refer to as the RNN-BSDE method. In Sec.5, we give numerical experiments in which some parabolic PDEs are solved by the RNN-BSDE method, and we compare our method with other methods that are mainly designed for solving BSDEs rather than fractional BSDEs.
2 Wick calculus
We now review some results from the applications of Malliavin calculus to stochastic analysis. For details, one can learn more about stochastic calculus for Brownian motions from [13][17] and about its version for fractional Brownian motions from [14][15][16]. First we make some preparations.
Fix $T>0$ and the Hurst constant $H\in(\tfrac12,1)$. Define
$$\phi(s,t)=H(2H-1)\,|s-t|^{2H-2},\qquad s,t\in[0,T]. \qquad (4)$$
Let $f:[0,T]\to\mathbb{R}$ be measurable. We say that $f$ belongs to the Hilbert space $L^2_\phi([0,T])$ if
$$|f|_\phi^2:=\int_0^T\!\!\int_0^T f(s)\,f(t)\,\phi(s,t)\,ds\,dt<\infty. \qquad (5)$$
The inner product on $L^2_\phi([0,T])$ is denoted by $\langle\cdot,\cdot\rangle_\phi$.
For any $f\in L^2_\phi([0,T])$, define $\varepsilon(f)$ as
$$\varepsilon(f):=\exp\Big(\int_0^T f(t)\,dB^H_t-\tfrac12\,|f|_\phi^2\Big). \qquad (6)$$
$\varepsilon(f)$ is called an exponential function, and we let $\mathcal{E}$ be the linear span of $\{\varepsilon(f):f\in L^2_\phi([0,T])\}$.
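For a concrete check of these definitions, the following minimal sketch (assuming the kernel $\phi(s,t)=H(2H-1)|s-t|^{2H-2}$ as reconstructed in (4)) verifies numerically that $|f|_\phi^2=T^{2H}$ for the constant function $f\equiv1$, which matches the variance of $B^H_T$ from Definition 1.1:

```python
import numpy as np

# Monte Carlo estimate of |f|_phi^2 = int_0^T int_0^T f(s) f(t) phi(s,t) ds dt
# for f = 1 on [0,T]; the exact value is T^{2H}, the variance of B^H_T.
rng = np.random.default_rng(0)
H, T, n = 0.8, 2.0, 10**6                            # hypothetical values

def phi(s, t):
    return H * (2 * H - 1) * np.abs(s - t) ** (2 * H - 2)

s, t = rng.uniform(0, T, n), rng.uniform(0, T, n)    # uniform samples on [0,T]^2
estimate = T**2 * np.mean(phi(s, t))                 # area of the square times the mean of phi
print(estimate, T**(2 * H))                          # both close to 2^1.6 = 3.03...
```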
2.1 Wick integration
Consider the fractional white noise probability space, denoted $(\mathcal{S}'(\mathbb{R}),\mathcal{F},\mu_\phi)$, where $\mathcal{S}'(\mathbb{R})$ is the dual space of the Schwartz space and the probability measure $\mu_\phi$ exists by the Bochner-Minlos theorem (see e.g. [18]). By the fractional Wiener-Itô chaos expansion theorem [14], every square-integrable random variable on this space admits a chaos expansion (see e.g. [15]). It is then possible to define the fractional Hida distribution space, denoted $(\mathcal{S})^{*}_{H}$, consisting of formal chaos expansions with finite negative norm as described in [15][16].
The Wick product $F\diamond G$ is defined for elements $F$ and $G$ that admit the fractional Wiener-Itô chaos expansion. For this paper it is sufficient to know that the fractional Brownian motion has a derivative, the fractional white noise $W^H_t=\frac{dB^H_t}{dt}$, which lives in $(\mathcal{S})^{*}_{H}$, and that the linear span $\mathcal{E}$ of the exponential functions is dense in $L^2(\mu_\phi)$. We then have
Definition 2.1 (Wick integration).
Consider a process $Y:[0,T]\to(\mathcal{S})^{*}_{H}$ and an arbitrary partition of $[0,T]$ denoted $\pi:0=t_0<t_1<\dots<t_N=T$. Then the following Riemann sum
$$\sum_{k=0}^{N-1}Y_{t_k}\diamond\big(B^H_{t_{k+1}}-B^H_{t_k}\big) \qquad (7)$$
is well-defined. Denote $|\pi|=\max_k(t_{k+1}-t_k)$; if the sums converge in $(\mathcal{S})^{*}_{H}$ as $|\pi|\to0$, we define the Wick integral $\int_0^T Y_t\,dB^H_t$ as
$$\int_0^T Y_t\,dB^H_t=\lim_{|\pi|\to0}\sum_{k=0}^{N-1}Y_{t_k}\diamond\big(B^H_{t_{k+1}}-B^H_{t_k}\big). \qquad (8)$$
2.2 Stochastic derivative
The stochastic derivative, whose operator is denoted $D_s$, is defined in [13][14]. Let $D^\phi_s$ (defined in [14]) be an analogue of the directional derivative; it is defined as
$$D^\phi_s F=\int_0^T\phi(s,u)\,D_u F\,du,$$
where $F$ is a random variable for which the right-hand side exists and $s\in[0,T]$.
The following results on the stochastic derivative are used in this paper.
Lemma 2.2 (The chain rule).
Let $g\in C^1(\mathbb{R})$ and let $F$ be a random variable such that $D^\phi_s F$ exists; then
$$D^\phi_s\,g(F)=g'(F)\,D^\phi_s F. \qquad (10)$$
The proof is an analogue of the proof of Lemma 3.6 of [19].
Theorem 2.3 (Itô formula for Wick integration).
Let $\eta_t=\int_0^t F_u\,dB^H_u$, $t\in[0,T]$, where the process $F$ satisfies the regularity and integrability conditions of [14][15] under which this Wick integral exists. Let $f(t,x)$ be a function having a continuous first derivative in its first variable and a continuous second derivative in its second variable, and assume that these derivatives are bounded. Moreover, it is assumed that $F_s\,D^\phi_s\eta_s$, $s\in[0,T]$, is integrable and that $\frac{\partial f}{\partial x}(s,\eta_s)\,F_s$, $s\in[0,T]$, is Wick integrable. Then, for $0\le t\le T$,
$$f(t,\eta_t)=f(0,0)+\int_0^t\frac{\partial f}{\partial s}(s,\eta_s)\,ds+\int_0^t\frac{\partial f}{\partial x}(s,\eta_s)\,F_s\,dB^H_s+\int_0^t\frac{\partial^2 f}{\partial x^2}(s,\eta_s)\,F_s\,D^\phi_s\eta_s\,ds. \qquad (11)$$
Proposition 2.4.
If $F$ is a random variable such that $D^\phi_s F$ exists and is integrable on $[u,v]\subseteq[0,T]$, then
$$F\diamond\big(B^H_v-B^H_u\big)=F\,\big(B^H_v-B^H_u\big)-\int_u^v D^\phi_s F\,ds. \qquad (12)$$
If, as $|\pi|\to0$, the Riemann sums $\sum_{k}Y_{t_k}\big(B^H_{t_{k+1}}-B^H_{t_k}\big)$ converge in $L^2(\mu_\phi)$ to the same limit for all partitions $\pi$ satisfying $|\pi|\to0$, then this limit is called the stochastic integral of Stratonovich type and is denoted by $\int_0^T Y_t\,\delta B^H_t$.
In view of Proposition 2.4, we have
Theorem 2.5.
For any process $Y$ for which both sides below are well-defined, the following equality is satisfied:
$$\int_0^T Y_t\,\delta B^H_t=\int_0^T Y_t\,dB^H_t+\int_0^T D^\phi_t Y_t\,dt. \qquad (13)$$
3 Fractional Backward SDEs and systems of PDEs
Consider the fractional white noise space (the multidimensional presentation is an analogue of the one-dimensional case in [20]) and the fFBSDEs (2) given in Sec.1, where the coefficients of the forward equation satisfy the conditions of Theorem 2.3 and the terminal value of the backward equation is measurable with respect to the terminal $\sigma$-algebra.
We want to link fractional backward SDEs with PDEs. First, we consider the following system of PDEs,

(14)

for an unknown function $u(t,x)$ on $[0,T]\times\mathbb{R}^d$. We have the following theorem.
Theorem 3.1.
Remark 1.
From the proof, we can obtain
(16)
It is worth considering the example of geometric fractional Brownian motion, which solves the fractional SDE
$$dX_t=\mu X_t\,dt+\sigma X_t\,dB^H_t,\qquad X_0=x_0, \qquad (17)$$
where $x_0$, $\mu$ and $\sigma$ are constants.
Proposition 3.2.
The solution of (17) is $X_t=x_0\exp\big(\mu t+\sigma B^H_t-\tfrac12\sigma^2 t^{2H}\big)$, i.e. the Wick exponential $X_t=x_0\exp^{\diamond}\big(\mu t+\sigma B^H_t\big)$.
Proof.
Using Wick calculus, we compute

(18)
∎
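As a quick sanity check of the closed form reconstructed in Proposition 3.2, note that it implies $\mathbb{E}[X_T]=x_0e^{\mu T}$ for any $H$. A minimal Monte Carlo sketch, with hypothetical parameter values, confirms this numerically:

```python
import numpy as np

# X_T = x0 * exp(mu*T + sigma*B^H_T - 0.5*sigma^2*T^{2H}), with B^H_T ~ N(0, T^{2H}).
# Its expectation should be x0 * exp(mu*T), independently of H.
rng = np.random.default_rng(0)
x0, mu, sigma, H, T = 1.0, 0.06, 0.4, 0.75, 1.0      # hypothetical values

bHT = rng.normal(0.0, T**H, size=10**6)              # the std of B^H_T is T^H
XT = x0 * np.exp(mu * T + sigma * bHT - 0.5 * sigma**2 * T**(2 * H))
print(XT.mean(), x0 * np.exp(mu * T))                # both close to 1.0618
```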
Proposition 3.3.
Let $s\le t$. The solution of (17) has the derivative
$$D^\phi_s X_t=\sigma X_t\int_0^t\phi(s,u)\,du;\qquad\text{in particular,}\quad D^\phi_t X_t=\sigma H t^{2H-1}X_t.$$
Proof.
Corollary 3.4.
4 RNN-BSDE method
Before we start to build our network, we apply a time discretization to the BSDEs (3). Consider the partition $\pi:0=t_0<t_1<\dots<t_N=T$; for any $t_k$ in $\pi$, from Definition 1.1 and Definition 2.1 it holds that

(21)

Moreover, to deal with the Wick product, we use Proposition 2.4 and obtain

(22)
The approximation scheme is still incomplete, because the stochastic-derivative term appearing in (22) is unknown: it involves the process we need to find when solving the fBSDEs. Of course, one idea would be to construct another neural network to approximate this term, just like what we do for approximating $Z$, which we introduce soon. We prefer another way: from Theorem 3.1 and (1), we can consider $Y_t$ as $u(t,X_t)$, so in view of Lemma 2.2 the stochastic derivative can be expressed through the derivatives of $u$ and of $X$. Moreover, if $X$ is the solution of (17), then in view of Proposition 3.3, $D^\phi_t X_t=\sigma H t^{2H-1}X_t$. Then we can rewrite (22) accordingly, where the derivative of $u$ is computed by automatic differentiation.
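To carry out this discretization in practice, sample paths of $B^H$ on the time grid are needed. A minimal sketch (not the authors' implementation) that samples such paths by a Cholesky factorization of the covariance matrix stated in Definition 1.1 is:

```python
import numpy as np

def fbm_paths(n_paths, N, T, H, rng):
    """Sample fBM paths at t_0 = 0 < t_1 < ... < t_N = T via Cholesky factorization."""
    t = np.linspace(T / N, T, N)                     # t_1, ..., t_N
    cov = 0.5 * (t[:, None] ** (2 * H) + t[None, :] ** (2 * H)
                 - np.abs(t[:, None] - t[None, :]) ** (2 * H))
    L = np.linalg.cholesky(cov)
    paths = rng.standard_normal((n_paths, N)) @ L.T  # values of B^H at t_1, ..., t_N
    return np.concatenate([np.zeros((n_paths, 1)), paths], axis=1)

rng = np.random.default_rng(0)
B = fbm_paths(n_paths=256, N=50, T=1.0, H=0.75, rng=rng)
dB = np.diff(B, axis=1)                              # the increments used in (21)-(22)
```

For large $N$, faster exact samplers (for example the circulant-embedding method) can replace the Cholesky factorization, but its cost is negligible for moderate grid sizes.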
4.1 Main ideas of the RNN-BSDE method
In our work, we develop an algorithm for solving fractional BSDEs based on the deep BSDE method [9] and refer to it as the RNN-BSDE method. The reason why it is necessary to develop a new algorithm is that, as is well known, fractional Brownian motions are not Markov processes. Besides, fBMs have the property of long-range dependence if $H>\tfrac12$ and short-range dependence if $H<\tfrac12$ (see e.g. [16]); this means that the increments of $B^H$ are correlated whenever $H\neq\tfrac12$. When we approximate $Z_{t_k}$, it will not be satisfactory to use only the information at time $t_k$ in the input layer. Instead, we want to make full use of all the information before time $t_k$, so a recurrent neural network is a better choice than a feedforward neural network.
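The slow decay of the increment correlations can be made explicit with the standard autocovariance of unit-step fBM increments, $\gamma(n)=\tfrac12\big(|n+1|^{2H}+|n-1|^{2H}-2|n|^{2H}\big)$; the short sketch below shows that for $H>\tfrac12$ the correlations stay positive and decay slowly, while for $H=\tfrac12$ they vanish:

```python
# Autocovariance of unit-step fBM increments (fractional Gaussian noise).
def gamma(n, H):
    return 0.5 * (abs(n + 1) ** (2 * H) + abs(n - 1) ** (2 * H) - 2 * abs(n) ** (2 * H))

for H in (0.5, 0.75):
    print(H, [round(gamma(n, H), 4) for n in range(1, 6)])
# H = 0.5: all zero (independent increments); H = 0.75: positive and slowly decaying.
```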
The recurrent neural network (RNN) structure [3][4], a classical architecture for dealing with time series, has the following advantages for solving fractional BSDEs.

1. An RNN can make full use of the information before time $t_k$. The RNN processes sequence data stored in rank-3 tensors of shape (samples, timesteps, features). The computation of the hidden state in the recurrent unit, which is the fundamental building block of an RNN, can be expressed as
$$h_{t_k}=\sigma\big(W x_{t_k}+U h_{t_{k-1}}+b\big), \qquad (23)$$
where $W$ and $U$ are weight matrices, $b$ is a bias vector, $h_{t_k}$ is the hidden state of the hidden layer at time $t_k$, and $\sigma$ is the activation function. Iterating (23) over each time step shows that $h_{t_k}$ depends on all of the inputs $x_{t_0},\dots,x_{t_k}$; clearly, the RNN satisfies what we required above.
2. For $N$ time nodes, the deep BSDE method uses $N-1$ separate FNNs, which means that the larger $N$ is, the more network parameters need to be determined, which may become a computational burden. If we use an RNN structure, however, there is always a single RNN no matter how large $N$ is, because the hidden layer has a recurrent structure, so that for every timestep $t_k$ the weight matrices and biases are common and reused within an epoch. As mentioned in [9], for $N$ time nodes, one $d$-dimensional input layer, two hidden layers and one $d$-dimensional output layer, the number of parameters to be trained grows proportionally to $N$ for deep BSDE, whereas it stays constant in $N$ for RNN-BSDE using a stacked RNN; a rough count is sketched below.
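A back-of-the-envelope count (with hypothetical layer sizes, not the exact figures of the original comparison) illustrates item 2:

```python
# Parameter counts (biases included) for d-dimensional input/output,
# h-dimensional hidden layers and N time steps; hypothetical sizes.
d, h, N = 100, 110, 40

# Deep BSDE: one feedforward sub-network (two hidden layers) per interior time step.
fnn = (d * h + h) + (h * h + h) + (h * d + d)
deep_bsde_params = (N - 1) * fnn                # grows linearly with N

# RNN-BSDE: a single stacked RNN whose weights are shared across all time steps.
rnn = (d * h + h * h + h) + (h * h + h * h + h) + (h * d + d * d + d)
rnn_bsde_params = rnn                           # independent of N

print(deep_bsde_params, rnn_bsde_params)        # roughly 1.3e6 versus 7e4
```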
The main idea of the RNN-BSDE method can be expressed as follows.
(24)

(25)

(26)

(27)
In (24), the sub-neural network is an RNN instead of a family of FNNs, and to make the RNN more effective we choose a stacked RNN rather than a simple RNN, whose structure is shown in Fig.1.

The whole flow in the direction of forward propagation can be seen in Fig.2. Besides, when applying the deep BSDE method, batch normalization [21] is adopted right after each matrix multiplication and before activation. Since batch normalization is not well suited to RNNs, we instead choose layer normalization [22]. Finally, we provide the pseudocode of the RNN-BSDE method as follows:
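A minimal TensorFlow sketch of such a training loop is given below. It is only an illustration under simplifying assumptions, not the authors' exact pseudocode: the driver `f`, the terminal function `g` and the problem sizes are hypothetical placeholders, and the Wick correction term of (22) is abbreviated to a single variable.

```python
import tensorflow as tf

# Hypothetical problem data.
d, N, T = 10, 20, 1.0
dt = T / N
def f(t, x, y, z): return -0.05 * y                          # placeholder driver
def g(x): return tf.reduce_max(x, axis=1, keepdims=True)     # placeholder terminal value

z_net = tf.keras.Sequential([                 # stacked RNN approximating Z along the path
    tf.keras.layers.SimpleRNN(d + 10, return_sequences=True),
    tf.keras.layers.SimpleRNN(d, return_sequences=True),
])
z_net.build(input_shape=(None, N, d))
y0 = tf.Variable(tf.random.uniform([1, 1]))                  # trainable initial value Y_0
opt = tf.keras.optimizers.Adam(1e-3)

def loss_fn(x_path, dbh):
    # x_path: (batch, N+1, d) forward paths; dbh: (batch, N, d) fBM increments.
    x_path = tf.cast(x_path, tf.float32)
    dbh = tf.cast(dbh, tf.float32)
    z_all = z_net(x_path[:, :-1, :])                         # Z_{t_k} from the path up to t_k
    y = y0 * tf.ones([tf.shape(x_path)[0], 1])
    for k in range(N):
        z = z_all[:, k, :]
        wick_corr = 0.0                                      # stands in for the D^phi term of (22)
        y = (y - f(k * dt, x_path[:, k, :], y, z) * dt
               + tf.reduce_sum(z * dbh[:, k, :], axis=1, keepdims=True) - wick_corr)
    return tf.reduce_mean((y - g(x_path[:, -1, :])) ** 2)    # terminal mismatch used as the loss

def train_step(x_path, dbh):
    with tf.GradientTape() as tape:
        loss = loss_fn(x_path, dbh)
    variables = z_net.trainable_variables + [y0]
    opt.apply_gradients(zip(tape.gradient(loss, variables), variables))
    return loss
```

In an actual run, `x_path` and `dbh` would be mini-batches of simulated forward paths and fBM increments (for example generated as in the sketch of Sec.4), and `train_step` would be iterated with Adam until the loss stabilises.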

4.2 More details of the RNN-BSDE method and an example of solving fBSDEs
In this section, we describe in more detail how to set up the neural network of the RNN-BSDE method so as to make the algorithm more practical. The simplest traditional RNN is convenient for explaining our main idea, but it is too weak to solve some fBSDEs, so it is necessary to give more detail and to apply some more practical types of RNN.
Suppose the input samples are stored as a rank-3 tensor whose dimensions are, respectively, the number of sample paths in the whole valid set (we denote the mini-batch size by $m$), the number of time nodes, and the dimension $d$ of the inputs (which is regarded as the number of features in deep learning). The stacked RNN used to approximate $Z$ has at least four layers, including one $d$-dimensional input layer, at least two hidden layers and one $d$-dimensional output layer. The hidden layers and the output layer are all RNN layers composed of recurrent units; each layer has an input weight matrix and a recurrent weight matrix. There is no activation function directly after each matrix multiplication; instead we have
$$h^{(l)}_{t_k}=\tanh\Big(\mathrm{LN}\big(W^{(l)}h^{(l-1)}_{t_k}+U^{(l)}h^{(l)}_{t_{k-1}}\big)\Big), \qquad (28)$$
where $\mathrm{LN}$ denotes layer normalization, $\tanh$ is the hyperbolic tangent function and $l$ indexes the layers. (28) can be understood more easily together with Fig.1. All weights and the parameters of layer normalization are randomly initialised at the start of each run.
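One way to realise the layer-normalised recurrence (28) is a custom Keras cell, sketched below; the hidden width $d+10$ is a hypothetical choice borrowed from the deep BSDE setup, and the bias is absorbed into the shift parameter of the layer normalization.

```python
import tensorflow as tf

class LayerNormSimpleRNNCell(tf.keras.layers.Layer):
    """Recurrent unit computing h_k = tanh(LN(W x_k + U h_{k-1})), cf. (28)."""
    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.state_size = units
        self.ln = tf.keras.layers.LayerNormalization()

    def build(self, input_shape):
        self.W = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer="glorot_uniform", name="W")
        self.U = self.add_weight(shape=(self.units, self.units),
                                 initializer="orthogonal", name="U")

    def call(self, inputs, states):
        h_prev = states[0]
        pre = tf.matmul(inputs, self.W) + tf.matmul(h_prev, self.U)
        h = tf.tanh(self.ln(pre))               # layer normalization before the activation
        return h, [h]

d = 10                                          # hypothetical dimension
stacked_rnn = tf.keras.Sequential([             # two hidden layers plus a d-dimensional output layer
    tf.keras.layers.RNN(LayerNormSimpleRNNCell(d + 10), return_sequences=True),
    tf.keras.layers.RNN(LayerNormSimpleRNNCell(d + 10), return_sequences=True),
    tf.keras.layers.RNN(LayerNormSimpleRNNCell(d), return_sequences=True),
])
```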
If one worries that a stacked RNN is still not powerful enough to solve most fBSDEs, a special type of RNN named the Long Short-Term Memory network (LSTM) [23] can be used. For $H>\tfrac12$, the fBM has the long memory property, and it is well known that a traditional RNN cannot handle "long-term dependencies" in practice, whereas the LSTM remains useful with long-term dependencies and mitigates the vanishing gradient problem of the RNN. Since LSTMs are a kind of RNN, we can change an RNN into an LSTM simply by replacing the RNN units in the network with LSTM units, which are the fundamental building blocks of an LSTM. An LSTM cell is composed of a cell state and three gates: an input gate, a forget gate and an output gate. In the RNN-BSDE method, we choose to use an LSTM with layer normalization which has a structure similar to the stacked RNN illustrated in Fig.1, i.e. multiple LSTM layers as hidden layers and one extra $d$-dimensional LSTM layer before the output.
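Concretely, the swap only touches the recurrent units. As a minimal illustration, a plain (non-normalised) LSTM version of the sub-network can be sketched as below; a layer-normalised LSTM would require a custom cell analogous to the one shown above. The width $d+10$ is again a hypothetical choice.

```python
import tensorflow as tf

d = 10                                          # hypothetical dimension
lstm_stack = tf.keras.Sequential([
    tf.keras.layers.RNN(tf.keras.layers.LSTMCell(d + 10), return_sequences=True),
    tf.keras.layers.RNN(tf.keras.layers.LSTMCell(d + 10), return_sequences=True),
    tf.keras.layers.RNN(tf.keras.layers.LSTMCell(d), return_sequences=True),
])
```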
4.3 Convergence analysis
In this part, we provide an a posteriori estimate of the numerical solution, and this estimate justifies the convergence of the RNN-BSDE method. First, we make the following assumptions.
Assumptions 4.1.
Assumptions 4.2.
For any , and ,
where is a given positive constant.
Assumptions 4.3.
For any and any , satisfying ,
where is a given positive constant and denote .
Assumptions 4.4.
Consider the following fFBSDE system with state

(29)

The aim is to minimize the objective functional

(30)

under the control.
First, noting (9), we provide
Theorem 4.5.
The proof of Theorem 4.5 is an analogue of that of Theorem 2.1 in [24]; one only needs to note that the Itô formula should be replaced by the Itô formula for Wick integration.
In view of Theorem 4.5, the problem of solving (3) can be turned into the stochastic control problem for the system (29), so it is a reasonable choice to apply deep learning to this kind of problem.
Then, we need to estimate the error resulting from the time discretization. From now on, we mainly consider the one-dimensional case for brevity.
We write $C^\alpha$ for an $\alpha$-Hölder space and $\|\cdot\|_\alpha$ for the $\alpha$-Hölder norm.
For any constant, define
Let the state be given by

(32)

with the aim of minimizing the objective functional

(33)

under the control. For any partition $\pi$, define
Lemma 4.6.
Assume the assumptions above hold; then, for a sufficiently large constant, for any partition $\pi$ it follows that

(34)

moreover,

(35)

and especially,

(36)

with some constant not depending on the partition.
Proof.
In view of Lemma 19 in [25], it follows that

(37)

Then

(38)

In view of Theorem 2.5 and (9), for the constant large enough,

(39)

Considering (32) and (29), we have

(40)

Since the constant was chosen large enough, taking the supremum in (40) gives

(41)

and especially,

(42)

Then

(43)

and, in view of (43),

(44)

The remaining bounds follow similarly, and the proof is finished. ∎
Finally, we can give
Theorem 4.7.
5 Numerical examples
In this section, we present some experiments to verify whether the RNN-BSDE method works well on fractional BSDEs. We mainly apply the RNN-BSDE algorithm based on a multi-layer LSTM in our experiments, which we refer to as LSTM-BSDE for brevity. In addition, we refer to the RNN-BSDE algorithm based on a stacked RNN as mRNN-BSDE.
5.1 Fractional Black-Scholes equation
In this subsection, we consider an extension of the famous Black-Scholes equation [26], which is widely applied in the field of finance. In view of Corollary 3.4, the fractional Black-Scholes equation in the one-dimensional case has the form
$$\frac{\partial u}{\partial t}(t,x)+r\,x\,\frac{\partial u}{\partial x}(t,x)+H\sigma^{2}t^{2H-1}x^{2}\,\frac{\partial^{2}u}{\partial x^{2}}(t,x)-r\,u(t,x)=0, \qquad (47)$$
where $r$ is a constant known as the interest rate. If $H=\tfrac12$, (47) is exactly the famous standard Black-Scholes equation.
Adopting the terminal condition $u(T,x)=(x-K)^{+}$ with strike $K$, solving (47) is equivalent to solving the pricing problem of a European call option. This is not difficult and is similar to what is done for the standard Black-Scholes equation: by means of a variable substitution, (47) is changed into a typical heat equation, and it can be verified that the solution of (47) is
$$u(t,x)=x\,\Phi(d_{1})-Ke^{-r(T-t)}\,\Phi(d_{2}), \qquad (48)$$
where $\Phi$ is the standard normal distribution function and
$$d_{1}=\frac{\ln(x/K)+r(T-t)+\tfrac12\sigma^{2}\big(T^{2H}-t^{2H}\big)}{\sigma\sqrt{T^{2H}-t^{2H}}},\qquad d_{2}=d_{1}-\sigma\sqrt{T^{2H}-t^{2H}}.$$
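A short implementation of the closed form (48), as reconstructed above and with hypothetical parameter values in the example calls, can serve as a reference value for the experiments below; setting $H=\tfrac12$ recovers the classical Black-Scholes price:

```python
import numpy as np
from scipy.stats import norm

def fractional_bs_call(t, x, K, r, sigma, T, H):
    """European call price under the fractional Black-Scholes formula (48)."""
    v = sigma * np.sqrt(T ** (2 * H) - t ** (2 * H))    # effective volatility over [t, T]
    d1 = (np.log(x / K) + r * (T - t) + 0.5 * v ** 2) / v
    d2 = d1 - v
    return x * norm.cdf(d1) - K * np.exp(-r * (T - t)) * norm.cdf(d2)

# Hypothetical parameter values, for illustration only.
print(fractional_bs_call(t=0.0, x=100.0, K=100.0, r=0.05, sigma=0.4, T=1.0, H=0.75))
print(fractional_bs_call(t=0.0, x=100.0, K=100.0, r=0.05, sigma=0.4, T=1.0, H=0.5))
```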
Our goal is to approximate $u(0,x_0)$ by the deep learning method and to compare the LSTM-BSDE method with other methods designed for solving high-dimensional PDEs and SDEs, in order to verify whether the RNN-BSDE method works well on fractional BSDEs. The common setting for the LSTM-BSDE method is as follows. The multi-layer LSTM used in LSTM-BSDE consists of one input layer, two hidden layers and one output layer; the input layer and the output layer are $d$-dimensional and the two hidden layers share a common, larger dimension. In every hidden layer and in the output layer, Xavier initialisation [27] is used to initialise the input weights, orthogonal initialisation is used to initialise the weights of the recurrent connections, and the biases are initialised to zero (these are exactly the default settings in Keras for LSTM units). The normal initialisation and the uniform initialisation are used to initialise the scale and shift parameters of layer normalization. Layer normalization is applied before all the activation functions in the LSTM units of all hidden layers, and before the output layer.
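For illustration, a sub-network with these initialisations can be sketched with standard Keras LSTM layers, as below. The hidden width $d+10$ is a hypothetical choice, and applying `LayerNormalization` between the layers is a simplification of the layer normalization inside the LSTM units described above, which would require a custom cell.

```python
import tensorflow as tf

d = 1                                             # hypothetical input dimension
lstm_bsde_net = tf.keras.Sequential([
    tf.keras.Input(shape=(None, d)),              # (timesteps, features)
    tf.keras.layers.LSTM(d + 10, return_sequences=True,
                         kernel_initializer="glorot_uniform",    # Xavier for input weights
                         recurrent_initializer="orthogonal",
                         bias_initializer="zeros"),
    tf.keras.layers.LayerNormalization(),
    tf.keras.layers.LSTM(d + 10, return_sequences=True),
    tf.keras.layers.LayerNormalization(),
    tf.keras.layers.LSTM(d, return_sequences=True),              # d-dimensional output layer
])
```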
The setting for the stacked RNN used in the RNN-BSDE method is the same as the one for LSTM-BSDE. The methods we choose for comparison with LSTM-BSDE are the deep splitting method [28] and the DBDP1 method [29]. The neural networks of these methods are FNN-based unless stated otherwise, and we set these FNNs up in the same way as the one for the deep BSDE method described in [9].
As for the optimizer, we choose Adam [30] for all methods; it is known to be effective, which is also confirmed by our experimental results.
5.1.1 Results in the one-dimensional case ($d=1$)
Set the dimension $d=1$. The time horizon, the number of time nodes, the parameters of (17), (47) and (48), the learning rate, the valid set size and the mini-batch size are fixed. To approximate $u(0,x_0)$, there are 5 independent runs for each of the methods.
For comparison, we first consider a trivial case where $H=\tfrac12$, i.e. (17) is a standard SDE driven by Brownian motion, and the explicit solution in (48) is around 7.1559. It is not surprising to observe in Fig.3 that the approximations of $u(0,x_0)$ from all algorithms are close to the true value, and that the methods using FNNs (except deep splitting) perform slightly better than those using RNNs, since Brownian motion has the well-known Markov property: to forecast the future, we only need the information at the present moment, without considering what happened in the past, which makes the RNN structure lose its advantage.

Table 1: one-dimensional case, $H=\tfrac12$.

| Method | mean of $u(0,x_0)$ | std. dev. | relative error | std. dev. of rel. error | runtime (s) |
|---|---|---|---|---|---|
| deep BSDE | 7.1556 | 4.52e-3 | 4.71e-4 | 3.49e-4 | 277 |
| LSTM-BSDE | 7.1439 | 2.30e-2 | 2.75e-3 | 2.03e-3 | 403 |
| mRNN-BSDE | 7.1491 | 6.81e-3 | 1.23e-3 | 6.19e-4 | 180 |
| DS | 7.1472 | 1.64e-2 | 2.03e-3 | 1.44e-3 | 779 |
| DBDP1 | 7.1524 | 1.91e-2 | 1.89e-3 | 1.77e-3 | 595 |
Focusing on the numerical experiments with $H\neq\tfrac12$, the explicit solution in (48) is around 6.2968. In this case, the LSTM-BSDE and mRNN-BSDE methods begin to make a difference: the value of $u(0,x_0)$ given by the LSTM-BSDE method is close to the true value, while the deep BSDE method and DBDP1 both give results that do not converge to the true value after 10000 iterations. The interesting thing is that deep splitting is also an effective method for solving fBSDEs and the corresponding PDEs even without an RNN. The reason can be seen from the idea and the loss functions of the deep splitting method introduced in [28]: such loss functions help us avoid directly estimating the stochastic integral with respect to $B^H$.
Table 2: one-dimensional case, $H\neq\tfrac12$.

| Method | mean of $u(0,x_0)$ | std. dev. | relative error | std. dev. of rel. error | runtime (s) |
|---|---|---|---|---|---|
| deep BSDE | 4.2473 | 4.84e-3 | 0.3255 | 7.68e-4 | 286 |
| LSTM-BSDE | 6.2048 | 2.90e-2 | 0.0146 | 4.60e-3 | 402 |
| mRNN-BSDE | 6.1989 | 2.39e-3 | 0.0153 | 3.79e-4 | 181 |
| DS | 6.1819 | 1.77e-3 | 0.0184 | 2.81e-4 | 823 |
| DBDP1 | 4.3427 | 8.85e-3 | 0.3101 | 1.41e-3 | 619 |
5.1.2 Results in the high-dimensional case
In the high-dimensional case, the fractional Black-Scholes equation has the form

(49)

In this case, there is no known analytical solution, in contrast to the one-dimensional case.
For the high-dimensional case we fix the dimension $d$, the time horizon and the number of time nodes, together with the equation parameters, the learning rate, the valid set size and the mini-batch size. To approximate $u(0,x_0)$, there are 5 independent runs for each of the methods.
In principle, we still try our best to keep the hyperparameters the same for all algorithms, but we can hardly ignore the differences between the methods, especially in the high-dimensional case. For DBDP1, we keep the number of neurons in each hidden layer unchanged, because with a larger width the approximation of $u(0,x_0)$ is slow to converge under the chosen learning rate. Although we could instead increase the learning rate for DBDP1, after comparing the results we finally chose to keep the number of neurons in each hidden layer the same as in the one-dimensional case.
First, we again compare these methods with $H=\tfrac12$ in the high-dimensional case. Similar to the one-dimensional case with $H=\tfrac12$, the values of $u(0,x_0)$ are all close whichever method we use, as shown in Fig.4 and Table 3.

Table 3: high-dimensional case, $H=\tfrac12$.

| Method | mean of $u(0,x_0)$ | std. dev. | | runtime (s) |
|---|---|---|---|---|
| deep BSDE | 39.3409 | 0.0246 | 352 | 799 |
| LSTM-BSDE | 39.3214 | 0.0274 | 1144 | 2771 |
| mRNN-BSDE | 39.3441 | 0.0397 | 389 | 894 |
| DS | 39.3275 | 0.0405 | 1521 | 3041 |
| DBDP1 | 39.3062 | 0.0099 | 1283 | 2571 |
Setting $H\neq\tfrac12$, it can be observed from Fig.4 and Table 4 that LSTM-BSDE, mRNN-BSDE and deep splitting give close values of $u(0,x_0)$ with the same level of standard deviation, while DBDP1 gives a value somewhat farther from them with a higher standard deviation, and deep BSDE does not offer a converged value after 10000 iterations in this case.
Table 4: high-dimensional case, $H\neq\tfrac12$ (NC = not converged).

| Method | mean of $u(0,x_0)$ | std. dev. | | runtime (s) |
|---|---|---|---|---|
| deep BSDE | NC | NC | NC | NC |
| LSTM-BSDE | 30.6935 | 0.0165 | 614 | 3025 |
| mRNN-BSDE | 30.6828 | 0.0225 | 301 | 1289 |
| DS | 30.7005 | 0.0239 | 1750 | 4284 |
| DBDP1 | 29.9831 | 0.1045 | 2217 | 3820 |
5.2 Nonlinear fractional Black-Scholes equation with different interest rates for borrowing and lending
Next, we give some numerical experiments for calculating approximate solutions of some nonlinear parabolic PDEs using mRNN-BSDE and LSTM-BSDE. Compared with classical linear Black-Scholes equations, nonlinear Black-Scholes equations rest on more realistic assumptions, and there are many types of them. Here we consider a nonlinear fractional Black-Scholes equation with different interest rates for borrowing and lending, which is

(50)

and

(51)
The equation parameters are fixed, and so are the learning rate, the valid set size and the mini-batch size. For both the mRNN-BSDE method and the LSTM-BSDE method, we set one $d$-dimensional input layer, two hidden layers and one $d$-dimensional output layer. To approximate $u(0,x_0)$, there are 5 independent runs for each of the methods. The true value of $u(0,x_0)$ for Equation (50) is replaced by a reference value calculated by deep splitting.

Table 5: nonlinear fractional Black-Scholes equation.

| Method | mean of $u(0,x_0)$ | std. dev. | relative error | std. dev. of rel. error | runtime (s) |
|---|---|---|---|---|---|
| LSTM-BSDE | 14.7947 | 2.29e-2 | 1.93e-3 | 1.55e-3 | 3726 |
| mRNN-BSDE | 14.6800 | 1.42e-2 | 9.52e-3 | 9.60e-4 | 1480 |
5.3 A semilinear heat equation with variable coefficients
In this subsection, we consider a type of semilinear heat equation with variable coefficients of the form

(52)

and

(53)
The equation parameters are fixed, and so are the learning rate, the valid set size and the mini-batch size. For both the mRNN-BSDE method and the LSTM-BSDE method, we set one $d$-dimensional input layer, two hidden layers and one $d$-dimensional output layer. To approximate $u(0,x_0)$, there are 5 independent runs for each of the methods. The true value of $u(0,x_0)$ for Equation (52) is replaced by a reference value calculated by deep splitting.

Table 6: semilinear heat equation with variable coefficients.

| Method | mean of $u(0,x_0)$ | std. dev. | relative error | std. dev. of rel. error | runtime (s) |
|---|---|---|---|---|---|
| LSTM-BSDE | 0.57731 | 7.80e-4 | 5.90e-2 | 1.43e-3 | 2517 |
| mRNN-BSDE | 0.57496 | 1.01e-3 | 5.46e-2 | 1.84e-3 | 1022 |
6 Conclusion
Fixing $H\in(\tfrac12,1)$, in this paper we have discussed the relationship between systems of PDEs and fFBSDEs in which the stochastic integral is understood in the Wick sense. We have given (20), the PDE corresponding to the fFBSDEs whose forward SDE (17) is solved by the geometric fractional Brownian motion, a process that is significant in finance. Moreover, we have developed the RNN-BSDE method, designed to solve fBSDEs. From the numerical experiments, it can be observed that deep splitting and the RNN-BSDE method are effective for solving fractional BSDEs and the corresponding PDEs compared with other methods. LSTM-BSDE and mRNN-BSDE show similar performance in solving fBSDEs according to our numerical experiments, except that LSTM-BSDE costs more time. In particular, if one worries about the problem of "long-term dependencies", LSTM-BSDE may be a good choice regardless of the time cost.
References
- [1] A. Waibel, T. Hanazawa, G. Hinton, K. Shikano, K. Lang, Phoneme recognition using time-delay neural networks, IEEE Transactions on Acoustics, Speech, and Signal Processing 37 (3) (1989) 328–339. doi:10.1109/29.21701.
- [2] D. E. Rumelhart, G. E. Hinton, R. J. Williams, Learning representations by back-propagating errors, Nature 323 (6088) (1986) 533–536.
- [3] J. L. Elman, Finding structure in time, Cognitive Science 14 (2) (1990) 179–211. doi:10.1016/0364-0213(90)90002-E.
- [4] M. I. Jordan, Serial order: A parallel distributed processing approach, in: J. W. Donahoe, V. Packard Dorsel (Eds.), Neural-Network Models of Cognition, Vol. 121 of Advances in Psychology, North-Holland, 1997, pp. 471–495. doi:10.1016/S0166-4115(97)80111-2.
- [5] M. Raissi, P. Perdikaris, G. E. Karniadakis, Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, Journal of Computational Physics 378 (2019) 686–707.
- [6] N. Kovachki, Z. Li, B. Liu, K. Azizzadenesheli, K. Bhattacharya, A. Stuart, A. Anandkumar, Neural operator: Learning maps between function spaces with applications to pdes, Journal of Machine Learning Research 24 (89) (2023) 1–97.
- [7] H. Jiequn, E. Weinan, Deep learning approximation for stochastic control problems, CoRR abs/1611.07422 (2016).
- [8] S. Ji, S. Peng, Y. Peng, X. Zhang, Solving stochastic optimal control problem via stochastic maximum principle with deep learning method, Journal of Scientific Computing 93 (1) (2022) 30.
- [9] E. Weinan, J. Han, A. Jentzen, Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations, Communications in Mathematics & Statistics (2017).
- [10] E. Pardoux, S. Peng, Backward stochastic differential equations and quasilinear parabolic partial differential equations, in: B. L. Rozovskii, R. B. Sowers (Eds.), Stochastic Partial Differential Equations and Their Applications, Springer Berlin Heidelberg, Berlin, Heidelberg, 1992, pp. 200–217.
- [11] B. B. Mandelbrot, J. W. V. Ness, Fractional Brownian motions, fractional noises and applications, SIAM Review 10 (4) (1968) 422–437.
- [12] J.-F. Le Gall, Brownian motion, martingales, and stochastic calculus, Springer, 2016.
- [13] B. Øksendal, An introduction to Malliavin calculus with applications to economics, 1996.
- [14] T. E. Duncan, Y. Hu, B. Pasik-Duncan, Stochastic calculus for fractional Brownian motion. I. Theory, in: IEEE Conference on Decision & Control, 2000, pp. 582–612.
- [15] Y. Hu, B. Øksendal, Fractional white noise calculus and applications to finance, Infinite Dimensional Analysis, Quantum Probability and Related Topics 06 (01) (2003) 1–32. doi:10.1142/S0219025703001110.
- [16] Y. S. Mishura, Stochastic calculus for fractional Brownian motion and related processes, Vol. 1929, Springer Science & Business Media, 2008.
- [17] N. Agram, B. ØKsendal, Introduction to white noise, hida-malliavin calculus and applications (2019).
- [18] H. Helge, Ø. Bernt, U. Jan, Z. Tusheng, Stochastic Partial Differential Equations, Spring, 2010.
- [19] K. Aase, B. Øksendal, N. Privault, J. Ubøe, White noise generalizations of the clark-haussmann-ocone theorem with application to mathematical finance, Finance & Stochastics 4 (4) (2000) 465–496.
- [20] H. Gjessing, H. Holden, T. Lindstrøm, B. Øksendal, T. S. Zhang, The wick product (1992).
- [21] S. Ioffe, C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, CoRR abs/1502.03167 (2015). arXiv:1502.03167.
- [22] J. L. Ba, J. R. Kiros, G. E. Hinton, Layer normalization (2016). arXiv:1607.06450.
- [23] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation 9 (8) (1997) 1735–1780. doi:10.1162/neco.1997.9.8.1735.
[24]
Y. Jiang, J. Li, Convergence of
the deep bsde method for fbsdes with non-lipschitz coefficients,
Probability, Uncertainty and Quantitative Risk 6 (4) (2021) 391.
doi:10.3934/puqr.2021019.
URL http://dx.doi.org/10.3934/puqr.2021019 - [25] D. Feyel, A. D. L. Pradelle, On fractional brownian processes, Potential Analysis 10 (3) (1999) 273–288.
- [26] F. Black, M. S. Scholes, The pricing of options and corporate liabilities, Journal of Political Economy 81 (3) (1973) 637–654.
- [27] X. Glorot, Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, JMLR Workshop and Conference Proceedings (2010).
- [28] C. Beck, S. Becker, P. Cheridito, A. Jentzen, A. Neufeld, Deep splitting method for parabolic PDEs, SIAM Journal on Scientific Computing 43 (5) (2021) A3135–A3154. doi:10.1137/19M1297919.
- [29] C. Huré, H. Pham, X. Warin, Deep backward schemes for high-dimensional nonlinear PDEs, Mathematics of Computation 89 (324) (2020) 1.
- [30] D. Kingma, J. Ba, Adam: A method for stochastic optimization, Computer Science (2014).