
Deep FBSDE Neural Networks for Solving Incompressible Navier-Stokes Equation and Cahn-Hilliard Equation

Yangtao Deng [email protected] Qiaolin He [email protected] School of Mathematics, Sichuan University, Chengdu, China
Abstract

Designing efficient algorithms for solving high-dimensional partial differential equations (PDEs) has been an exceedingly difficult task for a long time, due to the curse of dimensionality. We extend the forward-backward stochastic neural networks (FBSNNs), which depend on forward-backward stochastic differential equations (FBSDEs), to solve the incompressible Navier-Stokes equation. For the Cahn-Hilliard equation, we derive a modified Cahn-Hilliard equation from a widely used stabilized scheme for the original Cahn-Hilliard equation. This equation can be written as a continuous parabolic system, to which FBSDEs can be applied, and the unknown solution is approximated by a neural network. Our method is also successfully extended to the Cahn-Hilliard-Navier-Stokes (CHNS) equation. The accuracy and stability of our methods are demonstrated in many numerical experiments, especially in high dimensions.

keywords:
forward-backward stochastic differential equation, neural network, Navier-Stokes, Cahn-Hilliard, high dimension

1 Introduction

High-dimensional nonlinear partial differential equations (PDEs) are widely used in many areas of the social and natural sciences. Due to the significant nonlinearity of nonlinear PDEs, particularly in high-dimensional cases, analytical solutions are typically difficult to obtain, so numerical solutions of these nonlinear PDEs are very important. However, due to their exponential increase in complexity, traditional approaches such as the finite difference method and the finite element method fail in high-dimensional instances. Advances in numerical algorithms for solving high-dimensional PDEs therefore receive close attention in many fields. Several numerical methods exist for solving nonlinear high-dimensional PDEs, such as the Monte Carlo method [1, 2], lattice rules [3] and sparse grid methods [4, 5]. These methods are relatively adaptable to high-dimensional problems, but they typically require substantial computational resources: the Monte Carlo method often demands a large number of sample points, while lattice rules and sparse grid methods may require finer grids or adaptive strategies. Moreover, their convergence rates are usually relatively slow, particularly in high-dimensional situations, so achieving the desired accuracy may entail a significant amount of computation.

Recently, deep neural networks (DNNs) have been used to create numerical algorithms that work well at overcoming the curse of dimensionality and successfully solve high-dimensional PDEs [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22]. Inspired by the Ritz method, the deep Ritz method (DRM) [8] was proposed to solve variational problems arising from PDEs. The deep Galerkin method (DGM) was proposed in [18] to solve high-dimensional PDEs by approximating the solution with a deep neural network trained to satisfy the differential operator, initial condition and boundary conditions. The physics-informed neural networks (PINN) are presented in [16], where the PDE is embedded into the neural network by utilizing automatic differentiation (AD). Three adaptive techniques to improve the computational performance of DNN methods for high-dimensional PDEs are presented in [21]. The authors in [12] proposed an approach for scattering problems connected with linear PDEs of the Helmholtz type that relies on DNNs to describe the forward and inverse maps. In [9], the deep backward stochastic differential equation (BSDE) method based on the nonlinear Feynman-Kac formula (see e.g. [23]) is proposed, which is used in [10] to estimate the solution of eigenvalue problems for semilinear second order differential operators.

The PINN and the deep BSDE method are two different numerical frameworks for solving general PDEs. In PINN, AD is used to avoid truncation errors and the numerical quadrature errors of variational forms, and gradient-based optimization methods update the neural network so that the loss composed of the differential equation and boundary condition residuals is reduced. The deep BSDE method treats the BSDE as a stochastic control problem with the gradient of the solution being the policy function and parameterizes the control process of the solution by DNNs. It then trains the resulting sequence of networks in a global optimization given by the prescribed terminal condition. These methods do not rely on training data provided by external algorithms, so they can be considered unsupervised learning methods. One drawback of PINN is the high computational complexity of its loss function, which includes the differential operator of the PDE to be solved. The deep BSDE method, on the other hand, does not require the computation of high order derivatives. Moreover, the loss function used by the deep BSDE method involves only simple additive calculations, so the deep BSDE method iterates faster. The deep BSDE method has made high-dimensional problems tractable, which allows us to solve high-dimensional semilinear PDEs in a reasonable amount of time. Recently, there have been several works related to the deep BSDE method, see [6, 7, 10, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36]. Based on the deep BSDE method, an improved method called FBSNNs is proposed in [6]. This method approximates the unknown solution using a fully connected feedforward neural network (FNN) and obtains the required gradient vector by applying AD. However, because the nonlinear Feynman-Kac formula is involved in the reformulation procedure, FBSNNs can only handle some specific Cauchy problems without boundary conditions. It is therefore desirable to extend the FBSNNs to other kinds of PDEs and to deal with problems with boundary conditions.

The Navier-Stokes equation is an important equation in fluid dynamics and the Cahn-Hilliard equation is widely used in multi-phase problems. These equations are difficult to solve due to their complexity. Many deep learning methods have been applied to solve these equations in one or two dimensions (see e.g. [37, 38, 39, 40, 41, 42]). However, these methods fail due to excessive complexity when the dimension is more than three. We choose to introduce the FBSNNs presented in [6] to solve these equations. We convert the incompressible Navier-Stokes equations into FBSDEs and then employ FBSNNs to solve them in two or three dimensions. We develop a suitable numerical method based on the reflection principle to deal with the Neumann boundary condition and handle the Dirichlet boundary condition using the method mentioned in [43]. We rewrite the Cahn-Hilliard equation into a system of parabolic equations by adding suitable stabilization terms, and the numerical solution of the new system is then obtained using the FBSNNs. However, when dealing with mixed boundary conditions, the above method needs to be improved: we utilize an approach similar to the one for the Dirichlet boundary case, and meanwhile add an extra error term to the final loss function for the Neumann boundary condition. The equation can also be solved with periodic boundary conditions using techniques that exploit the periodicity. Therefore, we can naturally solve the CHNS equation, which is a coupled system of the Navier-Stokes and Cahn-Hilliard equations.

The rest of this article is organized as follows. In Section 2, we introduce FBSDEs, deep BSDE method and FBSNNs method briefly. In Section 3, we present the approach to solve incompressible Navier-Stokes equations with different boundary conditions. A methodology is proposed in Section 4 to solve Cahn-Hilliard equation with different boundary conditions. In Section 5, the method to solve CHNS system is developed. Numerical experiments are given in Section 6 to verify the effectiveness of our methods. Finally, conclusions and remarks are given in Section 7.

2 FBSDEs, deep BSDE method and FBSNNs

2.1 A brief introduction of FBSDEs

The FBSDEs, in which the randomness in the BSDE is driven by a forward stochastic differential equation (SDE), are written in the general form

\left\{\begin{aligned} &dX_{s}=b(s,X_{s})ds+\sigma(s,X_{s})dW_{s},\quad s\in[0,T],\\ &X_{0}=x_{0},\\ &-dY_{s}=f(s,X_{s},Y_{s},Z_{s})ds-Z_{s}^{T}\sigma(s,X_{s})dW_{s},\quad s\in[0,T],\\ &Y_{T}=g(X_{T}),\end{aligned}\right. (1)

where \{W_{s}\}_{0\leq s\leq T} is a d-dimensional Brownian motion, b:[0,T]\times\mathbb{R}^{d}\rightarrow\mathbb{R}^{d}, \sigma:[0,T]\times\mathbb{R}^{d}\rightarrow\mathbb{R}^{d\times d}, f:[0,T]\times\mathbb{R}^{d}\times\mathbb{R}^{m}\times\mathbb{R}^{d\times m}\rightarrow\mathbb{R}^{m} and g:[0,T]\times\mathbb{R}^{d}\rightarrow\mathbb{R}^{m} are all deterministic mappings of time and space, with fixed T>0. We refer to Z as the control process according to the stochastic control terminology. In order to guarantee the existence of a unique solution pair \{(Y_{s},Z_{s})\}_{0\leq s\leq T} adapted to the augmented natural filtration, the standard well-posedness assumptions of [23] are required. Indeed, considering the quasi-linear, parabolic terminal problem

\displaystyle u_{t}(t,x)+\mathcal{L}u(t,x)+f(t,x,u(t,x),\nabla u(t,x))=0,\ (t,x)\in[0,T]\times\mathbb{R}^{d}, (2)

with u(T,x)=g(x), where \mathcal{L} is the second-order differential operator

\displaystyle\mathcal{L}u(t,x)=\frac{1}{2}\sum_{i,j=1}^{d}a_{i,j}(t,x)\frac{\partial^{2}u(t,x)}{\partial x_{i}\partial x_{j}}+\sum_{i=1}^{d}b_{i}(t,x)\frac{\partial u(t,x)}{\partial x_{i}},\quad a_{i,j}=[\sigma\sigma^{T}]_{ij}, (3)

the nonlinear Feynman-Kac formula indicates that the solution of (1) is given, almost surely, in terms of the solution of (2) by (cf., e.g., [23])

\displaystyle Y_{s}=u(s,X_{s}),\ Z_{s}=\nabla u\left(s,X_{s}\right),\ s\in[0,T]. (4)

As a result, the BSDE formulation offers a stochastic representation of the solution of a parabolic problem and of its gradient simultaneously, which is a distinct advantage for numerous applications in stochastic control.

2.2 Deep BSDE method

In order to review the deep BSDE method proposed in [9], we consider the following FBSDEs

\left\{\begin{aligned} &X_{t}=x_{0}+\int_{0}^{t}b(s,X_{s})ds+\int_{0}^{t}\sigma(s,X_{s})dW_{s},\\ &Y_{t}=g(X_{T})+\int_{t}^{T}f(s,X_{s},Y_{s},Z_{s})ds-\int_{t}^{T}Z_{s}^{T}\sigma(s,X_{s})dW_{s},\end{aligned}\right. (5)

which is the integral form of (1). Given a partition of the time interval [0,T]: 0=t_{0}<t_{1}<...<t_{N}=T, the Euler-Maruyama scheme is used to discretize X_{t} and Y_{t}, and we have

\left\{\begin{aligned} &X_{t_{n+1}}=X_{t_{n}}+b(t_{n},X_{t_{n}})\Delta t_{n}+\sigma(t_{n},X_{t_{n}})\Delta W_{t_{n}},\\ &Y_{t_{n+1}}(X_{t_{n+1}})=Y_{t_{n}}(X_{t_{n}})-f_{t_{n}}(X_{t_{n}},Y_{t_{n}}(X_{t_{n}}),Z_{t_{n}}(X_{t_{n}}))\Delta t_{n}\\ &+Z_{t_{n}}^{T}(X_{t_{n}})\sigma(t_{n},X_{t_{n}})\Delta W_{t_{n}},\end{aligned}\right. (6)

where \Delta t_{n}=t_{n+1}-t_{n}=\frac{T}{N} and \Delta W_{t_{n}}=W_{t_{n+1}}-W_{t_{n}}. The control Z_{t_{n}}(X_{t_{n}}) is approximated by an FNN with parameters \theta_{n} for n=1,\cdots,N-1. The initial values Y_{t_{0}}(X_{t_{0}}) and Z_{t_{0}}(X_{t_{0}}) are treated as trainable parameters in the model. To make Y_{t_{0}}(X_{t_{0}}) approximate u(t_{0},X_{t_{0}}), the mismatch with the given terminal condition is used to define the expected loss function

\displaystyle l\left(Y_{t_{0}}(X_{t_{0}}),Z_{t_{0}}(X_{t_{0}}),\theta_{1},...,\theta_{N-1}\right)=\frac{1}{M}\sum_{m=1}^{M}\left|(g-Y_{t_{N}})(X_{t_{N},m})\right|^{2}, (7)

where the loss is averaged over M different realizations of the underlying Brownian motion and the subscript m corresponds to the m-th realization. This procedure is called the deep BSDE method.
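To make the structure of the method concrete, the following minimal PyTorch sketch assembles the recursion (6) and the terminal loss (7). The coefficients b, sigma, f, g, the network sizes and the training loop below are illustrative placeholders (an assumption for this sketch), not the settings used later in this paper.

```python
# Minimal deep BSDE sketch: Y0, Z0 are trainable and each Z_{t_n}, n >= 1, has its own small net.
import torch

d, N, M, T = 2, 10, 64, 1.0                       # dimension, time steps, batch size, horizon
dt = T / N

b     = lambda t, x: torch.zeros_like(x)          # drift (placeholder)
sigma = lambda t, x: torch.ones_like(x)           # diagonal diffusion (placeholder)
f     = lambda t, x, y, z: -y                     # generator (placeholder)
g     = lambda x: torch.sin(x).sum(dim=1, keepdim=True)   # terminal condition (placeholder)

Y0 = torch.zeros(1, 1, requires_grad=True)        # trainable approximation of u(t_0, x_0)
Z0 = torch.zeros(1, d, requires_grad=True)        # trainable approximation of grad u(t_0, x_0)
Znets = torch.nn.ModuleList(
    [torch.nn.Sequential(torch.nn.Linear(d, 30), torch.nn.Tanh(), torch.nn.Linear(30, d))
     for _ in range(N - 1)])
opt = torch.optim.Adam([Y0, Z0, *Znets.parameters()], lr=5e-3)

for it in range(100):
    X = torch.zeros(M, d)                         # X_{t_0} = x_0 = 0
    Y = Y0.expand(M, 1)
    Z = Z0.expand(M, d)
    for n in range(N):
        t = n * dt
        dW = torch.randn(M, d) * dt ** 0.5
        # backward equation marched forward: Y_{t_{n+1}} = Y_{t_n} - f dt + Z^T sigma dW
        Y = Y - f(t, X, Y, Z) * dt + (Z * sigma(t, X) * dW).sum(dim=1, keepdim=True)
        X = X + b(t, X) * dt + sigma(t, X) * dW
        if n < N - 1:
            Z = Znets[n](X)                       # Z_{t_{n+1}}(X_{t_{n+1}})
    loss = ((g(X) - Y) ** 2).mean()               # terminal mismatch, eq. (7)
    opt.zero_grad(); loss.backward(); opt.step()
```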

2.3 FBSNNs

Raissi [6] introduced the FBSNNs to solve FBSDEs. The unknown solution u(t,x) is approximated by an FNN with input (t,x), and the required gradient vector \nabla u(t,x) is obtained by applying AD. The parameters \theta of the FNN are learned by minimizing the loss function given explicitly in (9), obtained by discretizing the FBSDEs (5) with the Euler-Maruyama scheme

\left\{\begin{aligned} &X_{t_{n+1}}=X_{t_{n}}+b(t_{n},X_{t_{n}})\Delta t_{n}+\sigma(t_{n},X_{t_{n}})\Delta W_{t_{n}},\\ &\widetilde{Y}_{t_{n+1}}(X_{t_{n+1}})=Y_{t_{n}}(X_{t_{n}})-f_{t_{n}}(X_{t_{n}},Y_{t_{n}}(X_{t_{n}}),Z_{t_{n}}(X_{t_{n}}))\Delta t_{n}\\ &+Z_{t_{n}}^{T}(X_{t_{n}})\sigma(t_{n},X_{t_{n}})\Delta W_{t_{n}},\end{aligned}\right. (8)

where Y_{t_{n}}(X_{t_{n}}) represents the estimated value of u(t_{n},X_{t_{n}}) given by the FNN and \widetilde{Y}_{t_{n+1}}(X_{t_{n+1}}) is the reference value corresponding to Y_{t_{n+1}}(X_{t_{n+1}}), which is obtained from the calculation in (8). The loss function is then given by

\displaystyle l(\theta)=\sum_{m=1}^{M}\left[\sum_{n=0}^{N-1}|(\widetilde{Y}_{t_{n+1}}-Y_{t_{n+1}})(X_{t_{n+1},m})|^{2}+|(g-Y_{t_{N}})(X_{t_{N},m})|^{2}\right], (9)

where the subscript m has the same meaning as in (7).

The deep BSDE method only computes the value of u(t_{0},X_{t_{0}}). This means that in order to obtain an approximation to Y_{t_{n}}(X_{t_{n}})=u(t_{n},X_{t_{n}}) at a later time t_{n}>t_{0}, the algorithm has to be retrained. Furthermore, the number of FNNs grows with the number of time steps N, which makes training difficult. In this article, we use the FBSNNs: the FNN is expected to approximate u(t,x) over the entire computational domain instead of at only one point. That is, we use multiple initial points to train the FNN. To improve efficiency, the number of Brownian motion trajectories for each initial point is set to M=1.
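The following minimal PyTorch sketch illustrates the FBSNN loss (9) with one trajectory per initial point: a single FNN takes (t,x) as input, its spatial gradient is obtained with AD, and the network values are matched along each simulated path and against the terminal data. The generator, terminal condition and constant scalar diffusion used here are placeholders for illustration, not the equations solved later in the paper.

```python
# Minimal FBSNN sketch: one network for u(t, x); Z = grad_x u via automatic differentiation.
import torch

d, N, K, T = 2, 10, 32, 1.0
dt = T / N
f = lambda t, x, y, z: -y                                   # generator (placeholder)
g = lambda x: torch.sin(x).sum(dim=1, keepdim=True)         # terminal condition (placeholder)
sigma = 1.0                                                 # constant scalar diffusion (placeholder)

net = torch.nn.Sequential(torch.nn.Linear(d + 1, 30), torch.nn.Tanh(),
                          torch.nn.Linear(30, 30), torch.nn.Tanh(),
                          torch.nn.Linear(30, 1))

def u_and_grad(t, x):
    x = x.detach().requires_grad_(True)                     # x becomes a leaf for AD
    y = net(torch.cat([t, x], dim=1))
    z = torch.autograd.grad(y.sum(), x, create_graph=True)[0]
    return y, z

opt = torch.optim.Adam(net.parameters(), lr=5e-3)
for it in range(100):
    X = torch.rand(K, d)                                    # K initial sampling points (illustrative domain)
    t = torch.zeros(K, 1)
    Y, Z = u_and_grad(t, X)
    loss = 0.0
    for n in range(N):
        dW = torch.randn(K, d) * dt ** 0.5
        Y_ref = Y - f(t, X, Y, Z) * dt + sigma * (Z * dW).sum(dim=1, keepdim=True)
        X = X + sigma * dW
        t = t + dt
        Y, Z = u_and_grad(t, X)                             # network value at (t_{n+1}, X_{t_{n+1}})
        loss = loss + ((Y_ref - Y) ** 2).mean()             # trajectory-matching term of (9)
    loss = loss + ((g(X) - Y) ** 2).mean()                  # terminal term of (9)
    opt.zero_grad(); loss.backward(); opt.step()
```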

3 Deep neural network for solving the incompressible Navier-Stokes equation

3.1 A class of FBSDEs associated to the incompressible Navier-Stokes equation

The Cauchy problem for the deterministic backward Navier–Stokes equation for the velocity field of an incompressible viscous fluid is

\left\{\begin{aligned} &u_{t}+\nu\Delta u+(u\cdot\nabla)u+\nabla p+f=0,\ 0\leq t\leq T,\\ &\nabla\cdot u=0,\ u(T)=g,\\ \end{aligned}\right. (10)

which is obtained from the classical Navier–Stokes equation via the time-reversing transformation

\displaystyle(u,p,f)(t,x)\rightarrow(-u,p,f)(T-t,x),\quad 0\leq t\leq T. (11)

Here u=u(t,x) represents the d-dimensional velocity field of a fluid, p=p(t,x) is the pressure, \nu>0 is the viscosity coefficient, and f=f(t,x) stands for the external force. We now study the backward Navier–Stokes equation (10) in \mathbb{R}^{d} with different boundary conditions.

Then, the PDE (10) is associated through the nonlinear Feynman-Kac formula to the following FBSDEs

\left\{\begin{aligned} &dX_{s}=\sqrt{2\nu}dW_{s},\quad s\in[0,T],\\ &X_{0}=x_{0},\\ &-dY_{s}=\left(f(s,X_{s})+\nabla p\left(s,X_{s}\right)+(Y_{s}\cdot\nabla)Y_{s}\right)ds-\sqrt{2\nu}Z_{s}^{T}dW_{s},\quad s\in[0,T],\\ &Y_{T}=g(X_{T}),\\ \end{aligned}\right. (12)

where

Y_{s}=u\left(s,X_{s}\right),\quad Z_{s}=\nabla u\left(s,X_{s}\right).
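For readability, we note how (12) fits the general framework (1)-(2): the forward process has zero drift and a constant diffusion, and the whole nonlinearity of (10), including the pressure gradient, is absorbed into the generator,

b(s,x)\equiv 0,\quad\sigma(s,x)=\sqrt{2\nu}I_{d},\quad f(s,X_{s},Y_{s},Z_{s})=f(s,X_{s})+\nabla p(s,X_{s})+(Y_{s}\cdot\nabla)Y_{s},

so that \frac{1}{2}\mathrm{tr}\left(\sigma\sigma^{T}\nabla^{2}u\right)=\nu\Delta u reproduces the viscous term and the terminal condition Y_{T}=g(X_{T}) matches u(T)=g.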

3.2 The algorithm for solving the incompressible Navier-Stokes equation

Given a partition of [0,T]: 0=t_{0}<t_{1}<...<t_{N}=T, we consider the Euler-Maruyama scheme with n=0,...,N-1 for the FBSDEs (12)

\left\{\begin{aligned} &X_{t_{n+1}}=X_{t_{n}}+\sqrt{2\nu}\Delta W_{t_{n}},\\ &\widetilde{Y}_{t_{n+1}}(X_{t_{n+1}})=Y_{t_{n}}(X_{t_{n}})-(f_{t_{n}}+\nabla P_{t_{n}}+(Y_{t_{n}}\cdot\nabla)Y_{t_{n}})(X_{t_{n}})\Delta t_{n}\\ &+\sqrt{2\nu}Z_{t_{n}}^{T}(X_{t_{n}})\Delta W_{t_{n}},\end{aligned}\right. (13)

where \Delta t_{n}=t_{n+1}-t_{n}=\frac{T}{N} and \Delta W_{t_{n}}=W_{t_{n+1}}-W_{t_{n}}. Here (Y_{t_{n}},P_{t_{n}})^{T} represents the estimated value of (u,p)^{T} at time t_{n} given by the FNN, and \widetilde{Y}_{t_{n+1}}(X_{t_{n+1}}) is the reference value of Y_{t_{n+1}}(X_{t_{n+1}}), which is obtained from the second equation in (13). We utilize K different initial sampling points for training the FNN. The proposed scheme is summarized in Algorithm 1 and illustrated in Figure 1.

Algorithm 1 Algorithm for the incompressible Navier-Stokes equation
1:Number of initial sampling points KK, terminal time TT, number of time intervals NN, viscosity coefficient ν\nu, maximum number of training iterations MiterM_{iter}.
2:The optimal FNN 𝒰θ\mathcal{U}_{\theta}.
3:Initialize the FNN 𝒰θ\mathcal{U}_{\theta};
4:Select initial sampling points x0x_{0} by uniform distribution;
5:Generate independent dd-dimensional standard Brownian motions Wtn(n=0,,N)W_{t_{n}}(n=0,...,N);
6:Compute Xtn+1X_{t_{n+1}} according to (13) for n=0,,N1n=0,...,N-1;
7:According to (13), use the FNN 𝒰θ\mathcal{U}_{\theta} with AD to calculate Y~tn+1(Xtn+1)\widetilde{Y}_{t_{n+1}}(X_{t_{n+1}}) for n=0,,N1n=0,\ldots,N-1;
8:Minimize the loss function by the Adam algorithm
\displaystyle l(\theta)=\frac{1}{K}\sum_{k=1}^{K}\left[\frac{1}{N}\sum_{n=0}^{N-1}|(\widetilde{Y}_{t_{n+1}}-Y_{t_{n+1}})(X_{t_{n+1},k})|^{2}+\alpha_{1}|(g-Y_{t_{N}})(X_{t_{N},k})|^{2}\right. (14)
\left.+\frac{\alpha_{2}}{N+1}\sum_{n=0}^{N}|\nabla\cdot Y_{t_{n}}(X_{t_{n},k})|^{2}\right],
where αi,i=1,2\alpha_{i},i=1,2 are the weights of the components of the loss function. The subscript kk corresponds to the kk-th initial sampling point;
9:Repeat procedures 2 to 6 until the maximum number of training iterations MiterM_{iter} is reached.
Figure 1: Illustration of Algorithm 1 for solving the incompressible Navier-Stokes equation
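The drift in (13) and the divergence penalty in (14) only require first-order derivatives of the network outputs. The following PyTorch sketch shows one way to assemble \nabla p, (u\cdot\nabla)u and \nabla\cdot u with AD, assuming a placeholder network `net` that maps (t,x) to (u_{1},\ldots,u_{d},p); it is an illustration of the computation, not the authors' implementation.

```python
# Sketch: first-order Navier-Stokes terms via automatic differentiation.
import torch

d = 2
net = torch.nn.Sequential(torch.nn.Linear(d + 1, 30), torch.nn.Tanh(),
                          torch.nn.Linear(30, d + 1))       # outputs (u_1, ..., u_d, p)

def ns_terms(t, x):
    """Return u, grad p, (u . grad) u and div u at the points (t, x)."""
    x = x.detach().requires_grad_(True)
    out = net(torch.cat([t, x], dim=1))
    u, p = out[:, :d], out[:, d:]
    grad_p = torch.autograd.grad(p.sum(), x, create_graph=True)[0]
    grads = [torch.autograd.grad(u[:, i].sum(), x, create_graph=True)[0] for i in range(d)]
    conv = torch.stack([(u * gi).sum(dim=1) for gi in grads], dim=1)   # (u . grad) u
    div = sum(grads[i][:, i] for i in range(d)).unsqueeze(1)           # divergence, for the penalty in (14)
    return u, grad_p, conv, div

t, x = torch.zeros(5, 1), torch.rand(5, d)
u, grad_p, conv, div = ns_terms(t, x)
```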

3.3 The algorithm for solving the incompressible Navier-Stokes equation with Dirichlet boundary condition

For the backward Navier–Stokes equation (10) in Ωd\Omega\subset\mathbb{R}^{d} with the Dirichlet boundary condition

\displaystyle u(t,x)=h(t,x),\quad(t,x)\in[0,T]\times\partial\Omega, (15)

the corresponding FBSDEs can be rewritten as the following form according to [43] by the nonlinear Feynman-Kac formula

\left\{\begin{aligned} &X_{t}=x_{0}+\int_{0}^{t}\sqrt{2\nu}dW_{s},\\ &Y_{t}=\Phi(T\Lambda\tau,X_{T\Lambda\tau})+\int_{t\Lambda\tau}^{T\Lambda\tau}(f(s,X_{s})+\nabla p\left(s,X_{s}\right)\\ &+(Y_{s}\cdot\nabla)Y_{s})ds-\int_{t\Lambda\tau}^{T\Lambda\tau}\sqrt{2\nu}Z_{s}^{T}dW_{s},\end{aligned}\right. (16)

where a\Lambda b=\min\{a,b\}, the stopping time \tau=\inf\{t>0:X_{t}\notin\Omega\} is the first time that the process X_{t} exits \Omega, and

\displaystyle\Phi(T\Lambda\tau,X_{T\Lambda\tau})=\left\{\begin{array}[]{ll}g(X_{T}),\quad\tau>T,\ X_{T}\in\Omega,\\ h(\tau,X_{\tau}),\quad\tau\leq T,\ X_{\tau}\in\partial\Omega.\\ \end{array}\right. (17)

Through the Euler scheme, the discrete formulation of the FBSDEs (16) can be obtained accordingly

\left\{\begin{aligned} &X_{t_{n+1}}=X_{t_{n}}+\sqrt{2\nu}\Delta W_{t_{n}},\\ &\widetilde{Y}_{t_{n+1}\Lambda\tau}(X_{t_{n+1}\Lambda\tau})=Y_{t_{n}\Lambda\tau}(X_{t_{n}\Lambda\tau})-\mathbbm{1}_{(0,\tau)}(t_{n})[(f_{t_{n}}+\nabla P_{t_{n}}\\ &+(Y_{t_{n}}\cdot\nabla)Y_{t_{n}})(X_{t_{n}})\Delta t_{n}-\sqrt{2\nu}Z_{t_{n}}^{T}(X_{t_{n}})\Delta W_{t_{n}}],\end{aligned}\right. (18)

where \mathbbm{1}_{(0,\tau)}(t_{n})=1 for t_{n}\in[0,\tau). It should be noted that the stopping time \tau is determined after the iteration of X_{t_{n+1}} is completed. Setting \tau=t_{n+1} when \tau\leq T, we place X_{\tau}=X_{t_{n+1}} on \partial\Omega and update \Delta W_{t_{n}}=(X_{t_{n+1}}-X_{t_{n}})/\sqrt{2\nu} so that (18) is satisfied. The algorithm of the proposed scheme is similar to Algorithm 1.
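A small NumPy sketch of this exit-time treatment is given below for the special case of a box domain (an assumption made only for illustration): when a path first leaves \Omega, the exterior point is projected onto \partial\Omega, the Brownian increment is recomputed so that the first equation of (18) still holds, and the indicator freezes the path afterwards.

```python
# Sketch of the stopping-time handling for a box domain Omega = (0, 2*pi)^2.
import numpy as np

d, N, K, T, nu = 2, 20, 8, 0.1, 1e-2
dt, lo, hi = T / N, 0.0, 2 * np.pi
rng = np.random.default_rng(0)

X = rng.uniform(lo, hi, size=(K, d))      # initial sampling points in Omega
alive = np.ones(K, dtype=bool)            # indicator 1_{(0, tau)}(t_n) for each path
for n in range(N):
    dW = rng.normal(scale=np.sqrt(dt), size=(K, d))
    X_new = X + np.sqrt(2 * nu) * dW
    exited = alive & np.any((X_new < lo) | (X_new > hi), axis=1)
    # project the first exterior position onto the boundary (for a box, clipping suffices;
    # the intersection of the segment with the boundary could be used instead) ...
    X_new[exited] = np.clip(X_new[exited], lo, hi)
    # ... and recompute the increment so that X_{t_{n+1}} = X_{t_n} + sqrt(2 nu) dW still holds
    dW[exited] = (X_new[exited] - X[exited]) / np.sqrt(2 * nu)
    X_new[~alive] = X[~alive]             # stopped paths no longer move
    X = X_new
    alive = alive & ~exited               # afterwards the indicator in (18) vanishes
```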

3.4 The algorithm for solving the incompressible Navier-Stokes equation with Neumann boundary condition

We consider the backward Navier–Stokes equation (10) with the Neumann boundary condition

\displaystyle\frac{\partial u(t,x)}{\partial\mathbf{n}}=q(t,x),\quad(t,x)\in[0,T]\times\partial\Omega, (19)

where \mathbf{n} is the unit normal vector at \partial\Omega pointing outward of \Omega. Suppose x\in\partial\Omega, x+\Delta x\notin\Omega and x-\Delta x\in\Omega, where x+\Delta x and x-\Delta x are symmetric with respect to the boundary \partial\Omega. Then we have

\displaystyle u(t,x-\Delta x)-u(t,x+\Delta x)\approx-2q(t,x)|\Delta x|,\quad(t,x)\in[0,T]\times\partial\Omega. (20)

If X_{t_{n+1}}\in\Omega, let X^{\prime}_{t_{n+1}}=X_{t_{n+1}}; if X_{t_{n+1}}\notin\Omega, let X^{\prime}_{t_{n+1}}\in\Omega be the point symmetric to X_{t_{n+1}} with respect to the boundary \partial\Omega. We use X^{\prime\prime}_{t_{n+1}} to denote the intersection of the line segment \overline{X_{t_{n+1}}X^{\prime}_{t_{n+1}}} and \partial\Omega. Therefore, the discretization can be rewritten similarly as

\left\{\begin{aligned} &X_{t_{n+1}}=X_{t_{n}}+\sqrt{2\nu}\Delta W_{t_{n}},\\ &\Delta Y_{t_{n+1}}=q(t_{n+1},X^{\prime\prime}_{t_{n+1}})|\Delta X_{t_{n+1}}|,\\ &X_{t_{n+1}}=X^{\prime}_{t_{n+1}},\\ &\widetilde{Y}_{t_{n+1}}(X_{t_{n+1}})=Y_{t_{n}}(X_{t_{n}})-(f_{t_{n}}+\nabla P_{t_{n}}+(Y_{t_{n}}\cdot\nabla)Y_{t_{n}})(X_{t_{n}})\Delta t_{n}\\ &+\sqrt{2\nu}Z_{t_{n}}^{T}(X_{t_{n}})\Delta W_{t_{n}}-\Delta Y_{t_{n+1}},\end{aligned}\right. (21)

where \Delta X_{t_{n+1}}=X_{t_{n+1}}-X^{\prime}_{t_{n+1}}. The algorithm of the proposed scheme is similar to Algorithm 1.
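The following NumPy sketch illustrates the reflection step in (21) for a box domain with a placeholder boundary flux q (both are assumptions made for illustration only): an exterior point is reflected back into \Omega, the midpoint of the segment is taken as the boundary point X'', and the correction \Delta Y=q(t,X'')|\Delta X| is recorded.

```python
# Sketch of one reflection step of (21) on a box domain.
import numpy as np

d, K, nu, dt = 2, 8, 1e-2, 0.02
lo, hi = 0.0, 2 * np.pi
q = lambda t, x: np.zeros((x.shape[0], 1))            # Neumann boundary data (placeholder)
rng = np.random.default_rng(0)

X = rng.uniform(lo, hi, size=(K, d))
dW = rng.normal(scale=np.sqrt(dt), size=(K, d))
X_next = X + np.sqrt(2 * nu) * dW                     # may leave the box

# reflect across the violated faces: X' is the mirror image of X_{t_{n+1}} in the boundary
X_refl = np.where(X_next < lo, 2 * lo - X_next, X_next)
X_refl = np.where(X_refl > hi, 2 * hi - X_refl, X_refl)
X_mid = 0.5 * (X_next + X_refl)                       # X'': boundary point (single-face case)
dX = X_next - X_refl
dY = q(dt, X_mid) * np.linalg.norm(dX, axis=1, keepdims=True)   # correction Delta Y in (21)
X_next = X_refl                                       # continue the path from the reflected point
```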

Remark 1.

There are some similar works in [25, 29] dealing with Neumann boundary conditions. If X_{t_{n+1}}\notin\Omega during the iterative process, the authors of [25, 29] choose to reflect X_{t_{n+1}} at the boundary \partial\Omega, which allows them to deal with homogeneous Neumann conditions. In contrast, our method can deal with non-homogeneous Neumann boundary conditions.

4 Deep neural network for solving the Cahn-Hilliard equation

4.1 Rewrite Cahn-Hilliard equation into a parabolic PDE system

We consider the following Cahn-Hilliard equation, which has fourth order derivatives,

\left\{\begin{aligned} &\phi_{t}-L_{d}\Delta\mu+f=0,\quad t\geq 0,\\ &\mu+\gamma^{2}\Delta\phi+\phi-\phi^{3}=0,\quad t\geq 0,\\ &\phi(0)=-g,\end{aligned}\right. (22)

where \phi=\phi(t,x) is the unknown, e.g., the concentration of the fluid, \mu=\mu(t,x) is a function of \phi, e.g., the chemical potential, L_{d}>0 is the diffusion coefficient and \gamma>0 is the model parameter. A first order stabilized scheme [44] for the Cahn-Hilliard equation (22) reads as

\left\{\begin{aligned} &\phi^{n+1}-\phi^{n}-\Delta tL_{d}\Delta\mu^{n}+\Delta tf^{n}=0,\\ &\mu^{n+1}+\gamma^{2}\Delta\phi^{n}+\phi^{n}-(\phi^{n})^{3}-\frac{S}{L_{d}}(\phi^{n+1}-\phi^{n})=0,\\ \end{aligned}\right. (23)

where S is a suitable stabilization parameter. It is easy to derive the following equation

\displaystyle\frac{\mu^{n+1}-\mu^{n}}{\Delta t}+\frac{\gamma^{2}\Delta\phi^{n}}{\Delta t}+\frac{\phi^{n}-(\phi^{n})^{3}-\frac{S}{L_{d}}(\phi^{n+1}-\phi^{n})}{\Delta t}=\frac{-\mu^{n}}{\Delta t}. (24)

The first equation of (23) and (24) can be regarded as the discretization of the following modified Cahn-Hilliard equation in [0,T]

\left\{\begin{aligned} &\phi_{t}-L_{d}\Delta\mu+f=0,\\ &\mu_{t}+\frac{\gamma^{2}}{\delta}\Delta\phi-S\Delta\mu+\frac{S}{L_{d}}f+\frac{1}{\delta}\left(\mu+\phi-\phi^{3}\right)=0,\\ &\phi(0)=-g,\end{aligned}\right. (25)

where \delta=O(\Delta t). By reversing time and defining

(\phi,\mu,f)(t,x)\rightarrow(-\phi,-\mu,f)(T-t,x),\quad 0\leq t\leq T,

the pair (\phi,\mu) satisfies the following backward Cahn-Hilliard equation in [0,T]

\left\{\begin{aligned} &\phi_{t}+L_{d}\Delta\mu+f=0,\\ &\mu_{t}-\frac{\gamma^{2}}{\delta}\Delta\phi+S\Delta\mu+\frac{S}{L_{d}}f-\frac{1}{\delta}(\mu+\phi-\phi^{3})=0,\\ &\phi(T)=g.\end{aligned}\right. (26)

In order to satisfy the nonlinear Feynman-Kac formula and utilize the FBSNNs, we treat the backward Cahn-Hilliard equation (26) as a semilinear parabolic differential equation

\displaystyle\psi_{t}+A\Delta\psi+F=0, (27)

with \psi=(\phi,\mu)^{T}, A=\left(\begin{array}[]{ll}0&L_{d}\\ -\frac{\gamma^{2}}{\delta}&S\end{array}\right) and F=\left(f,\frac{S}{L_{d}}f-\frac{1}{\delta}(\mu+\phi-\phi^{3})\right)^{T}. Supposing \lambda_{1} and \lambda_{2} are two different eigenvalues of the coefficient matrix A, then A can be diagonalized by R and R^{-1} so that D=\mbox{diag}(\lambda_{1},\lambda_{2})=R^{-1}AR, where R is a matrix of eigenvectors. The system (27) becomes

\displaystyle\hat{\psi}_{t}+D\Delta\hat{\psi}+\hat{F}=0, (28)

where \hat{\psi}=R^{-1}\psi=(\hat{\phi},\hat{\mu})^{T}, \hat{F}=R^{-1}F=(\hat{F}^{\hat{\phi}},\hat{F}^{\hat{\mu}})^{T} and S is chosen so that S>2\gamma\sqrt{\frac{L_{d}}{\delta}}, which ensures \lambda_{1,2}>0, where \lambda_{1,2}=\frac{S\pm\sqrt{S^{2}-\frac{4\gamma^{2}L_{d}}{\delta}}}{2}.
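The diagonalization can be checked numerically; the short NumPy sketch below uses illustrative parameter values (not those of the experiments in Section 6) and verifies that, with S>2\gamma\sqrt{L_{d}/\delta}, the matrix A has two distinct positive eigenvalues \lambda_{1,2} and that R^{-1}AR is diagonal.

```python
# Numerical check of the diagonalization used in (27)-(28).
import numpy as np

Ld, gamma, delta = 1.0, 0.1, 0.02                     # illustrative values
S = 2.5 * gamma * np.sqrt(Ld / delta)                 # satisfies S > 2 gamma sqrt(Ld / delta)

A = np.array([[0.0, Ld],
              [-gamma**2 / delta, S]])
disc = np.sqrt(S**2 - 4 * gamma**2 * Ld / delta)
lam1, lam2 = (S + disc) / 2, (S - disc) / 2           # both positive under the condition on S

eigvals, R = np.linalg.eig(A)                         # columns of R are eigenvectors
R_inv = np.linalg.inv(R)
D = R_inv @ A @ R                                     # diagonal, entries are lam1 and lam2 (in some order)
print(np.round(D, 10), lam1, lam2)
```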

Therefore, the system (28) is decomposed into two independent PDEs and the corresponding FBSDEs can be obtained as follows

\left\{\begin{aligned} &dX_{s}^{\hat{\phi}}=\sqrt{2\lambda_{1}}dW_{s}^{\hat{\phi}},\quad s\in[0,T],\\ &dX_{s}^{\hat{\mu}}=\sqrt{2\lambda_{2}}dW_{s}^{\hat{\mu}},\quad s\in[0,T],\\ &X_{0}^{\hat{\phi}}=x_{0},\ X_{0}^{\hat{\mu}}=x_{0},\\ &-dY_{s}^{\hat{\phi}}=\hat{F}_{s}^{\hat{\phi}}ds-\sqrt{2\lambda_{1}}(Z_{s}^{\hat{\phi}})^{T}dW_{s}^{\hat{\phi}},\quad s\in[0,T],\\ &-dY_{s}^{\hat{\mu}}=\hat{F}_{s}^{\hat{\mu}}ds-\sqrt{2\lambda_{2}}(Z_{s}^{\hat{\mu}})^{T}dW_{s}^{\hat{\mu}},\quad s\in[0,T],\\ &\phi(T)=g,\end{aligned}\right. (29)

where

Y_{s}^{\hat{\phi}}=\hat{\phi}\left(s,X_{s}^{\hat{\phi}}\right),\quad Z_{s}^{\hat{\phi}}=\nabla\hat{\phi}\left(s,X_{s}^{\hat{\phi}}\right),\quad\hat{F}_{s}^{\hat{\phi}}=\hat{F}^{\hat{\phi}}\left(s,X_{s}^{\hat{\phi}},\phi(s,X_{s}^{\hat{\phi}}),\mu(s,X_{s}^{\hat{\phi}})\right),

and

Y_{s}^{\hat{\mu}}=\hat{\mu}\left(s,X_{s}^{\hat{\mu}}\right),\quad Z_{s}^{\hat{\mu}}=\nabla\hat{\mu}\left(s,X_{s}^{\hat{\mu}}\right),\quad\hat{F}_{s}^{\hat{\mu}}=\hat{F}^{\hat{\mu}}\left(s,X_{s}^{\hat{\mu}},\phi(s,X_{s}^{\hat{\mu}}),\mu(s,X_{s}^{\hat{\mu}})\right).

Here \{X_{s}^{\hat{\phi}}\}_{0\leq s\leq T} and \{X_{s}^{\hat{\mu}}\}_{0\leq s\leq T} are the forward stochastic processes corresponding to \hat{\phi} and \hat{\mu} respectively, both of which start from x_{0} at the initial time.

4.2 The algorithm for solving the Cahn-Hilliard equation

Given a partition of [0,T]: 0=t_{0}<t_{1}<...<t_{N}=T, we consider the simple Euler scheme for the FBSDEs (29) with n=0,...,N-1

{Xtn+1ϕ^=Xtnϕ^+2λ1ΔWtnϕ^,Xtn+1μ^=Xtnμ^+2λ2ΔWtnμ^.Y~tn+1ϕ^(Xtn+1ϕ^)=Ytnϕ^(Xtnϕ^)F^tnϕ^(Xtnϕ^,Ytnϕ(Xtnϕ^),Ytnμ(Xtnϕ^))Δtn+2λ1(Ztnϕ^(Xtnϕ^))TΔWtnϕ^,Y~tn+1μ^(Xtn+1μ^)=Ytnμ^(Xtnμ^)F^tnμ^(Xtnμ^,Ytnϕ(Xtnμ^),Ytnμ(Xtnμ^))Δtn+2λ2(Ztnμ^(Xtnμ^))TΔWtnμ^,\left\{\begin{aligned} &X_{t_{n+1}}^{\hat{\phi}}=X_{t_{n}}^{\hat{\phi}}+\sqrt{2\lambda_{1}}\Delta W_{t_{n}}^{\hat{\phi}},\\ &X_{t_{n+1}}^{\hat{\mu}}=X_{t_{n}}^{\hat{\mu}}+\sqrt{2\lambda_{2}}\Delta W_{t_{n}}^{\hat{\mu}}.\\ &\widetilde{Y}_{t_{n+1}}^{\hat{\phi}}(X_{t_{n+1}}^{\hat{\phi}})=Y_{t_{n}}^{\hat{\phi}}(X_{t_{n}}^{\hat{\phi}})-\hat{F}_{t_{n}}^{\hat{\phi}}(X_{t_{n}}^{\hat{\phi}},Y_{t_{n}}^{\phi}(X_{t_{n}}^{\hat{\phi}}),Y_{t_{n}}^{\mu}(X_{t_{n}}^{\hat{\phi}}))\Delta t_{n}\\ &+\sqrt{2\lambda_{1}}(Z_{t_{n}}^{\hat{\phi}}(X_{t_{n}}^{\hat{\phi}}))^{T}\Delta W_{t_{n}}^{\hat{\phi}},\\ &\widetilde{Y}_{t_{n+1}}^{\hat{\mu}}(X_{t_{n+1}}^{\hat{\mu}})=Y_{t_{n}}^{\hat{\mu}}(X_{t_{n}}^{\hat{\mu}})-\hat{F}_{t_{n}}^{\hat{\mu}}(X_{t_{n}}^{\hat{\mu}},Y_{t_{n}}^{\phi}(X_{t_{n}}^{\hat{\mu}}),Y_{t_{n}}^{\mu}(X_{t_{n}}^{\hat{\mu}}))\Delta t_{n}\\ &+\sqrt{2\lambda_{2}}(Z_{t_{n}}^{\hat{\mu}}(X_{t_{n}}^{\hat{\mu}}))^{T}\Delta W_{t_{n}}^{\hat{\mu}},\end{aligned}\right. (30)

where \Delta t_{n}=t_{n+1}-t_{n}=\frac{T}{N}=\delta, \Delta W_{t_{n}}^{\hat{\phi}}=W_{t_{n+1}}^{\hat{\phi}}-W_{t_{n}}^{\hat{\phi}} and \Delta W_{t_{n}}^{\hat{\mu}}=W_{t_{n+1}}^{\hat{\mu}}-W_{t_{n}}^{\hat{\mu}}. Here (Y_{t_{n}}^{\hat{\phi}}(X_{t_{n}}^{\hat{\phi}}),Y_{t_{n}}^{\hat{\mu}}(X_{t_{n}}^{\hat{\mu}}))^{T} represents the estimated value of (\hat{\phi}(t_{n},X_{t_{n}}^{\hat{\phi}}),\hat{\mu}(t_{n},X_{t_{n}}^{\hat{\mu}}))^{T}, and (\widetilde{Y}_{t_{n+1}}^{\hat{\phi}}(X_{t_{n+1}}^{\hat{\phi}}),\widetilde{Y}_{t_{n+1}}^{\hat{\mu}}(X_{t_{n+1}}^{\hat{\mu}}))^{T} is the reference value of (Y_{t_{n+1}}^{\hat{\phi}}(X_{t_{n+1}}^{\hat{\phi}}),Y_{t_{n+1}}^{\hat{\mu}}(X_{t_{n+1}}^{\hat{\mu}}))^{T}, which is obtained from the last two equations in (30).

Here (Y_{t_{n}}^{\phi},Y_{t_{n}}^{\mu})^{T} represents the estimated value of (\phi,\mu)^{T} at time t_{n} given by the FNN. Due to the diagonalization, we have

\left\{\begin{aligned} &Y_{t_{n}}^{\hat{\phi}}(X_{t_{n}}^{\hat{\phi}})=\left[R^{-1}(Y_{t_{n}}^{\phi},Y_{t_{n}}^{\mu})^{T}(X_{t_{n}}^{\hat{\phi}})\right]_{1},\\ &Y_{t_{n}}^{\hat{\mu}}(X_{t_{n}}^{\hat{\mu}})=\left[R^{-1}(Y_{t_{n}}^{\phi},Y_{t_{n}}^{\mu})^{T}(X_{t_{n}}^{\hat{\mu}})\right]_{2},\end{aligned}\right. (31)

where the subscript 1 or 2 denotes the first or the second component of the vector. We utilize K different initial sampling points for training the FNN. The proposed scheme is summarized in Algorithm 2 and illustrated in Figure 2.

Algorithm 2 Algorithm for the Cahn-Hilliard equation
1:Number of initial sampling points KK, terminal time TT, number of time intervals NN, diffusion coefficient LdL_{d}, parameters γ\gamma, δ\delta and SS, matrix R1R^{-1}, eigenvalues λ1,2\lambda_{1,2}, maximum number of training iterations MiterM_{iter}.
2:The optimal FNN 𝒰θ\mathcal{U}_{\theta}.
3:Initialize the FNN 𝒰θ\mathcal{U}_{\theta};
4:Select initial sampling points x0x_{0} by uniform distribution;
5:Generate independent dd-dimensional standard Brownian motions Wtnϕ^W_{t_{n}}^{\hat{\phi}} and Wtnμ^(n=0,,N)W_{t_{n}}^{\hat{\mu}}(n=0,...,N);
6:Compute Xtn+1ϕ^X_{t_{n+1}}^{\hat{\phi}} and Xtn+1μ^X_{t_{n+1}}^{\hat{\mu}} according to (30) for n=0,,N1n=0,...,N-1;
7:According to (30) and (31), use the FNN 𝒰θ\mathcal{U}_{\theta} with AD to calculate Y~tn+1ϕ^(Xtn+1ϕ^)\widetilde{Y}_{t_{n+1}}^{\hat{\phi}}(X_{t_{n+1}}^{\hat{\phi}}) and Y~tn+1μ^(Xtn+1μ^)\widetilde{Y}_{t_{n+1}}^{\hat{\mu}}(X_{t_{n+1}}^{\hat{\mu}}) for n=0,,N1n=0,\ldots,N-1;
8:Minimize the loss function by the Adam algorithm
\displaystyle l(\theta)=\frac{1}{K}\sum_{k=1}^{K}\left[\frac{1}{N}\sum_{n=0}^{N-1}\left(|(\widetilde{Y}_{t_{n+1}}^{\hat{\phi}}-Y_{t_{n+1}}^{\hat{\phi}})(X_{t_{n+1},k}^{\hat{\phi}})|^{2}\right.\right. (32)
\left.\left.+|(\widetilde{Y}_{t_{n+1}}^{\hat{\mu}}-Y_{t_{n+1}}^{\hat{\mu}})(X_{t_{n+1},k}^{\hat{\mu}})|^{2}\right)+\alpha_{1}|(g-Y_{t_{N}}^{\phi})(X_{t_{N},k}^{\hat{\phi}\cup\hat{\mu}})|^{2}\right],
where X_{t_{N},k}^{\hat{\phi}\cup\hat{\mu}}=X_{t_{N},k}^{\hat{\phi}}\cup X_{t_{N},k}^{\hat{\mu}} and \alpha_{1} is the weight of the terminal condition. The subscript k corresponds to the k-th initial sampling point;
9:Repeat procedures 2 to 6 until the maximum number of training iterations MiterM_{iter} is reached.
Figure 2: Illustration of Algorithm 2 for solving the Cahn-Hilliard equation

Algorithm 2 shows that only first-order derivatives need to be computed during training, so the training time increases only linearly with the dimension d. This makes our method capable of efficiently solving high-dimensional Cahn-Hilliard equations, as observed in the numerical experiments of Section 6.2.

4.3 The algorithm for solving the Cahn-Hilliard equation with mixed boundary condition

We consider the Cahn-Hilliard equation (22) in Ωd\Omega\subset\mathbb{R}^{d} with the mixed condition

\displaystyle\left\{\begin{aligned} &\phi(t,x)=h(t,x),\\ &\frac{\partial\mu(t,x)}{\partial\mathbf{n}}=q(t,x),\end{aligned}\right.\quad(t,x)\in[0,T]\times\partial\Omega, (33)

where 𝐧\mathbf{n} is the unit normal vector at Ω\partial\Omega pointing outward of Ω\Omega. The method described in Section 3.3 is used to deal with the Dirichlet boundary condition. For the Neumann boundary condition, it is noted that

\displaystyle\frac{\partial\hat{\mu}}{\partial\mathbf{n}}(t,x)=\left[R^{-1}\left(\frac{\partial\phi}{\partial\mathbf{n}},\frac{\partial\mu}{\partial\mathbf{n}}\right)^{T}(t,x)\right]_{2}, (34)

where \frac{\partial\phi}{\partial\mathbf{n}}(t,x) is given by the FNN with AD and the subscript 2 denotes the second component. Therefore, the method described in Section 3.4 can be used to deal with the Neumann boundary condition. It is shown in Section 6 that our method performs well numerically. The algorithm of the proposed scheme is similar to Algorithm 2.

4.4 The algorithm for solving the Cahn-Hilliard equation with periodic boundary condition

We consider the Cahn-Hilliard equation (22) with the periodic boundary condition,

\displaystyle\left\{\begin{aligned} &\phi(t,x_{1},\cdots,x_{i}+I_{i},\cdots,x_{d})=\phi(t,x_{1},\cdots,x_{i},\cdots,x_{d}),\\ &\mu(t,x_{1},\cdots,x_{i}+I_{i},\cdots,x_{d})=\mu(t,x_{1},\cdots,x_{i},\cdots,x_{d}),\end{aligned}\right.\quad i=1,\cdots,d, (35)

where I_{i} is the period along the i-th direction. To satisfy the condition (35), as in [10], we transform the input vector x=(x_{1},\cdots,x_{d}) into a fixed trigonometric basis before applying the FNN. The component x_{i} of x is mapped as follows

\displaystyle x_{i}\rightarrow\left\{\sin\left(j\cdot 2\pi\frac{x_{i}}{I_{i}}\right),\cos\left(j\cdot 2\pi\frac{x_{i}}{I_{i}}\right)\right\}_{j=1}^{J},\ i=1,\cdots,d, (36)

where J is the order of the trigonometric basis. The network structure for the periodic boundary condition (35) is shown in Figure 3. The algorithm of the proposed scheme is similar to Algorithm 2.
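A minimal sketch of the input mapping (36) is given below; the function name and its arguments are illustrative, and the resulting features (together with t) are what is fed into the FNN so that the network output is automatically I_{i}-periodic in each x_{i}.

```python
# Sketch of the trigonometric feature mapping (36) enforcing periodicity.
import math
import torch

def periodic_features(x, periods, J=2):
    """x: (batch, d); periods: length-d list of I_i; returns (batch, 2*J*d) features."""
    feats = []
    for i, I in enumerate(periods):
        for j in range(1, J + 1):
            arg = 2 * math.pi * j * x[:, i:i + 1] / I
            feats.append(torch.sin(arg))
            feats.append(torch.cos(arg))
    return torch.cat(feats, dim=1)

x = torch.rand(4, 2)
z = periodic_features(x, periods=[2.0, 2.0], J=2)   # feed z (with t) into the FNN
```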

Figure 3: Network structure for periodic boundary condition

5 Deep neural network for solving Cahn-Hilliard-Navier-Stokes system

We now solve the coupled Cahn-Hilliard-Navier-Stokes equation in the domain \mathbb{R}^{d}. According to Section 3.1 and Section 4.1, after time reversal, the modified CHNS system is

\left\{\begin{aligned} &u_{t}+\nu\Delta u+(u\cdot\nabla)u+\nabla p+C\phi\nabla\mu+f_{1}=0,\\ &\phi_{t}+u\cdot\nabla\phi+L_{d}\Delta\mu+f_{2}=0,\\ &\mu_{t}-\frac{\gamma^{2}}{\delta}\Delta\phi+S\Delta\mu+\frac{S}{L_{d}}(u\cdot\nabla\phi+f_{2})-\frac{1}{\delta}(\mu+\phi-\phi^{3})=0,\\ &\nabla\cdot u=0,\\ &u(T)=g_{u},\quad\phi(T)=g_{\phi},\end{aligned}\right. (37)

where C denotes a parameter, e.g., the strength of the capillary force compared with the Newtonian fluid stress.

Similarly, we have the corresponding FBSDEs of (37) by diagonalizing and using the nonlinear Feynman-Kac formula

{dXsu=2νdWsu,s[0,T],dXsϕ^=2λ1dWsϕ^,s[0,T],dXsμ^=2λ2dWsμ^,s[0,T],(X0u,X0ϕ^,X0μ^)T=(x0,x0,x0)T,dYsu=Fsuds2ν(Zsu)TdWsu,s[0,T],dYsϕ^=F^sϕ^ds2λ1(Zsϕ^)TdWsϕ^,s[0,T],dYsμ^=F^sμ^ds2λ2(Zsμ^)TdWsμ^,s[0,T],u(T)=gu,ϕ(T)=gϕ,\left\{\begin{aligned} &dX_{s}^{u}=\sqrt{2\nu}dW_{s}^{u},\quad s\in[0,T],\\ &dX_{s}^{\hat{\phi}}=\sqrt{2\lambda_{1}}dW_{s}^{\hat{\phi}},\quad s\in[0,T],\\ &dX_{s}^{\hat{\mu}}=\sqrt{2\lambda_{2}}dW_{s}^{\hat{\mu}},\quad s\in[0,T],\\ &(X_{0}^{u},X_{0}^{\hat{\phi}},X_{0}^{\hat{\mu}})^{T}=(x_{0},x_{0},x_{0})^{T},\\ &-dY_{s}^{u}=F_{s}^{u}ds-\sqrt{2\nu}(Z_{s}^{u})^{T}dW_{s}^{u},\quad s\in[0,T],\\ &-dY_{s}^{\hat{\phi}}=\hat{F}_{s}^{\hat{\phi}}ds-\sqrt{2\lambda_{1}}(Z_{s}^{\hat{\phi}})^{T}dW_{s}^{\hat{\phi}},\quad s\in[0,T],\\ &-dY_{s}^{\hat{\mu}}=\hat{F}_{s}^{\hat{\mu}}ds-\sqrt{2\lambda_{2}}(Z_{s}^{\hat{\mu}})^{T}dW_{s}^{\hat{\mu}},\quad s\in[0,T],\\ &u(T)=g_{u},\quad\phi(T)=g_{\phi},\end{aligned}\right. (38)

with F^{u}=(u\cdot\nabla)u+\nabla p+C\phi\nabla\mu+f_{1} and

(\hat{F}^{\hat{\phi}},\hat{F}^{\hat{\mu}})^{T}=R^{-1}\left(f_{2}+u\cdot\nabla\phi,\frac{S}{L_{d}}\left(u\cdot\nabla\phi+f_{2}\right)-\frac{1}{\delta}\left(\mu+\phi-\phi^{3}\right)\right)^{T}.

The Euler scheme of (38) for n=0,,N1n=0,\cdots,N-1 is

{Xtn+1u=Xtnu+2νΔWtnu,Xtn+1ϕ^=Xtnϕ^+2λ1ΔWtnϕ^,Xtn+1μ^=Xtnμ^+2λ2ΔWtnμ^,Y~tn+1u(Xtn+1u)=Ytnu(Xtnu)+2ν(Ztnu(Xtnu))TΔWtnuFtnu(Xtnu,Ptn(Xtnu),Ytnu(Xtnu),Ytnϕ(Xtnu),Ztnμ(Xtnu))Δtn,Y~tn+1ϕ^(Xtn+1ϕ^)=Ytnϕ^(Xtnϕ^)+2λ1(Ztnϕ^(Xtnϕ^))TΔWtnϕ^F^tnϕ^(Xtnϕ^,Ytnu(Xtnϕ^),Ytnϕ(Xtnϕ^),Ytnμ(Xtnϕ^),Ztnϕ(Xtnϕ^))Δtn,Y~tn+1μ^(Xtn+1μ^)=Ytnμ^(Xtnμ^)+2λ2(Ztnμ^(Xtnμ^))TΔWtnμ^F^tnμ^(Xtnμ^,Ytnu(Xtnμ^),Ytnϕ(Xtnμ^),Ytnμ(Xtnμ^),Ztnϕ(Xtnμ^))Δtn,\left\{\begin{aligned} &X_{t_{n+1}}^{u}=X_{t_{n}}^{u}+\sqrt{2\nu}\Delta W_{t_{n}}^{u},\\ &X_{t_{n+1}}^{\hat{\phi}}=X_{t_{n}}^{\hat{\phi}}+\sqrt{2\lambda_{1}}\Delta W_{t_{n}}^{\hat{\phi}},\\ &X_{t_{n+1}}^{\hat{\mu}}=X_{t_{n}}^{\hat{\mu}}+\sqrt{2\lambda_{2}}\Delta W_{t_{n}}^{\hat{\mu}},\\ &\widetilde{Y}_{t_{n+1}}^{u}(X_{t_{n+1}}^{u})=Y_{t_{n}}^{u}(X_{t_{n}}^{u})+\sqrt{2\nu}(Z_{t_{n}}^{u}(X_{t_{n}}^{u}))^{T}\Delta W_{t_{n}}^{u}\\ &-F_{t_{n}}^{u}(X_{t_{n}}^{u},\nabla P_{t_{n}}(X_{t_{n}}^{u}),Y_{t_{n}}^{u}(X_{t_{n}}^{u}),Y_{t_{n}}^{\phi}(X_{t_{n}}^{u}),Z_{t_{n}}^{\mu}(X_{t_{n}}^{u}))\Delta t_{n},\\ &\widetilde{Y}_{t_{n+1}}^{\hat{\phi}}(X_{t_{n+1}}^{\hat{\phi}})=Y_{t_{n}}^{\hat{\phi}}(X_{t_{n}}^{\hat{\phi}})+\sqrt{2\lambda_{1}}(Z_{t_{n}}^{\hat{\phi}}(X_{t_{n}}^{\hat{\phi}}))^{T}\Delta W_{t_{n}}^{\hat{\phi}}\\ &-\hat{F}_{t_{n}}^{\hat{\phi}}(X_{t_{n}}^{\hat{\phi}},Y_{t_{n}}^{u}(X_{t_{n}}^{\hat{\phi}}),Y_{t_{n}}^{\phi}(X_{t_{n}}^{\hat{\phi}}),Y_{t_{n}}^{\mu}(X_{t_{n}}^{\hat{\phi}}),Z_{t_{n}}^{\phi}(X_{t_{n}}^{\hat{\phi}}))\Delta t_{n},\\ &\widetilde{Y}_{t_{n+1}}^{\hat{\mu}}(X_{t_{n+1}}^{\hat{\mu}})=Y_{t_{n}}^{\hat{\mu}}(X_{t_{n}}^{\hat{\mu}})+\sqrt{2\lambda_{2}}(Z_{t_{n}}^{\hat{\mu}}(X_{t_{n}}^{\hat{\mu}}))^{T}\Delta W_{t_{n}}^{\hat{\mu}}\\ &-\hat{F}_{t_{n}}^{\hat{\mu}}(X_{t_{n}}^{\hat{\mu}},Y_{t_{n}}^{u}(X_{t_{n}}^{\hat{\mu}}),Y_{t_{n}}^{\phi}(X_{t_{n}}^{\hat{\mu}}),Y_{t_{n}}^{\mu}(X_{t_{n}}^{\hat{\mu}}),Z_{t_{n}}^{\phi}(X_{t_{n}}^{\hat{\mu}}))\Delta t_{n},\end{aligned}\right. (39)

where \Delta t_{n}=t_{n+1}-t_{n}=\frac{T}{N}=\delta, \Delta W_{t_{n}}^{u}=W_{t_{n+1}}^{u}-W_{t_{n}}^{u}, \Delta W_{t_{n}}^{\hat{\phi}}=W_{t_{n+1}}^{\hat{\phi}}-W_{t_{n}}^{\hat{\phi}} and \Delta W_{t_{n}}^{\hat{\mu}}=W_{t_{n+1}}^{\hat{\mu}}-W_{t_{n}}^{\hat{\mu}}. Here Y_{t_{n}}^{u}(X_{t_{n}}^{u}), Y_{t_{n}}^{\hat{\phi}}(X_{t_{n}}^{\hat{\phi}}) and Y_{t_{n}}^{\hat{\mu}}(X_{t_{n}}^{\hat{\mu}}) represent the estimated values of u(t_{n},X_{t_{n}}^{u}), \hat{\phi}(t_{n},X_{t_{n}}^{\hat{\phi}}) and \hat{\mu}(t_{n},X_{t_{n}}^{\hat{\mu}}), respectively. The \widetilde{Y}_{t_{n+1}}^{u}(X_{t_{n+1}}^{u}), \widetilde{Y}_{t_{n+1}}^{\hat{\phi}}(X_{t_{n+1}}^{\hat{\phi}}) and \widetilde{Y}_{t_{n+1}}^{\hat{\mu}}(X_{t_{n+1}}^{\hat{\mu}}) are the reference values of Y_{t_{n+1}}^{u}(X_{t_{n+1}}^{u}), Y_{t_{n+1}}^{\hat{\phi}}(X_{t_{n+1}}^{\hat{\phi}}) and Y_{t_{n+1}}^{\hat{\mu}}(X_{t_{n+1}}^{\hat{\mu}}), respectively, which are obtained from the last three equations in (39).

Here (Y_{t_{n}}^{u},P_{t_{n}})^{T} represents the estimated value of (u,p)^{T} at time t_{n} given by the FNN \mathcal{U}_{\theta_{1}}, and (Y_{t_{n}}^{\phi},Y_{t_{n}}^{\mu})^{T} represents the estimated value of (\phi,\mu)^{T} at time t_{n} given by the FNN \mathcal{U}_{\theta_{2}}. The calculations of Y_{t_{n}}^{\hat{\phi}}(X_{t_{n}}^{\hat{\phi}}) and Y_{t_{n}}^{\hat{\mu}}(X_{t_{n}}^{\hat{\mu}}) are based on (31). We choose K different initial sampling points for training. The proposed scheme is summarized in Algorithm 3.

Algorithm 3 Algorithm for the Cahn-Hilliard-Navier-Stokes equation
1:Number of initial sampling points KK, terminal time TT, number of time intervals NN, viscosity coefficient ν\nu, diffusion coefficient LdL_{d}, parameters γ\gamma, δ\delta, CC and SS, matrix R1R^{-1}, eigenvalues λ1,2\lambda_{1,2}, maximum number of training iterations MiterM_{iter}.
2:The optimal FNNs 𝒰θ1\mathcal{U}_{\theta_{1}} and 𝒰θ2\mathcal{U}_{\theta_{2}}.
3:Initialize the FNNs 𝒰θ1\mathcal{U}_{\theta_{1}} and 𝒰θ2\mathcal{U}_{\theta_{2}};
4:Select initial sampling points x0x_{0} by uniform distribution;
5:Generate independent dd-dimensional standard Brownian motions WtnuW_{t_{n}}^{u}, Wtnϕ^W_{t_{n}}^{\hat{\phi}} and Wtnμ^(n=0,,N)W_{t_{n}}^{\hat{\mu}}(n=0,...,N);
6:Compute Xtn+1uX_{t_{n+1}}^{u}, Xtn+1ϕ^X_{t_{n+1}}^{\hat{\phi}} and Xtn+1μ^X_{t_{n+1}}^{\hat{\mu}} according to (39) for n=0,,N1n=0,...,N-1;
7:According to (31) and (39), use the FNNs 𝒰θ1\mathcal{U}_{\theta_{1}} and 𝒰θ2\mathcal{U}_{\theta_{2}} with AD to calculate Y~tn+1u(Xtn+1u)\widetilde{Y}_{t_{n+1}}^{u}(X_{t_{n+1}}^{u}), Y~tn+1ϕ^(Xtn+1ϕ^)\widetilde{Y}_{t_{n+1}}^{\hat{\phi}}(X_{t_{n+1}}^{\hat{\phi}}) and Y~tn+1μ^(Xtn+1μ^)\widetilde{Y}_{t_{n+1}}^{\hat{\mu}}(X_{t_{n+1}}^{\hat{\mu}});
8:Minimize the loss function by the Adam algorithm
l(θ1,θ2)=1Kk=1K[1Nn=0N1(|(Y~tn+1uYtn+1u)(Xtn+1,ku)|2\displaystyle l(\theta_{1},\theta_{2})=\frac{1}{K}\sum_{k=1}^{K}\left[\frac{1}{N}\sum_{n=0}^{N-1}\left(|(\widetilde{Y}_{t_{n+1}}^{u}-Y_{t_{n+1}}^{u})(X_{t_{n+1},k}^{u})|^{2}\right.\right. (40)
+|(Y~tn+1ϕ^Ytn+1ϕ^)(Xtn+1,kϕ^)|2+|(Y~tn+1μ^Ytn+1μ^)(Xtn+1,kμ^)|2)\displaystyle\left.+|(\widetilde{Y}_{t_{n+1}}^{\hat{\phi}}-Y_{t_{n+1}}^{\hat{\phi}})(X_{t_{n+1},k}^{\hat{\phi}})|^{2}+|(\widetilde{Y}_{t_{n+1}}^{\hat{\mu}}-Y_{t_{n+1}}^{\hat{\mu}})(X_{t_{n+1},k}^{\hat{\mu}})|^{2}\right)
+α1|(guYtNu)(XtN,kuϕ^μ^)|2+α2|(gϕYtNϕ)(XtN,kuϕ^μ^)|2\displaystyle+\alpha_{1}|(g_{u}-Y_{t_{N}}^{u})(X_{t_{N},k}^{u\cup\hat{\phi}\cup\hat{\mu}})|^{2}+\alpha_{2}|(g_{\phi}-Y_{t_{N}}^{\phi})(X_{t_{N},k}^{u\cup\hat{\phi}\cup\hat{\mu}})|^{2}
+α3N+1n=0N|Ytnu(Xtn,kuϕ^μ^)|2],\displaystyle\left.+\frac{\alpha_{3}}{N+1}\sum_{n=0}^{N}|\nabla\cdot Y_{t_{n}}^{u}(X_{t_{n},k}^{u\cup\hat{\phi}\cup\hat{\mu}})|^{2}\right],
where X_{t_{n},k}^{u\cup\hat{\phi}\cup\hat{\mu}}=X_{t_{n},k}^{u}\cup X_{t_{n},k}^{\hat{\phi}}\cup X_{t_{n},k}^{\hat{\mu}} and \alpha_{i},i=1,2,3 are the weights of the components of the loss function. The subscript k corresponds to the k-th initial sampling point;
9:Repeat procedures 2 to 6 until the maximum number of training iterations MiterM_{iter} is reached.

For the CHNS equation with the Dirichlet, Neumann and periodic boundary conditions, similar methods to those in Section 3 and Section 4 can be used.

6 Numerical experiments

In this section, we present a series of numerical results to validate our methods. For quantitative comparison, we calculate the errors between the numerical solution Y_{t_{0}} and the exact solution u_{t_{0}} in the relative L^{\infty} norm and the relative L^{2} norm, which are defined as

||e||_{L^{\infty}}=\frac{||Y_{t_{0}}-u_{t_{0}}||_{L^{\infty}}}{||u_{t_{0}}||_{L^{\infty}}},\ \ \ ||e||_{L^{2}}=\frac{||Y_{t_{0}}-u_{t_{0}}||_{L^{2}}}{||u_{t_{0}}||_{L^{2}}}.

The total number of training iterations is 1E+5, divided into 2E+4, 3E+4, 3E+4 and 2E+4 iterations with learning rates of 5E-3, 5E-4, 5E-5 and 5E-6, respectively, following [6]. We employ the Adam optimizer to train the FNNs. For each training step, we train the FNNs using 100 points randomly selected by the Latin hypercube sampling technique (LHS) in the domain. After the training process, we randomly pick 10000 points by the same method in the domain to test the FNNs. The FNNs have 4 hidden layers, each consisting of 30 neurons. The cosine function is taken as the activation function of the FNNs unless we specify otherwise. In each numerical example, we use a fixed set of the time interval \Delta t, weights \alpha_{i} and the stabilization parameter S; how to select or adjust these hyperparameters will be our future work. In our simulations, we use an AMD Ryzen 7 3700X CPU and an NVIDIA GTX 1660 SUPER GPU to train the FNNs. The parameters and settings of the numerical experiments are summarized in Table 1.

total number of training iterations 1E+5
number of iterations per segment [2E+4, 3E+4, 3E+4, 2E+4]
learning rate per segment [5E-3, 5E-4, 5E-5, 5E-6]
optimization algorithm Adam
structure of neural networks [30,30,30,30]
number of training points 100
number of test points 10000
point selection method LHS
activation function cos
CPU AMD Ryzen 7 3700X
GPU NVIDIA GTX 1660 SUPER
Table 1: Parameters and settings for numerical experiments
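The staged training schedule in Table 1 can be realized, for instance, by resetting the Adam learning rate at the start of each segment; the following sketch is only an illustration with a placeholder network and loss, not the loss functions defined in Algorithms 1-3.

```python
# Sketch of the staged Adam schedule of Table 1 (iteration counts and learning rates per segment).
import torch

net = torch.nn.Linear(3, 1)
loss_fn = lambda: (net(torch.rand(100, 3)) ** 2).mean()     # placeholder loss

opt = torch.optim.Adam(net.parameters(), lr=5e-3)
segments = [(20000, 5e-3), (30000, 5e-4), (30000, 5e-5), (20000, 5e-6)]
for n_iter, lr in segments:
    for group in opt.param_groups:
        group["lr"] = lr                                     # reset the learning rate for this segment
    for _ in range(n_iter):
        opt.zero_grad()
        loss_fn().backward()
        opt.step()
```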

6.1 Navier-Stokes equation

In this section, we numerically simulate the Taylor-Green vortex flow, which is a classical model to test numerical schemes for the incompressible Navier-Stokes equation. First, we consider the explicit 2D Taylor-Green vortex flow

\left\{\begin{aligned} &u_{1}(t,x)=-\cos(x_{1})\sin(x_{2})\exp(-2\nu t),\\ &u_{2}(t,x)=\sin(x_{1})\cos(x_{2})\exp(-2\nu t),\\ &p(t,x)=-\frac{1}{4}\left(\cos(2x_{1})+\cos(2x_{2})\right)\exp(-4\nu t)+c,\end{aligned}\right. (41)

for (t,x)\in(0,T]\times[0,2\pi]^{2} with constant c\in\mathbb{R} and initial condition

\left\{\begin{aligned} &u_{1}(0,x)=-\cos(x_{1})\sin(x_{2}),\\ &u_{2}(0,x)=\sin(x_{1})\cos(x_{2}).\end{aligned}\right. (42)

Algorithm 1 is employed to estimate \mathbf{u}(T,x)=(u_{1},u_{2})^{T} with T=0.1, f=0, N=5, \Delta t=\frac{T}{N}=0.02 and \alpha_{1}=\alpha_{2}=0.1. The numerical results of the errors for \mathbf{u} and \nabla p with different viscosities \nu are shown in Table 2. The relative L^{2} errors and the training losses at different training steps are shown in Figure 4. It is observed that these values decrease as the parameter \nu decreases; similar phenomena occur in the later experiments. Our method is not sensitive to the parameter \nu. The training time is 500s for each case, which is an acceptable cost.

ν\nu 1E-1 1E-2 1E-3 1E-4
e1L||e_{1}||_{L^{\infty}} 1.60E-2 8.58E-3 7.41E-3 6.92E-3
e2L||e_{2}||_{L^{\infty}} 1.66E-2 5.72E-3 6.30E-3 6.45E-3
e1L2||e_{1}||_{L^{2}} 7.23E-3 2.00E-3 9.02E-4 8.52E-4
e2L2||e_{2}||_{L^{2}} 8.61E-3 1.39E-3 8.59E-4 8.36E-4
epL2||e_{\nabla p}||_{L^{2}} 1.12E-1 2.41E-2 9.45E-3 8.40E-3
time 500s
Table 2: Relative LL^{\infty} and L2L^{2} errors for (41) performed by the Algorithm 1 with ν=\nu= 1E-1, 1E-2, 1E-3 and 1E-4.
(a) Relative L2L^{2} error
(b) Training loss
Figure 4: Relative L2L^{2} errors of 𝐮\mathbf{u} and training losses of the Algorithm 1 for (41) with different ν\nu.

The 3D Arnold-Beltrami-Childress (ABC) flow is as follows

\left\{\begin{aligned} &u_{1}(t,x)=(A\sin(x_{3})+C\cos(x_{2}))e^{-\nu t},\\ &u_{2}(t,x)=(B\sin(x_{1})+A\cos(x_{3}))e^{-\nu t},\\ &u_{3}(t,x)=(C\sin(x_{2})+B\cos(x_{1}))e^{-\nu t},\\ &p(t,x)=-(BC\cos(x_{1})\sin(x_{2})+AB\sin(x_{1})\cos(x_{3})\\ &\quad\quad\quad\quad+AC\sin(x_{3})\cos(x_{2}))e^{-2\nu t}+c,\end{aligned}\right. (43)

for (t,x)\in(0,T]\times[0,2\pi]^{3} with parameters A,B,C\in\mathbb{R}, constant c\in\mathbb{R} and initial condition

\left\{\begin{aligned} &u_{1}(0,x)=(A\sin(x_{3})+C\cos(x_{2})),\\ &u_{2}(0,x)=(B\sin(x_{1})+A\cos(x_{3})),\\ &u_{3}(0,x)=(C\sin(x_{2})+B\cos(x_{1})).\end{aligned}\right. (44)

We estimate \mathbf{u}(T,x)=(u_{1},u_{2},u_{3})^{T} by applying Algorithm 1 with parameters A=B=C=0.5, T=0.1, f=0, N=5, \Delta t=\frac{T}{N}=0.02 and \alpha_{1}=\alpha_{2}=0.1. The numerical results of the errors for \mathbf{u} and \nabla p with different viscosities \nu are shown in Table 3. The relative L^{2} errors and the training losses at different training steps are shown in Figure 5. The training time is 700s for each case, which is not much longer than that of the 2D simulations.

ν\nu 1E-1 1E-2 1E-3 1E-4
e1L||e_{1}||_{L^{\infty}} 1.46E-2 8.77E-3 9.19E-3 9.16E-3
e2L||e_{2}||_{L^{\infty}} 1.41E-2 8.12E-3 9.34E-3 9.60E-3
e3L||e_{3}||_{L^{\infty}} 1.24E-2 9.32E-3 8.83E-3 8.81E-3
e1L2||e_{1}||_{L^{2}} 6.20E-3 2.01E-3 1.81E-3 1.83E-3
e2L2||e_{2}||_{L^{2}} 4.60E-3 2.36E-3 2.08E-3 2.05E-3
e3L2||e_{3}||_{L^{2}} 7.05E-3 3.12E-3 2.38E-3 2.26E-3
epL2||e_{\nabla p}||_{L^{2}} 1.70E-1 6.92E-2 5.72E-2 5.58E-2
time 700s
Table 3: Relative LL^{\infty} and L2L^{2} errors for (43) performed by the Algorithm 1 with ν=\nu=1E-1, 1E-2, 1E-3 and 1E-4.
(a) Relative L2L^{2} error
(b) Training loss
Figure 5: Relative L2L^{2} errors and training losses of the Algorithm 1 for (43) with different ν\nu.

Next, we consider the 2D Taylor-Green vortex flow (41)–(42) with the Dirichlet boundary condition (15) for (t,x)\in(0,T]\times(0,2\pi)^{2}. The other parameters remain the same as in the first example. We use the algorithm in Section 3.3; the numerical results of the errors for \mathbf{u} and \nabla p with different viscosities \nu are shown in Table 4, and Figure 6 depicts the training processes. We also consider the 2D Taylor-Green vortex flow (41)–(42) with the Neumann boundary condition (19) for (t,x)\in(0,T]\times(0,2\pi)^{2}. We use the algorithm in Section 3.4; the numerical results of the errors for \mathbf{u} and \nabla p with different viscosities \nu are shown in Table 5 and the training processes are shown in Figure 7.

ν\nu 1E-1 1E-2 1E-3 1E-4
e1L||e_{1}||_{L^{\infty}} 4.96E-3 8.19E-3 7.13E-3 6.99E-3
e2L||e_{2}||_{L^{\infty}} 5.44E-3 5.70E-3 6.09E-3 6.41E-3
e1L2||e_{1}||_{L^{2}} 1.89E-3 1.57E-3 8.90E-4 8.70E-4
e2L2||e_{2}||_{L^{2}} 1.92E-3 1.59E-3 8.62E-4 8.57E-4
epL2||e_{\nabla p}||_{L^{2}} 3.26E-2 2.20E-2 9.28E-3 8.83E-3
time 700s
Table 4: Relative LL^{\infty} and L2L^{2} errors for (41) with the Dirichlet boundary condition performed by the algorithm in Section 3.3 with ν=\nu= 1E-1, 1E-2, 1E-3 and 1E-4.
(a) Relative L2L^{2} error
(b) Training loss
Figure 6: Relative L2L^{2} errors and training losses of the algorithm in Section 3.3 for (41) with the Dirichlet boundary condition and different ν\nu.
ν\nu 1E-1 1E-2 1E-3 1E-4
e1L||e_{1}||_{L^{\infty}} 1.50E-2 9.45E-3 7.65E-3 7.35E-3
e2L||e_{2}||_{L^{\infty}} 1.55E-2 6.10E-3 6.50E-3 6.64E-3
e1L2||e_{1}||_{L^{2}} 8.05E-3 1.98E-3 9.27E-4 8.83E-4
e2L2||e_{2}||_{L^{2}} 7.59E-3 1.52E-3 8.60E-4 8.53E-4
epL2||e_{\nabla p}||_{L^{2}} 1.11E-1 2.43E-2 9.36E-3 9.00E-3
time 900s
Table 5: Relative LL^{\infty} and L2L^{2} errors for (41) with the Neumann boundary condition performed by the algorithm in Section 3.4 with ν=\nu= 1E-1, 1E-2, 1E-3 and 1E-4.
(a) Relative L2L^{2} error
(b) Training loss
Figure 7: Relative L2L^{2} errors and training losses of the algorithm in Section 3.4 for (41) with the Neumann boundary condition and different ν\nu.

Now, we consider the 2D lid driven cavity flow for (t,x)\in(0,T]\times(0,1)^{2} with the boundary and initial conditions

\left\{\begin{aligned} &\mathbf{u}(t,x)=(1,0),\quad x\in\partial\Omega_{u},\\ &\mathbf{u}(t,x)=(0,0),\quad x\in\partial\Omega\backslash\partial\Omega_{u},\\ &\mathbf{u}(0,x)=(0,0),\quad x\in\Omega,\end{aligned}\right. (45)

where \partial\Omega_{u} represents the upper boundary. We utilize the algorithm in Section 3.3 to simulate \mathbf{u}. We impose the boundary conditions u_{i}=0, i=1,2 on the network during training and let

\left\{\begin{aligned} &Y_{t_{n}}^{1}(x_{1},x_{2})=8x_{1}(x_{1}-1)x_{2}Y_{t_{n}}^{1}(x_{1},x_{2}),\\ &Y_{t_{n}}^{2}(x_{1},x_{2})=8x_{1}(x_{1}-1)x_{2}(x_{2}-1)Y_{t_{n}}^{2}(x_{1},x_{2}),\end{aligned}\right. (46)

where Y_{t_{n}}^{i}(x_{1},x_{2}) represents the estimate of u_{i}(t_{n},x_{1},x_{2}) output by the FNN. It is then easily verified that Y_{t_{n}} satisfies the boundary conditions u_{i}=0, i=1,2. For the condition u_{1}=1 on \partial\Omega_{u}, we add the following additional term to the loss function in Algorithm 1

\displaystyle\frac{\alpha_{3}}{K_{u}}\sum_{k=1}^{K_{u}}\left[\frac{1}{N+1}\sum_{n=0}^{N}|Y_{t_{n}}^{1}(X^{u}_{t_{n},k})+1|^{2}\right], (47)

where X^{u}_{t_{n},k} represents the k-th of the K_{u} points selected on \partial\Omega_{u} at time t_{n}. The parameters are chosen as T=0.5, f=0, \nu=0.1, N=25, K_{u}=100, \Delta t=0.02 and \alpha_{1}=\alpha_{2}=\alpha_{3}=0.01. To improve accuracy and save training time, we use the time adaptive approach II mentioned in [41]. At T=0.5, the stream function and the pressure p with \nu=0.1 are shown in Figure 8. These results are consistent with benchmark results.
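As an illustration of the hard-constraint ansatz (46) and the lid term (47), the following PyTorch sketch wraps a placeholder network output with the polynomial factors and evaluates the penalty on sampled lid points. The sign Y_{1}+1 used here is consistent with the time reversal (11), under which the lid condition u_{1}=1 corresponds to -1 for the transformed velocity; the network and sampling below are illustrative only.

```python
# Sketch of the hard-constraint ansatz (46) and the lid penalty (47).
import torch

net = torch.nn.Sequential(torch.nn.Linear(3, 30), torch.nn.Tanh(), torch.nn.Linear(30, 3))

def velocity(t, x):
    out = net(torch.cat([t, x], dim=1))                      # raw FNN output (Y1, Y2, P)
    x1, x2 = x[:, 0:1], x[:, 1:2]
    Y1 = 8 * x1 * (x1 - 1) * x2 * out[:, 0:1]                # vanishes on the three no-slip walls
    Y2 = 8 * x1 * (x1 - 1) * x2 * (x2 - 1) * out[:, 1:2]     # vanishes on the whole boundary
    return Y1, Y2, out[:, 2:3]

# lid penalty of (47): enforce Y1 + 1 = 0 on the upper boundary x2 = 1
t = torch.zeros(100, 1)
x_lid = torch.cat([torch.rand(100, 1), torch.ones(100, 1)], dim=1)
Y1, _, _ = velocity(t, x_lid)
lid_term = ((Y1 + 1) ** 2).mean()
```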

(a) stream function
(b) pressure pp
Figure 8: The stream function and pressure pp for 2D lid driven cavity flow at T=0.5T=0.5 with ν=0.1\nu=0.1.

Finally, we consider the flow past a circular obstacle for (t,x)\in(0,T]\times(-2,10)\times(-2,2). The obstacle is centered at (0,0) and has diameter D=1. The boundary and initial conditions are given by

\left\{\begin{aligned}&u_{2}(t,x)=0,\quad x\in\partial\Omega_{u}\cup\partial\Omega_{d},\\ &\mathbf{u}(t,x)=(0,0),\quad x\in\partial\Omega_{c},\\ &\mathbf{u}(t,x)=(u_{in},0),\quad x\in\partial\Omega_{l},\\ &p\mathbf{n}-\nu\nabla\mathbf{u}\cdot\mathbf{n}=0,\quad x\in\partial\Omega_{r},\\ &\mathbf{u}(0,x)=(u_{in},0),\quad x\in\Omega,\end{aligned}\right. (48)

where \partial\Omega_{u},\partial\Omega_{d},\partial\Omega_{l},\partial\Omega_{r},\partial\Omega_{c} denote the upper, lower, left and right boundaries and the surface of the obstacle, respectively, u_{in} is the inlet velocity, and \mathbf{n} is the unit outward normal vector on \partial\Omega. We utilize the algorithm in Section 3.3 to handle the Dirichlet boundary conditions on \partial\Omega_{u}\cup\partial\Omega_{d}\cup\partial\Omega_{l}\cup\partial\Omega_{c}, and the algorithm in Section 3.4 to handle the condition on \partial\Omega_{r}. We let

\left\{\begin{aligned}&Y_{t_{n}}^{1}(x_{1},x_{2})=\frac{8\hat{Y}_{t_{n}}^{1}(x_{1},x_{2})(x_{1}^{2}+x_{2}^{2}-0.25)}{(x_{1}^{2}+x_{2}^{2})},\\&Y_{t_{n}}^{2}(x_{1},x_{2})=\frac{8\hat{Y}_{t_{n}}^{2}(x_{1},x_{2})(x_{1}^{2}+x_{2}^{2}-0.25)(x_{1}+2)(x_{2}-2)(x_{2}+2)}{(x_{1}^{2}+x_{2}^{2})},\\ \end{aligned}\right. (49)

where \hat{Y}_{t_{n}}^{i}(x_{1},x_{2}) again denotes the raw estimate of u_{i}(t_{n},x_{1},x_{2}) output by the FNN. The factor (x_{1}^{2}+x_{2}^{2}-0.25) vanishes on the obstacle surface and (x_{1}+2)(x_{2}-2)(x_{2}+2) vanishes on the left, upper and lower boundaries, so all the homogeneous Dirichlet conditions in (48) are satisfied exactly. For the conditions u_{1}=u_{in} on \partial\Omega_{l} and p\mathbf{n}-\nu\nabla\mathbf{u}\cdot\mathbf{n}=0 on \partial\Omega_{r}, we add the following additional term to the loss function in Algorithm 1

\displaystyle\frac{\alpha_{3}}{N+1}\sum_{n=0}^{N}\left[\frac{1}{K_{l}}\sum_{k=1}^{K_{l}}|Y_{t_{n}}^{1}(X^{l}_{t_{n},k})+u_{in}|^{2}+\frac{1}{K_{r}}\sum_{k=1}^{K_{r}}|(P_{t_{n}}\mathbf{n}+\nu Z_{t_{n}}\cdot\mathbf{n})(X^{r}_{t_{n},k})|^{2}\right], (50)

where X^{l}_{t_{n},k} and X^{r}_{t_{n},k} denote the k-th of the K_{l} points sampled on \partial\Omega_{l} and the k-th of the K_{r} points sampled on \partial\Omega_{r} at time t_{n}, respectively. The parameters are chosen as T=1.0, f=0, \nu=0.025, N=50, K_{l}=K_{r}=100, \Delta t=0.02, \alpha_{1}=\alpha_{2}=\alpha_{3}=0.01 and u_{in}=3. As before, we use the time adaptive approach II of [41] to improve accuracy and save training time. The streamlines at T=1.0 with \nu=0.025 are shown in Figure 9 and are consistent with results obtained by traditional numerical methods.
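As with the cavity example, the boundary treatment can be sketched in a few lines. The following PyTorch-style fragment is illustrative only; it assumes the scheme provides callables net_Y, net_Z and net_P returning the velocity estimate, its spatial gradient (stored as a 2x2 matrix per sample) and the pressure at a space-time point, which is our own reading of the quantities Y, Z and P appearing in (50). It implements the output transform (49) and one time slice of the penalty (50).

    import torch

    def masked_velocity(net_Y, t, x):
        # Output transform (49): (x1^2 + x2^2 - 0.25) vanishes on the obstacle of radius 0.5,
        # and (x1 + 2)(x2 - 2)(x2 + 2) vanishes on the left, upper and lower boundaries.
        y_hat = net_Y(torch.cat([t, x], dim=-1))
        x1, x2 = x[:, 0:1], x[:, 1:2]
        r2 = x1 ** 2 + x2 ** 2
        y1 = 8.0 * y_hat[:, 0:1] * (r2 - 0.25) / r2
        y2 = 8.0 * y_hat[:, 1:2] * (r2 - 0.25) * (x1 + 2.0) * (x2 - 2.0) * (x2 + 2.0) / r2
        return torch.cat([y1, y2], dim=-1)

    def inlet_outflow_penalty(net_Y, net_Z, net_P, tn, K=100, u_in=3.0, nu=0.025, alpha3=0.01):
        # One time slice of the additional loss (50): an inlet penalty on x1 = -2 and an
        # outflow ("do nothing") penalty on x1 = 10, where the outward normal is n = (1, 0).
        t = torch.full((K, 1), float(tn))
        x_l = torch.stack([-2.0 * torch.ones(K), 4.0 * torch.rand(K) - 2.0], dim=1)
        y_l = masked_velocity(net_Y, t, x_l)
        loss_in = torch.mean((y_l[:, 0] + u_in) ** 2)              # written as in (50)
        x_r = torch.stack([10.0 * torch.ones(K), 4.0 * torch.rand(K) - 2.0], dim=1)
        p = net_P(torch.cat([t, x_r], dim=-1))                     # assumed shape (K, 1)
        z = net_Z(torch.cat([t, x_r], dim=-1)).reshape(K, 2, 2)    # assumed velocity gradient
        pn = torch.cat([p, torch.zeros_like(p)], dim=1)            # P n with n = (1, 0)
        res = pn + nu * z[:, :, 0]                                 # P n + nu Z . n
        loss_out = torch.mean(torch.sum(res ** 2, dim=1))
        return alpha3 * (loss_in + loss_out)

Both terms are summed over the time levels and added to the loss of Algorithm 1 together with the interior FBSDE loss at every training step.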

Figure 9: The streamlines for the flow past a circular obstacle at T=1.0 with \nu=0.025.

6.2 Cahn-Hilliard equation

In this section, we consider the Cahn-Hilliard equation (22) for (t,x)\in(0,T]\times[-1,1]^{d} with initial condition

\displaystyle\phi(0,x)=\cos\left(\frac{\pi}{\sqrt{d}}\sum_{i=1}^{d}x_{i}\right). (51)

The exact solution is given by

\displaystyle\phi(t,x)=e^{-t}\cos\left(\frac{\pi}{\sqrt{d}}\sum_{i=1}^{d}x_{i}\right). (52)

The parameters are taken as L_{d}= 5E-4, T=0.1, N=10, \delta=\Delta t=0.01 and \alpha_{1}=0.01. We estimate \phi using Algorithm 2 with different values of the parameter \gamma in different dimensions. The errors for \phi with different \gamma and S are recorded in Table 6, and the training processes in different dimensions are shown in Figure 10. For a fixed dimension, the relative L^{2} error and the training loss decrease as \gamma decreases. The method is not sensitive to the parameters \gamma and S, and its training time grows only linearly with the dimension d while the accuracy is maintained. In particular, the high-order spatial derivatives of the Cahn-Hilliard equation do not make the training difficult, even in high dimensions.

d \gamma S ||e||_{L^{\infty}} ||e||_{L^{2}} time
2 0.5 0.5 3.25E-2 5.02E-3 1200s
  0.1 0.1 2.56E-3 1.67E-3
  0.05 0.05 2.08E-3 1.13E-3
  0.01 0.01 1.99E-3 6.67E-4
50 0.5 0.5 1.08E-2 4.06E-3 3200s
  0.1 0.1 3.78E-3 1.65E-3
  0.05 0.05 3.48E-3 1.04E-3
  0.01 0.01 3.63E-3 6.49E-4
100 0.5 0.5 1.62E-2 4.25E-3 5200s
  0.1 0.1 6.46E-3 1.76E-3
  0.05 0.05 2.38E-3 1.15E-3
  0.01 0.01 4.06E-3 6.79E-4
Table 6: Relative L^{\infty} and L^{2} errors for (52) performed by Algorithm 2 with \gamma=0.5,\ 0.1,\ 0.05,\ 0.01 for dimensions d=2,\ 50,\ 100.
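For reference, the relative errors reported in Table 6 (and in the later tables) can be computed as in the following NumPy sketch. It is illustrative only: the normalization by the exact solution and the use of uniformly sampled test points are our assumptions, and phi_pred stands for the trained network's prediction.

    import numpy as np

    def exact_phi(t, x):
        # Exact solution (52); x has shape (K, d).
        d = x.shape[1]
        return np.exp(-t) * np.cos(np.pi / np.sqrt(d) * x.sum(axis=1))

    def relative_errors(phi_pred, t, x):
        # Relative L-infinity and L^2 errors over the test points x.
        phi_ex = exact_phi(t, x)
        err = phi_pred - phi_ex
        e_inf = np.max(np.abs(err)) / np.max(np.abs(phi_ex))
        e_l2 = np.linalg.norm(err) / np.linalg.norm(phi_ex)
        return e_inf, e_l2

    # usage: test points in [-1, 1]^100 at the final time T = 0.1
    x_test = np.random.uniform(-1.0, 1.0, size=(10000, 100))
    phi_pred = exact_phi(0.1, x_test)      # stand-in for the network prediction
    print(relative_errors(phi_pred, 0.1, x_test))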
Figure 10: Relative L^{2} errors and training losses of Algorithm 2 for (52) with different values of \gamma in dimensions d=2, 50 and 100. Panels: (a)-(c) relative L^{2} errors for d=2, 50, 100; (d)-(f) training losses for d=2, 50, 100.

We next consider the Cahn-Hilliard equation (22) with exact solution (52) defined in \Omega=\{x:|x|<1\} and subject to the mixed boundary conditions (33) on \partial\Omega=\{x:|x|=1\}, where h(t,x) and q(t,x) are determined by the exact solution. In this case, we choose \alpha_{1}=1. We utilize the algorithm in Section 4.3 to solve for \phi with different values of \gamma in different dimensions. The errors for \phi with different \gamma and S are recorded in Table 7 and the training processes are shown in Figure 11. The results show that our method also works for boundary value problems in high dimensions.

d \gamma S ||e||_{L^{\infty}} ||e||_{L^{2}} time
2 0.5 0.5 1.69E-2 6.12E-3 1800s
  0.1 0.1 4.52E-3 1.94E-3
  0.05 0.05 3.64E-3 1.69E-3
  0.01 0.01 3.72E-3 1.92E-3
50 0.5 0.5 2.40E-2 4.46E-3 3800s
  0.1 0.1 1.13E-2 2.01E-3
  0.05 0.05 5.77E-3 1.16E-3
  0.01 0.01 7.33E-3 8.55E-4
100 0.5 0.5 1.14E-2 2.58E-3 6000s
  0.1 0.1 9.52E-3 2.10E-3
  0.05 0.05 1.10E-2 1.44E-3
  0.01 0.01 4.98E-2 9.56E-4
Table 7: Relative L^{\infty} and L^{2} errors for (52) with the mixed boundary conditions performed by the algorithm in Section 4.3, with \gamma=0.5,\ 0.1,\ 0.05,\ 0.01 for dimensions d=2,\ 50,\ 100.
Figure 11: Relative L^{2} errors and training losses of the algorithm in Section 4.3 for (52) with different values of \gamma in dimensions d=2, 50 and 100. Panels: (a)-(c) relative L^{2} errors for d=2, 50, 100; (d)-(f) training losses for d=2, 50, 100.

Next, we consider the Cahn-Hilliard equation (22) with exact solution (52) defined in \Omega=(-\sqrt{2},\sqrt{2})^{2}, subject to the periodic boundary condition (35) with periods I_{i}=2\sqrt{2}, i=1,2, and the mixed boundary condition (33) on \partial\Omega, where h(t,x) and q(t,x) are determined by the exact solution. We choose J=1, and the other parameters remain the same as in the previous example. We utilize the algorithms in Sections 4.3 and 4.4 to solve for \phi with different values of \gamma. The errors for \phi are recorded in Table 8, and the corresponding training processes are shown in Figure 12.

\gamma S ||e||_{L^{\infty}} ||e||_{L^{2}} time
0.5 0.5 9.98E-3 5.48E-3 3400s
0.1 0.1 4.86E-3 2.42E-3
0.05 0.05 7.86E-3 2.84E-3
0.01 0.01 7.39E-3 2.88E-3
Table 8: Relative L^{\infty} and L^{2} errors for (52) with the mixed and periodic boundary conditions performed by the algorithms in Sections 4.3 and 4.4 with \gamma=0.5,\ 0.1,\ 0.05,\ 0.01.
Figure 12: Relative L^{2} errors and training losses of the algorithms in Sections 4.3 and 4.4 for (52) with different values of \gamma. Panels: (a) relative L^{2} errors; (b) training loss.

6.3 Cahn-Hilliard-Navier-Stokes equation

In this section, we consider the coupled CHNS system with the exact solution

\left\{\begin{aligned}&u_{1}(t,x)=-\cos(x_{1})\sin(x_{2})e^{-t},\\&u_{2}(t,x)=\sin(x_{1})\cos(x_{2})e^{-t},\\&p(t,x)=-\frac{1}{4}(\cos(2x_{1})+\cos(2x_{2}))e^{-2t}+c,\\&\phi(t,x)=\sin(x_{1})\sin(x_{2})e^{-t},\end{aligned}\right. (53)

for (t,x)\in(0,T]\times[0,2\pi]^{2}, where c is an arbitrary constant, with initial condition

\left\{\begin{aligned}&u_{1}(0,x)=-\cos(x_{1})\sin(x_{2}),\\&u_{2}(0,x)=\sin(x_{1})\cos(x_{2}),\\&\phi(0,x)=\sin(x_{1})\sin(x_{2}).\end{aligned}\right. (54)

The parameters are taken as T=0.1, N=5, \delta=\Delta t=0.02, \nu= 1E-3, C=1, L_{d}= 5E-4, \gamma=0.01, S=0.0032 and \alpha_{1}=\alpha_{2}=\alpha_{3}=0.01. The errors for \mathbf{u}(T,x)=(u_{1},u_{2})^{T}, \phi and \nabla p obtained with Algorithm 3 of Section 5 are recorded in Table 9, and Figure 13 shows the training process. These results show that our method also works for the coupled system.
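Since the pressure in (53) is determined only up to the additive constant c, the error is measured in \nabla p rather than in p itself; differentiating (53) gives

\nabla p(t,x)=\left(\tfrac{1}{2}\sin(2x_{1})e^{-2t},\ \tfrac{1}{2}\sin(2x_{2})e^{-2t}\right)^{T},

which does not depend on c.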

||e_{u_{1}}||_{L^{\infty}} ||e_{u_{2}}||_{L^{\infty}} ||e_{\phi}||_{L^{\infty}} ||e_{u_{1}}||_{L^{2}} ||e_{u_{2}}||_{L^{2}} ||e_{\phi}||_{L^{2}} ||e_{\nabla p}||_{L^{2}} time
2.41E-2 2.30E-2 1.25E-2 1.31E-2 1.16E-2 4.78E-3 2.07E-1 1800s
Table 9: Relative L^{\infty} and L^{2} errors for (53) performed by Algorithm 3.
Figure 13: Relative L^{2} errors and training losses of Algorithm 3 for (53). Panels: (a) relative L^{2} errors; (b) training loss.

Finally, we study the interface problem modeled by the CHNS system. In this example, we choose the square domain \Omega=[-1,1]^{2} and the parameters L_{d}=1, \nu=1, C=10, T=3, N=300, \delta=\Delta t=0.01, \gamma=0.03, S=3.3 and \alpha_{1}=\alpha_{2}=\alpha_{3}=0.01. The initial conditions for \phi and \mathbf{u}=(u_{1},u_{2})^{T} are given by

\left\{\begin{aligned}&\phi(0,x_{1},x_{2})=\max\left(\tanh\frac{r-R_{1}}{2\gamma},\tanh\frac{r-R_{2}}{2\gamma}\right),\\&\mathbf{u}(0,x_{1},x_{2})=0,\end{aligned}\right. (55)

where r=0.4, R_{1}=\sqrt{(x_{1}-0.7r)^{2}+x_{2}^{2}} and R_{2}=\sqrt{(x_{1}+0.7r)^{2}+x_{2}^{2}}. The time adaptive approach II of [41] is employed to reduce the training time. To enforce the conservation of mass, we add the following loss term to the final loss function defined in Algorithm 3

\displaystyle\frac{1}{N+1}\sum_{n=0}^{N}\left|\int_{\Omega}Y_{t_{n}}^{\phi}(X)dX-\int_{\Omega}g(X)dX\right|^{2}. (56)

The cosine and tanh functions are chosen as the activation functions of the FNNs \mathcal{U}_{\theta_{1}} and \mathcal{U}_{\theta_{2}}, respectively. The evolution of the merging bubbles is shown in Figure 14 and coincides with results reported in the literature.
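A Monte Carlo version of the penalty (56) can be sketched as follows. The sketch is illustrative only: the uniform sampling of \Omega, the sample size and the callable Y_phi standing for the network's phase-field estimate are our assumptions; here g is the two-bubble profile (55) with r=0.4 and \gamma=0.03.

    import numpy as np

    def initial_phi(x, r=0.4, gamma=0.03):
        # Two-bubble initial condition (55) on Omega = [-1, 1]^2; x has shape (M, 2).
        R1 = np.sqrt((x[:, 0] - 0.7 * r) ** 2 + x[:, 1] ** 2)
        R2 = np.sqrt((x[:, 0] + 0.7 * r) ** 2 + x[:, 1] ** 2)
        return np.maximum(np.tanh((r - R1) / (2.0 * gamma)),
                          np.tanh((r - R2) / (2.0 * gamma)))

    def mass_penalty(Y_phi, t_grid, area=4.0, M=20000):
        # Monte Carlo approximation of (56): both integrals are estimated by
        # area * mean over M uniform samples, then averaged over the time levels.
        x = np.random.uniform(-1.0, 1.0, size=(M, 2))
        mass0 = area * np.mean(initial_phi(x))            # integral of g over Omega
        loss = 0.0
        for tn in t_grid:
            mass_n = area * np.mean(Y_phi(tn, x))         # integral of Y_phi at time t_n
            loss += (mass_n - mass0) ** 2
        return loss / len(t_grid)

    # usage with a stand-in that conserves mass exactly, so the penalty is zero
    print(mass_penalty(lambda t, x: initial_phi(x), t_grid=np.linspace(0.0, 3.0, 7)))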

Figure 14: Phase evolution at t = 0, 0.2, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0.

7 Conclusions and remarks

In this article, we have presented methods based on forward-backward stochastic neural networks for computing numerical solutions of the incompressible Navier-Stokes equation, the Cahn-Hilliard equation and the CHNS system with different boundary conditions. In particular, we utilize a modified Cahn-Hilliard equation derived from a widely used stabilized scheme for the original Cahn-Hilliard equation, which can be diagonalized into a parabolic system; the FBSNNs are then applied to this system and work well for high-dimensional problems. We demonstrate the performance of our algorithms on a variety of numerical experiments, in all of which our methods are shown to be both stable and accurate. In future work, we will study how to make the training more efficient and provide theoretical analysis of our methods under suitable assumptions.

Declarations

  • Ethical Approval: Not Applicable.

  • Availability of supporting data: The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

  • Competing interests: The authors have no relevant financial or non-financial interests to disclose.

  • Funding: This research is partially supported by the National Key R & D Program of China (No. 2022YFE03040002) and the National Natural Science Foundation of China (No. 12371434).

  • Authors’ contributions: All authors contributed to the study conception and design. Numerical simulations were performed by Deng Yangtao. All authors read and approved the final manuscript.

  • Acknowledgments: This research is partially supported by the National Key R & D Program of China (No. 2022YFE03040002) and the National Natural Science Foundation of China (No. 12371434).

References

  • [1] N. Kantas, A. Beskos, A. Jasra, Sequential Monte Carlo methods for high-dimensional inverse problems: A case study for the Navier–Stokes equations, SIAM/ASA Journal on Uncertainty Quantification 2 (1) (2014) 464–489.
  • [2] X. Warin, Nesting Monte Carlo for high-dimensional non-linear PDEs, Monte Carlo Methods and Applications 24 (4) (2018) 225–247.
  • [3] J. Dick, F. Y. Kuo, I. H. Sloan, High-dimensional integration: the quasi-Monte Carlo way, Acta Numerica 22 (2013) 133–288.
  • [4] J. Shen, H. Yu, Efficient spectral sparse grid methods and applications to high-dimensional elliptic problems, SIAM Journal on Scientific Computing 32 (6) (2010) 3228–3250.
  • [5] Z. Wang, Q. Tang, W. Guo, Y. Cheng, Sparse grid discontinuous Galerkin methods for high-dimensional elliptic equations, Journal of Computational Physics 314 (2016) 244–263.
  • [6] M. Raissi, Forward–backward stochastic neural networks: deep learning of high-dimensional partial differential equations, in: Peter Carr Gedenkschrift: Research Advances in Mathematical Finance, World Scientific, 2024, pp. 637–655.
  • [7] C. Beck, S. Becker, P. Cheridito, A. Jentzen, A. Neufeld, Deep splitting method for parabolic PDEs, SIAM Journal on Scientific Computing 43 (5) (2021) A3135–A3154.
  • [8] W. E, B. Yu, The deep Ritz method: a deep learning-based numerical algorithm for solving variational problems, Communications in Mathematics and Statistics 6 (1) (2018) 1–12.
  • [9] J. Han, A. Jentzen, W. E, Solving high-dimensional partial differential equations using deep learning, Proceedings of the National Academy of Sciences 115 (34) (2018) 8505–8510.
  • [10] J. Han, J. Lu, M. Zhou, Solving high-dimensional eigenvalue problems using deep neural networks: A diffusion Monte Carlo like approach, Journal of Computational Physics 423 (2020) 109792.
  • [11] Y. Khoo, J. Lu, L. Ying, Solving parametric PDE problems with artificial neural networks, European Journal of Applied Mathematics 32 (3) (2021) 421–435.
  • [12] Y. Khoo, L. Ying, SwitchNet: a neural network model for forward and inverse scattering problems, SIAM Journal on Scientific Computing 41 (5) (2019) A3182–A3201.
  • [13] L. Lu, X. Meng, Z. Mao, G. E. Karniadakis, DeepXDE: A deep learning library for solving differential equations, SIAM Review 63 (1) (2021) 208–228.
  • [14] K. O. Lye, S. Mishra, D. Ray, P. Chandrashekar, Iterative surrogate model optimization (ISMO): an active learning algorithm for PDE constrained optimization with deep neural networks, Computer Methods in Applied Mechanics and Engineering 374 (2021) 113575.
  • [15] L. Lyu, Z. Zhang, M. Chen, J. Chen, MIM: A deep mixed residual method for solving high-order partial differential equations, Journal of Computational Physics 452 (2022) 110930.
  • [16] M. Raissi, P. Perdikaris, G. E. Karniadakis, Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, Journal of Computational physics 378 (2019) 686–707.
  • [17] E. Samaniego, C. Anitescu, S. Goswami, V. M. Nguyen-Thanh, H. Guo, K. Hamdia, X. Zhuang, T. Rabczuk, An energy approach to the solution of partial differential equations in computational mechanics via machine learning: Concepts, implementation and applications, Computer Methods in Applied Mechanics and Engineering 362 (2020) 112790.
  • [18] J. Sirignano, K. Spiliopoulos, DGM: A deep learning algorithm for solving partial differential equations, Journal of computational physics 375 (2018) 1339–1364.
  • [19] Z. Wang, Z. Zhang, A mesh-free method for interface problems using the deep learning approach, Journal of Computational Physics 400 (2020) 108963.
  • [20] Y. Zang, G. Bao, X. Ye, H. Zhou, Weak adversarial networks for high-dimensional partial differential equations, Journal of Computational Physics 411 (2020) 109409.
  • [21] S. Zeng, Z. Zhang, Q. Zou, Adaptive deep neural networks methods for high-dimensional partial differential equations, Journal of Computational Physics 463 (2022) 111232.
  • [22] Y. Zhu, N. Zabaras, P.-S. Koutsourelakis, P. Perdikaris, Physics-constrained deep learning for high-dimensional surrogate modeling and uncertainty quantification without labeled data, Journal of Computational Physics 394 (2019) 56–81.
  • [23] E. Pardoux, S. Peng, Backward stochastic differential equations and quasilinear parabolic partial differential equations, in: B. L. Rozovskii, R. B. Sowers (Eds.), Stochastic partial differential equations and their applications, Springer, Berlin, Heidelberg, 1992, pp. 200–217.
  • [24] C. Beck, S. Becker, P. Grohs, N. Jaafari, A. Jentzen, Solving the Kolmogorov pde by means of deep learning, Journal of Scientific Computing 88 (2021) 1–28.
  • [25] V. Boussange, S. Becker, A. Jentzen, B. Kuckuck, L. Pellissier, Deep learning approximations for non-local nonlinear PDEs with Neumann boundary conditions, Partial Differential Equations and Applications 4 (6) (2023) 51.
  • [26] Q. Chan-Wai-Nam, J. Mikael, X. Warin, Machine learning for semi linear PDEs, Journal of scientific computing 79 (3) (2019) 1667–1712.
  • [27] Q. Feng, M. Luo, Z. Zhang, Deep signature FBSDE algorithm, Numerical Algebra, Control and Optimization 13 (3&4) (2023) 500–522.
  • [28] M. Fujii, A. Takahashi, M. Takahashi, Asymptotic expansion as prior knowledge in deep learning method for high dimensional BSDEs, Asia-Pacific Financial Markets 26 (3) (2019) 391–408.
  • [29] J. Han, M. Nica, A. R. Stinchcombe, A derivative-free method for solving elliptic partial differential equations with deep neural networks, Journal of Computational Physics 419 (2020) 109672.
  • [30] J. Y. Nguwi, G. Penent, N. Privault, A deep branching solver for fully nonlinear partial differential equations, Journal of Computational Physics 499 (2024) 112712.
  • [31] N. Nüsken, L. Richter, Interpolating between BSDEs and PINNs: deep learning for elliptic and parabolic boundary value problems, Journal of Machine Learning 2 (1) (2023) 31–64.
  • [32] N. Nüsken, L. Richter, Solving high-dimensional Hamilton–Jacobi–Bellman PDEs using neural networks: perspectives from the theory of controlled diffusions and measures on path space, Partial Differential Equations and Applications 2 (4) (2021) 1–48.
  • [33] H. Pham, X. Warin, M. Germain, Neural networks-based backward scheme for fully nonlinear PDEs, SN Partial Differential Equations and Applications 2 (1) (2021) 1–24.
  • [34] A. Takahashi, Y. Tsuchida, T. Yamada, A new efficient approximation scheme for solving high-dimensional semilinear PDEs: control variate method for Deep BSDE solver, Journal of Computational Physics 454 (2022) 110956.
  • [35] M. Sabate Vidales, D. Šiška, L. Szpruch, Unbiased deep solvers for linear parametric PDEs, Applied Mathematical Finance 28 (4) (2021) 299–329.
  • [36] Y. Wang, Y.-H. Ni, Deep BSDE-ML learning and its application to model-free optimal control, arXiv preprint arXiv:2201.01318.
  • [37] R. Mattey, S. Ghosh, A novel sequential method to train physics informed neural networks for Allen Cahn and Cahn Hilliard equations, Computer Methods in Applied Mechanics and Engineering 390 (2022) 114474.
  • [38] T. P. Miyanawala, R. K. Jaiman, An efficient deep learning technique for the Navier-Stokes equations: Application to unsteady wake flow dynamics, arXiv preprint arXiv:1710.09099.
  • [39] A. T. Mohan, D. V. Gaitonde, A deep learning based approach to reduced order modeling for turbulent flow control using LSTM neural networks, arXiv preprint arXiv:1804.09269.
  • [40] M. Raissi, A. Yazdani, G. E. Karniadakis, Hidden fluid mechanics: A Navier-Stokes informed deep learning framework for assimilating flow visualization data, arXiv preprint arXiv:1808.04327.
  • [41] C. L. Zhao, Solving Allen-Cahn and Cahn-Hilliard Equations using the Adaptive Physics Informed Neural Networks, Communications in Computational Physics 29 (3).
  • [42] J. Yang, Q. Zhu, A Local Deep Learning Method for Solving High Order Partial Differential Equations., Numerical Mathematics: Theory, Methods & Applications 15 (1).
  • [43] É. Pardoux, Backward stochastic differential equations and viscosity solutions of systems of semilinear parabolic and elliptic PDEs of second order, in: L. Decreusefond, B. Øksendal, J. Gjerde, A. S. Üstünel (Eds.), Stochastic Analysis and Related Topics VI, Springer, Boston, MA, 1998, pp. 79–127.
  • [44] J. Shen, X. Yang, Numerical approximations of Allen-Cahn and Cahn-Hilliard equations, Discrete & Continuous Dynamical Systems 28 (4) (2010) 1669.