
Yuling Jiao: School of Mathematics and Statistics, Wuhan University, Wuhan 430072, China. Email: [email protected]
Yuhui Liu: School of Mathematics and Statistics, Wuhan University, Wuhan 430072, China. Email: [email protected]
Jerry Zhijian Yang: School of Mathematics and Statistics, Wuhan University, Wuhan 430072, China. Email: [email protected]
Cheng Yuan: School of Mathematics and Statistics, Wuhan University, Wuhan 430072, China. Email: [email protected]

A Stabilized Physics Informed Neural Networks Method for Wave Equations

Yuling Jiao    Yuhui Liu    Jerry Zhijian Yang    Cheng Yuan
(Received: date / Accepted: date)
Abstract

In this article, we propose a novel Stabilized Physics Informed Neural Networks method (SPINNs) for solving wave equations. This method not only enjoys theoretical convergence guarantees but is also more efficient than the original PINNs. By replacing the L^2 norm with the H^1 norm in the learning of the initial and boundary conditions, we prove that the error of the solution can be upper bounded by the risk in SPINNs. Based on this, we decompose the error of SPINNs into approximation error, statistical error and optimization error. Furthermore, by applying the approximation theory of ReLU^3 networks and learning-theoretic tools based on the Rademacher complexity, covering number and pseudo-dimension of neural networks, we present a systematic non-asymptotic convergence analysis of our method, which shows that the error of SPINNs can be well controlled if the number of training samples and the depth and width of the deep neural network are appropriately chosen. Two illustrative numerical examples on 1-dimensional and 2-dimensional wave equations demonstrate that SPINNs can achieve faster and better convergence than the classical PINNs method.

Keywords:
PINNs · ReLU^3 neural network · Wave equations · Error analysis
MSC:
68T07 65M12 62G05

1 Introduction

During the past few decades, numerical methods for partial differential equations (PDEs) have been widely studied and applied in various fields of scientific computation brenner2007mathematical ; ciarlet2002finite ; Quarteroni2008Numerical ; Thomas2013Numerical . Among these, due to its central significance in solid mechanics, acoustics and electromagnetism, the numerical solution of the wave equation attracts considerable attention, and much work has been done to analyze convergence rates, improve solution efficiency and deal with practical issues such as boundary conditions. For many real problems with complex regions, however, designing efficient and accurate algorithms with practical absorbing boundary conditions is still difficult, especially for problems with irregular boundaries. Furthermore, in the high-dimensional case, many traditional methods may become intractable due to the curse of dimensionality, which leads to an exponential increase in the degrees of freedom with the dimension of the problem.

More recently, inspired by the great success of deep learning in natural language processing and computer vision, solving PDEs with deep learning has become a highly promising topic lagaris1998artificial ; anitescu2019artificial ; Berner2020Numerically ; han2018solving ; lu2021deepxde ; sirignano2018dgm . Several numerical schemes have been proposed to solve PDEs using neural networks, including the deep Ritz method (DRM) Weinan2017The , physics-informed neural networks (PINNs) raissi2019physics , weak adversarial neural networks (WANs) Yaohua2020weak and their extensions jagtap2020conservative ; npinns ; fpinns . Due to the simplicity and flexibility of its formulation, PINNs has arguably received the most attention. In the field of wave equations, researchers have successfully applied PINNs to the modeling of scattered acoustic fields wang2023acoustic , including transcranial ultrasound waves wang2023physics and seismic waves ding2023self . In these works, the authors all observed an interesting phenomenon: training PINNs without any boundary constraints may lead to a solution under an absorbing boundary condition. In other words, the waves obtained by PINNs without a boundary loss are naturally absorbed at the boundary. This phenomenon, in fact, greatly improves the application value of PINNs in wave simulation, especially for inverse scattering problems. On the other hand, although PINNs have been widely used in the simulation of waves, a rigorous numerical analysis of PINNs for wave equations and more efficient training strategies are still needed.

In this work, we propose the Stabilized Physics Informed Neural Networks (SPINNs) for the simulation of waves. By replacing the L^2 norm in the initial condition and boundary condition with the H^1 norm, we obtain a stable PINNs method, in the sense that the error in the solution can be upper bounded by the risk during training. It is worth mentioning that a similar idea called Sobolev Training was proposed in 2017 to improve the efficiency of regression czarnecki2017sobolev . Later, in son2021sobolev and vlassis2021sobolev , the authors generalized this idea to the training of PINNs, with applications to the heat equation, Burgers' equation, the Fokker-Planck equation and elasto-plasticity models. One main difference between our model and these works is that we still use the L^2 norm, rather than the H^1 norm, for the residual and the initial velocity in the loss of SPINNs. This design, as we will demonstrate, turns out to be a sufficient condition to guarantee stability, and it also enables us to achieve a lower error with the same or even fewer training samples. Furthermore, based on this stability, we give the first systematic convergence rate analysis of PINNs for wave equations. In general, our main contributions are summarized as follows:

Main contributions

\bullet We propose a novel Stabilized Physics Informed Neural Networks method (SPINNs) for solving wave equations, for which we prove that the error in solution can be bounded by the training risk.

\bullet We numerically show that SPINNs can achieve a faster and better convergence than original PINNs.

\bullet We present a systematic convergence analysis of SPINNs. According to our result, once the network depth, width and the number of training samples have been appropriately chosen, the error between the numerical solution \hat{u}_{\phi} from SPINNs and the exact solution u^{*} can be made arbitrarily small in the H^1 norm:

\mathbb{E}_{\{X_{n}\}_{n=1}^{N},\{Y_{m}\}_{m=1}^{M},\{T_{k}\}_{k=1}^{K}}\|\hat{u}_{\phi}-u^{*}\|_{H^{1}(\overline{\Omega_{T}})}\leq\varepsilon.

The rest of this paper is organized as follows. In Section 2, we describe the problem setting and introduce the SPINNs method. In Section 3, we study the convergence rate of the SPINNs method for solving wave equations. In Section 4, we present several numerical examples to illustrate the efficiency of SPINNs. Finally, the main conclusions are provided in Section 5.

2 The Stabilized PINNs method

In this section, we introduce the stabilized PINNs (SPINNs) method for solving the wave equation. For completeness, we first list the notations for neural networks and function spaces that we will use. After that, the formulation of SPINNs is presented.

2.1 Preliminary

Let \mathcal{D}\in\mathbb{N}. We call a function f a neural network if it is implemented by:

f_{0}(x) = x,
f_{l}(x) = \rho_{l}(A_{l}f_{l-1}+b_{l}),\quad l=1,\cdots,\mathcal{D}-1,
f := f_{\mathcal{D}}(x)=A_{\mathcal{D}}f_{\mathcal{D}-1}+b_{\mathcal{D}},

where A_{l}=(a_{ij}^{(l)})\in\mathbb{R}^{n_{l}\times n_{l-1}}, b_{l}=(b_{i}^{(l)})\in\mathbb{R}^{n_{l}}, and \rho_{l}:\mathbb{R}^{n_{l}}\rightarrow\mathbb{R}^{n_{l}} is the activation function. The hyper-parameters \mathcal{D} and \mathcal{W}:=\max\{n_{l},\ l=0,\cdots,\mathcal{D}\} are called the depth and the width of the network, respectively. Let \Phi be a set of activation functions and X be a Banach space; the normed neural network function class is defined as

\mathcal{N}(\mathcal{D},\mathcal{W},\{\|\cdot\|_{X},\mathcal{B}\},\Phi):=\{f:\ f\ \text{is implemented by a neural network with depth }\mathcal{D}\text{ and width }\mathcal{W},\ \|f\|_{X}\leq\mathcal{B},\ \rho_{i}^{(l)}\in\Phi\ \text{for each }i\text{ and }l\}.
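As a concrete illustration of this construction, the following is a minimal PyTorch sketch of such a fully connected network; the class names ReLU3 and MLP and the particular depth and width are illustrative assumptions, not part of the definition above.

```python
import torch
import torch.nn as nn

class ReLU3(nn.Module):
    """Cubic ReLU activation rho(x) = max(x, 0)^3 (defined in Section 2.2)."""
    def forward(self, x):
        return torch.relu(x) ** 3

class MLP(nn.Module):
    """Fully connected network with depth D (number of affine maps) and width W."""
    def __init__(self, in_dim, width, depth, out_dim=1, activation=ReLU3):
        super().__init__()
        layers = [nn.Linear(in_dim, width), activation()]   # layer l = 1
        for _ in range(depth - 2):                          # layers l = 2, ..., D-1
            layers += [nn.Linear(width, width), activation()]
        layers.append(nn.Linear(width, out_dim))            # affine output layer f_D
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

# Example: a network mapping (x, t) in R^{d+1} to u(x, t), here d = 1.
u_net = MLP(in_dim=2, width=64, depth=4)
```

The example instantiation with depth 4 and width 64 mirrors the network used for the 1D experiment in Section 4.1.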

Next, we introduce several standard function spaces, including the continuous function space and Sobolev space:

C(\Omega):=\{\text{all the continuous functions defined on }\Omega\},
C^{s}(\Omega):=\{f:\Omega\rightarrow\mathbb{R}\ |\ D^{\alpha}f\in C(\Omega),\ |\alpha|\leq s\},
C(\overline{\Omega}):=\{\text{all the continuous functions defined on }\overline{\Omega}\},\quad \|f\|_{C(\overline{\Omega})}:=\max_{x\in\overline{\Omega}}|f(x)|,
C^{s}(\overline{\Omega}):=\{f:\Omega\rightarrow\mathbb{R}\ |\ D^{\alpha}f\in C(\overline{\Omega}),\ |\alpha|\leq s\},\quad \|f\|_{C^{s}(\overline{\Omega})}:=\max_{x\in\overline{\Omega},|\alpha|\leq s}|D^{\alpha}f(x)|,
L^{p}(\Omega):=\Big\{f:\Omega\rightarrow\mathbb{R}\ |\ \int_{\Omega}|f|^{p}dx<\infty\Big\},\quad \|f\|_{L^{p}(\Omega)}:=\Big[\int_{\Omega}|f|^{p}dx\Big]^{1/p},\ \forall p\in[1,\infty),
L^{\infty}(\Omega):=\{f:\Omega\rightarrow\mathbb{R}\ |\ \exists C>0\ s.t.\ |f|\leq C\ a.e.\},\quad \|f\|_{L^{\infty}(\Omega)}:=\inf\{C\ |\ |f|\leq C\ a.e.\},
H^{s}(\Omega):=\{f:\Omega\rightarrow\mathbb{R}\ |\ D^{\alpha}f\in L^{2}(\Omega),\ |\alpha|\leq s\},\quad \|f\|_{H^{s}(\Omega)}:=\Big(\sum_{|\alpha|\leq s}\|D^{\alpha}f\|_{L^{2}(\Omega)}^{2}\Big)^{1/2}.

2.2 Stabilized PINNs for wave equations

Consider the following wave equation:

\begin{cases}u_{tt}-\Delta u=f,&(x,t)\in\Omega_{T},\\ u(x,0)=\varphi(x),&x\in\Omega,\\ u_{t}(x,0)=\psi(x),&x\in\Omega,\\ u(x,t)=g(x,t),&x\in\partial\Omega,\ t\in[0,T],\end{cases} (1)

where \Delta u=\sum_{i=1}^{d}u_{x_{i}x_{i}}, \Omega=(0,1)^{d} and \Omega_{T}=\Omega\times[0,T]. Throughout this article, we assume that this problem admits a unique solution.
Assumption 2.2. Assume (1) has a unique strong solution u^{*}\in C^{2}(\Omega_{T})\cap C(\overline{\Omega_{T}}).

Without loss of generality, we further assume that f,\varphi,\psi,g and their derivatives are bounded in L^{\infty} by a constant \kappa. Denote \mathcal{B}\triangleq\max\{2\|u^{*}\|_{C^{2}(\overline{\Omega_{T}})},\kappa^{4}\}. Instead of solving problem (1) by traditional numerical methods, we formulate (1) as a minimization problem on C^{2}(\Omega_{T})\cap C(\overline{\Omega_{T}}), with the loss functional \mathcal{L} defined as:

\mathcal{L}(u) := \|u_{tt}(x,t)-\Delta u(x,t)-f\|^{2}_{L^{2}(\Omega_{T})}+\|u(x,0)-\varphi(x)\|^{2}_{H^{1}(\Omega)}
+\|u_{t}(x,0)-\psi(x)\|^{2}_{L^{2}(\Omega)}+\|u(x,t)-g(x,t)\|^{2}_{H^{1}(\partial\Omega_{T})}. (2)
Remark 1

Different from the original loss in PINNs, we adopt the H^1-norm instead of the L^2-norm in the learning of the initial position and the boundary condition. This modification, as we will demonstrate, offers advantages in both theoretical analysis and numerical computation.

With Assumption 2.2, we know that u^{*} is also the unique minimizer of the loss functional \mathcal{L} and satisfies \mathcal{L}(u^{*})=0. Let |\Omega| and |\partial\Omega| be the measures of \Omega and \partial\Omega, namely, |\Omega|:=\int_{\Omega}1dx, |\partial\Omega|:=\int_{\partial\Omega}1dx, and let |T|:=\int_{0}^{T}1dt; then \mathcal{L}(u) can be equivalently written as

\mathcal{L}(u) = |\Omega||T|\mathbb{E}_{X\in U(\Omega),T\in U([0,T])}\big(u_{tt}(X,T)-\Delta u(X,T)-f(X,T)\big)^{2}
+|\Omega|\mathbb{E}_{X\in U(\Omega)}\Big[\big(u(X,0)-\varphi(X)\big)^{2}+\sum_{i=1}^{d}\big(u_{x_{i}}(X,0)-\varphi_{x_{i}}(X)\big)^{2}+\big(u_{t}(X,0)-\psi(X)\big)^{2}\Big]
+|\partial\Omega||T|\mathbb{E}_{Y\in U(\partial\Omega),T\in U([0,T])}\Big[\big(u(Y,T)-g(Y,T)\big)^{2}+\big(u_{t}(Y,T)-g_{t}(Y,T)\big)^{2}+\sum_{i=1}^{d}\big(u_{x_{i}}(Y,T)-g_{x_{i}}(Y,T)\big)^{2}\Big] (3)

where U(\Omega), U(\partial\Omega) and U([0,T]) are the uniform distributions on \Omega, \partial\Omega and [0,T], respectively. To minimize \mathcal{L}(u) approximately, a Monte Carlo discretization of \mathcal{L} is used:

\hat{\mathcal{L}}(u) = \frac{|\Omega||T|}{NK}\sum_{n=1}^{N}\sum_{k=1}^{K}\big(u_{tt}(X_{n},T_{k})-\Delta u(X_{n},T_{k})-f(X_{n},T_{k})\big)^{2}
+\frac{|\Omega|}{N}\sum_{n=1}^{N}\Big[\big(u(X_{n},0)-\varphi(X_{n})\big)^{2}+\sum_{i=1}^{d}\big(u_{x_{i}}(X_{n},0)-\varphi_{x_{i}}(X_{n})\big)^{2}+\big(u_{t}(X_{n},0)-\psi(X_{n})\big)^{2}\Big]
+\frac{|\partial\Omega||T|}{MK}\sum_{m=1}^{M}\sum_{k=1}^{K}\Big[\big(u(Y_{m},T_{k})-g(Y_{m},T_{k})\big)^{2}+\big(u_{t}(Y_{m},T_{k})-g_{t}(Y_{m},T_{k})\big)^{2}+\sum_{i=1}^{d}\big(u_{x_{i}}(Y_{m},T_{k})-g_{x_{i}}(Y_{m},T_{k})\big)^{2}\Big] (4)

where \{X_{n}\}_{n=1}^{N}, \{Y_{m}\}_{m=1}^{M} and \{T_{k}\}_{k=1}^{K} are independent and identically distributed samples from the uniform distributions U(\Omega), U(\partial\Omega) and U([0,T]), respectively. With this approximation, we solve the original problem (1) by empirical risk minimization:

\hat{u}_{\phi}=\arg\min_{u_{\phi}\in\mathcal{P}}\hat{\mathcal{L}}(u_{\phi}), (5)

where the admissible set \mathcal{P} is a deep neural network function class parameterized by \phi. In this work, we choose \mathcal{P} to be the ReLU^3 network function space, which ensures \mathcal{P}\subset C^{2}(\Omega_{T})\cap C(\overline{\Omega_{T}}). More precisely,

\mathcal{P}=\mathcal{N}(\mathcal{D},\mathcal{W},\{\|\cdot\|_{C^{2}(\overline{\Omega_{T}})},\mathcal{B}\},\{ReLU^{3}\}),

where \{\mathcal{D},\mathcal{W}\} will be specified later to ensure the desired accuracy. The ReLU^3 activation function is defined by

\rho(x)=\begin{cases}x^{3},&x\geq 0,\\ 0,&\text{otherwise}.\end{cases}

In practice, the minimizer of problem (5) is obtained through an optimization algorithm \mathcal{A}. We denote the resulting minimizer by u_{\phi_{\mathcal{A}}}.
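To make the empirical risk (4) and the role of the H^1 terms concrete, here is a minimal PyTorch sketch for the one-dimensional case d=1, where u_net maps (x,t) to u(x,t). The helper names (grad, spinns_loss) and the callables f, phi, phi_x, psi, g, g_t, g_x (the data and their derivatives) are illustrative assumptions; the weighting by the domain volumes follows (4).

```python
import torch

def grad(outputs, inputs):
    """First derivative of a batch of scalar outputs w.r.t. inputs."""
    return torch.autograd.grad(outputs, inputs,
                               grad_outputs=torch.ones_like(outputs),
                               create_graph=True)[0]

def spinns_loss(u_net, x_in, t_in, x_ic, x_bd, t_bd,
                f, phi, phi_x, psi, g, g_t, g_x,
                vol_omega, vol_bdry, T):
    """Empirical SPINNs risk (4) for the 1-D wave equation u_tt - u_xx = f.

    x_in, t_in: interior samples; x_ic: initial-time samples;
    x_bd, t_bd: boundary samples. All sample tensors must require grad.
    """
    # Residual term: kept in the L2 norm, exactly as in the original PINNs loss.
    u = u_net(torch.cat([x_in, t_in], dim=1))
    u_t, u_x = grad(u, t_in), grad(u, x_in)
    u_tt, u_xx = grad(u_t, t_in), grad(u_x, x_in)
    loss_res = vol_omega * T * ((u_tt - u_xx - f(x_in, t_in)) ** 2).mean()

    # Initial condition: H1 norm for the displacement, L2 norm for the velocity.
    t0 = torch.zeros_like(x_ic, requires_grad=True)
    u0 = u_net(torch.cat([x_ic, t0], dim=1))
    u0_x, u0_t = grad(u0, x_ic), grad(u0, t0)
    loss_ic = vol_omega * (((u0 - phi(x_ic)) ** 2).mean()
                           + ((u0_x - phi_x(x_ic)) ** 2).mean()
                           + ((u0_t - psi(x_ic)) ** 2).mean())

    # Boundary condition: H1 norm (values plus time and space derivatives).
    ub = u_net(torch.cat([x_bd, t_bd], dim=1))
    ub_t, ub_x = grad(ub, t_bd), grad(ub, x_bd)
    loss_bc = vol_bdry * T * (((ub - g(x_bd, t_bd)) ** 2).mean()
                              + ((ub_t - g_t(x_bd, t_bd)) ** 2).mean()
                              + ((ub_x - g_x(x_bd, t_bd)) ** 2).mean())
    return loss_res + loss_ic + loss_bc
```

In an actual training loop, each epoch would resample the collocation points, evaluate spinns_loss and take an Adam step, matching the procedure described in Section 4.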

3 Convergence analysis of SPINNs

In this section, we present a systematic error analysis of SPINNs for wave equations. To begin with, we review some basic notations and results from the PDE theory of wave equations.

For the wave equation, the total energy consists of two parts: the kinetic energy U and the potential energy V, both of which can be expressed as integrals over \Omega,

U=\frac{1}{2}\int_{\Omega}u_{t}^{2}dx,\qquad V=\frac{1}{2}\int_{\Omega}\Big(\sum_{i=1}^{d}u_{x_{i}}^{2}\Big)dx,

and their sum is called the energy integral. Up to a constant factor, the total energy of the wave equation (1) is denoted by

E(t)=\int_{\Omega}\Big(u_{t}^{2}+\sum_{i=1}^{d}u_{x_{i}}^{2}\Big)dx. (6)
Theorem 3.1 (Energy stability)

We denote E_{0}(t):=\int_{\Omega}u^{2}dx, the squared L^2 norm of u. Then the following energy inequality holds.

E(t)+E_{0}(t) \leq C(T)\Big(E(0)+E_{0}(0)+\int_{0}^{T}\int_{\Omega}f^{2}dxdt+2\int_{0}^{T}\int_{\partial\Omega}|u_{t}|\cdot\|\nabla u\|dsdt\Big).
Proof

See Appendix 6.1 for details.

3.1 Risk decomposition

By the definitions of u^{*} and u_{\phi_{\mathcal{A}}}, we can decompose the risk in SPINNs as

\mathcal{L}(u_{\phi_{\mathcal{A}}})-\mathcal{L}(u^{*})= \mathcal{L}(u_{\phi_{\mathcal{A}}})-\hat{\mathcal{L}}(u_{\phi_{\mathcal{A}}})+\hat{\mathcal{L}}(u_{\phi_{\mathcal{A}}})-\hat{\mathcal{L}}(\hat{u}_{\phi})+\hat{\mathcal{L}}(\hat{u}_{\phi})-\hat{\mathcal{L}}(\overline{u})+\hat{\mathcal{L}}(\overline{u})-\mathcal{L}(\overline{u})+\mathcal{L}(\overline{u})-\mathcal{L}(u^{*})
= \big[\hat{\mathcal{L}}(\hat{u}_{\phi})-\hat{\mathcal{L}}(\overline{u})\big]+\big[\mathcal{L}(u_{\phi_{\mathcal{A}}})-\hat{\mathcal{L}}(u_{\phi_{\mathcal{A}}})\big]+\big[\hat{\mathcal{L}}(\overline{u})-\mathcal{L}(\overline{u})\big]+\big[\mathcal{L}(\overline{u})-\mathcal{L}(u^{*})\big]+\big[\hat{\mathcal{L}}(u_{\phi_{\mathcal{A}}})-\hat{\mathcal{L}}(\hat{u}_{\phi})\big],

where \overline{u} is an arbitrary element of \mathcal{P}. Since \hat{\mathcal{L}}(\hat{u}_{\phi})-\hat{\mathcal{L}}(\overline{u})\leq 0 and \overline{u} is arbitrary in \mathcal{P}, we have:

\mathcal{L}(u_{\phi_{\mathcal{A}}})-\mathcal{L}(u^{*})\leq\underbrace{2\sup_{u\in\mathcal{P}}\big|\mathcal{L}(u)-\hat{\mathcal{L}}(u)\big|}_{\varepsilon_{sta}}+\underbrace{\inf_{u\in\mathcal{P}}\big|\mathcal{L}(u)-\mathcal{L}(u^{*})\big|}_{\varepsilon_{app}}+\underbrace{\big[\hat{\mathcal{L}}(u_{\phi_{\mathcal{A}}})-\hat{\mathcal{L}}(\hat{u}_{\phi})\big]}_{\varepsilon_{opt}}

Thus, we have decomposed the total risk into the approximation error (\varepsilon_{app}), the statistical error (\varepsilon_{sta}) and the optimization error (\varepsilon_{opt}). While the approximation error describes the expressive power of the ReLU^3 network, the statistical error is caused by the Monte Carlo discretization, and the optimization error reflects the performance of the solver \mathcal{A} we use. In this work, we assume for simplicity that the neural network can be trained well enough that \varepsilon_{opt}=0, and leave the optimization error for future study. In this case, \hat{u}_{\phi}=u_{\phi_{\mathcal{A}}}.

3.2 Lower bound of risk

Next, based on the energy stability of wave equations, we present a lower bound of the risk \mathcal{L}(\hat{u}_{\phi})-\mathcal{L}(u^{*}) in SPINNs. As we will demonstrate later, the risk can be made arbitrarily small if the network and the sample complexity are well chosen, and thus we may assume \mathcal{L}(\hat{u}_{\phi})-\mathcal{L}(u^{*})<1. Let v=\hat{u}_{\phi}-u^{*} be the error between the numerical solution and the exact solution; then

\begin{cases}v_{tt}-\Delta v=(\hat{u}_{\phi})_{tt}-\Delta\hat{u}_{\phi}-f\triangleq\tilde{f},&x\in\Omega,\ t\in[0,T],\\ v(x,0)=\hat{u}_{\phi}(x,0)-\varphi(x),&x\in\Omega,\\ v_{t}(x,0)=(\hat{u}_{\phi})_{t}(x,0)-\psi(x),&x\in\Omega,\\ v(x,t)=\hat{u}_{\phi}(x,t)-g(x,t),&x\in\partial\Omega,\ t\in[0,T],\end{cases} (7)

and \|v\|_{C^{2}(\overline{\Omega_{T}})}\leq\frac{3}{2}\mathcal{B}. By applying Theorem 3.1 to equation (7), we obtain

\|\hat{u}_{\phi}-u^{*}\|^{2}_{H^{1}}
= \int_{\Omega}\Big(v_{t}^{2}+\sum_{i=1}^{d}v_{x_{i}}^{2}\Big)dx+\int_{\Omega}v^{2}dx
\leq C(T)\Big(E(0)+E_{0}(0)+\int_{0}^{T}\int_{\Omega}\tilde{f}^{2}dxdt+2\int_{0}^{T}\int_{\partial\Omega}|v_{t}|\cdot\|\nabla v\|dsdt\Big)
\leq C(T)\Big(\int_{\Omega}\big(v_{t}(x,0)^{2}+\sum_{i=1}^{d}v_{x_{i}}(x,0)^{2}\big)dx+\int_{\Omega}v(x,0)^{2}dx+\int_{0}^{T}\int_{\Omega}\big((\hat{u}_{\phi})_{tt}-\Delta\hat{u}_{\phi}-f\big)^{2}dxdt+3\sqrt{d}\mathcal{B}\int_{0}^{T}\int_{\partial\Omega}|v_{t}|dsdt\Big)
\leq C(T)\Big(\int_{\Omega}\big(v_{t}(x,0)^{2}+\sum_{i=1}^{d}v_{x_{i}}(x,0)^{2}\big)dx+\int_{\Omega}v(x,0)^{2}dx+\int_{0}^{T}\int_{\Omega}\big((\hat{u}_{\phi})_{tt}-\Delta\hat{u}_{\phi}-f\big)^{2}dxdt+3\sqrt{d}\mathcal{B}|\partial\Omega_{T}|\Big(\int_{0}^{T}\int_{\partial\Omega}v_{t}^{2}dsdt\Big)^{\frac{1}{2}}\Big)
\leq C(T)\Big(\|(\hat{u}_{\phi})_{tt}-\Delta\hat{u}_{\phi}-f\|^{2}_{L^{2}(\Omega_{T})}+\|(\hat{u}_{\phi})_{t}(x,0)-\psi(x)\|^{2}_{L^{2}(\Omega)}+\|\hat{u}_{\phi}(x,0)-\varphi(x)\|^{2}_{H^{1}(\Omega)}+3\sqrt{d}\mathcal{B}|\partial\Omega_{T}|\,\|\hat{u}_{\phi}(x,t)-g(x,t)\|_{H^{1}(\partial\Omega_{T})}\Big)
\leq C(T)\Big(\mathcal{L}(\hat{u}_{\phi})+3\sqrt{d}\mathcal{B}|\partial\Omega_{T}|\mathcal{L}(\hat{u}_{\phi})^{\frac{1}{2}}\Big)
\leq C(T)\big(1+3\sqrt{d}\mathcal{B}|\partial\Omega_{T}|\big)\mathcal{L}(\hat{u}_{\phi})^{\frac{1}{2}}\quad\big(\text{since }\mathcal{L}(\hat{u}_{\phi})<1\big)
\leq \gamma\big(\mathcal{L}(\hat{u}_{\phi})-\mathcal{L}(u^{*})\big)^{\frac{1}{2}},\quad\big(\text{since }\mathcal{L}(u^{*})=0\big)

where we define \gamma(T,d,\mathcal{B},|\partial\Omega_{T}|)\triangleq C(T)(1+3\sqrt{d}\mathcal{B}|\partial\Omega_{T}|). Combining this lower bound with the previous risk decomposition, we arrive at:

\|\hat{u}_{\phi}-u^{*}\|^{4}_{H^{1}}\leq\gamma^{2}(\varepsilon_{app}+\varepsilon_{sta}). (8)

3.3 Approximation error

By applying the following lemma, proved in our previous work, we obtain an upper bound on the approximation error:

Lemma 1

For any \overline{u}\in C^{3}(\overline{\Omega_{T}}) and \varepsilon>0, there exists a ReLU^3 network u_{\phi} with depth [\log_{2}d]+2 and width C(d,\|\overline{u}\|_{C^{3}(\overline{\Omega_{T}})})(\frac{1}{\varepsilon})^{d+1} such that

\|\overline{u}-u_{\phi}\|_{C^{2}(\overline{\Omega_{T}})}\leq\varepsilon.
Proof

A special case of Corollary 4.2 in AAM-39-239 .

Theorem 3.2

Under Assumption 2.2 and the condition that u^{*}\in C^{3}(\overline{\Omega_{T}}), for any \varepsilon>0, if we choose the following neural network function class:

\mathcal{P} = \mathcal{N}\big([\log_{2}d]+2,\ C(d,|\Omega|,|\partial\Omega|,|T|,\|u^{*}\|_{C^{3}(\overline{\Omega_{T}})})(\tfrac{1}{\varepsilon})^{d+1},\ \{\|\cdot\|_{C^{2}(\overline{\Omega_{T}})},2\|u^{*}\|_{C^{2}(\overline{\Omega_{T}})}\},\ \{ReLU^{3}\}\big),

then the approximation error satisfies \varepsilon_{app}\leq C(d,|\Omega_{T}|,|\partial\Omega_{T}|)\varepsilon^{2}.

Proof

See Appendix 6.2 for details.

3.4 Statistical error

The following theorem demonstrates that with sufficiently large sample complexity, the statistical error can be well controlled:

Theorem 3.3

Let \mathcal{D},\mathcal{W}\in\mathbb{N} and \mathcal{B}\in\mathbb{R}^{+}. For any \varepsilon\geq 0, if the numbers of samples satisfy:

\begin{cases}N=C(d,|\Omega|,\mathcal{B})\mathcal{D}^{4}\mathcal{W}^{2}(\mathcal{D}+\log(\mathcal{W}))(\frac{1}{\varepsilon})^{2+\delta},\\ K=C(d,|T|,\mathcal{B})\mathcal{D}^{2}f_{K}(\mathcal{D},\mathcal{W})(\frac{1}{\varepsilon})^{k_{1}},\\ M=C(d,|\partial\Omega|,\mathcal{B})f_{M}(\mathcal{D},\mathcal{W})(\frac{1}{\varepsilon})^{k_{2}},\end{cases}

where f_{K}(\mathcal{D},\mathcal{W})\geq 1, f_{M}(\mathcal{D},\mathcal{W})\geq 1, and \delta is an arbitrarily small positive number such that

\begin{cases}k_{1}+k_{2}=2+\delta,\\ f_{K}(\mathcal{D},\mathcal{W})\cdot f_{M}(\mathcal{D},\mathcal{W})=\mathcal{D}^{2}\mathcal{W}^{2}(\mathcal{D}+\log(\mathcal{W})),\end{cases}

then we have:

\mathbb{E}_{\{X_{n}\}_{n=1}^{N},\{Y_{m}\}_{m=1}^{M},\{T_{k}\}_{k=1}^{K}}\sup_{u\in\mathcal{P}}|\mathcal{L}(u)-\hat{\mathcal{L}}(u)|\leq\varepsilon.
Proof

See Appendix 6.3 for details.

3.5 Convergence rate of SPINNs

With the bounds on the approximation and statistical errors prepared in the last two subsections, we now give the main result.

Theorem 3.4

Under Assumption 2.2 and the condition that u^{*}\in C^{3}(\overline{\Omega_{T}}), for any \varepsilon>0, if we choose the parameterized neural network class

\mathcal{P} = \mathcal{N}\big([\log_{2}d]+2,\ C(d,|\Omega|,|\partial\Omega|,|T|,\|u^{*}\|_{C^{3}(\overline{\Omega_{T}})})(\tfrac{1}{\varepsilon^{2}})^{d+1},\ \{\|\cdot\|_{C^{2}(\overline{\Omega_{T}})},2\|u^{*}\|_{C^{2}(\overline{\Omega_{T}})}\},\ \{ReLU^{3}\}\big)

and let the number of samples be

\begin{cases}N=C(d,|\Omega|,\mathcal{B})(\frac{1}{\varepsilon^{4}})^{d+3+\delta},\\ K=C(d,|T|,\mathcal{B})(\frac{1}{\varepsilon^{4}})^{k_{1}+\tilde{k}_{1}},\\ M=C(d,|\partial\Omega|,\mathcal{B})(\frac{1}{\varepsilon^{4}})^{k_{2}+\tilde{k}_{2}},\end{cases}

where \tilde{k}_{1},\tilde{k}_{2}\geq 0 and \delta is an arbitrarily small positive number such that

{k1+k2=2+δ,k~1+k~2=d+1,\begin{cases}k_{1}+k_{2}=2+\delta,\\ \tilde{k}_{1}+\tilde{k}_{2}=d+1,\end{cases}

then we have:

\mathbb{E}_{\{X_{n}\}_{n=1}^{N},\{Y_{m}\}_{m=1}^{M},\{T_{k}\}_{k=1}^{K}}\|\hat{u}_{\phi}-u^{*}\|_{H^{1}(\overline{\Omega_{T}})}\leq\varepsilon.
Proof

By Theorem 3.2, if we set the neural network function class as:

\mathcal{P} = \mathcal{N}\big([\log_{2}d]+2,\ C(d,|\Omega|,|\partial\Omega|,|T|,\|u^{*}\|_{C^{3}(\overline{\Omega_{T}})})(\tfrac{1}{\varepsilon^{2}})^{d+1},\ \{\|\cdot\|_{C^{2}(\overline{\Omega_{T}})},2\|u^{*}\|_{C^{2}(\overline{\Omega_{T}})}\},\ \{ReLU^{3}\}\big) (9)

the approximation error can be arbitrarily small:

\varepsilon_{app}\leq\frac{\varepsilon^{4}}{2\gamma^{2}} (10)

Without loss of generality we assume that ε\varepsilon is small enough such that

\|\hat{u}_{\phi}\|_{C^{2}(\overline{\Omega_{T}})}\leq\|u^{*}-\hat{u}_{\phi}\|_{C^{2}(\overline{\Omega_{T}})}+\|u^{*}\|_{C^{2}(\overline{\Omega_{T}})}\leq 2\|u^{*}\|_{C^{2}(\overline{\Omega_{T}})}.

By Theorem 3.3, when the numbers of samples are:

\begin{cases}N=C(d,|\Omega|,\mathcal{B})(\frac{1}{\varepsilon^{4}})^{d+3+\delta},\\ K=C(d,|T|,\mathcal{B})(\frac{1}{\varepsilon^{4}})^{k_{1}+\tilde{k}_{1}},\\ M=C(d,|\partial\Omega|,\mathcal{B})(\frac{1}{\varepsilon^{4}})^{k_{2}+\tilde{k}_{2}},\end{cases} (11)

where \delta is an arbitrarily small positive number and

{k1+k2=2+δ,k~1+k~2=d+1,\begin{cases}k_{1}+k_{2}=2+\delta,\\ \tilde{k}_{1}+\tilde{k}_{2}=d+1,\end{cases} (12)

we have:

\mathbb{E}_{\{X_{n}\}_{n=1}^{N},\{Y_{m}\}_{m=1}^{M},\{T_{k}\}_{k=1}^{K}}\varepsilon_{sta}\leq\frac{\varepsilon^{4}}{2\gamma^{2}} (13)

Combining (8), (10) and (13), we obtain the final result:

\mathbb{E}_{\{X_{n}\}_{n=1}^{N},\{Y_{m}\}_{m=1}^{M},\{T_{k}\}_{k=1}^{K}}\|\hat{u}_{\phi}-u^{*}\|_{H^{1}(\overline{\Omega_{T}})}
\leq \mathbb{E}_{\{X_{n}\}_{n=1}^{N},\{Y_{m}\}_{m=1}^{M},\{T_{k}\}_{k=1}^{K}}\big[\gamma^{2}\big(\mathcal{L}(\hat{u}_{\phi})-\mathcal{L}(u^{*})\big)\big]^{1/4}
\leq \varepsilon. (14)

4 Numerical Experiments

In this section, we use SPINNs to solve wave equations in one and two dimensions.

4.1 1D example

Consider the following 1D wave equation on \Omega=[-2,2] from t=0 to t=T=8:

\begin{cases}u_{tt}=u_{xx},&x\in\Omega,\ 0\leq t\leq 8,\\ u(0,x)=0,&x\in\Omega,\\ u_{t}(0,x)=0,&x\in\Omega,\\ u(t,-2)=\sin(0.8\pi t),&0\leq t\leq 8,\\ u(t,2)=0,&0\leq t\leq 8.\end{cases} (15)

For the training with SPINNs, we use a four-layer ReLU^3 network with 64 neurons in each layer to approximate the solution. We use the Adam algorithm for the minimization, and the initial learning rate is set to 1E-3.

As for the sample complexity, we train SPINNs with 10000 interior points, 500 boundary points (250 for each end) and 250 initial points in each epoch, all of which are sampled from a uniform distribution (see (a) in Figure 1). Furthermore, to obtain better accuracy, we apply the GAS method jiao2023gas as an adaptive sampling strategy. After every 250 epochs, we adaptively add 600 interior points, 30 boundary points and 15 initial points based on a Gaussian mixture distribution. The GAS procedure is repeated 10 times; see jiao2023gas for more details. For evaluation, we use the central difference method with a fine mesh (dx=0.01, dt=0.009) to obtain a reference solution u_{cdm} (see (b) in Figure 1), with which we calculate the following relative error by numerical integration:

\text{Relative error}=\frac{\|u_{\phi}-u_{cdm}\|_{2}}{\|u_{cdm}\|_{2}}.
Figure 1: Collocation points (left) and reference solution (right) in [0,T]\times\Omega. The red, green and blue points stand for the initial points, boundary points and interior points, respectively.
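As a concrete illustration of this sampling setup, the snippet below draws one epoch of uniform training points for (15) and evaluates the relative error against a reference solution; the array names and the helper relative_error are illustrative assumptions, and the adaptive GAS resampling from jiao2023gas is not shown.

```python
import torch

# One epoch of uniform samples for the 1-D example (15): Omega = [-2, 2], T = 8.
x_in = 4.0 * torch.rand(10000, 1) - 2.0   # interior x ~ U(-2, 2)
t_in = 8.0 * torch.rand(10000, 1)         # interior t ~ U(0, 8)
t_bd = 8.0 * torch.rand(250, 1)           # boundary times (used at x = -2 and x = 2)
x_ic = 4.0 * torch.rand(250, 1) - 2.0     # initial-time samples at t = 0

# Relative L2 error against a reference solution evaluated on the same grid.
def relative_error(u_pred: torch.Tensor, u_ref: torch.Tensor) -> torch.Tensor:
    return torch.linalg.norm(u_pred - u_ref) / torch.linalg.norm(u_ref)
```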

Figure 2 shows the numerical results of PINNs and SPINNs for (15) after N_G rounds of adaptive sampling by GAS. As we can see, the SPINNs method converges faster than the classical PINNs; for example, after 5 rounds of adaptive sampling, SPINNs has already captured all six peaks of the standing waves generated by the superposition of the reflected and right-traveling waves. Furthermore, we present the relative errors of PINNs and SPINNs in Figure 3, which shows that our method achieves a lower relative error at the early stage of training. On the other hand, as the number of adaptive sampling rounds increases, the classical PINNs can also reach a comparable accuracy, which is also revealed by (g) in Figure 2. These results reflect the fact that training with SPINNs can speed up the convergence of the solution, especially when the number of samples is relatively small.

(a) N_G=2, u^{PINNs}
(b) N_G=2, u^{SPINNs}
(c) N_G=5, u^{PINNs}
(d) N_G=5, u^{SPINNs}
(e) N_G=8, u^{PINNs}
(f) N_G=8, u^{SPINNs}
(g) N_G=11, u^{PINNs}
(h) N_G=11, u^{SPINNs}
Figure 2: The numerical result of PINNs and SPINNs for equation (15).
Figure 3: The relative error of PINNs and SPINNs.

4.2 2D example

Consider the following 2D wave equation on \Omega=[-2,2]^{2} from t=0 to t=1:

\begin{cases}u_{tt}=u_{xx}+u_{yy},&(x,y)\in\Omega,\ 0\leq t\leq 1,\\ u(0,x,y)=\sin(2\pi\sqrt{x^{2}+y^{2}}),&\sqrt{x^{2}+y^{2}}<1,\\ u(0,x,y)=0,&\sqrt{x^{2}+y^{2}}\geq 1,\\ u_{t}(0,x,y)=0,&(x,y)\in\Omega,\\ u(t,x,y)=0,&(x,y)\in\partial\Omega,\ 0\leq t\leq 1,\end{cases} (16)

For the training with SPINNs, based on our experiments, we choose a three-layer ReLU^3 network with 512 neurons in each layer to approximate the solution. The optimization algorithm and the initial learning rate are the same as before.

As for the sample complexity, we train SPINNs with 2000 interior points, 4000 boundary points (1000 for each edge) and 1000 initial points in each epoch, all of which are sampled from a uniform distribution. For evaluation, we use the central difference method with a fine mesh (dx=dy=0.01, dt=0.004) to obtain a reference solution u_{cdm}, with which we calculate the pointwise absolute error |u_{cdm}-u_{\phi}|. As we can observe from Figure 4 and Figure 5, the SPINNs method achieves a lower pointwise error after training for the same number of epochs. This superiority can be attributed to the fact that learning with derivative information improves the accuracy of fitting the initial condition.

(a) t=0, u_{\phi}
(b) t=0, |u_{cdm}-u_{\phi}|
(c) t=0.1, u_{\phi}
(d) t=0.1, |u_{cdm}-u_{\phi}|
(e) t=0.2, u_{\phi}
(f) t=0.2, |u_{cdm}-u_{\phi}|
(g) t=0.3, u_{\phi}
(h) t=0.3, |u_{cdm}-u_{\phi}|
Figure 4: The numerical result for (16) after 10000 epochs training of PINNs.
(a) t=0, u_{\phi}
(b) t=0, |u_{cdm}-u_{\phi}|
(c) t=0.1, u_{\phi}
(d) t=0.1, |u_{cdm}-u_{\phi}|
(e) t=0.2, u_{\phi}
(f) t=0.2, |u_{cdm}-u_{\phi}|
(g) t=0.3, u_{\phi}
(h) t=0.3, |u_{cdm}-u_{\phi}|
Figure 5: The numerical result for (16) after 10000 epochs training of SPINNs.

5 Conclusion

In this work, we propose a stabilized physics informed neural networks method, SPINNs, for wave equations. We rigorously prove that SPINNs is a stable learning method, in the sense that the solution error can be controlled by the loss. Based on this, a non-asymptotic convergence rate of SPINNs is presented, which provides a solid theoretical foundation for its use. Furthermore, by applying SPINNs to the simulation of two wave propagation problems, we numerically demonstrate that SPINNs can achieve higher training accuracy and efficiency, especially when the number of samples is limited. On the other hand, how to extend this method to more difficult situations such as high-dimensional problems, and how to handle the optimization error in our convergence analysis, still need to be studied. We leave these topics for future research.

Acknowledgement

This work is supported by the National Key Research and Development Program of China (No.2020YFA0714200), by the National Nature Science Foundation of China (No.12371441, No.12301558, No.12125103, No.12071362), and by the Fundamental Research Funds for the Central Universities.

6 Appendix

6.1 Appendix for energy integral of wave equations

According to the Gauss (divergence) formula, we have

Ω(i=1d(utxiuxi)+ut(i=1duxixi))𝑑x\displaystyle\int_{\Omega}\left(\sum_{i=1}^{d}(u_{tx_{i}}u_{x_{i}})+u_{t}(\sum_{i=1}^{d}u_{x_{i}x_{i}})\right)dx =Ω((utu))𝑑x\displaystyle=\int_{\Omega}(\nabla\cdot(u_{t}\nabla u))dx
=Ωutu𝐧ds,\displaystyle=\int_{\partial\Omega}u_{t}\nabla u\cdot\mathbf{n}ds, (17)

where \mathbf{n} is the unit outward normal vector. Combining (6) and (17), we have

dE(t)dt\displaystyle\frac{dE(t)}{dt} =2Ω(ututt+i=1duxiuxit)𝑑x\displaystyle=2\int_{\Omega}\left(u_{t}u_{tt}+\sum_{i=1}^{d}u_{x_{i}}u_{x_{i}t}\right)dx
=2Ω(ututtuti=1duxixi)𝑑x+2Ωutu𝐧ds\displaystyle=2\int_{\Omega}\left(u_{t}u_{tt}-u_{t}\sum_{i=1}^{d}u_{x_{i}x_{i}}\right)dx+2\int_{\partial\Omega}u_{t}\nabla u\cdot\mathbf{n}ds
=2Ωutf𝑑x+2Ωutu𝐧ds\displaystyle=2\int_{\Omega}u_{t}fdx+2\int_{\partial\Omega}u_{t}\nabla u\cdot\mathbf{n}ds
Ω(ut2+f2)𝑑x+2Ω|ut|u𝑑s.\displaystyle\leq\int_{\Omega}(u_{t}^{2}+f^{2})dx+2\int_{\partial\Omega}|u_{t}|\cdot\|\nabla u\|ds. (18)

Multiplying both sides of the inequality (18) by e^{-t},

d(etE(t))dtet(Ωf2𝑑x+2Ω|ut|u𝑑s).\displaystyle\frac{d(e^{-t}E(t))}{dt}\leq e^{-t}\left(\int_{\Omega}f^{2}dx+2\int_{\partial\Omega}|u_{t}|\cdot\|\nabla u\|ds\right). (19)

Then, integrating the equation (19) from 0 to t,

E(t)et(E(0)+0teτΩf2𝑑x𝑑τ+20teτΩ|ut|u𝑑s𝑑τ)\displaystyle E(t)\leq e^{t}\left(E(0)+\int_{0}^{t}e^{-\tau}\int_{\Omega}f^{2}dxd\tau+2\int_{0}^{t}e^{-\tau}\int_{\partial\Omega}|u_{t}|\cdot\|\nabla u\|dsd\tau\right)

For any t[0,T]t\in[0,T],

E(t)C1(E(0)+0TΩf2𝑑x𝑑t+20TΩ|ut|u𝑑s𝑑t),\displaystyle E(t)\leq C_{1}\left(E(0)+\int_{0}^{T}\int_{\Omega}f^{2}dxdt+2\int_{0}^{T}\int_{\partial\Omega}|u_{t}|\cdot\|\nabla u\|dsdt\right), (20)

where C_{1} is a constant depending only on T. Further, we have

dE0(t)dt=Ω2uut𝑑xΩu2𝑑x+Ωut2𝑑xE0(t)+E(t)\displaystyle\frac{dE_{0}(t)}{dt}=\int_{\Omega}2uu_{t}dx\leq\int_{\Omega}u^{2}dx+\int_{\Omega}u_{t}^{2}dx\leq E_{0}(t)+E(t) (21)

Multiplying both sides of (21) by e^{-t},

ddt(etE0(t))etE(t)\displaystyle\frac{d}{dt}(e^{-t}E_{0}(t))\leq e^{-t}E(t) (22)

Integrating (22) from 0 to t,

E0(t)etE0(0)+et0teτE(τ)𝑑τ\displaystyle E_{0}(t)\leq e^{t}E_{0}(0)+e^{t}\int_{0}^{t}e^{-\tau}E(\tau)d\tau (23)

For any t[0,T]t\in[0,T],

E0(t)C2(E0(0)+E(t)),\displaystyle E_{0}(t)\leq C_{2}\left(E_{0}(0)+E(t)\right), (24)

where C_{2} is a constant depending only on T. Combining (20) and (24), we have

E(t)+E0(t)C(E(0)+E0(0)+0TΩf2dxdt\displaystyle E(t)+E_{0}(t)\leq C\left(E(0)+E_{0}(0)+\int_{0}^{T}\int_{\Omega}f^{2}dxdt\right. (25)
+20TΩ|ut|udsdt).\displaystyle+\left.2\int_{0}^{T}\int_{\partial\Omega}|u_{t}|\cdot\|\nabla u\|dsdt\right). (26)

where C is a constant depending only on T.

6.2 Appendix for approximation error

Proof of Theorem 3.2. According to Lemma 1, for u^{*}\in C^{3}(\overline{\Omega_{T}}) and any \varepsilon>0, there exists a ReLU^3 network \hat{u}_{\phi} with depth [\log_{2}d]+2 and width C(d,\|u^{*}\|_{C^{3}(\overline{\Omega_{T}})})(\frac{1}{\varepsilon})^{d+1} such that \|v(x,t)\|_{C^{2}(\overline{\Omega_{T}})}=\|u^{*}-\hat{u}_{\phi}\|_{C^{2}(\overline{\Omega_{T}})}\leq\varepsilon. Hence,

εapp\displaystyle\varepsilon_{app} |(u^ϕ)(u)|\displaystyle\leq|\mathcal{L}(\hat{u}_{\phi})-\mathcal{L}(u^{*})|
=(u^ϕ)tt(x,t)Δu^ϕ(x,t)fL2(ΩT)2+u^ϕ(x,0)φ(x)H1(Ω)2\displaystyle=\|(\hat{u}_{\phi})_{tt}(x,t)-\Delta\hat{u}_{\phi}(x,t)-f\|^{2}_{L^{2}(\Omega_{T})}+\|\hat{u}_{\phi}(x,0)-\varphi(x)\|^{2}_{H^{1}(\Omega)}
+(u^ϕ)t(x,0)ψ(x)L2(Ω)2+u^ϕ(x,t)g(x,t)H1(ΩT)2\displaystyle+\|(\hat{u}_{\phi})_{t}(x,0)-\psi(x)\|^{2}_{L^{2}(\Omega)}+\|\hat{u}_{\phi}(x,t)-g(x,t)\|^{2}_{H^{1}(\partial\Omega_{T})}
vttL2(ΩT)2+ΔvL2(ΩT)2+v(x,0)H1(Ω)2+vt(x,0)L2(Ω)2+vH1(ΩT)2\displaystyle\leq\|v_{tt}\|^{2}_{L^{2}(\Omega_{T})}+\|\Delta v\|^{2}_{L^{2}(\Omega_{T})}+\|v(x,0)\|^{2}_{H^{1}(\Omega)}+\|v_{t}(x,0)\|^{2}_{L^{2}(\Omega)}+\|v\|_{H^{1}(\partial\Omega_{T})}^{2}
C(d,|ΩT|,|ΩT|)ε2.\displaystyle\leq C(d,|\Omega_{T}|,|\partial\Omega_{T}|)\cdot\varepsilon^{2}.

6.3 Appendix for statistical error

In this section we give the precise upper bound on the statistical error. To begin with, we introduce several basic concepts and results from learning theory.

Definition 1

(Rademacher complexity) The Rademacher complexity of a set ANA\subseteq\mathbb{R}^{N} is defined by

(A)=𝔼{σk}k=1N[supaA1Nk=1Nσkak],\displaystyle\Re(A)=\mathbb{E}_{\{\sigma_{k}\}_{k=1}^{N}}\Bigg{[}\ \sup_{a\in A}\frac{1}{N}\sum_{k=1}^{N}\sigma_{k}a_{k}\Bigg{]},

where {σk}k=1N\{\sigma_{k}\}_{k=1}^{N} are NN i.i.d Rademacher variables with p(σk=1)=p(σk=1)=12p(\sigma_{k}=1)=p(\sigma_{k}=-1)=\frac{1}{2}.

Let Ω\Omega be a set and \mathcal{F} be a function class which maps Ω\Omega to \mathbb{R}. Let PP be a probability distribution over Ω\Omega and {Xk}k=1N\{X_{k}\}_{k=1}^{N} be i.i.d. samples from PP. The Rademacher complexity of \mathcal{F} associated with distribution PP and sample size NN is defined by

P,N()=𝔼{Xk,σk}k=1N[supu1Nk=1Nσku(Xk)].\displaystyle\Re_{P,N}(\mathcal{F})=\mathbb{E}_{\{X_{k},\sigma_{k}\}_{k=1}^{N}}\Bigg{[}\ \sup_{u\in\mathcal{F}}\frac{1}{N}\sum_{k=1}^{N}\sigma_{k}u(X_{k})\Bigg{]}.
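As a small illustration of Definition 1, the sketch below Monte Carlo estimates the Rademacher complexity of a finite set of sample vectors; in the analysis above \mathcal{F} is an infinite network class, so this snippet is only meant to make the expectation over the random signs concrete (the function name and the finite-class restriction are assumptions for illustration).

```python
import torch

def empirical_rademacher(outputs: torch.Tensor, n_draws: int = 1000) -> float:
    """Monte Carlo estimate of R(A) for the finite set
    A = {(u_j(X_1), ..., u_j(X_N)) : j = 1, ..., J}, passed as a (J, N) tensor
    with outputs[j, k] = u_j(X_k)."""
    num_funcs, N = outputs.shape
    total = 0.0
    for _ in range(n_draws):
        # i.i.d. Rademacher signs sigma_k in {-1, +1}
        sigma = (torch.randint(0, 2, (N,)) * 2 - 1).to(outputs.dtype)
        total += (outputs @ sigma / N).max().item()  # sup over the finite class
    return total / n_draws
```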
Lemma 2

Let Ω\Omega be a set and PP be a probability distribution over Ω\Omega. Let NN\in\mathbb{N}. Assume that ω:Ω\omega:\Omega\rightarrow\mathbb{R} and |ω(x)||\omega(x)|\leq\mathcal{B} for all xΩx\in\Omega, then for any function class \mathcal{F} mapping Ω\Omega to \mathbb{R}, there holds

P,N(ω(x))P,N(),\displaystyle\Re_{P,N}(\omega(x)\mathcal{F})\leq\mathcal{B}\Re_{P,N}(\mathcal{F}),

where ω(x):={u¯:u¯(x)=ω(x)u(x),u}.\omega(x)\mathcal{F}:=\{\overline{u}:\overline{u}(x)=\omega(x)u(x),u\in\mathcal{F}\}.

Proof

See jiao2022rate for the proof.

Definition 2

(Covering number) Suppose that W\subset\mathbb{R}^{n}. For any \varepsilon>0, let V\subset\mathbb{R}^{n} be an \varepsilon-cover of W with respect to the distance d_{\infty}, that is, for any u\in W there exists a v\in V such that d_{\infty}(u,v)<\varepsilon, where d_{\infty} is defined by d_{\infty}(u,v):=\max_{1\leq i\leq n}|u_{i}-v_{i}|. The covering number \mathcal{C}(\varepsilon,W,d_{\infty}) is defined to be the minimum cardinality among all \varepsilon-covers of W with respect to the distance d_{\infty}.

Definition 3

(Uniform covering number) Suppose that \mathcal{F} is a class of functions from \Omega to \mathbb{R}. Given n samples Z_{n}=(Z_{1},Z_{2},\cdots,Z_{n})\in\Omega^{n}, the set \mathcal{F}|_{Z_{n}}\subset\mathbb{R}^{n} is defined by

\mathcal{F}|_{Z_{n}}=\{(u(Z_{1}),u(Z_{2}),\cdots,u(Z_{n})):u\in\mathcal{F}\}.

The uniform covering number 𝒞(ε,,n)\mathcal{C}_{\infty}(\varepsilon,\mathcal{F},n) is defined by

𝒞(ε,,n)=maxZnΩn𝒞(ε,|Zn,d).\displaystyle\mathcal{C}_{\infty}(\varepsilon,\mathcal{F},n)=\max_{Z_{n}\in\Omega^{n}}\mathcal{C}(\varepsilon,\mathcal{F}|_{Z_{n}},d_{\infty}).
Lemma 3

Let \Omega be a set and P be a probability distribution over \Omega. Let N\in\mathbb{N}_{\geq 1}, and let \mathcal{F} be a class of functions from \Omega to \mathbb{R} such that 0\in\mathcal{F} and the diameter of \mathcal{F} is less than \mathcal{B}, i.e., \|u\|_{L^{\infty}(\Omega)}\leq\mathcal{B} for all u\in\mathcal{F}. Then

P,N()inf0<δ<(4δ+12Nδlog(2𝒞(ε,,N))𝑑ε).\displaystyle\Re_{P,N}(\mathcal{F})\leq\inf_{0<\delta<\mathcal{B}}\bigg{(}4\delta+\frac{12}{\sqrt{N}}\int_{\delta}^{\mathcal{B}}\sqrt{log(2\mathcal{C}_{\infty}(\varepsilon,\mathcal{F},N))}d\varepsilon\bigg{)}.
Proof

This proof is based on the chaining method; see van1996weak .

Definition 4

(Pseudo-dimension) Let \mathcal{F} be a class of functions from XX to \mathbb{R}. Suppose that S={x1,x2,,xn}XS=\{x_{1},x_{2},\cdots,x_{n}\}\subset X. We say that SS is pseudo-shattered by \mathcal{F} if there exists y1,y2,,yny_{1},y_{2},\cdots,y_{n} such that for any b{0,1}nb\in\{0,1\}^{n}, there exists a uu\in\mathcal{F} satisfying

sign(u(xi)yi)=bi,i=1,2,,n,\displaystyle sign(u(x_{i})-y_{i})=b_{i},i=1,2,\cdots,n,

and we say that \{y_{i}\}_{i=1}^{n} witnesses the shattering. The pseudo-dimension of \mathcal{F}, denoted by Pdim(\mathcal{F}), is defined to be the maximum cardinality among all sets pseudo-shattered by \mathcal{F}.

Lemma 4 (Theorem 12.2 in Anthony1999Neural )

Let \mathcal{F} be a class of real functions from a domain XX to the bounded interval [0,][0,\mathcal{B}].Let ε>0\varepsilon>0. Then

𝒞(ε,,n)i=1Pdim()(ni)(ε)i,\displaystyle\mathcal{C}_{\infty}(\varepsilon,\mathcal{F},n)\leq\sum_{i=1}^{Pdim(\mathcal{F})}\tbinom{n}{i}{\tbinom{\mathcal{B}}{\varepsilon}}^{i},

which is less than (enεPdim())Pdim()(\frac{en\mathcal{B}}{\varepsilon\cdot Pdim(\mathcal{F})})^{Pdim(\mathcal{F})} for nPdim()n\geq Pdim(\mathcal{F}).

Next, to obtain the upper bound, we decompose the statistical error into 24 terms using the triangle inequality:

𝔼{Xn}n=1N,{Ym}m=1M,{Tk}k=1Ksupu𝒫|(u)^(u)|\displaystyle\mathbb{E}_{\{X_{n}\}_{n=1}^{N},\{Y_{m}\}_{m=1}^{M},\{T_{k}\}_{k=1}^{K}}\sup_{u\in\mathcal{P}}|\mathcal{L}(u)-\hat{\mathcal{L}}(u)|\leq
j=124𝔼{Xn}n=1N,{Ym}m=1M,{Tk}k=1Ksupu𝒫|j(u)^j(u)|\displaystyle\sum_{j=1}^{24}\mathbb{E}_{\{X_{n}\}_{n=1}^{N},\{Y_{m}\}_{m=1}^{M},\{T_{k}\}_{k=1}^{K}}\sup_{u\in\mathcal{P}}|\mathcal{L}_{j}(u)-\hat{\mathcal{L}}_{j}(u)|

where

1\displaystyle\mathcal{L}_{1} =|Ω||T|𝔼XU(Ω),TU([0,T])(utt(X,T))2,\displaystyle=|\Omega||T|\mathbb{E}_{X\sim U(\Omega),T\sim U([0,T])}\left(u_{tt}(X,T)\right)^{2},
2\displaystyle\mathcal{L}_{2} =|Ω||T|𝔼XU(Ω),TU([0,T])(i=1duxixi(X,T))2,\displaystyle=|\Omega||T|\mathbb{E}_{X\sim U(\Omega),T\sim U([0,T])}\left(\sum_{i=1}^{d}u_{x_{i}x_{i}}(X,T)\right)^{2},
3\displaystyle\mathcal{L}_{3} =|Ω||T|𝔼XU(Ω),TU([0,T])(f(X,T))2,\displaystyle=|\Omega||T|\mathbb{E}_{X\sim U(\Omega),T\sim U([0,T])}\left(f(X,T)\right)^{2},
4\displaystyle\mathcal{L}_{4} =2|Ω||T|𝔼XU(Ω),TU([0,T])(i=1dutt(X,T)uxixi(X,T)),\displaystyle=-2|\Omega||T|\mathbb{E}_{X\sim U(\Omega),T\sim U([0,T])}\left(\sum_{i=1}^{d}u_{tt}(X,T)u_{x_{i}x_{i}}(X,T)\right),
5\displaystyle\mathcal{L}_{5} =2|Ω||T|𝔼XU(Ω),TU([0,T])(utt(X,T)f(X,T)),\displaystyle=-2|\Omega||T|\mathbb{E}_{X\sim U(\Omega),T\sim U([0,T])}\left(u_{tt}(X,T)f(X,T)\right),
6\displaystyle\mathcal{L}_{6} =2|Ω||T|𝔼XU(Ω),TU([0,T])(i=1duxixi(X,T)f(X,T)),\displaystyle=2|\Omega||T|\mathbb{E}_{X\sim U(\Omega),T\sim U([0,T])}\left(\sum_{i=1}^{d}u_{x_{i}x_{i}}(X,T)f(X,T)\right),
7\displaystyle\mathcal{L}_{7} =|Ω|𝔼XU(Ω)(u(X,0))2,\displaystyle=|\Omega|\mathbb{E}_{X\sim U(\Omega)}\left(u(X,0)\right)^{2},
8\displaystyle\mathcal{L}_{8} =|Ω|𝔼XU(Ω)(φ(X))2,\displaystyle=|\Omega|\mathbb{E}_{X\sim U(\Omega)}\left(\varphi(X)\right)^{2},
9\displaystyle\mathcal{L}_{9} =2|Ω|𝔼XU(Ω)(u(X,0)φ(X)),\displaystyle=-2|\Omega|\mathbb{E}_{X\sim U(\Omega)}\left(u(X,0)\varphi(X)\right),
10\displaystyle\mathcal{L}_{10} =|Ω|𝔼XU(Ω)(i=1duxi(X,0)2),\displaystyle=|\Omega|\mathbb{E}_{X\sim U(\Omega)}\left(\sum_{i=1}^{d}u_{x_{i}}(X,0)^{2}\right),
11\displaystyle\mathcal{L}_{11} =|Ω|𝔼XU(Ω)(i=1dφxi(X)2),\displaystyle=|\Omega|\mathbb{E}_{X\sim U(\Omega)}\left(\sum_{i=1}^{d}\varphi_{x_{i}}(X)^{2}\right),
12\displaystyle\mathcal{L}_{12} =2|Ω|𝔼XU(Ω)(i=1duxi(X,0)φxi(X)),\displaystyle=-2|\Omega|\mathbb{E}_{X\sim U(\Omega)}\left(\sum_{i=1}^{d}u_{x_{i}}(X,0)\varphi_{x_{i}}(X)\right),
13\displaystyle\mathcal{L}_{13} =|Ω|𝔼XU(Ω)(ut(X,0))2,\displaystyle=|\Omega|\mathbb{E}_{X\sim U(\Omega)}\left(u_{t}(X,0)\right)^{2},
14\displaystyle\mathcal{L}_{14} =|Ω|𝔼XU(Ω)(ψ(X))2,\displaystyle=|\Omega|\mathbb{E}_{X\sim U(\Omega)}\left(\psi(X)\right)^{2},
15\displaystyle\mathcal{L}_{15} =2|Ω|𝔼XU(Ω)(ut(X,0)ψ(X)),\displaystyle=-2|\Omega|\mathbb{E}_{X\sim U(\Omega)}\left(u_{t}(X,0)\psi(X)\right),
16\displaystyle\mathcal{L}_{16} =|Ω||T|𝔼YU(Ω),TU([0,T])(u(Y,T))2,\displaystyle=|\partial\Omega||T|\mathbb{E}_{Y\sim U(\partial\Omega),T\sim U([0,T])}\left(u(Y,T)\right)^{2},
17\displaystyle\mathcal{L}_{17} =|Ω||T|𝔼YU(Ω),TU([0,T])(g(Y,T))2,\displaystyle=|\partial\Omega||T|\mathbb{E}_{Y\sim U(\partial\Omega),T\sim U([0,T])}\left(g(Y,T)\right)^{2},
18\displaystyle\mathcal{L}_{18} =2|Ω||T|𝔼YU(Ω),TU([0,T])(u(Y,T)g(Y,T)),\displaystyle=-2|\partial\Omega||T|\mathbb{E}_{Y\sim U(\partial\Omega),T\sim U([0,T])}\left(u(Y,T)g(Y,T)\right),
19\displaystyle\mathcal{L}_{19} =|Ω||T|𝔼YU(Ω),TU([0,T])(ut(Y,T))2,\displaystyle=|\partial\Omega||T|\mathbb{E}_{Y\sim U(\partial\Omega),T\sim U([0,T])}\left(u_{t}(Y,T)\right)^{2},
20\displaystyle\mathcal{L}_{20} =|Ω||T|𝔼YU(Ω),TU([0,T])(gt(Y,T))2,\displaystyle=|\partial\Omega||T|\mathbb{E}_{Y\sim U(\partial\Omega),T\sim U([0,T])}\left(g_{t}(Y,T)\right)^{2},
21\displaystyle\mathcal{L}_{21} =2|Ω||T|𝔼YU(Ω),TU([0,T])(ut(Y,T)gt(Y,T)),\displaystyle=-2|\partial\Omega||T|\mathbb{E}_{Y\sim U(\partial\Omega),T\sim U([0,T])}\left(u_{t}(Y,T)g_{t}(Y,T)\right),
22\displaystyle\mathcal{L}_{22} =|Ω||T|𝔼YU(Ω),TU([0,T])i=1d(uxi(Y,T)2),\displaystyle=|\partial\Omega||T|\mathbb{E}_{Y\sim U(\partial\Omega),T\sim U([0,T])}\sum_{i=1}^{d}\left(u_{x_{i}}(Y,T)^{2}\right),
23\displaystyle\mathcal{L}_{23} =|Ω||T|𝔼YU(Ω),TU([0,T])i=1d(gxi(Y,T)2),\displaystyle=|\partial\Omega||T|\mathbb{E}_{Y\sim U(\partial\Omega),T\sim U([0,T])}\sum_{i=1}^{d}\left(g_{x_{i}}(Y,T)^{2}\right),
\mathcal{L}_{24} =-2|\partial\Omega||T|\mathbb{E}_{Y\sim U(\partial\Omega),T\sim U([0,T])}\sum_{i=1}^{d}\left(u_{x_{i}}(Y,T)g_{x_{i}}(Y,T)\right),

and ^j(u)\hat{\mathcal{L}}_{j}(u) is the empirical version of j(u)\mathcal{L}_{j}(u). The following lemma states that each of these 24 terms can be controlled by the corresponding Rademacher complexity.

Lemma 5

Let {Xn}n=1N,{Ym}m=1M,{Tk}k=1K{\{X_{n}\}_{n=1}^{N},\{Y_{m}\}_{m=1}^{M},\{T_{k}\}_{k=1}^{K}} be i.i.d samples from U(Ω),U(Ω),U([0,T])U(\Omega),U(\partial\Omega),U([0,T]), then we have

𝔼{Xn}n=1N,{Ym}m=1M,{Tk}k=1Ksupu𝒫|j(u)^j(u)|C(d,)𝒰,N(j)\displaystyle\mathbb{E}_{\{X_{n}\}_{n=1}^{N},\{Y_{m}\}_{m=1}^{M},\{T_{k}\}_{k=1}^{K}}\sup_{u\in\mathcal{P}}\bigg{|}\mathcal{L}_{j}(u)-\hat{\mathcal{L}}_{j}(u)\bigg{|}\leq C(d,\mathcal{B})\Re_{\mathcal{U},N}(\mathcal{F}_{j})

for j=1,2,,24,j=1,2,\cdots,24, where:

1\displaystyle\mathcal{F}_{1} ={±f:ΩT|u𝒫s.t.f(x,t)=utt(x,t)2},\displaystyle=\{\pm f:\Omega_{T}\rightarrow\mathbb{R}|\quad\exists u\in\mathcal{P}\quad s.t.\quad f(x,t)=u_{tt}(x,t)^{2}\},
2\displaystyle\mathcal{F}_{2} ={±f:ΩT|u𝒫1ijds.t.f(x,t)=uxixi(x,t)uxjxj(x,t)},\displaystyle=\{\pm f:\Omega_{T}\rightarrow\mathbb{R}|\quad\exists u\in\mathcal{P}\quad 1\leq i\leq j\leq d\quad s.t.\quad f(x,t)=u_{x_{i}x_{i}}(x,t)u_{x_{j}x_{j}}(x,t)\},
4\displaystyle\mathcal{F}_{4} ={±f:ΩT|u𝒫1ids.t.f(x,t)=utt(x,t)uxixi(x,t)},\displaystyle=\{\pm f:\Omega_{T}\rightarrow\mathbb{R}|\quad\exists u\in\mathcal{P}\quad 1\leq i\leq d\quad s.t.\quad f(x,t)=u_{tt}(x,t)u_{x_{i}x_{i}}(x,t)\},
5\displaystyle\mathcal{F}_{5} ={±f:ΩT|u𝒫s.t.f(x,t)=utt(x,t)},\displaystyle=\{\pm f:\Omega_{T}\rightarrow\mathbb{R}|\quad\exists u\in\mathcal{P}\quad s.t.\quad f(x,t)=u_{tt}(x,t)\},
6\displaystyle\mathcal{F}_{6} ={±f:ΩT|u𝒫1ids.t.f(x,t)=uxixi(x,t)},\displaystyle=\{\pm f:\Omega_{T}\rightarrow\mathbb{R}|\quad\exists u\in\mathcal{P}\quad 1\leq i\leq d\quad s.t.\quad f(x,t)=u_{x_{i}x_{i}}(x,t)\},
7\displaystyle\mathcal{F}_{7} ={±f:Ω|u𝒫s.t.f(x)=u(x,0)2},\displaystyle=\{\pm f:\Omega\rightarrow\mathbb{R}|\quad\exists u\in\mathcal{P}\quad s.t.\quad f(x)=u(x,0)^{2}\},
9\displaystyle\mathcal{F}_{9} ={±f:Ω|u𝒫s.t.f(x)=u(x,0)},\displaystyle=\{\pm f:\Omega\rightarrow\mathbb{R}|\quad\exists u\in\mathcal{P}\quad s.t.\quad f(x)=u(x,0)\},
10\displaystyle\mathcal{F}_{10} ={±f:Ω|u𝒫1ijds.t.f(x)=uxi(x,0)uxj(x,0)},\displaystyle=\{\pm f:\Omega\rightarrow\mathbb{R}|\quad\exists u\in\mathcal{P}\quad 1\leq i\leq j\leq d\quad s.t.\quad f(x)=u_{x_{i}}(x,0)u_{x_{j}}(x,0)\},
12\displaystyle\mathcal{F}_{12} ={±f:Ω|u𝒫1ids.t.f(x,t)=uxi(x,0)},\displaystyle=\{\pm f:\Omega\rightarrow\mathbb{R}|\quad\exists u\in\mathcal{P}\quad 1\leq i\leq d\quad s.t.\quad f(x,t)=u_{x_{i}}(x,0)\},
13\displaystyle\mathcal{F}_{13} ={±f:Ω|u𝒫s.t.f(x)=ut(x,0)2},\displaystyle=\{\pm f:\Omega\rightarrow\mathbb{R}|\quad\exists u\in\mathcal{P}\quad s.t.\quad f(x)=u_{t}(x,0)^{2}\},
15\displaystyle\mathcal{F}_{15} ={±f:Ω|u𝒫s.t.f(x)=ut(x,0)},\displaystyle=\{\pm f:\Omega\rightarrow\mathbb{R}|\quad\exists u\in\mathcal{P}\quad s.t.\quad f(x)=u_{t}(x,0)\},
16\displaystyle\mathcal{F}_{16} ={±f:ΩT|u𝒫s.t.f(x,t)=u(x,t)2|Ω},\displaystyle=\{\pm f:\partial\Omega_{T}\rightarrow\mathbb{R}|\quad\exists u\in\mathcal{P}\quad s.t.\quad f(x,t)=u(x,t)^{2}|_{\partial\Omega}\},
18\displaystyle\mathcal{F}_{18} ={±f:ΩT|u𝒫s.t.f(x,t)=u(x,t)|Ω},\displaystyle=\{\pm f:\partial\Omega_{T}\rightarrow\mathbb{R}|\quad\exists u\in\mathcal{P}\quad s.t.\quad f(x,t)=u(x,t)|_{\partial\Omega}\},
19\displaystyle\mathcal{F}_{19} ={±f:ΩT|u𝒫s.t.f(x,t)=ut(x,t)2|Ω},\displaystyle=\{\pm f:\partial\Omega_{T}\rightarrow\mathbb{R}|\quad\exists u\in\mathcal{P}\quad s.t.\quad f(x,t)=u_{t}(x,t)^{2}|_{\partial\Omega}\},
21\displaystyle\mathcal{F}_{21} ={±f:ΩT|u𝒫s.t.f(x,t)=ut(x,t)|Ω},\displaystyle=\{\pm f:\partial\Omega_{T}\rightarrow\mathbb{R}|\quad\exists u\in\mathcal{P}\quad s.t.\quad f(x,t)=u_{t}(x,t)|_{\partial\Omega}\},
22\displaystyle\mathcal{F}_{22} ={±f:ΩT|u𝒫1ijds.t.f(x,t)=uxi(x,t)uxj(x,t)|Ω},\displaystyle=\{\pm f:\partial\Omega_{T}\rightarrow\mathbb{R}|\quad\exists u\in\mathcal{P}\quad 1\leq i\leq j\leq d\quad s.t.\quad f(x,t)=u_{x_{i}}(x,t)u_{x_{j}}(x,t)|_{\partial\Omega}\},
24\displaystyle\mathcal{F}_{24} ={±f:ΩT|u𝒫1ids.t.f(x,t)=uxi(x,t)|Ω}.\displaystyle=\{\pm f:\partial\Omega_{T}\rightarrow\mathbb{R}|\quad\exists u\in\mathcal{P}\quad 1\leq i\leq d\quad s.t.\quad f(x,t)=u_{x_{i}}(x,t)|_{\partial\Omega}\}.
Proof

The proof is based on the symmetrization technique, see lemma 4.3 in jiao2022rate for more details.

Lemma 6

Let Φ={ReLU,ReLU2,ReLU3}\Phi=\{ReLU,ReLU^{2},ReLU^{3}\}. There holds

1𝒩1:=𝒩(𝒟+5,(𝒟+2)(𝒟+4)𝒲,{C(ΩT¯),2},Φ),\displaystyle\mathcal{F}_{1}\subset\mathcal{N}_{1}:=\mathcal{N}(\mathcal{D}+5,(\mathcal{D}+2)(\mathcal{D}+4)\mathcal{W},\{\|\cdot\|_{C(\overline{\Omega_{T}})},\mathcal{B}^{2}\},\Phi),
2𝒩2:=𝒩(𝒟+5,2(𝒟+2)(𝒟+4)𝒲,{C(ΩT¯),2},Φ),\displaystyle\mathcal{F}_{2}\subset\mathcal{N}_{2}:=\mathcal{N}(\mathcal{D}+5,2(\mathcal{D}+2)(\mathcal{D}+4)\mathcal{W},\{\|\cdot\|_{C(\overline{\Omega_{T}})},\mathcal{B}^{2}\},\Phi),
4𝒩4:=𝒩(𝒟+5,2(𝒟+2)(𝒟+4)𝒲,{C(ΩT¯),2},Φ),\displaystyle\mathcal{F}_{4}\subset\mathcal{N}_{4}:=\mathcal{N}(\mathcal{D}+5,2(\mathcal{D}+2)(\mathcal{D}+4)\mathcal{W},\{\|\cdot\|_{C(\overline{\Omega_{T}})},\mathcal{B}^{2}\},\Phi),
5𝒩5:=𝒩(𝒟+4,(𝒟+2)(𝒟+4)𝒲,{C(ΩT¯),},Φ),\displaystyle\mathcal{F}_{5}\subset\mathcal{N}_{5}:=\mathcal{N}(\mathcal{D}+4,(\mathcal{D}+2)(\mathcal{D}+4)\mathcal{W},\{\|\cdot\|_{C(\overline{\Omega_{T}})},\mathcal{B}\},\Phi),
6𝒩6:=𝒩(𝒟+4,(𝒟+2)(𝒟+4)𝒲,{C(ΩT¯),},Φ),\displaystyle\mathcal{F}_{6}\subset\mathcal{N}_{6}:=\mathcal{N}(\mathcal{D}+4,(\mathcal{D}+2)(\mathcal{D}+4)\mathcal{W},\{\|\cdot\|_{C(\overline{\Omega_{T}})},\mathcal{B}\},\Phi),
\mathcal{F}_{7}\subset\mathcal{N}_{7}:=\mathcal{N}(\mathcal{D}+1,\mathcal{W},\{\|\cdot\|_{C(\overline{\Omega_{T}})},\mathcal{B}^{2}\},\Phi),
9𝒩9:=𝒩(𝒟,𝒲,{C(ΩT¯),},Φ),\displaystyle\mathcal{F}_{9}\subset\mathcal{N}_{9}:=\mathcal{N}(\mathcal{D},\mathcal{W},\{\|\cdot\|_{C(\overline{\Omega_{T}})},\mathcal{B}\},\Phi),
10𝒩10:=𝒩(𝒟+2,2(𝒟+2)𝒲,{C(ΩT¯),2},Φ),\displaystyle\mathcal{F}_{10}\subset\mathcal{N}_{10}:=\mathcal{N}(\mathcal{D}+2,2(\mathcal{D}+2)\mathcal{W},\{\|\cdot\|_{C(\overline{\Omega_{T}})},\mathcal{B}^{2}\},\Phi),
12𝒩12:=𝒩(𝒟+2,(𝒟+2)𝒲,{C(ΩT¯),},Φ),\displaystyle\mathcal{F}_{12}\subset\mathcal{N}_{12}:=\mathcal{N}(\mathcal{D}+2,(\mathcal{D}+2)\mathcal{W},\{\|\cdot\|_{C(\overline{\Omega_{T}})},\mathcal{B}\},\Phi),
13𝒩13:=𝒩(𝒟+3,(𝒟+2)𝒲,{C(ΩT¯),2},Φ),\displaystyle\mathcal{F}_{13}\subset\mathcal{N}_{13}:=\mathcal{N}(\mathcal{D}+3,(\mathcal{D}+2)\mathcal{W},\{\|\cdot\|_{C(\overline{\Omega_{T}})},\mathcal{B}^{2}\},\Phi),
15𝒩15:=𝒩(𝒟+2,(𝒟+2)𝒲,{C(ΩT¯),},Φ),\displaystyle\mathcal{F}_{15}\subset\mathcal{N}_{15}:=\mathcal{N}(\mathcal{D}+2,(\mathcal{D}+2)\mathcal{W},\{\|\cdot\|_{C(\overline{\Omega_{T}})},\mathcal{B}\},\Phi),
16𝒩16:=𝒩(𝒟+1,𝒲,{C(ΩT¯),2},Φ),\displaystyle\mathcal{F}_{16}\subset\mathcal{N}_{16}:=\mathcal{N}(\mathcal{D}+1,\mathcal{W},\{\|\cdot\|_{C(\overline{\Omega_{T}})},\mathcal{B}^{2}\},\Phi),
18𝒩18:=𝒩(𝒟,𝒲,{C(ΩT¯),},Φ),\displaystyle\mathcal{F}_{18}\subset\mathcal{N}_{18}:=\mathcal{N}(\mathcal{D},\mathcal{W},\{\|\cdot\|_{C(\overline{\Omega_{T}})},\mathcal{B}\},\Phi),
19𝒩19:=𝒩(𝒟+3,(𝒟+2)𝒲,{C(ΩT¯),2},Φ),\displaystyle\mathcal{F}_{19}\subset\mathcal{N}_{19}:=\mathcal{N}(\mathcal{D}+3,(\mathcal{D}+2)\mathcal{W},\{\|\cdot\|_{C(\overline{\Omega_{T}})},\mathcal{B}^{2}\},\Phi),
21𝒩21:=𝒩(𝒟+2,(𝒟+2)𝒲,{C(ΩT¯),},Φ),\displaystyle\mathcal{F}_{21}\subset\mathcal{N}_{21}:=\mathcal{N}(\mathcal{D}+2,(\mathcal{D}+2)\mathcal{W},\{\|\cdot\|_{C(\overline{\Omega_{T}})},\mathcal{B}\},\Phi),
22𝒩22:=𝒩(𝒟+3,2(𝒟+2)𝒲,{C(ΩT¯),2},Φ),\displaystyle\mathcal{F}_{22}\subset\mathcal{N}_{22}:=\mathcal{N}(\mathcal{D}+3,2(\mathcal{D}+2)\mathcal{W},\{\|\cdot\|_{C(\overline{\Omega_{T}})},\mathcal{B}^{2}\},\Phi),
24𝒩24:=𝒩(𝒟+2,(𝒟+2)𝒲,{C(ΩT¯),},Φ).\displaystyle\mathcal{F}_{24}\subset\mathcal{N}_{24}:=\mathcal{N}(\mathcal{D}+2,(\mathcal{D}+2)\mathcal{W},\{\|\cdot\|_{C(\overline{\Omega_{T}})},\mathcal{B}\},\Phi).
Proof

The proof is an application of Proposition 4.2 in jiao2022rate. Take $\mathcal{F}_{1}$ as an example: since $u\in\mathcal{P}$, we have $u_{t}\in\mathcal{N}(\mathcal{D}+2,(\mathcal{D}+2)\mathcal{W},\{\|\cdot\|_{C^{1}(\overline{\Omega_{T}})},\mathcal{B}\},\Phi)$ and $u_{tt}\in\mathcal{N}(\mathcal{D}+4,(\mathcal{D}+2)(\mathcal{D}+4)\mathcal{W},\{\|\cdot\|_{C(\overline{\Omega_{T}})},\mathcal{B}\},\Phi)$. Noticing that the square operation can be implemented as $x^{2}=ReLU^{2}(x)+ReLU^{2}(-x)$, we obtain $u_{tt}^{2}\in\mathcal{N}(\mathcal{D}+5,(\mathcal{D}+2)(\mathcal{D}+4)\mathcal{W},\{\|\cdot\|_{C(\overline{\Omega_{T}})},\mathcal{B}^{2}\},\Phi)$.
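The identity $x^{2}=ReLU^{2}(x)+ReLU^{2}(-x)$ is the only nontrivial ingredient of this construction, and it is easy to check numerically. The following minimal NumPy sketch (illustrative only, not part of the SPINNs implementation) verifies it on a small grid:

import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def square_via_relu2(x):
    # square realized by one extra layer of width 2 with ReLU^2 activation
    return relu(x) ** 2 + relu(-x) ** 2

x = np.linspace(-3.0, 3.0, 7)
assert np.allclose(square_via_relu2(x), x ** 2)
print(square_via_relu2(x))  # [9. 4. 1. 0. 1. 4. 9.]

A product $ab$ can be realized in the same spirit through the polarization identity $ab=\tfrac{1}{2}\big((a+b)^{2}-a^{2}-b^{2}\big)$, which is consistent with the doubled widths of $\mathcal{N}_{2}$, $\mathcal{N}_{10}$ and $\mathcal{N}_{22}$ above.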

Lemma 7 (Proposition 4.3 in jiao2022rate)

For any $\mathcal{D},\mathcal{W}\in\mathbb{N}$,

Pdim(\mathcal{N}(\mathcal{D},\mathcal{W},\{ReLU,ReLU^{2},ReLU^{3}\}))=\mathcal{O}(\mathcal{D}^{2}\mathcal{W}^{2}(\mathcal{D}+\log\mathcal{W})).
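For concreteness, the scaling on the right-hand side can be evaluated directly; the short helper below is an illustrative sketch only, since the absolute constant hidden in $\mathcal{O}(\cdot)$ is not specified by Lemma 7:

import math

def pdim_scaling(depth: int, width: int) -> float:
    # D^2 W^2 (D + log W), i.e. the bound of Lemma 7 up to an unspecified constant
    return depth ** 2 * width ** 2 * (depth + math.log(width))

for D, W in [(2, 50), (4, 100), (8, 200)]:
    print(D, W, f"{pdim_scaling(D, W):.3e}")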

Now we are ready to prove Theorem 3.3 on the statistical error.

Proof (of Theorem 3.3)

According to Lemma 3 and Lemma 6,

\bullet  For $i=1,2,4,5,6$, when the sample number $n=NK>Pdim(\mathcal{F}_{i})$, we have

\Re_{P(\mathcal{X}),NK}(\mathcal{F}_{i}) \leq\inf_{0<\delta<\mathcal{B}_{i}}\left(4\delta+\frac{12}{\sqrt{NK}}\int_{\delta}^{\mathcal{B}_{i}}\sqrt{\log(2C_{\infty}(\varepsilon,\mathcal{F}_{i},NK))}\,d\varepsilon\right)
\leq\inf_{0<\delta<\mathcal{B}_{i}}\left(4\delta+\frac{12}{\sqrt{NK}}\int_{\delta}^{\mathcal{B}_{i}}\sqrt{\log\left(2\left(\frac{eNK\mathcal{B}_{i}}{\varepsilon\cdot Pdim(\mathcal{F}_{i})}\right)^{Pdim(\mathcal{F}_{i})}\right)}\,d\varepsilon\right)
\leq\inf_{0<\delta<\mathcal{B}_{i}}\left(4\delta+\frac{12\mathcal{B}_{i}}{\sqrt{NK}}+\frac{12}{\sqrt{NK}}\int_{\delta}^{\mathcal{B}_{i}}\sqrt{Pdim(\mathcal{F}_{i})\log\left(\frac{eNK\mathcal{B}_{i}}{\varepsilon\cdot Pdim(\mathcal{F}_{i})}\right)}\,d\varepsilon\right) (27)

Let $t=\sqrt{\log\left(\frac{eNK\mathcal{B}_{i}}{\varepsilon\cdot Pdim(\mathcal{F}_{i})}\right)}$, then $\varepsilon=\frac{eNK\mathcal{B}_{i}}{Pdim(\mathcal{F}_{i})}e^{-t^{2}}$. Denoting

t_{1}=\sqrt{\log\left(\frac{eNK\mathcal{B}_{i}}{\mathcal{B}_{i}\cdot Pdim(\mathcal{F}_{i})}\right)},\qquad t_{2}=\sqrt{\log\left(\frac{eNK\mathcal{B}_{i}}{\delta\cdot Pdim(\mathcal{F}_{i})}\right)},

we have:

\int_{\delta}^{\mathcal{B}_{i}}\sqrt{\log\left(\frac{eNK\mathcal{B}_{i}}{\varepsilon\cdot Pdim(\mathcal{F}_{i})}\right)}\,d\varepsilon
=\frac{2eNK\mathcal{B}_{i}}{Pdim(\mathcal{F}_{i})}\int_{t_{1}}^{t_{2}}t^{2}e^{-t^{2}}\,dt
=\frac{2eNK\mathcal{B}_{i}}{Pdim(\mathcal{F}_{i})}\int_{t_{1}}^{t_{2}}t\left(\frac{-e^{-t^{2}}}{2}\right)^{\prime}dt
=\frac{eNK\mathcal{B}_{i}}{Pdim(\mathcal{F}_{i})}\left[t_{1}e^{-t_{1}^{2}}-t_{2}e^{-t_{2}^{2}}+\int_{t_{1}}^{t_{2}}e^{-t^{2}}\,dt\right]
\leq\frac{eNK\mathcal{B}_{i}}{Pdim(\mathcal{F}_{i})}\left[t_{1}e^{-t_{1}^{2}}-t_{2}e^{-t_{2}^{2}}+(t_{2}-t_{1})e^{-t_{1}^{2}}\right]
\leq\frac{eNK\mathcal{B}_{i}}{Pdim(\mathcal{F}_{i})}\,t_{2}e^{-t_{1}^{2}}
\leq\mathcal{B}_{i}\sqrt{\log\left(\frac{eNK\mathcal{B}_{i}}{\delta\cdot Pdim(\mathcal{F}_{i})}\right)}, (28)
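Inequality (28) can also be checked numerically. The sketch below is purely illustrative: $\mathcal{B}_{i}$, $Pdim(\mathcal{F}_{i})$ and $NK$ are arbitrary placeholder values, and the comparison is done with scipy:

import math
from scipy.integrate import quad

B, pdim, NK = 2.0, 10.0, 1000.0           # arbitrary illustrative values
delta = B * math.sqrt(pdim / NK)          # the choice of delta made right after (28)

integrand = lambda eps: math.sqrt(math.log(math.e * NK * B / (eps * pdim)))
lhs, _ = quad(integrand, delta, B)                               # left-hand side of (28)
rhs = B * math.sqrt(math.log(math.e * NK * B / (delta * pdim)))  # right-hand side of (28)
print(lhs, rhs, lhs <= rhs)               # expect lhs <= rhs, i.e. True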

Substituting (28) into (27) and choosing $\delta=\mathcal{B}_{i}\left(\frac{Pdim(\mathcal{F}_{i})}{NK}\right)^{\frac{1}{2}}\leq\mathcal{B}_{i}$, we have:

\Re_{P(\mathcal{X}),NK}(\mathcal{F}_{i}) \leq\inf_{0<\delta<\mathcal{B}_{i}}\left(4\delta+\frac{12\mathcal{B}_{i}}{\sqrt{NK}}+\frac{12}{\sqrt{NK}}\int_{\delta}^{\mathcal{B}_{i}}\sqrt{Pdim(\mathcal{F}_{i})\log\left(\frac{eNK\mathcal{B}_{i}}{\varepsilon\cdot Pdim(\mathcal{F}_{i})}\right)}\,d\varepsilon\right)
\leq\inf_{0<\delta<\mathcal{B}_{i}}\left(4\delta+\frac{12\mathcal{B}_{i}}{\sqrt{NK}}+\frac{12\mathcal{B}_{i}\sqrt{Pdim(\mathcal{F}_{i})}}{\sqrt{NK}}\sqrt{\log\left(\frac{eNK\mathcal{B}_{i}}{\delta\cdot Pdim(\mathcal{F}_{i})}\right)}\right)
\leq 28\sqrt{\frac{3}{2}}\mathcal{B}_{i}\left(\frac{Pdim(\mathcal{F}_{i})}{NK}\right)^{\frac{1}{2}}\sqrt{\log\left(\frac{eNK}{Pdim(\mathcal{F}_{i})}\right)}
\leq 28\sqrt{\frac{3}{2}}\mathcal{B}_{i}\left(\frac{Pdim(\mathcal{N}_{i})}{NK}\right)^{\frac{1}{2}}\sqrt{\log\left(\frac{eNK}{Pdim(\mathcal{N}_{i})}\right)}
\leq 28\sqrt{\frac{3}{2}}\max\{\mathcal{B},\mathcal{B}^{2}\}\left(\frac{\mathcal{H}_{1}}{NK}\right)^{\frac{1}{2}}\sqrt{\log\left(\frac{eNK}{\mathcal{H}_{1}}\right)} (29)

where

\mathcal{H}_{1}=C_{1}(\mathcal{D}+2)^{2}(\mathcal{D}+4)^{2}(\mathcal{D}+5)^{2}\mathcal{W}^{2}[(\mathcal{D}+5)+\log(2(\mathcal{D}+2)(\mathcal{D}+4)\mathcal{W})].

The last step above is due to Lemma 7.
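To see what (29) looks like quantitatively, the following sketch evaluates $\mathcal{H}_{1}$ and the resulting bound for illustrative values of $\mathcal{D}$, $\mathcal{W}$, $NK$ and $\mathcal{B}$ (with $C_{1}$ set to $1$, since the absolute constants are not tracked here):

import math

def H1(D, W, C1=1.0):
    # H_1 as defined above, with the unspecified constant C_1 taken to be 1
    return (C1 * (D + 2) ** 2 * (D + 4) ** 2 * (D + 5) ** 2 * W ** 2
            * ((D + 5) + math.log(2 * (D + 2) * (D + 4) * W)))

def rademacher_bound(D, W, NK, B=1.0):
    # right-hand side of (29); only meaningful when NK is large enough that e*NK/H_1 > 1
    h1 = H1(D, W)
    return (28 * math.sqrt(1.5) * max(B, B ** 2)
            * math.sqrt(h1 / NK) * math.sqrt(math.log(math.e * NK / h1)))

print(f"{H1(2, 10):.3e}")               # roughly 3.7e7 for D = 2, W = 10
print(rademacher_bound(2, 10, 1e9))     # decays like (H_1/NK)^{1/2} up to a log factor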

\bullet  For $i=7,9,10,12,13,15$, when the sample number $n=N>Pdim(\mathcal{F}_{i})$, we can similarly prove that

\Re_{\mathcal{U}(\Omega),N}(\mathcal{F}_{i})\leq 28\sqrt{\frac{3}{2}}\max\{\mathcal{B},\mathcal{B}^{2}\}\left(\frac{\mathcal{H}_{2}}{N}\right)^{\frac{1}{2}}\sqrt{\log\left(\frac{eN}{\mathcal{H}_{2}}\right)} (30)

where

\mathcal{H}_{2}=C_{2}(\mathcal{D}+2)^{2}(\mathcal{D}+3)^{2}\mathcal{W}^{2}[(\mathcal{D}+3)+\log(2(\mathcal{D}+2)\mathcal{W})].

\bullet  For $i=16,18,19,21,22,24$, when the sample number $n=MK>Pdim(\mathcal{F}_{i})$, we have

\Re_{\mathcal{U}(\partial\Omega_{T}),MK}(\mathcal{F}_{i})\leq 28\sqrt{\frac{3}{2}}\max\{\mathcal{B},\mathcal{B}^{2}\}\left(\frac{\mathcal{H}_{3}}{MK}\right)^{\frac{1}{2}}\sqrt{\log\left(\frac{eMK}{\mathcal{H}_{3}}\right)} (31)

where

\mathcal{H}_{3}=C_{3}(\mathcal{D}+2)^{2}(\mathcal{D}+3)^{2}\mathcal{W}^{2}[(\mathcal{D}+3)+\log(2(\mathcal{D}+2)\mathcal{W})].

\bullet  For $i=3$, we have

\mathbb{E}_{\{X_{n}\}_{n=1}^{N},\{T_{k}\}_{k=1}^{K}}\bigg|\mathcal{L}_{3}-\hat{\mathcal{L}}_{3}\bigg|
=|\Omega||T|\,\mathbb{E}_{\{X_{n}\}_{n=1}^{N},\{T_{k}\}_{k=1}^{K}}\bigg|\mathbb{E}_{X\sim U(\Omega),T\sim U([0,T])}f(X,T)^{2}-\frac{1}{NK}\sum_{n=1}^{N}\sum_{k=1}^{K}f(X_{n},T_{k})^{2}\bigg|
\leq|\Omega||T|\sqrt{\mathbb{E}_{\{X_{n}\}_{n=1}^{N},\{T_{k}\}_{k=1}^{K}}\bigg|\mathbb{E}_{X\sim U(\Omega),T\sim U([0,T])}f(X,T)^{2}-\frac{1}{NK}\sum_{n=1}^{N}\sum_{k=1}^{K}f(X_{n},T_{k})^{2}\bigg|^{2}}
=\frac{|\Omega||T|}{NK}\sqrt{\mathbb{E}_{\{X_{n}\}_{n=1}^{N},\{T_{k}\}_{k=1}^{K}}\sum_{n=1}^{N}\sum_{k=1}^{K}\bigg|\mathbb{E}_{X\sim U(\Omega),T\sim U([0,T])}f(X,T)^{2}-f(X_{n},T_{k})^{2}\bigg|^{2}}
=\frac{|\Omega||T|}{NK}\sqrt{NK\cdot\mathbb{E}_{X_{1}\sim U(\Omega),T_{1}\sim U([0,T])}\bigg|\mathbb{E}_{X\sim U(\Omega),T\sim U([0,T])}f(X,T)^{2}-f(X_{1},T_{1})^{2}\bigg|^{2}}
=\frac{|\Omega||T|}{NK}\sqrt{NK}\,\sigma(f(X,T)^{2})
=|\Omega||T|\frac{\sigma(f(X,T)^{2})}{\sqrt{NK}},

where $\sigma(f(X,T)^{2})$ denotes the standard deviation of $f(X,T)^{2}$. Using the boundedness of $f$, we can further obtain

\sigma^{2}(f(X,T)^{2}) =\mathbb{E}\left((f(X,T)^{2})^{2}\right)-\left(\mathbb{E}(f(X,T)^{2})\right)^{2}
\leq\frac{1}{|\Omega||T|}\left(\int_{\Omega_{T}}(f(X,T)^{2})^{2}\,dX\,dT\right)
\leq\frac{1}{|\Omega||T|}\left(\int_{\Omega_{T}}\kappa^{4}\,dX\,dT\right)
\leq\frac{1}{|\Omega||T|}\cdot|\Omega||T|\mathcal{B}
=\mathcal{B},

then we have,

\mathbb{E}_{\{X_{n}\}_{n=1}^{N},\{T_{k}\}_{k=1}^{K}}\bigg|\mathcal{L}_{3}-\hat{\mathcal{L}}_{3}\bigg|\leq|\Omega||T|\sqrt{\frac{\mathcal{B}}{NK}}.
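Since the $i=3$ term does not depend on $u$, its bound is a plain Monte Carlo error, and the $1/\sqrt{NK}$ decay can be observed directly. The sketch below is illustrative only, using a toy bounded integrand on $\Omega\times[0,T]=[0,1]\times[0,1]$ in place of the source term:

import numpy as np

rng = np.random.default_rng(0)
f = lambda x, t: np.sin(2 * np.pi * x) * np.cos(np.pi * t)   # toy bounded integrand, |f| <= 1
exact = 0.25   # E[f(X,T)^2] = E[sin^2(2*pi*X)] * E[cos^2(pi*T)] = 1/2 * 1/2 for X,T ~ U(0,1)

for NK in [10 ** 2, 10 ** 3, 10 ** 4, 10 ** 5]:
    errs = []
    for _ in range(200):                                     # average over 200 sample draws
        x, t = rng.random(NK), rng.random(NK)
        errs.append(abs(np.mean(f(x, t) ** 2) - exact))
    print(NK, np.mean(errs), np.mean(errs) * np.sqrt(NK))    # last column stays roughly constant

The last column being roughly constant is the empirical counterpart of the $|\Omega||T|\sqrt{\mathcal{B}/NK}$ bound.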

\bullet  Similarly, for $i=8,14,17,20$,

\mathbb{E}_{\{X_{n}\}_{n=1}^{N}}\bigg|\mathcal{L}_{8}-\hat{\mathcal{L}}_{8}\bigg|\leq|\Omega|\sqrt{\frac{\mathcal{B}}{N}},
\mathbb{E}_{\{X_{n}\}_{n=1}^{N}}\bigg|\mathcal{L}_{14}-\hat{\mathcal{L}}_{14}\bigg|\leq|\Omega|\sqrt{\frac{\mathcal{B}}{N}},
\mathbb{E}_{\{Y_{m}\}_{m=1}^{M},\{T_{k}\}_{k=1}^{K}}\bigg|\mathcal{L}_{17}-\hat{\mathcal{L}}_{17}\bigg|\leq|\partial\Omega||T|\sqrt{\frac{\mathcal{B}}{MK}},
\mathbb{E}_{\{Y_{m}\}_{m=1}^{M},\{T_{k}\}_{k=1}^{K}}\bigg|\mathcal{L}_{20}-\hat{\mathcal{L}}_{20}\bigg|\leq|\partial\Omega||T|\sqrt{\frac{\mathcal{B}}{MK}}.

\bullet  For $i=11$,

\sigma^{2}\Big(\sum_{i=1}^{d}\varphi_{x_{i}}(x)^{2}\Big) =\mathbb{E}\left(\Big(\sum_{i=1}^{d}\varphi_{x_{i}}(x)^{2}\Big)^{2}\right)-\left(\mathbb{E}\Big(\sum_{i=1}^{d}\varphi_{x_{i}}(x)^{2}\Big)\right)^{2}
\leq\frac{1}{|\Omega|}\left[\int_{\Omega}\Big(\sum_{i=1}^{d}\varphi_{x_{i}}(x)^{2}\Big)^{2}dx\right]
\leq\frac{1}{|\Omega|}\left[\int_{\Omega}(d\kappa^{2})^{2}dx\right]
\leq d^{2}\mathcal{B},

then we have,

\mathbb{E}_{\{X_{n}\}_{n=1}^{N}}\bigg|\mathcal{L}_{11}-\hat{\mathcal{L}}_{11}\bigg|\leq d|\Omega|\sqrt{\frac{\mathcal{B}}{N}}.

\bullet  Similarly, for $i=23$,

\mathbb{E}_{\{Y_{m}\}_{m=1}^{M},\{T_{k}\}_{k=1}^{K}}\bigg|\mathcal{L}_{23}-\hat{\mathcal{L}}_{23}\bigg|\leq d|\partial\Omega||T|\sqrt{\frac{\mathcal{B}}{MK}}.

Hence, we have,

\mathbb{E}_{\{X_{n}\}_{n=1}^{N},\{Y_{m}\}_{m=1}^{M},\{T_{k}\}_{k=1}^{K}}\sup_{u\in\mathcal{P}}|\mathcal{L}(u)-\hat{\mathcal{L}}(u)|
\leq\sum_{j=1}^{24}\mathbb{E}_{\{X,Y,T\}}\sup_{u\in\mathcal{P}}|\mathcal{L}_{j}(u)-\hat{\mathcal{L}}_{j}(u)|
\leq 28\sqrt{\frac{3}{2}}\max\{\mathcal{B},\mathcal{B}^{2}\}\left(5|\Omega||T|C_{1}\left(\frac{\mathcal{H}_{1}}{NK}\right)^{\frac{1}{2}}\sqrt{\log\left(\frac{eNK}{\mathcal{H}_{1}}\right)}+6|\Omega|C_{2}\left(\frac{\mathcal{H}_{2}}{N}\right)^{\frac{1}{2}}\sqrt{\log\left(\frac{eN}{\mathcal{H}_{2}}\right)}+6|\partial\Omega||T|C_{3}\left(\frac{\mathcal{H}_{3}}{MK}\right)^{\frac{1}{2}}\sqrt{\log\left(\frac{eMK}{\mathcal{H}_{3}}\right)}\right)
+\left(|\Omega||T|\sqrt{\frac{\mathcal{B}}{NK}}+(2+d)|\Omega|\sqrt{\frac{\mathcal{B}}{N}}+(2+d)|\partial\Omega||T|\sqrt{\frac{\mathcal{B}}{MK}}\right),

where $C_{1},C_{2},C_{3}$ are constants depending on the dimension $d$ and the bound $\mathcal{B}$. Hence, for any $\varepsilon>0$, if the numbers of samples satisfy:

\begin{cases}N&=C(d,|\Omega|,\mathcal{B})\mathcal{D}^{4}\mathcal{W}^{2}(\mathcal{D}+\log(\mathcal{W}))\left(\frac{1}{\varepsilon}\right)^{2+\delta},\\ K&=C(d,|T|,\mathcal{B})\mathcal{D}^{2}f_{K}(\mathcal{D},\mathcal{W})\left(\frac{1}{\varepsilon}\right)^{k_{1}},\\ M&=C(d,|\partial\Omega|,\mathcal{B})f_{M}(\mathcal{D},\mathcal{W})\left(\frac{1}{\varepsilon}\right)^{k_{2}},\end{cases}

where:

\begin{cases}k_{1}+k_{2}=2+\delta,\\ f_{K}(\mathcal{D},\mathcal{W})\cdot f_{M}(\mathcal{D},\mathcal{W})=\mathcal{D}^{2}\mathcal{W}^{2}(\mathcal{D}+\log(\mathcal{W})),\end{cases}

with the restrictions $f_{K}(\mathcal{D},\mathcal{W})\geq 1$, $f_{M}(\mathcal{D},\mathcal{W})\geq 1$, and $\delta>0$ arbitrarily small, then we have:

\mathbb{E}_{\{X_{n}\}_{n=1}^{N},\{Y_{m}\}_{m=1}^{M},\{T_{k}\}_{k=1}^{K}}\sup_{u\in\mathcal{P}}|\mathcal{L}(u)-\hat{\mathcal{L}}(u)|\leq\varepsilon.
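The admissible choices of $N$, $K$ and $M$ above are not unique. As a purely illustrative sketch (all constants $C(\cdot)$ set to $1$, and one admissible split $k_{1}=k_{2}=(2+\delta)/2$, $f_{K}=f_{M}=(\mathcal{D}^{2}\mathcal{W}^{2}(\mathcal{D}+\log\mathcal{W}))^{1/2}$ chosen for definiteness), the following helper reports sample sizes for a target accuracy $\varepsilon$:

import math

def sample_sizes(D, W, eps, delta=0.1):
    # one admissible choice of N, K, M; all constants C(d, |Omega|, B) etc. are set to 1
    g = D ** 2 * W ** 2 * (D + math.log(W))          # D^2 W^2 (D + log W)
    N = D ** 4 * W ** 2 * (D + math.log(W)) * (1 / eps) ** (2 + delta)
    fK = fM = math.sqrt(g)                           # f_K * f_M = g and f_K, f_M >= 1
    K = D ** 2 * fK * (1 / eps) ** ((2 + delta) / 2)
    M = fM * (1 / eps) ** ((2 + delta) / 2)
    return math.ceil(N), math.ceil(K), math.ceil(M)

print(sample_sizes(D=2, W=10, eps=0.1))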

References

  • (1) Susanne Brenner and Ridgway Scott. The mathematical theory of finite element methods, volume 15. Springer Science & Business Media, 2007.
  • (2) Philippe G Ciarlet. The finite element method for elliptic problems. SIAM, 2002.
  • (3) A. Quarteroni and A. Valli. Numerical Approximation of Partial Differential Equations, volume 23. Springer Science & Business Media, 2008.
  • (4) J.W. Thomas. Numerical Partial Differential Equations: Finite Difference Methods, volume 22. Springer Science & Business Media, 2013.
  • (5) Isaac E Lagaris, Aristidis Likas, and Dimitrios I Fotiadis. Artificial neural networks for solving ordinary and partial differential equations. IEEE transactions on neural networks, 9(5):987–1000, 1998.
  • (6) Cosmin Anitescu, Elena Atroshchenko, Naif Alajlan, and Timon Rabczuk. Artificial neural network methods for the solution of second order boundary value problems. Computers, Materials & Continua, 59(1):345–359, 2019.
  • (7) Julius Berner, Markus Dablander, and Philipp Grohs. Numerically solving parametric families of high-dimensional Kolmogorov partial differential equations via deep learning. In Advances in Neural Information Processing Systems, volume 33, pages 16615–16627. Curran Associates, Inc., 2020.
  • (8) Jiequn Han, Arnulf Jentzen, and E Weinan. Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences, 115(34):8505–8510, 2018.
  • (9) Lu Lu, Xuhui Meng, Zhiping Mao, and George Em Karniadakis. DeepXDE: A deep learning library for solving differential equations. SIAM Review, 63(1):208–228, 2021.
  • (10) Justin Sirignano and Konstantinos Spiliopoulos. DGM: A deep learning algorithm for solving partial differential equations. Journal of Computational Physics, 375:1339–1364, 2018.
  • (11) E. Weinan and Ting Yu. The deep ritz method: A deep learning-based numerical algorithm for solving variational problems. Communications in Mathematics and Statistics, 6(1):1–12, 2017.
  • (12) Maziar Raissi, Paris Perdikaris, and George E Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.
  • (13) Yaohua Zang, Gang Bao, Xiaojing Ye, and Haomin Zhou. Weak adversarial networks for high-dimensional partial differential equations. Journal of Computational Physics, 411:109409, 2020.
  • (14) Ameya D Jagtap, Ehsan Kharazmi, and George Em Karniadakis. Conservative physics-informed neural networks on discrete domains for conservation laws: Applications to forward and inverse problems. Computer Methods in Applied Mechanics and Engineering, 365:113028, 2020.
  • (15) G. Pang, M. D’Elia, M. Parks, and G.E. Karniadakis. nPINNs: Nonlocal physics-informed neural networks for a parametrized nonlocal universal Laplacian operator. Algorithms and applications. Journal of Computational Physics, 422:109760, 2020.
  • (16) Guofei Pang, Lu Lu, and George Em Karniadakis. fPINNs: Fractional physics-informed neural networks. SIAM Journal on Scientific Computing, 41(4):A2603–A2626, 2019.
  • (17) Hao Wang, Jian Li, Linfeng Wang, Lin Liang, Zhoumo Zeng, and Yang Liu. On acoustic fields of complex scatters based on physics-informed neural networks. Ultrasonics, 128:106872, 2023.
  • (18) Linfeng Wang, Hao Wang, Lin Liang, Jian Li, Zhoumo Zeng, and Yang Liu. Physics-informed neural networks for transcranial ultrasound wave propagation. Ultrasonics, 132:107026, 2023.
  • (19) Yi Ding, Su Chen, Xiaojun Li, Suyang Wang, Shaokai Luan, and Hao Sun. Self-adaptive physics-driven deep learning for seismic wave modeling in complex topography. Engineering Applications of Artificial Intelligence, 123:106425, 2023.
  • (20) Wojciech M Czarnecki, Simon Osindero, Max Jaderberg, Grzegorz Swirszcz, and Razvan Pascanu. Sobolev training for neural networks. Advances in neural information processing systems, 30, 2017.
  • (21) Hwijae Son, Jin Woo Jang, Woo Jin Han, and Hyung Ju Hwang. Sobolev training for physics informed neural networks. Communications in Mathematical Sciences, 21:1679–1705, 2023.
  • (22) Nikolaos N Vlassis and WaiChing Sun. Sobolev training of thermodynamic-informed neural networks for interpretable elasto-plasticity models with level set hardening. Computer Methods in Applied Mechanics and Engineering, 377:113695, 2021.
  • (23) Yuling Jiao, Xiliang Lu, Jerry Zhijian Yang, Cheng Yuan, and Pingwen Zhang. Improved analysis of PINNs: Alleviate the CoD for compositional solutions. Annals of Applied Mathematics, 39(3):239–263, 2023.
  • (24) Yuling Jiao, Di Li, Xiliang Lu, Jerry Zhijian Yang, and Cheng Yuan. GAS: A Gaussian mixture distribution-based adaptive sampling method for PINNs. arXiv preprint arXiv:2303.15849, 2023.
  • (25) Yuling Jiao, Yanming Lai, Dingwei Li, Xiliang Lu, Fengru Wang, Jerry Zhijian Yang, et al. A rate of convergence of physics informed neural networks for the linear second order elliptic PDEs. Communications in Computational Physics, 31(4):1272–1295, 2022.
  • (26) Aad W. van der Vaart and Jon A. Wellner. Weak convergence and empirical processes: With applications to statistics. Springer Science & Business Media, 1996.
  • (27) Martin Anthony and Peter L. Bartlett. Neural network learning: Theoretical foundations. Cambridge University Press, 1999.