
Yuling Jiao: School of Mathematics and Statistics, Wuhan University, Wuhan 430072, China. Email: [email protected]
Yuhui Liu: School of Mathematics and Statistics, Wuhan University, Wuhan 430072, China. Email: [email protected]
Jerry Zhijian Yang: School of Mathematics and Statistics, Wuhan University, Wuhan 430072, China. Email: [email protected]
Cheng Yuan: School of Mathematics and Statistics, Wuhan University, Wuhan 430072, China. Email: [email protected]

A Stabilized Physics Informed Neural Networks Method for Wave Equations

Yuling Jiao    Yuhui Liu    Jerry Zhijian Yang    Cheng Yuan
(Received: date / Accepted: date)
Abstract

In this article, we propose a novel Stabilized Physics Informed Neural Networks method (SPINNs) for solving wave equations. This method not only enjoys theoretical convergence guarantees but is also more efficient than the original PINNs. By replacing the L^2 norm with the H^1 norm in the learning of the initial and boundary conditions, we prove that the error of the solution can be upper bounded by the risk in SPINNs. Based on this, we decompose the error of SPINNs into approximation error, statistical error and optimization error. Furthermore, by applying the approximation theory of ReLU^3 networks and learning-theoretic tools based on the Rademacher complexity, covering number and pseudo-dimension of neural networks, we present a systematic non-asymptotic convergence analysis of our method, which shows that the error of SPINNs can be well controlled if the number of training samples and the depth and width of the deep neural network are appropriately chosen. Two illustrative numerical examples on 1-dimensional and 2-dimensional wave equations demonstrate that SPINNs can achieve faster and better convergence than the classical PINNs method.

Keywords:
PINNs · ReLU^3 neural network · Wave equations · Error analysis
MSC:
68T07 65M12 62G05

1 Introduction

During the past few decades, numerical methods for partial differential equations (PDEs) have been widely studied and applied in various fields of scientific computation brenner2007mathematical ; ciarlet2002finite ; Quarteroni2008Numerical ; Thomas2013Numerical . Among these, due to its central significance in solid mechanics, acoustics and electromagnetism, the numerical solution of the wave equation attracts considerable attention, and much work has been done to analyze convergence rates, improve solution efficiency and deal with practical issues such as boundary conditions. For many real problems with complex regions, however, designing efficient and accurate algorithms with practical absorbing boundary conditions is still difficult, especially for problems with irregular boundaries. Furthermore, in the high-dimensional case, many traditional methods may become intractable due to the curse of dimensionality, which leads to an exponential increase in the degrees of freedom with the dimension of the problem.

More recently, inspired by the great success of deep learning in natural language processing and computer vision, solving PDEs with deep learning has become a highly promising topic lagaris1998artificial ; anitescu2019artificial ; Berner2020Numerically ; han2018solving ; lu2021deepxde ; sirignano2018dgm . Several numerical schemes have been proposed to solve PDEs using neural networks, including the deep Ritz method (DRM) Weinan2017The , physics-informed neural networks (PINNs) raissi2019physics , weak adversarial neural networks (WANs) Yaohua2020weak and their extensions jagtap2020conservative ; npinns ; fpinns . Due to the simplicity and flexibility of its formulation, PINNs has arguably received the most attention. In the field of wave equations, researchers have successfully applied PINNs to the modeling of scattered acoustic fields wang2023acoustic , including transcranial ultrasound waves wang2023physics and seismic waves ding2023self . In these works, the authors all observed an interesting phenomenon: training PINNs without any boundary constraints may lead to a solution under an absorbing boundary condition. In other words, the waves obtained by PINNs without a boundary loss are naturally absorbed at the boundary. This phenomenon, in fact, greatly improves the application value of PINNs in wave simulation, especially for inverse scattering problems. On the other hand, although PINNs have been widely used in the simulation of waves, a rigorous numerical analysis of PINNs for wave equations and more efficient training strategies are still needed.

In this work, we propose the Stabilized Physics Informed Neural Networks (SPINNs) for the simulation of waves. By replacing the L^2 norm in the initial condition and boundary condition with the H^1 norm, we obtain a stable PINNs method, in the sense that the error in the solution can be upper bounded by the risk during training. It is worth mentioning that a similar idea called Sobolev Training was proposed in 2017 to improve the efficiency of regression czarnecki2017sobolev . Later, in son2021sobolev and vlassis2021sobolev , the authors generalized this idea to the training of PINNs, with applications to the heat equation, Burgers' equation, the Fokker-Planck equation and elasto-plasticity models. One main difference between our model and these works is that we still use the L^2 norm, rather than the H^1 norm, for the residual and the initial velocity in the loss of SPINNs. This design, as we will demonstrate, turns out to be a sufficient condition to guarantee stability, and it also enables us to achieve a lower error with the same or even fewer training samples. Furthermore, based on this stability, we give the first systematic convergence rate analysis of PINNs for wave equations. In general, our main contributions are summarized as follows:

Main contributions

\bullet We propose a novel Stabilized Physics Informed Neural Networks method (SPINNs) for solving wave equations, for which we prove that the error in solution can be bounded by the training risk.

\bullet We numerically show that SPINNs can achieve a faster and better convergence than original PINNs.

\bullet We present a systematic convergence analysis of SPINNs. According to our result, once the network depth, width and the number of training samples have been appropriately chosen, the error between the numerical solution \hat{u}_{\phi} from SPINNs and the exact solution u^{*} can be made arbitrarily small in the H^1 norm:

\mathbb{E}_{\{X_{n}\}_{n=1}^{N},\{Y_{m}\}_{m=1}^{M},\{T_{k}\}_{k=1}^{K}}\|\hat{u}_{\phi}-u^{*}\|_{H^{1}(\overline{\Omega_{T}})}\leq\varepsilon.

The rest of this paper is organized as follows. In Section 2, we describe the problem setting and introduce the SPINNs method. In Section 3, we study the convergence rate of the SPINNs method for solving wave equations. In Section 4, we present several numerical examples to illustrate the efficiency of SPINNs. Finally, the main conclusions are provided in Section 5.

2 The Stabilized PINNs method

In this section, we introduce the stabilized PINNs (SPINNs) method for solving the wave equation. For completeness, we first list the notations for neural networks and function spaces that we will use. After that, the formulation of SPINNs is presented.

2.1 Preliminary

Let \mathcal{D}\in\mathbb{N}. We call a function f a neural network if it is implemented by:

f_{0}(x) = x,
f_{l}(x) = \rho_{l}(A_{l}f_{l-1}+b_{l}),\quad l=1,\cdots,\mathcal{D}-1,
f := f_{\mathcal{D}}(x)=A_{\mathcal{D}}f_{\mathcal{D}-1}+b_{\mathcal{D}},

where A_{l}=(a_{ij}^{(l)})\in\mathbb{R}^{n_{l}\times n_{l-1}}, b_{l}=(b_{i}^{(l)})\in\mathbb{R}^{n_{l}}, and \rho_{l}:\mathbb{R}^{n_{l}}\rightarrow\mathbb{R}^{n_{l}} is the activation function. The hyper-parameters \mathcal{D} and \mathcal{W}:=\max\{n_{l},\ l=0,\cdots,\mathcal{D}\} are called the depth and the width of the network, respectively. Let \Phi be a set of activation functions and X be a Banach space; the normed neural network function class is defined as

\mathcal{N}(\mathcal{D},\mathcal{W},\{\|\cdot\|_{X},\mathcal{B}\},\Phi):=\{f:\ f\ \text{is implemented by a neural network with depth }\mathcal{D}\text{ and width }\mathcal{W},\ \|f\|_{X}\leq\mathcal{B},\ \rho_{i}^{(l)}\in\Phi\ \text{for each }i\text{ and }l\}.
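As a concrete illustration of this construction, the following is a minimal PyTorch sketch of such a fully connected network; the class names ReLU3 and MLP and the particular depth and width are illustrative assumptions, not part of the definition above.

```python
import torch
import torch.nn as nn

class ReLU3(nn.Module):
    """Cubic ReLU activation rho(x) = max(x, 0)^3 (defined in Section 2.2)."""
    def forward(self, x):
        return torch.relu(x) ** 3

class MLP(nn.Module):
    """Fully connected network with depth D (number of affine maps) and width W."""
    def __init__(self, in_dim, width, depth, out_dim=1, activation=ReLU3):
        super().__init__()
        layers = [nn.Linear(in_dim, width), activation()]   # layer l = 1
        for _ in range(depth - 2):                          # layers l = 2, ..., D-1
            layers += [nn.Linear(width, width), activation()]
        layers.append(nn.Linear(width, out_dim))            # affine output layer f_D
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

# Example: a network mapping (x, t) in R^{d+1} to u(x, t), here d = 1.
u_net = MLP(in_dim=2, width=64, depth=4)
```

The example instantiation with depth 4 and width 64 mirrors the network used for the 1D experiment in Section 4.1.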

Next, we introduce several standard function spaces, including the continuous function space and Sobolev space:

C(\Omega):=\{\text{all the continuous functions defined on }\Omega\},
C^{s}(\Omega):=\{f:\Omega\rightarrow\mathbb{R}\ |\ D^{\alpha}f\in C(\Omega),\ |\alpha|\leq s\},
C(\overline{\Omega}):=\{\text{all the continuous functions defined on }\overline{\Omega}\},\quad \|f\|_{C(\overline{\Omega})}:=\max_{x\in\overline{\Omega}}|f(x)|,
C^{s}(\overline{\Omega}):=\{f:\Omega\rightarrow\mathbb{R}\ |\ D^{\alpha}f\in C(\overline{\Omega}),\ |\alpha|\leq s\},\quad \|f\|_{C^{s}(\overline{\Omega})}:=\max_{x\in\overline{\Omega},|\alpha|\leq s}|D^{\alpha}f(x)|,
L^{p}(\Omega):=\Big\{f:\Omega\rightarrow\mathbb{R}\ |\ \int_{\Omega}|f|^{p}dx<\infty\Big\},\quad \|f\|_{L^{p}(\Omega)}:=\Big[\int_{\Omega}|f|^{p}dx\Big]^{1/p},\ \forall p\in[1,\infty),
L^{\infty}(\Omega):=\{f:\Omega\rightarrow\mathbb{R}\ |\ \exists C>0\ s.t.\ |f|\leq C\ a.e.\},\quad \|f\|_{L^{\infty}(\Omega)}:=\inf\{C\ |\ |f|\leq C\ a.e.\},
H^{s}(\Omega):=\{f:\Omega\rightarrow\mathbb{R}\ |\ D^{\alpha}f\in L^{2}(\Omega),\ |\alpha|\leq s\},\quad \|f\|_{H^{s}(\Omega)}:=\Big(\sum_{|\alpha|\leq s}\|D^{\alpha}f\|_{L^{2}(\Omega)}^{2}\Big)^{1/2}.

2.2 Stabilized PINNs for wave equations

Consider the following wave equation:

\begin{cases}u_{tt}-\Delta u=f,&(x,t)\in\Omega_{T},\\ u(x,0)=\varphi(x),&x\in\Omega,\\ u_{t}(x,0)=\psi(x),&x\in\Omega,\\ u(x,t)=g(x,t),&x\in\partial\Omega,\ t\in[0,T],\end{cases} (1)

where \Delta u=\sum_{i=1}^{d}u_{x_{i}x_{i}}, \Omega=(0,1)^{d} and \Omega_{T}=\Omega\times[0,T]. Throughout this article, we assume that this problem admits a unique solution.
Assumption 2.2. Assume (1) has a unique strong solution u^{*}\in C^{2}(\Omega_{T})\cap C(\overline{\Omega_{T}}).

Without loss of generality, we further assume that f,\varphi,\psi,g and their derivatives are bounded in L^{\infty} by a constant \kappa. Denote \mathcal{B}\triangleq\max\{2\|u^{*}\|_{C^{2}(\overline{\Omega_{T}})},\kappa^{4}\}. Instead of solving problem (1) by traditional numerical methods, we formulate (1) as a minimization problem on C^{2}(\Omega_{T})\cap C(\overline{\Omega_{T}}), with the loss functional \mathcal{L} defined as:

\mathcal{L}(u) := \|u_{tt}(x,t)-\Delta u(x,t)-f\|^{2}_{L^{2}(\Omega_{T})}+\|u(x,0)-\varphi(x)\|^{2}_{H^{1}(\Omega)}
+\|u_{t}(x,0)-\psi(x)\|^{2}_{L^{2}(\Omega)}+\|u(x,t)-g(x,t)\|^{2}_{H^{1}(\partial\Omega_{T})}. (2)
Remark 1

Different from the original loss in PINNs, we adopt the H^1-norm instead of the L^2-norm in the learning of the initial position and the boundary condition. This modification, as we will demonstrate, offers advantages in both theoretical analysis and numerical computation.

With Assumption 2.2, we know that u^{*} is also the unique minimizer of the loss functional \mathcal{L} and satisfies \mathcal{L}(u^{*})=0. Let |\Omega| and |\partial\Omega| be the measures of \Omega and \partial\Omega, namely, |\Omega|:=\int_{\Omega}1dx, |\partial\Omega|:=\int_{\partial\Omega}1dx, and let |T|:=\int_{0}^{T}1dt; then \mathcal{L}(u) can be equivalently written as

\mathcal{L}(u) = |\Omega||T|\mathbb{E}_{X\in U(\Omega),T\in U([0,T])}\big(u_{tt}(X,T)-\Delta u(X,T)-f(X,T)\big)^{2}
+|\Omega|\mathbb{E}_{X\in U(\Omega)}\Big[\big(u(X,0)-\varphi(X)\big)^{2}+\sum_{i=1}^{d}\big(u_{x_{i}}(X,0)-\varphi_{x_{i}}(X)\big)^{2}+\big(u_{t}(X,0)-\psi(X)\big)^{2}\Big]
+|\partial\Omega||T|\mathbb{E}_{Y\in U(\partial\Omega),T\in U([0,T])}\Big[\big(u(Y,T)-g(Y,T)\big)^{2}+\big(u_{t}(Y,T)-g_{t}(Y,T)\big)^{2}+\sum_{i=1}^{d}\big(u_{x_{i}}(Y,T)-g_{x_{i}}(Y,T)\big)^{2}\Big] (3)

where U(\Omega), U(\partial\Omega) and U([0,T]) are the uniform distributions on \Omega, \partial\Omega and [0,T], respectively. To minimize \mathcal{L}(u) approximately, a Monte Carlo discretization of \mathcal{L} is used:

\hat{\mathcal{L}}(u) = \frac{|\Omega||T|}{NK}\sum_{n=1}^{N}\sum_{k=1}^{K}\big(u_{tt}(X_{n},T_{k})-\Delta u(X_{n},T_{k})-f(X_{n},T_{k})\big)^{2}
+\frac{|\Omega|}{N}\sum_{n=1}^{N}\Big[\big(u(X_{n},0)-\varphi(X_{n})\big)^{2}+\sum_{i=1}^{d}\big(u_{x_{i}}(X_{n},0)-\varphi_{x_{i}}(X_{n})\big)^{2}+\big(u_{t}(X_{n},0)-\psi(X_{n})\big)^{2}\Big]
+\frac{|\partial\Omega||T|}{MK}\sum_{m=1}^{M}\sum_{k=1}^{K}\Big[\big(u(Y_{m},T_{k})-g(Y_{m},T_{k})\big)^{2}+\big(u_{t}(Y_{m},T_{k})-g_{t}(Y_{m},T_{k})\big)^{2}+\sum_{i=1}^{d}\big(u_{x_{i}}(Y_{m},T_{k})-g_{x_{i}}(Y_{m},T_{k})\big)^{2}\Big] (4)

where \{X_{n}\}_{n=1}^{N}, \{Y_{m}\}_{m=1}^{M} and \{T_{k}\}_{k=1}^{K} are independent and identically distributed samples from the uniform distributions U(\Omega), U(\partial\Omega) and U([0,T]), respectively. With this approximation, we solve the original problem (1) by empirical risk minimization:

\hat{u}_{\phi}=\arg\min_{u_{\phi}\in\mathcal{P}}\hat{\mathcal{L}}(u_{\phi}), (5)

where the admissible set \mathcal{P} is a deep neural network function class parameterized by \phi. In this work, we choose \mathcal{P} to be the ReLU^3 network function space, which ensures \mathcal{P}\subset C^{2}(\Omega_{T})\cap C(\overline{\Omega_{T}}). More precisely,

\mathcal{P}=\mathcal{N}(\mathcal{D},\mathcal{W},\{\|\cdot\|_{C^{2}(\overline{\Omega_{T}})},\mathcal{B}\},\{ReLU^{3}\}),

where \{\mathcal{D},\mathcal{W}\} will be specified later to ensure the desired accuracy. The ReLU^3 activation function is defined by

\rho(x)=\begin{cases}x^{3},&x\geq 0,\\ 0,&\text{otherwise}.\end{cases}

In practice, the minimizer of problem (5) is obtained through an optimization algorithm \mathcal{A}. We denote the resulting minimizer by u_{\phi_{\mathcal{A}}}.
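To make the empirical risk (4) and the role of the H^1 terms concrete, here is a minimal PyTorch sketch for the one-dimensional case d=1, where u_net maps (x,t) to u(x,t). The helper names (grad, spinns_loss) and the callables f, phi, phi_x, psi, g, g_t, g_x (the data and their derivatives) are illustrative assumptions; the weighting by the domain volumes follows (4).

```python
import torch

def grad(outputs, inputs):
    """First derivative of a batch of scalar outputs w.r.t. inputs."""
    return torch.autograd.grad(outputs, inputs,
                               grad_outputs=torch.ones_like(outputs),
                               create_graph=True)[0]

def spinns_loss(u_net, x_in, t_in, x_ic, x_bd, t_bd,
                f, phi, phi_x, psi, g, g_t, g_x,
                vol_omega, vol_bdry, T):
    """Empirical SPINNs risk (4) for the 1-D wave equation u_tt - u_xx = f.

    x_in, t_in: interior samples; x_ic: initial-time samples;
    x_bd, t_bd: boundary samples. All sample tensors must require grad.
    """
    # Residual term: kept in the L2 norm, exactly as in the original PINNs loss.
    u = u_net(torch.cat([x_in, t_in], dim=1))
    u_t, u_x = grad(u, t_in), grad(u, x_in)
    u_tt, u_xx = grad(u_t, t_in), grad(u_x, x_in)
    loss_res = vol_omega * T * ((u_tt - u_xx - f(x_in, t_in)) ** 2).mean()

    # Initial condition: H1 norm for the displacement, L2 norm for the velocity.
    t0 = torch.zeros_like(x_ic, requires_grad=True)
    u0 = u_net(torch.cat([x_ic, t0], dim=1))
    u0_x, u0_t = grad(u0, x_ic), grad(u0, t0)
    loss_ic = vol_omega * (((u0 - phi(x_ic)) ** 2).mean()
                           + ((u0_x - phi_x(x_ic)) ** 2).mean()
                           + ((u0_t - psi(x_ic)) ** 2).mean())

    # Boundary condition: H1 norm (values plus time and space derivatives).
    ub = u_net(torch.cat([x_bd, t_bd], dim=1))
    ub_t, ub_x = grad(ub, t_bd), grad(ub, x_bd)
    loss_bc = vol_bdry * T * (((ub - g(x_bd, t_bd)) ** 2).mean()
                              + ((ub_t - g_t(x_bd, t_bd)) ** 2).mean()
                              + ((ub_x - g_x(x_bd, t_bd)) ** 2).mean())
    return loss_res + loss_ic + loss_bc
```

In an actual training loop, each epoch would resample the collocation points, evaluate spinns_loss and take an Adam step, matching the procedure described in Section 4.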

3 Convergence analysis of SPINNs

In this section, we present a systematic error analysis of SPINNs for wave equations. To begin with, we review some basic notations and results from the PDE theory of wave equations.

For the wave equation, the total energy consists of two parts: the kinetic energy U and the potential energy V, both of which can be expressed as integrals over \Omega,

U=\frac{1}{2}\int_{\Omega}u_{t}^{2}dx,\qquad V=\frac{1}{2}\int_{\Omega}\Big(\sum_{i=1}^{d}u_{x_{i}}^{2}\Big)dx,

and their sum is called the energy integral. Up to a constant factor, the total energy of the wave equation (1) is denoted by

E(t)=\int_{\Omega}\Big(u_{t}^{2}+\sum_{i=1}^{d}u_{x_{i}}^{2}\Big)dx. (6)
Theorem 3.1 (Energy stability)

We denote E_{0}(t):=\int_{\Omega}u^{2}dx, the squared L^2 norm of u. Then the following energy inequality holds.

E(t)+E_{0}(t) \leq C(T)\Big(E(0)+E_{0}(0)+\int_{0}^{T}\int_{\Omega}f^{2}dxdt+2\int_{0}^{T}\int_{\partial\Omega}|u_{t}|\cdot\|\nabla u\|dsdt\Big).
Proof

See Appendix 6.1 for details.

3.1 Risk decomposition

By the definitions of u^{*} and u_{\phi_{\mathcal{A}}}, we can decompose the risk in SPINNs as

\mathcal{L}(u_{\phi_{\mathcal{A}}})-\mathcal{L}(u^{*})= \mathcal{L}(u_{\phi_{\mathcal{A}}})-\hat{\mathcal{L}}(u_{\phi_{\mathcal{A}}})+\hat{\mathcal{L}}(u_{\phi_{\mathcal{A}}})-\hat{\mathcal{L}}(\hat{u}_{\phi})+\hat{\mathcal{L}}(\hat{u}_{\phi})-\hat{\mathcal{L}}(\overline{u})+\hat{\mathcal{L}}(\overline{u})-\mathcal{L}(\overline{u})+\mathcal{L}(\overline{u})-\mathcal{L}(u^{*})
= \big[\hat{\mathcal{L}}(\hat{u}_{\phi})-\hat{\mathcal{L}}(\overline{u})\big]+\big[\mathcal{L}(u_{\phi_{\mathcal{A}}})-\hat{\mathcal{L}}(u_{\phi_{\mathcal{A}}})\big]+\big[\hat{\mathcal{L}}(\overline{u})-\mathcal{L}(\overline{u})\big]+\big[\mathcal{L}(\overline{u})-\mathcal{L}(u^{*})\big]+\big[\hat{\mathcal{L}}(u_{\phi_{\mathcal{A}}})-\hat{\mathcal{L}}(\hat{u}_{\phi})\big],

where \overline{u} is an arbitrary element of \mathcal{P}. Since \hat{\mathcal{L}}(\hat{u}_{\phi})-\hat{\mathcal{L}}(\overline{u})\leq 0 and \overline{u} is arbitrary in \mathcal{P}, we have:

\mathcal{L}(u_{\phi_{\mathcal{A}}})-\mathcal{L}(u^{*})\leq\underbrace{2\sup_{u\in\mathcal{P}}\big|\mathcal{L}(u)-\hat{\mathcal{L}}(u)\big|}_{\varepsilon_{sta}}+\underbrace{\inf_{u\in\mathcal{P}}\big|\mathcal{L}(u)-\mathcal{L}(u^{*})\big|}_{\varepsilon_{app}}+\underbrace{\big[\hat{\mathcal{L}}(u_{\phi_{\mathcal{A}}})-\hat{\mathcal{L}}(\hat{u}_{\phi})\big]}_{\varepsilon_{opt}}

Thus, we have decomposed the total risk into the approximation error (\varepsilon_{app}), the statistical error (\varepsilon_{sta}) and the optimization error (\varepsilon_{opt}). While the approximation error describes the expressive power of the ReLU^3 network, the statistical error is caused by the Monte Carlo discretization, and the optimization error reflects the performance of the solver \mathcal{A} we use. In this work, we assume for simplicity that the neural network can be trained well enough that \varepsilon_{opt}=0, and leave the optimization error for future study. In this case, \hat{u}_{\phi}=u_{\phi_{\mathcal{A}}}.

3.2 Lower bound of risk

Next, based on the energy stability of wave equations, we present a lower bound of the risk \mathcal{L}(\hat{u}_{\phi})-\mathcal{L}(u^{*}) in SPINNs. As we will demonstrate later, the risk can be made arbitrarily small if the network and the sample complexity are well chosen, and thus we may assume \mathcal{L}(\hat{u}_{\phi})-\mathcal{L}(u^{*})<1. Let v=\hat{u}_{\phi}-u^{*} be the error between the numerical solution and the exact solution; then

\begin{cases}v_{tt}-\Delta v=(\hat{u}_{\phi})_{tt}-\Delta\hat{u}_{\phi}-f\triangleq\tilde{f},&x\in\Omega,\ t\in[0,T],\\ v(x,0)=\hat{u}_{\phi}(x,0)-\varphi(x),&x\in\Omega,\\ v_{t}(x,0)=(\hat{u}_{\phi})_{t}(x,0)-\psi(x),&x\in\Omega,\\ v(x,t)=\hat{u}_{\phi}(x,t)-g(x,t),&x\in\partial\Omega,\ t\in[0,T],\end{cases} (7)

and \|v\|_{C^{2}(\overline{\Omega_{T}})}\leq\frac{3}{2}\mathcal{B}. By applying Theorem 3.1 to equation (7), we obtain

\|\hat{u}_{\phi}-u^{*}\|^{2}_{H^{1}}
= \int_{\Omega}\Big(v_{t}^{2}+\sum_{i=1}^{d}v_{x_{i}}^{2}\Big)dx+\int_{\Omega}v^{2}dx
\leq C(T)\Big(E(0)+E_{0}(0)+\int_{0}^{T}\int_{\Omega}\tilde{f}^{2}dxdt+2\int_{0}^{T}\int_{\partial\Omega}|v_{t}|\cdot\|\nabla v\|dsdt\Big)
\leq C(T)\Big(\int_{\Omega}\big(v_{t}(x,0)^{2}+\sum_{i=1}^{d}v_{x_{i}}(x,0)^{2}\big)dx+\int_{\Omega}v(x,0)^{2}dx+\int_{0}^{T}\int_{\Omega}\big((\hat{u}_{\phi})_{tt}-\Delta\hat{u}_{\phi}-f\big)^{2}dxdt+3\sqrt{d}\mathcal{B}\int_{0}^{T}\int_{\partial\Omega}|v_{t}|dsdt\Big)
\leq C(T)\Big(\int_{\Omega}\big(v_{t}(x,0)^{2}+\sum_{i=1}^{d}v_{x_{i}}(x,0)^{2}\big)dx+\int_{\Omega}v(x,0)^{2}dx+\int_{0}^{T}\int_{\Omega}\big((\hat{u}_{\phi})_{tt}-\Delta\hat{u}_{\phi}-f\big)^{2}dxdt+3\sqrt{d}\mathcal{B}|\partial\Omega_{T}|\Big(\int_{0}^{T}\int_{\partial\Omega}v_{t}^{2}dsdt\Big)^{\frac{1}{2}}\Big)
\leq C(T)\Big(\|(\hat{u}_{\phi})_{tt}-\Delta\hat{u}_{\phi}-f\|^{2}_{L^{2}(\Omega_{T})}+\|(\hat{u}_{\phi})_{t}(x,0)-\psi(x)\|^{2}_{L^{2}(\Omega)}+\|\hat{u}_{\phi}(x,0)-\varphi(x)\|^{2}_{H^{1}(\Omega)}+3\sqrt{d}\mathcal{B}|\partial\Omega_{T}|\,\|\hat{u}_{\phi}(x,t)-g(x,t)\|_{H^{1}(\partial\Omega_{T})}\Big)
\leq C(T)\Big(\mathcal{L}(\hat{u}_{\phi})+3\sqrt{d}\mathcal{B}|\partial\Omega_{T}|\mathcal{L}(\hat{u}_{\phi})^{\frac{1}{2}}\Big)
\leq C(T)\big(1+3\sqrt{d}\mathcal{B}|\partial\Omega_{T}|\big)\mathcal{L}(\hat{u}_{\phi})^{\frac{1}{2}}\quad\big(\text{since }\mathcal{L}(\hat{u}_{\phi})<1\big)
\leq \gamma\big(\mathcal{L}(\hat{u}_{\phi})-\mathcal{L}(u^{*})\big)^{\frac{1}{2}},\quad\big(\text{since }\mathcal{L}(u^{*})=0\big)

where we define \gamma(T,d,\mathcal{B},|\partial\Omega_{T}|)\triangleq C(T)(1+3\sqrt{d}\mathcal{B}|\partial\Omega_{T}|). Combining this lower bound with the previous risk decomposition, we arrive at:

\|\hat{u}_{\phi}-u^{*}\|^{4}_{H^{1}}\leq\gamma^{2}(\varepsilon_{app}+\varepsilon_{sta}). (8)

3.3 Approximation error

By applying the following lemma, proved in our previous work, we obtain an upper bound on the approximation error:

Lemma 1

For any \overline{u}\in C^{3}(\overline{\Omega_{T}}) and \varepsilon>0, there exists a ReLU^3 network u_{\phi} with depth [\log_{2}d]+2 and width C(d,\|\overline{u}\|_{C^{3}(\overline{\Omega_{T}})})(\frac{1}{\varepsilon})^{d+1} such that

\|\overline{u}-u_{\phi}\|_{C^{2}(\overline{\Omega_{T}})}\leq\varepsilon.
Proof

A special case of Corollary 4.2 in AAM-39-239 .

Theorem 3.2

Under Assumption 2.2 and the condition that u^{*}\in C^{3}(\overline{\Omega_{T}}), for any \varepsilon>0, if we choose the following neural network function class:

\mathcal{P} = \mathcal{N}\big([\log_{2}d]+2,\ C(d,|\Omega|,|\partial\Omega|,|T|,\|u^{*}\|_{C^{3}(\overline{\Omega_{T}})})(\tfrac{1}{\varepsilon})^{d+1},\ \{\|\cdot\|_{C^{2}(\overline{\Omega_{T}})},2\|u^{*}\|_{C^{2}(\overline{\Omega_{T}})}\},\ \{ReLU^{3}\}\big),

then the approximation error satisfies \varepsilon_{app}\leq C(d,|\Omega_{T}|,|\partial\Omega_{T}|)\varepsilon^{2}.

Proof

See Appendix 6.2 for details.

3.4 Statistical error

The following theorem demonstrates that with sufficiently large sample complexity, the statistical error can be well controlled:

Theorem 3.3

Let \mathcal{D},\mathcal{W}\in\mathbb{N} and \mathcal{B}\in\mathbb{R}^{+}. For any \varepsilon\geq 0, if the numbers of samples satisfy:

\begin{cases}N=C(d,|\Omega|,\mathcal{B})\mathcal{D}^{4}\mathcal{W}^{2}(\mathcal{D}+\log(\mathcal{W}))(\frac{1}{\varepsilon})^{2+\delta},\\ K=C(d,|T|,\mathcal{B})\mathcal{D}^{2}f_{K}(\mathcal{D},\mathcal{W})(\frac{1}{\varepsilon})^{k_{1}},\\ M=C(d,|\partial\Omega|,\mathcal{B})f_{M}(\mathcal{D},\mathcal{W})(\frac{1}{\varepsilon})^{k_{2}},\end{cases}

where f_{K}(\mathcal{D},\mathcal{W})\geq 1, f_{M}(\mathcal{D},\mathcal{W})\geq 1, and \delta is an arbitrarily small positive number such that

\begin{cases}k_{1}+k_{2}=2+\delta,\\ f_{K}(\mathcal{D},\mathcal{W})\cdot f_{M}(\mathcal{D},\mathcal{W})=\mathcal{D}^{2}\mathcal{W}^{2}(\mathcal{D}+\log(\mathcal{W})),\end{cases}

then we have:

\mathbb{E}_{\{X_{n}\}_{n=1}^{N},\{Y_{m}\}_{m=1}^{M},\{T_{k}\}_{k=1}^{K}}\sup_{u\in\mathcal{P}}|\mathcal{L}(u)-\hat{\mathcal{L}}(u)|\leq\varepsilon.
Proof

See Appendix 6.3 for details.

3.5 Convergence rate of SPINNs

With the bounds on the approximation and statistical errors prepared in the last two subsections, we now give the main result.

Theorem 3.4

Under Assumption 2.2 and the condition that u^{*}\in C^{3}(\overline{\Omega_{T}}), for any \varepsilon>0, if we choose the parameterized neural network class

\mathcal{P} = \mathcal{N}\big([\log_{2}d]+2,\ C(d,|\Omega|,|\partial\Omega|,|T|,\|u^{*}\|_{C^{3}(\overline{\Omega_{T}})})(\tfrac{1}{\varepsilon^{2}})^{d+1},\ \{\|\cdot\|_{C^{2}(\overline{\Omega_{T}})},2\|u^{*}\|_{C^{2}(\overline{\Omega_{T}})}\},\ \{ReLU^{3}\}\big)

and let the number of samples be

\begin{cases}N=C(d,|\Omega|,\mathcal{B})(\frac{1}{\varepsilon^{4}})^{d+3+\delta},\\ K=C(d,|T|,\mathcal{B})(\frac{1}{\varepsilon^{4}})^{k_{1}+\tilde{k}_{1}},\\ M=C(d,|\partial\Omega|,\mathcal{B})(\frac{1}{\varepsilon^{4}})^{k_{2}+\tilde{k}_{2}},\end{cases}

where \tilde{k}_{1},\tilde{k}_{2}\geq 0 and \delta is an arbitrarily small positive number such that

{k1+k2=2+δ,k~1+k~2=d+1,\begin{cases}k_{1}+k_{2}=2+\delta,\\ \tilde{k}_{1}+\tilde{k}_{2}=d+1,\end{cases}

then we have:

\mathbb{E}_{\{X_{n}\}_{n=1}^{N},\{Y_{m}\}_{m=1}^{M},\{T_{k}\}_{k=1}^{K}}\|\hat{u}_{\phi}-u^{*}\|_{H^{1}(\overline{\Omega_{T}})}\leq\varepsilon.
Proof

By Theorem 3.2, if we set the neural network function class as:

\mathcal{P} = \mathcal{N}\big([\log_{2}d]+2,\ C(d,|\Omega|,|\partial\Omega|,|T|,\|u^{*}\|_{C^{3}(\overline{\Omega_{T}})})(\tfrac{1}{\varepsilon^{2}})^{d+1},\ \{\|\cdot\|_{C^{2}(\overline{\Omega_{T}})},2\|u^{*}\|_{C^{2}(\overline{\Omega_{T}})}\},\ \{ReLU^{3}\}\big) (9)

the approximation error can be arbitrarily small:

\varepsilon_{app}\leq\frac{\varepsilon^{4}}{2\gamma^{2}} (10)

Without loss of generality we assume that ε\varepsilon is small enough such that

\|\hat{u}_{\phi}\|_{C^{2}(\overline{\Omega_{T}})}\leq\|u^{*}-\hat{u}_{\phi}\|_{C^{2}(\overline{\Omega_{T}})}+\|u^{*}\|_{C^{2}(\overline{\Omega_{T}})}\leq 2\|u^{*}\|_{C^{2}(\overline{\Omega_{T}})}.

By Theorem 3.3, when the numbers of samples are:

\begin{cases}N=C(d,|\Omega|,\mathcal{B})(\frac{1}{\varepsilon^{4}})^{d+3+\delta},\\ K=C(d,|T|,\mathcal{B})(\frac{1}{\varepsilon^{4}})^{k_{1}+\tilde{k}_{1}},\\ M=C(d,|\partial\Omega|,\mathcal{B})(\frac{1}{\varepsilon^{4}})^{k_{2}+\tilde{k}_{2}},\end{cases} (11)

where \delta is an arbitrarily small positive number and

{k1+k2=2+δ,k~1+k~2=d+1,\begin{cases}k_{1}+k_{2}=2+\delta,\\ \tilde{k}_{1}+\tilde{k}_{2}=d+1,\end{cases} (12)

we have:

\mathbb{E}_{\{X_{n}\}_{n=1}^{N},\{Y_{m}\}_{m=1}^{M},\{T_{k}\}_{k=1}^{K}}\varepsilon_{sta}\leq\frac{\varepsilon^{4}}{2\gamma^{2}} (13)

Combining (8), (10) and (13), we obtain the final result:

\mathbb{E}_{\{X_{n}\}_{n=1}^{N},\{Y_{m}\}_{m=1}^{M},\{T_{k}\}_{k=1}^{K}}\|\hat{u}_{\phi}-u^{*}\|_{H^{1}(\overline{\Omega_{T}})}
\leq \mathbb{E}_{\{X_{n}\}_{n=1}^{N},\{Y_{m}\}_{m=1}^{M},\{T_{k}\}_{k=1}^{K}}\big[\gamma^{2}\big(\mathcal{L}(\hat{u}_{\phi})-\mathcal{L}(u^{*})\big)\big]^{1/4}
\leq \varepsilon. (14)

4 Numerical Experiments

In this section, we use SPINNs to solve wave equations in one and two dimensions.

4.1 1D example

Consider the following 1D wave equation on \Omega=[-2,2] from t=0 to t=T=8:

\begin{cases}u_{tt}=u_{xx},&x\in\Omega,\ 0\leq t\leq 8,\\ u(0,x)=0,&x\in\Omega,\\ u_{t}(0,x)=0,&x\in\Omega,\\ u(t,-2)=\sin(0.8\pi t),&0\leq t\leq 8,\\ u(t,2)=0,&0\leq t\leq 8.\end{cases} (15)

For the training with SPINNs, we use a four-layer ReLU^3 network with 64 neurons in each layer to approximate the solution. We use the Adam algorithm for the minimization, and the initial learning rate is set to 1E-3.

As for the sample complexity, we train SPINNs with 10000 interior points, 500 boundary points (250 for each end) and 250 initial points in each epoch, all of which are sampled from a uniform distribution (see (a) in Figure 1). Furthermore, to obtain better accuracy, we apply the GAS method jiao2023gas as an adaptive sampling strategy. After every 250 epochs, we adaptively add 600 interior points, 30 boundary points and 15 initial points based on a Gaussian mixture distribution. The GAS procedure is repeated 10 times; see jiao2023gas for more details. For evaluation, we use the central difference method with a fine mesh (dx=0.01, dt=0.009) to obtain a reference solution u_{cdm} (see (b) in Figure 1), with which we calculate the following relative error by numerical integration:

\text{Relative error}=\frac{\|u_{\phi}-u_{cdm}\|_{2}}{\|u_{cdm}\|_{2}}.
Figure 1: Collocation points (left) and reference solution (right) in [0,T]\times\Omega. The red, green and blue points stand for the initial points, boundary points and interior points, respectively.
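As a concrete illustration of this sampling setup, the snippet below draws one epoch of uniform training points for (15) and evaluates the relative error against a reference solution; the array names and the helper relative_error are illustrative assumptions, and the adaptive GAS resampling from jiao2023gas is not shown.

```python
import torch

# One epoch of uniform samples for the 1-D example (15): Omega = [-2, 2], T = 8.
x_in = 4.0 * torch.rand(10000, 1) - 2.0   # interior x ~ U(-2, 2)
t_in = 8.0 * torch.rand(10000, 1)         # interior t ~ U(0, 8)
t_bd = 8.0 * torch.rand(250, 1)           # boundary times (used at x = -2 and x = 2)
x_ic = 4.0 * torch.rand(250, 1) - 2.0     # initial-time samples at t = 0

# Relative L2 error against a reference solution evaluated on the same grid.
def relative_error(u_pred: torch.Tensor, u_ref: torch.Tensor) -> torch.Tensor:
    return torch.linalg.norm(u_pred - u_ref) / torch.linalg.norm(u_ref)
```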

Figure 2 shows the numerical results of PINNs and SPINNs for (15) after N_G rounds of adaptive sampling by GAS. As we can see, the SPINNs method converges faster than the classical PINNs; for example, after 5 rounds of adaptive sampling, SPINNs has already captured all six peaks of the standing waves generated by the superposition of the reflected and right-traveling waves. Furthermore, we present the relative errors of PINNs and SPINNs in Figure 3, which shows that our method achieves a lower relative error at the early stage of training. On the other hand, as the number of adaptive sampling rounds increases, the classical PINNs can also reach a comparable accuracy, which is also revealed by (g) in Figure 2. These results reflect the fact that training with SPINNs can speed up the convergence of the solution, especially when the number of samples is relatively small.

(a) N_G=2, u^{PINNs}
(b) N_G=2, u^{SPINNs}
(c) N_G=5, u^{PINNs}
(d) N_G=5, u^{SPINNs}
(e) N_G=8, u^{PINNs}
(f) N_G=8, u^{SPINNs}
(g) N_G=11, u^{PINNs}
(h) N_G=11, u^{SPINNs}
Figure 2: The numerical result of PINNs and SPINNs for equation (15).
Figure 3: The relative error of PINNs and SPINNs.

4.2 2D example

Consider the following 2D wave equation on \Omega=[-2,2]^{2} from t=0 to t=1:

\begin{cases}u_{tt}=u_{xx}+u_{yy},&(x,y)\in\Omega,\ 0\leq t\leq 1,\\ u(0,x,y)=\sin(2\pi\sqrt{x^{2}+y^{2}}),&\sqrt{x^{2}+y^{2}}<1,\\ u(0,x,y)=0,&\sqrt{x^{2}+y^{2}}\geq 1,\\ u_{t}(0,x,y)=0,&(x,y)\in\Omega,\\ u(t,x,y)=0,&(x,y)\in\partial\Omega,\ 0\leq t\leq 1,\end{cases} (16)

For the training with SPINNs, based on our experiments, we choose a three-layer ReLU^3 network with 512 neurons in each layer to approximate the solution. The optimization algorithm and the initial learning rate are the same as before.

As for the sample complexity, we train SPINNs with 2000 interior points, 4000 boundary points (1000 for each edge) and 1000 initial points in each epoch, all of which are sampled from a uniform distribution. For evaluation, we use the central difference method with a fine mesh (dx=dy=0.01, dt=0.004) to obtain a reference solution u_{cdm}, with which we calculate the pointwise absolute error |u_{cdm}-u_{\phi}|. As we can observe from Figure 4 and Figure 5, the SPINNs method achieves a lower pointwise error after training for the same number of epochs. This superiority can be attributed to the fact that learning with derivative information improves the accuracy of fitting the initial condition.

(a) t=0, u_{\phi}
(b) t=0, |u_{cdm}-u_{\phi}|
(c) t=0.1, u_{\phi}
(d) t=0.1, |u_{cdm}-u_{\phi}|
(e) t=0.2, u_{\phi}
(f) t=0.2, |u_{cdm}-u_{\phi}|
(g) t=0.3, u_{\phi}
(h) t=0.3, |u_{cdm}-u_{\phi}|
Figure 4: The numerical result for (16) after 10000 epochs training of PINNs.
(a) t=0, u_{\phi}
(b) t=0, |u_{cdm}-u_{\phi}|
(c) t=0.1, u_{\phi}
(d) t=0.1, |u_{cdm}-u_{\phi}|
(e) t=0.2, u_{\phi}
(f) t=0.2, |u_{cdm}-u_{\phi}|
(g) t=0.3, u_{\phi}
(h) t=0.3, |u_{cdm}-u_{\phi}|
Figure 5: The numerical result for (16) after 10000 epochs training of SPINNs.

5 Conclusion

In this work, we propose a stabilized physics informed neural networks method, SPINNs, for wave equations. We rigorously prove that SPINNs is a stable learning method, in the sense that the solution error can be controlled by the loss. Based on this, a non-asymptotic convergence rate of SPINNs is presented, which provides a solid theoretical foundation for its use. Furthermore, by applying SPINNs to the simulation of two wave propagation problems, we numerically demonstrate that SPINNs can achieve higher training accuracy and efficiency, especially when the number of samples is limited. On the other hand, how to extend this method to more difficult situations such as high-dimensional problems, and how to handle the optimization error in our convergence analysis, still need to be studied. We leave these topics for future research.

Acknowledgement

This work is supported by the National Key Research and Development Program of China (No.2020YFA0714200), by the National Nature Science Foundation of China (No.12371441, No.12301558, No.12125103, No.12071362), and by the Fundamental Research Funds for the Central Universities.

6 Appendix

6.1 Appendix for energy integral of wave equations

According to the Gauss (divergence) formula, we have

Ω(i=1d(utxiuxi)+ut(i=1duxixi))𝑑x\displaystyle\int_{\Omega}\left(\sum_{i=1}^{d}(u_{tx_{i}}u_{x_{i}})+u_{t}(\sum_{i=1}^{d}u_{x_{i}x_{i}})\right)dx =Ω((utu))𝑑x\displaystyle=\int_{\Omega}(\nabla\cdot(u_{t}\nabla u))dx
=Ωutu𝐧ds,\displaystyle=\int_{\partial\Omega}u_{t}\nabla u\cdot\mathbf{n}ds, (17)

where \mathbf{n} is the unit outward normal vector. Combining (6) and (17), we have

dE(t)dt\displaystyle\frac{dE(t)}{dt} =2Ω(ututt+i=1duxiuxit)𝑑x\displaystyle=2\int_{\Omega}\left(u_{t}u_{tt}+\sum_{i=1}^{d}u_{x_{i}}u_{x_{i}t}\right)dx
=2Ω(ututtuti=1duxixi)𝑑x+2Ωutu𝐧ds\displaystyle=2\int_{\Omega}\left(u_{t}u_{tt}-u_{t}\sum_{i=1}^{d}u_{x_{i}x_{i}}\right)dx+2\int_{\partial\Omega}u_{t}\nabla u\cdot\mathbf{n}ds
=2Ωutf𝑑x+2Ωutu𝐧ds\displaystyle=2\int_{\Omega}u_{t}fdx+2\int_{\partial\Omega}u_{t}\nabla u\cdot\mathbf{n}ds
Ω(ut2+f2)𝑑x+2Ω|ut|u𝑑s.\displaystyle\leq\int_{\Omega}(u_{t}^{2}+f^{2})dx+2\int_{\partial\Omega}|u_{t}|\cdot\|\nabla u\|ds. (18)

Multiplying both sides of the inequality (18) by e^{-t},

d(etE(t))dtet(Ωf2𝑑x+2Ω|ut|u𝑑s).\displaystyle\frac{d(e^{-t}E(t))}{dt}\leq e^{-t}\left(\int_{\Omega}f^{2}dx+2\int_{\partial\Omega}|u_{t}|\cdot\|\nabla u\|ds\right). (19)

Then, integrating the equation (19) from 0 to t,

E(t)et(E(0)+0teτΩf2𝑑x𝑑τ+20teτΩ|ut|u𝑑s𝑑τ)\displaystyle E(t)\leq e^{t}\left(E(0)+\int_{0}^{t}e^{-\tau}\int_{\Omega}f^{2}dxd\tau+2\int_{0}^{t}e^{-\tau}\int_{\partial\Omega}|u_{t}|\cdot\|\nabla u\|dsd\tau\right)

For any t[0,T]t\in[0,T],

E(t)C1(E(0)+0TΩf2𝑑x𝑑t+20TΩ|ut|u𝑑s𝑑t),\displaystyle E(t)\leq C_{1}\left(E(0)+\int_{0}^{T}\int_{\Omega}f^{2}dxdt+2\int_{0}^{T}\int_{\partial\Omega}|u_{t}|\cdot\|\nabla u\|dsdt\right), (20)

where C_{1} is a constant depending only on T. Further, we have

dE0(t)dt=Ω2uut𝑑xΩu2𝑑x+Ωut2𝑑xE0(t)+E(t)\displaystyle\frac{dE_{0}(t)}{dt}=\int_{\Omega}2uu_{t}dx\leq\int_{\Omega}u^{2}dx+\int_{\Omega}u_{t}^{2}dx\leq E_{0}(t)+E(t) (21)

Multiplying both sides of (21) by e^{-t},

ddt(etE0(t))etE(t)\displaystyle\frac{d}{dt}(e^{-t}E_{0}(t))\leq e^{-t}E(t) (22)

Integrating (22) from 0 to t,

E0(t)etE0(0)+et0teτE(τ)𝑑τ\displaystyle E_{0}(t)\leq e^{t}E_{0}(0)+e^{t}\int_{0}^{t}e^{-\tau}E(\tau)d\tau (23)

For any t[0,T]t\in[0,T],

E0(t)C2(E0(0)+E(t)),\displaystyle E_{0}(t)\leq C_{2}\left(E_{0}(0)+E(t)\right), (24)

where C_{2} is a constant depending only on T. Combining (20) and (24), we have

E(t)+E0(t)C(E(0)+E0(0)+0TΩf2dxdt\displaystyle E(t)+E_{0}(t)\leq C\left(E(0)+E_{0}(0)+\int_{0}^{T}\int_{\Omega}f^{2}dxdt\right. (25)
+20TΩ|ut|udsdt).\displaystyle+\left.2\int_{0}^{T}\int_{\partial\Omega}|u_{t}|\cdot\|\nabla u\|dsdt\right). (26)

where C is a constant depending only on T.

6.2 Appendix for approximation error

Proof of Theorem 3.2. According to Lemma 1, for u^{*}\in C^{3}(\overline{\Omega_{T}}) and any \varepsilon>0, there exists a ReLU^3 network \hat{u}_{\phi} with depth [\log_{2}d]+2 and width C(d,\|u^{*}\|_{C^{3}(\overline{\Omega_{T}})})(\frac{1}{\varepsilon})^{d+1} such that \|v(x,t)\|_{C^{2}(\overline{\Omega_{T}})}=\|u^{*}-\hat{u}_{\phi}\|_{C^{2}(\overline{\Omega_{T}})}\leq\varepsilon. Hence,

εapp\displaystyle\varepsilon_{app} |(u^ϕ)(u)|\displaystyle\leq|\mathcal{L}(\hat{u}_{\phi})-\mathcal{L}(u^{*})|
=(u^ϕ)tt(x,t)Δu^ϕ(x,t)fL2(ΩT)2+u^ϕ(x,0)φ(x)H1(Ω)2\displaystyle=\|(\hat{u}_{\phi})_{tt}(x,t)-\Delta\hat{u}_{\phi}(x,t)-f\|^{2}_{L^{2}(\Omega_{T})}+\|\hat{u}_{\phi}(x,0)-\varphi(x)\|^{2}_{H^{1}(\Omega)}
+(u^ϕ)t(x,0)ψ(x)L2(Ω)2+u^ϕ(x,t)g(x,t)H1(ΩT)2\displaystyle+\|(\hat{u}_{\phi})_{t}(x,0)-\psi(x)\|^{2}_{L^{2}(\Omega)}+\|\hat{u}_{\phi}(x,t)-g(x,t)\|^{2}_{H^{1}(\partial\Omega_{T})}
vttL2(ΩT)2+ΔvL2(ΩT)2+v(x,0)H1(Ω)2+vt(x,0)L2(Ω)2+vH1(ΩT)2\displaystyle\leq\|v_{tt}\|^{2}_{L^{2}(\Omega_{T})}+\|\Delta v\|^{2}_{L^{2}(\Omega_{T})}+\|v(x,0)\|^{2}_{H^{1}(\Omega)}+\|v_{t}(x,0)\|^{2}_{L^{2}(\Omega)}+\|v\|_{H^{1}(\partial\Omega_{T})}^{2}
C(d,|ΩT|,|ΩT|)ε2.\displaystyle\leq C(d,|\Omega_{T}|,|\partial\Omega_{T}|)\cdot\varepsilon^{2}.

6.3 Appendix for statistical error

In this section we give the precise upper bound on the statistical error. To begin with, we introduce several basic concepts and results from learning theory.

Definition 1

(Rademacher complexity) The Rademacher complexity of a set ANA\subseteq\mathbb{R}^{N} is defined by

(A)=𝔼{σk}k=1N[supaA1Nk=1Nσkak],\displaystyle\Re(A)=\mathbb{E}_{\{\sigma_{k}\}_{k=1}^{N}}\Bigg{[}\ \sup_{a\in A}\frac{1}{N}\sum_{k=1}^{N}\sigma_{k}a_{k}\Bigg{]},

where {σk}k=1N\{\sigma_{k}\}_{k=1}^{N} are NN i.i.d Rademacher variables with p(σk=1)=p(σk=1)=12p(\sigma_{k}=1)=p(\sigma_{k}=-1)=\frac{1}{2}.

Let Ω\Omega be a set and \mathcal{F} be a function class which maps Ω\Omega to \mathbb{R}. Let PP be a probability distribution over Ω\Omega and {Xk}k=1N\{X_{k}\}_{k=1}^{N} be i.i.d. samples from PP. The Rademacher complexity of \mathcal{F} associated with distribution PP and sample size NN is defined by

P,N()=𝔼{Xk,σk}k=1N[supu1Nk=1Nσku(Xk)].\displaystyle\Re_{P,N}(\mathcal{F})=\mathbb{E}_{\{X_{k},\sigma_{k}\}_{k=1}^{N}}\Bigg{[}\ \sup_{u\in\mathcal{F}}\frac{1}{N}\sum_{k=1}^{N}\sigma_{k}u(X_{k})\Bigg{]}.
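As a small illustration of Definition 1, the sketch below Monte Carlo estimates the Rademacher complexity of a finite set of sample vectors; in the analysis above \mathcal{F} is an infinite network class, so this snippet is only meant to make the expectation over the random signs concrete (the function name and the finite-class restriction are assumptions for illustration).

```python
import torch

def empirical_rademacher(outputs: torch.Tensor, n_draws: int = 1000) -> float:
    """Monte Carlo estimate of R(A) for the finite set
    A = {(u_j(X_1), ..., u_j(X_N)) : j = 1, ..., J}, passed as a (J, N) tensor
    with outputs[j, k] = u_j(X_k)."""
    num_funcs, N = outputs.shape
    total = 0.0
    for _ in range(n_draws):
        # i.i.d. Rademacher signs sigma_k in {-1, +1}
        sigma = (torch.randint(0, 2, (N,)) * 2 - 1).to(outputs.dtype)
        total += (outputs @ sigma / N).max().item()  # sup over the finite class
    return total / n_draws
```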
Lemma 2

Let Ω\Omega be a set and PP be a probability distribution over Ω\Omega. Let NN\in\mathbb{N}. Assume that ω:Ω\omega:\Omega\rightarrow\mathbb{R} and |ω(x)||\omega(x)|\leq\mathcal{B} for all xΩx\in\Omega, then for any function class \mathcal{F} mapping Ω\Omega to \mathbb{R}, there holds

P,N(ω(x))P,N(),\displaystyle\Re_{P,N}(\omega(x)\mathcal{F})\leq\mathcal{B}\Re_{P,N}(\mathcal{F}),

where ω(x):={u¯:u¯(x)=ω(x)u(x),u}.\omega(x)\mathcal{F}:=\{\overline{u}:\overline{u}(x)=\omega(x)u(x),u\in\mathcal{F}\}.

Proof

See jiao2022rate for the proof.

Definition 2

(Covering number) Suppose that W\subset\mathbb{R}^{n}. For any \varepsilon>0, let V\subset\mathbb{R}^{n} be an \varepsilon-cover of W with respect to the distance d_{\infty}, that is, for any u\in W there exists a v\in V such that d_{\infty}(u,v)<\varepsilon, where d_{\infty} is defined by d_{\infty}(u,v):=\max_{1\leq i\leq n}|u_{i}-v_{i}|. The covering number \mathcal{C}(\varepsilon,W,d_{\infty}) is defined to be the minimum cardinality among all \varepsilon-covers of W with respect to the distance d_{\infty}.

Definition 3

(Uniform covering number) Suppose that \mathcal{F} is a class of functions from \Omega to \mathbb{R}. Given n samples Z_{n}=(Z_{1},Z_{2},\cdots,Z_{n})\in\Omega^{n}, the set \mathcal{F}|_{Z_{n}}\subset\mathbb{R}^{n} is defined by

\mathcal{F}|_{Z_{n}}=\{(u(Z_{1}),u(Z_{2}),\cdots,u(Z_{n})):u\in\mathcal{F}\}.

The uniform covering number 𝒞(ε,,n)\mathcal{C}_{\infty}(\varepsilon,\mathcal{F},n) is defined by

𝒞(ε,,n)=maxZnΩn𝒞(ε,|Zn,d).\displaystyle\mathcal{C}_{\infty}(\varepsilon,\mathcal{F},n)=\max_{Z_{n}\in\Omega^{n}}\mathcal{C}(\varepsilon,\mathcal{F}|_{Z_{n}},d_{\infty}).
Lemma 3

Let \Omega be a set and P be a probability distribution over \Omega. Let N\in\mathbb{N}_{\geq 1}, and let \mathcal{F} be a class of functions from \Omega to \mathbb{R} such that 0\in\mathcal{F} and the diameter of \mathcal{F} is less than \mathcal{B}, i.e., \|u\|_{L^{\infty}(\Omega)}\leq\mathcal{B} for all u\in\mathcal{F}. Then

P,N()inf0<δ<(4δ+12Nδlog(2𝒞(ε,,N))𝑑ε).\displaystyle\Re_{P,N}(\mathcal{F})\leq\inf_{0<\delta<\mathcal{B}}\bigg{(}4\delta+\frac{12}{\sqrt{N}}\int_{\delta}^{\mathcal{B}}\sqrt{log(2\mathcal{C}_{\infty}(\varepsilon,\mathcal{F},N))}d\varepsilon\bigg{)}.
Proof

This proof is based on the chaining method; see van1996weak .

Definition 4

(Pseudo-dimension) Let \mathcal{F} be a class of functions from XX to \mathbb{R}. Suppose that S={x1,x2,,xn}XS=\{x_{1},x_{2},\cdots,x_{n}\}\subset X. We say that SS is pseudo-shattered by \mathcal{F} if there exists y1,y2,,yny_{1},y_{2},\cdots,y_{n} such that for any b{0,1}nb\in\{0,1\}^{n}, there exists a uu\in\mathcal{F} satisfying

sign(u(xi)yi)=bi,i=1,2,,n,\displaystyle sign(u(x_{i})-y_{i})=b_{i},i=1,2,\cdots,n,

and we say that \{y_{i}\}_{i=1}^{n} witnesses the shattering. The pseudo-dimension of \mathcal{F}, denoted by Pdim(\mathcal{F}), is defined to be the maximum cardinality among all sets pseudo-shattered by \mathcal{F}.

Lemma 4 (Theorem 12.2 in Anthony1999Neural )

Let \mathcal{F} be a class of real functions from a domain XX to the bounded interval [0,][0,\mathcal{B}].Let ε>0\varepsilon>0. Then

𝒞(ε,,n)i=1Pdim()(ni)(ε)i,\displaystyle\mathcal{C}_{\infty}(\varepsilon,\mathcal{F},n)\leq\sum_{i=1}^{Pdim(\mathcal{F})}\tbinom{n}{i}{\tbinom{\mathcal{B}}{\varepsilon}}^{i},

which is less than (enεPdim())Pdim()(\frac{en\mathcal{B}}{\varepsilon\cdot Pdim(\mathcal{F})})^{Pdim(\mathcal{F})} for nPdim()n\geq Pdim(\mathcal{F}).

Next, to obtain the upper bound, we decompose the statistical error into 24 terms using the triangle inequality:

𝔼{Xn}n=1N,{Ym}m=1M,{Tk}k=1Ksupu𝒫|(u)^(u)|\displaystyle\mathbb{E}_{\{X_{n}\}_{n=1}^{N},\{Y_{m}\}_{m=1}^{M},\{T_{k}\}_{k=1}^{K}}\sup_{u\in\mathcal{P}}|\mathcal{L}(u)-\hat{\mathcal{L}}(u)|\leq
j=124𝔼{Xn}n=1N,{Ym}m=1M,{Tk}k=1Ksupu𝒫|j(u)^j(u)|\displaystyle\sum_{j=1}^{24}\mathbb{E}_{\{X_{n}\}_{n=1}^{N},\{Y_{m}\}_{m=1}^{M},\{T_{k}\}_{k=1}^{K}}\sup_{u\in\mathcal{P}}|\mathcal{L}_{j}(u)-\hat{\mathcal{L}}_{j}(u)|

where

1\displaystyle\mathcal{L}_{1} =|Ω||T|𝔼XU(Ω),TU([0,T])(utt(X,T))2,\displaystyle=|\Omega||T|\mathbb{E}_{X\sim U(\Omega),T\sim U([0,T])}\left(u_{tt}(X,T)\right)^{2},
2\displaystyle\mathcal{L}_{2} =|Ω||T|𝔼XU(Ω),TU([0,T])(i=1duxixi(X,T))2,\displaystyle=|\Omega||T|\mathbb{E}_{X\sim U(\Omega),T\sim U([0,T])}\left(\sum_{i=1}^{d}u_{x_{i}x_{i}}(X,T)\right)^{2},
3\displaystyle\mathcal{L}_{3} =|Ω||T|𝔼XU(Ω),TU([0,T])(f(X,T))2,\displaystyle=|\Omega||T|\mathbb{E}_{X\sim U(\Omega),T\sim U([0,T])}\left(f(X,T)\right)^{2},
4\displaystyle\mathcal{L}_{4} =2|Ω||T|𝔼XU(Ω),TU([0,T])(i=1dutt(X,T)uxixi(X,T)),\displaystyle=-2|\Omega||T|\mathbb{E}_{X\sim U(\Omega),T\sim U([0,T])}\left(\sum_{i=1}^{d}u_{tt}(X,T)u_{x_{i}x_{i}}(X,T)\right),
5\displaystyle\mathcal{L}_{5} =2|Ω||T|𝔼XU(Ω),TU([0,T])(utt(X,T)f(X,T)),\displaystyle=-2|\Omega||T|\mathbb{E}_{X\sim U(\Omega),T\sim U([0,T])}\left(u_{tt}(X,T)f(X,T)\right),
6\displaystyle\mathcal{L}_{6} =2|Ω||T|𝔼XU(Ω),TU([0,T])(i=1duxixi(X,T)f(X,T)),\displaystyle=2|\Omega||T|\mathbb{E}_{X\sim U(\Omega),T\sim U([0,T])}\left(\sum_{i=1}^{d}u_{x_{i}x_{i}}(X,T)f(X,T)\right),
7\displaystyle\mathcal{L}_{7} =|Ω|𝔼XU(Ω)(u(X,0))2,\displaystyle=|\Omega|\mathbb{E}_{X\sim U(\Omega)}\left(u(X,0)\right)^{2},
8\displaystyle\mathcal{L}_{8} =|Ω|𝔼XU(Ω)(φ(X))2,\displaystyle=|\Omega|\mathbb{E}_{X\sim U(\Omega)}\left(\varphi(X)\right)^{2},
9\displaystyle\mathcal{L}_{9} =2|Ω|𝔼XU(Ω)(u(X,0)φ(X)),\displaystyle=-2|\Omega|\mathbb{E}_{X\sim U(\Omega)}\left(u(X,0)\varphi(X)\right),
10\displaystyle\mathcal{L}_{10} =|Ω|𝔼XU(Ω)(i=1duxi(X,0)2),\displaystyle=|\Omega|\mathbb{E}_{X\sim U(\Omega)}\left(\sum_{i=1}^{d}u_{x_{i}}(X,0)^{2}\right),
11\displaystyle\mathcal{L}_{11} =|Ω|𝔼XU(Ω)(i=1dφxi(X)2),\displaystyle=|\Omega|\mathbb{E}_{X\sim U(\Omega)}\left(\sum_{i=1}^{d}\varphi_{x_{i}}(X)^{2}\right),
12\displaystyle\mathcal{L}_{12} =2|Ω|𝔼XU(Ω)(i=1duxi(X,0)φxi(X)),\displaystyle=-2|\Omega|\mathbb{E}_{X\sim U(\Omega)}\left(\sum_{i=1}^{d}u_{x_{i}}(X,0)\varphi_{x_{i}}(X)\right),
13\displaystyle\mathcal{L}_{13} =|Ω|𝔼XU(Ω)(ut(X,0))2,\displaystyle=|\Omega|\mathbb{E}_{X\sim U(\Omega)}\left(u_{t}(X,0)\right)^{2},
14\displaystyle\mathcal{L}_{14} =|Ω|𝔼XU(Ω)(ψ(X))2,\displaystyle=|\Omega|\mathbb{E}_{X\sim U(\Omega)}\left(\psi(X)\right)^{2},
15\displaystyle\mathcal{L}_{15} =2|Ω|𝔼XU(Ω)(ut(X,0)ψ(X)),\displaystyle=-2|\Omega|\mathbb{E}_{X\sim U(\Omega)}\left(u_{t}(X,0)\psi(X)\right),
16\displaystyle\mathcal{L}_{16} =|Ω||T|𝔼YU(Ω),TU([0,T])(u(Y,T))2,\displaystyle=|\partial\Omega||T|\mathbb{E}_{Y\sim U(\partial\Omega),T\sim U([0,T])}\left(u(Y,T)\right)^{2},
17\displaystyle\mathcal{L}_{17} =|Ω||T|𝔼YU(Ω),TU([0,T])(g(Y,T))2,\displaystyle=|\partial\Omega||T|\mathbb{E}_{Y\sim U(\partial\Omega),T\sim U([0,T])}\left(g(Y,T)\right)^{2},
18\displaystyle\mathcal{L}_{18} =2|Ω||T|𝔼YU(Ω),TU([0,T])(u(Y,T)g(Y,T)),\displaystyle=-2|\partial\Omega||T|\mathbb{E}_{Y\sim U(\partial\Omega),T\sim U([0,T])}\left(u(Y,T)g(Y,T)\right),
19\displaystyle\mathcal{L}_{19} =|Ω||T|𝔼YU(Ω),TU([0,T])(ut(Y,T))2,\displaystyle=|\partial\Omega||T|\mathbb{E}_{Y\sim U(\partial\Omega),T\sim U([0,T])}\left(u_{t}(Y,T)\right)^{2},
20\displaystyle\mathcal{L}_{20} =|Ω||T|𝔼YU(Ω),TU([0,T])(gt(Y,T))2,\displaystyle=|\partial\Omega||T|\mathbb{E}_{Y\sim U(\partial\Omega),T\sim U([0,T])}\left(g_{t}(Y,T)\right)^{2},
21\displaystyle\mathcal{L}_{21} =2|Ω||T|𝔼YU(Ω),TU([0,T])(ut(Y,T)gt(Y,T)),\displaystyle=-2|\partial\Omega||T|\mathbb{E}_{Y\sim U(\partial\Omega),T\sim U([0,T])}\left(u_{t}(Y,T)g_{t}(Y,T)\right),
22\displaystyle\mathcal{L}_{22} =|Ω||T|𝔼YU(Ω),TU([0,T])i=1d(uxi(Y,T)2),\displaystyle=|\partial\Omega||T|\mathbb{E}_{Y\sim U(\partial\Omega),T\sim U([0,T])}\sum_{i=1}^{d}\left(u_{x_{i}}(Y,T)^{2}\right),
23\displaystyle\mathcal{L}_{23} =|Ω||T|𝔼YU(Ω),TU([0,T])i=1d(gxi(Y,T)2),\displaystyle=|\partial\Omega||T|\mathbb{E}_{Y\sim U(\partial\Omega),T\sim U([0,T])}\sum_{i=1}^{d}\left(g_{x_{i}}(Y,T)^{2}\right),
\mathcal{L}_{24} =-2|\partial\Omega||T|\mathbb{E}_{Y\sim U(\partial\Omega),T\sim U([0,T])}\sum_{i=1}^{d}\left(u_{x_{i}}(Y,T)g_{x_{i}}(Y,T)\right),

and ^j(u)\hat{\mathcal{L}}_{j}(u) is the empirical version of j(u)\mathcal{L}_{j}(u). The following lemma states that each of these 24 terms can be controlled by the corresponding Rademacher complexity.

Lemma 5

Let {Xn}n=1N,{Ym}m=1M,{Tk}k=1K{\{X_{n}\}_{n=1}^{N},\{Y_{m}\}_{m=1}^{M},\{T_{k}\}_{k=1}^{K}} be i.i.d samples from U(Ω),U(Ω),U([0,T])U(\Omega),U(\partial\Omega),U([0,T]), then we have

𝔼{Xn}n=1N,{Ym}m=1M,{Tk}k=1Ksupu𝒫|j(u)^j(u)|C(d,)𝒰,N(j)\displaystyle\mathbb{E}_{\{X_{n}\}_{n=1}^{N},\{Y_{m}\}_{m=1}^{M},\{T_{k}\}_{k=1}^{K}}\sup_{u\in\mathcal{P}}\bigg{|}\mathcal{L}_{j}(u)-\hat{\mathcal{L}}_{j}(u)\bigg{|}\leq C(d,\mathcal{B})\Re_{\mathcal{U},N}(\mathcal{F}_{j})

for j=1,2,,24,j=1,2,\cdots,24, where:

1\displaystyle\mathcal{F}_{1} ={±f:ΩT|u𝒫s.t.f(x,t)=utt(x,t)2},\displaystyle=\{\pm f:\Omega_{T}\rightarrow\mathbb{R}|\quad\exists u\in\mathcal{P}\quad s.t.\quad f(x,t)=u_{tt}(x,t)^{2}\},
2\displaystyle\mathcal{F}_{2} ={±f:ΩT|u𝒫1ijds.t.f(x,t)=uxixi(x,t)uxjxj(x,t)},\displaystyle=\{\pm f:\Omega_{T}\rightarrow\mathbb{R}|\quad\exists u\in\mathcal{P}\quad 1\leq i\leq j\leq d\quad s.t.\quad f(x,t)=u_{x_{i}x_{i}}(x,t)u_{x_{j}x_{j}}(x,t)\},
4\displaystyle\mathcal{F}_{4} ={±f:ΩT|u𝒫1ids.t.f(x,t)=utt(x,t)uxixi(x,t)},\displaystyle=\{\pm f:\Omega_{T}\rightarrow\mathbb{R}|\quad\exists u\in\mathcal{P}\quad 1\leq i\leq d\quad s.t.\quad f(x,t)=u_{tt}(x,t)u_{x_{i}x_{i}}(x,t)\},
5\displaystyle\mathcal{F}_{5} ={±f:ΩT|u𝒫s.t.f(x,t)=utt(x,t)},\displaystyle=\{\pm f:\Omega_{T}\rightarrow\mathbb{R}|\quad\exists u\in\mathcal{P}\quad s.t.\quad f(x,t)=u_{tt}(x,t)\},
6\displaystyle\mathcal{F}_{6} ={±f:ΩT|u𝒫1ids.t.f(x,t)=uxixi(x,t)},\displaystyle=\{\pm f:\Omega_{T}\rightarrow\mathbb{R}|\quad\exists u\in\mathcal{P}\quad 1\leq i\leq d\quad s.t.\quad f(x,t)=u_{x_{i}x_{i}}(x,t)\},
7\displaystyle\mathcal{F}_{7} ={±f:Ω|u𝒫s.t.f(x)=u(x,0)2},\displaystyle=\{\pm f:\Omega\rightarrow\mathbb{R}|\quad\exists u\in\mathcal{P}\quad s.t.\quad f(x)=u(x,0)^{2}\},
9\displaystyle\mathcal{F}_{9} ={±f:Ω|u𝒫s.t.f(x)=u(x,0)},\displaystyle=\{\pm f:\Omega\rightarrow\mathbb{R}|\quad\exists u\in\mathcal{P}\quad s.t.\quad f(x)=u(x,0)\},
10\displaystyle\mathcal{F}_{10} ={±f:Ω|u𝒫1ijds.t.f(x)=uxi(x,0)uxj(x,0)},\displaystyle=\{\pm f:\Omega\rightarrow\mathbb{R}|\quad\exists u\in\mathcal{P}\quad 1\leq i\leq j\leq d\quad s.t.\quad f(x)=u_{x_{i}}(x,0)u_{x_{j}}(x,0)\},
12\displaystyle\mathcal{F}_{12} ={±f:Ω|u𝒫1ids.t.f(x,t)=uxi(x,0)},\displaystyle=\{\pm f:\Omega\rightarrow\mathbb{R}|\quad\exists u\in\mathcal{P}\quad 1\leq i\leq d\quad s.t.\quad f(x,t)=u_{x_{i}}(x,0)\},
13\displaystyle\mathcal{F}_{13} ={±f:Ω|u𝒫s.t.f(x)=ut(x,0)2},\displaystyle=\{\pm f:\Omega\rightarrow\mathbb{R}|\quad\exists u\in\mathcal{P}\quad s.t.\quad f(x)=u_{t}(x,0)^{2}\},
15\displaystyle\mathcal{F}_{15} ={±f:Ω|u𝒫s.t.f(x)=ut(x,0)},\displaystyle=\{\pm f:\Omega\rightarrow\mathbb{R}|\quad\exists u\in\mathcal{P}\quad s.t.\quad f(x)=u_{t}(x,0)\},
16\displaystyle\mathcal{F}_{16} ={±f:ΩT|u𝒫s.t.f(x,t)=u(x,t)2|Ω},\displaystyle=\{\pm f:\partial\Omega_{T}\rightarrow\mathbb{R}|\quad\exists u\in\mathcal{P}\quad s.t.\quad f(x,t)=u(x,t)^{2}|_{\partial\Omega}\},
18\displaystyle\mathcal{F}_{18} ={±f:ΩT|u𝒫s.t.f(x,t)=u(x,t)|Ω},\displaystyle=\{\pm f:\partial\Omega_{T}\rightarrow\mathbb{R}|\quad\exists u\in\mathcal{P}\quad s.t.\quad f(x,t)=u(x,t)|_{\partial\Omega}\},
19\displaystyle\mathcal{F}_{19} ={±f:ΩT|u𝒫s.t.f(x,t)=ut(x,t)2|Ω},\displaystyle=\{\pm f:\partial\Omega_{T}\rightarrow\mathbb{R}|\quad\exists u\in\mathcal{P}\quad s.t.\quad f(x,t)=u_{t}(x,t)^{2}|_{\partial\Omega}\},
21\displaystyle\mathcal{F}_{21} ={±f:ΩT|u𝒫s.t.f(x,t)=ut(x,t)|Ω},\displaystyle=\{\pm f:\partial\Omega_{T}\rightarrow\mathbb{R}|\quad\exists u\in\mathcal{P}\quad s.t.\quad f(x,t)=u_{t}(x,t)|_{\partial\Omega}\},
22\displaystyle\mathcal{F}_{22} ={±f:ΩT|u𝒫1ijds.t.f(x,t)=uxi(x,t)uxj(x,t)|Ω},\displaystyle=\{\pm f:\partial\Omega_{T}\rightarrow\mathbb{R}|\quad\exists u\in\mathcal{P}\quad 1\leq i\leq j\leq d\quad s.t.\quad f(x,t)=u_{x_{i}}(x,t)u_{x_{j}}(x,t)|_{\partial\Omega}\},
24\displaystyle\mathcal{F}_{24} ={±f:ΩT|u𝒫1ids.t.f(x,t)=uxi(x,t)|Ω}.\displaystyle=\{\pm f:\partial\Omega_{T}\rightarrow\mathbb{R}|\quad\exists u\in\mathcal{P}\quad 1\leq i\leq d\quad s.t.\quad f(x,t)=u_{x_{i}}(x,t)|_{\partial\Omega}\}.
Proof

The proof is based on the symmetrization technique, see lemma 4.3 in jiao2022rate for more details.

Lemma 6

Let Φ={ReLU,ReLU2,ReLU3}\Phi=\{ReLU,ReLU^{2},ReLU^{3}\}. There holds

1𝒩1:=𝒩(𝒟+5,(𝒟+2)(𝒟+4)𝒲,{C(ΩT¯),2},Φ),\displaystyle\mathcal{F}_{1}\subset\mathcal{N}_{1}:=\mathcal{N}(\mathcal{D}+5,(\mathcal{D}+2)(\mathcal{D}+4)\mathcal{W},\{\|\cdot\|_{C(\overline{\Omega_{T}})},\mathcal{B}^{2}\},\Phi),
2𝒩2:=𝒩(𝒟+5,2(𝒟+2)(𝒟+4)𝒲,{C(ΩT¯),2},Φ),\displaystyle\mathcal{F}_{2}\subset\mathcal{N}_{2}:=\mathcal{N}(\mathcal{D}+5,2(\mathcal{D}+2)(\mathcal{D}+4)\mathcal{W},\{\|\cdot\|_{C(\overline{\Omega_{T}})},\mathcal{B}^{2}\},\Phi),
4𝒩4:=𝒩(𝒟+5,2(𝒟+2)(𝒟+4)𝒲,{C(ΩT¯),2},Φ),\displaystyle\mathcal{F}_{4}\subset\mathcal{N}_{4}:=\mathcal{N}(\mathcal{D}+5,2(\mathcal{D}+2)(\mathcal{D}+4)\mathcal{W},\{\|\cdot\|_{C(\overline{\Omega_{T}})},\mathcal{B}^{2}\},\Phi),
5𝒩5:=𝒩(𝒟+4,(𝒟+2)(𝒟+4)𝒲,{C(ΩT¯),},Φ),\displaystyle\mathcal{F}_{5}\subset\mathcal{N}_{5}:=\mathcal{N}(\mathcal{D}+4,(\mathcal{D}+2)(\mathcal{D}+4)\mathcal{W},\{\|\cdot\|_{C(\overline{\Omega_{T}})},\mathcal{B}\},\Phi),
6𝒩6:=𝒩(𝒟+4,(𝒟+2)(𝒟+4)𝒲,{C(ΩT¯),},Φ),\displaystyle\mathcal{F}_{6}\subset\mathcal{N}_{6}:=\mathcal{N}(\mathcal{D}+4,(\mathcal{D}+2)(\mathcal{D}+4)\mathcal{W},\{\|\cdot\|_{C(\overline{\Omega_{T}})},\mathcal{B}\},\Phi),
\mathcal{F}_{7}\subset\mathcal{N}_{7}:=\mathcal{N}(\mathcal{D}+1,\mathcal{W},\{\|\cdot\|_{C(\overline{\Omega_{T}})},\mathcal{B}^{2}\},\Phi),
9𝒩9:=𝒩(𝒟,𝒲,{C(ΩT¯),},Φ),\displaystyle\mathcal{F}_{9}\subset\mathcal{N}_{9}:=\mathcal{N}(\mathcal{D},\mathcal{W},\{\|\cdot\|_{C(\overline{\Omega_{T}})},\mathcal{B}\},\Phi),
10𝒩10:=𝒩(𝒟+2,2(𝒟+2)𝒲,{C(ΩT¯),2},Φ),\displaystyle\mathcal{F}_{10}\subset\mathcal{N}_{10}:=\mathcal{N}(\mathcal{D}+2,2(\mathcal{D}+2)\mathcal{W},\{\|\cdot\|_{C(\overline{\Omega_{T}})},\mathcal{B}^{2}\},\Phi),
12𝒩12:=𝒩(𝒟+2,(𝒟+2)𝒲,{C(ΩT¯),},Φ),\displaystyle\mathcal{F}_{12}\subset\mathcal{N}_{12}:=\mathcal{N}(\mathcal{D}+2,(\mathcal{D}+2)\mathcal{W},\{\|\cdot\|_{C(\overline{\Omega_{T}})},\mathcal{B}\},\Phi),
13𝒩13:=𝒩(𝒟+3,(𝒟+2)𝒲,{C(ΩT¯),2},Φ),\displaystyle\mathcal{F}_{13}\subset\mathcal{N}_{13}:=\mathcal{N}(\mathcal{D}+3,(\mathcal{D}+2)\mathcal{W},\{\|\cdot\|_{C(\overline{\Omega_{T}})},\mathcal{B}^{2}\},\Phi),
15𝒩15:=𝒩(𝒟+2,(𝒟+2)𝒲,{C(ΩT¯),},Φ),\displaystyle\mathcal{F}_{15}\subset\mathcal{N}_{15}:=\mathcal{N}(\mathcal{D}+2,(\mathcal{D}+2)\mathcal{W},\{\|\cdot\|_{C(\overline{\Omega_{T}})},\mathcal{B}\},\Phi),
16𝒩16:=𝒩(𝒟+1,𝒲,{C(ΩT¯),2},Φ),\displaystyle\mathcal{F}_{16}\subset\mathcal{N}_{16}:=\mathcal{N}(\mathcal{D}+1,\mathcal{W},\{\|\cdot\|_{C(\overline{\Omega_{T}})},\mathcal{B}^{2}\},\Phi),
18𝒩18:=𝒩(𝒟,𝒲,{C(ΩT¯),},Φ),\displaystyle\mathcal{F}_{18}\subset\mathcal{N}_{18}:=\mathcal{N}(\mathcal{D},\mathcal{W},\{\|\cdot\|_{C(\overline{\Omega_{T}})},\mathcal{B}\},\Phi),
19𝒩19:=𝒩(𝒟+3,(𝒟+2)𝒲,{C(ΩT¯),2},Φ),\displaystyle\mathcal{F}_{19}\subset\mathcal{N}_{19}:=\mathcal{N}(\mathcal{D}+3,(\mathcal{D}+2)\mathcal{W},\{\|\cdot\|_{C(\overline{\Omega_{T}})},\mathcal{B}^{2}\},\Phi),
21𝒩21:=𝒩(𝒟+2,(𝒟+2)𝒲,{C(ΩT¯),},Φ),\displaystyle\mathcal{F}_{21}\subset\mathcal{N}_{21}:=\mathcal{N}(\mathcal{D}+2,(\mathcal{D}+2)\mathcal{W},\{\|\cdot\|_{C(\overline{\Omega_{T}})},\mathcal{B}\},\Phi),
22𝒩22:=𝒩(𝒟+3,2(𝒟+2)𝒲,{C(ΩT¯),2},Φ),\displaystyle\mathcal{F}_{22}\subset\mathcal{N}_{22}:=\mathcal{N}(\mathcal{D}+3,2(\mathcal{D}+2)\mathcal{W},\{\|\cdot\|_{C(\overline{\Omega_{T}})},\mathcal{B}^{2}\},\Phi),
24𝒩24:=𝒩(𝒟+2,(𝒟+2)𝒲,{C(ΩT¯),},Φ).\displaystyle\mathcal{F}_{24}\subset\mathcal{N}_{24}:=\mathcal{N}(\mathcal{D}+2,(\mathcal{D}+2)\mathcal{W},\{\|\cdot\|_{C(\overline{\Omega_{T}})},\mathcal{B}\},\Phi).
Proof

The proof is an application of Proposition 4.2 in jiao2022rate. Take $\mathcal{F}_{1}$ as an example: since $u\in\mathcal{P}$, we have $u_{t}\in\mathcal{N}(\mathcal{D}+2,(\mathcal{D}+2)\mathcal{W},\{\|\cdot\|_{C^{1}(\overline{\Omega_{T}})},\mathcal{B}\},\Phi)$ and $u_{tt}\in\mathcal{N}(\mathcal{D}+4,(\mathcal{D}+2)(\mathcal{D}+4)\mathcal{W},\{\|\cdot\|_{C(\overline{\Omega_{T}})},\mathcal{B}\},\Phi)$. Noticing that the square operation can be implemented as $x^{2}=ReLU^{2}(x)+ReLU^{2}(-x)$, we obtain $u_{tt}^{2}\in\mathcal{N}(\mathcal{D}+5,(\mathcal{D}+2)(\mathcal{D}+4)\mathcal{W},\{\|\cdot\|_{C(\overline{\Omega_{T}})},\mathcal{B}^{2}\},\Phi)$.
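The identity $x^{2}=ReLU^{2}(x)+ReLU^{2}(-x)$ is the only nontrivial ingredient of this construction, and it is easy to check numerically. The following minimal NumPy sketch (illustrative only, not part of the SPINNs implementation) verifies it on a small grid:

import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def square_via_relu2(x):
    # square realized by one extra layer of width 2 with ReLU^2 activation
    return relu(x) ** 2 + relu(-x) ** 2

x = np.linspace(-3.0, 3.0, 7)
assert np.allclose(square_via_relu2(x), x ** 2)
print(square_via_relu2(x))  # [9. 4. 1. 0. 1. 4. 9.]

A product $ab$ can be realized in the same spirit through the polarization identity $ab=\tfrac{1}{2}\big((a+b)^{2}-a^{2}-b^{2}\big)$, which is consistent with the doubled widths of $\mathcal{N}_{2}$, $\mathcal{N}_{10}$ and $\mathcal{N}_{22}$ above.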

Lemma 7 (Proposition 4.3 in jiao2022rate)

For any $\mathcal{D},\mathcal{W}\in\mathbb{N}$,

Pdim(\mathcal{N}(\mathcal{D},\mathcal{W},\{ReLU,ReLU^{2},ReLU^{3}\}))=\mathcal{O}(\mathcal{D}^{2}\mathcal{W}^{2}(\mathcal{D}+\log\mathcal{W})).
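For concreteness, the scaling on the right-hand side can be evaluated directly; the short helper below is an illustrative sketch only, since the absolute constant hidden in $\mathcal{O}(\cdot)$ is not specified by Lemma 7:

import math

def pdim_scaling(depth: int, width: int) -> float:
    # D^2 W^2 (D + log W), i.e. the bound of Lemma 7 up to an unspecified constant
    return depth ** 2 * width ** 2 * (depth + math.log(width))

for D, W in [(2, 50), (4, 100), (8, 200)]:
    print(D, W, f"{pdim_scaling(D, W):.3e}")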

Now we are ready to prove Theorem 3.3 on the statistical error.

Proof (of Theorem 3.3)

According to Lemma 3 and Lemma 6,

\bullet  For $i=1,2,4,5,6$, when the sample number $n=NK>Pdim(\mathcal{F}_{i})$, we have

\Re_{P(\mathcal{X}),NK}(\mathcal{F}_{i}) \leq\inf_{0<\delta<\mathcal{B}_{i}}\left(4\delta+\frac{12}{\sqrt{NK}}\int_{\delta}^{\mathcal{B}_{i}}\sqrt{\log(2C_{\infty}(\varepsilon,\mathcal{F}_{i},NK))}\,d\varepsilon\right)
\leq\inf_{0<\delta<\mathcal{B}_{i}}\left(4\delta+\frac{12}{\sqrt{NK}}\int_{\delta}^{\mathcal{B}_{i}}\sqrt{\log\left(2\left(\frac{eNK\mathcal{B}_{i}}{\varepsilon\cdot Pdim(\mathcal{F}_{i})}\right)^{Pdim(\mathcal{F}_{i})}\right)}\,d\varepsilon\right)
\leq\inf_{0<\delta<\mathcal{B}_{i}}\left(4\delta+\frac{12\mathcal{B}_{i}}{\sqrt{NK}}+\frac{12}{\sqrt{NK}}\int_{\delta}^{\mathcal{B}_{i}}\sqrt{Pdim(\mathcal{F}_{i})\log\left(\frac{eNK\mathcal{B}_{i}}{\varepsilon\cdot Pdim(\mathcal{F}_{i})}\right)}\,d\varepsilon\right) (27)

Let $t=\sqrt{\log\left(\frac{eNK\mathcal{B}_{i}}{\varepsilon\cdot Pdim(\mathcal{F}_{i})}\right)}$, then $\varepsilon=\frac{eNK\mathcal{B}_{i}}{Pdim(\mathcal{F}_{i})}e^{-t^{2}}$. Denoting

t_{1}=\sqrt{\log\left(\frac{eNK\mathcal{B}_{i}}{\mathcal{B}_{i}\cdot Pdim(\mathcal{F}_{i})}\right)},\qquad t_{2}=\sqrt{\log\left(\frac{eNK\mathcal{B}_{i}}{\delta\cdot Pdim(\mathcal{F}_{i})}\right)},

we have:

\int_{\delta}^{\mathcal{B}_{i}}\sqrt{\log\left(\frac{eNK\mathcal{B}_{i}}{\varepsilon\cdot Pdim(\mathcal{F}_{i})}\right)}\,d\varepsilon
=\frac{2eNK\mathcal{B}_{i}}{Pdim(\mathcal{F}_{i})}\int_{t_{1}}^{t_{2}}t^{2}e^{-t^{2}}\,dt
=\frac{2eNK\mathcal{B}_{i}}{Pdim(\mathcal{F}_{i})}\int_{t_{1}}^{t_{2}}t\left(\frac{-e^{-t^{2}}}{2}\right)^{\prime}dt
=\frac{eNK\mathcal{B}_{i}}{Pdim(\mathcal{F}_{i})}\left[t_{1}e^{-t_{1}^{2}}-t_{2}e^{-t_{2}^{2}}+\int_{t_{1}}^{t_{2}}e^{-t^{2}}\,dt\right]
\leq\frac{eNK\mathcal{B}_{i}}{Pdim(\mathcal{F}_{i})}\left[t_{1}e^{-t_{1}^{2}}-t_{2}e^{-t_{2}^{2}}+(t_{2}-t_{1})e^{-t_{1}^{2}}\right]
\leq\frac{eNK\mathcal{B}_{i}}{Pdim(\mathcal{F}_{i})}\,t_{2}e^{-t_{1}^{2}}
\leq\mathcal{B}_{i}\sqrt{\log\left(\frac{eNK\mathcal{B}_{i}}{\delta\cdot Pdim(\mathcal{F}_{i})}\right)}, (28)
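Inequality (28) can also be checked numerically. The sketch below is purely illustrative: $\mathcal{B}_{i}$, $Pdim(\mathcal{F}_{i})$ and $NK$ are arbitrary placeholder values, and the comparison is done with scipy:

import math
from scipy.integrate import quad

B, pdim, NK = 2.0, 10.0, 1000.0           # arbitrary illustrative values
delta = B * math.sqrt(pdim / NK)          # the choice of delta made right after (28)

integrand = lambda eps: math.sqrt(math.log(math.e * NK * B / (eps * pdim)))
lhs, _ = quad(integrand, delta, B)                               # left-hand side of (28)
rhs = B * math.sqrt(math.log(math.e * NK * B / (delta * pdim)))  # right-hand side of (28)
print(lhs, rhs, lhs <= rhs)               # expect lhs <= rhs, i.e. True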

Substituting (28) into (27) and choosing $\delta=\mathcal{B}_{i}\left(\frac{Pdim(\mathcal{F}_{i})}{NK}\right)^{\frac{1}{2}}\leq\mathcal{B}_{i}$, we have:

\Re_{P(\mathcal{X}),NK}(\mathcal{F}_{i}) \leq\inf_{0<\delta<\mathcal{B}_{i}}\left(4\delta+\frac{12\mathcal{B}_{i}}{\sqrt{NK}}+\frac{12}{\sqrt{NK}}\int_{\delta}^{\mathcal{B}_{i}}\sqrt{Pdim(\mathcal{F}_{i})\log\left(\frac{eNK\mathcal{B}_{i}}{\varepsilon\cdot Pdim(\mathcal{F}_{i})}\right)}\,d\varepsilon\right)
\leq\inf_{0<\delta<\mathcal{B}_{i}}\left(4\delta+\frac{12\mathcal{B}_{i}}{\sqrt{NK}}+\frac{12\mathcal{B}_{i}\sqrt{Pdim(\mathcal{F}_{i})}}{\sqrt{NK}}\sqrt{\log\left(\frac{eNK\mathcal{B}_{i}}{\delta\cdot Pdim(\mathcal{F}_{i})}\right)}\right)
\leq 28\sqrt{\frac{3}{2}}\mathcal{B}_{i}\left(\frac{Pdim(\mathcal{F}_{i})}{NK}\right)^{\frac{1}{2}}\sqrt{\log\left(\frac{eNK}{Pdim(\mathcal{F}_{i})}\right)}
\leq 28\sqrt{\frac{3}{2}}\mathcal{B}_{i}\left(\frac{Pdim(\mathcal{N}_{i})}{NK}\right)^{\frac{1}{2}}\sqrt{\log\left(\frac{eNK}{Pdim(\mathcal{N}_{i})}\right)}
\leq 28\sqrt{\frac{3}{2}}\max\{\mathcal{B},\mathcal{B}^{2}\}\left(\frac{\mathcal{H}_{1}}{NK}\right)^{\frac{1}{2}}\sqrt{\log\left(\frac{eNK}{\mathcal{H}_{1}}\right)} (29)

where

\mathcal{H}_{1}=C_{1}(\mathcal{D}+2)^{2}(\mathcal{D}+4)^{2}(\mathcal{D}+5)^{2}\mathcal{W}^{2}[(\mathcal{D}+5)+\log(2(\mathcal{D}+2)(\mathcal{D}+4)\mathcal{W})].

The last step above is due to Lemma 7.
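To see what (29) looks like quantitatively, the following sketch evaluates $\mathcal{H}_{1}$ and the resulting bound for illustrative values of $\mathcal{D}$, $\mathcal{W}$, $NK$ and $\mathcal{B}$ (with $C_{1}$ set to $1$, since the absolute constants are not tracked here):

import math

def H1(D, W, C1=1.0):
    # H_1 as defined above, with the unspecified constant C_1 taken to be 1
    return (C1 * (D + 2) ** 2 * (D + 4) ** 2 * (D + 5) ** 2 * W ** 2
            * ((D + 5) + math.log(2 * (D + 2) * (D + 4) * W)))

def rademacher_bound(D, W, NK, B=1.0):
    # right-hand side of (29); only meaningful when NK is large enough that e*NK/H_1 > 1
    h1 = H1(D, W)
    return (28 * math.sqrt(1.5) * max(B, B ** 2)
            * math.sqrt(h1 / NK) * math.sqrt(math.log(math.e * NK / h1)))

print(f"{H1(2, 10):.3e}")               # roughly 3.7e7 for D = 2, W = 10
print(rademacher_bound(2, 10, 1e9))     # decays like (H_1/NK)^{1/2} up to a log factor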

\bullet  For $i=7,9,10,12,13,15$, when the sample number $n=N>Pdim(\mathcal{F}_{i})$, we can similarly prove that

\Re_{\mathcal{U}(\Omega),N}(\mathcal{F}_{i})\leq 28\sqrt{\frac{3}{2}}\max\{\mathcal{B},\mathcal{B}^{2}\}\left(\frac{\mathcal{H}_{2}}{N}\right)^{\frac{1}{2}}\sqrt{\log\left(\frac{eN}{\mathcal{H}_{2}}\right)} (30)

where

\mathcal{H}_{2}=C_{2}(\mathcal{D}+2)^{2}(\mathcal{D}+3)^{2}\mathcal{W}^{2}[(\mathcal{D}+3)+\log(2(\mathcal{D}+2)\mathcal{W})].

\bullet  For $i=16,18,19,21,22,24$, when the sample number $n=MK>Pdim(\mathcal{F}_{i})$, we have

\Re_{\mathcal{U}(\partial\Omega_{T}),MK}(\mathcal{F}_{i})\leq 28\sqrt{\frac{3}{2}}\max\{\mathcal{B},\mathcal{B}^{2}\}\left(\frac{\mathcal{H}_{3}}{MK}\right)^{\frac{1}{2}}\sqrt{\log\left(\frac{eMK}{\mathcal{H}_{3}}\right)} (31)

where

\mathcal{H}_{3}=C_{3}(\mathcal{D}+2)^{2}(\mathcal{D}+3)^{2}\mathcal{W}^{2}[(\mathcal{D}+3)+\log(2(\mathcal{D}+2)\mathcal{W})].

\bullet  For $i=3$, we have

\mathbb{E}_{\{X_{n}\}_{n=1}^{N},\{T_{k}\}_{k=1}^{K}}\bigg|\mathcal{L}_{3}-\hat{\mathcal{L}}_{3}\bigg|
=|\Omega||T|\,\mathbb{E}_{\{X_{n}\}_{n=1}^{N},\{T_{k}\}_{k=1}^{K}}\bigg|\mathbb{E}_{X\sim U(\Omega),T\sim U([0,T])}f(X,T)^{2}-\frac{1}{NK}\sum_{n=1}^{N}\sum_{k=1}^{K}f(X_{n},T_{k})^{2}\bigg|
\leq|\Omega||T|\sqrt{\mathbb{E}_{\{X_{n}\}_{n=1}^{N},\{T_{k}\}_{k=1}^{K}}\bigg|\mathbb{E}_{X\sim U(\Omega),T\sim U([0,T])}f(X,T)^{2}-\frac{1}{NK}\sum_{n=1}^{N}\sum_{k=1}^{K}f(X_{n},T_{k})^{2}\bigg|^{2}}
=\frac{|\Omega||T|}{NK}\sqrt{\mathbb{E}_{\{X_{n}\}_{n=1}^{N},\{T_{k}\}_{k=1}^{K}}\sum_{n=1}^{N}\sum_{k=1}^{K}\bigg|\mathbb{E}_{X\sim U(\Omega),T\sim U([0,T])}f(X,T)^{2}-f(X_{n},T_{k})^{2}\bigg|^{2}}
=\frac{|\Omega||T|}{NK}\sqrt{NK\cdot\mathbb{E}_{X_{1}\sim U(\Omega),T_{1}\sim U([0,T])}\bigg|\mathbb{E}_{X\sim U(\Omega),T\sim U([0,T])}f(X,T)^{2}-f(X_{1},T_{1})^{2}\bigg|^{2}}
=\frac{|\Omega||T|}{NK}\sqrt{NK}\,\sigma(f(X,T)^{2})
=|\Omega||T|\frac{\sigma(f(X,T)^{2})}{\sqrt{NK}},

where $\sigma(f(X,T)^{2})$ denotes the standard deviation of $f(X,T)^{2}$. Using the boundedness of $f$, we can further obtain

\sigma^{2}(f(X,T)^{2}) =\mathbb{E}\left((f(X,T)^{2})^{2}\right)-\left(\mathbb{E}(f(X,T)^{2})\right)^{2}
\leq\frac{1}{|\Omega||T|}\left(\int_{\Omega_{T}}(f(X,T)^{2})^{2}\,dX\,dT\right)
\leq\frac{1}{|\Omega||T|}\left(\int_{\Omega_{T}}\kappa^{4}\,dX\,dT\right)
\leq\frac{1}{|\Omega||T|}\cdot|\Omega||T|\mathcal{B}
=\mathcal{B},

then we have,

\mathbb{E}_{\{X_{n}\}_{n=1}^{N},\{T_{k}\}_{k=1}^{K}}\bigg|\mathcal{L}_{3}-\hat{\mathcal{L}}_{3}\bigg|\leq|\Omega||T|\sqrt{\frac{\mathcal{B}}{NK}}.
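Since the $i=3$ term does not depend on $u$, its bound is a plain Monte Carlo error, and the $1/\sqrt{NK}$ decay can be observed directly. The sketch below is illustrative only, using a toy bounded integrand on $\Omega\times[0,T]=[0,1]\times[0,1]$ in place of the source term:

import numpy as np

rng = np.random.default_rng(0)
f = lambda x, t: np.sin(2 * np.pi * x) * np.cos(np.pi * t)   # toy bounded integrand, |f| <= 1
exact = 0.25   # E[f(X,T)^2] = E[sin^2(2*pi*X)] * E[cos^2(pi*T)] = 1/2 * 1/2 for X,T ~ U(0,1)

for NK in [10 ** 2, 10 ** 3, 10 ** 4, 10 ** 5]:
    errs = []
    for _ in range(200):                                     # average over 200 sample draws
        x, t = rng.random(NK), rng.random(NK)
        errs.append(abs(np.mean(f(x, t) ** 2) - exact))
    print(NK, np.mean(errs), np.mean(errs) * np.sqrt(NK))    # last column stays roughly constant

The last column being roughly constant is the empirical counterpart of the $|\Omega||T|\sqrt{\mathcal{B}/NK}$ bound.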

\bullet  Similarly, for $i=8,14,17,20$,

\mathbb{E}_{\{X_{n}\}_{n=1}^{N}}\bigg|\mathcal{L}_{8}-\hat{\mathcal{L}}_{8}\bigg|\leq|\Omega|\sqrt{\frac{\mathcal{B}}{N}},
\mathbb{E}_{\{X_{n}\}_{n=1}^{N}}\bigg|\mathcal{L}_{14}-\hat{\mathcal{L}}_{14}\bigg|\leq|\Omega|\sqrt{\frac{\mathcal{B}}{N}},
\mathbb{E}_{\{Y_{m}\}_{m=1}^{M},\{T_{k}\}_{k=1}^{K}}\bigg|\mathcal{L}_{17}-\hat{\mathcal{L}}_{17}\bigg|\leq|\partial\Omega||T|\sqrt{\frac{\mathcal{B}}{MK}},
\mathbb{E}_{\{Y_{m}\}_{m=1}^{M},\{T_{k}\}_{k=1}^{K}}\bigg|\mathcal{L}_{20}-\hat{\mathcal{L}}_{20}\bigg|\leq|\partial\Omega||T|\sqrt{\frac{\mathcal{B}}{MK}}.

\bullet  For $i=11$,

\sigma^{2}\Big(\sum_{i=1}^{d}\varphi_{x_{i}}(x)^{2}\Big) =\mathbb{E}\left(\Big(\sum_{i=1}^{d}\varphi_{x_{i}}(x)^{2}\Big)^{2}\right)-\left(\mathbb{E}\Big(\sum_{i=1}^{d}\varphi_{x_{i}}(x)^{2}\Big)\right)^{2}
\leq\frac{1}{|\Omega|}\left[\int_{\Omega}\Big(\sum_{i=1}^{d}\varphi_{x_{i}}(x)^{2}\Big)^{2}dx\right]
\leq\frac{1}{|\Omega|}\left[\int_{\Omega}(d\kappa^{2})^{2}dx\right]
\leq d^{2}\mathcal{B},

then we have,

\mathbb{E}_{\{X_{n}\}_{n=1}^{N}}\bigg|\mathcal{L}_{11}-\hat{\mathcal{L}}_{11}\bigg|\leq d|\Omega|\sqrt{\frac{\mathcal{B}}{N}}.

\bullet  Similarly, for $i=23$,

\mathbb{E}_{\{Y_{m}\}_{m=1}^{M},\{T_{k}\}_{k=1}^{K}}\bigg|\mathcal{L}_{23}-\hat{\mathcal{L}}_{23}\bigg|\leq d|\partial\Omega||T|\sqrt{\frac{\mathcal{B}}{MK}}.

Hence, we have,

\mathbb{E}_{\{X_{n}\}_{n=1}^{N},\{Y_{m}\}_{m=1}^{M},\{T_{k}\}_{k=1}^{K}}\sup_{u\in\mathcal{P}}|\mathcal{L}(u)-\hat{\mathcal{L}}(u)|
\leq\sum_{j=1}^{24}\mathbb{E}_{\{X,Y,T\}}\sup_{u\in\mathcal{P}}|\mathcal{L}_{j}(u)-\hat{\mathcal{L}}_{j}(u)|
\leq 28\sqrt{\frac{3}{2}}\max\{\mathcal{B},\mathcal{B}^{2}\}\left(5|\Omega||T|C_{1}\left(\frac{\mathcal{H}_{1}}{NK}\right)^{\frac{1}{2}}\sqrt{\log\left(\frac{eNK}{\mathcal{H}_{1}}\right)}+6|\Omega|C_{2}\left(\frac{\mathcal{H}_{2}}{N}\right)^{\frac{1}{2}}\sqrt{\log\left(\frac{eN}{\mathcal{H}_{2}}\right)}+6|\partial\Omega||T|C_{3}\left(\frac{\mathcal{H}_{3}}{MK}\right)^{\frac{1}{2}}\sqrt{\log\left(\frac{eMK}{\mathcal{H}_{3}}\right)}\right)
+\left(|\Omega||T|\sqrt{\frac{\mathcal{B}}{NK}}+(2+d)|\Omega|\sqrt{\frac{\mathcal{B}}{N}}+(2+d)|\partial\Omega||T|\sqrt{\frac{\mathcal{B}}{MK}}\right),

where $C_{1},C_{2},C_{3}$ are constants depending on the dimension $d$ and the bound $\mathcal{B}$. Hence, for any $\varepsilon>0$, if the numbers of samples satisfy:

\begin{cases}N&=C(d,|\Omega|,\mathcal{B})\mathcal{D}^{4}\mathcal{W}^{2}(\mathcal{D}+\log(\mathcal{W}))\left(\frac{1}{\varepsilon}\right)^{2+\delta},\\ K&=C(d,|T|,\mathcal{B})\mathcal{D}^{2}f_{K}(\mathcal{D},\mathcal{W})\left(\frac{1}{\varepsilon}\right)^{k_{1}},\\ M&=C(d,|\partial\Omega|,\mathcal{B})f_{M}(\mathcal{D},\mathcal{W})\left(\frac{1}{\varepsilon}\right)^{k_{2}},\end{cases}

where:

\begin{cases}k_{1}+k_{2}=2+\delta,\\ f_{K}(\mathcal{D},\mathcal{W})\cdot f_{M}(\mathcal{D},\mathcal{W})=\mathcal{D}^{2}\mathcal{W}^{2}(\mathcal{D}+\log(\mathcal{W})),\end{cases}

with the restrictions $f_{K}(\mathcal{D},\mathcal{W})\geq 1$, $f_{M}(\mathcal{D},\mathcal{W})\geq 1$, and $\delta>0$ arbitrarily small, then we have:

\mathbb{E}_{\{X_{n}\}_{n=1}^{N},\{Y_{m}\}_{m=1}^{M},\{T_{k}\}_{k=1}^{K}}\sup_{u\in\mathcal{P}}|\mathcal{L}(u)-\hat{\mathcal{L}}(u)|\leq\varepsilon.
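The admissible choices of $N$, $K$ and $M$ above are not unique. As a purely illustrative sketch (all constants $C(\cdot)$ set to $1$, and one admissible split $k_{1}=k_{2}=(2+\delta)/2$, $f_{K}=f_{M}=(\mathcal{D}^{2}\mathcal{W}^{2}(\mathcal{D}+\log\mathcal{W}))^{1/2}$ chosen for definiteness), the following helper reports sample sizes for a target accuracy $\varepsilon$:

import math

def sample_sizes(D, W, eps, delta=0.1):
    # one admissible choice of N, K, M; all constants C(d, |Omega|, B) etc. are set to 1
    g = D ** 2 * W ** 2 * (D + math.log(W))          # D^2 W^2 (D + log W)
    N = D ** 4 * W ** 2 * (D + math.log(W)) * (1 / eps) ** (2 + delta)
    fK = fM = math.sqrt(g)                           # f_K * f_M = g and f_K, f_M >= 1
    K = D ** 2 * fK * (1 / eps) ** ((2 + delta) / 2)
    M = fM * (1 / eps) ** ((2 + delta) / 2)
    return math.ceil(N), math.ceil(K), math.ceil(M)

print(sample_sizes(D=2, W=10, eps=0.1))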

References

  • (1) Susanne Brenner and Ridgway Scott. The mathematical theory of finite element methods, volume 15. Springer Science & Business Media, 2007.
  • (2) Philippe G Ciarlet. The finite element method for elliptic problems. SIAM, 2002.
  • (3) A. Quarteroni and A. Valli. Numerical Approximation of Partial Differential Equations, volume 23. Springer Science & Business Media, 2008.
  • (4) J.W. Thomas. Numerical Partial Differential Equations: Finite Difference Methods, volume 22. Springer Science & Business Media, 2013.
  • (5) Isaac E Lagaris, Aristidis Likas, and Dimitrios I Fotiadis. Artificial neural networks for solving ordinary and partial differential equations. IEEE transactions on neural networks, 9(5):987–1000, 1998.
  • (6) Cosmin Anitescu, Elena Atroshchenko, Naif Alajlan, and Timon Rabczuk. Artificial neural network methods for the solution of second order boundary value problems. Computers, Materials & Continua, 59(1):345–359, 2019.
  • (7) Julius Berner, Markus Dablander, and Philipp Grohs. Numerically solving parametric families of high-dimensional Kolmogorov partial differential equations via deep learning. In Advances in Neural Information Processing Systems, volume 33, pages 16615–16627. Curran Associates, Inc., 2020.
  • (8) Jiequn Han, Arnulf Jentzen, and E Weinan. Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences, 115(34):8505–8510, 2018.
  • (9) Lu Lu, Xuhui Meng, Zhiping Mao, and George Em Karniadakis. DeepXDE: A deep learning library for solving differential equations. SIAM Review, 63(1):208–228, 2021.
  • (10) Justin Sirignano and Konstantinos Spiliopoulos. DGM: A deep learning algorithm for solving partial differential equations. Journal of Computational Physics, 375:1339–1364, 2018.
  • (11) E. Weinan and Ting Yu. The deep ritz method: A deep learning-based numerical algorithm for solving variational problems. Communications in Mathematics and Statistics, 6(1):1–12, 2017.
  • (12) Maziar Raissi, Paris Perdikaris, and George E Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.
  • (13) Yaohua Zang, Gang Bao, Xiaojing Ye, and Haomin Zhou. Weak adversarial networks for high-dimensional partial differential equations. Journal of Computational Physics, 411:109409, 2020.
  • (14) Ameya D Jagtap, Ehsan Kharazmi, and George Em Karniadakis. Conservative physics-informed neural networks on discrete domains for conservation laws: Applications to forward and inverse problems. Computer Methods in Applied Mechanics and Engineering, 365:113028, 2020.
  • (15) G. Pang, M. D’Elia, M. Parks, and G.E. Karniadakis. nPINNs: Nonlocal physics-informed neural networks for a parametrized nonlocal universal Laplacian operator. Algorithms and applications. Journal of Computational Physics, 422:109760, 2020.
  • (16) Guofei Pang, Lu Lu, and George Em Karniadakis. fPINNs: Fractional physics-informed neural networks. SIAM Journal on Scientific Computing, 41(4):A2603–A2626, 2019.
  • (17) Hao Wang, Jian Li, Linfeng Wang, Lin Liang, Zhoumo Zeng, and Yang Liu. On acoustic fields of complex scatters based on physics-informed neural networks. Ultrasonics, 128:106872, 2023.
  • (18) Linfeng Wang, Hao Wang, Lin Liang, Jian Li, Zhoumo Zeng, and Yang Liu. Physics-informed neural networks for transcranial ultrasound wave propagation. Ultrasonics, 132:107026, 2023.
  • (19) Yi Ding, Su Chen, Xiaojun Li, Suyang Wang, Shaokai Luan, and Hao Sun. Self-adaptive physics-driven deep learning for seismic wave modeling in complex topography. Engineering Applications of Artificial Intelligence, 123:106425, 2023.
  • (20) Wojciech M Czarnecki, Simon Osindero, Max Jaderberg, Grzegorz Swirszcz, and Razvan Pascanu. Sobolev training for neural networks. Advances in neural information processing systems, 30, 2017.
  • (21) Hwijae Son, Jin Woo Jang, Woo Jin Han, and Hyung Ju Hwang. Sobolev training for physics informed neural networks. Communications in Mathematical Sciences, 21:1679–1705, 2023.
  • (22) Nikolaos N Vlassis and WaiChing Sun. Sobolev training of thermodynamic-informed neural networks for interpretable elasto-plasticity models with level set hardening. Computer Methods in Applied Mechanics and Engineering, 377:113695, 2021.
  • (23) Yuling Jiao, Xiliang Lu, Jerry Zhijian Yang, Cheng Yuan, and Pingwen Zhang. Improved analysis of PINNs: Alleviate the CoD for compositional solutions. Annals of Applied Mathematics, 39(3):239–263, 2023.
  • (24) Yuling Jiao, Di Li, Xiliang Lu, Jerry Zhijian Yang, and Cheng Yuan. GAS: A Gaussian mixture distribution-based adaptive sampling method for PINNs. arXiv preprint arXiv:2303.15849, 2023.
  • (25) Yuling Jiao, Yanming Lai, Dingwei Li, Xiliang Lu, Fengru Wang, Jerry Zhijian Yang, et al. A rate of convergence of physics informed neural networks for the linear second order elliptic PDEs. Communications in Computational Physics, 31(4):1272–1295, 2022.
  • (26) Aad W. van der Vaart and Jon A. Wellner. Weak convergence and empirical processes: With applications to statistics. Springer Science & Business Media, 1996.
  • (27) Martin Anthony and Peter L. Bartlett. Neural network learning: Theoretical foundations. Cambridge University Press, 1999.