Solving Partial Differential Equations with Point Source Based on Physics-Informed Neural Networks

Xiang Huang¹, Hongsheng Liu², Beiji Shi², Zidong Wang², Kang Yang²,
Yang Li², Bingya Weng², Min Wang², Haotian Chu², Jing Zhou², Fan Yu²,
Bei Hua¹, Lei Chen³, Bin Dong⁴
¹University of Science and Technology of China, ²Huawei Technologies Co. Ltd
³Hong Kong University of Science and Technology, ⁴Peking University
¹[email protected], [email protected]
²liuhongsheng4, shibeiji, wang1, yangkang22, [email protected]
²wengbingya, wangmin106, chuhaotian2, zhoujing3, [email protected]
³[email protected], ⁴[email protected] Corresponding author.

Abstract

In recent years, deep learning technology has been used to solve partial differential equations (PDEs), among which the physics-informed neural networks (PINNs) emerges to be a promising method for solving both forward and inverse PDE problems. PDEs with a point source that is expressed as a Dirac delta function in the governing equations are mathematical models of many physical processes. However, they cannot be solved directly by conventional PINNs method due to the singularity brought by the Dirac delta function. We propose a universal solution to tackle this problem with three novel techniques. Firstly the Dirac delta function is modeled as a continuous probability density function to eliminate the singularity; secondly a lower bound constrained uncertainty weighting algorithm is proposed to balance the PINNs losses between point source area and other areas; and thirdly a multi-scale deep neural network with periodic activation function is used to improve the accuracy and convergence speed of the PINNs method. We evaluate the proposed method with three representative PDEs, and the experimental results show that our method outperforms existing deep learning-based methods with respect to the accuracy, the efficiency and the versatility.

1 Introduction

Partial differential equations (PDEs) play a significant role in science and engineering disciplines, which are used to formulate physical problems such as sound propagation, fluid flow, electromagnetic field, etc. In the past decades, numerical solutions of PDEs have been developed and successfully applied to many real-world applications like aerodynamics and weather forecasting. These numerical methods, two representatives of which are finite difference method (FDM) and finite element method (FEM), firstly mesh the solution areas of the problem, and then obtain the approximate values of the exact solution on the discrete grid points. As the scale and complexity of the problem increase, the computational complexity of numerical methods becomes prohibitively high.

On the other hand, with the increase use of numerical methods in both industry and academia, a growth in the collection of simulation data has been observed in computational fluid dynamics, electromagnetic simulation and many other areas [1, 2], and data-driven deep learning approaches are proposed to alleviate the computational burden. For example, Thuerey et al. [3] used U-Net to predict the velocity and pressure fields around an airfoil. Khan et al. [4] trained a convolutional neural network to solve Maxwell’s equations for low-frequency electromagnetic devices. Lu et al. [5] proposed a deep operator network (DeepONet) that can learn any mathematical operator (mapping a function to another function) based on the universal approximation theorem of operators. For PDEs, neural operators directly learn the mapping from any functional parametric dependence to the solution. Li et al. [6] proposed the Fourier neural operator (FNO) that parameterizes the integral kernel directly in Fourier space, allowing for an expressive and efficient architecture. FNO was applied to solve Burgers’ equation, Darcy flow, and Navier-Stokes equation.

Data-driven deep learning methods have the advantage that once the neural network is trained, the prediction process is fast. However, the cost of data acquisition is prohibitive for complex physical, biological, and engineering systems, and the generalization ability of the model (from one experimental condition to another) is poor when there are not sufficient labeled data [7]. Therefore, physics-informed solutions to PDEs have been proposed in recent years, which rely on the universal approximation ability of deep neural network [8] and leverage the advantage of the automatic differentiation in the deep learning frameworks [9]. Compared with traditional PDE solvers, deep learning solvers [10, 11, 12, 13] not only can deal with parametrized simulations, but also address the inverse or data assimilation problems that the traditional solvers cannot deal with. Among these methods, the physics-informed neural networks (PINNs) [13] attracts increasing attentions from the machine learning and applied mathematics communities due to its simple and effective way of approximating the solutions of the PDEs using neural networks. The PINNs method preserves the physical information in the governing equations and adapts to the boundary conditions. The PINNs method has been applied to solve many well-known PDEs such as Navier-Stokes equations [14], Schroedinger’s equation [15], and Maxwell’s equations [16].

PDEs with a point source that is expressed mathematically as a Dirac delta function $\delta(\mathbf{x})$ have many applications in physical simulation. For instance, a point source can be a pulse exciting the electric field in electromagnetic simulation [17], or a sound source in an acoustic wave equation [18]. When we use the residuals of PDEs with the point source as the loss term (i.e., the direct application of the conventional PINNs method), the training will not converge due to the singularity brought by the Dirac delta function. Several methods have been proposed in the literature to solve this kind of problem under some specific conditions. For example, the Deep Ritz method [12] and the variational method in NVIDIA SimNet [19] need to transform the point source term into a computer computable form, which however is not applicable to all PDEs with a point source; PINNs-based methods proposed by Zhang et al. [16], Moseley et al. [18], and Bekele tet al. [20] require a significant amount of labeled data that may not be available in some cases.

In this paper, we propose a universal solution to the point source PDE problem based on PINNs method. The proposed method does not rely on any labeled data or variational forms, and outperforms existing deep learning-based methods with respect to the accuracy, the efficiency and the versatility. The main features of our method are in three aspects:

•

The Dirac delta function is modeled as a continuous probability density function to eliminate the singularity;
•

A lower bound constrained uncertainty weighting algorithm is proposed to balance the loss term between the point source area and other areas;
•

A neural network architecture is built with multi-scale DNN [21] and periodic activation functions to improve the accuracy and convergence speed of the PINNs method.

The rest of this paper is organized as follows: Sec.2 briefly introduces relevant preliminary knowledge and related works, Sec.3 describes our method under the PINNs framework, Sec.4 performs extensive experiments on the method with various benchmark problems, and Sec.5 concludes the paper.

2 Preliminaries

The general form of a time-dependent PDE is given by :

$\displaystyle\mathcal{N}(u(\mathbf{x},t))$	$\displaystyle=f(\mathbf{x},t),\quad(\mathbf{x},t)\in\Omega\times[0,T]$	(1)
$\displaystyle u(\mathbf{x},t)$	$\displaystyle=g(\mathbf{x},t),\quad(\mathbf{x},t)\in\partial\Omega\times[0,T]$
$\displaystyle u(\mathbf{x},0)$	$\displaystyle=h(x),\qquad\mathbf{x}\in\Omega$

where $\mathcal{N}(\cdot)$ is the differential operator, $u$ is the solution, and $f(\mathbf{x},t)$ is the time-dependent forcing term. For the PDEs with point source we consider in this paper, the forcing term $f(\mathbf{x},t)$ contains the Dirac delta functions.

2.1 PDEs with Point Source

Here we introduce two well-known time-(in)dependent PDEs with point source forcing terms. The Poisson’s equation with a point source has the following form:

	$\displaystyle-\Delta u$	$\displaystyle=\delta(\mathbf{x}-\mathbf{x}_{0}),\quad\mathbf{x}\in\Omega,$		(2)
	$\displaystyle u$	$\displaystyle=0,\qquad\qquad\,\,\,\,\mathbf{x}\in\partial\Omega$		(2)

where $\delta$ denotes the Dirac delta function. Although it looks simple, Poisson’s equation is a fundamental model for describing diffusion process, such as fluid flow, heat transfer, and chemical transport. In the field of electromagnetic simulations, the time domain Maxwell’s equations with excitation can be expressed as

	$\displaystyle\nabla\times E$	$\displaystyle=-\mu\dfrac{\partial H}{\partial t}-J(\mathbf{x},t),$		(3)
	$\displaystyle\nabla\times H$	$\displaystyle=\epsilon\dfrac{\partial E}{\partial t}.$		(3)

where $\epsilon$ is the permittivity, $\mu$ is the permeability. The forcing term $J(\mathbf{x},t)=\delta(\mathbf{x}-\mathbf{x}_{0})g(t)$ excites a wave propagation at $\mathbf{x}_{0}$ with a particular type of pulse $g(t)$ . Maxwell’s equations survived the revolutionary changes in physics in the 20th century and they underpin a huge range of natural phenomena.

2.2 The PINNs Method

In order to solve PDE problems with deep learning technology the PINNs method is proposed to approximate the solution $u(\mathbf{x},t)$ with a deep neural network $u(\mathbf{x},t;\theta)$ , where $\theta$ represents the trainable parameters of the deep neural network. The loss function given by such approximation is defined as


$\displaystyle L_{total}(\theta)$	$\displaystyle=L_{r}(\theta)+\lambda_{ic}L_{ic}(\theta)+\lambda_{bc}L_{bc}(\theta),$	(4a)
$\displaystyle L_{r}(\theta)$	$\displaystyle=\dfrac{1}{N_{r}}\sum_{i=1}^{N_{r}}\|\|\mathcal{N}(u(\mathbf{x}_{i},t_{i};\theta))-f(\mathbf{x}_{i},t_{i})\|\|_{2}^{2},$	(4b)
$\displaystyle L_{ic}(\theta)$	$\displaystyle=\dfrac{1}{N_{ic}}\sum_{i=1}^{N_{ic}}\|\|u(\mathbf{x}_{i},0;\theta)-h(\mathbf{x}_{i})\|\|_{2}^{2},$	(4c)
$\displaystyle L_{bc}(\theta)$	$\displaystyle=\dfrac{1}{N_{bc}}\sum_{i=1}^{N_{bc}}\|\|u(\mathbf{x}_{i},t_{i};\theta)-g(\mathbf{x}_{i},t_{i})\|\|_{2}^{2}.$	(4d)

where $L_{r}$ , $L_{ic}$ , $L_{bc}$ correspond to the loss term of PDE residual, initial conditions, and boundary conditions, respectively. The hyperparameters $\lambda_{ic}$ and $\lambda_{bc}$ aim to balance the interplay of different terms in the loss function. $N_{r}$ , $N_{ic}$ , and $N_{bc}$ represent the number of samples in the entire region, initial conditions, and boundary conditions, respectively.

When $f(\cdot),g(\cdot),h(\cdot)$ are continuous, the network weights $\theta$ can be optimized by minimizing the total training loss $L_{total}(\theta)$ via standard gradient descent procedures used in deep learning. Unfortunately, due to the singularity brought by $\delta(\mathbf{x})$ , the PINNs method cannot be directly used to solve the PDEs with a point source.

2.3 Related Works

One can use the Deep Ritz method [12] to solve Eq.(2). Through mathematical derivation, Eq.(2) can be converted to the minimum value problem:

\displaystyle\min\int_{\Omega}{\frac{1}{2}\|\Delta u(\mathbf{x})\|^{2}d\mathbf{x}-u(\mathbf{x}_{0})}+\int_{\partial\Omega}{\|u(\mathbf{x})\|^{2}}d\mathbf{x}

(5)

where the Dirac function term $\delta(\mathbf{x}-\mathbf{x}_{0})$ in Eq.(2) will generate the constant term $u(\mathbf{x}_{0})$ in Eq.(5) after integration. However, application of the Deep Ritz method is restricted because it’s unable to handle PDEs like Eq.(3) which cannot be derived from any variational problems. The NVIDIA SimNet [19] proposes to solve PDEs with a point source using the variational formulation. However, the performance heavily depends on the selection of the test functions and the computational time increases linearly with the number of test functions.

Zhang et al. [16] and Moseley et al. [18] use the classical numerical method to simulate the early period ( $0\leq t\leq T_{0}<T$ ) of wave equations to avoid the point source singularity, and then applied the PINNs method to solve the PDEs with given initial conditions at $t=T_{0}$ . Bekele et al. [20] attempted to leverage the PINNs loss by adding a new loss term with a significant amount of known labels when solving the Barry and Mercer’s source problem with time-dependent fluid injection. However, this method relies on classical computational methods to generate the labeled data which is expensive or even infeasible in many situations.

In this work, we propose a universal solution based on the PINNs method which does not rely on any labeled data or variational forms. Furthermore, our method greatly improves the accuracy and convergency speed compared with current deep learning based approaches.

3 Methodology

In this section, we explain our strategy to tackle the singularity problem in detail.

3.1 Approximation to $\delta(\mathbf{x})$

The Dirac delta function $\delta(x)$ is mathematically defined by:

\delta(x)=\begin{cases}+\infty,&x=0\\ 0,&x\neq 0\end{cases}

(6)

and which is also constrained to satisfy the identity $\int_{-\infty}^{+\infty}\delta(x)dx=1$ . Taking $\eta(x)$ to be any symmetric unimodal continuous probability density function centered at 0, $\eta(x)=\alpha^{-1}\eta(\dfrac{x}{\alpha})$ can be a natural approximation to $\delta(x)$ as the kernel width $\alpha\rightarrow 0$ . For $d$ -dimensions space, $\eta_{\alpha}(\mathbf{x})=\alpha^{-d}\eta(\dfrac{\mathbf{x}}{\alpha})$ can be used instead. In our experiments, we use Gaussian distribution, Cauchy distribution, or Laplacian distribution with sufficiently small kernel width $\alpha$ to approximate $\delta(\mathbf{x})$ .

3.2 Lower Bound Constrained Uncertainty Weighting

Our approximation to $\delta(x)$ eliminates function discontinuity at the origin, but it also brings a difficulty for the PINNs method to optimize its training loss. As the probability density function is highly concentrated at the origin as $\alpha\rightarrow 0$ , the residual samples close to the origin tend to dominate the whole residual loss, making it difficult for the network to learn the governing equations and therefore the standard optimizer may not converge to the solution. To address this problem, we split the feasible region into two subdomains such that:

\Omega=\Omega_{0}\cup\Omega_{1}\quad\Omega_{0}\cap\Omega_{1}=\emptyset

where $\Omega_{0}$ represents the subdomain that contains the origin and $\Omega_{1}$ covers the complement region. By taking this decomposition, the residual loss $L_{r}$ in Eq.(4) is divided into two parts:

L_{r}(\theta)=\lambda_{r,0}L_{r,0}(\theta)+\lambda_{r,1}L_{r,1}(\theta)

(7)

where $L_{r,0}(\theta)$ and $L_{r,1}(\theta)$ correspond to the residual losses in $\Omega_{0}$ and $\Omega_{1}$ , respectively. In our experiment, for the point source excited at $\mathbf{x}_{0}\in\Omega$ , we split $\Omega$ as

\Omega_{0}=\{\mathbf{x}_{0}+\mathbf{x}\in\Omega\,|\,\|\mathbf{x}\|\leq 3\alpha\},\quad\Omega_{1}=\Omega\backslash\Omega_{0}.

Together with the IC&BC conditions, the total PINNs loss in Eq.(4a) changes to be:

\displaystyle L_{total}(\theta)=\lambda_{r,0}L_{r,0}(\theta)+\lambda_{r,1}L_{r,1}(\theta)+\lambda_{ic}L_{ic}(\theta)+\lambda_{bc}L_{bc}(\theta).

(8)

Properly setting the weight vector $\lambda=(\lambda_{r,0},\lambda_{r,1},\lambda_{ic},\lambda_{bc})$ is critical to enhance the trainability of constrained neural networks [22, 23], but searching the optimal weight vector through manual tuning is infeasible. We adapt the uncertainty weighing method in [24] to solve the problem. For a total loss function including $m$ loss terms $L(\theta)=\sum_{i=1}^{m}\lambda_{i}L_{i}(\theta)$ , we add a nonnegative $\epsilon^{2}$ to $w^{2}$ , and get the total PINNs loss function as follows:

L_{total}(\theta;\mathbf{w})=\sum_{i=1}^{m}\dfrac{1}{2(\epsilon^{2}+w_{i}^{2})}L_{i}(\theta)+\log(\epsilon^{2}+w_{i}^{2})

(9)

The uncertainty of the $i$ -th loss term is approximated by $\epsilon^{2}+w_{i}^{2}$ , whose lower bound is constrainted by $\epsilon^{2}$ when decreasing $w_{i}$ to 0. We argue that such reformulation is nontrivial since it provides a lower bound for the uncertainty estimation of each loss term and thus increases the stability of the self-adaptive weighting algorithm. In our experiment, taking $\epsilon=0.01$ leads to robust model outputs and outperforms the original formulation ( $\epsilon=0$ ).

Fig.1 shows the $L_{2}\ error$ s when different $\epsilon$ is used in solving Maxwell’s equations with a point source (i.e., Eq.(3)). For detailed experimental settings, please refer to Sec.4.1. When $\epsilon=0.01$ and $\epsilon=0.001$ , the lower bound constrained uncertainty weighting method outperforms the original uncertainty weighting method ( $\epsilon=0$ ) with not only lower final $L_{2}\ error$ s, but also faster convergence speed.

Refer to caption — Figure 1: Maxwell’s equations: $L_{2}\ error$ s for different uncertainty lower bound. Original PINNs means the equally weighted PINNs method.

3.3 Multi-Scale DNN with Periodic Activations

When $\Omega=(0,\pi)^{2}$ and the point source is located at $(x_{0},y_{0})$ , the analytical solution of the 2-D Poisson’s equation with a point source (i.e., Eq.(2)) is listed as follows:

u(x,y,x_{0},y_{0})=\frac{4}{\pi^{2}}\sum_{m,n=1}^{\infty}\frac{\sin{mx}\sin{mx_{0}}\sin{ny}\sin{ny_{0}}}{m^{2}+n^{2}}

(10)

The analytical solution is formed by superposition of multiple frequency components. This multi-frequency phenomenon is common in solutions of many PDEs. Recent development in deep learning theories [25, 26, 21] reveals that the neural network fits the hidden physical law from low to high frequency components, which differentiates from the classical iterative numerical algorithms (e.g., Gauss-Seidel method). Liu et al. [27] proposes multi-scale deep neural networks (MscaleDNNs) based on this theory which exhibit faster convergence in higher frequency components.

Periodic nonlinearities have been investigated repeatedly over the past decades. Recently, the SIREN architecture [28] proposes to use Sine function as a periodic activation function and outperforms alternative activation functions when dealing with implicit neural representations and their derivatives. Compared with other activation functions, e.g., Tanh, ReLU, the derivative of a Sine function is Cosine, a phase-shifted Sine. Therefore, the derivatives of a SIREN inherit the properties of SIREN, thus enabling us to supervise any derivative of SIREN with complex natural signals from PDEs. However, the SIREN architecture may yield poor accuracy when applied to PDEs with a point source whose solution are formed by superposition of multiple frequency components.

Borrowing ideas from both SIREN and MscaleDNNs, we propose Multi-Scale SIREN (MS-SIREN) that combines a multi-scale DNN with Sine activation function, as shown in Fig.2. Skip connections are added between consecutive hidden layers to accelerate training and improve accuracy, which make the neural network much easier to train since they help to avoid the vanishing gradient problem [29]. Fig.3 shows the effectiveness of MS-SIREN compared with other alternative architectures.

4 Numerical Experiments

To evaluate the effectiveness of our proposed method, we apply it to solve the following three classical point source problems which come from different fields of physics: (1) electromagnetic simulation with the transverse electric (TE) mode [30]; (2) Poisson’s equation with a point source; (3) Barry and Mercer’s source problem with time-dependent fluid injection [20]. Unless otherwise specified, we take the following default training settings:

•

The Dirac delta function $\delta(\mathbf{x})$ is approximated by a Gaussian distribution with default kernel width $\alpha=0.01$ .
•

$\epsilon=0.01$ is chosen for the lower bound constrained uncertainty weighting algorithm.
•

For MS-SIREN architecture, four subnets are used with scale coefficients set to {1, 2, 4, 8}. Each subnet contains seven fully connected layers and each layer contains 64 neurons. The exception is Poisson’s equation, which requires only five fully connected layers per subnet to achieve ideal accuracy.

Average relative $L_{2}\ error$ is used to evaluate the performance of the trained network $u(\mathbf{x};\theta)$ , which is computed as:

L_{2}\ error=\dfrac{\sum_{j=1}^{N}\|u_{ref}(\mathbf{x}_{j})-u(\mathbf{x}_{j},\theta)\|_{2}}{\sum_{j=1}^{N}\|u_{ref}(\mathbf{x}_{j})\|_{2}}

(11)

where $N$ is the number of test points in $\Omega$ and $u_{ref}(\cdot)$ represents the ground truth. In our experiments, all feasible regions are rectangles, thus the spatial-temporal meshes are chosen as the test point set. Unless otherwise specified, all the experiments are conducted under the MindSpore¹¹1https://www.mindspore.cn/ and trained on the Ascend 910 AI processors²²2https://e.huawei.com/en/products/servers/ascend. Readers can read our source code³³3https://gitee.com/mindspore/mindscience/tree/master/MindElec/ for implementation details.

4.1 Electromagnetic Simulation with the TE Mode

Maxwell’s equations are a set of coupled partial differential equations related to the fundamental physical modeling of electromagnetism. We use 2-D Maxwell’s equations to validate the robustness of our method. The governing equations of TE wave are given as:


$\displaystyle\frac{\partial E_{x}}{\partial t}$	$\displaystyle=\frac{1}{\epsilon}\frac{\partial H_{z}}{\partial y},$	(12a)
$\displaystyle\frac{\partial E_{y}}{\partial t}$	$\displaystyle=-\frac{1}{\epsilon}\frac{\partial H_{z}}{\partial x},$	(12b)
$\displaystyle\frac{\partial H_{z}}{\partial t}$	$\displaystyle=-\frac{1}{\mu}(\frac{\partial E_{y}}{\partial x}-\frac{\partial E_{x}}{\partial y}+J).$	(12c)

where $E_{x}$ and $E_{y}$ are the electric field in the Cartesian coordinate, and $H_{z}$ is the magnetic field perpendicular to the horizontal plane. The constants $\mu\approx 4\pi\times 10^{-7}H/m$ and $\epsilon\approx 8.854\times 10^{-12}F/m$ are known as the permeability and permittivity of the free space. $J$ is a known source function that represents the energy passing through the source node. Without loss of generality, a Gaussian pulse containing multiple frequency components is used as the source function that is expressed as:

\displaystyle J(x,y,t)=e^{-(\frac{t-d}{\tau})^{2}}\delta(x-x_{0})\delta(y-y_{0}).

(13)

where $d$ is the temporal delay and $\tau$ is a pulse-width parameter. Currently, we take $\tau=3.65\times\sqrt{2.3}/(\pi f)$ with the characteristic frequency $f$ set to $1GHz$ . Suppose the initial electromagnetic field is static, which corresponds to the initial conditions as follows:


$\displaystyle E_{x}(x,y,0)$	$\displaystyle=E_{y}(x,y,0)=0,$	(14a)
$\displaystyle H_{z}(x,y,0)$	$\displaystyle=0.$	(14b)

To solve the equations uniquely over a finite vacuum domain, the standard Mur’s second-order absorbing boundary condition is utilized. For a rectangular truncated domain, boundary conditions for the left, right, upper and lower sides, can be expressed as follows:


$\displaystyle\frac{\partial H_{z}}{\partial x}-\frac{1}{c}\frac{\partial H_{z}}{\partial t}+\frac{c\epsilon}{2}\frac{\partial E_{x}}{\partial y}$	$\displaystyle=0,$	(15a)
$\displaystyle\frac{\partial H_{z}}{\partial x}+\frac{1}{c}\frac{\partial H_{z}}{\partial t}-\frac{c\epsilon}{2}\frac{\partial E_{x}}{\partial y}$	$\displaystyle=0,$	(15b)
$\displaystyle\frac{\partial H_{z}}{\partial y}-\frac{1}{c}\frac{\partial H_{z}}{\partial t}+\frac{c\epsilon}{2}\frac{\partial E_{y}}{\partial x}$	$\displaystyle=0,$	(15c)
$\displaystyle\frac{\partial H_{z}}{\partial y}+\frac{1}{c}\frac{\partial H_{z}}{\partial t}-\frac{c\epsilon}{2}\frac{\partial E_{y}}{\partial x}$	$\displaystyle=0.$	(15d)

where $c=1/\sqrt{\mu\epsilon}$ is the light speed.

We solve the problem in a rectangular domain $\Omega=[-1,1]^{2}$ , and the point source is located at the origin $(x,y)=(0,0)$ . The total simulation time is set to $4ns$ so that the pulse signals can propagate throughout the whole space of truncated domain. The reference solution is obtained through the finite-difference time-domain (FDTD) method [30] at a spatial resolution of $\Delta=0.005$ . For model training, the spatial points are sampled from uniform distribution so that the average spacing between two adjacent random points is comparable with that in the reference FDTD mesh, and the average temporal step is chosen to satisfy the $CFL$ convergence condition [30]. The spatial point source is approximated by the Gaussian distribution with kernel width $\alpha=2\Delta=0.01$ . The model is trained using Adam optimizer with an initial learning rate of 0.001, and the learning rate attenuates 10 times when the training process reaches 40%, 60%, and 80%, respectively. The total number of iterations is $200$ k, and the batch size is $N_{r,0}=N_{r,1}=N_{ic}=N_{bc}=8192$ for each iteration. The instantaneous electromagnetic fields at the time of $2.4ns$ are showed in Fig.4. We can see that the predicted electromagnetic fields by our method are almost indistinguishable from the reference solution.

Ablation Studies

Selection of Kernel Width $\alpha$ . In order to approximate the Dirac delta function, the setting of kernel width $\alpha$ is important. In our experiments we find that the training process may not converge if $\alpha$ is too large or too small. We adopt an online fine-tuning method to decrease the value of $\alpha$ gradually during the training, which is illustrated in Fig.5 where the Gaussian distribution is used. The training process starts with $\alpha=0.01$ , the $L_{2}\ error$ drops rapidly and after $140$ k iterations the final mean $L_{2}\ error$ is stable below 0.05. Then the online fine-tuning is performed to reduce the value of $\alpha$ by half, which makes the $L_{2}\ error$ rise first but then decrease quickly as the training progresses, and after $70$ k iterations the final mean $L_{2}\ error$ is stable below 0.08. The fine-tuning operation repeats twice that reduces the value of $\alpha$ to 0.0025 and 0.00125, respectively. According to the experimental results in Fig.5, we choose $\alpha=0.01$ as the kernel width and fix it in the subsequent experiments. We further use Cauchy distribution and Laplacian distribution with $\alpha=0.01$ to approximate the point source, and get the final mean $L_{2}\ error$ s as 0.04 and 0.05, respectively. This set of experiments demonstrate that approximating the Dirac delta function with a symmetric unimodal continuous probability density function is feasible.

Network Architecture. This set of experiments aim to validate the effectiveness of our proposed MS-SIREN architecture. The SIREN architecture (# subnets = 1) and our MS-SIREN architecture are used to solve the Maxwell’s equations with a point source under different settings, and the final $L_{2}\ error$ s of all output components and their mean values are shown in Table 1. It is obvious that MS-SIREN outperforms SIREN under all settings, and the best performance is achieved by the architecture comprising four subnets.

Table 1: Maxwell’s equations:

L_{2}\ error

achieved by different network architectures.

network architectures			$L_{2}\ error$
$\#$ subnets	$\#$ layers per subnet	$\#$ neurons per layer	$E_{x}$	$E_{y}$	$H_{z}$	mean
1	7	256	0.192	0.183	0.433	0.269
1	9	256	0.147	0.146	0.070	0.121
2	7	128	0.079	0.074	0.058	0.072
2	9	128	0.054	0.053	0.019	0.027
4	7	64	0.021	0.022	0.001	0.018
4	9	64	0.025	0.022	0.017	0.021

The MS-SIREN architecture uses skip connections between consecutive hidden layers. To validate its effectiveness, we compare MS-SIREN architectures with and without skip connections, and show the results in Fig.6. It is clear that skip connections make $L_{2}\ error$ decrease faster and reach a lower value.

Activation function is another key factor affecting the prediction accuracy. To validate the effectiveness of Sine periodic activation function, we compare it with ReLU and Tanh, and the results are shown in Fig.7. Sine produces the best results in terms of convergence speed and the final mean $L_{2}\ error$ . In comparison, Tanh performs slightly worse than Sine, and ReLU fails to let the network learn any useful information.

4.2 Poisson’s Equation with a point source

We apply our approach to solve Eq.(2) in the region of $\Omega=[0,\pi]\times[0,\pi]$ with the point source located at $\mathbf{x}_{0}=(\frac{\pi}{2},\frac{\pi}{2})$ . The point source is approximated using Gaussian distribution with fixed $\alpha=0.01$ and the lower bound hyperparameter $\epsilon$ is set to 0.01. The model is trained for $50k$ iterations using Adam optimizer with an initial learning rate of 0.001, and the learning rate attenuates 10 times when the training process reaches 40%, 60%, and 80%, respectively. In each iteration, the batch size is set to $(N_{r_{,}0},N_{r,1},N_{bc})=(8192,8192,8192)$ . Fig.8 shows that the absolute error between our solution and the analytical solution is small.

For comparisons, we also apply the Deep Ritz method and the variational method in SimNet to solve the same problem. In this set of experiments, we implement our method and Deep Ritz method in MindSpore framework, and implement the variational method using the public code in SimNet with a default selection of 15 test functions. All the experiments are run on an NVIDIA Tesla V100 GPU card. Fig.9(a) shows that the convergence speed of our method is close to that of Deep Ritz method, but faster than that of SimNet. Fig.9(b) shows the final $L_{2}\ error$ s obtained by different methods, and obviously our method achieves the highest accuracy.

Ablation Studies of Network Architectures. Table 2 shows the prediction accuracy obtained by different network architectures. The SIREN architecture (# subnets = 1) cannot converge in all settings due to the singularity problem. However, when the number of subnets increases to 2 or 4, the $L_{2}\ error$ drops remarkably. This experiment indicates that a multi-scale structure is necessary for solving with this problem.

Table 2: Poisson’s Equation:

L_{2}\ error

s obtained by different network architectures.

Network Architectures			$L_{2}\ error$
# subnets	# layers per subent	# neurons per layer	$L_{2}\ error$
1	5	256	0.289
1	7	256	1.081
2	5	128	0.003
2	7	128	0.006
4	5	64	0.003
4	7	64	0.002

4.3 Barry and Mercer’s Source Problem

The third problem we solve is the nondimensionalized poroelastic Barry and Mercer’s source problem with time-dependent fluid injection. The governing equations are given by


$\displaystyle\frac{\partial^{2}u}{\partial t\partial x}+\frac{\partial^{2}v}{\partial t\partial z}-\frac{\partial^{2}p}{\partial x^{2}}-\frac{\partial^{2}p}{\partial z^{2}}-\beta Q$	$\displaystyle=0,$	(16a)
$\displaystyle(\eta+1)\frac{\partial^{2}u}{\partial x^{2}}+\frac{\partial^{2}u}{\partial z^{2}}+\eta\frac{\partial^{2}v}{\partial x\partial z}-(\eta+1)\frac{\partial p}{\partial x}$	$\displaystyle=0,$	(16b)
$\displaystyle\frac{\partial^{2}v}{\partial x^{2}}+(\eta+1)\frac{\partial^{2}v}{\partial z^{2}}+\eta\frac{\partial^{2}u}{\partial x\partial z}-(\eta+1)\frac{\partial p}{\partial z}$	$\displaystyle=0.$	(16c)

where $u$ and $v$ are the deformations of porous medium along the x and z directions, $p$ is the pore fluid pressure, $Q$ is a fluid source or sink term, $\eta$ and $\beta$ are nondimensional parameters which are functions of the real material parameters.

An illustration of the problem is shown in Fig.10. An oscillating point source $Q$ located at $(x_{0},z_{0})$ in the rectangular domain $[0,a]\times[0,b]$ is given by:

Q(x,z,t)=\delta(x-x_{0})\delta(z-z_{0})sin(\omega t)

(17)

where $\omega$ is the oscillation frequency. The parameters are set as follows: $a=b=1$ , $t\in[0,2\pi]$ , $\beta=2$ , $\eta=1.5$ , $(x_{0},z_{0})=(0.25,0.25)$ and $\omega=1$ . The model is trained for $250k$ iterations using Adam optimizer with an initial learning rate of 0.001, and the learning rate attenuates 10 times when the training process reaches 40%, 60%, and 80%, respectively. In each iteration, the batch size is set to $(N_{r_{,}0},N_{r,1},N_{bc},N_{ic})=(8192,8192,8192,8192)$ .

Fig.11 shows the solution at $t=\pi/5$ obtained by our method (middle) and the analytical solution (top), and the absolute error between them (bottom). The time averaged mean $L_{2}\ error$ of the predicted solution is 0.017, demonstrating the high accuracy of our method. In addition to Gaussian distribution, we also use Cauchy distribution and Laplacian distribution to approximate the Dirac delta function, and obtain similar accuracy with $L_{2}\ error$ being 0.051 and 0.025, respectively.

Ablation Studies

Lower Bound Hyperparameter $\epsilon$ . Fig.12 shows the convergence speed of the mean $L_{2}\ error$ s with different lower bound hyperparameter $\epsilon$ , and the best performance (mean $L_{2}\ error=0.017$ ) is achieved when $\epsilon=0.01$ . It shows that our lower bound constrained uncertainty weighting method outperforms the uncertainty weighting method ( $\epsilon=0$ ) [24] and the equally weighted PINNs method (’original PINNs’).

Network Architectures. Table 3 shows the $L_{2}\ error$ s achieved by different network architectures. Compared with the SIREN architecture (# subnets = 1), MS-SIREN does not bring any benefit in this case as the point source in this problem contains only one frequency component, and thus there are not multiple frequency components in the analytical solution.

Table 3: Barry and Mercer’s source problem:

L_{2}\ error

s got by different network architectures.

Network Architectures			$L_{2}\ error$
# subnets	# layers per subnet	# neurons per layer	$u$	$v$	$p$	mean
1	5	256	0.022	0.021	0.026	0.023
1	7	256	0.011	0.016	0.026	0.018
2	5	128	0.010	0.011	0.026	0.016
2	7	128	0.012	0.013	0.026	0.017
4	5	64	0.012	0.013	0.026	0.017
4	7	64	0.018	0.014	0.026	0.019

Fig.13 compares the convergence speed of the mean $L_{2}\ error$ s when different activation functions are used. The periodic activation function Sine exhibits superior performance over ReLU and Tanh similar to the case of Maxwell’s equations.

In summary, approximating the Dirac function with the symmetric unimodal continuous probability density function is necessary for all experiments in order for PINNs to solve the singularity problem posed by the point source. For the effectiveness of the MS-SIREN architecture, the multi-scale network exhibits better accuracy and faster convergence speed than the single-scale network in both Maxwell’s equations and Poisson’s equation experiments. However, the muti-scale network does not bring significant gain in Barry and Mercer’s source problem due to the fact that the PDE solutions do not contain multiple frequency components. Moreover, the Sine activation function exhibits better accuracy than ReLU and Tanh in all three sets of experiments. For the Maxwell’s equations and the Barry and Mercer’s source problem, the lower bound constrained uncertainty weighting algorithm can obtain a lower $L_{2}\ error$ relative to the original uncertainty weighing method. It is worth noting that all three of our proposed techniques need to be utilized to obtain the desired accuracy when solving the Maxwell’s equations.

5 Conclusions

PDEs with a point source pose a great challenge to conventional PINNs method due to the singularity problem, and all the methods trying to tackle this problem so far can only work in some specific situations. We propose a universal approach based on the PINNs method that can solve the PDEs with a point source without any specific assumptions. The novelty of our approach lies in three techniques, approximating the Dirac delta function with a symmetric unimodal continuous probability density function, balancing the training loss term of different areas with a lower bound constrained uncertainty weighting algorithm, and using multi-scale DNN with Sine activation function to build the network architecture. Three representative physical problems are used to verify the effectiveness of our method, and these experiments show that our method outperforms existing deep learning-based methods in accuracy, efficiency and versatility. In the future we will try to extend the idea of smoothing function to solve discontinuous PDEs. In addition, the lower bound constrained uncertainty weighting algorithm is not only restricted to solving PDEs, but also can be applied to many other multitask learning problems in computer vision and natural language processing [24]. An independent study on this direction is left for future explorations.

References

[1] A Ansari, S Mohaghegh, M Shahnam, JF Dietiker, and T Li. Data driven smart proxy for cfd application of big data analytics & machine learning in computational fluid dynamics, report two: Model building at the cell level. Technical report, NETL, 2018.
[2] Wenbin Yu, Feiyang Zhao, Wenming Yang, and Hongpeng Xu. Integrated analysis of cfd simulation data with k-means clustering algorithm for soot formation under varied combustion conditions. Applied Thermal Engineering, 153:299–305, 2019.
[3] Nils Thuerey, Konstantin Weißenow, Lukas Prantl, and Xiangyu Hu. Deep learning methods for reynolds-averaged navier–stokes simulations of airfoil flows. AIAA Journal, 58(1):25–36, 2020.
[4] Arbaaz Khan, Vahid Ghorbanian, and David Lowther. Deep learning for magnetic field estimation. IEEE Transactions on Magnetics, 55(6):1–4, 2019.
[5] Lu Lu, Pengzhan Jin, and George Em Karniadakis. Deeponet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators. arXiv preprint arXiv:1910.03193, 2019.
[6] Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential equations. arXiv preprint arXiv:2010.08895, 2020.
[7] Shengze Cai, Zhicheng Wang, Sifan Wang, Paris Perdikaris, and George Em Karniadakis. Physics-informed neural networks for heat transfer problems. Journal of Heat Transfer, 143(6):060801, 2021.
[8] Allan Pinkus. Approximation theory of the mlp model. Acta Numerica 1999: Volume 8, 8:143–195, 1999.
[9] Atilim Gunes Baydin, Barak A Pearlmutter, Alexey Andreyevich Radul, and Jeffrey Mark Siskind. Automatic differentiation in machine learning: a survey. Journal of machine learning research, 18, 2018.
[10] Isaac E Lagaris, Aristidis Likas, and Dimitrios I Fotiadis. Artificial neural networks for solving ordinary and partial differential equations. IEEE transactions on neural networks, 9(5):987–1000, 1998.
[11] Justin Sirignano and Konstantinos Spiliopoulos. Dgm: A deep learning algorithm for solving partial differential equations. Journal of computational physics, 375:1339–1364, 2018.
[12] E Weinan and Bing Yu. The deep ritz method: a deep learning-based numerical algorithm for solving variational problems. Communications in Mathematics and Statistics, 6(1):1–12, 2018.
[13] Maziar Raissi, Paris Perdikaris, and George E Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.
[14] Xiaowei Jin, Shengze Cai, Hui Li, and George Em Karniadakis. Nsfnets (navier-stokes flow nets): Physics-informed neural networks for the incompressible navier-stokes equations. Journal of Computational Physics, 426:109951, 2021.
[15] Maziar Raissi. Deep hidden physics models: Deep learning of nonlinear partial differential equations. The Journal of Machine Learning Research, 19(1):932–955, 2018.
[16] Pan Zhang, Yanyan Hu, Yuchen Jin, Shaogui Deng, Xuqing Wu, and Jiefu Chen. A maxwell’s equations based deep learning method for time domain electromagnetic simulations. IEEE Journal on Multiscale and Multiphysics Computational Techniques, 6:35–40, 2021.
[17] Dennis M Sullivan. Electromagnetic simulation using the FDTD method. John Wiley & Sons, 2013.
[18] Ben Moseley, Andrew Markham, and Tarje Nissen-Meyer. Solving the wave equation with physics-informed deep learning. arXiv preprint arXiv:2006.11894, 2020.
[19] Oliver Hennigh, Susheela Narasimhan, Mohammad Amin Nabian, Akshay Subramaniam, Kaustubh Tangsali, Zhiwei Fang, Max Rietmann, Wonmin Byeon, and Sanjay Choudhry. Nvidia simnet™: An ai-accelerated multi-physics simulation framework. In International Conference on Computational Science, pages 447–461. Springer, 2021.
[20] Yared W Bekele. Physics-informed deep learning for flow and deformation in poroelastic media. arXiv preprint arXiv:2010.15426, 2020.
[21] Zhi-Qin John Xu, Yaoyu Zhang, Tao Luo, Yanyang Xiao, and Zheng Ma. Frequency principle: Fourier analysis sheds light on deep neural networks. arXiv preprint arXiv:1901.06523, 2019.
[22] Levi McClenny and Ulisses Braga-Neto. Self-adaptive physics-informed neural networks using a soft attention mechanism. arXiv preprint arXiv:2009.04544, 2020.
[23] Sifan Wang, Xinling Yu, and Paris Perdikaris. When and why pinns fail to train: A neural tangent kernel perspective. arXiv preprint arXiv:2007.14527, 2020.
[24] Alex Kendall, Yarin Gal, and Roberto Cipolla. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7482–7491, 2018.
[25] Nasim Rahaman, Aristide Baratin, Devansh Arpit, Felix Draxler, Min Lin, Fred Hamprecht, Yoshua Bengio, and Aaron Courville. On the spectral bias of neural networks. In International Conference on Machine Learning, pages 5301–5310. PMLR, 2019.
[26] Zhi-Qin John Xu, Yaoyu Zhang, and Yanyang Xiao. Training behavior of deep neural network in frequency domain. In International Conference on Neural Information Processing, pages 264–274. Springer, 2019.
[27] Ziqi Liu, Wei Cai, and Zhi-Qin John Xu. Multi-scale deep neural network (mscalednn) for solving poisson-boltzmann equation in complex domains. arXiv preprint arXiv:2007.11207, 2020.
[28] Vincent Sitzmann, Julien Martel, Alexander Bergman, David Lindell, and Gordon Wetzstein. Implicit neural representations with periodic activation functions. Advances in Neural Information Processing Systems, 33, 2020.
[29] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
[30] Stephen D Gedney. Introduction to the finite-difference time-domain (fdtd) method for electromagnetics. Synthesis Lectures on Computational Electromagnetics, 6(1):1–250, 2011.

6 Appendix

Analytical Solution of Barry and Mercer’s Source Problem

For the Barry and Mercer’s Source Problem defined by Eq.(16) with the domain and boundary conditions shown in Fig.10, the analytical solution is expressed as:


$\displaystyle u(x,z,t)$	$\displaystyle=\frac{4}{ab}\sum_{n,q=1}^{\infty}\hat{u}(n,q,t)\cos{\lambda_{n}x}\sin{\lambda_{q}z},$	(18a)
$\displaystyle v(x,z,t)$	$\displaystyle=\frac{4}{ab}\sum_{n,q=1}^{\infty}\hat{v}(n,q,t)\sin{\lambda_{n}x}\cos{\lambda_{q}z},$	(18b)
$\displaystyle p(x,z,t)$	$\displaystyle=-\frac{4}{ab}\sum_{n,q=1}^{\infty}\hat{p}(n,q,t)\sin{\lambda_{n}x}\sin{\lambda_{q}z}.$	(18c)

where $\hat{u}$ , $\hat{v}$ , $\hat{p}$ are intermediate variables in the solution and can be expressed as follows:


$\displaystyle\hat{u}(n,q,t)$	$\displaystyle=\frac{\lambda_{n}}{\lambda_{nq}}\hat{p}(n,q,t),$	(19a)
$\displaystyle\hat{v}(n,q,t)$	$\displaystyle=\frac{\lambda_{q}}{\lambda_{nq}}\hat{p}(n,q,t),$	(19b)
$\displaystyle\hat{p}(n,q,t)$	$\displaystyle=-\frac{\beta\sin{\lambda_{n}x_{0}}\sin{\lambda_{q}z_{0}}}{\lambda_{nq}^{2}}(\lambda_{nq}\sin{\omega t}-\omega\cos{\omega t}+\omega e^{-\lambda_{nq}t})$	(19c)

where

\displaystyle\lambda_{n}=\frac{n\pi}{a},\quad\lambda_{q}=\frac{q\pi}{b},\quad\lambda_{nq}=\lambda_{n}^{2}+\lambda_{q}^{2}.

(20)