
* These authors contributed equally to this work.

S. Shekarpaz, email: [email protected]
M. Azizmalayeri, email: [email protected]
M. H. Rohban, email: [email protected]
Department of Computer Engineering, Sharif University of Technology, Tehran, Iran

PIAT: Physics Informed Adversarial Training for Solving Partial Differential Equations

Simin Shekarpaz
Mohammad Azizmalayeri
Mohammad Hossein Rohban
Abstract

In this paper, we propose the physics-informed adversarial training (PIAT) of neural networks for solving nonlinear differential equations (NDEs). It is well known that the standard training of neural networks results in non-smooth functions. Adversarial training (AT) is an established defense mechanism against adversarial attacks, and it can also help in making the solution smooth. AT augments the training mini-batch with a perturbation that adversarially makes the network output mismatch the desired output. Unlike conventional AT, which relies only on the training data, here we encode the governing physical laws in the form of nonlinear differential equations using automatic differentiation in the adversarial network architecture. We compare PIAT with PINN to demonstrate the effectiveness of our method in solving NDEs in up to 10 dimensions. Moreover, we propose weight decay and Gaussian smoothing as baselines to highlight the advantages of PIAT. The code repository is available at https://github.com/rohban-lab/PIAT.

Keywords:
Adversarial Training · Physics-Informed Neural Networks · PDE

1 Introduction

The efficiency of deep learning has recently proven to be significant in various scientific fields 1 , such as computer vision, natural language processing, robotics, and physics simulation. One of the effective applications of deep neural networks is to approximate the solution of a physical system 2 ; 3 ; 4 ; 5 ; 6 .

The idea of using neural networks to solve ordinary and partial differential equations was first studied by researchers in 7 ; 8 ; 9 ; 10 ; 11 , where the approximate solution is parameterized by a fully-connected network that allows for a fully differentiable and closed analytic form. Applications of these techniques in scientific computation and numerical approximation of physical systems have increased due to the widespread interest in neural networks.

In 12 , Raissi et al. introduced physics-informed neural networks (PINNs), in which the underlying PDE and boundary conditions are enforced through minimization of the loss function, and the solution derivatives are obtained using automatic differentiation. Points on the boundary conditions are also included in the training set.

The efficiency of different types of PINNs in solving various classes of PDEs has been shown in 14 ; 15 ; 16 , including integro-differential equations 17 , fractional equations 17 , surface PDEs 18 , and stochastic differential equations 19 . In 20 , a conservative physics-informed neural network (cPINN) was developed to satisfy various conservation laws while solving the PDEs, where the problem is solved in multiple subdomains and continuity of the flux across subdomains is enforced. This method can also be used to solve the weak (variational) form of a PDE. Since the weak form contains natural boundary conditions, the neural network solution should only meet the essential boundary conditions. In Karumuri2020SimulatorfreeSO , the authors considered the variational form for stochastic PDEs and applied the idea of PINN to obtain the PDE solution. In addition, a variational formulation of PINNs (hp-VPINN), based on the Galerkin method, was proposed to deal with PDEs with non-smooth solutions 21 . A similar approach was studied in XPINN 22 , a domain decomposition approach in the PINN framework that conforms to the conservation laws. The authors in 23 proposed a Bayesian approach to physics-informed neural networks to solve forward and inverse problems. A parallel framework for the domain decomposition of cPINNs and XPINNs was also developed in 24 . The Python library DeepXDE lu2021deepxde is an implementation of PINNs that makes user code compact and follows the mathematical formulation closely.

Some other recent works have focused on the development of neural network architectures and training schemes 25 ; 26 ; 27 ; 28 ; 29 that can improve the performance of PINNs in various fields. Furthermore, error estimates and the convergence of PINN methods were discussed in 30 ; 31 ; 32 .

In this paper, we explore the effectiveness of neural network smoothing methods on the performance of PINNs. Specifically, we first propose weight decay Krogh91 and Gaussian smoothing as possible methods that may improve the smoothness of the neural network model, and hence its generalization. Weight decay adds a penalty term to the loss function that promotes smaller weight norms and reduces potential overfitting. Gaussian smoothing adds a small Gaussian random perturbation to each training sample and acts as an augmentation of the input samples. Next, we propose a more general method for solving nonlinear problems based on adversarial training, called PIAT. Adversarial training has proved effective against adversarial examples, which are crafted by adding small adversarial perturbations to the original input such that the model produces incorrect outputs SzegedyZSBEGF13 ; NguyenYC15 ; Madry2018 . Using this method, the fully connected neural network is trained to solve physical problems in various dimensions, which leads to better robustness and generalization on the test data, where PINN might not be effective enough. The approach is implemented for three PDE systems: the Kuramoto–Sivashinsky equation, the Sawada–Kotera equation, and a high-dimensional Allen–Cahn equation. In all cases, the total number of training data is relatively small, and the training and test data are generated using the Latin Hypercube Sampling (LHS) strategy. The loss functions are minimized using the Adam optimization algorithm.

In the proposed PIAT method, a single neural network is used for the whole domain, which automatically enforces continuity of the solution and its derivatives over time. With this method, fewer iterations and collocation points are needed to achieve convergence compared with the standard PINN. The numerical results show that PIAT works for high-order and nonlinear PDEs, and its performance is significantly better than that of PINN or other methods such as Gaussian smoothing or weight decay.

The remainder of this paper is organized as follows. In Section 2, we review the PINN formulation for solving differential equations. Section 3 introduces weight decay and Gaussian smoothing as baselines for the generalization purpose. The proposed adversarial training method (PIAT) is introduced in Section 4, and its properties are also presented. In Section 5, PIAT is applied to three different problems and compared with the other methods to demonstrate the efficiency of the proposed technique.

2 Physics-informed neural networks (PINNs) for solving PDEs

We consider the general form of an $m$-th order initial value problem (IVP), which is as follows:

\begin{split}&u_{t}(\textbf{x},t)+N[u(\textbf{x},t)]=0,\quad\textbf{x}\in\Omega,\ t\in[0,T],\\ &u(\textbf{x},0)=h(\textbf{x}),\quad\textbf{x}\in\Omega,\\ &u(\textbf{x},t)=g(\textbf{x},t),\quad\textbf{x}\in\partial\Omega,\ t\in[0,T],\end{split} \qquad (1)

where $N[\cdot]$ represents a nonlinear differential operator, $\Omega\subseteq\mathbb{R}^{d}$, and $u$ is the unknown solution with known initial and boundary conditions. Moreover, $u_{t}({\mathbf{x}},t)=\partial u/\partial t$.

In the PINN framework, a fully connected neural network composed of multiple hidden layers is used to approximate the solution $u(\textbf{x},t)$ of the given nonlinear problem. The network input is $(\textbf{x},t)$ and its output is $\hat{u}(\textbf{x},t)$. Each hidden neuron of the neural network in the $l$-th layer can be expressed as

y^{(l)}_{j}=\sigma\left(\sum_{i}w^{(l)}_{i,j}x^{(l)}_{i}+b^{(l)}_{j}\right), \qquad (2)

where the $x^{(l)}_{i}$'s are the inputs to the $l$-th layer and the $y^{(l)}_{j}$'s are the outputs of its hidden neurons. The output of the last layer, $y^{(L)}_{1}$, is used to approximate the solution. In this formulation, $w^{(l)}_{i,j}$ and $b^{(l)}_{j}$ are the trainable weights and biases of the $l$-th layer, and $\sigma(\cdot)$ is a continuous activation function.
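As a concrete illustration of Eq. 2, the following is a minimal sketch of such a fully connected approximator in PyTorch. The class name PINNSolver, the tanh activation, and the default sizes (5 hidden layers of 100 neurons, mirroring Example 1) are our own illustrative choices, not necessarily the configuration of the released code.

import torch
import torch.nn as nn

class PINNSolver(nn.Module):
    """Fully connected surrogate u_hat(x, t); sizes are illustrative."""
    def __init__(self, in_dim=2, hidden=100, depth=5):
        super().__init__()
        layers = [nn.Linear(in_dim, hidden), nn.Tanh()]
        for _ in range(depth - 1):
            layers += [nn.Linear(hidden, hidden), nn.Tanh()]
        layers.append(nn.Linear(hidden, 1))   # y_1^{(L)} approximates u
        self.net = nn.Sequential(*layers)

    def forward(self, x, t):
        # The network input is the space-time coordinate (x, t).
        return self.net(torch.cat([x, t], dim=-1))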

The parameters of the neural network are randomly initialized and iteratively updated by minimizing a loss function that consists of three error terms: the PDE residual, the boundary conditions, and the initial condition:

\begin{split}\mathrm{minimize}_{{\mathbf{w}},{\mathbf{b}}}\quad&\frac{1}{N_{r}}\sum_{i=1}^{N_{r}}|u_{t}(\textbf{x}_{i},t_{i})+N[u(\textbf{x}_{i},t_{i})]|^{2}\\ &+\frac{1}{N_{b}}\sum_{i=1}^{N_{b}}|u(\textbf{x}_{i},t_{i})-g({\mathbf{x}}_{i},t_{i})|^{2}+\frac{1}{N_{0}}\sum_{i=1}^{N_{0}}|u(\textbf{x}_{i},0)-h({\mathbf{x}}_{i})|^{2},\end{split} \qquad (3)

where $u(\textbf{x},t)$ is the estimated solution, and $N_{r}$, $N_{b}$, and $N_{0}$ denote the number of collocation points, boundary points, and initial points, respectively, which can be chosen arbitrarily. By solving the minimization problem, the optimal values of the weights ${\mathbf{w}}^{\star}$ and biases ${\mathbf{b}}^{\star}$ are computed, and the approximate solution $\hat{u}(\textbf{x},t)=u({\mathbf{x}},t;{\mathbf{w}}^{\star},{\mathbf{b}}^{\star})$ is then obtained.

In the PINN formulation, the solution's derivatives are computed using automatic differentiation, in which the derivative of the overall composition is obtained by combining the derivatives of the constituent operations via the chain rule. Automatic differentiation is well supported in most deep learning frameworks, such as TensorFlow and PyTorch, and it allows us to avoid time-consuming derivations or numerical discretization while computing derivatives of all orders in space-time.
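As a sketch of how the loss in Eq. 3 can be assembled with automatic differentiation, the snippet below uses torch.autograd.grad to form the residual term. It assumes the PINNSolver sketch above, expects leaf tensors created with requires_grad=True, and uses the toy operator N[u] = u u_x purely for illustration; the helper names pde_residual and pinn_loss are ours, not the authors'.

def pde_residual(model, x, t):
    # u_t + N[u] at the collocation points; x and t must be leaf tensors
    # with requires_grad=True. N[u] = u * u_x is an illustrative operator.
    u = model(x, t)
    ones = torch.ones_like(u)
    u_t = torch.autograd.grad(u, t, grad_outputs=ones, create_graph=True)[0]
    u_x = torch.autograd.grad(u, x, grad_outputs=ones, create_graph=True)[0]
    return u_t + u * u_x

def pinn_loss(model, colloc, bound, init, g, h):
    # Sum of the three mean-squared terms of Eq. 3.
    (x_r, t_r), (x_b, t_b), x_0 = colloc, bound, init
    t_0 = torch.zeros_like(x_0[:, :1])
    return (pde_residual(model, x_r, t_r).pow(2).mean()
            + (model(x_b, t_b) - g(x_b, t_b)).pow(2).mean()
            + (model(x_0, t_0) - h(x_0)).pow(2).mean())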

3 Gaussian smoothing and weight decay

To improve the generalization of the neural network in solving physics-informed equations, we propose to implicitly make the solution robust to small random perturbations. Such robustness is known to promote model generalization, which is also desired in the case of PINNs. To investigate the effectiveness of this idea, we add a random bounded noise to the input as an augmentation during training, and we also apply weight decay in order to reach a stable model that generalizes well on the test data.

Augmentation is a well-known technique in training machine learning models, as it prevents the model from overfitting to the training data. For instance, rotation and cropping are broadly used in the context of image classification models ShortenK19 ; Luke18 . In our problem, we cannot use geometric transforms because of the nature of the data, but we can add random perturbations to the samples. Therefore, we apply Gaussian smoothing as an augmentation during training. Perturbing the input samples with Gaussian noise can simply be done as:

x_{Gaussian}=x+\mathcal{N}(0,\,\sigma^{2}), \qquad (4)

where Gaussian noise with mean $0$ and variance $\sigma^{2}$ is added to the input $x$.
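A minimal sketch of this augmentation in PyTorch is given below; the function name and the default value of sigma are illustrative assumptions rather than the settings used in the experiments.

def gaussian_augment(x, t, sigma=0.01):
    # Add zero-mean Gaussian noise with standard deviation sigma
    # (variance sigma^2, as in Eq. 4) to each training input.
    return x + sigma * torch.randn_like(x), t + sigma * torch.randn_like(t)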

Furthermore, we use weight decay Krogh91 as another effective method for preventing overfitting and improving generalization. Weight decay is a regularization technique that adds a penalty term to the loss function to keep the model weights small:

loss=\mathcal{L}(f_{\theta},y)+\lambda\,\|\theta\|^{2}, \qquad (5)

where $\lambda$ is a hyper-parameter that controls the impact of the weight decay. A larger $\lambda$ yields a stronger regularization effect, which leads to smoother solutions. Likewise, a higher $\sigma$ in the Gaussian noise augmentation has a similar effect.
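In PyTorch, such a penalty is conveniently applied through the optimizer, as sketched below. The value 5e-4 mirrors the $\lambda$ reported later in the experiments, and we note that the optimizer's internal parameterization of the penalty may differ from Eq. 5 by a constant factor.

model = PINNSolver()
# Adam with an L2 penalty on the weights (weight decay); lr and weight_decay
# are hyper-parameters to be tuned.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=5e-4)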

4 Physics-Informed Adversarial Training of Neural Networks (PIAT)

Given a training set $\chi$ and a neural network $f_{\theta}(\cdot)$ with weights $\theta$, standard training can be written as:

\theta^{*}=\operatorname*{arg\,min}_{\theta}\ \mathbb{E}_{(({\mathbf{x}},t),y)\in\chi}\ \ell(({\mathbf{x}},t),y;f_{\theta}), \qquad (6)

which minimizes the loss $\ell(\cdot)$ of $f_{\theta}(\cdot)$, the estimated PDE solution, on the training samples. Note that the loss function $\ell(\cdot)$ is defined based on Eq. 3:

\ell(({\mathbf{x}},t),y;f_{\theta}):=\begin{cases}(\partial f_{\theta}/\partial t+N[f_{\theta}({\mathbf{x}},t)]-y_{1})^{2};\quad({\mathbf{x}},t)\in\mathcal{C}\\ (f_{\theta}({\mathbf{x}},t)-y_{2})^{2};\quad({\mathbf{x}},t)\in\mathcal{B}\\ (f_{\theta}({\mathbf{x}},0)-y_{3})^{2};\quad({\mathbf{x}},0)\in\mathcal{I},\end{cases} \qquad (7)

where $y_{1}=0$, $y_{2}=g({\mathbf{x}},t)$, and $y_{3}=h({\mathbf{x}})$ for the training sample $({\mathbf{x}},t)$, and $\mathcal{C}$, $\mathcal{B}$, and $\mathcal{I}$ denote the collocation, boundary, and initial points, respectively. While standard training performs well in many tasks, it leads to models that are fragile against adversarial examples. These examples are defined as the addition of an $\ell_{p}$-bounded perturbation to the original input such that the model generates outputs with large errors SzegedyZSBEGF13 ; NguyenYC15 . This issue is due to the fact that the model does not learn robust features for solving the problem IlyasSTETM19 ; d2020underspecification . The fact that the training data does not sufficiently cover the input space suggests that physics-informed neural networks also suffer from the same phenomenon. Weight decay and Gaussian smoothing were introduced above to mitigate this issue. However, they might not be sufficiently effective.

Another solution to this problem is adversarial training, which has been widely explored in image classification Madry2018 ; ZhangYJXGJ19 ; ZhangXH0CSK20 ; az2021 . Adversarial training has two main parts. First, a small perturbation $\delta$ is optimized and added to the training sample such that the network loss is maximized. Next, the neural network weights are updated with a single stochastic gradient step of the loss function evaluated on the perturbed sample $({\mathbf{x}},t)+\delta$. As a result, the neural network can learn more robust features and perform well even on adversarially perturbed samples. Adversarial training is summarized as:

\theta^{*}=\operatorname*{arg\,min}_{\theta}\ \mathbb{E}_{(({\mathbf{x}},t),y)\in\chi}\ \left(\max_{(\delta_{x},\delta_{t})\in[-\epsilon,\epsilon]^{d+1}}\ell(({\mathbf{x}}+\delta_{x},t+\delta_{t}),y;f_{\theta})\right), \qquad (8)

where $\epsilon$ is the $\ell_{\infty}$ bound on $\delta$. We assume that $\delta$ consists of two parts, $\delta_{x}$ and $\delta_{t}$, which perturb ${\mathbf{x}}$ and $t$, respectively. The minimization part of Eq. 8 is solved with stochastic gradient descent (SGD). For the maximization part, different algorithms can be applied, but we use Projected Gradient Descent (PGD) Madry2018 . The PGD attack is an iterative algorithm that computes gradients of the loss with respect to the input and uses their sign to generate perturbations that maximize the model loss as follows:

\delta_{i+1}=\delta_{i}+\alpha\,\mathrm{sign}(\nabla_{(\mathbf{x},t)}\ell(({\mathbf{x}},t),y;f_{\theta})), \qquad (9)

where $\alpha$ is the step size taken along the gradient sign direction in each iteration.
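A minimal sketch of this inner maximization, written around a generic scalar loss, is shown below. The helper pgd_perturbation and the default values of eps, alpha, and the number of steps are illustrative assumptions rather than the exact settings of the released implementation.

def pgd_perturbation(loss_fn, x, t, eps=0.05, alpha=0.01, steps=5):
    # Inner maximization of Eq. 8: follow the sign of the input gradient
    # (Eq. 9) and keep (delta_x, delta_t) inside the [-eps, eps] box.
    # loss_fn(x, t) must return the scalar training loss at those inputs.
    delta_x, delta_t = torch.zeros_like(x), torch.zeros_like(t)
    for _ in range(steps):
        xp = (x + delta_x).detach().requires_grad_(True)
        tp = (t + delta_t).detach().requires_grad_(True)
        gx, gt = torch.autograd.grad(loss_fn(xp, tp), (xp, tp))
        # d loss / d delta equals d loss / d (x + delta), so the sign update
        # of Eq. 9 applies directly, followed by the projection (clamp).
        delta_x = (delta_x + alpha * gx.sign()).clamp(-eps, eps)
        delta_t = (delta_t + alpha * gt.sign()).clamp(-eps, eps)
    return delta_x, delta_t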

Figure 1: PIAT method schematic. The neural network is optimized based on the adversarially perturbed inputs. These inputs are crafted by adding a bounded perturbation to the original ones in order to maximize the neural network loss.

Pure adversarial training by itself does not work well in PIAT, because perturbing training samples in physics-informed neural networks changes the correct label $y$ of the sample much more than perturbing images does in vision tasks. Therefore, in PIAT, we propose to optimize the model on $(({\mathbf{x}},t)+\delta,\ y^{\prime})$, where $y^{\prime}$ is the ground-truth label for $({\mathbf{x}},t)+\delta$:

y^{\prime}=(0,\ g({\mathbf{x}}+\delta_{x},t+\delta_{t}),\ h({\mathbf{x}}+\delta_{x})). \qquad (10)

We further assume that the type of a training sample, which can be collocation, boundary, or initial, is unchanged after perturbing the point. Note that in pure AT, one uses the original label of the input ${\mathbf{x}}$ while training on the adversarial examples. The objective function of PIAT is summarized as:

\theta^{*}=\operatorname*{arg\,min}_{\theta}\ \mathbb{E}_{(({\mathbf{x}},t),y)\in\chi}\ \left(\max_{(\delta_{x},\delta_{t})\in[-\epsilon,\epsilon]^{d+1}}\ell(({\mathbf{x}}+\delta_{x},t+\delta_{t}),y^{\prime};f_{\theta})\right), \qquad (11)

where the loss function is set according to Eq. 3. PIAT is represented schematically in Fig. 1.
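Putting these pieces together, the following sketch outlines one PIAT update, reusing pde_residual and pgd_perturbation from the earlier snippets. The function piat_step and its argument layout are our own illustrative construction, not the authors' implementation: the residual target stays $y_{1}=0$, boundary points are relabelled with the exact $g$ at their perturbed locations as in Eq. 10, and initial points would be handled analogously with $h$.

def piat_step(model, optimizer, colloc, bound, init, g, h, eps=0.05):
    (x_r, t_r), (x_b, t_b), x_0 = colloc, bound, init

    # Worst-case shifts of the collocation points (the residual target is 0,
    # so no relabelling is needed for them).
    res_loss = lambda xp, tp: pde_residual(model, xp, tp).pow(2).mean()
    dx, dt = pgd_perturbation(res_loss, x_r, t_r, eps=eps)
    x_r_adv = (x_r + dx).detach().requires_grad_(True)
    t_r_adv = (t_r + dt).detach().requires_grad_(True)

    # Worst-case shifts of the boundary points, relabelled with g (Eq. 10).
    bnd_loss = lambda xp, tp: (model(xp, tp) - g(xp, tp)).pow(2).mean()
    dxb, dtb = pgd_perturbation(bnd_loss, x_b, t_b, eps=eps)
    x_b_adv, t_b_adv = x_b + dxb, t_b + dtb

    optimizer.zero_grad()
    loss = (pde_residual(model, x_r_adv, t_r_adv).pow(2).mean()
            + (model(x_b_adv, t_b_adv) - g(x_b_adv, t_b_adv)).pow(2).mean()
            + (model(x_0, torch.zeros_like(x_0[:, :1])) - h(x_0)).pow(2).mean())
    loss.backward()
    optimizer.step()
    return loss.item()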

5 Numerical examples

In this section, the adversarial training of physics-informed neural networks (PIAT), weight decay, and Gaussian smoothing are applied to solve linear and nonlinear equations. The Kuramoto–Sivashinsky equation, the Sawada–Kotera equation, and a high-dimensional Allen–Cahn equation are considered to conduct a more comprehensive analysis.

The training and test points of the evaluation scheme are chosen randomly using the Latin Hypercube Sampling (LHS) strategy on the domain of the problem. In order to show the efficiency and capability of the proposed method, the numerical approximations of the mentioned methods are compared. The convergence of our method is also illustrated numerically.
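As an illustration of this sampling step, the snippet below draws collocation points with SciPy's Latin Hypercube sampler on a rectangular domain. The use of scipy.stats.qmc, the bounds [0, 2π] × [0, 1] (mirroring Example 1), and the sample size are assumptions, since the paper does not specify which LHS implementation was used.

import numpy as np
import torch
from scipy.stats import qmc

# 200 collocation points sampled by Latin Hypercube on [0, 2*pi] x [0, 1].
sampler = qmc.LatinHypercube(d=2, seed=0)
pts = qmc.scale(sampler.random(n=200), l_bounds=[0.0, 0.0],
                u_bounds=[2 * np.pi, 1.0])
x_r = torch.tensor(pts[:, :1], dtype=torch.float32, requires_grad=True)
t_r = torch.tensor(pts[:, 1:], dtype=torch.float32, requires_grad=True)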

Example 1.

Figure 2: A visual depiction of the actual solution (right), the training points, and the solution predicted by PIAT (left) (Example 1).

Let us consider the periodic initial value problem for the fourth-order nonlinear Kuramoto–Sivashinsky (KS) equation as follows:

\begin{split}&u_{t}+uu_{x}+u_{xx}+\nu u_{xxxx}=f(x,t),\quad(x,t)\in\mathbb{R}\times\mathbb{R}_{0}^{+},\\ &u(x,0)=g(x),\end{split} \qquad (12)

where $u=u(x,t)$ is a real-valued function and $u^{0}:\mathbb{R}\longrightarrow\mathbb{R}$ is sufficiently smooth. $u^{0}$, $f$, and $g$ are $2\pi$-periodic functions, and $\nu$ is a positive parameter that plays the role of viscosity. The solution of Eq. 12 is also $2\pi$-periodic in the space domain, i.e., $u(x+2\pi,t)=u(x,t)$ for all $x\in\mathbb{R}$ and $t\geq 0$. The exact solution of Eq. 12 is given by $u(x,t)=\sin(x+t)$. In all experiments, $T=1$ and $\nu=\frac{1}{2}$. A visual depiction of the problem is shown in the right panel of Fig. 2.
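For this example, the residual of Eq. 12 can be formed with repeated automatic differentiation, as sketched below in the style of the earlier snippets. The helper ks_residual is illustrative, and the forcing f is derived here from the stated exact solution u(x,t) = sin(x+t), which we assume is how f was chosen.

import torch

def ks_residual(model, x, t, nu=0.5):
    # Residual of Eq. 12; x and t are leaf tensors with requires_grad=True.
    def d(y, wrt):
        return torch.autograd.grad(y, wrt, grad_outputs=torch.ones_like(y),
                                   create_graph=True)[0]
    u = model(x, t)
    u_t, u_x = d(u, t), d(u, x)
    u_xx = d(u_x, x)
    u_xxxx = d(d(u_xx, x), x)
    # f(x, t) obtained by substituting u = sin(x + t) into the left-hand side:
    # f = cos(x+t) + sin(x+t)cos(x+t) - sin(x+t) + nu*sin(x+t).
    s, c = torch.sin(x + t), torch.cos(x + t)
    f = c + s * c - s + nu * s
    return u_t + u * u_x + u_xx + nu * u_xxxx - f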

By using the proposed methods, the solution is approximated by a deep neural network with 5 hidden layers, 100 neurons per hidden layer, and 200 collocation points. Gaussian smoothing is only applied to the PINN, while weight decay is applied to both PINN and PIAT. The results are presented in Table 1. First of all, the results demonstrate the effectiveness of the PIAT method over PINN regardless of whether weight decay is used. However, weight decay improves the results for both PINN and PIAT, which demonstrates the advantages of using regularization methods in both cases. Note that regularization is originally intended to prevent overfitting, but in our case this technique has also helped to reduce the training error (see the first row of Table 1). We hypothesize that the regularization makes the loss function smoother and increases the effectiveness of gradient-based optimization. Gaussian smoothing, on the other hand, tries to achieve the same goal in an indirect and blind way, which does not help the gradient-based optimization. Adversarial sample augmentation in PIAT moves the samples purposefully in the steepest direction that causes large changes in the solution. Therefore, similar to weight decay, this technique enhances the gradient in an input-adaptive manner, which leads to better optimization.

Table 1: Training and test loss functions of PINN, PIAT, and Gaussian smoothing, with and without weight decay, with $N_{u}=20$, $N_{f}=200$, and $\epsilon=0.05$, for 5 hidden layers, 100 neurons, and 10000 epochs (Example 1).
Method PINN PIAT Gaussian
weight decay $\lambda=0$ $\lambda=5\times 10^{-4}$ $\lambda=0$ $\lambda=5\times 10^{-4}$ $\lambda=0$
train $1.04\times 10^{-3}$ $3.07\times 10^{-5}$ $2.23\times 10^{-6}$ $1.46\times 10^{-6}$ $1.11\times 10^{-2}$
test $1.08\times 10^{-3}$ $2.39\times 10^{-5}$ $2.59\times 10^{-6}$ $\mathbf{1.29\times 10^{-6}}$ $1.07\times 10^{-2}$
Table 2: Training and test losses, in odd and even rows respectively, for different numbers of boundary points $N_{u}$, with $N_{f}=200$ and $N_{test}=100$, for 20000 epochs (Example 1).
$N_{u}$ PINN PIAT
20 $1.04\times 10^{-3}$ $2.23\times 10^{-6}$
   $1.08\times 10^{-3}$ $\mathbf{2.59\times 10^{-6}}$
40 $5.65\times 10^{-6}$ $4.50\times 10^{-6}$
   $4.87\times 10^{-6}$ $\mathbf{3.98\times 10^{-6}}$
70 $3.88\times 10^{-5}$ $9.25\times 10^{-6}$
   $3.21\times 10^{-5}$ $\mathbf{8.90\times 10^{-6}}$
100 $5.90\times 10^{-6}$ $1.11\times 10^{-6}$
    $8.72\times 10^{-6}$ $\mathbf{9.90\times 10^{-7}}$
Table 3: Training and test losses, in odd and even rows respectively, for different numbers of neurons with 5 layers, boundary points $N_{u}=20$, $N_{f}=200$, and $N_{test}=100$, for 20000 epochs (Example 1).
Neurons PINN PIAT
10 $1.46\times 10^{-1}$ $1.36\times 10^{-1}$
   $1.44\times 10^{-1}$ $\mathbf{1.33\times 10^{-1}}$
50 $4.40\times 10^{-5}$ $1.06\times 10^{-4}$
   $\mathbf{2.81\times 10^{-5}}$ $9.91\times 10^{-5}$
100 $6.44\times 10^{-5}$ $2.23\times 10^{-6}$
    $5.86\times 10^{-5}$ $\mathbf{2.59\times 10^{-6}}$
Table 4: Training and test losses, in odd and even rows respectively, for different numbers of layers with 50 neurons, boundary points $N_{u}=20$, $N_{f}=200$, and $N_{test}=100$, for 20000 epochs (Example 1).
Layers PINN PIAT
5 $4.40\times 10^{-5}$ $1.06\times 10^{-4}$
  $\mathbf{2.81\times 10^{-5}}$ $9.91\times 10^{-5}$
10 $1.50\times 10^{-3}$ $3.16\times 10^{-5}$
   $1.40\times 10^{-3}$ $\mathbf{2.50\times 10^{-5}}$
20 $1.43\times 10^{-4}$ $6.51\times 10^{-6}$
   $8.88\times 10^{-5}$ $\mathbf{2.98\times 10^{-6}}$

Figure 3: Training and validation losses during the training of the neural network for solving the KS equation using the PINN and PIAT methods (Example 1).

To check the impact of the neural network hyper-parameters on the performance of the PIAT method, the training and test errors of the solutions for different numbers of boundary and collocation points are presented in Table 2. Moreover, Tables 3 and 4 show the results for various numbers of layers and neurons, which all support the general superiority of PIAT over PINN. It is notable that PIAT exhibits this improvement at larger network sizes, which is consistent with the fact that adversarial training requires over-parametrization of the base model Madry2018 . The training and validation losses during training are also shown for these methods in Fig. 3, which demonstrates that both methods have converged and were trained for a sufficient number of epochs.

To check the impact of the KS problem hyper-parameters on the performance of the PIAT method, the methods are compared over wider ranges of $x$ and $t$ in Table 5. Moreover, as another ablation study in this example, the methods are also compared after replacing the exact solution with $u(x,t)=e^{t}\cos(x)\sin(1+x)$ and the corresponding $f(x,t)$ in Table 6. These findings also show that PIAT has a much lower error than PINN, even when the ranges of $x$ and $t$ or the exact solution are changed.

Table 5: Training and test losses, in odd and even rows respectively, for different ranges of $x$ and $t$ with $N_{u}=50$, $N_{f}=2000$, $N_{test}=1000$, 5 layers, 100 neurons, and 20000 epochs (Example 1).
$x$ $t$ PINN PIAT
0–20 0–1 $4.01\times 10^{-8}$ $2.25\times 10^{-8}$
         $6.33\times 10^{-8}$ $\mathbf{2.36\times 10^{-8}}$
0–20 0–10 $7.13\times 10^{-7}$ $1.16\times 10^{-7}$
          $7.91\times 10^{-7}$ $\mathbf{1.32\times 10^{-7}}$
0–50 0–1 $9.88\times 10^{-6}$ $5.98\times 10^{-6}$
         $2.41\times 10^{-5}$ $\mathbf{6.29\times 10^{-6}}$
0–50 0–10 $1.13\times 10^{-4}$ $7.16\times 10^{-5}$
          $3.20\times 10^{-4}$ $\mathbf{7.52\times 10^{-5}}$
Table 6: Training and test losses for $u(x,t)=e^{t}\cos(x)\sin(1+x)$ with $N_{u}=50$, $N_{f}=2000$, $N_{test}=1000$, 5 layers, 100 neurons, and 20000 epochs (Example 1).
$x$ $t$ PINN PIAT
0–20 0–1 $5.07\times 10^{-5}$ $4.61\times 10^{-6}$
         $8.30\times 10^{-5}$ $\mathbf{6.91\times 10^{-6}}$

Example 2. The Sawada–Kotera (SK) equation has been widely investigated in many recent works sk_App_1 ; sk_App_2 . In this example, we apply PIAT to the seventh-order SK equation, which can be written in the form:

u_{t}+(63u^{4}+63(2u^{2}u_{xx}+uu_{x}^{2})+21(uu_{xxxx}+u_{xx}^{2}+u_{x}u_{xxx})+u_{xxxxxx})_{x}=0, \qquad (13)

with the initial condition $u(x,0)=\frac{4k^{2}}{3}(2-3\tanh^{2}(kx))$. The exact solution is:

u(x,t)=\frac{4k^{2}}{3}\left(2-3\tanh^{2}\left(k\left(x-\frac{256k^{6}}{3}t\right)\right)\right). \qquad (14)

This example has also been solved in 13 with the Adomian decomposition method.

The proposed method is applied, and the training and test errors with randomly chosen training and test data are reported in Table 7, which shows the efficiency of PIAT for solving this kind of problem. Moreover, the results of the standard PINN are shown in Table 8, from which the efficiency of PIAT in comparison with PINN can be concluded.

Table 7: Training and test losses of PIAT, in odd and even rows respectively, for different numbers of layers and neurons per layer with $N_{f}=100$, $N_{u}=10$, and $N_{test}=100$ for 10000 epochs (Example 2).
Layers \ Neurons 10 20 40
2 $1.21\times 10^{-6}$ $2.35\times 10^{-7}$ $3.10\times 10^{-7}$
  $1.28\times 10^{-6}$ $2.36\times 10^{-7}$ $2.74\times 10^{-7}$
4 $3.21\times 10^{-8}$ $2.13\times 10^{-8}$ $1.45\times 10^{-8}$
  $2.93\times 10^{-8}$ $2.15\times 10^{-8}$ $1.45\times 10^{-8}$
8 $1.24\times 10^{-8}$ $1.53\times 10^{-8}$ $1.58\times 10^{-8}$
  $\mathbf{1.27\times 10^{-8}}$ $1.50\times 10^{-8}$ $1.57\times 10^{-8}$
Table 8: Training and test losses of PINN, in odd and even rows respectively, for different numbers of layers and neurons per layer with $N_{f}=100$, $N_{u}=10$, and $N_{test}=100$ for 10000 epochs (Example 2).
Layers \ Neurons 10 20 40
2 $2.73\times 10^{-6}$ $7.55\times 10^{-7}$ $7.44\times 10^{-7}$
  $2.78\times 10^{-6}$ $6.99\times 10^{-7}$ $7.86\times 10^{-7}$
4 $4.06\times 10^{-8}$ $3.55\times 10^{-8}$ $1.63\times 10^{-8}$
  $3.78\times 10^{-8}$ $3.67\times 10^{-8}$ $1.62\times 10^{-8}$
8 $1.46\times 10^{-8}$ $3.56\times 10^{-8}$ $2.11\times 10^{-8}$
  $\mathbf{1.46\times 10^{-8}}$ $3.46\times 10^{-8}$ $1.99\times 10^{-8}$

In Tables 9 and 10, the results of PINN and PIAT with and without weight decay are presented for different numbers of neurons. Similar to Example 1, weight decay improves the results for both PINN and PIAT. Moreover, the combination of PIAT and weight decay achieves the best results in both tables.

Table 9: Training and test losses with and without weight decay, with $N_{f}=100$, $N_{u}=10$, and $\epsilon=0.05$, for 2 hidden layers, 10 neurons, and 10000 training epochs (Example 2).
Method PINN PIAT
weight decay $\lambda=0$ $\lambda=5\times 10^{-4}$ $\lambda=0$ $\lambda=5\times 10^{-4}$
train $2.73\times 10^{-6}$ $7.64\times 10^{-8}$ $1.21\times 10^{-6}$ $4.20\times 10^{-8}$
test $2.78\times 10^{-6}$ $6.98\times 10^{-8}$ $1.28\times 10^{-6}$ $\mathbf{4.06\times 10^{-8}}$
Table 10: Training and test loss functions with and without weight decay, with $N_{f}=100$, $N_{u}=10$, and $\epsilon=0.05$, for 2 hidden layers, 20 neurons, and 10000 training epochs (Example 2).
Method PINN PIAT
weight decay $\lambda=0$ $\lambda=5\times 10^{-4}$ $\lambda=0$ $\lambda=5\times 10^{-4}$
train $7.55\times 10^{-7}$ $1.55\times 10^{-7}$ $2.35\times 10^{-7}$ $6.78\times 10^{-9}$
test $6.99\times 10^{-7}$ $1.57\times 10^{-7}$ $2.36\times 10^{-7}$ $\mathbf{6.44\times 10^{-9}}$

To perform a visual comparison, a point-wise visualization of the absolute error ($|f_{\theta}(x,t)-y|$) over the entire input range is presented in Fig. 4 for both PINN and PIAT. The results show that PIAT achieves an approximately uniform error over the input range. Moreover, PIAT achieves much lower errors than PINN, which shows its effectiveness regardless of the choice of test points.

Figure 4: Visualization of the exact solution and the point-wise absolute error of the predictions by PINN and PIAT ($|f_{\theta}(x,t)-y|$) in Example 2. The PIAT method achieves a much lower error than PINN.

Finally, as an ablation study, the methods are compared over wider ranges of $x$ and $t$ in Table 11. These findings also confirm the superiority of PIAT in solving this problem.

Table 11: Training and test losses, in odd and even rows respectively, for different ranges of $x$ and $t$ with 2 layers, 10 neurons, and 10000 epochs (Example 2).
$x$ $t$ PINN PIAT
0–5 0–5 $6.01\times 10^{-7}$ $2.32\times 10^{-7}$
        $6.96\times 10^{-7}$ $\mathbf{3.04\times 10^{-7}}$
0–10 0–10 $4.10\times 10^{-6}$ $2.83\times 10^{-7}$
          $4.13\times 10^{-6}$ $\mathbf{3.83\times 10^{-7}}$

Example 3. Consider the high-dimensional Allen–Cahn equation:

\begin{split}u_{t}&=u_{xx}+u-u^{3}+f(x,t),\quad x\in\mathbb{R}^{d},\ t\in\mathbb{R},\\ u(0,t)&=0,\quad u(\pi,t)=\sin(\alpha\pi)\cos(2t),\\ u(x,0)&=\sin(\alpha x),\end{split} \qquad (15)

which has the exact solution $u(x,t)=\sin(\alpha x)\cos(2t)$. The proposed method has been used to solve this problem. The numerical results in Table 12 show the efficiency of PIAT in comparison with the standard training of PINN for various dimensions. Moreover, the point-wise visualization of the absolute error ($|f_{\theta}(x,t)-y|$) in Fig. 5 shows that PIAT has a uniform error over the whole domain, which is important in real-world problems.

Table 12: Training and test loss functions of PINN and PIAT with $N_{f}=1000$, $N_{u}=100$, $N_{test}=500$, and $\epsilon=0.05$ (Example 3).
dimension $d=1$ $d=3$ $d=10$
method PINN PIAT PINN PIAT PINN PIAT
train 0.01243 0.0060 0.00651 0.00300 0.00269 0.00116
test 0.01252 $\mathbf{0.0059}$ 0.00641 $\mathbf{0.00297}$ 0.00284 $\mathbf{0.00118}$
Figure 5: Visualization of the exact solution and the point-wise absolute error of the predictions by PINN and PIAT ($|f_{\theta}(x,t)-y|$) in Example 3 for $d=5$. In order to visualize the solution on a two-dimensional plane, a projection onto the $x_{1}$ and $x_{2}$ dimensions is performed by setting the other dimensions to zero.

Finally, as an ablation study, the methods are compared over wider ranges of $x$ and $t$ in Table 13. The results are similar to those of the previous examples and show that PIAT has a lower error.

Table 13: Training and test losses, in odd and even rows respectively, for different ranges of $x$ and $t$ with 5 layers, 80 neurons, 10000 epochs, and $d=2$ (Example 3).
$x$ $t$ PINN PIAT
0–$\pi$ 0–3 0.01206 0.10180
            0.01164 $\mathbf{0.01001}$
0–$2\pi$ 0–7 0.04487 0.00606
             0.04654 $\mathbf{0.00595}$

Conclusion

In this paper, we have introduced the physics-informed adversarial training (PIAT) of neural networks for solving nonlinear problems. Compared to previous approaches such as PINN, our method reduces the model's test errors on various PDE examples, ranging from low-dimensional to high-dimensional problems. In our simulations, we studied the effects of different factors on the training and test errors, and the convergence of the proposed method was observed for different numbers of layers and neurons and for different input ranges.

Moreover, we proposed weight decay and Gaussian smoothing as baselines, which demonstrated the advantages of PIAT. Weight decay was applied to both PINN and PIAT, and the results show the efficiency of the PIAT method over PINN regardless of whether weight decay is used. Weight decay also improved the results of both PINN and PIAT, which demonstrates the benefits of such regularization, from which PIAT profits as well. The results of Gaussian smoothing further showed that an adversarial perturbation works better than random noise for perturbing the samples in PIAT.

Acknowledgements.
We would like to thank Hossein Yousefi Moghaddam for his insightful comments and help.

Author’s Contributions: S.S. and M.A. were involved in the conceptualization, formal analysis, investigation, methodology, validation, and drafting the manuscript. M.A. also contributed to the software development and visualization. M.H.R. contributed to the conceptualization, investigation, methodology, supervision, and review/editing of the paper.

Competing Interests: The authors have no competing interests to declare.

Funding Sources: This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

References

  • (1) Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
  • (2) Y. Zhu and N. Zabaras, “Bayesian deep convolutional encoder–decoder networks for surrogate modeling and uncertainty quantification,” J. Comput. Phys., vol. 366, pp. 415–447, Aug. 2018.
  • (3) Y. Zhu, N. Zabaras, P.-S. Koutsourelakis, and P. Perdikaris, “Physics-constrained deep learning for high-dimensional surrogate modeling and uncertainty quantification without labeled data,” Journal of Computational Physics, vol. 394, pp. 56–81, 2019.
  • (4) C. Yang, X. Yang, and X. Xiao, “Data-driven projection method in fluid simulation,” Computer Animation and Virtual Worlds, vol. 27, no. 3-4, pp. 415–424, 2016.
  • (5) N. Geneva and N. Zabaras, “Quantifying model form uncertainty in reynolds-averaged turbulence models with bayesian deep neural networks,” Journal of Computational Physics, vol. 383, pp. 125–147, 2019.
  • (6) M. Schöberl, N. Zabaras, and P.-S. Koutsourelakis, “Predictive collective variable discovery with deep bayesian models,” The Journal of Chemical Physics, vol. 150, no. 2, p. 024109, 2019.
  • (7) D. C. Psichogios and L. H. Ungar, “A hybrid neural network-first principles approach to process modeling,” AIChE Journal, vol. 38, no. 10, pp. 1499–1511, 1992.
  • (8) A. J. Meade and A. A. Fernandez, “The numerical solution of linear ordinary differential equations by feedforward neural networks,” Math. Comput. Model., vol. 19, p. 1–25, June 1994.
  • (9) A. J. Meade and A. A. Fernandez, “Solution of nonlinear ordinary differential equations by feedforward neural networks,” Math. Comput. Model., vol. 20, p. 19–44, Nov. 1994.
  • (10) I. Lagaris, A. Likas, and D. Fotiadis, “Artificial neural networks for solving ordinary and partial differential equations,” IEEE transactions on neural networks, vol. 9 5, pp. 987–1000, 1998.
  • (11) M. W. M. G. Dissanayake and N. Phan-Thien, “Neural-network-based approximations for solving partial differential equations,” Communications in Numerical Methods in Engineering, vol. 10, no. 3, pp. 195–201, 1994.
  • (12) M. Raissi, P. Perdikaris, and G. Karniadakis, “Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations,” Journal of Computational Physics, vol. 378, pp. 686–707, 2019.
  • (13) D. Zhang, L. Lu, L. Guo, and G. E. Karniadakis, “Quantifying total uncertainty in physics-informed neural networks for solving forward and inverse stochastic problems,” J. Comput. Phys., vol. 397, 2019.
  • (14) L. Yang, D. Zhang, and G. E. Karniadakis, “Physics-informed generative adversarial networks for stochastic differential equations,” SIAM J. Sci. Comput., vol. 42, no. 1, pp. A292–A317, 2020.
  • (15) X. Meng and G. E. Karniadakis, “A composite neural network that learns from multi-fidelity data: Application to function approximation and inverse PDE problems,” J. Comput. Phys., vol. 401, 2020.
  • (16) G. Pang, L. Lu, and G. E. Karniadakis, “fpinns: Fractional physics-informed neural networks,” SIAM J. Sci. Comput., vol. 41, pp. A2603–A2626, 2019.
  • (17) Z. Fang and J. Zhan, “A physics-informed neural network framework for pdes on 3d surfaces: Time independent problems,” IEEE Access, vol. 8, pp. 26328–26335, 2020.
  • (18) D. Zhang, L. Guo, and G. E. Karniadakis, “Learning in modal space: Solving time-dependent stochastic pdes using physics-informed neural networks,” SIAM Journal on Scientific Computing, vol. 42, no. 2, pp. A639–A665, 2020.
  • (19) A. D. Jagtap, E. Kharazmi, and G. E. Karniadakis, “Conservative physics-informed neural networks on discrete domains for conservation laws: Applications to forward and inverse problems,” Computer Methods in Applied Mechanics and Engineering, vol. 365, p. 113028, 2020.
  • (20) S. Karumuri, R. Tripathy, I. Bilionis, and J. H. Panchal, “Simulator-free solution of high-dimensional stochastic elliptic partial differential equations using deep neural networks,” J. Comput. Phys., vol. 404, 2020.
  • (21) E. Kharazmi, Z. Zhang, and G. E. Karniadakis, “hp-vpinns: Variational physics-informed neural networks with domain decomposition,” Computer Methods in Applied Mechanics and Engineering, vol. 374, p. 113547, 2021.
  • (22) A. Jagtap and G. Karniadakis, “Extended physics-informed neural networks (xpinns): A generalized space-time domain decomposition based deep learning framework for nonlinear partial differential equations,” Communications in Computational Physics, vol. 28, pp. 2002–2041, 11 2020.
  • (23) L. Yang, X. Meng, and G. E. Karniadakis, “B-pinns: Bayesian physics-informed neural networks for forward and inverse pde problems with noisy data,” Journal of Computational Physics, vol. 425, p. 109913, 2021.
  • (24) K. Shukla, A. D. Jagtap, and G. E. Karniadakis, “Parallel physics-informed neural networks via domain decomposition,” CoRR, vol. abs/2104.10013, 2021.
  • (25) L. Lu, X. Meng, Z. Mao, and G. E. Karniadakis, “DeepXDE: A deep learning library for solving differential equations,” SIAM Review, vol. 63, no. 1, pp. 208–228, 2021.
  • (26) X. Meng and G. E. Karniadakis, “A composite neural network that learns from multi-fidelity data: Application to function approximation and inverse PDE problems,” J. Comput. Phys., vol. 401, 2020.
  • (27) A. D. Jagtap, K. Kawaguchi, and G. E. Karniadakis, “Adaptive activation functions accelerate convergence in deep and physics-informed neural networks,” J. Comput. Phys., vol. 404, 2020.
  • (28) S. Wang, Y. Teng, and P. Perdikaris, “Understanding and mitigating gradient flow pathologies in physics-informed neural networks,” SIAM Journal on Scientific Computing, vol. 43, no. 5, pp. A3055–A3081, 2021.
  • (29) L. Lu, R. Pestourie, W. Yao, Z. Wang, F. Verdugo, and S. G. Johnson, “Physics-informed neural networks with hard constraints for inverse design,” ArXiv, vol. abs/2102.04626, 2021.
  • (30) H. Gao, L. Sun, and J.-X. Wang, “Phygeonet: Physics-informed geometry-adaptive convolutional neural networks for solving parameterized steady-state pdes on irregular domain,” Journal of Computational Physics, vol. 428, p. 110079, 2021.
  • (31) Y. Shin, J. Darbon, and G. Em Karniadakis, “On the convergence of physics informed neural networks for linear second-order elliptic and parabolic type pdes,” Communications in Computational Physics, vol. 28, no. 5, pp. 2042–2074, 2020.
  • (32) S. Mishra and R. Molinaro, “Estimates on the generalization error of physics informed neural networks (pinns) for approximating pdes,” ArXiv, vol. abs/2007.01138, 2020.
  • (33) S. Wang, X. Yu, and P. Perdikaris, “When and why pinns fail to train: A neural tangent kernel perspective,” CoRR, vol. abs/2007.14527, 2020.
  • (34) A. Krogh and J. A. Hertz, “A simple weight decay can improve generalization,” in Advances in Neural Information Processing Systems 4, NIPS, 1991.
  • (35) C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. J. Goodfellow, and R. Fergus, “Intriguing properties of neural networks,” in 2nd International Conference on Learning Representations, ICLR, 2014.
  • (36) A. M. Nguyen, J. Yosinski, and J. Clune, “Deep neural networks are easily fooled: High confidence predictions for unrecognizable images,” in IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2015.
  • (37) A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,” in 6th International Conference on Learning Representations, ICLR, 2018.
  • (38) C. Shorten and T. M. Khoshgoftaar, “A survey on image data augmentation for deep learning,” J. Big Data, vol. 6, p. 60, 2019.
  • (39) L. Taylor and G. Nitschke, “Improving deep learning with generic data augmentation,” in 2018 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1542–1547, 2018.
  • (40) A. Ilyas, S. Santurkar, D. Tsipras, L. Engstrom, B. Tran, and A. Madry, “Adversarial examples are not bugs, they are features,” in Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS (H. M. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. B. Fox, and R. Garnett, eds.), 2019.
  • (41) A. D’Amour, K. Heller, D. Moldovan, B. Adlam, B. Alipanahi, A. Beutel, C. Chen, J. Deaton, J. Eisenstein, M. D. Hoffman, et al., “Underspecification presents challenges for credibility in modern machine learning,” arXiv preprint arXiv:2011.03395, 2020.
  • (42) H. Zhang, Y. Yu, J. Jiao, E. P. Xing, L. E. Ghaoui, and M. I. Jordan, “Theoretically principled trade-off between robustness and accuracy,” in Proceedings of the 36th International Conference on Machine Learning, ICML, 2019.
  • (43) J. Zhang, X. Xu, B. Han, G. Niu, L. Cui, M. Sugiyama, and M. S. Kankanhalli, “Attacks which do not kill training make adversarial learning stronger,” in Proceedings of the 37th International Conference on Machine Learning, ICML, 2020.
  • (44) M. Azizmalayeri and M. H. Rohban, “Lagrangian objective function leads to improved unforeseen attack generalization in adversarial training,” CoRR, vol. abs/2103.15385, 2021.
  • (45) J. Manafian and M. Lakestani, “Lump-type solutions and interaction phenomenon to the bidirectional sawada–kotera equation,” Pramana, vol. 92, no. 3, pp. 1–13, 2019.
  • (46) M. Osman, “One-soliton shaping and inelastic collision between double solitons in the fifth-order variable-coefficient sawada–kotera equation,” Nonlinear Dynamics, vol. 96, no. 2, pp. 1491–1496, 2019.
  • (47) D. Kaya and M. Aassila, “An application for a generalized kdv equation by the decomposition method,” Physics Letters A, vol. 299, no. 2, pp. 201–206, 2002.