2022
PINN for Dynamical Partial Differential Equations is Not Training Deeper Networks Rather Learning Advection and Time Variance
Abstract
The concepts and techniques of physics-informed neural networks (PINNs) are studied, and their limitations are identified with the aim of making them efficient approximators for dynamical equations. Potential research directions are explored for increasing the robustness of this technique for the solution of partial differential equations. It is identified that PINNs tend to fail under stronger advection and over longer time durations. In addition, the optimization objective and the way constraints are posed need to be smarter. Even a shallow network suffices for many problems, while a more powerful deeper network can still fail. A reservoir-computing-based recurrent neural network architecture is recommended for solving dynamical problems.
keywords:
Dynamical Systems, Chaos, Physics Informed Neural Networks, PINN, Kuramoto–Sivashinsky Equation, ODE solution, PDE solution, Differential Equations, Recurrent Neural Network, Reservoir Computing

1 Introduction
The solution of differential equations has been an important topic in almost every field, be it finance, mechanics, or meteorology. From the days of Newton and Leibniz, solving differential equations has been core to developments in this world. Not all differential equations are solvable by hand, and this introduces limitations, especially when multiple independent variables enter the equation or when equations form coupled systems. However, solving these equations is the need of the hour, and hence the focus shifted from exact solutions to approximate solutions, since closed-form function operations and transformations are conceptually limited. There arose the generation of methods that solve equations using the basic definitions or first principles of limits, discretization and numerical analysis [1]. Popular techniques of this kind are finite element methods, finite volume methods and finite difference approximations. These methods are generalizable, and that is their benefit: using them we can solve almost any equation for any geometry. But with increased complexity come accompanying problems, namely the quality of the approximation and the computational expense. Discretization gives us a long list of simplified approximate equations to solve. Although we know how to solve them, doing so requires computers to carry out the tedious arithmetic, and a stable and accurate solution consumes a lot of time and energy [1]. We can compute many things, but time, resources and money are limited, and the problems could be endless. One class of differential equations, dynamical systems, is hard to solve beyond a certain range in time, and there are reasons for this. Dynamical systems are hard to solve because the solution may bifurcate, which can make the system chaotic. In a chaotic system, a minute change in the initial condition or in the equation coefficients leads to drastically different outcomes; this is sometimes referred to as "The Butterfly Effect". The aim of this project is to develop a function approximation method that can potentially replace computationally expensive solvers for dynamical systems. The one-dimensional Kuramoto-Sivashinsky equation is solved for trials and research [2, 3].
Function approximations, or analytical solutions, are well known for being light-weight [4]. These techniques can remove the three primary types of error that are evident in full-order discretized approximations, namely instability, inaccuracy and shift [5]. Dynamical systems are mainly susceptible to inaccuracy because of their pronounced sensitivity to initial conditions. A benefit of function approximation is the ability to correct and reproduce the solution; extrapolability is an added benefit. Among function approximators, neural networks have been excellent candidates as universal approximators [6, 7]. In the past decade, deep neural networks, which are essentially multiple stacked layers of neural networks, have been used for various complex regression problems because of their ability to capture high-dimensional, strongly non-linear behaviour. These models are fitted to data directly as an input-to-output mapping [7]. Neural networks can also be trained on differential equations by minimizing the residuals evaluated at random points in the input domain. Such models are called physics-informed neural networks, or PINNs [4, 5, 8, 9].
Solving partial differential equations using PINNs is widely accepted by the scientific community. These methods have plenty of advantages over conventional methods. The major ones are the ability to solve a wide category of problems that were otherwise hard to tackle, and the fact that they do not require meshing and discretization, which is itself sometimes a difficult task. Another advantage is that, unlike other analytical models, they do not require data from full-order solutions to set their parameters. However, being newly developed, these techniques are not yet robust enough for complex equations such as hyperbolic equations, strongly non-linear equations, strongly advective equations [5], chaotic dynamical systems, coupled systems of equations and shock-wave equations.
2 Dynamical Partial Differential Equations
Activity in the world mostly takes place in four dimensions, three in space and one in time, and each new dimension adds a layer of complexity. Dynamical systems generally refer to functions that describe the dependence of the state of a system on time. Henri Poincaré was the first to identify the special behaviour of dynamical systems. The theory of dynamical systems is highly relevant to studying complex dynamics, usually expressed in the form of differential equations, which makes them continuous dynamical systems. The major points of focus in this domain are the attractors, chaos, fractals and bifurcations that explain the long-term behaviour of states qualitatively. This helps in understanding the evolution of dynamical events such as turbulence, storms, mixing fluids, environmental change, economic changes, planetary motion and many more.
The main applications of dynamical systems theory are to find structural stability, Lyapunov time, bifurcation points, position tracking and quantitative approximations, which in one way or another determine the predictability of the state at a particular time. Predicting dynamical systems is a tough job. Before the advent of computing machines, prediction required sophisticated mathematical techniques that were specific to particular classes of dynamical systems. These are sometimes among the toughest differential equations to solve and, considering the other factors mentioned above, accurate prediction is a major challenge for these kinds of systems.
3 Case Selection
The cases below clearly illustrate two major difficulties in solving differential equations. The concepts are explained with reference to the terms and framework of the equations stated. The two equations are good examples with which to analyse the theory of PINNs.


3.1 1D Steady Advection-Diffusion Equation
The differential equation below is the governing equation for steady one-dimensional flow with combined advection and diffusion phenomena:

$$
c\,\frac{du}{dx} = \nu\,\frac{d^{2}u}{dx^{2}}, \qquad 0 < x < L .
$$

Here $c$ is the coefficient, or weight, of the advection term and $\nu$ that of the diffusion term. The larger $c$ is, the more dominant the advection effect becomes; this introduces a directional character into the solution and makes discrete approximation tougher. Figure 1 shows the difference in the solution with advection dominance. The higher the Peclet number $Pe = cL/\nu$, the more dominant the advection; the figure compares the solutions for Pe = 1 and Pe = 50. With dominant advection, numerical integration sees rapidly growing error that makes the solution unstable and inaccurate [10]. Hence a major class of higher-order methods has been developed to tackle this particular issue.
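For reference, the exact solution of this steady problem, under assumed Dirichlet boundary conditions u(0) = 0 and u(L) = 1, can be written in closed form in terms of the Peclet number, and evaluating it at Pe = 1 and Pe = 50 reproduces the qualitative behaviour compared in Figure 1. A minimal Python sketch (the boundary conditions and sample points are assumptions for illustration):

```python
import numpy as np

# Exact solution of the steady problem c u_x = nu u_xx with assumed Dirichlet boundary
# conditions u(0) = 0 and u(L) = 1, written in terms of Pe = c L / nu:
#   u(x) = (exp(Pe x / L) - 1) / (exp(Pe) - 1).
# Evaluating it at Pe = 1 and Pe = 50 shows the profile steepening into a thin layer
# near x = L as advection dominates.
def exact_solution(x, Pe, L=1.0):
    return np.expm1(Pe * x / L) / np.expm1(Pe)

x = np.linspace(0.0, 1.0, 11)
for Pe in (1.0, 50.0):
    print(f"Pe = {Pe:5.1f}:", np.round(exact_solution(x, Pe), 4))
```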
3.2 1D Kuramoto–Sivashinsky Equation
The equation below is the one-dimensional Kuramoto-Sivashinsky equation:

$$
\frac{\partial u}{\partial t} + \alpha\, u \frac{\partial u}{\partial x} + \beta\, \frac{\partial^{2} u}{\partial x^{2}} + \gamma\, \frac{\partial^{4} u}{\partial x^{4}} = 0 .
$$

Its linear form, with the non-linear advection term $u\,u_x$ replaced by a constant-coefficient advection term, is as below:

$$
\frac{\partial u}{\partial t} + \alpha\, \frac{\partial u}{\partial x} + \beta\, \frac{\partial^{2} u}{\partial x^{2}} + \gamma\, \frac{\partial^{4} u}{\partial x^{4}} = 0 .
$$

This equation combines advection, diffusion and dissipation effects. It is one of the equations whose solution is extremely sensitive to the initial condition [2, 3]. The higher-order terms in the expansion of the difference equation are very relevant and hence sensitive to error propagation in time. Figure 2 shows the solution of a one-dimensional KS case computed with the Julia code developed by Mahatab Lak et al. from the University of New Hampshire.
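The behaviour shown in Figure 2 can be reproduced qualitatively with a short pseudo-spectral time stepper. The sketch below is a minimal Python analogue of such a solver for the case with all coefficients equal to 1; it is not the Julia code referenced above, and the domain length, resolution, step size and initial condition are assumptions for illustration.

```python
import numpy as np

# Minimal pseudo-spectral sketch of the 1D KS equation u_t + u u_x + u_xx + u_xxxx = 0
# (all coefficients 1) on a periodic domain, using an integrating-factor Euler step:
# the stiff linear part is integrated exactly in Fourier space, the nonlinear term
# explicitly. A production solver would use a higher-order integrator such as ETDRK4.
L, N, dt, steps = 32 * np.pi, 256, 0.01, 5000
x = L * np.arange(N) / N
u = np.cos(x / 16) * (1 + np.sin(x / 16))            # smooth initial condition
k = 2 * np.pi * np.fft.fftfreq(N, d=L / N)           # angular wavenumbers
E = np.exp(dt * (k**2 - k**4))                       # exact factor for the linear part
v = np.fft.fft(u)
for _ in range(steps):
    u = np.real(np.fft.ifft(v))
    nonlin = -0.5j * k * np.fft.fft(u**2)            # -(u^2/2)_x = -u u_x in Fourier space
    v = E * (v + dt * nonlin)                        # integrating-factor Euler step
u = np.real(np.fft.ifft(v))                          # solution at t = steps * dt
```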
4 Neural Networks as Universal Function Approximator
George Cybenko proved the arbitrary-width case for neural networks with sigmoid activation in 1989 [6]. Later in the same year, Hornik et al. proved that multi-layer feed-forward networks are universal approximators [7]. Multi-layer artificial neural networks are compositions of weighted sums of inputs passed through non-linear (activation) functions such as tanh(), sigmoid(), etc. This yields an extremely expressive, highly non-linear function with a large number of trainable parameters (weights and biases), which is what makes it a universal approximator.
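To make this composition explicit, the sketch below writes out a small two-layer tanh network by hand; the weights are random placeholders rather than trained values.

```python
import numpy as np

# A two-layer tanh network written out explicitly as a composition of weighted sums
# and non-linear activations; the weights below are random placeholders.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 1)), np.zeros(16)      # hidden layer parameters
W2, b2 = rng.normal(size=(1, 16)), np.zeros(1)       # linear output layer parameters

def mlp(x):
    h = np.tanh(W1 @ np.atleast_1d(x) + b1)          # weighted sum of inputs + activation
    return W2 @ h + b2                               # weighted sum of hidden activations
```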
5 Physics-informed Neural Networks
Informing a neural network of the physics is a concept introduced by Lagaris et al. [4] in the late 1990s, using a neural network as a trial function to solve differential equations by reducing the residuals of the equation at various points in the domain. The boundary constraints were forced into the neural network function by modifying the trial function manually. In 2017, Raissi et al. proceeded by using accurate automatic differentiation and deeper networks to approximate tougher problems [8, 9]. The novelty in their work comes from the way they pose the loss function to reduce the residual. They did not manually force the constraints by modifying the trial function; instead, they let the trial function fit the boundary and initial constraints by adding the mean squared error at the data points satisfying those conditions as summed constraint terms to the mean squared residuals. This makes the technique very generalizable: almost any differential equation can be posed and solved using this technique, which they named PINNs.
5.1 Advancements in PINNs
The unknown solution $u(t, x)$ is represented by a deep neural network $u_{\theta}(t, x)$, where $\theta$ denotes all tunable parameters of the network (e.g., weights and biases). The physics-informed model can be trained by minimizing the following loss function:

$$
\mathcal{L}(\theta) = \lambda_{ic}\,\mathcal{L}_{ic}(\theta) + \lambda_{bc}\,\mathcal{L}_{bc}(\theta) + \lambda_{r}\,\mathcal{L}_{r}(\theta),
$$

where

$$
\mathcal{L}_{ic}(\theta) = \frac{1}{N_{ic}} \sum_{i=1}^{N_{ic}} \left| u_{\theta}\left(0, x_{ic}^{i}\right) - g\left(x_{ic}^{i}\right) \right|^{2}, \qquad
\mathcal{L}_{bc}(\theta) = \frac{1}{N_{bc}} \sum_{i=1}^{N_{bc}} \left| \mathcal{B}\left[u_{\theta}\right]\left(t_{bc}^{i}, x_{bc}^{i}\right) \right|^{2}, \qquad
\mathcal{L}_{r}(\theta) = \frac{1}{N_{r}} \sum_{i=1}^{N_{r}} \left| \mathcal{R}\left[u_{\theta}\right]\left(t_{r}^{i}, x_{r}^{i}\right) \right|^{2},
$$

with $g(x)$ the initial condition, $\mathcal{B}[\cdot]$ the boundary operator and $\mathcal{R}[\cdot]$ the residual of the governing equation. Here $\{t^{i}, x^{i}\}$ can be the vertices of a fixed mesh or points that are randomly sampled at each iteration of a gradient descent algorithm. The hyper-parameters $\{\lambda_{ic}, \lambda_{bc}, \lambda_{r}\}$ allow the flexibility of assigning a different learning rate to each individual loss term in order to balance their interplay during model training [12, 13]. These weights may be user-specified or tuned automatically during training.
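A minimal sketch of this composite loss, written in PyTorch for an unsteady one-dimensional advection-diffusion problem chosen purely for illustration, is given below; the equation, network size, sampling counts and weights are assumptions, not the configurations used in the cited works.

```python
import torch

# Minimal sketch of the weighted PINN loss L = lam_ic*L_ic + lam_bc*L_bc + lam_r*L_r
# for an unsteady 1D advection-diffusion problem u_t + c u_x = nu u_xx on (t, x) in [0,1]^2,
# with u(0, x) = sin(pi x) and u(t, 0) = u(t, 1) = 0 (illustrative assumptions).
torch.manual_seed(0)
c, nu = 1.0, 0.1
net = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.Tanh(),
                          torch.nn.Linear(32, 32), torch.nn.Tanh(),
                          torch.nn.Linear(32, 1))

def grad(u, var):
    return torch.autograd.grad(u, var, torch.ones_like(u), create_graph=True)[0]

def pde_residual(t, x):
    t, x = t.requires_grad_(True), x.requires_grad_(True)
    u = net(torch.cat([t, x], dim=1))
    return grad(u, t) + c * grad(u, x) - nu * grad(grad(u, x), x)

def pinn_loss(lam_ic=1.0, lam_bc=1.0, lam_r=1.0, n=256):
    t_r, x_r = torch.rand(n, 1), torch.rand(n, 1)                      # collocation points
    x_ic, t_bc = torch.rand(n, 1), torch.rand(n, 1)                    # IC and BC points
    loss_r = pde_residual(t_r, x_r).pow(2).mean()                      # PDE residual term
    u_ic = net(torch.cat([torch.zeros_like(x_ic), x_ic], dim=1))
    loss_ic = (u_ic - torch.sin(torch.pi * x_ic)).pow(2).mean()        # initial-condition term
    u_left = net(torch.cat([t_bc, torch.zeros_like(t_bc)], dim=1))
    u_right = net(torch.cat([t_bc, torch.ones_like(t_bc)], dim=1))
    loss_bc = u_left.pow(2).mean() + u_right.pow(2).mean()             # boundary term
    return lam_ic * loss_ic + lam_bc * loss_bc + lam_r * loss_r

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(2000):
    opt.zero_grad(); loss = pinn_loss(); loss.backward(); opt.step()
```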
5.2 Advantages of PINNs over Other Neural Networks
The major advantage of this technique is that it does not require physical data to train the analytical model. Moreover, the technique is generalizable in the sense that, with exactly the same concept, various equations can be solved [4, 5, 8, 9, 12, 13]. Previous models required altering the learning function depending on the number of coupled equations, the boundary conditions, etc., in order to force the constraints. Being a strong approximating function, the three major kinds of error, namely instability, inaccuracy and shifting errors, can be addressed simultaneously, whereas these problems are handled individually in finite numerical techniques [5]. This kind of technique is an excellent candidate for robust higher-order methods. A neural network is not just an approximator but a smart approximator: depending on the local physical properties, it can behave differently, with switching-like behaviour. In the particular case of dynamical systems, the integrated error grows too rapidly as time progresses. Since PINNs are an optimization-based regression technique, training on measured physical data points, added as extra loss terms or regularization, can be used to correct the approximating function.
6 Experiments with the selected cases
The defining qualitative property of dynamical problems is strong translational variance. Hence, the two major causes of PINNs performing poorly are advection dominance and time variance, which are demonstrated below.
6.1 PINNs for 1-D Steady Advection-Diffusion
The Peclet number is a good non-dimensional parameter to scale the advective dominance over the diffusive character of an equation. It is the ratio of the advective transport rate to the diffusive transport rate. In our problem we can quantify it as the ratio of the coefficient of the advective term times the length of the domain to the coefficient of the diffusive term, i.e. $Pe = cL/\nu$ in our particular case. A PINN can solve this problem, but there is a limit set by the advective character of the differential equation. No matter how deep and sophisticated we make the neural network, it is not possible to solve problems with a Peclet number beyond roughly 8. Figure 3 shows the results noted. For lower Peclet numbers, it is observed that deeper and wider layers are not necessary. A positive point is that conventional numerical techniques fail when this value exceeds 2; hence, these schemes can be used as shape functions that allow larger grids with similar accuracy. The work and figures in this section are sourced from the 2019 thesis titled "Numerical Approximation in CFD Problems Using Physics Informed Machine Learning" [5].



A parametric analysis is done to better understand performance and effectiveness. The impact of changing the non-linearity (activation) function as well as the loss optimization algorithm is studied. Figure 4 records the loss values in tabular form. Among the various non-linearity functions, tanh() and tan() perform consistently and better than sigmoid(); however, tanh() clearly wins. Figure 5 shows the trend of the loss value with iterations for various optimizers. The neural network is trained using various optimizers, and L-BFGS-B and SLSQP perform better than the other specialised optimization techniques, with BFGS performing remarkably better than the rest. The prime reason could be that these optimizers are second order and handle multi-objective functions better than other techniques. However, first-order techniques such as Adam perform decently when they do not fail. An important observation is that the collocated PDE residuals and the fitting losses for the constraints are clearly not on a similar scale, which is atypical for regular data-driven neural network learning. The gradient pathology is also not smooth, and hence difficult for other optimizers. Guided optimization such as hill climbing helps in a few cases.
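As a concrete illustration of switching the optimizer, the same kind of steady advection-diffusion PINN can be handed to a quasi-Newton method such as L-BFGS simply by wrapping the loss in a closure. The equation setup, Peclet number, network width and iteration budget below are assumptions, not the exact settings behind Figures 4 and 5.

```python
import torch

# Sketch of training a steady advection-diffusion PINN (c u_x = nu u_xx, u(0)=0, u(1)=1)
# with the quasi-Newton L-BFGS optimizer on a fixed set of collocation points.
torch.manual_seed(0)
c, nu = 10.0, 1.0                                                       # Pe = c*L/nu = 10 (assumed)
net = torch.nn.Sequential(torch.nn.Linear(1, 20), torch.nn.Tanh(),
                          torch.nn.Linear(20, 20), torch.nn.Tanh(),
                          torch.nn.Linear(20, 1))
x_r = torch.linspace(0.0, 1.0, 101).reshape(-1, 1)                      # fixed collocation mesh
x_b = torch.tensor([[0.0], [1.0]]); u_b = torch.tensor([[0.0], [1.0]])  # Dirichlet data

def loss_fn():
    x = x_r.clone().requires_grad_(True)
    u = net(x)
    u_x = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x, x, torch.ones_like(u_x), create_graph=True)[0]
    return (c * u_x - nu * u_xx).pow(2).mean() + (net(x_b) - u_b).pow(2).mean()

opt = torch.optim.LBFGS(net.parameters(), max_iter=500, line_search_fn="strong_wolfe")
def closure():
    opt.zero_grad(); loss = loss_fn(); loss.backward(); return loss
opt.step(closure)                                                       # one full L-BFGS solve
```

A full-batch quasi-Newton solve is feasible here because the collocation set is small and fixed; with stochastic sampling of collocation points, first-order optimizers like Adam are the more natural choice.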
6.2 PINNs for 1-D Kuramoto-Sivashinsky
There are recent publications demonstrating that PINNs can solve the one-dimensional Kuramoto-Sivashinsky equation, which is among the standard dynamical equations that can turn chaotic. However, there is only a small intersecting set of coefficients for the advection, diffusion and dissipation terms for which this works. As demonstrated in the previous case, the dominance of advection toughens the optimization of the PINN. Here, the problem is not only dynamical but also non-linear, and it involves several orders of spatial derivatives, which makes it especially difficult when the system becomes chaotic.
CausalPINN by Wang et al. is considered the state-of-the-art PINN [13]. They rightly identified the problem of multi-objective optimization caused by the difference in scales between residuals and constraints, as stated by Rout et al. [5], and devised weights for each loss term by normalizing with the cumulative loss terms. They are the first to solve the 1D KS equation. We can validate their model using the open-source code they provide. Figure 6 shows how CausalPINN performs over time for the case provided in the figure. It can be noticed that PINNs can now solve complex dynamical problems: the initial sine curve curls smoothly as expected while the constraints are obeyed, i.e. the initial curve is a neat sine function and the boundaries remain continuously at zero. However, for the typical form of the equation in which all the coefficients are taken as 1, considered here for regular study, the state-of-the-art PINN fails to optimize even after an effort equivalent to one day's run-time. The net loss is recorded as 33.263, whereas the constraint loss is 0.0016. This suggests difficulty in fitting the PDE even though the PINN manages to obey the constraints. The residual loss is noted to be 1808.506, and its weight in the loss function vanishes from the scale. This clearly shows the issue of multi-objective optimization.
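A minimal sketch of the temporal weighting described in [13] is given below: collocation points are grouped by time level, and the residual loss at each level is down-weighted by the accumulated residual loss at earlier levels. The value of the causality parameter and the example per-level losses are assumptions for illustration.

```python
import torch

# Sketch of the causal weighting idea: for time levels t_1 < ... < t_M, the residual
# loss at t_i is weighted by w_i = exp(-eps * sum_{k<i} L_k), so later times only
# contribute once earlier times have been fitted.
def causal_weights(per_time_losses: torch.Tensor, eps: float = 1.0) -> torch.Tensor:
    earlier = torch.cumsum(per_time_losses, dim=0) - per_time_losses   # exclusive prefix sum
    return torch.exp(-eps * earlier).detach()                          # no gradient through weights

# Example: later time levels with large residuals receive small weights.
L_per_time = torch.tensor([0.01, 0.05, 2.0, 5.0])
print(causal_weights(L_per_time))   # approximately tensor([1.0000, 0.9900, 0.9418, 0.1274])
```

These weights multiply the per-level residual terms inside the loss, which gives the optimizer an explicit notion of temporal causality rather than a single pooled residual.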

7 Observations and Conclusion
Based on the experiments and analysis, a few points can be made. Simple addition, or mean-weighted addition, of the squared residuals at collocation points and of the fitting terms at constraint points mixes quantities of different scales and orders of magnitude. This is one of the major issues, as it can lead to non-Pareto-optimal solutions. It is also noticed that the loss function sometimes gets stuck in local optima, and the gradient pathologies are tough and uneven in the parameter hyperspace [5, 12]. Hence, stochastic first-order optimizers work in some cases, while higher-order optimizers suitable for constrained optimization, such as SQP and BFGS, work where others fail [5]. A better representation of the loss function, such as appropriately or adaptively weighted losses, can help. Otherwise, constraints can be forcefully enforced through a modified architecture or trial function, as explained by Lagaris et al. [4]. Specifically, in the context of dynamical systems, recurrent neural networks (RNNs) could prove to be better candidates than plain deep networks [14]. Concrete reasoning has been provided by Haber et al., who show that RNNs can be written in the form of differential equations and hence fit the theory needed to learn dynamical differential equations better [15, 16]. For chaotic systems in particular, reservoir computing has been shown to perform better [11].
Reservoir computers are a class of RNNs in which the intermediate nodes are randomly arranged and connected [11, 14]. They have random recurrent connections: the intermediate nodes are jumbled and entangled, but they are connected to the output layer linearly. The entangled architecture makes backpropagation difficult, and hence only the final layer of weights is trained, for convenience. Training only the output layer makes the effectively non-linear network linear with respect to the trainable parameters, preserving strong non-linearity while remaining easy to train. RNNs can also be given a PINN-style loss definition to solve chaotic problems for turbulence and extreme-event prediction [17].
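For completeness, a minimal echo state network, one common realization of reservoir computing, is sketched below: a fixed random reservoir is driven by the input sequence and only the linear readout is fitted by ridge regression on a toy one-step-ahead prediction task. All sizes and hyper-parameters are illustrative assumptions.

```python
import numpy as np

# Minimal echo state network sketch: fixed random input and recurrent weights,
# trainable linear readout fitted by ridge regression.
rng = np.random.default_rng(0)
n_in, n_res, leak, ridge = 1, 300, 0.5, 1e-6
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))                   # fixed input weights
W = rng.uniform(-0.5, 0.5, (n_res, n_res))                     # fixed recurrent weights
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))                # rescale to spectral radius 0.9

def run_reservoir(inputs):
    states, r = [], np.zeros(n_res)
    for u in inputs:
        r = (1 - leak) * r + leak * np.tanh(W_in @ np.atleast_1d(u) + W @ r)
        states.append(r.copy())
    return np.array(states)

# Toy task: predict the next sample of a sine wave from the reservoir state.
signal = np.sin(np.linspace(0, 20 * np.pi, 2000))
X, Y = run_reservoir(signal[:-1]), signal[1:]
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ Y)   # ridge-regression readout
prediction = X @ W_out                                              # trained readout output
```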
Ultimately, we can explain the errors and point to the right path for solving a dynamical system of partial differential equations by identifying the two prime causes of poor performance: advection dominance and time variance, both identified from the case studies. We can conclude that deep and bulky layers are not always required in the architecture; the criticality lies in the way the problem is posed for optimization and in its optimizability, and gradient pathology must be taken care of. Deeper layers give the potential to capture extremely strong non-linearity in high-dimensional and strongly coupled systems of equations. For the time-variance characteristic specifically, we should use recurrent neural networks, especially reservoir networks, which are in fact light-weight but perform better. Physics-Informed Recurrent Neural Networks (PIRNNs) are the right path for solving dynamical and chaotic problems.
References
- (1) J. Strikwerda, “Front Matter,” Finite Difference Schemes and Partial Differential Equations, Second Edition, pp. i–xii, Jan. 2004, doi: 10.1137/1.9780898717938.fm.
- (2) Y. Kuramoto, “Diffusion-Induced Chaos in Reaction Systems,” Progress of Theoretical Physics Supplement, pp. 346–367, 1978, doi: 10.1143/ptps.64.346.
- (3) G. I. Sivashinsky, “Nonlinear analysis of hydrodynamic instability in laminar flames—I. Derivation of basic equations,” Acta Astronautica, no. 11–12, pp. 1177–1206, Nov. 1977, doi: 10.1016/0094-5765(77)90096-0.
- (4) I. E. Lagaris, A. Likas, and D. I. Fotiadis, “Artificial neural networks for solving ordinary and partial differential equations,” IEEE Transactions on Neural Networks, no. 5, pp. 987–1000, 1998, doi: 10.1109/72.712178.
- (5) S. Rout, V. Dwivedi, and B. Srinivasan, “Numerical Approximation in CFD Problems Using Physics Informed Machine Learning,” arXiv, Nov. 2021, doi: 10.48550/arXiv.2111.02987. Master’s thesis, Indian Institute of Technology Madras, 2019.
- (6) G. Cybenko, “Approximation by superpositions of a sigmoidal function,” Mathematics of Control, Signals, and Systems, no. 4, pp. 303–314, Dec. 1989, doi: 10.1007/bf02551274.
- (7) K. Hornik, M. Stinchcombe, and H. White, “Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks,” Neural Networks, no. 5, pp. 551–560, Jan. 1990, doi: 10.1016/0893-6080(90)90005-6.
- (8) M. Raissi, P. Perdikaris, and G. E. Karniadakis, “Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations,” Journal of Computational Physics, pp. 686–707, Feb. 2019, doi: 10.1016/j.jcp.2018.10.045.
- (9) M. Raissi, P. Perdikaris, and G. E. Karniadakis, “Physics Informed Deep Learning (Part I): Data-driven Solutions of Nonlinear Partial Differential Equations,” arXiv, Nov. 2017, doi: 10.48550/arXiv.1711.10561.
- (10) S. Patankar, Numerical Heat Transfer and Fluid Flow (Computational Methods in Mechanics Thermal Sciences). CRC Press, 1980.
- (11) D. J. Gauthier, E. Bollt, A. Griffith, and W. A. S. Barbosa, “Next generation reservoir computing,” Nature Communications, no. 1, Sep. 2021, doi: 10.1038/s41467-021-25801-2.
- (12) S. Wang, Y. Teng, and P. Perdikaris, “Understanding and Mitigating Gradient Flow Pathologies in Physics-Informed Neural Networks,” SIAM Journal on Scientific Computing, no. 5, pp. A3055–A3081, Jan. 2021, doi: 10.1137/20m1318043.
- (13) S. Wang, S. Sankaran, and P. Perdikaris, “Respecting causality is all you need for training physics-informed neural networks,” arXiv.org, Mar. 2022, doi: 10.48550/arXiv.2203.07404.
- (14) A. Chattopadhyay, P. Hassanzadeh, and D. Subramanian, “Data-driven predictions of a multiscale Lorenz 96 chaotic system using machine-learning methods: reservoir computing, artificial neural network, and long short-term memory network,” Nonlinear Processes in Geophysics, no. 3, pp. 373–389, Jul. 2020, doi: 10.5194/npg-27-373-2020.
- (15) E. Haber and L. Ruthotto, “Stable architectures for deep neural networks,” Inverse Problems, no. 1, p. 014004, Dec. 2017, doi: 10.1088/1361-6420/aa9a90.
- (16) B. Chang, L. Meng, E. Haber, F. Tung, and D. Begert, “Multi-level Residual Networks from Dynamical Systems View,” arXiv, 2018. https://arxiv.org/abs/1710.10348.
- (17) N. A. K. Doan, W. Polifke, and L. Magri, “Physics-informed echo state networks,” Journal of Computational Science, p. 101237, Nov. 2020, doi: 10.1016/j.jocs.2020.101237.